Subject: [scala-user] Re: Process, ProcessBuilder, ProcessLogger: Too many open files
(Accidentally posted reply to just Daniel rather than the group, so
I'll summarize here.)

I'm using the Scala IDE for Eclipse, which uses Scala 2.9.1.final. I
checked out the latest Scala from trunk, and so far things are running
very well! The number of open files stays right around 100 using 64
threads. This is great!

Thank you all for taking the time and effort to help me out, and for
fixing that issue. I really appreciate it! The Scala community is
very lucky to have you!

Cheers,

William Harvey
http://www.cse.ohio-state.edu/~harveywi

On Feb 21, 7:02 pm, Daniel Sobral <dcsob...@xxxxxxxxx> wrote:
> You forgot to mention the version of Scala you are using.
>
> By the way, scala.io.Source must be closed explicitly. At the time I
> started using Scala, however, most examples didn't do that, because it
> would break the wonderful one-liners. Mind you, I was as guilty of
> that as anyone else, but, at any rate, if you are not closing your
> scala.io.Source, you should revise that.
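
For reference, a minimal sketch of what closing a scala.io.Source
explicitly might look like, assuming a plain try/finally. The names
CloseSourceSketch and readLines and the temporary file are hypothetical
and exist only for illustration; they are not from the original code.

```scala
import java.io.{File, PrintWriter}
import scala.io.Source

object CloseSourceSketch {
  // Hypothetical helper: reads all lines eagerly, then closes the Source
  // even if reading throws. The eager toList matters, because getLines()
  // is lazy and would fail if consumed after close().
  def readLines(file: File): List[String] = {
    val source = Source.fromFile(file)
    try source.getLines().toList
    finally source.close()
  }

  def main(args: Array[String]): Unit = {
    // Illustrative input only: a throwaway file with two fake file names.
    val tmp = File.createTempFile("conformation_filenames", ".txt")
    tmp.deleteOnExit()
    val out = new PrintWriter(tmp)
    try out.print("a.pdb\nb.pdb\n") finally out.close()
    println(readLines(tmp))
  }
}
```

The one-liner Source.fromFile(f).getLines().toList reads the same data
but never closes the underlying stream, which is the habit described
above.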
>
> The process library was leaking file descriptors. A fix for that went
> in just last week, which I believe took care of all cases. If,
> however, you can reproduce the problem with a recent Scala from trunk,
> I'd be most interested in hearing about it. Also, if you are passing a
> ProcessIO, then your code is responsible for closing the streams that
> are passed to it.
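
A rough sketch of the ProcessIO case described above, assuming the
three-argument ProcessIO constructor from scala.sys.process. The object
name ProcessIOSketch and the helper runAndCollect are hypothetical, and
the handling of stderr (read and discard) is just one possible choice.

```scala
import scala.io.Source
import scala.sys.process.{Process, ProcessIO}

object ProcessIOSketch {
  // Hypothetical helper: runs a command, collects its stdout lines, and
  // closes every stream that the process library hands to our handlers.
  def runAndCollect(command: Seq[String]): (Int, List[String]) = {
    val lines = List.newBuilder[String]
    val io = new ProcessIO(
      in => in.close(), // nothing to send to the child's stdin
      out => {
        try Source.fromInputStream(out).getLines().foreach(lines += _)
        finally out.close()
      },
      err => {
        // Drain stderr so the child cannot block on a full pipe.
        try Source.fromInputStream(err).getLines().foreach(_ => ())
        finally err.close()
      }
    )
    // exitValue() blocks until the child process has finished.
    val exitCode = Process(command).run(io).exitValue()
    (exitCode, lines.result())
  }

  def main(args: Array[String]): Unit = {
    // Assumes an `echo` executable is available on the PATH.
    println(runAndCollect(Seq("echo", "hello")))
  }
}
```

With a ProcessLogger, as in the code quoted below, the library owns the
streams, so this explicit closing only applies when a ProcessIO is
passed in.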
>
> On Tue, Feb 21, 2012 at 20:59, William Harvey
> <harve...@xxxxxxxxxxxxxxxxxx> wrote:
> > Dear All,
>
> > I'm writing some code to read a bunch of file names from an input
> > file, then run a Linux program on each of them in parallel.  I am
> > using scala.sys.process.Process to create a ProcessBuilder, and I'm
> > using a custom ProcessLogger to take the output of the Linux program
> > and pluck out the parts of the console output that I need.  The
> > relevant bits of my code look like this:
>
> >      val accInfo = new Array[Float](numConformations)
>
> >      final class DsspOutputParser(conformationID: Int) {
> >        var numLinesRead = 0
> >        var residueIdx = 0
> >        @inline
> >        def processLine(line: String): Unit = {
> >          numLinesRead += 1
> >          if (numLinesRead > 25) {
> >            val acc = line.substring(34, 38).trim.toInt
> >            accInfo(conformationID) = acc
> >            residueIdx += 1
> >          }
> >        }
> >      }
>
> >      val tasks = withBufferedReader(
> >          new File(config.datasetDir, "conformation_filenames.txt"))(br => {
> >        Iterator.continually(br.readLine()).takeWhile(_ != null)
> >          .toList.zipWithIndex.map {
> >          case (conformationFilename, i) => {
> >            future {
> >              println((i+1) + " of " + numConformations)
>
> >              val pb = Process(dsspCommand + " " + conformationFilename)
> >              val outputParser = new DsspOutputParser(i)
> >              // Build the logger once and pass that same instance to `!`.
> >              val procLog = ProcessLogger(outputParser.processLine(_))
> >              pb.!(procLog)
>
> >            }
> >          }
> >        }
> >      })
>
> >      tasks.grouped(10).foreach { group =>
> >        scala.actors.Futures.awaitAll(Long.MaxValue / 2L, group: _*)
> >      }
>
> > The problem is that after running for a while, the program barfs with
> > the following error:
>
> > <function0>: caught java.io.IOException: Cannot run program
> > "dssp-2-linux-amd64": java.io.IOException: error=24, Too many open files
> > java.io.IOException: Cannot run program "dssp-2-linux-amd64":
> > java.io.IOException: error=24, Too many open files
> >        at java.lang.ProcessBuilder.start(ProcessBuilder.java:460)
> >        at scala.sys.process.ProcessBuilderImpl$Simple.run(ProcessBuilderImpl.scala:68)
> >        at scala.sys.process.ProcessBuilderImpl$AbstractBuilder.run(ProcessBuilderImpl.scala:99)
> >        at scala.sys.process.ProcessBuilderImpl$AbstractBuilder$$anonfun$runBuffered$1.apply(ProcessBuilderImpl.scala:147)
> >        at scala.sys.process.ProcessBuilderImpl$AbstractBuilder$$anonfun$runBuffered$1.apply(ProcessBuilderImpl.scala:147)
> >        at scala.sys.process.ProcessLogger$$anon$1.buffer(ProcessLogger.scala:64)
> >        at scala.sys.process.ProcessBuilderImpl$AbstractBuilder.runBuffered(ProcessBuilderImpl.scala:147)
> >        at scala.sys.process.ProcessBuilderImpl$AbstractBuilder.$bang(ProcessBuilderImpl.scala:113)
> >        [et cetera]
>
> > So it looks like something might not be cleaning up after itself, or
> > the garbage collector might not be aggressive enough (i.e. it looks
> > like the process streams aren't getting properly closed).  The output
> > of lsof lists a bunch of lines that look like this:
>
> > java    23555 harveywi 7693w  FIFO    0,8      0t0 866166108 pipe
>
> > I saw a similar problem a few years back (using
> > scala.io.Source.fromFile(_) in small parallel batches), and the
> > solution was to invoke System.gc() periodically to make sure that the
> > offending streams were closed and cleaned up.  However, that
> > workaround doesn't seem to be effective anymore.
>
> > I am using Sun Java 1.6.0_29, and the VM arguments that I am using are
> > "-Xmx1G -server".  I am using the default garbage collector, which
> > might be the problem.
>
> > Am I doing something really silly?  If it is up to me to close the
> > streams manually, how do I do that?  I poked through the Scala
> > standard library source and didn't see anything obvious.  If it's not
> > my responsibility to close the streams, and the Scala standard library
> > is properly closing them, do you have any good ideas on how I might be
> > able to get around this issue?
>
> > Thank you!
>
> > -William Harvey
> > http://www.cse.ohio-state.edu/~harveywi
>
> --
> Daniel C. Sobral
>
> I travel to the future all the time.
