Subject: Re: [scala-user] Process, ProcessBuilder, ProcessLogger: Too many open files
You forgot to mention the version of Scala you are using.

By the way, scala.io.Source must be closed explicitly. When I started
using Scala, most examples didn't do that, because it would break the
wonderful one-liners. I was as guilty of that as anyone else. At any
rate, if you are not closing your scala.io.Source, you should fix
that.
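
Something along these lines would do it. This is only a minimal sketch:
the object and method names (ClosingSource, readLines) are made up for
illustration, and it reads the whole file eagerly, which may not suit
very large inputs.

```scala
import scala.io.Source

// Hypothetical helper, not part of the standard library.
object ClosingSource {
  // Reads every line of the file, then closes the underlying stream,
  // even if reading throws.
  def readLines(path: String): List[String] = {
    val source = Source.fromFile(path)
    try source.getLines().toList // force the lazy iterator before closing
    finally source.close()
  }
}
```

The toList matters: getLines() is lazy, so closing the Source before the
lines have been consumed would leave nothing to read.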

The process library was leaking file descriptors. A fix to that went
in just last week, which I believe took care of all cases. If,
however, you can reproduce the problem with a recent Scala from trunk,
I'd be most interested in hearing about it. Also, if you are passing a
ProcessIO, then your code is responsible for closing the streams that
are passed to it.
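
To illustrate the ProcessIO case, here is a rough sketch of handlers
that close what they are given. The names ClosingProcessIO, runAndParse
and drain are invented for this example, and I am assuming you only
care about the lines on stdout; adapt it to whatever you actually need.

```scala
import java.io.{InputStream, OutputStream}
import scala.io.Source
import scala.sys.process.{Process, ProcessIO}

// Hypothetical helper, not part of the standard library.
object ClosingProcessIO {
  // Runs `command`, hands each stdout line to `onLine`, returns the exit code.
  def runAndParse(command: Seq[String])(onLine: String => Unit): Int = {
    // Read the stream to the end, then close it whatever happens.
    def drain(in: InputStream)(f: String => Unit): Unit =
      try Source.fromInputStream(in).getLines().foreach(f)
      finally in.close()

    val io = new ProcessIO(
      (stdin: OutputStream) => stdin.close(),         // nothing to send
      (stdout: InputStream) => drain(stdout)(onLine), // parse, then close
      (stderr: InputStream) => drain(stderr)(_ => ()) // discard, then close
    )
    Process(command).run(io).exitValue() // blocks until the process exits
  }
}
```

In the code quoted below, that would be roughly
runAndParse(Seq(dsspCommand, conformationFilename))(outputParser.processLine),
assuming dsspCommand is a bare executable name with no arguments in it.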

On Tue, Feb 21, 2012 at 20:59, William Harvey
<harveywi@xxxxxxxxxxxxxxxxxx> wrote:
> Dear All,
>
> I'm writing some code to read a bunch of file names from an input
> file, then run a Linux program on each of them in parallel.  I am
> using scala.sys.process.Process to create a ProcessBuilder, and I'm
> using a custom ProcessLogger to take the output of the Linux program
> and pluck out the console output that I need.  The relevant bits of my
> code look like this:
>
>      val accInfo = new Array[Float](numConformations)
>
>      final class DsspOutputParser(conformationID: Int) {
>        var numLinesRead = 0
>        var residueIdx = 0
>        @inline
>        def processLine(line: String): Unit = {
>          numLinesRead += 1
>          if (numLinesRead > 25) {
>            val acc = line.substring(34, 38).trim.toInt
>            accInfo(conformationID) = acc
>            residueIdx += 1
>          }
>        }
>      }
>
>      val tasks = withBufferedReader(new File(config.datasetDir, "conformation_filenames.txt"))(br => {
>        Iterator.continually(br.readLine()).takeWhile(_ != null).toList.zipWithIndex.map {
>          case (conformationFilename, i) => {
>            future {
>              println((i+1) + " of " + numConformations)
>
>              val pb = Process(dsspCommand + " " + conformationFilename)
>              val outputParser = new DsspOutputParser(i)
>              val procLog = ProcessLogger(outputParser.processLine(_))
>              pb.!(procLog)
>
>            }
>          }
>        }
>      })
>
>      tasks.grouped(10).foreach { group => scala.actors.Futures.awaitAll(Long.MaxValue / 2L, group: _*) }
>
> The problem is that after running for a while, the program barfs with
> the following error:
>
> <function0>: caught java.io.IOException: Cannot run program "dssp-2-linux-amd64": java.io.IOException: error=24, Too many open files
> java.io.IOException: Cannot run program "dssp-2-linux-amd64": java.io.IOException: error=24, Too many open files
>        at java.lang.ProcessBuilder.start(ProcessBuilder.java:460)
>        at scala.sys.process.ProcessBuilderImpl$Simple.run(ProcessBuilderImpl.scala:68)
>        at scala.sys.process.ProcessBuilderImpl$AbstractBuilder.run(ProcessBuilderImpl.scala:99)
>        at scala.sys.process.ProcessBuilderImpl$AbstractBuilder$$anonfun$runBuffered$1.apply(ProcessBuilderImpl.scala:147)
>        at scala.sys.process.ProcessBuilderImpl$AbstractBuilder$$anonfun$runBuffered$1.apply(ProcessBuilderImpl.scala:147)
>        at scala.sys.process.ProcessLogger$$anon$1.buffer(ProcessLogger.scala:64)
>        at scala.sys.process.ProcessBuilderImpl$AbstractBuilder.runBuffered(ProcessBuilderImpl.scala:147)
>        at scala.sys.process.ProcessBuilderImpl$AbstractBuilder.$bang(ProcessBuilderImpl.scala:113)
>        [et cetera]
>
> So it looks like something might not be cleaning up after itself, or
> the garbage collector might not be aggressive enough (i.e. it looks
> like the process streams aren't getting properly closed).  The output
> of lsof lists a bunch of lines that look like this:
>
> java    23555 harveywi 7693w  FIFO    0,8      0t0 866166108 pipe
>
> I saw a similar problem a few years back (using
> scala.io.Source.fromFile(_) in small parallel batches), and the
> solution was to invoke System.gc() periodically to make sure that the
> offending streams were closed and cleaned up.  However, that
> workaround doesn't seem to be effective anymore.
>
> I am using Sun Java 1.6.0_29, and the VM arguments that I am using are
> "-Xmx1G -server".  I am using the default garbage collector, which
> might be the problem.
>
> Am I doing something really silly?  If it is up to me to close the
> streams manually, how do I do that?  I poked through the Scala
> standard library source and didn't see anything obvious.  If it's not
> my responsibility to close the streams, and the Scala standard library
> is properly closing them, do you have any good ideas on how I might be
> able to get around this issue?
>
> Thank you!
>
> -William Harvey
> http://www.cse.ohio-state.edu/~harveywi



--
Daniel C. Sobral

I travel to the future all the time.
