Subject: Re: [scala-user] Process, ProcessBuilder, ProcessLogger: Too many open files

You forgot to mention the version of Scala you are using.

By the way, scala.io.Source must be closed explicitly. At the time I
started using Scala, however, most examples didn't do that, because it
would break the wonderful one-liners. Mind you, I was as guilty of that
as anyone else, but, at any rate, if you are not closing your
scala.io.Source, you should revise that.

The process library was leaking file descriptors. A fix to that went in
just last week, which I believe took care of all cases. If, however,
you can reproduce the problem with a recent Scala from trunk, I'd be
most interested in hearing about it.

Also, if you are passing a ProcessIO, then your code is responsible for
closing the streams that are passed to it.

On Tue, Feb 21, 2012 at 20:59, William Harvey
<harveywi@xxxxxxxxxxxxxxxxxx> wrote:
> Dear All,
>
> I'm writing some code to read a bunch of file names from an input
> file, then run a linux program on each of them in parallel. I am
> using scala.sys.process.Process to create a ProcessBuilder, and I'm
> using a custom ProcessLogger to take the output of the linux program
> and pluck out the console output that I need.
> The relevant bits of my code look like this:
>
> val accInfo = new Array[Float](numConformations)
>
> final class DsspOutputParser(conformationID: Int) {
>   var numLinesRead = 0
>   var residueIdx = 0
>   @inline
>   def processLine(line: String): Unit = {
>     numLinesRead += 1
>     if (numLinesRead > 25) {
>       val acc = line.substring(34, 38).trim.toInt
>       accInfo(conformationID) = acc
>       residueIdx += 1
>     }
>   }
> }
>
> val tasks = withBufferedReader(new File(config.datasetDir,
>     "conformation_filenames.txt"))(br => {
>   Iterator.continually(br.readLine()).takeWhile(_ != null).toList.zipWithIndex.map {
>     case (conformationFilename, i) => {
>       future {
>         println((i+1) + " of " + numConformations)
>
>         val pb = Process(dsspCommand + " " + conformationFilename)
>         val outputParser = new DsspOutputParser(i)
>         val procLog = ProcessLogger(outputParser.processLine(_))
>         pb.!(ProcessLogger(outputParser.processLine(_)))
>
>       }
>     }
>   }
> })
>
> tasks.grouped(10).foreach{group =>
>   scala.actors.Futures.awaitAll(Long.MaxValue / 2L, group: _*)}
>
> The problem is that after running for a while, the program barfs with
> the following error:
>
> <function0>: caught java.io.IOException: Cannot run program "dssp-2-linux-amd64": java.io.IOException: error=24, Too many open files
> java.io.IOException: Cannot run program "dssp-2-linux-amd64": java.io.IOException: error=24, Too many open files
>   at java.lang.ProcessBuilder.start(ProcessBuilder.java:460)
>   at scala.sys.process.ProcessBuilderImpl$Simple.run(ProcessBuilderImpl.scala:68)
>   at scala.sys.process.ProcessBuilderImpl$AbstractBuilder.run(ProcessBuilderImpl.scala:99)
>   at scala.sys.process.ProcessBuilderImpl$AbstractBuilder$$anonfun$runBuffered$1.apply(ProcessBuilderImpl.scala:147)
>   at scala.sys.process.ProcessBuilderImpl$AbstractBuilder$$anonfun$runBuffered$1.apply(ProcessBuilderImpl.scala:147)
>   at scala.sys.process.ProcessLogger$$anon$1.buffer(ProcessLogger.scala:64)
>   at scala.sys.process.ProcessBuilderImpl$AbstractBuilder.runBuffered(ProcessBuilderImpl.scala:147)
>   at scala.sys.process.ProcessBuilderImpl$AbstractBuilder.$bang(ProcessBuilderImpl.scala:113)
>   [et cetera]
>
> So it looks like something might not be cleaning up after itself, or
> the garbage collector might not be aggressive enough (i.e. it looks
> like the process streams aren't getting properly closed). The output
> of lsof lists a bunch of lines that look like this:
>
> java 23555 harveywi 7693w FIFO 0,8 0t0 866166108 pipe
>
> I have seen a similar problem before a few years back (using
> scala.io.Source.fromFile(_) in small parallel batches), and the
> solution was to invoke System.gc() periodically to make sure that the
> offending streams were closed and cleaned up. However, that
> workaround doesn't seem to be effective anymore.
>
> I am using sun java 1.6.0_29, and the VM arguments that I am using are
> "-Xmx1G -server". I am using the default garbage collector, which
> might be the problem.
>
> Am I doing something really silly? If it is up to me to close the
> streams manually, how do I do that? I poked through the scala
> standard library source and didn't see anything obvious. If it's not
> my responsibility to close the streams, and the scala standard library
> is properly closing them, do you have any good ideas on how I might be
> able to get around this issue?
>
> Thank you!
>
> -William Harvey
> http://www.cse.ohio-state.edu/~harveywi

--
Daniel C. Sobral

I travel to the future all the time.
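
Since the question was how to close the streams by hand, here is a
minimal sketch of the two closing rules mentioned in the reply: closing
a scala.io.Source explicitly, and closing the streams handed to a
ProcessIO. The names ClosingExamples, readLines and runAndCollect are
made up for illustration and are not part of the Scala library; only
Source, Process and ProcessIO are. It also assumes the command's output
is small enough to hold in memory. Treat it as one possible shape, not
as the fix that went into the process library.

```scala
import java.io.File
import scala.collection.mutable.ListBuffer
import scala.io.Source
import scala.sys.process.{Process, ProcessIO}

// Hypothetical helpers for illustration; not part of the Scala library.
object ClosingExamples {

  // Read every line of a file and always close the Source,
  // even if reading throws.
  def readLines(file: File): List[String] = {
    val src = Source.fromFile(file)
    try src.getLines().toList // force the lazy iterator before closing
    finally src.close()
  }

  // Run a command with a ProcessIO and close all three streams ourselves.
  // Returns the exit code and the lines written to standard output.
  def runAndCollect(command: Seq[String]): (Int, List[String]) = {
    val out = new ListBuffer[String]
    val io = new ProcessIO(
      stdin => {
        stdin.close() // nothing to send to the child process
      },
      stdout => {
        // The handler runs on its own thread, so guard the buffer.
        try Source.fromInputStream(stdout).getLines().foreach { line =>
          out.synchronized { out += line }
        }
        finally stdout.close()
      },
      stderr => {
        // Drain and discard so the child never blocks on a full pipe.
        try Source.fromInputStream(stderr).getLines().foreach(_ => ())
        finally stderr.close()
      }
    )
    // exitValue() blocks until the child process has exited.
    val exitCode = Process(command).run(io).exitValue()
    (exitCode, out.synchronized { out.toList })
  }
}
```

A call such as ClosingExamples.runAndCollect(Seq("echo", "hello")) on a
Unix-like system shows the pattern; in the code from the original
message, the per-line parsing done in processLine would take the place
of the buffer.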