Subject: [scala-user] Process, ProcessBuilder, ProcessLogger: Too many open files

Dear All,

I'm writing some code to read a bunch of file names from an input file, then run a Linux program on each of them in parallel. I am using scala.sys.process.Process to create a ProcessBuilder, and I'm using a custom ProcessLogger to take the output of the Linux program and pluck out the console output that I need. The relevant bits of my code look like this:

    val accInfo = new Array[Float](numConformations)

    final class DsspOutputParser(conformationID: Int) {
      var numLinesRead = 0
      var residueIdx = 0

      @inline def processLine(line: String): Unit = {
        numLinesRead += 1
        if (numLinesRead > 25) {
          val acc = line.substring(34, 38).trim.toInt
          accInfo(conformationID) = acc
          residueIdx += 1
        }
      }
    }

    val tasks = withBufferedReader(new File(config.datasetDir, "conformation_filenames.txt"))(br => {
      Iterator.continually(br.readLine()).takeWhile(_ != null).toList.zipWithIndex.map {
        case (conformationFilename, i) => {
          future {
            println((i+1) + " of " + numConformations)
            val pb = Process(dsspCommand + " " + conformationFilename)
            val outputParser = new DsspOutputParser(i)
            val procLog = ProcessLogger(outputParser.processLine(_))
            pb.!(ProcessLogger(outputParser.processLine(_)))
          }
        }
      }
    })

    tasks.grouped(10).foreach{group => scala.actors.Futures.awaitAll(Long.MaxValue / 2L, group: _*)}

The problem is that after running for a while, the program barfs with the following error:

    <function0>: caught java.io.IOException: Cannot run program "dssp-2-linux-amd64": java.io.IOException: error=24, Too many open files
    java.io.IOException: Cannot run program "dssp-2-linux-amd64": java.io.IOException: error=24, Too many open files
        at java.lang.ProcessBuilder.start(ProcessBuilder.java:460)
        at scala.sys.process.ProcessBuilderImpl$Simple.run(ProcessBuilderImpl.scala:68)
        at scala.sys.process.ProcessBuilderImpl$AbstractBuilder.run(ProcessBuilderImpl.scala:99)
        at scala.sys.process.ProcessBuilderImpl$AbstractBuilder$$anonfun$runBuffered$1.apply(ProcessBuilderImpl.scala:147)
        at scala.sys.process.ProcessBuilderImpl$AbstractBuilder$$anonfun$runBuffered$1.apply(ProcessBuilderImpl.scala:147)
        at scala.sys.process.ProcessLogger$$anon$1.buffer(ProcessLogger.scala:64)
        at scala.sys.process.ProcessBuilderImpl$AbstractBuilder.runBuffered(ProcessBuilderImpl.scala:147)
        at scala.sys.process.ProcessBuilderImpl$AbstractBuilder.$bang(ProcessBuilderImpl.scala:113)
        [et cetera]

So it looks like something might not be cleaning up after itself, or the garbage collector might not be aggressive enough (i.e. it looks like the process streams aren't getting properly closed). The output of lsof lists a bunch of lines that look like this:

    java 23555 harveywi 7693w FIFO 0,8 0t0 866166108 pipe

I have seen a similar problem before, a few years back (using scala.io.Source.fromFile(_) in small parallel batches), and the solution was to invoke System.gc() periodically to make sure that the offending streams were closed and cleaned up. However, that workaround doesn't seem to be effective anymore.

I am using Sun Java 1.6.0_29, and the VM arguments that I am using are "-Xmx1G -server". I am using the default garbage collector, which might be the problem.

Am I doing something really silly? If it is up to me to close the streams manually, how do I do that? I poked through the Scala standard library source and didn't see anything obvious. If it's not my responsibility to close the streams, and the Scala standard library is properly closing them, do you have any good ideas on how I might be able to get around this issue?

Thank you!

-William Harvey
http://www.cse.ohio-state.edu/~harveywi