Wednesday, April 6, 2011

Scala IO versus Guava: The Basics

A friend of mine once said that everything in life was about search and sort. Thinking about it for a while, it seems he's right. Almost. The rest is about IO.

IO in Java

The question is how you do IO. Long, long ago, probably before Java 1.2, Java's IO classes were sketchy, to say the least. Later versions solved some of that (introducing Readers and Writers), and eventually, with Java 1.4, we got Java NIO. If all goes well, we will have the new NIO (NIO.2, planned for Java 7) soon.

IO in external libraries

Nevertheless, in many cases, people still rely on external libraries to make their lives a little easier. Commons IO has been a popular choice for some time, and at some point, Guava also added some IO abstractions to its libraries.

IO in Scala

It makes you wonder about Scala's IO classes. At first, it doesn't look too good. The 'scala.io' package has a Source class that eases reading files, doing some automatic resource management. That's good. But then it turns out the abstraction returned is an Iterator. And you don't want to have an Iterator traversing the contents of your file. If it bails out, then you are left with an open file handle, leaving your file open for the rest of the existence of your VM instance. In fact, if you search StackOverflow, you will quickly find many complaints about scala.io being broken, or about scala.io still being broken.

Scala's New IO

But there might be hope out there. There is a Scala library that seems to address some of the concerns normally addressed by the libraries I mentioned, including decent support for automatic resource management. The name of the library: scala-io. I know. It might be a good idea to change the name.

What does it give you?

Scala IO is, first of all, built on top of scala-arm, the library providing the foundation for automatic resource management. On top of that, it gives you quite a bit of goodness for reading and writing bytes and text. In this post, I will go over some of its features, comparing each to how it's done in Guava:

Copying an InputStream into a Byte Array

This is how it's done in Guava:

InputStream in = ...;
byte[] buffer = ByteStreams.toByteArray(in);

And this is the same thing, done in Scala IO:

val in: InputStream = ...
Resource.fromInputStream(in).byteArray

Similar, but there is a big difference. In the first case, the stream is not closed. In the second case, it is.
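The "not closed" half of that difference is easy to observe in plain Java. The sketch below uses only the JDK, not Guava: `readAll` is a hypothetical stand-in for a toByteArray-style helper (it is not Guava's implementation), and `CloseTracking` is a hypothetical wrapper that merely records whether close() was called. It assumes Java 7 or later for try-with-resources.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

final class DrainDemo {
    // Hypothetical wrapper that remembers whether close() was ever called.
    static final class CloseTracking extends FilterInputStream {
        private boolean closed = false;

        CloseTracking(InputStream in) {
            super(in);
        }

        boolean isClosed() {
            return closed;
        }

        @Override
        public void close() throws IOException {
            closed = true;
            super.close();
        }
    }

    // Hypothetical helper: reads until end of stream and never closes the stream.
    static byte[] readAll(InputStream in) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] chunk = new byte[4096];
        int n;
        while ((n = in.read(chunk)) != -1) {
            out.write(chunk, 0, n);
        }
        return out.toByteArray();
    }

    // Drains the stream and reports whether anybody closed it.
    static boolean closedAfterPlainDrain() {
        CloseTracking in = new CloseTracking(new ByteArrayInputStream(new byte[] {1, 2, 3}));
        try {
            readAll(in);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return in.isClosed();
    }

    // Same drain, but the caller takes responsibility for closing.
    static boolean closedAfterManagedDrain() {
        CloseTracking in = new CloseTracking(new ByteArrayInputStream(new byte[] {1, 2, 3}));
        try (InputStream managed = in) {
            readAll(managed);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return in.isClosed();
    }

    public static void main(String[] args) {
        System.out.println("closed after plain drain:   " + closedAfterPlainDrain());
        System.out.println("closed after managed drain: " + closedAfterManagedDrain());
    }
}
```

In other words, a helper that only drains the stream leaves the closing to the caller, which is exactly the bookkeeping the Scala IO Resource takes over.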

InputSuppliers

Guava has an abstraction that allows you to pass around an object providing access to an InputStream. The InputStream itself is not opened yet, but it will get opened once you ask the object to give you the input. The good thing about it is that the code that opens the stream can also be responsible for closing it, without having to know how the stream got opened:

public interface InputSupplier<T> {
    T getInput() throws IOException;
}

In a way, a Scala IO Resource is an InputSupplier and/or an OutputSupplier. However, there is no need to implement an interface to defer the construction of the actual underlying object providing or accepting bytes. Instead, you just pass in a block of code that will get evaluated right before you are about to access or write your bytes, leveraging Scala's by-name parameters.

So you could do something like this:

Resource.fromInputStream(new FileInputStream(...))

...without the file already getting opened. As a consequence you can access the Resource multiple times without running into trouble. The FileInputStream will be closed after you have acted on it, but you can still 'reopen' it afterwards.
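The Java side of this idea, deferred opening with the consumer doing the closing, can be sketched without Guava. The interface below has the same shape as the InputSupplier shown earlier but is redeclared locally so the example compiles on its own; `countBytes` and `countTwice` are hypothetical names, and the lambda assumes Java 8 or later.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

final class SupplierDemo {
    // Same shape as the InputSupplier interface, redeclared for a self-contained sketch.
    interface InputSupplier<T> {
        T getInput() throws IOException;
    }

    // Opens a fresh stream from the supplier, counts its bytes, and closes it again.
    static int countBytes(InputSupplier<? extends InputStream> supplier) throws IOException {
        try (InputStream in = supplier.getInput()) {
            int count = 0;
            while (in.read() != -1) {
                count++;
            }
            return count;
        }
    }

    // Uses one supplier twice; returns {first count, second count, times opened}.
    static int[] countTwice() {
        final byte[] data = {10, 20, 30};
        final int[] opened = {0};
        // Nothing is opened here; the stream is only created when getInput() is called.
        InputSupplier<InputStream> supplier = () -> {
            opened[0]++;
            return new ByteArrayInputStream(data);
        };
        try {
            int first = countBytes(supplier);
            int second = countBytes(supplier);
            return new int[] {first, second, opened[0]};
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        int[] result = countTwice();
        System.out.println("first=" + result[0] + " second=" + result[1] + " opened=" + result[2]);
    }
}
```

Because every call asks the supplier for a new stream, the second read sees the full content again, which mirrors the "reopen" behaviour described for the Resource.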

Filling a byte array

In some cases, all you want to do is fill an existing byte array. In Guava, this is how you would do it:

InputStream in = null;
try {
    in = ...
    byte[] buffer = new byte[100];
    ByteStreams.readFully(in, buffer);
} finally {
    Closeables.closeQuietly(in);
}

In Scala IO, it's quite a bit easier:

val in: InputStream = ...
val buffer = new Array[Byte](100)
Resource.fromInputStream(in).bytes.copyToArray(buffer)

Note the absence of a try finally block. First a Resource is getting created, then we obtain a bytes view on that object, and then we use Traversable's copyToArray method to copy the data into the array.
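For readers who want to try the "fill an existing array" step without either library, here is a minimal JDK-only sketch. It swaps in `DataInputStream.readFully` for the Guava call and uses try-with-resources (Java 7 or later) instead of the explicit finally block; `fill` is a hypothetical helper name, and the in-memory source stands in for a real stream.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Arrays;

final class FillDemo {
    // Fills a buffer of the given size from an in-memory stream and closes the stream afterwards.
    static byte[] fill(byte[] source, int size) {
        byte[] buffer = new byte[size];
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(source))) {
            // readFully throws EOFException if the stream ends before the buffer is full.
            in.readFully(buffer);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer;
    }

    public static void main(String[] args) {
        byte[] filled = fill(new byte[] {1, 2, 3, 4, 5}, 3);
        System.out.println(Arrays.toString(filled));
    }
}
```

Note that readFully insists on filling the whole buffer, so a source shorter than the buffer is reported as an error rather than silently leaving the tail untouched.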

Copy InputStream to OutputStream

This is how you do it in Java using Guava:

InputStream in = ...;
OutputStream out = ...;
try {
    ByteStreams.copy(in, out);
} finally {
    Closeables.closeQuietly(in);
    Closeables.closeQuietly(out);
}

This is the same thing done in Scala IO:

val in: InputStream = ...
val out: OutputStream = ...
Resource.fromInputStream(in).copyData(Resource.fromOutputStream(out))

Seems rather verbose. And as a matter of fact, it doesn't need to be this way. If you import a number of implicits, then the above could be expressed like this as well:

in.asInput.copyData(out.asOutput)

There are implicits that let you call asInput on an InputStream, giving you an Input object with the copyData operation, and similar implicit conversions that let you call asOutput on an OutputStream.
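What a copy-then-close operation amounts to can be sketched in plain Java. This is not how Guava or Scala IO implement it; `copy` and `copyAndClose` are hypothetical helpers that use only the JDK, with in-memory streams standing in for real ones and try-with-resources (Java 7 or later) closing both ends.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.util.Arrays;

final class CopyDemo {
    // Hypothetical helper: copies everything from in to out, returns the byte count, closes neither.
    static long copy(InputStream in, OutputStream out) throws IOException {
        byte[] chunk = new byte[4096];
        long total = 0;
        int n;
        while ((n = in.read(chunk)) != -1) {
            out.write(chunk, 0, n);
            total += n;
        }
        return total;
    }

    // Copies the data through a pair of streams and closes both, even if the copy fails.
    static byte[] copyAndClose(byte[] data) {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        try (InputStream in = new ByteArrayInputStream(data); OutputStream out = sink) {
            copy(in, out);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sink.toByteArray();
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(copyAndClose(new byte[] {1, 2, 3})));
    }
}
```

Resources declared in a single try-with-resources statement are closed in reverse order, so the output is closed before the input.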

Reading a String

This is how it's done in Guava:

InputStream in = ...;
String content = null;
try {
    content = CharStreams.toString(new InputStreamReader(in, "UTF-8"));
} finally {
    Closeables.closeQuietly(in);
}

... and this is the same thing, done in Scala IO:

val in: InputStream = ...
val content = Resource.fromInputStream(in).slurpString(Codec.UTF8)

or, alternatively:

val in: InputStream = ...
val content = in.asInput.slurpString(Codec.UTF8)
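In all of these variants the charset is stated explicitly, which matters as soon as the content is not plain ASCII. A minimal JDK-only sketch of the same "read everything as a UTF-8 string" step, with a hypothetical `slurpUtf8` helper and an in-memory stream as the assumed input:

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

final class SlurpDemo {
    // Decodes the bytes as UTF-8, reading through a Reader that is closed when done.
    static String slurpUtf8(byte[] data) {
        StringBuilder text = new StringBuilder();
        try (Reader reader = new InputStreamReader(
                new ByteArrayInputStream(data), StandardCharsets.UTF_8)) {
            char[] chunk = new char[1024];
            int n;
            while ((n = reader.read(chunk)) != -1) {
                text.append(chunk, 0, n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return text.toString();
    }

    public static void main(String[] args) {
        // "h\u00e9llo" contains a non-ASCII character that takes two bytes in UTF-8.
        byte[] encoded = "h\u00e9llo".getBytes(StandardCharsets.UTF_8);
        System.out.println(encoded.length + " bytes -> " + slurpUtf8(encoded).length() + " chars");
    }
}
```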

1 comment:

Jesse Eichar said...

Nice post. A couple of points that you might find interesting.

- I have spent the last few weeks of vacation working on Scala-io and almost have a new version ready for release (0.2.0). This version contains more handy methods like compareContents, containsSlice, etc... In addition it has added options for performing several operations on a single open stream, which the current version does not yet support.
- A handy way to reduce the verbosity of your examples would be to import scalax.io.Resource._ so that you don't have to prefix fromInputStream with Resource.
- The scala-io file API is handy for accessing filesystems, allowing you to do things like: "." ** "*.java" to select all java files in a file subtree.