Wednesday, October 28, 2009

Preon on JRockit

After my Preon talk earlier today, Alex Buckley warned me that reflection is not going to guarantee that the fields will always be returned in the same order. Preon currently relies on the fields to be returned in the order in which they were defined, to some extent.

That is, if this is the data structure you defined:
public class Image {
@Bound int width;
@Bound int height;
}
Then Preon will expect that the reflection API will also return the fields in this order.

I think I sort of knew this in the back of my head, but it never failed any of the tests, so I stopped worrying about it. So, the question is, should I start worrying now? It's clearly undesirable to depend on some coincidental properties of a Java VM, but are there actually VMs that will return a different field ordering? With IBM's VM being based on Sun's VM, it's unlikely they will differ. So that leaves us with JRockit.

In the end, I figured it would be wise to have an automated test that at least guarantees that the current setup also works on JRockit. It doesn't solve the problem, but it does provide some degree of assurance that the problem will usually not manifest itself. And it turns out, the test succeeds.
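One way to stop depending on reflection order altogether would be to state the order explicitly. The sketch below is not how Preon works; the `@Order` annotation and the `FieldOrderSketch` class are hypothetical names made up for illustration. It compares the order returned by the reflection API with an order derived from the annotation values.

```java
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.reflect.Field;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

class FieldOrderSketch {

    // Hypothetical annotation (not part of Preon) that states the intended position.
    @Retention(RetentionPolicy.RUNTIME)
    @interface Order {
        int value();
    }

    static class Image {
        @Order(1) int width;
        @Order(2) int height;
    }

    // Field names in whatever order this VM's reflection API returns them.
    static List<String> reflectionOrder(Class<?> type) {
        List<String> names = new ArrayList<>();
        for (Field field : type.getDeclaredFields()) {
            if (field.isAnnotationPresent(Order.class)) {
                names.add(field.getName());
            }
        }
        return names;
    }

    // Field names sorted by the explicit @Order value, independent of the VM.
    static List<String> explicitOrder(Class<?> type) {
        List<Field> fields = new ArrayList<>();
        for (Field field : type.getDeclaredFields()) {
            if (field.isAnnotationPresent(Order.class)) {
                fields.add(field);
            }
        }
        fields.sort(Comparator.comparingInt((Field f) -> f.getAnnotation(Order.class).value()));
        List<String> names = new ArrayList<>();
        for (Field field : fields) {
            names.add(field.getName());
        }
        return names;
    }

    public static void main(String[] args) {
        System.out.println("reflection order: " + reflectionOrder(Image.class));
        System.out.println("explicit order:   " + explicitOrder(Image.class));
    }
}
```

The price is extra noise on every bound field, which is exactly what relying on declaration order avoids.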

Friday, October 23, 2009

JUnit 4.7 @Rules!!!

It took me a while before I understood what this @Rule business in JUnit 4.7 is really about. I like it!

Last week, I had to make sure that I could test a RESTful web service client. So, all I really wanted was to make sure that a web server would set up a temporary resource, and always return a particular response. However, since the test would be executed on a continuous integration service, there was - as always - the risk that while setting up a temporary web server, there would be port conflicts with other tests running simultaneously on that same host.

So this is what I ended up doing:

Creating a Web Server

Well, not really a full web server, but rather a wrapper around Jetty, which I called WebServer. The trick is that this WebServer implements MethodRule. If your test class has a public field pointing to an object implementing the MethodRule interface, and you mark that field to be processed as a rule by adding the @Rule annotation, then JUnit will call back on your instance for every test it runs, allowing you to add behavior around your test execution.
interface MethodRule {
    Statement apply(final Statement base, FrameworkMethod method, Object target);
}
My WebServer class implements this method by starting and stopping Jetty. This is what it says in my test class:
@Rule
public WebServer server =
new WebServer("WEBSERVER_PORT", 9191);
Notice that my WebServer constructor contains two parameters. The first parameter contains the name of an environment variable that might exist. If it does exist, then the value of that environment variable will be used as the port number for the web server. The second parameter is the default port number, to be used in case the port number is not set by the environment variable.
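The start-and-stop wrapping can be illustrated without JUnit or Jetty on the classpath. The sketch below uses stand-in types that are purely hypothetical: `Step` plays the role of JUnit's Statement, and `FakeServer` plays the role of the WebServer, recording lifecycle events instead of starting Jetty. It only shows the pattern: return a new step that starts the server, runs the original step, and always stops the server afterwards.

```java
import java.util.ArrayList;
import java.util.List;

class RuleWrappingSketch {

    // Stand-in for JUnit's Statement: a unit of work that may fail.
    interface Step {
        void evaluate() throws Throwable;
    }

    // Stand-in for the WebServer rule; it only records lifecycle events.
    static class FakeServer {
        final List<String> events = new ArrayList<>();

        void start() {
            events.add("start");
        }

        void stop() {
            events.add("stop");
        }

        // Same idea as a rule's apply method: wrap the original step in a new one.
        Step apply(final Step base) {
            return new Step() {
                @Override
                public void evaluate() throws Throwable {
                    start();
                    try {
                        base.evaluate();
                    } finally {
                        stop(); // runs even when the wrapped test fails
                    }
                }
            };
        }
    }

    // Runs one fake test through the wrapper and returns the recorded events.
    static List<String> runOnce() {
        final FakeServer server = new FakeServer();
        Step test = new Step() {
            @Override
            public void evaluate() {
                server.events.add("test");
            }
        };
        try {
            server.apply(test).evaluate();
        } catch (Throwable t) {
            throw new RuntimeException(t);
        }
        return server.events;
    }

    public static void main(String[] args) {
        System.out.println(runOnce()); // prints [start, test, stop]
    }
}
```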

Environment Variables?

The reason for having the ability to pass in an environment variable here is that this allows Hudson to make sure that you don't have any port conflicts. Now, obviously, with all of this I don't know in advance which port number is going to be used. That's why my Web Server also implements a method getURL() which will return a String representation of the resource that we are going to hit. I don't have to keep track of that port number. The WebServer will just tell me which port it's using.
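A minimal sketch of that port lookup, assuming the behavior described above: use the environment variable when it is set, otherwise fall back to the default. The class name `PortSketch`, the fallback on malformed values and the `http://localhost:<port>/` URL shape are assumptions for illustration, not the actual WebServer code.

```java
import java.util.Map;

class PortSketch {

    // Returns the port from the named variable, or the default if it is absent or malformed.
    static int resolvePort(Map<String, String> env, String variableName, int defaultPort) {
        String value = env.get(variableName);
        if (value == null || value.trim().isEmpty()) {
            return defaultPort;
        }
        try {
            return Integer.parseInt(value.trim());
        } catch (NumberFormatException e) {
            return defaultPort; // assumption: a malformed value falls back to the default
        }
    }

    // Assumed URL shape; the test never needs to know the port itself.
    static String url(int port) {
        return "http://localhost:" + port + "/";
    }

    public static void main(String[] args) {
        int port = resolvePort(System.getenv(), "WEBSERVER_PORT", 9191);
        System.out.println(url(port));
    }
}
```

Passing the environment in as a Map keeps the lookup testable without touching the real process environment.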

Which Resource?

Now, I can already hear the next question coming: how would the WebServer know which resource it needs to serve? That's a good question. With all of this working, you would prefer not having to worry about this from your test. (I mean, you could obviously further configure the Web Server from your test, but that would be kind of awkward, since it's already running.)

Annotations to the rescue

The answer turns out to be easier than you might expect: you just use another annotation. Now, this is not something that @Rule is dictating or anything, but I feel this is going to be an emerging pattern. I basically tell the WebServer which resource to serve by having an annotation on my test method telling it.

So my test looks like this:
@Test
@WebResource(content="classpath:whatever.xml", contentType="text/xml")
public void shouldBeAbleToDealWithWhatever() {
String url = server.getURL();
// Download something from that URL using your client
}
Notice the classpath: prefix here. This is a little something borrowed from Spring. This will force the WebServer to locate the resource on the classpath. If you happen to run your tests from Maven, then this is an excellent way of letting the Web Server know that it should look into src/test/resources.
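How a rule might pick up such an annotation can be sketched with plain reflection. The annotation definition below is an assumed shape with just the two attributes used above; `WebResourceSketch`, `describe` and the treatment of content without the prefix as inline content are hypothetical. In a real rule, the method would be the one JUnit hands to the rule for each test.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Method;

class WebResourceSketch {

    // Assumed shape of the annotation; it must be retained at runtime to be readable.
    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.METHOD)
    @interface WebResource {
        String content();
        String contentType();
    }

    static class ExampleTest {
        @WebResource(content = "classpath:whatever.xml", contentType = "text/xml")
        public void shouldBeAbleToDealWithWhatever() {
        }
    }

    // Reads the annotation from the test method and decides where the content comes from.
    static String describe(Method testMethod) {
        WebResource resource = testMethod.getAnnotation(WebResource.class);
        if (resource == null) {
            return "nothing to serve";
        }
        String prefix = "classpath:";
        String content = resource.content();
        String source = content.startsWith(prefix)
                ? "classpath resource " + content.substring(prefix.length())
                : "inline content"; // assumption: no prefix means literal content
        return source + " served as " + resource.contentType();
    }

    static String describeExample() {
        try {
            return describe(ExampleTest.class.getDeclaredMethod("shouldBeAbleToDealWithWhatever"));
        } catch (NoSuchMethodException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(describeExample());
    }
}
```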

Conclusions

@Rule rules. It works beautifully. And using annotations on your test methods to parameterize the execution of your rules makes it even more useful.

Saturday, October 17, 2009

Xeger has arrived!!!

Last Friday, I quickly tried to generate some test XML samples out of an industry standard XML Schema, using xmlgen. And it failed. Even though xmlgen is a great tool, it's currently not capable of accommodating every type definition encountered in the schema. More specifically, it fails to support schema definitions that include restrictions based on regular expressions, like these:
Tijdnotatie als hh:mm. ("Time notation as hh:mm.")
In order to support schema definitions like these, xmlgen would basically have to be able to reverse the regular expressions, and generate text snippets that are valid according to the expression.

When I started to look around, to see if there was something out there capable of doing that, I couldn't find anything like it in Java. Perl and Ruby had some support for it, but that's it. I asked around on Stack Overflow, but no solution showed up.

It made me wonder if I could roll my own. More than 10 years ago, I did something similar when creating JavaScript validators for zip codes, phone numbers, etc. Back then, I basically constructed my own finite state machine to validate patterns (JavaScript didn't have regex support yet). The finite state machines also allowed me to generate valid samples, by simply randomly walking the state transitions.

So the question was if there was anything out there creating state machines from regular expressions. And then somebody pointed me to this project. It was exactly what I needed. Creating something capable of generating text out of these state transition definitions was a breeze. Xeger was born. (Xeger = opposite of Regex)

So this is how it works. You pass in a regex, and Xeger will generate random text matching this regular expression.
String regex = "[ab]{4,6}c";
Xeger generator = new Xeger(regex);
String result = generator.generate();
assert result.matches(regex);
In the example above, Xeger will generate Strings such as "aababc", "bbbbbbc", etc.
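The random-walk idea itself can be sketched without any library. The automaton below is built by hand for the expression [ab]{4,6}c and is not Xeger's implementation; `RandomWalkSketch` and `Transition` are hypothetical names. State i means "i letters read so far", and state 7 is the only accepting state.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

class RandomWalkSketch {

    static final class Transition {
        final char symbol;
        final int target;

        Transition(char symbol, int target) {
            this.symbol = symbol;
            this.target = target;
        }
    }

    // Hand-built automaton for [ab]{4,6}c; states 0..6 count letters, state 7 accepts.
    static List<List<Transition>> buildAutomaton() {
        List<List<Transition>> states = new ArrayList<>();
        for (int i = 0; i <= 7; i++) {
            states.add(new ArrayList<Transition>());
        }
        for (int i = 0; i < 6; i++) {
            states.get(i).add(new Transition('a', i + 1));
            states.get(i).add(new Transition('b', i + 1));
        }
        for (int i = 4; i <= 6; i++) {
            states.get(i).add(new Transition('c', 7)); // 'c' is allowed after 4 to 6 letters
        }
        return states;
    }

    // Walks random transitions until it reaches a state without outgoing transitions.
    // In this automaton that is always the accepting state; a general automaton would
    // also need to decide when to stop in accepting states that have outgoing edges.
    static String generate(Random random) {
        List<List<Transition>> states = buildAutomaton();
        StringBuilder text = new StringBuilder();
        int current = 0;
        while (!states.get(current).isEmpty()) {
            List<Transition> options = states.get(current);
            Transition chosen = options.get(random.nextInt(options.size()));
            text.append(chosen.symbol);
            current = chosen.target;
        }
        return text.toString();
    }

    public static void main(String[] args) {
        String sample = generate(new Random());
        System.out.println(sample + " matches: " + sample.matches("[ab]{4,6}c"));
    }
}
```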

Next stop will be to complete my updated version of xmlgen. You can find Xeger here.

Thursday, October 15, 2009

Maven Archetype for Simple Preon Project

It's there. Just go to http://preon.flotsam.nl/getting-started.html and follow the instructions. This is by far the easiest way to get started.

Tuesday, October 13, 2009

Preon Encoding Roadmap

In my previous post, I highlighted some of the challenges and questions to be answered before Preon can be made to support encoding as well. The main problem is not so much that it is hard to understand how the data should be encoded. That's all pretty clear. The meta data gathered by Preon provides sufficient detail to make that fairly easy to do. No, the real problem is in preserving consistency.

If you really try to imagine the ultimate goal, then it's immediately clear that it will be close to impossible to get there in one step. So what do you do? You break things up into phases. That's what I tried to do tonight:



The picture above depicts the different stages for getting closer to where I want Preon to be. Read it from the bottom up.

Phase 1: Writing all data to a stream

Currently, the Codec only defines a decode operation. We clearly need an encode operation as well. The encode operation is going to take the object passed in and write it to the output channel. Period. By adding this operation, you will be able to encode data; however, when you actually make changes to the data, you're basically on your own.

Here are some of the complications that could occur if you change the data:
  1. In case Preon loaded an instance of its own LazyLoadingList, then trying to modify that list is going to throw exceptions.
  2. If you replace the value of an attribute that is used in Limbo expressions, there's not only a chance that you write corrupted data, but there's also a chance that you will not even be able to continue to read the data any longer. Remember, Preon is decoding lazily. It might not even have read the data that you are about to write. By changing the attributes that Preon uses in calculating the starting point of a section of data to read, you might get in trouble.
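What a symmetric decode/encode pair looks like can be sketched with the standard java.io streams. The `SimpleCodec` interface below is a hypothetical, much simplified shape and not Preon's Codec; it assumes 32-bit big-endian integers, which is what DataInput and DataOutput use. Decoding and then encoding an untouched object should reproduce the original bytes.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Arrays;

class RoundTripSketch {

    static class Image {
        int width;
        int height;
    }

    // Hypothetical codec shape: decode and encode mirror each other field by field.
    interface SimpleCodec<T> {
        T decode(DataInput in) throws IOException;

        void encode(T value, DataOutput out) throws IOException;
    }

    static final SimpleCodec<Image> IMAGE_CODEC = new SimpleCodec<Image>() {
        @Override
        public Image decode(DataInput in) throws IOException {
            Image image = new Image();
            image.width = in.readInt();
            image.height = in.readInt();
            return image;
        }

        @Override
        public void encode(Image image, DataOutput out) throws IOException {
            out.writeInt(image.width);  // same order as in decode
            out.writeInt(image.height);
        }
    };

    // Decodes the bytes into an Image and encodes that Image again.
    static byte[] roundTrip(byte[] input) {
        try {
            Image image = IMAGE_CODEC.decode(new DataInputStream(new ByteArrayInputStream(input)));
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            IMAGE_CODEC.encode(image, new DataOutputStream(bytes));
            return bytes.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] input = {0, 0, 1, (byte) 0x90, 0, 0, 0, 1};
        System.out.println(Arrays.equals(input, roundTrip(input)));
    }
}
```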
Phase 2: Binding to public accessors

We need to be able to understand when code outside of Preon is making changes to an object that was loaded using Preon, because if it does, then we can no longer afford to drop a cached version of that instance. However, currently Preon only binds to fields, and not to bean-type accessors. It's going to be close to impossible to track changes to those private fields, but it will be possible to track changes to those fields if Preon binds to the accessor methods rather than the fields. So, we will probably need that feature to be there.

Phase 3: Copy on change

Maybe this is not the right term. In some cases, all we need to do is make sure that we hold on to a cached copy indefinitely, until data is persisted. In other cases, we will need to make sure that we actually replace the entire copy of an existing object with something else.

Phase 4: Consistency checks while writing

As I said, making sure that we preserve consistency over the entire file is going to be one of the biggest challenges. The previous step has made sure that we can actually change the file and write it again, but it's not going to guarantee that whatever is going to be written is consistent. For that, we need something else. This phase is about adding a feature that will check consistency while writing.

(Consequently, data early in the file will always prevail over data later on in the file. If the file first contains an integer denoting the size of the list that follows, and that value does not match the actual size of the list when the list needs to be written, then either the list needs to be truncated or grown, or an exception will have to be thrown. This phase is about making sure an exception is thrown.)
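A minimal sketch of such a check, assuming a format that writes a 32-bit count followed by that many 32-bit values. `ConsistencySketch` and `InconsistentDataException` are hypothetical names, not Preon classes. The count written first is treated as authoritative, and the encoder refuses to continue if the list disagrees with it.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Arrays;
import java.util.List;

class ConsistencySketch {

    // Hypothetical exception signalling that the object graph is no longer consistent.
    static class InconsistentDataException extends RuntimeException {
        InconsistentDataException(String message) {
            super(message);
        }
    }

    // Writes the declared count and then the values, but only if the two agree.
    static byte[] encode(int declaredCount, List<Integer> values) {
        if (declaredCount != values.size()) {
            throw new InconsistentDataException(
                    "count says " + declaredCount + " but list holds " + values.size());
        }
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(bytes);
            out.writeInt(declaredCount);
            for (int value : values) {
                out.writeInt(value);
            }
            return bytes.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(encode(2, Arrays.asList(7, 8)).length + " bytes written");
        try {
            encode(3, Arrays.asList(7, 8));
        } catch (InconsistentDataException e) {
            System.out.println("aborted: " + e.getMessage());
        }
    }
}
```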

Phase 5: Rewrite only if required

In many cases, it's not going to be required to first load data into an object and then write it to output again. If the object encoded did not change, then we can just stream it from the original source. (In this case, the BitBuffer.) This is - hopefully - going to be an optimization that is going to pay off big time.

Phase 6: Autocorrecting

In phase 4, I already said that there will be cases in which you ideally want the list of elements to grow or shrink if the attribute that denotes the number of items in the list is updated. This phase is about considering solutions like these. It will probably be quite hard, if not impossible, but it's worth taking into consideration.

Phase 7, 8, 9, 10, ...

Oh man, if only I had the time.

Sunday, October 11, 2009

Inserting Data Using Preon

I started work on supporting encoding data the other day. There are quite a few complications, and I noticed that just jotting them down is not going to help me or anyone else come up with solutions. First I need a better description of the problem.

So I decided I would add a couple of examples that are slightly more detailed, in order to be able to highlight the problems, and explain the solutions that I'm currently thinking of.

The first example is about a simple image format. (I'm just so grateful we have images. Where would I have been without them? They always seem to serve as the best examples in cases like this.) It only defines a slot for the number of pixels and then defines the color values for those pixels.


The example above illustrates an instance of this model, based on a 400 pixel image. The orange region highlights the first pixel value in the byte stream. It shows the relation between the object instance and the corresponding location in the byte stream.

Now, if the above would have been decoded using Preon, then the code would have probably looked a little like this:

class Image {
@Bound int nrPixels;
@BoundList(size="nrPixels") List<Color> pixels;
}

class Color {
@Bound byte red;
@Bound byte green;
@Bound byte blue;
}

It clearly shows the link between the nrPixels attribute and the pixels attribute. Now, suppose that Preon would allow you to modify the data and then encode the data. What would that look like in this case? The image below is trying to capture it:


So, only inserting a single pixel would already require quite some changes. In fact, it would basically require a rewrite of the entire byte stream.
  • The nrPixels attribute would change from 0x190 to 0x191. (See pink region.)

  • The entire pixel array would have to be shifted to the right in order to allocate space for the new pixel to be inserted.

  • That new pixel would have to be inserted. (See orange region.)
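The three changes can be made concrete on a raw byte array. The sketch assumes a layout of a 4-byte big-endian pixel count followed by 3 bytes per pixel; that layout and the names `InsertPixelSketch`, `blankImage` and `insertPixel` are assumptions for illustration. Nothing can be patched in place: the result is a completely new array.

```java
import java.nio.ByteBuffer;

class InsertPixelSketch {

    static final int HEADER_BYTES = 4; // assumed: 32-bit big-endian pixel count
    static final int PIXEL_BYTES = 3;  // red, green, blue

    // Builds an image of all-black pixels in the assumed layout.
    static byte[] blankImage(int nrPixels) {
        ByteBuffer out = ByteBuffer.allocate(HEADER_BYTES + nrPixels * PIXEL_BYTES);
        out.putInt(nrPixels);
        return out.array();
    }

    // Returns a new byte array with one pixel inserted at the given pixel index.
    static byte[] insertPixel(byte[] original, int index, byte red, byte green, byte blue) {
        int nrPixels = ByteBuffer.wrap(original).getInt();
        if (index < 0 || index > nrPixels) {
            throw new IndexOutOfBoundsException("index: " + index);
        }
        int split = HEADER_BYTES + index * PIXEL_BYTES;
        ByteBuffer out = ByteBuffer.allocate(original.length + PIXEL_BYTES);
        out.putInt(nrPixels + 1);                               // 1. update the count
        out.put(original, HEADER_BYTES, split - HEADER_BYTES);  // pixels before the new one
        out.put(red).put(green).put(blue);                      // 3. the new pixel
        out.put(original, split, original.length - split);      // 2. everything after it shifts
        return out.array();
    }

    public static void main(String[] args) {
        byte[] before = blankImage(400);
        byte[] after = insertPixel(before, 0, (byte) 0xFF, (byte) 0x80, (byte) 0x00);
        System.out.println(before.length + " -> " + after.length + " bytes");
        System.out.println("count is now 0x" + Integer.toHexString(ByteBuffer.wrap(after).getInt()));
    }
}
```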
The question that I'm trying to answer myself is how Preon should deal with this. There are a couple of challenges, given the way Preon works.
  • Even if the List holding the pixel 'array' supported the insert operation, the 'nrPixels' attribute would not automatically be updated when you insert a new pixel.

  • Changing the nrPixels attribute would not automatically allocate a new element in the list of pixels.

  • Given the fact that Preon writes to private fields of objects, it's going to be pretty hard to notice updates to any of these fields at all.
Now, that's quite a few challenges. My first thoughts on this:
  • Since the List pixels attribute will cause Preon to insert an instance of its own List class, it should be possible to intercept changes to that list. If the size of the list is based on an attribute value read upstream, then it should be possible to update that value with every change to the pixels list.

  • However, this would only work in case that attribute value is only defining an encoding/decoding attribute for that specific List. If there are other data elements read downstream that are based on that same attribute value, it would be impossible to update it automatically.

  • In cases like these, Preon could still be able to check the dependencies as a constraint and abort the encoding process if the constraints are violated.
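The first idea, a list that keeps the upstream attribute in sync, could look roughly like this. It is a sketch under the assumption that the count is used by this one list only; `SyncedListSketch` and `CountingList` are hypothetical names and not Preon's own List class.

```java
import java.util.AbstractList;
import java.util.ArrayList;
import java.util.List;

class SyncedListSketch {

    static class Color {
        final byte red;
        final byte green;
        final byte blue;

        Color(int red, int green, int blue) {
            this.red = (byte) red;
            this.green = (byte) green;
            this.blue = (byte) blue;
        }
    }

    static class Image {
        int nrPixels;
        final List<Color> pixels = new CountingList();

        // Inner class, so every structural change can update nrPixels of its Image.
        private class CountingList extends AbstractList<Color> {
            private final List<Color> delegate = new ArrayList<>();

            @Override
            public Color get(int index) {
                return delegate.get(index);
            }

            @Override
            public int size() {
                return delegate.size();
            }

            @Override
            public void add(int index, Color color) {
                delegate.add(index, color);
                nrPixels = delegate.size(); // keep the upstream attribute in sync
            }

            @Override
            public Color remove(int index) {
                Color removed = delegate.remove(index);
                nrPixels = delegate.size();
                return removed;
            }
        }
    }

    public static void main(String[] args) {
        Image image = new Image();
        image.pixels.add(new Color(255, 128, 0));
        image.pixels.add(0, new Color(0, 0, 0));
        System.out.println("nrPixels = " + image.nrPixels);
    }
}
```

Assigning to nrPixels directly would still bypass the list, which is the third challenge mentioned above.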

Saturday, October 10, 2009

Changing Preon Modules

I know this could be painful, but I will go ahead and do it anyway. The last couple of weeks I spent some of my spare time working on a capability for outputting data from Preon. And I planned to support streaming output first. Now, that requires a couple of new abstractions that I first added to a module called preon-bitchannel. But I think I'm going to collapse preon-bitbuffer and preon-bitchannel into one new module: preon-io.

The two abstractions for reading bit'stream' type of content are just so intertwined that it hardly makes sense to have two different modules for it. For instance, one of the capabilities that will turn up in BitBuffer is the capability to 'stream' content directly to an output channel.

Now, why is this important? It's a little hard to explain, but I will give it a go. First of all, you should be aware of the fact that Preon is loading data lazily. That means that - in some cases - rather than loading data aggressively, it will keep a pointer to a position in the BitBuffer, and only start reading from that position if the application is actually accessing that data. Now, if you would be able to update data, but you would leave that specific segment untouched, then encoding would be as simple as writing from the BitBuffer directly to the output channel. And an implementation could do that extremely efficiently if the buffer and channel abstractions are designed to work with each other.
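The idea can be sketched with java.nio standing in for Preon's own abstractions: a ByteBuffer plays the role of the BitBuffer and a WritableByteChannel the role of the output channel. `PassThroughSketch` and `LazySegment` are hypothetical names. An untouched segment is written straight from the source buffer without ever being decoded.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.channels.Channels;
import java.nio.channels.WritableByteChannel;

class PassThroughSketch {

    // Hypothetical lazily loaded segment: it remembers where its bytes live
    // and only copies them out of the source when somebody asks for them.
    static class LazySegment {
        private final ByteBuffer source;
        private final int position;
        private final int length;
        private byte[] loaded;
        private boolean modified;

        LazySegment(ByteBuffer source, int position, int length) {
            this.source = source;
            this.position = position;
            this.length = length;
        }

        byte[] get() {
            if (loaded == null) {
                loaded = new byte[length];
                ByteBuffer view = source.duplicate();
                view.position(position);
                view.get(loaded);
            }
            return loaded.clone();
        }

        void set(byte[] value) {
            loaded = value.clone();
            modified = true;
        }

        // Untouched segments go from the source to the channel without decoding.
        void writeTo(WritableByteChannel channel) throws IOException {
            ByteBuffer toWrite;
            if (modified) {
                toWrite = ByteBuffer.wrap(loaded);
            } else {
                toWrite = source.duplicate();
                toWrite.position(position);
                toWrite.limit(position + length);
            }
            while (toWrite.hasRemaining()) {
                channel.write(toWrite);
            }
        }
    }

    // Convenience helper: collects whatever the segment writes into a byte array.
    static byte[] write(LazySegment segment) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (WritableByteChannel channel = Channels.newChannel(bytes)) {
            segment.writeTo(channel);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    public static void main(String[] args) {
        ByteBuffer source = ByteBuffer.wrap(new byte[] {1, 2, 3, 4, 5, 6});
        LazySegment segment = new LazySegment(source, 2, 3);
        System.out.println(write(segment).length + " bytes streamed without decoding");
    }
}
```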

Anyway, a code example would probably help a lot in this case. Sit tight. Something is coming.