Saturday, February 6, 2010

DocBook, FOP and Fonts

I'm proud to say that - during the six years of employment at the company formerly known as Sun Microsystems - I wrote all my documents in DocBook. Of course there was the occasional warning that we were all expected to use StarOffice, but by making sure the DocBook generated output resembled the printed material produced by HQ, it never turned into a big argument. And since my entire DocBook chain was built from open source, I had to use Apache FOP.

Apache FOP has a long history. For some reason, it seems impossible to ever arrive at a version 1.0. The versions I used at Sun unfortunately never supported the keep-with-next property, which resulted in weird page endings that I then had to fix manually. However, that has been solved in the latest versions, and I think the output generated by the DocBook stylesheets can be quite ok.

That is, if you customize them. And while customizing, you also might want to use other fonts than the standard fonts. However, if you are using Apache FOP, then simply referencing alternative fonts is not going to get you anywhere. You also need to make sure the font metrics can be found. And if you don't have these font metrics files yet, you first have to generate them.

Apache FOP provides some utilities for generating font metrics, but none of them is based on Maven. If you are using the Maven Docbkx Plugin, then you might want something that integrates with Maven. Last weekend I realized that - although the Maven Docbkx Plugin supports something like that - it is probably one of its best kept secrets.

So, this is the way you use it.

Add a plugin

First of all, you need to add a plugin to the plugins section.

<plugin>
  <groupId>com.agilejava.docbkx</groupId>
  <artifactId>docbkx-fop-support</artifactId>
  <executions>
    <execution>
      <phase>pre-site</phase>
      <goals>
        <goal>generate</goal>
      </goals>
      <configuration>
        <ansi>true</ansi>
        <sourceDirectory>...</sourceDirectory>
      </configuration>
    </execution>
  </executions>
</plugin>


The sourceDirectory parameter is pointing to a directory containing the .ttf files. To be perfectly honest, I don't remember exactly what the ansi parameter was about, but AFAIR, it had something to do with being able to search the documents. (Huge disclaimer: it may be about something completely different. Feel free to plough through the FOP documentation to understand what it is about.)

As I said, the font metrics files are required to get something done. However, just generating the files is not going to make a huge difference. In order to reap the benefits of these metrics files, you will need to tell the Maven Docbkx Plugin what to do with them.

The first thing you should do is add some font metadata to the Maven Docbkx Plugin's configuration:


<fonts>
  <font>
    <name>Calibri</name>
    <style>normal</style>
    <weight>normal</weight>
    <embedFile>${fonts.dir}/calibri.ttf</embedFile>
    <metricsFile>${basedir}/target/fonts/calibri-metrics.xml</metricsFile>
  </font>
</fonts>


This should be enough to make the plugin aware of the existence of this font, and to give it a way to resolve the font's name to the required metadata. With this in place, making the plugin use this font is pretty easy. You can use the font's name in all places where the stylesheets refer to a font name. So the configuration below will change the body font to Calibri.


<bodyFontFamily>Calibri</bodyFontFamily>
Thursday, January 21, 2010

OOPSLA slides uploaded

Monday, January 11, 2010

The Size of the Slice

During the last couple of months, I have been going back and forth on how to deal with slicing while encoding. When it comes to clearing your mind, nothing helps like explaining it to others (well, perhaps there are some things that help even better), so here goes nothing:

Slicing

First question: what is slicing anyway? In Preon, slicing is basically creating a partial view on a BitBuffer. If you happen to have a BitBuffer with 512 bits, and you slice it, what you get is a new BitBuffer. If your current position is on bit 9, and you create a slice of 16 bits (hardly realistic, but it's just easier to work with smaller numbers), then you get a BitBuffer that contains 16 bits. On position 0, it will have the value of position 9 in the original BitBuffer, and so on.
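To make the idea of a partial view a bit more concrete, here is a minimal sketch. Note that SimpleBitBuffer is a hypothetical stand-in and not Preon's actual BitBuffer API, and that having slice() move the original buffer past the sliced bits is an assumption of this sketch.

```java
import java.util.BitSet;

/** Hypothetical stand-in for a bit buffer; this is NOT Preon's BitBuffer API. */
final class SimpleBitBuffer {
    private final BitSet bits;  // backing bits, shared between a buffer and its slices
    private final int offset;   // where this view starts in the backing bits
    private final int size;     // number of bits visible through this view
    private int position;       // current read position, relative to offset

    SimpleBitBuffer(BitSet bits, int size) {
        this(bits, 0, size);
    }

    private SimpleBitBuffer(BitSet bits, int offset, int size) {
        this.bits = bits;
        this.offset = offset;
        this.size = size;
    }

    /** Moves the read position within this view. */
    void seek(int newPosition) {
        if (newPosition < 0 || newPosition > size) {
            throw new IndexOutOfBoundsException("position " + newPosition);
        }
        position = newPosition;
    }

    /** Returns a view on the next 'length' bits; assumption: the original moves past them. */
    SimpleBitBuffer slice(int length) {
        if (length < 0 || position + length > size) {
            throw new IllegalArgumentException("slice exceeds buffer");
        }
        SimpleBitBuffer view = new SimpleBitBuffer(bits, offset + position, length);
        position += length;
        return view;
    }

    /** Reads one bit and advances; fails at the end of this view, not of the backing bits. */
    boolean readBit() {
        if (position >= size) {
            throw new IndexOutOfBoundsException("end of buffer");
        }
        return bits.get(offset + position++);
    }

    int size() {
        return size;
    }
}
```

With a 512-bit buffer positioned on bit 9, slice(16) gives a 16-bit view whose position 0 maps to bit 9 of the original.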

Use of Slices

Slices are useful in situations in which 1) you have a collection of things and 2) you can't predict how many items it has, but you can predict how much space it occupies in total. If you first slice the BitBuffer, you can just keep reading until you reach the end of the slice. In reality, the underlying BitBuffer can contain way more data, but you can never read beyond the end of the slice.
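The "keep reading until the slice runs out" idea can be sketched with the JDK's java.nio.ByteBuffer as a byte-level analogue. This is not Preon code, and the item format (a one-byte length followed by that many ASCII bytes) is made up for the example.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

/** Byte-level analogue of reading from a slice; the item format is hypothetical. */
final class SliceReading {

    /**
     * Reads items from the next sliceLength bytes of source. The number of
     * items is not stored anywhere; it follows from the size of the slice.
     */
    static List<String> readItems(ByteBuffer source, int sliceLength) {
        ByteBuffer slice = source.slice(); // view starting at source's current position
        slice.limit(sliceLength);          // nothing beyond this can be read through the view

        List<String> items = new ArrayList<>();
        while (slice.hasRemaining()) {
            int length = slice.get() & 0xFF;  // one-byte item length
            byte[] text = new byte[length];
            slice.get(text);                  // throws if the item would run past the slice
            items.add(new String(text, StandardCharsets.US_ASCII));
        }

        // Move the original buffer past the slice; the rest of its data is untouched.
        source.position(source.position() + sliceLength);
        return items;
    }
}
```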

The Size of the Slice

There are essentially two different ways to define the size of the slice: the size of the slice could either be a static value, or it can be defined in terms of data read upstream. As an example of the second case: there might be a situation in which the size of the slice is defined by a value encoded as an int right in front of the start of the actual slice.

Ways to Define a Slice

Out of the box, Preon provides two different ways to define a slice. There's a '@Slice' annotation, and a '@LengthPrefix' annotation. The second annotation existed before the first one, and is not as powerful as the first one. So while I was thinking about all of this, I did consider dropping @LengthPrefix entirely.

So how does it work?

The @Slice annotation is defined like this: @Slice(size="..."). In this case the size is a Limbo expression, so it can either be a static value or an expression that incorporates data read upstream.

The @LengthPrefix annotation is defined like this: @LengthPrefix(size="...", endian={ByteOrder}). In this case, the size is not the size of the actual slice, but the size of a value represented as an integer that holds the size of the slice. (See the example below.)


So, you could say that with @Slice you can do everything that can be done with @LengthPrefix and a little more, but not the other way around. So far, there doesn't seem to be a reason to keep @LengthPrefix any longer; it seems we can just deprecate it. But there is a little more to that. Read on.

Encoding

Encoding a slice is in a way more complicated than decoding it. Suppose that the slice is defined to have a fixed size. In that case, the size of the data to be encoded within that slice should not be bigger than the size of the slice itself.

In case the size of the slice is a function of data read upstream, then it becomes even more complicated. If the data within the slice requires a certain amount of bits, then that function needs to result in that number of bits. Consequently, the variables used in that function need to have proper values, but since those variables are read from bits upstream, those bits need to have the proper value as well. So, the space required by downstream data requires upstream bits to have a certain value.

It's important to note that this is a specific case of a challenge that has been reported before, in a previous post. The @LengthPrefix annotation sheds a different light on it though.

The following two snippets are essentially the same:
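// Example 1: the length is a regular field, visible to the client.
// (The field name 'data' is only there to make the declaration complete.)
@BoundNumber(size="8") int size;
@Slice(size="size") @BoundList byte[] data;

// Example 2: the 8-bit length is read from the BitBuffer, but never exposed.
@LengthPrefix(size="8") @BoundList byte[] data;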
The Codecs generated by Preon will be almost identical. The main difference is that, in the first case, the size attribute is something that is accessible by the client. In the second case, it is data that will be read from the BitBuffer, but it will never be exposed to the Preon user.

Because of the information hiding in case of @LengthPrefix, it will be easier to preserve the dependency between the size attribute and the actual size of the slice. Encoding the slice in case of @LengthPrefix could work like this:
  • Determine whether the size of the data structure inside the slice can be known before it is encoded. If it can, simply write the size of the slice to the BitChannel first, and then continue writing the data structure that needs to be encoded.
  • Otherwise, write the entire data structure to a temporary BitChannel backed by an in memory buffer. Then, before flushing the contents of this buffer into the target BitChannel, determine the required size of the buffer, and write that value to the target channel first. Then write the contents of the temporary buffer into the BitChannel.
So, this will guarantee that the encoded representation will always be correct, something that is currently impossible when using @Slice. For @Slice, it is possible to write to a temporary buffer and determine the required size, but then there is not much we can do with that information, since it's data that could have been changed. (And having the ability to preserve dependencies between fields that can be changed is currently deliberately left out of the Preon roadmap.)
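The temporary-buffer strategy from the second bullet can be sketched in plain Java. This is only an illustration, not Preon's encoder: LengthPrefixEncoder is a hypothetical name, it works on whole bytes instead of bits, and it assumes an 8-bit length prefix.

```java
import java.io.ByteArrayOutputStream;
import java.util.function.Consumer;

/** Illustration of "buffer first, then write the length"; not Preon's encoder. */
final class LengthPrefixEncoder {

    /**
     * Lets the caller write the payload into a temporary in-memory buffer,
     * then emits an 8-bit length prefix followed by the buffered payload.
     */
    static byte[] encode(Consumer<ByteArrayOutputStream> payload) {
        // Step 1: write the data structure to a temporary buffer.
        ByteArrayOutputStream temp = new ByteArrayOutputStream();
        payload.accept(temp);

        // Step 2: the required size is known now; check that it fits the prefix.
        if (temp.size() > 0xFF) {
            throw new IllegalStateException(
                    temp.size() + " bytes do not fit an 8-bit length prefix");
        }

        // Step 3: write the size first, then flush the buffered contents.
        ByteArrayOutputStream target = new ByteArrayOutputStream();
        target.write(temp.size());
        target.write(temp.toByteArray(), 0, temp.size());
        return target.toByteArray();
    }
}
```

Because the prefix is computed from the buffered payload and never stored in a field the client can change, the two cannot drift apart.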

Summary

Trying to summarize my thoughts: although there is some benefit in using @LengthPrefix when encoding, I'm still considering taking it out. @LengthPrefix is only going to be a benefit in a few cases, whereas the problem that manifests itself when using @Slice exposes a challenge that 1) is way harder to solve, but also 2) is much more rewarding to solve.

If I take that route, then there will not be a reason for having the ability to write the slice to a temporary buffer to determine its size. It's possible to write to a BitChannel directly, and then throw an exception if the number of bits required exceeds the size defined by the slice.
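A rough sketch of that "write directly and fail when the slice is full" approach, assuming a hypothetical BoundedBitWriter rather than Preon's real BitChannel:

```java
import java.util.BitSet;

/** Hypothetical writer that refuses to write beyond a fixed slice size. */
final class BoundedBitWriter {
    private final BitSet bits = new BitSet();
    private final int sliceSize; // size of the slice, in bits
    private int written;         // number of bits written so far

    BoundedBitWriter(int sliceSize) {
        this.sliceSize = sliceSize;
    }

    /** Writes a single bit, or throws once the slice is full. */
    void writeBit(boolean value) {
        if (written >= sliceSize) {
            throw new IllegalStateException("slice of " + sliceSize + " bits is full");
        }
        bits.set(written++, value);
    }

    /** Writes the lowest 'count' bits of value, most significant bit first. */
    void writeBits(int value, int count) {
        for (int i = count - 1; i >= 0; i--) {
            writeBit(((value >>> i) & 1) == 1);
        }
    }

    int written() {
        return written;
    }
}
```

No temporary buffer is needed here; the price is that an oversized data structure is only detected halfway through writing it.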



Hamcrest-based Schema Validation

There doesn't seem to be an easy way to validate an XML document against a schema, so I figured I would roll my own library for it. Now, with that library, validating against a schema all of a sudden becomes easy. In the past, I hardly ever considered validating a document against a schema in my tests, but now I find myself doing it all over the place.

Anyway, I will just give a brief introduction. (And there really isn't that much to talk about.)

First of all, let's assume that you have a File reference to an XML file (xml) and a File reference to a RelaxNG schema file (schema). Now, in order to validate that file from JUnit, all you have to do is this:
assertThat(xml, isValidatedBy(schema));
That's it. If all things are well, then you won't see anything at all. However, if there is an error in your file, you might get something like this:
java.lang.AssertionError:
Expected: conforming to sample.rng
but: tag name "foo" is not allowed. Possible tag names are: <bar> at line <2>
In order to get these detailed messages (leveraging Hamcrest 1.2's TypeSafeDiagnosingMatcher) from JUnit, you do however need to use MatcherAssert from Hamcrest itself. JUnit's Assert class will not display the details.
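For comparison, this is roughly the kind of check a matcher like this has to wrap. The sketch below is not the library's code: it uses only the JDK's javax.xml.validation API, it validates against W3C XML Schema rather than RelaxNG (that is what the JDK's default SchemaFactory handles), and SchemaCheck and firstError are hypothetical names.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Optional;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

/** Plain-JDK validation sketch (W3C XML Schema, not RelaxNG). */
final class SchemaCheck {

    /**
     * Returns an empty Optional when xml conforms to xsd, or the message of
     * the first validation error otherwise. A broken schema is reported the
     * same way, since it also surfaces as a SAXException.
     */
    static Optional<String> firstError(String xml, String xsd) {
        try {
            SchemaFactory factory =
                    SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
            Schema schema = factory.newSchema(new StreamSource(new StringReader(xsd)));
            // Without a custom ErrorHandler, the validator throws on the first error.
            schema.newValidator().validate(new StreamSource(new StringReader(xml)));
            return Optional.empty();
        } catch (SAXException e) {
            return Optional.of(String.valueOf(e.getMessage()));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

A diagnosing matcher would feed a message like the one returned here into its mismatch description.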
Sunday, December 6, 2009

Mystery Programming Language Hiding in Plain Sight

It probably will not strike you as a surprise that there is a lot to do these days about the next programming language. Java - wildly popular as it may be - does have some limitations and there are some things that you just cannot shoehorn into the language without violating its conceptual integrity.

Of all these pretenders to the Java throne, we at Xebia probably like Scala most, for various reasons. We like the fact that it marries two programming paradigms (functional and OO), that it can be extended, and that there is type safety without paying a huge toll. But there are obviously alternatives such as Erlang, Clojure and Lua (and Fortress, perhaps, some day?).

But hold on, there is more… There is something out there, and you may not be aware of it. Before I reveal its name, I will refer to it as programming language Rarebit.

  • Rarebit has immutable state only. In that sense, it's more like Erlang than like Scala. In terms of Scala, everything in Rarebit is a val. (As a consequence, programming recursive structures is not uncommon in Rarebit.)
  • Rarebit has pattern matching, just like Erlang and Scala. In fact, it takes pattern matching to a whole new level, since your entire program is basically driven by pattern matching. (Pattern matching driven is the new 'driven'?)
  • Like Scala, XML is a first class citizen in Rarebit. There is a difference though. In Scala, if you want to address a particular node in your document, you use an XPath-alike expression. In Rarebit, it is really one-hundred-percent XPath.
  • Like Scala and Clojure, Rarebit lives on the Java VM. Normally, it is interpreted, but it is also possible to compile Rarebit down to plain old bytecode.
  • Unlike Scala and Clojure, Rarebit does not allow you to call native Java libraries just like that. You can however get that by using extensions that I will discuss later on. Out of the box, it doesn't support it. That's a blessing and a curse, but I am leaning towards the blessing. Because of this 'limitation' Rarebit really is guaranteed to be side-effect free. And that's not all, because:
  • Unlike Scala and Clojure, Rarebit does not require the VM. There are other versions that run directly on the host operating system. In that sense, Rarebit is more similar to Ruby and Python: they all come in a C-version and in a J-version.
  • Unlike Lua and Ruby, the entire language is specified, submitted to a standardization organization, and officially endorsed.
  • Unfortunately unlike Scala and Ruby, Rarebit is extremely stable. 
  • Rarebit is dynamically typed, just like Clojure and Erlang, although a later version (largely backwards compatible) allows you to work in a more strongly typed manner, more like Scala.
  • Rarebit does allow you to pass around function pointers, and it does support something that - with a bit of imagination - you could consider support for closures. (Just like Scala.)
  • Rarebit is extensible. That is, it allows you to add more language constructs to the language, although it's questionable if you can really talk about it in that way, since:
  • In Rarebit, your code is just data, just like in Clojure. So you can manipulate programs using programs easily.
  • Rarebit does however allow you to introduce new 'syntax' that will also allow you to call out to libraries outside of the set of functions supported by Rarebit itself. Note however that - if you use these features - you can introduce side-effects, which is clearly not necessarily a desirable feature.
  • Unlike Scala and Clojure, Rarebit has very decent IDE support. (I guess it won't be long until Scala support in IntelliJ becomes up to par.)
Hopefully you are getting curious by now. Because here comes the big surprise: Rarebit not only is capable of running on the Java VM; it has already been part of the Java runtime for ages, hiding in plain sight. It was there all along: when other programming languages demanded an SPI for plugging themselves into the JRE, and during the debate on getting invokedynamic into the VM. In fact, if I remember correctly, it was already there in Java 1.4, but it was certainly already present and available for the VM around that time.

Sounds pretty good huh? I bet you want to get your hands on that programming language. So here goes nothing:

Rarebit = XSLT
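To see a few of the points above in action (pattern matching through templates, real XPath, and being part of the Java runtime), here is a small sketch that runs a stylesheet through the JDK's built-in javax.xml.transform API. The stylesheet, the input document and the class name RarebitDemo are made up for the example.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

/** Runs an XSLT stylesheet using nothing but the JDK. */
final class RarebitDemo {

    // One template, selected purely by pattern matching on 'item' elements.
    // The select attribute holds an ordinary XPath expression.
    static final String STYLESHEET =
            "<xsl:stylesheet version='1.0'"
            + " xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
            + "<xsl:output method='text'/>"
            + "<xsl:template match='item'><xsl:value-of select='@name'/>;</xsl:template>"
            + "</xsl:stylesheet>";

    /** Applies the given stylesheet to the given XML and returns the result as text. */
    static String transform(String stylesheet, String xml) {
        try {
            Transformer transformer = TransformerFactory.newInstance()
                    .newTransformer(new StreamSource(new StringReader(stylesheet)));
            StringWriter out = new StringWriter();
            transformer.transform(
                    new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

Calling RarebitDemo.transform(RarebitDemo.STYLESHEET, "<list><item name='a'/><item name='b'/></list>") returns "a;b;": the built-in rules walk the tree, and the template fires for every element that matches.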




Wednesday, November 11, 2009

Spring ME Supporting Namespaces?

It had actually been quite a while since I looked at Spring ME, but then Davide Cerbo mentioned that he had presented Spring ME on Android at a Rome Spring meeting. Way cool! It got me thinking about the things that - according to the document I once wrote about it - were not implemented yet.

One of the things that I said was missing was support for namespaces. But is it really? Last week, I started to get some doubts. Maybe it was magically supported anyhow, and I just never bothered to give it a try.

Tonight I gave it a try. I have to admit that it didn't work immediately, but that had more to do with the fact that Spring ME didn't support FactoryBean and InitializingBean yet, and since I tested it on util:constant, support for that turned out to be required. (FieldRetrievingFactoryBean implements both of these interfaces.) So, once I added support for FactoryBean and InitializingBean, it turned out to be working fine. Quite to my astonishment, I have to say.

So, in my test set up, this is my Spring configuration file:


<beans xmlns="..." xmlns:xsi="..." xmlns:util="..." xsi:schemaLocation="...">

  <bean id="person" class="me.springframework.sample.namespaces.Person">
    <property name="age">
      <util:constant static-field="java.lang.Integer.MAX_VALUE"/>
    </property>
  </bean>

</beans>

... and this is what Spring ME is generating:


org.springframework.beans.factory.config.FieldRetrievingFactoryBean result =
new org.springframework.beans.factory.config.FieldRetrievingFactoryBean();
result.setStaticField("java.lang.Integer.MAX_VALUE");
result.afterPropertiesSet();
return result.getObject();

How about that! Note that I am not defining the FieldRetrievingFactoryBean anywhere in my configuration file. That's all based on Spring namespaces. Based on this, I get the feeling that this would basically work for any namespace, as long as the bean definitions generated by Spring are actually compatible with the subset of Spring currently supported by Spring ME.
Wednesday, October 28, 2009

Preon on JRockit

After my Preon talk earlier today, Alex Buckley warned me that reflection is not going to guarantee that the fields will always be returned in the same order. Preon currently relies on the fields to be returned in the order in which they were defined, to some extent.

That is, if this is the data structure you defined:
public class Image {
    @Bound int width;
    @Bound int height;
}
Then Preon will expect that the reflection API will also return the fields in this order.

I think I sort of knew this in the back of my head, but it never failed any of the tests, so I stopped worrying about it. So, the question is, should I start worrying now? It's clearly undesirable to depend on some coincidental properties of a Java VM, but are there actually VMs that will return a different field ordering? With IBM's VM being based on Sun's VM, it's unlikely they will differ. So that leaves us with JRockit.

In the end, I figured it would be wise to have an automated test that at least guarantees that the current setup also works on JRockit. It doesn't solve the problem, but it does provide some degree of guarantee that the problem will usually not manifest itself. And it turns out, the test succeeds.
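A check along these lines could look like the sketch below. It is not the actual Preon test: FieldOrderCheck is a hypothetical helper, and the nested Image class leaves out the @Bound annotations to keep the example free of dependencies.

```java
import java.lang.reflect.Field;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical helper for checking the field order a VM reports. */
final class FieldOrderCheck {

    /** Same shape as the Image class above, without the Preon annotations. */
    static final class Image {
        int width;
        int height;
    }

    /** Field names in whatever order this VM's reflection API returns them. */
    static List<String> reflectedOrder(Class<?> type) {
        List<String> names = new ArrayList<>();
        for (Field field : type.getDeclaredFields()) {
            if (!field.isSynthetic()) { // skip compiler- or tool-generated fields
                names.add(field.getName());
            }
        }
        return names;
    }

    /** True if the VM reports the fields in the expected (source) order. */
    static boolean matchesDeclarationOrder(Class<?> type, String... expected) {
        return reflectedOrder(type).equals(Arrays.asList(expected));
    }
}
```

A test would assert FieldOrderCheck.matchesDeclarationOrder(Image.class, "width", "height") on every VM that matters. The documentation of getDeclaredFields explicitly makes no promise about the order, so a passing test only says something about the VM it ran on.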



About Me

Wilfred Springer