Sunday, March 29, 2009

Preon Class File Format

A couple of weeks ago, I blogged a little about Preon's CodecDescriptor, and that it should be changed. This weekend, I completed the new interface, and made sure the entire documentation generation process is now using that interface. I'm quite pleased with the result, although I noticed one area where it can be improved.

The description below is what the new setup generates for the Java class file Codec built from the POJO classes here. Note that the output is a little messy, but that's mostly Blogger.com's style. The actual output looks a little better and can be downloaded here. The same goes for the hyperlinks: the ones below don't work, but they do work in the original source document.

Class file

Class file is composed out of several other smaller elements. The table below provides an overview.

| Name | Description | Size (in bits) |
|---|---|---|
| Magic | a 32-bit integer value (big endian). | 32 (4 bytes) |
| Minor version | a 16-bit integer value (big endian). | 16 (2 bytes) |
| Major version | a 16-bit integer value (big endian). | 16 (2 bytes) |
| Constant pool count | a 16-bit integer value (big endian). | 16 (2 bytes) |
| Constant pool | a list of elements. The number of elements in the list is the difference between Constant pool count of Class file and 1. The particular choice is based on an 8-bit value preceding the actual encoded value. If 7, then Class cp info will be chosen. If 6, then Double cp info will be chosen. If 9, then Field ref cp info will be chosen. If 4, then Float cp info will be chosen. If 3, then Integer cp info will be chosen. If 11, then Interface method ref cp info will be chosen. If 10, then Method ref cp info will be chosen. If 12, then Name and type cp info will be chosen. If 8, then String cp info will be chosen. If 1, then Utf8 cp info will be chosen. | (unknown) |
| Access flags | a 16-bit integer value (big endian). | 16 (2 bytes) |
| This class | a 16-bit integer value (big endian). | 16 (2 bytes) |
| Super class | a 16-bit integer value (big endian). | 16 (2 bytes) |
| Interfaces count | a 16-bit integer value (big endian). | 16 (2 bytes) |
| Interfaces | a 32-bit integer value (little endian). The number of elements in the list is Interfaces count of Class file. | Interfaces count of Class file times 32 |
| Field count | a 16-bit integer value (big endian). | 16 (2 bytes) |
| Fields | Field info. The number of elements in the list is Field count of Class file. | (unknown) |
| Method count | a 16-bit integer value (big endian). | 16 (2 bytes) |
| Methods | Method info. The number of elements in the list is Method count of Class file. | (unknown) |
| Attribute count | a 16-bit integer value (big endian). | 16 (2 bytes) |
| Attributes | Source file or Deprecated. The number of elements in the list is Attribute count of Class file. The particular type of data structure is selected based on the value of 16 leading bits. These bits are interpreted as an unsigned int. The table below lists the conditions, and the data structure assumed when these conditions are met. | (unknown) |

| Condition | Data structure |
|---|---|
| the value (a String) of the nth element of Constant pool of Class file (with n being the difference between the value of the first 16 bits and 1) equals the String "SourceFile" | Source file |
| the value (a String) of the nth element of Constant pool of Class file (with n being the difference between the value of the first 16 bits and 1) equals the String "Deprecated" | Deprecated |
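To make the first rows of the table above concrete, here is a minimal sketch in plain Java (deliberately not using Preon) that reads Magic, Minor version, Major version and Constant pool count from a byte array. `ByteBuffer` is big endian by default, which matches the descriptions. The class name `ClassFileHeader`, its fields and the sample bytes are assumptions made for this illustration only.

```java
import java.nio.ByteBuffer;

// Hypothetical helper, not part of Preon: reads the first four fields of the table.
final class ClassFileHeader {
    final long magic;            // 32 bits, big endian
    final int minorVersion;      // 16 bits, big endian
    final int majorVersion;      // 16 bits, big endian
    final int constantPoolCount; // 16 bits, big endian

    private ClassFileHeader(long magic, int minor, int major, int count) {
        this.magic = magic;
        this.minorVersion = minor;
        this.majorVersion = major;
        this.constantPoolCount = count;
    }

    static ClassFileHeader read(byte[] data) {
        ByteBuffer buf = ByteBuffer.wrap(data);   // big endian by default
        long magic = buf.getInt() & 0xFFFFFFFFL;  // keep the value unsigned
        int minor = buf.getShort() & 0xFFFF;
        int major = buf.getShort() & 0xFFFF;
        int count = buf.getShort() & 0xFFFF;
        return new ClassFileHeader(magic, minor, major, count);
    }

    // The list holds (Constant pool count - 1) elements, as the table states.
    int constantPoolEntries() {
        return constantPoolCount - 1;
    }

    public static void main(String[] args) {
        // Made-up sample: magic, minor 0, major 50, constant pool count 16.
        byte[] sample = {(byte) 0xCA, (byte) 0xFE, (byte) 0xBA, (byte) 0xBE, 0, 0, 0, 50, 0, 16};
        ClassFileHeader h = read(sample);
        System.out.println(Long.toHexString(h.magic) + " " + h.majorVersion + "."
                + h.minorVersion + " entries=" + h.constantPoolEntries());
    }
}
```

This is exactly the kind of hand-written reading code that the Preon approach is meant to make unnecessary; it is only here to show what the generated description says.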

Class cp info

Class cp info is composed out of several other smaller elements. The table below provides an overview.

| Name | Description | Size (in bits) |
|---|---|---|
| Name index | a 16-bit integer value (big endian). | 16 (2 bytes) |

Double cp info

Double cp info is composed out of several other smaller elements. The table below provides an overview.

| Name | Description | Size (in bits) |
|---|---|---|
| Value | a 64-bit integer value (big endian). | 64 (8 bytes) |

Field ref cp info

Field ref cp info is composed out of several other smaller elements. The table below provides an overview.

| Name | Description | Size (in bits) |
|---|---|---|
| Class index | a 16-bit integer value (big endian). | 16 (2 bytes) |
| Name and type index | a 16-bit integer value (big endian). | 16 (2 bytes) |

Float cp info

Float cp info is composed out of several other smaller elements. The table below provides an overview.

| Name | Description | Size (in bits) |
|---|---|---|
| Value | a 32-bit integer value (big endian). | 32 (4 bytes) |

Integer cp info

Integer cp info is composed out of several other smaller elements. The table below provides an overview.

| Name | Description | Size (in bits) |
|---|---|---|
| Value | a 32-bit integer value (big endian). | 32 (4 bytes) |

Interface method ref cp info

Interface method ref cp info is composed out of several other smaller elements. The table below provides an overview.

| Name | Description | Size (in bits) |
|---|---|---|
| Class index | a 16-bit integer value (big endian). | 16 (2 bytes) |
| Name and type index | a 16-bit integer value (big endian). | 16 (2 bytes) |

Method ref cp info

Method ref cp info is composed out of several other smaller elements. The table below provides an overview.

| Name | Description | Size (in bits) |
|---|---|---|
| Class index | a 16-bit integer value (big endian). | 16 (2 bytes) |
| Name and type index | a 16-bit integer value (big endian). | 16 (2 bytes) |

Name and type cp info

Name and type cp info is composed out of several other smaller elements. The table below provides an overview.

| Name | Description | Size (in bits) |
|---|---|---|
| Name index | a 16-bit integer value (big endian). | 16 (2 bytes) |
| Descriptor index | a 16-bit integer value (big endian). | 16 (2 bytes) |

String cp info

String cp info is composed out of several other smaller elements. The table below provides an overview.

| Name | Description | Size (in bits) |
|---|---|---|
| String index | a 16-bit integer value (big endian). | 16 (2 bytes) |

Utf8 cp info

Utf8 cp info is composed out of several other smaller elements. The table below provides an overview.

| Name | Description | Size (in bits) |
|---|---|---|
| Length | a 16-bit integer value (big endian). | 16 (2 bytes) |
| Value | A sequence of characters, encoded in ASCII. The number of characters of the string is Length of Utf8 cp info. | 8 times Length of Utf8 cp info |
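As an illustration of how the 8-bit selector and the Length field work together, here is a small sketch in plain Java that decodes a single Utf8 cp info entry: one tag byte (1 for Utf8 cp info), a 16-bit big-endian Length, then Length bytes of character data. It follows the generated description and decodes the bytes as ASCII; the JVM specification itself defines this entry as modified UTF-8, so treat the charset as a simplification. The class name `Utf8CpInfoSketch` and the sample bytes are made up.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of Preon.
final class Utf8CpInfoSketch {
    static final int TAG_UTF8 = 1; // selector value for Utf8 cp info

    // Decodes one entry: 8-bit tag, 16-bit big-endian Length, then Length bytes.
    static String decode(ByteBuffer buf) {
        int tag = buf.get() & 0xFF;
        if (tag != TAG_UTF8) {
            throw new IllegalArgumentException("Not a Utf8 cp info, tag = " + tag);
        }
        int length = buf.getShort() & 0xFFFF;
        byte[] chars = new byte[length];
        buf.get(chars);
        // ASCII as in the generated description; an assumption, see the note above.
        return new String(chars, StandardCharsets.US_ASCII);
    }

    public static void main(String[] args) {
        byte[] entry = {1, 0, 4, 'C', 'o', 'd', 'e'};
        System.out.println(decode(ByteBuffer.wrap(entry)));
    }
}
```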

Constant value

Constant value is composed out of several other smaller elements. The table below provides an overview.

| Name | Description | Size (in bits) |
|---|---|---|
| Attribute length | a 32-bit integer value (big endian). | 32 (4 bytes) |
| Constant value index | a 16-bit integer value (big endian). | 16 (2 bytes) |

Synthetic

Synthetic is composed out of several other smaller elements. The table below provides an overview.

| Name | Description | Size (in bits) |
|---|---|---|
| Attribute length | a 32-bit integer value (big endian). | 32 (4 bytes) |

Deprecated

Deprecated is composed out of several other smaller elements. The table below provides an overview.

| Name | Description | Size (in bits) |
|---|---|---|
| Attribute length | a 32-bit integer value (big endian). | 32 (4 bytes) |

Field info

Field info is composed out of several other smaller elements. The table below provides an overview.

| Name | Description | Size (in bits) |
|---|---|---|
| Access flags | a 16-bit integer value (big endian). | 16 (2 bytes) |
| Name index | a 16-bit integer value (big endian). | 16 (2 bytes) |
| Descriptor index | a 16-bit integer value (big endian). | 16 (2 bytes) |
| Attributes count | a 16-bit integer value (big endian). | 16 (2 bytes) |
| Attributes | Constant value, Synthetic or Deprecated. The number of elements in the list is Attributes count of Field info. The particular type of data structure is selected based on the value of 16 leading bits. These bits are interpreted as an unsigned int. The table below lists the conditions, and the data structure assumed when these conditions are met. | (unknown) |

| Condition | Data structure |
|---|---|
| the value (a String) of the nth element of Constant pool of Class file (with n being the difference between the value of the first 16 bits and 1) equals the String "ConstantValue" | Constant value |
| the value (a String) of the nth element of Constant pool of Class file (with n being the difference between the value of the first 16 bits and 1) equals the String "Synthetic" | Synthetic |
| the value (a String) of the nth element of Constant pool of Class file (with n being the difference between the value of the first 16 bits and 1) equals the String "Deprecated" | Deprecated |

Line number table entry

Line number table entry is composed out of several other smaller elements. The table below provides an overview.

| Name | Description | Size (in bits) |
|---|---|---|
| Start pc | a 16-bit integer value (big endian). | 16 (2 bytes) |
| Line number | a 16-bit integer value (big endian). | 16 (2 bytes) |

Line number table

Line number table is composed out of several other smaller elements. The table below provides an overview.

| Name | Description | Size (in bits) |
|---|---|---|
| Attribute length | a 32-bit integer value (big endian). | 32 (4 bytes) |
| Line number table length | a 16-bit integer value (big endian). | 16 (2 bytes) |
| Line number table | Line number table entry. The number of elements in the list is Line number table length of Line number table. | Line number table length of Line number table times 32 |

Local variable table entry

Local variable table entry is composed out of several other smaller elements. The table below provides an overview.

| Name | Description | Size (in bits) |
|---|---|---|
| Start pc | a 16-bit integer value (big endian). | 16 (2 bytes) |
| Length | a 16-bit integer value (big endian). | 16 (2 bytes) |
| Name index | a 16-bit integer value (big endian). | 16 (2 bytes) |
| Descriptor index | a 16-bit integer value (big endian). | 16 (2 bytes) |
| Index | a 16-bit integer value (big endian). | 16 (2 bytes) |

Local variable table

Local variable table is composed out of several other smaller elements. The table below provides an overview.

| Name | Description | Size (in bits) |
|---|---|---|
| Attribute length | a 32-bit integer value (big endian). | 32 (4 bytes) |
| Local variable table length | a 16-bit integer value (big endian). | 16 (2 bytes) |
| Local variable table | Local variable table entry. The number of elements in the list is Local variable table length of Local variable table. | Local variable table length of Local variable table times 80 |

Code

Code is composed out of several other smaller elements. The table below provides an overview.

| Name | Description | Size (in bits) |
|---|---|---|
| Attribute length | a 32-bit integer value (big endian). | 32 (4 bytes) |
| Max stack | a 16-bit integer value (big endian). | 16 (2 bytes) |
| Max locals | a 16-bit integer value (big endian). | 16 (2 bytes) |
| Code length | a 32-bit integer value (big endian). | 32 (4 bytes) |
| Code | an 8-bit integer value (little endian). The number of elements in the list is Code length of Code. | Code length of Code times 8 |
| Exception table length | a 16-bit integer value (big endian). | 16 (2 bytes) |
| Attributes count | a 16-bit integer value (big endian). | 16 (2 bytes) |
| Attributes | Line number table or Local variable table. The number of elements in the list is Attributes count of Code. The particular type of data structure is selected based on the value of 16 leading bits. These bits are interpreted as an unsigned int. The table below lists the conditions, and the data structure assumed when these conditions are met. | (unknown) |

| Condition | Data structure |
|---|---|
| the value (a String) of the nth element of Constant pool of Class file (with n being the difference between the value of the first 16 bits and 1) equals the String "LineNumberTable" | Line number table |
| the value (a String) of the nth element of Constant pool of Class file (with n being the difference between the value of the first 16 bits and 1) equals the String "LocalVariableTable" | Local variable table |

Exceptions

Exceptions is composed out of several other smaller elements. The table below provides an overview.

| Name | Description | Size (in bits) |
|---|---|---|
| Attribute length | a 32-bit integer value (big endian). | 32 (4 bytes) |
| Number of exceptions | a 16-bit integer value (big endian). | 16 (2 bytes) |
| Exception index table | a 32-bit integer value (little endian). The number of elements in the list is Number of exceptions of Exceptions. | Number of exceptions of Exceptions times 32 |

Method info

Method info is composed out of several other smaller elements. The table below provides an overview.

| Name | Description | Size (in bits) |
|---|---|---|
| Access flags | a 16-bit integer value (big endian). | 16 (2 bytes) |
| Name index | a 16-bit integer value (big endian). | 16 (2 bytes) |
| Descriptor index | a 16-bit integer value (big endian). | 16 (2 bytes) |
| Attributes count | a 16-bit integer value (big endian). | 16 (2 bytes) |
| Attributes | a data structure selected from a list of 4. The number of elements in the list is Attributes count of Method info. The particular type of data structure is selected based on the value of 16 leading bits. These bits are interpreted as an unsigned int. The table below lists the conditions, and the data structure assumed when these conditions are met. | (unknown) |

| Condition | Data structure |
|---|---|
| the value (a String) of the nth element of Constant pool of Class file (with n being the difference between the value of the first 16 bits and 1) equals the String "Code" | Code |
| the value (a String) of the nth element of Constant pool of Class file (with n being the difference between the value of the first 16 bits and 1) equals the String "Exceptions" | Exceptions |
| the value (a String) of the nth element of Constant pool of Class file (with n being the difference between the value of the first 16 bits and 1) equals the String "Synthetic" | Synthetic |
| the value (a String) of the nth element of Constant pool of Class file (with n being the difference between the value of the first 16 bits and 1) equals the String "Deprecated" | Deprecated |
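The selection rule in the conditions above boils down to a lookup: take the value of the first 16 bits, subtract 1, fetch that element from the constant pool, and compare its String value with the attribute name. Below is a rough sketch of that rule in plain Java. It simplifies heavily: the constant pool is modelled as a plain `List<String>`, whereas a real pool holds entries of many kinds, and the class and method names are invented for the example.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of Preon.
final class AttributeSelector {

    // Returns the name of the data structure selected for a Method info attribute.
    // nameIndex is the unsigned value of the 16 leading bits of the attribute.
    static String select(List<String> constantPool, int nameIndex) {
        // "n being the difference between the value of the first 16 bits and 1"
        String name = constantPool.get(nameIndex - 1);
        switch (name) {
            case "Code":
                return "Code";
            case "Exceptions":
                return "Exceptions";
            case "Synthetic":
                return "Synthetic";
            case "Deprecated":
                return "Deprecated";
            default:
                throw new IllegalArgumentException("No data structure for attribute " + name);
        }
    }

    public static void main(String[] args) {
        // Made-up pool contents; only the String values matter for this sketch.
        List<String> pool = Arrays.asList("main", "Code", "Deprecated");
        System.out.println(select(pool, 2));
    }
}
```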

Source file

Source file is composed out of several other smaller elements. The table below provides an overview.

| Name | Description | Size (in bits) |
|---|---|---|
| Attribute length | a 32-bit integer value (big endian). | 32 (4 bytes) |
| Source file index | a 16-bit integer value (big endian). | 16 (2 bytes) |

Wednesday, March 25, 2009

JavaOne 2009 Full Session Catalog (PDF)

Yesterday, I worked a little on creating a full JavaOne 2009 session catalog. It's not done yet, but all of the summaries are included now, so I figured it would be good to share that. The objective is to complete the document, and have all of the presenters and their bios in there as well.

JavaOne 2009 Session Catalog, by Wilfred Springer. The JavaOne 2009 session catalog is a document that contains the summaries of all JavaOne 2009 sessions, giving you a chance to read through all material while you are offline.

Tuesday, March 24, 2009

Preon changes

For the last couple of weeks, I spent most of my commuting time redesigning the CodecDescriptor in Preon. And I think I'm getting there. Find the results below. I am currently trying to work this into the different existing Codecs. Not even halfway yet, but I think I will be able to get there this week.

Now, one of the important changes that this new CodecDescriptor introduces is the Documenter. I want to rename that to something a little bit more sensible. Anybody with a good idea on that, please send your suggestions. ;-)

The Documenter has been in Pecia for a while, but you could only use it in a paragraph context. I am making changes to Pecia now to use it all over the place. It would be good to have another name for it though. (Did I already mention that?)

Anyway, the Documenter - or whatever its name is going to be - is an object that is capable of rendering itself into a certain context. It works like a callback. If you pass the Documenter to - say - a paragraph, then the paragraph will turn around and ask the Documenter to render itself.
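The callback idea can be sketched in a few lines of Java. Note that `SketchDocumenter` and `SketchParagraph` below are simplified stand-ins invented for this example; the actual Pecia types have a richer API and may well look different.

```java
// Hypothetical stand-in for a Documenter: an object that renders itself into a context.
interface SketchDocumenter<C> {
    void document(C target);
}

// Hypothetical stand-in for a paragraph context.
final class SketchParagraph {
    private final StringBuilder text = new StringBuilder();

    SketchParagraph text(String s) {
        text.append(s);
        return this;
    }

    // The paragraph turns around and asks the documenter to render itself.
    SketchParagraph document(SketchDocumenter<SketchParagraph> documenter) {
        documenter.document(this);
        return this;
    }

    String render() {
        return text.toString();
    }
}

final class DocumenterDemo {
    static String describe() {
        SketchDocumenter<SketchParagraph> summary = new SketchDocumenter<SketchParagraph>() {
            public void document(SketchParagraph target) {
                target.text("a 16-bit integer value (big endian). ");
            }
        };
        return new SketchParagraph().text("Minor version: ").document(summary).render();
    }

    public static void main(String[] args) {
        System.out.println(describe());
    }
}
```

The point of the indirection is that the caller never needs to know what the documenter is going to write; it only hands over the context.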


package nl.flotsam.preon;

import nl.flotsam.pecia.Contents;
import nl.flotsam.pecia.Documenter;
import nl.flotsam.pecia.ParaContents;

public interface CodecDescriptor {

    /**
     * An enumeration with different adjectives.
     */
    public enum Adjective {
        A, THE, NONE;

        public String asTextPreferA() {
            switch(this) {
                case A: return "a ";
                case THE: return "the ";
                default: return "";
            }
        }

        public String asTextPreferAn() {
            switch(this) {
                case A: return "an ";
                case THE: return "the ";
                default: return "";
            }
        }

    }

    /**
     * Returns an object capable of writing a one-line summary of the data
     * structure. Expect the summary to be printed at the beginning of a
     * paragraph, but make sure the paragraph is ended in such a way that more
     * lines might be appended to that paragraph, if required, by some other
     * component. I.e. make sure you end with a dot-space. (". ") Typically
     * starts with {@link Adjective#A}.
     *
     */
    <C extends ParaContents<?>> Documenter<C> summary();

    /**
     * Returns an object capable of rendering a short reference to the type of
     * data for which the Codec provides the decoder. This reference should
     * <em>at least</em> include a reference to the type of data decoded by
     * 'sub'-Codecs. The {@link Adjective} argument allows the implementor to
     * generate a correct reference, such as 'a list' instead of 'an list'.
     *
     * <p>
     * Note that implementers should assume that the particular piece of data
     * that is going to be referenced here will be detailed further along the
     * road. Unless {@link #requiresDedicatedSection()} returns
     * <code>true</code>, that could be within the same section.
     * </p>
     *
     * @param adjective
     *            The adjective to use; <code>null</code> if no adjective should
     *            be used.
     */
    <C extends ParaContents<?>> Documenter<C> reference(Adjective adjective);

    /**
     * Returns an object capable of writing detailed information on the format
     * to the document section passed in. Typically implemented by writing a
     * (couple of) paragraph(s), and forwarding to the CodecDescriptor of a
     * nested {@linkplain Codec}. Note that - while forwarding - the descriptor
     * has the option to replace the way the buffer is referenced.
     *
     * @param bufferReference
     *            A String based human readable reference to the encoded data.
     */
    <C extends Contents<?>> Documenter<C> details(String bufferReference);

    /**
     * Returns a boolean indicating if the type of data for which the Codec
     * provides the decoder should be documented in a dedicated section.
     *
     * @return A boolean indicating if the type of data for which the Codec
     *         provides the decoder should be documented in a dedicated section:
     *         <code>true</code> if it does; <code>false</code> if it doesn't.
     */
    boolean requiresDedicatedSection();

    /**
     * Returns the title of the section to be rendered, in case
     * {@link #requiresDedicatedSection()} returns <code>true</code>.
     *
     * @return The title of the section to be rendered, in case
     *         {@link #requiresDedicatedSection()} returns <code>true</code>.
     */
    String getTitle();

}
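To show how the two `Adjective` variants might be used by an implementation of `reference(Adjective)`, here is a small self-contained sketch. The enum is a copy of the one in the interface above so the example compiles on its own; the helper `withAdjective` and its first-letter vowel check are my own assumptions, and the check is admittedly crude (it would not get "an 8-bit value" or "an hour" right).

```java
final class AdjectiveDemo {

    // Copy of the Adjective enum from CodecDescriptor, repeated to keep this self-contained.
    enum Adjective {
        A, THE, NONE;

        String asTextPreferA() {
            switch (this) {
                case A: return "a ";
                case THE: return "the ";
                default: return "";
            }
        }

        String asTextPreferAn() {
            switch (this) {
                case A: return "an ";
                case THE: return "the ";
                default: return "";
            }
        }
    }

    // Hypothetical helper: picks "a" or "an" by looking at the first letter of the noun.
    // A null adjective means no adjective at all, as in the Javadoc of reference().
    static String withAdjective(Adjective adjective, String noun) {
        if (adjective == null) {
            return noun;
        }
        boolean startsWithVowel = !noun.isEmpty()
                && "aeiou".indexOf(Character.toLowerCase(noun.charAt(0))) >= 0;
        String prefix = startsWithVowel ? adjective.asTextPreferAn() : adjective.asTextPreferA();
        return prefix + noun;
    }

    public static void main(String[] args) {
        System.out.println(withAdjective(Adjective.A, "list"));
        System.out.println(withAdjective(Adjective.A, "integer"));
    }
}
```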

Saturday, March 7, 2009

Spring ME and GWT

Yep, it has arrived. The Spring ME code base now contains an example of a GWT application wired together with Spring ME, thanks to Wojciech Mlynarczyk's awesome contributions. In order to get it running, you currently still need to check out the entire code base (check here for details).  But hold on, it's getting even better: it not only demonstrates the use of Spring ME in a GWT context (it can be done!), but it also demonstrates autowiring in Spring ME. How about that?


Monday, March 2, 2009

Codec Descriptors in Preon

The sole purpose of this entry is to get my head around something that has been lying there rotting for quite a while now, and it's about time I get rid of it. I just haven't figured out how to do it yet. So, this is a note to self. Not only reminding me to make some changes (hurry up man!), but also to explain to myself what the situation is like right now, and how I feel I could get rid of it.

First of all, what is a CodecDescriptor? A CodecDescriptor is an interface. It's the interface that needs to be implemented for all Codecs. That is, all Codecs need to be able to return an implementation. It may be implemented by the Codec itself, but it might also be implemented by something else, outside of the Codec.

Now, why do we even have an interface like that? The answer is simple. A Codec needs to be able to describe itself. This is just because of the way Preon works. If you construct a Codec for a certain type of data structure, then you essentially create a Codec that delegates to other Codecs, until you hit some of the elementary types defined by the framework. One of the ambitions of Preon is to have something that prevents you from having to maintain documentation on the encoding format. That documentation is generated by the Codecs themselves. They describe themselves, and - if they depend on other Codecs for realizing part of the work - they will delegate generation of documentation to other Codecs.

This approach has been the subject of quite some debate in the past. People argued that using something like a template language would have been good enough. I doubt it.

  • First of all, the inner workings of a particular type of Codec might be quite involved. Externalizing the bits responsible for generating documentation would break the encapsulation big time. Logic that is closely related is now spread across two different places.
  • Second, the Codec will often depend on a number of parameters. For instance, the Codec decoding numeric values requires the number of bits, the byte order, the target data type, etc. All of that is also required for the documentation. Consequently, the template would require access to all of this data, and therefore the Codec would need to publicly expose all of this, again breaking the encapsulation.
  • So you don't want to lose encapsulation, but you do want polymorphism. Whenever a Codec of lists needs to document itself, it needs the ability to say something about the type of elements it's decoding. In terms of Preon, that's just another Codec. When the Codec of lists is documenting itself, it should be able to just ask the element Codec to state something about itself, without being required to understand what that Codec is actually doing. (If you break that restriction, you are going to end up with a kazillion marker interfaces, and a lot of additional code in the Codec of lists.) So, if polymorphism is what you want, the object oriented approach makes a lot of sense.
  • In many cases, you would render things differently in a different context. In fact, one way of rendering might be ok in one context, but totally ridiculous in another context. It would for instance make perfect sense to generate a list item in a list, but it would be ridiculous to generate a list item in an image. With all of this, compile time checks come in handy. Not a strong case for template languages.
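The encapsulation and polymorphism arguments above can be illustrated with a stripped-down sketch. The interface `SelfDescribing` and the two classes below are hypothetical and much simpler than Preon's actual Codec and CodecDescriptor; they only show that a list codec can describe itself by asking its element codec, while the number codec keeps its parameters private.

```java
// Hypothetical, minimal interface; not Preon's CodecDescriptor.
interface SelfDescribing {
    String reference();
}

final class NumberCodecSketch implements SelfDescribing {
    // These parameters stay private; only the codec itself uses them for its description.
    private final int bits;
    private final boolean bigEndian;

    NumberCodecSketch(int bits, boolean bigEndian) {
        this.bits = bits;
        this.bigEndian = bigEndian;
    }

    public String reference() {
        return bits + "-bit integer value (" + (bigEndian ? "big" : "little") + " endian)";
    }
}

final class ListCodecSketch implements SelfDescribing {
    private final SelfDescribing element;

    ListCodecSketch(SelfDescribing element) {
        this.element = element;
    }

    // The list codec does not know what kind of codec its element is; it just asks.
    public String reference() {
        return "list of elements, each being a " + element.reference();
    }

    public static void main(String[] args) {
        System.out.println(new ListCodecSketch(new NumberCodecSketch(16, true)).reference());
    }
}
```

A template-based approach would need the number of bits and the byte order exposed through getters; here they never leave the class.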

Now, as you may have noticed, I've drifted away from the question why a Codec needs a separate CodecDescriptor. After all, if they share so much in common - as I argue above - then why not have it all included in one class? In all honesty, I am not sure if I will be able to answer this question. The only thing I can say for sure is that the CodecDescriptor interface has had a completely different lifecycle than the Codec so far. In many cases, if I didn't care about the actual description yet, I had the Codec return a general purpose Codec descriptor. Maybe not all that sensible, but it allowed me to continue on the important bits, without focusing too much on continuously implementing boilerplate code.

Anyhow, enough about that. Let's take a look at the actual interface itself. This is it:

public interface CodecDescriptor {
    <T> void writeReference(ParaContents<T> contents);
    String getLabel();
    <T, V extends ParaContents<T>> V putOneLiner(V para);
    boolean hasFullDescription();
    <T> Contents<T> putFullDescription(Contents<T> contents);
    String getSize();
}

Doesn't look all that complicated, does it? The problem is that the operations are not documented. Not here, and not in the actual source code. There are processing expectations however. So I will try to capture those by looking at the operations one by one:

void writeReference(ParaContents contents)

I think what this operation is expected to do is generate a reference to the type of data structure it is supporting into the target document.

String getLabel()

I think what this operation is expected to do is generate a reference to the type of data structure it is supporting. However, since all this operation is capable of doing is return a String, that basically prevents it from generating a real link. (A link in the Pecia definition of the word. That is, something that eventually will be rendered into an HTML anchor tag, or a DocBook xref.)

ParaContents putOneLiner(ParaContents para)

I think what this operation is expected to do is generate a single short description of itself into the output document. As far as I remember, this was done to allow you to generate a short description in a table for a field, and then make sure it includes a reference to something defined in detail elsewhere. Sort of similar to the way JavaDoc works.

boolean hasFullDescription()

Called to check if the next operation is implemented in a meaningful way.

Contents putFullDescription(Contents contents)

Similar to putOneLiner. However, in putOneLiner, it would be impossible to write outside a single paragraph. In putFullDescription, you can basically write a number of paragraphs, include tables, images, etc.

String getSize()

I think it is expected to return a human readable description of the size. Is it still required? That's hard to tell. Probably not, now that we have getSizeExpr() on Codec.

So, as you can see, the contract is quite fragile. And there are a couple of other problems:

In some cases, you want the references to be inlined. In other cases, you want them to be part of the start of a sentence. Do you need to start whatever is generated with a capital? Or do you expect the framework to recognize when to change whatever you are generating into something starting with a capital?

As you can see, there is quite a bit of stuff that needs to be clarified or redefined. Question is, how? I will start to take a look at it tonight, and see if I can slowly refactor this to something that makes a little bit more sense, and is a little bit more helpful.

Sunday, March 1, 2009

Writing a Java byte code decoder without actually writing it

During my talk about Preon at Devoxx last year, I mentioned that I was working on capturing Java's class file format in Preon. This weekend, I took another shot at it, and it's starting to come together quite well. :-)

The source code itself is too much to publish here (what else would you expect from something capable of reading byte code), but if you're interested, I suggest you check out the latest version here. And as you will see, it does not actually contain any Java code reading byte code; it's all done in the Preon declarative way.
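To give a feel for what "declarative" means here without pulling in the full source, below is a toy sketch. It does not use Preon: the `@Unsigned` annotation and the tiny reflective `decode` method are invented for this example, and Preon's real annotations and Codec machinery are far more capable. The idea is the same though: the `Header` class only states what the data looks like, and a generic decoder does the reading.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Field;
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

final class DeclarativeSketch {

    // Hypothetical annotation, NOT Preon's: an unsigned big-endian number.
    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.FIELD)
    @interface Unsigned {
        int order(); // position of the field in the encoded data
        int bytes(); // width in bytes
    }

    // The format description: fields and annotations only, no reading logic.
    static final class Header {
        @Unsigned(order = 0, bytes = 4) long magic;
        @Unsigned(order = 1, bytes = 2) long minorVersion;
        @Unsigned(order = 2, bytes = 2) long majorVersion;
    }

    // Generic decoder: fills every annotated long field, in the declared order.
    static <T> T decode(Class<T> type, byte[] data) {
        try {
            java.lang.reflect.Constructor<T> constructor = type.getDeclaredConstructor();
            constructor.setAccessible(true);
            T target = constructor.newInstance();
            List<Field> fields = new ArrayList<Field>();
            for (Field f : type.getDeclaredFields()) {
                if (f.isAnnotationPresent(Unsigned.class)) {
                    fields.add(f);
                }
            }
            Collections.sort(fields, new Comparator<Field>() {
                public int compare(Field a, Field b) {
                    return Integer.compare(a.getAnnotation(Unsigned.class).order(),
                            b.getAnnotation(Unsigned.class).order());
                }
            });
            ByteBuffer buf = ByteBuffer.wrap(data);
            for (Field f : fields) {
                int width = f.getAnnotation(Unsigned.class).bytes();
                long value = 0;
                for (int i = 0; i < width; i++) {
                    value = (value << 8) | (buf.get() & 0xFF); // most significant byte first
                }
                f.setAccessible(true);
                f.setLong(target, value);
            }
            return target;
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("Cannot decode " + type.getName(), e);
        }
    }

    public static void main(String[] args) {
        byte[] sample = {(byte) 0xCA, (byte) 0xFE, (byte) 0xBA, (byte) 0xBE, 0, 0, 0, 50};
        Header h = decode(Header.class, sample);
        System.out.println(Long.toHexString(h.magic) + " " + h.majorVersion + "." + h.minorVersion);
    }
}
```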