Saturday, September 26, 2009

Parameterized Byte Order

I started checking out Lua bytecode a minute ago. (Thanks to Stack Overflow for pointing me to its location.) I haven't read all of the details yet, but one section is already proving to be troublesome.

1 byte Endianness flag (default 1)

• 0=big endian, 1=little endian

1 byte Size of int (in bytes) (default 4)

1 byte Size of size_t (in bytes) (default 4)

1 byte Size of Instruction (in bytes) (default 4)


It's the byte order. It means that, if I tried to decode Lua bytecode using Preon, I would have to make some changes to the annotations.
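
To make the problem concrete, here is a rough sketch in plain Java (no Preon involved) of what a hand-written decoder would have to do: read the flag first, then use it for every number that follows. The class name LuaHeaderSketch and its methods are made up for illustration, and the sketch assumes the buffer is already positioned at the endianness flag; whatever precedes that flag in the file is not covered here.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

/** Hypothetical holder for the four header bytes quoted above; not Preon code. */
final class LuaHeaderSketch {
    final ByteOrder byteOrder;
    final int sizeOfInt;
    final int sizeOfSizeT;
    final int sizeOfInstruction;

    private LuaHeaderSketch(ByteOrder byteOrder, int sizeOfInt, int sizeOfSizeT, int sizeOfInstruction) {
        this.byteOrder = byteOrder;
        this.sizeOfInt = sizeOfInt;
        this.sizeOfSizeT = sizeOfSizeT;
        this.sizeOfInstruction = sizeOfInstruction;
    }

    /** Reads the four bytes, assuming the buffer is positioned at the endianness flag. */
    static LuaHeaderSketch read(ByteBuffer in) {
        int flag = in.get() & 0xFF;
        if (flag > 1) {
            throw new IllegalArgumentException("Unexpected endianness flag: " + flag);
        }
        // 0 = big endian, 1 = little endian, as in the excerpt above.
        ByteOrder order = (flag == 0) ? ByteOrder.BIG_ENDIAN : ByteOrder.LITTLE_ENDIAN;
        int sizeOfInt = in.get() & 0xFF;
        int sizeOfSizeT = in.get() & 0xFF;
        int sizeOfInstruction = in.get() & 0xFF;
        return new LuaHeaderSketch(order, sizeOfInt, sizeOfSizeT, sizeOfInstruction);
    }

    /** Reads one integer of sizeOfInt bytes (zero-extended) in the byte order the header announced. */
    long readInt(ByteBuffer in) {
        if (sizeOfInt < 1 || sizeOfInt > 8) {
            throw new IllegalStateException("Unsupported int size: " + sizeOfInt);
        }
        long value = 0;
        for (int i = 0; i < sizeOfInt; i++) {
            long b = in.get() & 0xFFL;
            if (byteOrder == ByteOrder.LITTLE_ENDIAN) {
                value |= b << (8 * i);      // first byte is the least significant one
            } else {
                value = (value << 8) | b;   // first byte is the most significant one
            }
        }
        return value;
    }
}
```

The point is that neither the byte order nor the width of an int is known until the header has been read, which is exactly what a fixed annotation value cannot express.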

Currently, the annotations for decoding numeric data look like this:


@BoundNumber(byteOrder=LittleEndian)
@BoundNumber(byteOrder=BigEndian)

... but now that byteOrder needs to be determined by a value read earlier in the stream. I guess there are a couple of ways to deal with it:

Option 1: Have the ability to change the 'global' default byte order setting as a consequence of reading a value

Now, global doesn't really need to be global. It might be good enough if it were a default for the current 'lexical scope'. So if you read a block of data that indicates a change in the byte order, Preon could respect that change for anything read within that block.
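
A minimal sketch of that scoping idea, assuming a decoding context that keeps one default per nested block. ScopedByteOrderSketch and its method names are invented for this post; nothing like it exists in Preon today.

```java
import java.nio.ByteOrder;
import java.util.ArrayDeque;
import java.util.Deque;

/** Hypothetical decoding context that tracks a default byte order per block. */
final class ScopedByteOrderSketch {
    private final Deque<ByteOrder> scopes = new ArrayDeque<>();

    ScopedByteOrderSketch(ByteOrder initialDefault) {
        scopes.push(initialDefault);
    }

    /** The default that applies to anything read in the current block. */
    ByteOrder current() {
        return scopes.peek();
    }

    /** A nested block starts out with the default of the enclosing block. */
    void enterBlock() {
        scopes.push(current());
    }

    /** A value read inside a block changes the default for the rest of that block only. */
    void setDefault(ByteOrder order) {
        scopes.pop();
        scopes.push(order);
    }

    /** Leaving a block restores whatever the enclosing block was using. */
    void leaveBlock() {
        if (scopes.size() == 1) {
            throw new IllegalStateException("No enclosing block to return to");
        }
        scopes.pop();
    }
}
```

With something like this, a codec for the Lua header would call setDefault() once, and every numeric codec in the same block would ask for current() instead of carrying its own byte order.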

Option 2: Have the ability to refer to ByteOrder enum values read previously from @BoundNumber annotations

Now this is harder. Not impossible, because this is exactly what you can already do for the number of bits to read, for instance:



@BoundNumber(size="header.numberOfBits")

For byte order, it could be similar:


@BoundNumber(byteOrder="header.byteOrder")

However, this would mean that instead of typing byteOrder as an enum value of type ByteOrder, it would be a String. That would be less compile-time safe. And it would require an incredible number of complex pointers, all pointing to that one header attribute.
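
To illustrate the loss of compile-time safety, here is a small reflection-based sketch of resolving such a String reference. It is not how Preon evaluates its expressions; ReferenceResolverSketch and the Sketch* classes are stand-ins made up for this example.

```java
import java.lang.reflect.Field;

/** Stand-in types, only here to give the resolver something to walk. */
enum SketchByteOrder { LittleEndian, BigEndian }

final class SketchHeader {
    SketchByteOrder byteOrder = SketchByteOrder.BigEndian;
}

final class SketchFile {
    SketchHeader header = new SketchHeader();
}

final class ReferenceResolverSketch {
    /** Walks a dotted path such as "header.byteOrder", one field per segment. */
    static Object resolve(Object root, String path) {
        Object current = root;
        for (String name : path.split("\\.")) {
            try {
                Field field = current.getClass().getDeclaredField(name);
                field.setAccessible(true);
                current = field.get(current);
            } catch (ReflectiveOperationException e) {
                // A typo in the String only shows up here, at decode time.
                throw new IllegalArgumentException("Cannot resolve '" + name + "' in " + path, e);
            }
        }
        return current;
    }
}
```

Resolving "header.byteOrder" against a SketchFile yields the enum value, but a misspelled "header.byteOrdr" compiles just as happily and only fails when the data is actually read.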

But maybe there is a third option. Add a new ByteOrder enum value called Default, basically indicating that the ByteOrder is defined somewhere else. Currently, the default is LittleEndian. But maybe that default needs to be set as an annotation as well.
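
Roughly, that third option could look like the sketch below. ByteOrderOption is a hypothetical name used here to avoid confusion with the real ByteOrder enum, and resolve() is an assumption about how an inherited setting might be applied.

```java
/** Hypothetical variant of the ByteOrder enum with an extra Default value. */
enum ByteOrderOption {
    LittleEndian,
    BigEndian,
    Default;

    /** Default defers to the setting defined elsewhere; explicit values always win. */
    ByteOrderOption resolve(ByteOrderOption inherited) {
        if (inherited == Default) {
            throw new IllegalArgumentException("The inherited byte order must be a concrete value");
        }
        return this == Default ? inherited : this;
    }
}
```

The annotation attribute stays an enum, so it remains compile-time safe, and only the place that supplies the inherited value needs to know about the header.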

Anyway, there is a lot to think about. I will see if I can answer this one today.


Tuesday, September 22, 2009

Preon Encoding Started

I started work on supporting encoding in Preon the other day. One of the first things I will have to do is to define the interface that will be used to write data. Picking that interface is not trivial. One of the ideas that I suggested last year is not to try to keep Preon symmetrical; that is, it probably makes sense not to preserve random access to the data when writing, but to write to a streaming interface instead.

There are a couple of reasons why that would make sense:
  • In-place editing is just going to be way too painful: if you add a couple of elements to a List, you need to move blocks of data around. I can hardly imagine what that would look like, and it's probably not going to be very efficient.
  • Now, regardless of what the repercussions would be in terms of moving blocks of data around in memory and on disk, it would also imply changing all pointers that still have a reference to other places in that buffer of bytes. So, suppose that you have two lazy loading lists, one after the other. In order to be able to look up elements of the second list on the fly, that list will keep a reference to its starting point. That reference will be based on the offset of the start of its content relative to the beginning of the file. Now, inserting elements in the first list also needs to result in updating those references.
  • Preon aims to be thread-safe. That complicates matters further. As a result of adding an element to a list, thousands of pointers might potentially require an update, but from the perspective of the client, consistency would need to be preserved during that transaction. This calls for an STM (software transactional memory) type of solution.
I'm still not sure if it would be totally ridiculous to even support in-place editing, but it's just going to be too hard at this stage. Having the ability to 1) process changes, and 2) persist all of those changes to disk in a single snapshot would already be quite a challenge, but it would be an awesome feature.
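
The pointer problem from the second bullet can be sketched in a few lines of plain Java. OffsetTableSketch is purely illustrative and has nothing to do with Preon's internals; it only assumes that every lazy structure remembers the absolute offset at which its content starts.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical registry of the absolute start offsets that lazy structures hold on to. */
final class OffsetTableSketch {
    private final List<Long> startOffsets = new ArrayList<>();

    /** Registers a structure whose content starts at the given offset; returns a handle to it. */
    int register(long startOffset) {
        startOffsets.add(startOffset);
        return startOffsets.size() - 1;
    }

    long startOf(int handle) {
        return startOffsets.get(handle);
    }

    /**
     * Simulates inserting 'length' bytes at 'position': every structure that starts
     * at or after that position has to be shifted. Returns how many were updated.
     */
    int insert(long position, long length) {
        int updated = 0;
        for (int i = 0; i < startOffsets.size(); i++) {
            long start = startOffsets.get(i);
            if (start >= position) {
                startOffsets.set(i, start + length);
                updated++;
            }
        }
        return updated;
    }
}
```

Even in this toy version, a single insert touches every reference behind it, and none of it is thread-safe yet; a client reading the second list during the loop could see a stale offset.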

Tomorrow, I will try to add some thoughts on the shape of the interface that I am currently considering.

Saturday, September 12, 2009

Transforming WADL Documents to Documentation

Two years ago, Mark Nottingham published an XSLT stylesheet to transform a WADL document into a human readable document. When I tried to create something similar a year ago, I immediately ran into a problem when trying to split a resource path into text and parameter parts. I'm sure you can get something done by calling a template recursively, but I wondered how hard it would be to do it using an XSLT extension function.

I figured I would give it a go, and created a function that returns a node-set with the different parts of the path attribute. The stylesheet below uses it. It iterates over all resources with a path attribute, and for each resource applies the wadl-utils:parse() function to its @path attribute. The function returns a new document, and since it is an XML document, we can apply an XPath expression to it. In this case, the XPath expression is /path/*. So, inside the for-each loop visiting all resources, there is another for-each loop visiting all parts of the @path attribute.

Looking back, it's questionable whether something like this couldn't have been done using plain old XSLT, but it was an interesting exercise anyway. The code is in the WADL repository at http://wadl.dev.java.net/.


<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0"
                xmlns:wadl-utils="xalan://org.jvnet.ws.wadl.xslt.WadlXsltUtils"
                xmlns:wadl="http://wadl.dev.java.net/2009/02"
                exclude-result-prefixes="wadl-utils">

  <xsl:output method="text"/>

  <xsl:template match="/">
    <xsl:for-each select="//wadl:resource[@path]">
      <xsl:value-of select="@path"/>
      <xsl:text>: </xsl:text>
      <xsl:for-each select="wadl-utils:parse(@path)/path/*">
        <xsl:text>[</xsl:text>
        <xsl:value-of select="local-name(.)"/>
        <xsl:text>: '</xsl:text>
        <xsl:value-of select="text()"/>
        <xsl:text>']</xsl:text>
      </xsl:for-each>
    </xsl:for-each>
  </xsl:template>

</xsl:stylesheet>
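
For the curious, the splitting itself is not much work on the Java side. The sketch below is not the code from the WADL repository; PathSplitterSketch and the "text" and "param" labels are names I made up here, and it only assumes that template parameters are written as {name} in the path.

```java
import java.util.ArrayList;
import java.util.List;

final class PathSplitterSketch {

    /** One piece of a path: either literal text or the name of a template parameter. */
    static final class Part {
        final boolean parameter;
        final String value;

        Part(boolean parameter, String value) {
            this.parameter = parameter;
            this.value = value;
        }

        @Override
        public String toString() {
            return "[" + (parameter ? "param" : "text") + ": '" + value + "']";
        }
    }

    /** Splits a path such as "widgets/{id}/parts" into text and parameter parts. */
    static List<Part> split(String path) {
        List<Part> parts = new ArrayList<>();
        int pos = 0;
        while (pos < path.length()) {
            int open = path.indexOf('{', pos);
            if (open < 0) {
                // No more parameters: the rest is literal text.
                parts.add(new Part(false, path.substring(pos)));
                break;
            }
            if (open > pos) {
                parts.add(new Part(false, path.substring(pos, open)));
            }
            int close = path.indexOf('}', open);
            if (close < 0) {
                throw new IllegalArgumentException("Unclosed '{' in path: " + path);
            }
            parts.add(new Part(true, path.substring(open + 1, close)));
            pos = close + 1;
        }
        return parts;
    }
}
```

An extension function would then only have to wrap these parts in elements under a path root, so that the /path/* expression in the stylesheet has something to select.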

Monday, September 7, 2009

Solr Configuration File Schema

I'm starting to like Solr. There's just one thing I can't stand. That's when there is no schema for editing configuration files.

This is what the Wiki page says:
:TODO: we should try to make a DTD for the schema
Now, no matter how much I like Solr and miss a schema: creating a DTD for it is just so 90s. There's gotta be something better. Why not a RelaxNG schema instead? This is my first stab at it:


datatypes d = "http://www.w3.org/2001/XMLSchema-datatypes"

start =
  element schema {
    attribute name { text }?,
    attribute version { text },
    types,
    fields,
    element uniqueKey { text },
    element defaultSearchField { text },
    element solrQueryParser {
      attribute defaultOperator { "AND" | "OR" }
    },
    copyField*,
    similarity?
  }
types = element types { fieldType* }
fieldType =
  element fieldType {
    attribute name { text },
    attribute class { text },
    attribute sortMissingLast { d:boolean },
    attribute omitNorms { d:boolean },
    attribute indexed { d:boolean }?,
    (empty | analyzer)
  }
analyzer =
  element analyzer {
    attribute class { text }?,
    tokenizer?,
    filter*
  }
tokenizer =
  element tokenizer {
    attribute class { text }
  }
filter =
  element filter {
    attribute class { text },
    attribute ignoreCase { d:boolean }?,
    attribute words { text }?,
    attribute enablePositionIncrements { d:boolean }?,
    attribute generateWordParts { d:int }?,
    attribute generateNumberParts { d:int }?,
    attribute catenateWords { d:int }?,
    attribute catenateNumbers { d:int }?,
    attribute catenateAll { d:int }?,
    attribute splitOnCaseChange { d:int }?,
    attribute protected { text }?
  }
fields = element fields { field*, dynamicField* }
field =
  element field {
    attribute name { text },
    attribute type { text },
    attribute indexed { d:boolean }?,
    attribute compressed { d:boolean }?,
    attribute stored { d:boolean }?,
    attribute required { d:boolean }?,
    attribute multiValued { d:boolean }?,
    attribute omitNorms { d:boolean }?,
    attribute termVectors { d:boolean }?
  }
dynamicField =
  element dynamicField {
    attribute name { text },
    attribute type { text },
    attribute indexed { d:boolean },
    attribute stored { d:boolean }
  }
copyField =
  element copyField {
    attribute source { text },
    attribute dest { text }
  }
similarity =
  element similarity {
    attribute class { text },
    element str {
      attribute name { text }
    }*
  }