This is what the Wiki page says:
:TODO: we should try to make a DTD for the schema
Now, no matter how much I like Solr and miss a schema: creating a DTD for it is just so 90s. There's gotta be something better. Why not a RelaxNG schema instead? This is my first stab at it:
datatypes d = "http://www.w3.org/2001/XMLSchema-datatypes"
start =
  element schema {
    attribute name { text }?,
    attribute version { text },
    types,
    fields,
    element uniqueKey { text },
    element defaultSearchField { text },
    element solrQueryParser {
      attribute defaultOperator { "AND" | "OR" }
    },
    copyField*,
    similarity?
  }
types = element types { fieldType* }
fieldType =
  element fieldType {
    attribute name { text },
    attribute class { text },
    attribute sortMissingLast { d:boolean },
    attribute omitNorms { d:boolean },
    attribute indexed { d:boolean }?,
    (empty | analyzer)
  }
analyzer =
  element analyzer {
    attribute class { text }?,
    tokenizer?,
    filter*
  }
tokenizer =
  element tokenizer {
    attribute class { text }
  }
filter =
  element filter {
    attribute class { text },
    attribute ignoreCase { d:boolean }?,
    attribute words { text }?,
    attribute enablePositionIncrements { d:boolean }?,
    attribute generateWordParts { d:int }?,
    attribute generateNumberParts { d:int }?,
    attribute catenateWords { d:int }?,
    attribute catenateNumbers { d:int }?,
    attribute catenateAll { d:int }?,
    attribute splitOnCaseChange { d:int }?,
    attribute protected { text }?
  }
fields = element fields { field*, dynamicField* }
field =
  element field {
    attribute name { text },
    attribute type { text },
    attribute indexed { d:boolean }?,
    attribute compressed { d:boolean }?,
    attribute stored { d:boolean }?,
    attribute required { d:boolean }?,
    attribute multiValued { d:boolean }?,
    attribute omitNorms { d:boolean }?,
    attribute termVectors { d:boolean }?
  }
dynamicField =
  element dynamicField {
    attribute name { text },
    attribute type { text },
    attribute indexed { d:boolean },
    attribute stored { d:boolean }
  }
copyField =
  element copyField {
    attribute source { text },
    attribute dest { text }
  }
similarity =
  element similarity {
    attribute class { text },
    element str {
      attribute name { text }
    }*
  }
3 comments:
Very cool!
There's one problem though. Solr schema isn't really fixed and rigid. The tokenizers and filter factories can define their own custom attributes e.g. WordDelimiterFilterFactory defines catenateAll etc.
Hmmm, so the question is if this schema could be autogenerated somehow. Or make it extensible and have some additional schematron rules. Or make all attributes that I know of optional.
The problem obviously is that in a schema, you will not be able to match on elements with a certain attribute value. (Thinking. I think only Schematron would be able to do that.)
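For what it's worth, RELAX NG itself does seem able to branch on an attribute value: a literal string is a valid attribute pattern, and a choice between attribute groups is allowed. A minimal sketch of what that could look like for filter follows. The grouping of attributes per factory class is an assumption based on the attribute list in the schema above, and solr.StopFilterFactory is only used as a second illustrative class name.

```rnc
datatypes d = "http://www.w3.org/2001/XMLSchema-datatypes"

# Sketch only: the attributes allowed on <filter> depend on the value of @class.
# Which attributes belong to which factory is assumed here, not verified.
filter =
  element filter {
    (attribute class { "solr.StopFilterFactory" },
     attribute ignoreCase { d:boolean }?,
     attribute words { text }?,
     attribute enablePositionIncrements { d:boolean }?)
    | (attribute class { "solr.WordDelimiterFilterFactory" },
       attribute generateWordParts { d:int }?,
       attribute generateNumberParts { d:int }?,
       attribute catenateWords { d:int }?,
       attribute catenateNumbers { d:int }?,
       attribute catenateAll { d:int }?,
       attribute splitOnCaseChange { d:int }?)
  }
```

As written, this rejects any filter whose class is not one of the two listed values, so every factory in use would need its own branch.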
I don't know how/whether RelaxNG can handle it, but with XSD you can definitely leave room for extensions. I guess the schema (RelaxNG or XSD) should cover all the fixed elements though - so for example in the filter element, the name and the class attributes will always be there so the schema should at least cover them.
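On the RelaxNG side, leaving room for extensions appears to be possible with a name-class wildcard. The sketch below keeps only the attribute that is certainly fixed (class, as in the schema above) and accepts any other attribute; it is meant as an illustration of the idea, not as a finished definition.

```rnc
# Sketch only: require the fixed attribute, allow anything else.
filter =
  element filter {
    attribute class { text },
    # any attribute other than "class", with any text value
    attribute * - class { text }*
  }
```

The trade-off is that a misspelled attribute name is no longer caught, since the wildcard accepts it like any other custom attribute.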