URL rewriting is not enough
Apache has bucket brigades, which are essentially lists of output buffers, at the heart of its filtering architecture. The buckets are moved brigade by brigade through the output filters that manipulate them. The first idea is to put SAX events into those buckets. This is possible because buckets morph into simple text output when their read function is called. So there is a SAX filter that turns the outgoing bucket stream into a stream of SAX events. These can be rewritten by subsequent filters. Whatever happens, they morph into text before they finally reach the network.
This has been implemented in mod_xml2. The current problem is that modules that manipulate SAX buckets need to be written in C. The existing modules mod_xi and mod_i18n were too hard to write (and are currently not sufficiently maintained).
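The bucket idea can be illustrated with a minimal, self-contained sketch. It does not use the real Apache bucket API or the mod_xml2 sources; the names `bucket`, `sax_read`, `rename_filter`, `flatten` and `demo` are hypothetical and exist only for this example. Each bucket carries one event and a read function that morphs it into text, a filter rewrites events in place, and the final step serializes the brigade.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical event kinds; real SAX has more (comments, PIs, ...). */
typedef enum { EV_START, EV_END, EV_TEXT } event_kind;

typedef struct bucket bucket;
struct bucket {
    event_kind kind;
    const char *data;   /* element name or character data */
    /* read morphs the event into plain text; returns bytes written */
    size_t (*read)(const bucket *b, char *out, size_t cap);
    bucket *next;       /* a brigade is modelled as a linked list */
};

/* Serialize one event; returns 0 if it does not fit into the buffer. */
static size_t sax_read(const bucket *b, char *out, size_t cap)
{
    int n = 0;
    switch (b->kind) {
    case EV_START: n = snprintf(out, cap, "<%s>", b->data);  break;
    case EV_END:   n = snprintf(out, cap, "</%s>", b->data); break;
    case EV_TEXT:  n = snprintf(out, cap, "%s", b->data);    break;
    }
    return (n < 0 || (size_t)n >= cap) ? 0 : (size_t)n;
}

/* A filter that rewrites events: rename every element from -> to. */
static void rename_filter(bucket *brigade, const char *from, const char *to)
{
    for (bucket *b = brigade; b != NULL; b = b->next)
        if (b->kind != EV_TEXT && strcmp(b->data, from) == 0)
            b->data = to;
}

/* Last step before the network: call read on every bucket in order. */
static size_t flatten(const bucket *brigade, char *out, size_t cap)
{
    size_t used = 0;
    for (const bucket *b = brigade; b != NULL; b = b->next)
        used += b->read(b, out + used, cap - used);
    if (cap > 0)
        out[used] = '\0';
    return used;
}

/* Build the brigade for <b>hello</b>, rewrite it, and turn it into text. */
void demo(char *out, size_t cap)
{
    bucket end   = { EV_END,   "b",     sax_read, NULL  };
    bucket text  = { EV_TEXT,  "hello", sax_read, &end  };
    bucket start = { EV_START, "b",     sax_read, &text };

    rename_filter(&start, "b", "strong");
    flatten(&start, out, cap);
}
```

Calling `demo` with a 64 byte buffer leaves `<strong>hello</strong>` in it. The filter never touches text; it only sees events, and the text form appears at the very end.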
My current plan (which is a work in progress) is therefore to make SAX buckets available to higher-level languages with access to the Apache API, namely Perl and Lua. Since this implies wrapping the SAX events in an API that must then be made available to those languages, I use libxml2 DOM nodes for this. These are already wrapped. Even more important, they have a well-documented API for both languages.
The SAX buckets have been renamed to node buckets, since their binary format is completely different and since they hold libxml2 nodes. The switch to node buckets also saves a lot of code in mod_xml2. Functionality already implemented in libxml2 does not need to be reimplemented.
Parsing the outgoing XML runs the libxml2 tree builder with hooked SAX handlers. Element nodes are removed from the tree in the end handler; all other nodes are removed immediately. Node buckets are shared buckets with reference counting. This is used to have the start and end element buckets hold the same node. As a result it is easy to rebuild the tree from the bucket stream, since the start bucket already knows the end bucket.
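The shared-node idea can be sketched in plain C as follows. This is a simplified model under stated assumptions, not the mod_xml2 implementation: `node` stands in for a libxml2 node, and `node_bucket`, `bucket_make`, `bucket_destroy`, `rebuild` and `demo_tree` are hypothetical names. Because the start and end buckets point at the same reference-counted node, pairing them while rebuilding the tree is a pointer comparison.

```c
#include <assert.h>
#include <stdlib.h>

/* Stand-in for a libxml2 node; the real module would hold an xmlNode. */
typedef struct node {
    const char *name;
    int refcount;          /* number of buckets sharing this node */
    struct node *parent;
    int nchildren;
} node;

typedef enum { NB_START, NB_END } nb_kind;

typedef struct {
    nb_kind kind;
    node *shared;          /* start and end bucket hold the same node */
} node_bucket;

static node *node_new(const char *name)
{
    node *n = calloc(1, sizeof *n);
    if (n != NULL)
        n->name = name;
    return n;
}

/* Creating a bucket takes a reference on the node. */
static node_bucket bucket_make(nb_kind kind, node *n)
{
    node_bucket b = { kind, n };
    n->refcount++;
    return b;
}

/* Destroying a bucket drops the reference; the last one frees the node. */
static void bucket_destroy(node_bucket *b)
{
    if (--b->shared->refcount == 0)
        free(b->shared);
    b->shared = NULL;
}

/* Rebuild parent links from the stream; NULL if it is not well nested. */
static node *rebuild(node_bucket *stream, size_t len)
{
    node *root = NULL, *cur = NULL;
    for (size_t i = 0; i < len; i++) {
        node *n = stream[i].shared;
        if (stream[i].kind == NB_START) {
            n->parent = cur;
            if (cur != NULL)
                cur->nchildren++;
            else if (root == NULL)
                root = n;
            else
                return NULL;       /* second top-level element */
            cur = n;
        } else {
            if (n != cur)          /* end must close the open element */
                return NULL;
            cur = cur->parent;
        }
    }
    return cur == NULL ? root : NULL;
}

/* Stream for <a><b/></a>; returns 1 if the tree was rebuilt correctly. */
int demo_tree(void)
{
    node *a = node_new("a"), *b = node_new("b");
    if (a == NULL || b == NULL) {
        free(a);
        free(b);
        return 0;
    }
    node_bucket stream[4];
    stream[0] = bucket_make(NB_START, a);
    stream[1] = bucket_make(NB_START, b);
    stream[2] = bucket_make(NB_END, b);
    stream[3] = bucket_make(NB_END, a);

    node *root = rebuild(stream, 4);
    int ok = root == a && a->nchildren == 1 && b->parent == a &&
             a->refcount == 2 && b->refcount == 2;

    for (size_t i = 0; i < 4; i++)
        bucket_destroy(&stream[i]);
    return ok;
}
```

In this model a node lives exactly as long as one of its two buckets is still in flight, which is the property the reference counting is meant to provide.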
libxml2 implements streaming XPath expressions, which allow matching a very restricted subset of XPath expressions while parsing. Using these, it should be easy to implement filters that call a given callback, passing the matched subtree as a parameter. The point is that only these subtrees need to be built.
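To make the streaming idea concrete, here is a toy matcher in plain C. It is not libxml2's pattern module (`libxml/pattern.h`) and supports far less: only absolute child paths such as `/doc/item`. All names (`stream_matcher`, `matcher_push`, `matcher_pop`, `run_matcher`, `demo_matches`) are hypothetical, and the callback receives only the element name rather than a subtree. It shows that a match can be decided from start and end events alone, without building the whole document.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_STEPS 8

/* Toy matcher for absolute child paths such as /doc/item. */
typedef struct {
    const char *steps[MAX_STEPS];
    size_t nsteps;
    size_t depth;    /* number of currently open elements */
    size_t matched;  /* leading steps matched by the open ancestors */
} stream_matcher;

/* Call on every start-element event; returns 1 if the element matches. */
static int matcher_push(stream_matcher *m, const char *name)
{
    if (m->matched == m->depth && m->depth < m->nsteps &&
        strcmp(m->steps[m->depth], name) == 0)
        m->matched++;
    m->depth++;
    return m->matched == m->nsteps && m->depth == m->nsteps;
}

/* Call on every end-element event. */
static void matcher_pop(stream_matcher *m)
{
    if (m->depth == 0)
        return;
    if (m->matched == m->depth)
        m->matched--;
    m->depth--;
}

typedef struct { int is_start; const char *name; } event;
typedef void (*match_cb)(const char *name, void *user);

/* Feed an event stream through the matcher and fire cb on each match. */
static void run_matcher(stream_matcher *m, const event *ev, size_t n,
                        match_cb cb, void *user)
{
    for (size_t i = 0; i < n; i++) {
        if (ev[i].is_start) {
            if (matcher_push(m, ev[i].name))
                cb(ev[i].name, user);
        } else {
            matcher_pop(m);
        }
    }
}

static void count_match(const char *name, void *user)
{
    (void)name;
    ++*(int *)user;
}

/* <doc><item><x/></item><other><item/></other><item/></doc> */
int demo_matches(void)
{
    stream_matcher m = { { "doc", "item" }, 2, 0, 0 };
    const event ev[] = {
        {1, "doc"}, {1, "item"}, {1, "x"}, {0, "x"}, {0, "item"},
        {1, "other"}, {1, "item"}, {0, "item"}, {0, "other"},
        {1, "item"}, {0, "item"}, {0, "doc"},
    };
    int count = 0;
    run_matcher(&m, ev, sizeof ev / sizeof ev[0], count_match, &count);
    return count;
}
```

For the sample document, `/doc/item` matches the two `item` elements directly below `doc` and skips the one nested inside `other`. A real filter would start collecting a subtree when the matcher fires and stop at the corresponding end event.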
Implementing KID-like template engines that execute <?perl and <?lua processing instructions should also be doable.
My current project goals are to
The last one is because I like libxml2. It is highly useful for web stuff because it can also parse HTML. It is also to justify the name.
A similar project with a different approach is Nick Kew's mod_xmlns.