Genshi Performance Enhancements
Original content from Genshi list
1) Static match templates
Many match templates don't actually need to be run at render time; rather they represent transformations that could be done immediately after the template has been parsed (let's call this "compile time"). For example, if you have a match template that matches every <foo>…</foo> element and transforms that into <div class="foo">…</div>, Genshi should be able to expand that transformation at compile time. When the template is actually rendered, Genshi no longer sees <foo> tags and a corresponding match template, it just sees the <div class="foo"> tags. Static matching should probably be opt-in (or at least opt-out), for example by defining the match template as <py:match path="foo" static="true">.
This would require some surgery to add a proper optimization stage to the template "compilation" process. That stage would also allow other kinds of optimizations, such as moving the checks for valid nesting of py:choose/py:when/py:otherwise out of the render stage. But static matching definitely has the most potential for a huge speed boost in a lot of scenarios.
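To make the idea of static matching concrete, here is a minimal sketch of expanding a <foo> match once over a parsed event list. It is only an illustration: the (kind, data) tuples are a simplified stand-in for Genshi's real stream events, and expand_static_match is a hypothetical helper, not part of Genshi.

```python
# Simplified event kinds; real Genshi events carry more information.
START, END, TEXT = "START", "END", "TEXT"


def expand_static_match(events, tag, new_tag, new_attrs):
    """Rewrite every <tag> element into <new_tag ...> once, at "compile time".

    Hypothetical helper: attributes on the original element are discarded
    in this sketch and replaced by new_attrs.
    """
    out = []
    for kind, data in events:
        if kind == START and data[0] == tag:
            out.append((START, (new_tag, list(new_attrs))))
        elif kind == END and data == tag:
            out.append((END, new_tag))
        else:
            # Everything else passes through unchanged.
            out.append((kind, data))
    return out


# The parsed template still contains <foo>hello</foo> ...
parsed = [
    (START, ("foo", [])),
    (TEXT, "hello"),
    (END, "foo"),
]

# ... but the "compiled" template only contains <div class="foo">hello</div>,
# so nothing needs to be matched at render time.
compiled = expand_static_match(parsed, "foo", "div", [("class", "foo")])
```

The point of the sketch is that the rewrite happens once per template rather than once per render.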
2) Matching fast-paths
Here, the idea is to add fast paths for simple but common match template constructs such as matching by tag name and/or attribute value. Instead of going through the full XPath matching algorithm, you use a simple hash lookup to determine whether a given element matches. (Alec Flett recently brought up this idea, and started a branch to implement it. Alec, how's that branch doing?)
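A rough sketch of what such a fast path might look like follows. It assumes that a path consisting of nothing but a tag name matches any element with that name; build_index and fast_candidates are hypothetical names, and the handlers are plain strings standing in for real match templates.

```python
import re

# A path that is nothing but a tag name, e.g. "foo" (assumed to be the
# "simple" case; anything else goes through the full XPath machinery).
SIMPLE_TAG = re.compile(r"[A-Za-z_][\w.-]*\Z")


def build_index(match_templates):
    """Split match templates into a dict keyed by tag name and a slow list."""
    by_tag, slow = {}, []
    for path, handler in match_templates:
        if SIMPLE_TAG.match(path):
            by_tag.setdefault(path, []).append(handler)
        else:
            slow.append((path, handler))
    return by_tag, slow


def fast_candidates(by_tag, tag):
    """Return the handlers for a tag name with a single hash lookup."""
    return by_tag.get(tag, [])


templates = [
    ("foo", "expand_foo"),
    ("body//p[@class]", "decorate_p"),
    ("bar", "expand_bar"),
]
by_tag, slow = build_index(templates)

foo_handlers = fast_candidates(by_tag, "foo")  # found via the dict
p_handlers = fast_candidates(by_tag, "p")      # empty; only the slow list applies
```

Only the templates left in the slow list would still need the full XPath test for each element.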
3) Serialization hints / Markup event collapsing
Genshi currently computes the XML or HTML representation of every event in a template output stream at render time, to enable generating different serializations of the same markup at the infoset(-ish) level. That is, you can write your templates in XHTML and switch to HTML on the fly when rendering.
However, if you know beforehand that you're going to be using a specific serialization method, the representation of many of the template events could be pre-computed, so that actually serializing that event just means returning a static string. For example, you could pre-compute the string representation of the markup event (START, ('b', [('class', 'foo')])) to be <b class="foo">. When the serializer sees that event, it just returns the pre-computed string.
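As an illustration only, the pre-computation could look something like the sketch below. The (kind, data) tuples are a simplified stand-in for real stream events, and Precomputed, render_event and serialize are hypothetical names rather than Genshi APIs.

```python
from html import escape

START, END, TEXT = "START", "END", "TEXT"


def render_event(kind, data):
    """Slow path: build the XML string for one event."""
    if kind == START:
        tag, attrs = data
        rendered = "".join(
            ' %s="%s"' % (name, escape(value, quote=True)) for name, value in attrs
        )
        return "<%s%s>" % (tag, rendered)
    if kind == END:
        return "</%s>" % data
    return escape(data, quote=False)


class Precomputed:
    """An event whose serialization was computed once, ahead of rendering."""

    def __init__(self, kind, data):
        self.kind, self.data = kind, data
        self.text = render_event(kind, data)


def serialize(stream):
    for item in stream:
        if isinstance(item, Precomputed):
            yield item.text            # fast path: just return the string
        else:
            yield render_event(*item)  # dynamic events are rendered as before


stream = [
    Precomputed(START, ("b", [("class", "foo")])),
    (TEXT, "1 < 2"),
    Precomputed(END, "b"),
]
output = "".join(serialize(stream))
```

Because the wrapper keeps the original kind and data, a filter that wants to replace the event can still inspect it; it would just have to drop the cached string when it does.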
The challenge with this is that template directives and expressions, but also stream filters, can replace events in a template output stream. So if that START tag had an expression in an attribute value, you can't pre-compute the serialization (especially considering that expressions returning None remove the attribute altogether). Or a stream filter might replace that event entirely, changing the tag name to "strong" or adding/modifying attributes. XML namespaces raise yet more challenges.
But in general, being able to map an event to a static string should really help performance. Taking it a bit further, it should be possible to collapse multiple "static" events into just one event. Alec Thomas has started some work in this direction on the "optimizer" branch. Alec, any insights you'd like to add to this?
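Collapsing adjacent static events might be sketched as follows. This is not taken from the "optimizer" branch; it simply assumes static events are already represented as plain strings and dynamic ones as tuples, and collapse_static is a hypothetical helper.

```python
def collapse_static(stream):
    """Merge runs of adjacent static strings into a single string.

    Assumption for this sketch: static events are plain str objects,
    anything else (here, tuples) is dynamic and must be kept as-is.
    """
    out, pending = [], []
    for item in stream:
        if isinstance(item, str):
            pending.append(item)
        else:
            if pending:
                out.append("".join(pending))
                pending = []
            out.append(item)
    if pending:
        out.append("".join(pending))
    return out


stream = ["<ul>", "<li>", ("EXPR", "item.name"), "</li>", "</ul>"]
collapsed = collapse_static(stream)
# Five events become three: one string, the expression, one string.
```

The dynamic expression acts as a barrier, so only the static runs on either side of it are merged.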
4) Compilation to Python byte code
This would compile templates down to Python byte code. I started this on the "inline" branch, but it's incomplete, and the measurement results were somewhat underwhelming. It might still help general performance, but I think the stuff mentioned above would be more beneficial.
Okay, that's all I can think of for now.
I'd be willing to act as mentor for such a project, if there's someone who wants to tackle the problems and seems capable of doing the work.