
JDOM2 Feature XPath Upgrade

paulk-asert edited this page Apr 9, 2012 · 4 revisions

The XPath API is being rewritten in JDOM2. This is a drastic step, so this page sets out why it is necessary.

Why XPath in JDOM 1.x is broken

The Factory concept:

JDOM has a single static method setXPathClass(String classname).

The problems with this are:

  1. JDOM may be used in many different places in a code base. One piece of code may change the default class, and another, unrelated piece of code could suddenly start failing because it is no longer using the library it expects. Failures like this are hard to debug.
  2. There is no thread-safe way to have multiple XPath libraries active concurrently.
  3. It is slow to use reflection (through a Constructor instance) every time an XPath instance is created.
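The first and third problems can be reproduced without JDOM at all. The following is a minimal sketch in plain Java; `GlobalFactoryHazard`, `Engine`, `EngineA`, `EngineB`, `setEngineClass` and `newEngine` are hypothetical names invented for illustration and are not part of any JDOM release.

```java
import java.lang.reflect.Constructor;

public class GlobalFactoryHazard {
    /** Hypothetical engine interface standing in for an XPath implementation. */
    public interface Engine { String name(); }
    public static class EngineA implements Engine { public String name() { return "A"; } }
    public static class EngineB implements Engine { public String name() { return "B"; } }

    // One process-wide setting shared by every caller.
    private static volatile String engineClass = EngineA.class.getName();

    public static void setEngineClass(String className) {
        engineClass = className;
    }

    // Every call goes through reflection to build a new instance.
    public static Engine newEngine() {
        try {
            Constructor<?> ctor = Class.forName(engineClass).getDeclaredConstructor();
            return (Engine) ctor.newInstance();
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("Cannot create " + engineClass, e);
        }
    }

    public static void main(String[] args) {
        System.out.println(newEngine().name());   // first module expects engine A
        setEngineClass(EngineB.class.getName());   // second module switches the default
        System.out.println(newEngine().name());   // first module now silently gets engine B
    }
}
```

Nothing in the first module's code changed between the two calls, yet it receives a different engine. That is the kind of action at a distance that a process-wide static setting invites.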

The XPath Namespaces:

The XPath class does not model the normal way that XPaths are used in regular practice. Most libraries (all the ones I have investigated) allow you to compile the XPath expression. Once the expression is compiled, the only thing that can change is the value attached to variables.

The JDOM XPath class allows you to change variables, as well as Namespaces (because the code uses the Context object at evaluate time to 'augment' the namespace context with the in-scope namespaces of the context).

The actual code only 'appends' to the namespace context (adding prefixes that were not already defined) but this makes no sense because it would only be useful if the expression uses these new namespace prefixes... and that means that the expression must have been compiled with missing namespace prefixes, which is absurd.

Thus, using the namespaces on the context to 'augment' the compiled XPath expression is broken, or useless.
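For comparison, here is a sketch of the "compile once, change only variables" model using the JDK's own javax.xml.xpath API rather than JDOM. The class name `CompileOnceDemo`, the sample document and the `urn:demo` namespace are made up for illustration. The document binds the namespace to the prefix `x`, while the expression uses the prefix `p`, which is declared before compilation.

```java
import java.io.StringReader;
import java.util.Collections;
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;
import javax.xml.XMLConstants;
import javax.xml.namespace.NamespaceContext;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpression;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class CompileOnceDemo {
    private static final String XML =
        "<x:root xmlns:x='urn:demo'><x:item n='1'/><x:item n='5'/><x:item n='9'/></x:root>";

    /** Counts items with @n above each threshold, reusing a single compiled expression. */
    public static double[] countAbove(double... thresholds) {
        try {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            dbf.setNamespaceAware(true);
            Document doc = dbf.newDocumentBuilder().parse(new InputSource(new StringReader(XML)));

            XPath xpath = XPathFactory.newInstance().newXPath();
            // Prefix bindings belong to the expression and are supplied before compilation.
            xpath.setNamespaceContext(new NamespaceContext() {
                @Override public String getNamespaceURI(String prefix) {
                    return "p".equals(prefix) ? "urn:demo" : XMLConstants.NULL_NS_URI;
                }
                @Override public String getPrefix(String uri) {
                    return "urn:demo".equals(uri) ? "p" : null;
                }
                @Override public Iterator<String> getPrefixes(String uri) {
                    String prefix = getPrefix(uri);
                    return prefix == null
                            ? Collections.<String>emptyIterator()
                            : Collections.singleton(prefix).iterator();
                }
            });
            // Variable values are looked up when the expression is evaluated.
            Map<String, Object> variables = new HashMap<>();
            xpath.setXPathVariableResolver(name -> variables.get(name.getLocalPart()));

            XPathExpression compiled = xpath.compile("count(//p:item[@n > $min])");
            double[] counts = new double[thresholds.length];
            for (int i = 0; i < thresholds.length; i++) {
                variables.put("min", thresholds[i]);   // only the variable changes
                counts[i] = (Double) compiled.evaluate(doc, XPathConstants.NUMBER);
            }
            return counts;
        } catch (Exception e) {
            throw new IllegalStateException("XPath demo failed", e);
        }
    }

    public static void main(String[] args) {
        for (double count : countAbove(0, 4, 9)) {
            System.out.println(count);
        }
    }
}
```

The prefixes used in the document itself play no part in the match; only the namespace URI does, which is why augmenting a compiled expression with the context node's prefixes adds nothing.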

The XPath string, boolean, and number value:

The XPath class has the functions: valueOf(), numberValueOf(), and booleanValueOf(). These functions relate to the fact that XPath expressions can return values of these types. But, the XPath-way to do functions like this would be to use the appropriate XPath function: string(), number(), or boolean(). These JDOM XPath methods are just 'sugar functions' that closely resemble functionality that is built in to the Jaxen library only.

In a generic-typed world, it makes sense to remove these functions and replace them with appropriately typed generic functions. Additionally, XPath 2.0 has a multitude of return types, not just String, Double, and Boolean.
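The XPath functions string(), number() and boolean() are part of XPath 1.0 itself, so they work in any conforming library. The sketch below shows them with the JDK's javax.xml.xpath API as a stand-in for a JDOM back-end; the class name `XPathCoercionDemo`, its helper methods and the sample document are assumptions made for illustration.

```java
import java.io.StringReader;
import javax.xml.namespace.QName;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class XPathCoercionDemo {
    private static final String XML = "<order><id>A17</id><qty>3</qty><rush/></order>";

    // Parses the sample document and evaluates one expression with the requested result type.
    private static Object eval(String expression, QName type) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(XML)));
            return XPathFactory.newInstance().newXPath().evaluate(expression, doc, type);
        } catch (Exception e) {
            throw new IllegalStateException("Evaluation failed: " + expression, e);
        }
    }

    // The conversion is expressed inside the XPath expression itself.
    public static String asString(String path) {
        return (String) eval("string(" + path + ")", XPathConstants.STRING);
    }

    public static double asNumber(String path) {
        return (Double) eval("number(" + path + ")", XPathConstants.NUMBER);
    }

    public static boolean asBoolean(String path) {
        return (Boolean) eval("boolean(" + path + ")", XPathConstants.BOOLEAN);
    }

    public static void main(String[] args) {
        System.out.println(asString("/order/id"));
        System.out.println(asNumber("/order/qty"));
        System.out.println(asBoolean("/order/rush"));
        System.out.println(asBoolean("/order/missing"));
    }
}
```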

Serialization:

JDOM XPath claims to be serializable, but the code only serializes the expression, not the namespaces and not the variables. It does not even serialize the factory class used to create the XPath.

Since the variables cannot be serialized properly (we have no idea what Object values are set, or whether they are serializable), it is not possible to serialize the XPath itself effectively.

As a result, the serialization is very broken.
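The variable problem is easy to demonstrate in isolation. In this sketch, `SerializationHazard` and `SketchXPath` are hypothetical stand-ins (not JDOM classes) for any serializable object that carries a map of arbitrary Object variable values.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.NotSerializableException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.HashMap;
import java.util.Map;

public class SerializationHazard {
    /** Hypothetical stand-in for an XPath object that carries variables. */
    public static class SketchXPath implements Serializable {
        private static final long serialVersionUID = 1L;
        final String expression;
        final Map<String, Object> variables = new HashMap<>();

        SketchXPath(String expression) {
            this.expression = expression;
        }
    }

    /** Returns true if the whole object graph can be written with Java serialization. */
    public static boolean canSerialize(Object value) {
        try {
            ObjectOutputStream out = new ObjectOutputStream(new ByteArrayOutputStream());
            out.writeObject(value);
            return true;
        } catch (NotSerializableException e) {
            return false;
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Builds a sketch XPath with one variable and reports whether it serializes. */
    public static boolean serializesWith(Object variableValue) {
        SketchXPath xpath = new SketchXPath("//item[@n > $min]");
        xpath.variables.put("min", variableValue);
        return canSerialize(xpath);
    }

    public static void main(String[] args) {
        System.out.println(serializesWith(4.0));          // a Double is Serializable
        System.out.println(serializesWith(new Object())); // a plain Object is not
    }
}
```

Whether serialization succeeds depends entirely on what the caller happened to store as a variable, so the class cannot make a reliable promise either way.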

Exceptions:

The XPath class throws JDOMException (a checked exception) whenever a problem is encountered in the XPath processing. This exception requires try/catch/throws logic to process.

This issue is subject to debate, and there are pro/con tradeoffs in whatever system is used, but the model of java.util.regex.* (Pattern/Matcher) seems to be a close analogy to what XPath processing should be like.

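Note that the native Java XPath API (javax.xml.xpath.*) takes the other route: its compile and evaluate methods throw XPathExpressionException, which is a checked exception. The unchecked model is the one used by java.util.regex.*, where a malformed pattern raises PatternSyntaxException.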
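As a reminder of what the Pattern/Matcher model looks like in practice, here is a small sketch using only java.util.regex. The class `RegexAnalogy` and its methods are invented for this page. The pattern is compiled once and reused, and a malformed pattern is reported with the unchecked PatternSyntaxException, so callers only need a try/catch when they actually expect bad input.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

public class RegexAnalogy {
    // Compiled once, then reused against many inputs.
    private static final Pattern NUMBER = Pattern.compile("\\d+");

    /** Returns the first run of digits in the input, or null if there is none. */
    public static String firstNumber(String input) {
        Matcher matcher = NUMBER.matcher(input);
        return matcher.find() ? matcher.group() : null;
    }

    /** Reports whether a regex compiles; the exception is unchecked, so catching it is optional. */
    public static boolean isValidPattern(String regex) {
        try {
            Pattern.compile(regex);
            return true;
        } catch (PatternSyntaxException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(firstNumber("order 42 of 7"));
        System.out.println(firstNumber("no digits"));
        System.out.println(isValidPattern("a+"));
        System.out.println(isValidPattern("("));
    }
}
```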

Conclusion

The XPath API in JDOM is broken, incomplete, too tied to Jaxen, and not easy to use.

What does JDOM2 do differently

JDOM2 replaces XPath with: XPathFactory, XPathExpression, XPathBuilder, and XPathDiagnostic. A new feature in JDOM 2.0.0 is the XPathHelper class which has static methods that build XPath queries based on existing JDOM content. In addition, there are some abstract classes that make library-specific implementations easier to build.

XPathFactory

An XPathFactory instance is the user-facing API to a specific implementation of an XPath library. For example there is a Jaxen-specific XPathFactory.

XPathFactory provides static methods to access a default XPath library (instance()), and also provides a static means to create instances of non-default XPath library implementations (newInstance(factory-name)).

The factory is then used to compile XPathExpression instances.

XPathExpression

This is the new core concept of an XPath in JDOM2. The XPathExpression is evaluated against a JDOM context in order to get an XPath result.

XPathExpression has a generic type. It only returns content that matches its generic type. The generic type is set by creating the XPathExpression with an appropriately typed Filter.

This generic typing imposes behaviour on XPathExpression that 'goes beyond' what regular XPath libraries do. For example, the XPathExpression "//node()" returns every node in the document. If this were used as the expression for an XPathExpression<Element> then the result would only contain the Elements. If you want to get everything back then you should use a pass-through filter, but that can return only Object generic types (XPathExpression<Object>).

This additional level of filtering of XPath results is bound to raise issues when the results people get from a query are inconsistent with their expectations. People will wonder whether the XPath expression never selected the content, or whether the Filter is removing selected content.

The role of XPathDiagnostic is to give the user access to the XPath result data in both its raw and its filtered state, which makes it (relatively) easy to debug where data is filtered.
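The interplay between a typed filter and the raw result can be modelled in a few lines of plain Java. This is only a conceptual sketch and does not use the JDOM2 classes: `FilterSketch`, `TypeFilter`, `Diagnosis` and `diagnose` are hypothetical names, Strings stand in for non-Element nodes, and Integers stand in for Elements.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class FilterSketch {
    /** Hypothetical typed filter: returns the value cast to T, or null if it does not match. */
    public static final class TypeFilter<T> {
        private final Class<T> type;

        public TypeFilter(Class<T> type) {
            this.type = type;
        }

        public T filter(Object candidate) {
            return type.isInstance(candidate) ? type.cast(candidate) : null;
        }
    }

    /** Hypothetical diagnostic record: raw results, accepted results, rejected results. */
    public static final class Diagnosis<T> {
        public final List<Object> raw;
        public final List<T> accepted;
        public final List<Object> rejected;

        Diagnosis(List<Object> raw, List<T> accepted, List<Object> rejected) {
            this.raw = Collections.unmodifiableList(raw);
            this.accepted = Collections.unmodifiableList(accepted);
            this.rejected = Collections.unmodifiableList(rejected);
        }
    }

    /** Applies the filter to every raw result and records what was kept and what was dropped. */
    public static <T> Diagnosis<T> diagnose(List<?> rawResults, TypeFilter<T> filter) {
        List<Object> raw = new ArrayList<>(rawResults);
        List<T> accepted = new ArrayList<>();
        List<Object> rejected = new ArrayList<>();
        for (Object candidate : raw) {
            T match = filter.filter(candidate);
            if (match != null) {
                accepted.add(match);
            } else {
                rejected.add(candidate);
            }
        }
        return new Diagnosis<>(raw, accepted, rejected);
    }

    public static void main(String[] args) {
        List<Object> raw = Arrays.<Object>asList("text", 1, "more text", 7);
        Diagnosis<Integer> d = diagnose(raw, new TypeFilter<>(Integer.class));
        System.out.println(d.raw);
        System.out.println(d.accepted);
        System.out.println(d.rejected);
    }
}
```

Seeing the rejected list next to the raw list answers the question raised above: if an item appears in the raw results but not in the accepted ones, the filter removed it; if it appears in neither, the expression never selected it.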

XPathBuilder

Finally, compiling an XPathExpression requires the simultaneous availability of the core expression, its namespace context, its variable context, and the Filter. These items may be cumbersome and bulky.

The XPathBuilder allows for the management of this information in an intermediate stage, allowing the namespace and variable contexts to be built up. When the information is all accumulated in the XPathBuilder you can submit the builder to an XPathFactory to be compiled.

This allows for an alternative way to compile an XPathExpression that may (depending on circumstances) be easier to control and manage than using the XPathFactory.compile(...) methods directly.

XPathHelper

XPathHelper is a new class (ported from JDOM 1.x's contrib area, and then substantially rewritten) that allows users to express the location of any JDOM content as an XPath query. You can get the full path to the content, or you can get the path relative to some other JDOM content.

Abstract Classes

JDOM2 provides an AbstractXPathExpression class that implements all the Filtering logic, and validates the input data. This provides a convenient base on which to build a library-specific compiled expression.

JDOM2 provides a useful implementation of the XPathDiagnostic interface called XPathDiagnosticImpl. This class is a full implementation that not only has a useful toString() method, but also stores the information internally in a way that makes inspection of the data easy from within a Java debugger.

Putting it all together

JDOM 2.x XPaths are very different from JDOM 1.x. The JDOM2 XPath API:

  1. always compiles expressions before evaluating them
  2. is never serializable
  3. requires all referenced Namespaces to be declared when the expression is compiled, not when it is evaluated
  4. is not 'sensitive' to the namespaces on the context node at evaluation time
  5. throws IllegalArgumentException, IllegalStateException, and NullPointerException, and never throws JDOMException
  6. is able to have multiple XPath back-end implementations operating simultaneously
  7. is able to support XPath 2.0 libraries because it only uses the simplest functional entry points; any special variable and return values can be handled with appropriate filtering on the results

Example - Simple values

Note how the return type is Element, not Object:

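// Needs org.jdom2.Element, org.jdom2.filter.Filters,
// org.jdom2.xpath.XPathExpression and org.jdom2.xpath.XPathFactory.
// 'document' is an existing org.jdom2.Document.
XPathExpression<Element> xpath =
    XPathFactory.instance().compile("/path/to/node", Filters.element());
// evaluateFirst returns only the first matching Element, hence the null check.
Element emt = xpath.evaluateFirst(document);
if (emt != null) {
    System.out.println("XPath has result: " + emt.getName());
}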

Example - List results

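// Also needs java.util.List; 'document' is an existing org.jdom2.Document.
XPathExpression<Element> xpath =
    XPathFactory.instance().compile("/path/to/node", Filters.element());
// evaluate returns every result that passes the Element filter.
List<Element> elements = xpath.evaluate(document);
for (Element emt : elements) {
    System.out.println("XPath has result: " + emt.getName());
}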

JDOM 1.x Compatibility

Because the change is so large, it is difficult to maintain compatibility with JDOM 1.x. There are some options, but it seems simplest to deprecate the old JDOM 1.x XPath class and to keep the new JDOM2 API alongside it. This may lead to some confusion when people see the XPath class and think it is still useful, but the deprecation message should make that unlikely.

To maintain as much compatibility as possible though, the 'old' XPath class has been left substantially unchanged from the JDOM 1.x version. It does not support generics, and in fact it suffers from all the problems itemised at the top of this page. Please do not use the XPath class in any JDOM2-enabled code. The new API should be much easier to use. The old XPath class is completely unsupported.

The deprecated XPath still uses Jaxen in the back-end by default, but it no longer uses the 'interface layer' embedded in the Jaxen library. Instead, it implements a replacement interface layer in JDOM. The new interface layer makes the build process simpler (it removes the circular compile dependency). Additionally, the current JDOM layer in Jaxen only supports org.jdom.* classes, not org.jdom2.*.

Of significance in this update is that you can use any JDOM content as the context for an XPath expression, not just Element and Document content. Also, the improved performance of the Iterators in JDOM2 has a very direct impact on XPath performance, so XPath expressions are now much faster. So, even though the XPath class is deprecated in JDOM2, it has significant improvements over the JDOM 1.x implementation.
