What is this library about?
Extensible Markup Language (XML) documents are widely used as
containers for the exchange and storage of arbitrary data in today's
systems. Updates to this data require exchanging the entire
XML document between hosts, unless there is a mechanism that
allows exchanging only the updates to XML documents. This memo
describes a framework utilizing XML Path language (XPath) selectors
with the aid of which a set of patches can be applied to an existing
initial XML document.
This xml-patch library uses the libxml2 library, which has full
XPath 1.0 support. The
xml-patch-ops I-D describes the semantics of the patch operations.
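As a rough illustration of the framework described above, the sketch below applies a small diff document to an initial document. It is written in Python with the standard-library ElementTree module, not in C on libxml2, so it is not this library's code. The names apply_patch, DOC, DIFF and the synthetic "holder" element are made up for the example. It assumes that every selector matches exactly one element, that "add" only appends child elements, and that selectors stay within the small XPath subset ElementTree understands.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample data, invented for this sketch.
DOC = '<doc><note id="a">old</note><note id="b">drop me</note></doc>'
DIFF = """
<diff>
  <replace sel="doc/note[@id='a']"><note id="a">new</note></replace>
  <remove sel="doc/note[@id='b']"/>
  <add sel="doc"><extra/></add>
</diff>
"""


def apply_patch(doc_xml: str, diff_xml: str) -> str:
    """Apply add/remove/replace operations one by one (elements only)."""
    root = ET.fromstring(doc_xml)
    # Synthetic parent standing in for the document node, so that a selector
    # such as "doc/note" can start at the root element.
    holder = ET.Element("holder")
    holder.append(root)

    for op in ET.fromstring(diff_xml):
        sel = op.get("sel")
        matches = holder.findall(sel)
        if len(matches) != 1:
            raise ValueError(f"{sel!r} matched {len(matches)} nodes, expected 1")
        target = matches[0]
        # ElementTree keeps no parent pointers; rebuild the map for each
        # operation because earlier operations may have changed the tree.
        parent_of = {child: parent for parent in holder.iter() for child in parent}

        if op.tag == "add":
            target.extend(list(op))            # append new children last
        elif op.tag == "remove":
            parent_of[target].remove(target)
        elif op.tag == "replace":
            parent = parent_of[target]
            index = list(parent).index(target)
            replacement = op[0]
            replacement.tail = target.tail     # keep surrounding whitespace
            parent.remove(target)
            parent.insert(index, replacement)
        else:
            raise ValueError(f"unknown patch operation {op.tag!r}")

    return ET.tostring(holder[0], encoding="unicode")


print(apply_patch(DOC, DIFF))
```

Only the three small operation elements travel between hosts here, which is the point of the framework: the full document never has to be resent.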
How does the library work?
- Parsing
- First, the initial XML document (to be patched) and the framing XML diff document, which contains the patch operation elements, are parsed with the libxml2 library. The patch operations (add, remove and replace) are then applied one by one to the document to be patched.
- Resolving default namespace
- If the patch operation element has an in-scope default namespace declaration, the "sel" selector values are rewritten so that e.g. a selector value 'root' becomes '*[local-name()="root"][namespace-uri()="default_ns_uri"]'. This is needed because libxml2 strictly follows the XPath 1.0 spec, where the selector "root" locates only an unqualified <root> element, i.e. one that is in no namespace. The xml-patch-ops I-D has adopted a more relaxed model for this case, and its approach is actually similar to the one W3C XML Schema Structures uses for types. XCAP also uses this kind of model.
- Resolving namespace references
- The "sel" selector values may also contain prefixed names. The namespace URIs for these prefixes are found by requesting all the in-scope namespaces of the patch operation element. These prefix/URI pairs are then registered before the XPath expression is evaluated. The XPath evaluation should always locate a single node in the document to be patched. Once the target node is found, the patch operation is carried out. E.g. when adding element(s), the new node(s) are first simply unlinked. Then the namespace references within the new content are recursively moved to namespace references within the document to be patched. This is done by matching nodes with the same namespace URIs. Finally the new node is added.
- Multi-select extension
- In addition to the "sel" selector, an "msel" selector can be used with XPath selections. This is an extension to the xml-patch-ops I-D. It allows e.g. removing multiple attributes or elements with one request. The resulting node-set may thus contain anywhere from one node to an unbounded number of nodes. It can be combined with the "//" (anywhere) selector, which the I-D also disallows for performance and simplicity reasons. During element removals or replacements the library checks for nested elements. It does not produce an error in such a case; instead it simply omits those elements which have parents in the selected node-set.
- Text node patching
- The xml-patch-ops I-D leaves open the possibility of patching text nodes or attribute values once a proper algorithm is available. This library does not (yet?) support this, mostly because of IPR issues.
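The default namespace rewriting described above can be sketched as a plain string transformation. This is a Python illustration, not the library's C code, and the function names (split_steps, qualify_step, qualify_selector) are hypothetical. It assumes simple location paths: only unprefixed element names at the start of a step are rewritten, while prefixed names, attributes, explicit axes, node tests such as text(), and names inside predicates are left untouched.

```python
import re

# An unprefixed XML name at the start of a location step (ASCII-oriented).
_NAME = re.compile(r"[A-Za-z_][\w.-]*")


def split_steps(sel: str) -> list:
    """Split a selector on "/" outside of predicates and string literals."""
    steps, current, depth, quote = [], [], 0, None
    for ch in sel:
        if quote:
            if ch == quote:
                quote = None
        elif ch in "'\"":
            quote = ch
        elif ch == "[":
            depth += 1
        elif ch == "]":
            depth -= 1
        elif ch == "/" and depth == 0:
            steps.append("".join(current))
            current = []
            continue
        current.append(ch)
    steps.append("".join(current))
    return steps


def qualify_step(step: str, default_ns: str) -> str:
    """Rewrite one step if it starts with an unprefixed element name."""
    match = _NAME.match(step)
    if not match:
        return step  # "", ".", "..", "*", "@attr" and similar
    name, rest = match.group(), step[match.end():]
    if rest.startswith((":", "(")):
        return step  # prefixed name, explicit axis, or function/node test
    return f'*[local-name()="{name}"][namespace-uri()="{default_ns}"]{rest}'


def qualify_selector(sel: str, default_ns: str) -> str:
    """Bind unprefixed element steps of `sel` to the default namespace."""
    return "/".join(qualify_step(s, default_ns) for s in split_steps(sel))


print(qualify_selector("root", "default_ns_uri"))
# prints: *[local-name()="root"][namespace-uri()="default_ns_uri"]
```

A string-level rewrite keeps the XPath engine itself untouched, which matches the idea of adapting the selector before it is evaluated. A complete solution would need a real XPath tokenizer.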
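The handling of nested elements in a multi-select removal can be illustrated in the same way. The sketch below is Python on the standard-library ElementTree module, not the library's code, and drop_nested and remove_multi are hypothetical names. The text above speaks of parents in the selected node-set; the sketch assumes this extends to any ancestor. It also uses ElementTree's ".//a" form where an "msel" value would use "//a".

```python
import xml.etree.ElementTree as ET


def drop_nested(selected, parent_of):
    """Keep only the nodes that have no ancestor in the selected node-set."""
    chosen = set(selected)
    kept = []
    for node in selected:
        ancestor = parent_of.get(node)
        # Walk up until a selected ancestor is found or the top is reached.
        while ancestor is not None and ancestor not in chosen:
            ancestor = parent_of.get(ancestor)
        if ancestor is None:
            kept.append(node)
    return kept


def remove_multi(root, msel):
    """Remove every element matched by `msel`; return how many were removed."""
    # ElementTree keeps no parent pointers, so build a child -> parent map.
    parent_of = {child: parent for parent in root.iter() for child in parent}
    kept = drop_nested(root.findall(msel), parent_of)
    for node in kept:
        parent_of[node].remove(node)
    return len(kept)


# Hypothetical sample: the inner <a> is nested inside a selected <a>.
root = ET.fromstring("<doc><a><a/><b/></a><c><a/></c></doc>")
removed = remove_multi(root, ".//a")
print(removed, [child.tag for child in root])
# prints: 2 ['c']
```

Three elements match the selector, but only two removals are performed: the inner element disappears together with its selected ancestor, so treating it separately would be redundant for a removal and ambiguous for a replacement.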
Example logging results
Examples