Friday, April 9, 2010

Limitations of XSugar

In previous posts, I explained how XSugar can be used to transform configuration files to XML and back. Whitespace characters (space, tab, carriage return, etc.) can be preserved by creating special elements to hold them. In a simple round trip, where the XML document is not modified, this works well. But there are issues once the XML document is modified. For example, consider this simple XML document with strict nodes, taken from the students example of XSugar:

<students xmlns="http://studentsRus.org/">
<student sid="19701234">
<name>John Doe</name>
<email>john_doe@notmail.org</email>
</student>
<space1> </space1>
<space2> </space2>
<newline>
</newline>
<student sid="19785678">
<name>Jane Dow</name>
<email>dow@bmail.org</email>
</student>
<space1> </space1>
<space2> </space2>
<newline>
</newline>
</students>

Space and newline elements follow each student record. They are used to restore the exact formatting of the original string. We can then modify an existing value and the whitespace is preserved, which corresponds to a minimal edit.

But say another record is inserted after the first student, before its space1 element. The first record then sees its spacing reset to default values, and the newly inserted element gets the spaces of the first record! This does not correspond to a minimal edit, and is similar to the alignment problem described by Foster et al. in Boomerang.
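The misalignment can be reproduced with a small Python sketch. It is not XSugar itself: the `render` function, the `student` and `ws` helpers, the default spacing, and the target text format (`name (email) sid`) are all assumptions made for illustration. The first record is given unusual spacing (three spaces and a tab) so that the effect is visible, and the inserted student is made up.

```python
import xml.etree.ElementTree as ET

# Assumed defaults, used when a student has no whitespace elements after it.
WS_DEFAULTS = (("space1", " "), ("space2", " "), ("newline", "\n"))

def student(sid, name, email):
    """Build a <student> element (hypothetical helper)."""
    node = ET.Element("student", sid=sid)
    ET.SubElement(node, "name").text = name
    ET.SubElement(node, "email").text = email
    return node

def ws(tag, text):
    """Build a whitespace-holding element (hypothetical helper)."""
    node = ET.Element(tag)
    node.text = text
    return node

def render(root):
    """Render to text; each student consumes the whitespace elements right after it."""
    children = list(root)
    lines = []
    i = 0
    while i < len(children):
        node = children[i]
        if node.tag != "student":
            raise ValueError("whitespace element without a student: " + node.tag)
        i += 1
        spacing = {}
        for tag, default in WS_DEFAULTS:
            if i < len(children) and children[i].tag == tag:
                spacing[tag] = children[i].text or ""
                i += 1
            else:
                spacing[tag] = default
        lines.append(
            node.findtext("name") + spacing["space1"]
            + "(" + node.findtext("email") + ")" + spacing["space2"]
            + node.get("sid") + spacing["newline"]
        )
    return "".join(lines)

root = ET.Element("students")
root.extend([
    student("19701234", "John Doe", "john_doe@notmail.org"),
    ws("space1", "   "), ws("space2", "\t"), ws("newline", "\n"),
    student("19785678", "Jane Dow", "dow@bmail.org"),
    ws("space1", " "), ws("space2", " "), ws("newline", "\n"),
])
before = render(root)

# Insert a new record right after the first student, before its space1 element.
root.insert(1, student("19790000", "Joe Average", "joe@example.org"))
after = render(root)

print(repr(before))
print(repr(after))
```

In `after`, John Doe's line falls back to the default single spaces, while the new record inherits the three spaces and the tab that belonged to him.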

If a student element is removed but its space elements are left behind, this causes a syntax error according to the XSugar stylesheet. If an element is moved, it must be inserted at the right place in the XML, otherwise a syntax error may occur.
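Both failure cases can be illustrated with a rough Python check of the element order. This is only a stand-in for the real XSugar stylesheet: the `RECORDS` pattern and the `follows_grammar` function are assumptions, and the student elements are stripped down to their `sid` attribute to keep the sample short.

```python
import re
import xml.etree.ElementTree as ET

# Assumed grammar: each student may be followed by its own whitespace elements,
# but whitespace elements may never appear without a student in front of them.
RECORDS = re.compile(r"(student (space1 )?(space2 )?(newline )?)*")

def follows_grammar(root):
    """Return True if the sequence of child tags matches the assumed grammar."""
    tags = "".join(child.tag + " " for child in root)
    return RECORDS.fullmatch(tags) is not None

SOURCE = """<students>
<student sid="19701234"/><space1> </space1><space2> </space2><newline>
</newline>
<student sid="19785678"/><space1> </space1><space2> </space2><newline>
</newline>
</students>"""

intact = ET.fromstring(SOURCE)

# Remove the first student but leave its whitespace elements behind.
removed = ET.fromstring(SOURCE)
removed.remove(removed[0])

# Move the second student to the front without moving its whitespace elements.
moved = ET.fromstring(SOURCE)
second = moved[4]
moved.remove(second)
moved.insert(0, second)

print(follows_grammar(intact), follows_grammar(removed), follows_grammar(moved))
```

Only the untouched document passes. The other two leave whitespace elements with no student in front of them, which is the kind of input the stylesheet would reject.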

It may be possible to fix the XML before transforming it back to non-XML, by adding, moving or deleting space elements. The idea of using both the original document and the modified one to produce the updated output is already what a lens does, and is exactly what Augeas does. And now, with the introduction of the recursive lens, Augeas is about to support context-free languages, which opens the door to handling the Apache web server configuration, and many others.
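As a rough idea of what such a pre-transformation fix could look like, here is a Python sketch that drops whitespace elements with no student in front of them and adds default ones where a student lacks them. The `repair` function and the default values are assumptions, not part of XSugar. The sketch only restores a well-formed sequence. It cannot tell which whitespace originally belonged to which record, and that is the part that needs the original document.

```python
import xml.etree.ElementTree as ET

# Assumed default text for each whitespace element, in grammar order.
WS_DEFAULTS = (("space1", " "), ("space2", " "), ("newline", "\n"))

def repair(root):
    """Rebuild the children so every student is followed by space1, space2, newline."""
    old = list(root)
    fixed = []
    i = 0
    while i < len(old):
        node = old[i]
        i += 1
        if node.tag != "student":
            continue  # orphan whitespace element: drop it
        fixed.append(node)
        for tag, default in WS_DEFAULTS:
            if i < len(old) and old[i].tag == tag:
                fixed.append(old[i])  # keep the existing whitespace
                i += 1
            else:
                filler = ET.Element(tag)  # add the missing whitespace
                filler.text = default
                fixed.append(filler)
    for child in old:
        root.remove(child)
    root.extend(fixed)
    return root

# A modified document: a student was inserted before the first record's
# whitespace, and an orphan <space1> is left at the end.
broken = ET.fromstring(
    "<students>"
    "<student sid='1'/><student sid='2'/>"
    "<space1>  </space1><space2> </space2><newline>\n</newline>"
    "<space1> </space1>"
    "</students>"
)
repaired = repair(broken)
print([child.tag for child in repaired])
```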

That's all for now, stay tuned!