Tuesday, June 1, 2010

Square lens for Augeas

Augeas can handle XML-like tags. Here is a simplified example, using the key and del lenses, for the paragraph tag of an HTML document.

let para = [ Util.del_str("<") . key "p" . Util.del_str(">") .
                    store /[a-zA-Z0-9 \r\n\t]*/ .
                    Util.del_str("</p>") ]

That works fine, but there is a gotcha: we have to list every HTML tag as a string rather than describe them with a regexp. Let's create a new version of this lens that processes arbitrary tags.

let tag = [ Util.del_str("<") . key /[a-z]+/ . Util.del_str(">") .
                 store /[a-zA-Z0-9 \r\n\t]*/ .
                 Util.del_str("</") . del /[a-z]+/ "abc" . Util.del_str(">") ]

Let's see what happens in the three cases: get, put and create.

  • The get will accept arbitrary tags and set the node label accordingly 
    • <p>text</p> ---> {"p" = "text"}
  • The put direction, when the label is not updated, yields the correct result
    • {"p" = "text"} ---> <p>text</p>
  • In the create operation, when the label is new, the close tag gets the default value of the del lens, which produces a syntax error
    • <p>text</p> ---> {"p" = "text"} ---> {"b" = "text"} ---> <b>text</abc>

This lens also accepts malformed tags, like "<a>text</p>", where we should throw an error instead.
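To make the three cases concrete, here is a rough Python model of the lens above. It is not Augeas code: the names naive_get and naive_put are made up for this sketch, and the rule "a changed label counts as a create" is my simplified reading of how put falls back to the default of del.

```python
import re

# Hypothetical Python model of the naive `tag` lens; this is not Augeas code.
NAIVE = re.compile(r"<([a-z]+)>([a-zA-Z0-9 \r\n\t]*)</([a-z]+)>")
DEFAULT_CLOSE = "abc"  # the default value passed to `del` in the lens


def naive_get(text):
    """Parse text into a (label, value) node. The close tag is not checked."""
    m = NAIVE.fullmatch(text)
    if m is None:
        raise ValueError("syntax error: " + text)
    return (m.group(1), m.group(2))


def naive_put(node, original=None):
    """Write a node back to text.

    The close tag deleted on get is restored only when the label is
    unchanged. Otherwise the node is treated as new (create) and the
    close tag falls back to the default value.
    """
    label, value = node
    close = DEFAULT_CLOSE
    if original is not None:
        m = NAIVE.fullmatch(original)
        if m is not None and m.group(1) == label:
            close = m.group(3)
    return "<%s>%s</%s>" % (label, value, close)


print(naive_get("<p>text</p>"))                 # ('p', 'text')
print(naive_put(("p", "text"), "<p>text</p>"))  # <p>text</p>
print(naive_put(("b", "text"), "<p>text</p>"))  # <b>text</abc>
print(naive_get("<a>text</p>"))                 # ('a', 'text'), no error
```

The last two lines show both problems: the create case writes the wrong close tag, and the malformed input is parsed without complaint.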

We need the close tag to be linked to the key, and that is exactly what the square lens does. Let's rewrite the example with it. The lens takes two arguments: first, a regexp that describes the tag and behaves as a key; second, a lens that represents what's inside the tag.

let content = Util.del_str(">") . store /[a-zA-Z0-9 \r\n\t]*/ . Util.del_str("<")
let tag = [ Util.del_str("<") . square /[a-z]+/ content . Util.del_str(">") ]

The first difference is that the open and close tags are now related. In the get direction, we can check that the second tag is the same as the first, and so detect syntax errors in the input document. The other difference concerns the create operation in the put direction: we can now copy the key of the node after the content, which yields the correct behavior.

"<p>text</p>" ---> {"p" = "text"} ---> {"b" = "text"} ---> "<b>text</b>"
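The same behavior can be sketched in Python. Again, this is only a model of the idea and not the Augeas implementation; square_get and square_put are hypothetical names chosen for this example.

```python
import re

# Hypothetical Python model of the square-based `tag` lens; not Augeas code.
SQUARE = re.compile(r"<([a-z]+)>([a-zA-Z0-9 \r\n\t]*)</([a-z]+)>")


def square_get(text):
    """Parse text into a (label, value) node, checking that both tags match."""
    m = SQUARE.fullmatch(text)
    if m is None:
        raise ValueError("syntax error: " + text)
    if m.group(1) != m.group(3):
        raise ValueError("mismatched tags: %s / %s" % (m.group(1), m.group(3)))
    return (m.group(1), m.group(2))


def square_put(node):
    """Write a node back to text, copying the key into the close tag."""
    label, value = node
    return "<%s>%s</%s>" % (label, value, label)


print(square_get("<p>text</p>"))   # ('p', 'text')
print(square_put(("b", "text")))   # <b>text</b>
try:
    square_get("<a>text</p>")
except ValueError as err:
    print(err)                     # mismatched tags: a / p
```

Because the close tag is always derived from the key, no default value is needed, so the create case cannot go wrong the way it did before.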

The first lens to benefit from it will be the httpd.conf lens. Stay tuned, the patch is cooking!
