Showing posts with the label noesis.

Wednesday, December 23, 2009

Strict reversibility in XSugar

XSugar is a tool for bidirectional transformations between two file formats. This is particularly useful for providing a common API to configuration files under Linux. For example, here is the result of applying a stylesheet to the /etc/hosts file:

<hosts xmlns="http://usherbrooke.ca/">
<record>
<ipaddr>127.0.0.1</ipaddr>
<canonical>localhost</canonical>
</record>
</hosts>

This file can be converted back to its flat format. But, as you may notice, the original whitespace does not appear in the XML file and will be lost: spacing is reset to a default value. The round trip between the hosts file and the XML format keeps the semantics but loses the formatting. Even if nothing was modified, diff will show changes once the file is written back. Once the spacing has been reset, further round trips yield the identity function, i.e. the strings will be exactly the same.
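The loss of formatting can be illustrated with a small Python sketch. This is not XSugar: `parse_hosts` and `write_hosts` are hypothetical helpers that only mimic the round trip, assuming a single space is the default separator.

```python
def parse_hosts(text):
    """Keep only the meaning: a list of (ip, names) records."""
    records = []
    for line in text.splitlines():
        fields = line.split()  # any run of spaces or tabs is a separator
        if fields:
            records.append((fields[0], fields[1:]))
    return records


def write_hosts(records):
    """Write the records back with the assumed default spacing (one space)."""
    return "".join(" ".join([ip] + names) + "\n" for ip, names in records)


original = "127.0.0.1\t\tlocalhost\n"
once = write_hosts(parse_hosts(original))
twice = write_hosts(parse_hosts(once))

print(once == original)  # False: the tabs were replaced by a single space
print(twice == once)     # True: after normalization the round trip is the identity
```

The first round trip changes the file even though no record was edited, which is exactly what diff would report.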

One way to overcome this problem is to add to the XML every element that would otherwise be lost. This can be done by labeling terminal elements and adding the corresponding nodes to the XML part of the stylesheet. For example, this rule loses the optional "a" header:

A = [a]*
X = [x]+
n : [A] [X x] "z" = <x> [X x] </x>

Providing the input "aaaaxxz" gives the following XML:
<x>xx</x>
Converting it back to the non-XML format yields the string "xxz". Since the empty string matches "[a]*", it is used as the default value for the dropped "a" header.

Now, let's label the terminal "A":

A = [a]*
X = [x]+
n : [A a] [X x] "z" = <x> [X x] <a> [A a] </a> </x>

Now we get the following XML:
<x>
  xx
  <a>aaaa</a>
</x>

Converting it back to the non-XML format yields "aaaaxxz", exactly the same string as the original input.
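The difference between the two rules can be mimicked with plain regular expressions. The Python sketch below is only an analogy under stated assumptions: a dict stands in for the XML element, and the four function names are hypothetical, not part of XSugar.

```python
import re


def to_xml_lossy(text):
    # n : [A] [X x] "z" = <x> [X x] </x>   (the "a" run is matched but not kept)
    m = re.fullmatch(r"a*(x+)z", text)
    return {"x": m.group(1)}


def from_xml_lossy(tree):
    # Nothing is known about A, so its default (the empty string) is used.
    return "" + tree["x"] + "z"


def to_xml_strict(text):
    # n : [A a] [X x] "z" = <x> [X x] <a> [A a] </a> </x>
    m = re.fullmatch(r"(a*)(x+)z", text)
    return {"x": m.group(2), "a": m.group(1)}


def from_xml_strict(tree):
    return tree["a"] + tree["x"] + "z"


source = "aaaaxxz"
print(from_xml_lossy(to_xml_lossy(source)))    # xxz
print(from_xml_strict(to_xml_strict(source)))  # aaaaxxz
```

Only the second pair of functions returns the input unchanged, because the captured "a" run travels through the intermediate representation.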

Preserving the semantics of the file is the simple bidirectionality property. If, in addition, the stylesheet preserves the concrete representation of the input, I call this strict bidirectionality.

Strict bidirectionality can be achieved by labeling every unlabeled terminal and adding the corresponding element to the XML part. I wrote a small prototype of this algorithm, which augments a given stylesheet accordingly. Hence, any stylesheet can be made strictly bidirectional.
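The augmentation step could look roughly like the following Python sketch. It is not the prototype mentioned above: the `Terminal` and `Rule` classes and the `make_strict` function are hypothetical, and the rule representation is a simplification invented for this illustration.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Terminal:
    name: str                    # regular expression terminal, e.g. "A"
    label: Optional[str] = None  # None means the matched text is dropped


@dataclass
class Rule:
    left: List[object]           # Terminal items or literal strings such as "z"
    xml_children: List[str] = field(default_factory=list)


def make_strict(rule):
    """Label every unlabeled terminal and mirror it on the XML side."""
    used = {item.label for item in rule.left
            if isinstance(item, Terminal) and item.label}
    for item in rule.left:
        if isinstance(item, Terminal) and item.label is None:
            label = item.name.lower()
            n = 1
            while label in used:  # avoid clashing with an existing label
                n += 1
                label = f"{item.name.lower()}{n}"
            used.add(label)
            item.label = label
            rule.xml_children.append(
                f"<{label}> [{item.name} {label}] </{label}>")
    return rule


# n : [A] [X x] "z" = <x> [X x] </x>
rule = Rule(left=[Terminal("A"), Terminal("X", "x"), "z"],
            xml_children=["[X x]"])
make_strict(rule)
print(rule.xml_children)  # ['[X x]', '<a> [A a] </a>']
```

Literal strings such as "z" need no label, since they are constant and can always be regenerated.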

This raises the question: can we statically verify that a stylesheet is strictly bidirectional? Fortunately, yes, and it's really simple. We have to do the basic check that the stylesheet is bidirectional, and then verify that all regular expression terminals are labeled. This way, we are sure that every variable concrete string will be represented in the XML.
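The second half of that check, verifying that every regular expression terminal is labeled, can be sketched in a few lines of Python. This is a toy text scan, not XSugar's parser: `unlabeled_terminals` is a hypothetical helper, it assumes one rule per string with no "=" or ":" inside literals, and the basic bidirectionality check is not reproduced here.

```python
import re

# Matches "[Name]" (unlabeled) and "[Name label]" (labeled) items.
ITEM = re.compile(r"\[\s*(\w+)(?:\s+(\w+))?\s*\]")


def unlabeled_terminals(rule, regex_terminals):
    """Return the regex terminals used without a label on the non-XML side."""
    left, _, _ = rule.partition("=")   # text before "=" is the non-XML side
    left = left.split(":", 1)[1]       # drop the rule name
    return [name for name, label in ITEM.findall(left)
            if name in regex_terminals and not label]


terminals = {"A", "X"}
lossy = 'n : [A] [X x] "z" = <x> [X x] </x>'
strict = 'n : [A a] [X x] "z" = <x> [X x] <a> [A a] </a> </x>'

print(unlabeled_terminals(lossy, terminals))   # ['A']
print(unlabeled_terminals(strict, terminals))  # []
```

An empty result for every rule means no variable text can be silently dropped on the way to XML.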

Automatic strict bidirectionality for stylesheets, together with static validation of this property, will be useful to provide the behavior a system administrator would expect from a tool that modifies configuration files under Linux. Let's go on!

Wednesday, September 30, 2009

Java and system administration

Java is a wonderful platform for software development: it is mature, stable and full-featured. Like my coworkers, I used to have a bias against Java for many reasons, which led me to dismiss it a few times. Now that I am working with Java for Noesis, I have had time to revisit this technology. Here are those biases, deconstructed.
  • The JVM is slow: We cannot know in advance whether a given application will run slower or faster in C++ or in Java, and I don't want to dig into benchmarking. So many factors are involved that either one may come out the winner. We do know that running a Java program carries some overhead, but for many applications raw performance is not a primary concern. In scientific computing, some code was not optimized and took a huge amount of time to run, and optimizing it led to large performance improvements, larger than a technology switch would bring. Also, since large libraries are available for Java, reuse reduces development time, which may represent a large portion of the total time spent. Startup time is annoying, but it is mostly paid on the first launch and can be reduced by read-ahead caching.
  • Java is memory hungry: Java applications use a lot of memory because they do a lot of things.
  • Java was proprietary: I used to exclude Java because it was not open source, which meant that any open source Java software had a non-free dependency. Free JVMs and libraries were available, but you were on your own if you used them. Now that the JDK is open source, we are free to use it.
  • Java is complicated: there is a learning curve, but it's not so bad. Eclipse helps a lot to reduce the burden, thanks to its type checking. Compiling with "ant" is so much easier than the autotools madness! And since CDBS supports ant, it's easy to package a Java program.
  • Java is huge: Installing a headless JRE on a minimal Ubuntu requires about 93 MB of disk space. I agree that this is huge if you run only one application. But if we mainly ran Java software, this stack would be shared by all applications, and here too reuse lowers the disk requirement. For Noesis, though, it's a good question to ask: will system administrators be willing to install about one hundred MB of stuff for it? 100 MB of disk space costs about 1 cent today, or say 5 cents with backups and management time. Well, I guess it would be the best investment we can make with a nickel. Still, I will have to work hard to convince system administrators to install a JRE on a Linux virtual server that takes less than a few hundred MB, because the JRE would be a large proportion of the server's install size.
  • What about embedded devices? It could be of practical interest to get Noesis running on a small device. There is JME, but since it is a stripped-down platform, I don't know for sure whether something would break.
It's fortunate that XSugar is written in Java, because Java is definitely an important technology in the whole picture, and a new arrow in my quiver.