Wednesday, 30 September 2009

Java and system administration

Java is a wonderful platform for software development: it's mature, stable and full-featured. Like my coworkers, I was biased against Java for many reasons in the past, which led me to dismiss it a few times. Now that I'm working with Java for Noesis, I have had time to revisit this technology. Here are those biases, deconstructed.
  • The JVM is slow: We can't know in advance whether a given application will run slower or faster in C++ or in Java, and I don't want to dig into benchmarking. So many factors are involved that either one can come out ahead. We do know that running a Java program carries some overhead, but for many applications raw throughput is not a primary concern. In scientific computing, I have seen unoptimized code that took a huge time to run, and optimizing it brought a greater performance improvement than a technology switch would have. Also, since large libraries are available for Java, reuse reduces development time, and development time may represent a large portion of the total time spent getting a result. Startup time is annoying, but the cost is paid mostly on the first launch, and it can be reduced by read-ahead caching.
  • Java is memory hungry: Java applications use a lot of memory because they do a lot of things.
  • Java was proprietary: I used to exclude Java because it was not open source, which meant that any open source Java software had a non-free dependency. Free JVMs and libraries were available, but you were on your own if you used them. Now that the JDK is open source, we are free to use it.
  • Java is complicated: There is a learning curve, but it's not that bad. Eclipse helps a lot by type checking the code as you write it. Compiling with "ant" is so much easier than the autotools madness! And since CDBS supports ant, it's easy to package a Java program.
  • Java is huge: Installing the headless JRE on a minimal Ubuntu requires about 93 MB of disk space. I agree that's huge if you run only one application. But if we mainly ran Java software, this stack would be shared by all applications, so here too reuse lowers the disk requirement. For Noesis it's still a good question to ask: will system administrators be willing to install about one hundred MB of stuff for it? 100 MB of disk space costs about 1 cent today, or say 5 cents with backups and management time. Well, I guess it would be the best investment we can make with a nickel. Still, I will have to work hard to convince system administrators to install a JRE on a Linux virtual server that takes less than a few hundred MB, because the JRE is a large proportion of the server's install size.
  • What about embedded devices? It could be of practical interest to get Noesis running on a small device. There is Java ME (JME), but since it's a stripped-down platform, I don't know whether something would break.
It's lucky that XSugar was written in Java, because it's definitely an important technology in the whole picture, and a new arrow in my quiver.

Monday, 28 September 2009

Brics projects packages

New packages from the Brics projects are now available for Ubuntu. This is a snapshot of the dependencies required for bidirectional configuration file parsing. The main packages are:
  • automaton : library for finite automata
  • grammar : library and utility for grammar validation and handling
  • xsugar : main program for bidirectional transformations
You can find them on Launchpad:

http://launchpad.net/~francis-giraldeau/+archive/noesis

Take care, names are subject to change. Have fun!

Thursday, 10 September 2009

Configuration as XML

Augeas provides an XPath-like interface and API to access and modify configuration files. But there are a few limitations:
  • Augeas is limited to regular grammars, so it can't parse nested structured documents. I searched to see whether regular approximations of context-free grammars could be used, but in this case it's not possible. A regular expression parser uses only a finite automaton, and for a general context-free grammar we need a stack to keep track of the nesting level of the document.
  • Augeas doesn't produce a complete XML file from a configuration file, and hence can't take advantage of all the XML processing libraries available.
We need a parser that is able to handle general context-free grammars. Grammars should be easy to write, and LR or LALR parsers are too hard to use, since the grammar must be written to avoid ambiguities and some types of recursion. The Earley parsing algorithm is able to parse general context-free grammars without those restrictions.
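
To make the point about the stack concrete, here is a minimal sketch of a nesting check for brace-delimited blocks. It is only an illustration: the class name NestingCheck, the method isBalanced and the choice of '{' and '}' as delimiters are assumptions made for this example, and it is not how Augeas or XSugar work internally. The stack can grow as deep as the input nests, which is exactly the memory a finite automaton does not have.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical helper, for illustration only.
class NestingCheck {

    // Returns true when every '{' is closed by a later '}' and no '}' appears
    // without an open block. The stack holds the position of each open brace.
    static boolean isBalanced(String text) {
        Deque<Integer> open = new ArrayDeque<>();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c == '{') {
                open.push(i);            // entering a nested block
            } else if (c == '}') {
                if (open.isEmpty()) {
                    return false;        // closing a block that was never opened
                }
                open.pop();              // leaving the innermost block
            }
        }
        return open.isEmpty();           // every opened block must be closed
    }

    public static void main(String[] args) {
        System.out.println(isBalanced("a { b { c } }")); // prints true
        System.out.println(isBalanced("a { b { c }"));   // prints false
    }
}
```

With a single kind of bracket a counter would do the same job, but the counter is unbounded too, so the conclusion is the same: a fixed number of states cannot follow arbitrarily deep nesting.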

The XSugar project is exactly what I was looking for. First, it implements a tokenless Earley parser that has relatively acceptable performance on config files. XSugar is able to do bidirectional transformations between a concrete file and an XML document. There are a few issues that must be resolved.
  • The bidirectional relation doesn't preserve the formatting of the config file. The reversibility property is hence approximate: a round trip yields the same result except for spaces and indentation. You have to preserve formatting manually, and this can be tedious. There is no way to verify that the stylesheet captures every character of the input. Config files require a strict round trip, character for character.
  • Ignorable elements, such as the nodes that keep spaces and indentation, have to be present in the XML file, otherwise the unparsing fails. The problem is that clients that modify the XML have to add formatting nodes. One of the main benefits of using XML was to abstract formatting away, and this requirement on the XML breaks that abstraction. Ignorable elements must be optional, and when they are not provided, a default value should be used.
  • The order of elements matters in the XML. If nodes are not provided in the right order, the unparsing fails. The client has to know in which order to provide elements, and it would be better if the client did not have to worry about it.
Those are the main issues I see standing in the way of a new day for configuration management.
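
The last two issues could be handled by a normalization step that runs before unparsing. The sketch below is one possible shape for it, not XSugar code: the class ElementNormalizer, the method normalize and the element names key, sep, value and eol are all hypothetical, and it assumes at most one child element per name. The client supplies only the meaningful elements, in any order; the step puts them in the order the grammar expects and fills in defaults for the ignorable ones.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical pre-unparsing step, for illustration only.
class ElementNormalizer {

    // Returns the elements in the order the grammar expects. A missing
    // element takes its default when one exists; otherwise it is an error.
    static Map<String, String> normalize(List<String> expectedOrder,
                                         Map<String, String> defaults,
                                         Map<String, String> supplied) {
        Map<String, String> result = new LinkedHashMap<>(); // keeps insertion order
        for (String name : expectedOrder) {
            if (supplied.containsKey(name)) {
                result.put(name, supplied.get(name));
            } else if (defaults.containsKey(name)) {
                result.put(name, defaults.get(name));       // ignorable element
            } else {
                throw new IllegalArgumentException("missing required element: " + name);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> order = Arrays.asList("key", "sep", "value", "eol");

        Map<String, String> defaults = new HashMap<>();
        defaults.put("sep", " = ");   // default formatting between key and value
        defaults.put("eol", "\n");    // default line ending

        Map<String, String> supplied = new HashMap<>();
        supplied.put("value", "22");  // given out of order, without formatting nodes
        supplied.put("key", "Port");

        Map<String, String> normalized = normalize(order, defaults, supplied);
        System.out.print(String.join("", normalized.values()));
    }
}
```

This does nothing for the first issue: formatting that was present in the original file still has to be captured somewhere if the round trip is to be exact.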

To test those concepts, I created a new project called Noesis. It means "insight", and I thought it would be meaningful for the current project. More news soon.