Update 2011-30-09: I had not noticed the wonderful CKAN wiki, which explains several of the aspects I was looking for.
As I previously announced, I will write a GUI for the datapkg tool during my 10-week internship at FBK.
Today I am starting to study its source code. It is very well written (a couple of hacks here and there, but it is highly object-oriented and makes heavy use of design patterns), but it lacks internal documentation.
The datapkg command-line interface makes use of the whole architecture, which, despite a good separation of concerns, does not seem to have been designed to be used as a library. Although it is a small program, it is quite complex.
For example, as I understand it, a simple search for a dataset by name involves creating a Command object (from the command line), which creates a Specification object, which in turn creates an Index object. The chain goes on from there, but I stopped looking. It is well organized, but difficult for an external developer to understand.
This is not a criticism of the datapkg code. On the contrary, I am impressed by the quality of its code organization and the elegance of its object-oriented design. However, since it was not written to be used as a library, it lacks explanations of its architecture and design, and it has no (internal) API.
Therefore, I plan to write a small, simple library that hides the complexity of the program. Before that, though, I strongly believe that some software engineering practices are needed to do the job properly. Over the next few days I will do some reverse engineering and sketch an initial architecture of datapkg. Then I will draw some diagrams to help me understand which components are called.
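To make the idea of a "small, simple library" a bit more concrete, here is a minimal sketch of a facade in Python, the language datapkg is written in. The classes are hypothetical stand-ins named after the objects in the search chain described above (Command, Specification, Index); they do not reproduce datapkg's real classes, constructors, or methods, which I have not mapped yet. The only point is that one function call can hide the whole chain from the GUI.

```python
# Hypothetical stand-ins: NOT datapkg's real classes or signatures.
# They only mimic the Command -> Specification -> Index chain.


class Index:
    """Hypothetical index holding a list of package names."""

    def __init__(self, package_names):
        self._names = list(package_names)

    def search(self, query):
        # Naive case-insensitive substring match on package names.
        q = query.lower()
        return [name for name in self._names if q in name.lower()]


class Specification:
    """Hypothetical spec: knows where the index lives and builds it."""

    def __init__(self, source):
        self.source = source  # here simply an in-memory list of names

    def get_index(self):
        return Index(self.source)


class SearchCommand:
    """Hypothetical command object, as the CLI would create it."""

    def __init__(self, source):
        self.source = source

    def execute(self, query):
        spec = Specification(self.source)  # the Command creates a Specification,
        index = spec.get_index()           # which in turn creates an Index
        return index.search(query)


def search(query, source):
    """Facade: a single call that hides the whole chain from the caller."""
    return SearchCommand(source).execute(query)
```

For example, `search("gdp", ["world-gdp", "population", "GDP-europe"])` returns `["world-gdp", "GDP-europe"]`. A GUI would only ever call functions like `search`, so if the internals of datapkg change, only the facade needs updating.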
Coding can wait.