First impressions of datapkg, and how to proceed


Update 2011-30-09: I had not noticed the wonderful CKAN wiki, which explains several aspects I was looking for.

As I previously announced, I will write a GUI for the datapkg tool during my 10-week internship at FBK. Today I am starting to study its source code. It is very well written: there are a couple of hacks here and there, but it is highly object-oriented and makes heavy use of design patterns. However, it lacks internal documentation. The datapkg command-line interface makes use of the whole architecture, which, despite a great separation of concerns, does not seem to be designed as a library. Even though it is a small program, it is quite complex.

For example, as far as I understand, a simple search for a data-set by name involves creating a Command object (from the command line), which creates a Specification object, which in turn creates an Index object. The chain may continue from there; I stopped looking. It is well organised but difficult for an external developer to understand.
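To make that chain concrete, here is a rough sketch of how such a delegation might look. The class names follow my description above, but the bodies, constructor arguments, the "index-type://query" spec format and the sample package names are all assumptions made up for illustration; this is not datapkg's real code or API.

```python
# Hypothetical stand-ins: these classes only mirror the call chain described
# above. They are NOT datapkg's real classes or signatures.

class Index:
    """Holds package names and answers searches by name."""

    def __init__(self, packages):
        self.packages = packages

    def search(self, name):
        # Substring match on the package name.
        return [p for p in self.packages if name in p]


class Specification:
    """Parses a spec string and builds the Index it refers to."""

    def __init__(self, spec):
        # Assumed format for this sketch: "<index-type>://<query>"
        self.index_type, _, self.query = spec.partition("://")

    def index(self):
        # A real tool would choose an Index implementation from index_type;
        # here a fixed in-memory list stands in for it.
        return Index(["iris-dataset", "world-gdp", "iris-images"])


class SearchCommand:
    """Created from command-line arguments; drives the other objects."""

    def __init__(self, argv):
        self.spec = Specification(argv[0])

    def run(self):
        return self.spec.index().search(self.spec.query)


results = SearchCommand(["ckan://iris"]).run()
print(results)  # ['iris-dataset', 'iris-images']
```

Even in this toy version, a caller who only wants "search by name" has to know about three objects and the order in which they are created.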

I am not criticizing the datapkg code negatively. Indeed, I am impressed with the quality of the code and the beauty of its object-oriented organization. Since it was not written to be used as a library, it lacks some explanation of its architecture and design. An (internal) API is also missing.

Therefore, I think I will write a tiny, simple library to help me hide the complexity of the program. Before that, I strongly believe that some Software Engineering practices are needed to help me do my job. In the next few days I will do some reverse engineering and draw an initial architecture of datapkg. Then I will draw some diagrams to help me understand which components are called.
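As a first idea of what that tiny library could look like, here is a minimal facade sketch. Everything in it is hypothetical: `PackageFinder`, `fake_backend` and the sample catalogue are names I made up, and the backend is a stand-in for whatever datapkg machinery the reverse engineering will reveal.

```python
class PackageFinder:
    """Hypothetical facade: one method in front of a multi-step internal chain."""

    def __init__(self, backend):
        # backend: any callable taking (index_name, query) and returning
        # a list of package names. The GUI never sees anything else.
        self._backend = backend

    def search(self, query, index="ckan"):
        # Sort so the GUI always receives results in a stable order.
        return sorted(self._backend(index, query))


def fake_backend(index_name, query):
    # Stand-in for the internal machinery; this is not datapkg code.
    catalogue = {"ckan": ["world-gdp", "iris-images", "iris-dataset"]}
    return [name for name in catalogue.get(index_name, []) if query in name]


finder = PackageFinder(fake_backend)
print(finder.search("iris"))  # ['iris-dataset', 'iris-images']
```

The point of passing the backend in is that the GUI can be developed and tested against a fake one while the real wiring to datapkg is still being worked out.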

Coding can wait.

I do not use a commenting system anymore, but I would be glad to read your feedback. Feel free to contact me.