Sep 26, 2011

First impressions of datapkg, how to proceed

Update 2011-09-30: I had not noticed the excellent CKAN wiki, which explains several of the aspects I was looking for.

As I previously announced, I will write a GUI for the datapkg tool during my 10-week internship at FBK.
Today I am starting to study its source code. It is very well written: apart from a couple of hacks here and there, it is highly object-oriented and makes heavy use of design patterns. However, it lacks internal documentation.
The datapkg command-line interface relies on the whole architecture, which, despite a good separation of concerns, does not seem to have been designed for use as a library. Although it is a small program, it is quite complex.

For example, as far as I understand, a simple search for a data-set by name involves creating a Command object (from the command line), which creates a Specification object, which in turn creates an Index object. The chain may continue from there; I stopped looking. It is well organized, but difficult for an external developer to understand.
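To make that delegation chain concrete, here is a minimal Python sketch. Only the class names Command, Specification and Index come from the description above; their constructors, the `run` and `search` methods, and the `search_by_name` wrapper are hypothetical stand-ins invented for illustration, not datapkg's actual API. The wrapper shows the kind of one-call entry point a small helper library could offer on top of such a chain.

```python
# Hypothetical stand-ins: only the class names Command, Specification and
# Index come from the post. Constructors and methods are invented to
# illustrate the delegation chain; this is NOT datapkg's real API.


class Index:
    """Stand-in for an index of known data-sets."""

    def __init__(self, names):
        self._names = list(names)

    def search(self, query):
        # Case-insensitive substring match on data-set names.
        q = query.lower()
        return [n for n in self._names if q in n.lower()]


class Specification:
    """Stand-in that builds the Index the search will use."""

    def __init__(self, names):
        self.index = Index(names)


class Command:
    """Stand-in for a command-line command that builds a Specification."""

    def __init__(self, names):
        self.spec = Specification(names)

    def run(self, query):
        # The caller's request travels Command -> Specification -> Index.
        return self.spec.index.search(query)


def search_by_name(query, names):
    """Hypothetical facade: one call hides the whole object chain."""
    return Command(names).run(query)


if __name__ == "__main__":
    datasets = ["gdp-italy", "population-europe", "gdp-world"]
    print(search_by_name("gdp", datasets))
```

A caller of `search_by_name` never needs to know that three objects are created behind the scenes, which is the point of wrapping a well-structured but undocumented architecture.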

I am not criticizing the datapkg code. On the contrary, I am impressed by the quality of its code organization and the elegance of its object-oriented design. Since it was not written to be used as a library, it lacks explanations of its architecture and design, and an (internal) API is also missing.

Therefore, I plan to write a small, simple library to hide the complexity of the program. Before that, I strongly believe that some software engineering practices will help me do my job. In the next few days I will reverse engineer datapkg and sketch an initial architecture. Then I will draw some diagrams to help me understand which components are called.

Coding can wait.

written by dgraziotin

Dr. Daniel Graziotin received his PhD in computer science, software engineering at the Free University of Bozen-Bolzano, Italy. His research interests include human aspects in empirical software engineering with psychological measurements, Web engineering, and open science. He researches, publishes, and reviews for venues in software engineering, human-computer interaction, and psychology. Daniel is the founder of the psychoempirical software engineering discipline and guidelines. He is associate editor at the Journal of Open Research Software, academic editor at the Research Ideas and Outcomes (RIO) journal, and academic editor at the Open Communications in Computer Science journal. He is the local coordinator of the Italian Open science local group for the Open Knowledge Foundation. He is a member of ACM, SIGSOFT, and IEEE.
