JP Kanter edited this page Sep 16, 2020 · 3 revisions

Welcome to the solr2triplestore-tool wiki!

General Usage

As written in the readme.md, this project provides two separate files. The first, main.py, is the actual command line tool, ready to be used in the limited environment the overall project was created for: converting data from an Apache Solr to Linked Open Data. The second, SpchtDescriptorFormat.py, contains an entire Python class with some additional applications; in fact, some functions of the class aren't even used in the implementation but are there nonetheless.

I sincerely hope that the readme is explanation enough, so that I can focus here on how this project was made and how it might be extended. I confess that I wrote parts of it to get more familiar with Python and maybe learn a new thing or two. I still believe I created a somewhat useful tool along the way, so it might be worth pursuing a further extension of the project. That is only possible if the 1000-line class file is described in some way, so that is what I want to do in this wiki.

Design

Thoughts

When "designing" (I kinda refuse to use that word) the SPCHT format, I had some questions about how to handle everything properly. One of the reasons I went through all the hassle of creating this instead of using an already existing system was that I really don't like the way XML files work in general. It feels like a lot of overhead for almost no reason. I intended to write the descriptor file by hand, so having a clunky language defining everything did not seem very useful. On the other hand, there were some design decisions I quickly ran into:

There are multiple key types that use more than one value. While keys like field or type are straightforward, other things need additional data. Take mapping, for example: the map itself is of course a dictionary, a key-value definition. But there is also some config information, and in the end everything falls under the definition of "map". I went for keys like mapping and mapping_setting, while a nested dictionary might have done the trick. I am still quite unsure about this; maybe a nested style like

{ 'mapping':
   {
       'map': {...},
       'settings': {...},
       'ref': '../../file.json'
   }
}

would have been a lot more elegant. On the other hand, it creates endless nesting (and, in this case, the potential to link more than one referenced file).
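To make the trade-off concrete, here is a minimal sketch of how the nested style could be converted back into the flat keys. It is not part of SpchtDescriptorFormat.py; the function name flatten_mapping, the flat key mapping_ref and all example values are assumptions made up for illustration.

```python
def flatten_mapping(nested_node: dict) -> dict:
    """Turn a node written in the nested 'mapping' style into flat keys.

    Assumes the node uses the nested layout shown above, with the
    sub-keys 'map', 'settings' and 'ref'. The flat key 'mapping_ref'
    is a hypothetical name chosen for this sketch.
    """
    # Copy every key except the nested block itself.
    flat = {key: value for key, value in nested_node.items() if key != "mapping"}
    nested = nested_node.get("mapping", {})
    if "map" in nested:
        flat["mapping"] = nested["map"]
    if "settings" in nested:
        flat["mapping_setting"] = nested["settings"]
    if "ref" in nested:
        flat["mapping_ref"] = nested["ref"]
    return flat


# Example values are invented; only the key layout follows the text above.
nested_node = {
    "field": "language",
    "mapping": {
        "map": {"ger": "German"},
        "settings": {"default": "unknown"},
        "ref": "../../file.json",
    },
}
flat_node = flatten_mapping(nested_node)
print(flat_node)
```

The flat result has every setting on the same level, which is easier to scan when writing a descriptor by hand, at the cost of repeating the mapping prefix on each key.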

The newest offender is the if field. It naturally has to have three settings: condition, value and field. I will probably go for if_field, if_value and if_condition. Still, the nested way would surely look more elegant; it would be more grouped, though a tad harder to handle. I started out with one-dimensional settings, using nesting only for the fallback and the occasional data dump inside the descriptor, so there are visual cues when nesting is going on. And yet I doubt this decision. So if you sit here, looking at this wiki, cursing me for writing it in this clunky way, be comforted by the thought that I actually thought about the nested way and decided against it. Thanks for coming to my TED Talk about not knowing what to do.
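As a rough illustration of the flat if_field, if_value and if_condition keys, the following sketch checks a record against such a node. The function check_if, the condition names ("equal", "unequal", "greater", "less") and the sample record are assumptions for this example, not the behaviour of the actual class.

```python
import operator

# Hypothetical condition names mapped to standard comparison functions.
CONDITIONS = {
    "equal": operator.eq,
    "unequal": operator.ne,
    "greater": operator.gt,
    "less": operator.lt,
}


def check_if(node: dict, record: dict) -> bool:
    """Return True if the record satisfies the node's flat if_* keys."""
    if "if_field" not in node:
        # No condition defined, so the node always applies.
        return True
    actual = record.get(node["if_field"])
    if actual is None:
        # In this sketch a missing field never satisfies a condition.
        return False
    compare = CONDITIONS[node["if_condition"]]
    return compare(actual, node["if_value"])


node = {
    "field": "title",
    "if_field": "year",
    "if_condition": "greater",
    "if_value": 2000,
}
print(check_if(node, {"title": "A", "year": 2015}))
print(check_if(node, {"title": "B", "year": 1990}))
```

With flat keys the three settings are read directly from the node; a nested variant would group them under a single key but require one more lookup level before the same comparison.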
