Scripts
The following scripts are used in the Eurostat Linked Data conversion process. They can be used together or standalone to serve different scenarios. A brief description of how to run each script is given below.
Parses the Table of Contents and prints the dataset URLs:
How to Run on Windows: `ParseToC.bat -n 5`
How to Run on Linux: `sh ParseToC.sh -n 5`
where
* `n` represents the number of dataset URLs to print
Type `-h` for help.
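For instance, the printed URLs can be captured into a plain text file and reused in the download step below (a minimal sketch; it assumes ParseToC prints one dataset URL per line to standard output):

```sh
# Assumption: ParseToC.sh prints one dataset URL per line to standard output.
# Save the first 100 URLs for later use with DownloadZip.sh.
sh ParseToC.sh -n 100 > dataset_urls.txt
```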
Uncompresses the contents of the compressed dataset file:
How to Run on Windows: `UnCompressFile.bat -i c:/test/zip/bsbu_m.sdmx.zip -o c:/uncompress/`
How to Run on Linux: `sh UnCompressFile.sh -i ~/test/zip/bsbu_m.sdmx.zip -o ~/uncompress/`
where
* `i` is the file path of the compressed input file
* `o` is the output directory path where the contents of the compressed file will be stored
Type `-h` for help.
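When a whole directory of archives has already been downloaded, the script can be driven by a simple shell loop (a sketch; the directory names and the `.sdmx.zip` suffix are assumptions based on the examples above):

```sh
# Uncompress every downloaded SDMX archive into a common output directory.
# Paths are illustrative; adjust them to the local layout.
for archive in ~/test/zip/*.sdmx.zip; do
    sh UnCompressFile.sh -i "$archive" -o ~/uncompress/
done
```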
Downloads the compressed dataset file from the specified URL:
How to Run on Windows: `DownloadZip.bat -p c:/test/zip/ -t c:/test/tsv/ -u "http://epp.eurostat.ec.europa.eu/NavTree_prod/everybody/BulkDownloadListing?sort=1&downfile=data/apro_cpb_sugar.sdmx.zip"`
How to Run on Linux: `sh DownloadZip.sh -p ~/test/zip/ -t ~/test/tsv/ -u "http://epp.eurostat.ec.europa.eu/NavTree_prod/everybody/BulkDownloadListing?sort=1&downfile=data/apro_cpb_sugar.sdmx.zip"`
where
* `p` is the directory path where the compressed `.zip` file will be stored
* `t` is the directory path where the compressed `.tsv` file will be stored
* `u` is the URL of the dataset file
Type `-h` for help.
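ParseToC and DownloadZip can be chained so that every listed dataset is fetched in turn (a sketch; it assumes ParseToC prints one bulk-download URL per line and that the target directories already exist):

```sh
# Download every dataset URL printed by ParseToC.sh.
# The number of URLs and the target directories are illustrative.
sh ParseToC.sh -n 100 | while read -r url; do
    sh DownloadZip.sh -p ~/test/zip/ -t ~/test/tsv/ -u "$url"
done
```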
Parses the Data Structure Definition (DSD) of a dataset and converts it into RDF using Data Cube vocabulary.
How to Run on Windows: `DSDParser.bat -i c:/tempZip/bsbu_m.dsd.xml -o c:/test/ -f TURTLE -a c:/sdmx-code.ttl`
How to Run on Linux: `sh DSDParser.sh -i ~/tempZip/dsd/bsbu_m.dsd.xml -o ~/test/ -f TURTLE -a ~/sdmx-code.ttl`
where
* `i` is the file path of the DSD `xml` file
* `o` is the output directory path where RDF will be stored
* `f` is the format for RDF serialization (RDF/XML, TURTLE, N-TRIPLES)
* `a` is the file path of `sdmx-code.ttl`. It can be downloaded from http://code.google.com/p/publishing-statistical-data/source/browse/trunk/specs/src/main/vocab/sdmx-code.ttl
Type `-h` for help.
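If several DSD files have been extracted into one directory, they can be converted in a single loop (a sketch; the directory layout and the `.dsd.xml` suffix are assumptions based on the examples above):

```sh
# Convert every extracted DSD file to Turtle.
# ~/tempZip/dsd/ and ~/sdmx-code.ttl are assumed locations.
for dsd in ~/tempZip/dsd/*.dsd.xml; do
    sh DSDParser.sh -i "$dsd" -o ~/test/ -f TURTLE -a ~/sdmx-code.ttl
done
```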
Parses the SDMX dataset observations and converts them into RDF using the Data Cube vocabulary.
How to Run on Windows: `SDMXParser.bat -f tsieb010 -o c:/test/ -i c:/tempZip/tsieb010.sdmx.xml -l c:/log/ -t c:/tsv/tsieb010.tsv.gz`
How to Run on Linux: `sh SDMXParser.sh -f tsieb010 -o ~/test/ -i ~/sdmx/tsieb010.sdmx.xml -l ~/log/ -t ~/tsv/tsieb010.tsv.gz`
where
* `f` is the name of the dataset
* `o` is the output directory path where RDF will be stored
* `i` is the file path of the SDMX `xml` file
* `l` is the directory path where the logs of the dataset conversion will be stored
* `t` is the file path of the SDMX `tsv` file
Type `-h` for help.
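In the examples above the dataset name matches the base name of the SDMX file, so a batch run can derive it with `basename` (a sketch; the directory locations are assumptions):

```sh
# Convert every extracted SDMX file; the dataset name is taken from the file name.
# ~/sdmx/, ~/tsv/, ~/test/ and ~/log/ are assumed locations.
for sdmx in ~/sdmx/*.sdmx.xml; do
    name=$(basename "$sdmx" .sdmx.xml)
    sh SDMXParser.sh -f "$name" -o ~/test/ -i "$sdmx" -l ~/log/ -t ~/tsv/"$name".tsv.gz
done
```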
Generates the VoID file which will be used to populate the triple store described in Step 5 and Step 6.
How to Run on Windows: `Metadata.bat -i c:/toc/table_of_contents.xml -o c:/test/`
How to Run on Linux: `sh Metadata.sh -i ~/toc/table_of_contents.xml -o ~/test/`
where
* `i` is the file path of the table of contents (optional parameter)
* `o` is the output directory path where the VoID file will be stored
Type `-h` for help.
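Since the table of contents parameter is optional, the script can also be invoked with only the output directory (a sketch; what the script does when `-i` is omitted is not described here, so check `-h` if in doubt):

```sh
# Minimal invocation: only the output directory for the VoID file is given.
# The table of contents (-i) is optional; verify the default behaviour with -h.
sh Metadata.sh -o ~/test/
```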
Converts the dictionaries/codelists into RDF. It further generates a catalog file which is used to load all dictionaries/codelists into the triple store.
How to Run on Windows: `DictionaryParser.bat -i c:/dicPath/ -o c:/outputPath/ -c c:/catalogPath/ -f TURTLE`
How to Run on Linux: `sh DictionaryParser.sh -i ~/dicPath/ -o ~/outputPath/ -c ~/catalogPath/ -f TURTLE`
where
* `i` is the directory path where the dictionaries are stored
* `o` is the directory path where the RDF will be stored
* `c` is the directory path where the catalog file will be stored
* `f` is the format for RDF serialization (RDF/XML, TURTLE, N-TRIPLES). This RDF serialization is *only* used to create the catalog file. Dictionaries are generated only in RDF/XML format
Type `-h` for help.
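A complete dictionary run might look like this (a sketch; whether the script creates missing output directories itself is not stated here, so they are created up front):

```sh
# Create the target directories up front (an assumption; the script may create them itself),
# then convert the dictionaries and write the catalog file.
mkdir -p ~/outputPath/ ~/catalogPath/
sh DictionaryParser.sh -i ~/dicPath/ -o ~/outputPath/ -c ~/catalogPath/ -f TURTLE
```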
Downloads all the compressed dataset files from the Bulk Download page by extracting the URLs from the Table of Contents.
How to Run on Windows: `EuroStatMirror.bat -p c:/zip/ -t c:/tsv/`
How to Run on Linux: `sh EuroStatMirror.sh -p ~/zip/ -t ~/tsv/`
where
* `p` is the directory path where the `zip` files are downloaded
* `t` is the directory path where the `tsv` files are downloaded
Type `-h` for help.
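After mirroring, the downloaded archives can be unpacked in bulk by combining the mirror with the UnCompressFile loop shown earlier (a sketch; the directory names and the `.sdmx.zip` suffix are assumptions):

```sh
# Mirror the complete bulk download, then unpack every archive for conversion.
# ~/zip/, ~/tsv/ and ~/sdmx/ are illustrative directories.
sh EuroStatMirror.sh -p ~/zip/ -t ~/tsv/
for archive in ~/zip/*.sdmx.zip; do
    sh UnCompressFile.sh -i "$archive" -o ~/sdmx/
done
```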
Converts the complete Eurostat datasets into RDF:
How to Run: `sh Main.sh -i ~/sdmx-code.ttl -l ~/logs/`
where
* `i` is the file path of `sdmx-code.ttl`. It can be downloaded from http://code.google.com/p/publishing-statistical-data/source/browse/trunk/specs/src/main/vocab/sdmx-code.ttl
* `l` is the directory path where logs will be generated
Type `-h` for help.
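Because the complete conversion can take a long time, it may be convenient to start it in the background and keep the console output alongside the logs written via `-l` (a sketch; the console output file name is an assumption):

```sh
# Run the complete conversion in the background and keep the console output.
# ~/sdmx-code.ttl and ~/logs/ are the locations from the example above; main-console.out is an assumed name.
nohup sh Main.sh -i ~/sdmx-code.ttl -l ~/logs/ > ~/logs/main-console.out 2>&1 &
```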
Generates the titles of the datasets in RDF.
How to Run: `sh DatasetTitles.sh -o ~/title/`
where
* `o` is the output directory path where the RDF will be stored
Type `-h` for help.