docker-sevod-scraper

This repository provides a Docker image for sevod-scraper, a tool that creates dataset metadata for Semagrow.

To build the docker-sevod-scraper image, change into the cloned repository directory and run:

docker build -t sevod-scraper .

As an example, suppose that your dataset is stored locally in /my/path/dump.nt and its SPARQL endpoint is http://localhost:8080/sparql. To extract its metadata for Semagrow, issue the following command:

docker run --rm \
   -e MODE=rdfdump \
   -e INPUT=/share/dump.nt \
   -e OUTPUT=/share/metadata.ttl \
   -e ENDPOINT=http://localhost:8080/sparql \
   -v /my/path:/share \
   sevod-scraper

Because /my/path is mounted at /share inside the container, the output of sevod-scraper should appear on the host in /my/path/metadata.ttl.
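The mapping between the host path and the /share paths can be easy to get wrong, so the following is a minimal sketch of a helper that derives the mount directory and the container-side INPUT from a dump path and prints the resulting command without running it. The function name scrape_cmd is hypothetical and not part of docker-sevod-scraper, and the sketch assumes absolute paths without spaces.

```shell
#!/bin/sh
# scrape_cmd is a hypothetical helper, not part of docker-sevod-scraper.
# It prints the docker command for an N-Triples dump stored on the host.
# Assumption: DUMP is an absolute path that contains no spaces.
scrape_cmd() {
  dump="$1"
  endpoint="$2"
  host_dir="$(dirname "$dump")"   # host directory to mount, e.g. /my/path
  file="$(basename "$dump")"      # file name inside the mount, e.g. dump.nt
  # printf repeats the format for every argument, so each token is
  # followed by a single space.
  printf '%s ' docker run --rm \
    -e MODE=rdfdump \
    -e "INPUT=/share/$file" \
    -e OUTPUT=/share/metadata.ttl \
    -e "ENDPOINT=$endpoint" \
    -v "$host_dir:/share" \
    sevod-scraper
  printf '\n'
}

# Dry run: print the command for the example dataset so it can be inspected.
scrape_cmd /my/path/dump.nt http://localhost:8080/sparql
```

Printing the command first makes it possible to check that the -v mount and the INPUT path agree before starting the container.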

In general, you can control sevod-scraper using the following environment variables:

Variable    Description
MODE        Type of the input. Possible values:
              rdfdump: INPUT is a path to an RDF file in N-Triples format
              geordfdump: INPUT is a path to a geospatial RDF file in N-Triples format
              cassandra: INPUT is a Cassandra keyspace
              sparql: INPUT is a SPARQL endpoint
INPUT       Input to process (its type is determined by MODE)
OUTPUT      Path where the output is written
ENDPOINT    SPARQL endpoint URL (used for annotation only)
PREFIXES    Comma-separated list of known URI prefixes
GRAPH       Graph (only for SPARQL endpoint)
POLYGON     Known bounding polygon (for geospatial RDF files)
NAMESPACE   Namespace for URI mappings (only for cassandra)
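When several variables are set, it may be more convenient to keep them in a file and pass it with Docker's --env-file option instead of repeating -e. The sketch below writes such a file for the rdfdump example and adds PREFIXES; the file name scraper.env and the example.org prefixes are placeholders chosen for illustration, not values required by the tool.

```shell
#!/bin/sh
# Write the settings to an env file, one VARIABLE=value per line.
# scraper.env and the example.org prefixes are hypothetical placeholders.
cat > scraper.env <<'EOF'
MODE=rdfdump
INPUT=/share/dump.nt
OUTPUT=/share/metadata.ttl
ENDPOINT=http://localhost:8080/sparql
PREFIXES=http://example.org/resource/,http://example.org/ontology/
EOF

# The container would then be started as follows (left commented out so
# that this snippet can be run without Docker):
# docker run --rm --env-file scraper.env -v /my/path:/share sevod-scraper
```

Docker reads the values in an env file literally, so they should not be wrapped in quotes.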

For more information about the tool and further examples, see https://github.com/semagrow/sevod-scraper