SANSA RDF is a library for reading RDF files into Spark. SANSA RDF Reader extends the io package of SANSA RDF to read the N-Quads, Turtle and RDF/XML serialization formats of RDF.
This package reads N-Quads, Turtle and RDF/XML files and loads them into Spark as an RDD, a DataFrame or a GraphX Graph.
The main application class is sansa_rdf.App.
The application requires one argument:
- the path to the input folder containing the data as .nq, .rdf or .ttl files (e.g. data/stw.rdf)
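Before submitting a job, it can help to confirm that the input path really contains files with one of the supported extensions. The following is a minimal sketch of such a check. The function name list_rdf_inputs is hypothetical and is not part of SANSA RDF Reader.

```shell
#!/bin/sh
# Hypothetical helper (not part of SANSA RDF Reader): print every regular
# file under the given path whose name ends in .nq, .rdf or .ttl.
list_rdf_inputs() {
    find "$1" -type f \( -name '*.nq' -o -name '*.rdf' -o -name '*.ttl' \)
}
```

Calling list_rdf_inputs /data/input prints one matching file per line. Empty output suggests the application would have nothing to read.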
To run the application on a standalone Spark cluster:
- Set up a Spark cluster
- Build the application with Maven
cd /path/to/application
mvn clean package
- Submit the application to the Spark cluster
spark-submit \
--class sansa_rdf.App \
--master spark://spark-master:7077 \
target/RDF_Reader-1.0-SNAPSHOT.jar \
/data/input
To run a single reader on its own, replace the value of --class with one of sansa_rdf.io.NQuadReader, sansa_rdf.io.TurtleReader or sansa_rdf.io.XmlReader.
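When scripting submissions, the reader class could be picked from the file extension. The sketch below assumes that .nq maps to NQuadReader, .ttl to TurtleReader and .rdf to XmlReader. That mapping is inferred from the class names and is not stated by the project, and the function reader_class_for is a hypothetical helper.

```shell
#!/bin/sh
# Hypothetical helper: choose a reader class from the input file extension.
# The extension-to-class mapping is an assumption based on the class names.
reader_class_for() {
    case "$1" in
        *.nq)  echo "sansa_rdf.io.NQuadReader" ;;
        *.ttl) echo "sansa_rdf.io.TurtleReader" ;;
        *.rdf) echo "sansa_rdf.io.XmlReader" ;;
        *)     echo "unsupported input: $1" >&2; return 1 ;;
    esac
}
```

Its output can be passed to spark-submit in place of a fixed class name, for example --class "$(reader_class_for data/stw.rdf)". The function returns a non-zero status for any other extension, so a calling script can stop before submitting.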