A Library (with accompanying cli tool) to transform Pandas DataFrames into Exports (RDF) to be sent to DGraph Live Loader
python -m pip install dgraphpandas
❯ dgraphpandas --help
usage: dgraphpandas [-h] [-x {upserts,schema,types}] [-f FILE] -c CONFIG
[-ck CONFIG_FILE_KEY] [-o OUTPUT_DIR] [--console]
[--export_csv] [--encoding ENCODING]
[--chunk_size CHUNK_SIZE]
[--gz_compression_level GZ_COMPRESSION_LEVEL]
[--key_separator KEY_SEPARATOR]
[--add_dgraph_type_records ADD_DGRAPH_TYPE_RECORDS]
[--drop_na_intrinsic_objects DROP_NA_INTRINSIC_OBJECTS]
[--drop_na_edge_objects DROP_NA_EDGE_OBJECTS]
[--illegal_characters ILLEGAL_CHARACTERS]
[--illegal_characters_intrinsic_object ILLEGAL_CHARACTERS_INTRINSIC_OBJECT]
[--version] [-v {DEBUG,INFO,WARNING,ERROR,NOTSET}]
This is a real example which you can find in the samples folder and run from the root of this repository:
dgraphpandas \
--config samples/planets/dgraphpandas.json \
--config_file_key planet \
--file samples/planets/solar_system.csv \
--output samples/planets/output
This example can also be found in Notebook form.
import dgraphpandas as dpd
# Define a Configuration for your data files(s). Explained further in the Configuration section.
config = {
"transform": "horizontal",
"files": {
"planet": {
"subject_fields": ["id"],
"edge_fields": ["type"],
"type_overrides": {
"order_from_sun": "int32",
"diameter_earth_relative": "float32",
"diameter_km": "float32",
"mass_earth_relative": "float32",
"mean_distance_from_sun_au": "float32",
"orbital_period_years": "float32",
"orbital_eccentricity": "float32",
"mean_orbital_velocity_km_sec": "float32",
"rotation_period_days": "float32",
"inclination_axis_degrees": "float32",
"mean_temperature_surface_c": "float32",
"gravity_equator_earth_relative": "float32",
"escape_velocity_km_sec": "float32",
"mean_density": "float32",
"number_moons": "int32",
"rings": "bool"
},
"ignore_fields": ["image", "parent"]
}
}
}
# Perform a Horizontal Transform on the passed file using the config/key
# Generate RDF Upsert statements
intrinsic, edges = dpd.to_rdf('solar_system.csv', config, 'planet', output_dir='.', export_rdf=True)
# Do something with these statements e.g write to zip and ship to DGraph
# The cli will zip this output automatically
# In module mode when you provide output_dir and export_rdf it will automatically zip and write to disk
print(intrinsic)
print(edges)
Alternatively, you could call the underlying methods
# Perform a Horizontal Transform on the passed file using the config/key
intrinsic, edges = horizontal_transform('solar_system.csv', config, "planet")
# Generate RDF Upsert statements
intrinsic_upserts, edges_upserts = generate_upserts(intrinsic, edges)