Skip to content

projekt-opal/vectograph

 
 

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

52 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Vectograph

Vectograph is an open-source software library for automatically creating a graph structured data from a given tabular data.

Creating Structured Data from Tabular Data

Let X be a m by n matrix representing the input tabular, the structured data is created by following these steps:

  1. Apply QCUT algorithm for each column that has at least min_unique_val_per_column number of unique values.
  2. Consider
    1. the i.th row as the i.th concise bounded description of the i.th event.
    2. the j.th column as the j.th relation/predicate/edge.
    3. A triple is modeled as event_i -> relation_j -> X_ij.

Assume that we have the first row of fetch_california_housing is

[   8.3252       41.            6.98412698    1.02380952  322. 2.55555556   37.88       -122.23      ]

Applying the QCUT algorithm with default parameters min_unique_val_per_column=6, num_quantile=5 generates 0.th CBD

<Event_0> <Feature_Category_0> <0_quantile_4> .
<Event_0> <Feature_Category_1> <1_quantile_4> .
<Event_0> <Feature_Category_2> <2_quantile_4> .
<Event_0> <Feature_Category_3> <3_quantile_1> .
<Event_0> <Feature_Category_4> <4_quantile_0> .
<Event_0> <Feature_Category_5> <5_quantile_1> .
<Event_0> <Feature_Category_6> <6_quantile_4> .
<Event_0> <Feature_Category_7> <7_quantile_0> .

that consist of n triples. <Feature_Category_0> represents the 0.th relation, i.e., 0.th column, whereas <0_quantile_4> represents a tail entity , i.e., the 4.th bin of the 0.th column of the tabular data. . After the data conversion, we store each bin values. For instance, running examples/sklearn_example.py generates Feature_Category_0_Mapping.csv that indicates 0_quantile_4 corresponds a bin that cover all values greater or equal than 5.10972.

Installation

git clone https://github.com/dice-group/Vectograph.git
conda create -n temp python=3.6 # Or be sure that your have Python => 3.6.
conda activate temp
pip install -e . 
python -c "import vectograph"
python -m pytest tests

Examples

API Example

from vectograph.transformers import GraphGenerator
from vectograph.quantizer import QCUT
import pandas as pd
from sklearn import datasets

X, y = datasets.fetch_california_housing(return_X_y=True)
X_transformed = QCUT(min_unique_val_per_column=6, num_quantile=5).transform(pd.DataFrame(X))
# Add prefix
X_transformed.index = 'Event_' + X_transformed.index.astype(str)
kg = GraphGenerator().transform(X_transformed)

for s, p, o in kg:
    print(s, p, o)

Scripting Example

Create a toy dataset via sklearn. Available datasets: boston, iris, diabetes, digits, wine, and breast_cancer.

python create_toy_data.py --toy_dataset_name "boston"
# Discretize each column having at least 12 unique values into 10 quantiles, otherwise do nothing
python main.py --tabularpath "boston.csv" --kg_name "boston.nt" --num_quantile=10 --min_unique_val_per_column=12

From a tabular data to knowledge graph embeddings

# (1) Clone the repositories.
git clone https://github.com/dice-group/DAIKIRI-Embedding.git
git clone https://github.com/dice-group/vectograph.git
# (3) Create a virtual enviroment and install the dependicies pertaining to the DAIKIRI-Embedding framework.
conda env create -f DAIKIRI-Embedding/environment.yml
conda activate daikiri
# (4) Install dependencies of the vectograph framework.
pip install -e vectograph/.
# (5) Create a knowledge graph by using an example dataset from sklearn.datasets.fetch_california_housing.html
python vectograph/create_toy_data.py --toy_dataset_name "wine"
python vectograph/main.py --tabularpath "wine.csv" --kg_name "train.txt" --num_quantile=10 --min_unique_val_per_column=12
# (6) Generate Embeddings
python DAIKIRI-Embedding/main.py --path_dataset_folder '.' --model 'ConEx'  > conex_emb.log
# (7) Log file contains all relevant information
cat conex_emb.log
# Result: A folder named with current time created that contains
# info.log, ConEx_entity_embeddings.csv, ConEx_relation_embeddings.csv, etc.

How to cite

If you really like this framework and want to cite it in your work, feel free to

@inproceedings{demir2021convolutional,
title={Convolutional Complex Knowledge Graph Embeddings},
author={Caglar Demir and Axel-Cyrille Ngonga Ngomo},
booktitle={Eighteenth Extended Semantic Web Conference - Research Track},
year={2021},
url={https://openreview.net/forum?id=6T45-4TFqaX}}

For any further questions, please contact: caglar.demir@upb.de

About

No description, website, or topics provided.

Resources

License

Stars

Watchers

Forks

Packages

No packages published

Languages

  • Python 100.0%