Skip to content

CMake based project for graph database extraction from binary format to adjacency matrices

License

Notifications You must be signed in to change notification settings

scitia/graphiso-extractor

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

33 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Graphiso-Extractor

Logo of K!Scithia

Table of contents

Introduction

This project is intended for graph database extraction for further use with dedicated Python libraries for analysis Graph-Sub-Graph Isomorphism problem and general researches underlie the graph theory. It can be useful for scientific researches, learning graph theory and experiments conduction for scientist as well as science students. It can be used for testing your own improvements of existing graphs algorithms, writing your own and looking for dependencies in/between these beautiful structures.

Prerequisite

Launching this application assume that you have installed on your computer a few required libraries and you use work station with Intel processor that support Math Kernel Library (MKL). In the parenthesis are provided recommended versions of those libraries on which project was tested.

  • Armadillo Library (12.8.1)
  • Boost Library (1.85.0)
  • Intel MKL Library (2024.2)
  • GNU (13.2.0)

Database Description

You can read PDF article that is attached here in this project.

Thanks to effort of P. Foggia, C. Sansone, M. Vento I could renew the database that was genereted synthetically and contains graphs of various sizes and characteristics.

Databse contains 72800 paris of graphs:

  • 18200 - for for strict graph isomorphism
  • 54600 - for graph-sub-graph isomorphism

Below there there are listed types of the graphs in database:

  • Randomly Connected Graphs with different values of the edge density η
  • Regular Meshes, with different dimensionality: 2D, 3D and 4D
  • Irregular Meshes with different degrees of irregularity ρ
  • Bounded Valence Graphs, with valence equal to 3.6 and 9
  • Irregular Bounded Valence Graphs with a degree of irregularity α

Kind of Graph No. of Couples Parameters Size
Randomly Connected Graphs (3000 couples) 1000 η = 0.01 From 20 to 1000
1000 η = 0.05 From 20 to 1000
1000 η = 0.1 From 20 to 1000
Regular Mesh (2300 couples) 1000 2D-mesh From 16 to 1024
800 3D-mesh From 27 to 1000
500 4D-mesh From 16 to 1296
Modified Mesh (6900 couples) 3000 Irregular 2D-mesh
ρ ∈ {0.2, 0.4, 0.6}
From 16 to 1024
2400 Irregular 3D-mesh
ρ ∈ {0.2, 0.4, 0.6}
From 27 to 1000
1500 Irregular 4D-mesh
ρ ∈ {0.2, 0.4, 0.6}
From 16 to 1296
Bounded Valence Graphs (3000 couples) 1000 valence = 3 From 20 to 1000
1000 valence = 6 From 20 to 1000
1000 valence = 9 From 20 to 1000
Irregular Bounded Valence Graphs(3000 couples) 1000 valence = 3
α = 0.1
From 20 to 1000
1000 valence = 6
α = 0.1
From 20 to 1000
1000 valence = 9
α = 0.1
From 20 to 1000
The table above presents graphs generated for the isomorphism case.

Downloads

Due to the fact that project installation is highly platform dependent, here you can directly download files (converted as well as original).

But still feel free to use this code if want.

[Original files] [Converted database]

Hardware requirements: To unzip covnerted files you will need 50GB free space on your disk.

Instalation

Before you run install.bat script make sure that you have MKL Library installed on your computer, Boost library built with MinGW and Aramdillo library.

Recommended installation paths are:

  • Boost: C:/local/boost_1_85_0
  • Armadillo: C:/armadillo
  • MKL: default

If you want to customize installation paths - default installation process will failed. You will be forced to modify CMakeLists.txt file in order to amend proper information about location of the above mentioned tools on your computer.

Please familiar with official documentation for above mentioned tools and its installation processess.

After you will install all required tools download original files for databse that are base for conversion process from link Download resource

At this stage we assume that you have all required tools installed and you physically have required files downloaded from above mentioned link.

Now you can do below steps:

  1. Download the source code of this project
git clone https://github.com/scitia/graphiso-extractor.git
  1. Open config.ini file and fill all required iformation about source files location, target location and ground truth files location like in exmaple below:
[database]
source=c:/db/bin
ground=c:/db/gtr
target=c:/db/csv

For source and ground properties you have to point to /bin and /gtr directories respectively of downloaded database.

  1. Run installation script:
.\install.bat

You will be informed about every step of installation. After this process you will be in possession of converted database that you can use in high-level programming languages as python in more comfortable way. This usage will be described in the next section of this documentation.

Structure

Generated database contains .csv files that represents graphs adjacency matrices in the non compressed form.

Generated resource contains 4 directories:

├── iso
├── si2
├── si4
└── si6

Each of them has the same structure as follows:

Expand to see details
.
├── bvg
│   ├── b03
│   │   ├── m1000
│   │   ├── m200
│   │   ├── m400
│   │   ├── m600
│   │   ├── m800
│   │   ├── s100
│   │   ├── s20
│   │   ├── s40
│   │   ├── s60
│   │   └── s80
│   ├── b03m
│   │   ├── m1000
│   │   ├── m200
│   │   ├── m400
│   │   ├── m600
│   │   ├── m800
│   │   ├── s100
│   │   ├── s20
│   │   ├── s40
│   │   ├── s60
│   │   └── s80
│   ├── b06
│   │   ├── m1000
│   │   ├── m200
│   │   ├── m400
│   │   ├── m600
│   │   ├── m800
│   │   ├── s100
│   │   ├── s20
│   │   ├── s40
│   │   ├── s60
│   │   └── s80
│   ├── b06m
│   │   ├── m1000
│   │   ├── m200
│   │   ├── m400
│   │   ├── m600
│   │   ├── m800
│   │   ├── s100
│   │   ├── s20
│   │   ├── s40
│   │   ├── s60
│   │   └── s80
│   ├── b09
│   │   ├── m1000
│   │   ├── m200
│   │   ├── m400
│   │   ├── m600
│   │   ├── m800
│   │   ├── s100
│   │   ├── s20
│   │   ├── s40
│   │   ├── s60
│   │   └── s80
│   └── b09m
│       ├── m1000
│       ├── m200
│       ├── m400
│       ├── m600
│       ├── m800
│       ├── s100
│       ├── s20
│       ├── s40
│       ├── s60
│       └── s80
├── m2D
│   ├── m2D
│   │   ├── m1024
│   │   ├── m196
│   │   ├── m400
│   │   ├── m576
│   │   ├── m784
│   │   ├── s100
│   │   ├── s16
│   │   ├── s36
│   │   ├── s64
│   │   └── s81
│   ├── m2Dr2
│   │   ├── m1024
│   │   ├── m196
│   │   ├── m400
│   │   ├── m576
│   │   ├── m784
│   │   ├── s100
│   │   ├── s16
│   │   ├── s36
│   │   ├── s64
│   │   └── s81
│   ├── m2Dr4
│   │   ├── m1024
│   │   ├── m196
│   │   ├── m400
│   │   ├── m576
│   │   ├── m784
│   │   ├── s100
│   │   ├── s16
│   │   ├── s36
│   │   ├── s64
│   │   └── s81
│   └── m2Dr6
│       ├── m1024
│       ├── m196
│       ├── m400
│       ├── m576
│       ├── m784
│       ├── s100
│       ├── s16
│       ├── s36
│       ├── s64
│       └── s81
├── m3D
│   ├── m3D
│   │   ├── m1000
│   │   ├── m216
│   │   ├── m343
│   │   ├── m512
│   │   ├── m729
│   │   ├── s125
│   │   ├── s27
│   │   └── s64
│   ├── m3Dr2
│   │   ├── m1000
│   │   ├── m216
│   │   ├── m343
│   │   ├── m512
│   │   ├── m729
│   │   ├── s125
│   │   ├── s27
│   │   └── s64
│   ├── m3Dr4
│   │   ├── m1000
│   │   ├── m216
│   │   ├── m343
│   │   ├── m512
│   │   ├── m729
│   │   ├── s125
│   │   ├── s27
│   │   └── s64
│   └── m3Dr6
│       ├── m1000
│       ├── m216
│       ├── m343
│       ├── m512
│       ├── m729
│       ├── s125
│       ├── s27
│       └── s64
├── m4D
│   ├── m4D
│   │   ├── m1296
│   │   ├── m256
│   │   ├── m625
│   │   ├── s16
│   │   └── s81
│   ├── m4Dr2
│   │   ├── m1296
│   │   ├── m256
│   │   ├── m625
│   │   ├── s16
│   │   └── s81
│   ├── m4Dr4
│   │   ├── m1296
│   │   ├── m256
│   │   ├── m625
│   │   ├── s16
│   │   └── s81
│   └── m4Dr6
│       ├── m1296
│       ├── m256
│       ├── m625
│       ├── s16
│       └── s81
└── rand
    ├── r001
    │   ├── m1000
    │   ├── m200
    │   ├── m400
    │   ├── m600
    │   ├── m800
    │   ├── s100
    │   ├── s20
    │   ├── s40
    │   ├── s60
    │   └── s80
    ├── r005
    │   ├── m1000
    │   ├── m200
    │   ├── m400
    │   ├── m600
    │   ├── m800
    │   ├── s100
    │   ├── s20
    │   ├── s40
    │   ├── s60
    │   └── s80
    └── r01
        ├── m1000
        ├── m200
        ├── m400
        ├── m600
        ├── m800
        ├── s100
        ├── s20
        ├── s40
        ├── s60
        └── s80

Each leaf on above tree has 100 directires inside named 00-99 and evey of them has tree files:

.
├── ga.csv
├── gb.csv
└── gt.txt

where:

  • ga.csv (Graph A)
    • in sense of strict isomorphism is first graph of the pair.
    • in sense of graph-sub-graph isomorphism this graph represents the structure that in Graph B isomorphic structures will be searched for.
  • gb.csv (Graph B)
    • in sense of strict isomorphism is second graph of the pair.
    • in sense of graph-sub-graph isomorphism this graph contains ζ isomorphic structures towards A.
  • gt.txt (Ground Truth)
    • contains single number that for isomorphism class of pairs is always 1, but in the sense of graph-sub-graph isomorphism problem represents the number of isomorhpic structures in graph B towards A.

Examples of use

Determining isomorphism relation beteen graphs in python

f1n = 'ga.csv'
f2n = 'gb.csv'

mat_df_1 = pd.read_csv(f1n, sep=",", header=None)
G1 = nx.from_pandas_adjacency(mat_df_1)

mat_df_2 = pd.read_csv(f2n, sep=",", header=None)
G2 = nx.from_pandas_adjacency(mat_df_2)

GM = isomorphism.GraphMatcher(G1, G2)

print(f'Graphs {f1n} and {f2n} are isomorphic: ' + str(GM.is_isomorphic()))

Determining sub-graph isomorphism relation between graphs in python

f1n = 'ga.csv'
f2n = 'gb.csv'

mat_df_1 = pd.read_csv(f1n, sep=",", header=None)
G1 = nx.from_pandas_adjacency(mat_df_1)

mat_df_2 = pd.read_csv(f2n, sep=",", header=None)
G2 = nx.from_pandas_adjacency(mat_df_2)

GM = isomorphism.GraphMatcher(G2, G1)

count_isomorphic_subgraphs = sum(1 for _ in GM.subgraph_isomorphisms_iter())

print(f"Actual number of isomorphic sub-graphs: {count_isomorphic_subgraphs}")

Example drawing

An example pair of two isomorphic graphs and vertex maping between them as a map container in center part of drawing:

Examples graph and vertex mapping

I encourage you to use my dedicated python library (works in progress) to deal with graphs and automates processing these structures. Use this installation only if you need to possess files locally on your computer to speed up processing in your researches.

Summary

There are in plans to set up the FTP server to share all generated files. This documentation will be completed in the future.

I am not responsible for the completeness and correctness of the data. The source files were not modified in any way.

Nevertheless it will be nice I you report any bugs or problems.

References

Conrad Sanderson and Ryan Curtin. Armadillo: a template-based C++ library for linear algebra. Journal of Open Source Software, Vol. 1, No. 2, pp. 26, 2016.

Conrad Sanderson and Ryan Curtin. Practical Sparse Matrices in C++ with Hybrid Storage and Template-Based Expression Optimisation. Mathematical and Computational Applications, Vol. 24, No. 3, 2019.

P. Foggia, C. Sansone, M. Vento A Database of Graphs for Isomorphism and Sub-Graph Isomorphism Benchmarking Dipartimento di Informatica e Sistemistica - Università di Napoli "Federico II" Via Claudio 21, I-80125 Napoli (Italy)

Cordella, L. P.; Foggia, P.; Sansone, C. Vento, M. (2001) An Improved Algorithm for Matching Large Graphs 3rd IAPR-TC15 Workshop on Graph-based Representations in Pattern Recognition: 149–159.

About

CMake based project for graph database extraction from binary format to adjacency matrices

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published