This project is intended for graph database extraction for further use with dedicated Python libraries for analysis Graph-Sub-Graph Isomorphism problem and general researches underlie the graph theory. It can be useful for scientific researches, learning graph theory and experiments conduction for scientist as well as science students. It can be used for testing your own improvements of existing graphs algorithms, writing your own and looking for dependencies in/between these beautiful structures.
Launching this application assume that you have installed on your computer a few required libraries and you use work station with Intel processor that support Math Kernel Library (MKL). In the parenthesis are provided recommended versions of those libraries on which project was tested.
- Armadillo Library (12.8.1)
- Boost Library (1.85.0)
- Intel MKL Library (2024.2)
- GNU (13.2.0)
You can read PDF article that is attached here in this project.
Thanks to effort of P. Foggia, C. Sansone, M. Vento I could renew the database that was genereted synthetically and contains graphs of various sizes and characteristics.
Databse contains 72800 paris of graphs:
- 18200 - for for strict graph isomorphism
- 54600 - for graph-sub-graph isomorphism
Below there there are listed types of the graphs in database:
- Randomly Connected Graphs with different values of the edge density η
- Regular Meshes, with different dimensionality: 2D, 3D and 4D
- Irregular Meshes with different degrees of irregularity ρ
- Bounded Valence Graphs, with valence equal to 3.6 and 9
- Irregular Bounded Valence Graphs with a degree of irregularity α
Kind of Graph | No. of Couples | Parameters | Size |
---|---|---|---|
Randomly Connected Graphs (3000 couples) | 1000 | η = 0.01 | From 20 to 1000 |
1000 | η = 0.05 | From 20 to 1000 | |
1000 | η = 0.1 | From 20 to 1000 | |
Regular Mesh (2300 couples) | 1000 | 2D-mesh | From 16 to 1024 |
800 | 3D-mesh | From 27 to 1000 | |
500 | 4D-mesh | From 16 to 1296 | |
Modified Mesh (6900 couples) | 3000 | Irregular 2D-mesh ρ ∈ {0.2, 0.4, 0.6} |
From 16 to 1024 |
2400 | Irregular 3D-mesh ρ ∈ {0.2, 0.4, 0.6} |
From 27 to 1000 | |
1500 | Irregular 4D-mesh ρ ∈ {0.2, 0.4, 0.6} |
From 16 to 1296 | |
Bounded Valence Graphs (3000 couples) | 1000 | valence = 3 | From 20 to 1000 |
1000 | valence = 6 | From 20 to 1000 | |
1000 | valence = 9 | From 20 to 1000 | |
Irregular Bounded Valence Graphs(3000 couples) | 1000 | valence = 3 α = 0.1 |
From 20 to 1000 |
1000 | valence = 6 α = 0.1 |
From 20 to 1000 | |
1000 | valence = 9 α = 0.1 |
From 20 to 1000 |
Due to the fact that project installation is highly platform dependent, here you can directly download files (converted as well as original).
But still feel free to use this code if want.
[Original files] [Converted database]
Hardware requirements: To unzip covnerted files you will need 50GB free space on your disk.
Before you run install.bat script make sure that you have MKL Library installed on your computer, Boost library built with MinGW and Aramdillo library.
Recommended installation paths are:
- Boost:
C:/local/boost_1_85_0
- Armadillo:
C:/armadillo
- MKL: default
If you want to customize installation paths - default installation process will failed. You will be forced to modify CMakeLists.txt file in order to amend proper information about location of the above mentioned tools on your computer.
Please familiar with official documentation for above mentioned tools and its installation processess.
After you will install all required tools download original files for databse that are base for conversion process from link Download resource
At this stage we assume that you have all required tools installed and you physically have required files downloaded from above mentioned link.
Now you can do below steps:
- Download the source code of this project
git clone https://github.com/scitia/graphiso-extractor.git
- Open config.ini file and fill all required iformation about source files location, target location and ground truth files location like in exmaple below:
[database]
source=c:/db/bin
ground=c:/db/gtr
target=c:/db/csv
For source and ground properties you have to point to /bin and /gtr directories respectively of downloaded database.
- Run installation script:
.\install.bat
You will be informed about every step of installation. After this process you will be in possession of converted database that you can use in high-level programming languages as python in more comfortable way. This usage will be described in the next section of this documentation.
Generated database contains .csv files that represents graphs adjacency matrices in the non compressed form.
Generated resource contains 4 directories:
├── iso
├── si2
├── si4
└── si6
Each of them has the same structure as follows:
Expand to see details
.
├── bvg
│ ├── b03
│ │ ├── m1000
│ │ ├── m200
│ │ ├── m400
│ │ ├── m600
│ │ ├── m800
│ │ ├── s100
│ │ ├── s20
│ │ ├── s40
│ │ ├── s60
│ │ └── s80
│ ├── b03m
│ │ ├── m1000
│ │ ├── m200
│ │ ├── m400
│ │ ├── m600
│ │ ├── m800
│ │ ├── s100
│ │ ├── s20
│ │ ├── s40
│ │ ├── s60
│ │ └── s80
│ ├── b06
│ │ ├── m1000
│ │ ├── m200
│ │ ├── m400
│ │ ├── m600
│ │ ├── m800
│ │ ├── s100
│ │ ├── s20
│ │ ├── s40
│ │ ├── s60
│ │ └── s80
│ ├── b06m
│ │ ├── m1000
│ │ ├── m200
│ │ ├── m400
│ │ ├── m600
│ │ ├── m800
│ │ ├── s100
│ │ ├── s20
│ │ ├── s40
│ │ ├── s60
│ │ └── s80
│ ├── b09
│ │ ├── m1000
│ │ ├── m200
│ │ ├── m400
│ │ ├── m600
│ │ ├── m800
│ │ ├── s100
│ │ ├── s20
│ │ ├── s40
│ │ ├── s60
│ │ └── s80
│ └── b09m
│ ├── m1000
│ ├── m200
│ ├── m400
│ ├── m600
│ ├── m800
│ ├── s100
│ ├── s20
│ ├── s40
│ ├── s60
│ └── s80
├── m2D
│ ├── m2D
│ │ ├── m1024
│ │ ├── m196
│ │ ├── m400
│ │ ├── m576
│ │ ├── m784
│ │ ├── s100
│ │ ├── s16
│ │ ├── s36
│ │ ├── s64
│ │ └── s81
│ ├── m2Dr2
│ │ ├── m1024
│ │ ├── m196
│ │ ├── m400
│ │ ├── m576
│ │ ├── m784
│ │ ├── s100
│ │ ├── s16
│ │ ├── s36
│ │ ├── s64
│ │ └── s81
│ ├── m2Dr4
│ │ ├── m1024
│ │ ├── m196
│ │ ├── m400
│ │ ├── m576
│ │ ├── m784
│ │ ├── s100
│ │ ├── s16
│ │ ├── s36
│ │ ├── s64
│ │ └── s81
│ └── m2Dr6
│ ├── m1024
│ ├── m196
│ ├── m400
│ ├── m576
│ ├── m784
│ ├── s100
│ ├── s16
│ ├── s36
│ ├── s64
│ └── s81
├── m3D
│ ├── m3D
│ │ ├── m1000
│ │ ├── m216
│ │ ├── m343
│ │ ├── m512
│ │ ├── m729
│ │ ├── s125
│ │ ├── s27
│ │ └── s64
│ ├── m3Dr2
│ │ ├── m1000
│ │ ├── m216
│ │ ├── m343
│ │ ├── m512
│ │ ├── m729
│ │ ├── s125
│ │ ├── s27
│ │ └── s64
│ ├── m3Dr4
│ │ ├── m1000
│ │ ├── m216
│ │ ├── m343
│ │ ├── m512
│ │ ├── m729
│ │ ├── s125
│ │ ├── s27
│ │ └── s64
│ └── m3Dr6
│ ├── m1000
│ ├── m216
│ ├── m343
│ ├── m512
│ ├── m729
│ ├── s125
│ ├── s27
│ └── s64
├── m4D
│ ├── m4D
│ │ ├── m1296
│ │ ├── m256
│ │ ├── m625
│ │ ├── s16
│ │ └── s81
│ ├── m4Dr2
│ │ ├── m1296
│ │ ├── m256
│ │ ├── m625
│ │ ├── s16
│ │ └── s81
│ ├── m4Dr4
│ │ ├── m1296
│ │ ├── m256
│ │ ├── m625
│ │ ├── s16
│ │ └── s81
│ └── m4Dr6
│ ├── m1296
│ ├── m256
│ ├── m625
│ ├── s16
│ └── s81
└── rand
├── r001
│ ├── m1000
│ ├── m200
│ ├── m400
│ ├── m600
│ ├── m800
│ ├── s100
│ ├── s20
│ ├── s40
│ ├── s60
│ └── s80
├── r005
│ ├── m1000
│ ├── m200
│ ├── m400
│ ├── m600
│ ├── m800
│ ├── s100
│ ├── s20
│ ├── s40
│ ├── s60
│ └── s80
└── r01
├── m1000
├── m200
├── m400
├── m600
├── m800
├── s100
├── s20
├── s40
├── s60
└── s80
Each leaf on above tree has 100 directires inside named 00-99 and evey of them has tree files:
.
├── ga.csv
├── gb.csv
└── gt.txt
where:
- ga.csv (Graph A)
- in sense of strict isomorphism is first graph of the pair.
- in sense of graph-sub-graph isomorphism this graph represents the structure that in Graph B isomorphic structures will be searched for.
- gb.csv (Graph B)
- in sense of strict isomorphism is second graph of the pair.
- in sense of graph-sub-graph isomorphism this graph contains ζ isomorphic structures towards A.
- gt.txt (Ground Truth)
- contains single number that for isomorphism class of pairs is always 1, but in the sense of graph-sub-graph isomorphism problem represents the number of isomorhpic structures in graph B towards A.
f1n = 'ga.csv'
f2n = 'gb.csv'
mat_df_1 = pd.read_csv(f1n, sep=",", header=None)
G1 = nx.from_pandas_adjacency(mat_df_1)
mat_df_2 = pd.read_csv(f2n, sep=",", header=None)
G2 = nx.from_pandas_adjacency(mat_df_2)
GM = isomorphism.GraphMatcher(G1, G2)
print(f'Graphs {f1n} and {f2n} are isomorphic: ' + str(GM.is_isomorphic()))
f1n = 'ga.csv'
f2n = 'gb.csv'
mat_df_1 = pd.read_csv(f1n, sep=",", header=None)
G1 = nx.from_pandas_adjacency(mat_df_1)
mat_df_2 = pd.read_csv(f2n, sep=",", header=None)
G2 = nx.from_pandas_adjacency(mat_df_2)
GM = isomorphism.GraphMatcher(G2, G1)
count_isomorphic_subgraphs = sum(1 for _ in GM.subgraph_isomorphisms_iter())
print(f"Actual number of isomorphic sub-graphs: {count_isomorphic_subgraphs}")
An example pair of two isomorphic graphs and vertex maping between them as a map container in center part of drawing:
I encourage you to use my dedicated python library (works in progress) to deal with graphs and automates processing these structures. Use this installation only if you need to possess files locally on your computer to speed up processing in your researches.
There are in plans to set up the FTP server to share all generated files. This documentation will be completed in the future.
I am not responsible for the completeness and correctness of the data. The source files were not modified in any way.
Nevertheless it will be nice I you report any bugs or problems.
Conrad Sanderson and Ryan Curtin. Armadillo: a template-based C++ library for linear algebra. Journal of Open Source Software, Vol. 1, No. 2, pp. 26, 2016.
Conrad Sanderson and Ryan Curtin. Practical Sparse Matrices in C++ with Hybrid Storage and Template-Based Expression Optimisation. Mathematical and Computational Applications, Vol. 24, No. 3, 2019.
P. Foggia, C. Sansone, M. Vento A Database of Graphs for Isomorphism and Sub-Graph Isomorphism Benchmarking Dipartimento di Informatica e Sistemistica - Università di Napoli "Federico II" Via Claudio 21, I-80125 Napoli (Italy)
Cordella, L. P.; Foggia, P.; Sansone, C. Vento, M. (2001) An Improved Algorithm for Matching Large Graphs 3rd IAPR-TC15 Workshop on Graph-based Representations in Pattern Recognition: 149–159.