Skip to content

jacquesfize/GMatch4py

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

97 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Build Status

GMatch4py a graph matching library for Python

GMatch4py is a library dedicated to graph matching. Graph structure are stored in NetworkX graph objects. GMatch4py algorithms were implemented with Cython to enhance performance.

Requirements

  • Python 3
  • Numpy and Cython installed (if not : (sudo) pip(3) install numpy cython)

Installation

To install GMatch4py, run the following commands:

git clone https://github.com/Jacobe2169/GMatch4py.git
cd GMatch4py
(sudo) pip(3) install .

Get Started

Graph input format

In GMatch4py, algorithms manipulate networkx.Graph, a complete graph model that comes with a large spectrum of parser to load your graph from various inputs : *.graphml,*.gexf,.. (check here to see all the format accepted)

Use GMatch4py

If you want to use algorithms like graph edit distances, here is an example:

# Gmatch4py use networkx graph 
import networkx as nx 
# import the GED using the munkres algorithm
import gmatch4py as gm

In this example, we use generated graphs using networkx helpers:

g1=nx.complete_bipartite_graph(5,4) 
g2=nx.complete_bipartite_graph(6,4)

All graph matching algorithms in Gmatch4py work this way:

  • Each algorithm is associated with an object, each object having its specific parameters. In this case, the parameters are the edit costs (delete a vertex, add a vertex, ...)
  • Each object is associated with a compare() function with two parameters. First parameter is a list of the graphs you want to compare, i.e. measure the distance/similarity (depends on the algorithm). Then, you can specify a sample of graphs to be compared to all the other graphs. To this end, the second parameter should be a list containing the indices of these graphs (based on the first parameter list). If you rather compute the distance/similarity between all graphs, just use the None value.
ged=gm.GraphEditDistance(1,1,1,1) # all edit costs are equal to 1
result=ged.compare([g1,g2],None) 
print(result)

The output is a similarity/distance matrix :

array([[0., 14.],
       [10., 0.]])

This output result is "raw", if you wish to have normalized results in terms of distance (or similarity) you can use :

ged.similarity(result)
# or 
ged.distance(result)

Exploit nodes and edges attributes

In this latest version, we add the possibility to exploit graph attributes ! To do so, the base.Base is extended with the set_attr_graph_used(node_attr,edge_attr) method.

import networkx as nx 
import gmatch4py as gm
ged = gm.GraphEditDistance(1,1,1,1)
ged.set_attr_graph_used("theme","color") # Edge colors and node themes attributes will be used.

List of algorithms

  • Graph Embedding
    • Graph2Vec [1]
  • Node Embedding
    • DeepWalk [7]
    • Node2vec [8]
  • Graph kernels
    • Random Walk Kernel (debug needed) [3]
      • Geometrical
      • K-Step
    • Shortest Path Kernel [3]
    • Weisfeiler-Lehman Kernel [4]
      • Subtree Kernel
  • Graph Edit Distance [5]
    • Approximated Graph Edit Distance
    • Hausdorff Graph Edit Distance
    • Bipartite Graph Edit Distance
    • Greedy Edit Distance
  • Vertex Ranking [2]
  • Vertex Edge Overlap [2]
  • Bag of Nodes (a bag of words model using nodes as vocabulary)
  • Bag of Cliques (a bag of words model using cliques as vocabulary)
  • MCS [6]

Publications associated

  • [1] Narayanan, Annamalai and Chandramohan, Mahinthan and Venkatesan, Rajasekar and Chen, Lihui and Liu, Yang. Graph2vec: Learning distributed representations of graphs. MLG 2017, 13th International Workshop on Mining and Learning with Graphs (MLGWorkshop 2017).
  • [2] Papadimitriou, P., Dasdan, A., & Garcia-Molina, H. (2010). Web graph similarity for anomaly detection. Journal of Internet Services and Applications, 1(1), 19-30.
  • [3] Vishwanathan, S. V. N., Schraudolph, N. N., Kondor, R., & Borgwardt, K. M. (2010). Graph kernels. Journal of Machine Learning Research, 11(Apr), 1201-1242.
  • [4] Shervashidze, N., Schweitzer, P., Leeuwen, E. J. V., Mehlhorn, K., & Borgwardt, K. M. (2011). Weisfeiler-lehman graph kernels. Journal of Machine Learning Research, 12(Sep), 2539-2561.
  • [5] Fischer, A., Riesen, K., & Bunke, H. (2017). Improved quadratic time approximation of graph edit distance by combining Hausdorff matching and greedy assignment. Pattern Recognition Letters, 87, 55-62.
  • [6] A graph distance metric based on the maximal common subgraph, H. Bunke and K. Shearer, Pattern Recognition Letters, 1998
  • [7] Perozzi, B., Al-Rfou, R., & Skiena, S. (2014, August). Deepwalk: Online learning of social representations. In Proceedings of the 20th ACM SIGKDD international conference on Knowledge discovery and data mining (pp. 701-710). ACM.
  • [8] node2vec: Scalable Feature Learning for Networks. Aditya Grover and Jure Leskovec. Knowledge Discovery and Data Mining, 2016.

Author(s)

Jacques Fize, jacques[dot]fize[at]cirad[dot]fr

Some algorithms from other projects were integrated to Gmatch4py. Be assured that each code is associated with a reference to the original.

CHANGELOG

18.06.2022

  • Debug the skipgram import
  • Gmatch4py should work with new gensim version

7.05.2019

  • Debug (problems with float edge weight)
  • Add the AbstractEditDistance.edit_path(G,H) method that return the edit path, the cost matrix and the selected cost index in the cost matrix
  • Add a tqdm progress bar for the gmatch4py.helpers.reader.import_dir() function

12.03.2019

  • Add Node2vec

05.03.2019

  • Add Graph Embedding algorithms
  • Remove depreciated methods and classes
  • Add logo
  • Update documentation

25.02.2019

  • Add New Graph Class. Features : Cython Extensions, precomputed values (degrees, neighbor info), hash representation of edges and nodes for a faster comparison
  • Some algorithms are parallelized such as graph edit distances or Jaccard

TODO List

  • Debug algorithms --> Random Walk Kernel, Deltacon
  • Optimize algorithms --> Vertex Ranking