m4xxed edited this page Apr 5, 2023 · 19 revisions

GXL format

This page describes the GXL format. GXL is a format to exchange graphs based on XML. A graph consists of nodes and edges connecting exactly two nodes. Nodes and edges are typed and can have attributes. Nodes can be hierarchical, that is, a node can have child nodes. General information on GXL can be found here.

GXL syntax

The GXL we process is a dialect of the general GXL format, defined by the Axivion Suite, with specific limitations and additional restrictions. First and foremost, there is no metaschema, and not all aspects of standard GXL are supported. Moreover, an implementation detail requires that all nodes of a graph be declared before any edge is specified. Nesting of nodes is expressed by an edge of type Belongs_To. A GXL file must contain exactly one graph.
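The two structural restrictions in this paragraph (exactly one graph, all nodes before any edge) are easy to check before a file is handed to the tool. The following is a minimal Python sketch using only the standard library. The function name check_dialect is hypothetical and not part of any existing tool, and it checks nothing beyond these two rules.

```python
import xml.etree.ElementTree as ET


def check_dialect(gxl_text: str) -> list[str]:
    """Return violations of two dialect rules: exactly one <graph>,
    and no <node> declared after the first <edge> of that graph."""
    root = ET.fromstring(gxl_text)
    problems = []
    graphs = root.findall("graph")
    if len(graphs) != 1:
        problems.append(f"expected exactly one graph, found {len(graphs)}")
        return problems
    seen_edge = False
    for child in graphs[0]:
        if child.tag == "edge":
            seen_edge = True
        elif child.tag == "node" and seen_edge:
            # Nodes must all come first; this one appears too late.
            problems.append(f"node {child.get('id')} declared after an edge")
    return problems
```

For a conforming file the function returns an empty list; otherwise each entry names one violation.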

The DTD for Axivion's GXL variant defining the syntax supported is as follows (it is a subset of the complete GXL DTD specification):

<!-- subset of GXL 1.0 used in the Axivion Suite
     Document Type Definition
     (based on GXL DTD 1.0 of April 25, 2002)

copyright of original GXL DTD 1.0 by

       Andy Schuerr
         Real-Time Systems Lab
         Darmstadt University of Technology
         Merckstr. 25, D-64283 Darmstadt, Germany
         andy.schuerr@es.tu-darmstadt.de

       Susan Elliott Sim
         School of Information and Computer Science
         444 Computer Science Bldg.
         University of California, Irvine
         ses@ics.uci.edu

       Ric Holt
         Department of Computer Science
         University of Waterloo
         Waterloo N2L 3G1, Canada
         holt@plg.uwaterloo.ca

       Andreas Winter
         Institute for Software Technology
         University of Koblenz-Landau
         Universitaetsstrasse 1, D-56070 Koblenz, Germany
         winter@uni-koblenz.de
-->
<!-- Attribute values -->
<!ENTITY % val "
           bool    |
           int     |
           float   |
           string  |
           toggle
           ">
<!-- gxl -->
<!ELEMENT gxl (graph) >
<!ATTLIST gxl
	xmlns:xlink CDATA #FIXED "http://www.w3.org/1999/xlink"
>
<!-- type -->
<!ELEMENT type EMPTY>
<!ATTLIST type
	xlink:type (simple) #FIXED "simple"
	xlink:href CDATA #REQUIRED
>
<!-- graph -->
<!ELEMENT graph (type? , attr* , ( node | edge )*) >
<!ATTLIST graph
	id ID #REQUIRED
	role NMTOKEN #IMPLIED
	edgeids (true | false) "false"
	>
<!-- node -->
<!ELEMENT node (type? , attr*) >
<!ATTLIST node
	id ID #REQUIRED
>
<!-- edge -->
<!ELEMENT edge (type?, attr*) >
<!ATTLIST edge
	id ID #IMPLIED
	from IDREF #REQUIRED
	to IDREF #REQUIRED
>

<!-- attr -->
<!ELEMENT attr (%val;)>
<!ATTLIST attr
	id ID #IMPLIED
	name NMTOKEN #REQUIRED
	kind NMTOKEN #IMPLIED
>

<!-- atomic values -->
<!ELEMENT bool (#PCDATA)>
<!ELEMENT int (#PCDATA)>
<!ELEMENT float (#PCDATA)>
<!ELEMENT string (#PCDATA)>
<!ELEMENT toggle (#PCDATA)>
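A non-validating parser such as Python's xml.etree.ElementTree ignores the DTD, so the ID and IDREF constraints above are not enforced when a file is read that way. The following sketch (hypothetical function check_ids) checks them by hand. It is slightly stricter than the DTD: it additionally requires that from and to name a node, which is the intent of an edge but not something IDREF alone expresses.

```python
import xml.etree.ElementTree as ET


def check_ids(gxl_text: str) -> list[str]:
    """Check ID uniqueness and edge endpoints, which ElementTree does not enforce."""
    root = ET.fromstring(gxl_text)
    problems = []
    declared = set()
    # graph, node, edge, and attr carry attributes of type ID in the DTD,
    # and XML requires ID values to be unique across the whole document.
    for elem in root.iter():
        if elem.tag in ("graph", "node", "edge", "attr") and "id" in elem.attrib:
            ident = elem.get("id")
            if ident in declared:
                problems.append(f"duplicate id {ident}")
            declared.add(ident)
    node_ids = {n.get("id") for n in root.iter("node")}
    for edge in root.iter("edge"):
        for end in ("from", "to"):
            target = edge.get(end)
            if target is None:
                problems.append(f"edge {edge.get('id')} lacks '{end}'")
            elif target not in node_ids:
                problems.append(f"edge {edge.get('id')}: '{end}' refers to unknown node {target}")
    return problems
```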

Explanation by example

The format is best explained by an example. Suppose we have a graph for a fictitious Java program as follows:

  • There is a package p1.
  • There are four classes c1, c2, c3, and c4.
  • There is one method m1.
  • There is one field f1.
  • All four classes are contained in package p1.
  • Method m1 is contained in c1.
  • Field f1 is contained in c1.
  • Method m1 sets the value of field f1.
  • Class c2 calls class c3.
  • Class c3 calls class c4.

In the following subsections we describe in detail how this graph is encoded in GXL.

Containing graph

The graph elements listed above must be surrounded by XML code as follows:

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE gxl SYSTEM "http://www.gupro.de/GXL/gxl-1.0.dtd">
<gxl xmlns:xlink="http://www.w3.org/1999/xlink">
  <graph id="CodeFacts" edgeids="true">

  <!-- Here come the node declarations -->
  <!-- Here come the edge declarations -->

  </graph>
</gxl>

The attribute id of the graph clause (here: CodeFacts) specifies the name of the graph. You are free to choose any name. Everything else is the same for all GXL files.
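If you generate GXL programmatically, the fixed frame can be produced once and filled with nodes and edges afterwards. A minimal sketch with Python's xml.etree.ElementTree follows; make_skeleton and serialize are hypothetical helper names. ElementTree does not write a DOCTYPE, so serialize prepends the first two lines itself. The sketch also writes the prefixed name xmlns:xlink literally rather than relying on ElementTree's namespace support, so the output carries the same declaration as the template.

```python
import xml.etree.ElementTree as ET

XLINK_NS = "http://www.w3.org/1999/xlink"
GXL_DOCTYPE = '<!DOCTYPE gxl SYSTEM "http://www.gupro.de/GXL/gxl-1.0.dtd">'


def make_skeleton(graph_name: str) -> tuple[ET.Element, ET.Element]:
    """Create the fixed <gxl>/<graph> frame and return (gxl, graph)."""
    # Prefixed names are plain strings here; no ElementTree namespace handling.
    gxl = ET.Element("gxl", {"xmlns:xlink": XLINK_NS})
    graph = ET.SubElement(gxl, "graph", id=graph_name, edgeids="true")
    return gxl, graph


def serialize(gxl: ET.Element) -> str:
    """Prepend the XML declaration and DOCTYPE, which ElementTree does not emit."""
    body = ET.tostring(gxl, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + GXL_DOCTYPE + "\n" + body + "\n"
```

When writing the result to disk, opening the file with encoding="utf-8" keeps it consistent with the declared encoding.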

Node declarations

A node is declared as follows:

    <node id="N1">
      <type xlink:href="Method"/>
      <attr name="Source.Name">
        <string>m1</string>
      </attr>
      <attr name="Linkage.Name">
        <string>p1.c1.m1</string>
      </attr>
      <attr name="Metric.Number_Of_Calling_Routines">
        <int>1</int>
      </attr>
      <attr name="Metric.Number_Of_Called_Routines">
        <int>1</int>
      </attr>	  
      <attr name="Metric.Lines.LOC">
        <int>5</int>
      </attr>
      <attr name="Metric.McCabe_Complexity">
        <int>1</int>
      </attr>	  
    </node>

where:

  • The attribute id with value N1 is a node identifier unique in the GXL file that will be used in edge declarations to specify the source or target, respectively, of an edge. Node identifiers are case-sensitive.
  • The value for xlink:href in <type xlink:href="Method"/> (here: Method) specifies the type of the node; any string can be used as a type name. Type names are case-sensitive.
  • The nested clauses <attr name="...">...</attr> specify attributes (see below) of the node.
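A node with the two mandatory name attributes (described further below) and any number of integer metrics could be appended to a graph element as in the following sketch. The helper add_node and its parameters are hypothetical; like the template above, it assumes that the enclosing gxl element declares xmlns:xlink.

```python
import xml.etree.ElementTree as ET
from typing import Mapping, Optional


def add_node(graph: ET.Element, node_id: str, node_type: str,
             source_name: str, linkage_name: str,
             int_attrs: Optional[Mapping[str, int]] = None) -> ET.Element:
    """Append a <node> with Source.Name, Linkage.Name, and integer attributes."""
    node = ET.SubElement(graph, "node", id=node_id)
    # "xlink:href" is written literally; xmlns:xlink is assumed on <gxl>.
    ET.SubElement(node, "type", {"xlink:href": node_type})
    for name, text in (("Source.Name", source_name), ("Linkage.Name", linkage_name)):
        attr = ET.SubElement(node, "attr", name=name)
        ET.SubElement(attr, "string").text = text
    for name, value in (int_attrs or {}).items():
        attr = ET.SubElement(node, "attr", name=name)
        ET.SubElement(attr, "int").text = str(value)
    return node
```

For instance, add_node(graph, "N1", "Method", "m1", "p1.c1.m1", {"Metric.Lines.LOC": 5}) produces a node shaped like the example above, with fewer metrics.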

Edge declarations

Edges are declared as follows:

  <edge id="E9" from="N1" to="N2">
    <type xlink:href="Set"/>
    <attr name="Metric.Number_Of_Calls">
      <int>5</int>
    </attr>
  </edge>

where:

  • The attribute id with value E9 is an edge identifier unique in the GXL file. Edge identifiers are case-sensitive.
  • The value for xlink:href in <type xlink:href="Set"/> (here: Set) specifies the type of the edge; any string can be used as a type name. Type names are case-sensitive. Although the same name could be used for both a node type and an edge type, we recommend against doing so.
  • Edges can have attributes, too, completely analogous to nodes.
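The recommendation to keep node and edge type names apart can be checked mechanically. This sketch (hypothetical function shared_type_names) reports every type name used by both a node and an edge. When ElementTree parses a file containing the xmlns:xlink declaration, the attribute xlink:href is exposed under its expanded name {http://www.w3.org/1999/xlink}href. Comparisons are case-sensitive, matching the rules above.

```python
import xml.etree.ElementTree as ET

# Expanded name under which ElementTree exposes xlink:href.
XLINK_HREF = "{http://www.w3.org/1999/xlink}href"


def type_names(root: ET.Element, tag: str) -> set[str]:
    """Collect the type names of all elements with the given tag."""
    names = set()
    for elem in root.iter(tag):
        type_elem = elem.find("type")
        if type_elem is not None and type_elem.get(XLINK_HREF) is not None:
            names.add(type_elem.get(XLINK_HREF))
    return names


def shared_type_names(gxl_text: str) -> set[str]:
    """Type names used for both nodes and edges (allowed, but discouraged)."""
    root = ET.fromstring(gxl_text)
    return type_names(root, "node") & type_names(root, "edge")
```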

To express that one node is contained in another node, use an edge of the special type Belongs_To (mind the exact, case-sensitive spelling) leading from the child to its parent node. For instance, the following edge declaration:

    <edge id="E1" from="N1" to="N3">
      <type xlink:href="Belongs_To"/>
    </edge>

specifies that the node with the id N1 is a child of the node with the id N3.

Hierarchical edges may carry attributes in the input GXL file, too, but these attributes are ignored in the visualization.
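Because nesting is expressed only through Belongs_To edges, a reader has to reconstruct the hierarchy from them. The sketch below (hypothetical function hierarchy) builds a child-to-parent map and lists the root nodes, that is, nodes without an outgoing Belongs_To edge. It assumes that every node has at most one parent, which the description of a child and its parent node suggests but this page does not state explicitly. Edges of all other types are ignored.

```python
import xml.etree.ElementTree as ET

XLINK_HREF = "{http://www.w3.org/1999/xlink}href"


def hierarchy(gxl_text: str) -> tuple[dict[str, str], list[str]]:
    """Return (child -> parent map, root node ids in declaration order)."""
    root = ET.fromstring(gxl_text)
    parent: dict[str, str] = {}
    for edge in root.iter("edge"):
        edge_type = edge.find("type")
        if edge_type is None or edge_type.get(XLINK_HREF) != "Belongs_To":
            continue  # only hierarchical edges define nesting
        child, par = edge.get("from"), edge.get("to")
        # Assumption of this sketch: a node has at most one parent.
        if child in parent and parent[child] != par:
            raise ValueError(f"node {child} has more than one parent")
        parent[child] = par
    roots = [n.get("id") for n in root.iter("node") if n.get("id") not in parent]
    return parent, roots
```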

Node and edge attributes

Nodes and edges can have attributes. They are declared by way of the following attr clauses:

   <attr name="Source.Name">
     <!-- Here comes the attribute type and value -->
   </attr>

where the value of the attribute name provides the name of the attribute. There are no restrictions on attribute names. By convention, the parts of a composite name are separated by a period, but you could use other separators as well. In our use of GXL, metrics to be visualized must have the prefix Metric. (including the period). That is, if you have a metric X that you want to map onto, for instance, the height of a block, its name must be Metric.X. Currently, only numeric properties (integer and floating point numbers) can be visualized. Strings and toggles may be shown as values, but cannot otherwise influence visual attributes of visual objects such as height, depth, width, or color.
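The naming convention for metrics can be expressed as a simple filter. In the following sketch, the hypothetical function visualizable_metrics keeps only attributes whose name starts with Metric. and whose value is an int or float element, since only numeric properties can currently be visualized.

```python
import xml.etree.ElementTree as ET


def visualizable_metrics(node: ET.Element) -> dict[str, float]:
    """Numeric attributes of a node or edge whose name carries the Metric. prefix."""
    metrics: dict[str, float] = {}
    for attr in node.findall("attr"):
        name = attr.get("name", "")
        if not name.startswith("Metric."):
            continue  # not a metric by the naming convention
        int_value, float_value = attr.find("int"), attr.find("float")
        if int_value is not None:
            metrics[name] = int(int_value.text)
        elif float_value is not None:
            metrics[name] = float(float_value.text)
        # String and toggle values are skipped: they cannot drive visual attributes.
    return metrics
```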

Attribute names are case-sensitive.

Nodes and edges may have any number and kind of attributes. There are only two requirements for nodes: every node must have a string attribute Source.Name and a string attribute Linkage.Name:

  • The string attribute Source.Name will be used to label the game object representing the node in the visualization. It will be shown to the user.
  • The string attribute Linkage.Name serves as a unique identifier of the node and is used only internally. As mentioned above, the attribute id in clauses like <node id="N1"> (here: N1) must be unique within a GXL file, too. However, the node id serves only to specify the source or target of an edge within one particular GXL file. The string attribute Linkage.Name, on the other hand, can be used to relate the same node across different GXL files. For instance, when studying the evolution of a software system, we have one GXL file per revision, and the Linkage.Name of a given node is the same in all of them. In other words, while the attribute id of the same logical node may differ between two graphs, its attribute Linkage.Name must be identical in both GXL files; otherwise, they are considered different nodes. Moreover, Linkage.Name can be used as the identifier in CSV files that provide additional node metrics.
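Since Linkage.Name is the only identifier that is stable across files, comparing two revisions amounts to matching nodes by that attribute. The following sketch (hypothetical functions linkage_index and match_revisions) also enforces the two mandatory string attributes and treats a duplicate Linkage.Name within one file as an error, because a name shared by two nodes could not serve as a unique identifier. It is an illustration, not the matching logic of the actual tool.

```python
import xml.etree.ElementTree as ET


def linkage_index(gxl_text: str) -> dict[str, str]:
    """Map each Linkage.Name to the node id used in this particular file."""
    root = ET.fromstring(gxl_text)
    index: dict[str, str] = {}
    for node in root.iter("node"):
        node_id = node.get("id")
        for required in ("Source.Name", "Linkage.Name"):
            if node.find(f"attr[@name='{required}']/string") is None:
                raise ValueError(f"node {node_id} lacks string attribute {required}")
        link = node.find("attr[@name='Linkage.Name']/string").text or ""
        if link in index:
            raise ValueError(f"Linkage.Name {link!r} used by {index[link]} and {node_id}")
        index[link] = node_id
    return index


def match_revisions(old_text: str, new_text: str):
    """Return (matched, removed, added), all keyed by Linkage.Name."""
    old, new = linkage_index(old_text), linkage_index(new_text)
    matched = {link: (old[link], new[link]) for link in sorted(old.keys() & new.keys())}
    removed = sorted(old.keys() - new.keys())
    added = sorted(new.keys() - old.keys())
    return matched, removed, added
```

In the check below, the class p1.c1 has the id N1 in one revision and N7 in the other, yet it is recognized as the same node.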

As opposed to nodes, edges do not have an edge attribute Linkage.Name that identifies them across different GXL files.

Attribute types and values

There are four different kinds of attribute types.

String attributes

String attributes are declared as follows:

  <string>c2</string>

where c2 would be the value of the string attribute.

Integer attributes

Integer attributes are declared as follows:

  <int>1</int>

where 1 would be the value of the integer attribute.

Floating point attributes

Floating point attributes are declared as follows:

  <float>3.141</float>

where 3.141 would be the value of the floating point attribute. A period must be used as the decimal point. Scientific notation is possible, but not recommended, to avoid potential syntax errors.
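When floating point values are written by a program, locale settings or an automatic switch to scientific notation can produce text that violates these rules. One way to avoid both in Python is sketched below; the function name gxl_float is hypothetical. It rejects NaN and infinities, for which this page defines no encoding.

```python
import math
from decimal import Decimal


def gxl_float(value: float) -> str:
    """Render a float for a <float> element: '.' as decimal point, no exponent."""
    if not math.isfinite(value):
        raise ValueError(f"cannot encode {value!r} as a GXL float")
    # repr() yields the shortest string that round-trips and always uses '.',
    # independent of locale; formatting the Decimal with 'f' expands any exponent.
    return format(Decimal(repr(value)), "f")
```

For example, gxl_float(1e-07) yields 0.0000001 rather than 1e-07.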

Boolean attributes (also known as toggles)

Boolean attributes (also known as toggles or binary properties) are declared without any further value as follows:

  <enum/>

For instance, the clause

  <node id="N1">
    <type xlink:href="Method"/>
    <attr name="Linkage.Is_Definition">
      <enum/>
    </attr>
  </node>

declares that the method node N1 has the property Linkage.Is_Definition. If there is no such clause for a property, the node or edge does not possess this binary property.
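Putting the four kinds together, a reader could decode the attributes of a node or edge into a Python dictionary as sketched below. The function decode_attrs is hypothetical. It maps a toggle to True when the enum clause is present, and, in line with the rule above, an absent toggle simply does not appear in the result. Any other value element, including the bool listed in the DTD, is rejected; that is a choice of this sketch, not documented behavior.

```python
import xml.etree.ElementTree as ET
from typing import Union

Value = Union[str, int, float, bool]


def decode_attrs(elem: ET.Element) -> dict[str, Value]:
    """Decode the <attr> children of a <node> or <edge> into Python values."""
    values: dict[str, Value] = {}
    for attr in elem.findall("attr"):
        name = attr.get("name")
        children = list(attr)
        if len(children) != 1:
            raise ValueError(f"attribute {name} must contain exactly one value")
        value = children[0]
        if value.tag == "string":
            values[name] = value.text or ""
        elif value.tag == "int":
            values[name] = int(value.text)
        elif value.tag == "float":
            values[name] = float(value.text)
        elif value.tag == "enum":
            values[name] = True  # toggle: its mere presence sets the property
        else:
            raise ValueError(f"attribute {name}: unsupported kind <{value.tag}>")
    return values
```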

Complete example

The complete example for the graph above looks like this:

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE gxl SYSTEM "http://www.gupro.de/GXL/gxl-1.0.dtd">
<gxl xmlns:xlink="http://www.w3.org/1999/xlink">
  <graph id="CodeFacts" edgeids="true">
    <node id="N1">
      <type xlink:href="Method"/>
      <attr name="Source.Name">
        <string>m1</string>
      </attr>
      <attr name="Linkage.Name">
        <string>p1.c1.m1</string>
      </attr>
      <attr name="Metric.Number_Of_Calling_Routines">
        <int>1</int>
      </attr>
      <attr name="Metric.Number_Of_Called_Routines">
        <int>1</int>
      </attr>	  
      <attr name="Metric.Lines.LOC">
        <int>5</int>
      </attr>
      <attr name="Metric.McCabe_Complexity">
        <int>1</int>
      </attr>	  
    </node>
    <node id="N2">
      <type xlink:href="Field"/>
      <attr name="Source.Name">
        <string>f1</string>
      </attr>
      <attr name="Linkage.Name">
        <string>p1.c1.f1</string>
      </attr>  
      <attr name="Metric.Lines.LOC">
        <int>1</int>
      </attr>
      <attr name="Metric.Number_Of_Calling_Routines">
        <int>1</int>
      </attr>
      <attr name="Metric.Number_Of_Called_Routines">
        <int>1</int>
      </attr>	 	  
    </node>
    <node id="N3">
      <type xlink:href="Class"/>
      <attr name="Source.Name">
        <string>c1</string>
      </attr>
      <attr name="Linkage.Name">
        <string>p1.c1</string>
      </attr>
      <attr name="Metric.Number_Of_Calling_Routines">
        <int>1</int>
      </attr>
      <attr name="Metric.Number_Of_Called_Routines">
        <int>1</int>
      </attr>	  
      <attr name="Metric.Lines.LOC">
        <int>1</int>
      </attr>
      <attr name="Metric.McCabe_Complexity">
        <int>1</int>
      </attr>	  
    </node>
    <node id="N4">
      <type xlink:href="Class"/>
      <attr name="Source.Name">
        <string>c2</string>
      </attr>
      <attr name="Linkage.Name">
        <string>p1.c2</string>
      </attr>
      <attr name="Metric.Number_Of_Calling_Routines">
        <int>1</int>
      </attr>
      <attr name="Metric.Number_Of_Called_Routines">
        <int>1</int>
      </attr>	  
      <attr name="Metric.Lines.LOC">
        <int>1</int>
      </attr>
      <attr name="Metric.McCabe_Complexity">
        <int>1</int>
      </attr>	  
    </node>
    <node id="N5">
      <type xlink:href="Class"/>
      <attr name="Source.Name">
        <string>c3</string>
      </attr>
      <attr name="Linkage.Name">
        <string>p1.c3</string>
      </attr>
      <attr name="Metric.Number_Of_Calling_Routines">
        <int>2</int>
      </attr>
      <attr name="Metric.Number_Of_Called_Routines">
        <int>2</int>
      </attr>	  
      <attr name="Metric.Lines.LOC">
        <int>2</int>
      </attr>
      <attr name="Metric.McCabe_Complexity">
        <int>2</int>
      </attr>		  
    </node>
    <node id="N6">
      <type xlink:href="Class"/>
      <attr name="Source.Name">
        <string>c4</string>
      </attr>
      <attr name="Linkage.Name">
        <string>p1.c4</string>
      </attr>
      <attr name="Metric.Number_Of_Calling_Routines">
        <int>3</int>
      </attr>
      <attr name="Metric.Number_Of_Called_Routines">
        <int>3</int>
      </attr>	  
      <attr name="Metric.Lines.LOC">
        <int>3</int>
      </attr>
      <attr name="Metric.McCabe_Complexity">
        <int>3</int>
      </attr>		  
    </node>
    <node id="N7">
      <type xlink:href="Package"/>
      <attr name="Source.Name">
        <string>p1</string>
      </attr>
      <attr name="Linkage.Name">
        <string>p1</string>
      </attr>
      <attr name="Metric.Number_Of_Descendants">
        <int>3</int>
      </attr>	
    </node>
    <edge id="E1" from="N1" to="N3">
      <type xlink:href="Belongs_To"/>
    </edge>
    <edge id="E2" from="N2" to="N3">
      <type xlink:href="Belongs_To"/>
    </edge>
    <edge id="E3" from="N3" to="N7">
      <type xlink:href="Belongs_To"/>
    </edge>	
    <edge id="E4" from="N4" to="N7">
      <type xlink:href="Belongs_To"/>
    </edge>
    <edge id="E5" from="N5" to="N7">
      <type xlink:href="Belongs_To"/>
    </edge>
    <edge id="E6" from="N6" to="N7">
      <type xlink:href="Belongs_To"/>
    </edge>	
    <edge id="E7" from="N4" to="N5">
      <type xlink:href="Call"/>
    </edge>
    <edge id="E8" from="N5" to="N6">
      <type xlink:href="Call"/>
    </edge>
    <edge id="E9" from="N1" to="N2">
      <type xlink:href="Set"/>
    </edge>
  </graph>
</gxl>

Visualization of example graph

The user can configure the visualization of a graph stored in GXL in many ways. The following picture shows what the example graph looks like with these visualization settings:

  • Leaf nodes are visualized as blocks.
  • The width of a block for a leaf node is determined by the node attribute Metric.Number_Of_Calling_Routines.
  • The height of a block for a leaf node is determined by the node attribute Metric.Lines.LOC.
  • The color of a block for a leaf node is determined by the node attribute Metric.McCabe_Complexity on a color gradient from white to red.
  • Inner nodes are drawn as cylinders.
  • The height of a cylinder for an inner node is set to the fixed constant 0.001.
  • The color of a cylinder for an inner node is determined by the node attribute Metric.Level on a color gradient from light blue to blue. The metric Metric.Level is not contained in the input GXL file, but computed when the graph is read. The level of a node is the number of hierarchical levels from the node to its root. A root has level 0. An immediate child of the root has level 1 and so on.
  • A circular layout was chosen to compute the positions of the nodes. Here the nesting of circles and blocks depicts the node hierarchy.
  • The edges are hierarchically bundled.

[Image: Visualization of example graph]
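Metric.Level is described above as being computed when the graph is read, and the leaf colors come from a white-to-red gradient. The sketch below illustrates both under stated assumptions: levels takes a child-to-parent map derived from the Belongs_To edges, and white_to_red interpolates linearly in RGB between the smallest and largest metric value. Both function names are hypothetical, and the actual gradient used by the visualization may be computed differently.

```python
def levels(parent: dict[str, str], nodes: list[str]) -> dict[str, int]:
    """Metric.Level: 0 for a root, otherwise the number of Belongs_To steps to the root."""
    result = {}
    for node in nodes:
        depth, current, seen = 0, node, {node}
        while current in parent:
            current = parent[current]
            if current in seen:
                raise ValueError(f"Belongs_To cycle through {current}")
            seen.add(current)
            depth += 1
        result[node] = depth
    return result


def white_to_red(value: float, lo: float, hi: float) -> tuple[float, float, float]:
    """Linear RGB interpolation: lo -> white (1, 1, 1), hi -> red (1, 0, 0)."""
    t = 0.0 if hi == lo else (value - lo) / (hi - lo)
    t = min(max(t, 0.0), 1.0)  # clamp values outside [lo, hi]
    return (1.0, 1.0 - t, 1.0 - t)
```

For the example graph, the package p1 (N7) gets level 0, the classes get level 1, and m1 and f1 get level 2; with McCabe values between 1 and 3, a value of 2 maps to (1.0, 0.5, 0.5).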