Note: project under heavy development!
UCFS is an Unuversal Context-Free Solver: a tool to solve problems related to context-free and regular language intersection. Examples of such problems:
- Parsing
- Context-free path querying (CFPQ)
- Context-free language reachability (CFL-r)
Usage: kotgll options_list
Options:
--input -> Input format (always required) { Value should be one of [string, graph] }
--grammar -> Grammar format (always required) { Value should be one of [cfg, rsm] }
--sppf [ON] -> Sppf mode { Value should be one of [on, off] }
--inputPath -> Path to input txt file (always required) { String }
--grammarPath -> Path to grammar txt file (always required) { String }
--outputPath -> Path to output txt file (always required) { String }
--help, -h -> Usage info
git clone https://github.com/FormalLanguageConstrainedPathQuerying/kotgll.git
or
git clone git@github.com:FormalLanguageConstrainedPathQuerying/kotgll.git
or
gh repo clone FormalLanguageConstrainedPathQuerying/kotgll
cd kotgll
gradle run --args="--help"
You will see the "Options list" message.
gradle run --args="--input graph --grammar rsm --sppf off --inputPath src/test/resources/cli/TestGraphReadWriteCSV/dyck.csv --grammarPath src/test/resources/cli/TestRSMReadWriteTXT/dyck.txt --outputPath ./result.txt"
curl -L -O https://github.com/FormalLanguageConstrainedPathQuerying/kotgll/releases/download/v1.0.0/kotgll-1.0.0.jar
java -jar kotgll-1.0.0.jar --input graph --grammar rsm --sppf off --inputPath src/test/resources/cli/TestGraphReadWriteCSV/dyck.csv --grammarPath src/test/resources/cli/TestRSMReadWriteTXT/dyck.txt --outputPath ./result.txt
StartNonterminal("S")
Nonterminal("S") -> Terminal("subClassOf_r") Nonterminal("S") Terminal("subClassOf")
Nonterminal("S") -> Terminal("subClassOf_r") Terminal("subClassOf")
Nonterminal("S") -> Terminal("type_r") Nonterminal("S") Terminal("type")
Nonterminal("S") -> Terminal("type_r") Terminal("type")
StartState(id=0,nonterminal=Nonterminal("S"),isStart=true,isFinal=false)
State(id=0,nonterminal=Nonterminal("S"),isStart=true,isFinal=false)
State(id=1,nonterminal=Nonterminal("S"),isStart=false,isFinal=false)
State(id=4,nonterminal=Nonterminal("S"),isStart=false,isFinal=false)
State(id=3,nonterminal=Nonterminal("S"),isStart=false,isFinal=true)
State(id=2,nonterminal=Nonterminal("S"),isStart=false,isFinal=false)
State(id=6,nonterminal=Nonterminal("S"),isStart=false,isFinal=true)
State(id=5,nonterminal=Nonterminal("S"),isStart=false,isFinal=false)
TerminalEdge(tail=0,head=1,terminal=Terminal("subClassOf_r"))
TerminalEdge(tail=0,head=4,terminal=Terminal("type_r"))
TerminalEdge(tail=1,head=3,terminal=Terminal("subClassOf"))
NonterminalEdge(tail=1,head=2,nonterminal=Nonterminal("S"))
TerminalEdge(tail=4,head=6,terminal=Terminal("type"))
NonterminalEdge(tail=4,head=5,nonterminal=Nonterminal("S"))
TerminalEdge(tail=2,head=3,terminal=Terminal("subClassOf"))
TerminalEdge(tail=5,head=6,terminal=Terminal("type"))
The GLL algorithm has been modified to support graph input. The proposed modification has been evaluated on several real graphs for the scenario of finding all pairs of reachability.
Machine configuration: PC with Ubuntu 20.04, Intel Core i7-6700 3.40GHz CPU, DDR4 64Gb RAM.
Enviroment configuration:
- Java HotSpot(TM) 64-Bit server virtual machine (build 15.0.2+7-27, mixed mode, sharing).
- JVM heap configuration: 55Gb both xms and xmx.
The graph data is selected from CFPQ_Data dataset.
A detailed description of the graphs is listed bellow.
Graph name | |V| | |E| | #subClassOf | #type | #broaderTransitive |
---|---|---|---|---|---|
Enzyme | 48 815 | 86 543 | 8 163 | 14 989 | 8 156 |
Eclass | 239 111 | 360 248 | 90 962 | 72 517 | 0 |
Go hierarchy | 45 007 | 490 109 | 490 109 | 0 | 0 |
Go | 582 929 | 1 437 437 | 94 514 | 226 481 | 0 |
Geospecies | 450 609 | 2 201 532 | 0 | 89 065 | 20 867 |
Taxonomy | 5 728 398 | 14 922 125 | 2 112 637 | 2 508 635 | 0 |
All queries used in evaluation are variants of same-generation query.
The inverse of an x
relation and the respective edge is denoted as x_r
.
Grammars used for RDF graphs:
G1
S -> subClassOf_r S subClassOf | subClassOf_r subClassOf
| type_r S type | type_r type
The representation of G1 context-free grammar in the repository can be found here.
The representation of G1 context-free grammar as recursive automaton in the repository can be found here.
The results of the all pairs reachability queries evaluation on graphs related to RDF analysis are listed below.
In each row, the best mean time in seconds is highlighted in bold.
Graph | CFG | RSM | GLL4Graph |
---|---|---|---|
Enzyme | 0.107 | 0.044 | 0.22 |
Eclass | 0.94 | 0.43 | 1.5 |
Go hierarchy | 4.1 | 3.0 | 3.6 |
Go | 3.2 | 1.86 | 5.55 |
Geospecies | 0.97 | 0.34 | 2.89 |
Taxonomy | 31.2 | 14.8 | 45.4 |
More results, but in raw form, can be found in the repository kotgll_benchmarks