Replies: 11 comments
-
I think that RDF and SPARQL APIs distinguish between variables and BNodes.

Jim

On Thu, Apr 30, 2015 at 9:37 AM Jörn Hees notifications@github.com wrote:
-
Thanks for the reply. At the moment I'm just after BGPs, but you're right, it could get a lot more complicated. Using a SPIN-like approach to just map SPARQL to RDF and then use your RGDA1 algorithm on it is a very interesting idea. They're not transforming the Variables into BNodes, but I could. Another problem I see for now is the order-preserving Turtle sequence they're using (http://spinrdf.org/sp.html#overview) for the BGP, as I want a statement-order-independent mapping... so I guess I'll try to go for a more direct mapping into an RDF graph, using BNodes for the reification of each BGP statement and BNodes for all Variables.
-
FYI, we added support for this in the Attean Perl library to help with caching. It was a fairly simple extension of existing graph canonicalization code to handle any set of patterns (triples, triple patterns, quads, quad patterns, SPARQL Results...).
-
This might not be too immediately practical, since the code I'm pointing

See HashablePatternList: https://code.google.com/p/fuxi/source/browse/lib/Rete/Network.py#50

Perhaps there is a more recent port of this. However, this only covers Basic Graph Patterns ...

On Fri, May 1, 2015 at 3:01 PM, Gregory Todd Williams <
-
First of all: thanks for all the cool feedback, I guess I would've chewed on this for quite a while without you guys ;)

I ended up using the following reification-based approach, mostly because it's quite short (most of the below is doctest) and uses the RGDA1 canonicalization that we already have in rdflib. Below I first convert each triple into a reified statement in a new graph, also converting Variables into BNodes. Then I use the RGDA1 canonicalization on that graph and afterwards re-extract the triples, transforming the renamed BNodes into renamed Variables. The result is a variable-name- and order-independent canonicalization of the BGP:

from collections.abc import Iterable

import rdflib.compare
from rdflib import Graph
from rdflib.namespace import RDF
from rdflib.term import BNode, URIRef, Variable


def canonicalize_sparql_bgp(gp):
    """Returns a canonical basic graph pattern (BGP) with canonical var names.

    :param gp: a GraphPattern in form of a list of triples with Variables
    :return: A canonical GraphPattern with Variables renamed.

    >>> U = URIRef
    >>> V = Variable
    >>> gp1 = [
    ...     (V('blub'), V('bar'), U('blae')),
    ...     (V('foo'), V('bar'), U('bla')),
    ...     (V('foo'), U('poo'), U('blub')),
    ... ]
    >>> cgp = canonicalize_sparql_bgp(gp1)
    >>> v_blub = V('cb0')
    >>> v_bar = V(
    ...     'cb3d1b27f6269e23775a8da8d966dd669aa8262176ae6b938cccd653316791c42269')
    >>> v_foo = V(
    ...     'cb3b2718590899b3875a33cdc4aad060832711a614ee9c0ac83323f2e961bcc3f2db')
    >>> expected = [
    ...     (v_blub, v_bar, U('blae')),
    ...     (v_foo, v_bar, U('bla')),
    ...     (v_foo, U('poo'), U('blub'))
    ... ]
    >>> cgp == expected
    True

    To show that this is variable name and order independent we shuffle gp1 and
    rename its vars:

    >>> gp2 = [
    ...     (V('foonkyname'), V('baaar'), U('bla')),
    ...     (V('foonkyname'), U('poo'), U('blub')),
    ...     (V('funkyname'), V('baaar'), U('blae')),
    ... ]
    >>> cgp == canonicalize_sparql_bgp(gp2)
    True
    """
    assert isinstance(gp, Iterable)
    g = Graph()
    for t in gp:
        # reify each triple pattern; Variables become BNodes named after them
        triple_bnode = BNode()
        s, p, o = [BNode(i) if isinstance(i, Variable) else i for i in t]
        g.add((triple_bnode, RDF['type'], RDF['Statement']))
        g.add((triple_bnode, RDF['subject'], s))
        g.add((triple_bnode, RDF['predicate'], p))
        g.add((triple_bnode, RDF['object'], o))
    # RGDA1: deterministically relabel all BNodes (statement nodes and variables)
    cg = rdflib.compare.to_canonical_graph(g)
    cgp = []
    for triple_bnode in cg.subjects(RDF['type'], RDF['Statement']):
        # re-extract the pattern, turning canonical BNodes back into Variables
        t = [
            cg.value(triple_bnode, p)
            for p in [RDF['subject'], RDF['predicate'], RDF['object']]
        ]
        t = tuple([Variable(i) if isinstance(i, BNode) else i for i in t])
        cgp.append(t)
    return sorted(cgp)

The remaining question is: do we want this in rdflib somewhere?
-
Hmm, SPIN's approach is not what I would have imagined. They seem to have a special sp:_ URI space for variables, and, like Jörn said, they use a list to order the BGP patterns. Both seem odd to me, but are probably needed if they want to do complete 1:1 reconstructions.
-
I'd like to draw your attention to a related discussion flaming up on the semantic web mailing list:
-
@jimmccusker: Would it be difficult to extend RGDA1 to N3? Surely, you would first have to extend it to Generalized RDF Graphs. The next problem is formulas: in order to run RGDA1 on an N3 formula, it could first determine a hash for each subformula by invoking itself recursively. Then it could use these hashes in place of the original formulas. But then there are also the variable bindings... Finally, could having support for N3 make it easier to canonicalize SPARQL?
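The recursive part of this idea can be sketched under strong simplifying assumptions: a formula is modeled as a Python frozenset of triples whose terms are either plain strings (ground IRIs or literals) or nested formulas, and there are no blank nodes or variables, so the RGDA1 step itself and the variable-binding problem are left out entirely. Each subformula is hashed first and its digest substituted in its place; `formula_hash` and the `formula:`/`term:` prefixes are hypothetical.

```python
import hashlib
import json


def formula_hash(formula):
    """Order-independent SHA-256 digest of a ground formula (a frozenset of triples).

    A term is either a str (treated as a ground IRI or literal) or a nested
    frozenset, i.e. a subformula, which is hashed recursively and replaced by
    its digest.
    """
    def term_key(term):
        if isinstance(term, frozenset):
            return "formula:" + formula_hash(term)  # recurse into the subformula
        return "term:" + term

    # json.dumps keeps each serialized triple unambiguous; sorting the lines makes
    # the digest stable regardless of set iteration order (which varies between
    # Python runs because string hashing is randomized)
    lines = sorted(json.dumps([term_key(t) for t in triple]) for triple in formula)
    return hashlib.sha256("\n".join(lines).encode("utf-8")).hexdigest()
```

A real N3 version would have to run RGDA1 inside each formula and then decide how blank nodes and quantified variables shared across formula boundaries are labeled, which is exactly the open problem mentioned above.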
-
Some small changes would need to be made to support Generalized RDF graphs. I think a better general-use approach for canonicalizing SPARQL will be

Is full N3 used commonly anymore? I haven't seen much of it.

Jim

On Thu, May 21, 2015 at 3:31 PM Urs Holzer notifications@github.com wrote:
-
Well, I am using N3 extensively. (Although I am usually not representative.) Also, Jos De Roo is still actively developing EYE. But okay, point about SPARQL taken.
-
FYI, I think I'm now canonicalizing blank nodes in predicates in RGDA1, but I haven't tested it explicitly.
-
I'm currently performing >> 1M SPARQL queries as part of a machine learning algorithm. As this takes a while, I thought about caching results for SPARQL queries. The problem here is that different SPARQL queries can contain Variables with different names but are otherwise isomorphic. Example:

is isomorphic to

For quick checking in a cache it would be cool to have a canonical form of a SPARQL pattern, very much like #441 (rdflib.compare.to_canonical_graph(g1)) for rdflib.Graph.

A SPARQL query's pattern part can be represented as an rdflib.Graph which contains Variables. By replacing Variables with BNodes (using the variable name as BNode id), one gets pretty close to a graph that one could use the to_canonical_graph algorithm on, with one exception: BNodes can't be used as predicates (RDF Concepts). As this is out of spec, I guess it's OK that this fails:

Nevertheless, as this is quite close to a cool feature and graph canonicalization isn't exactly the easiest problem to think about: is it maybe possible to slightly adapt the RGDA1 algorithm to support BNodes in the predicate position as well, thereby also making it fit for SPARQL patterns? Maybe @jimmccusker has an idea on this?
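For small BGPs the caching idea can be tried out before reaching for RGDA1 by computing a canonical form by brute force: rename the variables in every possible order to `?v0`, `?v1`, ..., sort each renamed pattern list, and keep the lexicographically smallest result. This is a swapped-in technique, not RGDA1, and it costs O(n!) in the number n of variables. Terms are plain strings here (a leading `?` marks a variable), result rows are assumed to be dicts keyed by the same `?name` spelling, and `canonical_bgp`/`QueryCache` are hypothetical names.

```python
from itertools import permutations


def canonical_bgp(bgp):
    """Return (canonical form, renaming) for a BGP given as (s, p, o) string tuples."""
    variables = sorted({t for triple in bgp for t in triple if t.startswith("?")})
    best, best_rename = None, None
    for order in permutations(variables):
        rename = {v: f"?v{i}" for i, v in enumerate(order)}
        candidate = tuple(sorted(tuple(rename.get(t, t) for t in triple) for triple in bgp))
        if best is None or candidate < best:
            best, best_rename = candidate, rename
    return best, best_rename


class QueryCache:
    """Memoize BGP query results under the BGP's canonical form.

    Rows are stored with canonical variable names and renamed back to the
    caller's own variable names on every lookup.
    """

    def __init__(self, run_query):
        self._run_query = run_query  # callable that actually executes the BGP
        self._store = {}

    def get(self, bgp):
        key, rename = canonical_bgp(bgp)
        if key not in self._store:
            rows = self._run_query(bgp)
            self._store[key] = [{rename[v]: val for v, val in row.items()} for row in rows]
        back = {canon: own for own, canon in rename.items()}
        return [{back[v]: val for v, val in row.items()} for row in self._store[key]]
```

Renaming the bindings back matters: without it, a second, isomorphic query would receive rows keyed by the first query's variable names. For a BGP with symmetries any minimal renaming works, because the set of solutions is invariant under those symmetries.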