SPARQLWrapper does not work for CONSTRUCT and DESCRIBE queries on the UniProt SPARQL endpoint which is Virtuoso #234

Open
vemonet opened this issue Jun 5, 2024 · 3 comments

vemonet commented Jun 5, 2024

Any CONSTRUCT or DESCRIBE query run against the UniProt SPARQL endpoint https://sparql.uniprot.org/sparql/ fails with SPARQLWrapper, whatever return format is requested (XML or Turtle).

Code to reproduce:

When asking for XML, at least an error is thrown:

from SPARQLWrapper import TURTLE, XML, SPARQLWrapper

query = """PREFIX up: <http://purl.uniprot.org/core/>
PREFIX taxon: <http://purl.uniprot.org/taxonomy/>
CONSTRUCT
{
	?protein a up:HumanProtein .
}
WHERE
{
	?protein a up:Protein .
	?protein up:organism taxon:9606
} LIMIT 10"""

sparql_endpoint = SPARQLWrapper("https://sparql.uniprot.org/sparql/")
sparql_endpoint.setReturnFormat(XML)
sparql_endpoint.setQuery(query)

results = sparql_endpoint.query().convert()
print(results)

Error message:

ExpatError                                Traceback (most recent call last)
Cell In[8], line 20
     17 # sparql_endpoint.setReturnFormat(TURTLE)
     18 sparql_endpoint.setQuery(query)
---> 20 results = sparql_endpoint.query().convert()
     21 print(results)

File ~/dev/.venv/lib/python3.10/site-packages/SPARQLWrapper/Wrapper.py:1190, in QueryResult.convert(self)
   1188 if _content_type_in_list(ct, _SPARQL_XML):
   1189     _validate_format("XML", [XML], ct, self.requestedFormat)
-> 1190     return self._convertXML()
   1191 elif _content_type_in_list(ct, _XML):
   1192     _validate_format("XML", [XML], ct, self.requestedFormat)

File ~/dev/.venv/lib/python3.10/site-packages/SPARQLWrapper/Wrapper.py:1073, in QueryResult._convertXML(self)
   1065 def _convertXML(self) -> Document:
   1066     """
   1067     Convert an XML result into a Python dom tree. This method can be overwritten in a
   1068     subclass for a different conversion method.
   (...)
   1071     :rtype: :class:`xml.dom.minidom.Document`
   1072     """
-> 1073     doc = parse(self.response)
   1074     rdoc = cast(Document, doc)
...
--> 211     parser.Parse(b"", True)
    212 except ParseEscape:
    213     pass

ExpatError: no element found: line 1, column 0

When asking for Turtle, SPARQLWrapper does not even throw an error:

from SPARQLWrapper import TURTLE, XML, SPARQLWrapper

query = """PREFIX up: <http://purl.uniprot.org/core/>
PREFIX taxon: <http://purl.uniprot.org/taxonomy/>
CONSTRUCT
{
	?protein a up:HumanProtein .
}
WHERE
{
	?protein a up:Protein .
	?protein up:organism taxon:9606
} LIMIT 10"""

sparql_endpoint = SPARQLWrapper("https://sparql.uniprot.org/sparql/")
# sparql_endpoint.setReturnFormat(XML)
sparql_endpoint.setReturnFormat(TURTLE)
sparql_endpoint.setQuery(query)

results = sparql_endpoint.query().convert()
print(results)

Printing results gives HTML: b'<!DOCTYPE html SYSTEM "about:legacy-compat">\n<html xmlns="http://www.w3.org/1999/xhtml" lang="en" xml:lang="en"><head><title>UniProt</title>......
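
A silent HTML response like this could be caught before it reaches any RDF parser. The guard below is only a sketch: the helper name `ensure_not_html` is hypothetical and not part of SPARQLWrapper, and it assumes the caller already has the response's Content-Type header and raw body at hand.

```python
def ensure_not_html(content_type: str, body: bytes) -> bytes:
    """Return the body unchanged, or raise if it looks like an HTML page."""
    # Keep only the media type, dropping parameters such as "; charset=utf-8".
    media_type = content_type.split(";", 1)[0].strip().lower()
    # Look at the first bytes too, in case the header is missing or wrong.
    head = body.lstrip()[:64].lower()
    is_html = media_type in ("text/html", "application/xhtml+xml") or head.startswith(
        (b"<!doctype html", b"<html")
    )
    if is_html:
        raise ValueError(
            f"expected query results but got HTML (Content-Type: {content_type!r})"
        )
    return body
```

Calling it on the bytes printed above would raise a `ValueError` instead of handing an HTML page back to the caller.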

UniProt uses OpenLink Virtuoso and supports the SPARQL 1.1 Standard.

vemonet commented Jun 5, 2024

Using requests with the most obvious configuration for a SPARQL endpoint (a POST with an Accept header) just works, so the problem seems to be in whatever SPARQLWrapper does internally when it builds the request:

import requests
from rdflib import Graph

query = """PREFIX up: <http://purl.uniprot.org/core/>
PREFIX taxon: <http://purl.uniprot.org/taxonomy/>
CONSTRUCT
{
	?protein a up:HumanProtein .
}
WHERE
{
	?protein a up:Protein .
	?protein up:organism taxon:9606
} LIMIT 10"""

response = requests.post(
    "https://sparql.uniprot.org/sparql/",
    headers={
        "Accept": "text/turtle"
    },
    data={
        "query": query
    },
    timeout=60,
)
response.raise_for_status()
g = Graph()
g.parse(data=response.text, format="turtle")

print(response.text)
print(len(g))

As a bonus, basic features like the timeout work! (The .setTimeout() option of SPARQLWrapper does not work at all, at least for the UniProt endpoint, but that should go in a separate issue.)

@JervenBolleman

UniProt is not pure Virtuoso: it has some middleware that expects the Accept header to ask for an RDF format when the query is a DESCRIBE and/or CONSTRUCT.
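
A minimal sketch of that rule, assuming the middleware only looks at the Accept header: ask for an RDF media type on CONSTRUCT/DESCRIBE and for a SPARQL results type otherwise. The helper names (`query_form`, `build_request`) are hypothetical, the keyword detection is deliberately naive, and the request is only built, not sent.

```python
import re
import urllib.parse
import urllib.request

# Assumed mapping: RDF for graph-returning queries, SPARQL results otherwise.
RDF_ACCEPT = "text/turtle"
RESULTS_ACCEPT = "application/sparql-results+json"

_IRI = re.compile(r"<[^<>\s]*>")
_COMMENT = re.compile(r"#[^\n]*")
_FORM = re.compile(r"\b(SELECT|ASK|CONSTRUCT|DESCRIBE)\b", re.IGNORECASE)


def query_form(query: str) -> str:
    """Naively detect the query form (SELECT, ASK, CONSTRUCT or DESCRIBE)."""
    # Drop IRIs first (they may contain '#'), then '#' comments.
    text = _COMMENT.sub("", _IRI.sub("", query))
    match = _FORM.search(text)
    if match is None:
        raise ValueError("could not find a query form keyword")
    return match.group(1).upper()


def build_request(endpoint: str, query: str) -> urllib.request.Request:
    """Build (but do not send) a POST request with a matching Accept header."""
    form = query_form(query)
    accept = RDF_ACCEPT if form in ("CONSTRUCT", "DESCRIBE") else RESULTS_ACCEPT
    body = urllib.parse.urlencode({"query": query}).encode("utf-8")
    return urllib.request.Request(
        endpoint,
        data=body,
        headers={"Accept": accept},
        method="POST",
    )
```

For the CONSTRUCT query from this issue, `build_request("https://sparql.uniprot.org/sparql/", query).get_header("Accept")` gives `text/turtle`, the same header the working requests call sends.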

vemonet commented Jun 20, 2024

@JervenBolleman SPARQLWrapper also fails to run SELECT queries against SwissLipids https://beta.sparql.swisslipids.org/

The server answers with "Error 500 Internal Server Error: The server was not able to handle your request.":

from SPARQLWrapper import XML, SPARQLWrapper, JSON

query = """PREFIX sh: <http://www.w3.org/ns/shacl#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>

SELECT DISTINCT ?comment ?query
WHERE
{
    ?sq a sh:SPARQLExecutable ;
        rdfs:label|rdfs:comment ?comment ;
        sh:select|sh:ask|sh:construct|sh:describe ?query .
}"""

sparql_endpoint = SPARQLWrapper("https://beta.sparql.swisslipids.org/")
sparql_endpoint.setReturnFormat(XML)
sparql_endpoint.setTimeout(60)
sparql_endpoint.setQuery(query)

results = sparql_endpoint.query().convert()
print(results)

With requests it works:

import requests

query = """PREFIX sh: <http://www.w3.org/ns/shacl#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>

SELECT DISTINCT ?comment ?query
WHERE
{
    ?sq a sh:SPARQLExecutable ;
        rdfs:label|rdfs:comment ?comment ;
        sh:select|sh:ask|sh:construct|sh:describe ?query .
}"""

response = requests.post(
    "https://beta.sparql.swisslipids.org/",
    headers={
        "Accept": "application/json",
        "User-agent": "sparqlwrapper 2.0.1a0 (rdflib.github.io/sparqlwrapper)"
    },
    data={
        "query": query
    },
    timeout=60,
)
try:
    response.raise_for_status()
    print(response.json())
except requests.exceptions.HTTPError as e:
    print(e)
    print(response.text)
