RDFLib Rounding numeric values #1539
Replies: 12 comments
-
Can you provide a minimal example?
-
Attached are the input JSON-LD file and the output TTL file from an RDF conversion. keigito-20160804124458.jsonld.txt
-
I was searching for something as minimal as this:
Some tests show that this behavior originates from turtle.py/TurtleSerializer/label():
To be honest, I'm not entirely sure whether this is a bug or a feature, but I can see that it is annoying in your case. IIRC this is designed to make
On the other hand, making this easier to configure wouldn't hurt that much...
-
Thanks for looking into this. I would be happy if this were something that could be set up in the config. Different users have different needs...
-
I have slightly different results, at least when it comes to the number of decimal digits transmitted through parsing of Turtle files to the graph in Python. For a graph in Turtle:
:v gc:hasValue 1.0947123456789123e+02 .
I'm getting:
i.e. I have 12 significant figures (out of 15 I had in TTL).
-
And when I added:
:s gc:hasValue -2616.34857099.
to my TTL, I'm getting:
http://chemsem.com/s http://purl.org/gc/hasValue -2616.34857099
in the Graph, i.e. no degradation of precision!
-
And, upon:
ms=rdflib.URIRef("http://chemsem.com/s")
I see I have no degradation of precision in Python variables!
-
I think the crucial difference is that Turtle differs in notation between xsd:decimal and xsd:double:

from rdflib import *
# Plain decimal notation is parsed as xsd:decimal and keeps every digit.
print(list(Graph().parse(data='''
<urn:foo> <http://chemsem.com/number> 14.00307400478 .
''', format='turtle').objects()))

[rdflib.term.Literal(u'14.00307400478',
datatype=rdflib.term.URIRef(u'http://www.w3.org/2001/XMLSchema#decimal'))]

And:

# Exponent notation is parsed as xsd:double, i.e. a Python float.
print(list(Graph().parse(data='''
<urn:foo> <http://chemsem.com/number> 1.400307400478E+01 .
''', format='turtle').objects()))

[rdflib.term.Literal(u'14.0030740048',
datatype=rdflib.term.URIRef(u'http://www.w3.org/2001/XMLSchema#double'))]

Whereas JSON-LD only deals with xsd:double here:

# A JSON number with a fractional part also ends up as xsd:double.
print(list(Graph().parse(data='''
{"@id": "urn:foo", "http://chemsem.com/number": 14.00307400478}
''', format='json-ld').objects()))

[rdflib.term.Literal(u'14.0030740048',
datatype=rdflib.term.URIRef(u'http://www.w3.org/2001/XMLSchema#double'))]

It seems from this that the Turtle serializer in RDFLib is at fault for losing precision:

# Round trip: parse the double, then serialize it back to Turtle.
print(Graph().parse(data='''
<urn:foo> <http://chemsem.com/number> 1.400307400478E+01 .
''', format='turtle').serialize(format='turtle'))

since it yields (with superfluous prefixes removed):

@prefix ns1: <http://chemsem.com/> .
<urn:foo> ns1:number 1.400307e+01 .

(That's my reading of it, since the Turtle spec section on numbers doesn't seem to impose any length limits.)
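
The seven-digit result can be reproduced without RDFLib. This is a minimal sketch, assuming the serializer formats xsd:double values with Python's '%e' conversion (which keeps six digits after the decimal point by default) and then strips trailing zeros before the exponent. `format_double_like_e` is a hypothetical helper name used for illustration, not an RDFLib function:

```python
import re

def format_double_like_e(value):
    # Hypothetical helper: '%e' keeps six digits after the decimal point by default;
    # the regex then drops trailing zeros (and a bare '.') before the exponent.
    return re.sub(r"\.?0*e", "e", "%e" % float(value))

print(format_double_like_e(14.00307400478))  # 1.400307e+01
print(format_double_like_e(-2616.34857099))  # -2.616349e+03
print(format_double_like_e(1.5))             # 1.5e+00
```

Under that assumption, the rounding happens at formatting time, not while parsing.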
-
@niklasl and @joernhees - some suggestions below.

This specific issue goes away if we can represent the exponent form provided without datatype internally as a Decimal:

data = '''
<urn:foo> <http://chemsem.com/number> 1.4003074004780012E+01 .
'''
print(Graph().parse(data=data, format='turtle').serialize(format='turtle').decode('utf-8'))

returns

with this change in notation3.py:

*** 1444,1450 ****
m = exponent_syntax.match(argstr, i)
if m:
j = m.end()
! res.append(Decimal(argstr[i:j]))
return j

instead of:

! res.append(float(argstr[i:j]))

However, for internally generated data, there is no good answer as to how to serialize a float as text. Would it be useful to have a precision flag for serializing floats? A parameter-free change would be something like the following in term.py:

if self.datatype == _XSD_DOUBLE:
    return '%s' % Decimal.from_float(self.value)

instead of:

return sub("\\.?0*e", "e", '%e' % float(self))

would return:

or with a precision flag:

template = '%%.%de' % precision_digits
return template % self.value

returns
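
To compare these options side by side, here is a small sketch using only the standard library. The helper names and the `precision_digits` default are assumptions for illustration and are not part of RDFLib. The `repr()` variant is not proposed in this thread and is included only for comparison:

```python
from decimal import Decimal

def exact_expansion(value):
    # Exact decimal expansion of the binary double: lossless but verbose.
    return str(Decimal.from_float(value))

def fixed_precision(value, precision_digits=15):
    # Scientific notation with a caller-chosen number of digits after the point.
    template = '%%.%de' % precision_digits
    return template % value

def shortest_round_trip(value):
    # In Python 3, repr() gives the shortest string that parses back to the same double.
    return repr(value)

x = 14.00307400478
print(exact_expansion(x))
print(fixed_precision(x, 11))   # 1.40030740048e+01
print(shortest_round_trip(x))   # 14.00307400478
```

The exact expansion never loses information but prints many digits the author never wrote. A fixed precision is compact but has to be chosen by the caller.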
-
@niklasl Can you look at @satra's ideas and see if any are a solution? Any update to RDFLib on this issue would be much appreciated. Stuart
-
@niklasl is right, the issue is that we do the serialisation using '%e'. For the original poster's literal we get the "right" value with 11 digits:

> '%.11e'%14.0030740048
'1.40030740048e+01'

BUT, there is no way to know how much precision to include, since this information is lost in a float/double. My take would be: If you need higher precision you can use xsd:decimal. See also: https://www.w3.org/TR/swbp-xsch-datatypes/#sec-numerics
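
The point about lost precision can be illustrated with the standard library alone. In this sketch (an illustration, not RDFLib code), two different spellings collapse to the same float, so nothing records how many digits were originally written, while Decimal keeps the digits it was given:

```python
from decimal import Decimal

short_form = '14.00307400478'
padded_form = '14.003074004780000'

# As floats, both spellings are the same value; the written precision is gone.
print(float(short_form) == float(padded_form))      # True
print(repr(float(padded_form)))                     # 14.00307400478

# As Decimals, the values compare equal but the original digits are retained.
print(Decimal(short_form) == Decimal(padded_form))  # True
print(str(Decimal(padded_form)))                    # 14.003074004780000
```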
-
As I said, I would vote that this is a "feature". If not, the real fix is to make the literal class save and re-use the original lexical representation, but this would influence lots of things, this for example:
What would happen if you compared
Comparisons today happen in value-space on purpose. In any case, I am moving this to milestone 5.0 :)
-
I have a numeric value represented as -2616.34857099 in JSON-LD (context file datatype xsd:float or xsd:double). In the output TTL file this is reported as "-2.616349E+03".
Why is RDFLib rounding the value to seven digits? Can this be fixed?
Stuart