RDFLib Rounding numeric values #1539

stuchalk · 2016-08-12T16:37:49Z

stuchalk
Aug 12, 2016

I have a numeric value represented as -2616.34857099 in JSON-LD (context file datatype xsd:float or xsd:double). In the output TTL file this is reported as "-2.616349E+03".

Why is RDFLib rounding the value to seven digits? Can this be fixed?

Stuart

joernhees · 2016-08-15T09:10:37Z

joernhees
Aug 15, 2016
Maintainer

can you provide a minimal example?


stuchalk · 2016-08-15T09:50:49Z

stuchalk
Aug 15, 2016
Author

Attached are the input JSON-LD file and output TTL file from an RDF conversion.
Look at line 139 in the JSONLD file => "number": 14.00307400478,
Look at line 4597 in the TTL file => https://staging.chemsem.com/pub/keigito-20160804124458/mol-sys/s1/a2/atomMass/ gc:hasNumber "14.00307"^^ns0:float;

keigito-20160804124458.jsonld.txt
keigito-20160804124458.ttl.txt


joernhees · 2016-08-15T13:37:28Z

joernhees
Aug 15, 2016
Maintainer

i was searching for something as minimal as this:

In [1]: from rdflib import *
INFO:rdflib:RDFLib Version: 4.2.1

In [2]: data = """{
  "@id": "urn:foo",
  "http://chemsem.com/number": 14.00307400478
}"""

In [3]: g = Graph().parse(data=data, format='json-ld')

In [4]: list(g)
Out[4]:
[(rdflib.term.URIRef(u'urn:foo'),
  rdflib.term.URIRef(u'http://chemsem.com/number'),
  rdflib.term.Literal(u'14.0030740048', datatype=rdflib.term.URIRef(u'http://www.w3.org/2001/XMLSchema#double')))]

In [5]: print g.serialize(format="turtle")
@prefix ns1: <http://chemsem.com/> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix xml: <http://www.w3.org/XML/1998/namespace> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .

<urn:foo> ns1:number 1.400307e+01 .


In [6]: print g.serialize(format="nt")
<urn:foo> <http://chemsem.com/number> "14.0030740048"^^<http://www.w3.org/2001/XMLSchema#double> .

Some tests show that this behavior originates from turtle.py/TurtleSerializer/label():

In [13]: l = list(g)[0][2]

In [14]: l.n3()
Out[14]: u'"14.0030740048"^^<http://www.w3.org/2001/XMLSchema#double>'

In [15]: l._literal_n3()
Out[15]: u'"14.0030740048"^^<http://www.w3.org/2001/XMLSchema#double>'

In [16]: l._literal_n3(use_plain=True)
Out[16]: u'1.400307e+01'

To be honest, I'm not entirely sure whether this is a bug or a feature, but i can see that it is annoying in your case. IIRC this is designed to make Literal(1) end up as 1 in ttl output, so simply switching it by default will cause trouble for people relying on that:

In [17]: Literal(1).n3()
Out[17]: u'"1"^^<http://www.w3.org/2001/XMLSchema#integer>'

In [18]: Literal(1)._literal_n3(use_plain=True)
Out[18]: u'1'

On the other hand making this easier to configure wouldn't hurt that much...


stuchalk · 2016-08-15T13:40:42Z

stuchalk
Aug 15, 2016
Author

Thanks for looking into this. I would be happy if this were something that could be set up in the config. Different users have different needs...


sopekmir · 2016-08-19T17:31:06Z

sopekmir
Aug 19, 2016

I get somewhat different results, at least for the number of decimal digits that survive when a Turtle file is parsed into a graph in Python.

For a graph in Turtle:
@prefix : <http://chemsem.com/> .
@prefix gc: <http://purl.org/gc/> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .

:v gc:hasValue 1.0947123456789123e+02 .
:w gc:hasValue "1.0947123456789123e+02"^^xsd:double.
:x gc:hasValue "1.0947123456789123e+02"^^xsd:float.

I'm getting:
(...)
http://chemsem.com/w http://purl.org/gc/hasValue 109.471234568
http://chemsem.com/x http://purl.org/gc/hasValue 109.471234568
http://chemsem.com/v http://purl.org/gc/hasValue 109.471234568

i.e. I have 12 significant figures (out of the 17 I had in TTL).
This seems to be more than you reported. (I used rdflib 4.2.2-dev.)
However, the result is the same regardless of the XSD type indicated, or with no type at all!


sopekmir · 2016-08-19T17:33:26Z

sopekmir
Aug 19, 2016

And ... when I added:

:s gc:hasValue -2616.34857099.

to my TTL, I'm getting:

http://chemsem.com/s http://purl.org/gc/hasValue -2616.34857099

in the Graph, i.e. no degradation of precision !


sopekmir · 2016-08-19T17:41:06Z

sopekmir
Aug 19, 2016

And, upon:

ms=rdflib.URIRef("http://chemsem.com/s")
mp=rdflib.URIRef("http://purl.org/gc/hasValue")
va=t3.value(ms,mp)
print va
-2616.34857099
vaf=float(va)
print vaf
-2616.34857099

I see that I have no degradation of precision in the Python variables!
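
A possible explanation for the 12-digit results above, offered as a sketch rather than a confirmed diagnosis: under Python 2, str() of a float keeps 12 significant digits, and the lexical forms shown in this thread are consistent with that. The helper name py2_style_str is made up for illustration; it emulates that formatting with '%.12g' so that it also runs on Python 3.

```python
def py2_style_str(x):
    # Python 2's str(float) kept 12 significant digits; '%.12g' mimics that.
    return '%.12g' % x


samples = [14.00307400478, 109.47123456789123, -2616.34857099]

for x in samples:
    shown = py2_style_str(x)
    # The second column says whether the shortened text still parses
    # back to exactly the same double.
    print(shown, float(shown) == x)
```

Under this assumption, a value with at most 12 significant digits such as -2616.34857099 comes through unchanged, while longer values are rounded at the twelfth digit.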


niklasl · 2016-08-19T21:06:54Z

niklasl
Aug 19, 2016
Maintainer

I think the crucial difference is that Turtle differs in notation between xsd:decimal:

from rdflib import *

print(list(Graph().parse(data='''
    <urn:foo> <http://chemsem.com/number> 14.00307400478 .
    ''', format='turtle').objects()))

[rdflib.term.Literal(u'14.00307400478',
 datatype=rdflib.term.URIRef(u'http://www.w3.org/2001/XMLSchema#decimal'))]

And xsd:double:

print(list(Graph().parse(data='''
    <urn:foo> <http://chemsem.com/number> 1.400307400478E+01 .
    ''', format='turtle').objects()))

[rdflib.term.Literal(u'14.0030740048',
 datatype=rdflib.term.URIRef(u'http://www.w3.org/2001/XMLSchema#double'))]

Whereas JSON-LD only deals with xsd:double for a number like this one (IIRC because JSON, or at least the intersection of JSON implementations, is limited to that):

print(list(Graph().parse(data='''
    {"@id": "urn:foo", "http://chemsem.com/number": 14.00307400478}
    ''', format='json-ld').objects()))

[rdflib.term.Literal(u'14.0030740048',
 datatype=rdflib.term.URIRef(u'http://www.w3.org/2001/XMLSchema#double'))]

It seems from this that the Turtle serializer in RDFLib is at fault for losing precision:

print(Graph().parse(data='''
    <urn:foo> <http://chemsem.com/number> 1.400307400478E+01 .
    ''', format='turtle').serialize(format='turtle'))

since it yields (with superfluous prefixes removed):

@prefix ns1: <http://chemsem.com/> .

<urn:foo> ns1:number 1.400307e+01 .

(That's my reading of it, since the Turtle spec section on numbers doesn't seem to impose any length limits.)


satra · 2016-10-08T20:24:58Z

satra
Oct 8, 2016

@niklasl and @joernhees - some suggestions below.

this specific issue goes away if we can represent the exponent form provided without datatype internally as Decimal

data = '''
    <urn:foo> <http://chemsem.com/number> 1.4003074004780012E+01 .
    '''
print(Graph().parse(data=data, format='turtle').serialize(format='turtle').decode('utf-8'))

returns

<urn:foo> ns1:number 14.003074004780012 .

with this change in notation3.py

*** 1444,1450 ****
                  m = exponent_syntax.match(argstr, i)
                  if m:
                      j = m.end()
!                     res.append(Decimal(argstr[i:j]))
                      return j

instead of:

!                     res.append(float(argstr[i:j]))

however for internally generated data, there is no good answer as to how to serialize a float as text. would it be useful to have a precision flag for serializing floats?

a parameter free change would be something like the following in term.py:

                if self.datatype == _XSD_DOUBLE:
                    return '%s' % Decimal.from_float(self.value)

instead of:

                    return sub("\\.?0*e", "e", '%e' % float(self))

would return:

<urn:foo> ns1:number 14.003074004780000194614331121556460857391357421875 .

or with a precision_digits parameter

                    template = '%%.%de' % precision_digits
                    return template % self.value

returns

<urn:foo> ns1:number 1.40030740047800002e+01 .
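
The two term.py ideas above can be tried outside RDFLib with the standard library alone. This is only a sketch of the formatting step, not a patch: it uses a plain Python float in place of a Literal, and precision_digits is the hypothetical parameter name from the suggestion above, set here to 17.

```python
from decimal import Decimal

value = 14.00307400478  # stored as a binary double

# Idea 1: exact decimal expansion of the stored double (lossless but long).
exact = '%s' % Decimal.from_float(value)

# Idea 2: a fixed number of digits after the decimal point.
precision_digits = 17  # hypothetical parameter, not an existing RDFLib option
template = '%%.%de' % precision_digits
fixed = template % value

print(exact)
print(fixed)

# Both strings parse back to exactly the same double.
print(float(exact) == value, float(fixed) == value)
```

Both forms preserve the value; the trade-off is that they expose digits the user never typed.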


stuchalk · 2016-10-19T16:53:20Z

stuchalk
Oct 19, 2016
Author

@niklasl Can you look at @satra's ideas and see if any are a solution? Any update to RDFLib on this issue would be much appreciated. Stuart


gromgull · 2016-10-19T18:14:11Z

gromgull
Oct 19, 2016
Maintainer

@niklasl is right, the issue is that 2.2 in turtle is datatype decimal, whereas 2.2e0 is a double. Since we have a double, we either have to serialize as a string with explicit datatype, or using the e notation.

We do the serialisation using "%e"%myfloat notation, where Python will default to 6 digits of precision.

For the original posters literal we get the "right" value with 11 digits:

> '%.11e'%14.0030740048
'1.40030740048e+01'

BUT, there is no way to know how much precision to include, since this information is lost in a float/double.
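
To make the "%e" behaviour concrete, here is a small sketch. current_style mirrors the expression from term.py quoted earlier in this thread. shortest_roundtrip_e is a hypothetical helper, not something RDFLib provides: it is one possible heuristic that keeps adding digits until the text parses back to the same double. It cannot recover the precision the user originally wrote, only the shortest text that identifies the stored value.

```python
import math
from re import sub


def current_style(value):
    # '%e' defaults to 6 digits after the point; the sub() strips trailing zeros.
    return sub(r"\.?0*e", "e", "%e" % value)


def shortest_roundtrip_e(value):
    """Hypothetical helper: shortest exponent-form text that parses back to value."""
    if not math.isfinite(value):
        raise ValueError("this sketch only handles finite doubles")
    for digits in range(17):
        text = "%.*e" % (digits, value)
        if float(text) == value:
            return text
    return text  # 16 digits after the point (17 significant) is the fallback


x = 14.00307400478
print(current_style(x))         # precision is lost here
print(shortest_roundtrip_e(x))  # parses back to the same double
```

The loop stops at 16 digits after the point because 17 significant digits are always enough to identify an IEEE 754 double.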

My take would be: xsd:double and xsd:float are intended to be represented as a floating point value internally, with all the trouble this brings with it.

If you need higher precision you can use xsd:decimal (which maps to the Decimal class in Python). These should keep the same precision that you input, and you can also make your JSON-LD save literals with this datatype explicitly.

See also: https://www.w3.org/TR/swbp-xsch-datatypes/#sec-numerics
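
As a rough illustration of the xsd:decimal route, the sketch below builds a JSON-LD document in which the number is carried as a string inside a typed value object, so it never passes through a binary float. The helper name decimal_value is made up for this example, and the Decimal() call is only a loose sanity check, since Python's Decimal accepts some forms (such as exponents) that xsd:decimal does not.

```python
import json
from decimal import Decimal

XSD_DECIMAL = "http://www.w3.org/2001/XMLSchema#decimal"


def decimal_value(lexical):
    """Hypothetical helper: JSON-LD value object carrying an xsd:decimal string."""
    Decimal(lexical)  # loose check; raises decimal.InvalidOperation on garbage
    return {"@value": lexical, "@type": XSD_DECIMAL}


doc = {
    "@id": "urn:foo",
    "http://chemsem.com/number": decimal_value("14.00307400478"),
}

# The digits are kept as text, so serializing to JSON does not round them.
print(json.dumps(doc, indent=2))
```

The same effect can be had by declaring the property's @type as xsd:decimal in the context and writing the value as a JSON string rather than a JSON number.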


gromgull · 2017-01-19T13:24:40Z

gromgull
Jan 19, 2017
Maintainer

As I said, I would vote that this is a "feature". If it is not, the real fix is to make the Literal class save and reuse the original lexical representation, but this would influence lots of things, this for example:

In [104]: rdflib.Literal('02', datatype=rdflib.XSD.integer)
Out[104]: rdflib.term.Literal(u'2', datatype=rdflib.term.URIRef(u'http://www.w3.org/2001/XMLSchema#integer'))

What would happen if you compared "02"^^xsd:int and "2"^^xsd:int?

Comparisons today happen in value-space on purpose.

In any case, I am moving this to milestone 5.0 :)
