documentation element: character data between child elements not allowed #390

Closed
woutdenolf opened this issue Mar 4, 2024 · 4 comments
Labels: wontfix (This will not be worked on)


@woutdenolf

I'm trying to use xmlschema and found a validation error which we don't get with lxml (see nexusformat/definitions#1368)

import xmlschema
xmlschema.XMLSchema("nxdl.xsd")
Traceback (most recent call last):
  File "/home/denolf/virtualenvs/nexus/lib/python3.10/site-packages/xmlschema/validators/schemas.py", line 1197, in _parse_inclusions
    self.include_schema(location, self.base_url)
  File "/home/denolf/virtualenvs/nexus/lib/python3.10/site-packages/xmlschema/validators/schemas.py", line 1264, in include_schema
    schema = type(self)(
  File "/home/denolf/virtualenvs/nexus/lib/python3.10/site-packages/xmlschema/validators/schemas.py", line 482, in __init__
    self.parse_error(e.reason or e, elem=e.elem)
  File "/home/denolf/virtualenvs/nexus/lib/python3.10/site-packages/xmlschema/validators/xsdbase.py", line 196, in parse_error
    raise error
xmlschema.validators.exceptions.XMLSchemaParseError: character data between child elements not allowed:

Schema component:

  <xs:element xmlns:xs="http://www.w3.org/2001/XMLSchema" name="example">m^2</xs:element>

Path: /xs:schema/xs:simpleType[4]/xs:annotation/xs:documentation/xs:element[1]

Schema URL: file:///tmp/nexus_definitions/nxdlTypes.xsd

Origin URL: file:///tmp/nexus_definitions/nxdl.xsd

The issue is this

<?xml version="1.0" encoding="UTF-8"?>

<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
	targetNamespace="http://definition.nexusformat.org/nxdl/3.1"
	xmlns:nxdl="http://definition.nexusformat.org/nxdl/3.1"
	elementFormDefault="qualified">

	<xs:simpleType name="NX_ANGLE">
		<xs:annotation>
			<xs:documentation>
				units of angle
				<xs:element name="example">rad</xs:element>
			</xs:documentation>
		</xs:annotation>
		<xs:restriction base="xs:string" />
	</xs:simpleType>
	
</xs:schema>

The docs https://www.w3schools.com/xml/el_documentation.asp say that

 <documentation
source=URI reference
xml:lang=language>

Any well-formed XML content

</documentation> 

I'm not sure what "Any well-formed XML content" means, but in xmlschema it seems to mean either plain text or child elements, not both. Can anyone provide additional context? I'm not that familiar with XML Schema.

@woutdenolf
Author

For reference, the error comes from

reason = _("character data between child elements not allowed")

@brunato
Member

brunato commented Mar 5, 2024

Hi,
the content type of <documentation> should be mixed, but the content is set as element-only, maybe because of a wrong derivation in the meta-schema. I'll try to fix it ASAP. Meanwhile, to obtain a schema instance, provide validation='lax':

>>> import xmlschema
>>> xs = xmlschema.XMLSchema11("nxdlTypes.xsd", validation='lax')
>>> len(xs.all_errors)
33

Despite the errors, the schema instance should be usable for validating XML data.

Thank you

@brunato added the bug (Something isn't working) label Mar 5, 2024
@brunato
Member

brunato commented Mar 8, 2024

The problem is with:

<xs:element name="example">rad</xs:element>

and similar tags inside xs:documentation elements.

That is actually an invalid element declaration because an xs:element cannot contain character data.

Probably many validators don't check the tags inside xs:documentation, but this could be incorrect: the content of xs:documentation is a lax wildcard, so the well-known elements in its content must be valid (this is certainly the case for an xs:element).

This is the part of the XSD meta-schema that declares the xs:documentation element:
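
To locate such declarations before running a strict validator, a rough check along the following lines may help. It is only a sketch built on the standard library, not part of xmlschema: the function name `find_xsd_elements_with_text` and the sample schema are made up for illustration, and it assumes that every element in the XSD namespace other than xs:documentation and xs:appinfo has element-only content.

```python
import xml.etree.ElementTree as ET

XSD_NS = "http://www.w3.org/2001/XMLSchema"
# Assumption: these are the only XSD elements declared with mixed content.
MIXED_CONTENT = {"documentation", "appinfo"}


def find_xsd_elements_with_text(schema_text):
    """List XSD-namespace elements nested in xs:documentation that hold text.

    Returns tuples of (local tag name, value of the name attribute, text).
    """
    root = ET.fromstring(schema_text)
    offenders = []
    for doc in root.iter(f"{{{XSD_NS}}}documentation"):
        for child in doc.iter():
            # Skip the documentation element itself and foreign-namespace tags.
            if child is doc or not child.tag.startswith(f"{{{XSD_NS}}}"):
                continue
            local_name = child.tag.split("}", 1)[1]
            if local_name in MIXED_CONTENT:
                continue
            # Character data directly inside the element: its own text plus
            # the tails of its direct children.
            chunks = [child.text] + [sub.tail for sub in child]
            text = "".join(chunk or "" for chunk in chunks).strip()
            if text:
                offenders.append((local_name, child.get("name"), text))
    return offenders


SAMPLE = """\
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:simpleType name="NX_ANGLE">
    <xs:annotation>
      <xs:documentation>
        units of angle
        <xs:element name="example">rad</xs:element>
      </xs:documentation>
    </xs:annotation>
    <xs:restriction base="xs:string"/>
  </xs:simpleType>
</xs:schema>
"""

print(find_xsd_elements_with_text(SAMPLE))
# → [('element', 'example', 'rad')]
```

The check is deliberately narrower than a real validator: it only reports stray character data and does not verify anything else about the nested declarations.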

<xs:element name="documentation" id="documentation">
   <xs:annotation>
     <xs:documentation source="http://www.w3.org/TR/xmlschema-1/#element-documentation"/>
   </xs:annotation>
   <xs:complexType mixed="true">
    <xs:sequence minOccurs="0" maxOccurs="unbounded">
     <xs:any processContents="lax"/>
    </xs:sequence>
    <xs:attribute name="source" type="xs:anyURI"/>
    <xs:attribute ref="xml:lang"/>
    <xs:anyAttribute namespace="##other" processContents="lax"/>
   </xs:complexType>
 </xs:element>

Changing the processing of lax wildcards for this is not possible. Introducing another option for handling xs:documentation sections in a special mode is bad. Consider that this library is also used for decoding, not only for validation.

So unless there is other evidence that forces the interpretation of the xs:documentation content as a skip wildcard instead of a lax wildcard, the only options are to fix your schema (nxdlTypes.xsd) or to use the workaround that I suggested above.

P.S.: the schema validator embedded in my IDE (PyCharm) also reports the nxdlTypes.xsd schema as invalid:

[image: PyCharm validation report]

@brunato added the wontfix (This will not be worked on) label and removed the bug (Something isn't working) label Mar 8, 2024
@woutdenolf
Author

woutdenolf commented Mar 9, 2024

I totally misinterpreted the validation error. It is not the presence of an xs:element that is the problem, it is the xs:element itself which is invalid. We will need to fix our schema. Thanks!
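
One possible way to make such a schema valid is to turn the nested xs:element examples into plain documentation text. The sketch below uses only the standard library; `inline_examples` and the `name: value` text format are illustrative assumptions, not the fix adopted in nexusformat/definitions.

```python
import xml.etree.ElementTree as ET

XSD_NS = "http://www.w3.org/2001/XMLSchema"
# Keep the "xs" prefix on output so values such as base="xs:string" still resolve.
ET.register_namespace("xs", XSD_NS)


def inline_examples(schema_text):
    """Replace text-carrying xs:element children of xs:documentation
    with plain text of the form 'name: value'."""
    root = ET.fromstring(schema_text)
    for doc in root.iter(f"{{{XSD_NS}}}documentation"):
        for child in list(doc):
            text = (child.text or "").strip()
            if child.tag != f"{{{XSD_NS}}}element" or not text:
                continue
            replacement = f"{child.get('name')}: {text}" + (child.tail or "")
            index = list(doc).index(child)
            if index == 0:
                # No preceding sibling: append to the parent's leading text.
                doc.text = (doc.text or "") + replacement
            else:
                # Otherwise append to the tail of the preceding sibling.
                previous = doc[index - 1]
                previous.tail = (previous.tail or "") + replacement
            doc.remove(child)
    return ET.tostring(root, encoding="unicode")


SAMPLE = """\
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:simpleType name="NX_ANGLE">
    <xs:annotation>
      <xs:documentation>
        units of angle
        <xs:element name="example">rad</xs:element>
      </xs:documentation>
    </xs:annotation>
    <xs:restriction base="xs:string"/>
  </xs:simpleType>
</xs:schema>
"""

fixed = inline_examples(SAMPLE)
print(fixed)
```

Another option, equally an assumption here, would be to keep the markup but move it to an element outside the XSD namespace, which a lax wildcard accepts without validating when no declaration for it is available.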
