Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

If a file has multiple time the same xml:id, the file is not parsed by lxml #141

Open
PonteIneptique opened this issue Apr 5, 2019 · 6 comments

Comments

@PonteIneptique
Copy link
Member

This is an issue because it might be weird to expect this as a non-parsable error. I think in this context, we should be able to say "Non parsable because multiple duplicate xml:id" or whatever.

@sonofmun
Copy link
Contributor

sonofmun commented Apr 5, 2019

I think there are multiple error messages that we should try to make more specific, if we can. Although I also think (if I recall correctly) that if you bump the verbosity up to 10 (only because it doesn't go to 11), that at least the DTD errors are shown.

@PonteIneptique
Copy link
Member Author

Agreed, we should also make the message better one day.
But this is not a DTD error : if feels weird that duplicateness (?) of xml:ids actually prevent LXML to parse the doc !

@sonofmun
Copy link
Contributor

sonofmun commented Apr 5, 2019

Overlapping xml:ids is actually invalid XML, isn't it? Like overlapping tags. Does LXML throw an appropriate error that we can catch and interpret on our side? Or does it just lock up?

@PonteIneptique
Copy link
Member Author

lxml.etree.XMLSyntaxError: ID Meyer_1_l_1 already defined, line 484, column 46

@sonofmun
Copy link
Contributor

sonofmun commented Apr 5, 2019

Looking at that error, I don't know if we can effectively parse it to be able to say that the problem is with the xml:id. Although maybe the fact that the message begins with "ID" could help here. However, it seems to me that the better alternative would be to return the text of the error, i.e., "ID Meyer_1_l_1 already defined, line 484, column 46". That way the user would know precisely what and where the error occurred. Perhaps these should be shown under the table as opposed to within the table. And in the table we could call this an "XML Parsing Error" or something similar.

@sonofmun
Copy link
Contributor

Found another error today that we need to specify better. When the xml:lang attribute is missing on a ti:translation element, Hook gives the error Metadata availability failed. This is not very helpful and should be more specific. Perhaps we should change the title of this issue to "Better specify hook errors"?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants