StartLocation offset does not match to SystemId during external entity replacing #91

Open
nkutsche opened this issue Nov 3, 2019 · 1 comment

nkutsche (Contributor) commented Nov 3, 2019

Hi,

If I set the properties IS_SUPPORTING_EXTERNAL_ENTITIES and IS_REPLACING_ENTITY_REFERENCES to true, I run into the following problem. Replacing the external entity works fine, but for the first event after the reader jumps into the entity document, or back into the main document, the ValidatingStreamReader.getStartLocation() method returns an invalid location (at least in my case).
First case: the reader jumps into the external entity document -> the systemId is that of the entity document, but the character offset points to a position in the main document.
Second case: the reader jumps back into the main document -> the other way around: the systemId is that of the main document, but the offset points into the entity document.

The following sample should demonstrate it:

package foo.bar.baz;

import com.ctc.wstx.sr.ValidatingStreamReader;
import com.ctc.wstx.stax.WstxInputFactory;
import org.codehaus.stax2.XMLInputFactory2;

import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLResolver;
import javax.xml.stream.XMLStreamException;
import javax.xml.transform.stream.StreamSource;
import java.io.StringReader;

import static javax.xml.stream.XMLStreamConstants.*;

public class StackOverflowTest {


    public static void main(String[] args) throws XMLStreamException {
        XMLInputFactory2 xmlInputFactory = (XMLInputFactory2) WstxInputFactory.newInstance();
        xmlInputFactory.setProperty(XMLInputFactory.IS_SUPPORTING_EXTERNAL_ENTITIES,
                Boolean.TRUE);
        xmlInputFactory.setProperty(XMLInputFactory.IS_REPLACING_ENTITY_REFERENCES,
                Boolean.TRUE);

        String mainXml = "<!DOCTYPE main [" +
                "<!ENTITY incl SYSTEM \"include.xml\">" +
                "]>" +
                "<main>&incl;</main>";
        String inclXml = "<include></include>";

        StringReader mainReader = new StringReader(mainXml);
        StringReader inclReader = new StringReader(inclXml);

        StreamSource main = new StreamSource(mainReader, "main.xml");
        final StreamSource incl = new StreamSource(inclReader, "include.xml");

        xmlInputFactory.setXMLResolver(new XMLResolver() {
            @Override
            public Object resolveEntity(String publicID, String systemID,
                                        String baseURI, String namespace) 
                    throws XMLStreamException {
                return incl;
            }
        });

        ValidatingStreamReader sr = (ValidatingStreamReader)
                xmlInputFactory.createXMLStreamReader(main);

//        START DOCUMENT
        sr.next();
//        DOCTYPE
        sr.next();
//        START ELEMENT <main>
        printStatusInfo(sr);
        sr.next();
//        START ELEMENT <include>
        printStatusInfo(sr);
        sr.next();
//        END ELEMENT </include>
        printStatusInfo(sr);
        sr.next();
//        END ELEMENT </main>
        printStatusInfo(sr);

    }

    private static void printStatusInfo(ValidatingStreamReader sr)
            throws XMLStreamException {
        System.out.println("event  = " + eventType(sr));
        System.out.println("name  = " + (sr.hasName() ? sr.getName() : "-"));
        System.out.println("start = " + sr.getStartLocation().getSystemId()
                + "#" + sr.getStartLocation().getCharacterOffset());
        System.out.println("end   = " + sr.getEndLocation().getSystemId()
                + "#" + sr.getEndLocation().getCharacterOffset());
        System.out.println("cur   = " + sr.getCurrentLocation().getSystemId()
                + "#" + sr.getCurrentLocation().getCharacterOffset());
    }

    private static String eventType(ValidatingStreamReader sr){
        switch (sr.getEventType()){
            case START_ELEMENT:
                return "START_ELEMENT";
            case END_ELEMENT:
                return "END_ELEMENT";
            default:
                return "UNKNOWN";
        }
    }
}

This is the result:

event  = START_ELEMENT
name  = main
start = main.xml#53
end   = main.xml#59
cur   = main.xml#59
event  = START_ELEMENT
name  = include
start = include.xml#59
end   = include.xml#9
cur   = include.xml#9
event  = END_ELEMENT
name  = include
start = include.xml#9
end   = include.xml#19
cur   = include.xml#19
event  = END_ELEMENT
name  = main
start = main.xml#19
end   = main.xml#72
cur   = main.xml#72

The bad locations are:

event  = START_ELEMENT
name  = include
start = include.xml#59

(I would expect include.xml#0)
and

event  = END_ELEMENT
name  = main
start = main.xml#19

(I would expect main.xml#65)
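
The expected values can be cross-checked without any parser by locating the tags in the two strings from the sample. The following is only a sketch of that arithmetic: the class name ExpectedOffsets and its helper methods are made up for illustration and are not part of Woodstox. It assumes that a start location is the offset of the tag's first character and an end location is the offset just past its last character.

```java
public class ExpectedOffsets {

    // Same document strings as in the sample above.
    public static final String MAIN_XML = "<!DOCTYPE main [" +
            "<!ENTITY incl SYSTEM \"include.xml\">" +
            "]>" +
            "<main>&incl;</main>";
    public static final String INCL_XML = "<include></include>";

    // Offset of the first character of the token within the document text.
    public static int startOf(String doc, String token) {
        return doc.indexOf(token);
    }

    // Offset just past the last character of the token.
    public static int endOf(String doc, String token) {
        return doc.indexOf(token) + token.length();
    }

    private static void print(String systemId, String doc, String token) {
        System.out.println(token + " -> start = " + systemId + "#" + startOf(doc, token)
                + ", end = " + systemId + "#" + endOf(doc, token));
    }

    public static void main(String[] args) {
        print("main.xml", MAIN_XML, "<main>");
        print("include.xml", INCL_XML, "<include>");
        print("include.xml", INCL_XML, "</include>");
        print("main.xml", MAIN_XML, "</main>");
    }
}
```

Under those assumptions this yields 53/59 for <main>, 0/9 for <include>, 9/19 for </include> and 65/72 for </main>. Every value in the reported output agrees with this except the two start offsets flagged above.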

Thanks for your help!
Best Regards,
Nico

cowtowncoder (Member) commented Nov 4, 2019

Sounds like a bug, yes. If anyone has time, the first step could be to write a (failing) unit test that reproduces the problem in its smallest form and helps verify the eventual fix.
That would help figure out what exactly is happening with respect to the discrepancy, and why the source and location information are out of sync.
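
As a starting point for narrowing this down, one pattern is already visible in the reported output: each bad start offset equals the end offset of the previous event, which belongs to the other input source. This is an observation from the numbers above, not a confirmed diagnosis. The sketch below checks that pattern on the reported values only. It does not call Woodstox, and the names CarriedOverOffsets, Observed, reported and findCarriedOver are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class CarriedOverOffsets {

    // One event from the reported output: label, systemId, start and end offset.
    public static final class Observed {
        final String label;
        final String systemId;
        final int start;
        final int end;

        Observed(String label, String systemId, int start, int end) {
            this.label = label;
            this.systemId = systemId;
            this.start = start;
            this.end = end;
        }
    }

    // The four events exactly as printed in the issue report.
    public static List<Observed> reported() {
        return Arrays.asList(
                new Observed("START_ELEMENT main", "main.xml", 53, 59),
                new Observed("START_ELEMENT include", "include.xml", 59, 9),
                new Observed("END_ELEMENT include", "include.xml", 9, 19),
                new Observed("END_ELEMENT main", "main.xml", 19, 72));
    }

    // Flags events whose source differs from the previous event's source
    // while their start offset equals the previous event's end offset.
    public static List<String> findCarriedOver(List<Observed> events) {
        List<String> flagged = new ArrayList<>();
        for (int i = 1; i < events.size(); i++) {
            Observed prev = events.get(i - 1);
            Observed cur = events.get(i);
            boolean sourceChanged = !cur.systemId.equals(prev.systemId);
            if (sourceChanged && cur.start == prev.end) {
                flagged.add(cur.label + " @ " + cur.systemId + "#" + cur.start);
            }
        }
        return flagged;
    }

    public static void main(String[] args) {
        for (String line : findCarriedOver(reported())) {
            System.out.println("suspicious start: " + line);
        }
    }
}
```

On the reported values this flags exactly the two events named in the report: START_ELEMENT include at include.xml#59 and END_ELEMENT main at main.xml#19. A coincidental match is possible in other documents, so this is a heuristic for reading the output, not a replacement for a real unit test against the reader.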

nkutsche added a commit to nkutsche/woodstox that referenced this issue Dec 8, 2019
cowtowncoder pushed a commit that referenced this issue Dec 11, 2019
Adds unit test case for issue #91
cowtowncoder added a commit that referenced this issue Dec 11, 2019