DOCX reader: Nested lists (numbers & literals) with new start values and new OL-blocks #10096
Replies: 3 comments
-
I took a look at the XML in this document. Pandoc usually expects the paragraphs in a list to have an embedded <w:numPr><w:ilvl w:val="0" /><w:numId w:val="1001" /></w:numPr>. In your document, the paragraphs don't have this:
The styles.xml contains:
which does specify the But obviously it works okay in Word. I'm not sure where Word is getting the list level information. Anything special about how this was created?
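For anyone who wants to repeat this check on their own file, here is a minimal Python sketch that reports, for each paragraph, its style id and whether it carries its own <w:numPr>. It only illustrates the inspection described above and is not how pandoc's reader is implemented. The function name list_info and the style ids in the sample XML ("ListNumber", "MyListStyle") are made up for the example.

```python
import xml.etree.ElementTree as ET

# WordprocessingML main namespace, normally bound to the "w:" prefix.
W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
NS = {"w": W}


def _val(element):
    """Return the w:val attribute of an element, or None if the element is absent."""
    return element.get(f"{{{W}}}val") if element is not None else None


def list_info(document_xml):
    """Return one (style_id, ilvl, num_id) tuple per paragraph.

    ilvl and num_id are None when the paragraph has no w:numPr of its own.
    """
    root = ET.fromstring(document_xml)
    rows = []
    for p in root.iter(f"{{{W}}}p"):
        style = p.find("w:pPr/w:pStyle", NS)
        ilvl = p.find("w:pPr/w:numPr/w:ilvl", NS)
        num_id = p.find("w:pPr/w:numPr/w:numId", NS)
        rows.append((_val(style), _val(ilvl), _val(num_id)))
    return rows


# Hypothetical two-paragraph document: the first paragraph has w:numPr,
# the second relies on its paragraph style only.
SAMPLE = (
    f'<w:document xmlns:w="{W}"><w:body>'
    '<w:p><w:pPr><w:pStyle w:val="ListNumber"/>'
    '<w:numPr><w:ilvl w:val="0"/><w:numId w:val="1001"/></w:numPr>'
    '</w:pPr></w:p>'
    '<w:p><w:pPr><w:pStyle w:val="MyListStyle"/></w:pPr></w:p>'
    '</w:body></w:document>'
)

if __name__ == "__main__":
    for row in list_info(SAMPLE):
        print(row)
```

To run it against a real file, the XML can be read with zipfile.ZipFile(path).read("word/document.xml") and passed to list_info. Paragraphs that come back with None for both ilvl and num_id are the ones that depend on their style for numbering.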
-
Hello John,

Thank you for addressing my question.

It is true that in this kind of Word document, indent levels are not provided through list levels. Instead, there are at least four contextual indicators that might help determine the correct indent level:

1. All subsequent paragraphs, numbered or bulleted, belong to one level or another.
2. Each numbering style belongs to one specific indent level.
3. The content level is technically equal to its first appearance, which also corresponds with the left margin of the paragraph used on each level. Thus, the first numbering style used in this range defines the first indent level, the second numbering style the second indent level, and so on.
4. If an earlier numbering style and paragraph margin are repeated, the earlier numbered list is continued.

The third indicator in particular might help identify the different levels as they are introduced.

I hope this helps in understanding how this type of DOCX might be parsed into XHTML. If you need further examples, please let me know.

Kind regards,
Tobias
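As a rough illustration of the third and fourth indicators, here is a minimal Python sketch. It assumes the paragraph style ids of one contiguous list range have already been extracted in document order. The function name assign_levels and the style ids "Num1" and "Lit2" are hypothetical, and this is only a sketch of the proposed heuristic, not something pandoc does.

```python
def assign_levels(style_ids):
    """Assign a 0-based indent level to each paragraph style.

    A style gets the next free level the first time it appears;
    when it appears again, the paragraph returns to that same level,
    which is treated as a continuation of the earlier list.
    """
    level_of = {}   # style id -> level, in order of first appearance
    levels = []
    for style in style_ids:
        if style not in level_of:
            level_of[style] = len(level_of)
        levels.append(level_of[style])
    return levels


# Hypothetical range: a numbered item, two lettered sub-items,
# the next numbered item, and one more lettered sub-item.
example = ["Num1", "Lit2", "Lit2", "Num1", "Lit2"]

if __name__ == "__main__":
    print(assign_levels(example))  # [0, 1, 1, 0, 1]
```

The sketch does not use the left margin at all. A more careful version could compare margins as a second check on the level derived from the style.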
-
I don't see where or how either of these things is specified in the XML, though.
-
Hello!
First of all, Pandoc has become very powerful for converting DOCX files to EPUB files!
Because of this, I have tried to convert more complex Word DOCX files to EPUB2 files. Some of my documents have nested lists with numbers and letters. Here, I can reproduce an error in the XHTML output:
I have enclosed a DOCX file that shows this specific problem:
Nested List.docx
If there is a way to modify how Pandoc reads the DOCX file, I would be interested to learn about it. Besides this, I hope that this can be confirmed as a reproducible bug and then be fixed.
Regards!