GitBruno/docxicml

DOCX to ICML converter

docxicml is designed to convert MS Word (DOCX) documents to Adobe InDesign (ICML). It aims to produce clean files using semantic information only.

This converter ignores all non-semantic information such as font names and colours. It does, however, keep track of unstyled italics, bold text and page breaks. Unlike Pandoc, docxicml assumes styles are applied semantically and therefore tracks all style references.

This package stands on the shoulders of Python-Mammoth: it generates a dynamic style map for Mammoth and transforms the resulting HTML to ICML using an XSLT stylesheet.
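To make the idea of a dynamic style map more concrete, here is a minimal sketch of how one could be built from the style names found in a document. It is not taken from docxicml's source: the helper names `slugify` and `build_style_map` and the choice of mapping styles to CSS classes are assumptions for illustration. Only the `p[style-name='…'] => …` rule syntax comes from Mammoth's documented style map format.

```python
import re


def slugify(style_name: str) -> str:
    """Turn a Word style name into a token usable as a CSS class (hypothetical helper)."""
    slug = re.sub(r"[^a-z0-9]+", "-", style_name.lower()).strip("-")
    return slug or "unnamed"


def build_style_map(paragraph_styles, character_styles) -> str:
    """Build one Mammoth style map rule per style name (illustrative only)."""
    lines = []
    for name in paragraph_styles:
        if "'" in name:
            continue  # quoting of apostrophes is not handled in this sketch
        # ':fresh' asks Mammoth to start a new element for every paragraph
        lines.append(f"p[style-name='{name}'] => p.{slugify(name)}:fresh")
    for name in character_styles:
        if "'" in name:
            continue
        lines.append(f"r[style-name='{name}'] => span.{slugify(name)}")
    return "\n".join(lines)


if __name__ == "__main__":
    print(build_style_map(["Body Text", "Chapter Title"], ["Book Title"]))
```

A string built this way could then be handed to Python-Mammoth through the `style_map` argument of `mammoth.convert_to_html`, so that every style reference survives as a class name in the intermediate HTML.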

Contents

  1. Usage
  2. Supported Elements
  3. Dependencies
  4. Installation
  5. Limitations
  6. Getting Help

Usage

Convert a Word document (DOCX) to XHTML and ICML with the following command:

docxicml source.docx

The newly generated files are written to the same location as the source document:

source.docx
source.xhtml
source.icml
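When the converter is called from a script, it can help to know where the output will land. The sketch below derives the two output paths from the source path and wraps the command shown above. The function names `expected_outputs` and `convert` are hypothetical, and the sketch assumes `docxicml` is on the `PATH` and accepts a single file argument as in the usage example.

```python
from pathlib import Path
import subprocess


def expected_outputs(source):
    """Return the (XHTML, ICML) paths written next to the source document."""
    source = Path(source)
    return source.with_suffix(".xhtml"), source.with_suffix(".icml")


def convert(source):
    """Run docxicml on one file and return the expected output paths."""
    # Assumes the `docxicml` command is installed and on the PATH.
    subprocess.run(["docxicml", str(source)], check=True)
    return expected_outputs(source)


if __name__ == "__main__":
    xhtml, icml = expected_outputs("source.docx")
    print(xhtml, icml)
```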

Supported Elements

The following elements are supported, unless marked otherwise:

  • Paragraph Styles
  • Character Styles
  • Bold and italic
  • Strikethrough and Underlines
  • Superscript and Subscript
  • Headings
  • Ordered and Unordered Lists
  • Tables (Including headers and footers)
  • Footnotes and endnotes (Yet to be implemented)
  • Line, Column and Page Breaks
  • Hyperlinks (Yet to be implemented)
  • Images (Only embedded EMF)
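For readers unfamiliar with ICML, the sketch below shows roughly what a paragraph style reference looks like on the InDesign side. It builds a single `ParagraphStyleRange` with the standard library. The element and attribute names follow the ICML format, but the function name `paragraph_range` is hypothetical and the exact markup docxicml emits may differ.

```python
import xml.etree.ElementTree as ET


def paragraph_range(style_name: str, text: str) -> ET.Element:
    """Build one ICML paragraph that references a named paragraph style."""
    para = ET.Element(
        "ParagraphStyleRange",
        {"AppliedParagraphStyle": f"ParagraphStyle/{style_name}"},
    )
    # Text without a character style points at InDesign's default entry.
    run = ET.SubElement(
        para,
        "CharacterStyleRange",
        {"AppliedCharacterStyle": "CharacterStyle/$ID/[No character style]"},
    )
    content = ET.SubElement(run, "Content")
    content.text = text
    ET.SubElement(run, "Br")  # paragraph break
    return para


if __name__ == "__main__":
    element = paragraph_range("Heading 1", "Introduction")
    print(ET.tostring(element, encoding="unicode"))
```

Only the style name is carried over. Fonts, sizes and colours are left to the style definitions in the InDesign document, which matches the semantic approach described above.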

Dependencies

docxicml requires Java 6 or later. (It uses Saxon-HE for XSLT 2.0 transformations.)
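A quick way to check the Java requirement before installing is sketched below. It is not part of docxicml; the function names are hypothetical. It relies on `java -version` printing a line such as `java version "1.6.0_45"` or `openjdk version "11.0.2"`, normally to standard error.

```python
import re
import shutil
import subprocess


def parse_java_major(version_output: str):
    """Extract the major Java version from `java -version` output, or None."""
    match = re.search(r'version "(\d+)(?:\.(\d+))?', version_output)
    if match is None:
        return None
    major = int(match.group(1))
    # Releases up to Java 8 report themselves as 1.x (e.g. 1.6 means Java 6).
    if major == 1 and match.group(2) is not None:
        return int(match.group(2))
    return major


def java_is_recent_enough(minimum: int = 6) -> bool:
    """Return True if a `java` executable on the PATH reports at least `minimum`."""
    if shutil.which("java") is None:
        return False
    result = subprocess.run(["java", "-version"], capture_output=True, text=True)
    major = parse_java_major(result.stderr + result.stdout)
    return major is not None and major >= minimum


if __name__ == "__main__":
    print("Java OK" if java_is_recent_enough() else "Java 6 or later not found")
```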

Installation

make install

Limitations

As it stands, there is room for improvement. The implementation of all elements listed above still needs to be finalised. It might be a good idea to port this to JavaScript so it can run easily on a wide variety of systems without installing the Java runtime. Both Mammoth and the XSLT processor have JavaScript implementations: mammoth.js and Saxon-JS. It would also be useful to be able to round-trip the files.

Getting Help

Bugs and feature requests are tracked with the GitHub issue tracker.
