.uni file type and spec #2

chrisrzhou · 2020-08-09T19:40:56Z

Currently, unified-doc has a .file method that supports outputting the source content in various file formats:

null: source/original file
.txt: a file containing only the textContent of the document.
.html: HTML version of the document.

In the future, .pdf and .docx outputs may become possible once the unified ecosystem matures and offers the relevant hast support.

It's worthwhile to think about a file type that works seamlessly within unified-doc (and the wider unified ecosystem). A first pass at this spec includes:

Stores the hast tree
Stores important file information and metadata (e.g. filename, mimeType)
Optionally stores annotations (based on the Annotation interface)
Optionally stores the source content (this would almost double the file size, and I'm not sure what the best practices are here).
???
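To make the list above concrete, here is a minimal sketch of what a .uni payload could look like if it were serialized as JSON. Everything in it is an assumption for discussion, not a settled spec: the `UniFile` and `UniAnnotation` names, the `version` field, the choice of JSON, and the simplified hast node types (declared locally so the sketch has no dependencies) are all hypothetical.

```typescript
// Simplified hast-like node types, declared locally for this sketch.
// Real hast also has comment and doctype nodes, omitted here.
interface HastText {
  type: 'text';
  value: string;
}
interface HastElement {
  type: 'element';
  tagName: string;
  properties?: Record<string, unknown>;
  children: Array<HastElement | HastText>;
}
interface HastRoot {
  type: 'root';
  children: Array<HastElement | HastText>;
}

// Hypothetical, simplified annotation: offsets into the text content,
// plus any extra data the application wants to attach.
interface UniAnnotation {
  id: string;
  start: number;
  end: number;
  [key: string]: unknown;
}

// Hypothetical .uni payload covering the items listed in the spec draft.
interface UniFile {
  version: number; // assumed field, to allow the format to evolve
  filename: string;
  mimeType: string;
  hast: HastRoot;
  annotations?: UniAnnotation[]; // optional
  source?: string; // optional, roughly doubles the file size
}

// Serialize a .uni payload to a JSON string.
function serializeUni(file: UniFile): string {
  return JSON.stringify(file);
}

// Parse a JSON string and check the required fields before trusting it.
function parseUni(json: string): UniFile {
  const data = JSON.parse(json) as Partial<UniFile>;
  if (typeof data.filename !== 'string' || typeof data.mimeType !== 'string') {
    throw new Error('Invalid .uni file: missing filename or mimeType');
  }
  if (!data.hast || data.hast.type !== 'root') {
    throw new Error('Invalid .uni file: missing hast root');
  }
  return data as UniFile;
}
```

Keeping `annotations` and `source` optional mirrors the "optionally stores" items above, so a consumer that only needs the hast tree can ignore them.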

In the unified-doc ecosystem, once this file type is specced, we can support it natively by simply reading the hast content, which is interoperable with unified-doc APIs. This would let us search, annotate, and convert files very easily, without the need for format-specific parsers and compilers.
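As a rough illustration of why no parser is needed, the sketch below works directly on a stored hast tree: it derives the text content by walking the tree and then does a plain substring search over it. The `TreeNode` type and the `textContent` and `searchOffsets` helpers are hypothetical names for this example and are not the unified-doc API; the node types are a simplified subset of hast.

```typescript
// Simplified subset of hast node types, declared locally for this sketch.
type TreeNode =
  | { type: 'text'; value: string }
  | { type: 'element'; tagName: string; children: TreeNode[] }
  | { type: 'root'; children: TreeNode[] };

// Concatenate the values of all text nodes in document order.
function textContent(node: TreeNode): string {
  if (node.type === 'text') {
    return node.value;
  }
  return node.children.map(textContent).join('');
}

// Find every non-overlapping occurrence of `query` in `text` and
// return its start/end offsets (end is exclusive).
function searchOffsets(
  text: string,
  query: string,
): Array<{ start: number; end: number }> {
  const results: Array<{ start: number; end: number }> = [];
  if (query.length === 0) {
    return results;
  }
  let from = text.indexOf(query);
  while (from !== -1) {
    results.push({ start: from, end: from + query.length });
    from = text.indexOf(query, from + query.length);
  }
  return results;
}
```

The offsets returned here are relative to the text content, which is the same coordinate space a stored annotation with `start` and `end` fields would presumably use.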

A backend system and data store can optionally choose to store files in .uni format, and bulk-process files of varying types with unified document APIs.


chrisrzhou added the idea label Aug 9, 2020

chrisrzhou mentioned this issue Aug 9, 2020

Python implementation of unified-doc #3

Open

chrisrzhou added the help wanted label Aug 9, 2020
