EpiDoc Conversions

#Generally Speaking

The overview of changes to be made to Perseus P4 documents to make the EpiDoc-based P5 can be found in the spreadsheet Perseus P4 to EpiDoc-based P5, edited by Simona Stoyanova, who is also an author of the EpiDoc Guidelines.

#What if a mapping is missing for a TEI tag?

If you encounter a TEI tag in the P4 texts that is missing from the EpiDoc schema and which you think needs to be there in order to accurately represent manuscripts in EpiDoc, please do the following:

- Work around the issue as best you can in your text.
- Enter an issue in the general PerseusDL/canonical issue tracker on GitHub, describing the problem and assigning the label epidoc_enhancement.
- Enter an issue in the GitHub tracker for the text repository (e.g. canonical-greekLit, canonical-latinLit), indicating which text it was, what you did, and that this needs to be revisited when the EpiDoc schema is updated for full manuscript support. Assign the issue the label 'epidoc enhancement'.

#Current Workflow

There is not yet one streamlined workflow for a Perseus P4 to EpiDoc conversion. Creating one first requires finalizing the list of changes to be made. However, if you are working manually, opening the file in the Oxygen XML editor and making sure the top of the file looks like this:

<?xml version="1.0" encoding="UTF-8"?>
<?xml-model href="http://www.stoa.org/epidoc/schema/8.19/tei-epidoc.rng"
schematypens="http://relaxng.org/ns/structure/1.0"?>
<?xml-model href="http://www.stoa.org/epidoc/schema/8.19/tei-epidoc.rng"
schematypens="http://purl.oclc.org/dsdl/schematron"?>
<TEI xmlns="http://www.tei-c.org/ns/1.0">

...will allow Oxygen to provide editorial clues about changes that need to be made.
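When many files are being prepared, it can help to check that header mechanically before opening each file. The following is a minimal sketch, not part of the existing scripts: the function name `missing_schema_associations` is hypothetical, and the schema URL and namespaces are simply copied from the header above.

```python
import re

# Schema URL and schema-type namespaces copied from the header shown above.
SCHEMA_HREF = "http://www.stoa.org/epidoc/schema/8.19/tei-epidoc.rng"
RELAXNG_NS = "http://relaxng.org/ns/structure/1.0"
SCHEMATRON_NS = "http://purl.oclc.org/dsdl/schematron"

# One <?xml-model ...?> processing instruction; group 1 holds its pseudo-attributes.
PI_PATTERN = re.compile(r"<\?xml-model\s+(.*?)\?>", re.DOTALL)
ATTR_PATTERN = re.compile(r'([\w:-]+)\s*=\s*"([^"]*)"')


def missing_schema_associations(xml_text):
    """Return the schematypens values not yet associated with SCHEMA_HREF.

    Hypothetical helper: an empty list means both xml-model lines are present.
    """
    # Only inspect the prolog, i.e. everything before the root element.
    root_start = xml_text.find("<TEI")
    prolog = xml_text if root_start == -1 else xml_text[:root_start]
    found = set()
    for match in PI_PATTERN.finditer(prolog):
        attrs = dict(ATTR_PATTERN.findall(match.group(1)))
        if attrs.get("href") == SCHEMA_HREF:
            found.add(attrs.get("schematypens"))
    return [ns for ns in (RELAXNG_NS, SCHEMATRON_NS) if ns not in found]
```

An empty result means the file already carries both schema associations. Anything else lists the `schematypens` values still to be added.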

In addition, here are some of the scripts currently used when converting a legacy Perseus document to EpiDoc compliance, which also provide guidance about the various steps required:

- fix-misc.pl makes sure that the text element has an attribute noting the language of the text, and that line elements are sequentially numbered. However, be sure to read the use restrictions in the comment of this script, since it can only be used on texts with a particular, line-based citation scheme.
- epidocConversions.scala enforces a number of small conversions based on the P4 to P5 spreadsheet. These conversions are listed in, and can be modified through, currentTransforms.txt. As explained in the in-script comments, current EpiDoc-related problems with epidocConversions.scala include:
  - HTML entities are converted to UTF-8 using the html2uni() function contained in the script, instead of the PerseusDTD and TEI2 files, which are currently in the dtd folder of the repo. This causes a number of problems, one of them being an incomplete handling of funder attributes. In addition, the script strips the Perseus DTD declaration, which causes problems with XML validity.
  - date elements aren't yet converted to EpiDoc.
  - The script doesn't add the edition or translation wrapper div.
  - id attributes aren't yet converted to xml:id attributes.
  - The current trackChanges() function in the script for adding a new line to the revisionDesc element can cause invalid XML and other problems, and needs to be completely rewritten.
- currentTransforms.txt is essentially just a machine-actionable version of the P4 to P5 spreadsheet, and is required for epidocConversions.scala to run. It could also be used to build a new, more streamlined workflow.
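One of the gaps listed above, the missing id to xml:id conversion, is small enough to illustrate. The sketch below is written in Python with the standard library, not taken from epidocConversions.scala, and the function name `convert_ids` is hypothetical. It assumes the document has already been parsed successfully, and it does not check that each value is a legal xml:id (for example, a value starting with a digit would still need manual correction).

```python
import xml.etree.ElementTree as ET

# Fully qualified name of the xml:id attribute as ElementTree represents it.
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"


def convert_ids(root):
    """Rename every plain id attribute to xml:id, in place.

    Hypothetical helper: returns the number of attributes renamed and
    leaves elements that already carry an xml:id untouched.
    """
    renamed = 0
    for element in root.iter():
        if "id" in element.attrib and XML_ID not in element.attrib:
            element.set(XML_ID, element.attrib.pop("id"))
            renamed += 1
    return renamed


if __name__ == "__main__":
    sample = ET.fromstring('<text id="t1"><div1 id="d1"><l>arma virumque</l></div1></text>')
    print(convert_ids(sample), "attribute(s) renamed")
    print(ET.tostring(sample, encoding="unicode"))
```

Working on the parsed tree rather than on raw text avoids accidentally rewriting the string `id=` where it appears inside text content or other attribute values.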

As of right now, the above scripts are used as an aid to a fundamentally manual process. All files are checked and manually corrected, so any problems resulting from the scripts above shouldn't appear too often at scale. One exception to this is the stripping of the Perseus DTD, as this problem may also appear in files that were split from larger files. See Invalid XML or Incomplete (?) Data for more on scripting-workflow-based problems that may appear at a wider scale.
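To get a rough sense of how widespread such problems are, a batch well-formedness check over a repository checkout can be sketched as follows. This is an illustration only: the function name `find_malformed` is hypothetical, and the sketch assumes that one symptom of a stripped DTD declaration is entity references that no longer resolve, which a parser reports as an error. It tests well-formedness only, not validity against the EpiDoc schema.

```python
import xml.etree.ElementTree as ET
from pathlib import Path


def find_malformed(directory):
    """Return (path, error message) pairs for .xml files that fail to parse.

    Hypothetical helper: walks the directory recursively and collects
    every file the standard-library parser rejects.
    """
    problems = []
    for path in sorted(Path(directory).rglob("*.xml")):
        try:
            ET.parse(path)
        except ET.ParseError as error:
            problems.append((path, str(error)))
    return problems


if __name__ == "__main__":
    # "." stands in for the path to a local checkout of a text repository.
    for path, message in find_malformed("."):
        print(f"{path}: {message}")
```

Files that pass this check may still be invalid EpiDoc, so the result is a lower bound on the manual work remaining.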

#Future Needs

The Perseus P4 to EpiDoc-based P5 spreadsheet needs to be completed to the community's satisfaction, so that a streamlined conversion script, sheet, or program can be developed. This would probably be XSLT, possibly based on the P4 to P5 Eagle Project XSLT, and would include the information from the dtd files. It will (sadly) have to take into account and include remedial work from the past round of efforts.

Got questions that aren't answered on any of these pages or their links? See Questions and Decisions, Asking and Reaching.
