This repository is a hard fork of the Open Commentaries' main server as of fd2b294d1ff89a8d73aaeec53b316d31ce038572.
This project uses the Gettext support that is built into Phoenix. To add a translation, wrap the default text where you want it to appear in the application in a call to the Gettext backend: `gettext("My default text.")`. Then run `mix gettext.extract` and `mix gettext.merge priv/gettext`. These commands will find your newly added i18n string and add it to the `default.po` files for each of the languages that the project supports.

Don't edit the `default.pot` file at the root of the `priv/gettext` directory. Instead, find your newly added string (or any strings whose translations you want to modify) in the `default.po` file of the language into which you're translating.
A translation looks something like this:

```po
#: lib/text_server_web/components/layouts/app.html.heex:16
#, elixir-autogen, elixir-format
msgid "About"
msgstr "À propos du projet"
```
The string from the call to `gettext/1` is the `msgid`, and you can add your translation on the `msgstr` line.
In order to start the app locally, you will need to set a few environment variables:

- `ZOTERO_API_URL`: for now, set it to something like `https://api.zotero.org/groups/YOUR_GROUP_HERE`, since Zotero prefixes most API queries by the user or group. (See https://www.zotero.org/support/dev/web_api/v3/basics.)
- `ZOTERO_API_TOKEN`: see https://www.zotero.org/settings/keys.
In production, a few additional variables are required:

- `DATABASE_URL`: for example, `postgres://USER:PASS@HOST/DATABASE`
- `SECRET_KEY_BASE`: for signing cookies, etc.
- `PHX_HOST`: the hostname of the application — `ajmc.unil.ch` for now. Note that, even though the Phoenix server will not be talking to the outside world directly (all traffic goes through a proxy), it still needs to know what hostname to expect in requests so that it can respond properly.
- `PORT`: the local port for the server. This is where you'll send the proxied requests, so if the proxy is serving the app at `https://ajmc.unil.ch:443`, it should proxy requests to something like `http://127.0.0.1:4000`.
- `SENDGRID_API_KEY`: sign up at sendgrid.com. This API key is needed to send account verification emails for trusted users.
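These variables are typically read at boot in `config/runtime.exs`. The following is only a sketch of the usual Phoenix pattern — the application and option names here are assumptions, and the project's actual `runtime.exs` may differ:

```elixir
# Illustrative only: a typical Phoenix runtime.exs shape for reading the
# environment variables listed above. Actual names may differ.
import Config

if config_env() == :prod do
  database_url =
    System.get_env("DATABASE_URL") ||
      raise "environment variable DATABASE_URL is missing"

  config :text_server, TextServer.Repo, url: database_url

  config :text_server, TextServerWeb.Endpoint,
    url: [host: System.get_env("PHX_HOST"), port: 443, scheme: "https"],
    http: [ip: {127, 0, 0, 1}, port: String.to_integer(System.get_env("PORT") || "4000")],
    secret_key_base:
      System.get_env("SECRET_KEY_BASE") ||
        raise("environment variable SECRET_KEY_BASE is missing")
end
```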
This application is deployed using an Elixir release that is built and deployed via a Docker container. The container's specification can be found in the `Dockerfile`. Note the (very simple) `Dockerfile.postgres` as well; an example of using it can be found in `docker-compose.yaml`. (This docker-compose file is not used in production; it is a development convenience for debugging deployment issues.)

All of the configuration for the production Phoenix/Cowboy endpoint can be found in `config/runtime.exs`. Note that HTTPS is not enforced at the application level. Instead, the expectation is that the application only allows local access, which is brokered to the outside world by a reverse proxy such as nginx. Bear in mind that the proxy needs to allow websocket connections in order for LiveView to work.
The `Dockerfile` builds a release of the Elixir application in a fairly standard way, but we also need to seed the database with the latest textual data about the Ajax commentaries. To perform this seeding, `entrypoint.sh` runs `/app/bin/text_server eval "TextServer.Release.seed_database"`. This function starts the application processes (except for the HTTP server) and calls `TextServer.Ingestion.Ajmc.run/0`.

`TextServer.Ingestion.Ajmc.run/0` deletes all of the existing comments and commentaries: the data have the potential to change in difficult-to-reconcile ways, so it's easier just to start fresh, since we store the source files locally (more on that in a second). It then creates the `Version`s (= editions) of the critical text (Sophocles' Ajax), as detailed in `TextServer.Ingestion.Versions`.
These `Version`s are CTS-compliant editions of the text, meaning that they all descend from the same `Work`, which is identified by the URN `urn:cts:greekLit:tlg0011.tlg003`. Right now, we're only making one `Version`, based on Greg Crane's TEI XML encoding of Lloyd-Jones's 1994 OCT. Eventually, we will ingest more editions into the same format.
The data structure for representing a text is essentially an ordered list of `TextNode`s. We need to keep the order (stored internally in the `offset` property) even though each `TextNode` also has a `location`, because the locations do not necessarily match textual order: lines can be transposed, for example, so that the reading order of lines 5, 6, and 7 might actually be 6, 5, 7. To take a real example, lines 1028–1039 are bracketed in some editions and arguably should be excluded from the text. That would mean a jump from 1027 to 1040 — still properly ordered, but irreconcilable across editions without individual ordering.
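A minimal sketch of this ordering, using invented line data (the real fields live on the `TextNode` schema):

```elixir
# Invented lines: `location` is the printed line number, `offset` the
# reading order. Lines 5 and 6 are transposed here, so sorting by `offset`
# rather than `location` recovers the reading order 6, 5, 7.
text_nodes = [
  %{location: [5], offset: 2, text: "line five"},
  %{location: [6], offset: 1, text: "line six"},
  %{location: [7], offset: 3, text: "line seven"}
]

reading_order =
  text_nodes
  |> Enum.sort_by(& &1.offset)
  |> Enum.map(& &1.location)

# reading_order is [[6], [5], [7]]
```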
Caveat lector: the following might change
Each `TextNode` can be broken down further into an ordered list of graphemes. (We use graphemes rather than characters in order to simplify handling polytonic Greek combining characters.) Annotations typically refer to lemmata as the range of graphemes that corresponds to the word tokens of a given lemma. That means that instead of the CTS-standard `urn:cts:greekLit:tlg0011.tlg003.ajmc-fin:1034@Ἐρινὺς`, we would refer to the grapheme range at `urn:cts:greekLit:tlg0011.tlg003.ajmc-fin:1034@7-12`.
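For instance, a grapheme range can be resolved with `String.graphemes/1`. The line fragment and indices below are illustrative only and may not match the project's actual offset convention (0- vs. 1-based, inclusive vs. exclusive):

```elixir
# Grapheme-based slicing handles polytonic Greek safely, since a grapheme
# may comprise a base character plus combining marks. Invented example:
line = "ὠμηστὴς Ἐρινὺς"
graphemes = String.graphemes(line)

# An illustrative range lookup, assuming 0-based, inclusive indices:
lemma = graphemes |> Enum.slice(8..13) |> Enum.join()
# lemma is "Ἐρινὺς"
```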
This approach, however, will likely change, decomposing each edition into its `TextToken`s. This transition is a work in progress.
Once the `Version`s have been ingested, we ingest each of the commentaries detailed in `commentaries.toml`. Their source files can be found with the glob pattern `priv/static/json/*_tess_retrained.json`. (Nota bene: eventually we will need to move these files elsewhere, as we can only store public-domain content in this repository.)
Each `CanonicalCommentary` pulls its data from Zotero by mapping the `id` from the corresponding `tess_retrained.json` file to its accompanying `zotero_id`.
Each `CanonicalCommentary` has two kinds of comments: `Comment`s, which have a word-anchor and thus a lemma, and `LemmalessComment`s, which have a scope-anchor (a range of lines). Each `Comment` is mapped to its corresponding tokens in `urn:cts:greekLit:tlg0011.tlg003.ajmc-lj`; each `LemmalessComment` is mapped to the corresponding lines.
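Schematically, the two anchor kinds might be represented like this. The field shapes below are invented for illustration (the real schemas are Ecto structs), though the `*_text_node_id`/`*_offset` names echo the query code shown later in this document:

```elixir
# A Comment carries a word-anchor, and therefore a lemma (invented values):
comment = %{
  lemma: "ἔνθα",
  start_text_node_id: 4,
  start_offset: 9,
  end_text_node_id: 4,
  end_offset: 13
}

# A LemmalessComment carries a scope-anchor: a range of lines (invented values):
lemmaless = %{first_line: 1028, last_line: 1039}
```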
Note that these mappings will sometimes produce nonsensical results: Wecklein, for instance, reorders the words in line 4, so his `Comment` on that line has a lemma ("ἔνθα Αἴαντος ἐσχάτην τάξιν ἔχει") that does not correspond to the text ("Αἴαντος, ἔνθα τάξιν ἐσχάτην ἔχει") — and this is a relatively minor discrepancy. This is why it's important that we also allow readers to change the "base" or critical text and to apply the comments in a flexible way.
We render the lemma of comments as a heatmap over the critical text in the reading environment, allowing readers to see at a glance when lines have been heavily glossed. To do so, we borrow approaches from the OOXML specification and ProseMirror:
We need to group the graphemes of each text node (line of Ajax) with the elements that should apply to them (we're also preserving things like cruces and editorial insertions), including comments. We start by finding the comments that apply to a given line:
```elixir
# comment starts with this text node OR
# comment ends on this text node OR
# text node is in the middle of a multi-line comment
comment.start_text_node_id == text_node.id or
  comment.end_text_node_id == text_node.id or
  (comment.start_text_node.offset <= text_node.offset and
     text_node.offset <= comment.end_text_node.offset)
```
We then check each grapheme (at index `i`) to see if one of those comments applies:
```elixir
cond do
  # comment applies only to this text node
  c.start_text_node == c.end_text_node ->
    i in c.start_offset..(c.end_offset - 1)

  # comment starts on this text node
  c.start_text_node == text_node ->
    i >= c.start_offset

  # comment ends on this text node
  c.end_text_node == text_node ->
    i <= c.end_offset

  # entire text node is in this comment
  true ->
    true
end
```
With that information (packaged in an admittedly confusing tuple of graphemes and tags), we can linearly render the text as a series of "grapheme blocks" with their unique tag sets:

```heex
<.text_element
  :for={{graphemes, tags} <- @text_node.graphemes_with_tags}
  tags={tags}
  text={Enum.join(graphemes)}
/>
```
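The grouping step itself can be sketched with `Enum.chunk_by/2` on invented data (the real implementation lives in the application and handles more element kinds than comments):

```elixir
# Pair each grapheme with the set of tags that applies to it, then chunk
# adjacent graphemes sharing the same tag set into "grapheme blocks".
# `tags_for` is a stand-in for the real comment lookup shown above.
graphemes = String.graphemes("Αἴαντος")
tags_for = fn i -> if i in 1..3, do: [:comment], else: [] end

blocks =
  graphemes
  |> Enum.with_index()
  |> Enum.map(fn {g, i} -> {g, tags_for.(i)} end)
  |> Enum.chunk_by(fn {_g, tags} -> tags end)
  |> Enum.map(fn chunk ->
    {Enum.map(chunk, &elem(&1, 0)), chunk |> hd() |> elem(1)}
  end)

# blocks == [{["Α"], []}, {["ἴ", "α", "ν"], [:comment]}, {["τ", "ο", "ς"], []}]
```

Each resulting tuple is one renderable block: a run of graphemes plus the tag set that styles it.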
It remains to be determined how we will work with comments that don't match the underlying critical text.
We follow the CTS URN spec, which can at times be confusing. Essentially, every `collection` (which is roughly analogous to a git repository) contains one or more `text_group`s. It can be helpful to think of each `text_group` as an author, but remember that "author" here designates not a person but rather a loose grouping of works related by style, content, and (usually) language. Sometimes the author is "anonymous" or "unknown" — hence `text_group` instead of "author".

Each `text_group` contains one or more `work`s. You might think of these as texts, e.g., "Homer's Odyssey" or "Lucan's Bellum Civile".

A `work` can be further specified by a `version` URN component that points to either an `edition` (in the traditional sense of the word) or a `translation`.
So in rough database speak:

- A `version` has a type indication of one of `commentary`, `edition`, or `translation`
- A `version` belongs to a `work`
- A `work` belongs to a `text_group`
- A `text_group` belongs to a `collection`
In reverse:

- A `collection` has many `text_group`s
- A `text_group` has many `work`s
- A `work` has many `version`s, each of which is typed as `commentary`, `edition`, or `translation`
Note that the CTS specification allows for an additional level of granularity known as `exemplar`s. In our experience, creating exemplars mainly introduced unnecessary redundancy with versions, so we have opted not to include them in our API. See also http://capitains.org/pages/vocabulary.
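To illustrate the hierarchy, a CTS URN can be split into its components. The module below is a hypothetical sketch for this README, not the application's actual parser:

```elixir
# Hypothetical sketch: decomposing a CTS URN into collection-level parts.
# Real CTS URN handling has more edge cases (subreferences, exemplars, etc.).
defmodule CtsUrnExample do
  def parse(urn) do
    ["urn", "cts", namespace, work_part | passage] = String.split(urn, ":")
    [text_group, work | rest] = String.split(work_part, ".")

    %{
      namespace: namespace,
      text_group: text_group,
      work: work,
      version: List.first(rest),
      passage: List.first(passage)
    }
  end
end

CtsUrnExample.parse("urn:cts:greekLit:tlg0011.tlg003.ajmc-lj:1034")
# => %{namespace: "greekLit", passage: "1034", text_group: "tlg0011",
#      version: "ajmc-lj", work: "tlg003"}
```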
To start your Phoenix server:

- Install dependencies with `mix deps.get`
- Make sure your configuration (`./config`) is correct
- Create and migrate your database with `mix ecto.setup`
- Start the Phoenix endpoint with `mix phx.server`, or inside IEx with `iex -S mix phx.server`

Now you can visit `localhost:4000` from your browser.
Ready to run in production? Please check our deployment guides.
We're leveraging Phoenix LiveView as much as possible for the front-end, but occasionally we need modern niceties for CSS and JS. If you need to install a dependency:

- Think very carefully.
- Do we really need this dependency?
- What happens if it breaks?
- Can we just use part of the dependency in the `vendor/` directory with proper attribution?
- If you really must install a dependency — like `@tailwindcss/forms` — run `npm i -D <dependency>` from within the `assets/` directory.
Data and application code in this repository were produced in the context of the Ajax Multi-Commentary project, funded by the Swiss National Science Foundation under Ambizione grant PZ00P1_186033.
Contributors: Carla Amaya (UNIL), Sven Najem-Meyer (EPFL), Charles Pletcher (UNIL), Matteo Romanello (UNIL), Bruce Robertson (Mount Allison University).
Open Commentaries: Collaborative, cutting-edge editions of ancient texts
Copyright (C) 2022 New Alexandria Foundation
This program is free software: you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation, either version 3 of the License, or
(at your option) any later version.
This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
GNU General Public License for more details.
You should have received a copy of the GNU General Public License
along with this program. If not, see <https://www.gnu.org/licenses/>.