
DataModel

Hiroshi Ichikawa edited this page Jul 25, 2018 · 4 revisions

PFIF entities (person and note)

The data model is defined in app/model.py.

The schema for storing information about missing people closely follows the PFIF 1.4 model.

Each person corresponds to a Person entity (which has a set of fields defined in PFIF 1.4). Associated with each Person entity, there can be any number of Note entities (whose fields are also defined in PFIF 1.4).

The database is partitioned by repository. For example, person-finder.appspot.com/foo and person-finder.appspot.com/bar look and behave like two separate instances of Person Finder, each with its own database. In reality there is a single application: every Person and Note entity has a repo field, and all queries filter on the repository.

Both Person and Note entities have unique record IDs, as described in PFIF. The repository name and the record IDs are used together to form the key names of the App Engine entities. The common logic concerned with record IDs and repositories is implemented in the Base class, from which both Person and Note are derived.
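As a rough illustration of the two paragraphs above, the sketch below builds a key name from a repository name and a record ID, and filters records by repository. The ':' separator, the helper names, and the dict-based records are assumptions made for this example; check the Base class in app/model.py for the actual logic.

```python
# Hypothetical sketch: the ':' separator and all helper names are
# assumptions for illustration, not copied from app/model.py.

def make_key_name(repo, record_id):
    """Combine a repository name and a PFIF record ID into one key name."""
    return repo + ':' + record_id


def split_key_name(key_name):
    """Recover (repo, record_id) from a key name built by make_key_name."""
    repo, record_id = key_name.split(':', 1)
    return repo, record_id


def in_repo(records, repo):
    """Keep only the records whose 'repo' field matches the given repository."""
    return [record for record in records if record['repo'] == repo]


records = [
    {'repo': 'foo', 'record_id': 'example.org/person.1'},
    {'repo': 'bar', 'record_id': 'example.org/person.1'},
]
# The same record ID can exist in two repositories without colliding,
# because the repository name is part of the key name.
keys = [make_key_name(r['repo'], r['record_id']) for r in records]
```

Because the repository name is part of the key name, two repositories can hold records with identical record IDs without conflict.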


Other entities

We also keep a few other kinds of entities in the datastore.

Repo entities don't contain anything. There's one for each repository, and it exists to make the repositories appear in administrative menus.

UniqueId entities also don't contain anything. We create one (and use its numeric ID) whenever we want to generate a unique identifier.

Photo entities store the binary image data of uploaded photos. Like Person and Note entities, Photo entities are partitioned by repository. They're referenced in the photo properties of Person entities.

Authorization entities control access to the data through the feeds and the read/write API. Clients of the API have to specify an authorization token. Each Authorization entity specifies what powers are associated with a given authorization token.
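The token check described above might look something like the following sketch. The field names (can_read, can_write), the in-memory dict, and the is_allowed helper are all hypothetical; the real Authorization entity in app/model.py defines its own set of permissions.

```python
from dataclasses import dataclass


# Hypothetical stand-in for an Authorization entity; the real entity
# has its own fields, which are not reproduced here.
@dataclass(frozen=True)
class Authorization:
    token: str
    can_read: bool = False
    can_write: bool = False


def is_allowed(authorizations, token, action):
    """Return True if the token grants the requested action.

    `authorizations` maps token strings to Authorization objects.
    Unknown tokens and unknown actions are always refused.
    """
    auth = authorizations.get(token)
    if auth is None:
        return False
    if action == 'read':
        return auth.can_read
    if action == 'write':
        return auth.can_write
    return False


authorizations = {
    'reader-token': Authorization('reader-token', can_read=True),
    'partner-token': Authorization('partner-token', can_read=True, can_write=True),
}
```

Refusing by default (unknown token or unknown action) keeps the sketch safe if a new action is added without a matching permission.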

Counter entities are used to store counts of how many records are in the database, for reporting and charting.

StaticSiteMapInfo and SiteMapPingStatus are used to store information for site maps (for indexing by search engines). This feature is not used much and will probably be removed.

ConfigEntry entities store configuration settings for each repository. For example, different repositories can have different languages available in the language menu; there are also various other locality-related options.
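One way to picture per-repository settings is a lookup keyed by repository and setting name, as in this minimal sketch. The function name, the setting name, and the tuple keys are assumptions for illustration, not the ConfigEntry interface.

```python
# Hypothetical sketch: settings live in a plain dict keyed by
# (repo, setting_name); the real storage is ConfigEntry entities.

def get_setting(entries, repo, name, default=None):
    """Look up one setting for one repository, with a fallback default."""
    return entries.get((repo, name), default)


entries = {
    ('foo', 'language_menu_options'): ['en', 'ja'],
    ('bar', 'language_menu_options'): ['en', 'es', 'fr'],
}

foo_languages = get_setting(entries, 'foo', 'language_menu_options')
bar_languages = get_setting(entries, 'bar', 'language_menu_options')
```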


Indexing and ranking

In indexing.py, Person records are indexed by putting all possible prefixes of first_name and last_name into a StringListProperty called names_prefixes. The names are normalized (by removing accents and converting to uppercase) before forming these prefixes. See text_query.py for the normalization function.
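The prefix scheme can be sketched in a few lines of standalone Python. This is an approximation written for this page, not the code in indexing.py or text_query.py: the function names are made up, and accent removal is done here with Unicode NFKD decomposition, which may differ from the project's own normalization.

```python
import unicodedata


def normalize(text):
    """Strip accents and convert to uppercase (approximation, see above)."""
    decomposed = unicodedata.normalize('NFKD', text)
    # Drop the combining marks that NFKD splits off from base letters.
    stripped = ''.join(c for c in decomposed if not unicodedata.combining(c))
    return stripped.upper()


def name_prefixes(first_name, last_name):
    """Return every prefix of every word in the normalized names, sorted."""
    prefixes = set()
    for name in (first_name, last_name):
        for word in normalize(name).split():
            for end in range(1, len(word) + 1):
                prefixes.add(word[:end])
    return sorted(prefixes)


def matches(query, prefixes):
    """A query matches if each of its normalized words is a stored prefix."""
    words = normalize(query).split()
    return bool(words) and all(word in prefixes for word in words)


prefixes = name_prefixes('Åsa', 'Li')
# prefixes == ['A', 'AS', 'ASA', 'L', 'LI']
```

Storing every prefix makes the index larger, but it turns a prefix search into a plain equality filter on the list property, which is the kind of query the datastore handles well.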

In the CmpResults.rank() method, matches are ranked by a score based on how much of the name is matched and whether the first and last name are matched in the right order. The ordering preference is different for Chinese, Japanese, and Korean names.
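To make the ranking idea concrete, here is a toy scoring function. It is not the CmpResults.rank() implementation: the weights, the function name toy_rank, and the family_name_first flag are invented for this example, and inputs are assumed to be already normalized.

```python
def toy_rank(query_words, first_name, last_name, family_name_first=False):
    """Toy score: fraction of the name matched, plus a bonus for word order.

    Set family_name_first=True for names where the family name is
    conventionally written first.
    """
    def first_match(name):
        # Index of the first query word that is a prefix of a word in name.
        for index, query_word in enumerate(query_words):
            if any(word.startswith(query_word) for word in name.split()):
                return index
        return None

    def matched_chars(name):
        # For each name word, count the longest query word that prefixes it.
        return sum(
            max((len(q) for q in query_words if word.startswith(q)), default=0)
            for word in name.split())

    total = len(first_name.replace(' ', '')) + len(last_name.replace(' ', ''))
    if total == 0:
        return 0.0
    coverage = (matched_chars(first_name) + matched_chars(last_name)) / total

    first_idx = first_match(first_name)
    last_idx = first_match(last_name)
    bonus = 0.0
    if first_idx is not None and last_idx is not None and first_idx != last_idx:
        if family_name_first:
            in_expected_order = last_idx < first_idx
        else:
            in_expected_order = first_idx < last_idx
        if in_expected_order:
            bonus = 0.5
    return coverage + bonus
```

With these made-up weights, the query JOHN SMITH scores higher against the record John Smith than the query SMITH JOHN does, unless family_name_first is set.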

1. Remove all accents from letters (e.g. replace å with a).
2. Convert all letters to uppercase.
3. Store three extra attributes on the Person entity:


