This repository has been archived by the owner. It is now read-only.

Annotation format

christian-oreilly edited this page Mar 7, 2017 · 7 revisions

Every curated paper is associated with a .pcr file. The content of this file is a JSON representation of a list of ANNOTATION objects.
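Because a .pcr file is plain JSON, it can be read with Python's standard json module. The following is a minimal sketch rather than the project's own loader; the function name load_annotations and the UTF-8 encoding are assumptions made for illustration.

```python
import json


def load_annotations(path):
    """Return the list of ANNOTATION dictionaries stored in a .pcr file."""
    # Assumption: the file is UTF-8 encoded JSON.
    with open(path, encoding="utf-8") as pcr_file:
        annotations = json.load(pcr_file)
    # The format describes the top-level value as a list of annotations.
    if not isinstance(annotations, list):
        raise ValueError("a .pcr file should contain a JSON list of annotations")
    return annotations
```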

Objects definition

ANNOTATION: {"pubId":string, "annotId": string, "version": string, "tags": list of TAG, "comment":string, "authors":list of strings, "parameters":list of PARAMETER, "localizer":LOCALIZER, "experimentProperties":list of PARAMETER_REFERENCE}

PARAMETER: {"id":string, "description":PARAM_DESC, "requiredTags":list of REQUIRED_TAG, "relationship":None or RELATIONSHIP, "isExperimentProperty":Boolean}

TAG: {"id":string, "name":string}

REQUIRED_TAG: {"id":string, "name":string, "rootId":string}

LOCALIZER: The definition of the LOCALIZER object depends on the value of the "type" argument, as follows:

  • LOCALIZER: {"type":"text", "location": int, "text": string}
  • LOCALIZER: {"type":"figure", "no": string}
  • LOCALIZER: {"type":"table", "no": string, "noRow": None or int, "noCol": None or int}
  • LOCALIZER: {"type":"equation", "no": string, "equation": None or Python code string}
  • LOCALIZER: {"type":"position", "noPage": int, "x":float , "y":float, "width":float, "height":float}
  • LOCALIZER: {"type":"null"}
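The per-type layouts above can be checked mechanically. The sketch below is one possible validator, not part of the format itself; the names LOCALIZER_KEYS and check_localizer are hypothetical, and it assumes that every listed key is always present (with None stored as JSON null when a value is left empty).

```python
# Keys expected for each LOCALIZER type, copied from the definitions above.
LOCALIZER_KEYS = {
    "text": {"location", "text"},
    "figure": {"no"},
    "table": {"no", "noRow", "noCol"},
    "equation": {"no", "equation"},
    "position": {"noPage", "x", "y", "width", "height"},
    "null": set(),
}


def check_localizer(localizer):
    """Return True if the LOCALIZER dictionary matches its type, else raise ValueError."""
    loc_type = localizer.get("type")
    if loc_type not in LOCALIZER_KEYS:
        raise ValueError(f"unknown localizer type: {loc_type!r}")
    # Compare the keys actually present (minus "type") with the expected set.
    found = set(localizer) - {"type"}
    expected = LOCALIZER_KEYS[loc_type]
    if found != expected:
        raise ValueError(f"{loc_type} localizer expects keys {sorted(expected)}")
    return True
```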

PARAM_DESC: The definition of the PARAM_DESC object depends on the value of the "type" argument, as follows:

  • PARAM_DESC: {"type":"pointValue", "depVar":NUMERICAL_VARIABLE}
  • PARAM_DESC: {"type":"function", "depVar":VARIABLE, "indepVars":list of VARIABLE, "parameterRefs":list of PARAMETER_REFERENCE, "equation":Python code string}
  • PARAM_DESC: {"type":"numericalTrace", "depVar":NUMERICAL_VARIABLE, "indepVars":list of NUMERICAL_VARIABLE}

STATISTIC: This field takes one value from the following list: ["raw", "mean", "median", "mode", "sem", "sd", "var", "CI_90", "CI_95", "CI_99", "N", "min", "max", "other"]

RELATIONSHIP: The definition of RELATIONSHIP objects depends on the value of the "type" argument, as follows:

  • RELATIONSHIP: {"type":"point", "entity1":TAG, "entity2":None}
  • RELATIONSHIP: {"type":"directed", "entity1":TAG, "entity2":TAG}
  • RELATIONSHIP: {"type":"undirected", "entity1":TAG, "entity2":TAG}

VARIABLE: {"typeId":string, "unit":string, "statistic":STATISTIC}

NUMERICAL_VARIABLE: {"typeId":string, "values":VALUES}

VALUES: VALUES can be used to define series of values or single values (i.e., using lists of length one). The definition of the VALUES object depends on the value of the "type" argument, as follows:

  • VALUES: {"type":"simple", "values":list of floats, "unit":string, "statistic":STATISTIC}
  • VALUES: {"type":"compound", "valueLst":list of VALUES}

PARAMETER_REFERENCE: {"instanceId":string, "paramTypeId":string}
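To show how these objects nest, here is a sketch of a single PARAMETER holding a "pointValue" description, written as a Python dictionary and serialized with the json module. Every identifier, value, and unit in it is an illustrative placeholder, not a real entry from the modeling dictionary or from any publication.

```python
import json

# All identifiers and numbers below are illustrative placeholders.
parameter = {
    "id": "param-instance-1",
    "description": {                      # PARAM_DESC
        "type": "pointValue",
        "depVar": {                       # NUMERICAL_VARIABLE
            "typeId": "PARAM-TYPE-ID",
            "values": {                   # VALUES
                "type": "simple",
                "values": [-65.0],
                "unit": "mV",
                "statistic": "mean",
            },
        },
    },
    "requiredTags": [],
    "relationship": None,
    "isExperimentProperty": False,
}

# None and False become null and false in the serialized JSON.
serialized = json.dumps(parameter, indent=2)
print(serialized)
```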


Fields definition

ANNOTATION:

  • pubId: ID of the publication. This takes the value of the DOI if the document has a DOI. Otherwise, it takes the value of "PMID_" followed by the PubMed ID number. Documents with neither a DOI nor a PubMed ID are not supported.
  • annotId: Unique identifier for this annotation (UUID objects according to RFC 4122).
  • version: Version of the JSON annotation format used for creating this annotation.
  • comment: Free-form comment that the user wants to associate with the annotation.
  • authors: List of the users that created and/or modified this annotation.
  • experimentProperties: List of PARAMETER_REFERENCE objects pointing to the parameters that describe the experimental features necessary to adequately interpret the value of the evaluated parameter.
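The pubId and annotId rules can be sketched as two small helpers. The function names make_pub_id and make_annot_id are hypothetical, and the use of a random (version 4) UUID is an assumption: the format only requires an RFC 4122 UUID.

```python
import uuid


def make_pub_id(doi=None, pmid=None):
    """Build a pubId from a DOI, falling back on a PubMed ID."""
    if doi:
        return doi
    if pmid is not None:
        return f"PMID_{pmid}"
    raise ValueError("documents with neither a DOI nor a PubMed ID are not supported")


def make_annot_id():
    """Return a new annotation identifier as an RFC 4122 UUID string."""
    # Assumption: a random (version 4) UUID is acceptable.
    return str(uuid.uuid4())
```

For example, make_pub_id(pmid=12345) returns "PMID_12345", while a call with neither argument raises ValueError.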

PARAMETER:

  • id : Unique ID of this parameter instance.
  • description: Object of the PARAM_DESC class, containing the description of the parameter (e.g. its value, its unit, etc.).
  • requiredTags: List of the tags required by the parameter type definitions contained in the modelingDictionary.csv file.
  • isExperimentProperty: If set to true, this parameter specifies an experimental setup parameter. Otherwise, it specifies a modeling parameter.
  • relationship: Object of type RELATIONSHIP specifying whether this value is associated with a point measurement or is describing a measure characterizing a directed/undirected relationship between two entities.

LOCALIZER:

  • With type == "text":

    • location: Position of the character at which the annotation starts in the associated .txt file.
    • text: Annotated text.
  • With type == "figure":

    • no: Number of the figure, e.g. "1", "3.c", "III".
  • With type == "table":

    • no: Number of the table, e.g. "1", "3.c", "III".
    • noCol: Index of the column (starting from 1) containing the annotated information. Can be left empty by specifying None. Should be specified only if the table has a simple structure, so that it can be reliably and unambiguously formatted as an N x M matrix of values.
    • noRow: Same as noCol, but for rows.
  • With type == "equation":

    • no: Number of the equation, e.g. "1", "3.c", "III".
    • equation: String that can be parsed in Python with an eval() call so that it defines the relevant equation in a computable format. [TODO: To be further developed so that it can manage arguments.]
  • With type == "position":

    • noPage: Number of the page, starting at 1 and incrementing at every new page of the PDF.
    • x, y: Position of the top-left corner of a rectangle defining the region of interest. Coordinates are normalized: (0,0) is the top-left of the page, (1,1) is the bottom-right.
    • width, height: Size of the region of interest, in normalized coordinate space.
  • With type == "null":

    • Takes no argument. Used to integrate data that are not related to a publication.
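As an illustration of the normalized coordinates used by "position" localizers, the sketch below converts one to a pixel rectangle for a rendered page. The function name to_pixel_box and the page size are assumptions; the key names follow the LOCALIZER object definition.

```python
def to_pixel_box(localizer, page_width, page_height):
    """Convert a normalized "position" localizer to (left, top, right, bottom) pixels."""
    # (0, 0) is the top-left corner of the page and (1, 1) the bottom-right,
    # so each normalized value is scaled by the page size along its axis.
    left = localizer["x"] * page_width
    top = localizer["y"] * page_height
    right = left + localizer["width"] * page_width
    bottom = top + localizer["height"] * page_height
    return left, top, right, bottom


# Hypothetical region covering the middle half of the page width.
box = to_pixel_box(
    {"type": "position", "noPage": 1, "x": 0.25, "y": 0.5, "width": 0.5, "height": 0.25},
    page_width=800,
    page_height=1000,
)
print(box)  # (200.0, 500.0, 600.0, 750.0)
```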

TAG and REQUIRED_TAG: Tags come from the Neurolex ontology. IDs and names are as defined in this ontology.

  • id : Neurolex unique ID corresponding to this tag.
  • name: Neurolex name corresponding to this tag.
  • rootId: Unique ID of the Neurolex tag which is the root of the tree in which (id, name) should be located. This rootId corresponds to the required tag category specified in the definition of the parameter type (as specified in the modelingDictionary.csv file).

PARAM_DESC:

  • With type == "pointValue": Used to annotate a value for a specific parameter. This is different from the "numericalTrace" type because it is a repetition of measurements of a single variable rather than a digitization of a y=f(x) trace.

    • "depVar": Object of type NUMERICAL_VARIABLE typically containing a single value. However, it can also contain more than one value if, for example, there are repetitions in the measurement of this parameter. This "type" of PARAM_DESC contains a dependent variable but no independent variable since it models a standalone parameter value rather than a relationship defined between more than one variable.
  • With type == "function": Used to annotate the values of a parameter as an analytical relationship, e.g., when numerical recordings are fitted with a model and parameters of this model are reported. Reporting only the value of these model parameters as "pointValue" parameters is inadequate because their values are meaningless unless they can be linked to the equation and the other parameters of the model.

    • "depVar":Object of the class VARIABLE defining the dependent variable of the function.
    • "indepVars": List of VARIABLE objects defining the independent variables of the function.
    • "parameterRefs": List (potentially empty) of PARAMETER_REFERENCE instances used in the equation of the function.
    • "equation": A valid Python code string (i.e., a string that can be executed with an eval() call without raising an exception) computing the value of the dependent variable given values for the parameters and independent variables. This string should have the form "X = f(Y1, Y2, ..., YN, P1, P2, ..., PM)", where "f" defines a function, Y1, Y2, ..., YN are the names of the independent variables, and P1, P2, ..., PM are the names of the parameters.
  • With type == "numericalTrace": This type can be used to record experimental measurements describing a relationship between a dependent variable and one or more independent variables.

    • "depVar": Object of the class NUMERICAL_VARIABLE defining the dependent variable of the function.
    • "indepVars": List of NUMERICAL_VARIABLE objects defining the independent variables of the function.
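For the "function" type, an equation string might be evaluated along the following lines. This is a sketch under stated assumptions, not the project's implementation: Python's eval() accepts expressions only, so the "X = ..." left-hand side is split off before evaluating the right-hand side. The helper name evaluate_equation, the example equation, and the names I, g, V and E are all hypothetical.

```python
import math


def evaluate_equation(equation, values):
    """Evaluate an "X = expression" string; return (name of X, computed value)."""
    # Assumption: a single assignment, with no "==" inside the expression.
    lhs, rhs = equation.split("=", 1)
    # Expose only a few math functions plus the supplied values. This limits
    # what the string can reach, but it is not a real sandbox.
    namespace = {"__builtins__": {}, "exp": math.exp, "log": math.log}
    namespace.update(values)
    return lhs.strip(), eval(rhs.strip(), namespace)


# Hypothetical equation: one independent variable V and two parameters g and E.
name, value = evaluate_equation("I = g * (V - E)", {"g": 2.0, "V": -60.0, "E": -90.0})
print(name, value)  # I 60.0
```

Since eval() runs arbitrary code, equation strings from untrusted .pcr files should not be evaluated this way.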

STATISTIC: The default value for this field is "raw", which states that the recorded value is a measurement rather than a statistic. If the value is a statistic computed from a set of measurements, the field can be set to one of the following values:

  • "mean": Mean value.
  • "median": Median value.
  • "mode": Mode of the distribution.
  • "sem": Standard error of the mean.
  • "sd": Standard deviation.
  • "var": Variance.
  • "CI_90" : 90% confidence interval.
  • "CI_95" : 95% confidence interval.
  • "CI_99" : 99% confidence interval.
  • "N": Number of samples.
  • "min": Minimal value.
  • "max": Maximal value.
  • "other": States that the value is a statistic that is not available in the previous choices.

RELATIONSHIP: Establishes the relationship between the entities involved in a measurement. The default type is "point". The other types ("directed" and "undirected") are useful for measurements that intrinsically involve two entities, such as those concerning connectivity.

  • With type == "point":
    • "entity1": Object TAG defining the entity to which the relationship applies (e.g., a specific ion channel variety).
    • "entity2": Always equal to None since this type of relationship involves just one entity.
  • With type == "directed":
    • "entity1": Object TAG defining the entity which is the starting point of a directed relationship (e.g., a brain region, a cell type).
    • "entity2": Object TAG defining the entity which is the ending point of a directed relationship (e.g., a brain region, a cell type).
  • With type == "undirected":
    • "entity1": Object TAG defining one of the two entities involved in this undirected relationship (e.g., a brain region, a cell type).
    • "entity2": Object TAG defining one of the two entities involved in this undirected relationship (e.g., a brain region, a cell type).
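The three relationship types differ only in whether a second entity is expected, which a small constructor can enforce. This is a sketch; the function name make_relationship and the tag contents are placeholders, not real Neurolex entries.

```python
def make_relationship(rel_type, entity1, entity2=None):
    """Build a RELATIONSHIP dictionary, checking entity2 against the type."""
    if rel_type == "point":
        # A point relationship involves a single entity.
        if entity2 is not None:
            raise ValueError('"point" relationships take no entity2')
    elif rel_type in ("directed", "undirected"):
        # Both other types involve two entities.
        if entity2 is None:
            raise ValueError(f'"{rel_type}" relationships need two entities')
    else:
        raise ValueError(f"unknown relationship type: {rel_type!r}")
    return {"type": rel_type, "entity1": entity1, "entity2": entity2}


# Placeholder TAG objects; the ids and names are not real Neurolex entries.
source = {"id": "TAG-ID-1", "name": "region A"}
target = {"id": "TAG-ID-2", "name": "region B"}
projection = make_relationship("directed", source, target)
```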

VARIABLE: Defines a variable. As opposed to the NUMERICAL_VARIABLE, objects of the VARIABLE class do not hold any specific values.

  • typeId: Unique ID of the parameter type associated with this variable. The list of defined parameter types and their IDs is available in the project's modelingDictionary.csv file.
  • unit: Unit associated with this variable. Validity of the specified unit is verified with the quantities Python package.
  • statistic: Object of type STATISTIC used to define whether this variable holds values for single measures or if it is the result of a statistical operation performed on a sample of measures.

NUMERICAL_VARIABLE: Defines variable objects that store specific values.

  • typeId: Unique ID of the parameter type associated with this variable. The list of defined parameter types and their IDs is available in the project's modelingDictionary.csv file.
  • values: Object of type VALUES used to hold specific values associated with this variable.

VALUES:

  • With type == "simple": Used for values that are defined by a single standalone variable (as opposed to compound values).

    • "values": List of floats holding the measured values.
    • "unit": Unit for the values.
    • "statistic": Object of type STATISTIC defining whether the values are direct measurements (then it takes the value of "raw") or if a statistic was used to obtain the values from the original sample of measurements (then it takes the name of the statistic).
  • With type == "compound": For values that are compound, that is, that are in fact a mixture of different values. This occurs, for example, when a mean value is reported along with its standard error and the size of the sample. It makes no sense to report the standard error or the sample size as standalone values. Similarly, important information is lost if only the mean value is reported. These values form a unit, and this unit can be preserved by defining them using the compound type.

    • "valueLst": List of VALUES objects that form a logical ensemble describing a specific set of measurements (e.g., a mean and its standard deviation).
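A mean reported with its standard error and sample size could be encoded as a compound VALUES object along these lines. This is a sketch: the numbers are made up, the "dimensionless" unit for the sample size is an assumption, the key name "valueLst" is taken from the field description above, and the helper summarize is hypothetical.

```python
def summarize(values):
    """Flatten a VALUES object into a {statistic: list of floats} dictionary."""
    if values["type"] == "simple":
        return {values["statistic"]: values["values"]}
    # Compound values: merge the summaries of every nested VALUES object.
    summary = {}
    for item in values["valueLst"]:
        summary.update(summarize(item))
    return summary


# Made-up measurement: mean of 12.5 ms, standard error of 1.5 ms, 8 samples.
measurement = {
    "type": "compound",
    "valueLst": [
        {"type": "simple", "values": [12.5], "unit": "ms", "statistic": "mean"},
        {"type": "simple", "values": [1.5], "unit": "ms", "statistic": "sem"},
        # Assumption: the sample size is stored as a unitless float.
        {"type": "simple", "values": [8.0], "unit": "dimensionless", "statistic": "N"},
    ],
}

print(summarize(measurement))  # {'mean': [12.5], 'sem': [1.5], 'N': [8.0]}
```

If two nested objects share the same statistic, this helper keeps only the last one; a fuller implementation would need to decide how to handle that case.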

PARAMETER_REFERENCE:

  • instanceId: Value of the id field of a PARAMETER object.
  • paramTypeId: Value of the typeId associated with this parameter.
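Resolving a PARAMETER_REFERENCE amounts to looking up the parameter whose id equals instanceId. The sketch below searches a list of annotations loaded from one or more .pcr files; the function name resolve_reference and the identifiers in the example are hypothetical.

```python
def resolve_reference(reference, annotations):
    """Return the PARAMETER whose id matches reference["instanceId"], or None."""
    for annotation in annotations:
        for parameter in annotation.get("parameters", []):
            if parameter["id"] == reference["instanceId"]:
                return parameter
    return None


# Placeholder data: one annotation holding one parameter.
annotations = [{"annotId": "annot-1", "parameters": [{"id": "param-instance-1"}]}]
reference = {"instanceId": "param-instance-1", "paramTypeId": "PARAM-TYPE-ID"}
found = resolve_reference(reference, annotations)
```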