ENA-metadata-templates

The European Nucleotide Archive has specific metadata requirements for submitting data.

This repository contains the tabular-format and xlsx spreadsheet metadata templates required to submit data to ENA using the ena-upload-cli or Galaxy's ENA upload tool. There is a template for every existing sample checklist. The templates are kept up to date with ENA automatically, so they always use the latest attributes. We also provide a machine-actionable JSON file for each template, and a checklist_overview.json file in the root of this repository that lists all available templates in machine-actionable form.

Supported ENA Checklists:

ID Name Description
ERC000011 ENA default sample checklist Minimum information required for the sample
ERC000012 GSC MIxS air Genomic Standards Consortium package extension for reporting of measurements and observations obtained from the environment where the sample was obtained. By choosing the environmental package, a selection of fields can be made from a relevant subset of the GSC terms.
ERC000013 GSC MIxS host associated Genomic Standards Consortium package extension for reporting of measurements and observations obtained from the environment where the sample was obtained. By choosing the environmental package, a selection of fields can be made from a relevant subset of the GSC terms.
ERC000014 GSC MIxS human associated Genomic Standards Consortium package extension for reporting of measurements and observations obtained from the environment where the sample was obtained. By choosing the environmental package, a selection of fields can be made from a relevant subset of the GSC terms.
ERC000015 GSC MIxS human gut Genomic Standards Consortium package extension for reporting of measurements and observations obtained from the environment where the sample was obtained. By choosing the environmental package, a selection of fields can be made from a relevant subset of the GSC terms.
ERC000016 GSC MIxS human oral Genomic Standards Consortium package extension for reporting of measurements and observations obtained from the environment where the sample was obtained. By choosing the environmental package, a selection of fields can be made from a relevant subset of the GSC terms.
ERC000017 GSC MIxS human skin Genomic Standards Consortium package extension for reporting of measurements and observations obtained from the environment where the sample was obtained. By choosing the environmental package, a selection of fields can be made from a relevant subset of the GSC terms.
ERC000018 GSC MIxS human vaginal Genomic Standards Consortium package extension for reporting of measurements and observations obtained from the environment where the sample was obtained. By choosing the environmental package, a selection of fields can be made from a relevant subset of the GSC terms.
ERC000019 GSC MIxS microbial mat biofilm Genomic Standards Consortium package extension for reporting of measurements and observations obtained from the environment where the sample was obtained. By choosing the environmental package, a selection of fields can be made from a relevant subset of the GSC terms.
ERC000020 GSC MIxS plant associated Genomic Standards Consortium package extension for reporting of measurements and observations obtained from the environment where the sample was obtained. By choosing the environmental package, a selection of fields can be made from a relevant subset of the GSC terms.
ERC000021 GSC MIxS sediment Genomic Standards Consortium package extension for reporting of measurements and observations obtained from the environment where the sample was obtained. By choosing the environmental package, a selection of fields can be made from a relevant subset of the GSC terms.
ERC000022 GSC MIxS soil Genomic Standards Consortium package extension for reporting of measurements and observations obtained from the environment where the sample was obtained. By choosing the environmental package, a selection of fields can be made from a relevant subset of the GSC terms.
ERC000023 GSC MIxS wastewater sludge Genomic Standards Consortium package extension for reporting of measurements and observations obtained from the environment where the sample was obtained. By choosing the environmental package, a selection of fields can be made from a relevant subset of the GSC terms.
ERC000024 GSC MIxS water Genomic Standards Consortium package extension for reporting of measurements and observations obtained from the environment where the sample was obtained. By choosing the environmental package, a selection of fields can be made from a relevant subset of the GSC terms.
ERC000025 GSC MIxS miscellaneous natural or artificial environment Genomic Standards Consortium package extension for reporting of measurements and observations obtained from the environment where the sample was obtained. By choosing the environmental package, a selection of fields can be made from a relevant subset of the GSC terms.
ERC000027 ENA Micro B3 Minimum information about a Micro B3 sample. A checklist for reporting metadata of marine microbial samples associated with genomics data. NOTE: Non-genomics data, i.e. oceanographic environmental data and morphology-based biodiversity data, should be submitted to the appropriate National Oceanographic Data Centre according to established reporting practices maintained by oceanographic community experts. Major National Oceanographic Data Centres from countries bordering the North-East Atlantic, and its adjacent seas: the Mediterranean, the Black Sea, the Baltic, the North Sea and the Arctic are listed at http://www.seadatanet.org/Overview/Partners. For the Ocean Sampling Day campaign, non-genomics data shall be reported to the PANGAEA (http://www.pangaea.de/submit/).
ERC000028 ENA prokaryotic pathogen minimal sample checklist Minimum information required for a prokaryotic pathogen sample
ERC000029 ENA Global Microbial Identifier reporting standard checklist GMI_MDM:1.1 Minimum Data for Matching (MDM). A checklist for reporting metadata of pathogen samples for the Global Microbial Identifier (GMI) reporting system. More about GMI can be found here http://www.g-m-i.org/
ERC000030 ENA Tara Oceans Minimum information about a Tara Oceans sample. A checklist for reporting metadata of oceanic plankton samples associated with genomics data from the Tara Oceans Expedition.
ERC000031 GSC MIxS built environment Genomic Standards Consortium package extension for reporting of measurements and observations obtained from the environment where the sample was obtained. By choosing the environmental package, a selection of fields can be made from a relevant subset of the GSC terms.
ERC000032 ENA Influenza virus reporting standard checklist Minimum information about an Influenza virus sample. A checklist for reporting metadata of Influenza virus samples associated with genomic data. This minimum metadata standard supports submission of avian, human and mammalian surveillance data as well as serology and virus isolate information (where available). The ENA Influenza sample checklist is based on standards in use at the Influenza Research Database.
ERC000033 ENA virus pathogen reporting standard checklist Minimum information about a virus pathogen. A checklist for reporting metadata of virus pathogen samples associated with genomic data. This minimum metadata standard was developed by the COMPARE platform for submission of virus surveillance and outbreak data (such as Ebola) as well as virus isolate information.
ERC000034 ENA mutagenesis by carcinogen treatment checklist Minimum Information required for reporting samples associated with genomic data, derived from carcinogen induced animal tumours. This minimum metadata standard was developed in collaboration with Duncan Odom lab for the Mouse Liver Cancer Evolution Project.
ERC000035 ENA Crop Plant sample enhanced annotation checklist The ENA Crop sample enhanced checklist has been developed in collaboration with a number of EMBL-EBI teams to capture enriched annotation of published crop plant samples that lack sufficient reported metadata and are typically associated with systematic transcriptomic realignment-based analyses.
ERC000036 ENA sewage checklist Minimum information about sewage samples. A checklist for reporting of sewage surveillance samples associated with sequence data from metagenomic sequencing projects. This minimum metadata standard was developed by the COMPARE platform.
ERC000037 ENA Plant Sample Checklist ENA implementation of plant specimen contextual information associated with molecular data. The checklist has been developed in collaboration with the NCBI-GenBank and iPlant data resources under the umbrella of the Genomic Standards Consortium.
ERC000038 ENA Shellfish Checklist Shellfish contextual information associated with molecular data. The checklist has been developed in collaboration with EMBRIC Project partners.
ERC000039 ENA parasite sample checklist Minimum information about parasite samples. A checklist for reporting metadata of parasite samples associated with molecular data. This standard was developed by the COMPARE platform and can be used for submission of sample metadata derived from protozoan parasites (e.g. Cryptosporidium) and also multicellular eukaryotic parasites (e.g. Platyhelminthes and Nematoda).
ERC000040 ENA UniEuk_EukBank Checklist Minimum information required for reporting samples associated with the UniEuk EukBank initiative. This checklist aims to capture contextual metadata associated with V4 18S SSU rRNA molecular data.
ERC000041 ENA Global Microbial Identifier Proficiency Test (GMI PT) checklist Minimum information to standardise metadata related to samples used in GMI PT (Global Microbial Identifier Proficiency Test). A checklist for reporting metadata of GMI PT samples associated with molecular data. This minimum metadata standard was developed by the COMPARE platform and can be used for submission of sample metadata derived from Campylobacter coli, Campylobacter jejuni, Listeria monocytogenes, Klebsiella pneumoniae, Salmonella enterica, Escherichia coli and Staphylococcus aureus.
ERC000043 ENA Marine Microalgae Checklist Marine microalgae contextual information. The checklist has been developed in collaboration with EMBRIC Project partners and is suitable for reporting metadata related to environmental samples and those in culture collections.
ERC000044 COMPARE-ECDC-EFSA pilot human-associated reporting standard A checklist for reporting metadata of human-associated pathogen samples for the COMPARE-ECDC-EFSA reporting system.
ERC000045 COMPARE-ECDC-EFSA pilot food-associated reporting standard A checklist for reporting metadata of food-borne pathogen samples for the COMPARE-ECDC-EFSA reporting system.
ERC000047 GSC MIMAGS Genomic Standards Consortium package extension for reporting of measurements and observations obtained from the environment where the sample was obtained. By choosing the environmental package, a selection of fields can be made from a relevant subset of the GSC terms.
ERC000048 GSC MISAGS Genomic Standards Consortium package extension for reporting of measurements and observations obtained from the environment where the sample was obtained. By choosing the environmental package, a selection of fields can be made from a relevant subset of the GSC terms.
ERC000049 GSC MIUVIGS Genomic Standards Consortium package extension for reporting of measurements and observations obtained from the environment where the sample was obtained. By choosing the environmental package, a selection of fields can be made from a relevant subset of the GSC terms.
ERC000050 ENA binned metagenome Minimum information to standardise metadata of binned metagenome samples. Ensures binned and MAG metagenome assembly metadata is compatible.
ERC000051 PDX Checklist Minimum information required for reporting samples associated with patient-derived xenograft (PDX) models or patient samples
ERC000052 HoloFood Checklist Minimum information required for reporting HoloFood samples. HoloFood is a 'hologenomic' approach that will improve the efficiency of food production systems by understanding the biomolecular and physiological processes affected by incorporating feed additives and novel sustainable feeds in farmed animals (https://www.holofood.eu/).
ERC000053 Tree of Life Checklist Minimum information required for reporting samples associated with the Tree of Life Programme (https://www.sanger.ac.uk/programme/tree-of-life/).
ERC000055 GSC MIxS agriculture Genomic Standards Consortium package extension for reporting of measurements and observations obtained from the environment where the sample was obtained. By choosing the environmental package, a selection of fields can be made from a relevant subset of the GSC terms.
ERC000056 GSC MIxS Food and Production Genomic Standards Consortium package extension for reporting of measurements and observations obtained from the environment where the sample was obtained. By choosing the environmental package, a selection of fields can be made from a relevant subset of the GSC terms. This package is a combination of the four food extensions (MIxS-food-animal and animal feed, MIxS-food-farm environment, MIxS-food-food production facility, MIxS-food-human foods).
ERC000057 GSC MIxS Symbiont Genomic Standards Consortium package extension for reporting of measurements and observations obtained from the environment where the sample was obtained. By choosing the environmental package, a selection of fields can be made from a relevant subset of the GSC terms.
ERC000058 GSC MIxS Hydrocarbon Genomic Standards Consortium package extension for reporting of measurements and observations obtained from the environment where the sample was obtained. By choosing the environmental package, a selection of fields can be made from a relevant subset of the GSC terms.

Tabular metadata templates (*.tsv)

There are four *.tsv files, one for each metadata object (study, sample, experiment and run).
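As a rough illustration, a filled-in template of this kind could be read with Python's standard csv module. This is only a sketch: the column names (alias, title, scientific_name) and the inline file content are made up for the example, and the helper name read_template is hypothetical. The real templates define their own columns per checklist.

```python
import csv
import io

# Made-up stand-in for a filled-in sample template; the real templates
# define their own column names for each checklist.
SAMPLE_TSV = (
    "alias\ttitle\tscientific_name\n"
    "sample_1\tFirst sample\tHomo sapiens\n"
    "sample_2\tSecond sample\tMus musculus\n"
)


def read_template(handle):
    """Read a tab-separated template and return one dict per data row."""
    reader = csv.DictReader(handle, delimiter="\t")
    return list(reader)


# io.StringIO stands in for an opened *.tsv file.
rows = read_template(io.StringIO(SAMPLE_TSV))
for row in rows:
    print(row["alias"], "->", row["scientific_name"])
```

With a file on disk, the same function can be called on a handle opened with `open(path, newline="", encoding="utf-8")`.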

Workbook (*.xlsx) templates

Workbook templates contain one worksheet per metadata object. Controlled vocabulary options can be selected from dropdown menus.

Machine actionable JSON files

Every template folder has a machine-actionable JSON file that describes each attribute in the template in the following way:

  {
    "name": "Attribute name",
    "cardinality": "mandatory",
    "description": "Description of the attribute",
    "units": "m/s",
    "regex": "Regular expression",
    "field_type": "TEXT_FIELD",
    "cv": [
      "Controlled vocabulary 1",
      "Controlled vocabulary 2"
    ]
  }
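One possible use of such an attribute description is to check a value before submission. The sketch below is not part of this repository or of ena-upload-cli. It assumes that "mandatory" cardinality means an empty value is an error, that "regex" must match the whole value, and that a non-empty "cv" list is the complete set of allowed values. The attribute shown (example_date and its pattern) and the function name validate_value are invented for the example.

```python
import json
import re

# Hypothetical attribute description using the fields shown above;
# the name and the pattern are made up for illustration.
ATTRIBUTE_JSON = """
{
  "name": "example_date",
  "cardinality": "mandatory",
  "description": "An example date attribute",
  "units": "",
  "regex": "[0-9]{4}(-[0-9]{2}(-[0-9]{2})?)?",
  "field_type": "TEXT_FIELD",
  "cv": []
}
"""


def validate_value(attribute, value):
    """Return a list of problems for `value`; an empty list means it passed."""
    name = attribute.get("name", "<unnamed>")
    # Assumption: an empty value is only a problem for mandatory attributes.
    if value is None or value.strip() == "":
        if attribute.get("cardinality") == "mandatory":
            return [f"{name}: mandatory value is missing"]
        return []
    problems = []
    # Assumption: the regex has to match the whole value, not just a part.
    pattern = attribute.get("regex")
    if pattern and re.fullmatch(pattern, value) is None:
        problems.append(f"{name}: {value!r} does not match {pattern!r}")
    # Assumption: a non-empty cv list is the full set of allowed values.
    allowed = attribute.get("cv") or []
    if allowed and value not in allowed:
        problems.append(f"{name}: {value!r} is not in the controlled vocabulary")
    return problems


attribute = json.loads(ATTRIBUTE_JSON)
for candidate in ["2021-03-15", "March 2021", ""]:
    print(repr(candidate), validate_value(attribute, candidate))
```

Returning a list of messages rather than raising on the first failure makes it easy to report every problem in a spreadsheet row at once.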

Operational Practices

A GitHub Action pulls the sample checklists and the study, sample, experiment and run XSD specifications, then updates these templates and READMEs accordingly.

Versioning

Releases of this repository are synchronized with the ena-upload-cli release cycle, guaranteeing compliance between what the tool can submit and the templates it uses.