BEF-China test collection for dataset search

This repository provides a test collection for dataset search in biodiversity research. The test collection consists of 14 questions that were collected in different biodiversity research projects and reflect real user information needs, a corpus of 372 datasets created within the BEF-China project, and human assessments indicating which datasets are relevant to each question.

Further information on the BEF-China project can be obtained from the website: https://bef-china.com.

Data Corpus

The data portal is available at https://data.botanik.uni-halle.de.

Each dataset is available under the following URL pattern: https://data.botanik.uni-halle.de/bef-china/datasets/<dataset-number>, e.g., https://data.botanik.uni-halle.de/bef-china/datasets/630
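
The URL pattern above can be applied programmatically. The following is a minimal sketch in Python; the names BASE_URL and dataset_url are hypothetical helpers introduced for illustration and are not part of the repository.

```python
# Base path taken from the URL pattern given in this README.
BASE_URL = "https://data.botanik.uni-halle.de/bef-china/datasets"


def dataset_url(dataset_number: int) -> str:
    """Return the data-portal URL for a BEF-China dataset number.

    Hypothetical helper: it only fills the <dataset-number> placeholder
    and does not check that the dataset actually exists in the portal.
    """
    return f"{BASE_URL}/{dataset_number}"


print(dataset_url(630))
```

For dataset number 630, the sketch yields the example URL shown above.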

Questions

Question Number	Question
Q1	Name 3 species that occur in the shrub layer.
Q2	Find 3 plant species where root lengths (depth) have been considered.
Q3	Find 3 datasets from oaks where nitrogen content has been measured.
Q4	Find 3 datasets where dry weights from conifers have been measured.
Q5	Which nutrients occur in soil?
Q6	Identify all parameters that are correlated to soil depth.
Q7	Which taxa associated with tree species have been found, for example, insects on host trees?
Q8	Which soil samples in BEF-China data show a low pH value?
Q9	Does tree diversity reduce competition?
Q10	Do the soil carbon concentrations increase with soil depth?
Q11	Are there data about the leaf area index (LAI) and in particular in combination with diversity?
Q12	How has tree height been measured in BEF-China experiments?
Q13	How does the nitrogen cycle interact with water?
Q14	How significant is the role of throughfall as water input to the forest floor?

Human Assessments

The human assessments are provided in the text file befchina_gold_standard.txt, which follows the TREC benchmark data format. An entry in the file looks as follows:

1 0 161 1

Read in the standard TREC qrels order, the first number denotes the question number, the second is the iteration field (0 in this example and conventionally unused), the third is the dataset number, and the last is the relevance judgment (1 = relevant). All datasets of the corpus that are not listed for a question are deemed not relevant.
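
To illustrate how the assessments might be used, here is a minimal Python sketch. It assumes that each entry has four whitespace-separated fields in the standard TREC qrels order (question, iteration, dataset, relevance), as suggested by the example entry above. The functions parse_qrels and precision_at_k and the ranking used below are hypothetical and not part of the repository.

```python
from collections import defaultdict


def parse_qrels(lines):
    """Map each question number to the set of dataset numbers judged relevant.

    Assumes four whitespace-separated fields per entry, in TREC qrels order:
    question, iteration, dataset, relevance.
    """
    relevant = defaultdict(set)
    for line in lines:
        fields = line.split()
        if len(fields) != 4:
            continue  # skip blank or malformed lines
        question, _iteration, dataset, judgment = fields
        if int(judgment) > 0:
            relevant[int(question)].add(int(dataset))
    return dict(relevant)


def precision_at_k(ranking, relevant_datasets, k):
    """Fraction of the top-k ranked datasets that are judged relevant.

    Datasets missing from relevant_datasets count as not relevant,
    matching the rule stated in this README.
    """
    if k <= 0:
        raise ValueError("k must be positive")
    hits = sum(1 for dataset in ranking[:k] if dataset in relevant_datasets)
    return hits / k


qrels = parse_qrels(["1 0 161 1"])  # the example entry from this README
ranking = [161, 630]  # hypothetical search result for question 1
print(qrels)
print(precision_at_k(ranking, qrels[1], k=2))
```

To load the full set of assessments, the lines of befchina_gold_standard.txt can be passed to parse_qrels, for example by iterating over the opened file.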

Licenses

The BEF-China test collection is licensed under a Creative Commons Attribution 4.0 International License.

Citation

When reusing the dataset, please cite it as follows:

Felicitas Löffler, Andreas Schuldt, Birgitta König-Ries, Helge Bruelheide, & Friederike Klan. (2022). fusion-jena/befchina-test-collection: Major service release (v2.0) [Data set]. Zenodo. https://doi.org/10.5281/zenodo.7371711
