Dataset Providers: we want you!

This is the main repository for the BioCADDIE GYM Harvester project

This project will explore the use of social coding sites such as GitHub for publishing and sharing descriptions of datasets. We will develop create a format aligned with the Health Care and Life Sciences (HCLS) Dataset Description profile that can be easily embedded in a project repository, and foster a lightweight tool ecosystem around this. This includes dynamic publishing of dataset descriptions, and automatic validation through the Travis continuous integration system. We will pilot this project by taking existing datasets and describing them retrospectively, and by working with an existing project and describing this prospectively. The goal will to ultimately have this indexed within the biocaddie system.

Getting Started

Go to the fake-data demo/template site:

http://biodatasets.github.io/mybiocaddie/

And click "Fork Me", follow the instruction there.

About this repository

For sharing our data and metadata, follow the instructions above. This repo houses many of the scripts and tools that take of things behind the scenes.

Validation and Processing Scripts

See the bin directory for useful scripts for parsing and aggregating md files.

The mybiocaddie template includes a file .travis.yml

Currently this downloads a python script from this repo, and this is executed to test the contents of the repo. Note: people forking the mybiocaddie template will need to enable travis to perform checks.

Harvesting

See: https://github.com/cmungall/biocaddie-gym/milestone/4

One of the aims of the project is to demonstrate the feasibility of harvesting user-supplied metadata in github repositories.

The demonstrator function here is not intended to supplant actual BioCaddie harvesting technologies, but to show how they may be augmented.

See the Makefile for how to run this harvesting step.

TODO: issue 11 make a CI job

Dataset Providers: we want you!

Do you provide or release data using a distributed VCS site like github? Are you interested in doing this? Alternatively, are you interested in using github to provide metadata for data released elsewhere (e.g. Dryad, FigShare, ...)?

If so, we want to hear from you!

Twitter: @biocaddie or @chrismungall GitHub: @cmungall Email: cjmungall@lbl.gov

More details coming soon...

Name		Name	Last commit message	Last commit date
Latest commit History 19 Commits
bin		bin
presentations		presentations
samples		samples
.travis.yml		.travis.yml
Makefile		Makefile
README.md		README.md
biocaddie-context.jsonld		biocaddie-context.jsonld
report.md		report.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Getting Started

About this repository

Validation and Processing Scripts

Harvesting

Dataset Providers: we want you!

About

Releases

Packages

Contributors 2

Languages

cmungall/biocaddie-gym

Folders and files

Latest commit

History

Repository files navigation

Getting Started

About this repository

Validation and Processing Scripts

Harvesting

Dataset Providers: we want you!

About

Topics

Resources

Stars

Watchers

Forks

Releases

Packages 0

Contributors 2

Languages

Packages