Welcome to the OSA repository for all things open-source in agricultural technology (agritech) development. This accompanies the OpenSourceAgriculture newsletter, which you can sign up to here.
The idea behind this repository is to collate all open-source datasets and projects in agritech in one place for easy reference and to get a better picture of what is out there.
If you notice that a dataset is missing or you find an error in the tables, please submit a pull request or issue detailing the changes.
Annotated image data is the backbone of precision agricultural operations such as site-specific weed control. This data is essential for training algorithms that can detect weeds, identify insects and count fruit on the tree. A summary of the datasets in each domain is provided below. Click on the drop-down list to find out more.
Open-access image datasets of weeds
Dataset | Task | Image Number | Class Number | Species | Description |
---|---|---|---|---|---|
Agriculture-Vision | Instance Segmentation | | | | Aerial images for detecting weeds in various agricultural fields. |
Carrot-Weed | Segmentation | 39 | 2 | carrot (Daucus carota ssp. sativus), unspecified weeds | |
Corn/Lettuce/Radish | Classification | 7200 | 8 | maize (Zea mays), Canada thistle (Cirsium arvense), fat hen (Chenopodium album), bluegrass (Poa spp.), lettuce, radish | |
CottonWeeds | Classification | 5187 | 15 | morningglory (Ipomoea spp.), carpetweed (Mollugo verticillata), Palmer amaranth (Amaranthus palmeri), waterhemp (Amaranthus tuberculatus), purslane (Portulaca spp.), nutsedge (Cyperus spp.), eclipta (Eclipta prostrata), sicklepod (Senna obtusifolia), spotted spurge (Euphorbia maculata), ragweed (Ambrosia spp.), goosegrass (Eleusine indica), prickly sida (Sida spinosa), crabgrass (Digitaria spp.), swinecress (Lepidium spp.), spurred anoda (Anoda cristata) | |
CottonWeedDet12 | Object Detection | 5648 (9370 instances) | 12 | ||
CWF-788 | Segmentation | 788 | 1 | cauliflower (Brassica oleracea var. botrytis) | |
CWFID | Segmentation | 60 | 2 | carrot, unspecified weeds | |
GrassClover | Segmentation | 8000 | 5 | white clover (Trifolium repens), red clover (Trifolium pratense), shepherd’s purse (Capsella bursa-pastoris), unspecified thistle, dandelion (Taraxacum officinale) | |
LincolnBeet | Bounding box | 4,402 | 2 | sugar beet (Beta vulgaris var. altissima), unspecified weeds | |
Plant Seedling Dataset | Segmentation | 5,539 | 12 | maize, wheat (Triticum aestivum), sugar beet, scentless mayweed (Matricaria perforata), common chickweed (Stellaria media), shepherd’s purse, cleavers (Galium aparine), charlock (Sinapis arvensis), fat hen, small-flowered cranesbill (Geranium pusillum), blackgrass (Alopecurus myosuroides), loose silky-bent (Apera spica-venti) | |
Precision Sustainable Ag 2021 OpenCV Competition | Bounding box | 727 | 7 | grass species (Poaceae spp.), horseweed (Erigeron canadensis), cowpea (Vigna unguiculata), crimson clover (Trifolium incarnatum), goosefoot (Chenopodium album), velvetleaf (Abutilon theophrasti), sunflower (Helianthus annuus) | |
RoboWeedMap | Bounding box | 1147 | 2 | Unspecified monocotyledonous, Unspecified dicotyledonous | |
Sandplain Lupins | Segmentation | 795 (7989 instances) | 1 | Sandplain lupin (Lupinus cosentinii) | This repository contains five datasets collected in the field by a DJI Phantom 4 or smartphone in the northern wheatbelt of Western Australia. |
Soybean/Grass/Broadleaf/Soil | Segmentation | 15,336 | 3 | soybean (Glycine max), grass weeds, broadleaf weeds | |
Sugar beets | Segmentation | 300 | 10 | sugar beet, nine unspecified weed species | |
Weed-AI | All | | | | Hosting platform |
WeedMap | Segmentation | 10,196 | 2 | sugar beet | |
WeedNet | Segmentation | 155 | 2 | sugar beet, unspecified weeds | |
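Because the "Image Number" cells mix plain integers, comma-separated thousands and instance counts in parentheses, summarising the table programmatically needs a little normalisation. The following is a minimal Python sketch of one way to do that, not part of the repository itself. The `ROWS` excerpt is adapted from the table above, and the helper names `parse_image_count` and `summarise` are hypothetical.

```python
import re

# Hypothetical excerpt: rows adapted from the weed table above.
ROWS = """\
CottonWeedDet12 | Object Detection | 5648 (9370 instances) | 12 | ||
CWFID | Segmentation | 60 | 2 | carrot, unspecified weeds | |
LincolnBeet | Bounding box | 4,402 | 2 | sugar beet, unspecified weeds | |
Weed-AI | All | | | | Hosting platform |
"""


def parse_image_count(cell):
    """Return the leading integer of an 'Image Number' cell, or None if absent."""
    match = re.match(r"\s*(\d[\d,]*)", cell)
    if match is None:
        return None
    # Drop thousands separators such as the comma in "4,402".
    return int(match.group(1).replace(",", ""))


def summarise(rows):
    """Map each dataset name to a (task, image count) tuple."""
    summary = {}
    for line in rows.strip().splitlines():
        cells = [cell.strip() for cell in line.split("|")]
        if len(cells) < 3:
            continue  # Not a data row.
        name, task, images = cells[0], cells[1], cells[2]
        summary[name] = (task, parse_image_count(images))
    return summary


summary = summarise(ROWS)
for name, (task, count) in summary.items():
    shown = count if count is not None else "n/a"
    print(f"{name}: {task}, {shown} images")
```

Only the leading number in each cell is read, so the instance counts in parentheses are ignored; rows without an image count, such as hosting platforms, are reported as `n/a`.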
Open-access image datasets of insects
Dataset | Task | Image Number | Classes | Description |
---|---|---|---|---|
IP102 | Classification/object detection | Classification: >75,000, bounding box: 19,000 | 102 | A very large open-source dataset of insect pests. IP102 is annotated with a hierarchical taxonomy, and insect pests that mainly affect one specific agricultural product are grouped into the same upper-level category. The full class list |
Open-access image datasets of plant diseases
Dataset | Task | Image Number | Classes | Description |
---|---|---|---|---|
PlantVillage | Image Classification | 54,306 | 14 crop species, 26 diseases | Dataset with a focus on plant disease detection. |
Dhan-Shomadhan: A Dataset of Rice Leaf Disease Classification for Bangladeshi Local Rice | Image Classification | 1106 | 5 diseases (Brown Spot, Leaf Scald, Rice Blast, Rice Tungro, Sheath Blight) | An image classification dataset covering five diseases of rice grown in Bangladesh, with images taken against both field and white backgrounds. |
Open-access image datasets for crop phenotyping
Dataset | Task | Image Number | Classes | Description |
---|---|---|---|---|
Global Wheat Head Dataset | Object detection/segmentation | GWHD2020 - 4700, GWHD2021 - 6422 | wheat heads | A field-collected dataset of wheat heads annotated with either bounding boxes (2020) or segmentation masks (2021). GWHD2021 builds on GWHD2020 by adding 1722 images and segmentation-level annotations. Both can be downloaded from the link provided. |
Open-access image datasets for fruit counting and yield estimation
Dataset | Task | Image Number | Classes | Description |
---|---|---|---|---|
KFuji RGB-DSM dataset | Object Detection | 967 (12,839 instances) | 1 (Fuji apples) | RGB and depth images of apple trees for fruit detection and counting. |
MinneApple | Object detection/segmentation | 1000 (41,000 instances) | 1 (apples) | A comprehensive dataset for developing apple detection and segmentation algorithms. Representative results are provided for yield estimation. |
Tools for improving the algorithm development process.
Project Name | Task | Description |
---|---|---|
Project AgML | ML Pipeline | Standardising the development of ML algorithms, specific to agricultural data. |
RootPainter | Custom segmentation | RootPainter is a GUI-based software tool for the rapid, corrective training of deep neural networks for use in biological image analysis. RootPainter uses a client-server architecture, allowing it to be used on a standard laptop with access to Google Colab or to be installed and run locally. |
Segment-Anything Model (SAM) | Zero-shot segmentation | A recently released tool for zero-shot segmentation of images from Meta Research. Although it was not trained specifically on agricultural data (one plant dataset is used), the model learns a general concept of objects and generalises well to unseen domains. |
Open-source hardware projects for field use.
Project Name | Task | Description |
---|---|---|
AgOpenGPS | GPS Guidance | A globally popular open-source GPS guidance system for tractors and implements, with a substantial user base and development community. AgOpenGPS provides a comprehensive user interface with additional features such as variable rate and mapping. |
OpenWeedLocator (OWL) | Site-specific weed control | A DIY weed detection device based around the Raspberry Pi and Google Coral. Complete instructions for building and deploying the device are provided. |
Twisted Fields - Acorn | Robotic Platform | Acorn is a solar-powered, lightweight, open-source Precision Farming Rover (PFR) for in-field use. |