
Added base scoring program #14

Merged: 14 commits merged into set-up on Jun 12, 2024
Conversation

DavidCarlyn (Member)

Addresses #5

I've migrated the major-minor functionality over, but I haven't adapted it to this particular format yet; I should be able to do that soon.

The .txt files I added to the reference_data folder were used to quickly test this setup. It will still need testing with a full approach (once we get the baseline up and running in this format, we can use that).

I made several assumptions about where the data (predictions, solutions, etc.) will be held. @egrace479, let me know if you see a problem with my assumptions.

Also worth noting: we don't have error checking for the input files (predictions, for example). Should we?
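
For discussion, here's a minimal sketch of what such a check might look like. The column names (`filename`, `hybrid_stat`) and the CSV format are assumptions, since the prediction format isn't pinned down in this PR:

```python
import os
import pandas as pd

REQUIRED_COLUMNS = {"filename", "hybrid_stat"}  # assumed prediction columns

def validate_predictions(pred_path: str) -> pd.DataFrame:
    """Basic sanity checks on a submitted predictions file."""
    if not os.path.isfile(pred_path):
        raise FileNotFoundError(f"Predictions file not found: {pred_path}")
    preds = pd.read_csv(pred_path)
    missing = REQUIRED_COLUMNS - set(preds.columns)
    if missing:
        raise ValueError(f"Predictions file is missing columns: {sorted(missing)}")
    if preds["filename"].duplicated().any():
        raise ValueError("Predictions file contains duplicate filenames")
    return preds
```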

egrace479 (Member)

The location for data is described in the competition.yaml file.

I think we're meant to have the ingestion program pull in the input_data (the validation and testing images); the submitted model.py should then return the predictions to be matched against the reference_data. Presumably, they could return a table with the image filename and prediction (hybrid or not). I currently have the reference_data CSVs (butterfly_ref_<valid or test>_<A or mimic>.csv) set up with an ssp_indicator column (major and minor) for species A. They all have filename and hybrid_stat_ref columns to match against the testing images.

I'll read through what you have here on Monday. Glad you included a test case!
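
As a sketch of that matching step, assuming the predictions table shares the filename key and uses a hybrid_stat column (an assumed name), the comparison could be as simple as a merge:

```python
import pandas as pd

# Paths and the prediction column name are illustrative assumptions;
# the real locations come from competition.yaml.
ref = pd.read_csv("reference_data/butterfly_ref_valid_A.csv")  # filename, hybrid_stat_ref, ssp_indicator
preds = pd.read_csv("predictions.csv")                         # filename, hybrid_stat (assumed)

merged = ref.merge(preds, on="filename", how="left")
print("Overall accuracy:", (merged["hybrid_stat_ref"] == merged["hybrid_stat"]).mean())

# Per-subspecies breakdown for species A (major vs. minor)
for ssp, group in merged.groupby("ssp_indicator"):
    print(ssp, (group["hybrid_stat_ref"] == group["hybrid_stat"]).mean())
```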

Review threads on helper_scripts/dataio.py, scoring_program/scoring_config.yaml, and scoring_program/score.py (outdated, resolved).
DavidCarlyn (Member, Author)

I still have to update the second scoring program, score_maj_min.py, to account for the major and minor scoring, but there should be enough here to start testing.

work4cs and others added 5 commits on May 29, 2024:

* add bioclip code_submission
* add ingestion program and bioclip model submission
* add model environment and change prediction.txt file
* remove defaults in ingestion.py
* Update baselines/BioCLIP_code_submission/metadata
* Update baselines/BioCLIP_code_submission/model.py (five commits applying review suggestions)
* Update baselines/BioCLIP_code_submission/requirements.txt
* Update ingestion_program/ingestion.py
* Apply suggestions from code review (deal with device variable)

Co-authored-by: Elizabeth Campolongo <38985481+egrace479@users.noreply.github.com>
DavidCarlyn (Member, Author)

I updated the sample code submission for the DINOv2 baseline. Some notes:

  • Need to test and debug the updated baseline code (with ingestion, scoring, etc.).

  • Still need to update scoring for the major-minor use case.

Review thread on scoring_program/score.py (outdated, resolved).
egrace479 and others added 4 commits on June 7, 2024:

* divide and rename scoring programs by task; make helper functions accessible to scoring programs
* Update metadata files to point at the proper scoring function, input, and output
* Update mimic scoring program: get the proper input files to read, not just directories; print all scores and output challenge scores to scores.json for CodaBench to read; add a requirements file as a temporary fix
* Update species A scoring program: get the proper input files to read, not just directories; print all scores and output challenge scores to scores.json for CodaBench to read; scores still need proper labels, and the requirements file is a temporary fix pending the base container
* Add 'ref' to solution filenames to match competition.yaml in the formatting branch
DavidCarlyn (Member, Author)

When we are evaluating in scoring_program_A, will all the entries be either major or minor subspecies, with no others? I'm double-checking because, previously, we would include more than just the major and minor subspecies when calculating the threshold and then report accuracy only for the major and minor rows. Since we are splitting this up into different test sets/tasks, I'm assuming we are now calculating the threshold per task. I believe we're in agreement on that; just double-checking.
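
To make the per-task version concrete, here's a rough sketch. The `hybrid_score` and `is_hybrid` column names, and the convention that a score above the threshold means "hybrid", are assumptions for illustration, not the repo's actual scoring code:

```python
import pandas as pd

def per_task_threshold_accuracy(task_df: pd.DataFrame) -> tuple[float, float]:
    """Choose a threshold from this task's scores alone, then report accuracy.

    Assumes columns: `hybrid_score` (model score) and `is_hybrid` (bool truth).
    """
    def accuracy_at(t: float) -> float:
        return ((task_df["hybrid_score"] >= t) == task_df["is_hybrid"]).mean()

    # Simple selection: sweep the observed scores and keep the best threshold.
    threshold = max(task_df["hybrid_score"].unique(), key=accuracy_at)
    return threshold, accuracy_at(threshold)
```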

egrace479 self-requested a review on June 10, 2024.
egrace479 (Member)

@work4cs, this is working as expected now (just without the container).

egrace479 (Member) left a comment:

We will want to remove the requirements installation from both scoring programs once we have the container sorted, but this is functioning as-is on CodaBench with codalab/competitions-v2-compute-worker.

This was linked to issues on Jun 11, 2024.
egrace479 added this to the Functional Challenge Bundle milestone on Jun 11, 2024.
egrace479 (Member)

@work4cs and @DavidCarlyn, I think we're good at this point? We'll change the scoring programs so they don't require the requirements file once we get the container functioning.

work4cs merged commit da18386 into set-up on Jun 12, 2024.
work4cs deleted the scoring branch on June 12, 2024 at 19:14.
Successfully merging this pull request may close these issues: Baseline submission--DinoV2, Add scoring program.