ML Hackathon Objectives #149

Open
bnestor opened this issue Jul 22, 2024 · 0 comments
bnestor commented Jul 22, 2024

I wanted to summarise some of the Orcasound ML objectives that could be pursued in the hackathon.

ML Objectives

  1. Improve the detection model: better performance, or longer context windows to eliminate boat noise.
  2. Use the contributed detections as weak labels for retraining the model (Continual Learning or Weakly Supervised Learning).
  3. Species differentiation (some species labels are available from DCLDE, ONC, Orcasound, and OOI).
  4. Click detection vs call detection (I have labels for clicks-only data).
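For objective 2, one possible starting point is to turn the contributed detections into soft targets rather than hard labels. The sketch below is only an illustration: the `(clip_id, is_call)` report format, the `prior_votes` smoothing, and both function names are assumptions, not the format of the actual Orcasound detections.

```python
import math
from collections import defaultdict


def soft_labels(detections, prior_votes=1.0):
    """Aggregate per-clip reports into soft targets strictly between 0 and 1.

    detections: iterable of (clip_id, is_call) pairs, one per contributed
        report (hypothetical format).
    prior_votes: pseudo-count that pulls clips with few reports towards 0.5.
    """
    positive = defaultdict(float)
    total = defaultdict(float)
    for clip_id, is_call in detections:
        positive[clip_id] += 1.0 if is_call else 0.0
        total[clip_id] += 1.0
    return {
        clip_id: (positive[clip_id] + 0.5 * prior_votes) / (total[clip_id] + prior_votes)
        for clip_id in total
    }


def soft_bce(prob, target, eps=1e-7):
    """Binary cross-entropy between a predicted probability and a soft target."""
    prob = min(max(prob, eps), 1.0 - eps)
    return -(target * math.log(prob) + (1.0 - target) * math.log(1.0 - prob))


reports = [("a", True), ("a", True), ("a", False), ("b", False)]
targets = soft_labels(reports)
loss = soft_bce(0.5, targets["a"])
```

A clip with many agreeing reports ends up near 0 or 1, while a clip with a single report stays closer to 0.5, so it contributes a weaker training signal.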

Available Data
I can upload the OOI (custom agreement) and ONC (CC-BY) data to Azure ahead of time to speed up training during the hackathon. We should also host it on Hugging Face or a Dataverse, which allow licensing control, so that others can reproduce the work.

I can provide marine mammal presence/absence labels for 768 instances from OOI (4 TB of negative files), 1469 from Orcasound (68000 negative files), and 17290 from ONC (40 TB of negative files). I can also provide a pre-trained wav2vecU-2 backbone for anyone who is only interested in fine-tuning models. These ~20k positive files also have species and ecotype annotations for granular classification. About 2500 of the calls have specific start and end timestamps, if someone wants to take a shot at call catalogue classification.
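For the calls that come with start and end timestamps, a common first step is to convert them into labels for fixed-length windows. This is a rough sketch under assumed conventions: the function name, the `(start, end)` span format in seconds, and the window, hop, and overlap values are all hypothetical and would need to match the real annotation files.

```python
def window_labels(call_spans, clip_duration, window=1.0, hop=0.5, min_overlap=0.1):
    """Label fixed-length windows as 1 if they overlap an annotated call.

    call_spans: list of (start, end) call times in seconds (hypothetical format).
    clip_duration: length of the audio clip in seconds.
    min_overlap: minimum overlap in seconds for a window to count as positive.
    Returns a list of (window_start, window_end, label) tuples.
    """
    if clip_duration < window:
        return []
    n_windows = int((clip_duration - window) / hop) + 1
    labels = []
    for i in range(n_windows):
        start = i * hop
        end = start + window
        # Largest overlap between this window and any annotated call.
        overlap = max(
            (min(end, c_end) - max(start, c_start) for c_start, c_end in call_spans),
            default=0.0,
        )
        labels.append((start, end, 1 if overlap >= min_overlap else 0))
    return labels


example = window_labels([(1.2, 1.8)], clip_duration=3.0)
```

Overlapping windows (hop smaller than window) keep a call that straddles a boundary from being missed, at the cost of some duplicated audio between neighbouring windows.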

Aspirational Hackathon Formats
It would be nice to formalise a task and leaderboard, similar to how these hackathons/benchmarks do it:
DCASE Challenge Task 5 is a good motivation for the hackathon: https://dcase.community/challenge2022/task-few-shot-bioacoustic-event-detection
I also like the WILDS challenge, although it has no audio data: https://wilds.stanford.edu/

I can provide data, some data loading scripts in PyTorch or Hugging Face, and a test set environment if we want to follow these hackathons' approaches.
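As a concrete starting point for the leaderboard, the test set environment could score each submission with something like the sketch below. It assumes clip-level binary presence/absence predictions keyed by a hypothetical `clip_id`, and uses plain precision, recall, and F1, chosen here for illustration rather than taken from either benchmark.

```python
def score_submission(truth, predictions):
    """Clip-level precision, recall, and F1 for the positive (presence) class.

    truth: dict mapping clip_id -> 0 or 1 (held-out labels).
    predictions: dict mapping clip_id -> 0 or 1 (submitted labels).
    Clips missing from the submission are treated as predicted negative,
    and submitted clip ids that are not in the truth set are ignored.
    """
    tp = fp = fn = 0
    for clip_id, label in truth.items():
        pred = predictions.get(clip_id, 0)
        if pred and label:
            tp += 1
        elif pred and not label:
            fp += 1
        elif label:
            fn += 1
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


held_out = {"a": 1, "b": 1, "c": 0, "d": 0}
submission = {"a": 1, "b": 0, "c": 1}
result = score_submission(held_out, submission)
```

Because the negative files far outnumber the positives, a metric focused on the positive class is likely to be more informative than accuracy, but the choice of metric is still open.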
