PIG Dataset Issue Tracker #7

Open
2 tasks
kevinzakka opened this issue Apr 8, 2023 · 4 comments
Labels
bug Something isn't working help wanted Extra attention is needed

Comments

@kevinzakka
Collaborator

kevinzakka commented Apr 8, 2023

The PIG dataset has a few issues:

  • Wrong tempo: a song's tempo does not match the original piece's tempo.

    • Nocturne
  • Inconsistent sustain: a song can have the sustain pedal baked into the notes which means the hands need to reach more notes than physically possible at a given timestep.

    • Gymnopédie No. 1

We need to fix these issues or find all the affected songs so that they can be excluded from the benchmark score calculation.
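As a rough starting point for finding songs with baked-in sustain, one could flag any song whose peak number of simultaneously sounding notes exceeds what ten fingers can hold. The sketch below assumes each song is already available as a list of `(onset, offset, pitch)` tuples; the `Note` alias, the function names, and the `max_fingers` threshold are hypothetical and not part of the PIG tooling.

```python
from typing import List, Tuple

# Hypothetical note representation: (onset_sec, offset_sec, midi_pitch)
Note = Tuple[float, float, int]


def max_polyphony(notes: List[Note]) -> int:
    """Return the largest number of notes sounding at the same time."""
    events = []
    for onset, offset, _pitch in notes:
        events.append((onset, 1))    # a note starts
        events.append((offset, -1))  # a note ends
    # Sort by time; at equal times, note-offs (-1) come before note-ons (+1)
    # so back-to-back notes are not counted as overlapping.
    events.sort(key=lambda e: (e[0], e[1]))
    active = 0
    peak = 0
    for _time, delta in events:
        active += delta
        peak = max(peak, active)
    return peak


def looks_sustain_baked(notes: List[Note], max_fingers: int = 10) -> bool:
    """Heuristic: more simultaneous notes than fingers suggests baked-in sustain."""
    return max_polyphony(notes) > max_fingers
```

A per-hand threshold of five would be stricter, but it requires knowing which hand plays each note, so the sketch leaves that out.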

@kevinzakka kevinzakka added bug Something isn't working help wanted Extra attention is needed labels Apr 8, 2023
@NeroBlackstone

Wrong tempo and inconsistent sustain are not the only problems. The PIG dataset contains many other types of errors, including fingerings that cannot physically be played, such as crossed chords. This was already discovered in an earlier analysis of the PIG data.
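To make the crossed-chord case concrete, here is a minimal sketch of a check for it. It assumes a simplified definition that may differ from the one used in the referenced paper: within a chord played by one hand, finger numbers (1 = thumb, 5 = little finger) should not go against the pitch order. The function name and the input layout are hypothetical.

```python
from typing import List, Tuple


def is_crossed_chord(chord: List[Tuple[int, int]], right_hand: bool = True) -> bool:
    """Return True if finger order contradicts pitch order within one hand.

    chord: list of (midi_pitch, finger) pairs, finger in 1..5 with 1 = thumb.
    Assumption: in the right hand, higher pitches use higher finger numbers;
    in the left hand, the order is mirrored.
    """
    fingers = [finger for _pitch, finger in sorted(chord)]  # ascending pitch
    if not right_hand:
        fingers = fingers[::-1]  # mirror so the same comparison applies
    # A finger number that drops while moving outward means fingers cross.
    # Repeated finger numbers are deliberately not flagged here.
    return any(a > b for a, b in zip(fingers, fingers[1:]))
```

For example, a right-hand C-E-G chord fingered 1-3-5 passes, while the same chord fingered 3-1-5 is flagged.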

Please read Checklist Models for Improved Output Fluency in Piano Fingering Prediction

Considering that classical music in PIG has entered the public domain, I think we should establish an open-source fingering dataset to facilitate collaboration among researchers to correct errors in the dataset and improve the quality of fingering annotations for data-driven methods.

Another weakness of PIG is that it is not organized in a common format and is difficult to parse.
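To illustrate what parsing involves, here is a small sketch that reads a fingering file into typed records. The column layout is an assumption (eight whitespace-separated fields: note ID, onset, offset, spelled pitch, onset velocity, offset velocity, channel, finger, with `//` starting a comment line) and should be checked against the dataset's own documentation. The `FingeredNote` class and `parse_fingering` function are hypothetical names.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class FingeredNote:
    note_id: int
    onset: float          # seconds
    offset: float         # seconds
    pitch: str            # spelled pitch, e.g. "C4"
    onset_velocity: int
    offset_velocity: int
    channel: int          # assumed to distinguish the two hands
    finger: str           # kept as text, since its encoding is not verified here


def parse_fingering(text: str) -> List[FingeredNote]:
    """Parse fingering text under the assumed eight-column layout."""
    notes = []
    for line_no, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()
        if not line or line.startswith("//"):
            continue  # skip blank lines and comment/header lines
        fields = line.split()
        if len(fields) != 8:
            raise ValueError(f"line {line_no}: expected 8 fields, got {len(fields)}")
        notes.append(FingeredNote(
            note_id=int(fields[0]),
            onset=float(fields[1]),
            offset=float(fields[2]),
            pitch=fields[3],
            onset_velocity=int(fields[4]),
            offset_velocity=int(fields[5]),
            channel=int(fields[6]),
            finger=fields[7],
        ))
    return notes
```

Once the notes are in a structure like this, exporting them to a common format such as JSON or CSV is straightforward.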

Maybe we could convince the authors (Nakamura) to open-source the PIG dataset and recruit volunteers to maintain it.

PS: I read your paper. Really amazing work!

@kevinzakka
Collaborator Author

Thanks for the reference @NeroBlackstone! Agreed regarding maintaining and open-sourcing PIG. In an ideal world, the agent can discover its own fingering, in which case no need for PIG, but alas exploration is hard :)

@NeroBlackstone

NeroBlackstone commented Sep 15, 2023

> In an ideal world, the agent can discover its own fingering, in which case no need for PIG, but alas exploration is hard :)

If the player is a robot (as in this work), the PIG dataset may not be needed, as you said.

Enumerating the action space, applying invalid action masking, and setting some optimization goals is enough to let the agent discover its own "best fingering".

But it's difficult to optimize multiple goals, and the goals may even conflict with each other. Moreover, the resulting fingerings are often not suitable for human players.

My conclusion is that a human-labeled dataset (like PIG) is essential for learning an initial expert policy. Otherwise, the resulting fingerings will be of little benefit to humans.

It's essential to maintain PIG for generating human-playable fingering.
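The enumeration plus invalid action masking idea mentioned above can be sketched roughly as follows. This is only an illustration under simple assumptions: the action space is the five fingers of one hand, and a finger is invalid if it is already holding another note. The names `FINGERS`, `finger_mask`, and `masked_softmax` are hypothetical and not taken from any existing codebase.

```python
import math
from typing import List, Sequence, Set

FINGERS = (1, 2, 3, 4, 5)  # enumerated action space: 1 = thumb, 5 = little finger


def finger_mask(held: Set[int]) -> List[bool]:
    """Mark a finger as valid only if it is not already holding a note."""
    return [finger not in held for finger in FINGERS]


def masked_softmax(logits: Sequence[float], valid: Sequence[bool]) -> List[float]:
    """Softmax restricted to valid actions; invalid actions get probability 0."""
    if not any(valid):
        raise ValueError("at least one action must be valid")
    # Subtract the largest valid logit for numerical stability.
    peak = max(l for l, ok in zip(logits, valid) if ok)
    exps = [math.exp(l - peak) if ok else 0.0 for l, ok in zip(logits, valid)]
    total = sum(exps)
    return [e / total for e in exps]
```

With finger 3 held, `masked_softmax(logits, finger_mask({3}))` assigns it zero probability no matter how large its logit is, so the agent can never sample that fingering.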

@NeroBlackstone

NeroBlackstone commented Sep 15, 2023

> In an ideal world

I don't think such a perfect environment model exists, even if we can use 3D models to simulate every detail of the hand.

That's because one thing can't be simulated: the feeling we have when playing.

In other words, humans make fingering decisions based on their perception of the difficulty of the fingering.

This “feel” must be extracted using human-annotated datasets.
