# basic-pitch-torch

PyTorch version of Spotify's Basic Pitch, a lightweight audio-to-MIDI converter. The weights provided in Spotify's repo were converted using this script. Hopefully this helps researchers who are more accustomed to PyTorch reuse the pretrained model.

## Usage

To transcribe audio into MIDI, similar to Basic Pitch:

```python
from basic_pitch_torch.inference import predict

model_output, midi_data, note_events = predict(audio_path)
```

For loading the `nn.Module` directly:

```python
import torch

from basic_pitch_torch.model import BasicPitchTorch

pt_model = BasicPitchTorch()
pt_model.load_state_dict(torch.load('assets/basic_pitch_pytorch_icassp_2022.pth'))
pt_model.eval()

with torch.no_grad():
    output_pt = pt_model(y_torch)
    contour_pt, note_pt, onset_pt = output_pt['contour'], output_pt['note'], output_pt['onset']
```
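Here `y_torch` is assumed to be a mono waveform tensor already in memory. A minimal sketch of constructing one, using a synthetic sine wave in place of real audio (22050 Hz follows Basic Pitch's internal audio sample rate; in practice you would load a file with something like `librosa.load(audio_path, sr=22050)`):

```python
import torch

# Hypothetical stand-in for loaded audio: one second of a 440 Hz sine
# at 22050 Hz, the sample rate Basic Pitch processes audio at.
sr = 22050
t = torch.arange(sr, dtype=torch.float32) / sr
y_torch = torch.sin(2 * torch.pi * 440.0 * t)  # shape: (22050,)
```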

## Result Validation

In `tests/` we show two levels of validation using a test audio clip from GuitarSet:

- **On model output**: most of the discrepancies originate from floating-point division (e.g. `normalized_log`) and error propagation further down the network. The differences are small enough to be safely ignored during MIDI note creation.

  ```
  Contour abs diff - max: 0.0003006, min: 0.0, avg: 5.863e-06
  Onset abs diff   - max: 0.0002712, min: 0.0, avg: 1.431e-05
  Note abs diff    - max: 0.0002297, min: 0.0, avg: 6.6e-06
  ```

- **On MIDI transcription**: the MIDI files transcribed by the TF and PT models are identical (see `midi_data_pt.mid` and `midi_data_tf.mid`).
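The abs-diff statistics above can be reproduced with a small helper like the following (a sketch; `abs_diff_stats` is a hypothetical name, and `contour_tf` / `contour_pt` stand in for the two models' outputs):

```python
import numpy as np

def abs_diff_stats(a, b):
    """Return (max, min, mean) of the element-wise absolute difference."""
    d = np.abs(np.asarray(a, dtype=np.float64) - np.asarray(b, dtype=np.float64))
    return d.max(), d.min(), d.mean()

# e.g. comparing the TF and PyTorch contour outputs:
# mx, mn, avg = abs_diff_stats(contour_tf, contour_pt)
```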

## References

Bittner, Rachel M., et al. "A lightweight instrument-agnostic model for polyphonic note transcription and multipitch estimation." ICASSP 2022-2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE, 2022.