INTERSPEECH2023: Multi-band Time-frequency Attention Network for Singing Melody Extraction from Polyphonic Music


MTANet

Introduction

The official implementation of "MTANet: Multi-band Time-frequency Attention Network for Singing Melody Extraction from Polyphonic Music" (INTERSPEECH 2023).

We propose a more powerful singing melody extractor for polyphonic music, named the multi-band time-frequency attention network (MTANet). Experimental results show that MTANet achieves promising performance compared with existing state-of-the-art methods while using only a small number of network parameters.
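The multi-band idea can be sketched in a few lines. The snippet below is an illustrative NumPy stand-in with an assumed band count and spectrogram size, not the repository's implementation:

```python
import numpy as np

def split_bands(spec, n_bands=4):
    """Split a (freq, time) spectrogram into n_bands equal sub-bands
    along the frequency axis (illustrative only; the paper's band
    boundaries may differ)."""
    freq_bins = spec.shape[0]
    band_size = freq_bins // n_bands
    return [spec[i * band_size:(i + 1) * band_size] for i in range(n_bands)]

spec = np.random.rand(320, 128)          # (frequency bins, time frames), assumed sizes
bands = split_bands(spec, n_bands=4)
print([b.shape for b in bands])          # four (80, 128) sub-bands
```

Each sub-band can then be processed by its own attention branch before the branch outputs are fused.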

MTANet Architecture

Important update

2023. 03. 19

(i) Due to an oversight by the author, Figure 3 in the submitted manuscript shows an earlier version of the architecture, which may cause some misunderstanding for reviewers and readers. I am very sorry for this! The picture below is the revised version for reference, and a formal correction will be made in a subsequent manuscript.

Hourglass sub-network

(ii) Renamed MMNet to MTANet.

2023. 03. 20

The author has contacted the chairs and applied for a modification. If the modification is approved, please ignore the update above. I am very sorry for the inconvenience to reviewers and readers.

2023. 05. 20

The paper has been accepted by INTERSPEECH 2023, and the official version awaits release.

The rest of the code will be sorted out and published soon.

2023. 06. 11

All the code is uploaded.

2023. 08. 19

When I reread the paper, I found a mistake in one of the dimension-tracking descriptions in Figure 4. Specifically, the dimension after the concatenation operation differs between stages. For example, the input feature size of the first MFA module is (B, 32, F, T), so the feature size after concatenation should be (B, 32+4×16, F, T). In the subsequent MFA modules, however, the feature size after concatenation is (B, 16+4×16, F, T) (i.e., (B, (N+1)×C, F, T)).
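This bookkeeping can be checked mechanically. The sketch below uses NumPy arrays as stand-ins for the network tensors, with N=4 branches and C=16 channels as in the example above; the helper `concat_channels` is hypothetical, not code from the repository:

```python
import numpy as np

B, F, T = 2, 320, 128      # batch, frequency bins, time frames (assumed sizes)
N, C = 4, 16               # number of branches and their channels

def concat_channels(x, branches):
    # Concatenate the branch outputs with the input along the channel axis.
    return np.concatenate([x] + branches, axis=1)

branches = [np.zeros((B, C, F, T)) for _ in range(N)]

first_in = np.zeros((B, 32, F, T))      # first MFA module input
later_in = np.zeros((B, 16, F, T))      # subsequent MFA module inputs

print(concat_channels(first_in, branches).shape[1])  # 32 + 4*16 = 96
print(concat_channels(later_in, branches).shape[1])  # 16 + 4*16 = 80
```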

Although the original intention was to aid understanding and readability, we overlooked the strict correspondence between the paper and the code. Since the paper can no longer be modified, we apologize for any confusion this causes readers.

Getting Started

Download Datasets

After downloading the data, use the txt files in the data folder and extract the CFP features with feature_extraction.py.

Note that the label data corresponding to the frame shift should be prepared before feature generation.
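As a rough illustration of what "label data corresponding to the frame shift" means, frame-level labels are usually aligned by computing each analysis frame's time from the hop size. The hop size and sample rate below are assumed values, not necessarily those used by feature_extraction.py:

```python
def frame_times(n_frames, hop_size=80, sample_rate=8000):
    """Start time (seconds) of each analysis frame; hop_size and
    sample_rate are assumed values, not the repository's settings."""
    return [i * hop_size / sample_rate for i in range(n_frames)]

print(frame_times(4))   # [0.0, 0.01, 0.02, 0.03]
```

The reference F0 track is then sampled at these times to produce one label per frame.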

main.py is the entry point of this project.

Model implementation

Refer to the file: mtanet.py

The replication code for other comparison models has been uploaded and can be found in the folder: control group model.

Result

Prediction result

The visualizations illustrate that the proposed MTANet reduces octave errors and melody detection errors.
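An octave error means the predicted F0 is a whole number of octaves away from the reference (a factor of two per octave). The helper below is a hypothetical, minimal check; published melody-extraction evaluations typically use mir_eval instead:

```python
import math

def is_octave_error(f_ref, f_est, tol_cents=50):
    """True if f_est is a nonzero whole number of octaves away from
    f_ref, within tol_cents (hypothetical helper, assumed tolerance)."""
    if f_ref <= 0 or f_est <= 0:
        return False                       # unvoiced frames cannot octave-err
    cents = 1200 * math.log2(f_est / f_ref)
    octaves = round(cents / 1200)
    return octaves != 0 and abs(cents - 1200 * octaves) <= tol_cents

print(is_octave_error(220.0, 440.0))   # True: one octave high
print(is_octave_error(220.0, 221.0))   # False: within tolerance of correct
```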

(Figures: melody estimation examples)

Comprehensive result

The scores here are taken either from the respective papers or from our own implementations. Experimental results show that the proposed MTANet achieves promising performance compared with existing state-of-the-art methods.

(Table: comprehensive results)

  • Correction: the number of parameters for TONet is corrected from 214M to 147M.

Ablation study result

We conducted seven ablations to verify the effectiveness of each design in the proposed network. Due to the page limit, only the ADC2004 dataset was used for the ablation study in the paper; more detailed results are presented here.

(Figures: ablation results on ADC2004, MIREX 05, and MEDLEY DB)

Special thanks

Citing

@inproceedings{gao23i_interspeech,
  author={Yuan Gao and Ying Hu and Liusong Wang and Hao Huang and Liang He},
  title={{MTANet: Multi-band Time-frequency Attention Network for Singing Melody Extraction from Polyphonic Music}},
  year=2023,
  booktitle={Proc. INTERSPEECH 2023},
  pages={5396--5400},
  doi={10.21437/Interspeech.2023-2494}
}
