Reading mp4 vs wav #1

Open
gjkunde opened this issue Mar 30, 2023 · 2 comments
gjkunde commented Mar 30, 2023

I am trying to read the new dataset, which ships as mp4 files. For the ToyADMOS1 wav files, this code snippet from mixer.py

sig, sr_sig = __audioread_load(filename, offset=0.0, duration=None, dtype=np.float32)

returns an array of 242550 values. For the mp4 files, it returns the sample rate of 48,000,
but the length of sig is 0 and there is a warning:

/var/folders/mv/qbxkzz3d5zj4dh3wmt30cpfh000r_w/T/ipykernel_55465/1690306295.py:1: FutureWarning: librosa.core.audio.__audioread_load
Deprecated as of librosa version 0.10.0.
It will be removed in librosa version 1.0.
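
Since the warning says the private `__audioread_load` helper is deprecated, one option is to go through the public `librosa.load` entry point and fail loudly when nothing is decoded. The following is only a minimal sketch, assuming librosa is installed. `describe_signal` and `load_and_check` are hypothetical helper names (not part of mixer.py or librosa), and whether `librosa.load` can decode these particular mp4 files depends on the installed backends.

```python
import numpy as np


def describe_signal(sig, sr):
    """Summarise a decoded signal so that an empty decode is easy to spot."""
    n = int(len(sig))
    return {
        "samples": n,
        "sample_rate": sr,
        "seconds": n / sr if sr else 0.0,
        "empty": n == 0,
    }


def load_and_check(path, loader=None):
    """Load audio and raise if no samples were decoded.

    `loader` is a hypothetical hook: a callable taking a path and returning
    (signal, sample_rate). It defaults to the public librosa.load.
    """
    if loader is None:
        import librosa  # imported lazily so the helper can be tested without it

        def loader(p):
            # sr=None keeps the file's native sample rate instead of resampling
            return librosa.load(p, sr=None, mono=True, dtype=np.float32)

    sig, sr = loader(path)
    info = describe_signal(sig, sr)
    if info["empty"]:
        raise ValueError(f"{path}: decoded 0 samples at {sr} Hz")
    return sig, sr, info
```

Calling `load_and_check(filename)` in place of the private call would turn the silent zero-length result into an explicit error.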

@noboru2000

@gjkunde
Thank you for your report.

I’m not sure whether this is caused by librosa, but I remember that some versions of the FFmpeg decoder for MPEG-4 ALS had a decoding bug.

Could you please try to extract the mp4 file using the official MPEG-4 ALS decoder?
You can download the reference software of the MPEG-4 ALS from the following ISO/IEC link:
https://standards.iso.org/iso-iec/14496/-26/ed-2/en/confTools.zip

The source code in mp4alsRM25.zip is the reference software for MPEG-4 Audio Lossless Coding.
Note that mp4alsRM25sp.zip is for the simple profile and does not contain code for 32-bit float support.
The reference software can decode mp4 files encoded with MPEG-4 ALS.
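
Before switching decoders, it may help to confirm which codec and sample rate the file actually reports. The following is a minimal sketch, assuming the `ffprobe` tool from FFmpeg is installed and on the PATH. `ffprobe_command`, `parse_ffprobe`, and `probe_audio` are hypothetical helper names introduced for illustration.

```python
import json
import subprocess


def ffprobe_command(path):
    """Build an ffprobe call that prints the first audio stream as JSON."""
    return [
        "ffprobe", "-v", "error",
        "-select_streams", "a:0",
        "-show_entries", "stream=codec_name,sample_rate,channels",
        "-of", "json",
        path,
    ]


def parse_ffprobe(output):
    """Extract codec, sample rate and channel count from ffprobe's JSON output."""
    streams = json.loads(output).get("streams", [])
    if not streams:
        return None  # no audio stream was reported
    stream = streams[0]
    rate = stream.get("sample_rate")  # ffprobe prints this as a string
    return {
        "codec": stream.get("codec_name"),
        "sample_rate": int(rate) if rate is not None else None,
        "channels": stream.get("channels"),
    }


def probe_audio(path):
    """Run ffprobe on `path` and return the parsed stream description."""
    result = subprocess.run(
        ffprobe_command(path), capture_output=True, text=True, check=True
    )
    return parse_ffprobe(result.stdout)
```

If `probe_audio` reports a sensible codec and sample rate while the loader still returns zero samples, that would point at the decoding step rather than at the container.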

@daisukelab
Collaborator

daisukelab commented Mar 31, 2023

Hi @gjkunde,

Thank you for your interest. I tried to reproduce the issue and could do so only partially.
In short, please try downgrading librosa to 0.9.2 or older, which may solve your issue.

Thanks!
(Of course, you can try what Noboru suggested. It would show more details about the .mp4 encoding.)

The following are the logs from my attempts.

>>> import numpy as np
>>> import librosa
>>> librosa.__version__
'0.10.0.post2'
>>> from librosa.core.audio import __audioread_load
>>> sig, sr = __audioread_load('/hdd/datasets/ToyADMOS2/ToyTrain/normal/TN001-carA1-speed1_mic1_00001.mp4', offset=0.0, duration=None, dtype=np.float32)
<stdin>:1: FutureWarning: librosa.core.audio.__audioread_load
        Deprecated as of librosa version 0.10.0.
        It will be removed in librosa version 1.0.
>>> len(sig)
576000

The older versions are fine.

>>> import numpy as np

>>> import librosa
>>> librosa.__version__
'0.8.1'
>>> from librosa.core.audio import __audioread_load
>>> sig, sr = __audioread_load('/lab/data/toy21/ToyADMOS2/ToyTrain/normal/TN001-carA1-speed1_mic1_00001.mp4', offset=0.0, duration=None, dtype=np.float32)
>>> len(sig)
576000

>>> import librosa
>>> librosa.__version__
'0.9.2'
>>> from librosa.core.audio import __audioread_load
>>> sig, sr = __audioread_load('/hdd/datasets/ToyADMOS2/ToyTrain/normal/TN001-carA1-speed1_mic1_00001.mp4', offset=0.0, duration=None, dtype=np.float32)
>>> len(sig)
576000
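
To check whether an installed librosa falls in the range that prints the deprecation warning shown in the first log, a small version comparison is enough. This is a minimal sketch: the 0.10.0 threshold is taken from the warning text itself, and `release_tuple` and `emits_audioread_warning` are hypothetical helper names, not librosa functions.

```python
import re


def release_tuple(version):
    """Parse the leading numeric part of a version string.

    For example, '0.10.0.post2' becomes (0, 10, 0); suffixes such as
    '.post2' are ignored.
    """
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", version)
    if match is None:
        raise ValueError(f"unrecognised version string: {version!r}")
    return tuple(int(part or 0) for part in match.groups())


def emits_audioread_warning(version):
    """True if the version is at or above 0.10.0, where the warning says
    __audioread_load became deprecated."""
    return release_tuple(version) >= (0, 10, 0)
```

With librosa imported, `emits_audioread_warning(librosa.__version__)` tells you which side of that threshold the environment is on.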
