Reading mp4 vs wav #1

gjkunde · 2023-03-30T15:55:50Z

I am attempting to read the new data set with the mp4 files. This code snippet from mixer.py

sig, sr_sig = __audioread_load(filename, offset=0.0, duration=None, dtype=np.float32)

returns an array of length 242550 for the ToyAMOS1 wav files. For the mp4 files it returns only the
sample rate of 48,000; the length of sig is 0, and there is a warning:

/var/folders/mv/qbxkzz3d5zj4dh3wmt30cpfh000r_w/T/ipykernel_55465/1690306295.py:1: FutureWarning: librosa.core.audio.__audioread_load
Deprecated as of librosa version 0.10.0.
It will be removed in librosa version 1.0.
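A zero-length signal with a valid sample rate is easy to miss further down a pipeline. As a minimal sketch (the helper name `require_samples` is hypothetical and not part of mixer.py or librosa), a guard like the following would make the empty decode fail loudly at load time. It assumes `sig` is a 1-D sequence such as the array returned by the loader above.

```python
from typing import Sequence


def require_samples(sig: Sequence[float], sr: int, path: str) -> None:
    """Raise if the decoder reported a sample rate but produced no samples.

    Hypothetical helper for illustration; assumes `sig` is 1-D (mono).
    """
    if sr <= 0:
        raise ValueError(f"{path}: invalid sample rate {sr}")
    if len(sig) == 0:
        raise ValueError(
            f"{path}: decoder returned 0 samples at {sr} Hz; "
            "the audio backend may not support this codec"
        )


# Intended use (not executed here, since it needs librosa and a data file):
#   sig, sr_sig = __audioread_load(filename, offset=0.0, duration=None, dtype=np.float32)
#   require_samples(sig, sr_sig, filename)
```

This only detects the symptom; it does not say which decoder or codec is at fault.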

noboru2000 · 2023-03-31T06:20:39Z

@gjkunde
Thank you for your report.

I’m not sure whether this is caused by librosa, but I remember that some versions of the FFmpeg decoder for MPEG-4 ALS had a decoding bug.

Could you please try to extract the mp4 file using the official MPEG-4 ALS decoder?
You can download the reference software of the MPEG-4 ALS from the following ISO/IEC link:
https://standards.iso.org/iso-iec/14496/-26/ed-2/en/confTools.zip

The source code in mp4alsRM25.zip is the reference software for MPEG-4 Audio Lossless Coding.
Note that mp4alsRM25sp.zip is for the simple profile, which does not contain code supporting 32-bit float.
The reference software can decode mp4 files encoded with MPEG-4 ALS.

daisukelab · 2023-03-31T07:37:44Z

Hi @gjkunde,

Thank you for your interest. I tried to reproduce the issue and could do so only partially.
In short, please try downgrading your librosa to 0.9.2 or older, which may solve your issue.

Thanks!
(Of course, you can try what Noboru suggested. It would show more details about the .mp4 encoding.)
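To tell which side of the 0.10.0 boundary an environment is on, a version check along these lines may help. This is only a sketch: the function names are hypothetical, and the 0.10 threshold is taken from the FutureWarning text quoted in this thread, not from a review of librosa's changelog.

```python
import re
from typing import Tuple


def release_tuple(version: str) -> Tuple[int, ...]:
    """Return the leading numeric release parts, e.g. '0.10.0.post2' -> (0, 10, 0)."""
    parts = []
    for piece in version.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break  # non-numeric segment such as 'post2'
        parts.append(int(match.group()))
        if match.end() != len(piece):
            break  # suffix such as '0rc1' ends the release part
    return tuple(parts)


def audioread_load_is_deprecated(version: str) -> bool:
    """True for versions at or above 0.10, where the warning says deprecation began."""
    return release_tuple(version) >= (0, 10)


# Intended use (not executed here, since it needs librosa installed):
#   import importlib.metadata
#   audioread_load_is_deprecated(importlib.metadata.version("librosa"))
```

Comparing tuples of integers avoids the string-comparison trap where "0.9.2" sorts after "0.10.0".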

The following are the logs from my attempts.

>>> import numpy as np
>>> import librosa
>>> librosa.__version__
'0.10.0.post2'
>>> from librosa.core.audio import __audioread_load
>>> sig, sr = __audioread_load('/hdd/datasets/ToyADMOS2/ToyTrain/normal/TN001-carA1-speed1_mic1_00001.mp4', offset=0.0, duration=None, dtype=np.float32)
<stdin>:1: FutureWarning: librosa.core.audio.__audioread_load
        Deprecated as of librosa version 0.10.0.
        It will be removed in librosa version 1.0.
>>> len(sig)
576000

The older versions are fine.

>>> import numpy as np
>>> import librosa
>>> librosa.__version__
'0.8.1'
>>> from librosa.core.audio import __audioread_load
>>> sig, sr = __audioread_load('/lab/data/toy21/ToyADMOS2/ToyTrain/normal/TN001-carA1-speed1_mic1_00001.mp4', offset=0.0, duration=None, dtype=np.float32)
>>> len(sig)
576000

>>> import librosa
>>> librosa.__version__
'0.9.2'
>>> from librosa.core.audio import __audioread_load
>>> sig, sr = __audioread_load('/hdd/datasets/ToyADMOS2/ToyTrain/normal/TN001-carA1-speed1_mic1_00001.mp4', offset=0.0, duration=None, dtype=np.float32)
>>> len(sig)
576000
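As a quick sanity check on these lengths, the sample count can be converted to a duration. The sketch below assumes the 48,000 Hz sample rate reported for the mp4 files earlier in this thread; the logs above do not print `sr`, so that value is an assumption here.

```python
def duration_seconds(n_samples: int, sample_rate: int) -> float:
    """Convert a sample count to seconds for a single-channel signal."""
    if sample_rate <= 0:
        raise ValueError("sample_rate must be positive")
    return n_samples / sample_rate


# 576000 samples at an assumed 48000 Hz
print(duration_seconds(576000, 48000))  # 12.0
```

A successful decode at that rate therefore corresponds to a 12-second clip, whereas the failing case in the original report has a duration of 0.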
