This doc shows how to train a new AudioSeal model. The training pipeline was developed using AudioCraft (version 1.4.0a1 and later). The following example is tested with PyTorch==2.1.0 and torchaudio==2.1.0.
We need AudioCraft >= 1.4.0a1. If you want to experiment with different datasets and training recipes, we advise that you download the source code of AudioCraft and install it directly from source (see the installation notes):
git clone https://github.com/facebookresearch/audiocraft.git
cd audiocraft
pip install -e .
sudo apt-get install ffmpeg
# Or if you are using Anaconda or Miniconda
conda install "ffmpeg<5" -c conda-forge
Note that the step of installing ffmpeg (<5.0.0) in the notes above is mandatory; otherwise the training loop will fail, as our AAC augmentation step depends on it.
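Before launching training, it can be worth checking that the ffmpeg you installed is the one actually visible from your Python environment. Below is a minimal sketch; the version-string parsing is a best-effort assumption and may need adjusting for unusual ffmpeg builds:
# Sanity check: ffmpeg must be on PATH and have a major version < 5,
# since the AAC augmentation shells out to it during training.
import re
import subprocess

out = subprocess.run(["ffmpeg", "-version"], capture_output=True, text=True, check=True).stdout
print(out.splitlines()[0])  # e.g. "ffmpeg version 4.4.2 ..."
major = int(re.search(r"ffmpeg version n?(\d+)\.", out).group(1))
assert major < 5, f"ffmpeg major version {major} found, but the AAC augmentation needs ffmpeg < 5"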
The dataset should be processed in AudioCraft format. The first step is to create the manifest for your dataset. For Voxpopuli (which is used in the paper), run the following command:
# Download the raw audios and segment them
git clone https://github.com/facebookresearch/voxpopuli.git
cd voxpopuli
python -m voxpopuli.download_audios --root [ROOT] --subset 400k
python -m voxpopuli.get_unlabelled_data --root [ROOT] --subset 400k
# Run audiocraft data tool to prepare the manifest
cd [PATH to audiocraft]
python -m audiocraft.data.audio_dataset [ROOT] egs/voxpopuli/data.jsonl.gz
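To quickly verify the manifest before training, you can inspect a few entries of the generated data.jsonl.gz. This is a minimal sketch; the field names ("path", "duration", "sample_rate") follow AudioCraft's audio_dataset output, but double-check them against your own file:
# Inspect the manifest produced by audiocraft.data.audio_dataset.
import gzip
import json

with gzip.open("egs/voxpopuli/data.jsonl.gz", "rt") as f:
    entries = [json.loads(line) for line in f]

print(f"{len(entries)} audio files indexed")
first = entries[0]
print(first["path"], first["duration"], first["sample_rate"])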
Then, prepare the following datasource definition and save it as "[audiocraft root]/configs/dset/audio/voxpopuli.yaml":
# @package __global__
datasource:
  max_sample_rate: 16000
  max_channels: 1
  train: egs/voxpopuli
  valid: egs/voxpopuli
  evaluate: egs/voxpopuli
  generate: egs/voxpopuli
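Optionally, you can check that this file parses and that the manifest folders it points to exist. The sketch below uses omegaconf (an AudioCraft dependency) and assumes you run it from the AudioCraft root so the relative egs/ paths resolve:
# Validate the dset definition: it should parse and reference existing manifest folders.
from pathlib import Path
from omegaconf import OmegaConf

cfg = OmegaConf.load("configs/dset/audio/voxpopuli.yaml")
for split in ("train", "valid", "evaluate", "generate"):
    path = Path(cfg.datasource[split])
    assert path.exists(), f"missing manifest folder for {split}: {path}"
print("datasource OK:", OmegaConf.to_yaml(cfg.datasource))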
The training pipeline uses Dora to structure the experiments and perform grid-based parameter tuning. It is useful to familiarize yourself with Dora concepts such as dora run, dora grid, etc. before starting.
To test the training pipeline locally, see this documentation in AudioCraft. You can replace the example dataset with the Voxpopuli dataset prepared above, e.g. run the following command from within the cloned AudioCraft directory:
dora run solver=watermark/robustness dset=audio/example
By default the checkpoints and experiment files are stored in /tmp/audiocraft_$USER/outputs. To customize where your Dora output and experiment folder are, as well as to run on a SLURM cluster, define a config file with the following structure:
# File name: my_config.yaml
default:
  dora_dir: [DORA PATH]
  partitions:
    global: your_slurm_partitions
    team: your_slurm_partitions
  reference_dir: /tmp
darwin:  # if we detect we are on a Mac, then most likely we are doing unit testing etc.
  dora_dir: [YOUR PATH]
  partitions:
    global: your_slurm_partitions
    team: your_slurm_partitions
  reference_dir: [REFERENCE PATH]
where partitions indicates the SLURM partitions you are entitled to run your jobs on. Then re-run the dora run command with the custom config:
AUDIOCRAFT_CONFIG=my_config.yaml dora run solver=watermark/robustness dset=audio/voxpopuli
If successful, the checkpoints will be stored in an experiment folder in your Dora dir, i.e. [DORA_PATH]/xps/[HASH-ID]/checkpoint_XXX.th, where HASH-ID is the ID of the experiment shown in the output log when running dora run. You can evaluate your checkpoints with different settings for nbits and choose the ones with the lowest losses:
AUDIOCRAFT_CONFIG=my_config.yaml dora run solver=watermark/robustness execute_only=evaluate dset=audio/voxpopuli continue_from=[PATH_TO_THE_CHECKPOINT_FILE] +dummy_watermarker.nbits=16 seanet.detector.output_dim=32
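To locate the checkpoint files for the conversion step below, you can list everything produced under your Dora dir. A minimal sketch based on the xps/[HASH-ID]/checkpoint_XXX.th layout described above; replace the placeholder path with the dora_dir from my_config.yaml:
# List candidate checkpoints under the Dora experiment folders.
from pathlib import Path

dora_dir = Path("[DORA PATH]")  # the dora_dir set in my_config.yaml
for ckpt in sorted(dora_dir.glob("xps/*/checkpoint*.th")):
    print(ckpt.parent.name, ckpt.name)  # HASH-ID of the experiment and the checkpoint file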
The checkpoint contains the jointly-trained generator and detector, so it cannot be used directly with the AudioSeal API. To extract the generator and detector, run the conversion script in the AudioSeal code base, "src/scripts/checkpoints.py":
python [AudioSeal path]/src/scripts/checkpoints.py --checkpoint=[PATH TO CHECKPOINT] --outdir=[OUTPUT_DIR] --suffix=[name of the new model]
After this step, there will be two checkpoint files named generator_[suffix].pth and detector_[suffix].pth in the output directory [OUTPUT_DIR]. You can use these new checkpoints directly with the AudioSeal API, for instance:
from audioseal import AudioSeal

model = AudioSeal.load_generator("[OUTPUT_DIR]/generator_[suffix].pth", nbits=16)
watermark = model.get_watermark(wav, sr)
watermarked_audio = wav + watermark  # the watermark is additive
detector = AudioSeal.load_detector("[OUTPUT_DIR]/detector_[suffix].pth", nbits=16)
result, message = detector(watermarked_audio, sr)
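If the AudioSeal version you installed exposes the convenience method detect_watermark, you can also get a single detection probability for the whole clip; this is a sketch, so verify the method against the API of your installed version:
# Hedged example: detect_watermark returns an overall detection probability
# and the decoded message; check that your AudioSeal version provides it.
prob, decoded = detector.detect_watermark(watermarked_audio, sr)
print(f"watermark probability: {prob:.3f}")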
We also provide the hyperparameters and training config (in Dora terms, a "grid") to reproduce our checkpoints for AudioSeal on HuggingFace (which is also the model used to produce the results reported in the ICML paper). To get this, check AudioCraft's watermarking grid. To reproduce the result, run the dora grid command:
AUDIOCRAFT_CONFIG=my_config.yaml AUDIOCRAFT_DSET=audio/voxpopuli dora grid watermarking.1315_kbits_seeds
- If you encounter the error Unsupported formats on Linux, ffmpeg is not properly installed or is superseded by other backends on your system. Try to instruct Dora to use the libraries installed in your environment explicitly, i.e. by adding them to LD_LIBRARY_PATH. If you use Anaconda, you can try:
LD_LIBRARY_PATH=$CONDA_PREFIX/lib:$LD_LIBRARY_PATH AUDIOCRAFT_CONFIG=my_config.yaml [dora run/grid command]