This document is a work in progress.
If you're looking for some functionality in particular, it's a good idea to take a look at the source code. Core functionality is mostly in pydub/audio_segment.py
– a number of AudioSegment
methods are in the pydub/effects.py
module, and added to AudioSegment
via the effect registration process (the register_pydub_effect()
decorator function)
Currently Undocumented:
- Playback (
pydub.playback
) - Signal Processing (compression, EQ, normalize, speed change -
pydub.effects
,pydub.scipy_effects
) - Signal generators (Sine, Square, Sawtooth, Whitenoise, etc -
pydub.generators
) - Effect registration system (basically the
pydub.utils.register_pydub_effect
decorator)
AudioSegment
objects are immutable, and support a number of operators.
from pydub import AudioSegment
sound1 = AudioSegment.from_file("/path/to/sound.wav", format="wav")
sound2 = AudioSegment.from_file("/path/to/another_sound.wav", format="wav")
# sound1 6 dB louder, then 3.5 dB quieter
louder = sound1 + 6
quieter = sound1 - 3.5
# sound1, with sound2 appended
combined = sound1 + sound2
# sound1 repeated 3 times
repeated = sound1 * 3
# duration
duration_in_milliseconds = len(sound1)
# first 5 seconds of sound1
beginning = sound1[:5000]
# last 5 seconds of sound1
end = sound1[-5000:]
# split sound1 in 5-second slices
slices = sound1[::5000]
# Advanced usage, if you have raw audio data:
sound = AudioSegment(
# raw audio data (bytes)
data=b'…',
# 2 byte (16 bit) samples
sample_width=2,
# 44.1 kHz frame rate
frame_rate=44100,
# stereo
channels=2
)
Any operations that combine multiple AudioSegment
objects in any way will first ensure that they have the same number of channels, frame rate, sample rate, bit depth, etc. When these things do not match, the lower quality sound is modified to match the quality of the higher quality sound so that quality is not lost: mono is converted to stereo, bit depth and frame rate/sample rate are increased as needed. If you do not want this behavior, you may explicitly reduce the number of channels, bits, etc using the appropriate AudioSegment
methods.
Open an audio file as an AudioSegment
instance and return it. there are also a number of wrappers provided for convenience, but you should probably just use this directly.
from pydub import AudioSegment
# wave and raw don’t use ffmpeg
wav_audio = AudioSegment.from_file("/path/to/sound.wav", format="wav")
raw_audio = AudioSegment.from_file("/path/to/sound.raw", format="raw",
frame_rate=44100, channels=2, sample_width=2)
# all other formats use ffmpeg
mp3_audio = AudioSegment.from_file("/path/to/sound.mp3", format="mp3")
# use a file you've already opened (advanced …ish)
with open("/path/to/sound.wav", "rb") as wav_file:
audio_segment = AudioSegment.from_file(wav_file, format="wav")
# also supports the os.PathLike protocol for python >= 3.6
from pathlib import Path
wav_path = Path("path/to/sound.wav")
wav_audio = AudioSegment.from_file(wav_path)
The first argument is the path (as a string) of the file to read, or a file handle to read from.
Supported keyword arguments:
format
| example:"aif"
| default: autodetected Format of the output file. Supports"wav"
and"raw"
natively, requires ffmpeg for all other formats."raw"
files require 3 additional keyword arguments,sample_width
,frame_rate
, andchannels
, denoted below with:raw
only. This extra info is required because raw audio files do not have headers to include this info in the file itself like wav files do.sample_width
| example:2
raw
only — Use1
for 8-bit audio2
for 16-bit (CD quality) and4
for 32-bit. It’s the number of bytes per sample.channels
| example:1
raw
only —1
for mono,2
for stereo.frame_rate
| example:2
raw
only — Also known as sample rate, common values are44100
(44.1kHz - CD audio), and48000
(48kHz - DVD audio)start_second
| example:2.0
| default:None
Offset (in seconds) to start loading the audio file. IfNone
, the audio will start loading from the beginning.duration
| example:2.5
| default:None
Number of seconds to be loaded. IfNone
, full audio will be loaded.
Write the AudioSegment
object to a file – returns a file handle of the output file (you don't have to do anything with it, though).
from pydub import AudioSegment
sound = AudioSegment.from_file("/path/to/sound.wav", format="wav")
# simple export
file_handle = sound.export("/path/to/output.mp3", format="mp3")
# more complex export
file_handle = sound.export("/path/to/output.mp3",
format="mp3",
bitrate="192k",
tags={"album": "The Bends", "artist": "Radiohead"},
cover="/path/to/albumcovers/radioheadthebends.jpg")
# split sound in 5-second slices and export
for i, chunk in enumerate(sound[::5000]):
with open("sound-%s.mp3" % i, "wb") as f:
chunk.export(f, format="mp3")
The first argument is the location (as a string) to write the output, or a file handle to write to. If you do not pass an output file or path, a temporary file is generated.
Supported keyword arguments:
format
| example:"aif"
| default:"mp3"
Format of the output file. Supports"wav"
and"raw"
natively, requires ffmpeg for all other formats.codec
| example:"libvorbis"
For formats that may contain content encoded with different codecs, you can specify the codec you'd like the encoder to use. For example, the "ogg" format is often used with the "libvorbis" codec. (requires ffmpeg)bitrate
| example:"128k"
For compressed formats, you can pass the bitrate you'd like the encoder to use (requires ffmpeg). Each codec accepts different bitrate arguments so take a look at the ffmpeg documentation for details (bitrate usually shown as-b
,-ba
or-a:b
).tags
| example:{"album": "1989", "artist": "Taylor Swift"}
Allows you to supply media info tags for the encoder (requires ffmpeg). Not all formats can receive tags (mp3 can).parameters
| example:["-ac", "2"]
Pass additional command line parameters to the ffmpeg call. These are added to the end of the call (in the output file section).id3v2_version
| example:"3"
| default:"4"
Set the ID3v2 version used by ffmpeg to add tags to the output file. If you want Windows Exlorer to display tags, use"3"
here (source).cover
| example:"/path/to/imgfile.png"
Allows you to supply a cover image (path to the image file). Currently, only MP3 files allow this keyword argument. Cover image must be a jpeg, png, bmp, or tiff file.
Creates a zero-duration AudioSegment
.
from pydub import AudioSegment
empty = AudioSegment.empty()
len(empty) == 0
This is useful for aggregation loops:
from pydub import AudioSegment
sounds = [
AudioSegment.from_wav("sound1.wav"),
AudioSegment.from_wav("sound2.wav"),
AudioSegment.from_wav("sound3.wav"),
]
playlist = AudioSegment.empty()
for sound in sounds:
playlist += sound
Creates a silent audiosegment, which can be used as a placeholder, spacer, or as a canvas to overlay other sounds on top of.
from pydub import AudioSegment
ten_second_silence = AudioSegment.silent(duration=10000)
Supported keyword arguments:
duration
| example:3000
| default:1000
(1 second) Length of the silentAudioSegment
, in millisecondsframe_rate
| example44100
| default:11025
(11.025 kHz) Frame rate (i.e., sample rate) of the silentAudioSegment
in Hz
Creates a multi-channel audiosegment out of multiple mono audiosegments (two or more). Each mono audiosegment passed in should be exactly the same length, down to the frame count.
from pydub import AudioSegment
left_channel = AudioSegment.from_wav("sound1.wav")
right_channel = AudioSegment.from_wav("sound1.wav")
stereo_sound = AudioSegment.from_mono_audiosegments(left_channel, right_channel)
Returns the loudness of the AudioSegment
in dBFS (db relative to the maximum possible loudness). A Square wave at maximum amplitude will be roughly 0 dBFS (maximum loudness), whereas a Sine Wave at maximum amplitude will be roughly -3 dBFS.
from pydub import AudioSegment
sound = AudioSegment.from_file("sound1.wav")
loudness = sound.dBFS
Number of channels in this audio segment (1 means mono, 2 means stereo)
from pydub import AudioSegment
sound = AudioSegment.from_file("sound1.wav")
channel_count = sound.channels
Number of bytes in each sample (1 means 8 bit, 2 means 16 bit, etc). CD Audio is 16 bit, (sample width of 2 bytes).
from pydub import AudioSegment
sound = AudioSegment.from_file("sound1.wav")
bytes_per_sample = sound.sample_width
CD Audio has a 44.1kHz sample rate, which means frame_rate
will be 44100
(same as sample rate, see frame_width
). Common values are 44100
(CD), 48000
(DVD), 22050
, 24000
, 12000
and 11025
.
from pydub import AudioSegment
sound = AudioSegment.from_file("sound1.wav")
frames_per_second = sound.frame_rate
Number of bytes for each "frame". A frame contains a sample for each channel (so for stereo you have 2 samples per frame, which are played simultaneously). frame_width
is equal to channels * sample_width
. For CD Audio it'll be 4
(2 channels times 2 bytes per sample).
from pydub import AudioSegment
sound = AudioSegment.from_file("sound1.wav")
bytes_per_frame = sound.frame_width
A measure of loudness. Used to compute dBFS, which is what you should use in most cases. Loudness is logarithmic (rms is not), which makes dB a much more natural scale.
from pydub import AudioSegment
sound = AudioSegment.from_file("sound1.wav")
loudness = sound.rms
The highest amplitude of any sample in the AudioSegment
. Useful for things like normalization (which is provided in pydub.effects.normalize
).
from pydub import AudioSegment
sound = AudioSegment.from_file("sound1.wav")
peak_amplitude = sound.max
The highest amplitude of any sample in the AudioSegment
, in dBFS (relative to the highest possible amplitude value). Useful for things like normalization (which is provided in pydub.effects.normalize
).
from pydub import AudioSegment
sound = AudioSegment.from_file("sound1.wav")
normalized_sound = sound.apply_gain(-sound.max_dBFS)
Returns the duration of the AudioSegment
in seconds (len(sound)
returns milliseconds). This is provided for convenience; it calls len()
internally.
from pydub import AudioSegment
sound = AudioSegment.from_file("sound1.wav")
assert sound.duration_seconds == (len(sound) / 1000.0)
The raw audio data of the AudioSegment. Useful for interacting with other audio libraries or weird APIs that want audio data in the form of a bytestring. Also comes in handy if you’re implementing effects or other direct signal processing.
You probably don’t need this, but if you do… you’ll know.
from pydub import AudioSegment
sound = AudioSegment.from_file("sound1.wav")
raw_audio_data = sound.raw_data
Returns the number of frames in the AudioSegment
. Optionally you may pass in a ms
keywork argument to retrieve the number of frames in that number of milliseconds of audio in the AudioSegment
(useful for slicing, etc).
from pydub import AudioSegment
sound = AudioSegment.from_file("sound1.wav")
number_of_frames_in_sound = sound.frame_count()
number_of_frames_in_200ms_of_sound = sound.frame_count(ms=200)
Supported keyword arguments:
ms
| example:3000
| default:None
(entire duration ofAudioSegment
) When specified, method returns number of frames in X milliseconds of theAudioSegment
Returns a new AudioSegment
, created by appending another AudioSegment
to this one (i.e., adding it to the end), Optionally using a crossfade. AudioSegment(…).append()
is used internally when adding AudioSegment
objects together with the +
operator.
By default a 100ms (0.1 second) crossfade is used to eliminate pops and crackles.
from pydub import AudioSegment
sound1 = AudioSegment.from_file("sound1.wav")
sound2 = AudioSegment.from_file("sound2.wav")
# default 100 ms crossfade
combined = sound1.append(sound2)
# 5000 ms crossfade
combined_with_5_sec_crossfade = sound1.append(sound2, crossfade=5000)
# no crossfade
no_crossfade1 = sound1.append(sound2, crossfade=0)
# no crossfade
no_crossfade2 = sound1 + sound2
Supported keyword arguments:
crossfade
| example:3000
| default:100
(entire duration ofAudioSegment
) When specified, method returns number of frames in X milliseconds of theAudioSegment
Overlays an AudioSegment
onto this one. In the resulting AudioSegment
they will play simultaneously. If the overlaid AudioSegment
is longer than this one, the result will be truncated (so the end of the overlaid sound will be cut off). The result is always the same length as this AudioSegment
even when using the loop
, and times
keyword arguments.
Since AudioSegment
objects are immutable, you can get around this by overlaying the shorter sound on the longer one, or by creating a silent AudioSegment
with the appropriate duration, and overlaying both sounds on to that one.
from pydub import AudioSegment
sound1 = AudioSegment.from_file("sound1.wav")
sound2 = AudioSegment.from_file("sound2.wav")
played_togther = sound1.overlay(sound2)
sound2_starts_after_delay = sound1.overlay(sound2, position=5000)
volume_of_sound1_reduced_during_overlay = sound1.overlay(sound2, gain_during_overlay=-8)
sound2_repeats_until_sound1_ends = sound1.overlay(sound2, loop=true)
sound2_plays_twice = sound1.overlay(sound2, times=2)
# assume sound1 is 30 sec long and sound2 is 5 sec long:
sound2_plays_a_lot = sound1.overlay(sound2, times=10000)
len(sound1) == len(sound2_plays_a_lot)
Supported keyword arguments:
position
| example:3000
| default:0
(beginning of thisAudioSegment
) The overlaidAudioSegment
will not begin until X milliseconds have passedloop
| example:True
| default:False
(entire duration ofAudioSegment
) The overlaidAudioSegment
will repeat (starting atposition
) until the end of thisAudioSegment
times
| example:4
| default:1
(entire duration ofAudioSegment
) The overlaidAudioSegment
will repeat X times (starting atposition
) but will still be truncated to the length of thisAudioSegment
gain_during_overlay
| example:-6.0
| default:0
(no change in volume during overlay) Change the original audio by this many dB while overlaying audio. This can be used to make the original audio quieter while the overlaid audio plays.
Change the amplitude (generally, loudness) of the AudioSegment
. Gain is specified in dB. This method is used internally by the +
operator.
from pydub import AudioSegment
sound1 = AudioSegment.from_file("sound1.wav")
# make sound1 louder by 3.5 dB
louder_via_method = sound1.apply_gain(+3.5)
louder_via_operator = sound1 + 3.5
# make sound1 quieter by 5.7 dB
quieter_via_method = sound1.apply_gain(-5.7)
quieter_via_operator = sound1 - 5.7
A more general (more flexible) fade method. You may specify start
and end
, or one of the two along with duration (e.g., start
and duration
).
from pydub import AudioSegment
sound1 = AudioSegment.from_file("sound1.wav")
fade_louder_for_3_seconds_in_middle = sound1.fade(to_gain=+6.0, start=7500, duration=3000)
fade_quieter_beteen_2_and_3_seconds = sound1.fade(to_gain=-3.5, start=2000, end=3000)
# easy way is to use the .fade_in() convenience method. note: -120dB is basically silent.
fade_in_the_hard_way = sound1.fade(from_gain=-120.0, start=0, duration=5000)
fade_out_the_hard_way = sound1.fade(to_gain=-120.0, end=0, duration=5000)
Supported keyword arguments:
to_gain
| example:-3.0
| default:0
(0dB, no change) Resulting change at the end of the fade.-6.0
means fade will be be from 0dB (no change) to -6dB, and everything after the fade will be -6dB.from_gain
| example:-3.0
| default:0
(0dB, no change) Change at the beginning of the fade.-6.0
means fade (and all audio before it) will be be at -6dB will fade up to 0dB – the rest of the audio after the fade will be at 0dB (i.e., unchanged).start
| example:7500
| NO DEFAULT Position to begin fading (in milliseconds).5500
means fade will begin after 5.5 seconds.end
| example:4
| NO DEFAULT The overlaidAudioSegment
will repeat X times (starting atposition
) but will still be truncated to the length of thisAudioSegment
duration
| example:4
| NO DEFAULT You can usestart
orend
with duration, instead of specifying both - provided as a convenience.
Fade out (to silent) the end of this AudioSegment
. Uses .fade()
internally.
Supported keyword arguments:
duration
| example:5000
| NO DEFAULT How long (in milliseconds) the fade should last. Passed directly to.fade()
internally
Fade in (from silent) the beginning of this AudioSegment
. Uses .fade()
internally.
Supported keyword arguments:
duration
| example:5000
| NO DEFAULT How long (in milliseconds) the fade should last. Passed directly to.fade()
internally
Make a copy of this AudioSegment
that plays backwards. Useful for Pink Floyd, screwing around, and some audio processing algorithms.
Creates an equivalent version of this AudioSegment
with the specified sample width (in bytes). Increasing this value does not generally cause a reduction in quality. Reducing it definitely does cause a loss in quality. Higher Sample width means more dynamic range.
Creates an equivalent version of this AudioSegment
with the specified frame rate (in Hz). Increasing this value does not generally cause a reduction in quality. Reducing it definitely does cause a loss in quality. Higher frame rate means larger frequency response (higher frequencies can be represented).
Creates an equivalent version of this AudioSegment
with the specified number of channels (1 is Mono, 2 is Stereo). Converting from mono to stereo does not cause any audible change. Converting from stereo to mono may result in loss of quality (but only if the left and right chanels differ).
Splits a stereo AudioSegment
into two, one for each channel (Left/Right). Returns a list with the new AudioSegment
objects with the left channel at index 0 and the right channel at index 1.
from pydub import AudioSegment
sound1 = AudioSegment.from_file("sound1.wav")
# make left channel 6dB quieter and right channe 2dB louder
stereo_balance_adjusted = sound1.apply_gain_stereo(-6, +2)
Apply gain to the left and right channel of a stereo AudioSegment
. If the AudioSegment
is mono, it will be converted to stereo before applying the gain.
Both gain arguments are specified in dB.
from pydub import AudioSegment
sound1 = AudioSegment.from_file("sound1.wav")
# pan the sound 15% to the right
panned_right = sound1.pan(+0.15)
# pan the sound 50% to the left
panned_left = sound1.pan(-0.50)
Takes one positional argument, pan amount, which should be between -1.0 (100% left) and +1.0 (100% right)
When pan_amount == 0.0 the left/right balance is not changed.
Panning does not alter the perceived loundness, but since loudness is decreasing on one side, the other side needs to get louder to compensate. When panned hard left, the left channel will be 3dB louder and the right channel will be silent (and vice versa).
Returns the raw audio data as an array of (numeric) samples. Note: if the audio has multiple channels, the samples for each channel will be serialized – for example, stereo audio would look like [sample_1_L, sample_1_R, sample_2_L, sample_2_R, …]
.
This method is mainly for use in implementing effects, and other processing.
from pydub import AudioSegment
sound = AudioSegment.from_file(“sound1.wav”)
samples = sound.get_array_of_samples()
# then modify samples...
new_sound = sound._spawn(samples)
note that when using numpy or scipy you will need to convert back to an array before you spawn:
import array
import numpy as np
from pydub import AudioSegment
sound = AudioSegment.from_file(“sound1.wav”)
samples = sound.get_array_of_samples()
# Example operation on audio data
shifted_samples = np.right_shift(samples, 1)
# now you have to convert back to an array.array
shifted_samples_array = array.array(sound.array_type, shifted_samples)
new_sound = sound._spawn(shifted_samples_array)
Here's how to convert to a numpy float32 array:
import numpy as np
from pydub import AudioSegment
sound = AudioSegment.from_file("sound1.wav")
sound = sound.set_frame_rate(16000)
channel_sounds = sound.split_to_mono()
samples = [s.get_array_of_samples() for s in channel_sounds]
fp_arr = np.array(samples).T.astype(np.float32)
fp_arr /= np.iinfo(samples[0].typecode).max
And how to convert it back to an AudioSegment:
import io
import scipy.io.wavfile
wav_io = io.BytesIO()
scipy.io.wavfile.write(wav_io, 16000, fp_arr)
wav_io.seek(0)
sound = pydub.AudioSegment.from_wav(wav_io)
Returns a value between -1.0 and 1.0 representing the DC offset of a channel. This is calculated using audioop.avg()
and normalizing the result by samples max value.
Supported keyword arguments:
channel
| example:2
| default:1
Selects left (1) or right (2) channel to calculate DC offset. If segment is mono, this value is ignored.
Removes DC offset from channel(s). This is done by using audioop.bias()
, so watch out for overflows.
Supported keyword arguments:
-
channel
| example:2
| default: None Selects left (1) or right (2) channel remove DC offset. If value if None, removes from all available channels. If segment is mono, this value is ignored. -
offset
| example:-0.1
| default: None Offset to be removed from channel(s). Calculates offset if it's None. Offset values must be between -1.0 and 1.0.
Collection of DSP effects that are implemented by AudioSegment
objects.
Make a copy of this AudioSegment
and inverts the phase of the signal. Can generate anti-phase waves for noise suppression or cancellation.
Various functions for finding/manipulating silence in AudioSegments. For creating silent AudioSegments, see AudioSegment.silent().
Returns a list of all silent sections [start, end] in milliseconds of audio_segment. Inverse of detect_nonsilent(). Can be very slow since it has to iterate over the whole segment.
from pydub import AudioSegment, silence
print(silence.detect_silence(AudioSegment.silent(2000)))
# [[0, 2000]]
Supported keyword arguments:
-
min_silence_len
| example:500
| default: 1000 The minimum length for silent sections in milliseconds. If it is greater than the length of the audio segment an empty list will be returned. -
silence_thresh
| example:-20
| default: -16 The upper bound for how quiet is silent in dBFS. -
seek_step
| example:5
| default: 1 Size of the step for checking for silence in milliseconds. Smaller is more precise. Must be a positive whole number.
Returns a list of all silent sections [start, end] in milliseconds of audio_segment. Inverse of detect_silence() and has all the same arguments. Can be very slow since it has to iterate over the whole segment.
Supported keyword arguments:
-
min_silence_len
| example:500
| default: 1000 The minimum length for silent sections in milliseconds. If it is greater than the length of the audio segment an empty list will be returned. -
silence_thresh
| example:-20
| default: -16 The upper bound for how quiet is silent in dBFS. -
seek_step
| example:5
| default: 1 Size of the step for checking for silence in milliseconds. Smaller is more precise. Must be a positive whole number.
Returns list of audio segments from splitting audio_segment on silent sections.
Supported keyword arguments:
-
min_silence_len
| example:500
| default: 1000 The minimum length for silent sections in milliseconds. If it is greater than the length of the audio segment an empty list will be returned. -
silence_thresh
| example:-20
| default: -16 The upper bound for how quiet is silent in dBFS. -
seek_step
| example:5
| default: 1 Size of the step for checking for silence in milliseconds. Smaller is more precise. Must be a positive whole number. -
keep_silence
~ example: True | default: 100 How much silence to keep in ms or a bool. leave some silence at the beginning and end of the chunks. Keeps the sound from sounding like it is abruptly cut off. When the length of the silence is less than the keep_silence duration it is split evenly between the preceding and following non-silent segments. If True is specified, all the silence is kept, if False none is kept.
Returns the millisecond/index that the leading silence ends. If there is no end it will return the length of the audio_segment.
from pydub import AudioSegment, silence
print(silence.detect_silence(AudioSegment.silent(2000)))
# 2000
Supported keyword arguments:
-
silence_thresh
| example:-20
| default: -50 The upper bound for how quiet is silent in dBFS. -
chunk_size
| example:5
| default: 10 Size of the step for checking for silence in milliseconds. Smaller is more precise. Must be a positive whole number.