(Largely based on rbracco's tutorial; big thanks to him for his work getting this going for us!)
fastai's audio module has been in development for a while, driven by active forum members.
from fastai.vision.all import *
from fastaudio.core.all import *
from fastaudio.augment.all import *
tar_extract_at_filename simply extracts the archive into a folder named after the file (as the name suggests):
path_dig = untar_data(URLs.SPEAKERS10, extract_func=tar_extract_at_filename)
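As a rough illustration of the idea, here is a stdlib-only sketch of an "extract at filename" helper: it opens a tar archive and extracts it into a folder named after the archive, minus its suffix. The name extract_at_filename_sketch and its (fname, dest) signature are hypothetical; this is not fastaudio's actual source.

```python
import tarfile
from pathlib import Path


def extract_at_filename_sketch(fname, dest):
    """Hypothetical helper: extract the archive `fname` into `dest/<archive stem>`."""
    fname = Path(fname)
    # Strip the archive suffix (e.g. ".tar.gz" or ".tgz") to get the folder name
    name = fname.name
    for suffix in (".tar.gz", ".tgz", ".tar"):
        if name.endswith(suffix):
            name = name[: -len(suffix)]
            break
    target = Path(dest) / name
    target.mkdir(parents=True, exist_ok=True)
    # "r:*" lets tarfile detect the compression automatically
    with tarfile.open(fname, "r:*") as tar:
        tar.extractall(target)
    return target
```

Extracting into a per-archive folder keeps several downloaded datasets from spilling their files into the same directory.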
Now we want to grab just the audio files.
audio_extensions[:5]
fnames = get_files(path_dig, extensions=audio_extensions)
fnames[:5]
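To make the extension filter concrete, here is a hypothetical stdlib sketch of what filtering by extension amounts to: walk a folder recursively and keep only files whose suffix is in a given set. find_files and the extension set used with it are illustrative assumptions, not fastai's get_files implementation or the real contents of audio_extensions.

```python
from pathlib import Path


def find_files(path, extensions):
    """Hypothetical helper: recursively list files under `path` whose suffix is in `extensions`."""
    # Compare lowercased suffixes so ".WAV" and ".wav" both match
    exts = {e.lower() for e in extensions}
    return sorted(
        p for p in Path(path).rglob("*") if p.is_file() and p.suffix.lower() in exts
    )
```

For example, find_files(some_folder, {".wav", ".mp3"}) would skip any .txt metadata files sitting next to the recordings.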
We can convert any audio file to a tensor with AudioTensor. Let's try opening a file:
at = AudioTensor.create(fnames[0])
at, at.shape
at.show()
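AudioTensor.create handles decoding for us. As a rough, library-free illustration of what "audio as a tensor" means, the sketch below reads a 16-bit PCM WAV file with Python's wave module and returns one list of samples per channel, mirroring the (channels, samples) layout of the shape printed above. read_wav_sketch is hypothetical, handles only 16-bit PCM, and is not how fastaudio loads files.

```python
import array
import sys
import wave


def read_wav_sketch(fname):
    """Hypothetical loader: read a 16-bit PCM WAV into per-channel lists of floats in [-1, 1)."""
    with wave.open(str(fname), "rb") as w:
        if w.getsampwidth() != 2:
            raise ValueError("this sketch only handles 16-bit PCM")
        n_channels = w.getnchannels()
        sample_rate = w.getframerate()
        raw = w.readframes(w.getnframes())
    samples = array.array("h", raw)
    # WAV data is little-endian; swap bytes on big-endian machines
    if sys.byteorder == "big":
        samples.byteswap()
    # Frames are interleaved (L, R, L, R, ...); split them into one list per channel
    channels = [[s / 32768 for s in samples[c::n_channels]] for c in range(n_channels)]
    return channels, sample_rate
```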
cfg = AudioConfig.Voice()
Our configuration fixes settings such as the frequency range and the sampling rate:
cfg.f_max, cfg.sample_rate
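A configuration object keeps related settings in one place so they can be checked together. Here is a minimal sketch, assuming made-up field names and placeholder defaults (these are not AudioConfig.Voice's actual values): the check encodes the fact that a signal sampled at sample_rate carries no information above sample_rate / 2 (the Nyquist frequency), so a frequency ceiling above that would be meaningless.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SpecConfigSketch:
    """Hypothetical config bundling spectrogram settings (field names and defaults are assumptions)."""

    sample_rate: int = 16000
    n_fft: int = 1024
    f_max: float = 8000.0

    def __post_init__(self):
        # Frequencies above sample_rate / 2 (Nyquist) cannot be represented in the signal
        if self.f_max > self.sample_rate / 2:
            raise ValueError("f_max cannot exceed the Nyquist frequency")

    @property
    def bin_width_hz(self):
        # Spacing in Hz between adjacent FFT frequency bins
        return self.sample_rate / self.n_fft
```

With these placeholder defaults, each FFT bin spans 16000 / 1024 = 15.625 Hz.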
We can then build a transform from this configuration that turns raw audio into a spectrogram using these settings:
aud2spec = AudioToSpec.from_cfg(cfg)
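AudioToSpec does the heavy lifting here. To show what a spectrogram computes, below is a deliberately naive, pure-Python sketch of a short-time Fourier transform: slide a Hann-windowed frame of n_fft samples along the signal in steps of hop_length and take the magnitude of each frame's discrete Fourier transform. magnitude_spectrogram and its defaults are illustrative assumptions; it omits the mel scaling, frequency cutoffs, and decibel conversion that a configured transform like aud2spec may also apply.

```python
import cmath
import math


def magnitude_spectrogram(signal, n_fft=64, hop_length=32):
    """Sketch of an STFT magnitude spectrogram.

    Returns a list of frames; each frame holds n_fft // 2 + 1 magnitudes (0 Hz up to Nyquist).
    """
    # Periodic Hann window tapers each frame to reduce spectral leakage
    window = [0.5 - 0.5 * math.cos(2 * math.pi * i / n_fft) for i in range(n_fft)]
    frames = []
    for start in range(0, len(signal) - n_fft + 1, hop_length):
        chunk = [signal[start + i] * window[i] for i in range(n_fft)]
        # Naive DFT, O(n_fft^2) per frame: fine for a sketch, far slower than a real FFT
        frame = [
            abs(sum(chunk[n] * cmath.exp(-2j * math.pi * k * n / n_fft) for n in range(n_fft)))
            for k in range(n_fft // 2 + 1)
        ]
        frames.append(frame)
    return frames
```

For example, with a 1024 Hz sampling rate and n_fft=64, each bin spans 16 Hz, so a pure 128 Hz sine peaks in bin 8 of every frame.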
For our example, we'll resize each signal to 1000 ms, cropping longer clips (and padding shorter ones):
crop1s = ResizeSignal(1000)
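ResizeSignal takes its duration in milliseconds, so a transform like this has to turn that into a sample count using the signal's sampling rate. The hypothetical sketch below crops from the start and zero-pads at the end; fastaudio's transform may crop and pad differently, so treat this as an illustration of the arithmetic only.

```python
def resize_signal_sketch(samples, sample_rate, duration_ms):
    """Hypothetical crop/pad: force `samples` to exactly duration_ms of audio."""
    # Convert the target duration from milliseconds to a number of samples
    target = int(sample_rate * duration_ms / 1000)
    if len(samples) >= target:
        # Too long: keep only the first `target` samples
        return list(samples[:target])
    # Too short: pad the end with silence (zeros)
    return list(samples) + [0.0] * (target - len(samples))
```

For example, at a 16 kHz sampling rate, 1000 ms corresponds to 16,000 samples.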
Let's build a Pipeline describing how we'd expect our data to come in:
pipe = Pipeline([AudioTensor.create, crop1s, aud2spec])
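A Pipeline applies its transforms in order, each one receiving the previous one's output: filename to AudioTensor, then a resized signal, then a spectrogram. The core idea can be sketched as plain function composition. compose is a hypothetical helper; fastcore's Pipeline does more (for example, ordering transforms and supporting decoding), which this ignores.

```python
from functools import reduce


def compose(*funcs):
    """Hypothetical stand-in for a pipeline: apply funcs left to right."""

    def pipeline(x):
        # Feed the output of each step into the next one
        return reduce(lambda acc, f: f(acc), funcs, x)

    return pipeline
```

With no steps, compose() simply returns its input unchanged.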
Now let's visualize what our data becomes.
First, we'll remove the cropping:
pipe = Pipeline([AudioTensor.create, aud2spec])
for fn in fnames[:3]:
    audio = AudioTensor.create(fn)
    audio.show()
    pipe(fn).show()