8.8 KiB
Using Preprocessors in BatDetect2
Overview
In the previous sections ({doc}audio
and {doc}spectrogram
), we discussed the individual steps involved in converting raw audio into a processed spectrogram suitable for BatDetect2 models, and how to configure these steps using YAML files (specifically the audio:
and spectrogram:
sections within a main preprocessing:
configuration block).
This page focuses on how this configured pipeline is represented and used within BatDetect2, primarily through the concept of a Preprocessor
object.
This object bundles together your chosen audio loading settings and spectrogram generation settings into a single component that can perform the end-to-end processing.
Do I Need to Interact with Preprocessors Directly?
Usually, no. For standard model training or running inference with BatDetect2 using the provided scripts, the system will automatically:
- Read your main configuration file (e.g.,
config.yaml
). - Find the
preprocessing:
section (containingaudio:
andspectrogram:
settings). - Build the appropriate
Preprocessor
object internally based on your settings. - Use that internal
Preprocessor
object automatically whenever audio needs to be loaded and converted to a spectrogram.
However, understanding the Preprocessor
object is useful if you want to:
- Customize: Go beyond the standard configuration options by interacting with parts of the pipeline programmatically.
- Integrate: Use BatDetect2's preprocessing steps within your own custom Python analysis scripts.
- Inspect/Debug: Manually apply preprocessing steps to specific files or clips to examine intermediate outputs (like the processed waveform) or the final spectrogram.
Getting a Preprocessor Object
If you do want to work with the preprocessor programmatically, you first need to get an instance of it. This is typically done based on a configuration:
-
Define Configuration: Create your
preprocessing:
configuration, usually in a YAML file (let's call itpreprocess_config.yaml
), detailing your desiredaudio
andspectrogram
settings.# preprocess_config.yaml audio: resample: samplerate: 256000 # ... other audio settings ... spectrogram: frequencies: min_freq: 15000 max_freq: 120000 scale: dB # ... other spectrogram settings ...
-
Load Configuration & Build Preprocessor (in Python):
from batdetect2.preprocessing import load_preprocessing_config, build_preprocessor from batdetect2.preprocess.types import Preprocessor # Import the type # Load the configuration from the file config_path = "path/to/your/preprocess_config.yaml" preprocessing_config = load_preprocessing_config(config_path) # Build the actual preprocessor object using the loaded config preprocessor: Preprocessor = build_preprocessor(preprocessing_config) # 'preprocessor' is now ready to use!
-
Using Defaults: If you just want the standard BatDetect2 default preprocessing settings, you can build one without loading a config file:
from batdetect2.preprocessing import build_preprocessor from batdetect2.preprocess.types import Preprocessor # Build with default settings default_preprocessor: Preprocessor = build_preprocessor()
Applying Preprocessing
Once you have a preprocessor
object, you can use its methods to process audio data:
1. End-to-End Processing (Common Use Case):
These methods take an audio source identifier (file path, Recording object, or Clip object) and return the final, processed spectrogram.
preprocessor.preprocess_file(path)
: Processes an entire audio file.preprocessor.preprocess_recording(recording_obj)
: Processes the entire audio associated with asoundevent.data.Recording
object.preprocessor.preprocess_clip(clip_obj)
: Processes only the specific time segment defined by asoundevent.data.Clip
object.- Efficiency Note: Using
preprocess_clip
is highly recommended when you are only interested in analyzing a small portion of a potentially long recording. It avoids loading the entire audio file into memory, making it much more efficient.
- Efficiency Note: Using
from soundevent import data
# Assume 'preprocessor' is built as shown before
# Assume 'my_clip' is a soundevent.data.Clip object defining a segment
# Process an entire file
spectrogram_from_file = preprocessor.preprocess_file("my_recording.wav")
# Process only a specific clip (more efficient for segments)
spectrogram_from_clip = preprocessor.preprocess_clip(my_clip)
# The results (spectrogram_from_file, spectrogram_from_clip) are xr.DataArrays
print(type(spectrogram_from_clip))
# Output: <class 'xarray.core.dataarray.DataArray'>
2. Intermediate Steps (Advanced Use Cases):
The preprocessor also allows access to intermediate stages if needed:
preprocessor.load_clip_audio(clip_obj)
(and similar for file/recording): Loads the audio and applies only the waveform processing steps (resampling, centering, etc.) defined in theaudio
config. Returns the processed waveform as anxr.DataArray
. This is useful if you want to analyze or manipulate the waveform itself before spectrogram generation.preprocessor.compute_spectrogram(waveform)
: Takes an already loaded waveform (eithernp.ndarray
orxr.DataArray
) and applies only the spectrogram generation steps defined in thespectrogram
config.- If you provide an
xr.DataArray
(e.g., fromload_clip_audio
), it uses the sample rate from the array's coordinates. - If you provide a raw
np.ndarray
, it must assume a sample rate. It uses thedefault_samplerate
that was determined when thepreprocessor
was built (based on youraudio
config's resample settings or the global default). Be cautious when using NumPy arrays to ensure the sample rate assumption is correct for your data!
- If you provide an
# Example: Get waveform first, then spectrogram
waveform = preprocessor.load_clip_audio(my_clip)
# waveform is an xr.DataArray
# ...potentially do other things with the waveform...
# Compute spectrogram from the loaded waveform
spectrogram = preprocessor.compute_spectrogram(waveform)
# Example: Process external numpy array (use with caution re: sample rate)
# import soundfile as sf # Requires installing soundfile
# numpy_waveform, original_sr = sf.read("some_other_audio.wav")
# # MUST ensure numpy_waveform's actual sample rate matches
# # preprocessor.default_samplerate for correct results here!
# spec_from_numpy = preprocessor.compute_spectrogram(numpy_waveform)
Understanding the Output: xarray.DataArray
All preprocessing methods return the final spectrogram (or the intermediate waveform) as an xarray.DataArray
.
What is it? Think of it like a standard NumPy array (holding the numerical data of the spectrogram) but with added "superpowers":
- Labeled Dimensions: Instead of just having axis 0 and axis 1, the dimensions have names, typically
"frequency"
and"time"
. - Coordinates: It stores the actual frequency values (e.g., in Hz) corresponding to each row and the actual time values (e.g., in seconds) corresponding to each column along the dimensions.
Why is it used?
- Clarity: The data is self-describing. You don't need to remember which axis is time and which is frequency, or what the units are – it's stored with the data.
- Convenience: You can select, slice, or plot data using the real-world coordinate values (times, frequencies) instead of just numerical indices. This makes analysis code easier to write and less prone to errors.
- Metadata: It can hold additional metadata about the processing steps in its
attrs
(attributes) dictionary.
Using the Output:
-
Input to Model: For standard training or inference, you typically pass this
xr.DataArray
spectrogram directly to the BatDetect2 model functions. -
Inspection/Analysis: If you're working programmatically, you can use xarray's powerful features. For example (these are just illustrations of xarray):
# Get the shape (frequency_bins, time_bins) # print(spectrogram.shape) # Get the frequency coordinate values # print(spectrogram['frequency'].values) # Select data near a specific time and frequency # value_at_point = spectrogram.sel(time=0.5, frequency=50000, method="nearest") # print(value_at_point) # Select a time slice between 0.2 and 0.3 seconds # time_slice = spectrogram.sel(time=slice(0.2, 0.3)) # print(time_slice.shape)
In summary, while BatDetect2 often handles preprocessing automatically based on your configuration, the underlying Preprocessor
object provides a flexible interface for applying these steps programmatically if needed, returning results in the convenient and informative xarray.DataArray
format.