BatDetect2 Architecture Overview
This document provides a comprehensive map of the batdetect2 codebase architecture. It is intended to serve as a deep-dive reference for developers, agents, and contributors navigating the project.
batdetect2 is a modular deep learning pipeline for detecting and classifying bat echolocation calls in high-frequency audio recordings. It builds on PyTorch, uses PyTorch Lightning for training, and relies on the soundevent library for standardized audio and geometry data classes.
The repository follows a configuration-driven design: components are configured with pydantic/omegaconf (via BaseConfig) and wired together through the Factory and Registry patterns for dependency injection and modularity. The entire pipeline can be orchestrated via the high-level API BatDetect2API (src/batdetect2/api_v2.py).
1. Data Flow Pipeline
The standard lifecycle of a prediction request follows these sequential stages, each handled by an isolated, replaceable module:
- Audio Loading (batdetect2.audio): Read raw .wav files into standard NumPy arrays or soundevent.data.Clip objects. Handles resampling.
- Preprocessing (batdetect2.preprocess): Converts raw 1D waveforms into 2D Spectrogram tensors.
- Forward Pass (batdetect2.models): A PyTorch neural network processes the spectrogram and outputs dense prediction tensors (e.g., detection heatmaps, bounding box sizes, class probabilities).
- Postprocessing (batdetect2.postprocess): Decodes the raw output tensors back into explicit geometry bounding boxes and runs Non-Maximum Suppression (NMS) to filter redundant predictions.
- Formatting (batdetect2.data): Transforms the predictions into standard formats (.csv, .json, .parquet) using OutputFormatterProtocol.
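The stages above can be pictured as a chain of swappable callables. The following is a minimal, dependency-free sketch of that idea only: the Pipeline class and every stand-in stage function are hypothetical names invented for illustration and do not mirror the actual batdetect2 API.

```python
from dataclasses import dataclass
from typing import Any, Callable, List

# A stage is any callable that consumes the previous stage's output.
Stage = Callable[[Any], Any]


@dataclass
class Pipeline:
    """Hypothetical container that runs its stages in order."""

    stages: List[Stage]

    def run(self, item: Any) -> Any:
        for stage in self.stages:
            item = stage(item)
        return item


# Stand-in stages operating on plain Python values (not real audio).
def load_audio(path: str) -> List[float]:
    # Pretend the file holds four samples.
    return [0.0, 0.5, -0.5, 1.0]


def preprocess(wave: List[float]) -> List[float]:
    # Stand-in for spectrogram computation: absolute amplitude.
    return [abs(x) for x in wave]


def forward(spec: List[float]) -> List[float]:
    # Stand-in for the network: the scores are the inputs themselves.
    return spec


def postprocess(scores: List[float]) -> List[int]:
    # Keep the indices whose score reaches a threshold.
    return [i for i, s in enumerate(scores) if s >= 0.5]


def format_output(indices: List[int]) -> str:
    return ",".join(str(i) for i in indices)


pipeline = Pipeline([load_audio, preprocess, forward, postprocess, format_output])
result = pipeline.run("example.wav")
```

Because each stage only depends on the shape of its input, any one of them can be replaced without touching the others, which is the property the stage list describes.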
2. Core Modules Breakdown
2.1 Audio and Preprocessing
- audio/:
  - Centralizes audio I/O using AudioLoader. It abstracts over the soundevent library, efficiently handling full Recording files or smaller Clip segments, standardizing the sample rate.
- preprocess/:
  - Defined by the PreprocessorProtocol interface.
  - Its primary responsibility is spectrogram generation via Short-Time Fourier Transform (STFT).
  - During training, it incorporates data augmentation layers (e.g., amplitude scaling, time masking, frequency masking, spectral mean subtraction) configured via PreprocessingConfig.
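To make the STFT step concrete, here is a minimal sketch that computes a magnitude spectrogram with a naive DFT from the standard library. It is an illustration under stated assumptions, not the project's preprocessor: the function names, the Hann window choice, and the n_fft and hop values are all hypothetical, and a production implementation would presumably use an FFT on tensors instead.

```python
import cmath
import math
from typing import List


def hann(n: int) -> List[float]:
    """Periodic Hann window of length n."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / n) for i in range(n)]


def stft_magnitude(wave: List[float], n_fft: int, hop: int) -> List[List[float]]:
    """Return a [frame][bin] magnitude spectrogram using a naive DFT."""
    window = hann(n_fft)
    n_bins = n_fft // 2 + 1  # keep only the non-negative frequencies
    frames = []
    for start in range(0, len(wave) - n_fft + 1, hop):
        segment = [wave[start + i] * window[i] for i in range(n_fft)]
        row = []
        for k in range(n_bins):
            acc = sum(
                segment[n] * cmath.exp(-2j * math.pi * k * n / n_fft)
                for n in range(n_fft)
            )
            row.append(abs(acc))
        frames.append(row)
    return frames


# A pure tone that sits exactly on bin 4 of a 32-point transform.
n_fft, hop = 32, 16
tone = [math.sin(2 * math.pi * 4 * t / n_fft) for t in range(128)]
spec = stft_magnitude(tone, n_fft, hop)

# Index of the strongest frequency bin in every frame.
peak_bins = [max(range(len(row)), key=row.__getitem__) for row in spec]
```

Each row of spec is one time frame and each column one frequency bin, which is the 2D layout the network consumes after conversion to a tensor.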
2.2 Deep Learning Models (models/)
The models directory contains all PyTorch neural network architectures. The default architecture is an Encoder-Decoder (U-Net style) network.
- blocks.py: Reusable neural network blocks, including standard Convolutions (ConvBlock) and specialized layers like FreqCoordConvDownBlock / FreqCoordConvUpBlock which append normalized spatial frequency coordinates to explicitly grant convolutional filters frequency-awareness.
- encoder.py: The downsampling path (feature extraction). Builds a sequential list of blocks and captures skip connections.
- bottleneck.py: The deepest, lowest-resolution segment connecting the Encoder and Decoder. Features an optional SelfAttention mechanism to weigh global temporal contexts.
- decoder.py: The upsampling path (reconstruction), actively integrating skip connections (residuals) from the Encoder.
- heads.py: Attach to the backbone's feature map to output specific predictions:
  - BBoxHead: Predicts bounding box sizes.
  - ClassifierHead: Predicts species classes.
  - DetectorHead: Predicts detection probability heatmaps.
- backbones.py & detectors.py: Assemble the encoder, bottleneck, decoder, and heads into a cohesive Detector model.
- __init__.py:
  - Model: The overarching wrapper torch.nn.Module containing the detector, preprocessor, postprocessor, and targets.
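The frequency-coordinate idea behind the FreqCoordConv blocks can be sketched without any deep learning library. The example below appends one extra channel whose value depends only on the frequency row, so a later convolution can tell low bins from high bins. It is an assumption-laden illustration: the real blocks operate on torch tensors, the function name append_freq_coords is hypothetical, and normalizing to the range [0, 1] is a guess since the source only says "normalized".

```python
from typing import List

# Feature map laid out as [channel][freq][time].
FeatureMap = List[List[List[float]]]


def append_freq_coords(features: FeatureMap) -> FeatureMap:
    """Return a copy of the feature map with a frequency-coordinate channel added."""
    n_freq = len(features[0])
    n_time = len(features[0][0])
    denom = max(n_freq - 1, 1)  # avoid division by zero for a single bin
    # Every time step in a row shares the same normalized frequency value.
    coord_channel = [[f / denom] * n_time for f in range(n_freq)]
    return features + [coord_channel]


# One channel, five frequency bins, three time frames.
features = [[[0.0] * 3 for _ in range(5)]]
augmented = append_freq_coords(features)
```

A convolution applied to augmented sees the coordinate channel alongside the learned features, which is what gives otherwise translation-invariant filters a sense of absolute frequency.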
2.3 Targets and Regions of Interest (targets/)
Crucial for training, this module translates physical annotations (Regions of Interest / ROIs) into training targets (tensors).
- rois.py: Implements ROITargetMapper. Maps a geometric bounding box into a 2D reference Position (time, freq) and a Size array. Includes strategies like:
  - AnchorBBoxMapper: Maps based on a fixed bounding box corner/center.
  - PeakEnergyBBoxMapper: Identifies the physical coordinate of peak acoustic energy inside the bounding box and calculates offsets to the box edges.
- targets.py: Reconstructs complete multi-channel target heatmaps and coordinate tensors from the ROIs to compute losses during training.
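An anchor-based mapping of the kind described for AnchorBBoxMapper can be sketched as a pair of inverse functions. This is a minimal illustration, not the library's implementation: the encode and decode names, the tuple layout of the box, and the two anchor labels are assumptions made for the example.

```python
from typing import Tuple

# (start_time, low_freq, end_time, high_freq)
BBox = Tuple[float, float, float, float]
Point = Tuple[float, float]


def encode(bbox: BBox, anchor: str = "bottom-left") -> Tuple[Point, Point]:
    """Map a box to a reference position (time, freq) and a size (duration, bandwidth)."""
    t0, f0, t1, f1 = bbox
    size = (t1 - t0, f1 - f0)
    if anchor == "bottom-left":
        position = (t0, f0)
    elif anchor == "center":
        position = ((t0 + t1) / 2, (f0 + f1) / 2)
    else:
        raise ValueError(f"unknown anchor: {anchor}")
    return position, size


def decode(position: Point, size: Point, anchor: str = "bottom-left") -> BBox:
    """Invert encode: rebuild the box from its position and size."""
    t, f = position
    duration, bandwidth = size
    if anchor == "bottom-left":
        return (t, f, t + duration, f + bandwidth)
    if anchor == "center":
        return (t - duration / 2, f - bandwidth / 2, t + duration / 2, f + bandwidth / 2)
    raise ValueError(f"unknown anchor: {anchor}")


call_box = (0.25, 20000.0, 0.75, 60000.0)
position, size = encode(call_box, anchor="center")
restored = decode(position, size, anchor="center")
```

The position tells the training targets where to place a heatmap peak and the size supplies the regression values at that peak; decoding is the step postprocessing later performs in reverse.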
2.4 Postprocessing (postprocess/)
- Implements PostprocessorProtocol.
- Reverses the logic from targets. It scans the model's output detection heatmaps for peaks, extracts the predicted sizes and class probabilities at those peaks, and decodes them back into physical soundevent.data.Geometry (Bounding Boxes).
- Automatically applies Non-Maximum Suppression (NMS) configured via PostprocessConfig to remove highly overlapping predictions.
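The source does not say which NMS variant the postprocessor uses, so the sketch below shows the common greedy, IoU-based form purely as an illustration of the concept. The function names, the box layout, and the 0.5 threshold are assumptions; the project may well implement suppression differently, for example directly on the heatmap.

```python
from typing import List, Tuple

# (start_time, low_freq, end_time, high_freq)
BBox = Tuple[float, float, float, float]


def iou(a: BBox, b: BBox) -> float:
    """Intersection over union of two axis-aligned boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def nms(boxes: List[BBox], scores: List[float], iou_threshold: float = 0.5) -> List[int]:
    """Return indices of the boxes kept, highest score first."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep: List[int] = []
    for i in order:
        # Keep a box only if it does not overlap too much with any kept box.
        if all(iou(boxes[i], boxes[j]) <= iou_threshold for j in keep):
            keep.append(i)
    return keep


boxes = [(0.0, 0.0, 10.0, 10.0), (1.0, 1.0, 11.0, 11.0), (20.0, 20.0, 30.0, 30.0)]
scores = [0.9, 0.8, 0.7]
kept = nms(boxes, scores)
```

Here the second box overlaps the first with an IoU of 81/119 and is suppressed, while the distant third box survives.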
2.5 Data Management (data/)
- annotations/: Utilities to load dataset annotations supporting multiple standardized schemas (AOEF, BatDetect2 formats).
- datasets.py: Aggregates recordings and annotations into memory.
- predictions/: Handles the exporting of model results via OutputFormatterProtocol. Includes formatters for RawOutput, .parquet, .json, etc.
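The formatter abstraction can be illustrated with a structural Protocol and two interchangeable implementations. This is a hedged sketch only: the OutputFormatter protocol below, its single format method, and the dictionary-based prediction records are hypothetical stand-ins and should not be read as the signature of the real OutputFormatterProtocol.

```python
import csv
import io
import json
from typing import Dict, List, Protocol

Prediction = Dict[str, object]


class OutputFormatter(Protocol):
    """Hypothetical interface: turn predictions into serialized text."""

    def format(self, predictions: List[Prediction]) -> str: ...


class JsonFormatter:
    def format(self, predictions: List[Prediction]) -> str:
        return json.dumps(predictions, sort_keys=True)


class CsvFormatter:
    def format(self, predictions: List[Prediction]) -> str:
        buffer = io.StringIO()
        # Column order is taken from the first record, sorted for stability.
        fields = sorted(predictions[0]) if predictions else []
        writer = csv.DictWriter(buffer, fieldnames=fields, lineterminator="\n")
        writer.writeheader()
        writer.writerows(predictions)
        return buffer.getvalue()


def export(predictions: List[Prediction], formatter: OutputFormatter) -> str:
    # The caller depends only on the protocol, not on a concrete class.
    return formatter.format(predictions)


preds = [{"start_time": 0.1, "class": "Myotis", "score": 0.9}]
as_json = export(preds, JsonFormatter())
as_csv = export(preds, CsvFormatter())
```

Neither formatter inherits from the protocol; matching the method signature is enough, which keeps new output formats decoupled from the export code.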
2.6 Evaluation (evaluate/)
- Computes scientific metrics using EvaluatorProtocol.
- Provides specific testing environments for tasks like Clip Classification, Clip Detection, and Top Class predictions.
- Generates precision-recall curves and scatter plots.
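As a reminder of what a precision-recall curve contains, the following sketch computes its points from scored predictions that have already been matched against ground truth. It assumes the matching step happened elsewhere; the function name and arguments are hypothetical and unrelated to the EvaluatorProtocol interface.

```python
from typing import List, Tuple


def precision_recall_points(
    scores: List[float],
    is_match: List[bool],
    num_ground_truth: int,
) -> List[Tuple[float, float]]:
    """Return (precision, recall) after each prediction, in descending score order."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    true_positives = 0
    points = []
    for rank, i in enumerate(order, start=1):
        if is_match[i]:
            true_positives += 1
        precision = true_positives / rank
        recall = true_positives / num_ground_truth
        points.append((precision, recall))
    return points


# Four predictions, three of which match one of four annotated calls.
points = precision_recall_points(
    scores=[0.9, 0.8, 0.6, 0.3],
    is_match=[True, False, True, True],
    num_ground_truth=4,
)
```

Lowering the score threshold walks along this list: recall can only rise, while precision moves up or down depending on whether each newly admitted prediction is a match.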
2.7 Training (train/)
- Implements the distributed PyTorch training loop via PyTorch Lightning.
- lightning.py: Contains TrainingModule, the LightningModule that orchestrates the optimizer, learning rate scheduler, forward passes, and backpropagation using the generated targets.
3. Interfaces and Tooling
3.1 APIs
- api_v2.py (BatDetect2API): The modern API object. It is deeply integrated with dependency injection using BatDetect2Config. It instantiates the loader, targets, preprocessor, postprocessor, and model, exposing easy-to-use methods like process_file, evaluate, and train.
- api.py: The legacy API. Kept for backwards compatibility. Uses hardcoded default instances rather than configuration objects.
3.2 Command Line Interface (cli/)
- Implements terminal commands utilizing click. Commands include batdetect2 detect, evaluate, and train.
3.3 Core and Configuration (core/, config.py)
- core/registries.py: A string-based Registry pattern (e.g., block_registry, roi_mapper_registry) that allows developers to dynamically swap components (like a custom neural network block) via configuration files without modifying Python code.
- config.py: Aggregates all modular BaseConfig objects (AudioConfig, PreprocessingConfig, BackboneConfig) into the monolithic BatDetect2Config.
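A string-based registry of this kind can be sketched in a few lines. The example is a generic illustration of the pattern rather than the contents of core/registries.py: the Registry class, its register and build methods, the "name" key in the config dictionary, and ScaleBlock are all hypothetical. Only the variable name block_registry is borrowed from the text above.

```python
from typing import Any, Callable, Dict


class Registry:
    """Hypothetical registry mapping string keys to component builders."""

    def __init__(self, kind: str) -> None:
        self.kind = kind
        self._builders: Dict[str, Callable[..., Any]] = {}

    def register(self, key: str) -> Callable[[Callable[..., Any]], Callable[..., Any]]:
        def decorator(builder: Callable[..., Any]) -> Callable[..., Any]:
            if key in self._builders:
                raise KeyError(f"{self.kind} '{key}' is already registered")
            self._builders[key] = builder
            return builder

        return decorator

    def build(self, config: Dict[str, Any]) -> Any:
        # The "name" entry selects the component; the rest are its parameters.
        params = dict(config)
        key = params.pop("name")
        return self._builders[key](**params)


block_registry = Registry("block")


@block_registry.register("scale")
class ScaleBlock:
    """Toy component standing in for a neural network block."""

    def __init__(self, factor: float = 1.0) -> None:
        self.factor = factor

    def __call__(self, x: float) -> float:
        return x * self.factor


# This dictionary could come straight from a parsed configuration file.
block = block_registry.build({"name": "scale", "factor": 2.0})
output = block(3.0)
```

Swapping in a different component then means registering it under a new key and changing one string in the configuration, with no edits to the code that calls build.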
Summary
To navigate this codebase effectively:
- Follow api_v2.py to see how high-level operations invoke individual components.
- Rely heavily on the typed Protocols located in each subsystem's types.py module (for example src/batdetect2/preprocess/types.py and src/batdetect2/postprocess/types.py) to understand inputs and outputs without needing to read each implementation.
- Understand that data flows structurally as soundevent primitives externally, and as pure torch.Tensor internally through the network.