mirror of https://github.com/macaodha/batdetect2.git, synced 2026-04-04 15:20:19 +02:00

Commit 67bb66db3c (parent d2d804f0c3): Incorporate previous docs into new structure
# BatDetect2 Architecture Overview

This document provides a comprehensive map of the `batdetect2` codebase architecture. It is intended to serve as a deep-dive reference for developers, agents, and contributors navigating the project.

`batdetect2` is designed as a modular deep learning pipeline for detecting and classifying bat echolocation calls in high-frequency audio recordings. It relies heavily on **PyTorch**, **PyTorch Lightning** for training, and the **soundevent** library for standardized audio and geometry data classes.

The repository follows a configuration-driven design pattern, making heavy use of `pydantic`/`omegaconf` (via `BaseConfig`) and the Factory/Registry patterns for dependency injection and modularity. The entire pipeline can be orchestrated via the high-level API `BatDetect2API` (`src/batdetect2/api_v2.py`).

---

## 1. Data Flow Pipeline

The standard lifecycle of a prediction request follows these sequential stages, each handled by an isolated, replaceable module:

1. **Audio Loading (`batdetect2.audio`)**: Reads raw `.wav` files into standard NumPy arrays or `soundevent.data.Clip` objects. Handles resampling.
2. **Preprocessing (`batdetect2.preprocess`)**: Converts raw 1D waveforms into 2D spectrogram tensors.
3. **Forward Pass (`batdetect2.models`)**: A PyTorch neural network processes the spectrogram and outputs dense prediction tensors (e.g., detection heatmaps, bounding box sizes, class probabilities).
4. **Postprocessing (`batdetect2.postprocess`)**: Decodes the raw output tensors back into explicit geometry bounding boxes and runs Non-Maximum Suppression (NMS) to filter redundant predictions.
5. **Formatting (`batdetect2.data`)**: Transforms the predictions into standard formats (`.csv`, `.json`, `.parquet`) via `OutputFormatterProtocol`.
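The five stages above can be sketched end to end. This is an illustrative sketch with stand-in functions and shapes, not the actual `batdetect2` API:

```python
"""Sketch of the five-stage flow; function names and outputs are
hypothetical stand-ins for the real batdetect2 modules."""
import numpy as np

def load_audio(path: str) -> np.ndarray:
    # Stage 1: stand-in for batdetect2.audio loading + resampling.
    return np.random.default_rng(0).normal(size=256_000).astype(np.float32)

def preprocess(waveform: np.ndarray) -> np.ndarray:
    # Stage 2: stand-in spectrogram via a magnitude STFT (freq, time).
    n_fft, hop = 512, 256
    frames = [waveform[i:i + n_fft] for i in range(0, len(waveform) - n_fft, hop)]
    return np.abs(np.fft.rfft(np.stack(frames), axis=1)).T

def forward(spec: np.ndarray) -> dict:
    # Stage 3: stand-in for the detector; returns dense per-pixel outputs.
    return {"detection": np.zeros_like(spec), "size": np.zeros((2, *spec.shape))}

def postprocess(outputs: dict) -> list:
    # Stage 4: stand-in for peak picking, decoding, and NMS.
    return []

spec = preprocess(load_audio("recording.wav"))
predictions = postprocess(forward(spec))  # Stage 5 would format these for export
```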
---

## 2. Core Modules Breakdown

### 2.1 Audio and Preprocessing

- **`audio/`**:
  - Centralizes audio I/O via `AudioLoader`. It abstracts over the `soundevent` library, efficiently handling full `Recording` files or smaller `Clip` segments and standardizing the sample rate.
- **`preprocess/`**:
  - Defined by the `PreprocessorProtocol`.
  - Its primary responsibility is spectrogram generation via the Short-Time Fourier Transform (STFT).
  - During training, it incorporates data augmentation layers (e.g., amplitude scaling, time masking, frequency masking, spectral mean subtraction) configured via `PreprocessingConfig`.

### 2.2 Deep Learning Models (`models/`)

The `models` directory contains all PyTorch neural network architectures. The default architecture is an encoder-decoder (U-Net style) network.

- **`blocks.py`**: Reusable neural network blocks, including standard convolutions (`ConvBlock`) and specialized layers like `FreqCoordConvDownBlock`/`FreqCoordConvUpBlock`, which append normalized frequency coordinates to explicitly grant convolutional filters frequency awareness.
- **`encoder.py`**: The downsampling path (feature extraction). Builds a sequential list of blocks and captures skip connections.
- **`bottleneck.py`**: The deepest, lowest-resolution segment connecting the encoder and decoder. Features an optional `SelfAttention` mechanism to weigh global temporal context.
- **`decoder.py`**: The upsampling path (reconstruction), integrating skip connections (residuals) from the encoder.
- **`heads.py`**: Heads attach to the backbone's feature map to output specific predictions:
  - `BBoxHead`: Predicts bounding box sizes.
  - `ClassifierHead`: Predicts species classes.
  - `DetectorHead`: Predicts detection probability heatmaps.
- **`backbones.py` & `detectors.py`**: Assemble the encoder, bottleneck, decoder, and heads into a cohesive `Detector` model.
- **`__init__.py:Model`**: The overarching wrapper `torch.nn.Module` containing the `detector`, `preprocessor`, `postprocessor`, and `targets`.
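A prediction head is conceptually a 1x1 convolution over the backbone's feature map, i.e. a per-pixel linear map over the channel axis. A NumPy sketch of that idea (not the code in `heads.py`):

```python
import numpy as np

rng = np.random.default_rng(0)

def head_1x1(features: np.ndarray, weight: np.ndarray) -> np.ndarray:
    # features: (C, F, T) backbone feature map; weight: (out_channels, C).
    # A 1x1 convolution applies the same linear map at every (freq, time) cell.
    c, f, t = features.shape
    return (weight @ features.reshape(c, f * t)).reshape(-1, f, t)

features = rng.normal(size=(32, 64, 128))  # (channels, freq, time)

# Detector-style head: single channel squashed to a (0, 1) heatmap.
detection = 1 / (1 + np.exp(-head_1x1(features, rng.normal(size=(1, 32)))))
# BBox-style head: two channels for predicted width/height per pixel.
sizes = head_1x1(features, rng.normal(size=(2, 32)))
# Classifier-style head: one score channel per class (6 classes here).
class_logits = head_1x1(features, rng.normal(size=(6, 32)))
```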
### 2.3 Targets and Regions of Interest (`targets/`)

Crucial for training, this module translates physical annotations (Regions of Interest, or ROIs) into training targets (tensors).

- **`rois.py`**: Implements `ROITargetMapper`, which maps a geometric bounding box to a 2D reference `Position` (time, frequency) and a `Size` array. Includes strategies such as:
  - `AnchorBBoxMapper`: Maps based on a fixed bounding box corner or center.
  - `PeakEnergyBBoxMapper`: Identifies the coordinate of peak acoustic energy inside the bounding box and calculates offsets to the box edges.
- **`targets.py`**: Constructs complete multi-channel target heatmaps and coordinate tensors from the ROIs to compute losses during training.
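The anchor-style mapping can be sketched as follows. This is an illustrative reimplementation of the idea, not the code in `rois.py`:

```python
def anchor_bbox_map(start_time, end_time, low_freq, high_freq, anchor="bottom-left"):
    """Map a time-frequency bounding box to a reference position plus a size.

    Returns ((time, freq), (width, height)). The anchor choice determines
    which point of the box becomes the reference position.
    """
    width = end_time - start_time
    height = high_freq - low_freq
    if anchor == "bottom-left":
        position = (start_time, low_freq)
    elif anchor == "center":
        position = (start_time + width / 2, low_freq + height / 2)
    else:
        raise ValueError(f"unknown anchor: {anchor}")
    return position, (width, height)

# A 20 ms call sweeping 30-60 kHz, anchored at its bottom-left corner:
pos, size = anchor_bbox_map(0.10, 0.12, 30_000, 60_000)
```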
### 2.4 Postprocessing (`postprocess/`)

- Implements `PostprocessorProtocol`.
- Reverses the logic of `targets`: it scans the model's output detection heatmaps for peaks, extracts the predicted sizes and class probabilities at those peaks, and decodes them back into physical `soundevent.data.Geometry` bounding boxes.
- Automatically applies Non-Maximum Suppression (NMS), configured via `PostprocessConfig`, to remove highly overlapping predictions.
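A common way to combine peak finding with NMS on a heatmap is to keep only cells that are the maximum of their local neighbourhood and exceed a score threshold. A minimal sketch of that idea (not the actual `postprocess` implementation):

```python
import numpy as np

def heatmap_peaks(heatmap: np.ndarray, threshold: float = 0.5, radius: int = 1):
    """Keep cells that dominate their (2*radius+1)^2 neighbourhood
    and exceed the score threshold; nearby weaker peaks are suppressed."""
    peaks = []
    h, w = heatmap.shape
    for i in range(h):
        for j in range(w):
            score = heatmap[i, j]
            if score < threshold:
                continue
            window = heatmap[max(0, i - radius):i + radius + 1,
                             max(0, j - radius):j + radius + 1]
            if score >= window.max():
                peaks.append((i, j, float(score)))
    return peaks

hm = np.zeros((8, 8))
hm[2, 3] = 0.9
hm[2, 4] = 0.7   # suppressed: adjacent to a stronger peak
hm[6, 6] = 0.8
print(heatmap_peaks(hm))  # [(2, 3, 0.9), (6, 6, 0.8)]
```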
### 2.5 Data Management (`data/`)

- **`annotations/`**: Utilities to load dataset annotations, supporting multiple standardized schemas (`AOEF` and legacy `BatDetect2` formats).
- **`datasets.py`**: Aggregates recordings and annotations into memory.
- **`predictions/`**: Handles exporting model results via `OutputFormatterProtocol`. Includes formatters for `RawOutput`, `.parquet`, `.json`, etc.

### 2.6 Evaluation (`evaluate/`)

- Computes scientific metrics via `EvaluatorProtocol`.
- Provides dedicated evaluation setups for tasks such as clip classification, clip detection, and top-class prediction.
- Generates precision-recall curves and scatter plots.

### 2.7 Training (`train/`)

- Implements the distributed PyTorch training loop via PyTorch Lightning.
- **`lightning.py`**: Contains `TrainingModule`, the `LightningModule` that orchestrates the optimizer, learning rate scheduler, forward passes, and backpropagation using the generated `targets`.

---

## 3. Interfaces and Tooling

### 3.1 APIs

- **`api_v2.py` (`BatDetect2API`)**: The modern API object. It is deeply integrated with dependency injection via `BatDetect2Config`. It instantiates the loader, targets, preprocessor, postprocessor, and model, exposing easy-to-use methods such as `process_file`, `evaluate`, and `train`.
- **`api.py`**: The legacy API, kept for backwards compatibility. Uses hardcoded default instances rather than configuration objects.

### 3.2 Command Line Interface (`cli/`)

- Implements terminal commands using `click`. Commands include `batdetect2 detect`, `evaluate`, and `train`.

### 3.3 Core and Configuration (`core/`, `config.py`)

- **`core/registries.py`**: String-based registries (e.g., `block_registry`, `roi_mapper_registry`) that let developers swap components (such as a custom neural network block) via configuration files without modifying Python code.
- **`config.py`**: Aggregates all modular `BaseConfig` objects (`AudioConfig`, `PreprocessingConfig`, `BackboneConfig`) into the monolithic `BatDetect2Config`.
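A minimal sketch of the string-keyed registry pattern, illustrating how a name from a config file can select a component at runtime (this is not the actual `core/registries.py` code):

```python
class Registry:
    """Minimal string-keyed registry: classes register under a name,
    and configs can then select them by that name."""

    def __init__(self):
        self._items = {}

    def register(self, name):
        def decorator(cls):
            self._items[name] = cls
            return cls
        return decorator

    def build(self, name, **kwargs):
        if name not in self._items:
            raise KeyError(f"unknown component: {name!r}")
        return self._items[name](**kwargs)

block_registry = Registry()

@block_registry.register("conv")
class ConvBlock:
    def __init__(self, channels=32):
        self.channels = channels

# A config file can now request {"block": "conv", "channels": 64}:
block = block_registry.build("conv", channels=64)
```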
---

## Summary

To navigate this codebase effectively:

1. Follow **`api_v2.py`** to see how high-level operations invoke individual components.
2. Rely on the typed **Protocols** located in each subsystem's `types.py` module (for example, `src/batdetect2/preprocess/types.py` and `src/batdetect2/postprocess/types.py`) to understand inputs and outputs without reading each implementation.
3. Remember that data flows as `soundevent` primitives externally and as pure `torch.Tensor` objects internally through the network.
# Using AOEF / Soundevent Data Sources

## Introduction

The **AOEF (Acoustic Open Event Format)**, stored as `.json` files, is the annotation format used by the underlying `soundevent` library and is compatible with annotation tools like **Whombat**. BatDetect2 can directly load annotation data stored in this format.

This format can represent two main types of annotation collections:

1. `AnnotationSet`: A straightforward collection of annotations for various audio clips.
2. `AnnotationProject`: A more structured format often exported by annotation tools (like Whombat). It includes not only the annotations but also information about annotation _tasks_ (work assigned to annotators) and their status (e.g., in progress, completed, verified, rejected).

This section explains how to configure a data source in your `DatasetConfig` to load data from either type of AOEF file.

## Configuration

To define a data source using the AOEF format, add an entry to the `sources` list in your main `DatasetConfig` (usually within your primary YAML configuration file) and set the `format` field to `"aoef"`.

Key fields for an AOEF source:

- `format: "aoef"`: **(Required)** Tells BatDetect2 to use the AOEF loader for this source.
- `name: your_source_name`: **(Required)** A unique name you choose for this data source (e.g., `"whombat_project_export"`, `"final_annotations"`).
- `audio_dir: path/to/audio/files`: **(Required)** The path to the directory containing the audio `.wav` files referenced in the annotations.
- `annotations_path: path/to/your/annotations.aoef`: **(Required)** The path to the single `.aoef` or `.json` file containing the annotation data (either an `AnnotationSet` or an `AnnotationProject`).
- `description: "Details about this source..."`: (Optional) A brief description of the data source.
- `filter: ...`: (Optional) Settings used _only if_ the `annotations_path` file contains an `AnnotationProject`. See details below.

## Filtering Annotation Projects (Optional)

When working with annotation projects, especially collaborative ones or those still in progress (like exports from Whombat), you often want to train only on annotations that are considered complete and reliable. The optional `filter:` section lets you specify criteria based on the status of the annotation _tasks_ within the project.

**If `annotations_path` points to a simple `AnnotationSet` file, the `filter:` section is ignored.**

If `annotations_path` points to an `AnnotationProject`, you can add a `filter:` block with the following options:

- `only_completed: <true_or_false>`:
  - `true` (default): Only include annotations from tasks that have been marked as "completed".
  - `false`: Include annotations regardless of task completion status.
- `only_verified: <true_or_false>`:
  - `false` (default): Verification status is not considered.
  - `true`: Only include annotations from tasks that have _also_ been marked as "verified" (typically meaning they passed a review step).
- `exclude_issues: <true_or_false>`:
  - `true` (default): Exclude annotations from any task that has been marked as "rejected" or flagged with issues.
  - `false`: Include annotations even if their task was marked as having issues (use with caution).
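The three options above amount to a simple predicate over task status. A hedged sketch of the logic (the boolean status fields here are illustrative, not the real `AnnotationProject` schema):

```python
def keep_task(status: dict, only_completed=True, only_verified=False,
              exclude_issues=True) -> bool:
    """Decide whether annotations from one task should be loaded,
    given the filter options described above."""
    if only_completed and not status.get("completed", False):
        return False
    if only_verified and not status.get("verified", False):
        return False
    if exclude_issues and status.get("issues", False):
        return False
    return True

tasks = [
    {"completed": True, "verified": False, "issues": False},
    {"completed": True, "verified": True, "issues": False},
    {"completed": False, "verified": False, "issues": False},  # not completed
    {"completed": True, "verified": False, "issues": True},    # has issues
]
# The default filter keeps the first two tasks;
# requiring verification keeps only the second.
kept_default = [t for t in tasks if keep_task(t)]
kept_verified = [t for t in tasks if keep_task(t, only_verified=True)]
```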
**Default Filtering:** If you include the `filter:` block but omit some options, or if you _omit the entire `filter:` block_, the default settings are applied to `AnnotationProject` files: `only_completed: true`, `only_verified: false`, `exclude_issues: true`. This common default selects annotations from completed tasks that haven't been rejected, without requiring separate verification.

**Disabling Filtering:** To load _all_ annotations from an `AnnotationProject` regardless of task status, explicitly disable filtering by setting `filter: null` in your YAML configuration.

## YAML Configuration Examples

**Example 1: Loading a standard AnnotationSet (or a Project with default filtering)**

```yaml
# In your main DatasetConfig YAML file
sources:
  - name: "MyFinishedAnnotations"
    format: "aoef" # Specifies the loader
    audio_dir: "/path/to/my/audio/"
    annotations_path: "/path/to/my/dataset.soundevent.json" # Path to the AOEF file
    description: "Finalized annotations set."
    # No 'filter:' block means default filtering is applied IF it's an AnnotationProject,
    # or no filtering if it's an AnnotationSet.
```

**Example 2: Loading an AnnotationProject, requiring verification**

```yaml
# In your main DatasetConfig YAML file
sources:
  - name: "WhombatVerifiedExport"
    format: "aoef"
    audio_dir: "relative/path/to/audio/" # Relative to where BatDetect2 runs or a base_dir
    annotations_path: "exports/whombat_project.aoef" # Path to the project file
    description: "Annotations from Whombat project, only using verified tasks."
    filter: # Customize the filter
      only_completed: true # Still require completion
      only_verified: true # *Also* require verification
      exclude_issues: true # Still exclude rejected tasks
```

**Example 3: Loading an AnnotationProject, disabling all filtering**

```yaml
# In your main DatasetConfig YAML file
sources:
  - name: "WhombatRawExport"
    format: "aoef"
    audio_dir: "data/audio_pool/"
    annotations_path: "exports/whombat_project_all.aoef"
    description: "All annotations from Whombat, regardless of task status."
    filter: null # Explicitly disable task filtering
```

## Summary

To load standard `soundevent` annotations (including Whombat exports), set `format: "aoef"` for your data source in the `DatasetConfig`. Provide the `audio_dir` and the path to the single `annotations_path` file. If dealing with `AnnotationProject` files, you can optionally use the `filter:` block to select annotations based on task completion, verification, or issue status.
# Loading Data

```{toctree}
:maxdepth: 1
:caption: Loading Data

aoef
legacy
```
# Using Legacy BatDetect2 Annotation Formats

## Introduction

If you have annotation data created with older BatDetect2 annotation tools, BatDetect2 provides loaders for these datasets. These older formats typically use JSON files to store annotation information, including bounding boxes and labels for sound events within recordings.

There are two main variations of this legacy format that BatDetect2 can load:

1. **Directory-Based (`format: "batdetect2"`):** Annotations for each audio recording are stored in a _separate_ JSON file within a dedicated directory. A naming convention links each JSON file to its corresponding audio file (e.g., annotations for `my_recording.wav` are stored in `my_recording.wav.json`).
2. **Single Merged File (`format: "batdetect2_file"`):** Annotations for _multiple_ recordings are aggregated into a _single_ JSON file. This file contains a list in which each item represents the annotations for one recording, following the same internal structure as the directory-based format.

When you configure BatDetect2 to use these formats, it reads the legacy data and converts it internally into the standard `soundevent` data structures used by the rest of the pipeline.
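For orientation, a legacy per-recording JSON looks roughly like this. The field names below are illustrative of the format, not a normative schema; consult your own legacy files for the exact fields:

```json
{
  "id": "my_recording.wav",
  "annotated": true,
  "issues": false,
  "annotation": [
    {
      "start_time": 0.1,
      "end_time": 0.12,
      "low_freq": 30000,
      "high_freq": 60000,
      "class": "Myotis daubentonii",
      "event": "Echolocation"
    }
  ]
}
```

The `annotated` and `issues` flags at the top level are the ones used by the filtering options described below.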
## Configuration

You specify which legacy format to use within the `sources` list of your main `DatasetConfig` (usually in your primary YAML configuration file).

### Format 1: Directory-Based

Use this when you have a folder containing many individual JSON annotation files, one per audio file.

**Configuration Fields:**

- `format: "batdetect2"`: **(Required)** Identifies this legacy format loader.
- `name: your_source_name`: **(Required)** A unique name for this data source.
- `audio_dir: path/to/audio/files`: **(Required)** Path to the directory containing the `.wav` audio files.
- `annotations_dir: path/to/annotation/jsons`: **(Required)** Path to the directory containing the individual `.json` annotation files.
- `description: "Details..."`: (Optional) Description of this source.
- `filter: ...`: (Optional) Settings to filter which JSON files are processed, based on flags within them (see "Filtering Legacy Annotations" below).

**YAML Example:**

```yaml
# In your main DatasetConfig YAML file
sources:
  - name: "OldProject_SiteA_Files"
    format: "batdetect2" # Use the directory-based loader
    audio_dir: "/data/SiteA/Audio/"
    annotations_dir: "/data/SiteA/Annotations_JSON/"
    description: "Legacy annotations stored as individual JSONs per recording."
    # filter: ... # Optional filter settings can be added here
```

### Format 2: Single Merged File

Use this when you have a single JSON file containing a list of annotations for multiple recordings.

**Configuration Fields:**

- `format: "batdetect2_file"`: **(Required)** Identifies this legacy format loader.
- `name: your_source_name`: **(Required)** A unique name for this data source.
- `audio_dir: path/to/audio/files`: **(Required)** Path to the directory containing the `.wav` audio files referenced _within_ the merged JSON file.
- `annotations_path: path/to/your/merged_annotations.json`: **(Required)** Path to the single `.json` file containing the list of annotations.
- `description: "Details..."`: (Optional) Description of this source.
- `filter: ...`: (Optional) Settings to filter which records _within_ the merged file are processed (see "Filtering Legacy Annotations" below).

**YAML Example:**

```yaml
# In your main DatasetConfig YAML file
sources:
  - name: "OldProject_Merged"
    format: "batdetect2_file" # Use the merged file loader
    audio_dir: "/data/AllAudio/"
    annotations_path: "/data/CombinedAnnotations/old_project_merged.json"
    description: "Legacy annotations aggregated into a single JSON file."
    # filter: ... # Optional filter settings can be added here
```

## Filtering Legacy Annotations

The legacy JSON annotation structure (for both formats) includes boolean flags indicating the status of the annotation work for each recording:

- `annotated`: Typically `true` if a human has reviewed or created annotations for the file.
- `issues`: Typically `true` if problems were noted during annotation or review.

You can optionally filter the data based on these flags using a `filter:` block within the source configuration. This applies whether you use `"batdetect2"` or `"batdetect2_file"`.

**Filter Options:**

- `only_annotated: <true_or_false>`:
  - `true` (**default**): Only process entries where the `annotated` flag in the JSON is `true`.
  - `false`: Process entries regardless of the `annotated` flag.
- `exclude_issues: <true_or_false>`:
  - `true` (**default**): Skip entries where the `issues` flag in the JSON is `true`.
  - `false`: Process entries even if they are flagged with `issues`.

**Default Filtering:** If you **omit** the `filter:` block entirely, the defaults (`only_annotated: true`, `exclude_issues: true`) are applied automatically, so only entries marked as annotated and free of issues are loaded.

**Disabling Filtering:** To load _all_ entries from the legacy source regardless of the `annotated` or `issues` flags, explicitly disable the filter:

```yaml
filter: null
```

**YAML Example (Custom Filter):** Only load entries marked as annotated, but _include_ those with issues.

```yaml
sources:
  - name: "LegacyData_WithIssues"
    format: "batdetect2" # Or "batdetect2_file"
    audio_dir: "path/to/audio"
    annotations_dir: "path/to/annotations" # Or annotations_path for the merged format
    filter:
      only_annotated: true
      exclude_issues: false # Include entries even if the issues flag is true
```

## Summary

BatDetect2 can incorporate datasets stored in older "BatDetect2" JSON formats.

- Use `format: "batdetect2"` and provide `annotations_dir` if you have one JSON file per recording in a directory.
- Use `format: "batdetect2_file"` and provide `annotations_path` if you have a single JSON file containing annotations for multiple recordings.
- Optionally use the `filter:` block with `only_annotated` and `exclude_issues` to select data based on flags in the legacy JSON structure.

The system handles loading, filtering (if configured), and converting this legacy data into the standard `soundevent` format used internally.
The explanation index toctree, with the new pages added:

```{toctree}
:maxdepth: 1

model-output-and-validation
postprocessing-and-thresholds
pipeline-overview
preprocessing-consistency
target-encoding-and-decoding
```
**docs/source/explanation/pipeline-overview.md** (new file)

# Pipeline overview

batdetect2 processes recordings as a sequence of modules. Each stage has a clear role and configuration surface.

## End-to-end flow

1. Audio loading
2. Preprocessing (waveform -> spectrogram)
3. Detector forward pass
4. Postprocessing (peaks, decoding, thresholds)
5. Output formatting and export
## Why the modular design matters

The model, preprocessing, postprocessing, targets, and output formatting are configured separately. That makes it easier to:

- swap components without rewriting the whole pipeline,
- keep experiments reproducible,
- adapt workflows to new datasets.

## Core objects in the stack

- `BatDetect2API` orchestrates training, inference, and evaluation workflows.
- `ModelConfig` defines architecture, preprocessing, postprocessing, and targets.
- `Targets` controls event filtering, class encoding/decoding, and ROI mapping.

## Related pages

- Preprocessing rationale: {doc}`preprocessing-consistency`
- Postprocessing rationale: {doc}`postprocessing-and-thresholds`
- Target rationale: {doc}`target-encoding-and-decoding`
**docs/source/explanation/postprocessing-and-thresholds.md** (new file)

# Postprocessing and thresholds

After the detector runs on a spectrogram, the model output is still a set of dense prediction tensors. Postprocessing turns that into a final list of call detections with positions, sizes, and class scores.

## What postprocessing does

In broad terms, the pipeline:

1. suppresses nearby duplicate peaks,
2. extracts candidate detections,
3. reads size and class values at each detected location,
4. decodes outputs into call-level predictions.

This is where score thresholds and output density limits are applied.

## Why thresholds matter

Thresholds control the balance between sensitivity and precision.

- Lower thresholds keep more detections, including weaker calls, but may add false positives.
- Higher thresholds remove low-confidence detections, but may miss faint calls.

You can tune this behavior per run without retraining the model.

## Two common threshold controls

- `detection_threshold`: minimum score required to keep a detection.
- `classification_threshold`: minimum class score used when assigning class labels.
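A minimal sketch of how the two thresholds interact (the detection tuples here are illustrative, not batdetect2's actual output type):

```python
def filter_predictions(detections, detection_threshold=0.3,
                       classification_threshold=0.5):
    """Apply both thresholds: drop weak detections outright, and
    strip the class label from detections with a weak class score."""
    kept = []
    for score, class_name, class_score in detections:
        if score < detection_threshold:
            continue  # below detection_threshold: drop entirely
        if class_score < classification_threshold:
            class_name = None  # keep the detection but leave it unclassified
        kept.append((score, class_name, class_score))
    return kept

raw = [(0.9, "Pipistrellus", 0.8), (0.4, "Myotis", 0.3), (0.1, "Nyctalus", 0.9)]
print(filter_predictions(raw))
# [(0.9, 'Pipistrellus', 0.8), (0.4, None, 0.3)]
```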
Both settings shape the final output and should be validated on reviewed local data.

## Practical workflow

Tune thresholds on a representative subset first, then lock settings for the full analysis run.

- How-to: {doc}`../how_to/tune-detection-threshold`
- CLI reference: {doc}`../reference/cli/predict`
**docs/source/explanation/preprocessing-consistency.md** (new file)

# Preprocessing consistency

Preprocessing consistency is one of the biggest factors behind stable model performance.

## Why consistency matters

The detector is trained on spectrograms produced by a specific preprocessing pipeline. If inference uses different settings, the model can see a shifted input distribution and performance may drop.

Typical mismatch sources:

- sample-rate differences,
- changed frequency crop,
- changed STFT window/hop,
- changed spectrogram transforms.
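A quick way to catch such mismatches is to diff the relevant settings between the training and inference configs. A sketch with illustrative key names (compare whichever fields your configs actually define):

```python
def preprocessing_mismatches(train_cfg: dict, infer_cfg: dict,
                             keys=("samplerate", "min_freq", "max_freq",
                                   "window_size", "hop_size")) -> list:
    """Return the names of settings that differ between two configs."""
    return [k for k in keys if train_cfg.get(k) != infer_cfg.get(k)]

train = {"samplerate": 256000, "min_freq": 10000, "max_freq": 120000,
         "window_size": 512, "hop_size": 256}
infer = dict(train, samplerate=192000)  # accidental sample-rate change

mismatches = preprocessing_mismatches(train, infer)
print(mismatches)  # ['samplerate']
```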
## Practical implication

When possible, keep preprocessing settings aligned between:

- training,
- evaluation,
- deployment inference.

If you intentionally change preprocessing, treat it as a new experiment and re-validate on reviewed local data.

## Related pages

- Configure audio preprocessing: {doc}`../how_to/configure-audio-preprocessing`
- Configure spectrogram preprocessing: {doc}`../how_to/configure-spectrogram-preprocessing`
- Preprocessing config reference: {doc}`../reference/preprocessing-config`
**docs/source/explanation/target-encoding-and-decoding.md** (new file)

# Target encoding and decoding

batdetect2 turns annotated sound events into training targets, then maps model outputs back into interpretable predictions.

## Encoding path (annotations -> model targets)

At training time, the target system:

1. checks whether an event belongs to the configured detection target,
2. assigns a classification label (or none for non-specific class matches),
3. maps event geometry into position and size targets.

This behaviour is configured through `TargetConfig`, `TargetClassConfig`, and ROI mapper settings.

## Decoding path (model outputs -> tags and geometry)

At inference time, class labels and ROI parameters are decoded back into annotation tags and geometry. This makes outputs interpretable in the same conceptual space as your original annotations.

## Why this matters

Target definitions are not just metadata. They directly shape:

- which events are treated as positive examples,
- which class names the model learns,
- how geometry is represented and reconstructed.

Small changes here can alter both training outcomes and prediction semantics.

## Related pages

- Configure detection target logic: {doc}`../how_to/configure-target-definitions`
- Configure class mapping: {doc}`../how_to/define-target-classes`
- Configure ROI mapping: {doc}`../how_to/configure-roi-mapping`
- Target config reference: {doc}`../reference/targets-config-workflow`
53
docs/source/how_to/configure-aoef-dataset.md
Normal file
53
docs/source/how_to/configure-aoef-dataset.md
Normal file
@ -0,0 +1,53 @@
# How to configure an AOEF dataset source

Use this guide when your annotations are stored in AOEF/soundevent JSON files,
including exports from Whombat.

## 1) Add an AOEF source entry

In your dataset config, add a source with `format: aoef`.

```yaml
sources:
  - name: my_aoef_source
    format: aoef
    audio_dir: /path/to/audio
    annotations_path: /path/to/annotations.soundevent.json
```

## 2) Choose filtering behavior for annotation projects

If `annotations_path` is an `AnnotationProject`, you can filter by task state.

```yaml
sources:
  - name: whombat_verified
    format: aoef
    audio_dir: /path/to/audio
    annotations_path: /path/to/project_export.aoef
    filter:
      only_completed: true
      only_verified: true
      exclude_issues: true
```

If you omit `filter`, default project filtering is applied.

To disable filtering for project files:

```yaml
filter: null
```

## 3) Check that the source loads

Run a summary on your dataset config:

```bash
batdetect2 data summary path/to/dataset.yaml
```

## 4) Continue to training or evaluation

- For training: {doc}`../tutorials/train-a-custom-model`
- For field-level reference: {doc}`../reference/data-sources`
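Once parsed, each `sources` entry is a plain mapping, so a quick sanity check can be scripted before handing the config to BatDetect2. The sketch below is illustrative only: the `check_aoef_source` helper is hypothetical and not part of the BatDetect2 API; the field names simply mirror the example above.

```python
from pathlib import Path

# Fields every AOEF source entry in the example above carries.
REQUIRED_FIELDS = ("name", "format", "audio_dir", "annotations_path")


def check_aoef_source(source):
    """Return a list of problems found in one AOEF source entry."""
    problems = [f"missing field: {f}" for f in REQUIRED_FIELDS if f not in source]
    if source.get("format") != "aoef":
        problems.append(f"unexpected format: {source.get('format')!r}")
    # Path checks are only meaningful on the machine that will run training.
    for key in ("audio_dir", "annotations_path"):
        if key in source and not Path(source[key]).exists():
            problems.append(f"{key} does not exist: {source[key]}")
    return problems


source = {
    "name": "my_aoef_source",
    "format": "aoef",
    "audio_dir": "/path/to/audio",
    "annotations_path": "/path/to/annotations.soundevent.json",
}
print(check_aoef_source(source))
```

Running this on a config with placeholder paths will report the missing directories, which is exactly the kind of mistake worth catching before a long training run.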
docs/source/how_to/configure-audio-preprocessing.md (new file, 64 lines)
@ -0,0 +1,64 @@
# How to configure audio preprocessing

Use this guide to set sample-rate and waveform-level preprocessing behaviour.

## 1) Set audio loader settings

The audio loader config controls resampling.

```yaml
samplerate: 256000
resample:
  enabled: true
  method: poly
```

If your recordings are already at the expected sample rate, you can disable
resampling.

```yaml
samplerate: 256000
resample:
  enabled: false
```

## 2) Set waveform transforms in preprocessing config

Waveform transforms are configured in `preprocess.audio_transforms`.

```yaml
preprocess:
  audio_transforms:
    - name: center_audio
    - name: scale_audio
    - name: fix_duration
      duration: 0.5
```

Available built-ins:

- `center_audio`
- `scale_audio`
- `fix_duration`

## 3) Use the config in your workflow

For CLI inference/evaluation, use `--audio-config`.

```bash
batdetect2 predict directory \
    path/to/model.ckpt \
    path/to/audio_dir \
    path/to/outputs \
    --audio-config path/to/audio.yaml
```

## 4) Verify quickly on a small subset

Run on a small folder first and confirm that outputs and runtime are as
expected before full-batch runs.

## Related pages

- Spectrogram settings: {doc}`configure-spectrogram-preprocessing`
- Preprocessing config reference: {doc}`../reference/preprocessing-config`
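The `method: poly` setting refers to polyphase resampling, which converts between rates by upsampling by an integer factor and downsampling by another. The sketch below only shows how those factors are derived from the two sample rates; it is not BatDetect2's resampler (which delegates to a signal-processing library), and the `poly_factors` helper name is ours.

```python
from math import gcd


def poly_factors(orig_sr, target_sr):
    """Up/down factors for polyphase resampling from orig_sr to target_sr."""
    g = gcd(orig_sr, target_sr)
    # Resample by inserting `up - 1` zeros per sample, filtering,
    # then keeping every `down`-th sample.
    up, down = target_sr // g, orig_sr // g
    return up, down


# A 384 kHz recording resampled to the 256 kHz target:
up, down = poly_factors(384_000, 256_000)
print(up, down)  # 2 3
```

Recordings already at the target rate give factors `(1, 1)`, which is why disabling resampling in that case is harmless.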
docs/source/how_to/configure-roi-mapping.md (new file, 54 lines)
@ -0,0 +1,54 @@
# How to configure ROI mapping

Use this guide to control how annotation geometry is encoded into training
targets and decoded back into boxes.

## 1) Set the default ROI mapper

The default mapper is `anchor_bbox`.

```yaml
roi:
  name: anchor_bbox
  anchor: bottom-left
  time_scale: 1000.0
  frequency_scale: 0.001163
```

## 2) Choose an anchor strategy

Typical options include `bottom-left` and `center`.

- `bottom-left` is the current default.
- `center` can be easier to reason about in some workflows.

## 3) Set scale factors intentionally

- `time_scale` controls width scaling.
- `frequency_scale` controls height scaling.

Use values that are consistent with your model setup and keep them fixed when
comparing experiments.

## 4) (Optional) Override ROI mapping for specific classes

You can set class-level `roi` in `classification_targets` when needed.

```yaml
classification_targets:
  - name: species_x
    tags:
      - key: class
        value: Species X
    roi:
      name: anchor_bbox
      anchor: center
      time_scale: 1000.0
      frequency_scale: 0.001163
```

## Related pages

- Target definitions: {doc}`configure-target-definitions`
- Class definitions: {doc}`define-target-classes`
- Target encoding overview: {doc}`../explanation/target-encoding-and-decoding`
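Conceptually, an `anchor_bbox` mapper encodes each box as an anchor point plus a scaled width and height, and decoding inverts that arithmetic. The round trip below is a schematic illustration under that reading (the helper names are ours; the real implementation lives in BatDetect2's targets module and may differ in detail):

```python
def encode_bbox(start_time, low_freq, end_time, high_freq,
                time_scale=1000.0, frequency_scale=0.001163):
    """Encode a box as a bottom-left anchor plus scaled width/height."""
    anchor = (start_time, low_freq)                    # bottom-left corner
    size = ((end_time - start_time) * time_scale,      # scaled duration
            (high_freq - low_freq) * frequency_scale)  # scaled bandwidth
    return anchor, size


def decode_bbox(anchor, size, time_scale=1000.0, frequency_scale=0.001163):
    """Invert encode_bbox back to (start_time, low_freq, end_time, high_freq)."""
    t, f = anchor
    return (t, f, t + size[0] / time_scale, f + size[1] / frequency_scale)


box = (0.10, 30_000.0, 0.105, 60_000.0)  # a 5 ms call spanning 30-60 kHz
anchor, size = encode_bbox(*box)
recovered = decode_bbox(anchor, size)
```

This also shows why the scale factors must stay fixed between training and prediction: decoding with different scales than were used for encoding silently distorts every box.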
docs/source/how_to/configure-spectrogram-preprocessing.md (new file, 59 lines)
@ -0,0 +1,59 @@
# How to configure spectrogram preprocessing

Use this guide to set STFT, frequency range, and spectrogram transforms.

## 1) Configure STFT and frequency range

```yaml
preprocess:
  stft:
    window_duration: 0.002
    window_overlap: 0.75
    window_fn: hann
  frequencies:
    min_freq: 10000
    max_freq: 120000
```

## 2) Configure spectrogram transforms

`spectrogram_transforms` are applied in order.

```yaml
preprocess:
  spectrogram_transforms:
    - name: pcen
      time_constant: 0.4
      gain: 0.98
      bias: 2.0
      power: 0.5
    - name: spectral_mean_subtraction
    - name: scale_amplitude
      scale: db
```

Common built-ins:

- `pcen`
- `spectral_mean_subtraction`
- `scale_amplitude` (`db` or `power`)
- `peak_normalize`

## 3) Configure output size

```yaml
preprocess:
  size:
    height: 128
    resize_factor: 0.5
```

## 4) Keep train and inference settings aligned

Use the same preprocessing setup for training and prediction whenever possible.
Large mismatches can degrade model performance.

## Related pages

- Why consistency matters: {doc}`../explanation/preprocessing-consistency`
- Preprocessing config reference: {doc}`../reference/preprocessing-config`
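It can help to see what the STFT settings imply in samples. Assuming the standard reading of these parameters (window length is duration times sample rate, hop is the non-overlapping fraction of the window), the example config above works out as follows; the `stft_geometry` helper is ours, for illustration only:

```python
def stft_geometry(samplerate, window_duration, window_overlap):
    """Window and hop sizes (in samples) implied by the STFT settings."""
    win = round(window_duration * samplerate)   # samples per window
    hop = round(win * (1.0 - window_overlap))   # samples between frames
    frames_per_second = samplerate / hop        # spectrogram time resolution
    return win, hop, frames_per_second


# The example config: 2 ms windows with 75% overlap at 256 kHz.
win, hop, fps = stft_geometry(256_000, 0.002, 0.75)
print(win, hop, fps)  # 512 128 2000.0
```

So with these settings each spectrogram column covers 0.5 ms, which is worth keeping in mind when choosing `resize_factor`.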
docs/source/how_to/configure-target-definitions.md (new file, 58 lines)
@ -0,0 +1,58 @@
# How to configure target definitions

Use this guide to define which annotated sound events are considered valid
detection targets.

## 1) Start from a targets config file

```yaml
detection_target:
  name: bat
  match_if:
    name: has_tag
    tag:
      key: call_type
      value: Echolocation
  assign_tags:
    - key: call_type
      value: Echolocation
    - key: order
      value: Chiroptera
```

`match_if` decides whether an annotation is included in the detection target.

## 2) Use condition combinators when needed

You can combine conditions with `all_of`, `any_of`, and `not`.

```yaml
detection_target:
  name: bat
  match_if:
    name: all_of
    conditions:
      - name: has_tag
        tag:
          key: call_type
          value: Echolocation
      - name: not
        condition:
          name: has_any_tag
          tags:
            - key: call_type
              value: Social
            - key: class
              value: Not Bat
```

## 3) Verify with a small sample first

Before full training, inspect a small annotation subset and confirm that the
selection logic keeps the events you expect.

## Related pages

- Class mapping: {doc}`define-target-classes`
- ROI mapping: {doc}`configure-roi-mapping`
- Targets reference: {doc}`../reference/targets-config-workflow`
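The combinator semantics above can be mirrored in a few lines of plain Python, which is a handy way to dry-run a `match_if` tree against sample tag lists before training. This stand-alone sketch only imitates the semantics of `has_tag`, `has_any_tag`, `all_of`, `any_of`, and `not` as described here; BatDetect2 interprets the real config in its targets module.

```python
def matches(condition, tags):
    """Evaluate a match_if-style condition against an annotation's tags."""
    name = condition["name"]
    if name == "has_tag":
        return condition["tag"] in tags
    if name == "has_any_tag":
        return any(tag in tags for tag in condition["tags"])
    if name == "all_of":
        return all(matches(c, tags) for c in condition["conditions"])
    if name == "any_of":
        return any(matches(c, tags) for c in condition["conditions"])
    if name == "not":
        return not matches(condition["condition"], tags)
    raise ValueError(f"unknown condition: {name}")


# The combinator example above, as a Python structure:
echolocation_not_social = {
    "name": "all_of",
    "conditions": [
        {"name": "has_tag", "tag": {"key": "call_type", "value": "Echolocation"}},
        {"name": "not", "condition": {"name": "has_any_tag", "tags": [
            {"key": "call_type", "value": "Social"},
            {"key": "class", "value": "Not Bat"},
        ]}},
    ],
}
print(matches(echolocation_not_social,
              [{"key": "call_type", "value": "Echolocation"}]))  # True
```

An annotation tagged both `Echolocation` and `Not Bat` fails the `not` branch and is excluded, which is exactly the behaviour the YAML example encodes.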
docs/source/how_to/define-target-classes.md (new file, 59 lines)
@ -0,0 +1,59 @@
# How to define target classes

Use this guide to map annotations to classification labels used during
training.

## 1) Add classification target entries

Each entry defines a class name and matching tags.

```yaml
classification_targets:
  - name: pippip
    tags:
      - key: class
        value: Pipistrellus pipistrellus
  - name: pippyg
    tags:
      - key: class
        value: Pipistrellus pygmaeus
```

## 2) Use `assign_tags` to control decoded output tags

If you want prediction output tags to differ from matching tags, set
`assign_tags` explicitly.

```yaml
classification_targets:
  - name: pipistrelle_group
    tags:
      - key: class
        value: Pipistrellus pipistrellus
    assign_tags:
      - key: genus
        value: Pipistrellus
```

## 3) Use `match_if` for complex class rules

For advanced conditions, use `match_if` instead of `tags`.

```yaml
classification_targets:
  - name: long_call
    match_if:
      name: duration
      operator: gt
      seconds: 0.02
```

## 4) Confirm class names are unique

`classification_targets.name` values must be unique.

## Related pages

- Detection-target filtering: {doc}`configure-target-definitions`
- ROI mapping: {doc}`configure-roi-mapping`
- Targets config reference: {doc}`../reference/targets-config-workflow`
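A tag-based class entry assigns an annotation to a class when all of the entry's tags are present. The sketch below illustrates one plausible resolution scheme, first matching entry wins; check the targets reference for the exact precedence rules BatDetect2 applies, since this helper (`assign_class`) is ours and not part of the library.

```python
def assign_class(annotation_tags, classification_targets):
    """Return the name of the first class whose tags are all present."""
    for target in classification_targets:
        if all(tag in annotation_tags for tag in target["tags"]):
            return target["name"]
    return None  # no class matched; only the generic detection applies


targets = [
    {"name": "pippip",
     "tags": [{"key": "class", "value": "Pipistrellus pipistrellus"}]},
    {"name": "pippyg",
     "tags": [{"key": "class", "value": "Pipistrellus pygmaeus"}]},
]
print(assign_class([{"key": "class", "value": "Pipistrellus pygmaeus"}],
                   targets))  # pippyg
```

This is also why unique class names matter: if two entries shared a name, the label the model learns would no longer identify a single matching rule.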
docs/source/how_to/import-legacy-batdetect2-annotations.md (new file, 66 lines)
@ -0,0 +1,66 @@
# How to import legacy batdetect2 annotations

Use this guide if your annotations are in older batdetect2 JSON formats.

Two legacy formats are supported:

- `batdetect2`: one annotation JSON file per recording
- `batdetect2_file`: one merged JSON file for many recordings

## 1) Choose the correct source format

Directory-based annotations (`format: batdetect2`):

```yaml
sources:
  - name: legacy_per_file
    format: batdetect2
    audio_dir: /path/to/audio
    annotations_dir: /path/to/annotation_json_dir
```

Merged annotation file (`format: batdetect2_file`):

```yaml
sources:
  - name: legacy_merged
    format: batdetect2_file
    audio_dir: /path/to/audio
    annotations_path: /path/to/merged_annotations.json
```

## 2) Set optional legacy filters

Legacy filters are based on `annotated` and `issues` flags.

```yaml
filter:
  only_annotated: true
  exclude_issues: true
```

To load all entries regardless of flags:

```yaml
filter: null
```

## 3) Validate and convert if needed

Check loaded records:

```bash
batdetect2 data summary path/to/dataset.yaml
```

Convert to annotation-set output for downstream tooling:

```bash
batdetect2 data convert path/to/dataset.yaml --output path/to/output.json
```

## 4) Continue with current workflows

- Run predictions: {doc}`run-batch-predictions`
- Train on imported data: {doc}`../tutorials/train-a-custom-model`
- Field-level reference: {doc}`../reference/data-sources`
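The two legacy filter flags amount to a simple per-record predicate. The sketch below assumes records carry boolean `annotated` and `issues` fields (inferred from the flag names, not from the actual legacy schema) and is only meant to make the filtering semantics concrete:

```python
def keep_record(record, only_annotated=True, exclude_issues=True):
    """Apply legacy-style filtering flags to one annotation record."""
    if only_annotated and not record.get("annotated", False):
        return False  # drop files that were never annotated
    if exclude_issues and record.get("issues", False):
        return False  # drop files flagged with problems
    return True


records = [
    {"id": "a.wav", "annotated": True,  "issues": False},
    {"id": "b.wav", "annotated": False, "issues": False},
    {"id": "c.wav", "annotated": True,  "issues": True},
]
kept = [r["id"] for r in records if keep_record(r)]
print(kept)  # ['a.wav']
```

Setting `filter: null` corresponds to passing both flags as `False`, which keeps all three records.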
@ -2,14 +2,16 @@
How-to guides help you complete specific tasks while working.

## Who this section is for

- Ecologists running repeat analyses.
- Python-savvy users integrating BatDetect2 into workflows.

```{toctree}
:maxdepth: 1

run-batch-predictions
tune-detection-threshold
configure-aoef-dataset
import-legacy-batdetect2-annotations
configure-audio-preprocessing
configure-spectrogram-preprocessing
configure-target-definitions
define-target-classes
configure-roi-mapping
```
@ -82,7 +82,6 @@ tutorials/index
how_to/index
reference/index
explanation/index
-legacy/index
@ -1,14 +0,0 @@
# Legacy documentation

These pages contain existing technical material that predates the Diataxis
reorganization. They remain available during migration.

```{toctree}
:maxdepth: 1

../architecture
../data/index
../preprocessing/index
../postprocessing
../targets/index
```
@ -1,126 +0,0 @@
# Postprocessing: From Model Output to Predictions

## What is Postprocessing?

After the BatDetect2 neural network analyzes a spectrogram, it doesn't directly output a neat list of bat calls.
Instead, it produces raw numerical data, usually in the form of multi-dimensional arrays or "heatmaps".
These arrays contain information like:

- The probability of a sound event being present at each time-frequency location.
- The probability of each possible target class (e.g., species) at each location.
- Predicted size characteristics (like duration and bandwidth) at each location.
- Internal learned features at each location.

**Postprocessing** is the sequence of steps that takes these numerical model outputs and translates them into a structured list of detected sound events, complete with predicted tags, bounding boxes, and confidence scores.
The {py:mod}`batdetect2.postprocess` module handles this entire workflow.

## Why is Postprocessing Necessary?

1. **Interpretation:** Raw heatmap outputs need interpretation to identify distinct sound events (detections).
   A high probability score might spread across several adjacent time-frequency bins, all related to the same call.
2. **Refinement:** Model outputs can be noisy or contain redundancies.
   Postprocessing steps like Non-Maximum Suppression (NMS) clean this up, ensuring (ideally) only one detection is reported for each actual sound event.
3. **Contextualization:** Raw outputs lack real-world units.
   Postprocessing adds back time (seconds) and frequency (Hz) coordinates, converts predicted sizes to physical units using configured scales, and decodes predicted class indices back into meaningful tags based on your target definitions.
4. **User Control:** Postprocessing includes tunable parameters, most importantly **thresholds**.
   By adjusting these, you can control the trade-off between finding more potential calls (sensitivity) versus reducing false positives (specificity) _without needing to retrain the model_.
## The Postprocessing Pipeline

BatDetect2 applies a series of steps to convert the raw model output into final predictions.
Understanding these steps helps interpret the results and configure the process effectively:

1. **Non-Maximum Suppression (NMS):**
   - **Goal:** Reduce redundant detections.
     If the model outputs high scores for several nearby points corresponding to the same call, NMS selects the single highest peak in a local neighbourhood and suppresses the others (sets their score to zero).
   - **Configurable:** The size of the neighbourhood (`nms_kernel_size`) can be adjusted.

2. **Coordinate Remapping:**
   - **Goal:** Add coordinate (time/frequency) information.
     This step takes the grid-based model outputs (which just have row/column indices) and associates them with actual time (seconds) and frequency (Hz) coordinates based on the input spectrogram's properties.
     The result is coordinate-aware arrays (using {py:class}`xarray.DataArray`).

3. **Detection Extraction:**
   - **Goal:** Identify the specific points representing detected events.
   - **Process:** Looks for peaks in the NMS-processed detection heatmap that are above a certain confidence level (`detection_threshold`).
     It also often limits the maximum number of detections based on a rate (`top_k_per_sec`) to avoid excessive outputs in very busy files.
   - **Configurable:** `detection_threshold`, `top_k_per_sec`.

4. **Data Extraction:**
   - **Goal:** Gather all relevant information for each detected point.
   - **Process:** For each time-frequency location identified in Step 3, this step looks up the corresponding values in the _other_ remapped model output arrays (class probabilities, predicted sizes, internal features).
   - **Intermediate Output 1:** The result of this stage (containing aligned scores, positions, sizes, class probabilities, and features for all detections in a clip) is often accessible programmatically as an {py:class}`xarray.Dataset`.
     This can be useful for advanced users needing direct access to the numerical outputs.

5. **Decoding & Formatting:**
   - **Goal:** Convert the extracted numerical data into interpretable, standard formats.
   - **Process:**
     - **ROI Recovery:** Uses the predicted position and size values, along with the ROI mapping configuration defined in the `targets` module, to reconstruct an estimated bounding box ({py:class}`soundevent.data.BoundingBox`).
     - **Class Decoding:** Translates the numerical class probability vector into meaningful {py:class}`soundevent.data.PredictedTag` objects.
       This involves:
       - Applying the `classification_threshold` to ignore low-confidence class scores.
       - Using the class decoding rules (from the `targets` module) to map the name(s) of the high-scoring class(es) back to standard tags (like `species: Myotis daubentonii`).
       - Optionally selecting only the top-scoring class or multiple classes above the threshold.
       - Including the generic "Bat" tags if no specific class meets the threshold.
     - **Feature Conversion:** Converts raw feature vectors into {py:class}`soundevent.data.Feature` objects.
   - **Intermediate Output 2:** This step might internally create a list of simplified `RawPrediction` objects containing the bounding box, scores, and features.
     This intermediate list might also be accessible programmatically for users who prefer a simpler structure than the final {py:mod}`soundevent` objects.

6. **Final Output (`ClipPrediction`):**
   - **Goal:** Package everything into a standard format.
   - **Process:** Collects all the fully processed `SoundEventPrediction` objects (each containing a sound event with geometry, features, overall score, and predicted tags with scores) for a given audio clip into a final {py:class}`soundevent.data.ClipPrediction` object.
     This is the standard output format representing the model's findings for that clip.
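The NMS step described above is essentially a sliding-window maximum filter: a score survives only if it is the peak of its local neighbourhood. The pure-Python sketch below illustrates that suppression rule on a tiny 2-D grid; the actual implementation operates on tensors, so treat this only as an illustration.

```python
def nms(heatmap, kernel_size=3):
    """Zero out every score that is not the maximum of its local window."""
    rows, cols = len(heatmap), len(heatmap[0])
    half = kernel_size // 2
    out = [[0.0] * cols for _ in range(rows)]
    for i in range(rows):
        for j in range(cols):
            # Collect the kernel_size x kernel_size neighbourhood (clipped at edges).
            window = [
                heatmap[r][c]
                for r in range(max(0, i - half), min(rows, i + half + 1))
                for c in range(max(0, j - half), min(cols, j + half + 1))
            ]
            if heatmap[i][j] == max(window) and heatmap[i][j] > 0:
                out[i][j] = heatmap[i][j]  # local peak survives
    return out


scores = [
    [0.1, 0.6, 0.5, 0.0],
    [0.2, 0.9, 0.4, 0.0],  # 0.9 is a local peak; its neighbours get suppressed
    [0.0, 0.3, 0.0, 0.7],
]
print(nms(scores))
```

Only the 0.9 and 0.7 peaks survive; everything adjacent to them is zeroed, which is why a larger `nms_kernel_size` merges peaks that are close together.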
## Configuring Postprocessing

You can control key aspects of this pipeline, especially the thresholds and NMS settings, via a `postprocess:` section in your main configuration YAML file.
Adjusting these **allows you to fine-tune the detection results without retraining the model**.

**Key Configurable Parameters:**

- `detection_threshold`: (Number >= 0, e.g., `0.1`) Minimum score for a peak to be considered a detection.
  **Lowering this increases sensitivity (more detections, potentially more false positives); raising it increases specificity (fewer detections, potentially missing faint calls).**
- `classification_threshold`: (Number >= 0, e.g., `0.3`) Minimum score for a _specific class_ prediction to be assigned as a tag.
  Affects how confidently the model must identify the class.
- `top_k_per_sec`: (Integer > 0, e.g., `200`) Limits the maximum density of detections reported per second.
  Helps manage extremely dense recordings.
- `nms_kernel_size`: (Integer > 0, e.g., `9`) Size of the NMS window in pixels/bins.
  Affects how close two distinct peaks can be before one suppresses the other.

**Example YAML Configuration:**

```yaml
# Inside your main configuration file (e.g., config.yaml)

postprocess:
  nms_kernel_size: 9
  detection_threshold: 0.1 # Lower threshold -> more sensitive
  classification_threshold: 0.3 # Higher threshold -> more confident classifications
  top_k_per_sec: 200

# ... other sections (preprocessing, targets, ...)
```

**Note:** These parameters can often also be adjusted via Command Line Interface (CLI) arguments when running predictions, or through function arguments if using the Python API, providing flexibility for experimentation.
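The interplay of `detection_threshold` and `top_k_per_sec` can be pictured as a two-stage filter over the NMS-surviving peaks: threshold first, then cap the count, keeping the strongest scores. This is a schematic sketch (the `extract_detections` helper is ours, not the library's code):

```python
def extract_detections(peaks, detection_threshold=0.1,
                       top_k_per_sec=200, clip_duration=1.0):
    """peaks: list of (score, time, freq) candidates after NMS."""
    budget = int(top_k_per_sec * clip_duration)  # max detections allowed
    kept = [p for p in peaks if p[0] >= detection_threshold]
    kept.sort(key=lambda p: p[0], reverse=True)  # strongest first
    return kept[:budget]


peaks = [(0.92, 0.10, 45_000), (0.05, 0.12, 33_000), (0.40, 0.31, 52_000)]
print(extract_detections(peaks, detection_threshold=0.1, top_k_per_sec=2))
```

Note that the rate cap only bites in dense recordings; with sparse detections the threshold alone decides what survives.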
## Accessing Intermediate Results

While the final `ClipPrediction` objects are the standard output, the `Postprocessor` object used internally provides methods to access results from intermediate stages (like the `xr.Dataset` after Step 4, or the list of `RawPrediction` objects after Step 5).

This can be valuable for:

- Debugging the pipeline.
- Performing custom analyses on the numerical outputs before final decoding.
- **Transfer Learning / Feature Extraction:** Directly accessing the extracted `features` (from Step 4 or 5) associated with detected events can be highly useful for training other models or further analysis.

Consult the API documentation for details on how to access these intermediate results programmatically if needed.

## Summary

Postprocessing is the conversion between neural network outputs and meaningful, interpretable sound event detections.
BatDetect2 provides a configurable pipeline including NMS, coordinate remapping, peak detection with thresholding, data extraction, and class/geometry decoding.
Researchers can easily tune key parameters like thresholds via configuration files or arguments to adjust the final set of predictions without altering the trained model itself, and advanced users can access intermediate results for custom analyses or feature reuse.
@ -1,92 +0,0 @@
# Audio Loading and Preprocessing

## Purpose

Before BatDetect2 can analyze the sounds in your recordings, the raw audio data needs to be loaded from the file and prepared.
This initial preparation involves several standard waveform processing steps.
The `audio` module handles this first stage of preprocessing.

It's crucial to understand that the _exact same_ preprocessing steps must be applied both when **training** a model and when **using** that trained model later to make predictions (inference).
Consistent preprocessing ensures the model receives data in the format it expects.

BatDetect2 allows you to control these audio preprocessing steps through settings in your main configuration file.

## The Audio Processing Pipeline

When BatDetect2 needs to process an audio segment (either a full recording or a specific clip), it follows a defined sequence of steps:

1. **Load Audio Segment:** The system first reads the specified time segment from the audio file.
   - **Note:** BatDetect2 typically works with **mono** audio.
     By default, if your file has multiple channels (e.g., stereo), only the **first channel** is loaded and used for subsequent processing.
2. **Adjust Duration (Optional):** If you've specified a target duration in your configuration, the loaded audio segment is either shortened (by cropping from the start) or lengthened (by adding silence, i.e., zeros, at the end) to match that exact duration.
   This is sometimes required by specific model architectures that expect fixed-size inputs.
   By default, this step is **off**, and the original clip duration is used.
3. **Resample (Optional):** If configured (and usually **on** by default), the audio's sample rate is changed to a specific target value (e.g., 256,000 Hz).
   This is vital for standardizing the data, as different recording devices capture audio at different rates.
   The model needs to be trained and run on data with a consistent sample rate.
4. **Center Waveform (Optional):** If configured (and typically **on** by default), the system removes any constant shift away from zero in the waveform (known as DC offset).
   This is a standard practice that can sometimes improve the quality of later signal processing steps.
5. **Scale Amplitude (Optional):** If configured (typically **off** by default), the waveform's amplitude (loudness) is adjusted.
   The standard method used here is "peak normalization," which scales the entire clip so that the loudest point has an absolute value of 1.0.
   This can help standardize volume levels across different recordings, although it's not always necessary or desirable depending on your analysis goals.
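Steps 4 and 5 above (centering and peak normalization) are simple per-sample arithmetic. The stdlib sketch below shows that arithmetic on plain lists; BatDetect2 applies the equivalent operations on arrays, so this is only a conceptual illustration:

```python
def center(waveform):
    """Remove DC offset by subtracting the mean sample value."""
    mean = sum(waveform) / len(waveform)
    return [s - mean for s in waveform]


def peak_normalize(waveform):
    """Scale so the loudest sample has absolute value 1.0."""
    peak = max(abs(s) for s in waveform)
    return [s / peak for s in waveform] if peak > 0 else list(waveform)


wave = [0.5, 1.0, 0.5, 0.0]            # waveform with a constant +0.5 offset
centered = center(wave)                 # [0.0, 0.5, 0.0, -0.5]
normalized = peak_normalize(centered)   # [0.0, 1.0, 0.0, -1.0]
```

The guard for an all-silent clip (peak of zero) matters in practice: padded or empty segments would otherwise trigger a division by zero.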
## Configuring Audio Processing

You can control these steps via settings in your main configuration file (e.g., `config.yaml`), usually within a dedicated `audio:` section (which might itself be under a broader `preprocessing:` section).

Here are the key options you can set:

- **Resampling (`resample`)**:
  - To enable resampling (recommended and usually default), include a `resample:` block.
    To disable it completely, you might set `resample: null` or omit the block.
  - `samplerate`: (Number) The target sample rate in Hertz (Hz) that all audio will be converted to.
    This **must** match the sample rate expected by the BatDetect2 model you are using or training (e.g., `samplerate: 256000`).
  - `mode`: (Text, `"poly"` or `"fourier"`) The underlying algorithm used for resampling.
    The default `"poly"` is generally a good choice.
    You typically don't need to change this unless you have specific reasons.

- **Duration (`duration`)**:
  - (Number or `null`) Sets a fixed duration for all audio clips in **seconds**.
    If set (e.g., `duration: 4.0`), shorter clips are padded with silence, and longer clips are cropped.
    If `null` (default), the original clip duration is used.

- **Centering (`center`)**:
  - (Boolean, `true` or `false`) Controls DC offset removal.
    Default is usually `true`.
    Set to `false` to disable.

- **Scaling (`scale`)**:
  - (Boolean, `true` or `false`) Controls peak amplitude normalization.
    Default is usually `false`.
    Set to `true` to enable scaling so the maximum absolute amplitude becomes 1.0.

**Example YAML Configuration:**

```yaml
# Inside your main configuration file (e.g., training_config.yaml)

preprocessing: # Or this might be at the top level
  audio:
    # --- Resampling Settings ---
    resample: # Settings block to control resampling
      samplerate: 256000 # Target sample rate in Hz (required if resampling)
      mode: poly # Algorithm ('poly' or 'fourier', optional, defaults to 'poly')
    # To disable resampling entirely, you might use:
    # resample: null

    # --- Other Settings ---
    duration: null # Keep original clip duration (e.g., use 4.0 for 4 seconds)
    center: true # Remove DC offset (default is often true)
    scale: false # Do not normalize peak amplitude (default is often false)

# ... other configuration sections (like model, dataset, targets) ...
```
## Outcome

After these steps, the output is a standardized audio waveform (represented as a numerical array with time information).
This processed waveform is now ready for the next stage of preprocessing, which typically involves calculating the spectrogram (covered in the next module/section).
Ensuring these audio preprocessing settings are consistent is fundamental for achieving reliable results in both training and inference.
@ -1,46 +0,0 @@
|
|||||||
# Preprocessing Audio for BatDetect2
|
|
||||||
|
|
||||||
## What is Preprocessing?
|
|
||||||
|
|
||||||
Preprocessing refers to the steps taken to transform your raw audio recordings into a standardized format suitable for analysis by the BatDetect2 deep learning model.
|
|
||||||
This module (`batdetect2.preprocessing`) provides the tools to perform these transformations.
|
|
||||||
|
|
||||||
## Why is Preprocessing Important?
|
|
||||||
|
|
||||||
Applying a consistent preprocessing pipeline is important for several reasons:
|
|
||||||
|
|
||||||
1. **Standardization:** Audio recordings vary significantly depending on the equipment used, recording conditions, and settings (e.g., different sample rates, varying loudness levels, background noise).
|
|
||||||
Preprocessing helps standardize these aspects, making the data more uniform and allowing the model to learn relevant patterns more effectively.
|
|
||||||
2. **Model Requirements:** Deep learning models, particularly those like BatDetect2 that analyze 2D-patterns in spectrograms, are designed to work with specific input characteristics.
|
|
||||||
They often expect spectrograms of a certain size (time x frequency bins), with values represented on a particular scale (e.g., logarithmic/dB), and within a defined frequency range.
|
|
||||||
Preprocessing ensures the data meets these requirements.
|
|
||||||
3. **Consistency is Key:** Perhaps most importantly, the **exact same preprocessing steps** must be applied both when _training_ the model and when _using the trained model to make predictions_ (inference) on new data.
|
|
||||||
Any discrepancy between the preprocessing used during training and inference can significantly degrade the model's performance and lead to unreliable results.
|
|
||||||
BatDetect2's configurable pipeline ensures this consistency.
|
|
||||||
|
|
||||||
## How Preprocessing is Done in BatDetect2
|
|
||||||
|
|
||||||
BatDetect2 handles preprocessing through a configurable, two-stage pipeline:
|
|
||||||
|
|
||||||
1. **Audio Loading & Preparation:** This first stage deals with the raw audio waveform.
|
|
||||||
It involves loading the specified audio segment (from a file or clip), selecting a single channel (mono), optionally resampling it to a consistent sample rate, optionally adjusting its duration, and applying basic waveform conditioning like centering (DC offset removal) and amplitude scaling.
|
|
||||||
(Details in the {doc}`audio` section).
|
|
||||||
2. **Spectrogram Generation:** The prepared audio waveform is then converted into a spectrogram.
|
|
||||||
This involves calculating the Short-Time Fourier Transform (STFT) and then applying a series of configurable steps like cropping the frequency range, applying amplitude representations (like dB scale or PCEN), optional denoising, optional resizing to the model's required dimensions, and optional final normalization.
|
|
||||||
(Details in the {doc}`spectrogram` section).
|
|
||||||
|
|
||||||
The entire pipeline is controlled via settings in your main configuration file (typically a YAML file), usually grouped under a `preprocessing:` section which contains subsections like `audio:` and `spectrogram:`.
|
|
||||||
This allows you to easily define, share, and reproduce the exact preprocessing used for a specific model or experiment.
|
|
||||||
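As a minimal sketch, that configuration layout looks like this (subsection contents elided; see the dedicated sections for the actual fields):

```yaml
# Skeleton of the two-stage preprocessing configuration.
preprocessing:
  audio:
    # ... audio loading & preparation settings (stage 1) ...
  spectrogram:
    # ... spectrogram generation settings (stage 2) ...
```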

## Next Steps

Explore the following sections for detailed explanations of how to configure each stage of the preprocessing pipeline and how to use the resulting preprocessor:

```{toctree}
:maxdepth: 1
:caption: Preprocessing Steps:

audio
spectrogram
usage
```
@ -1,141 +0,0 @@
|
|||||||
# Spectrogram Generation
|
|
||||||
|
|
||||||
## Purpose
|
|
||||||
|
|
||||||
After loading and performing initial processing on the audio waveform (as described in the Audio Loading section), the next crucial step in the `preprocessing` pipeline is to convert that waveform into a **spectrogram**.
|
|
||||||
A spectrogram is a visual representation of sound, showing frequency content over time, and it's the primary input format for many deep learning models, including BatDetect2.
|
|
||||||
|
|
||||||
This module handles the calculation and subsequent processing of the spectrogram.
|
|
||||||
Just like the audio processing, these steps need to be applied **consistently** during both model training and later use (inference) to ensure the model performs reliably.
|
|
||||||
You control this entire process through the configuration file.
|
|
||||||
|
|
||||||
## The Spectrogram Generation Pipeline
|
|
||||||
|
|
||||||
Once BatDetect2 has a prepared audio waveform, it follows these steps to create the final spectrogram input for the model:
|
|
||||||
|
|
||||||
1. **Calculate STFT (Short-Time Fourier Transform):** This is the fundamental step that converts the 1D audio waveform into a 2D time-frequency representation.
|
|
||||||
It calculates the frequency content within short, overlapping time windows.
|
|
||||||
The output is typically a **magnitude spectrogram**, showing the intensity (amplitude) of different frequencies at different times.
|
|
||||||
Key parameters here are the `window_duration` and `window_overlap`, which affect the trade-off between time and frequency resolution.
|
|
||||||
2. **Crop Frequencies:** The STFT often produces frequency information over a very wide range (e.g., 0 Hz up to half the sample rate).
|
|
||||||
This step crops the spectrogram to focus only on the frequency range relevant to your target sounds (e.g., 10 kHz to 120 kHz for typical bat echolocation).
|
|
||||||
3. **Apply PCEN (Optional):** If configured, Per-Channel Energy Normalization is applied.
|
|
||||||
PCEN is an adaptive technique that adjusts the gain (loudness) in each frequency channel based on its recent history.
|
|
||||||
It can help suppress stationary background noise and enhance the prominence of transient sounds like echolocation pulses.
|
|
||||||
This step is optional.
|
|
||||||
4. **Set Amplitude Scale / Representation:** The values in the spectrogram (either raw magnitude or post-PCEN values) need to be represented on a suitable scale.
|
|
||||||
You choose one of the following:
|
|
||||||
- `"amplitude"`: Use the linear magnitude values directly.
|
|
||||||
(Default)
|
|
||||||
- `"power"`: Use the squared magnitude values (representing energy).
|
|
||||||
- `"dB"`: Apply a logarithmic transformation (specifically `log(1 + C*Magnitude)`).
|
|
||||||
This compresses the range of values, often making variations in quieter sounds more apparent, similar to how humans perceive loudness.
|
|
||||||
5. **Denoise (Optional):** If configured (and usually **on** by default), a simple noise reduction technique is applied.
|
|
||||||
This method subtracts the average value of each frequency bin (calculated across time) from that bin, assuming the average represents steady background noise.
|
|
||||||
Negative values after subtraction are clipped to zero.
|
|
||||||
6. **Resize (Optional):** If configured, the dimensions (height/frequency bins and width/time bins) of the spectrogram are adjusted using interpolation to match the exact input size expected by the neural network architecture.
|
|
||||||
7. **Peak Normalize (Optional):** If configured (typically **off** by default), the entire final spectrogram is scaled so that its highest value is exactly 1.0.
|
|
||||||
This ensures all spectrograms fed to the model have a consistent maximum value, which can sometimes aid training stability.
|
|
||||||
|
|
||||||
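The numbered steps above can be sketched with NumPy/SciPy as follows (an illustrative approximation, not BatDetect2's actual implementation; the window settings and the compression constant `C` are assumptions):

```python
import numpy as np
from scipy.signal import stft

samplerate = 256_000
wav = np.random.randn(samplerate)  # 1 s of dummy audio

# 1. STFT: 2 ms window with 75% overlap.
nperseg = int(0.002 * samplerate)   # window_duration
noverlap = int(nperseg * 0.75)      # window_overlap
freqs, times, Z = stft(wav, fs=samplerate, nperseg=nperseg, noverlap=noverlap)
spec = np.abs(Z)                    # magnitude spectrogram

# 2. Crop to the band of interest (10-120 kHz).
mask = (freqs >= 10_000) & (freqs <= 120_000)
spec = spec[mask]

# 4. "dB"-style log compression: log(1 + C * magnitude).
C = 200.0  # illustrative compression constant
spec = np.log1p(C * spec)

# 5. Denoise: subtract each frequency bin's mean over time, clip at zero.
spec = np.clip(spec - spec.mean(axis=1, keepdims=True), 0, None)

print(spec.shape)  # (frequency_bins, time_bins)
```

Resizing and peak normalization (steps 6-7) would follow the same array, e.g. via interpolation and division by `spec.max()`.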

## Configuring Spectrogram Generation

You control all these steps via settings in your main configuration file (e.g., `config.yaml`), within the `spectrogram:` section (usually located under the main `preprocessing:` section).

Here are the key configuration options:

- **STFT Settings (`stft`)**:
  - `window_duration`: (Number, seconds, e.g., `0.002`) Length of the analysis window.
  - `window_overlap`: (Number, 0.0 to <1.0, e.g., `0.75`) Fractional overlap between windows.
  - `window_fn`: (Text, e.g., `"hann"`) Name of the windowing function.
- **Frequency Cropping (`frequencies`)**:
  - `min_freq`: (Integer, Hz, e.g., `10000`) Minimum frequency to keep.
  - `max_freq`: (Integer, Hz, e.g., `120000`) Maximum frequency to keep.
- **PCEN (`pcen`)**:
  - This entire section is **optional**. Include it only if you want to apply PCEN. If omitted or set to `null`, PCEN is skipped.
  - `time_constant`: (Number, seconds, e.g., `0.4`) Controls adaptation speed.
  - `gain`: (Number, e.g., `0.98`) Gain factor.
  - `bias`: (Number, e.g., `2.0`) Bias factor.
  - `power`: (Number, e.g., `0.5`) Compression exponent.
- **Amplitude Scale (`scale`)**:
  - (Text: `"dB"`, `"power"`, or `"amplitude"`) Selects the final representation of the spectrogram values. Default is `"amplitude"`.
- **Denoising (`spectral_mean_substraction`)**:
  - (Boolean: `true` or `false`) Enables/disables the spectral mean subtraction denoising step. Default is usually `true`.
- **Resizing (`size`)**:
  - This entire section is **optional**. Include it only if you need to resize the spectrogram to specific dimensions required by the model. If omitted or set to `null`, no resizing occurs after frequency cropping.
  - `height`: (Integer, e.g., `128`) Target number of frequency bins.
  - `resize_factor`: (Number or `null`, e.g., `0.5`) Factor to scale the time dimension by. `0.5` halves the width; `null` or `1.0` keeps the original width.
- **Peak Normalization (`peak_normalize`)**:
  - (Boolean: `true` or `false`) Enables/disables final scaling of the entire spectrogram so the maximum value is 1.0. Default is usually `false`.

**Example YAML Configuration:**

```yaml
# Inside your main configuration file

preprocessing:
  audio:
    # ... (your audio configuration settings) ...
    resample:
      samplerate: 256000 # Ensure this matches model needs

  spectrogram:
    # --- STFT Parameters ---
    stft:
      window_duration: 0.002 # 2 ms window
      window_overlap: 0.75 # 75% overlap
      window_fn: hann

    # --- Frequency Range ---
    frequencies:
      min_freq: 10000 # 10 kHz
      max_freq: 120000 # 120 kHz

    # --- PCEN (Optional) ---
    # Include this block to enable PCEN; omit or set to null to disable.
    pcen:
      time_constant: 0.4
      gain: 0.98
      bias: 2.0
      power: 0.5

    # --- Final Amplitude Representation ---
    scale: dB # Choose 'dB', 'power', or 'amplitude'

    # --- Denoising ---
    spectral_mean_substraction: true # Enable spectral mean subtraction

    # --- Resizing (Optional) ---
    # Include this block to resize; omit or set to null to disable.
    size:
      height: 128 # Target height in frequency bins
      resize_factor: 0.5 # Halve the number of time bins

    # --- Final Normalization ---
    peak_normalize: false # Do not scale max value to 1.0
```

## Outcome

The output of this module is the final, processed spectrogram (a 2D numerical array with time and frequency information).
This spectrogram is now in the precise format expected by the BatDetect2 neural network, ready to be used for training the model or for making predictions on new data.
Remember, using the exact same `spectrogram` configuration settings during training and inference is essential for correct model performance.
@ -1,175 +0,0 @@
|
|||||||
# Using Preprocessors in BatDetect2
|
|
||||||
|
|
||||||
## Overview
|
|
||||||
|
|
||||||
In the previous sections ({doc}`audio`and {doc}`spectrogram`), we discussed the individual steps involved in converting raw audio into a processed spectrogram suitable for BatDetect2 models, and how to configure these steps using YAML files (specifically the `audio:` and `spectrogram:` sections within a main `preprocessing:` configuration block).
|
|
||||||
|
|
||||||
This page focuses on how this configured pipeline is represented and used within BatDetect2, primarily through the concept of a **`Preprocessor`** object.
|
|
||||||
This object bundles together your chosen audio loading settings and spectrogram generation settings into a single component that can perform the end-to-end processing.
|
|
||||||
|
|
||||||
## Do I Need to Interact with Preprocessors Directly?
|
|
||||||
|
|
||||||
**Usually, no.** For standard model training or running inference with BatDetect2 using the provided scripts, the system will automatically:
|
|
||||||
|
|
||||||
1. Read your main configuration file (e.g., `config.yaml`).
|
|
||||||
2. Find the `preprocessing:` section (containing `audio:` and `spectrogram:` settings).
|
|
||||||
3. Build the appropriate `Preprocessor` object internally based on your settings.
|
|
||||||
4. Use that internal `Preprocessor` object automatically whenever audio needs to be loaded and converted to a spectrogram.
|
|
||||||
|
|
||||||
**However**, understanding the `Preprocessor` object is useful if you want to:
|
|
||||||
|
|
||||||
- **Customize:** Go beyond the standard configuration options by interacting with parts of the pipeline programmatically.
|
|
||||||
- **Integrate:** Use BatDetect2's preprocessing steps within your own custom Python analysis scripts.
|
|
||||||
- **Inspect/Debug:** Manually apply preprocessing steps to specific files or clips to examine intermediate outputs (like the processed waveform) or the final spectrogram.
|
|
||||||
|
|
||||||
## Getting a Preprocessor Object
|
|
||||||
|
|
||||||
If you _do_ want to work with the preprocessor programmatically, you first need to get an instance of it.
|
|
||||||
This is typically done based on a configuration:
|
|
||||||
|
|
||||||

1. **Define Configuration:** Create your `preprocessing:` configuration, usually in a YAML file (let's call it `preprocess_config.yaml`), detailing your desired `audio` and `spectrogram` settings.

   ```yaml
   # preprocess_config.yaml
   audio:
     resample:
       samplerate: 256000
     # ... other audio settings ...
   spectrogram:
     frequencies:
       min_freq: 15000
       max_freq: 120000
     scale: dB
     # ... other spectrogram settings ...
   ```

2. **Load Configuration & Build Preprocessor (in Python):**

   ```python
   from batdetect2.preprocessing import load_preprocessing_config, build_preprocessor
   from batdetect2.preprocess.types import Preprocessor  # Import the type

   # Load the configuration from the file
   config_path = "path/to/your/preprocess_config.yaml"
   preprocessing_config = load_preprocessing_config(config_path)

   # Build the actual preprocessor object using the loaded config
   preprocessor: Preprocessor = build_preprocessor(preprocessing_config)

   # 'preprocessor' is now ready to use!
   ```

3. **Using Defaults:** If you just want the standard BatDetect2 default preprocessing settings, you can build one without loading a config file:

   ```python
   from batdetect2.preprocessing import build_preprocessor
   from batdetect2.preprocess.types import Preprocessor

   # Build with default settings
   default_preprocessor: Preprocessor = build_preprocessor()
   ```

## Applying Preprocessing

Once you have a `preprocessor` object, you can use its methods to process audio data:

**1. End-to-End Processing (Common Use Case):**

These methods take an audio source identifier (file path, Recording object, or Clip object) and return the final, processed spectrogram.

- `preprocessor.preprocess_file(path)`: Processes an entire audio file.
- `preprocessor.preprocess_recording(recording_obj)`: Processes the entire audio associated with a `soundevent.data.Recording` object.
- `preprocessor.preprocess_clip(clip_obj)`: Processes only the specific time segment defined by a `soundevent.data.Clip` object.
- **Efficiency Note:** Using `preprocess_clip` is **highly recommended** when you are only interested in analyzing a small portion of a potentially long recording.
  It avoids loading the entire audio file into memory, making it much more efficient.

```python
from soundevent import data

# Assume 'preprocessor' is built as shown before
# Assume 'my_clip' is a soundevent.data.Clip object defining a segment

# Process an entire file
spectrogram_from_file = preprocessor.preprocess_file("my_recording.wav")

# Process only a specific clip (more efficient for segments)
spectrogram_from_clip = preprocessor.preprocess_clip(my_clip)

# The results (spectrogram_from_file, spectrogram_from_clip) are xr.DataArrays
print(type(spectrogram_from_clip))
# Output: <class 'xarray.core.dataarray.DataArray'>
```

**2. Intermediate Steps (Advanced Use Cases):**

The preprocessor also allows access to intermediate stages if needed:

- `preprocessor.load_clip_audio(clip_obj)` (and similar for file/recording): Loads the audio and applies _only_ the waveform processing steps (resampling, centering, etc.) defined in the `audio` config.
  Returns the processed waveform as an `xr.DataArray`.
  This is useful if you want to analyze or manipulate the waveform itself before spectrogram generation.
- `preprocessor.compute_spectrogram(waveform)`: Takes an _already loaded_ waveform (either `np.ndarray` or `xr.DataArray`) and applies _only_ the spectrogram generation steps defined in the `spectrogram` config.
  - If you provide an `xr.DataArray` (e.g., from `load_clip_audio`), it uses the sample rate from the array's coordinates.
  - If you provide a raw `np.ndarray`, it **must assume a sample rate**.
    It uses the `default_samplerate` that was determined when the `preprocessor` was built (based on your `audio` config's resample settings or the global default).
    Be cautious when using NumPy arrays to ensure the sample rate assumption is correct for your data!

```python
# Example: Get waveform first, then spectrogram
waveform = preprocessor.load_clip_audio(my_clip)
# waveform is an xr.DataArray

# ...potentially do other things with the waveform...

# Compute spectrogram from the loaded waveform
spectrogram = preprocessor.compute_spectrogram(waveform)

# Example: Process external numpy array (use with caution re: sample rate)
# import soundfile as sf  # Requires installing soundfile
# numpy_waveform, original_sr = sf.read("some_other_audio.wav")
# # MUST ensure numpy_waveform's actual sample rate matches
# # preprocessor.default_samplerate for correct results here!
# spec_from_numpy = preprocessor.compute_spectrogram(numpy_waveform)
```

## Understanding the Output: `xarray.DataArray`

All preprocessing methods return the final spectrogram (or the intermediate waveform) as an **`xarray.DataArray`**.

**What is it?** Think of it like a standard NumPy array (holding the numerical data of the spectrogram) but with added "superpowers":

- **Labeled Dimensions:** Instead of just having axis 0 and axis 1, the dimensions have names, typically `"frequency"` and `"time"`.
- **Coordinates:** It stores the actual frequency values (e.g., in Hz) corresponding to each row and the actual time values (e.g., in seconds) corresponding to each column along the dimensions.

**Why is it used?**

- **Clarity:** The data is self-describing.
  You don't need to remember which axis is time and which is frequency, or what the units are – it's stored with the data.
- **Convenience:** You can select, slice, or plot data using the real-world coordinate values (times, frequencies) instead of just numerical indices.
  This makes analysis code easier to write and less prone to errors.
- **Metadata:** It can hold additional metadata about the processing steps in its `attrs` (attributes) dictionary.

**Using the Output:**

- **Input to Model:** For standard training or inference, you typically pass this `xr.DataArray` spectrogram directly to the BatDetect2 model functions.
- **Inspection/Analysis:** If you're working programmatically, you can use xarray's powerful features.
  For example (these are just illustrations of xarray):

```python
# Get the shape (frequency_bins, time_bins)
# print(spectrogram.shape)

# Get the frequency coordinate values
# print(spectrogram['frequency'].values)

# Select data near a specific time and frequency
# value_at_point = spectrogram.sel(time=0.5, frequency=50000, method="nearest")
# print(value_at_point)

# Select a time slice between 0.2 and 0.3 seconds
# time_slice = spectrogram.sel(time=slice(0.2, 0.3))
# print(time_slice.shape)
```

In summary, while BatDetect2 often handles preprocessing automatically based on your configuration, the underlying `Preprocessor` object provides a flexible interface for applying these steps programmatically if needed, returning results in the convenient and informative `xarray.DataArray` format.
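These xarray operations can be run end-to-end on a dummy spectrogram (assuming `xarray` and NumPy are installed; the data and coordinates below are fabricated, not produced by BatDetect2):

```python
import numpy as np
import xarray as xr

# Build a dummy spectrogram: 128 frequency bins x 100 time bins.
freqs = np.linspace(10_000, 120_000, 128)
times = np.linspace(0.0, 1.0, 100)
spectrogram = xr.DataArray(
    np.random.rand(128, 100),
    coords={"frequency": freqs, "time": times},
    dims=("frequency", "time"),
)

# Select by real-world coordinates instead of integer indices.
value = spectrogram.sel(time=0.5, frequency=50_000, method="nearest")
window = spectrogram.sel(time=slice(0.2, 0.3))

print(spectrogram.shape)  # (128, 100)
print(window.sizes["time"])
```

Note how `sel` with `slice(0.2, 0.3)` picks time bins by their coordinate values in seconds, which is exactly why the labeled format is convenient.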
docs/source/reference/data-sources.md (new file)
@ -0,0 +1,76 @@

# Data source reference

This page summarizes dataset source formats and their config fields.

## Supported source formats

| Format | Description |
| --- | --- |
| `aoef` | AOEF/soundevent annotation files (`AnnotationSet` or `AnnotationProject`) |
| `batdetect2` | Legacy format with one JSON annotation file per recording |
| `batdetect2_file` | Legacy format with one merged JSON annotation file |

## AOEF (`format: aoef`)

Required fields:

- `name`
- `format`
- `audio_dir`
- `annotations_path`

Optional fields:

- `description`
- `filter`

`filter` is only used when `annotations_path` points to an `AnnotationProject`.

AOEF filter options:

- `only_completed` (default: `true`)
- `only_verified` (default: `false`)
- `exclude_issues` (default: `true`)

Use `filter: null` to disable project filtering.

## Legacy per-file (`format: batdetect2`)

Required fields:

- `name`
- `format`
- `audio_dir`
- `annotations_dir`

Optional fields:

- `description`
- `filter`

## Legacy merged file (`format: batdetect2_file`)

Required fields:

- `name`
- `format`
- `audio_dir`
- `annotations_path`

Optional fields:

- `description`
- `filter`

Legacy filter options:

- `only_annotated` (default: `true`)
- `exclude_issues` (default: `true`)

Use `filter: null` to disable filtering.

## Related guides

- {doc}`../how_to/configure-aoef-dataset`
- {doc}`../how_to/import-legacy-batdetect2-annotations`
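Putting the `aoef` fields together, a hedged sketch of one source entry (the surrounding list context, dataset name, and paths are illustrative assumptions):

```yaml
- name: example-aoef-dataset   # hypothetical dataset name
  format: aoef
  audio_dir: data/audio
  annotations_path: data/annotations/project.aoef
  description: Example AOEF-annotated dataset  # optional
  filter:                      # only used for AnnotationProject files
    only_completed: true
    only_verified: false
    exclude_issues: true
```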
@ -7,6 +7,10 @@ configuration, and data structures.
:maxdepth: 1

cli/index
data-sources
preprocessing-config
postprocess-config
targets-config-workflow
configs
targets
```
docs/source/reference/postprocess-config.md (new file)
@ -0,0 +1,31 @@

# Postprocess config reference

`PostprocessConfig` controls how raw detector outputs are converted into final detections.

Defined in `batdetect2.postprocess.config`.

## Fields

- `nms_kernel_size` (int > 0): neighborhood size for non-maximum suppression.
- `detection_threshold` (float >= 0): minimum detection score to keep a candidate event.
- `classification_threshold` (float >= 0): minimum class score used when assigning class tags.
- `top_k_per_sec` (int > 0): maximum detection density per second.

## Defaults

- `detection_threshold`: `0.01`
- `classification_threshold`: `0.1`
- `top_k_per_sec`: `100`

`nms_kernel_size` defaults to the library constant used by the NMS module.

## Related pages

- Threshold behaviour: {doc}`../explanation/postprocessing-and-thresholds`
- Threshold tuning workflow: {doc}`../how_to/tune-detection-threshold`
- CLI predict options: {doc}`cli/predict`
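As an illustration of how these fields interact, here is a hedged NumPy/SciPy sketch of heatmap postprocessing (not BatDetect2's actual code; the heatmap, function, and clip duration are fabricated):

```python
import numpy as np
from scipy.ndimage import maximum_filter


def postprocess(heatmap, nms_kernel_size=9, detection_threshold=0.01,
                top_k_per_sec=100, clip_duration_s=1.0):
    # NMS: keep only cells that are the maximum of their neighborhood.
    local_max = maximum_filter(heatmap, size=nms_kernel_size) == heatmap
    scores = np.where(local_max, heatmap, 0.0)
    # Thresholding: drop candidates below detection_threshold.
    ys, xs = np.nonzero(scores >= detection_threshold)
    order = np.argsort(scores[ys, xs])[::-1]
    # Density cap: at most top_k_per_sec detections per second of audio.
    k = int(top_k_per_sec * clip_duration_s)
    return list(zip(ys[order][:k], xs[order][:k]))


heatmap = np.random.rand(64, 512) * 0.5  # dummy detection heatmap
detections = postprocess(heatmap)
print(len(detections) <= 100)  # True
```

The real pipeline additionally decodes bounding-box sizes and class tags for each kept peak; this sketch only shows the score-side filtering.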
docs/source/reference/preprocessing-config.md (new file)
@ -0,0 +1,61 @@

# Preprocessing config reference

This page summarizes preprocessing-related config objects used by batdetect2.

## Audio loader config (`AudioConfig`)

Defined in `batdetect2.audio.loader`.

Fields:

- `samplerate` (int): target audio sample rate in Hz.
- `resample.enabled` (bool): whether to resample loaded audio.
- `resample.method` (`poly` or `fourier`): resampling method.

## Model preprocessing config (`PreprocessingConfig`)

Defined in `batdetect2.preprocess.config`.

Top-level fields:

- `audio_transforms`: ordered waveform transforms.
- `stft`: STFT parameters.
- `frequencies`: spectrogram frequency range.
- `spectrogram_transforms`: ordered spectrogram transforms.
- `size`: final resize settings.

### `audio_transforms` built-ins

- `center_audio`
- `scale_audio`
- `fix_duration` (`duration` in seconds)

### `stft` fields

- `window_duration`
- `window_overlap`
- `window_fn`

### `frequencies` fields

- `min_freq`
- `max_freq`

### `spectrogram_transforms` built-ins

- `pcen`
- `scale_amplitude` (`scale: db|power`)
- `spectral_mean_subtraction`
- `peak_normalize`

### `size` fields

- `height`
- `resize_factor`

## Related pages

- Audio preprocessing how-to: {doc}`../how_to/configure-audio-preprocessing`
- Spectrogram preprocessing how-to: {doc}`../how_to/configure-spectrogram-preprocessing`
- Why consistency matters: {doc}`../explanation/preprocessing-consistency`
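A hedged YAML sketch combining the `PreprocessingConfig` fields listed above (the exact nesting of the transform lists and all values are illustrative assumptions):

```yaml
preprocessing:
  audio_transforms:            # ordered waveform transforms
    - name: center_audio
    - name: fix_duration
      duration: 1.0            # seconds
  stft:
    window_duration: 0.002
    window_overlap: 0.75
    window_fn: hann
  frequencies:
    min_freq: 10000
    max_freq: 120000
  spectrogram_transforms:      # ordered spectrogram transforms
    - name: pcen
    - name: spectral_mean_subtraction
  size:
    height: 128
    resize_factor: 0.5
```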
docs/source/reference/targets-config-workflow.md (new file)
@ -0,0 +1,61 @@

# Targets config workflow reference

This page summarizes the target-definition configuration used by batdetect2.

## `TargetConfig`

Defined in `batdetect2.targets.config`.

Fields:

- `detection_target`: one `TargetClassConfig` defining detection eligibility.
- `classification_targets`: list of `TargetClassConfig` entries for class encoding/decoding.
- `roi`: default ROI mapper config.

## `TargetClassConfig`

Defined in `batdetect2.targets.classes`.

Fields:

- `name`: class label name.
- `tags`: tag list used for matching (shortcut for `match_if`).
- `match_if`: explicit condition config (`match_if` is accepted as alias).
- `assign_tags`: tags used when decoding this class.
- `roi`: optional class-specific ROI mapper override.

`tags` and `match_if` are mutually exclusive.

## Supported condition config types

Built from `batdetect2.data.conditions`.

- `has_tag`
- `has_all_tags`
- `has_any_tag`
- `duration`
- `frequency`
- `all_of`
- `any_of`
- `not`

## ROI mapper config

`roi` supports built-in mappers including:

- `anchor_bbox`
- `peak_energy_bbox`

Key `anchor_bbox` fields:

- `anchor`
- `time_scale`
- `frequency_scale`

## Related pages

- Detection target setup: {doc}`../how_to/configure-target-definitions`
- Class setup: {doc}`../how_to/define-target-classes`
- ROI setup: {doc}`../how_to/configure-roi-mapping`
- Concept overview: {doc}`../explanation/target-encoding-and-decoding`
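A hedged sketch of a targets config assembled from the fields above (class names, tag keys/values, and the exact condition nesting are illustrative assumptions, not the library's prescribed shape):

```yaml
targets:
  detection_target:
    name: bat                  # anything matching this is a detection
    match_if:
      has_any_tag:             # one of the supported condition types
        tags:
          - key: event
            value: Echolocation
  classification_targets:
    - name: pippip
      tags:                    # shortcut for match_if
        - key: species
          value: Pipistrellus pipistrellus
      roi:
        anchor_bbox:           # built-in ROI mapper
          anchor: bottom-left
```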
@ -1,141 +0,0 @@
# Step 4: Defining Target Classes and Decoding Rules

## Purpose and Context

You've prepared your data by defining your annotation vocabulary (Step 1: Terms), removing irrelevant sounds (Step 2: Filtering), and potentially cleaning up or modifying tags (Step 3: Transforming Tags).
Now, it's time for a crucial step with two related goals:

1. Telling `batdetect2` **exactly what categories (classes) your model should learn to identify** by defining rules that map annotation tags to class names (like `pippip`, `myodau`, or `noise`).
   This process is often called **encoding**.
2. Defining how the model's predictions (those same class names) should be translated back into meaningful, structured **annotation tags** when you use the trained model.
   This is often called **decoding**.

These definitions are essential for both training the model correctly and interpreting its output later.

## How it Works: Defining Classes with Rules

You define your target classes and their corresponding decoding rules in your main configuration file (e.g., your `.yaml` training config), typically under a section named `classes`.
This section contains:

1. A **list** of specific class definitions.
2. A definition for the **generic class** tags.

Each item in the `classes` list defines one specific class your model should learn.

## Defining a Single Class

Each specific class definition rule requires the following information:

1. `name`: **(Required)** This is the unique, simple name for this class (e.g., `pipistrellus_pipistrellus`, `myotis_daubentonii`, `noise`).
   This label is used during training and is what the model predicts.
   Choose clear, distinct names.
   **Each class name must be unique.**
2. `tags`: **(Required)** This list contains one or more specific tags (using `key` and `value`) used to identify if an _existing_ annotation belongs to this class during the _encoding_ phase (preparing training data).
3. `match_type`: **(Optional, defaults to `"all"`)** Determines how the `tags` list is evaluated during _encoding_:
   - `"all"`: The annotation must have **ALL** listed tags to match (the default).
   - `"any"`: The annotation needs **AT LEAST ONE** listed tag to match.
4. `output_tags`: **(Optional)** This list specifies the tags that should be assigned to an annotation when the model _predicts_ this class `name`.
   This is used during the _decoding_ phase (interpreting model output).
   - **If you omit `output_tags` (or set it to `null`/`~`), the system will default to using the same tags listed in the `tags` field for decoding.** This is often what you want.
   - Providing `output_tags` allows you to specify a different, potentially more canonical or detailed, set of tags to represent the class upon prediction.
     For example, you could match based on simplified tags but output standardized tags.

**Example: Defining Species Classes (Encoding & Default Decoding)**

Here, the `tags` used for matching during encoding will also be used for decoding, as `output_tags` is omitted.

```yaml
# In your main configuration file
classes:
  # Definition for the first class
  - name: pippip # Simple name for Pipistrellus pipistrellus
    tags: # Used for BOTH encoding match and decoding output
      - key: species
        value: Pipistrellus pipistrellus
    # match_type defaults to "all"
    # output_tags is omitted, defaults to using 'tags' above

  # Definition for the second class
  - name: myodau # Simple name for Myotis daubentonii
    tags: # Used for BOTH encoding match and decoding output
      - key: species
        value: Myotis daubentonii
```

**Example: Defining a Class with Separate Encoding and Decoding Tags**

Here, we match based on _either_ of two tags (`match_type: any`), but when the model predicts `'pipistrelle'`, we decode it _only_ to the specific `Pipistrellus pipistrellus` tag plus a genus tag.

```yaml
classes:
  - name: pipistrelle # Name for a Pipistrellus group
    match_type: any # Match if EITHER tag below is present during encoding
    tags:
      - key: species
        value: Pipistrellus pipistrellus
      - key: species
        value: Pipistrellus pygmaeus # Match pygmaeus too
    output_tags: # BUT, when decoding 'pipistrelle', assign THESE tags:
      - key: species
        value: Pipistrellus pipistrellus # Canonical species
      - key: genus # Assumes 'genus' key exists
        value: Pipistrellus # Add genus tag
```

## Handling Overlap During Encoding: Priority Order Matters

As before, when preparing training data (encoding), if an annotation matches the `tags` and `match_type` rules for multiple class definitions, the **order of the class definitions in the configuration list determines the priority**.

- The system checks rules from the **top** of the `classes` list down.
- The annotation gets assigned the `name` of the **first class rule it matches**.
- **Place more specific rules before more general rules.**

_(The YAML example for prioritizing Species over Noise remains the same as the previous version)_
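To make the first-match-wins priority concrete, here is a small Python sketch. These are hypothetical helper functions written for illustration, not the actual `batdetect2` API; annotation tags are represented as a set of `(key, value)` tuples.

```python
def matches(annotation_tags, rule):
    """Check whether an annotation's tag set satisfies one class rule."""
    rule_tags = {(t["key"], t["value"]) for t in rule["tags"]}
    if rule.get("match_type", "all") == "any":
        return bool(rule_tags & annotation_tags)  # at least one shared tag
    return rule_tags <= annotation_tags  # "all": every rule tag must be present


def encode(annotation_tags, class_rules, generic_name="Bat"):
    """Return the name of the FIRST rule the annotation matches (priority order)."""
    for rule in class_rules:
        if matches(annotation_tags, rule):
            return rule["name"]
    return generic_name  # relevant sound, but no specific class


rules = [
    {"name": "pippip", "tags": [{"key": "species", "value": "Pipistrellus pipistrellus"}]},
    {"name": "myodau", "tags": [{"key": "species", "value": "Myotis daubentonii"}]},
]
print(encode({("species", "Myotis daubentonii"), ("quality", "Good")}, rules))  # myodau
print(encode({("species", "Plecotus auritus")}, rules))  # Bat
```

Because `encode` returns on the first match, placing specific rules before general ones in `rules` is exactly what determines the priority described above.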
## Handling Non-Matches & Decoding the Generic Class

What happens if an annotation passes filtering/transformation but doesn't match any specific class rule during encoding?

- **Encoding:** As explained previously, these annotations are **not ignored**.
  They are typically assigned to a generic "relevant sound" category, often called the **"Bat"** class in BatDetect2, intended for all relevant bat calls not specifically classified.
- **Decoding:** When the model predicts this generic "Bat" category (or when processing sounds that weren't assigned a specific class during encoding), we need a way to represent this generic status with tags.
  This is defined by the `generic_class` list directly within the main `classes` configuration section.

**Defining the Generic Class Tags:**

You specify the tags for the generic class like this:

```yaml
# In your main configuration file
classes: # Main configuration section for classes
  # --- List of specific class definitions ---
  classes:
    - name: pippip
      tags:
        - key: species
          value: Pipistrellus pipistrellus
    # ... other specific classes ...

  # --- Definition of the generic class tags ---
  generic_class: # Define tags for the generic 'Bat' category
    - key: call_type
      value: Echolocation
    - key: order
      value: Chiroptera
  # These tags will be assigned when decoding the generic category
```

This `generic_class` list provides the standard tags assigned when a sound is identified as relevant (passed filtering) but doesn't belong to one of the specific target classes you defined.
Like the specific classes, sensible defaults are often provided if you don't explicitly define `generic_class`.

**Crucially:** Remember, if sounds should be **completely excluded** from training (not even considered "generic"), use **Filtering rules (Step 2)**.

### Outcome

By defining this list of prioritized class rules (including their `name`, matching `tags`, `match_type`, and optional `output_tags`) and the `generic_class` tags, you provide `batdetect2` with:

1. A clear procedure to assign a target label (`name`) to each relevant annotation for training.
2. A clear mapping to convert predicted class names (including the generic case) back into meaningful annotation tags.

This complete definition prepares your data for the final heatmap generation (Step 5) and enables interpretation of the model's results.
@ -1,141 +0,0 @@
# Step 2: Filtering Sound Events

## Purpose

When preparing your annotated audio data for training a `batdetect2` model, you often want to select only specific sound events.
For example, you might want to:

- Focus only on echolocation calls and ignore social calls or noise.
- Exclude annotations that were marked as low quality.
- Train only on specific species or groups of species.

This filtering module allows you to define rules based on the **tags** associated with each sound event annotation.
Only the events that pass _all_ your defined rules will be kept for further processing and training.

## How it Works: Rules

Filtering is controlled by a list of **rules**.
Each rule defines a condition based on the tags attached to a sound event.
An event must satisfy **all** the rules you define in your configuration to be included.
If an event fails even one rule, it is discarded.

## Defining Rules in Configuration

You define these rules within your main configuration file (usually a `.yaml` file) under a specific section (the exact name might depend on the main training config, but let's assume it's called `filtering`).

The configuration consists of a list named `rules`.
Each item in this list is a single filter rule.

Each **rule** has two parts:

1. `match_type`: Specifies the _kind_ of check to perform.
2. `tags`: A list of specific tags (each with a `key` and `value`) that the rule applies to.

```yaml
# Example structure in your configuration file
filtering:
  rules:
    - match_type: <TYPE_OF_CHECK_1>
      tags:
        - key: <tag_key_1a>
          value: <tag_value_1a>
        - key: <tag_key_1b>
          value: <tag_value_1b>
    - match_type: <TYPE_OF_CHECK_2>
      tags:
        - key: <tag_key_2a>
          value: <tag_value_2a>
    # ... add more rules as needed
```

## Understanding `match_type`

This determines _how_ the list of `tags` in the rule is used to check a sound event.
There are four types:

1. **`any`**: (Keep if _at least one_ tag matches)

   - The sound event **passes** this rule if it has **at least one** of the tags listed in the `tags` section of the rule.
   - Think of it as an **OR** condition.
   - _Example Use Case:_ Keep events if they are tagged as `Species: Pip Pip` OR `Species: Pip Pyg`.

2. **`all`**: (Keep only if _all_ tags match)

   - The sound event **passes** this rule only if it has **all** of the tags listed in the `tags` section.
     The event can have _other_ tags as well, but it must contain _all_ the ones specified here.
   - Think of it as an **AND** condition.
   - _Example Use Case:_ Keep events only if they are tagged with `Sound Type: Echolocation` AND `Quality: Good`.

3. **`exclude`**: (Discard if _any_ tag matches)

   - The sound event **passes** this rule only if it does **not** have **any** of the tags listed in the `tags` section.
     If it matches even one tag in the list, the event is discarded.
   - _Example Use Case:_ Discard events if they are tagged `Quality: Poor` OR `Noise Source: Insect`.

4. **`equal`**: (Keep only if tags match _exactly_)
   - The sound event **passes** this rule only if its set of tags is _exactly identical_ to the list of `tags` provided in the rule (no more, no less).
   - _Note:_ This is very strict and usually less useful than `all` or `any`.

## Combining Rules

Remember: A sound event must **pass every single rule** defined in the `rules` list to be kept.
The rules are checked one by one, and if an event fails any rule, it's immediately excluded from further consideration.
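The four `match_type` semantics and the pass-every-rule combination logic can be sketched in a few lines of Python. This is an illustration of the behaviour described above, not `batdetect2`'s actual implementation; tags are modelled as `(key, value)` tuples.

```python
def passes_rule(event_tags, rule):
    """Evaluate one filter rule against a sound event's tag set."""
    rule_tags = {(t["key"], t["value"]) for t in rule["tags"]}
    kind = rule["match_type"]
    if kind == "any":
        return bool(rule_tags & event_tags)   # OR: at least one shared tag
    if kind == "all":
        return rule_tags <= event_tags        # AND: every rule tag present
    if kind == "exclude":
        return not (rule_tags & event_tags)   # NOT: no rule tag present
    if kind == "equal":
        return rule_tags == event_tags        # exact tag set match
    raise ValueError(f"unknown match_type: {kind}")


def keep_event(event_tags, rules):
    """An event is kept only if it passes EVERY rule in the list."""
    return all(passes_rule(event_tags, r) for r in rules)


rules = [
    {"match_type": "any", "tags": [{"key": "Sound Type", "value": "Echolocation"}]},
    {"match_type": "exclude", "tags": [{"key": "Quality", "value": "Poor"}]},
]
good = {("Sound Type", "Echolocation"), ("Quality", "Good")}
bad = {("Sound Type", "Echolocation"), ("Quality", "Poor")}
print(keep_event(good, rules), keep_event(bad, rules))  # True False
```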
## Examples

**Example 1: Keep good quality echolocation calls**

```yaml
filtering:
  rules:
    # Rule 1: Must have the 'Echolocation' tag
    - match_type: any # Could also use 'all' if 'Sound Type' is the only tag expected
      tags:
        - key: Sound Type
          value: Echolocation
    # Rule 2: Must NOT have the 'Poor' quality tag
    - match_type: exclude
      tags:
        - key: Quality
          value: Poor
```

_Explanation:_ An event is kept only if it passes BOTH rules.
It must have the `Sound Type: Echolocation` tag AND it must NOT have the `Quality: Poor` tag.

**Example 2: Keep calls from Pipistrellus species recorded in a specific project, excluding uncertain IDs**

```yaml
filtering:
  rules:
    # Rule 1: Must be either Pip pip or Pip pyg
    - match_type: any
      tags:
        - key: Species
          value: Pipistrellus pipistrellus
        - key: Species
          value: Pipistrellus pygmaeus
    # Rule 2: Must belong to 'Project Alpha'
    - match_type: any # Using 'any' as it likely only has one project tag
      tags:
        - key: Project ID
          value: Project Alpha
    # Rule 3: Exclude if ID Certainty is 'Low' or 'Maybe'
    - match_type: exclude
      tags:
        - key: ID Certainty
          value: Low
        - key: ID Certainty
          value: Maybe
```

_Explanation:_ An event is kept only if it passes ALL three rules:

1. It has a `Species` tag that is _either_ `Pipistrellus pipistrellus` OR `Pipistrellus pygmaeus`.
2. It has the `Project ID: Project Alpha` tag.
3. It does _not_ have an `ID Certainty: Low` tag AND it does _not_ have an `ID Certainty: Maybe` tag.

## Usage

You will typically specify the path to the configuration file containing these `filtering` rules when you set up your data processing or training pipeline in `batdetect2`.
The tool will then automatically load these rules and apply them to your annotated sound events.
@ -1,79 +0,0 @@
# Defining Training Targets

A crucial aspect of training any supervised machine learning model, including BatDetect2, is clearly defining the **training targets**.
This process determines precisely what the model should learn to detect, localize, classify, and characterize from the input data (in this case, spectrograms).
The choices made here directly influence the model's focus, its performance, and how its predictions should be interpreted.

For BatDetect2, defining targets involves specifying:

- Which sounds in your annotated dataset are relevant for training.
- How these sounds should be categorized into distinct **classes** (e.g., different species).
- How the geometric **Region of Interest (ROI)** (e.g., bounding box) of each sound maps to the specific **position** and **size** targets the model predicts.
- How these classes and geometric properties relate back to the detailed information stored in your annotation **tags** (using a consistent **vocabulary/terms**).
- How the model's output (predicted class names, positions, sizes) should be translated back into meaningful tags and geometries.

## Sound Event Annotations: The Starting Point

BatDetect2 assumes your training data consists of audio recordings where relevant sound events have been **annotated**.
A typical annotation for a single sound event provides two key pieces of information:

1. **Location & Extent:** Information defining _where_ the sound occurs in time and frequency, usually represented as a **bounding box** (the ROI) drawn on a spectrogram.
2. **Description (Tags):** Information _about_ the sound event, provided as a set of descriptive **tags** (key-value pairs).

For example, an annotation might have a bounding box and tags like:

- `species: Myotis daubentonii`
- `quality: Good`
- `call_type: Echolocation`

A single sound event can have **multiple tags**, allowing for rich descriptions.
This richness requires a structured process to translate the annotation (both tags and geometry) into the precise targets needed for model training.
The **target definition process** provides clear rules to:

- Interpret the meaning of different tag keys (**Terms**).
- Select only the relevant annotations (**Filtering**).
- Potentially standardize or modify the tags (**Transforming**).
- Map the geometric ROI to specific position and size targets (**ROI Mapping**).
- Map the final set of tags on each selected annotation to a single, definitive **target class** label (**Classes**).

## Configuration-Driven Workflow

BatDetect2 is designed so that researchers can configure this entire target definition process primarily through **configuration files** (typically written in YAML format), minimizing the need for direct programming for standard workflows.
These settings are usually grouped under a main `targets:` key within your overall training configuration file.
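Putting the pieces together, a `targets:` section might be organised roughly as follows. This is an illustrative skeleton only: the key names (`terms`, `transforms`, etc.) follow this guide's terminology and may differ from the actual schema, so consult the packaged example configurations.

```yaml
targets:
  # Step 1: vocabulary (often the defaults suffice)
  terms: {}
  # Step 2: select relevant annotations
  filtering:
    rules: []
  # Step 3: optional tag clean-up
  transforms:
    rules: []
  # Step 4: class encoding/decoding rules
  classes:
    classes: []
    generic_class: []
  # Step 5: ROI -> reference point + scaled size
  roi:
    position: "bottom-left"
    time_scale: 1000.0
    frequency_scale: 0.00116
```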
## The Target Definition Steps

Defining the targets involves several sequential steps, each configurable and building upon the previous one:

1. **Defining Vocabulary (Terms & Tags):** Understand how annotations use tags (key-value pairs).
   This step involves defining the meaning (**Terms**) behind the tag keys (e.g., `species`, `call_type`).
   Often, default terms are sufficient, but understanding this is key to using tags in later steps.
   (See: {doc}`tags_and_terms`)
2. **Filtering Sound Events:** Select only the relevant sound event annotations based on their tags (e.g., keeping only high-quality calls).
   (See: {doc}`filtering`)
3. **Transforming Tags (Optional):** Modify tags on selected annotations for standardization, correction, grouping (e.g., species to genus), or deriving new tags.
   (See: {doc}`transform`)
4. **Defining Classes & Decoding Rules:** Map the final tags to specific target **class names** (like `pippip` or `myodau`).
   Define priorities for overlap and specify how predicted names map back to tags (decoding).
   (See: {doc}`classes`)
5. **Mapping ROIs (Position & Size):** Define how the geometric ROI (e.g., bounding box) of each sound event maps to the specific reference **point** (e.g., center, corner) and scaled **size** values (width, height) used as targets by the model.
   (See: {doc}`rois`)
6. **The `Targets` Object:** Understand the outcome of configuring steps 1-5: a functional object used internally by BatDetect2 that encapsulates all your defined rules for filtering, transforming, ROI mapping, encoding, and decoding.
   (See: {doc}`use`)

The result of this configuration process is a clear set of instructions that BatDetect2 uses during training data preparation to determine the correct "answer" (the ground truth label and geometry representation) for each relevant sound event.

Explore the detailed steps using the links below:

```{toctree}
:maxdepth: 1
:caption: Target Definition Steps:

tags_and_terms
filtering
transform
classes
rois
labels
use
```
@ -1,76 +0,0 @@
# Step 5: Generating Training Targets

## Purpose and Context

Following the previous steps of defining terms, filtering events, transforming tags, and defining specific class rules, this final stage focuses on **generating the ground truth data** used directly for training the BatDetect2 model.
This involves converting the refined annotation information for each audio clip into specific **heatmap formats** required by the underlying neural network architecture.

This step essentially translates your structured annotations into the precise "answer key" the model learns to replicate during training.

## What are Heatmaps?

Heatmaps, in this context, are multi-dimensional arrays, often visualized as images aligned with the input spectrogram, where the values at different time-frequency coordinates represent specific information about the sound events.
For BatDetect2 training, three primary heatmaps are generated:

1. **Detection Heatmap:**

   - **Represents:** The presence or likelihood of relevant sound events across the spectrogram.
   - **Structure:** A 2D array matching the spectrogram's time-frequency dimensions.
     Peaks (typically smoothed) are generated at the reference locations of all sound events that passed the filtering stage (including both specifically classified events and those falling into the generic "Bat" category).

2. **Class Heatmap:**

   - **Represents:** The location and class identity for sounds belonging to the _specific_ target classes you defined in Step 4.
   - **Structure:** A 3D array with dimensions for category, time, and frequency.
     It contains a separate 2D layer (channel) for each target class name (e.g., 'pippip', 'myodau').
     A smoothed peak appears on a specific class layer only if a sound event assigned to that class exists at that location.
     Events assigned only to the generic class do not produce peaks here.

3. **Size Heatmap:**
   - **Represents:** The target dimensions (duration/width and bandwidth/height) of detected sound events.
   - **Structure:** A 3D array with dimensions for size-dimension ('width', 'height'), time, and frequency.
     At the reference location of each detected sound event, this heatmap stores two numerical values corresponding to the scaled width and height derived from the event's bounding box.

## How Heatmaps are Created

The generation of these heatmaps is an automated process within `batdetect2`, driven by your configurations from all previous steps.
For each audio clip and its corresponding spectrogram in the training dataset:

1. The system retrieves the associated sound event annotations.
2. Configured **filtering rules** (Step 2) are applied to select relevant annotations.
3. Configured **tag transformation rules** (Step 3) are applied to modify the tags of the selected annotations.
4. Configured **class definition rules** (Step 4) are used to assign a specific class name or determine generic "Bat" status for each processed annotation.
5. These final annotations are then mapped onto initialized heatmap arrays:
   - A signal (initially a single point) is placed on the **Detection Heatmap** at the reference location for each relevant annotation.
   - The scaled width and height values are placed on the **Size Heatmap** at the reference location.
   - If an annotation received a specific class name, a signal is placed on the corresponding layer of the **Class Heatmap** at the reference location.
6. Finally, Gaussian smoothing (a blurring effect) is typically applied to the Detection and Class heatmaps to create spatially smoother targets, which often aids model training stability and performance.
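The place-a-point-then-smooth idea in steps 5 and 6 can be illustrated with plain NumPy. This sketch is not `batdetect2`'s internal code: it renders each reference point directly as a Gaussian bump (equivalent in spirit to placing a unit spike and blurring it), keeping the pointwise maximum where bumps overlap.

```python
import numpy as np


def make_detection_heatmap(shape, ref_points, sigma=3.0):
    """Render a Gaussian bump of std `sigma` (in bins) centred on each
    (freq_bin, time_bin) reference point of a (freq, time) heatmap."""
    freqs = np.arange(shape[0])[:, None]  # column vector of frequency bins
    times = np.arange(shape[1])[None, :]  # row vector of time bins
    heatmap = np.zeros(shape, dtype=np.float32)
    for f0, t0 in ref_points:
        bump = np.exp(-((freqs - f0) ** 2 + (times - t0) ** 2) / (2 * sigma**2))
        heatmap = np.maximum(heatmap, bump)  # overlapping events keep the max
    return heatmap


# Two hypothetical sound events on a 128 x 512 spectrogram grid
hm = make_detection_heatmap((128, 512), [(40, 100), (80, 300)], sigma=3.0)
# The heatmap peaks at exactly 1.0 on each reference point and decays around it.
```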
## Configurable Settings for Heatmap Generation

While the content of the heatmaps is primarily determined by the previous configuration steps, a few parameters specific to the heatmap drawing process itself can be adjusted.
These are usually set in your main configuration file under a section like `labelling`:

- `sigma`: (Number, e.g., `3.0`) Defines the standard deviation, in pixels or bins, of the Gaussian kernel used for smoothing the Detection and Class heatmaps.
  Larger values result in more diffused heatmap peaks.
- `position`: (Text, e.g., `"bottom-left"`, `"center"`) Specifies the geometric reference point within each sound event's bounding box that anchors its representation on the heatmaps.
- `time_scale` & `frequency_scale`: (Numbers) These crucial scaling factors convert the physical duration (in seconds) and frequency bandwidth (in Hz) of annotation bounding boxes into the numerical values stored in the 'width' and 'height' channels of the Size Heatmap.
  - **Important Note:** The appropriate values for these scales are dictated by the requirements of the specific BatDetect2 model architecture being trained.
    They ensure the size information is presented in the units or relative scale the model expects.
    **Consult the documentation or tutorials for your specific model to determine the correct `time_scale` and `frequency_scale` values.** Mismatched scales can hinder the model's ability to learn size regression accurately.

**Example YAML Configuration for Labelling Settings:**

```yaml
# In your main configuration file
labelling:
  sigma: 3.0 # Std. dev. for Gaussian smoothing (pixels/bins)
  position: "bottom-left" # Bounding box reference point
  time_scale: 1000.0 # Example: Scales seconds to milliseconds
  frequency_scale: 0.00116 # Example: Scales Hz relative to ~860 Hz (model specific!)
```

## Outcome: Final Training Targets

Executing this step for all training data yields the complete set of target heatmaps (Detection, Class, Size) for each corresponding input spectrogram.
These arrays constitute the ground truth data that the BatDetect2 model directly compares its predictions against during the training phase, guiding its learning process.
@ -1,85 +0,0 @@
|
|||||||
# Defining Target Geometry: Mapping Sound Event Regions
|
|
||||||
|
|
||||||
## Introduction
|
|
||||||
|
|
||||||
In the previous steps of defining targets, we focused on determining _which_ sound events are relevant (`filtering`), _what_ descriptive tags they should have (`transform`), and _which category_ they belong to (`classes`).
|
|
||||||
However, for the model to learn effectively, it also needs to know **where** in the spectrogram each sound event is located and approximately **how large** it is.
|
|
||||||
|
|
||||||
Your annotations typically define the location and extent of a sound event using a **Region of Interest (ROI)**, most commonly a **bounding box** drawn around the call on the spectrogram.
|
|
||||||
This ROI contains detailed spatial information (start/end time, low/high frequency).
|
|
||||||
|
|
||||||
This section explains how BatDetect2 converts the geometric ROI from your annotations into the specific positional and size information used as targets during model training.
|
|
||||||
|
|
||||||
## From ROI to Model Targets: Position & Size
|
|
||||||
|
|
||||||
BatDetect2 does not directly predict a full bounding box.
|
|
||||||
Instead, it is trained to predict:
|
|
||||||
|
|
||||||
1. **A Reference Point:** A single point `(time, frequency)` that represents the primary location of the detected sound event within the spectrogram.
|
|
||||||
2. **Size Dimensions:** Numerical values representing the event's size relative to that reference point, typically its `width` (duration in time) and `height` (bandwidth in frequency).
|
|
||||||
|
|
||||||
This step defines _how_ BatDetect2 calculates this specific reference point and these numerical size values from the original annotation's bounding box.
|
|
||||||
It also handles the reverse process – converting predicted positions and sizes back into bounding boxes for visualization or analysis.
|
|
||||||
|
|
||||||
## Configuring the ROI Mapping
|
|
||||||
|
|
||||||
You can control how this conversion happens through settings in your configuration file (e.g., your main `.yaml` file).
|
|
||||||
These settings are usually placed within the main `targets:` configuration block, under a specific `roi:` key.
|
|
||||||
|
|
||||||
Here are the key settings:
|
|
||||||
|
|
||||||
- **`position`**:
|
|
||||||
|
|
||||||
- **What it does:** Determines which specific point on the annotation's bounding box is used as the single **Reference Point** for training (e.g., `"center"`, `"bottom-left"`).
|
|
||||||
- **Why configure it?** This affects where the peak signal appears in the target heatmaps used for training.
|
|
||||||
Different choices might slightly influence model learning.
|
|
||||||
The default (`"bottom-left"`) is often a good starting point.
|
|
||||||
- **Example Value:** `position: "center"`
|
|
||||||
|
|
||||||
- **`time_scale`**:
|
|
||||||
|
|
||||||
- **What it does:** This is a numerical scaling factor that converts the _actual duration_ (width, measured in seconds) of the bounding box into the numerical 'width' value the model learns to predict (and which is stored in the Size Heatmap).
|
|
||||||
- **Why configure it?** The model predicts raw numbers for size; this scale gives those numbers meaning.
|
|
||||||
For example, setting `time_scale: 1000.0` means the model will be trained to predict the duration in **milliseconds** instead of seconds.
|
|
||||||
- **Important Considerations:**
|
|
||||||
- You can often set this value based on the units you prefer the model to work with internally.
|
|
||||||
However, having target numerical values roughly centered around 1 (e.g., typically between 0.1 and 10) can sometimes improve numerical stability during model training.
|
|
||||||
- The default value in BatDetect2 (e.g., `1000.0`) has been chosen to scale the duration relative to the spectrogram width under default STFT settings.
|
|
||||||
Be aware that if you significantly change STFT parameters (window size or overlap), the relationship between the default scale and the spectrogram dimensions might change.
|
|
||||||
- Crucially, whatever scale you use during training **must** be used when decoding the model's predictions back into real-world time units (seconds).
|
|
||||||
BatDetect2 generally handles this consistency for you automatically when using the full pipeline.
|
|
||||||
- **Example Value:** `time_scale: 1000.0`
|
|
||||||
|
|
||||||
- **`frequency_scale`**:
|
|
||||||
- **What it does:** Similar to `time_scale`, this numerical scaling factor converts the _actual frequency bandwidth_ (height, typically measured in Hz or kHz) of the bounding box into the numerical 'height' value the model learns to predict.
|
|
||||||
- **Why configure it?** It gives physical meaning to the model's raw numerical prediction for bandwidth and allows you to choose the internal units or scale.
|
|
||||||
- **Important Considerations:**
|
|
||||||
- Same as for `time_scale`.
|
|
||||||
- **Example Value:** `frequency_scale: 0.00116`
|
|
||||||
|
|
||||||
**Example YAML Configuration:**

```yaml
# Inside your main configuration file (e.g., training_config.yaml)

targets: # Top-level key for target definition
  # ... filtering settings ...
  # ... transforms settings ...
  # ... classes settings ...

  # --- ROI Mapping Settings ---
  roi:
    position: "bottom-left" # Reference point (e.g., "center", "bottom-left")
    time_scale: 1000.0 # e.g., model predicts width in ms
    frequency_scale: 0.00116 # e.g., model predicts height relative to ~860 Hz (or another model-specific scaling)
```
## Decoding Size Predictions

These scaling factors (`time_scale`, `frequency_scale`) are also essential for interpreting the model's output correctly.
When the model predicts numerical values for width and height, BatDetect2 uses these same scales (in reverse) to convert those numbers back into physically meaningful durations (seconds) and bandwidths (Hz/kHz) when reconstructing bounding boxes from predictions.
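The round trip can be sketched in a few lines of Python. This is a conceptual illustration only, not the actual BatDetect2 implementation; the helper names are invented, and the constants match the example scales above:

```python
# Conceptual sketch of ROI size scaling (hypothetical helper names).
TIME_SCALE = 1000.0        # seconds -> model units (here: milliseconds)
FREQUENCY_SCALE = 0.00116  # Hz -> model units

def encode_size(duration_s: float, bandwidth_hz: float) -> tuple[float, float]:
    """Convert a real-world box size into the values the model learns to predict."""
    return duration_s * TIME_SCALE, bandwidth_hz * FREQUENCY_SCALE

def decode_size(width: float, height: float) -> tuple[float, float]:
    """Invert the scaling to recover duration (s) and bandwidth (Hz)."""
    return width / TIME_SCALE, height / FREQUENCY_SCALE

# A 5 ms call spanning 40 kHz of bandwidth:
width, height = encode_size(0.005, 40_000)
print(width, height)  # both values land in a numerically friendly range
```

Note how a raw duration of 0.005 and a raw bandwidth of 40,000 would otherwise differ by seven orders of magnitude; the scales bring both targets into a similar range.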
## Outcome

By configuring the `roi` settings, you ensure that BatDetect2 consistently translates the geometric information from your annotations into the specific reference points and scaled size values required for training the model.
Using consistent scales that are appropriate for your data and potentially beneficial for training stability allows the model to effectively learn not just _what_ sound is present, but also _where_ it is located and _how large_ it is, and enables meaningful interpretation of the model's spatial and size predictions.

---
# Step 1: Managing Annotation Vocabulary

## Purpose

To train `batdetect2`, you will need sound events that have been carefully annotated. We annotate sound events using **tags**. A tag is simply a piece of information attached to an annotation, often describing what the sound is or its characteristics. Common examples include `Species: Myotis daubentonii` or `Quality: Good`.

Each tag fundamentally has two parts:

* **Value:** The specific information (e.g., "Myotis daubentonii", "Good").
* **Term:** The *type* of information (e.g., "Species", "Quality"). This defines the context or meaning of the value.

We use this flexible **Term: Value** approach because it allows you to annotate your data with any kind of information relevant to your project, while still providing a structure that makes the meaning clear.

While simple terms like "Species" are easy to understand, sometimes the underlying definition needs to be more precise to ensure everyone interprets it the same way (e.g., using a standard scientific definition for "Species" or clarifying what "Call Type" specifically refers to).

This `terms` module is designed to help manage these definitions effectively:

1. It provides **standard definitions** for common terms used in bioacoustics, ensuring consistency.
2. It lets you **define your own custom terms** if you need concepts specific to your project.
3. Crucially, it allows you to use simple **"keys"** (like shortcuts) in your configuration files to refer to these potentially complex term definitions, making configuration much easier and less error-prone.
## The Problem: Why We Need Defined Terms

Imagine you have a tag that simply says `"Myomyo"`.
If you created this tag, you might know it's a shortcut for the species _Myotis myotis_.
But what about someone else using your data or model later? Does `"Myomyo"` refer to the species? Or maybe it's the name of an individual bat, or even the location where it was recorded? Simple tags like this can be ambiguous.

To make things clearer, it's good practice to provide context.
We can do this by pairing the specific information (the **Value**) with the _type_ of information (the **Term**).
For example, writing the tag as `species: Myomyo` is much less ambiguous.
Here, `species` is the **Term**, explaining that `Myomyo` is a **Value** representing a species.

However, another challenge often comes up when sharing data or collaborating.
You might use the term `species`, while a colleague uses `Species`, and someone else uses the more formal `Scientific Name`.
Even though you all mean the same thing, these inconsistencies make it hard to combine data or reuse analysis pipelines automatically.

This is where standardized **Terms** become very helpful.
Several groups work to create standard definitions for common concepts.
For instance, the Darwin Core standard provides widely accepted terms for biological data, like `dwc:scientificName` for a species name.
Using standard Terms whenever possible makes your data clearer, easier for others (and machines!) to understand correctly, and much more reusable across different projects.

**But here's the practical problem:** While using standard, well-defined Terms is important for clarity and reusability, writing out full definitions or long standard names (like `dwc:scientificName` or "Scientific Name according to Darwin Core standard") every single time you need to refer to a species tag in a configuration file would be extremely tedious and prone to typing errors.
## The Solution: Keys (Shortcuts) and the Registry

This module uses a central **Registry** that stores the full definitions of various Terms.
Each Term in the registry is assigned a unique, short **key** (a simple string).

Think of the **key** as a shortcut.

Instead of using the full Term definition in your configuration files, you just use its **key**.
The system automatically looks up the full definition in the registry using the key when needed.

**Example:**

- **Full Term Definition:** Represents the scientific name of the organism.
- **Key:** `species`
- **In Config:** You just write `species`.
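A registry of this kind can be pictured as a simple mapping from short keys to full definitions. The sketch below is purely illustrative (the real registry lives in `batdetect2.terms`; the `Term` class and helper names here are invented for the example):

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Term:
    """Illustrative stand-in for a full term definition."""
    name: str
    label: str
    definition: str = "Unknown"

# The registry: short keys -> full definitions.
registry: dict[str, Term] = {}

def register_term(key: str, term: Term) -> None:
    """Add a term under a unique key; duplicate keys are rejected."""
    if key in registry:
        raise KeyError(f"Term key already registered: {key}")
    registry[key] = term

def get_term(key: str) -> Term:
    """Look up the full definition behind a short key."""
    return registry[key]  # raises KeyError for unknown keys

register_term(
    "species",
    Term(name="dwc:scientificName", label="Scientific Name",
         definition="The full scientific name of the organism."),
)

# A configuration file only ever needs the short key:
print(get_term("species").name)  # -> dwc:scientificName
```

The point of the indirection is that configurations stay short and typo-resistant while the precise, possibly standardized definition lives in one place.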
## Available Keys

The registry comes pre-loaded with keys for many standard terms used in bioacoustics, including those from the `soundevent` package and some specific to `batdetect2`. This means you can often use these common concepts without needing to define them yourself.

Common examples of pre-defined keys might include:

* `species`: For scientific species names (e.g., *Myotis daubentonii*).
* `common_name`: For the common name of a species (e.g., "Daubenton's bat").
* `genus`, `family`, `order`: For higher levels of biological taxonomy.
* `call_type`: For functional call types (e.g., 'Echolocation', 'Social').
* `individual`: For identifying specific individuals if tracked.
* `class`: **(Special Key)** This key is often used **by default** in configurations when defining the target classes for your model (e.g., the different species you want the model to classify). If you are specifying a tag that represents a target class label, you often only need to provide the `value`, and the system assumes the `key` is `class`.

This is not an exhaustive list. To discover all the term keys currently available in the registry (including any standard ones loaded automatically and any custom ones you've added in your configuration), you can:

1. Use the function `batdetect2.terms.get_term_keys()` if you are working directly with Python code.
2. Refer to the main `batdetect2` API documentation for a list of commonly included standard terms.
## Defining Your Own Terms

While many common terms have pre-defined keys, you might need a term specific to your project or data that isn't already available (e.g., "Recording Setup", "Weather Condition", "Project Phase", "Noise Source"). You can easily define these custom terms directly within a configuration file (usually your main `.yaml` file).

Typically, you define custom terms under a dedicated section (often named `terms`). Inside this section, you create a list, where each item in the list defines one new term using the following fields:

* `key`: **(Required)** The unique shortcut key or nickname you will use to refer to this term throughout your configuration (e.g., `weather`, `setup_id`, `noise_src`). Choose something short and memorable.
* `label`: (Optional) A user-friendly label for the term, which might be used in reports or visualizations (e.g., "Weather Condition", "Setup ID"). If you don't provide one, it defaults to using the `key`.
* `name`: (Optional) A more formal or technical name for the term.
  * It's good practice, especially if defining terms that might overlap with standard vocabularies, to use a **namespaced format** like `<namespace>:<term_name>`. The `namespace` part helps avoid clashes with terms defined elsewhere. For example, the standard Darwin Core term for scientific name is `dwc:scientificName`, where `dwc` is the namespace for Darwin Core. Using namespaces makes your custom terms more specific and reduces potential confusion.
  * If you don't provide a `name`, it defaults to using the `key`.
* `definition`: (Optional) A brief text description explaining what this term represents (e.g., "The primary source of background noise identified", "General weather conditions during recording"). If omitted, it defaults to "Unknown".
* `uri`: (Optional) If your term definition comes directly from a standard online vocabulary (like Darwin Core), you can include its unique web identifier (URI) here.
**Example YAML Configuration for Custom Terms:**

```yaml
# In your main configuration file

# (Optional section to define custom terms)
terms:
  - key: weather # Your chosen shortcut
    label: Weather Condition
    name: myproj:weather # Formal namespaced name
    definition: General weather conditions during recording (e.g., Clear, Rain, Fog).

  - key: setup_id # Another shortcut
    label: Recording Setup ID
    name: myproj:setupID # Formal namespaced name
    definition: The unique identifier for the specific hardware setup used.

  - key: species # Defining a term with a standard URI
    label: Scientific Name
    name: dwc:scientificName
    uri: http://rs.tdwg.org/dwc/terms/scientificName # Example URI
    definition: The full scientific name according to Darwin Core.

# ... other configuration sections ...
```

When `batdetect2` loads your configuration, it reads this `terms` section and adds your custom definitions (linked to their unique keys) to the central registry. These keys (`weather`, `setup_id`, etc.) are then ready to be used in other parts of your configuration, like defining filters or target classes.
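Conceptually, loading the `terms` section boils down to walking the list of entries and filling in the documented defaults for missing fields. A self-contained sketch (the real loading logic is in `batdetect2.terms`; the names below are illustrative, and the Python list stands in for the parsed YAML):

```python
# The `terms` section above, as it would look after YAML parsing.
config_terms = [
    {"key": "weather", "label": "Weather Condition", "name": "myproj:weather",
     "definition": "General weather conditions during recording."},
    {"key": "setup_id", "label": "Recording Setup ID", "name": "myproj:setupID"},
]

registry: dict[str, dict] = {}

def load_terms(entries: list[dict]) -> None:
    """Register each configured term, applying the documented defaults."""
    for entry in entries:
        key = entry["key"]  # `key` is the only required field
        registry[key] = {
            "label": entry.get("label", key),        # defaults to the key
            "name": entry.get("name", key),          # defaults to the key
            "definition": entry.get("definition", "Unknown"),
            "uri": entry.get("uri"),                 # optional, may be absent
        }

load_terms(config_terms)
print(sorted(registry))  # -> ['setup_id', 'weather']
```

Here `setup_id` ends up with the default `definition` of `"Unknown"` because the entry omits it, mirroring the field defaults described above.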
## Using Keys to Specify Tags (in Filters, Class Definitions, etc.)

Now that you have keys for all the terms you need (both pre-defined and custom), you can easily refer to specific **tags** in other parts of your configuration, such as:

- Filtering rules (as seen in the `filtering` module documentation).
- Defining which tags represent your target classes.
- Associating extra information with your classes.

When you need to specify a tag, you typically use a structure with two fields:

- `key`: The **key** (shortcut) for the _Term_ part of the tag (e.g., `species`, `quality`, `weather`).
  **It defaults to `class`** if you omit it, which is common when defining the main target classes.
- `value`: The specific _value_ of the tag (e.g., `Myotis daubentonii`, `Good`, `Rain`).

**Example YAML Configuration (e.g., inside a filter rule):**

```yaml
# ... inside a filtering configuration section ...
rules:
  # Rule: Exclude events recorded in 'Rain'
  - match_type: exclude
    tags:
      - key: weather # Use the custom term key defined earlier
        value: Rain
  # Rule: Keep only 'Myotis daubentonii' (using the default 'class' key implicitly)
  - match_type: any # Or 'all' depending on logic
    tags:
      - value: Myotis daubentonii # 'key: class' is assumed by default here
        # key: class # Explicitly writing this is also fine
  # Rule: Keep only 'Good' quality events
  - match_type: any # Or 'all' depending on logic
    tags:
      - key: quality # Use a likely pre-defined key
        value: Good
```
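The key-defaulting behaviour for tag specifications is easy to mirror in code. A hypothetical sketch (the actual parsing happens inside BatDetect2's configuration models; the function name is invented):

```python
def parse_tag_spec(spec: dict) -> tuple[str, str]:
    """Turn a {key, value} mapping from the config into a (term_key, value) pair.

    When `key` is omitted, it defaults to "class", matching the behaviour
    described above for target-class definitions.
    """
    return spec.get("key", "class"), spec["value"]

print(parse_tag_spec({"key": "weather", "value": "Rain"}))
print(parse_tag_spec({"value": "Myotis daubentonii"}))  # key defaults to "class"
```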
## Summary

- Annotations have **tags** (Term + Value).
- This module uses short **keys** as shortcuts for Term definitions, stored in a **registry**.
- Many **common keys are pre-defined**.
- You can define **custom terms and keys** in your configuration file (using `key`, `label`, `name`, `definition`, and `uri`).
- You use these **keys** along with specific **values** to refer to tags in other configuration sections (like filters or class definitions), often defaulting to the `class` key.

This system makes your configurations cleaner, more readable, and less prone to errors by avoiding repetition of complex term definitions.

---
# Step 3: Transforming Annotation Tags (Optional)

## Purpose and Context

After defining your vocabulary (Step 1: Terms) and filtering out irrelevant sound events (Step 2: Filtering), you have a dataset of annotations ready for the next stages.
Before you select the final target classes for training (Step 4), you might want or need to **modify the tags** associated with your annotations.
This optional step allows you to clean up, standardize, or derive new information from your existing tags.

**Why transform tags?**

- **Correcting Mistakes:** Fix typos or incorrect values in specific tags (e.g., changing an incorrect species label).
- **Standardizing Labels:** Ensure consistency if the same information was tagged using slightly different values (e.g., mapping "echolocation", "Echoloc.", and "Echolocation Call" all to a single standard value: "Echolocation").
- **Grouping Related Concepts:** Combine different specific tags into a broader category (e.g., mapping several different species tags like _Myotis daubentonii_ and _Myotis nattereri_ to a single `genus: Myotis` tag).
- **Deriving New Information:** Automatically create new tags based on existing ones (e.g., automatically generating a `genus: Myotis` tag whenever a `species: Myotis daubentonii` tag is present).

This step uses the `batdetect2.targets.transform` module to apply these changes based on rules you define.

## How it Works: Transformation Rules

You control how tags are transformed by defining a list of **rules** in your configuration file (e.g., your main `.yaml` file, often under a section named `transform`).

Each rule specifies a particular type of transformation to perform.
Importantly, the rules are applied **sequentially**, in the exact order they appear in your configuration list.
The output annotation from one rule becomes the input for the next rule in the list.
This means the order can matter!
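Sequential application means each rule is effectively a function from tags to tags, composed in list order. A minimal sketch of this idea (illustrative only; here tags are modelled as plain `(key, value)` tuples and the two rules are hand-written stand-ins for configured rules):

```python
Tag = tuple[str, str]

def fix_typo(tags: list[Tag]) -> list[Tag]:
    """Rule 1: replace one exact tag (like a `replace` rule)."""
    return [("species", "Pipistrellus pipistrellus") if t == ("species", "Pip pip") else t
            for t in tags]

def derive_genus(tags: list[Tag]) -> list[Tag]:
    """Rule 2: add a genus tag derived from any species tag (like `derive_tag`)."""
    extra = [("genus", value.split()[0]) for key, value in tags if key == "species"]
    return tags + extra

rules = [fix_typo, derive_genus]  # order matters: fix the name first, then derive

def apply_rules(tags: list[Tag]) -> list[Tag]:
    for rule in rules:  # each rule's output feeds the next rule
        tags = rule(tags)
    return tags

print(apply_rules([("species", "Pip pip")]))
# With this order the derived genus is "Pipistrellus"; with the rules
# reversed, the genus would be derived from the uncorrected "Pip pip".
```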
## Types of Transformation Rules

Here are the main types of rules you can define:

1. **Replace an Exact Tag (`replace`)**

   - **Use Case:** Fixing a specific, known incorrect tag.
   - **How it works:** You specify the _exact_ original tag (both its term key and value) and the _exact_ tag you want to replace it with.
   - **Example Config:** Replace the informal tag `species: Pip pip` with the correct scientific name tag.
     ```yaml
     transform:
       rules:
         - rule_type: replace
           original:
             key: species # Term key of the tag to find
             value: "Pip pip" # Value of the tag to find
           replacement:
             key: species # Term key of the replacement tag
             value: "Pipistrellus pipistrellus" # Value of the replacement tag
     ```

2. **Map Values (`map_value`)**

   - **Use Case:** Standardizing different values used for the same concept, or grouping multiple specific values into one category.
   - **How it works:** You specify a `source_term_key` (the type of tag to look at, e.g., `call_type`).
     Then you provide a `value_mapping` dictionary listing original values and the new values they should be mapped to.
     Only tags matching the `source_term_key` and having a value listed in the mapping will be changed.
     You can optionally specify a `target_term_key` if you want to change the term type as well (e.g., mapping species to a genus).
   - **Example Config:** Standardize different ways "Echolocation" might have been written for the `call_type` term.
     ```yaml
     transform:
       rules:
         - rule_type: map_value
           source_term_key: call_type # Look at 'call_type' tags
           # target_term_key is not specified, so the term stays 'call_type'
           value_mapping:
             echolocation: Echolocation
             Echolocation Call: Echolocation
             Echoloc.: Echolocation
             # Add mappings for other values like 'Social' if needed
     ```
   - **Example Config (Grouping):** Map specific Pipistrellus species tags to a single `genus: Pipistrellus` tag.
     ```yaml
     transform:
       rules:
         - rule_type: map_value
           source_term_key: species # Look at 'species' tags
           target_term_key: genus # Change the term to 'genus'
           value_mapping:
             "Pipistrellus pipistrellus": Pipistrellus
             "Pipistrellus pygmaeus": Pipistrellus
             "Pipistrellus nathusii": Pipistrellus
     ```
3. **Derive a New Tag (`derive_tag`)**

   - **Use Case:** Automatically creating new information based on existing tags, like getting the genus from a species name.
   - **How it works:** You specify a `source_term_key` (e.g., `species`).
     You provide a `target_term_key` for the new tag to be created (e.g., `genus`).
     You also provide the name of a `derivation_function` (e.g., `"extract_genus"`) that knows how to perform the calculation (e.g., take "Myotis daubentonii" and return "Myotis").
     `batdetect2` has some built-in functions, or you can potentially define your own (see the advanced documentation).
     You can also choose whether to keep the original source tag (`keep_source: true`).
   - **Example Config:** Create a `genus` tag from the existing `species` tag, keeping the species tag.
     ```yaml
     transform:
       rules:
         - rule_type: derive_tag
           source_term_key: species # Use the value from the 'species' tag
           target_term_key: genus # Create a tag with the 'genus' term
           derivation_function: extract_genus # Use the built-in function for this
           keep_source: true # Keep the original 'species' tag
     ```
   - **Another Example:** Convert species names to uppercase (modifying the value of the _same_ term).
     ```yaml
     transform:
       rules:
         - rule_type: derive_tag
           source_term_key: species # Use the value from the 'species' tag
           # target_term_key is not specified, so the term stays 'species'
           derivation_function: to_upper_case # Assume this function exists
           keep_source: false # Replace the original species tag
     ```
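A derivation function can be very small: it takes the source tag's value and returns the new value. The sketch below illustrates the idea behind the two functions named in the examples above (the built-in implementations in `batdetect2` may differ):

```python
def extract_genus(species_name: str) -> str:
    """Return the genus part of a binomial species name."""
    return species_name.split()[0]

def to_upper_case(value: str) -> str:
    """Uppercase the tag value, as in the second example above."""
    return value.upper()

print(extract_genus("Myotis daubentonii"))  # -> Myotis
print(to_upper_case("Myotis daubentonii"))  # -> MYOTIS DAUBENTONII
```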
## Rule Order Matters

Remember that rules are applied one after another.
If you have multiple rules, make sure they are ordered correctly to achieve the desired outcome.
For instance, you might want to standardize species names _before_ deriving the genus from them.

## Outcome

After applying all the transformation rules you've defined, the annotations will proceed to the next step (Step 4: Select Target Tags & Define Classes) with their tags potentially cleaned, standardized, or augmented based on your configuration.
If you don't define any rules, the tags simply pass through this step unchanged.

---
# Bringing It All Together: The `Targets` Object

## Recap: Defining Your Target Strategy

In the previous sections, we covered the sequential steps to precisely define what your BatDetect2 model should learn, specified within your configuration file:

1. **Terms:** Establishing the vocabulary for annotation tags.
2. **Filtering:** Selecting relevant sound event annotations.
3. **Transforming:** Optionally modifying tags.
4. **Classes:** Defining target categories, setting priorities, and specifying tag decoding rules.
5. **ROI Mapping:** Defining how annotation geometry maps to target position and size values.

You define all these aspects within your configuration file (e.g., YAML), which holds the complete specification for your target definition strategy, typically under a main `targets:` key.

## What is the `Targets` Object?

While the configuration file specifies _what_ you want to happen, BatDetect2 needs an active component to actually _perform_ these steps.
This is the role of the `Targets` object.

The `Targets` object is an organized container that holds all the specific functions and settings derived from your configuration file (`TargetConfig`).
It's created directly from your configuration and provides methods to apply the **filtering**, **transformation**, **ROI mapping** (geometry to position/size and back), **class encoding**, and **class decoding** steps you defined.
It effectively bundles together all the target definition logic determined by your settings into a single, usable object.

## How is it Created and Used?

For most standard training workflows, you typically won't need to create or interact with the `Targets` object directly in Python code.
BatDetect2 usually handles its creation automatically when you provide your main configuration file during training setup.

Conceptually, here's what happens behind the scenes:

1. You provide the path to your configuration file (e.g., `my_training_config.yaml`).
2. BatDetect2 reads this file and finds your `targets:` configuration section.
3. It uses this configuration to build an instance of the `Targets` object using a dedicated function (like `load_targets`), loading it with the appropriate logic based on your settings.
```python
# Conceptual Example: How BatDetect2 might use your configuration
from batdetect2.targets import load_targets  # The function to load/build the object
from batdetect2.targets.types import TargetProtocol  # The type/interface

# You provide this path, usually as part of the main training setup
target_config_file = "path/to/your/target_config.yaml"

# --- BatDetect2 Internally Does Something Like This: ---
# Loads your config and builds the Targets object using the loader function.
# The resulting object adheres to the TargetProtocol interface.
targets_processor: TargetProtocol = load_targets(target_config_file)
# ---------------------------------------------------------

# Now, 'targets_processor' holds all your configured logic and is ready
# to be used internally by the training pipeline or for prediction processing.
```
## What Does the `Targets` Object Do? (Its Role)

Once created, the `targets_processor` object plays several vital roles within the BatDetect2 system:

1. **Preparing Training Data:** During the data loading and label generation phase of training, BatDetect2 uses this object to process each annotation from your dataset _before_ the final training format (e.g., heatmaps) is generated.
   For each annotation, it internally applies the logic:
   - `targets_processor.filter(...)`: To decide whether to keep the annotation.
   - `targets_processor.transform(...)`: To apply any tag modifications.
   - `targets_processor.encode(...)`: To get the final class name (e.g., `'pippip'`, `'myodau'`, or `None` for the generic class).
   - `targets_processor.get_position(...)`: To determine the reference `(time, frequency)` point from the annotation's geometry.
   - `targets_processor.get_size(...)`: To calculate the _scaled_ width and height target values from the annotation's geometry.
2. **Interpreting Model Predictions:** When you use a trained model, its raw outputs (like predicted class names, positions, and sizes) need to be translated back into meaningful results.
   This object provides the necessary decoding logic:
   - `targets_processor.decode(...)`: Converts a predicted class name back into representative annotation tags.
   - `targets_processor.recover_roi(...)`: Converts a predicted position and _scaled_ size values back into an estimated geometric bounding box in real-world coordinates (seconds, Hz).
   - `targets_processor.generic_class_tags`: Provides the tags for sounds classified into the generic category.
3. **Providing Metadata:** It conveniently holds useful information derived from your configuration:
   - `targets_processor.class_names`: The final list of specific target class names.
   - `targets_processor.generic_class_tags`: The tags representing the generic class.
   - `targets_processor.dimension_names`: The names used for the size dimensions (e.g., `['width', 'height']`).
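Putting the training-side roles together, the per-annotation flow is roughly: filter, then transform, then encode class, position, and size. A conceptual sketch with stand-in stubs (the real object implements `TargetProtocol`; the `process_annotation` helper, the `FakeTargets` class, and the dict-based annotations are all invented for illustration):

```python
def process_annotation(targets, annotation):
    """Mirror of the per-annotation logic described above (conceptual)."""
    if not targets.filter(annotation):
        return None  # dropped by the filtering rules
    annotation = targets.transform(annotation)
    return {
        "class": targets.encode(annotation),          # None => generic class
        "position": targets.get_position(annotation),
        "size": targets.get_size(annotation),
    }

# A trivial stand-in object, just to show the call pattern:
class FakeTargets:
    def filter(self, ann): return ann["quality"] == "Good"
    def transform(self, ann): return ann
    def encode(self, ann): return ann.get("species")
    def get_position(self, ann): return (ann["time"], ann["freq"])
    def get_size(self, ann): return (1.0, 1.0)

ann = {"quality": "Good", "species": "myodau", "time": 0.5, "freq": 45_000}
print(process_annotation(FakeTargets(), ann)["class"])  # -> myodau
```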
## Why is Understanding This Important?

As a researcher using BatDetect2, your primary interaction is typically through the **configuration file**.
The `Targets` object is the component that materializes your configuration.

Understanding its role can be important:

- It helps connect the settings in your configuration file (covering terms, filtering, transforms, classes, and ROIs) to the actual behavior observed during training or when interpreting model outputs.
  If the results aren't as expected (e.g., wrong classifications, incorrect bounding box predictions), reviewing the relevant sections of your `TargetConfig` is the first step in debugging.
- Furthermore, understanding this structure is beneficial if you plan to create custom Python scripts.
  While standard training runs handle this object internally, the underlying functions for filtering, transforming, encoding, decoding, and ROI mapping are accessible or can be built individually.
  This modular design provides the **flexibility to use or customize specific parts of the target definition workflow programmatically** for advanced analyses, integration tasks, or specialized data processing pipelines, should you need to go beyond the standard configuration-driven approach.

## Summary

The `Targets` object encapsulates the entire configured target definition logic specified in your `TargetConfig` file.
It acts as the central component within BatDetect2 for applying filtering, tag transformation, ROI mapping (geometry to/from position/size), class encoding (for training preparation), and class/ROI decoding (for interpreting predictions).
It bridges the gap between your declarative configuration and the functional steps needed for training and using BatDetect2 models effectively, while also offering components for more advanced, scripted workflows.