mirror of
https://github.com/macaodha/batdetect2.git
synced 2026-04-04 15:20:19 +02:00
Compare commits: 5d92f3a00d ... 591d4f4ae8 (16 commits)
1
.gitignore
vendored
@@ -128,6 +128,7 @@ notebooks/tmp
/notebooks
/AGENTS.md
/scripts
/todo.md

# Assets
!assets/*
@@ -113,7 +113,7 @@ Your data may differ, and as a result it is very strongly recommended that you v

## FAQ
-For more information please consult our [FAQ](faq.md).
+For more information please consult our [FAQ](docs/source/faq.md).

## Reference
@@ -1,93 +0,0 @@
# BatDetect2 Architecture Overview

This document provides a comprehensive map of the `batdetect2` codebase architecture. It is intended to serve as a deep-dive reference for developers, agents, and contributors navigating the project.

`batdetect2` is designed as a modular deep learning pipeline for detecting and classifying bat echolocation calls in high-frequency audio recordings. It heavily utilizes **PyTorch**, **PyTorch Lightning** for training, and the **Soundevent** library for standardized audio and geometry data classes.

The repository follows a configuration-driven design pattern, heavily utilizing `pydantic`/`omegaconf` (via `BaseConfig`) and the Factory/Registry patterns for dependency injection and modularity. The entire pipeline can be orchestrated via the high-level API `BatDetect2API` (`src/batdetect2/api_v2.py`).

---

## 1. Data Flow Pipeline

The standard lifecycle of a prediction request follows these sequential stages, each handled by an isolated, replaceable module:

1. **Audio Loading (`batdetect2.audio`)**: Reads raw `.wav` files into standard NumPy arrays or `soundevent.data.Clip` objects. Handles resampling.
2. **Preprocessing (`batdetect2.preprocess`)**: Converts raw 1D waveforms into 2D spectrogram tensors.
3. **Forward Pass (`batdetect2.models`)**: A PyTorch neural network processes the spectrogram and outputs dense prediction tensors (e.g., detection heatmaps, bounding box sizes, class probabilities).
4. **Postprocessing (`batdetect2.postprocess`)**: Decodes the raw output tensors back into explicit geometry bounding boxes and runs Non-Maximum Suppression (NMS) to filter redundant predictions.
5. **Formatting (`batdetect2.data`)**: Transforms the predictions into standard formats (`.csv`, `.json`, `.parquet`) using `OutputFormatterProtocol`.
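The five stages above can be sketched end to end. Everything in this snippet — function names, array shapes, the threshold value — is an illustrative stand-in, not the actual batdetect2 API:

```python
import numpy as np

# Hypothetical stand-ins for the real modules; names and signatures here are
# illustrative assumptions, not the actual batdetect2 functions.

def load_audio(n_samples: int = 16000) -> np.ndarray:
    """Stage 1: pretend to read a .wav file into a 1D waveform."""
    rng = np.random.default_rng(0)
    return rng.standard_normal(n_samples).astype(np.float32)

def preprocess(waveform: np.ndarray, n_fft: int = 256, hop: int = 128) -> np.ndarray:
    """Stage 2: a crude magnitude spectrogram via framed FFTs."""
    frames = [waveform[i:i + n_fft] for i in range(0, len(waveform) - n_fft, hop)]
    return np.abs(np.fft.rfft(np.stack(frames), axis=1)).T  # (freq, time)

def forward(spec: np.ndarray) -> dict:
    """Stage 3: a dummy 'network' producing dense per-pixel outputs."""
    heatmap = spec / (spec.max() + 1e-9)       # detection probabilities
    sizes = np.ones((2, *spec.shape))          # bbox width/height per pixel
    return {"detection": heatmap, "size": sizes}

def postprocess(outputs: dict, threshold: float = 0.9) -> list[dict]:
    """Stage 4: threshold the heatmap and decode peaks into 'boxes'."""
    ys, xs = np.nonzero(outputs["detection"] >= threshold)
    return [{"time_idx": int(x), "freq_idx": int(y)} for y, x in zip(ys, xs)]

# Stages composed end to end, mirroring the data flow described above.
detections = postprocess(forward(preprocess(load_audio())))
```

Each stage consumes only the previous stage's output, which is what makes the real modules independently replaceable.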
---

## 2. Core Modules Breakdown

### 2.1 Audio and Preprocessing
- **`audio/`**:
  - Centralizes audio I/O using `AudioLoader`. It abstracts over the `soundevent` library, efficiently handling full `Recording` files or smaller `Clip` segments, and standardizes the sample rate.
- **`preprocess/`**:
  - Governed by the `PreprocessorProtocol`.
  - Its primary responsibility is spectrogram generation via the Short-Time Fourier Transform (STFT).
  - During training, it incorporates data augmentation layers (e.g., amplitude scaling, time masking, frequency masking, spectral mean subtraction) configured via `PreprocessingConfig`.
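The STFT step can be illustrated with a minimal NumPy sketch. The real preprocessor is configurable and torch-based; `stft_magnitude` and its parameters are assumptions for illustration only:

```python
import numpy as np

def stft_magnitude(waveform, n_fft=512, hop=256):
    """Magnitude STFT: slide a Hann-windowed frame and take the FFT.

    A minimal sketch of spectrogram generation; the real preprocessor
    applies additional normalization and augmentation layers.
    """
    window = np.hanning(n_fft)
    frames = [
        waveform[start:start + n_fft] * window
        for start in range(0, len(waveform) - n_fft + 1, hop)
    ]
    # Rows are frequency bins, columns are time frames.
    return np.abs(np.fft.rfft(np.stack(frames), axis=1)).T

# A 0.1 s pure tone sampled at 44.1 kHz lights up a single frequency row.
sr, freq = 44100, 5000.0
t = np.arange(int(0.1 * sr)) / sr
spec = stft_magnitude(np.sin(2 * np.pi * freq * t))
peak_bin = int(spec.mean(axis=1).argmax())
```

The frequency resolution is `sr / n_fft` Hz per bin, so the 5 kHz tone lands near bin `5000 * 512 / 44100 ≈ 58`.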
### 2.2 Deep Learning Models (`models/`)
The `models` directory contains all PyTorch neural network architectures. The default architecture is an Encoder-Decoder (U-Net style) network.
- **`blocks.py`**: Reusable neural network blocks, including standard convolutions (`ConvBlock`) and specialized layers like `FreqCoordConvDownBlock`/`FreqCoordConvUpBlock`, which append normalized spatial frequency coordinates to explicitly grant convolutional filters frequency-awareness.
- **`encoder.py`**: The downsampling path (feature extraction). Builds a sequential list of blocks and captures skip connections.
- **`bottleneck.py`**: The deepest, lowest-resolution segment connecting the Encoder and Decoder. Features an optional `SelfAttention` mechanism to weigh global temporal contexts.
- **`decoder.py`**: The upsampling path (reconstruction), actively integrating skip connections (residuals) from the Encoder.
- **`heads.py`**: Heads attach to the backbone's feature map to output specific predictions:
  - `BBoxHead`: Predicts bounding box sizes.
  - `ClassifierHead`: Predicts species classes.
  - `DetectorHead`: Predicts detection probability heatmaps.
- **`backbones.py` & `detectors.py`**: Assemble the encoder, bottleneck, decoder, and heads into a cohesive `Detector` model.
- **`__init__.py:Model`**: The overarching wrapper `torch.nn.Module` containing the `detector`, `preprocessor`, `postprocessor`, and `targets`.
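The coordinate trick behind the `FreqCoordConv*` blocks can be shown in a few lines of NumPy. The function below is a hypothetical sketch of the idea (the real blocks are torch modules inside the U-Net):

```python
import numpy as np

def append_freq_coords(feature_map):
    """Append a channel of normalized frequency coordinates.

    feature_map: array of shape (channels, freq, time). The extra channel
    ramps from 0 at the lowest frequency bin to 1 at the highest, letting a
    convolution "know" where it sits on the frequency axis — useful because
    bat calls occupy characteristic frequency bands.
    """
    _, n_freq, n_time = feature_map.shape
    coords = np.linspace(0.0, 1.0, n_freq).reshape(1, n_freq, 1)
    coords = np.broadcast_to(coords, (1, n_freq, n_time))
    return np.concatenate([feature_map, coords], axis=0)

x = np.zeros((8, 64, 100), dtype=np.float32)
y = append_freq_coords(x)  # one extra channel
```

An ordinary convolution is translation-invariant, so without this channel it cannot distinguish a pattern at 20 kHz from the same pattern at 80 kHz.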
### 2.3 Targets and Regions of Interest (`targets/`)
Crucial for training, this module translates physical annotations (Regions of Interest / ROIs) into training targets (tensors).
- **`rois.py`**: Implements `ROITargetMapper`. Maps a geometric bounding box into a 2D reference `Position` (time, freq) and a `Size` array. Includes strategies like:
  - `AnchorBBoxMapper`: Maps based on a fixed bounding box corner/center.
  - `PeakEnergyBBoxMapper`: Identifies the physical coordinate of peak acoustic energy inside the bounding box and calculates offsets to the box edges.
- **`targets.py`**: Reconstructs complete multi-channel target heatmaps and coordinate tensors from the ROIs to compute losses during training.
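The peak-energy strategy can be sketched as follows. This is an illustration of the idea behind `PeakEnergyBBoxMapper` using raw array indices rather than real time/frequency units, and the function name is made up:

```python
import numpy as np

def peak_energy_position(spec, t0, t1, f0, f1):
    """Locate the peak-energy cell inside a spectrogram bounding box.

    Returns the (freq_idx, time_idx) of the maximum plus the offsets to each
    box edge — a sketch of mapping an ROI to a reference Position and Size.
    """
    patch = spec[f0:f1, t0:t1]
    fi, ti = np.unravel_index(patch.argmax(), patch.shape)
    freq_idx, time_idx = f0 + int(fi), t0 + int(ti)
    offsets = {
        "left": time_idx - t0, "right": t1 - time_idx,
        "bottom": freq_idx - f0, "top": f1 - freq_idx,
    }
    return (freq_idx, time_idx), offsets

spec = np.zeros((128, 200))
spec[40, 75] = 10.0  # a single energetic cell inside the box
pos, offs = peak_energy_position(spec, t0=60, t1=90, f0=30, f1=50)
```

Anchoring targets at the energy peak, rather than a fixed box corner, makes the reference point robust to loosely drawn annotation boxes.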
### 2.4 Postprocessing (`postprocess/`)
- Implements `PostprocessorProtocol`.
- Reverses the logic from `targets`. It scans the model's output detection heatmaps for peaks, extracts the predicted sizes and class probabilities at those peaks, and decodes them back into physical `soundevent.data.Geometry` (bounding boxes).
- Automatically applies Non-Maximum Suppression (NMS) configured via `PostprocessConfig` to remove highly overlapping predictions.
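A common way to combine peak-finding and NMS on a heatmap is to keep only cells that are local maxima above a threshold. The sketch below illustrates that idea; it is not the library's actual algorithm:

```python
import numpy as np

def heatmap_peaks(heatmap, threshold=0.5, k=3):
    """Keep cells that equal the maximum of their k x k neighborhood.

    Non-maximum cells (e.g. the shoulders of a peak) are suppressed, so each
    detection survives as exactly one coordinate.
    """
    pad = k // 2
    padded = np.pad(heatmap, pad, mode="constant", constant_values=-np.inf)
    h, w = heatmap.shape
    # Build the neighborhood maximum for every cell.
    local_max = np.full_like(heatmap, -np.inf)
    for dy in range(k):
        for dx in range(k):
            local_max = np.maximum(local_max, padded[dy:dy + h, dx:dx + w])
    keep = (heatmap == local_max) & (heatmap >= threshold)
    ys, xs = np.nonzero(keep)
    return list(zip(ys.tolist(), xs.tolist()))

hm = np.zeros((10, 10))
hm[2, 3] = 0.9
hm[2, 4] = 0.8   # suppressed: adjacent to a larger peak
hm[7, 7] = 0.6
peaks = heatmap_peaks(hm)
```

Each surviving peak is then decoded into a box using the size and class tensors sampled at that coordinate.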
### 2.5 Data Management (`data/`)
- **`annotations/`**: Utilities to load dataset annotations supporting multiple standardized schemas (`AOEF`, `BatDetect2` formats).
- **`datasets.py`**: Aggregates recordings and annotations into memory.
- **`predictions/`**: Handles exporting model results via `OutputFormatterProtocol`. Includes formatters for `RawOutput`, `.parquet`, `.json`, etc.

### 2.6 Evaluation (`evaluate/`)
- Computes scientific metrics using `EvaluatorProtocol`.
- Provides specific testing environments for tasks like `Clip Classification`, `Clip Detection`, and `Top Class` predictions.
- Generates precision-recall curves and scatter plots.

### 2.7 Training (`train/`)
- Implements the distributed PyTorch training loop via PyTorch Lightning.
- **`lightning.py`**: Contains `TrainingModule`, the `LightningModule` that orchestrates the optimizer, learning rate scheduler, forward passes, and backpropagation using the generated `targets`.

---

## 3. Interfaces and Tooling

### 3.1 APIs
- **`api_v2.py` (`BatDetect2API`)**: The modern API object. It is deeply integrated with dependency injection using `BatDetect2Config`. It instantiates the loader, targets, preprocessor, postprocessor, and model, exposing easy-to-use methods like `process_file`, `evaluate`, and `train`.
- **`api.py`**: The legacy API, kept for backwards compatibility. Uses hardcoded default instances rather than configuration objects.

### 3.2 Command Line Interface (`cli/`)
- Implements terminal commands using `click`. Commands include `batdetect2 detect`, `evaluate`, and `train`.

### 3.3 Core and Configuration (`core/`, `config.py`)
- **`core/registries.py`**: A string-based Registry pattern (e.g., `block_registry`, `roi_mapper_registry`) that allows developers to dynamically swap components (like a custom neural network block) via configuration files without modifying Python code.
- **`config.py`**: Aggregates all modular `BaseConfig` objects (`AudioConfig`, `PreprocessingConfig`, `BackboneConfig`) into the monolithic `BatDetect2Config`.
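The string-based registry pattern can be sketched in a few lines. The decorator API and names below are illustrative assumptions, not the `core/registries.py` implementation:

```python
# A minimal string-keyed registry, sketching the pattern described above.

class Registry:
    def __init__(self):
        self._entries = {}

    def register(self, name):
        """Decorator: file the class under a string key usable from configs."""
        def wrap(cls):
            self._entries[name] = cls
            return cls
        return wrap

    def build(self, name, **kwargs):
        """Instantiate whichever component a config referred to by name."""
        return self._entries[name](**kwargs)

block_registry = Registry()

@block_registry.register("conv")
class ConvBlock:
    def __init__(self, channels):
        self.channels = channels

# A config file only needs the string "conv" to select the implementation.
block = block_registry.build("conv", channels=32)
```

Because lookup happens by string at build time, a user can register a custom block in their own module and reference it from YAML without touching the library.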
---

## Summary
To navigate this codebase effectively:
1. Follow **`api_v2.py`** to see how high-level operations invoke individual components.
2. Rely heavily on the typed **Protocols** located in each subsystem's `types.py` module (for example `src/batdetect2/preprocess/types.py` and `src/batdetect2/postprocess/types.py`) to understand inputs and outputs without needing to read each implementation.
3. Understand that data flows structurally as `soundevent` primitives externally, and as pure `torch.Tensor` objects internally through the network.
@@ -19,6 +19,7 @@ extensions = [
    "sphinx.ext.autosummary",
    "sphinx.ext.intersphinx",
    "sphinxcontrib.autodoc_pydantic",
    "sphinx_click",
    "numpydoc",
    "myst_parser",
    "sphinx_autodoc_typehints",
@@ -38,16 +39,28 @@ source_suffix = {

html_theme = "sphinx_book_theme"
html_static_path = ["_static"]
html_theme_options = {
    "home_page_in_toc": True,
    "show_navbar_depth": 2,
    "show_toc_level": 2,
}

intersphinx_mapping = {
    "python": ("https://docs.python.org/3", None),
    "click": ("https://click.palletsprojects.com/en/stable/", None),
    "librosa": ("https://librosa.org/doc/latest/", None),
    "lightning": ("https://lightning.ai/docs/pytorch/stable/", None),
    "loguru": ("https://loguru.readthedocs.io/en/stable/", None),
    "numpy": ("https://numpy.org/doc/stable/", None),
    "omegaconf": ("https://omegaconf.readthedocs.io/en/latest/", None),
    "pytorch": ("https://pytorch.org/docs/stable/", None),
    "soundevent": ("https://mbsantiago.github.io/soundevent/", None),
    "pydantic": ("https://docs.pydantic.dev/latest/", None),
    "xarray": ("https://docs.xarray.dev/en/stable/", None),
}

# -- Options for autodoc ------------------------------------------------------
-autosummary_generate = True
+autosummary_generate = False
autosummary_imported_members = True

autodoc_default_options = {
@@ -59,3 +72,7 @@ autodoc_default_options = {
    "show-inheritance": True,
    "module-first": True,
}

numpydoc_show_class_members = False
numpydoc_show_inherited_class_members = False
numpydoc_class_members_toctree = False
@@ -1,106 +0,0 @@
# Using AOEF / Soundevent Data Sources

## Introduction

The **AOEF (Acoustic Open Event Format)**, stored as `.json` files, is the annotation format used by the underlying `soundevent` library and is compatible with annotation tools like **Whombat**.
BatDetect2 can directly load annotation data stored in this format.

This format can represent two main types of annotation collections:

1. `AnnotationSet`: A straightforward collection of annotations for various audio clips.
2. `AnnotationProject`: A more structured format often exported by annotation tools (like Whombat).
   It includes not only the annotations but also information about annotation _tasks_ (work assigned to annotators) and their status (e.g., in-progress, completed, verified, rejected).

This section explains how to configure a data source in your `DatasetConfig` to load data from either type of AOEF file.

## Configuration

To define a data source using the AOEF format, you add an entry to the `sources` list in your main `DatasetConfig` (usually within your primary YAML configuration file) and set the `format` field to `"aoef"`.

Here are the key fields you need to specify for an AOEF source:

- `format: "aoef"`: **(Required)** Tells BatDetect2 to use the AOEF loader for this source.
- `name: your_source_name`: **(Required)** A unique name you choose for this data source (e.g., `"whombat_project_export"`, `"final_annotations"`).
- `audio_dir: path/to/audio/files`: **(Required)** The path to the directory where the actual audio `.wav` files referenced in the annotations are located.
- `annotations_path: path/to/your/annotations.aoef`: **(Required)** The path to the single `.aoef` or `.json` file containing the annotation data (either an `AnnotationSet` or an `AnnotationProject`).
- `description: "Details about this source..."`: (Optional) A brief description of the data source.
- `filter: ...`: (Optional) Specific settings used _only if_ the `annotations_path` file contains an `AnnotationProject`.
  See details below.

## Filtering Annotation Projects (Optional)

When working with annotation projects, especially collaborative ones or those still in progress (like exports from Whombat), you often want to train only on annotations that are considered complete and reliable.
The optional `filter:` section allows you to specify criteria based on the status of the annotation _tasks_ within the project.

**If `annotations_path` points to a simple `AnnotationSet` file, the `filter:` section is ignored.**

If `annotations_path` points to an `AnnotationProject`, you can add a `filter:` block with the following options:

- `only_completed: <true_or_false>`:
  - `true` (Default): Only include annotations from tasks that have been marked as "completed".
  - `false`: Include annotations regardless of task completion status.
- `only_verified: <true_or_false>`:
  - `false` (Default): Verification status is not considered.
  - `true`: Only include annotations from tasks that have _also_ been marked as "verified" (typically meaning they passed a review step).
- `exclude_issues: <true_or_false>`:
  - `true` (Default): Exclude annotations from any task that has been marked as "rejected" or flagged with issues.
  - `false`: Include annotations even if their task was marked as having issues (use with caution).

**Default Filtering:** If you include the `filter:` block but omit some options, or if you _omit the entire `filter:` block_, the default settings are applied to `AnnotationProject` files: `only_completed: true`, `only_verified: false`, `exclude_issues: true`.
This common default selects annotations from completed tasks that haven't been rejected, without requiring separate verification.

**Disabling Filtering:** If you want to load _all_ annotations from an `AnnotationProject` regardless of task status, you can explicitly disable filtering by setting `filter: null` in your YAML configuration.
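The filter semantics above can be sketched in Python using plain dicts as hypothetical stand-ins for annotation tasks. Field names and the `select_tasks` function are illustrative, not the batdetect2 implementation:

```python
# Defaults mirror the documented behavior when `filter:` is omitted.
DEFAULT_FILTER = {"only_completed": True, "only_verified": False, "exclude_issues": True}

def select_tasks(tasks, filt=DEFAULT_FILTER):
    if filt is None:  # mirrors `filter: null`: no task filtering at all
        return [t["name"] for t in tasks]
    kept = []
    for task in tasks:
        if filt["only_completed"] and not task["completed"]:
            continue
        if filt["only_verified"] and not task["verified"]:
            continue
        if filt["exclude_issues"] and task["issues"]:
            continue
        kept.append(task["name"])
    return kept

tasks = [
    {"name": "a", "completed": True,  "verified": True,  "issues": False},
    {"name": "b", "completed": True,  "verified": False, "issues": False},
    {"name": "c", "completed": False, "verified": False, "issues": False},
    {"name": "d", "completed": True,  "verified": True,  "issues": True},
]
defaults = select_tasks(tasks)                                   # completed, no issues
verified = select_tasks(tasks, {**DEFAULT_FILTER, "only_verified": True})
```

With the defaults, tasks "a" and "b" survive; requiring verification narrows this to "a" alone.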
## YAML Configuration Examples

**Example 1: Loading a standard AnnotationSet (or a Project with default filtering)**

```yaml
# In your main DatasetConfig YAML file

sources:
  - name: "MyFinishedAnnotations"
    format: "aoef" # Specifies the loader
    audio_dir: "/path/to/my/audio/"
    annotations_path: "/path/to/my/dataset.soundevent.json" # Path to the AOEF file
    description: "Finalized annotations set."
    # No 'filter:' block means default filtering applied IF it's an AnnotationProject,
    # or no filtering applied if it's an AnnotationSet.
```

**Example 2: Loading an AnnotationProject, requiring verification**

```yaml
# In your main DatasetConfig YAML file

sources:
  - name: "WhombatVerifiedExport"
    format: "aoef"
    audio_dir: "relative/path/to/audio/" # Relative to where BatDetect2 runs or a base_dir
    annotations_path: "exports/whombat_project.aoef" # Path to the project file
    description: "Annotations from Whombat project, only using verified tasks."
    filter: # Customize the filter
      only_completed: true # Still require completion
      only_verified: true # *Also* require verification
      exclude_issues: true # Still exclude rejected tasks
```

**Example 3: Loading an AnnotationProject, disabling all filtering**

```yaml
# In your main DatasetConfig YAML file

sources:
  - name: "WhombatRawExport"
    format: "aoef"
    audio_dir: "data/audio_pool/"
    annotations_path: "exports/whombat_project_all.aoef"
    description: "All annotations from Whombat, regardless of task status."
    filter: null # Explicitly disable task filtering
```

## Summary

To load standard `soundevent` annotations (including Whombat exports), set `format: "aoef"` for your data source in the `DatasetConfig`.
Provide the `audio_dir` and the path to the single `annotations_path` file.
If dealing with `AnnotationProject` files, you can optionally use the `filter:` block to select annotations based on task completion, verification, or issue status.
@@ -1,9 +0,0 @@
# Loading Data

```{toctree}
:maxdepth: 1
:caption: Loading Data

aoef
legacy
```
@@ -1,122 +0,0 @@
# Using Legacy BatDetect2 Annotation Formats

## Introduction

If you have annotation data created using older BatDetect2 annotation tools, BatDetect2 provides tools to load these datasets.
These older formats typically use JSON files to store annotation information, including bounding boxes and labels for sound events within recordings.

There are two main variations of this legacy format that BatDetect2 can load:

1. **Directory-Based (`format: "batdetect2"`):** Annotations for each audio recording are stored in a _separate_ JSON file within a dedicated directory.
   There's a naming convention linking the JSON file to its corresponding audio file (e.g., `my_recording.wav` annotations are stored in `my_recording.wav.json`).
2. **Single Merged File (`format: "batdetect2_file"`):** Annotations for _multiple_ recordings are aggregated into a _single_ JSON file.
   This file contains a list, where each item represents the annotations for one recording, following the same internal structure as the directory-based format.

When you configure BatDetect2 to use these formats, it will read the legacy data and convert it internally into the standard `soundevent` data structures used by the rest of the pipeline.

## Configuration

You specify which legacy format to use within the `sources` list of your main `DatasetConfig` (usually in your primary YAML configuration file).

### Format 1: Directory-Based

Use this when you have a folder containing many individual JSON annotation files, one for each audio file.

**Configuration Fields:**

- `format: "batdetect2"`: **(Required)** Identifies this specific legacy format loader.
- `name: your_source_name`: **(Required)** A unique name for this data source.
- `audio_dir: path/to/audio/files`: **(Required)** Path to the directory containing the `.wav` audio files.
- `annotations_dir: path/to/annotation/jsons`: **(Required)** Path to the directory containing the individual `.json` annotation files.
- `description: "Details..."`: (Optional) Description of this source.
- `filter: ...`: (Optional) Settings to filter which JSON files are processed based on flags within them (see "Filtering Legacy Annotations" below).

**YAML Example:**

```yaml
# In your main DatasetConfig YAML file
sources:
  - name: "OldProject_SiteA_Files"
    format: "batdetect2" # Use the directory-based loader
    audio_dir: "/data/SiteA/Audio/"
    annotations_dir: "/data/SiteA/Annotations_JSON/"
    description: "Legacy annotations stored as individual JSONs per recording."
    # filter: ... # Optional filter settings can be added here
```

### Format 2: Single Merged File

Use this when you have a single JSON file that contains a list of annotations for multiple recordings.

**Configuration Fields:**

- `format: "batdetect2_file"`: **(Required)** Identifies this specific legacy format loader.
- `name: your_source_name`: **(Required)** A unique name for this data source.
- `audio_dir: path/to/audio/files`: **(Required)** Path to the directory containing the `.wav` audio files referenced _within_ the merged JSON file.
- `annotations_path: path/to/your/merged_annotations.json`: **(Required)** Path to the single `.json` file containing the list of annotations.
- `description: "Details..."`: (Optional) Description of this source.
- `filter: ...`: (Optional) Settings to filter which records _within_ the merged file are processed (see "Filtering Legacy Annotations" below).

**YAML Example:**

```yaml
# In your main DatasetConfig YAML file
sources:
  - name: "OldProject_Merged"
    format: "batdetect2_file" # Use the merged file loader
    audio_dir: "/data/AllAudio/"
    annotations_path: "/data/CombinedAnnotations/old_project_merged.json"
    description: "Legacy annotations aggregated into a single JSON file."
    # filter: ... # Optional filter settings can be added here
```

## Filtering Legacy Annotations

The legacy JSON annotation structure (for both formats) includes boolean flags indicating the status of the annotation work for each recording:

- `annotated`: Typically `true` if a human had reviewed or created annotations for the file.
- `issues`: Typically `true` if problems were noted during annotation or review.

You can optionally filter the data based on these flags using a `filter:` block within the source configuration.
This applies whether you use `"batdetect2"` or `"batdetect2_file"`.

**Filter Options:**

- `only_annotated: <true_or_false>`:
  - `true` (**Default**): Only process entries where the `annotated` flag in the JSON is `true`.
  - `false`: Process entries regardless of the `annotated` flag.
- `exclude_issues: <true_or_false>`:
  - `true` (**Default**): Skip processing entries where the `issues` flag in the JSON is `true`.
  - `false`: Process entries even if they are flagged with `issues`.
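The flag-based filtering can be sketched as a directory scan over per-recording JSON files. The file layout and field names follow the legacy format described above, but the loader function itself is hypothetical:

```python
import json
import pathlib
import tempfile

def load_legacy_dir(annotations_dir, only_annotated=True, exclude_issues=True):
    """Scan a directory of legacy per-recording JSONs, applying the flags."""
    records = []
    for path in sorted(pathlib.Path(annotations_dir).glob("*.json")):
        record = json.loads(path.read_text())
        if only_annotated and not record.get("annotated", False):
            continue
        if exclude_issues and record.get("issues", False):
            continue
        records.append(record["id"])
    return records

# Demonstrate with a throwaway directory of three legacy-style files.
tmp = tempfile.mkdtemp()
for rec in [
    {"id": "r1.wav", "annotated": True,  "issues": False},
    {"id": "r2.wav", "annotated": False, "issues": False},
    {"id": "r3.wav", "annotated": True,  "issues": True},
]:
    (pathlib.Path(tmp) / (rec["id"] + ".json")).write_text(json.dumps(rec))

kept_default = load_legacy_dir(tmp)                        # only r1.wav
kept_loose = load_legacy_dir(tmp, exclude_issues=False)    # r1.wav and r3.wav
```

With the defaults, only the record that is annotated and issue-free survives, matching the documented default filter.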
**Default Filtering:** If you **omit** the `filter:` block entirely, the default settings (`only_annotated: true`, `exclude_issues: true`) are applied automatically.
This means only entries marked as annotated and not having issues will be loaded.

**Disabling Filtering:** To load _all_ entries from the legacy source regardless of the `annotated` or `issues` flags, explicitly disable the filter:

```yaml
filter: null
```

**YAML Example (Custom Filter):** Only load entries marked as annotated, but _include_ those with issues.

```yaml
sources:
  - name: "LegacyData_WithIssues"
    format: "batdetect2" # Or "batdetect2_file"
    audio_dir: "path/to/audio"
    annotations_dir: "path/to/annotations" # Or annotations_path for merged
    filter:
      only_annotated: true
      exclude_issues: false # Include entries even if issues flag is true
```

## Summary

BatDetect2 allows you to incorporate datasets stored in older "BatDetect2" JSON formats.

- Use `format: "batdetect2"` and provide `annotations_dir` if you have one JSON file per recording in a directory.
- Use `format: "batdetect2_file"` and provide `annotations_path` if you have a single JSON file containing annotations for multiple recordings.
- Optionally use the `filter:` block with `only_annotated` and `exclude_issues` to select data based on flags present in the legacy JSON structure.

The system will handle loading, filtering (if configured), and converting this legacy data into the standard `soundevent` format used internally.
34
docs/source/development/index.md
Normal file
@@ -0,0 +1,34 @@
# Development and contribution

Thanks for your interest in improving batdetect2.

## Ways to contribute

- Report bugs and request features on
  [GitHub Issues](https://github.com/macaodha/batdetect2/issues)
- Improve docs by opening pull requests with clearer examples, fixes, or
  missing workflows
- Contribute code for models, data handling, evaluation, or CLI workflows

## Basic contribution workflow

1. Open an issue (or comment on an existing one) so work is visible.
2. Create a branch for your change.
3. Run checks locally before opening a PR:

   ```bash
   just check
   just docs
   ```

4. Open a pull request with a clear summary of what changed and why.

## Development environment

Use `uv` for dependency and environment management.

```bash
uv sync
```

For more setup details, see {doc}`../getting_started`.
139
docs/source/documentation_plan.md
Normal file
@@ -0,0 +1,139 @@
---
orphan: true
---

# Documentation Architecture and Migration Plan (Phase 0)

This page defines the Phase 0 documentation architecture and inventory for
reorganizing `batdetect2` documentation using the Diataxis framework.

## Scope and goals

Phase 0 focuses on architecture and prioritization only. It does not attempt
to write all new docs yet.

Primary goals:

1. Define a target docs architecture by Diataxis type.
2. Map current pages to target documentation types.
3. Identify what to keep, split, rewrite, or deprecate.
4. Set priorities for implementation phases.

## Audiences

Two primary audiences are in scope.

1. Ecologists who prefer minimal coding, focused on practical workflows:
   run inference, inspect outputs, and possibly train with custom data.
2. Ecologists or bioacousticians who are Python-savvy and want to customize
   workflows, training, and analysis.

## Target information architecture

The target architecture uses four top-level documentation sections.

1. Tutorials
   - Learning-oriented, single-path, reproducible walkthroughs.
2. How-to guides
   - Task-oriented procedures for common real goals.
3. Reference
   - Factual descriptions of CLI, configs, APIs, and formats.
4. Explanation
   - Conceptual material that explains why design and workflow decisions
     matter.

Cross-cutting navigation conventions:

- Every page starts with audience, prerequisites, and outcome.
- Every page serves one Diataxis type only.
- Beginner-first path is prioritized, with clear links to advanced pages.

## Phase 0 inventory: current docs mapped to Diataxis

Legend:

- Keep: useful as-is with minor edits.
- Split: contains mixed documentation types and should be separated.
- Rewrite: major changes needed to fit target audience/type.
- Move: content is valid but belongs under another section.
| Current page | Current role | Target type | Audience | Action | Priority |
| --- | --- | --- | --- | --- | --- |
| `README.md` | Mixed quickstart + CLI + API + warning | Tutorial + How-to + Explanation (split) | 1 + 2 | Split | P0 |
| `docs/source/index.md` | Sparse landing page | Navigation hub | 1 + 2 | Rewrite | P0 |
| `docs/source/architecture.md` | Internal architecture deep dive | Explanation + developer reference | 2 | Move/trim | P2 |
| `docs/source/postprocessing.md` | Concept + config + internals + usage | Explanation + How-to + Reference (split) | 1 + 2 | Split | P1 |
| `docs/source/preprocessing/index.md` | Conceptual overview with some procedural flow | Explanation | 2 (and 1 optional) | Keep/trim | P2 |
| `docs/source/preprocessing/audio.md` | Detailed configuration and behavior | Reference + How-to fragments | 2 | Split | P2 |
| `docs/source/preprocessing/spectrogram.md` | Detailed configuration and behavior | Reference + How-to fragments | 2 | Split | P2 |
| `docs/source/preprocessing/usage.md` | Usage patterns + concept | How-to + Explanation (split) | 2 | Split | P1 |
| `docs/source/data/index.md` | Data-loading section index | Reference index | 2 | Keep/update | P2 |
| `docs/source/data/aoef.md` | Config and examples | How-to + Reference (split) | 2 | Split | P1 |
| `docs/source/data/legacy.md` | Legacy formats and config | How-to + Reference (split) | 2 | Split | P2 |
| `docs/source/targets/index.md` | Long conceptual + process overview | Explanation + How-to (split) | 2 | Split | P2 |
| `docs/source/targets/tags_and_terms.md` | Definitions + guidance | Explanation + Reference | 2 | Split | P2 |
| `docs/source/targets/filtering.md` | Procedure + config | How-to + Reference | 2 | Split | P2 |
| `docs/source/targets/transform.md` | Procedure + config | How-to + Reference | 2 | Split | P2 |
| `docs/source/targets/classes.md` | Procedure + config | How-to + Reference | 2 | Split | P2 |
| `docs/source/targets/rois.md` | Concept + mapping details | Explanation + Reference | 2 | Split | P2 |
| `docs/source/targets/use.md` | Integration overview | Explanation | 2 | Keep/trim | P2 |
| `docs/source/reference/index.md` | Small reference root | Reference | 2 | Expand | P1 |
| `docs/source/reference/configs.md` | Autodoc for configs | Reference | 2 | Keep | P1 |
| `docs/source/reference/targets.md` | Autodoc for targets | Reference | 2 | Keep | P2 |

## CLI and API documentation gaps (from code surface)

Current command surface includes:

- `batdetect2 detect` (compat command)
- `batdetect2 predict directory`
- `batdetect2 predict file_list`
- `batdetect2 predict dataset`
- `batdetect2 train`
- `batdetect2 evaluate`
- `batdetect2 data summary`
- `batdetect2 data convert`

These commands are not yet represented as a coherent user-facing task set.

Priority gap actions:

1. Add CLI reference pages for command signatures and options.
2. Add beginner how-to pages for practical command recipes.
3. Add migration guidance from `detect` to `predict` workflows.

## Priority architecture for implementation phases

### P0 (this phase): architecture and inventory

- Done in this file.
- Define structure and classify existing material.

### P1: user-critical docs for running the model

1. Beginner tutorial: run inference on a folder of audio and inspect outputs.
2. How-to guides for repeatable inference tasks and threshold tuning.
3. Reference: complete CLI docs for prediction and outputs.
4. Explanation: interpretation caveats and validation guidance.

### P2: advanced customization and training

1. How-to guides for custom dataset preparation and training.
2. Reference for data formats, targets, and preprocessing configs.
3. Explanation docs for target design and pipeline trade-offs.

### P3: polish and contributor consistency

1. Tight cross-linking across Diataxis boundaries.
2. Consistent page templates and terminology.
3. Reader testing with representative users from both audiences.

## Definition of done for Phase 0

Phase 0 is complete when:

1. The target architecture is defined.
2. Existing content is inventoried and classified.
3. A prioritized migration path is agreed.

This page satisfies these criteria and is the baseline for Phase 1 work.
14
docs/source/explanation/index.md
Normal file
@ -0,0 +1,14 @@
# Explanation

Explanation pages describe why BatDetect2 behaves as it does and how to reason
about trade-offs.

```{toctree}
:maxdepth: 1

model-output-and-validation
postprocessing-and-thresholds
pipeline-overview
preprocessing-consistency
target-encoding-and-decoding
```
29
docs/source/explanation/model-output-and-validation.md
Normal file
@ -0,0 +1,29 @@
# Model output and validation

BatDetect2 outputs model predictions, not ground truth. The same configuration
can behave differently across recording conditions, species compositions, and
acoustic environments.

## Why threshold choice matters

- Lower detection thresholds increase sensitivity but can increase false
  positives.
- Higher thresholds reduce false positives but can miss faint calls.

No threshold is universally correct. The right setting depends on your survey
objectives and tolerance for false positives versus missed detections.

## Why local validation is required

Model performance depends on how similar your data are to the training data.
Before ecological interpretation, validate predictions on a representative,
locally reviewed subset.

Recommended validation checks:

1. Compare detection counts against expert-reviewed clips.
2. Inspect species-level predictions for plausible confusion patterns.
3. Repeat checks across sites, seasons, and recorder setups.

For practical threshold workflows, see
{doc}`../how_to/tune-detection-threshold`.
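A minimal sketch of check 1 above, assuming you already have per-clip call
counts from an expert review. The clip IDs, counts, and the disagreement rule
are invented for illustration; replace them with your own reviewed data.

```python
# Hypothetical per-clip call counts; substitute your own reviewed data.
reviewed = {"clip_01": 4, "clip_02": 0, "clip_03": 7}   # expert counts
predicted = {"clip_01": 5, "clip_02": 3, "clip_03": 6}  # model counts

# Flag clips where the model disagrees with the reviewer by more than one call.
flagged = {
    clip: {"model": predicted[clip], "expert": expert}
    for clip, expert in reviewed.items()
    if abs(predicted[clip] - expert) > 1
}
print(flagged)  # → {'clip_02': {'model': 3, 'expert': 0}}
```

Clips flagged this way are good candidates for manual re-inspection before you
trust aggregate counts.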
34
docs/source/explanation/pipeline-overview.md
Normal file
@ -0,0 +1,34 @@
# Pipeline overview

batdetect2 processes recordings as a sequence of modules. Each stage has a
clear role and configuration surface.

## End-to-end flow

1. Audio loading
2. Preprocessing (waveform -> spectrogram)
3. Detector forward pass
4. Postprocessing (peaks, decoding, thresholds)
5. Output formatting and export

## Why the modular design matters

The model, preprocessing, postprocessing, targets, and output formatting are
configured separately. That makes it easier to:

- swap components without rewriting the whole pipeline,
- keep experiments reproducible,
- adapt workflows to new datasets.

## Core objects in the stack

- `BatDetect2API` orchestrates training, inference, and evaluation workflows.
- `ModelConfig` defines architecture, preprocessing, postprocessing, and
  targets.
- `Targets` controls event filtering, class encoding/decoding, and ROI mapping.

## Related pages

- Preprocessing rationale: {doc}`preprocessing-consistency`
- Postprocessing rationale: {doc}`postprocessing-and-thresholds`
- Target rationale: {doc}`target-encoding-and-decoding`
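The end-to-end flow can be sketched as a chain of stages. Every function below
is a stub standing in for the real module; the names and data shapes are
illustrative only, not the batdetect2 API.

```python
# Stub stages standing in for the real pipeline modules (illustrative only).
def load_audio(path):
    return {"path": path, "waveform": [0.0, 0.1, -0.1, 0.2]}

def preprocess(audio):
    # waveform -> toy "spectrogram"
    return [[abs(x) for x in audio["waveform"]]]

def detect(spectrogram):
    # dense model output -> raw candidate detections
    return [{"score": 0.9}, {"score": 0.3}]

def postprocess(candidates, threshold=0.5):
    # apply a score threshold to keep confident detections
    return [c for c in candidates if c["score"] >= threshold]

def export(detections):
    return {"num_detections": len(detections)}

# Because each stage is separate, any one of them can be swapped out.
result = export(postprocess(detect(preprocess(load_audio("rec.wav")))))
print(result)  # → {'num_detections': 1}
```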
43
docs/source/explanation/postprocessing-and-thresholds.md
Normal file
@ -0,0 +1,43 @@
# Postprocessing and thresholds

After the detector runs on a spectrogram, the model output is still a set of
dense prediction tensors. Postprocessing turns that into a final list of call
detections with positions, sizes, and class scores.

## What postprocessing does

In broad terms, the pipeline:

1. suppresses nearby duplicate peaks,
2. extracts candidate detections,
3. reads size and class values at each detected location,
4. decodes outputs into call-level predictions.

This is where score thresholds and output density limits are applied.

## Why thresholds matter

Thresholds control the balance between sensitivity and precision.

- Lower thresholds keep more detections, including weaker calls, but may add
  false positives.
- Higher thresholds remove low-confidence detections, but may miss faint calls.

You can tune this behavior per run without retraining the model.

## Two common threshold controls

- `detection_threshold`: minimum score required to keep a detection.
- `classification_threshold`: minimum class score used when assigning class
  labels.

Both settings shape the final output and should be validated on reviewed local
data.

## Practical workflow

Tune thresholds on a representative subset first, then lock settings for the
full analysis run.

- How-to: {doc}`../how_to/tune-detection-threshold`
- CLI reference: {doc}`../reference/cli/predict`
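The interaction between the two threshold controls can be illustrated with a
toy sketch. The scores and class names are invented, and the real pipeline
operates on dense tensors, so treat this as a conceptual model only.

```python
# Toy candidate detections (invented scores and class names).
detections = [
    {"det_score": 0.9, "class_scores": {"pippip": 0.8, "pippyg": 0.2}},
    {"det_score": 0.4, "class_scores": {"pippip": 0.6, "pippyg": 0.4}},
    {"det_score": 0.7, "class_scores": {"pippip": 0.3, "pippyg": 0.25}},
]

detection_threshold = 0.5
classification_threshold = 0.4

kept = []
for det in detections:
    if det["det_score"] < detection_threshold:
        continue  # drop low-confidence detections entirely
    label, score = max(det["class_scores"].items(), key=lambda kv: kv[1])
    if score < classification_threshold:
        label = None  # keep the detection, but leave it unclassified
    kept.append({"det_score": det["det_score"], "label": label})

print(kept)
# → [{'det_score': 0.9, 'label': 'pippip'}, {'det_score': 0.7, 'label': None}]
```

Note how the second control prunes only the class label, not the detection
itself.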
36
docs/source/explanation/preprocessing-consistency.md
Normal file
@ -0,0 +1,36 @@
# Preprocessing consistency

Preprocessing consistency is one of the biggest factors behind stable model
performance.

## Why consistency matters

The detector is trained on spectrograms produced by a specific preprocessing
pipeline. If inference uses different settings, the model can see a shifted
input distribution and performance may drop.

Typical mismatch sources:

- sample-rate differences,
- changed frequency crop,
- changed STFT window/hop,
- changed spectrogram transforms.

## Practical implication

When possible, keep preprocessing settings aligned between:

- training,
- evaluation,
- deployment inference.

If you intentionally change preprocessing, treat this as a new experiment and
re-validate on reviewed local data.

## Related pages

- Configure audio preprocessing:
  {doc}`../how_to/configure-audio-preprocessing`
- Configure spectrogram preprocessing:
  {doc}`../how_to/configure-spectrogram-preprocessing`
- Preprocessing config reference: {doc}`../reference/preprocessing-config`
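One lightweight way to catch drift is to diff the settings that matter before
a run. The keys and values below are hypothetical, not the actual config
schema; the point is the comparison pattern, which works on any parsed config.

```python
# Hypothetical flattened preprocessing settings for two runs.
train_cfg = {"samplerate": 256000, "window_duration": 0.002,
             "window_overlap": 0.75, "min_freq": 10000, "max_freq": 120000}
infer_cfg = {"samplerate": 256000, "window_duration": 0.002,
             "window_overlap": 0.5, "min_freq": 10000, "max_freq": 120000}

# Report every setting where inference diverges from training.
mismatches = {k: (train_cfg[k], infer_cfg[k])
              for k in train_cfg if train_cfg[k] != infer_cfg[k]}
print(mismatches)  # → {'window_overlap': (0.75, 0.5)}
```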
40
docs/source/explanation/target-encoding-and-decoding.md
Normal file
@ -0,0 +1,40 @@
# Target encoding and decoding

batdetect2 turns annotated sound events into training targets, then maps model
outputs back into interpretable predictions.

## Encoding path (annotations -> model targets)

At training time, the target system:

1. checks whether an event belongs to the configured detection target,
2. assigns a classification label (or none for non-specific class matches),
3. maps event geometry into position and size targets.

This behaviour is configured through `TargetConfig`,
`TargetClassConfig`, and ROI mapper settings.

## Decoding path (model outputs -> tags and geometry)

At inference time, class labels and ROI parameters are decoded back into
annotation tags and geometry.

This makes outputs interpretable in the same conceptual space as your original
annotations.

## Why this matters

Target definitions are not just metadata. They directly shape:

- what events are treated as positive examples,
- which class names the model learns,
- how geometry is represented and reconstructed.

Small changes here can alter both training outcomes and prediction semantics.

## Related pages

- Configure detection target logic: {doc}`../how_to/configure-target-definitions`
- Configure class mapping: {doc}`../how_to/define-target-classes`
- Configure ROI mapping: {doc}`../how_to/configure-roi-mapping`
- Target config reference: {doc}`../reference/targets-config-workflow`
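As a toy illustration of the geometry round trip, here is an encode/decode
sketch for a bounding box with a bottom-left anchor, using the scale factors
shown in the ROI how-to guide. The real implementation may differ in detail;
this only shows why encode and decode must agree.

```python
# Scale factors as shown in the ROI config examples.
TIME_SCALE = 1000.0    # seconds -> scaled width units
FREQ_SCALE = 0.001163  # Hz -> scaled height units

def encode(box):
    """Box (start_s, end_s, low_hz, high_hz) -> anchor point plus scaled size."""
    start, end, low, high = box
    return {"anchor": (start, low),                  # bottom-left corner
            "width": (end - start) * TIME_SCALE,
            "height": (high - low) * FREQ_SCALE}

def decode(target):
    """Invert encode(): rebuild the box from anchor and scaled size."""
    start, low = target["anchor"]
    return (start,
            start + target["width"] / TIME_SCALE,
            low,
            low + target["height"] / FREQ_SCALE)

box = (0.10, 0.12, 30000.0, 60000.0)
round_trip = decode(encode(box))
print(round_trip)
```

If the scales or anchor used at decode time differ from those used at encode
time, the reconstructed geometry silently shifts; this is why these settings
should stay fixed within an experiment.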
86
docs/source/faq.md
Normal file
@ -0,0 +1,86 @@
# FAQ

## Installation and setup

### Do I need Python knowledge to use batdetect2?

Not much. If you only want to run the model on your own recordings, you can
use the CLI and follow the steps in {doc}`getting_started`.

Some command-line familiarity helps, but you do not need to write Python code
for standard inference workflows.

### Are there plans for an R version?

Not currently. Output files are plain formats (for example CSV/JSON), so you
can read and analyze them in R or other environments.
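For example, a detections table in CSV form can be summarized with standard
tools in any language. This Python sketch uses invented column names; check
the header of your actual output files before adapting it.

```python
import csv
import io
from collections import Counter

# Invented CSV content with hypothetical column names (file, species, score).
csv_text = """file,species,score
rec1.wav,Pipistrellus pipistrellus,0.91
rec1.wav,Pipistrellus pygmaeus,0.45
rec2.wav,Pipistrellus pipistrellus,0.78
"""

# Count detections per predicted species.
counts = Counter(row["species"] for row in csv.DictReader(io.StringIO(csv_text)))
print(counts.most_common(1))  # → [('Pipistrellus pipistrellus', 2)]
```

The equivalent in R is a one-liner with `read.csv` plus `table`.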
### I cannot get installation working. What should I do?

First, re-check {doc}`getting_started` and confirm your environment is active.
If it still fails, open an issue with your OS, install method, and full error
output: [GitHub Issues](https://github.com/macaodha/batdetect2/issues).

## Model behavior and performance

### The model does not perform well on my data

This usually means your data distribution differs from the training data. The
best next step is to validate on reviewed local data and then fine-tune or
train on your own annotations if needed.

### The model confuses insects/noise with bats

This can happen, especially when recording conditions differ from training
conditions. Threshold tuning and training with local annotations can improve
results.

See {doc}`how_to/tune-detection-threshold`.

### The model struggles with feeding buzzes or social calls

This is a known limitation of the available training data in some settings. If
you have high-quality annotated examples, they are valuable for improving
models.

### Calls in the same sequence are predicted as different species

batdetect2 returns per-call probabilities and does not apply heavy
sequence-level smoothing by default. You can apply sequence-aware
postprocessing in your own analysis workflow.

### Can I trust model outputs for biodiversity conclusions?

Use caution. Always validate model behavior on local, reviewed data before
using outputs for ecological inference or biodiversity assessment.

### The pipeline is slow

Runtime depends on hardware and recording duration. GPU inference is often much
faster than CPU. If files are very long, splitting them into shorter clips can
help throughput.

If you need a clipping workflow, see the annotation GUI repository:
[batdetect2_GUI](https://github.com/macaodha/batdetect2_GUI).

## Training and scope

### Can I train on my own species set?

Yes. You can train or fine-tune with your own annotated data and species
labels.

### Does this work on frequency-division or zero-crossing recordings?

Not directly. The workflow assumes audio can be converted to spectrograms from
the raw waveform.

### Can this be used for non-bat bioacoustics (for example insects or birds)?

Potentially yes, but expect retraining and configuration changes. Open an issue
if you want guidance for a specific use case.

## Usage and licensing

### Can I use this for commercial purposes?

No. This project is currently for non-commercial use. See the repository
license for details.
83
docs/source/getting_started.md
Normal file
@ -0,0 +1,83 @@
# Getting started

BatDetect2 is both a command line tool (CLI) and a Python library.

- Use the CLI if you want to run existing models or train your own models from
  the terminal.
- Use the Python package if you want to integrate BatDetect2 into your own
  scripts, notebooks, or analysis pipeline.

If you want to try BatDetect2 before installing anything locally:

- [Hugging Face demo (UK species)](https://huggingface.co/spaces/macaodha/batdetect2)
- [Google Colab notebook](https://colab.research.google.com/github/macaodha/batdetect2/blob/master/batdetect2_notebook.ipynb)

## Prerequisites

We recommend `uv` for both workflows.
`uv` is a fast Python package and environment manager that keeps installs
isolated and reproducible.

- Use `uv tool` to install the CLI.
- Use `uv add` to add `batdetect2` as a dependency in a Python project.

Install `uv` first by following their
[installation instructions](https://docs.astral.sh/uv/getting-started/installation/).

## Install the CLI

The following installs `batdetect2` in an isolated tool environment and exposes
the `batdetect2` command on your machine.

```bash
uv tool install batdetect2
```

If you need to upgrade later:

```bash
uv tool upgrade batdetect2
```

Verify the CLI is available:

```bash
batdetect2 --help
```

Run your first workflow:

Go to {doc}`tutorials/run-inference-on-folder` for a complete first run.

## Integrate with your Python project

If you are using BatDetect2 from Python code, add it to your project
dependencies:

```bash
uv add batdetect2
```

This keeps dependency metadata and the environment in sync.

### Alternative with `pip`

If you prefer `pip`, create and activate a virtual environment first:

```bash
python -m venv .venv
source .venv/bin/activate
```

Then install from PyPI:

```bash
pip install batdetect2
```

## What's next

- Run your first detection workflow:
  {doc}`tutorials/run-inference-on-folder`
- For practical task recipes, go to {doc}`how_to/index`
- For command and option details, go to {doc}`reference/cli/index`
53
docs/source/how_to/configure-aoef-dataset.md
Normal file
@ -0,0 +1,53 @@
# How to configure an AOEF dataset source

Use this guide when your annotations are stored in AOEF/soundevent JSON files,
including exports from Whombat.

## 1) Add an AOEF source entry

In your dataset config, add a source with `format: aoef`.

```yaml
sources:
  - name: my_aoef_source
    format: aoef
    audio_dir: /path/to/audio
    annotations_path: /path/to/annotations.soundevent.json
```

## 2) Choose filtering behavior for annotation projects

If `annotations_path` is an `AnnotationProject`, you can filter by task state.

```yaml
sources:
  - name: whombat_verified
    format: aoef
    audio_dir: /path/to/audio
    annotations_path: /path/to/project_export.aoef
    filter:
      only_completed: true
      only_verified: true
      exclude_issues: true
```

If you omit `filter`, default project filtering is applied.

To disable filtering for project files:

```yaml
filter: null
```

## 3) Check that the source loads

Run a summary on your dataset config:

```bash
batdetect2 data summary path/to/dataset.yaml
```

## 4) Continue to training or evaluation

- For training: {doc}`../tutorials/train-a-custom-model`
- For field-level reference: {doc}`../reference/data-sources`
64
docs/source/how_to/configure-audio-preprocessing.md
Normal file
@ -0,0 +1,64 @@
# How to configure audio preprocessing

Use this guide to set sample-rate and waveform-level preprocessing behaviour.

## 1) Set audio loader settings

The audio loader config controls resampling.

```yaml
samplerate: 256000
resample:
  enabled: true
  method: poly
```

If your recordings are already at the expected sample rate, you can disable
resampling.

```yaml
samplerate: 256000
resample:
  enabled: false
```

## 2) Set waveform transforms in preprocessing config

Waveform transforms are configured in `preprocess.audio_transforms`.

```yaml
preprocess:
  audio_transforms:
    - name: center_audio
    - name: scale_audio
    - name: fix_duration
      duration: 0.5
```

Available built-ins:

- `center_audio`
- `scale_audio`
- `fix_duration`

## 3) Use the config in your workflow

For CLI inference/evaluation, use `--audio-config`.

```bash
batdetect2 predict directory \
  path/to/model.ckpt \
  path/to/audio_dir \
  path/to/outputs \
  --audio-config path/to/audio.yaml
```

## 4) Verify quickly on a small subset

Run on a small folder first and confirm that outputs and runtime are as
expected before full-batch runs.

## Related pages

- Spectrogram settings: {doc}`configure-spectrogram-preprocessing`
- Preprocessing config reference: {doc}`../reference/preprocessing-config`
57
docs/source/how_to/configure-roi-mapping.md
Normal file
@ -0,0 +1,57 @@
# How to configure ROI mapping

Use this guide to control how annotation geometry is encoded into training
targets and decoded back into boxes.

## 1) Set the default ROI mapper

The default mapper is `anchor_bbox`.

```yaml
roi:
  default:
    name: anchor_bbox
    anchor: bottom-left
    time_scale: 1000.0
    frequency_scale: 0.001163
```

## 2) Choose an anchor strategy

Typical options include `bottom-left` and `center`.

- `bottom-left` is the current default.
- `center` can be easier to reason about in some workflows.

## 3) Set scale factors intentionally

- `time_scale` controls width scaling.
- `frequency_scale` controls height scaling.

Use values that are consistent with your model setup and keep them fixed when
comparing experiments.

## 4) (Optional) override ROI mapping for specific classes

Add class-specific mappers under `roi.overrides`.

```yaml
roi:
  default:
    name: anchor_bbox
    anchor: bottom-left
    time_scale: 1000.0
    frequency_scale: 0.001163
  overrides:
    species_x:
      name: anchor_bbox
      anchor: center
      time_scale: 1000.0
      frequency_scale: 0.001163
```

## Related pages

- Target definitions: {doc}`configure-target-definitions`
- Class definitions: {doc}`define-target-classes`
- Target encoding overview: {doc}`../explanation/target-encoding-and-decoding`
59
docs/source/how_to/configure-spectrogram-preprocessing.md
Normal file
@ -0,0 +1,59 @@
# How to configure spectrogram preprocessing

Use this guide to set STFT, frequency range, and spectrogram transforms.

## 1) Configure STFT and frequency range

```yaml
preprocess:
  stft:
    window_duration: 0.002
    window_overlap: 0.75
    window_fn: hann
  frequencies:
    min_freq: 10000
    max_freq: 120000
```

## 2) Configure spectrogram transforms

`spectrogram_transforms` are applied in order.

```yaml
preprocess:
  spectrogram_transforms:
    - name: pcen
      time_constant: 0.4
      gain: 0.98
      bias: 2.0
      power: 0.5
    - name: spectral_mean_subtraction
    - name: scale_amplitude
      scale: db
```

Common built-ins:

- `pcen`
- `spectral_mean_subtraction`
- `scale_amplitude` (`db` or `power`)
- `peak_normalize`

## 3) Configure output size

```yaml
preprocess:
  size:
    height: 128
    resize_factor: 0.5
```

## 4) Keep train and inference settings aligned

Use the same preprocessing setup for training and prediction whenever possible.
Large mismatches can degrade model performance.

## Related pages

- Why consistency matters: {doc}`../explanation/preprocessing-consistency`
- Preprocessing config reference: {doc}`../reference/preprocessing-config`
58
docs/source/how_to/configure-target-definitions.md
Normal file
@ -0,0 +1,58 @@
# How to configure target definitions

Use this guide to define which annotated sound events are considered valid
detection targets.

## 1) Start from a targets config file

```yaml
detection_target:
  name: bat
  match_if:
    name: has_tag
    tag:
      key: call_type
      value: Echolocation
  assign_tags:
    - key: call_type
      value: Echolocation
    - key: order
      value: Chiroptera
```

`match_if` decides whether an annotation is included in the detection target.

## 2) Use condition combinators when needed

You can combine conditions with `all_of`, `any_of`, and `not`.

```yaml
detection_target:
  name: bat
  match_if:
    name: all_of
    conditions:
      - name: has_tag
        tag:
          key: call_type
          value: Echolocation
      - name: not
        condition:
          name: has_any_tag
          tags:
            - key: call_type
              value: Social
            - key: class
              value: Not Bat
```

## 3) Verify with a small sample first

Before full training, inspect a small annotation subset and confirm that the
selection logic keeps the events you expect.

## Related pages

- Class mapping: {doc}`define-target-classes`
- ROI mapping: {doc}`configure-roi-mapping`
- Targets reference: {doc}`../reference/targets-config-workflow`
59
docs/source/how_to/define-target-classes.md
Normal file
@ -0,0 +1,59 @@
# How to define target classes

Use this guide to map annotations to classification labels used during
training.

## 1) Add classification target entries

Each entry defines a class name and matching tags.

```yaml
classification_targets:
  - name: pippip
    tags:
      - key: class
        value: Pipistrellus pipistrellus
  - name: pippyg
    tags:
      - key: class
        value: Pipistrellus pygmaeus
```

## 2) Use `assign_tags` to control decoded output tags

If you want prediction output tags to differ from matching tags, set
`assign_tags` explicitly.

```yaml
classification_targets:
  - name: pipistrelle_group
    tags:
      - key: class
        value: Pipistrellus pipistrellus
    assign_tags:
      - key: genus
        value: Pipistrellus
```

## 3) Use `match_if` for complex class rules

For advanced conditions, use `match_if` instead of `tags`.

```yaml
classification_targets:
  - name: long_call
    match_if:
      name: duration
      operator: gt
      seconds: 0.02
```

## 4) Confirm class names are unique

`classification_targets.name` values must be unique.

## Related pages

- Detection-target filtering: {doc}`configure-target-definitions`
- ROI mapping: {doc}`configure-roi-mapping`
- Targets config reference: {doc}`../reference/targets-config-workflow`
66
docs/source/how_to/import-legacy-batdetect2-annotations.md
Normal file
@ -0,0 +1,66 @@
# How to import legacy batdetect2 annotations

Use this guide if your annotations are in older batdetect2 JSON formats.

Two legacy formats are supported:

- `batdetect2`: one annotation JSON file per recording
- `batdetect2_file`: one merged JSON file for many recordings

## 1) Choose the correct source format

Directory-based annotations (`format: batdetect2`):

```yaml
sources:
  - name: legacy_per_file
    format: batdetect2
    audio_dir: /path/to/audio
    annotations_dir: /path/to/annotation_json_dir
```

Merged annotation file (`format: batdetect2_file`):

```yaml
sources:
  - name: legacy_merged
    format: batdetect2_file
    audio_dir: /path/to/audio
    annotations_path: /path/to/merged_annotations.json
```

## 2) Set optional legacy filters

Legacy filters are based on `annotated` and `issues` flags.

```yaml
filter:
  only_annotated: true
  exclude_issues: true
```

To load all entries regardless of flags:

```yaml
filter: null
```

## 3) Validate and convert if needed

Check loaded records:

```bash
batdetect2 data summary path/to/dataset.yaml
```

Convert to annotation-set output for downstream tooling:

```bash
batdetect2 data convert path/to/dataset.yaml --output path/to/output.json
```

## 4) Continue with current workflows

- Run predictions: {doc}`run-batch-predictions`
- Train on imported data: {doc}`../tutorials/train-a-custom-model`
- Field-level reference: {doc}`../reference/data-sources`
17
docs/source/how_to/index.md
Normal file
@ -0,0 +1,17 @@
# How-to Guides

How-to guides help you complete specific tasks while working.

```{toctree}
:maxdepth: 1

run-batch-predictions
tune-detection-threshold
configure-aoef-dataset
import-legacy-batdetect2-annotations
configure-audio-preprocessing
configure-spectrogram-preprocessing
configure-target-definitions
define-target-classes
configure-roi-mapping
```
30
docs/source/how_to/run-batch-predictions.md
Normal file
@ -0,0 +1,30 @@
# How to run batch predictions

This guide shows practical command patterns for directory-based and file-list
prediction runs.

## Predict from a directory

```bash
batdetect2 predict directory \
  path/to/model.ckpt \
  path/to/audio_dir \
  path/to/outputs
```

## Predict from a file list

```bash
batdetect2 predict file_list \
  path/to/model.ckpt \
  path/to/audio_files.txt \
  path/to/outputs
```

## Useful options

- `--batch-size` to control throughput.
- `--workers` to set data-loading parallelism.
- `--format` to select output format.

For complete option details, see {doc}`../reference/cli/index`.
33
docs/source/how_to/tune-detection-threshold.md
Normal file
@ -0,0 +1,33 @@
# How to tune detection threshold

Use this guide to compare detection outputs at different threshold values.

## 1) Start with a baseline run

Run an initial prediction workflow and keep outputs in a dedicated folder.

## 2) Sweep threshold values

Run `predict` multiple times with different thresholds (for example `0.1`,
`0.3`, `0.5`) and compare output counts and quality on the same validation
subset.

```bash
batdetect2 predict directory \
  path/to/model.ckpt \
  path/to/audio_dir \
  path/to/outputs_thr_03 \
  --detection-threshold 0.3
```
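The sweep can be scripted so each threshold writes to its own output folder.
This loop only prints the commands for review (drop `echo` to actually run
them); the model and audio paths are placeholders.

```shell
# Print one predict command per candidate threshold (dry run).
for thr in 0.1 0.3 0.5; do
  echo batdetect2 predict directory \
    path/to/model.ckpt \
    path/to/audio_dir \
    "path/to/outputs_thr_${thr}" \
    --detection-threshold "${thr}"
done
```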
## 3) Validate against known calls

Use files with trusted annotations or expert review to select a threshold that
fits your project goals.

## 4) Record your chosen setting

Write down the chosen threshold and its rationale so analyses are reproducible.

For conceptual trade-offs, see
{doc}`../explanation/model-output-and-validation`.
@@ -1,15 +1,92 @@
# batdetect2 documentation
# Home

Hi!
Welcome to the batdetect2 docs.

## What is batdetect2?

`batdetect2` is a bat echolocation detection model.
It detects each individual echolocation call in an input spectrogram, draws a
box around each call event, and predicts the most likely species for that call.
A recording can contain many calls from different species.

The current default model is trained for 17 UK species but you can also train
new models from your own annotated data.

For details on the approach please read our pre-print:
[Towards a General Approach for Bat Echolocation Detection and Classification](https://www.biorxiv.org/content/10.1101/2022.12.14.520490v1)

## What you can do

- Run inference on your recordings and export predictions for downstream
  analysis:
  {doc}`tutorials/run-inference-on-folder`
- Train a custom model on your own annotated data:
  {doc}`tutorials/train-a-custom-model`
- Evaluate model performance on a held-out test set:
  {doc}`tutorials/evaluate-on-a-test-set`
- Integrate batdetect2 into Python scripts and notebooks:
  {doc}`tutorials/integrate-with-a-python-pipeline`

```{warning}
Treat outputs as model predictions, not ground truth.
Always validate on reviewed local data before using results for ecological
inference.
```

## Where to start

If you are new, start with {doc}`getting_started`.

For a low-code path, go to {doc}`tutorials/index`.
If you are Python-savvy and want more control, go to {doc}`how_to/index` and
{doc}`reference/index`.

Each section has a different purpose:
some pages teach by example, some focus on practical tasks, some are lookup
material, and some explain trade-offs.

| Section       | Best for                                    | Start here               |
| ------------- | ------------------------------------------- | ------------------------ |
| Tutorials     | Learning by doing                           | {doc}`tutorials/index`   |
| How-to guides | Solving practical tasks                     | {doc}`how_to/index`      |
| Reference     | Looking up commands, configs, and APIs      | {doc}`reference/index`   |
| Explanation   | Understanding design choices and trade-offs | {doc}`explanation/index` |

## Get in touch

- GitHub repository:
  [macaodha/batdetect2](https://github.com/macaodha/batdetect2)
- Questions, bug reports, and feature requests:
  [GitHub Issues](https://github.com/macaodha/batdetect2/issues)
- Common questions:
  {doc}`faq`
- Want to contribute?
  See {doc}`development/index`

## Cite this work

If you use batdetect2 in research, please cite:

Mac Aodha, O., Martinez Balvanera, S., Damstra, E., et al.
(2022).
_Towards a General Approach for Bat Echolocation Detection and Classification_.
bioRxiv.

```{toctree}
:maxdepth: 1
:caption: Contents:
:caption: Get Started

architecture
data/index
preprocessing/index
postprocessing
targets/index
getting_started
faq
tutorials/index
how_to/index
reference/index
explanation/index
```

```{toctree}
:maxdepth: 1
:caption: Contributing

development/index
```

@@ -1,126 +0,0 @@
# Postprocessing: From Model Output to Predictions

## What is Postprocessing?

After the BatDetect2 neural network analyzes a spectrogram, it doesn't directly output a neat list of bat calls.
Instead, it produces raw numerical data, usually in the form of multi-dimensional arrays or "heatmaps".
These arrays contain information like:

- The probability of a sound event being present at each time-frequency location.
- The probability of each possible target class (e.g., species) at each location.
- Predicted size characteristics (like duration and bandwidth) at each location.
- Internal learned features at each location.

**Postprocessing** is the sequence of steps that takes these numerical model outputs and translates them into a structured list of detected sound events, complete with predicted tags, bounding boxes, and confidence scores.
The {py:mod}`batdetect2.postprocess` module handles this entire workflow.

## Why is Postprocessing Necessary?

1. **Interpretation:** Raw heatmap outputs need interpretation to identify distinct sound events (detections).
   A high probability score might spread across several adjacent time-frequency bins, all related to the same call.
2. **Refinement:** Model outputs can be noisy or contain redundancies.
   Postprocessing steps like Non-Maximum Suppression (NMS) clean this up, ensuring (ideally) only one detection is reported for each actual sound event.
3. **Contextualization:** Raw outputs lack real-world units.
   Postprocessing adds back time (seconds) and frequency (Hz) coordinates, converts predicted sizes to physical units using configured scales, and decodes predicted class indices back into meaningful tags based on your target definitions.
4. **User Control:** Postprocessing includes tunable parameters, most importantly **thresholds**.
   By adjusting these, you can control the trade-off between finding more potential calls (sensitivity) versus reducing false positives (specificity) _without needing to retrain the model_.

## The Postprocessing Pipeline

BatDetect2 applies a series of steps to convert the raw model output into final predictions.
Understanding these steps helps interpret the results and configure the process effectively:

1. **Non-Maximum Suppression (NMS):**

   - **Goal:** Reduce redundant detections.
     If the model outputs high scores for several nearby points corresponding to the same call, NMS selects the single highest peak in a local neighbourhood and suppresses the others (sets their score to zero).
   - **Configurable:** The size of the neighbourhood (`nms_kernel_size`) can be adjusted.

2. **Coordinate Remapping:**

   - **Goal:** Add coordinate (time/frequency) information.
     This step takes the grid-based model outputs (which just have row/column indices) and associates them with actual time (seconds) and frequency (Hz) coordinates based on the input spectrogram's properties.
     The result is coordinate-aware arrays (using {py:class}`xarray.DataArray`).

3. **Detection Extraction:**

   - **Goal:** Identify the specific points representing detected events.
   - **Process:** Looks for peaks in the NMS-processed detection heatmap that are above a certain confidence level (`detection_threshold`).
     It also often limits the maximum number of detections based on a rate (`top_k_per_sec`) to avoid excessive outputs in very busy files.
   - **Configurable:** `detection_threshold`, `top_k_per_sec`.

4. **Data Extraction:**

   - **Goal:** Gather all relevant information for each detected point.
   - **Process:** For each time-frequency location identified in Step 3, this step looks up the corresponding values in the _other_ remapped model output arrays (class probabilities, predicted sizes, internal features).
   - **Intermediate Output 1:** The result of this stage (containing aligned scores, positions, sizes, class probabilities, and features for all detections in a clip) is often accessible programmatically as an {py:class}`xarray.Dataset`.
     This can be useful for advanced users needing direct access to the numerical outputs.

5. **Decoding & Formatting:**

   - **Goal:** Convert the extracted numerical data into interpretable, standard formats.
   - **Process:**
     - **ROI Recovery:** Uses the predicted position and size values, along with the ROI mapping configuration defined in the `targets` module, to reconstruct an estimated bounding box ({py:class}`soundevent.data.BoundingBox`).
     - **Class Decoding:** Translates the numerical class probability vector into meaningful {py:class}`soundevent.data.PredictedTag` objects.
       This involves:
       - Applying the `classification_threshold` to ignore low-confidence class scores.
       - Using the class decoding rules (from the `targets` module) to map the name(s) of the high-scoring class(es) back to standard tags (like `species: Myotis daubentonii`).
       - Optionally selecting only the top-scoring class or multiple classes above the threshold.
       - Including the generic "Bat" tags if no specific class meets the threshold.
     - **Feature Conversion:** Converts raw feature vectors into {py:class}`soundevent.data.Feature` objects.
   - **Intermediate Output 2:** This step might internally create a list of simplified `RawPrediction` objects containing the bounding box, scores, and features.
     This intermediate list might also be accessible programmatically for users who prefer a simpler structure than the final {py:mod}`soundevent` objects.

6. **Final Output (`ClipPrediction`):**
   - **Goal:** Package everything into a standard format.
   - **Process:** Collects all the fully processed `SoundEventPrediction` objects (each containing a sound event with geometry, features, overall score, and predicted tags with scores) for a given audio clip into a final {py:class}`soundevent.data.ClipPrediction` object.
     This is the standard output format representing the model's findings for that clip.
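
As an illustration of steps 1 and 3, here is a minimal NumPy sketch of NMS followed by thresholded peak extraction. This mirrors the idea only; it is not BatDetect2's actual implementation, and the `nms_kernel_size` and `detection_threshold` argument names are simply borrowed from the configuration parameters of the same names:

```python
import numpy as np

def extract_peaks(heatmap, nms_kernel_size=9, detection_threshold=0.1):
    """Keep thresholded local maxima of a detection heatmap (NMS sketch)."""
    r = nms_kernel_size // 2
    n_rows, n_cols = heatmap.shape
    peaks = []
    for i in range(n_rows):
        for j in range(n_cols):
            # A cell survives NMS only if it equals the maximum of its
            # local neighbourhood; everything else is suppressed.
            window = heatmap[max(0, i - r):i + r + 1, max(0, j - r):j + r + 1]
            if heatmap[i, j] == window.max() and heatmap[i, j] >= detection_threshold:
                peaks.append((i, j))
    return peaks

heatmap = np.zeros((16, 16))
heatmap[4, 5] = 0.9    # strong peak -> detected
heatmap[4, 6] = 0.8    # neighbour suppressed by the stronger peak
heatmap[12, 2] = 0.05  # local maximum, but below the detection threshold
print(extract_peaks(heatmap))  # [(4, 5)]
```

Lowering `detection_threshold` below `0.05` would additionally report the faint peak at `(12, 2)`, which is the sensitivity/specificity trade-off described above.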

## Configuring Postprocessing

You can control key aspects of this pipeline, especially the thresholds and NMS settings, via a `postprocess:` section in your main configuration YAML file.
Adjusting these **allows you to fine-tune the detection results without retraining the model**.

**Key Configurable Parameters:**

- `detection_threshold`: (Number >= 0, e.g., `0.1`) Minimum score for a peak to be considered a detection.
  **Lowering this increases sensitivity (more detections, potentially more false positives); raising it increases specificity (fewer detections, potentially missing faint calls).**
- `classification_threshold`: (Number >= 0, e.g., `0.3`) Minimum score for a _specific class_ prediction to be assigned as a tag.
  Affects how confidently the model must identify the class.
- `top_k_per_sec`: (Integer > 0, e.g., `200`) Limits the maximum density of detections reported per second.
  Helps manage extremely dense recordings.
- `nms_kernel_size`: (Integer > 0, e.g., `9`) Size of the NMS window in pixels/bins.
  Affects how close two distinct peaks can be before one suppresses the other.

**Example YAML Configuration:**

```yaml
# Inside your main configuration file (e.g., config.yaml)

postprocess:
  nms_kernel_size: 9
  detection_threshold: 0.1 # Lower threshold -> more sensitive
  classification_threshold: 0.3 # Higher threshold -> more confident classifications
  top_k_per_sec: 200
# ... other sections preprocessing, targets ...
```

**Note:** These parameters can often also be adjusted via Command Line Interface (CLI) arguments when running predictions, or through function arguments if using the Python API, providing flexibility for experimentation.

## Accessing Intermediate Results

While the final `ClipPrediction` objects are the standard output, the `Postprocessor` object used internally provides methods to access results from intermediate stages (like the `xr.Dataset` after Step 4, or the list of `RawPrediction` objects after Step 5).

This can be valuable for:

- Debugging the pipeline.
- Performing custom analyses on the numerical outputs before final decoding.
- **Transfer Learning / Feature Extraction:** Directly accessing the extracted `features` (from Step 4 or 5a) associated with detected events can be highly useful for training other models or further analysis.

Consult the API documentation for details on how to access these intermediate results programmatically if needed.

## Summary

Postprocessing is the conversion between neural network outputs and meaningful, interpretable sound event detections.
BatDetect2 provides a configurable pipeline including NMS, coordinate remapping, peak detection with thresholding, data extraction, and class/geometry decoding.
Researchers can easily tune key parameters like thresholds via configuration files or arguments to adjust the final set of predictions without altering the trained model itself, and advanced users can access intermediate results for custom analyses or feature reuse.
@@ -1,92 +0,0 @@
# Audio Loading and Preprocessing

## Purpose

Before BatDetect2 can analyze the sounds in your recordings, the raw audio data needs to be loaded from the file and prepared.
This initial preparation involves several standard waveform processing steps.
This `audio` module handles this first stage of preprocessing.

It's crucial to understand that the _exact same_ preprocessing steps must be applied both when **training** a model and when **using** that trained model later to make predictions (inference).
Consistent preprocessing ensures the model receives data in the format it expects.

BatDetect2 allows you to control these audio preprocessing steps through settings in your main configuration file.

## The Audio Processing Pipeline

When BatDetect2 needs to process an audio segment (either a full recording or a specific clip), it follows a defined sequence of steps:

1. **Load Audio Segment:** The system first reads the specified time segment from the audio file.
   - **Note:** BatDetect2 typically works with **mono** audio.
     By default, if your file has multiple channels (e.g., stereo), only the **first channel** is loaded and used for subsequent processing.
2. **Adjust Duration (Optional):** If you've specified a target duration in your configuration, the loaded audio segment is either shortened (by cropping from the start) or lengthened (by adding silence, i.e., zeros, at the end) to match that exact duration.
   This is sometimes required by specific model architectures that expect fixed-size inputs.
   By default, this step is **off**, and the original clip duration is used.
3. **Resample (Optional):** If configured (and usually **on** by default), the audio's sample rate is changed to a specific target value (e.g., 256,000 Hz).
   This is vital for standardizing the data, as different recording devices capture audio at different rates.
   The model needs to be trained and run on data with a consistent sample rate.
4. **Center Waveform (Optional):** If configured (and typically **on** by default), the system removes any constant shift away from zero in the waveform (known as DC offset).
   This is a standard practice that can sometimes improve the quality of later signal processing steps.
5. **Scale Amplitude (Optional):** If configured (typically **off** by default), the waveform's amplitude (loudness) is adjusted.
   The standard method used here is "peak normalization," which scales the entire clip so that the loudest point has an absolute value of 1.0.
   This can help standardize volume levels across different recordings, although it's not always necessary or desirable depending on your analysis goals.
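
The optional waveform steps above (duration adjustment, centering, and peak scaling) amount to a few lines of array manipulation. A NumPy sketch of the idea, not the actual BatDetect2 code (the function name and arguments are illustrative):

```python
import numpy as np

def condition_waveform(wav, samplerate, duration=None, center=True, scale=False):
    """Illustrative sketch of the duration / center / scale steps."""
    if duration is not None:
        target = int(round(duration * samplerate))
        if len(wav) >= target:
            wav = wav[:target]  # crop from the start
        else:
            wav = np.pad(wav, (0, target - len(wav)))  # pad with silence
    if center:
        wav = wav - wav.mean()  # remove DC offset
    if scale:
        peak = np.abs(wav).max()
        if peak > 0:
            wav = wav / peak  # peak normalization: loudest sample becomes 1.0
    return wav

# A short mono "waveform" with a DC offset, at a toy 4 Hz sample rate.
wav = np.array([0.5, 0.7, 0.5, 0.7])
out = condition_waveform(wav, samplerate=4, duration=1.5, center=True, scale=True)
print(out.shape, float(np.abs(out).max()))  # (6,) 1.0
```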

## Configuring Audio Processing

You can control these steps via settings in your main configuration file (e.g., `config.yaml`), usually within a dedicated `audio:` section (which might itself be under a broader `preprocessing:` section).

Here are the key options you can set:

- **Resampling (`resample`)**:

  - To enable resampling (recommended and usually default), include a `resample:` block.
    To disable it completely, you might set `resample: null` or omit the block.
  - `samplerate`: (Number) The target sample rate in Hertz (Hz) that all audio will be converted to.
    This **must** match the sample rate expected by the BatDetect2 model you are using or training (e.g., `samplerate: 256000`).
  - `mode`: (Text, `"poly"` or `"fourier"`) The underlying algorithm used for resampling.
    The default `"poly"` is generally a good choice.
    You typically don't need to change this unless you have specific reasons.

- **Duration (`duration`)**:

  - (Number or `null`) Sets a fixed duration for all audio clips in **seconds**.
    If set (e.g., `duration: 4.0`), shorter clips are padded with silence, and longer clips are cropped.
    If `null` (default), the original clip duration is used.

- **Centering (`center`)**:

  - (Boolean, `true` or `false`) Controls DC offset removal.
    Default is usually `true`.
    Set to `false` to disable.

- **Scaling (`scale`)**:
  - (Boolean, `true` or `false`) Controls peak amplitude normalization.
    Default is usually `false`.
    Set to `true` to enable scaling so the maximum absolute amplitude becomes 1.0.

**Example YAML Configuration:**

```yaml
# Inside your main configuration file (e.g., training_config.yaml)

preprocessing: # Or this might be at the top level
  audio:
    # --- Resampling Settings ---
    resample: # Settings block to control resampling
      samplerate: 256000 # Target sample rate in Hz (Required if resampling)
      mode: poly # Algorithm ('poly' or 'fourier', optional, defaults to 'poly')
    # To disable resampling entirely, you might use:
    # resample: null

    # --- Other Settings ---
    duration: null # Keep original clip duration (e.g., use 4.0 for 4 seconds)
    center: true # Remove DC offset (default is often true)
    scale: false # Do not normalize peak amplitude (default is often false)

# ... other configuration sections (like model, dataset, targets) ...
```

## Outcome

After these steps, the output is a standardized audio waveform (represented as a numerical array with time information).
This processed waveform is now ready for the next stage of preprocessing, which typically involves calculating the spectrogram (covered in the next module/section).
Ensuring these audio preprocessing settings are consistent is fundamental for achieving reliable results in both training and inference.
@@ -1,46 +0,0 @@
# Preprocessing Audio for BatDetect2

## What is Preprocessing?

Preprocessing refers to the steps taken to transform your raw audio recordings into a standardized format suitable for analysis by the BatDetect2 deep learning model.
This module (`batdetect2.preprocessing`) provides the tools to perform these transformations.

## Why is Preprocessing Important?

Applying a consistent preprocessing pipeline is important for several reasons:

1. **Standardization:** Audio recordings vary significantly depending on the equipment used, recording conditions, and settings (e.g., different sample rates, varying loudness levels, background noise).
   Preprocessing helps standardize these aspects, making the data more uniform and allowing the model to learn relevant patterns more effectively.
2. **Model Requirements:** Deep learning models, particularly those like BatDetect2 that analyze 2D patterns in spectrograms, are designed to work with specific input characteristics.
   They often expect spectrograms of a certain size (time x frequency bins), with values represented on a particular scale (e.g., logarithmic/dB), and within a defined frequency range.
   Preprocessing ensures the data meets these requirements.
3. **Consistency is Key:** Perhaps most importantly, the **exact same preprocessing steps** must be applied both when _training_ the model and when _using the trained model to make predictions_ (inference) on new data.
   Any discrepancy between the preprocessing used during training and inference can significantly degrade the model's performance and lead to unreliable results.
   BatDetect2's configurable pipeline ensures this consistency.

## How Preprocessing is Done in BatDetect2

BatDetect2 handles preprocessing through a configurable, two-stage pipeline:

1. **Audio Loading & Preparation:** This first stage deals with the raw audio waveform.
   It involves loading the specified audio segment (from a file or clip), selecting a single channel (mono), optionally resampling it to a consistent sample rate, optionally adjusting its duration, and applying basic waveform conditioning like centering (DC offset removal) and amplitude scaling.
   (Details in the {doc}`audio` section).
2. **Spectrogram Generation:** The prepared audio waveform is then converted into a spectrogram.
   This involves calculating the Short-Time Fourier Transform (STFT) and then applying a series of configurable steps like cropping the frequency range, applying amplitude representations (like dB scale or PCEN), optional denoising, optional resizing to the model's required dimensions, and optional final normalization.
   (Details in the {doc}`spectrogram` section).

The entire pipeline is controlled via settings in your main configuration file (typically a YAML file), usually grouped under a `preprocessing:` section which contains subsections like `audio:` and `spectrogram:`.
This allows you to easily define, share, and reproduce the exact preprocessing used for a specific model or experiment.

## Next Steps

Explore the following sections for detailed explanations of how to configure each stage of the preprocessing pipeline and how to use the resulting preprocessor:

```{toctree}
:maxdepth: 1
:caption: Preprocessing Steps:

audio
spectrogram
usage
```
@@ -1,141 +0,0 @@
# Spectrogram Generation

## Purpose

After loading and performing initial processing on the audio waveform (as described in the Audio Loading section), the next crucial step in the `preprocessing` pipeline is to convert that waveform into a **spectrogram**.
A spectrogram is a visual representation of sound, showing frequency content over time, and it's the primary input format for many deep learning models, including BatDetect2.

This module handles the calculation and subsequent processing of the spectrogram.
Just like the audio processing, these steps need to be applied **consistently** during both model training and later use (inference) to ensure the model performs reliably.
You control this entire process through the configuration file.

## The Spectrogram Generation Pipeline

Once BatDetect2 has a prepared audio waveform, it follows these steps to create the final spectrogram input for the model:

1. **Calculate STFT (Short-Time Fourier Transform):** This is the fundamental step that converts the 1D audio waveform into a 2D time-frequency representation.
   It calculates the frequency content within short, overlapping time windows.
   The output is typically a **magnitude spectrogram**, showing the intensity (amplitude) of different frequencies at different times.
   Key parameters here are the `window_duration` and `window_overlap`, which affect the trade-off between time and frequency resolution.
2. **Crop Frequencies:** The STFT often produces frequency information over a very wide range (e.g., 0 Hz up to half the sample rate).
   This step crops the spectrogram to focus only on the frequency range relevant to your target sounds (e.g., 10 kHz to 120 kHz for typical bat echolocation).
3. **Apply PCEN (Optional):** If configured, Per-Channel Energy Normalization is applied.
   PCEN is an adaptive technique that adjusts the gain (loudness) in each frequency channel based on its recent history.
   It can help suppress stationary background noise and enhance the prominence of transient sounds like echolocation pulses.
   This step is optional.
4. **Set Amplitude Scale / Representation:** The values in the spectrogram (either raw magnitude or post-PCEN values) need to be represented on a suitable scale.
   You choose one of the following:
   - `"amplitude"`: Use the linear magnitude values directly.
     (Default)
   - `"power"`: Use the squared magnitude values (representing energy).
   - `"dB"`: Apply a logarithmic transformation (specifically `log(1 + C*Magnitude)`).
     This compresses the range of values, often making variations in quieter sounds more apparent, similar to how humans perceive loudness.
5. **Denoise (Optional):** If configured (and usually **on** by default), a simple noise reduction technique is applied.
   This method subtracts the average value of each frequency bin (calculated across time) from that bin, assuming the average represents steady background noise.
   Negative values after subtraction are clipped to zero.
6. **Resize (Optional):** If configured, the dimensions (height/frequency bins and width/time bins) of the spectrogram are adjusted using interpolation to match the exact input size expected by the neural network architecture.
7. **Peak Normalize (Optional):** If configured (typically **off** by default), the entire final spectrogram is scaled so that its highest value is exactly 1.0.
   This ensures all spectrograms fed to the model have a consistent maximum value, which can sometimes aid training stability.
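
As an illustration of steps 4 and 5, here is a minimal NumPy sketch of the logarithmic amplitude transform and the spectral mean subtraction denoising. This is illustrative only, not the BatDetect2 code; rows are frequency bins and columns are time frames:

```python
import numpy as np

def log_scale(mag, gain=1.0):
    # The "dB" representation described above: log(1 + C * magnitude).
    return np.log1p(gain * mag)

def spectral_mean_subtraction(spec):
    # Subtract each frequency bin's mean over time, then clip negatives to zero.
    return np.clip(spec - spec.mean(axis=1, keepdims=True), 0.0, None)

spec = np.array([
    [1.0, 1.0, 1.0],  # steady background bin -> removed entirely
    [0.0, 5.0, 0.0],  # transient pulse -> stands out after denoising
])
den = spectral_mean_subtraction(log_scale(spec))
print(den.round(3))
```

Note how the constant background row is zeroed out while the brief pulse survives, which is why this denoising step assumes the per-bin average represents steady noise.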

## Configuring Spectrogram Generation

You control all these steps via settings in your main configuration file (e.g., `config.yaml`), within the `spectrogram:` section (usually located under the main `preprocessing:` section).

Here are the key configuration options:

- **STFT Settings (`stft`)**:

  - `window_duration`: (Number, seconds, e.g., `0.002`) Length of the analysis window.
  - `window_overlap`: (Number, 0.0 to <1.0, e.g., `0.75`) Fractional overlap between windows.
  - `window_fn`: (Text, e.g., `"hann"`) Name of the windowing function.

- **Frequency Cropping (`frequencies`)**:

  - `min_freq`: (Integer, Hz, e.g., `10000`) Minimum frequency to keep.
  - `max_freq`: (Integer, Hz, e.g., `120000`) Maximum frequency to keep.

- **PCEN (`pcen`)**:

  - This entire section is **optional**.
    Include it only if you want to apply PCEN.
    If omitted or set to `null`, PCEN is skipped.
  - `time_constant`: (Number, seconds, e.g., `0.4`) Controls adaptation speed.
  - `gain`: (Number, e.g., `0.98`) Gain factor.
  - `bias`: (Number, e.g., `2.0`) Bias factor.
  - `power`: (Number, e.g., `0.5`) Compression exponent.

- **Amplitude Scale (`scale`)**:

  - (Text: `"dB"`, `"power"`, or `"amplitude"`) Selects the final representation of the spectrogram values.
    Default is `"amplitude"`.

- **Denoising (`spectral_mean_substraction`)**:

  - (Boolean: `true` or `false`) Enables/disables the spectral mean subtraction denoising step.
    Default is usually `true`.

- **Resizing (`size`)**:

  - This entire section is **optional**.
    Include it only if you need to resize the spectrogram to specific dimensions required by the model.
    If omitted or set to `null`, no resizing occurs after frequency cropping.
  - `height`: (Integer, e.g., `128`) Target number of frequency bins.
  - `resize_factor`: (Number or `null`, e.g., `0.5`) Factor to scale the time dimension by.
    `0.5` halves the width, `null` or `1.0` keeps the original width.

- **Peak Normalization (`peak_normalize`)**:
  - (Boolean: `true` or `false`) Enables/disables final scaling of the entire spectrogram so the maximum value is 1.0.
    Default is usually `false`.

**Example YAML Configuration:**

```yaml
# Inside your main configuration file

preprocessing:
  audio:
    # ... (your audio configuration settings) ...
    resample:
      samplerate: 256000 # Ensure this matches model needs

  spectrogram:
    # --- STFT Parameters ---
    stft:
      window_duration: 0.002 # 2ms window
      window_overlap: 0.75 # 75% overlap
      window_fn: hann

    # --- Frequency Range ---
    frequencies:
      min_freq: 10000 # 10 kHz
      max_freq: 120000 # 120 kHz

    # --- PCEN (Optional) ---
    # Include this block to enable PCEN, omit or set to null to disable.
    pcen:
      time_constant: 0.4
      gain: 0.98
      bias: 2.0
      power: 0.5

    # --- Final Amplitude Representation ---
    scale: dB # Choose 'dB', 'power', or 'amplitude'

    # --- Denoising ---
    spectral_mean_substraction: true # Enable spectral mean subtraction

    # --- Resizing (Optional) ---
    # Include this block to resize, omit or set to null to disable.
    size:
      height: 128 # Target height in frequency bins
      resize_factor: 0.5 # Halve the number of time bins

    # --- Final Normalization ---
    peak_normalize: false # Do not scale max value to 1.0
```

## Outcome

The output of this module is the final, processed spectrogram (as a 2D numerical array with time and frequency information).
This spectrogram is now in the precise format expected by the BatDetect2 neural network, ready to be used for training the model or for making predictions on new data.
Remember, using the exact same `spectrogram` configuration settings during training and inference is essential for correct model performance.
@@ -1,175 +0,0 @@
# Using Preprocessors in BatDetect2

## Overview

In the previous sections ({doc}`audio` and {doc}`spectrogram`), we discussed the individual steps involved in converting raw audio into a processed spectrogram suitable for BatDetect2 models, and how to configure these steps using YAML files (specifically the `audio:` and `spectrogram:` sections within a main `preprocessing:` configuration block).

This page focuses on how this configured pipeline is represented and used within BatDetect2, primarily through the concept of a **`Preprocessor`** object.
This object bundles together your chosen audio loading settings and spectrogram generation settings into a single component that can perform the end-to-end processing.

## Do I Need to Interact with Preprocessors Directly?

**Usually, no.** For standard model training or running inference with BatDetect2 using the provided scripts, the system will automatically:

1. Read your main configuration file (e.g., `config.yaml`).
2. Find the `preprocessing:` section (containing `audio:` and `spectrogram:` settings).
3. Build the appropriate `Preprocessor` object internally based on your settings.
4. Use that internal `Preprocessor` object automatically whenever audio needs to be loaded and converted to a spectrogram.

**However**, understanding the `Preprocessor` object is useful if you want to:

- **Customize:** Go beyond the standard configuration options by interacting with parts of the pipeline programmatically.
- **Integrate:** Use BatDetect2's preprocessing steps within your own custom Python analysis scripts.
- **Inspect/Debug:** Manually apply preprocessing steps to specific files or clips to examine intermediate outputs (like the processed waveform) or the final spectrogram.

## Getting a Preprocessor Object

If you _do_ want to work with the preprocessor programmatically, you first need to get an instance of it.
This is typically done based on a configuration:

1. **Define Configuration:** Create your `preprocessing:` configuration, usually in a YAML file (let's call it `preprocess_config.yaml`), detailing your desired `audio` and `spectrogram` settings.

   ```yaml
   # preprocess_config.yaml
   audio:
     resample:
       samplerate: 256000
     # ... other audio settings ...
   spectrogram:
     frequencies:
       min_freq: 15000
       max_freq: 120000
     scale: dB
     # ... other spectrogram settings ...
   ```

2. **Load Configuration & Build Preprocessor (in Python):**

   ```python
   from batdetect2.preprocessing import load_preprocessing_config, build_preprocessor
   from batdetect2.preprocess.types import Preprocessor  # Import the type

   # Load the configuration from the file
   config_path = "path/to/your/preprocess_config.yaml"
   preprocessing_config = load_preprocessing_config(config_path)

   # Build the actual preprocessor object using the loaded config
   preprocessor: Preprocessor = build_preprocessor(preprocessing_config)

   # 'preprocessor' is now ready to use!
   ```

3. **Using Defaults:** If you just want the standard BatDetect2 default preprocessing settings, you can build one without loading a config file:

   ```python
   from batdetect2.preprocessing import build_preprocessor
   from batdetect2.preprocess.types import Preprocessor

   # Build with default settings
   default_preprocessor: Preprocessor = build_preprocessor()
   ```

## Applying Preprocessing

Once you have a `preprocessor` object, you can use its methods to process audio data:

**1. End-to-End Processing (Common Use Case):**

These methods take an audio source identifier (file path, Recording object, or Clip object) and return the final, processed spectrogram.
|
||||
|
||||
- `preprocessor.preprocess_file(path)`: Processes an entire audio file.
|
||||
- `preprocessor.preprocess_recording(recording_obj)`: Processes the entire audio associated with a `soundevent.data.Recording` object.
|
||||
- `preprocessor.preprocess_clip(clip_obj)`: Processes only the specific time segment defined by a `soundevent.data.Clip` object.
|
||||
- **Efficiency Note:** Using `preprocess_clip` is **highly recommended** when you are only interested in analyzing a small portion of a potentially long recording.
|
||||
It avoids loading the entire audio file into memory, making it much more efficient.
|
||||
|
||||
```python
|
||||
from soundevent import data
|
||||
|
||||
# Assume 'preprocessor' is built as shown before
|
||||
# Assume 'my_clip' is a soundevent.data.Clip object defining a segment
|
||||
|
||||
# Process an entire file
|
||||
spectrogram_from_file = preprocessor.preprocess_file("my_recording.wav")
|
||||
|
||||
# Process only a specific clip (more efficient for segments)
|
||||
spectrogram_from_clip = preprocessor.preprocess_clip(my_clip)
|
||||
|
||||
# The results (spectrogram_from_file, spectrogram_from_clip) are xr.DataArrays
|
||||
print(type(spectrogram_from_clip))
|
||||
# Output: <class 'xarray.core.dataarray.DataArray'>
|
||||
```
|
||||
|
||||
**2.
|
||||
Intermediate Steps (Advanced Use Cases):**
|
||||
|
||||
The preprocessor also allows access to intermediate stages if needed:
|
||||
|
||||
- `preprocessor.load_clip_audio(clip_obj)` (and similar for file/recording): Loads the audio and applies _only_ the waveform processing steps (resampling, centering, etc.) defined in the `audio` config.
|
||||
Returns the processed waveform as an `xr.DataArray`.
|
||||
This is useful if you want to analyze or manipulate the waveform itself before spectrogram generation.
|
||||
- `preprocessor.compute_spectrogram(waveform)`: Takes an _already loaded_ waveform (either `np.ndarray` or `xr.DataArray`) and applies _only_ the spectrogram generation steps defined in the `spectrogram` config.
|
||||
- If you provide an `xr.DataArray` (e.g., from `load_clip_audio`), it uses the sample rate from the array's coordinates.
|
||||
- If you provide a raw `np.ndarray`, it **must assume a sample rate**.
|
||||
It uses the `default_samplerate` that was determined when the `preprocessor` was built (based on your `audio` config's resample settings or the global default).
|
||||
Be cautious when using NumPy arrays to ensure the sample rate assumption is correct for your data!
|
||||
|
||||
```python
|
||||
# Example: Get waveform first, then spectrogram
|
||||
waveform = preprocessor.load_clip_audio(my_clip)
|
||||
# waveform is an xr.DataArray
|
||||
|
||||
# ...potentially do other things with the waveform...
|
||||
|
||||
# Compute spectrogram from the loaded waveform
|
||||
spectrogram = preprocessor.compute_spectrogram(waveform)
|
||||
|
||||
# Example: Process external numpy array (use with caution re: sample rate)
|
||||
# import soundfile as sf # Requires installing soundfile
|
||||
# numpy_waveform, original_sr = sf.read("some_other_audio.wav")
|
||||
# # MUST ensure numpy_waveform's actual sample rate matches
|
||||
# # preprocessor.default_samplerate for correct results here!
|
||||
# spec_from_numpy = preprocessor.compute_spectrogram(numpy_waveform)
|
||||
```
|
||||
|
||||
## Understanding the Output: `xarray.DataArray`
|
||||
|
||||
All preprocessing methods return the final spectrogram (or the intermediate waveform) as an **`xarray.DataArray`**.
|
||||
|
||||
**What is it?** Think of it like a standard NumPy array (holding the numerical data of the spectrogram) but with added "superpowers":
|
||||
|
||||
- **Labeled Dimensions:** Instead of just having axis 0 and axis 1, the dimensions have names, typically `"frequency"` and `"time"`.
|
||||
- **Coordinates:** It stores the actual frequency values (e.g., in Hz) corresponding to each row and the actual time values (e.g., in seconds) corresponding to each column along the dimensions.
|
||||
|
||||
**Why is it used?**
|
||||
|
||||
- **Clarity:** The data is self-describing.
|
||||
You don't need to remember which axis is time and which is frequency, or what the units are – it's stored with the data.
|
||||
- **Convenience:** You can select, slice, or plot data using the real-world coordinate values (times, frequencies) instead of just numerical indices.
|
||||
This makes analysis code easier to write and less prone to errors.
|
||||
- **Metadata:** It can hold additional metadata about the processing steps in its `attrs` (attributes) dictionary.
|
||||
|
||||
**Using the Output:**
|
||||
|
||||
- **Input to Model:** For standard training or inference, you typically pass this `xr.DataArray` spectrogram directly to the BatDetect2 model functions.
|
||||
- **Inspection/Analysis:** If you're working programmatically, you can use xarray's powerful features.
|
||||
For example (these are just illustrations of xarray):
|
||||
|
||||
```python
|
||||
# Get the shape (frequency_bins, time_bins)
|
||||
# print(spectrogram.shape)
|
||||
|
||||
# Get the frequency coordinate values
|
||||
# print(spectrogram['frequency'].values)
|
||||
|
||||
# Select data near a specific time and frequency
|
||||
# value_at_point = spectrogram.sel(time=0.5, frequency=50000, method="nearest")
|
||||
# print(value_at_point)
|
||||
|
||||
# Select a time slice between 0.2 and 0.3 seconds
|
||||
# time_slice = spectrogram.sel(time=slice(0.2, 0.3))
|
||||
# print(time_slice.shape)
|
||||
```
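These xarray operations can also be tried on a small, self-contained synthetic array. The sketch below uses made-up values (not a real spectrogram) purely to demonstrate coordinate-based selection:

```python
import numpy as np
import xarray as xr

# Hypothetical "spectrogram": 4 frequency bins x 5 time bins of dummy values
spec = xr.DataArray(
    np.arange(20, dtype=float).reshape(4, 5),
    dims=("frequency", "time"),
    coords={
        "frequency": [20000, 40000, 60000, 80000],  # Hz
        "time": [0.0, 0.1, 0.2, 0.3, 0.4],  # seconds
    },
)

# Select the value nearest a given time/frequency point
value = spec.sel(time=0.21, frequency=45000, method="nearest")

# Slice a time window using real-world coordinates (inclusive bounds)
window = spec.sel(time=slice(0.1, 0.3))
```

Here `value` resolves to the bin at 0.2 s and 40 kHz, and `window` keeps all frequency bins for the three time bins between 0.1 s and 0.3 s.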

In summary, while BatDetect2 often handles preprocessing automatically based on your configuration, the underlying `Preprocessor` object provides a flexible interface for applying these steps programmatically if needed, returning results in the convenient and informative `xarray.DataArray` format.
8
docs/source/reference/cli/base.rst
Normal file
@ -0,0 +1,8 @@
Base command
============

The options on this page apply to all subcommands.

.. click:: batdetect2.cli:cli
   :prog: batdetect2
   :nested: none
8
docs/source/reference/cli/data.rst
Normal file
@ -0,0 +1,8 @@
Data command
============

Inspect and convert dataset config files.

.. click:: batdetect2.cli.data:data
   :prog: batdetect2 data
   :nested: full
18
docs/source/reference/cli/detect_legacy.rst
Normal file
@ -0,0 +1,18 @@
Legacy detect command
=====================

.. warning::

   ``batdetect2 detect`` is a legacy compatibility command.
   Prefer ``batdetect2 predict directory`` for new workflows.

Migration at a glance
---------------------

- Legacy: ``batdetect2 detect AUDIO_DIR ANN_DIR DETECTION_THRESHOLD``
- Current: ``batdetect2 predict directory MODEL_PATH AUDIO_DIR OUTPUT_PATH``
  with optional ``--detection-threshold``

.. click:: batdetect2.cli.compat:detect
   :prog: batdetect2 detect
   :nested: none
8
docs/source/reference/cli/evaluate.rst
Normal file
@ -0,0 +1,8 @@
Evaluate command
================

Evaluate a checkpoint against a configured test dataset.

.. click:: batdetect2.cli.evaluate:evaluate_command
   :prog: batdetect2 evaluate
   :nested: none
51
docs/source/reference/cli/index.md
Normal file
@ -0,0 +1,51 @@
# CLI reference

Use this section to find the right command quickly, then open the command page
for full options and argument details.

## How to use this section

1. Start with {doc}`base` for options shared across the CLI.
2. Pick the command group or command you need from the command map below.
3. Open the linked page for the complete autogenerated option reference.

## Command map

| Command | Use it for | Required positional args |
| --- | --- | --- |
| `batdetect2 predict` | Run inference on audio | Depends on subcommand (`directory`, `file_list`, `dataset`) |
| `batdetect2 data` | Inspect and convert dataset configs | Depends on subcommand (`summary`, `convert`) |
| `batdetect2 train` | Train or fine-tune models | `TRAIN_DATASET` |
| `batdetect2 evaluate` | Evaluate a checkpoint on a test dataset | `MODEL_PATH`, `TEST_DATASET` |
| `batdetect2 detect` | Legacy compatibility workflow | `AUDIO_DIR`, `ANN_DIR`, `DETECTION_THRESHOLD` |

## Global options and conventions

- Global CLI options are documented in {doc}`base`.
- Paths with spaces should be wrapped in quotes.
- Input audio is expected to be mono.
- Legacy `detect` uses a required threshold argument, while `predict` uses
  the optional `--detection-threshold` override.

```{warning}
`batdetect2 detect` is a legacy command.
Prefer `batdetect2 predict directory` for new workflows.
```

## Related pages

- {doc}`../../tutorials/run-inference-on-folder`
- {doc}`../../how_to/run-batch-predictions`
- {doc}`../../how_to/tune-detection-threshold`
- {doc}`../configs`

```{toctree}
:maxdepth: 1

Base command and global options <base>
Predict command group <predict>
Data command group <data>
Train command <train>
Evaluate command <evaluate>
Legacy detect command <detect_legacy>
```
9
docs/source/reference/cli/predict.rst
Normal file
@ -0,0 +1,9 @@
Predict command
===============

Run model inference from a directory, a file list, or a dataset.
Use ``--detection-threshold`` to override the model default per run.

.. click:: batdetect2.cli.inference:predict
   :prog: batdetect2 predict
   :nested: full
8
docs/source/reference/cli/train.rst
Normal file
@ -0,0 +1,8 @@
Train command
=============

Train a model from dataset configs or fine-tune from a checkpoint.

.. click:: batdetect2.cli.train:train_command
   :prog: batdetect2 train
   :nested: none
@ -1,7 +0,0 @@
# Config Reference

```{eval-rst}
.. automodule:: batdetect2.configs
   :members:
   :inherited-members: pydantic.BaseModel
```
5
docs/source/reference/configs.rst
Normal file
@ -0,0 +1,5 @@
Config reference
================

.. automodule:: batdetect2.config
   :members:
76
docs/source/reference/data-sources.md
Normal file
@ -0,0 +1,76 @@
# Data source reference

This page summarizes dataset source formats and their config fields.

## Supported source formats

| Format | Description |
| --- | --- |
| `aoef` | AOEF/soundevent annotation files (`AnnotationSet` or `AnnotationProject`) |
| `batdetect2` | Legacy format with one JSON annotation file per recording |
| `batdetect2_file` | Legacy format with one merged JSON annotation file |

## AOEF (`format: aoef`)

Required fields:

- `name`
- `format`
- `audio_dir`
- `annotations_path`

Optional fields:

- `description`
- `filter`

`filter` is only used when `annotations_path` points to an
`AnnotationProject`.

AOEF filter options:

- `only_completed` (default: `true`)
- `only_verified` (default: `false`)
- `exclude_issues` (default: `true`)

Use `filter: null` to disable project filtering.
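Putting the fields above together, a minimal `aoef` source entry might look like the following sketch (the `name`, paths, and description are placeholders, and the filter values shown are simply the documented defaults):

```yaml
name: my-aoef-source                          # placeholder name
format: aoef
audio_dir: data/audio                          # placeholder path
annotations_path: data/annotations.aoef.json   # placeholder path
description: Example AOEF annotation source    # optional
filter:                                        # only used for AnnotationProject files
  only_completed: true
  only_verified: false
  exclude_issues: true
```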

## Legacy per-file (`format: batdetect2`)

Required fields:

- `name`
- `format`
- `audio_dir`
- `annotations_dir`

Optional fields:

- `description`
- `filter`

## Legacy merged file (`format: batdetect2_file`)

Required fields:

- `name`
- `format`
- `audio_dir`
- `annotations_path`

Optional fields:

- `description`
- `filter`

Legacy filter options:

- `only_annotated` (default: `true`)
- `exclude_issues` (default: `true`)

Use `filter: null` to disable filtering.

## Related guides

- {doc}`../how_to/configure-aoef-dataset`
- {doc}`../how_to/import-legacy-batdetect2-annotations`
@ -1,10 +1,16 @@
# Reference documentation

```{eval-rst}
.. toctree::
   :maxdepth: 1
   :caption: Contents:
Reference pages provide factual, complete descriptions of commands,
configuration, and data structures.

configs
targets
```{toctree}
:maxdepth: 1

cli/index
data-sources
preprocessing-config
postprocess-config
targets-config-workflow
configs
targets
```
31
docs/source/reference/postprocess-config.md
Normal file
@ -0,0 +1,31 @@
# Postprocess config reference

`PostprocessConfig` controls how raw detector outputs are converted into final
detections.

Defined in `batdetect2.postprocess.config`.

## Fields

- `nms_kernel_size` (int > 0)
  - neighborhood size for non-maximum suppression.
- `detection_threshold` (float >= 0)
  - minimum detection score to keep a candidate event.
- `classification_threshold` (float >= 0)
  - minimum class score used when assigning class tags.
- `top_k_per_sec` (int > 0)
  - maximum detection density per second.

## Defaults

- `detection_threshold`: `0.01`
- `classification_threshold`: `0.1`
- `top_k_per_sec`: `100`

`nms_kernel_size` defaults to the library constant used by the NMS module.
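As a sketch, a postprocessing section overriding these fields might look like the following (the `postprocess:` section key and the `nms_kernel_size` value are assumptions for illustration; the threshold values shown are the documented defaults):

```yaml
postprocess:                      # assumed section key
  nms_kernel_size: 9              # illustrative value; real default is the NMS module constant
  detection_threshold: 0.01       # documented default
  classification_threshold: 0.1   # documented default
  top_k_per_sec: 100              # documented default
```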

## Related pages

- Threshold behaviour: {doc}`../explanation/postprocessing-and-thresholds`
- Threshold tuning workflow: {doc}`../how_to/tune-detection-threshold`
- CLI predict options: {doc}`cli/predict`
61
docs/source/reference/preprocessing-config.md
Normal file
@ -0,0 +1,61 @@
# Preprocessing config reference

This page summarizes preprocessing-related config objects used by batdetect2.

## Audio loader config (`AudioConfig`)

Defined in `batdetect2.audio.loader`.

Fields:

- `samplerate` (int): target audio sample rate in Hz.
- `resample.enabled` (bool): whether to resample loaded audio.
- `resample.method` (`poly` or `fourier`): resampling method.

## Model preprocessing config (`PreprocessingConfig`)

Defined in `batdetect2.preprocess.config`.

Top-level fields:

- `audio_transforms`: ordered waveform transforms.
- `stft`: STFT parameters.
- `frequencies`: spectrogram frequency range.
- `spectrogram_transforms`: ordered spectrogram transforms.
- `size`: final resize settings.

### `audio_transforms` built-ins

- `center_audio`
- `scale_audio`
- `fix_duration` (`duration` in seconds)

### `stft` fields

- `window_duration`
- `window_overlap`
- `window_fn`

### `frequencies` fields

- `min_freq`
- `max_freq`

### `spectrogram_transforms` built-ins

- `pcen`
- `scale_amplitude` (`scale: db|power`)
- `spectral_mean_subtraction`
- `peak_normalize`

### `size` fields

- `height`
- `resize_factor`
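Assembled into one document, a `PreprocessingConfig` might look like the sketch below. All values are illustrative placeholders rather than defaults, and the `name:` key used to select each transform in the ordered lists is an assumption for illustration:

```yaml
audio_transforms:                 # ordered waveform transforms
  - name: center_audio            # 'name' selection key is an assumption
  - name: fix_duration
    duration: 0.5                 # seconds; illustrative value
stft:
  window_duration: 0.002          # illustrative value
  window_overlap: 0.75            # illustrative value
  window_fn: hann                 # illustrative value
frequencies:
  min_freq: 10000                 # illustrative value
  max_freq: 120000                # illustrative value
spectrogram_transforms:           # ordered spectrogram transforms
  - name: pcen
  - name: peak_normalize
size:
  height: 128                     # illustrative value
  resize_factor: 0.5              # illustrative value
```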

## Related pages

- Audio preprocessing how-to: {doc}`../how_to/configure-audio-preprocessing`
- Spectrogram preprocessing how-to:
  {doc}`../how_to/configure-spectrogram-preprocessing`
- Why consistency matters: {doc}`../explanation/preprocessing-consistency`
67
docs/source/reference/targets-config-workflow.md
Normal file
@ -0,0 +1,67 @@
# Targets config workflow reference

This page summarizes the target-definition configuration used by batdetect2.

## `TargetConfig`

Defined in `batdetect2.targets.config`.

Fields:

- `detection_target`: one `TargetClassConfig` defining detection eligibility.
- `classification_targets`: list of `TargetClassConfig` entries for class
  encoding/decoding.
- `roi`: ROI mapping config with `default` mapper and optional per-class
  `overrides`.

## `TargetClassConfig`

Defined in `batdetect2.targets.classes`.

Fields:

- `name`: class label name.
- `tags`: tag list used for matching (shortcut for `match_if`).
- `match_if`: explicit condition config (`match_if` is accepted as an alias).
- `assign_tags`: tags used when decoding this class.

`tags` and `match_if` are mutually exclusive.

## Supported condition config types

Built from `batdetect2.data.conditions`.

- `has_tag`
- `has_all_tags`
- `has_any_tag`
- `duration`
- `frequency`
- `all_of`
- `any_of`
- `not`

## ROI mapper config

`roi.default` and each `roi.overrides.<class_name>` entry support built-in
mappers including:

- `anchor_bbox`
- `peak_energy_bbox`

Key `anchor_bbox` fields:

- `anchor`
- `time_scale`
- `frequency_scale`

Top-level ROI mapping shape:

- `default`: fallback mapper used for all classes.
- `overrides`: optional mapping from class name to mapper config.
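Combining these pieces, a minimal `TargetConfig` sketch might look like the following. The class names, tag keys, and values are placeholders, and the `type:` keys used to select a condition or mapper are assumptions for illustration only:

```yaml
detection_target:
  name: bat                       # placeholder class name
  match_if:
    type: has_tag                 # assumed selection key for condition configs
    key: order                    # placeholder tag key
    value: Chiroptera             # placeholder tag value
classification_targets:
  - name: pippip                  # placeholder class name
    tags:                         # shortcut form instead of match_if
      - key: species
        value: Pipistrellus pipistrellus
roi:
  default:
    type: anchor_bbox             # assumed selection key for mapper configs
    anchor: bottom-left           # placeholder anchor value
```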

## Related pages

- Detection target setup: {doc}`../how_to/configure-target-definitions`
- Class setup: {doc}`../how_to/define-target-classes`
- ROI setup: {doc}`../how_to/configure-roi-mapping`
- Concept overview: {doc}`../explanation/target-encoding-and-decoding`
@ -1,6 +0,0 @@
# Targets Reference

```{eval-rst}
.. automodule:: batdetect2.targets
   :members:
```
5
docs/source/reference/targets.rst
Normal file
@ -0,0 +1,5 @@
Targets reference
=================

.. automodule:: batdetect2.targets
   :members:
@ -1,141 +0,0 @@
# Step 4: Defining Target Classes and Decoding Rules

## Purpose and Context

You've prepared your data by defining your annotation vocabulary (Step 1: Terms), removing irrelevant sounds (Step 2: Filtering), and potentially cleaning up or modifying tags (Step 3: Transforming Tags).
Now, it's time for a crucial step with two related goals:

1. Telling `batdetect2` **exactly what categories (classes) your model should learn to identify** by defining rules that map annotation tags to class names (like `pippip`, `myodau`, or `noise`).
   This process is often called **encoding**.
2. Defining how the model's predictions (those same class names) should be translated back into meaningful, structured **annotation tags** when you use the trained model.
   This is often called **decoding**.

These definitions are essential for both training the model correctly and interpreting its output later.

## How it Works: Defining Classes with Rules

You define your target classes and their corresponding decoding rules in your main configuration file (e.g., your `.yaml` training config), typically under a section named `classes`.
This section contains:

1. A **list** of specific class definitions.
2. A definition for the **generic class** tags.

Each item in the `classes` list defines one specific class your model should learn.

## Defining a Single Class

Each specific class definition rule requires the following information:

1. `name`: **(Required)** This is the unique, simple name for this class (e.g., `pipistrellus_pipistrellus`, `myotis_daubentonii`, `noise`).
   This label is used during training and is what the model predicts.
   Choose clear, distinct names.
   **Each class name must be unique.**
2. `tags`: **(Required)** This list contains one or more specific tags (using `key` and `value`) used to identify if an _existing_ annotation belongs to this class during the _encoding_ phase (preparing training data).
3. `match_type`: **(Optional, defaults to `"all"`)** Determines how the `tags` list is evaluated during _encoding_:
   - `"all"`: The annotation must have **ALL** listed tags to match (default).
   - `"any"`: The annotation needs **AT LEAST ONE** listed tag to match.
4. `output_tags`: **(Optional)** This list specifies the tags that should be assigned to an annotation when the model _predicts_ this class `name`.
   This is used during the _decoding_ phase (interpreting model output).
   - **If you omit `output_tags` (or set it to `null`/`~`), the system will default to using the same tags listed in the `tags` field for decoding.** This is often what you want.
   - Providing `output_tags` allows you to specify a different, potentially more canonical or detailed, set of tags to represent the class upon prediction.
     For example, you could match based on simplified tags but output standardized tags.

**Example: Defining Species Classes (Encoding & Default Decoding)**

Here, the `tags` used for matching during encoding will also be used for decoding, as `output_tags` is omitted.

```yaml
# In your main configuration file
classes:
  # Definition for the first class
  - name: pippip # Simple name for Pipistrellus pipistrellus
    tags: # Used for BOTH encoding match and decoding output
      - key: species
        value: Pipistrellus pipistrellus
    # match_type defaults to "all"
    # output_tags is omitted, defaults to using 'tags' above

  # Definition for the second class
  - name: myodau # Simple name for Myotis daubentonii
    tags: # Used for BOTH encoding match and decoding output
      - key: species
        value: Myotis daubentonii
```

**Example: Defining a Class with Separate Encoding and Decoding Tags**

Here, we match based on _either_ of two tags (`match_type: any`), but when the model predicts `'pipistrelle'`, we decode it _only_ to the specific `Pipistrellus pipistrellus` tag plus a genus tag.

```yaml
classes:
  - name: pipistrelle # Name for a Pipistrellus group
    match_type: any # Match if EITHER tag below is present during encoding
    tags:
      - key: species
        value: Pipistrellus pipistrellus
      - key: species
        value: Pipistrellus pygmaeus # Match pygmaeus too
    output_tags: # BUT, when decoding 'pipistrelle', assign THESE tags:
      - key: species
        value: Pipistrellus pipistrellus # Canonical species
      - key: genus # Assumes 'genus' key exists
        value: Pipistrellus # Add genus tag
```

## Handling Overlap During Encoding: Priority Order Matters

As before, when preparing training data (encoding), if an annotation matches the `tags` and `match_type` rules for multiple class definitions, the **order of the class definitions in the configuration list determines the priority**.

- The system checks rules from the **top** of the `classes` list down.
- The annotation gets assigned the `name` of the **first class rule it matches**.
- **Place more specific rules before more general rules.**

_(The YAML example for prioritizing Species over Noise remains the same as the previous version)_

## Handling Non-Matches & Decoding the Generic Class

What happens if an annotation passes filtering/transformation but doesn't match any specific class rule during encoding?

- **Encoding:** As explained previously, these annotations are **not ignored**.
  They are typically assigned to a generic "relevant sound" category, often called the **"Bat"** class in BatDetect2, intended for all relevant bat calls not specifically classified.
- **Decoding:** When the model predicts this generic "Bat" category (or when processing sounds that weren't assigned a specific class during encoding), we need a way to represent this generic status with tags.
  This is defined by the `generic_class` list directly within the main `classes` configuration section.

**Defining the Generic Class Tags:**

You specify the tags for the generic class like this:

```yaml
# In your main configuration file
classes: # Main configuration section for classes
  # --- List of specific class definitions ---
  classes:
    - name: pippip
      tags:
        - key: species
          value: Pipistrellus pipistrellus
    # ... other specific classes ...

  # --- Definition of the generic class tags ---
  generic_class: # Define tags for the generic 'Bat' category
    - key: call_type
      value: Echolocation
    - key: order
      value: Chiroptera
    # These tags will be assigned when decoding the generic category
```

This `generic_class` list provides the standard tags assigned when a sound is identified as relevant (passed filtering) but doesn't belong to one of the specific target classes you defined.
Like the specific classes, sensible defaults are often provided if you don't explicitly define `generic_class`.

**Crucially:** Remember, if sounds should be **completely excluded** from training (not even considered "generic"), use **Filtering rules (Step 2)**.

### Outcome

By defining this list of prioritized class rules (including their `name`, matching `tags`, `match_type`, and optional `output_tags`) and the `generic_class` tags, you provide `batdetect2` with:

1. A clear procedure to assign a target label (`name`) to each relevant annotation for training.
2. A clear mapping to convert predicted class names (including the generic case) back into meaningful annotation tags.

This complete definition prepares your data for the final heatmap generation (Step 5) and enables interpretation of the model's results.
@ -1,141 +0,0 @@
# Step 2: Filtering Sound Events

## Purpose

When preparing your annotated audio data for training a `batdetect2` model, you often want to select only specific sound events.
For example, you might want to:

- Focus only on echolocation calls and ignore social calls or noise.
- Exclude annotations that were marked as low quality.
- Train only on specific species or groups of species.

This filtering module allows you to define rules based on the **tags** associated with each sound event annotation.
Only the events that pass _all_ your defined rules will be kept for further processing and training.

## How it Works: Rules

Filtering is controlled by a list of **rules**.
Each rule defines a condition based on the tags attached to a sound event.
An event must satisfy **all** the rules you define in your configuration to be included.
If an event fails even one rule, it is discarded.

## Defining Rules in Configuration

You define these rules within your main configuration file (usually a `.yaml` file) under a specific section (the exact name might depend on the main training config, but let's assume it's called `filtering`).

The configuration consists of a list named `rules`.
Each item in this list is a single filter rule.

Each **rule** has two parts:

1. `match_type`: Specifies the _kind_ of check to perform.
2. `tags`: A list of specific tags (each with a `key` and `value`) that the rule applies to.

```yaml
# Example structure in your configuration file
filtering:
  rules:
    - match_type: <TYPE_OF_CHECK_1>
      tags:
        - key: <tag_key_1a>
          value: <tag_value_1a>
        - key: <tag_key_1b>
          value: <tag_value_1b>
    - match_type: <TYPE_OF_CHECK_2>
      tags:
        - key: <tag_key_2a>
          value: <tag_value_2a>
    # ... add more rules as needed
```

## Understanding `match_type`

This determines _how_ the list of `tags` in the rule is used to check a sound event.
There are four types:

1. **`any`**: (Keep if _at least one_ tag matches)

   - The sound event **passes** this rule if it has **at least one** of the tags listed in the `tags` section of the rule.
   - Think of it as an **OR** condition.
   - _Example Use Case:_ Keep events if they are tagged as `Species: Pip Pip` OR `Species: Pip Pyg`.

2. **`all`**: (Keep only if _all_ tags match)

   - The sound event **passes** this rule only if it has **all** of the tags listed in the `tags` section.
     The event can have _other_ tags as well, but it must contain _all_ the ones specified here.
   - Think of it as an **AND** condition.
   - _Example Use Case:_ Keep events only if they are tagged with `Sound Type: Echolocation` AND `Quality: Good`.

3. **`exclude`**: (Discard if _any_ tag matches)

   - The sound event **passes** this rule only if it does **not** have **any** of the tags listed in the `tags` section.
     If it matches even one tag in the list, the event is discarded.
   - _Example Use Case:_ Discard events if they are tagged `Quality: Poor` OR `Noise Source: Insect`.

4. **`equal`**: (Keep only if tags match _exactly_)
   - The sound event **passes** this rule only if its set of tags is _exactly identical_ to the list of `tags` provided in the rule (no more, no less).
   - _Note:_ This is very strict and usually less useful than `all` or `any`.
|
||||
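The four checks can be summarized in a short, self-contained sketch. This is plain Python written for illustration only (it is not batdetect2's actual implementation); tags are modeled as `(key, value)` pairs, and the rule structure mirrors the YAML above.

```python
# Illustrative sketch of the four match types; NOT batdetect2's actual code.
# A tag is a (key, value) pair; an event is represented by its set of tags.

def passes_rule(event_tags: set, rule_tags: set, match_type: str) -> bool:
    if match_type == "any":      # OR: at least one rule tag is present
        return bool(event_tags & rule_tags)
    if match_type == "all":      # AND: every rule tag is present (extras allowed)
        return rule_tags <= event_tags
    if match_type == "exclude":  # NOT: no rule tag may be present
        return not (event_tags & rule_tags)
    if match_type == "equal":    # exact match: same tag set, no more, no less
        return event_tags == rule_tags
    raise ValueError(f"unknown match_type: {match_type}")

def passes_all_rules(event_tags: set, rules: list) -> bool:
    # An event is kept only if it passes every rule in the list.
    return all(
        passes_rule(event_tags, set(r["tags"]), r["match_type"]) for r in rules
    )
```

For example, an event tagged `Sound Type: Echolocation` and `Quality: Good` passes an `any` rule on `Sound Type: Echolocation` and an `exclude` rule on `Quality: Poor`, so it would be kept.
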
## Combining Rules

Remember: A sound event must **pass every single rule** defined in the `rules` list to be kept.
The rules are checked one by one, and if an event fails any rule, it's immediately excluded from further consideration.

## Examples

**Example 1: Keep good quality echolocation calls**

```yaml
filtering:
  rules:
    # Rule 1: Must have the 'Echolocation' tag
    - match_type: any # Could also use 'all' if 'Sound Type' is the only tag expected
      tags:
        - key: Sound Type
          value: Echolocation
    # Rule 2: Must NOT have the 'Poor' quality tag
    - match_type: exclude
      tags:
        - key: Quality
          value: Poor
```

_Explanation:_ An event is kept only if it passes BOTH rules.
It must have the `Sound Type: Echolocation` tag AND it must NOT have the `Quality: Poor` tag.

**Example 2: Keep calls from Pipistrellus species recorded in a specific project, excluding uncertain IDs**

```yaml
filtering:
  rules:
    # Rule 1: Must be either Pip pip or Pip pyg
    - match_type: any
      tags:
        - key: Species
          value: Pipistrellus pipistrellus
        - key: Species
          value: Pipistrellus pygmaeus
    # Rule 2: Must belong to 'Project Alpha'
    - match_type: any # Using 'any' as it likely only has one project tag
      tags:
        - key: Project ID
          value: Project Alpha
    # Rule 3: Exclude if ID Certainty is 'Low' or 'Maybe'
    - match_type: exclude
      tags:
        - key: ID Certainty
          value: Low
        - key: ID Certainty
          value: Maybe
```

_Explanation:_ An event is kept only if it passes ALL three rules:

1. It has a `Species` tag that is _either_ `Pipistrellus pipistrellus` OR `Pipistrellus pygmaeus`.
2. It has the `Project ID: Project Alpha` tag.
3. It does _not_ have an `ID Certainty: Low` tag AND it does _not_ have an `ID Certainty: Maybe` tag.

## Usage

You will typically specify the path to the configuration file containing these `filtering` rules when you set up your data processing or training pipeline in `batdetect2`.
The tool will then automatically load these rules and apply them to your annotated sound events.

@ -1,78 +0,0 @@

# Defining Training Targets

A crucial aspect of training any supervised machine learning model, including BatDetect2, is clearly defining the **training targets**.
This process determines precisely what the model should learn to detect, localize, classify, and characterize from the input data (in this case, spectrograms).
The choices made here directly influence the model's focus, its performance, and how its predictions should be interpreted.

For BatDetect2, defining targets involves specifying:

- Which sounds in your annotated dataset are relevant for training.
- How these sounds should be categorized into distinct **classes** (e.g., different species).
- How the geometric **Region of Interest (ROI)** (e.g., bounding box) of each sound maps to the specific **position** and **size** targets the model predicts.
- How these classes and geometric properties relate back to the detailed information stored in your annotation **tags** (using a consistent **vocabulary/terms**).
- How the model's output (predicted class names, positions, sizes) should be translated back into meaningful tags and geometries.

## Sound Event Annotations: The Starting Point

BatDetect2 assumes your training data consists of audio recordings where relevant sound events have been **annotated**.
A typical annotation for a single sound event provides two key pieces of information:

1. **Location & Extent:** Information defining _where_ the sound occurs in time and frequency, usually represented as a **bounding box** (the ROI) drawn on a spectrogram.
2. **Description (Tags):** Information _about_ the sound event, provided as a set of descriptive **tags** (key-value pairs).

For example, an annotation might have a bounding box and tags like:

- `species: Myotis daubentonii`
- `quality: Good`
- `call_type: Echolocation`

A single sound event can have **multiple tags**, allowing for rich descriptions.
This richness requires a structured process to translate the annotation (both tags and geometry) into the precise targets needed for model training.
The **target definition process** provides clear rules to:

- Interpret the meaning of different tag keys (**Terms**).
- Select only the relevant annotations (**Filtering**).
- Potentially standardize or modify the tags (**Transforming**).
- Map the geometric ROI to specific position and size targets (**ROI Mapping**).
- Map the final set of tags on each selected annotation to a single, definitive **target class** label (**Classes**).

## Configuration-Driven Workflow

BatDetect2 is designed so that researchers can configure this entire target definition process primarily through **configuration files** (typically written in YAML format), minimizing the need for direct programming for standard workflows.
These settings are usually grouped under a main `targets:` key within your overall training configuration file.

## The Target Definition Steps

Defining the targets involves several sequential steps, each configurable and building upon the previous one:

1. **Defining Vocabulary (Terms & Tags):** Understand how annotations use tags (key-value pairs).
   This step involves defining the meaning (**Terms**) behind the tag keys (e.g., `species`, `call_type`).
   Often, default terms are sufficient, but understanding this is key to using tags in later steps.
   (See: {doc}`tags_and_terms`)
2. **Filtering Sound Events:** Select only the relevant sound event annotations based on their tags (e.g., keeping only high-quality calls).
   (See: {doc}`filtering`)
3. **Transforming Tags (Optional):** Modify tags on selected annotations for standardization, correction, grouping (e.g., species to genus), or deriving new tags.
   (See: {doc}`transform`)
4. **Defining Classes & Decoding Rules:** Map the final tags to specific target **class names** (like `pippip` or `myodau`).
   Define priorities for overlap and specify how predicted names map back to tags (decoding).
   (See: {doc}`classes`)
5. **Mapping ROIs (Position & Size):** Define how the geometric ROI (e.g., bounding box) of each sound event maps to the specific reference **point** (e.g., center, corner) and scaled **size** values (width, height) used as targets by the model.
   (See: {doc}`rois`)
6. **The `Targets` Object:** Understand the outcome of configuring steps 1-5 – a functional object used internally by BatDetect2 that encapsulates all your defined rules for filtering, transforming, ROI mapping, encoding, and decoding.
   (See: {doc}`use`)

The result of this configuration process is a clear set of instructions that BatDetect2 uses during training data preparation to determine the correct "answer" (the ground truth label and geometry representation) for each relevant sound event.

Explore the detailed steps using the links below:

```{toctree}
:maxdepth: 1
:caption: Target Definition Steps:

tags_and_terms
filtering
transform
classes
rois
use
```

@ -1,76 +0,0 @@

# Step 5: Generating Training Targets

## Purpose and Context

Following the previous steps of defining terms, filtering events, transforming tags, and defining specific class rules, this final stage focuses on **generating the ground truth data** used directly for training the BatDetect2 model.
This involves converting the refined annotation information for each audio clip into specific **heatmap formats** required by the underlying neural network architecture.

This step essentially translates your structured annotations into the precise "answer key" the model learns to replicate during training.

## What are Heatmaps?

Heatmaps, in this context, are multi-dimensional arrays, often visualized as images aligned with the input spectrogram, where the values at different time-frequency coordinates represent specific information about the sound events.
For BatDetect2 training, three primary heatmaps are generated:

1. **Detection Heatmap:**

   - **Represents:** The presence or likelihood of relevant sound events across the spectrogram.
   - **Structure:** A 2D array matching the spectrogram's time-frequency dimensions.
     Peaks (typically smoothed) are generated at the reference locations of all sound events that passed the filtering stage (including both specifically classified events and those falling into the generic "Bat" category).

2. **Class Heatmap:**

   - **Represents:** The location and class identity for sounds belonging to the _specific_ target classes you defined in Step 4.
   - **Structure:** A 3D array with dimensions for category, time, and frequency.
     It contains a separate 2D layer (channel) for each target class name (e.g., 'pippip', 'myodau').
     A smoothed peak appears on a specific class layer only if a sound event assigned to that class exists at that location.
     Events assigned only to the generic class do not produce peaks here.

3. **Size Heatmap:**

   - **Represents:** The target dimensions (duration/width and bandwidth/height) of detected sound events.
   - **Structure:** A 3D array with dimensions for size-dimension ('width', 'height'), time, and frequency.
     At the reference location of each detected sound event, this heatmap stores two numerical values corresponding to the scaled width and height derived from the event's bounding box.

## How Heatmaps are Created

The generation of these heatmaps is an automated process within `batdetect2`, driven by your configurations from all previous steps.
For each audio clip and its corresponding spectrogram in the training dataset:

1. The system retrieves the associated sound event annotations.
2. Configured **filtering rules** (Step 2) are applied to select relevant annotations.
3. Configured **tag transformation rules** (Step 3) are applied to modify the tags of the selected annotations.
4. Configured **class definition rules** (Step 4) are used to assign a specific class name or determine generic "Bat" status for each processed annotation.
5. These final annotations are then mapped onto initialized heatmap arrays:
   - A signal (initially a single point) is placed on the **Detection Heatmap** at the reference location for each relevant annotation.
   - The scaled width and height values are placed on the **Size Heatmap** at the reference location.
   - If an annotation received a specific class name, a signal is placed on the corresponding layer of the **Class Heatmap** at the reference location.
6. Finally, Gaussian smoothing (a blurring effect) is typically applied to the Detection and Class heatmaps to create spatially smoother targets, which often aids model training stability and performance.

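The point-placement and smoothing steps can be illustrated with a minimal NumPy/SciPy sketch. The array shapes, event locations, and `sigma` value below are purely illustrative assumptions, not batdetect2's internals.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

# Illustrative sketch of detection-heatmap generation; NOT batdetect2's code.
# Assume a spectrogram of 128 frequency bins x 512 time frames.
n_freq, n_time = 128, 512
detection = np.zeros((n_freq, n_time), dtype=np.float32)

# Reference locations (freq_bin, time_frame) of annotations that passed filtering.
events = [(40, 100), (85, 300)]
for f, t in events:
    detection[f, t] = 1.0  # place a single-point signal

# Gaussian smoothing turns each point into a smooth peak (sigma in bins/pixels).
detection = gaussian_filter(detection, sigma=3.0)

# After smoothing, the heatmap maxima still sit at the event reference locations.
```

The class heatmap works the same way, except each target class gets its own 2D channel and a point is placed only on the channel matching the event's assigned class.
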
## Configurable Settings for Heatmap Generation

While the content of the heatmaps is primarily determined by the previous configuration steps, a few parameters specific to the heatmap drawing process itself can be adjusted.
These are usually set in your main configuration file under a section like `labelling`:

- `sigma`: (Number, e.g., `3.0`) Defines the standard deviation, in pixels or bins, of the Gaussian kernel used for smoothing the Detection and Class heatmaps.
  Larger values result in more diffused heatmap peaks.
- `position`: (Text, e.g., `"bottom-left"`, `"center"`) Specifies the geometric reference point within each sound event's bounding box that anchors its representation on the heatmaps.
- `time_scale` & `frequency_scale`: (Numbers) These crucial scaling factors convert the physical duration (in seconds) and frequency bandwidth (in Hz) of annotation bounding boxes into the numerical values stored in the 'width' and 'height' channels of the Size Heatmap.
  - **Important Note:** The appropriate values for these scales are dictated by the requirements of the specific BatDetect2 model architecture being trained.
    They ensure the size information is presented in the units or relative scale the model expects.
    **Consult the documentation or tutorials for your specific model to determine the correct `time_scale` and `frequency_scale` values.** Mismatched scales can hinder the model's ability to learn size regression accurately.

**Example YAML Configuration for Labelling Settings:**

```yaml
# In your main configuration file
labelling:
  sigma: 3.0 # Std. dev. for Gaussian smoothing (pixels/bins)
  position: "bottom-left" # Bounding box reference point
  time_scale: 1000.0 # Example: Scales seconds to milliseconds
  frequency_scale: 0.00116 # Example: Scales Hz relative to ~860 Hz (model specific!)
```

## Outcome: Final Training Targets

Executing this step for all training data yields the complete set of target heatmaps (Detection, Class, Size) for each corresponding input spectrogram.
These arrays constitute the ground truth data that the BatDetect2 model directly compares its predictions against during the training phase, guiding its learning process.

@ -1,85 +0,0 @@

## Defining Target Geometry: Mapping Sound Event Regions

### Introduction

In the previous steps of defining targets, we focused on determining _which_ sound events are relevant (`filtering`), _what_ descriptive tags they should have (`transform`), and _which category_ they belong to (`classes`).
However, for the model to learn effectively, it also needs to know **where** in the spectrogram each sound event is located and approximately **how large** it is.

Your annotations typically define the location and extent of a sound event using a **Region of Interest (ROI)**, most commonly a **bounding box** drawn around the call on the spectrogram.
This ROI contains detailed spatial information (start/end time, low/high frequency).

This section explains how BatDetect2 converts the geometric ROI from your annotations into the specific positional and size information used as targets during model training.

### From ROI to Model Targets: Position & Size

BatDetect2 does not directly predict a full bounding box.
Instead, it is trained to predict:

1. **A Reference Point:** A single point `(time, frequency)` that represents the primary location of the detected sound event within the spectrogram.
2. **Size Dimensions:** Numerical values representing the event's size relative to that reference point, typically its `width` (duration in time) and `height` (bandwidth in frequency).

This step defines _how_ BatDetect2 calculates this specific reference point and these numerical size values from the original annotation's bounding box.
It also handles the reverse process – converting predicted positions and sizes back into bounding boxes for visualization or analysis.

### Configuring the ROI Mapping

You can control how this conversion happens through settings in your configuration file (e.g., your main `.yaml` file).
These settings are usually placed within the main `targets:` configuration block, under a specific `roi:` key.

Here are the key settings:

- **`position`**:

  - **What it does:** Determines which specific point on the annotation's bounding box is used as the single **Reference Point** for training (e.g., `"center"`, `"bottom-left"`).
  - **Why configure it?** This affects where the peak signal appears in the target heatmaps used for training.
    Different choices might slightly influence model learning.
    The default (`"bottom-left"`) is often a good starting point.
  - **Example Value:** `position: "center"`

- **`time_scale`**:

  - **What it does:** This is a numerical scaling factor that converts the _actual duration_ (width, measured in seconds) of the bounding box into the numerical 'width' value the model learns to predict (and which is stored in the Size Heatmap).
  - **Why configure it?** The model predicts raw numbers for size; this scale gives those numbers meaning.
    For example, setting `time_scale: 1000.0` means the model will be trained to predict the duration in **milliseconds** instead of seconds.
  - **Important Considerations:**
    - You can often set this value based on the units you prefer the model to work with internally.
      However, having target numerical values roughly centered around 1 (e.g., typically between 0.1 and 10) can sometimes improve numerical stability during model training.
    - The default value in BatDetect2 (e.g., `1000.0`) has been chosen to scale the duration relative to the spectrogram width under default STFT settings.
      Be aware that if you significantly change STFT parameters (window size or overlap), the relationship between the default scale and the spectrogram dimensions might change.
    - Crucially, whatever scale you use during training **must** be used when decoding the model's predictions back into real-world time units (seconds).
      BatDetect2 generally handles this consistency for you automatically when using the full pipeline.
  - **Example Value:** `time_scale: 1000.0`

- **`frequency_scale`**:
  - **What it does:** Similar to `time_scale`, this numerical scaling factor converts the _actual frequency bandwidth_ (height, typically measured in Hz or kHz) of the bounding box into the numerical 'height' value the model learns to predict.
  - **Why configure it?** It gives physical meaning to the model's raw numerical prediction for bandwidth and allows you to choose the internal units or scale.
  - **Important Considerations:**
    - Same as for `time_scale`.
  - **Example Value:** `frequency_scale: 0.00116`

**Example YAML Configuration:**

```yaml
# Inside your main configuration file (e.g., training_config.yaml)

targets: # Top-level key for target definition
  # ... filtering settings ...
  # ... transforms settings ...
  # ... classes settings ...

  # --- ROI Mapping Settings ---
  roi:
    position: "bottom-left" # Reference point (e.g., "center", "bottom-left")
    time_scale: 1000.0 # e.g., Model predicts width in ms
    frequency_scale: 0.00116 # e.g., Model predicts height relative to ~860Hz (or other model-specific scaling)
```

### Decoding Size Predictions

These scaling factors (`time_scale`, `frequency_scale`) are also essential for interpreting the model's output correctly.
When the model predicts numerical values for width and height, BatDetect2 uses these same scales (in reverse) to convert those numbers back into physically meaningful durations (seconds) and bandwidths (Hz/kHz) when reconstructing bounding boxes from predictions.

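This encode/decode round trip can be sketched in plain Python. The function names and box representation below are illustrative assumptions for this example, not batdetect2's actual API; only the documented `position`, `time_scale`, and `frequency_scale` semantics are taken from the text above.

```python
# Illustrative encode/decode of a bounding box; NOT batdetect2's actual API.
# A box is (start_time, low_freq, end_time, high_freq) in seconds / Hz.

def encode_roi(box, time_scale=1000.0, frequency_scale=0.00116,
               position="bottom-left"):
    start, low, end, high = box
    if position == "bottom-left":
        ref = (start, low)           # earliest time, lowest frequency
    elif position == "center":
        ref = ((start + end) / 2, (low + high) / 2)
    else:
        raise ValueError(f"unsupported position: {position}")
    width = (end - start) * time_scale       # scaled duration
    height = (high - low) * frequency_scale  # scaled bandwidth
    return ref, width, height

def decode_roi(ref, width, height, time_scale=1000.0,
               frequency_scale=0.00116, position="bottom-left"):
    # Apply the *same* scales in reverse to recover seconds / Hz.
    dt = width / time_scale
    df = height / frequency_scale
    if position == "bottom-left":
        start, low = ref
        return (start, low, start + dt, low + df)
    if position == "center":
        t, f = ref
        return (t - dt / 2, f - df / 2, t + dt / 2, f + df / 2)
    raise ValueError(f"unsupported position: {position}")
```

With `time_scale: 1000.0`, a 5 ms call encodes to a width of 5.0, and decoding with the same scales recovers the original box, which is why the training and decoding scales must match.
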
### Outcome

By configuring the `roi` settings, you ensure that BatDetect2 consistently translates the geometric information from your annotations into the specific reference points and scaled size values required for training the model.
Using consistent scales that are appropriate for your data and potentially beneficial for training stability allows the model to effectively learn not just _what_ sound is present, but also _where_ it is located and _how large_ it is, and enables meaningful interpretation of the model's spatial and size predictions.

@ -1,166 +0,0 @@

# Step 1: Managing Annotation Vocabulary

## Purpose

To train `batdetect2`, you will need sound events that have been carefully annotated.
We annotate sound events using **tags**.
A tag is simply a piece of information attached to an annotation, often describing what the sound is or its characteristics.
Common examples include `Species: Myotis daubentonii` or `Quality: Good`.

Each tag fundamentally has two parts:

* **Value:** The specific information (e.g., "Myotis daubentonii", "Good").
* **Term:** The _type_ of information (e.g., "Species", "Quality"). This defines the context or meaning of the value.

We use this flexible **Term: Value** approach because it allows you to annotate your data with any kind of information relevant to your project, while still providing a structure that makes the meaning clear.

While simple terms like "Species" are easy to understand, sometimes the underlying definition needs to be more precise to ensure everyone interprets it the same way (e.g., using a standard scientific definition for "Species" or clarifying what "Call Type" specifically refers to).

This `terms` module is designed to help manage these definitions effectively:

1. It provides **standard definitions** for common terms used in bioacoustics, ensuring consistency.
2. It lets you **define your own custom terms** if you need concepts specific to your project.
3. Crucially, it allows you to use simple **"keys"** (like shortcuts) in your configuration files to refer to these potentially complex term definitions, making configuration much easier and less error-prone.

## The Problem: Why We Need Defined Terms

Imagine you have a tag that simply says `"Myomyo"`.
If you created this tag, you might know it's a shortcut for the species _Myotis myotis_.
But what about someone else using your data or model later? Does `"Myomyo"` refer to the species? Or maybe it's the name of an individual bat, or even the location where it was recorded? Simple tags like this can be ambiguous.

To make things clearer, it's good practice to provide context.
We can do this by pairing the specific information (the **Value**) with the _type_ of information (the **Term**).
For example, writing the tag as `species: Myomyo` is much less ambiguous.
Here, `species` is the **Term**, explaining that `Myomyo` is a **Value** representing a species.

However, another challenge often comes up when sharing data or collaborating.
You might use the term `species`, while a colleague uses `Species`, and someone else uses the more formal `Scientific Name`.
Even though you all mean the same thing, these inconsistencies make it hard to combine data or reuse analysis pipelines automatically.

This is where standardized **Terms** become very helpful.
Several groups work to create standard definitions for common concepts.
For instance, the Darwin Core standard provides widely accepted terms for biological data, like `dwc:scientificName` for a species name.
Using standard Terms whenever possible makes your data clearer, easier for others (and machines!) to understand correctly, and much more reusable across different projects.

**But here's the practical problem:** While using standard, well-defined Terms is important for clarity and reusability, writing out full definitions or long standard names (like `dwc:scientificName` or "Scientific Name according to Darwin Core standard") every single time you need to refer to a species tag in a configuration file would be extremely tedious and prone to typing errors.

## The Solution: Keys (Shortcuts) and the Registry

This module uses a central **Registry** that stores the full definitions of various Terms.
Each Term in the registry is assigned a unique, short **key** (a simple string).

Think of the **key** as a shortcut.

Instead of using the full Term definition in your configuration files, you just use its **key**.
The system automatically looks up the full definition in the registry using the key when needed.

**Example:**

* **Full Term Definition:** Represents the scientific name of the organism.
* **Key:** `species`
* **In Config:** You just write `species`.

## Available Keys

The registry comes pre-loaded with keys for many standard terms used in bioacoustics, including those from the `soundevent` package and some specific to `batdetect2`.
This means you can often use these common concepts without needing to define them yourself.

Common examples of pre-defined keys might include:

* `species`: For scientific species names (e.g., _Myotis daubentonii_).
* `common_name`: For the common name of a species (e.g., "Daubenton's bat").
* `genus`, `family`, `order`: For higher levels of biological taxonomy.
* `call_type`: For functional call types (e.g., 'Echolocation', 'Social').
* `individual`: For identifying specific individuals if tracked.
* `class`: **(Special Key)** This key is often used **by default** in configurations when defining the target classes for your model (e.g., the different species you want the model to classify). If you are specifying a tag that represents a target class label, you often only need to provide the `value`, and the system assumes the `key` is `class`.

This is not an exhaustive list.
To discover all the term keys currently available in the registry (including any standard ones loaded automatically and any custom ones you've added in your configuration), you can:

1. Use the function `batdetect2.terms.get_term_keys()` if you are working directly with Python code.
2. Refer to the main `batdetect2` API documentation for a list of commonly included standard terms.

## Defining Your Own Terms

While many common terms have pre-defined keys, you might need a term specific to your project or data that isn't already available (e.g., "Recording Setup", "Weather Condition", "Project Phase", "Noise Source").
You can easily define these custom terms directly within a configuration file (usually your main `.yaml` file).

Typically, you define custom terms under a dedicated section (often named `terms`).
Inside this section, you create a list, where each item in the list defines one new term using the following fields:

* `key`: **(Required)** This is the unique shortcut key or nickname you will use to refer to this term throughout your configuration (e.g., `weather`, `setup_id`, `noise_src`). Choose something short and memorable.
* `label`: (Optional) A user-friendly label for the term, which might be used in reports or visualizations (e.g., "Weather Condition", "Setup ID"). If you don't provide one, it defaults to using the `key`.
* `name`: (Optional) A more formal or technical name for the term.
  * It's good practice, especially if defining terms that might overlap with standard vocabularies, to use a **namespaced format** like `<namespace>:<term_name>`. The `namespace` part helps avoid clashes with terms defined elsewhere. For example, the standard Darwin Core term for scientific name is `dwc:scientificName`, where `dwc` is the namespace for Darwin Core. Using namespaces makes your custom terms more specific and reduces potential confusion.
  * If you don't provide a `name`, it defaults to using the `key`.
* `definition`: (Optional) A brief text description explaining what this term represents (e.g., "The primary source of background noise identified", "General weather conditions during recording"). If omitted, it defaults to "Unknown".
* `uri`: (Optional) If your term definition comes directly from a standard online vocabulary (like Darwin Core), you can include its unique web identifier (URI) here.

**Example YAML Configuration for Custom Terms:**

```yaml
# In your main configuration file

# (Optional section to define custom terms)
terms:
  - key: weather # Your chosen shortcut
    label: Weather Condition
    name: myproj:weather # Formal namespaced name
    definition: General weather conditions during recording (e.g., Clear, Rain, Fog).

  - key: setup_id # Another shortcut
    label: Recording Setup ID
    name: myproj:setupID # Formal namespaced name
    definition: The unique identifier for the specific hardware setup used.

  - key: species # Defining a term with a standard URI
    label: Scientific Name
    name: dwc:scientificName
    uri: http://rs.tdwg.org/dwc/terms/scientificName # Example URI
    definition: The full scientific name according to Darwin Core.

# ... other configuration sections ...
```

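The defaulting behaviour described for these fields can be sketched in a few lines of plain Python. This is only an illustration of the documented rules (label and name default to the key, definition defaults to "Unknown"), not batdetect2's actual registry implementation.

```python
# Illustrative sketch of term registration with the documented defaults;
# NOT batdetect2's actual implementation.
REGISTRY: dict = {}

def register_term(entry: dict) -> None:
    key = entry["key"]  # 'key' is the only required field
    REGISTRY[key] = {
        "label": entry.get("label", key),             # defaults to the key
        "name": entry.get("name", key),               # defaults to the key
        "definition": entry.get("definition", "Unknown"),
        "uri": entry.get("uri"),                      # optional, may be None
    }

# Entries as they might appear after parsing the `terms:` YAML section.
for entry in [
    {"key": "weather", "label": "Weather Condition", "name": "myproj:weather"},
    {"key": "setup_id"},  # everything except the key is left to defaults
]:
    register_term(entry)
```
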
When `batdetect2` loads your configuration, it reads this `terms` section and adds your custom definitions (linked to their unique keys) to the central registry.
These keys (`weather`, `setup_id`, etc.) are then ready to be used in other parts of your configuration, like defining filters or target classes.

## Using Keys to Specify Tags (in Filters, Class Definitions, etc.)

Now that you have keys for all the terms you need (both pre-defined and custom), you can easily refer to specific **tags** in other parts of your configuration, such as:

- Filtering rules (as seen in the `filtering` module documentation).
- Defining which tags represent your target classes.
- Associating extra information with your classes.

When you need to specify a tag, you typically use a structure with two fields:

- `key`: The **key** (shortcut) for the _Term_ part of the tag (e.g., `species`, `quality`, `weather`).
  **It defaults to `class`** if you omit it, which is common when defining the main target classes.
- `value`: The specific _value_ of the tag (e.g., `Myotis daubentonii`, `Good`, `Rain`).

**Example YAML Configuration (e.g., inside a filter rule):**

```yaml
# ... inside a filtering configuration section ...
rules:
  # Rule: Exclude events recorded in 'Rain'
  - match_type: exclude
    tags:
      - key: weather # Use the custom term key defined earlier
        value: Rain
  # Rule: Keep only 'Myotis daubentonii' (using the default 'class' key implicitly)
  - match_type: any # Or 'all' depending on logic
    tags:
      - value: Myotis daubentonii # 'key: class' is assumed by default here
      # key: class # Explicitly writing this is also fine
  # Rule: Keep only 'Good' quality events
  - match_type: any # Or 'all' depending on logic
    tags:
      - key: quality # Use a likely pre-defined key
        value: Good
```

## Summary
|
||||
|
||||
- Annotations have **tags** (Term + Value).
|
||||
- This module uses short **keys** as shortcuts for Term definitions, stored in a **registry**.
|
||||
- Many **common keys are pre-defined**.
|
||||
- You can define **custom terms and keys** in your configuration file (using `key`, `label`, `definition`).
|
||||
- You use these **keys** along with specific **values** to refer to tags in other configuration sections (like filters or class definitions), often defaulting to the `class` key.
|
||||
|
||||
This system makes your configurations cleaner, more readable, and less prone to errors by avoiding repetition of complex term definitions.
|
||||
@@ -1,118 +0,0 @@
# Step 3: Transforming Annotation Tags (Optional)

## Purpose and Context

After defining your vocabulary (Step 1: Terms) and filtering out irrelevant sound events (Step 2: Filtering), you have a dataset of annotations ready for the next stages.
Before you select the final target classes for training (Step 4), you might want or need to **modify the tags** associated with your annotations.
This optional step allows you to clean up, standardize, or derive new information from your existing tags.

**Why transform tags?**

- **Correcting Mistakes:** Fix typos or incorrect values in specific tags (e.g., changing an incorrect species label).
- **Standardizing Labels:** Ensure consistency if the same information was tagged using slightly different values (e.g., mapping "echolocation", "Echoloc.", and "Echolocation Call" all to a single standard value: "Echolocation").
- **Grouping Related Concepts:** Combine different specific tags into a broader category (e.g., mapping several different species tags like _Myotis daubentonii_ and _Myotis nattereri_ to a single `genus: Myotis` tag).
- **Deriving New Information:** Automatically create new tags based on existing ones (e.g., automatically generating a `genus: Myotis` tag whenever a `species: Myotis daubentonii` tag is present).

This step uses the `batdetect2.targets.transform` module to apply these changes based on rules you define.

## How it Works: Transformation Rules

You control how tags are transformed by defining a list of **rules** in your configuration file (e.g., your main `.yaml` file, often under a section named `transform`).

Each rule specifies a particular type of transformation to perform.
Importantly, the rules are applied **sequentially**, in the exact order they appear in your configuration list.
The output annotation from one rule becomes the input for the next rule in the list.
This means the order can matter!

## Types of Transformation Rules

Here are the main types of rules you can define:

1. **Replace an Exact Tag (`replace`)**

   - **Use Case:** Fixing a specific, known incorrect tag.
   - **How it works:** You specify the _exact_ original tag (both its term key and value) and the _exact_ tag you want to replace it with.
   - **Example Config:** Replace the informal tag `species: Pip pip` with the correct scientific name tag.
     ```yaml
     transform:
       rules:
         - rule_type: replace
           original:
             key: species # Term key of the tag to find
             value: "Pip pip" # Value of the tag to find
           replacement:
             key: species # Term key of the replacement tag
             value: "Pipistrellus pipistrellus" # Value of the replacement tag
     ```

2. **Map Values (`map_value`)**

   - **Use Case:** Standardizing different values used for the same concept, or grouping multiple specific values into one category.
   - **How it works:** You specify a `source_term_key` (the type of tag to look at, e.g., `call_type`).
     Then you provide a `value_mapping` dictionary listing original values and the new values they should be mapped to.
     Only tags matching the `source_term_key` and having a value listed in the mapping will be changed.
     You can optionally specify a `target_term_key` if you want to change the term type as well (e.g., mapping species to a genus).
   - **Example Config:** Standardize different ways "Echolocation" might have been written for the `call_type` term.
     ```yaml
     transform:
       rules:
         - rule_type: map_value
           source_term_key: call_type # Look at 'call_type' tags
           # target_term_key is not specified, so the term stays 'call_type'
           value_mapping:
             echolocation: Echolocation
             Echolocation Call: Echolocation
             Echoloc.: Echolocation
             # Add mappings for other values like 'Social' if needed
     ```
   - **Example Config (Grouping):** Map specific Pipistrellus species tags to a single `genus: Pipistrellus` tag.
     ```yaml
     transform:
       rules:
         - rule_type: map_value
           source_term_key: species # Look at 'species' tags
           target_term_key: genus # Change the term to 'genus'
           value_mapping:
             "Pipistrellus pipistrellus": Pipistrellus
             "Pipistrellus pygmaeus": Pipistrellus
             "Pipistrellus nathusii": Pipistrellus
     ```

3. **Derive a New Tag (`derive_tag`)**

   - **Use Case:** Automatically creating new information based on existing tags, like getting the genus from a species name.
   - **How it works:** You specify a `source_term_key` (e.g., `species`).
     You provide a `target_term_key` for the new tag to be created (e.g., `genus`).
     You also provide the name of a `derivation_function` (e.g., `"extract_genus"`) that knows how to perform the calculation (e.g., take "Myotis daubentonii" and return "Myotis").
     `batdetect2` has some built-in functions, or you can potentially define your own (see advanced documentation).
     You can also choose whether to keep the original source tag (`keep_source: true`).
   - **Example Config:** Create a `genus` tag from the existing `species` tag, keeping the species tag.
     ```yaml
     transform:
       rules:
         - rule_type: derive_tag
           source_term_key: species # Use the value from the 'species' tag
           target_term_key: genus # Create a tag with the 'genus' term
           derivation_function: extract_genus # Use the built-in function for this
           keep_source: true # Keep the original 'species' tag
     ```
   - **Another Example:** Convert species names to uppercase (modifying the value of the _same_ term).
     ```yaml
     transform:
       rules:
         - rule_type: derive_tag
           source_term_key: species # Use the value from the 'species' tag
           # target_term_key is not specified, so the term stays 'species'
           derivation_function: to_upper_case # Assume this function exists
           keep_source: false # Replace the original species tag
     ```
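To make the `derive_tag` mechanism concrete, here is a minimal sketch of what a derivation function like `extract_genus` might do. This is an illustration only; the actual batdetect2 built-in may differ in details.

```python
# Illustrative sketch of a derivation function: maps a tag value
# to a new value. For binomial species names the genus is the
# first word.
def extract_genus(species_name: str) -> str:
    """Return the first word of a binomial species name."""
    return species_name.strip().split()[0]


print(extract_genus("Myotis daubentonii"))  # Myotis
```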
## Rule Order Matters

Remember that rules are applied one after another.
If you have multiple rules, make sure they are ordered correctly to achieve the desired outcome.
For instance, you might want to standardize species names _before_ deriving the genus from them.

## Outcome

After applying all the transformation rules you've defined, the annotations will proceed to the next step (Step 4: Select Target Tags & Define Classes) with their tags potentially cleaned, standardized, or augmented based on your configuration.
If you don't define any rules, the tags simply pass through this step unchanged.
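The sequential application described above can be sketched as a simple pipeline. The rule functions here are illustrative stand-ins for configured rules, not batdetect2 code; the point is that each rule's output feeds the next, so order matters.

```python
# Sketch of sequential rule application over an annotation's tags.
# Tags are modeled as (term_key, value) pairs for simplicity.
from typing import Callable

Tag = tuple[str, str]


def standardize_species(tags: list[Tag]) -> list[Tag]:
    # Stand-in for a 'replace' rule fixing an informal label.
    fixes = {"Pip pip": "Pipistrellus pipistrellus"}
    return [
        (k, fixes.get(v, v)) if k == "species" else (k, v) for k, v in tags
    ]


def derive_genus(tags: list[Tag]) -> list[Tag]:
    # Stand-in for a 'derive_tag' rule with keep_source: true.
    extra = [("genus", v.split()[0]) for k, v in tags if k == "species"]
    return tags + extra


rules: list[Callable[[list[Tag]], list[Tag]]] = [
    standardize_species,  # must run first...
    derive_genus,         # ...so the genus is derived from the fixed name
]

tags = [("species", "Pip pip")]
for rule in rules:  # each rule's output is the next rule's input
    tags = rule(tags)

print(tags)  # [('species', 'Pipistrellus pipistrellus'), ('genus', 'Pipistrellus')]
```

Swapping the two rules would derive the genus "Pip" from the unfixed label, which is exactly the ordering pitfall the section warns about.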
@@ -1,91 +0,0 @@
## Bringing It All Together: The `Targets` Object

### Recap: Defining Your Target Strategy

In the previous sections, we covered the sequential steps to precisely define what your BatDetect2 model should learn, specified within your configuration file:

1. **Terms:** Establishing the vocabulary for annotation tags.
2. **Filtering:** Selecting relevant sound event annotations.
3. **Transforming:** Optionally modifying tags.
4. **Classes:** Defining target categories, setting priorities, and specifying tag decoding rules.
5. **ROI Mapping:** Defining how annotation geometry maps to target position and size values.

You define all these aspects within your configuration file (e.g., YAML), which holds the complete specification for your target definition strategy, typically under a main `targets:` key.

### What is the `Targets` Object?

While the configuration file specifies _what_ you want to happen, BatDetect2 needs an active component to actually _perform_ these steps.
This is the role of the `Targets` object.

The `Targets` object is an organized container that holds all the specific functions and settings derived from your configuration file (`TargetConfig`).
It's created directly from your configuration and provides methods to apply the **filtering**, **transformation**, **ROI mapping** (geometry to position/size and back), **class encoding**, and **class decoding** steps you defined.
It effectively bundles together all the target definition logic determined by your settings into a single, usable object.

### How is it Created and Used?

For most standard training workflows, you typically won't need to create or interact with the `Targets` object directly in Python code.
BatDetect2 usually handles its creation automatically when you provide your main configuration file during training setup.

Conceptually, here's what happens behind the scenes:

1. You provide the path to your configuration file (e.g., `my_training_config.yaml`).
2. BatDetect2 reads this file and finds your `targets:` configuration section.
3. It uses this configuration to build an instance of the `Targets` object using a dedicated function (like `load_targets`), loading it with the appropriate logic based on your settings.

```python
# Conceptual Example: How BatDetect2 might use your configuration
from batdetect2.targets import load_targets  # The function to load/build the object
from batdetect2.targets.types import TargetProtocol  # The type/interface

# You provide this path, usually as part of the main training setup
target_config_file = "path/to/your/target_config.yaml"

# --- BatDetect2 Internally Does Something Like This: ---
# Loads your config and builds the Targets object using the loader function
# The resulting object adheres to the TargetProtocol interface
targets_processor: TargetProtocol = load_targets(target_config_file)
# ---------------------------------------------------------

# Now, 'targets_processor' holds all your configured logic and is ready
# to be used internally by the training pipeline or for prediction processing.
```

### What Does the `Targets` Object Do? (Its Role)

Once created, the `targets_processor` object plays several vital roles within the BatDetect2 system:

1. **Preparing Training Data:** During the data loading and label generation phase of training, BatDetect2 uses this object to process each annotation from your dataset _before_ the final training format (e.g., heatmaps) is generated.
   For each annotation, it internally applies the logic:
   - `targets_processor.filter(...)`: To decide whether to keep the annotation.
   - `targets_processor.transform(...)`: To apply any tag modifications.
   - `targets_processor.encode(...)`: To get the final class name (e.g., `'pippip'`, `'myodau'`, or `None` for the generic class).
   - `targets_processor.get_position(...)`: To determine the reference `(time, frequency)` point from the annotation's geometry.
   - `targets_processor.get_size(...)`: To calculate the _scaled_ width and height target values from the annotation's geometry.
2. **Interpreting Model Predictions:** When you use a trained model, its raw outputs (like predicted class names, positions, and sizes) need to be translated back into meaningful results.
   This object provides the necessary decoding logic:
   - `targets_processor.decode(...)`: Converts a predicted class name back into representative annotation tags.
   - `targets_processor.recover_roi(...)`: Converts a predicted position and _scaled_ size values back into an estimated geometric bounding box in real-world coordinates (seconds, Hz).
   - `targets_processor.generic_class_tags`: Provides the tags for sounds classified into the generic category.
3. **Providing Metadata:** It conveniently holds useful information derived from your configuration:
   - `targets_processor.class_names`: The final list of specific target class names.
   - `targets_processor.generic_class_tags`: The tags representing the generic class.
   - `targets_processor.dimension_names`: The names used for the size dimensions (e.g., `['width', 'height']`).

### Why is Understanding This Important?

As a researcher using BatDetect2, your primary interaction is typically through the **configuration file**.
The `Targets` object is the component that materializes your configuration.

Understanding its role can be important:

- It helps connect the settings in your configuration file (covering terms, filtering, transforms, classes, and ROIs) to the actual behavior observed during training or when interpreting model outputs.
  If the results aren't as expected (e.g., wrong classifications, incorrect bounding box predictions), reviewing the relevant sections of your `TargetConfig` is the first step in debugging.
- Furthermore, understanding this structure is beneficial if you plan to create custom Python scripts.
  While standard training runs handle this object internally, the underlying functions for filtering, transforming, encoding, decoding, and ROI mapping are accessible or can be built individually.
  This modular design provides the **flexibility to use or customize specific parts of the target definition workflow programmatically** for advanced analyses, integration tasks, or specialized data processing pipelines, should you need to go beyond the standard configuration-driven approach.

### Summary

The `Targets` object encapsulates the entire configured target definition logic specified in your `TargetConfig` file.
It acts as the central component within BatDetect2 for applying filtering, tag transformation, ROI mapping (geometry to/from position/size), class encoding (for training preparation), and class/ROI decoding (for interpreting predictions).
It bridges the gap between your declarative configuration and the functional steps needed for training and using BatDetect2 models effectively, while also offering components for more advanced, scripted workflows.
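The `get_position`/`get_size`/`recover_roi` round trip described above can be illustrated with a self-contained sketch. The anchor choice and scale factors below are hypothetical, not batdetect2's actual values; the point is that sizes are scaled on the way in and the same scaling is inverted on the way out.

```python
# Illustrative ROI mapping round trip (not batdetect2's implementation):
# geometry -> (position, scaled size) -> geometry.
TIME_SCALE = 1000.0        # hypothetical: seconds -> scaled width units
FREQ_SCALE = 1 / 859.375   # hypothetical: Hz -> scaled height units


def get_position(bbox: tuple[float, float, float, float]) -> tuple[float, float]:
    start_time, low_freq, _end_time, _high_freq = bbox
    return start_time, low_freq  # e.g. a bottom-left anchor


def get_size(bbox: tuple[float, float, float, float]) -> tuple[float, float]:
    start_time, low_freq, end_time, high_freq = bbox
    # Scaled width/height targets, as used for training heatmaps.
    return (end_time - start_time) * TIME_SCALE, (high_freq - low_freq) * FREQ_SCALE


def recover_roi(
    position: tuple[float, float], size: tuple[float, float]
) -> tuple[float, float, float, float]:
    time, freq = position
    width, height = size
    # Invert the scaling to get back real-world coordinates (s, Hz).
    return (time, freq, time + width / TIME_SCALE, freq + height / FREQ_SCALE)


bbox = (0.10, 40_000.0, 0.12, 80_000.0)  # (start s, low Hz, end s, high Hz)
roundtrip = recover_roi(get_position(bbox), get_size(bbox))
print(roundtrip)  # matches the original bbox up to float precision
```

If predictions come back with implausible box sizes, a mismatch between the scaling used for encoding and the one used by `recover_roi` is a likely culprit, which is why both live in the same configured object.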
docs/source/tutorials/evaluate-on-a-test-set.md (new file, 35 lines)
@@ -0,0 +1,35 @@
# Tutorial: Evaluate on a test set

This tutorial shows how to evaluate a trained checkpoint on a held-out dataset
and inspect the output metrics.

## Before you start

- A trained model checkpoint.
- A test dataset config file.
- (Optional) Targets, audio, inference, and evaluation config overrides.

## Tutorial steps

1. Select a checkpoint and a test dataset.
2. Run `batdetect2 evaluate`.
3. Inspect output metrics and prediction artifacts.
4. Record evaluation settings for reproducibility.

## Example command

```bash
batdetect2 evaluate \
    path/to/model.ckpt \
    path/to/test_dataset.yaml \
    --output-dir path/to/eval_outputs
```

## What to do next

- Compare thresholds on representative files:
  {doc}`../how_to/tune-detection-threshold`
- Check full evaluate options: {doc}`../reference/cli/evaluate`

This page is a starter scaffold and will be expanded with a full worked
example.
docs/source/tutorials/index.md (new file, 13 lines)
@@ -0,0 +1,13 @@
# Tutorials

Tutorials are for learning by doing. They provide a single, reproducible path
to a concrete outcome.

```{toctree}
:maxdepth: 1

run-inference-on-folder
train-a-custom-model
evaluate-on-a-test-set
integrate-with-a-python-pipeline
```
docs/source/tutorials/integrate-with-a-python-pipeline.md (new file, 42 lines)
@@ -0,0 +1,42 @@
# Tutorial: Integrate with a Python pipeline

This tutorial shows a minimal Python workflow for loading audio, running
batdetect2, and collecting detections for downstream analysis.

## Before you start

- BatDetect2 installed in your Python environment.
- A model checkpoint.
- At least one input audio file.

## Tutorial steps

1. Load BatDetect2 in Python.
2. Create an API instance from a checkpoint.
3. Run `process_file` on one audio file.
4. Read detection fields and class scores.
5. Save or pass detections to your downstream pipeline.

## Example code

```python
from pathlib import Path

from batdetect2.api_v2 import BatDetect2API

api = BatDetect2API.from_checkpoint(Path("path/to/model.ckpt"))
prediction = api.process_file(Path("path/to/audio.wav"))

for detection in prediction.detections:
    top_class = api.get_top_class_name(detection)
    score = detection.detection_score
    print(top_class, score)
```

## What to do next

- See API/config references: {doc}`../reference/index`
- Learn practical CLI alternatives: {doc}`run-inference-on-folder`

This page is a starter scaffold and will be expanded with a full worked
example.
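Step 5 of the tutorial mentions saving detections for downstream use. A minimal, standard-library-only sketch (using illustrative `(class_name, score)` pairs rather than real detection objects) could write them to CSV like this:

```python
import csv
import io

# Hypothetical detections, e.g. collected from the loop in the
# tutorial example above as (top_class, score) pairs.
detections = [
    ("Pipistrellus pipistrellus", 0.92),
    ("Myotis daubentonii", 0.71),
]

# Write to an in-memory buffer; swap io.StringIO for open("out.csv", "w",
# newline="") to write a real file.
buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(["class_name", "detection_score"])
writer.writerows(detections)

print(buffer.getvalue())
```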
docs/source/tutorials/run-inference-on-folder.md (new file, 33 lines)
@@ -0,0 +1,33 @@
# Tutorial: Run inference on a folder of audio files

This tutorial walks through a first end-to-end inference run with the CLI.

## Before you start

- BatDetect2 installed in your environment.
- A folder containing `.wav` files.
- A model checkpoint path.

## Tutorial steps

1. Choose your input and output directories.
2. Run prediction with the CLI.
3. Verify output files were written.
4. Inspect predictions and confidence scores.

## Example command

```bash
batdetect2 predict directory \
    path/to/model.ckpt \
    path/to/audio_dir \
    path/to/outputs
```

## What to do next

- Use {doc}`../how_to/tune-detection-threshold` to tune sensitivity.
- Use {doc}`../reference/cli/index` for full command options.

This page is a starter scaffold and will be expanded with a full worked
example.
docs/source/tutorials/train-a-custom-model.md (new file, 37 lines)
@@ -0,0 +1,37 @@
# Tutorial: Train a custom model

This tutorial walks through a first custom training run using your own
annotations.

## Before you start

- BatDetect2 installed.
- A training dataset config file.
- (Optional) A validation dataset config file.

## Tutorial steps

1. Prepare training and validation dataset config files.
2. Choose target definitions and model/training config files.
3. Run `batdetect2 train`.
4. Check that checkpoints and logs are written.
5. Run a quick sanity inference on a small audio subset.

## Example command

```bash
batdetect2 train \
    path/to/train_dataset.yaml \
    --val-dataset path/to/val_dataset.yaml \
    --targets path/to/targets.yaml \
    --model-config path/to/model.yaml \
    --training-config path/to/training.yaml
```

## What to do next

- Evaluate the trained checkpoint: {doc}`evaluate-on-a-test-set`
- Check full train options: {doc}`../reference/cli/train`

This page is a starter scaffold and will be expanded with a full worked
example.
@@ -32,5 +32,6 @@ classification_targets:
     value: Rhinolophus ferrumequinum

 roi:
-  name: anchor_bbox
-  anchor: top-left
+  default:
+    name: anchor_bbox
+    anchor: top-left
faq.md (deleted, 65 lines)
@@ -1,65 +0,0 @@
# BatDetect2 - FAQ

## Installation

#### Do I need to know Python to be able to use this?
No. To simply run the code on your own data you do not need any knowledge of Python. However, a small bit of familiarity with the terminal (i.e. command line) in Windows/Linux/OSX may make things easier.

#### Are there any plans for an R version?
Currently no. All the scripts export simple `.csv` files that can be read using any programming language of choice.

#### How do I install the code?
The codebase has been tested under Windows 10, Ubuntu, and OSX. Read the instructions in the main readme to get started. If you are having problems getting it working and you feel like you have tried everything (e.g. confirming that your Anaconda Python distribution is correctly installed) feel free to open an issue on GitHub.

## Performance

#### The model does not work very well on my data?
Our model is based on a machine learning approach and as such, if your data is very different from our training set, it may not work as well. Feel free to use our annotation tools to label some of your own data and retrain the model. Even better, if you have large quantities of audio data with reliable species data that you are willing to share with the community, please get in touch.

#### The model is incorrectly classifying insects/noise/... as bats?
Fine-tuning the model on your data can make a big difference. See the previous answer.

#### The model fails to correctly detect feeding buzzes and social calls?
This is a limitation of our current training data. If you have such data or would be willing to label some for us, please get in touch.

#### Calls that clearly belong to the same call sequence are being predicted as coming from different species?
Currently we do not do any sophisticated post-processing on the results output by the model. We return a probability associated with each species for each call. You can use these predictions to clean up the noisy predictions for sequences of calls.

#### Can I trust the model outputs?
The models developed and shared as part of this repository should be used with caution. While they have been evaluated on held-out audio data, great care should be taken when using the model outputs for any form of biodiversity assessment. Your data may differ, and as a result it is very strongly recommended that you validate the model first using data with known species to ensure that the outputs can be trusted.

#### The code works well but it is slow?
Try a different/faster computer. On a reasonably recent desktop it takes about 13 seconds (on the GPU) or 1.3 minutes (on the CPU) to process 7.5 minutes of audio. In general, we observe a factor of ~5-10 speed up using recent Nvidia GPUs compared to CPU-only systems.

#### My audio files are very big and as a result the code is slow.
If your audio files are very long in duration (i.e. multiple minutes) it might be better to split them up into several smaller files. Have a look at the instructions and scripts in our annotation GUI codebase for how to crop your files into shorter ones - see [here](https://github.com/macaodha/batdetect2_GUI).

## Training a new model

#### Can I train a model on my own bat data with different species?
Yes. You just need to provide annotations in the correct format.

#### Will this work for frequency-division or zero-crossing recordings?
No. The code assumes that we can convert the input audio into a spectrogram.

#### Will this code work for non-bat audio data e.g. insects or birds?
In principle yes, however you may need to change some of the training hyper-parameters to ignore high frequency information when you re-train. Please open an issue on GitHub if you have a specific request.

## Usage

#### Can I use the code for commercial purposes or incorporate raw source code or trained models into my commercial system?
No. This codebase is currently only for non-commercial use. See the license.
justfile
@@ -20,7 +20,7 @@ install:
 # Testing & Coverage
 # Run tests using pytest.
 test:
-    uv run pytest {{TESTS_DIR}}
+    uv run pytest -n auto {{TESTS_DIR}}

 # Run tests and generate coverage data.
 coverage:
@@ -80,6 +80,7 @@ dev = [
    "numpydoc>=1.8.0",
    "sphinx-autodoc-typehints>=2.3.0",
    "sphinx-book-theme>=1.1.4",
    "sphinx-click>=6.1.0",
    "autodoc-pydantic>=2.2.0",
    "pytest-cov>=6.1.1",
    "ty>=0.0.1a12",
@@ -87,6 +88,7 @@ dev = [
    "pandas-stubs>=2.2.2.240807",
    "python-lsp-server>=1.13.0",
    "deepdiff>=8.6.1",
    "pytest-xdist[psutil]>=3.8.0",
]
dvclive = ["dvclive>=3.48.2"]
mlflow = ["mlflow>=3.1.1"]
@@ -50,7 +50,13 @@ from batdetect2.postprocess import (
     build_postprocessor,
 )
 from batdetect2.preprocess import PreprocessorProtocol, build_preprocessor
-from batdetect2.targets import TargetConfig, TargetProtocol, build_targets
+from batdetect2.targets import (
+    ROIMapperProtocol,
+    TargetConfig,
+    TargetProtocol,
+    build_roi_mapping,
+    build_targets,
+)
 from batdetect2.train import (
     DEFAULT_CHECKPOINT_DIR,
     TrainingConfig,
@@ -70,6 +76,7 @@ class BatDetect2API:
         outputs_config: OutputsConfig,
         logging_config: AppLoggingConfig,
         targets: TargetProtocol,
+        roi_mapper: ROIMapperProtocol,
         audio_loader: AudioLoader,
         preprocessor: PreprocessorProtocol,
         postprocessor: PostprocessorProtocol,
@@ -86,6 +93,7 @@ class BatDetect2API:
         self.outputs_config = outputs_config
         self.logging_config = logging_config
         self.targets = targets
+        self.roi_mapper = roi_mapper
         self.audio_loader = audio_loader
         self.preprocessor = preprocessor
         self.postprocessor = postprocessor
@@ -125,6 +133,7 @@ class BatDetect2API:
             val_annotations=val_annotations,
             model=self.model,
             targets=self.targets,
+            roi_mapper=self.roi_mapper,
             model_config=model_config or self.model_config,
             audio_loader=self.audio_loader,
             preprocessor=self.preprocessor,
@@ -171,6 +180,7 @@ class BatDetect2API:
             val_annotations=val_annotations,
             model=self.model,
             targets=self.targets,
+            roi_mapper=self.roi_mapper,
             model_config=model_config or self.model_config,
             preprocessor=self.preprocessor,
             audio_loader=self.audio_loader,
@@ -205,6 +215,7 @@ class BatDetect2API:
             self.model,
             test_annotations,
             targets=self.targets,
+            roi_mapper=self.roi_mapper,
             audio_loader=self.audio_loader,
             preprocessor=self.preprocessor,
             audio_config=audio_config or self.audio_config,
@@ -303,6 +314,7 @@ class BatDetect2API:
         self,
         audio_file: data.PathLike,
         batch_size: int | None = None,
+        detection_threshold: float | None = None,
     ) -> ClipDetections:
         recording = data.Recording.from_file(audio_file, compute_hash=False)

@@ -313,6 +325,7 @@ class BatDetect2API:
                 if batch_size is not None
                 else self.inference_config.loader.batch_size
             ),
+            detection_threshold=detection_threshold,
         )
         detections = [
             detection
@@ -333,14 +346,19 @@ class BatDetect2API:
     def process_audio(
         self,
         audio: np.ndarray,
+        detection_threshold: float | None = None,
     ) -> list[Detection]:
         spec = self.generate_spectrogram(audio)
-        return self.process_spectrogram(spec)
+        return self.process_spectrogram(
+            spec,
+            detection_threshold=detection_threshold,
+        )

     def process_spectrogram(
         self,
         spec: torch.Tensor,
         start_time: float = 0,
+        detection_threshold: float | None = None,
     ) -> list[Detection]:
         if spec.ndim == 4 and spec.shape[0] > 1:
             raise ValueError("Batched spectrograms not supported.")
@@ -352,6 +370,7 @@ class BatDetect2API:

         detections = self.postprocessor(
             outputs,
+            detection_threshold=detection_threshold,
         )[0]
         return self.output_transform.to_detections(
             detections=detections,
@@ -361,9 +380,13 @@ class BatDetect2API:
     def process_directory(
         self,
         audio_dir: data.PathLike,
+        detection_threshold: float | None = None,
     ) -> list[ClipDetections]:
         files = list(get_audio_files(audio_dir))
-        return self.process_files(files)
+        return self.process_files(
+            files,
+            detection_threshold=detection_threshold,
+        )

     def process_files(
         self,
@@ -373,11 +396,13 @@ class BatDetect2API:
         audio_config: AudioConfig | None = None,
         inference_config: InferenceConfig | None = None,
         output_config: OutputsConfig | None = None,
+        detection_threshold: float | None = None,
     ) -> list[ClipDetections]:
         return process_file_list(
             self.model,
             audio_files,
             targets=self.targets,
+            roi_mapper=self.roi_mapper,
             audio_loader=self.audio_loader,
             preprocessor=self.preprocessor,
             output_transform=self.output_transform,
@@ -386,6 +411,7 @@ class BatDetect2API:
             audio_config=audio_config or self.audio_config,
             inference_config=inference_config or self.inference_config,
             output_config=output_config or self.outputs_config,
+            detection_threshold=detection_threshold,
         )

     def process_clips(
@@ -396,11 +422,13 @@ class BatDetect2API:
         audio_config: AudioConfig | None = None,
         inference_config: InferenceConfig | None = None,
         output_config: OutputsConfig | None = None,
+        detection_threshold: float | None = None,
     ) -> list[ClipDetections]:
         return run_batch_inference(
             self.model,
             clips,
             targets=self.targets,
+            roi_mapper=self.roi_mapper,
             audio_loader=self.audio_loader,
             preprocessor=self.preprocessor,
             output_transform=self.output_transform,
@@ -409,6 +437,7 @@ class BatDetect2API:
             audio_config=audio_config or self.audio_config,
             inference_config=inference_config or self.inference_config,
             output_config=output_config or self.outputs_config,
+            detection_threshold=detection_threshold,
         )

     def save_predictions(
@@ -456,6 +485,7 @@ class BatDetect2API:
         config: BatDetect2Config,
     ) -> "BatDetect2API":
         targets = build_targets(config=config.model.targets)
+        roi_mapper = build_roi_mapping(config=config.model.targets.roi)

         audio_loader = build_audio_loader(config=config.audio)

@@ -476,11 +506,13 @@ class BatDetect2API:
         output_transform = build_output_transform(
             config=config.outputs.transform,
             targets=targets,
+            roi_mapper=roi_mapper,
         )

         evaluator = build_evaluator(
             config=config.evaluation,
|
||||
targets=targets,
|
||||
roi_mapper=roi_mapper,
|
||||
transform=output_transform,
|
||||
)
|
||||
|
||||
@ -488,7 +520,8 @@ class BatDetect2API:
|
||||
# to avoid device mismatch errors
|
||||
model = build_model(
|
||||
config=config.model,
|
||||
targets=build_targets(config=config.model.targets),
|
||||
targets=targets,
|
||||
roi_mapper=roi_mapper,
|
||||
preprocessor=build_preprocessor(
|
||||
input_samplerate=audio_loader.samplerate,
|
||||
config=config.model.preprocess,
|
||||
@ -508,6 +541,7 @@ class BatDetect2API:
|
||||
outputs_config=config.outputs,
|
||||
logging_config=config.logging,
|
||||
targets=targets,
|
||||
roi_mapper=roi_mapper,
|
||||
audio_loader=audio_loader,
|
||||
preprocessor=preprocessor,
|
||||
postprocessor=postprocessor,
|
||||
@ -545,15 +579,18 @@ class BatDetect2API:
|
||||
and targets_config != model_config.targets
|
||||
):
|
||||
targets = build_targets(config=targets_config)
|
||||
roi_mapper = build_roi_mapping(config=targets_config.roi)
|
||||
model = build_model_with_new_targets(
|
||||
model=model,
|
||||
targets=targets,
|
||||
roi_mapper=roi_mapper,
|
||||
)
|
||||
model_config = model_config.model_copy(
|
||||
update={"targets": targets_config}
|
||||
)
|
||||
|
||||
targets = build_targets(config=model_config.targets)
|
||||
roi_mapper = build_roi_mapping(config=model_config.targets.roi)
|
||||
|
||||
audio_loader = build_audio_loader(config=audio_config)
|
||||
|
||||
@ -575,11 +612,13 @@ class BatDetect2API:
|
||||
output_transform = build_output_transform(
|
||||
config=outputs_config.transform,
|
||||
targets=targets,
|
||||
roi_mapper=roi_mapper,
|
||||
)
|
||||
|
||||
evaluator = build_evaluator(
|
||||
config=evaluation_config,
|
||||
targets=targets,
|
||||
roi_mapper=roi_mapper,
|
||||
transform=output_transform,
|
||||
)
|
||||
|
||||
@ -592,6 +631,7 @@ class BatDetect2API:
|
||||
outputs_config=outputs_config,
|
||||
logging_config=logging_config,
|
||||
targets=targets,
|
||||
roi_mapper=roi_mapper,
|
||||
audio_loader=audio_loader,
|
||||
preprocessor=preprocessor,
|
||||
postprocessor=postprocessor,
|
||||
|
||||
@@ -27,7 +27,11 @@ BatDetect2 - Detection and Classification
help="Increase verbosity. -v for INFO, -vv for DEBUG.",
)
def cli(verbose: int = 0):
"""BatDetect2 - Bat Call Detection and Classification."""
"""Run the BatDetect2 CLI.

This command initializes logging and exposes subcommands for prediction,
training, evaluation, and dataset utilities.
"""
click.echo(INFO_STR)

enable_logging(verbose)

@@ -12,7 +12,13 @@ DEFAULT_MODEL_PATH = os.path.join(
)


@cli.command()
@cli.command(
short_help="Legacy detection command.",
epilog=(
"Deprecated workflow. Prefer `batdetect2 predict directory` for "
"new analyses."
),
)
@click.argument(
"audio_dir",
type=click.Path(exists=True),
@@ -68,7 +74,10 @@ def detect(
time_expansion_factor: int,
**args,
):
"""Detect bat calls in files in AUDIO_DIR and save predictions to ANN_DIR.
"""Legacy detection command for directory-based inference.

Detect bat calls in files in `AUDIO_DIR` and save predictions to
`ANN_DIR`.

DETECTION_THRESHOLD is the detection threshold. All predictions with a
score below this threshold will be discarded. Values between 0 and 1.
@@ -78,6 +87,11 @@ def detect(
Spaces in the input paths will throw an error. Wrap in quotes.

Input files should be short in duration e.g. < 30 seconds.

Note
----
This command is kept for backwards compatibility. Prefer
`batdetect2 predict directory` for new workflows.
"""
from batdetect2 import api
from batdetect2.utils.detector_utils import save_results_to_file
@@ -132,7 +146,7 @@ def detect(


def print_config(config):
"""Print the processing configuration."""
"""Print the processing configuration values."""
click.echo("\nProcessing Configuration:")
click.echo(f"Time Expansion Factor: {config.get('time_expansion')}")
click.echo(f"Detection Threshold: {config.get('detection_threshold')}")
@@ -7,11 +7,12 @@ from batdetect2.cli.base import cli
__all__ = ["data"]


@cli.group()
def data(): ...
@cli.group(short_help="Inspect and convert datasets.")
def data():
"""Inspect and convert dataset configuration files."""


@data.command()
@data.command(short_help="Print dataset summary information.")
@click.argument(
"dataset_config",
type=click.Path(exists=True),
@@ -19,17 +20,27 @@ def data(): ...
@click.option(
"--field",
type=str,
help="If the dataset info is in a nested field please specify here.",
help=(
"Nested field name that contains dataset configuration. "
"Use this when the config is wrapped in a larger file."
),
)
@click.option(
"--targets",
"targets_path",
type=click.Path(exists=True),
help=(
"Path to targets config file. If provided, a per-class summary "
"table is printed."
),
)
@click.option(
"--base-dir",
type=click.Path(exists=True),
help="The base directory to which all recording and annotations paths are relative to.",
help=(
"Base directory used to resolve relative recording and annotation "
"paths in the dataset config."
),
)
def summary(
dataset_config: Path,
@@ -37,6 +48,11 @@ def summary(
targets_path: Path | None = None,
base_dir: Path | None = None,
):
"""Show dataset size and optional class summary.

Prints the number of annotated clips. If `--targets` is provided, it also
prints a per-class summary table based on the configured targets.
"""
from batdetect2.data import compute_class_summary, load_dataset_from_config
from batdetect2.targets import load_targets

@@ -60,7 +76,7 @@ def summary(
print(summary.to_markdown())


@data.command()
@data.command(short_help="Convert dataset config to annotation set.")
@click.argument(
"dataset_config",
type=click.Path(exists=True),
@@ -68,7 +84,10 @@ def summary(
@click.option(
"--field",
type=str,
help="If the dataset info is in a nested field please specify here.",
help=(
"Nested field name that contains dataset configuration. "
"Use this when the config is wrapped in a larger file."
),
)
@click.option(
"--output",
@@ -78,15 +97,79 @@ def summary(
@click.option(
"--base-dir",
type=click.Path(exists=True),
help="The base directory to which all recording and annotations paths are relative to.",
help=(
"Base directory used to resolve relative recording and annotation "
"paths in the dataset config."
),
)
@click.option(
"--audio-dir",
type=click.Path(exists=True),
help=(
"Directory containing audio files. Output annotation paths are "
"made relative to this directory."
),
)
@click.option(
"--add-source-tag",
is_flag=True,
help=(
"Add a source tag to each clip annotation. This is useful for "
"downstream tools that need to know which source the annotations "
"came from."
),
)
@click.option(
"--include-sources",
type=str,
multiple=True,
help=(
"Only include sources with the specified names. If provided, only "
"sources with matching names will be included in the output."
),
)
@click.option(
"--exclude-sources",
type=str,
multiple=True,
help=(
"Exclude sources with the specified names. If provided, sources with "
"matching names will be excluded from the output."
),
)
@click.option(
"--apply-transforms/--no-apply-transforms",
default=True,
help=(
"Apply any configured sound event transforms to the annotations. "
"Defaults to True."
),
)
@click.option(
"--apply-filters/--no-apply-filters",
default=True,
help=(
"Apply any configured sound event filters to the annotations. "
"Defaults to True."
),
)
def convert(
dataset_config: Path,
field: str | None = None,
output: Path = Path("annotations.json"),
base_dir: Path | None = None,
audio_dir: Path | None = None,
add_source_tag: bool = True,
include_sources: list[str] | None = None,
exclude_sources: list[str] | None = None,
apply_transforms: bool = True,
apply_filters: bool = True,
):
"""Convert a dataset config file to soundevent format."""
"""Convert a dataset config into soundevent annotation-set format.

Writes a single annotation-set file that can be used by downstream tools.
Use `--audio-dir` to control relative audio path handling in the output.
"""
from soundevent import data, io

from batdetect2.data import load_dataset, load_dataset_config
@@ -95,7 +178,15 @@ def convert(

config = load_dataset_config(dataset_config, field=field)

dataset = load_dataset(config, base_dir=base_dir)
dataset = load_dataset(
config,
base_dir=base_dir,
add_source_tag=add_source_tag,
include_sources=include_sources,
exclude_sources=exclude_sources,
apply_transforms=apply_transforms,
apply_filters=apply_filters,
)

annotation_set = data.AnnotationSet(
clip_annotations=list(dataset),
@@ -103,4 +194,12 @@ def convert(
description=config.description,
)

io.save(annotation_set, output)
if audio_dir:
audio_dir = Path(audio_dir)

if not audio_dir.is_absolute():
audio_dir = audio_dir.resolve()

print(f"Using audio directory: {audio_dir}")

io.save(annotation_set, output, audio_dir=audio_dir)
@@ -11,20 +11,73 @@ __all__ = ["evaluate_command"]
DEFAULT_OUTPUT_DIR = Path("outputs") / "evaluation"


@cli.command(name="evaluate")
@cli.command(name="evaluate", short_help="Evaluate a model checkpoint.")
@click.argument("model_path", type=click.Path(exists=True))
@click.argument("test_dataset", type=click.Path(exists=True))
@click.option("--targets", "targets_config", type=click.Path(exists=True))
@click.option("--audio-config", type=click.Path(exists=True))
@click.option("--evaluation-config", type=click.Path(exists=True))
@click.option("--inference-config", type=click.Path(exists=True))
@click.option("--outputs-config", type=click.Path(exists=True))
@click.option("--logging-config", type=click.Path(exists=True))
@click.option("--base-dir", type=click.Path(), default=Path.cwd())
@click.option("--output-dir", type=click.Path(), default=DEFAULT_OUTPUT_DIR)
@click.option("--experiment-name", type=str)
@click.option("--run-name", type=str)
@click.option("--workers", "num_workers", type=int)
@click.option(
"--targets",
"targets_config",
type=click.Path(exists=True),
help="Path to targets config file.",
)
@click.option(
"--audio-config",
type=click.Path(exists=True),
help="Path to audio config file.",
)
@click.option(
"--evaluation-config",
type=click.Path(exists=True),
help="Path to evaluation config file.",
)
@click.option(
"--inference-config",
type=click.Path(exists=True),
help="Path to inference config file.",
)
@click.option(
"--outputs-config",
type=click.Path(exists=True),
help="Path to outputs config file.",
)
@click.option(
"--logging-config",
type=click.Path(exists=True),
help="Path to logging config file.",
)
@click.option(
"--base-dir",
type=click.Path(),
default=Path.cwd(),
show_default=True,
help=(
"Base directory used to resolve relative paths in the dataset "
"configuration."
),
)
@click.option(
"--output-dir",
type=click.Path(),
default=DEFAULT_OUTPUT_DIR,
show_default=True,
help="Directory where evaluation outputs are written.",
)
@click.option(
"--experiment-name",
type=str,
help="Experiment name used for logging backends.",
)
@click.option(
"--run-name",
type=str,
help="Run name used for logging backends.",
)
@click.option(
"--workers",
"num_workers",
type=int,
help="Number of worker processes for dataset loading.",
)
def evaluate_command(
model_path: Path,
test_dataset: Path,
@@ -40,6 +93,11 @@ def evaluate_command(
experiment_name: str | None = None,
run_name: str | None = None,
):
"""Evaluate a checkpoint against a test dataset.

Loads model and optional override configs, runs evaluation on
`test_dataset`, and writes metrics/artifacts to `output_dir`.
"""
from batdetect2.api_v2 import BatDetect2API
from batdetect2.audio import AudioConfig
from batdetect2.data import load_dataset_from_config
@@ -1,4 +1,6 @@
from functools import wraps
from pathlib import Path
from typing import TYPE_CHECKING

import click
from loguru import logger
@@ -7,12 +9,98 @@ from soundevent.audio.files import get_audio_files

from batdetect2.cli.base import cli

if TYPE_CHECKING:
from batdetect2.api_v2 import BatDetect2API
from batdetect2.audio import AudioConfig
from batdetect2.inference import InferenceConfig
from batdetect2.outputs import OutputsConfig

__all__ = ["predict"]


@cli.group(name="predict")
@cli.group(name="predict", short_help="Run prediction workflows.")
def predict() -> None:
"""Run prediction with BatDetect2 API v2."""
"""Run model inference on audio files.

Use one of the subcommands to select inputs from a directory, a text file
list, or an annotation dataset.
"""


def common_predict_options(func):
"""Attach options shared by all `predict` subcommands."""

@click.option(
"--audio-config",
type=click.Path(exists=True),
help=(
"Path to an audio config file. Use this to override audio "
"loading and preprocessing-related settings."
),
)
@click.option(
"--inference-config",
type=click.Path(exists=True),
help=(
"Path to an inference config file. Use this to override "
"prediction-time thresholds and behavior."
),
)
@click.option(
"--outputs-config",
type=click.Path(exists=True),
help=(
"Path to an outputs config file. Use this to control the "
"prediction fields written to disk."
),
)
@click.option(
"--logging-config",
type=click.Path(exists=True),
help=(
"Path to a logging config file. Use this to customize logging "
"format and levels."
),
)
@click.option(
"--batch-size",
type=int,
help=(
"Batch size for inference. If omitted, the value from the "
"loaded config is used."
),
)
@click.option(
"--workers",
"num_workers",
type=int,
default=0,
show_default=True,
help="Number of worker processes for audio loading.",
)
@click.option(
"--format",
"format_name",
type=str,
help=(
"Output format name used by the prediction writer. If omitted, "
"the default output format is used."
),
)
@click.option(
"--detection-threshold",
type=click.FloatRange(min=0.0, max=1.0),
default=None,
help=(
"Optional detection score threshold override. If omitted, "
"the model default threshold is used."
),
)
@wraps(func)
def wrapped(*args, **kwargs):
return func(*args, **kwargs)

return wrapped


def _build_api(
@@ -21,7 +109,7 @@ def _build_api(
inference_config: Path | None,
outputs_config: Path | None,
logging_config: Path | None,
):
) -> "tuple[BatDetect2API, AudioConfig | None, InferenceConfig | None, OutputsConfig | None]":
from batdetect2.api_v2 import BatDetect2API
from batdetect2.audio import AudioConfig
from batdetect2.inference import InferenceConfig
@@ -68,6 +156,7 @@ def _run_prediction(
batch_size: int | None,
num_workers: int,
format_name: str | None,
detection_threshold: float | None,
) -> None:
logger.info("Initiating prediction process...")

@@ -88,6 +177,7 @@ def _run_prediction(
audio_config=audio_conf,
inference_config=inference_conf,
output_config=outputs_conf,
detection_threshold=detection_threshold,
)

common_path = audio_files[0].parent if audio_files else None
@@ -103,17 +193,14 @@ def _run_prediction(
)


@predict.command(name="directory")
@predict.command(
name="directory",
short_help="Predict on audio files in a directory.",
)
@click.argument("model_path", type=click.Path(exists=True))
@click.argument("audio_dir", type=click.Path(exists=True))
@click.argument("output_path", type=click.Path())
@click.option("--audio-config", type=click.Path(exists=True))
@click.option("--inference-config", type=click.Path(exists=True))
@click.option("--outputs-config", type=click.Path(exists=True))
@click.option("--logging-config", type=click.Path(exists=True))
@click.option("--batch-size", type=int)
@click.option("--workers", "num_workers", type=int, default=0)
@click.option("--format", "format_name", type=str)
@common_predict_options
def predict_directory_command(
model_path: Path,
audio_dir: Path,
@@ -125,7 +212,13 @@ def predict_directory_command(
batch_size: int | None,
num_workers: int,
format_name: str | None,
detection_threshold: float | None,
) -> None:
"""Predict on all audio files in a directory.

Loads a checkpoint, scans `audio_dir` for supported audio files, runs
inference, and saves predictions to `output_path`.
"""
audio_files = list(get_audio_files(audio_dir))
_run_prediction(
model_path=model_path,
@@ -138,20 +231,18 @@ def predict_directory_command(
batch_size=batch_size,
num_workers=num_workers,
format_name=format_name,
detection_threshold=detection_threshold,
)


@predict.command(name="file_list")
@predict.command(
name="file_list",
short_help="Predict on paths listed in a text file.",
)
@click.argument("model_path", type=click.Path(exists=True))
@click.argument("file_list", type=click.Path(exists=True))
@click.argument("output_path", type=click.Path())
@click.option("--audio-config", type=click.Path(exists=True))
@click.option("--inference-config", type=click.Path(exists=True))
@click.option("--outputs-config", type=click.Path(exists=True))
@click.option("--logging-config", type=click.Path(exists=True))
@click.option("--batch-size", type=int)
@click.option("--workers", "num_workers", type=int, default=0)
@click.option("--format", "format_name", type=str)
@common_predict_options
def predict_file_list_command(
model_path: Path,
file_list: Path,
@@ -163,7 +254,13 @@ def predict_file_list_command(
batch_size: int | None,
num_workers: int,
format_name: str | None,
detection_threshold: float | None,
) -> None:
"""Predict on audio files listed in a text file.

The list file should contain one audio path per line. Empty lines are
ignored.
"""
file_list = Path(file_list)
audio_files = [
Path(line.strip())
@@ -182,20 +279,18 @@ def predict_file_list_command(
batch_size=batch_size,
num_workers=num_workers,
format_name=format_name,
detection_threshold=detection_threshold,
)


@predict.command(name="dataset")
@predict.command(
name="dataset",
short_help="Predict on recordings from a dataset config.",
)
@click.argument("model_path", type=click.Path(exists=True))
@click.argument("dataset_path", type=click.Path(exists=True))
@click.argument("output_path", type=click.Path())
@click.option("--audio-config", type=click.Path(exists=True))
@click.option("--inference-config", type=click.Path(exists=True))
@click.option("--outputs-config", type=click.Path(exists=True))
@click.option("--logging-config", type=click.Path(exists=True))
@click.option("--batch-size", type=int)
@click.option("--workers", "num_workers", type=int, default=0)
@click.option("--format", "format_name", type=str)
@common_predict_options
def predict_dataset_command(
model_path: Path,
dataset_path: Path,
@@ -207,7 +302,13 @@ def predict_dataset_command(
batch_size: int | None,
num_workers: int,
format_name: str | None,
detection_threshold: float | None,
) -> None:
"""Predict on recordings referenced in an annotation dataset.

The dataset is read as a soundevent annotation set and unique recording
paths are extracted before inference.
"""
dataset_path = Path(dataset_path)
dataset = io.load(dataset_path, type="annotation_set")
audio_files = sorted(
@@ -228,4 +329,5 @@ def predict_dataset_command(
batch_size=batch_size,
num_workers=num_workers,
format_name=format_name,
detection_threshold=detection_threshold,
)
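The `common_predict_options` helper in the hunk above bundles one stack of Click option decorators for reuse across all `predict` subcommands, with `functools.wraps` preserving each command's name and docstring. A minimal self-contained sketch of that pattern, using a plain `option` stand-in instead of `click.option` so it runs without Click (names here are illustrative, not the batdetect2 API):

```python
from functools import wraps

# `option` is a stand-in for click.option: each one injects a default
# keyword argument into the wrapped call.
def option(name, default):
    def decorator(func):
        @wraps(func)
        def wrapped(*args, **kwargs):
            kwargs.setdefault(name, default)  # inject the option value
            return func(*args, **kwargs)
        return wrapped
    return decorator

def common_options(func):
    """Attach options shared by all subcommands (one decorator stack)."""
    @option("batch_size", 8)
    @option("detection_threshold", None)
    @wraps(func)
    def wrapped(*args, **kwargs):
        return func(*args, **kwargs)
    return wrapped

@common_options
def predict_directory(**kwargs):
    return kwargs

print(predict_directory())             # {'batch_size': 8, 'detection_threshold': None}
print(predict_directory.__name__)      # 'predict_directory' (kept by @wraps)
```

The payoff is the same as in the diff: each subcommand is decorated once with `@common_options` instead of repeating a dozen `@click.option` lines.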
@@ -8,26 +8,103 @@ from batdetect2.cli.base import cli
__all__ = ["train_command"]


@cli.command(name="train")
@cli.command(name="train", short_help="Train or fine-tune a model.")
@click.argument("train_dataset", type=click.Path(exists=True))
@click.option("--val-dataset", type=click.Path(exists=True))
@click.option("--model", "model_path", type=click.Path(exists=True))
@click.option("--targets", "targets_config", type=click.Path(exists=True))
@click.option("--model-config", type=click.Path(exists=True))
@click.option("--training-config", type=click.Path(exists=True))
@click.option("--audio-config", type=click.Path(exists=True))
@click.option("--evaluation-config", type=click.Path(exists=True))
@click.option("--inference-config", type=click.Path(exists=True))
@click.option("--outputs-config", type=click.Path(exists=True))
@click.option("--logging-config", type=click.Path(exists=True))
@click.option("--ckpt-dir", type=click.Path(exists=True))
@click.option("--log-dir", type=click.Path(exists=True))
@click.option("--train-workers", type=int)
@click.option("--val-workers", type=int)
@click.option("--num-epochs", type=int)
@click.option("--experiment-name", type=str)
@click.option("--run-name", type=str)
@click.option("--seed", type=int)
@click.option(
"--val-dataset",
type=click.Path(exists=True),
help="Path to validation dataset config file.",
)
@click.option(
"--model",
"model_path",
type=click.Path(exists=True),
help=(
"Path to a checkpoint to continue training from. If omitted, "
"training starts from a fresh model config."
),
)
@click.option(
"--targets",
"targets_config",
type=click.Path(exists=True),
help="Path to targets config file.",
)
@click.option(
"--model-config",
type=click.Path(exists=True),
help=("Path to model config file. Cannot be used together with --model."),
)
@click.option(
"--training-config",
type=click.Path(exists=True),
help="Path to training config file.",
)
@click.option(
"--audio-config",
type=click.Path(exists=True),
help="Path to audio config file.",
)
@click.option(
"--evaluation-config",
type=click.Path(exists=True),
help="Path to evaluation config file.",
)
@click.option(
"--inference-config",
type=click.Path(exists=True),
help="Path to inference config file.",
)
@click.option(
"--outputs-config",
type=click.Path(exists=True),
help="Path to outputs config file.",
)
@click.option(
"--logging-config",
type=click.Path(exists=True),
help="Path to logging config file.",
)
@click.option(
"--ckpt-dir",
type=click.Path(exists=True),
help="Directory where checkpoints are saved.",
)
@click.option(
"--log-dir",
type=click.Path(exists=True),
help="Directory where logs are written.",
)
@click.option(
"--train-workers",
type=int,
help="Number of worker processes for training data loading.",
)
@click.option(
"--val-workers",
type=int,
help="Number of worker processes for validation data loading.",
)
@click.option(
"--num-epochs",
type=int,
help="Maximum number of training epochs.",
)
@click.option(
"--experiment-name",
type=str,
help="Experiment name used for logging backends.",
)
@click.option(
"--run-name",
type=str,
help="Run name used for logging backends.",
)
@click.option(
"--seed",
type=int,
help="Random seed used for reproducibility.",
)
def train_command(
train_dataset: Path,
val_dataset: Path | None = None,
@@ -49,6 +126,12 @@ def train_command(
experiment_name: str | None = None,
run_name: str | None = None,
):
"""Train a BatDetect2 model.

Train either from a fresh config (`--model-config`) or by fine-tuning an
existing checkpoint (`--model`). Training data are loaded from
`train_dataset`, with optional validation data from `--val-dataset`.
"""
from batdetect2.api_v2 import BatDetect2API
from batdetect2.audio import AudioConfig
from batdetect2.config import BatDetect2Config
@@ -19,7 +19,7 @@ The core components are:
"""

from pathlib import Path
from typing import List, Sequence
from typing import Sequence

from loguru import logger
from pydantic import Field
@@ -67,10 +67,10 @@ class DatasetConfig(BaseConfig):

name: str
description: str
sources: List[AnnotationFormats]
sources: list[AnnotationFormats]

sound_event_filter: SoundEventConditionConfig | None = None
sound_event_transforms: List[SoundEventTransformConfig] = Field(
sound_event_transforms: list[SoundEventTransformConfig] = Field(
default_factory=list
)

@@ -78,6 +78,11 @@ class DatasetConfig(BaseConfig):
def load_dataset(
config: DatasetConfig,
base_dir: data.PathLike | None = None,
add_source_tag: bool = True,
include_sources: list[str] | None = None,
exclude_sources: list[str] | None = None,
apply_transforms: bool = True,
apply_filters: bool = True,
) -> Dataset:
"""Load all clip annotations from the sources defined in a DatasetConfig."""
clip_annotations = []
@@ -102,6 +107,12 @@ def load_dataset(
for source in config.sources:
annotated_source = load_annotated_dataset(source, base_dir=base_dir)

if include_sources and source.name not in include_sources:
continue

if exclude_sources and source.name in exclude_sources:
continue

logger.debug(
"Loaded {num_examples} from dataset source '{source_name}'",
num_examples=len(annotated_source.clip_annotations),
@@ -109,15 +120,16 @@ def load_dataset(
)

for clip_annotation in annotated_source.clip_annotations:
clip_annotation = insert_source_tag(clip_annotation, source)
if add_source_tag:
clip_annotation = insert_source_tag(clip_annotation, source)

if condition is not None:
if condition is not None and apply_filters:
clip_annotation = filter_clip_annotation(
clip_annotation,
condition,
)

if transform is not None:
if transform is not None and apply_transforms:
clip_annotation = transform_clip_annotation(
clip_annotation,
transform,
@ -16,7 +16,7 @@ from batdetect2.outputs import OutputsConfig, build_output_transform
 from batdetect2.outputs.types import OutputFormatterProtocol
 from batdetect2.postprocess.types import ClipDetections
 from batdetect2.preprocess.types import PreprocessorProtocol
-from batdetect2.targets.types import TargetProtocol
+from batdetect2.targets.types import ROIMapperProtocol, TargetProtocol

 DEFAULT_EVAL_DIR: Path = Path("outputs") / "evaluations"

@ -25,6 +25,7 @@ def run_evaluate(
     model: Model,
     test_annotations: Sequence[data.ClipAnnotation],
     targets: TargetProtocol | None = None,
+    roi_mapper: ROIMapperProtocol | None = None,
     audio_loader: AudioLoader | None = None,
     preprocessor: PreprocessorProtocol | None = None,
     audio_config: AudioConfig | None = None,

@ -46,6 +47,7 @@ def run_evaluate(

     preprocessor = preprocessor or model.preprocessor
     targets = targets or model.targets
+    roi_mapper = roi_mapper or model.roi_mapper

     loader = build_test_loader(
         test_annotations,

@ -57,6 +59,7 @@ def run_evaluate(
     output_transform = build_output_transform(
         config=output_config.transform,
         targets=targets,
+        roi_mapper=roi_mapper,
     )
     evaluator = build_evaluator(
         config=evaluation_config,
@ -8,8 +8,8 @@ from batdetect2.evaluate.tasks import build_task
 from batdetect2.evaluate.types import EvaluationTaskProtocol, EvaluatorProtocol
 from batdetect2.outputs import OutputTransformProtocol, build_output_transform
 from batdetect2.postprocess.types import ClipDetections, ClipDetectionsTensor
-from batdetect2.targets import build_targets
-from batdetect2.targets.types import TargetProtocol
+from batdetect2.targets import build_roi_mapping, build_targets
+from batdetect2.targets.types import ROIMapperProtocol, TargetProtocol

 __all__ = [
     "Evaluator",

@ -67,17 +67,23 @@ class Evaluator:
 def build_evaluator(
     config: EvaluationConfig | dict | None = None,
     targets: TargetProtocol | None = None,
+    roi_mapper: ROIMapperProtocol | None = None,
     transform: OutputTransformProtocol | None = None,
 ) -> EvaluatorProtocol:
     targets = targets or build_targets()

+    roi_mapper = roi_mapper or build_roi_mapping()
+
     if config is None:
         config = EvaluationConfig()

     if not isinstance(config, EvaluationConfig):
         config = EvaluationConfig.model_validate(config)

-    transform = transform or build_output_transform(targets=targets)
+    transform = transform or build_output_transform(
+        targets=targets,
+        roi_mapper=roi_mapper,
+    )

     return Evaluator(
         targets=targets,
@ -18,19 +18,21 @@ from batdetect2.outputs import (
 )
 from batdetect2.postprocess.types import ClipDetections
 from batdetect2.preprocess.types import PreprocessorProtocol
-from batdetect2.targets.types import TargetProtocol
+from batdetect2.targets.types import ROIMapperProtocol, TargetProtocol


 def run_batch_inference(
     model: Model,
     clips: Sequence[data.Clip],
     targets: TargetProtocol | None = None,
+    roi_mapper: ROIMapperProtocol | None = None,
     audio_loader: AudioLoader | None = None,
     preprocessor: PreprocessorProtocol | None = None,
     audio_config: AudioConfig | None = None,
     output_transform: OutputTransformProtocol | None = None,
     output_config: OutputsConfig | None = None,
     inference_config: InferenceConfig | None = None,
+    detection_threshold: float | None = None,
     num_workers: int = 1,
     batch_size: int | None = None,
 ) -> list[ClipDetections]:

@ -44,10 +46,12 @@

     preprocessor = preprocessor or model.preprocessor
     targets = targets or model.targets
+    roi_mapper = roi_mapper or model.roi_mapper

     output_transform = output_transform or build_output_transform(
         config=output_config.transform,
         targets=targets,
+        roi_mapper=roi_mapper,
     )

     loader = build_inference_loader(

@ -62,6 +66,7 @@
     module = InferenceModule(
         model,
         output_transform=output_transform,
+        detection_threshold=detection_threshold,
     )
     trainer = Trainer(enable_checkpointing=False, logger=False)
     outputs = trainer.predict(module, loader)

@ -76,12 +81,14 @@ def process_file_list(
     model: Model,
     paths: Sequence[data.PathLike],
     targets: TargetProtocol | None = None,
+    roi_mapper: ROIMapperProtocol | None = None,
     audio_loader: AudioLoader | None = None,
     audio_config: AudioConfig | None = None,
     preprocessor: PreprocessorProtocol | None = None,
     inference_config: InferenceConfig | None = None,
     output_config: OutputsConfig | None = None,
     output_transform: OutputTransformProtocol | None = None,
+    detection_threshold: float | None = None,
     batch_size: int | None = None,
     num_workers: int = 0,
 ) -> list[ClipDetections]:

@ -98,6 +105,7 @@ def process_file_list(
         model,
         clips,
         targets=targets,
+        roi_mapper=roi_mapper,
         audio_loader=audio_loader,
         preprocessor=preprocessor,
         batch_size=batch_size,

@ -106,4 +114,5 @@
         audio_config=audio_config,
         output_transform=output_transform,
         inference_config=inference_config,
+        detection_threshold=detection_threshold,
     )
@ -14,11 +14,14 @@ class InferenceModule(LightningModule):
         self,
         model: Model,
         output_transform: OutputTransformProtocol | None = None,
+        detection_threshold: float | None = None,
     ):
         super().__init__()
         self.model = model
+        self.detection_threshold = detection_threshold
         self.output_transform = output_transform or build_output_transform(
-            targets=model.targets
+            targets=model.targets,
+            roi_mapper=model.roi_mapper,
         )

     def predict_step(

@ -33,7 +36,10 @@ class InferenceModule(LightningModule):

         outputs = self.model.detector(batch.spec)

-        clip_detections = self.model.postprocessor(outputs)
+        clip_detections = self.model.postprocessor(
+            outputs,
+            detection_threshold=self.detection_threshold,
+        )

         return [
             self.output_transform.to_clip_detections(
@ -74,7 +74,7 @@ from batdetect2.postprocess.types import (
 from batdetect2.preprocess.config import PreprocessingConfig
 from batdetect2.preprocess.types import PreprocessorProtocol
 from batdetect2.targets.config import TargetConfig
-from batdetect2.targets.types import TargetProtocol
+from batdetect2.targets.types import ROIMapperProtocol, TargetProtocol

 __all__ = [
     "BBoxHead",

@ -186,12 +186,15 @@ class Model(torch.nn.Module):
     targets : TargetProtocol
         Describes the set of target classes; used when building heads and
         during training target construction.
+    roi_mapper : ROIMapperProtocol
+        Maps geometries to target-size channels and back.
     """

     detector: DetectionModel
     preprocessor: PreprocessorProtocol
     postprocessor: PostprocessorProtocol
     targets: TargetProtocol
+    roi_mapper: ROIMapperProtocol

     def __init__(
         self,

@ -199,12 +202,14 @@ class Model(torch.nn.Module):
         preprocessor: PreprocessorProtocol,
         postprocessor: PostprocessorProtocol,
         targets: TargetProtocol,
+        roi_mapper: ROIMapperProtocol,
     ):
         super().__init__()
         self.detector = detector
         self.preprocessor = preprocessor
         self.postprocessor = postprocessor
         self.targets = targets
+        self.roi_mapper = roi_mapper

     def forward(self, wav: torch.Tensor) -> list[ClipDetectionsTensor]:
         """Run the full detection pipeline on a waveform tensor.

@ -234,6 +239,7 @@ class Model(torch.nn.Module):
 def build_model(
     config: ModelConfig | None = None,
     targets: TargetProtocol | None = None,
+    roi_mapper: ROIMapperProtocol | None = None,
     preprocessor: PreprocessorProtocol | None = None,
     postprocessor: PostprocessorProtocol | None = None,
 ) -> Model:

@ -272,10 +278,19 @@ def build_model(
     """
     from batdetect2.postprocess import build_postprocessor
     from batdetect2.preprocess import build_preprocessor
-    from batdetect2.targets import build_targets
+    from batdetect2.targets import build_roi_mapping, build_targets

     config = config or ModelConfig()
     targets = targets or build_targets(config=config.targets)
+
+    targets_config = getattr(targets, "config", None)
+    roi_config = (
+        targets_config.roi
+        if isinstance(targets_config, TargetConfig)
+        else config.targets.roi
+    )
+
+    roi_mapper = roi_mapper or build_roi_mapping(config=roi_config)
     preprocessor = preprocessor or build_preprocessor(
         config=config.preprocess,
         input_samplerate=config.samplerate,

@ -286,6 +301,7 @@ def build_model(
     )
     detector = build_detector(
         num_classes=len(targets.class_names),
+        num_sizes=len(roi_mapper.dimension_names),
         config=config.architecture,
     )
     return Model(

@ -293,16 +309,19 @@ def build_model(
         postprocessor=postprocessor,
         preprocessor=preprocessor,
         targets=targets,
+        roi_mapper=roi_mapper,
     )


 def build_model_with_new_targets(
     model: Model,
     targets: TargetProtocol,
+    roi_mapper: ROIMapperProtocol,
 ) -> Model:
     """Build a new model with a different target set."""
     detector = build_detector(
         num_classes=len(targets.class_names),
+        num_sizes=len(roi_mapper.dimension_names),
         backbone=model.detector.backbone,
     )

@ -311,4 +330,5 @@ def build_model_with_new_targets(
         postprocessor=model.postprocessor,
         preprocessor=model.preprocessor,
         targets=targets,
+        roi_mapper=roi_mapper,
     )
@ -136,6 +136,7 @@ class Detector(DetectionModel):

 def build_detector(
     num_classes: int,
+    num_sizes: int = 2,
     config: BackboneConfig | None = None,
     backbone: BackboneModel | None = None,
 ) -> DetectionModel:

@ -181,6 +182,7 @@ def build_detector(
     )
     bbox_head = BBoxHead(
         in_channels=backbone.out_channels,
+        num_sizes=num_sizes,
     )
     return Detector(
         backbone=backbone,

@ -165,14 +165,15 @@ class BBoxHead(nn.Module):
         1×1 convolution with 2 output channels (duration, bandwidth).
     """

-    def __init__(self, in_channels: int):
+    def __init__(self, in_channels: int, num_sizes: int = 2):
         """Initialise the BBoxHead."""
         super().__init__()
         self.in_channels = in_channels
+        self.num_sizes = num_sizes

         self.bbox = nn.Conv2d(
             in_channels=self.in_channels,
-            out_channels=2,
+            out_channels=self.num_sizes,
             kernel_size=1,
             padding=0,
         )
@ -28,7 +28,7 @@ from batdetect2.postprocess.types import (
     ClipDetectionsTensor,
     Detection,
 )
-from batdetect2.targets.types import TargetProtocol
+from batdetect2.targets.types import ROIMapperProtocol, TargetProtocol

 __all__ = [
     "ClipDetectionsTransformConfig",

@ -55,10 +55,12 @@ class OutputTransform(OutputTransformProtocol):
     def __init__(
         self,
         targets: TargetProtocol,
+        roi_mapper: ROIMapperProtocol,
         detection_transform_steps: Sequence[DetectionTransform] = (),
         clip_transform_steps: Sequence[ClipDetectionsTransform] = (),
     ):
         self.targets = targets
+        self.roi_mapper = roi_mapper
         self.detection_transform_steps = list(detection_transform_steps)
         self.clip_transform_steps = list(clip_transform_steps)

@ -89,7 +91,11 @@ class OutputTransform(OutputTransformProtocol):
         detections: ClipDetectionsTensor,
         start_time: float = 0,
     ) -> list[Detection]:
-        decoded = to_detections(detections.numpy(), targets=self.targets)
+        decoded = to_detections(
+            detections.numpy(),
+            targets=self.targets,
+            roi_mapper=self.roi_mapper,
+        )
         shifted = shift_detections_to_start_time(
             decoded,
             start_time=start_time,

@ -151,8 +157,9 @@
 def build_output_transform(
     config: OutputTransformConfig | dict | None = None,
     targets: TargetProtocol | None = None,
+    roi_mapper: ROIMapperProtocol | None = None,
 ) -> OutputTransformProtocol:
-    from batdetect2.targets import build_targets
+    from batdetect2.targets import build_roi_mapping, build_targets

     if config is None:
         config = OutputTransformConfig()

@ -161,9 +168,11 @@ def build_output_transform(
         config = OutputTransformConfig.model_validate(config)

     targets = targets or build_targets()
+    roi_mapper = roi_mapper or build_roi_mapping()

     return OutputTransform(
         targets=targets,
+        roi_mapper=roi_mapper,
         detection_transform_steps=[
             detection_transform_registry.build(transform_config)
             for transform_config in config.detection_transforms
@ -6,7 +6,7 @@ import numpy as np
 from soundevent import data

 from batdetect2.postprocess.types import ClipDetectionsArray, Detection
-from batdetect2.targets.types import TargetProtocol
+from batdetect2.targets.types import ROIMapperProtocol, TargetProtocol

 __all__ = [
     "DEFAULT_CLASSIFICATION_THRESHOLD",

@ -25,6 +25,7 @@ DEFAULT_CLASSIFICATION_THRESHOLD = 0.1
 def to_detections(
     detections: ClipDetectionsArray,
     targets: TargetProtocol,
+    roi_mapper: ROIMapperProtocol,
 ) -> List[Detection]:
     predictions = []

@ -39,7 +40,7 @@ def to_detections(
     ):
         highest_scoring_class = targets.class_names[class_scores.argmax()]

-        geom = targets.decode_roi(
+        geom = roi_mapper.decode(
             (time, freq),
             dims,
             class_name=highest_scoring_class,
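With this change, `to_detections` delegates geometry recovery to `roi_mapper.decode`, passing the winning class name so per-class mappers can apply. A toy decoder in the spirit of the anchor-based mapper (bottom-left anchor, no scaling; the real mapper un-scales the size values first):

```python
def decode_anchor_bbox(
    position: tuple[float, float],
    size: tuple[float, float],
) -> tuple[float, float, float, float]:
    """Toy bottom-left-anchor decoder: position + (width, height) -> box.

    Returns (start_time, low_freq, end_time, high_freq). This is a hypothetical
    sketch of the decode direction, not the library's actual implementation.
    """
    time, freq = position
    width, height = size
    return (time, freq, time + width, freq + height)


print(decode_anchor_bbox((0.5, 30000.0), (0.01, 20000.0)))
```

The encode direction is the inverse: take the box's bottom-left corner as the position and its width/height as the size array.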
@ -4,7 +4,7 @@ from soundevent import data, plot
 from batdetect2.plotting.clips import plot_clip
 from batdetect2.plotting.common import create_ax
 from batdetect2.preprocess.types import PreprocessorProtocol
-from batdetect2.targets.types import TargetProtocol
+from batdetect2.targets.types import ROIMapperProtocol, TargetProtocol

 __all__ = [
     "plot_clip_annotation",

@ -48,6 +48,7 @@ def plot_clip_annotation(
 def plot_anchor_points(
     clip_annotation: data.ClipAnnotation,
     targets: TargetProtocol,
+    roi_mapper: ROIMapperProtocol,
     figsize: tuple[int, int] | None = None,
     ax: Axes | None = None,
     size: int = 1,

@ -63,7 +64,11 @@ def plot_anchor_points(
         if not targets.filter(sound_event):
             continue

-        position, _ = targets.encode_roi(sound_event)
+        class_name = targets.encode_class(sound_event)
+        position, _ = roi_mapper.encode(
+            sound_event.sound_event,
+            class_name=class_name,
+        )
         positions.append(position)

     X, Y = zip(*positions, strict=False)
@ -1,6 +1,10 @@
 """Main entry point for the BatDetect2 Postprocessing pipeline."""

-from batdetect2.postprocess.config import PostprocessConfig
+from batdetect2.postprocess.config import (
+    DEFAULT_CLASSIFICATION_THRESHOLD,
+    DEFAULT_DETECTION_THRESHOLD,
+    PostprocessConfig,
+)
 from batdetect2.postprocess.nms import non_max_suppression
 from batdetect2.postprocess.postprocessor import (
     Postprocessor,

@ -28,4 +32,6 @@ __all__ = [
     "PostprocessorProtocol",
     "build_postprocessor",
     "non_max_suppression",
+    "DEFAULT_CLASSIFICATION_THRESHOLD",
+    "DEFAULT_DETECTION_THRESHOLD",
 ]
@ -63,7 +63,14 @@ class Postprocessor(torch.nn.Module, PostprocessorProtocol):
     def forward(
         self,
         output: ModelOutput,
+        detection_threshold: float | None = None,
     ) -> list[ClipDetectionsTensor]:
+        threshold = (
+            self.detection_threshold
+            if detection_threshold is None
+            else detection_threshold
+        )
+
         detection_heatmap = non_max_suppression(
             output.detection_probs.detach(),
             kernel_size=self.nms_kernel_size,

@ -78,7 +85,7 @@ class Postprocessor(torch.nn.Module, PostprocessorProtocol):
             feature_heatmap=output.features,
             classification_heatmap=output.class_probs,
             max_detections=max_detections,
-            threshold=self.detection_threshold,
+            threshold=threshold,
         )

         return [
@ -81,5 +81,8 @@ class ClipPrediction:

 class PostprocessorProtocol(Protocol):
     def __call__(
-        self, output: "ModelOutput"
+        self,
+        output: "ModelOutput",
+        *,
+        detection_threshold: float | None = None,
     ) -> list[ClipDetectionsTensor]: ...
@ -10,7 +10,10 @@ from batdetect2.targets.config import TargetConfig
 from batdetect2.targets.rois import (
     AnchorBBoxMapperConfig,
     ROIMapperConfig,
+    ROIMapperProtocol,
+    ROIMappingConfig,
     build_roi_mapper,
+    build_roi_mapping,
 )
 from batdetect2.targets.targets import (
     Targets,

@ -30,12 +33,15 @@ from batdetect2.targets.types import (
     Size,
     SoundEventDecoder,
     SoundEventEncoder,
     SoundEventFilter,
     TargetProtocol,
 )

 __all__ = [
     "AnchorBBoxMapperConfig",
     "Position",
+    "ROIMappingConfig",
+    "ROIMapperProtocol",
     "ROIMapperConfig",
     "ROITargetMapper",
     "Size",

@ -46,6 +52,7 @@ __all__ = [
     "TargetConfig",
     "TargetProtocol",
     "Targets",
+    "build_roi_mapping",
     "build_roi_mapper",
     "build_sound_event_decoder",
     "build_sound_event_encoder",
@ -14,7 +14,6 @@ from batdetect2.data.conditions import (
     SoundEventConditionConfig,
     build_sound_event_condition,
 )
-from batdetect2.targets.rois import ROIMapperConfig
 from batdetect2.targets.terms import call_type, generic_class
 from batdetect2.targets.types import SoundEventDecoder, SoundEventEncoder

@ -39,8 +38,6 @@ class TargetClassConfig(BaseConfig):

     assign_tags: List[data.Tag] = Field(default_factory=list)

-    roi: ROIMapperConfig | None = None
-
     _match_if: SoundEventConditionConfig = PrivateAttr()

     @model_validator(mode="after")
@ -9,7 +9,7 @@ from batdetect2.targets.classes import (
     DEFAULT_DETECTION_CLASS,
     TargetClassConfig,
 )
-from batdetect2.targets.rois import AnchorBBoxMapperConfig, ROIMapperConfig
+from batdetect2.targets.rois import ROIMappingConfig

 __all__ = [
     "TargetConfig",

@ -25,7 +25,7 @@ class TargetConfig(BaseConfig):
         default_factory=lambda: DEFAULT_CLASSES
     )

-    roi: ROIMapperConfig = Field(default_factory=AnchorBBoxMapperConfig)
+    roi: ROIMappingConfig = Field(default_factory=ROIMappingConfig)

     @field_validator("classification_targets")
     def check_unique_class_names(cls, v: List[TargetClassConfig]):
@ -1,23 +1,19 @@
-"""Handles mapping between geometric ROIs and target representations.
+"""Map geometric ROIs to target representations and back.

 This module defines a standardized interface (`ROITargetMapper`) for converting
-a sound event's Region of Interest (ROI) into a target representation suitable
-for machine learning models, and for decoding model outputs back into geometric
-ROIs.
+a sound event ROI into a target representation for model training and decoding
+model outputs back into approximate geometries.

-The core operations are:
-1. **Encoding**: A `soundevent.data.SoundEvent` is mapped to a reference
-   `Position` (time, frequency) and a `Size` array. The method for
-   determining the position and size varies by the mapper implementation
-   (e.g., using a bounding box anchor or the point of peak energy).
-2. **Decoding**: A `Position` and `Size` array are mapped back to an
-   approximate `soundevent.data.Geometry` (typically a `BoundingBox`).
+Core operations:

-This logic is encapsulated within specific mapper classes. Configuration for
-each mapper (e.g., anchor point, scaling factors) is managed by a corresponding
-Pydantic config object. The `ROIMapperConfig` type allows for flexibly
-selecting and configuring the desired mapper. This module separates the
-*geometric* aspect of target definition from *semantic* classification.
+- Encode a `soundevent.data.SoundEvent` into a reference `Position`
+  `(time, frequency)` and a `Size` array.
+- Decode a `Position` and `Size` array into an approximate
+  `soundevent.data.Geometry` (usually a `BoundingBox`).
+
+The specific mapping depends on the selected mapper implementation. Config
+objects provide mapper-specific parameters such as anchor choice and scaling.
+This module focuses on the geometric part of target definition.
 """

 from typing import Annotated, Literal

@ -33,7 +29,12 @@ from batdetect2.core.arrays import spec_to_xarray
 from batdetect2.core.configs import BaseConfig
 from batdetect2.preprocess import PreprocessingConfig, build_preprocessor
 from batdetect2.preprocess.types import PreprocessorProtocol
-from batdetect2.targets.types import Position, ROITargetMapper, Size
+from batdetect2.targets.types import (
+    Position,
+    ROIMapperProtocol,
+    ROITargetMapper,
+    Size,
+)

 __all__ = [
     "Anchor",

@ -44,12 +45,15 @@ __all__ = [
     "DEFAULT_TIME_SCALE",
     "PeakEnergyBBoxMapper",
     "PeakEnergyBBoxMapperConfig",
+    "ROIMappingConfig",
+    "ROIMapperProtocol",
     "ROIMapperConfig",
     "ROIMapperImportConfig",
     "ROITargetMapper",
     "SIZE_HEIGHT",
     "SIZE_ORDER",
     "SIZE_WIDTH",
+    "build_roi_mapping",
     "build_roi_mapper",
 ]

@ -131,12 +135,12 @@ class AnchorBBoxMapper(ROITargetMapper):
     This class implements the `ROITargetMapper` protocol for `BoundingBox`
     geometries.

-    **Encoding**: The `position` is a fixed anchor point on the bounding box
-    (e.g., "bottom-left"). The `size` is a 2-element array containing the
-    scaled width and height of the box.
+    Encoding uses a fixed anchor point on the bounding box for `position`
+    (for example, ``bottom-left``). The `size` is a 2-element array with
+    scaled width and height.

-    **Decoding**: Reconstructs a `BoundingBox` from an anchor point and
-    scaled width/height.
+    Decoding reconstructs a `BoundingBox` from anchor position and scaled
+    width/height.

     Attributes
     ----------

@ -300,13 +304,12 @@ class PeakEnergyBBoxMapper(ROITargetMapper):

     This class implements the `ROITargetMapper` protocol.

-    **Encoding**: The `position` is the (time, frequency) coordinate of the
-    point with the highest energy within the sound event's bounding box. The
-    `size` is a 4-element array representing the scaled distances from this
-    peak energy point to the left, bottom, right, and top edges of the box.
+    Encoding sets `position` to the (time, frequency) coordinate of peak energy
+    inside the sound event bounding box. The `size` is a 4-element array with
+    scaled distances from the peak point to left, bottom, right, and top edges.

-    **Decoding**: Reconstructs a `BoundingBox` by adding/subtracting the
-    un-scaled distances from the peak energy point.
+    Decoding reconstructs a `BoundingBox` by applying the unscaled distances to
+    the peak-energy position.

     Attributes
     ----------

@ -461,6 +464,59 @@ implementations by using the `name` field as a discriminator.
 """


+class ROIMappingConfig(BaseConfig):
+    """Configuration for class-aware ROI mapping.
+
+    Attributes
+    ----------
+    default : ROIMapperConfig
+        Default mapper used when no class-specific override exists.
+    overrides : dict[str, ROIMapperConfig]
+        Optional class-specific mapper overrides by class name.
+    """
+
+    default: ROIMapperConfig = Field(default_factory=AnchorBBoxMapperConfig)
+    overrides: dict[str, ROIMapperConfig] = Field(default_factory=dict)
+
+
+class ClassAwareROIMapper(ROIMapperProtocol):
+    """Apply a default ROI mapper with optional per-class overrides."""
+
+    dimension_names: list[str]
+
+    def __init__(
+        self,
+        default_mapper: ROITargetMapper,
+        overrides: dict[str, ROITargetMapper] | None = None,
+    ):
+        self.default_mapper = default_mapper
+        self.overrides = overrides or {}
+        self.dimension_names = list(default_mapper.dimension_names)
+
+    def encode(
+        self,
+        sound_event: data.SoundEvent,
+        class_name: str | None = None,
+    ) -> tuple[Position, Size]:
+        mapper = self._select_mapper(class_name)
+        return mapper.encode(sound_event)
+
+    def decode(
+        self,
+        position: Position,
+        size: Size,
+        class_name: str | None = None,
+    ) -> data.Geometry:
+        mapper = self._select_mapper(class_name)
+        return mapper.decode(position, size)
+
+    def _select_mapper(self, class_name: str | None = None) -> ROITargetMapper:
+        if class_name is not None and class_name in self.overrides:
+            return self.overrides[class_name]
+
+        return self.default_mapper
+
+
 def build_roi_mapper(
     config: ROIMapperConfig | None = None,
 ) -> ROITargetMapper:

@ -485,6 +541,36 @@ def build_roi_mapper(
     return roi_mapper_registry.build(config)


+def build_roi_mapping(
+    config: ROIMappingConfig | None = None,
+) -> ROIMapperProtocol:
+    """Build a class-aware ROI mapper and validate consistency."""
+    config = config or ROIMappingConfig()
+
+    default_mapper = build_roi_mapper(config.default)
+    overrides = {
+        class_name: build_roi_mapper(mapper_config)
+        for class_name, mapper_config in config.overrides.items()
+    }
+
+    expected = list(default_mapper.dimension_names)
+
+    for class_name, mapper in overrides.items():
+        actual = list(mapper.dimension_names)
+
+        if actual != expected:
+            raise ValueError(
+                "All ROI mappers must share the same dimension order. "
+                f"Default dimensions: {expected}, "
+                f"class '{class_name}' dimensions: {actual}."
+            )
+
+    return ClassAwareROIMapper(
+        default_mapper=default_mapper,
+        overrides=overrides,
+    )
+
+
 VALID_ANCHORS = [
     "bottom-left",
     "bottom-right",
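The new `ClassAwareROIMapper` and `build_roi_mapping` combine two behaviours: per-class mapper selection at encode/decode time, and an up-front check that every override reports the same `dimension_names` as the default (the detector's size head has a fixed channel count, so mixed dimension orders cannot coexist). A self-contained sketch of both, using a hypothetical `FakeMapper` stand-in and illustrative class names rather than the real mapper classes:

```python
class FakeMapper:
    # Hypothetical stand-in exposing only what the sketch needs.
    def __init__(self, name: str, dimension_names: list[str]):
        self.name = name
        self.dimension_names = dimension_names


def build_class_aware(default, overrides):
    """Mirror build_roi_mapping's dimension-order validation."""
    expected = list(default.dimension_names)
    for class_name, mapper in overrides.items():
        if list(mapper.dimension_names) != expected:
            raise ValueError(
                f"class '{class_name}' dimensions {mapper.dimension_names} "
                f"differ from default {expected}"
            )

    def select(class_name=None):
        # A per-class override wins; otherwise fall back to the default.
        if class_name is not None and class_name in overrides:
            return overrides[class_name]
        return default

    return select


select = build_class_aware(
    FakeMapper("anchor", ["width", "height"]),
    {"Myotis": FakeMapper("anchor-tuned", ["width", "height"])},
)
print(select("Myotis").name, select("Pipistrellus").name)
```

Mixing a 2-dimension anchor mapper with a 4-dimension peak-energy override would fail the validation, which is exactly the inconsistency `build_roi_mapping` rejects at build time.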
@ -12,21 +12,21 @@ from batdetect2.targets.classes import (
     get_class_names_from_config,
 )
 from batdetect2.targets.config import TargetConfig
-from batdetect2.targets.rois import (
-    AnchorBBoxMapperConfig,
-    build_roi_mapper,
-)
-from batdetect2.targets.types import Position, Size, TargetProtocol
+from batdetect2.targets.types import (
+    Position,
+    ROIMapperProtocol,
+    Size,
+    TargetProtocol,
+)


 class Targets(TargetProtocol):
-    """Encapsulates the complete configured target definition pipeline.
+    """Encapsulates the configured target class definition pipeline.

     This class implements the `TargetProtocol`, holding the configured
-    functions for filtering, transforming, encoding (tags to class name),
-    decoding (class name to tags), and mapping ROIs (geometry to position/size
-    and back). It provides a high-level interface to apply these steps and
-    access relevant metadata like class names and dimension names.
+    functions for filtering, encoding (tags to class name), and decoding
+    (class name to tags). Geometry ROI mapping is handled separately by
+    ``ROIMapperProtocol``.

     Instances are typically created using the `build_targets` factory function
     or the `load_targets` convenience loader.

@ -39,14 +39,10 @@ class Targets(TargetProtocol):
     generic_class_tags
         A list of `soundevent.data.Tag` objects representing the configured
         generic class category (used when no specific class matches).
-    dimension_names
-        The names of the size dimensions handled by the ROI mapper
-        (e.g., ['width', 'height']).
     """

     class_names: list[str]
     detection_class_tags: list[data.Tag]
-    dimension_names: list[str]
     detection_class_name: str

     def __init__(self, config: TargetConfig):

@ -63,10 +59,6 @@ class Targets(TargetProtocol):
             config.classification_targets
         )

-        self._roi_mapper = build_roi_mapper(config.roi)
-
-        self.dimension_names = self._roi_mapper.dimension_names
-
         self.class_names = get_class_names_from_config(
             config.classification_targets
         )

@ -74,21 +66,6 @@ class Targets(TargetProtocol):
         self.detection_class_name = config.detection_target.name
         self.detection_class_tags = config.detection_target.assign_tags

-        self._roi_mapper_overrides = {
-            class_config.name: build_roi_mapper(class_config.roi)
-            for class_config in config.classification_targets
-            if class_config.roi is not None
-        }
-
-        for class_name in self._roi_mapper_overrides:
-            if class_name not in self.class_names:
-                # TODO: improve this warning
-                logger.warning(
-                    "The ROI mapper overrides contains a class ({class_name}) "
-                    "not present in the class names.",
-                    class_name=class_name,
-                )
-
     def filter(self, sound_event: data.SoundEventAnnotation) -> bool:
         """Apply the configured filter to a sound event annotation.

@ -147,75 +124,10 @@ class Targets(TargetProtocol):
         """
         return self._decode_fn(class_label)

-    def encode_roi(
-        self, sound_event: data.SoundEventAnnotation
-    ) -> tuple[Position, Size]:
-        """Extract the target reference position from the annotation's roi.
-
-        Delegates to the internal ROI mapper's `get_roi_position` method.
-
-        Parameters
-        ----------
-        sound_event : data.SoundEventAnnotation
-            The annotation containing the geometry (ROI).
-
-        Returns
-        -------
-        tuple[float, float]
-            The reference position `(time, frequency)`.
-
-        Raises
-        ------
-        ValueError
-            If the annotation lacks geometry.
-        """
-        class_name = self.encode_class(sound_event)
-
-        if class_name in self._roi_mapper_overrides:
-            return self._roi_mapper_overrides[class_name].encode(
-                sound_event.sound_event
-            )
-
-        return self._roi_mapper.encode(sound_event.sound_event)
-
-    def decode_roi(
-        self,
-        position: Position,
-        size: Size,
-        class_name: str | None = None,
-    ) -> data.Geometry:
-        """Recover an approximate geometric ROI from a position and dimensions.
-
-        Delegates to the internal ROI mapper's `recover_roi` method, which
-        un-scales the dimensions and reconstructs the geometry (typically a
-        `BoundingBox`).
-
-        Parameters
-        ----------
-        pos
-            The reference position `(time, frequency)`.
-        dims
-            NumPy array with size dimensions (e.g., from model prediction),
-            matching the order in `self.dimension_names`.
-
-        Returns
-        -------
-        data.Geometry
-            The reconstructed geometry (typically `BoundingBox`).
-        """
-        if class_name in self._roi_mapper_overrides:
-            return self._roi_mapper_overrides[class_name].decode(
-                position,
-                size,
-            )
-
-        return self._roi_mapper.decode(position, size)
-

 DEFAULT_TARGET_CONFIG: TargetConfig = TargetConfig(
     classification_targets=DEFAULT_CLASSES,
     detection_target=DEFAULT_DETECTION_CLASS,
-    roi=AnchorBBoxMapperConfig(),
 )

@ -292,6 +204,7 @@ def load_targets(
 def iterate_encoded_sound_events(
     sound_events: Iterable[data.SoundEventAnnotation],
     targets: TargetProtocol,
+    roi_mapper: ROIMapperProtocol,
 ) -> Iterable[tuple[str | None, Position, Size]]:
     for sound_event in sound_events:
         if not targets.filter(sound_event):
|
||||
@ -303,6 +216,9 @@ def iterate_encoded_sound_events(
|
||||
continue
|
||||
|
||||
class_name = targets.encode_class(sound_event)
|
||||
position, size = targets.encode_roi(sound_event)
|
||||
position, size = roi_mapper.encode(
|
||||
sound_event.sound_event,
|
||||
class_name=class_name,
|
||||
)
|
||||
|
||||
yield class_name, position, size
|
||||
|
||||
@ -6,6 +6,7 @@ from soundevent import data
|
||||
|
||||
__all__ = [
|
||||
"Position",
|
||||
"ROIMapperProtocol",
|
||||
"ROITargetMapper",
|
||||
"Size",
|
||||
"SoundEventDecoder",
|
||||
@ -26,7 +27,6 @@ class TargetProtocol(Protocol):
|
||||
class_names: list[str]
|
||||
detection_class_tags: list[data.Tag]
|
||||
detection_class_name: str
|
||||
dimension_names: list[str]
|
||||
|
||||
def filter(self, sound_event: data.SoundEventAnnotation) -> bool: ...
|
||||
|
||||
@ -37,6 +37,23 @@ class TargetProtocol(Protocol):
|
||||
|
||||
def decode_class(self, class_label: str) -> list[data.Tag]: ...
|
||||
|
||||
|
||||
class ROIMapperProtocol(Protocol):
|
||||
dimension_names: list[str]
|
||||
|
||||
def encode(
|
||||
self,
|
||||
sound_event: data.SoundEvent,
|
||||
class_name: str | None = None,
|
||||
) -> tuple[Position, Size]: ...
|
||||
|
||||
def decode(
|
||||
self,
|
||||
position: Position,
|
||||
size: Size,
|
||||
class_name: str | None = None,
|
||||
) -> data.Geometry: ...
|
||||
|
||||
def encode_roi(
|
||||
self,
|
||||
sound_event: data.SoundEventAnnotation,
|
||||
|
||||
@ -93,7 +93,8 @@ class ValidationMetrics(Callback):
|
||||
model = pl_module.model
|
||||
if self.output_transform is None:
|
||||
self.output_transform = build_output_transform(
|
||||
targets=model.targets
|
||||
targets=model.targets,
|
||||
roi_mapper=model.roi_mapper,
|
||||
)
|
||||
|
||||
output_transform = self.output_transform
|
||||
|
||||
@ -40,7 +40,7 @@ def build_checkpoint_callback(
|
||||
if run_name is not None:
|
||||
checkpoint_dir = checkpoint_dir / run_name
|
||||
|
||||
checkpoint_dir.mkdir(parents=True, exist_ok=True)
|
||||
Path(checkpoint_dir).mkdir(parents=True, exist_ok=True)
|
||||
|
||||
return ModelCheckpoint(
|
||||
dirpath=str(checkpoint_dir),
|
||||
|
||||
@ -14,8 +14,12 @@ from soundevent import data
|
||||
|
||||
from batdetect2.core.configs import BaseConfig
|
||||
from batdetect2.preprocess import MAX_FREQ, MIN_FREQ
|
||||
from batdetect2.targets import build_targets, iterate_encoded_sound_events
|
||||
from batdetect2.targets.types import TargetProtocol
|
||||
from batdetect2.targets import (
|
||||
build_roi_mapping,
|
||||
build_targets,
|
||||
iterate_encoded_sound_events,
|
||||
)
|
||||
from batdetect2.targets.types import ROIMapperProtocol, TargetProtocol
|
||||
from batdetect2.train.types import ClipLabeller, Heatmaps
|
||||
|
||||
__all__ = [
|
||||
@ -42,6 +46,7 @@ class LabelConfig(BaseConfig):
|
||||
|
||||
def build_clip_labeler(
|
||||
targets: TargetProtocol | None = None,
|
||||
roi_mapper: ROIMapperProtocol | None = None,
|
||||
min_freq: float = MIN_FREQ,
|
||||
max_freq: float = MAX_FREQ,
|
||||
config: LabelConfig | None = None,
|
||||
@ -53,12 +58,13 @@ def build_clip_labeler(
|
||||
lambda: config.to_yaml_string(),
|
||||
)
|
||||
|
||||
if targets is None:
|
||||
targets = build_targets()
|
||||
targets = targets or build_targets()
|
||||
roi_mapper = roi_mapper or build_roi_mapping()
|
||||
|
||||
return partial(
|
||||
generate_heatmaps,
|
||||
targets=targets,
|
||||
roi_mapper=roi_mapper,
|
||||
min_freq=min_freq,
|
||||
max_freq=max_freq,
|
||||
target_sigma=config.sigma,
|
||||
@ -73,6 +79,7 @@ def generate_heatmaps(
|
||||
clip_annotation: data.ClipAnnotation,
|
||||
spec: torch.Tensor,
|
||||
targets: TargetProtocol,
|
||||
roi_mapper: ROIMapperProtocol,
|
||||
min_freq: float,
|
||||
max_freq: float,
|
||||
target_sigma: float = 3.0,
|
||||
@ -89,7 +96,7 @@ def generate_heatmaps(
|
||||
height = spec.shape[-2]
|
||||
width = spec.shape[-1]
|
||||
num_classes = len(targets.class_names)
|
||||
num_dims = len(targets.dimension_names)
|
||||
num_dims = len(roi_mapper.dimension_names)
|
||||
clip = clip_annotation.clip
|
||||
|
||||
# Initialize heatmaps
|
||||
@ -109,6 +116,7 @@ def generate_heatmaps(
|
||||
for class_name, (time, frequency), size in iterate_encoded_sound_events(
|
||||
clip_annotation.sound_events,
|
||||
targets,
|
||||
roi_mapper,
|
||||
):
|
||||
time_index = map_to_pixels(time, width, clip.start_time, clip.end_time)
|
||||
freq_index = map_to_pixels(frequency, height, min_freq, max_freq)
|
||||
|
||||
@ -6,23 +6,24 @@ from lightning import Trainer, seed_everything
|
||||
from loguru import logger
|
||||
from soundevent import data
|
||||
|
||||
from batdetect2.audio import AudioConfig, build_audio_loader
|
||||
from batdetect2.audio.types import AudioLoader
|
||||
from batdetect2.evaluate import build_evaluator
|
||||
from batdetect2.evaluate.types import EvaluatorProtocol
|
||||
from batdetect2.audio import AudioConfig, AudioLoader, build_audio_loader
|
||||
from batdetect2.evaluate import EvaluatorProtocol, build_evaluator
|
||||
from batdetect2.logging import (
|
||||
LoggerConfig,
|
||||
TensorBoardLoggerConfig,
|
||||
build_logger,
|
||||
)
|
||||
from batdetect2.models import Model, ModelConfig, build_model
|
||||
from batdetect2.preprocess import build_preprocessor
|
||||
from batdetect2.preprocess.types import PreprocessorProtocol
|
||||
from batdetect2.targets import build_targets
|
||||
from batdetect2.targets.types import TargetProtocol
|
||||
from batdetect2.train import TrainingConfig
|
||||
from batdetect2.preprocess import PreprocessorProtocol, build_preprocessor
|
||||
from batdetect2.targets import (
|
||||
ROIMapperProtocol,
|
||||
TargetProtocol,
|
||||
build_roi_mapping,
|
||||
build_targets,
|
||||
)
|
||||
from batdetect2.train.callbacks import ValidationMetrics
|
||||
from batdetect2.train.checkpoints import build_checkpoint_callback
|
||||
from batdetect2.train.config import TrainingConfig
|
||||
from batdetect2.train.dataset import build_train_loader, build_val_loader
|
||||
from batdetect2.train.labels import build_clip_labeler
|
||||
from batdetect2.train.lightning import build_training_module
|
||||
@ -39,6 +40,7 @@ def run_train(
|
||||
val_annotations: Sequence[data.ClipAnnotation] | None = None,
|
||||
model: Model | None = None,
|
||||
targets: Optional["TargetProtocol"] = None,
|
||||
roi_mapper: Optional["ROIMapperProtocol"] = None,
|
||||
preprocessor: Optional["PreprocessorProtocol"] = None,
|
||||
audio_loader: Optional["AudioLoader"] = None,
|
||||
labeller: Optional["ClipLabeller"] = None,
|
||||
@ -69,8 +71,15 @@ def run_train(
|
||||
if model is not None:
|
||||
targets = targets or model.targets
|
||||
|
||||
if roi_mapper is None and targets is model.targets:
|
||||
roi_mapper = model.roi_mapper
|
||||
|
||||
targets = targets or build_targets(config=model_config.targets)
|
||||
|
||||
roi_mapper = roi_mapper or build_roi_mapping(
|
||||
config=model_config.targets.roi
|
||||
)
|
||||
|
||||
audio_loader = audio_loader or build_audio_loader(config=audio_config)
|
||||
|
||||
preprocessor = preprocessor or build_preprocessor(
|
||||
@ -80,6 +89,7 @@ def run_train(
|
||||
|
||||
labeller = labeller or build_clip_labeler(
|
||||
targets,
|
||||
roi_mapper,
|
||||
min_freq=preprocessor.min_freq,
|
||||
max_freq=preprocessor.max_freq,
|
||||
config=train_config.labels,
|
||||
@ -119,6 +129,7 @@ def run_train(
|
||||
evaluator=build_evaluator(
|
||||
train_config.validation,
|
||||
targets=targets,
|
||||
roi_mapper=roi_mapper,
|
||||
),
|
||||
checkpoint_dir=checkpoint_dir,
|
||||
num_epochs=num_epochs,
|
||||
|
||||
@ -243,7 +243,7 @@ def test_user_can_load_checkpoint_with_new_targets(
|
||||
detector = cast(Detector, api.model.detector)
|
||||
classifier_head = cast(ClassifierHead, detector.classifier_head)
|
||||
|
||||
assert api.targets.config == sample_targets.config
|
||||
assert api.targets.config == sample_targets.config # type: ignore
|
||||
assert detector.num_classes == len(sample_targets.class_names)
|
||||
assert (
|
||||
classifier_head.classifier.out_channels
|
||||
@ -399,6 +399,61 @@ def test_process_file_uses_resolved_batch_size_by_default(
|
||||
assert captured["batch_size"] == api_v2.inference_config.loader.batch_size
|
||||
|
||||
|
||||
def test_detection_threshold_override_changes_process_file_results(
|
||||
api_v2: BatDetect2API,
|
||||
example_audio_files: list[Path],
|
||||
) -> None:
|
||||
"""User story: users can override threshold in process_file."""
|
||||
|
||||
default_prediction = api_v2.process_file(example_audio_files[0])
|
||||
strict_prediction = api_v2.process_file(
|
||||
example_audio_files[0],
|
||||
detection_threshold=1.0,
|
||||
)
|
||||
|
||||
assert len(strict_prediction.detections) <= len(
|
||||
default_prediction.detections
|
||||
)
|
||||
|
||||
|
||||
def test_detection_threshold_override_is_ephemeral_in_process_file(
|
||||
api_v2: BatDetect2API,
|
||||
example_audio_files: list[Path],
|
||||
) -> None:
|
||||
"""User story: per-call threshold override does not change defaults."""
|
||||
|
||||
before = api_v2.process_file(example_audio_files[0])
|
||||
_ = api_v2.process_file(
|
||||
example_audio_files[0],
|
||||
detection_threshold=1.0,
|
||||
)
|
||||
after = api_v2.process_file(example_audio_files[0])
|
||||
|
||||
assert len(before.detections) == len(after.detections)
|
||||
np.testing.assert_allclose(
|
||||
[det.detection_score for det in before.detections],
|
||||
[det.detection_score for det in after.detections],
|
||||
atol=1e-6,
|
||||
)
|
||||
|
||||
|
||||
def test_detection_threshold_override_changes_spectrogram_results(
|
||||
api_v2: BatDetect2API,
|
||||
example_audio_files: list[Path],
|
||||
) -> None:
|
||||
"""User story: threshold override works in spectrogram path."""
|
||||
|
||||
audio = api_v2.load_audio(example_audio_files[0])
|
||||
spec = api_v2.generate_spectrogram(audio)
|
||||
default_detections = api_v2.process_spectrogram(spec)
|
||||
strict_detections = api_v2.process_spectrogram(
|
||||
spec,
|
||||
detection_threshold=1.0,
|
||||
)
|
||||
|
||||
assert len(strict_detections) <= len(default_detections)
|
||||
|
||||
|
||||
def test_per_call_overrides_are_ephemeral(monkeypatch) -> None:
|
||||
"""User story: call-level overrides do not mutate resolved defaults."""
|
||||
|
||||
|
||||
@ -1,7 +1,7 @@
|
||||
import numpy as np
|
||||
import torch
|
||||
import torch.nn.functional as F
|
||||
from hypothesis import given
|
||||
from hypothesis import given, settings
|
||||
from hypothesis import strategies as st
|
||||
|
||||
from batdetect2.detector import parameters
|
||||
@ -9,6 +9,7 @@ from batdetect2.utils import audio_utils, detector_utils
|
||||
|
||||
|
||||
@given(duration=st.floats(min_value=0.1, max_value=1))
|
||||
@settings(deadline=None)
|
||||
def test_can_compute_correct_spectrogram_width(duration: float):
|
||||
samplerate = parameters.TARGET_SAMPLERATE_HZ
|
||||
params = parameters.DEFAULT_SPECTROGRAM_PARAMETERS
|
||||
@ -87,6 +88,7 @@ def test_pad_audio_without_fixed_size(duration: float):
|
||||
|
||||
|
||||
@given(duration=st.floats(min_value=0.1, max_value=2))
|
||||
@settings(deadline=None)
|
||||
def test_computed_spectrograms_are_actually_divisible_by_the_spec_divide_factor(
|
||||
duration: float,
|
||||
):
|
||||
|
||||
@ -8,15 +8,6 @@ from click.testing import CliRunner
|
||||
from batdetect2.cli import cli
|
||||
|
||||
|
||||
def test_cli_detect_help() -> None:
|
||||
"""User story: get usage help for legacy detect command."""
|
||||
|
||||
result = CliRunner().invoke(cli, ["detect", "--help"])
|
||||
|
||||
assert result.exit_code == 0
|
||||
assert "Detect bat calls in files in AUDIO_DIR" in result.output
|
||||
|
||||
|
||||
def test_cli_detect_command_on_test_audio(tmp_path: Path) -> None:
|
||||
"""User story: run legacy detect on example audio directory."""
|
||||
|
||||
|
||||
Some files were not shown because too many files have changed in this diff.