mirror of https://github.com/macaodha/batdetect2.git, synced 2026-04-04 15:20:19 +02:00

Commit 67bb66db3c (parent d2d804f0c3): Incorporate previous docs into new structure
# BatDetect2 Architecture Overview

This document provides a comprehensive map of the `batdetect2` codebase architecture. It is intended to serve as a deep-dive reference for developers, agents, and contributors navigating the project.

`batdetect2` is designed as a modular deep learning pipeline for detecting and classifying bat echolocation calls in high-frequency audio recordings. It relies heavily on **PyTorch**, **PyTorch Lightning** for training, and the **soundevent** library for standardized audio and geometry data classes.

The repository follows a configuration-driven design pattern, making heavy use of `pydantic`/`omegaconf` (via `BaseConfig`) and the Factory/Registry patterns for dependency injection and modularity. The entire pipeline can be orchestrated via the high-level API `BatDetect2API` (`src/batdetect2/api_v2.py`).

---

## 1. Data Flow Pipeline

The standard lifecycle of a prediction request follows these sequential stages, each handled by an isolated, replaceable module:

1. **Audio Loading (`batdetect2.audio`)**: Reads raw `.wav` files into standard NumPy arrays or `soundevent.data.Clip` objects. Handles resampling.
2. **Preprocessing (`batdetect2.preprocess`)**: Converts raw 1D waveforms into 2D spectrogram tensors.
3. **Forward Pass (`batdetect2.models`)**: A PyTorch neural network processes the spectrogram and outputs dense prediction tensors (e.g., detection heatmaps, bounding box sizes, class probabilities).
4. **Postprocessing (`batdetect2.postprocess`)**: Decodes the raw output tensors back into explicit geometry bounding boxes and runs Non-Maximum Suppression (NMS) to filter redundant predictions.
5. **Formatting (`batdetect2.data`)**: Transforms the predictions into standard formats (`.csv`, `.json`, `.parquet`) via `OutputFormatterProtocol`.
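The five stages above can be sketched end to end. This is an illustrative sketch with stand-in functions and shapes, not the actual `batdetect2` API:

```python
"""Sketch of the five-stage flow; function names and outputs are
hypothetical stand-ins for the real batdetect2 modules."""
import numpy as np

def load_audio(path: str) -> np.ndarray:
    # Stage 1: stand-in for batdetect2.audio loading + resampling.
    return np.random.default_rng(0).normal(size=256_000).astype(np.float32)

def preprocess(waveform: np.ndarray) -> np.ndarray:
    # Stage 2: stand-in spectrogram via a magnitude STFT (freq, time).
    n_fft, hop = 512, 256
    frames = [waveform[i:i + n_fft] for i in range(0, len(waveform) - n_fft, hop)]
    return np.abs(np.fft.rfft(np.stack(frames), axis=1)).T

def forward(spec: np.ndarray) -> dict:
    # Stage 3: stand-in for the detector; returns dense per-pixel outputs.
    return {"detection": np.zeros_like(spec), "size": np.zeros((2, *spec.shape))}

def postprocess(outputs: dict) -> list:
    # Stage 4: stand-in for peak picking, decoding, and NMS.
    return []

spec = preprocess(load_audio("recording.wav"))
predictions = postprocess(forward(spec))  # Stage 5 would format these for export
```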
---

## 2. Core Modules Breakdown

### 2.1 Audio and Preprocessing

- **`audio/`**:
  - Centralizes audio I/O via `AudioLoader`. It abstracts over the `soundevent` library, efficiently handling full `Recording` files or smaller `Clip` segments and standardizing the sample rate.
- **`preprocess/`**:
  - Defined by the `PreprocessorProtocol`.
  - Its primary responsibility is spectrogram generation via the Short-Time Fourier Transform (STFT).
  - During training, it incorporates data augmentation layers (e.g., amplitude scaling, time masking, frequency masking, spectral mean subtraction) configured via `PreprocessingConfig`.

### 2.2 Deep Learning Models (`models/`)

The `models` directory contains all PyTorch neural network architectures. The default architecture is an encoder-decoder (U-Net style) network.

- **`blocks.py`**: Reusable neural network blocks, including standard convolutions (`ConvBlock`) and specialized layers like `FreqCoordConvDownBlock`/`FreqCoordConvUpBlock`, which append normalized frequency coordinates to explicitly grant convolutional filters frequency awareness.
- **`encoder.py`**: The downsampling path (feature extraction). Builds a sequential list of blocks and captures skip connections.
- **`bottleneck.py`**: The deepest, lowest-resolution segment connecting the encoder and decoder. Features an optional `SelfAttention` mechanism to weigh global temporal context.
- **`decoder.py`**: The upsampling path (reconstruction), integrating skip connections (residuals) from the encoder.
- **`heads.py`**: Heads attach to the backbone's feature map to output specific predictions:
  - `BBoxHead`: Predicts bounding box sizes.
  - `ClassifierHead`: Predicts species classes.
  - `DetectorHead`: Predicts detection probability heatmaps.
- **`backbones.py` & `detectors.py`**: Assemble the encoder, bottleneck, decoder, and heads into a cohesive `Detector` model.
- **`__init__.py:Model`**: The overarching wrapper `torch.nn.Module` containing the `detector`, `preprocessor`, `postprocessor`, and `targets`.
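A prediction head is conceptually a 1x1 convolution over the backbone's feature map, i.e. a per-pixel linear map over the channel axis. A NumPy sketch of that idea (not the code in `heads.py`):

```python
import numpy as np

rng = np.random.default_rng(0)

def head_1x1(features: np.ndarray, weight: np.ndarray) -> np.ndarray:
    # features: (C, F, T) backbone feature map; weight: (out_channels, C).
    # A 1x1 convolution applies the same linear map at every (freq, time) cell.
    c, f, t = features.shape
    return (weight @ features.reshape(c, f * t)).reshape(-1, f, t)

features = rng.normal(size=(32, 64, 128))  # (channels, freq, time)

# Detector-style head: single channel squashed to a (0, 1) heatmap.
detection = 1 / (1 + np.exp(-head_1x1(features, rng.normal(size=(1, 32)))))
# BBox-style head: two channels for predicted width/height per pixel.
sizes = head_1x1(features, rng.normal(size=(2, 32)))
# Classifier-style head: one score channel per class (6 classes here).
class_logits = head_1x1(features, rng.normal(size=(6, 32)))
```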
### 2.3 Targets and Regions of Interest (`targets/`)

Crucial for training, this module translates physical annotations (Regions of Interest, or ROIs) into training targets (tensors).

- **`rois.py`**: Implements `ROITargetMapper`, which maps a geometric bounding box to a 2D reference `Position` (time, frequency) and a `Size` array. Includes strategies such as:
  - `AnchorBBoxMapper`: Maps based on a fixed bounding box corner or center.
  - `PeakEnergyBBoxMapper`: Identifies the coordinate of peak acoustic energy inside the bounding box and calculates offsets to the box edges.
- **`targets.py`**: Constructs complete multi-channel target heatmaps and coordinate tensors from the ROIs to compute losses during training.
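The anchor-style mapping can be sketched as follows. This is an illustrative reimplementation of the idea, not the code in `rois.py`:

```python
def anchor_bbox_map(start_time, end_time, low_freq, high_freq, anchor="bottom-left"):
    """Map a time-frequency bounding box to a reference position plus a size.

    Returns ((time, freq), (width, height)). The anchor choice determines
    which point of the box becomes the reference position.
    """
    width = end_time - start_time
    height = high_freq - low_freq
    if anchor == "bottom-left":
        position = (start_time, low_freq)
    elif anchor == "center":
        position = (start_time + width / 2, low_freq + height / 2)
    else:
        raise ValueError(f"unknown anchor: {anchor}")
    return position, (width, height)

# A 20 ms call sweeping 30-60 kHz, anchored at its bottom-left corner:
pos, size = anchor_bbox_map(0.10, 0.12, 30_000, 60_000)
```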
### 2.4 Postprocessing (`postprocess/`)

- Implements `PostprocessorProtocol`.
- Reverses the logic of `targets`: it scans the model's output detection heatmaps for peaks, extracts the predicted sizes and class probabilities at those peaks, and decodes them back into physical `soundevent.data.Geometry` bounding boxes.
- Automatically applies Non-Maximum Suppression (NMS), configured via `PostprocessConfig`, to remove highly overlapping predictions.
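A common way to combine peak finding with NMS on a heatmap is to keep only cells that are the maximum of their local neighbourhood and exceed a score threshold. A minimal sketch of that idea (not the actual `postprocess` implementation):

```python
import numpy as np

def heatmap_peaks(heatmap: np.ndarray, threshold: float = 0.5, radius: int = 1):
    """Keep cells that dominate their (2*radius+1)^2 neighbourhood
    and exceed the score threshold; nearby weaker peaks are suppressed."""
    peaks = []
    h, w = heatmap.shape
    for i in range(h):
        for j in range(w):
            score = heatmap[i, j]
            if score < threshold:
                continue
            window = heatmap[max(0, i - radius):i + radius + 1,
                             max(0, j - radius):j + radius + 1]
            if score >= window.max():
                peaks.append((i, j, float(score)))
    return peaks

hm = np.zeros((8, 8))
hm[2, 3] = 0.9
hm[2, 4] = 0.7   # suppressed: adjacent to a stronger peak
hm[6, 6] = 0.8
print(heatmap_peaks(hm))  # [(2, 3, 0.9), (6, 6, 0.8)]
```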
### 2.5 Data Management (`data/`)

- **`annotations/`**: Utilities to load dataset annotations, supporting multiple standardized schemas (`AOEF` and legacy `BatDetect2` formats).
- **`datasets.py`**: Aggregates recordings and annotations into memory.
- **`predictions/`**: Handles exporting model results via `OutputFormatterProtocol`. Includes formatters for `RawOutput`, `.parquet`, `.json`, etc.

### 2.6 Evaluation (`evaluate/`)

- Computes scientific metrics via `EvaluatorProtocol`.
- Provides dedicated evaluation setups for tasks such as clip classification, clip detection, and top-class prediction.
- Generates precision-recall curves and scatter plots.

### 2.7 Training (`train/`)

- Implements the distributed PyTorch training loop via PyTorch Lightning.
- **`lightning.py`**: Contains `TrainingModule`, the `LightningModule` that orchestrates the optimizer, learning rate scheduler, forward passes, and backpropagation using the generated `targets`.

---

## 3. Interfaces and Tooling

### 3.1 APIs

- **`api_v2.py` (`BatDetect2API`)**: The modern API object. It is deeply integrated with dependency injection via `BatDetect2Config`. It instantiates the loader, targets, preprocessor, postprocessor, and model, exposing easy-to-use methods such as `process_file`, `evaluate`, and `train`.
- **`api.py`**: The legacy API, kept for backwards compatibility. Uses hardcoded default instances rather than configuration objects.

### 3.2 Command Line Interface (`cli/`)

- Implements terminal commands using `click`. Commands include `batdetect2 detect`, `evaluate`, and `train`.

### 3.3 Core and Configuration (`core/`, `config.py`)

- **`core/registries.py`**: String-based registries (e.g., `block_registry`, `roi_mapper_registry`) that let developers swap components (such as a custom neural network block) via configuration files without modifying Python code.
- **`config.py`**: Aggregates all modular `BaseConfig` objects (`AudioConfig`, `PreprocessingConfig`, `BackboneConfig`) into the monolithic `BatDetect2Config`.
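A minimal sketch of the string-keyed registry pattern, illustrating how a name from a config file can select a component at runtime (this is not the actual `core/registries.py` code):

```python
class Registry:
    """Minimal string-keyed registry: classes register under a name,
    and configs can then select them by that name."""

    def __init__(self):
        self._items = {}

    def register(self, name):
        def decorator(cls):
            self._items[name] = cls
            return cls
        return decorator

    def build(self, name, **kwargs):
        if name not in self._items:
            raise KeyError(f"unknown component: {name!r}")
        return self._items[name](**kwargs)

block_registry = Registry()

@block_registry.register("conv")
class ConvBlock:
    def __init__(self, channels=32):
        self.channels = channels

# A config file can now request {"block": "conv", "channels": 64}:
block = block_registry.build("conv", channels=64)
```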
---

## Summary

To navigate this codebase effectively:

1. Follow **`api_v2.py`** to see how high-level operations invoke individual components.
2. Rely on the typed **Protocols** located in each subsystem's `types.py` module (for example, `src/batdetect2/preprocess/types.py` and `src/batdetect2/postprocess/types.py`) to understand inputs and outputs without reading each implementation.
3. Remember that data flows as `soundevent` primitives externally and as pure `torch.Tensor` objects internally through the network.
# Using AOEF / Soundevent Data Sources

## Introduction

The **AOEF (Acoustic Open Event Format)**, stored as `.json` files, is the annotation format used by the underlying `soundevent` library and is compatible with annotation tools like **Whombat**. BatDetect2 can directly load annotation data stored in this format.

This format can represent two main types of annotation collections:

1. `AnnotationSet`: A straightforward collection of annotations for various audio clips.
2. `AnnotationProject`: A more structured format often exported by annotation tools (like Whombat). It includes not only the annotations but also information about annotation _tasks_ (work assigned to annotators) and their status (e.g., in progress, completed, verified, rejected).

This section explains how to configure a data source in your `DatasetConfig` to load data from either type of AOEF file.

## Configuration

To define a data source using the AOEF format, add an entry to the `sources` list in your main `DatasetConfig` (usually within your primary YAML configuration file) and set the `format` field to `"aoef"`.

Key fields for an AOEF source:

- `format: "aoef"`: **(Required)** Tells BatDetect2 to use the AOEF loader for this source.
- `name: your_source_name`: **(Required)** A unique name you choose for this data source (e.g., `"whombat_project_export"`, `"final_annotations"`).
- `audio_dir: path/to/audio/files`: **(Required)** The path to the directory containing the audio `.wav` files referenced in the annotations.
- `annotations_path: path/to/your/annotations.aoef`: **(Required)** The path to the single `.aoef` or `.json` file containing the annotation data (either an `AnnotationSet` or an `AnnotationProject`).
- `description: "Details about this source..."`: (Optional) A brief description of the data source.
- `filter: ...`: (Optional) Settings used _only if_ the `annotations_path` file contains an `AnnotationProject`. See details below.

## Filtering Annotation Projects (Optional)

When working with annotation projects, especially collaborative ones or those still in progress (like exports from Whombat), you often want to train only on annotations that are considered complete and reliable. The optional `filter:` section lets you specify criteria based on the status of the annotation _tasks_ within the project.

**If `annotations_path` points to a simple `AnnotationSet` file, the `filter:` section is ignored.**

If `annotations_path` points to an `AnnotationProject`, you can add a `filter:` block with the following options:

- `only_completed: <true_or_false>`:
  - `true` (default): Only include annotations from tasks that have been marked as "completed".
  - `false`: Include annotations regardless of task completion status.
- `only_verified: <true_or_false>`:
  - `false` (default): Verification status is not considered.
  - `true`: Only include annotations from tasks that have _also_ been marked as "verified" (typically meaning they passed a review step).
- `exclude_issues: <true_or_false>`:
  - `true` (default): Exclude annotations from any task that has been marked as "rejected" or flagged with issues.
  - `false`: Include annotations even if their task was marked as having issues (use with caution).
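The three options above amount to a simple predicate over task status. A hedged sketch of the logic (the boolean status fields here are illustrative, not the real `AnnotationProject` schema):

```python
def keep_task(status: dict, only_completed=True, only_verified=False,
              exclude_issues=True) -> bool:
    """Decide whether annotations from one task should be loaded,
    given the filter options described above."""
    if only_completed and not status.get("completed", False):
        return False
    if only_verified and not status.get("verified", False):
        return False
    if exclude_issues and status.get("issues", False):
        return False
    return True

tasks = [
    {"completed": True, "verified": False, "issues": False},
    {"completed": True, "verified": True, "issues": False},
    {"completed": False, "verified": False, "issues": False},  # not completed
    {"completed": True, "verified": False, "issues": True},    # has issues
]
# The default filter keeps the first two tasks;
# requiring verification keeps only the second.
kept_default = [t for t in tasks if keep_task(t)]
kept_verified = [t for t in tasks if keep_task(t, only_verified=True)]
```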
**Default Filtering:** If you include the `filter:` block but omit some options, or if you _omit the entire `filter:` block_, the default settings are applied to `AnnotationProject` files: `only_completed: true`, `only_verified: false`, `exclude_issues: true`. This common default selects annotations from completed tasks that haven't been rejected, without requiring separate verification.

**Disabling Filtering:** To load _all_ annotations from an `AnnotationProject` regardless of task status, explicitly disable filtering by setting `filter: null` in your YAML configuration.

## YAML Configuration Examples

**Example 1: Loading a standard AnnotationSet (or a Project with default filtering)**

```yaml
# In your main DatasetConfig YAML file
sources:
  - name: "MyFinishedAnnotations"
    format: "aoef" # Specifies the loader
    audio_dir: "/path/to/my/audio/"
    annotations_path: "/path/to/my/dataset.soundevent.json" # Path to the AOEF file
    description: "Finalized annotations set."
    # No 'filter:' block means default filtering is applied IF it's an AnnotationProject,
    # or no filtering if it's an AnnotationSet.
```

**Example 2: Loading an AnnotationProject, requiring verification**

```yaml
# In your main DatasetConfig YAML file
sources:
  - name: "WhombatVerifiedExport"
    format: "aoef"
    audio_dir: "relative/path/to/audio/" # Relative to where BatDetect2 runs or a base_dir
    annotations_path: "exports/whombat_project.aoef" # Path to the project file
    description: "Annotations from Whombat project, only using verified tasks."
    filter: # Customize the filter
      only_completed: true # Still require completion
      only_verified: true # *Also* require verification
      exclude_issues: true # Still exclude rejected tasks
```

**Example 3: Loading an AnnotationProject, disabling all filtering**

```yaml
# In your main DatasetConfig YAML file
sources:
  - name: "WhombatRawExport"
    format: "aoef"
    audio_dir: "data/audio_pool/"
    annotations_path: "exports/whombat_project_all.aoef"
    description: "All annotations from Whombat, regardless of task status."
    filter: null # Explicitly disable task filtering
```

## Summary

To load standard `soundevent` annotations (including Whombat exports), set `format: "aoef"` for your data source in the `DatasetConfig`. Provide the `audio_dir` and the path to the single `annotations_path` file. If dealing with `AnnotationProject` files, you can optionally use the `filter:` block to select annotations based on task completion, verification, or issue status.
# Loading Data

```{toctree}
:maxdepth: 1
:caption: Loading Data

aoef
legacy
```
# Using Legacy BatDetect2 Annotation Formats

## Introduction

If you have annotation data created with older BatDetect2 annotation tools, BatDetect2 provides loaders for these datasets. These older formats typically use JSON files to store annotation information, including bounding boxes and labels for sound events within recordings.

There are two main variations of this legacy format that BatDetect2 can load:

1. **Directory-Based (`format: "batdetect2"`):** Annotations for each audio recording are stored in a _separate_ JSON file within a dedicated directory. A naming convention links each JSON file to its corresponding audio file (e.g., annotations for `my_recording.wav` are stored in `my_recording.wav.json`).
2. **Single Merged File (`format: "batdetect2_file"`):** Annotations for _multiple_ recordings are aggregated into a _single_ JSON file. This file contains a list in which each item represents the annotations for one recording, following the same internal structure as the directory-based format.

When you configure BatDetect2 to use these formats, it reads the legacy data and converts it internally into the standard `soundevent` data structures used by the rest of the pipeline.
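For orientation, a legacy per-recording JSON looks roughly like this. The field names below are illustrative of the format, not a normative schema; consult your own legacy files for the exact fields:

```json
{
  "id": "my_recording.wav",
  "annotated": true,
  "issues": false,
  "annotation": [
    {
      "start_time": 0.1,
      "end_time": 0.12,
      "low_freq": 30000,
      "high_freq": 60000,
      "class": "Myotis daubentonii",
      "event": "Echolocation"
    }
  ]
}
```

The `annotated` and `issues` flags at the top level are the ones used by the filtering options described below.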
## Configuration

You specify which legacy format to use within the `sources` list of your main `DatasetConfig` (usually in your primary YAML configuration file).

### Format 1: Directory-Based

Use this when you have a folder containing many individual JSON annotation files, one per audio file.

**Configuration Fields:**

- `format: "batdetect2"`: **(Required)** Identifies this legacy format loader.
- `name: your_source_name`: **(Required)** A unique name for this data source.
- `audio_dir: path/to/audio/files`: **(Required)** Path to the directory containing the `.wav` audio files.
- `annotations_dir: path/to/annotation/jsons`: **(Required)** Path to the directory containing the individual `.json` annotation files.
- `description: "Details..."`: (Optional) Description of this source.
- `filter: ...`: (Optional) Settings to filter which JSON files are processed, based on flags within them (see "Filtering Legacy Annotations" below).

**YAML Example:**

```yaml
# In your main DatasetConfig YAML file
sources:
  - name: "OldProject_SiteA_Files"
    format: "batdetect2" # Use the directory-based loader
    audio_dir: "/data/SiteA/Audio/"
    annotations_dir: "/data/SiteA/Annotations_JSON/"
    description: "Legacy annotations stored as individual JSONs per recording."
    # filter: ... # Optional filter settings can be added here
```

### Format 2: Single Merged File

Use this when you have a single JSON file containing a list of annotations for multiple recordings.

**Configuration Fields:**

- `format: "batdetect2_file"`: **(Required)** Identifies this legacy format loader.
- `name: your_source_name`: **(Required)** A unique name for this data source.
- `audio_dir: path/to/audio/files`: **(Required)** Path to the directory containing the `.wav` audio files referenced _within_ the merged JSON file.
- `annotations_path: path/to/your/merged_annotations.json`: **(Required)** Path to the single `.json` file containing the list of annotations.
- `description: "Details..."`: (Optional) Description of this source.
- `filter: ...`: (Optional) Settings to filter which records _within_ the merged file are processed (see "Filtering Legacy Annotations" below).

**YAML Example:**

```yaml
# In your main DatasetConfig YAML file
sources:
  - name: "OldProject_Merged"
    format: "batdetect2_file" # Use the merged file loader
    audio_dir: "/data/AllAudio/"
    annotations_path: "/data/CombinedAnnotations/old_project_merged.json"
    description: "Legacy annotations aggregated into a single JSON file."
    # filter: ... # Optional filter settings can be added here
```

## Filtering Legacy Annotations

The legacy JSON annotation structure (for both formats) includes boolean flags indicating the status of the annotation work for each recording:

- `annotated`: Typically `true` if a human has reviewed or created annotations for the file.
- `issues`: Typically `true` if problems were noted during annotation or review.

You can optionally filter the data based on these flags using a `filter:` block within the source configuration. This applies whether you use `"batdetect2"` or `"batdetect2_file"`.

**Filter Options:**

- `only_annotated: <true_or_false>`:
  - `true` (**default**): Only process entries where the `annotated` flag in the JSON is `true`.
  - `false`: Process entries regardless of the `annotated` flag.
- `exclude_issues: <true_or_false>`:
  - `true` (**default**): Skip entries where the `issues` flag in the JSON is `true`.
  - `false`: Process entries even if they are flagged with `issues`.

**Default Filtering:** If you **omit** the `filter:` block entirely, the defaults (`only_annotated: true`, `exclude_issues: true`) are applied automatically, so only entries marked as annotated and free of issues are loaded.

**Disabling Filtering:** To load _all_ entries from the legacy source regardless of the `annotated` or `issues` flags, explicitly disable the filter:

```yaml
filter: null
```

**YAML Example (Custom Filter):** Only load entries marked as annotated, but _include_ those with issues.

```yaml
sources:
  - name: "LegacyData_WithIssues"
    format: "batdetect2" # Or "batdetect2_file"
    audio_dir: "path/to/audio"
    annotations_dir: "path/to/annotations" # Or annotations_path for the merged format
    filter:
      only_annotated: true
      exclude_issues: false # Include entries even if the issues flag is true
```

## Summary

BatDetect2 can incorporate datasets stored in older "BatDetect2" JSON formats.

- Use `format: "batdetect2"` and provide `annotations_dir` if you have one JSON file per recording in a directory.
- Use `format: "batdetect2_file"` and provide `annotations_path` if you have a single JSON file containing annotations for multiple recordings.
- Optionally use the `filter:` block with `only_annotated` and `exclude_issues` to select data based on flags in the legacy JSON structure.

The system handles loading, filtering (if configured), and converting this legacy data into the standard `soundevent` format used internally.
The explanation index toctree, with the new pages added:

```{toctree}
:maxdepth: 1

model-output-and-validation
postprocessing-and-thresholds
pipeline-overview
preprocessing-consistency
target-encoding-and-decoding
```
**docs/source/explanation/pipeline-overview.md** (new file)

# Pipeline overview

batdetect2 processes recordings as a sequence of modules. Each stage has a clear role and configuration surface.

## End-to-end flow

1. Audio loading
2. Preprocessing (waveform -> spectrogram)
3. Detector forward pass
4. Postprocessing (peaks, decoding, thresholds)
5. Output formatting and export
## Why the modular design matters

The model, preprocessing, postprocessing, targets, and output formatting are configured separately. That makes it easier to:

- swap components without rewriting the whole pipeline,
- keep experiments reproducible,
- adapt workflows to new datasets.

## Core objects in the stack

- `BatDetect2API` orchestrates training, inference, and evaluation workflows.
- `ModelConfig` defines architecture, preprocessing, postprocessing, and targets.
- `Targets` controls event filtering, class encoding/decoding, and ROI mapping.

## Related pages

- Preprocessing rationale: {doc}`preprocessing-consistency`
- Postprocessing rationale: {doc}`postprocessing-and-thresholds`
- Target rationale: {doc}`target-encoding-and-decoding`
**docs/source/explanation/postprocessing-and-thresholds.md** (new file)

# Postprocessing and thresholds

After the detector runs on a spectrogram, the model output is still a set of dense prediction tensors. Postprocessing turns that into a final list of call detections with positions, sizes, and class scores.

## What postprocessing does

In broad terms, the pipeline:

1. suppresses nearby duplicate peaks,
2. extracts candidate detections,
3. reads size and class values at each detected location,
4. decodes outputs into call-level predictions.

This is where score thresholds and output density limits are applied.

## Why thresholds matter

Thresholds control the balance between sensitivity and precision.

- Lower thresholds keep more detections, including weaker calls, but may add false positives.
- Higher thresholds remove low-confidence detections, but may miss faint calls.

You can tune this behavior per run without retraining the model.

## Two common threshold controls

- `detection_threshold`: minimum score required to keep a detection.
- `classification_threshold`: minimum class score used when assigning class labels.
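A minimal sketch of how the two thresholds interact (the detection tuples here are illustrative, not batdetect2's actual output type):

```python
def filter_predictions(detections, detection_threshold=0.3,
                       classification_threshold=0.5):
    """Apply both thresholds: drop weak detections outright, and
    strip the class label from detections with a weak class score."""
    kept = []
    for score, class_name, class_score in detections:
        if score < detection_threshold:
            continue  # below detection_threshold: drop entirely
        if class_score < classification_threshold:
            class_name = None  # keep the detection but leave it unclassified
        kept.append((score, class_name, class_score))
    return kept

raw = [(0.9, "Pipistrellus", 0.8), (0.4, "Myotis", 0.3), (0.1, "Nyctalus", 0.9)]
print(filter_predictions(raw))
# [(0.9, 'Pipistrellus', 0.8), (0.4, None, 0.3)]
```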
Both settings shape the final output and should be validated on reviewed local data.

## Practical workflow

Tune thresholds on a representative subset first, then lock settings for the full analysis run.

- How-to: {doc}`../how_to/tune-detection-threshold`
- CLI reference: {doc}`../reference/cli/predict`
**docs/source/explanation/preprocessing-consistency.md** (new file)

# Preprocessing consistency

Preprocessing consistency is one of the biggest factors behind stable model performance.

## Why consistency matters

The detector is trained on spectrograms produced by a specific preprocessing pipeline. If inference uses different settings, the model can see a shifted input distribution and performance may drop.

Typical mismatch sources:

- sample-rate differences,
- changed frequency crop,
- changed STFT window/hop,
- changed spectrogram transforms.
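A quick way to catch such mismatches is to diff the relevant settings between the training and inference configs. A sketch with illustrative key names (compare whichever fields your configs actually define):

```python
def preprocessing_mismatches(train_cfg: dict, infer_cfg: dict,
                             keys=("samplerate", "min_freq", "max_freq",
                                   "window_size", "hop_size")) -> list:
    """Return the names of settings that differ between two configs."""
    return [k for k in keys if train_cfg.get(k) != infer_cfg.get(k)]

train = {"samplerate": 256000, "min_freq": 10000, "max_freq": 120000,
         "window_size": 512, "hop_size": 256}
infer = dict(train, samplerate=192000)  # accidental sample-rate change

mismatches = preprocessing_mismatches(train, infer)
print(mismatches)  # ['samplerate']
```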
## Practical implication

When possible, keep preprocessing settings aligned between:

- training,
- evaluation,
- deployment inference.

If you intentionally change preprocessing, treat it as a new experiment and re-validate on reviewed local data.

## Related pages

- Configure audio preprocessing: {doc}`../how_to/configure-audio-preprocessing`
- Configure spectrogram preprocessing: {doc}`../how_to/configure-spectrogram-preprocessing`
- Preprocessing config reference: {doc}`../reference/preprocessing-config`
**docs/source/explanation/target-encoding-and-decoding.md** (new file)

# Target encoding and decoding

batdetect2 turns annotated sound events into training targets, then maps model outputs back into interpretable predictions.

## Encoding path (annotations -> model targets)

At training time, the target system:

1. checks whether an event belongs to the configured detection target,
2. assigns a classification label (or none for non-specific class matches),
3. maps event geometry into position and size targets.

This behaviour is configured through `TargetConfig`, `TargetClassConfig`, and ROI mapper settings.

## Decoding path (model outputs -> tags and geometry)

At inference time, class labels and ROI parameters are decoded back into annotation tags and geometry. This makes outputs interpretable in the same conceptual space as your original annotations.

## Why this matters

Target definitions are not just metadata. They directly shape:

- which events are treated as positive examples,
- which class names the model learns,
- how geometry is represented and reconstructed.

Small changes here can alter both training outcomes and prediction semantics.

## Related pages

- Configure detection target logic: {doc}`../how_to/configure-target-definitions`
- Configure class mapping: {doc}`../how_to/define-target-classes`
- Configure ROI mapping: {doc}`../how_to/configure-roi-mapping`
- Target config reference: {doc}`../reference/targets-config-workflow`
53
docs/source/how_to/configure-aoef-dataset.md
Normal file
53
docs/source/how_to/configure-aoef-dataset.md
Normal file
@ -0,0 +1,53 @@
# How to configure an AOEF dataset source

Use this guide when your annotations are stored in AOEF/soundevent JSON files,
including exports from Whombat.

## 1) Add an AOEF source entry

In your dataset config, add a source with `format: aoef`.

```yaml
sources:
  - name: my_aoef_source
    format: aoef
    audio_dir: /path/to/audio
    annotations_path: /path/to/annotations.soundevent.json
```

## 2) Choose filtering behavior for annotation projects

If `annotations_path` is an `AnnotationProject`, you can filter by task state.

```yaml
sources:
  - name: whombat_verified
    format: aoef
    audio_dir: /path/to/audio
    annotations_path: /path/to/project_export.aoef
    filter:
      only_completed: true
      only_verified: true
      exclude_issues: true
```

If you omit `filter`, default project filtering is applied.

To disable filtering for project files:

```yaml
filter: null
```

## 3) Check that the source loads

Run a summary on your dataset config:

```bash
batdetect2 data summary path/to/dataset.yaml
```

## 4) Continue to training or evaluation

- For training: {doc}`../tutorials/train-a-custom-model`
- For field-level reference: {doc}`../reference/data-sources`
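Once parsed, each `sources` entry is a plain mapping, so a quick sanity check can be scripted before handing the config to BatDetect2. The sketch below is illustrative only: the `check_aoef_source` helper is hypothetical and not part of the BatDetect2 API; the field names simply mirror the example above.

```python
from pathlib import Path

# Fields every AOEF source entry in the example above carries.
REQUIRED_FIELDS = ("name", "format", "audio_dir", "annotations_path")


def check_aoef_source(source):
    """Return a list of problems found in one AOEF source entry."""
    problems = [f"missing field: {f}" for f in REQUIRED_FIELDS if f not in source]
    if source.get("format") != "aoef":
        problems.append(f"unexpected format: {source.get('format')!r}")
    # Path checks are only meaningful on the machine that will run training.
    for key in ("audio_dir", "annotations_path"):
        if key in source and not Path(source[key]).exists():
            problems.append(f"{key} does not exist: {source[key]}")
    return problems


source = {
    "name": "my_aoef_source",
    "format": "aoef",
    "audio_dir": "/path/to/audio",
    "annotations_path": "/path/to/annotations.soundevent.json",
}
print(check_aoef_source(source))
```

Running this on a config with placeholder paths will report the missing directories, which is exactly the kind of mistake worth catching before a long training run.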
docs/source/how_to/configure-audio-preprocessing.md (new file, 64 lines)
@ -0,0 +1,64 @@
# How to configure audio preprocessing

Use this guide to set sample-rate and waveform-level preprocessing behaviour.

## 1) Set audio loader settings

The audio loader config controls resampling.

```yaml
samplerate: 256000
resample:
  enabled: true
  method: poly
```

If your recordings are already at the expected sample rate, you can disable
resampling.

```yaml
samplerate: 256000
resample:
  enabled: false
```

## 2) Set waveform transforms in preprocessing config

Waveform transforms are configured in `preprocess.audio_transforms`.

```yaml
preprocess:
  audio_transforms:
    - name: center_audio
    - name: scale_audio
    - name: fix_duration
      duration: 0.5
```

Available built-ins:

- `center_audio`
- `scale_audio`
- `fix_duration`

## 3) Use the config in your workflow

For CLI inference/evaluation, use `--audio-config`.

```bash
batdetect2 predict directory \
    path/to/model.ckpt \
    path/to/audio_dir \
    path/to/outputs \
    --audio-config path/to/audio.yaml
```

## 4) Verify quickly on a small subset

Run on a small folder first and confirm that outputs and runtime are as
expected before full-batch runs.

## Related pages

- Spectrogram settings: {doc}`configure-spectrogram-preprocessing`
- Preprocessing config reference: {doc}`../reference/preprocessing-config`
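The `method: poly` setting refers to polyphase resampling, which converts between rates by upsampling by an integer factor and downsampling by another. The sketch below only shows how those factors are derived from the two sample rates; it is not BatDetect2's resampler (which delegates to a signal-processing library), and the `poly_factors` helper name is ours.

```python
from math import gcd


def poly_factors(orig_sr, target_sr):
    """Up/down factors for polyphase resampling from orig_sr to target_sr."""
    g = gcd(orig_sr, target_sr)
    # Resample by inserting `up - 1` zeros per sample, filtering,
    # then keeping every `down`-th sample.
    up, down = target_sr // g, orig_sr // g
    return up, down


# A 384 kHz recording resampled to the 256 kHz target:
up, down = poly_factors(384_000, 256_000)
print(up, down)  # 2 3
```

Recordings already at the target rate give factors `(1, 1)`, which is why disabling resampling in that case is harmless.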
docs/source/how_to/configure-roi-mapping.md (new file, 54 lines)
@ -0,0 +1,54 @@
# How to configure ROI mapping

Use this guide to control how annotation geometry is encoded into training
targets and decoded back into boxes.

## 1) Set the default ROI mapper

The default mapper is `anchor_bbox`.

```yaml
roi:
  name: anchor_bbox
  anchor: bottom-left
  time_scale: 1000.0
  frequency_scale: 0.001163
```

## 2) Choose an anchor strategy

Typical options include `bottom-left` and `center`.

- `bottom-left` is the current default.
- `center` can be easier to reason about in some workflows.

## 3) Set scale factors intentionally

- `time_scale` controls width scaling.
- `frequency_scale` controls height scaling.

Use values that are consistent with your model setup and keep them fixed when
comparing experiments.

## 4) (Optional) Override ROI mapping for specific classes

You can set class-level `roi` in `classification_targets` when needed.

```yaml
classification_targets:
  - name: species_x
    tags:
      - key: class
        value: Species X
    roi:
      name: anchor_bbox
      anchor: center
      time_scale: 1000.0
      frequency_scale: 0.001163
```

## Related pages

- Target definitions: {doc}`configure-target-definitions`
- Class definitions: {doc}`define-target-classes`
- Target encoding overview: {doc}`../explanation/target-encoding-and-decoding`
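Conceptually, an `anchor_bbox` mapper encodes each box as an anchor point plus a scaled width and height, and decoding inverts that arithmetic. The round trip below is a schematic illustration under that reading (the helper names are ours; the real implementation lives in BatDetect2's targets module and may differ in detail):

```python
def encode_bbox(start_time, low_freq, end_time, high_freq,
                time_scale=1000.0, frequency_scale=0.001163):
    """Encode a box as a bottom-left anchor plus scaled width/height."""
    anchor = (start_time, low_freq)                    # bottom-left corner
    size = ((end_time - start_time) * time_scale,      # scaled duration
            (high_freq - low_freq) * frequency_scale)  # scaled bandwidth
    return anchor, size


def decode_bbox(anchor, size, time_scale=1000.0, frequency_scale=0.001163):
    """Invert encode_bbox back to (start_time, low_freq, end_time, high_freq)."""
    t, f = anchor
    return (t, f, t + size[0] / time_scale, f + size[1] / frequency_scale)


box = (0.10, 30_000.0, 0.105, 60_000.0)  # a 5 ms call spanning 30-60 kHz
anchor, size = encode_bbox(*box)
recovered = decode_bbox(anchor, size)
```

This also shows why the scale factors must stay fixed between training and prediction: decoding with different scales than were used for encoding silently distorts every box.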
docs/source/how_to/configure-spectrogram-preprocessing.md (new file, 59 lines)
@ -0,0 +1,59 @@
# How to configure spectrogram preprocessing

Use this guide to set STFT, frequency range, and spectrogram transforms.

## 1) Configure STFT and frequency range

```yaml
preprocess:
  stft:
    window_duration: 0.002
    window_overlap: 0.75
    window_fn: hann
  frequencies:
    min_freq: 10000
    max_freq: 120000
```

## 2) Configure spectrogram transforms

`spectrogram_transforms` are applied in order.

```yaml
preprocess:
  spectrogram_transforms:
    - name: pcen
      time_constant: 0.4
      gain: 0.98
      bias: 2.0
      power: 0.5
    - name: spectral_mean_subtraction
    - name: scale_amplitude
      scale: db
```

Common built-ins:

- `pcen`
- `spectral_mean_subtraction`
- `scale_amplitude` (`db` or `power`)
- `peak_normalize`

## 3) Configure output size

```yaml
preprocess:
  size:
    height: 128
    resize_factor: 0.5
```

## 4) Keep train and inference settings aligned

Use the same preprocessing setup for training and prediction whenever possible.
Large mismatches can degrade model performance.

## Related pages

- Why consistency matters: {doc}`../explanation/preprocessing-consistency`
- Preprocessing config reference: {doc}`../reference/preprocessing-config`
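It can help to see what the STFT settings imply in samples. Assuming the standard reading of these parameters (window length is duration times sample rate, hop is the non-overlapping fraction of the window), the example config above works out as follows; the `stft_geometry` helper is ours, for illustration only:

```python
def stft_geometry(samplerate, window_duration, window_overlap):
    """Window and hop sizes (in samples) implied by the STFT settings."""
    win = round(window_duration * samplerate)   # samples per window
    hop = round(win * (1.0 - window_overlap))   # samples between frames
    frames_per_second = samplerate / hop        # spectrogram time resolution
    return win, hop, frames_per_second


# The example config: 2 ms windows with 75% overlap at 256 kHz.
win, hop, fps = stft_geometry(256_000, 0.002, 0.75)
print(win, hop, fps)  # 512 128 2000.0
```

So with these settings each spectrogram column covers 0.5 ms, which is worth keeping in mind when choosing `resize_factor`.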
docs/source/how_to/configure-target-definitions.md (new file, 58 lines)
@ -0,0 +1,58 @@
# How to configure target definitions

Use this guide to define which annotated sound events are considered valid
detection targets.

## 1) Start from a targets config file

```yaml
detection_target:
  name: bat
  match_if:
    name: has_tag
    tag:
      key: call_type
      value: Echolocation
  assign_tags:
    - key: call_type
      value: Echolocation
    - key: order
      value: Chiroptera
```

`match_if` decides whether an annotation is included in the detection target.

## 2) Use condition combinators when needed

You can combine conditions with `all_of`, `any_of`, and `not`.

```yaml
detection_target:
  name: bat
  match_if:
    name: all_of
    conditions:
      - name: has_tag
        tag:
          key: call_type
          value: Echolocation
      - name: not
        condition:
          name: has_any_tag
          tags:
            - key: call_type
              value: Social
            - key: class
              value: Not Bat
```

## 3) Verify with a small sample first

Before full training, inspect a small annotation subset and confirm that the
selection logic keeps the events you expect.

## Related pages

- Class mapping: {doc}`define-target-classes`
- ROI mapping: {doc}`configure-roi-mapping`
- Targets reference: {doc}`../reference/targets-config-workflow`
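The combinator semantics above can be mirrored in a few lines of plain Python, which is a handy way to dry-run a `match_if` tree against sample tag lists before training. This stand-alone sketch only imitates the semantics of `has_tag`, `has_any_tag`, `all_of`, `any_of`, and `not` as described here; BatDetect2 interprets the real config in its targets module.

```python
def matches(condition, tags):
    """Evaluate a match_if-style condition against an annotation's tags."""
    name = condition["name"]
    if name == "has_tag":
        return condition["tag"] in tags
    if name == "has_any_tag":
        return any(tag in tags for tag in condition["tags"])
    if name == "all_of":
        return all(matches(c, tags) for c in condition["conditions"])
    if name == "any_of":
        return any(matches(c, tags) for c in condition["conditions"])
    if name == "not":
        return not matches(condition["condition"], tags)
    raise ValueError(f"unknown condition: {name}")


# The combinator example above, as a Python structure:
echolocation_not_social = {
    "name": "all_of",
    "conditions": [
        {"name": "has_tag", "tag": {"key": "call_type", "value": "Echolocation"}},
        {"name": "not", "condition": {"name": "has_any_tag", "tags": [
            {"key": "call_type", "value": "Social"},
            {"key": "class", "value": "Not Bat"},
        ]}},
    ],
}
print(matches(echolocation_not_social,
              [{"key": "call_type", "value": "Echolocation"}]))  # True
```

An annotation tagged both `Echolocation` and `Not Bat` fails the `not` branch and is excluded, which is exactly the behaviour the YAML example encodes.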
docs/source/how_to/define-target-classes.md (new file, 59 lines)
@ -0,0 +1,59 @@
# How to define target classes

Use this guide to map annotations to classification labels used during
training.

## 1) Add classification target entries

Each entry defines a class name and matching tags.

```yaml
classification_targets:
  - name: pippip
    tags:
      - key: class
        value: Pipistrellus pipistrellus
  - name: pippyg
    tags:
      - key: class
        value: Pipistrellus pygmaeus
```

## 2) Use `assign_tags` to control decoded output tags

If you want prediction output tags to differ from matching tags, set
`assign_tags` explicitly.

```yaml
classification_targets:
  - name: pipistrelle_group
    tags:
      - key: class
        value: Pipistrellus pipistrellus
    assign_tags:
      - key: genus
        value: Pipistrellus
```

## 3) Use `match_if` for complex class rules

For advanced conditions, use `match_if` instead of `tags`.

```yaml
classification_targets:
  - name: long_call
    match_if:
      name: duration
      operator: gt
      seconds: 0.02
```

## 4) Confirm class names are unique

`classification_targets.name` values must be unique.

## Related pages

- Detection-target filtering: {doc}`configure-target-definitions`
- ROI mapping: {doc}`configure-roi-mapping`
- Targets config reference: {doc}`../reference/targets-config-workflow`
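A tag-based class entry assigns an annotation to a class when all of the entry's tags are present. The sketch below illustrates one plausible resolution scheme, first matching entry wins; check the targets reference for the exact precedence rules BatDetect2 applies, since this helper (`assign_class`) is ours and not part of the library.

```python
def assign_class(annotation_tags, classification_targets):
    """Return the name of the first class whose tags are all present."""
    for target in classification_targets:
        if all(tag in annotation_tags for tag in target["tags"]):
            return target["name"]
    return None  # no class matched; only the generic detection applies


targets = [
    {"name": "pippip",
     "tags": [{"key": "class", "value": "Pipistrellus pipistrellus"}]},
    {"name": "pippyg",
     "tags": [{"key": "class", "value": "Pipistrellus pygmaeus"}]},
]
print(assign_class([{"key": "class", "value": "Pipistrellus pygmaeus"}],
                   targets))  # pippyg
```

This is also why unique class names matter: if two entries shared a name, the label the model learns would no longer identify a single matching rule.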
docs/source/how_to/import-legacy-batdetect2-annotations.md (new file, 66 lines)
@ -0,0 +1,66 @@
# How to import legacy batdetect2 annotations

Use this guide if your annotations are in older batdetect2 JSON formats.

Two legacy formats are supported:

- `batdetect2`: one annotation JSON file per recording
- `batdetect2_file`: one merged JSON file for many recordings

## 1) Choose the correct source format

Directory-based annotations (`format: batdetect2`):

```yaml
sources:
  - name: legacy_per_file
    format: batdetect2
    audio_dir: /path/to/audio
    annotations_dir: /path/to/annotation_json_dir
```

Merged annotation file (`format: batdetect2_file`):

```yaml
sources:
  - name: legacy_merged
    format: batdetect2_file
    audio_dir: /path/to/audio
    annotations_path: /path/to/merged_annotations.json
```

## 2) Set optional legacy filters

Legacy filters are based on `annotated` and `issues` flags.

```yaml
filter:
  only_annotated: true
  exclude_issues: true
```

To load all entries regardless of flags:

```yaml
filter: null
```

## 3) Validate and convert if needed

Check loaded records:

```bash
batdetect2 data summary path/to/dataset.yaml
```

Convert to annotation-set output for downstream tooling:

```bash
batdetect2 data convert path/to/dataset.yaml --output path/to/output.json
```

## 4) Continue with current workflows

- Run predictions: {doc}`run-batch-predictions`
- Train on imported data: {doc}`../tutorials/train-a-custom-model`
- Field-level reference: {doc}`../reference/data-sources`
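The two legacy filter flags amount to a simple per-record predicate. The sketch below assumes records carry boolean `annotated` and `issues` fields (inferred from the flag names, not from the actual legacy schema) and is only meant to make the filtering semantics concrete:

```python
def keep_record(record, only_annotated=True, exclude_issues=True):
    """Apply legacy-style filtering flags to one annotation record."""
    if only_annotated and not record.get("annotated", False):
        return False  # drop files that were never annotated
    if exclude_issues and record.get("issues", False):
        return False  # drop files flagged with problems
    return True


records = [
    {"id": "a.wav", "annotated": True,  "issues": False},
    {"id": "b.wav", "annotated": False, "issues": False},
    {"id": "c.wav", "annotated": True,  "issues": True},
]
kept = [r["id"] for r in records if keep_record(r)]
print(kept)  # ['a.wav']
```

Setting `filter: null` corresponds to passing both flags as `False`, which keeps all three records.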
@ -2,14 +2,16 @@
How-to guides help you complete specific tasks while working.

## Who this section is for

- Ecologists running repeat analyses.
- Python-savvy users integrating BatDetect2 into workflows.

```{toctree}
:maxdepth: 1

run-batch-predictions
tune-detection-threshold
configure-aoef-dataset
import-legacy-batdetect2-annotations
configure-audio-preprocessing
configure-spectrogram-preprocessing
configure-target-definitions
define-target-classes
configure-roi-mapping
```
@ -82,7 +82,6 @@ tutorials/index
how_to/index
reference/index
explanation/index
-legacy/index
@ -1,14 +0,0 @@
# Legacy documentation

These pages contain existing technical material that predates the Diataxis
reorganization. They remain available during migration.

```{toctree}
:maxdepth: 1

../architecture
../data/index
../preprocessing/index
../postprocessing
../targets/index
```
@ -1,126 +0,0 @@
# Postprocessing: From Model Output to Predictions

## What is Postprocessing?

After the BatDetect2 neural network analyzes a spectrogram, it doesn't directly output a neat list of bat calls.
Instead, it produces raw numerical data, usually in the form of multi-dimensional arrays or "heatmaps".
These arrays contain information like:

- The probability of a sound event being present at each time-frequency location.
- The probability of each possible target class (e.g., species) at each location.
- Predicted size characteristics (like duration and bandwidth) at each location.
- Internal learned features at each location.

**Postprocessing** is the sequence of steps that takes these numerical model outputs and translates them into a structured list of detected sound events, complete with predicted tags, bounding boxes, and confidence scores.
The {py:mod}`batdetect2.postprocess` module handles this entire workflow.

## Why is Postprocessing Necessary?

1. **Interpretation:** Raw heatmap outputs need interpretation to identify distinct sound events (detections).
   A high probability score might spread across several adjacent time-frequency bins, all related to the same call.
2. **Refinement:** Model outputs can be noisy or contain redundancies.
   Postprocessing steps like Non-Maximum Suppression (NMS) clean this up, ensuring (ideally) only one detection is reported for each actual sound event.
3. **Contextualization:** Raw outputs lack real-world units.
   Postprocessing adds back time (seconds) and frequency (Hz) coordinates, converts predicted sizes to physical units using configured scales, and decodes predicted class indices back into meaningful tags based on your target definitions.
4. **User Control:** Postprocessing includes tunable parameters, most importantly **thresholds**.
   By adjusting these, you can control the trade-off between finding more potential calls (sensitivity) versus reducing false positives (specificity) _without needing to retrain the model_.
## The Postprocessing Pipeline

BatDetect2 applies a series of steps to convert the raw model output into final predictions.
Understanding these steps helps interpret the results and configure the process effectively:

1. **Non-Maximum Suppression (NMS):**
   - **Goal:** Reduce redundant detections.
     If the model outputs high scores for several nearby points corresponding to the same call, NMS selects the single highest peak in a local neighbourhood and suppresses the others (sets their score to zero).
   - **Configurable:** The size of the neighbourhood (`nms_kernel_size`) can be adjusted.

2. **Coordinate Remapping:**
   - **Goal:** Add coordinate (time/frequency) information.
     This step takes the grid-based model outputs (which just have row/column indices) and associates them with actual time (seconds) and frequency (Hz) coordinates based on the input spectrogram's properties.
     The result is coordinate-aware arrays (using {py:class}`xarray.DataArray`).

3. **Detection Extraction:**
   - **Goal:** Identify the specific points representing detected events.
   - **Process:** Looks for peaks in the NMS-processed detection heatmap that are above a certain confidence level (`detection_threshold`).
     It also often limits the maximum number of detections based on a rate (`top_k_per_sec`) to avoid excessive outputs in very busy files.
   - **Configurable:** `detection_threshold`, `top_k_per_sec`.

4. **Data Extraction:**
   - **Goal:** Gather all relevant information for each detected point.
   - **Process:** For each time-frequency location identified in Step 3, this step looks up the corresponding values in the _other_ remapped model output arrays (class probabilities, predicted sizes, internal features).
   - **Intermediate Output 1:** The result of this stage (containing aligned scores, positions, sizes, class probabilities, and features for all detections in a clip) is often accessible programmatically as an {py:class}`xarray.Dataset`.
     This can be useful for advanced users needing direct access to the numerical outputs.

5. **Decoding & Formatting:**
   - **Goal:** Convert the extracted numerical data into interpretable, standard formats.
   - **Process:**
     - **ROI Recovery:** Uses the predicted position and size values, along with the ROI mapping configuration defined in the `targets` module, to reconstruct an estimated bounding box ({py:class}`soundevent.data.BoundingBox`).
     - **Class Decoding:** Translates the numerical class probability vector into meaningful {py:class}`soundevent.data.PredictedTag` objects.
       This involves:
       - Applying the `classification_threshold` to ignore low-confidence class scores.
       - Using the class decoding rules (from the `targets` module) to map the name(s) of the high-scoring class(es) back to standard tags (like `species: Myotis daubentonii`).
       - Optionally selecting only the top-scoring class or multiple classes above the threshold.
       - Including the generic "Bat" tags if no specific class meets the threshold.
     - **Feature Conversion:** Converts raw feature vectors into {py:class}`soundevent.data.Feature` objects.
   - **Intermediate Output 2:** This step might internally create a list of simplified `RawPrediction` objects containing the bounding box, scores, and features.
     This intermediate list might also be accessible programmatically for users who prefer a simpler structure than the final {py:mod}`soundevent` objects.

6. **Final Output (`ClipPrediction`):**
   - **Goal:** Package everything into a standard format.
   - **Process:** Collects all the fully processed `SoundEventPrediction` objects (each containing a sound event with geometry, features, overall score, and predicted tags with scores) for a given audio clip into a final {py:class}`soundevent.data.ClipPrediction` object.
     This is the standard output format representing the model's findings for that clip.
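The NMS step described above is essentially a sliding-window maximum filter: a score survives only if it is the peak of its local neighbourhood. The pure-Python sketch below illustrates that suppression rule on a tiny 2-D grid; the actual implementation operates on tensors, so treat this only as an illustration.

```python
def nms(heatmap, kernel_size=3):
    """Zero out every score that is not the maximum of its local window."""
    rows, cols = len(heatmap), len(heatmap[0])
    half = kernel_size // 2
    out = [[0.0] * cols for _ in range(rows)]
    for i in range(rows):
        for j in range(cols):
            # Collect the kernel_size x kernel_size neighbourhood (clipped at edges).
            window = [
                heatmap[r][c]
                for r in range(max(0, i - half), min(rows, i + half + 1))
                for c in range(max(0, j - half), min(cols, j + half + 1))
            ]
            if heatmap[i][j] == max(window) and heatmap[i][j] > 0:
                out[i][j] = heatmap[i][j]  # local peak survives
    return out


scores = [
    [0.1, 0.6, 0.5, 0.0],
    [0.2, 0.9, 0.4, 0.0],  # 0.9 is a local peak; its neighbours get suppressed
    [0.0, 0.3, 0.0, 0.7],
]
print(nms(scores))
```

Only the 0.9 and 0.7 peaks survive; everything adjacent to them is zeroed, which is why a larger `nms_kernel_size` merges peaks that are close together.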
## Configuring Postprocessing

You can control key aspects of this pipeline, especially the thresholds and NMS settings, via a `postprocess:` section in your main configuration YAML file.
Adjusting these **allows you to fine-tune the detection results without retraining the model**.

**Key Configurable Parameters:**

- `detection_threshold`: (Number >= 0, e.g., `0.1`) Minimum score for a peak to be considered a detection.
  **Lowering this increases sensitivity (more detections, potentially more false positives); raising it increases specificity (fewer detections, potentially missing faint calls).**
- `classification_threshold`: (Number >= 0, e.g., `0.3`) Minimum score for a _specific class_ prediction to be assigned as a tag.
  Affects how confidently the model must identify the class.
- `top_k_per_sec`: (Integer > 0, e.g., `200`) Limits the maximum density of detections reported per second.
  Helps manage extremely dense recordings.
- `nms_kernel_size`: (Integer > 0, e.g., `9`) Size of the NMS window in pixels/bins.
  Affects how close two distinct peaks can be before one suppresses the other.

**Example YAML Configuration:**

```yaml
# Inside your main configuration file (e.g., config.yaml)

postprocess:
  nms_kernel_size: 9
  detection_threshold: 0.1 # Lower threshold -> more sensitive
  classification_threshold: 0.3 # Higher threshold -> more confident classifications
  top_k_per_sec: 200

# ... other sections (preprocessing, targets, ...)
```

**Note:** These parameters can often also be adjusted via Command Line Interface (CLI) arguments when running predictions, or through function arguments if using the Python API, providing flexibility for experimentation.
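The interplay of `detection_threshold` and `top_k_per_sec` can be pictured as a two-stage filter over the NMS-surviving peaks: threshold first, then cap the count, keeping the strongest scores. This is a schematic sketch (the `extract_detections` helper is ours, not the library's code):

```python
def extract_detections(peaks, detection_threshold=0.1,
                       top_k_per_sec=200, clip_duration=1.0):
    """peaks: list of (score, time, freq) candidates after NMS."""
    budget = int(top_k_per_sec * clip_duration)  # max detections allowed
    kept = [p for p in peaks if p[0] >= detection_threshold]
    kept.sort(key=lambda p: p[0], reverse=True)  # strongest first
    return kept[:budget]


peaks = [(0.92, 0.10, 45_000), (0.05, 0.12, 33_000), (0.40, 0.31, 52_000)]
print(extract_detections(peaks, detection_threshold=0.1, top_k_per_sec=2))
```

Note that the rate cap only bites in dense recordings; with sparse detections the threshold alone decides what survives.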
## Accessing Intermediate Results

While the final `ClipPrediction` objects are the standard output, the `Postprocessor` object used internally provides methods to access results from intermediate stages (like the `xr.Dataset` after Step 4, or the list of `RawPrediction` objects after Step 5).

This can be valuable for:

- Debugging the pipeline.
- Performing custom analyses on the numerical outputs before final decoding.
- **Transfer Learning / Feature Extraction:** Directly accessing the extracted `features` (from Step 4 or 5) associated with detected events can be highly useful for training other models or further analysis.

Consult the API documentation for details on how to access these intermediate results programmatically if needed.

## Summary

Postprocessing is the conversion between neural network outputs and meaningful, interpretable sound event detections.
BatDetect2 provides a configurable pipeline including NMS, coordinate remapping, peak detection with thresholding, data extraction, and class/geometry decoding.
Researchers can easily tune key parameters like thresholds via configuration files or arguments to adjust the final set of predictions without altering the trained model itself, and advanced users can access intermediate results for custom analyses or feature reuse.
@ -1,92 +0,0 @@
# Audio Loading and Preprocessing

## Purpose

Before BatDetect2 can analyze the sounds in your recordings, the raw audio data needs to be loaded from the file and prepared.
This initial preparation involves several standard waveform processing steps.
The `audio` module handles this first stage of preprocessing.

It's crucial to understand that the _exact same_ preprocessing steps must be applied both when **training** a model and when **using** that trained model later to make predictions (inference).
Consistent preprocessing ensures the model receives data in the format it expects.

BatDetect2 allows you to control these audio preprocessing steps through settings in your main configuration file.

## The Audio Processing Pipeline

When BatDetect2 needs to process an audio segment (either a full recording or a specific clip), it follows a defined sequence of steps:

1. **Load Audio Segment:** The system first reads the specified time segment from the audio file.
   - **Note:** BatDetect2 typically works with **mono** audio.
     By default, if your file has multiple channels (e.g., stereo), only the **first channel** is loaded and used for subsequent processing.
2. **Adjust Duration (Optional):** If you've specified a target duration in your configuration, the loaded audio segment is either shortened (by cropping from the start) or lengthened (by adding silence, i.e., zeros, at the end) to match that exact duration.
   This is sometimes required by specific model architectures that expect fixed-size inputs.
   By default, this step is **off**, and the original clip duration is used.
3. **Resample (Optional):** If configured (and usually **on** by default), the audio's sample rate is changed to a specific target value (e.g., 256,000 Hz).
   This is vital for standardizing the data, as different recording devices capture audio at different rates.
   The model needs to be trained and run on data with a consistent sample rate.
4. **Center Waveform (Optional):** If configured (and typically **on** by default), the system removes any constant shift away from zero in the waveform (known as DC offset).
   This is a standard practice that can sometimes improve the quality of later signal processing steps.
5. **Scale Amplitude (Optional):** If configured (typically **off** by default), the waveform's amplitude (loudness) is adjusted.
   The standard method used here is "peak normalization," which scales the entire clip so that the loudest point has an absolute value of 1.0.
   This can help standardize volume levels across different recordings, although it's not always necessary or desirable depending on your analysis goals.
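Steps 4 and 5 above (centering and peak normalization) are simple per-sample arithmetic. The stdlib sketch below shows that arithmetic on plain lists; BatDetect2 applies the equivalent operations on arrays, so this is only a conceptual illustration:

```python
def center(waveform):
    """Remove DC offset by subtracting the mean sample value."""
    mean = sum(waveform) / len(waveform)
    return [s - mean for s in waveform]


def peak_normalize(waveform):
    """Scale so the loudest sample has absolute value 1.0."""
    peak = max(abs(s) for s in waveform)
    return [s / peak for s in waveform] if peak > 0 else list(waveform)


wave = [0.5, 1.0, 0.5, 0.0]            # waveform with a constant +0.5 offset
centered = center(wave)                 # [0.0, 0.5, 0.0, -0.5]
normalized = peak_normalize(centered)   # [0.0, 1.0, 0.0, -1.0]
```

The guard for an all-silent clip (peak of zero) matters in practice: padded or empty segments would otherwise trigger a division by zero.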
## Configuring Audio Processing

You can control these steps via settings in your main configuration file (e.g., `config.yaml`), usually within a dedicated `audio:` section (which might itself be under a broader `preprocessing:` section).

Here are the key options you can set:

- **Resampling (`resample`)**:
  - To enable resampling (recommended and usually default), include a `resample:` block.
    To disable it completely, you might set `resample: null` or omit the block.
  - `samplerate`: (Number) The target sample rate in Hertz (Hz) that all audio will be converted to.
    This **must** match the sample rate expected by the BatDetect2 model you are using or training (e.g., `samplerate: 256000`).
  - `mode`: (Text, `"poly"` or `"fourier"`) The underlying algorithm used for resampling.
    The default `"poly"` is generally a good choice.
    You typically don't need to change this unless you have specific reasons.

- **Duration (`duration`)**:
  - (Number or `null`) Sets a fixed duration for all audio clips in **seconds**.
    If set (e.g., `duration: 4.0`), shorter clips are padded with silence, and longer clips are cropped.
    If `null` (default), the original clip duration is used.

- **Centering (`center`)**:
  - (Boolean, `true` or `false`) Controls DC offset removal.
    Default is usually `true`.
    Set to `false` to disable.

- **Scaling (`scale`)**:
  - (Boolean, `true` or `false`) Controls peak amplitude normalization.
    Default is usually `false`.
    Set to `true` to enable scaling so the maximum absolute amplitude becomes 1.0.

**Example YAML Configuration:**

```yaml
# Inside your main configuration file (e.g., training_config.yaml)

preprocessing: # Or this might be at the top level
  audio:
    # --- Resampling Settings ---
    resample: # Settings block to control resampling
      samplerate: 256000 # Target sample rate in Hz (required if resampling)
      mode: poly # Algorithm ('poly' or 'fourier', optional, defaults to 'poly')
    # To disable resampling entirely, you might use:
    # resample: null

    # --- Other Settings ---
    duration: null # Keep original clip duration (e.g., use 4.0 for 4 seconds)
    center: true # Remove DC offset (default is often true)
    scale: false # Do not normalize peak amplitude (default is often false)

# ... other configuration sections (like model, dataset, targets) ...
```
## Outcome

After these steps, the output is a standardized audio waveform (represented as a numerical array with time information).
This processed waveform is now ready for the next stage of preprocessing, which typically involves calculating the spectrogram (covered in the next module/section).
Ensuring these audio preprocessing settings are consistent is fundamental for achieving reliable results in both training and inference.
@ -1,46 +0,0 @@
|
|||||||
# Preprocessing Audio for BatDetect2
|
|
||||||
|
|
||||||
## What is Preprocessing?
|
|
||||||
|
|
||||||
Preprocessing refers to the steps taken to transform your raw audio recordings into a standardized format suitable for analysis by the BatDetect2 deep learning model.
|
|
||||||
This module (`batdetect2.preprocessing`) provides the tools to perform these transformations.
|
|
||||||
|
|
||||||
## Why is Preprocessing Important?
|
|
||||||
|
|
||||||
Applying a consistent preprocessing pipeline is important for several reasons:
|
|
||||||
|
|
||||||
1. **Standardization:** Audio recordings vary significantly depending on the equipment used, recording conditions, and settings (e.g., different sample rates, varying loudness levels, background noise).
|
|
||||||
Preprocessing helps standardize these aspects, making the data more uniform and allowing the model to learn relevant patterns more effectively.
|
|
||||||
2. **Model Requirements:** Deep learning models, particularly those like BatDetect2 that analyze 2D-patterns in spectrograms, are designed to work with specific input characteristics.
|
|
||||||
They often expect spectrograms of a certain size (time x frequency bins), with values represented on a particular scale (e.g., logarithmic/dB), and within a defined frequency range.
|
|
||||||
Preprocessing ensures the data meets these requirements.
|
|
||||||
3. **Consistency is Key:** Perhaps most importantly, the **exact same preprocessing steps** must be applied both when _training_ the model and when _using the trained model to make predictions_ (inference) on new data.
|
|
||||||
Any discrepancy between the preprocessing used during training and inference can significantly degrade the model's performance and lead to unreliable results.
|
|
||||||
BatDetect2's configurable pipeline ensures this consistency.
|
|
||||||
|
|
||||||
## How Preprocessing is Done in BatDetect2
|
|
||||||
|
|
||||||
BatDetect2 handles preprocessing through a configurable, two-stage pipeline:
|
|
||||||
|
|
||||||
1. **Audio Loading & Preparation:** This first stage deals with the raw audio waveform.
|
|
||||||
It involves loading the specified audio segment (from a file or clip), selecting a single channel (mono), optionally resampling it to a consistent sample rate, optionally adjusting its duration, and applying basic waveform conditioning like centering (DC offset removal) and amplitude scaling.
|
|
||||||
(Details in the {doc}`audio` section).
|
|
||||||
2. **Spectrogram Generation:** The prepared audio waveform is then converted into a spectrogram.
|
|
||||||
This involves calculating the Short-Time Fourier Transform (STFT) and then applying a series of configurable steps like cropping the frequency range, applying amplitude representations (like dB scale or PCEN), optional denoising, optional resizing to the model's required dimensions, and optional final normalization.
|
|
||||||
(Details in the {doc}`spectrogram` section).
|
|
||||||
|
|
||||||
The entire pipeline is controlled via settings in your main configuration file (typically a YAML file), usually grouped under a `preprocessing:` section which contains subsections like `audio:` and `spectrogram:`.
|
|
||||||
This allows you to easily define, share, and reproduce the exact preprocessing used for a specific model or experiment.
|
|
||||||
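As a minimal sketch, that configuration layout looks like this (subsection contents elided; see the dedicated sections for the actual fields):

```yaml
# Skeleton of the two-stage preprocessing configuration.
preprocessing:
  audio:
    # ... audio loading & preparation settings (stage 1) ...
  spectrogram:
    # ... spectrogram generation settings (stage 2) ...
```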

## Next Steps

Explore the following sections for detailed explanations of how to configure each stage of the preprocessing pipeline and how to use the resulting preprocessor:

```{toctree}
:maxdepth: 1
:caption: Preprocessing Steps:

audio
spectrogram
usage
```
@ -1,141 +0,0 @@
|
|||||||
# Spectrogram Generation
|
|
||||||
|
|
||||||
## Purpose
|
|
||||||
|
|
||||||
After loading and performing initial processing on the audio waveform (as described in the Audio Loading section), the next crucial step in the `preprocessing` pipeline is to convert that waveform into a **spectrogram**.
|
|
||||||
A spectrogram is a visual representation of sound, showing frequency content over time, and it's the primary input format for many deep learning models, including BatDetect2.
|
|
||||||
|
|
||||||
This module handles the calculation and subsequent processing of the spectrogram.
|
|
||||||
Just like the audio processing, these steps need to be applied **consistently** during both model training and later use (inference) to ensure the model performs reliably.
|
|
||||||
You control this entire process through the configuration file.
|
|
||||||
|
|
||||||
## The Spectrogram Generation Pipeline
|
|
||||||
|
|
||||||
Once BatDetect2 has a prepared audio waveform, it follows these steps to create the final spectrogram input for the model:
|
|
||||||
|
|
||||||
1. **Calculate STFT (Short-Time Fourier Transform):** This is the fundamental step that converts the 1D audio waveform into a 2D time-frequency representation.
|
|
||||||
It calculates the frequency content within short, overlapping time windows.
|
|
||||||
The output is typically a **magnitude spectrogram**, showing the intensity (amplitude) of different frequencies at different times.
|
|
||||||
Key parameters here are the `window_duration` and `window_overlap`, which affect the trade-off between time and frequency resolution.
|
|
||||||
2. **Crop Frequencies:** The STFT often produces frequency information over a very wide range (e.g., 0 Hz up to half the sample rate).
|
|
||||||
This step crops the spectrogram to focus only on the frequency range relevant to your target sounds (e.g., 10 kHz to 120 kHz for typical bat echolocation).
|
|
||||||
3. **Apply PCEN (Optional):** If configured, Per-Channel Energy Normalization is applied.
|
|
||||||
PCEN is an adaptive technique that adjusts the gain (loudness) in each frequency channel based on its recent history.
|
|
||||||
It can help suppress stationary background noise and enhance the prominence of transient sounds like echolocation pulses.
|
|
||||||
This step is optional.
|
|
||||||
4. **Set Amplitude Scale / Representation:** The values in the spectrogram (either raw magnitude or post-PCEN values) need to be represented on a suitable scale.
|
|
||||||
You choose one of the following:
|
|
||||||
- `"amplitude"`: Use the linear magnitude values directly.
|
|
||||||
(Default)
|
|
||||||
- `"power"`: Use the squared magnitude values (representing energy).
|
|
||||||
- `"dB"`: Apply a logarithmic transformation (specifically `log(1 + C*Magnitude)`).
|
|
||||||
This compresses the range of values, often making variations in quieter sounds more apparent, similar to how humans perceive loudness.
|
|
||||||
5. **Denoise (Optional):** If configured (and usually **on** by default), a simple noise reduction technique is applied.
|
|
||||||
This method subtracts the average value of each frequency bin (calculated across time) from that bin, assuming the average represents steady background noise.
|
|
||||||
Negative values after subtraction are clipped to zero.
|
|
||||||
6. **Resize (Optional):** If configured, the dimensions (height/frequency bins and width/time bins) of the spectrogram are adjusted using interpolation to match the exact input size expected by the neural network architecture.
|
|
||||||
7. **Peak Normalize (Optional):** If configured (typically **off** by default), the entire final spectrogram is scaled so that its highest value is exactly 1.0.
|
|
||||||
This ensures all spectrograms fed to the model have a consistent maximum value, which can sometimes aid training stability.
|
|
||||||
|
|
||||||
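The numbered steps above can be sketched with NumPy/SciPy as follows (an illustrative approximation, not BatDetect2's actual implementation; the window settings and the compression constant `C` are assumptions):

```python
import numpy as np
from scipy.signal import stft

samplerate = 256_000
wav = np.random.randn(samplerate)  # 1 s of dummy audio

# 1. STFT: 2 ms window with 75% overlap.
nperseg = int(0.002 * samplerate)   # window_duration
noverlap = int(nperseg * 0.75)      # window_overlap
freqs, times, Z = stft(wav, fs=samplerate, nperseg=nperseg, noverlap=noverlap)
spec = np.abs(Z)                    # magnitude spectrogram

# 2. Crop to the band of interest (10-120 kHz).
mask = (freqs >= 10_000) & (freqs <= 120_000)
spec = spec[mask]

# 4. "dB"-style log compression: log(1 + C * magnitude).
C = 200.0  # illustrative compression constant
spec = np.log1p(C * spec)

# 5. Denoise: subtract each frequency bin's mean over time, clip at zero.
spec = np.clip(spec - spec.mean(axis=1, keepdims=True), 0, None)

print(spec.shape)  # (frequency_bins, time_bins)
```

Resizing and peak normalization (steps 6-7) would follow the same array, e.g. via interpolation and division by `spec.max()`.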

## Configuring Spectrogram Generation

You control all these steps via settings in your main configuration file (e.g., `config.yaml`), within the `spectrogram:` section (usually located under the main `preprocessing:` section).

Here are the key configuration options:

- **STFT Settings (`stft`)**:
  - `window_duration`: (Number, seconds, e.g., `0.002`) Length of the analysis window.
  - `window_overlap`: (Number, 0.0 to <1.0, e.g., `0.75`) Fractional overlap between windows.
  - `window_fn`: (Text, e.g., `"hann"`) Name of the windowing function.
- **Frequency Cropping (`frequencies`)**:
  - `min_freq`: (Integer, Hz, e.g., `10000`) Minimum frequency to keep.
  - `max_freq`: (Integer, Hz, e.g., `120000`) Maximum frequency to keep.
- **PCEN (`pcen`)**:
  - This entire section is **optional**. Include it only if you want to apply PCEN. If omitted or set to `null`, PCEN is skipped.
  - `time_constant`: (Number, seconds, e.g., `0.4`) Controls adaptation speed.
  - `gain`: (Number, e.g., `0.98`) Gain factor.
  - `bias`: (Number, e.g., `2.0`) Bias factor.
  - `power`: (Number, e.g., `0.5`) Compression exponent.
- **Amplitude Scale (`scale`)**:
  - (Text: `"dB"`, `"power"`, or `"amplitude"`) Selects the final representation of the spectrogram values. Default is `"amplitude"`.
- **Denoising (`spectral_mean_substraction`)**:
  - (Boolean: `true` or `false`) Enables/disables the spectral mean subtraction denoising step. Default is usually `true`.
- **Resizing (`size`)**:
  - This entire section is **optional**. Include it only if you need to resize the spectrogram to specific dimensions required by the model. If omitted or set to `null`, no resizing occurs after frequency cropping.
  - `height`: (Integer, e.g., `128`) Target number of frequency bins.
  - `resize_factor`: (Number or `null`, e.g., `0.5`) Factor to scale the time dimension by. `0.5` halves the width; `null` or `1.0` keeps the original width.
- **Peak Normalization (`peak_normalize`)**:
  - (Boolean: `true` or `false`) Enables/disables final scaling of the entire spectrogram so the maximum value is 1.0. Default is usually `false`.

**Example YAML Configuration:**

```yaml
# Inside your main configuration file

preprocessing:
  audio:
    # ... (your audio configuration settings) ...
    resample:
      samplerate: 256000 # Ensure this matches model needs

  spectrogram:
    # --- STFT Parameters ---
    stft:
      window_duration: 0.002 # 2 ms window
      window_overlap: 0.75 # 75% overlap
      window_fn: hann

    # --- Frequency Range ---
    frequencies:
      min_freq: 10000 # 10 kHz
      max_freq: 120000 # 120 kHz

    # --- PCEN (Optional) ---
    # Include this block to enable PCEN; omit or set to null to disable.
    pcen:
      time_constant: 0.4
      gain: 0.98
      bias: 2.0
      power: 0.5

    # --- Final Amplitude Representation ---
    scale: dB # Choose 'dB', 'power', or 'amplitude'

    # --- Denoising ---
    spectral_mean_substraction: true # Enable spectral mean subtraction

    # --- Resizing (Optional) ---
    # Include this block to resize; omit or set to null to disable.
    size:
      height: 128 # Target height in frequency bins
      resize_factor: 0.5 # Halve the number of time bins

    # --- Final Normalization ---
    peak_normalize: false # Do not scale max value to 1.0
```

## Outcome

The output of this module is the final, processed spectrogram (a 2D numerical array with time and frequency information).
This spectrogram is now in the precise format expected by the BatDetect2 neural network, ready to be used for training the model or for making predictions on new data.
Remember, using the exact same `spectrogram` configuration settings during training and inference is essential for correct model performance.
@ -1,175 +0,0 @@
|
|||||||
# Using Preprocessors in BatDetect2
|
|
||||||
|
|
||||||
## Overview
|
|
||||||
|
|
||||||
In the previous sections ({doc}`audio`and {doc}`spectrogram`), we discussed the individual steps involved in converting raw audio into a processed spectrogram suitable for BatDetect2 models, and how to configure these steps using YAML files (specifically the `audio:` and `spectrogram:` sections within a main `preprocessing:` configuration block).
|
|
||||||
|
|
||||||
This page focuses on how this configured pipeline is represented and used within BatDetect2, primarily through the concept of a **`Preprocessor`** object.
|
|
||||||
This object bundles together your chosen audio loading settings and spectrogram generation settings into a single component that can perform the end-to-end processing.
|
|
||||||
|
|
||||||
## Do I Need to Interact with Preprocessors Directly?
|
|
||||||
|
|
||||||
**Usually, no.** For standard model training or running inference with BatDetect2 using the provided scripts, the system will automatically:
|
|
||||||
|
|
||||||
1. Read your main configuration file (e.g., `config.yaml`).
|
|
||||||
2. Find the `preprocessing:` section (containing `audio:` and `spectrogram:` settings).
|
|
||||||
3. Build the appropriate `Preprocessor` object internally based on your settings.
|
|
||||||
4. Use that internal `Preprocessor` object automatically whenever audio needs to be loaded and converted to a spectrogram.
|
|
||||||
|
|
||||||
**However**, understanding the `Preprocessor` object is useful if you want to:
|
|
||||||
|
|
||||||
- **Customize:** Go beyond the standard configuration options by interacting with parts of the pipeline programmatically.
|
|
||||||
- **Integrate:** Use BatDetect2's preprocessing steps within your own custom Python analysis scripts.
|
|
||||||
- **Inspect/Debug:** Manually apply preprocessing steps to specific files or clips to examine intermediate outputs (like the processed waveform) or the final spectrogram.
|
|
||||||
|
|
||||||
## Getting a Preprocessor Object
|
|
||||||
|
|
||||||
If you _do_ want to work with the preprocessor programmatically, you first need to get an instance of it.
|
|
||||||
This is typically done based on a configuration:
|
|
||||||
|
|
||||||

1. **Define Configuration:** Create your `preprocessing:` configuration, usually in a YAML file (let's call it `preprocess_config.yaml`), detailing your desired `audio` and `spectrogram` settings.

   ```yaml
   # preprocess_config.yaml
   audio:
     resample:
       samplerate: 256000
     # ... other audio settings ...
   spectrogram:
     frequencies:
       min_freq: 15000
       max_freq: 120000
     scale: dB
     # ... other spectrogram settings ...
   ```

2. **Load Configuration & Build Preprocessor (in Python):**

   ```python
   from batdetect2.preprocessing import load_preprocessing_config, build_preprocessor
   from batdetect2.preprocess.types import Preprocessor  # Import the type

   # Load the configuration from the file
   config_path = "path/to/your/preprocess_config.yaml"
   preprocessing_config = load_preprocessing_config(config_path)

   # Build the actual preprocessor object using the loaded config
   preprocessor: Preprocessor = build_preprocessor(preprocessing_config)

   # 'preprocessor' is now ready to use!
   ```

3. **Using Defaults:** If you just want the standard BatDetect2 default preprocessing settings, you can build one without loading a config file:

   ```python
   from batdetect2.preprocessing import build_preprocessor
   from batdetect2.preprocess.types import Preprocessor

   # Build with default settings
   default_preprocessor: Preprocessor = build_preprocessor()
   ```

## Applying Preprocessing

Once you have a `preprocessor` object, you can use its methods to process audio data:

**1. End-to-End Processing (Common Use Case):**

These methods take an audio source identifier (file path, Recording object, or Clip object) and return the final, processed spectrogram.

- `preprocessor.preprocess_file(path)`: Processes an entire audio file.
- `preprocessor.preprocess_recording(recording_obj)`: Processes the entire audio associated with a `soundevent.data.Recording` object.
- `preprocessor.preprocess_clip(clip_obj)`: Processes only the specific time segment defined by a `soundevent.data.Clip` object.
- **Efficiency Note:** Using `preprocess_clip` is **highly recommended** when you are only interested in analyzing a small portion of a potentially long recording.
  It avoids loading the entire audio file into memory, making it much more efficient.

```python
from soundevent import data

# Assume 'preprocessor' is built as shown before
# Assume 'my_clip' is a soundevent.data.Clip object defining a segment

# Process an entire file
spectrogram_from_file = preprocessor.preprocess_file("my_recording.wav")

# Process only a specific clip (more efficient for segments)
spectrogram_from_clip = preprocessor.preprocess_clip(my_clip)

# The results (spectrogram_from_file, spectrogram_from_clip) are xr.DataArrays
print(type(spectrogram_from_clip))
# Output: <class 'xarray.core.dataarray.DataArray'>
```

**2. Intermediate Steps (Advanced Use Cases):**

The preprocessor also allows access to intermediate stages if needed:

- `preprocessor.load_clip_audio(clip_obj)` (and similar for file/recording): Loads the audio and applies _only_ the waveform processing steps (resampling, centering, etc.) defined in the `audio` config.
  Returns the processed waveform as an `xr.DataArray`.
  This is useful if you want to analyze or manipulate the waveform itself before spectrogram generation.
- `preprocessor.compute_spectrogram(waveform)`: Takes an _already loaded_ waveform (either `np.ndarray` or `xr.DataArray`) and applies _only_ the spectrogram generation steps defined in the `spectrogram` config.
  - If you provide an `xr.DataArray` (e.g., from `load_clip_audio`), it uses the sample rate from the array's coordinates.
  - If you provide a raw `np.ndarray`, it **must assume a sample rate**.
    It uses the `default_samplerate` that was determined when the `preprocessor` was built (based on your `audio` config's resample settings or the global default).
    Be cautious when using NumPy arrays to ensure the sample rate assumption is correct for your data!

```python
# Example: Get waveform first, then spectrogram
waveform = preprocessor.load_clip_audio(my_clip)
# waveform is an xr.DataArray

# ...potentially do other things with the waveform...

# Compute spectrogram from the loaded waveform
spectrogram = preprocessor.compute_spectrogram(waveform)

# Example: Process external numpy array (use with caution re: sample rate)
# import soundfile as sf  # Requires installing soundfile
# numpy_waveform, original_sr = sf.read("some_other_audio.wav")
# # MUST ensure numpy_waveform's actual sample rate matches
# # preprocessor.default_samplerate for correct results here!
# spec_from_numpy = preprocessor.compute_spectrogram(numpy_waveform)
```

## Understanding the Output: `xarray.DataArray`

All preprocessing methods return the final spectrogram (or the intermediate waveform) as an **`xarray.DataArray`**.

**What is it?** Think of it like a standard NumPy array (holding the numerical data of the spectrogram) but with added "superpowers":

- **Labeled Dimensions:** Instead of just having axis 0 and axis 1, the dimensions have names, typically `"frequency"` and `"time"`.
- **Coordinates:** It stores the actual frequency values (e.g., in Hz) corresponding to each row and the actual time values (e.g., in seconds) corresponding to each column along the dimensions.

**Why is it used?**

- **Clarity:** The data is self-describing.
  You don't need to remember which axis is time and which is frequency, or what the units are – it's stored with the data.
- **Convenience:** You can select, slice, or plot data using the real-world coordinate values (times, frequencies) instead of just numerical indices.
  This makes analysis code easier to write and less prone to errors.
- **Metadata:** It can hold additional metadata about the processing steps in its `attrs` (attributes) dictionary.

**Using the Output:**

- **Input to Model:** For standard training or inference, you typically pass this `xr.DataArray` spectrogram directly to the BatDetect2 model functions.
- **Inspection/Analysis:** If you're working programmatically, you can use xarray's powerful features.
  For example (these are just illustrations of xarray):

```python
# Get the shape (frequency_bins, time_bins)
# print(spectrogram.shape)

# Get the frequency coordinate values
# print(spectrogram['frequency'].values)

# Select data near a specific time and frequency
# value_at_point = spectrogram.sel(time=0.5, frequency=50000, method="nearest")
# print(value_at_point)

# Select a time slice between 0.2 and 0.3 seconds
# time_slice = spectrogram.sel(time=slice(0.2, 0.3))
# print(time_slice.shape)
```

In summary, while BatDetect2 often handles preprocessing automatically based on your configuration, the underlying `Preprocessor` object provides a flexible interface for applying these steps programmatically if needed, returning results in the convenient and informative `xarray.DataArray` format.
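These xarray operations can be run end-to-end on a dummy spectrogram (assuming `xarray` and NumPy are installed; the data and coordinates below are fabricated, not produced by BatDetect2):

```python
import numpy as np
import xarray as xr

# Build a dummy spectrogram: 128 frequency bins x 100 time bins.
freqs = np.linspace(10_000, 120_000, 128)
times = np.linspace(0.0, 1.0, 100)
spectrogram = xr.DataArray(
    np.random.rand(128, 100),
    coords={"frequency": freqs, "time": times},
    dims=("frequency", "time"),
)

# Select by real-world coordinates instead of integer indices.
value = spectrogram.sel(time=0.5, frequency=50_000, method="nearest")
window = spectrogram.sel(time=slice(0.2, 0.3))

print(spectrogram.shape)  # (128, 100)
print(window.sizes["time"])
```

Note how `sel` with `slice(0.2, 0.3)` picks time bins by their coordinate values in seconds, which is exactly why the labeled format is convenient.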
docs/source/reference/data-sources.md (new file)
@ -0,0 +1,76 @@

# Data source reference

This page summarizes dataset source formats and their config fields.

## Supported source formats

| Format | Description |
| --- | --- |
| `aoef` | AOEF/soundevent annotation files (`AnnotationSet` or `AnnotationProject`) |
| `batdetect2` | Legacy format with one JSON annotation file per recording |
| `batdetect2_file` | Legacy format with one merged JSON annotation file |

## AOEF (`format: aoef`)

Required fields:

- `name`
- `format`
- `audio_dir`
- `annotations_path`

Optional fields:

- `description`
- `filter`

`filter` is only used when `annotations_path` points to an `AnnotationProject`.

AOEF filter options:

- `only_completed` (default: `true`)
- `only_verified` (default: `false`)
- `exclude_issues` (default: `true`)

Use `filter: null` to disable project filtering.

## Legacy per-file (`format: batdetect2`)

Required fields:

- `name`
- `format`
- `audio_dir`
- `annotations_dir`

Optional fields:

- `description`
- `filter`

## Legacy merged file (`format: batdetect2_file`)

Required fields:

- `name`
- `format`
- `audio_dir`
- `annotations_path`

Optional fields:

- `description`
- `filter`

Legacy filter options:

- `only_annotated` (default: `true`)
- `exclude_issues` (default: `true`)

Use `filter: null` to disable filtering.

## Related guides

- {doc}`../how_to/configure-aoef-dataset`
- {doc}`../how_to/import-legacy-batdetect2-annotations`
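Putting the `aoef` fields together, a hedged sketch of one source entry (the surrounding list context, dataset name, and paths are illustrative assumptions):

```yaml
- name: example-aoef-dataset   # hypothetical dataset name
  format: aoef
  audio_dir: data/audio
  annotations_path: data/annotations/project.aoef
  description: Example AOEF-annotated dataset  # optional
  filter:                      # only used for AnnotationProject files
    only_completed: true
    only_verified: false
    exclude_issues: true
```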
@ -7,6 +7,10 @@ configuration, and data structures.
:maxdepth: 1

cli/index
data-sources
preprocessing-config
postprocess-config
targets-config-workflow
configs
targets
```
docs/source/reference/postprocess-config.md (new file)
@ -0,0 +1,31 @@

# Postprocess config reference

`PostprocessConfig` controls how raw detector outputs are converted into final detections.

Defined in `batdetect2.postprocess.config`.

## Fields

- `nms_kernel_size` (int > 0): neighborhood size for non-maximum suppression.
- `detection_threshold` (float >= 0): minimum detection score to keep a candidate event.
- `classification_threshold` (float >= 0): minimum class score used when assigning class tags.
- `top_k_per_sec` (int > 0): maximum detection density per second.

## Defaults

- `detection_threshold`: `0.01`
- `classification_threshold`: `0.1`
- `top_k_per_sec`: `100`

`nms_kernel_size` defaults to the library constant used by the NMS module.

## Related pages

- Threshold behaviour: {doc}`../explanation/postprocessing-and-thresholds`
- Threshold tuning workflow: {doc}`../how_to/tune-detection-threshold`
- CLI predict options: {doc}`cli/predict`
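As an illustration of how these fields interact, here is a hedged NumPy/SciPy sketch of heatmap postprocessing (not BatDetect2's actual code; the heatmap, function, and clip duration are fabricated):

```python
import numpy as np
from scipy.ndimage import maximum_filter


def postprocess(heatmap, nms_kernel_size=9, detection_threshold=0.01,
                top_k_per_sec=100, clip_duration_s=1.0):
    # NMS: keep only cells that are the maximum of their neighborhood.
    local_max = maximum_filter(heatmap, size=nms_kernel_size) == heatmap
    scores = np.where(local_max, heatmap, 0.0)
    # Thresholding: drop candidates below detection_threshold.
    ys, xs = np.nonzero(scores >= detection_threshold)
    order = np.argsort(scores[ys, xs])[::-1]
    # Density cap: at most top_k_per_sec detections per second of audio.
    k = int(top_k_per_sec * clip_duration_s)
    return list(zip(ys[order][:k], xs[order][:k]))


heatmap = np.random.rand(64, 512) * 0.5  # dummy detection heatmap
detections = postprocess(heatmap)
print(len(detections) <= 100)  # True
```

The real pipeline additionally decodes bounding-box sizes and class tags for each kept peak; this sketch only shows the score-side filtering.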
docs/source/reference/preprocessing-config.md (new file)
@ -0,0 +1,61 @@

# Preprocessing config reference

This page summarizes preprocessing-related config objects used by batdetect2.

## Audio loader config (`AudioConfig`)

Defined in `batdetect2.audio.loader`.

Fields:

- `samplerate` (int): target audio sample rate in Hz.
- `resample.enabled` (bool): whether to resample loaded audio.
- `resample.method` (`poly` or `fourier`): resampling method.

## Model preprocessing config (`PreprocessingConfig`)

Defined in `batdetect2.preprocess.config`.

Top-level fields:

- `audio_transforms`: ordered waveform transforms.
- `stft`: STFT parameters.
- `frequencies`: spectrogram frequency range.
- `spectrogram_transforms`: ordered spectrogram transforms.
- `size`: final resize settings.

### `audio_transforms` built-ins

- `center_audio`
- `scale_audio`
- `fix_duration` (`duration` in seconds)

### `stft` fields

- `window_duration`
- `window_overlap`
- `window_fn`

### `frequencies` fields

- `min_freq`
- `max_freq`

### `spectrogram_transforms` built-ins

- `pcen`
- `scale_amplitude` (`scale: db|power`)
- `spectral_mean_subtraction`
- `peak_normalize`

### `size` fields

- `height`
- `resize_factor`

## Related pages

- Audio preprocessing how-to: {doc}`../how_to/configure-audio-preprocessing`
- Spectrogram preprocessing how-to: {doc}`../how_to/configure-spectrogram-preprocessing`
- Why consistency matters: {doc}`../explanation/preprocessing-consistency`
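A hedged YAML sketch combining the `PreprocessingConfig` fields listed above (the exact nesting of the transform lists and all values are illustrative assumptions):

```yaml
preprocessing:
  audio_transforms:            # ordered waveform transforms
    - name: center_audio
    - name: fix_duration
      duration: 1.0            # seconds
  stft:
    window_duration: 0.002
    window_overlap: 0.75
    window_fn: hann
  frequencies:
    min_freq: 10000
    max_freq: 120000
  spectrogram_transforms:      # ordered spectrogram transforms
    - name: pcen
    - name: spectral_mean_subtraction
  size:
    height: 128
    resize_factor: 0.5
```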
docs/source/reference/targets-config-workflow.md (new file)
@ -0,0 +1,61 @@

# Targets config workflow reference

This page summarizes the target-definition configuration used by batdetect2.

## `TargetConfig`

Defined in `batdetect2.targets.config`.

Fields:

- `detection_target`: one `TargetClassConfig` defining detection eligibility.
- `classification_targets`: list of `TargetClassConfig` entries for class encoding/decoding.
- `roi`: default ROI mapper config.

## `TargetClassConfig`

Defined in `batdetect2.targets.classes`.

Fields:

- `name`: class label name.
- `tags`: tag list used for matching (shortcut for `match_if`).
- `match_if`: explicit condition config (`match_if` is accepted as alias).
- `assign_tags`: tags used when decoding this class.
- `roi`: optional class-specific ROI mapper override.

`tags` and `match_if` are mutually exclusive.

## Supported condition config types

Built from `batdetect2.data.conditions`.

- `has_tag`
- `has_all_tags`
- `has_any_tag`
- `duration`
- `frequency`
- `all_of`
- `any_of`
- `not`

## ROI mapper config

`roi` supports built-in mappers including:

- `anchor_bbox`
- `peak_energy_bbox`

Key `anchor_bbox` fields:

- `anchor`
- `time_scale`
- `frequency_scale`

## Related pages

- Detection target setup: {doc}`../how_to/configure-target-definitions`
- Class setup: {doc}`../how_to/define-target-classes`
- ROI setup: {doc}`../how_to/configure-roi-mapping`
- Concept overview: {doc}`../explanation/target-encoding-and-decoding`
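A hedged sketch of a targets config assembled from the fields above (class names, tag keys/values, and the exact condition nesting are illustrative assumptions, not the library's prescribed shape):

```yaml
targets:
  detection_target:
    name: bat                  # anything matching this is a detection
    match_if:
      has_any_tag:             # one of the supported condition types
        tags:
          - key: event
            value: Echolocation
  classification_targets:
    - name: pippip
      tags:                    # shortcut for match_if
        - key: species
          value: Pipistrellus pipistrellus
      roi:
        anchor_bbox:           # built-in ROI mapper
          anchor: bottom-left
```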
@ -1,141 +0,0 @@
# Step 4: Defining Target Classes and Decoding Rules

## Purpose and Context

You've prepared your data by defining your annotation vocabulary (Step 1: Terms), removing irrelevant sounds (Step 2: Filtering), and potentially cleaning up or modifying tags (Step 3: Transforming Tags).
Now, it's time for a crucial step with two related goals:

1. Telling `batdetect2` **exactly what categories (classes) your model should learn to identify** by defining rules that map annotation tags to class names (like `pippip`, `myodau`, or `noise`).
   This process is often called **encoding**.
2. Defining how the model's predictions (those same class names) should be translated back into meaningful, structured **annotation tags** when you use the trained model.
   This is often called **decoding**.

These definitions are essential for both training the model correctly and interpreting its output later.

## How it Works: Defining Classes with Rules

You define your target classes and their corresponding decoding rules in your main configuration file (e.g., your `.yaml` training config), typically under a section named `classes`.
This section contains:

1. A **list** of specific class definitions.
2. A definition for the **generic class** tags.

Each item in the `classes` list defines one specific class your model should learn.

## Defining a Single Class

Each specific class definition rule requires the following information:

1. `name`: **(Required)** This is the unique, simple name for this class (e.g., `pipistrellus_pipistrellus`, `myotis_daubentonii`, `noise`).
   This label is used during training and is what the model predicts.
   Choose clear, distinct names.
   **Each class name must be unique.**
2. `tags`: **(Required)** This list contains one or more specific tags (using `key` and `value`) used to identify if an _existing_ annotation belongs to this class during the _encoding_ phase (preparing training data).
3. `match_type`: **(Optional, defaults to `"all"`)** Determines how the `tags` list is evaluated during _encoding_:
   - `"all"`: The annotation must have **ALL** listed tags to match (the default).
   - `"any"`: The annotation needs **AT LEAST ONE** listed tag to match.
4. `output_tags`: **(Optional)** This list specifies the tags that should be assigned to an annotation when the model _predicts_ this class `name`.
   This is used during the _decoding_ phase (interpreting model output).
   - **If you omit `output_tags` (or set it to `null`/`~`), the system will default to using the same tags listed in the `tags` field for decoding.** This is often what you want.
   - Providing `output_tags` allows you to specify a different, potentially more canonical or detailed, set of tags to represent the class upon prediction.
     For example, you could match based on simplified tags but output standardized tags.

**Example: Defining Species Classes (Encoding & Default Decoding)**

Here, the `tags` used for matching during encoding will also be used for decoding, as `output_tags` is omitted.

```yaml
# In your main configuration file
classes:
  # Definition for the first class
  - name: pippip # Simple name for Pipistrellus pipistrellus
    tags: # Used for BOTH encoding match and decoding output
      - key: species
        value: Pipistrellus pipistrellus
    # match_type defaults to "all"
    # output_tags is omitted, defaults to using 'tags' above

  # Definition for the second class
  - name: myodau # Simple name for Myotis daubentonii
    tags: # Used for BOTH encoding match and decoding output
      - key: species
        value: Myotis daubentonii
```

**Example: Defining a Class with Separate Encoding and Decoding Tags**

Here, we match based on _either_ of two tags (`match_type: any`), but when the model predicts `'pipistrelle'`, we decode it _only_ to the specific `Pipistrellus pipistrellus` tag plus a genus tag.

```yaml
classes:
  - name: pipistrelle # Name for a Pipistrellus group
    match_type: any # Match if EITHER tag below is present during encoding
    tags:
      - key: species
        value: Pipistrellus pipistrellus
      - key: species
        value: Pipistrellus pygmaeus # Match pygmaeus too
    output_tags: # BUT, when decoding 'pipistrelle', assign THESE tags:
      - key: species
        value: Pipistrellus pipistrellus # Canonical species
      - key: genus # Assumes 'genus' key exists
        value: Pipistrellus # Add genus tag
```

## Handling Overlap During Encoding: Priority Order Matters

As before, when preparing training data (encoding), if an annotation matches the `tags` and `match_type` rules for multiple class definitions, the **order of the class definitions in the configuration list determines the priority**.

- The system checks rules from the **top** of the `classes` list down.
- The annotation gets assigned the `name` of the **first class rule it matches**.
- **Place more specific rules before more general rules.**

_(The YAML example for prioritizing Species over Noise remains the same as the previous version)_
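To make the first-match-wins priority concrete, here is a small Python sketch. These are hypothetical helper functions written for illustration, not the actual `batdetect2` API; annotation tags are represented as a set of `(key, value)` tuples.

```python
def matches(annotation_tags, rule):
    """Check whether an annotation's tag set satisfies one class rule."""
    rule_tags = {(t["key"], t["value"]) for t in rule["tags"]}
    if rule.get("match_type", "all") == "any":
        return bool(rule_tags & annotation_tags)  # at least one shared tag
    return rule_tags <= annotation_tags  # "all": every rule tag must be present


def encode(annotation_tags, class_rules, generic_name="Bat"):
    """Return the name of the FIRST rule the annotation matches (priority order)."""
    for rule in class_rules:
        if matches(annotation_tags, rule):
            return rule["name"]
    return generic_name  # relevant sound, but no specific class


rules = [
    {"name": "pippip", "tags": [{"key": "species", "value": "Pipistrellus pipistrellus"}]},
    {"name": "myodau", "tags": [{"key": "species", "value": "Myotis daubentonii"}]},
]
print(encode({("species", "Myotis daubentonii"), ("quality", "Good")}, rules))  # myodau
print(encode({("species", "Plecotus auritus")}, rules))  # Bat
```

Because `encode` returns on the first match, placing specific rules before general ones in `rules` is exactly what determines the priority described above.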
## Handling Non-Matches & Decoding the Generic Class

What happens if an annotation passes filtering/transformation but doesn't match any specific class rule during encoding?

- **Encoding:** As explained previously, these annotations are **not ignored**.
  They are typically assigned to a generic "relevant sound" category, often called the **"Bat"** class in BatDetect2, intended for all relevant bat calls not specifically classified.
- **Decoding:** When the model predicts this generic "Bat" category (or when processing sounds that weren't assigned a specific class during encoding), we need a way to represent this generic status with tags.
  This is defined by the `generic_class` list directly within the main `classes` configuration section.

**Defining the Generic Class Tags:**

You specify the tags for the generic class like this:

```yaml
# In your main configuration file
classes: # Main configuration section for classes
  # --- List of specific class definitions ---
  classes:
    - name: pippip
      tags:
        - key: species
          value: Pipistrellus pipistrellus
    # ... other specific classes ...

  # --- Definition of the generic class tags ---
  generic_class: # Define tags for the generic 'Bat' category
    - key: call_type
      value: Echolocation
    - key: order
      value: Chiroptera
  # These tags will be assigned when decoding the generic category
```

This `generic_class` list provides the standard tags assigned when a sound is identified as relevant (passed filtering) but doesn't belong to one of the specific target classes you defined.
Like the specific classes, sensible defaults are often provided if you don't explicitly define `generic_class`.

**Crucially:** Remember, if sounds should be **completely excluded** from training (not even considered "generic"), use **Filtering rules (Step 2)**.

### Outcome

By defining this list of prioritized class rules (including their `name`, matching `tags`, `match_type`, and optional `output_tags`) and the `generic_class` tags, you provide `batdetect2` with:

1. A clear procedure to assign a target label (`name`) to each relevant annotation for training.
2. A clear mapping to convert predicted class names (including the generic case) back into meaningful annotation tags.

This complete definition prepares your data for the final heatmap generation (Step 5) and enables interpretation of the model's results.
@ -1,141 +0,0 @@
# Step 2: Filtering Sound Events

## Purpose

When preparing your annotated audio data for training a `batdetect2` model, you often want to select only specific sound events.
For example, you might want to:

- Focus only on echolocation calls and ignore social calls or noise.
- Exclude annotations that were marked as low quality.
- Train only on specific species or groups of species.

This filtering module allows you to define rules based on the **tags** associated with each sound event annotation.
Only the events that pass _all_ your defined rules will be kept for further processing and training.

## How it Works: Rules

Filtering is controlled by a list of **rules**.
Each rule defines a condition based on the tags attached to a sound event.
An event must satisfy **all** the rules you define in your configuration to be included.
If an event fails even one rule, it is discarded.

## Defining Rules in Configuration

You define these rules within your main configuration file (usually a `.yaml` file) under a specific section (the exact name might depend on the main training config, but let's assume it's called `filtering`).

The configuration consists of a list named `rules`.
Each item in this list is a single filter rule.

Each **rule** has two parts:

1. `match_type`: Specifies the _kind_ of check to perform.
2. `tags`: A list of specific tags (each with a `key` and `value`) that the rule applies to.

```yaml
# Example structure in your configuration file
filtering:
  rules:
    - match_type: <TYPE_OF_CHECK_1>
      tags:
        - key: <tag_key_1a>
          value: <tag_value_1a>
        - key: <tag_key_1b>
          value: <tag_value_1b>
    - match_type: <TYPE_OF_CHECK_2>
      tags:
        - key: <tag_key_2a>
          value: <tag_value_2a>
    # ... add more rules as needed
```

## Understanding `match_type`

This determines _how_ the list of `tags` in the rule is used to check a sound event.
There are four types:

1. **`any`**: (Keep if _at least one_ tag matches)

   - The sound event **passes** this rule if it has **at least one** of the tags listed in the `tags` section of the rule.
   - Think of it as an **OR** condition.
   - _Example Use Case:_ Keep events if they are tagged as `Species: Pip Pip` OR `Species: Pip Pyg`.

2. **`all`**: (Keep only if _all_ tags match)

   - The sound event **passes** this rule only if it has **all** of the tags listed in the `tags` section.
     The event can have _other_ tags as well, but it must contain _all_ the ones specified here.
   - Think of it as an **AND** condition.
   - _Example Use Case:_ Keep events only if they are tagged with `Sound Type: Echolocation` AND `Quality: Good`.

3. **`exclude`**: (Discard if _any_ tag matches)

   - The sound event **passes** this rule only if it does **not** have **any** of the tags listed in the `tags` section.
     If it matches even one tag in the list, the event is discarded.
   - _Example Use Case:_ Discard events if they are tagged `Quality: Poor` OR `Noise Source: Insect`.

4. **`equal`**: (Keep only if tags match _exactly_)
   - The sound event **passes** this rule only if its set of tags is _exactly identical_ to the list of `tags` provided in the rule (no more, no less).
   - _Note:_ This is very strict and usually less useful than `all` or `any`.

## Combining Rules

Remember: A sound event must **pass every single rule** defined in the `rules` list to be kept.
The rules are checked one by one, and if an event fails any rule, it's immediately excluded from further consideration.
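The four `match_type` semantics and the pass-every-rule combination logic can be sketched in a few lines of Python. This is an illustration of the behaviour described above, not `batdetect2`'s actual implementation; tags are modelled as `(key, value)` tuples.

```python
def passes_rule(event_tags, rule):
    """Evaluate one filter rule against a sound event's tag set."""
    rule_tags = {(t["key"], t["value"]) for t in rule["tags"]}
    kind = rule["match_type"]
    if kind == "any":
        return bool(rule_tags & event_tags)   # OR: at least one shared tag
    if kind == "all":
        return rule_tags <= event_tags        # AND: every rule tag present
    if kind == "exclude":
        return not (rule_tags & event_tags)   # NOT: no rule tag present
    if kind == "equal":
        return rule_tags == event_tags        # exact tag set match
    raise ValueError(f"unknown match_type: {kind}")


def keep_event(event_tags, rules):
    """An event is kept only if it passes EVERY rule in the list."""
    return all(passes_rule(event_tags, r) for r in rules)


rules = [
    {"match_type": "any", "tags": [{"key": "Sound Type", "value": "Echolocation"}]},
    {"match_type": "exclude", "tags": [{"key": "Quality", "value": "Poor"}]},
]
good = {("Sound Type", "Echolocation"), ("Quality", "Good")}
bad = {("Sound Type", "Echolocation"), ("Quality", "Poor")}
print(keep_event(good, rules), keep_event(bad, rules))  # True False
```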
## Examples

**Example 1: Keep good quality echolocation calls**

```yaml
filtering:
  rules:
    # Rule 1: Must have the 'Echolocation' tag
    - match_type: any # Could also use 'all' if 'Sound Type' is the only tag expected
      tags:
        - key: Sound Type
          value: Echolocation
    # Rule 2: Must NOT have the 'Poor' quality tag
    - match_type: exclude
      tags:
        - key: Quality
          value: Poor
```

_Explanation:_ An event is kept only if it passes BOTH rules.
It must have the `Sound Type: Echolocation` tag AND it must NOT have the `Quality: Poor` tag.

**Example 2: Keep calls from Pipistrellus species recorded in a specific project, excluding uncertain IDs**

```yaml
filtering:
  rules:
    # Rule 1: Must be either Pip pip or Pip pyg
    - match_type: any
      tags:
        - key: Species
          value: Pipistrellus pipistrellus
        - key: Species
          value: Pipistrellus pygmaeus
    # Rule 2: Must belong to 'Project Alpha'
    - match_type: any # Using 'any' as it likely only has one project tag
      tags:
        - key: Project ID
          value: Project Alpha
    # Rule 3: Exclude if ID Certainty is 'Low' or 'Maybe'
    - match_type: exclude
      tags:
        - key: ID Certainty
          value: Low
        - key: ID Certainty
          value: Maybe
```

_Explanation:_ An event is kept only if it passes ALL three rules:

1. It has a `Species` tag that is _either_ `Pipistrellus pipistrellus` OR `Pipistrellus pygmaeus`.
2. It has the `Project ID: Project Alpha` tag.
3. It does _not_ have an `ID Certainty: Low` tag AND it does _not_ have an `ID Certainty: Maybe` tag.

## Usage

You will typically specify the path to the configuration file containing these `filtering` rules when you set up your data processing or training pipeline in `batdetect2`.
The tool will then automatically load these rules and apply them to your annotated sound events.
@ -1,79 +0,0 @@
# Defining Training Targets

A crucial aspect of training any supervised machine learning model, including BatDetect2, is clearly defining the **training targets**.
This process determines precisely what the model should learn to detect, localize, classify, and characterize from the input data (in this case, spectrograms).
The choices made here directly influence the model's focus, its performance, and how its predictions should be interpreted.

For BatDetect2, defining targets involves specifying:

- Which sounds in your annotated dataset are relevant for training.
- How these sounds should be categorized into distinct **classes** (e.g., different species).
- How the geometric **Region of Interest (ROI)** (e.g., bounding box) of each sound maps to the specific **position** and **size** targets the model predicts.
- How these classes and geometric properties relate back to the detailed information stored in your annotation **tags** (using a consistent **vocabulary/terms**).
- How the model's output (predicted class names, positions, sizes) should be translated back into meaningful tags and geometries.

## Sound Event Annotations: The Starting Point

BatDetect2 assumes your training data consists of audio recordings where relevant sound events have been **annotated**.
A typical annotation for a single sound event provides two key pieces of information:

1. **Location & Extent:** Information defining _where_ the sound occurs in time and frequency, usually represented as a **bounding box** (the ROI) drawn on a spectrogram.
2. **Description (Tags):** Information _about_ the sound event, provided as a set of descriptive **tags** (key-value pairs).

For example, an annotation might have a bounding box and tags like:

- `species: Myotis daubentonii`
- `quality: Good`
- `call_type: Echolocation`

A single sound event can have **multiple tags**, allowing for rich descriptions.
This richness requires a structured process to translate the annotation (both tags and geometry) into the precise targets needed for model training.
The **target definition process** provides clear rules to:

- Interpret the meaning of different tag keys (**Terms**).
- Select only the relevant annotations (**Filtering**).
- Potentially standardize or modify the tags (**Transforming**).
- Map the geometric ROI to specific position and size targets (**ROI Mapping**).
- Map the final set of tags on each selected annotation to a single, definitive **target class** label (**Classes**).

## Configuration-Driven Workflow

BatDetect2 is designed so that researchers can configure this entire target definition process primarily through **configuration files** (typically written in YAML format), minimizing the need for direct programming for standard workflows.
These settings are usually grouped under a main `targets:` key within your overall training configuration file.
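Putting the pieces together, a `targets:` section might be organised roughly as follows. This is an illustrative skeleton only: the key names (`terms`, `transforms`, etc.) follow this guide's terminology and may differ from the actual schema, so consult the packaged example configurations.

```yaml
targets:
  # Step 1: vocabulary (often the defaults suffice)
  terms: {}
  # Step 2: select relevant annotations
  filtering:
    rules: []
  # Step 3: optional tag clean-up
  transforms:
    rules: []
  # Step 4: class encoding/decoding rules
  classes:
    classes: []
    generic_class: []
  # Step 5: ROI -> reference point + scaled size
  roi:
    position: "bottom-left"
    time_scale: 1000.0
    frequency_scale: 0.00116
```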
## The Target Definition Steps

Defining the targets involves several sequential steps, each configurable and building upon the previous one:

1. **Defining Vocabulary (Terms & Tags):** Understand how annotations use tags (key-value pairs).
   This step involves defining the meaning (**Terms**) behind the tag keys (e.g., `species`, `call_type`).
   Often, default terms are sufficient, but understanding this is key to using tags in later steps.
   (See: {doc}`tags_and_terms`)
2. **Filtering Sound Events:** Select only the relevant sound event annotations based on their tags (e.g., keeping only high-quality calls).
   (See: {doc}`filtering`)
3. **Transforming Tags (Optional):** Modify tags on selected annotations for standardization, correction, grouping (e.g., species to genus), or deriving new tags.
   (See: {doc}`transform`)
4. **Defining Classes & Decoding Rules:** Map the final tags to specific target **class names** (like `pippip` or `myodau`).
   Define priorities for overlap and specify how predicted names map back to tags (decoding).
   (See: {doc}`classes`)
5. **Mapping ROIs (Position & Size):** Define how the geometric ROI (e.g., bounding box) of each sound event maps to the specific reference **point** (e.g., center, corner) and scaled **size** values (width, height) used as targets by the model.
   (See: {doc}`rois`)
6. **The `Targets` Object:** Understand the outcome of configuring steps 1-5: a functional object used internally by BatDetect2 that encapsulates all your defined rules for filtering, transforming, ROI mapping, encoding, and decoding.
   (See: {doc}`use`)

The result of this configuration process is a clear set of instructions that BatDetect2 uses during training data preparation to determine the correct "answer" (the ground truth label and geometry representation) for each relevant sound event.

Explore the detailed steps using the links below:

```{toctree}
:maxdepth: 1
:caption: Target Definition Steps:

tags_and_terms
filtering
transform
classes
rois
labels
use
```
@ -1,76 +0,0 @@
# Step 5: Generating Training Targets

## Purpose and Context

Following the previous steps of defining terms, filtering events, transforming tags, and defining specific class rules, this final stage focuses on **generating the ground truth data** used directly for training the BatDetect2 model.
This involves converting the refined annotation information for each audio clip into specific **heatmap formats** required by the underlying neural network architecture.

This step essentially translates your structured annotations into the precise "answer key" the model learns to replicate during training.

## What are Heatmaps?

Heatmaps, in this context, are multi-dimensional arrays, often visualized as images aligned with the input spectrogram, where the values at different time-frequency coordinates represent specific information about the sound events.
For BatDetect2 training, three primary heatmaps are generated:

1. **Detection Heatmap:**

   - **Represents:** The presence or likelihood of relevant sound events across the spectrogram.
   - **Structure:** A 2D array matching the spectrogram's time-frequency dimensions.
     Peaks (typically smoothed) are generated at the reference locations of all sound events that passed the filtering stage (including both specifically classified events and those falling into the generic "Bat" category).

2. **Class Heatmap:**

   - **Represents:** The location and class identity for sounds belonging to the _specific_ target classes you defined in Step 4.
   - **Structure:** A 3D array with dimensions for category, time, and frequency.
     It contains a separate 2D layer (channel) for each target class name (e.g., 'pippip', 'myodau').
     A smoothed peak appears on a specific class layer only if a sound event assigned to that class exists at that location.
     Events assigned only to the generic class do not produce peaks here.

3. **Size Heatmap:**
   - **Represents:** The target dimensions (duration/width and bandwidth/height) of detected sound events.
   - **Structure:** A 3D array with dimensions for size-dimension ('width', 'height'), time, and frequency.
     At the reference location of each detected sound event, this heatmap stores two numerical values corresponding to the scaled width and height derived from the event's bounding box.

## How Heatmaps are Created

The generation of these heatmaps is an automated process within `batdetect2`, driven by your configurations from all previous steps.
For each audio clip and its corresponding spectrogram in the training dataset:

1. The system retrieves the associated sound event annotations.
2. Configured **filtering rules** (Step 2) are applied to select relevant annotations.
3. Configured **tag transformation rules** (Step 3) are applied to modify the tags of the selected annotations.
4. Configured **class definition rules** (Step 4) are used to assign a specific class name or determine generic "Bat" status for each processed annotation.
5. These final annotations are then mapped onto initialized heatmap arrays:
   - A signal (initially a single point) is placed on the **Detection Heatmap** at the reference location for each relevant annotation.
   - The scaled width and height values are placed on the **Size Heatmap** at the reference location.
   - If an annotation received a specific class name, a signal is placed on the corresponding layer of the **Class Heatmap** at the reference location.
6. Finally, Gaussian smoothing (a blurring effect) is typically applied to the Detection and Class heatmaps to create spatially smoother targets, which often aids model training stability and performance.
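The place-a-point-then-smooth idea in steps 5 and 6 can be illustrated with plain NumPy. This sketch is not `batdetect2`'s internal code: it renders each reference point directly as a Gaussian bump (equivalent in spirit to placing a unit spike and blurring it), keeping the pointwise maximum where bumps overlap.

```python
import numpy as np


def make_detection_heatmap(shape, ref_points, sigma=3.0):
    """Render a Gaussian bump of std `sigma` (in bins) centred on each
    (freq_bin, time_bin) reference point of a (freq, time) heatmap."""
    freqs = np.arange(shape[0])[:, None]  # column vector of frequency bins
    times = np.arange(shape[1])[None, :]  # row vector of time bins
    heatmap = np.zeros(shape, dtype=np.float32)
    for f0, t0 in ref_points:
        bump = np.exp(-((freqs - f0) ** 2 + (times - t0) ** 2) / (2 * sigma**2))
        heatmap = np.maximum(heatmap, bump)  # overlapping events keep the max
    return heatmap


# Two hypothetical sound events on a 128 x 512 spectrogram grid
hm = make_detection_heatmap((128, 512), [(40, 100), (80, 300)], sigma=3.0)
# The heatmap peaks at exactly 1.0 on each reference point and decays around it.
```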
## Configurable Settings for Heatmap Generation

While the content of the heatmaps is primarily determined by the previous configuration steps, a few parameters specific to the heatmap drawing process itself can be adjusted.
These are usually set in your main configuration file under a section like `labelling`:

- `sigma`: (Number, e.g., `3.0`) Defines the standard deviation, in pixels or bins, of the Gaussian kernel used for smoothing the Detection and Class heatmaps.
  Larger values result in more diffused heatmap peaks.
- `position`: (Text, e.g., `"bottom-left"`, `"center"`) Specifies the geometric reference point within each sound event's bounding box that anchors its representation on the heatmaps.
- `time_scale` & `frequency_scale`: (Numbers) These crucial scaling factors convert the physical duration (in seconds) and frequency bandwidth (in Hz) of annotation bounding boxes into the numerical values stored in the 'width' and 'height' channels of the Size Heatmap.
  - **Important Note:** The appropriate values for these scales are dictated by the requirements of the specific BatDetect2 model architecture being trained.
    They ensure the size information is presented in the units or relative scale the model expects.
    **Consult the documentation or tutorials for your specific model to determine the correct `time_scale` and `frequency_scale` values.** Mismatched scales can hinder the model's ability to learn size regression accurately.

**Example YAML Configuration for Labelling Settings:**

```yaml
# In your main configuration file
labelling:
  sigma: 3.0 # Std. dev. for Gaussian smoothing (pixels/bins)
  position: "bottom-left" # Bounding box reference point
  time_scale: 1000.0 # Example: Scales seconds to milliseconds
  frequency_scale: 0.00116 # Example: Scales Hz relative to ~860 Hz (model specific!)
```

## Outcome: Final Training Targets

Executing this step for all training data yields the complete set of target heatmaps (Detection, Class, Size) for each corresponding input spectrogram.
These arrays constitute the ground truth data that the BatDetect2 model directly compares its predictions against during the training phase, guiding its learning process.
@ -1,85 +0,0 @@
|
|||||||
# Defining Target Geometry: Mapping Sound Event Regions
|
|
||||||
|
|
||||||
## Introduction
|
|
||||||
|
|
||||||
In the previous steps of defining targets, we focused on determining _which_ sound events are relevant (`filtering`), _what_ descriptive tags they should have (`transform`), and _which category_ they belong to (`classes`).
|
|
||||||
However, for the model to learn effectively, it also needs to know **where** in the spectrogram each sound event is located and approximately **how large** it is.
|
|
||||||
|
|
||||||
Your annotations typically define the location and extent of a sound event using a **Region of Interest (ROI)**, most commonly a **bounding box** drawn around the call on the spectrogram.
|
|
||||||
This ROI contains detailed spatial information (start/end time, low/high frequency).
|
|
||||||
|
|
||||||
This section explains how BatDetect2 converts the geometric ROI from your annotations into the specific positional and size information used as targets during model training.
|
|
||||||
|
|
||||||
## From ROI to Model Targets: Position & Size
|
|
||||||
|
|
||||||
BatDetect2 does not directly predict a full bounding box.
|
|
||||||
Instead, it is trained to predict:
|
|
||||||
|
|
||||||
1. **A Reference Point:** A single point `(time, frequency)` that represents the primary location of the detected sound event within the spectrogram.
|
|
||||||
2. **Size Dimensions:** Numerical values representing the event's size relative to that reference point, typically its `width` (duration in time) and `height` (bandwidth in frequency).
|
|
||||||
|
|
||||||
This step defines _how_ BatDetect2 calculates this specific reference point and these numerical size values from the original annotation's bounding box.
|
|
||||||
It also handles the reverse process – converting predicted positions and sizes back into bounding boxes for visualization or analysis.
|
|
||||||
|
|
||||||
## Configuring the ROI Mapping
|
|
||||||
|
|
||||||
You can control how this conversion happens through settings in your configuration file (e.g., your main `.yaml` file).
|
|
||||||
These settings are usually placed within the main `targets:` configuration block, under a specific `roi:` key.
|
|
||||||
|
|
||||||
Here are the key settings:
|
|
||||||
|
|
||||||
- **`position`**:
|
|
||||||
|
|
||||||
- **What it does:** Determines which specific point on the annotation's bounding box is used as the single **Reference Point** for training (e.g., `"center"`, `"bottom-left"`).
|
|
||||||
- **Why configure it?** This affects where the peak signal appears in the target heatmaps used for training.
|
|
||||||
Different choices might slightly influence model learning.
|
|
||||||
The default (`"bottom-left"`) is often a good starting point.
|
|
||||||
- **Example Value:** `position: "center"`
|
|
||||||
|
|
||||||
- **`time_scale`**:
|
|
||||||
|
|
||||||
- **What it does:** This is a numerical scaling factor that converts the _actual duration_ (width, measured in seconds) of the bounding box into the numerical 'width' value the model learns to predict (and which is stored in the Size Heatmap).
|
|
||||||
- **Why configure it?** The model predicts raw numbers for size; this scale gives those numbers meaning.
|
|
||||||
For example, setting `time_scale: 1000.0` means the model will be trained to predict the duration in **milliseconds** instead of seconds.
|
|
||||||
- **Important Considerations:**
|
|
||||||
- You can often set this value based on the units you prefer the model to work with internally.
|
|
||||||
However, having target numerical values roughly centered around 1 (e.g., typically between 0.1 and 10) can sometimes improve numerical stability during model training.
|
|
||||||
- The default value in BatDetect2 (e.g., `1000.0`) has been chosen to scale the duration relative to the spectrogram width under default STFT settings.
|
|
||||||
Be aware that if you significantly change STFT parameters (window size or overlap), the relationship between the default scale and the spectrogram dimensions might change.
|
|
||||||
- Crucially, whatever scale you use during training **must** be used when decoding the model's predictions back into real-world time units (seconds).
|
|
||||||
BatDetect2 generally handles this consistency for you automatically when using the full pipeline.
|
|
||||||
- **Example Value:** `time_scale: 1000.0`
|
|
||||||
|
|
||||||
- **`frequency_scale`**:
|
|
||||||
- **What it does:** Similar to `time_scale`, this numerical scaling factor converts the _actual frequency bandwidth_ (height, typically measured in Hz or kHz) of the bounding box into the numerical 'height' value the model learns to predict.
|
|
||||||
- **Why configure it?** It gives physical meaning to the model's raw numerical prediction for bandwidth and allows you to choose the internal units or scale.
|
|
||||||
- **Important Considerations:**
|
|
||||||
- Same as for `time_scale`.
|
|
||||||
- **Example Value:** `frequency_scale: 0.00116`
|
|
||||||
|
|
||||||
**Example YAML Configuration:**

```yaml
# Inside your main configuration file (e.g., training_config.yaml)

targets: # Top-level key for target definition
  # ... filtering settings ...
  # ... transforms settings ...
  # ... classes settings ...

  # --- ROI Mapping Settings ---
  roi:
    position: "bottom-left" # Reference point (e.g., "center", "bottom-left")
    time_scale: 1000.0 # e.g., model predicts width in ms
    frequency_scale: 0.00116 # e.g., model predicts height relative to ~860 Hz (or another model-specific scaling)
```
## Decoding Size Predictions

These scaling factors (`time_scale`, `frequency_scale`) are also essential for interpreting the model's output correctly.
When the model predicts numerical values for width and height, BatDetect2 uses these same scales (in reverse) to convert those numbers back into physically meaningful durations (seconds) and bandwidths (Hz/kHz) when reconstructing bounding boxes from predictions.
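The round trip can be sketched in a few lines of Python. This is a conceptual illustration only, not the actual BatDetect2 implementation; the helper names are invented, and the constants match the example scales above:

```python
# Conceptual sketch of ROI size scaling (hypothetical helper names).
TIME_SCALE = 1000.0        # seconds -> model units (here: milliseconds)
FREQUENCY_SCALE = 0.00116  # Hz -> model units

def encode_size(duration_s: float, bandwidth_hz: float) -> tuple[float, float]:
    """Convert a real-world box size into the values the model learns to predict."""
    return duration_s * TIME_SCALE, bandwidth_hz * FREQUENCY_SCALE

def decode_size(width: float, height: float) -> tuple[float, float]:
    """Invert the scaling to recover duration (s) and bandwidth (Hz)."""
    return width / TIME_SCALE, height / FREQUENCY_SCALE

# A 5 ms call spanning 40 kHz of bandwidth:
width, height = encode_size(0.005, 40_000)
print(width, height)  # both values land in a numerically friendly range
```

Note how a raw duration of 0.005 and a raw bandwidth of 40,000 would otherwise differ by seven orders of magnitude; the scales bring both targets into a similar range.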
## Outcome

By configuring the `roi` settings, you ensure that BatDetect2 consistently translates the geometric information from your annotations into the specific reference points and scaled size values required for training the model.
Using consistent scales that are appropriate for your data and potentially beneficial for training stability allows the model to effectively learn not just _what_ sound is present, but also _where_ it is located and _how large_ it is, and enables meaningful interpretation of the model's spatial and size predictions.

---
# Step 1: Managing Annotation Vocabulary

## Purpose

To train `batdetect2`, you will need sound events that have been carefully annotated. We annotate sound events using **tags**. A tag is simply a piece of information attached to an annotation, often describing what the sound is or its characteristics. Common examples include `Species: Myotis daubentonii` or `Quality: Good`.

Each tag fundamentally has two parts:

* **Value:** The specific information (e.g., "Myotis daubentonii", "Good").
* **Term:** The *type* of information (e.g., "Species", "Quality"). This defines the context or meaning of the value.

We use this flexible **Term: Value** approach because it allows you to annotate your data with any kind of information relevant to your project, while still providing a structure that makes the meaning clear.

While simple terms like "Species" are easy to understand, sometimes the underlying definition needs to be more precise to ensure everyone interprets it the same way (e.g., using a standard scientific definition for "Species" or clarifying what "Call Type" specifically refers to).

This `terms` module is designed to help manage these definitions effectively:

1. It provides **standard definitions** for common terms used in bioacoustics, ensuring consistency.
2. It lets you **define your own custom terms** if you need concepts specific to your project.
3. Crucially, it allows you to use simple **"keys"** (like shortcuts) in your configuration files to refer to these potentially complex term definitions, making configuration much easier and less error-prone.
## The Problem: Why We Need Defined Terms

Imagine you have a tag that simply says `"Myomyo"`.
If you created this tag, you might know it's a shortcut for the species _Myotis myotis_.
But what about someone else using your data or model later? Does `"Myomyo"` refer to the species? Or maybe it's the name of an individual bat, or even the location where it was recorded? Simple tags like this can be ambiguous.

To make things clearer, it's good practice to provide context.
We can do this by pairing the specific information (the **Value**) with the _type_ of information (the **Term**).
For example, writing the tag as `species: Myomyo` is much less ambiguous.
Here, `species` is the **Term**, explaining that `Myomyo` is a **Value** representing a species.

However, another challenge often comes up when sharing data or collaborating.
You might use the term `species`, while a colleague uses `Species`, and someone else uses the more formal `Scientific Name`.
Even though you all mean the same thing, these inconsistencies make it hard to combine data or reuse analysis pipelines automatically.

This is where standardized **Terms** become very helpful.
Several groups work to create standard definitions for common concepts.
For instance, the Darwin Core standard provides widely accepted terms for biological data, like `dwc:scientificName` for a species name.
Using standard Terms whenever possible makes your data clearer, easier for others (and machines!) to understand correctly, and much more reusable across different projects.

**But here's the practical problem:** While using standard, well-defined Terms is important for clarity and reusability, writing out full definitions or long standard names (like `dwc:scientificName` or "Scientific Name according to Darwin Core standard") every single time you need to refer to a species tag in a configuration file would be extremely tedious and prone to typing errors.
## The Solution: Keys (Shortcuts) and the Registry

This module uses a central **Registry** that stores the full definitions of various Terms.
Each Term in the registry is assigned a unique, short **key** (a simple string).

Think of the **key** as a shortcut.

Instead of using the full Term definition in your configuration files, you just use its **key**.
The system automatically looks up the full definition in the registry using the key when needed.

**Example:**

- **Full Term Definition:** Represents the scientific name of the organism.
- **Key:** `species`
- **In Config:** You just write `species`.
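A registry of this kind can be pictured as a simple mapping from short keys to full definitions. The sketch below is purely illustrative (the real registry lives in `batdetect2.terms`; the `Term` class and helper names here are invented for the example):

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Term:
    """Illustrative stand-in for a full term definition."""
    name: str
    label: str
    definition: str = "Unknown"

# The registry: short keys -> full definitions.
registry: dict[str, Term] = {}

def register_term(key: str, term: Term) -> None:
    """Add a term under a unique key; duplicate keys are rejected."""
    if key in registry:
        raise KeyError(f"Term key already registered: {key}")
    registry[key] = term

def get_term(key: str) -> Term:
    """Look up the full definition behind a short key."""
    return registry[key]  # raises KeyError for unknown keys

register_term(
    "species",
    Term(name="dwc:scientificName", label="Scientific Name",
         definition="The full scientific name of the organism."),
)

# A configuration file only ever needs the short key:
print(get_term("species").name)  # -> dwc:scientificName
```

The point of the indirection is that configurations stay short and typo-resistant while the precise, possibly standardized definition lives in one place.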
## Available Keys

The registry comes pre-loaded with keys for many standard terms used in bioacoustics, including those from the `soundevent` package and some specific to `batdetect2`. This means you can often use these common concepts without needing to define them yourself.

Common examples of pre-defined keys might include:

* `species`: For scientific species names (e.g., *Myotis daubentonii*).
* `common_name`: For the common name of a species (e.g., "Daubenton's bat").
* `genus`, `family`, `order`: For higher levels of biological taxonomy.
* `call_type`: For functional call types (e.g., 'Echolocation', 'Social').
* `individual`: For identifying specific individuals if tracked.
* `class`: **(Special Key)** This key is often used **by default** in configurations when defining the target classes for your model (e.g., the different species you want the model to classify). If you are specifying a tag that represents a target class label, you often only need to provide the `value`, and the system assumes the `key` is `class`.

This is not an exhaustive list. To discover all the term keys currently available in the registry (including any standard ones loaded automatically and any custom ones you've added in your configuration), you can:

1. Use the function `batdetect2.terms.get_term_keys()` if you are working directly with Python code.
2. Refer to the main `batdetect2` API documentation for a list of commonly included standard terms.
## Defining Your Own Terms

While many common terms have pre-defined keys, you might need a term specific to your project or data that isn't already available (e.g., "Recording Setup", "Weather Condition", "Project Phase", "Noise Source"). You can easily define these custom terms directly within a configuration file (usually your main `.yaml` file).

Typically, you define custom terms under a dedicated section (often named `terms`). Inside this section, you create a list, where each item in the list defines one new term using the following fields:

* `key`: **(Required)** The unique shortcut key or nickname you will use to refer to this term throughout your configuration (e.g., `weather`, `setup_id`, `noise_src`). Choose something short and memorable.
* `label`: (Optional) A user-friendly label for the term, which might be used in reports or visualizations (e.g., "Weather Condition", "Setup ID"). If you don't provide one, it defaults to using the `key`.
* `name`: (Optional) A more formal or technical name for the term.
  * It's good practice, especially if defining terms that might overlap with standard vocabularies, to use a **namespaced format** like `<namespace>:<term_name>`. The `namespace` part helps avoid clashes with terms defined elsewhere. For example, the standard Darwin Core term for scientific name is `dwc:scientificName`, where `dwc` is the namespace for Darwin Core. Using namespaces makes your custom terms more specific and reduces potential confusion.
  * If you don't provide a `name`, it defaults to using the `key`.
* `definition`: (Optional) A brief text description explaining what this term represents (e.g., "The primary source of background noise identified", "General weather conditions during recording"). If omitted, it defaults to "Unknown".
* `uri`: (Optional) If your term definition comes directly from a standard online vocabulary (like Darwin Core), you can include its unique web identifier (URI) here.
**Example YAML Configuration for Custom Terms:**

```yaml
# In your main configuration file

# (Optional section to define custom terms)
terms:
  - key: weather # Your chosen shortcut
    label: Weather Condition
    name: myproj:weather # Formal namespaced name
    definition: General weather conditions during recording (e.g., Clear, Rain, Fog).

  - key: setup_id # Another shortcut
    label: Recording Setup ID
    name: myproj:setupID # Formal namespaced name
    definition: The unique identifier for the specific hardware setup used.

  - key: species # Defining a term with a standard URI
    label: Scientific Name
    name: dwc:scientificName
    uri: http://rs.tdwg.org/dwc/terms/scientificName # Example URI
    definition: The full scientific name according to Darwin Core.

# ... other configuration sections ...
```

When `batdetect2` loads your configuration, it reads this `terms` section and adds your custom definitions (linked to their unique keys) to the central registry. These keys (`weather`, `setup_id`, etc.) are then ready to be used in other parts of your configuration, like defining filters or target classes.
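Conceptually, loading the `terms` section boils down to walking the list of entries and filling in the documented defaults for missing fields. A self-contained sketch (the real loading logic is in `batdetect2.terms`; the names below are illustrative, and the Python list stands in for the parsed YAML):

```python
# The `terms` section above, as it would look after YAML parsing.
config_terms = [
    {"key": "weather", "label": "Weather Condition", "name": "myproj:weather",
     "definition": "General weather conditions during recording."},
    {"key": "setup_id", "label": "Recording Setup ID", "name": "myproj:setupID"},
]

registry: dict[str, dict] = {}

def load_terms(entries: list[dict]) -> None:
    """Register each configured term, applying the documented defaults."""
    for entry in entries:
        key = entry["key"]  # `key` is the only required field
        registry[key] = {
            "label": entry.get("label", key),        # defaults to the key
            "name": entry.get("name", key),          # defaults to the key
            "definition": entry.get("definition", "Unknown"),
            "uri": entry.get("uri"),                 # optional, may be absent
        }

load_terms(config_terms)
print(sorted(registry))  # -> ['setup_id', 'weather']
```

Here `setup_id` ends up with the default `definition` of `"Unknown"` because the entry omits it, mirroring the field defaults described above.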
## Using Keys to Specify Tags (in Filters, Class Definitions, etc.)

Now that you have keys for all the terms you need (both pre-defined and custom), you can easily refer to specific **tags** in other parts of your configuration, such as:

- Filtering rules (as seen in the `filtering` module documentation).
- Defining which tags represent your target classes.
- Associating extra information with your classes.

When you need to specify a tag, you typically use a structure with two fields:

- `key`: The **key** (shortcut) for the _Term_ part of the tag (e.g., `species`, `quality`, `weather`).
  **It defaults to `class`** if you omit it, which is common when defining the main target classes.
- `value`: The specific _value_ of the tag (e.g., `Myotis daubentonii`, `Good`, `Rain`).

**Example YAML Configuration (e.g., inside a filter rule):**

```yaml
# ... inside a filtering configuration section ...
rules:
  # Rule: Exclude events recorded in 'Rain'
  - match_type: exclude
    tags:
      - key: weather # Use the custom term key defined earlier
        value: Rain
  # Rule: Keep only 'Myotis daubentonii' (using the default 'class' key implicitly)
  - match_type: any # Or 'all' depending on logic
    tags:
      - value: Myotis daubentonii # 'key: class' is assumed by default here
        # key: class # Explicitly writing this is also fine
  # Rule: Keep only 'Good' quality events
  - match_type: any # Or 'all' depending on logic
    tags:
      - key: quality # Use a likely pre-defined key
        value: Good
```
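The key-defaulting behaviour for tag specifications is easy to mirror in code. A hypothetical sketch (the actual parsing happens inside BatDetect2's configuration models; the function name is invented):

```python
def parse_tag_spec(spec: dict) -> tuple[str, str]:
    """Turn a {key, value} mapping from the config into a (term_key, value) pair.

    When `key` is omitted, it defaults to "class", matching the behaviour
    described above for target-class definitions.
    """
    return spec.get("key", "class"), spec["value"]

print(parse_tag_spec({"key": "weather", "value": "Rain"}))
print(parse_tag_spec({"value": "Myotis daubentonii"}))  # key defaults to "class"
```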
## Summary

- Annotations have **tags** (Term + Value).
- This module uses short **keys** as shortcuts for Term definitions, stored in a **registry**.
- Many **common keys are pre-defined**.
- You can define **custom terms and keys** in your configuration file (using `key`, `label`, `name`, `definition`, and `uri`).
- You use these **keys** along with specific **values** to refer to tags in other configuration sections (like filters or class definitions), often defaulting to the `class` key.

This system makes your configurations cleaner, more readable, and less prone to errors by avoiding repetition of complex term definitions.

---
# Step 3: Transforming Annotation Tags (Optional)

## Purpose and Context

After defining your vocabulary (Step 1: Terms) and filtering out irrelevant sound events (Step 2: Filtering), you have a dataset of annotations ready for the next stages.
Before you select the final target classes for training (Step 4), you might want or need to **modify the tags** associated with your annotations.
This optional step allows you to clean up, standardize, or derive new information from your existing tags.

**Why transform tags?**

- **Correcting Mistakes:** Fix typos or incorrect values in specific tags (e.g., changing an incorrect species label).
- **Standardizing Labels:** Ensure consistency if the same information was tagged using slightly different values (e.g., mapping "echolocation", "Echoloc.", and "Echolocation Call" all to a single standard value: "Echolocation").
- **Grouping Related Concepts:** Combine different specific tags into a broader category (e.g., mapping several different species tags like _Myotis daubentonii_ and _Myotis nattereri_ to a single `genus: Myotis` tag).
- **Deriving New Information:** Automatically create new tags based on existing ones (e.g., automatically generating a `genus: Myotis` tag whenever a `species: Myotis daubentonii` tag is present).

This step uses the `batdetect2.targets.transform` module to apply these changes based on rules you define.

## How it Works: Transformation Rules

You control how tags are transformed by defining a list of **rules** in your configuration file (e.g., your main `.yaml` file, often under a section named `transform`).

Each rule specifies a particular type of transformation to perform.
Importantly, the rules are applied **sequentially**, in the exact order they appear in your configuration list.
The output annotation from one rule becomes the input for the next rule in the list.
This means the order can matter!
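Sequential application means each rule is effectively a function from tags to tags, composed in list order. A minimal sketch of this idea (illustrative only; here tags are modelled as plain `(key, value)` tuples and the two rules are hand-written stand-ins for configured rules):

```python
Tag = tuple[str, str]

def fix_typo(tags: list[Tag]) -> list[Tag]:
    """Rule 1: replace one exact tag (like a `replace` rule)."""
    return [("species", "Pipistrellus pipistrellus") if t == ("species", "Pip pip") else t
            for t in tags]

def derive_genus(tags: list[Tag]) -> list[Tag]:
    """Rule 2: add a genus tag derived from any species tag (like `derive_tag`)."""
    extra = [("genus", value.split()[0]) for key, value in tags if key == "species"]
    return tags + extra

rules = [fix_typo, derive_genus]  # order matters: fix the name first, then derive

def apply_rules(tags: list[Tag]) -> list[Tag]:
    for rule in rules:  # each rule's output feeds the next rule
        tags = rule(tags)
    return tags

print(apply_rules([("species", "Pip pip")]))
# With this order the derived genus is "Pipistrellus"; with the rules
# reversed, the genus would be derived from the uncorrected "Pip pip".
```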
## Types of Transformation Rules

Here are the main types of rules you can define:

1. **Replace an Exact Tag (`replace`)**

   - **Use Case:** Fixing a specific, known incorrect tag.
   - **How it works:** You specify the _exact_ original tag (both its term key and value) and the _exact_ tag you want to replace it with.
   - **Example Config:** Replace the informal tag `species: Pip pip` with the correct scientific name tag.
     ```yaml
     transform:
       rules:
         - rule_type: replace
           original:
             key: species # Term key of the tag to find
             value: "Pip pip" # Value of the tag to find
           replacement:
             key: species # Term key of the replacement tag
             value: "Pipistrellus pipistrellus" # Value of the replacement tag
     ```

2. **Map Values (`map_value`)**

   - **Use Case:** Standardizing different values used for the same concept, or grouping multiple specific values into one category.
   - **How it works:** You specify a `source_term_key` (the type of tag to look at, e.g., `call_type`).
     Then you provide a `value_mapping` dictionary listing original values and the new values they should be mapped to.
     Only tags matching the `source_term_key` and having a value listed in the mapping will be changed.
     You can optionally specify a `target_term_key` if you want to change the term type as well (e.g., mapping species to a genus).
   - **Example Config:** Standardize different ways "Echolocation" might have been written for the `call_type` term.
     ```yaml
     transform:
       rules:
         - rule_type: map_value
           source_term_key: call_type # Look at 'call_type' tags
           # target_term_key is not specified, so the term stays 'call_type'
           value_mapping:
             echolocation: Echolocation
             Echolocation Call: Echolocation
             Echoloc.: Echolocation
             # Add mappings for other values like 'Social' if needed
     ```
   - **Example Config (Grouping):** Map specific Pipistrellus species tags to a single `genus: Pipistrellus` tag.
     ```yaml
     transform:
       rules:
         - rule_type: map_value
           source_term_key: species # Look at 'species' tags
           target_term_key: genus # Change the term to 'genus'
           value_mapping:
             "Pipistrellus pipistrellus": Pipistrellus
             "Pipistrellus pygmaeus": Pipistrellus
             "Pipistrellus nathusii": Pipistrellus
     ```
3. **Derive a New Tag (`derive_tag`)**

   - **Use Case:** Automatically creating new information based on existing tags, like getting the genus from a species name.
   - **How it works:** You specify a `source_term_key` (e.g., `species`).
     You provide a `target_term_key` for the new tag to be created (e.g., `genus`).
     You also provide the name of a `derivation_function` (e.g., `"extract_genus"`) that knows how to perform the calculation (e.g., take "Myotis daubentonii" and return "Myotis").
     `batdetect2` has some built-in functions, or you can potentially define your own (see the advanced documentation).
     You can also choose whether to keep the original source tag (`keep_source: true`).
   - **Example Config:** Create a `genus` tag from the existing `species` tag, keeping the species tag.
     ```yaml
     transform:
       rules:
         - rule_type: derive_tag
           source_term_key: species # Use the value from the 'species' tag
           target_term_key: genus # Create a tag with the 'genus' term
           derivation_function: extract_genus # Use the built-in function for this
           keep_source: true # Keep the original 'species' tag
     ```
   - **Another Example:** Convert species names to uppercase (modifying the value of the _same_ term).
     ```yaml
     transform:
       rules:
         - rule_type: derive_tag
           source_term_key: species # Use the value from the 'species' tag
           # target_term_key is not specified, so the term stays 'species'
           derivation_function: to_upper_case # Assume this function exists
           keep_source: false # Replace the original species tag
     ```
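A derivation function can be very small: it takes the source tag's value and returns the new value. The sketch below illustrates the idea behind the two functions named in the examples above (the built-in implementations in `batdetect2` may differ):

```python
def extract_genus(species_name: str) -> str:
    """Return the genus part of a binomial species name."""
    return species_name.split()[0]

def to_upper_case(value: str) -> str:
    """Uppercase the tag value, as in the second example above."""
    return value.upper()

print(extract_genus("Myotis daubentonii"))  # -> Myotis
print(to_upper_case("Myotis daubentonii"))  # -> MYOTIS DAUBENTONII
```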
## Rule Order Matters

Remember that rules are applied one after another.
If you have multiple rules, make sure they are ordered correctly to achieve the desired outcome.
For instance, you might want to standardize species names _before_ deriving the genus from them.

## Outcome

After applying all the transformation rules you've defined, the annotations will proceed to the next step (Step 4: Select Target Tags & Define Classes) with their tags potentially cleaned, standardized, or augmented based on your configuration.
If you don't define any rules, the tags simply pass through this step unchanged.

---
# Bringing It All Together: The `Targets` Object

## Recap: Defining Your Target Strategy

In the previous sections, we covered the sequential steps to precisely define what your BatDetect2 model should learn, specified within your configuration file:

1. **Terms:** Establishing the vocabulary for annotation tags.
2. **Filtering:** Selecting relevant sound event annotations.
3. **Transforming:** Optionally modifying tags.
4. **Classes:** Defining target categories, setting priorities, and specifying tag decoding rules.
5. **ROI Mapping:** Defining how annotation geometry maps to target position and size values.

You define all these aspects within your configuration file (e.g., YAML), which holds the complete specification for your target definition strategy, typically under a main `targets:` key.

## What is the `Targets` Object?

While the configuration file specifies _what_ you want to happen, BatDetect2 needs an active component to actually _perform_ these steps.
This is the role of the `Targets` object.

The `Targets` object is an organized container that holds all the specific functions and settings derived from your configuration file (`TargetConfig`).
It's created directly from your configuration and provides methods to apply the **filtering**, **transformation**, **ROI mapping** (geometry to position/size and back), **class encoding**, and **class decoding** steps you defined.
It effectively bundles together all the target definition logic determined by your settings into a single, usable object.

## How is it Created and Used?

For most standard training workflows, you typically won't need to create or interact with the `Targets` object directly in Python code.
BatDetect2 usually handles its creation automatically when you provide your main configuration file during training setup.

Conceptually, here's what happens behind the scenes:

1. You provide the path to your configuration file (e.g., `my_training_config.yaml`).
2. BatDetect2 reads this file and finds your `targets:` configuration section.
3. It uses this configuration to build an instance of the `Targets` object using a dedicated function (like `load_targets`), loading it with the appropriate logic based on your settings.
```python
# Conceptual Example: How BatDetect2 might use your configuration
from batdetect2.targets import load_targets  # The function to load/build the object
from batdetect2.targets.types import TargetProtocol  # The type/interface

# You provide this path, usually as part of the main training setup
target_config_file = "path/to/your/target_config.yaml"

# --- BatDetect2 Internally Does Something Like This: ---
# Loads your config and builds the Targets object using the loader function.
# The resulting object adheres to the TargetProtocol interface.
targets_processor: TargetProtocol = load_targets(target_config_file)
# ---------------------------------------------------------

# Now, 'targets_processor' holds all your configured logic and is ready
# to be used internally by the training pipeline or for prediction processing.
```
## What Does the `Targets` Object Do? (Its Role)

Once created, the `targets_processor` object plays several vital roles within the BatDetect2 system:

1. **Preparing Training Data:** During the data loading and label generation phase of training, BatDetect2 uses this object to process each annotation from your dataset _before_ the final training format (e.g., heatmaps) is generated.
   For each annotation, it internally applies the logic:
   - `targets_processor.filter(...)`: To decide whether to keep the annotation.
   - `targets_processor.transform(...)`: To apply any tag modifications.
   - `targets_processor.encode(...)`: To get the final class name (e.g., `'pippip'`, `'myodau'`, or `None` for the generic class).
   - `targets_processor.get_position(...)`: To determine the reference `(time, frequency)` point from the annotation's geometry.
   - `targets_processor.get_size(...)`: To calculate the _scaled_ width and height target values from the annotation's geometry.
2. **Interpreting Model Predictions:** When you use a trained model, its raw outputs (like predicted class names, positions, and sizes) need to be translated back into meaningful results.
   This object provides the necessary decoding logic:
   - `targets_processor.decode(...)`: Converts a predicted class name back into representative annotation tags.
   - `targets_processor.recover_roi(...)`: Converts a predicted position and _scaled_ size values back into an estimated geometric bounding box in real-world coordinates (seconds, Hz).
   - `targets_processor.generic_class_tags`: Provides the tags for sounds classified into the generic category.
3. **Providing Metadata:** It conveniently holds useful information derived from your configuration:
   - `targets_processor.class_names`: The final list of specific target class names.
   - `targets_processor.generic_class_tags`: The tags representing the generic class.
   - `targets_processor.dimension_names`: The names used for the size dimensions (e.g., `['width', 'height']`).
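Putting the training-side roles together, the per-annotation flow is roughly: filter, then transform, then encode class, position, and size. A conceptual sketch with stand-in stubs (the real object implements `TargetProtocol`; the `process_annotation` helper, the `FakeTargets` class, and the dict-based annotations are all invented for illustration):

```python
def process_annotation(targets, annotation):
    """Mirror of the per-annotation logic described above (conceptual)."""
    if not targets.filter(annotation):
        return None  # dropped by the filtering rules
    annotation = targets.transform(annotation)
    return {
        "class": targets.encode(annotation),          # None => generic class
        "position": targets.get_position(annotation),
        "size": targets.get_size(annotation),
    }

# A trivial stand-in object, just to show the call pattern:
class FakeTargets:
    def filter(self, ann): return ann["quality"] == "Good"
    def transform(self, ann): return ann
    def encode(self, ann): return ann.get("species")
    def get_position(self, ann): return (ann["time"], ann["freq"])
    def get_size(self, ann): return (1.0, 1.0)

ann = {"quality": "Good", "species": "myodau", "time": 0.5, "freq": 45_000}
print(process_annotation(FakeTargets(), ann)["class"])  # -> myodau
```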
## Why is Understanding This Important?

As a researcher using BatDetect2, your primary interaction is typically through the **configuration file**.
The `Targets` object is the component that materializes your configuration.

Understanding its role can be important:

- It helps connect the settings in your configuration file (covering terms, filtering, transforms, classes, and ROIs) to the actual behavior observed during training or when interpreting model outputs.
  If the results aren't as expected (e.g., wrong classifications, incorrect bounding box predictions), reviewing the relevant sections of your `TargetConfig` is the first step in debugging.
- Furthermore, understanding this structure is beneficial if you plan to create custom Python scripts.
  While standard training runs handle this object internally, the underlying functions for filtering, transforming, encoding, decoding, and ROI mapping are accessible or can be built individually.
  This modular design provides the **flexibility to use or customize specific parts of the target definition workflow programmatically** for advanced analyses, integration tasks, or specialized data processing pipelines, should you need to go beyond the standard configuration-driven approach.

## Summary

The `Targets` object encapsulates the entire configured target definition logic specified in your `TargetConfig` file.
It acts as the central component within BatDetect2 for applying filtering, tag transformation, ROI mapping (geometry to/from position/size), class encoding (for training preparation), and class/ROI decoding (for interpreting predictions).
It bridges the gap between your declarative configuration and the functional steps needed for training and using BatDetect2 models effectively, while also offering components for more advanced, scripted workflows.