Update targets docs
# Defining Training Targets

A crucial aspect of training any supervised machine learning model, including BatDetect2, is clearly defining the **training targets**.
This process determines precisely what the model should learn to detect, localize, classify, and characterize from the input data (in this case, spectrograms).
The choices made here directly influence the model's focus, its performance, and how its predictions should be interpreted.

For BatDetect2, defining targets involves specifying:

- Which sounds in your annotated dataset are relevant for training.
- How these sounds should be categorized into distinct **classes** (e.g., different species).
- How the geometric **Region of Interest (ROI)** (e.g., bounding box) of each sound maps to the specific **position** and **size** targets the model predicts.
- How these classes and geometric properties relate back to the detailed information stored in your annotation **tags** (using a consistent **vocabulary/terms**).
- How the model's output (predicted class names, positions, sizes) should be translated back into meaningful tags and geometries.

## Sound Event Annotations: The Starting Point

BatDetect2 assumes your training data consists of audio recordings where relevant sound events have been **annotated**.
A typical annotation for a single sound event provides two key pieces of information:

1. **Location & Extent:** Information defining _where_ the sound occurs in time and frequency, usually represented as a **bounding box** (the ROI) drawn on a spectrogram.
2. **Description (Tags):** Information _about_ the sound event, provided as a set of descriptive **tags** (key-value pairs).

Tags are fundamental to how BatDetect2 understands annotations.
Each tag is a **key-value pair**, much like labels used in many data systems.
For example, an annotation might have a bounding box and tags like:

- `species: Myotis daubentonii`
- `quality: Good`
- `call_type: Echolocation`
- `verified_by: ExpertA`
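
As a purely illustrative sketch (the field names below are hypothetical and do not reproduce the actual BatDetect2 annotation schema), such an annotation boils down to a time-frequency box plus a set of key-value tags:

```python
# Purely illustrative: one sound event annotation as plain Python data.
# The field names are hypothetical, not the actual BatDetect2 annotation format.
annotation = {
    "roi": {  # the bounding box drawn on the spectrogram
        "start_time": 0.30,   # seconds
        "end_time": 0.33,     # seconds
        "low_freq": 35_000,   # Hz
        "high_freq": 80_000,  # Hz
    },
    "tags": {  # each key-value pair is one tag
        "species": "Myotis daubentonii",
        "quality": "Good",
        "call_type": "Echolocation",
        "verified_by": "ExpertA",
    },
}
```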
A single sound event can have **multiple tags**, allowing for rich descriptions.
This richness requires a structured process to translate the annotation (both tags and geometry) into the precise targets needed for model training.
The **target definition process** (illustrated by the toy sketch after this list) provides clear rules to:

- Interpret the meaning of different tag keys (**Terms**).
- Select only the relevant annotations (**Filtering**).
- Potentially standardize or modify the tags (**Transforming**).
- Map the geometric ROI to specific position and size targets (**ROI Mapping**).
- Map the final set of tags on each selected annotation to a single, definitive **target class** label (**Classes**).
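
To make the order of these rules concrete, here is a toy sketch that applies hand-written stand-ins for each step to a single annotation. None of the helper functions below are part of the BatDetect2 API, and the specific rules (which tags to keep, how to name classes, which corner of the box to use) are arbitrary examples; in practice the rules come from your configuration, as described in the following sections.

```python
# Toy sketch only: hand-written stand-ins for the target definition rules above.
# None of these helpers exist in BatDetect2; they just make the order explicit.

annotation = {
    "roi": {"start_time": 0.30, "end_time": 0.33, "low_freq": 35_000, "high_freq": 80_000},
    "tags": {"species": "Myotis daubentonii", "quality": "Good", "call_type": "Echolocation"},
}

def keep(tags):  # Filtering: e.g., keep only good-quality echolocation calls
    return tags.get("quality") == "Good" and tags.get("call_type") == "Echolocation"

def transform(tags):  # Transforming (optional): e.g., derive a genus tag from the species
    return {**tags, "genus": tags["species"].split()[0]}

def encode(tags):  # Classes: map the remaining tags to a single target class name
    return {"Myotis daubentonii": "myodau", "Pipistrellus pipistrellus": "pippip"}.get(tags["species"])

def map_roi(roi):  # ROI Mapping: a reference point and a size for the bounding box
    position = (roi["start_time"], roi["low_freq"])          # lower-left corner
    size = (roi["end_time"] - roi["start_time"],             # width in seconds
            roi["high_freq"] - roi["low_freq"])              # height in Hz
    return position, size

if keep(annotation["tags"]):
    tags = transform(annotation["tags"])
    class_name = encode(tags)                    # "myodau"
    position, size = map_roi(annotation["roi"])  # (time, freq) point and (width, height)
```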
## Configuration-Driven Workflow

These settings are usually grouped under a main `targets:` key within your overall configuration file.

Defining the targets involves several sequential steps, each configurable and building upon the previous one:

1. **Defining Vocabulary (Terms & Tags):** Understand how annotations use tags (key-value pairs).
   This step involves defining the meaning (**Terms**) behind the tag keys (e.g., `species`, `call_type`).
   Often, default terms are sufficient, but understanding this is key to using tags in later steps.
   (See: {doc}`tags_and_terms`)
2. **Filtering Sound Events:** Select only the relevant sound event annotations based on their tags (e.g., keeping only high-quality calls).
   (See: {doc}`filtering`)
3. **Transforming Tags (Optional):** Modify tags on selected annotations for standardization, correction, grouping (e.g., species to genus), or deriving new tags.
   (See: {doc}`transform`)
4. **Defining Classes & Decoding Rules:** Map the final tags to specific target **class names** (like `pippip` or `myodau`).
   Define priorities for overlap and specify how predicted names map back to tags (decoding).
   (See: {doc}`classes`)
5. **Mapping ROIs (Position & Size):** Define how the geometric ROI (e.g., bounding box) of each sound event maps to the specific reference **point** (e.g., center, corner) and scaled **size** values (width, height) used as targets by the model.
   (See: {doc}`rois`)
6. **The `Targets` Object:** Understand the outcome of configuring steps 1-5 – a functional object used internally by BatDetect2 that encapsulates all your defined rules for filtering, transforming, ROI mapping, encoding, and decoding.
   (See: {doc}`use`)

The result of this configuration process is a clear set of instructions that BatDetect2 uses during training data preparation to determine the correct "answer" (the ground truth label and geometry representation) for each relevant sound event.

Explore the detailed steps using the links below:

```{toctree}
:maxdepth: 1
:caption: Target Definition Steps:

tags_and_terms
filtering
transform
classes
labels
rois
use
```
## Bringing It All Together: The `Targets` Object

### Recap: Defining Your Target Strategy

In the previous sections, we covered the sequential steps to precisely define what your BatDetect2 model should learn, specified within your configuration file:

1. **Terms:** Establishing the vocabulary for annotation tags.
2. **Filtering:** Selecting relevant sound event annotations.
3. **Transforming:** Optionally modifying tags.
4. **Classes:** Defining target categories, setting priorities, and specifying tag decoding rules.
5. **ROI Mapping:** Defining how annotation geometry maps to target position and size values.

You define all these aspects within your configuration file (e.g., YAML), which holds the complete specification for your target definition strategy, typically under a main `targets:` key.

### What is the `Targets` Object?

While the configuration file specifies _what_ you want to happen, BatDetect2 needs an active component to actually _perform_ these steps.
This is the role of the `Targets` object.

The `Targets` object is an organized container that holds all the specific functions and settings derived from your configuration file (`TargetConfig`).
It's created directly from your configuration and provides methods to apply the **filtering**, **transformation**, **ROI mapping** (geometry to position/size and back), **class encoding**, and **class decoding** steps you defined.
It effectively bundles together all the target definition logic determined by your settings into a single, usable object.

### How is it Created and Used?

For most standard training workflows, you typically won't need to create or interact with the `Targets` object directly in Python code.
BatDetect2 usually handles its creation automatically when you provide your main configuration file during training setup.

Conceptually, here's what happens behind the scenes:

1. You provide the path to your configuration file (e.g., `my_training_config.yaml`).
2. BatDetect2 reads this file and finds your `targets:` configuration section.
3. It uses this configuration to build an instance of the `Targets` object using a dedicated function (like `load_targets`), loading it with the appropriate logic based on your settings.

```python
# Conceptual Example: How BatDetect2 might use your configuration
from batdetect2.targets import load_targets  # The function to load/build the object
from batdetect2.targets.types import TargetProtocol  # The type/interface

# You provide this path, usually as part of the main training setup
target_config_file = "path/to/your/target_config.yaml"

# --- BatDetect2 Internally Does Something Like This: ---
# Loads your config and builds the Targets object using the loader function
# The resulting object adheres to the TargetProtocol interface
targets_processor: TargetProtocol = load_targets(target_config_file)
# ---------------------------------------------------------

# Now, 'targets_processor' holds all your configured logic and is ready
# to be used internally by the training pipeline or for prediction processing.
```
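
As a small follow-up to the conceptual example above, the loaded object also exposes metadata derived from your configuration (the attributes are described in the next section). This is only a sketch: the path is a placeholder, and the printed values depend entirely on your own configuration.

```python
from batdetect2.targets import load_targets

# Sketch: inspect the configuration-derived metadata of the loaded object.
# The values shown in the comments are examples only.
targets_processor = load_targets("path/to/your/target_config.yaml")

print(targets_processor.class_names)         # e.g. ['myodau', 'pippip', ...]
print(targets_processor.generic_class_tags)  # tags representing the generic class
print(targets_processor.dimension_names)     # e.g. ['width', 'height']
```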
### What Does the `Targets` Object Do? (Its Role)

Once created, the `targets_processor` object plays several vital roles within the BatDetect2 system (a conceptual sketch follows this list):

1. **Preparing Training Data:** During the data loading and label generation phase of training, BatDetect2 uses this object to process each annotation from your dataset _before_ the final training format (e.g., heatmaps) is generated.
   For each annotation, it internally applies the logic:
   - `targets_processor.filter(...)`: To decide whether to keep the annotation.
   - `targets_processor.transform(...)`: To apply any tag modifications.
   - `targets_processor.encode(...)`: To get the final class name (e.g., `'pippip'`, `'myodau'`, or `None` for the generic class).
   - `targets_processor.get_position(...)`: To determine the reference `(time, frequency)` point from the annotation's geometry.
   - `targets_processor.get_size(...)`: To calculate the _scaled_ width and height target values from the annotation's geometry.
2. **Interpreting Model Predictions:** When you use a trained model, its raw outputs (like predicted class names, positions, and sizes) need to be translated back into meaningful results.
   This object provides the necessary decoding logic:
   - `targets_processor.decode(...)`: Converts a predicted class name back into representative annotation tags.
   - `targets_processor.recover_roi(...)`: Converts a predicted position and _scaled_ size values back into an estimated geometric bounding box in real-world coordinates (seconds, Hz).
   - `targets_processor.generic_class_tags`: Provides the tags for sounds classified into the generic category.
3. **Providing Metadata:** It conveniently holds useful information derived from your configuration:
   - `targets_processor.class_names`: The final list of specific target class names.
   - `targets_processor.generic_class_tags`: The tags representing the generic class.
   - `targets_processor.dimension_names`: The names used for the size dimensions (e.g., `['width', 'height']`).
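
To show how these pieces fit together, here is a conceptual sketch in the same spirit as the loading example earlier. The method and attribute names are the documented ones, but the argument and return types are simplified assumptions (for instance, that `transform` returns the modified annotation and that `recover_roi` takes the position and size directly), so treat it as an outline rather than the exact API.

```python
# Conceptual sketch of how the documented methods fit together.
# `targets_processor` is the object created earlier (e.g., via `load_targets`);
# `annotation` stands for whatever annotation object your dataset provides.

def prepare_annotation(targets_processor, annotation):
    """Roughly what happens per annotation when training data is prepared."""
    if not targets_processor.filter(annotation):           # Filtering
        return None                                        # annotation is skipped
    annotation = targets_processor.transform(annotation)   # tag modifications
    class_name = targets_processor.encode(annotation)      # e.g. 'myodau', or None for generic
    position = targets_processor.get_position(annotation)  # reference (time, frequency) point
    size = targets_processor.get_size(annotation)          # scaled width/height target values
    return class_name, position, size

def interpret_prediction(targets_processor, class_name, position, size):
    """Roughly how raw model outputs are turned back into tags and geometry."""
    if class_name is None:
        tags = targets_processor.generic_class_tags        # generic category
    else:
        tags = targets_processor.decode(class_name)        # class name -> tags
    roi = targets_processor.recover_roi(position, size)    # back to seconds / Hz
    return tags, roi
```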
### Why is Understanding This Important?

As a researcher using BatDetect2, your primary interaction is typically through the **configuration file**.
The `Targets` object is the component that turns that configuration into concrete behavior.

Understanding its role matters for two reasons:

- It helps connect the settings in your configuration file (covering terms, filtering, transforms, classes, and ROIs) to the actual behavior observed during training or when interpreting model outputs.
  If the results aren't as expected (e.g., wrong classifications, incorrect bounding box predictions), reviewing the relevant sections of your `TargetConfig` is the first step in debugging.
- Understanding this structure is also beneficial if you plan to create custom Python scripts.
  While standard training runs handle this object internally, the underlying functions for filtering, transforming, encoding, decoding, and ROI mapping are accessible or can be built individually.
  This modular design provides the **flexibility to use or customize specific parts of the target definition workflow programmatically** for advanced analyses, integration tasks, or specialized data processing pipelines, should you need to go beyond the standard configuration-driven approach.

### Summary

The `Targets` object encapsulates the entire target definition logic specified in your configuration (`TargetConfig`).
It acts as the central component within BatDetect2 for applying filtering, tag transformation, ROI mapping (geometry to/from position/size), class encoding (for training preparation), and class/ROI decoding (for interpreting predictions).
It bridges the gap between your declarative configuration and the functional steps needed for training and using BatDetect2 models effectively, while also offering components for more advanced, scripted workflows.