docs: add task guides and API/config references

2026-05-22 22:32:18 +02:00 · 2026-04-30 11:48:19 +01:00 · 2026-04-30 11:48:19 +01:00 · 300716895e
commit 300716895e
parent 9dec35b1ce
21 changed files with 971 additions and 1 deletions
--- a/docs/source/explanation/evaluation-concepts-and-matching.md
+++ b/docs/source/explanation/evaluation-concepts-and-matching.md
@ -0,0 +1,48 @@
 # Evaluation concepts and matching
 Evaluation is not just "run predictions and compute one number".
 The reported metric depends on the evaluation task, the matching rule, and the treatment of clip boundaries and generic labels.
 ## Task families answer different questions
 Built-in task families include:
 - sound event detection,
 - sound event classification,
 - top-class detection,
 - clip detection,
 - clip classification.
 Choose the task that matches the scientific or engineering question.
 ## Matching matters
 For sound-event-style tasks, predictions and annotations are matched using an affinity function, which scores how closely each prediction corresponds to each annotation.
 Important controls include:
 - `affinity`,
 - `affinity_threshold`,
 - `strict_match`,
 - `ignore_start_end`.
 Small changes here can change the reported metric without changing the underlying predictions.
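The sketch below shows how an affinity threshold alone can change which predictions count as matches. It makes two assumptions: the affinity is temporal intersection-over-union, and matching is greedy and one-to-one with pairs required to exceed the threshold. BatDetect2's actual affinity functions and the behavior of `strict_match` may differ, and `temporal_iou` and `greedy_match` are hypothetical helpers.

```python
from typing import List, Tuple

Interval = Tuple[float, float]  # (start_time, end_time) in seconds


def temporal_iou(a: Interval, b: Interval) -> float:
    """Intersection-over-union of two time intervals (a hypothetical affinity)."""
    intersection = max(0.0, min(a[1], b[1]) - max(a[0], b[0]))
    union = (a[1] - a[0]) + (b[1] - b[0]) - intersection
    return intersection / union if union > 0 else 0.0


def greedy_match(
    predictions: List[Interval],
    annotations: List[Interval],
    affinity_threshold: float,
) -> List[Tuple[int, int]]:
    """Pair predictions and annotations one-to-one, best affinity first."""
    candidates = [
        (temporal_iou(p, a), i, j)
        for i, p in enumerate(predictions)
        for j, a in enumerate(annotations)
    ]
    candidates.sort(reverse=True)  # strongest pairs first
    used_pred, used_ann, matches = set(), set(), []
    for affinity, i, j in candidates:
        if affinity <= affinity_threshold:
            break  # every remaining pair is weaker still
        if i in used_pred or j in used_ann:
            continue  # each side may be matched at most once
        used_pred.add(i)
        used_ann.add(j)
        matches.append((i, j))
    return sorted(matches)
```

With predictions `[(0.0, 0.1), (0.5, 0.55)]` and annotations `[(0.02, 0.1), (0.45, 0.6)]`, a threshold of `0.0` matches both pairs, while `0.5` keeps only the first, even though the predictions are identical.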
 ## Boundary handling matters
 The base evaluation task can exclude events near clip boundaries through `ignore_start_end`.
 This is useful when calls are cut off at the start or end of a clip, because a partial event makes a match ambiguous.
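A minimal sketch of boundary exclusion, assuming `ignore_start_end` is a margin in seconds and that an event is ignored when it starts within that margin of the clip start or ends within it of the clip end. The exact rule BatDetect2 applies may differ, and `filter_boundary_events` is a hypothetical helper.

```python
from typing import List, Tuple

Event = Tuple[float, float]  # (start_time, end_time) in seconds


def filter_boundary_events(
    events: List[Event],
    clip_start: float,
    clip_end: float,
    ignore_start_end: float,
) -> List[Event]:
    """Drop events that start or end within `ignore_start_end` of a clip edge."""
    kept = []
    for start, end in events:
        near_start = start < clip_start + ignore_start_end
        near_end = end > clip_end - ignore_start_end
        if not (near_start or near_end):
            kept.append((start, end))
    return kept
```

One reasonable design applies the same filter to both predictions and annotations, so that neither side is penalized for events the other side excluded.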
 ## Generic labels can matter in classification
 Classification tasks can include or exclude generic targets depending on configuration.
 That affects what counts as a valid class-level comparison.
 ## Related pages
 - Evaluate on a test set: {doc}`../tutorials/evaluate-on-a-test-set`
 - Evaluation config reference: {doc}`../reference/evaluation-config`
 - Model output and validation: {doc}`model-output-and-validation`
--- a/docs/source/explanation/extracted-features-and-embeddings.md
+++ b/docs/source/explanation/extracted-features-and-embeddings.md
@ -0,0 +1,36 @@
 # Extracted features and embeddings
 The current API exposes a per-detection `features` vector.
 Older BatDetect2 workflows also exposed concepts such as `cnn_feats`, `spec_features`, and `spec_slices`.
 ## What the current feature vector is
 In the current stack, each retained detection can carry an internal feature representation produced by the model output pipeline.
 This is useful for downstream exploration, comparison, and custom analysis.
 ## What these features are not
 They are not automatically human-interpretable ecological variables.
 They are also not a substitute for careful validation.
 ## Why people refer to them as embeddings
 In practice, users often treat these feature vectors as embeddings because they can be used as dense learned representations of detections.
 That usage is reasonable, but you should still treat them as model-derived internal representations whose meaning depends on the training setup.
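To illustrate embedding-style use, the sketch below compares two detections by cosine similarity. The vectors are made-up placeholders; in practice they would come from `api.get_detection_features(detection)`. Whether cosine similarity is meaningful for a given checkpoint is something to check, not assume.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two feature vectors (1.0 = same direction)."""
    if len(a) != len(b):
        raise ValueError("feature vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # a zero vector has no direction to compare
    return dot / (norm_a * norm_b)


# Placeholder vectors standing in for two detections' feature vectors.
call_a = [0.2, 0.8, 0.1]
call_b = [0.25, 0.75, 0.05]
print(round(cosine_similarity(call_a, call_b), 3))
```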
 ## Legacy terminology versus current terminology
 - legacy `cnn_feats` referred to CNN feature outputs in the older workflow,
 - legacy `spec_features` referred to lower-level extracted call features,
 - current `features` are the per-detection vectors attached to `Detection` objects.
 These are related ideas, but not necessarily one-to-one replacements.
 ## Related pages
 - Inspect detection features in Python: {doc}`../how_to/inspect-detection-features-in-python`
 - Legacy feature extraction: {doc}`../legacy/feature-extraction`
--- a/docs/source/explanation/interpreting-formatted-outputs.md
+++ b/docs/source/explanation/interpreting-formatted-outputs.md
@ -0,0 +1,36 @@
 # Interpreting formatted outputs
 BatDetect2 can write predictions in several output formats.
 Those formats are different views of the same underlying detections, not different model behaviors.
 ## Separate the underlying detection from the serialized file
 Internally, the current stack works with clip-level detections containing geometry, detection score, class scores, and features.
 Output formatters then serialize those detections in different ways.
 ## Raw outputs are richest
 The `raw` format preserves the broadest structured view of detections and is a good default when you want to inspect or reload predictions later.
 ## Tabular outputs are for analysis convenience
 The `parquet` format is convenient for data analysis workflows, but the tabular representation is only one projection of the underlying detection object.
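To make the idea of a projection concrete, the sketch below flattens one structured detection into a single table row. The `Detection` dataclass, its fields, and the column names are hypothetical and do not describe the actual `parquet` schema.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Detection:
    # Hypothetical stand-in for a structured detection.
    start_time: float
    end_time: float
    low_freq: float
    high_freq: float
    detection_score: float
    class_scores: Dict[str, float]
    features: List[float]


def to_row(det: Detection, include_features: bool = False) -> Dict[str, float]:
    """Flatten a detection into one flat dict, i.e. one table row."""
    row = {
        "start_time": det.start_time,
        "end_time": det.end_time,
        "low_freq": det.low_freq,
        "high_freq": det.high_freq,
        "detection_score": det.detection_score,
    }
    # One column per class score.
    for name, score in det.class_scores.items():
        row[f"score_{name}"] = score
    # Features are optional, mirroring an include_features-style switch.
    if include_features:
        for i, value in enumerate(det.features):
            row[f"feature_{i}"] = value
    return row
```

Anything the flattening leaves out is simply absent from the table, which is one reason `raw` is the better choice for round-tripping.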
 ## Legacy-shaped outputs are mainly for compatibility
 The `batdetect2` formatter writes the older BatDetect2-style JSON shape.
 Use it when you need compatibility with older downstream tools or workflows.
 ## The meaning does not come from the file extension
 Do not assume that a `.json`, `.parquet`, or `.nc` file changes what the model predicted.
 The format changes only how the prediction is packaged and how much detail is retained.
 ## Related pages
 - Output formats reference: {doc}`../reference/output-formats`
 - Outputs config reference: {doc}`../reference/outputs-config`
--- a/docs/source/explanation/what-batdetect2-predicts.md
+++ b/docs/source/explanation/what-batdetect2-predicts.md
@ -0,0 +1,45 @@
 # What BatDetect2 predicts
 BatDetect2 predicts call-level events, not recording-level truth.
 For each retained detection, the current stack can expose:
 - a geometry describing where the event sits in time-frequency space,
 - a detection score,
 - a class-score vector,
 - an internal feature vector.
 ## Detection score versus class scores
 These are different outputs and should not be interpreted as the same thing.
 - The detection score indicates how confident the model is that a call event is present; applying the detection threshold to it decides whether the event is kept.
 - The class-score vector ranks classes for that detected event.
 A detection can be kept while still having uncertain class identity.
 ## Predictions are conditional on the workflow
 The final output also depends on:
 - preprocessing,
 - postprocessing,
 - thresholds,
 - target definitions,
 - output transforms.
 That is why two runs can differ even when they use the same checkpoint.
 ## What BatDetect2 does not predict
 BatDetect2 does not directly output ecological truth.
 It also does not eliminate the need for local validation.
 Use reviewed local data before making ecological claims.
 ## Related pages
 - Model output and validation: {doc}`model-output-and-validation`
 - Postprocessing and thresholds: {doc}`postprocessing-and-thresholds`
 - Interpreting formatted outputs: {doc}`interpreting-formatted-outputs`
--- a/docs/source/how_to/choose-an-inference-input-mode.md
+++ b/docs/source/how_to/choose-an-inference-input-mode.md
@ -0,0 +1,66 @@
 # How to choose an inference input mode
 Use this guide to decide whether `predict directory`, `predict file_list`, or `predict dataset` is the right entry point for your run.
 ## Use `predict directory` when the recordings already live together
 This is the simplest choice.
 Use it when:
 - your recordings are already organized in one directory tree,
 - you want BatDetect2 to discover audio files for you,
 - you are doing a first pass over a folder of recordings.
 ```bash
 batdetect2 predict directory \
  path/to/model.ckpt \
  path/to/audio_dir \
  path/to/outputs
 ```
 ## Use `predict file_list` when you need explicit control over the file set
 Use it when:
 - you want to run only a selected subset,
 - your files are spread across directories,
 - another tool has already produced the exact list of recordings to process.
 The list file should contain one path per line.
 ```bash
 batdetect2 predict file_list \
  path/to/model.ckpt \
  path/to/audio_files.txt \
  path/to/outputs
 ```
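If the recordings are spread across several directories, a short script can build the list file. This is a standard-library sketch; the directory names and the `*.wav` pattern are assumptions to adapt (note that the pattern is case-sensitive on most Linux file systems, so `.WAV` files would be missed), and `write_file_list` is a hypothetical helper.

```python
from pathlib import Path
from typing import Iterable


def write_file_list(
    audio_dirs: Iterable[Path], out_path: Path, pattern: str = "*.wav"
) -> int:
    """Write one audio path per line to `out_path`; return the number written."""
    # Resolve and de-duplicate paths found recursively under every directory.
    paths = sorted(
        {p.resolve() for d in audio_dirs for p in Path(d).rglob(pattern) if p.is_file()}
    )
    out_path.write_text("".join(f"{p}\n" for p in paths))
    return len(paths)


# Example (hypothetical directory names):
# write_file_list([Path("site_a/recordings"), Path("site_b/recordings")],
#                 Path("audio_files.txt"))
```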
 ## Use `predict dataset` when your workflow is already annotation-set driven
 Use it when:
 - your project already has a `soundevent` annotation set,
 - you want prediction runs aligned with that annotation metadata,
 - you want BatDetect2 to resolve recording paths from the annotation set.
 ```bash
 batdetect2 predict dataset \
  path/to/model.ckpt \
  path/to/annotation_set.json \
  path/to/outputs
 ```
 The dataset command reads a `soundevent` annotation set and extracts unique recording paths before inference.
 ## Rule of thumb
 - Start with `directory` for the easiest first run.
 - Use `file_list` when selection matters.
 - Use `dataset` when the rest of your workflow is already dataset-based.
 ## Related pages
 - Run batch predictions: {doc}`run-batch-predictions`
 - Tune inference clipping: {doc}`tune-inference-clipping`
 - Predict command reference: {doc}`../reference/cli/predict`
--- a/docs/source/how_to/choose-and-configure-evaluation-tasks.md
+++ b/docs/source/how_to/choose-and-configure-evaluation-tasks.md
@ -0,0 +1,66 @@
 # How to choose and configure evaluation tasks
 Use this guide when the default evaluation tasks do not match the question you want to answer.
 ## Know the default first
 By default, BatDetect2 evaluation starts with:
 - sound event detection,
 - sound event classification.
 Those are good defaults for many projects, but not for all of them.
 ## Choose the task that matches the question
 Common built-in task families include:
 - `sound_event_detection`
 - `sound_event_classification`
 - `top_class_detection`
 - `clip_detection`
 - `clip_classification`
 Choose based on the question you care about.
 - Use sound-event tasks when you care about individual call events.
 - Use clip tasks when you care about clip-level presence or clip-level class evidence.
 - Use top-class detection when you want matching based on the highest-scoring class per detection.
 ## Configure tasks in `EvaluationConfig`
 Example:
 ```yaml
 tasks:
  - name: sound_event_detection
    prefix: detection
    affinity_threshold: 0.0
    strict_match: true
  - name: clip_classification
    prefix: clip_classification
 ```
 Pass the config with:
 ```bash
 batdetect2 evaluate \
  path/to/model.ckpt \
  path/to/test_dataset.yaml \
  --base-dir path/to/project_root \
  --evaluation-config path/to/evaluation.yaml
 ```
 Include `--base-dir` when the dataset config resolves recordings through relative paths.
 ## Change one thing at a time
 When comparing models or settings, avoid changing task definitions, thresholds, matching behavior, and datasets all at once.
 Otherwise it becomes hard to explain why the metric changed.
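Before comparing two runs, it can help to list exactly which settings differ. A minimal sketch that diffs two nested config dictionaries, for example evaluation configs you have already loaded from YAML; YAML loading is left out to keep it dependency-free, and `diff_configs` is a hypothetical helper. Lists such as `tasks` are compared as whole values, and a missing key is reported as `None`.

```python
from typing import Any, Dict, List, Tuple


def diff_configs(
    old: Dict[str, Any], new: Dict[str, Any], prefix: str = ""
) -> List[Tuple[str, Any, Any]]:
    """Return (dotted_key, old_value, new_value) for every setting that differs."""
    changes = []
    for key in sorted(set(old) | set(new), key=str):
        path = f"{prefix}{key}"
        a, b = old.get(key), new.get(key)
        if isinstance(a, dict) and isinstance(b, dict):
            # Recurse into nested sections such as per-task settings.
            changes.extend(diff_configs(a, b, prefix=f"{path}."))
        elif a != b:
            changes.append((path, a, b))
    return changes
```

If the result has more than one entry, the comparison is changing several things at once.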
 ## Related pages
 - Evaluation tutorial: {doc}`../tutorials/evaluate-on-a-test-set`
 - Evaluation config reference: {doc}`../reference/evaluation-config`
 - Evaluation concepts: {doc}`../explanation/evaluation-concepts-and-matching`
--- a/docs/source/how_to/fine-tune-from-a-checkpoint.md
+++ b/docs/source/how_to/fine-tune-from-a-checkpoint.md
@ -0,0 +1,45 @@
 # How to fine-tune from a checkpoint
 Use this guide when you want to continue from an existing checkpoint instead of training a fresh model config.
 ## Use `--model` for checkpoint-based training
 Pass a checkpoint with `--model`.
 Do not combine `--model` with `--model-config`.
 ```bash
 batdetect2 train \
  path/to/train_dataset.yaml \
  --val-dataset path/to/val_dataset.yaml \
  --model path/to/model.ckpt \
  --training-config path/to/training.yaml
 ```
 ## Keep targets and preprocessing aligned
 If you override targets or audio-related settings while fine-tuning, validate that they still match the checkpoint and your dataset.
 Mismatches here can produce confusing failures or invalid comparisons.
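A quick pre-flight check can catch label mismatches before a long fine-tuning run. This sketch compares the class names a checkpoint was trained with against the class names in your annotations. How you obtain either list is left open, and `check_label_alignment` is a hypothetical helper, not part of BatDetect2.

```python
from typing import Dict, Iterable, Set


def check_label_alignment(
    checkpoint_classes: Iterable[str], dataset_classes: Iterable[str]
) -> Dict[str, Set[str]]:
    """Report class names that appear on only one side."""
    model = set(checkpoint_classes)
    data = set(dataset_classes)
    return {
        "only_in_checkpoint": model - data,  # classes the data never exercises
        "only_in_dataset": data - model,  # labels the checkpoint cannot predict
    }
```

A non-empty `only_in_dataset` is expected when the goal is adapting to a new label set; the point is to make any mismatch deliberate rather than accidental.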
 ## Decide what question the fine-tune should answer
 Common fine-tuning goals are:
 - adapting to local recording conditions,
 - adapting to a new label set,
 - improving performance on a narrower deployment context.
 Make that goal explicit before comparing results.
 ## Evaluate after fine-tuning
 Always evaluate the fine-tuned checkpoint on a held-out dataset.
 Use the same evaluation setup when comparing before and after.
 ## Related pages
 - Training tutorial: {doc}`../tutorials/train-a-custom-model`
 - Evaluate a test set: {doc}`../tutorials/evaluate-on-a-test-set`
 - Train command reference: {doc}`../reference/cli/train`
--- a/docs/source/how_to/inspect-class-scores-in-python.md
+++ b/docs/source/how_to/inspect-class-scores-in-python.md
@ -0,0 +1,44 @@
 # How to inspect class scores in Python
 Use this guide when you need more than the top class label for each detection.
 ## Get the ranked class scores
 `BatDetect2API.get_class_scores` returns `(class_name, score)` pairs for one detection.
 ```python
 from pathlib import Path
 from batdetect2.api_v2 import BatDetect2API
 api = BatDetect2API.from_checkpoint(Path("path/to/model.ckpt"))
 prediction = api.process_file(Path("path/to/audio.wav"))
 for detection in prediction.detections:
    print("detection score:", detection.detection_score)
    for class_name, score in api.get_class_scores(detection):
        print(class_name, score)
 ```
 ## Separate detection confidence from class ranking
 Keep these two ideas separate:
 - `detection_score` tells you how strongly the model kept the event as a detection,
 - `class_scores` tell you how the model ranked classes for that detected event.
 A detection can have a high detection score while still having an uncertain class ranking.
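One way to flag uncertain class ranking is the margin between the top two class scores. The sketch below works on the `(class_name, score)` pairs returned by `get_class_scores`. The `class_margin` helper and the `0.1` cut-off in the comment are illustrative assumptions, not BatDetect2 defaults.

```python
from typing import List, Optional, Tuple


def class_margin(pairs: List[Tuple[str, float]]) -> Optional[float]:
    """Gap between the best and second-best class scores (None if < 2 classes)."""
    # Sort here so the helper does not depend on the input order.
    scores = sorted((score for _, score in pairs), reverse=True)
    if len(scores) < 2:
        return None
    return scores[0] - scores[1]


# With the API object from the first example, you might flag detections like:
# margin = class_margin(api.get_class_scores(detection))
# if margin is not None and margin < 0.1:
#     print("uncertain class ranking")
```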
 ## Hide the top class if needed
 If you want to inspect only the alternatives, pass `include_top_class=False`.
 ```python
 api.get_class_scores(detection, include_top_class=False)
 ```
 ## Related pages
 - Python tutorial: {doc}`../tutorials/integrate-with-a-python-pipeline`
 - API reference: {doc}`../reference/api`
 - Understanding scores: {doc}`../explanation/what-batdetect2-predicts`
--- a/docs/source/how_to/inspect-detection-features-in-python.md
+++ b/docs/source/how_to/inspect-detection-features-in-python.md
@ -0,0 +1,49 @@
 # How to inspect detection features in Python
 Use this guide when you want the per-detection feature vectors exposed by the current API.
 ## Get the feature vector for one detection
 Each detection carries a `features` vector.
 The API exposes it through `get_detection_features`.
 ```python
 from pathlib import Path
 from batdetect2.api_v2 import BatDetect2API
 api = BatDetect2API.from_checkpoint(Path("path/to/model.ckpt"))
 prediction = api.process_file(Path("path/to/audio.wav"))
 for detection in prediction.detections:
    features = api.get_detection_features(detection)
    print(features.shape)
 ```
 ## Use features for exploration, not as ground truth labels
 These features are internal model representations attached to detections.
 They can be useful for:
 - exploratory visualization,
 - downstream clustering,
 - comparison across detections,
 - building extra analysis pipelines.
 They do not replace validation.
 They also do not automatically have a one-to-one interpretation as ecological variables.
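For comparison across detections, a simple starting point is each detection's nearest neighbour in feature space. This plain-Python sketch uses placeholder vectors and Euclidean distance; the distance choice is an assumption to validate. For real data, you would collect vectors with `api.get_detection_features(detection)`, converting them with `list(...)` if needed, and would likely switch to a dedicated numerical library.

```python
import math
from typing import List, Sequence


def nearest_neighbours(vectors: Sequence[Sequence[float]]) -> List[int]:
    """For each vector, return the index of the closest other vector."""
    if len(vectors) < 2:
        raise ValueError("need at least two vectors")
    result = []
    for i, v in enumerate(vectors):
        # Search every other vector for the smallest Euclidean distance.
        best_j = min(
            (j for j in range(len(vectors)) if j != i),
            key=lambda j: math.dist(v, vectors[j]),
        )
        result.append(best_j)
    return result
```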
 ## Save predictions with features included
 If you need features on disk, use an output format that supports them, such as `raw` or `parquet`, and keep `include_features` enabled in the format config.
 See {doc}`save-predictions-in-different-output-formats`.
 ## Related pages
 - Understanding features and embeddings: {doc}`../explanation/extracted-features-and-embeddings`
 - Output formats reference: {doc}`../reference/output-formats`
 - API reference: {doc}`../reference/api`
--- a/docs/source/how_to/interpret-evaluation-outputs.md
+++ b/docs/source/how_to/interpret-evaluation-outputs.md
@ -0,0 +1,41 @@
 # How to interpret evaluation outputs
 Use this guide after `batdetect2 evaluate` has written metrics and plots to disk.
 ## Start by identifying the task
 Do not interpret a metric until you know which evaluation task produced it.
 For example, a sound-event detection metric and a clip-classification metric answer different questions.
 ## Read the output directory as a bundle
 Treat the evaluation output directory as one package:
 - metrics,
 - plots,
 - saved predictions,
 - config context.
 Do not lift a single number out of context and treat it as the whole story.
 ## Look for failure patterns, not just overall averages
 Check:
 - whether errors concentrate in certain taxa,
 - whether specific sites or recorder setups behave differently,
 - whether threshold choices are driving the result,
 - whether errors involve predictions near clip boundaries, or pairs whose affinity sits close to the matching threshold.
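A minimal sketch of breaking errors down by group instead of reading one overall average. It assumes you have already exported per-event outcomes as `(group, outcome)` records, where the group is a taxon, site, or recorder and the outcome is `"tp"`, `"fp"`, or `"fn"`. That record layout is hypothetical and is not the format `batdetect2 evaluate` writes.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, Tuple


def error_rates_by_group(
    records: Iterable[Tuple[str, str]],
) -> Dict[str, Dict[str, float]]:
    """Compute precision and recall per group from (group, outcome) records."""
    counts: Dict[str, Counter] = defaultdict(Counter)
    for group, outcome in records:
        counts[group][outcome] += 1
    summary = {}
    for group, c in sorted(counts.items()):
        tp, fp, fn = c["tp"], c["fp"], c["fn"]
        summary[group] = {
            # NaN marks a rate that is undefined for this group.
            "precision": tp / (tp + fp) if tp + fp else float("nan"),
            "recall": tp / (tp + fn) if tp + fn else float("nan"),
        }
    return summary
```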
 ## Keep validation and deployment questions separate
 A model can look good on one task and still be a poor fit for your deployment question.
 Interpret the outputs in relation to the real use case, not only the easiest metric to report.
 ## Related pages
 - Evaluation tutorial: {doc}`../tutorials/evaluate-on-a-test-set`
 - Evaluation concepts: {doc}`../explanation/evaluation-concepts-and-matching`
 - Model output and validation: {doc}`../explanation/model-output-and-validation`
--- a/docs/source/how_to/run-batch-predictions.md
+++ b/docs/source/how_to/run-batch-predictions.md
@ -3,6 +3,8 @@
 This guide shows practical command patterns for directory-based, file-list,
 and dataset-based prediction runs.
 Use it after you already know which input mode you want and need concrete command templates for a repeatable batch run.
 ## Predict from a directory
 ```bash
@ -12,6 +14,8 @@ batdetect2 predict directory \
  path/to/outputs
 ```
 Use this when BatDetect2 should discover the audio files for you.
 ## Predict from a file list
 ```bash
@ -21,10 +25,35 @@ batdetect2 predict file_list \
  path/to/outputs
 ```
 Use this when another part of your workflow already produced the exact recording list to process.
 ## Predict from an annotation set
 ```bash
 batdetect2 predict dataset \
  path/to/model.ckpt \
  path/to/annotation_set.json \
  path/to/outputs
 ```
 Use this when your project already has a `soundevent` annotation set and you want to extract unique recording paths from it.
 ## Useful options
 - `--batch-size` to control throughput.
 - `--workers` to set data-loading parallelism.
 - `--format` to select output format.
 - `--inference-config` to control clipping and loader behavior.
 - `--outputs-config` to control serialization and output transforms.
 - `--detection-threshold` to override the detection threshold for a run.
-For complete option details, see {doc}`../reference/cli/index`.
+## Practical workflow
 For large runs:
 1. test the command on a small reviewed subset,
 2. lock the config files and command shape,
 3. write outputs to a dedicated directory per run,
 4. record the checkpoint, config paths, and thresholds used.
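Step 4 can be automated by writing a small manifest next to each run's outputs. This is a standard-library sketch; the `run_manifest.json` file name, its keys, and the `write_run_manifest` helper are assumptions and are not something BatDetect2 reads or writes.

```python
import json
from datetime import datetime, timezone
from pathlib import Path
from typing import Optional


def write_run_manifest(
    output_dir: Path,
    checkpoint: Path,
    detection_threshold: Optional[float] = None,
    inference_config: Optional[Path] = None,
    outputs_config: Optional[Path] = None,
) -> Path:
    """Record what produced a prediction run in output_dir/run_manifest.json."""
    manifest = {
        "created_utc": datetime.now(timezone.utc).isoformat(),
        "checkpoint": str(checkpoint),
        "detection_threshold": detection_threshold,
        "inference_config": str(inference_config) if inference_config else None,
        "outputs_config": str(outputs_config) if outputs_config else None,
    }
    output_dir.mkdir(parents=True, exist_ok=True)
    path = output_dir / "run_manifest.json"
    path.write_text(json.dumps(manifest, indent=2))
    return path
```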
 For complete option details, see {doc}`../reference/cli/predict`.
--- a/docs/source/how_to/save-predictions-in-different-output-formats.md
+++ b/docs/source/how_to/save-predictions-in-different-output-formats.md
@ -0,0 +1,64 @@
 # How to save predictions in different output formats
 Use this guide when you need BatDetect2 outputs in a specific representation for downstream tools.
 ## Choose the format that matches the job
 Current built-in output formats include:
 - `raw`: one NetCDF file per clip, best for rich structured outputs,
 - `parquet`: tabular storage for data analysis workflows,
 - `soundevent`: prediction-set JSON for soundevent-style tooling,
 - `batdetect2`: legacy per-recording JSON output.
 ## Select a format from the CLI
 Use `--format` for quick experiments.
 ```bash
 batdetect2 predict directory \
  path/to/model.ckpt \
  path/to/audio_dir \
  path/to/outputs \
  --format parquet
 ```
 ## Use an outputs config for repeatable runs
 Use an outputs config when you want reproducible control over format and transforms.
 Example:
 ```yaml
 format:
  name: raw
  include_class_scores: true
  include_features: true
  include_geometry: true
 transform:
  detection_transforms: []
  clip_transforms: []
 ```
 Run with:
 ```bash
 batdetect2 predict directory \
  path/to/model.ckpt \
  path/to/audio_dir \
  path/to/outputs \
  --outputs-config path/to/outputs.yaml
 ```
 ## Pick the simplest useful format
 - Use `raw` if you want the richest output surface and easy round-tripping.
 - Use `parquet` if you want tabular analysis in Python or data-lake workflows.
 - Use `soundevent` if you want prediction-set JSON.
 - Use `batdetect2` only when you need the legacy JSON shape.
 ## Related pages
 - Outputs config reference: {doc}`../reference/outputs-config`
 - Output formats reference: {doc}`../reference/output-formats`
 - Output transforms reference: {doc}`../reference/output-transforms`
--- a/docs/source/how_to/tune-detection-threshold.md
+++ b/docs/source/how_to/tune-detection-threshold.md
@ -2,6 +2,10 @@
 Use this guide to compare detection outputs at different threshold values.
 The goal is not to find a universal threshold.
 The goal is to choose a threshold that fits your reviewed local data and the project trade-off between missed calls and false positives.
 ## 1) Start with a baseline run
 Run an initial prediction workflow and keep outputs in a dedicated folder.
@ -20,11 +24,22 @@ batdetect2 predict directory \
  --detection-threshold 0.3
 ```
 Keep each threshold run in a separate output directory.
 That makes it easier to compare counts and inspect example files without mixing results.
 ## 3) Validate against known calls
 Use files with trusted annotations or expert review to select a threshold that
 fits your project goals.
 Check both:
 - obvious false positives,
 - obvious missed calls.
 If class interpretation matters downstream, inspect class ranking behavior as well, not just detection counts.
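Once you have reviewed a subset, you can summarise each candidate threshold with counts, precision, and recall. This sketch makes three assumptions. First, each detection from a low-threshold run has already been matched against the reviewed calls, so it carries a score and a flag saying whether it is a real call. Second, there is at most one detection per call. Third, a detection is kept when `score >= threshold`. The data and the `sweep_thresholds` helper are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple


def sweep_thresholds(
    detections: Sequence[Tuple[float, bool]],
    n_reviewed_calls: int,
    thresholds: Sequence[float],
) -> List[Dict[str, float]]:
    """Precision/recall at each threshold from (score, is_true_call) pairs."""
    rows = []
    for t in thresholds:
        kept = [is_true for score, is_true in detections if score >= t]
        tp = sum(kept)  # True counts as 1
        rows.append(
            {
                "threshold": t,
                "kept": len(kept),
                "false_positives": len(kept) - tp,
                "missed_calls": n_reviewed_calls - tp,
                "precision": tp / len(kept) if kept else float("nan"),
                "recall": tp / n_reviewed_calls if n_reviewed_calls else float("nan"),
            }
        )
    return rows
```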
 ## 4) Record your chosen setting
 Write down the chosen threshold and rationale so analyses are reproducible.
--- a/docs/source/how_to/tune-inference-clipping.md
+++ b/docs/source/how_to/tune-inference-clipping.md
@ -0,0 +1,63 @@
 # How to tune inference clipping
 Use this guide when long recordings need to be split into smaller clips during inference.
 ## What clipping controls
 `InferenceConfig.clipping` controls how recordings are split before batching.
 Key fields are:
 - `duration`: clip duration in seconds,
 - `overlap`: overlap between adjacent clips, in seconds,
 - `max_empty`: how much empty padding is allowed,
 - `discard_empty`: whether empty clips are dropped.
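To see how `duration` and `overlap` interact, the sketch below lists the clip windows for one recording. It assumes a simple sliding window: each clip starts `duration - overlap` seconds after the previous one, and the last clip may extend past the end of the recording as padding. `max_empty` and `discard_empty` are not modelled, BatDetect2's exact rule may differ, and `clip_windows` is a hypothetical helper.

```python
from typing import List, Tuple


def clip_windows(
    recording_duration: float, duration: float, overlap: float
) -> List[Tuple[float, float]]:
    """Return (start, end) times in seconds of overlapping inference clips."""
    if duration <= 0:
        raise ValueError("duration must be positive")
    if not 0 <= overlap < duration:
        raise ValueError("overlap must be in [0, duration)")
    if recording_duration <= 0:
        return []
    step = duration - overlap
    windows = []
    i = 0
    while True:
        start = i * step
        end = start + duration
        # Round to hide floating-point noise such as 0.8000000000000002.
        windows.append((round(start, 6), round(end, 6)))
        if end >= recording_duration:
            break  # this clip already reaches (or pads past) the end
        i += 1
    return windows
```

Under these assumptions, a 1.2-second recording with `duration: 0.5` and `overlap: 0.1` yields `[(0.0, 0.5), (0.4, 0.9), (0.8, 1.3)]`, and the last clip carries 0.1 s of padding.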
 ## Start from the defaults
 Use the built-in clipping behavior first unless you already know you need something else.
 Only tune clipping when:
 - recordings are much longer than your normal working set,
 - you are seeing edge effects around calls,
 - you need tighter control over throughput or padding behavior.
 ## Override clipping with an inference config
 Create an inference config file and pass it to `predict` or `evaluate`.
 Example:
 ```yaml
 clipping:
  enabled: true
  duration: 0.5
  overlap: 0.1
  max_empty: 0.0
  discard_empty: true
 loader:
  batch_size: 8
 ```
 Run with:
 ```bash
 batdetect2 predict directory \
  path/to/model.ckpt \
  path/to/audio_dir \
  path/to/outputs \
  --inference-config path/to/inference.yaml
 ```
 ## Validate clipping changes on a small reviewed subset
 Changing clipping changes what the model sees in each clip and can change how events near clip boundaries are detected.
 Check a reviewed subset before applying clipping changes to a full project.
 ## Related pages
 - Inference config reference: {doc}`../reference/inference-config`
 - Run batch predictions: {doc}`run-batch-predictions`
 - Understanding the pipeline: {doc}`../explanation/pipeline-overview`
--- a/docs/source/reference/api.md
+++ b/docs/source/reference/api.md
@ -0,0 +1,65 @@
 # `BatDetect2API` reference
 `BatDetect2API` is the main entry point for the current Python workflow.
 It wraps model loading, inference, evaluation, output formatting, and training-related entry points behind one object.
 Defined in `batdetect2.api_v2`.
 ## Create an API instance
 - `BatDetect2API.from_checkpoint(path, ...)`
  - load a trained checkpoint and optional config overrides.
 - `BatDetect2API.from_config(config)`
  - build a full stack from a `BatDetect2Config` object.
 ## Inference methods
 - `process_file(audio_file, ...)`
  - run inference for one recording.
 - `process_files(audio_files, ...)`
  - run batch inference across a sequence of file paths.
 - `process_directory(audio_dir, ...)`
  - run inference across the audio files found in one directory.
 - `process_clips(clips, ...)`
  - run inference on an explicit sequence of clip objects.
 - `process_audio(audio, ...)`
  - run inference starting from a waveform array.
 - `process_spectrogram(spec, ...)`
  - run inference starting from a spectrogram tensor.
 ## Prediction inspection helpers
 - `get_top_class_name(detection)`
  - return the highest-scoring class name for one detection.
 - `get_class_scores(detection, include_top_class=True, sort_descending=True)`
  - return ranked `(class_name, score)` pairs.
 - `get_detection_features(detection)`
  - return the per-detection feature vector.
 ## Audio loading helpers
 - `load_audio(path)`
 - `load_recording(recording)`
 - `load_clip(clip)`
 - `generate_spectrogram(audio)`
 ## Output persistence helpers
 - `save_predictions(predictions, path, audio_dir=None, format=None, config=None)`
 - `load_predictions(path, format=None, config=None)`
 Use these when you want to save programmatic predictions without going through the CLI.
 ## Training and evaluation entry points
 - `train(...)`
 - `finetune(...)`
 - `evaluate(...)`
 - `evaluate_predictions(...)`
 ## Related pages
 - Python tutorial: {doc}`../tutorials/integrate-with-a-python-pipeline`
 - Outputs config reference: {doc}`outputs-config`
 - Output formats reference: {doc}`output-formats`
--- a/docs/source/reference/app-config.md
+++ b/docs/source/reference/app-config.md
@ -0,0 +1,38 @@
 # Top-level app config reference
 The top-level config object is `BatDetect2Config`.
 Defined in `batdetect2.config`.
 It combines the main configuration surfaces used across training, inference, evaluation, outputs, and logging.
 ## Fields
 - `config_version`
 - `train`
  - training-specific config.
 - `evaluation`
  - evaluation task and plot config.
 - `model`
  - model architecture, preprocessing, postprocessing, and targets.
 - `audio`
  - audio loading and resampling config.
 - `inference`
  - clipping and loader config for prediction-time workflows.
 - `outputs`
  - output format and output transform config.
 - `logging`
  - logging backend and formatting config.
 ## Mental model
 Think of `BatDetect2Config` as the complete application wiring for the current stack.
 Use it when you want one reproducible config that describes the whole workflow.
 ## Related pages
 - Inference config: {doc}`inference-config`
 - Evaluation config: {doc}`evaluation-config`
 - Outputs config: {doc}`outputs-config`
 - General config reference: {doc}`configs`
--- a/docs/source/reference/evaluation-config.md
+++ b/docs/source/reference/evaluation-config.md
@ -0,0 +1,46 @@
 # Evaluation config reference
 `EvaluationConfig` defines which evaluation tasks run and which plots they generate.
 Defined in `batdetect2.evaluate.config`.
 ## Top-level fields
 - `tasks`
  - list of task configs.
 ## Built-in task families
 Current built-in tasks include:
 - `sound_event_detection`
 - `sound_event_classification`
 - `top_class_detection`
 - `clip_detection`
 - `clip_classification`
 ## Shared task controls
 Common task-level controls include:
 - `prefix`
 - `ignore_start_end`
 Sound-event-style tasks also support:
 - `affinity`
 - `affinity_threshold`
 - `strict_match`
 ## Default behavior
 The default evaluation config starts with:
 - sound event detection,
 - sound event classification.
 ## Related pages
 - Choose and configure evaluation tasks: {doc}`../how_to/choose-and-configure-evaluation-tasks`
 - Evaluation concepts: {doc}`../explanation/evaluation-concepts-and-matching`
 - Evaluate CLI reference: {doc}`cli/evaluate`
--- a/docs/source/reference/inference-config.md
+++ b/docs/source/reference/inference-config.md
@ -0,0 +1,41 @@
 # Inference config reference
 `InferenceConfig` controls how files are clipped and batched during prediction-time workflows.
 Defined in `batdetect2.inference.config`.
 ## Top-level fields
 - `loader`
  - data-loader settings for inference.
 - `clipping`
  - controls how recordings are split into clips before batching.
 ## `loader`
 Current built-in loader field:
 - `batch_size` (int, default `8`)
 ## `clipping`
 Fields:
 - `enabled` (bool)
 - `duration` (float, seconds)
 - `overlap` (float, seconds)
 - `max_empty` (float)
 - `discard_empty` (bool)
 ## When to override this config
 Override `InferenceConfig` when:
 - long recordings need different clipping behavior,
 - you want to tune batch size for your hardware,
 - you need reproducible prediction settings across runs.
 ## Related pages
 - Tune inference clipping: {doc}`../how_to/tune-inference-clipping`
 - Predict CLI reference: {doc}`cli/predict`
--- a/docs/source/reference/output-formats.md
+++ b/docs/source/reference/output-formats.md
@ -0,0 +1,63 @@
 # Output formats reference
 BatDetect2 currently supports several built-in output formatters.
 ## `raw`
 Defined by `RawOutputConfig`.
 Best for rich structured outputs and round-tripping.
 Key fields:
 - `include_class_scores`
 - `include_features`
 - `include_geometry`
 Writes one NetCDF `.nc` file per clip.
 ## `parquet`
 Defined by `ParquetOutputConfig`.
 Best for tabular analysis workflows.
 Key fields:
 - `include_class_scores`
 - `include_features`
 - `include_geometry`
 Writes a Parquet table, typically `predictions.parquet`.
 ## `soundevent`
 Defined by `SoundEventOutputConfig`.
 Best when you want a `PredictionSet` JSON workflow.
 Key fields:
 - `top_k`
 - `min_score`
 Writes a prediction-set JSON file.
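The field names `top_k` and `min_score` suggest a per-detection filter on class tags. The sketch below shows that reading: it keeps at most `top_k` classes whose score is at least `min_score`. This is an interpretation of the field names, not a description of what `SoundEventOutputConfig` actually does, so confirm against the source before relying on it. `select_class_tags` is a hypothetical helper.

```python
from typing import Dict, List, Optional, Tuple


def select_class_tags(
    class_scores: Dict[str, float],
    top_k: Optional[int] = None,
    min_score: float = 0.0,
) -> List[Tuple[str, float]]:
    """Keep the highest-scoring classes, at most top_k, each >= min_score."""
    ranked = sorted(class_scores.items(), key=lambda item: item[1], reverse=True)
    kept = [(name, score) for name, score in ranked if score >= min_score]
    return kept if top_k is None else kept[:top_k]
```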
 ## `batdetect2`
 Defined by `BatDetect2OutputConfig`.
 This is the legacy BatDetect2-style JSON output.
 Key fields:
 - `event_name`
 - `annotation_note`
 Writes one `.json` file per recording.
 ## Related pages
 - Outputs config: {doc}`outputs-config`
 - Save predictions in different output formats: {doc}`../how_to/save-predictions-in-different-output-formats`
 - Understanding formatted outputs: {doc}`../explanation/interpreting-formatted-outputs`
--- a/docs/source/reference/output-transforms.md
+++ b/docs/source/reference/output-transforms.md
@ -0,0 +1,37 @@
 # Output transforms reference
 Output transforms operate after decoding and before formatting.
 Defined in `batdetect2.outputs.transforms`.
 ## Top-level config
 `OutputTransformConfig` contains:
 - `detection_transforms`
 - `clip_transforms`
 ## Detection transforms
 Detection transforms operate on one detection at a time.
 Built-in examples include:
 - filtering by frequency,
 - filtering by duration.
 Detections that do not pass a transform's criteria are removed entirely.
 ## Clip transforms
 Clip transforms operate on the list of detections for one clip.
 Built-in examples include:
 - removing detections above the Nyquist frequency (half the sample rate),
 - removing detections at clip edges.
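A minimal sketch of the two layers, written as plain functions over a hypothetical detection type. The field names, limits, and helper names are illustrative only. The built-in transforms are configured through `OutputTransformConfig`, not written like this.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Detection:
    # Hypothetical detection: time bounds in seconds, frequency bounds in Hz.
    start_time: float
    end_time: float
    low_freq: float
    high_freq: float


# A detection transform returns the (possibly modified) detection, or None to drop it.
DetectionTransform = Callable[[Detection], Optional[Detection]]


def max_duration(limit: float) -> DetectionTransform:
    """Drop detections longer than `limit` seconds."""
    def transform(det: Detection) -> Optional[Detection]:
        return det if det.end_time - det.start_time <= limit else None
    return transform


def min_frequency(limit_hz: float) -> DetectionTransform:
    """Drop detections whose lowest frequency is below `limit_hz`."""
    def transform(det: Detection) -> Optional[Detection]:
        return det if det.low_freq >= limit_hz else None
    return transform


def apply_transforms(
    dets: List[Detection], detection_transforms: List[DetectionTransform]
) -> List[Detection]:
    """Run each detection through every detection transform, dropping on None."""
    kept = []
    for det in dets:
        current: Optional[Detection] = det
        for transform in detection_transforms:
            current = transform(current)
            if current is None:
                break
        if current is not None:
            kept.append(current)
    return kept


def remove_above_nyquist(dets: List[Detection], samplerate: int) -> List[Detection]:
    """Clip-level transform: drop detections reaching above samplerate / 2."""
    nyquist = samplerate / 2
    return [d for d in dets if d.high_freq <= nyquist]
```

Detection transforms see one detection at a time, while the clip-level function sees the whole list, which is what a rule tied to the recording's sample rate or the clip edges needs.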
 ## Related pages
 - Outputs config: {doc}`outputs-config`
 - Understanding outputs: {doc}`../explanation/interpreting-formatted-outputs`
--- a/docs/source/reference/outputs-config.md
+++ b/docs/source/reference/outputs-config.md
@ -0,0 +1,33 @@
 # Outputs config reference
 `OutputsConfig` controls two layers of prediction handling:
 - how detections are transformed before formatting,
 - how formatted outputs are written to disk.
 Defined in `batdetect2.outputs.config`.
 ## Fields
 - `format`
  - output format config.
 - `transform`
  - output transform config.
 ## Mental model
 The output workflow is:
 1. model outputs are decoded into detections,
 2. optional output transforms filter or adjust those detections,
 3. a formatter serializes them to disk.
 ## Default behavior
 By default, the current stack uses the raw output formatter unless you override it.
 ## Related pages
 - Output formats: {doc}`output-formats`
 - Output transforms: {doc}`output-transforms`
 - Save predictions in different output formats: {doc}`../how_to/save-predictions-in-different-output-formats`