docs: clarify train base-dir option

Merge branch 'train' into doc
docs: add legacy workflow and migration guidance
2026-05-22 22:32:18 +02:00 · 2026-04-30 16:51:24 +01:00 · 2026-04-30 11:50:04 +01:00 · 2026-04-30 11:48:25 +01:00 · 2026-04-30 11:48:19 +01:00 · 2026-04-30 11:48:11 +01:00
40 changed files with 2210 additions and 114 deletions
--- a/README.md
+++ b/README.md
@@ -1,11 +1,40 @@
 # BatDetect2
 <img style="display: block-inline;" width="64" height="64" src="assets/bat_icon.png"> Code for detecting and classifying bat echolocation calls in high frequency audio recordings.

-## Getting started
-### Python Environment
+## What BatDetect2 is useful for

-We recommend using an isolated Python environment to avoid dependency issues. Choose one
-of the following options:
+BatDetect2 can help you screen recordings for bat calls,
+find recordings that need expert review,
+and compare model outputs across sites or projects with appropriate caution.
+
+It is best used as a tool to support ecological work,
+not as a replacement for validation or expert interpretation.
+
+## Start here
+
+If you want the simplest current workflow,
+use the documentation site and start with:
+
+- getting started: `docs/source/getting_started.md`
+- first tutorial: `docs/source/tutorials/run-inference-on-folder.md`
+
+The current docs default to:
+
+- the current command-line workflow: `batdetect2 predict`
+- the current Python workflow: `batdetect2.api_v2.BatDetect2API`
+
+If you need the previous workflow based on `batdetect2 detect` or `batdetect2.api`,
+use the legacy docs section and migration guide in the docs site.
+
+## Install BatDetect2
+
+If you already use Python,
+activate the environment where you want BatDetect2 to live.
+
+If not,
+create a fresh one first so BatDetect2 stays separate from other software on your machine.
+
+Two common options are:

 * Install the Anaconda Python 3.10 distribution for your operating system from [here](https://www.continuum.io/downloads). Create a new environment and activate it:

@@ -14,7 +43,7 @@ conda create -y --name batdetect2 python==3.10
 conda activate batdetect2
 ```

-* If you already have Python installed (version >= 3.8,< 3.11) and prefer using virtual environments then:
+* If you already have Python installed (version >= 3.10,< 3.14), you can create a fresh environment with:

 ```bash
 python -m venv .venv
@@ -37,6 +66,43 @@ pip install .

 Make sure you have the environment activated before installing `batdetect2`.

+## Run BatDetect2 on a folder of recordings
+
+Once installed,
+the simplest current workflow is to run BatDetect2 on a folder of `.wav` files.
+
+If you are working from this repository checkout,
+you can use this example checkpoint path:
+
+```text
+src/batdetect2/models/checkpoints/Net2DFast_UK_same.pth.tar
+```
+
+Example command:
+
+```bash
+batdetect2 predict directory \
+  src/batdetect2/models/checkpoints/Net2DFast_UK_same.pth.tar \
+  example_data/audio \
+  outputs
+```
+
+This will scan the audio files in `example_data/audio`
+and save model outputs to `outputs`.
+
+For the full beginner walkthrough,
+use `docs/source/tutorials/run-inference-on-folder.md`.
+
+## Legacy workflow
+
+The sections below are kept only for people maintaining older BatDetect2 scripts and analysis pipelines.
+
+If you are new to BatDetect2,
+stop here and use the current docs and command above.
+
+If you really do need the older workflow,
+the reference material is below.
+

 ## Try the model
 1) You can try a demo of the model (for UK species) on [huggingface](https://huggingface.co/spaces/macaodha/batdetect2).
@@ -48,9 +114,15 @@ Make sure you have the environment activated before installing `batdetect2`.

 After following the above steps to install the code you can run the model on your own data.

+The remainder of this section is legacy reference material.
+

 ### Using the command line

+The commands below describe the legacy CLI workflow.
+
+For new work, prefer the current docs and `batdetect2 predict`.
+
 You can run the model by opening the command line and typing:
 ```bash
 batdetect2 detect AUDIO_DIR ANN_DIR DETECTION_THRESHOLD
@@ -73,6 +145,10 @@ You can also specify which model to use by setting the `--model_path` argument.

 ### Using the Python API

+The examples below describe the legacy Python API.
+
+For new work, prefer `batdetect2.api_v2.BatDetect2API` and the current docs site.
+
 If you prefer to process your data within a Python script then you can use the `batdetect2` Python API.

 ```python
@@ -98,7 +174,10 @@ You can integrate the detections or the extracted features to your custom analys


 ## Training the model on your own data
-Take a look at the steps outlined in finetuning readme [here](batdetect2/finetune/readme.md) for a description of how to train your own model.
+Take a look at the training tutorial in the docs site first.
+
+If you are working from this repository checkout,
+start with `docs/source/tutorials/train-a-custom-model.md`.


 ## Data and annotations
--- a/docs/plan.md
+++ b/docs/plan.md
@@ -0,0 +1,441 @@
+# Documentation Plan
+
+## Goal
+
+Build documentation around the main user stories:
+
+1. Run inference with the CLI on one folder of audio.
+2. Use the Python API for inference with fine-grained control over outputs,
+   including per-file workflows, class scores, features, and batch processing.
+3. Train or fine-tune a custom model.
+4. Evaluate a model and understand what the metrics mean.
+5. Understand the concepts needed to use BatDetect2 correctly.
+
+The docs should provide:
+
+- a simple happy path in tutorials,
+- richer task-oriented guidance in how-to guides,
+- complete lookup material in reference,
+- deep conceptual coverage in understanding.
+
+Note: the current docs tree uses `explanation/`, which is also the Diataxis term for
+understanding-oriented material. This plan uses `understanding/` as the target name.
+
+## Current State Review
+
+### Looks reasonably complete
+
+- `docs/source/index.md`: good top-level orientation and navigation.
+- `docs/source/getting_started.md`: solid install and entry-point guidance.
+- `docs/source/explanation/*.md`: the conceptual pages are currently the
+  strongest part of the docs, especially pipeline overview, thresholds,
+  preprocessing consistency, and targets.
+- `docs/source/how_to/configure-*.md` and related target/data pages: practical
+  support docs for preprocessing, targets, ROI mapping, and dataset formats are
+  in decent shape.
+- `docs/source/reference/cli/*.rst`: CLI reference wiring exists and should
+  render useful option-level documentation from the Click commands.
+
+### Partially complete
+
+- `docs/source/how_to/run-batch-predictions.md`: useful, but thin.
+- `docs/source/how_to/tune-detection-threshold.md`: useful, but too brief for
+  a key workflow.
+- `docs/source/reference/preprocessing-config.md`
+- `docs/source/reference/postprocess-config.md`
+- `docs/source/reference/targets-config-workflow.md`
+
+These are good summaries, but they do not yet feel like complete references for
+all the customization surfaces available in the code.
+
+### Clearly incomplete or scaffolded
+
+- `docs/source/tutorials/run-inference-on-folder.md`
+- `docs/source/tutorials/integrate-with-a-python-pipeline.md`
+- `docs/source/tutorials/train-a-custom-model.md`
+- `docs/source/tutorials/evaluate-on-a-test-set.md`
+
+All four main tutorials are still starter scaffolds. This is the biggest gap in
+the current user story.
+
+### Major mismatch to resolve
+
+- `README.md` still tells an older story built around `batdetect2 detect` and
+  `batdetect2.api`.
+- The docs site tells the newer story built around `batdetect2 predict` and
+  `batdetect2.api_v2`.
+
+This creates avoidable confusion for users and should be treated as a priority
+documentation alignment issue.
+
+### Legacy documentation is not yet placed clearly
+
+The repo still contains meaningful legacy documentation material, but it is not
+yet presented as a clearly marked legacy path inside the docs.
+
+Users need two things:
+
+- a clear message that these docs exist for the previous BatDetect2 workflow,
+- a clear recommendation that new users should prefer the newer CLI/API
+  workflows and migrate where possible.
+
+## Legacy Documentation Plan
+
+### Goals
+
+1. Preserve access to the old workflow documentation.
+2. Prevent new users from accidentally following legacy guidance.
+3. Give current users a clear migration path from legacy to current workflows.
+
+### Proposed location
+
+Add a dedicated legacy area inside the docs, for example:
+
+- `docs/source/legacy/index.md`
+- `docs/source/legacy/cli-detect.md`
+- `docs/source/legacy/python-api.md`
+- `docs/source/legacy/feature-extraction.md`
+- `docs/source/legacy/migration-guide.md`
+
+This keeps the material available without mixing it into the main happy-path
+docs.
+
+### User-facing messaging
+
+Add clear notices in all relevant navigation entry points.
+
+Suggested message pattern:
+
+"If you want to use the previous version of BatDetect2, see the legacy
+documentation. For new workflows, we recommend using the current `predict`
+CLI and `BatDetect2API` interfaces."
+
+Places that should link to the legacy docs:
+
+- `docs/source/index.md`
+- `docs/source/getting_started.md`
+- `README.md`
+- tutorial landing pages where users may be coming from older workflows
+- any page that mentions the old `detect` command or old Python API
+
+### Migration guide plan
+
+Add a dedicated migration guide that explains:
+
+1. who should migrate now and who may need to stay on the legacy workflow,
+2. the mapping from old CLI commands to new CLI commands,
+3. the mapping from old Python API calls to new `api_v2` / `BatDetect2API`
+   patterns,
+4. what changed in outputs, terminology, and configuration,
+5. how legacy feature extraction concepts map to the new API surfaces,
+6. what behavior differences users should validate before switching,
+7. a short migration checklist.
+
+High-priority migration mappings to document:
+
+- `batdetect2 detect` -> `batdetect2 predict directory`
+- old `batdetect2.api` file processing -> `BatDetect2API.from_checkpoint(...)`
+  plus `process_file`, `process_files`, `process_audio`, or
+  `process_spectrogram`
+- legacy `cnn_feats`, `spec_features`, and `spec_slices` -> current output and
+  feature access patterns, with explicit notes where there is no direct
+  one-to-one replacement
+
+### Legacy content handling plan
+
+For each legacy page or legacy concept:
+
+1. Decide whether it should be preserved as-is, rewritten as a legacy page, or
+   replaced by the migration guide.
+2. Add a prominent warning banner saying it describes the previous workflow.
+3. Link forward to the current equivalent page when one exists.
+
+### Definition of done for legacy handling
+
+Legacy documentation work is done when:
+
+1. a reader can clearly distinguish legacy from current docs,
+2. old users can still find the previous workflow documentation,
+3. new users are consistently directed to the new docs,
+4. there is a practical migration guide covering the main CLI and Python API
+   transitions.
+
+## Main Gaps By User Story
+
+### 1. CLI inference
+
+Some coverage exists, but the happy path is not yet documented end to end.
+
+Missing:
+
+- a full worked tutorial from input audio to saved outputs,
+- clear guidance on what outputs are written and how to inspect them,
+- stronger documentation for `predict dataset`,
+- a clearer story for default model vs custom checkpoint,
+- practical guidance for selecting output formats and thresholds.
+
+### 2. Python API inference
+
+This is currently the weakest major story.
+
+The code exposes much more than the docs explain, including:
+
+- `BatDetect2API.from_checkpoint` and `from_config`,
+- `process_file`, `process_files`, `process_directory`, `process_clips`,
+- `process_audio`, `process_spectrogram`,
+- `get_top_class_name`, `get_class_scores`, `get_detection_features`,
+- `save_predictions` and `load_predictions`.
+
+Missing docs:
+
+- an API-first tutorial with a simple path,
+- a how-to for file-by-file inspection and custom post-processing,
+- a how-to for batch API inference,
+- a reference page for `BatDetect2API`,
+- an explanation of what the feature vectors are and how users should think
+  about them.
+
+Important terminology note:
+
+- the old API/docs talk about `cnn_feats`, `spec_features`, and `spec_slices`,
+- the new API exposes per-detection `features`,
+- users interested in embeddings / downstream exploration will need a clear,
+  explicit doc that connects these ideas.
+
+### 3. Batch inference
+
+Batch prediction exists in both CLI and API workflows, but the docs do not yet
+explain the design space well.
+
+Missing:
+
+- when to use `directory` vs `file_list` vs `dataset`,
+- how clipping works during inference,
+- what `InferenceConfig` controls,
+- how batch size, workers, and output format choices affect runs,
+- how to organize large runs reproducibly.
+
+### 4. Training a custom model
+
+Supporting pages exist, but the end-to-end story is not yet there.
+
+Missing:
+
+- one complete tutorial from dataset config to checkpoints and sanity check,
+- a "minimum viable training setup" page,
+- clearer explanation of how model, targets, audio, training, inference,
+  outputs, and logging configs fit together,
+- a fine-tuning story versus training from scratch.
+
+### 5. Evaluation
+
+Evaluation is significantly under-documented relative to the code.
+
+Missing:
+
+- what evaluation tasks exist,
+- what metrics and plots are produced,
+- how predictions are matched to annotations,
+- how to interpret failures and trade-offs,
+- how to configure evaluation for different research questions.
+
+### 6. Understanding / concepts
+
+This is the best-developed section today, but it still needs expansion.
+
+Concepts that should be covered more fully:
+
+- what the model predicts,
+- what the raw and formatted outputs represent,
+- how to interpret detection scores and class scores,
+- what targets are and how they shape training and decoding,
+- how preprocessing choices affect model behavior,
+- what the extracted features represent and when they are useful,
+- what evaluation metrics actually measure,
+- why local validation is required before ecological inference.
+
+## Proposed Documentation Architecture
+
+## Target Table of Contents
+
+### Home
+
+- Home
+- Getting started
+- FAQ
+- Legacy docs
+
+### Tutorials
+
+These should be the default path for most users.
+
+- Tutorial: Run inference on a folder of audio
+- Tutorial: Explore predictions in Python for one file
+- Tutorial: Train a custom model
+- Tutorial: Evaluate a trained model
+
+### How-to Guides
+
+These cover practical tasks once the user is past the happy path.
+
+- How to choose an inference input mode
+- How to run batch predictions from a directory
+- How to run batch predictions from a file list
+- How to run predictions from a dataset config
+- How to tune detection thresholds
+- How to inspect class scores in Python
+- How to inspect detection features in Python
+- How to save predictions in different output formats
+- How to configure inference clipping
+- How to configure audio preprocessing
+- How to configure spectrogram preprocessing
+- How to configure target definitions
+- How to define target classes
+- How to configure ROI mapping
+- How to configure an AOEF dataset
+- How to import legacy BatDetect2 annotations
+- How to fine-tune from a checkpoint
+- How to choose and configure evaluation tasks
+- How to interpret evaluation outputs
+
+### Reference
+
+This should be the complete lookup layer.
+
+- CLI reference
+- CLI reference: base command and global options
+- CLI reference: predict
+- CLI reference: data
+- CLI reference: train
+- CLI reference: evaluate
+- CLI reference: legacy detect
+- API reference: `BatDetect2API`
+- Config reference: top-level app config
+- Config reference: inference config
+- Config reference: evaluation config
+- Config reference: outputs config
+- Config reference: output formats
+- Config reference: output transforms
+- Config reference: preprocessing config
+- Config reference: postprocess config
+- Config reference: targets config workflow
+- Reference: data sources
+- Reference: targets module
+
+### Understanding
+
+This is the conceptual layer and should carry the deeper Diataxis
+"understanding" material.
+
+- What BatDetect2 predicts
+- How the pipeline fits together
+- How to interpret detection scores and class scores
+- How to interpret formatted outputs
+- What extracted features / embeddings are and are not
+- Postprocessing and thresholds
+- Preprocessing consistency and domain shift
+- Target encoding and decoding
+- Evaluation concepts and matching behavior
+- Model output, validation, and ecological interpretation
+
+### Legacy
+
+This is a clearly signposted area for the previous workflow only.
+
+- Legacy overview
+- Legacy CLI workflow with `batdetect2 detect`
+- Legacy Python API with `batdetect2.api`
+- Legacy feature extraction outputs
+- Migration guide: legacy to current workflows
+
+### Tutorials
+
+Keep tutorials opinionated and minimal. Each one should show the default happy
+path with the fewest possible choices.
+
+Planned tutorial set:
+
+1. Run inference on a folder of audio.
+2. Explore predictions in Python for one file.
+3. Train a custom model.
+4. Evaluate a trained model.
+
+### How-to Guides
+
+Use how-to guides for branching tasks and customization.
+
+Planned additions or expansions:
+
+- Choose an inference input mode: directory, file list, or dataset.
+- Run large batch inference reproducibly.
+- Save predictions in different output formats.
+- Inspect class scores and features in Python.
+- Explore detection features / embeddings downstream.
+- Tune clipping and inference settings.
+- Fine-tune from a checkpoint.
+- Choose and configure evaluation tasks.
+- Interpret evaluation artifacts.
+
+### Reference
+
+Reference should become the complete map of all configurable surfaces.
+
+High-priority additions:
+
+- `BatDetect2API` reference.
+- `InferenceConfig` reference.
+- `EvaluationConfig` reference.
+- `OutputsConfig` and output format reference.
+- Output transform reference.
+- clearer config composition reference for the full app config.
+
+### Understanding
+
+This is where the deeper conceptual material should live.
+
+High-priority pages:
+
+1. What BatDetect2 predicts.
+2. How to interpret outputs, scores, and uncertainty.
+3. What extracted features / embeddings are and are not.
+4. Targets, labels, and decoded outputs.
+5. Preprocessing consistency and domain shift.
+6. Postprocessing, thresholds, and output density.
+7. How evaluation works and what the metrics mean.
+8. Why local validation is required before ecological interpretation.
+
+## Priority Order
+
+### Phase 1: Fix the primary user journey
+
+1. Expand the four scaffold tutorials into real end-to-end guides.
+2. Add a proper Python/API inference story.
+3. Document outputs and how to inspect them.
+4. Align `README.md` with the newer CLI/API documentation story.
+5. Create the legacy docs section and add clear signposting to it.
+
+### Phase 2: Cover the customization surface
+
+1. Add how-to guides for batch inference, output formats, and API inspection.
+2. Add reference pages for inference, outputs, evaluation, and API surfaces.
+3. Add fine-tuning and advanced training guidance.
+4. Write the migration guide from legacy to current workflows.
+
+### Phase 3: Deepen understanding
+
+1. Expand the conceptual section into a true understanding section.
+2. Add pages for output interpretation, features/embeddings, and evaluation
+   concepts.
+3. Reader-test the docs against realistic user questions.
+
+## Immediate Next Steps
+
+1. Decide whether to rename `explanation/` to `understanding/` or keep the
+   current directory name and just treat it as the Diataxis understanding
+   section.
+2. Draft the target table of contents for Tutorials, How-to, Reference, and
+   Understanding.
+3. Draft the legacy docs section and migration-guide table of contents.
+4. Rewrite the four scaffold tutorials first.
+5. Add the missing API, outputs, evaluation, and migration documentation
+   immediately after.
--- a/docs/source/_static/.gitkeep
+++ b/docs/source/_static/.gitkeep
--- a/docs/source/explanation/evaluation-concepts-and-matching.md
+++ b/docs/source/explanation/evaluation-concepts-and-matching.md
@@ -0,0 +1,48 @@
+# Evaluation concepts and matching
+
+Evaluation is not just "run predictions and compute one number".
+
+The reported metric depends on the evaluation task, the matching rule, and the treatment of clip boundaries and generic labels.
+
+## Task families answer different questions
+
+Built-in task families include:
+
+- sound event detection,
+- sound event classification,
+- top-class detection,
+- clip detection,
+- clip classification.
+
+Choose the task that matches the scientific or engineering question.
+
+## Matching matters
+
+For sound-event-style tasks, predictions and annotations are matched using an affinity function.
+
+Important controls include:
+
+- `affinity`,
+- `affinity_threshold`,
+- `strict_match`,
+- `ignore_start_end`.
+
+Small changes here can change the reported metric without changing the underlying predictions.
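
A sketch, not part of this commit and not BatDetect2's matching code: the effect of an affinity threshold can be shown with a toy matcher. It assumes temporal intersection-over-union as the affinity and greedy one-to-one matching. The names `temporal_iou` and `match_events` are hypothetical.

```python
from typing import List, Tuple

Interval = Tuple[float, float]  # (start_time, end_time) in seconds


def temporal_iou(a: Interval, b: Interval) -> float:
    """Intersection over union of two time intervals."""
    intersection = max(0.0, min(a[1], b[1]) - max(a[0], b[0]))
    union = (a[1] - a[0]) + (b[1] - b[0]) - intersection
    return intersection / union if union > 0 else 0.0


def match_events(
    predictions: List[Interval],
    annotations: List[Interval],
    affinity_threshold: float,
) -> List[Tuple[int, int]]:
    """Greedily pair each prediction with its best unmatched annotation."""
    matches: List[Tuple[int, int]] = []
    used: set = set()
    for i, pred in enumerate(predictions):
        best_j, best_affinity = None, 0.0
        for j, ann in enumerate(annotations):
            if j in used:
                continue
            affinity = temporal_iou(pred, ann)
            if affinity > best_affinity:
                best_j, best_affinity = j, affinity
        # A pair only counts as a match when the affinity reaches the threshold.
        if best_j is not None and best_affinity >= affinity_threshold:
            matches.append((i, best_j))
            used.add(best_j)
    return matches


# Toy intervals: the first pair overlaps strongly, the second only partly.
predictions = [(0.100, 0.110), (0.300, 0.312)]
annotations = [(0.102, 0.110), (0.304, 0.310)]

lenient = match_events(predictions, annotations, affinity_threshold=0.4)
strict = match_events(predictions, annotations, affinity_threshold=0.7)
```

With these toy intervals the lenient threshold accepts both pairs, while the stricter threshold accepts only the first, although the predictions are identical in both cases.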
+
+## Boundary handling matters
+
+The evaluation base task can exclude events near clip boundaries through `ignore_start_end`.
+
+This is useful when clip boundaries make matches ambiguous.
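
A sketch, not part of this commit: boundary exclusion can be pictured as dropping events that start or end within a margin of the clip edges. The function `drop_boundary_events` is hypothetical, and it assumes that `ignore_start_end` behaves like a margin in seconds, which this page does not state.

```python
from typing import List, Tuple

Event = Tuple[float, float]  # (start_time, end_time) in seconds


def drop_boundary_events(
    events: List[Event],
    clip_start: float,
    clip_end: float,
    margin: float,
) -> List[Event]:
    """Keep only events that lie fully inside the clip, away from its edges."""
    inner_start = clip_start + margin
    inner_end = clip_end - margin
    return [
        (start, end)
        for start, end in events
        if start >= inner_start and end <= inner_end
    ]


# One event near each clip edge and one in the middle.
events = [(0.004, 0.012), (0.250, 0.262), (0.495, 0.500)]
kept = drop_boundary_events(events, clip_start=0.0, clip_end=0.5, margin=0.01)
```

Only the middle event survives here; the two events that touch the first or last 10 ms of the clip are left out of the comparison.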
+
+## Generic labels can matter in classification
+
+Classification tasks can include or exclude generic targets depending on configuration.
+
+That affects what counts as a valid class-level comparison.
+
+## Related pages
+
+- Evaluate on a test set: {doc}`../tutorials/evaluate-on-a-test-set`
+- Evaluation config reference: {doc}`../reference/evaluation-config`
+- Model output and validation: {doc}`model-output-and-validation`
--- a/docs/source/explanation/extracted-features-and-embeddings.md
+++ b/docs/source/explanation/extracted-features-and-embeddings.md
@@ -0,0 +1,36 @@
+# Extracted features and embeddings
+
+The current API exposes a per-detection `features` vector.
+
+Older BatDetect2 workflows also exposed concepts such as `cnn_feats`, `spec_features`, and `spec_slices`.
+
+## What the current feature vector is
+
+In the current stack, each retained detection can carry an internal feature representation produced by the model output pipeline.
+
+This is useful for downstream exploration, comparison, and custom analysis.
+
+## What these features are not
+
+They are not automatically human-interpretable ecological variables.
+
+They are also not a substitute for careful validation.
+
+## Why people refer to them as embeddings
+
+In practice, users often treat these feature vectors as embeddings because they can be used as dense learned representations of detections.
+
+That usage is reasonable, but you should still treat them as model-derived internal representations whose meaning depends on the training setup.
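
A sketch, not part of this commit: treating feature vectors as embeddings usually means comparing them with a similarity measure. The example below uses cosine similarity on made-up vectors. Real `features` vectors would come from the model, and nothing here assumes their dimensionality.

```python
import math
from typing import Sequence


def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return dot / (norm_u * norm_v)


# Made-up stand-ins for per-detection feature vectors.
call_a = [0.9, 0.1, 0.3]
call_b = [0.8, 0.2, 0.4]
call_c = [-0.2, 0.9, -0.5]

similar = cosine_similarity(call_a, call_b)
dissimilar = cosine_similarity(call_a, call_c)
```

A high similarity only says that the model represents two detections alike; whether that corresponds to anything ecological still needs validation.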
+
+## Legacy terminology versus current terminology
+
+- legacy `cnn_feats` referred to CNN feature outputs in the older workflow,
+- legacy `spec_features` referred to lower-level extracted call features,
+- current `features` are the per-detection vectors attached to `Detection` objects.
+
+These are related ideas, but not necessarily one-to-one replacements.
+
+## Related pages
+
+- Inspect detection features in Python: {doc}`../how_to/inspect-detection-features-in-python`
+- Legacy feature extraction: {doc}`../legacy/feature-extraction`
--- a/docs/source/explanation/index.md
+++ b/docs/source/explanation/index.md
@@ -1,14 +1,19 @@
-# Explanation
+# Understanding

-Explanation pages describe why BatDetect2 behaves as it does and how to reason
-about trade-offs.
+Understanding pages explain how BatDetect2 works, what its outputs mean, and how to reason about trade-offs.
+
+Use this section when you want help interpreting the tool, not just running it.

 ```{toctree}
 :maxdepth: 1

+what-batdetect2-predicts
+interpreting-formatted-outputs
+extracted-features-and-embeddings
 model-output-and-validation
 postprocessing-and-thresholds
 pipeline-overview
 preprocessing-consistency
 target-encoding-and-decoding
+evaluation-concepts-and-matching
 ```
--- a/docs/source/explanation/interpreting-formatted-outputs.md
+++ b/docs/source/explanation/interpreting-formatted-outputs.md
@@ -0,0 +1,36 @@
+# Interpreting formatted outputs
+
+BatDetect2 can write predictions in several output formats.
+
+Those formats are different views of the same underlying detections, not different model behaviors.
+
+## Separate the underlying detection from the serialized file
+
+Internally, the current stack works with clip-level detections containing geometry, detection score, class scores, and features.
+
+Output formatters then serialize those detections in different ways.
+
+## Raw outputs are richest
+
+The `raw` format preserves the broadest structured view of detections and is a good default when you want to inspect or reload predictions later.
+
+## Tabular outputs are for analysis convenience
+
+The `parquet` format is convenient for data analysis workflows, but the tabular representation is only one projection of the underlying detection object.
+
+## Legacy-shaped outputs are mainly for compatibility
+
+The `batdetect2` formatter writes the older BatDetect2-style JSON shape.
+
+Use it when you need compatibility with older downstream tools or workflows.
+
+## The meaning does not come from the file extension
+
+Do not assume that a `.json`, `.parquet`, or `.nc` file changes what the model predicted.
+
+It changes how the prediction is packaged and how much detail is retained.
+
+## Related pages
+
+- Output formats reference: {doc}`../reference/output-formats`
+- Outputs config reference: {doc}`../reference/outputs-config`
--- a/docs/source/explanation/what-batdetect2-predicts.md
+++ b/docs/source/explanation/what-batdetect2-predicts.md
@@ -0,0 +1,45 @@
+# What BatDetect2 predicts
+
+BatDetect2 predicts call-level events, not recording-level truth.
+
+For each retained detection, the current stack can expose:
+
+- a geometry describing where the event sits in time-frequency space,
+- a detection score,
+- a class-score vector,
+- an internal feature vector.
+
+## Detection score versus class scores
+
+These are different outputs and should not be interpreted as the same thing.
+
+- The detection score measures confidence that a call is present, and it decides whether the event is kept as a detection.
+- The class-score vector ranks classes for that detected event.
+
+A detection can be kept while still having uncertain class identity.
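
A sketch, not part of this commit: the distinction can be shown with plain Python. The `ToyDetection` class, the class names, and the threshold value are invented for this example and are not the BatDetect2 data model.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass
class ToyDetection:
    detection_score: float          # is there a call here at all?
    class_scores: Dict[str, float]  # which class does it look like?

    def top_two(self) -> Tuple[Tuple[str, float], Tuple[str, float]]:
        """Return the two highest-scoring classes, best first."""
        ranked = sorted(
            self.class_scores.items(), key=lambda kv: kv[1], reverse=True
        )
        return ranked[0], ranked[1]


det = ToyDetection(
    detection_score=0.91,
    class_scores={"species_a": 0.41, "species_b": 0.38, "species_c": 0.21},
)

detection_threshold = 0.5  # hypothetical value
is_kept = det.detection_score >= detection_threshold

(best_name, best_score), (_, runner_up_score) = det.top_two()
# A small margin between the top two classes means an uncertain class identity.
class_margin = best_score - runner_up_score
```

Here the detection is kept, yet the top two classes are separated by only about 0.03, so the class identity remains uncertain.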
+
+## Predictions are conditional on the workflow
+
+The final output also depends on:
+
+- preprocessing,
+- postprocessing,
+- thresholds,
+- target definitions,
+- output transforms.
+
+That is why two runs can differ even when they use the same checkpoint.
+
+## What BatDetect2 does not predict
+
+BatDetect2 does not directly output ecological truth.
+
+It also does not eliminate the need for local validation.
+
+Use reviewed local data before making ecological claims.
+
+## Related pages
+
+- Model output and validation: {doc}`model-output-and-validation`
+- Postprocessing and thresholds: {doc}`postprocessing-and-thresholds`
+- Interpreting formatted outputs: {doc}`interpreting-formatted-outputs`
--- a/docs/source/getting_started.md
+++ b/docs/source/getting_started.md
@@ -1,22 +1,41 @@
 # Getting started

-BatDetect2 is both a command line tool (CLI) and a Python library.
+If you want to run BatDetect2 on your recordings,
+start with the command-line route below.

- Use the CLI if you want to run existing models or train your own models from
-  the terminal.
- Use the Python package if you want to integrate BatDetect2 into your own
-  scripts, notebooks, or analysis pipeline.
+You do not need to write Python code for a standard first run.
+
+BatDetect2 also has a Python interface,
+but that is mainly for users writing their own analysis scripts.
+
+- Use the command-line route if you want to run an existing model or train your own model by typing commands in a terminal window.
+- Use the Python route only if you already want to work in scripts or notebooks.
+
+```{note}
+If you are looking for the previous BatDetect2 workflow based on `batdetect2 detect` or `batdetect2.api`, go to {doc}`legacy/index`.
+New docs default to the current `predict` CLI and `BatDetect2API` workflow.
+```

 If you want to try BatDetect2 before installing anything locally:

 - [Hugging Face demo (UK species)](https://huggingface.co/spaces/macaodha/batdetect2)
 - [Google Colab notebook](https://colab.research.google.com/github/macaodha/batdetect2/blob/master/batdetect2_notebook.ipynb)

-## Prerequisites
+## The simplest route for most users
+
+1. Install BatDetect2.
+2. Use a model checkpoint.
+3. Run the first tutorial on a folder of recordings.
+
+If that is what you want,
+you can ignore the Python sections for now.
+
+## Install BatDetect2

 We recommend `uv` for both workflows.
-`uv` is a fast Python package and environment manager that keeps installs
-isolated and reproducible.
+
+`uv` is a tool that helps install Python software cleanly,
+without mixing it into the rest of your machine.

 - Use `uv tool` to install the CLI.
 - Use `uv add` to add `batdetect2` as a dependency in a Python project.
@@ -26,8 +45,8 @@ Install `uv` first by following their

 ## Install the CLI

-The following installs `batdetect2` in an isolated tool environment and exposes
-the `batdetect2` command on your machine.
+The following installs `batdetect2` in its own small environment and makes the
+`batdetect2` command available on your machine.

 ```bash
 uv tool install batdetect2
@@ -49,16 +68,34 @@ Run your first workflow:

 Go to {doc}`tutorials/run-inference-on-folder` for a complete first run.

-## Integrate with your Python project
+## Choose a model checkpoint

-If you are using BatDetect2 from Python code, add it to your project
-dependencies:
+The current command-line and Python workflows expect an explicit checkpoint path.
+
+A checkpoint is the saved model file that BatDetect2 will use for prediction.
+
+You can use:
+
+- a checkpoint you trained yourself, or
+- a checkpoint distributed with your installation or repository checkout.
+
+In this repository checkout, an example pretrained checkpoint is available at:
+
+```text
+src/batdetect2/models/checkpoints/Net2DFast_UK_same.pth.tar
+```
+
+Use that path in the tutorial commands if you want a concrete starting point from this source tree.
+
+## Python route for users writing code
+
+If you are using BatDetect2 from Python code, add it to your Python project:

 ```bash
 uv add batdetect2
 ```

-This keeps dependency metadata and the environment in sync.
+This keeps your project settings and installed packages in sync.

 ### Alternative with `pip`

@@ -77,7 +114,10 @@ pip install batdetect2

 ## What's next

- Run your first detection workflow:
+- Run your first workflow on a folder of recordings:
  {doc}`tutorials/run-inference-on-folder`
- For practical task recipes, go to {doc}`how_to/index`
- For command and option details, go to {doc}`reference/cli/index`
+- If you write code and want the Python route:
+  {doc}`tutorials/integrate-with-a-python-pipeline`
+- For common practical tasks, go to {doc}`how_to/index`
+- For detailed command help, go to {doc}`reference/cli/index`
+- To understand outputs and trade-offs, go to {doc}`explanation/index`
--- a/docs/source/how_to/choose-an-inference-input-mode.md
+++ b/docs/source/how_to/choose-an-inference-input-mode.md
@@ -0,0 +1,66 @@
+# How to choose an inference input mode
+
+Use this guide to decide whether `predict directory`, `predict file_list`, or `predict dataset` is the right entry point for your run.
+
+## Use `predict directory` when the recordings already live together
+
+This is the simplest choice.
+
+Use it when:
+
+- your recordings are already organized in one directory tree,
+- you want BatDetect2 to discover audio files for you,
+- you are doing a first pass over a folder of recordings.
+
+```bash
+batdetect2 predict directory \
+  path/to/model.ckpt \
+  path/to/audio_dir \
+  path/to/outputs
+```
+
+## Use `predict file_list` when you need explicit control over the file set
+
+Use it when:
+
+- you want to run only a selected subset,
+- your files are spread across directories,
+- another tool has already produced the exact list of recordings to process.
+
+The list file should contain one path per line.
+
+```bash
+batdetect2 predict file_list \
+  path/to/model.ckpt \
+  path/to/audio_files.txt \
+  path/to/outputs
+```
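If no other tool has produced the list yet, a short script can build it. The sketch below is one way to do that with the standard library only. The helper name `write_file_list` and the `*.wav` pattern are assumptions for illustration and are not part of BatDetect2.

```python
from pathlib import Path
from typing import Iterable


def write_file_list(
    audio_dirs: Iterable[Path],
    list_path: Path,
    pattern: str = "*.wav",
) -> int:
    """Write one absolute audio path per line and return how many were written."""
    # Search each directory recursively and drop duplicates via the set.
    # Note: the pattern is case-sensitive on most Linux filesystems.
    paths = sorted(
        {
            path.resolve()
            for audio_dir in audio_dirs
            for path in Path(audio_dir).rglob(pattern)
        }
    )
    # One path per line, which is the layout this guide describes.
    list_path.write_text("".join(f"{path}\n" for path in paths))
    return len(paths)
```

Call it as `write_file_list([Path("site_a"), Path("site_b")], Path("audio_files.txt"))`, then pass the resulting file to `batdetect2 predict file_list`.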
+
+## Use `predict dataset` when your workflow is already annotation-set driven
+
+Use it when:
+
+- your project already has a `soundevent` annotation set,
+- you want prediction runs aligned with that annotation metadata,
+- you want BatDetect2 to resolve recording paths from the annotation set.
+
+```bash
+batdetect2 predict dataset \
+  path/to/model.ckpt \
+  path/to/annotation_set.json \
+  path/to/outputs
+```
+
+The dataset command reads a `soundevent` annotation set and extracts unique recording paths before inference.
+
+## Rule of thumb
+
+- Start with `directory` for the easiest first run.
+- Use `file_list` when selection matters.
+- Use `dataset` when the rest of your workflow is already dataset-based.
+
+## Related pages
+
+- Run batch predictions: {doc}`run-batch-predictions`
+- Tune inference clipping: {doc}`tune-inference-clipping`
+- Predict command reference: {doc}`../reference/cli/predict`
--- a/docs/source/how_to/choose-and-configure-evaluation-tasks.md
+++ b/docs/source/how_to/choose-and-configure-evaluation-tasks.md
@ -0,0 +1,66 @@
+# How to choose and configure evaluation tasks
+
+Use this guide when the default evaluation tasks do not match the question you want to answer.
+
+## Know the default first
+
+By default, BatDetect2 evaluation starts with:
+
+- sound event detection,
+- sound event classification.
+
+Those are good defaults for many projects, but not for all of them.
+
+## Choose the task that matches the question
+
+Common built-in task families include:
+
+- `sound_event_detection`
+- `sound_event_classification`
+- `top_class_detection`
+- `clip_detection`
+- `clip_classification`
+
+Choose based on the question you care about.
+
+- Use sound-event tasks when you care about individual call events.
+- Use clip tasks when you care about clip-level presence or clip-level class evidence.
+- Use top-class detection when you want matching based on the highest-scoring class per detection.
+
+## Configure tasks in `EvaluationConfig`
+
+Example:
+
+```yaml
+tasks:
+  - name: sound_event_detection
+    prefix: detection
+    affinity_threshold: 0.0
+    strict_match: true
+  - name: clip_classification
+    prefix: clip_classification
+```
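A typo in a task name is easy to miss in YAML. The sketch below checks task names against the five names listed in this guide. It is a hypothetical pre-flight check, not a BatDetect2 function, and the list of names may not be exhaustive, so treat an "unknown" result as a prompt to check the reference rather than as a definite error.

```python
# Task names copied from the list earlier in this guide (may not be exhaustive).
KNOWN_TASK_NAMES = {
    "sound_event_detection",
    "sound_event_classification",
    "top_class_detection",
    "clip_detection",
    "clip_classification",
}


def find_unknown_tasks(tasks: list[dict]) -> list[str]:
    """Return the task names that do not appear in KNOWN_TASK_NAMES."""
    return [
        task.get("name", "<missing name>")
        for task in tasks
        if task.get("name") not in KNOWN_TASK_NAMES
    ]


# The same task list as the YAML example, written as plain Python data.
tasks = [
    {
        "name": "sound_event_detection",
        "prefix": "detection",
        "affinity_threshold": 0.0,
        "strict_match": True,
    },
    {"name": "clip_classification", "prefix": "clip_classification"},
]
```

`find_unknown_tasks(tasks)` returns an empty list for the example above.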
+
+Pass the config with:
+
+```bash
+batdetect2 evaluate \
+  path/to/model.ckpt \
+  path/to/test_dataset.yaml \
+  --base-dir path/to/project_root \
+  --evaluation-config path/to/evaluation.yaml
+```
+
+Include `--base-dir` when the dataset config resolves recordings through relative paths.
+
+## Change one thing at a time
+
+When comparing models or settings, avoid changing task definitions, thresholds, matching behavior, and datasets all at once.
+
+Otherwise it becomes hard to explain why the metric changed.
+
+## Related pages
+
+- Evaluation tutorial: {doc}`../tutorials/evaluate-on-a-test-set`
+- Evaluation config reference: {doc}`../reference/evaluation-config`
+- Evaluation concepts: {doc}`../explanation/evaluation-concepts-and-matching`
--- a/docs/source/how_to/fine-tune-from-a-checkpoint.md
+++ b/docs/source/how_to/fine-tune-from-a-checkpoint.md
@ -0,0 +1,45 @@
+# How to fine-tune from a checkpoint
+
+Use this guide when you want to continue training from an existing checkpoint instead of starting a fresh model from a model config.
+
+## Use `--model` for checkpoint-based training
+
+Pass a checkpoint with `--model`.
+
+Do not combine `--model` with `--model-config`.
+
+```bash
+batdetect2 train \
+  path/to/train_dataset.yaml \
+  --val-dataset path/to/val_dataset.yaml \
+  --model path/to/model.ckpt \
+  --training-config path/to/training.yaml
+```
+
+## Keep targets and preprocessing aligned
+
+If you override targets or audio-related settings while fine-tuning, validate that they still match the checkpoint and your dataset.
+
+Mismatches here can produce confusing failures or invalid comparisons.
+
+## Decide what question the fine-tune should answer
+
+Common fine-tuning goals are:
+
+- adapting to local recording conditions,
+- adapting to a new label set,
+- improving performance on a narrower deployment context.
+
+Make that goal explicit before comparing results.
+
+## Evaluate after fine-tuning
+
+Always evaluate the fine-tuned checkpoint on a held-out dataset.
+
+Use the same evaluation setup when comparing before and after.
+
+## Related pages
+
+- Training tutorial: {doc}`../tutorials/train-a-custom-model`
+- Evaluate a test set: {doc}`../tutorials/evaluate-on-a-test-set`
+- Train command reference: {doc}`../reference/cli/train`
--- a/docs/source/how_to/index.md
+++ b/docs/source/how_to/index.md
@ -1,12 +1,22 @@
 # How-to Guides

-How-to guides help you complete specific tasks while working.
+How-to guides help you answer practical questions once you are past the first tutorial.
+
+Use this section when you already know the basic workflow and want help with one specific task.

 ```{toctree}
 :maxdepth: 1

+choose-an-inference-input-mode
 run-batch-predictions
+tune-inference-clipping
 tune-detection-threshold
+inspect-class-scores-in-python
+inspect-detection-features-in-python
+save-predictions-in-different-output-formats
+fine-tune-from-a-checkpoint
+choose-and-configure-evaluation-tasks
+interpret-evaluation-outputs
 configure-aoef-dataset
 import-legacy-batdetect2-annotations
 configure-audio-preprocessing
--- a/docs/source/how_to/inspect-class-scores-in-python.md
+++ b/docs/source/how_to/inspect-class-scores-in-python.md
@ -0,0 +1,44 @@
+# How to inspect class scores in Python
+
+Use this guide when you need more than the top class label for each detection.
+
+## Get the ranked class scores
+
+`BatDetect2API.get_class_scores` returns `(class_name, score)` pairs for one detection.
+
+```python
+from pathlib import Path
+
+from batdetect2.api_v2 import BatDetect2API
+
+api = BatDetect2API.from_checkpoint(Path("path/to/model.ckpt"))
+prediction = api.process_file(Path("path/to/audio.wav"))
+
+for detection in prediction.detections:
+    print("detection score:", detection.detection_score)
+    for class_name, score in api.get_class_scores(detection):
+        print(class_name, score)
+```
+
+## Separate detection confidence from class ranking
+
+Keep these two ideas separate:
+
+- `detection_score` tells you how confident the model is that a call is present at that event,
+- `class_scores` tell you how the model ranked classes for that detected event.
+
+A detection can have a reasonable detection score while still having uncertain class ranking.
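One simple way to quantify an uncertain ranking is the gap between the two highest class scores. The sketch below works on plain `(class_name, score)` pairs, which is the shape `get_class_scores` is described as returning. The helper name `top_two_margin` and the class names in the usage note are made up for illustration.

```python
def top_two_margin(class_scores: list[tuple[str, float]]) -> float:
    """Return the gap between the best and second-best class scores."""
    if len(class_scores) < 2:
        raise ValueError("need at least two class scores to compute a margin")
    # Sort scores from highest to lowest, ignoring the class names.
    ranked = sorted((score for _, score in class_scores), reverse=True)
    return ranked[0] - ranked[1]
```

For example, `top_two_margin([("class_a", 0.48), ("class_b", 0.45), ("class_c", 0.05)])` is about `0.03`, which suggests the top label is a close call and worth reviewing.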
+
+## Hide the top class if needed
+
+If you want to inspect only the alternatives, pass `include_top_class=False`.
+
+```python
+api.get_class_scores(detection, include_top_class=False)
+```
+
+## Related pages
+
+- Python tutorial: {doc}`../tutorials/integrate-with-a-python-pipeline`
+- API reference: {doc}`../reference/api`
+- Understanding scores: {doc}`../explanation/what-batdetect2-predicts`
--- a/docs/source/how_to/inspect-detection-features-in-python.md
+++ b/docs/source/how_to/inspect-detection-features-in-python.md
@ -0,0 +1,49 @@
+# How to inspect detection features in Python
+
+Use this guide when you want the per-detection feature vectors exposed by the current API.
+
+## Get the feature vector for one detection
+
+Each detection carries a `features` vector.
+
+The API exposes it through `get_detection_features`.
+
+```python
+from pathlib import Path
+
+from batdetect2.api_v2 import BatDetect2API
+
+api = BatDetect2API.from_checkpoint(Path("path/to/model.ckpt"))
+prediction = api.process_file(Path("path/to/audio.wav"))
+
+for detection in prediction.detections:
+    features = api.get_detection_features(detection)
+    print(features.shape)
+```
+
+## Use features for exploration, not as ground truth labels
+
+These features are internal model representations attached to detections.
+
+They can be useful for:
+
+- exploratory visualization,
+- downstream clustering,
+- comparison across detections,
+- building extra analysis pipelines.
+
+They do not replace validation.
+
+They also do not map one-to-one onto ecological variables.
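As an example of comparison across detections, the sketch below computes the cosine similarity between two feature vectors. It assumes each vector has already been converted to a flat sequence of floats. The helper name `cosine_similarity` is illustrative and not part of BatDetect2.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Return the cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)
```

A value near 1 means the two detections look similar to the model. It does not by itself mean they come from the same species or behaviour.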
+
+## Save predictions with features included
+
+If you need features on disk, use an output format that supports them, such as `raw` or `parquet`, and keep feature inclusion enabled.
+
+See {doc}`save-predictions-in-different-output-formats`.
+
+## Related pages
+
+- Understanding features and embeddings: {doc}`../explanation/extracted-features-and-embeddings`
+- Output formats reference: {doc}`../reference/output-formats`
+- API reference: {doc}`../reference/api`
--- a/docs/source/how_to/interpret-evaluation-outputs.md
+++ b/docs/source/how_to/interpret-evaluation-outputs.md
@ -0,0 +1,41 @@
+# How to interpret evaluation outputs
+
+Use this guide after `batdetect2 evaluate` has written metrics and plots to disk.
+
+## Start by identifying the task
+
+Do not interpret a metric until you know which evaluation task produced it.
+
+For example, a detection score and a clip-classification score answer different questions.
+
+## Read the output directory as a bundle
+
+Treat the evaluation output directory as one package:
+
+- metrics,
+- plots,
+- saved predictions,
+- config context.
+
+Do not lift a single number out of context and treat it as the whole story.
+
+## Look for failure patterns, not just overall averages
+
+Check:
+
+- whether errors concentrate in certain taxa,
+- whether specific sites or recorder setups behave differently,
+- whether threshold choices are driving the result,
+- whether predictions are near clip boundaries or matching thresholds.
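To check whether errors concentrate in certain taxa, a per-class error rate is often enough for a first look. The sketch below assumes you have already reduced the matched evaluation output to `(true_class, predicted_class)` pairs. That layout and the helper name `error_rate_by_class` are hypothetical, not a BatDetect2 format.

```python
from collections import Counter


def error_rate_by_class(pairs: list[tuple[str, str]]) -> dict[str, float]:
    """Map each true class to the fraction of its examples that were mislabelled."""
    totals = Counter(true for true, _ in pairs)
    errors = Counter(true for true, predicted in pairs if predicted != true)
    # Counter returns 0 for classes with no errors, so every class gets a rate.
    return {name: errors[name] / totals[name] for name in totals}
```

Classes with few examples produce unstable rates, so read the rate together with the class count.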
+
+## Keep validation and deployment questions separate
+
+A model can look good on one task and still be a poor fit for your deployment question.
+
+Interpret the outputs in relation to the real use case, not only the easiest metric to report.
+
+## Related pages
+
+- Evaluation tutorial: {doc}`../tutorials/evaluate-on-a-test-set`
+- Evaluation concepts: {doc}`../explanation/evaluation-concepts-and-matching`
+- Model output and validation: {doc}`../explanation/model-output-and-validation`
--- a/docs/source/how_to/run-batch-predictions.md
+++ b/docs/source/how_to/run-batch-predictions.md
@ -3,6 +3,8 @@
 This guide shows practical command patterns for directory-based and file-list
 prediction runs.

+Use it after you already know which input mode you want and need concrete command templates for a repeatable batch run.
+
 ## Predict from a directory

 ```bash
@ -12,6 +14,8 @@ batdetect2 predict directory \
  path/to/outputs
 ```

+Use this when BatDetect2 should discover the audio files for you.
+
 ## Predict from a file list

 ```bash
@ -21,10 +25,35 @@ batdetect2 predict file_list \
  path/to/outputs
 ```

+Use this when another part of your workflow already produced the exact recording list to process.
+
+## Predict from a dataset annotation set
+
+```bash
+batdetect2 predict dataset \
+  path/to/model.ckpt \
+  path/to/annotation_set.json \
+  path/to/outputs
+```
+
+Use this when your project already has a `soundevent` annotation set and you want to extract unique recording paths from it.
+
 ## Useful options

 - `--batch-size` to control throughput.
 - `--workers` to set data-loading parallelism.
 - `--format` to select output format.
+- `--inference-config` to control clipping and loader behavior.
+- `--outputs-config` to control serialization and output transforms.
+- `--detection-threshold` to override the detection threshold for a run.

-For complete option details, see {doc}`../reference/cli/index`.
+## Practical workflow
+
+For large runs:
+
+1. test the command on a small reviewed subset,
+2. lock the config files and command shape,
+3. write outputs to a dedicated directory per run,
+4. record the checkpoint, config paths, and thresholds used.
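Step 4 can be automated with a small manifest file written next to the outputs. The sketch below is one possible layout. The file name `run_manifest.json`, the field names, and the helper `write_run_manifest` are assumptions for illustration. BatDetect2 does not read or write this file.

```python
import json
from datetime import datetime, timezone
from pathlib import Path


def write_run_manifest(
    output_dir: Path,
    checkpoint: str,
    config_paths: dict[str, str],
    detection_threshold: float,
) -> Path:
    """Write a JSON record of the run settings and return its path."""
    manifest = {
        "created_utc": datetime.now(timezone.utc).isoformat(),
        "checkpoint": str(checkpoint),
        "config_paths": {name: str(path) for name, path in config_paths.items()},
        "detection_threshold": detection_threshold,
    }
    output_dir = Path(output_dir)
    output_dir.mkdir(parents=True, exist_ok=True)
    manifest_path = output_dir / "run_manifest.json"
    manifest_path.write_text(json.dumps(manifest, indent=2))
    return manifest_path
```

Writing the manifest into the same dedicated directory as the predictions keeps the settings and the results together.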
+
+For complete option details, see {doc}`../reference/cli/predict`.
--- a/docs/source/how_to/save-predictions-in-different-output-formats.md
+++ b/docs/source/how_to/save-predictions-in-different-output-formats.md
@ -0,0 +1,64 @@
+# How to save predictions in different output formats
+
+Use this guide when you need BatDetect2 outputs in a specific representation for downstream tools.
+
+## Choose the format that matches the job
+
+Current built-in output formats include:
+
+- `raw`: one NetCDF file per clip, best for rich structured outputs,
+- `parquet`: tabular storage for data analysis workflows,
+- `soundevent`: prediction-set JSON for soundevent-style tooling,
+- `batdetect2`: legacy per-recording JSON output.
+
+## Select a format from the CLI
+
+Use `--format` for quick experiments.
+
+```bash
+batdetect2 predict directory \
+  path/to/model.ckpt \
+  path/to/audio_dir \
+  path/to/outputs \
+  --format parquet
+```
+
+## Use an outputs config for repeatable runs
+
+Use an outputs config when you want reproducible control over format and transforms.
+
+Example:
+
+```yaml
+format:
+  name: raw
+  include_class_scores: true
+  include_features: true
+  include_geometry: true
+transform:
+  detection_transforms: []
+  clip_transforms: []
+```
+
+Run with:
+
+```bash
+batdetect2 predict directory \
+  path/to/model.ckpt \
+  path/to/audio_dir \
+  path/to/outputs \
+  --outputs-config path/to/outputs.yaml
+```
+
+## Pick the simplest useful format
+
+- Use `raw` if you want the richest output surface and easy round-tripping.
+- Use `parquet` if you want tabular analysis in Python or data-lake workflows.
+- Use `soundevent` if you want prediction-set JSON.
+- Use `batdetect2` only when you need the legacy JSON shape.
+
+## Related pages
+
+- Outputs config reference: {doc}`../reference/outputs-config`
+- Output formats reference: {doc}`../reference/output-formats`
+- Output transforms reference: {doc}`../reference/output-transforms`
--- a/docs/source/how_to/tune-detection-threshold.md
+++ b/docs/source/how_to/tune-detection-threshold.md
@ -2,6 +2,10 @@

 Use this guide to compare detection outputs at different threshold values.

+The goal is not to find a universal threshold.
+
+The goal is to choose a threshold that fits your reviewed local data and the project trade-off between missed calls and false positives.
+
 ## 1) Start with a baseline run

 Run an initial prediction workflow and keep outputs in a dedicated folder.
@ -20,11 +24,22 @@ batdetect2 predict directory \
  --detection-threshold 0.3
 ```

+Keep each threshold run in a separate output directory.
+
+That makes it easier to compare counts and inspect example files without mixing results.
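A small script can generate one command per threshold with its own output directory. The sketch below only builds the argument lists and does not run anything. The `threshold_0.30` directory naming and the helper name `threshold_sweep_commands` are assumptions for illustration.

```python
from pathlib import Path


def threshold_sweep_commands(
    model: str,
    audio_dir: str,
    output_root: str,
    thresholds: list[float],
) -> list[list[str]]:
    """Build one `batdetect2 predict directory` command per threshold."""
    commands = []
    for threshold in thresholds:
        # Each threshold gets its own output directory under output_root.
        output_dir = Path(output_root) / f"threshold_{threshold:.2f}"
        commands.append(
            [
                "batdetect2",
                "predict",
                "directory",
                str(model),
                str(audio_dir),
                str(output_dir),
                "--detection-threshold",
                str(threshold),
            ]
        )
    return commands
```

Each list can be passed to `subprocess.run(command, check=True)` once you are happy with the printed commands.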
+
 ## 3) Validate against known calls

 Use files with trusted annotations or expert review to select a threshold that
 fits your project goals.

+Check both:
+
+- obvious false positives,
+- obvious missed calls.
+
+If class interpretation matters downstream, inspect class ranking behavior as well, not just detection counts.
+
 ## 4) Record your chosen setting

 Write down the chosen threshold and rationale so analyses are reproducible.
--- a/docs/source/how_to/tune-inference-clipping.md
+++ b/docs/source/how_to/tune-inference-clipping.md
@ -0,0 +1,63 @@
+# How to tune inference clipping
+
+Use this guide when long recordings need to be split into smaller clips during inference.
+
+## What clipping controls
+
+`InferenceConfig.clipping` controls how recordings are split before batching.
+
+Key fields are:
+
+- `duration`: clip duration in seconds,
+- `overlap`: overlap between adjacent clips,
+- `max_empty`: how much empty padding is allowed,
+- `discard_empty`: whether empty clips are dropped.
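To build intuition for `duration` and `overlap`, the sketch below lists clip start times for one recording. It assumes that overlap is given in seconds and that consecutive clips start `duration - overlap` seconds apart. It does not model `max_empty` or `discard_empty`, so the real BatDetect2 behavior for the final partial clip may differ.

```python
def clip_start_times(
    recording_duration: float,
    clip_duration: float,
    overlap: float,
) -> list[float]:
    """Return start times (seconds) of clips that begin inside the recording."""
    if clip_duration <= 0:
        raise ValueError("clip_duration must be positive")
    if not 0 <= overlap < clip_duration:
        raise ValueError("overlap must be >= 0 and smaller than clip_duration")
    # Assumed step between consecutive clip starts.
    hop = clip_duration - overlap
    starts = []
    index = 0
    while index * hop < recording_duration:
        # Round to avoid float noise such as 0.39999999999999997.
        starts.append(round(index * hop, 6))
        index += 1
    return starts
```

Under these assumptions, a larger overlap means more clips per recording, which costs throughput but gives calls near a clip boundary a second chance to appear whole in the neighbouring clip.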
+
+## Start from the defaults
+
+Use the built-in clipping behavior first unless you already know you need something else.
+
+Only tune clipping when:
+
+- recordings are much longer than your normal working set,
+- you are seeing edge effects around calls,
+- you need tighter control over throughput or padding behavior.
+
+## Override clipping with an inference config
+
+Create an inference config file and pass it to `predict` or `evaluate`.
+
+Example:
+
+```yaml
+clipping:
+  enabled: true
+  duration: 0.5
+  overlap: 0.1
+  max_empty: 0.0
+  discard_empty: true
+loader:
+  batch_size: 8
+```
+
+Run with:
+
+```bash
+batdetect2 predict directory \
+  path/to/model.ckpt \
+  path/to/audio_dir \
+  path/to/outputs \
+  --inference-config path/to/inference.yaml
+```
+
+## Validate clipping changes on a small reviewed subset
+
+Changing clipping changes what the model sees in each clip and can change how events near clip boundaries behave.
+
+Check a reviewed subset before applying clipping changes to a full project.
+
+## Related pages
+
+- Inference config reference: {doc}`../reference/inference-config`
+- Run batch predictions: {doc}`run-batch-predictions`
+- Understanding the pipeline: {doc}`../explanation/pipeline-overview`
--- a/docs/source/index.md
+++ b/docs/source/index.md
@ -1,63 +1,87 @@
 # Home

-Welcome to the batdetect2 docs.
+Welcome to the BatDetect2 documentation.

-## What is batdetect2?
+## What is BatDetect2?

-`batdetect2` is a bat echolocation detection model.
-It detects each individual echolocation call in an input spectrogram, draws a
-box around each call event, and predicts the most likely species for that call.
-A recording can contain many calls from different species.
+`batdetect2` detects bat echolocation calls in audio recordings.

-The current default model is trained for 17 UK species but you can also train
-new models from your own annotated data.
+It can help you screen large collections of recordings,
+find files that need expert review,
+and support ecology and conservation work where manual review alone would be slow.

-For details on the approach please read our pre-print:
-[Towards a General Approach for Bat Echolocation Detection and Classification](https://www.biorxiv.org/content/10.1101/2022.12.14.520490v1)
+In practice,
+BatDetect2 takes recordings,
+looks for likely bat calls,
+draws a box around each detected event,
+and scores the most likely class for that event.

-## What you can do
+The current default model is trained for 17 UK species.

- Run inference on your recordings and export predictions for downstream
-  analysis:
+The library also supports custom training,
+fine-tuning,
+evaluation,
+and more advanced use from Python.
+
+For details on the underlying approach, see the pre-print:
+[Towards a General Approach for Bat Echolocation Detection and Classification](https://www.biorxiv.org/content/10.1101/2022.12.14.520490v1)
+
+## A good first use for BatDetect2
+
+BatDetect2 is a good fit when you want to:
+
+- scan many recordings for likely bat activity,
+- prioritize files for expert review,
+- compare outputs across projects with appropriate caution,
+- build reviewed local datasets for later model improvement.
+
+It is not a substitute for validation.
+
+## Main user journeys
+
+- I want to run the model on my recordings:
  {doc}`tutorials/run-inference-on-folder`
- Train a custom model on your own annotated data:
-  {doc}`tutorials/train-a-custom-model`
- Evaluate model performance on a held-out test set:
-  {doc}`tutorials/evaluate-on-a-test-set`
- Integrate batdetect2 into Python scripts and notebooks:
+- I write code and want to use Python:
  {doc}`tutorials/integrate-with-a-python-pipeline`
+- I want to train or fine-tune a custom model:
+  {doc}`tutorials/train-a-custom-model`
+- I want to evaluate a trained model on held-out data:
+  {doc}`tutorials/evaluate-on-a-test-set`

 ```{warning}
 Treat outputs as model predictions, not ground truth.
-Always validate on reviewed local data before using results for ecological
-inference.
+Always validate on reviewed local data before using results for ecological inference.
 ```

-## Where to start
+```{note}
+Looking for the previous BatDetect2 workflow?
+See {doc}`legacy/index`.
+The legacy docs are still available, but new workflows should use `batdetect2 predict` and `BatDetect2API`.
+```

-If you are new, start with {doc}`getting_started`.
+## How to use this site

-For a low-code path, go to {doc}`tutorials/index`.
-If you are Python-savvy and want more control, go to {doc}`how_to/index` and
-{doc}`reference/index`.
+Start with {doc}`getting_started` if you are new.

-Each section has a different purpose:
-some pages teach by example, some focus on practical tasks, some are lookup
-material, and some explain trade-offs.
+Then choose the section that matches what you need.

-| Section       | Best for                                    | Start here               |
-| ------------- | ------------------------------------------- | ------------------------ |
-| Tutorials     | Learning by doing                           | {doc}`tutorials/index`   |
-| How-to guides | Solving practical tasks                     | {doc}`how_to/index`      |
-| Reference     | Looking up commands, configs, and APIs      | {doc}`reference/index`   |
-| Explanation   | Understanding design choices and trade-offs | {doc}`explanation/index` |
+If you are here mainly to run the model on recordings,
+start with Tutorials.
+
+| Section | Best for | Start here |
+| --- | --- | --- |
+| Tutorials | Step-by-step routes for the most common tasks | {doc}`tutorials/index` |
+| How-to guides | Answers to specific practical questions | {doc}`how_to/index` |
+| Reference | Detailed command and settings help | {doc}`reference/index` |
+| Understanding | Concepts, interpretation, and trade-offs | {doc}`explanation/index` |
+| Legacy | Previous workflow and migration guidance | {doc}`legacy/index` |

 ## Get in touch

 - GitHub repository:
  [macaodha/batdetect2](https://github.com/macaodha/batdetect2)
 - Questions, bug reports, and feature requests:
-  [GitHub Issues](https://github.com/macaodha/batdetect2/issues)
+  [GitHub Issues](https://github.com/macaodha/batdetect2/issues)
 - Common questions:
  {doc}`faq`
 - Want to contribute?
@ -65,7 +89,7 @@ material, and some explain trade-offs.

 ## Cite this work

-If you use batdetect2 in research, please cite:
+If you use BatDetect2 in research, please cite:

 Mac Aodha, O., Martinez Balvanera, S., Damstra, E., et al.
 (2022).
@ -82,6 +106,7 @@ tutorials/index
 how_to/index
 reference/index
 explanation/index
+legacy/index
 ```

 ```{toctree}
--- a/docs/source/legacy/cli-detect.md
+++ b/docs/source/legacy/cli-detect.md
@ -0,0 +1,39 @@
+# Legacy CLI workflow: `batdetect2 detect`
+
+This page documents the previous CLI workflow based on `batdetect2 detect`.
+
+```{warning}
+This is legacy documentation.
+For new workflows, use `batdetect2 predict directory` instead.
+If you are migrating, start with {doc}`migration-guide`.
+```
+
+## Legacy command shape
+
+```bash
+batdetect2 detect AUDIO_DIR ANN_DIR DETECTION_THRESHOLD
+```
+
+Common legacy options included:
+
+- `--cnn_features`
+- `--spec_features`
+- `--time_expansion_factor`
+- `--save_preds_if_empty`
+- `--model_path`
+
+## Current replacement
+
+The closest current CLI entry point is:
+
+```bash
+batdetect2 predict directory \
+  path/to/model.ckpt \
+  path/to/audio_dir \
+  path/to/outputs
+```
+
+## Related pages
+
+- Migration guide: {doc}`migration-guide`
+- Current predict docs: {doc}`../reference/cli/predict`
--- a/docs/source/legacy/feature-extraction.md
+++ b/docs/source/legacy/feature-extraction.md
@ -0,0 +1,34 @@
+# Legacy feature extraction outputs
+
+The previous BatDetect2 workflow exposed several output concepts that users may still rely on.
+
+These included:
+
+- `cnn_feats`
+- `spec_features`
+- `spec_slices`
+
+## Why this matters
+
+Users exploring older notebooks or downstream analysis code often encounter these names first.
+
+The current stack exposes a different surface centered on per-detection `features` plus configurable output formatters.
+
+## Migration note
+
+There is not always a strict one-to-one replacement.
+
+When migrating, validate which part of the old workflow you actually need:
+
+- low-level exported features,
+- spectrogram slices,
+- model-internal feature vectors,
+- legacy JSON output shape.
+
+Then map that need onto the current API and output format configuration.
+
+## Related pages
+
+- Migration guide: {doc}`migration-guide`
+- Current features explanation: {doc}`../explanation/extracted-features-and-embeddings`
+- Output formats reference: {doc}`../reference/output-formats`
--- a/docs/source/legacy/index.md
+++ b/docs/source/legacy/index.md
@ -0,0 +1,27 @@
+# Legacy documentation
+
+This section documents the previous BatDetect2 workflow.
+
+Use these pages if you need to keep working with the older `batdetect2 detect` command or the older `batdetect2.api` interface.
+
+For new projects, we recommend the current workflow:
+
+- CLI: `batdetect2 predict`
+- Python: `batdetect2.api_v2.BatDetect2API`
+
+If you are moving from the older workflow, start with {doc}`migration-guide`.
+
+```{warning}
+These pages describe the previous workflow.
+They are kept for continuity and migration support.
+New users should start with {doc}`../getting_started` and {doc}`../tutorials/index`.
+```
+
+```{toctree}
+:maxdepth: 1
+
+cli-detect
+python-api
+feature-extraction
+migration-guide
+```
--- a/docs/source/legacy/migration-guide.md
+++ b/docs/source/legacy/migration-guide.md
@ -0,0 +1,96 @@
+# Migration guide: legacy to current workflows
+
+Use this guide when moving from the previous BatDetect2 workflow to the current CLI and API.
+
+## Who should migrate now
+
+You should migrate if:
+
+- you are starting a new workflow,
+- you want the current docs path,
+- you want the newer CLI and API surface,
+- you are maintaining code that does not depend on the exact legacy JSON or feature outputs.
+
+You may need the legacy workflow a bit longer if:
+
+- downstream tooling depends on the exact old output structure,
+- you rely on older notebooks built around `batdetect2.api`,
+- you depend on legacy feature extraction outputs without a validated replacement yet.
+
+## CLI mapping
+
+- `batdetect2 detect AUDIO_DIR ANN_DIR DETECTION_THRESHOLD`
+  -> `batdetect2 predict directory MODEL_PATH AUDIO_DIR OUTPUT_PATH --detection-threshold ...`
+
+Main changes:
+
+- the model path is now a positional argument on the `predict` subcommand,
+- the current workflow expects an explicit checkpoint path rather than silently relying on the old default CLI behavior,
+- output formatting is configurable,
+- threshold override is an option rather than a required positional argument,
+- there are separate subcommands for directory, file-list, and dataset-driven inference.
+
+## Python API mapping
+
+- old: `import batdetect2.api as api`
+- current: `from batdetect2.api_v2 import BatDetect2API`
+
+Typical migration shape:
+
+```python
+from pathlib import Path
+
+from batdetect2.api_v2 import BatDetect2API
+
+api = BatDetect2API.from_checkpoint(Path("path/to/model.ckpt"))
+prediction = api.process_file(Path("path/to/audio.wav"))
+```
+
+Useful replacements:
+
+- legacy `process_file` -> current `BatDetect2API.process_file`
+- legacy `process_audio` -> current `BatDetect2API.process_audio`
+- legacy `process_spectrogram` -> current `BatDetect2API.process_spectrogram`
+- legacy one-off batch loops -> current `process_files` or CLI `predict`
+
+## Output and terminology changes
+
+Legacy workflows often centered on:
+
+- BatDetect2-style JSON output,
+- `cnn_feats`,
+- `spec_features`,
+- `spec_slices`.
+
+Current workflows center on:
+
+- `ClipDetections` and `Detection` objects,
+- per-detection `detection_score`,
+- per-detection `class_scores`,
+- per-detection `features`,
+- configurable output formatters.
+
+## What to validate after migration
+
+Before replacing a legacy workflow in production or research analysis, validate:
+
+- that thresholds are still appropriate,
+- that outputs are being saved in the right format,
+- that downstream code reads the new outputs correctly,
+- that feature-related assumptions still hold,
+- that you treat evaluation results and ecological interpretation as unchanged only where you have actually verified it.
+
+## Migration checklist
+
+1. Identify the old entry points you use.
+2. Replace them with the current CLI or `BatDetect2API` equivalents.
+3. Choose an output format explicitly.
+4. Re-run on a small reviewed subset.
+5. Compare outputs and downstream behavior.
+6. Update any notebooks or scripts that assume legacy field names.
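For step 5, a per-file comparison of detection counts is a cheap first check. The sketch below assumes you have already reduced each run to a `{file_name: detection_count}` dictionary. How you build those dictionaries depends on the output formats involved, and the helper name `compare_detection_counts` is hypothetical.

```python
def compare_detection_counts(
    legacy: dict[str, int],
    current: dict[str, int],
) -> dict[str, tuple[int, int]]:
    """Return files whose counts differ, as {file: (legacy_count, current_count)}."""
    files = sorted(set(legacy) | set(current))
    return {
        name: (legacy.get(name, 0), current.get(name, 0))
        for name in files
        # A file missing from one run is treated as having zero detections.
        if legacy.get(name, 0) != current.get(name, 0)
    }
```

Matching counts do not prove the detections are the same, so follow up by inspecting a few files by hand.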
+
+## Related pages
+
+- Current getting started: {doc}`../getting_started`
+- Current tutorials: {doc}`../tutorials/index`
+- Current API reference: {doc}`../reference/api`
--- a/docs/source/legacy/python-api.md
+++ b/docs/source/legacy/python-api.md
@ -0,0 +1,40 @@
+# Legacy Python API: `batdetect2.api`
+
+This page documents the previous Python API workflow based on `batdetect2.api`.
+
+```{warning}
+This is legacy documentation.
+For new workflows, use `batdetect2.api_v2.BatDetect2API`.
+If you are migrating, start with {doc}`migration-guide`.
+```
+
+## Legacy entry points
+
+Common legacy functions included:
+
+- `process_file`
+- `process_audio`
+- `process_spectrogram`
+- `load_audio`
+- `generate_spectrogram`
+- `postprocess`
+
+The legacy API also exposed the default model and default config more directly.
+
+## Current replacement
+
+The current Python path is:
+
+```python
+from pathlib import Path
+
+from batdetect2.api_v2 import BatDetect2API
+
+api = BatDetect2API.from_checkpoint(Path("path/to/model.ckpt"))
+prediction = api.process_file(Path("path/to/audio.wav"))
+```
+
+## Related pages
+
+- Migration guide: {doc}`migration-guide`
+- Current API reference: {doc}`../reference/api`
--- a/docs/source/reference/api.md
+++ b/docs/source/reference/api.md
@ -0,0 +1,65 @@
+# `BatDetect2API` reference
+
+`BatDetect2API` is the main entry point for the current Python workflow.
+
+It wraps model loading, inference, evaluation, output formatting, and training-related entry points behind one object.
+
+Defined in `batdetect2.api_v2`.
+
+## Create an API instance
+
+- `BatDetect2API.from_checkpoint(path, ...)`
+  - load a trained checkpoint and optional config overrides.
+- `BatDetect2API.from_config(config)`
+  - build a full stack from a `BatDetect2Config` object.
+
+## Inference methods
+
+- `process_file(audio_file, ...)`
+  - run inference for one recording.
+- `process_files(audio_files, ...)`
+  - run batch inference across a sequence of file paths.
+- `process_directory(audio_dir, ...)`
+  - run inference across the audio files found in one directory.
+- `process_clips(clips, ...)`
+  - run inference on an explicit sequence of clip objects.
+- `process_audio(audio, ...)`
+  - run inference starting from a waveform array.
+- `process_spectrogram(spec, ...)`
+  - run inference starting from a spectrogram tensor.
+
+## Prediction inspection helpers
+
+- `get_top_class_name(detection)`
+  - return the highest-scoring class name for one detection.
+- `get_class_scores(detection, include_top_class=True, sort_descending=True)`
+  - return ranked `(class_name, score)` pairs.
+- `get_detection_features(detection)`
+  - return the per-detection feature vector.
+
+## Audio loading helpers
+
+- `load_audio(path)`
+- `load_recording(recording)`
+- `load_clip(clip)`
+- `generate_spectrogram(audio)`
+
+## Output persistence helpers
+
+- `save_predictions(predictions, path, audio_dir=None, format=None, config=None)`
+- `load_predictions(path, format=None, config=None)`
+
+Use these when you want to save programmatic predictions without going through the CLI.
+
+## Training and evaluation entry points
+
+- `train(...)`
+- `finetune(...)`
+- `evaluate(...)`
+- `evaluate_predictions(...)`
+
+## Related pages
+
+- Python tutorial: {doc}`../tutorials/integrate-with-a-python-pipeline`
+- Outputs config reference: {doc}`outputs-config`
+- Output formats reference: {doc}`output-formats`
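
The ranking behaviour described for `get_class_scores` can be pictured with a small stand-alone sketch. It does not call BatDetect2: `rank_class_scores`, the class names, and the scores are all hypothetical, and the reading of `include_top_class` used here (dropping the best-scoring entry when it is `False`) is an assumption based only on the parameter name.

```python
def rank_class_scores(scores, include_top_class=True, sort_descending=True):
    """Return (class_name, score) pairs ordered by score.

    Hypothetical stand-in for illustration only; this is not the
    BatDetect2 implementation.
    """
    # Rank from highest to lowest score first.
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    # Assumption: excluding the top class means dropping the first entry.
    if not include_top_class:
        ranked = ranked[1:]
    # Reverse the order when ascending output is requested.
    if not sort_descending:
        ranked = ranked[::-1]
    return ranked


# Made-up scores for three made-up classes.
example_scores = {"class_a": 0.12, "class_b": 0.81, "class_c": 0.05}

for class_name, score in rank_class_scores(example_scores):
    print(f"{class_name}: {score:.2f}")
```

In real code, prefer `api.get_class_scores(detection)`; the sketch only makes the ordering and filtering options concrete.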
--- a/docs/source/reference/app-config.md
+++ b/docs/source/reference/app-config.md
@ -0,0 +1,38 @@
+# Top-level app config reference
+
+The top-level config object is `BatDetect2Config`.
+
+Defined in `batdetect2.config`.
+
+It combines the main configuration surfaces used across training, inference, evaluation, outputs, and logging.
+
+## Fields
+
+- `config_version`
+- `train`
+  - training-specific config.
+- `evaluation`
+  - evaluation task and plot config.
+- `model`
+  - model architecture, preprocessing, postprocessing, and targets.
+- `audio`
+  - audio loading and resampling config.
+- `inference`
+  - clipping and loader config for prediction-time workflows.
+- `outputs`
+  - output format and output transform config.
+- `logging`
+  - logging backend and formatting config.
+
+## Mental model
+
+Think of `BatDetect2Config` as the complete application wiring for the current stack.
+
+Use it when you want one reproducible config that describes the whole workflow.
+
+## Related pages
+
+- Inference config: {doc}`inference-config`
+- Evaluation config: {doc}`evaluation-config`
+- Outputs config: {doc}`outputs-config`
+- General config reference: {doc}`configs`
--- a/docs/source/reference/evaluation-config.md
+++ b/docs/source/reference/evaluation-config.md
@ -0,0 +1,46 @@
+# Evaluation config reference
+
+`EvaluationConfig` defines which evaluation tasks run and which plots they generate.
+
+Defined in `batdetect2.evaluate.config`.
+
+## Top-level fields
+
+- `tasks`
+  - list of task configs.
+
+## Built-in task families
+
+Current built-in tasks include:
+
+- `sound_event_detection`
+- `sound_event_classification`
+- `top_class_detection`
+- `clip_detection`
+- `clip_classification`
+
+## Shared task controls
+
+Common task-level controls include:
+
+- `prefix`
+- `ignore_start_end`
+
+Sound-event-style tasks also support:
+
+- `affinity`
+- `affinity_threshold`
+- `strict_match`
+
+## Default behavior
+
+The default evaluation config starts with:
+
+- sound event detection,
+- sound event classification.
+
+## Related pages
+
+- Choose and configure evaluation tasks: {doc}`../how_to/choose-and-configure-evaluation-tasks`
+- Evaluation concepts: {doc}`../explanation/evaluation-concepts-and-matching`
+- Evaluate CLI reference: {doc}`cli/evaluate`
--- a/docs/source/reference/index.md
+++ b/docs/source/reference/index.md
@ -1,12 +1,20 @@
 # Reference documentation

-Reference pages provide factual, complete descriptions of commands,
-configuration, and data structures.
+Reference pages are the detailed lookup pages.
+
+Use this section when you need exact command options, setting names, output details, or Python API entries.

 ```{toctree}
 :maxdepth: 1

 cli/index
+api
+app-config
+inference-config
+evaluation-config
+outputs-config
+output-formats
+output-transforms
 data-sources
 preprocessing-config
 postprocess-config
--- a/docs/source/reference/inference-config.md
+++ b/docs/source/reference/inference-config.md
@ -0,0 +1,41 @@
+# Inference config reference
+
+`InferenceConfig` controls how files are clipped and batched during prediction-time workflows.
+
+Defined in `batdetect2.inference.config`.
+
+## Top-level fields
+
+- `loader`
+  - data-loader settings for inference.
+- `clipping`
+  - controls how recordings are split into clips before batching.
+
+## `loader`
+
+Current built-in loader field:
+
+- `batch_size` (int, default `8`)
+
+## `clipping`
+
+Fields:
+
+- `enabled` (bool)
+- `duration` (float, seconds)
+- `overlap` (float, seconds)
+- `max_empty` (float)
+- `discard_empty` (bool)
+
+## When to override this config
+
+Override `InferenceConfig` when:
+
+- long recordings need different clipping behavior,
+- you want to tune batch size for your hardware,
+- you need reproducible prediction settings across runs.
+
+## Related pages
+
+- Tune inference clipping: {doc}`../how_to/tune-inference-clipping`
+- Predict CLI reference: {doc}`cli/predict`
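
To make the `duration` and `overlap` clipping fields concrete, here is a minimal sketch of how a recording could be cut into overlapping windows. `clip_windows` is a hypothetical helper, not a BatDetect2 function, and the handling of the final, shorter window is an assumption; the library's own clipping (including `max_empty` and `discard_empty`) may behave differently.

```python
def clip_windows(total_duration, duration, overlap):
    """Return (start, end) times in seconds for overlapping clips.

    Hypothetical illustration; not the BatDetect2 clipping logic.
    """
    if duration <= 0:
        raise ValueError("duration must be positive")
    if not 0 <= overlap < duration:
        raise ValueError("overlap must be >= 0 and smaller than duration")

    # Consecutive clips start one step apart.
    step = duration - overlap
    windows = []
    start = 0.0
    while start < total_duration:
        # Assumption: the last clip is truncated at the end of the recording.
        end = min(start + duration, total_duration)
        windows.append((start, end))
        if end >= total_duration:
            break
        start += step
    return windows


# A 2 s recording, 1 s clips, 0.5 s overlap.
print(clip_windows(2.0, 1.0, 0.5))
```

A larger overlap produces more clips for the same recording, which increases compute cost but makes it less likely that a call is cut at a clip boundary.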
--- a/docs/source/reference/output-formats.md
+++ b/docs/source/reference/output-formats.md
@ -0,0 +1,63 @@
+# Output formats reference
+
+BatDetect2 currently supports several built-in output formatters; this page covers `raw`, `parquet`, `soundevent`, and `batdetect2`.
+
+## `raw`
+
+Defined by `RawOutputConfig`.
+
+Best for rich structured outputs and for round-tripping, that is, saving predictions and loading them back later.
+
+Key fields:
+
+- `include_class_scores`
+- `include_features`
+- `include_geometry`
+
+Writes one NetCDF `.nc` file per clip.
+
+## `parquet`
+
+Defined by `ParquetOutputConfig`.
+
+Best for tabular analysis workflows.
+
+Key fields:
+
+- `include_class_scores`
+- `include_features`
+- `include_geometry`
+
+Writes a parquet table, typically `predictions.parquet`.
+
+## `soundevent`
+
+Defined by `SoundEventOutputConfig`.
+
+Best when you want a `PredictionSet` JSON workflow.
+
+Key fields:
+
+- `top_k`
+- `min_score`
+
+Writes a prediction-set JSON file.
+
+## `batdetect2`
+
+Defined by `BatDetect2OutputConfig`.
+
+This is the legacy BatDetect2-style JSON output.
+
+Key fields:
+
+- `event_name`
+- `annotation_note`
+
+Writes one `.json` file per recording.
+
+## Related pages
+
+- Outputs config: {doc}`outputs-config`
+- Save predictions in different output formats: {doc}`../how_to/save-predictions-in-different-output-formats`
+- Understanding formatted outputs: {doc}`../explanation/interpreting-formatted-outputs`
--- a/docs/source/reference/output-transforms.md
+++ b/docs/source/reference/output-transforms.md
@ -0,0 +1,37 @@
+# Output transforms reference
+
+Output transforms operate after decoding and before formatting.
+
+Defined in `batdetect2.outputs.transforms`.
+
+## Top-level config
+
+`OutputTransformConfig` contains:
+
+- `detection_transforms`
+- `clip_transforms`
+
+## Detection transforms
+
+Detection transforms operate on one detection at a time.
+
+Built-in examples include:
+
+- filtering by frequency,
+- filtering by duration.
+
+A detection that fails one of these filters is removed from the output entirely.
+
+## Clip transforms
+
+Clip transforms operate on the list of detections for one clip.
+
+Built-in examples include:
+
+- removing detections above Nyquist,
+- removing detections at clip edges.
+
+## Related pages
+
+- Outputs config: {doc}`outputs-config`
+- Understanding outputs: {doc}`../explanation/interpreting-formatted-outputs`
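
The difference between the two transform levels in the output transforms reference can be sketched as follows. Everything here is hypothetical: the `Detection` dataclass, the helper names, and the choice to run detection transforms before clip transforms are assumptions for illustration, not the contents of `batdetect2.outputs.transforms`.

```python
from dataclasses import dataclass


@dataclass
class Detection:
    """Hypothetical stand-in for one detected sound event."""

    start_time: float  # seconds
    end_time: float  # seconds
    low_freq: float  # Hz
    high_freq: float  # Hz


def min_duration_filter(min_duration):
    """Detection transform: sees one detection, returns it or None."""

    def transform(detection):
        if detection.end_time - detection.start_time < min_duration:
            return None
        return detection

    return transform


def remove_above_nyquist(samplerate):
    """Clip transform: sees the whole list of detections for one clip."""
    nyquist = samplerate / 2

    def transform(detections):
        return [d for d in detections if d.high_freq <= nyquist]

    return transform


def apply_transforms(detections, detection_transforms, clip_transforms):
    """Apply per-detection transforms, then per-clip transforms (assumed order)."""
    kept = []
    for detection in detections:
        for transform in detection_transforms:
            detection = transform(detection)
            if detection is None:
                break  # dropped by a filter; skip the remaining transforms
        if detection is not None:
            kept.append(detection)
    for transform in clip_transforms:
        kept = transform(kept)
    return kept


detections = [
    Detection(0.000, 0.001, 20_000, 40_000),  # too short
    Detection(0.100, 0.110, 30_000, 60_000),  # kept
    Detection(0.200, 0.220, 100_000, 150_000),  # above Nyquist at 256 kHz
]
result = apply_transforms(
    detections,
    detection_transforms=[min_duration_filter(0.005)],
    clip_transforms=[remove_above_nyquist(256_000)],
)
print(result)
```

The design point is the signature: a detection transform never needs to know about neighbouring detections, while a clip transform can reason about the clip as a whole.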
--- a/docs/source/reference/outputs-config.md
+++ b/docs/source/reference/outputs-config.md
@ -0,0 +1,33 @@
+# Outputs config reference
+
+`OutputsConfig` controls two layers of prediction handling:
+
+- how detections are transformed before formatting,
+- how formatted outputs are written to disk.
+
+Defined in `batdetect2.outputs.config`.
+
+## Fields
+
+- `format`
+  - output format config.
+- `transform`
+  - output transform config.
+
+## Mental model
+
+The output workflow is:
+
+1. model outputs are decoded into detections,
+2. optional output transforms filter or adjust those detections,
+3. a formatter serializes them to disk.
+
+## Default behavior
+
+Unless you override it, the current stack uses the `raw` output formatter.
+
+## Related pages
+
+- Output formats: {doc}`output-formats`
+- Output transforms: {doc}`output-transforms`
+- Save predictions in different output formats: {doc}`../how_to/save-predictions-in-different-output-formats`
--- a/docs/source/tutorials/evaluate-on-a-test-set.md
+++ b/docs/source/tutorials/evaluate-on-a-test-set.md
@ -3,33 +3,89 @@
 This tutorial shows how to evaluate a trained checkpoint on a held-out dataset
 and inspect the output metrics.

+This tutorial is for advanced users who want to compare one trained model against a separate test dataset.
+
 ## Before you start

 - A trained model checkpoint.
 - A test dataset config file.
 - (Optional) Targets, audio, inference, and evaluation config overrides.

-## Tutorial steps
+```{note}
+This page is for model evaluation.
+If you only want to run BatDetect2 on recordings,
+start with {doc}`run-inference-on-folder` instead.
+```

-1. Select a checkpoint and a test dataset.
-2. Run `batdetect2 evaluate`.
-3. Inspect output metrics and prediction artifacts.
-4. Record evaluation settings for reproducibility.
+## Outcome

-## Example command
+By the end of this tutorial you will have:
+
+- run `batdetect2 evaluate`,
+- written evaluation metrics and result files,
+- understood what to inspect first,
+- identified the next pages for evaluation concepts and configuration.
+
+## 1. Start with a held-out dataset
+
+Use a dataset that was not used for training or tuning.
+
+A held-out dataset is simply a separate dataset kept aside for evaluation.
+
+If you tune thresholds or configs on the same dataset that you report as the final evaluation, the reported results will be optimistically biased.
+
+## 2. Run evaluation

 ```bash
 batdetect2 evaluate \
  path/to/model.ckpt \
  path/to/test_dataset.yaml \
+  --base-dir path/to/project_root \
  --output-dir path/to/eval_outputs
 ```

+This command loads the checkpoint,
+runs prediction on the test dataset,
+applies the chosen evaluation tasks,
+and writes metrics and result files to the output directory.
+
+Use `--base-dir` whenever the dataset config contains relative paths.
+
+That is the common case for project-local dataset files.
+
+## 3. Inspect the output directory
+
+Look for:
+
+- summary metrics,
+- generated plots,
+- saved prediction files if they were enabled,
+- enough metadata to reproduce the run later.
+
+The exact set depends on the configured evaluation tasks and plots.
+
+## 4. Interpret the results in context
+
+Do not reduce evaluation to a single number.
+
+Check:
+
+- which task the metric belongs to,
+- which thresholding or matching assumptions were used,
+- whether class-level behavior matches your use case,
+- whether the failures are concentrated in specific taxa, sites, or recording conditions.
+
+## 5. Record the evaluation setup
+
+Keep the command, config files, checkpoint path, and dataset version together.
+
+That matters for reproducibility and for later model comparisons.
+
 ## What to do next

 - Compare thresholds on representative files:
  {doc}`../how_to/tune-detection-threshold`
+- Configure evaluation tasks: {doc}`../how_to/choose-and-configure-evaluation-tasks`
+- Interpret evaluation artifacts: {doc}`../how_to/interpret-evaluation-outputs`
+- Learn the evaluation concepts: {doc}`../explanation/evaluation-concepts-and-matching`
 - Check full evaluate options: {doc}`../reference/cli/evaluate`
-
-This page is a starter scaffold and will be expanded with a full worked
-example.
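
The role of `--base-dir` in the evaluation tutorial can be illustrated with plain `pathlib`. This is a minimal sketch under stated assumptions: `resolve_against_base` is a hypothetical helper, and the rule that absolute paths are left untouched is an assumption about how relative-path resolution usually works, not a description of BatDetect2 internals.

```python
from pathlib import Path


def resolve_against_base(path, base_dir=None):
    """Resolve a path from a dataset config against an optional base directory.

    Hypothetical helper for illustration; not part of BatDetect2.
    """
    path = Path(path)
    # Assumption: absolute paths, or a missing base directory, mean the
    # path is used as written.
    if base_dir is None or path.is_absolute():
        return path
    return Path(base_dir) / path


print(resolve_against_base("audio/file_001.wav", "path/to/project_root"))
print(resolve_against_base("audio/file_001.wav"))
```

Without a base directory, a relative path is interpreted relative to wherever the command is run, which is why the same dataset file can appear to work on one machine and fail on another.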
--- a/docs/source/tutorials/index.md
+++ b/docs/source/tutorials/index.md
@ -1,7 +1,12 @@
 # Tutorials

-Tutorials are for learning by doing. They provide a single, reproducible path
-to a concrete outcome.
+Tutorials are the default learning path.
+
+Each tutorial follows one recommended route from start to finish.
+
+Use tutorials when you want the simplest route to a concrete outcome.
+
+Use {doc}`../how_to/index` when you need to customize a workflow.

 ```{toctree}
 :maxdepth: 1
--- a/docs/source/tutorials/integrate-with-a-python-pipeline.md
+++ b/docs/source/tutorials/integrate-with-a-python-pipeline.md
@ -3,21 +3,52 @@
 This tutorial shows a minimal Python workflow for loading audio, running
 batdetect2, and collecting detections for downstream analysis.

+This tutorial is for people who already want to work in Python.
+
+If you mainly want to run the model on recordings,
+start with {doc}`run-inference-on-folder` instead.
+
 ## Before you start

 - BatDetect2 installed in your Python environment.
 - A model checkpoint.
 - At least one input audio file.

-## Tutorial steps
+```{note}
+This page is more technical than the standard first-run tutorial.
+You do not need this page for a normal first use of BatDetect2.
+```

-1. Load BatDetect2 in Python.
-2. Create an API instance from a checkpoint.
-3. Run `process_file` on one audio file.
-4. Read detection fields and class scores.
-5. Save or pass detections to your downstream pipeline.
+If you are working from this repository checkout, you can start with:

-## Example code
+```text
+src/batdetect2/models/checkpoints/Net2DFast_UK_same.pth.tar
+```
+
+## Outcome
+
+By the end of this tutorial you will have:
+
+- created a `BatDetect2API` object,
+- run inference on one file,
+- inspected the top class, class-score list, and detection score,
+- identified where to go next for feature extraction, saving predictions, and batch workflows.
+
+## 1. Create the API instance
+
+Load the checkpoint once and reuse the API object for multiple files.
+
+```python
+from pathlib import Path
+
+from batdetect2.api_v2 import BatDetect2API
+
+api = BatDetect2API.from_checkpoint(Path("path/to/model.ckpt"))
+```
+
+## 2. Run inference on one file
+
+`process_file` is the simplest Python entry point when you want one prediction object per recording.

 ```python
 from pathlib import Path
@ -33,10 +64,55 @@ for detection in prediction.detections:
    print(top_class, score)
 ```

+`prediction` is a `ClipDetections` object.
+
+It contains the clip metadata and a list of detections.
+
+Each detection carries:
+
+- a box for the detected event,
+- one detection score,
+- a full list of class scores,
+- a feature vector.
+
+## 3. Inspect class scores, not just the top class
+
+If you are exploring results,
+it is often useful to inspect the full ranked class-score list.
+
+```python
+for detection in prediction.detections:
+    print("top class:", api.get_top_class_name(detection))
+    print("detection score:", detection.detection_score)
+    print("class scores:")
+    for class_name, score in api.get_class_scores(detection):
+        print(f"  {class_name}: {score:.3f}")
+```
+
+This helps separate two different questions:
+
+- "Did the model think there was a call here?"
+- "If there was a call, which class did it score highest?"
+
+## 4. Keep the first workflow small
+
+Before scaling up, run the API on a few representative files and inspect the results manually.
+
+This catches path issues and obviously implausible outputs early.
+
+## 5. Move to the right next workflow
+
+Once the single-file path is working, choose the next page based on what you need:
+
+- save predictions to disk,
+- inspect class scores more carefully,
+- inspect detection features,
+- process many files in one run.
+
 ## What to do next

- See API/config references: {doc}`../reference/index`
- Learn practical CLI alternatives: {doc}`run-inference-on-folder`
-
-This page is a starter scaffold and will be expanded with a full worked
-example.
+- API reference: {doc}`../reference/api`
+- Inspect ranked class scores: {doc}`../how_to/inspect-class-scores-in-python`
+- Inspect detection features: {doc}`../how_to/inspect-detection-features-in-python`
+- Save predictions to disk: {doc}`../how_to/save-predictions-in-different-output-formats`
+- Learn the CLI happy path: {doc}`run-inference-on-folder`
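
For the "collect detections for downstream analysis" step of the Python tutorial, one possible shape is a flat table with one row per detection. The sketch below runs without BatDetect2: `ExampleDetection` is a hypothetical stand-in that only mirrors the `detection_score` attribute used in the tutorial, and `class_scores`, the helper names, and the column names are assumptions.

```python
import csv
import io
from dataclasses import dataclass

FIELDNAMES = ["recording", "detection_score", "top_class", "top_class_score"]


@dataclass
class ExampleDetection:
    """Hypothetical stand-in for one detection; not a BatDetect2 class."""

    detection_score: float
    class_scores: dict  # class name -> score (assumed layout)


def detections_to_rows(recording, detections, min_detection_score=0.0):
    """Flatten detections into one dict per detection."""
    rows = []
    for detection in detections:
        # First question: did the model think there was a call here?
        if detection.detection_score < min_detection_score:
            continue
        # Second question: which class scored highest for that call?
        top_class, top_score = max(
            detection.class_scores.items(), key=lambda item: item[1]
        )
        rows.append(
            {
                "recording": recording,
                "detection_score": detection.detection_score,
                "top_class": top_class,
                "top_class_score": top_score,
            }
        )
    return rows


def rows_to_csv(rows):
    """Serialize rows to CSV text using only the standard library."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDNAMES)
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


example = [
    ExampleDetection(0.9, {"class_a": 0.7, "class_b": 0.2}),
    ExampleDetection(0.1, {"class_a": 0.3, "class_b": 0.6}),
]
rows = detections_to_rows("file_001.wav", example, min_detection_score=0.5)
print(rows_to_csv(rows))
```

With real predictions, the same idea applies by reading `detection.detection_score` and `api.get_class_scores(detection)` instead of the stand-in fields.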
--- a/docs/source/tutorials/run-inference-on-folder.md
+++ b/docs/source/tutorials/run-inference-on-folder.md
@ -1,33 +1,115 @@
-# Tutorial: Run inference on a folder of audio files
+# Tutorial: Run BatDetect2 on a folder of audio files

 This tutorial walks through a first end-to-end inference run with the CLI.

+It is the default starting point for new users.
+
+Use it when you want to run an existing model on a folder of recordings and quickly check what BatDetect2 found.
+
 ## Before you start

 - BatDetect2 installed in your environment.
 - A folder containing `.wav` files.
 - A model checkpoint path.

-## Tutorial steps
+A checkpoint is the saved model file that BatDetect2 uses to make predictions.

-1. Choose your input and output directories.
-2. Run prediction with the CLI.
-3. Verify output files were written.
-4. Inspect predictions and confidence scores.
+If you are working from this repository checkout, you can use:

-## Example command
+```text
+src/batdetect2/models/checkpoints/Net2DFast_UK_same.pth.tar
+```
+
+## Outcome
+
+By the end of this tutorial you will have:
+
+- run `batdetect2 predict directory`,
+- saved predictions to disk,
+- checked that BatDetect2 wrote output files,
+- identified the next pages to use for tuning or customization.
+
+## 1. Choose your input and output paths
+
+Pick three paths:
+
+- the checkpoint to use,
+- the directory containing your audio files,
+- an output directory where BatDetect2 will save its results.
+
+Example layout:
+
+```text
+project/
+  model.pth.tar
+  audio/
+    file_001.wav
+    file_002.wav
+  outputs/
+```
+
+## 2. Run prediction on the directory
+
+Use this command when you want BatDetect2 to scan a folder of recordings automatically.

 ```bash
 batdetect2 predict directory \
-  path/to/model.ckpt \
+  path/to/model.pth.tar \
  path/to/audio_dir \
  path/to/outputs
 ```

+What this does:
+
+- loads the checkpoint,
+- finds audio files in `audio_dir`,
+- splits recordings into smaller pieces internally when needed,
+- saves result files to `outputs`.
+
+## 3. Verify that outputs were written
+
+After the command completes, inspect the output directory.
+
+For a first run,
+the important check is simple:
+
+- did BatDetect2 create result files,
+- are they in the output directory you expected,
+- did it process the recordings you meant to analyze.
+
+Different workflows can save results in different file formats.
+
+You do not need to learn those details for the first run.
+
+If you later need to choose a specific output format,
+go to {doc}`../how_to/save-predictions-in-different-output-formats`.
+
+## 4. Inspect predictions
+
+Start with a small subset of representative files.
+
+Check:
+
+- whether detections were written for the expected recordings,
+- whether output counts are plausible,
+- whether the model is obviously too sensitive or too conservative,
+- whether the predicted classes look broadly reasonable for your data.
+
+Do not treat the first run as validated ecological output.
+
+The first run is a workflow check.
+
+Validation comes next.
+
+## 5. Tune only after you have a baseline
+
+If the first run is too noisy or misses obvious calls, tune thresholds on a reviewed subset rather than changing settings blindly across the full dataset.
+
+Use {doc}`../how_to/tune-detection-threshold` for that process.
+
 ## What to do next

- Use {doc}`../how_to/tune-detection-threshold` to tune sensitivity.
- Use {doc}`../reference/cli/index` for full command options.
-
-This page is a starter scaffold and will be expanded with a full worked
-example.
+- If you need a different input mode, use {doc}`../how_to/choose-an-inference-input-mode`.
+- If you want to tune sensitivity, use {doc}`../how_to/tune-detection-threshold`.
+- If you already write code and want more control from Python, use {doc}`integrate-with-a-python-pipeline`.
+- If you need full command details, use {doc}`../reference/cli/predict`.
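
The "verify that outputs were written" check from the folder tutorial can be scripted. This is a minimal sketch: `summarize_outputs` is a hypothetical helper, and the `.nc` files created in the temporary directory are placeholders that only imitate the raw output format's file extension.

```python
import tempfile
from collections import Counter
from pathlib import Path


def summarize_outputs(output_dir):
    """Count the files under an output directory, grouped by extension.

    Hypothetical helper for illustration; not part of BatDetect2.
    """
    output_dir = Path(output_dir)
    return Counter(
        path.suffix.lower() or "<no suffix>"
        for path in output_dir.rglob("*")
        if path.is_file()
    )


# Demonstrate on a throwaway directory with two empty placeholder files.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "file_001.nc").touch()
    (Path(tmp) / "file_002.nc").touch()
    counts = summarize_outputs(tmp)

print(dict(counts))
```

Pointing the helper at your real output directory gives a quick answer to "did BatDetect2 create result files, and roughly how many" before you open any of them.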
--- a/docs/source/tutorials/train-a-custom-model.md
+++ b/docs/source/tutorials/train-a-custom-model.md
@ -3,21 +3,44 @@
 This tutorial walks through a first custom training run using your own
 annotations.

+This tutorial is for advanced users who already have dataset files and want to train a model on their own annotated data.
+
 ## Before you start

 - BatDetect2 installed.
 - A training dataset config file.
 - (Optional) A validation dataset config file.
+- A targets config file if you are not using the default target setup.
+- A model config file if you are not training from the built-in defaults.

-## Tutorial steps
+```{note}
+This is not the first page to start with if you only want to run the existing model on recordings.
+Use {doc}`run-inference-on-folder` for that.
+```

-1. Prepare training and validation dataset config files.
-2. Choose target definitions and model/training config files.
-3. Run `batdetect2 train`.
-4. Check that checkpoints and logs are written.
-5. Run a quick sanity inference on a small audio subset.
+## Outcome

-## Example command
+By the end of this tutorial you will have:
+
+- started a training run,
+- written checkpoints and logs,
+- understood the minimum settings involved,
+- identified the next pages for fine-tuning and evaluation.
+
+## 1. Gather the minimum required inputs
+
+A custom training run uses:
+
+- a training dataset config,
+- an optional validation dataset config,
+- either a model config for a fresh run or a checkpoint for continued training,
+- optional settings files for targets, audio, training, evaluation, inference, outputs, and logging.
+
+The most important point is that the dataset file, target definitions, and preprocessing choices need to agree with each other.
+
+## 2. Run a first training command
+
+Use a command like this for a fresh run:

 ```bash
 batdetect2 train \
@ -28,10 +51,35 @@ batdetect2 train \
  --training-config path/to/training.yaml
 ```

+Use `--model` instead of `--model-config` when you want to continue from an existing checkpoint.
+
+## 3. Check that outputs are being written
+
+After the command starts, verify that:
+
+- the run initializes without configuration errors,
+- checkpoints are written to the checkpoint directory,
+- logs are written to the log directory or configured logger backend,
+- the training and validation datasets load as expected.
+
+## 4. Run a sanity inference pass after training
+
+Do not wait until full evaluation to confirm that the trained checkpoint behaves sensibly.
+
+Take a small reviewed subset of recordings and run a quick prediction pass with the new checkpoint.
+
+That catches setup mismatches early, especially around targets and preprocessing.
+
+## 5. Evaluate on held-out data
+
+Once the checkpoint looks sensible on a small sanity subset, run the formal evaluation workflow on a held-out test set.
+
+That is where you should compare models, thresholds, and task-level performance metrics.
+
 ## What to do next

 - Evaluate the trained checkpoint: {doc}`evaluate-on-a-test-set`
+- Fine-tune from a checkpoint: {doc}`../how_to/fine-tune-from-a-checkpoint`
+- Configure targets: {doc}`../how_to/configure-target-definitions`
+- Configure preprocessing: {doc}`../how_to/configure-audio-preprocessing`
 - Check full train options: {doc}`../reference/cli/train`
-
-This page is a starter scaffold and will be expanded with a full worked
-example.
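
Before running the held-out evaluation described in the training tutorial, it can help to confirm that the test set really is separate from the training set. The sketch below is one simple way to check, under loud assumptions: `shared_recordings` is a hypothetical helper, the paths are made up, and comparing by file name alone is a simplification that will misfire if different sites reuse the same names.

```python
from pathlib import Path


def shared_recordings(train_recordings, test_recordings):
    """Return file names present in both splits; ideally this is empty.

    Hypothetical helper for illustration; not part of BatDetect2.
    """
    # Assumption: the file name alone identifies a recording.
    train_names = {Path(path).name for path in train_recordings}
    test_names = {Path(path).name for path in test_recordings}
    return sorted(train_names & test_names)


train = ["audio/site_a/rec_001.wav", "audio/site_a/rec_002.wav"]
test = ["audio/site_b/rec_101.wav", "audio/site_a/rec_002.wav"]

print(shared_recordings(train, test))
```

Any name reported here is a recording the model has already seen during training, so metrics computed on it will look better than they should.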
--- a/src/batdetect2/cli/train.py
+++ b/src/batdetect2/cli/train.py
@ -24,6 +24,14 @@ __all__ = ["train_command"]
        "training starts from a fresh model config."
    ),
 )
+@click.option(
+    "--base-dir",
+    type=click.Path(exists=True, path_type=Path),
+    help=(
+        "Base directory used to resolve relative paths inside the training "
+        "and validation dataset configs."
+    ),
+)
@click.option(
    "--targets",
    "targets_config",
@ -111,6 +119,7 @@ def train_command(
    model_path: Path | None = None,
    ckpt_dir: Path | None = None,
    log_dir: Path | None = None,
+    base_dir: Path | None = None,
    targets_config: Path | None = None,
    model_config: Path | None = None,
    training_config: Path | None = None,
@ -191,7 +200,10 @@ def train_command(
        model_conf = model_conf.model_copy(update={"targets": target_conf})

    logger.info("Loading training dataset...")
-    train_annotations = load_dataset_from_config(train_dataset)
+    train_annotations = load_dataset_from_config(
+        train_dataset,
+        base_dir=base_dir,
+    )
    logger.debug(
        "Loaded {num_annotations} training examples",
        num_annotations=len(train_annotations),
@ -199,7 +211,10 @@ def train_command(

    val_annotations = None
    if val_dataset is not None:
-        val_annotations = load_dataset_from_config(val_dataset)
+        val_annotations = load_dataset_from_config(
+            val_dataset,
+            base_dir=base_dir,
+        )
        logger.debug(
            "Loaded {num_annotations} validation examples",
            num_annotations=len(val_annotations),
Author	SHA1	Message	Date
mbsantiago	f82ec218f0	docs: clarify train base-dir option	2026-04-30 16:51:24 +01:00
mbsantiago	9da05c172c	Merge branch 'train' into doc	2026-04-30 11:50:04 +01:00
mbsantiago	a2f2a2d398	docs: add legacy workflow and migration guidance	2026-04-30 11:48:25 +01:00
mbsantiago	300716895e	docs: add task guides and API/config references	2026-04-30 11:48:19 +01:00
mbsantiago	9dec35b1ce	docs: expand core user workflow tutorials	2026-04-30 11:48:11 +01:00
mbsantiago	9635a858bd	docs: align docs entry points with current workflows	2026-04-30 11:48:07 +01:00