From 214fbdb7cc1d978efefc3812499a1780511cc065 Mon Sep 17 00:00:00 2001
From: mbsantiago <santiago.mbal@gmail.com>
Date: Thu, 7 May 2026 07:09:49 +0100
Subject: [PATCH] docs: expand inference tutorial examples and model selection
 guidance

---
 docs/source/how_to/choose-a-model.md          | 112 +++++++++
 docs/source/how_to/index.md                   |   7 +-
 .../tutorials/run-inference-on-folder.md      | 221 +++++++++++++-----
 example_data/audio_files.txt                  |   2 +
 4 files changed, 278 insertions(+), 64 deletions(-)
 create mode 100644 docs/source/how_to/choose-a-model.md
 create mode 100644 example_data/audio_files.txt

diff --git a/docs/source/how_to/choose-a-model.md b/docs/source/how_to/choose-a-model.md
new file mode 100644
index 0000000..959deb7
--- /dev/null
+++ b/docs/source/how_to/choose-a-model.md
@@ -0,0 +1,112 @@
+# How to choose a model
+
+Use this guide when you want to choose which model checkpoint BatDetect2 loads.
+
+You can choose a model in both the CLI and the Python API.
+
+## Where you can choose the model
+
+In the CLI, use `--model` with commands that load a checkpoint, including:
+
+- `batdetect2 process`
+- `batdetect2 evaluate`
+- `batdetect2 train`
+- `batdetect2 finetune`
+
+In Python, pass the model source to `BatDetect2API.from_checkpoint(...)`.
+
+If you do not choose a model, BatDetect2 uses the built-in default UK model.
+
+## Use a local checkpoint path
+
+Use a local path when you already have a checkpoint file on disk.
+
+CLI example:
+
+```bash
+batdetect2 process directory \
+    path/to/audio \
+    path/to/outputs \
+    --model path/to/model.ckpt
+```
+
+Python example:
+
+```python
+from batdetect2.api_v2 import BatDetect2API
+
+api = BatDetect2API.from_checkpoint("path/to/model.ckpt")
+```
+
+## Use a bundled checkpoint alias
+
+BatDetect2 also supports bundled checkpoint aliases.
+
+The built-in UK model is available as `uk_same`.
+The alias `batdetect2_uk_same` also works.
+
+CLI example:
+
+```bash
+batdetect2 process directory \
+    path/to/audio \
+    path/to/outputs \
+    --model uk_same
+```
+
+Python example:
+
+```python
+from batdetect2.api_v2 import BatDetect2API
+
+api = BatDetect2API.from_checkpoint("uk_same")
+```
+
+## Use a Hugging Face URI
+
+You can also load a checkpoint from Hugging Face with a URI like:
+
+```text
+hf://owner/repo/path/to/model.ckpt
+```
+
+This needs the optional Hugging Face dependency to be installed.
+For example, install it with `pip install "batdetect2[huggingface]"`.
+
+CLI example:
+
+```bash
+batdetect2 process directory \
+    path/to/audio \
+    path/to/outputs \
+    --model hf://owner/repo/path/to/model.ckpt
+```
+
+Python example:
+
+```python
+from batdetect2.api_v2 import BatDetect2API
+
+api = BatDetect2API.from_checkpoint(
+    "hf://owner/repo/path/to/model.ckpt"
+)
+```
+
+## Choose the right source
+
+- Use a local path when you already have a checkpoint file.
+- Use an alias when you want one of the bundled models.
+- Use a Hugging Face URI when the checkpoint lives in a Hugging Face repo.
+
+## Related pages
+
+- Run inference on a folder:
+  {doc}`../tutorials/run-inference-on-folder`
+- `BatDetect2API` reference:
+  {doc}`../reference/api`
+- Process command reference:
+  {doc}`../reference/cli/predict`
+- Train a custom model:
+  {doc}`../tutorials/train-a-custom-model`
+- Fine-tune from a checkpoint:
+  {doc}`fine-tune-from-a-checkpoint`
diff --git a/docs/source/how_to/index.md b/docs/source/how_to/index.md
index 4fafe5a..40806dc 100644
--- a/docs/source/how_to/index.md
+++ b/docs/source/how_to/index.md
@@ -1,12 +1,15 @@
 # How-to Guides
 
-How-to guides help you answer practical questions once you are past the first tutorial.
+How-to guides help you answer practical questions once you are past the first
+tutorial.
 
-Use this section when you already know the basic workflow and want help with one specific task.
+Use this section when you already know the basic workflow and want help with one
+specific task.
 
 ```{toctree}
 :maxdepth: 1
 
+choose-a-model
 choose-an-inference-input-mode
 run-batch-predictions
 tune-inference-clipping
diff --git a/docs/source/tutorials/run-inference-on-folder.md b/docs/source/tutorials/run-inference-on-folder.md
index 2f2834a..c4cf5e1 100644
--- a/docs/source/tutorials/run-inference-on-folder.md
+++ b/docs/source/tutorials/run-inference-on-folder.md
@@ -1,120 +1,217 @@
-# Tutorial: Run BatDetect2 on a folder of audio files
+# Run BatDetect2 on a folder of audio files
 
-This tutorial walks through a first end-to-end inference run with the CLI.
+This tutorial shows how to run BatDetect2 on a folder of recordings from the command line.
 
-It is the default starting point for new users.
+Use it when you want a first pass over a folder of audio recordings and want to see what BatDetect2 finds.
 
-Use it when you want to run an existing model on a folder of recordings and
-quickly check what BatDetect2 found.
+If you want to follow the tutorial exactly, you can use the example recordings that come with the repository.
 
 ## Before you start
 
-- BatDetect2 installed in your environment.
-- A folder containing `.wav` files.
-- A model checkpoint path.
+You need:
 
-A checkpoint is the saved model file that BatDetect2 uses to make predictions.
+- BatDetect2 installed.
+- A folder containing supported audio files.
+- A place to save the results.
 
-If you are working from this repository checkout, you can use:
+If you have not installed BatDetect2 yet, start with {doc}`../getting_started`.
 
-```text
-src/batdetect2/models/checkpoints/Net2DFast_UK_same.pth.tar
+## Optional: use the repository example files
+
+If you want to follow the steps with the same paths shown here, clone the repository and move into it:
+
+```bash
+git clone https://github.com/macaodha/batdetect2.git
+cd batdetect2
 ```
 
-## Outcome
+You can then run the example commands in this tutorial from the repository root.
+
+## What you will do
 
 By the end of this tutorial you will have:
 
 - run `batdetect2 process directory`,
 - saved predictions to disk,
-- checked that BatDetect2 wrote output files,
-- identified the next pages to use for tuning or customization.
+- checked that BatDetect2 wrote the files you expected,
+- tried a second run with a higher detection threshold,
+- identified the next pages to use if you want to customise the run.
 
-## 1. Choose your input and output paths
+## 1. Choose your input and output folders
 
-Pick three paths:
+Pick:
 
-- the checkpoint to use,
-- the directory containing your audio files,
-- an output directory where BatDetect2 will save its results.
+- the folder containing your audio files,
+- an output folder where BatDetect2 should save results.
 
 Example layout:
 
 ```text
 project/
-  model.pth.tar
   audio/
     file_001.wav
     file_002.wav
   outputs/
 ```
 
-## 2. Run processing on the directory
+If `outputs/` does not exist yet, that is fine.
+BatDetect2 can create it.
 
-Use this command when you want BatDetect2 to scan a folder of recordings
-automatically.
+If you are using the repository example files, your layout already looks like this:
+
+```text
+batdetect2/
+  example_data/
+    audio/
+      20170701_213954-MYOMYS-LR_0_0.5.wav
+      20180530_213516-EPTSER-LR_0_0.5.wav
+      20180627_215323-RHIFER-LR_0_0.5.wav
+```
+
+## 2. Run BatDetect2 on the folder
+
+For a first run, use the built-in default UK model:
 
 ```bash
 batdetect2 process directory \
-  path/to/model.pth.tar \
-  path/to/audio_dir \
+  path/to/audio \
   path/to/outputs
 ```
 
+If you are using the repository example files, run:
+
+```bash
+batdetect2 process directory \
+  example_data/audio \
+  example_outputs/first_run
+```
+
 What this does:
 
-- loads the checkpoint,
-- finds audio files in `audio_dir`,
-- splits recordings into smaller pieces internally when needed,
-- saves result files to `outputs`.
+- looks for supported audio files in `path/to/audio`,
+- runs the model on each recording,
+- saves the results in `path/to/outputs`.
 
-## 3. Verify that outputs were written
+You do not need to choose a model for this first run.
+If you do nothing, BatDetect2 uses the built-in default UK model.
 
-After the command completes, inspect the output directory.
+If you want to use a different model later, see {doc}`../how_to/choose-a-model`.
 
-For a first run, the important check is simple:
+## 3. Check the output files
 
-- did BatDetect2 create result files,
-- are they in the output directory you expected,
-- did it process the recordings you meant to analyze.
+After the command finishes, look in your output folder.
 
-Different workflows can save results in different file formats.
+By default, the CLI writes predictions in the `batdetect2` output format.
+This is a JSON-based format that stores the list of detections for each recording.
 
-You do not need to learn those details for the first run.
+With the default settings, you will usually see one `.json` file and one `_detections.csv` file per recording.
 
-If you later need to choose a specific output format, go to
-{doc}`../how_to/save-predictions-in-different-output-formats`.
+For the repository example run, that means files like:
 
-## 4. Inspect predictions
+```text
+example_outputs/first_run/
+  20170701_213954-MYOMYS-LR_0_0.5.wav.json
+  20170701_213954-MYOMYS-LR_0_0.5.wav_detections.csv
+  20180530_213516-EPTSER-LR_0_0.5.wav.json
+  20180530_213516-EPTSER-LR_0_0.5.wav_detections.csv
+  20180627_215323-RHIFER-LR_0_0.5.wav.json
+  20180627_215323-RHIFER-LR_0_0.5.wav_detections.csv
+```
 
-Start with a small subset of representative files.
+One of the JSON files will look roughly like this:
 
-Check:
+```json
+{
+  "annotated": false,
+  "annotation": [
+    {
+      "class": "Rhinolophus ferrumequinum",
+      "class_prob": 0.889,
+      "det_prob": 0.889,
+      "end_time": 0.0668,
+      "event": "Echolocation",
+      "high_freq": 84857,
+      "individual": "-1",
+      "low_freq": 67578,
+      "start_time": 0.0
+    }
+  ]
+}
+```
 
-- whether detections were written for the expected recordings,
-- whether output counts are plausible,
-- whether the model is obviously too sensitive or too conservative,
-- whether the predicted classes look broadly reasonable for your data.
+In brief:
 
-Do not treat the first run as validated ecological output.
+- `annotated: false` means this is a prediction file, not a reviewed annotation file.
+- `annotation` holds the list of detections.
+- Each detection includes a predicted class (`class`), a detection score (`det_prob`), a class score (`class_prob`), time bounds (`start_time`, `end_time`), and frequency bounds (`low_freq`, `high_freq`).
 
-The first run is a workflow check.
+For more detail, see {doc}`../explanation/interpreting-formatted-outputs`.
+If you want to save results in another format, see {doc}`../how_to/save-predictions-in-different-output-formats`.
 
-Validation comes next.
+## 4. Run the same folder with a higher threshold
 
-## 5. Tune only after you have a baseline
+Next, run the same folder again with a higher detection threshold and save the results in a separate output folder.
 
-If the first run is too noisy or misses obvious calls, tune thresholds on a
-reviewed subset rather than changing settings blindly across the full dataset.
+```bash
+batdetect2 process directory \
+    path/to/audio \
+    path/to/outputs_threshold_05 \
+    --detection-threshold 0.5
+```
 
-Use {doc}`../how_to/tune-detection-threshold` for that process.
+If you are using the repository example files, run:
 
-## What to do next
+```bash
+batdetect2 process directory \
+    example_data/audio \
+    example_outputs/threshold_05 \
+    --detection-threshold 0.5
+```
 
-- If you need a different input mode, use
-  {doc}`../how_to/choose-an-inference-input-mode`.
-- If you want to tune sensitivity, use
-  {doc}`../how_to/tune-detection-threshold`.
-- If you already write code and want more control from Python, use
-  {doc}`integrate-with-a-python-pipeline`.
-- If you need full command details, use {doc}`../reference/cli/predict`.
+Keeping this in a separate folder makes it easy to compare runs later.
+
+## 5. Run the model on a list of recordings
+
+If you only want to process selected recordings, use `batdetect2 process file_list` instead of `batdetect2 process directory`.
+The list file should contain one recording path per line.
+
+Example `audio_files.txt`:
+
+```text
+path/to/audio/file_001.wav
+path/to/audio/file_002.wav
+path/to/audio/file_010.wav
+```
+
+The repository includes an example list at `example_data/audio_files.txt`:
+
+```text
+example_data/audio/20170701_213954-MYOMYS-LR_0_0.5.wav
+example_data/audio/20180530_213516-EPTSER-LR_0_0.5.wav
+```
+
+Then run:
+
+```bash
+batdetect2 process file_list \
+    path/to/audio_files.txt \
+    path/to/selected_outputs
+```
+
+If you are using the repository example files, run:
+
+```bash
+batdetect2 process file_list \
+    example_data/audio_files.txt \
+    example_outputs/selected_outputs
+```
+
+This is useful when your recordings are spread across folders, or when you only want to run a chosen subset.
+
+## Common next steps
+
+- If your recordings are not all in one folder, or you want to compare input modes, see {doc}`../how_to/choose-an-inference-input-mode`.
+- If you want to save results in another format, see {doc}`../how_to/save-predictions-in-different-output-formats`.
+- If you want to choose a different model, see {doc}`../how_to/choose-a-model`.
+- If you already write code and want more control from Python, see {doc}`integrate-with-a-python-pipeline`.
+- If you want the full command reference, including `--model`, see {doc}`../reference/cli/predict`.
diff --git a/example_data/audio_files.txt b/example_data/audio_files.txt
new file mode 100644
index 0000000..c53ad8a
--- /dev/null
+++ b/example_data/audio_files.txt
@@ -0,0 +1,2 @@
+example_data/audio/20170701_213954-MYOMYS-LR_0_0.5.wav
+example_data/audio/20180530_213516-EPTSER-LR_0_0.5.wav