diff --git a/docs/source/tutorials/evaluate-on-a-test-set.md b/docs/source/tutorials/evaluate-on-a-test-set.md
index 2e94c2e..a4bce6a 100644
--- a/docs/source/tutorials/evaluate-on-a-test-set.md
+++ b/docs/source/tutorials/evaluate-on-a-test-set.md
@@ -3,33 +3,89 @@
 This tutorial shows how to evaluate a trained checkpoint on a held-out dataset
 and inspect the output metrics.
 
+This tutorial is for advanced users who want to evaluate a trained model on a separate test dataset.
+
 ## Before you start
 
 - A trained model checkpoint.
 - A test dataset config file.
 - (Optional) Targets, audio, inference, and evaluation config overrides.
 
-## Tutorial steps
+```{note}
+This page is for model evaluation.
+If you only want to run BatDetect2 on recordings,
+start with {doc}`run-inference-on-folder` instead.
+```
 
-1. Select a checkpoint and a test dataset.
-2. Run `batdetect2 evaluate`.
-3. Inspect output metrics and prediction artifacts.
-4. Record evaluation settings for reproducibility.
+## Outcome
 
-## Example command
+By the end of this tutorial you will have:
+
+- run `batdetect2 evaluate`,
+- written evaluation metrics and result files,
+- understood what to inspect first,
+- identified the next pages for evaluation concepts and configuration.
+
+## 1. Start with a held-out dataset
+
+Use a dataset that was not used for training or tuning.
+
+A held-out dataset is simply a separate dataset kept aside for evaluation.
+
+If you tune thresholds or configs on the same dataset that you use for the final reported evaluation, the reported metrics will be optimistically biased.
+
+## 2. Run evaluation
 
 ```bash
 batdetect2 evaluate \
   path/to/model.ckpt \
   path/to/test_dataset.yaml \
+  --base-dir path/to/project_root \
   --output-dir path/to/eval_outputs
 ```
 
+This command loads the checkpoint,
+runs prediction on the test dataset,
+applies the chosen evaluation tasks,
+and writes metrics and result files to the output directory.
+
+Use `--base-dir` whenever the dataset config contains relative paths.
+
+That is the common case for project-local dataset files.
+
+## 3. Inspect the output directory
+
+Look for:
+
+- summary metrics,
+- generated plots,
+- saved prediction files if they were enabled,
+- enough metadata to reproduce the run later.
+
+The exact set depends on the configured evaluation tasks and plots.
+
+## 4. Interpret the results in context
+
+Do not reduce evaluation to a single number.
+
+Check:
+
+- which task the metric belongs to,
+- which thresholding or matching assumptions were used,
+- whether class-level behavior matches your use case,
+- whether the failures are concentrated in specific taxa, sites, or recording conditions.
+
+## 5. Record the evaluation setup
+
+Keep the command, config files, checkpoint path, and dataset version together.
+
+That matters for reproducibility and for later model comparisons.
+
 ## What to do next
 
 - Compare thresholds on representative files:
   {doc}`../how_to/tune-detection-threshold`
+- Configure evaluation tasks: {doc}`../how_to/choose-and-configure-evaluation-tasks`
+- Interpret evaluation artifacts: {doc}`../how_to/interpret-evaluation-outputs`
+- Learn the evaluation concepts: {doc}`../explanation/evaluation-concepts-and-matching`
 - Check full evaluate options: {doc}`../reference/cli/evaluate`
-
-This page is a starter scaffold and will be expanded with a full worked
-example.
diff --git a/docs/source/tutorials/integrate-with-a-python-pipeline.md b/docs/source/tutorials/integrate-with-a-python-pipeline.md
index 0c4fffd..1c62390 100644
--- a/docs/source/tutorials/integrate-with-a-python-pipeline.md
+++ b/docs/source/tutorials/integrate-with-a-python-pipeline.md
@@ -3,21 +3,52 @@
 This tutorial shows a minimal Python workflow for loading audio, running
 batdetect2, and collecting detections for downstream analysis.
 
+This tutorial is for people who want to call BatDetect2 from their own Python code.
+
+If you mainly want to run the model on recordings,
+start with {doc}`run-inference-on-folder` instead.
+
 ## Before you start
 
 - BatDetect2 installed in your Python environment.
 - A model checkpoint.
 - At least one input audio file.
 
-## Tutorial steps
+```{note}
+This page is more technical than the standard first-run tutorial.
+You do not need this page for a normal first use of BatDetect2.
+```
 
-1. Load BatDetect2 in Python.
-2. Create an API instance from a checkpoint.
-3. Run `process_file` on one audio file.
-4. Read detection fields and class scores.
-5. Save or pass detections to your downstream pipeline.
+If you are working from this repository checkout, you can start with:
 
-## Example code
+```text
+src/batdetect2/models/checkpoints/Net2DFast_UK_same.pth.tar
+```
+
+## Outcome
+
+By the end of this tutorial you will have:
+
+- created a `BatDetect2API` object,
+- run inference on one file,
+- inspected the top class, class-score list, and detection score,
+- identified where to go next for feature extraction, saving predictions, and batch workflows.
+
+## 1. Create the API instance
+
+Load the checkpoint once and reuse the API object for multiple files.
+
+```python
+from pathlib import Path
+
+from batdetect2.api_v2 import BatDetect2API
+
+api = BatDetect2API.from_checkpoint(Path("path/to/model.ckpt"))
+```
+
+## 2. Run inference on one file
+
+`process_file` is the simplest Python entry point when you want one prediction object per recording.
 
 ```python
 from pathlib import Path
@@ -33,10 +64,55 @@ for detection in prediction.detections:
     print(top_class, score)
 ```
 
+`prediction` is a `ClipDetections` object.
+
+It contains:
+
+- the clip metadata,
+- a list of detections,
+- a box for each detected event,
+- one detection score per event,
+- a full list of class scores per event,
+- a feature vector per event.
+
+## 3. Inspect class scores, not just the top class
+
+If you are exploring results,
+it is often useful to inspect the full ranked class-score list.
+
+```python
+for detection in prediction.detections:
+    print("top class:", api.get_top_class_name(detection))
+    print("detection score:", detection.detection_score)
+    print("class scores:")
+    for class_name, score in api.get_class_scores(detection):
+        print(f"  {class_name}: {score:.3f}")
+```
+
+This helps separate two different questions:
+
+- "Did the model think there was a call here?"
+- "If there was a call, which class did it score highest?"
+
+## 4. Keep the first workflow small
+
+Before scaling up, run the API on a few representative files and inspect the results manually.
+
+This catches path issues and obviously implausible outputs early.
+
+## 5. Move to the right next workflow
+
+Once the single-file path is working, choose the next page based on what you need:
+
+- save predictions to disk,
+- inspect class scores more carefully,
+- inspect detection features,
+- process many files in one run.
+
 ## What to do next
 
-- See API/config references: {doc}`../reference/index`
-- Learn practical CLI alternatives: {doc}`run-inference-on-folder`
-
-This page is a starter scaffold and will be expanded with a full worked
-example.
+- API reference: {doc}`../reference/api`
+- Inspect ranked class scores: {doc}`../how_to/inspect-class-scores-in-python`
+- Inspect detection features: {doc}`../how_to/inspect-detection-features-in-python`
+- Save predictions to disk: {doc}`../how_to/save-predictions-in-different-output-formats`
+- Learn the standard CLI workflow: {doc}`run-inference-on-folder`
diff --git a/docs/source/tutorials/run-inference-on-folder.md b/docs/source/tutorials/run-inference-on-folder.md
index ca7f5bd..cff78b7 100644
--- a/docs/source/tutorials/run-inference-on-folder.md
+++ b/docs/source/tutorials/run-inference-on-folder.md
@@ -1,33 +1,115 @@
-# Tutorial: Run inference on a folder of audio files
+# Tutorial: Run BatDetect2 on a folder of audio files
 
 This tutorial walks through a first end-to-end inference run with the CLI.
 
+It is the default starting point for new users.
+
+Use it when you want to run an existing model on a folder of recordings and quickly check what BatDetect2 found.
+
 ## Before you start
 
 - BatDetect2 installed in your environment.
 - A folder containing `.wav` files.
 - A model checkpoint path.
 
-## Tutorial steps
+A checkpoint is the saved model file that BatDetect2 uses to make predictions.
 
-1. Choose your input and output directories.
-2. Run prediction with the CLI.
-3. Verify output files were written.
-4. Inspect predictions and confidence scores.
+If you are working from this repository checkout, you can use:
 
-## Example command
+```text
+src/batdetect2/models/checkpoints/Net2DFast_UK_same.pth.tar
+```
+
+## Outcome
+
+By the end of this tutorial you will have:
+
+- run `batdetect2 predict directory`,
+- saved predictions to disk,
+- checked that BatDetect2 wrote output files,
+- identified the next pages to use for tuning or customization.
+
+## 1. Choose your input and output paths
+
+Pick three paths:
+
+- the checkpoint to use,
+- the directory containing your audio files,
+- an output directory where BatDetect2 will save its results.
+
+Example layout:
+
+```text
+project/
+  model.pth.tar
+  audio/
+    file_001.wav
+    file_002.wav
+  outputs/
+```
+
+## 2. Run prediction on the directory
+
+Use this command when you want BatDetect2 to scan a folder of recordings automatically.
 
 ```bash
 batdetect2 predict directory \
-  path/to/model.ckpt \
+  path/to/model.pth.tar \
   path/to/audio_dir \
   path/to/outputs
 ```
 
+What this does:
+
+- loads the checkpoint,
+- finds audio files in `audio_dir`,
+- splits recordings into smaller pieces internally when needed,
+- saves result files to `outputs`.
+
+## 3. Verify that outputs were written
+
+After the command completes, inspect the output directory.
+
+For a first run,
+the important checks are simple:
+
+- whether BatDetect2 created result files,
+- whether they are in the output directory you expected,
+- whether it processed the recordings you meant to analyze.
+
+Different workflows can save results in different file formats.
+
+You do not need to learn those details for the first run.
+
+If you later need to choose a specific output format,
+go to {doc}`../how_to/save-predictions-in-different-output-formats`.
+
+## 4. Inspect predictions
+
+Start with a small subset of representative files.
+
+Check:
+
+- whether detections were written for the expected recordings,
+- whether output counts are plausible,
+- whether the model is obviously too sensitive or too conservative,
+- whether the predicted classes look broadly reasonable for your data.
+
+Do not treat the first run as validated ecological output.
+
+The first run is a workflow check.
+
+Validation comes next.
+
+## 5. Tune only after you have a baseline
+
+If the first run is too noisy or misses obvious calls, tune thresholds on a reviewed subset rather than changing settings blindly across the full dataset.
+
+Use {doc}`../how_to/tune-detection-threshold` for that process.
+
 ## What to do next
 
-- Use {doc}`../how_to/tune-detection-threshold` to tune sensitivity.
-- Use {doc}`../reference/cli/index` for full command options.
-
-This page is a starter scaffold and will be expanded with a full worked
-example.
+- If you need a different input mode, use {doc}`../how_to/choose-an-inference-input-mode`.
+- If you want to tune sensitivity, use {doc}`../how_to/tune-detection-threshold`.
+- If you already write code and want more control from Python, use {doc}`integrate-with-a-python-pipeline`.
+- If you need full command details, use {doc}`../reference/cli/predict`.
diff --git a/docs/source/tutorials/train-a-custom-model.md b/docs/source/tutorials/train-a-custom-model.md
index 269e6a3..3a1ff82 100644
--- a/docs/source/tutorials/train-a-custom-model.md
+++ b/docs/source/tutorials/train-a-custom-model.md
@@ -3,21 +3,44 @@
 This tutorial walks through a first custom training run using your own
 annotations.
 
+This tutorial is for advanced users who already have dataset files and want to train a model on their own annotated data.
+
 ## Before you start
 
 - BatDetect2 installed.
 - A training dataset config file.
 - (Optional) A validation dataset config file.
+- A targets config file if you are not using the default target setup.
+- A model config file if you are not training from the built-in defaults.
 
-## Tutorial steps
+```{note}
+Do not start with this page if you only want to run the existing model on recordings.
+Use {doc}`run-inference-on-folder` for that.
+```
 
-1. Prepare training and validation dataset config files.
-2. Choose target definitions and model/training config files.
-3. Run `batdetect2 train`.
-4. Check that checkpoints and logs are written.
-5. Run a quick sanity inference on a small audio subset.
+## Outcome
 
-## Example command
+By the end of this tutorial you will have:
+
+- started a training run,
+- written checkpoints and logs,
+- understood the minimum settings involved,
+- identified the next pages for fine-tuning and evaluation.
+
+## 1. Gather the minimum required inputs
+
+At minimum, a custom training run needs:
+
+- a training dataset config,
+- an optional validation dataset config,
+- either a model config for a fresh run or a checkpoint for continued training,
+- optional settings files for targets, audio, training, evaluation, inference, outputs, and logging.
+
+The most important point is that the dataset file, target definitions, and preprocessing choices need to agree with each other.
+
+## 2. Run a first training command
+
+Use a command like this for a fresh run:
 
 ```bash
 batdetect2 train \
@@ -28,10 +51,35 @@ batdetect2 train \
   --training-config path/to/training.yaml
 ```
 
+Use `--model` instead of `--model-config` when you want to continue from an existing checkpoint.
+
+## 3. Check that outputs are being written
+
+After the command starts, verify that:
+
+- the run initializes without configuration errors,
+- checkpoints are written to the checkpoint directory,
+- logs are written to the log directory or configured logger backend,
+- the training and validation datasets load as expected.
+
+## 4. Run a sanity inference pass after training
+
+Do not wait until full evaluation to confirm that the trained checkpoint behaves sensibly.
+
+Take a small reviewed subset of recordings and run a quick prediction pass with the new checkpoint.
+
+That catches setup mismatches early, especially around targets and preprocessing.
+
+## 5. Evaluate on held-out data
+
+Once the checkpoint looks sensible on a small sanity subset, run the formal evaluation workflow on a held-out test set.
+
+That is where you should compare models, thresholds, and task-level performance metrics.
+
 ## What to do next
 
 - Evaluate the trained checkpoint: {doc}`evaluate-on-a-test-set`
+- Fine-tune from a checkpoint: {doc}`../how_to/fine-tune-from-a-checkpoint`
+- Configure targets: {doc}`../how_to/configure-target-definitions`
+- Configure preprocessing: {doc}`../how_to/configure-audio-preprocessing`
 - Check full train options: {doc}`../reference/cli/train`
-
-This page is a starter scaffold and will be expanded with a full worked
-example.