From 105384d9a24e62a4507a86b8861f2ae5133d0dc7 Mon Sep 17 00:00:00 2001
From: mbsantiago <santiago.mbal@gmail.com>
Date: Wed, 6 May 2026 17:45:27 +0100
Subject: [PATCH] remove documentation plan

---
 docs/plan.md | 441 ---------------------------------------------------
 1 file changed, 441 deletions(-)
 delete mode 100644 docs/plan.md

diff --git a/docs/plan.md b/docs/plan.md
deleted file mode 100644
index 4fe5f3d..0000000
--- a/docs/plan.md
+++ /dev/null
@@ -1,441 +0,0 @@
-# Documentation Plan
-
-## Goal
-
-Build documentation around the main user stories:
-
-1. Run inference with the CLI on one folder of audio.
-2. Use the Python API for inference with fine-grained control over outputs,
-   including per-file workflows, class scores, features, and batch processing.
-3. Train or fine-tune a custom model.
-4. Evaluate a model and understand what the metrics mean.
-5. Understand the concepts needed to use BatDetect2 correctly.
-
-The docs should provide:
-
-- a simple happy path in tutorials,
-- richer task-oriented guidance in how-to guides,
-- complete lookup material in reference,
-- deep conceptual coverage in understanding.
-
-Note: the current docs tree uses `explanation/`. For Diataxis consistency, this
-plan uses `understanding/` as the target name for that conceptual section.
-
-## Current State Review
-
-### Looks reasonably complete
-
-- `docs/source/index.md`: good top-level orientation and navigation.
-- `docs/source/getting_started.md`: solid install and entry-point guidance.
-- `docs/source/explanation/*.md`: the conceptual pages are currently the
-  strongest part of the docs, especially the pages on the pipeline overview,
-  thresholds, preprocessing consistency, and targets.
-- `docs/source/how_to/configure-*.md` and related target/data pages: practical
-  support docs for preprocessing, targets, ROI mapping, and dataset formats are
-  in decent shape.
-- `docs/source/reference/cli/*.rst`: CLI reference wiring exists and should
-  render useful option-level documentation from the Click commands.
-
-### Partially complete
-
-- `docs/source/how_to/run-batch-predictions.md`: useful, but thin.
-- `docs/source/how_to/tune-detection-threshold.md`: useful, but too brief for
-  a key workflow.
-- `docs/source/reference/preprocessing-config.md`
-- `docs/source/reference/postprocess-config.md`
-- `docs/source/reference/targets-config-workflow.md`
-
-These are good summaries, but they do not yet read as complete references for
-all the customization surfaces exposed by the code.
-
-### Clearly incomplete or scaffolded
-
-- `docs/source/tutorials/run-inference-on-folder.md`
-- `docs/source/tutorials/integrate-with-a-python-pipeline.md`
-- `docs/source/tutorials/train-a-custom-model.md`
-- `docs/source/tutorials/evaluate-on-a-test-set.md`
-
-All four main tutorials are still starter scaffolds. This is the biggest gap in
-the current user journey.
-
-### Major mismatch to resolve
-
-- `README.md` still tells an older story built around `batdetect2 detect` and
-  `batdetect2.api`.
-- The docs site tells the newer story built around `batdetect2 predict` and
-  `batdetect2.api_v2`.
-
-This creates avoidable confusion for users and should be treated as a
-high-priority documentation alignment issue.
-
-### Legacy documentation is not yet clearly signposted
-
-The repo still contains meaningful legacy documentation material, but it is not
-yet presented as a clearly marked legacy path inside the docs.
-
-Users need two things:
-
-- a clear message that these docs exist for the previous BatDetect2 workflow,
-- a clear recommendation that new users should prefer the newer CLI/API
-  workflows and migrate where possible.
-
-## Legacy Documentation Plan
-
-### Goals
-
-1. Preserve access to the old workflow documentation.
-2. Prevent new users from accidentally following legacy guidance.
-3. Give current users a clear migration path from legacy to current workflows.
-
-### Proposed location
-
-Add a dedicated legacy area inside the docs, for example:
-
-- `docs/source/legacy/index.md`
-- `docs/source/legacy/cli-detect.md`
-- `docs/source/legacy/python-api.md`
-- `docs/source/legacy/feature-extraction.md`
-- `docs/source/legacy/migration-guide.md`
-
-This keeps the material available without mixing it into the main happy-path
-docs.
-
-### User-facing messaging
-
-Add clear notices in all relevant navigation entry points.
-
-Suggested message pattern:
-
-"If you want to use the previous version of BatDetect2, see the legacy
-documentation. For new workflows, we recommend using the current `predict`
-CLI and `BatDetect2API` interfaces."
-
-Places that should link to the legacy docs:
-
-- `docs/source/index.md`
-- `docs/source/getting_started.md`
-- `README.md`
-- tutorial landing pages where users may be coming from older workflows
-- any page that mentions the old `detect` command or old Python API
-
-### Migration guide plan
-
-Add a dedicated migration guide that explains:
-
-1. who should migrate now and who may need to stay on the legacy workflow,
-2. the mapping from old CLI commands to new CLI commands,
-3. the mapping from old Python API calls to new `api_v2` / `BatDetect2API`
-   patterns,
-4. what changed in outputs, terminology, and configuration,
-5. how legacy feature extraction concepts map to the new API surfaces,
-6. what behavior differences users should validate before switching,
-7. a short migration checklist.
-
-High-priority migration mappings to document:
-
-- `batdetect2 detect` -> `batdetect2 predict directory`
-- old `batdetect2.api` file processing -> `BatDetect2API.from_checkpoint(...)`
-  plus `process_file`, `process_files`, `process_audio`, or
-  `process_spectrogram`
-- legacy `cnn_feats`, `spec_features`, and `spec_slices` -> current output and
-  feature access patterns, with explicit notes where there is no direct
-  one-to-one replacement
-
-### Legacy content handling plan
-
-For each legacy page or legacy concept:
-
-1. Decide whether it should be preserved as-is, rewritten as a legacy page, or
-   replaced by the migration guide.
-2. Add a prominent warning banner saying it describes the previous workflow.
-3. Link forward to the current equivalent page when one exists.
-
-### Definition of done for legacy handling
-
-Legacy documentation work is done when:
-
-1. a reader can clearly distinguish legacy from current docs,
-2. old users can still find the previous workflow documentation,
-3. new users are consistently directed to the new docs,
-4. there is a practical migration guide covering the main CLI and Python API
-   transitions.
-
-## Main Gaps By User Story
-
-### 1. CLI inference
-
-Some coverage exists, but the happy path is not yet fully documented.
-
-Missing:
-
-- a full worked tutorial from input audio to saved outputs,
-- clear guidance on what outputs are written and how to inspect them,
-- stronger documentation for `predict dataset`,
-- a clearer story for default model vs custom checkpoint,
-- practical guidance for selecting output formats and thresholds.
-
-### 2. Python API inference
-
-This is currently the weakest major story.
-
-The code exposes much more than the docs explain, including:
-
-- `BatDetect2API.from_checkpoint` and `from_config`,
-- `process_file`, `process_files`, `process_directory`, `process_clips`,
-- `process_audio`, `process_spectrogram`,
-- `get_top_class_name`, `get_class_scores`, `get_detection_features`,
-- `save_predictions` and `load_predictions`.
-
-Missing docs:
-
-- an API-first tutorial with a simple path,
-- a how-to for file-by-file inspection and custom post-processing,
-- a how-to for batch API inference,
-- a reference page for `BatDetect2API`,
-- an explanation of what the feature vectors are and how users should think
-  about them.
-
-Important terminology note:
-
-- the old API and docs refer to `cnn_feats`, `spec_features`, and `spec_slices`,
-- the new API exposes per-detection `features`,
-- users interested in embeddings or downstream exploration need a page that
-  explicitly connects these ideas.
-
-### 3. Batch inference
-
-Batch prediction exists in both CLI and API workflows, but the docs do not yet
-explain the design space well.
-
-Missing:
-
-- when to use `directory` vs `file_list` vs `dataset`,
-- how clipping works during inference,
-- what `InferenceConfig` controls,
-- how batch size, workers, and output format choices affect runs,
-- how to organize large runs reproducibly.
-
-### 4. Training a custom model
-
-Supporting pages exist, but the end-to-end story is not yet there.
-
-Missing:
-
-- one complete tutorial from dataset config to checkpoints and sanity check,
-- a "minimum viable training setup" page,
-- clearer explanation of how model, targets, audio, training, inference,
-  outputs, and logging configs fit together,
-- guidance on fine-tuning versus training from scratch.
-
-### 5. Evaluation
-
-Evaluation is significantly under-documented relative to the code.
-
-Missing:
-
-- what evaluation tasks exist,
-- what metrics and plots are produced,
-- how predictions are matched to annotations,
-- how to interpret failures and trade-offs,
-- how to configure evaluation for different research questions.
-
-### 6. Understanding / concepts
-
-This is the best-developed section today, but it still needs expansion.
-
-Concepts that should be covered more fully:
-
-- what the model predicts,
-- what the raw and formatted outputs represent,
-- how to interpret detection scores and class scores,
-- what targets are and how they shape training and decoding,
-- how preprocessing choices affect model behavior,
-- what the extracted features represent and when they are useful,
-- what evaluation metrics actually measure,
-- why local validation is required before ecological inference.
-
-## Proposed Documentation Architecture
-
-## Target Table of Contents
-
-### Home
-
-- Home
-- Getting started
-- FAQ
-- Legacy docs
-
-### Tutorials
-
-These should be the default path for most users.
-
-- Tutorial: Run inference on a folder of audio
-- Tutorial: Explore predictions in Python for one file
-- Tutorial: Train a custom model
-- Tutorial: Evaluate a trained model
-
-### How-to Guides
-
-These cover practical tasks once the user is past the happy path.
-
-- How to choose an inference input mode
-- How to run batch predictions from a directory
-- How to run batch predictions from a file list
-- How to run predictions from a dataset config
-- How to tune detection thresholds
-- How to inspect class scores in Python
-- How to inspect detection features in Python
-- How to save predictions in different output formats
-- How to configure inference clipping
-- How to configure audio preprocessing
-- How to configure spectrogram preprocessing
-- How to configure target definitions
-- How to define target classes
-- How to configure ROI mapping
-- How to configure an AOEF dataset
-- How to import legacy BatDetect2 annotations
-- How to fine-tune from a checkpoint
-- How to choose and configure evaluation tasks
-- How to interpret evaluation outputs
-
-### Reference
-
-This should be the complete lookup layer.
-
-- CLI reference
-- CLI reference: base command and global options
-- CLI reference: predict
-- CLI reference: data
-- CLI reference: train
-- CLI reference: evaluate
-- CLI reference: legacy detect
-- API reference: `BatDetect2API`
-- Config reference: top-level app config
-- Config reference: inference config
-- Config reference: evaluation config
-- Config reference: outputs config
-- Config reference: output formats
-- Config reference: output transforms
-- Config reference: preprocessing config
-- Config reference: postprocess config
-- Config reference: targets config workflow
-- Reference: data sources
-- Reference: targets module
-
-### Understanding
-
-This is the conceptual layer and should carry the deeper Diataxis
-"understanding" material.
-
-- What BatDetect2 predicts
-- How the pipeline fits together
-- How to interpret detection scores and class scores
-- How to interpret formatted outputs
-- What extracted features / embeddings are and are not
-- Postprocessing and thresholds
-- Preprocessing consistency and domain shift
-- Target encoding and decoding
-- Evaluation concepts and matching behavior
-- Model output, validation, and ecological interpretation
-
-### Legacy
-
-This is a clearly signposted area for the previous workflow only.
-
-- Legacy overview
-- Legacy CLI workflow with `batdetect2 detect`
-- Legacy Python API with `batdetect2.api`
-- Legacy feature extraction outputs
-- Migration guide: legacy to current workflows
-
-### Tutorials
-
-Keep tutorials opinionated and minimal. Each one should show the default happy
-path with the fewest possible choices.
-
-Planned tutorial set:
-
-1. Run inference on a folder of audio.
-2. Explore predictions in Python for one file.
-3. Train a custom model.
-4. Evaluate a trained model.
-
-### How-to Guides
-
-Use how-to guides for branching tasks and customization.
-
-Planned additions or expansions:
-
-- Choose an inference input mode: directory, file list, or dataset.
-- Run large batch inference reproducibly.
-- Save predictions in different output formats.
-- Inspect class scores and features in Python.
-- Explore detection features / embeddings downstream.
-- Tune clipping and inference settings.
-- Fine-tune from a checkpoint.
-- Choose and configure evaluation tasks.
-- Interpret evaluation artifacts.
-
-### Reference
-
-Reference should become the complete map of all configurable surfaces.
-
-High-priority additions:
-
-- `BatDetect2API` reference.
-- `InferenceConfig` reference.
-- `EvaluationConfig` reference.
-- `OutputsConfig` and output format reference.
-- Output transform reference.
-- A clearer reference for how the configs compose into the full app config.
-
-### Understanding
-
-This is where the deeper conceptual material should live.
-
-High-priority pages:
-
-1. What BatDetect2 predicts.
-2. How to interpret outputs, scores, and uncertainty.
-3. What extracted features / embeddings are and are not.
-4. Targets, labels, and decoded outputs.
-5. Preprocessing consistency and domain shift.
-6. Postprocessing, thresholds, and output density.
-7. How evaluation works and what the metrics mean.
-8. Why local validation is required before ecological interpretation.
-
-## Priority Order
-
-### Phase 1: Fix the primary user journey
-
-1. Expand the four scaffold tutorials into real end-to-end guides.
-2. Add a proper Python/API inference story.
-3. Document outputs and how to inspect them.
-4. Align `README.md` with the newer CLI/API documentation story.
-5. Create the legacy docs section and add clear signposting to it.
-
-### Phase 2: Cover the customization surface
-
-1. Add how-to guides for batch inference, output formats, and API inspection.
-2. Add reference pages for inference, outputs, evaluation, and API surfaces.
-3. Add fine-tuning and advanced training guidance.
-4. Write the migration guide from legacy to current workflows.
-
-### Phase 3: Deepen understanding
-
-1. Expand the conceptual section into a true understanding section.
-2. Add pages for output interpretation, features/embeddings, and evaluation
-   concepts.
-3. Reader-test the docs against realistic user questions.
-
-## Immediate Next Steps
-
-1. Decide whether to rename `explanation/` to `understanding/` or to keep the
-   current directory name and treat it as the Diataxis understanding
-   section.
-2. Draft the target table of contents for Tutorials, How-to, Reference, and
-   Understanding.
-3. Draft the legacy docs section and migration-guide table of contents.
-4. Rewrite the four scaffold tutorials first.
-5. Add the missing API, outputs, evaluation, and migration documentation
-   immediately after.