batdetect2/docs/source/how_to/choose-and-configure-evaluation-tasks.md
2026-05-06 17:22:18 +01:00

75 lines
1.8 KiB
Markdown

# How to choose and configure evaluation tasks
Use this guide when the default evaluation tasks do not match the question you
want to answer.
## Know the default first
By default, BatDetect2 evaluation starts with:
- sound event detection,
- sound event classification.
Those are good defaults for many projects, but not for all of them.
## Choose the task that matches the question
Common built-in task families include:
- `sound_event_detection`
- `sound_event_classification`
- `top_class_detection`
- `clip_detection`
- `clip_classification`
Choose based on the question you care about.
- Use sound-event tasks when you care about individual call events.
- Use clip tasks when you care about clip-level presence or clip-level class
evidence.
- Use top-class detection when you want matching based on the highest-scoring
class per detection.
## Configure tasks in `EvaluationConfig`
Example:
```yaml
tasks:
- name: sound_event_detection
prefix: detection
affinity_threshold: 0.0
strict_match: true
- name: clip_classification
prefix: clip_classification
```
Pass the config with:
```bash
batdetect2 evaluate \
path/to/test_dataset.yaml \
--model path/to/model.ckpt \
--base-dir path/to/project_root \
--evaluation-config path/to/evaluation.yaml
```
Include `--base-dir` when the dataset config resolves recordings through
relative paths.
## Change one thing at a time
When comparing models or settings, avoid changing task definitions, thresholds,
matching behavior, and datasets all at once.
Otherwise it becomes hard to explain why the metric changed.
## Related pages
- Evaluation tutorial:
{doc}`../tutorials/evaluate-on-a-test-set`
- Evaluation config reference:
{doc}`../reference/evaluation-config`
- Evaluation concepts:
{doc}`../explanation/evaluation-concepts-and-matching`