mirror of
https://github.com/macaodha/batdetect2.git
synced 2026-05-23 06:41:53 +02:00
42 lines
1.3 KiB
Markdown
42 lines
1.3 KiB
Markdown
# How to interpret evaluation outputs
|
|
|
|
Use this guide after `batdetect2 evaluate` has written metrics and plots to disk.
|
|
|
|
## Start by identifying the task
|
|
|
|
Do not interpret a metric until you know which evaluation task produced it.
|
|
|
|
For example, a detection score and a clip-classification score answer different questions.
|
|
|
|
## Read the output directory as a bundle
|
|
|
|
Treat the evaluation output directory as one package:
|
|
|
|
- metrics,
|
|
- plots,
|
|
- saved predictions,
|
|
- config context.
|
|
|
|
Do not lift a single number out of context and treat it as the whole story.
|
|
|
|
## Look for failure patterns, not just overall averages
|
|
|
|
Check:
|
|
|
|
- whether errors concentrate in certain taxa,
|
|
- whether specific sites or recorder setups behave differently,
|
|
- whether threshold choices are driving the result,
|
|
- whether predictions are near clip boundaries or matching thresholds.
|
|
|
|
## Keep validation and deployment questions separate
|
|
|
|
A model can look good on one task and still be a poor fit for your deployment question.
|
|
|
|
Interpret the outputs in relation to the real use case, not only the easiest metric to report.
|
|
|
|
## Related pages
|
|
|
|
- Evaluation tutorial: {doc}`../tutorials/evaluate-on-a-test-set`
|
|
- Evaluation concepts: {doc}`../explanation/evaluation-concepts-and-matching`
|
|
- Model output and validation: {doc}`../explanation/model-output-and-validation`
|