lionel/batdetect2

mirror of https://github.com/macaodha/batdetect2.git synced 2026-05-23 06:41:53 +02:00

mbsantiago 300716895e docs: add task guides and API/config references

2026-04-30 11:48:19 +01:00


How to interpret evaluation outputs

Use this guide after batdetect2 evaluate has written metrics and plots to disk.

Start by identifying the task

Do not interpret a metric until you know which evaluation task produced it.

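For example, a detection score and a clip-classification score answer different questions: the first is about whether individual calls were found, the second about whether a whole clip received the right label.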

Read the output directory as a bundle

Treat the evaluation output directory as one package:

metrics,
plots,
saved predictions,
config context.

Do not lift a single number out of context and treat it as the whole story.
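One way to get an overview of the bundle before reading any single number is to list what the run actually wrote. The sketch below is a generic directory inventory in plain Python, not part of batdetect2: the function name `inventory_outputs` and the example path are hypothetical, and it makes no assumption about which file names or formats the evaluation produces.

```python
from collections import defaultdict
from pathlib import Path


def inventory_outputs(output_dir):
    """Group every file under output_dir by its lowercase suffix.

    Hypothetical helper: it only walks the directory tree and does not
    know anything about batdetect2's actual output layout.
    """
    groups = defaultdict(list)
    for path in sorted(Path(output_dir).rglob("*")):
        if path.is_file():
            suffix = path.suffix.lower() or "(no suffix)"
            # Store paths relative to the output directory for readability.
            groups[suffix].append(path.relative_to(output_dir).as_posix())
    return dict(groups)


# Example usage (the path is a placeholder for your own output directory):
# for suffix, files in inventory_outputs("path/to/evaluation-output").items():
#     print(suffix, len(files), files[:3])
```

If the inventory shows metrics but no saved predictions or config, that is worth knowing before drawing conclusions from the numbers alone.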

Look for failure patterns, not just overall averages

Check:

whether errors concentrate in certain taxa,
whether specific sites or recorder setups behave differently,
whether threshold choices are driving the result,
whether errors cluster near clip boundaries or close to the matching threshold.
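The first three checks above can be sketched with a few lines of plain Python. This is a minimal illustration under stated assumptions, not batdetect2's own API: it assumes you have already loaded the saved predictions into a list of dictionaries, and the field names (`taxon`, `site`, `score`, `matched`) and both function names are hypothetical.

```python
from collections import defaultdict


def error_rate_by(records, key):
    """Return {group: fraction of records in that group with matched == False}."""
    totals = defaultdict(int)
    errors = defaultdict(int)
    for rec in records:
        group = rec[key]
        totals[group] += 1
        if not rec["matched"]:
            errors[group] += 1
    return {group: errors[group] / totals[group] for group in totals}


def precision_recall_at(records, threshold, n_ground_truth):
    """Precision and recall when only predictions with score >= threshold are kept."""
    kept = [rec for rec in records if rec["score"] >= threshold]
    true_positives = sum(1 for rec in kept if rec["matched"])
    precision = true_positives / len(kept) if kept else 0.0
    recall = true_positives / n_ground_truth if n_ground_truth else 0.0
    return precision, recall


# Toy records with hypothetical fields; replace with your own loaded predictions.
records = [
    {"taxon": "A", "site": "s1", "score": 0.9, "matched": True},
    {"taxon": "A", "site": "s1", "score": 0.8, "matched": True},
    {"taxon": "B", "site": "s2", "score": 0.7, "matched": False},
    {"taxon": "B", "site": "s2", "score": 0.4, "matched": True},
]

per_taxon = error_rate_by(records, "taxon")
per_site = error_rate_by(records, "site")
sweep = {t: precision_recall_at(records, t, n_ground_truth=4) for t in (0.3, 0.5, 0.75)}
```

If the per-group error rates differ sharply, or precision and recall swing widely across nearby thresholds, the overall average is hiding structure that matters for interpretation.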

Keep validation and deployment questions separate

A model can look good on one task and still be a poor fit for your deployment question.

Interpret the outputs in relation to the real use case, not only in terms of whichever metric is easiest to report.

Evaluation tutorial: {doc}../tutorials/evaluate-on-a-test-set
Evaluation concepts: {doc}../explanation/evaluation-concepts-and-matching
Model output and validation: {doc}../explanation/model-output-and-validation