clean up instructions

macaodha 2022-12-14 18:55:54 +00:00
parent 7b5b2be08f
commit a5c263093f
6 changed files with 57 additions and 53 deletions


@@ -14,20 +14,19 @@ Code for detecting and classifying bat echolocation calls in high frequency audio
### Try the model
Click [here](https://colab.research.google.com/github/macaodha/batdetect2/blob/master/batdetect2_notebook.ipynb) to run the model using Google Colab. You can also run this notebook locally.

### Running the model on your own data
After following the above steps to install the code you can run the model on your own data by opening the command line where the code is located and typing:
`python run_batdetect.py AUDIO_DIR ANN_DIR DETECTION_THRESHOLD`
e.g.
`python run_batdetect.py example_data/audio/ example_data/anns/ 0.3`

`AUDIO_DIR` is the path on your computer to the audio wav files of interest.
`ANN_DIR` is the path on your computer where the model predictions will be saved. The model will output both `.csv` and `.json` results for each audio file.
`DETECTION_THRESHOLD` is a number between 0 and 1 specifying the cut-off threshold applied to the calls. A smaller number will result in more calls detected, but with the chance of introducing more mistakes.

There are also optional arguments, e.g. you can request that the model outputs features (i.e. estimated call parameters) such as duration, max_frequency, etc. by setting the flag `--spec_features`. These will be saved as `*_spec_features.csv` files:
`python run_batdetect.py example_data/audio/ example_data/anns/ 0.3 --spec_features`
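The saved `.csv` predictions are plain tables, so they can be inspected with a few lines of Python. Below is a minimal sketch, assuming pandas is installed; the file name is hypothetical and the exact column names in your output may differ.

```python
import pandas as pd

# Load the model predictions for one audio file (hypothetical path).
preds = pd.read_csv("example_data/anns/my_recording.csv")

# Peek at the detections; the column names depend on the version
# of the output files, so inspect them rather than assuming.
print(preds.columns.tolist())
print(preds.head())
print(f"{len(preds)} detections in this file")
```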
@@ -35,6 +34,10 @@ There are also optional arguments, e.g. you can request that the model outputs f
You can also specify which model to use by setting the `--model_path` argument. If not specified, it will default to using a model trained on UK data.
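For example, to run a different model (the model path below is just a hypothetical placeholder):
`python run_batdetect.py example_data/audio/ example_data/anns/ 0.3 --model_path PATH_TO_MODEL/model.pth.tar`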
### Training the model on your own data
Take a look at the steps outlined in the finetuning readme [here](bat_detect/finetune/readme.md) for a description of how to train your own model.
### Data and annotations
The raw audio data and annotations used to train the models in the paper will be added soon.
The audio interface used to annotate audio data for training and evaluation is available [here](https://github.com/macaodha/batdetect2_GUI).


@@ -1,4 +1,4 @@
# Evaluating BatDetect2
This script evaluates a trained model and outputs several plots summarizing the performance. It is used as follows:
`python evaluate_models.py path_to_store_images/ path_to_audio_files/ path_to_annotation_file/ path_to_trained_model/`
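For instance, a hypothetical invocation might look as follows (the output image directory is a placeholder, and the model file is the UK model referenced in the finetuning readme):
`python evaluate_models.py plots/ example_data/audio/ example_data/anns/ ../../models/Net2DFast_UK_same.pth.tar`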


@@ -1,36 +1,40 @@
# Finetuning the BatDetect2 model on your own data
Main steps:
1. Annotate your data using the annotation GUI.
2. Run `prep_data_finetune.py` to create a training and validation split for your data.
3. Run `finetune_model.py` to finetune a model on your data.

## 1. Annotate calls of interest in audio data
Use the annotation tools provided [here](https://github.com/macaodha/batdetect2_GUI) to manually identify where the events of interest (e.g. bat echolocation calls) are in your files.
This will result in a directory of audio files and a directory of annotation files, where each audio file will have a corresponding `.json` annotation file.
Make sure to annotate all instances of a bat call.
If unsure of the species, just label the call as `Bat`.
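As a quick sanity check, you can load one of the resulting annotation files in Python. This is a minimal sketch; the file name is hypothetical and the key names (e.g. `annotation`) are assumptions about the GUI output, so adjust them to match your files.

```python
import json

# Load one GUI-generated annotation file (hypothetical path).
with open("path_to_annotations/my_recording.wav.json") as f:
    ann = json.load(f)

# The list of annotated calls; the key name is an assumption.
calls = ann.get("annotation", [])
print(f"{len(calls)} annotated calls in this file")
```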
## 2. Split data into train and test sets
After performing the previous step you should have a directory of annotation files saved as `.json`, one for each audio file you have annotated.
* The next step is to split these into training and testing subsets.
Run `prep_data_finetune.py` to split the data into train and test sets. This will result in two separate files, a train and a test one, e.g.
`python prep_data_finetune.py dataset_name path_to_audio/ path_to_annotations/ path_to_output_anns/`
This may result in an error if the output files do not contain the same set of species in the train and test splits. You can try different random seeds if this is an issue, e.g. `--rand_seed 123456`.
* You can also load the train and test split using text files, where each line of the text file is the name of a `wav` file without the file path (an example listing is shown after this list), e.g.
`python prep_data_finetune.py dataset_name path_to_audio/ path_to_annotations/ path_to_output/ --train_file path_to_file/list_of_train_files.txt --test_file path_to_file/list_of_test_files.txt`
* You can also replace class names. This can be helpful if you don't think you have enough calls/files for a given species. Use semi-colons to separate, without spaces between them, e.g.
`python prep_data_finetune.py dataset_name path_to_audio/audio/ path_to_annotations/anns/ path_to_output/ --input_class_names "Histiotus;Molossidae;Lasiurus;Myotis;Rhogeesa;Vespertilionidae" --output_class_names "Group One;Group One;Group One;Group Two;Group Two;Group Three"`
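For reference, a hypothetical `list_of_train_files.txt` would contain one file name per line, without paths, e.g.
`recording_one.wav`
`recording_two.wav`
`recording_three.wav`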
## 3. Finetuning the model
Finally, you can finetune the model using your data, e.g.
`python finetune_model.py path_to_audio/ path_to_train/TRAIN.json path_to_train/TEST.json ../../models/Net2DFast_UK_same.pth.tar`
Here, `TRAIN.json` and `TEST.json` are the splits created in the previous steps.
#### Additional notes
* For the first step it is better to cut the files into audio clips of less than 5 seconds and make sure to annotate them exhaustively (i.e. all bat calls should be annotated).
* You can train the model for longer by setting the `--num_epochs` flag to a larger number, e.g. `--num_epochs 400`. The default is `200`.
* If you do not want to finetune the model, but instead want to train it from scratch, you can set the `--train_from_scratch` flag.
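For example, to finetune for twice the default number of epochs:
`python finetune_model.py path_to_audio/ path_to_train/TRAIN.json path_to_train/TEST.json ../../models/Net2DFast_UK_same.pth.tar --num_epochs 400`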


@@ -2,6 +2,8 @@
`python train_model.py data_dir annotation_dir` e.g.
`python train_model.py /data1/bat_data/data/ /data1/bat_data/annotations/anns/`
More comprehensive instructions are provided in the finetune directory.
## Training on your own data
You can either finetune from an existing model or train from scratch using the finetuning scripts; follow the instructions in the `../finetune/` directory.

File diff suppressed because one or more lines are too long

faq.md

@@ -32,16 +32,16 @@ This is a limitation of our current training data. If you have such data or would
Currently we do not do any sophisticated post-processing on the results output by the model. We return a probability associated with each species for each call. You can use these predictions to clean up the noisy predictions for sequences of calls.
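As one possible illustration of such a clean-up (this is not something the codebase implements for you), you could average the per-call species probabilities over a sequence and keep the most likely species. A minimal sketch, assuming the per-call probabilities have already been loaded into a list of dicts:

```python
from collections import defaultdict

def most_likely_species(calls):
    """Average per-species probabilities over a sequence of calls and
    return the species with the highest mean probability.
    `calls` is a list of {species_name: probability} dicts."""
    totals = defaultdict(float)
    for call in calls:
        for species, prob in call.items():
            totals[species] += prob
    return max(totals, key=lambda s: totals[s] / len(calls))

# Hypothetical noisy per-call predictions for one sequence.
calls = [
    {"Myotis daubentonii": 0.6, "Myotis nattereri": 0.4},
    {"Myotis daubentonii": 0.3, "Myotis nattereri": 0.7},
    {"Myotis daubentonii": 0.8, "Myotis nattereri": 0.2},
]
print(most_likely_species(calls))  # -> Myotis daubentonii
```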
#### Can I trust the model outputs?
The models developed and shared as part of this repository should be used with caution. While they have been evaluated on held-out audio data, great care should be taken when using the model outputs for any form of biodiversity assessment. Your data may differ, and as a result it is very strongly recommended that you validate the model first using data with known species to ensure that the outputs can be trusted.
#### The code works well, but it is slow?
Try a different/faster computer. On a reasonably recent desktop it takes about 13 seconds (on the GPU) or 1.3 minutes (on the CPU) to process 7.5 minutes of audio. In general, we observe a factor of ~5-10 speed-up using recent Nvidia GPUs compared to CPU-only systems.
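The trained models are PyTorch checkpoints (the `.pth.tar` files), so a quick way to confirm whether a GPU is actually available to the code is to check CUDA from Python. A minimal sketch, assuming PyTorch is already installed:

```python
import torch

# If this prints False, the model will run on the (much slower) CPU.
print(torch.cuda.is_available())
if torch.cuda.is_available():
    print(torch.cuda.get_device_name(0))
```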
#### My audio files are very big and as a result the code is slow.
If your audio files are very long in duration (i.e. multiple minutes) it might be better to split them up into several smaller files. Have a look at the instructions and scripts in our annotation GUI codebase for how to crop your files into shorter ones - see [here](https://github.com/macaodha/batdetect2_GUI).
## Training a new model
@@ -62,4 +62,4 @@ In principle yes, however you may need to change some of the training hyper-para
## Usage
#### Can I use the code for commercial purposes or incorporate raw source code or trained models into my commercial system?
No. This codebase is currently only for non-commercial use. See the license.