Step 3: Transforming Annotation Tags (Optional)
Purpose and Context
After defining your vocabulary (Step 1: Terms) and filtering out irrelevant sound events (Step 2: Filtering), you have a dataset of annotations ready for the next stages. Before you select the final target classes for training (Step 4), you might want or need to modify the tags associated with your annotations. This optional step allows you to clean up, standardize, or derive new information from your existing tags.
Why transform tags?
- Correcting Mistakes: Fix typos or incorrect values in specific tags (e.g., changing an incorrect species label).
- Standardizing Labels: Ensure consistency if the same information was tagged using slightly different values (e.g., mapping "echolocation", "Echoloc.", and "Echolocation Call" all to a single standard value: "Echolocation").
- Grouping Related Concepts: Combine different specific tags into a broader category (e.g., mapping several different species tags like Myotis daubentonii and Myotis nattereri to a single genus: Myotis tag).
- Deriving New Information: Automatically create new tags based on existing ones (e.g., automatically generating a genus: Myotis tag whenever a species: Myotis daubentonii tag is present).
This step uses the batdetect2.targets.transform module to apply these changes based on rules you define.
How it Works: Transformation Rules
You control how tags are transformed by defining a list of rules in your configuration file (e.g., your main .yaml file, often under a section named transform).
Each rule specifies a particular type of transformation to perform. Importantly, the rules are applied sequentially, in the exact order they appear in your configuration list. The output annotation from one rule becomes the input for the next rule in the list. This means the order can matter!
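The chaining described above can be pictured as a plain loop. The sketch below is not batdetect2's implementation: tags are modelled as simple (key, value) tuples, and apply_rules, fix_informal_name, and species_to_genus are hypothetical helpers invented for illustration.

```python
from typing import Callable, List, Tuple

Tag = Tuple[str, str]                    # (term key, value); a simplified stand-in
Rule = Callable[[List[Tag]], List[Tag]]  # a rule maps a tag list to a new tag list


def apply_rules(tags: List[Tag], rules: List[Rule]) -> List[Tag]:
    """Apply each rule in list order; each rule sees the previous rule's output."""
    for rule in rules:
        tags = rule(tags)
    return tags


def fix_informal_name(tags: List[Tag]) -> List[Tag]:
    # Hypothetical rule: swap one exact tag for another.
    old = ("species", "Pip pip")
    new = ("species", "Pipistrellus pipistrellus")
    return [new if tag == old else tag for tag in tags]


def species_to_genus(tags: List[Tag]) -> List[Tag]:
    # Hypothetical rule: map listed species values to a genus tag.
    mapping = {"Pipistrellus pipistrellus": "Pipistrellus"}
    return [
        ("genus", mapping[value]) if key == "species" and value in mapping else (key, value)
        for key, value in tags
    ]


result = apply_rules([("species", "Pip pip")], [fix_informal_name, species_to_genus])
print(result)
```

In this sketch the second rule only matches because the first rule has already corrected the species name, which is what "the output of one rule becomes the input for the next" means in practice.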
Types of Transformation Rules
Here are the main types of rules you can define:
- Replace an Exact Tag (replace)
  - Use Case: Fixing a specific, known incorrect tag.
  - How it works: You specify the exact original tag (both its term key and value) and the exact tag you want to replace it with.
  - Example Config: Replace the informal tag species: Pip pip with the correct scientific name tag.

        transform:
          rules:
            - rule_type: replace
              original:
                key: species  # Term key of the tag to find
                value: "Pip pip"  # Value of the tag to find
              replacement:
                key: species  # Term key of the replacement tag
                value: "Pipistrellus pipistrellus"  # Value of the replacement tag

- Map Values (map_value)
  - Use Case: Standardizing different values used for the same concept, or grouping multiple specific values into one category.
  - How it works: You specify a source_term_key (the type of tag to look at, e.g., call_type). Then you provide a value_mapping dictionary listing original values and the new values they should be mapped to. Only tags matching the source_term_key and having a value listed in the mapping will be changed. You can optionally specify a target_term_key if you want to change the term type as well (e.g., mapping species to a genus).
  - Example Config: Standardize different ways "Echolocation" might have been written for the call_type term.

        transform:
          rules:
            - rule_type: map_value
              source_term_key: call_type  # Look at 'call_type' tags
              # target_term_key is not specified, so the term stays 'call_type'
              value_mapping:
                echolocation: Echolocation
                Echolocation Call: Echolocation
                Echoloc.: Echolocation
                # Add mappings for other values like 'Social' if needed

  - Example Config (Grouping): Map specific Pipistrellus species tags to a single genus: Pipistrellus tag.

        transform:
          rules:
            - rule_type: map_value
              source_term_key: species  # Look at 'species' tags
              target_term_key: genus  # Change the term to 'genus'
              value_mapping:
                "Pipistrellus pipistrellus": Pipistrellus
                "Pipistrellus pygmaeus": Pipistrellus
                "Pipistrellus nathusii": Pipistrellus

- Derive a New Tag (derive_tag)
  - Use Case: Automatically creating new information based on existing tags, like getting the genus from a species name.
  - How it works: You specify a source_term_key (e.g., species). You provide a target_term_key for the new tag to be created (e.g., genus). You also provide the name of a derivation_function (e.g., "extract_genus") that knows how to perform the calculation (e.g., take "Myotis daubentonii" and return "Myotis"). batdetect2 has some built-in functions, or you can potentially define your own (see advanced documentation). You can also choose whether to keep the original source tag (keep_source: true).
  - Example Config: Create a genus tag from the existing species tag, keeping the species tag.

        transform:
          rules:
            - rule_type: derive_tag
              source_term_key: species  # Use the value from the 'species' tag
              target_term_key: genus  # Create a tag with the 'genus' term
              derivation_function: extract_genus  # Use the built-in function for this
              keep_source: true  # Keep the original 'species' tag

  - Another Example: Convert species names to uppercase (modifying the value of the same term).

        transform:
          rules:
            - rule_type: derive_tag
              source_term_key: species  # Use the value from the 'species' tag
              # target_term_key is not specified, so the term stays 'species'
              derivation_function: to_upper_case  # Assume this function exists
              keep_source: false  # Replace the original species tag

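To make the three rule types concrete, here is a minimal sketch of their behaviour as described above. It is an illustrative model only and does not use or reproduce the batdetect2.targets.transform API: tags are reduced to (key, value) tuples, the function signatures simply mirror the configuration field names, the keep_source default is an assumption, and extract_genus is assumed to return the first word of a species name.

```python
from typing import Callable, Dict, List, Optional, Tuple

Tag = Tuple[str, str]  # (term key, value); a simplified stand-in for a real tag


def replace(tags: List[Tag], original: Tag, replacement: Tag) -> List[Tag]:
    """Swap every exact match of `original` for `replacement`."""
    return [replacement if tag == original else tag for tag in tags]


def map_value(
    tags: List[Tag],
    source_term_key: str,
    value_mapping: Dict[str, str],
    target_term_key: Optional[str] = None,
) -> List[Tag]:
    """Rewrite values listed in `value_mapping`; all other tags pass through."""
    new_key = target_term_key or source_term_key
    return [
        (new_key, value_mapping[value])
        if key == source_term_key and value in value_mapping
        else (key, value)
        for key, value in tags
    ]


def derive_tag(
    tags: List[Tag],
    source_term_key: str,
    derivation_function: Callable[[str], str],
    target_term_key: Optional[str] = None,
    keep_source: bool = True,  # default chosen for this sketch only
) -> List[Tag]:
    """Compute a new tag from each source tag, optionally keeping the source."""
    new_key = target_term_key or source_term_key
    out: List[Tag] = []
    for key, value in tags:
        if key != source_term_key:
            out.append((key, value))
            continue
        if keep_source:
            out.append((key, value))
        out.append((new_key, derivation_function(value)))
    return out


def extract_genus(species: str) -> str:
    # Assumed behaviour: the genus is the first word of a binomial name.
    return species.split()[0]


tags = [("species", "Pip pip"), ("call_type", "Echoloc.")]
tags = replace(tags, ("species", "Pip pip"), ("species", "Pipistrellus pipistrellus"))
tags = map_value(tags, "call_type", {"Echoloc.": "Echolocation"})
tags = derive_tag(tags, "species", extract_genus, target_term_key="genus", keep_source=True)
print(tags)
```

Passing keep_source=False together with no target_term_key corresponds to the uppercase example: the derived value takes the place of the original tag under the same term.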
Rule Order Matters
Remember that rules are applied one after another. If you have multiple rules, make sure they are ordered correctly to achieve the desired outcome. For instance, you might want to standardize species names before deriving the genus from them.
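The effect of ordering can be seen by running the same two steps in both orders. This is a hedged illustration rather than library code: the abbreviation "P. pipistrellus" is a made-up annotator shorthand, standardize_species and derive_genus are hypothetical stand-ins for a map_value rule and a derive_tag rule, and genus derivation is assumed to take the first word of the species value.

```python
from typing import List, Tuple

Tag = Tuple[str, str]  # (term key, value); a simplified stand-in for a real tag

# Hypothetical shorthand used by an annotator, mapped to the standard name.
STANDARD_NAMES = {"P. pipistrellus": "Pipistrellus pipistrellus"}


def standardize_species(tags: List[Tag]) -> List[Tag]:
    # Plays the role of a map_value rule on the 'species' term.
    return [
        (key, STANDARD_NAMES.get(value, value)) if key == "species" else (key, value)
        for key, value in tags
    ]


def derive_genus(tags: List[Tag]) -> List[Tag]:
    # Plays the role of a derive_tag rule that keeps the source tag.
    return tags + [("genus", value.split()[0]) for key, value in tags if key == "species"]


raw = [("species", "P. pipistrellus")]

standardize_first = derive_genus(standardize_species(raw))
derive_first = standardize_species(derive_genus(raw))

print(standardize_first)
print(derive_first)
```

Standardizing first yields a genus of "Pipistrellus". Deriving first locks in the genus "P." from the abbreviated name, and the later standardization step only fixes the species tag.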
Outcome
After applying all the transformation rules you've defined, the annotations will proceed to the next step (Step 4: Select Target Tags & Define Classes) with their tags potentially cleaned, standardized, or augmented based on your configuration. If you don't define any rules, the tags simply pass through this step unchanged.