mirror of https://github.com/macaodha/batdetect2.git
synced 2025-06-29 22:51:58 +02:00
Add transform docs
parent 991529cf86
commit d97614a10d
docs/targets/transform.md (118 lines added)
@ -0,0 +1,118 @@
## Step 3: Transforming Annotation Tags (Optional)

### Purpose and Context

After defining your vocabulary (Step 1: Terms) and filtering out irrelevant sound events (Step 2: Filtering), you have a dataset of annotations ready for the next stages.
Before you select the final target classes for training (Step 4), you might want or need to **modify the tags** associated with your annotations.
This optional step allows you to clean up, standardize, or derive new information from your existing tags.

**Why transform tags?**

- **Correcting Mistakes:** Fix typos or incorrect values in specific tags (e.g., changing an incorrect species label).
- **Standardizing Labels:** Ensure consistency if the same information was tagged using slightly different values (e.g., mapping "echolocation", "Echoloc.", and "Echolocation Call" all to a single standard value: "Echolocation").
- **Grouping Related Concepts:** Combine different specific tags into a broader category (e.g., mapping several different species tags like _Myotis daubentonii_ and _Myotis nattereri_ to a single `genus: Myotis` tag).
- **Deriving New Information:** Automatically create new tags based on existing ones (e.g., automatically generating a `genus: Myotis` tag whenever a `species: Myotis daubentonii` tag is present).

This step uses the `batdetect2.targets.transform` module to apply these changes based on rules you define.

### How it Works: Transformation Rules

You control how tags are transformed by defining a list of **rules** in your configuration file (e.g., your main `.yaml` file, often under a section named `transform`).

Each rule specifies a particular type of transformation to perform.
Importantly, the rules are applied **sequentially**, in the exact order they appear in your configuration list.
The output annotation from one rule becomes the input for the next rule in the list.
This means the order can matter!
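
The chaining described above can be pictured with a minimal Python sketch. It is an illustration only, not the `batdetect2.targets.transform` implementation: tags are simplified to `(term key, value)` tuples, and `apply_rules`, `strip_whitespace`, and `capitalize_call_type` are hypothetical names invented for this example.

```python
from typing import Callable

# Simplified stand-in for an annotation tag: (term key, value).
Tag = tuple[str, str]
Rule = Callable[[list[Tag]], list[Tag]]


def apply_rules(tags: list[Tag], rules: list[Rule]) -> list[Tag]:
    """Run the rules in list order; each rule receives the previous rule's output."""
    for rule in rules:
        tags = rule(tags)
    return tags


def strip_whitespace(tags: list[Tag]) -> list[Tag]:
    # Hypothetical rule: trim stray spaces from every tag value.
    return [(key, value.strip()) for key, value in tags]


def capitalize_call_type(tags: list[Tag]) -> list[Tag]:
    # Hypothetical rule: capitalise the value of 'call_type' tags only.
    return [(k, v.capitalize()) if k == "call_type" else (k, v) for k, v in tags]


tags = [("call_type", " echolocation "), ("species", "Myotis daubentonii")]
result = apply_rules(tags, [strip_whitespace, capitalize_call_type])
print(result)
# [('call_type', 'Echolocation'), ('species', 'Myotis daubentonii')]
```

With an empty rule list, `apply_rules` returns the tags unchanged, which mirrors the behaviour this page describes for a configuration with no rules.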

### Types of Transformation Rules

Here are the main types of rules you can define:

1. **Replace an Exact Tag (`replace`)**

   - **Use Case:** Fixing a specific, known incorrect tag.
   - **How it works:** You specify the _exact_ original tag (both its term key and value) and the _exact_ tag you want to replace it with.
   - **Example Config:** Replace the informal tag `species: Pip pip` with the correct scientific name tag.
     ```yaml
     transform:
       rules:
         - rule_type: replace
           original:
             key: species # Term key of the tag to find
             value: "Pip pip" # Value of the tag to find
           replacement:
             key: species # Term key of the replacement tag
             value: "Pipistrellus pipistrellus" # Value of the replacement tag
     ```

2. **Map Values (`map_value`)**

   - **Use Case:** Standardizing different values used for the same concept, or grouping multiple specific values into one category.
   - **How it works:** You specify a `source_term_key` (the type of tag to look at, e.g., `call_type`).
     Then you provide a `value_mapping` dictionary listing original values and the new values they should be mapped to.
     Only tags matching the `source_term_key` and having a value listed in the mapping will be changed.
     You can optionally specify a `target_term_key` if you want to change the term type as well (e.g., mapping species to a genus).
   - **Example Config:** Standardize different ways "Echolocation" might have been written for the `call_type` term.
     ```yaml
     transform:
       rules:
         - rule_type: map_value
           source_term_key: call_type # Look at 'call_type' tags
           # target_term_key is not specified, so the term stays 'call_type'
           value_mapping:
             echolocation: Echolocation
             Echolocation Call: Echolocation
             Echoloc.: Echolocation
             # Add mappings for other values like 'Social' if needed
     ```
   - **Example Config (Grouping):** Map specific Pipistrellus species tags to a single `genus: Pipistrellus` tag.
     ```yaml
     transform:
       rules:
         - rule_type: map_value
           source_term_key: species # Look at 'species' tags
           target_term_key: genus # Change the term to 'genus'
           value_mapping:
             "Pipistrellus pipistrellus": Pipistrellus
             "Pipistrellus pygmaeus": Pipistrellus
             "Pipistrellus nathusii": Pipistrellus
     ```

3. **Derive a New Tag (`derive_tag`)**

   - **Use Case:** Automatically creating new information based on existing tags, like getting the genus from a species name.
   - **How it works:** You specify a `source_term_key` (e.g., `species`).
     You provide a `target_term_key` for the new tag to be created (e.g., `genus`).
     You also provide the name of a `derivation_function` (e.g., `"extract_genus"`) that knows how to perform the calculation (e.g., take "Myotis daubentonii" and return "Myotis").
     `batdetect2` has some built-in functions, or you can potentially define your own (see advanced documentation).
     You can also choose whether to keep the original source tag (`keep_source: true`).
   - **Example Config:** Create a `genus` tag from the existing `species` tag, keeping the species tag.
     ```yaml
     transform:
       rules:
         - rule_type: derive_tag
           source_term_key: species # Use the value from the 'species' tag
           target_term_key: genus # Create a tag with the 'genus' term
           derivation_function: extract_genus # Use the built-in function for this
           keep_source: true # Keep the original 'species' tag
     ```
   - **Another Example:** Convert species names to uppercase (modifying the value of the _same_ term).
     ```yaml
     transform:
       rules:
         - rule_type: derive_tag
           source_term_key: species # Use the value from the 'species' tag
           # target_term_key is not specified, so the term stays 'species'
           derivation_function: to_upper_case # Assume this function exists
           keep_source: false # Replace the original species tag
     ```
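
To make the three rule types more concrete, here is a rough Python sketch of the behaviour described above. It is not the code in `batdetect2.targets.transform`: tags are simplified to `(term key, value)` tuples, the function signatures only borrow the field names from the YAML examples, the `keep_source=True` default is an assumption, and this `extract_genus` is a stand-in that simply takes the first word of the species name.

```python
from typing import Callable, Optional

Tag = tuple[str, str]  # simplified (term key, value) pair


def replace(tags: list[Tag], original: Tag, replacement: Tag) -> list[Tag]:
    """Swap one exact tag (key and value must both match) for another."""
    return [replacement if tag == original else tag for tag in tags]


def map_value(
    tags: list[Tag],
    source_term_key: str,
    value_mapping: dict[str, str],
    target_term_key: Optional[str] = None,
) -> list[Tag]:
    """Rewrite values listed in the mapping; optionally change the term key too."""
    out = []
    for key, value in tags:
        if key == source_term_key and value in value_mapping:
            out.append((target_term_key or key, value_mapping[value]))
        else:
            out.append((key, value))  # unmatched tags pass through untouched
    return out


def derive_tag(
    tags: list[Tag],
    source_term_key: str,
    derivation_function: Callable[[str], str],
    target_term_key: Optional[str] = None,
    keep_source: bool = True,  # assumed default, not confirmed by the docs
) -> list[Tag]:
    """Build a new tag from each source tag, keeping or dropping the source."""
    out = []
    for key, value in tags:
        if key != source_term_key:
            out.append((key, value))
            continue
        if keep_source:
            out.append((key, value))
        out.append((target_term_key or key, derivation_function(value)))
    return out


def extract_genus(species: str) -> str:
    # Stand-in for the built-in mentioned above: first word of a binomial name.
    return species.split()[0]


tags = [("species", "Pip pip"), ("call_type", "Echoloc.")]
tags = replace(tags, ("species", "Pip pip"), ("species", "Pipistrellus pipistrellus"))
tags = map_value(tags, "call_type", {"Echoloc.": "Echolocation"})
tags = derive_tag(tags, "species", extract_genus, target_term_key="genus")
print(tags)
# [('species', 'Pipistrellus pipistrellus'), ('genus', 'Pipistrellus'), ('call_type', 'Echolocation')]
```

Leaving `target_term_key` unset in `map_value` or `derive_tag` keeps the original term key, which matches the commented-out `target_term_key` lines in the YAML examples.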

### Rule Order Matters!

Remember that rules are applied one after another.
If you have multiple rules, make sure they are ordered correctly to achieve the desired outcome.
For instance, you might want to standardize species names _before_ deriving the genus from them.
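
The effect of ordering can be seen in a small Python sketch that runs the same two steps in both orders. The rule functions here (`standardize_species`, `derive_genus`) and the `run` helper are hypothetical simplifications written for this illustration, with tags reduced to `(term key, value)` tuples; they are not `batdetect2` functions.

```python
Tag = tuple[str, str]  # simplified (term key, value) pair


def standardize_species(tags: list[Tag]) -> list[Tag]:
    # Hypothetical map_value-style rule: fix the capitalisation of one species name.
    mapping = {"myotis daubentonii": "Myotis daubentonii"}
    return [(k, mapping.get(v, v)) if k == "species" else (k, v) for k, v in tags]


def derive_genus(tags: list[Tag]) -> list[Tag]:
    # Hypothetical derive_tag-style rule: add a genus tag, keeping the species tag.
    return tags + [("genus", v.split()[0]) for k, v in tags if k == "species"]


def run(tags: list[Tag], rules) -> list[Tag]:
    # Apply the rules one after another, in list order.
    for rule in rules:
        tags = rule(tags)
    return tags


raw = [("species", "myotis daubentonii")]
good = run(raw, [standardize_species, derive_genus])
bad = run(raw, [derive_genus, standardize_species])
print(good)  # [('species', 'Myotis daubentonii'), ('genus', 'Myotis')]
print(bad)   # [('species', 'Myotis daubentonii'), ('genus', 'myotis')]
```

In the second ordering the genus is derived from the unstandardized value, and the later standardization only touches `species` tags, so the stale `genus: myotis` tag survives.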

### Outcome

After applying all the transformation rules you've defined, the annotations will proceed to the next step (Step 4: Select Target Tags & Define Classes) with their tags potentially cleaned, standardized, or augmented based on your configuration.
If you don't define any rules, the tags simply pass through this step unchanged.
@ -72,7 +72,7 @@ dev-dependencies = [
     "ruff>=0.7.3",
     "ipykernel>=6.29.4",
     "setuptools>=69.5.1",
-    "basedpyright>=1.28.4",
+    "pyright>=1.1.399",
 ]

 [tool.ruff]