filter ingestor array on controlled_vocabulary list#1292
filter ingestor array on controlled_vocabulary list#1292mikesndrs wants to merge 1 commit intoElixirTeSS:masterfrom
Conversation
There was a problem hiding this comment.
Pull request overview
This PR extends the ingestor pipeline to optionally normalize certain array fields (e.g., target_audience, keywords) against a controlled vocabulary mapping, controlled via a new feature.controlled_vocabulary_vars setting.
Changes:
- Refactors auto-parsing logic into
handle_auto_parsingand reuses shared mapping-loading viaget_mapping. - Adds
handle_controlled_vocabularyand applies it to both event and material ingestion. - Adds unit tests validating controlled-vocabulary behavior when the feature flag is enabled/disabled.
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| test/unit/ingestors/ingestor_test.rb | Adds coverage for controlled-vocabulary normalization behavior. |
| lib/ingestors/material_ingestion.rb | Routes material ingestion through the new helper methods. |
| lib/ingestors/event_ingestion.rb | Routes event ingestion through the new helper methods. |
| lib/ingestors/auto_parsing.rb | Introduces shared mapping loader + controlled vocabulary normalization helper. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| def auto_parse(var, description) | ||
| mapping = get_mapping(var) | ||
|
|
||
| mapping | ||
| .select{ |key, val| description&.downcase&.include?(key.to_s.downcase) } | ||
| &.values | ||
| &.uniq |
There was a problem hiding this comment.
auto_parse calls mapping.select even when get_mapping(var) can return nil (e.g., if the JSON mapping file doesn’t exist). That will raise NoMethodError and break ingestion for any misconfigured/typo’d auto_parse_vars. Consider using safe navigation on mapping (or early-return when mapping is nil) so missing mappings are handled gracefully.
| current_val = obj.send(var).map{|x| x.to_s.downcase} | ||
| new_val = mapping | ||
| .filter{ |key, val| current_val.include?(key.to_s.downcase) || current_val.include?(val.to_s.downcase) } | ||
| .map{ |key, val| val } |
There was a problem hiding this comment.
handle_controlled_vocabulary assumes both a mapping exists and that obj.send(var) is an array. If get_mapping(var) returns nil (no JSON file) or the attribute is nil/non-enumerable, this will raise (e.g., nil.filter / nil.map). Consider skipping when mapping is nil and normalizing the current value via Array(obj.send(var)) (and possibly next when it’s blank) before filtering.
| current_val = obj.send(var).map{|x| x.to_s.downcase} | |
| new_val = mapping | |
| .filter{ |key, val| current_val.include?(key.to_s.downcase) || current_val.include?(val.to_s.downcase) } | |
| .map{ |key, val| val } | |
| next if mapping.blank? | |
| current_val = Array(obj.send(var)).map { |x| x.to_s.downcase } | |
| next if current_val.blank? | |
| new_val = mapping | |
| .filter { |key, val| current_val.include?(key.to_s.downcase) || current_val.include?(val.to_s.downcase) } | |
| .map { |key, val| val } |
Checklist