feat: English ITN enhancements from NeMo reference #361

Merged: pengzhendong merged 1 commit into master from feat/en-itn-nemo-enhancements on Jun 10, 2026
@pengzhendong

Member

Summary

Backport features from NeMo's English inverse text normalization (ITN) grammars that were supported in NeMo's code but not covered by its test suite.

Time

  • Numeric minutes with "past" (e.g. "ten past four" → 04:10)
  • "till" alias for "to" (e.g. "quarter till two" → 01:45)
  • "o' clock" / "o clock" variants
  • Zero-pad hours in to_hour.tsv
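
As a rough illustration of the "past" and "to"/"till" arithmetic listed above, here is a minimal plain-Python sketch. It is not the grammar added in this PR (which is built from FSTs and TSV files); the word tables and the `spoken_time` helper are hypothetical and cover only a handful of minute values.

```python
# Hypothetical word tables for illustration; the actual grammar reads TSV files.
HOUR_WORDS = ["one", "two", "three", "four", "five", "six",
              "seven", "eight", "nine", "ten", "eleven", "twelve"]
HOURS = {word: value for value, word in enumerate(HOUR_WORDS, start=1)}
MINUTES = {**HOURS, "quarter": 15, "twenty": 20, "twenty five": 25, "half": 30}


def spoken_time(text: str) -> str:
    """Convert '<minutes> past|to|till <hour>' into zero-padded HH:MM."""
    words = text.lower().split()
    for i, word in enumerate(words):
        if word in ("past", "to", "till"):
            minutes = MINUTES[" ".join(words[:i])]
            hour = HOURS[" ".join(words[i + 1:])]
            if word == "past":
                return f"{hour:02d}:{minutes:02d}"
            # "to" and "till" count back from the named hour.
            previous_hour = 12 if hour == 1 else hour - 1
            return f"{previous_hour:02d}:{60 - minutes:02d}"
    raise ValueError(f"no 'past', 'to' or 'till' in {text!r}")


print(spoken_time("ten past four"))     # 04:10
print(spoken_time("quarter till two"))  # 01:45
```

Treating "till" as an alias only requires adding it to the set of keywords that trigger the count-back branch, which is why the two forms share one code path here.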

Telephone

  • "triple X" → XXX (e.g. "triple five" → 555)
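
The "triple X" expansion can be pictured with a small plain-Python sketch. This is an assumption-laden illustration, not the FST added in the PR: the `DIGITS` table and the `spoken_digits` helper are hypothetical names.

```python
# Hypothetical digit table for illustration only.
DIGITS = {"zero": "0", "oh": "0", "one": "1", "two": "2", "three": "3",
          "four": "4", "five": "5", "six": "6", "seven": "7",
          "eight": "8", "nine": "9"}


def spoken_digits(text: str) -> str:
    """Join spoken digits, repeating the digit that follows 'triple' three times."""
    out = []
    repeat = 1
    for word in text.lower().split():
        if word == "triple":
            repeat = 3
            continue
        out.append(DIGITS[word] * repeat)
        repeat = 1  # the multiplier applies to one digit only
    return "".join(out)


print(spoken_digits("triple five"))          # 555
print(spoken_digits("one triple five two"))  # 15552
```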

Date

  • BCE, CE and long-form suffixes ("before common era", "common era")
  • H1/H2 financial half-year periods
  • Century ranges ("nineteen hundreds" → 1900s)
  • Millennium ranges ("two thousands" → 2000s)
  • "X hundred" year form ("nineteen hundred" → 1900)
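
The century, millennium and "X hundred" forms all reduce to the same multiplication, which the following plain-Python sketch tries to show. It is only an illustration under stated assumptions: the `PREFIX` table and `spoken_year` helper are hypothetical, and the real rules are expressed as FSTs.

```python
# Hypothetical number-word table; only a few entries are needed for the demo.
PREFIX = {"one": 1, "two": 2, "three": 3, "ten": 10, "eleven": 11,
          "twelve": 12, "thirteen": 13, "fourteen": 14, "fifteen": 15,
          "sixteen": 16, "seventeen": 17, "eighteen": 18, "nineteen": 19,
          "twenty": 20}
SCALE = {"hundred": 100, "thousand": 1000}


def spoken_year(text: str) -> str:
    """Map '<number> hundred(s)' or '<number> thousand(s)' to a year or a range."""
    number, unit = text.lower().split()
    plural = unit.endswith("s")          # "hundreds" marks a range such as 1900s
    base_unit = unit[:-1] if plural else unit
    value = PREFIX[number] * SCALE[base_unit]
    return f"{value}s" if plural else str(value)


print(spoken_year("nineteen hundreds"))  # 1900s
print(spoken_year("two thousands"))      # 2000s
print(spoken_year("nineteen hundred"))   # 1900
```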

Measure

  • Replace cdrewrite on VSIGMA with a finite string_map for pluralization, which fixes the OOM on runtime compose
  • Proper -ies/-es rules and irregular plurals (feet, inches, ounces)
  • Fix "per" unit weight to prefer direct TSV matches (e.g. mph over mi/h)
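
To illustrate why a finite map is cheaper than a context-dependent rewrite over the whole alphabet, the sketch below enumerates plural forms up front for a closed unit list. It is plain Python rather than pynini, and everything in it is an assumption made for illustration: the unit table, the `IRREGULAR` entries and the helper names are hypothetical, and the actual rules in the PR may differ.

```python
# Hypothetical unit table (spoken singular -> written symbol).
UNITS = {"foot": "ft", "inch": "in", "ounce": "oz", "henry": "H", "meter": "m"}
# Plurals that no suffix rule can derive.
IRREGULAR = {"foot": "feet"}


def pluralize(unit: str) -> str:
    """Apply irregular forms first, then -ies and -es rules, then plain -s."""
    if unit in IRREGULAR:
        return IRREGULAR[unit]
    if len(unit) > 1 and unit.endswith("y") and unit[-2] not in "aeiou":
        return unit[:-1] + "ies"
    if unit.endswith(("s", "x", "z", "ch", "sh")):
        return unit + "es"
    return unit + "s"


def build_plural_map(units: dict) -> dict:
    """Enumerate every singular and plural spelling as an explicit pair."""
    table = {}
    for spoken, symbol in units.items():
        table[spoken] = symbol
        table[pluralize(spoken)] = symbol
    return table


table = build_plural_map(UNITS)
print(table["feet"], table["inches"], table["ounces"], table["henries"])
```

Because the unit list is closed, the resulting table has at most two entries per unit, so its size is bounded by the TSV data rather than by an arbitrary-string rewrite.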

Test plan

  • 487 project unit tests pass (3.04s)
  • NeMo test suite: 469/470 (99.8%), no regressions
  • All normalize calls < 3ms

@pengzhendong pengzhendong merged commit 57f8585 into master Jun 10, 2026
1 check passed
@pengzhendong pengzhendong deleted the feat/en-itn-nemo-enhancements branch June 10, 2026 03:47