
fix: English TN spurious space after opening punctuation #363

Merged: pengzhendong merged 1 commit into master from fix/en-tn-punct-spacing on Jun 11, 2026

pengzhendong commented on Jun 11, 2026

Fixes pengzhendong/wetext#14.

Summary

  • English TN verbalizer's `INSERT_SPACE` was applied uniformly between all non-punct tokens, causing unwanted spaces after opening quotes and parentheses (e.g., `"hello"` → `" hello"`, `(hello)` → `( hello)`)
  • Split the verbalizer pattern so classify tokens use `INSERT_SPACE` for inter-word spacing while punct tokens use `DELETE_SPACE`, since punct values already carry their surrounding spacing via the tagger's `add_weight(accep(" "), -1.0).star`
  • Add 3 test cases covering the reported issue and related patterns
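To make the two spacing rules concrete, here is a pure-Python toy model. It is not WeTextProcessing's pynini grammar. It assumes a hypothetical `tokenize` helper in which a punctuation mark absorbs adjacent whitespace (standing in for the tagger's punct tokens), with plain word tokens standing in for classify tokens. It then contrasts a hypothetical pre-fix join that puts a space before every word token with the post-fix join that adds a space only between two word tokens. The helper names and the exact pre-fix rule are illustrative assumptions.

```python
import re

# Toy tokenizer (assumption, not the real tagger): a punctuation mark absorbs
# the whitespace directly around it, like the tagger's punct tokens do.
TOKEN_RE = re.compile(r"(?P<punct>\s*[^\w\s]\s*)|(?P<word>\w+)")


def tokenize(text):
    """Return (kind, value) pairs where kind is "word" or "punct"."""
    # finditer skips characters no alternative matches, i.e. the plain spaces
    # between two words; the verbalizer is responsible for restoring those.
    return [(m.lastgroup, m.group()) for m in TOKEN_RE.finditer(text)]


def verbalize_old(tokens):
    """Hypothetical pre-fix rule: a space goes before every word token
    (unless the output is empty or already ends in whitespace)."""
    out = ""
    for kind, value in tokens:
        if kind == "word" and out and not out[-1].isspace():
            out += " "
        out += value
    return out


def verbalize_new(tokens):
    """Post-fix rule: a space only between two adjacent word tokens; punct
    tokens are emitted verbatim because they already carry their spacing."""
    parts = []
    prev_kind = None
    for kind, value in tokens:
        if kind == "word" and prev_kind == "word":
            parts.append(" ")  # INSERT_SPACE between classify (word) tokens
        parts.append(value)  # value as-is; punct never gets an extra space
        prev_kind = kind
    return "".join(parts)


if __name__ == "__main__":
    for text in ['"hello"', "(hello)", 'He said "hello"', "hello, world"]:
        tokens = tokenize(text)
        print(f"{text!r}: old={verbalize_old(tokens)!r} new={verbalize_new(tokens)!r}")
```

Under these assumptions the old rule reproduces the reported `" hello"` and `( hello)` outputs, while the new rule returns the inputs from the test plan unchanged, because punct tokens contribute only the spacing they absorbed from the input.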

Fixes the wenet-e2e/WeTextProcessing downstream issue pengzhendong/wetext#14.

Test plan

  • All 792 existing tests pass (English + Chinese TN/ITN)
  • New test cases added: `"So, one hundred thousand merit shouldn't be a problem."`, `"hello"`, `(hello)`
  • Verified no regression on: `hello, world`, `He said "hello"`, `a"b"`, `hello "world"`

The verbalizer's INSERT_SPACE was applied uniformly between all
non-punct tokens, causing unwanted spaces after opening quotes and
parens (e.g., `"hello"` → `" hello"`). Split the verbalizer pattern
so classify tokens use INSERT_SPACE for inter-word spacing while
punct tokens use DELETE_SPACE, since punct values already carry their
surrounding spacing via the tagger's add_weight(accep(" "), -1.0).star.

Add test cases for the reported issue and related patterns.
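The tagger-side claim (punct values already carry their spacing) follows from the weights. In pynini's default tropical semiring, weights along a path add and the best path has the minimum total weight, so when every accepted space is weighted -1.0, the analysis that folds all adjacent spaces into the punct token is strictly cheapest. The brute-force sketch below checks that for a single mark; `best_punct_token` is a hypothetical helper, and treating unabsorbed spaces as costing 0.0 is a simplifying assumption, not how the real tagger scores them.

```python
from itertools import product

# Mirrors the -1.0 in add_weight(accep(" "), -1.0): each absorbed space adds -1.0.
SPACE_WEIGHT = -1.0


def best_punct_token(left_spaces, mark, right_spaces):
    """Pick the minimum-weight value for a punct token (toy tropical semiring).

    Candidate (i, j) absorbs the i spaces nearest `mark` on its left and the
    j nearest on its right. Spaces left unabsorbed are assumed to cost 0.0,
    which is a simplification of the real tagger.
    """
    candidates = []
    for i, j in product(range(left_spaces + 1), range(right_spaces + 1)):
        weight = SPACE_WEIGHT * (i + j)  # tropical "times" is ordinary addition
        candidates.append((weight, " " * i + mark + " " * j))
    # Tropical "plus" is min, so the best path has the lowest total weight.
    return min(candidates)[1]


if __name__ == "__main__":
    # 'He said "hello"': one space sits before the opening quote.
    print(repr(best_punct_token(1, '"', 0)))
    # 'hello, world': one space follows the comma.
    print(repr(best_punct_token(0, ",", 1)))
```

Because the spacing already lives inside the punct value, adding `INSERT_SPACE` next to it on the verbalizer side can only duplicate or misplace a space, which is why the fix uses `DELETE_SPACE` there.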

pengzhendong merged commit cb2fb03 into master on Jun 11, 2026
1 check passed
pengzhendong deleted the fix/en-tn-punct-spacing branch on June 11, 2026 at 05:47

Development: merging this pull request may close the issue "English: normalize inserts a space into quotes".