fix: English TN spurious space after opening punctuation by pengzhendong · Pull Request #363 · wenet-e2e/WeTextProcessing

pengzhendong · 2026-06-11T05:45:48Z

Summary

The English TN verbalizer applied INSERT_SPACE uniformly between all non-punct tokens, which produced unwanted spaces after opening quotes and parentheses (e.g., `"hello"` → `" hello"`, `(hello)` → `( hello)`).
Split the verbalizer pattern so that classify tokens use INSERT_SPACE for inter-word spacing while punct tokens use DELETE_SPACE. Punct values already carry their surrounding spacing via the tagger's add_weight(accep(" "), -1.0).star, so no extra space is needed around them.
Add 3 test cases covering the reported issue and related patterns.
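The spacing rule described above can be sketched in plain Python. This is only a toy model of the idea, not the pynini grammar changed in this PR: the token representation, the `join_old` and `join_new` helper names, and the exact form of the old rule are assumptions made for illustration, chosen so that the old rule reproduces the reported symptom.

```python
from typing import List, Tuple

# Hypothetical token model: (kind, value), where kind is "word" or "punct".
# Punct values are assumed to already carry their own surrounding spaces,
# loosely mirroring what the PR says about the tagger.
Token = Tuple[str, str]


def join_old(tokens: List[Token]) -> str:
    """Assumed old behaviour: a space is inserted before every following
    word token, even when the preceding token is opening punctuation."""
    out = []
    for i, (_, value) in enumerate(tokens):
        out.append(value)
        next_kind = tokens[i + 1][0] if i + 1 < len(tokens) else None
        if next_kind == "word":
            out.append(" ")
    return "".join(out)


def join_new(tokens: List[Token]) -> str:
    """Sketch of the fixed behaviour: a space is inserted only between two
    adjacent word tokens; punct tokens contribute just their own value."""
    out = []
    for i, (kind, value) in enumerate(tokens):
        out.append(value)
        next_kind = tokens[i + 1][0] if i + 1 < len(tokens) else None
        if kind == "word" and next_kind == "word":
            out.append(" ")
    return "".join(out)


quoted = [("punct", '"'), ("word", "hello"), ("punct", '"')]
sentence = [("word", "He"), ("word", "said"), ("punct", ' "'),
            ("word", "hello"), ("punct", '"')]

print(join_old(quoted))    # " hello"   (spurious space after the opening quote)
print(join_new(quoted))    # "hello"
print(join_new(sentence))  # He said "hello"
```

In this sketch the space before the opening quote in `He said "hello"` comes from the punct token's own value (`' "'`), which is why the joiner never needs to add spacing around punctuation.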

Fixes wenet-e2e/WeTextProcessing downstream issue: pengzhendong/wetext#14

Test plan

All 792 existing tests pass (English + Chinese TN/ITN)
New test cases added: "So, one hundred thousand merit shouldn't be a problem.", "hello", (hello)
Verified no regression on: `hello, world`, `He said "hello"`, `a"b"`, `hello "world"`

…ation The verbalizer's INSERT_SPACE was applied uniformly between all non-punct tokens, causing unwanted spaces after opening quotes and parens (e.g., `"hello"` → `" hello"`). Split the verbalizer pattern so classify tokens use INSERT_SPACE for inter-word spacing while punct tokens use DELETE_SPACE — punct values already carry surrounding spacing via the tagger's add_weight(accep(" "), -1.0).star. Add test cases for the reported issue and related patterns.

pengzhendong merged commit cb2fb03 into master Jun 11, 2026
1 check passed

pengzhendong deleted the fix/en-tn-punct-spacing branch June 11, 2026 05:47

pengzhendong merged 1 commit into master from fix/en-tn-punct-spacing