
Add an AI "Reformat" option for manuscript sections #1277

Merged
atomantic merged 5 commits into main from feat/manuscript-ai-reformat on Jun 15, 2026


@atomantic

Owner

Summary

The deterministic Format button (regex) keeps hitting PDF-paste cases it fundamentally can't resolve: genuinely scrambled text with duplicated or misplaced quotes, a dropped `"I`, and ambiguous wraps before a capitalized word. This adds a second per-section Reformat (AI) button (in both Live and Review modes) that resolves these cases semantically.

It sends the section to the AI provider selected in the sidebar with a strict prompt (new manuscript-reformat stage): fix only whitespace, line breaks, drop-caps, and quotation-mark placement — change no words. The result is persisted server-side (snapshotted, so History ▸ Revert undoes it).

Trustworthy by construction: the integrity guard. Reformatting only moves whitespace and re-attaches quotes, so the letter/digit "skeleton" of the text is (near-)identical before and after. The service compares skeletons and rejects (400, nothing saved) any result that rewrote, added, or dropped words. The model may only move whitespace and quotes, plus delete a tiny bounded budget of artifact characters (e.g. a duplicated drop-cap); the resulting skeleton must be a subsequence of the original, so substitutions are rejected. So the AI cannot silently substitute or insert words in your prose.
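The skeleton comparison and deletion budget described above can be sketched roughly as follows. This is a minimal sketch under assumptions, not the code in manuscriptFix.js: `looseSkeleton`, `deletedCount`, `checkReformat`, and the budget of 8 are hypothetical, and the skeleton here is lowercased letters and digits only.

```javascript
// Hypothetical sketch of the guard as first described: a loose, lowercased
// letter/digit skeleton plus a small budget of deleted characters.
// Names and the budget value are illustrative, not taken from manuscriptFix.js.
const MAX_DELETED = 8;

// Letters and digits only, lowercased: whitespace, quotes, and punctuation
// can move freely without affecting the comparison.
function looseSkeleton(text) {
  return text.toLowerCase().replace(/[^\p{L}\p{N}]/gu, "");
}

// If `after` is a subsequence of `before`, return how many characters were
// deleted; otherwise return -1 (something was substituted or inserted).
function deletedCount(before, after) {
  const b = [...before]; // spread by code point, not UTF-16 unit
  const a = [...after];
  let i = 0;
  for (const ch of a) {
    // Greedily skip characters of `before` that the model deleted.
    while (i < b.length && b[i] !== ch) i++;
    if (i === b.length) return -1;
    i++;
  }
  return b.length - a.length;
}

function checkReformat(original, reformatted) {
  const deleted = deletedCount(looseSkeleton(original), looseSkeleton(reformatted));
  if (deleted < 0) return { ok: false, reason: "substituted or inserted text" };
  if (deleted > MAX_DELETED) return { ok: false, reason: "too many characters dropped" };
  return { ok: true, deleted };
}
```

With this sketch, `checkReformat("TThe cat", "The cat")` accepts one deleted artifact character, while `checkReformat("The cat sat", "The dog sat")` is rejected because `dog` cannot be matched as a subsequence. `checkReformat("do not go", "do go")` also passes (3 deleted characters), which is the weakness described in item 3 of the edit-safe commit below.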

The core reformatManuscriptText() is exported so the importer can reuse it to clean prose at ingest (tracked as a PLAN item — it needs opt-in UX because it's one LLM call per issue).

What's here

  • Prompt + stage: data.reference/prompts/stages/manuscript-reformat.md + a stage-config.json entry; migration 087 seeds both into existing installs (boot runs migrations, not setup-data.js).
  • Service: reformatManuscriptText / reformatManuscriptSection + the integrity guard in manuscriptFix.js.
  • Route: POST /pipeline/series/:id/manuscript/sections/:issueId/reformat (providerOverride/modelOverride, errors → 400/404 via the shared mapper).
  • Client: reformatPipelineManuscriptSection API wrapper + the second header button, wired through both section components with the editor's provider override.

Test plan

  • server/services/pipeline/manuscriptReformat.test.js — 10 tests on the integrity guard: pure-whitespace change accepted, de-hyphenation accepted, word substitution / inserted sentence / large deletion rejected, tiny artifact deletion accepted, code-fence + marker stripping, empty-input no-op, and the format-label/source pass-through. All green.
  • Template render + stage-config registration verified via the real template engine; migration 087 runs idempotently.
  • eslint clean (client + server); existing manuscript suites pass.
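The code-fence stripping covered by the tests could look something like the sketch below. `stripCodeFence` and its regex are assumptions for illustration; the "marker" stripping the tests also mention is not specified here, so it is not shown.

```javascript
// Hypothetical sketch: if the model wrapped its whole reply in a Markdown code
// fence (three or more backticks, optional info string), return only the inner
// text; otherwise return the reply trimmed.
function stripCodeFence(reply) {
  const text = reply.trim();
  // ^ and $ anchor to the whole reply: opening fence line, body, closing fence.
  const match = text.match(/^`{3,}[^\n]*\n([\s\S]*?)\n`{3,}$/);
  return match ? match[1] : text;
}
```

In a sketch like this, unwrapping should happen before the skeleton check: backticks are not letters, so a leftover fence would pass the integrity guard and end up saved into the manuscript.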

Note

The first AI render on a section costs one LLM call (uses the sidebar provider). The plain Format button stays for instant/offline cleanup.

…ed by regex

The deterministic Format button can't resolve genuinely scrambled PDF-paste
text (duplicated/misplaced quotes, ambiguous wraps before a capitalized word).
Add a second per-section "Reformat (AI)" button that sends the text to the
selected AI provider with a strict "fix only whitespace, line breaks,
drop-caps, and quote placement — change no words" prompt (new stage
`manuscript-reformat`), then persists the result (snapshotted, revertible).

Trustworthy by construction: a server-side integrity guard compares the
letter/digit skeleton of the text before and after and rejects (400, nothing
saved) any result that rewrote, added, or dropped words. The model may only
move whitespace and re-attach quotes, plus delete a tiny budget of artifact
characters (e.g. a duplicated drop-cap). The core reformatManuscriptText() is
exported so the importer can reuse it (PLAN item) to clean prose at ingest.

- prompt: data.reference/prompts/stages/manuscript-reformat.md (+ stage-config)
- migration 087 seeds the stage into existing installs (boot runs migrations,
  not setup-data)
- service: reformatManuscriptText / reformatManuscriptSection in manuscriptFix.js
- route: POST /pipeline/series/:id/manuscript/sections/:issueId/reformat
- client: reformatPipelineManuscriptSection + the second header button
…hange path

- the proportional deletion budget let a large section silently absorb a dropped contiguous clause (a clause is a valid subsequence deletion). Replaced the run-length heuristic (which under-reported when deleted chars coincidentally matched later text) with a small ABSOLUTE total-deletion budget: for a subsequence, the net length drop is exactly the total number of characters deleted, so an 8-char cap rejects any dropped clause or sentence regardless of document size while still allowing a few scattered artifact glyphs (e.g. a duplicated drop-cap).
- the no-change reformat path no longer flashes a 'saved' badge or rewrites the baseline; it only toasts 'Already well-formatted'.
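The arithmetic behind switching from a proportional to an absolute budget can be shown with a small sketch. The 8-character cap comes from the commit message; the 1% proportional rate and the function names are assumptions for illustration, and both checks take skeleton lengths on the premise stated above that, for a subsequence, the net length drop equals the total deleted.

```javascript
// Hypothetical comparison of the two deletion-budget policies.
// Inputs are skeleton lengths; for a subsequence, the net drop equals the
// total number of deleted characters.
const ABSOLUTE_CAP = 8;         // from the commit message
const PROPORTIONAL_RATE = 0.01; // assumed rate, for illustration only

function netDrop(originalLength, reformattedLength) {
  return originalLength - reformattedLength;
}

function passesProportional(originalLength, reformattedLength) {
  return netDrop(originalLength, reformattedLength) <= originalLength * PROPORTIONAL_RATE;
}

function passesAbsolute(originalLength, reformattedLength) {
  return netDrop(originalLength, reformattedLength) <= ABSOLUTE_CAP;
}

// A 5,000-character skeleton that loses a 30-character clause:
console.log(passesProportional(5000, 4970)); // true  (30 <= 50: slips through)
console.log(passesAbsolute(5000, 4970));     // false (30 > 8: rejected)
// A single stray drop-cap removed:
console.log(passesAbsolute(5000, 4999));     // true
```

The proportional budget grows with the section, so the longer the chapter, the longer the clause it can swallow; the absolute cap stays the same size no matter how long the section is.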
…, edit-safe

Three correctness fixes from codex review of the AI reformat path:

1. Unsaved edits were ignored — the endpoint loaded the SAVED stage text, so
   reformatting with unsaved edits in the textarea reformatted stale text and
   the response overwrote the edits. The endpoint is now compute-only and takes
   the client's live `content`; the client sends its current (possibly unsaved)
   text.

2. Stale overwrite during the slow call — the endpoint no longer persists; the
   client owns the save and, before applying, checks the section's live content
   still equals what it sent (via a live-sections ref), discarding the result
   instead of clobbering a mid-call edit. Cross-tab concurrent saves are
   last-write-wins, identical to the Save button.

3. Integrity guard accepted short word deletions — `do not go` → `do go`
   (3 skeleton chars) slipped past the deletion budget yet inverts meaning.
   Replaced the budget/subsequence allowance with an EXACT skeleton match: the
   model may only move whitespace and quotation-mark/punctuation glyphs, never
   delete a letter or word. A duplicated word stays put (the deterministic
   Format button owns that dedup). Prompt updated to match.

Route moves to POST /series/:id/manuscript/reformat (compute-only, no issue
mutation). reformatManuscriptText stays exported for the importer.
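The edit-safe client flow described in items 1 and 2 could be sketched as below. All names here (`reformatSection`, `getLiveContent`, `requestReformat`, `applyAndSave`, `notify`) and the discard message are hypothetical. The request function is injected so the flow runs without a server, and the real client wrapper and payload may differ.

```javascript
// Hypothetical sketch of the client-side apply step for a compute-only
// reformat endpoint. Returns "applied", "unchanged", or "discarded".
async function reformatSection({ getLiveContent, requestReformat, applyAndSave, notify }) {
  // 1. Send the live (possibly unsaved) text, not the last saved version.
  const sent = getLiveContent();
  const { content: reformatted } = await requestReformat({ content: sent });

  // 2. If the section was edited during the slow call, drop the result
  //    instead of overwriting the user's edit.
  if (getLiveContent() !== sent) {
    notify("Section changed during reformat; result discarded");
    return "discarded";
  }

  // No-change path: no save, no baseline rewrite, just a toast.
  if (reformatted === sent) {
    notify("Already well-formatted");
    return "unchanged";
  }

  // 3. The client owns the save; the endpoint never persisted anything.
  await applyAndSave(reformatted);
  return "applied";
}
```

Comparing against the exact string that was sent, rather than a saved copy, is what makes a mid-call edit detectable. Cross-tab saves are still last-write-wins, as item 2 notes.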
…sitive

The skeleton was lowercased before comparing, so a case-only rewrite (US → us, a de-capitalized name or heading) passed the guard despite the 'every letter preserved exactly' contract. No reformat operation changes letter case, so the guard now compares case-sensitively and rejects case-only changes too.
…y skeleton

The skeleton ignored all non-alphanumerics, so a mangled contraction (don't → dont) passed. Keep apostrophes (curly normalized to straight, so a benign smart-quote pass isn't flagged) — they're never touched by a legitimate reformat. Documented the residual cases that genuinely can't be guarded without breaking reformatting: a mid-word hyphen removal (X-ray → Xray) is indistinguishable from de-hyphenating a wrap-split word, and a removed inter-word boundary (now here → nowhere) from a drop-cap rejoin (T\nhe → The). Changelog scoped to the actual guarantee.
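The final guard from the last two commits (an exact, case-sensitive skeleton match that keeps apostrophes) might reduce to something like the following. `strictSkeleton` and `passesGuard` are hypothetical names, and treating both curly single quotes (U+2018 and U+2019) as apostrophes is an assumption.

```javascript
// Hypothetical sketch of the final guard: an exact, case-sensitive skeleton match.
// The skeleton keeps letters, digits, and apostrophes; curly single quotes are
// normalized to a straight apostrophe (assumed set of glyphs), so a benign
// smart-quote pass is not flagged. Everything else (whitespace, double quotes,
// hyphens, other punctuation) is dropped and may be moved freely.
function strictSkeleton(text) {
  return text
    .replace(/[\u2018\u2019]/g, "'")
    .replace(/[^\p{L}\p{N}']/gu, "");
}

// Accept only if no letter, digit, or apostrophe was added, removed, or re-cased.
function passesGuard(original, reformatted) {
  return strictSkeleton(original) === strictSkeleton(reformatted);
}

console.log(passesGuard("do not go", "do go"));   // false: a word was deleted
console.log(passesGuard("US", "us"));             // false: case-only change
console.log(passesGuard("don't", "dont"));        // false: apostrophe dropped
console.log(passesGuard("T\nhe cat", "The cat")); // true: drop-cap rejoined
console.log(passesGuard("X-ray", "Xray"));        // true: residual case
console.log(passesGuard("now here", "nowhere"));  // true: residual case
```

The last two calls pass, matching the residual cases the commit message documents as impossible to guard without also rejecting legitimate de-hyphenation and drop-cap rejoins.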
atomantic merged commit 25ec5e2 into main on Jun 15, 2026
2 checks passed
atomantic deleted the feat/manuscript-ai-reformat branch on June 15, 2026, 17:48