Shared CPDF_Parser + per-layer overlay holder#23

Open
bobsingor wants to merge 25 commits into embedpdf/main from embedpdf/feature/read-purity

@bobsingor
Contributor

No description provided.

bobsingor added 25 commits May 8, 2026 23:07
Harden layer read-purity by preventing layer page lookup and name-tree read APIs from entering mutable PDF graph paths.

This change:

- Adds a CPDF_LayerDocument::GetPageDictionary() override so unresolved layer page slots fail closed instead of running PDFium’s mutable page-tree fallback.
- Adds CPDF_NameTree::CreateForReading() and moves named destinations, attachments, JavaScript actions, and form-fill JavaScript scanning onto const traversal.
- Keeps mutating name-tree operations on the existing mutable creation path.
- Expands read-only layer canaries across render/cache, annotation, catalog, name-tree, malformed page-tree, and sibling-layer isolation cases.
- Verifies read-only layer workflows save empty deltas while peer layer mutations remain isolated.
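As a rough illustration of the const-traversal idea behind `CPDF_NameTree::CreateForReading()`, a read-only lookup might look like the sketch below. `DemoNameNode` and `LookupForReading` are hypothetical stand-ins, not PDFium types, and the vectors here only approximate a name tree's `/Names` and `/Kids` arrays.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Hypothetical, simplified name-tree node; not a PDFium type.
struct DemoNameNode {
  std::vector<std::pair<std::string, int>> names;  // Leaf entries: name -> value.
  std::vector<DemoNameNode> kids;                  // Child nodes.
};

// Read-only lookup. Because the node is taken by const reference, the
// compiler rejects any attempt to insert into or reorder the tree here.
const int* LookupForReading(const DemoNameNode& node, const std::string& name) {
  for (const auto& entry : node.names) {
    if (entry.first == name)
      return &entry.second;
  }
  for (const DemoNameNode& kid : node.kids) {
    if (const int* found = LookupForReading(kid, name))
      return found;
  }
  return nullptr;  // A missing name fails closed; no entry is created.
}

// Small usage check on a two-level tree.
inline void LookupForReadingExample() {
  DemoNameNode leaf;
  leaf.names.push_back({"dest1", 42});
  DemoNameNode root;
  root.kids.push_back(leaf);
  const int* hit = LookupForReading(root, "dest1");
  assert(hit && *hit == 42);
  assert(LookupForReading(root, "missing") == nullptr);
}
```

The same fail-closed shape applies to the page lookup override: an unresolved slot yields a null result rather than triggering a repair pass.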
Make `CPDF_Creator::WriteNewObjs()` reachability-aware so newly allocated indirect objects that are no longer referenced do not get written into saved PDFs. The save reachability set now includes normal document graph references plus trailer-owned roots such as `/Info` and non-inline `/Encrypt`, avoiding dangling trailer references while still pruning true orphans.

Also refactor `EPDF_SetEncryption()` to use an inline encryption dictionary so it follows CPDF_Creator’s existing trailer-owned encrypt path, and update Bug1206 to assert render-only saves remain stable while reopened rendering still matches.
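The pruning rule can be sketched as a plain graph reachability pass. This is a minimal model under stated assumptions, not the code in `CPDF_Creator`: `ObjectGraph`, `CollectReachable`, and `SelectNewObjectsToWrite` are hypothetical names, and objects are reduced to the object numbers they reference.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <set>
#include <vector>

using ObjNum = std::uint32_t;
// Hypothetical object graph: object number -> object numbers it references.
using ObjectGraph = std::map<ObjNum, std::vector<ObjNum>>;

// Collects every object number reachable from the roots (iterative DFS).
std::set<ObjNum> CollectReachable(const ObjectGraph& graph,
                                  const std::vector<ObjNum>& roots) {
  std::set<ObjNum> seen;
  std::vector<ObjNum> stack(roots.begin(), roots.end());
  while (!stack.empty()) {
    const ObjNum objnum = stack.back();
    stack.pop_back();
    if (!seen.insert(objnum).second)
      continue;  // Already visited.
    const auto it = graph.find(objnum);
    if (it == graph.end())
      continue;  // Target not in the graph; nothing to follow.
    stack.insert(stack.end(), it->second.begin(), it->second.end());
  }
  return seen;
}

// Keeps only the new objects that are reachable from the roots. The roots
// are assumed to include the catalog plus trailer-owned entries such as /Info.
std::vector<ObjNum> SelectNewObjectsToWrite(
    const ObjectGraph& graph,
    const std::vector<ObjNum>& roots,
    const std::vector<ObjNum>& new_objects) {
  const std::set<ObjNum> reachable = CollectReachable(graph, roots);
  std::vector<ObjNum> to_write;
  for (const ObjNum objnum : new_objects) {
    if (reachable.count(objnum))
      to_write.push_back(objnum);
  }
  assert(to_write.size() <= new_objects.size());
  return to_write;
}
```

Treating trailer-owned objects as extra roots is what keeps `/Info` from being pruned even though nothing in the catalog graph points at it.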
Introduce EPDF_LoadMemBaseDocument to load a shareable base PDF document from a memory buffer. Refactor loading logic into LoadBaseDocumentImpl that accepts a RetainPtr<IFX_SeekableReadStream> so both file-access and in-memory paths share parsing code. Implement the in-memory path using CFX_ReadOnlySpanStream and UNSAFE_BUFFERS, and update public/fpdfview.h with the new API and documentation. Add test coverage: register the C API symbol in fpdf_view_c_api_test and add an embedder test that loads a PDF from memory. Also add the necessary include headers.
Introduce a size_t-based API for loading base documents from memory: add internal LoadMemBaseDocumentImpl and a public EPDF_LoadMemBaseDocument64 that accepts size_t. Preserve the existing EPDF_LoadMemBaseDocument(int) for backward compatibility but add a guard to reject negative sizes. Update tests to exercise the new API and register it in the C API test. Also add documentation for EPDF_LoadMemBaseDocument64 in the public header and update the release comment to reference both variants.
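The compatibility guard amounts to rejecting negative sizes before widening to `size_t`. The sketch below shows that pattern only; `DemoDocument`, `LoadFromMemory64`, and `LoadFromMemory` are hypothetical stand-ins and do not reproduce the signatures of the `EPDF_LoadMemBaseDocument` functions.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for a loaded document handle.
struct DemoDocument {
  std::size_t size;
};

// size_t-based entry point (plays the role of the "64" variant).
// Returns false for a null buffer and leaves *out untouched.
bool LoadFromMemory64(const void* data, std::size_t size, DemoDocument* out) {
  assert(out != nullptr);
  if (!data)
    return false;
  out->size = size;
  return true;
}

// Legacy int-based entry point. The negative check runs before the cast,
// so -1 can never become a huge size_t value.
bool LoadFromMemory(const void* data, int size, DemoDocument* out) {
  if (size < 0)
    return false;
  return LoadFromMemory64(data, static_cast<std::size_t>(size), out);
}
```

Forwarding the legacy call into the wider one keeps a single code path to test, with the guard as the only behavior specific to the `int` variant.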
Introduce EPDFDoc_GetPageObjectNumberByIndex to return a page dictionary's indirect object number by zero-based page index without constructing a CPDF_Page (returns 0 for invalid indices, XFA pages, or direct objects). Add the function declaration to public/fpdfview.h, implement it in fpdfsdk/fpdf_view.cpp (with PDF_ENABLE_XFA guard), register it in the C API test, and add an embedder test verifying null/invalid inputs, non-parsing behavior, and consistency with EPDFPage_GetObjectNumber after loading the page.
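A caller-side view of the "0 means unavailable" contract might look like the following sketch. `DemoPageSlot` and `PageObjectNumberByIndex` are hypothetical and only model the described return values; the real function takes a document handle and also handles XFA pages.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical page-slot record. objnum == 0 models a page dictionary
// stored as a direct object, which has no indirect object number.
struct DemoPageSlot {
  std::uint32_t objnum;
};

// Returns the object number for the page at a zero-based index, or 0 when
// the table is null, the index is out of range, or the page is direct.
std::uint32_t PageObjectNumberByIndex(const std::vector<DemoPageSlot>* pages,
                                      int index) {
  if (!pages || index < 0)
    return 0;
  const auto i = static_cast<std::size_t>(index);
  if (i >= pages->size())
    return 0;
  return (*pages)[i].objnum;
}

// Small usage check: valid, direct, and out-of-range lookups.
inline void PageObjectNumberExample() {
  const std::vector<DemoPageSlot> pages = {{12}, {0}, {15}};
  assert(PageObjectNumberByIndex(&pages, 0) == 12);
  assert(PageObjectNumberByIndex(&pages, 1) == 0);
  assert(PageObjectNumberByIndex(&pages, 3) == 0);
}
```

Since 0 is never a valid indirect object number in a PDF, it works as a sentinel without a separate error channel.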
Make annotation dictionary parameters const-correct by changing GetAnnotAP, GetAnnotAPNoFallback, and HasAPStream to accept const CPDF_Dictionary*. Update FPDFAnnot_GetColor to use the const-returning GetAnnotDictFromFPDFAnnotation and pass the const dict to HasAPStream. Add an annot_index parameter to RawAnnotContext, forward it to the CPDF_AnnotContext base, and update the creation site to pass the index. These changes improve const-safety and propagate the annotation index into the annot context.
Add ObjectTreeReferenceResolveMode to control how CPDF_Reference targets are resolved (through their holder or via the traversed document). Update ObjectTreeTraverser and GetObjectsWithReferences to accept this mode (defaulting to the previous behavior). Use kEffectiveDocument when collecting reachable objects for layer documents so overlay/promoted objects from a layer override the frozen base graph and are included in saves. Also add a unit test (LayerArtifactIncludesNewAnnotObjectBodies) to verify layer artifacts include newly created annotation object bodies, and add a clarifying comment in CPDF_Creator::WriteOldObjs about incremental vs full saves.
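To make the effect of the resolve mode concrete, here is a small model in which an overlay graph can shadow a frozen base graph. Everything in it is an assumption for illustration: `DemoResolveMode`, its `kThroughHolder` enumerator, `ResolveRefs`, and `CollectWithMode` are hypothetical, and only the `kEffectiveDocument` name is taken from the commit message.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <set>
#include <vector>

using ObjNum = std::uint32_t;
// Object number -> object numbers it references.
using DemoGraph = std::map<ObjNum, std::vector<ObjNum>>;

enum class DemoResolveMode { kThroughHolder, kEffectiveDocument };

// Resolves an object number to its outgoing references. In effective-document
// mode the overlay is consulted first, so a promoted object shadows the base.
const std::vector<ObjNum>* ResolveRefs(const DemoGraph& base,
                                       const DemoGraph& overlay,
                                       ObjNum objnum,
                                       DemoResolveMode mode) {
  if (mode == DemoResolveMode::kEffectiveDocument) {
    const auto it = overlay.find(objnum);
    if (it != overlay.end())
      return &it->second;
  }
  const auto it = base.find(objnum);
  return it == base.end() ? nullptr : &it->second;
}

// Collects the object numbers visited from root under the given mode.
std::set<ObjNum> CollectWithMode(const DemoGraph& base,
                                 const DemoGraph& overlay,
                                 ObjNum root,
                                 DemoResolveMode mode) {
  std::set<ObjNum> seen;
  std::vector<ObjNum> stack{root};
  while (!stack.empty()) {
    const ObjNum cur = stack.back();
    stack.pop_back();
    if (!seen.insert(cur).second)
      continue;  // Already visited.
    if (const auto* refs = ResolveRefs(base, overlay, cur, mode))
      stack.insert(stack.end(), refs->begin(), refs->end());
  }
  assert(seen.count(root) == 1);
  return seen;
}
```

With a base of `1 -> 2` and an overlay that replaces object 2 with a version pointing at a new object 3, only the effective-document mode reaches object 3, which is the case the layer save path needs.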
Introduce EmbedPDF redaction support and a reusable helper to append Form XObjects to a page.

- Add public API (public/epdf_redact.h) exposing EPDFAnnot_ApplyRedaction, EPDFAnnot_ApplyRedactionWithReport, EPDFPage_ApplyRedactions, and EPDFPage_ApplyRedactionsWithReport to apply redaction annotations and report removed annotations.
- Implement redaction logic in fpdfsdk/epdf_redact.cpp: collects redaction areas, removes text, flattens optional RO streams, detaches widget entries from AcroForm, cascades popup removals, and writes removal reports with /NM UTF-8 handling.
- Add page content helper (fpdfsdk/epdf_page_content_helpers.*) to append Form XObjects to pages and use it from fpdfsdk/fpdf_annot.cpp (replacing inlined flattening code).
- Update BUILD.gn to include new sources and update public/fpdf_annot.h to expose the new API header.
- Add tests and resources to cover redaction behaviors (removal, reporting, popup cascade, widget handling, preservation of sibling REDACTs, and touch-only annotations).

These changes centralize redaction behavior, provide reporting for removed annotations, and factor Form XObject flattening into a reusable helper.