High-throughput HTML parser + CSS selector engine for Zig.
Performance numbers are not conformance claims. The parser is intentionally permissive and currently does not fully match browser-grade tree-construction behavior.
- Conformance details: Documentation#conformance-status
- Benchmark methodology: Documentation#performance-and-benchmarks
- Raw outputs: bench/results/latest.md, bench/results/latest.json
Latest benchmark snapshot (throughput, higher is better). Source: bench/results/latest.json (stable profile).
ours-compact │████████████████████│ 1736.88 MB/s (100.00%)
ours-full │███████████████████░│ 1671.17 MB/s (96.22%)
lol-html │███████████░░░░░░░░░│ 993.44 MB/s (57.20%)
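The percentage column appears to be each entry's throughput relative to the fastest entry, ours-compact (100.00%). For lol-html, for example:

$$
100 \times \frac{993.44\ \text{MB/s}}{1736.88\ \text{MB/s}} \approx 57.20\%
$$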
| Profile | nwmatcher | qwery_contextual | html5lib subset | WHATWG HTML parsing |
|---|---|---|---|---|
| strictest/fastest | 20/20 (0 failed) | 54/54 (0 failed) | 524/600 (76 failed) | 440/500 (60 failed) |
Source: bench/results/external_suite_report.json
- 🔎 CSS selector queries: comptime, runtime, and cached runtime selectors.
- 🧭 DOM navigation: parent, siblings, first/last child, and children iteration.
- 💤 Lazy decode/normalize path: attribute/entity decoding and text normalization happen in query-time APIs rather than during parsing.
- 🧪 Debug tooling: selector mismatch diagnostics and instrumentation wrappers.
- 🧰 Parse profiles: strictest and fastest option bundles for benchmarks and workloads.
- 🧵 Destructive parsing by default for throughput, with an opt-in non-destructive, read-only mode.
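To illustrate why a destructive default can help throughput, here is a minimal sketch that decodes a few entities directly inside the caller's buffer. The decodeInPlace helper is hypothetical, not this library's API or implementation. Because each entity handled here decodes to a single byte that is shorter than its encoded form, the write cursor never overtakes the read cursor, so no allocation or copy is needed; the price is that the original bytes are overwritten.

```zig
const std = @import("std");

/// Hypothetical helper (not this library's API): decodes a few named
/// entities in place, overwriting `buf` and returning the shortened slice.
/// Each entity here decodes to one byte, so `w` can never pass `r`.
fn decodeInPlace(buf: []u8) []u8 {
    const Entity = struct { name: []const u8, ch: u8 };
    const entities = [_]Entity{
        .{ .name = "&amp;", .ch = '&' },
        .{ .name = "&lt;", .ch = '<' },
        .{ .name = "&gt;", .ch = '>' },
        .{ .name = "&quot;", .ch = '"' },
    };
    var r: usize = 0; // read cursor
    var w: usize = 0; // write cursor (always <= r)
    outer: while (r < buf.len) {
        if (buf[r] == '&') {
            for (entities) |e| {
                if (std.mem.startsWith(u8, buf[r..], e.name)) {
                    buf[w] = e.ch; // write the decoded byte over the source
                    w += 1;
                    r += e.name.len;
                    continue :outer;
                }
            }
        }
        // Not a known entity: copy the byte forward unchanged.
        buf[w] = buf[r];
        w += 1;
        r += 1;
    }
    return buf[0..w];
}

test "destructive decode rewrites the caller's buffer" {
    var input = "a &lt;b&gt; &amp; c".*;
    const decoded = decodeInPlace(&input);
    try std.testing.expectEqualStrings("a <b> & c", decoded);
    // The original encoded bytes are gone: the decoded text now fills the prefix.
    try std.testing.expect(std.mem.startsWith(u8, &input, "a <b> & c"));
}
```

This is also why the default mode needs a mutable input buffer, as in the var input = "...".* pattern of the quick-start example.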
const std = @import("std");
const html = @import("html");
// Default options: destructive parsing (the input buffer may be rewritten).
const options: html.ParseOptions = .{};
test "basic parse + query" {
    // Mutable copy of the markup, because the default mode parses in place.
    var input = "<div id='app'><a class='nav' href='/docs'>Docs</a></div>".*;
    var doc = try options.parse(std.testing.allocator, &input);
    defer doc.deinit();
    // query() returns an iterator over the matching nodes.
    var links = doc.query("div#app > a.nav");
    const a = links.next() orelse return error.TestUnexpectedResult;
    // Attribute values are decoded at query time; free the result when done.
    const href = (try a.getAttributeValue(std.testing.allocator, "href")) orelse return error.TestUnexpectedResult;
    defer href.free(&doc, std.testing.allocator);
    try std.testing.expectEqualStrings("/docs", href.value);
}

All parsing goes through options.parse(...). When the caller's bytes must remain unchanged, including read-only, file-backed memory maps, use const options: html.ParseOptions = .{ .non_destructive = true };. This mode reads the original source directly and does not make a full-source copy.
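In non-destructive mode the source must stay intact, so any decoded value has to live somewhere other than the input. A minimal sketch of that idea: the Decoded type and decodeNonDestructive helper are hypothetical, not this library's API. The helper borrows a slice of the source when nothing needs decoding and allocates a decoded copy only when it does. It handles only &amp; for brevity and relies on std.mem.replaceOwned from the Zig standard library.

```zig
const std = @import("std");

/// Hypothetical result type (not this library's API): either a borrowed
/// slice of the read-only source or an owned, decoded copy.
const Decoded = struct {
    value: []const u8,
    owned: bool,

    fn free(self: Decoded, allocator: std.mem.Allocator) void {
        if (self.owned) allocator.free(self.value);
    }
};

/// Hypothetical helper: decodes `&amp;` without ever writing to `src`.
/// Values with no '&' are borrowed (zero-copy); others are copied once.
fn decodeNonDestructive(allocator: std.mem.Allocator, src: []const u8) !Decoded {
    if (std.mem.indexOfScalar(u8, src, '&') == null) {
        return .{ .value = src, .owned = false };
    }
    const out = try std.mem.replaceOwned(u8, allocator, src, "&amp;", "&");
    return .{ .value = out, .owned = true };
}

test "non-destructive decode leaves the source bytes unchanged" {
    const gpa = std.testing.allocator;
    const source: []const u8 = "/docs?a=1&amp;b=2";

    const d = try decodeNonDestructive(gpa, source);
    defer d.free(gpa);
    try std.testing.expectEqualStrings("/docs?a=1&b=2", d.value);
    try std.testing.expectEqualStrings("/docs?a=1&amp;b=2", source); // untouched

    const plain = try decodeNonDestructive(gpa, "/docs");
    defer plain.free(gpa);
    try std.testing.expect(!plain.owned); // no allocation for plain values
}
```

Borrowing when possible keeps the common case (values with no entities) allocation-free, while the read-only source is never modified.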
Raw nodes are compact by default. Enabling store_last_child makes children().last() O(1), and enabling store_prev_sibling makes previous-sibling traversal O(1); without these options, those lookups fall back to scanning the sibling list.
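The trade-off can be seen in a minimal sketch of an index-based node array. The Node struct, the none sentinel, and prevSibling are hypothetical illustrations, not the library's actual node layout. With a stored prev_sibling link the lookup is a single array read; without it, the code walks forward from the parent's first child until it reaches the node, which costs time proportional to the number of preceding siblings.

```zig
const std = @import("std");

const none = std.math.maxInt(u32); // sentinel index meaning "no node"

/// Hypothetical compact node: links are indexes into a flat array.
/// `prev_sibling` is only consulted when `stored` is true, mirroring the
/// idea behind an opt-in previous-sibling link.
const Node = struct {
    parent: u32 = none,
    first_child: u32 = none,
    next_sibling: u32 = none,
    prev_sibling: u32 = none,
};

fn prevSibling(nodes: []const Node, i: u32, stored: bool) u32 {
    if (stored) return nodes[i].prev_sibling; // O(1): one read
    // Fallback scan: walk the parent's child list until we reach `i`.
    // Assumes `i` really is a child of its recorded parent.
    const p = nodes[i].parent;
    if (p == none) return none;
    var prev: u32 = none;
    var cur = nodes[p].first_child;
    while (cur != i) : (cur = nodes[cur].next_sibling) {
        prev = cur;
    }
    return prev;
}

test "stored and scanned previous-sibling lookups agree" {
    // root(0) with children 1 -> 2 -> 3
    const nodes = [_]Node{
        .{ .first_child = 1 },
        .{ .parent = 0, .next_sibling = 2 },
        .{ .parent = 0, .next_sibling = 3, .prev_sibling = 1 },
        .{ .parent = 0, .prev_sibling = 2 },
    };
    var i: u32 = 1;
    while (i < nodes.len) : (i += 1) {
        try std.testing.expectEqual(prevSibling(&nodes, i, true), prevSibling(&nodes, i, false));
    }
    try std.testing.expectEqual(@as(u32, 2), prevSibling(&nodes, 3, false));
    try std.testing.expectEqual(@as(u32, none), prevSibling(&nodes, 1, false));
}
```

The same reasoning applies to a last-child link: storing it turns a walk over the whole child list into one read, at the cost of one more index per node.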
-Dintlen=u16|u32|u64|usize selects the integer width used for document spans and node indexes.
- Smaller widths reduce memory use but also reduce the maximum parseable input size.
- u32 is the default. Use u64 for multi-gigabyte inputs, since u32 offsets top out just under 4 GiB.
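The arithmetic behind the -Dintlen trade-off can be checked at compile time. This sketch assumes spans are byte offsets into the source, and uses a hypothetical five-field CompactNode type (not the library's real node layout) to show how per-node size scales with the chosen width.

```zig
const std = @import("std");

/// Largest byte offset an integer of type `T` can hold.
/// Assumption: spans are byte offsets, so this bounds the input size
/// (the library may reserve a value or two as sentinels).
fn maxInputBytes(comptime T: type) u64 {
    return std.math.maxInt(T);
}

/// Hypothetical node with five fields of width `T` (not the real layout).
/// `extern` fixes a C-style layout so the size is predictable.
fn CompactNode(comptime T: type) type {
    return extern struct {
        parent: T,
        first_child: T,
        next_sibling: T,
        span_start: T,
        span_end: T,
    };
}

test "index width trades memory for maximum input size" {
    // u16: 10-byte nodes, but inputs are capped near 64 KiB.
    try std.testing.expectEqual(@as(usize, 10), @sizeOf(CompactNode(u16)));
    try std.testing.expectEqual(@as(u64, 65_535), maxInputBytes(u16));
    // u32 (default): 20-byte nodes, inputs up to about 4 GiB.
    try std.testing.expectEqual(@as(usize, 20), @sizeOf(CompactNode(u32)));
    try std.testing.expectEqual(@as(u64, 4_294_967_295), maxInputBytes(u32));
    // u64: 40-byte nodes, no practical input limit.
    try std.testing.expectEqual(@as(usize, 40), @sizeOf(CompactNode(u64)));
}
```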
- Full manual: Documentation
- API details: Documentation#core-api
- Selector grammar: Documentation#selector-support
- Parse mode guidance: Documentation#mode-guidance
- Non-destructive parsing: Documentation#non-destructive-parsing
- Conformance: Documentation#conformance-status
- Architecture: Documentation#architecture
- Troubleshooting: Documentation#troubleshooting
zig build test
zig build docs-check
zig build examples-check
zig build ship-check
Examples:
- examples/basic_parse_query.zig
- examples/runtime_selector.zig
- examples/cached_selector.zig
- examples/query_time_decode.zig
- examples/inner_text_options.zig
- examples/non_destructive_parse.zig
MIT. See LICENSE.