PDF extraction that checks its own work. #2 reading order accuracy — zero AI, zero GPU, zero cost.
Updated Jun 10, 2026 · Python
Extract structured data using local or remote LLMs
A schema-driven framework for LLM structured extraction enhanced by multi-stage RL training (SFT→DPO→GRPO), with interpretable reward design and end-to-end reproducibility.
Reproducible diagnostic investigation of a fine-tuned SLM that scored 99.75% on evaluation and failed silently on 10% of production inputs. Full pipeline. Every number verified.
Claude Code Skill for structured information extraction from code/docs/logs. 6-step Python pipeline (source grounding, dedup, confidence scoring, entity resolution, relation inference, KG injection). Zero dependencies, no API keys. Replaces LangExtract.
Collection of purpose-built MCP servers for AI agent workflows.
Auditable LLM extraction for Java: structured output with source citations, PDF bounding boxes, confidence, provenance, and audit JSON.
A simple LLM library
Send your low-confidence document extractions. A human reviews them against the PDF and returns a typed Pydantic/Zod response. Managed document verification for AI agents. PDF + handwritten OCR. Client-side fragmentation: full document never leaves your machine. $0.80/page + $5 free credit. Express 30-min SLA. Built on open source awaithumans.
news-summizr extracts structured summaries from headlines, labeling key points such as announcement, products, and region for quick insight.
A package for structured, reliable extraction of key insights from user-provided texts about cultural topics. It accepts a text input, such as an article or discussion prompt
Structured CV extraction with strict JSON schema and anti-hallucination guarantees.
Turn tutorial videos into structured specs — Pine Script, recipes, code walkthroughs
Automated research paper analysis: PDF → JSON with evidence extraction using LLMs (DeepSeek, Gemma). Extracts methods, results, datasets, and claims with precise evidence grounding.
Schema-driven evaluation for LLM JSON extraction: JSON evaluation, structured-extraction, benchmark
Automated prompt optimization using mentor-agent architecture. Generate and refine prompts from labeled data.
Parses SEC EDGAR Form 10-K annual reports into standardized JSON, automatically identifying the content and status of every Item
AI-powered travel agency assistant: a stateful LangGraph agent on Telegram that captures preferences through natural conversation, generates personalized itineraries via Groq/Llama 3.3, auto-manages leads in Excel, and remembers returning users. Built with LangChain, FastAPI, and python-telegram-bot.
ReAct-based intelligent analysis Agent with 4-layer architecture (Skill-Agent-LLMService-Tool), dual tool-calling modes (Native FC / Prompt-based), triple execution engine (Offline/Fast/Agent), incremental reflection with convergence detection, Skill template system, SSE streaming, Prometheus monitoring, and SFT trajectory export.
Human-in-the-loop LLM orchestration with structured signal extraction and session persistence. Annotate confusion and curiosity—feedback shapes responses, topology accumulates over time. API-first design, no gamification. FastAPI + Claude + SQLite + D3.