POST /api/v1/pdf-to-json
PDF to structured JSON.
Tables, sections, types.
Feed in any PDF — financial reports, research papers, government forms — and get back clean JSON with tables, sections, key-value pairs, and typed columns. No regex. No templates.
How it works
Upload a PDF
Multipart file upload or base64 JSON — works with scans too.
AI reads every page
Page-by-page extraction with rolling context. Tables, sections, key-value pairs detected.
Get typed JSON
Columns are typed (number, currency, percentage, date, boolean). Tables spanning pages are merged.
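The submit-then-poll flow above can be sketched in Python. This is a minimal sketch, not an official client: `wait_for_job` and `http_get_status` are hypothetical helper names. Only the `processing` status appears in the example response, so treating any other status as finished is an assumption.

```python
import json
import time
import urllib.request
from typing import Callable

BASE_URL = "https://api.contexa.works/api/v1"

# Assumption: "processing" is the only in-progress status; any other value
# is treated as terminal. The actual set of statuses is not documented here.
IN_PROGRESS = {"processing"}


def http_get_status(job_id: str) -> dict:
    """Fetch the job document from the jobs endpoint (mirrors the curl call)."""
    with urllib.request.urlopen(f"{BASE_URL}/jobs/{job_id}") as resp:
        return json.load(resp)


def wait_for_job(
    get_status: Callable[[str], dict],
    job_id: str,
    interval: float = 2.0,
    timeout: float = 300.0,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Poll until the job leaves the in-progress state or the timeout elapses."""
    waited = 0.0
    while True:
        job = get_status(job_id)
        if job.get("status") not in IN_PROGRESS:
            return job
        if waited >= timeout:
            raise TimeoutError(f"job {job_id} still processing after {timeout}s")
        sleep(interval)
        waited += interval
```

Calling `wait_for_job(http_get_status, "abc-123")` would block until the job stops reporting `processing`. Passing the status function as a parameter keeps the loop easy to exercise without network access.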
# Submit a PDF
curl -X POST https://api.contexa.works/api/v1/pdf-to-json \
-H "x-rapidapi-key: YOUR_KEY" \
-F "file=@report.pdf"
# Response: { "jobId": "abc-123", "status": "processing" }
# Poll for status
curl https://api.contexa.works/api/v1/jobs/abc-123
# Get JSON result
curl https://api.contexa.works/api/v1/jobs/abc-123/result
What sets it apart
Typed columns
Numbers, currency, percentages, dates, booleans — detected per column, not just strings.
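To illustrate what per-column typing means, here is a simplified sketch that classifies each cell with regular expressions and assigns a type only when all non-empty cells agree. It is not the service's actual detection logic; `classify` and `infer_column_type` are hypothetical names, and only ISO `YYYY-MM-DD` dates are recognised.

```python
import re
from datetime import datetime

_CURRENCY = re.compile(r"^[-(]?[$€£]\s?\d[\d,]*(\.\d+)?\)?$")
_PERCENT = re.compile(r"^-?\d+(\.\d+)?\s?%$")
_NUMBER = re.compile(r"^-?\d[\d,]*(\.\d+)?$")
_BOOLEAN = {"true", "false", "yes", "no"}


def _is_iso_date(value: str) -> bool:
    try:
        datetime.strptime(value, "%Y-%m-%d")
        return True
    except ValueError:
        return False


def classify(value: str) -> str:
    """Return the type label for a single cell."""
    v = value.strip()
    if v.lower() in _BOOLEAN:
        return "boolean"
    if _CURRENCY.match(v):
        return "currency"
    if _PERCENT.match(v):
        return "percentage"
    if _NUMBER.match(v):
        return "number"
    if _is_iso_date(v):
        return "date"
    return "string"


def infer_column_type(values: list[str]) -> str:
    """Type a column only if every non-empty cell agrees; else fall back to string."""
    kinds = {classify(v) for v in values if v.strip()}
    return kinds.pop() if len(kinds) == 1 else "string"
```

Falling back to `string` on any disagreement is a conservative choice: a column that mixes `10` and `n/a` stays untyped instead of being coerced.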
Multi-page tables
Tables spanning multiple pages are merged automatically using column-name matching.
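Column-name matching can be pictured as follows. This sketch assumes each page yields table fragments shaped like `{"columns": [...], "rows": [...]}` (a hypothetical shape, not the documented response schema) and merges consecutive fragments whose normalised headers are identical.

```python
def _normalize(name: str) -> str:
    """Lowercase and collapse whitespace so ' Date ' matches 'date'."""
    return " ".join(name.lower().split())


def merge_tables(fragments: list[dict]) -> list[dict]:
    """Merge consecutive fragments, given in page order, that share column names."""
    merged: list[dict] = []
    for table in fragments:
        key = [_normalize(c) for c in table["columns"]]
        if merged and merged[-1]["_key"] == key:
            # Same header as the previous fragment: treat it as a continuation.
            merged[-1]["rows"].extend(table["rows"])
        else:
            merged.append(
                {
                    "_key": key,
                    "columns": list(table["columns"]),
                    "rows": list(table["rows"]),
                }
            )
    # Drop the internal matching key before returning.
    return [{"columns": t["columns"], "rows": t["rows"]} for t in merged]
```

Only adjacent fragments are merged here, so two unrelated tables that happen to share headers but are separated by a different table remain distinct.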
Sections & headings
Document structure is preserved — headings, body text, and nested tables.
Key-value pairs
Form fields, metadata, and label-value pairs extracted separately from tables.
Custom prompts
Tell the AI what to focus on or how to transform the output.
Date locale detection
Detects US/UK/EU date formats from currency, addresses, and spelling clues.
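Why locale matters: `03/04/2024` is 4 March in the US and 3 April in the UK and EU. The sketch below is a rough illustration of clue-based detection, using only currency symbols and a few spellings (address clues are omitted). The hint lists and the names `guess_date_order` and `parse_ambiguous` are assumptions, not the service's implementation.

```python
from datetime import date

# Illustrative hints only; a real detector would need far more signals.
US_HINTS = ("$", "usd", "color", "center")
DMY_HINTS = ("£", "€", "gbp", "eur", "colour", "centre")


def guess_date_order(text: str) -> str:
    """Return 'MDY', 'DMY', or 'unknown' based on which hints dominate."""
    lowered = text.lower()
    us = sum(lowered.count(h) for h in US_HINTS)
    dmy = sum(lowered.count(h) for h in DMY_HINTS)
    if us == dmy:
        return "unknown"
    return "MDY" if us > dmy else "DMY"


def parse_ambiguous(value: str, order: str) -> date:
    """Parse a slash-separated date using the detected field order."""
    if order not in ("MDY", "DMY"):
        raise ValueError(f"cannot parse {value!r} without a known date order")
    first, second, year = (int(part) for part in value.split("/"))
    month, day = (first, second) if order == "MDY" else (second, first)
    return date(year, month, day)
```

Refusing to parse when the order is `unknown` avoids silently swapping day and month in documents with no usable clues.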
See it on your own PDF
Drop a PDF in the playground. Structured JSON in seconds. No signup.
Try it now