POST /api/v1/pdf-to-json

PDF to structured JSON.
Tables, sections, types.

Feed in any PDF — financial reports, research papers, government forms — and get back clean JSON with tables, sections, key-value pairs, and typed columns. No regex. No templates.

How it works

01

Upload a PDF

Multipart file upload or base64 JSON — works with scans too.

02

AI reads every page

Pages are extracted one at a time with rolling context carried forward from earlier pages. Tables, sections, and key-value pairs are detected along the way.

03

Get typed JSON

Columns are typed (number, currency, percentage, date, boolean). Tables that span several pages are merged into one.
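To make the "rolling context" idea in step 02 concrete, here is a minimal Python sketch. It is an illustration only, not the service's implementation: the function names, the `window` parameter, and the shape of the per-page result are all assumptions, and `toy_extractor` stands in for the model call.

```python
from collections import deque
from typing import Callable, Deque, Dict, List

# Hypothetical signature: takes one page's text plus notes from earlier pages,
# returns a dict describing that page, including a short "note" to carry forward.
PageExtractor = Callable[[str, List[str]], Dict[str, object]]


def extract_with_rolling_context(
    pages: List[str], extract_page: PageExtractor, window: int = 2
) -> List[Dict[str, object]]:
    """Run extract_page on each page, passing notes from the last `window` pages."""
    context: Deque[str] = deque(maxlen=window)  # oldest notes drop off automatically
    results: List[Dict[str, object]] = []
    for page_text in pages:
        result = extract_page(page_text, list(context))
        results.append(result)
        context.append(str(result.get("note", "")))
    return results


def toy_extractor(page_text: str, context: List[str]) -> Dict[str, object]:
    """Stand-in for the model call: records how many earlier notes it was given."""
    return {"note": page_text[:20], "context_seen": len(context)}
```

Keeping the context bounded means a long document does not grow the input for later pages, while a table that continues from the previous page can still be recognized.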

# Submit a PDF
curl -X POST https://api.contexa.works/api/v1/pdf-to-json \
  -H "x-rapidapi-key: YOUR_KEY" \
  -F "file=@report.pdf"

# Response: { "jobId": "abc-123", "status": "processing" }

# Poll for status
curl https://api.contexa.works/api/v1/jobs/abc-123

# Get JSON result
curl https://api.contexa.works/api/v1/jobs/abc-123/result
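
The same submit, poll, fetch flow can be wrapped in a small client. The Python sketch below covers the polling half and makes several assumptions that the curl examples do not confirm: that the job endpoints accept the `x-rapidapi-key` header, that any status other than "processing" means the job has finished, and that a 2-second interval is reasonable. The helper names are hypothetical.

```python
import json
import time
import urllib.request
from typing import Callable, Dict

BASE_URL = "https://api.contexa.works/api/v1"


def fetch_json(url: str, api_key: str) -> Dict:
    """GET a URL and decode the JSON body. Sending the key here is an assumption."""
    request = urllib.request.Request(url, headers={"x-rapidapi-key": api_key})
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)


def wait_for_result(
    job_id: str,
    api_key: str,
    fetch: Callable[[str, str], Dict] = fetch_json,
    interval: float = 2.0,
    max_attempts: int = 60,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict:
    """Poll the job until its status is no longer "processing", then fetch the result."""
    for _ in range(max_attempts):
        job = fetch(f"{BASE_URL}/jobs/{job_id}", api_key)
        # Only "processing" appears in the example response above; treating
        # every other value as terminal is an assumption of this sketch.
        if job.get("status") != "processing":
            return fetch(f"{BASE_URL}/jobs/{job_id}/result", api_key)
        sleep(interval)
    raise TimeoutError(f"job {job_id} still processing after {max_attempts} polls")
```

A call would look like `wait_for_result("abc-123", "YOUR_KEY")`. The `fetch` and `sleep` parameters exist so the loop can be exercised without network access.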

What sets it apart

Typed columns

Numbers, currency, percentages, dates, booleans — detected per column, not just strings.
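
As a rough picture of what "detected per column" can mean, the Python sketch below assigns a type only when every non-empty cell in a column agrees. It is a simplified rule-based illustration, not how the service works; the type names, patterns, and accepted date formats are assumptions.

```python
import re
from datetime import datetime
from typing import List

_CURRENCY = re.compile(r"^[$€£]\s?-?\d[\d,]*(\.\d+)?$")
_PERCENT = re.compile(r"^-?\d+(\.\d+)?\s?%$")
_NUMBER = re.compile(r"^-?\d[\d,]*(\.\d+)?$")
_BOOLEANS = {"true", "false", "yes", "no"}


def _is_date(value: str) -> bool:
    """True if the cell parses under one of a few common numeric date formats."""
    for fmt in ("%Y-%m-%d", "%d/%m/%Y", "%m/%d/%Y"):
        try:
            datetime.strptime(value, fmt)
            return True
        except ValueError:
            continue
    return False


def infer_column_type(values: List[str]) -> str:
    """Return one type for the whole column, falling back to "string"."""
    cells = [v.strip() for v in values if v and v.strip()]  # ignore empty cells
    if not cells:
        return "string"
    checks = [
        ("boolean", lambda c: c.lower() in _BOOLEANS),
        ("currency", lambda c: bool(_CURRENCY.match(c))),
        ("percentage", lambda c: bool(_PERCENT.match(c))),
        ("number", lambda c: bool(_NUMBER.match(c))),
        ("date", _is_date),
    ]
    for name, matches in checks:
        if all(matches(c) for c in cells):
            return name
    return "string"
```

Deciding per column rather than per cell keeps a stray "n/a" from producing a column that is half numbers and half strings: the whole column falls back to "string".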

Multi-page tables

Tables spanning multiple pages are merged automatically using column-name matching.
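
One way column-name matching can work is sketched below in Python: a table is appended to the previous one when it starts on the next page and its header matches after normalizing case and whitespace. The input shape (`page`, `columns`, `rows`) and the output keys are assumptions for illustration, not the API's actual schema.

```python
from typing import Any, Dict, List

# Assumed input shape: {"page": int, "columns": [str, ...], "rows": [[...], ...]}
Table = Dict[str, Any]


def _normalize(columns: List[str]) -> List[str]:
    """Lowercase each header and collapse whitespace so minor differences match."""
    return [" ".join(c.lower().split()) for c in columns]


def merge_continued_tables(tables: List[Table]) -> List[Table]:
    """Merge tables on consecutive pages that share the same column names."""
    merged: List[Table] = []
    for table in tables:
        previous = merged[-1] if merged else None
        if (
            previous is not None
            and table["page"] == previous["last_page"] + 1
            and _normalize(table["columns"]) == _normalize(previous["columns"])
        ):
            # Same header on the very next page: treat it as a continuation.
            previous["rows"].extend(table["rows"])
            previous["last_page"] = table["page"]
        else:
            merged.append({
                "columns": list(table["columns"]),
                "rows": list(table["rows"]),
                "first_page": table["page"],
                "last_page": table["page"],
            })
    return merged
```

Requiring consecutive pages as well as matching headers avoids gluing together two unrelated tables that happen to share column names.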

Sections & headings

Document structure is preserved — headings, body text, and nested tables.

Key-value pairs

Form fields, metadata, and label-value pairs extracted separately from tables.

Custom prompts

Tell the AI what to focus on or how to transform the output.

Date locale detection

Detects US/UK/EU date formats from currency, addresses, and spelling clues.
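
Why this matters: 03/04/2024 is 3 April in the UK and EU but March 4 in the US. The Python sketch below shows the general idea with a deliberately tiny clue list that reduces the question to day-first versus month-first. The clue words and function names are hypothetical, and the service presumably uses far richer signals.

```python
import re
from datetime import date
from typing import Optional

# Hypothetical clue lists; a real detector would use many more signals.
_DAY_FIRST_CLUES = ("£", "€", "colour", "organisation", "postcode")
_MONTH_FIRST_CLUES = ("$", "color", "organization", "zip code")


def guess_day_first(document_text: str) -> Optional[bool]:
    """True for day-first (UK/EU), False for month-first (US), None if unclear."""
    text = document_text.lower()
    day_first = sum(text.count(clue) for clue in _DAY_FIRST_CLUES)
    month_first = sum(text.count(clue) for clue in _MONTH_FIRST_CLUES)
    if day_first == month_first:
        return None
    return day_first > month_first


def parse_numeric_date(value: str, day_first: bool) -> date:
    """Parse a three-part date such as 03/04/2024 using the locale hint."""
    first, second, year = (int(part) for part in re.split(r"[/.-]", value.strip()))
    # A field above 12 can only be a day, so it overrides the locale hint.
    if first > 12:
        day_first = True
    elif second > 12:
        day_first = False
    day, month = (first, second) if day_first else (second, first)
    return date(year, month, day)
```

For example, `parse_numeric_date("03/04/2024", day_first=True)` gives 3 April 2024, while the same string with `day_first=False` gives 4 March 2024.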

See it on your own PDF

Drop a PDF in the playground. Structured JSON in seconds. No signup.
