PDF to Excel

Extract tables from PDFs into a formatted Excel file. This API is asynchronous — submit a job, poll for status, then download the result.

1. Submit your PDF

POST your PDF to start an extraction job. The API returns immediately with a job ID while processing continues in the background.

curl — submit a job
curl -X POST https://api.contexa.works/api/v1/pdf-to-excel \
  -H "x-rapidapi-key: YOUR_API_KEY" \
  -F "file=@financial-report.pdf"
response — 202 Accepted
{
  "jobId": "d4e5f6a7-...",
  "status": "processing"
}

You can optionally pass a callbackUrl to receive a webhook when the job completes, instead of polling:

curl — with callback
curl -X POST https://api.contexa.works/api/v1/pdf-to-excel \
  -H "x-rapidapi-key: YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "base64": "'"$(base64 < report.pdf | tr -d '\n')"'",
    "callbackUrl": "https://your-app.com/webhooks/contexa"
  }'
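The JSON variant above can be sketched in Python with only the standard library. The field names base64 and callbackUrl and the endpoint come from the curl example; the helper name build_callback_request is a hypothetical name chosen for illustration, and nothing beyond what the curl example shows is assumed about the API.

```python
import base64
import json
import urllib.request

BASE = "https://api.contexa.works/api/v1"


def build_callback_request(
    pdf_bytes: bytes, callback_url: str, api_key: str
) -> urllib.request.Request:
    """Build (but do not send) the JSON submit request with a callbackUrl.

    Hypothetical helper: the name and signature are illustrative only.
    """
    payload = {
        # b64encode emits no line breaks, so the value is safe inside JSON
        "base64": base64.b64encode(pdf_bytes).decode("ascii"),
        "callbackUrl": callback_url,
    }
    return urllib.request.Request(
        f"{BASE}/pdf-to-excel",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "x-rapidapi-key": api_key,
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

To send it, pass the returned object to urllib.request.urlopen. Building the request separately from sending it makes the payload easy to inspect before any network call is made.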
2. Poll for completion

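Check the job status by polling with the job ID. The status will be processing, completed, or failed. Note that the status response identifies the job as id, whereas the submit response and the callback payload use jobId.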

curl — check status
curl https://api.contexa.works/api/v1/jobs/d4e5f6a7-... \
  -H "x-rapidapi-key: YOUR_API_KEY"
response — processing
{
  "id": "d4e5f6a7-...",
  "status": "processing",
  "documentType": "pdf",
  "fileName": "financial-report.pdf",
  "result": null,
  "error": null,
  "processingTimeMs": null,
  "createdAt": "2024-06-15T10:30:00Z",
  "completedAt": null
}
response — completed
{
  "id": "d4e5f6a7-...",
  "status": "completed",
  "documentType": "pdf",
  "fileName": "financial-report.pdf",
  "result": { "tableCount": 4 },
  "error": null,
  "processingTimeMs": 8320,
  "createdAt": "2024-06-15T10:30:00Z",
  "completedAt": "2024-06-15T10:30:08Z"
}
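A polling loop with no upper bound can hang forever if a job never leaves processing. The following is a minimal sketch of a bounded poll, assuming only the three status values listed above. The function name wait_for_job, its parameters, and the default interval and timeout are illustrative choices, not values documented by the API.

```python
import time
from typing import Callable


def wait_for_job(
    fetch_job: Callable[[], dict],
    interval: float = 2.0,
    timeout: float = 300.0,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Call fetch_job() until status leaves "processing" or the timeout elapses.

    fetch_job is any callable returning the parsed job JSON, for example a
    lambda wrapping a GET to /jobs/{id}. Elapsed time is counted as the sum
    of the sleeps, not wall-clock time, which keeps the helper easy to test.
    """
    waited = 0.0
    while True:
        job = fetch_job()
        if job["status"] != "processing":
            return job  # "completed" or "failed"
        if waited >= timeout:
            raise TimeoutError(f"job still processing after {waited:.0f}s")
        sleep(interval)
        waited += interval
```

The caller still has to branch on the returned status, since a failed job is returned rather than raised.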
If using a callback URL, you'll receive a POST when the job finishes:
{ "jobId": "d4e5f6a7-...", "status": "completed", "processingTimeMs": 8320 }
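On the receiving side, a webhook handler might validate the callback body before acting on it. This sketch assumes the payload has exactly the shape of the sample above (jobId, status, processingTimeMs) and that status takes one of the three documented values. The function names are hypothetical, and any signature or authentication scheme for callbacks is not covered here because the source does not describe one.

```python
import json

# Status values listed in the polling step
KNOWN_STATUSES = {"processing", "completed", "failed"}


def parse_callback(body: bytes) -> dict:
    """Decode a callback body and check the fields shown in the sample payload."""
    event = json.loads(body)
    if not isinstance(event, dict) or "jobId" not in event:
        raise ValueError("callback is missing jobId")
    if event.get("status") not in KNOWN_STATUSES:
        raise ValueError(f"unexpected status: {event.get('status')!r}")
    return event


def should_download(event: dict) -> bool:
    """Only a completed job has a result to fetch."""
    return event["status"] == "completed"
```

A web framework route would pass the raw request body to parse_callback, respond with a 2xx status, and then fetch /jobs/{jobId}/result when should_download returns True.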
3. Download the Excel file

Once the job is completed, download the .xlsx file. Each table gets its own worksheet with formatted headers, typed columns, and auto-filters.

curl — download result
curl https://api.contexa.works/api/v1/jobs/d4e5f6a7-.../result \
  -H "x-rapidapi-key: YOUR_API_KEY" \
  -o financial-report.xlsx
JavaScript — full flow
const BASE = "https://api.contexa.works/api/v1"

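// 1. Submit
// pdfFile is a File or Blob holding the PDF, e.g. from an <input type="file">
const formData = new FormData()
formData.append("file", pdfFile, "financial-report.pdf")
const submit = await fetch(`${BASE}/pdf-to-excel`, {
  method: "POST",
  headers: { "x-rapidapi-key": "YOUR_API_KEY" },
  body: formData,
})
if (!submit.ok) throw new Error(`Submit failed: ${submit.status}`)
const { jobId } = await submit.json()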

// 2. Poll until done
let job
do {
  await new Promise(r => setTimeout(r, 2000))
  const res = await fetch(`${BASE}/jobs/${jobId}`, {
    headers: { "x-rapidapi-key": "YOUR_API_KEY" },
  })
  job = await res.json()
} while (job.status === "processing")

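// 3. Download
if (job.status === "completed") {
  const file = await fetch(`${BASE}/jobs/${jobId}/result`, {
    headers: { "x-rapidapi-key": "YOUR_API_KEY" },
  })
  const blob = await file.blob()
  // save blob as .xlsx
} else {
  // status is "failed"; the error field presumably describes why
  throw new Error(`Job ${jobId} failed: ${JSON.stringify(job.error)}`)
}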
Python — full flow
import time

import requests

BASE = "https://api.contexa.works/api/v1"
headers = {"x-rapidapi-key": "YOUR_API_KEY"}

# 1. Submit (the with-block closes the file once the upload finishes)
with open("report.pdf", "rb") as pdf:
    r = requests.post(
        f"{BASE}/pdf-to-excel",
        headers=headers,
        files={"file": pdf},
    )
r.raise_for_status()
job_id = r.json()["jobId"]

# 2. Poll
while True:
    time.sleep(2)
    status = requests.get(
        f"{BASE}/jobs/{job_id}",
        headers=headers,
    ).json()
    if status["status"] in ("completed", "failed"):
        break

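# 3. Download
if status["status"] == "completed":
    result = requests.get(
        f"{BASE}/jobs/{job_id}/result",
        headers=headers,
    )
    result.raise_for_status()
    with open("report.xlsx", "wb") as f:
        f.write(result.content)
    print(f"Saved! {status['result']['tableCount']} tables extracted")
else:
    # status is "failed"; the error field presumably describes why
    print(f"Job failed: {status['error']}")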
What you get: A formatted Excel file with one worksheet per table, bold headers, typed columns (numbers, currency, percentages, dates), auto-filters, and frozen header rows.
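Because each table is described as getting its own worksheet, a quick sanity check on the downloaded bytes is to count the worksheet parts and compare the total with result.tableCount. The sketch below relies on two facts about the .xlsx format: the file is a ZIP archive, and writers conventionally store worksheets under xl/worksheets/. That this service follows the convention is an assumption, and count_worksheets is a hypothetical helper name.

```python
import io
import zipfile


def count_worksheets(xlsx_bytes: bytes) -> int:
    """Count worksheet parts in an .xlsx payload; raise if it is not a ZIP archive."""
    buffer = io.BytesIO(xlsx_bytes)
    if not zipfile.is_zipfile(buffer):
        # e.g. a JSON error body was saved instead of the workbook
        raise ValueError("download is not a ZIP container, so not a valid .xlsx")
    with zipfile.ZipFile(buffer) as archive:
        return sum(
            1
            for name in archive.namelist()
            # assumed layout: xl/worksheets/sheet1.xml, sheet2.xml, ...
            if name.startswith("xl/worksheets/") and name.endswith(".xml")
        )
```

In the Python flow above, this could be called as count_worksheets(result.content) before writing the file, with a mismatch against tableCount treated as a warning rather than an error.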