PDF to Excel
Extract tables from PDFs into a formatted Excel file. This API is asynchronous — submit a job, poll for status, then download the result.
1
Submit your PDF
POST your PDF to start an extraction job. The API returns immediately with a job ID while processing continues in the background.
curl — submit a job
curl -X POST https://api.contexa.works/api/v1/pdf-to-excel \
  -H "x-rapidapi-key: YOUR_API_KEY" \
  -F "file=@financial-report.pdf"
response — 202 Accepted
{
  "jobId": "d4e5f6a7-...",
  "status": "processing"
}
You can optionally pass a callbackUrl to receive a webhook when the job completes, instead of polling:
curl — with callback
curl -X POST https://api.contexa.works/api/v1/pdf-to-excel \
  -H "x-rapidapi-key: YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "base64": "'$(base64 < report.pdf | tr -d '\n')'",
    "callbackUrl": "https://your-app.com/webhooks/contexa"
  }'
2
Poll for completion
Check the job status by polling with the job ID. The status will be processing, completed, or failed.
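Of the three status values, only processing is non-terminal, so a polling loop can stop on anything else. The following is a minimal Python sketch of such a loop with a deadline, so a stuck job does not poll forever. The names wait_for_job, JobTimeout, and the timeout_s and interval_s defaults are illustrative assumptions, not part of the API; get_job is any callable that returns the parsed job JSON.

```python
import time
from typing import Callable

# Statuses after which polling can stop (taken from the list above).
TERMINAL_STATUSES = {"completed", "failed"}


class JobTimeout(Exception):
    """Raised when the job is still processing after the deadline."""


def wait_for_job(
    get_job: Callable[[], dict],
    timeout_s: float = 120.0,   # assumed default, tune for your documents
    interval_s: float = 2.0,    # assumed default, same pause as the samples
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> dict:
    """Call get_job() until the job reaches a terminal status or the deadline passes."""
    deadline = clock() + timeout_s
    while True:
        job = get_job()
        if job["status"] in TERMINAL_STATUSES:
            return job
        if clock() >= deadline:
            raise JobTimeout(f"job still {job['status']!r} after {timeout_s}s")
        sleep(interval_s)
```

With the requests library, get_job could be something like lambda: requests.get(f"{BASE}/jobs/{job_id}", headers=headers).json(). Passing sleep and clock as parameters keeps the loop easy to test without real waiting.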
curl — check status
curl https://api.contexa.works/api/v1/jobs/d4e5f6a7-... \
  -H "x-rapidapi-key: YOUR_API_KEY"
response — processing
{
  "id": "d4e5f6a7-...",
  "status": "processing",
  "documentType": "pdf",
  "fileName": "financial-report.pdf",
  "result": null,
  "error": null,
  "processingTimeMs": null,
  "createdAt": "2024-06-15T10:30:00Z",
  "completedAt": null
}
response — completed
{
  "id": "d4e5f6a7-...",
  "status": "completed",
  "documentType": "pdf",
  "fileName": "financial-report.pdf",
  "result": { "tableCount": 4 },
  "error": null,
  "processingTimeMs": 8320,
  "createdAt": "2024-06-15T10:30:00Z",
  "completedAt": "2024-06-15T10:30:08Z"
}
If using a callback URL, you'll receive a POST when the job finishes:
{ "jobId": "d4e5f6a7-...", "status": "completed", "processingTimeMs": 8320 }
3
Download the Excel file
Once the job is completed, download the .xlsx file. Each table gets its own worksheet with formatted headers, typed columns, and auto-filters.
curl — download result
curl https://api.contexa.works/api/v1/jobs/d4e5f6a7-.../result \
  -H "x-rapidapi-key: YOUR_API_KEY" \
  -o financial-report.xlsx
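If you passed a callbackUrl in step 1, the webhook POST can take the place of polling, and the download above runs once it reports completed. The following is a minimal Python sketch of turning that POST body into the result URL. The helper name result_url_from_callback is hypothetical, and the sketch assumes the payload has the shape shown in the sample callback and that any status other than completed means there is no file to fetch.

```python
import json
from typing import Optional

BASE = "https://api.contexa.works/api/v1"


def result_url_from_callback(body: bytes) -> Optional[str]:
    """Return the result URL for a completed job, or None for any other status.

    Raises ValueError if the body is not a JSON object or lacks a usable jobId.
    """
    payload = json.loads(body)  # json.JSONDecodeError is a ValueError subclass
    if not isinstance(payload, dict):
        raise ValueError("callback body must be a JSON object")
    job_id = payload.get("jobId")
    if not isinstance(job_id, str) or not job_id:
        raise ValueError("callback body has no jobId")
    if payload.get("status") != "completed":
        # Assumption: callbacks with another status (e.g. "failed") carry no file.
        return None
    return f"{BASE}/jobs/{job_id}/result"
```

The returned URL can be fetched with the same x-rapidapi-key header as the curl command. Since nothing here verifies who sent the callback, it may be worth confirming the status with a GET on the job before downloading.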
JavaScript — full flow
const BASE = "https://api.contexa.works/api/v1"

// 1. Submit
// pdfFile is a placeholder for a File or Blob holding the PDF (for example, from a file input).
const formData = new FormData()
formData.append("file", pdfFile, "financial-report.pdf")
const submit = await fetch(`${BASE}/pdf-to-excel`, {
  method: "POST",
  headers: { "x-rapidapi-key": "YOUR_API_KEY" },
  body: formData,
})
const { jobId } = await submit.json()

// 2. Poll every 2 seconds until the job leaves "processing"
let job
do {
  await new Promise(r => setTimeout(r, 2000))
  const res = await fetch(`${BASE}/jobs/${jobId}`, {
    headers: { "x-rapidapi-key": "YOUR_API_KEY" },
  })
  job = await res.json()
} while (job.status === "processing")

// 3. Download
if (job.status === "completed") {
  const file = await fetch(`${BASE}/jobs/${jobId}/result`, {
    headers: { "x-rapidapi-key": "YOUR_API_KEY" },
  })
  const blob = await file.blob()
  // save blob as .xlsx
} else {
  console.error("Job failed:", job.error)
}
Python — full flow
import time

import requests

BASE = "https://api.contexa.works/api/v1"
headers = {"x-rapidapi-key": "YOUR_API_KEY"}

# 1. Submit (the file is closed once the upload finishes)
with open("report.pdf", "rb") as pdf:
    r = requests.post(
        f"{BASE}/pdf-to-excel",
        headers=headers,
        files={"file": pdf},
    )
job_id = r.json()["jobId"]

# 2. Poll every 2 seconds until the job leaves "processing"
while True:
    time.sleep(2)
    status = requests.get(
        f"{BASE}/jobs/{job_id}",
        headers=headers,
    ).json()
    if status["status"] in ("completed", "failed"):
        break

# 3. Download
if status["status"] == "completed":
    result = requests.get(
        f"{BASE}/jobs/{job_id}/result",
        headers=headers,
    )
    with open("report.xlsx", "wb") as f:
        f.write(result.content)
    print(f"Saved! {status['result']['tableCount']} tables extracted")
else:
    print(f"Job failed: {status['error']}")
What you get: A formatted Excel file with one worksheet per table, bold headers, typed columns (numbers, currency, percentages, dates), auto-filters, and frozen header rows.
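Because each table lands on its own worksheet, listing the sheet names is a quick sanity check on a downloaded file, and the count would presumably match result.tableCount. The following is a minimal sketch using only the Python standard library. It relies on an .xlsx file being a ZIP archive whose sheet list lives in xl/workbook.xml. The helper name sheet_names is hypothetical, and the sketch assumes the file uses the common (transitional) SpreadsheetML namespace.

```python
import xml.etree.ElementTree as ET
import zipfile
from typing import List

# Namespace of the main SpreadsheetML part (transitional OOXML, assumed here).
NS = {"m": "http://schemas.openxmlformats.org/spreadsheetml/2006/main"}


def sheet_names(xlsx) -> List[str]:
    """Return worksheet names in workbook order.

    xlsx may be a path or a binary file-like object.
    """
    with zipfile.ZipFile(xlsx) as zf:
        root = ET.fromstring(zf.read("xl/workbook.xml"))
    return [sheet.attrib["name"] for sheet in root.findall("m:sheets/m:sheet", NS)]
```

For example, len(sheet_names("report.xlsx")) can be compared against the tableCount reported by the job. For reading cell values or formatting, a spreadsheet library is a better fit than raw XML.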