Files API Quickstart
Submit a real batch in five minutes. By the end of this guide you'll have submitted a single document, polled its status, downloaded its converted result, and submitted a two-item batch — using only your API key and curl (or Python / Node).
Prerequisites
- An API key. Grab one from the Mathpix Console and export it as APP_KEY in your shell.
- For s3://, gs://, or Azure Blob URLs, a registered data source for the bucket. Public https:// URLs work without any setup.
export APP_KEY="your-app-key"
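To make the data-source prerequisite concrete, here is a minimal sketch of a client-side check that flags URIs likely to need a registered data source. The helper name `needs_data_source` is hypothetical, and recognizing Azure Blob URLs by the `.blob.core.windows.net` host suffix is an assumption of this sketch, not something the API is documented to do.

```python
from urllib.parse import urlparse


def needs_data_source(uri: str) -> bool:
    """Guess whether a source URI requires a registered data source."""
    parsed = urlparse(uri)
    # s3:// and gs:// URIs are named in the prerequisites above.
    if parsed.scheme in ("s3", "gs"):
        return True
    # Assumption: Azure Blob URLs are identified by their standard host suffix.
    host = (parsed.hostname or "").lower()
    return host.endswith(".blob.core.windows.net")
```

A check like this can run before submission so that private URIs are caught early, instead of surfacing later as a failed file.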
1. Submit a single document
Use POST /files/v1/uri to submit one document by URL.
- cURL
- Python
- Node
curl -X POST https://api.mathpix.com/files/v1/uri \
-H "app_key: $APP_KEY" \
-H 'Content-Type: application/json' \
--data '{
"source_uri": "https://cdn.mathpix.com/examples/cs229-notes1.pdf",
"conversion_formats": { "docx": true, "md": true }
}'
import os, requests

r = requests.post(
    "https://api.mathpix.com/files/v1/uri",
    json={
        "source_uri": "https://cdn.mathpix.com/examples/cs229-notes1.pdf",
        "conversion_formats": {"docx": True, "md": True},
    },
    headers={"app_key": os.environ["APP_KEY"], "Content-Type": "application/json"},
)
print(r.json())  # {"file_id": "<uuid>"}
const res = await fetch("https://api.mathpix.com/files/v1/uri", {
  method: "POST",
  headers: { app_key: process.env.APP_KEY, "Content-Type": "application/json" },
  body: JSON.stringify({
    source_uri: "https://cdn.mathpix.com/examples/cs229-notes1.pdf",
    conversion_formats: { docx: true, md: true },
  }),
});
const { file_id } = await res.json();
console.log(file_id);
{ "file_id": "b1c9c3a8-55e4-4a09-b7d0-218ba5de4c4d" }
Keep the returned file_id — it's how you'll check status and download results.
2. Check status
Poll GET /files/v1/{file_id} until status == "completed" (or "error"). A typical document moves through these states:
// Just submitted
{ "file_id": "b1c9c3a8-55e4-4a09-b7d0-218ba5de4c4d", "status": "pending", "percent_done": 0 }
// Pages extracted, OCR in progress
{ "file_id": "b1c9c3a8-55e4-4a09-b7d0-218ba5de4c4d", "status": "split", "percent_done": 60.0 }
// Done — outputs available
{ "file_id": "b1c9c3a8-55e4-4a09-b7d0-218ba5de4c4d", "status": "completed", "percent_done": 100.0 }
- cURL
- Python
curl -H "app_key: $APP_KEY" \
"https://api.mathpix.com/files/v1/$FILE_ID"
import os, requests, time

# file_id is the value returned by the submission in step 1
while True:
    r = requests.get(f"https://api.mathpix.com/files/v1/{file_id}",
                     headers={"app_key": os.environ["APP_KEY"]})
    body = r.json()
    print(body["status"], body.get("percent_done"))
    # Stop on either terminal state
    if body["status"] in ("completed", "error"):
        break
    time.sleep(2)
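A polling loop with no upper bound will spin forever if a document never reaches a terminal state. The following is a minimal sketch of a bounded variant. The names `wait_for_file`, `fetch_status`, `timeout_s`, and `interval_s` are hypothetical and not part of the API. The status fetcher is injected as a callable so the loop can be exercised without network access, and only the two terminal states named in this step are assumed.

```python
import time
from typing import Callable, Dict

TERMINAL_STATES = ("completed", "error")  # terminal states named in this guide


def wait_for_file(fetch_status: Callable[[], Dict],
                  timeout_s: float = 600.0,
                  interval_s: float = 2.0,
                  sleep: Callable[[float], None] = time.sleep) -> Dict:
    """Call fetch_status() until the body reports a terminal status.

    fetch_status is a hypothetical callable returning the parsed JSON body
    of GET /files/v1/{file_id}. Raises TimeoutError once timeout_s elapses.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        body = fetch_status()
        if body.get("status") in TERMINAL_STATES:
            return body
        if time.monotonic() >= deadline:
            raise TimeoutError(f"still {body.get('status')!r} after {timeout_s}s")
        sleep(interval_s)
```

In real use, `fetch_status` would wrap the `requests.get(...).json()` call from this step. The caller still has to inspect the returned status, since "error" also ends the wait.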
3. Download the result
Once status is completed, request results by extension via GET /files/v1/{file_id}.{ext}.
- Always produced (download without pre-requesting): mmd, lines.json, lines.mmd.json.
- On request (must be set in conversion_formats on submission): docx, html, tex.zip, md, and others; see the full availability table in Supported Formats.
- cURL
- Python
# Mathpix Markdown — always available
curl -H "app_key: $APP_KEY" \
"https://api.mathpix.com/files/v1/$FILE_ID.mmd" \
-o result.mmd
# DOCX — only if you asked for it on submission
curl -H "app_key: $APP_KEY" \
"https://api.mathpix.com/files/v1/$FILE_ID.docx" \
-o result.docx
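import os, requests

# file_id comes from step 1, where docx was requested in conversion_formats
for ext in ("mmd", "docx"):
    r = requests.get(f"https://api.mathpix.com/files/v1/{file_id}.{ext}",
                     headers={"app_key": os.environ["APP_KEY"]})
    r.raise_for_status()  # avoid writing an error response to disk
    with open(f"result.{ext}", "wb") as f:
        f.write(r.content)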
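Because on-request formats must be named at submission time, it can help to check a download list against the submitted conversion_formats before fetching anything. This is a rough sketch under that reading of the two format lists in this step. The helper `unrequested_formats` is hypothetical, and `ALWAYS_PRODUCED` contains only the three extensions this guide names.

```python
from typing import Dict, Iterable, List

# Extensions this guide lists as always produced.
ALWAYS_PRODUCED = {"mmd", "lines.json", "lines.mmd.json"}


def unrequested_formats(wanted: Iterable[str],
                        conversion_formats: Dict[str, bool]) -> List[str]:
    """Return the wanted extensions that were not requested at submission."""
    return [
        ext for ext in wanted
        if ext not in ALWAYS_PRODUCED and not conversion_formats.get(ext, False)
    ]
```

For the submission in step 1, asking for `mmd`, `docx`, and `html` would flag only `html`, since `docx` was requested and `mmd` is always produced.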
4. Submit many at once
For batches, use POST /files/v1/jobs instead, which accepts up to 200,000 files in a single request. Pass an array of source URIs plus job-wide conversion/OCR options, which are applied to every file. Each file can carry an optional custom_id for your own correlation. job_id is optional (the server generates one if you omit it), but you must supply your own when you use custom_id (it's a per-job identifier), as the example below does.
- cURL
- Python
- Node
curl -X POST https://api.mathpix.com/files/v1/jobs \
-H "app_key: $APP_KEY" \
-H 'Content-Type: application/json' \
--data '{
"job_id": "quickstart-batch",
"files": [
{ "source_uri": "https://cdn.mathpix.com/examples/cs229-notes1.pdf", "custom_id": "cs229" },
{ "source_uri": "https://example.com/manual.pdf", "custom_id": "manual" }
],
"conversion_formats": { "docx": true, "md": true }
}'
import os, requests

r = requests.post(
    "https://api.mathpix.com/files/v1/jobs",
    json={
        "job_id": "quickstart-batch",
        "files": [
            {"source_uri": "https://cdn.mathpix.com/examples/cs229-notes1.pdf", "custom_id": "cs229"},
            {"source_uri": "https://example.com/manual.pdf", "custom_id": "manual"},
        ],
        "conversion_formats": {"docx": True, "md": True},
    },
    headers={"app_key": os.environ["APP_KEY"], "Content-Type": "application/json"},
)
print(r.json())  # {"job_id": "quickstart-batch", "file_count": 2}
const res = await fetch("https://api.mathpix.com/files/v1/jobs", {
  method: "POST",
  headers: { app_key: process.env.APP_KEY, "Content-Type": "application/json" },
  body: JSON.stringify({
    job_id: "quickstart-batch",
    files: [
      { source_uri: "https://cdn.mathpix.com/examples/cs229-notes1.pdf", custom_id: "cs229" },
      { source_uri: "https://example.com/manual.pdf", custom_id: "manual" },
    ],
    conversion_formats: { docx: true, md: true },
  }),
});
const { job_id, file_count } = await res.json();
console.log(`Job ${job_id} accepted ${file_count} files`);
{
"job_id": "quickstart-batch",
"file_count": 2
}
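The constraints described at the start of this step (at most 200,000 files per request, and a caller-supplied job_id whenever any file carries a custom_id) can be checked locally before the request is sent. The sketch below is one way to do that. `build_job_payload` and `MAX_FILES_PER_JOB` are hypothetical names, and the server remains the authority on what it accepts.

```python
from typing import Dict, List, Optional

MAX_FILES_PER_JOB = 200_000  # per-request limit stated in this guide


def build_job_payload(files: List[Dict[str, str]],
                      conversion_formats: Dict[str, bool],
                      job_id: Optional[str] = None) -> Dict:
    """Assemble a request body for POST /files/v1/jobs and validate it locally."""
    if len(files) > MAX_FILES_PER_JOB:
        raise ValueError(f"{len(files)} files exceeds the {MAX_FILES_PER_JOB} limit")
    if any("source_uri" not in f for f in files):
        raise ValueError("every file entry needs a source_uri")
    # The guide requires a caller-supplied job_id when custom_id is used.
    if job_id is None and any("custom_id" in f for f in files):
        raise ValueError("supply your own job_id when any file carries a custom_id")
    payload: Dict = {"files": files, "conversion_formats": conversion_formats}
    if job_id is not None:
        payload["job_id"] = job_id
    return payload
```

The returned dictionary can be passed as the `json=` argument of the `requests.post` call shown in this step.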
5. Track the job
Poll GET /files/v1/jobs/{job_id} for status and counters, then list per-file results (with optional status= filter) via GET /files/v1/jobs/{job_id}/files.
- cURL
- Python
# Job status + counters
curl -H "app_key: $APP_KEY" \
"https://api.mathpix.com/files/v1/jobs/$JOB_ID"
# Per-file listing — paginate with paging_state from each response
curl -H "app_key: $APP_KEY" \
"https://api.mathpix.com/files/v1/jobs/$JOB_ID/files"
# Only the errored files
curl -H "app_key: $APP_KEY" \
"https://api.mathpix.com/files/v1/jobs/$JOB_ID/files?status=error"
import os, requests, time

headers = {"app_key": os.environ["APP_KEY"]}

# Wait for the job to complete
while True:
    body = requests.get(f"https://api.mathpix.com/files/v1/jobs/{job_id}",
                        headers=headers).json()
    print(body["status"], body.get("files_completed"), "/", body.get("file_count"))
    if body["status"] == "completed":
        break
    time.sleep(5)

# List per-file results
files, page = [], None
while True:
    params = {"paging_state": page} if page else {}
    body = requests.get(f"https://api.mathpix.com/files/v1/jobs/{job_id}/files",
                        params=params, headers=headers).json()
    files.extend(body["files"])
    page = body.get("next_page_token")
    if not page:
        break
print(f"{len(files)} files: {[f['custom_id'] for f in files]}")
{
"job_id": "quickstart-batch",
"status": "completed",
"file_count": 2,
"files_completed": 2,
"files_errored": 0,
"created_at": "2026-05-28T12:00:00Z",
"modified_at": "2026-05-28T12:04:11Z"
}
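The counters in a job status body like the one above can be reduced to a simple progress summary. This is a sketch only: `job_progress` is a hypothetical helper, and it assumes `files_completed` and `files_errored` count disjoint sets of files, which this guide does not state explicitly.

```python
from typing import Dict


def job_progress(job: Dict) -> Dict:
    """Summarize the counters from a GET /files/v1/jobs/{job_id} body."""
    total = job.get("file_count", 0)
    done = job.get("files_completed", 0)
    failed = job.get("files_errored", 0)
    # Assumption: a file is counted as completed or errored, never both.
    finished = done + failed
    return {
        "finished": finished,
        "remaining": max(total - finished, 0),
        "percent": round(100.0 * finished / total, 1) if total else 0.0,
        "has_errors": failed > 0,
    }
```

When `has_errors` is true, the `status=error` filter on the per-file listing is the natural next call.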
Once the job is completed, download per-file outputs the same way as Step 3 — GET /files/v1/{file_id}.{ext} for each file_id returned by the listing.
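As a sketch of that last step, the helper below turns a per-file listing into a list of download URLs and local filenames. `plan_downloads` and `BASE_URL` are hypothetical names. The sketch assumes each listing entry carries a `file_id` and, optionally, the `custom_id` supplied at submission, which is used here only to name the local file.

```python
from typing import Dict, Iterable, List, Tuple

BASE_URL = "https://api.mathpix.com/files/v1"


def plan_downloads(files: Iterable[Dict],
                   exts: Iterable[str]) -> List[Tuple[str, str]]:
    """Return (url, local_filename) pairs for each listed file and extension."""
    exts = list(exts)  # allow reuse across files even if exts is a generator
    plan = []
    for entry in files:
        # Prefer the caller's custom_id for the local name, else the file_id.
        name = entry.get("custom_id") or entry["file_id"]
        for ext in exts:
            plan.append((f"{BASE_URL}/{entry['file_id']}.{ext}", f"{name}.{ext}"))
    return plan
```

Each URL in the plan can then be fetched with the same `app_key` header used in Step 3.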
Where to go next
- Process a Document Async — full reference for the single-document endpoint.
- Async Batch Document Processing — full reference for jobs, including pagination and idempotency.
- Async Document Lifecycle — file status, download, and DELETE.
- Connect your cloud storage — register data sources for your S3, Azure Blob, or GCS buckets so you can send private URIs and have results uploaded directly to your storage.
- Migrate From SCS to Files API — for existing SCS customers.