Files API overview
What is the Files API
The Mathpix Files API is a high-throughput, asynchronous document processing API. It's designed for large-scale workloads — converting whole document archives, running continuous ingestion pipelines, or batching up to 200,000 files in a single call — at a lower per-page cost than v3/pdf.
When to use Files API vs v3/pdf
| | v3/pdf (Convert API) | Files API |
|---|---|---|
| Use it for | Real-time, per-document processing in interactive apps | Processing existing document archives, one-time conversion of large PDF collections, continuous high-volume pipelines, data-wall workloads |
| Submission | One document per request | One document, or batches of up to 200,000 files per /jobs call |
| Result delivery | Polled from Mathpix-hosted URLs | Polled from Mathpix-hosted URLs or written directly to your S3 / GCS / Azure bucket |
| Per-page price | Flat (see pricing) | $1.50 / 1,000 pages below 30M pages/month; $1.00 / 1,000 pages above |
| Single-document latency | Lower (optimized for interactive) | Higher per document — but a million-document workload completes far sooner |
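
As a rough illustration of the Files API pricing row above, the following sketch estimates a monthly bill. It assumes the threshold means 30M pages per month and that pricing is marginal (the first 30M pages are billed at $1.50 per 1,000 and only the pages beyond that at $1.00 per 1,000). The table does not say whether the lower rate applies retroactively to the whole volume, so treat the function name `estimate_monthly_cost` and the tiering rule as assumptions and check the pricing page before budgeting.

```python
from decimal import Decimal

TIER_THRESHOLD_PAGES = 30_000_000        # 30M pages/month threshold, read from the table
RATE_BELOW = Decimal("1.50") / 1000      # USD per page up to the threshold
RATE_ABOVE = Decimal("1.00") / 1000      # USD per page beyond the threshold


def estimate_monthly_cost(pages: int) -> Decimal:
    """Estimate monthly cost in USD, ASSUMING marginal (not retroactive) tiering."""
    if pages < 0:
        raise ValueError("pages must be non-negative")
    below = min(pages, TIER_THRESHOLD_PAGES)
    above = max(pages - TIER_THRESHOLD_PAGES, 0)
    total = below * RATE_BELOW + above * RATE_ABOVE
    return total.quantize(Decimal("0.01"))  # round to whole cents
```

Under these assumptions, 10M pages in a month comes to $15,000, and 40M pages comes to $45,000 for the first 30M plus $10,000 for the remaining 10M.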
How it works
Every Files API workflow is the same three steps, whether you submit one document or 200,000:
- (Optional) Connect your storage. Register a data source once to read inputs from, and write results back to, your own S3, GCS, or Azure bucket. Skip this if you only submit public/presigned `https://` URLs.
- Submit. Send one document with `POST /files/v1/uri` (or a direct upload with `POST /files/v1`), or a batch with `POST /files/v1/jobs`. Submission returns immediately; processing happens asynchronously.
- Poll, then collect. Poll the file or job for completion, then download outputs (`mmd`, `docx`, `tex.zip`, …) or, if you set a `destination_uri`, have them written straight to your bucket.
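
The submit and poll steps above can be sketched in Python roughly as follows. Only `POST /files/v1/uri` and the `file_id` handle come from this page. The base URL, the `uri` request field, the `GET /files/v1/{file_id}` status path, and the `status` values `completed` and `error` are assumptions made for illustration, so check the API reference for the real shapes. Auth headers are passed in by the caller, and the helper names `http_json` and `submit_and_wait` are hypothetical.

```python
import json
import time
import urllib.request
from typing import Callable, Optional

BASE_URL = "https://api.mathpix.com"  # ASSUMPTION: host is not stated in this overview


def http_json(method: str, path: str, headers: dict, body: Optional[dict] = None) -> dict:
    """Send a JSON request with the standard library and decode the JSON reply."""
    data = json.dumps(body).encode("utf-8") if body is not None else None
    req = urllib.request.Request(
        BASE_URL + path,
        data=data,
        method=method,
        headers={**headers, "Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def submit_and_wait(
    uri: str,
    headers: dict,
    request: Callable[..., dict] = http_json,
    interval_s: float = 5.0,
    max_polls: int = 120,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Submit one document by URI, then poll until it reaches a terminal state."""
    # Submit: returns immediately with a file_id ("uri" body field is ASSUMED).
    submitted = request("POST", "/files/v1/uri", headers, {"uri": uri})
    file_id = submitted["file_id"]

    # Poll: the status path and the status values below are ASSUMED.
    for _ in range(max_polls):
        state = request("GET", f"/files/v1/{file_id}", headers)
        if state.get("status") in ("completed", "error"):
            return state
        sleep(interval_s)
    raise TimeoutError(f"{file_id} did not finish after {max_polls} polls")
```

The `request` and `sleep` parameters are injectable so the loop can be exercised without network access. In real use you would call `submit_and_wait(uri, headers)` with your account's auth headers and then download the outputs listed in the final state.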
Core concepts
- `file_id`: the handle for a single document, returned by every submission. Use it to poll status and download results: see Async Document Lifecycle.
- `job_id`: groups many files submitted together so you can track and list them as a unit: see Async Batch Document Processing.
- `custom_id`: your own per-file identifier, echoed back in job listings so you don't have to track Mathpix's `file_id`s. Paired with a `job_id`, it also makes submission idempotent (safe to retry).
- Data sources: the keyless grant that lets Mathpix read from and write to your cloud storage. No credentials are shared; you grant access via IAM role (AWS), AD app (Azure), or service-account impersonation (GCS): see Data Sources.
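
To make the `custom_id` idea concrete, here is a minimal sketch of preparing a large archive for batch submission. It splits a list of input URIs into chunks no larger than the 200,000-file limit mentioned on this page and derives a deterministic `custom_id` for each file, so a retried submission carries the same identifiers. The entry shape (`uri` plus `custom_id` keys), the hashing scheme, and the helper names `custom_id_for` and `build_job_batches` are assumptions for illustration, not the documented request format.

```python
import hashlib
from typing import Dict, Iterable, Iterator, List

MAX_FILES_PER_JOB = 200_000  # per-call batch limit stated in this overview


def custom_id_for(uri: str) -> str:
    """Derive a stable identifier from the URI so retries reuse the same ID."""
    return hashlib.sha256(uri.encode("utf-8")).hexdigest()[:32]


def build_job_batches(
    uris: Iterable[str], max_files: int = MAX_FILES_PER_JOB
) -> Iterator[List[Dict[str, str]]]:
    """Yield lists of file entries, each small enough for one /jobs call."""
    if max_files < 1:
        raise ValueError("max_files must be at least 1")
    batch: List[Dict[str, str]] = []
    for uri in uris:
        # Entry keys are ASSUMED; consult the batch endpoint reference.
        batch.append({"uri": uri, "custom_id": custom_id_for(uri)})
        if len(batch) == max_files:
            yield batch
            batch = []
    if batch:  # emit the final, possibly shorter, batch
        yield batch
```

Each yielded batch would become the file list of one `POST /files/v1/jobs` request. Because every `custom_id` is a pure function of the input URI, you can map job listings back to your own records without storing Mathpix's `file_id`s.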
Looking for the product pitch — pricing, positioning, and why teams choose the Files API? See mathpix.com/files-api.
What's coming next
Webhooks. Today you poll GET /files/v1/jobs/{job_id} for completion. Soon you'll be able to subscribe to job- and file-level events on submission (job.completed, file.completed, file.error) — Mathpix POSTs to your endpoint when state changes, no polling required. Webhook response shapes mirror the polling endpoints, so callbacks slot directly into existing handlers.
Next steps
- Files API Quickstart: a guided walkthrough.
- Connect your cloud storage: grant Mathpix access to your S3, Azure Blob, or GCS bucket.
- Process a Document Async: API reference for the single-document endpoint (`POST /files/v1/uri`).
- Async Batch Document Processing: API reference for the bulk endpoint (`POST /files/v1/jobs`).