Files API overview

What is the Files API

The Mathpix Files API is a high-throughput, asynchronous document processing API. It's designed for large-scale workloads — converting whole document archives, running continuous ingestion pipelines, or batching up to 200,000 files in a single call — at a lower per-page cost than v3/pdf.

When to use Files API vs `v3/pdf`

	`v3/pdf` (Convert API)	Files API
Use it for	Real-time, per-document processing in interactive apps	Processing existing document archives, one-time conversion of large PDF collections, continuous high-volume pipelines, data-wall workloads
Submission	One document per request	One document, or batches of up to 200,000 files per `/jobs` call
Result delivery	Polled from Mathpix-hosted URLs	Polled from Mathpix-hosted URLs or written directly to your S3 / GCS / Azure bucket
Per-page price	Flat — see pricing	$1.50 / 1,000 pages below 30M/month; $1.00 / 1,000 pages above
Single-document latency	Lower (optimized for interactive)	Higher per document — but a million-document workload completes far sooner
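
To make the Files API pricing row concrete, here is a small arithmetic sketch. It rests on assumptions the table does not state: that the 30M threshold counts pages per month, and that a month at exactly 30M pages still bills at the lower-volume rate. The table also does not say whether the $1.00 rate applies only to pages beyond the threshold or to the whole month's volume, so the hypothetical `monthly_cost_usd` helper supports both readings; check the pricing page before relying on either.

```python
def monthly_cost_usd(pages: int, marginal: bool = True,
                     threshold: int = 30_000_000,
                     low_volume_rate: float = 1.50,
                     high_volume_rate: float = 1.00) -> float:
    """Estimate a monthly bill from the rates in the comparison table.

    Rates are USD per 1,000 pages. `marginal=True` assumes only pages beyond
    the threshold get the cheaper rate; `marginal=False` assumes the cheaper
    rate applies to every page once the threshold is exceeded. Both readings
    are assumptions, not confirmed billing rules.
    """
    if pages <= threshold:
        return pages / 1000 * low_volume_rate
    if marginal:
        below = threshold / 1000 * low_volume_rate
        above = (pages - threshold) / 1000 * high_volume_rate
        return below + above
    return pages / 1000 * high_volume_rate


small = monthly_cost_usd(10_000_000)
large_marginal = monthly_cost_usd(40_000_000, marginal=True)
large_flat = monthly_cost_usd(40_000_000, marginal=False)
```

Under these assumptions, 10M pages in a month comes to $15,000, and 40M pages comes to $55,000 on the marginal reading or $40,000 on the whole-volume reading.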

How it works

Every Files API workflow is the same three steps, whether you submit one document or 200,000:

1. (Optional) Connect your storage. Register a data source once to read inputs from, and write results back to, your own S3, GCS, or Azure bucket. Skip this if you only submit public/presigned https:// URLs.
2. Submit. Send one document with POST /files/v1/uri (or a direct upload with POST /files/v1), or a batch with POST /files/v1/jobs. Submission returns immediately; processing happens asynchronously.
3. Poll, then collect. Poll the file or job for completion, then download outputs (mmd, docx, tex.zip, …) or, if you set a destination_uri, have them written straight to your bucket.
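
The polling step above can be sketched as a small loop that is independent of any HTTP client. This is a minimal sketch, not the official client: `poll_until_done` is a hypothetical helper, and the `"status"` field and `"completed"` value in the usage example are assumed names, since this overview does not show the response schema. In practice `fetch_status` would wrap a call such as GET /files/v1/jobs/{job_id}.

```python
import time
from typing import Any, Callable, Dict


def poll_until_done(
    fetch_status: Callable[[], Dict[str, Any]],
    is_done: Callable[[Dict[str, Any]], bool],
    interval_s: float = 5.0,
    timeout_s: float = 3600.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> Dict[str, Any]:
    """Call fetch_status repeatedly until is_done accepts the response.

    fetch_status: caller-supplied function that performs the status request.
    is_done: caller-supplied predicate over the response body.
    Raises TimeoutError if the deadline passes before completion.
    """
    deadline = clock() + timeout_s
    while True:
        status = fetch_status()
        if is_done(status):
            return status
        if clock() >= deadline:
            raise TimeoutError("processing did not finish before the timeout")
        sleep(interval_s)


# Usage with canned responses standing in for real HTTP calls.
# The "status" key and its values are assumptions for illustration.
canned = iter([{"status": "processing"}, {"status": "completed"}])
final = poll_until_done(
    fetch_status=lambda: next(canned),
    is_done=lambda body: body.get("status") == "completed",
    interval_s=0.0,
)
```

Passing the fetch function and the completion predicate as arguments keeps the loop reusable for both file-level and job-level polling, and makes it easy to test without network access.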

Core concepts

file_id — the handle for a single document, returned by every submission. Use it to poll status and download results: see Async Document Lifecycle.
job_id — groups many files submitted together so you can track and list them as a unit: see Async Batch Document Processing.
custom_id — your own per-file identifier, echoed back in job listings so you don't have to track Mathpix's file_ids. Paired with a job_id, it also makes submission idempotent (safe to retry).
Data sources — the keyless grant that lets Mathpix read from and write to your cloud storage. No credentials are shared; you grant access via IAM role (AWS), AD app (Azure), or service-account impersonation (GCS): see Data Sources.
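
One way to put custom_id and job batching into practice is to derive each custom_id deterministically from the input location, so a retried submission reuses the same identifiers, and to split large inputs into batches no larger than the 200,000-file limit. The sketch below only prepares data. The helper names and the `"uri"` field are hypothetical, and hashing the URI is just one possible scheme; consult the Async Batch Document Processing reference for the actual request body.

```python
import hashlib
from typing import Dict, Iterable, Iterator, List

MAX_FILES_PER_JOB = 200_000  # batch limit stated in this overview


def custom_id_for(uri: str) -> str:
    """Derive a stable identifier from the source URI (one possible scheme)."""
    return hashlib.sha256(uri.encode("utf-8")).hexdigest()[:16]


def build_entries(uris: Iterable[str]) -> List[Dict[str, str]]:
    """Build one entry per distinct URI, preserving first-seen order.

    "custom_id" is the concept described above; "uri" is an assumed field name.
    """
    entries: Dict[str, Dict[str, str]] = {}
    for uri in uris:
        cid = custom_id_for(uri)
        entries.setdefault(cid, {"custom_id": cid, "uri": uri})
    return list(entries.values())


def chunk(entries: List[Dict[str, str]],
          size: int = MAX_FILES_PER_JOB) -> Iterator[List[Dict[str, str]]]:
    """Yield consecutive slices of at most `size` entries, one per job."""
    for start in range(0, len(entries), size):
        yield entries[start:start + size]


uris = ["s3://bucket/a.pdf", "s3://bucket/b.pdf", "s3://bucket/a.pdf"]
entries = build_entries(uris)
batches = list(chunk(entries))
```

Because the identifiers depend only on the input URI, building the entries twice yields identical results, which is what lets a retry line up with the original submission.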

Looking for the product pitch — pricing, positioning, and why teams choose the Files API? See mathpix.com/files-api.

What's coming next

Webhooks. Today you poll GET /files/v1/jobs/{job_id} for completion. Soon you'll be able to subscribe to job- and file-level events on submission (job.completed, file.completed, file.error) — Mathpix POSTs to your endpoint when state changes, no polling required. Webhook response shapes mirror the polling endpoints, so callbacks slot directly into existing handlers.
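
Since webhooks are not yet available, the following is only a speculative sketch of how a receiver might route the three event names mentioned above to handlers. `make_dispatcher` is a hypothetical helper, and the payload shape, including the `job_id` and `file_id` keys read below, is an assumption; the delivered payloads may differ once the feature ships.

```python
from typing import Any, Callable, Dict

Handler = Callable[[Dict[str, Any]], str]


def make_dispatcher(handlers: Dict[str, Handler]) -> Callable[[str, Dict[str, Any]], str]:
    """Return a function that routes an event name to its registered handler.

    Unknown event names are ignored rather than treated as errors, so new
    event types would not break an existing receiver.
    """
    def dispatch(event_type: str, payload: Dict[str, Any]) -> str:
        handler = handlers.get(event_type)
        if handler is None:
            return "ignored"
        return handler(payload)
    return dispatch


# Event names come from the text above; payload keys are assumed.
dispatch = make_dispatcher({
    "job.completed": lambda p: f"job {p.get('job_id')} finished",
    "file.completed": lambda p: f"file {p.get('file_id')} ready",
    "file.error": lambda p: f"file {p.get('file_id')} failed",
})

message = dispatch("job.completed", {"job_id": "example-job"})
```

If callbacks do mirror the polling responses as described, the same handler functions could be called from either a polling loop or a webhook endpoint.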

Next steps

Files API Quickstart — a guided walkthrough.
Connect your cloud storage — grant Mathpix access to your S3, Azure Blob, or GCS bucket.
Process a Document Async — API reference for the single-document endpoint (POST /files/v1/uri).
Async Batch Document Processing — API reference for the bulk endpoint (POST /files/v1/jobs).
