Files API overview

What is the Files API

The Mathpix Files API is a high-throughput, asynchronous document processing API. It's designed for large-scale workloads — converting whole document archives, running continuous ingestion pipelines, or batching up to 200,000 files in a single call — at a lower per-page cost than v3/pdf.

When to use Files API vs v3/pdf

| | v3/pdf (Convert API) | Files API |
|---|---|---|
| Use it for | Real-time, per-document processing in interactive apps | Processing existing document archives, one-time conversion of large PDF collections, continuous high-volume pipelines, data-wall workloads |
| Submission | One document per request | One document, or batches of up to 200,000 files per /jobs call |
| Result delivery | Polled from Mathpix-hosted URLs | Polled from Mathpix-hosted URLs or written directly to your S3 / GCS / Azure bucket |
| Per-page price | Flat (see pricing) | $1.50 / 1,000 pages below 30M/month; $1.00 / 1,000 pages above |
| Single-document latency | Lower (optimized for interactive) | Higher per document, but a million-document workload completes far sooner |
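
To make the Files API price row concrete, here is a rough cost estimator. It is only a sketch: the comparison does not say whether the $1.00 rate applies to every page once you pass 30M pages in a month or only to the pages beyond that mark, so the hypothetical `marginal` flag lets you compute either reading. How the exact 30M boundary is treated is also an assumption.

```python
# Rates taken from the comparison above, in USD per 1,000 pages.
TIER_BOUNDARY_PAGES = 30_000_000
RATE_BELOW = 1.50
RATE_ABOVE = 1.00


def monthly_cost_usd(pages: int, marginal: bool = True) -> float:
    """Estimate one month's Files API cost for `pages` processed pages.

    marginal=True  -> assumption: only pages past the boundary get the lower rate.
    marginal=False -> assumption: the lower rate applies to all pages once
                      the monthly volume exceeds the boundary.
    """
    if pages < 0:
        raise ValueError("pages must be non-negative")
    # Assumption: a month of exactly 30M pages is billed at the higher rate.
    if pages <= TIER_BOUNDARY_PAGES:
        return pages * RATE_BELOW / 1000
    if marginal:
        below = TIER_BOUNDARY_PAGES * RATE_BELOW
        above = (pages - TIER_BOUNDARY_PAGES) * RATE_ABOVE
        return (below + above) / 1000
    return pages * RATE_ABOVE / 1000


if __name__ == "__main__":
    for volume in (10_000_000, 40_000_000):
        print(volume, monthly_cost_usd(volume), monthly_cost_usd(volume, marginal=False))
```

Check the pricing page for the authoritative billing rule before relying on either figure.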

How it works

Every Files API workflow is the same three steps, whether you submit one document or 200,000:

  1. (Optional) Connect your storage. Register a data source once to read inputs from — and write results back to — your own S3, GCS, or Azure bucket. Skip this if you only submit public/presigned https:// URLs.
  2. Submit. Send one document with POST /files/v1/uri (or a direct upload with POST /files/v1), or a batch with POST /files/v1/jobs. Submission returns immediately — processing happens asynchronously.
  3. Poll, then collect. Poll the file or job for completion, then download outputs (mmd, docx, tex.zip, …) — or, if you set a destination_uri, have them written straight to your bucket.
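
The submit-then-poll flow above can be sketched with the Python standard library. Only the endpoint paths (POST /files/v1/uri and GET /files/v1/jobs/{job_id}) come from this page. The base URL, the `uri` request field, the `file_id` and `status` response fields, and the terminal status values are assumptions made for illustration, so check the endpoint reference before using any of them. Authentication headers are passed in by the caller and are not specified here.

```python
import json
import time
import urllib.request

# Assumption: the Files API is served from this host; verify in the reference.
BASE_URL = "https://api.mathpix.com/files/v1"
# Assumption: status values that mean "stop polling".
TERMINAL_STATUSES = {"completed", "error"}


def build_request(method, path, headers, payload=None):
    """Build (but do not send) a JSON request for a Files API path."""
    data = json.dumps(payload).encode("utf-8") if payload is not None else None
    merged = {"Content-Type": "application/json", **headers}
    return urllib.request.Request(BASE_URL + path, data=data, headers=merged, method=method)


def send(request):
    """Send a request and decode the JSON response body."""
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))


def submit_uri(document_url, headers, transport=send):
    """Submit one document by URL; returns immediately with its file_id."""
    # Assumption: the body field is "uri" and the response carries "file_id".
    request = build_request("POST", "/uri", headers, {"uri": document_url})
    return transport(request)["file_id"]


def get_job(job_id, headers, transport=send):
    """Fetch the current state of a batch job."""
    return transport(build_request("GET", f"/jobs/{job_id}", headers))


def poll_until_done(fetch_status, interval_s=5.0, timeout_s=600.0, sleep=time.sleep):
    """Call fetch_status() until it reports a terminal status or time runs out."""
    deadline = time.monotonic() + timeout_s
    while True:
        state = fetch_status()
        if state.get("status") in TERMINAL_STATUSES:
            return state
        if time.monotonic() >= deadline:
            raise TimeoutError("document processing did not finish in time")
        sleep(interval_s)
```

A caller would pass something like `lambda: get_job(job_id, headers)` as `fetch_status`. The `transport` and `sleep` parameters exist only so the flow can be exercised without network access or real waiting.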

Core concepts

  • file_id — the handle for a single document, returned by every submission. Use it to poll status and download results: see Async Document Lifecycle.
  • job_id — groups many files submitted together so you can track and list them as a unit: see Async Batch Document Processing.
  • custom_id — your own per-file identifier, echoed back in job listings so you don't have to track Mathpix's file_ids. Paired with a job_id, it also makes submission idempotent (safe to retry).
  • Data sources — the keyless grant that lets Mathpix read from and write to your cloud storage. No credentials are shared; you grant access via IAM role (AWS), AD app (Azure), or service-account impersonation (GCS): see Data Sources.
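
One way to use custom_id is to derive it deterministically from something you already track, so a retried submission reuses the same identifier and results can be looked up without storing file_ids. The sketch below assumes hypothetical item and listing field names (`uri`, `custom_id`, `file_id`, `status`). The actual request and listing shapes are defined in the batch processing reference, not here.

```python
import hashlib


def stable_custom_id(uri: str) -> str:
    """Derive a repeatable identifier from the document's source URI."""
    return hashlib.sha256(uri.encode("utf-8")).hexdigest()[:16]


def build_items(uris):
    """Build per-file entries for a batch; field names are assumptions."""
    return [{"uri": uri, "custom_id": stable_custom_id(uri)} for uri in uris]


def index_by_custom_id(listing):
    """Map a job listing back to your own identifiers."""
    return {entry["custom_id"]: entry for entry in listing}


if __name__ == "__main__":
    items = build_items(["s3://my-bucket/a.pdf", "s3://my-bucket/b.pdf"])
    # Retrying with the same inputs yields the same custom_ids.
    assert items == build_items(["s3://my-bucket/a.pdf", "s3://my-bucket/b.pdf"])
    # A made-up listing, shaped the way this sketch assumes.
    listing = [{"custom_id": item["custom_id"], "file_id": f"f-{n}", "status": "completed"}
               for n, item in enumerate(items)]
    print(index_by_custom_id(listing)[stable_custom_id("s3://my-bucket/a.pdf")])
```

Hashing the URI is just one choice. A database key or archive path works equally well, as long as a retry produces the same value for the same document.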

Looking for the product pitch — pricing, positioning, and why teams choose the Files API? See mathpix.com/files-api.

What's coming next

Webhooks. Today you poll GET /files/v1/jobs/{job_id} for completion. Soon you'll be able to subscribe to job- and file-level events on submission (job.completed, file.completed, file.error) — Mathpix POSTs to your endpoint when state changes, no polling required. Webhook response shapes mirror the polling endpoints, so callbacks slot directly into existing handlers.
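
Because webhook payloads are expected to mirror the polling responses, one way to prepare is to keep your state-handling logic in functions that take a job-shaped or file-shaped dict, then route events to them. The sketch below is speculative: webhooks are not yet available, and the event envelope (`type` and `data` keys) and the `job_id`, `file_id`, and `status` fields are assumptions. Only the three event names come from this page.

```python
def on_job_state(job: dict) -> str:
    """Handler a polling loop could already call with a job response."""
    return f"job {job.get('job_id')} is {job.get('status')}"


def on_file_state(file: dict) -> str:
    """Handler a polling loop could already call with a file response."""
    return f"file {file.get('file_id')} is {file.get('status')}"


# Event names from the announcement, mapped onto the existing handlers.
HANDLERS = {
    "job.completed": on_job_state,
    "file.completed": on_file_state,
    "file.error": on_file_state,
}


def dispatch(event: dict) -> str:
    """Route one webhook event; the envelope shape here is hypothetical."""
    handler = HANDLERS.get(event.get("type"))
    if handler is None:
        return "ignored"
    return handler(event.get("data", {}))


if __name__ == "__main__":
    print(dispatch({"type": "job.completed",
                    "data": {"job_id": "j-1", "status": "completed"}}))
```

With this split, the polling loop and a future webhook endpoint would share the same handlers, and only `dispatch` would need adjusting once the real payload format is published.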

Next steps