
Process a Document Async

POST api.mathpix.com/files/v1/uri

Submit a single document for async processing by remote URI. Returns a file_id for polling status and downloading results. Use POST /files/v1/jobs for bulk submission of many URIs in one call.

Accepted source_uri forms

| Scheme | Example | Requires data source |
| --- | --- | --- |
| s3:// | s3://your-bucket/path/file.pdf | Yes (AWS data source) |
| gs:// | gs://your-bucket/path/file.pdf | Yes (GCS data source) |
| Azure Blob | https://{account}.blob.core.windows.net/{container}/{blob}.pdf | Yes (Azure data source) |
| https:// | https://example.com/file.pdf | No (public URL) |

Azure Blob URIs are host-matched on *.blob.core.windows.net; supply the full HTTPS form (no azure:// shorthand).
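
As a rough client-side pre-check, the scheme rules above could be sketched as follows. The function name `classify_source_uri` and the returned labels are hypothetical, not part of the API; the server remains the authority on what it accepts.

```python
from urllib.parse import urlparse


def classify_source_uri(uri: str) -> tuple[str, bool]:
    """Return (kind, requires_data_source) for a source_uri.

    Mirrors the table above. Raises ValueError for anything the table
    does not list, such as an azure:// shorthand.
    """
    parsed = urlparse(uri)
    scheme = parsed.scheme.lower()
    host = (parsed.hostname or "").lower()

    if not host:
        raise ValueError(f"source_uri has no bucket or host: {uri!r}")
    if scheme == "s3":
        return ("s3", True)
    if scheme == "gs":
        return ("gs", True)
    if scheme == "https":
        # Azure Blob is recognized by host, not by scheme.
        if host.endswith(".blob.core.windows.net"):
            return ("azure_blob", True)
        return ("https", False)
    raise ValueError(f"unsupported source_uri scheme: {scheme!r}")
```

A check like this only catches obvious mistakes early; it cannot tell whether a data source is actually registered for the bucket.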

Example

curl -X POST https://api.mathpix.com/files/v1/uri \
  -H 'app_key: APP_KEY' \
  -H 'Content-Type: application/json' \
  --data '{
    "source_uri": "https://cdn.mathpix.com/examples/cs229-notes1.pdf",
    "conversion_formats": { "docx": true, "md": true }
  }'

Example response

{
  "file_id": "b1c9c3a8-55e4-4a09-b7d0-218ba5de4c4d"
}
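
The same request could be issued from Python using only the standard library. This is a minimal sketch: the helper names `build_submit_request` and `submit` are hypothetical, and the headers simply mirror the curl example above.

```python
import json
import urllib.request

API_URL = "https://api.mathpix.com/files/v1/uri"


def build_submit_request(app_key, source_uri, conversion_formats=None):
    """Build (but do not send) the POST request shown in the curl example."""
    payload = {"source_uri": source_uri}
    if conversion_formats:
        payload["conversion_formats"] = conversion_formats
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"app_key": app_key, "Content-Type": "application/json"},
        method="POST",
    )


def submit(request):
    """Send the request and return the file_id from the JSON response."""
    with urllib.request.urlopen(request) as resp:
        return json.load(resp)["file_id"]
```

Separating request construction from sending keeps the payload easy to inspect and log before any network call is made.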

Request parameters

In addition to the Files-API-specific parameters below, this endpoint accepts the same OCR and conversion options as POST /v3/pdf (alphabets_allowed, rm_spaces, math_inline_delimiters, include_smiles, etc.).

source_uri string

Remote location of the source document. One of:

  • s3://bucket/key — requires a registered AWS data source for the bucket.
  • gs://bucket/key — requires a registered GCS data source.
  • https://{account}.blob.core.windows.net/{container}/{blob} — requires a registered Azure data source.
  • https://... — public URL with no auth.

conversion_formats object (optional)

Which output formats to produce. Same shape and supported keys as POST /v3/pdf — e.g. { "docx": true, "md": true, "tex.zip": true }. Output also includes Mathpix Markdown (.mmd) by default.

destination_uri string (optional)

Where to write conversion results when they complete. Same scheme rules as source_uri; must be a writeable location backed by a registered data source. Outputs land at <destination_uri>/<destination_basename>.<ext> per requested format (destination_basename defaults to the file_id), and cropped images (when image_output_mode is "local") under <destination_uri>/images/. When omitted, results stay in Mathpix storage and are fetched via GET /files/v1/{file_id}.{ext} — see Data retention for how long each artifact is kept.

destination_basename string (optional)

Basename for this file's output objects within destination_uri — results land at <destination_uri>/<destination_basename>.<ext>. Defaults to the file_id.
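
To make the output layout concrete, the path rules described for destination_uri and destination_basename could be sketched like this. The function `output_locations` is hypothetical. It assumes that each requested format key (for example "tex.zip") is used verbatim as the extension and that a trailing slash on destination_uri is ignored; neither detail is confirmed here.

```python
def output_locations(destination_uri, file_id, formats,
                     destination_basename=None, image_output_mode=None):
    """Predict where results land under destination_uri.

    Returns (outputs, images_prefix): a dict mapping each requested
    format to its object URI, and the images prefix or None.
    """
    base = destination_uri.rstrip("/")          # assumption: slash-insensitive
    name = destination_basename or file_id      # basename defaults to file_id
    outputs = {ext: f"{base}/{name}.{ext}" for ext in formats}
    # Cropped images are written locally only when image_output_mode is "local".
    images_prefix = f"{base}/images/" if image_output_mode == "local" else None
    return outputs, images_prefix
```

For example, `output_locations("s3://bucket/out", file_id, ["docx"], destination_basename="report")` maps "docx" to `s3://bucket/out/report.docx`.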

custom_id string (optional)

Customer-supplied identifier for this submission. Maximum 256 characters from the set [A-Za-z0-9_\-.:]; case-sensitive. Echoed back in the job file listing for correlation. Requires job_id: custom_id is a per-job identifier, so submitting one without a job_id is rejected with 400 bad_request. Together, (job_id, custom_id) is the idempotency key: re-submitting the same pair returns the original file_id rather than creating a new submission. To make a submission idempotent without a job_id, use the Idempotency-Key header instead (below).
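
The character and length constraints on custom_id can be checked locally before submitting. A minimal sketch, assuming an empty string is not a valid identifier (the minimum length is not stated above); `is_valid_custom_id` is a hypothetical helper name.

```python
import re

# Allowed characters and the 256-character limit come from the description
# above; the lower bound of 1 is an assumption.
_ID_PATTERN = re.compile(r"[A-Za-z0-9_\-.:]{1,256}")


def is_valid_custom_id(value: str) -> bool:
    """Return True if value satisfies the documented custom_id constraints."""
    return _ID_PATTERN.fullmatch(value) is not None
```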

job_id string (optional)

Associates this file with a job. When omitted, the file is treated as standalone. Required whenever custom_id is supplied. See Jobs for how files group into jobs and Idempotency for the dedup semantics.

image_output_mode string (optional)

Set to "local" to write cropped images (figures, equation crops, etc.) directly into your destination_uri storage (under <destination_uri>/images/), alongside the converted outputs. When unset (default), cropped images stay on Mathpix's CDN and are referenced by URL from the Mathpix Markdown output. Requires destination_uri to be set and backed by a registered data source.
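
Several of the parameters above depend on one another. The cross-field rules could be checked client-side with something like the following sketch; `check_submission` is a hypothetical helper and covers only the dependencies stated in this section.

```python
def check_submission(payload: dict) -> list[str]:
    """Return a list of problems found in a request body (empty if none)."""
    problems = []
    if not payload.get("source_uri"):
        problems.append("source_uri is required")
    # custom_id is a per-job identifier and is rejected without a job_id.
    if "custom_id" in payload and not payload.get("job_id"):
        problems.append("custom_id requires job_id")
    # Local image output needs somewhere to write the images.
    if payload.get("image_output_mode") == "local" and not payload.get("destination_uri"):
        problems.append('image_output_mode "local" requires destination_uri')
    return problems
```

Running a check like this before the request avoids a round trip for errors that the server would reject anyway; it does not verify that a destination is backed by a registered data source.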

Idempotency

A single-document submission can be made safe to retry in one of two ways:

  • Idempotency-Key request header — for standalone submissions with no job_id. Send a client-generated unique value (same constraints as custom_id: max 256 chars, [A-Za-z0-9_\-.:]). Re-sending a request with the same key returns the original file_id instead of creating (and billing) a duplicate. This is the recommended dedup mechanism for /uri.
  • (job_id, custom_id) — for submissions that belong to a job. See Jobs → Idempotency.

If both a custom_id (with its required job_id) and an Idempotency-Key header are present, the (job_id, custom_id) pair takes precedence.

Response

{ "file_id": "<uuid>" }

The returned file_id is the handle for everything that happens next; see Async Document Lifecycle.
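
When no destination_uri is set, results are fetched via GET /files/v1/{file_id}.{ext}, as noted in the destination_uri description. A small sketch of building that URL follows; `result_url` is a hypothetical helper, and it assumes the GET endpoint is served from the same api.mathpix.com host as the POST.

```python
API_BASE = "https://api.mathpix.com/files/v1"  # assumption: same host as POST


def result_url(file_id: str, ext: str) -> str:
    """Build the download URL for one converted output of a file."""
    # Accept the extension with or without a leading dot.
    return f"{API_BASE}/{file_id}.{ext.lstrip('.')}"
```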

Errors

Errors follow the shared Files API error envelope. Codes you'll see most often on this endpoint:

| Code | HTTP | When it fires |
| --- | --- | --- |
| bad_request | 400 | source_uri missing, malformed, or uses an unsupported scheme. |
| unauthorized | 401 | app_key invalid or missing. |
| data_source_not_found | 404 | source_uri references a bucket with no registered data source. |
| data_source_access_denied | 403 | Bucket has a data source but the IAM grant isn't reachable (revoked, wrong ExternalId, expired token). |
| not_found | 404 | URI's bucket / key returned 404 from the upstream storage. |
| unsupported_format | 415 | Input MIME type is not one of the supported input types. |
| quota_exceeded | 429 | Monthly page or file quota reached. |
| storage_throttled | 503 | Upstream storage backend returned throttling; retry with backoff. |
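
Of the codes above, only storage_throttled is described as worth retrying. A client retry policy could be sketched as follows. The names `is_retryable` and `backoff_delay` are hypothetical, and the full-jitter exponential backoff and its parameters are illustrative choices, not something the API prescribes.

```python
import random

# Only storage_throttled is documented above as "retry with backoff".
# Treating every other code as non-retryable is an assumption.
RETRYABLE_CODES = {"storage_throttled"}


def is_retryable(code: str) -> bool:
    """Return True if an error code from the table above should be retried."""
    return code in RETRYABLE_CODES


def backoff_delay(attempt: int, base: float = 1.0, cap: float = 60.0) -> float:
    """Seconds to wait before retry number `attempt` (0-based).

    Full-jitter exponential backoff: a random delay between 0 and
    min(cap, base * 2**attempt).
    """
    return random.uniform(0.0, min(cap, base * 2 ** attempt))
```

Pairing retries with an idempotency key keeps a retried submission from creating a duplicate file.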

Supported inputs

PDF, DOCX, DOC, PPTX, EPUB, ODT, RTF, and other document formats (detected by MIME type). See Supported formats for the full list.
