Process a Document Async
POST api.mathpix.com/files/v1/uri
Submit a single document for async processing by remote URI. Returns a file_id for polling status and downloading results. Use POST /files/v1/jobs for bulk submission of many URIs in one call.
Accepted source_uri forms
| Scheme | Example | Requires data source |
|---|---|---|
| s3:// | s3://your-bucket/path/file.pdf | Yes (AWS data source) |
| gs:// | gs://your-bucket/path/file.pdf | Yes (GCS data source) |
| Azure Blob | https://{account}.blob.core.windows.net/{container}/{blob}.pdf | Yes (Azure data source) |
| https:// | https://example.com/file.pdf | No (public URL) |
Azure Blob URIs are host-matched on *.blob.core.windows.net; supply the full HTTPS form (no azure:// shorthand).
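The scheme rules in the table can be mirrored client-side before submitting. The following is a minimal sketch, not part of the Mathpix API: the function name `classify_source_uri` and the labels it returns are hypothetical, and the server remains the authority on which URIs it accepts.

```python
from urllib.parse import urlsplit

# Host suffix taken from the note above; everything else here is illustrative.
AZURE_BLOB_SUFFIX = ".blob.core.windows.net"


def classify_source_uri(uri: str) -> tuple[str, bool]:
    """Return (kind, requires_data_source) for a source_uri."""
    parts = urlsplit(uri)
    scheme = parts.scheme.lower()
    host = (parts.hostname or "").lower()
    if scheme == "s3":
        return ("aws", True)
    if scheme == "gs":
        return ("gcs", True)
    if scheme == "https":
        # Azure Blob URIs are recognized by host, not by a dedicated scheme.
        if host.endswith(AZURE_BLOB_SUFFIX):
            return ("azure", True)
        return ("public", False)
    # Anything else (including an azure:// shorthand) is not an accepted form.
    raise ValueError(f"unsupported source_uri scheme: {scheme or '(none)'}")
```

A check like this only tells you whether a data source is needed; it cannot tell you whether one is actually registered for the bucket or container.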
Example
cURL:
curl -X POST https://api.mathpix.com/files/v1/uri \
  -H 'app_key: APP_KEY' \
  -H 'Content-Type: application/json' \
  --data '{
    "source_uri": "https://cdn.mathpix.com/examples/cs229-notes1.pdf",
    "conversion_formats": { "docx": true, "md": true }
  }'

Python:
import requests

r = requests.post(
    "https://api.mathpix.com/files/v1/uri",
    json={
        "source_uri": "https://cdn.mathpix.com/examples/cs229-notes1.pdf",
        "conversion_formats": {"docx": True, "md": True},
    },
    headers={
        "app_key": "APP_KEY",
        "Content-Type": "application/json",
    },
)
print(r.json())  # {"file_id": "<uuid>"}

JavaScript / TypeScript:
const response = await fetch("https://api.mathpix.com/files/v1/uri", {
  method: "POST",
  headers: {
    app_key: "APP_KEY",
    "Content-Type": "application/json",
  },
  body: JSON.stringify({
    source_uri: "https://cdn.mathpix.com/examples/cs229-notes1.pdf",
    conversion_formats: { docx: true, md: true },
  }),
});
const { file_id } = await response.json();
console.log(`File ID: ${file_id}`);

Example response:
{
  "file_id": "b1c9c3a8-55e4-4a09-b7d0-218ba5de4c4d"
}
Examples by source scheme
# Public https:// URL (no data source required)
curl -X POST https://api.mathpix.com/files/v1/uri \
  -H 'app_key: APP_KEY' \
  -H 'Content-Type: application/json' \
  --data '{
    "source_uri": "https://cdn.mathpix.com/examples/cs229-notes1.pdf",
    "conversion_formats": { "docx": true, "md": true }
  }'

# s3://: bucket must have a registered AWS data source (see /reference/files-v1-data-sources)
curl -X POST https://api.mathpix.com/files/v1/uri \
  -H 'app_key: APP_KEY' \
  -H 'Content-Type: application/json' \
  --data '{
    "source_uri": "s3://your-bucket/contracts/contract-001.pdf",
    "conversion_formats": { "docx": true, "md": true }
  }'

# gs://: bucket must have a registered GCS data source (see /reference/files-v1-data-sources)
curl -X POST https://api.mathpix.com/files/v1/uri \
  -H 'app_key: APP_KEY' \
  -H 'Content-Type: application/json' \
  --data '{
    "source_uri": "gs://your-bucket/contracts/contract-001.pdf",
    "conversion_formats": { "docx": true, "md": true }
  }'

# Azure Blob: container must have a registered Azure data source (see /reference/files-v1-data-sources)
curl -X POST https://api.mathpix.com/files/v1/uri \
  -H 'app_key: APP_KEY' \
  -H 'Content-Type: application/json' \
  --data '{
    "source_uri": "https://acmestorage.blob.core.windows.net/contracts/contract-001.pdf",
    "conversion_formats": { "docx": true, "md": true }
  }'
Request parameters
In addition to the Files-API-specific parameters below, this endpoint accepts the same OCR and conversion options as POST /v3/pdf (alphabets_allowed, rm_spaces, math_inline_delimiters, include_smiles, etc.).
source_uri Remote location of the source document. One of:
- s3://bucket/key (requires a registered AWS data source for the bucket).
- gs://bucket/key (requires a registered GCS data source).
- https://{account}.blob.core.windows.net/{container}/{blob} (requires a registered Azure data source).
- https://... (public URL with no auth).
conversion_formats Which output formats to produce. Same shape and supported keys as POST /v3/pdf — e.g. { "docx": true, "md": true, "tex.zip": true }. Output also includes Mathpix Markdown (.mmd) by default.
destination_uri Where to write conversion results when they complete. Same scheme rules as source_uri; must be a writeable location backed by a registered data source. Outputs land at <destination_uri>/<destination_basename>.<ext> per requested format (destination_basename defaults to the file_id), and cropped images (when image_output_mode is "local") under <destination_uri>/images/. When omitted, results stay in Mathpix storage and are fetched via GET /files/v1/{file_id}.{ext} — see Data retention for how long each artifact is kept.
destination_basename Basename for this file's output objects within destination_uri — results land at <destination_uri>/<destination_basename>.<ext>. Defaults to the file_id.
custom_id Customer-supplied identifier for this submission. Max 256 chars, characters [A-Za-z0-9_\-.:], case-sensitive. Echoed back in the job file listing for correlation. Requires job_id — custom_id is a per-job identifier, so submitting one without a job_id is rejected with 400 bad_request. Together, (job_id, custom_id) is the idempotency key: re-submitting the same pair returns the original file_id rather than creating a new submission. To make a submission idempotent without a job_id, use the Idempotency-Key header instead (below).
job_id Associates this file with a job. When omitted, the file is treated as standalone. Required whenever custom_id is supplied. See Jobs for how files group into jobs and Idempotency for the dedup semantics.
image_output_mode Set to "local" to write cropped images (figures, equation crops, etc.) directly into your destination_uri storage (under <destination_uri>/images/), alongside the converted outputs. When unset (default), cropped images stay on Mathpix's CDN and are referenced by URL from the Mathpix Markdown output. Requires destination_uri to be set and backed by a registered data source.
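Several of the parameter rules above depend on one another (custom_id needs job_id, image_output_mode "local" needs destination_uri), so a client may want to check a request body locally before sending it. The sketch below is one possible way to do that; `check_submission` and `output_locations` are hypothetical helper names, the 1-character minimum for custom_id is an assumption (only the 256-character maximum is stated), and passing these checks does not guarantee the server will accept the request.

```python
import re

# Character set and maximum length quoted from the custom_id description.
# The minimum length of 1 is an assumption of this sketch.
CUSTOM_ID_RE = re.compile(r"[A-Za-z0-9_\-.:]{1,256}")


def check_submission(body: dict) -> list[str]:
    """Return the problems found in a /files/v1/uri request body (empty if none)."""
    problems = []
    if not body.get("source_uri"):
        problems.append("source_uri is required")
    custom_id = body.get("custom_id")
    if custom_id is not None:
        if not CUSTOM_ID_RE.fullmatch(custom_id):
            problems.append("custom_id has invalid characters or length")
        if not body.get("job_id"):
            problems.append("custom_id requires job_id")
    if body.get("image_output_mode") == "local" and not body.get("destination_uri"):
        problems.append("image_output_mode 'local' requires destination_uri")
    return problems


def output_locations(body: dict, file_id: str) -> list[str]:
    """List <destination_uri>/<destination_basename>.<ext> for each requested format."""
    dest = body.get("destination_uri")
    if not dest:
        return []  # results stay in Mathpix storage
    base = body.get("destination_basename") or file_id  # defaults to the file_id
    formats = [fmt for fmt, on in body.get("conversion_formats", {}).items() if on]
    return [f"{dest.rstrip('/')}/{base}.{fmt}" for fmt in formats]
```

Note that `output_locations` only covers the formats named in conversion_formats; it does not try to predict where the default Mathpix Markdown output or cropped images are written.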
Idempotency
A single-document submission can be made safe to retry in one of two ways:
- Idempotency-Key request header: for standalone submissions with no job_id. Send a client-generated unique value (same constraints as custom_id: max 256 chars, [A-Za-z0-9_\-.:]). Re-sending a request with the same key returns the original file_id instead of creating (and billing) a duplicate. This is the recommended dedup mechanism for /uri.
- (job_id, custom_id): for submissions that belong to a job. See Jobs → Idempotency.
If both a custom_id (with its required job_id) and an Idempotency-Key header are present, the (job_id, custom_id) pair takes precedence.
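For a standalone submission, the important detail is that the key is generated once per logical submission and reused unchanged on every retry. A minimal sketch using only the standard library follows; `build_request` is a hypothetical helper, and a UUID is just one convenient choice of key that happens to satisfy the stated character and length constraints.

```python
import json
import urllib.request
import uuid

API_URL = "https://api.mathpix.com/files/v1/uri"


def build_request(app_key: str, source_uri: str, idempotency_key: str) -> urllib.request.Request:
    """Build (but do not send) a standalone submission carrying an Idempotency-Key."""
    body = json.dumps({"source_uri": source_uri}).encode("utf-8")
    return urllib.request.Request(
        API_URL,
        data=body,
        method="POST",
        headers={
            "app_key": app_key,
            "Content-Type": "application/json",
            "Idempotency-Key": idempotency_key,
        },
    )


# Generate the key once, then reuse it for the first attempt and any retry.
key = str(uuid.uuid4())  # 36 characters: hex digits and hyphens
first_attempt = build_request("APP_KEY", "https://example.com/file.pdf", key)
retry_attempt = build_request("APP_KEY", "https://example.com/file.pdf", key)
```

Passing either request to `urllib.request.urlopen` would send it. Generating a fresh key inside the retry loop would defeat the purpose, since each attempt would then look like a new submission.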
Response
{ "file_id": "<uuid>" }
The returned file_id is the handle for everything that happens next — see Async Document Lifecycle:
- GET /files/v1/{file_id}: poll status (pending/split/completed/error).
- GET /files/v1/{file_id}.{ext}: download a specific output format once complete.
- DELETE /files/v1/{file_id}: explicitly remove before the retention window expires.
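Polling the status endpoint usually amounts to a loop with a fixed interval and an overall timeout. The sketch below keeps the HTTP call out of the loop: `get_status` is a hypothetical caller-supplied function that performs GET /files/v1/{file_id} and returns the status string, because the shape of that response is not described in this section. Treating "completed" and "error" as terminal and "pending" and "split" as in-progress is also an assumption.

```python
import time
from typing import Callable

# Assumed terminal states; "pending" and "split" are treated as still in progress.
TERMINAL_STATUSES = {"completed", "error"}


def wait_for_file(
    get_status: Callable[[str], str],
    file_id: str,
    interval: float = 2.0,
    timeout: float = 600.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll until the file reaches a terminal status or the timeout elapses."""
    waited = 0.0
    while True:
        status = get_status(file_id)
        if status in TERMINAL_STATUSES:
            return status
        if waited >= timeout:
            raise TimeoutError(f"{file_id} still {status} after {waited:.0f}s")
        sleep(interval)
        waited += interval
```

The `sleep` parameter exists so the loop can be exercised in tests without real delays; in normal use the default `time.sleep` is enough.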
Errors
Errors follow the shared Files API error envelope. Codes you'll see most often on this endpoint:
| Code | HTTP | When it fires |
|---|---|---|
| bad_request | 400 | source_uri missing, malformed, or uses an unsupported scheme. |
| unauthorized | 401 | app_key invalid or missing. |
| data_source_not_found | 404 | source_uri references a bucket with no registered data source. |
| data_source_access_denied | 403 | Bucket has a data source but the IAM grant isn't reachable (revoked, wrong ExternalId, expired token). |
| not_found | 404 | URI's bucket / key returned 404 from the upstream storage. |
| unsupported_format | 415 | Input MIME type is not one of the supported input types. |
| quota_exceeded | 429 | Monthly page or file quota reached. |
| storage_throttled | 503 | Upstream storage backend returned throttling — retry with backoff. |
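Of the codes above, only storage_throttled is described as worth retrying with backoff. A client could encode that decision roughly as follows; this is a sketch, the helper names are hypothetical, and treating every other code as non-retryable is an assumption rather than documented behavior. Parsing the error envelope itself is left out because its shape is not shown in this section.

```python
# Error codes and HTTP statuses as listed in the table above.
ERROR_STATUS = {
    "bad_request": 400,
    "unauthorized": 401,
    "data_source_not_found": 404,
    "data_source_access_denied": 403,
    "not_found": 404,
    "unsupported_format": 415,
    "quota_exceeded": 429,
    "storage_throttled": 503,
}

# Assumption: only the code documented as "retry with backoff" is retried.
RETRYABLE = {"storage_throttled"}


def is_retryable(code: str) -> bool:
    """Return True if a failed submission with this error code should be retried."""
    return code in RETRYABLE


def backoff_delays(attempts: int, base: float = 1.0, cap: float = 30.0) -> list[float]:
    """Exponential backoff schedule: base, 2*base, 4*base, ... capped at `cap` seconds."""
    return [min(cap, base * 2 ** i) for i in range(attempts)]
```

For example, `backoff_delays(4)` yields waits of 1, 2, 4, and 8 seconds. Production clients often add random jitter to each delay so that many clients do not retry in lockstep.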
Supported inputs
PDF, DOCX, DOC, PPTX, EPUB, ODT, RTF, and other document formats (detected by MIME type). See Supported formats for the full list.
See also
- POST /files/v1/jobs: submit many URIs at once.
- POST /files/v1: direct multipart file upload (no remote URI needed).
- Async Document Lifecycle: status, download, and delete endpoints for the file_id you receive.
- Data sources: register the buckets you'll source from.
- Files API Quickstart: guide walkthrough.
- Files API overview: when to use Files API vs v3/pdf.