Process a Document Async

POST api.mathpix.com/files/v1/uri

Submit a single document for async processing by remote URI.

Returns a file_id for polling status and downloading results.

note

Use POST /files/v1/jobs for bulk submission of many URIs in one call.

Accepted `source_uri` forms

Depending on its type, a source may require a registered data source: a connection between your Mathpix account and a bucket or container you own, with an access grant. Public URLs require none; the "Data source required" column shows the requirement for each type.

Type	Scheme	Data source required	Example
Public URL	`https://`	No	`https://example.com/file.pdf`
AWS S3	`s3://`	Yes	`s3://your-bucket/path/file.pdf`
Google Cloud Storage	`gs://`	Yes	`gs://your-bucket/path/file.pdf`
Azure Blob Storage	`https://` (host-matched)	Yes	`https://{account}.blob.core.windows.net/{container}/{blob}.pdf`

note

Azure Blob sources are recognized by their host (*.blob.core.windows.net). Supply the full HTTPS URL; there is no separate Azure scheme.
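The table and note above can be read as a small classification rule. The following is a minimal client-side sketch of that rule, not the service's own logic; the function name `classify_source_uri` is hypothetical, and treating any scheme outside the table as unsupported is an assumption.

```python
from urllib.parse import urlparse

# Host suffix that marks an Azure Blob source, per the note above.
AZURE_HOST_SUFFIX = ".blob.core.windows.net"


def classify_source_uri(uri: str) -> tuple[str, bool]:
    """Return (source type, needs_registered_data_source) for a source_uri."""
    parsed = urlparse(uri)
    scheme = parsed.scheme.lower()
    host = (parsed.hostname or "").lower()
    if scheme == "s3":
        return ("AWS S3", True)
    if scheme == "gs":
        return ("Google Cloud Storage", True)
    if scheme == "https":
        # Azure has no scheme of its own; it is matched by host.
        if host.endswith(AZURE_HOST_SUFFIX):
            return ("Azure Blob Storage", True)
        return ("Public URL", False)
    # Assumption: anything not listed in the table is unsupported.
    raise ValueError(f"unsupported source_uri scheme: {scheme!r}")
```

A check like this can flag a missing data source registration before the request is sent, instead of waiting for a `data_source_not_found` response.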

Example

{
  "source_uri": "https://cdn.mathpix.com/examples/cs229-notes1.pdf",
  "conversion_formats": { "docx": true, "md": true }
}

curl -X POST https://api.mathpix.com/files/v1/uri \
-H 'app_key: APP_KEY' \
-H 'Content-Type: application/json' \
--data '{
  "source_uri": "https://cdn.mathpix.com/examples/cs229-notes1.pdf",
  "conversion_formats": { "docx": true, "md": true }
}'

import requests
r = requests.post("https://api.mathpix.com/files/v1/uri",
    json={
        "source_uri": "https://cdn.mathpix.com/examples/cs229-notes1.pdf",
        "conversion_formats": {"docx": True, "md": True},
    },
    headers={
        "app_key": "APP_KEY",
        "Content-Type": "application/json",
    },
)
print(r.json())  # {"file_id": "<uuid>"}

const response = await fetch("https://api.mathpix.com/files/v1/uri", {
  method: "POST",
  headers: {
    app_key: "APP_KEY",
    "Content-Type": "application/json",
  },
  body: JSON.stringify({
    source_uri: "https://cdn.mathpix.com/examples/cs229-notes1.pdf",
    conversion_formats: { docx: true, md: true },
  }),
});
const { file_id } = await response.json();
console.log(`File ID: ${file_id}`);

body := bytes.NewBufferString(`{
  "source_uri": "https://cdn.mathpix.com/examples/cs229-notes1.pdf",
  "conversion_formats": {"docx": true, "md": true}
}`)
req, _ := http.NewRequest("POST", "https://api.mathpix.com/files/v1/uri", body)
req.Header.Set("app_key", "APP_KEY")
req.Header.Set("Content-Type", "application/json")
resp, _ := http.DefaultClient.Do(req)
defer resp.Body.Close()
result, _ := io.ReadAll(resp.Body)
fmt.Println(string(result)) // {"file_id": "<uuid>"}

HttpClient client = HttpClient.newHttpClient();
String body = """
    {
      "source_uri": "https://cdn.mathpix.com/examples/cs229-notes1.pdf",
      "conversion_formats": {"docx": true, "md": true}
    }
    """;
HttpRequest request = HttpRequest.newBuilder()
    .uri(URI.create("https://api.mathpix.com/files/v1/uri"))
    .header("app_key", "APP_KEY")
    .header("Content-Type", "application/json")
    .POST(HttpRequest.BodyPublishers.ofString(body))
    .build();
HttpResponse<String> response = client.send(request, HttpResponse.BodyHandlers.ofString());
System.out.println(response.body());

Example response
{
  "file_id": "b1c9c3a8-55e4-4a09-b7d0-218ba5de4c4d"
}

Examples by source scheme

The request shape is identical across schemes; only source_uri changes. Buckets and containers must have a registered data source; public https:// URLs need none.

Public URL

{
  "source_uri": "https://cdn.mathpix.com/examples/cs229-notes1.pdf",
  "conversion_formats": { "docx": true, "md": true }
}

AWS S3

{
  "source_uri": "s3://your-bucket/contracts/contract-001.pdf",
  "conversion_formats": { "docx": true, "md": true }
}

Google Cloud Storage

{
  "source_uri": "gs://your-bucket/contracts/contract-001.pdf",
  "conversion_formats": { "docx": true, "md": true }
}

Azure Blob

{
  "source_uri": "https://acmestorage.blob.core.windows.net/contracts/contract-001.pdf",
  "conversion_formats": { "docx": true, "md": true }
}

Request parameters

In addition to the Files-API-specific parameters below, this endpoint accepts the same OCR and conversion options as POST /v3/pdf (e.g. alphabets_allowed, rm_spaces, math_inline_delimiters, include_smiles).

source_uri string

Remote location of the source document. See Accepted source_uri forms above for the supported types and when a registered data source is required.

filename string (optional)

Display name for this file, returned in status responses and the job file listing, and used to name outputs. Defaults to <file_id>.pdf when omitted.

conversion_formats object (optional)

Which output formats to produce. Same shape and supported keys as POST /v3/pdf, e.g. { "docx": true, "md": true, "tex.zip": true }. Output also includes Mathpix Markdown (.mmd) by default.

destination_uri string (optional)

Where to write conversion results when they complete. Same scheme rules as source_uri; must be a writeable location backed by a registered data source.

Output file path convention: each requested format lands at <destination_uri>/<destination_basename>.<ext> (the basename defaults to the file_id). When image_output_mode is "local", cropped images land under <destination_uri>/images/.

When omitted, results stay in Mathpix storage and are fetched via GET /files/v1/{file_id}.{ext}. See Data retention for how long each artifact is kept.

note

The destination_uri must be short enough that the output object keys derived from it stay within your storage provider's object-key limit:

AWS S3 and Google Cloud Storage: 1024 bytes
Azure Blob Storage: 1024 UTF-16 characters and 254 path segments

The limit is measured on the object key; the scheme and bucket prefix do not count. A destination_uri too long to satisfy the limit is rejected with 400 destination_uri_too_long.
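To check a destination before submitting, the limits above can be applied to the full URI of an output object. This is a rough sketch under stated assumptions: the helper names are hypothetical, any `https://` URI is assumed to be Azure Blob with the container as the first path segment, and how the service derives output keys internally is not specified here.

```python
from urllib.parse import urlparse


def object_key(uri: str) -> str:
    """Strip the scheme and bucket (or Azure host and container) from a URI."""
    parsed = urlparse(uri)
    path = parsed.path.lstrip("/")
    if parsed.scheme == "https":
        # Assumption: Azure Blob URL, so the first path segment is the container.
        _container, _, path = path.partition("/")
    return path


def key_within_limit(uri: str) -> bool:
    """Check an output object URI against its provider's object-key limit."""
    key = object_key(uri)
    if urlparse(uri).scheme in ("s3", "gs"):
        # S3 and GCS: at most 1024 bytes of UTF-8.
        return len(key.encode("utf-8")) <= 1024
    # Azure: at most 1024 UTF-16 code units and 254 path segments.
    utf16_units = len(key.encode("utf-16-le")) // 2
    segments = [s for s in key.split("/") if s]
    return utf16_units <= 1024 and len(segments) <= 254
```

Because the S3 and GCS limit counts bytes while the Azure limit counts UTF-16 units, the same non-ASCII key can pass on one provider and fail on another.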

s3_region string (optional)

Region of the destination_uri S3 bucket. Defaults to us-east-1.

destination_basename string (optional)

Basename for this file's output objects within destination_uri; results land at <destination_uri>/<destination_basename>.<ext>. Defaults to the file_id.

If <destination_uri>/<destination_basename>.<ext> would exceed the provider's object-key limit, the basename falls back to the fixed name output, and results land at <destination_uri>/output.<ext>. The folder still uniquely identifies the file; to preserve your own naming, keep basenames short or omit this field.
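The fallback rule can be sketched as follows, so that a client can predict where results will land. This is an illustration only: `output_key` is a hypothetical helper, it takes the folder part of the key (the destination without scheme and bucket), and it applies the 1024-byte S3/GCS limit; the Azure rule and the service's exact behavior may differ.

```python
# Object-key limit for AWS S3 and Google Cloud Storage, in UTF-8 bytes.
S3_GCS_KEY_LIMIT_BYTES = 1024


def output_key(folder_key: str, basename: str, ext: str,
               limit: int = S3_GCS_KEY_LIMIT_BYTES) -> str:
    """Return the expected object key for one output format."""
    folder = folder_key.rstrip("/")
    candidate = f"{folder}/{basename}.{ext}"
    if len(candidate.encode("utf-8")) <= limit:
        return candidate
    # Basename too long: fall back to the fixed name "output".
    return f"{folder}/output.{ext}"
```

For example, `output_key("contracts/2024", "contract-001", "docx")` yields `contracts/2024/contract-001.docx`, while an overlong basename yields `contracts/2024/output.docx`.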

custom_id string (optional)

Customer-supplied identifier for this submission. Up to 256 characters from the set [A-Za-z0-9_\-.:]; case-sensitive. Echoed back in the job file listing so you can correlate results with your own ids.

Requires job_id. A custom_id is a per-job identifier, so submitting one without a job_id is rejected with 400 bad_request.

Together, (job_id, custom_id) is the idempotency key: re-submitting the same pair returns the original file_id rather than creating a new submission. To make a submission idempotent without a job_id, use the Idempotency-Key header instead (see Idempotency below).
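The custom_id constraints lend themselves to a client-side check before submission. The sketch below is one way to do it; `validate_ids` is a hypothetical helper, and rejecting an empty custom_id is an assumption, since only the maximum length is stated.

```python
import re
from typing import Optional

# Allowed characters and maximum length for custom_id, as described above.
# Assumption: an empty custom_id is not valid.
_CUSTOM_ID_PATTERN = re.compile(r"[A-Za-z0-9_\-.:]{1,256}")


def validate_ids(job_id: Optional[str], custom_id: Optional[str]) -> None:
    """Raise ValueError if the (job_id, custom_id) pair would be rejected."""
    if custom_id is None:
        return  # job_id alone, or neither, is fine
    if job_id is None:
        raise ValueError("custom_id requires job_id")
    if not _CUSTOM_ID_PATTERN.fullmatch(custom_id):
        raise ValueError("custom_id must be 1-256 chars of [A-Za-z0-9_\\-.:]")
```

Catching these cases locally avoids a round trip that would end in 400 bad_request.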

job_id string (optional)

Associates this file with a job. When omitted, the file is treated as standalone. Required whenever custom_id is supplied. See Jobs for how files group into jobs and Idempotency for the dedup semantics.

image_output_mode string (optional)

Set to "local" to write cropped images (figures, equation crops, etc.) directly into your destination_uri storage, under <destination_uri>/images/, alongside the converted outputs. Requires destination_uri to be set and backed by a registered data source.

When unset (default), cropped images stay on Mathpix's CDN and are referenced by URL from the Mathpix Markdown output.

Idempotency

A single-document submission can be made safe to retry in one of two ways:

Idempotency-Key request header: for standalone submissions with no job_id. Send a client-generated unique value (same constraints as custom_id: max 256 chars, [A-Za-z0-9_\-.:]). Re-sending a request with the same key returns the original file_id instead of creating (and billing) a duplicate. This is the recommended dedup mechanism for /uri.
(job_id, custom_id): for submissions that belong to a job. See Jobs → Idempotency.

If both a custom_id (with its required job_id) and an Idempotency-Key header are present, the (job_id, custom_id) pair takes precedence.
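For a standalone submission, the header approach might look like the sketch below. It only builds the request and does not send it; `build_request` is a hypothetical helper, and using a UUID as the key is one convenient choice that fits the stated constraints, not a requirement.

```python
import json
import urllib.request
import uuid

API_URL = "https://api.mathpix.com/files/v1/uri"


def build_request(source_uri: str, app_key: str,
                  idempotency_key: str) -> urllib.request.Request:
    """Build a POST /files/v1/uri request carrying an Idempotency-Key header."""
    body = json.dumps({
        "source_uri": source_uri,
        "conversion_formats": {"md": True},
    }).encode("utf-8")
    return urllib.request.Request(
        API_URL,
        data=body,
        method="POST",
        headers={
            "app_key": app_key,
            "Content-Type": "application/json",
            "Idempotency-Key": idempotency_key,
        },
    )


# Generate the key once per logical submission and reuse it on every retry.
key = str(uuid.uuid4())  # 36 chars of hex digits and hyphens
first = build_request("https://cdn.mathpix.com/examples/cs229-notes1.pdf", "APP_KEY", key)
retry = build_request("https://cdn.mathpix.com/examples/cs229-notes1.pdf", "APP_KEY", key)
# urllib.request.urlopen(first) would send the request.
```

The key has to be stored alongside the pending submission; generating a fresh key on each retry would defeat the deduplication.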

note

Only a live file (pending, split, or completed) counts as an idempotent hit. If the original file reached error status or was deleted, the same key is a miss and resubmission creates a fresh file, so you can retry failures with the same key.

Response

{
  "file_id": "b1c9c3a8-55e4-4a09-b7d0-218ba5de4c4d"
}

Response body

file_id string

The handle for everything that happens next; see Async Document Lifecycle.

GET /files/v1/{file_id}: poll status (pending / split / completed / error).
GET /files/v1/{file_id}.{ext}: download a specific output format once complete.
DELETE /files/v1/{file_id}: explicitly remove before the retention window expires.
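A polling loop over the four statuses could be sketched as below. The shape of the status response is not shown in this section, so the sketch takes a caller-supplied `get_status` function (hypothetical) that performs GET /files/v1/{file_id} and returns the status string; the interval and timeout defaults are arbitrary.

```python
import time
from typing import Callable

# Statuses after which polling can stop.
TERMINAL_STATUSES = {"completed", "error"}


def poll_until_done(get_status: Callable[[], str],
                    interval_s: float = 2.0,
                    timeout_s: float = 600.0,
                    sleep: Callable[[float], None] = time.sleep) -> str:
    """Call get_status until it returns a terminal status or the timeout passes."""
    deadline = time.monotonic() + timeout_s
    while True:
        status = get_status()
        if status in TERMINAL_STATUSES:
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(f"still {status!r} after {timeout_s} seconds")
        sleep(interval_s)
```

On "completed", the caller can fetch each format via GET /files/v1/{file_id}.{ext}; on "error", resubmitting with the same idempotency key creates a fresh file.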

Errors

Errors follow the shared Files API error envelope. Codes you'll see most often on this endpoint:

Code	HTTP	When it fires
`bad_request`	400	`source_uri` missing, malformed, or uses an unsupported scheme.
`destination_uri_too_long`	400	`destination_uri` is too long: an output object key derived from it would exceed the storage provider's object-key limit (S3/GCS 1024 bytes; Azure 1024 chars + 254 segments).
`unauthorized`	401	`app_key` invalid or missing.
`data_source_not_found`	404	`source_uri` references a bucket with no registered data source.
`data_source_access_denied`	403	Bucket has a data source but the IAM grant isn't usable (revoked, wrong ExternalId, expired token).
`not_found`	404	URI's bucket / key returned 404 from the upstream storage.
`unsupported_format`	415	Input MIME type is not one of the supported input types.
`quota_exceeded`	429	Monthly page or file quota reached.
`storage_throttled`	503	Upstream storage backend returned throttling; retry with backoff.
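Of the codes in the table, only `storage_throttled` is described as worth retrying with backoff. A minimal sketch of that policy follows; the `submit` callable and its `(error_code, file_id)` return shape are hypothetical stand-ins for your own request code, and the delay values are arbitrary.

```python
import time
from typing import Callable, Optional, Tuple

# Codes the table above marks as "retry with backoff".
RETRY_WITH_BACKOFF = {"storage_throttled"}


def backoff_delay(attempt: int, base_s: float = 1.0, cap_s: float = 30.0) -> float:
    """Exponential delay for a zero-based attempt number, capped at cap_s."""
    return min(cap_s, base_s * (2 ** attempt))


def submit_with_retry(submit: Callable[[], Tuple[Optional[str], Optional[str]]],
                      max_attempts: int = 5,
                      sleep: Callable[[float], None] = time.sleep) -> Optional[str]:
    """Call submit() until it succeeds, retrying only throttling errors."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        error_code, file_id = submit()
        if error_code is None:
            return file_id
        last_attempt = attempt == max_attempts - 1
        if error_code not in RETRY_WITH_BACKOFF or last_attempt:
            raise RuntimeError(f"submission failed: {error_code}")
        sleep(backoff_delay(attempt))
```

Pairing this with an Idempotency-Key header keeps retries from creating duplicate submissions. Production code would usually add random jitter to the delay.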

Supported inputs

PDF, DOCX, DOC, PPTX, EPUB, ODT, RTF, and other document formats (detected by MIME type). See Supported formats for the full list.
