
Async Document Lifecycle

Once a file has been submitted via POST /files/v1/uri, POST /files/v1/jobs, or POST /files/v1 (direct multipart upload), use these endpoints to poll its status, download its converted results, or delete it.

Endpoint                       Description
GET /files/v1/{file_id}        Poll processing status
GET /files/v1/{file_id}.{ext}  Download a converted result in the requested format
DELETE /files/v1/{file_id}     Permanently remove a file and its results

GET /files/v1/{file_id}

GET api.mathpix.com/files/v1/{file_id}

Returns the file's status and processing progress. Poll until status == "completed" (or "error").

Example

curl -H 'app_key: APP_KEY' \
  https://api.mathpix.com/files/v1/b1c9c3a8-55e4-4a09-b7d0-218ba5de4c4d

// Response 200
{
  "file_id": "b1c9c3a8-55e4-4a09-b7d0-218ba5de4c4d",
  "status": "split", // "pending" | "split" | "completed" | "error"
  "filename": "input.pdf",
  "custom_id": "contract-001", // echoed back when supplied at submit; null otherwise
  "num_pages": 50,
  "num_pages_completed": 25,
  "percent_done": 50.0,
  "format_primary": "mmd",
  "formats": { // per-requested-format conversion status
    "md": "completed",
    "docx": "processing"
  }
}

Status values

Status     Meaning
pending    File registered, queued for processing.
split      Pages extracted, OCR and conversion in progress (poll percent_done).
completed  All processing finished; results available via download.
error      Processing failed; see the error fields on the response (below).
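
The polling pattern described above can be sketched as follows. This is a minimal illustration, not an official client: the helper names (fetch_status, poll_until_terminal), the 2-second interval, and the 10-minute timeout are all assumptions chosen for the example, and only the app_key header shown in the curl example is sent.

```python
import json
import time
import urllib.request

BASE_URL = "https://api.mathpix.com/files/v1"
TERMINAL = {"completed", "error"}  # statuses after which polling can stop


def fetch_status(file_id, app_key):
    """GET /files/v1/{file_id} and return the decoded JSON body."""
    req = urllib.request.Request(f"{BASE_URL}/{file_id}", headers={"app_key": app_key})
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def poll_until_terminal(fetch, interval=2.0, timeout=600.0,
                        sleep=time.sleep, clock=time.monotonic):
    """Call fetch() until status is "completed" or "error"; return the final body.

    interval and timeout are illustrative defaults, not documented values.
    sleep and clock are injectable so the loop can be exercised offline.
    """
    deadline = clock() + timeout
    while True:
        body = fetch()
        if body.get("status") in TERMINAL:
            return body
        if clock() >= deadline:
            raise TimeoutError(f"still {body.get('status')!r} after {timeout}s")
        sleep(interval)
```

A typical call would be poll_until_terminal(lambda: fetch_status(file_id, "APP_KEY")). Passing the fetch step in as a callable keeps the loop independent of the HTTP library in use.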

Per-format conversion status

format_primary is always mmd. The formats map carries one entry per format requested via conversion_formats, each with its own conversion status (pending / processing / completed / error). Conversions complete independently of the top-level status and can lag behind it: a file can be completed overall while an individual format is still processing. Check formats.{ext} before downloading that extension; downloading a format that isn't yet completed returns 404 format_not_ready.
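
As a rough sketch of that check, the helpers below read the formats map from a status response. The function names are hypothetical, and the code assumes the response shape shown in the example above.

```python
def is_downloadable(status_body, ext):
    """True only when the requested format's own conversion status is "completed"."""
    return status_body.get("formats", {}).get(ext) == "completed"


def formats_by_state(status_body):
    """Group requested formats by conversion status, e.g. to decide what to wait for."""
    groups = {"pending": [], "processing": [], "completed": [], "error": []}
    for ext, state in status_body.get("formats", {}).items():
        # setdefault tolerates a state this sketch does not know about
        groups.setdefault(state, []).append(ext)
    return groups
```

With the example response above, is_downloadable(body, "md") is true while is_downloadable(body, "docx") is false, so only the md download should be attempted at that point.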

Error fields

When status is "error", the response carries the same error + error_info object used by Files API request errors, instead of progress:

// Response 200  (status == "error")
{
  "file_id": "b1c9c3a8-55e4-4a09-b7d0-218ba5de4c4d",
  "status": "error",
  "error": "data_source_access_denied", // machine-readable code
  "error_info": {
    "id": "data_source_access_denied", // duplicates `error` (v3-parser compatibility)
    "message": "Access denied to source" // human-readable detail
  },
  "filename": "input.pdf",
  "num_pages": 0,
  "percent_done": 0.0
}

error is a stable, machine-readable code; see the error reference for the full list. Because remote-source fetching happens asynchronously, source problems surface here (not on the original submit call) — common values include data_source_access_denied (the bucket's grant isn't set up), data_source_not_found (no data source registered for the bucket), and content_too_large.
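
One way a client might surface these fields is to turn an error status into an exception. The sketch below is illustrative only: FileProcessingError and raise_for_error_status are hypothetical names, and falling back to error_info.id when error is absent is an assumption based on the note that the two duplicate each other.

```python
class FileProcessingError(Exception):
    """Raised when a polled file reports status == "error"."""

    def __init__(self, code, message):
        super().__init__(f"{code}: {message}")
        self.code = code        # stable, machine-readable code
        self.message = message  # human-readable detail


def raise_for_error_status(body):
    """Return body unchanged unless its status is "error"; then raise."""
    if body.get("status") != "error":
        return body
    info = body.get("error_info") or {}
    # Assumption: error_info.id mirrors error, so it is a safe fallback.
    code = body.get("error") or info.get("id") or "unknown"
    raise FileProcessingError(code, info.get("message", ""))
```

Branching on the code attribute rather than on the message text keeps client logic tied to the stable identifier.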


GET /files/v1/{file_id}.{ext}

GET api.mathpix.com/files/v1/{file_id}.{ext}

Download a converted result. The MMD format is always produced; other formats are produced only when requested via conversion_formats on the original submission.

Supported extensions

mmd, md, md.zip, mmd.zip, docx, pptx, html, html.zip, tex.zip, latex.pdf, pdf, lines.json, lines.mmd.json.

See Supported formats for the full list with descriptions.

Example

curl -H 'app_key: APP_KEY' \
  -o output.docx \
  https://api.mathpix.com/files/v1/b1c9c3a8-55e4-4a09-b7d0-218ba5de4c4d.docx

Response headers

Content-Type:        <MIME type for the requested extension>
Content-Disposition: attachment; filename="<basename>.<ext>"

Errors

Code                HTTP  When it fires
format_not_ready    404   Format is still converting (formats.{ext} is pending or processing). Retry after a short delay.
unsupported_format  415   Extension wasn't requested via conversion_formats on the original submission, or isn't a supported output format.
not_found           404   file_id doesn't exist (or was deleted).

lines.json and lines.mmd.json are available once the primary mmd format completes.
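
The download-and-retry behavior could look roughly like the sketch below. It assumes that download error responses carry the code in a top-level "error" JSON field, as status responses do; this page does not show the error body for downloads, so treat that as unverified. The helper names, attempt count, and delay are likewise illustrative.

```python
import json
import time
import urllib.error
import urllib.request

BASE_URL = "https://api.mathpix.com/files/v1"


def result_url(file_id, ext):
    """Build the download URL for one converted format."""
    return f"{BASE_URL}/{file_id}.{ext}"


def should_retry(http_status, error_code):
    """Only 404 format_not_ready is transient; 404 not_found and 415 are not."""
    return http_status == 404 and error_code == "format_not_ready"


def download(file_id, ext, app_key, attempts=5, delay=2.0):
    """Return the result bytes, retrying while the format is still converting."""
    req = urllib.request.Request(result_url(file_id, ext), headers={"app_key": app_key})
    for attempt in range(attempts):
        try:
            with urllib.request.urlopen(req) as resp:
                return resp.read()
        except urllib.error.HTTPError as exc:
            try:
                # Assumed error body shape: {"error": "<code>", ...}
                code = json.loads(exc.read()).get("error")
            except (ValueError, AttributeError):
                code = None
            if not should_retry(exc.code, code) or attempt == attempts - 1:
                raise
            time.sleep(delay)
```

Distinguishing the two 404 cases by code matters here: retrying on not_found would loop on a file that no longer exists.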


DELETE /files/v1/{file_id}

DELETE api.mathpix.com/files/v1/{file_id}

Permanently remove a file and its results from Mathpix-owned storage. Files are auto-deleted on a per-artifact schedule (source and page images after 30 days, text outputs after 90 days — see Data retention); call this to remove sooner.

Example

curl -X DELETE -H 'app_key: APP_KEY' \
  https://api.mathpix.com/files/v1/b1c9c3a8-55e4-4a09-b7d0-218ba5de4c4d

// Response 200
{
  "file_id": "b1c9c3a8-55e4-4a09-b7d0-218ba5de4c4d",
  "status": "deleted"
}

Behavior

  • Only terminal files can be deleted. A file still being processed (pending / split) cannot be deleted; DELETE returns 409 conflict. Wait for completed or error, then delete.
  • Idempotent. Calling DELETE on an already-deleted file returns the same 200 / status: deleted body, not 404.
  • Mathpix-owned storage only. Results delivered to a customer-owned bucket via destination_uri are not affected — those live under your bucket's own lifecycle policy. Mathpix never deletes from customer-owned buckets.
  • Billing counters preserved. Per-month page and file counts that drive billing are never decremented. Deleting a file does not credit your account.
  • Job counters preserved. A file's job remains intact; file_count / files_completed / files_errored on the parent job are not adjusted.

Errors

Code       HTTP  When it fires
not_found  404   file_id doesn't resolve to any row (and has never existed).
conflict   409   file_id exists but is still processing (pending / split), so it is not yet deletable.
forbidden  403   file_id exists but is owned by a different group.
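
The sketch below shows one way a client might map DELETE outcomes to actions, based only on the HTTP status codes listed above. The function names and the returned labels are hypothetical; a 409 is treated as a signal to poll for a terminal status and try again later.

```python
import urllib.error
import urllib.request

BASE_URL = "https://api.mathpix.com/files/v1"


def interpret_delete(http_status):
    """Translate a DELETE response status into a client-side action label."""
    if http_status == 200:
        return "deleted"      # also returned for an already-deleted file
    if http_status == 409:
        return "retry_later"  # still pending / split; wait for completed or error
    if http_status == 404:
        return "not_found"
    if http_status == 403:
        return "forbidden"
    return "unexpected"


def delete_file(file_id, app_key):
    """Send DELETE /files/v1/{file_id} and return the action label."""
    req = urllib.request.Request(
        f"{BASE_URL}/{file_id}", headers={"app_key": app_key}, method="DELETE"
    )
    try:
        with urllib.request.urlopen(req) as resp:
            return interpret_delete(resp.status)
    except urllib.error.HTTPError as exc:
        return interpret_delete(exc.code)
```

Because DELETE is idempotent, repeating the call after a network failure is safe: a second successful call still yields "deleted".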
