
Data Retention

This page describes Mathpix's retention policy regarding data submitted to each API endpoint, and the recommended patterns for keeping outputs long-term.

Overview

Retention depends on the endpoint and on the metadata.improve_mathpix flag (see Privacy). At a glance:

| Endpoint family | What we store | Default retention |
|---|---|---|
| /v3/pdf | Source document, page images, text outputs | Up to 30 days (source document and page images), up to 90 days (text outputs) |
| /v3/converter | Downloadable conversion outputs (DOCX, ZIP, etc.) | Ephemeral; download immediately |
| /v3/text, /v3/latex, /v3/strokes, /v3/batch | Input image and cached OCR result (only with improve_mathpix: true) | Up to 90 days (input image), up to 30 days (cached OCR result) |
| Request metadata (billing, audit) | Timestamps, status, page counts, error codes | Retained for audit and billing |
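Every retention figure above is an upper bound ("up to"), so it can help to compute the latest moment an artifact could still exist and plan to have your own copy well before then. A minimal sketch, assuming the default (improve_mathpix: true) day counts from this page; the artifact labels and helper names are hypothetical, not API fields:

```python
from datetime import datetime, timedelta, timezone

# Upper bounds in days, copied from the retention tables on this page
# (default improve_mathpix: true). "Up to" means data may be removed sooner,
# so these give latest-possible dates, never guarantees.
# Keys are illustrative labels, not API fields.
RETENTION_UPPER_BOUND_DAYS = {
    "pdf_source": 30,
    "pdf_page_images": 30,
    "pdf_text_outputs": 90,
    "image_input": 90,
    "image_ocr_cache": 30,
}


def latest_possible_expiry(processed_at: datetime, artifact: str) -> datetime:
    """Latest moment the artifact could still be stored by Mathpix."""
    return processed_at + timedelta(days=RETENTION_UPPER_BOUND_DAYS[artifact])


def may_still_exist(processed_at: datetime, artifact: str, now: datetime) -> bool:
    """False means definitely past retention; True only means 'possibly still there'."""
    return now < latest_possible_expiry(processed_at, artifact)


processed = datetime(2024, 1, 1, tzinfo=timezone.utc)
print(latest_possible_expiry(processed, "pdf_page_images"))  # 2024-01-31 00:00:00+00:00
```

Treat a True from may_still_exist as "not yet known to be gone" rather than as permission to rely on the stored copy.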

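Request bodies (input images and uploaded documents) sent with metadata.improve_mathpix: false are not persisted for later retrieval, although /v3/pdf still keeps the page images and text outputs derived from them. See improve_mathpix and retention below.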

/v3/pdf

PDF processing creates several artifacts, each with its own retention:

| Artifact | Retention | How to keep longer |
|---|---|---|
| Uploaded source document (.pdf, .docx, .pptx, etc.) | Up to 30 days | Keep the original on your side |
| Page image files backing cdn.mathpix.com/cropped/... URLs | Up to 30 days | Request a zip output format at processing time |
| Mathpix Markdown output (.mmd) | Up to 90 days | Store the MMD string returned in the API response on your side |
| Line detection outputs (.lines.json, .lines.mmd.json) | Up to 90 days | Download immediately after processing and store on your side |
| Data deleted via DELETE /v3/pdf/{pdf_id} | Removed immediately (the CDN may serve cached copies for up to 10 minutes) | N/A |

Keeping images long-term

cdn.mathpix.com/cropped/... URLs returned in your output will stop resolving once the underlying page images are past retention. The recommended pattern for long-term image access is to request a zip output format at processing time. Zip outputs bundle every referenced image into the archive, so the output is self-contained and has no dependency on our CDN.
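To find which of your already-stored outputs still depend on our CDN (and will therefore lose their images once page images expire), you can scan them for cropped-image URLs. A rough sketch; the URL pattern is an assumption based on the cdn.mathpix.com/cropped/... form above, and cdn_dependencies is a hypothetical helper:

```python
import re

# Assumed shape of Mathpix-hosted page-image links, based on the
# cdn.mathpix.com/cropped/... form above. The match stops at whitespace,
# quotes, angle brackets, and the ')' that closes a Markdown image link.
CDN_CROPPED_URL = re.compile(r"https?://cdn\.mathpix\.com/cropped/[^\s)\"'<>]+")


def cdn_dependencies(text: str) -> list[str]:
    """Distinct CDN image URLs a stored MMD/Markdown/HTML string relies on, in order."""
    # dict.fromkeys de-duplicates while keeping first-seen order.
    return list(dict.fromkeys(CDN_CROPPED_URL.findall(text)))


stored_mmd = (
    "Intro\n"
    "![](https://cdn.mathpix.com/cropped/2024_01_01_abc.jpg?height=120&width=300)\n"
    "![](https://cdn.mathpix.com/cropped/2024_01_01_abc.jpg?height=120&width=300)\n"
)
print(cdn_dependencies(stored_mmd))
# ['https://cdn.mathpix.com/cropped/2024_01_01_abc.jpg?height=120&width=300']
```

Any output for which this returns a non-empty list is a candidate for re-exporting as a zip format while the page images may still be available.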

Supported formats that bundle images:

  • .mmd.zip — Mathpix Markdown + images
  • .md.zip — vanilla Markdown + images
  • .html.zip — HTML + images
  • .tex.zip — LaTeX + images
  • .docx, .pptx — Word / PowerPoint with images embedded

Request a zip output at submission time by passing the format in conversion_formats:

```json
{
  "url": "https://example.com/your-document.pdf",
  "conversion_formats": {
    "mmd.zip": true,
    "docx": true
  }
}
```
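The same request can be built from Python with only the standard library. This is a sketch: the https://api.mathpix.com/v3/pdf URL and the app_id/app_key header names are assumptions to check against the authentication docs, and build_pdf_request is a hypothetical helper. It builds the request without sending it:

```python
import json
import urllib.request

API_URL = "https://api.mathpix.com/v3/pdf"  # assumed base URL


def build_pdf_request(pdf_url: str, app_id: str, app_key: str) -> urllib.request.Request:
    """Build (but do not send) a POST to /v3/pdf that asks for self-contained outputs."""
    body = {
        "url": pdf_url,
        # Zip and DOCX outputs carry their own images, so they keep working
        # after the cdn.mathpix.com page images expire.
        "conversion_formats": {"mmd.zip": True, "docx": True},
    }
    return urllib.request.Request(
        API_URL,
        data=json.dumps(body).encode("utf-8"),
        # app_id/app_key header names are an assumption; verify for your account.
        headers={"app_id": app_id, "app_key": app_key, "Content-Type": "application/json"},
        method="POST",
    )


req = build_pdf_request("https://example.com/your-document.pdf", "YOUR_APP_ID", "YOUR_APP_KEY")
# Sending it; the "pdf_id" response field is assumed from the {pdf_id} paths on this page:
# with urllib.request.urlopen(req) as resp:
#     pdf_id = json.load(resp)["pdf_id"]
```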

Trade-off: zip outputs are larger than MMD-with-CDN-URLs because image bytes are embedded.

Regenerating outputs after the fact

Zip outputs can only be generated while the source document is still in our storage (up to 30 days after processing). If the source is no longer available, re-processing requires re-uploading it.

If you processed a PDF within the last 30 days and want to add a zip output later, you can call /v3/converter on the MMD output to generate one — as long as the referenced page images are still in our storage. If the page images have already expired, the conversion will succeed but images will be missing from the output.
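The choice described above can be sketched as a small function. It assumes you recorded when the PDF was processed; because 30 days is an upper bound, CONVERT_STORED_MMD only means the converter route may still include images. ZipPlan and plan_zip_regeneration are hypothetical names:

```python
from datetime import datetime, timedelta
from enum import Enum


class ZipPlan(Enum):
    CONVERT_STORED_MMD = "call /v3/converter on the stored MMD"
    REUPLOAD_SOURCE = "re-upload the original to /v3/pdf with a zip conversion format"
    IMAGES_LOST = "a converter run would succeed, but images would be missing"


# Page images (and the source) are kept for up to 30 days after processing.
PAGE_IMAGE_WINDOW = timedelta(days=30)


def plan_zip_regeneration(processed_at: datetime, now: datetime, have_original: bool) -> ZipPlan:
    """Pick a route for producing a zip output after the original processing run."""
    if now - processed_at < PAGE_IMAGE_WINDOW:
        # Still inside the upper bound: the page images may still be stored.
        return ZipPlan.CONVERT_STORED_MMD
    if have_original:
        # Past the window: only re-processing the original restores the images.
        return ZipPlan.REUPLOAD_SOURCE
    return ZipPlan.IMAGES_LOST
```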

Interaction with DELETE

Calling DELETE /v3/pdf/{pdf_id} removes the source document, page images, MMD, and line data immediately. Any cdn.mathpix.com/cropped/... URLs in outputs you've already stored on your side will stop resolving once the CDN cache expires (up to 10 minutes). Conversions generated before the delete are not retroactively removed, but any images they reference via CDN will also stop resolving.
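A sketch of issuing the delete with the standard library. It assumes the base URL https://api.mathpix.com and app_id/app_key authentication headers, both of which should be checked against the authentication docs; the helper name and the example pdf_id are hypothetical:

```python
import urllib.parse
import urllib.request

# The CDN may keep serving cached copies for up to 10 minutes after a delete.
CDN_CACHE_GRACE_SECONDS = 10 * 60


def build_pdf_delete_request(pdf_id: str, app_id: str, app_key: str) -> urllib.request.Request:
    """Build (but do not send) DELETE /v3/pdf/{pdf_id}."""
    # Quote the id so unexpected characters cannot alter the path.
    url = "https://api.mathpix.com/v3/pdf/" + urllib.parse.quote(pdf_id, safe="")
    return urllib.request.Request(
        url,
        headers={"app_id": app_id, "app_key": app_key},  # assumed auth header names
        method="DELETE",
    )


req = build_pdf_delete_request("2024_01_01_abc", "YOUR_APP_ID", "YOUR_APP_KEY")
# urllib.request.urlopen(req) would send it. Cropped URLs in outputs you already
# stored may keep resolving for up to CDN_CACHE_GRACE_SECONDS afterwards.
```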

/v3/converter

The converter accepts Mathpix Markdown and produces downloadable outputs (DOCX, ZIP bundles, LaTeX source, HTML, etc.).

| Artifact | Retention | Notes |
|---|---|---|
| Downloadable conversion outputs | Not guaranteed for any fixed period | Treat as ephemeral: download the output as soon as the conversion completes. Do not use the download URL as a durable link. |
| Conversion status and metadata | Up to 90 days | After expiry, GET /v3/converter/{id}.{ext} returns 404. |
| Data deleted via DELETE /v3/converter/{conversion_id} | Removed immediately | N/A |

If you need the converted file long-term, download and host it yourself. To regenerate a conversion, re-submit the source MMD.
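One way to follow this advice is to fetch each conversion as soon as it completes and write it to storage you control. A sketch, assuming the GET /v3/converter/{id}.{ext} path from the table and a base URL of https://api.mathpix.com; the fetch callable (which would add your auth headers) and archive_conversion are hypothetical, and fetch is passed in so the storage logic can be exercised offline:

```python
from pathlib import Path
from typing import Callable

API_BASE = "https://api.mathpix.com/v3/converter"  # assumed base URL


def conversion_download_url(conversion_id: str, ext: str) -> str:
    # Mirrors the GET /v3/converter/{id}.{ext} path in the table above.
    return f"{API_BASE}/{conversion_id}.{ext}"


def archive_conversion(
    fetch: Callable[[str], bytes], conversion_id: str, ext: str, dest_dir: Path
) -> Path:
    """Download a finished conversion right away and keep our own copy."""
    dest_dir.mkdir(parents=True, exist_ok=True)
    target = dest_dir / f"{conversion_id}.{ext}"
    data = fetch(conversion_download_url(conversion_id, ext))
    # Write to a temporary name first, then rename, so a failed write
    # never leaves a truncated file under the final name.
    tmp = target.with_name(target.name + ".part")
    tmp.write_bytes(data)
    tmp.replace(target)
    return target
```

Keep the returned path (or your own object-store key) as the durable reference, not the Mathpix download URL.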

Image endpoints: /v3/text, /v3/latex, /v3/strokes, /v3/batch

These endpoints accept a single image (handwriting stroke data, for /v3/strokes) or a batch of images (/v3/batch) and return OCR results. What we retain depends on metadata.improve_mathpix:

| improve_mathpix | Input image | OCR result cache | Queryable via /v3/ocr-results |
|---|---|---|---|
| true (default) | Retained for up to 90 days | Retained for up to 30 days | Yes |
| false | Not persisted to disk | Not cached | No |

Request metadata (timestamps, status, page count, error codes) is retained for auditing and billing regardless of improve_mathpix.
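For example, a /v3/text request that opts out of retention sets the flag inside metadata. A sketch of the JSON body; the src and formats fields are shown as they are commonly used with /v3/text and should be checked against the endpoint reference, and text_request_body is a hypothetical helper:

```python
import json


def text_request_body(image_url: str, improve_mathpix: bool) -> str:
    """JSON body for a /v3/text call (src/formats are assumptions to verify)."""
    body = {
        "src": image_url,
        "formats": ["text"],
        # With False: the image is not saved to disk, the OCR result is not
        # cached, and the request will not appear in /v3/ocr-results.
        "metadata": {"improve_mathpix": improve_mathpix},
    }
    return json.dumps(body)


print(text_request_body("https://example.com/equation.png", improve_mathpix=False))
# {"src": "https://example.com/equation.png", "formats": ["text"], "metadata": {"improve_mathpix": false}}
```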

The API response body contains the full OCR result — store it on your side if you need it long-term. The result cache available via /v3/ocr-results is a convenience for retrieving recent results, not a durable storage mechanism.
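A minimal sketch of keeping responses on your side with SQLite from the Python standard library. It assumes each response carries a request_id you can key on (substitute your own identifier if it does not); the table layout and helper names are hypothetical:

```python
import json
import sqlite3
from typing import Optional


def open_store(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) a local table holding full OCR responses as JSON."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS ocr_results ("
        " request_id TEXT PRIMARY KEY,"
        " received_at TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP,"
        " body TEXT NOT NULL)"
    )
    return conn


def save_ocr_response(conn: sqlite3.Connection, response: dict) -> None:
    # "request_id" is an assumed response field; swap in your own key if needed.
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT OR REPLACE INTO ocr_results (request_id, body) VALUES (?, ?)",
            (response["request_id"], json.dumps(response)),
        )


def load_ocr_response(conn: sqlite3.Connection, request_id: str) -> Optional[dict]:
    row = conn.execute(
        "SELECT body FROM ocr_results WHERE request_id = ?", (request_id,)
    ).fetchone()
    return json.loads(row[0]) if row else None
```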

Query endpoints: /v3/ocr-results, /v3/pdf-results

These endpoints return records we already have — they don't themselves retain anything beyond what the underlying endpoint retained:

  • /v3/ocr-results returns image OCR records for requests sent with improve_mathpix: true. Available for up to 30 days.
  • /v3/pdf-results returns PDF processing records. Available for up to 90 days.
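Because these windows are fixed, a lookup over an older date range has to be split between your own storage and the query endpoint. A sketch with hypothetical names; the windows are upper bounds, so the API portion may return fewer records than this suggests:

```python
from datetime import datetime, timedelta
from typing import Optional, Tuple

# Availability windows from this section (upper bounds).
QUERY_WINDOWS = {
    "/v3/ocr-results": timedelta(days=30),
    "/v3/pdf-results": timedelta(days=90),
}

Span = Optional[Tuple[datetime, datetime]]


def split_lookup(endpoint: str, start: datetime, end: datetime, now: datetime) -> Tuple[Span, Span]:
    """Return (part to read from your own storage, part the API may still return)."""
    cutoff = now - QUERY_WINDOWS[endpoint]
    local = (start, min(end, cutoff)) if start < cutoff else None
    remote = (max(start, cutoff), end) if end > cutoff else None
    return local, remote
```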

improve_mathpix and retention

metadata.improve_mathpix: false tells Mathpix not to use your data for quality improvements and not to persist it for later retrieval. Effects per endpoint family:

  • /v3/pdf: the uploaded source document is deleted immediately after processing completes. Page image files that back cdn.mathpix.com/cropped/... URLs are retained for up to 30 days. Text outputs (MMD, line data) are retained for up to 90 days. To remove CDN-referenced page images immediately, call DELETE /v3/pdf/{pdf_id} explicitly.
  • Image endpoints (/v3/text, /v3/latex, /v3/strokes, /v3/batch): the image is not saved to disk, the OCR result is not cached, and the request is not queryable via /v3/ocr-results. Results are available only in the direct API response.
  • /v3/converter: the converter runs on MMD text you submit, so improve_mathpix has no effect on retention here — downloadable outputs are treated as ephemeral regardless.
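These per-endpoint rules, together with the image-endpoint table, can be captured as a lookup for your own tooling. A sketch transcribed from this page: numbers are upper bounds in days, 0 means not persisted (or deleted right after processing), and None means no guaranteed period. Endpoint keys follow this page; the artifact labels and retention_for are hypothetical:

```python
from typing import Dict, Optional

IMAGE_ENDPOINTS = {"/v3/text", "/v3/latex", "/v3/strokes", "/v3/batch"}


def retention_for(endpoint: str, improve_mathpix: bool) -> Dict[str, Optional[int]]:
    """Upper-bound retention in days per artifact, as described on this page."""
    if endpoint == "/v3/pdf":
        return {
            "source_document": 30 if improve_mathpix else 0,
            "page_images": 30,  # kept either way; DELETE /v3/pdf/{pdf_id} removes them sooner
            "mmd_and_line_data": 90,
        }
    if endpoint in IMAGE_ENDPOINTS:
        return {
            "input_image": 90 if improve_mathpix else 0,
            "ocr_result_cache": 30 if improve_mathpix else 0,
        }
    if endpoint == "/v3/converter":
        # improve_mathpix has no effect here; downloads are always ephemeral.
        return {"download": None, "status_metadata": 90}
    raise ValueError(f"unknown endpoint family: {endpoint}")
```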

Need longer retention?

If your workflow requires extended retention beyond the defaults, contact support@mathpix.com to discuss a custom retention policy on your API key.