Data Retention
This page describes Mathpix's retention policy regarding data submitted to each API endpoint, and the recommended patterns for keeping outputs long-term.
Overview
Retention depends on the endpoint and on the metadata.improve_mathpix flag (see Privacy). At a glance:
| Endpoint family | What we store | Default retention |
|---|---|---|
| /v3/pdf | Source document, page images, text outputs | Up to 30 days (source + page images), up to 90 days (text outputs) |
| /v3/converter | Downloadable conversion outputs (DOCX, ZIP, etc.) | Ephemeral; download immediately |
| /v3/text, /v3/latex, /v3/strokes, /v3/batch | Input image + cached OCR result (only with improve_mathpix: true) | Up to 90 days (input image), up to 30 days (cached OCR result) |
| Request metadata (billing, audit) | Timestamps, status, page counts, error codes | Retained for audit and billing |
Request bodies sent with metadata.improve_mathpix: false are not persisted for later retrieval — see improve_mathpix and retention below.
/v3/pdf
PDF processing creates several artifacts, each with its own retention:
| Artifact | Retention | How to keep longer |
|---|---|---|
| Uploaded source document (.pdf, .docx, .pptx, etc.) | Up to 30 days | Keep the original on your side |
| Page image files backing cdn.mathpix.com/cropped/... URLs | Up to 30 days | Request a zip output format at processing time |
| Mathpix Markdown output (.mmd) | Up to 90 days | Store the MMD string returned from the API response on your side |
| Line detection outputs (.lines.json, .lines.mmd.json) | Up to 90 days | Download immediately after processing and store on your side |
| Data deleted via DELETE /v3/pdf/{pdf_id} | Removed immediately (CDN may serve cached copies for up to 10 minutes) | N/A |
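
The limits in the table above translate directly into download deadlines. The following is a minimal sketch, assuming you record the processing time yourself; `RETENTION_DAYS` and `latest_availability` are hypothetical names used only for illustration. Because every limit is stated as "up to", the computed dates are upper bounds, not guarantees.

```python
from datetime import datetime, timedelta, timezone

# Upper bounds in days, copied from the table above. Key names are illustrative.
RETENTION_DAYS = {
    "source_document": 30,
    "page_images": 30,
    "mmd": 90,
    "lines_json": 90,
}


def latest_availability(processed_at: datetime) -> dict[str, datetime]:
    """Return the latest moment each artifact may still be retrievable."""
    return {
        artifact: processed_at + timedelta(days=days)
        for artifact, days in RETENTION_DAYS.items()
    }


if __name__ == "__main__":
    processed = datetime(2024, 1, 1, tzinfo=timezone.utc)
    for artifact, deadline in latest_availability(processed).items():
        print(f"{artifact}: download no later than {deadline.date()}")
```

In practice it is safer to download outputs as soon as processing completes and treat these dates only as a last resort.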
Keeping images long-term
cdn.mathpix.com/cropped/... URLs returned in your output will stop resolving once the underlying page images are past retention. The recommended pattern for long-term image access is to request a zip output format at processing time. Zip outputs embed all referenced images inline, so the output is self-contained and has no dependency on our CDN.
Supported formats that bundle images:
- .mmd.zip: Mathpix Markdown + images
- .md.zip: vanilla Markdown + images
- .html.zip: HTML + images
- .tex.zip: LaTeX + images
- .docx, .pptx: Word / PowerPoint with images embedded
Request a zip output at submission time by passing the format in conversion_formats:
{
  "url": "https://example.com/your-document.pdf",
  "conversion_formats": {
    "mmd.zip": true,
    "docx": true
  }
}
Trade-off: zip outputs are larger than MMD-with-CDN-URLs because image bytes are embedded.
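
For illustration, the request body above could be sent from Python roughly as follows. This is a sketch under stated assumptions: the `https://api.mathpix.com` host and the `app_id` / `app_key` header names are not given on this page, and `build_pdf_request` is a hypothetical helper. The function only builds the request so it can be inspected before anything is sent.

```python
import json
import urllib.request

API_URL = "https://api.mathpix.com/v3/pdf"  # assumed host, not stated on this page


def build_pdf_request(pdf_url: str, app_id: str, app_key: str) -> urllib.request.Request:
    """Build (but do not send) a /v3/pdf request that asks for bundled outputs."""
    body = {
        "url": pdf_url,
        # Formats that embed images, so the output does not depend on the CDN.
        "conversion_formats": {"mmd.zip": True, "docx": True},
    }
    return urllib.request.Request(
        API_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "app_id": app_id,  # assumed auth header name
            "app_key": app_key,  # assumed auth header name
            "Content-Type": "application/json",
        },
        method="POST",
    )


if __name__ == "__main__":
    req = build_pdf_request("https://example.com/your-document.pdf", "APP_ID", "APP_KEY")
    print(req.get_method(), req.full_url)
    print(req.data.decode("utf-8"))
    # To submit for real, pass `req` to urllib.request.urlopen and parse the JSON reply.
```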
Regenerating outputs after the fact
Zip outputs can only be generated while the source document is still in our storage (up to 30 days after processing). If the source is no longer available, re-processing requires re-uploading it.
If you processed a PDF within the last 30 days and want to add a zip output later, you can call /v3/converter on the MMD output to generate one — as long as the referenced page images are still in our storage. If the page images have already expired, the conversion will succeed but images will be missing from the output.
Interaction with DELETE
Calling DELETE /v3/pdf/{pdf_id} removes the source document, page images, MMD, and line data immediately. Any cdn.mathpix.com/cropped/... URLs in outputs you've already stored on your side will stop resolving once the CDN cache expires (up to 10 minutes). Conversions generated before the delete are not retroactively removed, but any images they reference via CDN will also stop resolving.
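
As a rough sketch of the delete flow, the snippet below builds the DELETE request and models the CDN cache window described above. The API host and the `app_id` / `app_key` header names are assumptions, and both helper functions are hypothetical names introduced for this example.

```python
from datetime import datetime, timedelta, timezone
from urllib.parse import quote
import urllib.request

API_BASE = "https://api.mathpix.com"  # assumed host, not stated on this page
CDN_CACHE_WINDOW = timedelta(minutes=10)  # "up to 10 minutes" per the text above


def build_delete_request(pdf_id: str, app_id: str, app_key: str) -> urllib.request.Request:
    """Build (but do not send) DELETE /v3/pdf/{pdf_id}."""
    return urllib.request.Request(
        f"{API_BASE}/v3/pdf/{quote(pdf_id, safe='')}",
        headers={"app_id": app_id, "app_key": app_key},  # assumed auth headers
        method="DELETE",
    )


def cdn_copy_may_linger(deleted_at: datetime, now: datetime) -> bool:
    """True while a cached CDN copy could still be served after the delete."""
    return now - deleted_at < CDN_CACHE_WINDOW


if __name__ == "__main__":
    req = build_delete_request("example-pdf-id", "APP_ID", "APP_KEY")
    print(req.get_method(), req.full_url)
    deleted = datetime.now(timezone.utc)
    print("CDN copy may still resolve:", cdn_copy_may_linger(deleted, deleted))
```

If stored outputs must stay usable after a delete, replace any CDN image links with locally hosted copies before issuing the request.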
/v3/converter
The converter accepts Mathpix Markdown and produces downloadable outputs (DOCX, ZIP bundles, LaTeX source, HTML, etc.).
| Artifact | Retention | Notes |
|---|---|---|
| Downloadable conversion outputs | Not guaranteed for any fixed period | Treat as ephemeral — download the output as soon as the conversion completes. Do not use the download URL as a durable link. |
| Conversion status and metadata | Up to 90 days | After expiry, GET /v3/converter/{conversion_id}.{ext} will return 404. |
| Data deleted via DELETE /v3/converter/{conversion_id} | Removed immediately | N/A |
If you need the converted file long-term, download and host it yourself. To regenerate a conversion, re-submit the source MMD.
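
One way to follow the "download and host it yourself" advice is to save each output to your own storage as soon as the conversion completes. The sketch below assumes the `https://api.mathpix.com` host and that the output is fetched with a plain GET on the path pattern shown in the table; `output_url`, `http_fetch`, and `save_output` are hypothetical helpers. The fetch function is injectable so the storage logic can be exercised without network access.

```python
from pathlib import Path
from typing import Callable
import urllib.request

API_BASE = "https://api.mathpix.com"  # assumed host, not stated on this page


def output_url(conversion_id: str, ext: str) -> str:
    """URL of a finished conversion output, following /v3/converter/{id}.{ext}."""
    return f"{API_BASE}/v3/converter/{conversion_id}.{ext}"


def http_fetch(url: str, headers: dict[str, str]) -> bytes:
    """Default fetcher: a plain GET that returns the response body."""
    req = urllib.request.Request(url, headers=headers)
    with urllib.request.urlopen(req) as resp:
        return resp.read()


def save_output(
    conversion_id: str,
    ext: str,
    dest_dir: Path,
    headers: dict[str, str],
    fetch: Callable[[str, dict[str, str]], bytes] = http_fetch,
) -> Path:
    """Download one output and write it to dest_dir, returning the local path."""
    data = fetch(output_url(conversion_id, ext), headers)
    dest_dir.mkdir(parents=True, exist_ok=True)
    path = dest_dir / f"{conversion_id}.{ext}"
    path.write_bytes(data)
    return path
```

Calling `save_output(conversion_id, "docx", Path("outputs"), headers)` right after the conversion reports completion gives you a copy that no longer depends on the download URL.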
Image endpoints: /v3/text, /v3/latex, /v3/strokes, /v3/batch
These endpoints accept a single image or a batch and return OCR results synchronously. What we retain depends on metadata.improve_mathpix:
| improve_mathpix | Input image | OCR result cache | Queryable via /v3/ocr-results |
|---|---|---|---|
| true (default) | Retained for up to 90 days | Retained for up to 30 days | Yes |
| false | Not persisted to disk | Not cached | No |
Request metadata (timestamps, status, page count, error codes) is retained for auditing and billing regardless of improve_mathpix.
The API response body contains the full OCR result — store it on your side if you need it long-term. The result cache available via /v3/ocr-results is a convenience for retrieving recent results, not a durable storage mechanism.
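
A minimal sketch of keeping results on your side: build the request body with `metadata.improve_mathpix` set explicitly, then write the response JSON to local storage as soon as it arrives. The `src` field name is an assumption about the request schema, and `build_text_payload` and `store_result` are hypothetical helpers.

```python
import json
from pathlib import Path


def build_text_payload(image_url: str, improve_mathpix: bool = False) -> dict:
    """Request body for an image endpoint; the `src` field name is an assumption."""
    return {"src": image_url, "metadata": {"improve_mathpix": improve_mathpix}}


def store_result(result: dict, key: str, dest_dir: Path) -> Path:
    """Write the OCR response to <dest_dir>/<key>.json and return the path."""
    dest_dir.mkdir(parents=True, exist_ok=True)
    path = dest_dir / f"{key}.json"
    path.write_text(json.dumps(result, ensure_ascii=False, indent=2), encoding="utf-8")
    return path


if __name__ == "__main__":
    print(json.dumps(build_text_payload("https://example.com/equation.png")))
```

With `improve_mathpix: false` the direct response is the only copy of the result, so storing it immediately matters most in that case.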
Query endpoints: /v3/ocr-results, /v3/pdf-results
These endpoints return records we already have — they don't themselves retain anything beyond what the underlying endpoint retained:
- /v3/ocr-results returns image OCR records for requests sent with improve_mathpix: true. Available for up to 30 days.
- /v3/pdf-results returns PDF processing records. Available for up to 90 days.
improve_mathpix and retention
metadata.improve_mathpix: false tells Mathpix not to use your data for quality improvements and not to persist it for later retrieval. Effects per endpoint family:
- /v3/pdf: the uploaded source document is deleted immediately after processing completes. Page image files that back cdn.mathpix.com/cropped/... URLs are retained for up to 30 days. Text outputs (MMD, line data) are retained for up to 90 days. To remove CDN-referenced page images immediately, call DELETE /v3/pdf/{pdf_id} explicitly.
- Image endpoints (/v3/text, /v3/latex, /v3/strokes, /v3/batch): the image is not saved to disk, the OCR result is not cached, and the request is not queryable via /v3/ocr-results. Results are available only in the direct API response.
- /v3/converter: the converter runs on MMD text you submit, so improve_mathpix has no effect on retention here; downloadable outputs are treated as ephemeral regardless.
Need longer retention?
If your workflow requires retention beyond the defaults, contact support@mathpix.com to discuss a custom retention policy for your API key.