Migrate from SCS to Files API
Files API provides the same processing and output options as SCS, but you submit jobs yourself via API instead of through Mathpix support.
Prerequisites
- A Mathpix app key. All requests authenticate with an app_key header (see Authentication). If you don't have one yet, sign up at the Mathpix Console and copy a key from the API keys page. SCS customers migrating over will typically be setting this up for the first time.
- One-time data source setup. Required to enable private URI inputs and automatic result uploads straight to your cloud storage. Data sources are registered per group (your organization), not per app key, so every app key in your group shares the same registered sources, and you only set each bucket up once.
What's changing
SCS is operated manually by Mathpix: you send a list of source and destination paths and enable storage access, then a Mathpix engineer runs the processing job and uploads your results. The Files API exposes the same machinery as a public API:
| | SCS | Files API |
|---|---|---|
| Who runs the job | Mathpix engineer (via internal CLI) | Customer (via HTTP request) |
| Onboarding | Email support, exchange S3 credentials, scheduled by engineer | Self-serve: register a data source once, then submit any time |
| Concurrent jobs | Per engineer availability | Submit on-demand, multiple jobs in flight |
| Pricing | Custom per contract | Public tiered pricing |
| Model | Same Mathpix OCR model | Same Mathpix OCR model |
What you keep doing
- Same processing model. OCR quality, layout extraction, equation/figure crops, and Mathpix Markdown output are unchanged from SCS.
- For each input file you get an .mmd, .lines.json, and .lines.mmd.json, plus the option to include MMD conversions and a local images/ folder of cropped images and equations, referenced relatively from the MMD. Drop-in compatible with whatever consumes your SCS output today.
- Same conversion formats: request docx, tex.zip, html, etc. via conversion_formats on the submission, just like before.
- Same delivery model. Results land in your own bucket via destination_uri; Mathpix never holds long-term copies of your outputs.
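One way to check the drop-in claim on a migrated batch is to confirm that every document produced an .mmd. The sketch below is a minimal, hypothetical helper: it assumes each document writes into its own folder named after its custom_id (the layout used in the worked migration on this page), and it assumes nothing about the .mmd file name beyond its extension. The sample keys are invented for illustration.

```python
def documents_missing_mmd(keys, output_prefix, custom_ids):
    """Return the custom_ids whose output folder contains no .mmd object.

    keys          -- object keys listed from the destination bucket
    output_prefix -- key prefix that holds the per-document folders
    custom_ids    -- the custom_id values that were submitted
    """
    prefix = output_prefix.strip("/")
    missing = []
    for cid in custom_ids:
        folder = f"{prefix}/{cid}/"
        # Any key inside the document folder ending in ".mmd" counts as success.
        if not any(k.startswith(folder) and k.endswith(".mmd") for k in keys):
            missing.append(cid)
    return missing


# Hypothetical listing of the destination bucket after a job has run.
keys = [
    "processed/2026-05/contract-001/contract-001.mmd",
    "processed/2026-05/contract-001/images/crop-1.jpg",
    "processed/2026-05/contract-002/contract-002.lines.json",
]
missing = documents_missing_mmd(
    keys, "processed/2026-05", ["contract-001", "contract-002", "contract-003"]
)
print(missing)
```

The key listing itself can come from whatever storage client you already use; the helper only needs plain strings.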
What changes
- Self-serve data sources. No more emailing credentials. Grant Mathpix access to your bucket via IAM role (AWS), AD app (Azure), or service-account impersonation (GCS) — once. See the Data Sources API.
- Tiered pricing. Public, marginal-cost tiers applied per calendar month. See pricing.
- Idempotency via custom_id. Supply a per-file custom_id on submission; resubmitting the same (job_id, custom_id) returns the original file_id instead of creating a duplicate. Safe to retry on network blips and timeouts.
- Crop delivery is opt-in via image_output_mode. This setting controls where the cropped images (equations, figures, tables) go. To match the classic SCS output shape, set "image_output_mode": "local": the worker writes a loose images/ folder of crops into your destination_uri bucket, the MMD references them by relative path (images/<id>.jpg), and Mathpix keeps no long-term copy. Omit it (the default) and crops are hosted on Mathpix's CDN and referenced by absolute URL, with nothing written to your bucket.
  - Zipped and rich formats are self-contained either way. Per-format .zip variants embed their crops in the archive, and rich conversions (docx, pdf, pptx, …) embed the real crop images inside the file regardless of image_output_mode. The local mode only matters when you want the standalone images/ folder delivered alongside non-zipped outputs like plain .mmd or .md.
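Because resubmitting the same (job_id, custom_id) is described above as returning the original file_id, a client can resend an identical payload after a dropped connection. The following is a minimal sketch of that pattern, not an official client: the helper names, the retry count, the backoff schedule, and the choice not to retry HTTP error responses are all assumptions made for illustration.

```python
import json
import time
import urllib.error
import urllib.request

JOBS_URL = "https://api.mathpix.com/files/v1/jobs"  # endpoint used on this page


def post_job(payload, app_key):
    """POST one job submission and return the decoded JSON response."""
    req = urllib.request.Request(
        JOBS_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"app_key": app_key, "Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.load(resp)


def submit_with_retry(payload, app_key, send=post_job, attempts=3, backoff_s=1.0):
    """Resend the identical payload after transient network failures.

    Retrying is only safe when every file carries a custom_id, so that is
    checked before the first request goes out.
    """
    if any(not f.get("custom_id") for f in payload["files"]):
        raise ValueError("every file needs a custom_id before retrying is safe")
    for attempt in range(1, attempts + 1):
        try:
            return send(payload, app_key)
        except urllib.error.HTTPError:
            # The server answered with an error status; surfacing it instead
            # of retrying is a policy choice of this sketch.
            raise
        except (TimeoutError, ConnectionError, urllib.error.URLError):
            if attempt == attempts:
                raise
            time.sleep(backoff_s * 2 ** (attempt - 1))  # 1x, 2x, 4x, ...
```

The `send` parameter exists so the retry loop can be exercised with a stand-in function instead of a live request.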
A worked migration
A migrated workload typically walks your storage source to enumerate the input files, then submits them as a single job, giving each file its own destination_uri under your output prefix. destination_uri is per file (not job-wide), so a per-document subfolder keeps each document's results and crops together — the same layout SCS classic produced. The job_id you supply becomes the handle you'll use for status reads:
The same submission as a cURL command, then as a Python script:
curl -X POST https://api.mathpix.com/files/v1/jobs \
  -H "app_key: $APP_KEY" \
  -H 'Content-Type: application/json' \
  --data '{
    "job_id": "2026-05-contracts",
    "image_output_mode": "local",
    "conversion_formats": { "docx": true, "md": true },
    "files": [
      { "source_uri": "s3://acme-source/contracts/2026-05/contract-001.pdf", "custom_id": "contract-001", "destination_uri": "s3://acme-output/processed/2026-05/contract-001/" },
      { "source_uri": "s3://acme-source/contracts/2026-05/contract-002.pdf", "custom_id": "contract-002", "destination_uri": "s3://acme-output/processed/2026-05/contract-002/" }
    ]
  }'
import os

import boto3
import requests

# Walk your source prefix to build the files list
s3 = boto3.client("s3")
out_base = "s3://acme-output/processed/2026-05"
files = []
paginator = s3.get_paginator("list_objects_v2")
for page in paginator.paginate(Bucket="acme-source", Prefix="contracts/2026-05/"):
    for obj in page.get("Contents", []):
        if obj["Key"].endswith(".pdf"):
            # Base name without ".pdf" becomes the custom_id (removesuffix needs Python 3.9+)
            custom_id = obj["Key"].rsplit("/", 1)[-1].removesuffix(".pdf")
            files.append({
                "source_uri": f"s3://acme-source/{obj['Key']}",
                "custom_id": custom_id,
                "destination_uri": f"{out_base}/{custom_id}/",
                "destination_basename": custom_id,
            })

# Submit the batch
r = requests.post(
    "https://api.mathpix.com/files/v1/jobs",
    json={
        "job_id": "2026-05-contracts",
        "image_output_mode": "local",
        "conversion_formats": {"docx": True, "md": True},
        "files": files,
    },
    headers={"app_key": os.environ["APP_KEY"], "Content-Type": "application/json"},
)
print(r.json())  # {"job_id": "...", "file_count": <n>}
{
  "job_id": "2026-05-contracts",
  "file_count": 2
}
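The Python script above derives each custom_id from the file's base name, so two PDFs with the same name in different subfolders of the prefix would end up sharing a custom_id and a destination_uri. How the API treats a repeated custom_id inside one submission is not covered on this page, so a pre-submission check is a reasonable precaution. A minimal sketch, with a hypothetical helper name and an invented archive/ path for illustration:

```python
from collections import defaultdict


def find_duplicate_custom_ids(files):
    """Return {custom_id: [source_uri, ...]} for ids used by more than one file."""
    by_id = defaultdict(list)
    for f in files:
        by_id[f["custom_id"]].append(f["source_uri"])
    return {cid: uris for cid, uris in by_id.items() if len(uris) > 1}


# Hypothetical listing: the second entry lives in a nested folder but has
# the same base name as the first.
files = [
    {"source_uri": "s3://acme-source/contracts/2026-05/contract-001.pdf",
     "custom_id": "contract-001"},
    {"source_uri": "s3://acme-source/contracts/2026-05/archive/contract-001.pdf",
     "custom_id": "contract-001"},
    {"source_uri": "s3://acme-source/contracts/2026-05/contract-002.pdf",
     "custom_id": "contract-002"},
]

dupes = find_duplicate_custom_ids(files)
for cid, uris in dupes.items():
    print(f"custom_id {cid!r} is shared by {len(uris)} files: {uris}")
```

If duplicates turn up, deriving the custom_id from the key relative to the prefix (rather than the base name alone) is one way to keep it unique.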
Track the job using the same id you supplied as job_id:
# Status + counters
curl -H "app_key: $APP_KEY" \
  "https://api.mathpix.com/files/v1/jobs/2026-05-contracts"

# Errored files only
curl -H "app_key: $APP_KEY" \
  "https://api.mathpix.com/files/v1/jobs/2026-05-contracts/files?status=error"
Results land under each file's destination_uri, for example s3://acme-output/processed/2026-05/contract-001/<file_id>.mmd (plus images/, .docx, etc. per the formats you requested), in the same layout SCS produced. The "image_output_mode": "local" option is what writes cropped images into an images/ folder under that destination with relative references; omit it and crops stay on Mathpix's CDN instead.
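For unattended runs, the status endpoint from the cURL example can be polled until the job finishes. The sketch below is hedged accordingly: this page does not show the shape of the status response, so the completion test is passed in by the caller as `is_done` rather than guessed, and the polling interval and poll limit are arbitrary assumptions.

```python
import json
import time
import urllib.parse
import urllib.request

JOBS_URL = "https://api.mathpix.com/files/v1/jobs"  # base URL from the cURL example


def fetch_status(job_id, app_key):
    """GET the job status document and return it as a dict."""
    url = f"{JOBS_URL}/{urllib.parse.quote(job_id, safe='')}"
    req = urllib.request.Request(url, headers={"app_key": app_key})
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.load(resp)


def wait_for_job(job_id, app_key, is_done, fetch=fetch_status,
                 interval_s=30.0, max_polls=120):
    """Poll until is_done(status) is true, then return that status.

    is_done is supplied by the caller because the response fields are not
    documented on this page.
    """
    for _ in range(max_polls):
        status = fetch(job_id, app_key)
        if is_done(status):
            return status
        time.sleep(interval_s)
    raise TimeoutError(f"job {job_id} not finished after {max_polls} polls")
```

Consult the Async Batch Document Processing reference for the actual status fields before writing the `is_done` predicate.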
Support
- Questions during migration: email support@mathpix.com with subject [SCS migration].
- Your existing SCS contract terms carry over until your migration is complete. Reach out before submitting volume that would exceed your previous SCS allotment.
- Stuck on data-source setup? The Data Sources API reference covers the per-provider IAM steps. Send your AWS account ID / Azure tenant / GCS project to support if you'd like a setup review.
Next steps
- Data Sources API: register your source and destination buckets.
- Files API Quickstart: end-to-end walkthrough.
- Async Batch Document Processing: full reference for POST /files/v1/jobs, including idempotency, partial-success handling, and pagination.