Migrate from SCS to Files API

Files API provides the same processing and output options as SCS, but you submit jobs yourself via API instead of through Mathpix support.

Prerequisites

A Mathpix app key. All requests authenticate with an app_key header (see Authentication). If you don't have one yet, sign up at the Mathpix Console and copy a key from the API keys page. SCS customers migrating over will typically be setting this up for the first time.
One-time data source setup. Required to enable private URI inputs and automatic result uploads straight to your cloud storage. Data sources are registered per group (your organization), not per app key, so every app key in your group shares the same registered sources and you only set each bucket up once.

What's changing

SCS is operated manually by Mathpix: you send a list of source and destination paths and enable storage access, and a Mathpix engineer runs the processing job and uploads your results. The Files API exposes the same machinery as a public API:

	SCS	Files API
Who runs the job	Mathpix engineer (via internal CLI)	Customer (via HTTP request)
Onboarding	Email support, exchange S3 credentials, scheduled by engineer	Self-serve: register a data source once, then submit any time
Concurrent jobs	Per engineer availability	Submit at any time, multiple jobs at once
Pricing	Custom per contract	Public tiered pricing
Model	Same Mathpix OCR model	Same Mathpix OCR model

What you keep doing

Same processing model. OCR quality, layout extraction, equation/figure crops, and Mathpix Markdown output are unchanged from SCS.
Same output files. For each input file you get an .mmd, a .lines.json, and a .lines.mmd.json, plus optional conversions of the Mathpix Markdown to other formats and an optional local images/ folder of cropped images and equations, referenced by relative path from the MMD. These are the same files SCS delivered, so whatever consumes your SCS output today keeps working unchanged.
Same conversion formats. Request formats such as docx, tex.zip, and html via conversion_formats on the submission, just like before.
Same delivery model. Results land in your own bucket via destination_uri; Mathpix never holds long-term copies of your outputs.

What changes

Self-serve data sources. No more emailing credentials. Grant Mathpix access to your bucket via an IAM role (AWS S3), an Azure Active Directory application (Azure Blob Storage), or service-account impersonation (Google Cloud Storage), and you only do it once. See the Data Sources API.
Tiered pricing. Public, marginal-cost tiers applied per calendar month. See pricing.
Idempotency via custom_id. Supply a per-file custom_id on submission; resubmitting the same (job_id, custom_id) returns the original file_id instead of creating a duplicate. Safe to retry after network failures and timeouts.
Crop delivery is opt-in via image_output_mode. This setting controls where the cropped images (equations, figures, tables) go. To match the classic SCS output shape, set "image_output_mode": "local". The worker then writes a loose images/ folder of crops under your destination_uri, the MMD references them by relative path (images/<id>.jpg), and Mathpix keeps no long-term copy. If you omit the setting (the default), crops are hosted on Mathpix's CDN and referenced by absolute URL, and no crops are written to your bucket.
- Zipped and rich formats are self-contained either way. Per-format .zip variants embed their crops in the archive, and rich conversions such as docx, pdf, and pptx embed the real crop images inside the file regardless of image_output_mode. local mode only matters when you want the standalone images/ folder delivered alongside non-zipped outputs like plain .mmd or .md.
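
Because resubmitting the same (job_id, custom_id) returns the original file_id, a submission can be wrapped in a plain retry loop. The following is a minimal sketch, not an official client: `send` is a hypothetical callable you supply (for example, a thin wrapper around a POST to /files/v1/jobs) that is assumed to raise ConnectionError or TimeoutError on network failure, and the attempt count and backoff delays are illustrative values.

```python
import time
from typing import Callable


def submit_with_retry(
    send: Callable[[dict], dict],
    payload: dict,
    attempts: int = 3,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Call send(payload) until it succeeds or the attempts run out.

    The payload is sent unchanged every time, so each file keeps the same
    (job_id, custom_id) pair and a retry cannot create a duplicate.
    `send` is a hypothetical wrapper around your HTTP client.
    """
    for attempt in range(attempts):
        try:
            return send(payload)
        except (ConnectionError, TimeoutError):
            if attempt == attempts - 1:
                raise  # out of retries: surface the last error
            sleep(base_delay * 2 ** attempt)  # exponential backoff: 1s, 2s, 4s, ...
    raise ValueError("attempts must be at least 1")
```

If your HTTP client raises its own exception types, translate them inside `send` or widen the `except` clause accordingly.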

A worked migration

A migrated workload typically enumerates the input files first, by listing the source prefix with your storage provider's SDK or command-line tools, and then submits them all as a single job via POST /files/v1/jobs.

destination_uri is per file, not job-wide. Giving each file its own subfolder under your output prefix keeps each document's results and crops together, the same layout classic SCS produced. Setting destination_basename to the custom_id additionally makes the output filenames deterministic (they default to the server-generated file_id).
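
The per-file layout described above can be generated rather than written by hand. The sketch below builds the files array from a list of source URIs; the helper name `build_file_entries` is hypothetical, and deriving custom_id from the source filename is an assumption of this example (any stable per-file id works). It rejects duplicate ids because, given the idempotency rule on (job_id, custom_id), two files sharing an id in one job would presumably collapse into one.

```python
import posixpath


def build_file_entries(source_uris: list[str], output_prefix: str) -> list[dict]:
    """Build the "files" array for a job submission.

    Each file gets its own subfolder under output_prefix, and
    destination_basename is set to the custom_id so output names are stable.
    """
    prefix = output_prefix.rstrip("/")
    entries = []
    seen = set()
    for uri in source_uris:
        # ".../contract-001.pdf" -> "contract-001" (assumed id scheme)
        custom_id = posixpath.splitext(posixpath.basename(uri))[0]
        if custom_id in seen:
            raise ValueError(f"duplicate custom_id: {custom_id}")
        seen.add(custom_id)
        entries.append({
            "source_uri": uri,
            "custom_id": custom_id,
            "destination_uri": f"{prefix}/{custom_id}/",
            "destination_basename": custom_id,
        })
    return entries
```

Called with the two contract URIs used in this guide and the prefix s3://acme-output/processed/2026-05/, it yields the same two file entries as the hand-written submission.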

The job_id you supply becomes the handle you use for status reads. The submission below sends two contract documents, requests DOCX and Markdown conversions on top of the default Mathpix Markdown, and asks for cropped images in each file's destination folder:

The same request in cURL, Python, JavaScript / TypeScript, Go, and Java:

cURL

curl -X POST https://api.mathpix.com/files/v1/jobs \
-H "app_key: $APP_KEY" \
-H 'Content-Type: application/json' \
--data '{
  "job_id": "2026-05-contracts",
  "image_output_mode": "local",
  "conversion_formats": { "docx": true, "md": true },
  "files": [
    {
      "source_uri": "s3://acme-source/contracts/2026-05/contract-001.pdf",
      "custom_id": "contract-001",
      "destination_uri": "s3://acme-output/processed/2026-05/contract-001/",
      "destination_basename": "contract-001"
    },
    {
      "source_uri": "s3://acme-source/contracts/2026-05/contract-002.pdf",
      "custom_id": "contract-002",
      "destination_uri": "s3://acme-output/processed/2026-05/contract-002/",
      "destination_basename": "contract-002"
    }
  ]
}'

Python

import os, requests
r = requests.post("https://api.mathpix.com/files/v1/jobs",
    json={
        "job_id": "2026-05-contracts",
        "image_output_mode": "local",
        "conversion_formats": {"docx": True, "md": True},
        "files": [
            {
                "source_uri": "s3://acme-source/contracts/2026-05/contract-001.pdf",
                "custom_id": "contract-001",
                "destination_uri": "s3://acme-output/processed/2026-05/contract-001/",
                "destination_basename": "contract-001",
            },
            {
                "source_uri": "s3://acme-source/contracts/2026-05/contract-002.pdf",
                "custom_id": "contract-002",
                "destination_uri": "s3://acme-output/processed/2026-05/contract-002/",
                "destination_basename": "contract-002",
            },
        ],
    },
    headers={"app_key": os.environ["APP_KEY"], "Content-Type": "application/json"},
)
print(r.json())  # {"file_count": 2, "job_id": "2026-05-contracts"}

JavaScript / TypeScript

const response = await fetch("https://api.mathpix.com/files/v1/jobs", {
  method: "POST",
  headers: { app_key: process.env.APP_KEY, "Content-Type": "application/json" },
  body: JSON.stringify({
    job_id: "2026-05-contracts",
    image_output_mode: "local",
    conversion_formats: { docx: true, md: true },
    files: [
      {
        source_uri: "s3://acme-source/contracts/2026-05/contract-001.pdf",
        custom_id: "contract-001",
        destination_uri: "s3://acme-output/processed/2026-05/contract-001/",
        destination_basename: "contract-001",
      },
      {
        source_uri: "s3://acme-source/contracts/2026-05/contract-002.pdf",
        custom_id: "contract-002",
        destination_uri: "s3://acme-output/processed/2026-05/contract-002/",
        destination_basename: "contract-002",
      },
    ],
  }),
});
const { job_id, file_count } = await response.json();
console.log(`Job ${job_id} accepted ${file_count} files`);

Go

body := bytes.NewBufferString(`{
  "job_id": "2026-05-contracts",
  "image_output_mode": "local",
  "conversion_formats": {"docx": true, "md": true},
  "files": [
    {
      "source_uri": "s3://acme-source/contracts/2026-05/contract-001.pdf",
      "custom_id": "contract-001",
      "destination_uri": "s3://acme-output/processed/2026-05/contract-001/",
      "destination_basename": "contract-001"
    },
    {
      "source_uri": "s3://acme-source/contracts/2026-05/contract-002.pdf",
      "custom_id": "contract-002",
      "destination_uri": "s3://acme-output/processed/2026-05/contract-002/",
      "destination_basename": "contract-002"
    }
  ]
}`)
req, err := http.NewRequest("POST", "https://api.mathpix.com/files/v1/jobs", body)
if err != nil {
    log.Fatal(err)
}
req.Header.Set("app_key", os.Getenv("APP_KEY"))
req.Header.Set("Content-Type", "application/json")
resp, err := http.DefaultClient.Do(req)
if err != nil {
    log.Fatal(err) // resp is nil on error, so check before touching resp.Body
}
defer resp.Body.Close()
result, err := io.ReadAll(resp.Body)
if err != nil {
    log.Fatal(err)
}
fmt.Println(string(result)) // {"file_count": 2, "job_id": "2026-05-contracts"}

Java

HttpClient client = HttpClient.newHttpClient();
String body = """
    {
      "job_id": "2026-05-contracts",
      "image_output_mode": "local",
      "conversion_formats": {"docx": true, "md": true},
      "files": [
        {
          "source_uri": "s3://acme-source/contracts/2026-05/contract-001.pdf",
          "custom_id": "contract-001",
          "destination_uri": "s3://acme-output/processed/2026-05/contract-001/",
          "destination_basename": "contract-001"
        },
        {
          "source_uri": "s3://acme-source/contracts/2026-05/contract-002.pdf",
          "custom_id": "contract-002",
          "destination_uri": "s3://acme-output/processed/2026-05/contract-002/",
          "destination_basename": "contract-002"
        }
      ]
    }
    """;
HttpRequest request = HttpRequest.newBuilder()
    .uri(URI.create("https://api.mathpix.com/files/v1/jobs"))
    .header("app_key", System.getenv("APP_KEY"))
    .header("Content-Type", "application/json")
    .POST(HttpRequest.BodyPublishers.ofString(body))
    .build();
HttpResponse<String> response = client.send(request, HttpResponse.BodyHandlers.ofString());
System.out.println(response.body());

Example response
{
  "file_count": 2,
  "job_id": "2026-05-contracts"
}

Track the job

Track the job with the same id you supplied as job_id. The first request below reads the job's status and counters via GET /files/v1/jobs/{job_id}; the second lists only the files that failed via GET /files/v1/jobs/{job_id}/files?status=error.

The same two requests in cURL, Python, JavaScript / TypeScript, Go, and Java:

cURL

curl -H "app_key: $APP_KEY" \
  "https://api.mathpix.com/files/v1/jobs/2026-05-contracts"

curl -H "app_key: $APP_KEY" \
  "https://api.mathpix.com/files/v1/jobs/2026-05-contracts/files?status=error"

Python

import os, requests
headers = {"app_key": os.environ["APP_KEY"]}
job = requests.get("https://api.mathpix.com/files/v1/jobs/2026-05-contracts",
                   headers=headers).json()
print(job["status"], job["files_completed"], "/", job["file_count"])
errored = requests.get("https://api.mathpix.com/files/v1/jobs/2026-05-contracts/files",
                       params={"status": "error"}, headers=headers).json()
print([f["custom_id"] for f in errored["files"]])

JavaScript / TypeScript

const headers = { app_key: process.env.APP_KEY };
const job = await (await fetch(
  "https://api.mathpix.com/files/v1/jobs/2026-05-contracts",
  { headers },
)).json();
console.log(job.status, job.files_completed, "/", job.file_count);
const errored = await (await fetch(
  "https://api.mathpix.com/files/v1/jobs/2026-05-contracts/files?status=error",
  { headers },
)).json();
console.log(errored.files.map((f) => f.custom_id));

Go

for _, path := range []string{
    "/files/v1/jobs/2026-05-contracts",
    "/files/v1/jobs/2026-05-contracts/files?status=error",
} {
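    req, err := http.NewRequest("GET", "https://api.mathpix.com"+path, nil)
    if err != nil {
        log.Fatal(err)
    }
    req.Header.Set("app_key", os.Getenv("APP_KEY"))
    resp, err := http.DefaultClient.Do(req)
    if err != nil {
        log.Fatal(err) // resp is nil on error
    }
    result, err := io.ReadAll(resp.Body)
    resp.Body.Close()
    if err != nil {
        log.Fatal(err)
    }
    fmt.Println(string(result))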
}

Java

HttpClient client = HttpClient.newHttpClient();
for (String path : new String[]{
        "/files/v1/jobs/2026-05-contracts",
        "/files/v1/jobs/2026-05-contracts/files?status=error"}) {
    HttpRequest request = HttpRequest.newBuilder()
        .uri(URI.create("https://api.mathpix.com" + path))
        .header("app_key", System.getenv("APP_KEY"))
        .GET()
        .build();
    System.out.println(client.send(request, HttpResponse.BodyHandlers.ofString()).body());
}

The job status once both files finish:

Example response (job status)
{
  "job_id": "2026-05-contracts",
  "status": "completed",
  "file_count": 2,
  "files_completed": 2,
  "files_errored": 0,
  "created_at": "2026-07-21T19:37:01.227Z",
  "modified_at": "2026-07-21T19:39:20.328Z"
}
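
In practice the status read is repeated until the job finishes. The sketch below polls with a fixed interval; `fetch_status` is a hypothetical callable that returns the parsed body of GET /files/v1/jobs/{job_id}. Since "completed" is the only status value shown in this guide, the loop does not inspect status at all and instead assumes the job is done once files_completed plus files_errored reaches file_count. The interval and timeout are illustrative.

```python
import time
from typing import Callable


def wait_for_job(
    fetch_status: Callable[[], dict],
    interval: float = 10.0,
    timeout: float = 3600.0,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Poll until every file is accounted for, then return the last status body.

    Assumption: a job is finished when files_completed + files_errored
    equals file_count.
    """
    waited = 0.0
    while True:
        job = fetch_status()
        finished = job["files_completed"] + job["files_errored"]
        if finished >= job["file_count"]:
            return job
        if waited >= timeout:
            raise TimeoutError(f"job {job['job_id']} still running after {waited:.0f}s")
        sleep(interval)
        waited += interval
```

A nonzero files_errored in the returned body is the cue to call the error listing.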

An empty files list from the error listing means no file failed:

Example response (errored files)
{
  "files": [],
  "next_page_token": null
}

When files do fail, each entry in the error listing carries the custom_id you assigned at submission, so you can resubmit exactly the failed inputs; the Files API Quickstart walks through a batch with a failing file.
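
Matching the error listing back to the original submission can be sketched as follows. The helper names are hypothetical, and the retry goes out under a new job_id as an assumption: resubmitting the same (job_id, custom_id) returns the original file_id, so this guide does not establish that a failed file would be reprocessed under the original job. The sketch reads a single page of the listing and does not follow next_page_token.

```python
def select_failed(original_files: list[dict], error_listing: dict) -> list[dict]:
    """Return the original submission entries whose custom_id is in the error listing."""
    failed_ids = {f["custom_id"] for f in error_listing["files"]}
    return [entry for entry in original_files if entry["custom_id"] in failed_ids]


def build_retry_payload(original: dict, error_listing: dict, retry_job_id: str) -> dict:
    """Copy the job-level options and keep only the failed files.

    retry_job_id is a new id chosen by the caller (an assumption of this sketch).
    """
    retry = {k: v for k, v in original.items() if k != "files"}
    retry["job_id"] = retry_job_id
    retry["files"] = select_failed(original["files"], error_listing)
    return retry
```

The original payload is left untouched, so it can be kept as the record of what was first submitted.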

Note: If you lose a job_id, recover it with GET /files/v1/jobs, which lists your jobs newest first with optional date filters.

With the submission above, each document's results land under its own prefix: for the first file, s3://acme-output/processed/2026-05/contract-001/contract-001.mmd, the requested contract-001.docx and contract-001.md, and an images/ folder of crops (written because the submission set "image_output_mode": "local"). This is the same layout SCS produced.
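
To check a delivery against your bucket listing, the expected object names can be derived from the submission. This is a rough sketch with a hypothetical helper; it assumes each conversion's file extension equals its conversion_formats key, which matches the docx and md outputs named above but is not confirmed here for every format. The .lines.json files and the images/ folder are left out because their exact names under destination_basename are not spelled out in this guide.

```python
def expected_outputs(destination_uri: str, basename: str, conversion_formats: dict) -> list[str]:
    """List the .mmd plus one object per enabled conversion format.

    Assumption: the extension of each conversion equals its format key.
    """
    base = destination_uri.rstrip("/") + "/" + basename
    outputs = [base + ".mmd"]
    outputs += [f"{base}.{fmt}" for fmt, enabled in conversion_formats.items() if enabled]
    return outputs
```

Comparing this list with the keys returned by your storage provider's listing call gives a quick completeness check per document.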

Support

Questions during migration: email support@mathpix.com with subject [SCS migration].
Your existing SCS contract terms carry over until your migration is complete; contact support before submitting volume that would exceed your previous SCS allotment.
Help with data source setup: the Data Sources API reference covers the per-provider IAM steps. Send your AWS account ID, Azure tenant, or Google Cloud project to support if you'd like a setup review.

Next steps

Data Sources API: register your source and destination buckets.
Files API Quickstart: end-to-end walkthrough.
Async Batch Document Processing: full reference for POST /files/v1/jobs, including idempotency, partial-success handling, and pagination.
