
Process an Image

Submit an image containing math, text, tables, or chemistry diagrams and get back structured content as Mathpix Markdown, LaTeX, HTML, or other formats.

What you can process

  • Math equations (printed and handwritten)
  • Text with math (mixed content)
  • Tables (structured tabular data)
  • Chemistry diagrams (returned as SMILES notation)
  • Documents (multi-line content with layout)

Send an image URL

Send an image URL to the v3/text endpoint:

Input image for this example: a handwritten piecewise function.
from mpxpy.mathpix_client import MathpixClient

client = MathpixClient(app_id="APP_ID", app_key="APP_KEY")
image = client.image_new(
    url="https://mathpix-ocr-examples.s3.amazonaws.com/cases_hw.jpg"
)

# Get Mathpix Markdown
print(image.mmd())

# Get line-by-line OCR data
print(image.lines_json())
Example response
{
"auto_rotate_confidence": 0,
"auto_rotate_degrees": 0,
"confidence": 1,
"confidence_rate": 1,
"image_height": 332,
"image_width": 850,
"is_handwritten": true,
"is_printed": false,
"latex_styled": "f(x)=\\left\\{\\begin{array}{ll}\nx^{2} & \\text { if } x<0 \\\\\n2 x & \\text { if } x \\geq 0\n\\end{array}\\right.",
"request_id": "14b53567-9f6c-4895-ab3d-e4a8ae18f9c1",
"text": "$f(x)=\\left\\{\\begin{array}{ll}x^{2} & \\text { if } x<0 \\\\ 2 x & \\text { if } x \\geq 0\\end{array}\\right.$",
"version": "SuperNet-109p4"
}

In the example response, the latex_styled field renders as:

$$f(x)=\left\{\begin{array}{ll} x^{2} & \text { if } x<0 \\ 2 x & \text { if } x \geq 0 \end{array}\right.$$

Send an image file

Upload an image file to the v3/text endpoint via multipart form-data:

from mpxpy.mathpix_client import MathpixClient

client = MathpixClient(app_id="APP_ID", app_key="APP_KEY")
image = client.image_new(file_path="cases_hw.jpg")
print(image.mmd())
Note: When sending an image file, all options are sent as stringified JSON in a top-level options_json parameter.
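As a rough illustration of the note above, the sketch below serializes a set of request options into a single options_json form field. The helper name build_form_fields is hypothetical, and the option values are only examples taken from elsewhere on this page; the file part of the multipart form is not shown.

```python
import json


def build_form_fields(options: dict) -> dict:
    """Return the non-file form fields for a multipart upload.

    Hypothetical helper: every request option is serialized into one
    JSON string stored under the top-level "options_json" key.
    """
    return {"options_json": json.dumps(options)}


# Example options; these parameter names appear elsewhere on this page.
fields = build_form_fields({"formats": ["text", "data"], "include_line_data": True})
print(fields["options_json"])
```

The resulting dictionary would be passed as the ordinary form fields of the multipart request, next to the image file itself.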

Request data and HTML formats

Request multiple output formats from the v3/text endpoint in a single call:

{
  "src": "https://mathpix-ocr-examples.s3.amazonaws.com/cases_hw.jpg",
  "formats": ["text", "data", "html"],
  "data_options": {
    "include_asciimath": true,
    "include_latex": true
  }
}
Example response with multiple formats
{
"request_id": "054135c6-fca6-4a46-8f74-2814fa13dc8e",
"version": "SuperNet-109p4",
"is_printed": false,
"is_handwritten": true,
"confidence": 1,
"confidence_rate": 1,
"text": "\\( f(x)=\\left\\{\\begin{array}{ll}x^{2} & \\text { if } x<0 \\\\ 2 x & \\text { if } x \\geq 0\\end{array}\\right. \\)",
"html": "<div><span class=\"math-inline\">...</span></div>",
"data": [
{
"type": "asciimath",
"value": "f(x)={[x^(2),\" if \"x < 0],[2x,\" if \"x >= 0]:}"
},
{
"type": "latex",
"value": "f(x)=\\left\\{\\begin{array}{ll}x^{2} & \\text { if } x<0 \\\\ 2 x & \\text { if } x \\geq 0\\end{array}\\right."
}
]
}

The response includes a text field, an html field, and a data array with both asciimath and latex representations, as requested via the formats and data_options parameters. The latex value in the data array renders as:

$$f(x)=\left\{\begin{array}{ll} x^{2} & \text { if } x<0 \\ 2 x & \text { if } x \geq 0 \end{array}\right.$$
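One way to consume the data array is to index its entries by type, so that the asciimath and latex values can be looked up directly. The following is a minimal sketch that assumes the response has already been parsed into a Python dictionary; data_by_type is a hypothetical helper, not part of any Mathpix library.

```python
def data_by_type(response: dict) -> dict:
    """Map each data entry's "type" to its "value".

    Hypothetical helper; if a type appeared more than once,
    the last entry would win.
    """
    return {item["type"]: item["value"] for item in response.get("data", [])}


# Trimmed copy of the example response above (only the data array).
sample = {
    "data": [
        {"type": "asciimath",
         "value": 'f(x)={[x^(2)," if "x < 0],[2x," if "x >= 0]:}'},
        {"type": "latex",
         "value": r"f(x)=\left\{\begin{array}{ll}x^{2} & \text { if } x<0 \\ 2 x & \text { if } x \geq 0\end{array}\right."},
    ]
}

formats = data_by_type(sample)
print(formats["asciimath"])
print(formats["latex"])
```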

Get line-by-line data

Set the include_line_data request parameter on the v3/text endpoint to get per-line results with position contours, useful for overlaying results on the original image.

Input image for this example: text with a circuit diagram.
{
  "src": "https://mathpix.com/examples/text_with_diagram.png",
  "formats": ["text"],
  "include_line_data": true
}
Example response with line_data
{
"auto_rotate_confidence": 0,
"auto_rotate_degrees": 0,
"confidence": 0.45044320869814447,
"confidence_rate": 0.9904373020612357,
"image_height": 733,
"image_width": 932,
"is_handwritten": false,
"is_printed": true,
"line_data": [
{
"after_hyphen": false,
"cnt": [[62,37],[265,39],[542,45],[787,54],[860,56],[863,71],[863,85],[860,93],[784,93],[451,87],[87,76],[31,73],[0,71],[0,37]],
"confidence": 0.45044320869814447,
"confidence_rate": 0.9904373020612357,
"conversion_output": true,
"id": "74d60966b4ad49a7acf63d2c7e6cbbc6",
"included": true,
"is_handwritten": false,
"is_printed": true,
"text": "Equivalent resistance between points \\( \\mathrm{A} \\& \\mathrm{~B} \\) in the adjacent circuit is -",
"type": "text"
},
{
"cnt": [[0,687],[0,238],[656,238],[656,687]],
"conversion_output": false,
"error_id": "image_not_supported",
"id": "282982c526304333b76ba533d21dd909",
"included": false,
"is_handwritten": false,
"is_printed": true,
"type": "diagram"
}
],
"request_id": "53b99d58-28d6-4ab3-a606-b94457885e73",
"text": "Equivalent resistance between points \\( \\mathrm{A} \\& \\mathrm{~B} \\) in the adjacent circuit is -",
"version": "SuperNet-109p4"
}

For this example response, the line_data array contains two elements, one for each detected region in the image:

  1. The first element has "type": "text" and contains the recognized sentence in its text field, along with a cnt array describing the polygon contour around that region. The text field renders as:

    Equivalent resistance between points A & B in the adjacent circuit is -

  2. The second element has "type": "diagram". Because the OCR engine detects diagrams but does not convert them to text, it returns "error_id": "image_not_supported" and "included": false.

The cnt arrays contain polygon coordinates that describe the boundary of each detected region. The API returns these coordinates as raw data — the visualization below is rendered here for illustration only:

Illustration of the cnt polygon coordinates overlaid on the input image. Blue: text region. Orange: diagram region. The API returns only the raw coordinates; rendering is up to your application.
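For a simple overlay, each cnt polygon can be reduced to an axis-aligned bounding box. The sketch below does this for the two regions in the example response. It assumes each cnt point is an [x, y] pair in image pixel coordinates, which the example values suggest but this page does not state; bounding_box and summarize_lines are hypothetical helper names.

```python
def bounding_box(cnt):
    """Return (x_min, y_min, x_max, y_max) for a list of [x, y] points."""
    xs = [x for x, _ in cnt]
    ys = [y for _, y in cnt]
    return min(xs), min(ys), max(xs), max(ys)


def summarize_lines(response):
    """List the type, included flag, and bounding box of each line_data entry."""
    return [
        {
            "type": line["type"],
            "included": line.get("included", False),
            "box": bounding_box(line["cnt"]),
        }
        for line in response.get("line_data", [])
    ]


# Trimmed copy of the example response above (only the fields used here).
sample = {
    "line_data": [
        {"type": "text", "included": True,
         "cnt": [[62, 37], [265, 39], [542, 45], [787, 54], [860, 56],
                 [863, 71], [863, 85], [860, 93], [784, 93], [451, 87],
                 [87, 76], [31, 73], [0, 71], [0, 37]]},
        {"type": "diagram", "included": False,
         "cnt": [[0, 687], [0, 238], [656, 238], [656, 687]]},
    ]
}

regions = summarize_lines(sample)
for region in regions:
    print(region)
```

A bounding box discards the exact polygon shape, so an application that needs tight outlines would draw the cnt points directly instead.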

Get word-by-word data

Set the include_word_data request parameter on the v3/text endpoint to get per-word results with position contours.

Input image for this example: text mixed with math.
{
  "src": "https://mathpix.com/examples/text_with_math_0.jpg",
  "include_word_data": true
}
Example response with word_data
{
"is_printed": true,
"is_handwritten": false,
"auto_rotate_confidence": 0.00939574267408716,
"auto_rotate_degrees": 0,
"word_data": [
{
"type": "text",
"cnt": [[111, 104], [3, 104], [3, 74], [111, 74]],
"text": "Perform",
"confidence": 0.99951171875,
"confidence_rate": 0.9999593007867263,
"latex": "\\text { Perform }"
},
{
"type": "text",
"cnt": [[160, 104], [115, 104], [115, 74], [160, 74]],
"text": "the",
"confidence": 1,
"confidence_rate": 1,
"latex": "\\text { the }"
},
{
"type": "math",
"cnt": [[322, 191], [132, 191], [132, 110], [322, 110]],
"text": "\\( \\frac{2 p-2}{p} \\div \\frac{4 p-4}{9 p^{2}} \\)",
"confidence": 0.99853515625,
"confidence_rate": 0.9999436201400773,
"latex": "\\frac{2 p-2}{p} \\div \\frac{4 p-4}{9 p^{2}}"
}
]
}

For this example response, the word_data array contains three elements. Each element includes a type field ("text" or "math"), a cnt array with the bounding polygon, and a latex field with the LaTeX representation:

  1. "type": "text" — the word "Perform"
  2. "type": "text" — the word "the"
  3. "type": "math" — the math expression, whose latex field renders as:
$$\frac{2 p-2}{p} \div \frac{4 p-4}{9 p^{2}}$$
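The type field makes it straightforward to separate plain words from math expressions. The following sketch, which assumes the response has been parsed into a Python dictionary, collects the text of "text" entries and the latex of "math" entries; split_words is a hypothetical helper name.

```python
def split_words(response):
    """Return (plain_words, math_latex) collected from word_data.

    Hypothetical helper: entries with type "math" contribute their
    latex field; all other entries contribute their text field.
    """
    words, math = [], []
    for item in response.get("word_data", []):
        if item["type"] == "math":
            math.append(item["latex"])
        else:
            words.append(item["text"])
    return words, math


# Trimmed copy of the example response above (only the fields used here).
sample = {
    "word_data": [
        {"type": "text", "text": "Perform", "latex": r"\text { Perform }"},
        {"type": "text", "text": "the", "latex": r"\text { the }"},
        {"type": "math",
         "text": r"\( \frac{2 p-2}{p} \div \frac{4 p-4}{9 p^{2}} \)",
         "latex": r"\frac{2 p-2}{p} \div \frac{4 p-4}{9 p^{2}}"},
    ]
}

words, math = split_words(sample)
print(" ".join(words))
print(math)
```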

Auto rotation

The v3/text endpoint automatically corrects images that are in the wrong orientation. Control auto rotation with the auto_rotate_confidence_threshold request parameter (default 0.99). Set it to 1 to disable auto rotation.
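The sketch below shows how a request body with auto_rotate_confidence_threshold might be assembled using only the Python standard library. Several details are assumptions not stated in this section: the full URL https://api.mathpix.com/v3/text, the app_id and app_key header names, and the helper name build_text_request. The request is only constructed here, not sent.

```python
import json
import urllib.request

# Assumed full URL for the v3/text endpoint.
API_URL = "https://api.mathpix.com/v3/text"


def build_text_request(src, app_id, app_key, auto_rotate=True, threshold=0.99):
    """Build (but do not send) a POST request for an image URL.

    Hypothetical helper: when auto_rotate is False the threshold is
    set to 1, which the text above describes as disabling rotation.
    """
    body = {
        "src": src,
        "auto_rotate_confidence_threshold": threshold if auto_rotate else 1,
    }
    return urllib.request.Request(
        API_URL,
        data=json.dumps(body).encode("utf-8"),
        # Assumed credential header names.
        headers={
            "app_id": app_id,
            "app_key": app_key,
            "Content-Type": "application/json",
        },
        method="POST",
    )


req = build_text_request(
    "https://mathpix.com/examples/text_with_math_0.jpg",
    "APP_ID",
    "APP_KEY",
    auto_rotate=False,
)
print(json.loads(req.data))
```

Sending the request is left to the caller, for example with urllib.request.urlopen(req) once real credentials are in place.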

Input image for this example: incorrect orientation.
Output: same image after automatic rotation correction.

Next steps