POST v3/text

POST api.mathpix.com/v3/text

Process an image containing math, text, tables, or chemistry diagrams. Accepts an image URL, base64-encoded image, or file upload (multipart form-data with options_json).

Returns structured content as Mathpix Markdown (with LaTeX math inside \( ... \) and \[ ... \] delimiters), HTML, or extracted data formats. Chemistry diagrams are returned as <smiles>...</smiles> SMILES notation.

See the image processing guide for step-by-step examples.

info

Mathpix ignores all EXIF data in images, including EXIF orientation.

note

When sending an image file via multipart form-data, all options are sent as stringified JSON in a top-level options_json parameter.
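
As a rough sketch of the note above, the non-file form fields for a multipart request could be assembled as follows. The helper name build_multipart_fields and the sample option values are illustrative assumptions. Only the options_json field name comes from this page.

```python
import json


def build_multipart_fields(options: dict) -> dict:
    """Return the non-file form fields for a multipart v3/text request.

    All request options are serialized into a single JSON string and
    placed under the top-level "options_json" field.
    """
    return {"options_json": json.dumps(options)}


# Illustrative options; any request parameter from this page could go here.
fields = build_multipart_fields({"formats": ["text", "data"], "rm_spaces": True})
print(fields["options_json"])
```

The image itself travels as a separate file part of the same form. The name of that part is not stated on this page, so it is left out of the sketch.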

info

Request limits: 5 MB JSON body, 10 MB image download from URL, 2 MB base64-encoded image, 15 second URL download timeout. See Limits & Quotas for details.

Request parameters

src string (optional)

Image URL or base64-encoded image (e.g. data:image/jpeg;base64,...)
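
A minimal sketch of building a base64 src value might look like the following. The helper name image_to_src is hypothetical, and the size check assumes the 2 MB limit is measured on the encoded string with 1 MB = 1024 * 1024 bytes, which this page does not spell out.

```python
import base64

# Assumption: the 2 MB limit applies to the encoded string, 1 MB = 1024 * 1024 bytes.
MAX_BASE64_BYTES = 2 * 1024 * 1024


def image_to_src(image_bytes: bytes, mime_type: str = "image/jpeg") -> str:
    """Encode raw image bytes as a data URI suitable for the src parameter."""
    encoded = base64.b64encode(image_bytes).decode("ascii")
    if len(encoded) > MAX_BASE64_BYTES:
        raise ValueError("base64-encoded image exceeds the assumed 2 MB limit")
    return f"data:{mime_type};base64,{encoded}"


# Placeholder bytes stand in for the contents of a real image file.
body = {"src": image_to_src(b"\xff\xd8\xff\xe0 placeholder")}
print(body["src"][:40])
```

Images over the limit would need to be sent by URL or file upload instead.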

metadata object (optional)

Key-value object

tags [string] (optional)

Tags are lists of strings that can be used to identify results. See query image results.

async bool (optional)

This flag is to be used when sending non-interactive requests

callback object (optional)

formats [string] (optional)

List of formats, see Format Descriptions. Empty array or object returns the text format.

Values

text, data, html, latex_styled

data_options object (optional)

Specifies outputs for the data and html return fields. See the DataOptions section.

include_detected_alphabets bool (optional)

Return detected alphabets

alphabets_allowed object (optional)

Specifies which alphabets you don't want in the output. See the AlphabetsAllowed section.

region object (optional)

Specify the image area with the pixel coordinates top_left_x, top_left_y, width, and height. All four properties are required if region is provided. Empty object {} is valid (treated as no region)
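
The rule above (either an empty object or all four properties) can be checked client-side before sending. This is only a sketch: check_region is a hypothetical helper, and whether the API rejects extra keys is not stated here, so the check covers missing keys only.

```python
REGION_KEYS = ("top_left_x", "top_left_y", "width", "height")


def check_region(region: dict) -> dict:
    """Accept {} (treated as no region) or an object with all four properties."""
    if not region:
        return region
    missing = [key for key in REGION_KEYS if key not in region]
    if missing:
        raise ValueError(f"region is missing: {', '.join(missing)}")
    return region


# Illustrative pixel coordinates.
body = {"region": check_region(
    {"top_left_x": 0, "top_left_y": 0, "width": 200, "height": 100}
)}
print(body)
```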

enable_blue_hsv_filter bool (optional), default value is false

Enables a special image processing mode that OCRs only blue-hue text.

confidence_threshold number in [0,1] (optional)

Specifies threshold for triggering math_confidence errors. Returns error when confidence is below this value

confidence_rate_threshold number in [0,1] (optional), default value is 0.75

Specifies threshold for triggering math_confidence errors at the symbol level.

include_equation_tags bool (optional)

Specifies whether to include equation number tags inside equation LaTeX. Setting this to true also sets "idiomatic_eqn_arrays": true, because equation numbering works better in those environments than in the array environment.

Example

\tag{eq_number}, where eq_number is an equation number (e.g. 1.12)

include_line_data bool (optional)

Specifies whether to return information segmented line by line. See the LineData object section for details.

include_word_data bool (optional)

Specifies whether to return information segmented word by word. See the WordData object section for details.

include_smiles bool (optional)

Enables experimental chemistry diagram OCR. Diagrams are returned as RDKit-normalized SMILES (with isomericSmiles=False) and included in the text output format using the MMD SMILES syntax <smiles>...</smiles>.

include_inchi bool (optional)

Include InChI data as XML attributes inside <smiles> elements. Only applies when include_smiles is true.

Example

<smiles inchi="..." inchikey="...">...</smiles>
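
One way to pull SMILES strings and their InChI attributes back out of the text output is a small regular-expression pass, sketched below. The function name extract_smiles is an assumption, and the sample string is hand-written rather than real API output.

```python
import re

# Matches <smiles ...attributes...>content</smiles>, including reaction SMILES
# whose content may itself contain ">" characters.
SMILES_TAG = re.compile(r"<smiles(?P<attrs>[^>]*)>(?P<smiles>.*?)</smiles>", re.DOTALL)
ATTRIBUTE = re.compile(r'(\w+)="([^"]*)"')


def extract_smiles(text: str) -> list[dict]:
    """Return one dict per <smiles> element, with any XML attributes merged in."""
    results = []
    for match in SMILES_TAG.finditer(text):
        entry = {"smiles": match.group("smiles")}
        entry.update(dict(ATTRIBUTE.findall(match.group("attrs"))))
        results.append(entry)
    return results


# Hand-written sample (ethanol), not real API output.
sample = (
    'Ethanol: <smiles inchi="InChI=1S/C2H6O/c1-2-3/h3H,2H2,1H3" '
    'inchikey="LFQSCWFLJHTTHZ-UHFFFAOYSA-N">CCO</smiles>'
)
found = extract_smiles(sample)
print(found)
```

A regular expression is enough for this flat element. A full XML parser is not used because the surrounding Mathpix Markdown is not XML.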

include_diagram_text bool (optional), default value is false

Enables text extraction from diagrams (use with "include_line_data": true). The extracted text is part of the line data only, not of the text or any other requested output format. The parent_id of each extracted text line corresponds to the id of one of the diagrams in the line data, and each diagram has children_ids that reference its text lines.
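
The parent_id relationship described above could be used to collect the text found inside each diagram. The sketch below assumes a line_data list shaped like the LineData object on this page. The function name and the sample lines are illustrative.

```python
from collections import defaultdict


def diagram_text_by_parent(line_data: list[dict]) -> dict[str, list[str]]:
    """Map each diagram line id to the text of the lines that name it as parent."""
    diagram_ids = {line["id"] for line in line_data if line.get("type") == "diagram"}
    grouped = defaultdict(list)
    for line in line_data:
        parent_id = line.get("parent_id")
        if parent_id in diagram_ids and "text" in line:
            grouped[parent_id].append(line["text"])
    return dict(grouped)


# Hand-written sample line data, not real API output.
sample_lines = [
    {"id": "d1", "type": "diagram", "children_ids": ["t1", "t2"]},
    {"id": "t1", "type": "text", "parent_id": "d1", "text": "A"},
    {"id": "t2", "type": "text", "parent_id": "d1", "text": "B"},
    {"id": "t3", "type": "text", "text": "Outside the diagram"},
]
labels = diagram_text_by_parent(sample_lines)
print(labels)
```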

include_page_info bool (optional), default value is true

Controls whether page info elements are included in the final text output. Page info refers to various elements that are not part of the main text.

auto_rotate_confidence_threshold number in [0,1] (optional), default value is 0.99

Specifies threshold for auto rotating image to correct orientation. Can be disabled with a value of 1 (see Auto rotation section for details).

rm_spaces bool (optional), default value is true

Determines whether extra white space is removed from equations in latex_styled and text formats.

rm_fonts bool (optional), default value is false

Determines whether font commands such as \mathbf and \mathrm are removed from equations in latex_styled and text formats.

idiomatic_eqn_arrays bool (optional), default value is false

Specifies whether to use aligned, gathered, or cases instead of an array environment for a list of equations.

idiomatic_braces bool (optional), default value is false

Specifies whether to remove unnecessary braces for LaTeX output.

Example

x^2 is returned instead of x^{2}

numbers_default_to_math bool (optional), default value is false

Specifies whether numbers are always math.

Example

Answer: \( 17 \) instead of Answer: 17

math_fonts_default_to_math bool (optional), default value is false

Specifies whether math fonts are always math.

Example

Answer: \( 2 \mathrm { ms } \) instead of Answer: 2 ms

math_inline_delimiters [string, string] (optional), default value is ["\\(", "\\)"]

Specifies begin inline math and end inline math delimiters for text outputs.

math_display_delimiters [string, string] (optional), default value is ["\\[", "\\]"]

Specifies begin display math and end display math delimiters for text outputs.
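
For illustration, a request that switches both delimiter pairs to dollar signs might be built as below. The dollar-sign values are an example choice, not a recommendation from this page. The second print shows why the documented defaults appear with doubled backslashes: that is their JSON-escaped form.

```python
import json

# Example override: dollar-sign delimiters for inline and display math.
options = {
    "math_inline_delimiters": ["$", "$"],
    "math_display_delimiters": ["$$", "$$"],
}
print(json.dumps(options))

# The default inline delimiters contain a literal backslash,
# which JSON serializes as two backslashes.
default_inline = [r"\(", r"\)"]
print(json.dumps(default_inline))  # prints ["\\(", "\\)"]
```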

enable_spell_check bool (deprecated)

Deprecated, has no effect on the output.

enable_tables_fallback bool, default value is false

Enables advanced table processing algorithm that supports very large and complex tables.

fullwidth_punctuation bool (optional), default value is null

Controls whether punctuation is fullwidth Unicode (default for East Asian languages like Chinese) or halfwidth Unicode (default for Latin scripts, Cyrillic scripts, etc.). When null, fullwidth vs halfwidth is decided based on image content. Punctuation inside math always stays halfwidth.

Format descriptions

Format | Description
text | Mathpix Markdown
html | HTML rendered from text via mathpix-markdown-it
data | Data computed from text as specified in the data_options request parameter
latex_styled | Styled LaTeX, returned only in cases that the whole image can be reduced to a single equation
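
Because latex_styled is only returned for single-equation images, a client that requests both text and latex_styled may want a fallback. The following is a sketch with a hypothetical helper name and hand-written sample responses.

```python
from typing import Optional


def best_math_output(response: dict) -> Optional[str]:
    """Prefer latex_styled when present, otherwise fall back to text."""
    return response.get("latex_styled") or response.get("text")


# Hand-written sample responses, not real API output.
single_equation = {"text": "\\( x^{2} \\)", "latex_styled": "x^{2}"}
mixed_content = {"text": "Answer: \\( x^{2} \\)"}

print(best_math_output(single_equation))
print(best_math_output(mixed_content))
```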

DataOptions object

Data options are used to return elements of the image output. These outputs are all computed from the text format described above. The data_options parameter must be an object, and only the keys listed below are accepted. Unknown keys return opts_unknown_data_option.

include_svg bool (optional)

Include math SVG in html and data outputs

include_table_html bool (optional)

Include table HTML in html and data outputs (tables only)

include_latex bool (optional)

Include math mode LaTeX in data and html outputs

include_tsv bool (optional)

Include tab-separated values (TSV) in data and html outputs (tables only)

include_asciimath bool (optional)

Include AsciiMath in data and html outputs

include_mathml bool (optional)

Include MathML in data and html outputs

include_sub_math bool (optional)

Include sub-math elements in data and html outputs
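
Since unknown keys are rejected with opts_unknown_data_option, it can help to catch typos before the request is sent. The helper below is a hypothetical client-side check built from the keys listed above. It does not reproduce the server's actual validation.

```python
DATA_OPTION_KEYS = frozenset({
    "include_svg",
    "include_table_html",
    "include_latex",
    "include_tsv",
    "include_asciimath",
    "include_mathml",
    "include_sub_math",
})


def make_data_options(**flags: bool) -> dict:
    """Build a data_options object, refusing keys not listed on this page."""
    unknown = sorted(set(flags) - DATA_OPTION_KEYS)
    if unknown:
        raise ValueError(f"unknown data option(s): {', '.join(unknown)}")
    return dict(flags)


body = {
    "formats": ["text", "data"],
    "data_options": make_data_options(include_asciimath=True, include_latex=True),
}
print(body)
```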

Response body

request_id string

Request ID, for debugging purposes

text string (optional)

Recognized text in the text format (Mathpix Markdown), if any is found

latex_styled string (optional)

LaTeX string for the equation, if the image contains a single equation

confidence number in [0,1] (optional)

Estimated probability that the result is 100% correct

confidence_rate number in [0,1] (optional)

Estimated confidence of output quality

line_data [object] (optional)

List of LineData objects

word_data [object] (optional)

List of WordData objects

data [object] (optional)

List of Data objects

html string (optional)

Annotated HTML output

detected_alphabets [object] (optional)

See the DetectedAlphabet object section

is_printed bool (optional)

Specifies if printed content was detected in an image

is_handwritten bool (optional)

Specifies if handwritten content was detected in an image

auto_rotate_confidence number in [0,1] (optional)

Estimated probability that image needs to be rotated, see Auto rotation

auto_rotate_degrees number in {0, 90, -90, 180} (optional)

Estimated angle of rotation in degrees to put image in correct orientation, see Auto rotation

error string (optional)

US locale error message

error_info object (optional)

Error info object

version string

This string is opaque to clients and only useful as a way of understanding differences in results for requests using the same image. Our service relies on training data, the service implementation, and the underlying platforms we run on (e.g., AWS, PyTorch). Initially, the version string will only change when the training data or process changes, but in the future we might provide additional distinctions between versions.

Data object

Data objects allow extracting the math elements from an OCR result.

type string

One of asciimath, mathml, latex, svg, tsv

value string

Value corresponding to type
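
Given a data list shaped like the Data object above, selecting every value of one type is a simple filter. The function name and sample list are illustrative only.

```python
def values_of_type(data: list[dict], data_type: str) -> list[str]:
    """Return the value of every Data object whose type matches data_type."""
    return [item["value"] for item in data if item.get("type") == data_type]


# Hand-written sample, not real API output.
sample_data = [
    {"type": "asciimath", "value": "x^(2)"},
    {"type": "latex", "value": "x^{2}"},
    {"type": "asciimath", "value": "y=1"},
]
print(values_of_type(sample_data, "asciimath"))
```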

DetectedAlphabet object

Each field is a boolean that is true if any characters from that alphabet are recognized in the image, regardless of whether the result fields contain those characters.

  • en — English
  • hi — Hindi Devanagari
  • zh — Chinese
  • ja — Kana Hiragana or Katakana
  • ko — Hangul Jamo
  • ru — Russian
  • th — Thai
  • ta — Tamil
  • te — Telugu
  • gu — Gujarati
  • bn — Bengali
  • vi — Vietnamese

AlphabetsAllowed object

A map from alphabet key to boolean that controls which alphabets are allowed in the output. This is useful when different alphabets contain look-alike characters (e.g. Latin B vs Cyrillic В) that can cause incorrect Unicode encodings in the result.

  • Keys correspond to the alphabet codes listed in DetectedAlphabet (e.g. hi, ru)
  • By default all alphabets are allowed
  • Set a key to false to suppress that alphabet in the output
  • Setting a key to true has the same effect as omitting it

Example

{"alphabets_allowed": {"ru": false, "hi": false}}
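
The same object can be generated from a list of alphabet codes. In this sketch the helper name disallow_alphabets is an assumption, and the code list is copied from the DetectedAlphabet section.

```python
import json

ALPHABET_CODES = ("en", "hi", "zh", "ja", "ko", "ru", "th", "ta", "te", "gu", "bn", "vi")


def disallow_alphabets(*codes: str) -> dict:
    """Build an alphabets_allowed fragment that suppresses the given alphabets."""
    unknown = [code for code in codes if code not in ALPHABET_CODES]
    if unknown:
        raise ValueError(f"unknown alphabet code(s): {', '.join(unknown)}")
    # Omitted alphabets stay allowed, so only the false entries are needed.
    return {"alphabets_allowed": {code: False for code in codes}}


fragment = disallow_alphabets("ru", "hi")
print(json.dumps(fragment))
```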

LineData object

Returned when include_line_data is set to true. Contains information about all textual line elements detected in the image. Concatenating content from line_data recreates the top-level text, html, and data fields.

The OCR engine does not support some lines (like diagrams), which are skipped. Lines with extraneous content (like equation numbers) or low confidence have conversion_output set to false.

id string

Unique line identifier

parent_id string (optional)

Unique line identifier of the parent.

children_ids [string] (optional)

List of children unique identifiers.

type string

See line types and subtypes for details.

subtype string (optional)

See line types and subtypes for details.

cnt [[x,y]]

Contour for line expressed as list of (x,y) pixel coordinate pairs

included bool

Whether this line is included in the top level OCR result (deprecated, use conversion_output)

conversion_output bool

Whether this line is included in the top level OCR result

is_printed bool

True if line has printed text, false otherwise.

is_handwritten bool

True if line has handwritten text, false otherwise.

error_id string (optional)

Error ID, reason why the line is not included in final result

text string (optional)

Text (Mathpix Markdown) for line

confidence number in [0,1] (optional)

Estimated probability that the result is 100% correct

confidence_rate number in [0,1] (optional)

Estimated confidence of output quality

after_hyphen bool (optional)

Specifies whether the current line follows a text line that ended with a hyphen

html string (optional)

Annotated HTML output for the line

data [Data] (optional)

List of Data objects

Possible values for error_id:

  • image_not_supported — OCR engine doesn't accept this type of line
  • image_max_size — line is larger than maximal size which OCR engine supports
  • math_confidence — OCR engine failed to confidently recognize the content of the line
  • image_no_content — line has strange spatial dimensions, e.g. height of the line is zero
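
To see which lines made it into the top-level result and why the others did not, line_data can be split on conversion_output and tallied by error_id. This is a sketch with a hypothetical helper and hand-written sample lines. The "unspecified" label is this example's own placeholder for lines without an error_id.

```python
from collections import Counter


def split_line_data(line_data: list[dict]) -> tuple[list[dict], Counter]:
    """Return (included lines, count of excluded lines per error_id)."""
    included = [line for line in line_data if line.get("conversion_output")]
    skipped = Counter(
        line.get("error_id", "unspecified")
        for line in line_data
        if not line.get("conversion_output")
    )
    return included, skipped


# Hand-written sample line data, not real API output.
sample_lines = [
    {"id": "a", "type": "text", "conversion_output": True, "text": "Solve for x."},
    {"id": "b", "type": "diagram", "conversion_output": False,
     "error_id": "image_not_supported"},
    {"id": "c", "type": "math", "conversion_output": False,
     "error_id": "math_confidence"},
    {"id": "d", "type": "equation_number", "conversion_output": False},
]
included, skipped = split_line_data(sample_lines)
print([line["id"] for line in included], dict(skipped))
```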

Line data types and subtypes

Types and subtypes returned as part of line data and PDF lines data (types are the keys, subtypes are values):

{
  "chart_info": [],
  "x_axis_tick_label": [],
  "y_axis_tick_label": [],
  "x_axis_label": [],
  "y_axis_label": [],
  "legend_label": [],
  "model_label": [],
  "page_info": [],
  "equation_number": [],
  "table": [],
  "diagram": ["algorithm", "pseudocode", "chemistry", "chemistry_reaction", "triangle"],
  "chart": ["column", "bar", "line", "analytical", "pie", "scatter", "area"],
  "diagram_info": [],
  "text": ["vertical", "big_capital_letter"],
  "math": [],
  "column": [],
  "code": [],
  "pseudocode": [],
  "figure_label": [],
  "form_field": ["parentheses", "dotted", "dashed", "box", "checkbox", "circle"],
  "qed_symbol": [],
  "multiple_choice_block": [],
  "multiple_choice_option": [],
  "footnote": [],
  "table_of_contents_container": [],
  "table_of_contents_row": [],
  "table_of_contents_item": [],
  "table_of_contents_number": [],
  "title": [],
  "quote": [],
  "section_header": [],
  "authors": [],
  "abstract": [],
  "rotated_container": [],
  "table_cell": ["split", "spanning"]
}

WordData object

Returned when include_word_data is set to true. Contains information about all word-level elements detected in the image.

type string

One of text, math, table, diagram, equation_number

subtype string (optional)

Not set, or one of chemistry or triangle (more diagram subtypes coming soon)

cnt [[x,y]]

Contour for word expressed as list of (x,y) pixel coordinate pairs

text string (optional)

Text (Mathpix Markdown) for word

latex string (optional)

Math mode LaTeX (Mathpix Markdown) for word

confidence number in [0,1] (optional)

Estimated probability that the result is 100% correct

confidence_rate number in [0,1] (optional)

Estimated confidence of output quality
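
The cnt contour is a list of (x, y) pixel pairs, so an axis-aligned bounding box can be derived from its minimum and maximum coordinates. The sketch below uses a hypothetical helper and a hand-written sample word. It returns the box in the same top-left plus width and height form that the region request parameter uses.

```python
def bounding_box(cnt: list[list[int]]) -> tuple[int, int, int, int]:
    """Return (top_left_x, top_left_y, width, height) enclosing a contour."""
    xs = [x for x, _ in cnt]
    ys = [y for _, y in cnt]
    return min(xs), min(ys), max(xs) - min(xs), max(ys) - min(ys)


# Hand-written sample word, not real API output.
word = {"type": "text", "text": "Hello", "cnt": [[10, 20], [60, 20], [60, 40], [10, 40]]}
print(bounding_box(word["cnt"]))  # prints (10, 20, 50, 20)
```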

Auto rotation

The auto rotation feature detects when images are in the wrong orientation and corrects them before processing.

Control the confidence threshold with the auto_rotate_confidence_threshold request parameter (number in [0,1]). Default is 0.99, meaning the image is rotated only when the algorithm is 99% confident. Set to 1 to disable auto rotation.

The response includes:

  • auto_rotate_confidence — confidence that the image needs rotation (number in [0,1], ~0 if correct, ~1 if rotated)
  • auto_rotate_degrees — rotation angle applied (one of 0, 90, -90, 180)
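
A client could summarize these two response fields as shown below. The exact comparison the service uses is not documented here, so this sketch assumes rotation happens when the confidence is at least the threshold and that a threshold of 1 disables it. The function name and sample responses are illustrative.

```python
def describe_rotation(response: dict, threshold: float = 0.99) -> str:
    """Summarize whether auto rotation likely changed the image orientation."""
    confidence = response.get("auto_rotate_confidence", 0.0)
    degrees = response.get("auto_rotate_degrees", 0)
    # Assumption: rotation applies when confidence >= threshold, and a
    # threshold of 1 turns the feature off entirely.
    rotated = threshold < 1 and confidence >= threshold and degrees != 0
    if rotated:
        return f"rotated by {degrees} degrees (confidence {confidence:.3f})"
    return "left as-is"


# Hand-written sample responses, not real API output.
sideways = {"auto_rotate_confidence": 0.997, "auto_rotate_degrees": 90}
upright = {"auto_rotate_confidence": 0.02, "auto_rotate_degrees": 0}

print(describe_rotation(sideways))
print(describe_rotation(upright))
print(describe_rotation(sideways, threshold=1))
```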