Multimodal Document Understanding
The DataMFM Challenge focuses on multimodal document understanding, a core challenge at the intersection of vision, language, and structured reasoning. Building on OmniDocBench and its upcoming extension OmniDocBench-Pro, the challenge provides a unified evaluation framework for document-centric multimodal tasks involving charts, tables, figures, layouts, and natural text.
This challenge is part of the Emerging Directions in Data for Multimodal Foundation Models (DataMFM) workshop at CVPR 2026, which examines research directions including web-scale to world-scale data, agentic and self-generated data pipelines, and principled data selection and mixture design.
Given a document page image, produce a complete Markdown representation of its content. Following the OmniDocBench evaluation framework, the output is scored along the following dimensions:
- **Text Recognition:** Extract and recognize text from document images, including titles, text blocks, headers, and footers. Evaluated by edit distance; lower indicates better performance.
- **Table Recognition:** Parse table structures and extract cell contents from complex document tables. Evaluated by TEDS on structural and content accuracy.
- **Formula Recognition:** Recognize mathematical formulas and convert them to LaTeX. Evaluated by CDM (Character Detection Matching), which compares rendered formula images for fair assessment across diverse LaTeX representations.
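As an illustration only (not the official scorer), the normalized edit distance used for the text track can be sketched in Python with a standard Levenshtein dynamic program:

```python
def edit_distance(pred: str, ref: str) -> int:
    """Levenshtein distance between two strings, single-row DP."""
    n = len(ref)
    dp = list(range(n + 1))  # dp[j] = distance for prefixes so far
    for i in range(1, len(pred) + 1):
        prev, dp[0] = dp[0], i
        for j in range(1, n + 1):
            cur = dp[j]
            dp[j] = min(dp[j] + 1,        # deletion
                        dp[j - 1] + 1,    # insertion
                        prev + (pred[i - 1] != ref[j - 1]))  # substitution
            prev = cur
    return dp[n]

def normalized_edit_distance(pred: str, ref: str) -> float:
    """Edit distance divided by the longer string length; 0.0 is a perfect match."""
    if not pred and not ref:
        return 0.0
    return edit_distance(pred, ref) / max(len(pred), len(ref))
```

The official evaluation may apply additional normalization (e.g. whitespace handling) before computing the distance; this sketch shows only the core metric.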
Overall Score Formula: Rankings are determined by the Overall score:
Overall = ((1 − Text_Edit_Distance) × 100 + Table_TEDS + Formula_CDM) / 3
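The formula above can be computed as follows. This is a minimal sketch, assuming text edit distance lies in [0, 1] while TEDS and CDM are reported on a 0-100 scale (consistent with the ×100 scaling of the text term); the function name is illustrative:

```python
def overall_score(text_edit_distance: float,
                  table_teds: float,
                  formula_cdm: float) -> float:
    """Combine the three track metrics into the Overall ranking score.

    text_edit_distance: in [0, 1], lower is better.
    table_teds, formula_cdm: in [0, 100], higher is better.
    """
    return ((1 - text_edit_distance) * 100 + table_teds + formula_cdm) / 3

# Example: ED = 0.10, TEDS = 85.0, CDM = 90.0
# → (90.0 + 85.0 + 90.0) / 3 ≈ 88.33
```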
Note: Challenge timeline will be announced soon. See the workshop dates for paper submission deadlines.
Coming Soon: The leaderboard will be available once the challenge officially launches. Stay tuned!
| Rank | Team | Text ED ↓ | Table TEDS ↑ | Formula CDM ↑ | Overall ↑ |
|---|---|---|---|---|---|
| 1 | — | — | — | — | — |
| 2 | — | — | — | — | — |
| 3 | — | — | — | — | — |
The challenge data is based on OmniDocBench and its upcoming extension OmniDocBench-Pro. Download links and detailed documentation will be released when the challenge launches.
Submit predictions as Markdown files — one .md file per document page, packed in a .zip archive:
- Separate paragraphs with blank lines (`\n\n`). Mark headings with `#`.
- Write display formulas in `$$...$$` and inline formulas in `$...$`. Formula content must be LaTeX.
- Write tables in HTML `<table>` format (recommended for merged cells) or as Markdown pipe tables.
- Produce one `.md` file per page, named after the source image (`image_001.jpg` → `image_001.md`).
- Pack all `.md` files into a single `.zip` archive (flat structure, no subdirectories).
- Upload the `.zip` file to EvalAI.

Participation is open to all researchers worldwide, with no team size limit.
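A small helper for packing predictions into the required flat archive could look like this (the directory and archive names are illustrative, not mandated by the challenge):

```python
import zipfile
from pathlib import Path

def pack_predictions(pred_dir: str, out_zip: str = "submission.zip") -> int:
    """Zip all .md files from pred_dir into a flat archive (no subdirectories).

    Returns the number of files packed.
    """
    md_files = sorted(Path(pred_dir).glob("*.md"))
    with zipfile.ZipFile(out_zip, "w", zipfile.ZIP_DEFLATED) as zf:
        for f in md_files:
            # arcname=f.name drops the directory prefix, keeping the zip flat
            zf.write(f, arcname=f.name)
    return len(md_files)
```

Packing with `arcname=f.name` is what keeps the archive free of subdirectories, as the submission format requires.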
Submissions are limited to a maximum of 3 per day; the final evaluation phase allows 2 submissions.
Top teams must submit a technical report, and any external data used must be disclosed.