DataMFM

Multimodal Document Understanding


Overview

The DataMFM Challenge focuses on multimodal document understanding, a core challenge at the intersection of vision, language, and structured reasoning. Building on OmniDocBench and its upcoming extension OmniDocBench-Pro, the challenge provides a unified evaluation framework for document-centric multimodal tasks involving charts, tables, figures, layouts, and natural text.

This challenge is part of the Emerging Directions in Data for Multimodal Foundation Models (DataMFM) workshop at CVPR 2026, which examines research directions including web-scale to world-scale data, agentic and self-generated data pipelines, and principled data selection and mixture design.

End-to-End Document Parsing

Given a document page image, produce a complete Markdown representation. The evaluation decomposes the output into three dimensions, following the OmniDocBench evaluation framework:

1. Text Recognition (Edit Distance ↓): Extract and recognize text from document images, including titles, text blocks, headers, and footers. Lower edit distance indicates better performance.

2. Table Recognition (TEDS ↑): Parse table structures and extract cell contents from complex document tables. Evaluated on structural and content accuracy.

3. Formula Recognition (CDM ↑): Recognize mathematical formulas and convert them to LaTeX. Evaluated by CDM (Character Detection Matching), which compares rendered formula images for fair assessment across diverse LaTeX representations.
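The text metric is a normalized edit distance. A minimal sketch of how such a metric is typically computed (illustrative only, not the official evaluation code):

```python
# Normalized character-level edit distance, the basis of the
# Text Recognition metric (lower is better).

def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance via dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]

def normalized_edit_distance(pred: str, ref: str) -> float:
    """Edit distance divided by the longer string's length, in [0, 1]."""
    if not pred and not ref:
        return 0.0
    return edit_distance(pred, ref) / max(len(pred), len(ref))
```

An identical prediction scores 0.0; a completely wrong one scores 1.0.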

Overall Score: Final rankings are determined by the Overall score:

Overall = ((1 − Text_Edit_Distance) × 100 + Table_TEDS + Formula_CDM) / 3
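The formula above can be written directly in code. A small sketch, assuming the edit distance is in [0, 1] while TEDS and CDM are percentages in [0, 100]:

```python
# Overall score per the challenge formula: average of three
# per-dimension scores, each on a 0-100 scale.

def overall_score(text_edit_distance: float,
                  table_teds: float,
                  formula_cdm: float) -> float:
    return ((1 - text_edit_distance) * 100 + table_teds + formula_cdm) / 3
```

For example, an edit distance of 0.1, TEDS of 85, and CDM of 75 gives (90 + 85 + 75) / 3 ≈ 83.33.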

Timeline

Note: Challenge timeline will be announced soon. See the workshop dates for paper submission deadlines.

TBD
Challenge Announcement
TBD
Data Release
TBD
Submission Deadline
June 2026
Workshop @ CVPR

Leaderboard

Coming Soon: The leaderboard will be available once the challenge officially launches. Stay tuned!

Leaderboard Preview

| Rank | Team | Text ED ↓ | Table TEDS ↑ | Formula CDM ↑ | Overall ↑ |
|------|------|-----------|--------------|---------------|-----------|
| 1    |      |           |              |               |           |
| 2    |      |           |              |               |           |
| 3    |      |           |              |               |           |

Dataset

The challenge data is based on OmniDocBench and its upcoming extension OmniDocBench-Pro. Download links and detailed documentation will be released when the challenge launches.

Submission

Submission Format

Submit predictions as Markdown files — one .md file per document page, packed in a .zip archive:

submission.zip
├── document_page_001.md   ← matches document_page_001.jpg
├── document_page_002.md
└── ...

─── Example .md content ───

# Section Title

Body text paragraph with standard Markdown formatting. Separate paragraphs with double newlines.

$$
\frac{\partial L}{\partial \theta} = \sum_{i=1}^{N} \nabla_\theta \ell(f(x_i), y_i)
$$

<table>
  <tr>
    <th>Model</th><th>Accuracy</th>
  </tr>
  <tr>
    <td>Baseline</td><td>82.3</td>
  </tr>
</table>
Format Rules:
Text: Standard Markdown. Paragraphs separated by double newlines (\n\n). Headings with #.
Formulas: Display formulas in $$...$$, inline in $...$. Content must be LaTeX.
Tables: HTML <table> format (recommended for merged cells) or Markdown pipe tables.
Order: Elements must appear in reading order — the eval script uses element position in the file as the predicted reading order.
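Since reading order is taken from element position in the file, it helps to know how a prediction splits into ordered elements. A minimal sketch (not the official eval script), assuming blocks are separated by double newlines as the rules above require:

```python
import re

# Split a Markdown prediction into blocks; file order is treated
# as the predicted reading order.

def split_elements(md: str) -> list[str]:
    """Return non-empty blocks separated by blank lines, in file order."""
    return [block.strip() for block in re.split(r"\n\s*\n", md) if block.strip()]
```

A heading, a paragraph, and a display formula in one file thus yield three ordered elements.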

How to Submit

  1. Run your model on all document page images to produce one .md file per page
  2. Name each file to match the image filename (e.g., image_001.jpg → image_001.md)
  3. Pack all .md files into a single .zip archive (flat structure, no subdirectories)
  4. Upload the .zip file to EvalAI
Submit on EvalAI
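Step 3 above can be sketched with the standard library. A hedged example, where the prediction directory and output path are placeholders:

```python
import zipfile
from pathlib import Path

# Pack all .md predictions into a flat zip archive (no subdirectories),
# as required by the submission format.

def pack_submission(pred_dir: str, out_zip: str) -> int:
    """Zip every .md file in pred_dir; returns the file count."""
    md_files = sorted(Path(pred_dir).glob("*.md"))
    with zipfile.ZipFile(out_zip, "w", zipfile.ZIP_DEFLATED) as zf:
        for md in md_files:
            zf.write(md, arcname=md.name)  # arcname=name keeps structure flat
    return len(md_files)
```

Passing only the filename as `arcname` ensures no directory prefixes leak into the archive.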

Rules

Eligibility

Open to all researchers worldwide. No team size limit.

Submissions

Max 3 per day. Final evaluation allows 2 submissions.

Requirements

Top teams must submit a technical report. Any external data used must be disclosed.