Invited Speakers

Ranjay Krishna

University of Washington

Ziwei Liu

Nanyang Technological University

Yilun Du

Harvard University

Aishwarya Agrawal

University of Montreal, Mila

Tentative Schedule

June 3, 2026  |  1:00 PM – 6:00 PM  |  Room 111

Time Event
13:15–13:30 Opening
13:30–14:00 Invited Talk 1
14:00–14:15 Oral 1
14:15–14:45 Invited Talk 2
14:45–15:00 Oral 2
15:00–16:00 Poster Session + Coffee Break
16:00–16:30 Invited Talk 3
16:30–16:45 Oral 3
16:45–17:15 Invited Talk 4
17:15–17:30 Oral 4
17:30–17:40 Competition Announcement
17:40–17:50 Challenge Task 1 Winner Talk
17:50–18:00 Challenge Task 2 Winner Talk
18:00–18:10 Concluding Remarks
18:10 Adjourn

DataMFM Challenge

The DataMFM Challenge focuses on multimodal document understanding at the intersection of vision, language, and structured reasoning. It currently covers two complementary components: Document Parsing and Chart Understanding, based on newly prepared challenge datasets built from OmniDocBench and ChartNet.
Scope: Document Parsing + Chart Understanding.
Timeline: Apr 27 Release · May 11 Submission Opening · May 29 Submission Deadline · Jun 03 Workshop.
Challenge Portal: DataMFM Challenge Portal →

Call for Papers

We invite submissions on any topic related to Data for Multimodal Foundation Models (DataMFM), including but not limited to:
  • Data collection, generation, and curation for multimodal foundation models
  • Data quality improvement, filtering, and pruning for scalable and efficient multimodal training
  • Data recipes and mixture design for balancing scale, quality, diversity, and coverage
  • Synthetic–real hybrid datasets and multimodal data augmentation for robust model development
  • Benchmark renewal, creation, and evaluation design for trustworthy multimodal applications
  • Detection and mitigation of dataset contamination in training and evaluation
  • Cross-modal alignment and grounding across text, image, audio, and video modalities
  • Fairness, bias reduction, and inclusive representation in multimodal datasets
  • Data provenance, documentation, licensing, and governance for trustworthy dataset lifecycles
  • Metrics and frameworks for assessing multimodal data quality, diversity, and contamination
  • Bridging modality gaps between text-rich and vision-centric domains
  • Agentic synthetic data generation and self-improving data pipelines driven by multimodal or VLA models
  • Building sustainable, transparent, and community-driven multimodal data ecosystems for next-generation foundation models
Submission Guidelines:
The workshop accepts submissions in three tracks:
(1) Full-length Papers (Archival, Proceedings Track): Up to 8 pages, excluding references; double-blind review; accepted papers will appear in the CVPR 2026 Workshop Proceedings.
(2) Short Papers / Extended Abstracts (Non-archival): Up to 4 pages, excluding references; double-blind review; intended for work-in-progress, datasets, benchmarks, and early-stage ideas.
(3) CVPR 2026 Accepted Papers (Non-archival, Non-anonymous): Papers accepted to the main CVPR 2026 conference; presented at the workshop but not included in the workshop proceedings.
Submission Sites:
Proceedings Track: https://openreview.net/group?id=thecvf.com/CVPR/2026/Workshop/DataMFM_Proceedings_Track
Non-archival Track: https://openreview.net/group?id=thecvf.com/CVPR/2026/Workshop/DataMFM_Non-archival
All submissions should use the CVPR 2026 paper template.

Accepted Papers

Proceedings Track

  • DataMFM-1   VLA-AD: Agentic Vision-Language Foundation Models for Context-Aware Anomaly Detection
  • DataMFM-3   Scalable Parallel Prompting for Complex AV Video Captioning
  • DataMFM-5   Adversarial Feedback from Segmentation Network to Siamese Diffusion for Improving Tumor Segmentation
  • DataMFM-8   AdGaze-3500: Evaluating Large Multimodal Models' Ability to Predict Human Attention to Ads
  • DataMFM-9   TimeCausality: Evaluating the Causal Ability in Time Dimension for Vision Language Models
  • DataMFM-10   Uncertainty-Guided Data Curation for 3D Object Detection
  • DataMFM-11   Longitudinal Multimodal Modeling for Alzheimer's Disease with Pre-trained Brain Latent Diffusion and Mixture-of-Experts Fusion
  • DataMFM-12   Learning Multimodal Priors with Shared Vector Quantization for Incomplete Multimodal Diagnosis
  • DataMFM-13   VLM Reality Check: A Causal Counterfactual Benchmark for Diagnosing Cognitive Biases in Vision-Language Models
  • DataMFM-19   Cinéaste: A Fine-grained Contextual Movie Question Answering Benchmark with Automated Data Curation

Non-archival Track

To be announced.

Poster Printing: Please follow the poster printing instructions at https://cvpr.thecvf.com/Conferences/2026/PosterPrinting

Important Dates

Event Date
Paper submission deadline March 10, 2026 (archival); April 13, 2026 (non-archival)
Notification of acceptance March 25, 2026 (archival); April 21, 2026 (non-archival)
Camera-ready submission deadline April 7, 2026 (archival)
Workshop date June 3, 2026, 1:00 PM – 6:00 PM, Room 111

Challenge Organizers

Xiaolong Luo

Harvard University

Simeng Han

Stanford University

Longtian Ye

2077AI Foundation

Minglai Yang

2077AI Foundation

Henry Zhang

University of California, Berkeley

Liam Liu

2077AI Foundation

Organizers

Pengyuan Li

MIT-IBM Watson AI Lab

Zexue He

Stanford University

Zihan Wang

Abaka AI

Xuan (Ruby) Zhang

2077AI Foundation

Wenhu Chen

University of Waterloo

Manling Li

Northwestern University

Rogerio Feris

MIT-IBM Watson AI Lab

Sponsors