Beyond the Leaderboard: Bridging Research and Real-World AI Performance

November 11, 2025

On-Demand Webinar

Duration:

1 hour

Overview

Public benchmarks help us understand progress in AI, but they rarely reflect real-world performance. In this session, SAP’s Chief Data Scientist, Daniel Dahlmeier, breaks down why enterprise teams see such a gap between leaderboard results and deployed model behaviour, and what it takes to evaluate models in a way that represents real-world outcomes.

This webinar covers practical, research-backed techniques to measure accuracy, safety, and reasoning more effectively across LLMs, multimodal models, and agents.

What You’ll Learn

Why benchmark results often fail in production
Static prompts, model-centric evaluation, and narrow test sets can misrepresent true rankings.
How SAP approaches enterprise-grade evaluations
Including domain-specific datasets, multimodal document extraction, prompt optimization, and safety assessments.
The impact of prompt optimization on model rankings
SAP’s research shows that when prompts are optimized per model, leaderboard positions shift dramatically, sometimes flipping top and bottom performers.
The future of safety evaluation & automated red teaming
Why static harmful-prompt benchmarks fall short, and how dynamic red teaming uncovers deeper vulnerabilities.
How enterprises should evaluate agents
From static to dynamic multi-turn evaluation, tool-use assessment, and measuring partial progress to understand where agents fail and why.

Who This Is For

AI leaders and product owners building enterprise AI systems
ML engineers and researchers running LLM evaluations
Teams deploying agents, copilots, or automation workflows
Anyone wanting a practical, realistic view of model performance

Speakers

Daniel Dahlmeier

Chief Data Scientist

SAP

Daniel is a Data Science Manager and the Chief Data Scientist at the SAP Business AI unit, where his team develops deep-learning models for document processing and benchmarking. He also serves as an adjunct assistant professor at the Singapore University of Technology and Design and teaches at Heidelberg University.

Si Chen

VP, Strategy & Marketing

Appen

Si leads strategy and marketing at Appen and brings extensive experience across traditional AI/ML models, generative AI, multimodal AI systems, and intelligent robotics. Prior to Appen, she held leadership positions at Tencent AI & Robotics Lab and AWS China, driving innovation and partnerships in applied AI.