Beyond the Leaderboard: Bridging Research and Real-World AI Performance

November 11, 2025
On-Demand Webinar
Duration: 1 hour

Overview

Public benchmarks help us understand progress in AI, but they rarely reflect real-world performance. In this session, SAP’s Chief Data Scientist, Daniel Dahlmeier, breaks down why enterprise teams see such a gap between leaderboard results and deployed model behaviour, and what it takes to evaluate models in a way that truly represents real-world outcomes.

This webinar covers practical, research-backed techniques to measure accuracy, safety, and reasoning more effectively across LLMs, multimodal models, and agents.

What You’ll Learn

  • Why benchmark results often fail in production
    Static prompts, model-centric evaluation, and narrow test sets can misrepresent true rankings.
  • How SAP approaches enterprise-grade evaluations
    Including domain-specific datasets, multimodal document extraction, prompt optimization, and safety assessments.
  • The impact of prompt optimization on model rankings
    SAP’s research shows that when prompts are optimized per model, leaderboard positions shift dramatically, sometimes flipping top and bottom performers.
  • The future of safety evaluation & automated red teaming
    Why static harmful-prompt benchmarks fall short, and how dynamic red teaming uncovers deeper vulnerabilities.
  • How enterprises should evaluate agents
    From static to dynamic multi-turn evaluation, tool-use assessment, and measuring partial progress to understand where agents fail and why.

Who This Is For

  • AI leaders and product owners building enterprise AI systems
  • ML engineers and researchers running LLM evaluations
  • Teams deploying agents, copilots, or automation workflows
  • Anyone wanting a practical, realistic view of model performance

Speakers

Daniel Dahlmeier
Chief Data Scientist
SAP

Daniel is a Data Science Manager and the Chief Data Scientist at the SAP Business AI unit, where his team develops deep-learning models for document processing and benchmarking. He also serves as an adjunct assistant professor at the Singapore University of Technology and Design and teaches at Heidelberg University.

Si Chen
VP, Strategy & Marketing
Appen

Si leads strategy and marketing at Appen and brings extensive experience across traditional AI/ML models, generative AI, multimodal AI systems, and intelligent robotics. Prior to Appen, she held leadership positions at Tencent AI & Robotics Lab and AWS China, driving innovation and partnerships in applied AI.