KAROKAN-EU · The European AI Productivity Index · 2026

The European AI Productivity Index

Name: KAROKAN-EU: The European AI Productivity Index
Creator: Karokan

Assesses whether frontier AI models can perform economically valuable professional tasks in European contexts — EU law, multi-country taxation, industrial standards, and cross-border regulatory analysis.


Planned tasks: 1,200+

Professional domains

Languages at launch

Release: 2026

Leaderboard coming 2026

First results will be published alongside the initial release. Contact the research team to participate in the pilot evaluation.

Task categories

Weighted contribution to the overall score

EU Law & Regulation · 22%

Interpretation of directives, regulations, and court decisions across member states

Cross-border Taxation · 18%

VAT, transfer pricing, and multi-jurisdiction compliance tasks

Industrial Standards · 16%

CE marking, EN/ISO compliance, and technical product documentation

Financial Services · 15%

MiFID II, DORA, and Basel III application in European contexts

Public Procurement · 13%

OJEU notices, tender evaluation, and contracting authority obligations

Data & Privacy (GDPR) · 16%

Data subject rights, DPIAs, and cross-border transfer mechanisms

Related research

Published work this benchmark builds on — and the gap it addresses

GDPval: Evaluating AI Model Performance on Real-World Economically Valuable Tasks

OpenAI, 2025

Measures model performance on real deliverables across 44 occupations, but is centred on the US economy. KAROKAN-EU applies the same economically grounded approach to EU professional contexts.

LegalBench: A Collaboratively Built Benchmark for Measuring Legal Reasoning in LLMs

Guha et al., 2023 · arXiv:2308.11462

162 legal reasoning tasks, almost entirely US common-law. No equivalent coverage exists for EU and member-state civil law systems.

MultiEURLEX: A Multi-lingual and Multi-label Legal Document Classification Dataset

Chalkidis et al., 2021 · arXiv:2109.00904

65k EU laws in 23 official EU languages — evidence that EU legal text is a distinct, underserved evaluation domain.

LexGLUE: A Benchmark Dataset for Legal Language Understanding in English

Chalkidis et al., 2022 · arXiv:2110.00976

Includes EUR-Lex tasks but evaluates in English only, leaving multilingual EU legal reasoning unmeasured.

FinBen: A Holistic Financial Benchmark for Large Language Models

Xie et al., 2024 · arXiv:2402.12659

36 datasets across 24 financial tasks — English and US-market centred, motivating an EU financial-regulation counterpart (MiFID II, DORA, Basel III).

About

KAROKAN-EU is designed to measure AI productivity in European professional settings. Unlike English-centric evaluations, it probes model capabilities on tasks that require deep knowledge of EU institutional frameworks, member-state legal systems, and cross-border regulatory complexity. Tasks are authored by verified domain experts — lawyers, tax advisors, policy analysts, and engineers — and independently validated before inclusion.

Methodology

Each task is evaluated in a closed-book, multi-turn setting. Models receive a realistic professional prompt, and expert human raters score the responses on factual accuracy, legal correctness, and contextual completeness. Final scores are aggregated across professional domains as a weighted average, with weights reflecting the distribution of economic activity in the EU.
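To make the aggregation concrete, here is a minimal Python sketch of how an overall score could be computed. It is not the benchmark's actual scoring code. The weights are the category percentages listed under "Task categories". Everything else is an assumption for illustration: the function names, the 0 to 1 rating scale, and the equal weighting of raters and criteria within a task.

```python
from statistics import mean

# Category weights as listed under "Task categories" (percent of overall score).
WEIGHTS = {
    "EU Law & Regulation": 22,
    "Cross-border Taxation": 18,
    "Industrial Standards": 16,
    "Financial Services": 15,
    "Public Procurement": 13,
    "Data & Privacy (GDPR)": 16,
}

# The three rating criteria named in the methodology.
CRITERIA = ("factual_accuracy", "legal_correctness", "contextual_completeness")


def task_score(ratings):
    """Score one task from a list of rater dicts.

    Assumption: each rater scores each criterion on a 0-1 scale, and raters
    and criteria are weighted equally. The source does not specify this.
    """
    return mean(mean(r[c] for c in CRITERIA) for r in ratings)


def overall_score(domain_scores, weights=WEIGHTS):
    """Weighted average of per-domain scores (weights need not sum to 1)."""
    missing = set(weights) - set(domain_scores)
    if missing:
        raise ValueError(f"missing domain scores: {sorted(missing)}")
    total = sum(weights.values())
    return sum(weights[d] * domain_scores[d] for d in weights) / total


if __name__ == "__main__":
    # Hypothetical ratings from two raters for a single task.
    ratings = [
        {"factual_accuracy": 1.0, "legal_correctness": 0.5, "contextual_completeness": 0.0},
        {"factual_accuracy": 1.0, "legal_correctness": 0.5, "contextual_completeness": 0.0},
    ]
    print("task score:", task_score(ratings))

    # Hypothetical per-domain scores (for example, the mean task score per domain).
    domain_scores = {name: 0.5 for name in WEIGHTS}
    print("overall score:", overall_score(domain_scores))
```

Dividing by the sum of the weights keeps the result on the same scale as the domain scores, even if the published percentages are later revised and no longer sum to exactly 100.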

Get involved

Review the benchmark design, submit a model for the pilot evaluation, or collaborate with our research team.


Other benchmarks

KAROKAN-LANG · 2026

European Multilingual Evaluation

A rigorous benchmark evaluating LLM quality beyond English, across all 24 official EU languages in professional and institutional contexts.

KAROKAN-ACT · Q4 2026

AI Act Compliance Benchmark

A benchmark evaluating whether AI systems satisfy EU AI Act requirements: risk classification, documentation, transparency, and human oversight at model and system level.