Role Directory
Public Karokan opportunities grouped under Remote.
EUR 100-180 per hour
Design adversarial suites for agentic workflows, tool misuse, multilingual jailbreaks, and instruction hijacking. Work directly with safety researchers on frontier models.
EUR 80-140 per hour
Create and validate professional-language benchmark tasks in languages other than English, covering public and enterprise workflows. Contribute to the KAROKAN-LANG evaluation suite.
EUR 90-160 per hour
Design verifier-backed reasoning tasks for advanced STEM evaluation and post-training refinement of frontier models.
EUR 130-210 per hour
Author and review training data for pharmacovigilance, drug labeling, biomedical QA, and expert reasoning in clinical contexts.
EUR 90-150 per hour
Evaluate outputs on grid resilience, energy forecasting, infrastructure optimization, and scientific reliability for climate AI models.
EUR 70-110 per hour
Review Portuguese instruction following, reasoning accuracy, and localization fidelity for generalist frontier systems.
EUR 80-140 per hour
Design and run end-to-end evaluation pipelines for deployed LLM products, covering instruction-tuning quality, regression testing, and performance benchmarking.
EUR 110-190 per hour
Produce high-quality synthetic legal data and contract-analysis annotations for NLP models targeting EU cross-border commercial law.