Introducing Data Agents
Apr 21, 2025
Summary
🚀 Data Agents are AI systems that take your raw data and automatically generate high-quality, complex, and grounded benchmarks according to your instructions and domains.
🔥 You can now generate accurate, reasoning-based AI benchmarks from your own data in minutes.
⚡ With Data Agents, we created 100+ benchmarks and 100,000+ samples using documentation from popular software tools like React, TensorFlow, Kubernetes, and many more. All datasets are now live on RELAI.ai and Hugging Face.
📊 Coming in our next post: Our LLM leaderboard — see how popular models perform across these benchmarks.
The AI Evaluation Challenge
There are new models emerging every day, and countless AI systems built to process information using them. But one key question remains: does the AI system actually work on your data?
Most benchmarks don't evaluate how well AI systems such as RAG or agentic RAG pipelines perform on specific applications and domains, and manual reviews and evaluations can take months.
At RELAI, we’ve tackled this challenge by introducing Data Agents.
What is a Data Agent?
Data Agents are AI systems that take your raw data and automatically generate high-quality, complex, and grounded benchmarks according to your instructions and domains. These benchmarks are designed to evaluate and optimize AI systems on your data.

Figure 1. Data Agents convert your raw data to high-quality, complex, and grounded benchmarks ready to be used in model evaluation and optimization.
Data Agents can create reasoning samples, where your AI system must gather and combine multiple pieces of information through reasoning to produce an answer. The curated samples include grounded reasoning and thinking tokens that can be used to fine-tune your reasoning models.
We had to solve some tough technical problems along the way, such as guaranteeing sample accuracy, generating diverse reasoning scenarios, and designing comprehensive, quantitative evaluations.
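To make the idea of a grounded reasoning sample more concrete, here is a minimal Python sketch. It is not RELAI's actual schema or scoring method: the `ReasoningSample` fields, the `accuracy` helper, and the toy data are all hypothetical, and normalized exact match is only one simple way to score answers.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class ReasoningSample:
    """Hypothetical shape of one benchmark sample (not RELAI's real schema)."""
    question: str
    answer: str
    supporting_passages: List[str]  # source passages the answer is grounded in
    reasoning: str = ""             # optional reasoning trace linking the passages


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so trivial differences are ignored."""
    return " ".join(text.lower().split())


def accuracy(samples: List[ReasoningSample], system: Callable[[str], str]) -> float:
    """Fraction of samples the system answers correctly (normalized exact match)."""
    if not samples:
        return 0.0
    correct = sum(
        normalize(system(s.question)) == normalize(s.answer) for s in samples
    )
    return correct / len(samples)


# Toy data, invented purely for illustration.
samples = [
    ReasoningSample(
        question="Which port does the example service expose?",
        answer="8080",
        supporting_passages=[
            "The example service listens on the port set in config.",
            "The default config sets the port to 8080.",
        ],
        reasoning="Combine both passages: the service uses the config port, which is 8080.",
    ),
    ReasoningSample(
        question="Which file holds the default config?",
        answer="config.yaml",
        supporting_passages=["Defaults are stored in config.yaml."],
    ),
]


def toy_system(question: str) -> str:
    """Stand-in for a real AI system; it only knows one answer."""
    return " 8080 " if "port" in question else "settings.json"


score = accuracy(samples, toy_system)
print(f"accuracy = {score:.2f}")
```

In practice, `toy_system` would be replaced by a call to the RAG pipeline or agent under test, and exact match would likely give way to a more tolerant scoring rule.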
RELAI’s Public Data Agent
To demonstrate the power of Data Agents, we applied them to publicly available documentation (under licenses that allow such use) from 50+ software platforms, including React, TensorFlow, Kubernetes, Django, PyTorch, Node.js, LangChain, Google Cloud, Docker, and scikit-learn. The result: 100+ benchmarks with 100,000+ samples (many involving complex reasoning), all open-access on our platform and also available on Hugging Face.
How Efficient Are Data Agents?
And here's the kicker: all 100+ benchmarks were created in under 2 days by our small team, showcasing just how powerful and efficient Data Agents really are.
We also evaluated various state-of-the-art LLMs developed by OpenAI, Google, X, Anthropic, Meta, DeepSeek and others to see which perform best across these benchmarks—and even uncovered which software tools they “prefer”! We will discuss these results in our next post.
For now, help us spread the word—and sign up to try Data Agents for free!