
Alibaba has launched Tongyi DeepResearch, the first fully open-source web agent to match OpenAI’s Deep Research in performance. Unlike most AI releases, it ships with a complete playbook for building powerful research agents. In this article, I cover five things to know about it.
1. It Matches, and Sometimes Beats, Proprietary Rivals

Alibaba released benchmark scores showing that the model achieves 32.9 on Humanity’s Last Exam, 43.4 on BrowseComp, 46.7 on BrowseComp-ZH, and 75 on xbench-DeepSearch. These benchmarks are designed for complex information-seeking tasks, and the results put the model ahead of both existing open-source and proprietary Deep Research agents.
The notable part is that by reaching parity with OpenAI’s Deep Research, Alibaba has shown that high-level research performance is no longer limited to closed systems.
2. A Full-Stack Methodology
The company has also shared a complete methodology for building advanced research agents. It spans Agentic Continual Pre-training (CPT) for developing strong foundations, Supervised Fine-Tuning (SFT) to bootstrap initial reasoning abilities, and Reinforcement Learning (RL) for refining performance.
The showstopper here is AgentFounder, an automated system that continuously synthesizes new training data, creating a self-improving “data flywheel.” For inference, the model works effectively in vanilla ReAct mode without any prompt engineering, while Heavy Mode demonstrates its maximum reasoning and planning capabilities.
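To make the vanilla ReAct setup concrete, here is a minimal sketch of a ReAct-style loop in Python. It is an illustration only: call_llm, the tool names, and the message format are hypothetical placeholders, not Tongyi DeepResearch’s actual interface.

```python
# Minimal ReAct-style agent loop (illustrative sketch; call_llm, search, and
# visit_page are hypothetical stand-ins, not Tongyi DeepResearch's real API).

def call_llm(messages):
    """Placeholder for a chat-completion call that returns the model's next step."""
    raise NotImplementedError

TOOLS = {
    "search": lambda query: f"search results for {query!r}",   # stub web search
    "visit_page": lambda url: f"page content of {url}",         # stub page fetch
}

def react_agent(question, max_steps=10):
    messages = [{"role": "user", "content": question}]
    for _ in range(max_steps):
        step = call_llm(messages)            # model emits a Thought plus an Action or a final answer
        messages.append({"role": "assistant", "content": step["text"]})
        if step.get("final_answer"):         # model decided it has gathered enough evidence
            return step["final_answer"]
        observation = TOOLS[step["tool"]](step["tool_input"])   # run the requested tool
        messages.append({"role": "user", "content": f"Observation: {observation}"})
    return "No answer within step budget."
```

The point of the loop is that the agent alternates reasoning with tool calls and feeds each observation back into the context, which is all the scaffolding vanilla ReAct requires.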
3. Advanced Data Generation That Pushes AI Limits
Alibaba has developed an end-to-end synthetic data engine, which is said to be capable of creating PhD-level research questions without human intervention. This pipeline not only increases question difficulty but also keeps answer verification accurate and minimizes inconsistencies between data structure and reasoning steps.
Furthermore, by relying on large-scale synthetic tasks instead of human-labeled datasets, the team achieved more stable and scalable improvements. This approach effectively breaks through previous limits on AI research agent performance and provides a robust foundation for future development.
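As a rough illustration of what a generate-then-verify synthesis loop can look like, here is a toy sketch; every helper in it (generate_question, verify) is hypothetical and not Alibaba’s actual pipeline.

```python
# Toy sketch of a generate-then-verify data synthesis loop
# (all helpers are hypothetical placeholders, not Alibaba's pipeline).

def generate_question(seed_docs):
    """Placeholder: ask a generator model for a hard question plus a reference answer."""
    raise NotImplementedError

def verify(question, answer, seed_docs):
    """Placeholder: ask a verifier model whether the answer is supported by the seed docs."""
    raise NotImplementedError

def build_dataset(corpus, target_size):
    dataset = []
    for seed_docs in corpus:
        question, answer = generate_question(seed_docs)
        if verify(question, answer, seed_docs):   # keep only pairs the verifier accepts
            dataset.append({"question": question, "answer": answer})
        if len(dataset) >= target_size:
            break
    return dataset
```

The filtering step is what makes the approach scale without human labelers: only question–answer pairs that pass verification ever reach training.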
4. Infrastructure Built for Scale
The company introduced a custom Group Relative Policy Optimization (GRPO) algorithm for reinforcement learning, which stabilizes training and avoids pitfalls like format collapse. They also replaced costly, inconsistent live web APIs with a synthetic training environment, enabling faster iteration and reducing development costs.
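For readers unfamiliar with GRPO, the core idea is to score each sampled rollout against the statistics of its own sampling group rather than against a learned value function. Below is a minimal sketch of that group-relative advantage computation; it shows the standard formulation, not Alibaba’s customized variant.

```python
import statistics

def group_relative_advantages(rewards):
    """GRPO-style advantages: normalize each rollout's reward by the mean and
    standard deviation of its sampling group."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards) or 1.0   # avoid division by zero when all rewards are equal
    return [(r - mean) / std for r in rewards]

# Example: four rollouts sampled for the same research question.
print(group_relative_advantages([0.0, 1.0, 1.0, 0.5]))
```

Because the baseline comes from the group itself, there is no separate critic network to train, which is part of why the method is attractive for long-horizon agent rollouts.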
The team also created a stable tool sandbox to handle failures, retries, and concurrency, preventing errors from disrupting learning. On top of that, Alibaba implemented automated, real-time data curation that adjusts the training set, ensuring both stability and performance gains as the model evolves.
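To illustrate the kind of failure handling such a sandbox needs, here is a small retry-wrapper sketch; the backoff scheme and the error observation it returns are assumptions, not Alibaba’s implementation.

```python
import time

def call_tool_with_retries(tool_fn, *args, max_retries=3, backoff_s=1.0):
    """Call a tool, retrying on failure with linear backoff; on final failure,
    return an error observation instead of crashing the training rollout."""
    for attempt in range(1, max_retries + 1):
        try:
            return tool_fn(*args)
        except Exception as exc:             # in practice, catch narrower tool-specific errors
            if attempt == max_retries:
                return f"[tool error after {attempt} attempts: {exc}]"
            time.sleep(backoff_s * attempt)   # wait longer after each failed attempt
```

Returning an error observation rather than raising keeps a flaky tool call from invalidating an entire RL rollout, which matters when thousands of rollouts run concurrently.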
5. Already Powering Real Applications

Alibaba’s own ecosystem is, unsurprisingly, already using Tongyi DeepResearch, but according to the company it has been adopted beyond it as well. For example, it powers Gaode Mate’s “Xiao Gao,” an AI copilot that generates detailed, multi-day driving tours based on user preferences such as scenic spots and pet-friendly hotels. In legal tech, Tongyi FaRui autonomously performs legal research, retrieves and cross-references case law, and cites statutes with professional accuracy.
By open-sourcing Tongyi DeepResearch and its entire training methodology, Alibaba has made one thing clear: it is challenging proprietary AI leaders and empowering developers worldwide. This release could speed up innovation across industries such as navigation, legal research, and more.