Alibaba Group Holding has developed an artificial intelligence (AI) model that rivals leading models from US peers such as OpenAI and Anthropic in coding capabilities, a sign that Chinese tech firms are running neck and neck with American players in open source models.
Qwen2.5 Coder, the latest open source large language model (LLM) from Alibaba’s cloud computing arm, has matched or surpassed OpenAI’s GPT-4o and Claude 3.5 Sonnet from Amazon.com-backed Anthropic in coding capabilities, according to evaluations that included HumanEval, EvalPlus and Aider, the Qwen team said in a statement on Tuesday. Alibaba owns the South China Morning Post.
In nine out of 12 such evaluations, Qwen2.5 Coder’s flagship variant performed better than GPT-4o and Claude 3.5 Sonnet, according to the statement. Until now, the coding capabilities of open source LLMs have typically lagged behind those of proprietary models from the likes of OpenAI and Anthropic.
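For readers unfamiliar with these benchmarks, the sketch below shows in simplified Python how a HumanEval-style evaluation works: the model is given a function signature and docstring, and its completion is run against hidden unit tests, with pass@1 recording whether the first attempt passes. The stub model call and the toy sorting task are illustrative and are not part of any official benchmark harness.

```python
def generate_completion(prompt: str) -> str:
    """Stand-in for a call to the model under evaluation."""
    # A real run would query the LLM with the prompt; this stub returns a
    # hand-written body so the example is self-contained and runnable.
    return "    return sorted(numbers)\n"


def run_hidden_tests(program: str) -> bool:
    """Execute the candidate program against the task's hidden unit tests."""
    scope = {}
    try:
        exec(program, scope)  # defines sort_numbers inside `scope`
        assert scope["sort_numbers"]([3, 1, 2]) == [1, 2, 3]
        assert scope["sort_numbers"]([]) == []
        return True
    except Exception:
        return False


# The benchmark prompt: a function signature plus docstring to complete.
prompt = (
    "def sort_numbers(numbers):\n"
    '    """Return the list sorted in ascending order."""\n'
)
completion = generate_completion(prompt)
passed = run_hidden_tests(prompt + completion)
print(f"pass@1 for this single task: {1.0 if passed else 0.0}")
```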
At the same time, open source models are increasingly used by researchers and developers for specific AI applications. Facebook parent Meta Platforms said this month that it was making its open source Llama models available to US government agencies to develop defence and national security applications.
According to Meta, open source models matter when it comes to US-China tech rivalry. “Other nations – including China and other competitors of the United States – understand this as well, and are racing to develop their own open source models, investing heavily to leap ahead of the US,” Nick Clegg, the president of global affairs at Meta, wrote in a statement on its website on November 4.
The Alibaba AI coding assistant could enhance efficiency by supporting secure, private and local code development through integration with integrated development environments (IDEs), according to Ahsen Khaliq, who leads machine learning growth at Hugging Face, the machine-learning developer community that hosts various open source models.
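As a rough illustration of what local, private use of such an open source model can look like, the sketch below loads a Qwen2.5 Coder checkpoint from Hugging Face with the transformers library and asks it for a code suggestion. The checkpoint name, prompt and generation settings are assumptions made for the example, and the flagship model needs substantial GPU memory to run locally.

```python
# A rough sketch of local, private use of an open source coding model via
# Hugging Face's transformers library. The checkpoint name is an assumption
# based on the Qwen2.5 Coder release; a model of this size needs a large GPU.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Qwen/Qwen2.5-Coder-32B-Instruct"  # assumed Hugging Face model id

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype="auto", device_map="auto"
)

messages = [
    {"role": "user",
     "content": "Write a Python function that checks whether a string is a palindrome."}
]
# Format the conversation the way the instruct model expects, then generate.
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output_ids = model.generate(input_ids, max_new_tokens=256)
# Print only the newly generated tokens, i.e. the model's code suggestion.
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```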
The developers of Qwen2.5 Coder credited its enhanced performance in part to more comprehensive training data.