DeepSeek unveils new AI reasoning method as anticipation for its next-gen model rises

By South China Morning Post | Created at 2025-04-05 23:36:18 | Updated at 2025-04-06 18:48:40

Chinese artificial intelligence (AI) start-up DeepSeek has introduced a novel approach to improving the reasoning capabilities of large language models (LLMs), as the public awaits the release of the company’s next-generation model.

In collaboration with researchers from Tsinghua University, DeepSeek developed a technique that combines generative reward modelling (GRM) with self-principled critique tuning, according to a paper published on Friday. The dual approach aims to enable LLMs to deliver better and faster results to general queries.

The resulting DeepSeek-GRM models outperformed existing methods, having “achieved competitive performance” with strong public reward models, the researchers wrote. Reward modelling is a process that guides an LLM towards human preferences.
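To make the reward-modelling idea concrete, the toy sketch below scores candidate responses and picks the highest-rated one. The scoring rule here is a made-up stand-in for illustration only; it is not DeepSeek's GRM method or any detail from the paper.

```python
# Toy illustration of reward-guided selection: a reward function rates
# candidate LLM responses, and the highest-scoring one is chosen.
# The reward function below is hypothetical, not DeepSeek's actual model.

def toy_reward(query: str, response: str) -> float:
    """Hypothetical reward: favour responses that share words with the
    query, with a mild penalty for length."""
    overlap = len(set(query.lower().split()) & set(response.lower().split()))
    brevity_penalty = len(response.split()) / 100.0
    return overlap - brevity_penalty

def best_response(query: str, candidates: list[str]) -> str:
    """Reward-guided selection: return the highest-scoring candidate."""
    return max(candidates, key=lambda r: toy_reward(query, r))

query = "How do plants make food?"
candidates = [
    "Plants make food through photosynthesis, using sunlight.",
    "The weather today is sunny.",
]
print(best_response(query, candidates))
```

A real reward model replaces `toy_reward` with a trained network that predicts human preference scores; in generative reward modelling, the model also produces a written critique rather than only a scalar score.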

DeepSeek intends to make the GRM models open source, the researchers said, though they did not give a timeline.

The academic paper, published on the online scientific paper repository arXiv, comes amid speculation about the start-up’s next move following the global attention garnered by the firm’s V3 foundation model and R1 reasoning model.

Reuters reported last month that DeepSeek-R2, the successor to R1, could be released as soon as this month, as the company rushes to capitalise on its rising profile. The release of DeepSeek-R1 rocked the global tech community with its cost-efficient performance that rivalled leading models.

DeepSeek has remained tight-lipped about the rumoured R2 release. It has not commented on the matter through official public channels, but a customer service account denied the report in a group chat with business clients, Chinese media outlets reported last month.
