Chinese artificial intelligence (AI) start-up DeepSeek has introduced a novel approach to improving the reasoning capabilities of large language models (LLMs), as the public awaits the release of the company’s next-generation model.
In collaboration with researchers from Tsinghua University, DeepSeek developed a technique that combines two methods, generative reward modelling (GRM) and self-principled critique tuning, according to a paper published on Friday. The dual approach aims to enable LLMs to deliver better and faster answers to general queries.
The resulting DeepSeek-GRM models outperformed existing methods, "achiev[ing] competitive performance" with strong public reward models, the researchers wrote. Reward modelling is the process of steering an LLM towards human preferences.
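To make the reward-modelling idea concrete, here is a minimal, hypothetical sketch in Python. It is not DeepSeek's actual method: the `Judgement` class, the two scoring principles, and the function names are all invented for illustration. The point it shows is that a generative reward model produces a written critique alongside a numeric score, and that the scores can be used to rank candidate responses towards human preferences.

```python
from dataclasses import dataclass

@dataclass
class Judgement:
    critique: str  # textual rationale, the "generative" part of a GRM
    score: float   # numeric preference signal

def generative_reward_model(prompt: str, response: str) -> Judgement:
    # Toy stand-in: a real GRM is itself an LLM that writes the critique.
    # Two invented principles score the response here.
    words = prompt.lower().split()
    relevance = 1.0 if any(w in response.lower() for w in words) else 0.0
    brevity = 1.0 if len(response.split()) <= 50 else 0.5
    score = 0.5 * relevance + 0.5 * brevity
    critique = f"relevance={relevance:.1f}, brevity={brevity:.1f}"
    return Judgement(critique=critique, score=score)

def rank_responses(prompt: str, responses: list[str]) -> list[str]:
    # Reward modelling in use: order candidate responses by score,
    # so the preferred answer can be surfaced or used for training.
    return sorted(
        responses,
        key=lambda r: generative_reward_model(prompt, r).score,
        reverse=True,
    )
```

In a real system, both the critique and the score would come from a trained model rather than hand-written rules; the sketch only illustrates the shape of the interface.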
DeepSeek intends to make the GRM models open source, the researchers wrote, but did not give a timeline.
The academic paper, published on the online scientific paper repository arXiv, comes amid speculation about the start-up’s next move following the global attention garnered by the firm’s V3 foundation model and R1 reasoning model.
Reuters reported last month that DeepSeek-R2, the successor to R1, could be released as soon as this month, as the company rushes to capitalise on its rising profile. The release of DeepSeek-R1 rocked the global tech community with its cost-efficient performance that rivalled leading models.
DeepSeek has remained tight-lipped about the rumoured R2 release. It has not commented on the matter through official public channels, but a customer service account denied the report in a group chat with business clients, Chinese media outlets reported last month.