
PANews, April 30: according to a technical report released by DeepSeek, the team proposes a "Visual Primitives" method that addresses the reference gap in multimodal tasks by embedding basic visual units, such as points and boxes, into the reasoning chain. The method is built on the DeepSeek-V4-Flash architecture and keeps image token consumption low through a compressed KV cache. On counting and spatial-reasoning benchmarks, its performance is comparable to GPT-5.4, Claude-Sonnet-4.6, and Gemini-3-Flash on certain dimensions. The team said it will open-source part of the benchmarks and data, with model weights to be released after integration.
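The core idea described above is to ground each step of a reasoning chain with explicit visual units. The report does not specify a serialization format, so the following is only a minimal sketch of what such grounding could look like; the token syntax, the `Point`/`Box` classes, and the `build_reasoning_step` helper are illustrative assumptions, not DeepSeek's actual API:

```python
from dataclasses import dataclass

# Hypothetical encoding of "visual primitives" as inline tokens in a
# reasoning chain. Coordinates are normalized to [0, 1].

@dataclass
class Point:
    x: float
    y: float

    def to_token(self) -> str:
        return f"<point x={self.x:.3f} y={self.y:.3f}>"

@dataclass
class Box:
    x0: float
    y0: float
    x1: float
    y1: float

    def to_token(self) -> str:
        return (f"<box x0={self.x0:.3f} y0={self.y0:.3f} "
                f"x1={self.x1:.3f} y1={self.y1:.3f}>")

def build_reasoning_step(text: str, primitives) -> str:
    """Append grounded visual primitives to one reasoning sentence."""
    refs = " ".join(p.to_token() for p in primitives)
    return f"{text} {refs}"

# Example: a counting step grounded by two point primitives.
step = build_reasoning_step(
    "I see two apples on the table.",
    [Point(0.21, 0.55), Point(0.67, 0.40)],
)
print(step)
```

In a scheme like this, each reasoning step carries a handful of short coordinate tokens rather than re-encoding image patches, which is one plausible way a compressed representation could keep per-step image token cost low.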