
PANews, April 23: According to its GitHub page, DeepSeek has open-sourced TileKernels, a high-performance GPU operator library built on TileLang. The library is heavily optimized for training and inference of large language models (LLMs), with operator performance approaching the hardware limits of compute intensity and memory bandwidth.
TileKernels includes operators for MoE routing and FP8/FP4 quantization, along with various fused operators, and is already in use in DeepSeek's internal environment. The library currently supports NVIDIA's SM90 (Hopper) and newer SM100 (Blackwell) architectures and requires CUDA 13.1 or later.