Li Li's Blog

Negative Log-Likelihood and Cross-Entropy


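Translation: The Log-Sum-Exp Trick

This post is a translation of The Log-Sum-Exp Trick.

Normalizing a vector of log probabilities is a common task in statistical modeling, but exponentiating values of large magnitude can cause underflow or overflow. This post discusses the log-sum-exp trick, which resolves the problem.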
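
As a rough illustration of the trick summarized above, here is a minimal sketch in plain Python. The function names `log_sum_exp` and `normalize_log_probs` are hypothetical and chosen for this example only; they are not taken from the translated article. The idea is to subtract the maximum before exponentiating, so that the largest exponent is exactly 0 and the sum can neither overflow nor collapse to zero.

```python
import math

def log_sum_exp(log_values):
    """Return log(sum(exp(x))) without overflow or underflow (illustrative helper)."""
    m = max(log_values)
    if math.isinf(m):
        # All entries are -inf (the sum is 0) or some entry is +inf.
        return m
    # After shifting by m, every exponent is <= 0 and the largest term is exp(0) = 1.
    return m + math.log(sum(math.exp(x - m) for x in log_values))

def normalize_log_probs(log_values):
    """Turn unnormalized log probabilities into probabilities that sum to 1."""
    lse = log_sum_exp(log_values)
    return [math.exp(x - lse) for x in log_values]

# Naively, math.exp(-1000.0) underflows to 0.0, so the normalizer would be 0.
log_p = [-1000.0, -1001.0, -1002.0]
probs = normalize_log_probs(log_p)
```

Normalizing in log space first (`x - lse`) and exponentiating only at the end keeps every intermediate value in a representable range, which is why the shifted form tends to be preferred over dividing by a directly computed sum.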


Translation: DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning

This post is a translation of DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning.


Translation: YaRN: Efficient Context Window Extension of Large Language Models

This post is a translation of YaRN: Efficient Context Window Extension of Large Language Models.


Translation: DeepSeek-R1: Advancing LLM Reasoning with Reinforcement Learning

This post is a translation of DeepSeek-R1: Advancing LLM Reasoning with Reinforcement Learning.


Translation: DeepSeek Explained 6: All you need to know about Reinforcement Learning in LLM training

This post is a translation of DeepSeek Explained 6: All you need to know about Reinforcement Learning in LLM training.


Translation: DeepSeek Explained 5: DeepSeek-V3-Base

This post is a translation of DeepSeek Explained 5: DeepSeek-V3-Base.


Translation: DeepSeek Explained 4: Multi-Token Prediction

This post is a translation of DeepSeek Explained 4: Multi-Token Prediction.


Multi-head Latent Attention Code Analysis

This post explains the code for MLA (Multi-head Latent Attention).


Translation: DeepSeek-V3 Explained 3: Auxiliary-Loss-Free Load Balancing

This post is a translation of DeepSeek-V3 Explained 3: Auxiliary-Loss-Free Load Balancing.