Li Li's Blog

Deep Learning Theory and Practice: Advanced Volume

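The first free deep learning book in China! It covers four major areas (audio, vision, language, and reinforcement learning) with accessible theoretical analysis and detailed code walkthroughs. (5/18: added Image-to-Image Translation with Cycle GAN; 6/5: added machine translation; 6/17: added Policy Gradient.)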

The entire book has now been updated!

To repost, please contact the author (fancyerii at gmail dot com)!

Posted by lili on March 14, 2019

Translation: DeepSeek Explained 5: DeepSeek-V3-Base

This post is a translation of DeepSeek Explained 5: DeepSeek-V3-Base.

Posted by lili on June 20, 2025

Translation: DeepSeek Explained 4: Multi-Token Prediction

This post is a translation of DeepSeek Explained 4: Multi-Token Prediction.

Posted by lili on June 20, 2025

Multi-head Latent Attention Code Analysis

This post explains the MLA code.

Posted by lili on June 19, 2025

Translation: DeepSeek-V3 Explained 3: Auxiliary-Loss-Free Load Balancing

This post is a translation of DeepSeek-V3 Explained 3: Auxiliary-Loss-Free Load Balancing.

Posted by lili on June 18, 2025

Translation: DeepSeek-V3 Explained 2: DeepSeekMoE

This post is a translation of DeepSeek-V3 Explained 2: DeepSeekMoE.

Posted by lili on June 18, 2025

RoPE Code Analysis

This post walks through different code implementations of RoPE.

Posted by lili on June 13, 2025

Translation: DeepSeek-V3 Explained 1: Multi-head Latent Attention

This post is a translation of DeepSeek-V3 Explained 1: Multi-head Latent Attention.

Posted by lili on June 1, 2025

Translation: The Llama 3 Herd of Models

This post reads through and analyzes The Llama 3 Herd of Models.

Posted by lili on July 30, 2024

Huggingface Whisper Code Reading (Part 1)

This post reads through and analyzes the Huggingface Whisper code.

Posted by lili on May 31, 2024

CMake + OpenMPI Environment

This post describes how to install OpenMPI without root privileges and how to use it from CMake.

Posted by lili on May 15, 2024

FEATURED TAGS

Artificial Intelligence Deep Learning chatbot PyTorch Java BERT git Programming OCR Wang Zengqi Speech Recognition Kaldi Linux XLNet Sentiment Analysis sentiment analysis Grammatical Error Correction Transformer Tensorflow Huggingface Ubuntu TensorFlow Deep Learning Frameworks Tensor2Tensor Machine Translation WeChat wechat automation selenium webdriver pywinauto CentOS GPU Appium t2t Code Reading Chinese-English Translation WeChat Official Accounts Web Crawler ocr tesseract pytesseract python Default Arguments Positional Arguments VPN JSON Jackson huggingface RoPE PagedAttention vLLM Pre-training LLM CPT weather forecasting graph neural networks qlora quantization transformers cmake pip pipenv conda padding vscode debug source code build deep learning Speech ASR linux pytorch extension Deep Learning DeepSeek Attention MoE

ABOUT ME

Reading papers, writing code.

FRIENDS

Li Li