李理's Blog

Implementing and Optimizing a BPE Tokenizer from Scratch—Part 2: Optimizing the Algorithm

This series of articles implements a subtask of Stanford's CS336 Assignment 1: building an efficient training algorithm for a BPE Tokenizer. Through a series of optimizations, our algorithm's training time on OpenWebText was reduced from over 10 hours to less than 10 minutes. This series explains these optimizations, including algorithmic improvements, data structure enhancements, parallelization with OpenMP, Cython optimization, and implementing key code in C++ along with its integration via Cython. This is the third article, and it optimizes the algorithm from the previous part.




Implementing and Optimizing a BPE Tokenizer from Scratch—Part 1: The Simplest Implementation

This series of articles implements a subtask of Stanford’s CS336 Assignment 1: building an efficient training algorithm for a BPE Tokenizer. Through a series of optimizations, our algorithm’s training time on OpenWebText was reduced from over 10 hours to less than 10 minutes. This series explains these optimizations, including algorithmic improvements, data structure enhancements, parallelization with OpenMP, Cython optimization, and implementing key code in C++ along with its integration via Cython. This is the second article, covering the implementation of the simplest algorithm.
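To make the starting point concrete, here is a minimal sketch of what one step of a naive BPE training loop could look like; the function names and the `(word, frequency)` data layout are illustrative assumptions, not the actual code from this series.

```python
# Illustrative sketch of one naive BPE training step (hypothetical, not the series' code):
# count adjacent symbol pairs across pre-tokenized words, then merge the most frequent pair.
from collections import Counter

def most_frequent_pair(words):
    """words: list of (tuple_of_symbols, frequency); return the most common adjacent pair."""
    counts = Counter()
    for word, freq in words:
        for a, b in zip(word, word[1:]):
            counts[(a, b)] += freq
    return max(counts, key=counts.get)

def merge_pair(words, pair):
    """Replace every occurrence of `pair` with a single merged symbol in every word."""
    merged = pair[0] + pair[1]
    new_words = []
    for word, freq in words:
        out, i = [], 0
        while i < len(word):
            if i + 1 < len(word) and (word[i], word[i + 1]) == pair:
                out.append(merged)
                i += 2
            else:
                out.append(word[i])
                i += 1
        new_words.append((tuple(out), freq))
    return new_words

# Usage: repeat until the vocabulary reaches the desired size.
words = [(("l", "o", "w"), 5), (("l", "o", "w", "e", "r"), 2)]
pair = most_frequent_pair(words)   # ('l', 'o')
words = merge_pair(words, pair)    # [(('lo', 'w'), 5), (('lo', 'w', 'e', 'r'), 2)]
```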




Implementing and Optimizing a BPE Tokenizer from Scratch—Part 0: Introduction

This series of articles implements a subtask of Stanford's CS336 Assignment 1: building an efficient training algorithm for a BPE Tokenizer. Through a series of optimizations, our algorithm's training time on OpenWebText was reduced from over 10 hours to less than 10 minutes. This series explains these optimizations, including algorithmic improvements, data structure enhancements, parallelization with OpenMP, Cython optimization, and implementing key code in C++ along with its integration via Cython. This first article introduces the task and explains how to get the source code and set up the development environment.




Model Optimization


Negative Log-Likelihood and Cross-Entropy


Translation: The Log-Sum-Exp Trick

This article is a translation of The Log-Sum-Exp Trick.

Normalizing a vector of log probabilities is a common task in statistical modeling, but exponentiating large values can cause underflow or overflow. This article discusses the log-sum-exp trick used to solve this problem.
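As a quick illustration of the idea (a minimal NumPy sketch under my own assumptions, not code from the translated article), the trick subtracts the maximum element before exponentiating:

```python
import numpy as np

def log_normalize(log_p):
    """Normalize a vector of unnormalized log probabilities without overflow/underflow."""
    c = np.max(log_p)                                    # largest value; exp(log_p - c) <= 1
    log_sum_exp = c + np.log(np.sum(np.exp(log_p - c)))  # stable log(sum(exp(log_p)))
    return log_p - log_sum_exp                           # normalized log probabilities

# Naively computing np.log(np.exp(x) / np.exp(x).sum()) breaks down for x = [1000., 1001., 1002.]
# because np.exp overflows, while log_normalize(np.array([1000.0, 1001.0, 1002.0])) returns
# finite values whose exponentials sum to 1.
```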


Translation: DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning

This article is a translation of DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning.