The current curriculum covers topics ranging from classical NLP to modern LLM training, scaling, and deployment:
- Foundations: text preprocessing, tokenization (BPE / WordPiece / Unigram), word representations (TF-IDF, Word2Vec, GloVe)
- Sequence Models: n-gram & neural LMs, RNN/LSTM, seq2seq, attention, Transformer
- Pre-trained LMs & LLMs: BERT / GPT / T5, transfer learning, prompting, CLM/MLM pre-training, scaling laws
- Modern LLM Architecture: RoPE / ALiBi, KV cache, MHA → MQA / GQA / MLA, RMSNorm, SwiGLU
- Training at Scale: mixed precision, ZeRO / FSDP, 5D parallelism, Mixture of Experts
- Efficient Inference: quantization (GPTQ, AWQ, INT8/4, FP8), distillation, speculative decoding, PagedAttention
- Applied LLMs: Information Retrieval, RAG, AI agents (ReAct, tool use, memory, MCP)
- Post-training: alignment, RLHF, DPO
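As a small taste of the Foundations block, here is a minimal sketch of one Byte-Pair Encoding (BPE) merge step, using only the standard library. This is an illustrative toy, not the course's reference implementation; the corpus and symbol frequencies are invented.

```python
# Toy illustration of one BPE merge step (invented corpus, not course code).
from collections import Counter

def most_frequent_pair(words):
    """Count adjacent symbol pairs over a corpus; `words` maps a
    symbol tuple to its frequency. Returns the most frequent pair."""
    pairs = Counter()
    for word, freq in words.items():
        for a, b in zip(word, word[1:]):
            pairs[(a, b)] += freq
    return max(pairs, key=pairs.get)

def merge_pair(words, pair):
    """Replace every occurrence of `pair` with a single merged symbol."""
    merged = {}
    for word, freq in words.items():
        out, i = [], 0
        while i < len(word):
            if i + 1 < len(word) and (word[i], word[i + 1]) == pair:
                out.append(word[i] + word[i + 1])  # fuse the two symbols
                i += 2
            else:
                out.append(word[i])
                i += 1
        merged[tuple(out)] = freq
    return merged

corpus = {("l", "o", "w"): 5, ("l", "o", "w", "e", "r"): 2, ("l", "o", "s", "t"): 1}
pair = most_frequent_pair(corpus)   # ("l", "o") occurs 8 times
corpus = merge_pair(corpus, pair)   # words now start with the symbol "lo"
```

Real tokenizers (as covered in Week 1) repeat this merge loop until a target vocabulary size is reached.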
- German Gritsai @grgera
- Anastasiia Vozniuk @natriistorm
- Ildar Khabutdinov @depinwhite
| Week # | Date | Topic | Lecture | Seminar | Additional | Recording |
|---|---|---|---|---|---|---|
| 1 | February 10 | Intro to NLP & Tokenization | slides | ipynb | materials | TBA |
| 2 | February 17 | Feature Extraction and Word Representations | slides | ipynb | materials | TBA |
| 3 | February 24 | Language Modeling, Seq2Seq, Attention | slides | - | materials | TBA |
| 4 | March 3 | Transfer Learning, BERT-like, LLMs | slides | ipynb | materials | YouTube |
| 5 | March 10 | Large Language Models Pre-Training | slides | - | materials | YouTube |
| 6 | March 17 | Modern LLMs evolution beyond the Transformer | slides | ipynb | materials | YouTube |
| 7 | March 31 | Training Large Language Models | slides | - | materials | TBA |
| 8 | April 7 | 5D Parallelism, Mixture of Experts | slides | - | materials | TBA |
| 9 | April 14 | Efficient Inference Techniques and Methods | slides | - | materials | TBA |
| 10 | April 28 | Information Retrieval & RAG | slides | - | materials | TBA |
| 11 | May 5 | AI Agents | slides | ipynb | materials | TBA |
| Task # | Release | Deadline | Inside | Materials |
|---|---|---|---|---|
| 1 | March 10 | March 19 - 23:59 | Explore NLP pipeline | ipynb |
| 2 | April 19 | May 3 - 23:59 | Models Fine-Tuning | ipynb |
| 3 | May 9 | May 17 - 23:59 | AI agent systems | ipynb |
Final mark = 0.3 × (oral answer grade) + 0.7 × (average score for practical assignments)
Both the oral exam and the homework assignments are blocking: you must pass both parts to pass the course.
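As a worked example of the grading formula, here is a short sketch; the function name and the sample grades are invented for illustration.

```python
# Worked example of the grading formula above; grades are made up.
def final_mark(oral_grade, homework_scores):
    """Final mark = 0.3 * oral answer grade + 0.7 * average homework score."""
    avg_hw = sum(homework_scores) / len(homework_scores)
    return round(0.3 * oral_grade + 0.7 * avg_hw, 2)

# Oral grade 8, homework scores 9, 7, 8 (average 8):
print(final_mark(8, [9, 7, 8]))  # 0.3 * 8 + 0.7 * 8 = 8.0
```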
- Probability Theory + Statistics
- Machine Learning
- Python (Python guide)
- Basic knowledge of NLP
We expect students to know the basics of Natural Language Processing, as the course focuses on more advanced topics. If you are unsure about the basics, we recommend reading these lectures / materials:
