Week-by-Week Schedule

Week 9 — Neural Networks from Scratch (Karpathy)

Karpathy "Neural Networks: Zero to Hero" — Episode 1: micrograd Part 1 · Episode 1 continued — code along · Finish micrograd, internalize backprop fully · Karpathy Episode 2: makemore (bigram model) · …

7 daily tasks

Week 10 — Deep Learning Theory Foundation

Activation functions: ReLU, Sigmoid, Tanh, GELU, SiLU · Forward + backward propagation deep dive · Loss functions: MSE, Cross-entropy, BCE · Optimizers: SGD, Momentum, Adam, AdamW · …

7 daily tasks

Week 11 — PyTorch Mastery

Tensors, devices, basic ops · Autograd: requires_grad, .backward(), torch.no_grad() · nn.Module, custom layers · Dataset and DataLoader · …

7 daily tasks

Week 12 — Computer Vision Foundations (CNNs)

What is convolution? Filters, stride, padding · Pooling, receptive field · CNN architectures: LeNet, AlexNet, VGG · ResNet + residual connections (critical) · …

7 daily tasks

Week 13 — NLP Fundamentals + Word Embeddings

Tokenization basics: characters, words, subwords · BPE (Byte-Pair Encoding) — how GPT tokenizes · Karpathy tokenizer Part 2 + finish · Word2Vec, GloVe (legacy but interview-relevant) · …

7 daily tasks

Week 14 — The Transformer Architecture

"Attention Is All You Need" paper — first pass, skim · 3Blue1Brown "But what is a GPT?" + "Attention in transformers, visually explained" · Self-attention math: Q, K, V matrices · Multi-head attention, positional encodings (sinusoidal, RoPE, ALiBi) · …

7 daily tasks

Week 15 — Build GPT from Scratch with Karpathy

Karpathy "Let's build GPT: from scratch" — minutes 0-30 · Continue — minutes 30-60 · Continue — minutes 60-90 · Continue — minutes 90-120 · …

7 daily tasks

Week 16 — HuggingFace Ecosystem + Fine-Tuning Basics

HF Transformers: pipeline, AutoModel, AutoTokenizer · Loading + using pretrained models (BERT, GPT-2) · Tokenizers in HF, padding, attention masks · Datasets library: load, map, filter · …

7 daily tasks

Topics Covered

Every subtopic below is a separate daily task in the roadmap, with hand-picked resources (YouTube videos, docs, papers) for each.

Karpathy "Neural Networks: Zero to Hero" — Episode 1: micrograd Part 1

Episode 1 continued — code along

Finish micrograd, internalize backprop fully

Karpathy Episode 2: makemore (bigram model)

Episode 2 continued

Karpathy Episode 3: MLP (full 3-hour video)

Re-implement micrograd from memory — no peeking

Activation functions: ReLU, Sigmoid, Tanh, GELU, SiLU

Forward + backward propagation deep dive

Loss functions: MSE, Cross-entropy, BCE

Optimizers: SGD, Momentum, Adam, AdamW

Regularization: Dropout, L1/L2, early stopping

Batch norm vs Layer norm (mandatory interview topic)

Weight init (Xavier, He) + vanishing/exploding gradients

Tensors, devices, basic ops

Autograd: requires_grad, .backward(), torch.no_grad()

nn.Module, custom layers

Dataset and DataLoader

Full training loop: zero_grad → forward → loss → backward → step

Save/load models (state_dict), checkpointing, W&B integration

Build MLP for MNIST in PyTorch, log with W&B

What is convolution? Filters, stride, padding

Pooling, receptive field

CNN architectures: LeNet, AlexNet, VGG

ResNet + residual connections (critical)

Transfer learning, fine-tuning pretrained models

Build a CNN in PyTorch for CIFAR-10

Fine-tune pretrained ResNet-18 on custom dataset + read about ViT, CLIP

Tokenization basics: characters, words, subwords

BPE (Byte-Pair Encoding) — how GPT tokenizes

Karpathy tokenizer Part 2 + finish

Word2Vec, GloVe (legacy but interview-relevant)

Modern embeddings: sentence-transformers, BGE, OpenAI embeddings

RNN basics + their limitations (slow sequential training, weak long-range memory) → motivation for Transformers

LSTM + vanishing gradients (interview topic)

"Attention Is All You Need" paper — first pass, skim

3Blue1Brown "But what is a GPT?" + "Attention in transformers, visually explained"

Self-attention math: Q, K, V matrices

Multi-head attention, positional encodings (sinusoidal, RoPE, ALiBi)

Encoder-only (BERT), Decoder-only (GPT), Encoder-Decoder (T5) — when to use which

Re-read "Attention Is All You Need" — careful second pass

KV cache, Pre-LN vs Post-LN (interview topics)

Karpathy "Let's build GPT: from scratch" — minutes 0-30

Continue — minutes 30-60

Continue — minutes 60-90

Continue — minutes 90-120

Re-watch multi-head attention section, pause and code along

Re-implement entire nanoGPT from memory — don't copy his code

Train mini-GPT on custom text (Shakespeare, song lyrics, your favorite book)

HF Transformers: pipeline, AutoModel, AutoTokenizer

Loading + using pretrained models (BERT, GPT-2)

Tokenizers in HF, padding, attention masks

Datasets library: load, map, filter

Trainer API — training on custom dataset

Fine-tune DistilBERT for sentiment classification (IMDB)

Push model to HuggingFace Hub with model card
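The training-loop task above (zero_grad → forward → loss → backward → step) is easier to remember once you have seen the five steps in running code. Below is a minimal sketch in plain Python, assuming a hypothetical scalar autograd class `Value` written only for this illustration. It is not PyTorch's API and not Karpathy's micrograd code, and the toy data, learning rate, and the `train` helper are likewise made up for the example.

```python
class Value:
    """Tiny scalar autograd node (hypothetical, for illustration only)."""

    def __init__(self, data, parents=()):
        self.data = float(data)
        self.grad = 0.0
        self._parents = parents
        self._backward = lambda: None

    def __add__(self, other):
        other = other if isinstance(other, Value) else Value(other)
        out = Value(self.data + other.data, (self, other))

        def _backward():
            # d(a + b)/da = 1 and d(a + b)/db = 1
            self.grad += out.grad
            other.grad += out.grad

        out._backward = _backward
        return out

    def __mul__(self, other):
        other = other if isinstance(other, Value) else Value(other)
        out = Value(self.data * other.data, (self, other))

        def _backward():
            # d(a * b)/da = b and d(a * b)/db = a
            self.grad += other.data * out.grad
            other.grad += self.data * out.grad

        out._backward = _backward
        return out

    def backward(self):
        # Topologically order the graph, then apply the chain rule in reverse.
        order, seen = [], set()

        def visit(node):
            if node not in seen:
                seen.add(node)
                for parent in node._parents:
                    visit(parent)
                order.append(node)

        visit(self)
        self.grad = 1.0
        for node in reversed(order):
            node._backward()


def train(steps=200, lr=0.1):
    """Fit y = w*x + b to toy samples of y = 2x + 1 with plain gradient descent."""
    data = [(-1.0, -1.0), (0.0, 1.0), (1.0, 3.0)]
    w, b = Value(0.0), Value(0.0)
    params = [w, b]
    losses = []
    for _ in range(steps):
        for p in params:                 # 1. zero_grad: clear gradients from the last step
            p.grad = 0.0
        loss = Value(0.0)
        for x, y in data:
            pred = w * x + b             # 2. forward
            err = pred + (-y)
            loss = loss + err * err      # 3. loss: sum of squared errors
        loss = loss * (1.0 / len(data))  #    averaged into a mean squared error
        loss.backward()                  # 4. backward: fills in p.grad for each parameter
        for p in params:                 # 5. step: gradient descent update
            p.data -= lr * p.grad
        losses.append(loss.data)
    return w.data, b.data, losses


if __name__ == "__main__":
    w, b, losses = train()
    print(f"w={w:.4f} b={b:.4f} final loss={losses[-1]:.2e}")
```

Gradients accumulate with `+=` inside each `_backward`, which is why step 1 has to reset them before every pass. PyTorch accumulates gradients the same way, so skipping `zero_grad` there mixes gradients from earlier batches into the current update.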

Deep Learning & PyTorch — Neural Networks from Scratch
