The Complete AI Engineer Roadmap
Phase 2 of 5

Deep Learning + PyTorch

Master neural networks, get fluent in PyTorch, build GPT from scratch, and learn basic computer vision.

Weeks 9-16 · Months 3-4 · ~112 hours · 8 weeks

Week-by-Week Schedule

Week 9 — Neural Networks from Scratch (Karpathy)

Karpathy "Neural Networks: Zero to Hero" — Episode 1: micrograd Part 1 · Episode 1 continued — code along · Finish micrograd, internalize backprop fully · Karpathy Episode 2: makemore (bigram model) · …

7 daily tasks
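
To make the Week 9 goal of internalizing backprop concrete, here is a minimal sketch of a scalar autograd engine in the spirit of micrograd. It is not Karpathy's code: the class name `Value`, its attributes, and the example numbers are illustrative assumptions, and only addition, multiplication, and tanh are covered.

```python
import math


class Value:
    """A scalar that remembers how it was computed, so gradients can flow back."""

    def __init__(self, data, parents=()):
        self.data = float(data)
        self.grad = 0.0
        self._parents = parents
        self._backward = lambda: None  # replaced by the op that creates this node

    def __add__(self, other):
        other = other if isinstance(other, Value) else Value(other)
        out = Value(self.data + other.data, (self, other))

        def _backward():
            # d(a + b)/da = 1 and d(a + b)/db = 1
            self.grad += out.grad
            other.grad += out.grad

        out._backward = _backward
        return out

    def __mul__(self, other):
        other = other if isinstance(other, Value) else Value(other)
        out = Value(self.data * other.data, (self, other))

        def _backward():
            # d(a * b)/da = b and d(a * b)/db = a
            self.grad += other.data * out.grad
            other.grad += self.data * out.grad

        out._backward = _backward
        return out

    def tanh(self):
        t = math.tanh(self.data)
        out = Value(t, (self,))

        def _backward():
            # d tanh(x)/dx = 1 - tanh(x)^2
            self.grad += (1.0 - t * t) * out.grad

        out._backward = _backward
        return out

    def backward(self):
        # Topologically sort the graph, then apply the chain rule from the
        # output back to the inputs so each node's grad is complete before use.
        order, seen = [], set()

        def visit(node):
            if node not in seen:
                seen.add(node)
                for parent in node._parents:
                    visit(parent)
                order.append(node)

        visit(self)
        self.grad = 1.0  # d(out)/d(out)
        for node in reversed(order):
            node._backward()


# Forward pass of a single neuron: y = tanh(w * x + b)
x, w, b = Value(2.0), Value(-0.5), Value(0.25)
y = (w * x + b).tanh()
y.backward()  # fills in x.grad, w.grad, b.grad
```

After `y.backward()`, `w.grad` holds dy/dw = (1 - tanh²(w·x + b)) · x, which is what a hand derivation with the chain rule gives. Gradients accumulate with `+=` so that a value used in more than one place receives the sum of its contributions.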

Week 10 — Deep Learning Theory Foundation

Activation functions: ReLU, Sigmoid, Tanh, GELU, SiLU · Forward + backward propagation deep dive · Loss functions: MSE, Cross-entropy, BCE · Optimizers: SGD, Momentum, Adam, AdamW · …

7 daily tasks
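
As one worked example for the loss-function topic, the sketch below computes softmax cross-entropy and its gradient with respect to the logits in plain Python. The function names and the sample logits are illustrative assumptions; a framework would normally provide a fused, batched version of this.

```python
import math


def softmax(logits):
    """Numerically stable softmax: subtract the max before exponentiating."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(logits, target):
    """Negative log-probability assigned to the correct class index."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[target]


def cross_entropy_grad(logits, target):
    """Gradient of the loss w.r.t. the logits: softmax(logits) - one_hot(target)."""
    probs = softmax(logits)
    return [p - (1.0 if i == target else 0.0) for i, p in enumerate(probs)]


logits = [2.0, 1.0, 0.1]
loss = cross_entropy(logits, target=0)
grad = cross_entropy_grad(logits, target=0)
```

The gradient is negative for the correct class and positive for the others, so a gradient-descent step raises the correct logit and lowers the rest. Its entries always sum to zero because the softmax probabilities sum to one.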

Week 11 — PyTorch Mastery

Tensors, devices, basic ops · Autograd: requires_grad, .backward(), torch.no_grad() · nn.Module, custom layers · Dataset and DataLoader · …

7 daily tasks

Week 12 — Computer Vision Foundations (CNNs)

What is convolution? Filters, stride, padding · Pooling, receptive field · CNN architectures: LeNet, AlexNet, VGG · ResNet + residual connections (critical) · …

7 daily tasks
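
The filter, stride, and padding ideas from Week 12 can be sketched without any framework. The code below is a minimal single-channel example under stated assumptions: a square kernel, zero padding, and hypothetical helper names (`conv_output_size`, `conv2d`). Real layers add channels, batches, and a bias term.

```python
def conv_output_size(n, kernel, stride=1, padding=0):
    """Output length along one axis: floor((n + 2p - k) / s) + 1."""
    return (n + 2 * padding - kernel) // stride + 1


def conv2d(image, kernel, stride=1, padding=0):
    """Single-channel 2D cross-correlation (what deep learning calls convolution)."""
    k = len(kernel)  # assumes a square k x k kernel
    h, w = len(image), len(image[0])
    # Zero-pad the image on all four sides.
    padded = [[0.0] * (w + 2 * padding) for _ in range(h + 2 * padding)]
    for r in range(h):
        for c in range(w):
            padded[r + padding][c + padding] = image[r][c]
    out_h = conv_output_size(h, k, stride, padding)
    out_w = conv_output_size(w, k, stride, padding)
    out = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            # Slide the filter: multiply elementwise and sum over the k x k window.
            acc = 0.0
            for a in range(k):
                for b in range(k):
                    acc += padded[i * stride + a][j * stride + b] * kernel[a][b]
            row.append(acc)
        out.append(row)
    return out


image = [[1, 2, 3, 4],
         [5, 6, 7, 8],
         [9, 10, 11, 12],
         [13, 14, 15, 16]]
box = [[1, 1], [1, 1]]  # 2x2 filter that sums its window
result = conv2d(image, box, stride=2)  # [[14.0, 22.0], [46.0, 54.0]]
```

The size formula explains two common patterns: a 3x3 filter with padding 1 and stride 1 keeps the spatial size unchanged, while stride 2 roughly halves it.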

Week 13 — NLP Fundamentals + Word Embeddings

Tokenization basics: characters, words, subwords · BPE (Byte-Pair Encoding) — how GPT tokenizes · Karpathy tokenizer Part 2 + finish · Word2Vec, GloVe (legacy but interview-relevant) · …

7 daily tasks
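
For the BPE topic, a toy byte-level trainer shows the core loop: count adjacent pairs, merge the most frequent one into a new token id, repeat. This is a simplified sketch, not a production GPT tokenizer (which adds steps such as pre-splitting text before merging). The function names and the sample string are assumptions made for illustration.

```python
from collections import Counter


def most_frequent_pair(ids):
    """Return the adjacent pair of token ids that occurs most often."""
    pairs = Counter(zip(ids, ids[1:]))
    return max(pairs, key=pairs.get)


def merge(ids, pair, new_id):
    """Replace every non-overlapping occurrence of `pair` with `new_id`."""
    out, i = [], 0
    while i < len(ids):
        if i + 1 < len(ids) and (ids[i], ids[i + 1]) == pair:
            out.append(new_id)
            i += 2
        else:
            out.append(ids[i])
            i += 1
    return out


def train_bpe(text, num_merges):
    """Learn up to `num_merges` merge rules over the UTF-8 bytes of `text`."""
    ids = list(text.encode("utf-8"))
    merges = {}
    for step in range(num_merges):
        if len(ids) < 2:
            break
        pair = most_frequent_pair(ids)
        new_id = 256 + step  # ids 0..255 are reserved for raw bytes
        ids = merge(ids, pair, new_id)
        merges[pair] = new_id
    return ids, merges


def decode(ids, merges):
    """Undo the merges to recover the original text."""
    vocab = {i: bytes([i]) for i in range(256)}
    for (left, right), new_id in merges.items():  # insertion order = training order
        vocab[new_id] = vocab[left] + vocab[right]
    return b"".join(vocab[i] for i in ids).decode("utf-8")


ids, merges = train_bpe("aaabdaaabac", num_merges=3)
text = decode(ids, merges)  # round-trips to "aaabdaaabac"
```

Three merges shrink the 11-byte input to 5 tokens. Each merge trades a longer vocabulary for a shorter sequence, which is the trade-off a tokenizer's vocabulary size controls.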

Week 14 — The Transformer Architecture

"Attention Is All You Need" paper — first pass, skim · 3Blue1Brown "But what is a GPT?" + "Attention in transformers, visually explained" · Self-attention math: Q, K, V matrices · Multi-head attention, positional encodings (sinusoidal, RoPE, ALiBi) · …

7 daily tasks
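
The Q, K, V math from Week 14 fits in a few lines of plain Python. The sketch below implements single-head scaled dot-product attention, softmax(QKᵀ / √d_k) V, with an optional causal mask. Matrices are lists of rows, and the helper names plus the identity projection weights are assumptions chosen to keep the example easy to check by hand.

```python
import math


def softmax(xs):
    """Numerically stable softmax over one row of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def matmul(a, b):
    """Multiply an (n x k) matrix by a (k x m) matrix, both as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def self_attention(x, w_q, w_k, w_v, causal=False):
    """Single-head scaled dot-product attention: softmax(Q K^T / sqrt(d_k)) V."""
    q, k, v = matmul(x, w_q), matmul(x, w_k), matmul(x, w_v)
    d_k = len(k[0])
    k_t = [list(col) for col in zip(*k)]  # transpose K
    scores = matmul(q, k_t)  # scores[i][j]: how much token i attends to token j
    weights = []
    for i, row in enumerate(scores):
        scaled = [s / math.sqrt(d_k) for s in row]
        if causal:
            # Decoder-style masking: position i may not look at positions after i.
            scaled = [s if j <= i else float("-inf") for j, s in enumerate(scaled)]
        weights.append(softmax(scaled))
    return matmul(weights, v), weights


# Three tokens with 2-dimensional embeddings; identity projections for clarity.
x = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
identity = [[1.0, 0.0], [0.0, 1.0]]
out, weights = self_attention(x, identity, identity, identity, causal=True)
```

With the causal mask, the first token can only attend to itself, so `weights[0]` is `[1.0, 0.0, 0.0]` and `out[0]` equals its own value vector. Multi-head attention runs several such heads on smaller projections and concatenates the results.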

Week 15 — Build GPT from Scratch with Karpathy

Karpathy "Let's build GPT: from scratch" — minutes 0-30 · Continue — minutes 30-60 · Continue — minutes 60-90 · Continue — minutes 90-120 · …

7 daily tasks

Week 16 — HuggingFace Ecosystem + Fine-Tuning Basics

HF Transformers: pipeline, AutoModel, AutoTokenizer · Loading + using pretrained models (BERT, GPT-2) · Tokenizers in HF, padding, attention masks · Datasets library: load, map, filter · …

7 daily tasks

Topics Covered

Every subtopic below is a separate daily task in the roadmap, with hand-picked resources (YouTube videos, docs, papers) for each.

  • Karpathy "Neural Networks: Zero to Hero" — Episode 1: micrograd Part 1
  • Episode 1 continued — code along
  • Finish micrograd, internalize backprop fully
  • Karpathy Episode 2: makemore (bigram model)
  • Episode 2 continued
  • Karpathy Episode 3: MLP (full 3-hour video)
  • Re-implement micrograd from memory — no peeking
  • Activation functions: ReLU, Sigmoid, Tanh, GELU, SiLU
  • Forward + backward propagation deep dive
  • Loss functions: MSE, Cross-entropy, BCE
  • Optimizers: SGD, Momentum, Adam, AdamW
  • Regularization: Dropout, L1/L2, early stopping
  • Batch norm vs Layer norm (mandatory interview topic)
  • Weight init (Xavier, He) + vanishing/exploding gradients
  • Tensors, devices, basic ops
  • Autograd: requires_grad, .backward(), torch.no_grad()
  • nn.Module, custom layers
  • Dataset and DataLoader
  • Full training loop: zero_grad → forward → loss → backward → step
  • Save/load models (state_dict), checkpointing, W&B integration
  • Build MLP for MNIST in PyTorch, log with W&B
  • What is convolution? Filters, stride, padding
  • Pooling, receptive field
  • CNN architectures: LeNet, AlexNet, VGG
  • ResNet + residual connections (critical)
  • Transfer learning, fine-tuning pretrained models
  • Build a CNN in PyTorch for CIFAR-10
  • Fine-tune pretrained ResNet-18 on custom dataset + read about ViT, CLIP
  • Tokenization basics: characters, words, subwords
  • BPE (Byte-Pair Encoding) — how GPT tokenizes
  • Karpathy tokenizer Part 2 + finish
  • Word2Vec, GloVe (legacy but interview-relevant)
  • Modern embeddings: sentence-transformers, BGE, OpenAI embeddings
  • RNN basics + why they failed → motivation for Transformers
  • LSTM + vanishing gradients (interview topic)
  • "Attention Is All You Need" paper — first pass, skim
  • 3Blue1Brown "But what is a GPT?" + "Attention in transformers, visually explained"
  • Self-attention math: Q, K, V matrices
  • Multi-head attention, positional encodings (sinusoidal, RoPE, ALiBi)
  • Encoder-only (BERT), Decoder-only (GPT), Encoder-Decoder (T5) — when to use which
  • Re-read "Attention Is All You Need" — careful second pass
  • KV cache, Pre-LN vs Post-LN (interview topics)
  • Karpathy "Let's build GPT: from scratch" — minutes 0-30
  • Continue — minutes 30-60
  • Continue — minutes 60-90
  • Continue — minutes 90-120
  • Re-watch multi-head attention section, pause and code along
  • Re-implement entire nanoGPT from memory — don't copy his code
  • Train mini-GPT on custom text (Shakespeare, song lyrics, your favorite book)
  • HF Transformers: pipeline, AutoModel, AutoTokenizer
  • Loading + using pretrained models (BERT, GPT-2)
  • Tokenizers in HF, padding, attention masks
  • Datasets library: load, map, filter
  • Trainer API — training on custom dataset
  • Fine-tune DistilBERT for sentiment classification (IMDB)
  • Push model to HuggingFace Hub with model card