The Complete AI Engineer Roadmap
Phase 1 of 5

Foundations

Math foundations, the Python ML toolchain, classical ML, and a first deployed project with a proper backend.

Weeks 1-8 (Months 1-2) · ~112 hours

Week-by-Week Schedule

Week 1 — Linear Algebra Foundations

Vectors, dot product, vector addition · Linear combinations, span, basis vectors · Matrix transformations · Matrix multiplication, 3D transformations · …

7 daily tasks

Week 2 — Calculus + Probability for ML

Derivatives, geometric meaning · Chain rule (THIS IS BACKPROP) · Partial derivatives, gradients · Probability basics, random variables, expectation, variance · …

7 daily tasks
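
The "chain rule is backprop" point can be made concrete with a small sketch. The one-neuron loss, the input values, and the helper names below are illustrative assumptions, not material from the roadmap. The gradient is built by multiplying local derivatives (the chain rule) and then checked against a finite-difference estimate.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def loss(w, b, x, y):
    # Squared error of a single sigmoid neuron (hypothetical toy model).
    return (sigmoid(w * x + b) - y) ** 2


def grad_chain_rule(w, b, x, y):
    # Forward pass: keep the intermediate values.
    z = w * x + b
    a = sigmoid(z)
    # Backward pass: multiply local derivatives from the loss back to w and b.
    dL_da = 2.0 * (a - y)      # derivative of (a - y)^2 with respect to a
    da_dz = a * (1.0 - a)      # derivative of sigmoid with respect to z
    dL_dz = dL_da * da_dz
    return dL_dz * x, dL_dz    # dz/dw = x, dz/db = 1


def grad_numeric(w, b, x, y, eps=1e-6):
    # Central finite differences, used only as an independent check.
    dw = (loss(w + eps, b, x, y) - loss(w - eps, b, x, y)) / (2 * eps)
    db = (loss(w, b + eps, x, y) - loss(w, b - eps, x, y)) / (2 * eps)
    return dw, db


analytic = grad_chain_rule(0.5, -0.1, 2.0, 1.0)
numeric = grad_numeric(0.5, -0.1, 2.0, 1.0)
print("chain rule:", analytic)
print("numeric:   ", numeric)
```

The two results should agree to several decimal places. Backpropagation in a deep network applies the same multiplication of local derivatives, layer by layer.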

Week 3 — Python ML Toolchain + Linux/Git Basics

Install uv (modern Python package manager) + Linux/Bash basics · NumPy deep dive: broadcasting, vectorization · Pandas: DataFrames, filtering, groupby, merge · Pandas continued: pivot, time series, missing data · …

7 daily tasks

Week 4 — Classical ML Part 1: Regression + Classification

What is ML? Supervised vs Unsupervised. Train/val/test splits · Linear Regression (single variable) — math + intuition · Multiple linear regression, gradient descent · Implement linear regression from scratch in NumPy · …

7 daily tasks
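
As a rough idea of what "linear regression from scratch in NumPy" might look like, here is a minimal sketch using batch gradient descent on mean squared error. The function name, learning rate, step count, and synthetic data are all assumptions chosen for illustration.

```python
import numpy as np


def fit_linear_regression(X, y, lr=0.1, n_steps=2000):
    """Batch gradient descent on mean squared error. Returns (weights, bias)."""
    n_samples, n_features = X.shape
    w = np.zeros(n_features)
    b = 0.0
    for _ in range(n_steps):
        residual = X @ w + b - y                      # shape (n_samples,)
        grad_w = (2.0 / n_samples) * (X.T @ residual)
        grad_b = (2.0 / n_samples) * residual.sum()
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


# Synthetic data with known coefficients (hypothetical values).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
true_w, true_b = np.array([3.0, -2.0]), 0.5
y = X @ true_w + true_b + rng.normal(scale=0.1, size=200)

w, b = fit_linear_regression(X, y)
print("learned weights:", w, "bias:", b)
```

The features here are already on a similar scale, which is why a fixed learning rate of 0.1 converges. On unscaled real data the learning rate usually needs tuning, or the features need standardizing first.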

Week 5 — Classical ML Part 2: Trees + Ensembles

Decision Trees: entropy, Gini · Random Forests + bagging · Gradient Boosting intuition · XGBoost + LightGBM hands-on · …

7 daily tasks

Week 6 — Model Evaluation + Feature Engineering + MLflow

Why accuracy misleads. Precision, Recall, F1 · ROC, AUC, PR curves · k-fold + stratified cross-validation · Data leakage: causes + prevention · …

7 daily tasks
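
A small sketch of why accuracy misleads on imbalanced data. The 95/5 class split and both sets of predictions are made up for illustration, and the helper functions are hand-written rather than taken from any library.

```python
def accuracy(y_true, y_pred):
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def precision_recall_f1(y_true, y_pred):
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# 95 negatives followed by 5 positives.
y_true = [0] * 95 + [1] * 5

# A "model" that always predicts the majority class.
always_negative = [0] * 100
print(accuracy(y_true, always_negative))             # 0.95
print(precision_recall_f1(y_true, always_negative))  # (0.0, 0.0, 0.0)

# A model that finds 4 of the 5 positives at the cost of 6 false alarms.
useful = [1] * 6 + [0] * 89 + [1] * 4 + [0] * 1
print(accuracy(y_true, useful))                      # 0.93
print(precision_recall_f1(y_true, useful))
```

The always-negative model wins on accuracy while finding no positives at all, which is the case precision, recall, and F1 are meant to expose.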

Week 7 — PROJECT 1: End-to-End Tabular ML Pipeline

Pick dataset, EDA, document findings · Feature engineering: missing values, encoding, scaling · Build sklearn Pipeline with ColumnTransformer · Train 3 models: LogReg baseline, RF, XGBoost. Stratified k-fold CV · …

7 daily tasks
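
One possible shape for the Pipeline and ColumnTransformer step, sketched with a logistic regression baseline only. The column names, the toy data, and the choice of median imputation are assumptions for illustration; a real project would substitute its own dataset and add the other models.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Hypothetical toy table: two numeric columns, one categorical column.
rng = np.random.default_rng(0)
n = 200
df = pd.DataFrame({
    "age": rng.normal(40, 10, n),
    "income": rng.normal(50_000, 15_000, n),
    "plan": rng.choice(["basic", "pro", "enterprise"], n),
})
df.loc[rng.choice(n, 20, replace=False), "age"] = np.nan  # inject missing values
y = (df["income"] + rng.normal(0, 5_000, n) > 50_000).astype(int)

numeric_cols = ["age", "income"]
categorical_cols = ["plan"]

# Numeric columns: impute then scale. Categorical columns: one-hot encode.
numeric_steps = Pipeline([
    ("impute", SimpleImputer(strategy="median")),
    ("scale", StandardScaler()),
])
preprocess = ColumnTransformer([
    ("num", numeric_steps, numeric_cols),
    ("cat", OneHotEncoder(handle_unknown="ignore"), categorical_cols),
])

model = Pipeline([
    ("preprocess", preprocess),
    ("clf", LogisticRegression(max_iter=1000)),
])

# Preprocessing is refit inside each fold, so no statistics leak from validation data.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
scores = cross_val_score(model, df, y, cv=cv, scoring="f1")
print("F1 per fold:", scores.round(3), "mean:", scores.mean().round(3))
```

Keeping imputation, scaling, and encoding inside the pipeline means the same object can later be tuned with GridSearchCV or RandomizedSearchCV and serialized for serving as a single unit.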

Week 8 — Backend Foundations: HTTP, FastAPI, Docker, Cloud Deployment

HTTP fundamentals: methods, status codes, headers, JSON · FastAPI Hello World, path/query params, async basics · FastAPI: Pydantic request/response models, validation · Serve Project 1 model via FastAPI: /predict, /health, async endpoint · …

7 daily tasks

Topics Covered

Every subtopic below is a separate daily task in the roadmap, with hand-picked resources (YouTube videos, docs, papers) for each.

  • Vectors, dot product, vector addition
  • Linear combinations, span, basis vectors
  • Matrix transformations
  • Matrix multiplication, 3D transformations
  • Determinants, inverse matrices
  • Eigenvectors & eigenvalues, SVD intuition
  • Implement matmul, eigendecomposition by hand in NumPy
  • Derivatives, geometric meaning
  • Chain rule (THIS IS BACKPROP)
  • Partial derivatives, gradients
  • Probability basics, random variables, expectation, variance
  • Distributions: Bernoulli, Gaussian, Categorical
  • Bayes' theorem + conditional probability
  • Cross-entropy, KL divergence
  • Install uv (modern Python package manager) + Linux/Bash basics
  • NumPy deep dive: broadcasting, vectorization
  • Pandas: DataFrames, filtering, groupby, merge
  • Pandas continued: pivot, time series, missing data
  • Matplotlib + Seaborn
  • Git/GitHub: branches, commits, PRs, .gitignore
  • Set up: GitHub, W&B, HuggingFace, Kaggle accounts. Push first repo.
  • What is ML? Supervised vs Unsupervised. Train/val/test splits
  • Linear Regression (single variable) — math + intuition
  • Multiple linear regression, gradient descent
  • Implement linear regression from scratch in NumPy
  • Logistic regression, sigmoid, binary classification
  • Logistic regression from scratch + cost derivation
  • Overfitting, regularization (L1/L2), bias-variance
  • Decision Trees: entropy, Gini
  • Random Forests + bagging
  • Gradient Boosting intuition
  • XGBoost + LightGBM hands-on
  • KNN, Naive Bayes overview
  • K-Means clustering + PCA
  • Compare 5 algorithms on the Titanic dataset using F1 scores
  • Why accuracy misleads. Precision, Recall, F1
  • ROC, AUC, PR curves
  • k-fold + stratified cross-validation
  • Data leakage: causes + prevention
  • Feature scaling, one-hot, target encoding
  • sklearn Pipelines + ColumnTransformer
  • MLflow: track experiments with params + metrics + artifacts
  • Pick dataset, EDA, document findings
  • Feature engineering: missing values, encoding, scaling
  • Build sklearn Pipeline with ColumnTransformer
  • Train 3 models: LogReg baseline, RF, XGBoost. Stratified k-fold CV
  • Hyperparameter tuning with GridSearchCV/RandomizedSearchCV
  • Log everything in MLflow. Write README
  • Clean GitHub structure: /data, /notebooks, /src, README.md, requirements.txt
  • HTTP fundamentals: methods, status codes, headers, JSON
  • FastAPI Hello World, path/query params, async basics
  • FastAPI: Pydantic request/response models, validation
  • Serve Project 1 model via FastAPI: /predict, /health, async endpoint
  • Docker: images, containers, Dockerfile, multi-stage build
  • Dockerize FastAPI service + write docker-compose.yml + env vars (.env)
  • Deploy to Render or HuggingFace Spaces. Public URL. Add error handling middleware.