The Complete AI Engineer Roadmap
Phase 3 of 5

Applied LLM Engineering

Master prompt engineering, RAG, fine-tuning with LoRA, agents, and production patterns.

Weeks 17-24 · Months 5-6 · ~112 hours · 8 weeks

Week-by-Week Schedule

Week 17 — Prompt Engineering + Structured Output + LLM APIs

System/User/Assistant structure, OpenAI API basics, async API calls · Anthropic API + Claude vision API (multimodal bonus) · Few-shot prompting, chain-of-thought, self-consistency, ReAct · Structured output: JSON mode, function calling · …

7 daily tasks

Week 18 — RAG Foundations + Vector Databases

What is RAG? Why it exists. RAG vs fine-tuning decision · Chunking strategies: fixed, semantic, recursive, late chunking · Embedding models: BGE, e5, nomic, OpenAI text-embedding-3 · Vector databases: Qdrant + pgvector · …

7 daily tasks
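
Of the chunking strategies listed for Week 18, fixed-size chunking is the simplest to write down. The sketch below splits on characters with a configurable overlap. The function name `chunk_fixed` and the default sizes are illustrative assumptions, and a production pipeline would more likely count tokens than characters.

```python
def chunk_fixed(text, size=200, overlap=50):
    """Split text into windows of `size` characters that overlap by `overlap`."""
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("need size > 0 and 0 <= overlap < size")
    step = size - overlap  # how far each new window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + size])
        if start + size >= len(text):
            break  # this window already reached the end of the text
    return chunks


sample = "abcdefghij" * 3  # 30 characters
chunks = chunk_fixed(sample, size=12, overlap=4)
for chunk in chunks:
    print(repr(chunk))
```

The overlap means the tail of one chunk is repeated at the head of the next, so a sentence cut at a boundary still appears whole in at least one chunk when it is shorter than the overlap.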

Week 19 — Advanced RAG: Reranking, Query Rewriting, Evaluation

Why basic RAG fails in production · Cross-encoder rerankers: bge-reranker, cohere-rerank · Query rewriting + HyDE (Hypothetical Document Embeddings) · Multi-hop retrieval, parent-document retrieval · …

7 daily tasks

Week 20 — PROJECT 3: Production RAG over Real Corpus

Pick corpus, document ingestion pipeline · Implement chunking · Embedding + Qdrant storage · Retrieval + reranker integration · …

7 daily tasks

Week 21 — Fine-Tuning Theory: LoRA, QLoRA, PEFT

When fine-tuning beats prompting · Full FT vs PEFT trade-offs · LoRA math: W' = W + AB, why low-rank works · QLoRA: 4-bit quantization + LoRA · …

7 daily tasks
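
The LoRA update W' = W + AB from Week 21 can be checked on a toy example. The sketch below uses plain Python lists and assumes A has shape d × r and B has shape r × k, matching the order written above (the original LoRA paper writes the product as BA with the roles swapped). The `scale` argument is a simplified stand-in for the α/r factor, and all sizes and values are made up for illustration.

```python
def matmul(X, Y):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]


def lora_update(W, A, B, scale=1.0):
    """Return W' = W + scale * (A @ B) without modifying W."""
    delta = matmul(A, B)
    return [[w + scale * d for w, d in zip(w_row, d_row)]
            for w_row, d_row in zip(W, delta)]


d, k, r = 4, 4, 1                   # toy sizes; real layers are far larger
W = [[0.0] * k for _ in range(d)]   # frozen pretrained weight (zeros for clarity)
A = [[1.0], [2.0], [3.0], [4.0]]    # d x r, trainable
B = [[0.5, 0.0, -0.5, 1.0]]         # r x k, trainable

W_new = lora_update(W, A, B)
full_params = d * k                 # weights updated by full fine-tuning
lora_params = r * (d + k)           # weights trained by LoRA
print(W_new[1], full_params, lora_params)
```

The parameter counts show where the savings come from: with d = k = 4096 and r = 8, the same formulas give about 16.8 million weights for full fine-tuning against 65,536 for the adapter.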

Week 22 — Hands-On QLoRA Fine-Tuning

Pick base model (Qwen2.5-1.5B or Llama-3.2-1B-Instruct) and domain dataset · Prepare dataset in correct chat template format · Set up trl.SFTTrainer + bitsandbytes 4-bit + LoRA config · First training run (small) — debug issues · …

7 daily tasks

Week 23 — Agents + Tool Use + MCP

What is an AI agent? Tool calling protocols · LangGraph for stateful agents · LangGraph tutorial — build first agent · Pydantic AI / instructor for typed agent outputs · …

7 daily tasks
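
At its core, the tool calling covered in Week 23 comes down to the model emitting a structured request and the runtime dispatching it to a function. Below is a minimal sketch of the dispatch half. The JSON shape (`name` plus `arguments`) and the two tools are assumptions chosen for illustration, not the wire format of any specific provider, MCP, or LangGraph.

```python
import json
import operator


def calculator(op, a, b):
    """Apply a basic arithmetic operation to two numbers."""
    ops = {"add": operator.add, "sub": operator.sub, "mul": operator.mul}
    return ops[op](a, b)


def word_count(text):
    """Count whitespace-separated words."""
    return len(text.split())


# Registry mapping tool names (as the model would emit them) to functions.
TOOLS = {"calculator": calculator, "word_count": word_count}


def run_tool_call(raw_call):
    """Dispatch one tool call encoded as JSON: {"name": ..., "arguments": {...}}."""
    call = json.loads(raw_call)
    name = call["name"]
    if name not in TOOLS:
        return {"error": f"unknown tool: {name}"}
    try:
        return {"result": TOOLS[name](**call["arguments"])}
    except (TypeError, KeyError) as exc:
        # Report bad arguments back to the model instead of crashing the loop.
        return {"error": f"{type(exc).__name__}: {exc}"}


print(run_tool_call('{"name": "calculator", "arguments": {"op": "mul", "a": 6, "b": 7}}'))
print(run_tool_call('{"name": "weather", "arguments": {}}'))
```

Returning errors as data rather than raising lets an agent loop feed the failure back to the model so it can correct the call on the next turn.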

Week 24 — Production LLM Patterns: Observability, Streamlit, Frontend Demo

Streamlit basics — build LLM demos in Python · Wrap your RAG (Project 3) with a Streamlit UI · LLM observability: Langfuse setup + integration · Add tracing + cost tracking to your RAG project · …

7 daily tasks

Topics Covered

Every subtopic below is a separate daily task in the roadmap, with hand-picked resources (YouTube videos, docs, papers) for each.

  • System/User/Assistant structure, OpenAI API basics, async API calls
  • Anthropic API + Claude vision API (multimodal bonus)
  • Few-shot prompting, chain-of-thought, self-consistency, ReAct
  • Structured output: JSON mode, function calling
  • Pydantic + instructor library for typed LLM outputs
  • Build resume → JSON extractor with retries + exponential backoff
  • Token budgeting, context window management, cost tracking
  • What is RAG? Why it exists. RAG vs fine-tuning decision
  • Chunking strategies: fixed, semantic, recursive, late chunking
  • Embedding models: BGE, e5, nomic, OpenAI text-embedding-3
  • Vector databases: Qdrant + pgvector
  • Cosine similarity, HNSW indexing intuition
  • Build basic RAG: load → chunk → embed → store → retrieve → generate
  • Hybrid search: BM25 + dense embeddings + multimodal (CLIP) intro
  • Why basic RAG fails in production
  • Cross-encoder rerankers: bge-reranker, cohere-rerank
  • Query rewriting + HyDE (Hypothetical Document Embeddings)
  • Multi-hop retrieval, parent-document retrieval
  • RAG evaluation: faithfulness, answer relevance, context precision
  • Upgrade Week 18 RAG: + reranker + query rewriting + RAGAS
  • Document everything in README. Before/after metrics comparison
  • Pick corpus, document ingestion pipeline
  • Implement chunking
  • Embedding + Qdrant storage
  • Retrieval + reranker integration
  • FastAPI streaming endpoint + auth + rate limiting
  • Dockerize + deploy to Render/HF Spaces
  • RAGAS evaluation + README + demo video
  • When fine-tuning beats prompting
  • Full FT vs PEFT trade-offs
  • LoRA math: W' = W + AB, why low-rank works
  • QLoRA: 4-bit quantization + LoRA
  • SFT vs DPO vs RLHF concepts
  • HuggingFace PEFT library hands-on
  • Dataset formatting: chat templates, ShareGPT, Alpaca
  • Pick base model (Qwen2.5-1.5B or Llama-3.2-1B-Instruct) and domain dataset
  • Prepare dataset in correct chat template format
  • Set up trl.SFTTrainer + bitsandbytes 4-bit + LoRA config
  • First training run (small) — debug issues
  • Full training run with W&B logging
  • Evaluate before/after on held-out set — target ≥5% improvement
  • Push LoRA adapter to HF Hub with model card
  • What is an AI agent? Tool calling protocols
  • LangGraph for stateful agents
  • LangGraph tutorial — build first agent
  • Pydantic AI / instructor for typed agent outputs
  • MCP (Model Context Protocol)
  • Build tool-using agent: ≥3 tools (web search, calculator, file I/O)
  • Memory: short-term vs long-term, episodic
  • Streamlit basics — build LLM demos in Python
  • Wrap your RAG (Project 3) with a Streamlit UI
  • LLM observability: Langfuse setup + integration
  • Add tracing + cost tracking to your RAG project
  • Build a simple vanilla HTML+JS frontend that calls your FastAPI backend
  • Add prompt injection guardrails + PII detection (Presidio)
  • Buffer: catch up on incomplete work, polish projects