Welcome to

AI Bootcamp 2

AI Engineering in the Enterprise

Course Syllabus and Content

Week 1

Advanced RAG with Multi-Media RAG

1 Unit

  • 01
    Advanced RAG with Multi-Media RAG
     
    • Advanced RAG Reranker Training & Triplet Fundamentals

      • Learn contrastive loss vs triplet loss approaches for training retrievers
      • Understand bi-encoder vs cross-encoder performance trade-offs
      • Master triplet-loss fundamentals and semi-hard negative mining strategies
      • Fine-tune rerankers using Cohere Rerank API & SBERT (sbert.net, Hugging Face)
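As a sketch of the triplet-loss fundamentals above, here is a minimal pure-Python version of the loss and of semi-hard negative selection (the vectors, margin, and Euclidean distance choice are illustrative assumptions, not course code):

```python
from math import sqrt

def dist(a, b):
    """Euclidean distance between two embedding vectors."""
    return sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def triplet_loss(anchor, positive, negative, margin=0.5):
    """Triplet loss: L = max(0, d(a, p) - d(a, n) + margin)."""
    return max(0.0, dist(anchor, positive) - dist(anchor, negative) + margin)

def semi_hard_negatives(anchor, positive, candidates, margin=0.5):
    """Semi-hard negatives are farther than the positive but still
    inside the margin: d(a, p) < d(a, n) < d(a, p) + margin."""
    d_pos = dist(anchor, positive)
    return [n for n in candidates if d_pos < dist(anchor, n) < d_pos + margin]

anchor, positive = [0.0, 0.0], [0.3, 0.0]
candidates = [[0.1, 0.0], [0.5, 0.0], [2.0, 0.0]]
print(triplet_loss(anchor, positive, candidates[1]))   # non-zero: negative inside margin
print(semi_hard_negatives(anchor, positive, candidates))  # [[0.5, 0.0]]
```

In practice these embeddings come from the model being fine-tuned, and mining runs over each training batch rather than a fixed list.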
    • Multimodal & Metadata RAG

      • Index and query images, tables, and structured JSON using ColQwen-Omni (ColPali-based late interaction for audio, video, and visual documents)
      • Implement metadata filtering, short vs long-term indices, and query routing logic
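A toy sketch of the metadata filtering and query-routing logic described above; the index names, metadata keys, and routing rules here are hypothetical placeholders:

```python
def route_query(query: str, metadata: dict) -> str:
    """Toy router: choose an index from query content and metadata.
    Index names and rules are illustrative assumptions."""
    if metadata.get("has_image") or "chart" in query.lower():
        return "multimodal_index"
    if metadata.get("recency_days", 999) <= 7:
        return "short_term_index"
    return "long_term_index"

def filter_by_metadata(docs, **required):
    """Keep only docs whose metadata matches all required key/value pairs."""
    return [d for d in docs if all(d["meta"].get(k) == v for k, v in required.items())]

docs = [
    {"id": 1, "meta": {"type": "table", "year": 2024}},
    {"id": 2, "meta": {"type": "image", "year": 2023}},
]
print(route_query("show me the revenue chart", {"has_image": False}))  # multimodal_index
print(filter_by_metadata(docs, type="table"))  # only doc 1
```

In a real system the routing decision is often made by an LLM classifier or learned router rather than hand-written rules, but the interface is the same: query in, index name out.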
    • Cartridges RAG Technique

      • Learn how Cartridges compress large corpora into small, trainable KV-cache structures for efficient retrieval (~39x less memory, ~26x faster)
      • Master the Self-Study training approach using synthetic Q&A and context distillation for generalized question answering
    • Cartridge-Based Retrieval

      • Learn modular retrieval systems with topic-specific "cartridges" for precision memory routing
    • Late Interaction Methods

      • Study architectures like ColQwen-Omni that combine multimodal (text, audio, image) retrieval using late interaction fusion
    • Multi-Vector vs Single-Vector Retrieval

      • Compare ColBERT/Turbopuffer vs FAISS, and understand trade-offs in granularity, accuracy, and inference cost
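The granularity trade-off above can be sketched in a few lines: single-vector retrieval scores one pooled vector per document, while ColBERT-style late interaction keeps per-token vectors and sums each query token's best match (the toy 2-d vectors are illustrative only):

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def single_vector_score(query_vec, doc_vec):
    """FAISS-style: one pooled vector per query and per document."""
    return dot(query_vec, doc_vec)

def maxsim_score(query_tokens, doc_tokens):
    """ColBERT-style late interaction (MaxSim): for each query token,
    take the max similarity over all document tokens, then sum."""
    return sum(max(dot(q, d) for d in doc_tokens) for q in query_tokens)

q_tokens = [[1.0, 0.0], [0.0, 1.0]]
d_tokens = [[1.0, 0.0], [0.0, 0.5]]
pool = lambda vecs: [sum(c) / len(vecs) for c in zip(*vecs)]  # mean pooling

print(single_vector_score(pool(q_tokens), pool(d_tokens)))  # 0.375 — pooling blurs token matches
print(maxsim_score(q_tokens, d_tokens))                     # 1.5 — per-token matches preserved
```

The cost side of the trade-off follows directly: multi-vector indexes store one vector per token instead of one per document, so storage and scoring cost grow with sequence length.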
    • Query Routing & Hybrid Memory Systems

      • Explore dynamic routing between lexical, dense, and multimodal indexes
    • Loss Functions for Retriever Training

      • Compare contrastive loss vs triplet loss, and learn about semi-hard negative mining
    • Reranker Tuning with SBERT or APIs

      • Fine-tune rerankers (SBERT, Cohere API), evaluate with MRR/nDCG, and integrate into retrieval loops
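The MRR and nDCG metrics used to evaluate rerankers can be written in plain Python; the relevance lists below are toy data for illustration:

```python
from math import log2

def mrr(ranked_relevances):
    """Mean Reciprocal Rank: each entry is a 0/1 relevance list
    in ranked order for one query."""
    total = 0.0
    for rels in ranked_relevances:
        total += next((1.0 / (i + 1) for i, r in enumerate(rels) if r), 0.0)
    return total / len(ranked_relevances)

def ndcg(rels, k=10):
    """nDCG@k for one query from graded relevances in ranked order."""
    dcg = sum(r / log2(i + 2) for i, r in enumerate(rels[:k]))
    ideal = sum(r / log2(i + 2) for i, r in enumerate(sorted(rels, reverse=True)[:k]))
    return dcg / ideal if ideal else 0.0

print(mrr([[0, 1, 0], [1, 0, 0]]))   # (1/2 + 1) / 2 = 0.75
print(ndcg([0, 2, 1], k=3))          # < 1.0: best item ranked second
```

In the retrieval loop these scores are computed on a held-out query set before and after reranking, so the reranker's contribution is measured directly.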
    • Exercises: Advanced RAG Techniques

      • Implement triplet loss vs contrastive loss for reranker training with semi-hard negative mining
      • Build multimodal RAG systems with images, tables, and query routing
      • Compare single-vector (FAISS) vs multi-vector (ColBERT) retrieval
      • Create cartridge-based RAG with topic-specific memory routing
Week 2

Advanced AI-Evals & Monitoring

1 Unit

  • 01
    Advanced AI-Evals & Monitoring
     
    • Advanced AI-Evals & Monitoring

      • Scale LLM-judge for bulk multimodal outputs
      • Build dashboards comparing judge accuracy vs IR metrics
      • Implement auto-gate builds if accuracy drops below 95%
    • Agent Failure Analysis Deep Dive

      • Create transition-state heatmaps & tool states visualization
      • Construct failure-matrices with LLM classification
      • Develop systematic debugging workflows
    • Enhancing RAG with Contextual Retrieval Recipes

      • Use Instructor-driven synthetic data (Anthropic GitHub)
      • Integrate web-search solutions (e.g., exa.ai)
      • Apply LogFire, Braintrust augmentations
      • Implement Cohere reranker + advanced logging
    • Advanced Synthetic & Statistical Validation

      • Generate persona-varied synthetic questions (angry/confused personas) and rewrite questions for better retrieval
      • Perform embedding-diversity checks and JSONL corpus structuring
      • Work with multi-vector databases
      • Build parallel experimentation harness using ThreadPoolExecutor
    • Strategic Feedback Collection

      • Collect different types of feedback; prefer binary feedback (thumbs up/down) over star ratings
      • Distinguish between two segment types: lack of data vs lack of capabilities
      • Address common but fixable capability issues
    • Dynamic Prompting & Validation

      • Build dynamic UI with chain-of-thought wrapping using XML or streaming
      • Incorporate validators with regex (e.g., catching fake emails hallucinated by the LLM)
    • Data Segmentation & Prioritization

      • Segment data based on patterns
      • Apply Expected Value formula: Impact × Percentage of Queries × Probability of Success
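The Expected Value formula above reduces to a one-line computation; the segment names and numbers below are made up for illustration:

```python
def expected_value(impact, pct_queries, p_success):
    """EV = Impact x Percentage of Queries x Probability of Success.
    Impact units are whatever the team chooses (e.g., value per resolved query)."""
    return impact * pct_queries * p_success

# Hypothetical segments: (name, impact, share of queries, P(success))
segments = [
    ("billing questions", 10.0, 0.30, 0.8),
    ("API errors",        25.0, 0.05, 0.9),
]
ranked = sorted(segments, key=lambda s: expected_value(*s[1:]), reverse=True)
for name, *args in ranked:
    print(name, round(expected_value(*args), 3))
# billing questions rank first: a modest fix applied to 30% of traffic
# beats a bigger fix applied to 5%
```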
    • Topic Discovery with BERTopic

      • Configure and apply BERTopic for unsupervised topic discovery
      • Set up embedding model, UMAP, and HDBSCAN for effective clustering
      • Visualize topic similarities and relationships
      • Analyze satisfaction scores by topic to identify pain points
      • Create matrices showing relationship between topics and satisfaction
      • Identify the "danger zone" of high-volume, low-satisfaction query areas
    • Persona-Driven Synthetic Queries

      • Generate diverse queries (angry, curious, confused users) to stress-test retrieval and summarization pipelines
    • Regex & Schema Validators for LLM Outputs

      • Add lightweight automated checks for emails, JSON formats, and other structural expectations
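A minimal sketch of such lightweight validators: a structural email check and a JSON-keys check on raw LLM output (the regex is deliberately simple, not full RFC 5322 validation):

```python
import json
import re

EMAIL_RE = re.compile(r"^[\w.+-]+@[\w-]+\.[\w.-]+$")

def validate_email(text: str) -> bool:
    """Cheap structural check that catches obviously malformed
    addresses an LLM might hallucinate."""
    return bool(EMAIL_RE.match(text))

def validate_json_keys(raw: str, required: set) -> bool:
    """Parse LLM output as JSON and confirm the expected keys exist."""
    try:
        obj = json.loads(raw)
    except json.JSONDecodeError:
        return False
    return isinstance(obj, dict) and required <= obj.keys()

print(validate_email("user@example.com"))   # True
print(validate_email("not-an-email"))       # False
print(validate_json_keys('{"name": "a", "score": 1}', {"name", "score"}))  # True
```

Checks like these run in microseconds, so they can gate every LLM response before it reaches the user, with a re-prompt on failure.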
    • Segmentation-Driven Summarization

      • Build summarization-specific chunks, integrate financial metadata, and compare with BM25 retrieval
    • Failure-Type Segmentation

      • Classify failures into retrieval vs generation errors to guide improvement priorities
    • Clustering Queries with BERTopic

      • Use UMAP + HDBSCAN to group user queries into semantically meaningful clusters
    • Mapping Feedback to Topics

      • Overlay evaluator scores onto clusters to identify weak performance areas
    • Danger Zone Heatmaps

      • Visualize query volume vs success rates to prioritize high-impact fixes
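Underneath the heatmap, the "danger zone" is just a threshold rule over (volume, satisfaction) pairs; the topics and thresholds below are illustrative:

```python
def danger_zone(topics, volume_threshold=100, satisfaction_threshold=0.6):
    """Flag topics with high query volume but low satisfaction —
    the highest-impact areas to fix first. Thresholds are assumptions."""
    return [
        name for name, (volume, satisfaction) in topics.items()
        if volume >= volume_threshold and satisfaction < satisfaction_threshold
    ]

topics = {
    "password reset": (500, 0.45),  # high volume, low satisfaction -> danger zone
    "refund policy":  (300, 0.85),  # high volume, but users are happy
    "beta features":  (20,  0.30),  # unhappy users, but rare -> lower priority
}
print(danger_zone(topics))  # ['password reset']
```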
    • Feedback-to-Reranker Loop

      • Build iterative reranking systems driven by topic segmentation and evaluation feedback
    • Dynamic Prompting for Tool Selection

      • Teach LLMs to output structured tool calls reliably (JSON schema, guardrails, few-shots)
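One common guardrail for structured tool calls is validating the model's JSON against a tool registry before executing anything; the tool names and schemas here are hypothetical:

```python
import json

# Hypothetical tool registry: tool name -> required argument names and types
TOOLS = {
    "get_weather": {"city": str},
    "search_docs": {"query": str, "top_k": int},
}

def parse_tool_call(raw: str):
    """Validate an LLM's JSON tool call against the registry; return
    (tool_name, args) or raise ValueError so the caller can re-prompt."""
    call = json.loads(raw)
    name, args = call.get("tool"), call.get("args", {})
    if name not in TOOLS:
        raise ValueError(f"unknown tool: {name}")
    for arg, typ in TOOLS[name].items():
        if not isinstance(args.get(arg), typ):
            raise ValueError(f"bad or missing arg: {arg}")
    return name, args

print(parse_tool_call('{"tool": "search_docs", "args": {"query": "refunds", "top_k": 3}}'))
```

On a ValueError, the failure message is typically fed back to the model as a corrective few-shot turn, which is usually enough to repair the call.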
    • Tool Disambiguation and Clarification Loops

      • Design prompts that force models to ask clarifying questions before executing
    • XML-Based CoT Streaming for Agents

      • Output reasoning traces in structured XML-like format for real-time dashboards or UIs
    • Production-Grade Project

      • Deploy a full RAG + fine-tuned LLM service
      • Add multiple tools with RAG and implement tool routing
      • Include multimodal retrieval, function-calling, LLM-judge pipeline, and monitoring
      • Achieve ≥ 95% end-to-end task accuracy
    • Exercises: AI Evaluation & Monitoring Pipeline

      • Build LLM-as-judge evaluation pipelines with accuracy dashboarding
      • Apply BERTopic for failure analysis and danger zone heatmaps
      • Generate persona-driven synthetic queries for stress-testing
      • Implement automated quality gates with statistical validation
Week 3

Intro RL & RLHF

1 Unit

  • 01
    Intro RL & RLHF
     
    • Markov Processes as LLM Analogies

      • Frame token generation as a Markov Decision Process (MDP) with states, actions, and rewards
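The MDP framing above can be made concrete with a toy Markov chain: the state is the current token, the action is the next-token choice, and an episode is one generated sequence (vocabulary and transition probabilities are invented for illustration):

```python
import random

# Toy transition table: current token -> [(next token, probability), ...]
TRANSITIONS = {
    "<s>": [("the", 0.7), ("a", 0.3)],
    "the": [("cat", 0.6), ("dog", 0.4)],
    "a":   [("cat", 0.5), ("dog", 0.5)],
    "cat": [("</s>", 1.0)],
    "dog": [("</s>", 1.0)],
}

def sample_episode(rng):
    """Roll out one 'generation episode' until the end-of-sequence state."""
    state, tokens = "<s>", []
    while state != "</s>":
        next_tokens, probs = zip(*TRANSITIONS[state])
        state = rng.choices(next_tokens, weights=probs)[0]
        tokens.append(state)
    return tokens

print(sample_episode(random.Random(0)))
```

An LLM is the same picture with a learned, context-dependent transition distribution; a reward attached to the final state (e.g. human preference) is what RLHF optimizes.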
    • Monte Carlo vs Temporal Difference Learning

      • Compare Monte Carlo episode-based learning with Temporal Difference updates, and their relevance to token-level prediction
    • Q-Learning & Policy Gradients

      • Explore conceptual foundations of Q-learning and policy gradients as the basis of RLHF and preference optimization
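The core of tabular Q-learning is a single update rule, shown here as a minimal sketch (states, actions, and hyperparameters are illustrative):

```python
def q_update(q, s, a, r, s_next, actions, alpha=0.1, gamma=0.9):
    """Tabular Q-learning update:
    Q(s,a) <- Q(s,a) + alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))."""
    best_next = max(q.get((s_next, an), 0.0) for an in actions)
    old = q.get((s, a), 0.0)
    q[(s, a)] = old + alpha * (r + gamma * best_next - old)
    return q[(s, a)]

q = {}  # empty Q-table; unseen (state, action) pairs default to 0.0
print(q_update(q, "s0", "right", 1.0, "s1", actions=["left", "right"]))  # 0.1
```

Policy-gradient methods take the complementary route: instead of learning values and acting greedily, they adjust the action distribution directly in proportion to observed reward, which is the lineage RLHF builds on.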
    • RL in Decoding and Chain-of-Thought

      • Apply RL ideas during inference without retraining, including CoT prompting with reward feedback and speculative decoding verification
    • Exercises: RL Foundations with Neural Networks

      • Implement token generation as MDP with policy and value networks
      • Compare Monte Carlo vs Temporal Difference learning for value estimation
      • Build Q-Learning from tables to DQN with experience replay
      • Implement REINFORCE with baseline subtraction and entropy regularization
Week 4

RL & RLHF Framework

1 Unit

  • 01
    RL & RLHF Framework
     
    • DSPy + RL Integration

      • Explore DSPy's prompt optimizer and RL system built into the pipeline
    • LangChain RL

      • Use LangChain's experimental RL chain for reinforcement learning tasks
    • RL Fine-Tuning with OpenAI API

      • Implement RL fine-tuning using OpenAI's API
    • RL Fine-Tuning Applications

      • Apply RL fine-tuning for state-of-the-art email generation
      • Apply RL fine-tuning for summarization tasks
    • RL Fine-Tuning with OpenPipe

      • Use OpenPipe for RL fine-tuning workflows
    • DPO/PPO/GRPO Comparison

      • Compare Direct Preference Optimization, Proximal Policy Optimization, and Group Relative Policy Optimization (GRPO) approaches
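DPO's per-pair loss is compact enough to write out directly. A minimal sketch with scalar log-probabilities standing in for full sequence log-likelihoods (the numbers and beta are illustrative):

```python
from math import exp, log

def dpo_loss(logp_chosen, logp_rejected, ref_chosen, ref_rejected, beta=0.1):
    """DPO objective for one preference pair:
    -log sigmoid(beta * ((logp_c - ref_c) - (logp_r - ref_r))),
    where ref_* are the frozen reference model's log-probs."""
    margin = beta * ((logp_chosen - ref_chosen) - (logp_rejected - ref_rejected))
    return -log(1.0 / (1.0 + exp(-margin)))

# When the policy prefers the chosen answer more strongly than the
# reference does, the margin is positive and the loss drops below
# log(2) ~ 0.693 (the loss at zero margin):
print(dpo_loss(-1.0, -3.0, -2.0, -2.5, beta=0.1))
```

PPO and GRPO instead optimize a clipped policy-gradient objective against an explicit (PPO) or group-relative (GRPO) reward baseline, which is why DPO is often described as the "no reward model, no RL loop" member of the family.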
    • Reinforcement Learning with Verifiable Rewards (RLVR)

      • Learn about RLVR methodology for training with verifiable reward signals
    • Rubric-Based RL Systems

      • Explore rubric-based systems to guide RL at inference time for multi-step reasoning
    • Training Agents to Control Web Browsers

      • Train agents to control web browsers with RL and Imitation Learning
    • Exercises: RL Frameworks & Advanced Algorithms

      • Compare DSPy vs LangChain for building QA systems
      • Implement GRPO and RLVR algorithms
      • Build multi-turn agents with turn-level credit assignment
      • Create privacy-preserving multi-model systems (PAPILLON) with utility-privacy tradeoffs
Week 5

How RAG, Finetuning, and RLHF Fit in Production

1 Unit

  • 01
    How RAG, Finetuning, and RLHF Fit in Production
     
    • End-to-End LLM Finetuning & Orchestration using RL

      • Prepare instruction-tuning datasets (synthetic + human)
      • Finetune a small LLM on your RAG tasks
      • Use RL to finetune on the same dataset and compare results across all approaches
      • Select the appropriate finetuning approach and build RAG
      • Implement orchestration patterns (pipelines, agents)
      • Set up continuous monitoring integration using Braintrust
    • RL Frameworks in Practice

      • Use DSPy, OpenAI API, LangChain's RLChain, OpenPipe ART, and PufferLib for RLHF tasks
    • Rubric-Based Reward Systems

      • Design interpretable rubrics to score reasoning, structure, and correctness
    • Real-World Applications of RLHF

      • Explore applications in summarization, email tuning, and web agent fine-tuning
    • RL and RLHF for RAG

      • Apply RL techniques to optimize retrieval and generation in RAG pipelines
      • Use RLHF to improve response quality based on user feedback and preferences
    • Exercises: End-to-End RAG with Finetuning & RLHF

      • Finetune a small LLM (Llama 3.2 3B or Qwen 2.5 3B) on ELI5 dataset using LoRA/QLoRA
      • Apply RLHF with rubric-based rewards to optimize responses
      • Build production RAG with DSPy orchestration, logging, and monitoring
      • Compare base → finetuned → RLHF-optimized models