
lesson

How RAG, Finetuning, and RLHF Fit in Production

- End-to-End LLM Finetuning & Orchestration using RL
  - Prepare instruction-tuning datasets (synthetic + human)
  - Finetune a small LLM on your RAG tasks
  - Use RL to finetune on the same dataset and compare results across all approaches
  - Select the appropriate finetuning approach and build RAG
  - Implement orchestration patterns (pipelines, agents)
  - Set up continuous monitoring integration using Braintrust
- RL Frameworks in Practice
  - Use DSPy, the OpenAI API, LangChain's RLChain, OpenPipe ART, and PufferLib for RLHF tasks
- Rubric-Based Reward Systems
  - Design interpretable rubrics to score reasoning, structure, and correctness
- Real-World Applications of RLHF
  - Explore applications in summarization, email tuning, and web agent fine-tuning
- RL and RLHF for RAG
  - Apply RL techniques to optimize retrieval and generation in RAG pipelines
  - Use RLHF to improve response quality based on user feedback and preferences
- Exercises: End-to-End RAG with Finetuning & RLHF
  - Finetune a small LLM (Llama 3.2 3B or Qwen 2.5 3B) on the ELI5 dataset using LoRA/QLoRA (a LoRA setup is sketched below)
  - Apply RLHF with rubric-based rewards to optimize responses
  - Build production RAG with DSPy orchestration, logging, and monitoring
  - Compare base → finetuned → RLHF-optimized models
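A minimal sketch of the LoRA setup referenced in the exercises, using Hugging Face transformers and peft. The Qwen 2.5 3B checkpoint is one of the models named above; the adapter hyperparameters are common illustrative choices, and dataset loading plus the training loop are omitted:

```python
# Minimal LoRA finetuning setup with Hugging Face transformers + peft.
# Hyperparameters are illustrative assumptions, not course-prescribed values.
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model

model_name = "Qwen/Qwen2.5-3B-Instruct"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

config = LoraConfig(
    r=16,                                 # rank of the low-rank update matrices
    lora_alpha=32,                        # scaling applied to the updates
    target_modules=["q_proj", "v_proj"],  # attention projections to adapt
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, config)
model.print_trainable_parameters()  # only the adapter weights are trainable
```

Freezing the base weights and training only the low-rank adapters is what makes a 3B-parameter finetune feasible on a single GPU.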

lesson

RL & RLHF Frameworks

- DSPy + RL Integration
  - Explore DSPy's prompt optimizer and the RL system built into the pipeline
- LangChain RL
  - Use LangChain's experimental RL chain for reinforcement learning tasks
- RL Fine-Tuning with the OpenAI API
  - Implement RL fine-tuning using OpenAI's API
- RL Fine-Tuning Applications
  - Apply RL fine-tuning for state-of-the-art email generation
  - Apply RL fine-tuning for summarization tasks
- RL Fine-Tuning with OpenPipe
  - Use OpenPipe for RL fine-tuning workflows
- DPO/PPO/GRPO Comparison
  - Compare Direct Preference Optimization (DPO), Proximal Policy Optimization (PPO), and Group Relative Policy Optimization (GRPO) approaches (the DPO objective is sketched below)
- Reinforcement Learning with Verifiable Rewards (RLVR)
  - Learn the RLVR methodology for training with verifiable reward signals
- Rubric-Based RL Systems
  - Explore rubric-based systems that guide RL at inference time for multi-step reasoning
- Training Agents to Control Web Browsers
  - Train agents to control web browsers with RL and imitation learning
- Exercises: RL Frameworks & Advanced Algorithms
  - Compare DSPy vs LangChain for building QA systems
  - Implement GRPO and RLVR algorithms
  - Build multi-turn agents with turn-level credit assignment
  - Create privacy-preserving multi-model systems (PAPILLON) with utility-privacy tradeoffs
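Since the lesson compares DPO against PPO and GRPO, a bare-bones version of the DPO objective may help. This is a sketch in plain PyTorch, assuming you already have summed log-probabilities of the chosen and rejected completions under the policy and a frozen reference model; the tensors at the bottom are toy values:

```python
# DPO objective on a batch of preference pairs, given precomputed
# sequence log-probs. beta controls the strength of the KL-like penalty.
import torch
import torch.nn.functional as F

def dpo_loss(pi_chosen, pi_rejected, ref_chosen, ref_rejected, beta=0.1):
    """Raise the implicit reward margin of chosen over rejected completions."""
    chosen_margin = beta * (pi_chosen - ref_chosen)
    rejected_margin = beta * (pi_rejected - ref_rejected)
    return -F.logsigmoid(chosen_margin - rejected_margin).mean()

logp = torch.randn(4)  # toy log-probabilities for a batch of 4 pairs
print(dpo_loss(logp, logp - 0.5, logp - 0.1, logp - 0.2))
```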

lesson

Intro to RL & RLHF

- Markov Processes as LLM Analogies
  - Frame token generation as a Markov Decision Process (MDP) with states, actions, and rewards
- Monte Carlo vs Temporal Difference Learning
  - Compare Monte Carlo episode-based learning with Temporal Difference updates, and their relevance to token-level prediction
- Q-Learning & Policy Gradients
  - Explore conceptual foundations of Q-learning and policy gradients as the basis of RLHF and preference optimization
- RL in Decoding and Chain-of-Thought
  - Apply RL ideas during inference without retraining, including CoT prompting with reward feedback and speculative decoding verification
- Exercises: RL Foundations with Neural Networks
  - Implement token generation as an MDP with policy and value networks
  - Compare Monte Carlo vs Temporal Difference learning for value estimation
  - Build Q-learning from tables to DQN with experience replay
  - Implement REINFORCE with baseline subtraction and entropy regularization (a minimal version is sketched below)
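As a companion to the exercises, here is a toy sketch of REINFORCE with a running-average baseline that frames generation as an MDP (state = previous token, action = next token). The vocabulary size, network shape, and odd-token reward are illustrative assumptions, and entropy regularization is omitted for brevity:

```python
# Toy REINFORCE with baseline subtraction over a tiny token policy.
import torch
import torch.nn as nn

VOCAB, HIDDEN = 32, 64
policy = nn.Sequential(nn.Embedding(VOCAB, HIDDEN), nn.Linear(HIDDEN, VOCAB))
opt = torch.optim.Adam(policy.parameters(), lr=1e-3)

def rollout(start=0, steps=8):
    """Sample a token sequence; state is just the previous token."""
    tokens, logps = [start], []
    for _ in range(steps):
        dist = torch.distributions.Categorical(
            logits=policy(torch.tensor([tokens[-1]]))
        )
        action = dist.sample()
        logps.append(dist.log_prob(action))
        tokens.append(int(action))
    return tokens, torch.cat(logps)

baseline = 0.0
for _ in range(200):
    tokens, logps = rollout()
    reward = float(sum(t % 2 for t in tokens))  # toy reward: count odd tokens
    baseline = 0.9 * baseline + 0.1 * reward    # running-average baseline
    loss = -(reward - baseline) * logps.sum()   # REINFORCE gradient estimator
    opt.zero_grad(); loss.backward(); opt.step()
```

Subtracting the baseline leaves the gradient estimate unbiased while reducing its variance, which is the same role the value network plays in the exercises.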

lesson

Advanced AI-Evals & Monitoring

- Advanced AI-Evals & Monitoring
  - Scale LLM-judge evaluation for bulk multimodal outputs
  - Build dashboards comparing judge accuracy vs IR metrics
  - Implement auto-gating of builds if accuracy drops below 95%
- Agent Failure Analysis Deep Dive
  - Create transition-state heatmaps and tool-state visualizations
  - Construct failure matrices with LLM classification
  - Develop systematic debugging workflows
- Enhancing RAG with Contextual Retrieval Recipes
  - Use Instructor-driven synthetic data (Anthropic GitHub)
  - Integrate web-search solutions (e.g., exa.ai)
  - Apply LogFire and Braintrust augmentations
  - Implement a Cohere reranker + advanced logging
- Advanced Synthetic & Statistical Validation
  - Generate persona-varied synthetic questions (angry/confused personas) and rewrite questions for better retrieval
  - Perform embedding-diversity checks and JSONL corpus structuring
  - Work with multi-vector databases
  - Build a parallel experimentation harness using ThreadPoolExecutor
- Strategic Feedback Collection
  - Collect different types of feedback; use binary feedback (thumbs up/down) instead of star ratings
  - Distinguish between two segment types: lack of data vs lack of capabilities
  - Address common but fixable capability issues
- Dynamic Prompting & Validation
  - Build a dynamic UI with chain-of-thought wrapping using XML or streaming
  - Incorporate validators with regex (e.g., catching fake emails generated by the LLM)
- Data Segmentation & Prioritization
  - Segment data based on patterns
  - Apply the Expected Value formula: Impact × Percentage of Queries × Probability of Success
- Topic Discovery with BERTopic
  - Configure and apply BERTopic for unsupervised topic discovery (a configuration sketch follows this list)
  - Set up the embedding model, UMAP, and HDBSCAN for effective clustering
  - Visualize topic similarities and relationships
  - Analyze satisfaction scores by topic to identify pain points
  - Create matrices showing the relationship between topics and satisfaction
  - Identify the "danger zone" of high-volume, low-satisfaction query areas
- Persona-Driven Synthetic Queries
  - Generate diverse queries (angry, curious, confused users) to stress-test retrieval and summarization pipelines
- Regex & Schema Validators for LLM Outputs
  - Add lightweight automated checks for emails, JSON formats, and other structural expectations
- Segmentation-Driven Summarization
  - Build summarization-specific chunks, integrate financial metadata, and compare with BM25 retrieval
- Failure-Type Segmentation
  - Classify failures into retrieval vs generation errors to guide improvement priorities
- Clustering Queries with BERTopic
  - Use UMAP + HDBSCAN to group user queries into semantically meaningful clusters
- Mapping Feedback to Topics
  - Overlay evaluator scores onto clusters to identify weak performance areas
- Danger Zone Heatmaps
  - Visualize query volume vs success rates to prioritize high-impact fixes
- Feedback-to-Reranker Loop
  - Build iterative reranking systems driven by topic segmentation and evaluation feedback
- Dynamic Prompting for Tool Selection
  - Teach LLMs to output structured tool calls reliably (JSON schema, guardrails, few-shots)
- Tool Disambiguation and Clarification Loops
  - Design prompts that force models to ask clarifying questions before executing
- XML-Based CoT Streaming for Agents
  - Output reasoning traces in a structured XML-like format for real-time dashboards or UIs
- Production-Grade Project
  - Deploy a full RAG + fine-tuned LLM service
  - Add multiple tools with RAG and implement tool routing
  - Include multimodal retrieval, function calling, an LLM-judge pipeline, and monitoring
  - Achieve ≥ 95% end-to-end task accuracy
- Exercises: AI Evaluation & Monitoring Pipeline
  - Build LLM-as-judge evaluation pipelines with accuracy dashboarding
  - Apply BERTopic for failure analysis and danger zone heatmaps
  - Generate persona-driven synthetic queries for stress-testing
  - Implement automated quality gates with statistical validation
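For the BERTopic workflow above, a minimal configuration sketch with an explicit embedding model, UMAP, and HDBSCAN. `load_user_queries` is a hypothetical helper standing in for reading logged queries, and the hyperparameters are common defaults rather than course-prescribed values:

```python
# BERTopic with explicit embedding, dimensionality-reduction, and
# clustering components, as described in the lesson outline.
from bertopic import BERTopic
from sentence_transformers import SentenceTransformer
from umap import UMAP
from hdbscan import HDBSCAN

queries = load_user_queries()  # hypothetical helper: pull queries from logs

topic_model = BERTopic(
    embedding_model=SentenceTransformer("all-MiniLM-L6-v2"),
    umap_model=UMAP(n_neighbors=15, n_components=5, metric="cosine"),
    hdbscan_model=HDBSCAN(min_cluster_size=10, prediction_data=True),
)
topics, _ = topic_model.fit_transform(queries)
print(topic_model.get_topic_info())  # topic sizes and keyword summaries
```

Joining the per-query topic assignments with satisfaction scores is then enough to build the topic-vs-satisfaction matrices and danger-zone heatmaps described above.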

lesson

Advanced RAG & Multimodal RAG

- Advanced RAG Reranker Training & Triplet Fundamentals
  - Learn contrastive-loss vs triplet-loss approaches for training retrievers
  - Understand bi-encoder vs cross-encoder performance trade-offs
  - Master triplet-loss fundamentals and semi-hard negative mining strategies (see the SBERT sketch below)
  - Fine-tune rerankers using the Cohere Rerank API & SBERT (sbert.net, Hugging Face)
- Multimodal & Metadata RAG
  - Index and query images, tables, and structured JSON using ColQwen-Omni (ColPali-based late interaction for audio, video, and visual documents)
  - Implement metadata filtering, short- vs long-term indices, and query routing logic
- Cartridges RAG Technique
  - Learn how Cartridges compress large corpora into small, trainable KV-cache structures for efficient retrieval (~39x less memory, ~26x faster)
  - Master the Self-Study training approach using synthetic Q&A and context distillation for generalized question answering
- Cartridge-Based Retrieval
  - Learn modular retrieval systems with topic-specific "cartridges" for precision memory routing
- Late Interaction Methods
  - Study architectures like ColQwen-Omni that combine multimodal (text, audio, image) retrieval using late-interaction fusion
- Multi-Vector vs Single-Vector Retrieval
  - Compare ColBERT/Turbopuffer vs FAISS, and understand trade-offs in granularity, accuracy, and inference cost
- Query Routing & Hybrid Memory Systems
  - Explore dynamic routing between lexical, dense, and multimodal indexes
- Loss Functions for Retriever Training
  - Compare contrastive loss vs triplet loss, and learn about semi-hard negative mining
- Reranker Tuning with SBERT or APIs
  - Fine-tune rerankers (SBERT, Cohere API), evaluate with MRR/nDCG, and integrate into retrieval loops
- Exercises: Advanced RAG Techniques
  - Implement triplet loss vs contrastive loss for reranker training with semi-hard negative mining
  - Build multimodal RAG systems with images, tables, and query routing
  - Compare single-vector (FAISS) vs multi-vector (ColBERT) retrieval
  - Create cartridge-based RAG with topic-specific memory routing
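A minimal sketch of the triplet-loss retriever finetuning described above, using sentence-transformers (SBERT). The single hand-written triplet and model choice are illustrative; in practice, semi-hard negative mining would select the negatives from a much larger pool:

```python
# Triplet-loss finetuning of an SBERT retriever on (anchor, positive,
# negative) examples. One toy triplet shown; real training needs thousands.
from torch.utils.data import DataLoader
from sentence_transformers import SentenceTransformer, InputExample, losses

model = SentenceTransformer("all-MiniLM-L6-v2")
triplets = [
    InputExample(texts=[
        "what is LoRA?",                                   # anchor query
        "LoRA adds low-rank adapters to frozen weights.",  # positive passage
        "LoRa is a long-range radio protocol.",            # hard negative
    ]),
]
loader = DataLoader(triplets, shuffle=True, batch_size=1)
loss = losses.TripletLoss(model=model)  # enforces d(a, p) + margin < d(a, n)
model.fit(train_objectives=[(loader, loss)], epochs=1, warmup_steps=0)
```

Evaluating the tuned model with MRR/nDCG on a held-out query set, as the lesson suggests, indicates whether the mined negatives actually sharpened the ranking.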

