The Future of Software Engineering and AI: What You Can Do About It
Real-world applications
- Summarization with RL
- Email quality tuning → reward-weighted phrase control
- Web agent fine-tuning → browser control with RL/IL
- Academic reference → Training Web Agents with RL + IL
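To make the email-tuning idea concrete, here is a minimal sketch of reward-weighted phrase control: the reward sums signed weights for target phrases found in a generated email. The phrase list and weights are illustrative assumptions, not taken from the lesson.

```python
# A minimal sketch of reward-weighted phrase control for email quality tuning.
# The phrases and weights below are illustrative assumptions.

PHRASE_WEIGHTS = {
    "thanks for your patience": 1.0,   # courteous phrasing earns positive reward
    "please let me know": 0.5,
    "as per my last email": -2.0,      # passive-aggressive phrasing is penalized
}

def email_reward(text: str) -> float:
    """Score a generated email by summing the weights of target phrases it contains."""
    lowered = text.lower()
    return sum(w for phrase, w in PHRASE_WEIGHTS.items() if phrase in lowered)

print(email_reward("Thanks for your patience! Please let me know if this helps."))  # 1.5
```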
Rubric-based reward systems (e.g., Will Brown’s system)
- Study how to guide LLMs with step-by-step rubrics at training and inference time
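As a rough illustration of the idea, a rubric-based reward checks each criterion independently and returns the weighted fraction satisfied. The rubric items below are illustrative assumptions, not Will Brown’s actual rubric.

```python
# A minimal sketch of a rubric-based reward: each criterion is checked
# independently; the reward is the weighted share of criteria satisfied.
# These rubric items and weights are illustrative assumptions.

RUBRIC = [
    ("cites at least one source", lambda ans: "http" in ans or "[1]" in ans, 1.0),
    ("states a final answer",     lambda ans: "final answer:" in ans.lower(), 2.0),
    ("stays under 200 words",     lambda ans: len(ans.split()) <= 200, 1.0),
]

def rubric_reward(answer: str) -> float:
    """Return a reward in [0, 1]: weighted fraction of rubric criteria the answer meets."""
    total = sum(w for _, _, w in RUBRIC)
    earned = sum(w for _, check, w in RUBRIC if check(answer))
    return earned / total

print(rubric_reward("Final answer: 42, per [1]."))  # 1.0 — all three criteria pass
```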
RL with OpenAI, LangChain, OpenPipe, DSPy
- Learn to apply RL/RLHF using production-grade frameworks
RL frameworks in code
- Use DSPy to define reusable prompts, reward conditions, and scoring logic in code (see the sketch below)
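Here is a minimal DSPy sketch of that pattern: a signature defines the reusable prompt, and a metric function doubles as the reward condition. The model name and the length/keyword checks are illustrative assumptions.

```python
# A minimal sketch with DSPy (https://dspy.ai): a reusable prompt signature
# plus a metric that serves as a reward condition. Model choice and the
# specific checks below are illustrative assumptions.

import dspy

dspy.configure(lm=dspy.LM("openai/gpt-4o-mini"))  # assumed model; any LM works

class Summarize(dspy.Signature):
    """Summarize the document in one sentence."""
    document: str = dspy.InputField()
    summary: str = dspy.OutputField()

summarize = dspy.Predict(Summarize)  # reusable prompt module

def summary_reward(example, pred, trace=None):
    """DSPy-style metric: 1.0 if the summary is short and keeps the key term."""
    short = len(pred.summary.split()) <= 30
    on_topic = example.key_term.lower() in pred.summary.lower()
    return float(short and on_topic)
```

A metric with this `(example, pred, trace=None)` shape is what DSPy optimizers consume, so the same function can score outputs at training time and filter them at inference time.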
DPO vs PPO vs GRPO vs RLVR
- Compare key fine-tuning strategies:
  - DPO (Direct Preference Optimization)
  - PPO (Proximal Policy Optimization)
  - GRPO (Group Relative Policy Optimization)
  - RLVR (Reinforcement Learning with Verifiable Rewards)
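Of the four, DPO is the most compact to write down. A minimal PyTorch sketch of the DPO loss follows, assuming you already have summed token log-probabilities for the chosen and rejected responses under both the policy and a frozen reference model; the inputs in the toy call are fabricated for shape-checking only.

```python
# A minimal sketch of the DPO loss (Rafailov et al., 2023).
# Inputs are per-example summed log-probs; beta controls KL pressure.

import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_lp, policy_rejected_lp,
             ref_chosen_lp, ref_rejected_lp, beta=0.1):
    """-log sigmoid(beta * ((pi_w - ref_w) - (pi_l - ref_l))), averaged over the batch."""
    logits = (policy_chosen_lp - ref_chosen_lp) - (policy_rejected_lp - ref_rejected_lp)
    return -F.logsigmoid(beta * logits).mean()

# Toy usage with made-up log-probs, purely to show the call shape:
lp = lambda *v: torch.tensor(v)
print(dpo_loss(lp(-12.0), lp(-15.0), lp(-13.0), lp(-14.0)))
```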
RLHF pipeline design: reward models vs judge models
- Learn to design a full Reinforcement Learning from Human Feedback (RLHF) pipeline
- Compare approaches that use:
  - Learned reward models (e.g., preference-trained scoring networks)
  - LLM-as-judge models that assign scores without a separate trained model
- Understand the trade-offs in transparency, cost, and reproducibility
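The contrast between the two scoring routes can be shown in a few lines. Both interfaces below are illustrative assumptions (`llm_call` is a hypothetical callable returning the judge’s text); a real pipeline would batch, cache, and calibrate scores.

```python
# A minimal sketch contrasting a learned reward model with an LLM-as-judge.
# Both interfaces are illustrative assumptions, not a specific library's API.

import torch
import torch.nn as nn

class TinyRewardModel(nn.Module):
    """Learned reward model: maps a response embedding to a scalar score."""
    def __init__(self, dim=768):
        super().__init__()
        self.head = nn.Linear(dim, 1)

    def forward(self, embedding):
        return self.head(embedding).squeeze(-1)

JUDGE_PROMPT = (
    "Rate the assistant response from 1 to 10 for helpfulness.\n"
    "Reply with only the number.\n\nResponse:\n{response}"
)

def judge_score(llm_call, response: str) -> float:
    """LLM-as-judge: no trained head; the score comes from a prompted model."""
    return float(llm_call(JUDGE_PROMPT.format(response=response)))
```

The reward model gives reproducible, cheap scores but needs preference data and training; the judge needs no training but costs an LLM call per score and is harder to audit.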
Mini-Project: Location + Temperature Multi-Tool Chaining
- Build a small agent that:
  - Extracts a user’s location from natural text
  - Calls a get_temperature(location) tool
  - Returns a formatted summary (e.g., “It’s 82°F in San Francisco right now.”)
- Add optional chaining logic (e.g., if temp > 85 → suggest activities; else → suggest indoor events)
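A minimal end-to-end sketch of this project follows. Here `get_temperature` is a stub standing in for a real weather tool, and the regex-based location extraction is an illustrative assumption (a production agent would use an LLM tool call instead).

```python
# A minimal sketch of the mini-project: extract location → tool call → summary,
# with the optional chaining step. get_temperature is stubbed with fake data.

import re

def get_temperature(location: str) -> int:
    """Stubbed tool: return a fake Fahrenheit reading for the location."""
    return {"san francisco": 82, "phoenix": 101}.get(location.lower(), 70)

def extract_location(text: str) -> str | None:
    """Naive extraction: grab the words after 'in' (e.g., 'weather in Phoenix')."""
    match = re.search(r"\bin ([A-Za-z ]+)", text)
    return match.group(1).strip() if match else None

def run_agent(user_text: str) -> str:
    location = extract_location(user_text)
    if location is None:
        return "Sorry, I couldn't find a location in your message."
    temp = get_temperature(location)  # tool call
    summary = f"It's {temp}°F in {location.title()} right now."
    # Optional chaining logic from the project spec:
    if temp > 85:
        return summary + " Maybe plan some activities for later."
    return summary + " A good day for indoor events."

print(run_agent("What's the weather in San Francisco?"))
# It's 82°F in San Francisco right now. A good day for indoor events.
```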