lesson
Real-world applications
- Summarization with RL
- Email quality tuning → reward-weighted phrase control (see the reward sketch below)
- Web agent fine-tuning → browser control with RL/IL
- Academic reference → Training Web Agents with RL + IL
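As a rough illustration of what reward-weighted phrase control might look like for email-quality tuning, here is a minimal sketch; the phrase list, weights, and length penalty are assumptions made up for the example, not the course's actual reward.

```python
# Illustrative reward for email-quality tuning: reward desired phrases,
# penalize undesired ones. Phrases and weights are assumptions for this sketch.
PHRASE_WEIGHTS = {
    "thank you": 0.5,               # polite closings are rewarded
    "as per my last email": -1.0,   # passive-aggressive phrasing is penalized
    "asap": -0.3,
}

def email_reward(draft: str) -> float:
    """Score an email draft by summing the weights of phrases it contains."""
    text = draft.lower()
    score = sum(w for phrase, w in PHRASE_WEIGHTS.items() if phrase in text)
    # Mild length penalty so the policy cannot pad drafts to farm phrase rewards.
    score -= 0.001 * max(0, len(text) - 800)
    return score

if __name__ == "__main__":
    print(email_reward("Thank you for the update. As per my last email, please reply ASAP."))
```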
lesson
Rubric-based reward systems (e.g., Will Brown’s system)
- Study how to guide LLMs with step-by-step rubrics at training and inference time
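A minimal sketch of a rubric-based reward, assuming a hand-written rubric where each criterion is checked by plain code; in practice each check could itself be an LLM call, and the specific criteria below are illustrative only.

```python
# A rubric scored item-by-item; each satisfied criterion contributes to the reward.
# Rubric items and keyword-based checks are assumptions for illustration.
RUBRIC = [
    ("states the final answer explicitly", lambda out: "answer:" in out.lower()),
    ("shows intermediate steps",           lambda out: out.count("\n") >= 2),
    ("stays under 200 words",              lambda out: len(out.split()) <= 200),
]

def rubric_reward(output: str) -> float:
    """Return the fraction of rubric criteria the output satisfies (0.0 to 1.0)."""
    passed = sum(1 for _, check in RUBRIC if check(output))
    return passed / len(RUBRIC)

if __name__ == "__main__":
    sample = "Step 1: add the numbers.\nStep 2: divide by two.\nAnswer: 7"
    print(rubric_reward(sample))  # 1.0 when all three criteria pass
```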
lesson
RL with OpenAI, LangChain, OpenPipe, and DSPy
- Learn to apply RL/RLHF using production-grade frameworks
lesson
RL frameworks
- Use DSPy to define reusable prompts, reward conditions, and scoring logic in code (see the DSPy sketch below)
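A minimal DSPy sketch of this idea, assuming DSPy's `dspy.LM`/`dspy.configure` interface and an OpenAI key in the environment; the signature, model name, and scoring criteria are assumptions for illustration.

```python
import dspy

# Assumed model name; swap in whatever LM you have configured.
dspy.configure(lm=dspy.LM("openai/gpt-4o-mini"))

class SummarizeEmail(dspy.Signature):
    """Summarize an email thread in one polite sentence."""
    email_thread = dspy.InputField()
    summary = dspy.OutputField()

# Reusable prompt logic defined in code rather than as a raw prompt string.
summarize = dspy.ChainOfThought(SummarizeEmail)

def reward_metric(example, pred, trace=None) -> float:
    """Scoring logic as plain code: reward short, polite summaries.
    The criteria here are illustrative assumptions."""
    text = pred.summary.lower()
    score = 0.5 if len(text.split()) <= 30 else 0.0
    score += 0.5 if ("please" in text or "thanks" in text) else 0.0
    return score

if __name__ == "__main__":
    pred = summarize(email_thread="Bob asked twice about the Q3 report deadline.")
    print(pred.summary, reward_metric(None, pred))
```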
lesson
DPO vs PPO vs GRPO vs RLVR
- Compare key fine-tuning strategies (a DPO loss sketch follows below):
  - DPO (Direct Preference Optimization)
  - PPO (Proximal Policy Optimization)
  - GRPO (Group Relative Policy Optimization)
  - RLVR (Reinforcement Learning with Verifiable Rewards)
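To ground the comparison, here is the standard DPO objective as a short PyTorch sketch; roughly speaking, PPO, GRPO, and RLVR differ mainly in how the reward or advantage signal is produced rather than in this pairwise form.

```python
import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logps, policy_rejected_logps,
             ref_chosen_logps, ref_rejected_logps, beta: float = 0.1):
    """Direct Preference Optimization loss.
    Inputs are summed log-probs of the chosen/rejected responses under the
    policy and the frozen reference model (one value per preference pair)."""
    chosen_rewards = beta * (policy_chosen_logps - ref_chosen_logps)
    rejected_rewards = beta * (policy_rejected_logps - ref_rejected_logps)
    # Maximize the margin between chosen and rejected implicit rewards.
    return -F.logsigmoid(chosen_rewards - rejected_rewards).mean()

if __name__ == "__main__":
    # Toy tensors standing in for real per-response log-probabilities.
    loss = dpo_loss(torch.tensor([-12.0]), torch.tensor([-15.0]),
                    torch.tensor([-13.0]), torch.tensor([-14.0]))
    print(loss.item())
```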
lesson
RLHF pipeline design: reward models vs judge models
- Learn to design a full Reinforcement Learning from Human Feedback (RLHF) pipeline
- Compare approaches that use:
  - Learned reward models (e.g., preference-trained scoring networks)
  - LLM-as-Judge models that assign scores without a separately trained reward model
- Understand trade-offs in transparency, cost, and reproducibility
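A minimal LLM-as-Judge sketch, assuming the OpenAI Python SDK (v1+) with an API key in the environment; the model name and the 1-5 helpfulness rubric are placeholders for illustration.

```python
from openai import OpenAI  # assumes openai>=1.0 and OPENAI_API_KEY set

client = OpenAI()

JUDGE_PROMPT = (
    "You are a strict grader. Rate the RESPONSE to the PROMPT on a 1-5 scale "
    "for helpfulness. Reply with a single integer only.\n\n"
    "PROMPT:\n{prompt}\n\nRESPONSE:\n{response}"
)

def judge_score(prompt: str, response: str, model: str = "gpt-4o-mini") -> int:
    """LLM-as-Judge: no separately trained reward model, just a scoring prompt."""
    completion = client.chat.completions.create(
        model=model,
        messages=[{"role": "user",
                   "content": JUDGE_PROMPT.format(prompt=prompt, response=response)}],
        temperature=0,
    )
    return int(completion.choices[0].message.content.strip())

if __name__ == "__main__":
    print(judge_score("What is 2 + 2?", "It is 4."))
```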
lesson
Mini-Project: Location + Temperature Multi-Tool Chaining
- Build a small agent that:
  - Extracts a user’s location from natural text
  - Calls a get_temperature(location) tool
  - Returns a formatted summary (e.g., “It’s 82°F in San Francisco right now.”)
- Add optional chaining logic (e.g., if temp > 85 → suggest activities; else → suggest indoor events); a skeleton follows below
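A possible skeleton for the mini-project, with the weather tool stubbed out and a deliberately naive location extractor standing in for the LLM step; the canned temperatures and response wording are assumptions.

```python
import re

# Stubbed tool: a real version would call a weather API. The canned values
# and the regex-based location extraction are assumptions for this sketch.
FAKE_WEATHER = {"san francisco": 82, "phoenix": 104, "seattle": 61}

def get_temperature(location: str) -> int:
    """Return the current temperature in °F for a location (stubbed)."""
    return FAKE_WEATHER.get(location.lower(), 72)

def extract_location(text: str) -> str:
    """Very naive location extraction; the lesson would use an LLM for this step."""
    match = re.search(r"in ([A-Za-z ]+?)(?:\?|\.|$)", text)
    return match.group(1).strip() if match else "San Francisco"

def answer(user_text: str) -> str:
    location = extract_location(user_text)
    temp_f = get_temperature(location)  # tool call
    summary = f"It's {temp_f}°F in {location.title()} right now."
    # Optional chaining step, mirroring the lesson's branch on 85°F.
    if temp_f > 85:
        summary += " Suggestion: plan some activities."
    else:
        summary += " Suggestion: check out indoor events."
    return summary

if __name__ == "__main__":
    print(answer("What's the weather like in San Francisco?"))
```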
lesson
XML-based UI with CoT streaming
- Implement a UI layer (mock or real) where the model streams output step-by-step, with CoT (Chain-of-Thought) reasoning wrapped in XML-style syntax
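One way this could look: a toy renderer that consumes a simulated token stream and routes XML-wrapped CoT steps to a reasoning pane and the final answer to the main pane. The `<step>`/`<answer>` tag names are assumptions; any consistent schema works.

```python
import re
import time

# Simulated model output: CoT steps wrapped in XML-style tags, streamed in chunks.
STREAM = [
    "<step>Identify the city ", "in the question.</step>",
    "<step>Call the weather ", "tool for that city.</step>",
    "<answer>It's 82°F in San ", "Francisco right now.</answer>",
]

def render_stream(chunks):
    """Toy UI layer: route <step> content to a reasoning pane and <answer>
    content to the main pane as chunks arrive."""
    buffer = ""
    for chunk in chunks:
        buffer += chunk
        # Emit every complete tag seen so far, keep any partial remainder.
        while (m := re.search(r"<(step|answer)>(.*?)</\1>", buffer, re.S)):
            tag, text = m.group(1), m.group(2)
            pane = "[reasoning]" if tag == "step" else "[answer]  "
            print(pane, text)
            buffer = buffer[m.end():]
        time.sleep(0.05)  # simulate network latency between chunks

if __name__ == "__main__":
    render_stream(STREAM)
```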
lesson
Tool disambiguation & follow-up prompt chains
- Handle ambiguous or incomplete inputs by prompting the LLM to ask clarifying questions or reroute the query
- Build workflows where the model decides whether to:
  - Run a tool immediately
  - Request more info
  - Pass context to a secondary tool or summarizer
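A sketch of the routing step, assuming the model replies with a small JSON decision (`run_tool`, `ask_user`, or `summarize`); the decision labels and the simulated reply are illustrative, not a fixed protocol from the course.

```python
import json

# The routing instruction an LLM would see; the action labels are assumptions.
ROUTER_PROMPT = """Decide how to handle the user's request. Reply with JSON:
{"action": "run_tool" | "ask_user" | "summarize", "reason": "..."}"""

def route(model_reply: str) -> str:
    """Parse the model's routing decision and dispatch accordingly.
    `model_reply` stands in for a real LLM response to ROUTER_PROMPT."""
    decision = json.loads(model_reply)
    action = decision["action"]
    if action == "run_tool":
        return "Running tool with the provided arguments..."
    if action == "ask_user":
        return "Which city do you mean?"  # clarifying follow-up question
    if action == "summarize":
        return "Passing context to the summarizer..."
    raise ValueError(f"Unknown action: {action}")

if __name__ == "__main__":
    # Simulated model decision for an ambiguous query like "What's the weather?"
    print(route('{"action": "ask_user", "reason": "no location given"}'))
```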
lesson
Dynamic prompt engineering for tool triggering
- Learn to design prompts that guide the LLM to select and execute tools conditionally
- Use schema hints, guardrails, and examples to make the LLM reliably output structured tool-calling requests
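A sketch of schema-hinted prompting plus a parsing guardrail; the tool name, schema, and simulated reply are assumptions carried over from the earlier mini-project.

```python
import json

# Schema hint embedded in the prompt so the model emits a structured tool call.
TOOL_SCHEMA = {
    "name": "get_temperature",
    "arguments": {"location": "string"},
}

SYSTEM_PROMPT = f"""You can call tools. When a tool is needed, reply ONLY with JSON
matching this schema: {json.dumps(TOOL_SCHEMA)}
Example: {{"name": "get_temperature", "arguments": {{"location": "Seattle"}}}}
If no tool is needed, answer normally in plain text."""

def parse_tool_call(model_reply: str):
    """Guardrail: accept the reply only if it is valid JSON with the expected keys."""
    try:
        call = json.loads(model_reply)
    except json.JSONDecodeError:
        return None  # plain-text answer, no tool triggered
    if not isinstance(call, dict):
        return None
    if call.get("name") == TOOL_SCHEMA["name"] and "location" in call.get("arguments", {}):
        return call
    return None

if __name__ == "__main__":
    print(SYSTEM_PROMPT)
    # Simulated model reply that should trigger the tool.
    print(parse_tool_call('{"name": "get_temperature", "arguments": {"location": "Phoenix"}}'))
```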
lesson
Build: segment, evaluate, rerank → reranker training loop
- Use the identified weak topics to:
  - Segment: label relevant queries/chunks
  - Evaluate: add gold or synthetic judgments
  - Rerank: use Cohere or SBERT to rerank outputs
  - Train: fine-tune a reranker on these topic-specific weak areas (sketched below)
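A compressed sketch of the segment → evaluate → rerank → train loop using an SBERT cross-encoder; the checkpoint name, toy labels, and two-example training set are assumptions, and a real run would use far more labeled pairs.

```python
from torch.utils.data import DataLoader
from sentence_transformers import CrossEncoder, InputExample

# Assumed checkpoint; any cross-encoder reranker works here.
model = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")

# Segment + Evaluate: (query, chunk) pairs from a weak topic, with
# gold/synthetic relevance labels (1 = relevant, 0 = not). Toy data below.
weak_topic_pairs = [
    InputExample(texts=["reset 2FA", "Steps to reset two-factor authentication"], label=1.0),
    InputExample(texts=["reset 2FA", "How to change your billing address"], label=0.0),
]

# Rerank: score candidate chunks for a query and sort by score.
query = "reset 2FA"
candidates = ["How to change your billing address",
              "Steps to reset two-factor authentication"]
scores = model.predict([(query, c) for c in candidates])
print(sorted(zip(candidates, scores), key=lambda x: -x[1]))

# Train: fine-tune the reranker on the topic-specific weak areas.
loader = DataLoader(weak_topic_pairs, shuffle=True, batch_size=2)
model.fit(train_dataloader=loader, epochs=1, warmup_steps=0)
```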