Lessons

lesson

Real-world applications

- Summarization with RL
- Email quality tuning with reward-weighted phrase control (see the sketch after this list)
- Web agent fine-tuning: browser control with RL/IL
- Academic reference: Training Web Agents with RL + IL
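
To make "reward-weighted phrase control" concrete, here is a minimal sketch of one way such a reward could be computed; the phrase lists and weights are illustrative assumptions, not the reward actually used in the lesson:

```python
# Illustrative sketch of reward-weighted phrase control for email tuning.
# The phrases and weights below are assumptions for demonstration only.

PHRASE_WEIGHTS = {
    "please": +0.5,           # reward polite phrasing
    "thank you": +0.5,
    "asap": -1.0,             # penalize brusque phrasing
    "per my last email": -2.0,
}

def phrase_reward(email: str, base_reward: float = 0.0) -> float:
    """Add weighted bonuses/penalties for target phrases to a base reward."""
    text = email.lower()
    return base_reward + sum(
        weight * text.count(phrase) for phrase, weight in PHRASE_WEIGHTS.items()
    )

print(phrase_reward("Thank you for the update. Please review ASAP."))
# 0.5 + 0.5 - 1.0 = 0.0
```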

lesson

Rubric-based reward systems (e.g., Will Brown’s system)

- Study how to guide LLMs with step-by-step rubrics at training and inference time
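
As a preview, here is a minimal sketch of a rubric-based reward, assuming a rubric of weighted programmatic checks (an illustration, not Will Brown's actual system):

```python
# Each rubric item is a named check with a weight; the reward is the
# weight-normalized sum of the checks the model's answer passes.
from typing import Callable

Rubric = list[tuple[str, float, Callable[[str], bool]]]

MATH_RUBRIC: Rubric = [
    ("shows step-by-step work", 1.0, lambda ans: "step" in ans.lower()),
    ("states a final answer",   2.0, lambda ans: "answer:" in ans.lower()),
    ("stays concise",           0.5, lambda ans: len(ans.split()) < 200),
]

def rubric_reward(answer: str, rubric: Rubric) -> float:
    """Score an answer in [0, 1] against the rubric."""
    total = sum(weight for _, weight, _ in rubric)
    earned = sum(weight for _, weight, check in rubric if check(answer))
    return earned / total

print(rubric_reward("Step 1: ... Step 2: ... Answer: 42", MATH_RUBRIC))  # 1.0
```

The same rubric text can be shown to the model as instructions at inference time, while the numeric score drives the RL update at training time.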

lesson

RL with OpenAI, LangChain, OpenPipe, DSPy

- Learn to apply RL/RLHF using production-grade frameworks

lesson

Frameworks for RL

- Use DSPy to define reusable prompts, reward conditions, and scoring logic in code
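
For a taste of what this looks like, the sketch below defines a reusable prompt signature and a scoring metric in DSPy. The signature, politeness check, and model name are illustrative assumptions, and the API names follow recent DSPy releases (check the docs for your installed version):

```python
import dspy

# Point DSPy at an LLM (the model name is a placeholder).
dspy.configure(lm=dspy.LM("openai/gpt-4o-mini"))

class EmailRewrite(dspy.Signature):
    """Rewrite a draft email to be concise and polite."""
    draft: str = dspy.InputField()
    rewrite: str = dspy.OutputField()

rewriter = dspy.Predict(EmailRewrite)  # reusable, typed prompt module

def quality_metric(example, pred, trace=None):
    """Scoring logic in code: reward shorter, polite rewrites."""
    shorter = len(pred.rewrite) < len(example.draft)
    polite = any(p in pred.rewrite.lower() for p in ("please", "thanks", "thank you"))
    return float(shorter and polite)

# The metric can then drive a DSPy optimizer, e.g.:
# optimized = dspy.BootstrapFewShot(metric=quality_metric).compile(rewriter, trainset=trainset)
```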

lesson

DPO vs PPO vs GRPO vs RLVR

- Compare key fine-tuning strategies:
  - DPO (Direct Preference Optimization)
  - PPO (Proximal Policy Optimization)
  - GRPO (Group Relative Policy Optimization)
  - RLVR (Reinforcement Learning with Verifiable Rewards)
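
Of the four, DPO is the easiest to show in a few lines, because it needs no rollouts or separate reward model, only summed token log-probabilities of each response under the policy being trained and a frozen reference model. This is the standard DPO objective (PyTorch here; the β value is arbitrary):

```python
import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Mean DPO loss: -log sigmoid(beta * (chosen margin - rejected margin))."""
    chosen_margin = policy_chosen_logp - ref_chosen_logp        # how much more the policy
    rejected_margin = policy_rejected_logp - ref_rejected_logp  # likes each answer than ref
    return -F.logsigmoid(beta * (chosen_margin - rejected_margin)).mean()

# Toy batch: the policy already prefers the chosen answer, so the loss is small.
loss = dpo_loss(torch.tensor([-10.0]), torch.tensor([-20.0]),
                torch.tensor([-12.0]), torch.tensor([-12.0]))
print(loss)  # -log(sigmoid(0.1 * (2 - (-8)))) ≈ 0.3133
```

PPO and GRPO instead optimize over sampled rollouts (GRPO normalizing rewards within a group of samples per prompt), and RLVR swaps the learned reward for a programmatic verifier.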

lesson

RLHF pipeline design - reward models vs judge models

- Learn to design a full Reinforcement Learning from Human Feedback (RLHF) pipeline
- Compare approaches that use:
  - Learned reward models (e.g., preference-trained scoring networks)
  - LLM-as-Judge models that assign scores without training a separate reward model
- Understand the trade-offs in transparency, cost, and reproducibility
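
To contrast the two approaches, here is a minimal LLM-as-Judge scorer using the OpenAI client; the judge model name and prompt are illustrative assumptions. A learned reward model would replace the API call with a forward pass through a preference-trained scoring network:

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def judge_score(prompt: str, response: str) -> float:
    """Ask a judge LLM for a 1-10 helpfulness score, normalized to [0, 1]."""
    result = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder judge model
        temperature=0,
        messages=[
            {"role": "system",
             "content": "Rate the response's helpfulness from 1 to 10. Reply with only the number."},
            {"role": "user", "content": f"Prompt:\n{prompt}\n\nResponse:\n{response}"},
        ],
    )
    return int(result.choices[0].message.content.strip()) / 10.0
```

The trade-off in miniature: the judge needs no preference-data training run, but every score costs an API call and is harder to reproduce than a frozen reward model's output.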

lesson

Mini-Project - Location + Temperature Multi-Tool Chaining

- Build a small agent that:
  - Extracts a user's location from natural text
  - Calls a get_temperature(location) tool
  - Returns a formatted summary (e.g., "It's 82°F in San Francisco right now.")
- Add optional chaining logic (e.g., if temp > 85 → suggest activities; else → suggest indoor events); see the sketch below
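
A runnable sketch of how these pieces could fit together, with get_temperature stubbed out (a real version would call a weather API) and a deliberately naive regex standing in for LLM-based location extraction:

```python
import re

def extract_location(text: str) -> str | None:
    """Naive stand-in for LLM extraction: grab capitalized words after 'in'."""
    match = re.search(r"\bin ([A-Z][a-zA-Z]+(?: [A-Z][a-zA-Z]+)*)", text)
    return match.group(1) if match else None

def get_temperature(location: str) -> float:
    """Stub tool; a real version would call a weather API."""
    return {"San Francisco": 82.0, "Seattle": 61.0}.get(location, 70.0)

def run_agent(user_text: str) -> str:
    location = extract_location(user_text)
    if location is None:
        return "Sorry, I couldn't find a location in your message."
    temp = get_temperature(location)  # tool call
    summary = f"It's {temp:.0f}°F in {location} right now."
    # Optional chaining step from the project spec.
    if temp > 85:
        summary += " Great weather for activities!"
    else:
        summary += " Maybe check out some indoor events."
    return summary

print(run_agent("What's the weather in San Francisco?"))
# It's 82°F in San Francisco right now. Maybe check out some indoor events.
```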