Lessons

Explore all newline lessons

Real-world applications

- Summarization with RL
- Email quality tuning → reward-weighted phrase control (see the sketch below)
- Web agent fine-tuning → browser control with RL/IL
- Academic reference → Training Web Agents with RL + IL
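
To make “reward-weighted phrase control” concrete, here is a minimal sketch of a phrase-level reward function for email tuning. The phrases and weights are illustrative assumptions, not the lesson’s actual configuration; a score like this would feed a policy-gradient update as the reward signal.

```python
# Hypothetical phrase weights for email quality tuning; each generated email
# is scored by the weighted presence of target phrases.
PHRASE_WEIGHTS = {
    "thank you": 0.5,              # encourage polite openings
    "best regards": 0.3,           # encourage a proper sign-off
    "as per my last email": -1.0,  # discourage passive-aggressive phrasing
}

def phrase_reward(text: str) -> float:
    """Sum the weights of all target phrases found in the generated email."""
    lowered = text.lower()
    return sum(w for phrase, w in PHRASE_WEIGHTS.items() if phrase in lowered)

print(phrase_reward("Thank you for the update. Best regards, Ana"))  # 0.8
```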

Rubric-based reward systems (e.g., Will Brown’s system)

- Study how to guide LLMs with step-by-step rubrics at training and inference time
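
As a minimal sketch of the rubric idea, in the spirit of Will Brown’s public GRPO demos (the exact criteria below are assumptions): each rubric item is a small scoring function, and the reward is their sum.

```python
import re

def format_reward(completion: str) -> float:
    """Rubric item 1: reward a <think>...</think><answer>...</answer> layout."""
    pattern = r"<think>.*?</think>\s*<answer>.*?</answer>"
    return 0.5 if re.fullmatch(pattern, completion.strip(), re.DOTALL) else 0.0

def correctness_reward(completion: str, gold: str) -> float:
    """Rubric item 2: reward an exact-match answer inside the <answer> tag."""
    match = re.search(r"<answer>(.*?)</answer>", completion, re.DOTALL)
    return 1.0 if match and match.group(1).strip() == gold else 0.0

def rubric_reward(completion: str, gold: str) -> float:
    """Total reward is the sum of the rubric items."""
    return format_reward(completion) + correctness_reward(completion, gold)

print(rubric_reward("<think>2 + 2 = 4</think><answer>4</answer>", gold="4"))  # 1.5
```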

RL with OpenAI, LangChain, OpenPipe, DSPy

- Learn to apply RL/RLHF using production-grade frameworks

RL frameworks

- Use DSPy to define reusable prompts, reward conditions, and scoring logic in code
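
A minimal sketch of that pattern, assuming a recent DSPy version (the signature, metric, and `topic` field are illustrative): the signature is the reusable prompt, and the metric encodes the scoring logic an optimizer can then maximize.

```python
import dspy

# Assumes a language model has been configured, e.g.:
# dspy.configure(lm=dspy.LM("openai/gpt-4o-mini"))

class SummarizeEmail(dspy.Signature):
    """Summarize an email in one concise sentence."""
    email: str = dspy.InputField()
    summary: str = dspy.OutputField()

summarize = dspy.Predict(SummarizeEmail)  # reusable, typed prompt

def summary_metric(example, prediction, trace=None) -> float:
    """Scoring logic in code: keep the topic, stay short."""
    keeps_topic = example.topic.lower() in prediction.summary.lower()
    is_short = len(prediction.summary.split()) <= 25
    return float(keeps_topic) + 0.5 * float(is_short)

# The metric can then drive an optimizer, e.g.:
# tuned = dspy.BootstrapFewShot(metric=summary_metric).compile(summarize, trainset=trainset)
```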

DPO vs PPO vs GRPO vs RLVR

- Compare key fine-tuning strategies (a DPO loss sketch follows the list):
  - DPO (Direct Preference Optimization)
  - PPO (Proximal Policy Optimization)
  - GRPO (Group Relative Policy Optimization)
  - RLVR (Reinforcement Learning with Verifiable Rewards)
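
To ground the comparison, here is the standard DPO objective (Rafailov et al., 2023) as a short PyTorch sketch; unlike PPO or GRPO, it needs no sampled rollouts or reward model, only paired preferences and a frozen reference model.

```python
import torch.nn.functional as F

def dpo_loss(policy_chosen_logps, policy_rejected_logps,
             ref_chosen_logps, ref_rejected_logps, beta=0.1):
    """DPO: widen the policy's preference margin for the chosen response
    over the rejected one, relative to the frozen reference model.
    Inputs are per-sequence summed log-probabilities (tensors)."""
    policy_margin = policy_chosen_logps - policy_rejected_logps
    ref_margin = ref_chosen_logps - ref_rejected_logps
    return -F.logsigmoid(beta * (policy_margin - ref_margin)).mean()
```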

RLHF pipeline design - reward models vs judge models

- Learn to design a full Reinforcement Learning from Human Feedback (RLHF) pipeline
- Compare approaches that use:
  - Learned reward models (e.g., preference-trained scoring networks)
  - LLM-as-Judge models that assign scores without training a separate reward model (see the sketch below)
- Understand trade-offs in transparency, cost, and reproducibility
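
A minimal LLM-as-Judge sketch for the second approach, using the OpenAI chat API; the model name and rubric wording are assumptions, and a production pipeline would add retries and stricter output parsing.

```python
from openai import OpenAI

client = OpenAI()

def judge_score(prompt: str, response: str) -> float:
    """Ask a judge LLM for a 1-10 helpfulness score and parse the number."""
    reply = client.chat.completions.create(
        model="gpt-4o-mini",  # assumed judge model
        messages=[{
            "role": "user",
            "content": (
                "Rate the response to the prompt for helpfulness from 1 to 10. "
                "Reply with only the number.\n\n"
                f"Prompt: {prompt}\n\nResponse: {response}"
            ),
        }],
    )
    return float(reply.choices[0].message.content.strip())
```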

Mini-Project - Location + Temperature Multi-Tool Chaining

- Build a small agent that:
  - Extracts a user’s location from natural text
  - Calls a get_temperature(location) tool
  - Returns a formatted summary (e.g., “It’s 82°F in San Francisco right now.”)
- Add optional chaining logic (e.g., if temp > 85 → suggest outdoor activities; else → suggest indoor events), as in the sketch below
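
A runnable sketch of the whole chain under simplifying assumptions: the location extractor is a toy regex (a real build would use the LLM) and get_temperature is a stubbed tool rather than a live weather API.

```python
import re

def extract_location(text: str) -> str | None:
    """Toy extractor: grab a capitalized place name after 'in'."""
    match = re.search(r"\bin ([A-Z][a-zA-Z]*(?: [A-Z][a-zA-Z]*)*)", text)
    return match.group(1) if match else None

def get_temperature(location: str) -> int:
    """Stub tool; swap in a real weather API call."""
    return {"San Francisco": 82}.get(location, 70)

def answer(user_text: str) -> str:
    location = extract_location(user_text)
    if location is None:
        return "Which city are you asking about?"  # follow-up prompt
    temp = get_temperature(location)
    summary = f"It's {temp}°F in {location} right now."
    # Optional chaining logic from the lesson:
    if temp > 85:
        summary += " Great weather for outdoor activities."
    else:
        summary += " Maybe check out an indoor event."
    return summary

print(answer("What's the weather like in San Francisco?"))
```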

XML-based UI with CoT streaming

- Implement a UI layer (mock or real) where the model streams output step-by-step using CoT (Chain-of-Thought) reasoning wrapped in XML-style syntax
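
A mock of that streaming layer, with the tag names and reasoning steps as illustrative assumptions; a real implementation would yield chunks from the model’s streaming API instead of a canned list.

```python
import time
from typing import Iterator

def stream_cot(question: str) -> Iterator[str]:
    """Mock model stream: XML-wrapped CoT steps, then the final answer."""
    steps = ["Parse the question", "Recall relevant facts", "Compose the answer"]
    yield "<response>\n"
    for step in steps:
        yield f"  <thought>{step}</thought>\n"
        time.sleep(0.1)  # simulate token latency
    yield f"  <answer>(final answer to: {question})</answer>\n"
    yield "</response>\n"

# A UI layer renders each chunk as it arrives:
for chunk in stream_cot("Why is the sky blue?"):
    print(chunk, end="", flush=True)
```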

Tool disambiguation & follow-up prompt chains

- Handle ambiguous or incomplete inputs by prompting the LLM to ask clarifying questions or reroute the query
- Build workflows where the model decides whether to:
  - Run a tool immediately
  - Request more info
  - Pass context to a secondary tool or summarizer
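
A minimal sketch of that three-way decision. The routing is stubbed with keyword checks for illustration; in the lesson’s setting the LLM itself would emit one of these actions as structured output.

```python
from dataclasses import dataclass

@dataclass
class Decision:
    action: str  # "run_tool" | "ask_user" | "handoff"
    detail: str

def route(query: str, has_location: bool) -> Decision:
    """Stub router; an LLM would make this choice from conversation state."""
    if "weather" in query.lower() and has_location:
        return Decision("run_tool", "get_temperature")  # run immediately
    if "weather" in query.lower():
        return Decision("ask_user", "Which city do you mean?")  # request more info
    return Decision("handoff", "summarizer")  # pass context onward

print(route("What's the weather?", has_location=False))
```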

Dynamic prompt engineering for tool triggering

- Learn to design prompts that guide the LLM to select and execute tools conditionally
- Use schema hints, guardrails, and examples to make the LLM reliably output structured tool-calling requests
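
One concrete form of a “schema hint” is the OpenAI tool-calling format, shown below as a minimal sketch; the model name and tool definition are assumptions, and other providers have equivalent structured-output mechanisms.

```python
from openai import OpenAI

client = OpenAI()

# The JSON Schema acts as the schema hint that constrains the model's output.
tools = [{
    "type": "function",
    "function": {
        "name": "get_temperature",
        "description": "Get the current temperature for a city.",
        "parameters": {
            "type": "object",
            "properties": {"location": {"type": "string"}},
            "required": ["location"],
        },
    },
}]

reply = client.chat.completions.create(
    model="gpt-4o-mini",  # assumed model
    messages=[{"role": "user", "content": "How hot is it in Austin?"}],
    tools=tools,
)
call = reply.choices[0].message.tool_calls[0]
print(call.function.name, call.function.arguments)  # get_temperature {"location": "Austin"}
```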

Build - segment, evaluate, rerank → reranker training loop

- Use the identified weak topics to:
  - Segment: label relevant queries/chunks
  - Evaluate: add gold or synthetic judgments
  - Rerank: use Cohere or SBERT to rerank outputs
  - Train: fine-tune a reranker on these topic-specific weak areas
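
A minimal sketch of the rerank step with an SBERT cross-encoder (Cohere’s rerank API is the hosted alternative); the checkpoint is a common public model and the query/chunks are illustrative. The train step would then call CrossEncoder.fit() on labeled (query, chunk, relevance) pairs drawn from the weak topics.

```python
from sentence_transformers import CrossEncoder

# Public MS MARCO cross-encoder checkpoint, used here purely for illustration.
model = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")

query = "How do I rotate an API key?"
chunks = [
    "API keys can be rotated from the dashboard's security tab.",
    "Our pricing tiers are listed on the billing page.",
]

# Score each (query, chunk) pair and sort candidates by relevance.
scores = model.predict([(query, chunk) for chunk in chunks])
for score, chunk in sorted(zip(scores, chunks), reverse=True):
    print(f"{score:.3f}  {chunk}")
```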