The Future Of Software engineering and AI: What YOU can do about it

Webinar starts in

00DAYS
:
00HRS
:
00MINS
:
00SEC
Join the Webinar
Tags
    Author
      Technology
        Rating
        Pricing
        Sort By
        Video
        Results To Show
        Most Recent
        Most Popular
        Highest Rated
        Reset

        lesson

        Real-world applications

        - Summarization with RL - Email quality tuning → with reward-weighted phrase control - Web agent fine-tuning → Browser control with RL/IL - Academic reference → Training Web Agents with RL + IL

        lesson

        Rubric-based reward systems (e.g., Will Brown’s system)

        - Study how to guide LLMs with step-by-step rubrics at training and inference time

        lesson

        RL with OpenAI, LangChain, OpenPipe, DSPy

        - Learn to apply RL/RLHF using production-grade frameworks

        lesson

        Framework of RL

        - Use DSPy to define reusable prompts, reward conditions, and scoring logic in code

        lesson

        DPO vs PPO vs GPRO vs RLVR

        - Compare key fine-tuning strategies - DPO (Direct Preference Optimization) - PPO (Proximal Policy Optimization) - GPRO (Generalized Preference Ranking Optimization) - RLVR (Reinforcement Learning with Verifiable Rewards)

        lesson

        RLHF pipeline design - reward models vs judge models

        - Learn to design a full Reinforcement Learning with Human Feedback pipeline - Compare approaches that use - Learned reward models (e.g., preference-trained scoring networks) - LLM-as-Judge models that assign scores without a separate model - Understand trade-offs in transparency, cost, and reproducibility

        lesson

        Mini-Project - Location + Temperature Multi-Tool Chaining

        - Build a small agent that: - Extracts a user’s location from natural text - Calls a get_temperature(location) tool - Returns a formatted summary (e.g., “It’s 82°F in San Francisco right now.” - Add optional chaining logic (e.g., if temp > 85 → suggest activities; else → suggest indoor events)

        lesson

        XML-based UI with CoT streaming

        - Implement a UI layer (mock or real) where the model streams output step-by-step using CoT (Chain-of-Thought) reasoning wrapped in XML-style syntax

        lesson

        Tool disambiguation & follow-up prompt chains

        - Handle ambiguous or incomplete inputs by prompting the LLM to ask clarifying questions or reroute the query - Build workflows where the model decides whether to - Run a tool immediately - Request more info - Pass context to a secondary tool or summarizer

        lesson

        Dynamic prompt engineering for tool triggering

        - Learn to design prompts that guide the LLM to select and execute tools conditionally - Use schema hints, guardrails, and examples to make the LLM reliably output structured tool-calling requests

        lesson

        Build - segment, evaluate, rerank → reranker training loop

        - Use the identified weak topics to - Segment: Label relevant queries/chunks - Evaluate: Add gold or synthetic judgments - Rerank: Use Cohere or SBERT to rerank outputs - Train: Fine-tune a reranker on these topic-specific weak areas

        lesson

        Building “danger zone” maps

        - Use a 2D prioritization heatmap


        Articles

        view all ⭢