RL in decoding, CoT prompting, and feedback loops
- Understand how RL ideas can be applied at inference time, without any training, by introducing dynamic feedback loops
- Apply reward scoring or confidence thresholds to adjust Chain-of-Thought (CoT) reasoning steps
- Use external tools (e.g., validators or search APIs) as part of a feedback loop that rewards correct or complete answers
- Understand how RL concepts power speculative decoding verification, scratchpad agents, and dynamic rerouting during generation
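The reward-scoring idea in the objectives above can be sketched as a small inference-time loop: a generator proposes a reasoning step, an external validator assigns a scalar reward, and the loop resamples until the reward-weighted confidence clears a threshold. Everything here is a toy illustration: `mock_generate`, `reward`, and `generate_with_feedback` are hypothetical stand-ins, not any real library's API.

```python
def mock_generate(prompt: str, attempt: int) -> tuple[str, float]:
    """Hypothetical stand-in for an LLM call: returns a candidate
    reasoning step and a model confidence in [0, 1]. Later attempts
    yield a better candidate, simulating resampling."""
    candidates = [
        ("guess: 2 + 2 = 5", 0.4),
        ("check: 2 + 2 = 4, since 2 + 1 + 1 = 4", 0.9),
    ]
    return candidates[min(attempt, len(candidates) - 1)]

def reward(step: str) -> float:
    """Toy external validator: 1.0 if the arithmetic checks out, else 0.0."""
    return 1.0 if "= 4" in step else 0.0

def generate_with_feedback(prompt: str,
                           threshold: float = 0.5,
                           max_retries: int = 3) -> str:
    """Resample a CoT step until reward * confidence clears the threshold,
    keeping the best-scoring step seen so far as a fallback."""
    best_step, best_score = "", -1.0
    for attempt in range(max_retries):
        step, confidence = mock_generate(prompt, attempt)
        score = reward(step) * confidence  # RL-style scalar feedback
        if score > best_score:
            best_step, best_score = step, score
        if score >= threshold:
            break  # feedback signal says this step is good enough
    return best_step

print(generate_with_feedback("What is 2 + 2?"))
```

Note that no model weights change: the "reward" only steers which sample is kept, which is the sense in which RL ideas are used without training.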
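Speculative decoding verification, mentioned in the last objective, can also be viewed as an accept/reject feedback loop: a cheap draft model proposes several tokens ahead, and the target model verifies them, accepting the longest prefix it agrees with and correcting the first mismatch. The sketch below is a deliberately simplified greedy variant with character "tokens" and toy models (`draft_model`, `target_model` are invented names), not the full rejection-sampling algorithm.

```python
TARGET = list("reinforce")  # tokens = characters, for simplicity

def target_model(ctx: list) -> str:
    """Toy 'large' model: the greedy next token given the context."""
    return TARGET[len(ctx)]

def draft_model(ctx: list) -> str:
    """Toy 'small' draft model: usually agrees, but errs at position 4."""
    return "x" if len(ctx) == 4 else TARGET[len(ctx)]

def speculative_decode(k: int = 3) -> str:
    out = []
    while len(out) < len(TARGET):
        # 1. Draft: the cheap model proposes up to k tokens ahead.
        ctx, drafts = list(out), []
        while len(drafts) < k and len(ctx) < len(TARGET):
            token = draft_model(ctx)
            drafts.append(token)
            ctx.append(token)
        # 2. Verify: the target model acts as the feedback signal,
        #    accepting the longest draft prefix it agrees with.
        rejected = False
        for token in drafts:
            if target_model(out) == token:
                out.append(token)
            else:
                rejected = True
                break
        # 3. Correct: on rejection, fall back to the target's own token,
        #    so the output always matches pure target-model decoding.
        if rejected and len(out) < len(TARGET):
            out.append(target_model(out))
    return "".join(out)

print(speculative_decode())  # → "reinforce"
```

The accept/reject verification is the RL-flavored part: draft proposals are rewarded (kept) only when the verifier agrees, and a rejection triggers a corrective reroute mid-generation.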