Feedforward Networks & Loss-Centric Training

- Understand the role of linear and nonlinear layers in neural networks
- Explore how MLPs refine outputs after self-attention in transformers
- Learn the structure of FFNs (e.g., a two-layer projection with an activation such as ReLU or SwiGLU), as sketched in the code after this list
- Implement your own FFN in PyTorch with real training and evaluation
- Compare activation functions: ReLU, GELU, SwiGLU
- Understand how dropout prevents co-adaptation and improves generalization
- Learn the role of LayerNorm, positional encoding, and skip connections (see the block sketch below)
- Build intuition for how transformers encode depth, context, and structure into their layers
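
To make the FFN structure and the activation comparison concrete, here is a minimal PyTorch sketch of a position-wise feedforward layer with a selectable nonlinearity and dropout. The two-layer projection, the 4x hidden width, and the gated SwiGLU variant follow common transformer practice; the class name `FeedForward` and arguments like `hidden_mult` are illustrative, not taken from any specific codebase.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class FeedForward(nn.Module):
    """Position-wise FFN: project up, apply a nonlinearity, project back down."""

    def __init__(self, d_model: int, hidden_mult: int = 4,
                 activation: str = "gelu", dropout: float = 0.1):
        super().__init__()
        d_hidden = hidden_mult * d_model
        self.activation = activation
        if activation == "swiglu":
            # SwiGLU uses two up-projections: one is gated by SiLU (Swish).
            self.w_gate = nn.Linear(d_model, d_hidden)
        self.w_up = nn.Linear(d_model, d_hidden)
        self.w_down = nn.Linear(d_hidden, d_model)
        self.dropout = nn.Dropout(dropout)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        if self.activation == "swiglu":
            h = F.silu(self.w_gate(x)) * self.w_up(x)
        elif self.activation == "relu":
            h = F.relu(self.w_up(x))
        else:  # default: GELU
            h = F.gelu(self.w_up(x))
        # Dropout on the hidden activations discourages co-adaptation.
        return self.w_down(self.dropout(h))


# Quick shape check: batch of 2 sequences, 8 tokens each, d_model = 16.
x = torch.randn(2, 8, 16)
for act in ("relu", "gelu", "swiglu"):
    ffn = FeedForward(d_model=16, activation=act)
    print(act, ffn(x).shape)  # -> torch.Size([2, 8, 16]) for each variant
```

Swapping the `activation` argument is one simple way to compare ReLU, GELU, and SwiGLU under an otherwise identical training setup.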
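
The sketch below shows where such an FFN sits inside a transformer block: LayerNorm before each sub-layer (the pre-norm arrangement) and residual skip connections around both self-attention and the FFN. This is an assumed, simplified layout for illustration; positional encoding would be added to the token embeddings before the first block and is omitted here.

```python
import torch
import torch.nn as nn


class TransformerBlock(nn.Module):
    """Pre-norm transformer block: attention and FFN, each wrapped in a skip connection."""

    def __init__(self, d_model: int = 16, n_heads: int = 4, dropout: float = 0.1):
        super().__init__()
        self.norm1 = nn.LayerNorm(d_model)
        self.attn = nn.MultiheadAttention(d_model, n_heads,
                                          dropout=dropout, batch_first=True)
        self.norm2 = nn.LayerNorm(d_model)
        # Two-layer FFN with GELU, as in the previous sketch.
        self.ffn = nn.Sequential(
            nn.Linear(d_model, 4 * d_model),
            nn.GELU(),
            nn.Dropout(dropout),
            nn.Linear(4 * d_model, d_model),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Skip connection around self-attention.
        h = self.norm1(x)
        attn_out, _ = self.attn(h, h, h, need_weights=False)
        x = x + attn_out
        # Skip connection around the position-wise FFN that refines each token.
        x = x + self.ffn(self.norm2(x))
        return x


block = TransformerBlock()
print(block(torch.randn(2, 8, 16)).shape)  # -> torch.Size([2, 8, 16])
```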