Positional Encoding + DeepSeek Internals

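- Understand why self-attention requires positional encoding
- Compare encoding types: sinusoidal, RoPE, learned, binary, and integer
- Study skip connections and layer norms and how they affect training stability and convergence
- Learn from the DeepSeek-V3 architecture: MLA (Multi-head Latent Attention, which compresses the KV cache), MoE (Mixture-of-Experts with expert gating), MTP (multi-token prediction, whose extra heads can also be reused for speculative decoding), and FP8 mixed-precision training
- Explore when and why to use advanced transformer optimizations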
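The first two objectives can be made concrete with a small sketch. It assumes a single attention head with identity Q/K/V projections (a simplification not taken from this outline), and the helper names `sinusoidal_encoding`, `self_attention`, and `close` are illustrative only. Without positional information, shuffling the input tokens merely shuffles the output rows (self-attention is permutation-equivariant), so the model cannot tell token order apart; adding the fixed sinusoidal encoding from the original Transformer breaks that symmetry.

```python
import math

def sinusoidal_encoding(seq_len, d_model):
    """Fixed sinusoidal positional encoding (assumes d_model is even).

    PE[pos][2i]   = sin(pos / 10000^(2i / d_model))
    PE[pos][2i+1] = cos(pos / 10000^(2i / d_model))
    """
    pe = []
    for pos in range(seq_len):
        row = []
        for i in range(0, d_model, 2):
            angle = pos / (10000 ** (i / d_model))
            row.append(math.sin(angle))
            row.append(math.cos(angle))
        pe.append(row)
    return pe

def self_attention(xs):
    """Single-head scaled dot-product self-attention.

    Hypothetical simplification: Q = K = V = xs (no learned projections).
    """
    d = len(xs[0])
    out = []
    for q in xs:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in xs]
        m = max(scores)  # subtract the max for a numerically stable softmax
        weights = [math.exp(s - m) for s in scores]
        total = sum(weights)
        weights = [w / total for w in weights]
        # Each output row is the attention-weighted average of the value rows.
        out.append([sum(w * v[j] for w, v in zip(weights, xs)) for j in range(d)])
    return out

def add(a, b):
    """Element-wise sum of two equally shaped matrices (lists of rows)."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

def close(u, v, tol=1e-9):
    """True if two vectors agree element-wise within tol."""
    return all(abs(a - b) <= tol for a, b in zip(u, v))

# Three toy token embeddings (d_model = 4) and a reordering of them.
tokens = [[1.0, 0.0, 0.5, 0.2], [0.0, 1.0, 0.3, 0.9], [0.4, 0.4, 1.0, 0.0]]
perm = [2, 0, 1]
shuffled = [tokens[p] for p in perm]

# No positions: each token's output is identical wherever it appears.
plain = self_attention(tokens)
plain_shuffled = self_attention(shuffled)

# With sinusoidal positions added, the same token at a new position gets a new output.
pe = sinusoidal_encoding(len(tokens), len(tokens[0]))
pos = self_attention(add(tokens, pe))
pos_shuffled = self_attention(add(shuffled, pe))

print(all(close(plain_shuffled[k], plain[perm[k]]) for k in range(3)))  # True
print(all(close(pos_shuffled[k], pos[perm[k]]) for k in range(3)))      # False
```

The sinusoidal scheme is used here only because it needs no training; RoPE instead rotates query and key vectors by position-dependent angles, and learned encodings replace the fixed table with trainable parameters.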