Full Transformer Architecture (From Scratch)
- Connect all core transformer components: embeddings, attention, feedforward, and normalization
- Implement skip connections and positional encodings manually
- Use sanity checks and test loss to debug your model assembly
- Observe transformer behavior on structured prompts and simple sequences
- Compare transformer predictions with earlier trigram and FFN models to appreciate the value of deeper context