Tokens, Embeddings & Modalities — Foundations of Understanding Text, Image, and Audio

- Understand the journey from raw text → tokens → token IDs → embeddings
- Compare word-based, BPE, and advanced tokenizers (LLaMA, GPT-2, T5)
- Analyze how good or poor tokenization affects loss, inference time, and semantic meaning
- Learn how embedding vectors represent meaning and change with context
- Explore and manipulate Word2Vec-style word embeddings through vector math and dot-product similarity
- Apply tokenization and embedding logic to multimodal models (CLIP, ViLT, ViT-GPT2)
- Conduct retrieval and classification tasks using image and audio embeddings (CLIP, Wav2Vec2)
- Discuss emerging architectures like Byte Latent Transformers and their implications
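
The first and fifth objectives above can be sketched end to end in plain Python. This is a toy illustration, not a real model: the vocabulary, the 3-dimensional embedding table, and all the vector values are made up for demonstration, and the tokenizer is a simple whitespace split rather than BPE.

```python
import math

# Toy vocabulary: raw text -> tokens -> token IDs (illustrative, not from any real model).
vocab = {"<unk>": 0, "the": 1, "cat": 2, "dog": 3, "sat": 4}

def tokenize(text):
    """Raw text -> tokens (lowercased whitespace split; real tokenizers use BPE etc.)."""
    return text.lower().split()

def encode(tokens):
    """Tokens -> token IDs via vocabulary lookup; unknown words map to <unk>."""
    return [vocab.get(tok, vocab["<unk>"]) for tok in tokens]

# Token IDs -> embedding vectors: row i is a made-up 3-dim embedding for token ID i.
embedding_table = [
    [0.0, 0.0, 0.0],  # <unk>
    [0.1, 0.2, 0.1],  # the
    [0.9, 0.8, 0.1],  # cat
    [0.8, 0.9, 0.2],  # dog
    [0.1, 0.1, 0.9],  # sat
]

def embed(ids):
    """Token IDs -> embedding vectors (a simple table lookup)."""
    return [embedding_table[i] for i in ids]

def cosine(u, v):
    """Dot-product similarity normalized by the two vector lengths."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Full journey: text -> tokens -> IDs -> embeddings.
ids = encode(tokenize("the cat sat"))
vectors = embed(ids)
print(ids)  # [1, 2, 4]

# Vector math on embeddings: "cat" lands closer to "dog" than to "sat".
print(cosine(embedding_table[2], embedding_table[3]))  # high similarity
print(cosine(embedding_table[2], embedding_table[4]))  # low similarity
```

The same lookup-then-compare pattern underlies the later multimodal objectives: CLIP and Wav2Vec2 replace the hand-written table with learned encoders, but retrieval still reduces to comparing embedding vectors with a dot-product-style similarity.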