LLM Production Chain (Inference, Deployment, CI/CD)
- Map the end-to-end LLM production chain: data, serving, latency, and monitoring
- Explore multi-tenant LLM APIs, vector databases, caching, and rate limiting (see the first sketch after this list)
- Understand the tradeoffs between self-hosting models and calling hosted APIs, and the basics of inference tuning
- Plan a scalable serving stack (e.g., LLM + vector DB + API + orchestrator; see the second sketch after this list)
- Learn about LLMOps roles, workflows, and production-grade tooling
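
To make the caching and rate-limiting bullets concrete, here is a minimal sketch of a per-tenant token-bucket limiter and an LRU response cache sitting in front of a model call. The `call_llm` stub, the tenant IDs, and the rate/capacity numbers are illustrative assumptions, not any specific production API.

```python
import hashlib
import time
from collections import OrderedDict

class TokenBucket:
    """Per-tenant token bucket: refills at `rate` tokens/sec up to `capacity`."""
    def __init__(self, rate: float, capacity: int):
        self.rate = rate
        self.capacity = capacity
        self.tokens = float(capacity)
        self.last = time.monotonic()

    def allow(self) -> bool:
        now = time.monotonic()
        # Refill proportionally to elapsed time, capped at burst capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False

class ResponseCache:
    """LRU cache keyed on a hash of the prompt; a hit skips inference entirely."""
    def __init__(self, max_items: int = 1024):
        self.store: OrderedDict[str, str] = OrderedDict()
        self.max_items = max_items

    @staticmethod
    def _key(prompt: str) -> str:
        return hashlib.sha256(prompt.encode()).hexdigest()

    def get(self, prompt: str) -> str | None:
        key = self._key(prompt)
        if key in self.store:
            self.store.move_to_end(key)  # mark as recently used
            return self.store[key]
        return None

    def put(self, prompt: str, response: str) -> None:
        self.store[self._key(prompt)] = response
        if len(self.store) > self.max_items:
            self.store.popitem(last=False)  # evict least recently used

def call_llm(prompt: str) -> str:
    # Hypothetical stand-in for a real model endpoint (hosted API or local server).
    return f"echo: {prompt}"

buckets: dict[str, TokenBucket] = {}
cache = ResponseCache()

def handle_request(tenant_id: str, prompt: str) -> str:
    # Rate-limit per tenant, then consult the cache before paying for inference.
    bucket = buckets.setdefault(tenant_id, TokenBucket(rate=2.0, capacity=5))
    if not bucket.allow():
        return "429: rate limit exceeded"
    if (cached := cache.get(prompt)) is not None:
        return cached
    response = call_llm(prompt)
    cache.put(prompt, response)
    return response

if __name__ == "__main__":
    for _ in range(7):  # bursts past capacity start returning 429s
        print(handle_request("tenant-a", "hello"))
```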
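
The serving-stack bullet can also be sketched end to end. In the sketch below, a three-entry in-memory list stands in for the vector DB, `embed` and `call_llm` are hypothetical stubs, and `orchestrate` plays the orchestrator role: retrieve context, assemble the prompt, call the model. A real stack would swap each stub for a managed service.

```python
import math

# Tiny in-memory "vector store": (title, embedding) pairs standing in for a vector DB.
DOCUMENTS = [
    ("GPU autoscaling notes", [0.9, 0.1, 0.0]),
    ("Prompt caching guide",  [0.1, 0.9, 0.1]),
    ("Rate limiting policy",  [0.0, 0.2, 0.9]),
]

def embed(text: str) -> list[float]:
    # Hypothetical embedding function; a real stack would call an embedding model.
    h = abs(hash(text))
    return [((h >> s) % 100) / 100 for s in (0, 8, 16)]

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / (norm + 1e-9)

def retrieve(query: str, k: int = 2) -> list[str]:
    # Vector DB role: rank stored documents by similarity to the query embedding.
    q = embed(query)
    ranked = sorted(DOCUMENTS, key=lambda doc: cosine(q, doc[1]), reverse=True)
    return [title for title, _ in ranked[:k]]

def call_llm(prompt: str) -> str:
    # Hypothetical stand-in for the model server or hosted API.
    return f"echo: {prompt[:60]}..."

def orchestrate(query: str) -> str:
    # Orchestrator role: retrieve context, build the prompt, call the model.
    context = "\n".join(retrieve(query))
    prompt = f"Context:\n{context}\n\nQuestion: {query}"
    return call_llm(prompt)

if __name__ == "__main__":
    print(orchestrate("How should I cache prompts?"))
```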