RAG & Retrieval Techniques (Mini Project 2)

- Understand the full RAG pipeline: pre-retrieval, retrieval, and post-retrieval stages
- Learn the difference between term-based retrieval (e.g., TF-IDF, BM25) and embedding-based retrieval (vector search)
- Explore vector databases, chunking, and retrieval optimization techniques such as HyDE, reranking, and filtering
- Understand how contrastive learning trains embedding models to map queries and documents into a shared vector space, where cosine similarity measures relevance
- Practice retrieval evaluation using `recall@k`, `precision@k`, and `MRR` (mean reciprocal rank)
- Generate synthetic evaluation data with LLMs (Instructor, Pydantic) for local eval scenarios
- Implement baseline vector search pipelines using LanceDB and OpenAI embeddings (`text-embedding-3-small`, `text-embedding-3-large`)
- Apply rerankers and validate results statistically with bootstrapping and t-tests to build intuition for how reliable eval results are
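To make the term-based side of the comparison concrete, here is a minimal pure-Python sketch of Okapi BM25 scoring. It assumes lowercase whitespace tokenization, the common defaults `k1=1.5` and `b=0.75`, and a Lucene-style smoothed IDF; `bm25_scores` is a hypothetical helper, and real libraries differ in tokenization and IDF details.

```python
import math
from collections import Counter


def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Score every document against the query with Okapi BM25 (sketch)."""
    # Assumption: naive lowercase whitespace tokenization.
    tokenized = [d.lower().split() for d in docs]
    n_docs = len(tokenized)
    avgdl = sum(len(toks) for toks in tokenized) / n_docs

    # Document frequency: in how many documents each term appears.
    df = Counter()
    for toks in tokenized:
        df.update(set(toks))

    q_terms = query.lower().split()
    scores = []
    for toks in tokenized:
        tf = Counter(toks)
        dl = len(toks)
        score = 0.0
        for term in q_terms:
            if term not in tf:
                continue  # no exact term match means no contribution
            # Smoothed IDF (Lucene-style), always positive.
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            freq = tf[term]
            # Term-frequency saturation (k1) and length normalization (b).
            norm = freq + k1 * (1 - b + b * dl / avgdl)
            score += idf * freq * (k1 + 1) / norm
        scores.append(score)
    return scores


docs = [
    "vector databases store embeddings",
    "bm25 is a term based ranking function",
    "vector search uses embeddings for search",
]
print(bm25_scores("vector search", docs))
```

Documents with no exact term overlap score zero, which is the gap embedding-based search is meant to close: a query phrased with synonyms gets no credit from BM25.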
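Embedding-based retrieval ranks documents by cosine similarity between the query vector and each document vector. The sketch below does this by brute force with NumPy on tiny hand-made 3-dimensional vectors, which stand in for real embeddings (e.g., from `text-embedding-3-small`). `cosine_top_k` is a hypothetical helper, not a LanceDB API; a vector database answers the same kind of nearest-neighbor query, usually with an (often approximate) index so it scales past brute force.

```python
import numpy as np


def cosine_top_k(query_vec, doc_matrix, k):
    """Return indices and cosine scores of the k most similar documents."""
    q = np.asarray(query_vec, dtype=float)
    d = np.asarray(doc_matrix, dtype=float)
    # L2-normalize so the dot product equals cosine similarity.
    q = q / np.linalg.norm(q)
    d = d / np.linalg.norm(d, axis=1, keepdims=True)
    scores = d @ q
    # Sort descending by similarity and keep the top k.
    order = np.argsort(-scores)[:k]
    return order, scores[order]


# Toy "embeddings": one row per document (illustrative values only).
doc_vectors = np.array([
    [1.0, 0.0, 0.0],
    [0.9, 0.1, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
])
order, top_scores = cosine_top_k([1.0, 0.05, 0.0], doc_vectors, k=2)
print(order, top_scores)
```

Normalizing once up front is the usual trick: after it, cosine similarity and dot product rank documents identically, which is why many embedding pipelines store unit-length vectors.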
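Chunking splits long documents into pieces small enough to embed and retrieve individually. A minimal sketch follows, assuming word-level splitting with a fixed window and overlap; `chunk_words`, `chunk_size`, and `overlap` are illustrative names, and real pipelines often split on tokens, sentences, or document structure instead.

```python
def chunk_words(text, chunk_size=200, overlap=50):
    """Split text into word windows of chunk_size that share `overlap` words."""
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= overlap < chunk_size")
    words = text.split()
    step = chunk_size - overlap  # how far each window advances
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        # Stop once this window reaches the end of the text.
        if start + chunk_size >= len(words):
            break
    return chunks


text = " ".join(str(i) for i in range(10))
print(chunk_words(text, chunk_size=4, overlap=1))
# ['0 1 2 3', '3 4 5 6', '6 7 8 9']
```

The overlap keeps a sentence that straddles a boundary retrievable from at least one chunk, at the cost of storing some text twice.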
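The evaluation metrics in the list above can be computed from a ranked list of retrieved document IDs and a set of relevant IDs per query. This is a minimal sketch; the function names and the toy IDs are illustrative, and it follows the common convention of dividing `precision@k` by `k` and scoring a query with no relevant hit as reciprocal rank 0.

```python
from typing import List, Sequence, Set, Tuple


def recall_at_k(retrieved: Sequence[str], relevant: Set[str], k: int) -> float:
    """Fraction of the relevant documents that appear in the top k."""
    if not relevant:
        return 0.0
    hits = sum(1 for doc_id in retrieved[:k] if doc_id in relevant)
    return hits / len(relevant)


def precision_at_k(retrieved: Sequence[str], relevant: Set[str], k: int) -> float:
    """Fraction of the top-k slots filled with relevant documents."""
    if k < 1:
        raise ValueError("k must be >= 1")
    hits = sum(1 for doc_id in retrieved[:k] if doc_id in relevant)
    return hits / k


def reciprocal_rank(retrieved: Sequence[str], relevant: Set[str]) -> float:
    """1 / rank of the first relevant document, or 0 if none is retrieved."""
    for rank, doc_id in enumerate(retrieved, start=1):
        if doc_id in relevant:
            return 1.0 / rank
    return 0.0


def mean_reciprocal_rank(runs: List[Tuple[Sequence[str], Set[str]]]) -> float:
    """Average reciprocal rank over (retrieved, relevant) pairs, one per query."""
    return sum(reciprocal_rank(r, rel) for r, rel in runs) / len(runs)


retrieved = ["d3", "d1", "d7", "d2"]
relevant = {"d1", "d2"}
print(recall_at_k(retrieved, relevant, 3))     # 0.5
print(precision_at_k(retrieved, relevant, 3))  # 0.3333333333333333
print(mean_reciprocal_rank([(retrieved, relevant), (["d5", "d9"], {"d5"})]))  # 0.75
```

`recall@k` rewards finding everything relevant, `precision@k` penalizes filling the context with noise, and `MRR` only cares how early the first relevant hit appears, so the three often disagree about which retriever is better.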
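For the statistical validation step, a common approach is to compare two systems (e.g., baseline vs. reranked) on per-query scores from the same queries: bootstrap the mean paired difference for a confidence interval, and compute a paired t-statistic. The sketch below uses only NumPy; the per-query scores are made-up illustrative numbers, not results, and the function names are hypothetical. `scipy.stats.ttest_rel(reranked, baseline)` computes the same t-statistic and also returns a p-value.

```python
import numpy as np


def bootstrap_mean_diff_ci(a, b, n_resamples=10_000, alpha=0.05, seed=0):
    """Mean of paired differences (b - a) with a percentile bootstrap CI."""
    diffs = np.asarray(b, dtype=float) - np.asarray(a, dtype=float)
    rng = np.random.default_rng(seed)
    # Resample queries with replacement; each row is one bootstrap sample.
    idx = rng.integers(0, len(diffs), size=(n_resamples, len(diffs)))
    boot_means = diffs[idx].mean(axis=1)
    lo, hi = np.percentile(boot_means, [100 * alpha / 2, 100 * (1 - alpha / 2)])
    return diffs.mean(), lo, hi


def paired_t_statistic(a, b):
    """t = mean(d) / (sd(d) / sqrt(n)) for paired differences d = b - a."""
    diffs = np.asarray(b, dtype=float) - np.asarray(a, dtype=float)
    return diffs.mean() / (diffs.std(ddof=1) / np.sqrt(len(diffs)))


# Illustrative per-query recall@k for the same 8 queries (not real data).
baseline = [0.5, 0.0, 1.0, 0.5, 0.5, 0.0, 1.0, 0.5]
reranked = [1.0, 0.5, 1.0, 0.5, 1.0, 0.0, 1.0, 1.0]

mean_diff, lo, hi = bootstrap_mean_diff_ci(baseline, reranked)
print(f"mean diff {mean_diff:.3f}, 95% CI [{lo:.3f}, {hi:.3f}]")
print(f"paired t = {paired_t_statistic(baseline, reranked):.3f}")
```

Pairing matters: both systems see the same queries, so analyzing per-query differences removes query difficulty as a source of noise. With only a handful of queries the interval stays wide, which is exactly the intuition about eval reliability this step is meant to build.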