lesson
Mapping low-satisfaction topics
- Overlay user feedback or evaluator scores onto topic clusters to identify underperforming areas (sketch below)
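A minimal sketch of the overlay, assuming each query already carries a topic label from clustering and an evaluator score; the column names and data are illustrative:

```python
import pandas as pd

# Illustrative per-query records: a topic id from clustering plus a 1-5
# evaluator (or user feedback) score.
df = pd.DataFrame({
    "query": ["reset my password", "refund status", "cancel my order", "2FA not working"],
    "topic": [0, 1, 1, 0],
    "score": [2, 4, 5, 1],
})

# Overlay: average satisfaction per topic cluster.
topic_scores = df.groupby("topic")["score"].agg(["mean", "count"])

# Low-satisfaction topics are the underperforming areas to investigate first.
print(topic_scores[topic_scores["mean"] < 3.0])
```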
lesson
BERTopic + UMAP + HDBSCAN
- Learn to use BERTopic to group large sets of queries into semantically meaningful clusters
- Use UMAP to reduce embedding dimensionality while preserving structure for visualization
- Use HDBSCAN, a density-based clustering algorithm, to find clusters without a predefined cluster count (unlike K-means)
- Students will cluster synthetic or real queries and label the dominant topic groups (see the sketch below)
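A minimal end-to-end sketch of the stack, assuming the bertopic, umap-learn, and hdbscan packages are installed; queries.txt is a hypothetical stand-in for a corpus of a few hundred user queries:

```python
from bertopic import BERTopic
from umap import UMAP
from hdbscan import HDBSCAN

# A few hundred queries are needed for meaningful clusters; queries.txt is hypothetical.
docs = [line.strip() for line in open("queries.txt") if line.strip()]

# UMAP: reduce embedding dimensionality while preserving local structure.
umap_model = UMAP(n_neighbors=15, n_components=5, min_dist=0.0, metric="cosine")

# HDBSCAN: density-based clustering, no predefined cluster count (unlike K-means).
hdbscan_model = HDBSCAN(min_cluster_size=10, metric="euclidean",
                        cluster_selection_method="eom")

topic_model = BERTopic(umap_model=umap_model, hdbscan_model=hdbscan_model)
topics, probs = topic_model.fit_transform(docs)
print(topic_model.get_topic_info())  # dominant terms per cluster, for labeling
```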
lesson
Segmenting by failure type - lack of data vs. lack of capability
- Build classification logic to tag each failure (one possible rule is sketched below)
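One possible decision rule for the tagging logic, not necessarily the course's exact scheme: if the retrieved context never contained the evidence, the failure is a data problem; if the evidence was there and the model still answered wrong, it is a capability problem.

```python
def tag_failure(answer_correct: bool, evidence_in_context: bool) -> str:
    """Tag a failed trace; the two judgments would come from a human
    or LLM grader upstream (assumed, not shown here)."""
    if answer_correct:
        return "not_a_failure"
    if not evidence_in_context:
        return "lack_of_data"        # retrieval never surfaced the answer
    return "lack_of_capability"      # evidence present, model still failed

traces = [
    {"id": "t1", "answer_correct": False, "evidence_in_context": False},
    {"id": "t2", "answer_correct": False, "evidence_in_context": True},
]
for t in traces:
    print(t["id"], tag_failure(t["answer_correct"], t["evidence_in_context"]))
```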
lesson
Segmentation-Driven Summarization
- Summarization-optimized chunk generation
- Fact-checking and financial metadata integration
- Comparing synthetic chunks vs. BM25 retrieval (baseline sketched below)
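For the retrieval comparison, a minimal BM25 baseline using the rank_bm25 package; the documents are toy stand-ins for real chunks, and synthetic summarization chunks would be scored against the same queries:

```python
from rank_bm25 import BM25Okapi

corpus = [  # toy financial snippets, stand-ins for real chunks
    "Q3 revenue grew 12% year over year to $4.1B.",
    "Operating margin declined due to one-time charges.",
    "The company repurchased $500M of stock during the quarter.",
]
bm25 = BM25Okapi([doc.lower().split() for doc in corpus])

query = "how did revenue change in q3"
scores = bm25.get_scores(query.lower().split())
for doc, score in sorted(zip(corpus, scores), key=lambda x: -x[1]):
    print(f"{score:.2f}  {doc}")
```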
lesson
Regex validators
- Add simple but powerful validation checks to assess whether LLM outputs meet structural expectations (example below)
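A small illustration of the idea; the structural rules below are invented for the example (a three-item numbered list, no markdown headers, a trailing JSON citation):

```python
import re

def validate(output: str) -> dict:
    """Cheap structural checks on an LLM output; each rule is illustrative."""
    return {
        # Expect the numbered items 1., 2., 3. on consecutive lines.
        "numbered_list": bool(re.search(r"(?m)^1\..*\n2\..*\n3\.", output)),
        # Forbid markdown headers in the response.
        "no_headers": not re.search(r"(?m)^#{1,6}\s", output),
        # Expect a trailing JSON object, e.g. a citation blob.
        "ends_with_json": bool(re.search(r"\{[^{}]*\}\s*$", output)),
    }

sample = '1. foo\n2. bar\n3. baz\n{"source": "doc-7"}'
print(validate(sample))  # all three checks pass for this sample
```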
lesson
Building persona-varied synthetic data
- Learn how to generate diverse, realistic user queries using persona conditioning (sketch below)
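A sketch of persona conditioning at the prompt level; the personas, topics, and template are all made up for illustration, and the actual LLM call is left to whatever client you use:

```python
import itertools

PERSONAS = [
    "a first-time user who is not technical",
    "a power user who writes SQL every day",
    "a frustrated customer on a deadline",
]
TOPICS = ["exporting reports", "resetting 2FA", "billing disputes"]

def build_prompt(persona: str, topic: str) -> str:
    # Conditioning on *who* is asking shifts vocabulary, tone, and specificity.
    return (f"You are {persona}. Write one realistic support question about "
            f"{topic}, in your own voice. Return only the question.")

prompts = [build_prompt(p, t) for p, t in itertools.product(PERSONAS, TOPICS)]
print(prompts[0])  # send each prompt to your LLM to collect varied queries
```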
lesson
Mini-lab - Compare decoding methods on a complex prompt
- Run the same input prompt using top-k, top-p, and beam search decoding (setup sketched below)
- Measure differences in diversity, accuracy, repetition, and latency across the methods
- Discuss which strategy works best for each context and explain why
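A sketch of the lab setup with Hugging Face transformers, using gpt2 as a small stand-in model; the timing and quality measurements are left out for brevity:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
inputs = tok("Explain why the sky is blue:", return_tensors="pt")

strategies = {
    "top-k":       dict(do_sample=True, top_k=50),
    "top-p":       dict(do_sample=True, top_p=0.9, top_k=0),
    "beam search": dict(do_sample=False, num_beams=4),
}
for name, kwargs in strategies.items():
    out = model.generate(**inputs, max_new_tokens=40,
                         pad_token_id=tok.eos_token_id, **kwargs)
    print(f"--- {name} ---\n{tok.decode(out[0], skip_special_tokens=True)}\n")
```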
lesson
Tokenization deep dive - Byte-level language modeling vs. traditional tokenization
- Learn how byte-level models process raw UTF-8 bytes directly, with a vocabulary size of 256
- Understand how this approach removes the need for subword tokenizers like BPE or SentencePiece
- Compare byte-level models to tokenized models with larger vocabularies (e.g., 30k–50k tokens; contrast sketched below)
- Analyze the trade-offs between the two approaches in terms of simplicity
- Evaluate how each approach handles multilingual text
- Assess the impact on model size
- Examine differences in performance
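The contrast is easy to see in a few lines; GPT-2's BPE tokenizer stands in for the larger-vocabulary side:

```python
from transformers import AutoTokenizer

text = "naïve café 東京"

# Byte-level view: raw UTF-8 bytes, vocabulary of exactly 256 symbols.
byte_ids = list(text.encode("utf-8"))
print(len(byte_ids), byte_ids)      # longer sequence, trivial vocab

# Subword view: GPT-2's BPE has a 50,257-token vocabulary.
tok = AutoTokenizer.from_pretrained("gpt2")
ids = tok.encode(text)
print(len(ids), tok.convert_ids_to_tokens(ids))  # shorter sequence, big vocab

# Trade-off in miniature: bytes buy simplicity and full multilingual coverage;
# subwords buy shorter sequences at the cost of a large embedding table.
```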
lesson
Hard-negative mining strategies
- Implement pipelines that automatically surface confusing negatives (a minimal pass is sketched below)
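A minimal mining pass with sentence-transformers: embed the query and candidate non-answers, then keep the negatives the model scores closest to the query; the texts are toy examples:

```python
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")

query = "how do I rotate an API key"
candidates = [  # known non-answers for this query
    "API keys are created on the settings page.",   # near-miss: same domain
    "Rotate your screen with Ctrl+Alt+Arrow.",      # lexical trap: 'rotate'
    "Our office is closed on public holidays.",     # easy negative
]
scores = util.cos_sim(model.encode(query, convert_to_tensor=True),
                      model.encode(candidates, convert_to_tensor=True))[0]

# Hard negatives = the highest-scoring non-answers; feed them into training.
ranked = sorted(zip(candidates, scores.tolist()), key=lambda x: -x[1])
print(ranked[:2])
```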
lesson
Cohere Rerank API & SBERT fine-tuning (sbert.net, Hugging Face)
- Learn to use off-the-shelf rerankers like Cohere's API, or fine-tune SBERT models, to optimize document ranking post-retrieval (example call below)
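A minimal Cohere rerank call as a reference point; the model name and response shape match the Python SDK at the time of writing, so check Cohere's current docs before relying on them:

```python
import cohere

co = cohere.Client("YOUR_API_KEY")  # assumes you have a Cohere API key

docs = [
    "Refunds are processed within 5-7 business days.",
    "You can change your shipping address before dispatch.",
    "Our refund policy excludes digital goods.",
]
resp = co.rerank(model="rerank-english-v3.0",
                 query="how long do refunds take",
                 documents=docs, top_n=2)
for r in resp.results:
    print(r.index, round(r.relevance_score, 3), docs[r.index])
```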
lesson
Triplet-loss fundamentals and semi-hard negative mining
- Dive into triplet formation strategies
- Focus on finding semi-hard negatives (similar but incorrect results that challenge the model; definitions sketched below)
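The core definitions in a few lines of PyTorch: the loss L = max(d(a,p) - d(a,n) + margin, 0), and the semi-hard condition d(a,p) < d(a,n) < d(a,p) + margin; random tensors stand in for real embeddings:

```python
import torch
import torch.nn.functional as F

def triplet_loss(anchor, positive, negative, margin=0.2):
    # L = max(d(a, p) - d(a, n) + margin, 0), with Euclidean distance.
    d_ap = F.pairwise_distance(anchor, positive)
    d_an = F.pairwise_distance(anchor, negative)
    return F.relu(d_ap - d_an + margin).mean()

def is_semi_hard(d_ap, d_an, margin=0.2):
    # Semi-hard: farther than the positive, but still inside the margin,
    # so the loss is nonzero without the negative being trivially wrong.
    return (d_an > d_ap) & (d_an < d_ap + margin)

a, p, n = (torch.randn(8, 64) for _ in range(3))
print(triplet_loss(a, p, n))
```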
lesson
Tri-encoder vs. cross-encoder performance trade-offs
- Explore the architectural trade-offs between bi/tri-encoders and cross-encoders
- Learn when to use hybrid systems, e.g., bi-encoder retrieval + cross-encoder reranking (sketched below)
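A sketch of the hybrid pattern with sentence-transformers: cheap bi-encoder retrieval over the whole corpus, then cross-encoder rescoring of the shortlist; the model names are common public checkpoints and the corpus is illustrative:

```python
from sentence_transformers import SentenceTransformer, CrossEncoder, util

bi = SentenceTransformer("all-MiniLM-L6-v2")               # fast, precomputable
ce = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")  # slow, more accurate

corpus = [
    "The warranty covers manufacturing defects for two years.",
    "Shipping is free on orders over $50.",
    "Defective items can be returned within 30 days.",
]
query = "is my broken item covered"

# Stage 1: bi-encoder retrieval (corpus embeddings can be precomputed and indexed).
hits = util.semantic_search(bi.encode(query, convert_to_tensor=True),
                            bi.encode(corpus, convert_to_tensor=True), top_k=2)[0]

# Stage 2: cross-encoder reranking of only the shortlisted pairs.
pairs = [(query, corpus[h["corpus_id"]]) for h in hits]
for (q, doc), score in sorted(zip(pairs, ce.predict(pairs)), key=lambda x: -x[1]):
    print(f"{score:.3f}  {doc}")
```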