Understand the scope and applications of NLP across industries
Master the text preprocessing pipeline using industry-standard tools
Identify and analyze linguistic ambiguities and computational challenges
Implement tokenization algorithms for different languages and domains
2.1.3 Hands-on Session
# Sample preprocessing pipeline
import nltk
import spacy
from nltk.tokenize import word_tokenize
from nltk.stem import PorterStemmer, WordNetLemmatizer

# One-time setup, e.g.: nltk.download('punkt'); nltk.download('wordnet'); nltk.download('averaged_perceptron_tagger')

# Text preprocessing demonstration
text = "The researchers are researching research papers."

# Expected output: tokens, stems, lemmas, POS tags
tokens = word_tokenize(text)
stems = [PorterStemmer().stem(t) for t in tokens]
lemmas = [WordNetLemmatizer().lemmatize(t) for t in tokens]
pos_tags = nltk.pos_tag(tokens)
Activities:
- Building a custom tokenizer for social media text
- Comparing stemming vs. lemmatization performance
- Multilingual preprocessing challenges
2.2 Week 2: Statistical Language Models and N-grams
2.2.1 Topic Overview
Statistical language modeling forms the foundation of modern NLP. This week covers:
Probability Theory in language: Chain rule, independence assumptions
N-gram Models: Mathematical formulation and implementation details
Smoothing Techniques: Handling zero probabilities and data sparsity
Evaluation Metrics: Perplexity, cross-entropy, and information theory
Code
graph TD
A[Unigram Model<br/>P(w)] --> B[Bigram Model<br/>P(w|w-1)]
B --> C[Trigram Model<br/>P(w|w-2,w-1)]
C --> D[N-gram Model<br/>P(w|w-n+1...w-1)]
A --> E[Independence<br/>High bias, Low variance]
D --> F[Context Dependence<br/>Low bias, High variance]
Figure 3: N-gram Model Hierarchy
Tip: Mathematical Foundation
The n-gram probability is calculated as: \[P(w_i|w_{i-n+1}^{i-1}) = \frac{C(w_{i-n+1}^{i})}{C(w_{i-n+1}^{i-1})}\]
Where \(C(·)\) represents the count function in the training corpus.
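As a worked illustration of this formula, the sketch below estimates bigram probabilities by maximum likelihood on a toy corpus; the corpus and variable names are illustrative only.

from collections import Counter

# Toy corpus (illustrative only)
corpus = "the cat sat on the mat the cat ate".split()

# Count unigrams C(w_{i-1}) and bigrams C(w_{i-1}, w_i)
unigram_counts = Counter(corpus)
bigram_counts = Counter(zip(corpus, corpus[1:]))

def bigram_mle(prev_word, word):
    # P(w_i | w_{i-1}) = C(w_{i-1}, w_i) / C(w_{i-1})
    return bigram_counts[(prev_word, word)] / unigram_counts[prev_word]

print(bigram_mle("the", "cat"))  # 2/3: "the" occurs 3 times and is followed by "cat" twice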
2.2.2 Learning Objectives
Build and evaluate n-gram language models from scratch
Understand the bias-variance tradeoff in model complexity
Apply various smoothing techniques (Laplace, Good-Turing, Kneser-Ney)
Calculate and interpret perplexity scores
2.2.3 Hands-on Session
Implementation Tasks:
- N-gram model training on different corpus sizes
- Perplexity calculation and analysis (see the sketch below)
- Smoothing method comparison study
- Text generation using trained models
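For the perplexity task, a minimal sketch, assuming a smoothed bigram_prob(prev, word) function (for example, an add-one-smoothed version of the estimator shown earlier) is available:

import math

def perplexity(test_tokens, bigram_prob):
    # bigram_prob(prev, word) should return P(word | prev), already smoothed
    log_prob = 0.0
    n = 0
    for prev, word in zip(test_tokens, test_tokens[1:]):
        log_prob += math.log2(bigram_prob(prev, word))
        n += 1
    # Perplexity = 2^(cross-entropy) = 2^(-1/N * sum of log2 probabilities)
    return 2 ** (-log_prob / n)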
2.3 Week 3: Word Representations and Embeddings
Figure 4: Word Embedding Space
2.3.1 Topic Overview
The transition from sparse to dense word representations revolutionized NLP:
Distributional Semantics: “You shall know a word by the company it keeps”
Word2Vec Algorithms: Skip-gram and Continuous Bag of Words (CBOW)
Global Vectors (GloVe): Matrix factorization approach to embeddings
FastText: Subword information and out-of-vocabulary handling
Note: Word2Vec Intuition
Skip-gram: Predicts context words given target word
CBOW: Predicts target word given context words
Both use hierarchical softmax or negative sampling for efficiency
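A minimal gensim sketch of both objectives; the toy sentences and hyperparameters are illustrative, and the sg / negative / hs flags select the variants described above:

from gensim.models import Word2Vec

# Toy tokenized corpus (illustrative only)
sentences = [["the", "cat", "sat", "on", "the", "mat"],
             ["the", "dog", "sat", "on", "the", "rug"]]

# sg=1 -> Skip-gram, sg=0 -> CBOW; negative>0 -> negative sampling, hs=1 -> hierarchical softmax
skipgram = Word2Vec(sentences, vector_size=50, window=2, min_count=1, sg=1, negative=5)
cbow = Word2Vec(sentences, vector_size=50, window=2, min_count=1, sg=0, negative=5)

print(skipgram.wv.most_similar("cat", topn=3))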
Long Short-Term Memory (LSTM): Solving the vanishing gradient problem
Gated Recurrent Units (GRUs): Simplified gating mechanisms
Bidirectional Networks: Capturing both forward and backward context
Code
graph LR
A[Vanilla RNN<br/>Vanishing Gradients] --> B[LSTM<br/>Forget Gate + Input Gate + Output Gate]
B --> C[GRU<br/>Reset Gate + Update Gate]
C --> D[BiLSTM<br/>Forward + Backward Processing]
B --> E[Applications:<br/>Language Modeling<br/>Sequence Classification<br/>Named Entity Recognition]
Figure 5: RNN Architecture Evolution
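As a concrete reference, here is a minimal PyTorch sketch of a bidirectional LSTM classifier; the vocabulary size, dimensions, and class count are illustrative assumptions:

import torch
import torch.nn as nn

class BiLSTMClassifier(nn.Module):
    def __init__(self, vocab_size=10000, embed_dim=128, hidden_dim=256, num_classes=2):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, embed_dim)
        # bidirectional=True processes the sequence forward and backward
        self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True, bidirectional=True)
        self.classifier = nn.Linear(2 * hidden_dim, num_classes)

    def forward(self, token_ids):
        embedded = self.embedding(token_ids)         # (batch, seq_len, embed_dim)
        outputs, (h_n, c_n) = self.lstm(embedded)    # h_n: (2, batch, hidden_dim)
        # Concatenate the final forward and backward hidden states
        final = torch.cat([h_n[0], h_n[1]], dim=-1)  # (batch, 2 * hidden_dim)
        return self.classifier(final)

logits = BiLSTMClassifier()(torch.randint(0, 10000, (4, 20)))  # toy batch of 4 sequences of length 20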
2.4.2 Learning Objectives
Design neural architectures for sequence processing tasks
Understand gradient flow in recurrent connections
Implement bidirectional processing for improved context modeling
Apply regularization techniques specific to sequential data
2.4.3 Hands-on Session
Implementation Focus:
- RNN-based language model implementation
- LSTM vs. GRU comparison on sequence classification
- Gradient clipping and other training stabilization techniques (see the sketch below)
- Attention visualization in sequence-to-sequence models
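For the gradient-clipping item, a minimal PyTorch sketch; the toy model, data, and max_norm value are illustrative assumptions:

import torch
import torch.nn as nn

# Toy setup to illustrate the clipping call (shapes and loss are arbitrary)
model = nn.LSTM(16, 32, batch_first=True)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
inputs = torch.randn(4, 10, 16)

outputs, _ = model(inputs)
loss = outputs.pow(2).mean()          # placeholder loss
loss.backward()
# Rescale gradients so their global L2 norm does not exceed max_norm
torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
optimizer.step()
optimizer.zero_grad()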
2.5 Week 5: Attention Mechanisms and Transformers
Figure 6: Transformer Architecture
2.5.1 Topic Overview
The attention mechanism fundamentally changed how we process sequences.
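At the core of the Transformer is scaled dot-product attention; the sketch below is a minimal single-head version with illustrative tensor shapes:

import torch
import torch.nn.functional as F

def scaled_dot_product_attention(Q, K, V):
    # Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V
    d_k = Q.size(-1)
    scores = Q @ K.transpose(-2, -1) / d_k ** 0.5  # (batch, seq_q, seq_k)
    weights = F.softmax(scores, dim=-1)            # attention distribution over keys
    return weights @ V

Q = torch.randn(1, 5, 64)  # toy shapes: batch=1, 5 queries, d_k=64
K = torch.randn(1, 7, 64)
V = torch.randn(1, 7, 64)
out = scaled_dot_product_attention(Q, K, V)        # (1, 5, 64)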
Masked Language Modeling (MLM): Predict masked tokens (BERT)
Causal Language Modeling (CLM): Predict next token (GPT)
Prefix Language Modeling: Hybrid approach (GLM, PaLM)
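A minimal Hugging Face transformers sketch contrasting the first two objectives; the bert-base-uncased and gpt2 checkpoints are common public examples chosen here purely for illustration:

from transformers import pipeline

# Masked language modeling: predict the [MASK] token (BERT-style)
fill_mask = pipeline("fill-mask", model="bert-base-uncased")
print(fill_mask("The capital of France is [MASK].")[0]["token_str"])

# Causal language modeling: predict the next tokens (GPT-style)
generator = pipeline("text-generation", model="gpt2")
print(generator("The capital of France is", max_new_tokens=5)[0]["generated_text"])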
2.6.2 Learning Objectives
Understand different pre-training strategies and their trade-offs
Master transfer learning concepts for NLP applications
Analyze scaling laws and emergent capabilities in large models
Implement fine-tuning pipelines for specific tasks
2.6.3 Hands-on Session
Practical Applications:
- Fine-tuning BERT for text classification using Hugging Face (see the sketch below)
- Prompt engineering with GPT models
- Model comparison across different architectures
- Efficient fine-tuning with parameter-efficient methods (LoRA, adapters)
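A condensed sketch of the Hugging Face fine-tuning flow for text classification; the IMDB dataset, subset sizes, and hyperparameters are illustrative assumptions rather than fixed course choices:

from datasets import load_dataset
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

dataset = load_dataset("imdb")  # example dataset, assumed for illustration
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=2)

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, padding="max_length", max_length=256)

tokenized = dataset.map(tokenize, batched=True)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="bert-imdb", num_train_epochs=1,
                           per_device_train_batch_size=16),
    train_dataset=tokenized["train"].shuffle(seed=42).select(range(2000)),  # small demo subset
    eval_dataset=tokenized["test"].select(range(500)),
)
trainer.train()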
2.7 Week 7: Large Language Models II — Alignment and Advanced Applications
2.7.1 Topic Overview
This module extends the study of LLMs from pre-training and fine-tuning (Week 6) toward practical alignment methods and advanced deployment techniques:
Instruction and Preference Tuning
Supervised fine-tuning for following instructions
Reinforcement Learning from Human Feedback (RLHF)
Direct Preference Optimization (DPO) as a lightweight alternative (see the loss sketch after this list)
Parameter-Efficient Methods
LoRA and adapters
Prompt-tuning and prefix-tuning
Efficient training at scale
Deployment and Applications
Retrieval-Augmented Generation (RAG) for factual grounding
Agents and tool integration (planning, external APIs, memory)
Multi-modal LLMs (text + vision + speech)
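For the DPO item above, a minimal sketch of the DPO objective computed from sequence log-probabilities; the function name, beta value, and toy inputs are illustrative assumptions:

import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logps, policy_rejected_logps,
             ref_chosen_logps, ref_rejected_logps, beta=0.1):
    # DPO: -log sigmoid(beta * ((log pi_theta(y_w|x) - log pi_ref(y_w|x))
    #                           - (log pi_theta(y_l|x) - log pi_ref(y_l|x))))
    chosen_rewards = policy_chosen_logps - ref_chosen_logps
    rejected_rewards = policy_rejected_logps - ref_rejected_logps
    return -F.logsigmoid(beta * (chosen_rewards - rejected_rewards)).mean()

# Toy log-probabilities for a batch of 3 preference pairs (illustrative values)
loss = dpo_loss(torch.tensor([-10.0, -8.0, -12.0]), torch.tensor([-11.0, -9.5, -12.5]),
                torch.tensor([-10.5, -8.2, -12.1]), torch.tensor([-10.8, -9.0, -12.4]))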
Note: Alignment & Deployment Techniques
Instruction Tuning: Supervised adaptation to human instructions
Reinforcement Learning from Human Feedback (RLHF): Aligning models with preference data
2.7.2 Learning Objectives
Explore model compression and acceleration for real-world deployment
Analyze emerging applications of LLMs in RAG, agents, and multi-modal systems
2.7.3 Hands-on Session
Practical Applications:
- Fine-tuning with LoRA for a downstream classification task (see the PEFT sketch below)
- Implementing a toy preference-based tuning pipeline (DPO)
- Building a retrieval-augmented QA system with Hugging Face
- Experimenting with multi-modal models (e.g., LLaVA)
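A minimal PEFT sketch for the LoRA task; the base checkpoint, target modules, and hyperparameters are illustrative assumptions:

from peft import LoraConfig, TaskType, get_peft_model
from transformers import AutoModelForSequenceClassification

base = AutoModelForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=2)

# Inject low-rank adapters into the attention projections; only these are trained
lora_config = LoraConfig(task_type=TaskType.SEQ_CLS, r=8, lora_alpha=16,
                         lora_dropout=0.1, target_modules=["query", "value"])
model = get_peft_model(base, lora_config)
model.print_trainable_parameters()  # typically well under 1% of the base model's parameters
# `model` can now be passed to the same Trainer pipeline used for full fine-tuning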
2.8 Week 8: Evaluation and Ethics in NLP
2.8.1 Topic Overview
Responsible AI development requires comprehensive evaluation and ethical considerations:
Evaluation Methodologies: Intrinsic vs. extrinsic evaluation frameworks
Bias Detection: Identifying and measuring algorithmic bias
Design comprehensive evaluation frameworks for NLP systems
Identify and quantify algorithmic bias using statistical methods
Implement bias mitigation techniques at different pipeline stages
Consider broader societal implications of NLP deployment
2.8.4 Hands-on Session
# Bias evaluation example
from sklearn.metrics import confusion_matrix
import pandas as pd

# Fairness metric calculation
def demographic_parity_difference(y_true, y_pred, sensitive_attr):
    # Measure bias as the largest gap in positive-prediction rates across groups
    df = pd.DataFrame({"pred": y_pred, "group": sensitive_attr})
    positive_rates = df.groupby("group")["pred"].mean()
    return positive_rates.max() - positive_rates.min()
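A toy invocation of the metric above, assuming binary predictions and a group label per example (values are illustrative):

y_true = [1, 0, 1, 1, 0, 1]
y_pred = [1, 0, 1, 0, 0, 1]
groups = ["A", "A", "A", "B", "B", "B"]
# Positive-prediction rate: group A = 2/3, group B = 1/3 -> difference = 1/3
print(demographic_parity_difference(y_true, y_pred, groups))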
Evaluation Tasks:
- Word embedding bias testing (WEAT, SEAT)
- Fairness evaluation across demographic groups
- Carbon footprint estimation for model training
- Adversarial testing for robustness
3 Part II: Student Project Presentations
3.1 Project Categories and Detailed Descriptions
3.1.1 Category A: Text Classification and Analysis
# Example aspect-sentiment pairs
text = "The pizza was delicious but the service was terrible."
aspects = [
    ("pizza", "positive"),
    ("service", "negative"),
]
3.1.1.3 Project 3: Fake News Detection with Explainability
Objective: Build interpretable fake news detection systems
graph TB
A[Question] --> B[Query Encoder]
B --> C[Passage Retrieval]
C --> D[Passage Encoder]
D --> E[Answer Extraction]
E --> F[Answer]
G[(Knowledge Base<br/>Wikipedia/Web)] --> C
H[Retriever Model<br/>DPR/ColBERT] --> C
I[Reader Model<br/>BERT/T5] --> E
Clear presentation, professional documentation, visualizations
Note: Peer Evaluation Component
Students will also evaluate their peers’ presentations using a structured rubric, contributing 10% to the final grade. This encourages active engagement and critical thinking.
5 Resources and Tools
5.1 Essential Python Libraries
# Core NLP libraries
import nltk              # Natural Language Toolkit
import spacy             # Industrial-strength NLP
import transformers      # Hugging Face Transformers
import torch             # PyTorch for deep learning
import tensorflow as tf  # TensorFlow alternative

# Specialized libraries
import gensim            # Topic modeling and word embeddings
import sklearn           # Traditional ML algorithms (scikit-learn)
import datasets          # Hugging Face datasets
import evaluate          # Evaluation metrics
Individual Work: Weekly topic presentations must be completed individually
Team Projects: Final projects may be completed in teams of maximum 2 students
Code Sharing: All implementations must be original with proper attribution
Plagiarism Policy: Zero tolerance for academic dishonesty
7.2 Technical Requirements
Important: Submission Requirements
Version Control: All projects must use Git with clear commit history
Reproducibility: Include requirements.txt and detailed setup instructions
Documentation: README with project description, usage, and results
Public Repository: GitHub repository with appropriate license
7.3 Support and Office Hours
Weekly Office Hours: Tuesdays 2-4 PM and Thursdays 10 AM-12 PM
Online Forum: Course Slack workspace for peer discussion
Technical Support: TA sessions for implementation help
Guest Speakers: Industry professionals and researchers (select weeks)
7.4 Computing Resources
Students have access to:
- Local GPUs: NVIDIA RTX 4090 for model training
- Cloud Credits: $100 Google Cloud Platform credits per student
- Cluster Access: University HPC cluster for large-scale experiments
- Pretrained Models: Access to Hugging Face Pro for latest models
This course syllabus is subject to updates based on student needs and emerging trends in NLP research. All changes will be communicated through the course management system.