Introduction to Natural Language Processing Course

Author

A. Belcaid

Published

September 1, 2025

1 Course Overview

Note: Course Structure
  • Duration: 14 weeks total
  • Format: Student-led research presentations + practical sessions
  • Target: Fifth-year computer science students
  • Prerequisites: Machine Learning, Python programming, Linear Algebra
gantt
    title Course Timeline
    dateFormat  YYYY-MM-DD
    section Topic Presentations
    Week 1 - NLP Fundamentals     :active, w1, 2025-09-01, 7d
    Week 2 - Language Models      :w2, after w1, 7d
    Week 3 - Word Embeddings      :w3, after w2, 7d
    Week 4 - Neural Networks      :w4, after w3, 7d
    Week 5 - Transformers         :w5, after w4, 7d
    Week 6 - Large LMs            :w6, after w5, 7d
    Week 7 - LLMs II (Alignment)  :w7, after w6, 7d
    Week 8 - Ethics & Evaluation  :w8, after w7, 7d
    section Project Phase
    Project Development           :p1, after w8, 21d
    Final Presentations           :p2, after p1, 21d
Figure 1: Course Timeline and Structure

First 8 weeks: Student presentations (2 hours) + hands-on practice (1 hour)
Last 6 weeks: Project presentations and peer evaluation

2 Part I: Weekly Topics for Student Presentations

2.1 Week 1: Fundamentals of Natural Language Processing

A flowchart showing the typical NLP processing pipeline from raw text to applications
Figure 2: NLP Pipeline Overview

2.1.1 Topic Overview

  • History and Evolution: From rule-based systems to modern neural approaches
  • Core Challenges: Ambiguity, context, pragmatics, and world knowledge
  • Text Preprocessing: Tokenization, normalization, and cleaning techniques
  • Linguistic Foundations: Morphology, syntax, semantics, and pragmatics
Important: Key Resources

2.1.2 Learning Objectives

  1. Understand the scope and applications of NLP across industries
  2. Master the text preprocessing pipeline using industry-standard tools
  3. Identify and analyze linguistic ambiguities and computational challenges
  4. Implement tokenization algorithms for different languages and domains

2.1.3 Hands-on Session

# Sample preprocessing pipeline
import nltk
from nltk.tokenize import word_tokenize
from nltk.stem import PorterStemmer, WordNetLemmatizer

for resource in ("punkt", "wordnet", "averaged_perceptron_tagger"):
    nltk.download(resource, quiet=True)

# Text preprocessing demonstration
text = "The researchers are researching research papers."
tokens = word_tokenize(text)                                  # tokens
stems = [PorterStemmer().stem(t) for t in tokens]             # stems
lemmas = [WordNetLemmatizer().lemmatize(t) for t in tokens]   # lemmas
pos_tags = nltk.pos_tag(tokens)                               # POS tags
print(tokens, stems, lemmas, pos_tags, sep="\n")

Activities:
  • Building a custom tokenizer for social media text
  • Comparing stemming vs. lemmatization performance
  • Multilingual preprocessing challenges
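For the first activity, here is a minimal sketch of a regex-based social-media tokenizer; the pattern set (URLs, mentions, hashtags, emoticons) is illustrative rather than exhaustive, and NLTK's TweetTokenizer offers a ready-made alternative.

import re

# Illustrative token pattern: URLs, @mentions, #hashtags, emoticons,
# words (with optional apostrophes), then any leftover punctuation.
TOKEN_RE = re.compile(
    r"""(?:https?://\S+)        # URLs
      | (?:[@#]\w+)             # mentions and hashtags
      | (?:[:;=][-o]?[)(DPp])   # common emoticons
      | (?:\w+(?:'\w+)?)        # words, optionally with apostrophes
      | (?:[^\w\s])             # remaining punctuation
    """,
    re.VERBOSE,
)

def tweet_tokenize(text):
    return TOKEN_RE.findall(text)

print(tweet_tokenize("@nlp_class check https://example.com #NLP :)"))
# ['@nlp_class', 'check', 'https://example.com', '#NLP', ':)']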

2.2 Week 2: Statistical Language Models and N-grams

2.2.1 Topic Overview

Statistical language modeling forms the foundation of modern NLP. This week covers:

  • Probability Theory in language: Chain rule, independence assumptions
  • N-gram Models: Mathematical formulation and implementation details
  • Smoothing Techniques: Handling zero probabilities and data sparsity
  • Evaluation Metrics: Perplexity, cross-entropy, and information theory
graph TD
    A["Unigram Model<br/>P(w)"] --> B["Bigram Model<br/>P(w|w-1)"]
    B --> C["Trigram Model<br/>P(w|w-2,w-1)"]
    C --> D["N-gram Model<br/>P(w|w-n+1...w-1)"]

    A --> E["Independence<br/>High bias, Low variance"]
    D --> F["Context Dependence<br/>Low bias, High variance"]
Figure 3: N-gram Model Hierarchy
Tip: Mathematical Foundation

The n-gram probability is calculated as: \[P(w_i|w_{i-n+1}^{i-1}) = \frac{C(w_{i-n+1}^{i})}{C(w_{i-n+1}^{i-1})}\]

Where \(C(·)\) represents the count function in the training corpus.

2.2.2 Learning Objectives

  • Build and evaluate n-gram language models from scratch
  • Understand the bias-variance tradeoff in model complexity
  • Apply various smoothing techniques (Laplace, Good-Turing, Kneser-Ney)
  • Calculate and interpret perplexity scores

2.2.3 Hands-on Session

Implementation Tasks:
  • N-gram model training on different corpus sizes
  • Perplexity calculation and analysis
  • Smoothing method comparison study
  • Text generation using trained models
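The following is a minimal sketch of the count-based bigram estimate from the formula above, with add-one (Laplace) smoothing and perplexity; the toy corpus is hypothetical.

import math
from collections import Counter

corpus = [["<s>", "the", "cat", "sat", "</s>"],
          ["<s>", "the", "dog", "sat", "</s>"]]

unigrams = Counter(w for sent in corpus for w in sent)
bigrams = Counter((s[i], s[i + 1]) for s in corpus for i in range(len(s) - 1))
V = len(unigrams)  # vocabulary size

def p_laplace(prev, w):
    # Add-one smoothed bigram estimate: (C(prev, w) + 1) / (C(prev) + V)
    return (bigrams[(prev, w)] + 1) / (unigrams[prev] + V)

def perplexity(sent):
    log_p = sum(math.log(p_laplace(sent[i], sent[i + 1]))
                for i in range(len(sent) - 1))
    return math.exp(-log_p / (len(sent) - 1))

print(perplexity(["<s>", "the", "cat", "sat", "</s>"]))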

2.3 Week 3: Word Representations and Embeddings

Visualization of word embeddings showing semantic relationships in vector space
Figure 4: Word Embedding Space

2.3.1 Topic Overview

The transition from sparse to dense word representations revolutionized NLP:

  • Distributional Semantics: “You shall know a word by the company it keeps”
  • Word2Vec Algorithms: Skip-gram and Continuous Bag of Words (CBOW)
  • Global Vectors (GloVe): Matrix factorization approach to embeddings
  • FastText: Subword information and out-of-vocabulary handling
Note: Word2Vec Intuition
  • Skip-gram: Predicts context words given target word
  • CBOW: Predicts target word given context words
  • Both use hierarchical softmax or negative sampling for efficiency

2.3.2 Key Resources

2.3.3 Learning Objectives

  1. Understand limitations of one-hot encoding and sparse representations
  2. Master the mathematics behind Word2Vec training objectives
  3. Implement embedding evaluation using intrinsic and extrinsic methods
  4. Analyze semantic and syntactic relationships in embedding spaces

2.3.4 Hands-on Session

# Word2Vec training example
from gensim.models import Word2Vec

# Toy corpus: Word2Vec expects a list of tokenized sentences
sentences = [
    ["king", "queen", "royal", "palace"],
    ["man", "woman", "boy", "girl"],
    ["paris", "london", "france", "england"],
]

# Training pipeline (skip-gram vs. CBOW is controlled by the `sg` flag)
model = Word2Vec(sentences, vector_size=100, window=5, min_count=1)
print(model.wv.most_similar("king", topn=3))

Practical Exercises:
  • Training Word2Vec on domain-specific corpora
  • t-SNE visualization of embedding clusters
  • Word analogy tasks: “king - man + woman = ?”
  • Cross-lingual embedding alignment
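Building on the model trained above, this sketch runs the analogy query and a t-SNE projection; with a toy corpus the results are not meaningful, but the same calls apply to a real corpus.

import matplotlib.pyplot as plt
from sklearn.manifold import TSNE

# Analogy: king - man + woman ≈ queen (only reliable with large corpora)
print(model.wv.most_similar(positive=["king", "woman"], negative=["man"], topn=3))

# Project every vocabulary vector to 2-D and label the points
words = list(model.wv.index_to_key)
coords = TSNE(n_components=2, perplexity=5, random_state=0).fit_transform(model.wv[words])
plt.scatter(coords[:, 0], coords[:, 1])
for word, (x, y) in zip(words, coords):
    plt.annotate(word, (x, y))
plt.show()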

2.4 Week 4: Neural Networks for NLP

2.4.1 Topic Overview

The neural revolution in NLP began with architectures designed for sequential data:

  • Recurrent Neural Networks (RNNs): Processing variable-length sequences
  • Long Short-Term Memory (LSTM): Solving the vanishing gradient problem
  • Gated Recurrent Units (GRUs): Simplified gating mechanisms
  • Bidirectional Networks: Capturing both forward and backward context
graph LR
    A[Vanilla RNN<br/>Vanishing Gradients] --> B[LSTM<br/>Forget Gate + Input Gate + Output Gate]
    B --> C[GRU<br/>Reset Gate + Update Gate]
    C --> D[BiLSTM<br/>Forward + Backward Processing]
    
    B --> E[Applications:<br/>Language Modeling<br/>Sequence Classification<br/>Named Entity Recognition]
Figure 5: RNN Architecture Evolution

2.4.2 Learning Objectives

  • Design neural architectures for sequence processing tasks
  • Understand gradient flow in recurrent connections
  • Implement bidirectional processing for improved context modeling
  • Apply regularization techniques specific to sequential data

2.4.3 Hands-on Session

Implementation Focus:
  • RNN-based language model implementation
  • LSTM vs. GRU comparison on sequence classification
  • Gradient clipping and other training stabilization techniques
  • Attention visualization in sequence-to-sequence models
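As a starting point for the LSTM vs. GRU comparison, here is a minimal sketch of a bidirectional LSTM classifier with gradient clipping; the vocabulary size and the random batch are placeholders for a real data pipeline.

import torch
import torch.nn as nn

class LSTMClassifier(nn.Module):
    def __init__(self, vocab_size, embed_dim=128, hidden_dim=256, num_classes=2):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True, bidirectional=True)
        self.fc = nn.Linear(2 * hidden_dim, num_classes)

    def forward(self, token_ids):                       # (batch, seq_len)
        _, (h_n, _) = self.lstm(self.embed(token_ids))
        final = torch.cat([h_n[-2], h_n[-1]], dim=-1)   # forward + backward final states
        return self.fc(final)

model = LSTMClassifier(vocab_size=10_000)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
logits = model(torch.randint(0, 10_000, (4, 20)))       # dummy batch of token ids
loss = nn.CrossEntropyLoss()(logits, torch.randint(0, 2, (4,)))
optimizer.zero_grad()
loss.backward()
torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)  # training stabilization
optimizer.step()

Swapping in a GRU is a one-line change: nn.GRU takes the same constructor arguments (and returns h_n without a cell state).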

2.5 Week 5: Attention Mechanisms and Transformers

Detailed diagram of the Transformer architecture showing encoder-decoder structure
Figure 6: Transformer Architecture

2.5.1 Topic Overview

The attention mechanism fundamentally changed how we process sequences:

  • Attention Intuition: Differentiable dictionary lookup mechanism
  • Self-Attention: Query, Key, Value matrices and scaled dot-product
  • Multi-Head Attention: Parallel attention computations with different representations
  • Transformer Architecture: Complete replacement of recurrence with attention
Important: Attention Formula

The scaled dot-product attention is computed as: \[\text{Attention}(Q,K,V) = \text{softmax}\left(\frac{QK^T}{\sqrt{d_k}}\right)V\]

Where \(Q\), \(K\), \(V\) are query, key, and value matrices respectively.

2.5.2 Key Resources

2.5.3 Learning Objectives

  1. Grasp the mathematical intuition behind attention mechanisms
  2. Understand positional encoding and why it’s necessary
  3. Implement multi-head attention from scratch
  4. Analyze computational complexity advantages over RNNs

2.5.4 Hands-on Session

# Self-attention implementation
import torch
import torch.nn as nn

class MultiHeadAttention(nn.Module):
    def __init__(self, d_model, num_heads):
        super().__init__()
        assert d_model % num_heads == 0
        self.num_heads = num_heads
        self.d_k = d_model // num_heads
        self.qkv = nn.Linear(d_model, 3 * d_model)   # joint Q, K, V projection
        self.out = nn.Linear(d_model, d_model)

    def forward(self, x):                            # x: (batch, seq_len, d_model)
        B, T, _ = x.shape
        q, k, v = self.qkv(x).chunk(3, dim=-1)
        # Split into heads: (batch, heads, seq_len, d_k)
        q, k, v = (t.view(B, T, self.num_heads, self.d_k).transpose(1, 2)
                   for t in (q, k, v))
        att = torch.softmax(q @ k.transpose(-2, -1) / self.d_k ** 0.5, dim=-1)
        return self.out((att @ v).transpose(1, 2).reshape(B, T, -1))

Practical Tasks:
  • Implementing scaled dot-product attention
  • Building a mini-Transformer for sequence classification
  • Attention weight visualization and interpretation
  • Positional encoding analysis
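For the positional encoding analysis, this is a minimal sketch of the sinusoidal encoding from the original Transformer paper; it assumes an even d_model.

import math
import torch

def sinusoidal_positional_encoding(max_len, d_model):
    # PE[pos, 2i] = sin(pos / 10000^(2i/d_model)); PE[pos, 2i+1] = cos(...)
    pos = torch.arange(max_len).unsqueeze(1).float()
    div = torch.exp(torch.arange(0, d_model, 2).float()
                    * (-math.log(10000.0) / d_model))
    pe = torch.zeros(max_len, d_model)
    pe[:, 0::2] = torch.sin(pos * div)
    pe[:, 1::2] = torch.cos(pos * div)
    return pe

print(sinusoidal_positional_encoding(max_len=50, d_model=64).shape)  # (50, 64)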

2.6 Week 6: Large Language Models and Pre-training

2.6.1 Topic Overview

The era of large-scale pre-trained models has transformed NLP applications:

  • Pre-training Paradigms: From word prediction to masked language modeling
  • BERT Family: Bidirectional encoder representations from transformers
  • GPT Series: Autoregressive language model scaling
  • Transfer Learning: Fine-tuning strategies for downstream tasks
timeline
    title LLM Evolution Timeline
    2018 : BERT (Google)
         : GPT-1 (OpenAI)
         : ELMo (AllenNLP)
    2019 : GPT-2
         : RoBERTa
         : XLNet
    2020 : GPT-3
         : T5
         : ELECTRA
    2021 : Codex
         : WebGPT
         : LaMDA
    2022 : ChatGPT
         : InstructGPT
         : PaLM
    2023 : GPT-4
         : Claude
         : LLaMA
Figure 7: Evolution of Large Language Models
Note: Pre-training Objectives
  • Masked Language Modeling (MLM): Predict masked tokens (BERT)
  • Causal Language Modeling (CLM): Predict next token (GPT)
  • Prefix Language Modeling: Hybrid approach (GLM, PaLM)

2.6.2 Learning Objectives

  • Understand different pre-training strategies and their trade-offs
  • Master transfer learning concepts for NLP applications
  • Analyze scaling laws and emergent capabilities in large models
  • Implement fine-tuning pipelines for specific tasks

2.6.3 Hands-on Session

Practical Applications:
  • Fine-tuning BERT for text classification using Hugging Face
  • Prompt engineering with GPT models
  • Model comparison across different architectures
  • Efficient fine-tuning with parameter-efficient methods (LoRA, adapters)
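The sketch below outlines BERT fine-tuning with the Hugging Face Trainer; the IMDB dataset, the 2,000-example subsample, and the hyperparameters are illustrative choices, not course requirements.

from datasets import load_dataset
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

dataset = load_dataset("imdb")
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=2)

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, padding="max_length")

train_set = dataset["train"].map(tokenize, batched=True)
train_set = train_set.shuffle(seed=42).select(range(2000))  # small demo subset

args = TrainingArguments(output_dir="bert-imdb", num_train_epochs=1,
                         per_device_train_batch_size=8)
Trainer(model=model, args=args, train_dataset=train_set).train()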

2.7 Week 7: Large Language Models II (Alignment and Advanced Applications)

2.7.1 Topic Overview

This module extends the study of LLMs from pre-training and fine-tuning (Week 6) toward practical alignment methods and advanced deployment techniques:

  • Instruction and Preference Tuning
    • Supervised fine-tuning for following instructions
    • Reinforcement Learning from Human Feedback (RLHF)
    • Direct Preference Optimization (DPO) as a lightweight alternative
  • Parameter-Efficient Methods
    • LoRA and adapters
    • Prompt-tuning and prefix-tuning
    • Efficient training at scale
  • Deployment and Applications
    • Retrieval-Augmented Generation (RAG) for factual grounding
    • Agents and tool integration (planning, external APIs, memory)
    • Multi-modal LLMs (text + vision + speech)
Note: Alignment & Deployment Techniques
  • Instruction Tuning: Supervised adaptation to human instructions
  • Reinforcement Learning from Human Feedback (RLHF): Aligning models with preference data
  • Direct Preference Optimization (DPO): Lightweight preference alignment
  • Parameter-Efficient Fine-tuning: LoRA, adapters, prompt-tuning, and prefix-tuning
  • Model Compression & Acceleration: Distillation, pruning, quantization for efficient inference
  • Retrieval-Augmented Generation (RAG): Grounding LLMs with external knowledge

2.7.2 Learning Objectives

  • Understand instruction tuning and preference-based alignment strategies
  • Compare parameter-efficient fine-tuning approaches
  • Explore model compression and acceleration for real-world deployment
  • Analyze emerging applications of LLMs in RAG, agents, and multi-modal systems

2.7.3 Hands-on Session

Practical Applications:
  • Fine-tuning with LoRA for a downstream classification task
  • Implementing a toy preference-based tuning pipeline (DPO)
  • Building a retrieval-augmented QA system with Hugging Face
  • Experimenting with multi-modal models (e.g., LLaVA)
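A minimal sketch of parameter-efficient fine-tuning with the peft library: wrapping a base model in a LoRA configuration leaves only the low-rank adapter weights trainable. The rank and dropout values here are illustrative.

from peft import LoraConfig, get_peft_model
from transformers import AutoModelForSequenceClassification

base = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=2)
config = LoraConfig(task_type="SEQ_CLS", r=8, lora_alpha=16, lora_dropout=0.1)
model = get_peft_model(base, config)
model.print_trainable_parameters()  # only a small fraction of weights train

The wrapped model drops into the same Trainer pipeline sketched in Week 6.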

2.8 Week 8: Evaluation and Ethics in NLP

2.8.1 Topic Overview

Responsible AI development requires comprehensive evaluation and ethical considerations:

  • Evaluation Methodologies: Intrinsic vs. extrinsic evaluation frameworks
  • Bias Detection: Identifying and measuring algorithmic bias
  • Fairness Metrics: Demographic parity, equalized odds, individual fairness
  • Environmental Impact: Carbon footprint of large model training
Diagram showing how bias enters NLP systems through data, algorithms, and applications
Figure 8: Bias in NLP Systems
Warning: Ethical Considerations
  • Representation Bias: Training data may not represent all populations
  • Measurement Bias: Evaluation metrics may favor certain groups
  • Evaluation Bias: Test sets may contain demographic skews
  • Deployment Bias: Real-world usage may differ from intended applications

2.8.2 Key Resources

2.8.3 Learning Objectives

  1. Design comprehensive evaluation frameworks for NLP systems
  2. Identify and quantify algorithmic bias using statistical methods
  3. Implement bias mitigation techniques at different pipeline stages
  4. Consider broader societal implications of NLP deployment

2.8.4 Hands-on Session

# Bias evaluation example
import pandas as pd

# Fairness metric calculation
def demographic_parity_difference(y_true, y_pred, sensitive_attr):
    """Largest gap in positive-prediction rates across groups.

    y_true is unused by demographic parity but kept for a uniform
    signature alongside error-based fairness metrics."""
    df = pd.DataFrame({"pred": y_pred, "group": sensitive_attr})
    rates = df.groupby("group")["pred"].mean()
    return rates.max() - rates.min()

print(demographic_parity_difference(
    y_true=[1, 0, 1, 0], y_pred=[1, 0, 1, 1], sensitive_attr=["a", "a", "b", "b"]))

Evaluation Tasks:
  • Word embedding bias testing (WEAT, SEAT)
  • Fairness evaluation across demographic groups
  • Carbon footprint estimation for model training
  • Adversarial testing for robustness
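A toy WEAT-style effect size can be computed directly from cosine similarities, as sketched below; emb is assumed to map words to vectors (e.g., model.wv from Week 3), and a full WEAT additionally runs a permutation test for significance.

import numpy as np

def cos(u, v):
    return np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))

def association(w, A, B, emb):
    # Mean similarity to attribute set A minus attribute set B
    return (np.mean([cos(emb[w], emb[a]) for a in A])
            - np.mean([cos(emb[w], emb[b]) for b in B]))

def weat_effect_size(X, Y, A, B, emb):
    # X, Y: target word sets; A, B: attribute word sets
    x = [association(w, A, B, emb) for w in X]
    y = [association(w, A, B, emb) for w in Y]
    return (np.mean(x) - np.mean(y)) / np.std(x + y)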

3 Part II: Student Project Presentations

3.1 Project Categories and Detailed Descriptions

3.1.1 Category A: Text Classification and Analysis

3.1.1.1 Project 1: Multi-label News Article Classification

Objective: Build a robust multi-label classification system for news articles

  • Dataset: Reuters-21578 or BBC News Dataset
  • Techniques: Compare traditional ML (TF-IDF + SVM) vs. modern approaches (BERT, RoBERTa)
  • Evaluation: Multi-label F1, Hamming loss, subset accuracy
  • Extensions: Hierarchical classification, active learning for label efficiency
Tip: Technical Challenges
  • Label Imbalance: Some categories have very few examples
  • Label Correlation: Economic news often overlaps with political news
  • Temporal Drift: News topics evolve over time
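To make the evaluation metrics listed above concrete, here is a minimal sketch of multi-label scoring with scikit-learn; the label matrices are toy data.

import numpy as np
from sklearn.metrics import accuracy_score, f1_score, hamming_loss

# Rows = articles, columns = binary label indicators (toy data)
y_true = np.array([[1, 0, 1], [0, 1, 0], [1, 1, 0]])
y_pred = np.array([[1, 0, 0], [0, 1, 0], [1, 0, 0]])

print("micro F1:       ", f1_score(y_true, y_pred, average="micro"))
print("Hamming loss:   ", hamming_loss(y_true, y_pred))
print("subset accuracy:", accuracy_score(y_true, y_pred))  # exact-match ratio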

3.1.1.2 Project 2: Aspect-Based Sentiment Analysis

Objective: Joint extraction of aspects and sentiment classification

  • Dataset: SemEval ABSA datasets, restaurant/hotel reviews
  • Techniques: Joint models for aspect extraction and sentiment classification
  • Evaluation: Aspect-level F1 scores, sentiment accuracy per aspect
  • Extensions: Cross-domain adaptation, multilingual ABSA
# Example aspect-sentiment pairs
text = "The pizza was delicious but the service was terrible."
aspects = [
    ("pizza", "positive"),
    ("service", "negative")
]

3.1.1.3 Project 3: Fake News Detection with Explainability

Objective: Build interpretable fake news detection systems

  • Dataset: FakeNewsNet, LIAR dataset
  • Techniques: Feature engineering + deep learning + attention visualization
  • Evaluation: Classification metrics + human evaluation of explanations
  • Extensions: Multi-modal fake news detection (text + images)

3.1.2 Category B: Information Extraction and Retrieval

3.1.2.1 Project 4: Named Entity Recognition for Specialized Domains

Objective: Develop domain-specific NER systems

  • Dataset: BioBERT datasets (biomedical) or financial NER
  • Techniques: CRF vs. BERT-based sequence labeling
  • Evaluation: Entity-level F1, error analysis by entity type
  • Extensions: Few-shot NER, nested entity recognition

3.1.2.2 Project 5: Open-Domain Question Answering System

Objective: Build end-to-end question answering pipeline

  • Dataset: Natural Questions, MS MARCO
  • Architecture: Dense Passage Retrieval + Reading Comprehension
  • Evaluation: Exact match, F1, answer coverage analysis
  • Extensions: Multi-hop reasoning, conversational QA
graph TB
    A[Question] --> B[Query Encoder]
    B --> C[Passage Retrieval]
    C --> D[Passage Encoder]
    D --> E[Answer Extraction]
    E --> F[Answer]
    
    G[(Knowledge Base<br/>Wikipedia/Web)] --> C
    H[Retriever Model<br/>DPR/ColBERT] --> C
    I[Reader Model<br/>BERT/T5] --> E
Figure 9: Question Answering System Architecture
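A retrieve-then-read pipeline like Figure 9 can be prototyped in a few lines. The sketch below swaps dense retrieval for TF-IDF and uses the default extractive QA model of transformers.pipeline, with toy passages standing in for Wikipedia.

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity
from transformers import pipeline

passages = ["The Eiffel Tower is in Paris.", "Mount Everest is in Nepal."]
question = "Where is the Eiffel Tower?"

# Retrieval step: score passages against the question
vec = TfidfVectorizer().fit(passages)
scores = cosine_similarity(vec.transform([question]), vec.transform(passages))[0]
best = passages[scores.argmax()]

# Reading step: extract an answer span from the top passage
reader = pipeline("question-answering")
print(reader(question=question, context=best))  # answer span + confidence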

3.1.2.3 Project 6: Automatic Fact Verification

Objective: Verify claims against evidence sources

  • Dataset: FEVER (Fact Extraction and VERification)
  • Pipeline: Evidence retrieval → Textual entailment → Verdict prediction
  • Evaluation: FEVER score, evidence selection accuracy
  • Extensions: Real-time fact-checking, claim generation

3.1.3 Category C: Text Generation and Summarization

3.1.3.1 Project 7: Neural Abstractive Text Summarization

Objective: Generate coherent abstractive summaries

  • Dataset: CNN/DailyMail, XSum
  • Techniques: Encoder-decoder with attention, copy mechanisms, coverage
  • Evaluation: ROUGE scores, BERTScore, factual consistency
  • Extensions: Multi-document summarization, controllable length
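A pretrained encoder-decoder gives a quick abstractive baseline before experimenting with copy mechanisms or coverage; the model name and input file below are illustrative.

from transformers import pipeline

summarizer = pipeline("summarization", model="facebook/bart-large-cnn")
article = open("article.txt").read()  # any CNN/DailyMail-style article
print(summarizer(article, max_length=60, min_length=15)[0]["summary_text"])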

3.1.3.2 Project 8: Dialogue System with Personality

Objective: Develop persona-consistent chatbots

  • Dataset: PersonaChat, Blended Skill Talk
  • Techniques: Persona-aware generation, retrieval-augmented responses
  • Evaluation: Automatic metrics + human evaluation for consistency
  • Extensions: Emotional intelligence, long-term memory

3.1.3.3 Project 9: Creative Writing Assistant

Objective: AI-powered creative content generation

  • Dataset: WritingPrompts, poetry corpora
  • Techniques: Fine-tuned GPT models with controlled generation
  • Evaluation: Creativity metrics, human preference studies, style analysis
  • Extensions: Interactive story writing, genre transfer

3.1.4 Category D: Multilingual and Low-Resource NLP

3.1.4.1 Project 10: Cross-lingual Text Classification

Objective: Zero-shot transfer across languages

  • Dataset: MLDoc, XNLI
  • Techniques: Multilingual BERT, cross-lingual word embeddings
  • Evaluation: Zero-shot transfer performance analysis
  • Extensions: Few-shot adaptation, code-switching handling

3.1.4.2 Project 11: Machine Translation for Low-Resource Languages

Objective: Improve translation for under-resourced languages

  • Dataset: OPUS collections, WMT shared tasks
  • Techniques: Transfer learning, back-translation, multilingual models
  • Evaluation: BLEU, chrF, human evaluation, error analysis
  • Extensions: Pivot translation, unsupervised MT

3.1.5 Category E: Specialized Applications

3.1.5.1 Project 12: Legal Document Analysis

Objective: Automated legal document processing

  • Dataset: Legal case documents, contracts, legislative texts
  • Techniques: Domain-adapted BERT, hierarchical document modeling
  • Evaluation: Legal expert evaluation, domain-specific metrics
  • Extensions: Legal precedent search, contract risk assessment

3.1.5.2 Project 13: Medical Text Mining

Objective: Clinical decision support through NLP

  • Dataset: MIMIC-III clinical notes, PubMed abstracts
  • Techniques: BioBERT, clinical NER, relation extraction
  • Evaluation: Medical accuracy, clinical utility assessment
  • Extensions: Drug interaction prediction, diagnosis assistance

3.1.5.3 Project 14: Mental Health Detection in Social Media

Objective: Early detection of mental health indicators

  • Dataset: Reddit mental health datasets, Twitter emotion data
  • Techniques: Multi-modal analysis, temporal pattern modeling
  • Evaluation: Precision/recall for sensitive detection, ethical review
  • Extensions: Crisis intervention systems, privacy-preserving methods
Warning: Ethical Considerations for Mental Health Projects
  • Privacy: Anonymization and consent requirements
  • Harm Prevention: Avoiding false positives in crisis detection
  • Professional Oversight: Collaboration with mental health professionals
  • Bias Mitigation: Ensuring fair representation across demographics

4 Assessment Criteria

4.1 Topic Presentations (Weeks 1-8)

Table 1: Assessment rubric for weekly topic presentations

Criterion                | Weight | Description
Content Quality          | 40%    | Research depth, technical accuracy, concept coverage
Presentation Skills      | 30%    | Communication clarity, visual aids, audience engagement
Technical Implementation | 30%    | Code quality, demonstration effectiveness, innovation

4.2 Project Presentations (Weeks 9-14)

Table 2: Assessment rubric for final project presentations

Criterion          | Weight | Key Evaluation Points
Technical Rigor    | 35%    | Methodology selection, implementation quality, experimental design
Innovation         | 25%    | Novel approaches, creative problem-solving, original insights
Results & Analysis | 25%    | Comprehensive evaluation, error analysis, baseline comparison
Communication      | 15%    | Clear presentation, professional documentation, visualizations
Note: Peer Evaluation Component

Students will also evaluate their peers’ presentations using a structured rubric, contributing 10% to the final grade. This encourages active engagement and critical thinking.

5 Resources and Tools

5.1 Essential Python Libraries

# Core NLP libraries
import nltk                 # Natural Language Toolkit
import spacy                # Industrial-strength NLP
import transformers         # Hugging Face Transformers
import torch                # PyTorch for deep learning
import tensorflow as tf     # TensorFlow alternative

# Specialized libraries
import gensim               # Topic modeling and word embeddings
import sklearn              # Traditional ML algorithms (scikit-learn)
import datasets             # Hugging Face datasets
import evaluate             # Evaluation metrics

5.2 Dataset Repositories

5.3 Evaluation Tools and Metrics

  • Text Generation: BLEU, ROUGE, BERTScore, METEOR
  • Classification: Precision, Recall, F1, AUC-ROC
  • Sequence Labeling: seqeval, entity-level F1
  • Bias and Fairness: Fairlearn, AI Fairness 360
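Several of the metrics above are available through the Hugging Face evaluate library; a minimal sketch with toy strings:

import evaluate

rouge = evaluate.load("rouge")   # needs the rouge_score package
bleu = evaluate.load("bleu")

preds = ["the cat sat on the mat"]
refs = ["the cat is sitting on the mat"]

print(rouge.compute(predictions=preds, references=refs))
print(bleu.compute(predictions=preds, references=[[r] for r in refs]))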

6 Timeline and Milestones

gantt
    title Detailed Course Timeline
    dateFormat  YYYY-MM-DD
    section Topic Presentations
    Week 1 - NLP Fundamentals      :active, t1, 2025-09-01, 7d
    Week 2 - Language Models       :t2, after t1, 7d
    Week 3 - Word Embeddings       :t3, after t2, 7d
    Week 4 - Neural Networks       :t4, after t3, 7d
    Week 5 - Transformers          :t5, after t4, 7d
    Week 6 - Large Language Models :t6, after t5, 7d
    Week 7 - LLMs II (Alignment)   :t7, after t6, 7d
    Week 8 - Ethics & Evaluation   :t8, after t7, 7d
    section Milestones
    Project Proposal Due          :milestone, m1, 2025-10-01, 0d
    Progress Check Meeting        :milestone, m2, 2025-11-01, 0d
    Final Presentation            :milestone, m3, 2025-12-01, 0d
    section Project Development
    Project Phase 1               :p1, after t8, 14d
    Project Phase 2               :p2, after p1, 14d
    Final Presentations           :p3, after p2, 14d
Figure 10: Detailed Course Timeline with Milestones

6.1 Key Milestones

Project Proposal:
  • Project proposal submission (2-page document)
  • Literature review progress check
  • Dataset acquisition and preliminary analysis
  • Technical approach validation with instructor

Project Pitch:
  • 5-minute project pitches to class
  • Peer feedback and suggestions
  • Final project scope confirmation
  • Team formation (if applicable)

Progress Check:
  • Interim results presentation
  • Technical challenges discussion
  • Methodology adjustments if needed
  • Timeline reassessment and planning

Final Presentations:
  • 15-minute presentations with Q&A
  • Live demonstrations of working systems
  • Peer evaluation and feedback
  • Industry guest evaluators (when possible)

7 Additional Course Information

7.1 Collaboration and Academic Integrity

  • Individual Work: Weekly topic presentations must be completed individually
  • Team Projects: Final projects may be completed in teams of maximum 2 students
  • Code Sharing: All implementations must be original with proper attribution
  • Plagiarism Policy: Zero tolerance for academic dishonesty

7.2 Technical Requirements

Important: Submission Requirements
  • Version Control: All projects must use Git with clear commit history
  • Reproducibility: Include requirements.txt and detailed setup instructions
  • Documentation: README with project description, usage, and results
  • Public Repository: GitHub repository with appropriate license

7.3 Support and Office Hours

  • Weekly Office Hours: Tuesdays 2-4 PM and Thursdays 10 AM-12 PM
  • Online Forum: Course Slack workspace for peer discussion
  • Technical Support: TA sessions for implementation help
  • Guest Speakers: Industry professionals and researchers (select weeks)

7.4 Computing Resources

Students have access to:
  • Local GPUs: NVIDIA RTX 4090 for model training
  • Cloud Credits: $100 Google Cloud Platform credits per student
  • Cluster Access: University HPC cluster for large-scale experiments
  • Pretrained Models: Access to Hugging Face Pro for latest models


This course syllabus is subject to updates based on student needs and emerging trends in NLP research. All changes will be communicated through the course management system.