
DS314BKK

Mastering NLP: Foundations to Frontiers with LLMs

Bangkok Campus
Jun 30, 2025 - Jul 18, 2025
This course offers a comprehensive introduction to state-of-the-art NLP methods, with a strong focus on recent advances in LLMs, specifically BERT and GPT.

Faculty

Polina Proskura

Applied Scientist at Amazon

Course length

3 weeks

Duration

3 hours
per day

Total hours

45 hours

Credits

6 ECTS

Language

English

Course type

Offline

Fee for single course

€1500

Fee for degree students

€750

Skills you’ll learn

  • Deep Learning basics
  • Embeddings in NLP
  • Text Processing
  • Large Language Models
  • Using the OpenAI API
  • Machine Translation
  • Semantic Analysis

Overview

Natural Language Processing is central to many modern AI applications – from search engines and machine translation to large-scale text analysis. This course offers a comprehensive introduction to state-of-the-art NLP methods, with a strong focus on recent advances in large language models (LLMs), specifically BERT and GPT.

We will begin the course with the fundamentals of NLP, such as text preprocessing and vector-based representations of words and sentences. We will then move on to tasks such as text classification, language modelling, and machine translation, to develop familiarity with the core challenges in the field. Building on this foundation, we will explore modern neural network architectures for language tasks, progressing from RNNs to BERT and GPT-based transformers.

The course places a strong emphasis on both practical skills and theoretical understanding, providing valuable preparation for a future career as a data scientist.

Learning highlights

  • Understand the theoretical foundations of modern Natural Language Processing, including the architecture of complex systems such as transformers and large language models.
  • Gain hands-on experience with state-of-the-art NLP frameworks and libraries, such as PyTorch, Hugging Face Transformers, and the OpenAI API.
  • Develop the ability to implement and evaluate solutions for tasks such as text generation, machine translation, question answering, and text summarisation.
  • Become familiar with the capabilities, limitations, and complexities of NLP models in various industrial applications.
  • Invent and present your own solution to a real-world NLP problem.

Course outline

15 classes

Dive into the details of the course and get a sense of what each class will cover.
Monday
1

Introduction. Text Processing.

Structure of text data. Preprocessing techniques: tokenisation, normalisation, stemming, and lemmatisation. Text preprocessing in Python using the NLTK and spaCy libraries.
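To give a flavour of this class, here is a toy preprocessing sketch in plain Python. The course itself uses NLTK and spaCy; the regex tokeniser and suffix-stripping stemmer below are simplified illustrations, not production tools.

```python
import re

def tokenize(text):
    """Split text into lowercase word tokens (a crude stand-in for NLTK/spaCy tokenisers)."""
    return re.findall(r"[a-z']+", text.lower())

def stem(token):
    """Toy suffix-stripping stemmer, loosely in the spirit of Porter stemming."""
    for suffix in ("ing", "ed", "es", "s"):
        if token.endswith(suffix) and len(token) > len(suffix) + 2:
            return token[: -len(suffix)]
    return token

tokens = tokenize("The cats were running and jumped over boxes.")
stems = [stem(t) for t in tokens]
# tokens: ['the', 'cats', 'were', 'running', 'and', 'jumped', 'over', 'boxes']
# stems:  ['the', 'cat', 'were', 'runn', 'and', 'jump', 'over', 'box']
```

Note the over-stemming of "running" to "runn": exactly the kind of error a real stemmer's rule set is designed to handle, and one reason the class contrasts stemming with lemmatisation.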

Tuesday
2

Linguistic Foundations of NLP.

Linguistic basics: syntax, semantics, and morphology. POS tagging and dependency parsing applied to real-world examples. Handling errors and ambiguities in natural text.

Wednesday
3

Text Representations.

Basic methods: Bag of Words, TF-IDF, and the vector space model. Document similarity and text classification problems.
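The Bag of Words and TF-IDF ideas fit in a few lines of plain Python. This toy sketch (illustrative only, with an invented three-document corpus) uses raw-count term frequency and a log(N/df) inverse document frequency:

```python
import math
from collections import Counter

def tf_idf(docs):
    """TF-IDF weights for a list of tokenised documents.
    TF = count / document length; IDF = log(N / document frequency)."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        tf = Counter(doc)
        weights.append({t: (c / len(doc)) * math.log(n / df[t]) for t, c in tf.items()})
    return weights

docs = [["the", "cat", "sat"], ["the", "dog", "sat"], ["the", "cat", "ran"]]
w = tf_idf(docs)
# "the" appears in every document, so its IDF (and hence its weight) is zero,
# while rarer terms like "dog" receive higher weights
```

This is why TF-IDF vectors make document similarity meaningful: ubiquitous words contribute nothing, distinctive words dominate.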

Thursday
4

Language Modelling Problem.

N-gram language models: sequence probabilities, smoothing techniques, and perplexity. Generation strategies including greedy decoding and beam search.
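A minimal bigram model with add-one (Laplace) smoothing shows where the sequence probabilities and perplexity in this class come from. The nine-word corpus below is invented purely for illustration:

```python
import math
from collections import Counter

corpus = "the cat sat the cat ran the dog sat".split()
vocab = set(corpus)
unigrams = Counter(corpus)
bigrams = Counter(zip(corpus, corpus[1:]))

def prob(w1, w2):
    """Add-one (Laplace) smoothed bigram probability P(w2 | w1)."""
    return (bigrams[(w1, w2)] + 1) / (unigrams[w1] + len(vocab))

def perplexity(words):
    """Perplexity of a word sequence under the bigram model."""
    logp = sum(math.log(prob(a, b)) for a, b in zip(words, words[1:]))
    return math.exp(-logp / (len(words) - 1))

p = prob("the", "cat")   # (2 + 1) / (3 + 5) = 0.375 — a frequent bigram
```

Smoothing keeps unseen bigrams from getting zero probability, which would make perplexity infinite on any new text.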

Friday
5

Word Embeddings.

Distributional semantics. Word2Vec: CBOW and Skip-Gram. Negative sampling. GloVe. Evaluation of word embeddings: intrinsic vs extrinsic. Gensim library.
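Before any vectors are learned, Word2Vec Skip-Gram reduces a corpus to (centre word, context word) training pairs drawn from a sliding window. A minimal sketch with a toy sentence:

```python
def skipgram_pairs(tokens, window=2):
    """Generate (centre, context) training pairs as used by Word2Vec Skip-Gram."""
    pairs = []
    for i, centre in enumerate(tokens):
        for j in range(max(0, i - window), min(len(tokens), i + window + 1)):
            if j != i:
                pairs.append((centre, tokens[j]))
    return pairs

pairs = skipgram_pairs(["the", "cat", "sat", "on", "the", "mat"], window=1)
# first pair: ("the", "cat"); 10 pairs in total for window=1
```

The model then learns embeddings by predicting the context word from the centre word (with negative sampling to avoid a full-vocabulary softmax); CBOW simply reverses the prediction direction.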

Monday
6

Neural Networks for NLP.

Introduction to neural networks: feedforward neural network, backpropagation, gradient descent. Text classification problem using PyTorch.
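PyTorch automates the gradient computations used in class. As an illustration of the underlying idea only, here is logistic-regression-style gradient descent on a tiny invented bag-of-words dataset, written from scratch:

```python
import math

# Toy sentiment data: features = counts of "good" and "bad", label = 1 (positive) / 0 (negative)
X = [[2, 0], [1, 0], [0, 2], [0, 1]]
y = [1, 1, 0, 0]
w, b = [0.0, 0.0], 0.0

def sigmoid(z):
    return 1 / (1 + math.exp(-z))

# Stochastic gradient descent on the cross-entropy loss
for _ in range(500):
    for xi, yi in zip(X, y):
        p = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b)
        err = p - yi                      # dLoss/dz for sigmoid + cross-entropy
        for j in range(2):
            w[j] -= 0.1 * err * xi[j]
        b -= 0.1 * err

pred = sigmoid(2 * w[0] + b)   # classify a review containing "good" twice
```

After training, the weight for "good" is positive and the weight for "bad" is negative; a feedforward network is the same loop with hidden layers and backpropagation through them.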

Tuesday
7

Recurrent Neural Networks and LSTMs.

RNN. Vanishing gradient problem. LSTM and GRU. Applications and evaluation.
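A scalar vanilla RNN step makes the recurrence (and the vanishing-gradient intuition) concrete. The weights below are arbitrary illustrative values:

```python
import math

def rnn_step(h, x, w_h, w_x, b):
    """One step of a vanilla (scalar) RNN: h' = tanh(w_h * h + w_x * x + b)."""
    return math.tanh(w_h * h + w_x * x + b)

# Run the recurrence over a short sequence; the hidden state carries context forward
h = 0.0
for x in [1.0, 0.5, -1.0]:
    h = rnn_step(h, x, w_h=0.5, w_x=1.0, b=0.0)
```

Because each backpropagation step multiplies the gradient by roughly w_h times the tanh derivative (both below 1 here), the gradient shrinks geometrically with sequence length: the vanishing gradient problem that LSTM and GRU gating is designed to mitigate.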

Wednesday
8

Sequence-to-sequence problems.

Encoder-decoder models. Machine translation problem. Conditional language modelling. Bottleneck. Attention mechanism.
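The attention mechanism that relieves the encoder-decoder bottleneck can be sketched as dot-product attention over encoder states (toy vectors, no learned parameters):

```python
import math

def attention(query, keys, values):
    """Dot-product attention: softmax over query·key scores, then a weighted sum of values."""
    scores = [sum(q * k for q, k in zip(query, key)) for key in keys]
    exps = [math.exp(s) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    context = [sum(w * v[i] for w, v in zip(weights, values))
               for i in range(len(values[0]))]
    return weights, context

# A decoder query attends over three encoder hidden states
keys = values = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
weights, context = attention([1.0, 0.0], keys, values)
```

Instead of compressing the whole source sentence into one fixed vector, the decoder recomputes this weighted context at every output step, attending to whichever source positions score highest against the current query.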

Thursday
9

Transformers.

Self-attention, positional encoding, residual connections. Architecture of the modern transformer. Interpretability of transformers.
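Alongside self-attention, positional encoding is easy to compute directly. This sketch implements the sinusoidal scheme from the original transformer paper for a toy model dimension:

```python
import math

def positional_encoding(pos, d_model):
    """Sinusoidal positional encoding:
    PE[pos, 2i] = sin(pos / 10000^(2i/d)), PE[pos, 2i+1] = cos(pos / 10000^(2i/d))."""
    pe = []
    for i in range(0, d_model, 2):
        angle = pos / (10000 ** (i / d_model))
        pe.append(math.sin(angle))
        pe.append(math.cos(angle))
    return pe[:d_model]

pe0 = positional_encoding(0, 4)   # position 0 -> [0.0, 1.0, 0.0, 1.0]
pe1 = positional_encoding(1, 4)
```

Because self-attention is permutation-invariant, these encodings are added to the token embeddings so the model can tell positions apart; each dimension oscillates at a different frequency, giving every position a distinct signature.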

Friday
10

Pretrained models and BERT.

Masked language modelling. Next sentence prediction. BERT architecture and applications. Fine-tuning BERT using the transformers library.

Monday
11

GPT models.

Causal language modelling. GPT architecture and applications. Zero-shot vs few-shot learning. Prompt engineering basics.

Tuesday
12

Evaluation and Interpretability.

Metrics for classification and generation (BLEU, ROUGE, WER). Explainability of language models. Attention visualisation.
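Of the metrics listed, WER is the simplest to implement from scratch: word-level Levenshtein edit distance divided by the reference length. A small illustrative sketch:

```python
def wer(reference, hypothesis):
    """Word Error Rate: word-level Levenshtein distance / reference length."""
    r, h = reference.split(), hypothesis.split()
    d = [[0] * (len(h) + 1) for _ in range(len(r) + 1)]
    for i in range(len(r) + 1):
        d[i][0] = i
    for j in range(len(h) + 1):
        d[0][j] = j
    for i in range(1, len(r) + 1):
        for j in range(1, len(h) + 1):
            cost = 0 if r[i - 1] == h[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,          # deletion
                          d[i][j - 1] + 1,          # insertion
                          d[i - 1][j - 1] + cost)   # substitution
    return d[-1][-1] / len(r)

score = wer("the cat sat on the mat", "the cat sat on mat")  # one deletion -> 1/6
```

BLEU and ROUGE follow a different, n-gram-overlap logic, but all three share the theme this class develops: reducing generated text to a single comparable score.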

Wednesday
13

LLM API.

Introduction to OpenAI API and HuggingFace. Prompt design. Fine-tuning, adapters, in-context learning.

Thursday
14

Modern LLMs.

Evaluation of LLM behaviour: bias, hallucinations, safety. Mixture of Experts. Retrieval-augmented generation.

Friday
15

Project.

NLP demo projects.

Prerequisites

Linear algebra: vectors, dot products, linear functions, matrices, matrix decompositions.

Probability theory and statistics.

Python: functions, classes, wrappers.

Methodology

The course consists of 3-hour sessions, which will be divided into lectures and seminars. The seminars include practical assignments that will be completed both in class with the support of the instructor and individually at home. The final project will be carried out in groups. There will be 15-minute tests each week on the material covered.

Grading

The final grade will be composed of the following criteria:
60% - Seminar assignments
20% - Tests
20% - Final project
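For example (with invented scores out of 100), the weighting works out as a simple weighted average:

```python
# Weighted final grade under the 60/20/20 scheme; the scores are illustrative
weights = {"seminars": 0.60, "tests": 0.20, "project": 0.20}
scores = {"seminars": 85, "tests": 70, "project": 90}
final = sum(weights[k] * scores[k] for k in weights)
# 0.6 * 85 + 0.2 * 70 + 0.2 * 90 = 83.0
```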

Faculty

Polina Proskura

Applied Scientist at Amazon

Polina Proskura is an applied scientist and researcher specialising in deep learning, natural language processing, and reinforcement learning. She graduated from MIPT in 2019 and completed her Master's in Data Science at EPFL in 2022.

She currently works as an Applied Scientist at Amazon, where she contributes to various deep learning projects powering large-scale applications on amazon.com. Her recent research focuses on effective ensembling techniques for neural networks, with applications in NLP and model robustness. She is the co-author of several peer-reviewed publications, including work on uncertainty estimation for neural networks, core deep learning problems, and large-scale NLP systems.

See full profile

Apply for this course

Snap up your chance to enroll before all spaces fill up.

Mastering NLP: Foundations to Frontiers with LLMs

by Polina Proskura

Total hours

45 Hours

Dates

Jun 30 - Jul 18, 2025

Fee for single course

€1500

Fee for degree students

€750

How to secure your spot

Complete the form below to kickstart your application

Schedule your Harbour.Space interview

If successful, get ready to join us on campus

FAQ

Will I receive a certificate after completion?

Yes. Upon completion of the course, you will receive a certificate signed by the director of the program your course belongs to.

Do I need a visa?

This depends on your case. Please check with the Spanish or Thai consulate in your country of residence about visa requirements. We will do our part to provide you with the necessary documents, such as the Certificate of Enrollment.

Can I get a discount?

Yes. The easiest way to enroll in a course at a discounted price is to register for multiple courses, which reduces the cost per individual course. Please ask the Admissions Office about the other kinds of discounts we offer and what you can do to receive one.