DS314BKK

Faculty
Polina Proskura
Applied Scientist at Amazon
Skills you’ll learn
Natural Language Processing is central to many modern AI applications – from search engines and machine translation to large-scale text analysis. This course offers a comprehensive introduction to state-of-the-art NLP methods, with a strong focus on recent advances in large language models (LLMs), specifically BERT and GPT.
We will begin the course with the fundamentals of NLP, such as text preprocessing and vector-based representations of words and sentences. We will then move on to tasks such as text classification, language modelling, and machine translation, to develop familiarity with the core challenges in the field. Building on this foundation, we will explore modern neural network architectures for language tasks, progressing from RNNs to BERT and GPT-based transformers.
The course places a strong emphasis on both practical skills and theoretical understanding, providing valuable preparation for a future career as a data scientist.
15 classes
1. Structure of text data. Preprocessing techniques: tokenisation, normalisation, stemming, and lemmatisation. Text preprocessing in Python using the NLTK and spaCy libraries.
2. Linguistic basics: syntax, semantics, and morphology. POS tagging and dependency parsing on real-world problems. Errors and ambiguities in real-world texts.
3. Basic methods: Bag of Words, TF-IDF, and the vector space model. Document similarity and text classification problems.
4. N-gram language models: sequence probabilities, smoothing techniques, and perplexity. Generation strategies, including greedy decoding and beam search.
5. Distributional semantics. Word2Vec: CBOW and Skip-Gram. Negative sampling. GloVe. Evaluation of word embeddings: intrinsic vs extrinsic. The Gensim library.
6. Introduction to neural networks: feedforward networks, backpropagation, and gradient descent. Text classification using PyTorch.
7. Recurrent neural networks (RNNs). The vanishing gradient problem. LSTM and GRU. Applications and evaluation.
8. Encoder-decoder models. The machine translation problem. Conditional language modelling. The bottleneck problem. The attention mechanism.
9. Self-attention, positional encoding, and residual connections. Architecture of the modern transformer. Interpretability of transformers.
10. Masked language modelling. Next sentence prediction. BERT architecture and applications. Fine-tuning BERT using the transformers library.
11. Causal language modelling. GPT architecture and applications. Zero-shot vs few-shot learning. Prompt engineering basics.
12. Metrics for classification and generation (BLEU, ROUGE, WER). Explainability of language models. Attention visualisation.
13. Introduction to the OpenAI API and Hugging Face. Prompt design. Fine-tuning, adapters, and in-context learning.
14. Evaluation of LLM behaviour: bias, hallucinations, and safety. Mixture of Experts. Retrieval-augmented generation.
15. NLP demo projects.
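To give a flavour of the first class, the preprocessing steps above (tokenisation, normalisation, stemming) can be sketched in pure Python. In the course itself these are done with NLTK (e.g. its word tokeniser and Porter stemmer) and spaCy pipelines; the regex tokeniser and suffix-stripper below, including their names, are illustrative stand-ins only:

```python
import re

def tokenise(text):
    # Toy tokeniser: lowercase (normalisation), then keep runs of letters/digits.
    return re.findall(r"[a-z0-9]+", text.lower())

def stem(token):
    # Crude suffix-stripping stemmer, illustrating the idea behind Porter stemming.
    for suffix in ("ing", "ed", "es", "s"):
        if token.endswith(suffix) and len(token) > len(suffix) + 2:
            return token[: -len(suffix)]
    return token

tokens = [stem(t) for t in tokenise("The cats were running and jumped.")]
# Yields stems like "cat", "jump" -- but also non-words like "runn".
```

Note that crude stemmers happily produce non-words such as "runn"; lemmatisation, as provided by spaCy, maps tokens to dictionary forms instead.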
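The TF-IDF weighting from the vector space class can likewise be computed by hand. This is a minimal sketch on a made-up three-document corpus, using the common raw-count term frequency and logarithmic IDF variant (other formulations exist):

```python
import math

docs = [
    "the cat sat on the mat".split(),
    "the dog sat on the log".split(),
    "cats and dogs".split(),
]

def tf_idf(term, doc, docs):
    # Term frequency: raw count normalised by document length.
    tf = doc.count(term) / len(doc)
    # Inverse document frequency: log of (corpus size / document frequency).
    df = sum(1 for d in docs if term in d)
    idf = math.log(len(docs) / df)
    return tf * idf
```

As expected, a distinctive term like "cat" ends up with a higher weight in the first document than the ubiquitous "the", which is the whole point of the IDF factor.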
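A bigram language model with add-one (Laplace) smoothing and perplexity, as covered in the n-gram class, fits in a few lines. The two-sentence corpus is a toy, and for simplicity the bigram counts here run across sentence boundaries, a shortcut a real implementation would avoid:

```python
import math
from collections import Counter

corpus = ["<s> the cat sat </s>", "<s> the dog sat </s>"]
tokens = [t for s in corpus for t in s.split()]
vocab = set(tokens)
bigrams = Counter(zip(tokens, tokens[1:]))
unigrams = Counter(tokens)

def prob(w2, w1):
    # Add-one smoothed bigram probability P(w2 | w1).
    return (bigrams[(w1, w2)] + 1) / (unigrams[w1] + len(vocab))

def perplexity(sentence):
    # Perplexity = exp of the average negative log-probability per bigram.
    words = sentence.split()
    logp = sum(math.log(prob(w2, w1)) for w1, w2 in zip(words, words[1:]))
    return math.exp(-logp / (len(words) - 1))
```

A quick sanity check: a sentence seen in training gets lower perplexity than a shuffled version of it, and smoothed probabilities conditioned on any word still sum to one over the vocabulary.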
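For the Word2Vec class, the Skip-Gram objective trains on (centre, context) pairs drawn from a sliding window. Gensim handles the actual training; the pair extraction itself, sketched here with an illustrative function name, is simple:

```python
def skipgram_pairs(tokens, window=2):
    # Generate (centre, context) pairs for every position, as in Skip-Gram.
    pairs = []
    for i, centre in enumerate(tokens):
        for j in range(max(0, i - window), min(len(tokens), i + window + 1)):
            if j != i:  # the centre word is not its own context
                pairs.append((centre, tokens[j]))
    return pairs
```

CBOW uses the same windows in the opposite direction: it predicts the centre word from the surrounding context words.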
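The attention mechanism at the heart of the encoder-decoder and transformer classes reduces to a weighted average. This is a dependency-free sketch of scaled dot-product attention for a single query vector over a short sequence (real implementations are batched matrix operations in PyTorch):

```python
import math

def softmax(xs):
    # Numerically stable softmax over a list of scores.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def attention(query, keys, values):
    # Scores: dot product of query with each key, scaled by sqrt(dimension).
    d = len(query)
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d) for key in keys]
    weights = softmax(scores)
    # Output: value vectors averaged with the attention weights.
    return [sum(w * v[i] for w, v in zip(weights, values)) for i in range(len(values[0]))]
```

With a query aligned to the first key, the output leans toward the first value vector: attention "soft-selects" the relevant positions instead of the single fixed bottleneck vector of a plain encoder-decoder.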
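Of the evaluation metrics in the course, word error rate (WER) is the most compact to implement: it is the Levenshtein edit distance between reference and hypothesis word sequences, normalised by the reference length. A minimal dynamic-programming sketch:

```python
def wer(reference, hypothesis):
    # Word error rate: token-level edit distance / reference length.
    ref, hyp = reference.split(), hypothesis.split()
    # dp[i][j] = edit distance between ref[:i] and hyp[:j].
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i  # deleting i reference words
    for j in range(len(hyp) + 1):
        dp[0][j] = j  # inserting j hypothesis words
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = dp[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
            dp[i][j] = min(sub, dp[i - 1][j] + 1, dp[i][j - 1] + 1)
    return dp[len(ref)][len(hyp)] / len(ref)
```

BLEU and ROUGE are n-gram overlap metrics rather than edit distances, but the spirit is the same: score a generated sequence against one or more references.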
Prerequisites
Linear algebra: vectors, dot products, linear functions, matrices, matrix decompositions.
Probability theory and statistics.
Python: functions, classes, wrappers.
The course consists of 3-hour sessions, which will be divided into lectures and seminars. The seminars include practical assignments that will be completed both in class with the support of the instructor and individually at home. The final project will be carried out in groups. There will be 15-minute tests each week on the material covered.
Polina Proskura is an applied scientist and researcher specialising in deep learning, natural language processing, and reinforcement learning. She graduated from MIPT in 2019 and completed her Master's in Data Science at EPFL in 2022.
She currently works as an Applied Scientist at Amazon, where she contributes to various deep learning projects powering large-scale applications on amazon.com. Her recent research focuses on effective ensembling techniques for neural networks, with applications in NLP and model robustness. She is the co-author of several peer-reviewed publications, including work on uncertainty estimation for neural networks, core deep learning problems, and large-scale NLP systems.
Total hours: 45
Dates: Jun 30 - Jul 18, 2025
Fee for single course: €1500
Fee for degree students: €750
How to secure your spot
1. Complete the form below to kickstart your application
2. Schedule your Harbour.Space interview
3. If successful, get ready to join us on campus
FAQ
Will I receive a certificate after completion?
Yes. Upon completion of the course, you will receive a certificate signed by the director of the program to which your course belongs.
Do I need a visa?
This depends on your case. Please check with the Spanish or Thai consulate in your country of residence about visa requirements. We will do our part to provide you with the necessary documents, such as the Certificate of Enrollment.
Can I get a discount?
Yes. The easiest way to enroll in a course at a discounted price is to register for multiple courses, which reduces the cost per individual course. Please ask the Admissions Office for more information about the other discounts we offer and how to qualify for one.