
CS207

Text Mining

Online
Feb 21, 2022 - Mar 11, 2022
The course covers the main algorithms and concepts of Text Mining, including both “classical” methods from the Information Retrieval domain and modern Deep Learning architectures.

Faculty

Sergey Khoroshenkikh

Senior Software Engineer at Yandex

Course length

3 weeks

Duration

3 hours per day

Total hours

45 hours

Credits

4 ECTS

Language

English

Course type

Online

Fee for single course

€1500

Fee for degree students

€750

Skills you’ll learn

Computer Science, Neural Networks, Text Analysis, spaCy, scikit-learn, FastText, PyTorch

Overview

The Natural Language Processing (NLP) field has gained attention in recent years because of impressive algorithmic advances in Deep Learning and significant progress in hardware.

Text Mining is a subset of NLP focused on unsupervised and semi-supervised algorithms for text analysis. The course covers the main Text Mining algorithms and concepts, including both “classical” methods from the Information Retrieval domain (such as TF-IDF and topic modelling) and modern Deep Learning architectures.

Learning highlights

  • What types of problems can be solved with Text Mining.
  • Which algorithms are used for various Text Mining problems.
  • How to use practical tools for Text Mining.

Course outline

15 classes

Dive into the details of the course and get a sense of what each class will cover.
Monday
1

Introduction

NLP pipeline with spaCy. TF-IDF. Text analysis with scikit-learn.

Tuesday
2

Language Models

Definition, algorithms, and evaluation.

Wednesday
3

Text Classification

Algorithms and feature engineering.

Thursday
4

Topic Modeling - 1

K-means clustering. Non-negative matrix factorisation (NMF).

Friday
5

Topic Modeling - 2

Latent Semantic Indexing (LSI). Latent Dirichlet Allocation (LDA).

Monday
6

Word Vectors - 1

Distributional hypothesis. Word2Vec algorithm.

Tuesday
7

Word Vectors - 2

GloVe algorithm. Typical use cases of word vectors in NLP tasks. Word2Vec in recommendation systems. Analysis of graphs using Node2Vec.

Wednesday
8

Neural Networks

Feedforward neural networks. Computation graph and backpropagation. Optimization methods.

Thursday
9

PyTorch

Tensors, gradients, and layers.

Friday
10

Recurrent Neural Networks - 1

Vanilla RNN. Neural Language Models.

Monday
11

Recurrent Neural Networks - 2

Vanishing gradients. LSTM and GRU. Bidirectional RNN.

Tuesday
12

Neural Language Models

Contextual word embeddings. ELMo. ULMFiT.

Wednesday
13

Transformers

Attention. Transformer block, encoder, decoder. BERT.

Thursday
14

Case study

Case study: news aggregator

Friday
15

Final

Final project session

Prerequisites

Strong programming background (Python).

Understanding of machine learning concepts and algorithms.

Solid knowledge of multivariate calculus and linear algebra.

Methodology

The course focuses on practical tools and applications of text mining while providing the necessary theoretical and algorithmic background.

During the course, students will choose a text mining problem, explore it, and present their research results in the final session. Sessions 1-13 will also be followed by graded assignments.

Grading

The final grade will be composed of the following criteria:
30% - Homework (5 home assignments x 6% each)
70% - Final Project
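
As a worked illustration of the weighting above, the following minimal sketch combines five homework scores and a project score into a final grade. It assumes every component is scored on a 0-100 scale, which the course page does not state; the function name `final_grade` and the constants are hypothetical and used only for illustration.

```python
# Weights taken from the grading breakdown; the 0-100 scale is an assumption.
HOMEWORK_WEIGHT = 0.06   # each of the 5 home assignments counts for 6%
PROJECT_WEIGHT = 0.70    # the final project counts for 70%
NUM_HOMEWORKS = 5


def final_grade(homework_scores, project_score):
    """Return the weighted final grade on the same scale as the inputs."""
    if len(homework_scores) != NUM_HOMEWORKS:
        raise ValueError(f"expected exactly {NUM_HOMEWORKS} homework scores")
    homework_part = sum(HOMEWORK_WEIGHT * score for score in homework_scores)
    return homework_part + PROJECT_WEIGHT * project_score


if __name__ == "__main__":
    # Hypothetical scores, for illustration only.
    print(round(final_grade([80, 90, 70, 100, 60], 85), 2))
```

With these hypothetical scores, the homework contributes 0.06 × 400 = 24 points and the project contributes 0.70 × 85 = 59.5 points.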
Faculty

Sergey Khoroshenkikh

Senior Software Engineer at Yandex

Sergey Khoroshenkikh is a senior software engineer with eight years of experience in applied machine learning and data analysis. He graduated from the Moscow Institute of Physics and Technology in 2015. At Yandex, he has been working on large-scale machine learning solutions for web advertising as well as routing algorithms for Yandex Delivery.

Research/Academic Interests: Random graphs, complex networks

Apply for this course

Snap up your chance to enroll before all spaces fill up.

How to secure your spot

Complete the form below to kickstart your application

Schedule your Harbour.Space interview

If successful, get ready to join us on campus

FAQ

Will I receive a certificate after completion?

Yes. Upon completion of the course, you will receive a certificate signed by the director of the program your course belonged to.

Do I need a visa?

This depends on your case. Please check with the Spanish or Thai consulate in your country of residence about visa requirements. We will do our part to provide you with the necessary documents, such as the Certificate of Enrollment.

Can I get a discount?

Yes. The easiest way to enroll in a course at a discounted price is to register for multiple courses. Registering for multiple courses will reduce the cost per individual course. Please ask the Admissions Office for more information about the other kinds of discounts we offer and what you can do to receive one.