
CS410

Data Storages

Barcelona Campus
Apr 28, 2025 - May 16, 2025
During this course we will study what problems of modern software can be solved by data storages. We will study the whole spectrum of existing data storages.

Faculty

Nikolay Golov

CPO of Tengri Data Platform

Course length

3 weeks

Duration

3 hours per day

Total hours

45 hours

Credits

6 ECTS

Language

English

Course type

Offline

Fee for single course

€2999

Fee for degree students

€1999

Skills you’ll learn

  • SQL
  • Database Engines
  • Query Optimization
  • Landscape of Modern Databases
  • Design a Multi-Database System
  • Building a Big Data Analytical Infrastructure

Overview

All contemporary software platforms, whether developed by large corporations (Facebook, Google, OpenAI, etc.) or small businesses, rely on the use of databases or data storage. The foundation of this course is the notion that data storage is an answer or a solution to a problem rather than a technology in and of itself.

During this course, we will study which problems of modern software can be solved with data storage. We will cover the whole spectrum of existing data storages, such as classical RDBMS, key-value storages, NoSQL, document storages, column storages, OLAP, vector databases, and embedded and serverless databases, along with their weak and strong points.

Students will learn how to identify the data storage requirements of a given software system and how to wisely choose a particular data storage (or multiple storages), taking into consideration both business requirements and the chosen software architecture (monolithic, microservice, etc.). We will study all the concepts and mental models needed to understand data storages, choose them wisely, and embed them into software, whether manually, as a manager, or using an LLM such as ChatGPT.

Learning highlights

  • The course starts with a brief overview of the data-storage task that any software faces and of the possible solutions in general: files, in-memory services, or specialised applications (databases).
  • We proceed with a list of requirements that proved to be essential for a data storage tool: ACID, transactions, availability of data access languages (SQL, etc.). Afterwards, we will illustrate why these requirements determined the market dominance of classical relational databases (Oracle, MS SQL, PostgreSQL, MySQL, etc.) at the end of the 20th century. Later, we’ll describe why the technological advances of the 21st century gave birth to a set of non-classical databases, such as in-memory storage, document storage, columnar storage, etc.
  • The bulk of the remaining course focuses on the tradeoffs to be considered during technology selection and database design. We will discuss the Polyglot Persistence paradigm, which uses multiple databases for different facets of an application, combining their strengths and mitigating their weaknesses. We discuss the balance between performance, complexity, and permitted data delay for various databases and architectural approaches, as well as the fundamental limitations described by the CAP theorem. We emphasise the difference between OLAP (analytical) and OLTP (transactional) tasks and cover modern data warehouse designs (Data Vault, Anchor Modelling, etc.).
  • Plenty of hands-on examples and homework assignments demonstrate the ideas and compare and contrast various approaches and technologies. The course wraps up with a discussion of modern state-of-the-art databases, such as serverless cloud databases and global cloud tools that appear to violate the CAP theorem. We will also discuss how modern LLM tools (such as ChatGPT) can be used to design data storage applications and how data storage can benefit LLMs (vector databases).
  • Throughout the course, we will focus on open-source tools to avoid relying too heavily on particular vendors.
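
As a small illustration of the ACID requirement mentioned in the highlights above, the following sketch uses Python's built-in sqlite3 module to show atomicity: either both updates of a transfer are applied or neither is. It is not taken from the course materials; the `accounts` table, the `transfer` and `demo` functions, and the sample balances are hypothetical names and values chosen for this example.

```python
import sqlite3


def transfer(conn, src, dst, amount):
    """Move `amount` from `src` to `dst` atomically; return True on success."""
    try:
        # `with conn:` commits if the block succeeds and rolls back if it raises.
        with conn:
            # Credit first, so a failing debit must undo an already applied change.
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?",
                (amount, dst),
            )
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?",
                (amount, src),
            )
        return True
    except sqlite3.IntegrityError:
        # The CHECK constraint rejected a negative balance; nothing was saved.
        return False


def demo():
    """Run one valid and one invalid transfer on an in-memory database."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE accounts ("
        "name TEXT PRIMARY KEY, "
        "balance INTEGER NOT NULL CHECK (balance >= 0))"
    )
    with conn:
        conn.executemany(
            "INSERT INTO accounts VALUES (?, ?)",
            [("alice", 100), ("bob", 50)],
        )
    first_ok = transfer(conn, "alice", "bob", 30)    # valid transfer
    second_ok = transfer(conn, "alice", "bob", 500)  # would overdraw alice
    balances = dict(conn.execute("SELECT name, balance FROM accounts"))
    conn.close()
    return first_ok, second_ok, balances


if __name__ == "__main__":
    print(demo())
```

In the second transfer, the credit to the destination account is applied first and then undone when the debit violates the constraint, so the stored balances reflect only the first transfer.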

Course outline

15 classes

Dive into the details of the course and get a sense of what each class will cover.
Session 1 (Monday)

Introduction. Data Storage in General. CRUD. Relational model. SQLite.

Session 2 (Tuesday)

Data Modelling. ER Modelling. SQL Queries. ACID: Atomicity, Consistency, Isolation, and Durability.

Session 3 (Wednesday)

Designing Tables. Normalisation. 1NF, 2NF, 3NF, … 6NF

Session 4 (Thursday)

Classical RDBMS: PostgreSQL, Oracle, Microsoft SQL Server, MySQL. Transactions. Levels of Transaction Isolation.

Session 5 (Friday)

Advanced Transaction Isolation Levels. Database Indexes.

Session 6 (Monday)

Analytical SQL - GROUP BY, Window Functions. Views. Reporting, BI tools.

Session 7 (Tuesday)

Document Storage. MongoDB, JSON Store. Data Lake.

Session 8 (Wednesday)

Key-value Storage. Sharding. Redis. Caching.

Session 9 (Thursday)

Data Bus. Kafka. Event-Driven Software Architecture.

Session 10 (Friday)

OLAP Databases. Databases for Analytics. Columnar Storage. Snowflake, BigQuery. DuckDB.

Session 11 (Monday)

Combining Databases. Polyglot Persistence.

Session 12 (Tuesday)

CAP Theorem. Distributed Systems from a Data Storage Point of View.

Session 13 (Wednesday)

Data Warehouse. Data Modelling for Analytics. Data Vault, Inmon, Kimball, Anchor Modelling.

Session 14 (Thursday)

Databases of the future. Serverless concept. Headless concept.

Session 15 (Friday)

Final Quiz

Prerequisites

Python coding experience.

Basic understanding of algorithms or set theory.

(optional) SQL

Methodology

Classes will consist of lectures and discussions on given topics. Each class will include a practical task using the type of database being discussed, covering at least five different types over the course (RDBMS, embedded, serverless, key-value, document storage, and OLAP). During the course, students will work in groups on four projects, each studying some aspect of data storage.

Grading

The final grade will be composed of the following criteria:
25% - Final quiz
60% - Practical projects (4x15%)
15% - Participation
There will be four projects, which can be done in small groups or solo, and a final quiz at the end of the course.
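
The weighting above can be worked through with a short calculation. The sketch below assumes every component is scored on a 0 to 100 scale, which the course page does not state; the function name `final_grade` and the sample scores are hypothetical.

```python
def final_grade(final_quiz, project_scores, participation):
    """Combine component scores (assumed 0-100 each) into a final grade.

    Weights from the grading scheme: 25% final quiz,
    4 x 15% practical projects, 15% participation.
    """
    if len(project_scores) != 4:
        raise ValueError("expected exactly four project scores")
    # Integer percentage weights keep the arithmetic exact for integer scores.
    weighted_sum = (
        25 * final_quiz
        + 15 * sum(project_scores)  # each of the four projects counts 15%
        + 15 * participation
    )
    return weighted_sum / 100


if __name__ == "__main__":
    # Quiz 80, projects 90/70/100/60, participation 100:
    # 20 + (13.5 + 10.5 + 15 + 9) + 15 = 83
    print(final_grade(80, [90, 70, 100, 60], 100))
```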
Faculty

Nikolay Golov

CPO of Tengri Data Platform

Nikolay received his M.S. degree in applied mathematics and cybernetics from Moscow State University, Russia. He then spent 15 years building data platforms for various startups and enterprises. From 2013 until 2019, he headed the Data Platform of Avito, the Craigslist of Russia, which grew from a small startup into a multi-billion-dollar company. At Avito, he was responsible for analytical databases (Vertica, ClickHouse), OLTP engines (PostgreSQL, Redis, MongoDB), and data buses (Kafka) for analytics and microservices. Later, he was Head of Data Platform at ManyChat (a California- and Barcelona-based SaaS startup), responsible for the implementation and growth of its Data Platform (AWS+Redis+Snowflake+Tableau), which is used for analytics and AI. Currently, Nikolay is the CPO of a startup creating a new analytical database, Tengri Data Platform.

How to secure your spot

Complete the form below to kickstart your application

Schedule your Harbour.Space interview

If successful, get ready to join us on campus

FAQ

Will I receive a certificate after completion?

Yes. Upon completion of the course, you will receive a certificate signed by the director of the program your course belongs to.

Do I need a visa?

This depends on your case. Please check with the Spanish or Thai consulate in your country of residence about visa requirements. We will do our part to provide you with the necessary documents, such as the Certificate of Enrollment.

Can I get a discount?

Yes. The easiest way to enroll in a course at a discounted price is to register for multiple courses. Registering for multiple courses will reduce the cost per individual course. Please ask the Admissions Office for more information about the other kinds of discounts we offer and what you can do to receive one.