During the last two decades, an enormous amount of data has been stored in electronic form and made available for analysis. Recent achievements in the Big Data space have enabled academia and industry to invent new ways of using data to make better decisions. Nearly every field of human endeavour, from medicine to finance, has reaped the benefits of the age of data, creating extraordinary demand for data scientists. Data science is a new frontier of human knowledge and a new domain of discovery. The multiterabyte datasets that are common these days contain hidden information that will only be revealed to those who know how to mine it. In the true spirit of an explorer, the data scientist develops methods to discover knowledge hidden in an ever-expanding ocean of data.
The Bachelor programme in Data Science starts with a foundation in mathematics, programming, and algorithms, followed by databases, statistics, and machine learning. During the last year, students focus on applications and recent achievements in data science, including advancements in image analysis, text mining, and bioinformatics. The breadth of this knowledge is further expanded by talks and presentations from key figures in academia and industry, which are held regularly at the University. Bachelor's students apply their newly earned knowledge to a programme-long project based on an idea they propose during enrolment. The project expands as students’ knowledge grows, and upon graduation the outcome is presented to peers, mentors, industry partners and even venture funds during the university demo day.
The programme builds the mathematical basis upon which students will develop an understanding of programming, statistics, machine learning and data management in the following years. The courses are mostly given in the form of lectures and take-home coursework.
The course offers the foundations of classical combinatorial and graph-theoretic objects, concepts, and methods. The course introduces enumeration techniques with emphasis on permutations and combinations, generating functions, recurrence relations, the inclusion-exclusion principle, and the pigeonhole principle.
This is an introductory course on the C++ programming language. By the end of this course, students should be able to: understand and use the basic programming constructs of C/C++; manipulate various C/C++ datatypes, such as arrays, strings, and pointers; isolate and fix common errors in C++ programs; use memory appropriately, including proper allocation/deallocation procedures; apply object-oriented approaches to software problems in C++; and write small-scale C++ programs using the above skills.
The course offers the foundations of calculus, including an introduction to real numbers and the notions of limits and derivatives. Students learn to evaluate simple limits and to use this skill in simple applied problems.
This course will give students an introduction to the aims and techniques of formal logic. Topics include the logic of truth functions and quantifiers, and the concepts of validity and truth and their relation to formal deduction.
This course introduces basic concepts of linear algebra including “planes” and “spaces”. Students will grasp the notion of space structure that lies at the foundation of the general concept of “linear space”. Students will also study mathematical objects such as matrices, learn the rules of operations on matrices, and be encouraged to use them in practical examples of their application.
The course introduces basic data structures and algorithms. This includes combinatorial, graph, sorting and other algorithms. By the end of the course, students will grasp the utility of formalising procedures and structuring data for analysis as well as learn to use a range of basic instruments for such work.
In this course students will learn about important instruments such as the method of generating functions in enumerative combinatorics, Möbius transforms, and finite differences (solving linear recurrence relations with constant coefficients). We will consider more complex classes of graphs than those introduced in the preceding course and prove formulas and asymptotic forms for the number of trees and other connected graphs.
In this course we introduce the core concepts of object-oriented programming (OOP), with a focus on programming in the object-oriented paradigm in Python. Topics include primitives, expressions, assignments, functions, environments, classes and objects, and inheritance.
In this course, students develop an understanding of the definite integral. Using practical examples, students learn about the origins of this concept and its importance. Students will also learn about series, motivated by the special case of power series, which arise from the generating functions that underpin many of the important topics already discussed in the Combinatorics and Graphs courses. This section of the course will include Taylor series and asymptotics, which are essential for future topics that rely on such estimates, for example the assessment of algorithm complexity.
This course will build upon topics that students learned in Mathematical Foundations of Computing and Algorithms and Data Structures 1. The course assumes a significantly broader programming background and will introduce more advanced algorithms and data structures, including merge sort, quicksort, order statistics, heaps, hash tables, and search trees.
The course introduces the concept of a linear vector space and provides many practical examples that explain it. Students learn to accurately manipulate multidimensional objects with the understanding that they are dealing not with abstract concepts but with objects that have a natural impact on reality. Students will also be introduced to groups, fields and rings, which are important notions for the following courses and their applications.
Introduction to the fundamental concepts of computer systems. Explores how computer systems execute programs and manipulate data, working from the C programming language down to the microprocessor. Topics covered include: the C programming language, data representation, machine-level code, computer arithmetic, elements of code compilation, memory organization and management, and performance evaluation and optimisation.
The course will expose students to the most cutting-edge directions of research in combinatorics and graphs and will introduce advanced algebraic, probabilistic and topological methods for analysis of combinatorial and graph theoretic problems. These methods are particularly important due to many applications in advanced network modeling, algorithm analysis, and other domains. We will also cover practical applications of these methods.
The objective of the Operating Systems course is to familiarise students with the basic organising principles and technologies used in modern computing platforms (the operating systems together with the computer hardware), as well as their place and role in the IT field. The course will offer ample practical exercises to strengthen the understanding of core topics and to prepare students to advance their knowledge of the modern means of parallel and distributed processes for their effective application in scientific research and computing.
This course explains and analyses multivariable functions and their derivatives and integrals. We also introduce the Fourier transform, a particularly important topic with applications ranging from probability theory to image analysis. Finally, the course covers the concept of “measure” with a focus on the Jordan and Lebesgue measures.
Students will begin work on the Capstone Project from the very beginning of the programme. They are guided by a mentor, with whom they regularly discuss their direction and progress. In the first year, students determine a detailed objective for their project, research existing alternatives, outline the differentiators of their approach, and identify an approach for implementing the project. This will include the creation of a development plan and the implementation of a prototype. At the end of the year, students will submit an outline document detailing their progress, the results of the literature research and a description of the prototype.
The university will offer regular open lectures by professors, experts, and key figures in the technology field. Students in the data science programme are required to attend many of the lectures and submit a write-up describing what they learned during the talk. First-year students will be required to describe the statement of the problem discussed and its significance.
The second year also contains courses that start covering tremendously useful data science tools as well as technical writing instruments.
Most courses require practical coursework and a course project enabling students to get a feel for the challenges and approaches used in this field.
The students will also begin developing software for the Capstone project.
By the end of this year, students will be able to write programs, use primary data science tools and conduct data analysis, and will be ready to study applied courses during the final year of the programme.
The course introduces the notion of randomness and its appropriate terminology. We start with the classical definition of probability, the frequency interpretation and the Bernoulli scheme, and guide students towards the continuous case, with basic geometric probability as a fundamental example. The course also covers random variables and their probability distributions, presenting both theory and practical examples. We proceed to methods of characterising random variables, including expected value and variance. Other topics include Chebyshev's inequality (with examples of practical applications), laws of large numbers and central limit theorems.
Students will get acquainted with basic statistical principles, concepts, and methods, including sampling, variational series, point and interval estimation of distribution parameters, hypothesis testing and regression. The course will explore the method of moments and maximum likelihood for point estimation, the method of central (pivotal) statistics for building confidence intervals, minimax and Bayesian approaches to the study of risks, and hypothesis tests such as Student's t-test, Fisher's test, the chi-squared test and the Kolmogorov test. Students will be able to build a good theoretical base, reinforced by a large number of practical examples.
In this course, we will introduce students to the Java programming language and to the skills necessary to develop Java applications.
The course introduces students to the Unix/Linux ecosystem. We start with installation and configuration of the operating system and continue with a discussion of packages, file systems, etc. The majority of the course lectures are devoted to the extremely powerful set of tools available via the Unix/Linux command line. We cover grep, awk, regular expressions, popular editors, command-line Perl, shell scripting, etc. We conclude the course with a discussion of Git, the source control system.
The purpose of the Parallel and Distributed Computing discipline is to acquaint students with the principles of organisation, technologies and the place and role of distributed and parallel computing in the field of information technology. Students will work with practical training elements to consolidate the information received, and to prepare for further studies in modern means of network computing and their effective application in research. There are many methods because modern analysis of big data is highly diverse. Therefore, the course is naturally divided into two modules. Previously gained knowledge is sufficient for adequate comprehension of the first part.
In this course, students will become familiar with the foundational principles of optimisation of functions of one or several variables. Students will learn about Lagrange equations and how to work with them. The course will also explore various simplex methods. Many of the methods and algorithms will be illustrated with practical examples and computer modelling.
This course provides a broad introduction to machine learning and statistical pattern recognition. Topics include: supervised learning (generative/discriminative learning, parametric/non-parametric learning, neural networks, support vector machines); unsupervised learning (clustering, dimensionality reduction, kernel methods); learning theory (bias/variance tradeoffs; VC theory; large margins); reinforcement learning and adaptive control. The course will also discuss recent applications of machine learning, such as to robotic control, data mining, autonomous navigation, bioinformatics, speech recognition, and text and web data processing.
The course exposes students to an extremely valuable set of practical tools that has become commonplace among professional data analysts. The bulk of the lectures is focused on the R language and its popular packages. We will also cover commercial tools including Matlab, SPSS and Maple. Students will learn to install and configure these tools, load and transform data, conduct statistical analysis, draw charts and graphs, create machine learning models and perform other data analytics tasks. The course contains significant hands-on components, giving students an opportunity to gain a deep understanding of the tools presented.
This course is a natural extension of courses on Probability and Statistics. Students will become familiar with the mathematical theory of stochastic processes and their applications in data analysis. The course discusses Markov chains and processes, branching processes, birth-death processes, Gaussian and Poisson processes, and martingale theorems. The course will also discuss the analysis of various time series, elements of financial mathematics and other topics.
The course introduces students to the principles and practice of computer networking. Structure and components of computer networks, packet switching, layered architectures. Applications: web/HTTP, voice-over-IP, P2P file sharing and socket programming. Reliable transport: TCP/IP, reliable transfer, flow control, and congestion control. The network layer: names and addresses, routing. Local area networks: Ethernet and switches. Wireless networks and network security.
File organization and access, buffer management, performance analysis, and storage management. Database system architecture, query optimization, transaction management, recovery, concurrency control. Reliability, protection, and integrity. Design and management issues.
The course will introduce students to methods of evaluating the complexity of different computations, as well as the limitations of algorithms and computers. The issues and challenges discussed in this course include computational models, complexity evaluation, polynomial-time computable problems, polynomial algorithms, hierarchy theorems and their use in proofs about computability, polynomial reductions, the class NP, proofs of NP-completeness, NP-complete problems, approximate solution of optimisation problems, problems in the polynomial hierarchy and PSPACE, probabilistic polynomial algorithms, PSPACE-completeness, circuit complexity, first-order complexity, interactive proofs, interactive protocols, and one-way functions and their use in cryptography.
Cryptography has become an indispensable method of ensuring security and privacy in modern computing systems. This course covers the inner workings of cryptographic primitives and how they should be used to ensure data security and privacy of communication.
This course is a natural extension of Introduction to Optimisation as well as a guide to real-life applications of data analysis. The topics and methods covered include various modifications of the gradient method, the conjugate gradient method, Newton's method, self-concordant functions, convergence results from convex analysis, constrained and unconstrained minimisation of non-smooth functions, the projection method, methods of stochastic search, constrained extremum problems, dual problems, the interior point method (centre tracking), regression tasks, applications to classification, and compressed sensing.
Students will become familiar with the programming language Python and how it can be used for data analysis. This is a very important ingredient in the data scientist's toolbox. The course will pay special attention to the basics of the language, object-oriented programming, error handling, code design and testing, string manipulation, the memory model, functional programming, a review of libraries, and parallel computing. Additionally, the language will be examined from the standpoint of its ability to process large volumes of data.
In the second year, students will focus on the implementation phase of the project. This includes development and validation of software, collection of data, data analysis, etc. At the end of the year students will submit a progress report including the status of key stages of the project and a description of the remaining work. Students will also rehearse a presentation for their mentor to practice for the end of programme presentation that will take place at the end of the third year.
The university will offer regular open lectures by professors, experts and key figures in the technology field. Students in the data science programme are required to attend many of the lectures and submit a write-up describing what they learned during the talk. During the second year, students will be required to describe the problem statement and its significance, and to outline the presented solution.
The programme offers many practical and interdisciplinary courses. The courses are taught by researchers and professionals who work in the fields they teach and who bring either their academic research or their professional experience to the classroom.
The goal of the final year is to expose students to a range of real-world applications of the material they have learned, to ensure a seamless transition into professional roles.
The many issues examined in this course, of importance both for theory and practice, include: Hartley function, topics on sorting, topics on communication protocols, application of the rectangle method, Shannon entropy, the logic of knowledge, conditional Shannon entropy and the amount of information, coding with a small average code length, information inequality, Shannon limit, text encryption, the use of Shannon entropy in statistics, forecasting, Kolmogorov complexity, conditional complexity, PAC learning, Vapnik-Chervonenkis (VC) dimension.
The course introduces the key tool that enables efficient processing for Big Data problems. We outline classical problems in this space and explore solutions using the MapReduce paradigm. Students will be given an opportunity to write MapReduce programs and evaluate their performance. We will also cover installation and configuration aspects of key components including HDFS, the MapReduce engine, Pig, Hive and ZooKeeper.
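To give a feel for the MapReduce paradigm mentioned above, the following is a minimal single-machine sketch of the classic word-count problem. It is only an illustration under stated assumptions: the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) are hypothetical, the code does not use Hadoop or any real MapReduce engine, and it is not taken from the course material.

```python
from collections import defaultdict


def map_phase(document):
    """Map step: emit a (word, 1) pair for every word in one document."""
    for word in document.lower().split():
        yield word, 1


def shuffle(pairs):
    """Shuffle step: group all emitted values by their key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce step: combine all values that share a key into one result."""
    return key, sum(values)


def word_count(documents):
    """Run map, shuffle and reduce in sequence (hypothetical helper)."""
    pairs = (pair for doc in documents for pair in map_phase(doc))
    grouped = shuffle(pairs)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


counts = word_count(["big data big ideas", "data beats ideas"])
print(counts)
```

In a real cluster the map and reduce steps would run in parallel on different machines and the shuffle would move data over the network; here all three run in one process purely to show the data flow.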
A continuation of Parallel and Distributed Computing - 1.
A continuation of Machine Learning - 1.
This course will focus on gradient methods of convex optimisation under relaxed assumptions about how the gradient can be computed. In particular, the course will focus on: (1) randomised methods; (2) dual methods; (3) opportunities for parallelisation; (4) accounting for sparsity; (5) Markov Chain Monte Carlo (MCMC) methods; (6) non-gradient methods; (7) coordinate descent methods. Students will go over applications of these methods to the problems of ranking web pages and finding transport and economic equilibria in large networks.
The course offers training that builds a solid foundation in chemistry, biology, computer science, mathematics and statistics. This training will enable students to communicate fluently with experts across these disciplines, and to have the skills necessary to apply computing tools to address contemporary problems in biology and medicine. The training will enhance the professional opportunities for undergraduates to pursue careers in pure or applied research in academia, government, pharmaceutical, medical, or biotechnology sectors.
This is a survey course covering modern achievements in the field of data science, ranging from cases where data science became a vehicle for a successful commercial venture to situations where it allowed for a dramatic increase in the efficiency of existing businesses. We discuss advancements in web search, recommendation systems and social network analysis, as well as in statistical language processing and speech recognition. Significant attention is given to the software and hardware advancements that made data science analysis possible.
The objective of the course is to expose students to the hardware and software factors that limit the performance of a computation. The topics range from recognising the limiting hardware component (e.g. CPU, I/O, network, RAM) by looking at a loaded system, to identifying bottlenecks in source code and reviewing libraries that are particularly helpful in developing efficient software. As a result of this course, students will learn the practical skills necessary to develop and execute efficient programs.
How can we teach a computer to determine that one section of text is about sports and another about politics? What if we want to go further and implement a search for similar texts, with automatic detection of keywords? Many interesting tasks are related to text mining. They include the classic problem of building spam filters as well as more extravagant undertakings, such as predicting stock prices based on Twitter messages. This course will discuss the various problems of text mining and the mathematics behind them. We will also learn how to solve some of these problems through practical examples.
The course discusses the typical challenges in large-scale software development projects and approaches to overcoming them. We examine a variety of popular development processes, from rigid to agile ones. Students are split into groups and offered a course project for which they will be required to develop software following each step of the development process, starting with requirements gathering and analysis, continuing through development iterations and quality control, and culminating in deployment.
Applications of computer science to genomics, and concepts in genomics from a computer science point of view. Topics: dynamic programming, sequence alignments, hidden Markov models, Gibbs sampling, and probabilistic context-free grammars. Applications of these tools to sequence analysis: comparative genomics, DNA sequencing and assembly, genomic annotation of repeats, genes, and regulatory sequences, microarrays and gene expression, phylogeny and molecular evolution, and RNA structure. Prerequisites: familiarity with basic algorithmic concepts.
This is an extremely important and practical contemporary course. Topics include: an introduction to image analysis, fundamentals of image processing, image comparison, image categorisation, detection and segmentation of objects in an image, content-based image search, image recognition, analysis of the human face, optical flow and background subtraction, tracking and event recognition in video, and computer vision.
Students learn about challenges often encountered during collaborative work on a large project. These challenges stem from a professional reality that involves frequently changing requirements, imperfect effort estimates, frequent changes of direction in the execution of the project, and a lack of coordination between team members. The course introduces proven techniques to manage these challenges. Successful project management allows teams to control costs, manage risks and meet deadlines. Students will learn methods to structure technical projects, identify key stages and tasks, determine task dependencies, assess the level of effort, design a project plan, etc. We introduce popular project management software and offer students an opportunity to plan their course project.
To conduct internet-based data analysis, it is extremely important to be able to work with the internet as a graph, where the vertices are web pages and the edges are hyperlinks. As it happens, this graph has a definite "topology."
The course will discuss this topic and how to use this knowledge to analyse the internet and other similar complex networks, including social, biological and inter-bank networks. In addition, the course will discuss modern algorithms on large graphs, such as PageRank, which ranks web pages by an importance score derived from the link structure, and epidemics on graphs, i.e. the spread of real epidemics as well as of information on social networks.
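As a rough illustration of treating the web as a graph, the sketch below runs a basic power-iteration version of PageRank on a tiny made-up link graph. Everything specific here is an assumption for illustration: the function name `pagerank`, the damping factor of 0.85, the fixed number of iterations and the four-page `toy_web` graph are not taken from the course, and production implementations differ in many details.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Basic PageRank by power iteration.

    links: dict mapping each page to the list of pages it links to
    (a hypothetical toy representation of the web graph).
    """
    # Collect every page that appears as a source or as a link target.
    pages = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}

    for _ in range(iterations):
        # Every page receives a small baseline share (the "random jump").
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page passes its rank on equally along its outgoing links.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page without outgoing links spreads its rank over all pages.
                share = damping * rank[page] / n
                for target in pages:
                    new_rank[target] += share
        rank = new_rank
    return rank


# Vertices are pages, edges are hyperlinks (made-up example data).
toy_web = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["c"]}
ranks = pagerank(toy_web)
for page in sorted(ranks, key=ranks.get, reverse=True):
    print(page, round(ranks[page], 3))
```

In this toy graph, page "c" collects links from three other pages and ends up with the highest score, while "d", which nothing links to, keeps only the baseline share.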
The course introduces popular data visualization packages. We discuss methods and approaches to ad-hoc data visualizations as well as factors that make visualization clear, informative and attractive.
The course introduces students to practical aspects of working within a group of peers. We discuss ways of splitting the tasks, facilitating productive meetings in a situation of professional disagreement and different styles and personalities. The course is heavily centered around discussions and exploration of practical examples. Students are split into groups and given an opportunity to observe the described phenomena.
The course covers the basics of efficient, structured and organised technical writing. It introduces common structures and formats for technical documents, ranging from workplace email communication to software requirements and API documentation. Students are taught to recognise the audience of a document and to build documents that meet that audience's needs. Students will be introduced to professional writing instruments and will gain introductory experience using such tools through extensive exercises. Additionally, the course focuses on the creation of visual materials including diagrams and charts.
The course is offered in cooperation with the Interaction Design programme of Harbour.Space University. Students in the data science programme get an opportunity to learn about the challenges of creating usable, intuitive, efficient and practical designs for software applications and websites. The course describes challenges in the field of interaction design and discusses solutions. The course project will help students to appreciate the value that design can bring to a product. The course is cross-listed with the Interaction Design curriculum, giving students of both programmes an opportunity to collaborate and work on a project that requires significant input from both fields.
Students will complete the project in their final year. By the end of the year, they will finish the development of the software, testing, deployment, data acquisition and analysis, the preparation of their project report and documentation, and the final presentation. The project will be presented to peers, mentors, the programme director, academic and industrial partners as well as venture capital organisations.
The university will offer regular open lectures by professors, experts, and key figures in the technology field. Students in the data science programme are required to attend a significant number of the lectures and submit reports describing what they learned. During the final year, students will be required to describe the problem, its context and significance, a high-level summary of the solution, and the future steps in development.