Technologies and Infrastructure for Big Data
Language of the course: English
Students will learn the principles for building modern software solutions that process big data and the basics of organizing their development; the typical errors that arise when working with big data, the signs by which they manifest, and methods for eliminating the typical errors that occur in an integration solution; the basic mechanisms and algorithms for analyzing big data and extracting knowledge from it; the principles and technologies underlying the chosen integration platform; the capabilities of modern and emerging tools for integrating systems, applications and services; and the principles of data processing, storage and protection.
Students will learn to apply methods and tools for analyzing the functional requirements of an integration solution; design and develop basic data-processing applications that run on a computing cluster using modern big data technologies; assemble the software modules, services and components of an integration solution in accordance with the technical specification; configure the parameters of the selected integration platform; evaluate the performance of the integration solution; design and develop integrated data-processing solutions that use one or more algorithms for data analysis and information retrieval; develop new algorithms based on existing ones; and carry out the procedures for deploying and configuring integration platforms.
Students will acquire skills in distributing the tasks of deploying and configuring the selected integration platform in accordance with the terms of reference; in assembling the software modules, services and components of an integration solution built on the selected platform; and in analyzing and evaluating the technical specifications developed for an integration solution.
Main topics of the course:
- The evolution of big data processing systems
- HDFS (Hadoop Distributed File System)
- MapReduce technology
- Providing fault tolerance with Apache ZooKeeper
- The YARN and Mesos resource managers
- Batch processing of big data
- Stream processing of big data
- Interactive big data processing
Forms of instruction: lectures and practical classes.