Language of the course: English
- Study of the main causes of Big Data formation, and its detection and identification.
- Introduction to Grid technologies, WMS, MapReduce and stream data processing.
- Understanding the principles of MapReduce and Apache Hadoop technology.
- Understanding HDFS principles and building Apache Hadoop infrastructure.
- Introduction to Apache Storm technology.
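As an illustration of the MapReduce principle listed above, here is a minimal single-process sketch of the classic word-count job in plain Python. It only imitates the map, shuffle and reduce phases in memory and does not use Hadoop; the function names `map_phase`, `shuffle`, `reduce_phase` and `word_count` are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(lines: Iterable[str]) -> Iterator[Tuple[str, int]]:
    """Map: emit a (word, 1) pair for every word of every input line."""
    for line in lines:
        for word in line.lower().split():
            yield word, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle: group all emitted values by their key."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups: Dict[str, List[int]]) -> Dict[str, int]:
    """Reduce: combine the list of values for each key into a single result."""
    return {key: sum(values) for key, values in groups.items()}


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    """Run the three phases one after another on an in-memory input."""
    return reduce_phase(shuffle(map_phase(lines)))


if __name__ == "__main__":
    docs = ["big data needs big tools", "data tools"]
    print(word_count(docs))
```

In Hadoop the map and reduce functions run in parallel on many nodes and the framework performs the shuffle over the network; the sketch keeps everything in one process only to show how data flows between the phases.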
Main topics of the discipline:
- Definition of the term Big Data and its basic model. Uses of Big Data. The role of Big Data in the national economy.
- Requirements for the profession of Big Data analyst.
- The main stages of the data life cycle: collection, consolidation and cleaning of data.
- Correlation coefficient. Graphical representation. Formulation of the regression analysis problem. Linear regression. The least squares method. Their role in Big Data analysis.
- Data collection and consolidation, data visualization, the R language for analytics, working with DBMSs.
- Hadoop, HDFS, MapReduce, YARN, Storm, Apache Spark.
- Importance of the Big Data phenomenon for the development of society and science. Causes of the Big Data trend.
- Problems and opportunities associated with the emergence of Big Data.
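For the topic on the correlation coefficient, linear regression and the least squares method, a minimal pure-Python sketch follows. It computes the Pearson coefficient r = S_xy / sqrt(S_xx * S_yy) and fits y = a + b*x by ordinary least squares, with b = S_xy / S_xx and a = mean(y) - b * mean(x), where S_xy, S_xx, S_yy are centered sums. The helper names and the sample data are hypothetical; on genuinely large data the same sums could be accumulated in a distributed job rather than in one process.

```python
import math
from typing import Sequence, Tuple


def centered_sums(xs: Sequence[float], ys: Sequence[float]) -> Tuple[float, float, float, float, float]:
    """Return mean(x), mean(y) and the centered sums S_xy, S_xx, S_yy."""
    n = len(xs)
    if n < 2 or n != len(ys):
        raise ValueError("need two samples of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    s_yy = sum((y - mean_y) ** 2 for y in ys)
    return mean_x, mean_y, s_xy, s_xx, s_yy


def pearson_r(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient r = S_xy / sqrt(S_xx * S_yy)."""
    _, _, s_xy, s_xx, s_yy = centered_sums(xs, ys)
    return s_xy / math.sqrt(s_xx * s_yy)


def fit_line(xs: Sequence[float], ys: Sequence[float]) -> Tuple[float, float]:
    """Least-squares fit of y = a + b*x; returns (a, b)."""
    mean_x, mean_y, s_xy, s_xx, _ = centered_sums(xs, ys)
    b = s_xy / s_xx            # slope that minimizes the sum of squared residuals
    a = mean_y - b * mean_x    # the fitted line passes through (mean_x, mean_y)
    return a, b


if __name__ == "__main__":
    xs = [1, 2, 3, 4, 5]
    ys = [2.1, 3.9, 6.2, 7.8, 10.1]
    print("r =", round(pearson_r(xs, ys), 4))
    a, b = fit_line(xs, ys)
    print(f"y = {a:.2f} + {b:.2f} * x")
```

The sketch assumes the x values are not all equal (otherwise S_xx is zero). Since Python 3.10 the standard library's `statistics.correlation` and `statistics.linear_regression` compute the same quantities and can serve as a cross-check.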
Lectures and laboratory work.