Technologies and Infrastructure for Big Data

Entry requirements: basic knowledge of programming and web technologies; familiarity with SQL and DBMSs

Credits: 6

Course: Core

Language of the course: English

Objectives

  • Identification of the main reasons behind the emergence of Big Data and its present-day definition
  • Introduction to data processing technologies: Grid, WMS, and MapReduce
  • Overview of the core MapReduce model and the Apache Hadoop technology
  • Overview of HDFS and the basic infrastructure of Apache Hadoop
  • Introduction to Apache Spark and Spark Streaming

Contents

Big Data technology plays a core role in the development of modern software solutions at large industrial companies. Today, efficient processing and analysis of Big Data form the basis for successful business development and provide an advantage over rivals in industry competition. This course is therefore oriented towards developing students' skills in Big Data processing and analysis. It begins with a brief history of Big Data and its present-day definition and identification. The foundations of the HDFS distributed file system are then studied, along with the basics of the Apache Hadoop and MapReduce technologies. The course also covers Apache Spark and Spark Streaming. On completion of the course, students will be able to work with the core Big Data technologies, such as Apache Hadoop and Apache Spark.
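To make the MapReduce model mentioned above concrete, the sketch below counts words with PySpark, whose RDD API follows the same map-and-reduce pattern that the course studies on Hadoop. It is a minimal illustration rather than course material: the application name and input path are placeholders, and a local Spark installation is assumed.

    from pyspark.sql import SparkSession

    # Start a local Spark session (assumes PySpark is installed locally).
    spark = SparkSession.builder.appName("WordCountSketch").master("local[*]").getOrCreate()

    # Read input lines; the path is a placeholder and could equally be an HDFS URI.
    lines = spark.sparkContext.textFile("input.txt")

    # The classic MapReduce pattern: map each word to (word, 1), then reduce by key.
    counts = (lines.flatMap(lambda line: line.split())
                   .map(lambda word: (word, 1))
                   .reduceByKey(lambda a, b: a + b))

    # Print a small sample of the resulting (word, count) pairs.
    for word, count in counts.take(10):
        print(word, count)

    spark.stop()

The same word count written against Hadoop's Java MapReduce API takes considerably more code, which is one reason the course treats Spark after introducing MapReduce on Hadoop.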

Format

Lectures and workshops

Assessment

Attendance is mandatory.

Grading: 60% coursework (20% data crawler, 20% implementation using Big Data technologies, 20% data analysis and reporting), 20% workshop participation, and 20% final examination.