Big Data Hadoop Live Project Training Online Course

All levels
English

Course Description

Big Data and Hadoop online training is essential to understanding the power of Big Data. The training introduces Hadoop, MapReduce, and the Hadoop Distributed File System (HDFS). It walks you through developing distributed processing of large data sets across clusters of computers and administering Hadoop. Participants will learn how to handle heterogeneous data coming from...

What you’ll learn

Big Data Hadoop Live Project Training
Timely Doubt Resolution
Dedicated Student Success Mentor
Certification & Job Assistance
Free Access to Workshop & Webinar
No Cost EMI Option
Role of Relational Database Management System (RDBMS) and Grid computing
Concepts of MapReduce and HDFS
Using Hadoop I/O to write MapReduce programs
Setting up and administering a Hadoop cluster
Using Sqoop to control data import and consistency
Testing Hadoop applications using MRUnit and other automation tools
Developing MapReduce applications to solve data-processing problems
Hive, a data warehouse software, for querying and managing large datasets residing in distributed storage
Writing Spark applications with Spark SQL, Streaming, DataFrames, RDDs, GraphX and MLlib
Configuring ETL tools like Pentaho/Talend to work with MapReduce, Hive, Pig, etc.

Topics Covered | Program Insights

Lecture-1 Introduction to Big Data and Hadoop

Lecture-2 Hadoop Architecture and HDFS

Lecture-3 Hadoop Cluster Configuration

Lecture-4 Big Data Processing with MapReduce

Lecture-5 Analysis using Apache Pig

Lecture-6 Analysis using Hive Data Warehousing Infrastructure

Lecture-7 Advanced Apache Hive and HBase

Lecture-8 Real Time Analytics with Apache Spark

Lecture-9 Importing and Exporting Data using Sqoop

Lecture-10 Oozie Workflow Management and Using Flume for Analyzing Streaming Data

Lecture-11 Visualizing Big Data

Lecture-12 Introducing Cloud Computing

Curriculum

  Lecture-1 Introduction to Big Data and Hadoop
·       Understanding Big Data

·       Types of Big Data

·       Big Data Challenges

·       Limitations & Solutions of Big Data Architecture

·       Hadoop & its Features

·       Hadoop Ecosystem

·       Different Hadoop Distributions

·       Difference between Traditional Data and Big Data

·       Hadoop 2.x Core Components Preview

·       Hadoop Storage: HDFS (Hadoop Distributed File System)

·       Hadoop Processing: MapReduce Framework

·       Distributed Data Storage in Hadoop: HDFS and HBase

·       Hadoop Data Processing and Analysis Services: MapReduce, Spark, Hive, Pig and Storm

·       Data Integration Tools in Hadoop

·       Resource Management and Cluster Management Services

·       Practical Exercise

  Lecture-2 Hadoop Architecture and HDFS
·       Hadoop 2.x Cluster Architecture

·       Federation and High Availability Architecture

·       Typical Production Hadoop Cluster

·       Hadoop Cluster Modes

·       Common Hadoop Shell Commands

·       Hadoop 2.x Configuration Files

·       Single Node Cluster & Multi-Node Cluster set up

·       Basic Hadoop Administration

·       Need of Hadoop in Big Data

·       The MapReduce Framework

·       What is YARN?

·       Understanding Big Data Components

·       Monitoring, Management and Orchestration Components of Hadoop Ecosystem

·       Different Distributions of Hadoop

·       Practical Exercise

  Lecture-3 Hadoop Cluster Configuration
·       Hortonworks sandbox installation & configuration

·       Hadoop Configuration files

·       Working with Hadoop services using Ambari

·       Hadoop Daemons

·       Browsing Hadoop UI consoles

·       Basic Hadoop Shell commands

·       Eclipse & WinSCP installation & configuration on VM

·       Practical Exercise

  Lecture-4 Big Data Processing with MapReduce
·       Running a MapReduce application in MR2

·       MapReduce Framework on YARN

·       Fault tolerance in YARN

·       Map, Reduce & Shuffle phases

·       Understanding Mapper, Reducer & Driver classes

·       Writing MapReduce WordCount program

·       Executing & monitoring a MapReduce job

·       Counters

·       Distributed Cache

·       MRUnit

·       Reduce Join

·       Custom Input Format

·       Sequence Input Format

·       XML file Parsing using MapReduce

·       Practical Exercise
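
As a rough illustration of the Map, Shuffle and Reduce phases and the WordCount program listed in this lecture, here is a minimal pure-Python sketch. It does not use Hadoop or its Java API; it only mimics the three phases in a single process. The function names (map_phase, shuffle_phase, reduce_phase, word_count) are hypothetical and chosen for illustration.

```python
from collections import defaultdict


def map_phase(line):
    """Emit a (word, 1) pair for every whitespace-separated word."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle_phase(pairs):
    """Group emitted values by key, as the shuffle step does between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Sum the counts collected for one word."""
    return key, sum(values)


def word_count(lines):
    """Run the three phases in-process over an iterable of text lines."""
    pairs = [pair for line in lines for pair in map_phase(line)]
    grouped = shuffle_phase(pairs)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    # Illustrative input only; a real job would read its input from HDFS.
    print(word_count(["big data", "Big Data Hadoop"]))
```

In an actual Hadoop job the map and reduce functions would run as separate tasks on different nodes, and the framework would perform the grouping step; the sketch only shows how data flows between the phases.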

  Lecture-5 Analysis using Apache Pig
·       Introduction to Apache Pig

·       MapReduce vs Pig

·       Pig Components & Pig Execution

·       Pig architecture

·       Pig Data Types & Data Models in Pig

·       Pig Latin Programs

·       Shell and Utility Commands

·       Pig processing – loading and transforming data

·       Pig built-in functions

·       Filtering, grouping, sorting data

·       Relational join operators

·       Pig UDF & Pig Streaming

·       Testing Pig scripts with PigUnit

·       Aviation use-case in Pig

·       Pig Demo of Healthcare Dataset

·       Practical Exercise

  Lecture-6 Analysis using Hive Data Warehousing Infrastructure
·       Background of Hive

·       Hive vs Pig

·       Hive architecture and Components

·       Hive Metastore

·       Comparison with Traditional Database

·       Limitations of Hive

·       Hive Query Language

·       Switching the metastore database from Derby to MySQL

·       Managed & external tables

·       Data processing – loading data into tables

·       Using Hive built-in functions

·       Hive Data Types and Data Models

·       Partitioning data using Hive

·       Bucketing data

·       Hive Scripting

·       Using Hive UDFs

·       Hive Tables (Managed Tables and External Tables)

·       Importing Data

·       Querying Data & Managing Outputs

·       Hive Demo on Healthcare Dataset

·       Practical Exercise

  Lecture-7 Advanced Apache Hive and HBase
·       HiveQL: Joining Tables, Dynamic Partitioning

·       Custom MapReduce Scripts

·       Hive Indexes and views

·       Hive Query Optimizers

·       Hive Thrift Server

·       Hive UDF

·       Apache HBase: Introduction to NoSQL Databases and HBase

·       HBase vs RDBMS

·       HBase Components

·       HBase Architecture

·       HBase shell

·       HBase Client API

·       Hive Data Loading Techniques

·       HBase Run Modes

·       HBase Configuration

·       Creating table

·       Creating column families

·       CLI commands – get, put, delete & scan

·       Scan Filter operations

·       ZooKeeper & its role in HBase environment

·       Apache ZooKeeper Introduction

·       ZooKeeper Data Model

·       ZooKeeper Service

·       HBase Bulk Loading

·       Getting and Inserting Data

·       HBase Filters

·       Practical Exercise

  Lecture-8 Real Time Analytics with Apache Spark
·       What is Spark

·       Spark Ecosystem

·       Spark Components

·       What is Scala

·       Why Scala

·       Spark Context

·       Spark RDD

·       A short introduction to streaming

·       Spark Streaming

·       Discretized Streams

·       Stateful and stateless transformations

·       Checkpointing

·       Operating with other streaming platforms (such as Apache Kafka)

·       Structured Streaming

·       Practical Exercise

  Lecture-9 Importing and Exporting Data using Sqoop
·       Importing data from RDBMS to HDFS

·       Exporting data from HDFS to RDBMS

·       Importing & exporting data between RDBMS & Hive tables

·       Practical Exercise

  Lecture-10 Oozie Workflow Management and Using Flume for Analyzing Streaming Data
·       Overview of Oozie

·       Oozie Workflow Architecture

·       Creating workflows with Oozie

·       Introduction to Flume

·       Flume Architecture

·       Flume Demo

·       Practical Exercise

  Lecture-11 Visualizing Big Data
·       Introduction

·       Tableau

·       Chart types

·       Data visualization tools

·       Practical Exercise

  Lecture-12 Introducing Cloud Computing
·       Cloud computing basics

·       Concepts and terminology

·       Goals and benefits

·       Risks and challenges

·       Roles and boundaries

·       Cloud characteristics

·       Cloud delivery models

·       Cloud deployment models

·       Practical Exercise

Frequently Asked Questions

Candidates with a basic understanding of computers, SQL, and elementary programming skills in Python are ideal for this training.

The course offers a variety of online training options, including:
• Live Virtual Classroom Training: Participate in real-time interactive sessions with instructors and peers.
• 1:1 Doubt Resolution Sessions: Get personalized assistance and clarification on course-related queries.
• Recorded Live Lectures*: Access recorded sessions for review or to catch up on missed classes.
• Flexible Schedule: Enjoy the flexibility to learn at your own pace and according to your schedule.

Live Virtual Classroom Training allows you to attend instructor-led sessions in real-time through an online platform. You can interact with the instructor, ask questions, participate in discussions, and collaborate with fellow learners, simulating the experience of a traditional classroom setting from the comfort of your own space.

If you miss a live session, you can access recorded lectures* to review the content covered during the session. This allows you to catch up on any missed material at your own pace and ensures that you don't fall behind in your learning journey.

The course offers a flexible schedule, allowing you to learn at times that suit you best. Whether you have other commitments or prefer to study during specific hours, the course structure accommodates your needs, enabling you to balance your learning with other responsibilities effectively.

*Note: Availability of recorded live lectures may vary depending on the course and training provider.