Course Outline

Big Data Overview:

  • What is Big Data
  • Why Big Data is gaining popularity
  • Big Data Case Studies
  • Big Data Characteristics
  • Solutions for working with Big Data in government

Hadoop & Its Components:

  • What is Hadoop and what are its components
  • Hadoop architecture and the characteristics of the data it can handle and process
  • A brief history of Hadoop, the companies using it, and why they adopted it
  • Hadoop Framework & its components—explained in detail
  • What is HDFS, and how reads and writes to the Hadoop Distributed File System work
  • How to set up a Hadoop cluster in different modes: stand-alone, pseudo-distributed, and multi-node

(This includes setting up a Hadoop cluster in VirtualBox, KVM, or VMware; network configurations that need careful attention; running Hadoop daemons; and testing the cluster.)

  • What is the MapReduce framework and how it works
  • Running MapReduce jobs on a Hadoop cluster
  • Understanding replication, mirroring, and rack awareness in the context of Hadoop clusters

Hadoop Cluster Planning:

  • How to plan your Hadoop cluster
  • Understanding hardware and software to plan your Hadoop cluster
  • Understanding workloads and planning the cluster to avoid failures and perform optimally

What is MapR and Why MapR:

  • Overview of MapR and its architecture
  • Understanding how the MapR Control System, MapR volumes, snapshots, and mirrors work
  • Planning a cluster in the context of MapR
  • Comparison of MapR with other distributions and Apache Hadoop
  • MapR installation and cluster deployment

Cluster Setup & Administration:

  • Managing services, nodes, snapshots, mirror volumes, and remote clusters
  • Understanding and managing nodes
  • Understanding Hadoop components and installing them alongside MapR services
  • Accessing data on the cluster, including via NFS
  • Managing data by using volumes
  • Managing users and groups
  • Managing and assigning roles to nodes
  • Commissioning and decommissioning nodes
  • Cluster administration and performance monitoring
  • Configuring, analyzing, and monitoring performance metrics
  • Configuring and administering MapR security
  • Understanding and working with M7—native storage for MapR tables
  • Cluster configuration and tuning for optimal performance

Cluster Upgrade and Integration with Other Setups:

  • Upgrading the MapR software version and types of upgrade
  • Configuring a MapR cluster to access an HDFS cluster
  • Setting up a MapR cluster on Amazon Elastic MapReduce

All of the above topics include demonstrations and practice sessions so that learners gain hands-on experience with the technology.

Requirements

  • Fundamental understanding of the Linux file system
  • Basic proficiency in Java
  • Familiarity with Apache Hadoop (highly recommended for government)

Duration: 28 hours
