Course Outline
Introduction
- Hadoop history and concepts
- Ecosystem
- Distributions
- High-level architecture
- Hadoop myths
- Hadoop challenges (hardware / software)
- Labs: discuss your Big Data projects and problems
Planning and Installation
- Selecting software and Hadoop distributions
- Sizing the cluster, planning for growth
- Selecting hardware and network
- Rack topology
- Installation
- Multi-tenancy
- Directory structure, logs
- Benchmarking
- Labs: install a cluster and run performance benchmarks
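As a rough illustration of the "sizing the cluster, planning for growth" topic, the sketch below estimates how many DataNodes a given data volume might need. The function name and every default (overhead fraction, usable capacity per node, growth rate) are assumptions made up for this example, not recommendations from the course; only the replication factor of 3 matches the usual HDFS default.

```python
import math


def datanodes_needed(raw_data_tb, replication=3, overhead=0.25,
                     usable_tb_per_node=24.0, annual_growth=0.0, years=0):
    """Estimate the DataNode count for a data volume (illustrative only).

    All parameters other than raw_data_tb are hypothetical defaults.
    """
    # Project the data volume forward with compound annual growth.
    projected_tb = raw_data_tb * (1 + annual_growth) ** years
    # HDFS stores each block `replication` times.
    replicated_tb = projected_tb * replication
    # Reserve headroom for intermediate and temporary data.
    required_tb = replicated_tb * (1 + overhead)
    # Round up: a fraction of a node still means one more server.
    return math.ceil(required_tb / usable_tb_per_node)


if __name__ == "__main__":
    print(datanodes_needed(100))                              # today
    print(datanodes_needed(100, annual_growth=0.5, years=2))  # in two years
```

Running the same estimate with and without a growth projection shows how quickly a cluster sized only for today's data can fall short.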
HDFS Operations
- Concepts (horizontal scaling, replication, data locality, rack awareness)
- Nodes and daemons (NameNode, Secondary NameNode, HA Standby NameNode, DataNode)
- Health monitoring
- Command-line and browser-based administration
- Adding storage, replacing defective drives
- Labs: getting familiar with the HDFS command-line tools
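To make the replication concept listed above concrete, here is a minimal sketch of how a single file translates into blocks and raw disk usage. The function name is hypothetical; the 128 MB block size and replication factor of 3 are the common Hadoop 2 defaults and are assumed here, since a real cluster may be configured differently.

```python
import math


def hdfs_footprint(file_size_mb, block_size_mb=128, replication=3):
    """Return (blocks, block_replicas, raw_mb) for one file (illustrative)."""
    # A file is split into fixed-size blocks; the last one may be partial.
    blocks = math.ceil(file_size_mb / block_size_mb)
    # Every block is stored `replication` times across DataNodes.
    block_replicas = blocks * replication
    # A partial last block occupies only its actual size on disk,
    # so raw usage is simply file size times replication.
    raw_mb = file_size_mb * replication
    return blocks, block_replicas, raw_mb


if __name__ == "__main__":
    print(hdfs_footprint(1000))
```

The block count is also what the NameNode has to track in memory, which is one reason many small files are more expensive than a few large ones.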
Data Ingestion
- Flume for logs and other data ingestion into HDFS
- Sqoop for importing from SQL databases to HDFS, as well as exporting back to SQL
- Hadoop data warehousing with Hive
- Copying data between clusters (distcp)
- Using S3 as complementary to HDFS
- Data ingestion best practices and architectures
- Labs: setting up and using Flume and Sqoop
MapReduce Operations and Administration
- Parallel computing before MapReduce: comparing HPC and Hadoop administration
- MapReduce cluster loads
- Nodes and daemons (JobTracker, TaskTracker)
- MapReduce UI walkthrough
- MapReduce configuration
- Job config
- Optimizing MapReduce
- Fool-proofing MR: what to tell your programmers
- Labs: running MapReduce examples
YARN: New Architecture and Capabilities
- YARN design goals and implementation architecture
- New actors: ResourceManager, NodeManager, ApplicationMaster
- Installing YARN
- Job scheduling under YARN
- Labs: investigate job scheduling
Advanced Topics
- Hardware monitoring
- Cluster monitoring
- Adding and removing servers, upgrading Hadoop
- Backup, recovery, and business continuity planning
- Oozie job workflows
- Hadoop high availability (HA)
- Hadoop Federation
- Securing your cluster with Kerberos
- Labs: set up monitoring
Optional Tracks
- Cloudera Manager for cluster administration, monitoring, and routine tasks; installation and use. In this track, all exercises and labs are performed within the Cloudera distribution environment (CDH5).
- Ambari for cluster administration, monitoring, and routine tasks; installation and use. In this track, all exercises and labs are performed within the Ambari cluster manager and Hortonworks Data Platform (HDP 2.0).
Requirements
- Comfortable with basic Linux system administration
- Basic scripting skills
Prior knowledge of Hadoop and distributed computing is not required; both are introduced and explained in the course.
Lab Environment
Zero Installation: There is no need to install Hadoop software on students’ machines. A fully functional Hadoop cluster will be provided for participants to use.
Students will need the following:
- An SSH client (Linux and Mac systems already include SSH clients; for Windows, PuTTY is recommended)
- A browser to access the cluster. We recommend using Firefox with the FoxyProxy extension installed.
Testimonials (5)
The live examples
Ahmet Bolat - Accenture Industrial SS
Course - Python, Spark, and Hadoop for Big Data
During the exercises, James explained every step to me in more detail wherever I got stuck. I was completely new to NiFi. He explained the actual purpose of NiFi, even basics such as what open source means. He covered every NiFi concept, from beginner level to developer level.
Firdous Hashim Ali - MOD A BLOCK
Course - Apache NiFi for Administrators
That I had it in the first place.
Peter Scales - CACI Ltd
Course - Apache NiFi for Developers
The practical, hands-on work; the theory was also presented well by Ajay.
Dominik Mazur - Capgemini Polska Sp. z o.o.
Course - Hadoop Administration on MapR
I liked the VM very much. The teacher was very knowledgeable about the topic as well as other topics, and he was very nice and friendly. I liked the facility in Dubai.