Course Outline
Introduction
- Hadoop's history and foundational concepts
- Ecosystem components and their roles
- Available distributions and their features
- High-level architecture and its implications for government operations
- Common myths about Hadoop, debunked
- Hardware and software integration challenges in government deployments
- Labs: Discussion of government Big Data projects and the problems they address
Planning and Installation
- Selecting software and a Hadoop distribution to fit government needs
- Cluster sizing and planning for future growth
- Hardware and network selection guidelines
- Rack topology configuration
- Installation procedures and best practices
- Multi-tenancy configurations to support diverse government operations
- Directory structure and log handling for government compliance
- Benchmarking techniques to verify cluster performance
- Labs: Cluster installation and performance benchmarking
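As a rough illustration of the cluster sizing topic above, the sketch below estimates how many DataNodes a dataset needs. The function name, the 25% overhead for temporary and intermediate data, and the 24 TB of usable disk per node are illustrative assumptions, not course material; the replication factor of 3 is the HDFS default.

```python
import math

def estimate_datanodes(data_tb, replication=3, overhead=0.25,
                       usable_tb_per_node=24.0, annual_growth=0.0, years=0):
    """Estimate the DataNode count for a dataset (hypothetical helper).

    overhead and usable_tb_per_node are assumed planning figures.
    """
    # Project the dataset size forward by compound annual growth.
    projected_tb = data_tb * (1 + annual_growth) ** years
    # Every block is stored `replication` times, plus scratch-space headroom.
    raw_tb = projected_tb * replication * (1 + overhead)
    # Round up: a fraction of a node still requires a whole server.
    return math.ceil(raw_tb / usable_tb_per_node)

print(estimate_datanodes(100))                             # today's data
print(estimate_datanodes(100, annual_growth=0.5, years=2)) # after two years
```

Real sizing also has to account for CPU, memory, and network capacity, which this storage-only estimate ignores.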
HDFS Operations
- Key concepts: horizontal scaling, replication, data locality, and rack awareness
- Nodes and daemons: NameNode, Secondary NameNode, HA Standby NameNode, DataNode
- Health monitoring for reliable government data storage
- Command-line and browser-based administration tools
- Adding storage and replacing defective drives
- Labs: Getting familiar with the HDFS command line
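To make the replication concept above concrete, the following sketch computes how many blocks and block replicas a single file occupies. It assumes a 128 MB block size (the Hadoop 2.x default) and a replication factor of 3; the function name is hypothetical.

```python
import math

def hdfs_footprint(file_mb, block_mb=128, replication=3):
    """Return block count, replica count, and raw storage for one file.

    block_mb and replication are assumed defaults; clusters may override them.
    """
    # A file is split into fixed-size blocks; the last block may be partial.
    blocks = math.ceil(file_mb / block_mb)
    return {
        "blocks": blocks,
        # Each block is stored on `replication` different DataNodes.
        "block_replicas": blocks * replication,
        # A partial last block is not padded, so raw usage is size x replication.
        "raw_mb": file_mb * replication,
    }

print(hdfs_footprint(1024))  # a 1 GB file
```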
Data Ingestion
- Flume for ingesting logs and other data into HDFS
- Sqoop for importing from SQL databases into HDFS and exporting back to SQL
- Data warehousing with Hive
- Copying data between clusters with distcp
- S3 as a complementary storage option alongside HDFS
- Data ingestion best practices and architectures for government contexts
- Labs: Setting up and using Flume and Sqoop
MapReduce Operations and Administration
- Parallel computing before MapReduce: HPC administration compared with Hadoop administration
- Managing MapReduce cluster loads
- Nodes and daemons: JobTracker, TaskTracker
- Walkthrough of the MapReduce user interface
- MapReduce configuration settings
- Job configuration parameters and their impact on workflows
- Optimizing MapReduce operations
- Guidance for programmers on writing robust MapReduce jobs for government projects
- Labs: Running MapReduce examples
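For readers new to the programming model behind these topics, here is a minimal in-memory sketch of the map, shuffle, and reduce phases using word count. It is plain Python, not the Hadoop API, and every function name is illustrative.

```python
from collections import defaultdict

def map_phase(line):
    # Map: emit a (word, 1) pair for every word in one input line.
    return [(word.lower(), 1) for word in line.split()]

def shuffle(pairs):
    # Shuffle: group all emitted values by key, as the framework does
    # between the map and reduce stages.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    # Reduce: combine all values for one key into a single result.
    return key, sum(values)

def word_count(lines):
    pairs = [pair for line in lines for pair in map_phase(line)]
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())

print(word_count(["Hadoop stores data", "hadoop processes data"]))
```

On a real cluster the map and reduce calls run in parallel on different nodes, and the shuffle moves data across the network.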
YARN: New Architecture and Capabilities
- Design goals and implementation architecture of YARN
- New actors in YARN: ResourceManager, NodeManager, ApplicationMaster
- Installing YARN in government clusters
- Job scheduling under YARN
- Labs: Investigating job scheduling with YARN
Advanced Topics
- Hardware monitoring
- Cluster monitoring for continuous performance assessment
- Adding and removing servers, and upgrading Hadoop
- Backup, recovery, and business continuity planning for government data
- Oozie job workflows for automating complex tasks
- High availability (HA) configurations
- Hadoop Federation for large-scale data management
- Securing government Hadoop clusters with Kerberos authentication
- Labs: Setting up monitoring systems
Optional Tracks
- Cloudera Manager for cluster administration, monitoring, and routine tasks: installation and usage within the Cloudera distribution environment (CDH5)
- Ambari for cluster administration, monitoring, and routine tasks: installation and usage within the Ambari cluster manager and Hortonworks Data Platform (HDP 2.0)
Requirements
- Familiarity with basic Linux system administration
- Basic scripting skills
Prior knowledge of Hadoop and distributed computing is not required; both are introduced and explained in the course.
Lab Environment
Zero Installation: There is no need to install Hadoop software on students’ machines. A working Hadoop cluster will be provided for students to use.
Students will need the following:
- An SSH client (Linux and Mac systems already include one; for Windows, PuTTY is recommended)
- A web browser to access the cluster. We recommend Firefox with the FoxyProxy extension installed.