Course Outline
Section 1: Introduction to Hadoop for Government
- Hadoop history, concepts
- Ecosystem
- Distributions
- High-level architecture
- Hadoop myths
- Hadoop challenges
- Hardware and software requirements
- Laboratory: First look at Hadoop
Section 2: HDFS for Government
- Design and architecture
- Concepts (horizontal scaling, replication, data locality, rack awareness)
- Daemons: NameNode, Secondary NameNode, DataNode
- Communications and heartbeats
- Data integrity
- Read/write path
- NameNode High Availability (HA), Federation
- Laboratory: Interacting with HDFS
Section 3: MapReduce for Government
- Concepts and architecture
- Daemons (MapReduce Version 1): JobTracker, TaskTracker
- Phases: Driver, Mapper, Shuffle/Sort, Reducer
- MapReduce Version 1 and Version 2 (YARN)
- Internals of MapReduce
- Introduction to writing a Java MapReduce program
- Laboratory: Running a sample MapReduce program
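As a rough illustration of the phases listed above (Driver, Mapper, Shuffle/Sort, Reducer), the following is a minimal sketch in plain Java. It does not use the Hadoop API; the class name `WordCountSketch` and the methods `map`, `shuffle`, `reduce`, and `run` are hypothetical names chosen for this example, and everything runs in a single JVM rather than on a cluster.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Plain-Java simulation of the MapReduce phases (assumption: no Hadoop
// classes are involved; this only mirrors the data flow on one machine).
public class WordCountSketch {

    // Map phase: emit a (word, 1) pair for every word in one input line.
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String word : line.toLowerCase().split("\\W+")) {
            if (!word.isEmpty()) {
                pairs.add(Map.entry(word, 1));
            }
        }
        return pairs;
    }

    // Shuffle/sort phase: group values by key; TreeMap keeps the keys sorted.
    static Map<String, List<Integer>> shuffle(List<Map.Entry<String, Integer>> pairs) {
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (Map.Entry<String, Integer> pair : pairs) {
            grouped.computeIfAbsent(pair.getKey(), k -> new ArrayList<>())
                   .add(pair.getValue());
        }
        return grouped;
    }

    // Reduce phase: sum all values collected for a single key.
    static int reduce(String key, List<Integer> values) {
        int sum = 0;
        for (int value : values) {
            sum += value;
        }
        return sum;
    }

    // Driver: wires the phases together and returns the final counts.
    static Map<String, Integer> run(List<String> lines) {
        List<Map.Entry<String, Integer>> mapped = new ArrayList<>();
        for (String line : lines) {
            mapped.addAll(map(line));
        }
        Map<String, Integer> counts = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> entry : shuffle(mapped).entrySet()) {
            counts.put(entry.getKey(), reduce(entry.getKey(), entry.getValue()));
        }
        return counts;
    }

    public static void main(String[] args) {
        // Prints {big=2, cluster=1, data=2, node=1}
        System.out.println(run(List.of("big data big cluster", "data node")));
    }
}
```

In a real Hadoop job the framework performs the shuffle/sort step itself and distributes the map and reduce tasks across nodes; here all three steps are ordinary method calls so the data flow is easy to follow.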
Section 4: Pig for Government
- Pig vs. Java MapReduce
- Pig job flow
- Pig Latin language
- ETL with Pig
- Transformations and Joins
- User-defined functions (UDF)
- Laboratory: Writing Pig scripts to analyze data
Section 5: Hive for Government
- Architecture and design
- Data types
- SQL support in Hive
- Creating Hive tables and querying
- Partitions
- Joins
- Text processing
- Laboratory: Various exercises on processing data with Hive
Section 6: HBase for Government
- Concepts and architecture
- HBase vs. RDBMS vs. Cassandra
- HBase Java API
- Time series data on HBase
- Schema design
- Laboratory: Interacting with HBase using the shell; programming with the HBase Java API; schema design exercise
Requirements
- Proficiency in the Java programming language (most programming exercises are conducted in Java)
- Familiarity with the Linux environment (ability to navigate the Linux command line and edit files using vi or nano)
Lab Environment
Zero Install: Students do not need to install Hadoop software on their personal machines. A fully functional Hadoop cluster will be provided to students.
Students will need the following:
- An SSH client (Linux and Mac systems come with built-in SSH clients; for Windows, PuTTY is recommended)
- A web browser to access the cluster (Firefox is preferred)