Course Outline

Section 1: Introduction to Hadoop for Government

  • Hadoop history, concepts
  • Ecosystem
  • Distributions
  • High-level architecture
  • Hadoop myths
  • Hadoop challenges
  • Hardware and software requirements
  • Laboratory: First look at Hadoop

Section 2: HDFS for Government

  • Design and architecture
  • Concepts (horizontal scaling, replication, data locality, rack awareness)
  • Daemons: NameNode, Secondary NameNode, DataNode
  • Communications and heartbeats
  • Data integrity
  • Read/write path
  • Namenode High Availability (HA), Federation
  • Laboratory: Interacting with HDFS
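The concepts above (block splitting, replication, rack awareness) can be illustrated with a toy model in plain Java. This is not HDFS code, only a sketch of the defaults HDFS ships with: a 128 MB block size, a replication factor of 3, and a placement policy that puts the first replica on the writer's node, the second on a node in a different rack, and the third on another node in that same remote rack. Node and rack names here are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of HDFS block splitting and replica placement. Illustrative only;
// real HDFS adds write pipelines, heartbeats, balancing, safe mode, and more.
public class HdfsBlockSketch {
    static final long BLOCK_SIZE = 128L * 1024 * 1024; // default block size: 128 MB
    static final int REPLICATION = 3;                  // default replication factor

    // Number of blocks needed for a file of the given size (ceiling division).
    static long blockCount(long fileSizeBytes) {
        return (fileSizeBytes + BLOCK_SIZE - 1) / BLOCK_SIZE;
    }

    // Simplified default placement policy: first replica local, second on a
    // node in a different rack, third on another node in that remote rack.
    static List<String> placeReplicas(String localNode, String localRack, String remoteRack) {
        List<String> replicas = new ArrayList<>();
        replicas.add(localRack + "/" + localNode);
        replicas.add(remoteRack + "/node-a"); // hypothetical remote node names
        replicas.add(remoteRack + "/node-b");
        return replicas;
    }

    public static void main(String[] args) {
        long oneGb = 1024L * 1024 * 1024;
        System.out.println("1 GB file -> " + blockCount(oneGb) + " blocks"); // 8 blocks
        System.out.println("Replicas: " + placeReplicas("node-1", "rack-1", "rack-2"));
    }
}
```

Rack awareness is why the second and third replicas land in a different rack: a whole-rack failure still leaves a readable copy, while keeping two replicas in one rack limits cross-rack write traffic.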

Section 3: MapReduce for Government

  • Concepts and architecture
  • Daemons (MapReduce Version 1): JobTracker, TaskTracker
  • Phases: Driver, Mapper, Shuffle/Sort, Reducer
  • MapReduce Version 1 and Version 2 (YARN)
  • Internals of MapReduce
  • Introduction to Java MapReduce program
  • Laboratory: Running a sample MapReduce program
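The lab uses the real Hadoop Java API; as a conceptual warm-up, the Mapper, Shuffle/Sort, and Reducer phases listed above can be simulated in plain Java with no Hadoop dependency. This word-count sketch mirrors the data flow only, not the Hadoop classes:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Plain-Java simulation of the MapReduce phases for word count.
// No Hadoop classes involved; purely to illustrate Mapper -> Shuffle/Sort -> Reducer.
public class WordCountSketch {

    // Map phase: emit a (word, 1) pair for every word in a line.
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> out = new ArrayList<>();
        for (String word : line.toLowerCase().split("\\s+")) {
            if (!word.isEmpty()) out.add(Map.entry(word, 1));
        }
        return out;
    }

    // Shuffle/sort phase: group the emitted values by key, keys in sorted order.
    static Map<String, List<Integer>> shuffle(List<Map.Entry<String, Integer>> pairs) {
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (Map.Entry<String, Integer> p : pairs) {
            grouped.computeIfAbsent(p.getKey(), k -> new ArrayList<>()).add(p.getValue());
        }
        return grouped;
    }

    // Reduce phase: sum the grouped values for each key.
    static Map<String, Integer> reduce(Map<String, List<Integer>> grouped) {
        Map<String, Integer> counts = new TreeMap<>();
        grouped.forEach((word, ones) ->
                counts.put(word, ones.stream().mapToInt(Integer::intValue).sum()));
        return counts;
    }

    public static void main(String[] args) {
        List<Map.Entry<String, Integer>> mapped = new ArrayList<>(map("to be or not to be"));
        System.out.println(reduce(shuffle(mapped))); // {be=2, not=1, or=1, to=2}
    }
}
```

In real Hadoop the same roles are played by a `Mapper` and `Reducer` subclass, with the framework performing the shuffle/sort between them across the cluster.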

Section 4: Pig for Government

  • Pig vs. Java MapReduce
  • Pig job flow
  • Pig Latin language
  • ETL with Pig
  • Transformations and Joins
  • User-defined functions (UDF)
  • Laboratory: Writing Pig scripts to analyze data
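A real Pig UDF is a Java class extending `org.apache.pig.EvalFunc<T>` and implementing `exec(Tuple)`. The sketch below mirrors that contract in plain Java so it has no Pig dependency: `EvalFuncSketch` is a simplified stand-in for the Pig class, and a `String[]` stands in for the tuple.

```java
// Simplified stand-in for Pig's org.apache.pig.EvalFunc<T> contract.
// A real UDF would extend the Pig class and receive an org.apache.pig.data.Tuple.
interface EvalFuncSketch<T> {
    T exec(String[] tuple); // real signature: T exec(Tuple input) throws IOException
}

// Example UDF logic: upper-case the first field of the input tuple,
// returning null on empty or null input, as Pig UDFs conventionally do.
public class UpperUdfSketch implements EvalFuncSketch<String> {
    @Override
    public String exec(String[] tuple) {
        if (tuple == null || tuple.length == 0 || tuple[0] == null) return null;
        return tuple[0].toUpperCase();
    }

    public static void main(String[] args) {
        UpperUdfSketch udf = new UpperUdfSketch();
        System.out.println(udf.exec(new String[] {"hadoop"})); // HADOOP
    }
}
```

Once compiled against the real Pig classes and packaged into a jar, a UDF like this is registered in a Pig script with `REGISTER` and called from Pig Latin like any built-in function.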

Section 5: Hive for Government

  • Architecture and design
  • Data types
  • SQL support in Hive
  • Creating Hive tables and querying
  • Partitions
  • Joins
  • Text processing
  • Laboratory: Various exercises on processing data with Hive
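Hive lays out each partition as its own directory under the table's warehouse path (e.g. a `dt=2024-01-01` subdirectory per day), so a query that filters on the partition column only scans the matching directories. The plain-Java sketch below illustrates that pruning idea; the table path and dates are made up for illustration:

```java
import java.util.List;
import java.util.stream.Collectors;

// Toy illustration of Hive partition pruning: each partition is a directory
// named col=value under the table path, and a predicate on the partition
// column selects which directories a query actually has to scan.
public class PartitionPruningSketch {
    static final String TABLE_PATH = "/user/hive/warehouse/logs"; // hypothetical table location

    static String partitionDir(String dt) {
        return TABLE_PATH + "/dt=" + dt;
    }

    // Keep only the partitions matching a predicate like WHERE dt >= '2024-01-02'.
    static List<String> prune(List<String> partitionDates, String minDate) {
        return partitionDates.stream()
                .filter(dt -> dt.compareTo(minDate) >= 0)
                .map(PartitionPruningSketch::partitionDir)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> dates = List.of("2024-01-01", "2024-01-02", "2024-01-03");
        System.out.println(prune(dates, "2024-01-02"));
    }
}
```

This is why partitioning on a column that queries commonly filter by (a date, a region) is the single biggest lever for Hive query performance: unpruned partitions are never read at all.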

Section 6: HBase for Government

  • Concepts and architecture
  • HBase vs. RDBMS vs. Cassandra
  • HBase Java API
  • Time series data on HBase
  • Schema design
  • Laboratory: Interacting with HBase using shell; Programming in HBase Java API; Schema design exercise
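One widely used schema-design pattern for time-series data in HBase is a row key of the form `<metric>#<Long.MAX_VALUE - timestamp>`: because HBase sorts rows lexicographically by key, reversing the timestamp makes the newest samples sort first in a scan. The sketch below builds such keys in plain Java; the separator and key layout are illustrative choices, not an HBase API.

```java
// Sketch of a common HBase row-key design for time-series data:
// <metric>#<Long.MAX_VALUE - timestamp>, zero-padded so that HBase's
// lexicographic row ordering returns the newest samples first.
public class TimeSeriesKeySketch {

    static String rowKey(String metric, long epochMillis) {
        long reversed = Long.MAX_VALUE - epochMillis;
        return metric + "#" + String.format("%019d", reversed); // pad to fixed width
    }

    public static void main(String[] args) {
        String newer = rowKey("cpu.load", 2_000L);
        String older = rowKey("cpu.load", 1_000L);
        // The newer timestamp sorts lexicographically before the older one.
        System.out.println(newer.compareTo(older) < 0); // true
    }
}
```

A production design would usually also prefix the key with a salt or bucket to spread writes across regions, since monotonically increasing keys hotspot a single RegionServer.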

Requirements

  • Proficient in Java programming language (most programming exercises are conducted in Java)
  • Comfortable in a Linux environment (ability to navigate the Linux command line and edit files using vi or nano)

Lab Environment

Zero Install: There is no need for students to install Hadoop software on their personal machines. A fully functional Hadoop cluster will be provided for students.

Students will need the following:

  • An SSH client (Linux and Mac systems come with built-in SSH clients; for Windows, PuTTY is recommended)
  • A web browser to access the cluster, with Firefox being the preferred choice
Duration: 28 hours
