Course Outline

Section 1: Data Management in HDFS for Government

  • Various Data Formats (JSON / Avro / Parquet)
  • Compression Schemes
  • Data Masking
  • Labs: Analyzing different data formats; enabling compression

Section 2: Advanced Pig for Government

  • User-defined Functions
  • Introduction to Pig Libraries (ElephantBird / DataFu)
  • Loading Complex Structured Data using Pig
  • Pig Tuning
  • Labs: Advanced Pig scripting; parsing complex data types

Section 3: Advanced Hive for Government

  • User-defined Functions
  • Compressed Tables
  • Hive Performance Tuning
  • Labs: Creating compressed tables, evaluating table formats and configuration

Section 4: Advanced HBase for Government

  • Advanced Schema Modelling
  • Compression
  • Bulk Data Ingest
  • Wide-table / Tall-table Comparison
  • HBase and Pig
  • HBase and Hive
  • HBase Performance Tuning
  • Labs: Tuning HBase; accessing HBase data from Pig & Hive; using Phoenix for data modeling

Requirements

  • Comfortable with the Java programming language (most programming exercises are in Java)
  • Proficient in a Linux environment (ability to navigate the Linux command line and edit files using vi or nano)
  • A working knowledge of Hadoop

Lab Environment

Zero Install: Students do not need to install Hadoop software on their machines. A working Hadoop cluster will be provided for the students' use.

Students will need the following:

Duration: 21 hours
