Course Outline

Section 1: Data Management in HDFS

  • Various Data Formats (JSON, Avro, Parquet)
  • Compression Schemes
  • Data Masking
  • Labs: Analyzing different data formats; enabling compression for government applications

Section 2: Advanced Pig

  • User-defined Functions
  • Introduction to Pig Libraries (Elephant Bird, DataFu)
  • Loading Complex Structured Data using Pig for government datasets
  • Pig Tuning
  • Labs: Advanced Pig scripting; parsing complex data types in government contexts

Section 3: Advanced Hive

  • User-defined Functions
  • Compressed Tables for efficient storage and retrieval
  • Hive Performance Tuning to optimize queries for government use cases
  • Labs: Creating compressed tables, evaluating table formats and configuration for government data

Section 4: Advanced HBase

  • Advanced Schema Modeling for complex data structures in government databases
  • Compression Techniques to enhance storage efficiency
  • Bulk Data Ingest for large-scale government datasets
  • Wide-table vs. Tall-table Comparison for optimal performance
  • Integrating HBase with Pig and Hive for comprehensive data processing in government environments
  • HBase Performance Tuning to meet high-demand government operations
  • Labs: Tuning HBase configurations; accessing HBase data from Pig and Hive for government applications; using Phoenix for advanced data modeling

Requirements

  • Proficient in the Java programming language (most programming exercises are conducted in Java)
  • Comfortable operating in a Linux environment (ability to navigate the Linux command line and edit files using vi or nano)
  • Practical knowledge of Hadoop

Lab Environment

Zero Install: There is no requirement for students to install Hadoop software on their personal devices. A functional Hadoop cluster will be provided for government training purposes.

Students will need the following:

Duration: 21 Hours
