Course Outline
Section 1: Data Management in HDFS for Government
- Various Data Formats (JSON / Avro / Parquet)
- Compression Schemes
- Data Masking
- Labs: Analyzing different data formats; enabling compression
Section 2: Advanced Pig for Government
- User-defined Functions
- Introduction to Pig Libraries (Elephant Bird / DataFu)
- Loading Complex Structured Data using Pig
- Pig Tuning
- Labs: Advanced Pig scripting; parsing complex data types
Section 3: Advanced Hive for Government
- User-defined Functions
- Compressed Tables
- Hive Performance Tuning
- Labs: Creating compressed tables, evaluating table formats and configuration
Section 4: Advanced HBase for Government
- Advanced Schema Modelling
- Compression
- Bulk Data Ingest
- Wide-table / Tall-table Comparison
- HBase and Pig
- HBase and Hive
- HBase Performance Tuning
- Labs: Tuning HBase; accessing HBase data from Pig and Hive; using Phoenix for data modeling
Requirements
- Comfortable with the Java programming language (most programming exercises are in Java)
- Proficient in a Linux environment (ability to navigate the Linux command line and edit files using vi or nano)
- A working knowledge of Hadoop
Lab Environment
Zero Install: Students do not need to install Hadoop software on their own machines. A working Hadoop cluster will be provided for students to use.
Students will need the following:
Testimonials (5)
The live examples
Ahmet Bolat - Accenture Industrial SS
Course - Python, Spark, and Hadoop for Big Data
During the exercises, James explained every step in more detail wherever I was getting stuck. I was completely new to NiFi. He explained the actual purpose of NiFi, even basics such as it being open source. He covered every NiFi concept, from beginner level to developer level.
Firdous Hashim Ali - MOD A BLOCK
Course - Apache NiFi for Administrators
That I had it in the first place.
Peter Scales - CACI Ltd
Course - Apache NiFi for Developers
The practical, hands-on work; the theory was also presented well by Ajay.
Dominik Mazur - Capgemini Polska Sp. z o.o.
Course - Hadoop Administration on MapR
I liked the VM very much. The teacher was very knowledgeable about the topic as well as other topics, and he was very nice and friendly. I liked the facility in Dubai.