Course Outline
-
Scala Primer for Government
- A quick introduction to Scala
- Labs: Getting acquainted with Scala
-
Spark Basics for Government
- Background and history of Spark
- Integrating Spark with Hadoop
- Core concepts and architecture of Spark
- Overview of the Spark ecosystem (core, SQL, MLlib, streaming)
- Labs: Installing and running Spark
-
First Look at Spark for Government
- Running Spark in local mode
- Exploring the Spark web UI
- Using the Spark shell
- Analyzing datasets (part 1)
- Inspecting RDDs
- Labs: Exploring the Spark shell
-
RDDs for Government
- RDD concepts
- Partitioning strategies
- Operations and transformations on RDDs
- Types of RDDs
- Key-value pair RDDs
- MapReduce operations using RDDs
- Caching and persistence
- Labs: Creating, inspecting, and caching RDDs
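The map, filter, and key-value (reduceByKey) transformations above can be sketched in plain Python to show their semantics. This is a conceptual illustration, not actual Spark code: in a lab the same chain would be written against an RDD (e.g. `rdd.flatMap(...).filter(...).reduceByKey(...)`), with Spark distributing each step across partitions.

```python
from collections import defaultdict

# Plain-Python sketch of RDD-style transformations: flatMap, filter,
# and the key-value reduceByKey pattern behind a word count.
lines = ["spark makes big data simple", "big data big results"]

# flatMap: split each line into individual words
words = [w for line in lines for w in line.split()]

# filter: keep only words longer than 3 characters
long_words = [w for w in words if len(w) > 3]

# key-value pairs + reduceByKey: sum a count of 1 per word
counts = defaultdict(int)
for w in long_words:
    counts[w] += 1

print(dict(counts))  # {'spark': 1, 'makes': 1, 'data': 2, 'simple': 1, 'results': 1}
```

In Spark the intermediate results stay lazy until an action runs; here every step is eager, but the data flow is the same.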
-
Spark API Programming for Government
- Introduction to the Spark API and the RDD API
- Submitting a first program to Spark
- Debugging and logging
- Configuration properties
- Labs: Programming with the Spark API and submitting jobs
-
Spark SQL for Government
- SQL support in Spark
- DataFrames
- Defining tables and importing datasets
- Querying DataFrames using SQL
- Storage formats (JSON, Parquet)
- Labs: Creating and querying DataFrames; evaluating data formats
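The define-a-table, import-data, query-with-SQL flow above can be previewed with Python's built-in sqlite3 module. This is a stand-in, not Spark SQL itself: in Spark the analogous steps are loading a DataFrame (e.g. `spark.read.json(...)`), registering it with `createOrReplaceTempView`, and running `spark.sql(...)`. The table and column names here are invented for illustration.

```python
import sqlite3

# Define a table, load rows, and query it with SQL, mirroring the
# Spark SQL workflow in miniature with an in-memory SQLite database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE agencies (name TEXT, budget REAL)")
conn.executemany(
    "INSERT INTO agencies VALUES (?, ?)",
    [("Transport", 120.5), ("Health", 300.0), ("Parks", 45.2)],
)

# The SELECT syntax carries over to spark.sql() almost unchanged
rows = conn.execute(
    "SELECT name FROM agencies WHERE budget > 100 ORDER BY name"
).fetchall()
print(rows)  # [('Health',), ('Transport',)]
```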
-
MLlib for Government
- Introduction to MLlib
- Overview of MLlib algorithms
- Labs: Writing MLlib applications
-
GraphX for Government
- Overview of the GraphX library
- GraphX APIs
- Labs: Processing graph data using Spark
-
Spark Streaming for Government
- Overview of streaming capabilities in Spark
- Evaluating suitable streaming platforms
- Streaming operations
- Sliding window operations
- Labs: Writing Spark Streaming applications
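The sliding-window idea above can be sketched without a streaming engine: keep a bounded buffer of the most recent elements and emit an aggregate as each new element arrives. This is a plain-Python illustration of the semantics; in Spark Streaming the equivalent is a windowed operation over a DStream or structured stream, with window length and slide interval as parameters.

```python
from collections import deque

def sliding_sums(stream, window=3):
    """Emit the sum over the last `window` elements after each arrival."""
    buf = deque(maxlen=window)   # oldest elements fall out automatically
    out = []
    for x in stream:
        buf.append(x)
        out.append(sum(buf))     # aggregate over the current window
    return out

print(sliding_sums([1, 2, 3, 4, 5]))  # [1, 3, 6, 9, 12]
```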
-
Spark and Hadoop for Government
- Introduction to Hadoop (HDFS, YARN)
- Architecture of Hadoop + Spark integration
- Running Spark on Hadoop YARN
- Processing HDFS files with Spark
-
Spark Performance and Tuning for Government
- Broadcast variables
- Accumulators
- Memory management and caching strategies
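The roles of broadcast variables and accumulators can be shown with an in-process sketch. In Spark, `sc.broadcast(lookup)` ships one read-only copy of a lookup table to each executor instead of one copy per task, and an accumulator (e.g. `sc.longAccumulator()`) safely sums per-task counts back to the driver. Below both are simulated with ordinary Python objects; the data is invented for illustration.

```python
# Read-only lookup table: the candidate for broadcasting in Spark
lookup = {"NY": "New York", "CA": "California"}
records = ["NY", "CA", "TX", "NY"]

bad_codes = 0          # plays the accumulator role: counts dropped records
expanded = []
for code in records:   # imagine each iteration as a task on an executor
    if code in lookup:
        expanded.append(lookup[code])
    else:
        bad_codes += 1

print(expanded, bad_codes)  # ['New York', 'California', 'New York'] 1
```

In real Spark code the loop body would run in parallel, which is exactly why the shared lookup must be broadcast (read-only) and the counter must be an accumulator (write-only from tasks).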
-
Spark Operations for Government
- Deploying Spark in a production environment
- Sample deployment templates
- Configuration settings
- Monitoring tools and techniques
- Troubleshooting common issues
Requirements
Prerequisites
Familiarity with one of Java, Scala, or Python (labs are conducted in Scala and Python)
Basic understanding of a Linux development environment, including command-line navigation and file editing with tools such as vi or nano
Testimonials (6)
Doing similar exercises different ways really help understanding what each component (Hadoop/Spark, standalone/cluster) can do on its own and together. It gave me ideas on how I should test my application on my local machine when I develop vs when it is deployed on a cluster.
Thomas Carcaud - IT Frankfurt GmbH
Course - Spark for Developers
Ajay was very friendly, helpful and also knowledgeable about the topic he was discussing.
Biniam Guulay - ICE International Copyright Enterprise Germany GmbH
Course - Spark for Developers
Ernesto did a great job explaining the high level concepts of using Spark and its various modules.
Michael Nemerouf
Course - Spark for Developers
The trainer made the class interesting and entertaining which helps quite a bit with all day training.
Ryan Speelman
Course - Spark for Developers
We know a lot more about the whole environment.
John Kidd
Course - Spark for Developers
Richard is very calm and methodical, with an analytic insight - exactly the qualities needed to present this sort of course.