Course Outline
Day 01
Overview of Big Data Business Intelligence for Criminal Intelligence Analysis
- Case Studies from Law Enforcement - Predictive Policing
- Adoption Rate of Big Data in Law Enforcement Agencies and How They Are Aligning Future Operations Around Big Data Predictive Analytics
- Emerging Technology Solutions Such as Gunshot Sensors, Surveillance Video, and Social Media
- Leveraging Big Data to Mitigate Information Overload
- Integrating Big Data with Legacy Systems
- Basic Understanding of Enabling Technologies in Predictive Analytics for Government Use
- Data Integration and Dashboard Visualization for Enhanced Decision-Making
- Fraud Management Strategies Using Big Data
- Business Rules and Fraud Detection Techniques
- Threat Detection and Profiling Methods
- Cost-Benefit Analysis for Implementing Big Data Solutions in Government Agencies
Introduction to Big Data for Government
- Main Characteristics of Big Data: Volume, Variety, Velocity, and Veracity
- MPP (Massively Parallel Processing) Architecture for Efficient Data Processing
- Data Warehouses: Static Schema, Slowly Evolving Datasets for Stable Operations
- MPP Databases: Greenplum, Exadata, Teradata, Netezza, Vertica, etc.
- Hadoop-Based Solutions: Flexible and Scalable Without Strict Dataset Structure Requirements
- Typical Pattern: Load Data into HDFS (Hadoop Distributed File System), Process It with MapReduce, and Retrieve the Results from HDFS
- Apache Spark for Real-Time Stream Processing in Government Applications
- Batch Processing: Suited for Analytical and Non-Interactive Tasks
- Velocity: Complex Event Processing (CEP) for Streaming Data
- Common Choices for CEP Products: InfoSphere Streams, Apama, MarkLogic, etc.
- Less Production-Ready Options: Storm/S4
- NoSQL Databases: Columnar and Key-Value Stores Best Suited as Analytical Adjuncts to Data Warehouses/Databases
NoSQL Solutions for Government
- KV Store - Keyspace, Flare, SchemaFree, RAMCloud, Oracle NoSQL Database (OnDB)
- KV Store - Dynamo, Voldemort, Dynomite, SubRecord, Mo8onDb, DovetailDB
- KV Store (Hierarchical) - GT.M, Caché
- KV Store (Ordered) - TokyoTyrant, Lightcloud, NMDB, Luxio, MemcacheDB, Actord
- KV Cache - Memcached, Repcached, Coherence, Infinispan, eXtreme Scale, JBossCache, Velocity, Terracotta
- Tuple Store - Gigaspaces, Coord, Apache River
- Object Database - ZopeDB, db4o, Shoal
- Document Store - CouchDB, Cloudant, Couchbase, MongoDB, Jackrabbit, XML-Databases, ThruDB, CloudKit, Persevere, Riak-Basho, Scalaris
- Wide Columnar Store - BigTable, HBase, Apache Cassandra, Hypertable, KAI, OpenNeptune, Qbase, KDI
Varieties of Data: Introduction to Data Cleaning Issues in Big Data for Government
- RDBMS: Static Structure/Schema Does Not Support Agile and Exploratory Environments
- NoSQL: Semi-Structured Data, Flexible Enough to Be Stored Without Defining an Exact Schema First
- Data Cleaning Challenges in Big Data Projects for Government
Hadoop for Government
- When to Select Hadoop for Government Applications?
- STRUCTURED: Enterprise Data Warehouses/Databases Can Store Massive Amounts of Data (at a Cost) but Impose Strict Structure, Not Ideal for Active Exploration
- SEMI-STRUCTURED Data: Difficult to Manage with Traditional Solutions (Data Warehouses/Databases)
- Warehousing Data: Significant Effort and Static Nature Even After Implementation
- HADOOP: Ideal for Variety and Volume of Data, Processed on Commodity Hardware
- Commodity Hardware Required to Create a Hadoop Cluster for Government Use
Introduction to MapReduce and HDFS for Government
- MapReduce: Distributed Computing Across Multiple Servers for Efficient Data Processing
- HDFS: Ensures Local Availability of Data for Computing Processes with Redundancy for Reliability
- Data: Can Be Unstructured or Schema-Less, Unlike RDBMS
- Developer's Responsibility to Make Sense of the Data for Government Applications
- Programming MapReduce: Working with Java (Pros and Cons), Manually Loading Data into HDFS for Government Projects
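As a rough illustration of the MapReduce model covered in this module, the sketch below runs the map, shuffle, and reduce steps in a single Python process. It does not use the Hadoop API. The function names and the sample incident notes are hypothetical, chosen only to show how keys are emitted, grouped, and aggregated.

```python
from collections import defaultdict


def map_phase(record):
    """Emit a (word, 1) pair for every word in one input record."""
    for word in record.lower().split():
        yield word, 1


def shuffle(pairs):
    """Group values by key, as the framework does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Combine all values for one key into a single count."""
    return key, sum(values)


def run_job(records):
    """Run map, shuffle, and reduce over an in-memory list of records."""
    pairs = (pair for record in records for pair in map_phase(record))
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())


# Hypothetical sample data, standing in for files stored in HDFS.
incident_notes = [
    "burglary reported downtown",
    "vehicle theft downtown",
    "burglary suspect detained",
]
counts = run_job(incident_notes)
print(counts)
```

On a real cluster the map and reduce functions run on many nodes, close to the HDFS blocks that hold the data. The single-process version only shows the data flow.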
Day 02
Big Data Ecosystem -- Building Big Data ETL (Extract, Transform, Load) -- Selecting the Right Big Data Tools
- Hadoop vs. Other NoSQL Solutions for Government Use
- For Interactive, Random Access to Data: HBase (Column-Oriented Database) on Top of Hadoop
- Random Access to Data with Restrictions (Max 1 PB): Not Ideal for Ad-Hoc Analytics but Suitable for Logging, Counting, and Time-Series Analysis in Government Operations
- Sqoop: Importing Data from Databases to Hive or HDFS Using JDBC/ODBC Access for Government Projects
- Flume: Streaming Data (e.g., Log Data) into HDFS for Real-Time Processing in Government Applications
Big Data Management System for Government
- Moving Parts, Compute Nodes Start/Fail: ZooKeeper - For Configuration/Coordination/Naming Services in Government Environments
- Complex Pipeline/Workflow: Oozie - Managing Workflow, Dependencies, and Daisy Chaining for Efficient Operations in Government
- Deploying, Configuring, Cluster Management, Upgrades, etc. (Sys Admin): Ambari for Streamlined Administration in Government Agencies
- In the Cloud: Whirr for Flexible Big Data Solutions in Government
Predictive Analytics -- Fundamental Techniques and Machine Learning-Based Business Intelligence for Government
- Introduction to Machine Learning for Government Applications
- Learning Classification Techniques for Enhanced Predictive Models in Government
- Bayesian Prediction: Preparing a Training File for Accurate Forecasts in Government
- Support Vector Machine (SVM) for Robust Predictive Analysis in Government
- KNN p-Tree Algebra & Vertical Mining for Efficient Data Processing in Government
- Neural Networks for Advanced Pattern Recognition in Government Applications
- Random Forest (RF): Solving the Large Variable Problem in Big Data for Government
- Multi-Model Ensemble RF: Addressing Automation Challenges in Big Data for Government
- Automation through Soft10-M for Streamlined Operations in Government
- Text Analytic Tool - Treeminer for Extracting Insights from Textual Data in Government
- Agile Learning Methods for Continuous Improvement in Government Analytics
- Agent-Based Learning: Enhancing Predictive Models with Intelligent Agents in Government
- Distributed Learning: Scaling Analytics Across Multiple Nodes in Government Environments
- Introduction to Open-Source Tools for Predictive Analytics: R, Python, Rapidminer, Mahout for Government Use
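To make the classification topics above more concrete, here is a minimal sketch of a plain k-nearest-neighbors classifier in pure Python. It is the textbook brute-force version, not the p-Tree vertical-mining approach named in the outline. The feature values and the "low"/"high" labels are made-up illustrative data.

```python
import math
from collections import Counter


def knn_predict(train, query, k=3):
    """Return the majority label among the k training points nearest to query.

    train is a list of (features, label) pairs; features are numeric tuples.
    """
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical training file: two numeric features per case and a risk label.
training_data = [
    ((1.0, 1.0), "low"),
    ((1.2, 0.8), "low"),
    ((0.9, 1.1), "low"),
    ((5.0, 5.2), "high"),
    ((5.1, 4.9), "high"),
    ((4.8, 5.0), "high"),
]

print(knn_predict(training_data, (1.1, 1.0)))
print(knn_predict(training_data, (5.0, 5.0)))
```

The brute-force search compares the query with every training point, which is one reason this kind of algorithm needs a different design at big data scale.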
Predictive Analytics Ecosystem and Its Application in Criminal Intelligence Analysis for Government
- Technology and the Investigative Process in Government Operations
- Insight Analytics for Informed Decision-Making in Government
- Visualization Analytics: Enhancing Data Presentation for Government Stakeholders
- Structured Predictive Analytics: Building Robust Models for Government Use
- Unstructured Predictive Analytics: Analyzing Unstructured Data for Government Applications
- Threat/Fraud/Vendor Profiling in Government Operations
- Recommendation Engine for Personalized Insights in Government
- Pattern Detection for Early Warning Systems in Government
- Rule/Scenario Discovery: Identifying Failures, Fraud, and Optimization Opportunities in Government
- Root Cause Discovery for Effective Problem-Solving in Government
- Sentiment Analysis for Understanding Public Opinion in Government
- CRM Analytics: Enhancing Customer Relationship Management in Government
- Network Analytics: Analyzing Complex Networks in Government Operations
- Text Analytics for Gaining Insights from Transcripts, Witness Statements, and Internet Chatter in Government Investigations
- Technology-Assisted Review for Efficient Data Analysis in Government
- Fraud Analytics: Detecting and Preventing Fraudulent Activities in Government
- Real-Time Analytics for Immediate Action in Government
Day 03
Real-Time and Scalable Analytics Over Hadoop for Government
- Why Common Analytic Algorithms Fail in Hadoop/HDFS for Government Applications
- Apache Hama: Bulk Synchronous Distributed Computing for Efficient Data Processing in Government
- Apache Spark: Cluster Computing and Real-Time Analytics for Dynamic Government Operations
- CMU GraphLab 2: Graph-Based Asynchronous Approach to Distributed Computing for Advanced Government Analytics
- KNN p-Tree Algebra-Based Approach from Treeminer for Reducing Hardware Costs in Government Operations
Tools for eDiscovery and Forensics in Government
- eDiscovery Over Big Data vs. Legacy Data: A Comparison of Cost and Performance for Government Use
- Predictive Coding and Technology-Assisted Review (TAR) for Faster Discovery in Government Investigations
- Live Demo of vMiner to Demonstrate How TAR Enables Faster Discovery in Government Operations
- Faster Indexing Through HDFS: Managing the Velocity of Data in Government Projects
- NLP (Natural Language Processing): Open-Source Products and Techniques for Government Use
- eDiscovery in Foreign Languages: Technology for Foreign Language Processing in Government Investigations
Big Data BI for Cyber Security -- Achieving a 360-Degree View, Speedy Data Collection, and Threat Identification for Government
- Understanding the Basics of Security Analytics: Attack Surface, Security Misconfiguration, Host Defenses for Government Use
- Network Infrastructure/Large Datapipe/Response ETL for Real-Time Analytics in Government Operations
- Prescriptive vs. Predictive: Fixed Rule-Based vs. Auto-Discovery of Threat Rules from Metadata in Government
Gathering Disparate Data for Criminal Intelligence Analysis in Government
- Using IoT (Internet of Things) as Sensors for Capturing Data in Government Operations
- Using Satellite Imagery for Domestic Surveillance in Government Applications
- Using Surveillance and Image Data for Criminal Identification in Government Investigations
- Other Data Gathering Technologies: Drones, Body Cameras, GPS Tagging Systems, and Thermal Imaging Technology for Government Use
- Combining Automated Data Retrieval with Information from Informants, Interrogations, and Research for Comprehensive Analysis in Government
- Forecasting Criminal Activity to Enhance Public Safety in Government Operations
Day 04
Fraud Prevention BI from Big Data in Fraud Analytics for Government
- Basic Classification of Fraud Analytics: Rules-Based vs. Predictive Analytics for Government Use
- Supervised vs. Unsupervised Machine Learning for Fraud Pattern Detection in Government Operations
- Business-to-Business Fraud, Medical Claims Fraud, Insurance Fraud, Tax Evasion, and Money Laundering in Government Investigations
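The contrast between rules-based and unsupervised fraud detection listed above can be sketched as follows. This is a simplified illustration: the claim amounts, the fixed limit of 500, and the z-score threshold of 2.0 are all assumptions, and a z-score test is only one of many possible unsupervised techniques.

```python
from statistics import mean, pstdev

# Hypothetical claim amounts; one value is far larger than the rest.
claims = [120.0, 135.0, 110.0, 128.0, 140.0, 115.0, 980.0, 125.0]


def rule_flags(amounts, limit=500.0):
    """Rules-based: flag any amount above a fixed, analyst-chosen limit."""
    return [amount > limit for amount in amounts]


def zscore_flags(amounts, threshold=2.0):
    """Unsupervised: flag amounts far from the mean of the data itself."""
    mu = mean(amounts)
    sigma = pstdev(amounts)
    if sigma == 0:
        return [False] * len(amounts)
    return [abs(amount - mu) / sigma > threshold for amount in amounts]


print(rule_flags(claims))
print(zscore_flags(claims))
```

The rule needs someone to choose the limit in advance, while the z-score version derives its cut-off from the data. On this sample both flag the same single claim.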
Social Media Analytics -- Intelligence Gathering and Analysis for Government
- How Criminals Use Social Media to Organize, Recruit, and Plan Activities
- Big Data ETL API for Extracting Social Media Data for Government Use
- Text, Image, Metadata, and Video Analysis for Comprehensive Insights in Government Operations
- Sentiment Analysis of Social Media Feeds to Gauge Public Sentiment
- Contextual and Non-Contextual Filtering of Social Media Feeds for Accurate Information in Government
- Social Media Dashboard to Integrate Diverse Social Media Sources for Government Use
- Automated Profiling of Social Media Profiles for Enhanced Intelligence in Government Investigations
- Live Demo of Each Analytic Tool Using Treeminer for Government Applications
Big Data Analytics in Image Processing and Video Feeds for Government
- Image Storage Techniques in Big Data: Solutions for Data Exceeding Petabytes in Government Operations
- LTFS (Linear Tape File System) and LTO (Linear Tape Open) for Efficient Data Management in Government
- GPFS-LTFS (General Parallel File System - Linear Tape File System): Layered Storage Solution for Big Image Data in Government Projects
- Fundamentals of Image Analytics for Government Use
- Object Recognition: Identifying Objects in Images for Government Applications
- Image Segmentation: Dividing Images into Meaningful Parts for Government Analysis
- Motion Tracking: Monitoring Movement in Video Feeds for Government Operations
- 3-D Image Reconstruction: Building 3D Models from Image Data for Government Use
Biometrics, DNA, and Next-Generation Identification Programs for Government
- Beyond Fingerprinting and Facial Recognition: Advanced Biometric Techniques for Government Use
- Speech Recognition, Keystroke Analysis (Analyzing a User's Typing Pattern), and CODIS (Combined DNA Index System) for Enhanced Identification in Government
- Beyond DNA Matching: Using Forensic DNA Phenotyping to Construct Faces from DNA Samples in Government Investigations
Big Data Dashboard for Quick Access to and Display of Diverse Data for Government Use
- Integration of Existing Application Platforms with Big Data Dashboards for Government Operations
- Big Data Management Strategies for Efficient Data Handling in Government
- Case Study of Big Data Dashboards: Tableau and Pentaho for Government Applications
- Using Big Data Apps to Push Location-Based Services in Government Operations
- Tracking Systems and Management Solutions for Government Use
Day 05
How to Justify Big Data BI Implementation Within a Government Organization
- Defining the ROI (Return on Investment) for Implementing Big Data in Government Agencies
- Case Studies for Saving Analyst Time in Collection and Preparation of Data: Increasing Productivity in Government Operations
- Cost Savings from Lower Database Licensing Costs in Government Projects
- Revenue Gain from Location-Based Services in Government Applications
- Cost Savings from Fraud Prevention in Government Operations
- An Integrated Spreadsheet Approach for Calculating Approximate Expenses vs. Revenue Gain/Savings from Big Data Implementation in Government
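The expense-versus-gain calculation in this module can be approximated with a few lines of Python instead of a spreadsheet. The sketch below assumes a simple one-year ROI formula, (benefits - costs) / costs. Every cost and benefit category and every figure is a hypothetical placeholder, not data from any agency.

```python
# All figures are hypothetical placeholders for illustration only.
annual_costs = {
    "hardware": 200_000,
    "software_and_support": 50_000,
    "staff_and_training": 150_000,
}
annual_benefits = {
    "analyst_time_saved": 250_000,
    "lower_database_licensing": 120_000,
    "fraud_prevented": 130_000,
}


def roi(benefits, costs):
    """Return ROI as a fraction: (total benefit - total cost) / total cost."""
    total_benefit = sum(benefits.values())
    total_cost = sum(costs.values())
    return (total_benefit - total_cost) / total_cost


print(f"ROI: {roi(annual_benefits, annual_costs):.0%}")
```

A fuller model would spread costs and benefits over several years and discount them, which this sketch leaves out.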
Step-by-Step Procedure for Replacing a Legacy Data System with a Big Data System for Government
- Big Data Migration Roadmap for Government Use
- Critical Information Needed Before Architecting a Big Data System for Government Operations
- Different Ways to Calculate Volume, Velocity, Variety, and Veracity of Data in Government Projects
- Estimating Data Growth in Government Environments
- Case Studies of Successful Migrations in Government Agencies
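For the data growth estimate listed in this module, one common starting point is a compound growth projection. The sketch below assumes a constant annual growth rate and a storage replication factor of 3, which is the usual HDFS default. The starting volume and growth rate are made-up values.

```python
def projected_volume_tb(current_tb, annual_growth_rate, years):
    """Project logical data volume assuming constant compound growth."""
    return current_tb * (1 + annual_growth_rate) ** years


def raw_capacity_tb(logical_tb, replication_factor=3):
    """Estimate raw disk needed when each block is stored several times."""
    return logical_tb * replication_factor


# Hypothetical inputs: 100 TB today, growing 40% per year, planned over 3 years.
future_tb = projected_volume_tb(100, 0.40, 3)
print(f"Logical volume after 3 years: {future_tb:.1f} TB")
print(f"Raw capacity with replication: {raw_capacity_tb(future_tb):.1f} TB")
```

Real growth is rarely constant, so an estimate like this is best treated as a first pass to be revised against measured ingest rates.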
Review of Big Data Vendors and Their Products for Government Use
- Accenture
- Aptean (Formerly CDC Software)
- Cisco Systems
- Cloudera
- Dell
- EMC
- GoodData Corporation
- Guavus
- Hitachi Data Systems
- Hortonworks
- HP
- IBM
- Informatica
- Intel
- Jaspersoft
- Microsoft
- MongoDB (Formerly 10Gen)
- Mu Sigma
- NetApp
- Opera Solutions
- Oracle
- Pentaho
- Platfora
- QlikTech
- Quantum
- Rackspace
- Revolution Analytics
- Salesforce
- SAP
- SAS Institute
- Sisense
- Software AG/Terracotta
- Soft10 Automation
- Splunk
- Sqrrl
- Supermicro
- Tableau Software
- Teradata
- Think Big Analytics
- Tidemark Systems
- Treeminer
- VMware (Part of EMC)
Q&A Session for Government Audiences
Requirements
- Understanding of law enforcement processes and data systems for government
- Basic knowledge of SQL/Oracle or relational databases
- Basic understanding of statistics (at the spreadsheet level)
Audience
- Law Enforcement specialists with a technical background