Course Outline

Day 01

Overview of Big Data Business Intelligence for Criminal Intelligence Analysis

  • Case Studies from Law Enforcement - Predictive Policing
  • Big Data Adoption Rates in Law Enforcement Agencies and How Agencies Are Aligning Future Operations Around Big Data Predictive Analytics
  • Emerging Technology Solutions Such as Gunshot Sensors, Surveillance Video, and Social Media
  • Leveraging Big Data to Mitigate Information Overload
  • Integrating Big Data with Legacy Systems
  • Basic Understanding of Enabling Technologies in Predictive Analytics for Government Use
  • Data Integration and Dashboard Visualization for Enhanced Decision-Making
  • Fraud Management Strategies Using Big Data
  • Business Rules and Fraud Detection Techniques
  • Threat Detection and Profiling Methods
  • Cost-Benefit Analysis for Implementing Big Data Solutions in Government Agencies

Introduction to Big Data for Government

  • Main Characteristics of Big Data: Volume, Variety, Velocity, and Veracity
  • MPP (Massively Parallel Processing) Architecture for Efficient Data Processing
  • Data Warehouses: Static Schema, Slowly Evolving Datasets for Stable Operations
  • MPP Databases: Greenplum, Exadata, Teradata, Netezza, Vertica, etc.
  • Hadoop-Based Solutions: Flexible and Scalable Without Strict Dataset Structure Requirements
  • Typical Pattern: Store Data in HDFS (Hadoop Distributed File System), Process It with MapReduce, and Retrieve the Results from HDFS
  • Apache Spark for Real-Time Stream Processing in Government Applications
  • Batch Processing: Suited for Analytical and Non-Interactive Tasks
  • Velocity: Complex Event Processing (CEP) for High-Volume Streaming Data
  • Common Choices for CEP Products: InfoSphere Streams, Apama, MarkLogic, etc.
  • Less Production-Ready Options: Storm/S4
  • NoSQL Databases: Columnar and Key-Value Stores Best Suited as Analytical Adjuncts to Data Warehouses/Databases

NoSQL Solutions for Government

  • KV Store - Keyspace, Flare, SchemaFree, RAMCloud, Oracle NoSQL Database (OnDB)
  • KV Store - Dynamo, Voldemort, Dynomite, SubRecord, MotionDb, DovetailDB
  • KV Store (Hierarchical) - GT.M, Caché
  • KV Store (Ordered) - TokyoTyrant, Lightcloud, NMDB, Luxio, MemcacheDB, Actord
  • KV Cache - Memcached, Repcached, Coherence, Infinispan, eXtreme Scale, JBossCache, Velocity, Terracotta
  • Tuple Store - Gigaspaces, Coord, Apache River
  • Object Database - ZopeDB, db4o, Shoal
  • Document Store - CouchDB, Cloudant, Couchbase, MongoDB, Jackrabbit, XML-Databases, ThruDB, CloudKit, Persevere, Riak-Basho, Scalaris
  • Wide Columnar Store - BigTable, HBase, Apache Cassandra, Hypertable, KAI, OpenNeptune, Qbase, KDI

Varieties of Data: Introduction to Data Cleaning Issues in Big Data for Government

  • RDBMS: Static Structure/Schema Does Not Support Agile and Exploratory Environments
  • NoSQL: Semi-Structured, with Enough Flexibility to Store Data Without Defining an Exact Schema First
  • Data Cleaning Challenges in Big Data Projects for Government

Hadoop for Government

  • When to Select Hadoop for Government Applications?
  • STRUCTURED: Enterprise Data Warehouses/Databases Can Store Massive Amounts of Data (at a Cost) but Impose Strict Structure, Not Ideal for Active Exploration
  • SEMI-STRUCTURED Data: Difficult to Manage with Traditional Solutions (Data Warehouses/Databases)
  • Warehousing Data: Significant Effort and Static Nature Even After Implementation
  • HADOOP: Ideal for Variety and Volume of Data, Processed on Commodity Hardware
  • Commodity Hardware Required to Create a Hadoop Cluster for Government Use

Introduction to MapReduce and HDFS for Government

  • MapReduce: Distributed Computing Across Multiple Servers for Efficient Data Processing
  • HDFS: Makes Data Available Locally to the Compute Process, with Redundancy for Reliability
  • Data: Can Be Unstructured or Schema-Less, Unlike RDBMS
  • Developer's Responsibility to Make Sense of the Data for Government Applications
  • Programming MapReduce: Working with Java (Pros and Cons), Manually Loading Data into HDFS for Government Projects
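
The map, shuffle, and reduce steps named in this section can be illustrated with a minimal in-memory sketch. It is written in Python rather than the Java used in the course, runs on a single machine, and does not touch Hadoop or HDFS. The function names and the sample records are assumptions made for illustration only.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(record: str) -> Iterator[Tuple[str, int]]:
    """Emit a (word, 1) pair for every word in one input record."""
    for word in record.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Group emitted values by key, as a framework does between the phases."""
    grouped: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Combine all values collected for one key into a single count."""
    return key, sum(values)


def word_count(records: Iterable[str]) -> Dict[str, int]:
    """Run map, shuffle, and reduce in sequence over the input records."""
    pairs = (pair for record in records for pair in map_phase(record))
    grouped = shuffle(pairs)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    # Hypothetical report snippets, invented for this example.
    reports = ["vehicle theft reported", "theft at market", "vehicle recovered"]
    print(word_count(reports))
```

On a real cluster the map and reduce functions run on many servers at once and the framework performs the shuffle over the network. The sketch only shows how the three steps fit together.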

Day 02

Big Data Ecosystem -- Building Big Data ETL (Extract, Transform, Load) -- Selecting the Right Big Data Tools

  • Hadoop vs. Other NoSQL Solutions for Government Use
  • For Interactive, Random Access to Data: HBase (Column-Oriented Database) on Top of Hadoop
  • Random Access to Data with Restrictions (Max 1 PB): Not Ideal for Ad-Hoc Analytics but Suitable for Logging, Counting, and Time-Series Analysis in Government Operations
  • Sqoop: Importing Data from Databases to Hive or HDFS Using JDBC/ODBC Access for Government Projects
  • Flume: Streaming Data (e.g., Log Data) into HDFS for Real-Time Processing in Government Applications

Big Data Management System for Government

  • Moving Parts, Compute Nodes Start/Fail: ZooKeeper - For Configuration/Coordination/Naming Services in Government Environments
  • Complex Pipeline/Workflow: Oozie - Managing Workflow, Dependencies, and Daisy Chaining for Efficient Operations in Government
  • Deploying, Configuring, Cluster Management, Upgrades, etc. (Sys Admin): Ambari for Streamlined Administration in Government Agencies
  • In the Cloud: Whirr for Flexible Big Data Solutions in Government

Predictive Analytics -- Fundamental Techniques and Machine Learning-Based Business Intelligence for Government

  • Introduction to Machine Learning for Government Applications
  • Learning Classification Techniques for Enhanced Predictive Models in Government
  • Bayesian Prediction: Preparing a Training File for Accurate Forecasts in Government
  • Support Vector Machine (SVM) for Robust Predictive Analysis in Government
  • KNN p-Tree Algebra & Vertical Mining for Efficient Data Processing in Government
  • Neural Networks for Advanced Pattern Recognition in Government Applications
  • Random Forest (RF): Solving the Large Variable Problem in Big Data for Government
  • Multi-Model Ensemble RF: Addressing Automation Challenges in Big Data for Government
  • Automation through Soft10-M for Streamlined Operations in Government
  • Text Analytic Tool - Treeminer for Extracting Insights from Textual Data in Government
  • Agile Learning Methods for Continuous Improvement in Government Analytics
  • Agent-Based Learning: Enhancing Predictive Models with Intelligent Agents in Government
  • Distributed Learning: Scaling Analytics Across Multiple Nodes in Government Environments
  • Introduction to Open-Source Tools for Predictive Analytics: R, Python, Rapidminer, Mahout for Government Use
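
As a small illustration of the classification techniques listed above, the following sketch implements plain k-nearest-neighbour (KNN) voting in Python. It is the textbook form of KNN, not the p-Tree or vertical-mining variant named in the outline, and the function name and training points are assumptions made for illustration.

```python
import math
from collections import Counter
from typing import List, Sequence, Tuple

Point = Sequence[float]


def knn_predict(train: List[Tuple[Point, str]], query: Point, k: int = 3) -> str:
    """Return the majority label among the k training points nearest to query."""
    if not 1 <= k <= len(train):
        raise ValueError("k must be between 1 and the number of training points")
    # Order the training points by Euclidean distance to the query point.
    by_distance = sorted(train, key=lambda item: math.dist(item[0], query))
    nearest_labels = [label for _, label in by_distance[:k]]
    # Majority vote over the k nearest labels.
    return Counter(nearest_labels).most_common(1)[0][0]


if __name__ == "__main__":
    # Hypothetical two-feature training set, invented for this example.
    training = [
        ((0.0, 0.0), "low"), ((0.0, 1.0), "low"), ((1.0, 0.0), "low"),
        ((5.0, 5.0), "high"), ((5.0, 6.0), "high"), ((6.0, 5.0), "high"),
    ]
    print(knn_predict(training, (0.5, 0.5)))
```

Sorting every training point for each query is simple but slow on large datasets, which is one reason the outline covers distributed and index-based alternatives.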

Predictive Analytics Ecosystem and Its Application in Criminal Intelligence Analysis for Government

  • Technology and the Investigative Process in Government Operations
  • Insight Analytics for Informed Decision-Making in Government
  • Visualization Analytics: Enhancing Data Presentation for Government Stakeholders
  • Structured Predictive Analytics: Building Robust Models for Government Use
  • Unstructured Predictive Analytics: Analyzing Unstructured Data for Government Applications
  • Threat/Fraud/Vendor Profiling in Government Operations
  • Recommendation Engine for Personalized Insights in Government
  • Pattern Detection for Early Warning Systems in Government
  • Rule/Scenario Discovery: Identifying Failures, Fraud, and Optimization Opportunities in Government
  • Root Cause Discovery for Effective Problem-Solving in Government
  • Sentiment Analysis for Understanding Public Opinion in Government
  • CRM Analytics: Enhancing Customer Relationship Management in Government
  • Network Analytics: Analyzing Complex Networks in Government Operations
  • Text Analytics for Gaining Insights from Transcripts, Witness Statements, and Internet Chatter in Government Investigations
  • Technology-Assisted Review for Efficient Data Analysis in Government
  • Fraud Analytics: Detecting and Preventing Fraudulent Activities in Government
  • Real-Time Analytics for Immediate Action in Government

Day 03

Real-Time and Scalable Analytics Over Hadoop for Government

  • Why Common Analytic Algorithms Fail in Hadoop/HDFS for Government Applications
  • Apache Hama: Bulk Synchronous Distributed Computing for Efficient Data Processing in Government
  • Apache Spark: Cluster Computing and Real-Time Analytics for Dynamic Government Operations
  • CMU GraphLab: Graph-Based Asynchronous Approach to Distributed Computing for Advanced Government Analytics
  • KNN p-Tree Algebra-Based Approach from Treeminer for Reducing Hardware Costs in Government Operations

Tools for eDiscovery and Forensics in Government

  • eDiscovery Over Big Data vs. Legacy Data: A Comparison of Cost and Performance for Government Use
  • Predictive Coding and Technology-Assisted Review (TAR) for Faster Discovery in Government Investigations
  • Live Demo of vMiner to Demonstrate How TAR Enables Faster Discovery in Government Operations
  • Faster Indexing Through HDFS: Managing the Velocity of Data in Government Projects
  • NLP (Natural Language Processing): Open-Source Products and Techniques for Government Use
  • eDiscovery in Foreign Languages: Technology for Foreign Language Processing in Government Investigations

Big Data BI for Cyber Security -- Achieving a 360-Degree View, Speedy Data Collection, and Threat Identification for Government

  • Understanding the Basics of Security Analytics: Attack Surface, Security Misconfiguration, Host Defenses for Government Use
  • Network Infrastructure/Large Datapipe/Response ETL for Real-Time Analytics in Government Operations
  • Prescriptive vs. Predictive: Fixed Rule-Based vs. Auto-Discovery of Threat Rules from Metadata in Government

Gathering Disparate Data for Criminal Intelligence Analysis in Government

  • Using IoT (Internet of Things) as Sensors for Capturing Data in Government Operations
  • Using Satellite Imagery for Domestic Surveillance in Government Applications
  • Using Surveillance and Image Data for Criminal Identification in Government Investigations
  • Other Data Gathering Technologies: Drones, Body Cameras, GPS Tagging Systems, and Thermal Imaging Technology for Government Use
  • Combining Automated Data Retrieval with Information from Informants, Interrogations, and Research for Comprehensive Analysis in Government
  • Forecasting Criminal Activity to Enhance Public Safety in Government Operations

Day 04

Fraud Prevention BI from Big Data in Fraud Analytics for Government

  • Basic Classification of Fraud Analytics: Rules-Based vs. Predictive Analytics for Government Use
  • Supervised vs. Unsupervised Machine Learning for Fraud Pattern Detection in Government Operations
  • Business-to-Business Fraud, Medical Claims Fraud, Insurance Fraud, Tax Evasion, and Money Laundering in Government Investigations
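
The contrast between rules-based detection and unsupervised detection can be sketched with two short Python functions. The first applies a fixed, hand-set limit. The second flags values that lie far from the mean of the data itself, using a z-score. The function names, the threshold of 2.0, and the sample amounts are assumptions made for illustration, and production fraud models are considerably more involved.

```python
from statistics import mean, pstdev
from typing import List


def rule_based_flags(amounts: List[float], limit: float) -> List[bool]:
    """Fixed business rule: flag any amount above a hand-set limit."""
    return [amount > limit for amount in amounts]


def zscore_flags(amounts: List[float], threshold: float = 2.0) -> List[bool]:
    """Unsupervised check: flag amounts far from the mean of the data itself."""
    mu = mean(amounts)
    sigma = pstdev(amounts)
    if sigma == 0:
        # All amounts are identical, so nothing stands out.
        return [False] * len(amounts)
    return [abs(amount - mu) / sigma > threshold for amount in amounts]


if __name__ == "__main__":
    # Hypothetical claim amounts, invented for this example.
    claims = [100.0, 110.0, 95.0, 105.0, 102.0, 98.0, 900.0]
    print(rule_based_flags(claims, limit=1000.0))
    print(zscore_flags(claims))
```

With these sample values a rule set at 1000 misses the unusual claim of 900, while the z-score check flags it without anyone choosing a limit in advance.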

Social Media Analytics -- Intelligence Gathering and Analysis for Government

  • How Criminals Use Social Media to Organize, Recruit, and Plan Activities
  • Big Data ETL API for Extracting Social Media Data for Government Use
  • Text, Image, Metadata, and Video Analysis for Comprehensive Insights in Government Operations
  • Sentiment Analysis from Social Media Feeds to Understand Public Sentiment in Government
  • Contextual and Non-Contextual Filtering of Social Media Feeds for Accurate Information in Government
  • Social Media Dashboard to Integrate Diverse Social Media Sources for Government Use
  • Automated Profiling of Social Media Profiles for Enhanced Intelligence in Government Investigations
  • Live Demo of Each Analytic Tool Using Treeminer for Government Applications

Big Data Analytics in Image Processing and Video Feeds for Government

  • Image Storage Techniques in Big Data: Solutions for Data Exceeding Petabytes in Government Operations
  • LTFS (Linear Tape File System) and LTO (Linear Tape-Open) for Efficient Data Management in Government
  • GPFS-LTFS (General Parallel File System - Linear Tape File System): Layered Storage Solution for Big Image Data in Government Projects
  • Fundamentals of Image Analytics for Government Use
  • Object Recognition: Identifying Objects in Images for Government Applications
  • Image Segmentation: Dividing Images into Meaningful Parts for Government Analysis
  • Motion Tracking: Monitoring Movement in Video Feeds for Government Operations
  • 3-D Image Reconstruction: Building 3D Models from Image Data for Government Use

Biometrics, DNA, and Next-Generation Identification Programs for Government

  • Beyond Fingerprinting and Facial Recognition: Advanced Biometric Techniques for Government Use
  • Speech Recognition, Keystroke Analysis (Analyzing a User's Typing Pattern), and CODIS (Combined DNA Index System) for Enhanced Identification in Government
  • Beyond DNA Matching: Using Forensic DNA Phenotyping to Construct Faces from DNA Samples in Government Investigations

Big Data Dashboard for Quick Accessibility of Diverse Data and Display for Government Use

  • Integration of Existing Application Platforms with Big Data Dashboards for Government Operations
  • Big Data Management Strategies for Efficient Data Handling in Government
  • Case Study of Big Data Dashboards: Tableau and Pentaho for Government Applications
  • Using Big Data Apps to Push Location-Based Services in Government Operations
  • Tracking Systems and Management Solutions for Government Use

Day 05

How to Justify Big Data BI Implementation Within a Government Organization

  • Defining the ROI (Return on Investment) for Implementing Big Data in Government Agencies
  • Case Studies for Saving Analyst Time in Collection and Preparation of Data: Increasing Productivity in Government Operations
  • Cost Savings from Lower Database Licensing Costs in Government Projects
  • Revenue Gain from Location-Based Services in Government Applications
  • Cost Savings from Fraud Prevention in Government Operations
  • An Integrated Spreadsheet Approach for Calculating Approximate Expenses vs. Revenue Gain/Savings from Big Data Implementation in Government
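
The spreadsheet comparison of expenses against gains and savings comes down to a simple ratio, sketched below in Python. The formula used is the common definition ROI = (total benefits - total costs) / total costs. The function name, the category names, and every figure are assumptions made for illustration and are not taken from the course material.

```python
from typing import Dict


def simple_roi(costs: Dict[str, float], benefits: Dict[str, float]) -> float:
    """Return ROI as (total benefits - total costs) / total costs."""
    total_cost = sum(costs.values())
    total_benefit = sum(benefits.values())
    if total_cost <= 0:
        raise ValueError("total cost must be positive")
    return (total_benefit - total_cost) / total_cost


if __name__ == "__main__":
    # Hypothetical annual figures, invented for this example.
    costs = {"hardware": 200_000.0, "staff_and_training": 300_000.0}
    benefits = {
        "analyst_time_saved": 400_000.0,
        "database_licensing_savings": 150_000.0,
        "fraud_prevented": 200_000.0,
    }
    print(f"ROI: {simple_roi(costs, benefits):.0%}")
```

Keeping costs and benefits as named line items mirrors the rows of a spreadsheet, so each estimate can be reviewed and adjusted on its own.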

Step-by-Step Procedure for Replacing a Legacy Data System with a Big Data System for Government

  • Big Data Migration Roadmap for Government Use
  • Critical Information Needed Before Architecting a Big Data System for Government Operations
  • Different Ways to Calculate Volume, Velocity, Variety, and Veracity of Data in Government Projects
  • Estimating Data Growth in Government Environments
  • Case Studies of Successful Migrations in Government Agencies
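
One simple way to estimate data growth is to assume a constant compound annual growth rate, so that the volume after n years is the current volume multiplied by (1 + rate) raised to the power n. The Python sketch below applies that formula. The constant-rate assumption, the function name, and the sample figures are illustrative only, and real growth is rarely this regular.

```python
def project_volume(current_tb: float, annual_growth: float, years: int) -> float:
    """Project stored volume assuming a constant compound annual growth rate."""
    if years < 0:
        raise ValueError("years must be non-negative")
    return current_tb * (1.0 + annual_growth) ** years


if __name__ == "__main__":
    # Hypothetical starting point: 100 TB growing 50% per year.
    for year in range(4):
        print(year, project_volume(100.0, 0.5, year))
```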

Review of Big Data Vendors and Their Products for Government Use

  • Accenture
  • Aptean (Formerly CDC Software)
  • Cisco Systems
  • Cloudera
  • Dell
  • EMC
  • GoodData Corporation
  • Guavus
  • Hitachi Data Systems
  • Hortonworks
  • HP
  • IBM
  • Informatica
  • Intel
  • Jaspersoft
  • Microsoft
  • MongoDB (Formerly 10Gen)
  • Mu Sigma
  • NetApp
  • Opera Solutions
  • Oracle
  • Pentaho
  • Platfora
  • QlikTech
  • Quantum
  • Rackspace
  • Revolution Analytics
  • Salesforce
  • SAP
  • SAS Institute
  • Sisense
  • Software AG/Terracotta
  • Soft10 Automation
  • Splunk
  • Sqrrl
  • Supermicro
  • Tableau Software
  • Teradata
  • Think Big Analytics
  • Tidemark Systems
  • Treeminer
  • VMware (Part of EMC)

Q&A Session for Government Audiences

Requirements

  • Understanding of law enforcement procedures and data systems for government
  • Fundamental knowledge of SQL/Oracle or relational databases
  • Basic proficiency in statistical analysis (at the spreadsheet level)

Audience

  • Law enforcement professionals with a technical background

Duration

  • 35 Hours
