Course Outline
Day 01
Overview of Big Data Business Intelligence for Criminal Intelligence Analysis
- Case Studies from Law Enforcement - Predictive Policing
- Adoption Rate of Big Data in Law Enforcement Agencies and Their Alignment with Future Operations Using Big Data Predictive Analytics
- Emerging Technology Solutions Such as Gunshot Sensors, Surveillance Video, and Social Media
- Leveraging Big Data to Mitigate Information Overload
- Integrating Big Data with Legacy Systems
- Basic Understanding of Enabling Technologies in Predictive Analytics for Government Use
- Data Integration and Dashboard Visualization for Enhanced Decision-Making
- Fraud Management Strategies Using Big Data
- Business Rules and Fraud Detection Techniques
- Threat Detection and Profiling Methods
- Cost-Benefit Analysis for Implementing Big Data Solutions in Government Agencies
Introduction to Big Data for Government
- Main Characteristics of Big Data: Volume, Variety, Velocity, and Veracity
- MPP (Massively Parallel Processing) Architecture for Efficient Data Processing
- Data Warehouses: Static Schema, Slowly Evolving Datasets for Stable Operations
- MPP Databases: Greenplum, Exadata, Teradata, Netezza, Vertica, etc.
- Hadoop-Based Solutions: Flexible and Scalable Without Strict Dataset Structure Requirements
- Typical Pattern: HDFS (Hadoop Distributed File System), MapReduce for Data Processing, and Retrieval from HDFS
- Apache Spark for Real-Time Stream Processing in Government Applications
- Batch Processing: Suited for Analytical and Non-Interactive Tasks
- Velocity: Complex Event Processing (CEP) for Streaming Data
- Common Choices for CEP Products: InfoSphere Streams, Apama, MarkLogic, etc.
- Less Production-Ready Options: Storm/S4
- NoSQL Databases: Columnar and Key-Value Stores Best Suited as Analytical Adjuncts to Data Warehouses/Databases
NoSQL Solutions for Government
- KV Store - Keyspace, Flare, SchemaFree, RAMCloud, Oracle NoSQL Database (OnDB)
- KV Store - Dynamo, Voldemort, Dynomite, SubRecord, Mo8onDb, DovetailDB
- KV Store (Hierarchical) - GT.M, Caché
- KV Store (Ordered) - TokyoTyrant, Lightcloud, NMDB, Luxio, MemcacheDB, Actord
- KV Cache - Memcached, Repcached, Coherence, Infinispan, eXtremeScale, JBossCache, Velocity, Terracotta
- Tuple Store - Gigaspaces, Coord, Apache River
- Object Database - ZopeDB, db4o, Shoal
- Document Store - CouchDB, Cloudant, Couchbase, MongoDB, Jackrabbit, XML-Databases, ThruDB, CloudKit, Persevere, Riak-Basho, Scalaris
- Wide Columnar Store - BigTable, HBase, Apache Cassandra, Hypertable, KAI, OpenNeptune, Qbase, KDI
Varieties of Data: Introduction to Data Cleaning Issues in Big Data for Government
- RDBMS: Static Structure/Schema Does Not Support Agile and Exploratory Environments
- NoSQL: Semi-Structured Data, Flexible Enough to Store Records Without Defining an Exact Schema First
- Data Cleaning Challenges in Big Data Projects for Government
Hadoop for Government
- When to Select Hadoop for Government Applications?
- STRUCTURED: Enterprise Data Warehouses/Databases Can Store Massive Amounts of Data (at a Cost) but Impose Strict Structure, Not Ideal for Active Exploration
- SEMI-STRUCTURED Data: Difficult to Manage with Traditional Solutions (Data Warehouses/Databases)
- Warehousing Data: Significant Effort and Static Nature Even After Implementation
- HADOOP: Ideal for Variety and Volume of Data, Processed on Commodity Hardware
- Commodity Hardware Required to Create a Hadoop Cluster for Government Use
Introduction to MapReduce and HDFS for Government
- MapReduce: Distributed Computing Across Multiple Servers for Efficient Data Processing
- HDFS: Ensures Local Availability of Data for Computing Processes with Redundancy for Reliability
- Data: Can Be Unstructured or Schema-Less, Unlike RDBMS
- Developer's Responsibility to Make Sense of the Data for Government Applications
- Programming MapReduce: Working with Java (Pros and Cons), Manually Loading Data into HDFS for Government Projects
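The MapReduce model introduced in this section can be illustrated with a minimal single-process sketch. It is not Hadoop code: the function names (`map_phase`, `shuffle_phase`, `reduce_phase`, `run_job`) and the sample records are assumptions made up for illustration, and a real job would read its input from HDFS and run the phases across many servers.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(record: str) -> Iterator[Tuple[str, int]]:
    """Map: emit a (word, 1) pair for every word in one input record."""
    for word in record.lower().split():
        yield word, 1


def shuffle_phase(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle: group emitted values by key, as a framework does between map and reduce."""
    grouped: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Reduce: collapse all values for one key into a single count."""
    return key, sum(values)


def run_job(records: Iterable[str]) -> Dict[str, int]:
    """Run map, shuffle, and reduce in sequence on an in-memory list of records."""
    pairs = (pair for record in records for pair in map_phase(record))
    grouped = shuffle_phase(pairs)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


# Hypothetical incident log lines, invented for this example.
SAMPLE_RECORDS = [
    "burglary reported on main street",
    "vehicle theft reported on oak street",
    "burglary reported on oak street",
]

if __name__ == "__main__":
    print(run_job(SAMPLE_RECORDS))
```

Because each record is mapped independently and each key is reduced independently, both phases can be spread across servers, which is the property the section refers to as distributed computing.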
Day 02
Big Data Ecosystem -- Building Big Data ETL (Extract, Transform, Load) -- Selecting the Right Big Data Tools
- Hadoop vs. Other NoSQL Solutions for Government Use
- For Interactive, Random Access to Data: HBase (Column-Oriented Database) on Top of Hadoop
- Random Access to Data with Restrictions (Max 1 PB): Not Ideal for Ad-Hoc Analytics but Suitable for Logging, Counting, and Time-Series Analysis in Government Operations
- Sqoop: Importing Data from Databases to Hive or HDFS Using JDBC/ODBC Access for Government Projects
- Flume: Streaming Data (e.g., Log Data) into HDFS for Real-Time Processing in Government Applications
Big Data Management System for Government
- Moving Parts, Compute Nodes Start/Fail: ZooKeeper - For Configuration/Coordination/Naming Services in Government Environments
- Complex Pipeline/Workflow: Oozie - Managing Workflow, Dependencies, and Daisy Chaining for Efficient Operations in Government
- Deploying, Configuring, Cluster Management, Upgrades, etc. (Sys Admin): Ambari for Streamlined Administration in Government Agencies
- In the Cloud: Whirr for Flexible Big Data Solutions in Government
Predictive Analytics -- Fundamental Techniques and Machine Learning-Based Business Intelligence for Government
- Introduction to Machine Learning for Government Applications
- Learning Classification Techniques for Enhanced Predictive Models in Government
- Bayesian Prediction: Preparing a Training File for Accurate Forecasts in Government
- Support Vector Machine (SVM) for Robust Predictive Analysis in Government
- KNN p-Tree Algebra & Vertical Mining for Efficient Data Processing in Government
- Neural Networks for Advanced Pattern Recognition in Government Applications
- Random Forest (RF): Solving the Large Variable Problem in Big Data for Government
- Multi-Model Ensemble RF: Addressing Automation Challenges in Big Data for Government
- Automation through Soft10-M for Streamlined Operations in Government
- Text Analytic Tool - Treeminer for Extracting Insights from Textual Data in Government
- Agile Learning Methods for Continuous Improvement in Government Analytics
- Agent-Based Learning: Enhancing Predictive Models with Intelligent Agents in Government
- Distributed Learning: Scaling Analytics Across Multiple Nodes in Government Environments
- Introduction to Open-Source Tools for Predictive Analytics: R, Python, RapidMiner, Mahout for Government Use
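As a concrete instance of the classification techniques listed above, here is a minimal sketch of plain k-nearest-neighbors (KNN) in standard-library Python. It is the textbook algorithm, not the p-Tree algebra variant named in the outline, and the training points, the labels "low" and "high", and the function names are assumptions invented for illustration.

```python
import math
from collections import Counter
from typing import List, Sequence, Tuple

Sample = Tuple[Sequence[float], str]  # (feature vector, class label)


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Straight-line distance between two feature vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_predict(training: List[Sample], query: Sequence[float], k: int = 3) -> str:
    """Return the majority label among the k training samples closest to query."""
    if not 1 <= k <= len(training):
        raise ValueError("k must be between 1 and the number of training samples")
    nearest = sorted(training, key=lambda sample: euclidean(sample[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical two-feature training set, invented for this example.
TRAINING: List[Sample] = [
    ((1.0, 1.0), "low"),
    ((1.5, 2.0), "low"),
    ((2.0, 1.5), "low"),
    ((8.0, 8.0), "high"),
    ((8.5, 9.0), "high"),
    ((9.0, 8.5), "high"),
]

if __name__ == "__main__":
    print(knn_predict(TRAINING, (1.2, 1.4)))
    print(knn_predict(TRAINING, (8.7, 8.2)))
```

This version sorts every training sample for each query, which is fine for a classroom example but scales poorly; that cost is one motivation for the indexed and distributed approaches covered elsewhere in the course.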
Predictive Analytics Ecosystem and Its Application in Criminal Intelligence Analysis for Government
- Technology and the Investigative Process in Government Operations
- Insight Analytics for Informed Decision-Making in Government
- Visualization Analytics: Enhancing Data Presentation for Government Stakeholders
- Structured Predictive Analytics: Building Robust Models for Government Use
- Unstructured Predictive Analytics: Analyzing Unstructured Data for Government Applications
- Threat/Fraud/Vendor Profiling in Government Operations
- Recommendation Engine for Personalized Insights in Government
- Pattern Detection for Early Warning Systems in Government
- Rule/Scenario Discovery: Identifying Failures, Fraud, and Optimization Opportunities in Government
- Root Cause Discovery for Effective Problem-Solving in Government
- Sentiment Analysis for Understanding Public Opinion in Government
- CRM Analytics: Enhancing Customer Relationship Management in Government
- Network Analytics: Analyzing Complex Networks in Government Operations
- Text Analytics for Gaining Insights from Transcripts, Witness Statements, and Internet Chatter in Government Investigations
- Technology-Assisted Review for Efficient Data Analysis in Government
- Fraud Analytics: Detecting and Preventing Fraudulent Activities in Government
- Real-Time Analytics for Immediate Action in Government
Day 03
Real-Time and Scalable Analytics Over Hadoop for Government
- Why Common Analytic Algorithms Fail in Hadoop/HDFS for Government Applications
- Apache Hama: Bulk Synchronous Distributed Computing for Efficient Data Processing in Government
- Apache Spark: Cluster Computing and Real-Time Analytics for Dynamic Government Operations
- CMU GraphLab: Graph-Based Asynchronous Approach to Distributed Computing for Advanced Government Analytics
- KNN p-Tree: Algebra-Based Approach from Treeminer for Reducing Hardware Costs in Government Operations
Tools for eDiscovery and Forensics in Government
- eDiscovery Over Big Data vs. Legacy Data: A Comparison of Cost and Performance for Government Use
- Predictive Coding and Technology-Assisted Review (TAR) for Faster Discovery in Government Investigations
- Live Demo of vMiner to Demonstrate How TAR Enables Faster Discovery in Government Operations
- Faster Indexing Through HDFS: Managing the Velocity of Data in Government Projects
- NLP (Natural Language Processing): Open-Source Products and Techniques for Government Use
- eDiscovery in Foreign Languages: Technology for Foreign Language Processing in Government Investigations
Big Data BI for Cyber Security -- Achieving a 360-Degree View, Speedy Data Collection, and Threat Identification for Government
- Understanding the Basics of Security Analytics: Attack Surface, Security Misconfiguration, Host Defenses for Government Use
- Network Infrastructure/Large Datapipe/Response ETL for Real-Time Analytics in Government Operations
- Prescriptive vs. Predictive: Fixed Rule-Based vs. Auto-Discovery of Threat Rules from Metadata in Government
Gathering Disparate Data for Criminal Intelligence Analysis in Government
- Using IoT (Internet of Things) as Sensors for Capturing Data in Government Operations
- Using Satellite Imagery for Domestic Surveillance in Government Applications
- Using Surveillance and Image Data for Criminal Identification in Government Investigations
- Other Data Gathering Technologies: Drones, Body Cameras, GPS Tagging Systems, and Thermal Imaging Technology for Government Use
- Combining Automated Data Retrieval with Information from Informants, Interrogations, and Research for Comprehensive Analysis in Government
- Forecasting Criminal Activity to Enhance Public Safety in Government Operations
Day 04
Fraud Prevention BI from Big Data in Fraud Analytics for Government
- Basic Classification of Fraud Analytics: Rules-Based vs. Predictive Analytics for Government Use
- Supervised vs. Unsupervised Machine Learning for Fraud Pattern Detection in Government Operations
- Business-to-Business Fraud, Medical Claims Fraud, Insurance Fraud, Tax Evasion, and Money Laundering in Government Investigations
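The contrast between rules-based and unsupervised detection can be sketched as follows. The fixed limit, the z-score threshold, the `amount` field, and the claim values are all assumptions invented for illustration. A z-score test is only one simple unsupervised technique, standing in here for the broader family.

```python
from statistics import mean, stdev
from typing import Dict, List

Claim = Dict[str, float]


def rule_based_flags(claims: List[Claim], limit: float = 10_000.0) -> List[int]:
    """Rules-based: flag the index of every claim above a fixed, hand-written limit."""
    return [i for i, claim in enumerate(claims) if claim["amount"] > limit]


def zscore_flags(claims: List[Claim], threshold: float = 2.0) -> List[int]:
    """Unsupervised: flag claims whose amount is far from the mean of the batch itself."""
    amounts = [claim["amount"] for claim in claims]
    if len(amounts) < 2:
        return []  # a sample standard deviation needs at least two values
    mu, sigma = mean(amounts), stdev(amounts)
    if sigma == 0:
        return []  # all amounts identical, nothing stands out
    return [i for i, a in enumerate(amounts) if abs(a - mu) / sigma > threshold]


# Hypothetical claim amounts, invented for this example.
SAMPLE_CLAIMS: List[Claim] = [
    {"amount": a}
    for a in (120, 135, 110, 150, 125, 140, 130, 115, 145, 9000)
]

if __name__ == "__main__":
    print("rule-based:", rule_based_flags(SAMPLE_CLAIMS))
    print("z-score:", zscore_flags(SAMPLE_CLAIMS))
```

With these sample values the fixed rule flags nothing, because no claim exceeds the limit, while the z-score test flags the last claim as unusual relative to the rest. The trade-off is that a statistical test needs no prior rule but also offers no explanation beyond "this is atypical".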
Social Media Analytics -- Intelligence Gathering and Analysis for Government
- How Criminals Use Social Media to Organize, Recruit, and Plan Activities: Implications for Government Investigations
- Big Data ETL API for Extracting Social Media Data for Government Use
- Text, Image, Metadata, and Video Analysis for Comprehensive Insights in Government Operations
- Sentiment Analysis from Social Media Feeds to Understand Public Sentiment in Government
- Contextual and Non-Contextual Filtering of Social Media Feeds for Accurate Information in Government
- Social Media Dashboard to Integrate Diverse Social Media Sources for Government Use
- Automated Profiling of Social Media Profiles for Enhanced Intelligence in Government Investigations
- Live Demo of Each Analytic Tool Using Treeminer for Government Applications
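The sentiment analysis topic in this section can be illustrated with a minimal lexicon-based sketch. The word lists, function names, and example posts are assumptions invented for illustration. Production tools typically use much larger lexicons or trained models and handle negation, sarcasm, and multiple languages, none of which this sketch attempts.

```python
import re

# Hypothetical, deliberately tiny word lists for illustration only.
POSITIVE = {"safe", "good", "great", "thanks", "helpful"}
NEGATIVE = {"unsafe", "bad", "afraid", "angry", "terrible"}


def sentiment_score(text: str) -> int:
    """Count positive words minus negative words in a lower-cased post."""
    words = re.findall(r"[a-z']+", text.lower())
    positives = sum(word in POSITIVE for word in words)
    negatives = sum(word in NEGATIVE for word in words)
    return positives - negatives


def sentiment_label(text: str) -> str:
    """Map the numeric score to a coarse label."""
    score = sentiment_score(text)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


if __name__ == "__main__":
    for post in (
        "Thanks to the patrol, the park feels safe again",
        "I feel unsafe and angry about the break-ins",
        "The meeting is at noon",
    ):
        print(sentiment_label(post), "-", post)
```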
Big Data Analytics in Image Processing and Video Feeds for Government
- Image Storage Techniques in Big Data: Solutions for Data Exceeding Petabytes in Government Operations
- LTFS (Linear Tape File System) and LTO (Linear Tape Open) for Efficient Data Management in Government
- GPFS-LTFS (General Parallel File System - Linear Tape File System): Layered Storage Solution for Big Image Data in Government Projects
- Fundamentals of Image Analytics for Government Use
- Object Recognition: Identifying Objects in Images for Government Applications
- Image Segmentation: Dividing Images into Meaningful Parts for Government Analysis
- Motion Tracking: Monitoring Movement in Video Feeds for Government Operations
- 3-D Image Reconstruction: Building 3D Models from Image Data for Government Use
Biometrics, DNA, and Next-Generation Identification Programs for Government
- Beyond Fingerprinting and Facial Recognition: Advanced Biometric Techniques for Government Use
- Speech Recognition, Keystroke Analysis (Analyzing a User's Typing Pattern), and CODIS (Combined DNA Index System) for Enhanced Identification in Government
- Beyond DNA Matching: Using Forensic DNA Phenotyping to Construct Faces from DNA Samples in Government Investigations
Big Data Dashboard for Quick Accessibility of Diverse Data and Display for Government Use
- Integration of Existing Application Platforms with Big Data Dashboards for Government Operations
- Big Data Management Strategies for Efficient Data Handling in Government
- Case Study of Big Data Dashboards: Tableau and Pentaho for Government Applications
- Using Big Data Apps to Push Location-Based Services in Government Operations
- Tracking Systems and Management Solutions for Government Use
Day 05
How to Justify Big Data BI Implementation Within a Government Organization
- Defining the ROI (Return on Investment) for Implementing Big Data in Government Agencies
- Case Studies for Saving Analyst Time in Collection and Preparation of Data: Increasing Productivity in Government Operations
- Cost Savings from Lower Database Licensing Costs in Government Projects
- Revenue Gain from Location-Based Services in Government Applications
- Cost Savings from Fraud Prevention in Government Operations
- An Integrated Spreadsheet Approach for Calculating Approximate Expenses vs. Revenue Gain/Savings from Big Data Implementation in Government
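The expenses-versus-savings calculation described above can be sketched in a few lines, using the common definition ROI = (total benefit - total cost) / total cost. The `BusinessCase` class, its field names, and every figure in the example are assumptions invented for illustration; a real business case would also discount future cash flows.

```python
from dataclasses import dataclass


@dataclass
class BusinessCase:
    """Hypothetical yearly figures, all in the same currency unit."""
    implementation_cost: float          # one-time cost
    annual_operating_cost: float        # recurring cost
    annual_analyst_time_savings: float  # value of analyst hours saved
    annual_licensing_savings: float     # lower database licensing costs
    annual_fraud_savings: float         # losses avoided through fraud prevention

    def annual_benefit(self) -> float:
        """Sum of all recurring savings per year."""
        return (
            self.annual_analyst_time_savings
            + self.annual_licensing_savings
            + self.annual_fraud_savings
        )

    def total_cost(self, years: int) -> float:
        """One-time cost plus operating cost over the period."""
        return self.implementation_cost + years * self.annual_operating_cost

    def roi(self, years: int) -> float:
        """Undiscounted ROI over the period: (benefit - cost) / cost."""
        if years <= 0:
            raise ValueError("years must be positive")
        cost = self.total_cost(years)
        return (years * self.annual_benefit() - cost) / cost

    def payback_years(self) -> float:
        """Years until cumulative net savings cover the one-time cost."""
        net_per_year = self.annual_benefit() - self.annual_operating_cost
        if net_per_year <= 0:
            return float("inf")  # the project never pays back
        return self.implementation_cost / net_per_year


if __name__ == "__main__":
    case = BusinessCase(1_000_000, 200_000, 300_000, 150_000, 250_000)
    print(f"3-year ROI: {case.roi(3):.2%}")
    print(f"Payback: {case.payback_years():.1f} years")
```

Keeping each savings category as a separate field mirrors the spreadsheet layout, so individual assumptions can be challenged and adjusted without touching the formula.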
Step-by-Step Procedure for Replacing a Legacy Data System with a Big Data System for Government
- Big Data Migration Roadmap for Government Use
- Critical Information Needed Before Architecting a Big Data System for Government Operations
- Different Ways to Calculate Volume, Velocity, Variety, and Veracity of Data in Government Projects
- Estimating Data Growth in Government Environments
- Case Studies of Successful Migrations in Government Agencies
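For the data growth estimate listed above, one simple model is compound annual growth: volume after n years = current volume × (1 + rate)^n. The sketch below assumes a constant growth rate, which real systems rarely have. The function names and the figures are assumptions invented for illustration.

```python
import math


def projected_volume(current_tb: float, annual_growth_rate: float, years: float) -> float:
    """Projected data volume in TB after a number of years of compound growth."""
    return current_tb * (1 + annual_growth_rate) ** years


def years_until_capacity(current_tb: float, annual_growth_rate: float, capacity_tb: float) -> float:
    """Years until the projected volume reaches a given storage capacity."""
    if capacity_tb <= current_tb:
        return 0.0  # already at or over capacity
    if annual_growth_rate <= 0:
        return math.inf  # no growth, capacity is never reached
    return math.log(capacity_tb / current_tb) / math.log(1 + annual_growth_rate)


if __name__ == "__main__":
    # Hypothetical figures: 100 TB today, growing 50% per year.
    print(projected_volume(100, 0.5, 2), "TB after 2 years")
    print(round(years_until_capacity(100, 0.5, 1000), 1), "years until 1000 TB")
```

Running the projection for several plausible growth rates, rather than one, gives a range that is more useful for sizing a cluster than a single point estimate.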
Review of Big Data Vendors and Their Products for Government Use
- Accenture
- APTEAN (Formerly CDC Software)
- Cisco Systems
- Cloudera
- Dell
- EMC
- GoodData Corporation
- Guavus
- Hitachi Data Systems
- Hortonworks
- HP
- IBM
- Informatica
- Intel
- Jaspersoft
- Microsoft
- MongoDB (Formerly 10Gen)
- Mu Sigma
- NetApp
- Opera Solutions
- Oracle
- Pentaho
- Platfora
- QlikTech
- Quantum
- Rackspace
- Revolution Analytics
- Salesforce
- SAP
- SAS Institute
- Sisense
- Software AG/Terracotta
- Soft10 Automation
- Splunk
- Sqrrl
- Supermicro
- Tableau Software
- Teradata
- Think Big Analytics
- Tidemark Systems
- Treeminer
- VMware (Part of EMC)
Q&A Session for Government Audiences
Requirements
- Understanding of law enforcement procedures and data systems for government
- Fundamental knowledge of SQL/Oracle or relational databases
- Basic proficiency in statistical analysis (at the spreadsheet level)
Audience
- Law enforcement professionals with a technical background