Course Outline

AI Sovereignty and LLM Local Deployment for Government

  • Risks associated with cloud-based LLMs include data retention, training on user inputs, and exposure to foreign jurisdictions.
  • Ollama's architecture features a model server, registry, and an OpenAI-compatible API.
  • Comparison of Ollama with other local deployment solutions such as vLLM, llama.cpp, and Text Generation Inference.
  • Licensing terms for models like Llama, Mistral, Qwen, and Gemma.

Installation and Hardware Setup for Government

  • Installing Ollama on Linux systems with support for CUDA and ROCm.
  • CPU-only deployment options with AVX/AVX2 optimizations.
  • Docker deployment methods, including persistent volume mapping.
  • Strategies for multi-GPU setups and VRAM allocation.
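The Docker deployment above can be sketched as a helper that assembles the `docker run` invocation. The image name (`ollama/ollama`), default port 11434, and the `/root/.ollama` model store match the upstream Docker instructions; the `--gpus all` flag assumes the NVIDIA Container Toolkit is installed, and binding to loopback is a hardening choice for government hosts, not a default.

```python
# Sketch: building a "docker run" command for Ollama with a persistent
# volume so pulled models survive container restarts.
import shlex

def build_ollama_docker_cmd(volume: str = "ollama",
                            port: int = 11434,
                            gpu: bool = True) -> str:
    parts = [
        "docker", "run", "-d",
        "--name", "ollama",
        "-v", f"{volume}:/root/.ollama",   # persistent model store
        "-p", f"127.0.0.1:{port}:11434",   # bind to loopback only
    ]
    if gpu:
        parts += ["--gpus", "all"]         # requires NVIDIA Container Toolkit
    parts.append("ollama/ollama")
    return shlex.join(parts)

cmd = build_ollama_docker_cmd()
print(cmd)
```

Mapping a named volume (rather than relying on the container filesystem) is what makes model pulls persistent across upgrades of the Ollama image.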

Model Management for Government

  • Pulling models from the Ollama registry with commands such as "ollama pull llama3".
  • Importing GGUF models from Hugging Face repositories, such as TheBloke's quantized builds.
  • Evaluating quantization levels, including Q4_K_M, Q5_K_M, and Q8_0, to balance performance and resource usage.
  • Managing model switching and concurrent loading limits.
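Choosing among Q4_K_M, Q5_K_M, and Q8_0 comes down to resident memory. A rough sizing sketch: the bits-per-weight figures below are approximate averages (K-quants mix block sizes internally), and the 10% overhead for metadata and runtime buffers is an assumption, not a format specification.

```python
# Rough sizing: estimate resident size of a quantized model in GiB.
# Bits-per-weight values are approximate averages, not exact format specs.
BITS_PER_WEIGHT = {"Q4_K_M": 4.8, "Q5_K_M": 5.7, "Q8_0": 8.5, "F16": 16.0}

def estimate_gib(params_billions: float, quant: str,
                 overhead: float = 1.10) -> float:
    """Approximate RAM/VRAM footprint in GiB, with ~10% assumed overhead
    for tokenizer/metadata and runtime buffers (excludes the KV cache)."""
    bits = BITS_PER_WEIGHT[quant]
    bytes_total = params_billions * 1e9 * bits / 8
    return round(bytes_total * overhead / 2**30, 1)

for q in ("Q4_K_M", "Q5_K_M", "Q8_0"):
    print(f"7B @ {q}: ~{estimate_gib(7, q)} GiB")
```

Estimates like this help decide whether a given model fits a GPU's VRAM at all, or whether a lower quantization level (or CPU offload) is needed.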

Custom Modelfiles for Government

  • Writing custom Modelfile syntax with directives such as FROM, PARAMETER, SYSTEM, and TEMPLATE.
  • Tuning parameters like temperature, top_p, and repeat_penalty to optimize model behavior.
  • Engineering system prompts to guide role-specific behaviors.
  • Creating and publishing custom models to the local registry for government use.
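The directives above combine in a single Modelfile. A minimal sketch, assuming `llama3` as the base model; the parameter values and system prompt are illustrative, not recommendations:

```
FROM llama3
PARAMETER temperature 0.2
PARAMETER top_p 0.9
PARAMETER repeat_penalty 1.1
SYSTEM """You are a records-office assistant. Answer only from the
provided context and refuse to speculate."""
```

Building and registering it locally is then one command, e.g. `ollama create records-assistant -f Modelfile`, after which the custom model appears in `ollama list`.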

API Integration for Government

  • Utilizing the OpenAI-compatible /v1/chat/completions endpoint.
  • Configuring streaming responses and JSON mode for efficient data handling.
  • Integrating with frameworks like LangChain, LlamaIndex, and custom applications for government operations.
  • Implementing authentication and rate limiting using reverse proxies to ensure secure access.
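A request against the OpenAI-compatible endpoint can be sketched as payload construction. The path and the `model`/`messages`/`stream` fields follow the OpenAI chat-completions schema that Ollama implements; the bearer token is a hypothetical credential that only matters if a reverse proxy in front of Ollama enforces it, since Ollama itself does not authenticate requests.

```python
# Sketch: assembling a request for Ollama's OpenAI-compatible endpoint.
import json

BASE_URL = "http://localhost:11434"   # Ollama's default port
API_TOKEN = "example-token"           # hypothetical, proxy-issued token

def chat_request(model: str, user_prompt: str, stream: bool = False):
    url = f"{BASE_URL}/v1/chat/completions"
    headers = {
        "Authorization": f"Bearer {API_TOKEN}",  # checked by the proxy, not Ollama
        "Content-Type": "application/json",
    }
    body = {
        "model": model,
        "messages": [{"role": "user", "content": user_prompt}],
        "stream": stream,
    }
    return url, headers, json.dumps(body)

url, headers, body = chat_request("llama3", "Summarize this memo.")
print(url)
```

Because the schema matches OpenAI's, frameworks like LangChain and LlamaIndex can usually point their OpenAI client at this base URL instead of the cloud endpoint.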

Performance Optimization for Government

  • Managing context window sizes and key-value (KV) cache to enhance performance.
  • Optimizing batch inference and parallel request handling for efficiency.
  • Allocating CPU threads and ensuring NUMA awareness for optimal resource utilization.
  • Monitoring GPU utilization and memory pressure to maintain system stability.
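The interaction between context window size and KV cache can be made concrete with a back-of-the-envelope estimate: 2 (keys and values) x layers x KV heads x head dimension x context length x bytes per element. The model shape below (32 layers, 8 KV heads via grouped-query attention, head dimension 128, roughly Llama-3-8B) is an assumption for illustration.

```python
# Sketch: estimating per-sequence KV-cache memory so that context-window
# settings fit alongside the model weights in VRAM.
def kv_cache_gib(layers: int, kv_heads: int, head_dim: int,
                 context: int, dtype_bytes: int = 2) -> float:
    # 2 = one tensor each for keys and values; dtype_bytes=2 assumes fp16.
    total = 2 * layers * kv_heads * head_dim * context * dtype_bytes
    return total / 2**30

gib = kv_cache_gib(layers=32, kv_heads=8, head_dim=128, context=8192)
print(f"~{gib:.2f} GiB per sequence at 8k context")
```

The cache grows linearly with context length and with the number of parallel requests, which is why raising either without re-checking VRAM headroom causes memory pressure.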

Security and Compliance for Government

  • Implementing network isolation for model serving endpoints to enhance security.
  • Developing input filtering and output moderation pipelines to ensure content integrity.
  • Maintaining audit logs of prompts and completions for accountability and compliance.
  • Verifying model provenance and hash values to ensure data authenticity and integrity.
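Hash verification can be sketched as a streaming SHA-256 check against a known-good digest. Ollama stores model blobs under filenames derived from their digest (e.g. `sha256-<hex>`), so the on-disk name can be cross-checked too; the file and digest below are stand-ins for illustration.

```python
# Sketch: verifying a model file's SHA-256 digest before loading it.
import hashlib
from pathlib import Path

def sha256_of(path: Path, chunk: int = 1 << 20) -> str:
    h = hashlib.sha256()
    with path.open("rb") as f:
        while data := f.read(chunk):   # stream to avoid loading multi-GB files
            h.update(data)
    return h.hexdigest()

# Demo with a small stand-in file (a real check would target a GGUF blob).
demo = Path("demo.bin")
demo.write_bytes(b"example model bytes")
digest = sha256_of(demo)
print(digest)
demo.unlink()
```

In an air-gapped workflow, the expected digest would be recorded at the point the model is approved for import and re-verified on the serving host before deployment.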

Requirements

  • Intermediate skills in Linux and container administration.
  • General understanding of machine learning principles, including transformer models.
  • Proficiency with REST APIs and JSON data formats.

Audience

  • AI engineers and developers tasked with replacing cloud LLM APIs.
  • Organizations whose data-sensitivity requirements prevent the use of cloud-based models.
  • Government and defense teams requiring secure, air-gapped language model deployments.

Duration: 14 Hours
