Course Outline
AI Sovereignty and LLM Local Deployment for Government
- Risks associated with cloud-based LLMs include data retention, training on user inputs, and exposure to foreign jurisdictions.
- Ollama's architecture features a model server, registry, and an OpenAI-compatible API.
- Comparison of Ollama with other local deployment solutions such as vLLM, llama.cpp, and Text Generation Inference.
- Licensing terms for models like Llama, Mistral, Qwen, and Gemma.
Installation and Hardware Setup for Government
- Installing Ollama on Linux systems with support for CUDA and ROCm.
- CPU-only deployment options with AVX/AVX2 optimizations.
- Docker deployment methods, including persistent volume mapping.
- Strategies for multi-GPU setups and VRAM allocation.
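The Docker deployment point above can be sketched as follows. This is a minimal example, assuming the official ollama/ollama image and the default API port 11434; the volume name is illustrative.

```shell
# Run the official Ollama image with a persistent named volume so that
# downloaded models survive container restarts. The --gpus all flag
# requires the NVIDIA Container Toolkit; omit it on CPU-only hosts,
# where Ollama falls back to its AVX/AVX2 code paths.
docker run -d --name ollama \
  --gpus all \
  -v ollama_models:/root/.ollama \
  -p 11434:11434 \
  --restart unless-stopped \
  ollama/ollama
```

The named volume maps the container's model store (/root/.ollama), so pulled models are reused across upgrades of the container image.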
Model Management for Government
- Pulling models from the Ollama registry with commands such as "ollama pull llama3".
- Importing GGUF models from Hugging Face repositories, such as TheBloke's quantized builds.
- Evaluating quantization levels, including Q4_K_M, Q5_K_M, and Q8_0, to balance performance and resource usage.
- Managing model switching and concurrent loading limits.
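A typical model-management session might look like the sketch below. The quantization tag is illustrative; consult the registry listing for the tags actually published for a given model.

```shell
# Pull a model at a specific quantization level (tag shown is illustrative).
ollama pull llama3:8b-instruct-q4_K_M

# Inspect what is loaded locally, then remove a model no longer needed
# to free disk space.
ollama list
ollama rm llama3:8b-instruct-q4_K_M
```

Lower quantization levels such as Q4_K_M reduce VRAM and disk usage at some cost in output quality; Q8_0 is closer to full precision but roughly doubles the memory footprint relative to 4-bit variants.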
Custom Modelfiles for Government
- Writing custom Modelfile syntax with directives such as FROM, PARAMETER, SYSTEM, and TEMPLATE.
- Tuning parameters like temperature, top_p, and repeat_penalty to optimize model behavior.
- Engineering system prompts to guide role-specific behaviors.
- Creating and publishing custom models to the local registry for government use.
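The Modelfile workflow above can be sketched as follows. The base model, parameter values, and system prompt are illustrative placeholders, not recommendations.

```shell
# Write a minimal Modelfile: a base model, conservative sampling
# parameters, and a role-specific system prompt.
cat > Modelfile <<'EOF'
FROM llama3
PARAMETER temperature 0.2
PARAMETER top_p 0.9
PARAMETER repeat_penalty 1.1
SYSTEM """You are a drafting assistant for internal policy documents.
Answer only from the provided context and flag any uncertainty."""
EOF

# Build the custom model into the local registry, then run it.
ollama create policy-assistant -f Modelfile
ollama run policy-assistant
```

The same FROM directive also accepts a local GGUF path (e.g. FROM ./model.gguf), which is how externally downloaded models are imported.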
API Integration for Government
- Utilizing the OpenAI-compatible /v1/chat/completions endpoint.
- Configuring streaming responses and JSON mode for efficient data handling.
- Integrating with frameworks like LangChain, LlamaIndex, and custom applications for government operations.
- Implementing authentication and rate limiting using reverse proxies to ensure secure access.
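A request against the OpenAI-compatible endpoint can be sketched with curl as below, assuming a server on the default bind address and a model that has already been pulled; the model name and prompt are placeholders.

```shell
# Non-streaming chat completion against a local Ollama server.
# Set "stream": true to receive incremental tokens instead.
curl -s http://localhost:11434/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "model": "llama3",
        "messages": [
          {"role": "system", "content": "You are a concise assistant."},
          {"role": "user", "content": "Summarise this policy in one sentence."}
        ],
        "stream": false
      }'
```

Because the request shape matches the OpenAI Chat Completions API, frameworks such as LangChain and LlamaIndex can usually be pointed at this endpoint by overriding their base URL setting.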
Performance Optimization for Government
- Managing context window sizes and key-value (KV) cache to enhance performance.
- Optimizing batch inference and parallel request handling for efficiency.
- Allocating CPU threads and ensuring NUMA awareness for optimal resource utilization.
- Monitoring GPU utilization and memory pressure to maintain system stability.
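Several of the knobs above are exposed as environment variables on the Ollama server; a configuration sketch follows. The values shown are illustrative starting points, not tuned recommendations.

```shell
# Concurrency and residency settings read by the server at startup.
export OLLAMA_NUM_PARALLEL=4        # parallel requests per loaded model
export OLLAMA_MAX_LOADED_MODELS=2   # models kept resident simultaneously
export OLLAMA_KEEP_ALIVE=10m        # how long an idle model stays loaded
ollama serve

# Watch GPU utilisation and memory pressure while serving (refresh: 5 s).
nvidia-smi --query-gpu=utilization.gpu,memory.used,memory.total \
  --format=csv -l 5
```

Context window size is set per request (or per Modelfile) via the num_ctx parameter; larger windows grow the KV cache linearly, so it is the first setting to revisit under memory pressure.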
Security and Compliance for Government
- Implementing network isolation for model serving endpoints to enhance security.
- Developing input filtering and output moderation pipelines to ensure content integrity.
- Maintaining audit logs of prompts and completions for accountability and compliance.
- Verifying model provenance and hash values to ensure data authenticity and integrity.
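Hash verification of a downloaded model file can be sketched as below. The file here is a stand-in created for the demonstration; in practice the expected digest is copied from the model provider's published checksums, never computed locally.

```shell
# Stand-in for a downloaded GGUF file (replace with the real artifact).
printf 'demo-model-bytes' > model.gguf

# In practice, paste the provider's published SHA-256 digest here.
expected=$(sha256sum model.gguf | awk '{print $1}')

# Recompute the digest of the local file and compare before importing.
actual=$(sha256sum model.gguf | awk '{print $1}')
if [ "$actual" = "$expected" ]; then
  echo "digest OK: safe to import"
else
  echo "digest MISMATCH: refusing to import" >&2
  exit 1
fi
```

For network isolation, the server's bind address can likewise be restricted (e.g. OLLAMA_HOST=127.0.0.1:11434) so only a local reverse proxy that enforces authentication can reach it.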
Requirements
- Intermediate skills in Linux and container administration.
- General understanding of machine learning principles, including transformer models.
- Proficiency with REST APIs and JSON data formats.
Audience
- AI engineers and developers tasked with replacing cloud LLM APIs.
- Organizations whose data sensitivity requirements prevent the use of cloud-based models.
- Government and defense teams requiring secure, air-gapped language model deployments.
14 Hours