Course Outline

Overview of Chinese AI GPU Ecosystem

  • Comparison of Huawei Ascend, Biren, and Cambricon MLU
  • CUDA versus the CANN, Biren SDK, and BANGPy programming models
  • Industry trends and vendor ecosystems for government applications

Preparing for Migration

  • Evaluating your existing CUDA codebase
  • Identifying target platforms and SDK versions
  • Installing toolchains and setting up the development environment
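
Evaluating a CUDA codebase usually starts with an inventory of which API families it touches, since each maps to different porting effort on CANN, the Biren SDK, or BANGPy. The sketch below is illustrative only (the pattern set and sample source are not exhaustive): it counts occurrences of common CUDA constructs in a source string.

```python
import re
from collections import Counter

# Illustrative pattern set: common CUDA API families whose usage counts
# give a rough first estimate of porting effort.
CUDA_PATTERNS = {
    "memory":  re.compile(r"\bcuda(Malloc|Free|Memcpy\w*)\b"),
    "kernels": re.compile(r"<<<.+?>>>|\b__global__\b"),
    "streams": re.compile(r"\bcudaStream\w*\b"),
    "libs":    re.compile(r"\b(cublas|cudnn|nccl)\w*\b"),
}

def inventory(source: str) -> Counter:
    """Count occurrences of each CUDA API family in a source string."""
    counts = Counter()
    for family, pattern in CUDA_PATTERNS.items():
        counts[family] += len(pattern.findall(source))
    return counts

# Tiny hand-written sample, standing in for a real source tree.
sample = """
__global__ void add(float *a) { a[threadIdx.x] += 1.0f; }
cudaMalloc(&d_a, n); add<<<1, n>>>(d_a);
cudaMemcpyAsync(d_a, h_a, n, cudaMemcpyHostToDevice, s);
cublasSgemm(...);
"""
print(inventory(sample))
```

In practice the same scan would walk a source tree; library calls (cuBLAS, cuDNN, NCCL) are usually the costliest category, since they need vendor-equivalent operators rather than line-by-line translation.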

Code Translation Techniques

  • Porting CUDA memory access and kernel logic to new platforms
  • Mapping compute grid and thread models for compatibility
  • Exploring automated versus manual translation options
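
A common first step when remapping the compute grid is to express the kernel's work as a flat index space and then re-tile it for the target's decomposition. The sketch below simulates this in plain Python: `cuda_style` mirrors CUDA's `blockIdx.x * blockDim.x + threadIdx.x` indexing, while `retiled` splits the same index space into coarser contiguous chunks, closer to the task-level partitioning that Ascend- or Cambricon-style runtimes typically expect (function names here are illustrative, not SDK APIs).

```python
def cuda_style(n, block_dim, out):
    """Mirror CUDA's block/thread indexing on the CPU."""
    grid_dim = (n + block_dim - 1) // block_dim   # ceil-divide, as in a CUDA launch
    for block_idx in range(grid_dim):
        for thread_idx in range(block_dim):
            i = block_idx * block_dim + thread_idx  # global index
            if i < n:                               # bounds guard, as in CUDA kernels
                out[i] = i * 2

def retiled(n, num_tasks, out):
    """Cover the same index space as coarse contiguous chunks per task."""
    chunk = (n + num_tasks - 1) // num_tasks
    for task_id in range(num_tasks):
        for i in range(task_id * chunk, min((task_id + 1) * chunk, n)):
            out[i] = i * 2

a, b = [0] * 10, [0] * 10
cuda_style(10, 4, a)
retiled(10, 3, b)
assert a == b  # both decompositions cover the same index space
```

The point of the exercise is that once the kernel is written against a flat index space, the choice of tiling becomes a per-platform launch detail rather than a rewrite.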

Platform-Specific Implementations

  • Leveraging Huawei CANN operators and custom kernels
  • Utilizing the Biren SDK conversion pipeline
  • Rebuilding models with BANGPy (Cambricon) for enhanced performance

Cross-Platform Testing and Optimization

  • Profiling execution on each target platform
  • Optimizing memory usage and parallel execution
  • Tracking performance metrics and iterating on improvements
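
Tracking metrics across platforms needs a consistent harness so numbers are comparable between the CUDA baseline and each port. Real runs would lean on the vendor profilers (e.g. CANN's msprof); the sketch below is a minimal stand-in that times any callable with warmup runs and reports median and spread.

```python
import statistics
import time

def benchmark(fn, *args, warmup=2, repeats=5):
    """Return (median_ms, stdev_ms) for fn(*args), after warmup runs."""
    for _ in range(warmup):          # warmup: exclude one-time setup costs
        fn(*args)
    samples = []
    for _ in range(repeats):
        t0 = time.perf_counter()
        fn(*args)
        samples.append((time.perf_counter() - t0) * 1e3)
    return statistics.median(samples), statistics.stdev(samples)

def saxpy(a, x, y):
    """Toy workload standing in for a ported kernel."""
    return [a * xi + yi for xi, yi in zip(x, y)]

med, dev = benchmark(saxpy, 2.0, list(range(10_000)), list(range(10_000)))
print(f"saxpy: {med:.3f} ms (stdev {dev:.3f})")
```

Reporting the median rather than the mean keeps one slow outlier (a JIT warm-up, a page fault) from skewing cross-platform comparisons.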

Managing Mixed GPU Environments

  • Implementing hybrid deployments with multiple architectures
  • Developing fallback strategies and device-detection mechanisms
  • Creating abstraction layers to keep code maintainable
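
In a mixed environment, device detection often reduces to probing vendor runtimes in preference order and falling back to CPU when none is present. The sketch below uses placeholder module names (real deployments would probe the actual vendor plugins, e.g. Huawei's `torch_npu` or Cambricon's `torch_mlu`):

```python
import importlib

# Placeholder backend module names, ordered by preference; "cpu" is the
# guaranteed fallback. Swap in the real vendor runtime modules in practice.
BACKEND_PREFERENCE = ["ascend_backend", "biren_backend", "mlu_backend", "cpu"]

def detect_backend(preference=BACKEND_PREFERENCE):
    """Return the first backend whose runtime module imports cleanly."""
    for name in preference:
        if name == "cpu":
            return name              # CPU fallback is always available
        try:
            importlib.import_module(name)
            return name
        except ImportError:
            continue                 # runtime not installed; try the next one
    raise RuntimeError("no usable backend found")

print(detect_backend())  # falls through to "cpu" unless a vendor runtime is installed
```

An abstraction layer built on top of this would map a stable internal operator API onto whichever backend was detected, keeping application code free of per-vendor branches.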

Case Studies and Best Practices

  • Porting vision and NLP models to Ascend or Cambricon
  • Retrofitting inference pipelines on Biren clusters
  • Addressing version mismatches and API gaps

Summary and Next Steps

Requirements

  • Experience programming with CUDA or other GPU-accelerated applications
  • Understanding of GPU memory models and compute kernels
  • Familiarity with AI model deployment or acceleration workflows

Audience

  • GPU programmers
  • System architects
  • Porting specialists

21 Hours
