Course Outline

Performance Concepts and Metrics

  • Latency, throughput, power consumption, and resource utilization
  • Identifying bottlenecks at the system and model levels
  • Profiling for inference versus training processes
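
The latency and throughput metrics listed above can be illustrated with a small timing harness. This is a minimal sketch using only the Python standard library; `measure_latency` and `dummy_inference` are hypothetical names introduced for illustration, and the workload is a CPU stand-in rather than a real model. On an actual accelerator, the device would typically need to be synchronized before the timer is stopped, which this sketch does not cover.

```python
import statistics
import time


def measure_latency(run_once, batch_size=1, warmup=3, iters=20):
    """Time a callable and report latency percentiles and throughput."""
    for _ in range(warmup):
        run_once()  # warm-up runs are excluded from the statistics

    samples = []
    for _ in range(iters):
        start = time.perf_counter()
        run_once()
        samples.append(time.perf_counter() - start)

    samples.sort()
    p95_index = min(len(samples) - 1, int(0.95 * len(samples)))
    return {
        "p50_ms": statistics.median(samples) * 1e3,
        "p95_ms": samples[p95_index] * 1e3,
        # items processed per second, assuming each call handles one batch
        "throughput_per_s": batch_size * iters / sum(samples),
    }


def dummy_inference():
    # Stand-in workload (assumption); replace with a real model call.
    return sum(i * i for i in range(10_000))


stats = measure_latency(dummy_inference, batch_size=8)
print(stats)
```

Reporting a tail percentile alongside the median tends to be more informative for inference, where occasional slow requests matter, while training runs are usually judged on sustained throughput.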

Profiling on Huawei Ascend

  • Utilizing CANN Profiler and MindInsight for performance analysis
  • Kernel and operator diagnostics to enhance efficiency
  • Optimizing offload patterns and memory mapping techniques

Profiling on Biren GPU

  • Leveraging Biren SDK features for performance monitoring
  • Implementing kernel fusion, memory alignment, and execution queues
  • Conducting power- and temperature-aware profiling

Profiling on Cambricon MLU

  • Using BANGPy and Neuware tools for performance optimization
  • Gaining kernel-level visibility and interpreting logs effectively
  • Integrating the MLU profiler with deployment frameworks

Graph and Model-Level Optimization

  • Applying graph pruning and quantization strategies to improve efficiency
  • Implementing operator fusion and restructuring computational graphs for better performance
  • Standardizing input sizes and tuning batch sizes to optimize results
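
As one concrete instance of the quantization strategies mentioned above, the following sketch shows symmetric per-tensor int8 quantization in plain Python. It is an illustrative assumption, not the scheme used by any of the vendor toolchains in this course; `quantize_int8` and `dequantize` are hypothetical helper names.

```python
def quantize_int8(values):
    """Map floats to integers in [-127, 127] using a single shared scale."""
    max_abs = max(abs(v) for v in values)
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    quantized = [max(-127, min(127, round(v / scale))) for v in values]
    return quantized, scale


def dequantize(quantized, scale):
    """Recover approximate float values from the integer representation."""
    return [q * scale for q in quantized]


weights = [0.5, -1.0, 0.25, 0.0]  # made-up example values
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)
# Rounding error per value is bounded by half a quantization step.
max_error = max(abs(a - b) for a, b in zip(weights, restored))
print(q, scale, max_error)
```

The single scale keeps the example short; per-channel scales are a common refinement when value ranges differ widely between channels.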

Memory and Kernel Optimization

  • Optimizing memory layout and reuse to enhance performance
  • Efficient buffer management across different chipsets
  • Tailored kernel-level tuning techniques for each platform
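
Memory reuse and alignment can be sketched independently of any chipset. The example below is a host-side illustration only: `align_up` and `BufferPool` are hypothetical names, the 64-byte alignment is an assumed value, and `bytearray` stands in for device memory that a real runtime would manage.

```python
from collections import defaultdict


def align_up(size, alignment=64):
    """Round size up to the next multiple of alignment."""
    return (size + alignment - 1) // alignment * alignment


class BufferPool:
    """Hands out reusable buffers, bucketed by aligned size."""

    def __init__(self, alignment=64):
        self.alignment = alignment
        self._free = defaultdict(list)  # aligned size -> idle buffers
        self.allocations = 0            # count of freshly created buffers

    def acquire(self, size):
        bucket = align_up(size, self.alignment)
        if self._free[bucket]:
            return self._free[bucket].pop()  # reuse an idle buffer
        self.allocations += 1
        return bytearray(bucket)

    def release(self, buf):
        # Buffers are always created at an aligned size, so len() is the bucket.
        self._free[len(buf)].append(buf)


pool = BufferPool()
first = pool.acquire(100)   # rounded up to a 128-byte buffer
pool.release(first)
second = pool.acquire(120)  # falls in the same bucket, so no new allocation
print(len(second), pool.allocations)
```

Bucketing by aligned size trades some wasted bytes for fewer allocations, which is the usual motivation for pooling in inference loops with repeating tensor shapes.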

Cross-Platform Best Practices

  • Achieving performance portability through abstraction strategies
  • Building shared tuning pipelines to support multi-chip environments
  • Example: Tuning an object detection model across Ascend, Biren, and MLU platforms for government applications
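
A shared tuning pipeline can be sketched as a thin abstraction over per-chip backends. Everything below is hypothetical: the `Backend` class, the backend names, and the cost models are made up for illustration and do not reflect measured behavior of Ascend, Biren, or MLU hardware. In practice, `run_batch` would wrap a timed inference call on the target device.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Sequence, Tuple


@dataclass
class Backend:
    name: str
    run_batch: Callable[[int], float]  # batch size -> seconds per batch


def tune_batch_size(backend: Backend, candidates: Sequence[int]) -> Tuple[int, float]:
    """Return the (batch size, throughput) pair with the highest throughput."""
    best = None
    for batch_size in candidates:
        throughput = batch_size / backend.run_batch(batch_size)
        if best is None or throughput > best[1]:
            best = (batch_size, throughput)
    return best


def tune_all(backends: Sequence[Backend], candidates: Sequence[int]) -> Dict[str, Tuple[int, float]]:
    """Run the same sweep on every backend and collect the winners."""
    return {b.name: tune_batch_size(b, candidates) for b in backends}


def linear_cost(batch_size: int) -> float:
    # Made-up model: fixed launch overhead plus per-item cost.
    return 0.010 + 0.001 * batch_size


def cliff_cost(batch_size: int) -> float:
    # Made-up model: steep penalty once the batch exceeds 16 items,
    # mimicking a working set that no longer fits in fast memory.
    penalty = 0.1 * (batch_size - 16) if batch_size > 16 else 0.0
    return 0.005 + 0.002 * batch_size + penalty


backends = [Backend("chip_a", linear_cost), Backend("chip_b", cliff_cost)]
results = tune_all(backends, (1, 8, 16, 32))
print(results)
```

Keeping the sweep logic separate from the backend definition means the same pipeline can be reused when a new chip is added; only a new `run_batch` callable is needed.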

Summary and Next Steps

Requirements

  • Experience with artificial intelligence (AI) model training or deployment pipelines in government settings
  • Understanding of GPU/MLU compute principles and model optimization
  • Basic familiarity with performance profiling tools and metrics

Audience

  • Performance engineers
  • Machine learning infrastructure teams
  • AI system architects

Duration

  • 21 hours
