Computing, Volume (106), No (5), Year (2024-2), Pages (1519-1555)

Title : ( Many-BSP: an analytical performance model for CUDA kernels )

Authors: Ali Riahi, Abdorreza Savadi, Mahmoud Naghibzadeh



Abstract

The unknown behavior of GPUs and the differing characteristics of their generations present a serious challenge in analyzing and optimizing programs on these processors. As a result, performance models have been developed to better analyze and describe the behavior of these processors. These models help programmers configure applications and help developers improve the performance of these devices. This paper introduces an analytical model, called Many-BSP, to predict the execution time of a CUDA kernel. The model is highly portable and can easily be used on various devices. The paper discusses the many GPU features and behaviors that affect performance, including multi-threading, coalesced access to global memory, shared memory bank conflicts, dual-issue instructions, the limited number of functional units, parallelism at the instruction, thread, and warp levels, the instruction pipeline, branch divergence, and intra-block and inter-block overlap between communication and computation. The model also employs the tree hierarchy and parameters of the Multi-BSP model to estimate the latency of communication with memory. In Many-BSP, the execution time of a kernel is predicted by static analysis of CUDA and PTX code. The model is evaluated on three devices of different generations using three real-world benchmarks. The results show that the execution time of a CUDA kernel can be predicted with a maximum error of 12.33%.

Keywords

GPU, Analytical model, Performance prediction, CUDA, Microbenchmark

@article{paperid:1098001,
author = {Riahi, Ali and Savadi, Abdorreza and Naghibzadeh, Mahmoud},
title = {Many-BSP: an analytical performance model for CUDA kernels},
journal = {Computing},
year = {2024},
volume = {106},
number = {5},
month = {February},
issn = {0010-485X},
pages = {1519--1555},
numpages = {36},
keywords = {GPU; Analytical model; Performance prediction; CUDA; Microbenchmark},
}


%0 Journal Article
%T Many-BSP: an analytical performance model for CUDA kernels
%A Riahi, Ali
%A Savadi, Abdorreza
%A Naghibzadeh, Mahmoud
%J Computing
%@ 0010-485X
%D 2024
