
Publications

2025

Improve GPGPU Front-End Efficiency via Inter-Warp Instruction Sharing

Yu-Yu Hsiao, Liang-Chou Chen, Chung-Ho Chen

ISCAS 2025

Abstract: General-purpose GPUs (GPGPUs) leverage thousands of threads per core to attain high throughput. Threads executing the same program are dynamically grouped into SIMD batches known as warps, which exhibit high locality in their instruction-cache accesses. Current designs let warps access the instruction cache independently in a time-shared manner, which leads to significant fetch redundancy. To address this issue, we introduce Inter-Warp Instruction Sharing, a lightweight yet effective technique to improve GPGPU front-end efficiency. The mechanism incorporates an improved fetch-request arbitration algorithm, a fetch filter that prevents redundant fetching, and a modified instruction buffer that broadcasts instructions to all active warps. We show that our approach reduces I-Cache accesses by up to 85%, with a geometric mean reduction of 68%, without performance degradation.
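
To make the three ingredients (arbitration, fetch filter, broadcast) concrete, here is a minimal toy model in Python. It is a sketch under stated assumptions, not the published design: the "serve the most-requested PC first" arbitration policy, the small FIFO buffer used as the fetch filter, the `SharedFetchUnit` and `simulate` names, and the straight-line workload are all hypothetical. The access counts it produces come from this toy model only and are not results from the paper.

```python
from collections import Counter, deque


class SharedFetchUnit:
    """Toy model of inter-warp instruction sharing.

    Hypothetical sketch: the arbitration policy, buffer size, and interface
    are illustrative assumptions, not the published design.
    """

    def __init__(self, buffer_entries: int = 4):
        self.icache_accesses = 0
        # Fetch filter state: PCs whose instructions are still held in a
        # small shared buffer (oldest entry evicted when full).
        self.buffer = deque(maxlen=buffer_entries)

    def cycle(self, requests: dict[int, int]) -> list[int]:
        """requests maps warp_id -> PC; returns the warps served this cycle."""
        if not requests:
            return []
        # Assumed arbitration: serve the PC requested by the most warps,
        # breaking ties toward the lower PC.
        counts = Counter(requests.values())
        pc = min(counts, key=lambda p: (-counts[p], p))
        # Filter: touch the I-cache only if the instruction is not buffered.
        if pc not in self.buffer:
            self.icache_accesses += 1
            self.buffer.append(pc)
        # Broadcast the instruction to every warp waiting on this PC.
        return [w for w, p in requests.items() if p == pc]


def simulate(start_pcs: list[int], program_len: int,
             sharing: bool, buffer_entries: int = 4) -> int:
    """Run straight-line code (PCs 0..program_len-1) on one warp per entry
    of start_pcs and return the number of I-cache accesses."""
    pcs = {w: p for w, p in enumerate(start_pcs) if p < program_len}
    if not sharing:
        # Baseline time-sharing: every warp fetches every instruction itself.
        return sum(program_len - p for p in pcs.values())
    unit = SharedFetchUnit(buffer_entries)
    while pcs:
        for w in unit.cycle(pcs):
            pcs[w] += 1
            if pcs[w] == program_len:
                del pcs[w]
    return unit.icache_accesses


if __name__ == "__main__":
    for starts in ([0, 0, 0, 0], [0, 0, 2, 5], [0, 5, 5]):
        base = simulate(starts, 10, sharing=False)
        shared = simulate(starts, 10, sharing=True)
        print(f"{starts}: baseline={base} shared={shared}")
```

In this toy model, four warps starting at the same PC need 40 accesses under time-sharing but 10 with broadcast. With warps starting at PCs 0, 5, and 5, the leading pair runs ahead and the lagging warp later re-requests PCs 5 to 9; a 4-entry buffer has already evicted them (15 accesses), while a 16-entry buffer lets the filter suppress those re-fetches (10 accesses). The buffer size and policy are knobs of the sketch, not parameters from the paper.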


2024

GPGPU Pipeline Visualization for RISC-V SIMT Architecture

Yu-Yu Hsiao, Liang-Chou Chen, Chung-Ho Chen

CARRV 2024

Abstract: The increasing complexity of modern computer architectures demands more effective analysis and visualization techniques for optimizing performance and microarchitecture. One key approach is pipeline visualization, which offers valuable insight into a processor's architectural design. Although numerous pipeline visualization tools have emerged for CPUs, SIMT pipelines have been largely overlooked in existing research. To bridge this gap, this paper proposes a visualization framework specifically for SIMT pipelines. We describe the methodology for generating and visualizing SIMT pipeline traces and present three case studies, covering latency hiding, warp scheduling, and memory coalescing, to demonstrate the effectiveness of the framework. The visualized pipeline traces yield insightful observations that align with the quantitative results, demonstrating the framework's capability for analyzing SIMT processors.
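
As a rough illustration of what "visualizing a SIMT pipeline trace" can mean, the sketch below turns per-warp trace records into a text pipeline diagram, one row per (warp, PC) and one column per cycle. The `TraceEvent` record layout, the one-letter stage tags (F, D, I, X, W), and the `render` function are hypothetical choices for this example; they are not the trace format or tooling of the framework described above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TraceEvent:
    """One hypothetical trace record: warp `warp` occupied `stage` for the
    instruction at `pc` during `cycle`."""
    warp: int
    pc: int
    stage: str  # assumed one-letter tags: F, D, I, X, W
    cycle: int


def render(events: list[TraceEvent]) -> str:
    """Return a text pipeline diagram with one row per (warp, pc) and one
    column per cycle; '.' marks cycles where the instruction is absent."""
    if not events:
        return ""
    first = min(e.cycle for e in events)
    width = max(e.cycle for e in events) - first + 1
    rows: dict[tuple[int, int], list[str]] = {}
    for e in events:
        row = rows.setdefault((e.warp, e.pc), ["."] * width)
        row[e.cycle - first] = e.stage

    def first_cycle(key: tuple[int, int]) -> int:
        return next(i for i, s in enumerate(rows[key]) if s != ".")

    # Order rows by when each instruction enters the pipeline, then by warp.
    ordered = sorted(rows, key=lambda k: (first_cycle(k), k))
    return "\n".join(
        f"w{w} pc={pc:#06x} " + "".join(rows[(w, pc)]) for w, pc in ordered
    )


if __name__ == "__main__":
    # Warp 0's instruction spends three cycles in execute (e.g. a long-latency
    # op) while warp 1's instruction overlaps with it and finishes first.
    trace = [TraceEvent(0, 0, s, c) for c, s in enumerate("FDIXXXW")]
    trace += [TraceEvent(1, 0, s, c + 1) for c, s in enumerate("FDIXW")]
    print(render(trace))
```

Running the demo prints `w0 pc=0x0000 FDIXXXW` followed by `w1 pc=0x0000 .FDIXW.`, which shows warp 1's work overlapping warp 0's stalled execute stage, a small-scale version of the latency-hiding pattern the case studies examine.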