Hyper-Q

11 Feb 2020 Tags:

Hyper-Q

supporting from Fermi GPU

Hyper-Q enables multiple CPU threads or processes to launch work on a single GPU simultaneouly, thereby dramaticallly increasing GPU utilization and slashing CPU idle times. This feature increases the total number of “connections” between the host and GPU by allowing 32 simultaneous, hardware-managed connections, compared to the single connection available with GPUs without Hyper-Q.

Hyper-Q is a flexible solution that allows connections for both CUDA streams and Message Passing Interface (MPI) processes, or even threads from within a processs. Hyper-Q could handle false dependencies.

False Dependencies

pre-Fermi architecture에서는 CPU thread가 CUDA stream에 작업 던졌을 때, 작업은 ‘Worker Distributor(WB)’라는 단일 파이프라인로 모이게 된다. 이후 WB는 dependencies가 충족되는지 확인 후 FCFS으로 사용가능한 SM에 작업을 시킨다.

for ( int i=0 ; i < 3 ; i++ )  {
    kernel_A<<< block, thread, smem, streams [i]>>> ();
    kernel_B<<< block, thread, smem, streams [i]>>> ();
    kernel_C<<< block, thread, smem, streams [i]>>> ()
}

stream [0] : a0->b0->c
stream [1] : a1->b1->c1
stream [2] : a2->b2->c2;

예시를 보면 streams[i]에 A, B,C 가 묶여서 launch되면 streaams[i]내부에는 dependencies가 생겼으나 streams의 각 item끼리는 서로 dependency가 없다. 하지만 pre-Fermi architecutre의 큐는 단일 큐이기 때문에 concurrent하게 작업을 할 수 있음에도 serial하게 작업을 하게 되어 false dependency가 생겼다.

From Kepler

Grid Management Unit이 Kepler architecture 이후 지원되면서 여러 hw work queue를 둬 각 stream은 개별 파이프라인에 삽입된다.

reference

NVIDA Hyper-Q

lynring24 blog

Hyper-Q

Hyper-Q

False Dependencies

From Kepler

reference

Related Posts

ETC, A Framework for Memory Oversubscription Management in Graphics Processing Units 30 Aug 2019

Improving GPGPU Concurrency with Elastic Kernels 03 Jul 2019

DCAPS:Dynamic Cache Allocation with Partial Sharing 07 Jun 2019