JOYCE Core Algorithm

Cost Model

Due to previous model in [[JOYCE Project]] and [[JOYCE Paper Draft]].

We have some basic operators on statevector:

  1. $b_{CG}$, statevec swap (swap data for GPU device), depends on bandwidth between CPU - GPU
  2. $b_{GG}$, statevec comm (read data frem another device), depends on peer-to-peer bandwidth GPU - GPU
  3. $b_{CS}$, statevec offload, CPU - SSD
  4. $t_e$, operator execution, depends on FLOPS for GPU

F: return value -> time cost for a gate execution
x_0: bit mapping (qubit —> bit index in state vector)
x_1: circuit info (matrix + target bit)
x_2: cluster communication topology, piperadius

In user API point of view, we have:

  1. Execute gate
  2. Bit swap

Bit position

|<-SSD(S)->|<-DRAM(D)->|<-Multi GPU(M)->|<-GPU Local(L)->|

Execute gate

target bit on GPU Local Range

rounds = num_blocks / num_GPU
Exec_Cost(GPU Local) = rounds * (t_e + state_on_gpu / b_CG)

target bit on GPU multiple Range

rounds = num_blocks / num_GPU
Exec_Cost(GPU Comm) = rounds (max(t_e, state_on_gpu / b_GG) + constant) ==(if pipeline)==
Exec_Cost(GPU Comm) = rounds
(t_e + state_on_gpu / b_GG) ==(if no pipeline)==

target bit on Dram —> target bit on GPU multiple range

rounds = num_blocks / num_GPU
Exec_Cost(DRAM) = Exec_Cost(GPU Comm) + state_on_gpu / b_CG

Bit swap

L, L

rounds = num_blocks / num_GPU
Swap_Cost(L, L) = rounds * t_e

—> Discussion on a table:

9 conditions

Our Target:
Bit swap can change locality, change bits on (S), D, M and L
target function: total cost reduce to lowest


JOYCE Core Algorithm
http://blog.chivier.site/2023-11-29/d063a18bfbde/
Author
Chivier Humber
Posted on
November 29, 2023
Licensed under