JOYCE Core Algorithm
Cost Model
Based on the previous model in [[JOYCE Project]] and [[JOYCE Paper Draft]].
We have some basic operations on the state vector:
- \(b_{CG}\), state-vector swap (swap data in and out of a GPU device), depends on the CPU - GPU bandwidth
- \(b_{GG}\), state-vector comm (read data from another device), depends on the peer-to-peer GPU - GPU bandwidth
- \(b_{CS}\), state-vector offload, depends on the CPU - SSD bandwidth
- \(t_e\), operator execution, depends on the GPU FLOPS
\[ F(x_0, x_1, x_2) \]
- \(F\): return value --> time cost of one gate execution
- \(x_0\): bit mapping (qubit --> bit index in the state vector)
- \(x_1\): circuit info (matrix + target bit)
- \(x_2\): cluster communication topology, piperadius
From the user API point of view, we have:
- Execute gate
- Bit swap
Bit position
|<-SSD(S)->|<-DRAM(D)->|<-Multi GPU(M)->|<-GPU Local(L)->|
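The layout above can be sketched as a small lookup from a bit index to its region. This is only an illustration: the class `BitLayout`, its field names, and the convention that bit 0 is the rightmost (GPU Local) bit are assumptions made here, not part of the model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BitLayout:
    """Widths (in bits) of each region; all names here are hypothetical."""
    n_local: int      # L: bits addressed inside a single GPU's memory
    n_multi_gpu: int  # M: bits that select data held on another GPU
    n_dram: int       # D: bits that select a block held in host DRAM
    n_ssd: int = 0    # S: bits that select a block offloaded to SSD

    def region(self, bit: int) -> str:
        """Return 'L', 'M', 'D' or 'S' for a bit index (0 = rightmost bit)."""
        if bit < 0:
            raise ValueError(f"bit index must be non-negative, got {bit}")
        upper = 0
        # Walk the regions from the rightmost (L) to the leftmost (S).
        for name, width in (("L", self.n_local), ("M", self.n_multi_gpu),
                            ("D", self.n_dram), ("S", self.n_ssd)):
            upper += width
            if bit < upper:
                return name
        raise ValueError(f"bit index {bit} is outside the {upper}-bit layout")


layout = BitLayout(n_local=4, n_multi_gpu=2, n_dram=3, n_ssd=1)
# Print the regions from the leftmost bit to the rightmost, as in the diagram.
regions = "".join(layout.region(b) for b in reversed(range(10)))
print(regions)
```

With these example widths the printed string lists the S bit first and the four L bits last, matching the left-to-right order of the diagram.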
Execute gate
Target bit in the GPU Local range (L)
rounds = num_blocks / num_GPU
Exec_Cost(GPU Local) = rounds * (t_e + state_on_gpu / b_CG)
Target bit in the Multi GPU range (M)
rounds = num_blocks / num_GPU
Exec_Cost(GPU Comm) = rounds * (max(t_e, state_on_gpu / b_GG) + constant) ==(if pipeline)==
Exec_Cost(GPU Comm) = rounds * (t_e + state_on_gpu / b_GG) ==(if no pipeline)==
Target bit in the DRAM range (D) --> target bit in the Multi GPU range (M)
rounds = num_blocks / num_GPU
Exec_Cost(DRAM) = Exec_Cost(GPU Comm) + state_on_gpu / b_CG
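The three execution-cost cases can be written as plain functions to make them easy to compare. This is a minimal sketch that transcribes the formulas as stated; the function names, the keyword `pipelined`, the default `constant=0.0`, and the sample numbers (arbitrary units) are assumptions for illustration only.

```python
def exec_cost_local(num_blocks, num_gpu, t_e, state_on_gpu, b_cg):
    """Target bit in the GPU Local range: compute plus CPU-GPU swap per round."""
    rounds = num_blocks / num_gpu
    return rounds * (t_e + state_on_gpu / b_cg)


def exec_cost_gpu_comm(num_blocks, num_gpu, t_e, state_on_gpu, b_gg,
                       pipelined=True, constant=0.0):
    """Target bit in the Multi GPU range: compute plus GPU-GPU transfer."""
    rounds = num_blocks / num_gpu
    transfer = state_on_gpu / b_gg
    if pipelined:
        # Compute and transfer overlap, so the slower one dominates each round.
        return rounds * (max(t_e, transfer) + constant)
    # Without pipelining the two phases run back to back.
    return rounds * (t_e + transfer)


def exec_cost_dram(num_blocks, num_gpu, t_e, state_on_gpu, b_gg, b_cg,
                   pipelined=True, constant=0.0):
    """Target bit in DRAM: GPU Comm cost plus one CPU-GPU transfer term."""
    comm = exec_cost_gpu_comm(num_blocks, num_gpu, t_e, state_on_gpu, b_gg,
                              pipelined=pipelined, constant=constant)
    return comm + state_on_gpu / b_cg


# Made-up sample values in arbitrary units, only to exercise the formulas.
params = dict(num_blocks=8, num_gpu=4, t_e=1.0, state_on_gpu=4.0)
local = exec_cost_local(b_cg=2.0, **params)
comm_pipe = exec_cost_gpu_comm(b_gg=8.0, pipelined=True, **params)
comm_seq = exec_cost_gpu_comm(b_gg=8.0, pipelined=False, **params)
dram = exec_cost_dram(b_gg=8.0, b_cg=2.0, **params)
print(local, comm_pipe, comm_seq, dram)
```

Keeping `rounds` as a plain division mirrors the formulas above; whether it should be rounded up when `num_blocks` is not a multiple of `num_GPU` is left open here.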
Bit swap
L, L
rounds = num_blocks / num_GPU
Swap_Cost(L, L) = rounds * t_e
--> Discussion on a table:
9 conditions
Our target: a bit swap can change locality by moving bits between (S), D, M and L.
Target function: minimize the total cost.
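One possible way to write this objective down, assuming the circuit is a sequence of gates \(g_1, \dots, g_n\) and that a (possibly empty) set of bit swaps \(\sigma_i\) may be applied before gate \(g_i\). The symbols \(\sigma_i\), \(x_0^{(i)}\), \(x_1^{(i)}\) and \(\mathrm{SwapCost}\) are notation introduced here for illustration and are not defined elsewhere in this note:

\[ \min_{\sigma_1, \dots, \sigma_n} \; \sum_{i=1}^{n} \left[ \mathrm{SwapCost}(\sigma_i) + F\left(x_0^{(i)}, x_1^{(i)}, x_2\right) \right], \qquad x_0^{(i)} = \sigma_i\left(x_0^{(i-1)}\right) \]

Here \(x_0^{(i)}\) is the bit mapping in effect when gate \(g_i\) executes, \(x_1^{(i)}\) is the circuit info of that gate, and \(\mathrm{SwapCost}(\sigma_i) = 0\) when no swap is applied.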