Introduction

The goal of this document is to present the results of Quantum Fourier Transform (QFT) circuit simulation benchmarking. The main focus is investigation of the relation between the number of input qubits and the time required to complete the corresponding simulation. The influence of other parameters, such as the number of computing nodes used, is also examined.

Methodology

The QFT simulations were set up using two different frameworks – QuEST and Qiskit. QuEST (Quantum Exact Simulation Toolkit) is a high performance simulator written in C and C++. It is well-suited for HPC, as it supports multithreading (OpenMP) and distributed multitasking via message passing (MPI). Qiskit, on the other hand, is a Python library, with core functions implemented in C++. It supports multithreading, and it is possible to leverage it to use parallel MPI tasks. This is done via cache blocking1.

The simulations were run on ARCHER2 computing cluster. The number of threads was fixed throughout simulations to be equal to the number of cores per node – 128. This way, each node was fully utilised2 without the need to pass any local messages, which would negatively impact the performance. On the other hand, the number of nodes used was varied between different powers of 2 (from 1 to 256)3, to examine the scaling speedup. In Qiskit implementation, the number of blocking qubits had to follow the formula: \(N_{block}=N_{qub}-\log_2{N_{proc}}\), and due to memory limitations could not be larger than 26, as otherwise the maximum size of passed messages would exceed 32-bit integers4.

The circuit corresponding to QFT for \(n\) qubits looks as follows:

Source: qiskit.org

It is a non-trivial circuit, which makes it particularly suitable for simulations – it consists of \(n\) Hadamard gates, \(\frac{1}{2}n(n-1)\) \(UROT_k\) gates, and \(\lfloor n/2 \rfloor\) swap gates. \(UROT_k\) represents a controlled phase shift by \(\frac{2\pi}{2^k}\). The overall action of the circuit on vector \(\lvert x \rangle\) can be written as: \[ QFT \lvert x \rangle = \frac{1}{\sqrt{2^n}}\bigotimes_{k=1}^n \left( \lvert 0 \rangle + \exp{\frac{2\pi ix}{2^k} \lvert 1 \rangle} \right) \] If we take \(x = 0\), the action becomes trivial: \[ QFT \lvert 0 \rangle = \frac{1}{\sqrt{2^n}}\bigotimes_{k=1}^n \left( \lvert 0 \rangle + \lvert 1 \rangle \right) = \frac{1}{\sqrt{2^n}}\sum_{i=0}^{2^n-1} \lvert i \rangle \] I.e. every state has an equal weight. Such a result is very easy to verify, validating that the written circuit is working correctly.

Results

The results were written by a SLURM script to an output file in a CSV format with three columns – Nqubits, Ntasks and Time (in ms). In order to save on credits, most jobs were submitted for a fixed number of nodes, as otherwise some of the reserved resources would stay idle. The outputs of each series of computations were combined by hand5.

The final raw data transformed into more readable form are presented below. \(N=2^k\) columns represent the number of nodes, while floating point values represent timings in seconds. Some timings are not available due to memory limits, and these are noted as NA.

QuEST timings:

## # A tibble: 13 × 10
##    Nqubits  `N=1`  `N=2`  `N=4`   `N=8`  `N=16`  `N=32`  `N=64` `N=128` `N=256`
##      <dbl>  <dbl>  <dbl>  <dbl>   <dbl>   <dbl>   <dbl>   <dbl>   <dbl>   <dbl>
##  1      28   6.46   4.42   2.59   0.941   0.587   0.333   0.215   0.133   0.131
##  2      29  13.4    9.05   5.52   3.04    1.14    0.662   0.326   0.210   0.104
##  3      30  28.0   18.7   11.2    6.46    3.69    1.33    0.782   0.381   0.254
##  4      31  58.6   38.7   22.8   13.3     7.70    4.19    1.52    0.896   0.468
##  5      32 125.    80.9   47.4   27.1    15.6     8.81    4.70    1.75    1.02 
##  6      33 264.   169.    97.7   55.7    30.9    17.5     9.71    5.18    2.04 
##  7      34  NA     NA    201.   114.     63.5    35.4    19.6    10.9     5.67 
##  8      35  NA     NA     NA    235.    129.     72.0    39.4    21.9    12.1  
##  9      36  NA     NA     NA     NA     262.    147.     79.3    44.2    24.4  
## 10      37  NA     NA     NA     NA      NA     299.    160.     88.9    48.3  
## 11      38  NA     NA     NA     NA      NA      NA     322.    179.     96.0  
## 12      39  NA     NA     NA     NA      NA      NA      NA     358.    193.   
## 13      40  NA     NA     NA     NA      NA      NA      NA      NA     387.

Qiskit timings:

## # A tibble: 11 × 8
##    Nqubits `N=1`  `N=2`  `N=4`  `N=8` `N=16` `N=32` `N=64`
##      <dbl> <dbl>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>
##  1      28  11.2   9.52   4.85   2.76   1.71   1.32   1.21
##  2      29  23.5  20.0    9.89   5.14   2.98   1.75   1.47
##  3      30  49.8  43.6   21.6   11.1    5.83   3.21   2.31
##  4      31 105.   92.2   45.4   23.2   12.1    6.27   4.18
##  5      32 220.  194.    96.2   50.1   25.8   13.3    7.89
##  6      33 481.  391.   190.    95.9   48.7   25.3   14.5 
##  7      34  NA    NA    412.   208.   107.    54.7   30.4 
##  8      35  NA    NA     NA    442.   222.   112.    61.4 
##  9      36  NA    NA     NA     NA    476.   242.   130.  
## 10      37  NA    NA     NA     NA     NA    483.   255.  
## 11      38  NA    NA     NA     NA     NA     NA    552.

The above runtimes can be plotted directly.