Reinforcement learning for semi-autonomous approximate quantum eigensolver

F Albarrán-Arriagada; J C Retamal; E Solano; L Lamata

doi:10.1088/2632-2153/ab43b4

1. Introduction

In the past few years, the symbiosis between quantum mechanics and machine learning into the topic named quantum machine learning (QML) has been a fruitful area [1–4], either applying classical machine learning techniques to quantum tasks such as quantum metrology [5, 6], quantum state estimation [7, 8], and others [9–14]; or using quantum mechanics to enhance machine learning algorithms for classical applications [3, 15–21]. Any machine learning algorithm can be classified into learning from big data and learning from interactions.

For the first group, we have two classes of algorithms. The first is supervised learning, which uses a previously labeled data set, named training data, to infer a labeling criterion that is then used to classify new data; a remarkable example is pattern recognition algorithms [22–24]. The other class is unsupervised learning. In this case, training data is not necessary, and the approach is to group the unlabeled data into different sets, where each set is characterized by the mean value of some property of its constituents. The different groups are constructed to optimize some indicator of the dispersion in each subset with respect to the value that characterizes it, e.g. the standard deviation. An example of these algorithms is the clustering problem [25, 26].

For the second group, we have the reinforcement learning (RL) algorithms [27]. Here, one accessible and manipulable system called the agent (A) interacts with another unknown system called the environment (E). The strategy relies on A improving its performance in a specific task ${ \mathcal Q }(A,E)$ , which depends on the state of the systems A and E. This improvement employs the results of multiple interactions between A and E. The general framework of the RL paradigm is composed of three parts: the policy, the reward function (RF) and the value function (VF). The policy defines the main steps of the algorithm, which we can divide into three. First, the information extraction, which considers the interaction between A and E and how to obtain information from it. Second, the feedback loop, which specifies the channel used to communicate the extracted information to A. Third, the decision process, where we decide the action on A in order to progress towards the aimed-for goal, and then start with the information extraction again. The RF defines the criterion to reward (punish) the actions which improve (worsen) the performance of A with respect to the task ${ \mathcal Q }(A,E)$ at each step. Finally, the VF gives us the global performance of the algorithm, ensuring its convergence. One of the most impressive examples of this paradigm is the recent development of master-level chess, go and shogi players without a database [28, 29]. This class of algorithms mimics the most primitive form of human learning, commonly named trial and error. This suggests that a near-future implementation of quantum artificial intelligence may apply this paradigm to a quantum system to enhance a quantum task as the main way to learn. For this reason, the development of the quantum version of the RL paradigm has played an important role in QML in recent years [3, 30–34].

A crucial task in physics is the characterization of the different interactions among systems. This characterization is helpful to evaluate the risks of our actions and act to minimize them. Therefore, any autonomous artificial intelligence must have this ability.

In quantum mechanics, a physical interaction (observable) is represented by a Hermitian matrix or quantum operator, which is characterized by its eigenvalues and eigenvectors. The calculation of the eigenvectors and eigenvalues of a quantum interaction by a classical computer implies that we need to encode the quantum information into classical bits, which is inconvenient for unknown quantum interactions. Moreover, the implementation of a full quantum eigensolver [35–38] using near-future quantum computers seems impractical due to the number of resources needed [39]. The emergence of hybrid classical-quantum algorithms in the past few years [40–46] opens the door to the development of useful eigensolvers. Nevertheless, these works are mainly focused on the eigenvalues, eigenvectors, and properties of quantum systems such as molecules, while the characterization of a physical interaction has been less studied.

In this article, we propose a hybrid quantum-classical algorithm to calculate an approximation to the eigenvectors of any quantum interaction described by a Hermitian matrix with minimal resources [47]. In our proposal, we use single-shot measurements and classical communication given by a feedback loop, which characterizes an RL protocol. The main goal of this proposal is to obtain a high-fidelity approximation (above 98% for the single-qubit case) without measuring fidelities or expectation values, which drastically reduces the number of iterations of the algorithm and decreases the effect of noise sources, and without human intervention. We also show how to extend the algorithm to the multiqubit and high-dimensional situations. This protocol could be useful to implement semi-autonomous quantum devices with the capability to decide using the characterization of an interaction, which is an essential ingredient for the implementation of artificial quantum intelligence [4] and artificial quantum life [48, 49].

2. Quantum eigensolver protocol

Our proposal is related to recent works on a measurement-based algorithm to adapt one known state to another unknown one [50–52]. Here, we define the general framework of our protocol based on the RL paradigm, and then we explain in detail the single-qubit case, the single-qudit case, and the multiqubit case.

In our protocol, we consider as the agent a manipulable and known quantum system described by the state $| {\phi }_{A,0}\rangle$ , which corresponds to any initialization of a given physical system. The environment is a black box, which produces an unknown interaction inside it. This interaction is characterized by an unknown Hermitian operator ${\hat{{ \mathcal O }}}_{E}$ , which generates a unitary transformation ${\hat{U}}_{E}={{\rm{e}}}^{-{\rm{i}}\tau {\hat{{ \mathcal O }}}_{E}}$ over the quantum system A when it interacts with the system E, where τ is a parameter related to the interaction time with the black box, e.g. a spin particle (agent) traversing a region with a magnetic field (environment) for a time t ∼ τ.

The policy is as follows:

Information extraction: The system A interacts with E changing its state as
$\begin{eqnarray}&&| {\bar{\phi }}_{A,0}\rangle ={\hat{U}}_{E}| {\phi }_{A,0}\rangle .\end{eqnarray} \tag{ 1 }$
Next, we perform a measurement process over $| {\bar{\phi }}_{A,0}\rangle$ in the basis $\{| {\phi }_{A,0}\rangle ,\ldots ,| {\phi }_{A,d-1}\rangle \}$ , where d is the dimension of the Hilbert space of A and $\langle {\phi }_{A,j}| {\phi }_{A,k}\rangle ={\delta }_{j,k}$ .
Feedback loop: The information of the measuring process is communicated to a command center with the ability to perform a unitary transformation ${\hat{{ \mathcal U }}}_{j}$ (quantum gate) over the state of A in order to change the possible results in the next information extraction step.
Decision process: If the outcome of the measurement process is the state $| {\phi }_{A,j}\rangle$ , with $j\ne 0$ , this means that $| {\phi }_{A,0}\rangle$ changes when system A interacts with E, therefore, $| {\phi }_{A,0}\rangle$ cannot be an eigenvector of ${\hat{{ \mathcal O }}}_{E}$ . In this case, we define the unitary transformation ${\hat{{ \mathcal U }}}_{j}$ as
$\begin{eqnarray}&&{\hat{{ \mathcal U }}}_{j}={{\rm{e}}}^{-{\rm{i}}{\varphi }_{y}{\hat{S}}_{y,j}}{{\rm{e}}}^{-{\rm{i}}{\varphi }_{z}{\hat{S}}_{z,j}}{{\rm{e}}}^{-{\rm{i}}{\varphi }_{x}{\hat{S}}_{x,j}},\end{eqnarray} \tag{ 2 }$
where
$\begin{eqnarray}\begin{array}{rcl}{\hat{S}}_{x,j} & = & \displaystyle \frac{1}{2}\left(| {\phi }_{A,0}\rangle \langle {\phi }_{A,j}| +| {\phi }_{A,j}\rangle \langle {\phi }_{A,0}| \right),\\ {\hat{S}}_{y,j} & = & -\displaystyle \frac{{\rm{i}}}{2}\left(| {\phi }_{A,0}\rangle \langle {\phi }_{A,j}| -| {\phi }_{A,j}\rangle \langle {\phi }_{A,0}| \right),\\ {\hat{S}}_{z,j} & = & \displaystyle \frac{1}{2}\left(| {\phi }_{A,0}\rangle \langle {\phi }_{A,0}| -| {\phi }_{A,j}\rangle \langle {\phi }_{A,j}| \right),\end{array}\end{eqnarray} \tag{ 3 }$
and each ${\varphi }_{\alpha }$ ( $\alpha \in \{x,y,z\}$ ) is a random angle in the range $[-w\pi ,w\pi ]$ , with w the searching range given by the RF. We note that ${\hat{{ \mathcal U }}}_{j}$ is a pseudo-random rotation in the subspace spanned by $\{| {\phi }_{A,0}\rangle ,| {\phi }_{A,j}\rangle \}$ . For this outcome we define the new state of A as ${\hat{{ \mathcal U }}}_{j}| {\phi }_{A,0}\rangle$ , and start again with the information extraction step.
If the outcome of the measurement process is $| {\phi }_{A,0}\rangle$ , it means that $| {\phi }_{A,0}\rangle$ could be an eigenvector of ${\hat{{ \mathcal O }}}_{E}$ . We point out that the eigenvectors of an operator remain unchanged, up to a global phase, under the action of a function of this operator. In this case, we apply the identity operator ${\mathbb{I}}$ , that is, we keep the same state $| {\phi }_{A,0}\rangle$ and start again with the information extraction step. Figure 1 shows a scheme of the policy of the algorithm.

**Figure 1.** Diagram of the protocol. The solid green arrows show the flow direction of the state of A. The blue dashed arrows represent the feedback loops, and the red arrow with a dot end marks the states in each step. The state $| {\phi }_{A,0}^{(k)}\rangle$ corresponds to the start point of the kth iteration, and the state $| {\phi }_{A,0}^{(k+1)}\rangle$ corresponds to the end point of the kth iteration, which is also the state at the beginning of the $(k+1)$ th iteration.


For the RF we define the reward rate r < 1 and the punishment rate p > 1. If the outcome of the measurement is $| {\phi }_{A,0}\rangle$ we define $\bar{w}=w\cdot r$ , and $\bar{w}=w\cdot p$ otherwise. Finally, we set $w=\bar{w}$ for the next iteration of the algorithm, which means that when we measure $| {\phi }_{A,0}\rangle$ we reduce the searching range, and we increase it otherwise. The initial value for w is chosen according to the problem.
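The reward function above amounts to a one-line multiplicative update. The following is a minimal sketch of it; the function name `update_search_range`, the default values r = 0.9 and p = 2/0.9, and the measurement record are assumptions chosen only for illustration.

```python
def update_search_range(w, outcome, r=0.9, p=2.0 / 0.9):
    """Shrink the search range on outcome 0 (reward), enlarge it otherwise."""
    if not (0 < r < 1 and p > 1):
        raise ValueError("expected reward rate r < 1 and punishment rate p > 1")
    return w * r if outcome == 0 else w * p


w = 1.0
for outcome in [1, 0, 0, 0]:  # hypothetical record: one punishment, three rewards
    w = update_search_range(w, outcome)
print(w)  # (2 / 0.9) * 0.9**3 = 1.62 up to rounding
```

A single punishment here outweighs three consecutive rewards, which is why the range only collapses after long runs of the outcome $| {\phi }_{A,0}\rangle$ .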

As we can note, the protocol does not need to store the states or the full history of the algorithm; it only needs to store the final operation ${\hat{D}}^{(N)}$ , by classically storing the parameters that characterize this operation.

To ensure the convergence of our algorithm, we define the VF as the value of w. This implies that, when $w\to 0$ , our protocol converges. For a correct choice of r and p we have that $w\to 0$ only if we obtain, in the measurement process of $| {\bar{\phi }}_{A,0}\rangle$ , the outcome $| {\phi }_{A,0}\rangle$ many times in a row. This means that $\langle {\phi }_{A,0}| {\bar{\phi }}_{A,0}\rangle \sim 1$ , therefore $| {\phi }_{A,0}\rangle$ is an approximate eigenvector of ${\hat{{ \mathcal O }}}_{E}$ .
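As a worked illustration of why a long run of $| {\phi }_{A,0}\rangle$ outcomes signals an approximate eigenvector, one can expand the agent state in the unknown eigenbasis of ${\hat{{ \mathcal O }}}_{E}$ . The coefficients $a_j$ , the eigenvalues $\lambda_j$ and the symbol ${ \mathcal P }_{0}$ for the probability of that outcome are notation introduced here for the sketch.

```latex
\begin{eqnarray}
| \phi_{A,0}\rangle &=& \sum_{j=0}^{d-1} a_j\, | v_j\rangle , \qquad \sum_{j}|a_j|^2 = 1,\\
\mathcal{P}_0 &=& \left| \langle \phi_{A,0}| \hat{U}_E | \phi_{A,0}\rangle \right|^2
              = \left| \sum_{j=0}^{d-1} |a_j|^2\, \mathrm{e}^{-\mathrm{i}\lambda_j \tau} \right|^2 ,\\
\mathcal{P}_0 &=& 1 - 4\,|a_0|^2 |a_1|^2 \sin^2\!\left( \frac{(\lambda_0-\lambda_1)\tau}{2} \right)
              \qquad (d = 2).
\end{eqnarray}
```

Under these assumptions ${ \mathcal P }_{0}=1$ when only one $a_j$ is non-zero, i.e. when the state is an eigenvector. For d = 2 the same happens for any state if $({\lambda }_{0}-{\lambda }_{1})\tau$ is a multiple of 2π, so the sketch implicitly assumes that τ avoids such values.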

As this is an iterative protocol, we define the following notation for the remainder of the article: any superscript in parentheses refers to the iteration of the algorithm, e.g. $| {\phi }_{A,0}^{(4)}\rangle$ is the state of A before the interaction with E in the fourth iteration. Similarly, ${\hat{{ \mathcal U }}}_{j}^{(k)}$ is the unitary transformation defined in the decision process for the iteration k. As a special case, the superscript $(1)$ refers to the initial values, e.g. ${w}^{(1)}$ represents the initial searching range.

It is worth mentioning that our algorithm uses one single-shot measurement per loop, which is an advantage with respect to employing an expectation value or the fidelity. The latter require hundreds of measurements for a two-level system, so this proposal is exposed to noise sources for less time. Also, as we use pseudo-random operations ${\hat{D}}^{(k)}$ , the effect of any noise in the gate can be seen as part of the randomness of the protocol.

2.1. Single-qubit case

In the single-qubit case, ${\hat{{ \mathcal O }}}_{E}$ is described by a 2 × 2 Hermitian matrix with eigenvectors $\{| {v}_{0}\rangle ,| {v}_{1}\rangle \}$ and eigenvalues $\{{\lambda }_{0},{\lambda }_{1}\}$ , respectively. As these two eigenvectors are orthonormal, we can write

$\begin{eqnarray}\begin{array}{rcl}| {v}_{0}\rangle & = & \cos \left(\displaystyle \frac{\alpha }{2}\right)| 0\rangle +{{\rm{e}}}^{{\rm{i}}\beta }\sin \left(\displaystyle \frac{\alpha }{2}\right)| 1\rangle ,\\ | {v}_{1}\rangle & = & \sin \left(\displaystyle \frac{\alpha }{2}\right)| 0\rangle -{{\rm{e}}}^{{\rm{i}}\beta }\cos \left(\displaystyle \frac{\alpha }{2}\right)| 1\rangle \end{array}\end{eqnarray} \tag{ 4 }$

where $\alpha \in [0,2\pi ],\beta \in [0,\pi ]$ and

$\begin{eqnarray}&&| 0\rangle =\left(\begin{array}{c}1\\ 0\end{array}\right),\quad | 1\rangle =\left(\begin{array}{c}0\\ 1\end{array}\right).\end{eqnarray} \tag{ 5 }$

We define ${\hat{{ \mathcal O }}}_{E}$ and ${\hat{U}}_{E}$ as

$\begin{eqnarray}\begin{array}{rcl}{\hat{{ \mathcal O }}}_{E} & = & {\lambda }_{0}| {v}_{0}\rangle \langle {v}_{0}| +{\lambda }_{1}| {v}_{1}\rangle \langle {v}_{1}| ,\\ {\hat{U}}_{E} & = & {{\rm{e}}}^{-{\rm{i}}{\lambda }_{0}\tau }| {v}_{0}\rangle \langle {v}_{0}| +{{\rm{e}}}^{-{\rm{i}}{\lambda }_{1}\tau }| {v}_{1}\rangle \langle {v}_{1}| .\end{array}\end{eqnarray} \tag{ 6 }$
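Equations (4) and (6) can be checked numerically with a short sketch such as the one below. It uses only the Python standard library; the function name `black_box_unitary` and the parameter values are hypothetical and chosen only for illustration.

```python
import cmath
import math


def black_box_unitary(alpha, beta, lam0, lam1, tau):
    """Return (U_E, v0, v1) built from equations (4) and (6)."""
    c, s = math.cos(alpha / 2), math.sin(alpha / 2)
    phase = cmath.exp(1j * beta)
    v0 = [complex(c), phase * s]
    v1 = [complex(s), -phase * c]
    e0 = cmath.exp(-1j * lam0 * tau)
    e1 = cmath.exp(-1j * lam1 * tau)
    # Spectral decomposition: U_E = e0 |v0><v0| + e1 |v1><v1|.
    U = [[e0 * v0[i] * v0[j].conjugate() + e1 * v1[i] * v1[j].conjugate()
          for j in range(2)] for i in range(2)]
    return U, v0, v1


# Hypothetical parameter values, not taken from the article.
U_E, v0, v1 = black_box_unitary(alpha=1.0, beta=0.4, lam0=0.5, lam1=-0.5, tau=2.0)

# Acting on v0 should only multiply it by the phase exp(-i * lam0 * tau).
out = [U_E[i][0] * v0[0] + U_E[i][1] * v0[1] for i in range(2)]
expected = [cmath.exp(-1j * 0.5 * 2.0) * x for x in v0]
```

Comparing `out` with `expected` confirms that the black box changes an eigenvector only by a phase, which is the property the single-shot measurement exploits.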

Policy. In this case, we write the state $| {\phi }_{A,0}^{(k)}\rangle$ before the black-box as

$\begin{eqnarray}&&| {\phi }_{A,0}^{(k)}\rangle =\cos \left(\displaystyle \frac{{\theta }^{(k)}}{2}\right)| 0\rangle +{{\rm{e}}}^{{\rm{i}}{\varphi }^{(k)}}\sin \left(\displaystyle \frac{{\theta }^{(k)}}{2}\right)| 1\rangle ,\end{eqnarray} \tag{ 7 }$

and the state $| {\bar{\phi }}_{A,0}^{(k)}\rangle$ after E as

$\begin{eqnarray}\begin{array}{rcl}| {\bar{\phi }}_{A,0}^{(k)}\rangle & = & \cos \left(\displaystyle \frac{{\bar{\theta }}^{(k)}}{2}\right)| 0\rangle +{{\rm{e}}}^{{\rm{i}}{\bar{\varphi }}^{(k)}}\sin \left(\displaystyle \frac{{\bar{\theta }}^{(k)}}{2}\right)| 1\rangle \\ & = & \cos \left(\displaystyle \frac{{{\rm{\Delta }}}_{\theta }^{(k)}}{2}\right)| {\phi }_{A,0}^{(k)}\rangle +{{\rm{e}}}^{{\rm{i}}{{\rm{\Delta }}}_{\varphi }^{(k)}}\sin \left(\displaystyle \frac{{{\rm{\Delta }}}_{\theta }^{(k)}}{2}\right)| {\phi }_{A,1}^{(k)}\rangle \end{array}\end{eqnarray} \tag{ 8 }$

where

$\begin{eqnarray}&&| {\phi }_{A,1}^{(k)}\rangle =\sin \left(\displaystyle \frac{{\theta }^{(k)}}{2}\right)| 0\rangle -{{\rm{e}}}^{{\rm{i}}{\varphi }^{(k)}}\cos \left(\displaystyle \frac{{\theta }^{(k)}}{2}\right)| 1\rangle .\end{eqnarray} \tag{ 9 }$

For the explicit form of ${\bar{\theta }}^{(k)}$ and ${\bar{\varphi }}^{(k)}$ in terms of α, β, τ and the eigenvalues of ${\hat{{ \mathcal O }}}_{E}$ , see appendix A. Moreover, for the explicit form of ${{\rm{\Delta }}}_{\theta }^{(k)}$ and ${{\rm{\Delta }}}_{\varphi }^{(k)}$ , see appendix B. Now, to perform the measurement process over $| {\bar{\phi }}_{A,0}^{(k)}\rangle$ , we apply the basis-rotation matrix

$\begin{eqnarray}&&{\hat{D}}^{(k)\dagger }=| 0\rangle \langle {\phi }_{A,0}^{(k)}| +| 1\rangle \langle {\phi }_{A,1}^{(k)}| ,\end{eqnarray} \tag{ 10 }$

in order to measure in the basis $\{| 0\rangle ,| 1\rangle \}$ for all iterations. After the measurement process, the state of A is $| {m}^{(k)}\rangle$ , where ${m}^{(k)}\in \{0,1\}$ is the outcome of the measurement, with probabilities ${{ \mathcal P }}_{0}^{(k)}={\cos }^{2}({{\rm{\Delta }}}_{\theta }^{(k)}/2)$ and ${{ \mathcal P }}_{1}^{(k)}={\sin }^{2}({{\rm{\Delta }}}_{\theta }^{(k)}/2)$ , respectively. If ${m}^{(k)}=0$ , then we transform the state $| 0\rangle \to | {\phi }_{A,0}^{(k)}\rangle$ using the matrix ${\hat{D}}^{(k)}$ , and start the algorithm again. If ${m}^{(k)}=1$ , we transform the state $| 1\rangle \to | {\phi }_{A,0}^{(k)}\rangle$ using ${\hat{D}}^{(k)}{\sigma }_{x}$ , where ${\sigma }_{x}$ is the Pauli x matrix, and apply the pseudo-random operator ${\hat{{ \mathcal U }}}_{1}^{(k)}$ defined by equation (2). Thus, after the measurement process, we apply over $| {m}^{(k)}\rangle$ the operator ${\hat{G}}_{0}^{(k)}$ defined by

$\begin{eqnarray}&&{\hat{G}}_{0}^{(k)}={\hat{D}}^{(k+1)}\hat{{ \mathcal R }}\end{eqnarray} \tag{ 11 }$

where

$\begin{eqnarray}\begin{array}{rcl}{\hat{D}}^{(k+1)} & = & (1-{m}^{(k)}){\hat{D}}^{(k)}+{m}^{(k)}{\hat{{ \mathcal U }}}_{1}^{(k)}{\hat{D}}^{(k)},\\ \hat{{ \mathcal R }} & = & (1-{m}^{(k)}){\mathbb{I}}+{m}^{(k)}{\sigma }_{x}.\end{array}\end{eqnarray} \tag{ 12 }$

Given that ${\hat{D}}^{(k)}$ transforms $| j\rangle \to | {\phi }_{A,j}^{(k)}\rangle$ ( $| j\rangle \in \{| 0\rangle ,| 1\rangle \}$ ), as follows from equation (10), we can write ${\hat{{ \mathcal U }}}_{1}^{(k)}={\hat{D}}^{(k)}{\hat{u}}_{1}{\hat{D}}^{(k)\dagger }$ , where

$\begin{eqnarray}&&{\hat{u}}_{1}={{\rm{e}}}^{-{\rm{i}}{\varphi }_{y}{\hat{S}}_{y}}{{\rm{e}}}^{-{\rm{i}}{\varphi }_{z}{\hat{S}}_{z}}{{\rm{e}}}^{-{\rm{i}}{\varphi }_{x}{\hat{S}}_{x}},\end{eqnarray} \tag{ 13 }$

with ${\hat{S}}_{j}=(1/2){\sigma }_{j}$ the spin operators, with σ_j the Pauli matrix j. Then, the operator ${\hat{D}}^{(k+1)}$ reads

$\begin{eqnarray}&&{\hat{D}}^{(k+1)}=(1-{m}^{(k)}){\hat{D}}^{(k)}+{m}^{(k)}{\hat{D}}^{(k)}{\hat{u}}_{1}.\end{eqnarray} \tag{ 14 }$

For this case, the RF that defines the value of ${w}^{(k+1)}$ for each step reads

$\begin{eqnarray}&&{w}^{(k+1)}=\left[(1-{m}^{(k)})r+{m}^{(k)}p\right]{w}^{(k)},\end{eqnarray} \tag{ 15 }$

where r and p are the reward rate and punishment rate, respectively, described previously.

When the algorithm converges, we have $| {\phi }_{A,0}^{(N)}\rangle \approx | {\bar{\phi }}_{A,0}^{(N)}\rangle$ , where N is the number of iterations. Moreover, in this case ${\hat{D}}^{(N)}$ is an approximation of the matrix that diagonalizes ${\hat{{ \mathcal O }}}_{E}$ , that is

$\begin{eqnarray}&&{\hat{D}}^{(N)\dagger }{\hat{{ \mathcal O }}}_{E}{\hat{D}}^{(N)}\sim {\lambda }_{0}| 0\rangle \langle 0| +{\lambda }_{1}| 1\rangle \langle 1| .\end{eqnarray} \tag{ 16 }$

In order to explore the complete space we must choose ${w}^{(1)}=1$ , so that the initial random angles cover the full range $[-\pi ,\pi ]$ .
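The single-qubit protocol of this section can be simulated classically in a few lines. The following is a minimal sketch, not the authors' code: it uses only the Python standard library, the helper names (`matmul`, `dagger`, `random_u1`, `run_protocol`, `fidelity_to_sx`) are hypothetical, and the choices ${\hat{{ \mathcal O }}}_{E}={\hat{S}}_{x}$ , τ = π, 400 iterations and the random seed are assumptions made only for the example.

```python
import cmath
import math
import random


def matmul(a, b):
    """Product of two 2x2 complex matrices stored as nested lists."""
    return [[a[i][0] * b[0][j] + a[i][1] * b[1][j] for j in range(2)]
            for i in range(2)]


def dagger(a):
    """Conjugate transpose of a 2x2 matrix."""
    return [[complex(a[j][i]).conjugate() for j in range(2)] for i in range(2)]


def random_u1(w, rng):
    """Pseudo-random rotation of equation (13), angles drawn from [-w*pi, w*pi]."""
    px, py, pz = (rng.uniform(-w * math.pi, w * math.pi) for _ in range(3))
    cx, sx = math.cos(px / 2), math.sin(px / 2)
    cy, sy = math.cos(py / 2), math.sin(py / 2)
    rx = [[cx, -1j * sx], [-1j * sx, cx]]                         # exp(-i px S_x)
    ry = [[cy, -sy], [sy, cy]]                                    # exp(-i py S_y)
    rz = [[cmath.exp(-0.5j * pz), 0], [0, cmath.exp(0.5j * pz)]]  # exp(-i pz S_z)
    return matmul(ry, matmul(rz, rx))


def run_protocol(U_E, r=0.9, nu=2.0, steps=400, seed=0):
    """Iterate measurement, feedback and decision; return the final (D, w)."""
    rng = random.Random(seed)
    p = nu / r                      # punishment rate, with nu = r * p
    w = 1.0                         # initial searching range w^(1)
    D = [[1, 0], [0, 1]]            # D^(1) = identity, i.e. |phi_0> = |0>
    for _ in range(steps):
        # Amplitude <phi_0| U_E |phi_0> = (D^dagger U_E D)[0][0].
        amp = matmul(dagger(D), matmul(U_E, D))[0][0]
        m = 0 if rng.random() < abs(amp) ** 2 else 1  # single-shot outcome
        if m == 1:
            D = matmul(D, random_u1(w, rng))          # equation (14)
        w *= r if m == 0 else p                       # equation (15)
    return D, w


def fidelity_to_sx(D):
    """Largest overlap |<v|D|0>| with the two eigenvectors of S_x."""
    a, b = D[0][0], D[1][0]
    return max(abs(a + b), abs(a - b)) / math.sqrt(2)


# Assumed example: O_E = S_x with tau = pi, so U_E = exp(-i pi S_x) = -i sigma_x.
U_E = [[0, -1j], [-1j, 0]]
D, w = run_protocol(U_E)
print("final searching range:", w)
print("overlap with nearest S_x eigenvector:", fidelity_to_sx(D))
```

Only the 2 × 2 matrix `D` and the scalar `w` are carried between iterations, in line with the remark that the protocol does not need to store the history of states. The printed values depend on the seed, so a single run is not a substitute for the averages reported in section 3.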

2.2. Single-qudit case

In this case, the agent is a d-dimensional system or qudit, and the operator ${\hat{{ \mathcal O }}}_{E}$ is described by a d × d Hermitian matrix with eigenvalues $\{{\lambda }_{j}\}$ and eigenvectors $\{| {v}_{j}\rangle \}$ , with $j\in \{0,1,2,\ldots ,d-1\}$ . In the kth iteration of the algorithm, the state of A before E reads

$\begin{eqnarray}&&| {\phi }_{A,0}^{(k)}\rangle =\displaystyle \sum _{j=0}^{d-1}{c}_{j}| j\rangle ,\end{eqnarray} \tag{ 17 }$

while for simplicity we choose the initial state $| {\phi }_{A,0}^{(1)}\rangle =| 0\rangle$ . After the interaction with E, we have

$\begin{eqnarray}&&| {\bar{\phi }}_{A,0}^{(k)}\rangle ={\hat{U}}_{E}| {\phi }_{A,0}^{(k)}\rangle =\displaystyle \sum _{j=0}^{d-1}{\bar{c}}_{j}| {\phi }_{A,j}^{(k)}\rangle .\end{eqnarray} \tag{ 18 }$

Subsequently, we apply the operator ${\hat{D}}^{(k)\dagger }$ , which is defined now as

$\begin{eqnarray}&&{\hat{D}}^{(k)\dagger }=\displaystyle \sum _{j=0}^{d-1}| j\rangle \langle {\phi }_{A,j}^{(k)}| ,\end{eqnarray} \tag{ 19 }$

and perform the measurement process in the basis $\{| 0\rangle ,| 1\rangle ,\ldots ,| d-1\rangle \}$ . After this process, the state of A is $| {m}^{(k)}\rangle$ , where ${m}^{(k)}\in \{0,1,\ldots ,d-1\}$ is the outcome of the measurement process. In this case the decision process applies the operator ${\hat{G}}_{0}^{(k)}$ defined by equation (11), but with

$\begin{eqnarray}&&\begin{array}{l}\hat{{ \mathcal R }}={\delta }_{0,{m}^{(k)}}({\mathbb{I}}-\hat{{ \mathcal X }})+\hat{{ \mathcal X }},\\ {\hat{D}}^{(k+1)}=\displaystyle \sum _{j=0}^{d-1}{\delta }_{j,{m}^{(k)}}{\hat{{ \mathcal U }}}_{j}^{(k)}{\hat{D}}^{(k)},\end{array}\end{eqnarray} \tag{ 20 }$

where

$\begin{eqnarray}&&\hat{{ \mathcal X }}=\displaystyle \sum _{j=1}^{d-1}\left(| 0\rangle \langle j| +| j\rangle \langle 0| \right)\end{eqnarray} \tag{ 21 }$

with ${\hat{{ \mathcal U }}}_{{m}^{(k)}}^{(k)}$ as defined in equation (2) and ${\hat{{ \mathcal U }}}_{0}^{(k)}={\mathbb{I}}$ . Also in this case ${\hat{{ \mathcal U }}}_{j}^{(k)}={\hat{D}}^{(k)}{\hat{u}}_{j}{\hat{D}}^{(k)\dagger }$ , where

$\begin{eqnarray}&&{\hat{u}}_{j}={{\rm{e}}}^{-{\rm{i}}{\varphi }_{y}{\hat{S}}_{y}^{j}}{{\rm{e}}}^{-{\rm{i}}{\varphi }_{z}{\hat{S}}_{z}^{j}}{{\rm{e}}}^{-{\rm{i}}{\varphi }_{x}{\hat{S}}_{x}^{j}},\end{eqnarray} \tag{ 22 }$

and

$\begin{eqnarray}\begin{array}{c}\begin{array}{rcl}{\hat{S}}_{x}^{j} & = & \displaystyle \frac{1}{2}\left(| 0 \rangle \langle j| +| j \rangle \langle 0| \right),\\ {\hat{S}}_{y}^{j} & = & -\displaystyle \frac{{\rm{i}}}{2}\left(| 0 \rangle \langle j| -| j \rangle \langle 0| \right),\\ {\hat{S}}_{z}^{j} & = & \displaystyle \frac{1}{2}\left(| 0 \rangle \langle 0| -| j \rangle \langle j| \right),\end{array}\end{array}\end{eqnarray} \tag{ 23 }$

therefore,

$\begin{eqnarray}&&{\hat{D}}^{(k+1)}=\displaystyle \sum _{j=0}^{d-1}{\delta }_{j,{m}^{(k)}}{\hat{D}}^{(k)}{\hat{u}}_{{m}^{(k)}}.\end{eqnarray} \tag{ 24 }$

The state of A for the next iteration reads $| {\phi }_{A,0}^{(k+1)}\rangle ={\hat{G}}_{0}^{(k)}| {m}^{(k)}\rangle$ .

Finally, the RF that updates the value of the searching range is given by

$\begin{eqnarray}&&{w}^{(k+1)}=\left[(r-p){\delta }_{0,{m}^{(k)}}+p\right]{w}^{(k)}.\end{eqnarray} \tag{ 25 }$
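Because the operators of equation (23) vanish outside the subspace spanned by $| 0\rangle$ and $| j\rangle$ , the rotation ${\hat{u}}_{j}$ of equation (22) is a 2 × 2 rotation embedded in a d × d identity. A minimal sketch of that construction follows; the function names and the angle values are hypothetical.

```python
import cmath
import math


def mul2(a, b):
    """Product of two 2x2 complex matrices stored as nested lists."""
    return [[a[i][0] * b[0][k] + a[i][1] * b[1][k] for k in range(2)]
            for i in range(2)]


def embedded_rotation(d, j, phi_x, phi_y, phi_z):
    """d x d matrix for u_j: rotates span{|0>, |j>}, identity elsewhere."""
    if not 1 <= j < d:
        raise ValueError("j must label a basis state other than |0>")
    cx, sx = math.cos(phi_x / 2), math.sin(phi_x / 2)
    cy, sy = math.cos(phi_y / 2), math.sin(phi_y / 2)
    rx = [[cx, -1j * sx], [-1j * sx, cx]]                               # exp(-i phi_x S_x^j)
    ry = [[cy, -sy], [sy, cy]]                                          # exp(-i phi_y S_y^j)
    rz = [[cmath.exp(-0.5j * phi_z), 0], [0, cmath.exp(0.5j * phi_z)]]  # exp(-i phi_z S_z^j)
    block = mul2(ry, mul2(rz, rx))
    # Start from the identity and overwrite the four entries coupling |0> and |j>.
    u = [[complex(row == col) for col in range(d)] for row in range(d)]
    u[0][0], u[0][j] = block[0][0], block[0][1]
    u[j][0], u[j][j] = block[1][0], block[1][1]
    return u


# Hypothetical angles for a d = 4 qudit and measurement outcome j = 2.
u = embedded_rotation(4, 2, 0.3, -1.1, 0.7)
```

With this form, the update ${\hat{D}}^{(k+1)}={\hat{D}}^{(k)}{\hat{u}}_{{m}^{(k)}}$ of equation (24) only mixes two columns of ${\hat{D}}^{(k)}$ per iteration.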

Once the algorithm converges, we have that

$\begin{eqnarray}&&| {\phi }_{A,0}^{({N}_{0}+1)}\rangle ={\hat{D}}^{({N}_{0})}| {\phi }_{A,0}^{(1)}\rangle ,\end{eqnarray} \tag{ 26 }$

is an approximate eigenvector, therefore

$\begin{eqnarray}&&| \langle {\phi }_{A,0}^{({N}_{0}+1)}| {\hat{U}}_{E}| {\phi }_{A,0}^{({N}_{0}+1)}\rangle | \sim 1.\end{eqnarray} \tag{ 27 }$

In order to find another eigenvector of ${\hat{{ \mathcal O }}}_{E}$ , we restart the algorithm at iteration ${N}_{0}+1$ , i.e. ${w}^{({N}_{0}+1)}={w}^{(1)}=1$ , but now the state before E is given by $| {\phi }_{A,1}^{({N}_{0}+1)}\rangle ={\hat{D}}^{({N}_{0})}| {\phi }_{A,1}^{(1)}\rangle$ . We redefine equation (23) as

$\begin{eqnarray}\begin{array}{rcl}{\hat{S}}_{x}^{j} & = & \displaystyle \frac{1}{2}\left(| 1\rangle \langle j| +| j\rangle \langle 1| \right),\\ {\hat{S}}_{y}^{j} & = & -\displaystyle \frac{{\rm{i}}}{2}\left(| 1\rangle \langle j| -| j\rangle \langle 1| \right),\\ {\hat{S}}_{z}^{j} & = & \displaystyle \frac{1}{2}\left(| 1\rangle \langle 1| -| j\rangle \langle j| \right).\end{array}\end{eqnarray} \tag{ 28 }$

Thus, we can calculate the operator ${\hat{u}}_{j}$ as in equation (22).

The decision process changes as

$\begin{eqnarray}&&{\hat{G}}_{1}^{(k)}={\hat{D}}^{(k+1)}{\hat{{ \mathcal R }}}_{1},\end{eqnarray} \tag{ 29 }$

where

$\begin{eqnarray}&&\begin{array}{l}{\hat{{ \mathcal R }}}_{1}={\delta }_{1,{m}^{(k)}}({\mathbb{I}}-{\hat{{ \mathcal X }}}_{1})+{\hat{{ \mathcal X }}}_{1},\\ {\hat{D}}^{(k+1)}=\displaystyle \sum _{j=0}^{d-1}{\delta }_{j,{m}^{(k)}}{\hat{D}}^{(k)}{\hat{u}}_{j},\\ {\hat{{ \mathcal X }}}_{1}=\displaystyle \sum _{j\ne 1}\left(| 1\rangle \langle j| +| j\rangle \langle 1| \right),\end{array}\end{eqnarray} \tag{ 30 }$

and ${\hat{u}}_{0}={\hat{u}}_{1}={\mathbb{I}}$ . Finally, the RF reads

$\begin{eqnarray}&&{w}^{(k+1)}=\left[(r-p){\delta }_{1,{m}^{(k)}}-p{\delta }_{0,{m}^{(k)}}+p\right]{w}^{(k)}.\end{eqnarray} \tag{ 31 }$

These changes mean that we perform the protocol in the subspace orthogonal to $| {\phi }_{A,0}^{(1)}\rangle$ . When the algorithm converges again, after ${N}_{1}$ more iterations, we have that the states ${\hat{D}}^{({N}_{0}+{N}_{1})}| {\phi }_{A,0}^{(1)}\rangle$ and ${\hat{D}}^{({N}_{0}+{N}_{1})}| {\phi }_{A,1}^{(1)}\rangle$ are approximate eigenvectors. Therefore, to obtain the next eigenvector we perform the algorithm again but in the subspace orthogonal to $\{| {\phi }_{A,0}^{(1)}\rangle ,| {\phi }_{A,1}^{(1)}\rangle \}$ , and so on. Only d − 1 runs are needed, since the last eigenvector is fixed by orthogonality. After $N={N}_{0}+{N}_{1}+\cdots +{N}_{d-2}$ iterations, the states $| {\phi }_{A,j}^{(N)}\rangle ={\hat{D}}^{(N)}| {\phi }_{A,j}^{(1)}\rangle$ with $j=0,1,\ldots ,d-1$ approximate the d eigenvectors of ${\hat{{ \mathcal O }}}_{E}$ .

2.3. Multiqubit case

For this case, we can treat the system A as a qudit, where now the basis states $| j\rangle$ correspond to the binary representation of j with ${\log }_{2}(d)$ digits. For example, for d = 16 we have 4 digits, where each of them represents the state of a qubit; then $| 5\rangle =| 0101\rangle$ . Also, we can produce the different operators ${\hat{u}}_{j}$ using controlled-not gates and single-qubit rotations [53]. Therefore, we can map this problem to the qudit case, obtaining the same algorithm as in the previous case.
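The relabelling between qudit indices and qubit registers is plain binary encoding, as the following small sketch shows. The function name `basis_label` is hypothetical.

```python
def basis_label(j, d):
    """Binary label of the qudit basis state |j> on log2(d) qubits."""
    n_qubits = d.bit_length() - 1
    if d < 2 or (1 << n_qubits) != d:
        raise ValueError("d must be a power of two")
    if not 0 <= j < d:
        raise ValueError("j must satisfy 0 <= j < d")
    return format(j, "0{}b".format(n_qubits))


print(basis_label(5, 16))  # 0101
print(basis_label(2, 4))   # 10
```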

As we can see from this section, our protocol does not need to encode quantum information in a classical processor, which is an advantage with respect to classical algorithms that need to characterize the quantum interactions by quantum tomography. The latter requires hundreds of measurements of the quantum system, using more resources in this process than the entire proposed algorithm. Moreover, as our algorithm finds the eigenstates statistically, it is simpler than a full quantum algorithm that finds the eigenstates exactly, which makes our protocol experimentally feasible. References [51, 52] show the experimental implementation of an algorithm that employs the same basic steps on which our current algorithm is based, for the case of quantum states instead of quantum operators, opening the door to the implementation of this work.

3. Numerical results

It is convenient to define the following quantities for the numerical analysis of the protocol: $\nu =r\cdot p\Rightarrow p=\nu /r$ , with r (p) the reward (punishment) rate, the total number of rewards ${n}_{r}$ and the total number of punishments ${n}_{p}$ in the algorithm. The VF of our algorithm is the value of ${w}^{(N)}={r}^{{n}_{r}}{p}^{{n}_{p}}$ (for ${w}^{(1)}=1$ ), where $N={n}_{r}+{n}_{p}$ is the total number of iterations. Also, we can rewrite

$\begin{eqnarray}&&{w}^{(N)}={r}^{{n}_{r}-{n}_{p}}{\nu }^{{n}_{p}},\end{eqnarray} \tag{ 32 }$

where the convergence condition is given by ${w}^{(N)}\ll 1$ . If $\nu \lt 1$ , we see from equation (32) that the convergence condition can be satisfied even if ${n}_{p}\sim {n}_{r}$ , which implies that the protocol does not necessarily converge to the eigenstates of ${\hat{{ \mathcal O }}}_{E}$ . If ν = 1, we have that ${w}^{(N)}\to 0\ \Longleftrightarrow \ {n}_{r}\gg {n}_{p}$ . For ν > 1, the algorithm converges whenever ${n}_{r} \ggg {n}_{p}$ . Moreover, when ν is larger, the algorithm needs more iterations to converge, but it achieves larger fidelities. This is the exploration versus exploitation balance known in RL. Here, we perform the simulation for the single- and two-qubit cases for different values of ν and r. Recall that for all cases we choose ${w}^{(1)}=1$ . Also, for simplicity we choose $| {\phi }_{A,0}^{(1)}\rangle =| 0\rangle$ for the single-qubit case and $| {\phi }_{A,j}^{(1)}\rangle =| {j}_{{\rm{bin}}}\rangle$ for the two-qubit case, where ${j}_{{\rm{bin}}}$ is the binary representation of j, e.g. $| {\phi }_{A,2}^{(1)}\rangle =| 10\rangle$ . Moreover, ${\hat{D}}^{(1)}={\mathbb{I}}$ for all cases.
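The three regimes of ν can be illustrated by evaluating equation (32) directly. In the sketch below the function name `final_search_range` and the reward and punishment counts are hypothetical, and ${w}^{(1)}=1$ is assumed.

```python
def final_search_range(n_r, n_p, r, nu):
    """Equation (32) with w^(1) = 1: w^(N) = r**(n_r - n_p) * nu**n_p."""
    return r ** (n_r - n_p) * nu ** n_p


# nu < 1: w collapses although rewards and punishments are balanced.
balanced_small_nu = final_search_range(20, 20, r=0.9, nu=0.5)
# nu = 1: balanced counts leave the searching range unchanged.
balanced_unit_nu = final_search_range(20, 20, r=0.9, nu=1.0)
# nu > 1: w shrinks only when rewards strongly dominate.
mostly_rewards = final_search_range(200, 20, r=0.9, nu=2.0)

print(balanced_small_nu, balanced_unit_nu, mostly_rewards)
```

The first case is the false convergence mentioned above: the searching range becomes small without the measurement record favouring the outcome $| {\phi }_{A,0}\rangle$ .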

Finally, as the unitary operator ${\hat{u}}_{j}$ given by equation (22) depends on pseudo-random angles, we run the algorithm many times, defining the mean fidelity ${ \mathcal F }$ and the mean searching range ${ \mathcal W }$ as

$\begin{eqnarray}\begin{array}{rcl}{{ \mathcal F }}_{j}(k) & = & \mathop{\max }\limits_{{\ell }}\displaystyle \frac{1}{{ \mathcal N }}\displaystyle \sum _{i=1}^{{ \mathcal N }}| \langle {{\ell }}_{E}| {\hat{D}}_{i}^{(k)}| j\rangle | ,\\ { \mathcal W }(k) & = & \displaystyle \frac{1}{{ \mathcal N }}\displaystyle \sum _{i=1}^{{ \mathcal N }}{w}_{i}^{(k)},\end{array}\end{eqnarray} \tag{ 33 }$

where $| {{\ell }}_{E}\rangle$ is the ℓth eigenvector of ${\hat{{ \mathcal O }}}_{E}$ , the index i refers to the ith repetition of the protocol and ${ \mathcal N }$ is the total number of repetitions. In all subsequent cases we choose ${ \mathcal N }=1000$ .
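Equation (33) can be evaluated from a list of final matrices ${\hat{D}}_{i}^{(k)}$ as in the following sketch. The function name `mean_fidelity` and the toy input (two runs on a single qubit) are hypothetical.

```python
def mean_fidelity(D_runs, eigvecs, j):
    """Equation (33): max over eigenvectors of the run-averaged |<l|D_i|j>|."""
    best = 0.0
    for v in eigvecs:
        total = 0.0
        for D in D_runs:
            column = [row[j] for row in D]  # components of D_i |j>
            overlap = sum(complex(x).conjugate() * c for x, c in zip(v, column))
            total += abs(overlap)
        best = max(best, total / len(D_runs))
    return best


identity = [[1, 0], [0, 1]]
swap = [[0, 1], [1, 0]]
eigvecs = [[1, 0], [0, 1]]  # toy eigenbasis: the computational basis

same = mean_fidelity([identity, identity], eigvecs, 0)   # both runs agree
mixed = mean_fidelity([identity, swap], eigvecs, 0)      # runs disagree
print(same, mixed)  # 1.0 0.5
```

Since the maximum is taken after averaging, repetitions that converge to different eigenvectors for the same index j lower ${{ \mathcal F }}_{j}$ , as the second toy case shows.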

3.1. Single-qubit case

To assess the general performance of our protocol, we start with an ${\hat{{ \mathcal O }}}_{E}$ described by a random Hermitian matrix. Figure 2 shows the mean fidelity ${{ \mathcal F }}_{0}(k)={{ \mathcal F }}_{1}(k)$ for different values of the reward rate r and the parameter ν. From this figure, we can see that for r = 0.9 and ν = 2, we obtain ${{ \mathcal F }}_{0}(k)\gt 0.98$ with k < 300. Also, in all cases we have ${{ \mathcal F }}_{0}(k)\gt 0.90$ for k < 10. This means that using a reduced number of iterations we can obtain good fidelities for the eigenvectors of a completely random single-qubit operator. On the other hand, we observe that when r and ν are larger, the maximum value of ${{ \mathcal F }}_{0}(k)$ increases, but we need more iterations for the convergence of the algorithm. Figure 3 shows the mean searching range ${ \mathcal W }(k)$ for the same cases. From this figure we can clearly see how the algorithm needs fewer iterations when r and ν decrease, with the extreme case of r = 0.6, ν = 1, where the algorithm converges before 70 iterations.

**Figure 2.** Numerical results for the mean fidelity ${{ \mathcal F }}_{0}(k)$ given by equation (33) where ${\hat{{ \mathcal O }}}_{E}$ corresponds to a random Hermitian matrix acting over a single qubit. We employ ${ \mathcal N }=1000$ .


**Figure 3.** Numerical results for the mean searching range ${ \mathcal W }(k)$ given by equation (33) where ${\hat{{ \mathcal O }}}_{E}$ corresponds to a random Hermitian matrix acting over a single qubit. We employ ${ \mathcal N }=1000$ .


Now, we consider a particular example, ${\hat{{ \mathcal O }}}_{E}={\hat{S}}_{x}=\tfrac{1}{2}{\sigma }_{x}$ . In this case, the distance in the Bloch sphere between $| 0\rangle$ and the eigenstates of ${\hat{{ \mathcal O }}}_{E}$ is the largest possible. Figure 4 shows that our algorithm converges in few iterations to good approximations of the eigenvectors: we obtain the eigenvectors with fidelity above 98% in 400 iterations for the case ν = 2 and r = 0.9.

**Figure 4.** Numerical results for the mean fidelity ${{ \mathcal F }}_{0}(k)$ and the mean searching range ${ \mathcal W }(k)$ given by equation (33), where ${\hat{{ \mathcal O }}}_{E}={\hat{S}}_{x}$ . We employ ${ \mathcal N }=1000$ .

As we can see, the maximum fidelity for the case ${\hat{{ \mathcal O }}}_{E}={\hat{S}}_{x}$ is lower than in the random case. This is because the distance between $| 0\rangle$ and the eigenvectors of ${\hat{S}}_{x}$ is larger than the distance between $| 0\rangle$ and the eigenvectors of ${\hat{{ \mathcal O }}}_{E}$ in the random case; therefore, the convergence of the protocol is worse.
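
As an illustrative check of the geometric statement above, the following minimal Python sketch computes the Bloch-sphere angle between $| 0\rangle$ and each eigenvector of ${\hat{S}}_{x}$. The helper names (`fidelity`, `bloch_angle`) are ours and are not part of the protocol; the sketch only relies on the standard relation $F={\cos }^{2}(\gamma /2)$ between the fidelity F of two pure qubit states and the angle γ between their Bloch vectors.

```python
import math

# Eigenvectors of S_x = sigma_x / 2 in the computational basis: |+> and |->.
s = 1 / math.sqrt(2)
eigvecs = {"+1/2": (s, s), "-1/2": (s, -s)}
zero = (1.0, 0.0)  # the state |0>


def fidelity(a, b):
    """Return |<a|b>|^2 for two 2-component state vectors."""
    overlap = sum(complex(x).conjugate() * y for x, y in zip(a, b))
    return abs(overlap) ** 2


def bloch_angle(a, b):
    """Angle between the Bloch vectors of two pure states, from F = cos^2(angle/2)."""
    return 2 * math.acos(min(1.0, math.sqrt(fidelity(a, b))))


angles = {label: bloch_angle(zero, v) for label, v in eigvecs.items()}
for label, angle in angles.items():
    print(f"eigenvalue {label}: angle to |0> = {angle:.6f} rad")
```

Both angles come out as π/2. Since the two eigenvectors of a single-qubit Hermitian operator are antipodal on the Bloch sphere, the closer one can never be farther than π/2 from $| 0\rangle$, which is one way to read the "largest possible" distance.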

3.2. Two-qubit case

This case is analogous to the single-qudit case with d = 4. First, to assess the general performance, we consider ${\hat{{ \mathcal O }}}_{E}$ as a random two-qubit operator. We choose ${ \mathcal N }=1000$ and calculate the mean fidelity ${{ \mathcal F }}_{j}(k)$ and the mean searching range ${{ \mathcal W }}_{j}$ given by equation (33). Figure 5 shows the numerical calculation for r = 0.9 and ν ∈ {1.5, 2}. It shows again that for small ν the convergence is faster but the maximum value of ${{ \mathcal F }}_{j}$ is smaller. With ν = 2, we need 8500 iterations for the four approximate eigenvectors to converge, whereas with ν = 1.5 we need only 6000 iterations. Nevertheless, for ν = 2 we obtain ${{ \mathcal F }}_{j}\gt 0.89$ for all j, with ${{ \mathcal F }}_{2}$ and ${{ \mathcal F }}_{3}$ reaching up to 0.93. In the other case, with ν = 1.5, the maximum values are ${{ \mathcal F }}_{0}\sim 0.88$ and $\{{{ \mathcal F }}_{1},{{ \mathcal F }}_{2},{{ \mathcal F }}_{3}\}\lt 0.92$. Also, we can see from the evolution of ${ \mathcal W }(k)$ that the number of iterations needed for convergence decreases each time the algorithm starts again to approximate the next eigenvector, that is, N₀ > N₁ > N₂. Finally, we consider as a special case ${\hat{{ \mathcal O }}}_{E}=\hat{B}$, where $\hat{B}$ is the operator given by

$\begin{eqnarray}&&\hat{B}=| {\phi }_{+}\rangle \langle {\phi }_{+}| -| {\phi }_{-}\rangle \langle {\phi }_{-}| +2\left(| {\psi }_{+}\rangle \langle {\psi }_{+}| -| {\psi }_{-}\rangle \langle {\psi }_{-}| \right),\end{eqnarray} \tag{ 34 }$

with

$\begin{eqnarray}\begin{array}{rcl}| {\phi }_{\pm }\rangle & = & \sqrt{\displaystyle \frac{1}{2}}\left(| 00\rangle \pm | 11\rangle \right),\\ | {\psi }_{\pm }\rangle & = & \sqrt{\displaystyle \frac{1}{2}}\left(| 01\rangle \pm | 10\rangle \right),\end{array}\end{eqnarray} \tag{ 35 }$

the maximally-entangled Bell states. Figure 6 shows the performance of our protocol for this case. We obtain high fidelities (${{ \mathcal F }}_{j}\gt 0.99$) with only 1000 iterations to approximate the four eigenvectors. This performance arises because our algorithm is sensitive to the number of product states involved in each subspace (the dimension of the subspace) and not to the total dimension of the operator ${\hat{{ \mathcal O }}}_{E}$. In this case, the operator $\hat{B}$ is block-diagonal, where one block acts in the subspace $\{| 00\rangle ,| 11\rangle \}$ and the other in $\{| 01\rangle ,| 10\rangle \}$. This implies that the present case is similar to two independent single-qubit cases. In figure 6, we can see that from k = 1 to k = 500 we approximate the eigenstates of the first block, $| {\phi }_{\pm }\rangle$, at the same time, and from k = 501 to k = 1000 we approximate the eigenstates of the second block, $| {\psi }_{\pm }\rangle$; both cases have a performance similar to the single-qubit case.
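
The block structure of $\hat{B}$ can be checked directly. The sketch below, written in plain Python and assuming the computational basis is ordered as $| 00\rangle ,| 01\rangle ,| 10\rangle ,| 11\rangle$ (our convention for this illustration), builds $\hat{B}$ from equation (34), verifies that the Bell states of equation (35) are eigenvectors with eigenvalues ±1 and ±2, and checks that no matrix element couples $\{| 00\rangle ,| 11\rangle \}$ with $\{| 01\rangle ,| 10\rangle \}$. All variable names are illustrative.

```python
import math

s = 1 / math.sqrt(2)
# Bell states of equation (35); basis order |00>, |01>, |10>, |11> is an assumption.
bell = {
    "phi+": [s, 0.0, 0.0, s],
    "phi-": [s, 0.0, 0.0, -s],
    "psi+": [0.0, s, s, 0.0],
    "psi-": [0.0, s, -s, 0.0],
}
# Weights of each projector in equation (34).
weights = {"phi+": 1.0, "phi-": -1.0, "psi+": 2.0, "psi-": -2.0}

# B = sum_k w_k |k><k| (all amplitudes are real, so no conjugation is needed).
B = [[0.0] * 4 for _ in range(4)]
for name, v in bell.items():
    for i in range(4):
        for j in range(4):
            B[i][j] += weights[name] * v[i] * v[j]


def matvec(M, v):
    """Multiply a 4x4 matrix by a length-4 vector."""
    return [sum(M[i][j] * v[j] for j in range(4)) for i in range(4)]


# Largest deviation of B|k> from w_k |k> for each Bell state.
residuals = {
    name: max(abs(x - weights[name] * y) for x, y in zip(matvec(B, v), v))
    for name, v in bell.items()
}
# Largest matrix element connecting {|00>,|11>} (indices 0, 3) to {|01>,|10>} (indices 1, 2).
cross_block = max(abs(B[i][j]) for i in (0, 3) for j in (1, 2))

for row in B:
    print([round(x, 12) for x in row])
print("eigenvector residuals:", residuals)
print("largest cross-block element:", cross_block)
```

In this basis the only nonzero entries are the anti-diagonal ones, so each block has the same form as a single-qubit ${\sigma }_{x}$-type operator, in line with the statement that the problem splits into two independent single-qubit cases.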

**Figure 5.** Numerical results for the mean fidelity ${{ \mathcal F }}_{j}(k)$ and the mean searching rate ${ \mathcal W }(k)$ given by equation (33), where ${\hat{{ \mathcal O }}}_{E}$ is a random two-qubit operator. We employ ${ \mathcal N }=1000$ and r = 0.9.

**Figure 6.** Numerical results for the mean fidelity ${{ \mathcal F }}_{j}(k)$ and the mean searching rate ${ \mathcal W }(k)$ given by equation (33). Here, ${\hat{{ \mathcal O }}}_{E}=\hat{B}$, which is described by equation (34). We employ ${ \mathcal N }=1000$, r = 0.9, and ν = 2.

4. Conclusions

We propose and analyze an approximate quantum eigensolver based on RL with minimal resources. This proposal can be classified as a hybrid classical-quantum algorithm, in which a classical optimization algorithm modifies a quantum system to improve a quantum task using a feedback loop combined with partially-random unitary gates. This is in contrast with other hybrid algorithms, which measure fidelities or some expectation value in each step. Therefore, our proposal is advantageous with respect to the usual hybrid algorithms, in the sense that our protocol needs minimal storage, saving only the last step of the algorithm, and employs just one single-shot measurement per iteration instead of fidelity or expectation-value measurements, which decreases the effect of noise sources. Moreover, our protocol considers pseudo-random two-level rotations, so that it is not necessary to implement high-fidelity operations, because the randomness of the algorithm absorbs the errors of the gates. For this reason, our algorithm would be experimentally feasible in almost any current quantum platform.

Additionally, we validated our proposal with numerical calculations for four different choices of the operator ${\hat{{ \mathcal O }}}_{E}$: a random single-qubit operator, the ${\hat{S}}_{x}$ operator, a random two-qubit operator, and the $\hat{B}$ operator defined by equation (34). As a general rule, our algorithm reaches higher fidelities for the approximate eigenvectors for large values of ν and r, but the convergence in this case is slower. This is related to the balance between exploration and exploitation typical of RL algorithms. Moreover, our algorithm is sensitive to the size of the different subspaces spanned by product states and not to the size of the total space of the operator ${\hat{{ \mathcal O }}}_{E}$. This is the case shown in figure 6, where the eigenvectors are the maximally-entangled Bell states. We point out that, in order to improve the performance of the protocol in future extensions, it could be interesting to study dynamical reward rates (r) and a dynamical parameter ν.

Finally, due to its simplicity, the minimal resources it employs, and the fact that it needs only a basic classical processor (command center) capable of performing pseudo-random rotations, our protocol can be useful for the development of near-future semi-autonomous quantum devices, which will have to make decisions with incomplete information obtained by interaction with the external environment.

Acknowledgments

We acknowledge support from Financiamiento Basal para Centros Científicos y Tecnológicos de Excelencia (Grant No. FB0807), projects QMiCS (820505) and OpenSuperQ (820363) of the EU Flagship on Quantum Technologies, EU FET Open Grant Quromorphic, Basque Government IT986-16, and PGC2018-095113-B-I00 (MCIU/AEI/FEDER, UE).

Data availability statement

The data that support the findings of this study are openly available at https://github.com/PanchoAlbarran/EigenSolver.

Appendix A.: Explicit form of ${\bar{\theta }}^{(k)}$ and ${\bar{\varphi }}^{(k)}$

Here, we further clarify the protocol developed in the main text.

From equation (4), we have

$\begin{eqnarray}\begin{array}{rcl}| 0\rangle & = & \cos \left(\displaystyle \frac{\alpha }{2}\right)| {v}_{0}\rangle +\sin \left(\displaystyle \frac{\alpha }{2}\right)| {v}_{1}\rangle \\ | 1\rangle & = & {{\rm{e}}}^{-{\rm{i}}\beta }\left[\sin \left(\displaystyle \frac{\alpha }{2}\right)| {v}_{0}\rangle -\cos \left(\displaystyle \frac{\alpha }{2}\right)| {v}_{1}\rangle \right].\end{array}\end{eqnarray} \tag{ A.1 }$

Substituting these expressions into equation (7), we obtain

$\begin{eqnarray}\begin{array}{rcl}| {\phi }_{A,0}^{(k)}\rangle & = & \left[\cos \left(\displaystyle \frac{{\theta }^{(k)}}{2}\right)\cos \left(\displaystyle \frac{\alpha }{2}\right)+{{\rm{e}}}^{{\rm{i}}({\varphi }^{(k)}-\beta )}\sin \left(\displaystyle \frac{{\theta }^{(k)}}{2}\right)\sin \left(\displaystyle \frac{\alpha }{2}\right)\right]| {v}_{0}\rangle \\ & & +\left[\cos \left(\displaystyle \frac{{\theta }^{(k)}}{2}\right)\sin \left(\displaystyle \frac{\alpha }{2}\right)-{{\rm{e}}}^{{\rm{i}}({\varphi }^{(k)}-\beta )}\sin \left(\displaystyle \frac{{\theta }^{(k)}}{2}\right)\cos \left(\displaystyle \frac{\alpha }{2}\right)\right]| {v}_{1}\rangle .\qquad \end{array}\end{eqnarray} \tag{ A.2 }$

Thus,

$\begin{eqnarray}\begin{array}{rcl}| {\bar{\phi }}_{A,0}^{(k)}\rangle & = & {\hat{U}}_{E}| {\phi }_{A,0}^{(k)}\rangle ={{\rm{e}}}^{-{\rm{i}}{\lambda }_{0}\tau }\left[\cos \left(\displaystyle \frac{{\theta }^{(k)}}{2}\right)\cos \left(\displaystyle \frac{\alpha }{2}\right)+{{\rm{e}}}^{{\rm{i}}({\varphi }^{(k)}-\beta )}\sin \left(\displaystyle \frac{{\theta }^{(k)}}{2}\right)\sin \left(\displaystyle \frac{\alpha }{2}\right)\right]| {v}_{0}\rangle \\ & & +{{\rm{e}}}^{-{\rm{i}}{\lambda }_{1}\tau }\left[\cos \left(\displaystyle \frac{{\theta }^{(k)}}{2}\right)\sin \left(\displaystyle \frac{\alpha }{2}\right)-{{\rm{e}}}^{{\rm{i}}({\varphi }^{(k)}-\beta )}\sin \left(\displaystyle \frac{{\theta }^{(k)}}{2}\right)\cos \left(\displaystyle \frac{\alpha }{2}\right)\right]| {v}_{1}\rangle .\end{array}\end{eqnarray} \tag{ A.3 }$

By means of the definition of $| {v}_{0}\rangle$ and $| {v}_{1}\rangle$ given by equation (4), we obtain

$\begin{eqnarray}\begin{array}{rcl}| {\bar{\phi }}_{A,0}^{(k)}\rangle & = & {{\rm{e}}}^{-{\rm{i}}{\lambda }_{0}\tau }\left[\cos \left(\displaystyle \frac{{\theta }^{(k)}}{2}\right)\cos \left(\displaystyle \frac{\alpha }{2}\right)+{{\rm{e}}}^{{\rm{i}}({\varphi }^{(k)}-\beta )}\sin \left(\displaystyle \frac{{\theta }^{(k)}}{2}\right)\sin \left(\displaystyle \frac{\alpha }{2}\right)\right]\\ & & \cdot \left[\cos \left(\displaystyle \frac{\alpha }{2}\right)| 0\rangle +{{\rm{e}}}^{{\rm{i}}\beta }\sin \left(\displaystyle \frac{\alpha }{2}\right)| 1\rangle \right]+{{\rm{e}}}^{-{\rm{i}}{\lambda }_{1}\tau }\left[\cos \left(\displaystyle \frac{{\theta }^{(k)}}{2}\right)\sin \left(\displaystyle \frac{\alpha }{2}\right)\right.\\ & & \left.-{{\rm{e}}}^{{\rm{i}}({\varphi }^{(k)}-\beta )}\sin \left(\displaystyle \frac{{\theta }^{(k)}}{2}\right)\cos \left(\displaystyle \frac{\alpha }{2}\right)\right]\left[\sin \left(\displaystyle \frac{\alpha }{2}\right)| 0\rangle -{{\rm{e}}}^{{\rm{i}}\beta }\cos \left(\displaystyle \frac{\alpha }{2}\right)| 1\rangle \right].\end{array}\end{eqnarray} \tag{ A.4 }$

We rewrite the eigenvalues as λ₀ = δ − λ and λ₁ = δ + λ where δ = (λ₁ + λ₀)/2 and λ = (λ₁ − λ₀)/2. Then, we rewrite equation (A.4) up to a global phase as

$\begin{eqnarray}\begin{array}{rcl}| {\bar{\phi }}_{A,0}^{(k)}\rangle & = & \left[\cos \left(\displaystyle \frac{{\theta }^{(k)}}{2}\right)\cos (\lambda \tau )+{\rm{i}}\cos \left(\displaystyle \frac{{\theta }^{(k)}}{2}\right)\cos (\alpha )\sin (\lambda \tau )\right.\\ & & \left.+{\rm{i}}{{\rm{e}}}^{{\rm{i}}({\varphi }^{(k)}-\beta )}\sin \left(\displaystyle \frac{{\theta }^{(k)}}{2}\right)\sin (\alpha )\sin (\lambda \tau )\right]| 0\rangle +{{\rm{e}}}^{{\rm{i}}{\varphi }^{(k)}}\left[{\rm{i}}{{\rm{e}}}^{-{\rm{i}}({\varphi }^{(k)}-\beta )}\cos \left(\displaystyle \frac{{\theta }^{(k)}}{2}\right)\sin (\alpha )\sin (\lambda \tau )\right.\\ & & \left.+\sin \left(\displaystyle \frac{{\theta }^{(k)}}{2}\right)\cos (\lambda \tau )-{\rm{i}}\sin \left(\displaystyle \frac{{\theta }^{(k)}}{2}\right)\cos (\alpha )\sin (\lambda \tau )\right]| 1\rangle .\end{array}\end{eqnarray} \tag{ A.5 }$
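
For reference, the step from equation (A.4) to equation (A.5) can be spelled out as follows; this is our own unpacking of the algebra rather than part of the original derivation. With ${\lambda }_{0}=\delta -\lambda$ and ${\lambda }_{1}=\delta +\lambda$, the common factor ${{\rm{e}}}^{-{\rm{i}}\delta \tau }$ is the discarded global phase, and collecting the coefficients of $| 0\rangle$ and $| 1\rangle$ uses the identities

$\begin{eqnarray}\begin{array}{rcl}{{\rm{e}}}^{-{\rm{i}}{\lambda }_{0}\tau } & = & {{\rm{e}}}^{-{\rm{i}}\delta \tau }{{\rm{e}}}^{{\rm{i}}\lambda \tau },\qquad {{\rm{e}}}^{-{\rm{i}}{\lambda }_{1}\tau }={{\rm{e}}}^{-{\rm{i}}\delta \tau }{{\rm{e}}}^{-{\rm{i}}\lambda \tau },\\ {{\rm{e}}}^{{\rm{i}}\lambda \tau }{\cos }^{2}\left(\displaystyle \frac{\alpha }{2}\right)+{{\rm{e}}}^{-{\rm{i}}\lambda \tau }{\sin }^{2}\left(\displaystyle \frac{\alpha }{2}\right) & = & \cos (\lambda \tau )+{\rm{i}}\cos (\alpha )\sin (\lambda \tau ),\\ {{\rm{e}}}^{{\rm{i}}\lambda \tau }{\sin }^{2}\left(\displaystyle \frac{\alpha }{2}\right)+{{\rm{e}}}^{-{\rm{i}}\lambda \tau }{\cos }^{2}\left(\displaystyle \frac{\alpha }{2}\right) & = & \cos (\lambda \tau )-{\rm{i}}\cos (\alpha )\sin (\lambda \tau ),\\ \left({{\rm{e}}}^{{\rm{i}}\lambda \tau }-{{\rm{e}}}^{-{\rm{i}}\lambda \tau }\right)\sin \left(\displaystyle \frac{\alpha }{2}\right)\cos \left(\displaystyle \frac{\alpha }{2}\right) & = & {\rm{i}}\sin (\alpha )\sin (\lambda \tau ),\end{array}\end{eqnarray}$

which yield the coefficients of the evolved state in equation (A.5).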

This state has the form

$\begin{eqnarray}&&| {\bar{\phi }}_{A,0}^{(k)}\rangle =({a}_{0}+{\rm{i}}{b}_{0})| 0\rangle +{{\rm{e}}}^{{\rm{i}}{\varphi }^{(k)}}({a}_{1}+{\rm{i}}{b}_{1})| 1\rangle ,\end{eqnarray} \tag{ A.6 }$

with

$\begin{eqnarray}\begin{array}{rcl}{a}_{0} & = & \cos \left(\displaystyle \frac{{\theta }^{(k)}}{2}\right)\cos (\lambda \tau )-\sin ({\varphi }^{(k)}-\beta )\sin \left(\displaystyle \frac{{\theta }^{(k)}}{2}\right)\sin (\alpha )\sin (\lambda \tau ),\\ {b}_{0} & = & \cos \left(\displaystyle \frac{{\theta }^{(k)}}{2}\right)\cos (\alpha )\sin (\lambda \tau )+\cos ({\varphi }^{(k)}-\beta )\sin \left(\displaystyle \frac{{\theta }^{(k)}}{2}\right)\sin (\alpha )\sin (\lambda \tau ),\\ {a}_{1} & = & \sin \left(\displaystyle \frac{{\theta }^{(k)}}{2}\right)\cos (\lambda \tau )+\sin ({\varphi }^{(k)}-\beta )\cos \left(\displaystyle \frac{{\theta }^{(k)}}{2}\right)\sin (\alpha )\sin (\lambda \tau ),\\ {b}_{1} & = & -\sin \left(\displaystyle \frac{{\theta }^{(k)}}{2}\right)\cos (\alpha )\sin (\lambda \tau )+\cos ({\varphi }^{(k)}-\beta )\cos \left(\displaystyle \frac{{\theta }^{(k)}}{2}\right)\sin (\alpha )\sin (\lambda \tau ).\end{array}\end{eqnarray} \tag{ A.7 }$

Finally, up to a global phase, the state given by equation (A.6), with the coefficients of equation (A.7), can be written in the form of equation (8), where

$\begin{eqnarray}&&{\bar{\theta }}^{(k)}=2{\cos }^{-1}\left(\sqrt{{a}_{0}^{2}+{b}_{0}^{2}}\right);\quad {\bar{\varphi }}^{(k)}=\left[{\varphi }^{(k)}+{\tan }^{-1}\left(\displaystyle \frac{{b}_{1}}{{a}_{1}}\right)-{\tan }^{-1}\left(\displaystyle \frac{{b}_{0}}{{a}_{0}}\right)\right]{\rm{mod}}(2\pi ).\end{eqnarray} \tag{ A.8 }$
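
The closed-form coefficients can be cross-checked numerically. The following Python sketch compares the angles obtained from the coefficients of equation (A.7) with those read off after applying the phase evolution of equation (A.3) directly. It rests on several assumptions of ours: the eigenvectors take the form appearing in equation (A.4), the evolved state is parametrized with half-angles as $\cos ({\bar{\theta }}^{(k)}/2)| 0\rangle +{{\rm{e}}}^{{\rm{i}}{\bar{\varphi }}^{(k)}}\sin ({\bar{\theta }}^{(k)}/2)| 1\rangle$ (the form used in equation (B.2)), so that the polar angle is recovered as twice the arccosine of the $| 0\rangle$ amplitude, and `atan2` is used in place of ${\tan }^{-1}$ to resolve the quadrant. Function names and parameter values are purely illustrative.

```python
import cmath
import math


def evolved_angles_direct(theta, phi, alpha, beta, lam0, lam1, tau):
    """Evolve the state in the eigenbasis, as in (A.3), and read off its Bloch angles."""
    # Eigenvectors in the form appearing in equation (A.4) (assumed form of equation (4)).
    v0 = (math.cos(alpha / 2), cmath.exp(1j * beta) * math.sin(alpha / 2))
    v1 = (math.sin(alpha / 2), -cmath.exp(1j * beta) * math.cos(alpha / 2))
    # Input state cos(theta/2)|0> + e^{i phi} sin(theta/2)|1>.
    psi = (math.cos(theta / 2), cmath.exp(1j * phi) * math.sin(theta / 2))
    c0 = sum(complex(a).conjugate() * b for a, b in zip(v0, psi))
    c1 = sum(complex(a).conjugate() * b for a, b in zip(v1, psi))
    out = [
        cmath.exp(-1j * lam0 * tau) * c0 * v0[k] + cmath.exp(-1j * lam1 * tau) * c1 * v1[k]
        for k in range(2)
    ]
    theta_bar = 2 * math.acos(min(1.0, abs(out[0])))
    phi_bar = (cmath.phase(out[1]) - cmath.phase(out[0])) % (2 * math.pi)
    return theta_bar, phi_bar


def evolved_angles_closed_form(theta, phi, alpha, beta, lam0, lam1, tau):
    """Same angles from the coefficients a0, b0, a1, b1 of equation (A.7)."""
    lam = (lam1 - lam0) / 2
    ct, st = math.cos(theta / 2), math.sin(theta / 2)
    c_lt, s_lt = math.cos(lam * tau), math.sin(lam * tau)
    d = phi - beta
    a0 = ct * c_lt - math.sin(d) * st * math.sin(alpha) * s_lt
    b0 = ct * math.cos(alpha) * s_lt + math.cos(d) * st * math.sin(alpha) * s_lt
    a1 = st * c_lt + math.sin(d) * ct * math.sin(alpha) * s_lt
    b1 = -st * math.cos(alpha) * s_lt + math.cos(d) * ct * math.sin(alpha) * s_lt
    theta_bar = 2 * math.acos(min(1.0, math.hypot(a0, b0)))
    phi_bar = (phi + math.atan2(b1, a1) - math.atan2(b0, a0)) % (2 * math.pi)
    return theta_bar, phi_bar


# Arbitrary test values, chosen only for illustration.
params = dict(theta=0.9, phi=2.3, alpha=0.7, beta=1.1, lam0=-0.4, lam1=0.8, tau=1.3)
direct = evolved_angles_direct(**params)
closed = evolved_angles_closed_form(**params)
print("direct     :", direct)
print("closed form:", closed)
```

Under these assumptions the two routes should agree to numerical precision, which gives a quick sanity check on the signs in equation (A.7).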

Appendix B.: Explicit form of ${{\rm{\Delta }}}_{\theta }^{(k)}$ and ${{\rm{\Delta }}}_{\varphi }^{(k)}$

From equations (7) and (9) we have

$\begin{eqnarray}\begin{array}{rcl}| 0\rangle & = & \cos \left(\displaystyle \frac{{\theta }^{(k)}}{2}\right)| {\phi }_{A,0}^{(k)}\rangle +\sin \left(\displaystyle \frac{{\theta }^{(k)}}{2}\right)| {\phi }_{A,1}^{(k)}\rangle ,\\ | 1\rangle & = & {{\rm{e}}}^{-{\rm{i}}{\varphi }^{(k)}}\sin \left(\displaystyle \frac{{\theta }^{(k)}}{2}\right)| {\phi }_{A,0}^{(k)}\rangle -{{\rm{e}}}^{-{\rm{i}}{\varphi }^{(k)}}\cos \left(\displaystyle \frac{{\theta }^{(k)}}{2}\right)| {\phi }_{A,1}^{(k)}\rangle .\end{array}\end{eqnarray} \tag{ B.1 }$

Replacing this expression in the first line of equation (8), we obtain

$\begin{eqnarray}\begin{array}{rcl}| {\bar{\phi }}_{A,0}^{(k)}\rangle & = & \left[\cos \left(\displaystyle \frac{{\bar{\theta }}^{(k)}}{2}\right)\cos \left(\displaystyle \frac{{\theta }^{(k)}}{2}\right)+{{\rm{e}}}^{{\rm{i}}({\bar{\varphi }}^{(k)}-{\varphi }^{(k)})}\sin \left(\displaystyle \frac{{\bar{\theta }}^{(k)}}{2}\right)\sin \left(\displaystyle \frac{{\theta }^{(k)}}{2}\right)\right]| {\phi }_{A,0}^{(k)}\rangle \\ & & +\left[\cos \left(\displaystyle \frac{{\bar{\theta }}^{(k)}}{2}\right)\sin \left(\displaystyle \frac{{\theta }^{(k)}}{2}\right)-{{\rm{e}}}^{{\rm{i}}({\bar{\varphi }}^{(k)}-{\varphi }^{(k)})}\sin \left(\displaystyle \frac{{\bar{\theta }}^{(k)}}{2}\right)\cos \left(\displaystyle \frac{{\theta }^{(k)}}{2}\right)\right]| {\phi }_{A,1}^{(k)}\rangle \\ & = & {{\rm{e}}}^{{\rm{i}}{{\rm{\Psi }}}_{0}}\cos \left(\displaystyle \frac{{{\rm{\Delta }}}_{\theta }^{(k)}}{2}\right)| {\phi }_{A,0}^{(k)}\rangle +{{\rm{e}}}^{{\rm{i}}{{\rm{\Psi }}}_{1}}\sin \left(\displaystyle \frac{{{\rm{\Delta }}}_{\theta }^{(k)}}{2}\right)| {\phi }_{A,1}^{(k)}\rangle ,\end{array}\end{eqnarray} \tag{ B.2 }$

where

$\begin{eqnarray}\begin{array}{rcl}{\cos }^{2}\left(\displaystyle \frac{{{\rm{\Delta }}}_{\theta }^{(k)}}{2}\right) & = & {\cos }^{2}\left(\displaystyle \frac{{\bar{\theta }}^{(k)}}{2}\right){\cos }^{2}\left(\displaystyle \frac{{\theta }^{(k)}}{2}\right)+{\sin }^{2}\left(\displaystyle \frac{{\bar{\theta }}^{(k)}}{2}\right){\sin }^{2}\left(\displaystyle \frac{{\theta }^{(k)}}{2}\right)\\ & & +2\cos \left(\displaystyle \frac{{\bar{\theta }}^{(k)}}{2}\right)\sin \left(\displaystyle \frac{{\bar{\theta }}^{(k)}}{2}\right)\cos \left(\displaystyle \frac{{\theta }^{(k)}}{2}\right)\sin \left(\displaystyle \frac{{\theta }^{(k)}}{2}\right)\cos ({\bar{\varphi }}^{(k)}-{\varphi }^{(k)})\\ & = & {\left[\cos \left(\displaystyle \frac{{\bar{\theta }}^{(k)}}{2}\right)\cos \left(\displaystyle \frac{{\theta }^{(k)}}{2}\right)+\sin \left(\displaystyle \frac{{\bar{\theta }}^{(k)}}{2}\right)\sin \left(\displaystyle \frac{{\theta }^{(k)}}{2}\right)\right]}^{2}\\ & & +2\cos \left(\displaystyle \frac{{\bar{\theta }}^{(k)}}{2}\right)\sin \left(\displaystyle \frac{{\bar{\theta }}^{(k)}}{2}\right)\cos \left(\displaystyle \frac{{\theta }^{(k)}}{2}\right)\sin \left(\displaystyle \frac{{\theta }^{(k)}}{2}\right)\left[\cos ({\bar{\varphi }}^{(k)}-{\varphi }^{(k)})-1\right]\\ & = & {\cos }^{2}\left(\displaystyle \frac{{\bar{\theta }}^{(k)}-{\theta }^{(k)}}{2}\right)+\displaystyle \frac{1}{2}\sin ({\bar{\theta }}^{(k)})\sin ({\theta }^{(k)})\left[\cos ({\bar{\varphi }}^{(k)}-{\varphi }^{(k)})-1\right],\end{array}\end{eqnarray} \tag{ B.3 }$

$\begin{eqnarray}&&{{\rm{\Psi }}}_{0}={\tan }^{-1}\left[\displaystyle \frac{\sin ({\bar{\varphi }}^{(k)}-{\varphi }^{(k)})\sin \left(\tfrac{{\bar{\theta }}^{(k)}}{2}\right)\sin \left(\tfrac{{\theta }^{(k)}}{2}\right)}{\cos \left(\tfrac{{\bar{\theta }}^{(k)}}{2}\right)\cos \left(\tfrac{{\theta }^{(k)}}{2}\right)+\cos ({\bar{\varphi }}^{(k)}-{\varphi }^{(k)})\sin \left(\tfrac{{\bar{\theta }}^{(k)}}{2}\right)\sin \left(\tfrac{{\theta }^{(k)}}{2}\right)}\right]\end{eqnarray} \tag{ B.4 }$

and

$\begin{eqnarray}&&{{\rm{\Psi }}}_{1}={\tan }^{-1}\left[\displaystyle \frac{-\sin ({\bar{\varphi }}^{(k)}-{\varphi }^{(k)})\sin \left(\tfrac{{\bar{\theta }}^{(k)}}{2}\right)\cos \left(\tfrac{{\theta }^{(k)}}{2}\right)}{\cos \left(\tfrac{{\bar{\theta }}^{(k)}}{2}\right)\sin \left(\tfrac{{\theta }^{(k)}}{2}\right)-\cos ({\bar{\varphi }}^{(k)}-{\varphi }^{(k)})\sin \left(\tfrac{{\bar{\theta }}^{(k)}}{2}\right)\cos \left(\tfrac{{\theta }^{(k)}}{2}\right)}\right].\end{eqnarray} \tag{ B.5 }$

Finally, up to a global phase, we can write the state $| {\bar{\phi }}_{A,0}^{(k)}\rangle$ as

$\begin{eqnarray}&&| {\bar{\phi }}_{A,0}^{(k)}\rangle =\cos \left(\displaystyle \frac{{{\rm{\Delta }}}_{\theta }^{(k)}}{2}\right)| {\phi }_{A,0}^{(k)}\rangle +{{\rm{e}}}^{{\rm{i}}{{\rm{\Delta }}}_{\varphi }^{(k)}}\sin \left(\displaystyle \frac{{{\rm{\Delta }}}_{\theta }^{(k)}}{2}\right)| {\phi }_{A,1}^{(k)}\rangle \end{eqnarray} \tag{ B.6 }$

with ${{\rm{\Delta }}}_{\varphi }^{(k)}={{\rm{\Psi }}}_{1}-{{\rm{\Psi }}}_{0}$.
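
As a numerical sanity check of this appendix, the sketch below computes the angular displacement in two ways: from the closed form of equation (B.3), and from the overlaps of the evolved state with the basis $\{| {\phi }_{A,0}^{(k)}\rangle ,| {\phi }_{A,1}^{(k)}\rangle \}$. The form of $| {\phi }_{A,1}^{(k)}\rangle$ used here is inferred from equation (B.1) and should be read as an assumption, as should the function names and test values. The relative phase is taken as the difference of the arguments of the two overlaps, which corresponds to ${{\rm{\Psi }}}_{1}-{{\rm{\Psi }}}_{0}$ with the quadrant resolved.

```python
import cmath
import math


def deltas_direct(theta, phi, theta_bar, phi_bar):
    """Angular displacement from overlaps with the basis {|phi_A0>, |phi_A1>}."""
    # Basis states consistent with the inversion in equation (B.1) (assumed form).
    phi0 = (math.cos(theta / 2), cmath.exp(1j * phi) * math.sin(theta / 2))
    phi1 = (math.sin(theta / 2), -cmath.exp(1j * phi) * math.cos(theta / 2))
    # Evolved state in the half-angle form used in equation (B.2).
    new = (math.cos(theta_bar / 2), cmath.exp(1j * phi_bar) * math.sin(theta_bar / 2))
    c0 = sum(complex(a).conjugate() * b for a, b in zip(phi0, new))
    c1 = sum(complex(a).conjugate() * b for a, b in zip(phi1, new))
    delta_theta = 2 * math.acos(min(1.0, abs(c0)))
    delta_phi = (cmath.phase(c1) - cmath.phase(c0)) % (2 * math.pi)
    return delta_theta, delta_phi


def delta_theta_closed_form(theta, phi, theta_bar, phi_bar):
    """Polar displacement from the last line of equation (B.3)."""
    cos_sq = math.cos((theta_bar - theta) / 2) ** 2 + 0.5 * math.sin(theta_bar) * math.sin(
        theta
    ) * (math.cos(phi_bar - phi) - 1)
    cos_sq = min(1.0, max(0.0, cos_sq))  # guard against rounding outside [0, 1]
    return 2 * math.acos(math.sqrt(cos_sq))


# Arbitrary test values, chosen only for illustration.
angles_in = dict(theta=0.9, phi=2.3, theta_bar=1.4, phi_bar=0.6)
dtheta_direct, dphi_direct = deltas_direct(**angles_in)
dtheta_closed = delta_theta_closed_form(**angles_in)
print("Delta_theta (overlaps):", dtheta_direct)
print("Delta_theta (B.3)     :", dtheta_closed)
print("Delta_phi   (overlaps):", dphi_direct)
```

Computing the relative phase from complex arguments sidesteps the branch ambiguity of ${\tan }^{-1}$, which may be convenient in an implementation.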
