Universal discriminative quantum neural networks

Chen, H.; Wossnig, L.; Severini, S.; Neven, H.; Mohseni, M.

doi:10.1007/s42484-020-00025-7

Quantum Machine Intelligence, Volume 3, article number 1 (2021). Research Article, open access. Published: 15 December 2020.

Abstract

Recent results have demonstrated the successful application of quantum-classical hybrid methods to train quantum circuits for a variety of machine learning tasks. A natural question is therefore whether we can also train such quantum circuits to discriminate quantum data, i.e., to classify data stored in the form of quantum states. Although quantum mechanics fundamentally forbids the deterministic discrimination of non-orthogonal states, we show in this work that it is possible to train a quantum circuit to discriminate such data with a trade-off between minimizing the error rate and minimizing the inconclusiveness rate of the classification task. Our approach simultaneously achieves performance close to the theoretically optimal values and the ability to generalize to previously unseen quantum data. This generalization ability distinguishes our work from previous circuit optimization results and furthermore provides an example of a quantum machine learning task that inherently has no classical analogue.

1 Introduction

Quantum computation has been shown to provide speedups over classical computation for several problems in the query model. Besides Shor's famous algorithm for integer factorization, quantum computers can also produce statistical patterns that are hard for classical devices to produce. This raises the possibility that quantum computers can also recognize patterns that are hard for classical computers to recognize, or, more generally, that quantum computers can help solve classical machine learning problems more efficiently. Recently, the intersection of quantum computation and machine learning has received considerable attention. Using the circuit model of computation, several quantum algorithms have been designed that in principle provide quadratic to exponential speedups on classical data (Biamonte et al. 2017; Ciliberto et al. 2018).

A related area is concerned with developing novel machine learning methods that operate on quantum data. In general, any set of quantum states that encode meaningful information can be considered quantum data. To motivate this direction, we emphasize that using quantum states as a storage medium for information has been demonstrated to provide advantages in several ways. For example, by coupling a quantum state to a target system, we can obtain information about the target system with increased sensitivity. Quantum metrology, for instance, allows a quadratic improvement over classical methods in the statistical sampling error, i.e., in the scaling of the standard deviation of estimates obtained through repeated measurements. Another example is quantum sensing, which provides much higher sensitivity for tasks such as target detection in the microwave domain, i.e., quantum radar (Barzanjeh et al. 2015), and, more generally, sensing electric or magnetic fields (Degen et al. 2017). A practical application of these methods is reducing the damage to pictures that are sensitive to light exposure (Schaller and Schützhold 2006).

Certain types of datasets are inherently quantum mechanical. Such data could, for example, be the output of quantum information processing procedures such as the simulation of quantum materials or, more generally, quantum chemistry. For such datasets, we conjecture that quantum computers have an inherent advantage in recognition and classification tasks. For example, topological materials in exotic topological phases have non-classical electronic properties and are promising materials for building fault-tolerant quantum computers (Qi and Zhang 2011; Karzig et al. 2017). Predicting the phase of topological materials has been a very challenging problem for classical approaches. However, it has recently been shown that quantum neural networks can recognize the phase of a quantum state (Cong et al. 2019) and can hence be used to predict this phase. In addition, the promised security of quantum communication protocols and a surge of ideas in quantum communication networks (Kimble 2008; Ren et al. 2017) further stimulate research into areas dealing with inherently quantum data.

In this work, we explore the general problem of classifying quantum data. This problem can be seen as an extension of the established field of quantum state discrimination, which identifies a quantum state among a set of candidate states that are completely known a priori. A key challenge for the discrimination of quantum states is that deterministic discrimination is impossible when the complex vectors representing the input states are non-orthogonal, i.e., when their overlaps are non-zero. Quantum state discrimination then seeks the measurement that discriminates these states optimally. In the following, we use the terms input data and (quantum) states interchangeably.
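To make this obstruction concrete, the following sketch (an illustration added here, not part of the study) evaluates two standard textbook limits for a pair of pure states with overlap s = |⟨ψ|φ⟩|, assuming equal prior probabilities: the Helstrom minimum error probability (1 − √(1 − s²))/2 and the Ivanovic-Dieks-Peres maximal zero-error success probability 1 − s. The helper names are hypothetical, and the example uses only the pure component ψ₂ of the mixed family defined later in Eq. 3.

```python
import numpy as np

def overlap(psi, phi):
    """Absolute overlap |<psi|phi>| of two normalized state vectors."""
    return float(abs(np.vdot(psi, phi)))

def helstrom_min_error(s):
    """Minimum error probability for two equiprobable pure states with overlap s."""
    return 0.5 * (1.0 - np.sqrt(1.0 - s**2))

def idp_max_success(s):
    """Maximal zero-error (unambiguous) success probability for two
    equiprobable pure states with overlap s (Ivanovic-Dieks-Peres limit)."""
    return 1.0 - s

# Example: psi_1(a) of Eq. 2 and the pure component psi_2(b) of Eq. 3 (overlap a*b).
a, b = 0.6, 1 / np.sqrt(2)
psi1 = np.array([np.sqrt(1 - a**2), 0.0, a, 0.0])
psi2 = np.array([0.0, np.sqrt(1 - b**2), b, 0.0])
s = overlap(psi1, psi2)
print(f"overlap={s:.4f}, Helstrom error={helstrom_min_error(s):.4f}, "
      f"IDP success={idp_max_success(s):.4f}")
```

For any s > 0 neither limit reaches perfect discrimination: minimum-error discrimination leaves a non-zero error, and unambiguous discrimination leaves a non-zero inconclusive probability.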

However, quantum state discrimination cannot be applied directly to classify states, i.e., quantum data. First, it is unrealistic to assume complete a priori knowledge of the input data, which are often only samples generated by a data-collection process. Second, even if all the input data are available as quantum states, performing quantum tomography on them is prohibitively expensive. Third, quantum state discrimination often fails to give the optimal discriminative measurement in closed analytical form unless the quantum states are orthogonal or possess certain symmetry properties (Barnett and Croke 2009). In that case, one may use numerical optimization to find the optimal measurement. However, because the dimension of the density matrices grows exponentially with the number of qubits, numerical optimization on a classical device is also inefficient.

Given these limitations of quantum state discrimination, it is natural to ask whether a quantum computer can help with the optimization procedure. Since fully error-corrected quantum computers are not yet available, a recent stream of works has proposed various applications of circuit learning (Banchi et al. 2016; Wan et al. 2017; Innocenti et al. 2018; Romero et al. 2017; Mitarai et al. 2018; Farhi and Neven 2018; Verdon et al. 2017; Li and Benjamin 2017; Grant et al. 2018; Schuld et al. 2018; Xu et al. 2019; Khatri et al. 2019), a form of quantum-classical hybrid neural network that has been shown to be less prone to the inherent errors of early-stage quantum hardware. In this work, we similarly use a hybrid approach to learn the design of a shallow quantum circuit for the classification of quantum states. Concretely, in this hybrid scheme a classical computer iteratively updates the parameters of a quantum circuit in order to optimize the output of the quantum computation. In other words, we train a quantum circuit to classify the states correctly.

The approach we take is novel in two ways. First, we use a quantum circuit ansatz that is designed for implementation on near-term devices (details are given in Appendix 1). This ansatz yields a shallow circuit but is still universal, i.e., it can perform any unitary transformation allowed by quantum mechanics. It is built from the universal gate set consisting of C-NOT and single-qubit gates, chosen because implementations of these gates are known for the current mainstream experimental architectures. It is furthermore nearly optimal in terms of the number of C-NOT gates, which is an important feature for implementation on near-term devices. Second, unlike previous works on quantum state discrimination, we focus on the generalization ability of our circuit; i.e., we train the circuit on a specific range of parameters with the goal of maximizing its generalization performance, which places the problem in a learning setting. This distinguishes our work from the pure optimization problem of the state discrimination task, i.e., optimizing the circuit to distinguish only a concrete set of states. We show here that this universal quantum circuit can be trained as a discriminator for the classification of non-orthogonal quantum data sampled from various probability distributions. Our discriminator can achieve a near-zero error rate by producing inconclusive outcomes.

2 Dataset and framework

In this work, we propose a novel approach for training a universal quantum circuit to classify quantum data stored in qubits. In this section, we first introduce the mathematical notation and description of quantum data. We then specify the quantum data we classify in this work. Next, we outline the approach we take to optimize the universal quantum circuit that classifies the quantum data. We defer the detailed decomposition of this quantum circuit to Appendix 1.

Mathematical descriptions

Quantum data are collections of quantum states that store useful information. For such data, we may assume that their density matrices ρ are parameterized by parameters a that follow a probability distribution α specific to the information carried. For classification, we are then presented with an unknown quantum state ρ_x that belongs to one of several families of quantum states, each described mathematically by:

$$ \begin{array}{@{}rcl@{}} \rho_{i}(a_{i}), a_{i}\sim \alpha_{i}, \end{array} $$

(1)

where i labels the family, ρ_i(a_i) is the density matrix describing a quantum state in family i parameterized by a_i, and the parameters a_i are assumed to follow the probability distribution α_i. The purpose of a classifier is to identify the family x to which the unknown state belongs. To train the classifier, we draw samples ρ_i(a_i) with a_i distributed according to α_i.

Unitary transformations of quantum states are described by a unitary matrix U, which transforms a quantum state ρ according to the rule ρ → UρU^†.

A measurement on the quantum state ρ is described by a set of matrices {M_j} that are Hermitian, positive semi-definite, and sum to the identity. Here, j labels the possible measurement outcomes, and the probability p_j of outcome j is given by p_j = Tr(M_jρ). Such a collection of matrices M_j is commonly called a positive-operator valued measure (POVM). A common special case of a POVM is a projection-valued measure (PVM), in which each M_j is a projector onto some linear subspace and different M_j are orthogonal to each other, i.e., M_jM_i = δ_ijM_j. With the help of ancilla qubits, any POVM can be realized by a quantum circuit consisting of a sequence of unitary transformations followed by measurements in the computational basis. Conversely, a quantum circuit consisting of a parameterized set of gates and measurements can represent a range of different POVMs. There exist quantum circuits that can represent any POVM with a fixed number of possible measurement outcomes. In this paper, we call such a circuit a universal discriminator; the specific one we use is discussed in Appendix 1.
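The sketch below illustrates, under simplifying assumptions, how a circuit defines a measurement: a unitary U followed by a computational-basis measurement realizes the PVM with elements M_j = U†|j⟩⟨j|U, so that p_j = Tr(M_jρ) equals the j-th diagonal entry of UρU†. The Haar-random unitary (QR construction) and all function names are illustrative assumptions. No ancilla is used, so this covers projective measurements only, not the general POVMs realized by the universal discriminator of Appendix 1.

```python
import numpy as np

def random_unitary(dim, rng):
    """Haar-random unitary from the QR decomposition of a complex Gaussian matrix."""
    z = rng.normal(size=(dim, dim)) + 1j * rng.normal(size=(dim, dim))
    q, r = np.linalg.qr(z)
    d = np.diag(r)
    # Rescale column k of q by the phase of r[k, k] so the result is Haar-distributed.
    return q * (d / np.abs(d))

def povm_from_unitary(u):
    """Effective elements M_j = U^dagger |j><j| U of a computational-basis
    measurement performed after the unitary U."""
    dim = u.shape[0]
    elements = []
    for j in range(dim):
        proj = np.zeros((dim, dim))
        proj[j, j] = 1.0  # |j><j|
        elements.append(u.conj().T @ proj @ u)
    return elements

def outcome_probabilities(elements, rho):
    """p_j = Tr(M_j rho) for every element (imaginary parts vanish up to rounding)."""
    return np.array([np.trace(m @ rho).real for m in elements])

rng = np.random.default_rng(0)
u = random_unitary(4, rng)
povm = povm_from_unitary(u)

# Example input: the pure state psi_1(a) of Eq. 2 with a = 0.5.
a = 0.5
psi = np.array([np.sqrt(1 - a**2), 0.0, a, 0.0])
rho = np.outer(psi, psi.conj())

p = outcome_probabilities(povm, rho)
# Same numbers via the evolved state: the diagonal of U rho U^dagger.
p_direct = np.diag(u @ rho @ u.conj().T).real
print(p, p.sum())
```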

Dataset

For this work, we restrict our attention to the classification of two families of quantum states stored in a 2-qubit system. Our first family consists of pure states, parametrized by a real number a ∈ [0, 1]:

$$ \begin{array}{@{}rcl@{}} \psi_{1}(a) = \left( \sqrt{1-a^{2}}, 0, a, 0 \right), \rho_{1} =| {\psi_{1}(a)}\rangle \langle{\psi_{1}(a)}|. \end{array} $$

(2)

The second family consists of mixed states ρ₂(b) where b ∈ [0, 1]. Specifically,

$$ \begin{array}{@{}rcl@{}} &\psi_{2/3} = \left( 0, \pm \sqrt{1-b^{2}}, b, 0 \right), \\ &\rho_{2}(b) = \frac{1}{2}|{\psi_{2}}\rangle\langle{\psi_{2}}| + \frac{1}{2} |{\psi_{3}}\rangle\langle {\psi_{3}}|. \end{array} $$

(3)

The overlap between ψ₁ and ψ_2/3 is ab, so the two families of states are non-orthogonal whenever ab ≠ 0. For the case of a fixed a and $b=\frac {1}{\sqrt {2}}$, the maximal success rate for unambiguously discriminating between ρ₁ and ρ₂ has been studied theoretically and demonstrated experimentally (Mohseni et al. 2004). The specific distributions we tested in our experiments are summarized in Table 1. To generate the data for training, validating, and testing our circuits, we sampled points randomly and independently from the corresponding distributions.
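The two families of Eqs. 2 and 3 can be sketched in NumPy as follows. The vector components follow the equations directly; the uniform sampling range in the hypothetical helper `sample_family` is a placeholder assumption, since the distributions of Table 1 are not reproduced here.

```python
import numpy as np

def psi1(a):
    """Pure state of family 1, Eq. 2."""
    return np.array([np.sqrt(1 - a**2), 0.0, a, 0.0])

def rho1(a):
    v = psi1(a)
    return np.outer(v, v)

def psi23(b, sign):
    """Pure components of Eq. 3: psi_2 for sign=+1, psi_3 for sign=-1."""
    return np.array([0.0, sign * np.sqrt(1 - b**2), b, 0.0])

def rho2(b):
    """Family 2, Eq. 3: equal mixture of psi_2 and psi_3."""
    v2, v3 = psi23(b, +1), psi23(b, -1)
    return 0.5 * np.outer(v2, v2) + 0.5 * np.outer(v3, v3)

def sample_family(rho_fn, low, high, n, rng):
    """Hypothetical sampler: draw n parameters uniformly from [low, high]."""
    params = rng.uniform(low, high, size=n)
    return params, [rho_fn(x) for x in params]

a, b = 0.3, 0.8
# Both overlaps equal a*b, so the families are non-orthogonal for ab != 0.
print(psi1(a) @ psi23(b, +1), psi1(a) @ psi23(b, -1), a * b)
rng = np.random.default_rng(1)
a_train, rho1_train = sample_family(rho1, 0.0, 1.0, 100, rng)
```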

Table 1 A summary of different test cases we classify in this work


Approach

Overall, there are two major strategies for coping with the impossibility of deterministically discriminating non-orthogonal quantum states: (a) Minimum-error discrimination: in this strategy, the task is to minimize the probability of the inevitable classification errors. (b) Unambiguous discrimination: in this strategy, the discriminator has one more output than the number of classes, namely the inconclusive outcome, and the task is to eliminate errors while minimizing the probability of this inevitable inconclusive outcome. Strictly zero-error unambiguous discrimination is not guaranteed to be possible for arbitrary quantum data; from the perspective of numerical optimization, one therefore needs to allow for some small but non-zero error.

In this work, we use a machine learning approach to train a universal quantum circuit capable of implementing any quantum measurement with four possible outcomes $m_{i_{2}i_{1}}$, where i₁, i₂ ∈ {0, 1} are the measurement outcomes of the first and the second qubit, respectively. The parameterization of this circuit is discussed in Appendix 1. By assigning the outputs m₀₀ and m₁₀ to the input ρ₁(a), the output m₀₁ to the input ρ₂, and the output m₁₁ to the inconclusive outcome, this circuit acts as a discriminator for our experimental datasets. We can therefore directly define the success probability P_suc, the error probability P_err, and the inconclusive probability P_inc with respect to input (training) data with known class labels. For example, when ρ₁ is the input, the probability of detecting m₀₁ is P_err, and the probability of detecting m₁₁ is P_inc. In this work, we perform experiments on simulated quantum computers, where these probabilities are available exactly because the whole state is stored and processed on a classical computer. On real quantum computers, these probabilities need to be estimated to finite precision through repeated measurements, which requires repeated preparation of the input data.
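A minimal sketch of the outcome bookkeeping described above, assuming the four outcome probabilities are already available (exactly in simulation, or estimated from repeated measurements). The function `classify_rates` and the dictionary keys are hypothetical names; only the assignment of outcomes to classes is taken from the text.

```python
def classify_rates(probs, label):
    """Return (P_suc, P_err, P_inc) for an input with known class label (1 or 2).

    probs maps the outcome labels "m00", "m01", "m10", "m11" to probabilities.
    Assignment from the text: m00 and m10 -> class 1, m01 -> class 2,
    m11 -> inconclusive.
    """
    p_class1 = probs["m00"] + probs["m10"]
    p_class2 = probs["m01"]
    p_inc = probs["m11"]
    if label == 1:
        # For a rho_1 input, detecting m01 is the error.
        return p_class1, p_class2, p_inc
    if label == 2:
        # For a rho_2 input, detecting m00 or m10 is the error.
        return p_class2, p_class1, p_inc
    raise ValueError("label must be 1 or 2")

example = {"m00": 0.5, "m10": 0.2, "m01": 0.1, "m11": 0.2}
print(classify_rates(example, 1))
print(classify_rates(example, 2))
```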

To train the circuit, we use the heuristically motivated loss function of Eq. 4, which averages the absolute differences between the desired and the measured probabilities. Its hyperparameters α_err and α_inc balance erroneous outcomes against inconclusive outcomes:

$$ \begin{array}{@{}rcl@{}} J &=& \underset{i}{\sum} \frac{1}{|S_{i}|} \underset{a_{i}\in S_{i}}{\sum} \left| P_{\text{suc}}(\rho_{i}(a_{i})) - 1 \right| \\ && + \alpha_{\text{err}} \underset{i}{\sum} \frac{1}{|S_{i}|} \underset{a_{i}\in S_{i}}{\sum} \left| P_{\text{err}}(\rho_{i}(a_{i})) - 0 \right| \\ && + \alpha_{\text{inc}} \underset{i}{\sum} \frac{1}{|S_{i}|} \underset{a_{i}\in S_{i}}{\sum} \left| P_{\text{inc}}(\rho_{i}(a_{i})) - 0 \right| . \end{array} $$

(4)

Here, we assume that for each family of quantum states i, we are given a set S_i of training samples. We denote by |S_i| the cardinality of this set, i.e., the number of samples in S_i; α_err is the penalty for erroneous outcomes, and α_inc is the penalty for inconclusive outcomes. P_suc(ρ_i), P_err(ρ_i), and P_inc(ρ_i) are the probabilities of a correct, erroneous, and inconclusive measurement outcome, respectively, for the specific input quantum data ρ_i. This loss function measures the performance of our quantum circuit as a minimal-error discriminator (when α_err < α_inc) or as an unambiguous discriminator (when α_err > α_inc).
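The loss of Eq. 4 can be written directly in NumPy. This is a minimal sketch assuming the per-sample rates (P_suc, P_err, P_inc) have already been computed for every training sample; the function name `loss_J` and the input layout are illustrative choices.

```python
import numpy as np

def loss_J(rates_per_class, alpha_err, alpha_inc):
    """Loss of Eq. 4.

    rates_per_class maps each class label i to a sequence of shape (|S_i|, 3)
    holding (P_suc, P_err, P_inc) for every training sample in S_i.
    """
    total = 0.0
    for rates in rates_per_class.values():
        rates = np.asarray(rates, dtype=float)
        p_suc, p_err, p_inc = rates[:, 0], rates[:, 1], rates[:, 2]
        # Each term is an average over the |S_i| samples of class i.
        total += np.mean(np.abs(p_suc - 1.0))
        total += alpha_err * np.mean(np.abs(p_err - 0.0))
        total += alpha_inc * np.mean(np.abs(p_inc - 0.0))
    return float(total)

class_rates = {
    1: [(0.7, 0.1, 0.2), (0.5, 0.0, 0.5)],
    2: [(0.6, 0.2, 0.2)],
}
J = loss_J(class_rates, alpha_err=2.0, alpha_inc=0.5)
print(J)
```

Because every probability lies in [0, 1], |P_suc − 1| = 1 − P_suc and |P_err − 0| = P_err, so J reduces to a weighted linear combination of the averaged rates.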

To train this circuit, we use the Adam optimization algorithm (Kingma and Ba 2014), and we calculate the gradients using the forward difference formula.
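The optimizer setup can be sketched as plain Adam (with the default hyperparameters β₁ = 0.9, β₂ = 0.999, ε = 10⁻⁸ suggested by Kingma and Ba) driven by forward-difference gradients. The learning rate, the step size h, and the quadratic toy objective standing in for the circuit loss J are assumptions for illustration, not values used in this work.

```python
import numpy as np

def forward_difference_grad(f, theta, h=1e-4):
    """Gradient estimate df/dtheta_k ~ (f(theta + h e_k) - f(theta)) / h."""
    f0 = f(theta)
    grad = np.zeros_like(theta)
    for k in range(theta.size):
        shifted = theta.copy()
        shifted[k] += h
        grad[k] = (f(shifted) - f0) / h
    return grad

def adam_minimize(f, theta0, lr=0.02, steps=2000, beta1=0.9, beta2=0.999,
                  eps=1e-8, h=1e-4):
    """Adam (Kingma and Ba 2014) driven by forward-difference gradients."""
    theta = np.array(theta0, dtype=float)
    m = np.zeros_like(theta)  # first-moment estimate
    v = np.zeros_like(theta)  # second-moment estimate
    for t in range(1, steps + 1):
        g = forward_difference_grad(f, theta, h)
        m = beta1 * m + (1 - beta1) * g
        v = beta2 * v + (1 - beta2) * g**2
        m_hat = m / (1 - beta1**t)  # bias corrections
        v_hat = v / (1 - beta2**t)
        theta -= lr * m_hat / (np.sqrt(v_hat) + eps)
    return theta

# Toy quadratic objective standing in for the circuit loss J(theta).
target = np.array([0.3, -1.2, 2.0])

def toy_loss(theta):
    return float(np.sum((theta - target) ** 2))

theta_opt = adam_minimize(toy_loss, np.zeros(3))
print(theta_opt, toy_loss(theta_opt))
```

Each gradient evaluation costs one extra loss evaluation per parameter, which on hardware translates into additional circuit runs.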

For our specific problem of classifying ρ₁ and ρ₂ as defined in Eqs. 2 and 3, we define in Eq. 5 an additional set of aggregate success/error/inconclusive rates to summarize and compare the performance of different training runs:

$$ \begin{array}{@{}rcl@{}} P_{s} &=& \frac{1}{3} P_{s}(\rho_{1})_{\text{avg}} + \frac{2}{3} P_{s}(\rho_{2})_{\text{avg}} \\ &=&\frac{1}{3} P_{s}(\psi_{1})_{\text{avg}} + \frac{1}{3} P_{s}(\psi_{2})_{\text{avg}} + \frac{1}{3} P_{s}(\psi_{3})_{\text{avg}}, \end{array} $$

(5)

where s stands for suc (successful), err (erroneous), or inc (inconclusive). The subscript avg indicates that the probabilities are averaged over all samples of either the training set or the test set (but not both). The second line follows because ρ₂ is an equal mixture of ψ₂ and ψ₃ and the outcome probabilities are linear in the state. The weights $\frac{1}{3}$ and $\frac{2}{3}$ in Eq. 5 were chosen to be consistent with the results in Mohseni et al. (2004).
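A short sketch of Eq. 5, together with a numerical check of its second line: because ρ₂(b) is an equal mixture of ψ₂ and ψ₃ and Tr(Πρ) is linear in ρ, the weight 2/3 on ρ₂ splits into 1/3 on each of ψ₂ and ψ₃. The random POVM element and the function name `aggregate_rate` are illustrative assumptions.

```python
import numpy as np

def aggregate_rate(per_sample_rho1, per_sample_rho2):
    """Eq. 5: P_s = 1/3 <P_s(rho_1)>_avg + 2/3 <P_s(rho_2)>_avg.

    Each argument holds one rate (success, error, or inconclusive) per sample
    drawn from a single data split (training or test, not both).
    """
    return np.mean(per_sample_rho1) / 3 + 2 * np.mean(per_sample_rho2) / 3

# Check of the second line of Eq. 5 for one sample rho_2(b):
# Tr(Pi rho_2) = 1/2 <psi_2|Pi|psi_2> + 1/2 <psi_3|Pi|psi_3> for any element Pi.
rng = np.random.default_rng(2)
x = rng.normal(size=(4, 4))
pi = x @ x.T / np.trace(x @ x.T)  # arbitrary POVM element with 0 <= Pi <= I
b = 0.6
psi2 = np.array([0.0, np.sqrt(1 - b**2), b, 0.0])
psi3 = np.array([0.0, -np.sqrt(1 - b**2), b, 0.0])
rho2 = 0.5 * np.outer(psi2, psi2) + 0.5 * np.outer(psi3, psi3)
lhs = np.trace(pi @ rho2)
rhs = 0.5 * (psi2 @ pi @ psi2) + 0.5 * (psi3 @ pi @ psi3)
print(aggregate_rate([0.9, 0.8], [0.6, 0.7, 0.5]), lhs, rhs)
```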

3 Theoretical analysis

Here, we describe a theoretical result against which we compare our numerical results. In the general case, assume we have a family (or class) of quantum data ρ(a), each parameterized by a and occurring with probability P(a). Assume in addition that we have a quantum measurement described by a POVM with elements $\{{\varPi }_{i}\}_{i\in \mathbb {N}}$, where i labels the different measurement outcomes. Then, the probability of detecting measurement outcome i, averaged over the input data ρ(a), is:

$$ \begin{array}{@{}rcl@{}} \int \text{Tr}({\varPi}_{i} \rho(a)) P(a)\mathrm{d}a &=& \text{Tr}\left[\int {\varPi}_{i}\rho(a) P(a)\mathrm{d}a \right] \\ &=& \text{Tr}\left[{\varPi}_{i}\int\rho(a) P(a)\mathrm{d}a\right] \\ &=& \text{Tr}\left[{\varPi}_{i} \rho\right], \end{array} $$

(6)

where $\rho = \int \rho(a) P(a)\,\mathrm{d}a$ and the integration of the matrix is performed element-wise. Therefore, if $\text{Tr}({\varPi}_{i}\rho) = 0$ for some i, then $\int_{D} \text{Tr}({\varPi}_{i} \rho(a)) P(a)\,\mathrm{d}a = 0$ for any subset D with non-zero measure in the parameter space of a. This is because $\text{Tr}[{\varPi}_{i}\rho(a)]P(a) \ge 0$ for every parameter a, as both ${\varPi}_{i}$ and ρ(a) are positive semi-definite: a non-negative integrand whose total integral vanishes has a vanishing integral over every subset.

The analysis above shows that the problem of unambiguously discriminating $\rho_1=\int_a \rho_1(a) P_1(a)\,\mathrm{d}a$ and $\rho_2=\int_b \rho_2(b) P_2(b)\,\mathrm{d}b$ is equivalent to the problem of unambiguously discriminating the family ρ₁(a), ∀a, from the family ρ₂(b), ∀b, where P₁(a) and P₂(b) are the probabilities of occurrence of ρ₁(a) and ρ₂(b), respectively. That is, if {Π₁, Π₂, Π_inc} is a POVM, with Π_inc corresponding to the inconclusive outcome, that unambiguously classifies all members of the two families ρ₁(a) and ρ₂(b) for all possible parameters, i.e.,

$$ \begin{array}{@{}rcl@{}} &\text{Tr}({\varPi}_{2}\rho_{1}(a)) = 0, \forall a,\\ &\text{Tr}({\varPi}_{1}\rho_{2}(b)) = 0, \forall b, \end{array} $$

then Tr(Π₁ρ₂) = Tr(Π₂ρ₁) = 0, and vice versa (up to parameter values that occur with zero probability). Using this formalism and the results of Raynal et al. (2003) and Barnett and Croke (2009), we can analyze the different cases described in Table 1 theoretically; the results are displayed in Table 2. Note that these are average-case success probabilities.
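The following sketch checks Eq. 6 and the zero-error conditions numerically for the families of Eqs. 2 and 3. It assumes hypothetical uniform distributions for a and b and replaces the integrals by sample averages. The measurement Π₁ = |e₀⟩⟨e₀|, Π₂ = |e₁⟩⟨e₁|, Π_inc = 𝟙 − Π₁ − Π₂ (with e_k the k-th basis vector in the component ordering of Eqs. 2 and 3) is one example that satisfies the zero-error conditions; it is not claimed to be the optimal measurement behind Table 2.

```python
import numpy as np

def rho1(a):
    """Family 1, Eq. 2."""
    v = np.array([np.sqrt(1 - a**2), 0.0, a, 0.0])
    return np.outer(v, v)

def rho2(b):
    """Family 2, Eq. 3: equal mixture of psi_2 and psi_3."""
    v2 = np.array([0.0, np.sqrt(1 - b**2), b, 0.0])
    v3 = np.array([0.0, -np.sqrt(1 - b**2), b, 0.0])
    return 0.5 * np.outer(v2, v2) + 0.5 * np.outer(v3, v3)

rng = np.random.default_rng(3)
a_samples = rng.uniform(0.0, 1.0, 500)  # hypothetical distribution of a
b_samples = rng.uniform(0.0, 1.0, 500)  # hypothetical distribution of b
# Sample averages stand in for the integrals defining rho_1 and rho_2.
rho1_bar = np.mean([rho1(a) for a in a_samples], axis=0)
rho2_bar = np.mean([rho2(b) for b in b_samples], axis=0)

# Eq. 6: the average of Tr(Pi rho(a)) equals Tr(Pi rho_bar) for any POVM element.
x = rng.normal(size=(4, 4))
pi_any = x @ x.T / np.trace(x @ x.T)  # arbitrary element with 0 <= Pi <= I
avg_of_traces = np.mean([np.trace(pi_any @ rho1(a)) for a in a_samples])
trace_of_avg = np.trace(pi_any @ rho1_bar)

# One zero-error measurement for these families (not claimed optimal):
# Pi_1 = |e0><e0| never fires on rho_2(b), Pi_2 = |e1><e1| never fires on rho_1(a).
basis = np.eye(4)
pi1 = np.outer(basis[0], basis[0])
pi2 = np.outer(basis[1], basis[1])
pi_inc = basis - pi1 - pi2
print(avg_of_traces, trace_of_avg)
print(np.trace(pi2 @ rho1_bar), np.trace(pi1 @ rho2_bar))
```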

Table 2 A summary of the maximal success rates at an exactly zero error rate for the different test cases classified in this work


4 Numerical results

In this work, we aim to train a universal discriminator to discriminate different families of quantum data. Here, we present the results of training the universal discriminator on the different distributions summarized in Table 1. The training is performed on a simulated quantum computer, i.e., by simulating the evolution of the quantum system under the parameterized circuits on a classical computer. To balance eliminating the error rate (P_err) against minimizing the inconclusive rate (P_inc), we use the following training strategy. We first prioritize a small inconclusive rate by starting with zero penalty for erroneous outcomes (α_inc > α_err = 0), and then increase α_err stepwise until a target error rate is achieved. Similar optimization procedures have been used in the context of variational auto-encoders, both in classical machine learning (Sønderby et al. 2016) and in quantum machine learning applications (Rocchetto et al. 2018). Using this scheme, we train our circuit to unambiguously discriminate the two families of quantum states and observe convergence toward the theoretical success rates obtained in Section 3 as the amount of training data increases. Notably, we do not observe any signs of overfitting across the different training-set sizes (Fig. 1a).
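The step-wise penalty schedule can be sketched as a small control loop. The callback `train_fn`, the increment, the default α_inc, and the toy error model in the example are hypothetical stand-ins for a full training run; only the schedule itself (α_err starting at 0 and increased stepwise until a target error rate is met) follows the text.

```python
def train_with_error_schedule(train_fn, alpha_inc=1.0, alpha_step=0.5,
                              target_err=0.01, max_rounds=50):
    """Start with alpha_err = 0 (only inconclusive outcomes penalized) and
    raise alpha_err in fixed increments until P_err <= target_err.

    train_fn(alpha_err, alpha_inc, params) is a hypothetical callback that
    continues training from params and returns (params, P_err).
    """
    alpha_err, params, history = 0.0, None, []
    for _ in range(max_rounds):
        params, p_err = train_fn(alpha_err, alpha_inc, params)
        history.append((alpha_err, p_err))
        if p_err <= target_err:
            break
        alpha_err += alpha_step
    return params, history

def toy_train(alpha_err, alpha_inc, params):
    """Stand-in for a real training run: the error rate shrinks as alpha_err grows."""
    return params, 0.2 / (1.0 + 4.0 * alpha_err)

params, history = train_with_error_schedule(toy_train)
for alpha_err, p_err in history:
    print(f"alpha_err={alpha_err:.1f}  P_err={p_err:.4f}")
```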

Trade-off between the error rate and the inconclusive rate

Here, we show that our model can obtain a much higher success rate (P_suc) if we allow a slightly higher error rate than in the previous results. This points to a trade-off between the error rate (P_err) and the inconclusive rate (P_inc) that can be exploited in real-world applications.

Specifically, for the dataset “Case 4” in Table 1, we fix the two penalties α_err and α_inc during each training run and observe a gradual transition from unambiguous-like classification (characterized by a near-zero error probability) to minimal-error-like classification (characterized by near-zero inconclusiveness) as the penalties are varied across different training runs with random initializations (Fig. 2a–c). Allowing a small error rate then results in a much higher success rate, a regime not covered by the zero-error theoretical analysis of Section 3. We note that introducing the penalty terms α_err and α_inc also makes the training process more stable (Fig. 2a). The hyperparameters α_err and α_inc therefore act as a form of regularization and can be adjusted to give a higher success probability or a lower inconclusiveness rate for the final model (Fig. 2).

Furthermore, similar trade-off effects are exhibited by all datasets listed in Table 1. If we stop the training once the error rate drops below 0.01, we can achieve a much higher success rate than the theoretical maximum for an exactly zero error rate (Fig. 3).

5 Learning convergence from ensemble measurements

We additionally perform experiments in which we estimate the probabilities from repeated measurements on the (simulated) quantum device. We find that the noise in the gradient calculation caused by these estimated probabilities can be effectively countered by increasing the number of repeated measurements, using a lower error rate, and adjusting the step size in the forward difference formula. A detailed discussion is given in the Appendix. Our approach therefore appears feasible on error-corrected quantum devices. We leave the effects of machine noise (the noise caused by imperfect quantum devices) and an actual implementation to future work.
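As a rough illustration of why more repetitions help, the sketch below assumes that each shot is an independent Bernoulli trial, so that a probability estimated from N shots has standard deviation σ = √(p(1 − p)/N). A forward-difference gradient built from two independent estimates then has a standard deviation of about √2·σ/h, which grows as the step size h shrinks. The values of p, h, and the number of trials are arbitrary assumptions.

```python
import numpy as np

def estimate_probability(p, shots, rng):
    """Estimate an outcome probability p from `shots` independent measurements."""
    return rng.binomial(shots, p) / shots

def empirical_std(p, shots, trials, rng):
    """Spread of the estimate over many independent repetitions."""
    estimates = [estimate_probability(p, shots, rng) for _ in range(trials)]
    return float(np.std(estimates))

rng = np.random.default_rng(4)
p_true, h = 0.3, 0.01
for shots in (100, 10_000):
    sigma = np.sqrt(p_true * (1 - p_true) / shots)  # binomial prediction
    measured = empirical_std(p_true, shots, 2000, rng)
    # A forward difference (f(theta + h) - f(theta)) / h built from two
    # independent estimates has standard deviation of about sqrt(2) * sigma / h.
    print(f"N={shots}: std={measured:.4f} (predicted {sigma:.4f}), "
          f"forward-difference noise ~ {np.sqrt(2) * sigma / h:.3f}")
```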

6 Conclusions

We have developed a quantum circuit learning approach for the classification of quantum data. Specifically, we have designed a heuristically motivated loss function and used the stochastic optimization algorithm Adam in a quantum-classical hybrid scheme to train a circuit to perform quantum state discrimination. The trained circuit generalizes well to new data, i.e., to states from the parameter range that were not seen during training. This distinguishes our work from previous results on quantum circuit learning, in particular the very recent study by Fanizza et al. (2018), which only optimizes circuits for specific inputs. That prior work therefore does not consider generalization and hence does not treat the actual learning problem, which aims at generalization as well as optimization.

In our work, we observe a trade-off between the error rates and the inconclusive rates when we penalize them differently in the loss function. Although these experiments were performed on simulated quantum computers where exact measurement probabilities are available, we show that the optimization can also be performed with probabilities estimated from repeated measurements of the quantum states. Recent quantum methods for estimating the analytical gradient by shifting the gate parameters (Mitarai et al. 2018) can be applied directly to training our circuits, so the optimization can be performed efficiently on near-term quantum devices. Finally, although the Adam optimization algorithm proved sufficient for the experiments in this paper, several optimization algorithms tailored to variational hybrid quantum-classical algorithms have been proposed and may provide improvements in more complicated cases (see, for example, Kübler et al. (2019)).
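For comparison with the forward-difference formula used in this work, the sketch below applies a parameter-shift rule to a single-qubit toy circuit. It assumes a gate RY(θ) = exp(−iθY/2) acting on |0⟩ and measured in Z, for which [f(θ + π/2) − f(θ − π/2)]/2 gives the exact derivative; this illustrates the general idea and is not a reproduction of the method of Mitarai et al. (2018).

```python
import numpy as np

def expectation_z(theta):
    """<Z> after applying RY(theta) = exp(-i theta Y / 2) to |0>; equals cos(theta)."""
    ry = np.array([[np.cos(theta / 2), -np.sin(theta / 2)],
                   [np.sin(theta / 2), np.cos(theta / 2)]])
    state = ry @ np.array([1.0, 0.0])
    pauli_z = np.diag([1.0, -1.0])
    return float(state @ pauli_z @ state)

def parameter_shift_grad(f, theta):
    """Shift rule for gates exp(-i theta P / 2) with P a Pauli operator:
    df/dtheta = [f(theta + pi/2) - f(theta - pi/2)] / 2."""
    return 0.5 * (f(theta + np.pi / 2) - f(theta - np.pi / 2))

def forward_difference(f, theta, h=1e-4):
    """Forward-difference estimate (f(theta + h) - f(theta)) / h."""
    return (f(theta + h) - f(theta)) / h

theta = 0.7
exact = -np.sin(theta)
print(parameter_shift_grad(expectation_z, theta),
      forward_difference(expectation_z, theta), exact)
```

Unlike the forward difference, the shift rule uses finite parameter shifts, so its estimate is not amplified by a small step size when the function values carry shot noise.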

In this work, we have not addressed the scalability of classifying quantum states. However, we expect that most kinds of quantum data of interest will require only polynomial-depth circuits for their classification. For example, it is likely that an ansatz based on tensor networks (e.g., Grant et al. (2018) and Cong et al. (2019)) can classify the different phases of ground states of quantum many-body systems in polynomial depth. Moreover, a scheme that systematically increases the depth of the ansatz circuit could help determine the circuit depth required for classifying quantum data. A similar idea has been explored in the context of the variational quantum eigensolver (Ostaszewski et al. 2019).

We believe that, with progress in technologies for preserving and transporting quantum states, the trained discriminative quantum circuits introduced here will find many applications. Quantum state discrimination itself plays a key role in quantum information processing protocols and is used in quantum cryptography (Bennett 1992a), quantum cloning (Duan and Guo 1998), quantum state separation, and entanglement concentration (Chefles 2000). Our work can provide improvements in these traditional areas by producing a classifier that is resilient to the statistical noise found in actual communication. For example, one could consider an improved version of the B92 quantum key distribution protocol (Bennett 1992b) that includes noise-induced randomness in its two nonorthogonal signal states and classifies them with our discriminative circuit. Furthermore, one could train a discriminative quantum circuit for use in quantum repeaters and state purification units within quantum communication networks. The training could use quantum data carrying noise specific to the communication network and thereby produce a discriminator that recognizes and filters this noise to provide better performance. Our discriminator can also be used to verify the output of other generative models, such as the quantum version of Boltzmann machines (Amin et al. 2018) or generative adversarial networks (Goodfellow et al. 2014; Lloyd and Weedbrook 2018).

Notes

This assumes that the cost function follows a normal distribution with a standard deviation of order $\frac{1}{\sqrt{N}}$ (i.e., a variance of order $\frac{1}{N}$), where N is the number of measurements made in each run to calculate the cost function.

References

Amin MH, Andriyash E, Rolfe J, Kulchytskyy B, Melko R (2018) Quantum Boltzmann machine. Physical Review X 8(2):021050. https://doi.org/10.1103/physrevx.8.021050
Banchi L, Pancotti N, Bose S (2016) Quantum gate learning in qubit networks: Toffoli gate without time-dependent control. Npj Quantum Inf 2:16019
Barnett SM, Croke S (2009) Quantum state discrimination. Adv Opt Photonics 1(2):238. https://www.osapublishing.org/aop/abstract.cfm?uri=aop-1-2-238
Barzanjeh S, Guha S, Weedbrook C, Vitali D, Shapiro JH, Pirandola S (2015) Microwave quantum illumination. Phys Rev Lett 114(8):080503. https://doi.org/10.1103/physrevlett.114.080503
Bennett CH (1992a) Quantum cryptography using any two nonorthogonal states. Phys Rev Lett 68(21):3121
Bennett CH (1992b) Quantum cryptography using any two nonorthogonal states. Phys Rev Lett 68(21):3121–3124. https://doi.org/10.1103/physrevlett.68.3121
Biamonte J, Wittek P, Pancotti N, Rebentrost P, Wiebe N, Lloyd S (2017) Quantum machine learning. Nature 549(7671):195–202. https://doi.org/10.1038/nature23474
Chefles A (2000) Quantum state discrimination. Contemp Phys 41(6):401–424
Ciliberto C, Herbster M, Ialongo AD, Pontil M, Rocchetto A, Severini S, Wossnig L (2018) Quantum machine learning: a classical perspective. Proc R Soc A 474(2209):20170551
Cong I, Choi S, Lukin MD (2019) Quantum convolutional neural networks. Nat Phys 15 (12):1273–1278. https://doi.org/10.1038/s41567-019-0648-8
Degen C, Reinhard F, Cappellaro P (2017) Quantum sensing. Rev Mod Phys 89(3):035002. https://doi.org/10.1103/revmodphys.89.035002
Duan LM, Guo GC (1998) Probabilistic cloning and identification of linearly independent quantum states. Phys Rev Lett 80(22):4999
Fanizza M, Mari A, Giovannetti V (2018) Optimal universal learning machines for quantum state discrimination. arXiv:180503477
Farhi E, Neven H (2018) Classification with quantum neural networks on near term processors. arXiv:180206002
Goodfellow I, Pouget-Abadie J, Mirza M, Xu B, Warde-Farley D, Ozair S, Courville A, Bengio Y (2014) Generative adversarial nets. In: Advances in neural information processing systems, pp 2672–2680
Grant E, Benedetti M, Cao S, Hallam A, Lockhart J, Stojevic V, Green AG, Severini S (2018) Hierarchical quantum classifiers. arXiv:180403680
Innocenti L, Banchi L, Ferraro A, Bose S, Paternostro M (2018) Supervised learning of time-independent Hamiltonians for gate design. arXiv:180307119
Iten R, Colbeck R, Christandl M (2016) Quantum circuits for quantum channels. Phys Rev A Atom Mol Opt Phys 93(3):052316. https://doi.org/10.1103/PhysRevA.95.052316, arXiv:1609.08103
Iten R, Colbeck R, Kukuljan I, Home J, Christandl M (2015) Quantum circuits for isometries. Physical Review A - Atomic, Molecular, and Optical Physics. https://doi.org/10.1103/PhysRevA.93.032318. arXiv:1501.06911
Karzig T, Knapp C, Lutchyn RM, Bonderson P, Hastings MB, Nayak C, Alicea J, Flensberg K, Plugge S, Oreg Y, Marcus CM, Freedman MH (2017) Scalable designs for quasiparticle-poisoning-protected topological quantum computation with Majorana zero modes. Phys Rev B 95(23):235305. https://doi.org/10.1103/physrevb.95.235305
Khatri S, LaRose R, Poremba A, Cincio L, Sornborger AT, Coles PJ (2019) Quantum-assisted quantum compiling. Quantum 3:140. https://doi.org/10.22331/q-2019-05-13-140
Kimble HJ (2008) The quantum Internet. Nature 453(7198):1023–1030. https://doi.org/10.1038/nature07127
Kingma DP, Ba J (2014) Adam: a method for stochastic optimization. arXiv:1412.6980
Kübler JM, Arrasmith A, Cincio L, Coles PJ (2019) An adaptive optimizer for measurement-frugal variational algorithms. arXiv:1909.09083
Li Y, Benjamin SC (2017) Efficient variational quantum simulator incorporating active error minimization. Phys Rev X 7(2):021050
Lloyd S, Weedbrook C (2018) Quantum generative adversarial learning. Phys Rev Lett 121(4):040502. https://doi.org/10.1103/physrevlett.121.040502
Mitarai K, Negoro M, Kitagawa M, Fujii K (2018) Quantum circuit learning. arXiv:1803.00745
Mohseni M, Steinberg AM, Bergou JA (2004) Optical realization of optimal unambiguous discrimination for pure and mixed quantum states. Phys Rev Lett 93(20):200403. https://doi.org/10.1103/PhysRevLett.93.200403. arXiv:quant-ph/0401002
Ostaszewski M, Grant E, Benedetti M (2019) Quantum circuit structure learning. arXiv:1905.09692
Qi XL, Zhang SC (2011) Topological insulators and superconductors. Rev Mod Phys 83(4):1057–1110. https://doi.org/10.1103/revmodphys.83.1057
Raynal P, Lütkenhaus N, van Enk SJ (2003) Reduction theorems for optimal unambiguous state discrimination of density matrices. Phys Rev A 68:022308. https://doi.org/10.1103/PhysRevA.68.022308. arXiv:quant-ph/0304179
Ren JG, Xu P, Yong HL, Zhang L, Liao SK, Yin J, Liu WY, Cai WQ, Yang M, Li L, Yang KX, Han X, Yao YQ, Li J, Wu HY, Wan S, Liu L, Liu DQ, Kuang YW, He ZP, Shang P, Guo C, Zheng RH, Tian K, Zhu ZC, Liu NL, Lu CY, Shu R, Chen YA, Peng CZ, Wang JY, Pan JW (2017) Ground-to-satellite quantum teleportation. Nature 549(7670):70–73. https://doi.org/10.1038/nature23675
Rocchetto A, Grant E, Strelchuk S, Carleo G, Severini S (2018) Learning hard quantum distributions with variational autoencoders. npj Quantum Inf 4(1). https://doi.org/10.1038/s41534-018-0077-z. arXiv:1710.00725
Romero J, Olson JP, Aspuru-Guzik A (2017) Quantum autoencoders for efficient compression of quantum data. Quantum Sci Technol 2(4):045001
Schaller G, Schützhold R (2006) Quantum algorithm for optical-template recognition with noise filtering. Phys Rev A 74(1):012303. https://doi.org/10.1103/physreva.74.012303
Schuld M, Bocharov A, Svore K, Wiebe N (2018) Circuit-centric quantum classifiers. arXiv:1804.00633
Shende VV, Bullock SS, Markov IL (2006) Synthesis of quantum logic circuits. IEEE Trans Comput-Aided Des Integr Circuits Syst 25(6):1000–1010. https://doi.org/10.1109/TCAD.2005.855930. arXiv:quant-ph/0406176
Shende VV, Markov IL, Bullock SS (2004) Smaller two-qubit circuits for quantum communication and computation. In: Proceedings - Design automation and test in Europe conference and exhibition, vol 2, pp 980–985. https://doi.org/10.1109/DATE.2004.1269020
Sønderby CK, Raiko T, Maaløe L, Sønderby SK, Winther O (2016) Ladder variational autoencoders. arXiv:1602.02282
Verdon G, Broughton M, Biamonte J (2017) A quantum algorithm to train neural networks using low-depth circuits. arXiv:1712.05304
Wan KH, Dahlsten O, Kristjánsson H, Gardner R, Kim M (2017) Quantum generalisation of feedforward neural networks. npj Quantum Inf 3(1):36
Xu X, Sun J, Endo S, Li Y, Benjamin SC, Yuan X (2019) Variational algorithms for linear algebra. arXiv:1909.03898

Acknowledgments

We thank Raban Iten, Oliver Reardon-Smith, and Roger Colbeck for valuable insights into parametrizing general measurement circuits, and Jarrod McClean for feedback on the manuscript. This project acknowledges the use of the EPSRC-funded Tier 2 facility JADE and of the UCL Legion High Performance Computing Facility (Legion@UCL), and associated support services, in the completion of this work. This work was carried out while L.W. and S.S. participated in the workshop Measurement and Control of Quantum Systems at the Institut Henri Poincaré, and we gratefully acknowledge the associated financial support.

Funding

L.W. is supported by the Royal Society. S.S. is supported by the Royal Society, EPSRC, the National Natural Science Foundation of China, and the grant ARO-MURI W911NF-17-1-0304 (US DOD, UK MOD and UK EPSRC under the Multidisciplinary University Research Initiative).

Author information

Authors and Affiliations

Department of Physics & Astronomy, University College London, London, UK
H. Chen
Department of Computer Science, University College London, London, UK
H. Chen, L. Wossnig & S. Severini
Rahko Limited, The Coalface, Clifton House, 46 Clifton Terrace, Finsbury Park, N4 3JP, London, UK
H. Chen & L. Wossnig
Institute of Natural Sciences, Shanghai Jiao Tong University, Shanghai, China
S. Severini
Google Quantum AI Laboratory, Venice, CA, USA
H. Neven & M. Mohseni

Corresponding author

Correspondence to L. Wossnig.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendices

Appendix 1: Quantum circuits for POVMs

This section describes the parametrization of a circuit capable of performing any quantum measurement on 2-qubit inputs with 4 possible measurement outcomes. The circuit is represented by the following diagram:

[Circuit diagram (7)]

1.1 Cosine-sine decomposition

Here we recall the cosine-sine decomposition of unitary matrices, which is used repeatedly in the following sections. Every unitary matrix $U_n\in \mathbb{C}^{2^n\times 2^n}$ can be decomposed as:

$$ \begin{array}{@{}rcl@{}} U_n = \left( \begin{array}{ll} A_0 & 0 \\ 0 & A_1 \end{array}\right) \left( \begin{array}{ll} C & -S \\ S & C \end{array}\right) \left( \begin{array}{ll} B_0 & 0 \\ 0 & B_1 \end{array}\right) \end{array} $$

(8)

where $A_0, A_1, B_0, B_1$ are unitary matrices of size $2^{n-1}\times 2^{n-1}$, and $C$ and $S$ are real diagonal matrices of size $2^{n-1}\times 2^{n-1}$ satisfying $C^2 + S^2 = I$. In circuit form, this reads as the following equivalence:

[Circuit diagram (9)]

Here a box denotes the control part of a uniformly controlled gate; see Section IV of Iten et al. (2015) for details. In the circuit of Eq. 7 the first qubit is initialized to |0⟩, so we have:

[Circuit diagram (10)]
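The cosine-sine decomposition of Eq. 8 can be checked numerically. The following is a minimal sketch, assuming NumPy and SciPy ≥ 1.5 (which provides `scipy.linalg.cossin`); the helpers `random_unitary`, `csd`, and `reassemble` are hypothetical names introduced here for illustration, not the authors' code. With equal block sizes, `cossin` returns the middle factor in the $\begin{pmatrix} C & -S \\ S & C \end{pmatrix}$ form used above.

```python
import numpy as np
from scipy.linalg import cossin  # requires SciPy >= 1.5


def random_unitary(dim, seed=0):
    """Haar-random unitary: QR of a complex Gaussian matrix with phase correction."""
    rng = np.random.default_rng(seed)
    z = (rng.normal(size=(dim, dim)) + 1j * rng.normal(size=(dim, dim))) / np.sqrt(2)
    q, r = np.linalg.qr(z)
    phases = np.diag(r) / np.abs(np.diag(r))
    return q * phases  # multiply column j of q by phases[j]


def csd(U):
    """Split U_n into A0, A1, C, S, B0, B1 as in Eq. 8 (blocks of size 2^(n-1))."""
    half = U.shape[0] // 2
    left, cs, right = cossin(U, p=half, q=half)  # U = left @ cs @ right
    A0, A1 = left[:half, :half], left[half:, half:]
    B0, B1 = right[:half, :half], right[half:, half:]
    C, S = cs[:half, :half], cs[half:, :half]
    return A0, A1, C, S, B0, B1


def reassemble(A0, A1, C, S, B0, B1):
    """Multiply the three factors of Eq. 8 back together."""
    Z = np.zeros_like(A0)
    A = np.block([[A0, Z], [Z, A1]])
    CS = np.block([[C, -S], [S, C]])
    B = np.block([[B0, Z], [Z, B1]])
    return A @ CS @ B


U = random_unitary(4, seed=1)  # a two-qubit unitary (n = 2)
A0, A1, C, S, B0, B1 = csd(U)
print(np.allclose(reassemble(A0, A1, C, S, B0, B1), U))  # True
print(np.allclose(C @ C + S @ S, np.eye(2)))              # True
```

Each of $A_0, A_1, B_0, B_1$ is itself unitary, so the same call can be applied to them in turn.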

1.2 Decomposition of the circuit in Eq. 7

For a general measurement giving at most 4 measurement outcomes, we have the following circuit representation:

[Circuit diagram (11)]

Using the circuit equivalence on page 5 of Iten et al. (2016), the first $V$ can be decomposed into:

where the $R$ gate does not act on the second qubit. Applying the cosine-sine decomposition gives:

The uniformly controlled $V'$ and $U$ can be merged and placed after the measurement $M_1$ as:

The first line of the circuit can then be merged with the second line as follows:

[Circuit diagram (12)]

We then apply the cosine-sine decomposition to $V''$. Discarding the last gate acting on the third and fourth qubits, we obtain:

[Circuit diagram (13)]

The uniformly controlled rotations and the remaining two-qubit unitary gates can be parametrized with CNOTs and single-qubit rotations; see, for example, Shende et al. (2006) and Shende et al. (2004).
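As a small illustration of how a uniformly controlled rotation reduces to CNOTs and single-qubit rotations, consider $n = 2$. The middle factor of Eq. 8 with $C = \mathrm{diag}(\cos\theta_0, \cos\theta_1)$ and $S = \mathrm{diag}(\sin\theta_0, \sin\theta_1)$ applies $R_y(2\theta_j)$ to the first qubit whenever the second qubit is $|j\rangle$, and it can be realized with two CNOTs and two $R_y$ rotations. The sketch below checks this identity numerically. It is a toy example in the spirit of the cited constructions, not the authors' parametrization, and it assumes the convention $R_y(\phi) = e^{-i\phi Y/2}$ with the first qubit as the most significant bit; the function names are hypothetical.

```python
import numpy as np


def ry(phi):
    """R_y(phi) = exp(-i phi Y / 2), a real rotation by phi/2."""
    c, s = np.cos(phi / 2), np.sin(phi / 2)
    return np.array([[c, -s], [s, c]])


I2 = np.eye(2)
# CNOT with control = second qubit, target = first qubit, basis |q0 q1>.
CNOT = np.array([[1, 0, 0, 0],
                 [0, 0, 0, 1],
                 [0, 0, 1, 0],
                 [0, 1, 0, 0]], dtype=float)


def cs_factor(thetas):
    """Middle factor [[C, -S], [S, C]] of Eq. 8 for n = 2."""
    C, S = np.diag(np.cos(thetas)), np.diag(np.sin(thetas))
    return np.block([[C, -S], [S, C]])


def ucry_circuit(thetas):
    """R_y(a), CNOT, R_y(b), CNOT on the first qubit, a = th0 + th1, b = th0 - th1.

    Second qubit |0>: R_y(b) R_y(a) = R_y(2 th0).
    Second qubit |1>: X R_y(b) X R_y(a) = R_y(a - b) = R_y(2 th1).
    """
    a, b = thetas[0] + thetas[1], thetas[0] - thetas[1]
    # Matrix product is read right to left: the rightmost gate acts first.
    return CNOT @ np.kron(ry(b), I2) @ CNOT @ np.kron(ry(a), I2)


thetas = np.array([0.4, 1.3])
print(np.allclose(ucry_circuit(thetas), cs_factor(thetas)))  # True
```

For larger $n$ the same idea generalizes with more controls, at the cost of more CNOTs and rotations.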

Appendix 2: Learning convergence from ensemble measurements

Here we simulate the procedure that a hybrid classical-quantum scheme would carry out on a quantum device, and analyze its performance. These numerical simulations could in principle be validated in a physical experiment, in which the measurement outcomes are used to infer the probabilities entering the cost function. Estimating these probabilities, and hence the cost function, accurately requires repeated measurements at every training step; we note that better methods for evaluating the analytical gradient on a shallow quantum device are available (Mitarai et al. 2018). We first briefly estimate the number of repeated measurements required to approximate the gradient, following Section 3 of Farhi and Neven (2018). The gradients are calculated using the forward-difference formula:

$$ \begin{array}{@{}rcl@{}} \frac{df}{dx}(x) = \frac{f(x+\varepsilon) - f(x)}{\varepsilon} + O(\varepsilon) \end{array} $$

(14)

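Because the difference f(x + ε) − f(x) is divided by ε, a statistical error δ in the estimates of f becomes an error of order δ/ε in the gradient. For this contribution not to dominate the O(ε) truncation error of Eq. 14, the error in f must be at most of order ε². Since the statistical error of a probability estimated from N repeated measurements scales as $1/\sqrt{N}$, achieving this (ideally with 99% probability) requires on the order of $\frac{1}{(\varepsilon^2)^2}=\frac{1}{\varepsilon^4}$ repeated measurements.^{Footnote 1} For example, for $\varepsilon = 10^{-3}$ the ideal number of repetitions is $10^{12}$.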
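The scaling argument above can be reproduced with a toy model. The sketch below assumes a hypothetical single-qubit cost $f(\theta) = \sin^2(\theta/2)$, the probability of outcome 1 after $R_y(\theta)$ acts on $|0\rangle$, whose estimate from $N$ shots is binomially distributed; it is not the cost function of this work. It compares the forward-difference gradient of Eq. 14 with the exact derivative $\sin(\theta)/2$ for several shot counts, and evaluates the $1/\varepsilon^4$ estimate. All function names are illustrative.

```python
import numpy as np


def prob_one(theta):
    """Hypothetical toy cost: P(outcome 1) after R_y(theta)|0>, i.e. sin^2(theta/2)."""
    return np.sin(theta / 2) ** 2


def sampled_prob(theta, shots, rng):
    """Estimate prob_one(theta) from `shots` simulated projective measurements."""
    return rng.binomial(shots, prob_one(theta)) / shots


def fd_gradient(theta, eps, shots, rng):
    """Forward-difference gradient (Eq. 14) from two independently sampled estimates."""
    return (sampled_prob(theta + eps, shots, rng) - sampled_prob(theta, shots, rng)) / eps


def predicted_noise(theta, eps, shots):
    """Std. dev. of fd_gradient due to sampling alone: sqrt(Var1 + Var2) / eps."""
    p0, p1 = prob_one(theta), prob_one(theta + eps)
    return np.sqrt((p0 * (1 - p0) + p1 * (1 - p1)) / shots) / eps


def shots_for_precision(eps):
    """Order-of-magnitude shot count N ~ 1/eps^4, so that 1/sqrt(N) ~ eps^2."""
    return 1.0 / eps ** 4


rng = np.random.default_rng(0)
exact = np.sin(1.0) / 2  # d/dtheta sin^2(theta/2) = sin(theta)/2
print(shots_for_precision(1e-3))  # about 1e12
for shots in (10**2, 10**5, 10**12):
    g = fd_gradient(1.0, 1e-2, shots, rng)
    print(shots, abs(g - exact), predicted_noise(1.0, 1e-2, shots))
```

With few shots the sampling noise, roughly $\sqrt{2p(1-p)/N}/\varepsilon$, swamps the true gradient, while for very large $N$ the remaining error is dominated by the $O(\varepsilon)$ truncation term.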

In practice, we do not use $\frac{1}{\varepsilon^4}$ measurements, since the Adam optimizer (Kingma and Ba 2014) is designed for stochastic objectives and tolerates noisy estimates of the cost function. To estimate the number of repeated measurements required for the optimization to converge, we perform two numerical experiments. In the first, the number of repeated measurements is large (≥ 10³) and $\varepsilon = 10^{-2}$; we find that 10⁵ repeated measurements per iteration is a robust configuration for successful convergence. In the second, we use a small number of repeated measurements, vary the learning rate, and increase the maximum number of Adam iterations: with $\varepsilon = 10^{-2}$ and only 100 repeated measurements, the optimizations succeed given a large number of iterations. In both experiments, the penalties are set to α_inc = 5 and α_err = 40.
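The interplay between shot noise and Adam can be explored with the same kind of toy model. The sketch below implements the Adam update of Kingma and Ba (2014) with its default $\beta_1 = 0.9$, $\beta_2 = 0.999$, driven by forward-difference gradients whose cost estimates are sampled from a finite number of shots. The cost $\sin^2(\theta/2)$, the starting point, and the function name `adam_with_shot_noise` are hypothetical choices for illustration; the actual experiments optimize $J_1$ with the penalties α_inc and α_err, which are not modelled here.

```python
import numpy as np


def sampled_cost(theta, shots, rng):
    """Hypothetical toy cost sin^2(theta/2), estimated from `shots` binomial samples."""
    return rng.binomial(shots, np.sin(theta / 2) ** 2) / shots


def adam_with_shot_noise(theta0, lr, shots, iters, eps=1e-2, seed=0,
                         beta1=0.9, beta2=0.999, delta=1e-8):
    """Adam driven by forward-difference gradients of a shot-sampled cost.

    `eps` is the finite-difference step of Eq. 14; `delta` is Adam's small
    stabilizing constant (called epsilon by Kingma and Ba).
    Returns the final parameter and its exact (noise-free) cost.
    """
    rng = np.random.default_rng(seed)
    theta, m, v = float(theta0), 0.0, 0.0
    for t in range(1, iters + 1):
        # Noisy forward-difference gradient from two independent shot estimates.
        g = (sampled_cost(theta + eps, shots, rng) - sampled_cost(theta, shots, rng)) / eps
        m = beta1 * m + (1 - beta1) * g        # running mean of gradients
        v = beta2 * v + (1 - beta2) * g * g    # running mean of squared gradients
        m_hat = m / (1 - beta1 ** t)           # bias-corrected estimates
        v_hat = v / (1 - beta2 ** t)
        theta -= lr * m_hat / (np.sqrt(v_hat) + delta)
    return theta, np.sin(theta / 2) ** 2


theta_final, cost_final = adam_with_shot_noise(2.0, lr=0.05, shots=10**5, iters=2000)
print(theta_final, cost_final)
```

Calling it with, for example, `shots=100`, a smaller `lr`, and a larger `iters` lets one explore, on this toy problem, the trade-off between sampling noise, learning rate, and number of iterations described in this appendix.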

Large number of repetitions

Our results show that, for a fixed maximum of 5000 Adam iterations, the combination $\varepsilon = 10^{-2}$ and 10⁵ repeated measurements gives robust results: the final value of the cost function is close to the value obtained with the exact probabilities (within 3%) and is stable (relative standard deviation of 13%). The trade-off between the number of repeated measurements and the stability of the cost function is shown in more detail in Fig. 4.

Small learning rates and high number of iterations

Our numerical experiments further show that, when only a small number of repeated measurements is used, lowering the learning rate effectively counters the noise caused by insufficient sampling, although the optimization then requires a large number of iterations to finish. For example, with only 100 repeated measurements, the variance of the cost function J₁ after 20,000 iterations decreases as the learning rate is lowered (Fig. 5a). Figure 5b shows the cost function J₁ slowly approaching its optimal value during the optimization. Here, the gradient step is $\varepsilon = 10^{-2}$.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

About this article

Cite this article

Chen, H., Wossnig, L., Severini, S. et al. Universal discriminative quantum neural networks. Quantum Mach. Intell. 3, 1 (2021). https://doi.org/10.1007/s42484-020-00025-7

Received: 05 August 2019
Accepted: 03 September 2020
Published: 15 December 2020
DOI: https://doi.org/10.1007/s42484-020-00025-7
