Final Report Womanium Quantum+AI 2024 Bootcamp Project


August 10, 2024

QML For Conspicuity Detection

Name                   Country
Martyna Anna Czuba     Poland
Hussein Shiri          Lebanon

QML Womanium Quantum+AI Project Submission

The project focuses on conspicuity detection in production, which makes it possible to identify improvement
measures for individual work steps or sub-processes at an early stage and thus optimize the production
process. To do this, we analyze process data such as image data or time series to uncover deviations and weak
points in production. Classical methods for analyzing such data are very time-consuming.

1 Task One

The first task is about getting familiar with PennyLane through the codebooks on the pennylane.ai website. Three notebooks are recommended, and all of them were completed by our team: "Introduction to Quantum Computing", "Single-Qubit Gates", and "Circuits with Many Qubits".

The "Introduction to Quantum Computing" section covers the basics of quantum computing, including qubits, superposition, and entanglement. It provides a foundational understanding of how quantum states are represented and manipulated. Differences between classical and quantum computing are discussed, as well as the principles of quantum mechanics that support quantum computation. The tutorials explain how quantum gates act on qubits, introduce the concept of normalization (the sum of the probabilities of all states equals one), and cover algebraic properties of quantum operations, such as combining operators and the importance of commutators.

import numpy as np
import pennylane as qml

dev = qml.device("default.qubit", wires=1)

U = np.array([[1, 1], [1, -1]])
U = U / np.sqrt(2)

@qml.qnode(dev)
def apply_u():
    qml.QubitUnitary(U, wires=0)
    # return the state
    return qml.state()

Fig 1: Example of applying a custom unitary

The "Single-Qubit Gates" section focuses on the operations that can be performed on individual qubits. These gates are the building blocks of more complex quantum circuits. Implementations and visualizations of single-qubit gates using PennyLane provide hands-on experience with applying these gates to manipulate quantum states.

Fig 2: Evolution of the quantum state for different angles of the RX gate

Figure 2 illustrates the real and imaginary components of the amplitudes of the quantum states |0⟩ and |1⟩ after applying the RX(θ) gate to the initial state |0⟩. The RX(θ) gate rotates the state around the x-axis of the Bloch sphere by an angle θ. As θ varies from 0 to 4π, the amplitudes oscillate, reflecting the periodic nature of the rotation. The real parts show sinusoidal behavior, with |0⟩ and |1⟩ having a phase difference of π, illustrating the effect of the RX rotation on the qubit state.
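As a small, self-contained illustration of this sweep (not the exact notebook code; the device and variable names here are our own), the amplitudes plotted in Fig 2 can be reproduced in PennyLane as follows:

import numpy as np
import pennylane as qml

dev = qml.device("default.qubit", wires=1)

@qml.qnode(dev)
def rx_state(theta):
    # rotate |0> around the x-axis of the Bloch sphere by angle theta
    qml.RX(theta, wires=0)
    return qml.state()

# sweep theta from 0 to 4*pi and collect the amplitudes of |0> and |1>
thetas = np.linspace(0, 4 * np.pi, 200)
states = np.array([rx_state(t) for t in thetas])
real_parts, imag_parts = states.real, states.imag  # the curves shown in Fig 2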

Fig 3: The idea of a hybrid quantum-classical training


algorithm for variational circuits. Source: [5]

To formalise the problem, let X be a set of inputs


and Y a set of outputs. Given a dataset
D = {(x1 , y 1 ), . . . , (xM , y M )}
of pairs of so-called training inputs xm ∈ X and tar-
Fig 2: Evolution of quantum state for different angles of the get outputs y m ∈ Y for m = 1, . . . , M , our goal is to
RX gate predict the output y ∈ Y of a new input x ∈ X .
This task is divided into two main sections:

• Variational Classifiers for the Parity Function
• Variational Classifiers for Iris Classification

2.1 Variational Classifiers for the Parity Function

The first notebook, called "Variational Classifier", shows that a variational circuit can be optimized to emulate the parity function

g : x ∈ {0, 1}^n → y = 1 if x contains an odd (uneven) number of 1's, and y = 0 otherwise.

We performed data preprocessing by converting the labels from {0, 1} to {-1, 1} to directly correspond to the output of the quantum circuit, which lies in [-1, 1] due to the Pauli-Z measurement. This alignment facilitates clear interpretation and seamless integration of quantum computing techniques with classical machine learning methods. We also split our dataset into training and test sets. In mathematical form, the idea of a variational classifier can be written as the expectation value of an observable O as the output of the classifier:

f(x; θ) = ⟨ψ(x)| U†(θ) σ_z^(1) U(θ) |ψ(x)⟩,

where σ_z^(1) is the Pauli-Z operator applied to the first qubit and U(θ) is the model circuit. The variational circuit is visualized in Fig 4.

Fig 4: The circuit model used for parity classification.

This quantum circuit diagram illustrates a variational classifier using basis encoding to initialize the qubits with the input data. The circuit comprises two layers of parameterized single-qubit rotation gates (Rot) and CNOT gates in a ring topology. The final measurement step involves the Pauli-Z operator, with the output corresponding to the model's prediction.
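A minimal PennyLane sketch of a classifier with this structure is shown below (the number of qubits, the layer count, and the helper names are illustrative assumptions, not the exact notebook code):

import pennylane as qml
from pennylane import numpy as np

n_qubits = 4
dev = qml.device("default.qubit", wires=n_qubits)

def layer(layer_weights):
    # one layer: parameterized Rot gates followed by a CNOT ring
    for w in range(n_qubits):
        qml.Rot(*layer_weights[w], wires=w)
    for w in range(n_qubits):
        qml.CNOT(wires=[w, (w + 1) % n_qubits])

@qml.qnode(dev)
def classifier(weights, x):
    qml.BasisState(x, wires=range(n_qubits))  # basis encoding of the input bit string
    for layer_weights in weights:
        layer(layer_weights)
    return qml.expval(qml.PauliZ(0))          # output in [-1, 1]

def cost_mse(weights, bias, X, Y):
    predictions = np.array([classifier(weights, x) + bias for x in X])
    return np.mean((Y - predictions) ** 2)

weights = 0.01 * np.random.randn(2, n_qubits, 3, requires_grad=True)  # small initial weights
bias = np.array(0.0, requires_grad=True)
opt = qml.NesterovMomentumOptimizer(stepsize=0.5)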
For the training process, we used the Mean Squared Error (MSE). The formula for the MSE is

MSE = (1/n) Σ_{i=1}^{n} (y_i − ŷ_i)²,

where y_i are the actual labels, ŷ_i are the predicted values, and n is the number of samples. We employed the Nesterov Momentum Optimizer with a learning rate of 0.5 and used batches of 5 examples over 100 epochs. Small initial weight values are chosen to ensure that the training process starts smoothly and avoids numerical issues that could arise from very large or very small initial weights. This practice is based on empirical findings that suggest models train more effectively when starting with small random weights.

In our implementation, we can choose between different optimizers (Gradient Descent Optimizer, Nesterov Momentum Optimizer, Adam Optimizer). Additionally, the code is written to be easily reusable and maintainable across notebooks, which is particularly useful when the training dataset is larger.

Fig 5: The plots illustrate (a) training accuracy over epochs, (b) validation accuracy over epochs, (c) cost over epochs, and (d) bias over epochs. The model demonstrates significant fluctuations in the early stages of training but eventually stabilizes, achieving an accuracy of 1 on both the training and validation sets and a minimal cost, indicating effective learning and generalization.

The training and validation accuracy both fluctuate significantly in the initial epochs but stabilize and reach 1 around the 50th epoch, indicating that the model eventually learns and generalizes well. The cost value rapidly decreases at first, then stabilizes close to zero, reflecting effective learning. The bias value also fluctuates initially but converges to around zero over time, suggesting that the model parameters stabilize as training progresses. Additionally, we calculated the metrics accuracy, precision, recall, and F1 score for the test data; all of them equal 1.

2.2 Variational Classifiers for Iris Classification

We prepared a function `load_and_prepare_iris_data` to load the iris data, normalize the vectors, and split the data into training, validation, and test sets as part of the preprocessing stage. The data are represented as 4-dimensional real-valued vectors, and we encode these inputs into 2 qubits. The ansatz used in this section is the one described in the previous section. We changed the state preparation method: we use the built-in operation `qml.MottonenStatePreparation`, which prepares an arbitrary state on the given wires using a decomposition into gates developed by Möttönen et al. (2004) [3]. This state preparation stage prepares the desired quantum state using a series of RY rotation gates and CNOT gates; the RY gates apply rotations around the Y-axis, while the CNOT gates create entanglement between the qubits. These operations are shown in Fig 6.
Fig 6: Quantum circuit diagram for Möttönen state preparation.

We utilized the Nesterov Momentum Optimizer with a learning rate of 0.01 and used batches of 5 examples over 60 epochs. For this implementation, we used 2 qubits and 6 layers, with the state preparation method set to 'Mottonen'.
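The iris classifier can be sketched as follows (a simplified illustration under the assumptions above: 2 qubits, Möttönen state preparation, and the layered Rot/CNOT ansatz; the exact notebook code may differ in its details):

import pennylane as qml
from pennylane import numpy as np

n_qubits = 2
dev = qml.device("default.qubit", wires=n_qubits)

@qml.qnode(dev)
def iris_classifier(weights, x):
    # encode the normalized 4-dimensional feature vector into the 2-qubit amplitudes
    qml.MottonenStatePreparation(x, wires=range(n_qubits))
    # layered ansatz: Rot gates on each wire plus a CNOT entangler
    for layer_weights in weights:
        for w in range(n_qubits):
            qml.Rot(*layer_weights[w], wires=w)
        qml.CNOT(wires=[0, 1])
    return qml.expval(qml.PauliZ(0))

x = np.array([0.4, 0.3, 0.7, 0.5])       # hypothetical iris sample
x = x / np.linalg.norm(x)                 # MottonenStatePreparation expects a normalized vector
weights = 0.01 * np.random.randn(6, n_qubits, 3, requires_grad=True)  # 6 layers
print(iris_classifier(weights, x))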

Fig 7: The plots illustrate (a) training accuracy over epochs, (b) validation accuracy over epochs, (c) cost over epochs, and (d) bias over epochs. The model demonstrates rapid learning and stabilization, achieving an accuracy of 1 on both the training and validation sets and effectively minimizing the cost. The bias value converges to zero, indicating stable model parameters.

The training and validation accuracy both increase rapidly in the initial epochs, stabilizing at 1 around the 20th epoch, indicating effective learning and generalization. The cost value drops significantly within the first 10 epochs and remains low, reflecting successful minimization. The bias value fluctuates initially but converges to zero over time. Additionally, we calculated the metrics accuracy, precision, recall, and F1 score for the test data; all of them equal 1.

2.3 Summary

Summarizing both sections, in this task we deepened our understanding of variational quantum circuits. We trained both circuits on different problems, generated plots, and calculated metrics. We also ensured that our code is clean and well-managed, making it easy to use and open to changes in the future.

Additionally, based on the article Circuit-centric quantum classifiers [4], we implemented a circuit called a circuit-centric quantum classifier. However, we did not have time to conduct experiments with this code. We plan to extend this task in the future.

3 Task Three

3.1 Classic convolution

Convolution layers are part of a neural network that operates on images to perform a certain task, such as classification or encoding. These layers use a kernel to perform feature extraction and reduce the image dimensions.

Fig 8: Kernel acting on an image

For example, a neural network for image classification starts by applying several layers, one of which is the convolution layer, before reaching the final layer and performing classification. This helps in decreasing the image size and extracting the most important features for classification.

3.2 Quantum convolution

Instead of using a classical model for classification, we can use a hybrid model. This model starts by performing "quantum convolution", called "quanvolution", instead of classical convolution layers, and then uses classical layers for image classification. In our solution we compared the validation accuracy on the same dataset for two kernels of different sizes: a 4x4 kernel vs. a 2x2 kernel.

• Using a 4x4 kernel:
  Original image size: (28, 28, 1)
  Size after convolution: (7, 7, 16)

• Using a 2x2 kernel:
  Original image size: (28, 28, 1)
  Size after convolution: (14, 14, 4)

The three values w x h x c, where c is the number of channels, change according to the kernel size.
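The sketch below illustrates the quanvolution idea for a 2x2 kernel (a simplified, self-contained example in the spirit of the standard PennyLane quanvolution demo; the encoding, the entangling block, and the function names are our own assumptions):

import numpy as np
import pennylane as qml

n_qubits = 4                      # one qubit per pixel of a 2x2 patch
dev = qml.device("default.qubit", wires=n_qubits)
params = np.random.uniform(0, 2 * np.pi, size=n_qubits)

@qml.qnode(dev)
def quanv_patch(patch, params):
    # encode the 4 pixel values of a 2x2 patch as rotation angles
    for i, pixel in enumerate(patch):
        qml.RY(np.pi * pixel, wires=i)
    # small entangling block with trainable (or random) parameters
    for i in range(n_qubits):
        qml.RZ(params[i], wires=i)
    for i in range(n_qubits - 1):
        qml.CNOT(wires=[i, i + 1])
    # one output channel per qubit -> 4 channels for a 2x2 kernel
    return [qml.expval(qml.PauliZ(i)) for i in range(n_qubits)]

def quanvolve(image, params, kernel=2):
    # slide the quantum kernel over a (28, 28, 1) image with stride equal to the kernel size
    size = image.shape[0] // kernel
    out = np.zeros((size, size, kernel * kernel))
    for r in range(size):
        for c in range(size):
            patch = image[r*kernel:(r+1)*kernel, c*kernel:(c+1)*kernel, 0].flatten()
            out[r, c] = np.array(quanv_patch(patch, params))
    return out                    # shape (14, 14, 4) for a 2x2 kernel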
Fig 9: Validation accuracy for each epoch

This comparison shows a 5-10% increase in the validation accuracy using a 4x4 kernel compared to a 2x2 kernel. This encourages trying different kernel sizes to find the one that gives the highest accuracy after classification.

Our solution uses the following dataset and parameters:
• Dataset: MNIST
  Training: 100 images
  Testing: 20 images

• Other parameters:
  Epochs: 100
  Classes: 10
  Batch size: 4

The full solution can be found in the "Task 3" folder in the GitHub repository.

4 Task Four

The task was to develop a model that learns the sine function on the interval [0, 2π]. We discretized the interval with a suitable number of points and used the sine values at these points as labels. The goal was to implement a quantum machine learning model that reproduces the sine function values. Our solutions are based on several key articles, including Schuld et al., "The effect of data encoding on the expressive power of variational quantum machine learning models" [6], and Holzer and Turkalj, "Spectral Invariance and Maximality Properties of the Frequency Spectrum of Quantum Neural Networks" [2]. These articles have been invaluable, providing us with a wealth of information and serving as significant resources for our research and development.

We identified three sections in this task:

• Quantum Model 1
• Quantum Model 2
• Classical Model

4.1 Quantum Model 1

Based on the previously mentioned articles [6], [2], we learned that the expressivity of the corresponding quantum model is fundamentally limited by the data encoding. Inspired by these articles, specifically Section II.A of [6], we have chosen the following ansatz in a slightly modified form, but based on the same principles.

The generated data is of the form g(x) = sin(x). We discretized this function into 1000 points over the interval [0, 2π]. We use a single-qubit gate generator H = (1/2) σ_x. Our variational circuit is of the form

f_θ(x) = ⟨0| U†(x, θ) M U(x, θ) |0⟩,

where |0⟩ is a single-qubit state, M = σ_z, and

U(x, θ) = W_θ R_x(x).    (1)

This circuit is illustrated in Fig 9.

In our code, it is possible to change the cost function to the Mean Absolute Error (MAE) by using `cost_MAE`. The logs of this training run are saved in the file `1 Training Process Sin function.txt` in the experiments folder.
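A minimal sketch of this single-qubit model is given below (the trainable block W_θ is shown here as a general Rot gate and the training loop is simplified; the exact gate choice in our implementation may differ):

import pennylane as qml
from pennylane import numpy as np

dev = qml.device("default.qubit", wires=1)

@qml.qnode(dev)
def quantum_model_1(theta, x):
    qml.RX(x, wires=0)                               # data encoding R_x(x), generated by H = sigma_x / 2
    qml.Rot(theta[0], theta[1], theta[2], wires=0)   # trainable block W_theta
    return qml.expval(qml.PauliZ(0))                 # measurement M = sigma_z

def cost_MSE(theta, X, Y):
    predictions = np.array([quantum_model_1(theta, x) for x in X])
    return np.mean((predictions - Y) ** 2)

X = np.linspace(0, 2 * np.pi, 1000)
Y = np.sin(X)
theta = 0.01 * np.random.randn(3, requires_grad=True)
opt = qml.GradientDescentOptimizer(stepsize=0.1)
for epoch in range(30):
    theta = opt.step(lambda t: cost_MSE(t, X, Y), theta)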

Fig 9: Variational Quantum Circuit for Learning the Sine Function

During the training process, we utilize the Gradient Descent Optimizer (learning rate 0.1) to minimize the Mean Squared Error (MSE) cost function over 30 epochs. The parameters of the quantum circuit are initially small and are iteratively updated to fit the sine function based on the training data.

Fig 10: Cost Function Convergence During Training.

Fig 10 shows the convergence of the MSE cost function during 30 epochs of training. The maximum cost value is 1.0195960 at epoch 0, and the minimum cost value achieved is 0.00000650 at epoch 30, indicating successful fitting of the sine function to the training data.

Fig 11: Evaluation of the VQC Model on an Extended Interval for the Sine Function.

Fig 11 compares the test labels (red circles) and test predictions (black crosses) for the sine function over the extended interval [2π, 7π]. The VQC model demonstrates excellent performance with a very low Mean Squared Error (MSE) of 0.0000067 on this interval. The close alignment of predictions with the actual labels indicates the model's robustness and generalization capability. The MSE values are: training set 0.0000060, test set 0.0000064, and interval [2π, 7π] 0.0000067, confirming the model's accurate learning of the sine function.

4.1.1 Adjustment of the number of points

In this section, we select a sufficient number of data points to effectively learn the sine function. In the `perform training` function, we run the training process on data generated using different numbers of points (1000, 100, 50, 30, 20, 10, 5, 4, 3, 2, 1). The test cost is always measured on the same dataset: the interval [2π, 7π], sampled with 1000 points. Fig 12 illustrates the relationship between the number of data points and the corresponding training and test costs for the quantum model.
Fig 12: Analysis of Training and Test Costs vs. Number of Data Points

The parameters tend to converge to similar values as the number of data points increases, indicating stable model training. As for the behavior of the cost function, an increase in the number of data points results in a slight increase in the cost function, suggesting a trade-off between model complexity and the volume of training data. It is experimentally observed that, with the given training parameters, the model requires only 4 data points to effectively learn the sine function. For cases with 3, 2, and 1 data points, despite achieving a final cost of zero during training, the test cost remains significantly high, indicating overfitting and poor generalization performance.

This analysis underscores the importance of selecting an appropriate number of data points for training quantum models. While fewer data points can lead to overfitting, an optimal number can achieve effective learning and good generalization.

4.1.2 Training for a more complicated function

We evaluate the performance of this model on functions with a larger frequency spectrum. Despite numerous training sessions and hyperparameter updates, this model failed to fit the function.

Fig 13: Evaluation of the VQC Model on a function with a larger frequency spectrum.

Fig 13 shows the limitations of the quantum model in fitting complex functions. We can observe that this quantum model is capable of fitting only a sine function. In other words, this quantum model (utilizing a single Pauli-X rotation) can only learn to fit a Fourier series of a single frequency, and this is possible only if that frequency exactly matches the scaling of the data. Regardless of the chosen weights, the single-qubit model for L = 1 will always represent a sine function of a fixed frequency. The weights merely adjust the amplitude, vertical shift, and phase of the sine wave, exactly as suggested by the articles [6], [2].

4.2 Quantum Model 2

We implemented the model proposed in the article by Schuld et al., "The effect of data encoding on the expressive power of variational quantum machine learning models" [6]. Subsequently, we extend the task and train the model on a more complex function. We visualize our results with plots and demonstrate that this model effectively learns more complex functions.

Here we slightly modify the model. The system is in exactly the same form as proposed in [6]. The architecture from this article is visualized in Fig 14.

Fig 14: The general quantum model. Source: [6].

We use a (univariate) quantum model f_θ(x) and follow the assumption that the overall quantum circuit has the form

U(x) = W^(L+1) S(x) W^(L) ... W^(2) S(x) W^(1).

We can see that the model is split into a data-encoding (circuit) block S(x) and trainable (circuit) blocks W. We use the popular strategy of encoding the input via single-qubit rotations. Our variational circuit is of the form

f_θ(x) = ⟨0| U†(x, θ) M U(x, θ) |0⟩,

where |0⟩ is a single qubit, M = σ_z, and

U(x, θ) = W^(2) R_x(x) W^(1).

Fig 15: Second Variational Quantum Circuit for Learning the Sine Function

This model handled the problem equally well. During the training process, we utilize the Gradient Descent Optimizer (learning rate 0.1) to minimize the Mean Squared Error (MSE) cost function over 30 epochs. The parameters of the quantum circuit are initially small and are iteratively updated to fit the sine function based on the training data.

Fig 16: Cost Function Convergence During Training.

Fig 16 shows the convergence of the MSE cost function during 30 epochs of training. Comparing with the previous model, both training processes utilized the same dataset of 800 samples, with identical hyperparameters and parameter initialization, allowing for a direct comparison of the two circuits. Both circuits effectively minimized the cost function over 30 epochs, with Circuit 1 achieving a final cost of 0.0000065 and Circuit 2 reaching a slightly lower final cost of 0.0000042. The mean cost and standard deviation were nearly identical between the circuits, and the final parameters showed minimal differences, indicating that both circuits successfully and stably learned the sine function with comparable performance.
More details on this comparison can be found in the accompanying notebook. According to observations from [6], the single-qubit model with L = 1 consistently produces a sine function with a fixed frequency, regardless of the chosen weights. The weights influence only the amplitude, vertical shift, and phase of the sine wave.

4.2.1 Training for a more complicated function

In this section, we aim to expand the frequency spectrum to a wider range. Our target function is

g(x) = (0.05 + 0.05j) e^{ix} + (0.05 + 0.05j) e^{i2x} + (0.05 + 0.05j) e^{i3x} + (0.05 + 0.05j) e^{i4x} + (0.05 + 0.05j) e^{i5x}.

Our variational circuit is of the form

f_θ(x) = ⟨0| U†(x, θ) M U(x, θ) |0⟩,

where |0⟩ is a single qubit, M = σ_z, and

U(x, θ) = W_θ^(L+1) S_L(x) W_θ^(L) ... W_θ^(2) S_1(x) W_θ^(1),

where L = 5, S(x) = e^{-ixH} = R_x(x), and W_θ = RZ(ω) RY(θ) RZ(φ). The circuit is visualized in Fig 16.

Fig 16: Variational Quantum Circuit.

During the training process, we utilize the Adam Optimizer (learning rate 0.1) over 300 epochs. Additionally, we modify the loss function slightly: we multiply it by a factor of 1/2. Adding the 1/2 factor is common practice for simplifying the gradient during backpropagation; the gradients are scaled down by a factor of 2 when the 1/2 factor is included in the loss function. It is important to note that this can influence the learning process, particularly with regard to the learning rate.
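A compact sketch of this data re-uploading circuit and the halved MSE cost is shown below (the gate ordering follows the equations above; the concrete variable names are ours):

import pennylane as qml
from pennylane import numpy as np

L = 5
dev = qml.device("default.qubit", wires=1)

def W(params):
    # trainable block W_theta = RZ(omega) RY(theta) RZ(phi)
    qml.RZ(params[0], wires=0)
    qml.RY(params[1], wires=0)
    qml.RZ(params[2], wires=0)

@qml.qnode(dev)
def quantum_model_2(theta, x):
    # theta has shape (L + 1, 3): one trainable block around every encoding block
    for layer in range(L):
        W(theta[layer])
        qml.RX(x, wires=0)        # encoding block S(x) = exp(-i x H)
    W(theta[L])
    return qml.expval(qml.PauliZ(0))

def cost(theta, X, Y):
    predictions = np.array([quantum_model_2(theta, x) for x in X])
    return 0.5 * np.mean((predictions - Y) ** 2)   # the 1/2 factor discussed above

theta = 0.01 * np.random.randn(L + 1, 3, requires_grad=True)
opt = qml.AdamOptimizer(stepsize=0.1)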
Fig 17: Evaluation of the VQC Model on a function with a larger frequency spectrum.

Overall, Fig 17 shows that the model performs well on both training and test data, with only minor discrepancies that could be further optimized.

4.3 Classical Model

This study defines a neural network model using PyTorch to approximate the sine function, aiming to capture its underlying pattern through regression. The model consists of a simple feed-forward neural network with one hidden layer, trained using the Adam optimizer and the Mean Squared Error (MSE) loss function. The network is trained over 10,000 epochs, ultimately achieving a final loss value of 0.00084. The classical neural network model employed in this study has a total of 61 trainable parameters.
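One possible architecture with exactly 61 trainable parameters is sketched below (the hidden width of 20 and the tanh activation are assumptions; 20 + 20 weights and 20 + 1 biases give 61 parameters):

import torch
import torch.nn as nn

# feed-forward regressor: 1 input -> 20 hidden units -> 1 output (61 parameters)
model = nn.Sequential(
    nn.Linear(1, 20),
    nn.Tanh(),
    nn.Linear(20, 1),
)

x = torch.linspace(0, 2 * torch.pi, 1000).unsqueeze(1)
y = torch.sin(x)

optimizer = torch.optim.Adam(model.parameters(), lr=0.01)
loss_fn = nn.MSELoss()

for epoch in range(10_000):
    optimizer.zero_grad()
    loss = loss_fn(model(x), y)
    loss.backward()
    optimizer.step()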
These results underscore the efficiency of quantum models in achieving lower final costs with significantly fewer parameters, particularly for the problem at hand.

Fig 18: Evaluation of the Neural Network Model for Sine Function Approximation.

For the quantum models, Circuit 1 achieved a final cost of 0.0000065 using just 2 parameters over 30 epochs, while Circuit 2 reached a final cost of 0.0000042 with the same number of parameters and epochs. These results demonstrate the efficiency of quantum models in attaining lower costs with significantly fewer parameters compared to the classical neural network, particularly for this specific task.

However, it is important to emphasize that this analysis does not account for several critical factors. A comprehensive evaluation should consider aspects such as computational complexity, scalability, and the robustness of each approach to draw more definitive conclusions. Further exploration in these areas is necessary to fully understand the comparative advantages and limitations of classical and quantum solutions.

4.4 Future work

Open questions: based on observations in the "Adjustment of the Number of Points" section, why exactly does an increase in the number of data points result in a slight increase in the cost function? How does this apply to other problems? Do we know exactly how many data points are needed depending on the problem?

5 Task Five

5.1 Introduction

As task 5 is the most challenging and needs the most time, we took the following approach: we started working on task 5 from the start of the project phase, and even 1-2 weeks before, and we continued working on it in parallel with the other tasks until the end of the project phase.

The general approach for this task is summarized in the following points:

• Image Preprocessing and Exploration: provides a comprehensive analysis and preprocessing pipeline for the image data.

• Building the Dataset: we randomly select images from both the train and test folders, ensuring an equal number of images from each class to maintain a balanced dataset. For binary classification, we consolidate classes 1 through 5 into a single class, resulting in two final classes.

• Dimensionality Reduction: given the large image size, we applied two dimensionality reduction techniques, Principal Component Analysis (PCA) and autoencoders, to extract the most significant features and reduce the number of variables needed for encoding in the quantum circuit.

• Data Encoding: the classical data were encoded into the quantum circuit using amplitude encoding, applied to circuits of 4 and 8 qubits.

• Model Training: the model was trained using the Adam optimizer, with a comparative analysis of three different loss functions: exponential loss, binary cross-entropy loss, and focal loss.

• Performance Evaluation: to evaluate our model, we calculated accuracy, recall, precision, and F1 score.
5.2 Exploring the dataset:

The initial step involves examining the dataset's contents: how the images are stored, what the labels represent, the subject matter of the images, and their dimensions.

The dataset consists of 6 labels: 1 label for non-defective images (Label 0: good weld) and 5 labels for various defects (Labels 1-5). To enable binary classification, we merged images with Labels 1-5 into a single "defective" category (Label 1), while retaining Label 0 as the "non-defective" category. The label distribution is as follows:

• Label 0 (good weld): 10,947
• Label 1 (burn through): 2,134
• Label 2 (contamination): 8,403
• Label 3 (lack of fusion): 5,035
• Label 4 (misalignment): 3,682
• Label 5 (lack of penetration): 3,053

Fig 19: Class-wise distribution of images in the dataset

5.3 Data preprocessing and preparation:

We combined the images from the train and test folders because we observed a significant discrepancy in accuracy: the model achieved high accuracy when tested on images from the train folder, but low accuracy when tested on images from the test folder. To improve the model's ability to generalize to unseen data, we merged both folders into a single dataset. Images are then randomly selected from this combined dataset for training and testing.

• Original images:
• Cropped images:
• Resized images after cropping, 64x64:

Cropping reduces the image size and removes irrelevant parts, while resizing ensures all images are uniform in size.

However, further reduction and feature extraction are necessary. For instance, a 64x64 image yields 4,096 values to encode, which would require 12 qubits using amplitude encoding, a time-consuming process. Therefore, we implement two classical dimensionality reduction techniques: PCA and autoencoders.

We acknowledge that this probably had a significant impact on the model's learning process. Feature extraction techniques such as PCA and autoencoders are used to capture the most relevant information from the data while reducing its dimensionality, thereby improving the model's efficiency and effectiveness. Additionally, more advanced techniques, such as the Fast Fourier Transform (FFT), could be explored in future experiments to further enhance feature extraction and potentially provide better model performance. In the future, additional experiments on larger feature spaces may be necessary.

5.4 PCA vs. autoencoder:

Principal Component Analysis (PCA) is a dimensionality reduction technique that compresses the dataset while preserving the most significant patterns. In other words, it acts as a compression tool that reduces the size of the data while retaining as much information as possible.

The key difference between PCA and autoencoders lies in their reduction techniques. PCA utilizes a linear mapping function, making it a linear dimensionality reduction method, while autoencoders apply non-linear transformations, enabling them to capture more complex relationships within the data.

In our study, we applied PCA to reduce each 64x64 image to 16 values, aligning with our goal of using amplitude encoding on a 4-qubit ansatz. We also utilized an autoencoder to reduce the image size from 64x64 to 256 values, which allows us to encode the data into an 8-qubit ansatz using amplitude encoding.

In summary, we employed two different approaches based on circuit size: one uses PCA to encode data into a 4-qubit circuit, and the other uses an autoencoder to encode data into an 8-qubit circuit, both utilizing amplitude encoding.

5.5 Quantum tensor networks:

Quantum tensor networks are structures designed to represent and manipulate quantum states, leveraging their interconnected design. In these networks, tensors take the place of traditional graph nodes, with each tensor corresponding to a 2-qubit ansatz. These tensors are linked via CNOT gates to form larger configurations, such as 4-qubit or 8-qubit ansätze, which are crucial for quantum machine learning applications. This specific circuit configuration is known as the SU(4) circuit.

Fig 20: General element of SU(4)

We chose this block ansatz because of its large number of parameters, so the model can learn better, and because of its high entanglement, using 3 CNOTs. We also use the MERA quantum tensor network architecture instead of TTN. The reason is that MERA has extra layers and parameters, which help the trainable ansatz learn better and give higher accuracy. The difference between the two architectures can be clearly seen in Fig. 21.
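One way to build a heavily parameterized two-qubit block in the spirit of Fig 20 is sketched below (15 rotation angles around 3 CNOTs; the exact gate ordering in our circuits may differ, and blocks of this form can also be tiled with PennyLane's built-in qml.TTN and qml.MERA templates):

import pennylane as qml

def su4_block(params, wires):
    # two-qubit block: outer single-qubit rotations, 3 CNOTs,
    # and intermediate rotations (15 parameters in total)
    q0, q1 = wires
    qml.Rot(*params[0:3], wires=q0)
    qml.Rot(*params[3:6], wires=q1)
    qml.CNOT(wires=[q0, q1])
    qml.RZ(params[6], wires=q0)
    qml.RY(params[7], wires=q1)
    qml.CNOT(wires=[q1, q0])
    qml.RY(params[8], wires=q1)
    qml.CNOT(wires=[q0, q1])
    qml.Rot(*params[9:12], wires=q0)
    qml.Rot(*params[12:15], wires=q1)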

5.6 Loss function:

Before going into detail about the results, it is important to explain the 3 different loss functions we have used.

• Exponential loss:

  loss = Σ_i (1 + 10 e^{7 p_i})^{-1}

• Binary cross-entropy loss:

  Loss = − Σ_{i=1}^{output size} y_i · log(ŷ_i)

• Focal loss:

  FL(p_t) = −(1 − p_t)^γ log(p_t)
Fig. 21: TTN vs MERA architecture. Source: [1]

Train set     Test set     Validation set    Loss            Epochs    Validation acc    Test acc
7971, 8028    1006, 995    1023, 977         exponential     200       88.41%            87%
7971, 8028    1006, 995    1023, 977         cross entropy   100       83.41%            83.3%
7971, 8028    1006, 995    1023, 977         FL, gamma=1     100       85.71%            84.85%
7971, 8028    1006, 995    1023, 977         FL, gamma=2     100       87.46%            86.65%

Table 1: Results for the circuit with 4 qubits (binary classification). Set sizes are given as (class 0, class 1) image counts.

The exponential loss function was taken from the research paper "Practical overview of image classification with tensor-network quantum circuits". It did not have a name there, so we call it exponential loss, since it uses an exponential.

This loss function focuses on optimizing all images instead of only a subset of them, unlike binary cross-entropy: with cross-entropy, we might get a correct label for some images, but the loss keeps optimizing these already-correct labels instead of focusing on the other images.

The focal loss is the cross-entropy loss multiplied by the factor (1 − p_t)^γ, which helps the model focus more on hard examples: once the easy examples have been classified correctly, there is no need to keep focusing on them, and more weight should go to the hard examples.
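For concreteness, the three losses can be written in NumPy as follows (a direct transcription of the formulas above; in the exponential loss we read p_i as the model's score for the correct class of sample i, which is our interpretation of the formula):

import numpy as np

def exponential_loss(p):
    # sum_i 1 / (1 + 10 * exp(7 * p_i))
    return np.sum(1.0 / (1.0 + 10.0 * np.exp(7.0 * p)))

def binary_cross_entropy(y_true, y_pred, eps=1e-12):
    # Loss = -sum_i y_i * log(y_hat_i)
    return -np.sum(y_true * np.log(y_pred + eps))

def focal_loss(p_t, gamma=2.0, eps=1e-12):
    # FL(p_t) = -(1 - p_t)^gamma * log(p_t); gamma down-weights easy examples
    return -np.sum((1.0 - p_t) ** gamma * np.log(p_t + eps))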
5.7 Four qubits circuit:

We now start with our first solution: after getting the images, performing the preprocessing, and building our circuit, we can compare the 3 loss functions after performing PCA.

from Modules.Utils import apply_pca
X_reduced = apply_pca(X_set, n_components=16)

We then encode the 16 values into the 4-qubit circuit using amplitude encoding, and we optimize the ansatz parameters with the Adam optimizer for the different loss functions.
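A sketch of how the reduced features can be fed into such a 4-qubit circuit is shown below (it reuses the su4_block sketched in Section 5.5 and places the blocks in a simple brick pattern; the actual MERA layout, the measured wire, and the hyperparameters in our code may differ):

import pennylane as qml
from pennylane import numpy as np

n_qubits = 4
dev = qml.device("default.qubit", wires=n_qubits)

@qml.qnode(dev)
def four_qubit_classifier(params, features):
    # 16 PCA features -> amplitudes of a 4-qubit state (2^4 = 16)
    qml.AmplitudeEmbedding(features, wires=range(n_qubits), normalize=True)
    # trainable tensor-network-style ansatz built from two-qubit blocks
    su4_block(params[0], wires=[0, 1])
    su4_block(params[1], wires=[2, 3])
    su4_block(params[2], wires=[1, 2])
    return qml.expval(qml.PauliZ(2))   # the sign of the expectation gives class 0 or 1

params = 0.01 * np.random.randn(3, 15, requires_grad=True)
opt = qml.AdamOptimizer(stepsize=0.01)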
From Table 1 we can see that the exponential loss gave the best validation and test accuracies, of 88.41% and 87% respectively. Although we ran it for 200 epochs, it reached its maximum accuracy before 100 epochs, so we ran the other cost functions for 100 epochs.

In fact, accuracy is not the only important metric. The damage caused by predicting that a defective sample is not defective can lead to missed defects and, ultimately, product failure. The F1-score and the confusion matrix are some of the metrics that help analyze the ability of the model to predict each class, 0 and 1.

The calculated F1-scores are [0.87654321, 0.86272439] for classes [0, 1]. We can clearly see that, for 7971 and 8028 training images (7971 for class 0 and 8028 for class 1), the model was slightly better at predicting class 0 than class 1 on the test dataset.

5.8 Eight qubits circuit:

Using 8 qubits gives us the ability to encode more information in the circuit; using the autoencoder also allows us to find relations, especially non-linear ones, that PCA might have missed when it performed the reduction. In this approach we have used Keras to build the autoencoder.
Train set     Test set    Validation set    Loss            Epochs    Validation acc    Test acc
2393, 2407    303, 297    304, 296          cross entropy   200       88.17%            86.0%
2393, 2407    303, 297    304, 296          exponential     200       92.67%            89.83%
2393, 2407    303, 297    304, 296          FL, gamma=1     200       89.33%            86.0%
2393, 2407    303, 297    304, 296          FL, gamma=2     200       92.33%            80.83%

Table 2: Results for the circuit with 8 qubits (binary classification).

Steps for building the autoencoder:

• Build the encoder part
• Build the decoder part
• Compile and fit the model
• Predict the encoded dataset (train, validation, and test)

The autoencoder was trained using the mean squared error loss combined with the Adam optimizer in Keras.
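These steps can be sketched in Keras as follows (the layer widths and activations are illustrative assumptions; only the 64x64 input size and the 256-dimensional latent space, matching the 8-qubit amplitude encoding, are taken from the text):

from tensorflow import keras
from tensorflow.keras import layers

input_dim = 64 * 64    # flattened 64x64 image
latent_dim = 256       # 256 values -> amplitude encoding on 8 qubits (2^8 = 256)

# encoder part
inputs = keras.Input(shape=(input_dim,))
encoded = layers.Dense(1024, activation="relu")(inputs)
encoded = layers.Dense(latent_dim, activation="relu")(encoded)

# decoder part
decoded = layers.Dense(1024, activation="relu")(encoded)
decoded = layers.Dense(input_dim, activation="sigmoid")(decoded)

autoencoder = keras.Model(inputs, decoded)
encoder = keras.Model(inputs, encoded)

# compile and fit the model (MSE loss + Adam), then predict the encoded datasets
autoencoder.compile(optimizer="adam", loss="mse")
# autoencoder.fit(X_train, X_train, epochs=50, batch_size=32, validation_data=(X_val, X_val))
# X_train_enc, X_val_enc, X_test_enc = (encoder.predict(s) for s in (X_train, X_val, X_test))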
Fig 22: The architecture of the used autoencoder.

We can see that, for the second time, the exponential loss function was the one that gave the best validation and test accuracies. We had to use fewer training images since we have a bigger circuit of 8 qubits: using a bigger training set would increase the training time considerably.

We show the plotted confusion matrix once again to see how good the model was at predicting each individual class, defective vs. non-defective.

We can see again from the F1-scores that the model was slightly better at predicting class 0 than class 1. The calculated F1-scores are [0.89950577, 0.89713322] for classes [0, 1].

5.9 Extra:

Although not required by task 5, we have also worked on checking the accuracy of the solution for multi-class classification using 4 and 6 different labels, using only the binary cross-entropy loss with PCA.

We obtained much lower results; the maximum we were able to reach was around 60% accuracy for 6 classes and 80% for 4 classes. The results are partially summarized in Table 3.

Train set                   Validation set       Loss            Epochs    Validation acc
349,364,350,350,325,361     61,42,39,51,61,47    cross entropy   70        56.81%
349,364,350,350,325,361     61,42,39,51,61,47    cross entropy   100       46.18%
349,364,350,350,325,361     61,42,39,51,61,47    cross entropy   215       48.18%
349,364,350,350,325,361     61,42,39,51,61,47    cross entropy   300       59.9%

Table 3: Results for the circuit with 8 qubits (multi-class classification, 6 classes). Set sizes list the per-class image counts.

We suggest that using the autoencoder with the exponential loss could lead to improved accuracy. Additionally, we observed a slight difference between the MERA architecture described in the paper [1] and the version implemented in PennyLane.

6 Challenges and future scope

6.1 Challenges

We encountered several challenges while working on the project:

• Implementing research papers: Given the high dimensionality of our images, approximately 800x800 pixels, significant additional effort was required to apply the techniques discussed in this project. Most existing studies typically focus on lower-dimensional datasets, such as 28x28 pixel images, and often utilize simplified datasets like MNIST. In contrast, our data is more complex, originating from real-world sources rather than standardized datasets, which necessitated the development of more sophisticated methods to effectively process and analyze these images.

• Training time: As the complexity of the problem (image) increases, our model expands in both width and depth. This growth also translates into an increase in the required training time. In the context of quantum machine learning, this means that as the image complexity rises, the quantum circuits must become more intricate, with more qubits and deeper layers of gates, leading to longer training times and greater computational demands.

• Challenges and learning experiences: A significant challenge was the time constraint. The project involved numerous aspects that we had to familiarize ourselves with, which proved to be both educational and broadening for our perspectives.

6.2 Future Scope

6.2.1 Binary classification:

We are considering applying the concept from Task 4 to Task 5 by using quantum machine learning to analyze image data. The idea is to treat each row of an image as a sine-like function, extracting its frequency components through Fourier analysis and representing them as a sum of sine and cosine functions. A quantum model could be trained to learn the correct behavior of these functions by processing Fourier-type sums that capture the frequency patterns in each row.

Once trained, the model could evaluate new images by analyzing the frequency components of each row. If the image is correct, the quantum model would recognize the familiar patterns, leading to a low loss function. If the image contains anomalies, the model would detect deviations in the frequency patterns, resulting in a high loss function.

This is still just an idea without a detailed implementation, but we believe it is a concept worth exploring further, especially given the potential advantages of quantum models in handling complex frequency data and detecting subtle anomalies.

6.2.2 Multi-classification:

This project focused on binary classification, but in some cases multi-class classification is required and binary classification is not sufficient.
For example, classifying the breeds of a certain animal, such as dogs, or classifying the type of a vehicle.
In our project we have shown some results when the project is expanded to 6 classes; this expansion can also be worked on to find ways to achieve better accuracy.

6.2.3 Unanswered questions:

Based on observations in task 4, in the "Adjustment of the Number of Points" section: why exactly does an increase in the number of data points result in a slight increase in the cost function? How does this apply to other problems? Do we know exactly how many data points are needed depending on the problem? How does our model compare to other models? Can we find an exponential speedup in QML?

6.2.4 Startup:

Startups in ML/AI mainly provide consultation, new software or programs, and other services to companies and individuals. Startups that try to merge AI/ML with quantum technologies can use classical machine learning to build new software to speed up certain tasks related to quantum technologies, such as better or faster compilers for quantum computers.

Startups in this field can also focus on research, trying to find new algorithms or potential use cases with a speedup in the field of quantum machine learning.

References

[1] Daniel Gonzalez, Lukasz Cincio, Mikkel Kjaergaard, et al. Multi-class quantum classifiers with tensor network circuits for quantum phase recognition. arXiv preprint arXiv:2110.08386, 2021.

[2] Patrick Holzer and Ivica Turkalj. Spectral invariance and maximality properties of the frequency spectrum of quantum neural networks. Physical Review A, 104(3):032404, 2021.

[3] Mikko Möttönen, Juha J. Vartiainen, Ville Bergholm, and Martti M. Salomaa. Transformation of quantum states using uniformly controlled rotations. arXiv preprint quant-ph/0407010, 2004.

[4] Maria Schuld, Alex Bocharov, Krysta Svore, and Nathan Wiebe. Circuit-centric quantum classifiers. Physical Review A, 101(3):032308, 2020.

[5] Maria Schuld and Francesco Petruccione. Machine Learning with Quantum Computers. Springer International Publishing, 2019.

[6] Maria Schuld, Ryan Sweke, and Johannes Jakob Meyer. The effect of data encoding on the expressive power of variational quantum machine learning models. Physical Review A, 103(3):032430, 2021.
