**In this blog post, we take a closer look at the idea of a support vector machine that works on the basis of quantum computing, or rather quantum circuits. The post is primarily aimed at people with experience in machine learning who want to learn more about quantum computing or quantum machine learning methods.**

**Support Vector Machines & how to measure performance**

When one starts to familiarize oneself with the field of AI and machine learning (ML), it usually doesn't take long before one stumbles across the concept of support vector machines (SVMs). This method, also referred to as a Support Vector Classifier (SVC), belongs to the field of supervised learning algorithms and, in its basic formulation, is used for binary classification tasks, i.e. tasks where an input (e.g. an image) is to be assigned to one of two classes (e.g. dog or cat). For this purpose, the model is trained in advance on a large amount of "labeled" data (e.g. images annotated with whether they show a dog or a cat).

There are various metrics for evaluating the quality of such models. One standard metric is the accuracy, which measures the fraction of correct predictions the model makes. To compute it, new labeled data that the model has not seen during training is classified, and the predictions are compared against the known labels. With good training and high-quality data, accuracy values of well over 90% up to 100% can be achieved, although this is not always the case. In an industrial context, an improvement of even a few percent can save a lot of money, which brings us to the topic of this post: how to train an SVM with quantum computers in order to build a better model under certain circumstances.
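As a minimal illustration of the accuracy metric, here is a sketch with hypothetical labels for a binary ±1 classifier (the label values are made up for demonstration):

```python
# Hypothetical ground-truth labels of unseen test data (+1 = dog, -1 = cat)
y_true = [1, -1, 1, 1, -1, -1]
# Hypothetical predictions of a trained model on the same data
y_pred = [1, -1, -1, 1, -1, 1]

# Accuracy = fraction of predictions that match the true labels
acc = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(round(acc, 3))  # 4 of 6 correct -> 0.667
```

In practice one would use a library helper such as scikit-learn's `accuracy_score` instead of computing this by hand.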

**Classical Support Vector Machines**

We will not talk in detail about the theory of classical SVMs, but we will mention some important points that are relevant for the transition to the quantum version.

When training an SVM, \( N \) labeled data points of the form \( (x_i, y_i) \) with \( i\in\lbrace1, \ldots , N\rbrace, x_i\in \mathbb{R}^n \) and \( y_i\in\lbrace -1, 1\rbrace \) are fed into the model. Here, \( x_i \) are the features (e.g. color values of individual pixels in an image) and \( y_i \) is the label of the corresponding class (e.g. ±1 for dog/cat). The classification of new data points is then based on the decision function

\[ f(x) = \text{sgn}\left( \sum_{i=1}^{N} a_i\, y_i \langle x_i, x \rangle + b \right). \]

Here, \( a_i, b \) are trainable parameters, \( \text{sgn}(\cdot) \) denotes the sign function and \( \langle \cdot , \cdot \rangle \) is the standard scalar product. In order to determine the class of new features \( x \) after training, their scalar product with the name-giving *support vectors* (SVs) must be calculated. These SVs are a subset of the training data and abstractly define a plane in \( \mathbb{R}^n \) which divides the space into two areas. Depending on the area in which they lie, data points are assigned to one of the two classes.
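The decision function above can be sketched directly in a few lines of NumPy. The support vectors, weights \( a_i \), labels \( y_i \) and bias \( b \) below are toy values chosen purely for illustration, not the result of an actual training run:

```python
import numpy as np

def decision(x, support_vectors, a, y, b):
    """SVM decision function: sgn(sum_i a_i * y_i * <x_i, x> + b)."""
    s = sum(a_i * y_i * np.dot(x_i, x)
            for a_i, y_i, x_i in zip(a, y, support_vectors))
    return np.sign(s + b)

# Toy setup: two support vectors separating points along the first axis
svs = np.array([[1.0, 0.0], [-1.0, 0.0]])
a, y, b = np.array([0.5, 0.5]), np.array([1, -1]), 0.0

print(decision(np.array([2.0, 1.0]), svs, a, y, b))   # 1.0  (right half-space)
print(decision(np.array([-2.0, 1.0]), svs, a, y, b))  # -1.0 (left half-space)
```

The separating hyperplane here is simply the vertical axis; real SVMs find \( a_i \) and \( b \) by solving a constrained optimization problem.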

The above equation separates the data points by a linear plane, also known as the *decision boundary* or *hyperplane*. However, realistic classification problems require much more complex decision boundaries. On the other hand, it is quite possible that a hyperplane can be found by transforming the data into a higher-dimensional space. This is known as the *kernel trick* and is achieved by simply replacing

\[ \langle x_i, x \rangle \;\to\; k(x_i, x) = \langle \varphi(x_i), \varphi(x) \rangle, \]

where a general *kernel function* \( k \) must fulfill special properties in order to “behave” like a scalar product in higher dimensions.

A linear plane in the higher-dimensional space can correspond to a more complicated decision boundary in the actual *feature space* (see figure). The feature map \( \varphi \), which realizes the mapping, does not even have to be known in the general case; only the scalar product, or rather the kernel function, is crucial. For complex classification tasks, however, finding a suitable kernel or, if desired, a suitable feature map is not a trivial task.
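The kernel trick can be demonstrated with a small, hypothetical feature map that lifts 2D points to 3D, so that a circular boundary in the plane becomes a flat plane in 3D. The point of the sketch is that the kernel \( k \) reproduces the scalar product in the lifted space without ever constructing \( \varphi \) explicitly:

```python
import numpy as np

# Hypothetical feature map: phi(x) = (x1, x2, ||x||^2).
# A circle x1^2 + x2^2 = r^2 in 2D becomes the plane z = r^2 in 3D.
def phi(x):
    return np.array([x[0], x[1], x[0]**2 + x[1]**2])

# The matching kernel needs only scalar products in the original space:
# <phi(x), phi(z)> = <x, z> + ||x||^2 * ||z||^2
def k(x, z):
    return np.dot(x, z) + np.dot(x, x) * np.dot(z, z)

x, z = np.array([1.0, 2.0]), np.array([0.5, -1.0])
print(np.dot(phi(x), phi(z)))  # explicit route through 3D
print(k(x, z))                 # kernel route -- same value, phi never needed
```

Both routes yield the same number, which is exactly why the feature map itself can remain implicit.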

*Figure 1: Example of a classification task with a non-linear decision boundary. By mapping the data into 3D space, a hyperplane can be found to separate the data, which results in a curved decision boundary in the original feature space.*

With this, all relevant parts have been introduced to understand how quantum computers can be utilized for classification tasks with SVMs.

**What’s in there for Quantum?**

There are various methods for using quantum computers to solve classification problems with SVMs. Here we consider the case of a *quantum kernel*, analogous to this paper. We define the kernel via the overlap of two wave functions:

\[ k(x_1, x_2) = \vert \langle \psi(x_1) \vert \psi(x_2) \rangle \vert^2. \]

From a physical point of view, this kernel would be referred to as the fidelity of the two wave functions, which expresses the share of \( \psi(x_2) \) in \( \psi(x_1) \) and vice versa. If they are identical, this results in a fidelity of 1, while the fidelity will be 0 in the case of maximum dissimilarity (orthogonality of states). In the context of an SVM, it can therefore be said that the kernel measures how similar the two states are.

In the case of a quantum kernel, an explicit feature map \( \psi \) is used, which is realized by a quantum circuit. This circuit takes features as input and encodes them, for example, as rotation angles of quantum gates. An example of a circuit representing a feature map with \( n=2 \) features can be seen in the figure below. It is the same feature map that was used in the paper, and it was chosen in such a way that the resulting kernel is non-trivial and not easy to simulate classically.

*Figure 2: Circuit of the feature map with 2 features, built and drawn with Qiskit*

Now, anyone who has worked with quantum computers knows that only the counts of measured bitstrings consisting of 0s and 1s are accessible experimentally. So, given such a result, how can the fidelity, or kernel, be calculated for a given feature map?

The answer to this becomes clear if you look at the feature map as the evolution of a quantum state through a unitary operator acting on the ground state \( \vert 0 \rangle \) (note that in this case \( \vert 0\rangle := \vert 0\rangle^{\otimes n} \) refers to the ground state of an \( n \)-qubit register and is used interchangeably depending on the context):

\[ \vert \psi(x) \rangle = U(x) \vert 0 \rangle \quad \Rightarrow \quad k(x_1, x_2) = \vert \langle 0 \vert\, U^\dagger(x_2)\, U(x_1)\, \vert 0 \rangle \vert^2. \]

In more technical terms, this rewriting shows that the kernel can be interpreted as the probability of measuring the 0-bitstring after executing a circuit composed of \( U \) and the adjoint operator \( U^\dagger \), with the corresponding features \( x_1 \) and \( x_2 \) as input.
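This interpretation can be checked numerically with a deliberately simple stand-in: a single-qubit Y-rotation as the feature map \( U(x) \), simulated as a plain 2×2 matrix (this toy map is an assumption for illustration; it is far simpler than the two-qubit circuit in the figure):

```python
import numpy as np

def u(x):
    """Toy single-qubit feature map: a Y-rotation by angle x."""
    c, s = np.cos(x / 2), np.sin(x / 2)
    return np.array([[c, -s], [s, c]])

def kernel(x1, x2):
    """Probability of measuring the 0-bitstring after U^dag(x2) U(x1) |0>."""
    state = u(x2).conj().T @ u(x1) @ np.array([1.0, 0.0])
    return abs(state[0]) ** 2

print(kernel(0.7, 0.7))    # identical inputs -> fidelity 1.0
print(kernel(0.0, np.pi))  # orthogonal states -> fidelity 0.0
```

On real hardware the same quantity is estimated statistically, as the fraction of shots that return the all-zero bitstring.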

In order to train the quantum SVM on \( N \) training points, the \( N(N-1)/2 \) independent elements of the kernel matrix can be calculated by executing and measuring circuits in the manner described above. After training, new features can be classified by calculating their kernel values with the previously determined support vectors and using the results in the decision function defined above, with the scalar product replaced by the kernel function.
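Assembling the kernel matrix from those \( N(N-1)/2 \) circuit evaluations might be sketched as follows. The `kernel` function here is a stand-in for the actual circuit execution (for a single feature, a Y-rotation map gives the closed form used below); the diagonal is 1 by construction, since a state has fidelity 1 with itself:

```python
import numpy as np

def kernel(x1, x2):
    # Stand-in for a circuit run: fidelity of a single-qubit Y-rotation map,
    # |<0| Ry(x2)^dag Ry(x1) |0>|^2 = cos^2((x1 - x2) / 2)
    return np.cos((x1 - x2) / 2) ** 2

def kernel_matrix(xs):
    """Fill the symmetric Gram matrix from N(N-1)/2 independent evaluations."""
    n = len(xs)
    K = np.eye(n)                          # diagonal entries are 1 (self-fidelity)
    for i in range(n):
        for j in range(i + 1, n):          # only the upper triangle is measured
            K[i, j] = K[j, i] = kernel(xs[i], xs[j])
    return K

K = kernel_matrix([0.1, 0.5, 2.0])
print(K.shape, np.allclose(K, K.T))  # (3, 3) True
```

A matrix built this way can then be handed to a classical SVM implementation that accepts precomputed kernels (e.g. scikit-learn's `SVC(kernel='precomputed')`).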

Now that we have discussed how to train and operate an SVM with quantum computers, an open question remains:

**What's the benefit?**

First, the positive: For certain classification tasks, the accuracy of an SVM can actually be improved by utilizing a quantum kernel compared to "classical" standard kernels. However, this applies for the most part to artificial data sets. By choosing the right circuits for the feature map, better results can sometimes be achieved for real classification tasks. However, for such tasks it is also difficult to estimate whether there are classically computable kernels that deliver similar or even better results.

The core idea behind the application of quantum kernels is that quantum computers operate *naturally* in high-dimensional vector spaces, which in turn is the essence behind the kernel trick (more precisely: the kernel trick refers to the transformation into a higher-dimensional space and finding a hyperplane there, to be utilized for complex classification in the lower-dimensional space). However, no general statements can be made about the efficiency of quantum kernels for classification tasks. Sometimes things have to be tried out in order to find a possible advantage.

In a follow-up post we will dive into an implementation of a quantum kernel SVM and how the quantum kernel matrix can be calculated with different devices via *PlanQK*.