
Neural Networks 162 (2023) 212–224


Edge computing on TPU for brain implant signal analysis



János Rokai a,b,∗, István Ulbert a,c,1, Gergely Márton a,c,1

a Institute of Cognitive Neuroscience and Psychology, Research Centre for Natural Sciences, Magyar tudósok körútja 2, building Q2, H-1117 Budapest, Hungary
b János Szentágothai Doctoral School of Neurosciences, Semmelweis University, Üllői út 26, H-1085 Budapest, Hungary
c Faculty of Information Technology and Bionics, Pázmány Péter Catholic University, Práter utca 50/a, H-1083 Budapest, Hungary

∗ Corresponding author at: Institute of Cognitive Neuroscience and Psychology, Research Centre for Natural Sciences, Magyar tudósok körútja 2, building Q2, H-1117 Budapest, Hungary. E-mail address: [email protected] (J. Rokai).
1 These authors contributed equally.

Article info

Article history: Received 10 October 2022; Received in revised form 18 January 2023; Accepted 23 February 2023; Available online 28 February 2023.

Keywords: Spike sorting; Deep learning; Brain–computer interface; Feature extraction; Edge device; Electrophysiology

Abstract

The ever-increasing number of recording sites of silicon-based probes imposes a great challenge for detecting and evaluating single-unit activities in an accurate and efficient manner. Currently, separate solutions are available for high-precision offline evaluation and for embedded systems, where computational resources are more limited.

We propose a deep learning-based spike sorting system that utilizes both unsupervised and supervised paradigms to learn a general feature embedding space, detect neural activity in raw data, and predict the feature vectors for sorting. The unsupervised component uses contrastive learning to extract features from individual waveforms, while the supervised component is based on the MobileNetV2 architecture. One of the key advantages of our system is that it can be trained on multiple, diverse datasets simultaneously, resulting in greater generalizability than previous deep learning-based models.

We demonstrate that the proposed model not only reaches the accuracy of current state-of-the-art offline spike sorting methods but also has the unique potential to run on edge Tensor Processing Units (TPUs), specialized chips designed for artificial intelligence and edge computing. We compare our model's performance with state-of-the-art solutions on paired datasets as well as on hybrid recordings. The system demonstrated here paves the way for the integration of deep learning-based spike sorting algorithms into wearable electronic devices, which will be a crucial element of high-end brain–computer interfaces.

© 2023 The Author(s). Published by Elsevier Ltd. This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/).

1. Introduction

Extracellular recordings in the central nervous system (CNS) provide information on neural activity patterns that can be valuable both for researchers in the field of neuroscience and for developers in the brain–computer interface industry. In order to analyze these neural patterns, the sources of single neuronal activities (single units, spikes) need to be identified and clustered (spike sorting). To increase the precision of spike sorting and the number of extracellular activities (spikes) recorded from neurons, high-density neural microelectrode arrays (MEAs), implanted into the CNS, are used (Fiáth et al., 2019; Steinmetz et al., 2021). The number of recording sites on MEAs is growing rapidly (Berényi et al., 2014; Fiáth et al., 2018), and with it the volume of recorded data, making an automated, robust, input-source-agnostic spike sorter an increasingly valuable asset.

Neural activities are usually recorded with sampling rates between 20–30 kHz (Bod et al., 2022). In order to remove the local field potential as well as low- and high-frequency noise, and to identify single-cell activities more reliably, a bandpass frequency filter between 0.3–3 kHz (or 0.5–5 kHz) is applied to the recordings.

Manual curation is no longer a viable option for interpreting raw data: as the number of channels increases, the time required for manual curation increases with it, and subjective bias, depending on the experience of the curator, is also present.

To automate the detection and clustering of spikes, spike sorting algorithms are developed to speed up the processing of high-channel-count recordings. Conventional spike sorting algorithms comprise three main processes: spike detection, feature extraction, and clustering of the features.


For spike detection, a plethora of approaches are already present in the literature: thresholding, non-linear energy operators (Kim & Kim, 2000), Teager energy operator thresholding (Choi, Jung, & Kim, 2006), or wavelet decomposition (Quiroga, Nadasdy, & Ben-Shaul, 2004). For feature extraction, some studies use principal component analysis (PCA) (Biffi, Ghezzi, Pedrocchi, & Ferrigno, 2008; Vargas-Irwin & Donoghue, 2007; Wood, Fellows, Donoghue, & Black, 2004), independent component analysis (Hill, Moore-Kochlacs, Vasireddi, Sejnowski, & Frost, 2010; Jäckel, Frey, Fiscella, Franke, & Hierlemann, 2012; Mamlouk, Sharp, Menne, Hofmann, & Martinetz, 2005; Takahashi, Anzai, & Sakurai, 2003), optimal wavelet transforms (Yang & Mason, 2017) or Laplacian eigenmaps (Chah et al., 2011). To cluster the so-generated features, several clustering algorithms were proposed in the past: k-means clustering (Dai & Luo, 2014; Takahashi et al., 2003; Vargas-Irwin & Donoghue, 2007), superparamagnetic clustering (Quiroga et al., 2004), spectral clustering (Huang, Gan, & Ling, 2021), Hdbscan (Yger et al., 2018) or Hdsort (Diggelmann, Fiscella, Hierlemann, & Franke, 2018).

Modern spike sorting methods also offer automated pipelines, like the different versions of KiloSort: 1 (Pachitariu, Steinmetz, Kadir, Carandini, & Kenneth, 2016), 2 (Stringer et al., 2019) and 2.5 (Steinmetz et al., 2021), as well as SpyKING CIRCUS (Yger et al., 2018) and MountainSort4 (Chung et al., 2017). Deep learning methods were also applied to different phases of spike sorting (Buccino, Garcia, & Yger, 2022). For the detection phase, architectures like LSTMs (Rácz et al., 2020) and CNNs (Saif-Ur-Rehman et al., 2019) were suggested, but deep learning models were also proposed for feature extraction (Eom et al., 2021; Moghaddasi, Aliyari Shoorehdeli, Fatahi, & Haghparast, 2020; Wouters, Kloosterman, & Bertr, 2021), for clustering (Rácz et al., 2020; Yang, Wu, & Zeng, 2017), and for full spike sorting functionality as well. Different approaches have been implemented, such as a 1D-CNN-based architecture (Li, Wang, Zhang, & Li, 2020), an autoencoder-based model (Rokai, Rácz, Fiáth, Ulbert, & Márton, 2021) or a deep contractive autoencoder (Radmanesh et al., 2022). Despite deep learning methods thriving in other research areas, spike sorting seems to be more challenging for them. The main difficulty of developing deep learning-based spike sorting is that the ground truth for a waveform is given in the form of its cluster assignment. Thus, using these ground truth labels as targets for a machine learning model will fail to generalize to new recordings, where the true clusters can differ in number and in their properties. The cluster identity integrates not only the properties of the waveforms of the cluster but also the channel-wise position of the waveforms.

A fully self-supervised deep-learning model can be an obvious answer to this problem; however, current self-supervised architectures have limited performance compared to their supervised counterparts.

Another important aspect of spike sorting methods is the tradeoff between performance in speed and performance in accuracy. Modern state-of-the-art algorithms perform offline spike sorting on a high-performance PC. However, a plethora of potential applications would benefit from on-site spike sorting. In order to evaluate the raw data on-site, allowing for building closed-loop systems, several hardware implementations were suggested to create an online embedded spike sorting system. These implementations can extract specific features, like the first- and second-derivative extrema (Paraskevopoulou, Barsakcioglu, Saberi, Eftekhar, & Constandinou, 2013), or can deliver spike sorting in its full spectrum (Cai, Gan, Wang, Zhang, & Han, 2019; Hao, Chen, Richardson, Van der Spiegel, & Aflatouni, 2021; Hwang, Lee, Lin, & Lai, 2013; Schaffer, Nagy, Kincses, Fiath, & Ulbert, 2021; Seong, Lee, & Jeon, 2021; Valencia & Alimohammad, 2019; Wang et al., 2019; Xu et al., 2019), using methods like template matching (Wang et al., 2019) or autoencoders (Seong et al., 2021). The on-chip spike sorting solutions, however, sacrifice precision for speed, while they are also limited in the number of channels they can efficiently process data from.

Spike sorting available on different types of devices will potentially be a real need in the near future. For example, a proposed solution for brain–machine interfaces, Neuralink (Musk, 2019), uses a system made of an implanted sensor chip communicating wirelessly with a mobile phone. Because of the sensitive nature of the data, it is advisable to process the raw data locally. The model described in this paper is built upon well-known and widely used libraries, namely Tensorflow and Tensorflow Lite, and is built to run on devices with smaller processing power as well, like Tensor Processing Units (TPUs). TPUs are custom hardware, specialized in running deep learning models very efficiently. To be able to perform complex deep learning inferences, efficient implementation of the different operators was needed, so only a limited number of architecture types are supported.

We present our system, where, by taking advantage of both the supervised and unsupervised worlds, a self-supervised model is trained to extract the waveform features, removing the positional information from the cluster identity. The supervised model detects the spikes present in a sample and at the same time predicts the previously learned features of the spikes (Fig. 1).

Our goal was to develop a deep learning-based spike sorting algorithm and demonstrate its efficiency on edge TPUs. We aimed to develop a system that can be easily scaled, so users can choose the performance/speed best suited to their needs.

2. Methods

2.1. Data description

To acquire data for the training of the model, a popular database was chosen, namely the SpikeForest platform (Magland et al., 2020), which enables access to a plethora of well-known spike sorting datasets through a standardized interface. Because the model was intended to be optimized for 128-channel recordings, only datasets with high channel counts were considered. Because of its hybrid nature, the Hybrid Janelia dataset was chosen as the main data source. In this dataset, the waveform templates were recorded at 30 kHz as part of Kampff's Ultra Dense Extracellular Survey, and based on these recorded templates, new hybrid recordings were generated with Kilosort2. From the Hybrid Janelia dataset, two recordings were chosen for training, namely the REC_64C_600S_11 and REC_32C_600S_31 recordings. For testing the model's performance, recordings REC_64C_600S_12 (HS_64_12) and REC_32C_600S_32 (HS_32_32) were used.

Both training recordings were used to train the self-supervised and supervised models, while the results of the final model were generated on the test recordings.

To be able to compare the performance of the detection itself to other algorithms, recordings from two paired datasets were used: recordings from the Boyden dataset (Allen et al., 2018) and from the Yger dataset (Yger et al., 2018). In both studies, extracellular and intracellular voltages were recorded simultaneously, where the ground truth of the extracellular recording was established based on the intracellular recording. Although these recordings provide high-precision ground truth on real electrophysiological data, the ground truth only considers spikes from a single cell.

All the recordings were padded to 128 channels.

In this study, we refer to the electrical activity of a single neuron on a single channel over a short period of time as a waveform. The term "samples" refers to individual inputs of data for either the supervised or unsupervised model. For the unsupervised model, the input data is a waveform, so the terms "sample" and "waveform" are used interchangeably. In the case of the supervised model, samples are snippets of electrical activity that may contain multiple waveforms.

Fig. 1. Schematics of the training. In the top subfigure (A), the self-supervised model is depicted during training. Pairs of inputs are provided to the model and
are processed by a shared encoder, which produces a feature vector. This feature vector is then passed through a projection head and the NNCLR loss is calculated
based on the output of the projection head. The loss is then backpropagated through the model. During inference, only the encoder is used. The resulting feature
embeddings are used as labels in the supervised model depicted in the bottom subfigure (B). The supervised model is a single-shot detector type object detection
system, which has been modified to include a feature prediction branch. The goal of this branch is to learn the feature embeddings generated by the self-supervised
model during training.

2.2. Preprocessing

The raw data was minimally preprocessed in order to simulate the conditions of modern electrodes like Neuropixels (Jun, Steinmetz, et al., 2017), which have integrated bandpass filters. To further this goal, a bandpass filter between 300 and 3000 Hz was applied to the data. The main channel for different clusters of spikes was determined semi-automatically by averaging out windows containing spikes from a particular cluster and generating an automatic proposal for the main channel number, which was then manually reviewed by an expert.
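In practice, a zero-phase Butterworth bandpass of this kind is a few lines of SciPy; the sketch below is illustrative, with the filter order and the 30 kHz sampling rate being assumptions rather than values reported in the paper.

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt

def bandpass_filter(data, fs=30_000, low=300.0, high=3000.0, order=3):
    """Zero-phase bandpass filter applied channel-wise.

    data: (n_channels, n_samples) raw extracellular recording.
    fs:   sampling rate in Hz (30 kHz is typical for such probes).
    """
    sos = butter(order, [low, high], btype="bandpass", fs=fs, output="sos")
    return sosfiltfilt(sos, data, axis=-1)

# Example: filter 60 s of synthetic 128-channel data sampled at 30 kHz.
raw = np.random.randn(128, 30_000 * 60)
filtered = bandpass_filter(raw)
```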
2.3. Self-supervised learning for feature extraction

The self-supervised learning approach utilized in this study involves the use of unsupervised learning, specifically nearest-neighbor contrastive learning, to extract features from the given waveforms.

Unsupervised, or self-supervised, learning has come a long way, and the performance gap between unsupervised and supervised learning is closing. A handful of unsupervised methods offer promising results; for the purpose of this study, nearest-neighbor contrastive learning (NNCLR) was chosen.

During training, pairs of inputs are given to the model and fed through the same encoder, which produces a feature vector for each input. These feature vectors are then processed by a projection head, and the NNCLR loss is calculated based on the output of this projection head. The loss is then backpropagated through the model to update the weights of the encoder. At inference time, only the encoder is used to extract features from new input samples. The generated embeddings are then used as labels in the supervised phase of training.

Nearest-neighbor contrastive learning (NNCLR) (Dwibedi, Aytar, Tompson, Sermanet, & Zisserman, 2021) is a self-supervised learning method based on contrastive learning, in which the model outputs a feature vector. In the contrastive learning paradigm, the model takes in two inputs and generates two different feature vectors. The contrastive loss function then either pulls the vectors together or pushes them apart in the feature space, depending on whether the inputs are considered positive or negative pairs. This results in the model producing similar feature vectors for similar inputs and separable feature vectors for non-similar inputs. The similarity can be (and in most of the cases is) of a higher order. To properly utilize this principle and effectively train the model in an unsupervised manner, it is necessary to augment the inputs to produce multiple similar input pairs for the contrastive model. NNCLR builds upon this principle by using nearest neighbors to enhance the proximity between different views of the same sample, which are typically produced by data augmentation. In NNCLR, the nearest-neighbor component is used to select a similar point in the feature space (Q) for one of the feature vectors using nearest-neighbor search (NN). The dot product is then calculated between the selected similar point and the other feature vector (which is l2-normalized). The other inputs in the mini-batch act as negative samples. The loss function $L_i^{NNCLR}$ (Eq. (1)) is defined as a function of the two feature vectors ($z_i$ and $z_i^+$), which are generated by the model θ from the same input i in the given mini-batch using data augmentation (Eq. (2)): $z_i = \theta(aug(input_i))$, $z_i^+ = \theta(aug(input_i))$.

Fig. 2. Pairs of waveforms as inputs for the self-supervised model. The pairs of inputs for the self-supervised model consist of an instance from a cluster and the mean template waveform of that cluster. The averaged waveform is shown in red, while the instance is depicted in blue. The normalization is done before the augmentation. The left column contains examples of input pairs before augmentation, while the right column contains examples of input pairs after augmentation.

Instead of calculating the dot product directly, NNCLR uses NN to select a similar point ($NN(z_i, Q)$) for one of the feature vectors. This selection is used in the loss function to pull the feature vectors of positive pairs together and push the feature vectors of negative pairs apart in the feature space.

$$L_i^{NNCLR} = -\log \frac{\exp\left(NN(z_i, Q) \cdot z_i^{+} / \tau\right)}{\sum_{k=1}^{n} \exp\left(NN(z_i, Q) \cdot z_k^{+} / \tau\right)} \tag{1}$$
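A minimal TensorFlow sketch of Eq. (1) follows. The support set Q is reduced here to a fixed queue tensor, and names such as `support_queue` and `temperature` are illustrative; the paper's own implementation details (e.g., queue updates or gradient stopping) are not specified at this level.

```python
import tensorflow as tf

def nnclr_loss(z, z_pos, support_queue, temperature=0.1):
    """Sketch of the NNCLR loss (Eq. (1)).

    z, z_pos:      (batch, dim) embeddings of the two augmented views.
    support_queue: (queue, dim) support set Q.
    """
    z = tf.math.l2_normalize(z, axis=1)
    z_pos = tf.math.l2_normalize(z_pos, axis=1)
    q = tf.math.l2_normalize(support_queue, axis=1)

    # NN(z_i, Q): replace each embedding with its nearest neighbor in Q.
    sim_to_queue = tf.matmul(z, q, transpose_b=True)   # (batch, queue)
    nn_idx = tf.argmax(sim_to_queue, axis=1)
    nn = tf.gather(q, nn_idx)                          # (batch, dim)

    # Logits: NN(z_i, Q) . z_k^+ / tau for every k in the mini-batch;
    # the diagonal holds the positive pair, the rest act as negatives.
    logits = tf.matmul(nn, z_pos, transpose_b=True) / temperature
    labels = tf.range(tf.shape(z)[0])
    return tf.reduce_mean(
        tf.nn.sparse_softmax_cross_entropy_with_logits(
            labels=labels, logits=logits))
```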
To extract relevant features from waveforms using NNCLR, waveforms were extracted from the dataset, and the average of the waveforms was calculated for each cluster. Single-channel samples, 105 datapoints long, were then extracted from the waveforms. The waveforms were normalized to [0, 1] to facilitate the learning of the waveforms themselves and avoid overfitting to the signal-to-noise ratio of the samples. Noise was introduced into the one-dimensional input through augmentations using random scaling (Eq. (2)) and jittering (Eq. (3)), which introduced multiplicative and additive noise with a normal distribution, respectively (Eq. (4)).

$$scaling(x) = x \cdot random.normal(mean = 1, stddev = 0.1) \tag{2}$$

$$jittering(x) = x + random.normal(mean = 0, stddev = 0.03) \tag{3}$$

$$aug(x) = jittering(scaling(x)) \tag{4}$$

Positive pairs were formed using an instance i waveform from a cluster k and the average waveform of the same cluster (Eq. (5), (6)) (Fig. 2):

$$input_{i,1}^{k} = aug(x_i^k) \tag{5}$$

$$input_{2}^{k} = aug\left(\frac{1}{N}\sum_{i=1}^{N} x_i^k\right) \tag{6}$$
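Eqs. (2)–(6) translate almost directly into NumPy; the sketch below assumes an (n_instances × 105) array of normalized single-channel waveforms per cluster and is meant only to illustrate the pair construction.

```python
import numpy as np

rng = np.random.default_rng()

def scaling(x):
    # Eq. (2): multiplicative Gaussian noise, one factor per element.
    return x * rng.normal(loc=1.0, scale=0.1, size=x.shape)

def jittering(x):
    # Eq. (3): additive Gaussian noise.
    return x + rng.normal(loc=0.0, scale=0.03, size=x.shape)

def aug(x):
    # Eq. (4): composition of the two augmentations.
    return jittering(scaling(x))

def positive_pair(cluster_waveforms, i):
    """Eqs. (5)-(6): one instance and the cluster mean, both augmented.

    cluster_waveforms: (n_instances, 105) normalized waveforms of cluster k.
    """
    view_1 = aug(cluster_waveforms[i])            # instance view
    view_2 = aug(cluster_waveforms.mean(axis=0))  # template view
    return view_1, view_2
```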
The base model for NNCLR was constructed using residual blocks, 1D convolution layers, dense layers, and batch normalization layers (Fig. 3). The input to the model is a 1D sample with shape (1 × 105), and the output is a vector with 32 dimensions (1 × 32). The residual blocks and convolution layers are used to extract features from the input sample, while the batch normalization layers help to stabilize the training process and improve the model's performance. A Leaky ReLU activation function after the batch normalization layers introduces a small non-zero gradient for negative input values, allowing the model to learn more robust features. Together, these layers transform the input sample into a compact, low-dimensional feature vector that represents the underlying patterns in the data. The model depth is kept quite shallow, to enhance stability and avoid overfitting.

The NNCLR architecture includes a projection head during training, but this block is removed during inference. The remaining backbone model, depicted in Fig. 1 as the encoder, is used in the unsupervised inference phase. In this phase, the feature vector of each mean waveform is extracted and saved as a label for use in the supervised phase.
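A Keras sketch of an encoder in this spirit is given below; the number of residual blocks and the filter counts are assumptions chosen to match the stated input (1 × 105) and output (1 × 32) shapes, since the exact configuration is given only in Fig. 3.

```python
import tensorflow as tf
from tensorflow.keras import layers

def residual_block_1d(x, filters, kernel_size=3):
    """1D residual block: two conv-BN-LeakyReLU stages plus a skip path."""
    shortcut = layers.Conv1D(filters, 1, padding="same")(x)
    y = layers.Conv1D(filters, kernel_size, padding="same")(x)
    y = layers.BatchNormalization()(y)
    y = layers.LeakyReLU()(y)
    y = layers.Conv1D(filters, kernel_size, padding="same")(y)
    y = layers.BatchNormalization()(y)
    y = layers.Add()([shortcut, y])
    return layers.LeakyReLU()(y)

def build_encoder(input_len=105, embedding_dim=32):
    # Keras Conv1D expects (length, channels), so the 1 x 105 waveform
    # is fed as a 105 x 1 tensor.
    inputs = tf.keras.Input(shape=(input_len, 1))
    x = layers.Conv1D(32, 7, padding="same")(inputs)
    x = layers.BatchNormalization()(x)
    x = layers.LeakyReLU()(x)
    for filters in (32, 64):          # depth kept shallow on purpose
        x = residual_block_1d(x, filters)
        x = layers.MaxPooling1D(2)(x)
    x = layers.GlobalAveragePooling1D()(x)
    outputs = layers.Dense(embedding_dim)(x)  # 32-dimensional embedding
    return tf.keras.Model(inputs, outputs, name="nnclr_encoder")

encoder = build_encoder()
```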


Fig. 3. Backbone model for embedding generation for NNCLR. The backbone of the NNCLR is a simple model made of customized 1D residual blocks, 1D convolution layers and batch normalization. After every BatchNormalization layer a LeakyReLU activation layer is present as well, which is not depicted in this figure, to increase clarity. The input is a 1-dimensional sample (of shape 1 × 105), while the output is a vector of 32 dimensions (1 × 32).

2.4. Supervised model

The supervised model is the final model of the proposed system. As stated previously, the self-supervised model is used as an auxiliary model to generate embeddings for the spikes.

Samples were extracted from filtered data based on ground truth spike labeling. Each sample was 128 datapoints long (4.26 ms) and was formed by flattening the channels into a single axis and using the time axis as the other axis. Each sample was generated based on a specific spike, which was always placed in the middle of the sample, referred to as the "central spike". This method of sample generation allows for natural augmentation of the data, because multiple samples may contain the same spike but with different central spikes and therefore different positions on the time axis. This not only allows for data augmentation, but also maximizes the positive-to-negative sample ratio by ensuring that every sample contains at least one spike. The samples were then augmented with 2D scaling and jittering (see Fig. 4) to improve the model's robustness against different noise levels and time-dependent noise.
Single-shot detector (SSD) models are anchor-based object detection models that use a set of pre-defined boxes, called anchor boxes or anchors, to identify objects in an input image. These anchor boxes are placed at various locations and scales throughout the image and are designed to overlap heavily, allowing the model to identify objects with high precision. To improve the accuracy of the detection, the model also predicts the transformation parameters that control the positions and sizes of the anchor boxes. This enables the model to adjust the boxes to better match the objects in the image, even if they are shorter or have different aspect ratios than the anchor boxes. During inference, the model outputs a set of confidence scores for each anchor box, indicating the likelihood that the box contains an object. To select the best prediction for each object, non-max suppression (NMS) is applied to the overlapping boxes, taking into account the confidence scores. NMS removes lower-confidence boxes that are highly overlapped with higher-confidence boxes, leaving only the box with the highest confidence score for each object. This helps to reduce false positive detections and improve the overall accuracy of the model.
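TensorFlow provides this selection step as a ready-made primitive; a hedged sketch of applying it to raw box and score outputs is shown below (the threshold values are illustrative, not taken from the paper).

```python
import tensorflow as tf

def select_detections(boxes, scores, max_out=64,
                      iou_threshold=0.5, score_threshold=0.5):
    """Keep the best non-overlapping boxes via non-max suppression.

    boxes:  (n_anchors, 4) decoded box coordinates [y1, x1, y2, x2].
    scores: (n_anchors,) spike-class confidence per anchor.
    """
    keep = tf.image.non_max_suppression(
        boxes, scores, max_output_size=max_out,
        iou_threshold=iou_threshold, score_threshold=score_threshold)
    # The same indices also filter the predicted feature vectors,
    # mirroring the postprocessing described at inference.
    return tf.gather(boxes, keep), tf.gather(scores, keep), keep
```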
The custom anchor system was designed to address the aspect ratio of waveforms when representing them on a 2D plane. The anchor boxes used had a universal width of 5 channels, which was chosen for simplicity to match the width of the ground truth boxes. Using anchor boxes with different widths would not significantly impact performance, because the model is able to predict the transformation parameters for the positions and sizes of the anchor boxes, allowing it to adjust the boxes to better fit the objects in the input image. The key factor in improving the precision of the detection is ensuring that the anchor boxes have good overlap with the objects in the image, as this allows the model to accurately predict the transformation parameters.

Ground truth boxes were formed based on ground truth 2D points and had a universal width of 5 channels (covering an electrode space of approximately ∼38 µm²), with the ground truth point placed in the middle. This customization of the labeling generation and anchor system allows the model to accurately predict the transformation parameters of the anchor boxes, enabling it to adjust the boxes to better match the objects in the input image. The ability to predict these transformation parameters is important for improving the precision of the detection. The model predicted 1024 anchor boxes for each sample.
The SSD model was chosen based on the limitations of the edgeTPU hardware and the need for a simple yet efficient architecture. MobileNetV2 (Sandler, Howard, Zhu, Zhmoginov, & Chen, 2018) and EfficientDet (Tan, Pang, & Le, 2019) were considered as two promising options. EfficientDet had higher accuracies according to previous research, but its greater complexity made it less practical for this use case, as it had lower inference speed. Therefore, MobileNetV2 was selected because it is a lightweight architecture supported by the edgeTPU and is well-suited for systems with limited computational resources. The MobileNetV2 architecture uses depthwise separable convolutions (Chollet, 2017) in inverted residual blocks, which give it a highly efficient, low-computational nature. The alpha parameter, also known as the width multiplier, controls the width of the convolutional blocks in MobileNetV2 and determines the trade-off between accuracy and performance. A smaller alpha value results in decreased accuracy but increased performance due to reduced computational requirements, while a larger alpha value leads to increased accuracy but decreased performance. For this study, MobileNetV2 was customized with an alpha value of 0.2. It was also customized by doubling the output dimensions while maintaining the depth of the original model, which greatly improved the model's performance.
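The width multiplier is exposed directly by Keras, so a slimmed backbone of this kind can be instantiated as below. This is a sketch: the single-channel 128 × 128 input and weights=None are assumptions (the paper does not specify its input plumbing at this level), and the doubled output dimensions would be a further customization on top of this.

```python
import tensorflow as tf

# Backbone: MobileNetV2 thinned with the width multiplier (alpha).
# weights=None, since the 128x128 single-channel spike "images" do not
# match ImageNet inputs; include_top=False keeps it a feature extractor.
backbone = tf.keras.applications.MobileNetV2(
    input_shape=(128, 128, 1),
    alpha=0.2,
    include_top=False,
    weights=None,
)

features = backbone.output  # feeds the box-, score- and feature heads
print(backbone.count_params())
```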
The output of the model consists of 3 different branches: box-, score- and feature-prediction branches. The score and feature branches share a common branch, separate from the box branch, splitting only at the end. Both main branches use architectural elements from SSDLite, introduced with MobileNetV2. The score prediction consists of 2 different classes, where the model predicts the probability that a box contains a spike or not. The feature prediction branch output has the same dimensions as the feature vectors generated by the previously described self-supervised model. For the box and score prediction, focal loss (Lin, Goyal, Girshick, He, & Dollar, 2020) was applied, while for the feature prediction the cosine similarity loss was calculated during training. The feature vector provided for boxes containing no spikes was a vector of the same length filled with zero values. At inference, a postprocessing step is added to the model output, where NMS is performed based on the box and score predictions, and based on the results the predicted feature vectors are filtered as well.

2.5. Clustering

For the clustering of the post-processed data, two clustering algorithms were used. A more time-efficient choice was the Isosplit5 clustering algorithm, which was first introduced and used for PC-based real-time spike sorting (25). The other clustering algorithm chosen was the Accumulative clustering algorithm, which is based on hierarchical clustering. While the former does not require any hyperparameters, the latter does. The hyperparameters were chosen by maximizing the clustering accuracy on a portion of the training dataset using hyperparameter search. The searched parameter for building the nearest-neighbor graph was the number of neighbors (n_neighbors = 5), and for the agglomerative clustering it was the distance threshold (distance_threshold = 5.5). The latter determined the distance above which clusters were not merged.

To increase the speed and performance of these sorters, 10 principal components were extracted from the original features with principal component analysis (PCA).
Fig. 4. Generated samples with boxes. Ground truth data is generated in 2D format, where the X-axis is the channel axis and the Y-axis is the time axis. The boxes are formed so that they center the spikes both in length (l = 32) and channel-wise (w = 5). Because 64-channel recordings are used, padding can be observed on both sides of the samples. The final input dimension is thus 128 × 128.

The use of 10 principal components was chosen to provide flexibility for future recording sorting and to prevent overfitting (Figure S1). The features so reduced were then given as input to the clustering algorithms.
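The stated hyperparameters map naturally onto scikit-learn; the sketch below is one possible reading (the Ward linkage and the use of the neighbor graph as a connectivity constraint are assumptions, and the paper's own Accumulative implementation may differ; Isosplit5 is available separately, e.g., through the isosplit5 package).

```python
from sklearn.decomposition import PCA
from sklearn.neighbors import kneighbors_graph
from sklearn.cluster import AgglomerativeClustering

def cluster_features(features):
    """features: (n_spikes, 32) vectors from the feature-prediction branch."""
    # Reduce the 32-dimensional embeddings to 10 principal components.
    reduced = PCA(n_components=10).fit_transform(features)

    # Nearest-neighbor graph (n_neighbors = 5) constrains which points
    # the hierarchical merge step may join.
    graph = kneighbors_graph(reduced, n_neighbors=5, include_self=False)

    model = AgglomerativeClustering(
        n_clusters=None,
        distance_threshold=5.5,   # merges stop above this distance
        connectivity=graph,
        linkage="ward",
    )
    return model.fit_predict(reduced)
```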
2.6. TPU device

To run the model in an embedded environment, the EdgeTPU chip was chosen, which has as its basis a Tensor Processing Unit (TPU), a specialized ASIC chip for deep-learning tasks. The TPU is built with a plethora of supported operations; however, despite this large basis of supported operations, it is still considered a limitation in the architecture design process. To speed up the design phase, known, supported architectures were chosen from. The EdgeTPU runs the operations in an efficient manner, requiring only 1 Watt per 2 Tera Operations Per Second (TOPS). From the first node that the EdgeTPU encounters as an unsupported operation, execution is performed on the CPU side.

The evaluation of the model speed is done on two different TPU devices: a Coral Development Board Mini (CDBM), which consists of a MediaTek 8167s System on Chip (integrating a quad-core Arm Cortex-A35 CPU and an IMG PowerVR GE8300 GPU), 2 GB LPDDR3 and a TPU module; and a Coral USB Accelerator (CUA), consisting of a TPU module with a USB 3.0 connector. The CUA acts as a peripheral to a PC with an AMD Ryzen 7 2700X eight-core 3.70 GHz CPU, 16 GB DDR3 RAM and an Nvidia GeForce RTX 2080 SUPER GPU.

The inference speed is measured on both systems, measuring the net speed of the model inference on the TPU chip and the additional time needed by the NMS postprocessing. As the NMS is integrated into the model itself, the TFLite library automatically takes care of the data transfer between the CPU and TPU, because NMS runs only on the CPU, not being supported by the TPU. Thus, the execution can be divided into two major parts: spike detection and feature prediction, executed on the TPU, and the postprocessing, like NMS and sorting, effectuated on the CPU. To run the proposed model on the TPU, it is necessary to quantize the model, which involves converting the model's parameters from 32-bit float values to 8-bit integers. This process can often result in a performance drop; to minimize it, a quantization-aware training method was applied during the model's training. This technique involves introducing quantization noise during training, which allows the model to learn to be more robust to the effects of quantization. When the model is then quantized for deployment, it should experience a smaller performance drop compared to a model that was not trained with quantization-aware training.
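A sketch of the post-training full-integer conversion path with the TensorFlow Lite APIs follows; file names and the calibration generator are illustrative, and quantization-aware training itself would additionally involve the tensorflow_model_optimization toolkit.

```python
import numpy as np
import tensorflow as tf

def export_int8_tflite(model, calibration_samples, path="spike_sorter.tflite"):
    """Convert a trained Keras model to a full-integer TFLite flatbuffer."""
    def representative_data_gen():
        # A few hundred real preprocessed samples let the converter
        # calibrate the int8 ranges of every tensor.
        for sample in calibration_samples[:200]:
            yield [sample[np.newaxis, ...].astype(np.float32)]

    converter = tf.lite.TFLiteConverter.from_keras_model(model)
    converter.optimizations = [tf.lite.Optimize.DEFAULT]
    converter.representative_dataset = representative_data_gen
    # The EdgeTPU executes int8 ops only, so force full-integer quantization.
    converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
    converter.inference_input_type = tf.int8
    converter.inference_output_type = tf.int8
    with open(path, "wb") as f:
        f.write(converter.convert())

# The flatbuffer is then compiled for the accelerator on the command line:
#   edgetpu_compiler spike_sorter.tflite
```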
2.7. Evaluation

The evaluation of the self-supervised model was done with multiple similarity matrices. In order to build the Mean Embedding Similarity (MES) matrix, the self-supervised model's encoder (E) was applied to waveform samples ($x^{w}_{c,i}$, where c is the cluster identity index and i is the instance index) to generate feature vectors in the latent space. These feature vectors were then grouped by their ground truth cluster identity, and the Euclidean distance between them was calculated using Eq. (7). The resulting distance values were used to populate the similarity matrix s, which was then used to evaluate the separability of the different clusters in the latent space. To normalize the values in s, the minimum value (min(s)) and the maximum value (max(s)) of s are first determined. Then, for each value $s_{i,j}$, the minimum value is subtracted from it and the result is divided by the difference between the maximum and minimum values. Finally, this result is subtracted from 1 to obtain the normalized value for MES (Eq. (8)).

$$s_{i,j} = \left\| \frac{1}{N}\sum_{n} E(x^{w}_{i,n}) - \frac{1}{M}\sum_{m} E(x^{w}_{j,m}) \right\|_{2} \tag{7}$$

$$MES_{i,j} = 1 - \frac{s_{i,j} - \min(s)}{\max(s) - \min(s)} \tag{8}$$

To create the Distance Between Clusters matrix (DBS), the distances between the different clusters on the channel axis ($x^{ch}$) were considered. The process of constructing the matrix involved several steps. First, a standard distance matrix (d) was created using Eq. (9), which calculates the distance between two clusters based on their positions on the channel axis. Next, this matrix was normalized to ensure that all the values were within a specific range. Finally, the values in the matrix were inverted using Eq. (10). This resulted in the final distance matrix D, which represents the distance between the different clusters on the channel axis. The values in the matrix are normalized and inverted to ensure that they are easy to interpret and compare.

$$d_{i,j} = x^{ch}_{j} - x^{ch}_{i} \tag{9}$$

$$DBS_{i,j} = 1 - \frac{d_{i,j} - \min(d)}{channel\_number} \tag{10}$$

By normalizing the matrices S and D and combining them, a Combined matrix was created (Eq. (11)). This matrix combines both the positional difference between clusters on the channel axis and the similarity between their feature vectors, allowing the separability of the clusters from each other to be investigated. The resulting Combined matrix reflects the overall similarity between pairs of clusters, taking into account both their positions

Fig. 5. Embedding of the waveforms after NNCLR training. Using the NNCLR method, a highly separable latent space is obtained. To visualize the high-dimensional
space, t-SNE is applied for dimensionality reduction. In the figure, we can observe that different clusters overlap each other if their waveforms are similar while
being separated from those that differ.

Table 1
Detection accuracy comparison. Results of different algorithms on different paired datasets. Two main datasets were used to compare our model to state-of-the-art spike sorting algorithms.

Algorithm | Boyden 1103_1_1 | Boyden 509_1_1 | Boyden 419_1_7 | Yger 20170621 | Yger 20170622_1 | Yger 20170622_2 | Avg acc
HerdingSpikes2 (Hilgen et al., 2017) | – | – | – | 0.93 | 0.81 | 0.93 | 0.89*
IronClust (Jun, Mitelut, et al., 2017) | 0.84 | 0.76 | 0.74 | 0.84 | 0.66 | 0.94 | 0.80
JRClust (Jun, Mitelut, et al., 2017) | 0.92 | 0.53 | 0.88 | – | – | 0.94 | 0.82*
KiloSort (Pachitariu et al., 2016) | 0.96 | 0.05 | 0.75 | 0.97 | 0.97 | 0.94 | 0.77
KiloSort2 (Stringer et al., 2019) | 0.57 | 0.65 | 0.90 | 0.42 | 1.00 | 0.94 | 0.74
MountainSort4 (Chung et al., 2017) | 0.96 | 0.76 | 0.71 | 1.00 | 0.97 | 0.92 | 0.88
SpykingCircus (Yger et al., 2018) | 0.93 | 0.69 | 0.75 | 0.98 | 1.00 | 0.94 | 0.88
Tridesclous (Pouzat and Garcia, 2019) | 0.89 | 0.00 | 0.71 | 0.98 | 0.96 | 0.94 | 0.74
Average acc | 0.86 | 0.49 | 0.77 | 0.87 | 0.91 | 0.93 | 0.80
Ours | 0.96 | 0.73 | 0.88 | 1.00 | 1.00 | 1.00 | 0.93

and their feature vectors.

$$Combined = \frac{S \cdot D}{\max(S \cdot D)} \tag{11}$$

The Template Embedding Similarity matrix (TES) (Eq. (12)) was built to investigate the distance between the embeddings of the cluster-wise averaged waveforms. In each point of the matrix, the distance is shown between the mean waveform embeddings and the embedding of the mean waveforms of a particular cluster.

$$TES_{i,j} = \left\| E\left(\frac{1}{N}\sum_{n} x^{w}_{i,n}\right) - E\left(\frac{1}{M}\sum_{m} x^{w}_{j,m}\right) \right\|_{2} \tag{12}$$
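Under the reconstructions of Eqs. (7)–(11) above, these matrices reduce to a few lines of NumPy; the function and argument names below are illustrative. TES (Eq. (12)) is analogous to MES, with the encoder applied to each cluster-mean waveform before taking distances.

```python
import numpy as np

def mes_matrix(cluster_embeddings):
    """Eqs. (7)-(8). cluster_embeddings: list of (n_i, dim) arrays."""
    means = np.stack([e.mean(axis=0) for e in cluster_embeddings])
    diff = means[:, None, :] - means[None, :, :]
    s = np.linalg.norm(diff, axis=-1)                     # Eq. (7)
    return 1.0 - (s - s.min()) / (s.max() - s.min())      # Eq. (8)

def dbs_matrix(main_channels, n_channels):
    """Eqs. (9)-(10). main_channels: (n_clusters,) main channel per cluster."""
    d = main_channels[None, :] - main_channels[:, None]   # Eq. (9)
    return 1.0 - (d - d.min()) / n_channels               # Eq. (10)

def combined_matrix(S, D):
    """Eq. (11): element-wise combination, rescaled to a maximum of 1."""
    sd = S * D
    return sd / sd.max()
```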
The evaluation of the final supervised model was performed in two steps: the evaluation of the spike detection capability and the evaluation of the feature prediction quality. To evaluate the detection capability of the supervised model, a series of boxes were excluded from the computations: boxes that were on the edge of the samples on the time axis were excluded, because it is not optimal to detect and predict the feature set of halved spikes. The evaluation of the feature prediction was also made on the outputs filtered this way.

The accuracy metric was used to evaluate the performance of the detection and sorting. The accuracy metric used is identical to the one used on the SpikeForest database. This metric (Eq. (13)) takes into account the true positive (TP), false negative (FN) and false positive (FP) elements as well:

$$Acc = \frac{TP}{TP + FN + FP} \tag{13}$$

The FP and FN boxes of the detector were excluded from sorting, but they were incorporated into the scores of the sorting metrics.
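As a one-function sketch (with the TP/FN/FP counts assumed to come from matching predicted spikes against ground truth):

```python
def spikeforest_accuracy(tp: int, fn: int, fp: int) -> float:
    """Eq. (13): SpikeForest-style accuracy for one unit."""
    return tp / (tp + fn + fp)

# Example: 950 matched spikes, 30 missed, 20 spurious detections.
print(spikeforest_accuracy(950, 30, 20))  # 0.95
```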
Scores were generated both for the stand-alone sorting step and for the combination of the detection and sorting performance. The latter was used to evaluate the performance in comparison with other state-of-the-art offline methods.

3. Results

3.1. Results of the unsupervised part

See Figs. 5 and 6.

3.2. Results of the supervised part

See Tables 1–5.

4. Discussion

In this paper we introduced a new approach to deep-learning-based spike sorting, namely the training of a supervised network in a two-staged process, using well-known architectures and concepts. The proposed model can be run on platforms such as TPUs, PCs, and mobile devices supported by the Tensorflow Lite library.

Fig. 6. Similarity matrices of spike clusters. Subfigure (A) shows the similarity matrix between the different cluster means. (B) depicts the normalized 1 − distance between different clusters on the channel axis. Subfigure (C) is the combination of (A) and (B): both are normalized between [0, 1] and combined, building a similarity matrix that considers both the distance and the mean embedding between clusters. In subfigure (D) the normalized similarity matrix between the cluster template embeddings can be seen.

Table 2
Detection performance on hybrid data. Results of average accuracies on the two hybrid recordings with multiple ground truth units. The hybrid datasets are from the Hybrid Janelia dataset.

Dataset | Average detection accuracy
HS_64_12 | 95.69%
HS_32_32 | 50.05%

Table 3
Clustering performance. Results of the two clustering methods used for the sorting of the features given by our model.

Clustering algorithm | HS_64_12 | HS_32_32
ISOSplit5 (25) | 74.60% | 69.90%
Agglomerative clustering | 89.85% | 81.62%

Table 4
Spike sorting performance. Comparison of the different spike sorting algorithms on two of the hybrid datasets.

Algorithm | HS_64_12 | HS_32_32
HerdingSpikes2 | 0.79 | 0.47
IronClust | 0.86 | 0.52
JRClust | 0.89 | 0.53
KiloSort | 0.94 | 0.51
KiloSort2 | 0.84 | 0.53
MountainSort4 | 0.82 | 0.48
SpykingCircus | 0.91 | 0.54
Tridesclous | 0.87 | 0.54
Mean | 0.86 | 0.51
Ours | 0.86 | 0.42

Table 5
Inference speed performance. Comparison of the inference speed of the different types of setups. Inference speed is composed of the model's computation time for making predictions and the time required for the non-maximum suppression (NMS) step.

Setup | NMS included
PC CPU + GPU | 3.95 ms
PC CPU + USB accel. | 5.32 ms
Coral DevBoard Mini | 22.15 ms

The self-supervised model is designed to learn the relevant features of the waveforms without being provided with any labels or specific cluster assignments. Instead, the model is trained to make sure that different views of the same waveform, which are produced through data augmentation, are mapped to similar feature vectors in the embedding space.

This is achieved through the use of the nearest-neighbor contrastive loss function (NNCLR), which measures the similarity between different views of the same waveform by selecting the nearest neighbor in the feature space for one of the feature vectors and calculating the dot product between the two. By training the model in this way, we can ensure that it learns a generalizable representation of the waveform, which is not specific to any particular dataset or recording. This improves the model's ability to generalize to new, unseen data and makes it more robust to variations in the data distribution.

The supervised model is then trained to detect spikes and predict the corresponding feature vectors in the embedding space. This is done through the use of an anchor-based single-shot detector (SSD), which uses a map of anchor boxes to identify possible detections. These boxes are designed to overlap heavily, enabling high-precision detection. To further increase precision, the model predicts the transformation parameters for the positions and sizes of the boxes, which allows it to detect shorter spike activities. Non-max suppression (NMS) is used to select the best box prediction from the overlapping boxes with different confidence scores.

The supervised model is trained in a similar way to other object detection models, by providing it with labeled data that specifies which boxes correspond to spikes and which do not. However, instead of predicting the cluster assignments for each box, the model is trained to predict the feature vectors for the boxes that contain spikes. This approach has the advantage of being more generalizable, as the model is not tied to any specific cluster assignments or electrode geometry. By training on multiple recordings at once, we can ensure that the model learns a more generalizable representation of the waveform features. This is important for the same reason that it was important for the self-supervised model: generalizability. By training on multiple recordings at once, we can ensure that the model is able to handle new recordings with different cluster assignments and/or electrode geometry (Fig. 1).

Importantly, the model operates on pre-filtered input data, assuming a hardware-based bandpass frequency filter is already implemented when it comes to embedded systems (like Neuropixels probes (Jun, Steinmetz, et al., 2017)).

Because the input of the self-supervised model is the one-dimensional waveform itself, not relying on any geometrical structure of the electrode, we can state that the self-supervised model is input-source agnostic; it thus has the advantage that a single model can be trained with data from different sources, producing a more generalized embedding space for waveforms of all forms and shapes.

This appears to be a major benefit over other deep-learning-based spike sorting algorithms, where the system is either supervised, and thus not source agnostic, or unsupervised, with lower accuracy and dependence on the geometrical structure of the electrode.

Supervised models, which are trained on input data from a single source, create an internal embedding of the given waveforms. This means that they represent the waveforms in a specific way in order to identify patterns and make predictions. However, when these models are applied to new recordings, they may struggle because the internal representation of the waveforms is not general enough: it only captures the variability of the properties of the waveforms seen during training and does not account for other types of waveforms. This is a common issue in supervised spike sorting, as the recordings used for training often only contain a limited number of waveforms, and new recordings may have different cluster assignments and/or a different electrode geometry.

In contrast, our self-supervised model is input-source agnostic, meaning it does not rely on the specific structure of the electrode in order to process the data. This allows us to train the model on multiple datasets simultaneously, resulting in a more generalized embedding of the waveform properties. By getting rid of channel-wise information, we are able to consider a broader range of waveform shapes and variations, improving the model's ability to generalize to new recordings.

The ability to train the final supervised model on multiple recordings at once is also important for improving its generalizability. In the supervised setup, this is achieved by assigning a binary class to the relevant boxes (containing neural activity) and predicting the features of those waveforms in the embedding space, rather than assigning cluster numbers. This approach differs from previous conventions in the field, where cluster numbers were typically assigned instead of predicting features. Overall, this two-staged process enables the development of a more robust and generalizable deep learning-based spike sorting model.

4.1. Self-supervised model

One of the limitations of some unsupervised learning algorithms is that they can be biased by the inclusion of labeled data or assumptions about the data. In the case of the self-supervised model used in this paper, the inclusion of cluster-wise-averaged waveforms as one of the pairs for the contrastive learning process could be seen as introducing a supervised bias. However, to mitigate this potential bias, augmentation was applied to the cluster average waveforms as well. Additionally, using different clusters with similar average waveforms can also help to alleviate the impact of this bias, as shown in Fig. 5. In this figure, different clusters tend to overlap because the waveforms are very similar, which helps to avoid unnecessary cluster separation. Overall, these measures help to ensure that the self-supervised model is able to learn more generalized feature representations, rather than being overly influenced by any labeled data or assumptions about the data.

To demonstrate the effectiveness of our approach in creating a general embedding space, and that the overlapping clusters can indeed be resolved by using channel information, we generated similarity matrices to analyze the distinguishability of various clusters (Fig. 6). These matrices were generated by training our model on two different datasets simultaneously, which allowed us to observe the separability of the different waveforms within these datasets. In order to further examine the separability of the clusters, we also included channel-distance information between the clusters in our analysis. This was necessary because the hybrid recordings we used to train our model contained similar waveforms that were used to generate different clusters on different channels. The combination of both types of information resulted in a highly separable matrix, demonstrating the ability of our model to create a general embedding space that is able to effectively separate different waveforms.

4.2. Supervised model

Feature prediction and the detection of the individual spikes were assessed separately as well. To assess the performance of the detection of our model, we used paired recordings. This allowed us to compare the results to those of other existing solutions. The results, shown in Table 1, demonstrate that our model performs very well in terms of spike detection and is able to generalize to new recordings with different electrode parameters and waveform types. In fact, the results show that our model performs better and more consistently than current state-of-the-art methods, even though it is specifically designed for use on embedded systems. These results suggest that our model is a promising solution for accurate and reliable spike sorting in a variety of settings.

Fig. 7. Embedded features after clustering. The features generated by our model: NMS is applied to the features, then PCA, and the results of the latter are given as inputs to t-SNE to visualize the original 32-dimensional feature space in 2 dimensions. The clustering was performed after PCA, with the different colors representing different ground truth clusters.

A separate assessment was made for the two hybrid datasets, where detection, sorting and the combination of the two were considered (Fig. 7). The detection performance has a quite large gap between the two recordings (Table 2): one of the probable explanations for this is that for the HS_64_12 recording the cluster with the smallest SNR has an SNR value of 4.38, while for HS_32_32 the minimum SNR is 0.34. The sorting of the found spikes shows a more robust performance: while the Isosplit5 algorithm provides faster sorting, the agglomerative clustering has better performance on the generated feature space, while being the slower one (Table 3).

Table 4 compares the spike sorting performance of our system with other methods. We demonstrate that our compact model can reach the performance of some of the offline sorters and is comparable with other state-of-the-art methods.

4.3. TPU inference

We tested the inference speed of our model on 128-channel samples in three different scenarios: a completely PC-based setup, where a high-performance CPU and GPU are available; a hybrid setup, where a high-performance CPU is coupled with a TPU-based USB Accelerator; and a development-board-based setup, where a lower-performance CPU is coupled with a TPU. The first setup was the fastest, while the last one was the slowest. The exported TFLite model is also heavily influenced by the speed of the CPU, because of the integrated NMS, which runs on the CPU. The difference in latency seen between the different setups in the "NMS included" column of Table 5 consists of the different processing speeds of the NMS node in the model. The CPU + TPU setup can achieve real-time inference speed, being able to process recordings with sampling rates up to 24 kHz. In contrast, the DevBoard setup has a slower inference speed because of the lower CPU performance: in an online manner, it can handle data with a 6 kHz sampling rate.
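On the Coral devices, a compiled model of this kind is typically driven through the TensorFlow Lite interpreter with the EdgeTPU delegate; the sketch below follows the standard Linux setup (file name and delegate path are illustrative, not from the paper).

```python
import numpy as np
from tflite_runtime.interpreter import Interpreter, load_delegate

interpreter = Interpreter(
    model_path="spike_sorter_edgetpu.tflite",
    experimental_delegates=[load_delegate("libedgetpu.so.1")],
)
interpreter.allocate_tensors()
inp = interpreter.get_input_details()[0]
out = interpreter.get_output_details()

# One quantized 128x128 sample; int8 to match the quantized input type.
sample = np.zeros(inp["shape"], dtype=np.int8)
interpreter.set_tensor(inp["index"], sample)
interpreter.invoke()
boxes_scores_features = [interpreter.get_tensor(o["index"]) for o in out]
```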
4.4. Inside the black box

The supervised model was designed to identify and predict the relevant features of spike waveforms, and to do so, it utilizes a series of filters that are applied to the input data. These filters are designed to activate in response to specific patterns or features within the waveform data, and as such, they are able to identify and classify different types of spike waveforms. To better understand the features that the model is learning, a tool was used to extract the filters and their activations in response to a particular input sample. Fig. 8A shows a selection of these filters, and it can be seen that at the first layers, the filters are activated in response to any spike-like waveform. However, as the model progresses through the layers, the activations become more specific and are able to distinguish between different types of waveforms based on their form and characteristics. Fig. 8B shows the layer-specific activations averaged and superimposed on the original input data. The strong contrast in activation between the spikes and their surroundings suggests that the model is able to identify and predict the features of the spikes not only based on the waveforms themselves, but also based on their relationship with their surroundings and their appearances on other nearby channels.

4.5. Scalability

The presented system can be scaled both data-wise and architecture-wise. One of the benefits of such a system is that it can be trained on more datasets at once, and the backbone architecture of both the unsupervised and the supervised part can be improved according to state-of-the-art methods.

The presented system is designed to be scalable in two key ways. First, it can be trained on multiple datasets simultaneously, allowing it to learn a more generalized representation of the waveform properties. This is achieved by using a self-supervised model to extract relevant features from the waveforms, and a supervised object-detection-based model to detect spikes and predict the feature vectors in the embedding space learned by the self-supervised model. This approach allows the system to be input-source agnostic, meaning it can be trained on data from different sources without requiring any prior knowledge of the recording conditions or electrode geometry.

Fig. 8. Reverse engineering the model. Subfigure (A) depicts the activation of the filters on different levels of the model. The input layer shows the sample that
the following activations were calculated on. The same input layer was used for subfigure (B) as well. On subfigure (B) four different heatmaps are superimposed
on top of the original input. The heatmaps represent the mean activation of all the filters on the respective level.

Second, the system can also be scaled in terms of architecture, by using state-of-the-art methods to improve both the self-supervised and supervised components of the model. This means that the system can be updated to reflect advances in machine learning techniques without requiring any hardware changes. The embedded system so created can then be updated without any hardware changes, effectively converting the problem of spike sorting from a hardware problem to a software problem at the level of embedded systems. This makes the problem of spike sorting more flexible and adaptable, allowing for more efficient and accurate spike sorting on various platforms and devices.

Overall, the scalable nature of the system makes it a powerful Appendix A. Supplementary data
tool for addressing the challenge of spike sorting in a variety of
different contexts. Supplementary material related to this article can be found
online at https://doi.org/10.1016/j.neunet.2023.02.036.
4.6. Limitations and future work
References
One of the limitations of the current system is that it may
struggle with spatio-temporally highly overlapping spikes. This Allen, B. D., Moore-Kochlacs, C., Bernstein, J. G., Kinney, J. P., Scholvin, J.,
Seoane, L. F., et al. (2018). Automated in vivo patch-clamp evaluation
is because the current implementation of the Non-Maximum of extracellular multielectrode array spike recording capability. Journal of
Suppression (NMS) postprocessing step is not optimized for sep- Neurophysiology, 120(5), 2182–2200.
arating spatio-temporally overlapping spikes. As a result, there Berényi, A., Somogyvári, Z., Nagy, A. J., Roux, L., Long, J. D., Fujisawa, S., et
is room for improvement in this area. As a future work, we al. (2014). Large-scale, high-density (up to 512 channels) recording of local
circuits in behaving animals. Journal of Neurophysiology, 111(5), 1132–1149.
intend to not only increase the separability between overlapping
Biffi, E., Ghezzi, D., Pedrocchi, A., & Ferrigno, G. (2008). Spike detection algo-
spikes, but also to migrate the NMS postprocessing to the Tensor rithm improvement, spike waveforms projections with PCA and hierarchical
Processing Unit (TPU) in order to reduce the computational time classification. In IET conf publ, no. 540 CP.
difference between using a PC with a USB accelerator and the Bod, R. B., Rokai, J., Meszéna, D., Fiáth, R., Ulbert, I., & Márton, G. (2022). From
end to end: Gaining, sorting, and employing high-density neural single unit
Coral Dev Board. By addressing these issues, we hope to further
recordings. Frontiers in Neuroinformatics, 16(June).
improve the performance of the spike sorting pipeline. Buccino, A. P., Garcia, S., & Yger, P. (2022). Spike sorting: new trends and chal-
lenges of the era of high-density probes. Progress in Biomedical Engineering,
5. Conclusion 4(2), 1–20.
Cai, H., Gan, C., Wang, T., Zhang, Z., & Han, S. (2019). Once-for-all: Train one
network and specialize it for efficient deployment. (pp. 1–15). Available
To the best of our knowledge, our model is the first deep- from: http://arxiv.org/abs/1908.09791.
learning-based model which can handle recordings with high Chah, E., Hok, V., Della-Chiesa, A., Miller, J. J. H., O’Mara, S. M., & Reilly, R. B.
number of channels, can be deployed to embedded systems (es- (2011). Automated spike sorting algorithm based on Laplacian eigenmaps
pecially to TPUs) and can run real-time spike-sorting (depending and k -means clustering. Journal of Neural Engineering [Internet], 8(1), Arti-
cle 016006, Available from: https://iopscience.iop.org/article/10.1088/1741-
on the configuration) on never-seen recordings with performance 2560/8/1/016006.
comparable to current state-of-the-art methods. Choi, J. H., Jung, H. K., & Kim, T. (2006). A new action potential detector using
CRediT authorship contribution statement

János Rokai: Conceptualization, Methodology, Implemented the algorithms, Writing – original draft. István Ulbert: Writing – original draft, Supervision. Gergely Márton: Conceptualization, Methodology, Writing – original draft, Supervision.

Declaration of competing interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Data availability

Data will be made available on request.
Acknowledgments

Project no. FK132823 has been implemented with the support provided by the Ministry of Innovation and Technology of Hungary from the National Research, Development and Innovation Fund, financed under the FK_19 funding scheme. This research was also funded by the Hungarian Brain Research Program (2017_1.2.1-NKP-2017-00002) and the TUDFO/51757-1/2019-ITM grant by the Hungarian National Research, Development and Innovation Office. Project no. RRF-2.3.1-21-2022-00015 has been implemented with the support provided by the European Union. JR is thankful to Semmelweis University for the EFOP-3.6.3-VEKOP-16-2017-00009 grant and to the Ministry of Innovation and Technology of Hungary from the National Research, Development and Innovation Fund for the ÚNKP-21-3-II-SE-1 grant.

Code availability

The software is publicly available at: https://github.com/rokaijano/spike-on-edge

Appendix A. Supplementary data

Supplementary material related to this article can be found online at https://doi.org/10.1016/j.neunet.2023.02.036.