2004 Unsupervised Spike Detection and Sorting With Wavelets and Superparamagnetic Clustering
R. Quian Quiroga, Z. Nadasdy, and Y. Ben-Shaul
[email protected]
ICNC, Hebrew University, Jerusalem, Israel
This study introduces a new method for detecting and sorting spikes from
multiunit recordings. The method combines the wavelet transform, which
localizes distinctive spike features, with superparamagnetic clustering,
which allows automatic classification of the data without assumptions
such as low variance or gaussian distributions. Moreover, an improved
method for setting amplitude thresholds for spike detection is proposed.
We describe several criteria for implementation that render the algorithm
unsupervised and fast. The algorithm is compared to other conventional
methods using several simulated data sets whose characteristics closely
resemble those of in vivo recordings. For these data sets, we found that
the proposed algorithm outperformed conventional methods.
1 Introduction
The basic algorithmic steps of spike classification are as follows: (1) spike
detection, (2) extraction of distinctive features from the spike shapes, and
(3) clustering of the spikes by these features. Spike sorting methods are
typically based on clustering predefined spike shape features such as peak-
to-peak amplitude, width, or principal components (Abeles & Goldstein,
1977; Lewicki, 1998). Nevertheless, it is impossible to know beforehand
which of these features is optimal for discriminating between spike classes.
2 Theoretical Background
where ψ_{a,b}(t) are dilated (contracted) and shifted versions of a unique
wavelet function ψ(t),

    ψ_{a,b}(t) = |a|^{-1/2} ψ((t − b)/a),    (2.2)
where a and b are the scale and translation parameters, respectively. Equa-
tion 2.1 can be inverted, thus providing the reconstruction of x(t).
The WT maps the signal that is represented by one independent variable
t onto a function of two independent variables a, b. This procedure is redun-
dant and inefficient for algorithmic implementations; therefore, the WT is
usually defined at discrete scales a and discrete times b by choosing the set
of parameters {a_j = 2^{-j}; b_{j,k} = 2^{-j} k}, with integers j and k. Contracted
versions of the wavelet function match the high-frequency components, while
dilated versions match the low-frequency components. Then, by correlating
the original signal with wavelet functions of different sizes, we can obtain
details of the signal at several scales. These correlations with the different
wavelet functions can be arranged in a hierarchical scheme called multires-
olution decomposition (Mallat, 1989). The multiresolution decomposition
algorithm separates the signal into details at different scales and a coarser
representation of the signal named “approximation” (for details, see Mallat,
1989; Chui, 1992; Samar, Swartz, & Raghveer, 1995; Quian Quiroga, Sakow-
icz, Basar, & Schürmann, 2001; Quian Quiroga & Garcia, 2003).
In this study we implemented a four-level decomposition using Haar
wavelets, which are rescaled square functions. Haar wavelets were chosen
due to their compact support and orthogonality, which allows the discrimi-
native features of the spikes to be expressed with a few wavelet coefficients
and without a priori assumptions on the spike shapes.
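Because the Haar wavelet is orthonormal with compact support, the four-level multiresolution decomposition can be written in a few lines. The following sketch is an illustration, not the authors' implementation: `haar_step`, `haar_wavedec`, and the Gaussian-bump test spike are all hypothetical names and data introduced here.

```python
import math

def haar_step(x):
    # One level of the Haar transform: pairwise averages (the coarser
    # approximation) and pairwise differences (the detail), each scaled
    # by 1/sqrt(2) so the transform is orthonormal.
    s = 1.0 / math.sqrt(2.0)
    approx = [(x[2 * i] + x[2 * i + 1]) * s for i in range(len(x) // 2)]
    detail = [(x[2 * i] - x[2 * i + 1]) * s for i in range(len(x) // 2)]
    return approx, detail

def haar_wavedec(x, levels=4):
    # Mallat-style multiresolution decomposition: repeatedly split the
    # approximation. Returns [A_levels, D_levels, ..., D1].
    details = []
    approx = list(x)
    for _ in range(levels):
        approx, d = haar_step(approx)
        details.append(d)
    return [approx] + details[::-1]

# A 64-sample "spike": the coefficient counts are 4 + 4 + 8 + 16 + 32 = 64,
# so each spike is still described by 64 numbers, now feature-localized.
spike = [math.exp(-((i - 20) / 4.0) ** 2) for i in range(64)]
coeffs = haar_wavedec(spike, levels=4)
flat = [c for band in coeffs for c in band]
```

Orthonormality means the transform preserves the signal's energy exactly, which is why a handful of large coefficients can capture the discriminative spike features.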
where T is the temperature (see below). Note that only those nearest neigh-
bors of x_i that were in the same previous state s are the candidates to change
their values to s_new. Neighbors that change their values create a "frontier"
and cannot change their value again during the same iteration. Points that
do not change their value in a first attempt can do so if revisited during the
same iteration. Then for each point of the frontier, we apply equation 2.4
again to calculate the probability of changing the state to s_new for their re-
spective neighbors. The frontier is updated, and the update is repeated until
the frontier no longer changes. At that stage, we start the procedure
again from another point and repeat it several times in order to obtain repre-
sentative statistics. Points that are relatively close together (i.e., corresponding
to a given cluster) will change their state together. This observation can be
quantified by measuring the point-point correlation δ_{s_i,s_j} and defining x_i,
x_j to be members of the same cluster if δ_{s_i,s_j} ≥ θ, for a given threshold θ.
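The frontier-growing update described above can be sketched in code. Since equation 2.4 is not reproduced here, this sketch substitutes a generic Swendsen-Wang-style bond probability p = 1 − exp(−J/T); `grow_cluster`, the uniform coupling J, and the one-dimensional chain graph are illustrative assumptions, not the paper's implementation.

```python
import math
import random

def grow_cluster(neighbors, J, T, seed, rng):
    # Grow one frontier from `seed`, assuming all points start in the same
    # previous Potts state. A neighbor joins the new state with probability
    # p = 1 - exp(-J/T) (a stand-in for the paper's equation 2.4): at low T
    # nearly every neighbor joins, at high T almost none do.
    p_add = 1.0 - math.exp(-J / T)
    member = {seed}
    frontier = [seed]
    while frontier:
        new_frontier = []
        for i in frontier:
            for j in neighbors[i]:
                if j not in member and rng.random() < p_add:
                    member.add(j)           # j changes state and joins the frontier,
                    new_frontier.append(j)  # but cannot flip again this iteration
        frontier = new_frontier
    return member

# Chain of 10 points, each coupled to its immediate neighbors.
nbrs = {i: [j for j in (i - 1, i + 1) if 0 <= j < 10] for i in range(10)}
rng = random.Random(0)
cold = grow_cluster(nbrs, J=1.0, T=0.01, seed=5, rng=rng)   # whole chain flips
hot = grow_cluster(nbrs, J=1.0, T=100.0, seed=5, rng=rng)   # cluster stays small
```

Running many such sweeps and thresholding the point-point correlation δ_{s_i,s_j} ≥ θ is what turns these stochastic flips into cluster assignments.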
Figure 2 summarizes the three principal stages of the algorithm: (1) spikes
are detected automatically via amplitude thresholding; (2) the wavelet trans-
form is calculated for each of the spikes and the optimal coefficients for
separating the spike classes are automatically selected; and (3) the selected
wavelet coefficients then serve as the input to the SPC algorithm, and cluster-
ing is performed after automatic selection of the temperature corresponding
to the superparamagnetic phase. (A Matlab implementation of the algorithm
can be obtained on-line from www.vis.caltech.edu/~rodri.)
Figure 2: Overview of the automatic clustering procedure. (A) Spikes are de-
tected by setting an amplitude threshold. (B) A set of wavelet coefficients rep-
resenting the relevant features of the spikes is selected. (C) The SPC algorithm
is used to cluster the spikes automatically.
    Thr = 4 σ_n;    σ_n = median( |x| / 0.6745 ),    (3.1)
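The idea behind equation 3.1 is that the median of |x| is barely affected by the sparse, high-amplitude spikes, so dividing by 0.6745 (the median of |x| for a unit-variance Gaussian) recovers the noise standard deviation. A minimal sketch, where `detection_threshold` and the synthetic signal are illustrative stand-ins for a real recording:

```python
import random
import statistics

def detection_threshold(x):
    # Robust noise estimate (eq. 3.1): median(|x|)/0.6745 estimates sigma_n
    # for Gaussian background noise while largely ignoring the spikes,
    # which occupy only a small fraction of the samples.
    sigma_n = statistics.median(abs(v) for v in x) / 0.6745
    return 4.0 * sigma_n

# Unit-variance Gaussian background noise with sparse spike-like outliers:
rng = random.Random(1)
signal = [rng.gauss(0.0, 1.0) for _ in range(20000)]
for i in range(0, 20000, 400):
    signal[i] += 10.0          # 50 large "spikes" among 20000 samples
thr = detection_threshold(signal)   # close to 4, i.e. 4 * true sigma_n
```

A threshold based on the raw standard deviation of the same signal would be inflated by the spikes, which is exactly the firing-rate dependence shown in Figure 3.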
Figure 3: Estimation of noise level used for determining the amplitude thresh-
old. Note how the conventional estimation based on the standard deviation of
the signal increases with the firing rate, whereas the improved estimation from
equation 3.1 remains close to the real value. See the text for details.
4 Data Simulation
Figure 4: Simulated data set used for spike sorting. (A) The three template
spike shapes. (B) The previous spikes embedded in the background noise. (C)
The same data with a magnified timescale. Note the variability of spikes from
the same class due to the background noise.
1672 R. Quiroga, Z. Nadasdy, and Y. Ben-Shaul
Figure 5: Another simulated data set. (A) The three template spike shapes. (B)
The previous spikes embedded in the background noise. (C) The same data
with a magnified timescale. Here the spike shapes are more difficult to differen-
tiate. Note in the lower plot that the variability in the spike shapes makes their
classification difficult.
between them are relatively small and temporally localized. With the
background noise added, it becomes very difficult to identify the three spike
classes (see Figure 5C). As with the previous data set, the variability of
spikes of the same class is apparent.
All the data sets used in this article are available on-line at
www.vis.caltech.edu/~rodri.
5 Results
The method was tested using four generic examples of 60 sec length, each
simulated at four different noise levels, as described in the previous section.
Since the first example was relatively easy to cluster, in this case we also
generated four extra time series with higher noise levels.
5.1 Spike Detection. Figures 4 and 5 show two of the simulated data
sets. The horizontal lines drawn in Figures 4B and C and 5B and C are
the thresholds for spike detection using equation 3.1. Table 1 summarizes
the performance of the detection procedure for all data sets and noise lev-
els. Detection performances for overlapping spikes (i.e., spike pairs within
64 data points) are reported separately (values in parentheses). Overlapping
Table 1: Number of Misses and False Positives for the Different Data Sets.

Example (Noise Level)   Number of Spikes   Misses      False Positives
Example 1 [0.05]        3514 (785)          17 (193)   711
          [0.10]        3522 (769)           2 (177)    57
          [0.15]        3477 (784)         145 (215)    14
          [0.20]        3474 (796)         714 (275)    10
spikes hamper the detection performance because they are detected as sin-
gle events when they appear too close in time.
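This merging of close events can be reproduced with a toy detector: like any threshold detector with a dead time, it absorbs further crossings within 64 samples into the current event. The sketch below is illustrative; `detect_spikes` and its dead-time rule are assumptions, not the paper's detection code.

```python
def detect_spikes(x, thr, dead_time=64):
    # Threshold-crossing detector: each upward crossing starts an event, and
    # further crossings within `dead_time` samples are absorbed into it. This
    # is why overlapping spikes closer than 64 data points are counted as
    # single events in Table 1.
    events = []
    i = 0
    while i < len(x):
        if x[i] > thr:
            events.append(i)
            i += dead_time   # skip the dead time after a detection
        else:
            i += 1
    return events

x = [0.0] * 1000
x[100] = x[130] = x[300] = 5.0        # two spikes 30 samples apart, one isolated
events = detect_spikes(x, thr=1.0)    # -> [100, 300]: the close pair merges
```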
In comparison with the other examples, a relatively large number of
spikes were not detected in data set 1 for the highest noise levels (0.15 and
0.2). This is due to the spike class with opposite polarity (class 2 in Figure
4). In fact, setting up an additional negative threshold reduced the number
of misses from 145 to 5 for noise level 0.15 and from 714 to 178 for 0.2. In
the case of the overlapping spikes, this reduction is from 360 to 52 and from
989 to 134, respectively. In all other cases, the number of undetected spikes
was relatively low.
With the exception of the first two noise levels in example 1 and the
first noise level in example 3, the number of false positives was very small
(less than 1%). Lowering the threshold value in equation 3.1 (e.g., 3.5 σn )
would indeed reduce the number of misses but also increase the number of
false positives. The optimal trade-off between number of misses and false
positives depends on the experimenter’s preference, but we remark that
the automatic threshold of equation 3.1 gives an optimal value for different
noise levels. In the case of example 1 (noise level 0.05 and 0.1) and example
3 (noise level 0.05), the large number of false positives is exclusively due
to double detections. Since the noise level is very low in these cases, the
threshold is also low, and consequently, the second positive peak of the
class 3 spike shown in Figure 4 is detected. One solution would be to take a
higher threshold value (e.g., 4.5 σn ), but this would not be optimal for high
Figure 6: Wavelet transform of the spikes from Figure 4 and Figure 5 (panels A
and B, respectively). Each curve represents the wavelet coefficients for a given
spike, the gray levels denoting the spike class after clustering with SPC. (A)
Several wavelet coefficients are sensitive to localized features. (B) Separation is
much more difficult due to the similarity of the spike shapes. The markers show
coefficients selected based on the variance criterion and coefficients selected
based on deviation from normality. D1–D4 are the detail levels, and A4
corresponds to the last approximation level.
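Selecting coefficients by their deviation from normality, as in the caption above, can be sketched with a one-sample KS statistic against a normal distribution fitted to each coefficient (a Lilliefors-style score). `ks_normality`, `select_coefficients`, and the two-coefficient toy data below are illustrative assumptions, not the authors' implementation.

```python
import math
import random

def ks_normality(values):
    # One-sample KS statistic against a normal fitted to the data: large
    # values mean "far from Gaussian", i.e. the coefficient is likely
    # multimodal and therefore useful for separating spike classes.
    n = len(values)
    mu = sum(values) / n
    sd = math.sqrt(sum((v - mu) ** 2 for v in values) / (n - 1))
    xs = sorted(values)
    d = 0.0
    for i, v in enumerate(xs):
        cdf = 0.5 * (1.0 + math.erf((v - mu) / (sd * math.sqrt(2.0))))
        d = max(d, abs(cdf - (i + 1) / n), abs(cdf - i / n))
    return d

def select_coefficients(coeff_matrix, k=10):
    # coeff_matrix[i][j]: wavelet coefficient j of spike i.
    # Rank coefficients by deviation from normality; keep the top k.
    ncoef = len(coeff_matrix[0])
    scores = [ks_normality([row[j] for row in coeff_matrix]) for j in range(ncoef)]
    return sorted(range(ncoef), key=lambda j: scores[j], reverse=True)[:k]

# Toy data: coefficient 0 is bimodal (two spike classes), coefficient 1 is
# pure Gaussian noise; the KS score singles out coefficient 0.
rng = random.Random(2)
spikes = [[(1.0 if i % 2 == 0 else -1.0) + rng.gauss(0, 0.1),
           rng.gauss(0, 1.0)] for i in range(200)]
best = select_coefficients(spikes, k=1)
```

Note how a pure variance criterion could prefer a noisy unimodal coefficient, whereas the normality score directly rewards multimodality.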
1676 R. Quiroga, Z. Nadasdy, and Y. Ben-Shaul
Figure 8: Best projection of the wavelet coefficients selected with the (A) KS
criterion and the (B) variance criterion. (C) The projection on the first three
principal components. Note that only with the wavelet coefficients selected
using the KS criterion is it possible to separate the three clusters. Cluster
assignment (shown with different gray levels) was done after use of SPC.
Table 2: Number of Classification Errors for All Examples and Noise Levels
Obtained Using SPC, K-Means, and Different Spike Features.
Notes: In parentheses are the number of correct clusters detected when different from 3. The numbers
corresponding to the example shown in Figure 8 are underlined.
Figure 9: (A) Cluster size vs. temperature. Based on the stability criterion of
the clusters, a temperature of 0.02 was automatically chosen for separating the
three spike classes. (B) All spikes with gray levels according to the outcome of
the clustering algorithm. Note the presence of overlapping spikes. (C) Original
spike shapes.
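One possible reading of this temperature-selection step can be sketched in code. The rule below (pick the highest temperature at which at least two sizable clusters survive, i.e. the superparamagnetic regime before clusters dissolve) is an illustrative assumption, not necessarily the paper's exact stability criterion; `choose_temperature` and the size curve are hypothetical.

```python
def choose_temperature(sizes_by_T, min_size=60):
    # sizes_by_T: list of (T, [cluster sizes]) pairs from SPC runs at
    # increasing temperatures. Below the transition one giant cluster
    # dominates; above it, clusters shatter into fragments. Keep the
    # highest T where at least two clusters above `min_size` persist.
    best = sizes_by_T[0][0]
    for T, sizes in sizes_by_T:
        if sum(1 for s in sizes if s >= min_size) >= 2:
            best = T
    return best

# Hypothetical size-vs-temperature curve shaped like Figure 9A:
curve = [(0.00, [3400, 12, 5]),        # ferromagnetic: one giant cluster
         (0.02, [1200, 1150, 1000]),   # superparamagnetic: three classes
         (0.05, [400, 55, 30]),        # clusters start to fragment
         (0.10, [150, 40, 20])]        # paramagnetic: noise-sized pieces
chosen = choose_temperature(curve)     # -> 0.02, as in the caption's example
```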
Figure 10: Outcome of the clustering algorithm for the three remaining exam-
ples. The inset plots show the original spike shapes. Most of the classification
errors (gray traces) are due to overlapping spikes with short temporal separa-
tion.
of the feature space. The most widely used algorithms are supervised and
usually assume gaussian cluster shapes and specific properties of the noise
distribution. To illustrate the difference from these methods, we
will compare results using SPC with those obtained using K-means. The
partitioning of data by K-means keeps objects of the same cluster as close
as possible (using Euclidean distance in our case) and as far as possible
from objects in the other clusters. The standard K-means algorithm leaves
tudes 1.0, 0.7, and 0.5, separated by 3 ms on average (SD = 1 ms, range 1–5 ms).
From a total of 2360 spikes, again the three clusters were correctly detected,
and we had 25 classification errors using wavelets with SPC.
Finally we considered a correlation between the spike amplitudes and
the background activity, similar to the condition when spikes co-occur with
local field events. We used the same spike shapes and noise level shown
in Figure 5, but the spike amplitudes varied from 0.5 to 1 depending on
6 Discussion
wavelet coefficients in comparison with PCA was shown for several exam-
ples generated with different noise levels. For comparison, we also used the
whole spike shape as input to the clustering algorithm. As shown in Table
2, the dimensionality reduction achieved with the KS test clearly improves
the clustering performance. Since wavelets are a linear transform, using all
the wavelet coefficients yields nearly the same results as taking the entire
spike shape (as it is just a rescaling of the space). Since the need of a low-
Acknowledgments
We are very grateful to Richard Andersen and Christof Koch for support
and advice. We also acknowledge very useful discussions with Noam Shental,
Moshe Abeles, Ofer Mazor, Bijan Pesaran, and Gabriel Kreiman. We are
indebted to Eytan Domany for providing us the SPC code and to Alon Nevet,
who provided the original spike data for the simulation. This work was
supported by the Sloan-Swartz Foundation and DARPA.
References
Abeles, M., & Goldstein, M. (1977). Multispike train analysis. Proc. IEEE, 65,
Quian Quiroga, R., Kraskov, A., Kreuz, T., & Grassberger, P. (2002). Performance
of different synchronization measures in real data: A case study on electroen-
cephalographic signals. Phys. Rev. E, 65, 041903.
Quian Quiroga, R., Sakowicz, O., Basar, E., & Schürmann, M. (2001). Wavelet
transform in the analysis of the frequency composition of evoked potentials.
Brain Research Protocols, 8, 16–24.
Samar, V. J., Swartz, K. P., & Raghveer, M. R. (1995). Multiresolution analysis of