Implementing a calibration-free SSVEP-based BCI system with 160 targets

Yonghao Chen; Chen Yang; Xiaochen Ye; Xiaogang Chen; Yijun Wang; Xiaorong Gao

doi:10.1088/1741-2552/ac0bfa

1. Introduction

A brain–computer interface (BCI) builds a direct pathway between the human brain and external devices without relying on peripheral muscles and tissues [1]. The steady-state visual evoked potential (SSVEP), an influential paradigm in electroencephalogram (EEG)-based BCI, has attracted considerable attention because of its high information transfer rate (ITR) [2] and low dependence on training data [3]. SSVEPs are neural responses, mostly in the occipital region, that arise when a user gazes at a visual stimulus flickering at a fixed frequency [4]. Generally, the responses consist of oscillations at the fundamental frequency of the visual stimulation and at its harmonics. By encoding different frequencies and phases into visual stimuli, many SSVEP-based BCI systems with multiple targets have been proposed [5]. A typical SSVEP-based speller contains 40 commands whose frequencies range from 8 Hz to 15.8 Hz in steps of 0.2 Hz [2]. Because SSVEP signals have robust features in the frequency domain, many studies have validated the feasibility of calibration-free SSVEP-based BCI through canonical correlation analysis (CCA) [6].

The most popular metric for performance evaluation in BCI is the ITR [7], which is calculated from the number of targets, the average detection time, and the recognition accuracy. Previous studies mainly focused on boosting recognition accuracy and decreasing detection time. Many P300-based BCI systems with multiple targets were devised at the cost of longer times to output instructions and acquire training data, which reduced the overall ITR [8, 9]. Recently, a hybrid SSVEP and P300 based BCI system implemented 108 targets with training data [10]. However, limited research has been done on increasing the number of instructions in SSVEP-based BCI, especially in systems without calibration. The traditional stimulus encoding approach, joint frequency-phase modulation (JFPM) [2], becomes challenging when the interval between stimulus frequencies must be reduced to accommodate more targets: as the separation between stimulus frequencies decreases, correct recognition becomes harder, which places a heavier burden on the classification algorithm. Dual-frequency decoding is a promising mechanism for increasing the number of targets by combining two different stimulus frequencies [11–13]. Nevertheless, this paradigm requires sufficient training data for detection, and collecting those data is time-consuming. Moreover, an encoding method named multiple frequencies sequential coding (MFSC) [14] was proposed for increasing the number of targets in SSVEP-based BCIs. MFSC encodes a stimulus as a sequence of different frequencies, and the study [14] demonstrated its feasibility and potential. Stimuli based on MFSC can be decoded separately in every epoch, which provides a realizable route to a calibration-free system with multiple targets. The encoding can also be applied conveniently to asynchronous systems through cyclic codes. Nonetheless, the study [14] only demonstrated a four-target system. Therefore, it is still meaningful to investigate whether MFSC can be applied to a calibration-free system with numerous commands.

An SSVEP–BCI system implementing more targets can be applied in more situations that require many instructions. For instance, SSVEP-based BCI has been applied to robot control [15], keyboard simulation [16], speller systems [2], and spatial navigation [17]. The needs of such applications cannot be met by SSVEP–BCI systems with only a few targets.

Although a traditional SSVEP-based BCI contains a limited number of targets, users can enter several characters in succession to simulate a system with many targets. However, in actual experiments, it is difficult for users to enter inputs accurately and continuously. Taking a two-target SSVEP-based BCI system as an example, seven consecutive inputs are equivalent to ${2^7} = 128$ targets. However, to maintain an overall accuracy of 90%, the average accuracy of every single input must be $\sqrt[7]{{0.9}} = 0.9851$, which is difficult for many users. As a result, a system that implements many targets directly is essential.
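The arithmetic behind this argument can be reproduced with a few lines of Python. This is a minimal sketch that assumes the consecutive inputs are independent and equally accurate; the function names are illustrative and do not come from the study.

```python
def equivalent_targets(targets_per_input, n_inputs):
    """Number of distinct commands reachable with n_inputs consecutive selections."""
    return targets_per_input ** n_inputs


def required_single_accuracy(overall_accuracy, n_inputs):
    """Per-input accuracy needed so that n_inputs independent inputs are all correct
    with probability overall_accuracy (independence is an assumption)."""
    return overall_accuracy ** (1.0 / n_inputs)


def inputs_needed(total_targets, targets_per_input):
    """Smallest number of consecutive inputs that covers total_targets commands."""
    n, reachable = 0, 1
    while reachable < total_targets:
        reachable *= targets_per_input
        n += 1
    return n


if __name__ == "__main__":
    print(equivalent_targets(2, 7))                     # 128 commands
    print(round(required_single_accuracy(0.9, 7), 4))   # 0.9851
    print(inputs_needed(160, 2))                        # inputs needed for 160 commands
```

Under the same assumptions, covering 160 commands with a two-target system would take eight consecutive inputs, which makes the per-input accuracy requirement even stricter.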

Inspired by the fundamental idea of MFSC, this research proposes a calibration-free SSVEP-based BCI system implementing 160 targets. Each stimulus consists of four segments drawn from eight stimulation components (8–15 Hz, interval 1 Hz), and the phases are the same as the corresponding values in the study [18]. Each segment lasts one second, so four seconds are needed to output one instruction. The 160 code words therefore have to be selected from 8⁴ = 4096 possible sequences. In this research, we optimized the selection so that the smallest distance between the neural responses to different visual stimuli is as large as possible. The unsupervised classification algorithm employed in this research is based on filter bank CCA (FBCCA) [3]. Finally, 12 subjects participated in the online experiments performing cue-guided tasks, and eight of them also took part in offline experiments. The online system achieved an average accuracy of 87.16 ± 11.46% and an ITR of 78.84 ± 15.59 bits min⁻¹.

2. Methods

2.1. Subjects

Twelve volunteers from Tsinghua University with normal or corrected-to-normal vision took part in the experiments (two females and ten males, aged 19–30 years). Eight of them took part in both the offline and online experiments, and the other four took part in the online experiments only.

This research was approved by the Research Ethics Committee of Tsinghua University. All subjects were required to read and sign an informed consent form before the experiment and received financial compensation.

2.2. Stimulation paradigm

This experiment used a 37.5 inch liquid-crystal display monitor (U3818DW, DELL, USA) with a 60 Hz refresh rate and a resolution of 3840 × 1600 pixels to present visual stimuli. As shown in figure 1, the monitor presented 160 targets in a 10 × 16 matrix, and each stimulus was presented in a $96 \times 96$ pixel square. The space between stimuli was 144 pixels horizontally and 52 pixels vertically. Note that the interface showed the input box and black characters on the white squares only during online experiments. The targets were organized in ten rows (0–9) and 16 columns (A–P): the 'A0' square was the first target in the first row, the 'B0' square was the second target in the first row, and the 'A1' square was the first target in the second row. Participants' eyes were kept 60 cm from the center of the interface during the experiments.
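The labelling scheme and the grid geometry described above can be checked with a short sketch. The row-major target index and the centring of the grid on the screen are assumptions made for illustration; the text only gives the square size, the gaps, and the screen resolution.

```python
import string

SCREEN_W, SCREEN_H = 3840, 1600   # monitor resolution in pixels
N_ROWS, N_COLS = 10, 16           # rows 0-9, columns A-P
SIZE = 96                         # side of each square stimulus in pixels
GAP_X, GAP_Y = 144, 52            # horizontal and vertical gaps in pixels


def target_label(index):
    """Label such as 'A0' for a row-major target index (index order is an assumption)."""
    row, col = divmod(index, N_COLS)
    return f"{string.ascii_uppercase[col]}{row}"


def target_top_left(index):
    """Top-left pixel of a target, assuming the grid is centred on the screen."""
    row, col = divmod(index, N_COLS)
    grid_w = N_COLS * SIZE + (N_COLS - 1) * GAP_X
    grid_h = N_ROWS * SIZE + (N_ROWS - 1) * GAP_Y
    x0 = (SCREEN_W - grid_w) // 2
    y0 = (SCREEN_H - grid_h) // 2
    return x0 + col * (SIZE + GAP_X), y0 + row * (SIZE + GAP_Y)


if __name__ == "__main__":
    for i in (0, 1, 16, 159):
        print(i, target_label(i), target_top_left(i))
```

With these numbers the grid occupies 3696 × 1428 pixels, so it fits within the 3840 × 1600 screen; the sketch ignores the space taken by the input box in the online interface.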

**Figure 1.** Stimulation interface of online experiments.

We used multiple frequency sequential coding (MFSC) [14] as the fundamental protocol for encoding the visual stimuli (defined as code words). A code word was composed of four consecutive sinusoidal stimuli (defined as code elements), each lasting one second. This study chose eight kinds of code elements to build the code word set, covering frequencies from 8 Hz to 15 Hz (element '0' to element '7') with an interval of 1 Hz. Figure 2 shows the components of different visual stimuli. Previous studies [2, 18] based on JFPM have demonstrated the importance of mixing frequency and phase, and this study adopted the same phases as those studies. In other words, the phase of the 8 Hz stimulus is 0, the phase of the 9 Hz stimulus is $0.5\pi$, and each further increase of 1 Hz in frequency adds $0.5\pi$ to the phase. The stimulus program was developed in MATLAB using the Psychophysics Toolbox [19].
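As an illustration of this encoding, the following Python sketch generates the frame-by-frame luminance of one code word on a 60 Hz display. The luminance formula (a sampled sinusoid scaled to the range 0 to 1) and the restart of the time index at each segment are assumptions, since the text does not specify them; the study itself used MATLAB with the Psychophysics Toolbox.

```python
import numpy as np

REFRESH_HZ = 60     # monitor refresh rate
SEGMENT_S = 1.0     # duration of one code element in seconds
BASE_HZ = 8         # frequency of code element '0'


def element_frequency(element):
    """Frequency in Hz of code element 0..7 (8 Hz to 15 Hz in 1 Hz steps)."""
    return BASE_HZ + element


def element_phase(element):
    """Phase in radians: 0 at 8 Hz, plus 0.5*pi for every additional 1 Hz."""
    return (0.5 * np.pi * element) % (2 * np.pi)


def code_word_luminance(code_word):
    """Luminance (0..1) per video frame for a code word such as (0, 1, 2, 3).

    Assumption: each segment is a sampled sinusoid whose time index restarts at 0.
    """
    frames = np.arange(int(REFRESH_HZ * SEGMENT_S))
    segments = []
    for element in code_word:
        f = element_frequency(element)
        phi = element_phase(element)
        segments.append(0.5 * (1.0 + np.sin(2 * np.pi * f * frames / REFRESH_HZ + phi)))
    return np.concatenate(segments)


if __name__ == "__main__":
    lum = code_word_luminance((0, 1, 2, 3))
    print(lum.shape)   # 240 frames for a 4 s code word
```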

**Figure 2.** Schematic diagram of encoding visual stimuli (code word).

2.3. Optimization

This study used the neural responses to stimulation to optimize the arrangement of code words. Let ${N^{{\text{word}}}}$ be the number of all possible code words:

$$N^{\text{word}} = E^{L} \tag{1}$$

where E is the number of code elements and L is the number of consecutive segments. The neural response to the lth visual stimulus in the kth code word is denoted $S_k^{\left( l \right)}$. The code set response distance is defined as the distance between the two code word responses that are closest to each other:

$$D_{\text{S}} = \frac{1}{N_{\text{c}}} \min_{1 \leqslant a < b \leqslant E} \sum_{l = 1}^{L} \left( S_b^{(l)} - S_a^{(l)} \right)^2 \tag{2}$$

where a and b are the indices of two code words. The optimization objective can be expressed as:

$$S^* = \arg \max_{S} D_{\text{S}}. \tag{3}$$

However, when a large number of code words is required, the problem becomes large and exhaustive enumeration takes a prohibitive amount of time. Therefore, this study uses the simulated annealing (SA) method [20] to obtain approximate solutions quickly. Initially, a code set is generated randomly; the initial temperature is set to 10 000 × L, the cut-off temperature to 0.001, and the number of iterations to 50 × L.

The responses to different visual stimuli were simulated with the benchmark dataset [18], which contains EEG data of 35 healthy subjects performing standard SSVEP–BCI cue-guided spelling tasks. The experiment consisted of six blocks, and each block consisted of 40 trials in random order. The EEG data were recorded with a 64-channel Synamps2 EEG system (Neuroscan, Inc) according to an extended 10–20 system. Because the SSVEP response is strongest over the occipital region, nine channels in that region (Pz, POz, PO3, PO4, PO5, PO6, Oz, O1, O2) were selected for the optimization. Each trial lasted 6 s, containing 0.5 s of cue time, 5 s of flickering stimulation and 0.5 s of rest. Because every code element lasts 1 s, the calculation only used the data between 0.5 s and 1.5 s. A notch filter at 50 Hz was applied to this dataset to remove power-line noise, and the amplifier filtered the signal with a passband from 0.15 Hz to 200 Hz. The SSVEP responses at the corresponding frequencies, averaged across all blocks and trials, were used to simulate the responses in the optimization problem. Finally, an optimized arrangement ${S^*}$ was calculated, as shown in figure 3.
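The optimization can be sketched in Python as follows. This is an illustrative simulated-annealing loop for equations (2) and (3), not the authors' implementation: the geometric cooling factor `alpha`, the "replace one code word" move, and the synthetic single-channel responses (pure sinusoids standing in for the averaged benchmark EEG) are all assumptions.

```python
import itertools
import math
import random

import numpy as np


def synthetic_responses(n_elements=8, fs=250):
    """Stand-in responses, shape (E, 1 channel, fs samples); real data would replace this."""
    t = np.arange(fs) / fs
    freqs = 8 + np.arange(n_elements)
    phases = 0.5 * np.pi * np.arange(n_elements)
    r = np.sin(2 * np.pi * freqs[:, None] * t + phases[:, None])
    return r[:, None, :]


def element_distances(responses):
    """Squared distance between every pair of code-element responses, divided by N_c."""
    diff = responses[:, None] - responses[None, :]
    return (diff ** 2).sum(axis=(2, 3)) / responses.shape[1]


def min_pairwise_distance(words, d_elem):
    """D_S of equation (2): smallest summed segment distance between two code words."""
    w = np.asarray(words)                                   # shape (K, L)
    d = d_elem[w[:, None, :], w[None, :, :]].sum(axis=2)    # shape (K, K)
    return d[np.triu_indices(len(w), k=1)].min()


def anneal(d_elem, n_elements, length, n_words, t0, t_end, alpha, iters, seed=0):
    """Search for a set of n_words code words that maximizes D_S."""
    rng = random.Random(seed)
    all_words = list(itertools.product(range(n_elements), repeat=length))
    current = rng.sample(all_words, n_words)
    cur_score = min_pairwise_distance(current, d_elem)
    best, best_score = list(current), cur_score
    t = t0
    while t > t_end:
        for _ in range(iters):
            new_word = rng.choice(all_words)
            if new_word in current:
                continue
            candidate = list(current)
            candidate[rng.randrange(n_words)] = new_word
            score = min_pairwise_distance(candidate, d_elem)
            # Maximization: keep improvements, occasionally accept a worse set.
            if score >= cur_score or rng.random() < math.exp((score - cur_score) / t):
                current, cur_score = candidate, score
                if cur_score > best_score:
                    best, best_score = list(current), cur_score
        t *= alpha   # geometric cooling (assumed schedule)
    return best, best_score


if __name__ == "__main__":
    d = element_distances(synthetic_responses(8))
    words, score = anneal(d, 8, 4, 20, t0=100.0, t_end=1.0, alpha=0.7, iters=50)
    print(len(words), score)
```

The demonstration uses a small set and a short schedule so that it runs quickly. The settings reported in the text (initial temperature 10 000 × L, cut-off temperature 0.001, 50 × L iterations, 160 code words) can be passed in the same way.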

**Figure 3.** Components of optimized arrangement ${S^*}$ ('0' represents stimuli of 8 Hz and '7' represents stimuli of 15 Hz), the structure was identical with figure 1.

2.4. Data recording

This study used a Neuroscan Synamps2 system to record EEG signals through nine electrodes placed over the occipital region (Pz, POz, PO3, PO4, PO5, PO6, Oz, O1, O2), where SSVEP signals have the highest signal-to-noise ratio (SNR). All electrodes were placed according to the international 10/20 system. To remove background noise, the recorded signals were filtered with a notch filter at 50 Hz, and electrode impedances were kept below 10 kΩ during recording. Moreover, all signals were downsampled to 250 Hz from the original sampling rate of 1 kHz. The dataset is available at http://bci.med.tsinghua.edu.cn.
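A software equivalent of these two preprocessing steps might look like the sketch below, which uses SciPy. Whether the notch filter in the study was applied in hardware or software is not stated, so the filter design here (an IIR notch with quality factor 30, applied forwards and backwards) is an assumption, as is the anti-aliasing filter used by `scipy.signal.decimate`.

```python
import numpy as np
from scipy import signal


def preprocess(eeg, fs_in=1000, fs_out=250, notch_hz=50.0, q=30.0):
    """Notch-filter at 50 Hz and downsample from 1 kHz to 250 Hz.

    eeg: array of shape (n_channels, n_samples) sampled at fs_in.
    The quality factor q is an illustrative choice, not a value from the study.
    """
    b, a = signal.iirnotch(notch_hz, q, fs=fs_in)
    notched = signal.filtfilt(b, a, eeg, axis=-1)
    factor = fs_in // fs_out
    # decimate applies a low-pass anti-aliasing filter before keeping every 4th sample
    return signal.decimate(notched, factor, axis=-1, zero_phase=True)


if __name__ == "__main__":
    t = np.arange(4000) / 1000.0
    demo = np.tile(np.sin(2 * np.pi * 10 * t) + np.sin(2 * np.pi * 50 * t), (9, 1))
    print(preprocess(demo).shape)   # nine channels, 4 s at 250 Hz
```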

2.5. Feature extraction and classification algorithms

The most common algorithms used in SSVEP–BCI are CCA [6] and its extensions. CCA finds spatial filters for the multichannel EEG signals and the predefined sinusoidal templates that maximize the correlation between the two filtered signals, which enhances the SNR of the EEG signals. The Pearson correlation between the weighted templates and the weighted signals is then used as the classification criterion. When the sampling rate is ${f_{\text{s}}}$ and the number of sampling points is ${N_{\text{s}}}$, the sinusoidal template ${\boldsymbol{Y}_i}$ for stimulus frequency ${f_i}$ is defined as:

$$\begin{aligned}
\boldsymbol{Y}_i &= \begin{bmatrix} \boldsymbol{y}_i\left( \frac{1}{f_{\text{s}}} \right) & \cdots & \boldsymbol{y}_i\left( \frac{N_{\text{s}}}{f_{\text{s}}} \right) \end{bmatrix}, \\
\boldsymbol{y}_i(t) &= \begin{bmatrix} \sin\left( 2\pi f_i t \right) \\ \cos\left( 2\pi f_i t \right) \\ \vdots \\ \sin\left( 2\pi N_{\text{h}} f_i t \right) \\ \cos\left( 2\pi N_{\text{h}} f_i t \right) \end{bmatrix}, \quad t = \frac{1}{f_{\text{s}}}, \frac{2}{f_{\text{s}}}, \frac{3}{f_{\text{s}}}, \ldots, \frac{N_{\text{s}}}{f_{\text{s}}}
\end{aligned} \tag{4}$$

where ${N_{\text{h}}}$ is the number of harmonics considered. Building on standard CCA, many extended algorithms have been developed to achieve higher performance [21]. Among them, the study [3] introduced a filter bank preprocessing strategy to CCA, which significantly enhanced the performance of SSVEP–BCI. FBCCA applies several bandpass filters to decompose the SSVEP signals into sub-band components, which allows the harmonic components of SSVEPs to be extracted more efficiently, as validated in many studies [22]. In this study, we took FBCCA as the framework for target detection.
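The template of equation (4) and the CCA correlation can be sketched with NumPy as below. The QR-plus-SVD route to the largest canonical correlation is one standard way to compute it and is chosen here for brevity; it is not necessarily how the study implemented CCA, and the function names are illustrative.

```python
import numpy as np


def sinusoidal_template(freq_hz, n_harmonics, fs, n_samples):
    """Template Y_i of equation (4), shape (2 * n_harmonics, n_samples)."""
    t = np.arange(1, n_samples + 1) / fs
    rows = []
    for h in range(1, n_harmonics + 1):
        rows.append(np.sin(2 * np.pi * h * freq_hz * t))
        rows.append(np.cos(2 * np.pi * h * freq_hz * t))
    return np.vstack(rows)


def cca_max_corr(x, y):
    """Largest canonical correlation between x (channels, samples) and y (rows, samples)."""
    xc = (x - x.mean(axis=1, keepdims=True)).T
    yc = (y - y.mean(axis=1, keepdims=True)).T
    qx, _ = np.linalg.qr(xc)    # orthonormal basis of the centred EEG columns
    qy, _ = np.linalg.qr(yc)    # orthonormal basis of the centred template columns
    s = np.linalg.svd(qx.T @ qy, compute_uv=False)
    return float(np.clip(s[0], 0.0, 1.0))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    fs, n = 250, 250
    t = np.arange(1, n + 1) / fs
    eeg = np.sin(2 * np.pi * 10 * t + 0.7) + rng.standard_normal((9, n))
    for f in (8, 10, 12):
        print(f, round(cca_max_corr(eeg, sinusoidal_template(f, 5, fs, n)), 3))
```

Because each harmonic contributes both a sine and a cosine row, the correlation does not depend on the phase of the response, which is why the templates carry no explicit phase term.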

Following the general framework of feature extraction and classification in SSVEP–BCI [23], we used the filter bank and spatial filters for feature extraction, and then used the Pearson correlation coefficient as the classification feature. The flow diagram of the detection process is shown in figure 4. A latency of 140 ms in the visual pathway [24] was removed from the recorded data. The four code elements of a code word defined four sinusoidal templates ( ${\boldsymbol{Y}_{{c_j}}} \in {\mathbb{R}^{2{N_{\text{h}}} \times {N_{\text{s}}}}},j = 0,1,2,3$ ), which were correlated with the four corresponding data segments ( ${X_j} \in {\mathbb{R}^{{N_{\text{c}}} \times {N_{\text{s}}}}},j = 0,1,2,3$ ), where ${N_{\text{c}}}$ is the number of channels. The predefined filter bank of ${N_{{\text{fb}}}}$ filters decomposed the data into sub-band components $\left( {X_j^m \in {\mathbb{R}^{{N_{\text{c}}} \times {N_{\text{s}}}}},j = 0,1,2,3,m = 1,2, \ldots ,{N_{{\text{fb}}}}} \right)$. The lower and upper cut-off frequencies were the same as in [3]: for the mth sub-band, the lower cut-off frequency was $m \times 8$ Hz and the upper cut-off frequency was 90 Hz, implemented with Chebyshev Type I infinite impulse response (IIR) filters. The corresponding correlation coefficients are obtained through:

$$\rho_{ij} = \sum_{m = 1}^{N_{\text{fb}}} w(m) \times \text{CCA}\left( X_j^m, \boldsymbol{Y}_{c_j}(i) \right), \quad i = 1, 2, \ldots, 160, \; j = 0, 1, 2, 3 \tag{5}$$

**Figure 4.** Flow diagram of the detection algorithm. (A) Format of original data. (B) The relationship between sinusoidal template and code words. (C) The core classification process.

where the weight $w\left( m \right)$ is defined as:

$$w(m) = m^{-1.25} + 0.25, \quad m = 1, 2, \ldots, N_{\text{fb}}. \tag{6}$$
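The filter bank and the weights of equation (6) can be sketched with SciPy as follows. The cut-off frequencies (m × 8 Hz to 90 Hz) and the Chebyshev Type I family follow the text; the filter order, the passband ripple and the zero-phase filtering are assumptions, because the text does not give them.

```python
import numpy as np
from scipy import signal


def fb_weights(n_fb):
    """Weights w(m) = m^-1.25 + 0.25 for m = 1..n_fb, as in equation (6)."""
    m = np.arange(1, n_fb + 1, dtype=float)
    return m ** -1.25 + 0.25


def filter_bank(x, fs=250, n_fb=6, order=4, ripple_db=0.5, upper_hz=90.0):
    """Decompose x (channels, samples) into n_fb sub-bands.

    Returns an array of shape (n_fb, channels, samples).
    order and ripple_db are illustrative values, not taken from the study.
    """
    bands = []
    for m in range(1, n_fb + 1):
        sos = signal.cheby1(order, ripple_db, [8.0 * m, upper_hz],
                            btype="bandpass", fs=fs, output="sos")
        bands.append(signal.sosfiltfilt(sos, x, axis=-1))
    return np.stack(bands)


if __name__ == "__main__":
    print(np.round(fb_weights(6), 3))
    t = np.arange(1000) / 250.0
    x = np.sin(2 * np.pi * 10 * t)[None, :]
    sub = filter_bank(x)
    print(sub.shape, np.round(sub.var(axis=(1, 2)), 4))
```

In the demonstration, a 10 Hz sinusoid passes the first sub-band (8–90 Hz) and is strongly attenuated in the higher ones, which is the behavior that lets the higher sub-bands isolate harmonic components.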

Because the selected code words are sparse among all possible combinations, the correlation coefficients of the four segments can be combined into a single classification feature (the square root of the sum of their squares), and the detection criterion can be described as:

$$\tau = \mathop{\text{argmax}}\limits_i \delta_i = \mathop{\text{argmax}}\limits_i \sqrt{\rho_{i0}^2 + \rho_{i1}^2 + \rho_{i2}^2 + \rho_{i3}^2}, \quad i = 1, 2, \ldots, 160. \tag{7}$$
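The decision rule of equation (7) can be sketched as below. The sketch relies on the observation that the template of target i in segment j depends only on the code element in that position, so it is enough to compute one weighted correlation per code element and segment (an 8 × 4 table) and look the values up for each code word. That shortcut, the array layout, and the 0-based target index are choices made for this illustration.

```python
import numpy as np


def classify_code_word(rho_elements, code_words):
    """Pick the code word with the largest combined correlation (equation (7)).

    rho_elements: array (E, L); entry [e, j] is the weighted correlation between
                  data segment j and the template of code element e.
    code_words:   integer array (K, L); row i lists the elements of code word i.
    Returns (0-based index of the detected code word, delta for every code word).
    """
    code_words = np.asarray(code_words)
    segments = np.arange(code_words.shape[1])
    rho = rho_elements[code_words, segments]       # (K, L): rho_ij of equation (5)
    delta = np.sqrt((rho ** 2).sum(axis=1))
    return int(np.argmax(delta)), delta


if __name__ == "__main__":
    words = np.array([[0, 1, 2, 3], [0, 1, 2, 4], [7, 6, 5, 4]])
    rho = np.full((8, 4), 0.1)
    for segment, element in enumerate(words[1]):
        rho[element, segment] = 0.8                # simulate a response to word 1
    print(classify_code_word(rho, words))
```

In the example, code word 0 shares three of its four elements with the true code word, yet the combined feature still separates them. This illustrates why a single weak segment does not necessarily cause an error.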

2.6. Experiment process

To evaluate the feasibility of the proposed paradigm, we conducted a series of offline and online experiments. The main purposes of the offline experiments were to provide data for parameter optimization and performance estimation. Online experiments were then conducted to simulate the actual BCI interaction process. Eight participants (seven males and one female, S2, aged 19–30) were asked to complete three blocks of cue-guided tasks in the offline experiments, and each block contained 160 trials covering all visual stimuli in random order. S3 completed only two blocks for personal reasons. All participants in the offline experiments also took part in the online experiments. Four further students at Tsinghua University (three males and one female, S11, aged 20–29) joined the online experiments as inexperienced users. The online experiments had two blocks, each again consisting of 160 trials that presented the 160 visual stimuli in random order. As shown in figure 5, time was provided for rest (1 s in offline experiments and 0.5 s in online experiments) and for cue presentation (2 s). These intervals could be shortened significantly if subjects were accustomed to the keyboard layout of the interface; they were made long enough for naive users to follow the instructions and rest during long tasks. Because the intervals do not interfere with the classification or the stimulation process, this study still used 0.5 s as the gaze-shifting time between targets in the ITR calculation, similar to previous studies [2, 18]. Volunteers took a break every 40 characters.

**Figure 5.** Flow chart of the experiment procedure (offline and online experiments).

3. Results

3.1. Offline BCI experiments

The offline experiments were designed to evaluate the effectiveness of the system and to help optimize parameters. ${N_{\text{h}}}$ and ${N_{{\text{fb}}}}$ were the key parameters to be optimized. Because this study used fundamental frequencies from 8 Hz to 15 Hz, we adopted ${N_{\text{h}}} = 5$, the same value as previous studies on the same frequency band [2, 3, 21], and only considered the influence of ${N_{{\text{fb}}}}$. The relationship between ${N_{{\text{fb}}}}$ and average accuracy is shown in figure 6. The optimal parameters were therefore ${N_{\text{h}}}$ = 5 and ${N_{{\text{fb}}}}$ = 6. This held across subjects: the optimal ${N_{{\text{fb}}}}$ of every subject was 6.

**Figure 6.** The relationships between ${N_{{\text{fb}}}}$ and average accuracy.

With the optimal parameters ( ${N_{\text{h}}}$ = 5, ${N_{{\text{fb}}}}$ = 6), the results of the offline experiments are listed in table 1, where accuracies higher than 90% are highlighted. In this study, the time T used in the ITR calculation was 4.5 s, comprising 4 s of data for analysis and 0.5 s for shifting gaze between targets.
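For reference, the ITR values can be reproduced with the widely used Wolpaw definition, which depends on the number of targets N, the accuracy P and the time per selection T. The text cites [7] without writing the formula out, so the use of this particular definition is an assumption, as is the convention of returning zero at or below chance level.

```python
import math


def itr_bits_per_min(n_targets, accuracy, trial_seconds):
    """Wolpaw-style ITR in bits per minute (assumed definition)."""
    if accuracy <= 1.0 / n_targets:
        return 0.0                      # convention used here for chance level or below
    bits = math.log2(n_targets)
    if accuracy < 1.0:
        bits += accuracy * math.log2(accuracy)
        bits += (1.0 - accuracy) * math.log2((1.0 - accuracy) / (n_targets - 1))
    return bits * 60.0 / trial_seconds


if __name__ == "__main__":
    # 160 targets, 95.41% accuracy, T = 4.5 s
    print(round(itr_bits_per_min(160, 0.9541, 4.5), 2))
    # Upper bound for 160 targets at 100% accuracy
    print(round(itr_bits_per_min(160, 1.0, 4.5), 2))
```

With N = 160, P = 95.41% and T = 4.5 s, this gives approximately 89.6 bits min⁻¹, in line with the value reported for S1 in table 1.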

Table 1. Results of offline experiments (T = 4.5 s in ITR calculation).

Subjects	Accuracy (%)	ITR (bits min⁻¹)
S1	**95.41**	89.58
S2	**92.91**	85.79
S3	86.56	76.93
S4	89.58	81.04
S5	67.91	54.27
S6	80.62	69.28
S7	85.41	75.42
S8	63.33	49.23
Mean ± STD	82.72 ± 10.80	72.69 ± 14.41

Accuracies higher than 90% are marked in bold

3.2. Online BCI experiments

Unlike the offline experiments, the online experiments displayed the label identified by the detection algorithm ( ${N_{\text{h}}}$ = 5, ${N_{{\text{fb}}}}$ = 6). The results of the online experiments are shown in table 2, where accuracies higher than 90% are highlighted.

Table 2. Results of online experiments (T = 4.5 s in ITR calculation).

Subjects	Accuracy (%)	No. of trials (correct/incorrect)	ITR (bits min⁻¹)
S1	**96.25**	308/12	90.89
S2	**98.44**	315/5	94.55
S3	75.63	242/78	63.17
S4	**94.69**	303/17	88.45
S5	**94.37**	302/18	87.98
S6	62.81	201/119	48.67
S7	**94.06**	301/19	87.50
S8	78.44	251/69	66.57
S9	**91.25**	292/28	83.38
S10	71.56	229/91	58.41
S11	**99.06**	317/3	95.69
S12	89.37	286/34	80.75
Mean	87.16	279/41	78.84
STD	11.46	-	15.59

Accuracies higher than 90% are marked in bold

It is worth mentioning that the online stimulation interface contained characters and an input box, which helps the user concentrate. Consequently, the average recognition accuracy of the online experiments was slightly higher than that of the offline experiments. Overall, only two subjects (S3, S6) achieved lower accuracy in the online experiments, which may be attributed to fluctuations in mental state and attention. In particular, S5 achieved 94.37% accuracy in the online experiments although his offline performance was not outstanding. According to his own report, the characters helped him focus on the stimulation interface, leading to higher accuracy.

3.3. Signal analysis

This study used SSVEP as the primary classification feature, which proved robust in the online experiments. To further validate the characteristics of the recorded EEG signals, we analyzed them in the frequency domain. The average EEG responses to the code elements were obtained by averaging across channels and trials. The amplitude spectra of the EEG responses of a typical subject are shown in figure 7. The responses comprised the fundamental frequency components and their harmonics, consistent with previous studies [18].

**Figure 7.** The amplitude spectra of average EEG responses of a typical subject (S11 in online experiments).

Moreover, because the stimulus frequency changes from segment to segment, the EEG responses to the presented visual stimuli should show a corresponding structure in both time and frequency. We used the short-time Fourier transform (STFT) to illustrate this characteristic. STFT results for some typical SSVEP responses, averaged across subjects and channels, are displayed in figure 8. The results were normalized by column so that each column vector has zero mean and unit standard deviation.
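A minimal sketch of this analysis with SciPy is given below. The 1 s Hann window, the default 50% overlap, and the reading of "columns" as time frames of the STFT matrix are assumptions; the synthetic signal merely stands in for an averaged SSVEP response.

```python
import numpy as np
from scipy import signal


def normalized_stft(x, fs=250, window_s=1.0):
    """STFT amplitude with every column (time frame) scaled to zero mean, unit std."""
    nperseg = int(fs * window_s)
    freqs, times, z = signal.stft(x, fs=fs, nperseg=nperseg)
    amp = np.abs(z)                                   # shape (n_freqs, n_frames)
    amp = (amp - amp.mean(axis=0)) / amp.std(axis=0)
    return freqs, times, amp


def synthetic_code_word(freqs_hz, fs=250, segment_s=1.0):
    """Concatenated sinusoids, one per code element (illustrative stand-in for EEG)."""
    t = np.arange(int(fs * segment_s)) / fs
    return np.concatenate([np.sin(2 * np.pi * f * t) for f in freqs_hz])


if __name__ == "__main__":
    x = synthetic_code_word([8, 12, 15, 10])
    freqs, times, amp = normalized_stft(x)
    for frame in (1, 3, 5, 7):                        # frames centred on each segment
        print(times[frame], freqs[np.argmax(amp[:, frame])])
```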

**Figure 8.** Time-frequency characteristics of the partial SSVEP responses. The numbers in brackets indicate the parallel code words, and the color bars indicate normalized STFT amplitude.

4. Discussions

According to online experiments, sequential coding is a plausible protocol for designing calibration-free SSVEP-based BCI implementing multiple targets. The average accuracy of online experiments demonstrated that the classification feature is robust enough to be identified without training data. It is worth noting that future work can further exploit other strategies to increase classification accuracy and reduce the stimuli duration.

4.1. Methods of increasing accuracy

Because of the highly subject-specific complexity in EEG signals, incorporating training data will significantly improve the detection performance of SSVEP–BCI [22, 25]. No doubt, adopting subject-specific training data can boost the overall detection accuracy in this study. Owing to the characteristic of MFSC in this study, a training set of 160 targets can be acquired through combinations of eight code elements, which shortens training time. Because this study exploited an unsupervised approach with the predefined templates, the transitions between stimuli were neglected. If supervised methods are applied, the responses between transitions would be a crucial factor, and it needs further research. Furthermore, exploring transfer learning techniques would be a candidate method for enhancing accuracy with a small amount of training data [26].

Additionally, because code elements combine different frequencies and phases, the selection of code elements has a strong impact on classification accuracy. The confusion matrix of the online experiments is shown in figure 9. It suggests that the arrangement of code words was acceptable, because no particular error pattern stands out. In our experience, three main reasons were responsible for wrong recognitions:

(a) interference from adjacent stimuli.
(b) some stimulation frequencies share common harmonics. For instance, 30 Hz is the second harmonic of 15 Hz and the third harmonic of 10 Hz, which may engender misjudgments.
(c) visual fatigue caused by long-time tasks.
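Reason (b) can be made concrete by listing which harmonics are shared among the eight stimulation frequencies. The sketch below assumes that harmonics up to the fifth are relevant, matching the ${N_{\text{h}}} = 5$ used by the detection algorithm; the function name is illustrative.

```python
from collections import defaultdict


def shared_harmonics(freqs_hz, n_harmonics):
    """Map each harmonic frequency to the (fundamental, harmonic order) pairs that
    produce it, keeping only frequencies produced by more than one fundamental."""
    producers = defaultdict(list)
    for f in freqs_hz:
        for h in range(1, n_harmonics + 1):
            producers[f * h].append((f, h))
    return {hz: pairs for hz, pairs in sorted(producers.items()) if len(pairs) > 1}


if __name__ == "__main__":
    for hz, pairs in shared_harmonics(range(8, 16), 5).items():
        print(hz, pairs)
```

Under this assumption the colliding frequencies are 24, 30, 36, 40, 45 and 60 Hz, while 11, 13 and 14 Hz share no harmonic with any other element.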

**Figure 9.** The confusion matrix of online experiments results (12 subjects × two trials).

Regarding (a), stimuli adjacent to the gazed target distract the user's attention; this effect can be weakened with a larger monitor. Regarding (b), the problem can be mitigated by optimizing the frequencies of the visual stimuli. For example, the code element '0', which corresponds to 8 Hz, could be replaced with 8.2 Hz. Regarding (c), visual fatigue is an inevitable factor in long SSVEP–BCI tasks. Although many plausible anti-fatigue approaches, such as high-frequency stimuli [27] and hybrid paradigms [10], have been devised, reducing the stimulus duration is extremely important in this study.

4.2. Methods of decreasing stimuli duration

The most direct way to shorten the stimulation is to shorten each code element, which may lower detection accuracy. Conversely, strategies that increase accuracy will probably help to cut unnecessary stimulation. From the results of the online experiments, we can infer that the visual stimuli were longer than necessary for S2 and S11, whereas S6 and S10 need longer stimulation to guarantee performance. Dynamic stopping [28] and dynamic window strategies [29, 30] are good choices for addressing these problems because they adapt the stimulation time to each subject. In fact, because the code word set is highly sparse, some detection results can be decided after three code elements, which reduces the time consumed. Likewise, an asynchronous system based on sequential coding is a viable approach to more efficient interaction by cycling the stimulation sequences. Indeed, the study [14] has shown the utility of applying MFSC to asynchronous systems.

Besides, decreasing the duration of a single code element while increasing the length of the code words is another possible solution. For instance, the stimulation time remains 4 s if eight segments of 0.5 s are adopted. The code word set will become sparser, but the response to every element will become weaker, so the lengths of the code word and the code element are still worth optimizing.

4.3. Performance evaluation

Evaluating the performance of a BCI system is always a complex problem. In this study, we used the ITR and accuracy as performance measures, which is very common in BCI studies. However, the definition of ITR is based on several assumptions that cannot always be satisfied [31]. Mutual information can therefore be used as another measure; alternatively, channel capacity [32] could be used. According to the confusion matrix of all 12 subjects, the online experiments reach an average mutual information of 81.39 bits min⁻¹, similar to the standard ITR. Because the data are scarce, whether the assumptions of ITR are satisfied could not be verified, and mutual information could not be estimated accurately for each subject separately.
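A plug-in estimate of mutual information from a confusion matrix can be sketched as follows. The conversion to bits per minute with T = 4.5 s mirrors the ITR calculation and is an assumption about how the reported figure was obtained. Plug-in estimates of this kind are biased upward when there are few trials per target, which is consistent with the caveat about data scarcity above.

```python
import numpy as np


def mutual_information_bits(confusion):
    """Plug-in mutual information (bits per selection) between cued and detected targets.

    confusion[i, j] counts trials in which target i was cued and target j was detected.
    """
    counts = np.asarray(confusion, dtype=float)
    p = counts / counts.sum()                    # joint distribution
    px = p.sum(axis=1, keepdims=True)            # marginal over cued targets
    py = p.sum(axis=0, keepdims=True)            # marginal over detected targets
    mask = p > 0
    return float((p[mask] * np.log2(p[mask] / (px @ py)[mask])).sum())


def bits_per_min(bits_per_selection, trial_seconds=4.5):
    """Convert bits per selection to bits per minute."""
    return bits_per_selection * 60.0 / trial_seconds


if __name__ == "__main__":
    perfect = np.eye(160)                        # every target detected correctly
    print(round(bits_per_min(mutual_information_bits(perfect)), 2))
```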

4.4. Paradigm comparison

Many previous studies have focused on increasing the number of targets (40 or more), including SSVEP-based [3, 11], CVEP-based [33, 34], P300-based [8, 9] and hybrid [10, 35] paradigms. The details are shown in table 3. Compared with previous studies, this system based on MFSC retains the advantages of being calibration-free and highly accurate. Future studies can also extend this protocol to more targets, hundreds or even thousands, by using more code elements and longer code words.

Table 3. Characteristics of BCI studies focusing on multiple targets (DF means dual-frequency).

Authors (Year)	Number of targets	Paradigm	Accuracy (%)	ITR (bits min⁻¹)	Calibration-free (Y/N)
Townsend et al (2010) [8]	72	P300	91.52	23.17	N
Jin et al (2011) [9]	84	P300	98.00	36.3	N
Yin et al (2013) [35]	64	SSVEP&P300	93.85	56.44	N
Chen et al (2014) [37]	45	SSVEP	84.1	105	Y
Chen et al (2015) [3]	40	SSVEP	91.95	151.18	Y
Nakanishi et al (2018) [22]	40	SSVEP	89.83	325.33	N
Wei et al (2018) [33]	48	CVEP	91.67	181.05	N
Liu et al (2018) [34]	64	CVEP	88.36	184.6	N
Liang et al (2020) [11]	40	SSVEP (DF)	96.06	196.09	N
Xu et al (2020) [10]	108	SSVEP&P300	81.67	172.46	N
Our present work	160	SSVEP	87.16	78.84	Y

4.5. Number of targets

This study uses an SSVEP-based BCI system with 160 targets as the demonstration of applying MFSC on BCI. As elaborated in the method section, the optimized arrangements of code words can be selected and changed through the optimization approach. A robust system containing more targets is plausible when more code words are included. However, considering the LCD monitor's size restrictions, we chose 160 as a specific number to exemplify the plausibility of this encoding protocol.

With eight code elements and four segments, the maximum number of targets is 4096. However, the estimated minimum distance between targets shrinks as the number of targets grows from eight to 4096, which makes classification more challenging. The estimated minimum distance drops significantly once the number of targets exceeds 8³ = 512: at that point at least two code words must differ in only one segment (a Hamming distance of 1), so the minimum distance between code words equals the minimum distance between code elements. Whether BCI systems with so many commands still work is worth further research.
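The role of the 8³ = 512 threshold can be illustrated with a small experiment. The parity construction below is a textbook way to obtain 512 code words of length four over eight symbols in which any two words differ in at least two segments. It is used here only as an illustration and is not the arrangement selected in this study.

```python
import itertools

import numpy as np


def min_hamming_distance(words):
    """Smallest number of differing segments between any two code words."""
    w = np.asarray(words)
    diff = (w[:, None, :] != w[None, :, :]).sum(axis=2)
    return int(diff[np.triu_indices(len(w), k=1)].min())


def parity_code(n_elements=8, length=4):
    """All words whose elements sum to 0 modulo n_elements (illustrative construction)."""
    return [prefix + ((-sum(prefix)) % n_elements,)
            for prefix in itertools.product(range(n_elements), repeat=length - 1)]


if __name__ == "__main__":
    code = parity_code()
    print(len(code), min_hamming_distance(code))
    # Any additional word shares its first three elements with an existing word.
    print(min_hamming_distance(code + [(0, 0, 0, 1)]))
```

With 512 words every possible three-element prefix is already used, so a 513th word must share its first three elements with an existing word and can differ from it only in the last segment.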

4.6. Potential applications

A speller system designed for disabled patients is one of the most common applications of SSVEP–BCIs. Recent SSVEP–BCI speller research has focused on simulating keyboards [16], and many systems have proven effective. Nonetheless, a speller becomes more efficient if users can select from more commands. Take SSVEP–BCI based Chinese spellers as an example: previous systems spelled Pinyin one letter at a time, so users had to enter several characters in succession to select a Chinese word, which is inconvenient and time-consuming. Although the study [36] proposed a 'double-spelling' strategy for simplicity and efficiency, it still needs three consecutive recognitions to input one Chinese character. In contrast, an interface that shows all Pinyin combinations allowed by Chinese syllabification can input Chinese words more quickly. All the combinations are shown in figure 10; this interface keeps the probabilities of selecting different blocks close to each other. If a user intends to input a Chinese character, they find the corresponding Pinyin in this interface, and a second interface then shows candidate Chinese characters and words. This approach is only feasible when the number of targets is relatively high.

**Figure 10.** All combinations of Pinyin in the SSVEP-based BCI interface.

4.7. Why MFSC?

As discussed in the introduction, compared with multiple inputs with fewer targets, a single input with more targets can achieve higher accuracy. Although this study adopted MFSC as a serial encoding principle, correct detection of a code word does not depend on the correct detection of every segment, which results in high overall recognition accuracy. Furthermore, only training data for eight frequencies would need to be collected if this system used training-based algorithms.

5. Conclusion

This research designed a calibration-free SSVEP–BCI system implementing 160 targets by expanding the idea of MFSC. To our knowledge, this system is the only calibration-free BCI system that adopts over 100 commands. This study introduced an optimization method for the experimental paradigm and adopted FBCCA as the principal classification approach. The result of online experiments validated the feasibility and robustness of this system. Seven of 12 subjects reached average recognition accuracy higher than 90%. The total average recognition accuracy of online experiments was 87.16 ± 11.46%, and average ITR was 78.84 ± 15.59 bits min⁻¹. The results demonstrate that this protocol is reliable for designing SSVEP–BCIs containing numerous targets. This study will encourage more practical and multifunctional BCI applications.

Acknowledgments

This study was supported in part by the Key-Area Research and Development Program of Guangdong Province (No. 2018B030339001), the National Key Research and Development Program of China (No. 2017YFB1002505), the National Foundation of China under Grants (No. 62006024), the Aeronautical Science Foundation of China (No. 2019ZG073001), the Fundamental Research Funds for the Central Universities (BUPT Project No. 2049XD17), and the Beijing Science and Technology Plan (No. Z201100004420015).

Data availability statement

The data that support the findings of this study are openly available at the following URL/DOI: http://bci.med.tsinghua.edu.cn.
