Abstract
Complexity analysis of short-term cardiovascular control is traditionally performed using entropy-based approaches that include corrective terms or strategies to cope with the loss of reliability of conditional distributions as the pattern length grows. This study proposes a new approach to the estimation of conditional entropy (CE) from short data segments (about 250 samples) based on the k-nearest-neighbor technique. Its main advantages are: (i) the control of the loss of reliability of the conditional distributions with the pattern length without introducing a priori information; (ii) the assessment of complexity indexes without fixing the pattern length to an arbitrarily low value. The approach, referred to as k-nearest-neighbor conditional entropy (KNNCE), was contrasted with corrected approximate entropy (CApEn), sample entropy (SampEn) and corrected CE (CCE), which are the most frequently exploited approaches for entropy-based complexity analysis of short cardiovascular series. Complexity indexes were evaluated during the selective pharmacological blockade of the vagal and/or sympathetic branches of the autonomic nervous system. We found that KNNCE was more powerful than CCE in detecting the decrease of complexity of heart period variability imposed by double autonomic blockade. In addition, KNNCE provided indexes indistinguishable from those derived from CApEn and SampEn. Since this result was obtained without using strategies to correct the CE estimate and without fixing the embedding dimension to an arbitrarily low value, KNNCE is potentially more valuable than CCE, CApEn and SampEn when the number of past samples most useful to reduce the uncertainty of future behaviors is high and/or variable among conditions and/or groups.
List of abbreviations
SE | Shannon entropy
CE | conditional entropy
CCE | corrected CE
KNNCE | k-nearest-neighbor CE
ApEn | approximate entropy
CApEn | corrected ApEn
SampEn | sample entropy
UQ | uniform quantization
B | baseline
AT | atropine
PR | propranolol
AT + PR | PR following the administration of AT
CL | clonidine
HP | heart period
SAP | systolic arterial pressure
CI | complexity index
NCI | normalized CI
1. Introduction
Relevant information about short-term cardiovascular control can be obtained from entropy-based complexity analysis of short heart rate variability series (Pincus et al 1993, Richman and Moorman 2000, Porta et al 2000). Indeed, in healthy subjects it was found that the entropy-based complexity of heart rate variability decreased as a function of the gradual sympathetic activation and vagal withdrawal induced by a graded head-up tilt test (Porta et al 2007b), thus mirroring the state of the sympatho-vagal balance governing cardiovascular control. The relevance of entropy-based complexity indexes (CIs) has been substantiated by their ability to identify pathological conditions (Costa et al 2002, Boettger et al 2006, Javorka et al 2008, Schulz et al 2010).
Usually entropy-based indexes of complexity approximate conditional entropy (CE). CE calculation requires the construction of patterns of length L − 1 using the technique of the lagged coordinates (Takens 1981) and the assessment of the distribution of the images of a given pattern (i.e. the distribution of the current value given the previous L − 1 samples). If the distribution of the images conditioned by the assigned pattern is flat and this situation occurs for all the patterns extracted from the series, CE is large. Conversely, if the distribution of the images exhibits a peak and this situation occurs for all the patterns, future values can be largely predicted from previous samples and CE is low. In order to estimate the conditional distribution of the images of a given pattern, the entire set of patterns extracted from the series is scanned to search for all the patterns similar to the assigned one according to a similarity criterion. The reliability of this approach is dramatically limited when the number of patterns detected as similar to a given one is insignificant, thus hampering its application to short data sequences. Sometimes no pattern can actually be recognized as indistinguishable from the assigned one within a given tolerance (except itself). This situation drastically reduces the reliability of the CE estimate by imposing a bias toward its erroneous reduction (Pincus 1995, Porta et al 1998, Richman and Moorman 2000, Porta et al 2007b). This bias has been overcome by imposing a correction to the CE estimate (Porta et al 1998) or by introducing a priori information to cope with the loss of reliability of conditional probability estimates when the number of patterns similar to the assigned one decreases markedly (Richman and Moorman 2000, Porta et al 2007b).
The aim of this study is to propose a technique for CE estimation that requires neither the addition of correction terms nor strategies to handle the loss of reliability of the conditional distributions. This technique exploits the k-nearest-neighbor approach for the construction of the conditional distribution. The k patterns most similar to the reference one (i.e. the k nearest neighbors), regardless of their actual similarity to it, are selected to contribute with their images to the conditional distribution. As a consequence the set of images is formed by k samples for any pattern length L. The approach was contrasted with the most commonly utilized approaches for the entropy-based complexity analysis of short heart period (HP) variability series: (i) corrected approximate entropy (CApEn) (Pincus 1995, Porta et al 2007b); (ii) sample entropy (SampEn) (Richman and Moorman 2000); (iii) corrected CE (CCE) (Porta et al 1998) making use of uniform quantization (UQ). The comparison was made during an experimental protocol modulating the complexity of short-term cardiovascular control via the administration of drugs blocking the sympathetic and/or parasympathetic branches of the autonomic nervous system (Parlow et al 1995).
2. Methods
2.1. CE
Given the stationary time series y = {y(i), i = 1, ..., N} let us define as a pattern of length L the ordered sequence of L samples, yL(i) = (y(i), y(i − 1),..., y(i − L + 1)). The pattern yL(i) is actually a point in the L-dimensional embedding space reconstructed with the technique of the lagged coordinates with delay equal to 1. The pattern yL(i) can be seen as the sequence formed by the current sample, y(i), and by the sequence of L − 1 past samples, yL−1(i − 1). The sample y(i) is referred to as the image of yL−1(i − 1) in the following. We define as yL = {yL(i), i = L, ..., N} and yL − 1 = {yL − 1(i − 1), i = L, ..., N} the sets of patterns of length L and L − 1 respectively. The CE associated with yL measures the average amount of information carried by the most recent sample of a pattern in yL, y(i), when L − 1 previous samples (i.e. yL−1(i − 1)) are given. It is computed as the weighted sum of the Shannon entropy (SE) of the images of yL−1(i − 1), y/yL−1(i − 1):
CE(L) = Σ p(yL−1(i − 1)) · SE(y/yL−1(i − 1))    (1)
where the weight is the probability of the pattern yL−1(i − 1), p(yL−1(i − 1)). The probability p(yL−1(i − 1)) is estimated as the fraction of patterns in yL−1 similar to yL−1(i − 1). The SE of the images of yL−1(i − 1) is given by
SE(y/yL−1(i − 1)) = −Σ p(y(i)/yL−1(i − 1)) · log p(y(i)/yL−1(i − 1))    (2)
where log is the natural logarithm and
p(y(i)/yL−1(i − 1)) = p(yL(i)) / p(yL−1(i − 1))    (3)
is the conditional probability that the current sample assumes the value y(i) given that its L − 1 previous samples are yL−1(i − 1). While the sum in (1) is extended over all the possible patterns yL−1(i − 1), the sum in (2) is extended over the possible values y(i) given yL−1(i − 1). CE is bounded between 0 and SE of y. CE = 0 is computed when the distribution of the images of yL−1(i − 1) exhibits a unit peak and this situation occurs for any yL−1(i − 1), thus suggesting the full predictability of the current sample given L − 1 past values. The SE of y
SE(y) = −Σ p(y(i)) · log p(y(i))    (4)
represents the average information carried by y(i) when no previous samples are given. In addition, CE(L) is always decreasing (or constant) with L since the increase of the number of past values reduces (or leaves unchanged) the uncertainty about future values. CE(1) is set to SE(y).
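The pattern construction via lagged coordinates described above can be sketched in a few lines of Python; the helper name `patterns` and the row layout (most recent sample first) are illustrative choices, not the authors' code.

```python
import numpy as np

def patterns(y, L):
    """Build the set of patterns of length L from series y using lagged
    coordinates with delay 1: row holds (y(i), y(i-1), ..., y(i-L+1)).
    Patterns start at i = L, so there are N - L + 1 of them."""
    y = np.asarray(y, dtype=float)
    N = len(y)
    # one row per pattern, most recent sample first
    return np.array([y[i - L:i][::-1] for i in range(L, N + 1)])

# example: series of 6 samples, patterns of length 3 (4 patterns in total)
y = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
print(patterns(y, 3))
```

Dropping the first column of each row yields yL−1(i − 1), the conditioning pattern whose image is the row's first element.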
2.2. CE estimation
The CE is estimated according to two strategies.
The first strategy relies on the construction of the distribution of the images of yL−1(i − 1), y/yL−1(i − 1), on the estimate of SE(y/yL−1(i − 1)) and on the computation of (1). The construction of y/yL−1(i − 1) implies scanning yL−1 to search for yL−1(i − 1). When a pattern yL−1(j − 1) similar to yL−1(i − 1) is found in yL−1, the image of yL−1(j − 1), y(j), is stored, thus constructing y/yL−1(i − 1). Given y/yL−1(i − 1), SE(y/yL−1(i − 1)) can be computed.
The second strategy does not need the estimation of SE(y/yL−1(i − 1)). It is based on the direct estimation of the conditional probability of y(i) given yL−1(i − 1), p(y(i)/yL−1(i − 1)), according to (3). It implies scanning yL and yL−1 to search for patterns similar to yL(i) and yL−1(i − 1) respectively. The ratio of the fraction of patterns similar to yL(i) in yL to that of patterns similar to yL−1(i − 1) in yL−1 allows the estimation of p(y(i)/yL−1(i − 1)).
Three traditional approaches to the estimate of entropy-based complexity will be described in the following. One of these approaches makes use of the first strategy (i.e. the UQ approach), while the remaining two methods exploit the second strategy (i.e. ApEn and SampEn).
2.3. UQ approach
This approach makes use of the first strategy described in section 2.2. The full dynamics of the series is spread over ξ bins of size ε = (max − min)/ξ, where max and min stand for the maximum and the minimum of y. The UQ imposes a perfect partition of the L-dimensional embedding space into ξL cells (i.e. hyper-cubes of side ε). Cells are disjoint and their union covers the entire L-dimensional embedding space. Cells define the coarse graining of the embedding space and the similarity criterion: indeed, all the patterns inside a cell are indistinguishable. Therefore, given a pattern yL−1(i − 1), every pattern belonging to the same cell as yL−1(i − 1) contributes with its image to form the distribution of y/yL−1(i − 1). Given the UQ procedure, the distribution of y/yL−1(i − 1) is quantized over ξ bins and SE(y/yL−1(i − 1)) is easily computed from the sample frequencies. The evaluation of SE(y/yL−1(i − 1)), extended to all the patterns yL−1(i − 1), allows the assessment of the CE according to (1).
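The UQ strategy can be sketched as follows. This is an illustrative, uncorrected CE estimator (the function name and the small guard constant for constant series are ours), not the authors' implementation; the correction defining the CCE is discussed in section 2.6.

```python
import numpy as np

def ce_uq(y, L, xi=6):
    """Plain (uncorrected) CE estimate via uniform quantization:
    quantize y over xi bins, group the conditioning patterns of length
    L - 1 by cell, and form the weighted sum (1) of the Shannon
    entropies (2) of their images."""
    y = np.asarray(y, dtype=float)
    # uniform quantization: spread the dynamics over xi bins of size (max - min)/xi
    q = np.minimum((xi * (y - y.min()) / (y.max() - y.min() + 1e-12)).astype(int), xi - 1)
    N = len(q)
    if L == 1:
        _, counts = np.unique(q, return_counts=True)
        p = counts / N
        return -np.sum(p * np.log(p))            # CE(1) = SE(y)
    groups = {}                                   # cell -> list of images
    for i in range(L - 1, N):
        cell = tuple(q[i - L + 1:i])              # quantized pattern of length L - 1
        groups.setdefault(cell, []).append(q[i])
    ce, total = 0.0, N - L + 1
    for images in groups.values():
        _, counts = np.unique(images, return_counts=True)
        p = counts / len(images)
        se = -np.sum(p * np.log(p))               # SE of the images of this cell
        ce += (len(images) / total) * se          # weighted by p(pattern)
    return ce
```

For a fully predictable series such as the periodic repetition of 1 0 0, `ce_uq` with L = 3 returns 0, since each cell carries a single image value.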
2.4. Approximate entropy
Approximate entropy (ApEn) makes use of the second strategy described in section 2.2. Indeed, ApEn is computed as
ApEn(L) = −(1/(N − L + 1)) · Σ(i = L to N) log[Ni(L, r)/Ni(L − 1, r)]    (5)
where Ni(L, r) is the number of points in yL at a distance smaller than r from yL(i) and Ni(L − 1, r) is the number of points in yL−1 at a distance smaller than r from yL−1(i − 1). As a consequence ApEn is calculated without approximating SE(y/yL−1(i − 1)), relying solely on the estimate of (3) as the fraction of patterns similar to yL−1(i − 1) in yL−1 that remain similar, within a tolerance r, in yL. Since we utilize the Euclidean norm to measure the distance between patterns in the embedding space, the cells defining the coarse graining of the embedding space are hyper-spheres of radius r constructed around each pattern (i.e. N − L + 1 intersecting pattern-centered cells). Unlike the UQ approach, the cells intersect each other; however, all the cells have the same size, as in the UQ approach. It is worth stressing that, since yL(i) is always at a distance smaller than r from itself, both Ni(L, r) ≥ 1 and Ni(L − 1, r) ≥ 1. In addition, since Ni(L − 1, r) ≥ Ni(L, r), Ni(L, r)/Ni(L − 1, r) ≤ 1. ApEn(1) is computed as the negative average natural logarithm of the probability that two samples are at a distance smaller than r.
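A minimal sketch of this computation under the Euclidean-norm convention stated above (the function name and array layout are illustrative):

```python
import numpy as np

def apen(y, L, r):
    """ApEn via the second strategy: for each pattern, count neighbors
    within tolerance r in dimension L and L - 1 (self-matches included,
    so both counts are >= 1) and average -log of the ratio."""
    y = np.asarray(y, dtype=float)
    N = len(y)
    pats_L = np.array([y[i - L + 1:i + 1] for i in range(L - 1, N)])
    pats_Lm1 = pats_L[:, :-1]            # drop the current sample -> length L - 1
    total = 0.0
    for i in range(len(pats_L)):
        ni_L = np.sum(np.linalg.norm(pats_L - pats_L[i], axis=1) < r)
        ni_Lm1 = np.sum(np.linalg.norm(pats_Lm1 - pats_Lm1[i], axis=1) < r)
        total += np.log(ni_L / ni_Lm1)
    return -total / len(pats_L)
```

Note that when r is so small that every pattern matches only itself, every ratio equals 1 and the estimate collapses to 0; this is precisely the 'self-matching' bias discussed later in the text.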
2.5. SampEn
Similarly to ApEn, SampEn is based on the direct approximation of the conditional probability. However, instead of approximating (3) as in ApEn, SampEn estimates the conditional probability that two patterns that are similar in yL−1 remain similar, within a tolerance r, in yL. It is given by the ratio of Σ(i = L to N) Ni(L, r) to Σ(i = L to N) Ni(L − 1, r). Therefore, SampEn is defined as
SampEn(L) = −log[Σ(i = L to N) Ni(L, r) / Σ(i = L to N) Ni(L − 1, r)]    (6)
The Euclidean norm is utilized as in ApEn and, consequently, the coarse graining procedure is the same (i.e. N − L + 1 intersecting pattern-centered hyper-spheres). Since Ni(L − 1, r) ≥ Ni(L, r), then Σ(i = L to N) Ni(L, r) / Σ(i = L to N) Ni(L − 1, r) ≤ 1. SampEn(1) is computed as the negative natural logarithm of the probability that two samples are at a distance smaller than r.
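The pooled-ratio computation can be sketched as follows; self-matches are included in the counts Ni here, as in the definition of section 2.4 (the exclusion of self-matches, i.e. the correction adopted by Richman and Moorman, is described in section 2.6). The function name is illustrative.

```python
import numpy as np

def sampen(y, L, r):
    """SampEn as defined in (6): -log of the ratio of the pooled neighbor
    counts in dimension L and L - 1, using the Euclidean norm."""
    y = np.asarray(y, dtype=float)
    N = len(y)
    pats_L = np.array([y[i - L + 1:i + 1] for i in range(L - 1, N)])
    pats_Lm1 = pats_L[:, :-1]            # conditioning patterns of length L - 1
    num = den = 0
    for i in range(len(pats_L)):
        num += np.sum(np.linalg.norm(pats_L - pats_L[i], axis=1) < r)
        den += np.sum(np.linalg.norm(pats_Lm1 - pats_Lm1[i], axis=1) < r)
    return -np.log(num / den)
```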
2.6. Toward a less biased estimator of CE
The UQ approach, ApEn and SampEn share a common feature: the size of the cells for coarse graining the embedding space is fixed (Porta et al 2007c). As a consequence, when the time series are relatively short, these techniques cannot ensure a reliable statistic for the estimation of the conditional distribution because very few patterns might be present inside the cells. Moreover, the reliability of the estimate deteriorates more and more with the length of the conditioning pattern (i.e. L − 1), thus leading to completely unreliable conditional distributions at high L. If the series are short, the complete loss of reliability occurs at very low L (Porta et al 1998). Full unreliability occurs as soon as only one pattern is present in the cell. This situation is commonly referred to as 'self-matching' in ApEn and SampEn calculation (Richman and Moorman 2000). In ApEn and SampEn calculation this unique pattern is located in the center of the cell, while in the UQ approach it can be anywhere inside the cell. 'Self-matching' introduces a significant bias in the estimate of conditional probabilities and, consequently, in the estimate of CE. Indeed, p(y(i)/yL−1(i − 1)) = 1 given the unique appearance of yL−1(i − 1). As a result the logarithm is equal to 0, thus artificially reducing CE. Therefore, strategies have been implemented to reduce this bias. The rationale of these strategies is mainly to substitute the false certainty associated with the unique appearance of the pattern with the maximum uncertainty that can be estimated over the series. In the case of the UQ technique the CE has been corrected, thus defining the CCE, by adding a term proportional to the fraction of 'self-matching' patterns (Porta et al 1998). The coefficient of proportionality is SE(y), thus substituting the erroneous null contribution of the 'self-matching' pattern to the CE estimate with the maximal amount of information carried by y (Porta et al 1998).
In the case of ApEn a corrected ApEn (CApEn) has been defined by substituting the ratio Ni(L, r)/Ni(L − 1, r) with 1/(N − L + 1) when Ni(L, r) = 1 or Ni(L − 1, r) = 1 (Porta et al 2007b), thus forcing a situation of maximum uncertainty in place of false certainty. In the case of SampEn, correction is simply obtained by excluding 'self-matching' patterns from the computation of Ni(L, r) and Ni(L − 1, r) (i.e. both Ni(L, r) and Ni(L − 1, r) are diminished by 1) (Richman and Moorman 2000).
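The CApEn correction can be sketched as follows (the SampEn correction simply subtracts the self-match from each count instead). This is an illustrative implementation of the substitution rule described above, with names of our choosing.

```python
import numpy as np

def capen(y, L, r):
    """Corrected ApEn: when a pattern has no neighbor other than itself
    (Ni = 1), replace the ratio Ni(L, r)/Ni(L - 1, r) with 1/(N - L + 1),
    i.e. maximum uncertainty in place of false certainty."""
    y = np.asarray(y, dtype=float)
    N = len(y)
    pats_L = np.array([y[i - L + 1:i + 1] for i in range(L - 1, N)])
    pats_Lm1 = pats_L[:, :-1]
    n_pat = len(pats_L)                  # N - L + 1
    total = 0.0
    for i in range(n_pat):
        ni_L = np.sum(np.linalg.norm(pats_L - pats_L[i], axis=1) < r)
        ni_Lm1 = np.sum(np.linalg.norm(pats_Lm1 - pats_Lm1[i], axis=1) < r)
        ratio = 1.0 / n_pat if ni_L == 1 or ni_Lm1 == 1 else ni_L / ni_Lm1
        total += np.log(ratio)
    return -total / n_pat
```

On a series where every pattern matches only itself (e.g. a steep ramp with a small r), every ratio becomes 1/(N − L + 1) and CApEn saturates at log(N − L + 1), the plateau value mentioned in section 4.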
2.7. K-nearest-neighbor CE
In order to overcome the issue of 'self-matching', a completely different strategy can be followed. Instead of exploiting a coarse graining procedure based on cells of equal size, we make use of a coarse graining approach based on cells of different sizes (Porta et al 2007c). This approach makes use of the Euclidean norm to measure the distance between patterns and the same coarse graining procedure as ApEn and SampEn (i.e. N − L + 1 intersecting pattern-centered cells). However, instead of keeping the size of the cells constant, the size is adjusted to include a fixed number of points (i.e. the k nearest neighbors of yL−1(i − 1)). All the k nearest neighbors of yL−1(i − 1) contribute with their images to form the conditional distribution y/yL−1(i − 1), thus limiting the loss of reliability of y/yL−1(i − 1) with L. SE(y/yL−1(i − 1)) is easily computed after UQ of y or as the negative natural logarithm of the probability that two samples of y/yL−1(i − 1) are at a distance smaller than r, as in SampEn. CE is estimated as
KNNCE(L) = (1/(N − L + 1)) · Σ(i = L to N) SE(y/yL−1(i − 1))    (7)
In addition, this approach shares with CCE a significant advantage: regardless of the dynamics of the process, k-nearest-neighbor CE (KNNCE) exhibits a well-defined minimum over L. Since the appendage of one dimension to the embedding space (i.e. the unit increase of the length of the pattern) causes the spread of all the points in the embedding space, the k nearest neighbors of yL−1(i − 1) move apart. The spread produces two distinct consequences: (i) the unfolding of the dynamics, thus resolving the ambiguities in fixing future behaviors and decreasing KNNCE; (ii) the increase of uncertainty about future values, thus weakening prediction and increasing KNNCE. The balance between these two opposite contributions leads to the formation of a KNNCE minimum over L. Even though a minimum is always detectable even in the presence of fully unpredictable dynamics (e.g. white noise), its deepness depends on the type of dynamics, being deeper when repetitive patterns are found.
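A minimal sketch of the KNNCE estimator follows, assuming the SE of the images is computed after UQ of y (the first of the two options mentioned above). Counting the reference pattern among its own nearest neighbors is our simplifying assumption; names are illustrative and this is not the authors' code.

```python
import numpy as np

def knnce(y, L, k=30, xi=6):
    """k-nearest-neighbor CE: for each conditioning pattern of length
    L - 1, take the images of its k nearest neighbors (Euclidean norm)
    and average the Shannon entropy of those images after uniform
    quantization of y over xi bins."""
    y = np.asarray(y, dtype=float)
    N = len(y)
    q = np.minimum((xi * (y - y.min()) / (y.max() - y.min() + 1e-12)).astype(int), xi - 1)
    if L == 1:
        _, counts = np.unique(q, return_counts=True)
        p = counts / N
        return -np.sum(p * np.log(p))         # KNNCE(1) = SE(y)
    # conditioning patterns yL-1(i - 1) and their quantized images y(i), i = L..N
    cond = np.array([y[i - L + 1:i] for i in range(L - 1, N)])
    img = q[L - 1:]
    ce = 0.0
    for i in range(len(cond)):
        d = np.linalg.norm(cond - cond[i], axis=1)
        nn = np.argsort(d)[:k]                # k nearest neighbors (self included)
        _, counts = np.unique(img[nn], return_counts=True)
        p = counts / len(nn)
        ce -= np.sum(p * np.log(p))           # SE of the k images
    return ce / len(cond)
```

Since the conditional distribution is always built from exactly k images, no cell is ever left with a single pattern, which is precisely how the 'self-matching' bias is avoided.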
3. Experimental protocol and data analysis
3.1. Experimental protocol
This protocol was originally designed to study the effects of the pharmacological blockades of the parasympathetic and sympathetic branches of the autonomic nervous system on baroreflex sensitivity (Parlow et al 1995, Toader et al 2008). Briefly, we studied nine healthy male physicians aged from 25 to 46 years familiar with the study setting. None of them had any abnormal finding in history, physical examination or electrocardiography or was receiving any medication. All had normal resting brachial arterial pressure measured with a sphygmomanometer. They were instructed to avoid tobacco, alcohol and caffeine for 12 h and strenuous exercise for 24 h before each experiment. Electrocardiogram and noninvasive finger blood pressure (Finapress 2300, Ohmeda, Englewood, Colorado, USA) were recorded during the experiments. The hand of the subject was kept at the level of the heart. Signals were sampled at 500 Hz. Experimental sessions were performed in 3 days at approximately 2 week intervals. One volunteer took part only in the first day experiments. Subjects remained at rest in supine position in a quiet darkened room during all the recordings. Each experiment started in the morning between 08:00 and 09:00 AM and consisted of 15–20 min of baseline (B) recording followed by 15–20 min of recording after drug administration. Recordings were obtained: (i) on day 1 after parasympathetic blockade with 40 µg kg−1 i.v. atropine sulfate (atropine, AT) to block muscarinic receptors; (ii) on day 2 after β-adrenergic blockade with 200 µg kg−1 i.v. 
propranolol (PR) to block β1 cardiac and β2 vascular peripheral adrenergic receptors; (iii) on day 1 PR was administered at the end of the AT session (the dose of AT was reinforced by 10 µg kg−1) to combine the effect of AT and PR (AT + PR) and obtain a cardiac parasympathetic and sympathetic blockade; (iv) on day 3 recordings were obtained 120 min after 6 µg kg−1 per os clonidine hydrochloride (clonidine, CL) to centrally block the sympathetic outflow to heart and vasculature. All the subjects gave their written informed consent. The protocol adhered to the principles of the Declaration of Helsinki. The human research and ethical review board of the Hospices Civils de Lyon approved the protocol.
3.2. Extraction of the beat-to-beat variability series
After detecting the QRS complex on the electrocardiogram and locating the R-apex using parabolic interpolation, the temporal distance between two consecutive R parabolic apexes was computed and utilized as an approximation of HP. The maximum of arterial pressure inside HP was taken as systolic arterial pressure (SAP). The occurrences of QRS and SAP peaks were carefully checked to avoid erroneous detections or missed beats. If isolated ectopic beats affected HP and SAP values, these measures were linearly interpolated using the closest values unaffected by ectopic beats. HP and SAP were extracted on a beat-to-beat basis. The series were linearly detrended. Sequences of 256 consecutive measures were randomly selected inside each experimental condition. This length of the series was chosen because short-term analysis of cardiovascular variability series is usually performed over about 5 min recordings or 250 samples (Task Force 1996), thus focusing on the time scales of the mechanisms typically involved in short-term cardiovascular regulation. If evident nonstationarities, such as very slow drifting of the mean or sudden changes of the variance, were present despite the linear detrending, the random selection was carried out again.
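The detrending and segment-selection steps above can be sketched as follows; the function name, the synthetic test signal and the fixed seed are illustrative, and the stationarity screening is left to visual inspection as in the text.

```python
import numpy as np

def random_detrended_segment(series, length=256, seed=0):
    """Linearly detrend a beat-to-beat series and pick a random segment
    of `length` consecutive measures, as in the protocol."""
    x = np.asarray(series, dtype=float)
    t = np.arange(len(x))
    a, b = np.polyfit(t, x, 1)          # least-squares linear trend
    x = x - (a * t + b)                 # remove it
    rng = np.random.default_rng(seed)
    start = rng.integers(0, len(x) - length + 1)
    return x[start:start + length]

# example on a synthetic drifting oscillation of 1000 beats
seg = random_detrended_segment(np.sin(np.arange(1000) / 10) + 0.001 * np.arange(1000))
print(len(seg))  # 256
```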
3.3. Complexity indexes
The following CIs and normalized CIs (NCIs) were assessed: (i) CICCE was the minimum of CCE over L after UQ based on ξ = 6 bins (Porta et al 1998, Porta et al 2007a); (ii) NCICCE was the ratio of CICCE to SE(y) assessed after UQ based on ξ = 6 bins (Porta et al 2000, Porta et al 2007a); (iii) CICApEn was assessed with L − 1 = 2 and r = 20% of the standard deviation (Pincus 1995, Porta et al 2007b); (iv) NCICApEn was the ratio of CICApEn to ApEn(1) (Porta et al 2007b); (v) CISampEn was assessed with L − 1 = 2 and r = 20% of the standard deviation (Richman and Moorman 2000); (vi) NCISampEn was the ratio of CISampEn to SampEn(1) (Porta et al 2007b); (vii) CIKNNCE was the minimum of KNNCE over L with k = 30 and UQ based on ξ = 6; (viii) NCIKNNCE was the ratio of CIKNNCE to SE(y) assessed after UQ based on ξ = 6 bins.
3.4. Statistical analysis
Kruskal–Wallis one way analysis of variance on ranks was applied to check whether CIs (or NCIs) changed after drug administration (Dunn's test for multiple comparisons versus B). After pooling together CIs (or NCIs) regardless of the experimental condition, Friedman repeated measures analysis of variance on ranks was applied to check differences among the medians of CIs (or NCIs). Dunn's test was utilized to check the significance of all pair-wise comparisons. After pooling together CIs (or NCIs) regardless of the experimental condition, linear regression analysis was carried out between any pair of CIs (or NCIs). The Pearson product moment correlation coefficient, r, was calculated. If r was significantly different from 0, a significant linear correlation between indexes was detected. The coefficient of variation was assessed to compare the variability of indexes characterized by different medians. The significance of the difference between coefficients of variation was evaluated (Miller 1991). A p < 0.05 was always considered significant.
4. Results
Figure 1 shows an example of a deterministic periodic signal (figure 1(a)) and the course of CCE, CApEn, SampEn and KNNCE as a function of the pattern length L (figures 1(c), (e), (g), (i)). The deterministic signal is the periodic repetition of the sequence 1001000. The knowledge of only six previous samples allows the perfect prediction of the next value. The arrow indicates the CI derived from each function. As expected CCE, CApEn, SampEn and KNNCE attained 0 at L − 1 = 6 and remained at 0 for L − 1 > 6 (figures 1(c), (e), (g), (i)). As a result of the minimization procedure utilized to derive the CI from CCE and KNNCE, CICCE and CIKNNCE were equal to 0 (figures 1(c), (i)). In contrast, CApEn and SampEn were sampled at L − 1 = 2, thus producing CICApEn and CISampEn different from 0 (figures 1(e), (g)). Figure 1 also shows a realization of white Gaussian noise with zero mean and unit variance (figure 1(b)) and the course of CCE, CApEn, SampEn and KNNCE with L (figures 1(d), (f), (h), (j)). White noise is completely unpredictable independently of the number of past samples utilized to forecast future values. As expected, CCE and KNNCE remained almost constant with L (figures 1(d), (j)): the per cent decrease of CICCE and CIKNNCE with respect to the maximum was 7.0% and 5.6%. In contrast, CApEn increased and reached a plateau at L = 4 (figure 1(f)). The plateau value was equal to log(N − L + 1) and it was achieved when the percentage of 'self-matching' patterns was 100%. SampEn progressively augmented with L and became undefined at L = 4 (figure 1(h)). Indeed, when the percentage of 'self-matching' patterns was 100%, SampEn led to log(0/0). CICApEn and CISampEn were sampled at L − 1 = 2 with a per cent decrease with respect to the maximum equal to 13.5% and 0.0%.
Figure 1. An example of a deterministic periodic signal and a realization of Gaussian white noise are shown in (a) and (b) respectively. Values of both series are expressed in arbitrary units (a.u.). The courses of CCE, CApEn, SampEn and KNNCE as a function of the pattern length relevant to the deterministic periodic signal are shown in (c), (e), (g) and (i) respectively, whereas those relevant to the Gaussian white noise are shown in (d), (f), (h) and (j) respectively. The arrow indicates the CI derived from each function according to the approach. While the CIs derived from CCE and KNNCE are obtained via a minimization procedure, those derived from CApEn and SampEn are computed at a fixed pattern length (L = 3).
Figure 2 shows an example of the beat-to-beat HP and SAP series (figures 2(a), (b)). The two series exhibit regular patterns, being largely but not completely predictable. Repetitive schemes in the HP series are characterized by high frequencies (figure 2(a)), while those in the SAP series are characterized by lower frequencies (figure 2(b)). The course of CCE, CApEn, SampEn and KNNCE as a function of the pattern length L computed over the HP and SAP series is shown in figures 2(c), (e), (g), (i) and (d), (f), (h), (j) respectively. CCE and KNNCE calculated over the HP series exhibited a deep minimum (figures 2(c), (i)), thus confirming the large regularity and predictability of the HP series. The same course was detectable when CCE and KNNCE were assessed over the SAP series (figures 2(d), (j)). CICCE and CIKNNCE, taken as the minimum of CCE and KNNCE, were 1.14 and 1.13 in the case of the HP series and 0.99 and 0.94 in the case of the SAP series, with a per cent decrease with respect to the maximum of 28% and 29% for the HP series and 30% and 33% for the SAP series. CApEn computed from the HP and SAP series (figures 2(e), (f)) exhibited a minimum as well. However, CICApEn did not coincide with the minimum of CApEn. CICApEn was 3.30 in the case of the HP series and 3.22 in the case of the SAP series, with a per cent decrease with respect to the maximum of 40% and 42%. The minimum of CApEn was followed by a steep rise up to a plateau, attained at L = 5 when the percentage of 'self-matching' patterns was 100%. The course of SampEn also showed a minimum (figures 2(g), (h)). When SampEn was calculated over the HP series the minimum (i.e. 1.85) coincided with CISampEn, while when SampEn was computed over the SAP series, CISampEn (i.e. 1.91) did not correspond to the minimum. The per cent decrease of CISampEn with respect to the maximum of SampEn was 11% and 21% in figures 2(g) and (h) respectively. When assessed from both the HP and SAP series, SampEn became undefined at L = 5 as a result of 100% of 'self-matching' patterns.
Figure 2. Beat-to-beat series of HP and SAP are shown in (a) and (b) respectively. The courses of CCE, CApEn, SampEn and KNNCE as a function of the pattern length relevant to the HP series are shown in (c), (e), (g) and (i) respectively, while those relevant to the SAP series are shown in (d), (f), (h) and (j) respectively. The arrow indicates the CI derived from each function according to the approach. While the CIs derived from CCE and KNNCE are obtained via a minimization procedure, those derived from CApEn and SampEn are computed at a fixed pattern length (L = 3).
Figure 3 shows all the CIs and NCIs pooled together regardless of the experimental condition. This figure suggests that the CIs have different medians both when assessed from the HP (figure 3(a)) and the SAP (figure 3(c)) series. Only CICCE and CIKNNCE had similar medians. The same finding held in the case of the NCIs (figures 3(b), (d)). Since the medians were generally dissimilar, the coefficient of variation was utilized to compare the dispersion of CIs and NCIs. As to the indexes derived from the HP series, the coefficients of variation of CICCE, CICApEn, CISampEn and CIKNNCE were 0.24, 0.28, 0.25 and 0.23 and those of NCICCE, NCICApEn, NCISampEn and NCIKNNCE were 0.23, 0.27, 0.23 and 0.22 respectively. As to the indexes derived from the SAP series, the coefficients of variation of CICCE, CICApEn, CISampEn and CIKNNCE were 0.20, 0.32, 0.26 and 0.22 and those of NCICCE, NCICApEn, NCISampEn and NCIKNNCE were 0.23, 0.30, 0.25 and 0.24 respectively. In the case of the HP series the coefficients of variation of the CIs did not differ significantly, while in the case of the SAP series the coefficient of variation of CICApEn was significantly higher than those of CICCE and CIKNNCE. Although the coefficients of variation of the NCIs tended to be more homogeneous, a significant difference between those of NCICApEn and NCICCE was detected in both the HP and SAP series. When the indexes were computed from the HP series, the coefficient of variation of each NCI was smaller than that of the corresponding CI. Over the SAP series results were more contrasting: the coefficients of variation of NCICApEn and NCISampEn were smaller than those of CICApEn and CISampEn respectively, while the reverse was observed when comparing NCICCE with CICCE and NCIKNNCE with CIKNNCE.
Figure 3. Box-and-whiskers plots report the 10th, 25th, 50th, 75th and 90th percentiles of CICCE, CICApEn, CISampEn and CIKNNCE computed from the HP (a) and SAP (c) series during autonomic blockade protocol. Indexes are pooled together independently of the experimental condition. NCICCE, NCICApEn, NCISampEn and NCIKNNCE computed from the HP and SAP series are shown in (b) and (d) respectively. All pair-wise comparisons are significant with p < 0.05 with the notable exception of CICCE versus CIKNNCE in (a) and (c) and NCICCE versus NCIKNNCE in (b) and (d).
Results of the linear correlation analysis between CI pairs derived from the HP series are reported in table 1. The Pearson product moment correlation coefficient, r, and the probability of type I error, p, are reported above and below the main diagonal respectively. The largest r was found between CICCE and CIKNNCE and between CISampEn and CICApEn (i.e. 0.973 and 0.967 respectively). The smallest r was found between CISampEn and CICCE and between CISampEn and CIKNNCE (i.e. 0.805 and 0.803 respectively). The linear relation between any CI pair was always significant with a very low probability of type I error. The normalization procedure (table 2) improved the strength of the linear relation between any pair of indexes with the exception of the relation between CICCE and CIKNNCE and between CISampEn and CICApEn, even though r remained very high (i.e. 0.971 and 0.950 respectively). Table 3 reports the findings of the linear correlation analysis between CI pairs derived from the SAP series. The strength of the linear relation between CI pairs increased compared to the findings relevant to the HP series (all the r values were above 0.9) and, conversely, the probability of type I error decreased. The largest r was again found between CICCE and CIKNNCE and between CISampEn and CICApEn (i.e. 0.976 and 0.975 respectively). The smallest r was found between CICApEn and CICCE and between CICApEn and CIKNNCE (i.e. 0.912 and 0.922 respectively). After normalization (table 4) the strength of the linear relation between any pair of indexes improved further.
Table 1. Linear correlation analysis between CIs derived from the HP series.
| | CICCE | CICApEn | CISampEn | CIKNNCE |
|---|---|---|---|---|
| CICCE | – | 0.844 | 0.805 | 0.973 |
| CICApEn | 4.50 · 10−17 | – | 0.967 | 0.850 |
| CISampEn | 1.58 · 10−14 | 1.96 · 10−35 | – | 0.803 |
| CIKNNCE | 8.82 · 10−38 | 1.75 · 10−17 | 1.89 · 10−14 | – |
Correlation coefficient, r, and probability of type I error, p, are above and below the main diagonal respectively. CCE = corrected conditional entropy; CApEn = corrected approximate entropy; SampEn = sample entropy; KNNCE = k-nearest-neighbor conditional entropy.
Table 2. Linear correlation analysis between NCIs derived from the HP series.
| | NCICCE | NCICApEn | NCISampEn | NCIKNNCE |
|---|---|---|---|---|
| NCICCE | | 0.947 | 0.908 | 0.971 |
| NCICApEn | 1.09 · 10⁻²⁹ | | 0.950 | 0.947 |
| NCISampEn | 3.54 · 10⁻²³ | 2.11 · 10⁻³⁰ | | 0.904 |
| NCIKNNCE | 3.44 · 10⁻³⁷ | 9.67 · 10⁻³⁰ | 1.06 · 10⁻²² | |
Correlation coefficient, r, and probability of type I error, p, are above and below the main diagonal respectively. CCE = corrected conditional entropy; CApEn = corrected approximate entropy; SampEn = sample entropy; KNNCE = k-nearest-neighbor conditional entropy.
Table 3. Linear correlation analysis between CIs derived from the SAP series.
| | CICCE | CICApEn | CISampEn | CIKNNCE |
|---|---|---|---|---|
| CICCE | | 0.912 | 0.925 | 0.976 |
| CICApEn | 1.09 · 10⁻²³ | | 0.975 | 0.922 |
| CISampEn | 1.20 · 10⁻²⁵ | 9.83 · 10⁻³⁹ | | 0.925 |
| CIKNNCE | 1.29 · 10⁻³⁹ | 3.24 · 10⁻²⁵ | 1.40 · 10⁻²⁵ | |
Correlation coefficient, r, and probability of type I error, p, are above and below the main diagonal respectively. CCE = corrected conditional entropy; CApEn = corrected approximate entropy; SampEn = sample entropy; KNNCE = k-nearest-neighbor conditional entropy.
Table 4. Linear correlation analysis between NCIs derived from the SAP series.
| | NCICCE | NCICApEn | NCISampEn | NCIKNNCE |
|---|---|---|---|---|
| NCICCE | | 0.958 | 0.956 | 0.980 |
| NCICApEn | 1.07 · 10⁻³² | | 0.977 | 0.973 |
| NCISampEn | 4.56 · 10⁻³² | 4.14 · 10⁻⁴⁰ | | 0.963 |
| NCIKNNCE | 8.17 · 10⁻⁴² | 5.43 · 10⁻³⁸ | 3.74 · 10⁻³⁴ | |
Correlation coefficient, r, and probability of type I error, p, are above and below the main diagonal respectively. CCE = corrected conditional entropy; CApEn = corrected approximate entropy; SampEn = sample entropy; KNNCE = k-nearest-neighbor conditional entropy.
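The correlation figures in tables 1–4 follow from a standard Pearson analysis of paired index vectors. The sketch below is illustrative only: the function name is an assumption, and the sample values are medians lifted from table 5 purely as example inputs. The p value is not computed here to keep the example dependency-free; it follows from the statistic t = r·sqrt((n − 2)/(1 − r²)) under a Student's t distribution with n − 2 degrees of freedom (e.g. scipy.stats.pearsonr returns both r and p).

```python
import numpy as np

def pearson_r(x, y):
    """Pearson product-moment correlation coefficient between two
    vectors of complexity indexes (one value per subject)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    return float(np.corrcoef(x, y)[0, 1])

# illustrative inputs: CICCE and CIKNNCE medians from table 5
ci_cce = [1.19, 0.62, 0.91, 1.19, 1.39]
ci_knnce = [1.16, 0.62, 0.89, 1.17, 1.26]
print(pearson_r(ci_cce, ci_knnce))
```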
Table 5 shows the medians (25th–75th percentiles) of the CIs and NCIs computed from the HP series during the autonomic blockade protocol. All CIs significantly decreased after AT. CICApEn, CISampEn and CIKNNCE decreased after AT + PR as well, while the decrease of CICCE did not reach statistical significance. All CIs remained unmodified after both PR and CL. The NCIs provided similar results. The CIs and NCIs computed from the SAP series during the autonomic blockade protocol are reported in table 6. All CIs were unmodified by AT, AT + PR and PR. Conversely, CL significantly increased all CIs. These results were confirmed by the NCIs.
Table 5. CIs and NCIs assessed over the HP series.
| | B | AT | AT + PR | PR | CL |
|---|---|---|---|---|---|
| CICCE | 1.19 (1.07–1.28) | 0.62* (0.60–0.72) | 0.91 (0.87–1.11) | 1.19 (1.07–1.29) | 1.39 (1.23–1.44) |
| NCICCE | 0.77 (0.74–0.81) | 0.40* (0.36–0.45) | 0.64 (0.55–0.67) | 0.79 (0.78–0.89) | 0.83 (0.82–0.86) |
| CICApEn | 4.15 (3.80–4.38) | 1.79* (1.45–1.90) | 3.20* (2.84–3.68) | 4.21 (3.80–4.44) | 4.60 (4.31–4.71) |
| NCICApEn | 1.81 (1.64–1.92) | 0.80* (0.64–0.86) | 1.39* (1.26–1.60) | 1.84 (1.71–1.96) | 2.01 (1.89–2.05) |
| CISampEn | 2.14 (1.99–2.27) | 1.13* (0.92–1.29) | 1.73* (1.53–1.81) | 2.16 (1.83–2.39) | 2.41 (2.25–2.57) |
| NCISampEn | 0.99 (0.91–1.02) | 0.55* (0.43–0.60) | 0.79* (0.73–0.85) | 1.01 (0.85–1.09) | 1.08 (1.03–1.19) |
| CIKNNCE | 1.16 (1.06–1.25) | 0.62* (0.54–0.87) | 0.89* (0.86–1.08) | 1.17 (1.02–1.23) | 1.26 (1.13–1.36) |
| NCIKNNCE | 0.77 (0.73–0.79) | 0.39* (0.33–0.55) | 0.61* (0.53–0.65) | 0.78 (0.76–0.82) | 0.77 (0.75–0.83) |
Values are expressed as median (first quartile–third quartile). CCE = corrected conditional entropy; CApEn = corrected approximate entropy; SampEn = sample entropy; KNNCE = k-nearest-neighbor conditional entropy; B = baseline; AT = atropine; PR = propranolol; AT + PR = atropine plus propranolol; CL = clonidine. The symbol * indicates a significant difference (p < 0.05) versus B.
Table 6. CIs and NCIs assessed over the SAP series.
| | B | AT | AT + PR | PR | CL |
|---|---|---|---|---|---|
| CICCE | 0.88 (0.80–0.92) | 0.83 (0.75–0.87) | 0.95 (0.67–0.99) | 0.94 (0.80–1.01) | 1.22* (1.17–1.26) |
| NCICCE | 0.55 (0.51–0.64) | 0.51 (0.44–0.52) | 0.57 (0.43–0.62) | 0.56 (0.49–0.62) | 0.78* (0.74–0.81) |
| CICApEn | 2.79 (2.48–3.48) | 2.14 (1.73–2.89) | 3.15 (1.55–3.38) | 2.71 (2.32–3.22) | 4.43* (4.21–4.60) |
| NCICApEn | 1.27 (1.08–1.52) | 0.95 (0.79–1.27) | 1.39 (0.71–1.48) | 1.19 (1.09–1.41) | 1.92* (1.85–1.98) |
| CISampEn | 1.58 (1.46–1.85) | 1.36 (1.11–1.59) | 1.73 (1.12–1.79) | 1.54 (1.33–1.83) | 2.29* (2.17–2.45) |
| NCISampEn | 0.73 (0.68–0.85) | 0.61 (0.52–0.73) | 0.78 (0.52–0.84) | 0.72 (0.67–0.82) | 1.04* (0.98–1.09) |
| CIKNNCE | 0.81 (0.77–0.89) | 0.77 (0.62–0.85) | 0.91 (0.58–0.97) | 0.85 (0.76–0.92) | 1.15* (1.14–1.22) |
| NCIKNNCE | 0.53 (0.48–0.60) | 0.47 (0.36–0.53) | 0.57 (0.38–0.63) | 0.51 (0.47–0.57) | 0.74* (0.72–0.78) |
Values are expressed as median (first quartile–third quartile). CCE = corrected conditional entropy; CApEn = corrected approximate entropy; SampEn = sample entropy; KNNCE = k-nearest-neighbor conditional entropy; B = baseline; AT = atropine; PR = propranolol; AT + PR = atropine plus propranolol; CL = clonidine. The symbol * indicates a significant difference (p < 0.05) versus B.
5. Discussion
This study proposes a new technique for estimating CE over short data series based on the k-nearest-neighbor approach (i.e. the KNNCE method) and compares it with the methods most commonly utilized to assess the entropy-based complexity of cardiovascular control from short beat-to-beat variability series (i.e. CCE, CApEn and SampEn).
5.1. KNNCE approach to the CE estimate
The most important features of the KNNCE approach are: (i) it limits the loss of reliability of the conditional distribution with the length of the conditioning pattern; (ii) it allows the assessment of CI and NCI via an optimization procedure.
One of the main disadvantages of CApEn, SampEn and CCE is the loss of reliability of the conditional distribution as a function of the embedding dimension. This limitation is a consequence of coarse-graining the embedding space with cells of the same size. CApEn, SampEn and CCE must deal with this well-known limitation because it leads to a spurious reduction of the CE estimate (Porta et al 1998, Richman and Moorman 2000, Porta et al 2007b). SampEn prevents the consideration of unreliable conditional distributions (i.e. it excludes 'self-matching' patterns). CApEn corrects unreliable conditional probabilities according to a priori assumptions (i.e. it substitutes erroneous certainty with full uncertainty). CCE adds a corrective term preventing the erroneous decrease of the CE estimate (i.e. it imposes an increase of the CE estimate according to a measure of unreliability provided by the fraction of 'self-matching' patterns). KNNCE, by contrast, provides a CE estimate that needs neither corrective terms nor a priori information to deal with the loss of reliability of the conditional distribution with the pattern length. Indeed, all the conditional distributions have the same consistency, assigned a priori via the number of nearest neighbors, k. In addition, this consistency is independent of the conditioning pattern, yL−1(i − 1), and of its length: the number of nearest neighbors is the same for every pattern and constant with L.
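The idea can be sketched as follows: for each conditioning pattern, the conditional distribution of the next sample is built from the successors of the pattern's k nearest neighbors, so every distribution rests on exactly k points whatever the pattern length. This is a simplified illustration of the principle, not the authors' exact estimator; the Chebyshev distance, the uniform quantization over six bins and the parameter names are assumptions.

```python
import numpy as np

def shannon_entropy(vals, edges):
    """SE of a sample, estimated after uniform quantization."""
    counts, _ = np.histogram(vals, bins=edges)
    p = counts[counts > 0] / counts.sum()
    return float(-np.sum(p * np.log(p)))

def knnce(y, L, k=10, n_bins=6):
    """KNN-based CE sketch: the conditional distribution of y(i)
    given the L-1 previous samples is built from the successors of
    the k patterns closest to the conditioning pattern, so its
    consistency (k points) is independent of the pattern and of L."""
    y = np.asarray(y, dtype=float)
    edges = np.linspace(y.min(), y.max(), n_bins + 1)
    if L < 2:
        return shannon_entropy(y, edges)        # CE(1) = SE(y)
    # conditioning patterns y_{L-1}(i-1) and their successors y(i)
    pats = np.lib.stride_tricks.sliding_window_view(y[:-1], L - 1)
    succ = y[L - 1:]
    ce = 0.0
    for i, p in enumerate(pats):
        d = np.max(np.abs(pats - p), axis=1)    # Chebyshev distance
        d[i] = np.inf                           # exclude self-matching
        nn = np.argsort(d)[:k]                  # k nearest neighbors
        ce += shannon_entropy(succ[nn], edges)
    return ce / len(pats)
```

On a perfectly alternating series the estimate drops to zero at L = 2, since one past sample removes all uncertainty about the next.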
Another relevant disadvantage of CApEn and SampEn is the arbitrary selection of the length of the conditioning pattern (i.e. L − 1) used to derive CI and NCI. The original studies proposed L − 1 = 2 (Richman and Moorman 2000, Pincus et al 1993). This choice is certainly correct when the dynamics of the series can be largely predicted using very few past samples and when the number of samples helpful to forecast future behaviors is invariant across groups and/or experimental conditions. However, a safer procedure should automatically select the pattern length most useful to reduce the uncertainty of future behaviors. For example, if a large number of past samples is necessary to resolve the ambiguities of the dynamics, as occurs in the case of figure 1(a), the arbitrary assumption that few samples are sufficient to make a reliable prediction overestimates the complexity of the dynamics (see figures 1(e), (g)). The KNNCE approach allows the estimation of CIKNNCE and NCIKNNCE via a minimization procedure over L. It exploits the spread of the dynamics with L, which leads to the enlargement of the cell containing the k nearest neighbors of yL−1(i − 1). If the spread helps unfold the dynamics and reduces the uncertainty in predicting future samples, KNNCE decreases. Conversely, if the spread leads to an increase of unpredictability, KNNCE rises. This behavior, together with the bounded nature of KNNCE between 0 and CE(1) = SE(y), produces an unequivocal minimum of KNNCE over L. The minimum can be extracted via a fully automatic minimization procedure to derive an optimized CIKNNCE (or NCIKNNCE).
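The minimization over L can be sketched generically: scan candidate pattern lengths and keep the one at which the CE estimate attains its minimum. This is a sketch under the assumption that a CE estimator is available as a callable; the names `ce_estimator` and `L_max` are illustrative, not part of the original formulation.

```python
import numpy as np

def minimize_ce(y, ce_estimator, L_max=10):
    """Return (minimum CE, optimal L). Since the CE estimate is
    bounded between 0 and CE(1) = SE(y), and rises again once
    enlarging the pattern no longer helps prediction, the minimum
    over L is well defined and can be found by a plain scan."""
    values = [ce_estimator(y, L) for L in range(1, L_max + 1)]
    L_best = int(np.argmin(values)) + 1     # L is 1-indexed
    return values[L_best - 1], L_best
```

The returned minimum is the optimized CI; dividing it by SE(y) would give the corresponding NCI.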
It is worth noting that the CE of y given a pattern formed by L − 1 past samples can be estimated as the difference between the SE of yL and the SE of yL−1 (Porta et al 1998) or, equivalently, as the difference between the SE of y and the mutual information (MI) shared by y and the pattern formed by L − 1 past samples (Kraskov et al 2004). However, in this study neither approach was followed, thus avoiding the bias arising from the errors individually made in the estimation of SE and MI (Kozachenko and Leonenko 1987, Kraskov et al 2004), which are unlikely to cancel each other out in the difference.
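In symbols, the two equivalent decompositions mentioned above read (a restatement in the text's notation, not a new result):

```latex
% CE via the difference of Shannon entropies of embedded patterns
\mathrm{CE}(L) = \mathrm{SE}(y_L) - \mathrm{SE}(y_{L-1})
% or, equivalently, via the mutual information shared by y and the
% pattern of L-1 past samples
\mathrm{CE}(L) = \mathrm{SE}(y) - \mathrm{MI}\bigl(y;\, y_{L-1}(i-1)\bigr)
```

Either form propagates the individual estimation errors of its two terms into the difference, which is why the KNNCE approach estimates CE directly instead.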
5.2. Comparing entropy-based CIs
The CIs spanned significantly different ranges of values. Therefore, we recommend avoiding any comparison between studies exploiting different indexes and drawing index-specific conclusions according to the adopted experimental protocol. The index with the largest coefficient of variation was CICApEn, significantly larger than those of CICCE and CIKNNCE over the SAP series. The large coefficient of variation of CICApEn might suggest a reduced discriminative statistical power compared to the remaining indexes: indeed, a larger number of subjects could be necessary to make the type I error probability, p, smaller than a user-defined significance level (e.g. 0.05). Nevertheless, the presumed limited power of CICApEn had no effect in this experimental protocol. Normalization of the CIs, leading to the NCIs, reduced the variability of the indexes over the HP series (i.e. all the coefficients of variation of the NCIs were smaller than those of the CIs), thus favoring their use in heart rate variability studies to improve discriminative statistical power. This finding did not hold when the NCIs were assessed over the SAP series. Since the normalization factor accounts for the SE of the series (Porta et al 2007a), we suggest that the SE of HP contributes more to the dispersion of the CIs than the SE of SAP.
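The dispersion measure used in this comparison is the ordinary coefficient of variation, i.e. the sample standard deviation divided by the mean of the index across subjects. A minimal sketch (the function name is assumed):

```python
import numpy as np

def coeff_of_variation(x):
    """Relative dispersion of an index across subjects: a larger
    value suggests a reduced discriminative statistical power,
    since more subjects are needed to reach significance."""
    x = np.asarray(x, dtype=float)
    return float(np.std(x, ddof=1) / np.mean(x))
```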
Every CI pair was significantly linearly correlated. This result suggests that it is pointless to report results derived from all the methods, since they do not provide independent information. Preference should be given to the method allowing the clearest distinction between conditions. However, given that the coefficients of variation were not significantly different, except for CApEn, this selection cannot be made a priori. The correlation coefficient was highest between CICCE and CIKNNCE and between CICApEn and CISampEn. This finding suggests that the largest correlation is found between methods utilizing the same technique for estimating CE: indeed, CCE and KNNCE compute (1) via the estimation of SE(y/yL−1(i − 1)), while CApEn and SampEn are based on the direct computation of the conditional probabilities. Normalization generally improved the correlation coefficient, especially when the indexes were assessed over the SAP series, thus indicating that the different estimates of the SE of the series were responsible for a certain degree of decorrelation among the CIs.
5.3. Effects of selective autonomic blockades over CIs
At B in healthy subjects the cardiac pacemaker is under the control of the vagal and sympathetic branches of the autonomic nervous system (Task Force 1996). The parasympathetic blockade induced by a high dose of AT leaves the cardiac pacemaker under the sole sympathetic control, thus simplifying cardiac regulation and reducing the complexity of cardiac control (Porta et al 2007c, Porta et al 2000). The β-adrenergic sympathetic blockade obtained via the administration of PR did not induce any variation in the complexity of cardiac regulation, thus suggesting that the contribution of sympathetic circuits to the complexity of cardiac control is negligible. The marked reduction of the complexity of cardiac control after AT and the insignificant effect of PR were further confirmed by the double blockade of the vagal and sympathetic branches obtained via the administration of PR after AT: indeed, the CIs decreased after AT + PR and their reduction was similar to that after AT. The negligible contribution of sympathetic circuits to the complexity of cardiac control was further emphasized by the inability of CL to modify the CIs and NCIs derived from the HP series.
The analysis of the complexity of vascular control, as provided by the CIs derived from the SAP series, sheds further light on cardiovascular regulation. The complexity of vascular control was not affected by high doses of AT (Porta et al 2000), a likely result of the absence of vagal influences over the vascular tree. In addition, the complexity of vascular control was unmodified by PR, thus suggesting that sympathetic controls other than the β-adrenergic one are more effective in modulating SAP. This observation was confirmed by CL: indeed, the central blockade of all sympathetic influences affected the complexity of SAP. The increase of the CIs after CL might reflect the desynchronizing effect on vascular regulation induced by full sympathetic blockade. The NCIs confirmed the results obtained by the CIs. It appears that normalization did not produce any additional advantage in this experimental protocol.
Despite the theoretical advantages of the KNNCE approach, CIKNNCE (or NCIKNNCE) showed the same ability as CICApEn and CISampEn (or NCICApEn and NCISampEn) to distinguish the experimental conditions. This result suggests that very few past samples were sufficient to predict future behaviors and that their number remained constant over the experimental conditions. It is worth noting that KNNCE was more powerful than CCE in detecting the decrease in cardiac complexity after AT + PR. The limited discriminative power of CICCE and NCICCE might result from the larger dispersion of the indexes introduced by the corrective term.
The findings relevant to the KNNCE approach were confirmed when SE(y/yL−1(i − 1)) was assessed as the negative natural logarithm of the probability that two samples of y/yL−1(i − 1) lie at a distance smaller than r = 20% of the standard deviation of y, instead of computing SE(y/yL−1(i − 1)) after UQ over ξ = 6 bins. This observation suggests that the findings relevant to the KNNCE approach are independent of the strategy utilized to assess the SE of the conditional distributions.
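The alternative SE estimator mentioned here — the negative natural logarithm of the probability that two distinct samples lie closer than a tolerance r — can be sketched as follows. This is a simplified illustration; the function and variable names are assumptions.

```python
import numpy as np

def se_from_distances(vals, r):
    """SE estimated as -ln of the probability that two distinct
    samples are at a distance smaller than r (here r would be set
    to 20% of the standard deviation of the full series y)."""
    vals = np.asarray(vals, dtype=float)
    n = len(vals)
    d = np.abs(vals[:, None] - vals[None, :])   # pairwise distances
    matches = int((d < r).sum()) - n            # drop self-pairs
    return float(-np.log(matches / (n * (n - 1))))
```

For the sample [0, 0, 1, 1] with r = 0.5, one third of the ordered distinct pairs match, giving an estimate of ln 3.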
6. Conclusions
KNNCE addresses some theoretical problems linked to the estimation of CE without making use of strategies to cope with the loss of reliability of conditional distributions with the embedding dimension and without imposing the assessment of CI (or NCI) at an arbitrary pattern length. Its performance over short cardiovascular variability series recorded during selective autonomic blockades is comparable with that of CApEn and SampEn and slightly better than that of CCE. Given the peculiar features of the KNNCE approach, its utilization is recommended when the number of past samples necessary to definitely unfold the dynamics is high and/or variable over different groups and/or experimental conditions. In addition, the technique is particularly suitable in fields where, for any reason, the length of the time series is very limited and the optimal embedding dimension is not known a priori. In this study the comparison was limited to approaches quantifying CE directly from the data without any additional transformation, except the binning procedure in the case of CCE. Future studies should extend the comparison to techniques applying an appropriate transformation before estimating CE, converting values into symbols according to a given code (Wessel et al 2000, Cysarz et al 2012) or into permutation indexes according to a rank-order procedure (Bandt and Pompe 2002, Staniek and Lehnertz 2008, Parlitz et al 2012).
Acknowledgment
The Telethon GGP09247 grant to AP supported the study.