Wikipedia:Reference desk/Archives/Mathematics/2011 December 13
Welcome to the Wikipedia Mathematics Reference Desk Archives. The page you are currently viewing is an archive page. While you can leave answers for any questions shown below, please ask new questions on one of the current reference desk pages.
December 13
Error with 1 − 2 + 3 − 4 + · · ·?
So, I was looking at the second manipulation process (which can be seen here). It goes like:

- $2s = (1 - 2 + 3 - 4 + \cdots) + (1 - 2 + 3 - 4 + \cdots) = 1 + (-2 + 1) + (3 - 2) + (-4 + 3) + \cdots = 1 - 1 + 1 - 1 + \cdots$

Now, I understand the manipulation, but the last part doesn't make sense. The article claims that 1 − 2 + 3 − 4 + · · · is equal to 1/4. But the last part of this equation essentially is:

- $1 - 1 + 1 - 1 + \cdots = 1/2$

Sorry for my non-alignment, but my point remains. Am I missing something that makes this equation correct, or is there really something wrong with it? 64.229.180.189 (talk) 04:06, 13 December 2011 (UTC)
- Did you read the articles Grandi's series and Summation of Grandi's series? They say that 1 − 1 + 1 − 1 + · · · is divergent in the strict sense of the word (hence, manipulations with its terms make it possible to "compute" many different values of the "sum"), but the value of 1/2 is the most "natural" of these, in some (very vague) sense of the word. For example, the running average of its partial sums 1, 0, 1, 0, … tends to 1/2.
--Itinerant1 (talk) 05:54, 13 December 2011 (UTC)
- See analytic continuation for a less vague sense of the word "natural". Bo Jacoby (talk) 07:26, 13 December 2011 (UTC).
- $1 - 2 + 3 - 4 + \cdots = \sum_{i \ge 1} i(-1)^{i-1} = f(1)$

where

- $f(x) = \sum_{i \ge 1} i(-x)^{i-1} = 1 - 2x + 3x^2 - 4x^3 + \cdots$

But

- $\sum_{i \ge 1} i(-x)^{i-1} = \sum_{i \ge 1} \frac{d\,(-x)^i}{d(-x)} = -\frac{d}{dx} \sum_{i \ge 0} (-x)^i = -\frac{d}{dx}\,(1+x)^{-1} = (1+x)^{-2}.$

So

- $f(1) = (1+1)^{-2} = 1/4$

or

- $1 - 2 + 3 - 4 + \cdots = 1/4$

Q.E.D.

Note also for example that

- $1 - 4 + 12 - 32 + 80 - 192 + \cdots = f(2) = 1/9$
Bo Jacoby (talk) 10:35, 13 December 2011 (UTC).
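- As a numerical illustration of the derivation above (a sketch: the x values and the 100,000-term cutoff are arbitrary choices), for |x| < 1 the series converges in the ordinary sense, and letting x approach 1 recovers 1/4:

```python
# Partial sums of f(x) = sum_{i>=1} i*(-x)**(i-1) for |x| < 1,
# compared against the closed form (1+x)**-2 derived above.
# As x -> 1, both approach 1/4, the value assigned to 1-2+3-4+...

def f_partial(x, terms=100_000):
    """Partial sum of sum_{i>=1} i*(-x)**(i-1)."""
    return sum(i * (-x) ** (i - 1) for i in range(1, terms + 1))

for x in (0.9, 0.99, 0.999):
    print(f"x = {x}: partial sum = {f_partial(x):.6f}, "
          f"(1+x)^-2 = {(1 + x) ** -2:.6f}")
```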
Sample median uncertainty from a normal distribution contaminated with symmetric outliers
I have n sample data points $x_1, \ldots, x_n$. A few percent of the samples are false outlier measurements, which may be spread more or less symmetrically up to 10 standard deviations away from the mean of the true underlying distribution, which can be assumed normal with an unknown mean $\mu$ and standard deviation $\sigma$. For some sample sets of data, I may be unlucky in that the few outliers do not cancel each other when calculating the sample mean $\bar{x}$. Thus, the usual uncertainty estimate $s/\sqrt{n}$ of the sample mean for a normal distribution with unknown mean will be an underestimate of the true sample mean uncertainty, due to the presence of the outliers in the sample. Thus, I reckon the sample median will be a more robust estimator of the mean of the distribution, but I would like to estimate the uncertainty of this sample median. How do I do that? --Slaunger (talk) 10:25, 13 December 2011 (UTC)
- In the absence of outliers and for large sets, the standard error of the median is $\sqrt{\pi/2} \approx 1.2533$ times the standard error of the sample mean. This estimate should be valid as long as either the distribution of outliers is symmetric, or the expected asymmetry is small ($m\,|2p-1| \ll \sqrt{n}$, where m is the number of outliers and p is the probability for an outlier to be above the underlying mean). --Itinerant1 (talk) 11:02, 13 December 2011 (UTC)
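- A quick Monte Carlo check of that factor for clean normal data (a sketch; numpy, the sample size n = 1000, and the 5000 replications are assumptions, not part of the reply above):

```python
# Check SE(median) ~ sqrt(pi/2) * SE(mean) for normal samples by
# simulating many samples and comparing the spread of the two estimators.
import numpy as np

rng = np.random.default_rng(0)
n, reps = 1000, 5000
samples = rng.standard_normal((reps, n))

se_mean = samples.mean(axis=1).std()
se_median = np.median(samples, axis=1).std()

print(f"SE(mean)   = {se_mean:.5f}")    # ~ 1/sqrt(n)          = 0.03162
print(f"SE(median) = {se_median:.5f}")  # ~ sqrt(pi/2)/sqrt(n) = 0.03963
print(f"ratio = {se_median / se_mean:.3f}  vs  sqrt(pi/2) = {np.sqrt(np.pi / 2):.3f}")
```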
- Cool, thanks. Just what I needed. I tried to find this information in a Wikipedia article but did not manage to find it. Maybe I just did not look in the right place? If it is not there, maybe it would be relevant to add, including a source, as I think it is a rather useful quantity. --Slaunger (talk) 11:07, 13 December 2011 (UTC)
- This information isn't easy to find in Wikipedia, but is in Efficiency (statistics)#Example. Qwfp (talk) 17:02, 13 December 2011 (UTC)
- Thank you! That section was instructive reading. It also alludes to the fact that although the sample median is not as efficient an estimator for the mean, it is a sensible estimator to use when a pdf is only approximately equal to a normal distribution and has outliers. That section could use a source for the $\sqrt{\pi/2}$ factor though, which I guess is not so trivial to derive. (I guess it involves writing out an expression for the log-likelihood of the median conditioned on N samples, and finding the curvature, or the Fisher information.) --Slaunger (talk) 07:36, 14 December 2011 (UTC)
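- For reference, a shorter large-sample route than the Fisher-information one sketched above uses the standard asymptotic variance of a sample quantile (a sketch, valid in the large-N limit): for the median $\tilde{x}$ of N draws from a density f with median $\mu$,

- $\operatorname{Var}(\tilde{x}) \approx \frac{1}{4 N f(\mu)^2}, \qquad f(\mu) = \frac{1}{\sqrt{2\pi}\,\sigma} \;\Longrightarrow\; \operatorname{Var}(\tilde{x}) \approx \frac{\pi}{2} \cdot \frac{\sigma^2}{N} = \frac{\pi}{2}\,\operatorname{Var}(\bar{x}),$

so the standard error of the median is $\sqrt{\pi/2}$ times that of the mean, as quoted above.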
Uncertainty of a fractile difference based robust standard deviation estimate of a normal distribution contaminated by a few outliers
A related question. I also calculate the sample variance, but this is even more susceptible to the outliers. So instead I estimate the standard deviation by taking half the sample fractile difference between the 15.9% and 84.1% fractiles (because for a normal distribution this converges to the standard deviation of the underlying distribution). However, this quantity is again uncertain due to the limited number of samples, and I would like to have an idea of the confidence I can put into the sample standard deviation found by this more robust fractile difference based approach. Now, if the pdf had been normal and I had used the sample variance $s^2$, I know the variance would be distributed as a Pearson Type III distribution, and I know there would be a nice closed form formula for the variance of the variance, cf. Sample Variance Distribution on Wolfram. My best guess for the variance of the fractile difference based estimate of the standard deviation is that it is close to that form, but probably a little larger (as was the case for the sample mean vs. median uncertainty discussed above). Is there a closed form expression for the variance of this estimator that I can use, given N measurements? --Slaunger (talk) 13:40, 13 December 2011 (UTC)
- I've never been happy with removing outliers and would always be very careful about that. The ozone hole over Antarctica was ignored because the actual measurements had been discarded as outliers. More prosaically it seems in many cases that outliers are not errors and results would have been better if people had taken note of them. What's the problem with just accepting the distribution has fat tails? Dmcq (talk) 17:12, 13 December 2011 (UTC)
- I basically agree with your reluctance to ignore the outliers. In this case, however, I fully understand the origin of the outliers. They are the result of a process which takes place with low probability (relative to the normal data points). I even have a fairly good idea about the probability that a measurement belongs to this process, but I have no way of distinguishing them. My motivation for using the method is that it is part of a negotiation process with a customer regarding the method to use for verifying that the precision of a quantity does not exceed a predefined limit. But this precision should only refer to the most frequently occurring "normal process" samples, and not to the other, unlikely process which produces the outliers. The customer has a good understanding of this as well. We also know that if we calculate the sample variance including all measurements, there will be a non-conformance. We know that the sample variance is completely dominated by a few outliers, and we also have a good mutual understanding that the sample variance is not a good measure of the actual precision: just looking at the sample pdf for some examples shows a clear and narrow peak, contaminated by a very broad, low probability fat tail. We have cases where the probability of the anomalous process is very, very low, and from these we know that the "normal process" measurements are normally distributed. Thus it has been agreed with the customer that a more robust estimate for the precision, such as a fractile difference approach or a sample mean deviation, is acceptable for verification purposes. Here we will be "punished" a little for the outliers, but they will not completely dominate the result. --Slaunger (talk) 07:20, 14 December 2011 (UTC)
- You might like to read Robust measures of scale, which suggests more efficient robust estimators. I guess the article it references (Rousseeuw and Croux JSTOR 2291267) must include expressions for their variances for the case of a Normal distribution (at least in the large-sample limit). It may be easier to use bootstrapping though and that would take account of your outliers, which presumably will increase the variance of any estimator of scale, even ones that are robust enough that the estimates themselves aren't influenced by the outliers. Qwfp (talk) 14:45, 14 December 2011 (UTC)
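- To make the bootstrap route concrete, a minimal sketch (the toy contaminated data, the 3% outlier fraction, and the 1000 resamples are assumptions for illustration; the estimator is the half fractile difference from the question above):

```python
# Bootstrap the uncertainty of a robust scale estimate: resample with
# replacement, recompute the estimate on each resample, and take the
# standard deviation of the bootstrap replicates.
import numpy as np

def robust_sigma(x):
    """Half the 15.9%-84.1% fractile difference (~ sigma for normal data)."""
    lo, hi = np.percentile(x, [15.9, 84.1])
    return (hi - lo) / 2

rng = np.random.default_rng(1)
n = 500
data = rng.standard_normal(n)            # "normal process" samples
k = int(0.03 * n)                        # ~3% outliers near 10 sigma
data[:k] = 10 * rng.choice([-1.0, 1.0], size=k) + rng.standard_normal(k)

boot = np.array([robust_sigma(rng.choice(data, size=n, replace=True))
                 for _ in range(1000)])
print(f"sigma_hat = {robust_sigma(data):.3f} +/- {boot.std():.3f}")
```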
- Those were very helpful hints. Thanks. These things are really hard to find when you do not know the terms used to describe the methods. --Slaunger (talk) 21:17, 14 December 2011 (UTC)
- A colleague figured out that the Application: confidence intervals for quantiles section of the order statistic article deals with this problem. The Quantile article is also helpful. --Slaunger (talk) 11:38, 19 December 2011 (UTC)
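- For completeness, combining the asymptotic variance and covariance of sample quantiles from those articles gives a closed form (a sketch, valid in the large-N limit for an uncontaminated normal pdf) for the half fractile difference $\hat{\sigma} = (\hat{x}_{0.841} - \hat{x}_{0.159})/2$, with $p = 0.159$ and $f = \varphi(1)/\sigma$ the density at the two fractiles:

- $\operatorname{Var}(\hat{x}_p) \approx \frac{p(1-p)}{N f^2}, \qquad \operatorname{Cov}(\hat{x}_p, \hat{x}_{1-p}) \approx \frac{p^2}{N f^2},$

- $\operatorname{Var}(\hat{\sigma}) = \tfrac{1}{4}\left[2\operatorname{Var}(\hat{x}_p) - 2\operatorname{Cov}(\hat{x}_p, \hat{x}_{1-p})\right] = \frac{p(1-2p)}{2 N f^2} \approx 0.92\,\frac{\sigma^2}{N},$

i.e. $\operatorname{SE}(\hat{\sigma}) \approx 0.96\,\sigma/\sqrt{N}$, somewhat larger than the $\sigma/\sqrt{2N} \approx 0.71\,\sigma/\sqrt{N}$ of the ordinary sample standard deviation, as anticipated above.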