Chapter 1. Statistics
Contents

1.1. Gaussian distribution
     1.1.1. Mean value and standard deviation
     1.1.2. Standard deviation and probability
1.2. Confidence Intervals
1.3. Comparison of Means with Student's t
1.4. Comparison of standard deviations with the F test
1.5. Q test for bad data
1.6. The Method of Least Squares
1.1. Gaussian distribution
If an experiment is repeated a great many times and the errors are purely random, the results tend to cluster symmetrically about the average value. The more times the experiment is repeated, the more closely the results approach an ideal smooth curve called the Gaussian distribution. In general, we cannot make so many measurements in a lab experiment; we are more likely to repeat an experiment 3 to 5 times than 2000 times. However, from the small set of results we can estimate the statistical parameters that describe the large set, and we can then estimate statistical behavior from the small number of measurements.

1.1.1. Mean value and standard deviation

If an experiment is repeated n times, the mean can be calculated with the formula

\bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i    (with i = 1, 2, 3, ..., n)    (1-1)
The standard deviation, s, measures how closely the data are clustered about the mean. The smaller the standard deviation, the more closely the data are clustered about the mean. The quantity n - 1 in equation 1-2 is called the degrees of freedom.

s = \sqrt{ \frac{ \sum_{i=1}^{n} (x_i - \bar{x})^2 }{ n - 1 } }    (1-2)
Example: Mean and standard deviation
Find the average and the standard deviation of 821, 783, 834, and 855.
Solution: The average is

\bar{x} = \frac{821 + 783 + 834 + 855}{4} = 823.2

The standard deviation is

s = \sqrt{ \frac{(821 - 823.2)^2 + (783 - 823.2)^2 + (834 - 823.2)^2 + (855 - 823.2)^2}{4 - 1} } = 30.3
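Equations 1-1 and 1-2 translate directly into code. A minimal sketch, reproducing the example above:

```python
import math

def mean(data):
    """Arithmetic mean, equation 1-1."""
    return sum(data) / len(data)

def std_dev(data):
    """Sample standard deviation with n - 1 degrees of freedom, equation 1-2."""
    m = mean(data)
    return math.sqrt(sum((x - m) ** 2 for x in data) / (len(data) - 1))

values = [821, 783, 834, 855]
print(round(mean(values), 1))     # 823.2
print(round(std_dev(values), 1))  # 30.3
```

Note the divisor n - 1, not n: the sample standard deviation uses the degrees of freedom, as stated above.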
The average and the standard deviation should both end at the same decimal place.

1.1.2. Standard Deviation and Probability

The formula for the Gaussian curve is

y = \frac{1}{\sigma \sqrt{2\pi}} e^{-(x - \mu)^2 / 2\sigma^2}    (1-3)

where e = 2.71828... is the base of the natural logarithm. For a finite set of data, we approximate the true mean, \mu, by \bar{x} and the true standard deviation, \sigma, by s; for the standard curve, \mu = 0 and \sigma = 1 are used for simplicity. The maximum value of y is at x = \mu, and the curve is symmetric about x = \mu.
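Equation 1-3 can be evaluated directly; a short sketch that also checks the symmetry about x = \mu:

```python
import math

def gaussian(x, mu=0.0, sigma=1.0):
    """Gaussian curve, equation 1-3, with mu = 0 and sigma = 1 by default."""
    return math.exp(-(x - mu) ** 2 / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))

print(gaussian(0.0))                    # maximum, at x = mu: ~0.3989
print(gaussian(1.0) == gaussian(-1.0))  # True: symmetric about x = mu
```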
1.2. Confidence Intervals
Student's t is the statistical tool used most frequently to express confidence intervals and to compare results from different experiments.

Calculating confidence intervals

\mu = \bar{x} \pm \frac{t s}{\sqrt{n}}    (1-4)

where s is the measured standard deviation, n is the number of observations, and t is Student's t, taken from Table 1-1.
Table 1-1. Values of Student's t

Degrees of                     Confidence level (%)
freedom        50       90        95        98        99       99.5      99.9
 1           1.000    6.314    12.706    31.821    63.656    127.321   636.578
 2           0.816    2.920     4.303     6.965     9.925     14.089    31.598
 3           0.765    2.353     3.182     4.541     5.841      7.453    12.924
 4           0.741    2.132     2.776     3.747     4.604      5.598     8.610
 5           0.727    2.015     2.571     3.365     4.032      4.773     6.869
 6           0.718    1.943     2.447     3.143     3.707      4.317     5.959
 7           0.711    1.895     2.365     2.998     3.500      4.029     5.408
 8           0.706    1.860     2.306     2.896     3.355      3.832     5.041
 9           0.703    1.833     2.262     2.821     3.250      3.690     4.781
10           0.700    1.812     2.228     2.764     3.169      3.581     4.587
15           0.691    1.753     2.131     2.602     2.947      3.252     4.073
20           0.687    1.725     2.086     2.528     2.845      3.153     3.850
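Equation 1-4 with a t value from Table 1-1 gives the confidence interval directly. A minimal sketch, applied to the glycoprotein data worked in the example that follows:

```python
import math

def confidence_interval(data, t):
    """Confidence interval mu = xbar +/- t*s/sqrt(n), equation 1-4.
    t is Student's t from Table 1-1 for the chosen confidence level
    and n - 1 degrees of freedom."""
    n = len(data)
    xbar = sum(data) / n
    s = math.sqrt(sum((x - xbar) ** 2 for x in data) / (n - 1))
    return xbar, t * s / math.sqrt(n)

data = [12.6, 11.9, 13.0, 12.7, 12.5]
xbar, half50 = confidence_interval(data, t=0.741)  # 50% level, 4 degrees of freedom
_, half90 = confidence_interval(data, t=2.132)     # 90% level, 4 degrees of freedom
print(f"{xbar:.2f} ± {half50:.2f}")  # 12.54 ± 0.13
print(f"{xbar:.2f} ± {half90:.2f}")  # 12.54 ± 0.38
```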
Example: Calculating confidence intervals
The carbohydrate content of a glycoprotein (a protein with sugars attached to it) is determined to be 12.6, 11.9, 13.0, 12.7, and 12.5 g of carbohydrate per 100 g of protein in replicate analyses. Find the 50% and 90% confidence intervals for the carbohydrate content.
Solution: First calculate \bar{x} (= 12.54) and s (= 0.40) for the five measurements. For the 50% confidence interval, look up t in Table 1-1 under 50 and across from four degrees of freedom. The value of t is 0.741, so the 50% confidence interval is

\mu = 12.54 \pm \frac{(0.741)(0.40)}{\sqrt{5}} = 12.54 \pm 0.13

The 90% confidence interval is

\mu = 12.54 \pm \frac{(2.132)(0.40)}{\sqrt{5}} = 12.54 \pm 0.38

There is a 50% chance that the true mean, \mu, lies within the range 12.54 ± 0.13, and a 90% chance that it lies within the range 12.54 ± 0.38.

1.3. Comparison of Means with Student's t
We use a t test to compare one set of measurements with another to decide whether or not they are the same. Statisticians say we are testing the null hypothesis, which states that the mean values from two sets of measurements are not different. Because of inevitable random errors, we do not expect the mean values to be exactly the same, even if we are measuring the same physical quantity. Statistics gives us a probability that the observed difference between two means can arise from purely random measurement error. We customarily reject the null hypothesis if there is less than a 5% chance that the observed difference arises from random error. With this criterion, we have a 95% chance that our conclusion is correct; one time in 20, our conclusion that two means differ will be wrong.

Here are three cases that are handled in slightly different manners:

Case 1. We measure a quantity several times, obtaining an average value and a standard deviation. We need to compare our answer with an accepted answer. The average is not exactly the same as the accepted answer. Does our measured answer agree with the accepted answer within experimental error?

Case 2. We measure a quantity multiple times by two different methods that give two different answers, each with its own standard deviation. Do the two results agree with each other within experimental error?

Case 3. Sample 1 is measured once by Method 1 and once by Method 2, which do not give exactly the same result. Then a different sample, designated 2, is measured once by Method 1 and once by Method 2; again, the results are not exactly equal to each other. The procedure is repeated for n different samples. Do the two methods agree with each other within experimental error?

1.4. Comparison of standard deviations with the F test
The F test tells us whether two standard deviations are significantly different from each other. F is the quotient of the squares of the standard deviations:

F_{\text{calculated}} = \frac{s_1^2}{s_2^2}    (1-5)
We always put the larger standard deviation in the numerator so that F ≥ 1. If F_calculated > F_table in Table 1-2, then the difference is significant.

Table 1-2. Critical values of F = s_1^2 / s_2^2 at 95% confidence

Degrees of                  Degrees of freedom for s1
freedom for s2     2       3       4       5       6      12      20       ∞
 2              19.00   19.16   19.25   19.30   19.33   19.41   19.45   19.50
 3               9.55    9.28    9.12    9.01    8.94    8.74    8.66    8.53
 4               6.94    6.59    6.39    6.26    6.16    5.91    5.80    5.63
 5               5.79    5.41    5.19    5.05    4.95    4.68    4.56    4.36
 6               5.14    4.76    4.53    4.39    4.28    4.00    3.87    3.67
12               3.89    3.49    3.26    3.11    3.00    2.69    2.54    2.30
20               3.49    3.10    2.87    2.71    2.60    2.28    2.12    1.84
 ∞               3.00    2.60    2.37    2.21    2.10    1.75    1.57    1.00
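Equation 1-5 is easy to apply in code. A minimal sketch, using hypothetical standard deviations (the 0.47 and 0.28 values below are illustrative, not from the text):

```python
def f_test(s1, s2, f_table):
    """F test, equation 1-5: put the larger standard deviation in the
    numerator so that F >= 1, then compare with the critical value."""
    if s2 > s1:
        s1, s2 = s2, s1
    f_calc = s1 ** 2 / s2 ** 2
    return f_calc, f_calc > f_table

# Hypothetical example: two methods give s = 0.47 and s = 0.28, each with
# 6 degrees of freedom; the 95% critical value F(6, 6) is 4.28.
f_calc, significant = f_test(0.47, 0.28, f_table=4.28)
print(round(f_calc, 2), significant)  # 2.82 False: not significantly different
```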
1.5. Q test for bad data

Sometimes one datum is inconsistent with the remaining data. You can use the Q test to help decide whether to retain or discard a questionable datum. Consider the five results 12.53, 12.56, 12.47, 12.67, and 12.48. Is 12.67 a bad point? To apply the Q test, arrange the data in order of increasing value and calculate Q, defined as

Q_{\text{calculated}} = \frac{\text{gap}}{\text{range}}    (1-6)
12.47    12.48    12.53    12.56    12.67

The range is the total spread of the data. The gap is the difference between the questionable point and the nearest value. If Q_calculated > Q_table, the questionable point should be discarded. In the preceding example, Q_calculated = 0.11/0.20 = 0.55. In Table 1-3 we find Q_table = 0.64 (90% confidence). Because Q_calculated < Q_table, the questionable point should be retained.

Table 1-3. Values of Q for rejection of data

Number of         Q (confidence, %)
observations     90      95      99
 3              0.94    0.97    0.99
 4              0.76    0.83    0.93
 5              0.64    0.71    0.82
 6              0.56    0.62    0.74
 7              0.51    0.57    0.68
 8              0.47    0.52    0.63
 9              0.44    0.49    0.60
10              0.41    0.47    0.57
If Q_calculated > Q_table, the value in question can be rejected at the corresponding confidence level.
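The Q test above can be sketched in a few lines. This version checks whichever end of the sorted data is more extreme:

```python
def q_test(data, q_table):
    """Q test, equation 1-6: Q = gap / range for the most extreme point."""
    data = sorted(data)
    rng = data[-1] - data[0]        # range: total spread of the data
    gap_low = data[1] - data[0]     # gap if the lowest point is questionable
    gap_high = data[-1] - data[-2]  # gap if the highest point is questionable
    q_calc = max(gap_low, gap_high) / rng
    return q_calc, q_calc > q_table

values = [12.53, 12.56, 12.47, 12.67, 12.48]
q_calc, discard = q_test(values, q_table=0.64)  # 5 observations, 90% confidence
print(round(q_calc, 2), discard)  # 0.55 False: retain 12.67
```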
1.6. The Method of Least Squares

Finding the equation of the line P(x) = ax by the method of least squares, in which a is the slope:

S =
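The text is cut off here, but for a line through the origin the standard least-squares result follows from minimizing the sum of squared residuals. As a sketch, assuming S = \sum_i (y_i - a x_i)^2, setting dS/da = 0 gives a = \sum_i x_i y_i / \sum_i x_i^2:

```python
def least_squares_slope(xs, ys):
    """Slope a of the line P(x) = a*x, assuming S is the sum of squared
    residuals S = sum((y_i - a*x_i)**2); minimizing S over a gives
    a = sum(x_i*y_i) / sum(x_i**2)."""
    return sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)

# Hypothetical calibration data lying exactly on y = 2x:
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 8.0]
print(least_squares_slope(xs, ys))  # 2.0
```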