Determination of Effectiveness of SA and SRS Methods on the Data Produced from Continuous Distributions
Latif OZTURK, Faculty of Economics and Administrative Sciences, Kırıkkale University, Turkey. E-mail:
Habip KOÇAK, Faculty of Economics and Administrative Sciences, Marmara University, Turkey. E-mail:

Abstract: Sampling methods are widely used in research across many scientific fields. The researcher's first priority is to select the sampling technique that estimates the population parameters without bias and with minimum error. The second is to minimize cost without sacrificing unbiasedness. Because sample size is one of the factors that drive cost, researchers try to reduce the sample size in order to minimize cost. Sequential Analysis (SA) can be a solution to this problem, since it has an advantage over Simple Random Sampling (SRS) in terms of sample size. In this study, databases are generated by simulation from normal, uniform and exponential distributions. Parameter estimation and hypothesis testing are then carried out with both sampling methods, and the two methods are compared in terms of sample size and unbiasedness.

Key words: Sampling, Simulation, Sequential Analysis
1. Introduction
A simple random sample is a subset of individuals chosen randomly from a larger set, such that each individual has the same probability of being chosen at any stage of the sampling process, and each subset of k individuals has the same probability of being chosen as any other subset of k individuals.[1] This process is known as Simple Random Sampling (SRS). The main benefit of SRS is that every individual has the same chance of selection, so the sample is free of selection bias and representative of the population in expectation. However, the need to fix the sample size in advance is a disadvantage of SRS for some surveys, because the sample size and the sampling method directly affect the survey cost. To reduce cost, a more efficient sample design that allows smaller samples may be chosen.[2] Sequential Analysis (SA), in which the sample size is not predetermined, is one such method.

2. Literature cited
The sequential probability ratio test (SPRT), a specific sequential hypothesis test, was first developed by Abraham Wald[3] with Jacob Wolfowitz as a tool for more efficient industrial quality control during World War II. Although it was originally developed for quality control in manufacturing, it has also been formulated as a termination criterion in the computerized testing of human examinees (Ferguson, 1969; Reckase, 1983; Eggen, 1999). SA is also connected to the gambler's ruin problem, studied by Huygens, among others, as early as 1657.[4]

In statistics, SA, or sequential hypothesis testing, is statistical analysis in which the sample size is not fixed in advance. Instead, data are evaluated as they are collected, and sampling stops according to a pre-defined stopping rule as soon as significant results are observed. A conclusion may therefore be reached at a much earlier stage than would be possible with SRS, at a correspondingly lower cost. SA may be used with either continuous or discrete distributions.
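To make the idea of evaluating data as they arrive concrete, the following is a minimal sketch of Wald's SPRT for the mean of a normal distribution with known standard deviation. The hypotheses H0: mu = mu0 versus H1: mu = mu1, the error rates `alpha` and `beta`, and the function name `sprt_normal_mean` are illustrative assumptions, not the authors' implementation; the stopping boundaries are Wald's standard approximations log(beta/(1 - alpha)) and log((1 - beta)/alpha).

```python
import math
import random
from typing import Iterable, Optional, Tuple


def sprt_normal_mean(
    data: Iterable[float],
    mu0: float,
    mu1: float,
    sigma: float,
    alpha: float = 0.05,
    beta: float = 0.10,
) -> Tuple[Optional[str], int]:
    """Wald's SPRT for H0: mu = mu0 vs H1: mu = mu1 with sigma known (sketch).

    Returns (decision, n): decision is "H0", "H1", or None if the data ran
    out before either boundary was crossed; n is the number of observations used.
    """
    # Wald's approximate boundaries for the cumulative log-likelihood ratio.
    lower = math.log(beta / (1 - alpha))
    upper = math.log((1 - beta) / alpha)
    llr = 0.0
    n = 0
    for x in data:
        n += 1
        # log f1(x) - log f0(x) for one N(mu, sigma^2) observation.
        llr += ((x - mu0) ** 2 - (x - mu1) ** 2) / (2 * sigma ** 2)
        if llr <= lower:
            return "H0", n
        if llr >= upper:
            return "H1", n
    return None, n


if __name__ == "__main__":
    rng = random.Random(1)
    # Hypothetical stream whose true mean is 1.0.
    stream = (rng.gauss(1.0, 1.0) for _ in range(1000))
    print(sprt_normal_mean(stream, mu0=0.0, mu1=1.0, sigma=1.0))
```

The sample size here is the random stopping time n, which is what distinguishes the sequential approach from a fixed-size SRS design.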
Samples are assumed to be independent and identically distributed (i.i.d.) and drawn randomly. If SA is used in cluster sampling, it is applied within each cluster, not to the whole sample.[5] For SA to be worthwhile, its sample size must be smaller than the fixed sample size predetermined for SRS. A detailed survey of the extensive SA literature can be found in Lai (2001), which reviews recent developments in SA and the challenges and opportunities ahead, focusing on several classical problems and new horizons that highlight the interdisciplinary nature of the subject.[6]

3. Methodology
The average sample size in SA is the average number of observations that must be taken, when testing the unknown parameters, before a decision is reached. For the parameter tests, we first generate databases from exponential and normal distributions with different parameters by using random numbers. An ideal random number generator returns a sequence of independent random variables uniformly distributed on U(0,1); a pseudo-random number generator returns a sequence of approximately independent, approximately uniformly distributed random variables.[7] From such a sequence, data from any given distribution can be produced through its distribution function. Let X be a continuous random variable with cumulative distribution function

(1)  $F(x) = P(X \le x)$

Then the random variable Y = F(X) has a U(0,1) distribution; conversely, if U has a U(0,1) distribution and F is invertible, X = F^{-1}(U) has distribution function F. To generate normal random variables: if U_1 and U_2 are independent U(0,1) variables, then

(2)  $X_1 = \sqrt{-2\ln U_1}\,\cos(2\pi U_2)$

(3)  $X_2 = \sqrt{-2\ln U_1}\,\sin(2\pi U_2)$

are independent standard normal variables. Normal variables with mean μ and variance σ² are obtained by unstandardizing, X = μ + σX_1 (and likewise for X_2).
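The generation step can be sketched in Python as follows. This is a minimal illustration, assuming Python's `random.Random` as the underlying pseudo-random U(0,1) source; the function names are hypothetical. The normal generator uses the standard Box-Muller form (radius from U_1, angle from U_2) followed by unstandardizing, and the exponential generator applies the inverse-transform principle to F(x) = 1 - e^(-θx), where θ is the rate (mean 1/θ).

```python
import math
import random
from typing import List, Tuple


def box_muller(u1: float, u2: float) -> Tuple[float, float]:
    """Map two independent U(0,1) draws to two independent N(0,1) draws."""
    # u1 must be strictly positive so that log(u1) is finite.
    r = math.sqrt(-2.0 * math.log(u1))
    angle = 2.0 * math.pi * u2
    return r * math.cos(angle), r * math.sin(angle)


def normal_sample(n: int, mu: float, sigma: float, rng: random.Random) -> List[float]:
    """Generate n N(mu, sigma^2) values by unstandardizing Box-Muller output."""
    out: List[float] = []
    while len(out) < n:
        # 1 - rng.random() lies in (0, 1], which avoids log(0).
        z1, z2 = box_muller(1.0 - rng.random(), rng.random())
        out.extend([mu + sigma * z1, mu + sigma * z2])
    return out[:n]


def exponential_sample(n: int, theta: float, rng: random.Random) -> List[float]:
    """Inverse transform for F(x) = 1 - exp(-theta x): X = -(1/theta) ln(1 - U)."""
    # rng.random() lies in [0, 1), so 1 - U lies in (0, 1] and the log is finite.
    return [-math.log(1.0 - rng.random()) / theta for _ in range(n)]


if __name__ == "__main__":
    rng = random.Random(2024)
    print(normal_sample(3, mu=0.0, sigma=4.0, rng=rng))
    print(exponential_sample(3, theta=2.0, rng=rng))
```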
For an exponential distribution with parameter θ,

(4)  $F(x) = 1 - e^{-\theta x}, \quad x \ge 0$

Therefore, applying the argument above with U = F(X) and solving for X,

(5)  $X = -\frac{1}{\theta}\ln(1 - U)$

has an exponential distribution with parameter θ (mean 1/θ). After the data have been produced from the exponential or normal distribution, the population parameters are calculated and stored. Then, using simulation, we draw s samples of size n without replacement from the database with SRS and compute the sample statistics each time. Next, we draw units one at a time and recompute the statistics with SA until the standard error reaches that of the parameter estimates previously obtained with SRS. Throughout, we record the sample sizes, which are fixed in SRS and variable in SA. Although tables can provide a useful guide for determining the sample size, the required size may have to be calculated for a different combination of precision, confidence and variability. In this simulation study, the fixed SRS sample sizes for the different e values and a finite population of size N are calculated with the equation[8]

(6)  $n = \frac{N}{1 + N e^2}$

where e is the level of precision. Finally, we compare the two sampling methods by the sample sizes they need to reach the same standard error of the parameters.

4. Empirical Results
Here we report simulations of different sizes with normal and exponential distributions and different parameter values. The parameter values, the calculated statistics, the standard errors and the sample sizes for SRS and SA are given in the tables below.

Table 1. The 100 simulation results with N = 10000, distribution N(μ, σ²) and α = 0,1
n(SRS) = 99

N(μ, σ²)                  (1)        (2)        (3)        (4)        N(0,15)
Statistic                 0,9623     2,0083     3,8676     10,3734    -0,1014; 14,7343
St. Error of Mean, SRS    0,09672    0,2018     0,3887     1,0426     1,4808
St. Error of Mean, SA     0,09620    0,2048     0,3637     1,0274     1,4725
n(SA)                     79         88         71         83         85

In SRS applications a fixed sample size is used whether or not the population is homogeneous. SA does not use a fixed sample size, yet the results in Table 1 show that the SA sample size is nearly constant across parameter values (between 71 and 88), and on average about 20% smaller than the SRS sample size.

Table 2. The 100 simulation results with N = 10000, distribution N(μ, σ²) and α = 0,05
n(SRS) = 384

N(μ, σ²)                  (1)        (2)        (3)        (4)        (5)
Statistic                 0,9722     2,0387     4,0386     9,6470     15,1279
St. Error of Mean, SRS    0,0501     0,1001     0,2061     0,4923     0,7720
St. Error of Mean, SA     0,0501     0,1007     0,2048     0,4931     0,7790
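Equation (6) can be checked directly against the fixed sizes n(SRS) reported in the tables. The sketch below assumes that the level α plays the role of e and that the result is truncated to an integer; truncation is an assumption chosen because it reproduces the reported 99, 384 and 909 (the exact value 384,6 would become 385 under the common convention of rounding up). The function name is hypothetical.

```python
def fixed_sample_size(N: int, e: float) -> int:
    """Equation (6): n = N / (1 + N * e**2), truncated to an integer."""
    return int(N / (1 + N * e ** 2))


if __name__ == "__main__":
    # (N, e) combinations used in Tables 1 to 6.
    for N, e in [(10000, 0.1), (10000, 0.05), (1000, 0.01)]:
        print(f"N = {N}, e = {e}: n(SRS) = {fixed_sample_size(N, e)}")
```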
With the level α = 0,05 instead of α = 0,1 (α enters Equation (6) as e), the fixed sample size for the finite population N = 10000 increases to 384. On average, the SA sample size in this application is about 10% smaller than the SRS sample size.

Table 3. The 50 simulation results with N = 1000, distribution N(μ, σ²) and α = 0,01
n(SRS) = 909

N(μ, σ²)                  (1)        (2)        (3)        (4)        (5)
Statistic                 0,9784     2,0783     4,0081     10,3105    15,3193
St. Error of Mean, SRS    0,0326     0,0689     0,1329     0,3419     0,5081
St. Error of Mean, SA     0,0326     0,0689     0,1343     0,3502     0,5164
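The comparison procedure of Section 3 can be sketched as a small simulation. This is one possible reading of that procedure, not the authors' code: the population is generated once, the target is the average SRS standard error of the mean over s repetitions with the fixed size n, and SA draws units one at a time without replacement until the estimated standard error s/√k falls to that target. The minimum size `k_min`, the use of `random.gauss` in place of Box-Muller, and all function names are hypothetical choices made for brevity.

```python
import math
import random
import statistics
from typing import List, Tuple


def se_of_mean(sample: List[float]) -> float:
    """Estimated standard error of the mean, s / sqrt(k), without finite-population correction."""
    return statistics.stdev(sample) / math.sqrt(len(sample))


def srs_target_se(population: List[float], n: int, s: int, rng: random.Random) -> float:
    """Average SE of the mean over s simple random samples of fixed size n (no replacement)."""
    return statistics.fmean([se_of_mean(rng.sample(population, n)) for _ in range(s)])


def sa_sample_size(
    population: List[float], target: float, rng: random.Random, k_min: int = 10
) -> Tuple[int, float]:
    """Draw units one at a time (no replacement) until the SE of the mean is <= target.

    Returns (k, se): the number of units drawn and the SE at stopping.
    """
    order = rng.sample(population, len(population))  # random order = sequential SRS
    mean, m2, k, se = 0.0, 0.0, 0, math.inf
    for x in order:
        # Welford's running update of the mean and the sum of squared deviations.
        k += 1
        delta = x - mean
        mean += delta / k
        m2 += delta * (x - mean)
        if k >= 2:
            se = math.sqrt(m2 / (k - 1)) / math.sqrt(k)
            # Hypothetical k_min guards against stopping on a spuriously small early SD.
            if k >= k_min and se <= target:
                break
    return k, se


if __name__ == "__main__":
    rng = random.Random(42)
    # N = 10000 units from N(0, 1).
    population = [rng.gauss(0.0, 1.0) for _ in range(10000)]
    target = srs_target_se(population, n=99, s=100, rng=rng)
    sizes = [sa_sample_size(population, target, rng)[0] for _ in range(100)]
    print(f"target SE = {target:.4f}; mean n(SA) = {statistics.fmean(sizes):.1f}; n(SRS) = 99")
```

The running (Welford) update keeps each sequential step cheap, which matters because the number of SA draws is not known in advance.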
When the level α = 0,01 is used, the fixed sample size grows to a large fraction of the population (for N = 1000, Equation (6) gives 909). Because obtaining results with a population of 10000 and 100 simulations takes a large amount of time, the population size was reduced to 1000 and the number of simulations to 50. In this setting, the SA sample size is about 3% smaller than the fixed sample size.

Table 4. The 100 simulation results with N = 10000, distribution Exp(θ) and α = 0,1
n(SRS) = 99

Exp(θ)                    Exp(1)     Exp(2)     Exp(4)     Exp(5)     Exp(10)
Statistic                 0,9692     0,5201     0,2498     0,1940     0,1024
St. Error of Mean, SRS    0,0974     0,0523     0,0251     0,0195     0,0103
St. Error of Mean, SA     0,0859     0,0483     0,0253     0,0181     0,0095
n(SA)                     55         59         77         60         63
Table 5. The 100 simulation results with N = 10000, distribution Exp(θ) and α = 0,05
n(SRS) = 384

Exp(θ)                    Exp(1)     Exp(2)     Exp(4)     Exp(5)     Exp(10)
Statistic                 0,9404     0,4753     0,3199     0,2499     0,1994
St. Error of Mean, SRS    0,04799    0,02426    0,01630    0,01263    0,01018
St. Error of Mean, SA     0,04839    0,02446    0,01669    -0,01379   0,01011
Table 6. The 50 simulation results with N = 1000, distribution Exp(θ) and α = 0,01
n(SRS) = 909

Exp(θ)                    Exp(1)     Exp(2)     Exp(4)     Exp(5)     Exp(10)
Statistic                 1,1224     0,4943     0,2544     0,1954     0,0970
St. Error of Mean, SRS    0,0372     0,0163     0,0084     0,0065     0,0032
St. Error of Mean, SA     0,0368     0,0158     0,0083     0,0066     0,00321
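For an exponential population the standard deviation equals the mean, 1/θ, so the standard error of the mean is approximately (statistic)/√n. The short check below applies this to the SRS row of Table 6 (values transcribed from the table). It suggests, as an inference from the published numbers rather than a statement by the authors, that the reported standard errors do not include the finite-population correction √(1 - n/N), which for n = 909 and N = 1000 would shrink them by a factor of about 0,30.

```python
import math

# Values transcribed from Table 6 (N = 1000, n(SRS) = 909).
N = 1000
n_srs = 909
statistic = [1.1224, 0.4943, 0.2544, 0.1954, 0.0970]
reported_se_srs = [0.0372, 0.0163, 0.0084, 0.0065, 0.0032]

fpc = math.sqrt(1 - n_srs / N)  # finite-population correction factor
for stat, reported in zip(statistic, reported_se_srs):
    plain = stat / math.sqrt(n_srs)  # sigma / sqrt(n), with sigma estimated by the statistic
    print(f"{stat:.4f}: s/sqrt(n) = {plain:.4f}, with fpc = {plain * fpc:.4f}, reported = {reported:.4f}")
```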
Repeating the process with data produced from the exponential distribution, the SA sample size is about 30% smaller than the fixed sample size when α = 0,1, and on average about 10% and 5% smaller when α = 0,05 and α = 0,01, respectively.

5. Conclusion
In brief, when SRS and SA are compared by the sample sizes they require for data from continuous distributions (normal and exponential) with different parameter values, SA has an important advantage over SRS. In this study, the results obtained with SRS could be reached with SA using smaller samples. The sample-size advantage of SA ranges from about 3% to 30%, depending on the significance level and the distribution of the population. The study should be extended to other continuous distributions and to discrete distributions.

Bibliography
1. Yates, Daniel S.; Moore, David S.; Starnes, Daren S. The Practice of Statistics, 3rd ed. Freeman, 2008. ISBN 978-0-7167-7309-2.
2. Ward, Denis. The Statistics Newsletter for the extended OECD Statistical Network, Issue No. 12, 2002.
3. Wald, Abraham. Sequential Tests of Statistical Hypotheses. The Annals of Mathematical Statistics, 16(2), 117-186, 1945.
4. Ghosh, B. K.; Sen, P. K. Handbook of Sequential Analysis. New York: Marcel Dekker, 1991. ISBN 0-8247-8404-1.
5. Maxfield, M. W.; Barton-Dobenin, J. A Sequential Sampling Plan for Determining Market Boundaries. Journal of Small Business Management, VII(3), 25-59, 1980.
6. Lai, T. L. Sequential Analysis: Some Classical Problems and New Challenges. Statistica Sinica, 11, 303-351, 2001.
8. Israel, Glenn D. Sampling the Evidence of Extension Program Impact. PEOD5, University of Florida, 1992.