C Final Report
Srikar C CB.SC.U4AIE23355
DATA SAMPLING
Supervised By:
Dr. Vidhya Kamakshi V
Artificial Intelligence
Amrita Vishwa Vidyapeetham
CERTIFICATE
has not been submitted to any other University / Institute for the
Date:
CONTENTS
1. INTRODUCTION
2. OVERVIEW
3. LITERATURE REVIEW
6. CONCLUSION
7. WORKLOAD DISTRIBUTION
1. INTRODUCTION
Data Sampling:
Fig-1
2. PROJECT OVERVIEW
Probability Sampling:
Fig-2
2. Stratified Sampling:
Description: Divides the population into subgroups (strata) based on
specific characteristics, and then samples from each stratum.
Fig-3
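The stratified procedure described above can be sketched in C as follows. This is a minimal illustration, not the project's implementation: the function name stratified_sample, the integer stratum labels in stratum_of, and the fixed per_stratum allocation are all assumptions made for the example.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical helper: draws up to per_stratum elements at random (without
 * replacement) from each stratum 0..num_strata-1 and writes them to out.
 * stratum_of[i] is the stratum label of values[i]. out must have room for
 * num_strata * per_stratum elements. Returns the number of elements written
 * (0 if the scratch buffer cannot be allocated). */
size_t stratified_sample(const int *values, const int *stratum_of, size_t n,
                         int num_strata, size_t per_stratum, int *out)
{
    assert(num_strata > 0);

    size_t count = 0;
    size_t *idx = malloc((n ? n : 1) * sizeof *idx); /* member indices */
    if (idx == NULL) {
        return 0;
    }

    for (int s = 0; s < num_strata; s++) {
        /* Collect the indices of all elements that belong to stratum s. */
        size_t members = 0;
        for (size_t i = 0; i < n; i++) {
            if (stratum_of[i] == s) {
                idx[members++] = i;
            }
        }

        /* Partial Fisher-Yates shuffle: the first `take` slots end up
         * holding a random subset of the stratum. */
        size_t take = per_stratum < members ? per_stratum : members;
        for (size_t i = 0; i < take; i++) {
            size_t j = i + (size_t)rand() % (members - i);
            size_t tmp = idx[i];
            idx[i] = idx[j];
            idx[j] = tmp;
            out[count++] = values[idx[i]];
        }
    }

    free(idx);
    return count;
}
```

The caller is expected to seed the generator once with srand() before sampling. A stratum with fewer than per_stratum members simply contributes all of its members.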
3. Systematic Sampling:
Description: Selects every kth element from a list after a random
starting point.
Procedure: Choosing a random starting point and then selecting every
kth element.
Use Case: When there is an ordered list and a systematic approach is
desired.
Fig-4
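The systematic procedure (random starting point, then every kth element) can be sketched in C as shown below. The names random_start and systematic_sample are hypothetical and chosen only for this illustration; the interval k is assumed to be supplied by the caller, for example as the population size divided by the desired sample size.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical helper: picks a starting index in the range 0..k-1.
 * rand() % k has a slight modulo bias, which is ignored in this sketch. */
size_t random_start(size_t k)
{
    assert(k > 0);
    return (size_t)rand() % k;
}

/* Copies every k-th element of data, beginning at index start, into out.
 * Stops at the end of data or when out_cap elements have been written.
 * Returns the number of elements written. */
size_t systematic_sample(const int *data, size_t n, size_t k, size_t start,
                         int *out, size_t out_cap)
{
    assert(k > 0);

    size_t count = 0;
    for (size_t i = start; i < n && count < out_cap; i += k) {
        out[count++] = data[i];
    }
    return count;
}
```

For a list holding the values 0 to 9 with k = 3 and a starting index of 1, the function selects the elements at indices 1, 4 and 7.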
4. Cluster Sampling:
Description: Divides the population into clusters, randomly selects
some clusters, and includes all members from those clusters in the
sample.
Procedure: Randomly selecting clusters and including all individuals
from those clusters.
Use Case: Practical when it's difficult to directly sample individuals, and
clusters represent distinct groups.
Fig-5
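The two steps of cluster sampling, selecting clusters at random and then taking every member of the selected clusters, can be sketched in C as follows. The function names choose_clusters and collect_clusters and the integer cluster labels in cluster_of are assumptions made for this example, not part of the project code.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical helper: picks m distinct cluster ids out of
 * 0..num_clusters-1 with a partial Fisher-Yates shuffle and writes them to
 * chosen[0..m-1]. Returns 0 on success, -1 if memory allocation fails. */
int choose_clusters(int num_clusters, int m, int *chosen)
{
    assert(num_clusters > 0 && m >= 0 && m <= num_clusters);

    int *ids = malloc((size_t)num_clusters * sizeof *ids);
    if (ids == NULL) {
        return -1;
    }
    for (int i = 0; i < num_clusters; i++) {
        ids[i] = i;
    }
    for (int i = 0; i < m; i++) {
        int j = i + rand() % (num_clusters - i);
        int tmp = ids[i];
        ids[i] = ids[j];
        ids[j] = tmp;
        chosen[i] = ids[i];
    }
    free(ids);
    return 0;
}

/* Copies every element whose cluster id appears in chosen[0..m-1] to out.
 * cluster_of[i] is the cluster label of values[i]. out must have room for
 * n elements. Returns the number of elements written. */
size_t collect_clusters(const int *values, const int *cluster_of, size_t n,
                        const int *chosen, int m, int *out)
{
    size_t count = 0;
    for (size_t i = 0; i < n; i++) {
        for (int c = 0; c < m; c++) {
            if (cluster_of[i] == chosen[c]) {
                out[count++] = values[i];
                break;
            }
        }
    }
    return count;
}
```

The caller seeds the generator once with srand(), calls choose_clusters to pick the clusters, and passes the result to collect_clusters. Unlike stratified sampling, no sampling happens inside a selected cluster: all of its members enter the sample.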
Non-Probability Sampling:
1. Convenience Sampling:
Description: Involves selecting individuals who are easiest to reach or
obtain.
Procedure: Choosing participants based on accessibility and
convenience.
Use Case: Often used for quick and cost-effective studies but may
introduce bias.
2. Purposive Sampling:
Description: Selects participants based on specific criteria or
characteristics relevant to the research.
Procedure: Purposefully choosing individuals who meet defined
criteria.
Use Case: When researchers want to study a particular subgroup with
distinct characteristics.
3. Snowball Sampling:
Description: Starts with a few participants who, in turn, refer others to
participate.
Procedure: Participants refer additional participants, creating a
"snowball" effect.
Use Case: Useful when studying hard-to-reach populations or those
with shared characteristics.
4. Quota Sampling:
Description: Involves selecting individuals based on pre-defined
quotas, ensuring representation from different categories.
Procedure: Setting quotas for different subgroups and sampling
individuals to meet those quotas.
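As an illustration of the quota procedure, the sketch below walks through candidates in the order they are encountered and accepts each one only while the quota for its category is still open. The name quota_sample, the limit MAX_CATEGORIES, and the integer category labels are assumptions made for this example. No random selection is involved, which is what distinguishes quota sampling from stratified sampling.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_CATEGORIES 16 /* assumed upper bound for this sketch */

/* Hypothetical helper: scans candidates in order and records the index of
 * each candidate whose category quota is not yet filled.
 * category_of[i] is the category (0..num_categories-1) of candidate i,
 * quota[c] is the number of candidates wanted from category c.
 * out_index must have room for the sum of all quotas.
 * Returns the number of candidates accepted. */
size_t quota_sample(const int *category_of, size_t n,
                    const size_t *quota, int num_categories,
                    size_t *out_index)
{
    assert(num_categories > 0 && num_categories <= MAX_CATEGORIES);

    size_t filled[MAX_CATEGORIES] = {0};
    size_t count = 0;

    for (size_t i = 0; i < n; i++) {
        int c = category_of[i];
        if (c < 0 || c >= num_categories) {
            continue; /* unknown category: skip this candidate */
        }
        if (filled[c] < quota[c]) {
            filled[c]++;
            out_index[count++] = i;
        }
    }
    return count;
}
```

With categories {0, 0, 0, 1, 0, 1, 1} and quotas of 2 for category 0 and 1 for category 1, the candidates at indices 0, 1 and 3 are accepted and the rest are turned away.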
3. LITERATURE REVIEW
1. Acharya, B., Bhattarai, G., de Gier, A., and Stein, A. (2000). Systematic
adaptive cluster sampling for the assessment of rare tree species in
Nepal. Forest Ecology and Management, 137, 65–73.
This study by Acharya et al. applies systematic adaptive cluster sampling to
the assessment of rare tree species in Nepal. The use of adaptive cluster
sampling suggests a methodical approach to data collection in ecological
studies. The paper may shed light on the challenges of assessing rare species
and on how systematic adaptive cluster sampling addresses them.
2. Becker, E. F. (1991). A terrestrial furbearer estimator based on probability
sampling. Journal of Wildlife Management, 55, 730–737.
Becker's work in the Journal of Wildlife Management introduces a terrestrial
furbearer estimator based on probability sampling. The emphasis on
probability sampling suggests a rigorous and statistically sound approach to
estimating wildlife populations. This paper may contribute insights into the
development of estimation methods for terrestrial furbearers and highlight
the importance of probability-based sampling in wildlife management.
3. Bellhouse, D. R. (1988b). A brief history of random sampling methods. In
P. R. Krishnaiah and C. R. Rao (eds.), Handbook of Statistics, Vol. 6,
Sampling. Amsterdam: Elsevier Science Publishers, pp. 1–14.
Bellhouse's work provides a historical perspective on random sampling
methods, offering valuable insights into the evolution of sampling techniques.
This chapter within the "Handbook of Statistics" may serve as a foundational
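    char *elements[MAX_SIZE];
    int num_elements = 0;

    /* Split the comma-separated content into individually allocated
       strings, never storing more than MAX_SIZE of them. */
    char *token = strtok(content, ",");
    while (token != NULL && num_elements < MAX_SIZE) {
        elements[num_elements] = malloc(strlen(token) + 1);
        if (elements[num_elements] == NULL) {
            break; /* out of memory: stop reading further tokens */
        }
        strcpy(elements[num_elements], token);
        num_elements++;
        token = strtok(NULL, ",");
    }

    /* The sample cannot be larger than the population. */
    if (sample_size > num_elements) {
        sample_size = num_elements;
    }

    /* Draw the sample. Each draw is independent, so the same element
       can be picked more than once (sampling with replacement). */
    char *samples[MAX_SIZE];
    for (int i = 0; i < sample_size; i++) {
        int random_index = rand() % num_elements;
        samples[i] = malloc(strlen(elements[random_index]) + 1);
        if (samples[i] != NULL) {
            strcpy(samples[i], elements[random_index]);
        }
    }

    printf("Random samples: ");
    for (int i = 0; i < sample_size; i++) {
        if (samples[i] != NULL) {
            printf("%s ", samples[i]);
        }
    }
    printf("\n");

    /* Release everything that was allocated above. */
    for (int i = 0; i < num_elements; i++) {
        free(elements[i]);
    }
    for (int i = 0; i < sample_size; i++) {
        free(samples[i]);
    }
    fclose(file);
    return 0;
}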
Output
Systematic Sampling
Code:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
int i;
if (file == NULL) {
perror("Error opening file");
return NULL;
}
if (content == NULL) {
printf("Memory allocation failed.\n");
fclose(file);
return NULL;
}
fclose(file);
total_elements = 1;
if (trimmed != NULL) {
strcpy(elements[i], trimmed);
}
}
if (samples == NULL) {
printf("Memory allocation failed.\n");
free(elements);
free(content);
return NULL;
}
printf("\n");
free(elements);
free(content);
return samples;
}
int main() {
char file_path[MAX_SIZE];
int sample_size;
char** result;
return 0;
}
Output
Cluster Sampling
Code:
#include <stdio.h>
#include <stdlib.h>
#include <time.h>
int main() {
srand(time(0)); // Seed for random number generation
printf("\n");
}
return 0;
}
Output
Stratified Sampling
Code:
#include <stdio.h>
#include <stdlib.h>
#include <time.h>
/* Shuffles data in place, then splits it into num_strata groups of equal
   size. If data_size is not a multiple of num_strata, the leftover
   elements at the end of the shuffled array are not assigned to any group. */
void generate_random_strata(float *data, int data_size, int num_strata,
                            float **strata) {
    int i, j;
    int stratum_size = data_size / num_strata;

    // Shuffle the data (Fisher-Yates)
    for (i = data_size - 1; i > 0; i--) {
        int swap_index = rand() % (i + 1);
        float temp = data[i];
        data[i] = data[swap_index];
        data[swap_index] = temp;
    }

    // Create strata
    for (i = 0; i < num_strata; i++) {
        strata[i] = malloc(stratum_size * sizeof(float));
        if (strata[i] == NULL && stratum_size > 0) {
            perror("malloc");
            exit(EXIT_FAILURE);
        }
        for (j = 0; j < stratum_size; j++) {
            strata[i][j] = data[i * stratum_size + j];
        }
    }
}

void random_sampling_from_strata(float **strata, int num_strata,
                                 int stratum_size, int sample_size,
                                 float *sampled_data) {
    int i, j;
    int k = 0;
    // Combine all strata data into a single array
Output
5. FUTURE SCOPE
The future of data sampling is likely to be shaped by advances in technology,
changes in data availability, and evolving data science methodologies. Some
potential trends and areas of growth are listed below:
5. Privacy-Preserving Sampling:
With growing concerns about data privacy, there may be an increased
focus on developing sampling methods that can generate
representative samples without compromising individual privacy.
Differential privacy and other privacy-preserving techniques could play
a significant role in this context.
6. Domain-Specific Sampling Techniques:
Different domains may have unique characteristics that require
specialized sampling techniques. Future developments may include the
creation of domain-specific sampling methods tailored to the needs of
specific industries such as healthcare, finance, and social sciences.
6. CONCLUSION
This project explored sampling methodologies and techniques and their
applications in data science. The key findings and takeaways are summarized
below: