RES505 Research Methodology
SCHOOL OF SCIENCES
(FORMERLY SCHOOL OF ARCHITECTURE, SCIENCE & TECHNOLOGY)
RES505
RESEARCH METHODOLOGY
(4 Credits)
Semester - I
Email: [email protected]
Website: www.ycmou.ac.in
Phone: +91-253-2231473
Brief Contents
Vice Chancellor’s Message
Credit 01
Credit 02
Credit 03
Credit 03, Unit 03: Use of ANOVA for the Research Analysis
Credit 04
As a postgraduate student, you must have the autonomy to learn, acquire information and
knowledge about the different dimensions of the field of Science, and at the same time develop
intellectually so that you can apply that knowledge wisely. The process of learning includes
thinking appropriately, understanding important points, describing them on the basis of
experience and observation, and explaining them to others in speech or writing. The science of
Education today accepts the principle that such excellence and knowledge are achievable.
The syllabus of this course has been structured in this book in such a way as to give you the
autonomy to study easily without stirring from home. During the counselling sessions scheduled
at your respective study centre, all your doubts about the course will be clarified and you will get
guidance from qualified and experienced counsellors/professors. This guidance will not only be
based on lectures, but will also include techniques such as question-and-answer and doubt
clarification. We expect your active participation in the contact sessions at the study centre. Our
emphasis is on ‘self-study’: a student who learns how to study becomes independent in learning
throughout life. This course book has been written with the objective of helping in self-study and
giving you the autonomy to learn at your convenience.
During this academic year, you have to submit assignments and complete laboratory activities,
field visits and project work wherever required. You have to opt for a specialization as per the
programme structure. You will gain experience and joy in personally doing the above activities.
This will enable you to assess your own progress and thereby achieve a larger educational objective.
We wish that you will enjoy the courses of Yashwantrao Chavan Maharashtra Open
University, emerge successful and very soon become a knowledgeable and honorable Master’s
degree holder of this university.
I congratulate the Development Team for developing this excellent, high-quality
Self-Learning Material (SLM) for the students. I hope and believe that this SLM will be
immensely useful for all students of this program.
Best Wishes!
- Prof. Dr. P. G. Patil
Vice-Chancellor, YCMOU
Dear Students,
Greetings!!!
This book aims at acquainting students with the conceptual and applied fundamentals of
Research Methodology required at the PG level. The book has been specially designed for Science
students. It offers comprehensive coverage of concepts and their application in practical life, and
contains numerous examples to build understanding and skills. The book is written in a
self-instructional format. Each chapter has an articulated structure that makes the contents
not only easy to understand but also interesting to learn. Each chapter begins with learning
objectives stated using action verbs as per Bloom's Taxonomy. Each unit starts with an
introduction to stimulate the learner's curiosity about the topic. Thereafter, the unit explains
concepts supported by tables, figures, exhibits and solved illustrations wherever necessary for
better understanding. The book is written in simple language, using a spoken style and short
sentences. The topics of each unit progress from simple to complex in a logical sequence. The
book is accessible even to students who find the subject difficult, and covers the full syllabus of
the course. Exercises in each chapter include conceptual and practical questions so as to build a
ladder in the minds of students for grasping every aspect of a particular concept. I thank the
students, who have been a constant motivation for us. I am grateful to the writers, editors and the
School faculty associated with the development of this SLM for the Programme.
Best Wishes to all of you!!!
"I believe in innovation and that the way you get innovation is you fund research, and you learn
the basic facts." - Bill Gates.
INTRODUCTION
"Research, simply put, is the search for knowledge and the search for truth." In the formal
sense, it is a systematic study of a problem addressed through a deliberately chosen strategy: it
begins with the choice of an approach and the preparation of a preliminary plan, proceeds through
the development of research hypotheses, the choice of methods and techniques, the development
of data collection tools, and data processing and interpretation, and ends with the presentation
of the solution(s) to the problem.
Fig. 1.1.1
"Research is creative and systematic work undertaken to increase the stock of knowledge."
Research studies are done to discover new information or to answer questions about how we learn,
behave and function, with the end goal of benefitting society. Some studies involve simple
tasks such as completing a survey, being observed in a group of people or participating in a group
discussion.
The research purpose is a statement of why the study is being conducted, that is, the goal of the study.
The goal of a study might be to identify or describe a concept, or to explain or predict a situation or
its solution; the purpose indicates the type of study to be conducted (Buckingham, 1974).
01-01-01: RESEARCH
Research has been interpreted and defined by various scholars based on their areas of study and the
resources available to them at any given time. You will find that the basic meaning and context of these
definitions are the same; the difference between them lies solely in the way each author has applied
research to his or her discipline. According to Thyer (2001), the word research consists of two
parts, re and search. Re is a prefix meaning again, anew. Search is a verb meaning to
examine closely and carefully, to test and try. Research is also a noun describing a careful,
systematic study in some field of knowledge, undertaken to establish facts or principles.
According to Merriam-Webster's online dictionary, the word research derives from the Middle
French "recherche", meaning "to go about seeking"; the term itself derives from the Old French
"recerchier", a compound of "re-" + "cerchier" or "sercher", meaning "to search". The first
recorded use of the term research was in 1577. Research is structured inquiry that uses acceptable
scientific methods to solve problems and create new, generally applicable knowledge (Dawson,
2019).
DEFINITION
Research has been defined in a variety of ways, with clear similarities among the definitions. For example:
"An in-depth study of a subject, in particular to discover (new) information or arrive at a
(new) understanding."
"It is the foundation of knowledge and provides guidelines for solving problems."
"The creation of new knowledge and/or the use of existing knowledge in a new and creative
way to generate new concepts, methods and insights."
"A detailed and careful study of something to find out more information about it."
"Research is defined as careful consideration of a study regarding a particular concern or
problem using scientific methods."
Another definition of research is given by John W. Creswell, who states that "Research is a
process of steps used to collect and analyze information to improve our understanding of a
topic or problem." It consists of three steps: ask a question, collect data to answer it, and present
an answer to the question.
OBJECTIVES OF RESEARCH
To understand an observation clearly and explain its logic and the reason for its happening:
i. To gain insights about a problem.
ii. To find solutions for a problem.
iii. To test existing laws or theories.
iv. To develop new ideas, concepts and theories.
v. To test a hypothesis of a causal relationship between variables.
vi. To identify areas where research could make a difference.
vii. To predict future events.
TYPES OF RESEARCH
Types of research are the different methodologies used to conduct research. Based on research goals,
timelines and purposes, different types of research are better suited to certain studies. The first part
of designing research is to determine what you want to study and what your goals are. For
example, you may simply want to learn more about a topic, or you may want to determine how
a new policy will affect lower-level employees at your company. The types of research are as
follows:
1. Fundamental research 2. Applied research
3. Qualitative research 4. Quantitative research
5. Mixed research 6. Exploratory research
7. Longitudinal research 8. Cross-sectional research
Research is a broad task and requires great effort from the researcher. Good research should have
the following qualities:
i. Clarity: This is the most significant quality of any research. The research should be clear so
that others can easily understand its nature. It should have a single focus so that readers do
not get sidetracked. The topic should be very clear in the mind of the researcher so that he
can undertake it properly, and it should be free of any vagueness. Clarity also means that the
research should be directional and should shape the whole research methodology.
ii. Planned Research Design: The research design must be properly planned. For example, if a
researcher is using a sampling technique for a selected group, the sample must be made
representative. Here, the researcher can collect primary as well as secondary data. The
major challenge generally seen is the researcher's personal bias in selecting data.
iii. Maintain Ethical Standards: Researchers mainly work independently. Data reliability
should be a main concern, and ethical issues involved in conducting research should be given
precedence.
iv. Organized Presentation of Findings: The most important task of a researcher is to present
research findings in an organized manner. The researcher should avoid technical jargon and
must maintain objectivity in the results.
v. Emphasize Limitations: It is desirable that the researcher point out the limitations
encountered during the research process. Limitations may relate to data collection, shortage
of time, money, etc.
vi. Rationalize Conclusions: Researchers must verify their work and provide rationalized
conclusions, which are mainly obtained when the research work is free from bias.
LIMITATIONS OF RESEARCH
i. Bias by the Researcher: Bias is a major threat to the success of any research work. Bias occurs
at many levels: personal bias of the researcher, a biased questionnaire, biased respondents
or improper sampling.
ii. Defective Data Collection: When a researcher is not diligent in his work, he may use
faulty methods of data collection, leading to faulty conclusions.
iii. Existence of Subjectivity: Subjectivity occurs when the researcher is swayed by likes and
dislikes, beliefs, faith, etc. These factors may negatively affect the worth of the research
and thereby increase the subjectivity of the work.
iv. Lengthy and Time-consuming: Research is a lengthy, time-consuming activity. Even when
carried out in a systematic manner, exploratory research may require considerable time.
RESEARCH GOALS
The aim of research is to use scientific methods to find answers to questions. The main goal of
the investigation is to find the truth that is hidden and not yet discovered. Although each research
study has its own specific purpose, we can imagine research objectives falling into the following
general groups:
i. To become familiar with a phenomenon or to gain new knowledge about it;
ii. To portray accurately the characteristics of a particular individual, situation or group;
iii. To determine the frequency with which something occurs or is associated with something else;
iv. To test a hypothesis of a causal relationship between variables.
MOTIVATION IN RESEARCH
What makes people undertake research? This is a question of fundamental importance. Possible
motives for conducting research may be one or more of the following:
i. Desire for a research degree with the associated advantages;
ii. The desire to take on the challenge of solving unsolved problems, i.e. concern for practical
problems, initiates research;
iii. Desire to gain the intellectual pleasure of creative work;
iv. Desire to be of service to society;
v. Desire to gain reputation.
3. HYPOTHESIZING
The development of hypotheses is a technical task that depends on the experience of the
researcher. Hypothesizing consists in extracting the positive and negative aspects of the cause and
effect of a problem. It narrows the scope of the investigation and keeps the investigator on track.
5. SAMPLING
The researcher must design a sample: a plan for drawing respondents from a specific area or
universe. The sample can be of two kinds:
i. Probability sample
ii. Non-probability sample
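The probability sampling plan described above can be illustrated with a short sketch. This is a minimal, hypothetical example (the unit itself prescribes no code): it draws a simple random sample, the most common probability sampling technique, from an illustrative numbered "universe" using Python's standard library.

```python
import random

# A hypothetical universe (population) of 100 numbered respondents.
population = list(range(1, 101))

# Simple random sampling: every member has an equal chance of selection.
# random.sample draws without replacement, so no respondent appears twice.
random.seed(42)  # fixed seed so the sketch is reproducible
sample = random.sample(population, k=10)
print(sample)
```

Because every member of the universe has the same chance of being chosen, such a sample can reasonably be treated as representative; increasing `k` simply yields a larger sample.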
6. DATA COLLECTION
Data collection is the researcher's most important task. The information collected should come
from the following two types of sources:
i. Primary data collection: Primary data can come from any of the following:
a. Trial
b. Questionnaire
c. Observation
d. Interview
ii. Secondary data collection: It has the following categories:
a. Literature review
b. Official and unofficial reports
c. Library approach
7. DATA ANALYSIS
Once the data is collected, it is sent for analysis, which is the most technical part of the work.
Data analysis can be divided into two main categories:
i. Data processing: data manipulation, data coding, data classification, data tabulation, data
presentation and data measurement.
ii. Data exposition: description, explanation, narration, conclusions/findings and
recommendations/suggestions.
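The data processing steps listed above (coding, classification, tabulation) can be sketched briefly. The survey responses and the coding scheme below are hypothetical, chosen only to illustrate the idea with Python's standard library.

```python
from collections import Counter

# Hypothetical raw survey responses (primary data).
responses = ["agree", "disagree", "agree", "neutral", "agree", "disagree"]

# Data coding: map each textual response to a numeric code.
codes = {"agree": 1, "neutral": 0, "disagree": -1}
coded = [codes[r] for r in responses]

# Data tabulation: count how often each response category occurs.
table = Counter(responses)
print(table)       # frequency table of the classified responses
print(sum(coded))  # a simple aggregate computed from the coded data
```

The frequency table is the simplest form of data presentation; the coded values can then feed any of the quantitative measurements discussed later in the unit.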
Research is a systematic process of inquiry that involves the collection of data; documentation of
critical information; and analyzing and interpreting this data/information in accordance with
appropriate methods established by particular professional fields and academic disciplines.
"Discover scientific knowledge and stay connected to the world of science".
RESEARCH PURPOSES
i. Generating leads and new customers
ii. Understanding how existing customers perceive the business
iii. Setting pragmatic goals
iv. Developing productive marketing methods
v. Addressing business challenges
vi. Planning a commercial expansion
vii. Identifying new business opportunities
CHARACTERISTICS OF RESEARCH
i. Good research follows a scientific approach to capture accurate knowledge.
ii. Researchers must observe ethics and a code of conduct when making observations or drawing
conclusions.
iii. The analysis is based on logical reasoning and includes both inductive and deductive methods.
iv. Data and knowledge are derived from actual observations in natural settings.
v. All accumulated data are analyzed thoroughly, so that no anomalies remain.
vi. Research creates opportunities to generate new questions; existing data help to open up
additional lines of inquiry.
vii. It is analytical and uses all available data, so there is no ambiguity in the conclusions.
viii. Accuracy is one of the most important aspects of research: the knowledge produced must be
right and correct.
TYPES OF RESEARCH
There are seven main types of research. Researchers choose their methods according to the type of
research topic they want to investigate and the research questions they want to answer (Fig. 3.1).
Applied research is a style of inquiry in which one looks for meaningful solutions to existing
problems, including challenges at work, in education and in society. This type of research includes:
i. Action Research: Action research helps companies find meaningful solutions to problems
by guiding them through the process.
ii. Evaluation Research: In evaluation research, researchers assess existing information to help
decision makers reach an intelligent decision.
iii. Research and Development: Research and development specializes in producing new goods
or services to meet the needs of a target market.
5. Empirical Research
Empirical research is defined as research based on empirical evidence: an inquiry in which the
conclusions of the study are drawn solely from concrete and therefore testable evidence. This
empirical evidence is often collected using quantitative and qualitative market research methods.
E.g., investigating whether listening to upbeat music while at work encourages creativity. Such an
experiment could be conducted by surveying a group of listeners who are exposed to upbeat music
and another group who do not listen to music at all, and then observing both groups.
The results obtained from such a study would provide an empirical test of whether or not upbeat
music promotes creativity.
6. Descriptive Research
Descriptive research is defined as a research technique that describes the characteristics of the
population or phenomenon under study. This descriptive methodology focuses more on the "what"
of the research subject than on the "why".
9. Build Credibility
People tend to take a person's ideas seriously when it is clear that the person knows the subject.
Participating in research helps form a strong foundation for your opinions. It also makes it hard
for people to find fault with something you've come up with.
10. Focus your reach
If you're diving into a subject for the first time, it can be difficult to know where to start. Most of the
time, you have a huge amount of information to sort through. Research helps focus on the most
important and unique points so you can write meaningfully.
11. Teach Discrimination
As you become proficient in research, you can easily distinguish low-quality from high-quality
data, and you become more adept at separating correct information from misinformation. Gray
areas become clearer, even where the conclusion itself remains in doubt.
12. Introducing new ideas
You may already have ideas and opinions on a topic you are researching. The more research you do,
the more perspectives you discover. It encourages you to entertain new ideas and to reconsider
your own point of view. It might even change your mind about a concept or two.
APPLICATIONS OF RESEARCH
Research has a wide range of applications in various fields; these applications are used in almost
every industry.
COLUMN-I COLUMN-II
1. Fundamental research a. To find a solution to a problem by practical means
2. Applied research b. To establish facts experimentally
Answer:
1-b 2-a 3-d 4-e 5-c
SUMMARY
According to American sociologist Earl Robert Babbie, "Research is a systematic investigation that
describes, explains, predicts, and controls an observed phenomenon." Research is a systematic
investigative process that includes the collection of data, the documentation of important
information, and the analysis and interpretation of that data/information according to appropriate
methods established by particular professional and academic fields.
The research topic or problem must be practical, relatively important, feasible, and ethically and
politically justifiable. After selecting the research problem, the second stage is a review of the
literature related to the subject. After formulating the problem and forming a hypothesis about it,
the researcher must create the research design. Any type of research plan can be adopted depending
on the nature and purpose of the research. The researcher must also design a sample. Data collection
is the most important task of the researcher; the information collected should come from the two
types of sources, primary and secondary.
Research refers to the efforts of people to learn about a subject and to develop new knowledge.
People do research to learn about academic conversations on a topic, to identify gaps in
knowledge, to recognize research needs, and to develop new solutions to problems. Research is
structured inquiry that uses acceptable scientific methods to solve problems and create new,
generally applicable knowledge. According to Rocco (2011), research is inquiry or careful
investigation, particularly through the search for new facts in any branch of knowledge. Creswell
(2008) states that "Research is a systematic investigation in order to establish facts." In the broadest
sense of the word, the definition of research includes any collection of data, information and facts
for the advancement of knowledge. "Research involves defining and redefining problems,
formulating hypotheses or proposed solutions; collecting, organizing, and evaluating data; drawing
conclusions; and testing the conclusions to see whether they fit the formulated hypothesis."
The main reason to participate in research is to increase your knowledge. If you are researching a
topic that is completely new to you, it will help you develop your unique perspective on that topic.
The whole study process opens new doors for literary learning and development. A person's ability to
learn is enhanced and they can perform better than someone who is just reluctant to study.
Research is widely used in the medical field and in pharmaceuticals to perform tests and find
new drugs to cure various diseases. It is thanks to research that pharmaceutical companies are able
to synthesize new molecules to treat diseases such as mumps, measles, polio, etc. Drugs are one
part of the pharmaceutical industry; the second big area of investment is medical technology.
KEYWORDS
Research- Creative and systematic work done to increase the stock of knowledge.
Hypothesis- A conjecture or suggested explanation made on the basis of limited evidence as a
starting point for further investigation.
Information- Knowledge gained from inquiry, study or instruction.
Application- An act of putting something to use by applying new techniques.
Quantitative- Based on quantity or amount rather than quality.
Qualitative- Describing qualities or characteristics.
Basic- Relating to or forming the base or essence; fundamental.
REFERENCES
1. Kukull, W. A.; Ganguli, M. (2012). The trees, the forest, and the low-hanging fruit. Neurology,
78(23): 1886-1891.
2. Pepinsky, Thomas B. (2019). The Return of the Single-Country Study. Annual Review of
Political Science, 22: 187-203.
3. Alasuutari, Pertti (2010). The rise and relevance of qualitative research. International Journal of
Social Research Methodology, 13(2): 139-155.
4. Lichtman, Marilyn (2013). Qualitative Research in Education: A User's Guide (3rd ed.). Los
Angeles: SAGE Publications. ISBN 978-1-4129-9532-0.
YOUTUBE VIDEOS
1. https://www.youtube.com/watch?v=TFaKHyJGqvQ
2. https://www.youtube.com/watch?v=mV0bUQpz468
3. https://www.youtube.com/watch?v=GSeeyJVD0JU
4. https://www.youtube.com/watch?v=w_Ujkt83i18
WIKIPEDIA
1. https://en.wikipedia.org/wiki/Research
2. http://studylecturenotes.com/10-steps-in-research-process/
3. https://www.uou.ac.in/sites/default/files/slm/BHM-503T.pdf
4. https://theimportantsite.com/10-reasons-why-research-is-important/
5. https://www.marketing91.com/applications-of-research/
REFERENCE BOOKS
1. Kumar Ranjit: Research Methodology: A Step by Step Guide for Beginners, Sage Publication,
2014.
2. Kothari CR: Research Methodology, New Age International, 2011.
3. Shajahan S: Research Methods for Management, 2004.
4. Thanulingom N: Research Methodology, Himalaya Publishing, 2015.
5. Rajendar Kumar C: Research Methodology, APH Publishing, 2008.
“There’s no discovery without a search and there’s no rediscovery without a research. Every
discovery man ever made has always been concealed. It takes searchers and researchers to unveil
them, that’s what make an insightful leader”- Benjamin Suulola
INTRODUCTION
A research methodology gives research legitimacy and provides scientifically sound findings. It
also provides a detailed plan that helps to keep researchers on track, making the process smooth,
effective and manageable. A researcher's methodology allows the reader to understand the approach
and methods used to reach conclusions.
A sound research methodology provides the following benefits:
i. Other researchers who want to replicate the research have enough information to do so.
ii. Researchers who receive criticism can refer to the methodology and explain their
approach.
iii. It can help provide researchers with a specific plan to follow throughout their research.
iv. The methodology design process helps researchers select the correct methods for the
objectives.
v. It allows researchers to document what they intend to achieve with the research from the
outset.
In a thesis, dissertation, academic journal article or other formal pieces of research, there are often
details of how the researcher approached the study and the methods and techniques they used. If
you're designing a research study, then it's helpful to understand what research methodology is and
the selection of techniques and tools available to you. In this unit, we explore what research
methodology is, the types of research methodologies, and the techniques and tools commonly used
to collect and analyze data. All of these research methodology activities are covered in this unit.
DATA COLLECTION
In statistics, data collection is a process of gathering information from all relevant sources to
find a solution to a research problem. This helps to evaluate the outcome of the problem. The data
collection method allows one to conclude answers to related questions. Most organizations use data
collection methods to make assumptions about future probabilities and trends. Once the data has
been collected, it is necessary to go through the data sorting process.
Data can be classified into two types: primary data and secondary data. The most important thing
about data collection in any research or business process is that it helps to identify many important
things, especially about performance. The data collection process therefore plays an important role
in every workflow.
According to data type, data collection methods are divided into two categories:
1. Primary data collection method
2. Secondary data collection method
In this unit, the different types of data collection methods and their advantages and limitations will
be explained.
1. Primary data collection methods
Primary data or raw data is the type of information obtained directly from the first source through
experiments, surveys or observations. The main data collection methods are classified into two
categories. They are: -
i. Quantitative data collection method
ii. Qualitative data collection method
Below are the different methods taken to collect data under these two data collection methods.
i. Quantitative data collection methods: These methods are based on mathematical calculations
and use formats such as closed-ended questions, correlation and regression methods, and
measurements of mean, median or mode. This approach is less expensive than qualitative
data collection and can be applied in a short period of time.
ii. Qualitative data collection methods: These methods do not involve mathematical
calculations and are closely associated with non-quantifiable elements. Qualitative data
collection includes interviews, questionnaires, observations, case studies, and more. Several
methods are used to collect this type of data, including:
a. Observation method
Nature observation is one of the research methods that can be used to design observational studies.
Another common type of observation is controlled observation. In this case, the researcher observes
the participant in a controlled environment. Observers control for most variables and ensure that
participants are structurally observed.
In nature observation, you study your research subjects in their own environment to explore their
behaviors without any outside influence or control. It is a research method used in field studies.
Traditionally, observational studies of nature have been used by animal researchers, psychologists,
ethnologists, and anthropologists. Observation of nature is useful as a method of hypothesis
generation because you gather rich information that can inspire new research.
OBSERVATION OF NATURE
In the 1930s, the scientist Konrad Lorenz coined the term "imprinting" to describe a critical
period of learning in young animals. Based on his naturalistic observations, he concluded that
newly hatched birds imprint on the first potential parent figure they encounter in their environment
and quickly learn to follow and respond to it. Nature observations are particularly useful for
studying behaviors and actions that may not be reproducible in a controlled laboratory setting.
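Returning to the quantitative data collection methods mentioned earlier, the basic measurements they rely on (mean, median and mode) can be computed directly with Python's standard library. The scores below are hypothetical illustration data, standing in for answers to a closed-ended survey question.

```python
import statistics

# Hypothetical quantitative data: ten scores from a closed-ended question.
scores = [4, 5, 3, 4, 2, 4, 5, 3, 4, 1]

print(statistics.mean(scores))    # arithmetic average of the scores -> 3.5
print(statistics.median(scores))  # middle value of the sorted scores -> 4
print(statistics.mode(scores))    # most frequent score -> 4
```

The three measures answer slightly different questions: the mean summarizes the overall level, the median is robust to extreme values, and the mode identifies the most common response.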
01-02-04: EXPERIMENTATION
INTRODUCTION
Empirical research is research done with a scientific approach using two sets of variables. The first
set acts as a constant, which you use to measure the difference of the second set. For example, the
quantitative research method is empirical.
If you do not have enough data to support your decisions, you must first identify the facts.
Experimental research collects the data you need to help you make better decisions.
All studies conducted under scientifically acceptable conditions use experimental methods. The
success of experimental studies relies on the researcher confirming that a change in one variable is
caused solely by the manipulation of the other. The study must establish a clear cause and effect.
You can conduct experimental research in the following situations:
i. Time is a factor in establishing a cause-and-effect relationship.
ii. The behavior between cause and effect is invariant.
iii. You want to understand the importance of the cause-and-effect relationship.
How you categorize research topics, by condition or group, determines the type of research plan you
should use.
1. Pre-experimental research design: One or more groups are kept under observation after a factor
assumed to cause change has been applied. You conduct this research to understand whether further
investigation is needed for these specific groups. Pre-experimental research can be divided into
three categories:
i. One-shot case study design
ii. One-group pretest-posttest design
iii. Static-group comparison
2. True experimental research design: True experimental research relies on statistical analysis to
prove or disprove a hypothesis, making it the most accurate form of research. Among experimental
designs, only the true design can establish cause and effect within a group. In a true experiment,
three factors must be satisfied:
i. There is a control group, which is not subject to change, and an experimental group, which
experiences the changed variables.
ii. There is a variable that can be manipulated by the researcher.
iii. Participants are distributed randomly.
3. Quasi-experimental design: The word "quasi" denotes similarity. A quasi-experimental
design resembles a true experimental design, but it is not the same. The key difference between the
two is that a quasi-experiment does not randomly assign participants to groups.
COLUMN-I                      COLUMN-II
1. Qualitative research       a. Study height, weight
2. Quantitative research      b. Study smart, beauty
3. Audio-visual research      c. Study human DNA
4. Applied research           d. Record sound and picture
5. Basic research             e. Study to cure disease
Answer:
1-b 2-a 3-d 4-e 5-c
SUMMARY
Research design is a blueprint for answering your research question. Research design and methods
are different but closely related: good research design ensures that the data you collect will help you
answer your research question effectively. To answer your question, you need to make decisions
about how your data will be collected. Research methods are specific procedures for collecting and
analyzing data, and your method depends on the type of data you need to answer your research
question. Research methodology refers to the methods and techniques used to carry out research
effectively. Several methods are used in research to interpret ideas; we explore the main types in this
unit. In every case, data must be collected, so before discussing data collection methods, let us
understand what data collection is and how it helps in different fields.
Nature observation is one of the research methods that can be used to design observational studies. In
nature observation, you study your research subjects in their own environment to explore their
behaviors without any outside influence or control. Traditionally, observational studies of nature
have been used by animal researchers, psychologists, ethologists, and anthropologists. Nature
observations are particularly useful for studying behaviors and actions that may not be repeated in a
controlled laboratory setting.
Fieldwork is defined as a method of qualitative data collection for the purpose of observing,
interacting with, and understanding people as they find themselves in a natural environment. For
example, conservationists observe the behavior of animals in their natural environment and how they
respond to certain situations. However, the cause and effect of a certain behavior is difficult to
analyze due to the presence of many variables in the natural environment.
Experimental research collects the data you need to help you make better decisions. All studies
conducted under scientifically acceptable conditions use experimental methods. The success of an
empirical study relies on the researcher confirming that the change in a variable is caused solely by
the manipulation of the independent variable. The study must establish a clear cause-and-effect
relationship.
KEY WORDS
Case Study - Defined as an in-depth study of a person, a group of people, or a unit, for the purpose
of generalizing over multiple units.
Ethnography- An essential research method to know the world from the point of view of its social
relations.
Innovation - It is vital to the continued success of any organization.
Data Sampling - This is a statistical analysis technique used to select, manipulate, and analyze a
representative subset of data points to identify patterns.
Time Sampling - Many studies of the development of social behavior use an observational
technique known as temporal sampling.
Literature - Literature is widely known as any collection of written works.
Design- Design is a blueprint or specification for building an object or system.
REFERENCES
1. Holland, Paul W. (December 1986). Statistics and Causal Inference. Journal of the American
Statistical Association. 81 (396): 945–960.
2. Stohr-Hunt, Patricia (1996). An Analysis of Frequency of Hands-on Experience and Science
Achievement. Journal of Research in Science Teaching. 33 (1): 101–109.
3. Baskerville, R. (1991). Risk Analysis as a Source of Professional Knowledge.
Computers & Security. 10 (8): 749–764.
YOUTUBE VIDEOS
1. https://www.youtube.com/watch?v=Y0wDYLpIoTw
2. https://www.youtube.com/watch?v=LpmGSioXxdo
3. https://www.youtube.com/watch?v=igwqp_yIgwM
4. https://www.youtube.com/watch?v=tBXznU_TPJo
WIKIPEDIA
1. https://www.scribbr.com/methodology/naturalistic-observation/
2. https://www.educba.com/types-of-research-methodology/
3. https://en.wikipedia.org/wiki/Field_research
4. https://www.nngroup.com/articles/field-studies/
5. https://www.questionpro.com/blog/experimental-research/
REFERENCE BOOKS
1. Kumar Ranjit: Research Methodology: A Step by Step Guide for Beginners, Sage Publication,
2014.
2. Kothari CR: Research Methodology, New Age International, 2011.
3. Shajahan S: Research Methods for Management, 2004.
4. Thanulingom N: Research Methodology, Himalaya Publishing, 2015.
5. Rajendar KumarC: Research Methodology, APH Publishing, 2008.
“Don't confuse hypothesis and theory. The former is a possible explanation; the latter, the correct
one. The establishment of theory is the very purpose of science” —Martin H. Fischer
INTRODUCTION
In common usage in the 21st century, a hypothesis refers to a provisional idea whose merit
requires evaluation. For proper evaluation, the framer of a hypothesis needs to define specifics in
operational terms. A hypothesis requires more work by the researcher in order to either confirm or
disprove it. In due course, a confirmed hypothesis may become part of a theory or occasionally may
grow to become a theory itself. Normally, scientific hypotheses take the form of a mathematical
model. Sometimes, but not always, they can also be formulated as existential statements, stating that
some particular instance of the phenomenon under examination has some characteristic, or as causal
explanations, which have the general form of universal statements, stating that every instance of the
phenomenon has a particular characteristic.
Consider the three Cs of principle or hypothesis writing in research: being clear, being concrete,
and being concise. Remember that, above all, you are trying to communicate with your reader, so
you should do everything you can to help them understand what you are trying to say.
Research design is the conceptual structure within which the research will be conducted. The
function of the study design is to enable the collection of relevant information with minimal
expenditure of effort, time, and money. Study design is important because it guides the researcher to identify appropriate
methods of data collection and analysis. A good study design is characterized by flexibility,
efficiency and relevance, etc. A well-developed study design is one that leads to minimal or no
errors if everything goes according to plan. It is important to have clarity of the research question
towards the objectives to be achieved. As a result, the researcher may have to create a combination
of different design approaches to create one that is appropriate for the problem at hand.
According to Green and Tull, research design is the specification of methods and procedures for
acquiring the information needed. Preparing a study design tailored to a particular research problem
involves considering the following:
1. Objectives of the study
2. Data collection methods to be applied
3. Source of information - Sampling plan
4. Data collection tools
5. Data analysis tools: qualitative and quantitative
A good research design offers several advantages:
i. It facilitates the smooth running of the various research activities, making research as
efficient as possible and yielding maximum information with minimum effort, time, and
money;
ii. It reduces ambiguity;
iii. This helps to achieve maximum efficiency and reliability;
iv. It helps to eliminate biases and marginal flaws;
PRINCIPLE
The principles and theories of science are established by repeated experiments and observations,
and are peer-reviewed before being accepted by the scientific community. Acceptance does not
imply rigidity or constraint, nor is it dogmatic: as new data become available, previous scientific
explanations are revised and improved, or discarded and replaced. Science is a means of making
sense of the world, with consistent and clearly described methods and principles. There is a
progression from hypothesis to theory through testable scientific laws. Only some scientific facts are
laws of nature, and many hypotheses must be tested in building a theory. Here you will learn how
scientific assumptions, theories, and laws describe the natural world.
Research is a scientific method of gathering and evaluating data to obtain a solution to a problem.
New ideas are generally invented through the process of research, which follows the scientific
method employed when trying to solve a problem.
There are two sets of principles in research:
i. Basic principles
ii. Core principles
i. Basic Principles
The four basic principles of research are autonomy, beneficence, non-maleficence, and justice.
i. The principle of autonomy establishes the right to agree or refuse to take part in the
research, and that decisions about health-care methods rest with the patient.
HYPOTHESIS
A hypothesis is an idea about the natural world that can be tested by observations or experiments.
To be considered scientific, a hypothesis must be testable and falsifiable, meaning it is formulated in
a way that could be proven false. The hypotheses that Mendel formed on the basis of his
observations included the following:
i. In an organism, a pair of factors controls a given trait.
ii. The organism inherits these factors from its parents, one factor from each parent.
iii. Each factor is passed from generation to generation as a discrete, unchanging unit.
iv. When gametes are formed, the factors separate and are distributed as units to each
gamete.
v. If an organism has two different factors for a trait, one may be expressed to the
complete exclusion of the other.
3) Is a hypothesis a question?
Answer: No; a hypothesis is a statement, not a question. It responds to a research question and
proposes an expected outcome. It is an integral part of the scientific method, which is the basis of
scientific experiments.
The two groups typically are a treatment group and a control group. The treatment group, also
known as the experimental group, receives the treatment that the researcher is evaluating. The
control group, on the other hand, does not receive the treatment. Instead, the control group receives
either a standard treatment (with known effect) or a fake (inactive) treatment, such as a placebo.
Moreover, for these two groups, the researcher must determine how to assign subjects to each group,
which is another important aspect of experimental design. The primary method that researchers use
to assign subjects to groups is random assignment. With random assignment, subjects are put into
groups using a random method. Each subject has an equal chance of being assigned to a group, and
each subject is assigned to each group independently of other subjects. The assignment is like a coin
toss because the chance of assignment to either group is 50%. Computers are a useful tool for
generating random assignments. Another useful and important feature of random assignment is
having a larger, rather than a smaller, number of subjects.
This helps to make the groups more equivalent (similar in all attributes except the attribute being
researched). In fact, the goal is to make the groups probabilistically equivalent. This is when groups
are randomized in such a way that the two groups can be declared statistically equivalent. This
typically means that a 95% or greater equivalence is obtained. This level of equivalence ultimately
leads to more valid and solid conclusions. For example, two equivalent groups would be groups
where the subjects share the same characteristics, such as the mean age, mean weight, and health are
the same between the groups, so the efficacy of a drug could be adequately evaluated and
determined.
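The coin-toss assignment just described can be sketched in a few lines of Python. This is a minimal illustration, not a prescribed procedure: the subject labels are hypothetical, and the shuffle-and-split approach is one common way to get equal-sized groups while giving every subject the same chance of landing in either group.

```python
import random

def random_assignment(subjects, seed=None):
    """Randomly split subjects into a treatment and a control group.

    Shuffling and splitting in half gives every subject an equal (50%)
    chance of ending up in either group, like a coin toss per subject,
    while keeping the two groups the same size."""
    rng = random.Random(seed)
    shuffled = subjects[:]              # copy so the input list is untouched
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]   # (treatment, control)

subjects = [f"S{i:02d}" for i in range(1, 21)]    # 20 hypothetical subjects
treatment, control = random_assignment(subjects, seed=42)
print(len(treatment), len(control))               # 10 10
```

With a larger subject pool, the same call tends to produce groups that are more nearly equivalent on attributes such as age or weight, which is exactly the point made above.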
Experimental design is a controlled plan for conducting research and organizing experiments. It is a
quantitative and scientific process that allows data to be collected and evaluated objectively. The
purpose of an experimental design is to reach a conclusion about a phenomenon, or to determine
whether there is any truth to a hypothesis: an educated guess, made before the experiment starts,
about what the conclusion should look like.
Phenomena usually involve a relationship between two or more variables. A variable is something
that can change or be changed during a study. It is an element that can be manipulated, controlled, or
measured in a research study. For example, one variable might be time, height, weight, age, or
disease, as these factors can change or be changed. Typically, in an experimental design, a researcher will
evaluate variables between two groups. This is called two-group design because it involves dividing
subjects into two groups so that they can be compared when evaluating a phenomenon.
What are the two groups in an experimental design? The two groups are usually a treatment group
and a control group. The treatment group, also known as the experimental group, receives the
treatment that the researcher is evaluating, while the control group does not.
MATCHED-PAIRS EXPERIMENT
A matched-pairs trial design can be used to conduct randomized block design testing with a
relatively small number of participants, allowing researchers to reduce some of the variables
involved. Although matching participants can be time-consuming, it ensures that the treatment
groups have similar characteristics and variables. This helps researchers know whether a difference
in outcomes after the treatments are over is due to the treatment method or to other variables.
In experiments involving survey data, a participant's responses can be influenced by the order in
which response options are presented; this is called the order effect. In a matched-pairs trial design,
each member of a matched pair can be assigned to a different condition, which helps control for such
effects.
OBJECTIVE
The purpose of paired samples is to obtain better statistics by controlling for the effect of other
unexpected variables. For example, if you are studying the health effects of alcohol, you can control
for age-related health effects by matching participants of similar ages.
TEST
When you run a hypothesis test, you should choose a test designed either for independent samples
or for dependent (paired) samples. Paired samples can be analyzed using the following tests:
i. McNemar's test, a non-parametric test for paired nominal data.
ii. The paired-samples t-test (also known as the "related measures" t-test or the dependent-
samples t-test), which compares the means of two groups to see whether there is a
statistically significant difference between them.
iii. Wilcoxon's signed-rank test, a non-parametric alternative to the t-test. Note that this test
does not compare means; it compares ranks.
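The paired-samples t-test is straightforward to compute by hand: it is a one-sample t-test on the within-pair differences. A minimal sketch using only the Python standard library; the before/after scores below are hypothetical example data.

```python
import math
import statistics

def paired_t_statistic(before, after):
    """Paired-samples t statistic: the mean within-pair difference
    divided by its standard error, with n - 1 degrees of freedom."""
    diffs = [a - b for a, b in zip(after, before)]
    n = len(diffs)
    mean_d = statistics.mean(diffs)
    sd_d = statistics.stdev(diffs)            # sample standard deviation
    t = mean_d / (sd_d / math.sqrt(n))
    return t, n - 1                           # (t statistic, degrees of freedom)

before = [12, 15, 11, 14, 13, 16]             # hypothetical pre-treatment scores
after  = [14, 17, 12, 15, 15, 18]             # hypothetical post-treatment scores
t, df = paired_t_statistic(before, after)
```

In practice a statistical package's paired t-test routine would also return the p-value; here only the t statistic and degrees of freedom are computed, which is enough to show what the test measures.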
FACTORIAL EXPERIMENT
In statistics, a full factorial experiment is one whose design includes two or more factors, each with
discrete possible values or "levels", and whose experimental units take on all possible combinations
of those levels across all factors. A full factorial design may also be called a fully crossed design.
Such an experiment allows the investigator to study the effect of each factor on the response
variable, as well as the effect of interactions between factors on the response variable. In most
factorial experiments, each factor has only two levels. For example, with two factors each taking two
levels, a full factorial experiment would have 2 × 2 = 4 treatment combinations in total.
The main drawback is the difficulty of experimenting with more than two factors, or with many
levels. A factorial design must be planned meticulously, because a mistake at one of the levels, or in
the general operation, would jeopardize a great amount of work. Aside from these drawbacks,
factorial design is a mainstay of many sciences, yielding excellent results in the field.
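Because a full factorial design crosses every level of every factor, its treatment combinations can be enumerated mechanically. A sketch in Python; the factor names and levels are hypothetical.

```python
from itertools import product

# Two factors, each with two levels: a 2 x 2 full factorial design
factors = {
    "temperature": ["low", "high"],
    "fertilizer":  ["A", "B"],
}

# Every experimental unit takes one of all possible level combinations
runs = list(product(*factors.values()))
for run in runs:
    print(dict(zip(factors, run)))

print(len(runs))   # 2 x 2 = 4 treatment combinations
```

Adding a third two-level factor would double the number of runs to 8, which illustrates the drawback noted above: the run count grows multiplicatively with factors and levels.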
Block design in statistics, also known as blocking, is the arrangement of experimental units or
subjects into groups called blocks. A block design is often used to account for or control potential
sources of undesired variation. Blocks are relatively uniform subsets formed according to the
experimental conditions. By dividing subjects into blocks, the researcher ensures that the variation
within blocks is smaller than the variation between blocks.
In an experiment, there are two main variables, the dependent variable and the independent
variable. The dependent variable is the variable that is tested or measured in an experiment while the
independent variable is the factor that is believed to have an effect on the dependent variable. The
independent variable is often modified to observe and record its effect on the dependent variable. A
confounding variable is a variable that affects both the dependent and independent variables and can
lead to false or misleading results.
A randomized block design (RBD) is an experimental design in which experimental subjects or
units are grouped into blocks, and the different treatments to be tested are randomly assigned to the
units in each block. Essentially, a randomized block design groups subjects with similar
characteristics into blocks and randomly tests the effects of each treatment on individual subjects
within each block.
For example, if a farm has a corn field affected by a plant disease and wants to test the effectiveness
of different fungicides in controlling it, it can divide the field into blocks and randomly treat sections
of each block with the different fungicides. By dividing the field into blocks, the farm can account
for some of the variation that may exist in the field. For example, one stretch of the field may have
more shade and prolonged leaf wetness, creating an ideal environment for pathogens to thrive.
Another section may have a slightly different soil type or slope. Through blocking, the farm can
account for these potential confounding factors. The randomized block design is used in many fields
of research, such as pharmaceutical research, agriculture, and animal science.
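The corn-field example can be sketched as code: each block is a relatively uniform subset, and every treatment is assigned once, in random order, to the sections within each block. The block and treatment names below are hypothetical.

```python
import random

def randomized_block_design(blocks, treatments, seed=None):
    """Assign every treatment once within each block, in random order."""
    rng = random.Random(seed)
    design = {}
    for block in blocks:
        order = treatments[:]
        rng.shuffle(order)            # random order within this block
        design[block] = order         # section i of the block gets order[i]
    return design

blocks = ["shaded", "sloped", "sandy"]
treatments = ["fungicide_A", "fungicide_B", "fungicide_C"]
design = randomized_block_design(blocks, treatments, seed=1)
for block, order in design.items():
    print(block, order)
```

Because every treatment appears exactly once in every block, differences between blocks (shade, slope, soil type) cannot be confounded with differences between treatments.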
There are often confounding variables whose effects cannot be predicted, and a researcher may not
even be aware of the existence of certain confounding variables. Understanding a project and being
able to predict potential sources of variation is important for getting the most accurate relationship
between dependent and independent variables. To get the most accurate variable relationship, there
are advantages to using block randomization. Randomized blocking:
i. Reduces bias
ii. Reduces error
iii. Reduces variation in treatment conditions
iv. Ensures that the results are not misinterpreted
v. Help
Answer:
1-True 2-True 3-True 4-False 5-False
Column-I                Column-II
1. RBD                  a. 8 different levels
2. 2x2 factorial        b. 2 dependent and 1 independent
3. 2x2x2 factorial      c. block and treatment
4. 3x3 factorial        d. 2 interventions (2 levels)
5. 2x4 factorial        e. 3 level design
Answer:
1- quadratic 2- independent 3- independent 4- blocks 5-block randomization
REFERENCES
1. McCoy, S. K., & Major, B. (2003). Group identification moderate’s emotional response to
perceived prejudice. Personality and Social Psychology Bulletin, 29, 1005–1017.
2. Babbie, E. (2010). The practice of social research (12th ed.). Belmont, CA: Wadsworth;
Campbell, D., & Stanley, J. (1963). Experimental and quasi experimental designs for research.
Chicago, IL: Rand McNally.
3. Milliken & Johnson (1989), Tukey's single degree-of-freedom test for no additivity, pp. 7-8.
4. Lentner & Bishop (1993), In 6.8 No additivity of blocks and treatments, pp. 213–216.
YOUTUBE VIDEOS
1. https://www.youtube.com/watch?v=vJgcae2ziOM
2. https://www.youtube.com/watch?v=bYURT9wgc98
3. https://www.youtube.com/watch?v=CKTr9T1drcU
4. https://www.youtube.com/watch?v=slscHD40r78
WIKIPEDIA
1. https://en.wikipedia.org/wiki/Generalized_randomized_block_design
2. https://en.wikipedia.org/wiki/Factorial_experiment
3. https://en.wikipedia.org/wiki/Fractional_factorial_design
4. https://en.wikipedia.org/wiki/Randomization
“Sampling, statisticians have told us, is a much more effective way of getting a good census” –
Rob Lowe
INTRODUCTION
Although the idea of sampling is easiest to understand when you think about a very large
population, it makes sense to use sampling methods in studies of all types and sizes. After all, if you
can reduce the effort and cost of doing a study, why wouldn’t you? And because sampling allows
you to research larger target populations using the same resources as you would smaller ones, it
dramatically opens up the possibilities for research. Sampling is a little like having gears on a car or
bicycle. Instead of always turning a set of wheels of a specific size and being constrained by their
physical properties, gears let you translate your effort to the wheels, so you're effectively choosing
bigger or smaller wheels depending on the terrain you're on and how much work you're able to do.
Sampling allows you to gear your research so you're less limited by the resources available.
(Source: https://www.questionpro.com/blog/types-of-sampling)
SAMPLING FRAME
The sampling frame is the actual list of individuals from which the sample will be drawn. Ideally, it
should include the entire target population (and nobody outside of that population).
SAMPLE SIZE
The number of individuals you need to include in your sample depends on many factors, including
the size and variability of your population and your research plan. There are different formulas and
sample size calculators depending on what you want to achieve with statistical analysis.
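One widely used such formula for estimating a proportion is Cochran's: n = z²p(1−p)/e², where z is the z-score for the chosen confidence level, p the expected proportion (0.5 is the most conservative choice), and e the acceptable margin of error. A sketch:

```python
import math

def cochran_sample_size(z, p, e):
    """Cochran's sample size for estimating a population proportion.

    z: z-score for the confidence level (1.96 for 95% confidence)
    p: expected proportion (0.5 maximizes p*(1-p), the safe default)
    e: acceptable margin of error (0.05 means +/- 5 percentage points)"""
    return math.ceil(z ** 2 * p * (1 - p) / e ** 2)

# 95% confidence, p = 0.5, 5% margin of error
n = cochran_sample_size(1.96, 0.5, 0.05)
print(n)   # 385
```

This reproduces the familiar rule of thumb that roughly 385 respondents suffice for a ±5% margin at 95% confidence; for small populations, a finite-population correction would shrink this further.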
If practical, you could include every individual from every sampled cluster. If the clusters
themselves are large, you can also sample the instances in each cluster using one of the techniques
above. This is called multistage sampling. This method is good for dealing with large and dispersed
populations, but is more prone to errors in the sample, as there can be significant differences between
clusters. It is difficult to ensure that the clusters sampled are truly representative of the entire
population.
2. Voluntary response sampling: Like convenience sampling, a voluntary response sample is
based mainly on ease of access. Instead of the researcher selecting participants and contacting
them directly, people volunteer themselves (for example, by completing a public online survey).
Voluntary response samples are always at least somewhat biased, as some people are inherently
more likely to volunteer than others.
3. Purposive sampling: This type of sampling, also known as judgmental sampling, involves the
researcher using their expertise to select a sample that is most useful for the purposes of the
research. It is often used in qualitative research, where the researcher wants to gain detailed
knowledge of a particular phenomenon rather than make statistical inferences, or where the
population is very small and specific. An effective purposive sample must have clear criteria and
a clear rationale for inclusion.
4. Snowball sampling: If the population is hard to reach, snowball sampling can be used to recruit
participants via other participants. The number of people you have access to "snowballs" as you
come into contact with more people. Probability and non-probability sampling can be compared
as follows:
Probability sampling                                Non-probability sampling
Also known as random sampling methods.              Also called non-random sampling methods.
Used for research that is conclusive.               Used for research that is exploratory.
Involves a long time to get the data.               An easy way to collect the data quickly.
There is an underlying hypothesis before the        The hypothesis is derived later, after
study starts, and the objective of the method       conducting the research study.
is to validate that hypothesis.
NON-RANDOM SAMPLING
Nevertheless, meeting the standards of probability sampling is not easy:
i. Error-free sampling frames are relatively rare when conducting market research.
ii. Ensuring that all individuals in the population have non-zero selection probabilities is just
as difficult to achieve.
iii. Knowing the exact inclusion probability of each sampling unit is more difficult still. When
conducting live polls on the street, we do not have access to a list of the people who make up the
population. When conducting telephone interviews, we have a list of phone numbers, but not
everyone has a landline or a listed number. And for responses from an online panel, anyone outside
the panel has zero chance of being included.
A random variable is a rule that assigns a numerical value to each outcome in the sample space.
Random variables are either discrete or continuous. A random variable is called discrete if it takes
only certain values within an interval; otherwise it is continuous. Random variables are usually
written as capital letters, such as X and Y. If X takes the values 1, 2, 3, …, we are speaking of a
discrete random variable.
A random variable is thus a measurable function whose possible values can be assigned
probabilities. The outcome of a trial depends on unpredictable physical variables. Say you toss a fair
coin: the final outcome, heads or tails, depends on the physical conditions, and it is not possible to
predict which outcome will occur. (Other possibilities, such as the coin breaking or being lost, are
excluded from consideration.)
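The coin-toss example can be made concrete: a random variable is simply a rule that maps each outcome in the sample space to a number. A sketch in Python, taking X = 1 for heads and X = 0 for tails; the simulation details are illustrative.

```python
import random

# Sample space of the experiment, and the random variable X: outcome -> number
sample_space = ["heads", "tails"]
X = {"heads": 1, "tails": 0}

rng = random.Random(0)
outcomes = [rng.choice(sample_space) for _ in range(10_000)]
values = [X[o] for o in outcomes]

# For a fair coin, the relative frequency of X = 1 should be near 0.5
freq_heads = sum(values) / len(values)
print(round(freq_heads, 2))
```

X here is discrete, since it takes only the values 0 and 1; a random variable such as "time until the coin lands" would be continuous.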
VARIABLE
A variable is any entity that can take on different values, e.g. age or country. A dependent variable
is one which is affected by another, independent, variable. A random variable has the same
properties without emphasizing any particular kind of stochastic experiment, and it always obeys
certain probability laws. A random variable is said to be discrete if it cannot take all values within
the specified range, and continuous if it can take any value over the entire range.
Now, differentiating both sides of the above expression with respect to y gives the relation between
the probability density functions: fY(y) = fX(h(y)) |dh(y)/dy|.
The probability that a random variable X takes the value x is given by the probability function of X,
denoted f(x) = P(X = x). A probability distribution always satisfies two conditions:
f(x) ≥ 0 and ∑f(x) = 1
The important probability distributions are:
i. Binomial distribution
ii. Poisson distribution
iii. Bernoulli’s distribution
iv. Exponential distribution
v. Normal distribution
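The two conditions above, f(x) ≥ 0 and ∑f(x) = 1, can be verified directly for one of these distributions. A sketch for the binomial distribution; the values n = 10 and p = 0.3 are arbitrary examples.

```python
from math import comb

def binomial_pmf(n, p):
    """Probability function f(x) = C(n, x) p^x (1-p)^(n-x) for x = 0..n."""
    return [comb(n, x) * p ** x * (1 - p) ** (n - x) for x in range(n + 1)]

f = binomial_pmf(10, 0.3)
print(all(fx >= 0 for fx in f))     # every f(x) is non-negative
print(round(sum(f), 10))            # the probabilities sum to 1
```

The same check applies to any discrete distribution in the list; for the continuous ones (exponential, normal), the sum becomes an integral of the density over the whole range.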
TRANSFORMATION OF RANDOM VARIABLES
Transforming a random variable means reassigning its value to another variable; the transformation
remaps the number line from x to y, and the transformation function is y = g(x).
Expected Value of X
Let a discrete random variable X assume the values x1, x2, x3, … with corresponding probabilities
P(x1), P(x2), P(x3), …; then the expected value of the random variable is
E(X) = ∑ xi P(xi). For a continuous random variable with density f(x), E(X) = ∫ x f(x) dx.
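For a discrete random variable, this expectation is just a weighted sum, E(X) = ∑ xi P(xi). A minimal sketch, using a fair six-sided die as a hypothetical example:

```python
def expectation(values, probs):
    """E(X) = sum of x * P(x) over all values of a discrete random variable."""
    assert abs(sum(probs) - 1) < 1e-9, "probabilities must sum to 1"
    return sum(x * p for x, p in zip(values, probs))

# A fair six-sided die: each face has probability 1/6
faces = [1, 2, 3, 4, 5, 6]
probs = [1 / 6] * 6
print(round(expectation(faces, probs), 10))   # 3.5
```

The assertion enforces the second condition of a probability distribution, ∑f(x) = 1, before the expectation is computed.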
i. The independent variable is the cause. Its value is independent of the other variables in the
study.
ii. The dependent variable is the effect. Its value depends on changes in the independent
variable.
Examples of independent and dependent variables: I am planning a study to test whether changes in
room temperature affect zoological test results.
i. The independent variable is room temperature. Vary the room temperature by keeping half of
the participants cool and the other half warm.
ii. The dependent variable is the zoological test score. They use a standardized test to measure
the zoological performance of all participants and see if it varies with room temperature.
INDEPENDENT VARIABLES
An independent variable is a variable that is manipulated or modified to study its effects in an
experimental study. It is called 'independent' because it is not affected by other variables in the study.
Independent variables are also called:
i. Explanatory variables (variables that describe an event or outcome)
ii. Predictor variable (can be used to predict the value of the dependent variable)
iii. Right side variables (displayed on the right side of the regression equation).
These terms are used specifically in statistics to estimate the extent to which changes in independent
variables can explain or predict changes in dependent variables. The independent variable is exactly
what it sounds like.
This is an independent variable and will not be changed by any other variable you are trying to
measure. For example, a person's age could be an independent variable.
There are two main types of independent variables:
1. Experimental independent variables can be directly manipulated by researchers.
2. Subject variables cannot be manipulated by researchers, but can be used to group research
topics into categories.
1. Experimental variable
Use your experimental data to analyze your results by creating descriptive statistics and visualizing
your results. Then choose an appropriate statistical test to test your hypothesis. The test type is
determined as follows:
i. Variable type
ii. Measurement level
iii. The number of levels of the independent variable.
The type of visualization you use depends on the type of variables in your research question.
i. Bar charts are best when you have categorical independent variables.
ii. Scatter plots or line charts work best when both the independent and dependent variables are
quantitative.
INTERVENING VARIABLES
An intervening variable, sometimes called a mediator variable, helps us understand the relationship
between the independent variable and the dependent variable when there is no direct relationship
between the two. When an independent variable cannot influence the dependent variable directly, a
mediating variable works as a go-between and helps us trace the relationship between the
independent variable (IV) and the dependent variable (DV).
Independent variables govern the dependent variables through the channel of mediating or
intervening variables (Fig.4.2).
Column-I Column-II
1. Random a. A character free from outside control
Answer:
1-b 2-c 3-a 4-e 5-d
SUMMARY
Probabilistic sampling involves random selection, allowing you to make powerful statistical
inferences about the entire group. Non-probability sampling involves nonrandom selection based on
convenience or other criteria, allowing you to easily collect data. In a simple random sample, each
member of the population has an equal chance of being selected. To do this type of sampling, you
can use tools like random number generators or other purely chance-based techniques. In
systematic sampling, each member of the population is listed with a number, but instead of
generating random numbers, individuals are selected at regular intervals. In stratified sampling, you
divide the population into subgroups (called strata) based on relevant characteristics (e.g. gender,
age group, income group, location), then use random or systematic sampling to select a sample from
each subgroup.
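The stratified procedure described above can be sketched in Python: divide the population into strata, then draw a simple random sample from each. The strata, member labels, and sampling fraction below are hypothetical.

```python
import random

def stratified_sample(strata, fraction, seed=None):
    """Draw the same fraction from every stratum by simple random sampling."""
    rng = random.Random(seed)
    sample = {}
    for name, members in strata.items():
        k = max(1, round(len(members) * fraction))   # at least one per stratum
        sample[name] = rng.sample(members, k)
    return sample

strata = {
    "age_18_35": [f"A{i}" for i in range(40)],
    "age_36_60": [f"B{i}" for i in range(50)],
    "age_61_up": [f"C{i}" for i in range(10)],
}
sample = stratified_sample(strata, fraction=0.2, seed=7)
print({k: len(v) for k, v in sample.items()})
# {'age_18_35': 8, 'age_36_60': 10, 'age_61_up': 2}
```

Sampling the same fraction from each stratum (proportionate allocation) guarantees that every subgroup, including the small one, is represented in the final sample.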
A sampling frame is a list of items that make up the population under study. Items within a sampling
frame are called sampling units. To create a sampling frame, a researcher might, for example, access
a company's computer system and pull a list of everyone who purchased a product in the past year.
A sampling frame can be used as long as the following condition is met: every item in the
population has a non-zero chance of being selected as part of the sample.
KEY WORDS
Sample Size- Sample size is a measure of the number of individual samples used in an experiment.
Random Sample- A randomly selected sample is intended to provide an unbiased representation of
the entire population.
Nonrandom Sampling- A method of selecting units from a population using a subjective (that is,
non-random) method.
Cluster Sampling- In cluster sampling, researchers divide a population into smaller groups called
clusters.
Sampling Frame- A list of objects or people that make up the population from which the sample is
taken.
Quasi-Sampling- A systematic sampling of every nth entry from the list is equivalent to a random
sampling for most practical purposes.
Placebo Group- This is the group of participants exposed to placebo or a sham independent
variable.
Subject Variables- Experiences or characteristics of research participants that are not of primary
interest but may influence the outcome of the study and should be considered during experiments or
data analysis.
YOUTUBE VIDEOS
1. https://www.youtube.com/watch?v=6skCMCdh3FY
2. https://www.youtube.com/watch?v=NVr0OqeAdjw
3. https://www.youtube.com/watch?v=V2-Rpc1s9Rc&list=PLqMl6r3x6BUQvUoLYgmf3XmFW8LSEyXlo&index=24
WIKIPEDIA
1. https://www.scribbr.com/methodology/sampling-methods/
2. http://www.stats.gla.ac.uk/steps/glossary/sampling.html
3. https://www.netquest.com/blog/en/random-non-random-sampling
4. https://byjus.com/maths/random-variable/
REFERENCE BOOKS
1. Cutler, Alan (2003), The Seashell on the Mountaintop. Heinemann, London.
2. Altman. DG., (1990), Practical Statistics for Medical Research, CRC Press.
3. L. Castaneda; V. Arunachalam & S. Dharmaraja (2012), Introduction to Probability and
Stochastic Processes with Applications. Wiley.
4. Kallenberg, Olav (2001), Foundations of Modern Probability (2nd ed.). Berlin: Springer Verlag.
“Data are just summaries of thousands of stories—tell a few of those stories to help make the
data meaningful.” ~ Dan Heath
INTRODUCTION
Data collection is the procedure of gathering, measuring, and analyzing accurate information
for research using standard validated techniques.
A researcher can evaluate their hypothesis on the basis of collected data. In most cases, data
collection is the primary and most important step for research, irrespective of the field of research.
The approach of data collection is different for different fields of study, depending on the required
information.
Data collection tools refer to the devices/instruments used to collect data, such as a paper
questionnaire or a computer-assisted interviewing system. Case studies, checklists, interviews,
observations, and surveys or questionnaires are all tools used to collect data.
Accurate data collection is necessary to make informed business decisions, ensure quality assurance,
and keep research integrity. During data collection, the researchers must identify the data types, the
sources of data, and what methods are being used. Depending on the researcher's research plan and
design, there are several ways data can be collected. The most commonly used methods are:
published literature sources, surveys (email and mail), interviews (telephone, face-to-face or focus
group), observations, documents and records, and experiments. Both primary and secondary data
collection are emphasized in the current unit of the study.
Data plays a central role in data science and machine learning. In most cases, we assume that the data
we use for analysis and modeling is free and readily available. In some cases, however, the
full dataset cannot be obtained or would take too long to collect. We then need to find a way to
collect the best subset of data that we can retrieve quickly and efficiently. The process of
designing experiments to collect data is called design of experiments; research surveys and
clinical trials are examples (Fig. 1.1).
iv. Cost: Designing experiments to collect data can be very expensive. Running an
experiment also comes with a cost. For example, participants who answer a survey can be
rewarded as an incentive to participate. Data scientists and data analysts must also be paid for
their analysis of data collected from research. Before planning an experiment, it is important to
assess the cost of running the experiment and whether the benefits of the experiment outweigh
the risks. For example, if the findings can improve customer experience and increase profits, the
investment is worth it.
EXPERIMENT
Experiment is a method of data collection in which you, as a researcher, modify certain variables
and observe their effects on other variables. Variables that you manipulate are called independent
while variables that change due to the manipulation are dependent variables. Imagine a manufacturer
testing the effect of a drug's strength on the number of bacteria in the body. The company decides to
test the drug at strengths of 10 mg, 20 mg, and 40 mg. In this example, drug strength is the
independent variable, while the number of bacteria is the dependent variable.
The greatest advantage of using an experiment is that you can explore causal relationships that an
observational study cannot. Additionally, experimental research can be adapted to different fields
like medical research, agriculture, sociology, and psychology. Nevertheless, experiments have the
disadvantage of being expensive and requiring a lot of time.
There are several types of experimental designs. In general, designs that are truly
experimental have three main characteristics: independent and dependent variables, pre- and post-
test, and experimental and control groups.
Field research is one of the most effective means of primary data collection. Depending on the
question, these interviews can take the form of household surveys, business surveys, or agricultural surveys.
PRIMARY DATA
Primary data is a type of data collected directly from primary sources by researchers through
interviews, surveys, experiments, and so on. It is typically collected from the sources
where the data originally arises and is considered the best type of data in research. Primary data
sources are typically specifically selected and tailored to meet specific research needs or
requirements. Also, before choosing a data collection source, it is necessary to identify the purpose
of the survey, the target audience, etc. For example, if you're doing market research, the first thing to
do is identify your research objectives and sample population. This will determine which data
collection source is most appropriate. Offline surveys are better suited than online surveys for
populations living in remote areas without internet connectivity.
DISADVANTAGES
i. It takes longer.
ADVANTAGES
i. Respondents are given sufficient time to provide their responses.
ii. No interviewer biases.
iii. Cheaper than an interview.
DISADVANTAGES
i. The non-response bias rate is high.
ii. Inflexible and cannot be changed after submission.
iii. It is a slow process.
3. The Observation
Observation method is primarily used in scientific research. Researchers use observations as
scientific tools and methods of data collection. Observations as a means of data collection are usually
systematically planned and managed. There are various approaches to observation methods.
Structured or unstructured, controlled or uncontrolled, participatory, non-participatory, or veiled
approaches. Structured and unstructured approaches differ in how carefully the units to be observed
and the method of recording are defined in advance.
ADVANTAGES
i. Data are generally objective.
ii. Data are not affected by past or future events.
DISADVANTAGES
i. Information is limited.
ii. Expensive
4. Focus Groups
A focus group is a group of two or more people with similar or common characteristics. Researchers
seek candid thoughts and opinions from the participants. Focus groups are a primary source of data
collection, as data are collected directly from the participants. They are commonly used in market
research, where a group of consumers discusses a topic with a research moderator. A focus group is
like an interview, but it involves discussion and dialogue instead of questions and answers. Focus
groups are less formal, with participants leading the bulk of the conversation and a facilitator
overseeing the process.
ADVANTAGES
i. Less expensive than an interview. This is because the interviewer does not have to talk to each
participant individually.
ii. It doesn't take long.
DISADVANTAGES
i. Response bias can become an issue, because participants may hold back their genuine
opinions out of concern for what the rest of the group will think.
ii. Groupthink can prevent the results from clearly reflecting individual opinions.
5. Experiments
ADVANTAGES
i. Collected data is the result of a process and is therefore generally objective.
ii. Non-response bias is eliminated.
DISADVANTAGES
i. Erroneous data may be recorded due to human error.
ii. Expensive.
Interviews are especially useful for revealing the story behind a participant's experience or for
getting more information on a topic. They are also useful for following up with individual
respondents after a survey to explore their answers further. Qualitative research, in particular, uses
interviews to explore the meaning of central themes in the subjects' life world. The main task
of the interviewer is to make sense of what respondents are saying.
SEQUENCE OF QUESTIONS
i Involve the respondent in the interview as early as possible.
ii Before asking about controversial topics (feelings, conclusions, etc.), ask for some facts first.
iii Use factual questions throughout the interview.
iv Before asking questions about the past or future, ask about the present.
v The final question allows respondents to provide additional information and impressions of
the interview that they consider relevant.
vi Questions should be asked carefully.
vii Questions should be asked one at a time.
viii Language should be open. Respondents should be able to select their own descriptive
vocabulary while answering questions.
ix Questions should be as neutral as possible.
x Be careful with why questions.
CONDUCTING INTERVIEWS
The literature consistently recommends the following procedures for conducting research interviews.
i. Identify respondents.
ii. Decide what type of interview to use.
iii. During the interview, tape the questions and answers.
iv. Take short notes during the conversation.
v. Find a quiet place suitable for the interview.
vi. Obtain the interviewee's consent to participate in the study.
vii. Plan, but be flexible.
viii. Obtain additional information using probes.
ix. Be courteous and professional after the interview.
STRENGTHS
i. Interviews provide useful information when participants cannot be observed directly.
ii. Interviewers have more control over the types of information they receive. You can choose
your own question.
iii. Effectively worded questions encourage unbiased and honest answers.
WEAKNESSES
i. If only one interviewer interprets the information, the respondent may provide biased
information or become unreliable. The best research requires different perspectives.
ii. Interview answers may be deceptive because respondents attempt to answer in a way that
pleases the interviewer.
iii. Equipment can be a problem. The equipment can be expensive and requires a high degree of
technical expertise to use.
iv. Can be time consuming and inexperienced interviewers may not be able to properly focus on
questions.
QUESTIONNAIRES
WRITE AN INSTRUCTION
This step is followed by the layout and arrangement of the questions in both authors' accounts,
probably because once the questions and other text are ready, this is the best time to review them. O'Leary
(2014) encourages researchers to use formats that are professional and aesthetically pleasing,
engaging respondents and structured in a way that reduces the likelihood of making mistakes (e.g.
repeating questions). O'Leary (2014) finally instructs researchers to include a cover letter
explaining who you are, the project's goals, non-disclosure agreements, and so on. However, Bell and Waters
(2014) provide further instructions.
DISTRIBUTION
Bell and Waters (2014) give a brief description of the distribution method. They emphasize the need
to ensure confidentiality, provide return dates, develop plans for 'returns' by email, and record
data immediately upon receipt. According to O'Leary (2014), face-to-face, mail, email, and online are typical
methods. Bell and Waters (2014) emphasize the benefits of personal administration of
questionnaires, as researchers explain the purpose of the study and are more likely to receive
completed questionnaires. The authors continue to emphasize the value of online methods.
In particular, they cite "Survey Monkey" as the most popular and versatile survey tool available.
Students are encouraged to send reminders or emails to increase response rates and speed of
response.
ANALYSIS
Bell and Waters (2014) and O'Leary (2014) again disagree on the analysis. O'Leary (2014) suggests
coding the data as soon as possible, whereas Bell and Waters (2014) suggest that researchers review
responses before coding, recoding only when time permits. Both methods have their merits, trading off
how quickly the data become usable against how carefully coding decisions are made.
O'Leary (2014) also raises some concerns about using questionnaires as a research tool, as they can be
time-consuming, expensive, and difficult. O'Leary (2014) argues that surveys are "notoriously difficult to
create" and often don't go according to plan.
STRENGTHS
O'Leary (2014) finds this research method useful because administering questionnaires allows
researchers to generate data specific to their study and to gain insights that may not otherwise be
available, which suggests some obvious advantages. Listing additional benefits of surveys, O'Leary
suggests that questionnaires:
i. Reach a large number of respondents
ii. Represents a larger population
iii. Allow comparison
iv. Generate standardized, quantifiable empirical data
v. Generate qualitative data using open-ended questions
1. Which of these statements is true for collecting information from a third party?
a. The indirect oral investigation is used to collect data from third parties
b. The mailed questionnaire method is apt for gathering information from third parties
c. Third parties prefer direct personal interviews to provide data to the researcher
d. All of the above
2. The main feature of secondary source of data is that -----------------.
a. It provides first-hand information to the researcher
b. It is more reliable compared to primary data
c. It implies that the data is collected from its original source
Answer:
1-a 2-d 3-b 4-a 5-c
SUMMARY
Bell and Waters (2014) and O'Leary (2014) each provide a clear checklist for creating a survey from
start to finish. Bell first reminds the researcher to get permission before administering the questionnaire,
then to ponder what the research question is and whether this is the best way to get the intended
information. Before creating your own questions, you should consider adapting existing instruments
rather than reinventing the wheel. An independent variable is a variable whose level is set by the
experimenter. There are four main factors to consider when designing and conducting data collection
experiments.
Primary data collection is the process of data collection through surveys, interviews, or
experiments. This form of data collection allows researchers to ensure that primary data meet the
standards required for their specific research question in terms of quality, availability, statistical
significance, and sampling. Field research is one of the most effective means of primary data
collection. Online surveys are conducted using internet-enabled devices such as mobile phones, PCs
and tablets. Offline surveys, on the other hand, do not require an internet connection to run.
However, there are also survey tools, such as Formplus, that let a survey be completed on a mobile
device without access to an internet connection.
Secondary data is data that has been collected and made available from other sources. Secondary
data is data that has been collected from primary sources and made available to researchers for use in
their own research. Secondary data is known to be always available compared to primary data.
Secondary data analysis is the process of analyzing data collected primarily by another researcher
who collected the data for another purpose. The process of secondary data analysis can be
quantitative or qualitative, depending on the type of data the researcher is working with. Quantitative
methods of secondary data analysis are applied to numerical data and analyzed mathematically,
whereas qualitative methods use words to provide detailed information about the data.
Secondary data analysis has different phases, including pre-collection, during collection, and post-
collection events. This means having a clear understanding of why you are collecting data (the
ultimate goal of your research work) and how this data can help you achieve that. This will help you
collect the right data and choose the best data sources and analysis methods. For example, a
researcher trying to collect data on optimal fish diets that allow rapid growth of fish should ask
questions such as: What kind of fish should be considered? Data that does not fit the research
question can then be excluded; for example, a researcher running a purely quantitative analysis may
set qualitative data aside. Depending on the data type, the data is analyzed using quantitative or
qualitative methods.
KEY WORDS
Primary data - Data generated by the researchers themselves, surveys, interviews, experiments
specifically designed to understand and solve the research question at hand.
Secondary data - This is data that has already been collected from primary sources and made
available to researchers for use in their own research.
Survey - A survey is a list of questions or items used to collect data about a respondent's attitudes,
experiences, or opinions.
Data Reliability - Data reliability means that the data is complete and accurate.
Bot - A computer program that acts as an agent for a user or another program.
Podcast - A digital audio file that you can download to your computer or mobile device over the
Internet.
Journals - Journals are academic publications containing articles written by researchers, professors,
and other professionals.
Trauma - Emotional reactions to horrific events such as accidents, rapes, and natural disasters.
Focus Groups - A research method that brings together a small group of people to answer questions
in a moderated environment.
REFERENCES
1. Bell, J., Waters, S., &EBooks Corporation. (2014). Doing your research project: A guide for
first-time researchers (Sixth ed.). Maidenhead, Berkshire: Open University Press.
2. Creswell, J. W. (2009). Research design: Qualitative, quantitative, and mixed methods approach
(3rd ed.). Los Angeles: Sage.
3. Kvale, S., and Sage Research Methods Online. (2008). Doing interviews. Thousand Oaks;
London: SAGE Publications, Limited.
4. McNamara, C. (1999). General Guidelines for Conducting Interviews, Authenticity Consulting,
LLC, Retrieved from: http://www.managementhelp.org/evaluatn/ intrview.htm
YOUTUBE VIDEOS
RES505: Research Methodology Page 121
1. https://www.youtube.com/watch?v=lqqJ5BmXzB0
2. https://www.youtube.com/watch?v=oLcxcx4blTc
3. https://www.youtube.com/watch?v=2y8w3AoxHSE
4. https://www.youtube.com/watch?v=iecJry3Kwrk&list=PL0SUHdavZkG0sSbMWQgsYJJkujK7ujKU
WIKIPEDIA
1. https://www.formpl.us/blog/primary-data
2. https://lled500.trubox.ca/2016/225
3. https://en.wikipedia.org/wiki/Data_collection
4. https://www.techtarget.com/whatis/definition/secondary-data
REFERENCE BOOKS
1. Sapsford, R. and Jupp, V. Data Collection and Analysis. Sage Publications, 2006.
2. Jovancic, Nemanja.Data Collection Methods for Obtaining Quantitative and Qualitative Data.
LeadQuizzes, 2005.
3. Schutt, R. Investigating the Social World. Sage Publications, 2006.
4. Corti, L. and Bishop, L. 'Strategies in Teaching Secondary Analysis of Qualitative
Data' FQS 6(1), 2005.
LEARNING OBJECTIVES
Understand data sets and present them in multiple ways
Create line charts, summary charts, and bar charts to display the same data in different
formats
Interpret line, summary, and bar charts to answer questions about data sets in multiple
formats
Compare and express opinions on different presentations of data
Determine the appropriate presentation for different situations
“You can achieve simplicity in the design of effective charts, graphs and tables by remembering
three fundamental principles: restrain, reduce, emphasize.” - Garr Reynolds
INTRODUCTION
By a data representation is meant any convention for the arrangement of things in the physical world
in such a way as to enable information to be encoded and later decoded by suitable automatic
systems. Data representation refers to the form in which data is stored, processed, and transmitted.
Devices such as smartphones, iPods, and computers store data in digital formats that can be handled
by electronic circuitry.
Quantitative data answer questions such as “How many?”, “How often?”, “How much?”. This data
can be verified and conveniently evaluated using mathematical techniques.
For example, there are quantities corresponding to various parameters. For instance, "How many
students of M.Sc. Bioscience passed?" is a question that will collect quantitative data. Values are
associated with most measured parameters, such as pass class, second class, first class, outstanding,
etc.
Quantitative data makes measuring various parameters controllable due to the ease of mathematical
derivations they come with. It is usually collected for statistical analysis using surveys, polls, or
questionnaires sent across to a specific section of a population.
Researchers can generalize the retrieved results across a population (Fig.1.1).
Height in feet, age in years, and weight in pounds are examples of quantitative data.
4) What is the best example of quantitative data?
Answer: Here are some examples of quantitative data: A one-gallon jug of milk. The painting
measures 14 inches wide and 12 inches long. A newborn baby weighs 6 pounds and 5 ounces. A 4-
pound bag of broccoli crowns. One cup of coffee contains 10 oz. Dr. Ajay is six feet tall. One tablet
weighs 1.5 pounds.
Thus, a frequency distribution table is a graph that summarizes their values and frequencies. In other
words, it is a tool for organizing data. This allows us to easily understand the given set of
information. Thus, the frequency distribution table in statistics helps us to condense the data into a
simpler form so that we can easily observe its characteristics at a glance.
HOW TO BUILD A FREQUENCY DISTRIBUTION TABLE?
The frequency distribution table can be easily generated by following the steps below:
Step 1: Create a table with two columns - one titled the data you are organizing and the other column
will be frequency. [Draw three columns if you also want to add a tick]
Step 2: Examine the entries recorded in the data and decide whether to draw up an ungrouped
frequency distribution or a grouped frequency distribution. If there are too many different
values, it is often better to use a grouped frequency distribution table.
Step 3: Write the values from the dataset in the first column.
Step 4: Count how many times each item repeats in the collected data. In other words, find the
frequency of each element by counting.
Step 5: Enter the frequency in the 2nd column corresponding to each item.
Step 6: Finally, you can also write the total frequency in the last row of the table. Let's take an
example. Dr. Anuja is a teacher. She wants to look at the scores her students got on the last
exam, but she didn't have time to go through the sheets one by one to see each score. So she
asked Dr. Ajay to organize the data in a table that would make it easier to see everyone's
scores together. A frequency distribution table is the natural way to sort the data, giving a
better picture of the data than a simple list.
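The steps above can be sketched in Python using the standard library's `Counter`; the scores below are made up for illustration:

```python
from collections import Counter

# Hypothetical exam scores for a small class (made-up data).
scores = [12, 15, 12, 18, 15, 12, 20, 18, 15, 12]

# Steps 3-5: list each distinct value alongside the number of times it occurs.
freq_table = Counter(scores)

# Print the two-column table: value | frequency.
for value, freq in sorted(freq_table.items()):
    print(f"{value:>5} | {freq}")

# Step 6: the total frequency equals the number of observations.
print(f"Total | {sum(freq_table.values())}")
```

For a grouped distribution, one would first map each score into its class interval (e.g. 10-15, 15-20) and count the intervals instead of the raw values.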
Using a histogram here is a good way to present the data, as it will show all of the students' scores in a
single graph. But how do you create a frequency distribution table? The following table shows the
students' test results for a class (Table 3.2).
Marks obtained    Frequency
5-10              11
10-15             12
15-20             19
20-25             7
25-31             8
Total             60
The frequency distribution table drawn above is called a grouped frequency distribution table.
FREQUENCY DISTRIBUTION TABLE IN STATISTICS
A frequency distribution in statistics is a representation of data that shows the number of
observations over a given period of time. The frequency distribution representation can be graphical
or tabular. Now let's look at another way to represent data, i.e. represent data graphically. This is
done using a frequency distribution table plot. Such charts make it easier for you to understand the
data collected.
i. Bar charts represent data using bars of uniform width with equal spacing between them.
ii. A pie chart showing an entire circle, divided into sectors where each field corresponds to the
information it represents.
iii. The frequency polygon is plotted by connecting the midpoints of the bars in the histogram.
Class Interval    Tally Marks    Frequency
50 - 60           IIII I         6
60 - 70           IIII           5
70 - 80           IIII I         6
80 - 90           II             2
Total                            22
Cumulative frequency means the sum of the frequencies of the layer and all lower layers. It is
calculated by adding the frequency of each class below the corresponding class interval or category.
Here is an example of a cumulative frequency distribution table (Table 3.5).
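The running-total calculation can be illustrated with a short sketch; the class intervals and frequencies below are invented for the example:

```python
# Class intervals with their frequencies (hypothetical grouped data).
classes = [("0-10", 3), ("10-20", 7), ("20-30", 5), ("30-40", 5)]

# Cumulative frequency: for each class, add its frequency to the
# frequencies of all lower classes.
running = 0
cumulative = []
for interval, freq in classes:
    running += freq
    cumulative.append((interval, freq, running))

# Print the three-column table: interval | frequency | cumulative frequency.
for interval, freq, cum in cumulative:
    print(f"{interval:>6} | {freq:>4} | {cum:>4}")
```

The cumulative frequency of the last class always equals the total number of observations, which is a quick consistency check on the table.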
Answer: A frequency distribution table is useful for performing calculations on given data, including
measures of central tendency, variance, statistical testing, and further analysis. In addition,
histograms of the frequency distribution are useful for presenting data in a neat and
understandable manner.
A two-way table is a way to display frequencies or relative frequencies for two categorical
variables. One category is represented by rows and the second category is represented by columns. A
one-way table is simply the data from a bar chart put into a table; in a one-way table, you work
with only one categorical variable. As you might have guessed, a two-way frequency table (showing
counts) handles two variables (called bivariate data).
TWO-WAY TABLE
A two-way table is a way to display the frequencies or relative frequencies of two categorical
variables. One variable is represented by rows and the other by columns, and the table is used to
see if there is a relationship between the two variables. For example, 60 people (30 men and 30
women) were asked what kind of movie they would prefer to watch, and the following responses were
recorded:
i. 6 men preferred rom-coms.
ii. 16 men preferred action movies.
iii. 8 men preferred horror movies.
iv. 12 women preferred rom-coms.
v. 14 women preferred action movies.
vi. 4 women preferred horror movies.
The information collected was used to build the following two-way table:
Table 4.2: Two-way frequency table
Rom-com Action Horror Total
Men 6 16 8 30
Women 12 14 4 30
Total 18 30 12 60
The entries in the table are counts; this type of table is called a two-way frequency table.
The table has several features:
i. Categories are labeled in the left column and the top row.
ii. Counts are placed in the body of the table.
iii. Totals appear at the end of each row and column.
iv. The sum of all counts (the grand total) is placed at the bottom right.
v. The totals in the right column and bottom row are called marginal distributions.
vi. The entries in the middle of the table are called joint frequencies.
TWO-WAY RELATIVE FREQUENCY TABLE
Instead of displaying counts in a table, you can display relative frequencies. Here is the same
table as a two-way relative frequency table, with relative frequencies (as decimals, percentages, or
fractions) displayed instead of counts:
Table 4.3: Two-way relative frequency table
Rom-com Action Horror Total
Men 0.1 0.267 0.133 0.5
Women 0.2 0.233 0.067 0.5
Total 0.3 0.5 0.2 1
To convert a count to a relative frequency, divide the count by the total number of elements. In
the table above, the first count is for Men/Rom-com (count = 6), so 6/60 = 0.1.
As in the two-way frequency table, the totals in the right column and bottom row are called
marginal distributions, while the entries in the middle of the table are called conditional
frequencies or conditional distributions.
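The conversion can be checked with a short sketch using the counts from Table 4.2 (the nested-dictionary layout is just one possible representation of the table):

```python
# Counts from the movie-preference survey in Table 4.2.
counts = {
    "Men":   {"Rom-com": 6,  "Action": 16, "Horror": 8},
    "Women": {"Rom-com": 12, "Action": 14, "Horror": 4},
}

# Grand total: the sum of all counts (bottom-right cell of the table).
grand_total = sum(sum(row.values()) for row in counts.values())

# Relative frequency: each count divided by the grand total,
# rounded to three decimals as in Table 4.3.
rel = {
    group: {genre: round(n / grand_total, 3) for genre, n in row.items()}
    for group, row in counts.items()
}

# Marginal distribution of each row: the sum of that row's relative frequencies.
row_margins = {group: round(sum(row.values()), 3) for group, row in rel.items()}
```

Each computed value matches Table 4.3, and the two row margins each come to 0.5, the marginal distribution of gender.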
Answer:
1-tabulation 2- numerical 3-original 4-array 5- frequency
SUMMARY
Quantitative data is any quantifiable information that researchers can use for mathematical
calculations and statistical analysis, making real-life decisions based on these mathematical
derivations. For instance, "How much did that laptop cost?" is a question that will collect
quantitative data. Quantitative data makes measuring various parameters controllable due to the
ease of mathematical derivations they come with. Sensors provide a mechanism for directly measuring
parameters to create a constant source of information; for example, a digital camera converts
electromagnetic information into a string of numerical data. Qualitative entities can also be
quantified by assigning numbers to qualitative information.
KEY WORDS
Tabulation - Tabulation is a systematic and logical representation of numeric data in rows and
columns to facilitate comparison and statistical analysis.
Key Notes - Notes prefixed with comments or explanations.
Footnotes - Footnotes are notes placed at the bottom of a page and used to refer to sections of text.
Source notes - Sources are cited to develop and support the titles and references in an authoritative
record.
Column Headers - Column Headers are headers that identify a column in the worksheet. Column
headers are at the top of each column and are labeled A, B, ... Z, AA, AB ....
Row Header - Row Header or Row Header is the column to the left of Column 1 in the worksheet,
containing the numbers (1, 2, 3, etc.) in the sheet.
Table of Contents - The table content element identifies one or more rows that make up the main
body (or "body") of the table.
Scoring method - Scoring (tallying) is a way of recording data in groups of five tally marks.
Recording the frequency in this way is equivalent to counting the total number of tally marks made.
Map Alignment - Map alignment is a method used to help design or evaluate the information
architecture of a website.
REFERENCES
1. Siegel, Alan (2004), On universal classes of extremely random constant-time hash functions,
SIAM Journal on Computing, 33 (3): 505–543.
2. Morin, Pat (2014), Section 5.2.3: Tabulation hashing, Open Data Structures (in pseudocode)
(0.1Gβ ed.), pp. 115–116.
3. Mitzenmacher, Michael; Upfal, Eli (2014), Some practical randomized algorithms and
data structures, in Tucker, Allen; Gonzalez, Teofilo; Diaz-Herrera, Jorge (eds.), Computing
Handbook: Computer Science and Software Engineering (3rd ed.), CRC Press, pp. 11-1 - 11-23,
YOUTUBE VIDEOS
1. https://www.youtube.com/watch?v=hECPeKv5tPM
2. https://www.youtube.com/watch?v=jEeqmHP4GcA
3. https://www.youtube.com/watch?v=Xr0BgvtXWwA
4. https://www.youtube.com/watch?v=JCiUNvfTks4
WIKIPEDIA
1. https://byjus.com/commerce/tabular-presentation-of-data/
2. https://www.questionpro.com/blog/quantitative-data/
3. https://www.cuemath.com/data/frequency-distribution-table/
4. https://www.statisticshowto.com/two-way-table/
REFERENCE BOOKS
1. Agresti A. (1990). Categorical Data Analysis. John Wiley and Sons, New York.
2. Kotz, S.; et al., eds. (2006), Encyclopedia of Statistical Sciences, Wiley.
3. Levine, D. (2014). An Easy to Understand Guide to Statistics and Analytics 3rd Edition.
Pearson FT Press.
4. Fink, Arlene (2005). How to Conduct Surveys. Thousand Oaks: Sage Publications.
RES505: Research Methodology Page 146
CREDIT 02-UNIT 03: GRAPHICAL REPRESENTATION
LEARNING OBJECTIVES
After successful completion of this unit, you will be able to
Determine which chart type best represents the data for a given situation.
Explain how graphs can lead to data misinterpretation.
Compare representations of the same data set using different graphs, or the same type of graph with different scales.
Choose the appropriate chart or graph to represent a given data set.
“Numbers have an important story to tell. They rely on you to give them a clear and convincing
voice.” ― Stephen Few
INTRODUCTION
A graph, chart, or diagram is a diagrammatic illustration of a set of data. If the graph is
uploaded as an image file, it can be placed within articles just like any other image. Graphs must be
accurate and convey information efficiently. They should be viewable at different computer screen
resolutions. Ideally, graphs will also be aesthetically pleasing. Graphical representation is a way of
analyzing numerical data. It exhibits the relation between data, ideas, information and concepts in a
diagram. It is easy to understand and it is one of the most important learning strategies. It always
depends on the type of information in a particular domain.
A chart is a graphical representation for data visualization, in which "the data is represented by
symbols, such as bars in a bar chart, lines in a line chart, or slices in a pie chart". A chart can
represent tabular numeric data, functions or some kinds of quality structure and provides different
info. The term "chart" as a graphical representation of data has multiple meanings. A data chart is
a type of diagram or graph that organizes and represents a set of numerical or qualitative data. Maps
that are adorned with extra information (map surround) for a specific purpose are often known as
charts, such as a nautical chart or aeronautical chart, typically spread over several map sheets. Other
domain-specific constructs are sometimes called charts, such as the chord chart in music notation or
a record chart for album popularity.
Charts are often used to ease understanding of large quantities of data and the relationships between
parts of the data. Charts can usually be read more quickly than the raw data. They are used in a wide
variety of fields, and can be created by hand (often on graph paper) or by computer using a charting
application.
Certain types of charts are more useful for presenting a given data set than others. For example, data
that presents percentages in different groups (such as "satisfied, not satisfied, unsure") are often
displayed in a pie chart, but may be more easily understood when presented in a horizontal bar chart.
On the other hand, data that represents numbers that change over a period of time might be best
shown as a line chart. In the present script graphical representation and line graph are discussed.
There are eight different types of graphical representations (Graph 1.1). Some of them are:
1. Line graph - Line chart or line graph is used to display continuous data and it is useful in
predicting future events over time.
2. Bar graph - Bar chart is used to display categories of data and it compares data using solid bars
to represent quantities.
3. Pictograph - A pictorial symbol for a word or phrase. Pictographs were used as the earliest
known form of writing; examples have been discovered in Egypt and Mesopotamia.
LINE GRAPH
A line chart, also called a line graph, is a type of chart used to display information that changes
over time. We draw line graphs using multiple points connected by line segments. Line charts
have two axes, known as the "x" and "y" axes.
For example, a line graph can show New York's temperature trends on a hot day (Graph 1.3).
USES OF HISTOGRAM
Histograms are used under certain conditions. They are:
i. Data must be numeric.
ii. Histograms are used to check the shape of the data distribution.
iii. Allows you to check if the process changes from one stage to another.
iv. Used to determine if the output is different when two or more processes are involved.
v. Used to analyze whether a given process meets customer requirements.
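The checks above all operate on a binned frequency distribution of numeric data. As a minimal sketch (pure Python; the helper name and data are illustrative, not from any library), binning values into equal-width classes looks like:

```python
def bin_counts(values, low, high, n_bins):
    """Count how many values fall into each of n_bins equal-width classes."""
    width = (high - low) / n_bins
    counts = [0] * n_bins
    for v in values:
        if low <= v <= high:
            # Clamp a value exactly equal to the upper limit into the last bin.
            idx = min(int((v - low) / width), n_bins - 1)
            counts[idx] += 1
    return counts

data = [2, 3, 5, 7, 11, 13, 17, 19, 23, 29]
print(bin_counts(data, 0, 30, 3))   # three classes: 0-10, 10-20, 20-30
```

The resulting counts are what a histogram's bar heights display.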
TYPES OF HISTOGRAM
Histograms can be classified into different categories based on the frequency distribution of the data.
There are different types of distributions, such as the normal distribution, skewed distribution,
bimodal distribution, multimodal distribution, comb distribution, edge peak distribution, dog food
distribution, heart cut distribution, etc. Histograms can be used to represent these different types of
distributions. The different chart types are:
i. Uniform histogram
ii. Symmetric histogram
iii. Bimodal histogram
iv. Probability histogram
i. Uniform Histogram
A uniform distribution reveals that the number of classes is too small and that each class has the same
number of elements. It may involve a distribution that has several peaks (Graph 2.1).
FREQUENCY POLYGONS
A frequency polygon is a graphical representation of a data distribution that helps to
understand data through a particular shape. Frequency polygons are very similar to histograms
but are especially useful when comparing two or more sets of data. The frequency polygon presents the
frequency distribution data in the form of a line graph. Let's learn about the frequency
polygon graph, the steps to create it, and solve some examples to better understand the
concept.
A frequency polygon can be defined as a form of graph that represents information or data and is
widely used in statistics. This form of visual data representation helps to describe the shape and
trends of data in an organized and systematic way. The frequency polygon, like the
histogram, represents the number of occurrences of the class intervals. This type of chart is usually
plotted together with a histogram, but can also be plotted without one. While a histogram is a
bar-type chart with rectangular bars and no gaps between them, a frequency polygon is a line graph
representing the frequency distribution data. The frequency polygon looks like the image
below (Graph 2.11):
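Computationally, the vertices of a frequency polygon are just the class midpoints paired with the class frequencies, joined by line segments. A minimal sketch (the class intervals and counts are illustrative):

```python
def polygon_points(class_limits, frequencies):
    """Pair each class midpoint with its frequency.

    class_limits is a list of (lower, upper) tuples; the returned
    (midpoint, frequency) points are the vertices of the frequency polygon."""
    return [((lo + hi) / 2, f) for (lo, hi), f in zip(class_limits, frequencies)]

classes = [(0, 10), (10, 20), (20, 30), (30, 40)]
freqs = [4, 6, 8, 2]
print(polygon_points(classes, freqs))
# midpoints 5, 15, 25, 35, each paired with its class frequency
```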
In statistics, cumulative frequency is defined as the running sum of frequencies distributed over the
different class intervals, with the data and totals displayed in tabular form together with the frequencies.
Cumulative frequency distributions are classified into two different types: less than cumulative
frequency and greater than (or more than) cumulative frequency.
i. Less than cumulative frequency:
The less than cumulative frequency distribution is obtained by successively adding each class
frequency to the frequencies of all previous classes. This type accumulates from the smallest
class to the largest.
ii. Greater than cumulative frequency:
The greater than cumulative frequency is also called the more than type cumulative frequency. This
distribution is obtained by accumulating frequencies from the highest class down to the lowest class.
Graphical display of less than and greater than cumulative frequency
Graphing cumulative frequencies is simpler and more convenient than using tables, bar charts,
frequency polygons, etc. Cumulative frequency curves (ogives) can be drawn in two ways:
i. As a less than cumulative frequency distribution curve (or ogive).
ii. As a greater than cumulative frequency distribution curve (or ogive).
To create a less than cumulative frequency curve:
i. Mark the upper bound on the horizontal or X-axis.
ii. Mark the cumulative frequency on the vertical or Y-axis.
iii. Plot a point (X, Y) in the coordinate plane, where X represents the upper bound and Y
represents the cumulative frequency.
iv. Finally, connect the points to draw a smooth curve.
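The steps above can be sketched in code: accumulate the class frequencies into running totals, then pair each total with its class upper bound. A minimal sketch with illustrative data (the helper name is our own, not a library function):

```python
from itertools import accumulate

def less_than_ogive(upper_bounds, frequencies):
    """Return (upper bound, cumulative frequency) points for a less than ogive."""
    return list(zip(upper_bounds, accumulate(frequencies)))

uppers = [10, 20, 30, 40]
freqs = [5, 8, 4, 3]
print(less_than_ogive(uppers, freqs))  # [(10, 5), (20, 13), (30, 17), (40, 20)]
```

Plotting these points and joining them with a smooth curve gives the less than ogive.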
To construct a greater than (more than) cumulative frequency curve:
i. Mark the lower limit on the horizontal axis.
ii. Mark the cumulative frequency on the vertical axis.
iii. Plot a point (X, Y) in the coordinate plane, where X represents the lower bound and Y
represents the cumulative frequency.
iv. Finally, connect the points to draw a smooth curve.
The curve thus obtained is the graph of a more than type cumulative frequency distribution.
To draw a more than type cumulative frequency curve, consider the same cumulative frequency
table showing the number of participants in each essay writing contest by age (Table 3.2).
OGIVE
The word ogive is a term used in architecture to describe a curved or pointed arch. Ogives are
charts used to estimate how many values lie below or above a particular variable or value in the
data. To create an ogive, the cumulative frequencies of the variables are first calculated using a
frequency table, by adding the frequencies of all previous variables in the given data set.
DEFINITION
An ogive is defined as the cumulative frequency distribution graph of a series. It is a plot that
shows data values on the horizontal axis and cumulative frequencies, cumulative relative
frequencies, or cumulative percentage frequencies on the vertical axis. Cumulative frequency is
defined as the sum of all previous frequencies up to the present point. To determine the popularity of a
particular value, or the probability that an observation falls within a particular range, the ogive
curve helps pinpoint these details.
Create an ogive by plotting the points corresponding to the cumulative frequency of each class
interval. Most statisticians use the ogive curve to visualize data pictorially. It is useful for estimating
the number of observations below a certain value, and for showing the properties of discrete and
continuous data. Such figures are easier to read than the aggregated data, and they facilitate
comparative study of two or more frequency distributions, since the shapes and patterns of two
distributions can be related. The two types of ogives are as follows (Graph 3.4):
i. Less than ogive
ii. Greater than or more than ogive
OGIVE CHART
An ogive chart is a curve of a cumulative or relative cumulative frequency distribution. To draw such
a curve, we express the counts as a percentage of the total count. These percentages are then
accumulated and plotted, as in an ogive. Below are the steps to create less than and greater than ogives.
OGIVE EXAMPLE
Question 1: Construct the more than cumulative frequency table and draw the Ogive for the below-
given data (Table 3.3).
Marks 1-10 11-20 21-30 31-40 41-50 51-60 61-70 71-80
Frequency 3 8 12 14 10 6 5 2
Solution:
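A computational sketch of the solution: for a more than ogive, accumulate from the top of the table downward so each class lower bound is paired with the count of observations in that class and all classes above it (frequencies taken from Table 3.3; the helper name is our own):

```python
def more_than_ogive(lower_bounds, frequencies):
    """Pair each class lower bound with the frequency of that class and all later classes."""
    total = sum(frequencies)
    points = []
    for lo, f in zip(lower_bounds, frequencies):
        points.append((lo, total))
        total -= f
    return points

marks_lower = [1, 11, 21, 31, 41, 51, 61, 71]
freqs = [3, 8, 12, 14, 10, 6, 5, 2]
for lo, cf in more_than_ogive(marks_lower, freqs):
    print(f"More than or equal to {lo}: {cf}")
```

The running totals (60, 57, 49, 37, 23, 13, 7, 2) are the more than cumulative frequencies; plotting them against the lower bounds and joining the points gives the required ogive.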
DISADVANTAGES:
i. Sometimes bar charts don't reveal patterns, causes, impacts, etc.
ii. It can be easily manipulated to create misinformation.
IMPORTANT NOTE:
Some important notes regarding bar charts are:
i. In a bar chart, there should be equal spacing between the bars.
ii. It is recommended to use a bar chart if the data frequency is very large.
iii. Understand what data should be presented on the x- and y-axis and the relationship between
the two.
PIE CHART
A pie chart is a graphical representation of data in the form of a circular chart in which slices of
the pie represent the relative size of the data.
To present data as a pie chart, a list of numerical variables along with the corresponding categorical
variables is needed. The length of the arc of each slice, and therefore its area and central angle in the
pie chart, is proportional to the quantity it represents.
A pie chart is one of the most popular charts for data representation; it uses a circle, its sectors,
and their angles to represent real-world information. The shape of a pie chart is circular, where
the full pie represents the whole data set and the slices of the pie represent discrete parts of
the data.
DEFINITION
A pie chart is a type of chart that records data in a circular manner that is further divided into
sections so that the data represents that particular part of the whole. Each of these sections or parts
represents a proportional part of the whole. It helps to interpret and present data. It is also used to
compare data.
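Because each slice's central angle is proportional to the value it represents, the angles can be computed directly. A minimal sketch (the category values are illustrative):

```python
def pie_angles(values):
    """Convert raw values into central angles (degrees) that sum to 360."""
    total = sum(values)
    return [360 * v / total for v in values]

sales = [30, 45, 15, 10]     # illustrative category values summing to 100
print(pie_angles(sales))     # [108.0, 162.0, 54.0, 36.0]
```

Each angle is then drawn as a sector of the circle, so the whole pie always accounts for 100% of the data.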
Answer:
1-d 2-c 3-a 4-d 5-c
Answer:
1-True 2-True 3-True 4-False 5-False
SUMMARY
It shows the relationships between data, ideas, information, and concepts in a diagram. For example,
a line chart or line graph is used to display continuous data and is useful in predicting future
events over time.
KEY WORDS
Ogive - An ogive is defined as the graph of the cumulative frequency distribution of a series.
Polygon - This is a type of line graph where the frequency of the layer is plotted based on the
midpoint of the layer and the points are connected by a line segment creating a curve.
Cumulative - it is an increase by successive additions.
Frequency - Frequency is the number of occurrences of a repeating event per unit of time.
Class interval - Class interval refers to the numerical width of any class in a particular distribution.
Ogive curve - An ogive curve is a plot of a cumulative frequency distribution or cumulative
relative frequency distribution.
REFERENCES
1. Black, Ken (2009). Business Statistics: Contemporary Decision Making. John Wiley & Sons. p.
24.
2. Everitt, B.S. (2002). The Cambridge Dictionary of Statistics (2nd Ed.). Cambridge: Cambridge
University Press. ISBN 0-521-81099-X.
YOUTUBE VIDEOS
1. https://www.youtube.com/watch?v=_K0IBXcgk48
2. https://www.youtube.com/watch?v=SDYEwv0WxMo
3. https://www.youtube.com/watch?v=uHRqkGXX55I
4. https://www.youtube.com/watch?v=FVRJU--8YMY
WIKIPEDIA
1. https://asq.org/quality-resources/histogram
2. https://www.cuemath.com/data/frequency-polygons/
3. https://byjus.com/maths/ogive/
4. https://byjus.com/maths/bar-graph/
REFERENCE BOOKS
1. Dodge, Yadolah (2008). The Concise Encyclopedia of Statistics. Springer.
2. Robbins, (1995). Polygons inscribed in a circle, American Mathematical Monthly 102.
3. Coxeter, H.S.M. (1973). Regular Polytopes, 3rd Edn, Dover (pbk).
4. Russell, Bertrand, (2004). History of Western Philosophy, Reprint Edition, Routledge.
“Without big data analytics, companies are blind and deaf, wandering out onto the web like deer
on a freeway” - Geoffrey Moore
INTRODUCTION
Data analysis is a process of inspecting, cleansing, transforming, and modeling data with the goal of
discovering useful information, informing conclusions, and supporting decision-making. Data
analysis has multiple facets and approaches, encompassing diverse techniques under a variety of
names, and is used in different business, science, and social science domains. In today's business
world, data analysis plays a role in making decisions more scientific and helping businesses operate
more effectively.
Organizations may apply analytics to business data to describe, predict, and improve business
performance. Specifically, areas within analytics include descriptive analytics, diagnostic analytics,
predictive analytics, and prescriptive analytics.
Data analytics is defined as a process of cleaning, transforming, and modeling data to uncover
insights useful for business decision making. The purpose of data analysis is to extract useful
information from data and make decisions based on data analysis. Data analytics is the process of
analyzing raw data in order to extract meaningful insights. This can be done through a variety of
methods, such as statistical analysis or machine learning. The systematic application of statistical and
logical techniques to describe the data scope, modularize the data structure, condense the data
representation, illustrate via images, tables, and graphs, and evaluate statistical inclinations,
probability data, and derive meaningful conclusions is known as data analysis. These analytical
procedures enable us to draw the underlying inference from data by eliminating the unnecessary
chaos created by the rest of it. Data generation is a continual process; this makes data analysis a
continuous, iterative process in which data collection and data analysis are performed simultaneously.
Ensuring data integrity is one of the essential components of data analysis.
There are various examples where data analysis is used, ranging from transportation, risk and fraud
detection, customer interaction, city planning, healthcare, web search, and digital advertisement, among
others.
Data analytics can help small businesses in a number of ways. By understanding data analytics,
businesses can make better decisions about where to allocate their resources and how to price their
products and services.
There are several data analysis tools available in the market, each with its own set of functions. The
selection of tools should always be based on the type of analysis performed and the type of data
being worked with. Here is a list of a few compelling tools for data analysis (Fig. 1.2).
1. Excel: It has various compelling features, and with additional plugins installed, it can handle a
massive amount of data. So, if you have data that does not come near the significant data
margin, Excel can be a versatile tool for data analysis.
2. Tableau: It falls under the BI tool category, made for the sole purpose of data analysis. The
essence of Tableau is the pivot table and pivot chart, and it works towards representing data in
the most user-friendly way. It additionally has a data cleaning feature along with brilliant
analytical functions.
3. Power BI: It initially started as a plug-in for Excel, but later detached from it to develop into
one of the most popular data analytics tools. It comes in three versions: Free, Pro, and Premium. Its
Power Pivot engine and DAX language can implement sophisticated advanced analytics in a way
similar to writing Excel formulas.
4. FineReport: FineReport comes with a straightforward drag-and-drop operation, which helps
design various reports and build a data decision analysis system. It can directly connect to all
kinds of databases, and its format is similar to that of Excel. Additionally, it provides a
variety of dashboard templates and several self-developed visual plug-in libraries.
iii. Data cleaning: Not all data collected may be useful or relevant for your analysis
purposes, so it needs to be cleaned up. Collected data may contain duplicate records, stray spaces, or
errors. The data must be made clean and error-free. This phase should be completed before analysis
because the quality of the cleaning determines how close your analysis result will be to the expected result.
iv. Data analysis: After the data is collected, cleaned and processed, it is ready for analysis. As
you work with data, you may find that you have the exact information you need, or that you
may need to collect more data. During this phase, you can use data analysis tools and software
that will help you understand, interpret, and draw conclusions based on the requirements.
v. Interpreting data: After analyzing your data, it's finally time to interpret your results. You can
choose how to phrase or communicate your data analysis, you can just use words, or it can be
tables or charts. Then use the results of your data analysis to decide your best course of action.
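The cleaning rules of phase iii (trim stray spaces, drop empty and duplicate records) can be sketched with plain Python; the function name and sample records here are purely illustrative, not part of any particular tool:

```python
def clean_records(records):
    """Trim whitespace, then drop empty and duplicate records, keeping first occurrences."""
    seen = set()
    cleaned = []
    for rec in records:
        rec = rec.strip()
        if not rec or rec in seen:   # skip blanks and duplicates
            continue
        seen.add(rec)
        cleaned.append(rec)
    return cleaned

raw = ["  Alice ", "Bob", "Alice", "", "Bob "]
print(clean_records(raw))   # ['Alice', 'Bob']
```

Real data cleaning would add domain-specific error checks, but the duplicate and whitespace handling follows the same pattern.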
But when your business is taking in hundreds of reviews each day, you can't have employees
weeding through responses one by one. You need tools that automate this process and present your
team with a breakdown of your customer reviews. This is where qualitative data analysis software
comes into play. Qualitative data analysis software reviews your survey and customer reviews in
bulk, saving your team valuable time during reporting. Below are some of the best qualitative data
analysis software tools, including free options you can use with your team.
1. HubSpot
2. MAXQDA
3. Quirkos
4. Qualtrics
5. Raven's Eye
6. Square Feedback
7. Free QDA
8. QDA Miner Lite
9. Connected Text
10. Visão
1. HubSpot: As part of its Service Hub suite, HubSpot offers a customer feedback tool that
provides detailed analytics for surveys and customer reviews. Your data gets centralized into one
accessible dashboard, which includes different charts and graphs summarizing your customers'
responses. With this simple setup, your team has a quick and clean way to review their daily
analytics without navigating around the site.
Additionally, HubSpot's Service Hub tools are integrated with NPS® surveys. Net Promoter
Score, or NPS, is a type of survey that collects both quantitative and qualitative customer
feedback. HubSpot's customer feedback tool analyzes these responses and provides you with a
detailed breakdown of customer satisfaction based on its findings.
2. MAXQDA: MAXQDA is qualitative data analysis software that's designed for companies
analyzing different types of customer data. The software allows you to import data from
interviews, focus groups, surveys, videos, and even social media. This way you can review all of
your qualitative data in one central location. Once imported, MAXQDA lets you organize your
information into categories or groups. You can mark specific data with tags and leave notes for
other employees to review your work. MAXQDA even lets you color code your data so that your
team knows exactly what to work on each day.
3. Quirkos: Quirkos includes a variety of tools that analyze and review qualitative data. One of its
most notable tools is its text analyzer which can find common keywords and phrases throughout
different text documents. Your team can upload its customer reviews or survey responses and use
this tool to identify recurring roadblocks in the customer experience.
Another interesting tool Quirkos provides is its "word cloud" tool. The word cloud tool reviews
all of your text data and pulls out words that are frequently used. Then it groups them together
into a cluster to visualize the themes emerging from your data, just like in the example below.
4. Qualtrics: Qualtrics uses AI to review your survey data and forecast trends in customer behavior.
Its Predictive Analysis tool evaluates data and makes predictions about customer satisfaction
based on past survey responses. Use this information to interpret how customers will react to
changes you make to the customer experience. The Text Analysis tool reviews survey comments
for popular trends and topics that are appearing in your customers' feedback. This tool saves your
team time by analyzing your surveys' qualitative data in bulk. Once the data is compounded,
Qualtrics provides you with a variety of display options including graphs, charts, slideshows, and
maps.
5. Raven's Eye: Raven's Eye is qualitative data software that can process multiple types of customer
data. One of its most popular features is its audio converter which uploads audio files into the
software and transforms them into text files. Then it analyzes the text for different insights into
customer behavior. So, if you conduct interviews or focus groups with customers, you can record
the audio for these sessions and upload them to Raven's Eye for analysis.
6. Square Feedback: Square Feedback is a free survey and customer feedback collection tool that
also provides qualitative data reporting. It can analyze survey responses to see how satisfied your
customers are with things like customer service, wait time, and product quality. It also includes
historical filter options that let you compare past data to current customer information to see how
your customer service has changed over time.
7. Free QDA: Free QDA is basic qualitative data analysis software that's commonly downloaded by
businesses looking for an inexpensive and simple tool. It uses a text analyzer to review customer
interviews and compounds the information into one central location. There you can create
categories for your data and group together popular words and phrases appearing in your
responses.
1. Error
Error is the collective noun for any departure of the result from the "true" value. Analytical errors
can be:
i. Random or unpredictable deviations between replicates, quantified with the "standard
deviation".
ii. Systematic or predictable regular deviation from the "true" value, quantified as "mean
difference" (i.e. the difference between the true value and the mean of replicate
determinations).
iii. Constant, unrelated to the concentration of the substance analyzed (the analyte).
iv. Proportional, i.e. related to the concentration of the analyte.
2. Accuracy
The "trueness" or the closeness of the analytical result to the "true" value. It is constituted by a
combination of random and systematic errors (precision and bias) and cannot be quantified directly.
The test result may be a mean of several values. An accurate determination produces a "true"
quantitative value, i.e. it is precise and free of bias.
3. Precision
The closeness with which results of replicate analyses of a sample agree. It is a measure of dispersion
or scattering around the mean value and usually expressed in terms of standard deviation, standard
error or a range (difference between the highest and the lowest result).
4. Bias
The consistent deviation of analytical results from the "true" value caused by systematic errors in a
procedure. Bias is the inverse, and most commonly used, measure of "trueness", which is the agreement of the
mean of analytical results with the true value, i.e. excluding the contribution of randomness
represented in precision. Several components contribute to bias:
i. Method bias: The difference between the (mean) test result obtained from a number of
laboratories using the same method and an accepted reference value. The method bias may
depend on the analyte level.
ii. Laboratory bias: The difference between the (mean) test result from a particular laboratory
and the accepted reference value.
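The measures above are directly computable. A minimal sketch using Python's statistics module, with illustrative replicate determinations and an assumed "true" reference value: the standard deviation quantifies precision, the range gives the simplest dispersion measure, and the mean minus the true value gives the bias.

```python
import statistics

replicates = [9.8, 10.1, 9.9, 10.3, 10.4]   # illustrative replicate results
true_value = 10.0                            # assumed reference ("true") value

mean = statistics.mean(replicates)
sd = statistics.stdev(replicates)            # precision: dispersion around the mean
value_range = max(replicates) - min(replicates)
bias = mean - true_value                     # systematic deviation from the true value

print(f"mean={mean:.2f}, sd={sd:.3f}, range={value_range:.1f}, bias={bias:+.2f}")
```

Here the mean is 10.1, so the bias is +0.1; a method could have small random scatter (good precision) and still carry a large bias, which is why both are reported.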
SOFTWARE APPLICATION
The term "application software" refers to software that performs specific functions for the user.
When the user directly interacts with the software, it is called application software. The sole purpose
of application software is to assist users in performing specific tasks. Microsoft Word and Excel, as
well as popular web browsers like Firefox and Google Chrome, are examples of application software.
The category also includes mobile apps, such as WhatsApp for communication and games like Candy
Crush Saga, as well as app versions of popular services, such as weather or traffic information, and
apps that allow users to connect with businesses. Global Positioning System (GPS), graphics,
multimedia, presentation, and desktop publishing software are further examples.
1. The word 'statistics' is derived from the Latin word 'status'. (True/False)
2. Statistics is a science that deals with the techniques and methods of collection, classification and
presentation of data. (True/False)
3. A statistical question is one that results in varying responses and results (data). (True/False)
4. There are two main types of statistical analysis: descriptive and inferential, also known as
modeling. (True/False)
5. Statistics allows you to understand a subject much more superficially. (True/False)
Answer:
1-examination 2-numerical 3-any value 4-information 5-primary
SUMMARY
Data analytics is defined as a process of cleaning, transforming, and modeling data to uncover
insights useful for business decision making. It is, in essence, analyzing our past or present and
making decisions based on that; the same job an analyst does for business purposes is called
data analysis. Sometimes, analytics alone is enough to grow a business. Sound analysis
requires a well-designed study, a well-chosen sample, and an appropriate selection of statistical
tests. A variable is a trait that varies from person to person in a population. Quantitative variables are
those measured on a numeric scale.
KEY WORDS
REFERENCES
1. Ryan, Thorne (2013). Caffeine and computer screens: student programmers endure weekend
long appathon. The Arbiter. Archived from the original on 2016-07-09.
2. Ceruzzi, Paul E. (2000). A History of Modern Computing. Cambridge, Massachusetts:
MIT Press. ISBN 0-262-03255-4.
3. Kenney, J. F.; Keeping, E. S. (1962). Mathematics of Statistics, Part 1 (3rd ed.).
Princeton, NJ: Van Nostrand Reinhold.
4. Wasserman, Larry (2004). All of Statistics. New York: Springer. p. 310. ISBN 978-
1-4419-2322-6.
YOUTUBE VIDEOS
1. https://www.youtube.com/watch?v=yZvFH7B6gKI
2. https://www.youtube.com/watch?v=BTB86HeZVwk
WIKIPEDIA
1. https://www.guru99.com/what-is-data-analysis.html
2. https://imotions.com/blog/statistical-tools/
3. https://www.geeksforgeeks.org/what-is-application-software/
4. https://en.wikipedia.org/wiki/Application_software
REFERENCE BOOKS
LEARNING OBJECTIVES
Students demonstrate knowledge of statistical data analysis.
Define null hypothesis, alternative hypothesis, significance level, test statistic, p-value, and
statistical significance.
Students develop the ability to create and evaluate data models.
Students perform statistical analyses using professional statistical software.
Students demonstrate data management skills.
“Far better an approximate answer to the right question, which is often vague, than an exact
answer to the wrong question, which can always be made precise” — John W. Tukey.
INTRODUCTION
Statisticians use observed data to estimate population parameters. For example, sample means
are used to estimate population means; sample proportions, to estimate population proportions.
Point estimate. A point estimate of a population parameter is a single value of a statistic. For
example, the sample mean x is a point estimate of the population mean μ. Similarly, the sample
proportion p is a point estimate of the population proportion P.
Interval estimate. An interval estimate is defined by two numbers, between which a population
parameter is said to lie. For example, a < x < b is an interval estimate of the population mean μ. It
indicates that the population mean is greater than a but less than b.
It is often of interest to learn about the characteristics of a large group of elements such as
individuals, households, buildings, products, parts, customers, and so on. All the elements of interest
in a particular study form the population. Because of time, cost, and other considerations, data are
often collected from only a portion of the population, called a sample.
IMPORTANCE OF STATISTICS
Basically, statistical analysis is used to collect and examine information that is available in large
quantities. Statistics is a branch of mathematics in which calculations are performed on various data
using graphs, tables, charts, etc. The data collected here for analysis are called metrics. Now, when
we need to measure the data based on the scenario, the sample is taken from the population. Then the
analysis or calculation is done for the next measurement.
STATISTICAL ESTIMATION
In statistics, estimation refers to the process of drawing conclusions about a population from data
obtained from a sample.
Statistics uses sample statistics to estimate population parameters. For example, sample means are
used to estimate population means, and sample proportions to estimate population proportions.
A population parameter estimate can be expressed in two ways:
i. Point estimate. A point estimate of a population parameter is a single statistical value. For
example, the sample mean x is a point estimate of the population mean μ.
Similarly, the sample proportion p is a point estimate of the population proportion P.
ii. Interval estimate. An interval estimate is defined by two numbers between which the
population parameter is said to lie.
TYPES OF HYPOTHESES
Hypotheses can be roughly divided into different types. They are:
i. Simple Hypothesis: A simple hypothesis is a hypothesis that there is a relationship
between two variables. One is called the dependent variable and the other is the
independent variable.
ii. Complex Hypothesis: Complex hypothesis is used when there is a relationship between
the variables. There are more than two dependent and independent variables in this
hypothesis.
iii. Null Hypothesis: The null hypothesis states that there is no significant difference between the
populations compared in the experiment; any observed difference is attributed to experimental
or sampling error. The null hypothesis is denoted by H0.
iv. Alternative Hypothesis: The alternative hypothesis states that the observations are influenced
by some non-random cause rather than by chance alone. It is denoted by H1 or Ha.
FEATURES OF A HYPOTHESIS
The important features of a hypothesis are:
NULL HYPOTHESIS
It is represented by H0. Example: Rohan will win at least Rs.100000 in the lucky draw.
ALTERNATIVE HYPOTHESIS
It is represented by H1 or Ha. Example: Rohan will win less than Rs.100000 in the lucky draw.
PAIRED TESTING
In statistics, a t-test can be represented as a statistical hypothesis test in which the test statistic
supports the Student's t-distribution when the null hypothesis is established. In a paired test, they
compare the means of two observation groups. Observations must be randomly assigned to each of
the two groups so that the difference in response observed is due to the treatment and not to other
factors. With two samples, an observation of one sample can be matched with an observation of the
other sample. This test can be used to make pre-event and post-event observations of a sample. Now,
let's take a closer look at what the paired t-test is, its formula, schedule, and how to perform the
paired t-test.
t = d̄ / (s_d / √n), where d̄ is the mean of the paired differences d = x1 − x2, s_d is the standard
deviation of the differences, and n is the number of pairs.
Answer: Pair testing helps break down barriers, encourages collaboration with new people, and lets
testers bounce ideas off each other's constructive feedback, so that each role better understands
where the other fits in and how it contributes to quality.
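A minimal sketch of the paired t-test computation, using hypothetical pre-event and post-event measurements on the same six subjects:

```python
# Paired t-test sketch: pre/post observations on the same subjects (assumed data).
import math
import statistics

pre  = [72, 75, 68, 80, 77, 71]   # hypothetical pre-event measurements
post = [70, 72, 66, 78, 76, 68]   # hypothetical post-event measurements

d = [a - b for a, b in zip(pre, post)]   # per-subject differences d_i
n = len(d)
d_bar = statistics.mean(d)               # mean of the differences
s_d = statistics.stdev(d)                # standard deviation of the differences

t = d_bar / (s_d / math.sqrt(n))
print(f"t = {t:.3f}")                    # compare with a t-table at df = n - 1
```

The statistic is then compared with the critical value from a t-table with n − 1 degrees of freedom.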
DEFINITION OF P-VALUE
The P-value is short for probability value. It is defined as the probability of obtaining an outcome
that is the same as, or more extreme than, the actual observation. The P-value is the smallest level
of significance at which the null hypothesis would be rejected, and it is used as an alternative to
preselected rejection points. The smaller the P-value, the stronger the evidence in favour of the
alternative hypothesis.
P-VALUE TABLE
Table 4.1: The P-value table shows the hypothesis interpretations
P-value Decision
P-value > 0.05 The result is not statistically significant: do not reject the null hypothesis.
P-value < 0.05 The result is statistically significant: generally, reject the null hypothesis in
favour of the alternative hypothesis.
P-value < 0.01 The result is highly statistically significant: reject the null hypothesis in favour
of the alternative hypothesis.
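The decision rules of Table 4.1 can be expressed as a small helper function; the function name is just for illustration:

```python
# Decision rule from Table 4.1, written as a small helper (illustrative only).
def interpret_p(p_value: float, alpha: float = 0.05) -> str:
    """Map a p-value to the hypothesis decision described in Table 4.1."""
    if p_value < 0.01:
        return "highly significant: reject H0 in favour of Ha"
    if p_value < alpha:
        return "significant: reject H0 in favour of Ha"
    return "not significant: do not reject H0"

print(interpret_p(0.003))  # highly significant
print(interpret_p(0.03))   # significant
print(interpret_p(0.20))   # not significant
```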
T-TEST
The t-test is any statistical hypothesis test in which the test statistic follows a Student’s t distribution
under the null hypothesis. It can be used to determine if two sets of data are significantly different
from each other, and is most commonly applied when the test statistic would follow a normal
distribution if the value of a scaling term in the test statistic were known.
STANDARD DEVIATION
A standard deviation (or σ) is a measure of how dispersed the data is in relation to the mean. Low
standard deviation means data are clustered around the mean, and high standard deviation indicates
data are more spread out. Variance and Standard deviation are the two important topics in Statistics.
It is the measure of the dispersion of statistical data. Dispersion is the extent to which values in a
distribution differ from the average of the distribution. To quantify the extent of the variation, there
are certain measures namely:
i. Range
ii. Quartile Deviation
iii. Mean Deviation
iv. Standard Deviation
The degree of dispersion is calculated by measuring the variation of the data points. In this unit,
you will learn what variance and standard deviation are, their formulas, and the procedure for
finding the values, with examples.
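The four measures of dispersion listed above can be computed with Python's standard statistics module; the dataset is assumed for illustration:

```python
# The four dispersion measures listed above, computed for a small assumed dataset.
import statistics

data = sorted([4, 8, 6, 5, 9, 7, 10, 3])

range_ = max(data) - min(data)                           # i. range
q1, q2, q3 = statistics.quantiles(data, n=4)             # quartiles
quartile_dev = (q3 - q1) / 2                             # ii. quartile deviation
mean = statistics.mean(data)
mean_dev = sum(abs(x - mean) for x in data) / len(data)  # iii. mean deviation
std_dev = statistics.pstdev(data)                        # iv. standard deviation
variance = statistics.pvariance(data)                    # variance = std_dev ** 2
```

Here `pstdev`/`pvariance` treat the data as a whole population; `stdev`/`variance` would give the sample versions.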
DEFINITION OF F-TEST:
In statistics, a test whose test statistic has an F-distribution under the null hypothesis is known
as an F-test. It is used to compare statistical models fitted to the available data set. George W.
Snedecor named the statistic F in honor of Sir Ronald A. Fisher.
F TEST FORMULA
When carrying out an F-test, the steps are:
i. State the null hypothesis and the alternate hypothesis.
ii. Calculate the F-value using the formula. The F-statistic is the ratio of the variance of the
group means to the mean of the within-group variances.
iii. Find the critical value of the F-statistic for this test.
iv. Finally, support or reject the null hypothesis.
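A sketch of the variance-ratio computation at the heart of the F-test, on two made-up samples; the critical value would still be read from an F-table at the chosen significance level:

```python
# F-statistic as a ratio of two sample variances (illustrative sketch; the
# critical value comes from an F-table at the chosen significance level).
import statistics

group_a = [18, 20, 19, 22, 21, 17]   # hypothetical sample 1
group_b = [25, 14, 30, 12, 28, 11]   # hypothetical sample 2 (more spread out)

var_a = statistics.variance(group_a)
var_b = statistics.variance(group_b)

# Convention: put the larger variance in the numerator so F >= 1.
F = max(var_a, var_b) / min(var_a, var_b)
df1 = df2 = len(group_a) - 1
print(f"F = {F:.2f} with ({df1}, {df2}) degrees of freedom")
# If F exceeds the table value, reject H0 (equal variances).
```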
Column-I Column-II
1. Chi-square test a. Test for significant difference of mean values in two small samples when the
population deviation is not known
2. ANOVA (F-test) b. Test for goodness of fit of a distribution
3. Z-test c. Test for significant difference of mean values in two large samples
4. T-test d. Test for significant difference of mean values among more than two sample groups
Answer:
1-b 2-d 3-c 4-a
SUMMARY
The t-test is any statistical hypothesis test in which the test statistic follows a Student's
t-distribution under the null hypothesis; it is most commonly applied when the test statistic would
follow a normal distribution if the value of a scaling term in the test statistic were known.
A test statistic which has an F-distribution under the null hypothesis is called an F-test. To compare
the variances of two different sets of values, the F-test formula is used. To apply the F-distribution
under the null hypothesis, we first need to find the mean of each set of observations and then
calculate the variances.
KEY WORDS
Significance - A quality worthy of attention; Meaning.
Virtue- The quality of being moral or virtuous.
REFERENCES
1. Amrhein, Valentin; Greenland, Sander (2017). Remove, rather than redefine, statistical
significance. Nature Human Behaviour. 2 (1): 0224.
2. Royall R (2004). The Likelihood Paradigm for Statistical Evidence. The Nature of Scientific
Evidence. pp. 119–152.
3. Hubbard R, Bayarri MJ (2003), Confusion Over Measures of Evidence (p′s) Versus
Errors (α′s) in Classical Statistical Testing, The American Statistician, 57(3): 171–178.
4. Goodman, S N (June 15, 1999). Toward evidence-based medical statistics. 1: The P Value
Fallacy. Ann Intern Med. 130 (12): 995–1004.
YOUTUBE VIDEOS
1. https://www.youtube.com/watch?v=KS6KEWaoOOE
2. https://www.youtube.com/watch?v=VK-rnA3-41c
3. https://www.youtube.com/watch?v=Q1yu6TQZ79w
4. https://www.youtube.com/watch?v=ChLO7wwt7h0
WIKIPEDIA
1. https://byjus.com/maths/tests-of-significance/
2. https://byjus.com/maths/what-is-null-hypothesis/
3. https://byjus.com/t-test-formula/
4. https://byjus.com/maths/p-value/
REFERENCE BOOK
1. Crease, Robert P. (2008), The Great Equations, New York: W. W. Norton.
2. Gregory Vlastos, Myles Burnyeat (1994), Socratic studies, Cambridge.
3. Bellhouse, P. (2001), John Arbuthnot", in Statisticians of the Centuries by C.C.
Heyde and E. Seneta, Springer.
4. Reinhart A (2015), Statistics Done Wrong: The Woefully Complete Guide. No Starch
Press.
RES505: Research Methodology Page 224
CREDIT 03-UNIT 02: BIOSTATISTICAL TEST
LEARNING OBJECTIVES
Students should know the specific terminology and notation of statistical analysis.
Students should learn how statistical methods fit into the general scientific process.
Students should learn about testing, especially analysis and its importance.
Students should understand the concept of frequency distribution as these are individual
values on a measurement scale.
Students should be able to organize data into a simple or grouped frequency distribution table.
“Statistics are the triumph of the quantitative method, and the quantitative method is the victory of
sterility and death.” -- Hilaire Belloc
INTRODUCTION
Statistical tests are used in hypothesis testing. They can be used to determine whether a predictor
variable has a statistically significant relationship with an outcome variable, or to estimate the
difference between two or more groups. A statistical test provides a mechanism for making
quantitative decisions about a process or processes. The intent is to determine whether there is
enough evidence to "reject" a conjecture or hypothesis about the process. The conjecture is called the
null hypothesis. Not rejecting may be a good result if we want to continue to act as if we "believe"
the null hypothesis is true. Or it may be a disappointing result, possibly indicating we may not yet
have enough data to "prove" something by rejecting the null hypothesis (Fig. 2.1).
TYPES OF VARIABLES
The types of variables you have usually determine what type of statistical test you can use.
Quantitative variables represent amounts of things (for example, the number of trees in a forest).
Types of quantitative variables include:
i Continuous: represents measurements and can usually be divided into units smaller than one.
ii Discrete: represents counts and usually cannot be divided into units smaller than one.
Categorical variables represent groupings of objects (for example, the different species of trees in
a forest). Types of categorical variables include:
i Ordinal: represents data with an order (e.g. rankings).
ii Nominal: represents group names (e.g. brand or species names).
iii Binary: represents data with a yes/no or 1/0 outcome (e.g. win or lose).
PARAMETRIC TESTING:
Parametric tests generally have more stringent requirements than nonparametric tests and can draw
stronger conclusions from the data. They can only be performed on data that meet the usual
assumptions of a statistical test. The most common types of parametric tests include regression
tests, comparison tests, and correlation tests.
i. Regression tests: regression tests look for cause-and-effect relationships. They can be used to
estimate the effect of one or more continuous variables on another variable.
a Simple linear regression - models the effect of a single predictor variable on an outcome variable.
ii. Comparison tests: comparison tests look for differences among group means; the t-test and
ANOVA are examples.
iii. Correlation tests: correlation tests check whether variables are related without assuming a
cause-and-effect relationship. They can be used, for example, to test whether two variables you want
to use in a multiple regression are autocorrelated. Nonparametric tests make fewer assumptions about
the data and are useful when one or more of the general statistical assumptions are violated.
However, their conclusions are not as strong as those of parametric tests.
The t-test is any statistical hypothesis test in which the test statistic follows a Student's t-
distribution under the null hypothesis. A t-test is the most commonly applied when the test statistic
would follow a normal distribution if the value of a scaling term in the test statistic were known.
When the scaling term is unknown and is replaced by an estimate based on the data, the test statistics
(under certain conditions) follow a Student's t distribution. The t-test can be used, for example, to
determine if the means of two sets of data are significantly different from each other.
Student's-t test (t-test), analysis of variance (ANOVA), and analysis of covariance (ANCOVA)
are statistical methods used in the testing of hypothesis for comparison of means between the groups.
For these methods, the test variable (dependent variable) must be continuously scaled and
approximately normally distributed. The mean is a representative measure of normally distributed
continuous variables, and statistical methods used to compare means are called parametric methods.
For non-normal continuous variables, the median is a representative measure, in which case
comparisons between groups are made using non-parametric methods. Most parametric tests have an
alternative non-parametric test.
There are many statistical tests within the Student's t-test (t-test), ANOVA, and ANCOVA families,
and each test has its own set of assumptions. Although not all of the methods are in common use, most
situations can be handled with the methods that are available. The purpose of this unit is to review
the assumptions, applications, and interpretations of some popular t-test, ANOVA, and ANCOVA methods.
TYPES OF T TESTS
i. A one-sample location test of whether the mean of a population has the value specified in
the null hypothesis.
ii. A two-sample location test of the null hypothesis that the means of the two populations
are equal.
All such tests are commonly called Student's t-tests, although strictly speaking this name should
only be used when the variances of the two populations are also assumed to be equal; the form of the
test used when this assumption is dropped is sometimes referred to as the Welch t-test. These tests
are often referred to as unpaired or independent-samples t-tests because they are typically applied
when the statistical units underlying the two samples do not overlap.
T-TEST FORMULA
T-tests can be performed either manually, using a formula, or through software.
t = (x − µ) / (σ / √n)
Where x is the mean of the sample, µ is the assumed population mean, σ is the standard deviation, and
n is the number of observations.
T-TEST FOR THE DIFFERENCE IN MEAN:
t = (x1 − x2) / √(σ1²/n1 + σ2²/n2)
Where x1 and x2 are the means of the two samples, σ1 and σ2 are the standard deviations of the two
samples, and n1 and n2 are the numbers of observations in the two samples.
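The difference-in-means formula can be sketched directly; the two samples below are assumed for illustration:

```python
# Two-sample t-statistic from the difference-of-means formula (assumed data).
import math
import statistics

sample1 = [23.0, 21.5, 24.1, 22.8, 23.6]  # hypothetical group 1
sample2 = [20.2, 21.0, 19.8, 20.6, 20.9]  # hypothetical group 2

x1, x2 = statistics.mean(sample1), statistics.mean(sample2)
s1, s2 = statistics.stdev(sample1), statistics.stdev(sample2)
n1, n2 = len(sample1), len(sample2)

# t = (x1 - x2) / sqrt(s1^2/n1 + s2^2/n2)
t = (x1 - x2) / math.sqrt(s1**2 / n1 + s2**2 / n2)
print(f"t = {t:.3f}")
```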
ONE SAMPLE T-TEST (ONE-TAILED T-TEST)
i. One sample t-test is a statistical test where the critical area of a distribution is one-sided so
that the alternative hypothesis is accepted if the population parameter is either greater than
or less than a certain value, but not both.
ii. In the case where the t-score of the sample being tested falls into the critical area of a one-
sided test, the alternative hypothesis is to be accepted instead of the null hypothesis.
T-TEST EXAMPLE
If a sample of 10 copper wires is found to have a mean breaking strength of 572 kgs, is it feasible to
regard the sample as a part of a large population with a mean breaking strength of 578 kgs and a
standard deviation of 12.72 kgs? Test at the 5% level of significance.
Taking the null hypothesis that the mean breaking strength of the population is equal to 578 kgs, we
can write:
t = (572 − 578) / (12.72/√10) = −1.49
As Ha is two-sided in the given question, a two-tailed test is to be used for determining the
rejection region at the 5% level of significance, which, using the normal curve area table, comes to:
R: | t | > 1.96
The observed value of t is −1.49, which lies in the acceptance region since | t | < 1.96, and thus H0
is accepted.
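A quick numerical check of this example (sample mean 572 kgs against a hypothesized 578 kgs, σ = 12.72, n = 10):

```python
# One-sample test statistic for the breaking-strength example: sample mean 572 kgs
# tested against a hypothesized population mean of 578 kgs, sigma = 12.72, n = 10.
import math

x_bar, mu, sigma, n = 572, 578, 12.72, 10
t = (x_bar - mu) / (sigma / math.sqrt(n))
print(f"t = {t:.3f}")   # about -1.49, inside |t| <= 1.96, so H0 is not rejected
assert abs(t) < 1.96
```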
T-TEST APPLICATIONS
i. The T-test is used to compare the mean of two samples, dependent or independent.
ii. It can also be used to determine if the sample mean is different from the assumed mean.
iii. T-test has an application in determining the confidence interval for a sample mean.
The chi-square test is performed to determine the difference between a theoretical population
parameter and the observed data.
i. The chi-square test is a non-parametric test: it does not assume that the data are normally
distributed, only that the test statistic follows the chi-square distribution.
ii. The test is commonly used to determine whether a random sample is drawn from a
population with mean µ and variance σ2.
The chi-square test is performed for various purposes, some of which are:
i. The chi-square test can be used as a goodness-of-fit test. It allows us to measure how
closely a theoretical distribution matches the observed distribution.
ii. It also serves as a test of independence, which allows the researcher to determine whether
two population characteristics are related or not.
The chi-square test is symbolically written as χ2, and the formula of chi-square for comparing
observed and expected frequencies is given as:
χ2 = Σ [(Oij − Eij)² / Eij]
Where Oij is the observed frequency of the cell in the ith row and jth column, and Eij is the
corresponding expected frequency.
For the chi-square test to be performed, the following conditions are to be satisfied:
i. Observations should be recorded and collected randomly.
ii. The individual items must all be independent.
iii. The frequency of data in a group should not be less than 10. If it is, the groups should be
rearranged by combining frequencies.
iv. The total number of individual items in the sample should be reasonably large, for example, 50
or more.
v. The constraints on the cell frequencies should be linear and should not involve squares or
higher powers.
CHI-SQUARED DISTRIBUTION
i. The chi-square distribution in statistics is the distribution of a sum of squares of independent
standard normal random variables.
ii. This distribution is a special case of the gamma distribution and one of the most widely used
distributions in statistics.
iii. This distribution is used in the chi-square test to test for goodness of fit or for
independence.
iv. The chi-square distribution also underlies the t-distribution and the F-distribution used for
t-tests and ANOVA.
Answer: The chi-square value is a single number that summarizes all the differences between our
actual data and the expected data when there is no difference. When the actual data and the expected
data (when there is no difference) are the same, the chi-square value is 0. A larger difference results
in a larger chi-square value.
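A goodness-of-fit sketch makes this concrete: observed counts from 60 hypothetical die rolls compared against a fair-die expectation:

```python
# Chi-square goodness-of-fit sketch: observed die rolls vs. a fair-die
# expectation (hypothetical counts).
observed = [8, 12, 9, 11, 10, 10]        # counts for faces 1-6 over 60 rolls
expected = [sum(observed) / 6] * 6       # fair die: 10 rolls expected per face

chi_sq = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
print(f"chi-square = {chi_sq:.2f}")      # compare with the table value, df = 5
# chi_sq is 0 only when observed matches expected exactly; bigger gaps give
# bigger values, as described above.
```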
USES OF BIOSTATISTICS
Here are some of the most important uses of statistics:
1. Interpreting research and drawing conclusions
Statistics are an important part of most sciences, helping researchers test hypotheses, confirm or
reject theories, and draw reliable conclusions. Making sense of the data generated by experiments and
studies is never easy: you have to take chance and uncertainty into account in order to reach
accurate results. Statistical analysis helps reduce or eliminate errors, allowing researchers to draw
reliable conclusions that then guide further research.
2. Meta-Analyses of Literature Reviews
Before a researcher or academic begins a new study, it is common to conduct a comprehensive
literature review of all available published data on a given topic. However, it is always difficult to
draw firm conclusions from multiple studies, especially when the studies follow different research
methods, are published in different journals (resulting in publication bias), or span a long period of
time. Statistical analysis of this study helps to reveal the general truth of this study or reveal hidden
patterns or relationships.
3. Clinical trial design
One of the most important applications of statistical analysis is in the design of clinical trials. When
a new drug or treatment is discovered, it must first be tested in one or more groups of people to
determine its effectiveness and safety. A clinical trial involves selecting the population/sample size,
determining the period over which the treatment should be monitored, designing the phases, and
selecting the parameters to determine whether the treatment is effective or not. Biostatisticians can
perform the task of statistical analysis not only in its design, but also in the analysis and
interpretation of results.
4. Study design
Do people who go to the gym live healthier and happier lives? How safe is New York? How
effective is your HIV education program? These questions cannot be answered without the help of
statistical analysis.
Column-I Column-II
1. T-test a. Likelihood ratio
2. Student t-test b. Means of two group hypothesis tests
3. X2 test c. Null hypothesis
4. Pearson X2 test d. Observation of random set of variables
5. G-test e. Test of goodness of fit
Answer:
1-b 2-c 3-d 4-e 5-a
SUMMARY
KEY WORDS
t-test- The t-test is any statistical hypothesis test in which the test statistic follows a Student's t-
distribution under the null hypothesis.
Chi-square test- A chi-squared test (χ2) is basically a data analysis on the basis of observations of a
random set of variables.
Parametric- Relating to a parameter, a mathematical or statistical variable.
Biostatistics- The application of statistical techniques to scientific research in the biological
sciences.
REFERENCES
1. Nikulin, M. S. (1973), Chi-squared test for normality, Proceedings of the International Vilnius
Conference on Probability Theory and Mathematical Statistics, vol. 2, pp. 119–122.
2. Brink, Susanne; & Van Schalkwyk, Dirk J. (1982), Serum ferritin and mean corpuscular volume as
predictors of bone marrow iron stores, South African Medical Journal, Vol. 61, pp. 432–434.
3. Magidson, Jay; The CHAID approach to segmentation modeling: chi-squared automatic interaction
detection, in Bagozzi, Richard P. (Ed); Advanced Methods of Marketing Research, Blackwell,
Oxford, GB, 1994, pp. 118–159.
4. Boneau, C. Alan (1960). The effects of violations of assumptions underlying the t test.
Psychological Bulletin. 57 (1): 49–64.
YOUTUBE VIDEOS
1. https://www.youtube.com/watch?v=gPt2DubVJQM
2. https://www.youtube.com/watch?v=Pb9-tashUn8
3. https://www.youtube.com/watch?v=0NwA9xxxtHw
4. https://www.youtube.com/watch?v=8uUKkL7qgFk
WIKIPEDIA
1. https://www.kolabtree.com/blog/6-essential-applications-of-statistical-analysis/
2. https://en.wikipedia.org/wiki/Student%27s_t-test
REFERENCE BOOKS
1. Dodge, Yadolah (2008), The Concise Encyclopedia of Statistics. Springer Science & Business
Media.
2. O'Mahony, Michael (1986), Sensory Evaluation of Food: Statistical Methods and Procedures.
CRC Press.
3. Corder, G. W.; Foreman, D. I. (2014), Nonparametric Statistics: A Step-by-Step Approach, New
York: Wiley.
4. Greenwood, Cindy; Nikulin, M. S. (1996), A guide to chi-squared testing, New York: Wiley.
"A judicious man looks on statistics not to get knowledge, but to save himself from having
ignorance foisted on him." --Thomas Carlyle
03-03-01: ANOVA
INTRODUCTION
Professor R. A. Fisher was the first to use the term "variance" and, in fact, it was he who developed
the detailed theory of ANOVA and explained its usefulness in practice. (Note that variance is the
square of the standard deviation.)
There may be variation between samples and also within sample items. ANOVA consists in splitting
the variance for analytical purposes. Hence, it is a method of analyzing the variance to which a
response is subject into its various components corresponding to various sources of variation.
Through this technique one can examine whether different varieties of seeds, fertilizers, or soils
differ significantly, so that a policy decision can be taken accordingly concerning a particular
variety in the context of agricultural research. Similarly, the differences between various types of
feed prepared for a particular class of animal, or various drugs manufactured for curing a specific
disease, may be studied and judged to be significant or not through the application of the ANOVA
technique.
Analysis of variance, or ANOVA, is a strong statistical technique that is used to show the difference
between two or more means or components through significance tests. It also shows us a way to
make multiple comparisons of several populations’ means. The Anova test is performed by
comparing two types of variation, the variation between the sample means, as well as the variation
within each of the samples. The formula below represents the one-way ANOVA test statistic:
F = MST / MSE
Where,
F = ANOVA coefficient
MST = mean sum of squares due to treatment (between groups) = SST / (p − 1)
MSE = mean sum of squares due to error (within groups) = SSE / (N − p)
SSE = Σ (n − 1) s², summed over the groups
p = number of groups, N = total number of observations, n = group size, s² = group variance
F-tests are named after Sir Ronald Fisher. The F statistic is simply the ratio of two variances. The
variance is the square of the standard deviation. For the average person, standard deviations are
easier to understand than variances because they are in the same units as the data, not the squared
units. The F-statistic is based on the mean square ratio. The term "mean square" may sound
confusing, but it is simply an estimate of population variance that uses degrees of freedom (D.F.) to
calculate this estimate.
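The mean-square ratio can be computed by hand for a small assumed dataset of three groups, with degrees of freedom k − 1 between groups and N − k within groups:

```python
# F-statistic as a ratio of mean squares, computed by hand for three assumed groups.
import statistics

groups = [[6, 8, 4, 5, 3, 4], [8, 12, 9, 11, 6, 8], [13, 9, 11, 8, 7, 12]]
k = len(groups)                                   # number of groups (p)
N = sum(len(g) for g in groups)                   # total observations
grand_mean = statistics.mean(x for g in groups for x in g)

# Between-groups: squared deviations of group means about the grand mean
ssb = sum(len(g) * (statistics.mean(g) - grand_mean) ** 2 for g in groups)
msb = ssb / (k - 1)                               # mean square between, D.F. = k-1

# Within-groups: pooled squared deviations about each group's own mean
sse = sum((x - statistics.mean(g)) ** 2 for g in groups for x in g)
mse = sse / (N - k)                               # mean square error, D.F. = N-k

F = msb / mse
print(f"F = {F:.2f} with ({k - 1}, {N - k}) degrees of freedom")
```

The resulting F is then compared with the tabulated F value for (k − 1, N − k) degrees of freedom at the chosen significance level.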
F TEST FORMULA
A test statistic that has an F-distribution under the null hypothesis is called an F-test. It is used
to compare statistical models fitted to the available data sets. George W. Snedecor named the
statistic F in honor of Sir Ronald A. Fisher.
The F-test formula is used to compare the variances of two different sets of values. To apply the
F-distribution under the null hypothesis, we must first find the mean of each set of observations and
then calculate the variances.
(F-Test Calculator is a free online tool that displays the mean, variance and frequency distribution
value for the given data set. Online F-test calculator tool makes the calculation faster, and also it
displays the F value in a fraction of seconds).
Step 1: Enter the set of data values, separated by commas, in the input field
Step 2: Click the "Calculate" button to get the value
Step 3: Finally, the mean, variance, and F-value will be displayed in the output field
F-value = Variance 1 / Variance 2
Use a one-way ANOVA when you have collected data on one categorical independent variable and one
quantitative dependent variable. The independent variable must have at least three levels (i.e., at
least three different groups or categories).
ANOVA tells you whether the dependent variable changes as a function of the level of the
independent variable. For example, your independent variable is social media use, and you assign
groups to low, medium, and high levels of social media use to find out if there is a difference in
hours of sleep per night. Your independent variable is brand of soda, and you collect data on Coke,
Pepsi, Sprite, and Fanta to find out if there is a difference in the price per 100ml.
Your independent variable is type of fertilizer, and you treat crop fields with mixtures 1, 2, and 3
to find out if there is a difference in crop yield.
The null hypothesis (H0) of ANOVA is that there is no difference among group means. The
alternate hypothesis (Ha) is that at least one group differs significantly from the overall mean of the
dependent variable.
ANOVA uses the F-test for statistical significance. This makes it possible to compare multiple means
at the same time, because the error is calculated for the entire set of comparisons rather than for
each pairwise comparison (as is the case in the t-test).
The F-test compares the variation between the group means with the variation within the groups.
When the within-groups variance is less than the between-groups variance, the F-test will find a
higher F-value and therefore a higher probability that the observed difference is real and not due to
chance.
ASSUMPTIONS OF ANOVA
The assumptions of the ANOVA test are the same as the general assumptions for any parametric test:
i Independence of observations: the data have been collected using statistically sound methods
and there is no hidden relationship between the observations. If your data does not meet this
assumption because you have a confounding variable that you need to statistically control for,
use a block variable ANOVA.
ii Normal response variable: The values of the dependent variable follow a normal distribution.
iii Homogeneity of variance: the variance within each group is similar for every group. If the
variances differ markedly between groups, ANOVA is probably not appropriate for the data.
You can perform ANOVA manually, but it is difficult to do with more than a few observations. We
will perform our analysis in the R statistical program because it is free, powerful, and widely used.
For a complete explanation of this ANOVA example, see our guide to performing ANOVA in R.
The sample dataset from our imaginary crop yield experiment contains data about:
For the one-way ANOVA, we will only analyze the effect of fertilizer type on crop yield.
After loading the dataset into our R environment, we can use the command to run an ANOVA. In
this example we will model the differences in the mean of the response variable, crop yield, as a
function of type of fertilizer.
ONE-WAY ANOVA R CODE
The ANOVA output provides an estimate of how much of the variation in the dependent variable can
be explained by the independent variable.
i The first column lists the independent variable along with the model residuals (aka the model
error).
ii The D.F. column displays the degrees of freedom for the independent variable (calculated by
taking the number of levels within the variable and subtracting 1), and the degrees of freedom
for the residuals (calculated by taking the total number of observations minus 1, then subtracting
the number of levels in each of the independent variables).
iii The Sum Sq column displays the sum of squares (a.k.a. the total variation) between the group
means and the overall mean explained by that variable. The sum of squares for the fertilizer
variable is 6.07, while the sum of squares of the residuals is 35.89.
iv The Mean Sq column is the mean of the sum of squares, which is calculated by dividing the sum
of squares by the degrees of freedom.
v The F-value column is the test statistic from the F test: the mean square of each independent
variable divided by the mean square of the residuals. The larger the F value, the more likely it is
that the variation associated with the independent variable is real and not due to chance.
vi The Pr(>F) column is the p-value of the F statistic: the probability of an F value at least as
large as the observed one arising if the null hypothesis of no difference were true.
vii Because the p-value of the independent variable, fertilizer, is significant (p < 0.05), it is likely
that fertilizer type does have a significant effect on average crop yield.
03-03-04: F-TEST
A two-way ANOVA is used to estimate how the mean of a quantitative variable changes according
to the levels of two categorical variables. Use a two-way ANOVA when you want to know how two
independent variables, in combination, affect a dependent variable.
You should use a two-way ANOVA when you’d like to know how two factors affect a response
variable and whether or not there is an interaction effect between the two factors on the response
variable.
For example, suppose a botanist wants to explore how sunlight exposure and watering frequency
affect plant growth. She plants 40 seeds and lets them grow for two months under different
conditions for sunlight exposure and watering frequency. After two months, she records the height of
each plant.
In this case, we have the following variables:
i. Response variable: Plant growth
ii. Factors: Sunlight exposure, watering frequency and we would like to answer the following
questions:
a. Does sunlight exposure affect plant growth?
b. Does watering frequency affect plant growth?
c. Is there an interaction effect between sunlight exposure and watering frequency? (e.g. the
effect that sunlight exposure has on the plants is dependent on watering frequency)
We would use a two-way ANOVA for this analysis because we have two factors. If instead we
wanted to know how only watering frequency affected plant growth, we would use a one-way
ANOVA since we would only be working with one factor.
For the results of a two-way ANOVA to be valid, the following assumptions should be met:
i. Normality – The response variable is approximately normally distributed for each group.
ii. Equal Variances – The variances for each group should be roughly equal.
iii. Independence – The observations in each group are independent of each other and the
observations within groups were obtained by a random sample.
In the table above, we see that there were five plants grown under each combination of conditions.
For example, there were five plants grown with daily watering and no sunlight and their heights after
two months were 4.8 inches, 4.4 inches, 3.2 inches, 3.9 inches, and 4.4 inches:
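That cell mean can be checked directly from the quoted heights; a two-way ANOVA then compares such cell means across all combinations of the two factors:

```python
# Mean height for one cell of the two-way design: the five plants grown with
# daily watering and no sunlight (heights in inches, as quoted above).
import statistics

heights = [4.8, 4.4, 3.2, 3.9, 4.4]
cell_mean = statistics.mean(heights)
print(f"mean height = {cell_mean:.2f} inches")  # 4.14 inches
```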
Answer:
1-True 2-False 3-False 4-True 5-True
KEY WORDS
Correlation- It is a statistical measure that expresses the extent to which two variables are linearly
related.
Summary- It is a brief statement or restatement of main points, especially as a conclusion to a work.
Homogeneity- The quality or state of being of a similar kind or of having a uniform structure or
composition throughout.
Calculated- worked out by mathematical calculation.
YOUTUBE VIDEOS
1. https://www.youtube.com/watch?v=xTpHD5WLuoA
2. https://www.youtube.com/watch?v=nvAMVY2cmok
3. https://www.youtube.com/watch?v=QfVx7AH8rck
REFERENCES
1. Smilde, A. K., Hoefsloot, H. C. and Westerhuis, J. A. (2008), The geometry of ASCA. Journal of
Chemometrics, 22, 464–471.
2. Smilde, Age K.; Jansen, Jeroen J.; Hoefsloot, Huub C. J.; Lamers, Robert-Jan A. N.; van der Greef,
Jan; Timmerman, Marieke E. (2005), ANOVA-simultaneous component analysis (ASCA): a new
tool for analyzing designed metabolomics data, Bioinformatics, 21 (13), 3043-3048.
3. Tiku, M. L. (1971), Power Function of the F-Test Under Non-Normal Situations. Journal of the
American Statistical Association. 66 (336): 913–916.
4. Hart (2001), Mann-Whitney test is not just a test of medians: differences in spread can be important.
BMJ. doi:10.1136/bmj.323.7309.391.
WIKIPEDIA
1. https://www.scribbr.com/statistics/one-way-anova/
2. https://en.wikipedia.org/wiki/Analysis_of_variance
3. https://en.wikipedia.org/wiki/One-way_analysis_of_variance
REFERENCE BOOKS
1. Montgomery, Douglas C. (2001). Design and Analysis of Experiments (5th Ed.). New York: Wiley. Section 3-2.
LEARNING OBJECTIVES
Determine the direction and strength of the linear correlation between the two factors.
Interpret Pearson's correlation coefficient and coefficient of determination and test its significance.
List and explain the three assumptions and three limitations for estimating the correlation
coefficient.
Distinguish between the predictor variable and criterion variable.
Identify each source of variation in regression analysis.
"Regression to the stage of early infancy is not a suitable method in and of itself. A regression can only
be effective if it happens in the natural course of therapy and if the client is able to maintain adult
consciousness at the same time." -- Alice Miller
INTRODUCTION
Regression analysis is a powerful statistical method that allows you to examine the relationship
between two or more variables of interest. While there are many types of regression analysis, they
all examine the influence of one or more independent variables on a dependent variable.
Regression analysis is a mathematical method for determining which of those factors has an effect. It
provides answers to the following questions:
i. Which factors are most important?
ii. Which of these may we disregard?
The simple linear regression model takes the form:
Y = a + bX + ϵ
Where:
Y – dependent variable
X – independent (explanatory) variable
a – intercept
b – slope
ϵ – residual (error)
The most crucial requirement of simple linear regression is that the dependent variable be continuous
(real-valued). The independent variable, on the other hand, can be either continuous or
categorical.
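As a minimal sketch of how the simple model Y = a + bX + ϵ is fitted by ordinary least squares, the following pure-Python example uses made-up (x, y) observations; the slope is b = Σ(xi - x̄)(yi - ȳ) / Σ(xi - x̄)² and the intercept is a = ȳ - b·x̄:

```python
# Hypothetical (x, y) observations; a and b are the least-squares
# estimates of the intercept and slope in Y = a + bX + e.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 4.3, 6.2, 8.1, 9.9]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n

# Slope: sum of co-deviations divided by sum of squared x-deviations
b = (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
     / sum((xi - x_bar) ** 2 for xi in x))
a = y_bar - b * x_bar                       # intercept
residuals = [yi - (a + b * xi) for xi, yi in zip(x, y)]
print(f"fitted line: Y = {a:.2f} + {b:.2f}X")
```

The residuals (the ϵ term) sum to zero by construction, which is a quick sanity check on any hand-rolled fit.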
Multiple linear regression uses the same criteria as simple linear regression. Because of the larger number
of independent variables, multiple linear regression places an extra requirement on the model:
Non-collinearity: the independent variables should not be strongly correlated with one another. If they
were, it would be hard to determine the true relationships between the dependent and independent
variables.
3. Non-linear regression
Nonlinear regression is a form of regression analysis in which data is fitted to a model and then
expressed as a mathematical function.
Simple linear regression connects two variables (X and Y) in a straight line (y = mx + b), whereas
nonlinear regression connects two variables (X and Y) in a nonlinear (curved) relationship.
The goal of the model is to minimize the sum of squares. The sum of squares is a statistic that tracks
how far the Y observations deviate from the nonlinear (curved) function used to predict Y.
In the same way that linear regression modelling aims to graphically trace a specific response from a set of
factors, nonlinear regression modelling aims to do the same.
Because the function is generated by a series of approximations (iterations) that may be dependent on
trial-and-error, nonlinear models are more complex to develop than linear models.
The Gauss-Newton methodology and the Levenberg-Marquardt approach are two well-known approaches
used by mathematicians.
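To make the idea of iterative fitting concrete, here is a minimal Gauss-Newton sketch for the hypothetical model y = a·exp(b·x). The data are generated without noise from a = 2.0 and b = 0.3, so the iteration should recover those values; a production implementation would add safeguards such as step damping, which is essentially what the Levenberg-Marquardt approach provides:

```python
from math import exp

# Noiseless data generated from the hypothetical model y = a * exp(b * x)
# with a = 2.0 and b = 0.3; the iteration should recover these values.
xs = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0 * exp(0.3 * x) for x in xs]

a, b = 1.5, 0.25                      # starting guess
for _ in range(50):
    # Residuals and Jacobian of f(x) = a * exp(b * x) w.r.t. (a, b)
    r = [y - a * exp(b * x) for x, y in zip(xs, ys)]
    ja = [exp(b * x) for x in xs]           # df/da
    jb = [a * x * exp(b * x) for x in xs]   # df/db
    # Normal equations (J^T J) d = J^T r, solved directly for the 2x2 case
    s11 = sum(v * v for v in ja)
    s12 = sum(u * v for u, v in zip(ja, jb))
    s22 = sum(v * v for v in jb)
    g1 = sum(u * v for u, v in zip(ja, r))
    g2 = sum(u * v for u, v in zip(jb, r))
    det = s11 * s22 - s12 * s12
    da = (s22 * g1 - s12 * g2) / det
    db = (s11 * g2 - s12 * g1) / det
    a, b = a + da, b + db               # update the parameter estimates
print(f"a = {a:.4f}, b = {b:.4f}")
```

Each pass refits a linearized version of the model around the current estimates, which is exactly the "series of approximations" the text describes.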
Therefore, the term linear regression often describes multivariate linear regression.
The correlation coefficient for the given data is computed as:
r = Σ(xi - x̄)(yi - ȳ) / √[Σ(xi - x̄)² · Σ(yi - ȳ)²]
Based on the value obtained through this formula, we can determine how strong the association
between the two variables is.
Regression Coefficient
Y = b0 + b1X
Here b0 is a constant and b1 is the regression coefficient. The formula for the regression coefficient is:
b1 = Σ(xi - x̄)(yi - ȳ) / Σ(xi - x̄)²
The observed data sets are given by xi and yi; x̄ and ȳ are the mean values of the respective variables.
We know that there are two regression equations and two coefficients of regression.
byx = r.(σy/σx)
bxy = r.(σx/σy)
Where r is the correlation coefficient, and σx and σy are the standard deviations of x and y, respectively.
iii. They are not independent of a change of scale: the regression coefficients will change if x
and y are multiplied by any constant.
iv. The arithmetic mean of both regression coefficients is greater than or equal to the coefficient
of correlation.
v. The geometric mean between the two regression coefficients is equal to the correlation
coefficient.
vi. If bxy is positive, then byx is also positive and vice versa.
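Properties iv and v above can be checked numerically. The sketch below uses made-up paired data and verifies that the geometric mean of the two regression coefficients equals |r| and that their arithmetic mean is at least r:

```python
from math import sqrt

# Hypothetical paired observations used only to illustrate the identities
x = [1, 2, 3, 4, 5, 6]
y = [2, 1, 4, 3, 7, 5]

n = len(x)
mx, my = sum(x) / n, sum(y) / n
sxx = sum((xi - mx) ** 2 for xi in x)
syy = sum((yi - my) ** 2 for yi in y)
sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))

r = sxy / sqrt(sxx * syy)    # Pearson correlation coefficient
byx = sxy / sxx              # coefficient of y on x, i.e. r * (sigma_y / sigma_x)
bxy = sxy / syy              # coefficient of x on y, i.e. r * (sigma_x / sigma_y)

print(f"r = {r:.4f}")
print(f"geometric mean of byx, bxy  = {sqrt(byx * bxy):.4f}")   # equals |r|
print(f"arithmetic mean of byx, bxy = {(byx + bxy) / 2:.4f}")   # >= r
```

The geometric-mean identity follows directly from the two formulas above: byx · bxy = r².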
Most regression analysis is carried out to support processes in finance. Here are five applications of
regression analysis in the field of finance and related areas:
i. Forecasting:
The most common use of regression analysis in business is for forecasting future opportunities
and threats. Demand analysis, for example, forecasts the number of things a customer is likely to
buy. When it comes to business, though, demand is not the only dependent variable. Regression
analysis can predict significantly more than just direct income. For example, we may predict
the highest bid for an advertisement by forecasting the number of consumers who would pass in front
of a specific billboard.
Insurance firms depend extensively on regression analysis to forecast policyholder
creditworthiness and the number of claims that might be filed in a particular time period.
ii. CAPM:
The Capital Asset Pricing Model (CAPM), which establishes the link between an asset's projected
return and the related market risk premium, relies on the linear regression model. It is also
frequently used in financial analysis by financial analysts to anticipate corporate returns and
operational performance.
The beta coefficient of a stock is calculated using regression analysis. Beta is a measure of return
volatility in relation to overall market risk. Because it represents the slope of the CAPM regression
line, it can be calculated quickly in Excel using the SLOPE function.
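The SLOPE calculation described above can be reproduced in a few lines of Python. The returns below are made-up monthly figures for a hypothetical asset and market index; beta is simply the OLS slope of asset returns regressed on market returns:

```python
# Hypothetical monthly returns (all numbers invented for illustration).
# Beta is the slope of the regression of asset returns on market returns,
# i.e. what Excel's SLOPE() would compute over the same two ranges.
market = [0.010, -0.020, 0.015, 0.030, -0.005, 0.020]
asset  = [0.018, -0.025, 0.020, 0.045, -0.012, 0.024]

n = len(market)
mm = sum(market) / n
ma = sum(asset) / n

# Slope = covariance(asset, market) / variance(market)
beta = (sum((m - mm) * (a - ma) for m, a in zip(market, asset))
        / sum((m - mm) ** 2 for m in market))
print(f"beta = {beta:.3f}")   # beta > 1 means more volatile than the market
```

With these invented figures the asset moves more than one-for-one with the market, so its beta comes out above 1.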
iii. Comparing with competition:
It may be used to compare a company's financial performance to that of a certain counterpart. It
may also be used to determine the relationship between two firms' stock prices (this can be
extended to find correlation between 2 competing companies, 2 companies operating in an
unrelated industry etc).
It can assist the firm in determining which aspects are influencing their sales in contrast to the
comparative firm. These techniques can assist small enterprises in achieving rapid success in a
short amount of time.
iv. Identifying problems:
Regression is useful not just for providing factual evidence for management choices, but also for
detecting judgement mistakes. A retail store manager, for example, may assume that extending
shopping hours will significantly boost sales.
However, regression analysis might suggest that the increase in income isn't enough to cover the increase in
operational cost as a result of longer working hours (such as additional employee labour
charges). As a result, this research may give quantitative backing for choices and help managers
avoid making mistakes based on their intuitions.
v. Reliable source:
Many businesses and their top executives are now adopting regression analysis (and other types of
statistical analysis) to make better business decisions and reduce guesswork and gut
instinct. Regression enables firms to take a scientific approach to management. Both small and
large enterprises are frequently bombarded with an excessive amount of data. Managers may use
regression analysis to sort through this data and base their decisions on evidence rather than guesswork.
03-04-03: TOXICOLOGY
"Toxicology is the scientific study of the adverse effects of chemicals on living organisms". It is the
observation and reporting of symptoms that occur as a result of exposure to toxic substances.
"Toxicology is a field of science that helps us understand the harm chemicals, substances or situations
can cause to people, animals, and the environment."
About 35 years ago, T. A. Loomis divided the science of toxicology into three main sub-disciplines:
environmental, economic, and forensic. These divisions are largely based on how humans are exposed
to potentially harmful chemicals.
TOXICITY TESTING
Toxicological testing, also called "safety assessment" or "toxicity testing", is the process of determining
the extent to which a substance of interest adversely affects the normal biological functions of an
organism, given a specific time of exposure, route of exposure, and concentration of the substance.
Toxicological testing can be done by chemical characterization, route of toxicity, target testing and
dose-related extrapolation, etc. Substances are tested using a variety of methods, including topical
application, inhalation, oral administration, injection, or water.
SPECIALIZATION IN TOXICOLOGY
More information on each method can be found below, including EURL ECVAM recommendations,
in:
i. Alternative Methods Tracking System for Regulatory Acceptance (TSAR)
ii. Core service database on alternative methods for animal testing (DB-ALM)
The new Directive 2010/63/EU further strengthens the role of ECVAM and its mandate. Its duties and
tasks are determined as follows (7):
i. Coordinate and promote the development and use of alternatives to procedures, including in
the areas of basic and applied research and regulatory testing
xi Skin irritation: irritants lead to a reversible local inflammatory response of the skin caused by the
innate (non-specific) immune system of the affected tissue.
xii Skin sensitization: Skin sensitization is a regulatory endpoint used to identify chemicals that
may cause an allergic reaction in sensitive individuals.
xiii Toxicokinetics: Toxicokinetics describes how the body handles chemicals as a function of dose
and time, according to the concept of ADME (Absorption, Distribution, Metabolism and
Elimination).
03-04-04: DETOXIFICATION
The word “toxic” goes back to ancient Greek: “Toxon” means “bow” and archers often used poisoned
arrows.
Detoxification is defined as "a live organism's elimination of hazardous chemicals through physiological or pharmacological means".
Additionally, it can be used to describe the time during drug withdrawal when an organism regains
equilibrium following a prolonged usage of an addictive chemical. Decontamination of toxin ingestion,
the use of antidotes, as well as procedures like dialysis and chelation therapy, are all ways that
detoxification can be accomplished in medicine.
Intoxication
Chemical, biological, physical, radioactive, and behavioral toxicity are the five main categories. Parasites
and pathogenic microbes are harmful. Being under the influence of one or more psychoactive substances
is known as intoxication. It can also be used to describe the results of consuming poison or excessive
amounts of generally safe drugs.
TYPES OF DETOXIFICATION
SYMPTOMS OF INTOXICATION
Specific symptoms of intoxication may vary depending on the substance that was ingested. However,
some of the common symptoms of alcohol intoxication include:
i. Ataxia: Ataxia is a condition that impairs walking. A drunk individual can have trouble walking
straight or keep falling over.
ii. Confusion and drowsiness: People who are intoxicated experience acute weariness and
disorientation.
iii. Euphoria: When under the influence, people may feel happy, talk a lot, and act in ways they
wouldn't usually do.
iv. Loss of inhibitions: Even a few drinks may cause people to feel more at ease, vulnerable, and
uninhibited.
v. Poor judgement: Being intoxicated can cause people to make poor choices and take risks, such
as driving while intoxicated.
vi. Speech issues: Other common signs of intoxication include slurred speech and other speech
disorders.
vii. Vomiting: People who are intoxicated may vomit as their body attempts to recover.
PROCESSES OF DETOXIFICATION
i. Evaluation: Upon beginning drug detoxification, a patient is first tested to see which specific
substances are presently circulating in their bloodstream and the amount. Clinicians also evaluate
the patient for potential co-occurring disorders, dual diagnosis, and mental/behavioral issues.
ii. Stabilization: In this stage, the patient is guided through the process of detoxification. This may
be done with or without the use of medications but for the most part the former is more common.
Also, part of stabilization is explaining to the patient what to expect during treatment and the
recovery process.
iii. Guiding the patient into treatment: The last step of the detoxification process is to ready the
patient for the actual recovery process, as drug detoxification only deals with the physical
side of dependency and addiction; further treatment must address its psychological and behavioral aspects.
We call the process of eliminating toxins, “detoxication” or “detoxification,” which is the opposite
of “intoxication.” Different tissues detoxify in varying ways.
a. Lungs- can detoxify by removing gases (gas anesthetics are removed from the body by the
lungs).
A variety of “detoxification” diets, regimens, and therapies sometimes called “detoxes” or “cleanses” have
been suggested as ways to remove toxins from your body, lose weight, or promote health. These include:
i. Fasting
ii. Drinking only juices or similar beverages
iii. Eating only certain foods
iv. Using dietary supplements or other commercial products
v. Using herbs
vi. Cleansing the colon (lower intestinal tract) with enemas, laxatives, or colon hydrotherapy (also
called “colonic irrigation” or “colonics”)
vii. Reducing environmental exposures
viii. Using a sauna.
Answer:
1-d 2-c 3-c 4-d 5-d
1. Correlation can be seen when two sets of data are graphed on a scatter plot, which is a graph with
an X and Y axis. (True/False)
Answer:
1-True 2- True 3-True 4-False 5-False
1-relationship 2-Positive 3-relationships 4-variables 5-Karl Pearson 6-Auguste Bravais
SUMMARY
Toxicology is a field of science that helps us understand the harm that chemicals, substances or
situations can cause to people, animals, and the environment. Toxicologists study the interactions of
chemicals with plants, animals, and humans to determine the effects of chemicals and assess the safety
of compounds.
“It is the physiological or medicinal removal of toxic substances from a living organism.” Additionally, it
can refer to the period of drug withdrawal during which an organism returns to homeostasis after long-
term use of an addictive substance. In medicine, detoxification can be achieved by decontamination of
poison ingestion and the use of antidotes as well as techniques such as dialysis and chelation therapy.
There are generally five types of toxicity: chemical, biological, physical, radioactive and behavioral. Disease-
causing microorganisms and parasites are toxic. Intoxication is the state of being affected by one or more
psychoactive drugs. It can also refer to the effects caused by the ingestion of poison or by the overconsumption
of normally harmless substances.
KEY WORDS
Forecasting - Forecasting is a technique that uses historical data as input to make informed guesses
that predict the direction of future trends.
Reliable - Consistently good in quality or performance; believable
REFERENCES
1. Russell MAH, Cole PY, Idle MS, Adams L. Carbon monoxide yields of cigarettes and their
relation to nicotine yield and type of filter. BMJ 1975; 3:713.
2. Bland JM, Altman DG. Statistical methods for assessing agreement between two methods of
clinical measurement. Lancet 1986; i:307-10.
3. Stulp, Freek, and Olivier Sigaud. Many Regression Algorithms, One Unified Model:
A Review. Neural Networks, vol. 69, Sept. 2015, pp. 60–79.
4. Armitage P, Berry G. In: Statistical Methods in Medical Research, 3rd Ed. Oxford:
Blackwell Scientific Publications, 1994:312-41.
YOUTUBE VIDEOS
1. https://www.youtube.com/watch?v=Tamoj84j64I
2. https://www.youtube.com/watch?v=K6kP9xmOrtk
3. https://www.youtube.com/watch?v=funYuQhPlmc
WIKIPEDIA
1. https://byjus.com/maths/correlation-and-regression/
2. https://www.cuemath.com/data/correlation-and-regression/
3. https://en.wikipedia.org/wiki/Geographical_indication
4. https://en.wikipedia.org/wiki/Toxicity
REFERENCE BOOKS
"Literature always anticipates life. It does not copy it but molds it to its purpose. The nineteenth
century, as we know it, is largely an invention of Balzac."-- Oscar Wilde
INTRODUCTION
The Literature Cited (bibliography) is found at the end of your paper and contains the complete
reference for each of the in-text citations used in your paper. Generally, a citation includes the
author(s), date, title and source of your publication. You should pay careful attention to details of
formatting when you write your own. For papers published in journals you must provide the date,
title, journal name, volume number, and page numbers. For books you need the publication date,
title, publisher, and place of publication.
ii. Learn about the information sources and the research methodologies
vii. Highlight the strengths, weaknesses and controversies in the field of study
The depth and scope of a literature review depend on many factors, most notably the purpose and
audience of the review. For example, if you are writing a literature review to help you write your
dissertation, a very comprehensive review may be required, covering all relevant literature on your
topic as well as relevant sources beyond those readily available for free.
ii Editor's evaluation: The journal checks the structure and formatting of the paper against the
journal's author guidelines to ensure it includes the required sections and follows the required
style. The quality of the paper is not evaluated at this point.
iii Review by the Editor-in-Chief (EIC): The EIC checks whether the paper is suitable for the journal and
whether it is sufficiently original and interesting. If not, the paper may be rejected without
further review.
iv The EIC assigns an Associate Editor (AE): Some journals have associate editors who manage the
peer review process. If so, one is assigned at this stage.
v Invitations to reviewers: The handling editor sends invitations to individuals he or she
considers appropriate reviewers. As responses come in, further invitations are
issued if necessary until the required number of acceptances (usually two) is reached, with
some variation from journal to journal.
vi Responding to invitations: Potential reviewers weigh invitations against their expertise,
conflicts of interest, and availability, and then accept or decline. When declining, they may
also suggest alternative reviewers.
vii Under review: Reviewers read the paper several times. A first reading
gives an initial impression of the work. If serious problems are identified at this stage,
the reviewer may reject the paper without further work. Otherwise, they read the paper a few more
times, taking notes in order to build a detailed point-by-point review. The review is then submitted to
the journal with a recommendation to accept or reject the paper, or with a request for
revision (usually flagged as either major or minor) before the paper is reconsidered.
viii Journal evaluates the reviews: The handling editor considers all returned reviews before making
an overall decision. If the reviews differ significantly, the editor may invite additional
reviewers for further input before deciding.
ix You will be notified of the decision: The editor sends the author a decision email containing
the relevant reviewer comments. Whether comments are anonymous or not depends on the type
of peer review the journal conducts.
x Next steps: At this point, reviewers should also receive an email or letter notifying them of the
outcome of their review. When a paper is returned for revision, reviewers should expect to receive
the new version, unless they have opted out of further participation. However, if only minor changes were
requested, the follow-up review may be performed by the handling editor.
CITATION TYPES
There are many different types of citations, but they typically use one of three basic approaches:
quoting in parentheses, quoting numbers, or quoting footnotes. The most obvious identifying
feature of any citation style is the way the citations are presented in the text (Fig.2.1).
Economics: Harvard
Psychology: APA
TYPES OF REPORTS: Here are some important types of research reports:
i Technical report: In a technical report, the focus is on the methods used, the assumptions
made during the research, and a detailed presentation of the results, including their
limitations and supporting data. For example, a technical report might be prepared when a hotel
project is being conceptualized.
ii Formal or informal report: A well-structured formal report is carefully written, objectively
clear, organized, and detailed enough to allow the reader to understand the concepts. Formal reports are
written impersonally, while an informal report can be direct and concise, using
informal language, e.g. inter-departmental communication such as an announcement or a memo.
iii Popular report: The strength of this report is its simplicity and attractiveness. Simplification
is achieved through clear writing, minimal technical detail (especially mathematical detail),
and heavy use of tables and diagrams. Attractive layouts with large print, many subheadings, and
even an occasional illustration are further features of the popular report.
SCHEMATICS
Diagrams help identify the key elements of a system or process. They should highlight only key
elements, as adding unimportant ones clutters the image. A diagram consists only of
drawings chosen by the author, which provides a degree of flexibility not available with photographs. Diagrams can
also be used in situations where photography is difficult or impossible. Below is a diagram
explaining how to use nanotubes to capture energy from liquids.
"A project proposal is a written document outlining everything stakeholders should know about a
project, including the timeline, budget, objectives, and goals. Your project proposal should
summarize your project details and sell your idea so stakeholders feel inclined to get involved in the
initiative." Project proposals are an extremely important aspect of a project. A proposal must be properly
structured and must contain all necessary and pertinent information regarding the project.
The aim of the project is to create a good product and report; the software, hardware, theory, etc.
that you have developed during the project are only a means to this end. Design documents should be
orderly, well presented, and consistently formatted: such a document is easy to read and suggests a
careful and professional attitude towards preparation. Remember that quantity does not automatically
guarantee quality. A 150-page report is not twice as good as a 75-page report, and a 10,000-line
implementation is not twice as good as a 5,000-line one. Conciseness, clarity, and elegance are
invaluable qualities in report writing, as well as in programming, and will be rewarded. Try to ensure
that your report contains the following (exact structure, chapter titles, etc. are up to you):
TITLE PAGE
This must include the title of the project and the author's name. You can also provide the
name of your supervisor if you wish.
EXECUTIVE SUMMARY
The Executive Summary is a very brief summary of the content of the report; it should be about half
a page. Someone unfamiliar with your project should have a good idea of it after reading
the summary alone and will know whether it interests them.
ACKNOWLEDGMENTS
It is generally a good idea to thank people who have provided exceptionally helpful support,
technical or otherwise, throughout your project. Your supervisor will obviously be happy to be
recognized as they will invest a lot of time monitoring your progress.
TABLE OF CONTENTS
This page should list the main chapters and (sub) sections of your report. Choose meaningful chapter
and section titles and use double-spaced lines for clarity. If possible, you should include a page
number indicating the beginning of each chapter/section. Try to avoid too many levels of subheadings;
three is enough.
INTRODUCTION
This is one of the most important elements of the report. It should begin with a clear statement of the
project's purpose so that the nature and scope of the project can be understood by the casual reader. It
should summarize everything you want to achieve, providing a clear summary of the project's
background, relevance, and key contributions. The introduction should set the scene for the project
and provide the reader with a summary of the main things to look for in the rest of the report. When
detailing contributions, it is helpful to provide hints to the section(s) of the report that provide
relevant technical details. The introduction itself is largely non-technical. It is helpful to state the
main goals of the project as part of the introduction. However, avoid the temptation to list low-level
goals one after another in the introduction and then, in the evaluation (see below), make a claim such as
"All of the project's goals were achieved...".
CONTEXT
The context section of the report should situate the project in its wider context and survey
proposed alternatives for achieving the project's objectives. Background can be included as part of an
introduction but is usually better in a separate chapter, especially if the project involves a large
amount of background work. When referring to other works, cite the sources in which they are
mentioned or used, rather than just listing them at the end.
BODY OF THE PROPOSAL
The central part of the report usually consists of three or four chapters detailing the engineering
work carried out on the project. The structure of these chapters depends a lot on the project. They
may reflect the chronological development of the project, e.g. design, implementation, testing,
optimization, evaluation, etc. If you've built new software, you should describe and demonstrate your
program design at a high level, possibly using an approved graphical form such as UML. It should
also document any issues or interesting features in your implementation. Integration and testing are
also important to discuss in some cases. You should thoroughly discuss the contents of these sections
with your supervisor.
CONCLUSIONS OF WORK
The project conclusion should list what has been learned as a result of the work you did. For
example, "The use of overloading in C++ provides a very elegant mechanism for parallelization
throughout sequential programs". Avoid tedious personal reflections like "I've learned a lot about
C++ programming..." Usually, end a report by listing ways to go further in the project. For example,
this could be a plan to improve the project if you had the opportunity to rework it, turning the
project's deliverables into a more polished final product.
APPENDICES
Appendices contain information outside the main body of the report. This often includes
items such as sections of code, tables, test cases, or other documentation that would break the flow of
the text if it appeared in place. You should try to bind all your documents into one volume to
create the black book.
PROGRAM LIST
A complete program list is NOT included in the report, except in specific cases requested by your
supervisor. We highly recommend spending time looking at student reports from previous projects to
get an idea of what's good and bad. All reports from previous years are available in hard copy in the
CCCF and electronically in the student projects section. These documents are only accessible from
the TIFR IP domain.
"A project report is simply a document that provides detail on the overall status of the project or
specific aspects of the project's progress or performance". Regardless of the type of report, it is made
up of project data based on economic, technical, financial, managerial or production aspects.
Depending on the project and organizational processes, additional project reports with in-depth
analysis and recommendations may also be required at the end of the project. Report writing is a
useful opportunity to evaluate a project, document lessons learned, and enrich your organization's
knowledge base for future projects. Try these steps to write better project reports.
ii Understand your audience: Writing a formal annual report for your stakeholders is very
different from a financial review. Adjust language, data usage, and supporting graphics for your
audience. It is also helpful to consider the reader's personal communication style, for example,
how do they write emails or structure documents? Reflect their preferences as much as possible.
You may need to develop a more formal or informal tone than your own natural style. Applying
this technique will build rapport and make your readers more receptive to your ideas.
iii Format and type of report: Before you begin, check the format and type of the report. Do you
need to submit a written report or make a presentation? Do you need to write a formal, informal,
financial, annual, technical, investigative or problem-solving report? You should also confirm if
templates are available in the organization. Checking out these details can save you time later!
iv Gather facts and data: Include interesting facts and data that will strengthen your argument.
Get started with your collaborative project site and work as needed. Remember to cite sources
such as articles, case studies, and interviews.
DISCUSSION WRITING
INTRODUCTION (DISCUSSION WRITING)
The purpose of the discussion is to interpret and describe the significance of your findings in
light of what was already known about the research problem being investigated, and to explain any
new understanding or fresh insights about the problem after you've taken the findings into
consideration. The discussion will always connect to the introduction by way of the research
questions or hypotheses you posed and the literature you reviewed, but it does not simply repeat or
rearrange the introduction; the discussion should always explain how your study has moved the
reader's understanding of the research problem forward from where you left them at the end of the
introduction.
This section is often considered the most important part of a research paper. It most effectively
demonstrates your ability as a researcher to think critically about an issue, to develop creative
solutions to problems based on the findings, and to formulate a deeper, more profound understanding
of the research problem you are studying (Fig.4.4).
i. The discussion section is where you explore the underlying meaning of your research, its
possible implications in other areas of study, and the possible improvements that can be made
in order to further develop the concerns of your research.
ii. This is the section where you need to present the importance of your study and how it
may be able to contribute to and/or fill existing gaps in the field. If appropriate, the discussion
section is also where you state how the findings from your study revealed new gaps in the
literature that had not been previously exposed or adequately described.
I. General Rules
These are the general rules you should adopt when composing your discussion of the results:
i. Do not be verbose or repetitive.
ii. Be concise and make your points clearly.
iii. Avoid using jargon and follow a logical stream of thought.
iv. Use the present verb tense, especially for established facts; however, refer to specific works
and references in the past tense.
v. If needed, use subheadings to help organize your presentation or to group your interpretations
into themes.
II. The Content
The content of the discussion section of your paper most often includes:
i. Explanation of results: comment on whether or not the results were expected and present
explanations for the results; go into greater depth when explaining findings that were
unexpected or especially profound. If appropriate, note any unusual or unanticipated patterns
or trends that emerged from your results and explain their meaning.
ii. References to previous research: compare your results with the findings from other studies,
or use the studies to support a claim. This can include revisiting key sources already cited in
your literature review section, or saving them to cite here in the discussion section if they
matter more as comparisons with your results than as part of the general background and context
you cited earlier.
iii. Deduction: a claim for how the results can be applied more generally. For example,
describing lessons learned, proposing recommendations that can help improve a situation, or
recommending best practices.
iv. Hypothesis: a more general claim or possible conclusion arising from the results [which may
be proved or disproved in subsequent research].
III. Organization and Structure
Keep the following sequential points in mind as you organize and write the discussion section of
your paper:
i. Think of your discussion as an inverted pyramid. Organize the discussion from the general to
the specific, linking your findings to the literature, then to theory, then to practice [if
appropriate].
The term derives from the Greek biblion (book) and graphia (writing), via the neo-Latin
bibliographia. Greek writers of the first three centuries of our era used it to mean the copying
of books by hand.
DEFINITION
A bibliography is the list of sources a work's author used to create the work. It accompanies just
about every type of academic writing, like essays, research papers, and reports.
A bibliography is a list of all the sources you have used (whether referenced or not) in the course of
researching your work.
In general, a bibliography should include: names of authors, names of works, names and locations of
companies that have published copies of sources. The bibliography must clearly and fully describe
the sources used to prepare the report. This is an alphabetical list by last name of the authors.
TYPES OF BIBLIOGRAPHY
Carter and Barker describe bibliography as a twofold scholarly discipline: the organized listing of books
(enumerative bibliography) and the systematic description of books as physical objects (descriptive
bibliography). These two distinct concepts and practices have different rationales and serve different purposes.
i. An enumerative bibliography
This is a systematic listing of books and other works such as journal articles. Enumerative
bibliographies range from "works cited" lists at the end of books and articles to comprehensive,
independent publications. A notable example of a comprehensive, independent publication is
Gow's A. E. Housman: A Sketch, together with a List of His Writings and Indexes to His Classical
Papers (1936). As separate works, they may appear in bound volumes or other published formats.
Bibliography Format for a Book: A standard bibliography for a book typically consists of the
following information:
i. Author(s)
ii. Title
iii. Publisher
iv. Date of Publication
Example: Surname of author, name or two initials; title taken from the title page (underlined or
in italics); edition (if more than one); volume (if more than one); place of publication;
publisher; date on title page or copyright date. e.g. Kothari, C.R., Research Methods and
Techniques, 1989, New Delhi: Wiley Eastern Limited, 4835/24 Ansari Road, Daryaganj, New Delhi 110 006.
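The book format above can be sketched as a small helper function. The function name, parameter names, and the exact punctuation are illustrative assumptions for this sketch, not a rule from any citation style guide.

```python
def format_book_entry(surname, initials, title, year, place, publisher, edition=None):
    """Assemble a book bibliography entry in the order listed above:
    author, title (italics shown here as *...*), edition, year, place, publisher."""
    parts = [f"{surname}, {initials},", f"*{title}*,"]
    if edition:
        parts.append(f"{edition} ed.,")   # only included when there is more than one edition
    parts.append(f"{year},")
    parts.append(f"{place}: {publisher}.")
    return " ".join(parts)

entry = format_book_entry("Kothari", "C.R.", "Research Methods and Techniques",
                          1989, "New Delhi", "Wiley Eastern Limited")
print(entry)
# → Kothari, C.R., *Research Methods and Techniques*, 1989, New Delhi: Wiley Eastern Limited.
```

A real bibliography tool would also handle multiple authors and volume numbers; this sketch only covers the minimal fields listed above.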
Bibliography Format for a Periodical & Journal Article: An entry for a journal or periodical
article contains the following information:
i. Author(s)
ii. Article Title
iii. Journal Title
iv. Volume Number
v. Pages
vi. Date of Publication
Bibliography Format for Internet Sources: Format for internet sources usually includes the
following information:
i. Author (Website)
ii. Article Title
iii. Publication Information
iv. Version
1) What is a bibliography?
Answer: The term bibliography is used to refer to the list of sources (e.g. books, articles, websites) used
to write an assignment (e.g. an essay). It usually includes all sources referenced even if they are not
directly cited (mentioned) in the assignment.
Answer:
1-d 2-a 3-b 4-c 5-d
1. APA and MLA are the most common styles to use. (True/False)
2. Formatting of report writing also makes easy to writer to write. (True/False)
3. Vague research question and going off-topic is better. (True/False)
4. Corrected proofs are articles in press that contain the author's corrections. (True/False)
5. Online proofing is the process of sharing content for feedback and approval. (True/False)
Column-I Column-II
1. Introduction a. The last part of something, its end or result, summarized
2. Methodology b. A systematic investigation designed to develop a knowledge
3. Result c. A body of methods, rules, and postulates employed
4. Discussion d. Something that happens or exists because of something else
5. Conclusion e. A conversation or debate about a specific topic
Answer:
1-b 2-c 3-d 4-e 5-a
SUMMARY
This section is often considered the most important part of a research paper because it most effectively
demonstrates your ability as a researcher to think critically about an issue, to develop creative solutions
to problems based on the findings, and to formulate a deeper, more profound understanding of the
research problem you are studying. If appropriate, the discussion section is also where you state how the
findings from your study revealed new gaps in the literature that had not been previously exposed or
adequately described.
KEY WORDS
REFERENCES
1. Lawrence, Amanda (2018), Chan, Leslie; Loizides, Fernando (Ed). Influence Seekers: The
Production of Grey Literature for Policy and Practice. Information Services & Use. 37 (4): 389–403.
2. Kukull, W. A.; Ganguli, M. (2012), Generalizability: The trees, the forest, and the low-hanging
fruit. Neurology. 78 (23): 1886–1891.
3. Canagarajah, A. Suresh (1996), From Critical Research Practice to Critical Research Reporting.
TESOL Quarterly. 30 (2): 321–331.
4. Gauch, Jr., H.G. (2003), Scientific method in practice. Cambridge, UK: Cambridge University
Press. 2003 ISBN 0-521-81689-0.
YOUTUBE VIDEOS
1. https://www.youtube.com/watch?v=-ny_eujxhhs
2. https://www.youtube.com/watch?v=cMJWtNDqGzI
3. https://www.youtube.com/watch?v=3iE4WAjaPE0
WIKIPEDIA
1. https://www.brightwork.com/blog/7-steps-effective-report-writing
2. https://www.scribbr.com/category/research-paper/
3. https://library.sacredheart.edu/c.php?g=29803&p=185933
REFERENCE BOOKS
“Stealing music is not right, and I can understand people being very upset about their intellectual
property being stolen.” --- Steve Jobs
INTRODUCTION
Intellectual Property: “Intellectual Property (IP) refers to creations of the mind, such as
inventions; literary and artistic works; designs; and symbols, names and images used in commerce”.
Intellectual Property Rights (IPR) refers to the legal rights given to the inventor or creator to
protect his invention or creation for a certain period of time. These legal rights confer an
exclusive right to the inventor/creator or his assignee to fully utilize his invention/creation for a
given period of time.
Intellectual property rights include patents, copyright, industrial design rights, trademarks, plant
variety rights, trade dress, geographical indications, and in some jurisdictions trade secrets.
There are also more specialized or derived varieties of sui generis exclusive rights, such as circuit
design rights, supplementary protection certificates for pharmaceutical products and database rights.
The term "industrial property" is sometimes used to refer to a large subset of intellectual property rights.
NEED OF IPR
The progress and happiness of mankind depends on the ability to create and invent new works in the
fields of technology and culture. The basic needs are: -
i. Encourage innovation: Legal protection of new creations encourages a commitment to
providing additional resources for new creations,
ii. Economic growth: The promotion and protection of intellectual property stimulates
economic growth, creates jobs, new industries, improves the quality and enjoyment of life,
iii. Protecting Creators' Rights: Intellectual property rights are required to protect creators and
other producers of their intellectual products, goods, and services by granting them certain
time-limited rights,
OBJECTIVES OF IPR
i. Outreach and Promotion - To create public awareness about the economic, social and
cultural benefits of IPRs among all sections of society.
Fig.4.2.2: Patent
The word patent originates from the Latin ‘patere’, which means "to lay open". It is a shortened
version of the term letters patent, which was an open document or instrument issued by a monarch or
government granting exclusive rights to a person, predating the modern patent system. Similar grants
included land patents, which were land grants by early state governments.
A patent is often referred to as a form of intellectual property right, an expression which is also
used to refer to trademarks and copyrights, and which has proponents and detractors. Some other types
of intellectual property rights are also called patents in some jurisdictions: industrial design rights are
called design patents, plant breeders' rights are sometimes called plant patents, and utility models are
sometimes called petty patents or innovation patents. Particular species of patents for inventions include
biological patents, business method patents, chemical patents and software patents.
Under the World Trade Organization's (WTO) TRIPS Agreement, patents should be available in
WTO member states for any invention, in all fields of technology, provided they are new, involve an
inventive step, and are capable of industrial application.
There are variations on what is patentable subject matter from country to country, also among WTO
member states. TRIPS also provides that the term of protection available should be a minimum of 20
years.
Types of patents: There are three types of patents. Such as: -
i Utility patents may be granted to anyone who invents or discovers any new and useful
process, machine, article of manufacture, or composition of matter,
ii Design patents may be granted to anyone who invents a new, original, and ornamental design
for an article of manufacture; and
iii Plant patents may be granted to anyone who invents or discovers and asexually reproduces
any distinct and new variety of plant. Utility and plant patents last for 20 years from the date of
application filing.
COPYRIGHTS
According to the United States Patent and Trademark Office, "original works of authorship" are
protected by copyrights, which are legal protections for creative works of the mind. These include works
of visual art, literary works, other writings, motion pictures, and software. Copyrights prevent others
from copying the work without the express consent of the copyright owner.
Copyrights, like other forms of intellectual property, are granted for a predetermined period of time,
allowing the owner to profit from their creation. Copyrights are granted for a maximum term of 70 years
from the death of the creator; exceptions apply to works made for hire and anonymous works.
The Trade Marks Registry was established in India in 1940 and presently administers the Trade Marks Act, 1999.
GEOGRAPHICAL INDICATIONS
India, as a member of the WTO, enacted the Geographical Indications of Goods (Registration and Protection)
Act, 1999. It entered into effect on September 15, 2003. Geographical Indications are defined
under Article 22(1) of the WTO Agreement on TRIPS.
INDUSTRIAL DESIGNS
From a legal point of view, an industrial design is the ornamental aspect of an article. Two-dimensional
features, such as patterns, lines, or colors, or three-dimensional features, such as an article's shape, can make up
an industrial design.
3) What is a trademark?
Answer: A trademark is a sign capable of distinguishing the goods or services of one enterprise from
those of other enterprises. Trademarks are protected by intellectual property rights.
Patent Applications can either be provisional or complete. Both these filings serve different purposes in
the patent application process. Provisional patent filing establishes patent rights over a product which is
yet to be developed and helps claim an early filing date. The status of the Patent after Provisional patent
filing remains “Pending.” A 12-month window is further provided to develop the invention before a
complete patent application can actually be filed.
It is tempting to think all searching can be done electronically and for the majority of modern patents
(published after 1975) this is essentially true. Patent searchers, especially inventors who need to
thoroughly search the entire realm of patents to ensure their idea hasn't already been patented, have more
limited options available electronically and for free. Pre-1976 U.S. patents are often difficult to find
because the patent pages were put into the USPTO database as scanned images, and full text searching
was added later through machine transcription. Older patents from outside the U.S. can be even more
challenging to find. Below you'll find some basic tips and strategies for locating patents.
Doing a preliminary patent search is an important first step for inventors hoping to patent their new
invention. The following tutorial, produced by the USPTO, details a patent-searching process that can be
adapted for just about any free patent search tool.
Keyword searching using free patent search tools may give you an idea of what is out there (or not,
depending on which terms you use). You can also use this strategy to identify classification codes,
inventor names, and other information you can then use to run additional searches. Keyword searching
should not be your primary or only patent searching strategy if you are conducting a preliminary patent
search or are doing your own prior art search. As: -
i The Lens: The Lens covers over 100 million patent documents from around the world. Includes
classification searching and quick access to patent family information.
ii Google Patents: Quick keyword searching for US and other patents. Full text of older patents
may have issues related to automated character recognition from scanned patent image files.
The patent number is the key to the entire patent information system. It doesn't matter when or
where a patent was issued; as long as you know the number, you can get the full-text patent
in no time. Most free patent search websites will let you type in a U.S. patent number and get a
PDF version of that patent. Some search for patents from other countries as well.
If you know the name of the inventors, owners, or assignees, you can search using Lens and
Espacenet respectively. These search tools allow you to narrow down your search to certain fields
(e.g., assignee name, inventor name, owner name, etc.) in the full text of the patent. This allows
DRAFTING OF PATENT
Patent drafting is part of how an idea is patented and is the process of writing a description and making a
patent claim. This is at the heart of every patent application. When a patent is granted or licensed, the draft
serves as the specification portion of the document.
As a first step, your patent attorney asks you to enter into an invention disclosure agreement. This allows
you to communicate your invention in sufficient detail so that the attorney understands the invention. At
this point, your attorney begins drafting a patent application that begins with the design statements.
Once your attorney has accurately grasped the scope of the invention in the draft claims, the inventor or
drafter begins to prepare all the drawings necessary to help explain the claims. In some
cases, drawings show existing inventions so that their elements can be properly distinguished from the
innovation you are claiming.
During patent drafting, many collaborative discussions take place between you, the designer, and the
attorney. It is not uncommon for the scope of claims to vary slightly during this period. When these
changes occur, it may be an attempt to further differentiate the new invention from the existing ones.
These changes may also involve new or expanded understanding of the invention or its use.
As outlined in 37 CFR 1.77, the non-provisional patent draft includes the following sections:
i. The title of your invention,
ii. A cross-referenced list of any related patent applications,
iii. A statement about any federally sponsored R&D, if applicable,
iv. The names of all parties if there is a joint research agreement,
v. References to a "sequence listing," any tables or computer program listings, any appendix
submitted on a CD or storage device, and the incorporation-by-reference list,
vi. Background information on the invention,
vii. A brief summary of the invention,
viii. A short description of the drawings,
ix. A detailed description of the invention,
x. The claim or claims,
xi. An abstract of the disclosure,
xii. The sequence listing, if not supplied on a CD or storage device.
PATENT REGISTRATION
Patent registration is a legal process that grants exclusive rights of ownership and use to the inventor of a
product, service, or technology. Thus, the inventor gets the exclusive right to his invention for the entire
period of validity of the patent registration. The patent registration process is extremely important for
inventors and businesses to protect their innovative ideas and prevent others from using, selling or
producing their inventions without permission or license. In India, patent registration is governed by the
Patent Act 1970 and administered by the Indian Patent Office.
Benefits of Patent Registration: Benefits of patents are as: -
i. Legal protection: Patent registration provides legal protection to inventors by granting them
exclusive rights to their inventions. It prohibits anyone from making, using, selling or importing a patented
invention without the permission of the inventor.
ii. Market advantage: Patent acquisition gives inventors and companies a competitive edge in the
marketplace. It allows them to make the most of their unique inventions and prevents competitors from
stealing their ideas. With patents, inventors can gain a monopoly in the market, set themselves apart from
competitors, and demand higher prices or licensing fees for their patented technology.
iii. Financial opportunities: Patents can open up a variety of financial opportunities. They can attract
investors and venture capitalists interested in supporting innovative technologies. Patented inventions can
also generate revenue through licensing agreements. Additionally, patents can increase the overall value of
a company, making it more attractive for mergers, acquisitions, or partnerships.
iv. Encourage research and development: Patent registration encourages and rewards innovation by
providing inventors with limited-time exclusivity. This encourages inventors and companies to invest in
research and development (R&D), knowing that they will enjoy exclusive patent rights and potential
financial benefits from their inventions.
Utility patents are one of the most popular types of patents in India. These types of patents cover any
improvement or invention in a product, process, or machine. It is also called "patent of invention". So, if
you created a new electric vehicle, or a solar-related machine, etc., you would be applying for a utility
patent.
Drafting of Application
This is the most important step in the patent application process. As mentioned above, the patent
application is accompanied by Form 2, which asks the inventor to provide the technical specifications of
the invention. It should be as detailed as possible and include the different parts of the invention (if
divided into steps); drawings and diagrams showing the mechanism of inventions; Background of the
invention; a detailed description of the content of the invention, the purpose for which the invention was
created, and how it serves the particular industry to which it is likely to belong; specification summary as
well as the patent abstract.
Submit request
Patent requests can be filed in writing with the patent registry or electronically. Form 1 (with main contact
information of the inventor) or Form 28 (for startups and small organizations); Form 2 (with full
regulations or specifications); Form 3 (in case the application is an international application); Form 5
(statement of all inventors of the invention) and Form 26 (power of attorney, if the application is filed by
an attorney), must be filed together on the portal or with the registry, along with the applicable
patent registration fee. More importantly, where a complete specification is filed within 12 (twelve) months
of the filing date of a provisional application, the inventor retains the earlier filing date of the
originally filed (provisional) application.
Publication
Patent applications are published 18 (eighteen) months from the filing date or from the priority date,
whichever comes first (Rule 24 of the Patent Rules).
It is interesting to note that the Patent Act provides for accelerated publication of an application in cases
where the applicant does not wish to wait the full 18 (eighteen) months. In that case, the applicant may file
Form 9 (request for early publication) with the required fee. On a request for early publication, the
Controller General shall publish the application within 1 (one) month from the date of filing of the request.
Protest/ Objection
If objections are raised in the first examination report, the applicant (inventor) or his or her
designated representative (on behalf of the applicant) must file a response to the objections raised in the
first examination report and comply with any requirements (as stated) within 6 (six) months from the date of
issue of the first examination report.
Patent granting
Provided that all objections are resolved, and the examiner and the controlling authority
conclude that the application and accompanying documents comply with the law, the grant of the patent
shall be notified to the applicant (or to the representative) and subsequently published in the Patent
Journal. In case of objection to the grant of a patent, any person has the right to submit a notice of
opposition to the Controller General within 1 (one) year from the date of publication of the grant
of the patent.
COLUMN-I COLUMN-II
1. Copyright a. It is used for the protection of new inventions.
Answer:
1-c 2-a 3-b 4-e 5-d
Answer:
1- Rule 10 2-proof of right 3- document 4- bill of rights 5- contradiction
SUMMARY
The definition of intellectual property rights is any and all rights associated with intangible assets owned
by a person or company and protected against use without consent. Intangible assets refer to non-
physical property, including right of ownership in intellectual property. Trademarks protect logos,
sounds, words, colors, or symbols used by a company to distinguish its service or product. Trademark
examples include the Twitter logo, McDonald’s golden arches, and the font used by Dunkin. Copyright
law protects the rights of the original creator of original works of intellectual property. Unlike patents,
copyrights must be tangible. For instance, you can’t copyright an idea. But you can write down an
original speech, poem, or song and get a copyright.
Once someone creates an original work of authorship (OWA), the author automatically owns the
copyright. But, registering with the U.S. Copyright Office gives owners a head-start in the legal system.
Trade secrets are a company’s intellectual property that isn’t public, has economic value, and carries
KEY WORDS
Patent -for an invention is granted by government to the inventor, giving the inventor the right to stop
others, for a limited period.
Copyright – A set of rights automatically granted to the person who creates the original work of
authorship, such as literature, song, film, or software.
Parody – Imitation of the style of a certain writer, artist or genre with deliberate exaggeration for comic
effect.
Intellectual Property - Intellectual Property (IP) refers to intellectual creations such as inventions;
literary and artistic works; projects; and symbols, names and images used in trade.
Trademark- A trademark is a sign capable of distinguishing the goods or services of one enterprise from
those of other enterprises. Trademarks are protected by intellectual property rights.
Geographical indications- It is a sign used on products that have a specific geographical origin and
possess qualities or a reputation that are due to that origin.
Traditional knowledge- It is knowledge, know-how, skills and practices that are developed, sustained
and passed on from generation to generation within a community, often forming part of its cultural or
spiritual identity.
YOUTUBE VIDEOS
1. https://www.youtube.com/watch?v=AGpmo-Y8RUk
2. https://www.youtube.com/watch?v=Bj1_z56VEJ0
3. https://www.youtube.com/watch?v=TdePs0s6Ka8
4. https://www.youtube.com/watch?v=VMhcnaOBKvM
WIKIPEDIA
1. https://nyaaya.org/legal-explainer/patient-rights-in-india/
2. https://en.wikipedia.org/wiki/Intellectual_property
3. https://en.wikipedia.org/wiki/Patent
4. https://instr.iastate.libguides.com/patents/USsearch
OER
1. Ahuja V K 2017, Law Relating to Intellectual Property Rights, LexisNexis India Book Stores.
2. Rajagopalan Radhakrishnan, 2008, Intellectual Property Rights, Excel Books.
3. Asha Vijay Durafe and Dhanashree K. Toradmalle, 2020, Intellectual Property Rights, Wiley India
Pvt Ltd.
REFERENCE BOOKS
1. Vaidyanathan, Siva, (2004). The Anarchist in the Library: How the Clash Between Freedom and
Control Is Hacking the Real World and Crashing the System. New York: Basic Books.
2. Shiva, Vandana (2016). Biopiracy: The Plunder of Nature and Knowledge. North Atlantic Books.
3. World Intellectual Property Organization (WIPO) (2016). Understanding Industrial Property. World
Intellectual Property Organization.
4. Rupinder Tewari and Mamta Bhardwaj (2021). Intellectual Property, A Primer for Academia.
Panjab University, Chandigarh.
INTRODUCTION
It is widely acknowledged that the 21st century is driven by innovation and knowledge creation, which are
also the main drivers of economic development in any country, as the last two decades of the 20th century
confirmed. The generation, storage and meaningful use of data related to research results are
vital ingredients for innovation and knowledge creation in any country.
After World War II, the world saw unprecedented growth in research and academia, not just limited to the
sciences. This has posed unprecedented challenges across all facets of academia, from human resource
management in research and higher education institutions to deeper questions about ethics in the
academic world. In fact, the situation is more complicated because most academic activities have shifted
from amateur pursuits to professional activities. When the number of participants in research and related
activities was limited, a peer-led approach was practical and tangible for evaluating the research outcomes,
among other parameters, of any individual, and the ethics involved were fundamentally tied to the integrity
of a peer group.
Indexing improves database performance by minimizing the number of disk hits required to complete a
query. It is a data structure technique used to quickly locate and access data in a database. Several
database fields are used to build the index. The primary key or candidate key of the table is copied in the
first column, which is the search key. To speed up data retrieval, the values are also kept in sorted order. It
should be emphasized that it is not necessary to sort the data. The second column is a data reference or
pointer containing a set of pointers containing the address of the disk block where that particular key value
can be found.
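The two-column structure described above (a search-key column plus a block-pointer column) can be sketched in a few lines of Python. The keys and block addresses here are invented placeholders, and the linear scan stands in for the binary search a real DBMS would use.

```python
# An index as a sorted list of (search_key, block_pointer) pairs.
# The first column holds the table's primary (or candidate) key;
# the second holds the address of the disk block containing the record.
index = sorted([
    (101, "block_7"),
    (205, "block_2"),
    (309, "block_5"),
])

def lookup(index, key):
    """Return the block pointer for an exact search-key match, or None."""
    for k, block in index:   # a real DBMS would binary-search the sorted keys
        if k == key:
            return block
        if k > key:          # keys are kept in sorted order, so we can stop early
            return None
    return None

print(lookup(index, 205))  # → block_2
```

Keeping the keys sorted is what makes fast (logarithmic) lookups possible, which is why the text stresses that index values are stored in sorted order even when the underlying data is not.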
ATTRIBUTES OF INDEXING
Access Types: This refers to the type of access such as value-based search, range access, etc.
Access Time: It refers to the time needed to find a particular data element or set of elements.
Insertion Time: It refers to the time taken to find the appropriate space and insert new data.
Deletion Time: Time taken to find an item and delete it as well as update the index structure.
Space Overhead: It refers to the additional space required by the index.
In general, indexing methods store data using one of three file-organization mechanisms, where the
indices are based on the order in which the values are sorted. These are usually the faster and more
traditional access mechanisms. Such ordered or sequential file organizations can store index entries
in a dense or sparse format.
DENSE INDEX: -
For every search key value in the data file, there is an index record.
This record contains the search key and also a reference to the first data record with that search key value.
SPARSE INDEX: -
The index record appears only for a few items in the data file. Each index record points to a block of records.
To locate a record, we find the index record with the largest search key value less than or equal to the
search key value we are looking for.
We start at that record pointed to by the index record, and proceed along with the pointers in the file (that
is, sequentially) until we find the desired record.
Number of Accesses required=log₂(n)+1, (here n=number of blocks acquired by index file).
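The sparse-index lookup described above (find the index record with the largest key less than or equal to the target, then scan that block sequentially) can be sketched as follows. The block layout is an invented example; `bisect` stands in for the binary search over the index.

```python
import bisect
import math

# Sparse index: one index entry per block, keyed by the block's first record.
blocks = [[2, 5, 8], [11, 14, 17], [20, 23, 26], [29, 32, 35]]
index_keys = [b[0] for b in blocks]   # [2, 11, 20, 29], kept in sorted order

def sparse_lookup(target):
    """Find the index record with the largest key <= target,
    then scan the pointed-to block sequentially."""
    i = bisect.bisect_right(index_keys, target) - 1
    if i < 0:                         # target is smaller than every index key
        return None
    return target if target in blocks[i] else None

print(sparse_lookup(23))   # found by scanning the third block

# Worst-case accesses for the binary search over the index file,
# matching the formula above: log2(n) + 1 for n index blocks.
n = len(blocks)
print(math.floor(math.log2(n)) + 1)   # 3 accesses for n = 4
```

Because the index holds one entry per block rather than per record, a sparse index trades a short sequential scan inside the block for a much smaller index file.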
CLUSTERED INDEXING : -
When two or more related records are stored in the same file, this type of storage is known as cluster
indexing. Cluster indexing reduces the cost of searching, since multiple records relating to the same
thing are stored in one place; it also supports frequent joining of two or more tables (records).
The clustering index is defined on an ordered data file. The data file is ordered on a non-key field.
In some cases, the index is created on non-primary key columns which may not be unique for each
record. In such cases, in order to identify the records faster, we will group two or more columns
together to get the unique values and create an index out of them. This method is known as the
clustering index. Essentially, records with similar properties are grouped together, and indexes for
these groupings are formed.
Students studying each semester, for example, are grouped together. First-semester students,
second-semester students, third-semester students, and so on are categorized.
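The semester example above can be sketched as a simple clustering index in Python, grouping records on a non-key field. The student names are invented for illustration.

```python
from collections import defaultdict

# Student records ordered on a non-key field (semester); names are not unique keys.
students = [
    ("Asha", 1), ("Ravi", 1),
    ("Meena", 2), ("Kiran", 2),
    ("Sunil", 3),
]

# Clustering index: one entry per distinct semester value,
# pointing at the group of records that share that value.
cluster_index = defaultdict(list)
for name, semester in students:
    cluster_index[semester].append(name)

# All records for one semester are fetched from a single group,
# instead of scanning the whole file.
print(cluster_index[2])  # → ['Meena', 'Kiran']
```

Grouping rows that share the non-key value is exactly what cuts the search cost: one index probe retrieves the whole cluster.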
ADVANTAGES OF DATABASE
There are several advantages of using a database management system to store, which are as
follows:
Data organization: Databases provide tools for organizing data in a structured and logical way,
which can make it easier to search, sort, and retrieve data.
Data integrity: Databases enforce rules and constraints on the data to ensure that it is accurate and
consistent.
Data security: Databases provide tools for controlling access to the data and protecting it from
unauthorized access or tampering.
ONLINE DATABASES :
The Web of Science Core Collection consists of six online databases:
i. The Science Citation Index Expanded (SCIE) includes more than 8,500 notable journals spanning
over 150 disciplines, from 1900 to the present day.
ii. The Social Science Citation Index includes more than 3,000 journals in the fields of the social
sciences, also covering the period from 1900 to the present day.
iii. The Arts and Humanities Citation Index includes more than 1,700 arts and humanities journals
since 1975. In addition, 250 major scientific and social science journals are included.
REGIONAL DATABASES : Since 2008, the Web of Science has hosted a number of regional citation indices:
i Chinese Science Citation Database, produced in partnership with the Chinese Academy of
Sciences, was the first indexing database in a language other than English
ii SciELO Citation Index, established in 2013, covering Brazil, Spain, Portugal, the Caribbean and
South Africa, and an additional 12 countries of Latin America
iii Korea Citation Index in 2014, with updates from the National Research Foundation of Korea
iv Russian Science Citation Index in 2015
v Arabic Regional Citation Index in 2020
GOOGLE SCHOLAR
Google Scholar, developed by Google Inc. and launched in 2004, is the world's largest indexing and
citation database for the scientific literature, covering more scholarly journals and scientific articles
than comparable citation databases such as Scopus and Web of Science. It indexes peer-reviewed articles,
theses, books, abstracts and legal opinions from academic publishers and professional associations,
online preprint repositories, universities, subject portals, and other academic institutions. Although
Google does not disclose the size of the Google Scholar database, bibliometric researchers estimate that
it contains about 390 million documents, including articles, citations, and patents, making it the largest
academic research tool in the world. In one coverage study, Google Scholar found 88% of all the citations
examined, many of which were not found by other sources, along with nearly all of those found by the
remaining sources (89-94%). An earlier statistical estimate published in PLOS One, using the
mark-and-recapture method, put coverage at about 79-90% of all articles published in English, around
100 million documents. Google Scholar is also one of the oldest Google services. Its co-creator Anurag
Acharya says this comprehensive database of research papers, legal cases and other scholarly publications
was the fourth search service launched by Google. To celebrate the tool's 18th anniversary, Anurag
shared 18 things you can do in Google Scholar that you might have missed: -
Copy article citations in the style of your choice.
Dig deeper with related searches.
And don’t miss the related articles.
Read the papers you find.
Access Google Scholar tools web with the Scholar Button browser extension.
Learn more about authors through Scholar profiles.
Easily find topic experts.
Search for court opinions with the “Case law” button.
See how those court opinions have been cited.
Understand how a legal opinion depends on another.
CiteSeerX
CiteSeerX (formerly known as CiteSeer) is a public search engine and digital library containing
scientific and scholarly articles, primarily in the fields of computer and information science.
CiteSeer's goal is to improve the dissemination of and access to scientific and academic literature. As a
non-profit service that can be freely used by anyone, it is considered part of the open access movement,
which attempts to change academic and scientific publishing to allow greater access to the scientific
literature. CiteSeer makes Open Archives Initiative metadata available for all indexed documents and
links indexed documents to other metadata sources such as DBLP and the ACM Portal where possible. To
promote open data, CiteSeerX shares its data for non-commercial purposes under a Creative Commons
license. CiteSeerX is a growing digital library and search engine for scientific literature, focusing
primarily on computer and information science. It aims to improve the dissemination of scientific
literature and to improve the functionality, usability, availability, cost, completeness, efficiency, and
speed of access to scientific and academic knowledge. Rather than just creating another digital library,
CiteSeerX strives to provide resources such as algorithms, data, metadata, services, techniques, and
software that can be used to build other digital libraries. CiteSeerX has developed new methods and
algorithms for indexing PostScript and PDF articles found on the web and making them searchable.
CiteSeer was developed in 1997 at the NEC Research Institute in Princeton, New Jersey by Steve
Lawrence, Lee Giles and Kurt Bollacker. The service was transferred to the Pennsylvania State University
College of Information Science and Technology in 2003. Since then, the project has been led by Professor
Lee Giles.
After serving as a public search engine for nearly ten years, CiteSeer, originally designed as a simple
prototype, began to grow far beyond the capabilities of its original architecture. Since its inception, the
original CiteSeer had grown to index more than 750,000 documents and to answer more than 1.5 million
queries per day, pushing the limits of the system's capacity. Based on an analysis of the problems the
original system encountered and the needs of the research community, a new architecture and data model
were developed for the "next generation CiteSeer", or CiteSeerX. Its predecessor CiteSeer (also known as
ResearchIndex) was a digital scientific library aimed primarily at computer scientists, containing
full-text research articles that are free to download from the web.
Articles are indexed by the Autonomous Citation Indexing (ACI) system, which links records together
through the references cited in an article and the citations made to it. It provides links to related
articles and can determine the context of a citation. CiteSeer supports full boolean, phrase and
proximity searches, and you can choose to search the full text of documents or the citations contained
in them. CiteSeer became public in 1998 and had many features unavailable in academic search engines at
that time. These included: -
i Autonomous Citation Indexing automatically created a citation index that can be used for
literature search and evaluation.
ii Citation statistics and related documents were computed for all articles cited in the database, not
just the indexed articles.
iii Reference linking, allowing browsing of the database using citation links.
iv Citation context showed the context of citations to a given paper, allowing a researcher to quickly
and easily see what other researchers have to say about an article of interest.
v Related documents were shown using citation and word-based measures, and an active and
continuously updated bibliography is shown for each document.
IEEE Xplore
The IEEE Xplore Digital Library is a powerful resource for discovering and accessing scientific and
technical content published by the Institute of Electrical and Electronics Engineers (IEEE) and its
publishing partners. IEEE Xplore is the leading academic database for engineering and computer
science. It can be used to search not only journal articles, but also conference papers and books. It mainly
contains material published by the Institute of Electrical and Electronics Engineers (IEEE) and other
partner publishers.
It provides online access to more than 5 million documents from publications in computer science,
electrical engineering, electronics and related fields. Its documents and other materials include more
than 300 peer-reviewed journals, more than 1,900 global conferences, more than 11,000 technical
standards, nearly 5,000 e-books, and more than 500 online courses, with approximately 20,000 new
documents added each month. Anyone can search IEEE Xplore and find bibliographic records and
abstracts within its contents, while access to full-text documents requires a personal or institutional
subscription.
CONTENT TYPES IN IEEE XPLORE: The following content types are available on IEEE Xplore:
i. Books: IEEE Press and IEEE Computer Society Press, together with John Wiley and Sons, Inc.,
develop and publish books in the fields of electrical, computer, and software engineering under
the Wiley-IEEE Press and Wiley-IEEE Computer Society Press imprints.
ii. Conference Proceedings: IEEE publishes more than 1,700 state-of-the-art conference
proceedings annually, recognized worldwide by academia and industry as the most important
compendium of electrical engineering, computer science and related fields.
iii. Courses: In addition to IEEE-USA professional development courses, IEEE Xplore offers online
course titles on the IEEE Learning Network.
iv. Journals and Magazines: IEEE publishes leading journals, transactions, letters and magazines in
the fields of electrical engineering, computing, biotechnology, telecommunications, electricity and
energy, and dozens of other technologies. Articles from IBM, SMPTE, BIAI and TUP journals are
also available in IEEE Xplore.
Pub Med
PubMed Central (PMC) is a free digital repository that archives open access full-text scientific articles
published in biomedical and life sciences journals. PubMed Central is one of the largest research
databases developed by the National Center for Biotechnology Information (NCBI) and is more than a
document repository. PMC submissions are indexed and formatted with enhanced metadata, medical
ontologies and unique identifiers that enrich the XML-structured data of each article. PMC content can be
linked to other NCBI databases and accessed through Entrez search and retrieval systems, further
enhancing the public's ability to find, read, and advance their biomedical knowledge.
As of December 2018, the PMC archive contained more than 5.2 million articles contributed by
publishers or authors who have deposited their manuscripts in the archive in accordance with NIH's public
access policy. Some publishers delay releasing their articles in PubMed Central for a period after
publication, known as an embargo period, which varies from a few months to a few years depending on
the journal (six- or twelve-month embargoes are most common). PubMed Central is a prominent example of
"systematic third-party external distribution", which many publishers continue to prohibit.
PubMed Central® (PMC) is a free full-text archive of biomedical and life science journals from the
National Library of Medicine of the National Institutes of Health (NIH/NLM) of the United States. In
compliance with NLM's Biomedical Literature Collection and Preservation Act, PMC is part of the NLM
Collection, which also includes NLM's extensive print and approved electronic journals, and supports
contemporary biomedical and health research and practice, as well as future scholarship. PMC has been
available to the public online since 2000 and is developed and maintained by NLM's National Center for
Biotechnology Information (NCBI). PMCID (PubMed Central Identifier), also known as PMC reference
number, is the bibliographic identifier of the PubMed Central open access database, just as PMID is the
bibliographic identifier of the PubMed database. However, the two identifiers are different: a PMCID
consists of "PMC" followed by a sequence of seven digits.
Since its inception in 2000, PMC has grown from two publications, PNAS: Proceedings of the National
Academy of Sciences and Molecular Biology of the Cell, to an archive of thousands of journal titles. In
addition, PMC includes author manuscripts deposited through the NIH Manuscript Submission System, as
well as preprints collected through the NIH Preprint Pilot.
OPEN ACCESS
When reviewing DOAJ, it is important to know the open access model of the publication.
Basically, this is a model where authors, their institutions, funding bodies or other stakeholders
pay the publishing costs, instead of using a subscription model or a model where you have to buy
or even rent an individual article.
BENEFITS OF ICI
The Indian Citation Index (ICI) enables the research community to map information published in local and
national journals. Whether you are just starting out in research, or are an experienced academic
researcher, teacher, librarian or administrator, ICI provides objective content and tools to support
your research role.
The ADVANTAGES of ICI's presence in the scientific community are:
i. A comprehensive research and assessment tool for Indian literature
ii. To facilitate comprehensive scientometric and bibliometric studies of Indian literature
iii. Assist in measuring and analyzing individual, institutional, regional and national R and D
performance for strategic planning
iv. A real tool to generate complete and comprehensive analytical reports on R and D health in India
v. ICI can generate national R and D indicators like Indian Journals Citation Reports etc.
COLUMN-I COLUMN-II
1. Web of Science a. Scientific literature digital library and search engine
2. CiteSeerX b. A free digital repository
Answer:
SUMMARY
Summarized here are the widely used databases and those that have some unique features. We have
also included a good number of the freely accessible databases; interestingly, a good deal of useful
information can be extracted using these freely accessible resources.
Indexing is a very useful technique that helps in optimizing the search time in database queries. The
table of a database index consists of a search key and a pointer. There are four types of indexing:
Primary, Secondary, Clustering, and Multivalued Indexing. Primary indexing is divided into two types,
dense and sparse. Dense indexing is used when the index table contains records for every search key.
Sparse indexing is used when the index table does not contain a record for every search key. Multilevel
indexing uses the B+ Tree. The main purpose of indexing is to provide better performance for data
retrieval.
PubMed Central is distinct from PubMed. PubMed Central is a free digital archive of full-text articles,
accessible to anyone from anywhere via a web browser (with varying provisions for reuse). Conversely,
although PubMed is a searchable database of biomedical citations and abstracts, the full-text article resides
elsewhere.
WorldWideScience.org implements federated searching to provide its coverage of global science and
research results. Federated searching technology allows the information patron to search multiple data
sources with a single query in real time. It provides simultaneous access to "deep web" scientific
databases, which are typically not searchable by commercial search engines.
Theses indexed by EThOS have a minimum of a thesis title, author, awarding body and date. Optional
additional metadata may be included such as the thesis abstract, doctoral advisor, sponsor, cross links to
other databases and the full text of the thesis itself.
In the context of looking at DOAJ, it is important to know about the open access model of publishing.
Essentially, it is a model where the authors, their institutions, funding bodies or other stakeholders pay
the publication costs, rather than operating a subscription model, or one where you are required to buy,
or even rent, an individual article.
ICI provides a multidisciplinary research platform covering about 1,000 scholarly journals from India.
The ICI database also produces other useful byproducts like the Indian Science Citation Index (ISCI),
the Indian Social Science and Humanities Citation Index (ISSHCI), and the Indian Journals Citation
Reports (IJCR). The Indian Citation Index is an online bibliographic database containing abstracts and
citations from academic journals. Currently ICI covers more than 1,100 journals from India spanning the
scientific, technical, medical, and social sciences, including the arts and humanities.
KEY WORDS
Database- An organized collection of structured information, or data, typically stored electronically in a
computer system.
Data indexing- A database index is a data structure that improves the speed of data retrieval operations on
a database table at the cost of additional writes and storage.
Web of Science- A selective citation index of scientific and scholarly publishing covering journals,
proceedings, books, and data compilations.
ScienceDirect- It is the world's leading source for scientific, technical, and medical research. Explore
journals, books and articles.
PubMed- PubMed is a free search engine accessing primarily the MEDLINE database of references and
abstracts on life sciences and biomedical topics.
Indian Citation Index- It provides a powerful search engine to perform search and evaluation for
researchers, policy makers, decision makers etc.
Preprint Site- Preprint versions of articles may or may not be peer reviewed or may be the author's final,
peer-reviewed manuscript as accepted for publication.
REFERENCES
1. Gusenbauer M (2019). Google Scholar to overshadow them all? Comparing the sizes of 12 academic
search engines and bibliographic databases. Scientometrics. 118 (1): 177–214.
2. Giri R, Das AK (2011). Indian Citation Index: a new web platform for measuring performance of
Indian research periodicals. Library Hi Tech News. 28 (3): 33–35.
3. Gamble A (2018). Biological Abstracts (Clarivate Analytics). The Charleston Advisor. 20(1):19-25.
4. Kirkwood HP, Kirkwood MC (2011). Econlit and Google Scholar Go Head-to-Head. 35(2): 38–41.
YOUTUBE VIDEO
1. https://www.youtube.com/watch?v=iXGbH2hRsUw
2. https://www.youtube.com/watch?v=_WuYieVbKBU
3. https://www.youtube.com/watch?v=cD1Xml9E1_E
WEB LINKS
1. https://paperpile.com/g/google-scholar-guide/
2. https://libguides.ntu.edu.sg/c.php?g=929556&p=6716121
3. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4800951/
4. https://www.indiancitationindex.com/ici.aspx?target=benifits#
5. https://www.nature.com/articles/nature.2014.16643
REFERENCE BOOK
1. World Intellectual Property Organization (WIPO) (2016). Understanding Industrial Property. World
Intellectual Property Organization.
2. Rupinder Tewari and Mamta Bhardwaj (2021). Intellectual Property, A Primer for Academia. Panjab
University, Chandigarh.
3. Academic integrity and research quality (2021) University Grants Commission, New Delhi.
SUGGESTED READINGS
1. Adie, E., and W. Roe (2013). Altmetric: Enriching scholarly content with article-level discussion and
metrics. Learned Publishing 26 (1): 11–17.
2. Baykoucheva, Svetla, (2015). Managing Scientific Information and Research Data. by Science Direct.
3. Chaddah, P. and S.C. Lakhotia (2018). A Policy Statement on Dissemination and Evaluation of
Research Output in India. Proc. INSA 84 No. 2 June: 319–329.
4. Chakraborty, S., J. Gowrishankar, A. Joshi, P. Kannan, R. K. Kohli, S. C. Lakhotia, G. Misra, C. M.
Nautiyal, K. Ramasubramanian, N. Sathyamurthy and A. K. Singhvi (2020). In Summary for the
Month. NASI, April 2020.
"I believe in innovation and that the way you get innovation is you fund research and you learn the basic
facts."- Bill Gates
RESEARCH METRICS
Research metrics are bibliometric tools used in the publishing industry as indicators of research
performance at both the journal and author levels. The two main components of bibliometric
research are the number of publications and the number of citations to publications. Since its
introduction, the citation-based Journal Impact Factor (JIF) has been one of the most important
parameters for evaluating journals. For a long time, this was the only tool available to evaluate
the performance of scientific journals.
There are now a growing number of different research metrics available at the journal and author
level, from the traditional impact factor to the Eigenfactor, and from the h-index to altmetrics and more. Based on the rich
resources of the SCI (Science Citation Index) database, the Institute for Scientific Information
(ISI) launched a tool to classify academic journals based on their citations and impact in the scientific
community. Beginning in 1975, SCI began publishing the JIF and the Immediacy Index as part of the Journal
Citation Reports (JCR), providing an instant overview of citation data. From the beginning, the SCI
database contained the institutional information of all authors of articles published in the journal.
A journal's impact factor for 2008 would be calculated by taking the number of citations in 2008 to
articles that were published in 2007 and 2006 and dividing that number by the total number of articles
published in that same journal in 2007 and 2006. Thomson calculated the 2008 impact factor for the
journal Academy of Management Review as:
IF(2008) = (citations in 2008 to articles published in 2006 and 2007) ÷ (articles published in 2006 and 2007)
Thus, the Impact Factor of 6.125 for the journal, Academy of Management Review for 2008 indicates that
on average, the articles published in this journal in the past two years have been cited about 6.125 times.
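The two-year calculation above can be written as a short function. The counts below are invented for illustration and chosen only so that the ratio reproduces the 6.125 figure quoted in the text; they are not the actual Academy of Management Review numbers.

```python
def impact_factor(citations, citable_items):
    """JIF for year Y: citations received in Y to items published in
    Y-1 and Y-2, divided by the citable items published in Y-1 and Y-2."""
    return citations / citable_items

# e.g. 490 citations in 2008 to 80 articles from 2006-2007:
print(impact_factor(490, 80))  # 6.125
```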
LIMITATIONS OF JIF
i The IF is an arithmetic mean and it doesn’t adjust for the distribution of citations.
ii The Impact Factor only considers the number of citations, not the nature or quality.
iii Impact Factors cannot be compared across different subject areas.
iv The JCR doesn’t distinguish between citations made to articles, reviews, or editorials.
v IF can show significant variation year-on-year, especially in smaller journals.
04-04-02: CITESCORE
"CiteScore (CS) of an academic journal is a metric that captures the annual average number of
citations to recent articles published in that journal." It is calculated by Elsevier, based on the citations
recorded in the Scopus database. Absolute ranks and percentiles are also presented within each journal's
subject field. This journal evaluation metric was launched in December 2016 as an alternative to
the Journal Citation Reports (JCR) Impact Factor (IF) calculated by Clarivate. Unlike the two- or
five-year windows of the JCR IF, CiteScore is based on citations collected for articles published in the
previous four years. CiteScore's impartiality was questioned upon launch by bibliometrics experts such as
Carl Bergstrom, whose analysis found that it favored Elsevier titles over those of some competitors,
including Nature.
CiteScore is another metric to measure the influence of a journal in Scopus. The calculation of the
current year's CiteScore is based on the number of citations received by the journal during the last 4
years (including the reporting year) divided by the number of documents published in the journal during
these four years. CiteScore 2022, for example, is calculated as:
CiteScore 2022 = (citations received in 2019-2022 to documents published in 2019-2022) ÷ (documents published in 2019-2022)
Note: Document types counted include articles, reviews, conference papers, data papers and book chapters.
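The four-year CiteScore window described above can be sketched as follows; the per-year citation and document counts are invented for illustration.

```python
# Citations received in each year of the window to documents published in
# the window, and peer-reviewed documents published in each year (made up).
citations = {2019: 120, 2020: 150, 2021: 180, 2022: 90}
documents = {2019: 40, 2020: 45, 2021: 50, 2022: 45}

window = range(2019, 2023)  # 2019, 2020, 2021, 2022
citescore_2022 = sum(citations[y] for y in window) / sum(documents[y] for y in window)
print(round(citescore_2022, 1))  # 3.0
```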
CiteScore 2022 values were released in June 2023 with a new methodology: the new CiteScore counts only
peer-reviewed publication types and adopts a 4-year citation window in the numerator (instead of 1
year).
CiteScore metrics are a family of 8 indicators: CiteScore, CiteScore Tracker, CiteScore
Percentile, CiteScore Quartiles, CiteScore Rank, Citation Count, Document Count and Percentage Cited.
FREQUENTLY-USED METRICS
i h-index: measures the cumulative impact of a researcher's output by looking at the number of
citations a work has received.
ii i10-index: created by Google Scholar, it measures the number of publications with at least 10
citations.
iii g-index: aims to improve on the h-index by giving more weight to highly-cited articles.
iv e-index: The aim of the e-index is to differentiate between scientists with similar h-indices but
different citation patterns.
v Altmetrics: Altmetrics stands for "alternative metrics."
vi Unique ID: Digital Object Identifiers (DOIs) are used to uniquely identify digital research works,
and provide a persistent link to the location of the work on the internet.
Citation-based metrics for journals can easily be extended to authors, capturing both their productivity
and their impact on the scientific community. We have already emphasized the importance of citations;
they can likewise be used to assess the contributions of authors at the individual or collective level.
i. h-index
The h-index provides better-quality information than the total number of scientific publications or the
total number of citations received. Knowing the number of publications alone does not indicate how well
those articles have been received by other researchers; similarly, the total number of citations alone
does not show how the citations are distributed across publications.
The Web of Science uses the H-Index to quantify research output by measuring author productivity and
impact.
H-Index = number of papers (h) with a citation number ≥ h.
Example: a scientist with an H-Index of 37 has 37 papers cited at least 37 times.
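The definition above translates directly into code: sort the citation counts in descending order and find the largest rank r whose paper has at least r citations. The citation list is made up for the example.

```python
def h_index(citations):
    """h = number of papers (h) with a citation count >= h."""
    h = 0
    for rank, c in enumerate(sorted(citations, reverse=True), start=1):
        if c >= rank:
            h = rank       # the top `rank` papers all have >= rank citations
        else:
            break
    return h

print(h_index([10, 8, 5, 4, 3]))  # 4: the top 4 papers each have >= 4 citations
```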
ii. i10-index
Created by Google Scholar and used in Google's My Citations feature.
i10-Index = the number of publications with at least 10 citations.
This very simple measure is only used by Google Scholar, and is another way to help gauge the
productivity of a scholar.
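Counting the publications with at least 10 citations is a one-liner; the citation list below is invented for the example.

```python
def i10_index(citations):
    """Number of publications with at least 10 citations each."""
    return sum(1 for c in citations if c >= 10)

print(i10_index([25, 12, 10, 9, 3]))  # 3
```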
ADVANTAGES OF I10-INDEX
i. Very simple and straightforward to calculate
ii. My Citations in Google Scholar is free and easy to use
DISADVANTAGES OF I10-INDEX: it is used only by Google Scholar.
iii. g-index
The g-index was proposed by Leo Egghe in his 2006 paper "Theory and Practise of the g-index" as an
improvement on the h-index. It is defined this way: given a set of articles ranked in decreasing order
of the number of citations that they received, the g-index is the (unique) largest number such that the
top g articles received (together) at least g² citations.
The index is calculated based on the distribution of citations received for the publications of a given
author. Suppose that research papers are ranked in descending order by the number of citations they have
received, then the g-index is the unique largest number such that the best g papers together have received
at least g² citations. It can therefore be defined as the largest number g of highly cited articles with an
average of at least g citations each. The g-index thus lets the excess citations of highly cited articles
compensate for articles with fewer citations.
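Egghe's definition above can be computed by accumulating citations over the descending ranking and keeping the largest rank g whose cumulative total reaches g². A minimal sketch with invented counts:

```python
def g_index(citations):
    """Largest g such that the top g papers together received >= g^2 citations."""
    total, g = 0, 0
    for rank, c in enumerate(sorted(citations, reverse=True), start=1):
        total += c                 # cumulative citations of the top `rank` papers
        if total >= rank * rank:
            g = rank
    return g

print(g_index([10, 8, 5, 4, 3]))  # 5: the five papers together have 30 >= 25 citations
```

For any citation list the g-index is never smaller than the h-index, since the top h papers together have at least h² citations.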
ADVANTAGES OF ALTMETRIC
i Early impact evidence: In practice, the most important advantage of many alternative indicators
is that they give early impact evidence.
ii Wider impact evidence: All Altmetrics and webometrics reflect impact that is at least partly
different from citation impact.
iii Publishers: discover how published research is being used and shared around the world.
iv Institutions: Understand and interpret the attention surrounding your institution's research and
identify areas of strength or those that need improvement for your long-term objectives.
LIMITATIONS OF ALTMETRICS
There are a number of limitations to the use of altmetrics:
i Altmetrics don’t tell the whole story.
ii Like any metric, there’s a potential for gaming of altmetrics.
iii Altmetrics are relatively new, and more research into their use is needed.
iv Data are not normalized.
v Known tracking issues.
2. Review Articles are cited more than other types of articles. (True/False)
Column-I Column-II
1. i10-index a. the highest number of publications of a scientist
2. h-index b. the number of publications with at least 10 citations
3. g-index c. the square root of the excess citations over those used for calculating
the h-index
4. e-index d. the unique largest number such that the top g articles received
together at least g² citations.
Answer:
SUMMARY
For journal articles to be impactful, they have to be discoverable, and online discovery rests almost entirely
on indexing. Journals included in an index are considered to be of higher quality than journals that are not
as these have to go through a vetting process to be included or indexed in reputed bibliographic databases.
Based on the citations, there are several research evaluation metrics for both journals and authors.
In order to address some of the drawbacks of JIF and related metrics, efforts have been made to develop
new-generation metrics, both using WoS and Scopus databases. These metrics involve complex algorithm-
based calculations for assessing the quality of journals using the vast mesh of citations. Eigenfactor and
Article Influence are based on WoS data, whereas SNIP and SJR indicators are based on Scopus data. The
Eigenfactor Score calculation is based on the number of times articles from the journal published in the
past five years have been cited in the JCR data year, but it also considers which journals have contributed
these citations so that highly cited journals will influence the network more than lesser cited journals, with
self-citations not being considered. Related to the Eigenfactor score, the Article Influence (AI) score of a
journal is a measure of the relative importance of each of its articles over the first five years after
publication.
Based on the Scopus database, SNIP attempts to measure contextual citation impact by weighting citations
based on the total number of citations in a subject field and correcting for subject-specific characteristics,
simplifying cross-discipline comparisons between journals. Similarly, SCImago Journal Rank Indicator
(SJR) measures the scientific prestige of the average article in a journal. Both SNIP and SJR use three
years window for taking into account the published papers in the Scopus database.
The citation-based metrics for journals can easily be extended to authors. The h-index is the most widely
known author-level index and is very widely used as a proxy for an author's academic impact.
KEYWORDS
Research metrics- Bibliometric tools used in the publishing industry as indicators of research
performance at both the journal and author levels.
Impact factor- The impact factor (IF) or journal impact factor (JIF) of an academic journal is a
scientometric index, calculated by Clarivate, that reflects the yearly average number of citations to
recent articles published in that journal.
Article influence- The Article Influence Score measures the relative importance of a journal on a
per-article basis.
Unique ID- Digital Object Identifiers (DOIs) are used to uniquely identify digital research works.
i10-index- The number of publications with at least 10 citations.
h-index- The h-index reflects both the number of publications and the number of citations per publication.
g-index- It is the (unique) largest number such that the top g articles received (together) at least g²
citations.
e-index- The square root of the excess citations beyond those used in calculating the h-index; it
differentiates between scientists with similar h-indices but different citation patterns.
m-index- It is another variant of the h-index that displays h-index per year since first publication.
YOUTUBE VIDEO
1. https://www.youtube.com/watch?v=AjsHxxiDrQI
2. https://www.youtube.com/watch?v=lgVuyzke6OY
3. https://www.youtube.com/watch?v=IN587De8Pis
4. https://www.youtube.com/watch?v=yS7oWq2loA4
REFERENCES
1. Giri, Rabishankar; Das, Anup Kumar (2011). Indian Citation Index: a new web platform for
measuring performance of Indian research periodicals. Library Hi Tech News. 28 (3): 33–35.
2. Rupinder Tewari and Mamta Bhardwaj (2021). Intellectual Property, A Primer for Academia. Panjab
University, Chandigarh.
3. Academic integrity and research quality (2021) University Grants Commission, New Delhi.
4. Egghe, Leo. 2006. Theory and Practise of the g-index. Scientometrics. 69 (1): 131–152.
5. Hirsch, J.E. 2005. An Index to Quantify an Individual's Scientific Research Output. PNAS 102 (46): 16569–16572.
SUGGESTED READINGS
1. Das, A.K. 2015. Research Evaluation Metrics. UNESCO.
2. Reitz, Joan M. 2013. Online Dictionary for Library and Information Science: http://www.abc-clio.
com/ODLIS/searchODLIS.aspx.
3. Roemer, Robin Chin and Rachel Borchardt. 2015. Meaningful Metrics: A 21st Century Librarian’s
Guide to Bibliometric and Research Impact. ala.org
4. Rousseau, R., Leo Egghe, and Raf Guns. 2018. Becoming Metric-wise: A bibliometric guide for
researchers. Science Direct.
Dear Student,
Now that you have gone through this book, it is time for you to do some thinking for us.
Please answer the following questions sincerely. Your responses will help us to
analyse our performance and make future editions of this book more useful.
Your responses will be completely confidential and will in no way affect your
examination results. Your suggestions will receive prompt attention from us.
Style
01. Do you feel that this book enables you to learn the subject independently,
without any help from others?
03. Do you feel that the following sections or features, if included, will enhance
self-learning and reduce the need for help from others?
Yes No Not Sure
Index
Glossary
List of “Important Terms Introduced”
Two Colour Printing
Content
04. How will you rate your understanding of the contents of this Book?
05. How will you rate the language used in this Book?
Very Simple Simple Average Complicated Extremely Complicated
06. Do the syllabus and the content of this book complement each other?
Yes No Not Sure
07. Which topics did you find easiest to understand in this book?
Sr.No. Topic Name Page No.
09. List the difficult topics you encountered in this book, and suggest
how they can be improved.
Use the following codes:
Code 1 for “Simplify Text”
Code 2 for “Add Illustrative Figures”
Code 3 for “Provide Audio-Vision (Audio Cassettes with companion Book)”
Code 4 for “Special emphasis on this topic in counseling”
10. List the errors which you might have encountered in this book.
1. 2. 3. 4. 5.