RES505 Research Methodology
SCHOOL OF SCIENCES
(FORMERLY SCHOOL OF ARCHITECTURE, SCIENCE & TECHNOLOGY)
RES505
RESEARCH METHODOLOGY
(4 Credits)
Semester - I
Email: [email protected]
Website: www.ycmou.ac.in
Phone: +91-253-2231473
Brief Contents
Vice Chancellor’s Message
Credit 01
Credit 02
Credit 03
Credit 03, Unit 03: Use of ANOVA for the Research Analysis
Credit 04
As a postgraduate student, you must have the autonomy to learn, acquire information and
knowledge about the different dimensions of the field of Science, and at the same time develop
intellectually so that you can apply that knowledge wisely. The process of learning includes
thinking appropriately, understanding important points, describing them on the basis of
experience and observation, and explaining them to others in speech or writing. The science of
Education today accepts the principle that such excellence and knowledge are achievable.
The syllabus of this course has been structured in this book in such a way as to give you the
autonomy to study easily without stirring from home. During the counselling sessions scheduled
at your respective study centre, all your doubts about the course will be clarified and you will get
guidance from qualified and experienced counsellors/professors. This guidance will not only be
based on lectures, but will also include techniques such as question-and-answer and doubt
clarification. We expect your active participation in the contact sessions at the study centre. Our
emphasis is on ‘self-study’: a student who learns how to study becomes independent in learning
throughout life. This course book has been written with the objective of helping in self-study and
giving you the autonomy to learn at your convenience.
During this academic year, you have to submit assignments and complete laboratory activities,
field visits and project work wherever required. You have to opt for a specialization as per the
programme structure. You will gain experience and joy in personally doing the above activities.
This will enable you to assess your own progress and thereby achieve a larger educational objective.
We wish that you will enjoy the courses of Yashwantrao Chavan Maharashtra Open
University, emerge successful and very soon become a knowledgeable and honorable Master’s
degree holder of this university.
I congratulate the Development Team for developing this excellent, high-quality
Self-Learning Material (SLM) for the students. I hope and believe that this SLM will be
immensely useful for all students of this program.
Best Wishes!
- Prof. Dr. P. G. Patil
Vice-Chancellor, YCMOU
Dear Students,
Greetings!!!
This book aims at acquainting students with the conceptual and applied fundamentals of
Research Methodology required at the PG level. The book has been specially designed for Science
students. It offers comprehensive coverage of concepts and their application in practical life, and
contains numerous examples to build understanding and skills. The book is written in a
self-instructional format. Each chapter has an articulated structure that makes the contents
not only easy to understand but also interesting to learn. Each chapter begins with learning
objectives stated using action verbs as per Bloom's Taxonomy. Each unit starts with an
introduction to stimulate the learner's curiosity about the topic. Thereafter, the unit explains
concepts supported by tables, figures, exhibits and solved illustrations wherever necessary for
better understanding. The book is written in simple language, using a spoken style and short
sentences. The topics of each unit progress from simple to complex in a logical sequence. The
book is accessible even to students who find the subject difficult, and covers the full syllabus of
the course. Exercises in each chapter include conceptual and practical questions so as to build a
ladder in the minds of students for grasping every aspect of a particular concept. I thank the
students, who have been a constant motivation for us. I am grateful to the writers, editors and the
School faculty associated with the development of this SLM for the Programme.
Best Wishes to all of you!!!
"I believe in innovation and that the way you get innovation is you fund research, and you learn
the basic facts." - Bill Gates.
INTRODUCTION
"Research, simply put, is the search for knowledge and the search for truth." In the formal
sense, it is a systematic study of a problem addressed through a deliberately chosen strategy: it
begins with the choice of an approach and the preparation of a preliminary plan, proceeds through
the development of research hypotheses, the choice of methods and techniques, the development
of data collection tools, and data processing and interpretation, and ends with the presentation
of the solution(s) to the problem.
Fig. 1.1.1
"Research is creative and systematic work undertaken to increase the stock of knowledge."
Research studies are done to discover new information or to answer questions about how we learn,
behave and function, with the end goal of benefitting society. Some studies involve simple
tasks such as completing a survey, being observed in a group of people or participating in a group
discussion.
The research purpose is a statement of why the study is being conducted, that is, the goal of the study.
The goal of a study might be to identify or describe a concept, or to explain or predict a situation or
its solution; the purpose indicates the type of study to be conducted (Buckingham, 1974).
01-01-01: RESEARCH
Research has been interpreted and defined by various scholars based on their areas of study and the
resources available to them at any given time. You will find that the basic meaning and context of these
definitions are the same; the difference between them lies solely in the way each author has applied
research to his or her discipline. According to Thyer (2001), the word research consists of two
parts, re and search. Re is a prefix meaning again, anew. Search is a verb meaning to
examine closely and carefully, to test and try. Research is also a noun describing a careful,
systematic study in some field of knowledge, undertaken to establish facts or principles.
According to Merriam-Webster's online dictionary, the word research derives from the Middle
French "recherche", meaning "to go about seeking"; the term itself derives from the Old French
"recerchier", a compound of "re-" + "cerchier" or "sercher", meaning "to search". The first
recorded use of the term research was in 1577. Research is structured inquiry that uses acceptable
scientific methods to solve problems and create new, generally applicable knowledge (Dawson,
2019).
DEFINITION
Research has been defined in a variety of ways, with clear similarities among the definitions. For example:
"An in-depth study of a subject, in particular to discover (new) information or arrive at a
(new) understanding."
"It is the foundation of knowledge and provides guidelines for solving problems."
"The creation of new knowledge and/or the use of existing knowledge in a new and creative
way to generate new concepts, methods and insights."
"A detailed and careful study of something to find out more information about it."
"Research is defined as careful consideration of a study regarding a particular concern or
problem using scientific methods."
Another definition of research is given by John W. Creswell, who states that "Research is a
process of steps used to collect and analyze information to improve our understanding of a
topic or problem." It consists of three steps: ask a question, collect data to answer it, and present
an answer to the question.
OBJECTIVES OF RESEARCH
To understand an observation clearly and explain its logic and the reason for its happening:
i. To gain insights about a problem.
ii. To find solutions for a problem.
iii. To test existing laws or theories.
iv. To develop new ideas, concepts and theories.
v. To test a hypothesis of a causal relationship between variables.
vi. To identify areas where research could make a difference.
vii. To predict future events.
TYPES OF RESEARCH
Types of research are the different methodologies used to conduct research. Based on research goals,
timelines and purposes, different types of research are better suited to certain studies. The first part
of designing research is to determine what you want to study and what your goals are. For
example, you may simply want to learn more about a topic, or you may want to determine how
a new policy will affect lower-level employees at your company. The types of research are as
follows:
1. Fundamental research 2. Applied research
3. Qualitative research 4. Quantitative research
5. Mixed research 6. Exploratory research
7. Longitudinal research 8. Cross-sectional research
Research is a broad task and requires great effort from the researcher. Good research should have
the following qualities:
i. Clarity: This is the most significant quality of any research. The research should be clear so
that others can easily understand its nature. It should have a single focus so that readers do
not get sidetracked. The topic should be very clear in the mind of the researcher so that he
can undertake it properly, and it should be free of any vagueness. Clarity also means that the
research should be directional and should shape the whole research methodology.
ii. Planned Research Design: The research design must be properly planned. For example, if a
researcher is using a sampling technique for a selected group, the sample must be made
representative. Here, the researcher can collect primary as well as secondary data. The
major challenge generally seen is the researcher's personal bias in selecting data.
iii. Maintain Ethical Standards: Researchers mainly work independently. Data reliability
should be a main concern, and ethical issues involved in conducting research should be given
precedence.
iv. Organized Presentation of Findings: The most important task of a researcher is to present
research findings in an organized manner. The researcher should avoid technical jargon and
must maintain objectivity in the results.
v. Emphasize Limitations: It is desirable that the researcher point out the limitations
encountered during the research process. Limitations may relate to data collection, shortage
of time, money, etc.
vi. Rationalize Conclusions: Researchers must verify their work and provide rationalized
conclusions, which are mainly obtained when the research work is free from bias.
LIMITATIONS OF RESEARCH
i. Bias by the Researcher: Bias is a major threat to the success of any research work. Bias occurs
at many levels: personal bias of the researcher, a biased questionnaire, biased respondents
or improper sampling.
ii. Defective Data Collection: When a researcher is not diligent in his work, he may use
faulty methods of data collection, leading to faulty conclusions.
iii. Existence of Subjectivity: Subjectivity occurs when the researcher is swayed by likes and
dislikes, beliefs, faith, etc. These factors may negatively affect the worth of the research
and thereby increase the subjectivity of the work.
iv. Lengthy and Time-consuming: Research is a lengthy, time-consuming activity. Even when
carried out in a systematic manner, exploratory research may require considerable time.
RESEARCH GOALS
The aim of research is to use scientific methods to find answers to questions. The main goal of
the investigation is to find the truth that is hidden and not yet discovered. Although each research
study has its own specific purpose, we can imagine research objectives falling into the following
general groups:
i. To become familiar with a phenomenon or to gain new knowledge about it;
ii. To portray accurately the characteristics of a particular individual, situation or group;
iii. To determine the frequency with which something occurs or is associated with something else;
iv. To test a hypothesis of a causal relationship between variables.
MOTIVATION IN RESEARCH
What makes people undertake research? This is a question of fundamental importance. Possible
motives for conducting research may be one or more of the following:
i. Desire for a research degree with the associated advantages;
ii. The desire to take on the challenge of solving unsolved problems, i.e. concern for practical
problems, initiates research;
iii. Desire to gain the intellectual pleasure of creative work;
iv. Desire to be of service to society;
v. Desire to gain reputation.
3. HYPOTHESIZING
The development of hypotheses is a technical task that depends on the experience of the
researcher. Hypothesizing consists in extracting the positive and negative aspects of the cause and
effect of a problem. It narrows the scope of the investigation and keeps the investigator on track.
5. SAMPLING
The researcher must design a sample: a plan for drawing respondents from a specific area or
universe. The sample can be of two kinds:
i. Probability sample
ii. Non-probability sample
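The probability sampling plan described above can be illustrated with a short sketch. This is a minimal, hypothetical example (the unit itself prescribes no code): it draws a simple random sample, the most common probability sampling technique, from an illustrative numbered "universe" using Python's standard library.

```python
import random

# A hypothetical universe (population) of 100 numbered respondents.
population = list(range(1, 101))

# Simple random sampling: every member has an equal chance of selection.
# random.sample draws without replacement, so no respondent appears twice.
random.seed(42)  # fixed seed so the sketch is reproducible
sample = random.sample(population, k=10)
print(sample)
```

Because every member of the universe has the same chance of being chosen, such a sample can reasonably be treated as representative; increasing `k` simply yields a larger sample.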
6. DATA COLLECTION
Data collection is the researcher's most important task. The information collected should come
from the following two types of sources:
i. Primary data collection: Primary data can come from any of the following:
a. Trial
b. Questionnaire
c. Observation
d. Interview
ii. Secondary data collection: It has the following categories:
a. Literature review
b. Official and unofficial reports
c. Library approach
7. DATA ANALYSIS
Once the data is collected, it is sent for analysis, which is the most technical part of the work.
Data analysis can be divided into two main categories:
i. Data processing: data manipulation, data coding, data classification, data tabulation, data
presentation and data measurement.
ii. Data exposition: description, explanation, narration, conclusions/findings and
recommendations/suggestions.
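The data processing steps listed above (coding, classification, tabulation) can be sketched briefly. The survey responses and the coding scheme below are hypothetical, chosen only to illustrate the idea with Python's standard library.

```python
from collections import Counter

# Hypothetical raw survey responses (primary data).
responses = ["agree", "disagree", "agree", "neutral", "agree", "disagree"]

# Data coding: map each textual response to a numeric code.
codes = {"agree": 1, "neutral": 0, "disagree": -1}
coded = [codes[r] for r in responses]

# Data tabulation: count how often each response category occurs.
table = Counter(responses)
print(table)       # frequency table of the classified responses
print(sum(coded))  # a simple aggregate computed from the coded data
```

The frequency table is the simplest form of data presentation; the coded values can then feed any of the quantitative measurements discussed later in the unit.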
Research is a systematic process of inquiry that involves the collection of data; documentation of
critical information; and analyzing and interpreting this data/information in accordance with
appropriate methods established by particular professional fields and academic disciplines.
"Discover scientific knowledge and stay connected to the world of science".
RESEARCH PURPOSES
i. Generating leads and new customers
ii. Understanding how existing customers perceive the business
iii. Setting pragmatic goals
iv. Developing productive marketing methods
v. Addressing business challenges
vi. Planning a commercial expansion
vii. Identifying new business opportunities
CHARACTERISTICS OF RESEARCH
i. Good research follows a scientific approach to capture accurate knowledge.
ii. Researchers must observe ethics and a code of conduct when making observations or drawing
conclusions.
iii. The analysis is based on logical reasoning and includes both inductive and deductive methods.
iv. Data and knowledge are derived from actual observations in natural settings.
v. All accumulated data are analyzed thoroughly, so that no anomalies remain.
vi. Research creates opportunities to generate new questions; existing data help to open up
additional lines of inquiry.
vii. It is analytical and uses all available data, so there is no ambiguity in the conclusions.
viii. Accuracy is one of the most important aspects of research: the knowledge produced must be
right and correct.
TYPES OF RESEARCH
There are seven main types of research. Researchers choose their methods according to the type of
research topic they want to investigate and the research questions they want to answer (Fig. 3.1).
Applied research is a style of inquiry in which one looks for meaningful solutions to existing
problems, including challenges at work, in education and in society. This type of research includes:
i. Action Research: Action research helps companies find meaningful solutions to problems
by guiding them through the process.
ii. Evaluation Research: In evaluation research, researchers assess existing information to help
decision makers reach an intelligent decision.
iii. Research and Development: Research and development specializes in producing new goods
or services to meet the needs of a target market.
5. Empirical Research
Empirical research is defined as research based on empirical evidence: an inquiry in which the
conclusions of the study are drawn solely from concrete and therefore testable evidence. This
empirical evidence is often collected using quantitative and qualitative market research methods.
E.g., investigating whether listening to upbeat music while at work encourages creativity. Such an
experiment could be conducted by surveying a group of listeners who are exposed to upbeat music
and another group who do not listen to music at all, and then observing both groups.
The results obtained from such a study would provide an empirical test of whether or not upbeat
music promotes creativity.
6. Descriptive Research
Descriptive research is defined as a research technique that describes the characteristics of the
population or phenomenon under study. This descriptive methodology focuses more on the "what"
of the research subject than on the "why".
9. Build Credibility
People tend to take a person's ideas seriously when it is clear that the person knows the subject.
Participating in research helps form a strong foundation for your opinions. It also makes it hard
for people to find fault with something you've come up with.
10. Focus your reach
If you're diving into a subject for the first time, it can be difficult to know where to start. Most of the
time, you have a huge amount of information to sort through. Research helps focus on the most
important and unique points so you can write meaningfully.
11. Teach Discrimination
As you become proficient in research, you can easily distinguish low-quality from high-quality
data, and you become more adept at separating correct information from misinformation. Gray
areas become clearer, even where the conclusion itself remains in doubt.
12. Introducing new ideas
You may already have ideas and opinions on a topic you are researching. The more research you do,
the more perspectives you discover. It encourages you to entertain new ideas and to reconsider
your own point of view. It might even change your mind about a concept or two.
APPLICATIONS OF RESEARCH
Research has a wide range of applications in various fields; these applications are used in almost
every industry.
COLUMN-I COLUMN-II
1. Fundamental research a. To find a solution to a problem by practical means
2. Applied research b. To establish facts experimentally
Answer:
1-b 2-a 3-d 4-e 5-c
SUMMARY
According to American sociologist Earl Robert Babbie, "Research is a systematic investigation that
describes, explains, predicts, and controls an observed phenomenon." Research is a systematic
investigative process that includes the collection of data, the documentation of important
information, and the analysis and interpretation of that data/information according to appropriate
methods established by particular professional and academic fields.
The research topic or problem must be practical, relatively important, feasible, and ethically and
politically justifiable. After selecting the research problem, the second stage is a review of the
literature related to the subject. After formulating the problem and forming a hypothesis about it,
the researcher must create the research design. Any type of research plan can be adopted depending
on the nature and purpose of the research. The researcher must also design a sample. Data collection
is the most important task of the researcher; the information collected should come from the two
types of sources, primary and secondary.
Research refers to the efforts of people to learn about a subject and to develop new knowledge.
People do research to learn about academic conversations on a topic, to identify gaps in
knowledge, to recognize research needs, and to develop new solutions to problems. Research is
structured inquiry that uses acceptable scientific methods to solve problems and create new,
generally applicable knowledge. According to Rocco (2011), research is inquiry or careful
investigation, particularly through the search for new facts in any branch of knowledge. Creswell
(2008) states that "Research is a systematic investigation in order to establish facts." In the broadest
sense of the word, the definition of research includes any collection of data, information and facts
for the advancement of knowledge. "Research involves defining and redefining problems,
formulating hypotheses or proposed solutions; collecting, organizing, and evaluating data; drawing
conclusions; and testing the conclusions to see whether they fit the formulated hypothesis."
The main reason to participate in research is to increase your knowledge. If you are researching a
topic that is completely new to you, it will help you develop your unique perspective on that topic.
The whole study process opens new doors for literary learning and development. A person's ability to
learn is enhanced and they can perform better than someone who is just reluctant to study.
Research is widely used in the medical field and in pharmaceuticals to perform tests and find
new drugs to cure various diseases. It is thanks to research that pharmaceutical companies are able
to synthesize new molecules to treat diseases such as mumps, measles, polio, etc. Drugs are one
part of the pharmaceutical industry; the second big area of investment is medical technology.
KEYWORDS
Research- Creative and systematic work done to increase the stock of knowledge.
Hypothesis- A conjecture or suggested explanation made on the basis of limited evidence as a
starting point for further investigation.
Information- Knowledge gained from inquiry, study or instruction.
Application- An act of putting something to use by applying new techniques.
Quantitative- Based on quantity or amount rather than quality.
Qualitative- Describing qualities or characteristics.
Basic- Relating to or forming the base or essence; fundamental.
REFERENCES
1. Kukull, W. A.; Ganguli, M. (2012). The trees, the forest, and the low-hanging fruit. Neurology,
78(23): 1886-1891.
2. Pepinsky, Thomas B. (2019). The Return of the Single-Country Study. Annual Review of
Political Science, 22: 187-203.
3. Alasuutari, Pertti (2010). The rise and relevance of qualitative research. International Journal of
Social Research Methodology, 13(2): 139-155.
4. Lichtman, Marilyn (2013). Qualitative Research in Education: A User's Guide (3rd ed.). Los
Angeles: SAGE Publications. ISBN 978-1-4129-9532-0.
YOUTUBE VIDEOS
1. https://www.youtube.com/watch?v=TFaKHyJGqvQ
2. https://www.youtube.com/watch?v=mV0bUQpz468
3. https://www.youtube.com/watch?v=GSeeyJVD0JU
4. https://www.youtube.com/watch?v=w_Ujkt83i18
WIKIPEDIA
1. https://en.wikipedia.org/wiki/Research
2. http://studylecturenotes.com/10-steps-in-research-process/
3. https://www.uou.ac.in/sites/default/files/slm/BHM-503T.pdf
4. https://theimportantsite.com/10-reasons-why-research-is-important/
5. https://www.marketing91.com/applications-of-research/
REFERENCE BOOKS
1. Kumar Ranjit: Research Methodology: A Step by Step Guide for Beginners, Sage Publication,
2014.
2. Kothari CR: Research Methodology, New Age International, 2011.
3. Shajahan S: Research Methods for Management, 2004.
4. Thanulingom N: Research Methodology, Himalaya Publishing, 2015.
5. Rajendar Kumar C: Research Methodology, APH Publishing, 2008.
“There’s no discovery without a search and there’s no rediscovery without a research. Every
discovery man ever made has always been concealed. It takes searchers and researchers to unveil
them, that’s what make an insightful leader”- Benjamin Suulola
INTRODUCTION
A research methodology gives research legitimacy and provides scientifically sound findings. It
also provides a detailed plan that helps to keep researchers on track, making the process smooth,
effective and manageable. A researcher's methodology allows the reader to understand the approach
and methods used to reach conclusions.
A sound research methodology provides the following benefits:
i. Other researchers who want to replicate the research have enough information to do so.
ii. Researchers who receive criticism can refer to the methodology and explain their
approach.
iii. It can help provide researchers with a specific plan to follow throughout their research.
iv. The methodology design process helps researchers select the correct methods for the
objectives.
v. It allows researchers to document what they intend to achieve with the research from the
outset.
In a thesis, dissertation, academic journal article or other formal pieces of research, there are often
details of how the researcher approached the study and the methods and techniques they used. If
you're designing a research study, then it's helpful to understand what research methodology is and
the selection of techniques and tools available to you. In this unit, we explore what research
methodology is, the types of research methodologies, and the techniques and tools commonly used
to collect and analyze data. All of these research methodology activities are covered in this unit.
DATA COLLECTION
In statistics, data collection is a process of gathering information from all relevant sources to
find a solution to a research problem. This helps to evaluate the outcome of the problem. The data
collection method allows one to conclude answers to related questions. Most organizations use data
collection methods to make assumptions about future probabilities and trends. Once the data has
been collected, it is necessary to go through the data sorting process.
Data can be classified into two types: primary data and secondary data. The most important thing
about data collection in any research or business process is that it helps to identify many important
things, especially about performance. The data collection process therefore plays an important role
in every workflow.
According to data type, data collection methods are divided into two categories:
1. Primary data collection method
2. Secondary data collection method
In this unit, the different types of data collection methods and their advantages and limitations will
be explained.
1. Primary data collection methods
Primary data or raw data is the type of information obtained directly from the first source through
experiments, surveys or observations. The main data collection methods are classified into two
categories. They are: -
i. Quantitative data collection method
ii. Qualitative data collection method
Below are the different methods taken to collect data under these two data collection methods.
i. Quantitative data collection methods: These methods are based on mathematical calculations
and use formats such as closed-ended questions, correlation and regression methods, and
measurements of mean, median or mode. This approach is less expensive than qualitative
data collection and can be applied in a short period of time.
ii. Qualitative data collection methods: These methods do not involve mathematical
calculations and are closely associated with non-quantifiable elements. Qualitative data
collection includes interviews, questionnaires, observations, case studies, and more. Several
methods are used to collect this type of data, including:
a. Observation method
Nature observation is one of the research methods that can be used to design observational studies.
Another common type of observation is controlled observation. In this case, the researcher observes
the participant in a controlled environment. Observers control for most variables and ensure that
participants are structurally observed.
In nature observation, you study your research subjects in their own environment to explore their
behaviors without any outside influence or control. It is a research method used in field studies.
Traditionally, observational studies of nature have been used by animal researchers, psychologists,
ethnologists, and anthropologists. Observation of nature is useful as a method of hypothesis
generation because you gather rich information that can inspire new research.
OBSERVATION OF NATURE
In the 1930s, the scientist Konrad Lorenz coined the term "imprinting" to describe a critical
period of learning in young animals. Based on his naturalistic observations, he concluded that
newly hatched birds imprint on the first potential parent figure they encounter in their environment
and quickly learn to follow and respond to it. Nature observations are particularly useful for
studying behaviors and actions that may not be reproducible in a controlled laboratory setting.
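Returning to the quantitative data collection methods mentioned earlier, the basic measurements they rely on (mean, median and mode) can be computed directly with Python's standard library. The scores below are hypothetical illustration data, standing in for answers to a closed-ended survey question.

```python
import statistics

# Hypothetical quantitative data: ten scores from a closed-ended question.
scores = [4, 5, 3, 4, 2, 4, 5, 3, 4, 1]

print(statistics.mean(scores))    # arithmetic average of the scores -> 3.5
print(statistics.median(scores))  # middle value of the sorted scores -> 4
print(statistics.mode(scores))    # most frequent score -> 4
```

The three measures answer slightly different questions: the mean summarizes the overall level, the median is robust to extreme values, and the mode identifies the most common response.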
01-02-04: EXPERIMENTATION
INTRODUCTION
Empirical research is research done with a scientific approach using two sets of variables. The first
set acts as a constant, which you use to measure the difference of the second set. For example, the
quantitative research method is empirical.
If you do not have enough data to support your decisions, you must first identify the facts.
Experimental research collects the data you need to help you make better decisions.
All studies conducted under scientifically acceptable conditions use experimental methods. The
success of experimental studies relies on the researcher confirming that a change in one variable is
caused solely by the manipulation of the other. The study must establish a clear cause and effect.
You can conduct experimental research in the following situations:
i. Time is a factor in establishing a cause-and-effect relationship.
ii. The behavior between cause and effect is invariant.
iii. You want to understand the importance of the cause-and-effect relationship.
How you categorize research topics, by condition or group, determines the type of research plan you
should use.
1. Pre-experimental research design: One or more groups are kept under observation after a factor
assumed to cause change has been applied. You conduct this research to understand whether further
investigation is needed for these specific groups. Pre-experimental research can be divided into
three categories:
i. One-shot case study design
ii. One-group pretest-posttest design
iii. Static-group comparison
2. True experimental research design: True experimental research relies on statistical analysis to
prove or disprove a hypothesis, making it the most accurate form of research. Among experimental
designs, only the true design can establish cause and effect within a group. In a true experiment,
three factors must be satisfied:
i. There is a control group, which is not subject to change, and an experimental group, which
experiences the changed variables.
ii. There is a variable that can be manipulated by the researcher.
iii. Participants are distributed randomly.
3. Quasi-experimental design: The word "quasi" denotes similarity. A quasi-experimental
design resembles a true experimental design, but it is not the same. The key difference between the
two is that a quasi-experiment does not randomly assign participants to groups.
COLUMN-I                      COLUMN-II
1. Qualitative research       a. Study height, weight
2. Quantitative research      b. Study smart, beauty
3. Audio-visual research      c. Study human DNA
4. Applied research           d. Record sound and picture
5. Basic research             e. Study to cure disease
Answer:
1-b 2-a 3-d 4-e 5-c
SUMMARY
Research design is a blueprint for answering your research question. Research design and methods
are different but closely related: good research design ensures that the data you collect will help you
answer your research question effectively. To answer your question, you need to make decisions
about how your data will be collected. Research methods are specific procedures for collecting and
analyzing data, and your method depends on the type of data you need to answer your research
question. Research methodology refers to the methods and techniques used to carry out research
effectively. Several methods are used in research to interpret ideas; we explore the main types in this
unit. In every case, data must be collected, so before discussing data collection methods, let us
understand what data collection is and how it helps in different fields.
Nature observation is one of the research methods that can be used to design observational studies. In
nature observation, you study your research subjects in their own environment to explore their
behaviors without any outside influence or control. Traditionally, observational studies of nature
have been used by animal researchers, psychologists, ethologists, and anthropologists. Nature
observations are particularly useful for studying behaviors and actions that may not be repeated in a
controlled laboratory setting.
Fieldwork is defined as a method of qualitative data collection for the purpose of observing,
interacting with, and understanding people as they find themselves in a natural environment. For
example, conservationists observe the behavior of animals in their natural environment and how they
respond to certain situations. However, the cause and effect of a certain behavior is difficult to
analyze due to the presence of many variables in the natural environment.
Experimental research collects the data you need to help you make better decisions. All studies
conducted under scientifically acceptable conditions use experimental methods. The success of an
empirical study relies on the researcher confirming that the change in a variable is caused solely by
the manipulation of the independent variable. The study must establish a clear cause-and-effect
relationship.
KEY WORDS
Case Study - Defined as an in-depth study of a person, a group of people, or a unit, for the purpose
of generalizing over multiple units.
Ethnography- An essential research method to know the world from the point of view of its social
relations.
Innovation - It is vital to the continued success of any organization.
Data Sampling - This is a statistical analysis technique used to select, manipulate, and analyze a
representative subset of data points to identify patterns.
Time Sampling - Many studies of the development of social behavior use an observational
technique known as temporal sampling.
Literature - Literature is widely known as any collection of written works.
Design- Design is a blueprint or specification for building an object or system.
REFERENCES
1. Holland, Paul W. (December 1986). Statistics and Causal Inference. Journal of the American
Statistical Association. 81 (396): 945–960.
2. Stohr-Hunt, Patricia (1996). An Analysis of Frequency of Hands-on Experience and Science
Achievement. Journal of Research in Science Teaching. 33 (1): 101–109.
3. Baskerville, R. (1991). Risk Analysis as a Source of Professional Knowledge.
Computers & Security. 10 (8): 749–764.
YOUTUBE VIDEOS
1. https://www.youtube.com/watch?v=Y0wDYLpIoTw
2. https://www.youtube.com/watch?v=LpmGSioXxdo
3. https://www.youtube.com/watch?v=igwqp_yIgwM
4. https://www.youtube.com/watch?v=tBXznU_TPJo
WIKIPEDIA
1. https://www.scribbr.com/methodology/naturalistic-observation/
2. https://www.educba.com/types-of-research-methodology/
3. https://en.wikipedia.org/wiki/Field_research
4. https://www.nngroup.com/articles/field-studies/
5. https://www.questionpro.com/blog/experimental-research/
REFERENCE BOOKS
1. Kumar Ranjit: Research Methodology: A Step by Step Guide for Beginners, Sage Publication,
2014.
2. Kothari CR: Research Methodology, New Age International, 2011.
3. Shajahan S: Research Methods for Management, 2004.
4. Thanulingom N: Research Methodology, Himalaya Publishing, 2015.
5. Rajendar KumarC: Research Methodology, APH Publishing, 2008.
“Don't confuse hypothesis and theory. The former is a possible explanation; the latter, the correct
one. The establishment of theory is the very purpose of science” —Martin H. Fischer
INTRODUCTION
In common usage in the 21st century, a hypothesis refers to a provisional idea whose merit
requires evaluation. For proper evaluation, the framer of a hypothesis needs to define specifics in
operational terms. A hypothesis requires more work by the researcher in order to either confirm or
disprove it. In due course, a confirmed hypothesis may become part of a theory or occasionally may
grow to become a theory itself. Normally, scientific hypotheses take the form of a mathematical
model. Sometimes, but not always, they can also be formulated as existential statements, stating that
some particular instance of the phenomenon under examination has some characteristic, or as causal
explanations, which have the general form of universal statements, stating that every instance of the
phenomenon has a particular characteristic.
Consider the three Cs of principle or hypothesis writing in research: being clear, being concrete,
and being concise. Remember that, above all, you are trying to communicate with your reader, so
you should do everything you can to help them understand what you are trying to say.
Research design is the conceptual structure within which the research will be conducted. The
function of the study design is to enable the collection of relevant information with minimal
expenditure of effort, time, and money. Study design is important because it guides the researcher to identify appropriate
methods of data collection and analysis. A good study design is characterized by flexibility,
efficiency and relevance, etc. A well-developed study design is one that leads to minimal or no
errors if everything goes according to plan. It is important to have clarity of the research question
towards the objectives to be achieved. As a result, the researcher may have to create a combination
of different design approaches to create one that is appropriate for the problem at hand.
According to Green and Tull, research design is the specification of methods and procedures for
acquiring the information needed. Preparing a study design tailored to a particular research problem
involves considering the following:
1. Objectives of the study
2. Data collection methods to be applied
3. Source of information - Sampling plan
4. Data collection tools
5. Data analysis tools: qualitative and quantitative
A good research design offers several advantages:
i. It facilitates the smooth running of the various research activities, making research as
efficient as possible and yielding maximum information with minimum effort, time, and
money;
ii. It reduces ambiguity;
iii. This helps to achieve maximum efficiency and reliability;
iv. It helps to eliminate biases and marginal flaws;
PRINCIPLE
The principles and theories of science are established by repeated experiments and observations,
and are peer-reviewed before being accepted by the scientific community. Acceptance does not
imply rigidity or constraint, nor is it dogmatic: as new data become available, previous scientific
explanations are revised and improved, or discarded and replaced. Science is a means of making
sense of the world, with consistent and clearly described methods and principles. There is a
progression from hypothesis to theory through testable scientific laws. Only some scientific facts are
laws of nature, and many hypotheses must be tested in building a theory. Here you will learn how
scientific assumptions, theories, and laws describe the natural world.
Research is a scientific method of gathering and evaluating data to obtain a solution to a problem.
New ideas are generally invented through the process of research, which follows the scientific
method employed when trying to solve a problem.
There are two sets of principles in research:
i. Basic principles
ii. Core principles
i. Basic Principles
The four basic principles of research are autonomy, beneficence, non-maleficence, and justice.
i. The principle of autonomy establishes the right to agree or refuse to take part in the
research, and that decisions about health-care methods rest with the patient.
HYPOTHESIS
A hypothesis is an idea about the natural world that can be tested by observations or experiments.
To be considered scientific, a hypothesis must be testable and falsifiable, meaning it is formulated in
a way that could be proven false. The hypotheses that Mendel formed on the basis of his
observations included the following:
i. In an organism, a pair of factors controls a given trait.
ii. The organism inherits these factors from its parents, one factor from each parent.
iii. Each factor is passed from generation to generation as a discrete, unchanging unit.
iv. When gametes are formed, the factors separate and are distributed as units to each
gamete.
v. If an organism has two different factors for a trait, one may be expressed to the
complete exclusion of the other.
3) Is a hypothesis a question?
Answer: No; a hypothesis is a statement, not a question. It responds to a research question and
proposes an expected outcome. It is an integral part of the scientific method, which is the basis of
scientific experiments.
The two groups typically are a treatment group and a control group. The treatment group, also
known as the experimental group, receives the treatment that the researcher is evaluating. The
control group, on the other hand, does not receive the treatment. Instead, the control group receives
either a standard treatment (with known effect) or a fake (inactive) treatment, such as a placebo.
Moreover, for these two groups, the researcher must determine how to assign subjects to each group,
which is another important aspect of experimental design. The primary method that researchers use
to assign subjects to groups is random assignment. With random assignment, subjects are put into
groups using a random method. Each subject has an equal chance of being assigned to a group, and
each subject is assigned to each group independently of other subjects. The assignment is like a coin
toss because the chance of assignment to either group is 50%. Computers are a useful tool for
generating random assignments. Another useful and important feature of random assignment is
having a larger, rather than a smaller, number of subjects.
This helps to make the groups more equivalent (similar in all attributes except the attribute being
researched). In fact, the goal is to make the groups probabilistically equivalent. This is when groups
are randomized in such a way that the two groups can be declared statistically equivalent. This
typically means that a 95% or greater equivalence is obtained. This level of equivalence ultimately
leads to more valid and solid conclusions. For example, two equivalent groups would be groups
where the subjects share the same characteristics, such as the mean age, mean weight, and health are
the same between the groups, so the efficacy of a drug could be adequately evaluated and
determined.
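The coin-toss assignment just described can be sketched in a few lines of Python. This is a minimal illustration, not a prescribed procedure: the subject labels are hypothetical, and the shuffle-and-split approach is one common way to get equal-sized groups while giving every subject the same chance of landing in either group.

```python
import random

def random_assignment(subjects, seed=None):
    """Randomly split subjects into a treatment and a control group.

    Shuffling and splitting in half gives every subject an equal (50%)
    chance of ending up in either group, like a coin toss per subject,
    while keeping the two groups the same size."""
    rng = random.Random(seed)
    shuffled = subjects[:]              # copy so the input list is untouched
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]   # (treatment, control)

subjects = [f"S{i:02d}" for i in range(1, 21)]    # 20 hypothetical subjects
treatment, control = random_assignment(subjects, seed=42)
print(len(treatment), len(control))               # 10 10
```

With a larger subject pool, the same call tends to produce groups that are more nearly equivalent on attributes such as age or weight, which is exactly the point made above.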
Experimental design is a controlled plan for conducting research and organizing experiments. It is a
quantitative and scientific process that allows data to be collected and evaluated objectively. The
purpose of an experimental design is to reach a conclusion about a phenomenon, or to determine
whether there is any truth to a hypothesis: an educated guess, made before the experiment starts,
about what the conclusion should look like.
Phenomena usually involve a relationship between two or more variables. A variable is something
that can change or be changed during a study. It is an element that can be manipulated, controlled, or
measured in a research study. For example, one variable might be time, height, weight, age, or
disease, as these factors can change or be changed. Typically, in an experimental design, a researcher will
evaluate variables between two groups. This is called two-group design because it involves dividing
subjects into two groups so that they can be compared when evaluating a phenomenon.
What are the two groups in an experimental design? The two groups are usually a treatment group
and a control group. The treatment group, also known as the experimental group, receives the
treatment that the researcher is evaluating, while the control group does not.
MATCHED-PAIRS EXPERIMENT
A matched-pairs trial design can be used to conduct randomized block design testing with a
relatively small number of participants, allowing researchers to reduce some of the variables
involved. Although matching participants can be time-consuming, it ensures that the treatment
groups have similar characteristics and variables. This helps researchers know whether a difference
in outcomes after the treatments are over is due to the treatment method or to other variables.
In experiments involving survey data, a participant's responses can be influenced by the order in
which response options are presented; this is called the order effect. In a matched-pairs trial design,
each member of a matched pair can be assigned to a different condition, which helps control for such
effects.
OBJECTIVE
The purpose of paired samples is to obtain better statistics by controlling for the effect of other
unexpected variables. For example, if you are studying the health effects of alcohol, you can control
for age-related health effects by matching participants of similar ages.
TEST
When you run a hypothesis test, you should choose a test designed either for independent samples
or for dependent (paired) samples. Paired samples can be analyzed using the following tests:
i. McNemar's test, a non-parametric test for paired nominal data.
ii. The paired-samples t-test (also known as the "related measures" t-test or the dependent-
samples t-test), which compares the means of two groups to see whether there is a
statistically significant difference between them.
iii. Wilcoxon's signed-rank test, a non-parametric alternative to the t-test. Note that this test
does not compare means; it compares ranks.
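The paired-samples t-test is straightforward to compute by hand: it is a one-sample t-test on the within-pair differences. A minimal sketch using only the Python standard library; the before/after scores below are hypothetical example data.

```python
import math
import statistics

def paired_t_statistic(before, after):
    """Paired-samples t statistic: the mean within-pair difference
    divided by its standard error, with n - 1 degrees of freedom."""
    diffs = [a - b for a, b in zip(after, before)]
    n = len(diffs)
    mean_d = statistics.mean(diffs)
    sd_d = statistics.stdev(diffs)            # sample standard deviation
    t = mean_d / (sd_d / math.sqrt(n))
    return t, n - 1                           # (t statistic, degrees of freedom)

before = [12, 15, 11, 14, 13, 16]             # hypothetical pre-treatment scores
after  = [14, 17, 12, 15, 15, 18]             # hypothetical post-treatment scores
t, df = paired_t_statistic(before, after)
```

In practice a statistical package's paired t-test routine would also return the p-value; here only the t statistic and degrees of freedom are computed, which is enough to show what the test measures.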
FACTORIAL EXPERIMENT
In statistics, a full factorial experiment is one whose design includes two or more factors, each with
discrete possible values or "levels", and whose experimental units take on all possible combinations
of those levels across all factors. A full factorial design may also be called a fully crossed design.
Such an experiment allows the investigator to study the effect of each factor on the response
variable, as well as the effect of interactions between factors on the response variable. In most
factorial experiments, each factor has only two levels. For example, with two factors each taking two
levels, a full factorial experiment would have 2 × 2 = 4 treatment combinations in total.
The main drawback is the difficulty of experimenting with more than two factors, or with many
levels. A factorial design must be planned meticulously, because a mistake at one of the levels, or in
the general operation, would jeopardize a great amount of work. Aside from these drawbacks,
factorial design is a mainstay of many sciences, yielding excellent results in the field.
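Because a full factorial design crosses every level of every factor, its treatment combinations can be enumerated mechanically. A sketch in Python; the factor names and levels are hypothetical.

```python
from itertools import product

# Two factors, each with two levels: a 2 x 2 full factorial design
factors = {
    "temperature": ["low", "high"],
    "fertilizer":  ["A", "B"],
}

# Every experimental unit takes one of all possible level combinations
runs = list(product(*factors.values()))
for run in runs:
    print(dict(zip(factors, run)))

print(len(runs))   # 2 x 2 = 4 treatment combinations
```

Adding a third two-level factor would double the number of runs to 8, which illustrates the drawback noted above: the run count grows multiplicatively with factors and levels.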
Block design in statistics, also known as blocking, is the arrangement of experimental units or
subjects into groups called blocks. A block design is often used to account for or control potential
sources of undesired variation. Blocks are relatively uniform subsets formed according to the
experimental conditions. By dividing subjects into blocks, the researcher ensures that the variation
within blocks is smaller than the variation between blocks.
In an experiment, there are two main variables, the dependent variable and the independent
variable. The dependent variable is the variable that is tested or measured in an experiment while the
independent variable is the factor that is believed to have an effect on the dependent variable. The
independent variable is often modified to observe and record its effect on the dependent variable. A
confounding variable is a variable that affects both the dependent and independent variables and can
lead to false or misleading results.
A randomized block design (RBD) is an experimental design in which experimental subjects or
units are grouped into blocks, and the different treatments to be tested are randomly assigned to the
units in each block. Essentially, a randomized block design groups subjects with similar
characteristics into blocks and randomly tests the effects of each treatment on individual subjects
within each block.
For example, if a farm has a corn field affected by a plant disease and wants to test the effectiveness
of different fungicides in controlling it, it can divide the field into blocks and randomly treat sections
of each block with the different fungicides. By dividing the field into blocks, the farm can account
for some of the variation that may exist in the field. For example, one stretch of the field may have
more shade and prolonged leaf wetness, creating an ideal environment for pathogens to thrive.
Another section may have a slightly different soil type or slope. Through blocking, the farm can
account for these potential confounding factors. The randomized block design is used in many fields
of research, such as pharmaceutical research, agriculture, and animal science.
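The corn-field example can be sketched as code: each block is a relatively uniform subset, and every treatment is assigned once, in random order, to the sections within each block. The block and treatment names below are hypothetical.

```python
import random

def randomized_block_design(blocks, treatments, seed=None):
    """Assign every treatment once within each block, in random order."""
    rng = random.Random(seed)
    design = {}
    for block in blocks:
        order = treatments[:]
        rng.shuffle(order)            # random order within this block
        design[block] = order         # section i of the block gets order[i]
    return design

blocks = ["shaded", "sloped", "sandy"]
treatments = ["fungicide_A", "fungicide_B", "fungicide_C"]
design = randomized_block_design(blocks, treatments, seed=1)
for block, order in design.items():
    print(block, order)
```

Because every treatment appears exactly once in every block, differences between blocks (shade, slope, soil type) cannot be confounded with differences between treatments.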
There are often confounding variables whose effects cannot be predicted, and a researcher may not
even be aware of the existence of certain confounding variables. Understanding a project and being
able to predict potential sources of variation is important for getting the most accurate relationship
between dependent and independent variables. To get the most accurate variable relationship, there
are advantages to using block randomization. Randomized blocking:
i. Reduces bias
ii. Reduces error
iii. Reduces variation in treatment conditions
iv. Ensures that the results are not misinterpreted
v. Help
Answer:
1-True 2-True 3-True 4-False 5-False
Column-I                Column-II
1. RBD                  a. 8 different levels
2. 2x2 factorial        b. 2 dependent and 1 independent
3. 2x2x2 factorial      c. block and treatment
4. 3x3 factorial        d. 2 interventions (2 levels)
5. 2x4 factorial        e. 3 level design
Answer:
1- quadratic 2- independent 3- independent 4- blocks 5-block randomization
REFERENCES
1. McCoy, S. K., & Major, B. (2003). Group identification moderate’s emotional response to
perceived prejudice. Personality and Social Psychology Bulletin, 29, 1005–1017.
2. Babbie, E. (2010). The practice of social research (12th ed.). Belmont, CA: Wadsworth;
Campbell, D., & Stanley, J. (1963). Experimental and quasi experimental designs for research.
Chicago, IL: Rand McNally.
3. Milliken & Johnson (1989), Tukey's single degree-of-freedom test for no additivity, pp. 7-8.
4. Lentner & Bishop (1993), In 6.8 No additivity of blocks and treatments, pp. 213–216.
YOUTUBE VIDEOS
1. https://www.youtube.com/watch?v=vJgcae2ziOM
2. https://www.youtube.com/watch?v=bYURT9wgc98
3. https://www.youtube.com/watch?v=CKTr9T1drcU
4. https://www.youtube.com/watch?v=slscHD40r78
WIKIPEDIA
1. https://en.wikipedia.org/wiki/Generalized_randomized_block_design
2. https://en.wikipedia.org/wiki/Factorial_experiment
3. https://en.wikipedia.org/wiki/Fractional_factorial_design
4. https://en.wikipedia.org/wiki/Randomization
“Sampling, statisticians have told us, is a much more effective way of getting a good census” –
Rob Lowe
INTRODUCTION
Although the idea of sampling is easiest to understand when you think about a very large
population, it makes sense to use sampling methods in studies of all types and sizes. After all, if you
can reduce the effort and cost of doing a study, why wouldn’t you? And because sampling allows
you to research larger target populations using the same resources as you would smaller ones, it
dramatically opens up the possibilities for research. Sampling is a little like having gears on a car or
bicycle. Instead of always turning a set of wheels of a specific size and being constrained by their
physical properties, gears let you translate your effort to the wheels, so you're effectively choosing
bigger or smaller wheels depending on the terrain you're on and how much work you're able to do.
Sampling allows you to gear your research so you're less limited by the resources available.
(Source: https://www.questionpro.com/blog/types-of-sampling)
SAMPLING FRAME
The sampling frame is the actual list of individuals from which the sample will be drawn. Ideally, it
should include the entire target population (and nobody outside of that population).
SAMPLE SIZE
The number of individuals you need to include in your sample depends on many factors, including
the size and variability of your population and your research plan. There are different formulas and
sample size calculators depending on what you want to achieve with statistical analysis.
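One widely used such formula for estimating a proportion is Cochran's: n = z²p(1−p)/e², where z is the z-score for the chosen confidence level, p the expected proportion (0.5 is the most conservative choice), and e the acceptable margin of error. A sketch:

```python
import math

def cochran_sample_size(z, p, e):
    """Cochran's sample size for estimating a population proportion.

    z: z-score for the confidence level (1.96 for 95% confidence)
    p: expected proportion (0.5 maximizes p*(1-p), the safe default)
    e: acceptable margin of error (0.05 means +/- 5 percentage points)"""
    return math.ceil(z ** 2 * p * (1 - p) / e ** 2)

# 95% confidence, p = 0.5, 5% margin of error
n = cochran_sample_size(1.96, 0.5, 0.05)
print(n)   # 385
```

This reproduces the familiar rule of thumb that roughly 385 respondents suffice for a ±5% margin at 95% confidence; for small populations, a finite-population correction would shrink this further.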
If practical, you could include every individual from every sampled cluster. If the clusters
themselves are large, you can also sample the instances in each cluster using one of the techniques
above. This is called multistage sampling. This method is good for dealing with large and dispersed
populations, but is more prone to errors in the sample, as there can be significant differences between
clusters. It is difficult to ensure that the clusters sampled are truly representative of the entire
population.
2. Voluntary response sampling: Like convenience sampling, a voluntary response sample is
based mainly on ease of access. Instead of the researcher selecting participants and contacting
them directly, people volunteer themselves (for example, by completing a public online survey).
Voluntary response samples are always at least somewhat biased, as some people are inherently
more likely to volunteer than others.
3. Purposive sampling: This type of sampling, also known as judgmental sampling, involves the
researcher using their expertise to select a sample that is most useful for the purposes of the
research. It is often used in qualitative research, where the researcher wants to gain detailed
knowledge of a particular phenomenon rather than make statistical inferences, or where the
population is very small and specific. An effective purposive sample must have clear criteria and
a clear rationale for inclusion.
4. Snowball sampling: If the population is hard to reach, snowball sampling can be used to recruit
participants via other participants. The number of people you have access to "snowballs" as you
come into contact with more people. Probability and non-probability sampling can be compared
as follows:
Probability sampling                                Non-probability sampling
Also known as random sampling methods.              Also called non-random sampling methods.
Used for research that is conclusive.               Used for research that is exploratory.
Involves a long time to get the data.               An easy way to collect the data quickly.
There is an underlying hypothesis before the        The hypothesis is derived later, after
study starts, and the objective of the method       conducting the research study.
is to validate that hypothesis.
NON-RANDOM SAMPLING
Nevertheless, meeting the standards of probability sampling is not easy:
i. Error-free sampling frames are relatively rare when conducting market research.
ii. Ensuring that all individuals in the population have non-zero selection probabilities is just
as difficult to achieve.
iii. Knowing the exact inclusion probability of each sampling unit is more difficult still. When
conducting live polls on the street, we do not have access to a list of the people who make up the
population. When conducting telephone interviews, we have a list of phone numbers, but not
everyone has a landline or a listed number. And for responses from an online panel, anyone outside
the panel has zero chance of being included.
A random variable is a rule that assigns a numerical value to each outcome in the sample space.
Random variables are either discrete or continuous. A random variable is called discrete if it takes
only certain values within an interval; otherwise it is continuous. Random variables are usually
written as capital letters, such as X and Y. If X takes the values 1, 2, 3, …, we are speaking of a
discrete random variable.
A random variable is thus a measurable function whose possible values can be assigned
probabilities. The outcome of a trial depends on unpredictable physical variables. Say you toss a fair
coin: the final outcome, heads or tails, depends on the physical conditions, and it is not possible to
predict which outcome will occur. (Other possibilities, such as the coin breaking or being lost, are
excluded from consideration.)
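The coin-toss example can be made concrete: a random variable is simply a rule that maps each outcome in the sample space to a number. A sketch in Python, taking X = 1 for heads and X = 0 for tails; the simulation details are illustrative.

```python
import random

# Sample space of the experiment, and the random variable X: outcome -> number
sample_space = ["heads", "tails"]
X = {"heads": 1, "tails": 0}

rng = random.Random(0)
outcomes = [rng.choice(sample_space) for _ in range(10_000)]
values = [X[o] for o in outcomes]

# For a fair coin, the relative frequency of X = 1 should be near 0.5
freq_heads = sum(values) / len(values)
print(round(freq_heads, 2))
```

X here is discrete, since it takes only the values 0 and 1; a random variable such as "time until the coin lands" would be continuous.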
VARIABLE
A variable is any entity that can take on different values, e.g. age or country. A dependent variable
is one which is affected by another, independent, variable. A random variable has the same
properties without emphasizing any particular kind of stochastic experiment, and it always obeys
certain probability laws. A random variable is said to be discrete if it cannot take all values within
the specified range, and continuous if it can take any value over the entire range.
Now, differentiating both sides of the above expression with respect to y gives the relation between
the probability density functions: fY(y) = fX(h(y)) |dh(y)/dy|.
The probability that a random variable X takes the value x is given by the probability function of X,
denoted f(x) = P(X = x). A probability distribution always satisfies two conditions:
f(x) ≥ 0 and ∑f(x) = 1
The important probability distributions are:
i. Binomial distribution
ii. Poisson distribution
iii. Bernoulli’s distribution
iv. Exponential distribution
v. Normal distribution
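The two conditions above, f(x) ≥ 0 and ∑f(x) = 1, can be verified directly for one of these distributions. A sketch for the binomial distribution; the values n = 10 and p = 0.3 are arbitrary examples.

```python
from math import comb

def binomial_pmf(n, p):
    """Probability function f(x) = C(n, x) p^x (1-p)^(n-x) for x = 0..n."""
    return [comb(n, x) * p ** x * (1 - p) ** (n - x) for x in range(n + 1)]

f = binomial_pmf(10, 0.3)
print(all(fx >= 0 for fx in f))     # every f(x) is non-negative
print(round(sum(f), 10))            # the probabilities sum to 1
```

The same check applies to any discrete distribution in the list; for the continuous ones (exponential, normal), the sum becomes an integral of the density over the whole range.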
TRANSFORMATION OF RANDOM VARIABLES
Transforming a random variable means reassigning its value to another variable; the transformation
remaps the number line from x to y, and the transformation function is y = g(x).
Expected Value of X
Let a discrete random variable X assume the values x1, x2, x3, … with corresponding probabilities
P(x1), P(x2), P(x3), …; then the expected value of the random variable is
E(X) = ∑ xi P(xi). For a continuous random variable with density f(x), E(X) = ∫ x f(x) dx.
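For a discrete random variable, this expectation is just a weighted sum, E(X) = ∑ xi P(xi). A minimal sketch, using a fair six-sided die as a hypothetical example:

```python
def expectation(values, probs):
    """E(X) = sum of x * P(x) over all values of a discrete random variable."""
    assert abs(sum(probs) - 1) < 1e-9, "probabilities must sum to 1"
    return sum(x * p for x, p in zip(values, probs))

# A fair six-sided die: each face has probability 1/6
faces = [1, 2, 3, 4, 5, 6]
probs = [1 / 6] * 6
print(round(expectation(faces, probs), 10))   # 3.5
```

The assertion enforces the second condition of a probability distribution, ∑f(x) = 1, before the expectation is computed.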
i. The independent variable is the cause. Its value is independent of the other variables in the
study.
ii. The dependent variable is the effect. Its value depends on changes in the independent
variable.
Examples of independent and dependent variables: I am planning a study to test whether changes in
room temperature affect zoological test results.
i. The independent variable is room temperature. Vary the room temperature by keeping half of
the participants cool and the other half warm.
ii. The dependent variable is the zoological test score. They use a standardized test to measure
the zoological performance of all participants and see if it varies with room temperature.
INDEPENDENT VARIABLES
An independent variable is a variable that is manipulated or modified to study its effects in an
experimental study. It is called 'independent' because it is not affected by other variables in the study.
Independent variables are also called:
i. Explanatory variables (variables that describe an event or outcome)
ii. Predictor variable (can be used to predict the value of the dependent variable)
iii. Right side variables (displayed on the right side of the regression equation).
These terms are used specifically in statistics to estimate the extent to which changes in independent
variables can explain or predict changes in dependent variables. The independent variable is exactly
what it sounds like.
This is an independent variable and will not be changed by any other variable you are trying to
measure. For example, a person's age could be an independent variable.
There are two main types of independent variables:
1. Experimental independent variables can be directly manipulated by researchers.
2. Subject variables cannot be manipulated by researchers, but can be used to group research
topics into categories.
1. Experimental variable
Use your experimental data to analyze your results by creating descriptive statistics and visualizing
your results. Then choose an appropriate statistical test to test your hypothesis. The test type is
determined as follows:
i. Variable type
ii. Measurement level
iii. The number of levels of the independent variable.
The type of visualization you use depends on the type of variables in your research question.
i. Bar charts are best when you have categorical independent variables.
ii. Scatter plots or line charts work best when both the independent and dependent variables are
quantitative.
INTERVENING VARIABLES
An intervening variable, sometimes called a mediator variable, helps us understand the relationship
between the independent variable and the dependent variable when there is no direct relationship
between the two. When an independent variable cannot influence the dependent variable directly, a
mediating variable works as a go-between and helps us trace the relationship between the
independent variable (IV) and the dependent variable (DV).
Independent variables govern the dependent variables through the channel of mediating or
intervening variables (Fig.4.2).
Column-I Column-II
1. Random a. A character free from outside control
Answer:
1-b 2-c 3-a 4-e 5-d
SUMMARY
Probabilistic sampling involves random selection, allowing you to make powerful statistical
inferences about the entire group. Non-probability sampling involves nonrandom selection based on
convenience or other criteria, allowing you to easily collect data. In a simple random sample, each
member of the population has an equal chance of being selected. To do this type of sampling, you
can use tools like random number generators or other purely chance-based techniques. In
systematic sampling, each member of the population is listed with a number, but instead of
generating random numbers, individuals are selected at regular intervals. In stratified sampling, you
divide the population into subgroups (called strata) based on relevant characteristics (e.g. gender,
age group, income group, location), then use random or systematic sampling to select a sample from
each subgroup.
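The stratified procedure described above can be sketched in Python: divide the population into strata, then draw a simple random sample from each. The strata, member labels, and sampling fraction below are hypothetical.

```python
import random

def stratified_sample(strata, fraction, seed=None):
    """Draw the same fraction from every stratum by simple random sampling."""
    rng = random.Random(seed)
    sample = {}
    for name, members in strata.items():
        k = max(1, round(len(members) * fraction))   # at least one per stratum
        sample[name] = rng.sample(members, k)
    return sample

strata = {
    "age_18_35": [f"A{i}" for i in range(40)],
    "age_36_60": [f"B{i}" for i in range(50)],
    "age_61_up": [f"C{i}" for i in range(10)],
}
sample = stratified_sample(strata, fraction=0.2, seed=7)
print({k: len(v) for k, v in sample.items()})
# {'age_18_35': 8, 'age_36_60': 10, 'age_61_up': 2}
```

Sampling the same fraction from each stratum (proportionate allocation) guarantees that every subgroup, including the small one, is represented in the final sample.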
A sampling frame is a list of items that make up the population under study. Items within a sampling
frame are called sampling units. To create a sampling frame, a researcher might, for example, access
a company's computer system and pull a list of everyone who purchased a product in the past year.
A sampling frame can be used as long as the following condition is met: every item in the
population has a non-zero chance of being selected as part of the sample.
KEY WORDS
Sample Size- Sample size is a measure of the number of individual samples used in an experiment.
Random Sample- A randomly selected sample is intended to provide an unbiased representation of
the entire population.
Nonrandom Sampling- A method of selecting units from a population using a subjective (that is,
non-random) method.
Cluster Sampling- In cluster sampling, researchers divide a population into smaller groups called
clusters.
Sampling Frame- A list of objects or people that make up the population from which the sample is
taken.
Quasi-Sampling- A systematic sampling of every nth entry from the list is equivalent to a random
sampling for most practical purposes.
Placebo Group- This is the group of participants exposed to placebo or a sham independent
variable.
Subject Variables- Experiences or characteristics of research participants that are not of primary
interest but may influence the outcome of the study and should be considered during experiments or
data analysis.
YOUTUBE VIDEOS
1. https://www.youtube.com/watch?v=6skCMCdh3FY
2. https://www.youtube.com/watch?v=NVr0OqeAdjw
3. https://www.youtube.com/watch?v=V2-Rpc1s9Rc&list=PLqMl6r3x6BUQvUoLYgmf3XmFW8LSEyXlo&index=24
WIKIPEDIA
1. https://www.scribbr.com/methodology/sampling-methods/
2. http://www.stats.gla.ac.uk/steps/glossary/sampling.html
3. https://www.netquest.com/blog/en/random-non-random-sampling
4. https://byjus.com/maths/random-variable/
REFERENCE BOOKS
1. Cutler, Alan (2003), The Seashell on the Mountaintop. Heinemann, London.
2. Altman. DG., (1990), Practical Statistics for Medical Research, CRC Press.
3. L. Castaneda; V. Arunachalam & S. Dharmaraja (2012), Introduction to Probability and
Stochastic Processes with Applications. Wiley.
4. Kallenberg, Olav (2001), Foundations of Modern Probability (2nd ed.). Berlin: Springer Verlag.
“Data are just summaries of thousands of stories—tell a few of those stories to help make the
data meaningful.” ~ Dan Heath
INTRODUCTION
Data collection is the procedure of gathering, measuring, and analyzing accurate information
for research using standard validated techniques.
A researcher can evaluate their hypothesis on the basis of collected data. In most cases, data
collection is the primary and most important step for research, irrespective of the field of research.
The approach of data collection is different for different fields of study, depending on the required
information.
Data collection tools refer to the devices/instruments used to collect data, such as a paper
questionnaire or a computer-assisted interviewing system. Case studies, checklists, interviews,
observations, and surveys or questionnaires are all tools used to collect data.
Accurate data collection is necessary to make informed business decisions, ensure quality assurance,
and keep research integrity. During data collection, the researchers must identify the data types, the
sources of data, and what methods are being used. Depending on the researcher's research plan and
design, there are several ways data can be collected. The most commonly used methods are:
published literature sources, surveys (email and mail), interviews (telephone, face-to-face or focus
group), observations, documents and records, and experiments. Both primary and secondary data
collection are emphasized in the current unit of the study.
Data plays a central role in data science and machine learning. In most cases, we assume that the data
we use for analysis and modeling is free and readily available. In some cases, however, the
full dataset cannot be obtained or would take too long to collect. We then need to find a way to
collect the best subset of data that we can retrieve quickly and efficiently. The process of
designing experiments to collect data is called design of experiments; research surveys and
clinical trials are examples (Fig. 1.1).
iv. Cost: Designing experiments to collect data can be very expensive. Running an
experiment also comes with a cost. For example, participants who answer a survey can be
rewarded as an incentive to participate. Data scientists and data analysts must also be paid for
their analysis of data collected from research. Before planning an experiment, it is important to
assess the cost of running the experiment and whether the benefits of the experiment outweigh
the risks. For example, if the findings can improve customer experience and increase profits, the
investment is worth it.
EXPERIMENT
Experiment is a method of data collection in which you, as a researcher, modify certain variables
and observe their effects on other variables. Variables that you manipulate are called independent
while variables that change due to the manipulation are dependent variables. Imagine a manufacturer
testing the effect of a drug's strength on the number of bacteria in the body. The company decides to
test the drug at strengths of 10 mg, 20 mg, and 40 mg. In this example, drug strength is the
independent variable, while the number of bacteria is the dependent variable.
The greatest advantage of using an experiment is that you can explore causal relationships that an
observational study cannot. Additionally, experimental research can be adapted to different fields
like medical research, agriculture, sociology, and psychology. Nevertheless, experiments have the
disadvantage of being expensive and requiring a lot of time.
There are several types of experimental designs. In general, designs that are truly
experimental have three main characteristics: independent and dependent variables, pre- and post-
test, and experimental and control groups.
Field research is one of the most effective means of primary data collection. Depending on the
question, these interviews can take the form of household surveys, business surveys, or agricultural surveys.
PRIMARY DATA
Primary data is a type of data collected directly from primary sources by researchers through
interviews, surveys, experiments, and so on. It is typically collected from the sources
where the data originally arises and is considered the best type of data in research. Primary data
sources are typically specifically selected and tailored to meet specific research needs or
requirements. Also, before choosing a data collection source, it is necessary to identify the purpose
of the survey, the target audience, etc. For example, if you're doing market research, the first thing to
do is identify your research objectives and sample population. This will determine which data
collection source is most appropriate. Offline surveys are better suited than online surveys for
populations living in remote areas without internet connectivity.
DISADVANTAGES
i. It takes longer.
ADVANTAGES
i. Respondents are given sufficient time to provide their responses.
ii. No interviewer biases.
iii. Cheaper than an interview.
DISADVANTAGES
i. The non-response bias rate is high.
ii. Inflexible and cannot be changed after submission.
iii. It is a slow process.
3. The Observation
Observation method is primarily used in scientific research. Researchers use observations as
scientific tools and methods of data collection. Observations as a means of data collection are usually
systematically planned and managed. There are various approaches to observation methods.
Structured or unstructured, controlled or uncontrolled, participatory, non-participatory, or veiled
approaches. Structured and unstructured approaches differ in how carefully the units to be observed
and the method of recording are defined in advance.
ADVANTAGES
i. Data are generally objective.
ii. Data are not affected by past or future events.
DISADVANTAGES
i. Information is limited.
ii. Expensive
4. Focus Groups
A focus group is a group of two or more people with similar or common characteristics. Researchers
seek candid thoughts and opinions from the participants. Focus groups are a primary source of data
collection, as data are collected directly from the participants. They are commonly used in market
research, where a group of consumers discusses a topic with a research moderator. A focus group is
like an interview, but it involves discussion and dialogue instead of questions and answers. Focus
groups are less formal, with participants leading the bulk of the conversation and a facilitator
overseeing the process.
ADVANTAGES
i. Less expensive than an interview. This is because the interviewer does not have to talk to each
participant individually.
ii. It doesn't take long.
DISADVANTAGES
i. Response bias can become an issue, because participants may hold back their genuine
opinions out of concern for what the rest of the group will think.
ii. Groupthink can prevent the results from clearly reflecting individual opinions.
5. Experiments
ADVANTAGES
i. Collected data is the result of a process and is therefore generally objective.
ii. Non-response bias is eliminated.
DISADVANTAGES
i. Erroneous data may be recorded due to human error.
ii. Expensive.
Interviews are especially useful for revealing the story behind a participant's experience or for
getting more information on a topic. They are also useful for following up with individual
respondents after a survey to explore their answers further. Qualitative research, in particular, uses
interviews to explore the meaning of central themes in the subjects' life world. The main task
of the interviewer is to make sense of what respondents are saying.
SEQUENCE OF QUESTIONS
i Involve the respondent in the interview as early as possible.
ii Before asking about controversial topics (feelings, conclusions, etc.), ask for some facts first.
iii Use factual questions throughout the interview.
iv Before asking questions about the past or future, ask about the present.
v The final question allows respondents to provide additional information and impressions of
the interview that they consider relevant.
vi Questions should be asked carefully.
vii Questions should be asked one at a time.
viii Language should be open. Respondents should be able to select their own descriptive
vocabulary while answering questions.
ix Questions should be as neutral as possible.
x Be careful with why questions.
CONDUCTING INTERVIEWS
The literature consistently recommends the following procedures for conducting research interviews.
i. Identify respondents.
ii. Decide what type of interview to use.
iii. During the interview, tape the questions and answers.
iv. Take short notes during the conversation.
v. Find a quiet place suitable for the interview.
vi. Obtain the interviewee's consent to participate in the study.
vii. Plan, but be flexible.
viii. Obtain additional information using probes.
ix. Be courteous and professional after the interview.
STRENGTHS
i. Interviews provide useful information when participants cannot be observed directly.
ii. Interviewers have more control over the types of information they receive. You can choose
your own question.
iii. Effectively worded questions encourage unbiased and honest answers.
WEAKNESSES
i. If only one interviewer interprets the information, the respondent may provide biased
information or become unreliable. The best research requires different perspectives.
ii. Interview answers may be deceptive because respondents attempt to answer in a way that
pleases the interviewer.
iii. Equipment can be a problem. The equipment can be expensive and requires a high degree of
technical expertise to use.
iv. Can be time consuming and inexperienced interviewers may not be able to properly focus on
questions.
QUESTIONNAIRES
WRITE AN INSTRUCTION
This step is followed by the layout and arrangement of the questions in both authors' accounts,
probably because once the questions and other text are ready, this is the best time to review them. O'Leary
(2014) encourages researchers to use formats that are professional and aesthetically pleasing,
engaging respondents and structured in a way that reduces the likelihood of making mistakes (e.g.
repeating questions). O'Leary (2014) finally instructs researchers to include a cover letter
explaining who you are, the project's goals, non-disclosure agreements, and so on. However, Bell and Waters
(2014) provide further instructions.
DISTRIBUTION
Bell and Waters (2014) give a brief description of the distribution method. They emphasize the need
to ensure confidentiality, provide return dates, develop plans for 'returns' by email, and record
data immediately upon receipt. According to O'Leary (2014), face-to-face, mail, email, and online are typical
methods. Bell and Waters (2014) emphasize the benefits of personal administration of
questionnaires, as researchers explain the purpose of the study and are more likely to receive
completed questionnaires. The authors continue to emphasize the value of online methods.
In particular, they cite "Survey Monkey" as the most popular and versatile survey tool available.
Students are encouraged to send reminders or emails to increase response rates and speed of
response.
ANALYSIS
Bell and Waters (2014) and O'Leary (2014) again disagree on the analysis. O'Leary (2014) suggests
coding the data as soon as possible, whereas Bell and Waters (2014) suggest that researchers review
responses before coding, recoding only when time permits. Both methods have their merits, trading off
how quickly the data become usable against how carefully coding decisions are made.
O'Leary (2014) also raises some concerns about using questionnaires as a research tool, as they can be
time-consuming, expensive, and difficult. O'Leary (2014) argues that surveys are "notoriously difficult to
create" and often don't go according to plan.
STRENGTHS
O'Leary (2014) finds this research method useful because administering questionnaires allows
researchers to generate data specific to their study and to gain insights that may not otherwise be
available, which suggests some obvious advantages. Listing additional benefits of surveys, O'Leary
suggests that questionnaires:
i. Reach a large number of respondents
ii. Represents a larger population
iii. Allow comparison
iv. Generate standardized, quantifiable empirical data
v. Generate qualitative data using open-ended questions
1. Which of these statements is true for collecting information from a third party?
a. The indirect oral investigation is used to collect data from third parties
b. The mailed questionnaire method is apt for gathering information from third parties
c. Third parties prefer direct personal interviews to provide data to the researcher
d. All of the above
2. The main feature of secondary source of data is that -----------------.
a. It provides first-hand information to the researcher
b. It is more reliable compared to primary data
c. It implies that the data is collected from its original source
Answer:
1-a 2-d 3-b 4-a 5-c
SUMMARY
Bell and Waters (2014) and O'Leary (2014) each provide a clear checklist for creating a survey from
start to finish. Bell first reminds the researcher to get permission before administering the questionnaire,
then to ponder what the research question is and whether this is the best way to get the intended
information. Before creating your own questions, you should consider adapting existing instruments
rather than reinventing the wheel. An independent variable is a variable whose level is set by the
experimenter. There are four main factors to consider when designing and conducting data collection
experiments.
Primary data collection is the process of data collection through surveys, interviews, or
experiments. This form of data collection allows researchers to ensure that primary data meet the
standards required for their specific research question in terms of quality, availability, statistical
significance, and sampling. Field research is one of the most effective means of primary data
collection. Online surveys are conducted using internet-enabled devices such as mobile phones, PCs
and tablets. Offline surveys, on the other hand, do not require an internet connection to run.
However, there are also survey tools, such as Formplus, that let a survey be completed on a mobile
device without access to an internet connection.
Secondary data is data that has been collected and made available from other sources. Secondary
data is data that has been collected from primary sources and made available to researchers for use in
their own research. Secondary data is known to be always available compared to primary data.
Secondary data analysis is the process of analyzing data collected primarily by another researcher
who collected the data for another purpose. The process of secondary data analysis can be
quantitative or qualitative, depending on the type of data the researcher is working with. Quantitative
methods of secondary data analysis are applied to numerical data and analyzed mathematically,
whereas qualitative methods use words to provide detailed information about the data.
Secondary data analysis has different phases, including pre-collection, during collection, and post-
collection events. This means having a clear understanding of why you are collecting data (the
ultimate goal of your research work) and how this data can help you achieve that. This will help you
collect the right data and choose the best data sources and analysis methods. For example, a
researcher trying to collect data on optimal fish diets that allow rapid growth of fish should ask
questions such as: What kind of fish should be considered? Data that does not fit the research
question can then be excluded; for example, a researcher running a purely quantitative analysis may
set qualitative data aside. Depending on the data type, the data is analyzed using quantitative or
qualitative methods.
KEY WORDS
Primary data - Data generated by the researchers themselves, surveys, interviews, experiments
specifically designed to understand and solve the research question at hand.
Secondary data - This is data that has already been collected from primary sources and made
available to researchers for use in their own research.
Survey - A survey is a list of questions or items used to collect data about a respondent's attitudes,
experiences, or opinions.
Data Reliability - Data reliability means that the data is complete and accurate.
Bot - A computer program that acts as an agent for a user or another program.
Podcast - A digital audio file that you can download to your computer or mobile device over the
Internet.
Journals - Journals are academic publications containing articles written by researchers, professors,
and other professionals.
Trauma - Emotional reactions to horrific events such as accidents, rapes, and natural disasters.
Focus Groups - A research method that brings together a small group of people to answer questions
in a moderated environment.
REFERENCES
1. Bell, J., Waters, S., &EBooks Corporation. (2014). Doing your research project: A guide for
first-time researchers (Sixth ed.). Maidenhead, Berkshire: Open University Press.
2. Creswell, J. W. (2009). Research design: Qualitative, quantitative, and mixed methods approach
(3rd ed.). Los Angeles: Sage.
3. Kvale, S., and Sage Research Methods Online. (2008). Doing interviews. Thousand Oaks;
London: SAGE Publications, Limited.
4. McNamara, C. (1999). General Guidelines for Conducting Interviews, Authenticity Consulting,
LLC, Retrieved from: http://www.managementhelp.org/evaluatn/ intrview.htm
YOUTUBE VIDEOS
RES505: Research Methodology Page 121
1. https://www.youtube.com/watch?v=lqqJ5BmXzB0
2. https://www.youtube.com/watch?v=oLcxcx4blTc
3. https://www.youtube.com/watch?v=2y8w3AoxHSE
4. https://www.youtube.com/watch?v=iecJry3Kwrk&list=PL0SUHdavZkG0sSbMWQgsYJJkujK7ujKU
WIKIPEDIA
1. https://www.formpl.us/blog/primary-data
2. https://lled500.trubox.ca/2016/225
3. https://en.wikipedia.org/wiki/Data_collection
4. https://www.techtarget.com/whatis/definition/secondary-data
REFERENCE BOOKS
1. Sapsford, R. and Jupp, V. Data Collection and Analysis. Sage Publications, 2006.
2. Jovancic, Nemanja.Data Collection Methods for Obtaining Quantitative and Qualitative Data.
LeadQuizzes, 2005.
3. Schutt, R. Investigating the Social World. Sage Publications, 2006.
4. Corti, L. and Bishop, L. 'Strategies in Teaching Secondary Analysis of Qualitative
Data' FQS 6(1), 2005.
LEARNING OBJECTIVES
Understand data sets and present them in multiple ways
Create line charts, summary charts, and bar charts to display the same data in different
formats
Interpret line, summary, and bar charts to answer questions about data sets in multiple
formats
Compare and express opinions on different presentations of data
Determine the appropriate presentation for different situations
“You can achieve simplicity in the design of effective charts, graphs and tables by remembering
three fundamental principles: restrain, reduce, emphasize.” - Garr Reynolds
INTRODUCTION
By a data representation is meant any convention for the arrangement of things in the physical world
in such a way as to enable information to be encoded and later decoded by suitable automatic
systems. Data representation refers to the form in which data is stored, processed, and transmitted.
Devices such as smartphones, iPods, and computers store data in digital formats that can be handled
by electronic circuitry.
Quantitative data answer questions such as “How many?”, “How often?”, “How much?”. This data
can be verified and conveniently evaluated using mathematical techniques.
For example, there are quantities corresponding to various parameters. For instance, "How many
students of M.Sc. Bioscience passed?" is a question that will collect quantitative data. Values are
associated with most measured parameters, such as pass class, second class, first class, outstanding,
etc.
Quantitative data makes measuring various parameters controllable due to the ease of mathematical
derivations they come with. It is usually collected for statistical analysis using surveys, polls, or
questionnaires sent across to a specific section of a population.
Researchers can generalize the retrieved results across a population (Fig.1.1).
Height in feet, age in years, and weight in pounds are examples of quantitative data.
4) What is the best example of quantitative data?
Answer: Here are some examples of quantitative data: A one-gallon jug of milk. The painting
measures 14 inches wide and 12 inches long. A newborn baby weighs 6 pounds and 5 ounces. A 4-
pound bag of broccoli crowns. One cup of coffee contains 10 oz. Dr. Ajay is six feet tall. One tablet
weighs 1.5 pounds.
Thus, a frequency distribution table is a graph that summarizes their values and frequencies. In other
words, it is a tool for organizing data. This allows us to easily understand the given set of
information. Thus, the frequency distribution table in statistics helps us to condense the data into a
simpler form so that we can easily observe its characteristics at a glance.
HOW TO BUILD A FREQUENCY DISTRIBUTION TABLE?
The frequency distribution table can be easily generated by following the steps below:
Step 1: Create a table with two columns - one titled the data you are organizing and the other column
will be frequency. [Draw three columns if you also want to add a tick]
Step 2: Examine the entries recorded in the data and decide whether to draw up an ungrouped
frequency distribution or a grouped frequency distribution. If there are too many different
values, it is often better to use a grouped frequency distribution table.
Step 3: Write the values from the dataset in the first column.
Step 4: Count how many times each item repeats in the collected data. In other words, find the
frequency of each element by counting.
Step 5: Enter the frequency in the 2nd column corresponding to each item.
Step 6: Finally, you can also write the total frequency in the last row of the table. Let's take an
example. Dr. Anuja is a teacher. She wants to look at the scores her students got on the last
exam, but she didn't have time to go through the sheets one by one to see each score. So she
asked Dr. Ajay to organize the data in a table that would make it easier to see everyone's
scores together. A frequency distribution table is the natural way to sort the data, giving a
better picture of the data than a simple list.
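The steps above can be sketched in Python using the standard library's `Counter`; the scores below are made up for illustration:

```python
from collections import Counter

# Hypothetical exam scores for a small class (made-up data).
scores = [12, 15, 12, 18, 15, 12, 20, 18, 15, 12]

# Steps 3-5: list each distinct value alongside the number of times it occurs.
freq_table = Counter(scores)

# Print the two-column table: value | frequency.
for value, freq in sorted(freq_table.items()):
    print(f"{value:>5} | {freq}")

# Step 6: the total frequency equals the number of observations.
print(f"Total | {sum(freq_table.values())}")
```

For a grouped distribution, one would first map each score into its class interval (e.g. 10-15, 15-20) and count the intervals instead of the raw values.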
Using a histogram here is a good way to present the data, as it will show all of the students' scores in a
single graph. But how do you create a frequency distribution table? The following table shows the
students' test results for a class (Table 3.2).
Marks obtained    Frequency
5-10              11
10-15             12
15-20             19
20-25             7
25-31             8
Total             60
The frequency distribution table drawn above is called a grouped frequency distribution table.
FREQUENCY DISTRIBUTION TABLE IN STATISTICS
A frequency distribution in statistics is a representation of data that shows the number of
observations over a given period of time. The frequency distribution representation can be graphical
or tabular. Now let's look at another way to represent data, i.e. represent data graphically. This is
done using a frequency distribution table plot. Such charts make it easier for you to understand the
data collected.
i. Bar charts represent data using bars of uniform width with equal spacing between them.
ii. A pie chart showing an entire circle, divided into sectors where each field corresponds to the
information it represents.
iii. The frequency polygon is plotted by connecting the midpoints of the bars in the histogram.
Class Interval    Tally Marks    Frequency
50 - 60           IIII I         6
60 - 70           IIII           5
70 - 80           IIII I         6
80 - 90           II             2
Total                            22
Cumulative frequency means the sum of the frequencies of the layer and all lower layers. It is
calculated by adding the frequency of each class below the corresponding class interval or category.
Here is an example of a cumulative frequency distribution table (Table 3.5).
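The running-total calculation can be illustrated with a short sketch; the class intervals and frequencies below are invented for the example:

```python
# Class intervals with their frequencies (hypothetical grouped data).
classes = [("0-10", 3), ("10-20", 7), ("20-30", 5), ("30-40", 5)]

# Cumulative frequency: for each class, add its frequency to the
# frequencies of all lower classes.
running = 0
cumulative = []
for interval, freq in classes:
    running += freq
    cumulative.append((interval, freq, running))

# Print the three-column table: interval | frequency | cumulative frequency.
for interval, freq, cum in cumulative:
    print(f"{interval:>6} | {freq:>4} | {cum:>4}")
```

The cumulative frequency of the last class always equals the total number of observations, which is a quick consistency check on the table.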
Answer: A frequency distribution table is useful for performing calculations on given data, including
measures of central tendency, variance, statistical testing, and further analysis. In addition,
histograms of the frequency distribution are useful for presenting data in a neat and
understandable manner.
A two-way table is a way to display frequencies or relative frequencies for two categorical
variables. One category is represented by rows and the second category is represented by columns. A
one-way table is simply the data from a bar chart put into a table; in a one-way table, you work
with only one categorical variable. As you might have guessed, a two-way frequency table (showing
counts) handles two variables (called bivariate data).
TWO-WAY TABLE
A two-way table is a way to display the frequencies or relative frequencies of two categorical
variables. One variable is represented by rows and the other by columns, and the table is used to
see if there is a relationship between the two variables. For example, 60 people (30 men and 30
women) were asked what kind of movie they would prefer to watch, and the following responses were
recorded:
i. 6 men preferred rom-coms.
ii. 16 men preferred action movies.
iii. 8 men preferred horror movies.
iv. 12 women preferred rom-coms.
v. 14 women preferred action movies.
vi. 4 women preferred horror movies.
The information collected was used to build the following two-way table:
Table 4.2: Two-way frequency table
Rom-com Action Horror Total
Men 6 16 8 30
Women 12 14 4 30
Total 18 30 12 60
The entries in the table are counts; this type of table is called a two-way frequency table.
The table has several features:
i. Categories are labeled in the left column and the top row.
ii. Counts are placed in the body of the table.
iii. Totals appear at the end of each row and column.
iv. The sum of all counts (the grand total) is placed at the bottom right.
v. The totals in the right column and bottom row are called marginal distributions.
vi. The entries in the middle of the table are called joint frequencies.
TWO-WAY RELATIVE FREQUENCY TABLE
Instead of displaying counts in a table, you can display relative frequencies. Here is the same
table as a two-way relative frequency table, with relative frequencies (as decimals, percentages, or
fractions) displayed instead of counts:
Table 4.3: Two-way relative frequency table
Rom-com Action Horror Total
Men 0.1 0.267 0.133 0.5
Women 0.2 0.233 0.067 0.5
Total 0.3 0.5 0.2 1
To convert a count to a relative frequency, divide the count by the total number of elements. In
the table above, the first count is for Men/Rom-com (count = 6), so 6/60 = 0.1.
As in the two-way frequency table, the totals in the right column and bottom row are called
marginal distributions, while the entries in the middle of the table are called conditional
frequencies or conditional distributions.
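The conversion can be checked with a short sketch using the counts from Table 4.2 (the nested-dictionary layout is just one possible representation of the table):

```python
# Counts from the movie-preference survey in Table 4.2.
counts = {
    "Men":   {"Rom-com": 6,  "Action": 16, "Horror": 8},
    "Women": {"Rom-com": 12, "Action": 14, "Horror": 4},
}

# Grand total: the sum of all counts (bottom-right cell of the table).
grand_total = sum(sum(row.values()) for row in counts.values())

# Relative frequency: each count divided by the grand total,
# rounded to three decimals as in Table 4.3.
rel = {
    group: {genre: round(n / grand_total, 3) for genre, n in row.items()}
    for group, row in counts.items()
}

# Marginal distribution of each row: the sum of that row's relative frequencies.
row_margins = {group: round(sum(row.values()), 3) for group, row in rel.items()}
```

Each computed value matches Table 4.3, and the two row margins each come to 0.5, the marginal distribution of gender.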
Answer:
1-tabulation 2- numerical 3-original 4-array 5- frequency
SUMMARY
Quantitative data is any quantifiable information that researchers can use for mathematical
calculations and statistical analysis, making real-life decisions based on these mathematical
derivations. For instance, "How much did that laptop cost?" is a question that will collect
quantitative data. Quantitative data makes measuring various parameters controllable due to the
ease of mathematical derivations they come with. Sensors provide a mechanism for directly measuring
parameters to create a constant source of information; for example, a digital camera converts
electromagnetic information into a string of numerical data. Qualitative entities can also be
quantified by assigning numbers to qualitative information.
KEY WORDS
Tabulation - Tabulation is a systematic and logical representation of numeric data in rows and
columns to facilitate comparison and statistical analysis.
Key Notes - Notes prefixed with comments or explanations.
Footnotes - Footnotes are notes placed at the bottom of a page and used to refer to sections of text.
Source notes - Sources are cited to develop and support the titles and references in an authoritative
record.
Column Headers - Column Headers are headers that identify a column in the worksheet. Column
headers are at the top of each column and are labeled A, B, ... Z, AA, AB ....
Row Header - Row Header or Row Header is the column to the left of Column 1 in the worksheet,
containing the numbers (1, 2, 3, etc.) in the sheet.
Table of Contents - The table content element identifies one or more rows that make up the main
body (or "body") of the table.
Scoring method - Scoring (tallying) is a way of recording data in groups of five tally marks.
Recording the frequency in this way is equivalent to counting the total number of tally marks made.
Map Alignment - Map alignment is a method used to help design or evaluate the information
architecture of a website.
REFERENCES
1. Siegel, Alan (2004), On universal classes of extremely random constant-time hash functions,
SIAM Journal on Computing, 33 (3): 505–543.
2. Morin, Pat (2014), Section 5.2.3: Tabulation hashing, Open Data Structures (in pseudocode)
(0.1Gβ ed.), pp. 115–116.
3. Mitzenmacher, Michael; Upfal, Eli (2014), Some practical randomized algorithms and
data structures, in Tucker, Allen; Gonzalez, Teofilo; Diaz-Herrera, Jorge (eds.), Computing
Handbook: Computer Science and Software Engineering (3rd ed.), CRC Press, pp. 11-1 - 11-23,
YOUTUBE VIDEOS
1. https://www.youtube.com/watch?v=hECPeKv5tPM
2. https://www.youtube.com/watch?v=jEeqmHP4GcA
3. https://www.youtube.com/watch?v=Xr0BgvtXWwA
4. https://www.youtube.com/watch?v=JCiUNvfTks4
WIKIPEDIA
1. https://byjus.com/commerce/tabular-presentation-of-data/
2. https://www.questionpro.com/blog/quantitative-data/
3. https://www.cuemath.com/data/frequency-distribution-table/
4. https://www.statisticshowto.com/two-way-table/
REFERENCE BOOKS
1. Agresti A. (1990). Categorical Data Analysis. John Wiley and Sons, New York.
2. Kotz, S.; et al., eds. (2006), Encyclopedia of Statistical Sciences, Wiley.
3. Levine, D. (2014). An Easy to Understand Guide to Statistics and Analytics 3rd Edition.
Pearson FT Press.
4. Fink, Arlene (2005). How to Conduct Surveys. Thousand Oaks: Sage Publications.
RES505: Research Methodology Page 146
CREDIT 02-UNIT 03: GRAPHICAL REPRESENTATION
LEARNING OBJECTIVES
After successful completion of this unit, you will be able to
Determine which chart type best represents the data for a given situation.
Explain how graphs can lead to data misinterpretation.
Compare representations of the same data set using different graphs, or the same type of graph with different scales.
Choose the appropriate chart or graph to represent a given data set.
“Numbers have an important story to tell. They rely on you to give them a clear and convincing
voice.” ― Stephen Few
INTRODUCTION
A graph, chart, or diagram is a diagrammatic illustration of a set of data. If the graph is
uploaded as an image file, it can be placed within articles just like any other image. Graphs must be
accurate and convey information efficiently. They should be viewable at different computer screen
resolutions. Ideally, graphs will also be aesthetically pleasing. Graphical representation is a way of
analyzing numerical data. It exhibits the relation between data, ideas, information and concepts in a
diagram. It is easy to understand and it is one of the most important learning strategies. It always
depends on the type of information in a particular domain.
A chart is a graphical representation for data visualization, in which "the data is represented by
symbols, such as bars in a bar chart, lines in a line chart, or slices in a pie chart". A chart can
represent tabular numeric data, functions or some kinds of quality structure and provides different
info. The term "chart" as a graphical representation of data has multiple meanings. A data chart is
a type of diagram or graph that organizes and represents a set of numerical or qualitative data. Maps
that are adorned with extra information (map surround) for a specific purpose are often known as
charts, such as a nautical chart or aeronautical chart, typically spread over several map sheets. Other
domain-specific constructs are sometimes called charts, such as the chord chart in music notation or
a record chart for album popularity.
Charts are often used to ease understanding of large quantities of data and the relationships between
parts of the data. Charts can usually be read more quickly than the raw data. They are used in a wide
variety of fields, and can be created by hand (often on graph paper) or by computer using a charting
application.
Certain types of charts are more useful for presenting a given data set than others. For example, data
that presents percentages in different groups (such as "satisfied, not satisfied, unsure") are often
displayed in a pie chart, but may be more easily understood when presented in a horizontal bar chart.
On the other hand, data that represents numbers that change over a period of time might be best
shown as a line chart. In the present script graphical representation and line graph are discussed.
There are eight different types of graphical representations (Graph 1.1). Some of them are:
1. Line graph - Line chart or line graph is used to display continuous data and it is useful in
predicting future events over time.
2. Bar graph - Bar chart is used to display categories of data and it compares data using solid bars
to represent quantities.
3. Pictograph - A pictorial symbol for a word or phrase. Pictographs were used as the earliest
known form of writing; examples have been discovered in Egypt and Mesopotamia.
LINE GRAPH
A line chart, also called a line graph, is a type of chart used to display information that changes
over time. We draw line graphs using multiple points connected by line segments. Line charts
have two axes, known as the "x" and "y" axes.
For example, a line graph can show New York's temperature trends on a hot day (Graph 1.3).
USES OF HISTOGRAM
Histograms are used under certain conditions. They are:
i. Data must be numeric.
ii. Histograms are used to check the shape of the data distribution.
iii. Allows you to check if the process changes from one stage to another.
iv. Used to determine if the output is different when two or more processes are involved.
v. Used to analyze whether a given process meets customer requirements.
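The checks above all operate on a binned frequency distribution of numeric data. As a minimal sketch (pure Python; the helper name and data are illustrative, not from any library), binning values into equal-width classes looks like:

```python
def bin_counts(values, low, high, n_bins):
    """Count how many values fall into each of n_bins equal-width classes."""
    width = (high - low) / n_bins
    counts = [0] * n_bins
    for v in values:
        if low <= v <= high:
            # Clamp a value exactly equal to the upper limit into the last bin.
            idx = min(int((v - low) / width), n_bins - 1)
            counts[idx] += 1
    return counts

data = [2, 3, 5, 7, 11, 13, 17, 19, 23, 29]
print(bin_counts(data, 0, 30, 3))   # three classes: 0-10, 10-20, 20-30
```

The resulting counts are what a histogram's bar heights display.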
TYPES OF HISTOGRAM
Histograms can be classified into different categories based on the frequency distribution of the data.
There are different types of distributions, such as the normal distribution, skewed distribution,
bimodal distribution, multimodal distribution, comb distribution, edge peak distribution, dog food
distribution, heart cut distribution, etc. Histograms can be used to represent these different types of
distributions. The different chart types are:
i. Uniform histogram
ii. Symmetric histogram
iii. Bimodal histogram
iv. Probability histogram
i. Uniform Histogram
A uniform distribution reveals that the number of classes is too small and that each class has the same
number of elements. It may involve a distribution that has several peaks (Graph 2.1).
FREQUENCY POLYGONS
A frequency polygon is a graphical representation of a data distribution that helps to
understand data through a particular shape. Frequency polygons are very similar to histograms
but are especially useful when comparing two or more sets of data. The frequency polygon presents the
frequency distribution data in the form of a line graph. Let's learn about the frequency
polygon graph, the steps to create it, and solve some examples to better understand the
concept.
A frequency polygon can be defined as a form of graph that represents information or data and is
widely used in statistics. This form of visual data representation helps to describe the shape and
trends of data in an organized and systematic way. The frequency polygon, like the
histogram, represents the number of occurrences of the class intervals. This type of chart is usually
plotted together with a histogram, but can also be plotted without one. While a histogram is a
bar-type chart with rectangular bars and no gaps between them, a frequency polygon is a line graph
representing the frequency distribution data. The frequency polygon looks like the image
below (Graph 2.11):
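Computationally, the vertices of a frequency polygon are just the class midpoints paired with the class frequencies, joined by line segments. A minimal sketch (the class intervals and counts are illustrative):

```python
def polygon_points(class_limits, frequencies):
    """Pair each class midpoint with its frequency.

    class_limits is a list of (lower, upper) tuples; the returned
    (midpoint, frequency) points are the vertices of the frequency polygon."""
    return [((lo + hi) / 2, f) for (lo, hi), f in zip(class_limits, frequencies)]

classes = [(0, 10), (10, 20), (20, 30), (30, 40)]
freqs = [4, 6, 8, 2]
print(polygon_points(classes, freqs))
# midpoints 5, 15, 25, 35, each paired with its class frequency
```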
In statistics, cumulative frequency is defined as the running sum of frequencies distributed over the
different class intervals, with the data and totals displayed in tabular form together with the frequencies.
Cumulative frequency distributions are classified into two different types: less than cumulative
frequency and greater than (or more than) cumulative frequency.
i. Less than cumulative frequency:
The less than cumulative frequency distribution is obtained by successively adding each class
frequency to the frequencies of all previous classes. This type accumulates from the smallest
class to the largest.
ii. Greater than cumulative frequency:
The greater than cumulative frequency is also called the more than type cumulative frequency. This
distribution is obtained by accumulating frequencies from the highest class down to the lowest class.
Graphical display of less than and greater than cumulative frequency
Graphing cumulative frequencies is simpler and more convenient than using tables, bar charts,
frequency polygons, etc. Cumulative frequency curves (ogives) can be drawn in two ways:
i. As a less than cumulative frequency distribution curve (or ogive).
ii. As a greater than cumulative frequency distribution curve (or ogive).
To create a less than cumulative frequency curve:
i. Mark the upper bound on the horizontal or X-axis.
ii. Mark the cumulative frequency on the vertical or Y-axis.
iii. Plot a point (X, Y) in the coordinate plane, where X represents the upper bound and Y
represents the cumulative frequency.
iv. Finally, connect the points to draw a smooth curve.
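The steps above can be sketched in code: accumulate the class frequencies into running totals, then pair each total with its class upper bound. A minimal sketch with illustrative data (the helper name is our own, not a library function):

```python
from itertools import accumulate

def less_than_ogive(upper_bounds, frequencies):
    """Return (upper bound, cumulative frequency) points for a less than ogive."""
    return list(zip(upper_bounds, accumulate(frequencies)))

uppers = [10, 20, 30, 40]
freqs = [5, 8, 4, 3]
print(less_than_ogive(uppers, freqs))  # [(10, 5), (20, 13), (30, 17), (40, 20)]
```

Plotting these points and joining them with a smooth curve gives the less than ogive.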
To construct a greater than (more than) cumulative frequency curve:
i. Mark the lower limit on the horizontal axis.
ii. Mark the cumulative frequency on the vertical axis.
iii. Plot a point (X, Y) in the coordinate plane, where X represents the lower bound and Y
represents the cumulative frequency.
iv. Finally, connect the points to draw a smooth curve.
The curve thus obtained is the graph of a more than type cumulative frequency distribution.
To draw a more than type cumulative frequency curve, consider the same cumulative frequency
table showing the number of participants in each essay writing contest by age (Table 3.2).
OGIVE
The word ogive is a term used in architecture to describe a curved or pointed arch. Ogives are
charts used to estimate how many values lie below or above a particular variable or value in the
data. To create an ogive, the cumulative frequencies of the variables are first calculated using a
frequency table, by adding the frequencies of all previous variables in the given data set.
DEFINITION
An ogive is defined as the cumulative frequency distribution graph of a series. It is a plot that
shows data values on the horizontal axis and cumulative frequencies, cumulative relative
frequencies, or cumulative percentage frequencies on the vertical axis. Cumulative frequency is
defined as the sum of all previous frequencies up to the present point. To determine the popularity of a
particular value, or the probability that an observation falls within a particular range, the ogive
curve helps pinpoint these details.
Create an ogive by plotting the points corresponding to the cumulative frequency of each class
interval. Most statisticians use the ogive curve to visualize data pictorially. It is useful for estimating
the number of observations below a certain value, and for showing the properties of discrete and
continuous data. Such figures are easier to read than the aggregated data, and they facilitate
comparative study of two or more frequency distributions, since the shapes and patterns of two
distributions can be related. The two types of ogives are as follows (Graph 3.4):
i. Less than ogive
ii. Greater than or more than ogive
OGIVE CHART
An ogive chart is a curve of a cumulative or relative cumulative frequency distribution. To draw such
a curve, we express the counts as a percentage of the total count. These percentages are then
accumulated and plotted, as in an ogive. Below are the steps to create less than and greater than ogives.
OGIVE EXAMPLE
Question 1: Construct the more than cumulative frequency table and draw the Ogive for the below-
given data (Table 3.3).
Marks 1-10 11-20 21-30 31-40 41-50 51-60 61-70 71-80
Frequency 3 8 12 14 10 6 5 2
Solution:
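A computational sketch of the solution: for a more than ogive, accumulate from the top of the table downward so each class lower bound is paired with the count of observations in that class and all classes above it (frequencies taken from Table 3.3; the helper name is our own):

```python
def more_than_ogive(lower_bounds, frequencies):
    """Pair each class lower bound with the frequency of that class and all later classes."""
    total = sum(frequencies)
    points = []
    for lo, f in zip(lower_bounds, frequencies):
        points.append((lo, total))
        total -= f
    return points

marks_lower = [1, 11, 21, 31, 41, 51, 61, 71]
freqs = [3, 8, 12, 14, 10, 6, 5, 2]
for lo, cf in more_than_ogive(marks_lower, freqs):
    print(f"More than or equal to {lo}: {cf}")
```

The running totals (60, 57, 49, 37, 23, 13, 7, 2) are the more than cumulative frequencies; plotting them against the lower bounds and joining the points gives the required ogive.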
DISADVANTAGES:
i. Sometimes bar charts don't reveal patterns, causes, impacts, etc.
ii. It can be easily manipulated to create misinformation.
IMPORTANT NOTE:
Some important notes regarding bar charts are:
i. In a bar chart, there should be equal spacing between the bars.
ii. It is recommended to use a bar chart if the data frequency is very large.
iii. Understand what data should be presented on the x- and y-axis and the relationship between
the two.
PIE CHART
A pie chart is a graphical representation of data in the form of a circular chart in which slices of
the pie represent the relative size of the data.
To present data as a pie chart, a list of numerical variables along with the corresponding categorical
variables is needed. The length of the arc of each slice, and therefore its area and central angle in the
pie chart, is proportional to the quantity it represents.
A pie chart is one of the most popular charts for data representation; it uses a circle, its sectors,
and their angles to represent real-world information. The shape of a pie chart is circular, where
the full pie represents the whole data set and the slices of the pie represent discrete parts of
the data.
DEFINITION
A pie chart is a type of chart that records data in a circular manner that is further divided into
sections so that the data represents that particular part of the whole. Each of these sections or parts
represents a proportional part of the whole. It helps to interpret and present data. It is also used to
compare data.
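Because each slice's central angle is proportional to the value it represents, the angles can be computed directly. A minimal sketch (the category values are illustrative):

```python
def pie_angles(values):
    """Convert raw values into central angles (degrees) that sum to 360."""
    total = sum(values)
    return [360 * v / total for v in values]

sales = [30, 45, 15, 10]     # illustrative category values summing to 100
print(pie_angles(sales))     # [108.0, 162.0, 54.0, 36.0]
```

Each angle is then drawn as a sector of the circle, so the whole pie always accounts for 100% of the data.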
Answer:
1-d 2-c 3-a 4-d 5-c
Answer:
1-True 2-True 3-True 4-False 5-False
SUMMARY
It shows the relationships between data, ideas, information, and concepts in a diagram. For example,
a line chart or line graph is used to display continuous data and is useful in predicting future
events over time.
KEY WORDS
Ogive - An ogive is defined as the graph of the cumulative frequency distribution of a series.
Polygon - This is a type of line graph where the frequency of the layer is plotted based on the
midpoint of the layer and the points are connected by a line segment creating a curve.
Cumulative - it is an increase by successive additions.
Frequency - Frequency is the number of occurrences of a repeating event per unit of time.
Class interval - Class interval refers to the numerical width of any class in a particular distribution.
Ogive curve - An ogive curve is a plot of a cumulative frequency distribution or cumulative
relative frequency distribution.
REFERENCES
1. Black, Ken (2009). Business Statistics: Contemporary Decision Making. John Wiley & Sons. p.
24.
2. Everitt, B.S. (2002). The Cambridge Dictionary of Statistics (2nd Ed.). Cambridge: Cambridge
University Press. ISBN 0-521-81099-X.
YOUTUBE VIDEOS
1. https://www.youtube.com/watch?v=_K0IBXcgk48
2. https://www.youtube.com/watch?v=SDYEwv0WxMo
3. https://www.youtube.com/watch?v=uHRqkGXX55I
4. https://www.youtube.com/watch?v=FVRJU--8YMY
WIKIPEDIA
1. https://asq.org/quality-resources/histogram
2. https://www.cuemath.com/data/frequency-polygons/
3. https://byjus.com/maths/ogive/
4. https://byjus.com/maths/bar-graph/
REFERENCE BOOKS
1. Dodge, Yadolah (2008). The Concise Encyclopedia of Statistics. Springer.
2. Robbins, (1995). Polygons inscribed in a circle, American Mathematical Monthly 102.
3. Coxeter, H.S.M. (1973). Regular Polytopes, 3rd Edn, Dover (pbk).
4. Russell, Bertrand, (2004). History of Western Philosophy, Reprint Edition, Routledge.
“Without big data analytics, companies are blind and deaf, wandering out onto the web like deer
on a freeway” - Geoffrey Moore
INTRODUCTION
Data analysis is a process of inspecting, cleansing, transforming, and modeling data with the goal of
discovering useful information, informing conclusions, and supporting decision-making. Data
analysis has multiple facets and approaches, encompassing diverse techniques under a variety of
names, and is used in different business, science, and social science domains. In today's business
world, data analysis plays a role in making decisions more scientific and helping businesses operate
more effectively.
Organizations may apply analytics to business data to describe, predict, and improve business
performance. Specifically, areas within analytics include descriptive analytics, diagnostic analytics,
predictive analytics, and prescriptive analytics.
Data analytics is defined as a process of cleaning, transforming, and modeling data to uncover
insights useful for business decision making. The purpose of data analysis is to extract useful
information from data and make decisions based on data analysis. Data analytics is the process of
analyzing raw data in order to extract meaningful insights. This can be done through a variety of
methods, such as statistical analysis or machine learning. The systematic application of statistical and
logical techniques to describe the data scope, modularize the data structure, condense the data
representation, illustrate via images, tables, and graphs, and evaluate statistical inclinations,
probability data, and derive meaningful conclusions is known as data analysis. These analytical
procedures enable us to draw the underlying inference from data by eliminating the unnecessary
chaos created by the rest of it. Data generation is a continual process; this makes data analysis a
continuous, iterative process in which data collection and data analysis are performed simultaneously.
Ensuring data integrity is one of the essential components of data analysis.
There are various examples where data analysis is used, ranging from transportation, risk and fraud
detection, customer interaction, city planning, healthcare, web search, and digital advertisement, among
others.
Data analytics can help small businesses in a number of ways. By understanding data analytics,
businesses can make better decisions about where to allocate their resources and how to price their
products and services.
There are several data analysis tools available in the market, each with its own set of functions. The
selection of tools should always be based on the type of analysis performed and the type of data
being worked with. Here is a list of a few compelling tools for data analysis (Fig. 1.2).
1. Excel: It has various compelling features, and with additional plugins installed, it can handle a
massive amount of data. So, if you have data that does not come near the significant data
margin, Excel can be a versatile tool for data analysis.
2. Tableau: It falls under the BI tool category, made for the sole purpose of data analysis. The
essence of Tableau is the pivot table and pivot chart, and it works towards representing data in
the most user-friendly way. It additionally has a data cleaning feature along with brilliant
analytical functions.
3. Power BI: It initially started as a plug-in for Excel, but later detached from it to develop into
one of the most popular data analytics tools. It comes in three versions: Free, Pro, and Premium. Its
Power Pivot engine and DAX language can implement sophisticated advanced analytics in a way
similar to writing Excel formulas.
4. FineReport: FineReport comes with a straightforward drag-and-drop operation, which helps
design various reports and build a data decision analysis system. It can directly connect to all
kinds of databases, and its format is similar to that of Excel. Additionally, it provides a
variety of dashboard templates and several self-developed visual plug-in libraries.
iii. Data cleaning: Not all data collected may be useful or relevant for your analysis
purposes, so it needs to be cleaned up. Collected data may contain duplicate records, stray spaces, or
errors. The data must be made clean and error-free. This phase should be completed before analysis
because the quality of the cleaning determines how close your analysis result will be to the expected result.
iv. Data analysis: After the data is collected, cleaned and processed, it is ready for analysis. As
you work with data, you may find that you have the exact information you need, or that you
may need to collect more data. During this phase, you can use data analysis tools and software
that will help you understand, interpret, and draw conclusions based on the requirements.
v. Interpreting data: After analyzing your data, it's finally time to interpret your results. You can
choose how to phrase or communicate your data analysis, you can just use words, or it can be
tables or charts. Then use the results of your data analysis to decide your best course of action.
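The cleaning rules of phase iii (trim stray spaces, drop empty and duplicate records) can be sketched with plain Python; the function name and sample records here are purely illustrative, not part of any particular tool:

```python
def clean_records(records):
    """Trim whitespace, then drop empty and duplicate records, keeping first occurrences."""
    seen = set()
    cleaned = []
    for rec in records:
        rec = rec.strip()
        if not rec or rec in seen:   # skip blanks and duplicates
            continue
        seen.add(rec)
        cleaned.append(rec)
    return cleaned

raw = ["  Alice ", "Bob", "Alice", "", "Bob "]
print(clean_records(raw))   # ['Alice', 'Bob']
```

Real data cleaning would add domain-specific error checks, but the duplicate and whitespace handling follows the same pattern.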
But when your business is taking in hundreds of reviews each day, you can't have employees
weeding through responses one by one. You need tools that automate this process and present your
team with a breakdown of your customer reviews. This is where qualitative data analysis software
comes into play. Qualitative data analysis software reviews your survey and customer reviews in
bulk, saving your team valuable time during reporting. Below are some of the best qualitative data
analysis software tools, including free options you can use with your team.
1. HubSpot
2. MAXQDA
3. Quirkos
4. Qualtrics
5. Raven's Eye
6. Square Feedback
7. Free QDA
8. QDA Miner Lite
9. Connected Text
10. Visão
1. HubSpot: As part of its Service Hub suite, HubSpot offers a customer feedback tool that
provides detailed analytics for surveys and customer reviews. Your data gets centralized into one
accessible dashboard, which includes different charts and graphs summarizing your customers'
responses. With this simple setup, your team has a quick and clean way to review their daily
analytics without navigating around the site.
Additionally, HubSpot's Service Hub tools are integrated with NPS® surveys. Net Promoter
Score, or NPS, is a type of survey that collects both quantitative and qualitative customer
feedback. HubSpot's customer feedback tool analyzes these responses and provides you with a
detailed breakdown of customer satisfaction based on its findings.
2. MAXQDA: MAXQDA is qualitative data analysis software that's designed for companies
analyzing different types of customer data. The software allows you to import data from
interviews, focus groups, surveys, videos, and even social media. This way you can review all of
your qualitative data in one central location. Once imported, MAXQDA lets you organize your
information into categories or groups. You can mark specific data with tags and leave notes for
other employees to review your work. MAXQDA even lets you color code your data so that your
team knows exactly what to work on each day.
3. Quirkos: Quirkos includes a variety of tools that analyze and review qualitative data. One of its
most notable tools is its text analyzer which can find common keywords and phrases throughout
different text documents. Your team can upload its customer reviews or survey responses and use
this tool to identify recurring roadblocks in the customer experience.
Another interesting tool Quirkos provides is its "word cloud" tool. The word cloud tool reviews
all of your text data and pulls out words that are frequently used. Then it groups them together
into a cluster to visualize the themes emerging from your data, just like in the example below.
4. Qualtrics: Qualtrics uses AI to review your survey data and forecast trends in customer behavior.
Its Predictive Analysis tool evaluates data and makes predictions about customer satisfaction
based on past survey responses. Use this information to interpret how customers will react to
changes you make to the customer experience. The Text Analysis tool reviews survey comments
for popular trends and topics that are appearing in your customers' feedback. This tool saves your
team time by analyzing your surveys' qualitative data in bulk. Once the data is compounded,
Qualtrics provides you with a variety of display options including graphs, charts, slideshows, and
maps.
5. Raven's Eye: Raven's Eye is qualitative data software that can process multiple types of customer
data. One of its most popular features is its audio converter which uploads audio files into the
software and transforms them into text files. Then it analyzes the text for different insights into
customer behavior. So, if you conduct interviews or focus groups with customers, you can record
the audio for these sessions and upload them to Raven's Eye for analysis.
6. Square Feedback: Square Feedback is a free survey and customer feedback collection tool that
also provides qualitative data reporting. It can analyze survey responses to see how satisfied your
customers are with things like customer service, wait time, and product quality. It also includes
historical filter options that let you compare past data to current customer information to see how
your customer service has changed over time.
7. Free QDA: Free QDA is basic qualitative data analysis software that's commonly downloaded by
businesses looking for an inexpensive and simple tool. It uses a text analyzer to review customer
interviews and compounds the information into one central location. There you can create
categories for your data and group together popular words and phrases appearing in your
responses.
1. Error
Error is the collective noun for any departure of the result from the "true" value. Analytical errors
can be:
i. Random or unpredictable deviations between replicates, quantified with the "standard
deviation".
ii. Systematic or predictable regular deviation from the "true" value, quantified as "mean
difference" (i.e. the difference between the true value and the mean of replicate
determinations).
iii. Constant, unrelated to the concentration of the substance analyzed (the analyte).
iv. Proportional, i.e. related to the concentration of the analyte.
2. Accuracy
The "trueness" or the closeness of the analytical result to the "true" value. It is constituted by a
combination of random and systematic errors (precision and bias) and cannot be quantified directly.
The test result may be a mean of several values. An accurate determination produces a "true"
quantitative value, i.e. it is precise and free of bias.
3. Precision
The closeness with which results of replicate analyses of a sample agree. It is a measure of dispersion
or scattering around the mean value and usually expressed in terms of standard deviation, standard
error or a range (difference between the highest and the lowest result).
4. Bias
The consistent deviation of analytical results from the "true" value caused by systematic errors in a
procedure. Bias is the inverse, and most commonly used, measure of "trueness", which is the agreement of the
mean of analytical results with the true value, i.e. excluding the contribution of randomness
represented in precision. Several components contribute to bias:
i. Method bias: The difference between the (mean) test result obtained from a number of
laboratories using the same method and an accepted reference value. The method bias may
depend on the analyte level.
ii. Laboratory bias: The difference between the (mean) test result from a particular laboratory
and the accepted reference value.
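The measures above are directly computable. A minimal sketch using Python's statistics module, with illustrative replicate determinations and an assumed "true" reference value: the standard deviation quantifies precision, the range gives the simplest dispersion measure, and the mean minus the true value gives the bias.

```python
import statistics

replicates = [9.8, 10.1, 9.9, 10.3, 10.4]   # illustrative replicate results
true_value = 10.0                            # assumed reference ("true") value

mean = statistics.mean(replicates)
sd = statistics.stdev(replicates)            # precision: dispersion around the mean
value_range = max(replicates) - min(replicates)
bias = mean - true_value                     # systematic deviation from the true value

print(f"mean={mean:.2f}, sd={sd:.3f}, range={value_range:.1f}, bias={bias:+.2f}")
```

Here the mean is 10.1, so the bias is +0.1; a method could have small random scatter (good precision) and still carry a large bias, which is why both are reported.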
SOFTWARE APPLICATION
The term "application software" refers to software that performs specific functions for the user.
When the user directly interacts with the software, it is called application software. The sole purpose
of application software is to assist users in performing specific tasks. Microsoft Word and Excel, as
well as popular web browsers like Firefox and Google Chrome, are examples of application software.
The category also includes mobile apps, such as WhatsApp for communication and games like Candy
Crush Saga, as well as app versions of popular services, such as weather or traffic information, and
apps that allow users to connect with businesses. Global Positioning System (GPS), graphics,
multimedia, presentation, and desktop publishing software are further examples.
1. The word 'statistics' is derived from the Latin word 'status'. (True/False)
2. Statistics is a science that deals with the techniques and methods of collection, classification and
presentation of data. (True/False)
3. A statistical question is one that results in varying responses and results (data). (True/False)
4. There are two main types of statistical analysis: descriptive and inferential, also known as
modeling. (True/False)
5. Statistics allows you to understand a subject much more superficially. (True/False)
Answer:
1-examination 2-numerical 3-any value 4-information 5-primary
SUMMARY
Data analytics is defined as a process of cleaning, transforming, and modeling data to uncover
insights useful for business decision making. It is, in essence, analyzing our past or present and
making decisions based on that; the same job an analyst does for business purposes is called
data analysis. Sometimes, analytics alone is enough to grow a business. Sound analysis
requires a well-designed study, a well-chosen sample, and an appropriate selection of statistical
tests. A variable is a trait that varies from person to person in a population. Quantitative variables are
those measured on a numeric scale.
KEY WORDS
REFERENCES
1. Ryan, Thorne (2013). Caffeine and computer screens: student programmers endure weekend
long appathon. The Arbiter. Archived from the original on 2016-07-09.
2. Ceruzzi, Paul E. (2000). A History of Modern Computing. Cambridge, Massachusetts:
MIT Press. ISBN 0-262-03255-4.
3. Kenney, J. F.; Keeping, E. S. (1962). Mathematics of Statistics, Part 1 (3rd ed.).
Princeton, NJ: Van Nostrand Reinhold.
4. Wasserman, Larry (2004). All of Statistics. New York: Springer. p. 310. ISBN 978-
1-4419-2322-6.
YOUTUBE VIDEOS
1. https://www.youtube.com/watch?v=yZvFH7B6gKI
2. https://www.youtube.com/watch?v=BTB86HeZVwk
WIKIPEDIA
1. https://www.guru99.com/what-is-data-analysis.html
2. https://imotions.com/blog/statistical-tools/
3. https://www.geeksforgeeks.org/what-is-application-software/
4. https://en.wikipedia.org/wiki/Application_software
REFERENCE BOOKS
LEARNING OBJECTIVES
Students demonstrate knowledge of statistical data analysis.
Define null hypothesis, alternative hypothesis, significance level, test statistic, p-value, and
statistical significance.
Students develop the ability to create and evaluate data models.
Students perform statistical analyses using professional statistical software.
Students demonstrate data management skills.
“Far better an approximate answer to the right question, which is often vague, than an exact
answer to the wrong question, which can always be made precise” — John W. Tukey.
INTRODUCTION
Statisticians use observed data to estimate population parameters. For example, sample means
are used to estimate population means; sample proportions, to estimate population proportions.
Point estimate. A point estimate of a population parameter is a single value of a statistic. For
example, the sample mean x is a point estimate of the population mean μ. Similarly, the sample
proportion p is a point estimate of the population proportion P.
Interval estimate. An interval estimate is defined by two numbers, between which a population
parameter is said to lie. For example, a < x < b is an interval estimate of the population mean μ. It
indicates that the population mean is greater than a but less than b.
It is often of interest to learn about the characteristics of a large group of elements such as
individuals, households, buildings, products, parts, customers, and so on. All the elements of interest
in a particular study form the population. Because of time, cost, and other considerations, data are
often collected from only a portion of the population, called a sample.
IMPORTANCE OF STATISTICS
Basically, statistical analysis is used to collect and examine information that is available in large
quantities. Statistics is a branch of mathematics in which calculations are performed on various data
using graphs, tables, charts, etc. The data collected here for analysis are called metrics. Now, when
we need to measure the data based on the scenario, the sample is taken from the population. Then the
analysis or calculation is done for the next measurement.
STATISTICAL ESTIMATION
In statistics, estimation refers to the process of drawing conclusions about a population from data
obtained from a sample.
Statistics uses sample statistics to estimate population parameters. For example, sample means are
used to estimate population means, and sample proportions to estimate population proportions.
A population parameter estimate can be expressed in two ways:
i. Point estimate. A point estimate of a population parameter is a single statistical value. For
example, the sample mean x is a point estimate of the population mean μ.
Similarly, the sample proportion p is a point estimate of the population proportion P.
ii. Interval estimate. An interval estimate is defined by two numbers between which the
population parameter is said to lie.
TYPES OF HYPOTHESES
Hypotheses can be roughly divided into different types. They are:
i. Simple Hypothesis: A simple hypothesis is a hypothesis that there is a relationship
between two variables. One is called the dependent variable and the other is the
independent variable.
ii. Complex Hypothesis: Complex hypothesis is used when there is a relationship between
the variables. There are more than two dependent and independent variables in this
hypothesis.
iii. Null Hypothesis: The null hypothesis states that there is no significant difference between the
populations compared in the experiment; any observed difference is attributed to experimental
or sampling error. The null hypothesis is denoted by H0.
iv. Alternative Hypothesis: The alternative hypothesis states that the observations are influenced
by some non-random cause rather than by chance alone. It is denoted by H1 or Ha.
FEATURES OF A HYPOTHESIS
The important features of a hypothesis are:
NULL HYPOTHESIS
It is represented by H0. Example: Rohan will win at least Rs.100000 in the lucky draw.
ALTERNATIVE HYPOTHESIS
It is represented by H1 or Ha. Example: Rohan will win less than Rs.100000 in the lucky draw.
PAIRED TESTING
In statistics, a t-test can be represented as a statistical hypothesis test in which the test statistic
supports the Student's t-distribution when the null hypothesis is established. In a paired test, they
compare the means of two observation groups. Observations must be randomly assigned to each of
the two groups so that the difference in response observed is due to the treatment and not to other
factors. With two samples, an observation of one sample can be matched with an observation of the
other sample. This test can be used to make pre-event and post-event observations of a sample. Now,
let's take a closer look at what the paired t-test is, its formula, schedule, and how to perform the
paired t-test.
t = d̄ / (s_d / √n), where d̄ is the mean of the paired differences d = x1 − x2, s_d is the standard
deviation of the differences, and n is the number of pairs.
Answer: Pair testing helps break down barriers, encourages collaboration with new people, and lets
testers bounce ideas off each other's constructive feedback, so that each role better understands
where the other fits in and how it contributes to quality.
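A minimal sketch of the paired t-test computation, using hypothetical pre-event and post-event measurements on the same six subjects:

```python
# Paired t-test sketch: pre/post observations on the same subjects (assumed data).
import math
import statistics

pre  = [72, 75, 68, 80, 77, 71]   # hypothetical pre-event measurements
post = [70, 72, 66, 78, 76, 68]   # hypothetical post-event measurements

d = [a - b for a, b in zip(pre, post)]   # per-subject differences d_i
n = len(d)
d_bar = statistics.mean(d)               # mean of the differences
s_d = statistics.stdev(d)                # standard deviation of the differences

t = d_bar / (s_d / math.sqrt(n))
print(f"t = {t:.3f}")                    # compare with a t-table at df = n - 1
```

The statistic is then compared with the critical value from a t-table with n − 1 degrees of freedom.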
DEFINITION OF P-VALUE
The P-value is short for probability value. It is defined as the probability of obtaining an outcome
that is the same as, or more extreme than, the actual observation. The P-value is the smallest level
of significance at which the null hypothesis would be rejected, and it is used as an alternative to
preselected rejection points. The smaller the P-value, the stronger the evidence in favour of the
alternative hypothesis.
P-VALUE TABLE
Table 4.1: The P-value table shows the hypothesis interpretations
P-value Decision
P-value > 0.05 The result is not statistically significant: do not reject the null hypothesis.
P-value < 0.05 The result is statistically significant: generally, reject the null hypothesis in
favour of the alternative hypothesis.
P-value < 0.01 The result is highly statistically significant: reject the null hypothesis in favour
of the alternative hypothesis.
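The decision rules of Table 4.1 can be expressed as a small helper function; the function name is just for illustration:

```python
# Decision rule from Table 4.1, written as a small helper (illustrative only).
def interpret_p(p_value: float, alpha: float = 0.05) -> str:
    """Map a p-value to the hypothesis decision described in Table 4.1."""
    if p_value < 0.01:
        return "highly significant: reject H0 in favour of Ha"
    if p_value < alpha:
        return "significant: reject H0 in favour of Ha"
    return "not significant: do not reject H0"

print(interpret_p(0.003))  # highly significant
print(interpret_p(0.03))   # significant
print(interpret_p(0.20))   # not significant
```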
T-TEST
The t-test is any statistical hypothesis test in which the test statistic follows a Student’s t distribution
under the null hypothesis. It can be used to determine if two sets of data are significantly different
from each other, and is most commonly applied when the test statistic would follow a normal
distribution if the value of a scaling term in the test statistic were known.
STANDARD DEVIATION
A standard deviation (or σ) is a measure of how dispersed the data is in relation to the mean. Low
standard deviation means data are clustered around the mean, and high standard deviation indicates
data are more spread out. Variance and Standard deviation are the two important topics in Statistics.
It is the measure of the dispersion of statistical data. Dispersion is the extent to which values in a
distribution differ from the average of the distribution. To quantify the extent of the variation, there
are certain measures namely:
i. Range
ii. Quartile Deviation
iii. Mean Deviation
iv. Standard Deviation
The degree of dispersion is calculated by measuring the variation of the data points. In this unit,
you will learn what variance and standard deviation are, their formulas, and the procedure for
finding the values, with examples.
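The four measures of dispersion listed above can be computed with Python's standard statistics module; the dataset is assumed for illustration:

```python
# The four dispersion measures listed above, computed for a small assumed dataset.
import statistics

data = sorted([4, 8, 6, 5, 9, 7, 10, 3])

range_ = max(data) - min(data)                           # i. range
q1, q2, q3 = statistics.quantiles(data, n=4)             # quartiles
quartile_dev = (q3 - q1) / 2                             # ii. quartile deviation
mean = statistics.mean(data)
mean_dev = sum(abs(x - mean) for x in data) / len(data)  # iii. mean deviation
std_dev = statistics.pstdev(data)                        # iv. standard deviation
variance = statistics.pvariance(data)                    # variance = std_dev ** 2
```

Here `pstdev`/`pvariance` treat the data as a whole population; `stdev`/`variance` would give the sample versions.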
DEFINITION OF F-TEST:
In statistics, a test whose test statistic has an F-distribution under the null hypothesis is known
as an F-test. It is used to compare statistical models fitted to the available data set. George W.
Snedecor named the statistic F in honor of Sir Ronald A. Fisher.
F TEST FORMULA
When carrying out an F-test, the steps are:
i. State the null hypothesis and the alternate hypothesis.
ii. Calculate the F-value using the formula. The F-statistic is the ratio of the variance of the
group means to the mean of the within-group variances.
iii. Find the critical value of the F-statistic for this test.
iv. Finally, support or reject the null hypothesis.
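A sketch of the variance-ratio computation at the heart of the F-test, on two made-up samples; the critical value would still be read from an F-table at the chosen significance level:

```python
# F-statistic as a ratio of two sample variances (illustrative sketch; the
# critical value comes from an F-table at the chosen significance level).
import statistics

group_a = [18, 20, 19, 22, 21, 17]   # hypothetical sample 1
group_b = [25, 14, 30, 12, 28, 11]   # hypothetical sample 2 (more spread out)

var_a = statistics.variance(group_a)
var_b = statistics.variance(group_b)

# Convention: put the larger variance in the numerator so F >= 1.
F = max(var_a, var_b) / min(var_a, var_b)
df1 = df2 = len(group_a) - 1
print(f"F = {F:.2f} with ({df1}, {df2}) degrees of freedom")
# If F exceeds the table value, reject H0 (equal variances).
```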
Column-I Column-II
1. Chi-square test a. Test for significant difference of mean values in two small samples when the
population deviation is not known
2. ANOVA (F-test) b. Test for goodness of fit of a distribution
3. Z-test c. Test for significant difference of mean values in two large samples
4. T-test d. Test for significant difference of mean values among more than two sample groups
Answer:
1-b 2-d 3-c 4-a
SUMMARY
The t-test is any statistical hypothesis test in which the test statistic follows a Student's
t-distribution under the null hypothesis; it is most commonly applied when the test statistic would
follow a normal distribution if the value of a scaling term in the test statistic were known.
A test statistic which has an F-distribution under the null hypothesis is called an F-test. To compare
the variances of two different sets of values, the F-test formula is used. To apply the F-distribution
under the null hypothesis, we first need to find the mean of each set of observations and then
calculate the variances.
KEY WORDS
Significance - A quality worthy of attention; Meaning.
Virtue- The quality of being moral or virtuous.
REFERENCES
1. Amrhein, Valentin; Greenland, Sander (2017). Remove, rather than redefine, statistical
significance. Nature Human Behaviour. 2 (1): 0224.
2. Royall R (2004). The Likelihood Paradigm for Statistical Evidence. The Nature of Scientific
Evidence. pp. 119–152.
3. Hubbard R, Bayarri MJ (2003), Confusion Over Measures of Evidence (p′s) Versus
Errors (α′s) in Classical Statistical Testing, The American Statistician, 57(3): 171–178.
4. Goodman, S N (June 15, 1999). Toward evidence-based medical statistics. 1: The P Value
Fallacy. Ann Intern Med. 130 (12): 995–1004.
YOUTUBE VIDEOS
1. https://www.youtube.com/watch?v=KS6KEWaoOOE
2. https://www.youtube.com/watch?v=VK-rnA3-41c
3. https://www.youtube.com/watch?v=Q1yu6TQZ79w
4. https://www.youtube.com/watch?v=ChLO7wwt7h0
WIKIPEDIA
1. https://byjus.com/maths/tests-of-significance/
2. https://byjus.com/maths/what-is-null-hypothesis/
3. https://byjus.com/t-test-formula/
4. https://byjus.com/maths/p-value/
REFERENCE BOOK
1. Crease, Robert P. (2008), The Great Equations, New York: W. W. Norton.
2. Gregory Vlastos, Myles Burnyeat (1994), Socratic studies, Cambridge.
3. Bellhouse, P. (2001), John Arbuthnot", in Statisticians of the Centuries by C.C.
Heyde and E. Seneta, Springer.
4. Reinhart A (2015), Statistics Done Wrong: The Woefully Complete Guide. No Starch
Press.
RES505: Research Methodology Page 224
CREDIT 03-UNIT 02: BIOSTATISTICAL TEST
LEARNING OBJECTIVES
Students should know the specific terminology and notation of statistical analysis.
Students should learn how statistical methods fit into the general scientific process.
Students should learn about testing, especially analysis and its importance.
Students should understand the concept of frequency distribution as these are individual
values on a measurement scale.
Students should be able to organize data into a simple or grouped frequency distribution table.
“Statistics are the triumph of the quantitative method, and the quantitative method is the victory of
sterility and death.” -- Hilaire Belloc
INTRODUCTION
Statistical tests are used in hypothesis testing. They can be used to determine whether a predictor
variable has a statistically significant relationship with an outcome variable, or to estimate the
difference between two or more groups. A statistical test provides a mechanism for making
quantitative decisions about a process or processes. The intent is to determine whether there is
enough evidence to "reject" a conjecture or hypothesis about the process. The conjecture is called the
null hypothesis. Not rejecting may be a good result if we want to continue to act as if we "believe"
the null hypothesis is true. Or it may be a disappointing result, possibly indicating we may not yet
have enough data to "prove" something by rejecting the null hypothesis (Fig. 2.1).
TYPES OF VARIABLES
The types of variables you have usually determine what type of statistical test you can use.
Quantitative variables represent amounts of things (for example, the number of trees in a forest).
Types of quantitative variables include:
i Continuous: represents measurements and can usually be divided into units smaller than one.
ii Discrete: represents counts and usually cannot be divided into units smaller than one.
Categorical variables represent groupings of objects (for example, the different species of trees in
a forest). Types of categorical variables include:
i Ordinal: represents data with an order (e.g. rankings).
ii Nominal: represents group names (e.g. brand or species names).
iii Binary: represents data with a yes/no or 1/0 outcome (e.g. win or lose).
PARAMETRIC TESTING:
Parametric tests generally have more stringent requirements than nonparametric tests and can draw
stronger conclusions from the data. They can only be performed on data that meet the usual
assumptions of a statistical test. The most common types of parametric tests include regression
tests, comparison tests, and correlation tests.
i. Regression tests: regression tests look for cause-and-effect relationships. They can be used to
estimate the effect of one or more continuous variables on another variable.
a Simple linear regression - models the effect of a single predictor variable on an outcome variable.
ii. Comparison tests: comparison tests look for differences among group means; the t-test and
ANOVA are examples.
iii. Correlation tests: correlation tests check whether variables are related without assuming a
cause-and-effect relationship. They can be used, for example, to test whether two variables you want
to use in a multiple regression are autocorrelated. Nonparametric tests make fewer assumptions about
the data and are useful when one or more of the general statistical assumptions are violated.
However, their conclusions are not as strong as those of parametric tests.
The t-test is any statistical hypothesis test in which the test statistic follows a Student's t-
distribution under the null hypothesis. A t-test is the most commonly applied when the test statistic
would follow a normal distribution if the value of a scaling term in the test statistic were known.
When the scaling term is unknown and is replaced by an estimate based on the data, the test statistics
(under certain conditions) follow a Student's t distribution. The t-test can be used, for example, to
determine if the means of two sets of data are significantly different from each other.
Student's-t test (t-test), analysis of variance (ANOVA), and analysis of covariance (ANCOVA)
are statistical methods used in the testing of hypothesis for comparison of means between the groups.
For these methods, the test variable (dependent variable) must be continuously scaled and
approximately normally distributed. The mean is a representative measure of normally distributed
continuous variables, and statistical methods used to compare means are called parametric methods.
For non-normal continuous variables, the median is a representative measure, in which case
comparisons between groups are made using non-parametric methods. Most parametric tests have an
alternative non-parametric test.
There are many statistical tests within the Student's t-test (t-test), ANOVA, and ANCOVA families,
and each test has its own set of assumptions. Although not all of the methods are in common use, most
situations can be handled with the methods that are available. The purpose of this unit is to review
the assumptions, applications, and interpretations of some popular t-test, ANOVA, and ANCOVA methods.
TYPES OF T TESTS
i. A one-sample location test of whether the mean of a population has the value specified in
the null hypothesis.
ii. A two-sample location test of the null hypothesis that the means of the two populations
are equal.
All such tests are commonly called Student's t-tests, although strictly speaking this name should
only be used when the variances of the two populations are also assumed to be equal; the form of the
test used when this assumption is dropped is sometimes referred to as the Welch t-test. These tests
are often referred to as unpaired or independent-samples t-tests because they are typically applied
when the statistical units underlying the two samples do not overlap.
T-TEST FORMULA
T-tests can be performed either manually, using a formula, or through software.
t = (x − µ) / (σ / √n)
Where x is the mean of the sample, µ is the assumed population mean, σ is the standard deviation, and
n is the number of observations.
T-TEST FOR THE DIFFERENCE IN MEAN:
t = (x1 − x2) / √(σ1²/n1 + σ2²/n2)
Where x1 and x2 are the means of the two samples, σ1 and σ2 are the standard deviations of the two
samples, and n1 and n2 are the numbers of observations in the two samples.
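The difference-in-means formula can be sketched directly; the two samples below are assumed for illustration:

```python
# Two-sample t-statistic from the difference-of-means formula (assumed data).
import math
import statistics

sample1 = [23.0, 21.5, 24.1, 22.8, 23.6]  # hypothetical group 1
sample2 = [20.2, 21.0, 19.8, 20.6, 20.9]  # hypothetical group 2

x1, x2 = statistics.mean(sample1), statistics.mean(sample2)
s1, s2 = statistics.stdev(sample1), statistics.stdev(sample2)
n1, n2 = len(sample1), len(sample2)

# t = (x1 - x2) / sqrt(s1^2/n1 + s2^2/n2)
t = (x1 - x2) / math.sqrt(s1**2 / n1 + s2**2 / n2)
print(f"t = {t:.3f}")
```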
ONE SAMPLE T-TEST (ONE-TAILED T-TEST)
i. One sample t-test is a statistical test where the critical area of a distribution is one-sided so
that the alternative hypothesis is accepted if the population parameter is either greater than
or less than a certain value, but not both.
ii. In the case where the t-score of the sample being tested falls into the critical area of a one-
sided test, the alternative hypothesis is to be accepted instead of the null hypothesis.
T-TEST EXAMPLE
If a sample of 10 copper wires is found to have a mean breaking strength of 572 kgs, is it feasible to
regard the sample as a part of a large population with a mean breaking strength of 578 kgs and a
standard deviation of 12.72 kgs? Test at the 5% level of significance.
Taking the null hypothesis that the mean breaking strength of the population is equal to 578 kgs, we
can write:
t = (572 − 578) / (12.72/√10) = −1.49
As Ha is two-sided in the given question, a two-tailed test is to be used for determining the
rejection region at the 5% level of significance, which, using the normal curve area table, comes to:
R: | t | > 1.96
The observed value of t is −1.49, which lies in the acceptance region since | t | < 1.96, and thus H0
is accepted.
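A quick numerical check of this example (sample mean 572 kgs against a hypothesized 578 kgs, σ = 12.72, n = 10):

```python
# One-sample test statistic for the breaking-strength example: sample mean 572 kgs
# tested against a hypothesized population mean of 578 kgs, sigma = 12.72, n = 10.
import math

x_bar, mu, sigma, n = 572, 578, 12.72, 10
t = (x_bar - mu) / (sigma / math.sqrt(n))
print(f"t = {t:.3f}")   # about -1.49, inside |t| <= 1.96, so H0 is not rejected
assert abs(t) < 1.96
```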
T-TEST APPLICATIONS
i. The T-test is used to compare the mean of two samples, dependent or independent.
ii. It can also be used to determine if the sample mean is different from the assumed mean.
iii. T-test has an application in determining the confidence interval for a sample mean.
The chi-square test is performed to determine the difference between a theoretical population
parameter and the observed data.
i. The chi-square test is a non-parametric test: it does not assume that the data are normally
distributed, only that the test statistic follows the chi-square distribution.
ii. The test is commonly used to determine whether a random sample is drawn from a
population with mean µ and variance σ2.
The chi-square test is performed for various purposes, some of which are:
i. The chi-square test can be used as a goodness-of-fit test. It allows us to measure how
closely a theoretical distribution matches the observed distribution.
ii. It also serves as a test of independence, which allows the researcher to determine whether
two population characteristics are related or not.
The chi-square test is symbolically written as χ2, and the formula of chi-square for comparing
observed and expected frequencies is given as:
χ2 = Σ [(Oij − Eij)² / Eij]
Where Oij is the observed frequency of the cell in the ith row and jth column, and Eij is the
corresponding expected frequency.
For the chi-square test to be performed, the following conditions are to be satisfied:
i. Observations should be recorded and collected randomly.
ii. The individual items must all be independent.
iii. The frequency of data in a group should not be less than 10. If it is, the groups should be
rearranged by combining frequencies.
iv. The total number of individual items in the sample should be reasonably large, for example, 50
or more.
v. The constraints on the cell frequencies should be linear and should not involve squares or
higher powers.
CHI-SQUARED DISTRIBUTION
i. The chi-square distribution in statistics is the distribution of a sum of squares of independent
standard normal random variables.
ii. This distribution is a special case of the gamma distribution and one of the most widely used
distributions in statistics.
iii. This distribution is used in the chi-square test to test for goodness of fit or for
independence.
iv. The chi-square distribution also underlies the t-distribution and the F-distribution used for
t-tests and ANOVA.
Answer: The chi-square value is a single number that summarizes all the differences between our
actual data and the expected data when there is no difference. When the actual data and the expected
data (when there is no difference) are the same, the chi-square value is 0. A larger difference results
in a larger chi-square value.
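A goodness-of-fit sketch makes this concrete: observed counts from 60 hypothetical die rolls compared against a fair-die expectation:

```python
# Chi-square goodness-of-fit sketch: observed die rolls vs. a fair-die
# expectation (hypothetical counts).
observed = [8, 12, 9, 11, 10, 10]        # counts for faces 1-6 over 60 rolls
expected = [sum(observed) / 6] * 6       # fair die: 10 rolls expected per face

chi_sq = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
print(f"chi-square = {chi_sq:.2f}")      # compare with the table value, df = 5
# chi_sq is 0 only when observed matches expected exactly; bigger gaps give
# bigger values, as described above.
```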
USES OF BIOSTATISTICS
Here are some of the most important uses of statistics:
1. Interpreting research and drawing conclusions
Statistics are an important part of most sciences, helping researchers test hypotheses, confirm or
reject theories, and draw reliable conclusions. Making sense of the data generated by experiments and
studies is never easy: you have to take chance and uncertainty into account in order to reach
accurate results. Statistical analysis helps reduce or eliminate errors, allowing researchers to draw
reliable conclusions that then guide further research.
2. Meta-Analyses of Literature Reviews
Before a researcher or academic begins a new study, it is common to conduct a comprehensive
literature review of all available published data on a given topic. However, it is always difficult to
draw firm conclusions from multiple studies, especially when the studies follow different research
methods, are published in different journals (resulting in publication bias), or span a long period of
time. Statistical analysis of this study helps to reveal the general truth of this study or reveal hidden
patterns or relationships.
3. Clinical trial design
One of the most important applications of statistical analysis is in the design of clinical trials. When
a new drug or treatment is discovered, it must first be tested in one or more groups of people to
determine its effectiveness and safety. A clinical trial involves selecting the population/sample size,
determining the period over which the treatment should be monitored, designing the phases, and
selecting the parameters to determine whether the treatment is effective or not. Biostatisticians can
perform the task of statistical analysis not only in its design, but also in the analysis and
interpretation of results.
4. Study design
Do people who go to the gym live healthier and happier lives? How safe is New York? How
effective is your HIV education program? These questions cannot be answered without the help of
statistical analysis.
Column-I Column-II
1. T-test a. Likelihood ratio
2. Student t-test b. Means of two group hypothesis tests
3. X2 test c. Null hypothesis
4. Pearson X2 test d. Observation of random set of variables
5. G-test e. Test of goodness of fit
Answer:
1-b 2-c 3-d 4-e 5-a
SUMMARY
KEY WORDS
t-test- The t-test is any statistical hypothesis test in which the test statistic follows a Student's t-
distribution under the null hypothesis.
Chi-square test- A chi-squared test (χ2) is basically a data analysis on the basis of observations of a
random set of variables.
Parametric- Relating to a parameter, a mathematical or statistical variable.
Biostatistics- The application of statistical techniques to scientific research in the biological
sciences.
REFERENCES
1. Nikulin, M. S. (1973), Chi-squared test for normality, Proceedings of the International Vilnius
Conference on Probability Theory and Mathematical Statistics, vol. 2, pp. 119–122.
2. Brink, Susanne; & Van Schalkwyk, Dirk J. (1982), Serum ferritin and mean corpuscular volume as
predictors of bone marrow iron stores, South African Medical Journal, Vol. 61, pp. 432–434.
3. Magidson, Jay; The CHAID approach to segmentation modeling: chi-squared automatic interaction
detection, in Bagozzi, Richard P. (Ed); Advanced Methods of Marketing Research, Blackwell,
Oxford, GB, 1994, pp. 118–159.
4. Boneau, C. Alan (1960). The effects of violations of assumptions underlying the t test.
Psychological Bulletin. 57 (1): 49–64.
YOUTUBE VIDEOS
1. https://www.youtube.com/watch?v=gPt2DubVJQM
2. https://www.youtube.com/watch?v=Pb9-tashUn8
3. https://www.youtube.com/watch?v=0NwA9xxxtHw
4. https://www.youtube.com/watch?v=8uUKkL7qgFk
WIKIPEDIA
1. https://www.kolabtree.com/blog/6-essential-applications-of-statistical-analysis/
2. https://en.wikipedia.org/wiki/Student%27s_t-test
REFERENCE BOOKS
1. Dodge, Yadolah (2008), The Concise Encyclopedia of Statistics. Springer Science & Business
Media.
2. O'Mahony, Michael (1986), Sensory Evaluation of Food: Statistical Methods and Procedures.
CRC Press.
3. Corder, G. W.; Foreman, D. I. (2014), Nonparametric Statistics: A Step-by-Step Approach, New
York: Wiley.
4. Greenwood, Cindy; Nikulin, M. S. (1996), A guide to chi-squared testing, New York: Wiley.
"A judicious man looks on statistics not to get knowledge, but to save himself from having
ignorance foisted on him." --Thomas Carlyle
03-03-01: ANOVA
INTRODUCTION
Professor R. A. Fisher was the first to use the term "variance" and, in fact, it was he who developed
the detailed theory of ANOVA and explained its usefulness in practice. (Note that variance is the
square of the standard deviation.)
There may be variation between samples and also within sample items. ANOVA consists in splitting
the variance for analytical purposes. Hence, it is a method of analyzing the variance to which a
response is subject into its various components corresponding to various sources of variation.
Through this technique one can examine whether different varieties of seeds, fertilizers, or soils
differ significantly, so that a policy decision can be taken accordingly concerning a particular
variety in the context of agricultural research. Similarly, the differences between various types of
feed prepared for a particular class of animal, or various drugs manufactured for curing a specific
disease, may be studied and judged to be significant or not through the application of the ANOVA
technique.
Analysis of variance, or ANOVA, is a strong statistical technique that is used to show the difference
between two or more means or components through significance tests. It also shows us a way to
make multiple comparisons of several populations’ means. The Anova test is performed by
comparing two types of variation, the variation between the sample means, as well as the variation
within each of the samples. The formula below represents the one-way ANOVA test statistic:
F = MST / MSE
Where,
F = ANOVA coefficient
MST = mean sum of squares due to treatment (between groups) = SST / (p − 1)
MSE = mean sum of squares due to error (within groups) = SSE / (N − p)
SSE = Σ (n − 1) s², summed over the groups
p = number of groups, N = total number of observations, n = group size, s² = group variance
F-tests are named after Sir Ronald Fisher. The F statistic is simply the ratio of two variances. The
variance is the square of the standard deviation. For the average person, standard deviations are
easier to understand than variances because they are in the same units as the data, not the squared
units. The F-statistic is based on the mean square ratio. The term "mean square" may sound
confusing, but it is simply an estimate of population variance that uses degrees of freedom (D.F.) to
calculate this estimate.
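The mean-square ratio can be computed by hand for a small assumed dataset of three groups, with degrees of freedom k − 1 between groups and N − k within groups:

```python
# F-statistic as a ratio of mean squares, computed by hand for three assumed groups.
import statistics

groups = [[6, 8, 4, 5, 3, 4], [8, 12, 9, 11, 6, 8], [13, 9, 11, 8, 7, 12]]
k = len(groups)                                   # number of groups (p)
N = sum(len(g) for g in groups)                   # total observations
grand_mean = statistics.mean(x for g in groups for x in g)

# Between-groups: squared deviations of group means about the grand mean
ssb = sum(len(g) * (statistics.mean(g) - grand_mean) ** 2 for g in groups)
msb = ssb / (k - 1)                               # mean square between, D.F. = k-1

# Within-groups: pooled squared deviations about each group's own mean
sse = sum((x - statistics.mean(g)) ** 2 for g in groups for x in g)
mse = sse / (N - k)                               # mean square error, D.F. = N-k

F = msb / mse
print(f"F = {F:.2f} with ({k - 1}, {N - k}) degrees of freedom")
```

The resulting F is then compared with the tabulated F value for (k − 1, N − k) degrees of freedom at the chosen significance level.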
F TEST FORMULA
A test statistic that has an F-distribution under the null hypothesis is called an F-test. It is used
to compare statistical models fitted to the available data sets. George W. Snedecor named the
statistic F in honor of Sir Ronald A. Fisher.
The F-test formula is used to compare the variances of two different sets of values. To apply the
F-distribution under the null hypothesis, we must first find the mean of each set of observations and
then calculate the variances.
(F-Test Calculator is a free online tool that displays the mean, variance and frequency distribution
value for the given data set. Online F-test calculator tool makes the calculation faster, and also it
displays the F value in a fraction of seconds).
Step 1: Enter the set of data values, separated by commas, in the input field
Step 2: Click the "Calculate" button to get the value
Step 3: Finally, the mean, variance, and F-value will be displayed in the output field
F-value = Variance 1 / Variance 2
Use a one-way ANOVA when you have collected data on one categorical independent variable and one
quantitative dependent variable. The independent variable must have at least three levels (i.e., at
least three different groups or categories).
ANOVA tells you whether the dependent variable changes as a function of the level of the
independent variable. For example, your independent variable is social media use, and you assign
groups to low, medium, and high levels of social media use to find out if there is a difference in
hours of sleep per night. Your independent variable is brand of soda, and you collect data on Coke,
Pepsi, Sprite, and Fanta to find out if there is a difference in the price per 100ml.
Your independent variable is type of fertilizer, and you treat crop fields with mixtures 1, 2, and 3
to find out if there is a difference in crop yield.
The null hypothesis (H0) of ANOVA is that there is no difference among group means. The
alternate hypothesis (Ha) is that at least one group differs significantly from the overall mean of the
dependent variable.
ANOVA uses the F-test for statistical significance. This makes it possible to compare multiple means
at the same time, because the error is calculated for the entire set of comparisons rather than for
each pairwise comparison (as is the case in the t-test).
The F-test compares the variation between the group means with the variation within the groups.
When the within-groups variance is less than the between-groups variance, the F-test will find a
higher F-value and therefore a higher probability that the observed difference is real and not due to
chance.
ASSUMPTIONS OF ANOVA
The assumptions of the ANOVA test are the same as the general assumptions for any parametric test:
i Independence of observations: the data have been collected using statistically sound methods
and there is no hidden relationship between the observations. If your data does not meet this
assumption because you have a confounding variable that you need to statistically control for,
use a block variable ANOVA.
ii Normal response variable: The values of the dependent variable follow a normal distribution.
iii Homogeneity of variance: the variance within each group is similar for every group. If the
variances differ markedly between groups, ANOVA is probably not appropriate for the data.
You can perform ANOVA manually, but it is difficult to do with more than a few observations. We
will perform our analysis in the R statistical program because it is free, powerful, and widely used.
For a complete explanation of this ANOVA example, see our guide to performing ANOVA in R.
The sample dataset from our imaginary crop yield experiment contains data about:
For the one-way ANOVA, we will only analyze the effect of fertilizer type on crop yield.
After loading the dataset into our R environment, we can use the command to run an ANOVA. In
this example we will model the differences in the mean of the response variable, crop yield, as a
function of type of fertilizer.
ONE-WAY ANOVA R CODE
The ANOVA output provides an estimate of how much of the variation in the dependent variable can
be explained by the independent variable.
i The first column lists the independent variable along with the model residuals (aka the model
error).
ii The D.F. column displays the degrees of freedom for the independent variable (calculated by
taking the number of levels within the variable and subtracting 1), and the degrees of freedom
for the residuals (calculated by taking the total number of observations minus 1, then subtracting
the number of levels in each of the independent variables).
iii The Sum Sq column displays the sum of squares (a.k.a. the total variation) between the group
means and the overall mean explained by that variable. The sum of squares for the fertilizer
variable is 6.07, while the sum of squares of the residuals is 35.89.
iv The Mean Sq column is the mean of the sum of squares, which is calculated by dividing the sum
of squares by the degrees of freedom.
v The F-value column is the test statistic from the F test: the mean square of each independent
variable divided by the mean square of the residuals. The larger the F value, the more likely it is
that the variation associated with the independent variable is real and not due to chance.
vi The Pr(>F) column is the p-value of the F statistic: the probability of an F value at least as
large as the observed one arising if the null hypothesis of no difference were true.
vii Because the p-value of the independent variable, fertilizer, is significant (p < 0.05), it is likely
that fertilizer type does have a significant effect on average crop yield.
03-03-04: F-TEST
A two-way ANOVA is used to estimate how the mean of a quantitative variable changes according
to the levels of two categorical variables. Use a two-way ANOVA when you want to know how two
independent variables, in combination, affect a dependent variable.
You should use a two-way ANOVA when you’d like to know how two factors affect a response
variable and whether or not there is an interaction effect between the two factors on the response
variable.
For example, suppose a botanist wants to explore how sunlight exposure and watering frequency
affect plant growth. She plants 40 seeds and lets them grow for two months under different
conditions for sunlight exposure and watering frequency. After two months, she records the height of
each plant.
In this case, we have the following variables:
i. Response variable: Plant growth
ii. Factors: Sunlight exposure, watering frequency and we would like to answer the following
questions:
a. Does sunlight exposure affect plant growth?
b. Does watering frequency affect plant growth?
c. Is there an interaction effect between sunlight exposure and watering frequency? (e.g. the
effect that sunlight exposure has on the plants is dependent on watering frequency)
We would use a two-way ANOVA for this analysis because we have two factors. If instead we
wanted to know how only watering frequency affected plant growth, we would use a one-way
ANOVA since we would only be working with one factor.
For the results of a two-way ANOVA to be valid, the following assumptions should be met:
i. Normality – The response variable is approximately normally distributed for each group.
ii. Equal Variances – The variances for each group should be roughly equal.
iii. Independence – The observations in each group are independent of each other and the
observations within groups were obtained by a random sample.
In the table above, we see that there were five plants grown under each combination of conditions.
For example, there were five plants grown with daily watering and no sunlight and their heights after
two months were 4.8 inches, 4.4 inches, 3.2 inches, 3.9 inches, and 4.4 inches:
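That cell mean can be checked directly from the quoted heights; a two-way ANOVA then compares such cell means across all combinations of the two factors:

```python
# Mean height for one cell of the two-way design: the five plants grown with
# daily watering and no sunlight (heights in inches, as quoted above).
import statistics

heights = [4.8, 4.4, 3.2, 3.9, 4.4]
cell_mean = statistics.mean(heights)
print(f"mean height = {cell_mean:.2f} inches")  # 4.14 inches
```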
Answer:
1-True 2-False 3-False 4-True 5-True
KEY WORDS
Correlation- It is a statistical measure that expresses the extent to which two variables are linearly
related.
Summary- It is a brief statement or restatement of main points, especially as a conclusion to a work.
Homogeneity- The quality or state of being of a similar kind or of having a uniform structure or
composition throughout.
Calculated- worked out by mathematical calculation.
YOUTUBE VIDEOS
1. https://www.youtube.com/watch?v=xTpHD5WLuoA
2. https://www.youtube.com/watch?v=nvAMVY2cmok
3. https://www.youtube.com/watch?v=QfVx7AH8rck
REFERENCES
1. Smilde, A. K., Hoefsloot, H. C. and Westerhuis, J. A. (2008), The geometry of ASCA. Journal of
Chemometrics, 22, 464–471.
2. Smilde, Age K.; Jansen, Jeroen J.; Hoefsloot, Huub C. J.; Lamers, Robert-Jan A. N.; van der Greef,
Jan; Timmerman, Marieke E. (2005), ANOVA-simultaneous component analysis (ASCA): a new
tool for analyzing designed metabolomics data, Bioinformatics, 21 (13), 3043-3048.
3. Tiku, M. L. (1971), Power Function of the F-Test Under Non-Normal Situations. Journal of the
American Statistical Association. 66 (336): 913–916.
4. Hart (2001), Mann-Whitney test is not just a test of medians: differences in spread can be important.
BMJ. doi:10.1136/bmj.323.7309.391.
WIKIPEDIA
1. https://www.scribbr.com/statistics/one-way-anova/
2. https://en.wikipedia.org/wiki/Analysis_of_variance
3. https://en.wikipedia.org/wiki/One-way_analysis_of_variance
REFERENCE BOOKS
1. Montgomery, Douglas C. (2001). Design and Analysis of Experiments (5th Ed.). New York: Wiley. Section 3-2.
LEARNING OBJECTIVES
Determine the direction and strength of the linear correlation between the two factors.
Interpret Pearson's correlation coefficient and coefficient of determination and test its significance.
List and explain the three assumptions and three limitations for estimating the correlation
coefficient.
Distinguish between the predictor variable and criterion variable.
Identify each source of variation in regression analysis.
"Regression to the stage of early infancy is not a suitable method in and of itself. A regression can only
be effective if it happens in the natural course of therapy and if the client is able to maintain adult
consciousness at the same time." -- Alice Miller
INTRODUCTION
Regression analysis is a powerful statistical method that allows you to examine the relationship
between two or more variables of interest. While there are many types of regression analysis, they
all examine the influence of one or more independent variables on a dependent variable.
Regression analysis is a mathematical method for determining which of those factors has an effect. It
provides answers to the following questions:
i. Which factors are most important?
ii. Which of these may we disregard?
The simple linear regression model takes the form:
Y = a + bX + ϵ
Where:
Y – dependent variable
X – independent (explanatory) variable
a – intercept
b – slope
ϵ – residual (error)
The most crucial requirement of simple linear regression is that the dependent variable be continuous
(real-valued). The independent variable, on the other hand, can be either continuous or
categorical.
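As a minimal sketch of how the simple model Y = a + bX + ϵ is fitted by ordinary least squares, the following pure-Python example uses made-up (x, y) observations; the slope is b = Σ(xi - x̄)(yi - ȳ) / Σ(xi - x̄)² and the intercept is a = ȳ - b·x̄:

```python
# Hypothetical (x, y) observations; a and b are the least-squares
# estimates of the intercept and slope in Y = a + bX + e.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 4.3, 6.2, 8.1, 9.9]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n

# Slope: sum of co-deviations divided by sum of squared x-deviations
b = (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
     / sum((xi - x_bar) ** 2 for xi in x))
a = y_bar - b * x_bar                       # intercept
residuals = [yi - (a + b * xi) for xi, yi in zip(x, y)]
print(f"fitted line: Y = {a:.2f} + {b:.2f}X")
```

The residuals (the ϵ term) sum to zero by construction, which is a quick sanity check on any hand-rolled fit.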
Multiple linear regression uses the same criteria as simple linear regression. Because of the larger number
of independent variables, multiple linear regression places an extra requirement on the model:
Non-collinearity: the independent variables should not be strongly correlated with one another. If they
were, it would be hard to determine the true relationships between the dependent and independent
variables.
3. Non-linear regression
Nonlinear regression is a form of regression analysis in which data is fitted to a model and then
expressed as a mathematical function.
Simple linear regression connects two variables (X and Y) in a straight line (y = mx + b), whereas
nonlinear regression connects two variables (X and Y) in a nonlinear (curved) relationship.
The goal of the model is to minimize the sum of squares. The sum of squares is a statistic that tracks
how far the Y observations deviate from the nonlinear (curved) function used to predict Y.
In the same way that linear regression modelling aims to graphically trace a specific response from a set of
factors, nonlinear regression modelling aims to do the same.
Because the function is generated by a series of approximations (iterations) that may be dependent on
trial-and-error, nonlinear models are more complex to develop than linear models.
The Gauss-Newton methodology and the Levenberg-Marquardt approach are two well-known approaches
used by mathematicians.
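To make the idea of iterative fitting concrete, here is a minimal Gauss-Newton sketch for the hypothetical model y = a·exp(b·x). The data are generated without noise from a = 2.0 and b = 0.3, so the iteration should recover those values; a production implementation would add safeguards such as step damping, which is essentially what the Levenberg-Marquardt approach provides:

```python
from math import exp

# Noiseless data generated from the hypothetical model y = a * exp(b * x)
# with a = 2.0 and b = 0.3; the iteration should recover these values.
xs = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0 * exp(0.3 * x) for x in xs]

a, b = 1.5, 0.25                      # starting guess
for _ in range(50):
    # Residuals and Jacobian of f(x) = a * exp(b * x) w.r.t. (a, b)
    r = [y - a * exp(b * x) for x, y in zip(xs, ys)]
    ja = [exp(b * x) for x in xs]           # df/da
    jb = [a * x * exp(b * x) for x in xs]   # df/db
    # Normal equations (J^T J) d = J^T r, solved directly for the 2x2 case
    s11 = sum(v * v for v in ja)
    s12 = sum(u * v for u, v in zip(ja, jb))
    s22 = sum(v * v for v in jb)
    g1 = sum(u * v for u, v in zip(ja, r))
    g2 = sum(u * v for u, v in zip(jb, r))
    det = s11 * s22 - s12 * s12
    da = (s22 * g1 - s12 * g2) / det
    db = (s11 * g2 - s12 * g1) / det
    a, b = a + da, b + db               # update the parameter estimates
print(f"a = {a:.4f}, b = {b:.4f}")
```

Each pass refits a linearized version of the model around the current estimates, which is exactly the "series of approximations" the text describes.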
Therefore, the term linear regression often describes multivariate linear regression.
The correlation coefficient for the given data is computed as:
r = Σ(xi - x̄)(yi - ȳ) / √[Σ(xi - x̄)² · Σ(yi - ȳ)²]
Based on the value obtained through this formula, we can determine how strong the association
between the two variables is.
Regression Coefficient
Y = b0 + b1X
Here b0 is a constant and b1 is the regression coefficient. The formula for the regression coefficient is:
b1 = Σ(xi - x̄)(yi - ȳ) / Σ(xi - x̄)²
The observed data sets are given by xi and yi; x̄ and ȳ are the mean values of the respective variables.
We know that there are two regression equations and two coefficients of regression.
byx = r.(σy/σx)
bxy = r.(σx/σy)
Where r is the correlation coefficient, and σx and σy are the standard deviations of x and y, respectively.
iii. They are not independent of a change of scale: the regression coefficients will change if x
and y are multiplied by any constant.
iv. The arithmetic mean of both regression coefficients is greater than or equal to the coefficient
of correlation.
v. The geometric mean between the two regression coefficients is equal to the correlation
coefficient.
vi. If bxy is positive, then byx is also positive and vice versa.
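Properties iv and v above can be checked numerically. The sketch below uses made-up paired data and verifies that the geometric mean of the two regression coefficients equals |r| and that their arithmetic mean is at least r:

```python
from math import sqrt

# Hypothetical paired observations used only to illustrate the identities
x = [1, 2, 3, 4, 5, 6]
y = [2, 1, 4, 3, 7, 5]

n = len(x)
mx, my = sum(x) / n, sum(y) / n
sxx = sum((xi - mx) ** 2 for xi in x)
syy = sum((yi - my) ** 2 for yi in y)
sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))

r = sxy / sqrt(sxx * syy)    # Pearson correlation coefficient
byx = sxy / sxx              # coefficient of y on x, i.e. r * (sigma_y / sigma_x)
bxy = sxy / syy              # coefficient of x on y, i.e. r * (sigma_x / sigma_y)

print(f"r = {r:.4f}")
print(f"geometric mean of byx, bxy  = {sqrt(byx * bxy):.4f}")   # equals |r|
print(f"arithmetic mean of byx, bxy = {(byx + bxy) / 2:.4f}")   # >= r
```

The geometric-mean identity follows directly from the two formulas above: byx · bxy = r².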
Most regression analysis is carried out to support processes in finance. Here are five applications of
regression analysis in the field of finance and related areas:
i. Forecasting:
The most common use of regression analysis in business is for forecasting future opportunities
and threats. Demand analysis, for example, forecasts the number of things a customer is likely to
buy. When it comes to business, though, demand is not the only dependent variable. Regression
analysis can predict significantly more than just direct income. For example, we may predict
the highest bid for an advertisement by forecasting the number of consumers who would pass in front
of a specific billboard.
Insurance firms depend extensively on regression analysis to forecast policyholder
creditworthiness and the number of claims that might be filed in a particular time period.
ii. CAPM:
The Capital Asset Pricing Model (CAPM), which establishes the link between an asset's projected
return and the related market risk premium, relies on the linear regression model. It is also
frequently used in financial analysis by financial analysts to anticipate corporate returns and
operational performance.
The beta coefficient of a stock is calculated using regression analysis. Beta is a measure of return
volatility in relation to overall market risk. Because it represents the slope of the CAPM regression
line, it can be calculated quickly in Excel using the SLOPE function.
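The SLOPE calculation described above can be reproduced in a few lines of Python. The returns below are made-up monthly figures for a hypothetical asset and market index; beta is simply the OLS slope of asset returns regressed on market returns:

```python
# Hypothetical monthly returns (all numbers invented for illustration).
# Beta is the slope of the regression of asset returns on market returns,
# i.e. what Excel's SLOPE() would compute over the same two ranges.
market = [0.010, -0.020, 0.015, 0.030, -0.005, 0.020]
asset  = [0.018, -0.025, 0.020, 0.045, -0.012, 0.024]

n = len(market)
mm = sum(market) / n
ma = sum(asset) / n

# Slope = covariance(asset, market) / variance(market)
beta = (sum((m - mm) * (a - ma) for m, a in zip(market, asset))
        / sum((m - mm) ** 2 for m in market))
print(f"beta = {beta:.3f}")   # beta > 1 means more volatile than the market
```

With these invented figures the asset moves more than one-for-one with the market, so its beta comes out above 1.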
iii. Comparing with competition:
It may be used to compare a company's financial performance to that of a certain counterpart. It
may also be used to determine the relationship between two firms' stock prices (this can be
extended to find correlation between 2 competing companies, 2 companies operating in an
unrelated industry etc).
It can assist the firm in determining which aspects are influencing their sales in contrast to the
comparative firm. These techniques can assist small enterprises in achieving rapid success in a
short amount of time.
iv. Identifying problems:
Regression is useful not just for providing factual evidence for management choices, but also for
detecting judgement mistakes. A retail store manager, for example, may assume that extending
shopping hours will significantly boost sales.
However, regression analysis might suggest that the increase in income isn't enough to cover the increase in
operational cost as a result of longer working hours (such as additional employee labour
charges). As a result, this research may give quantitative backing for choices and help managers
avoid making mistakes based on their intuitions.
v. Reliable source:
Many businesses and their top executives are now adopting regression analysis (and other types of
statistical analysis) to make better business decisions and reduce guesswork and gut
instinct. Regression enables firms to take a scientific approach to management. Both small and
large enterprises are frequently bombarded with an excessive amount of data. Managers may use
regression analysis to sort through this data and base their decisions on evidence rather than guesswork.
03-04-03: TOXICOLOGY
"Toxicology is the scientific study of the adverse effects of chemicals on living organisms". It is the
observation and reporting of symptoms that occur as a result of exposure to toxic substances.
"Toxicology is a field of science that helps us understand the harm chemicals, substances or situations
can cause to people, animals, and the environment."
About 35 years ago, T. A. Loomis divided the science of toxicology into three main sub-disciplines:
environmental, economic, and forensic. These divisions are largely based on how humans are exposed
to potentially harmful chemicals.
TOXICITY TESTING
Toxicological testing, also called "safety assessment" or "toxicity testing", is the process of determining
the extent to which a substance of interest adversely affects the normal biological functions of an
organism, given a specific time of exposure, route of exposure, and concentration of the substance.
Toxicological testing can be done by chemical characterization, route of toxicity, target testing and
dose-related extrapolation, etc. Substances are tested using a variety of methods, including topical
application, inhalation, oral administration, injection, or water.
SPECIALIZATION IN TOXICOLOGY
More information on each method can be found below, including EURL ECVAM recommendations,
in:
i. Alternative Methods Tracking System for Regulatory Acceptance (TSAR)
ii. Core service database on alternative methods for animal testing (DB-ALM)
The new Directive 2010/63/EU further strengthens the role of ECVAM and its mandate. Its duties and
tasks are determined as follows (7):
i. Coordinate and promote the development and use of alternatives to procedures, including in
the areas of basic and applied research and regulatory testing
xi Skin irritation: irritants lead to a reversible local inflammatory response of the skin caused by the
innate (non-specific) immune system of the affected tissue.
xii Skin sensitization: Skin sensitization is a regulatory endpoint used to identify chemicals that
may cause an allergic reaction in sensitive individuals.
xiii Toxicokinetics: Toxicokinetics describes how the body handles chemicals as a function of dose
and time, according to the concept of ADME (Absorption, Distribution, Metabolism and
Elimination).
03-04-04: DETOXIFICATION
The word “toxic” goes back to ancient Greek: “Toxon” means “bow” and archers often used poisoned
arrows.
Detoxification is defined as "a live organism's elimination of hazardous chemicals through physiological or pharmacological means".
Additionally, it can be used to describe the time during drug withdrawal when an organism regains
equilibrium following a prolonged usage of an addictive chemical. Decontamination of toxin ingestion,
the use of antidotes, as well as procedures like dialysis and chelation therapy, are all ways that
detoxification can be accomplished in medicine.
Intoxication
Chemical, biological, physical, radioactive, and behavioral toxicity are the five main categories. Parasites
and pathogenic microbes are harmful. Being under the influence of one or more psychoactive substances
is known as intoxication. It can also be used to describe the results of consuming poison or excessive
amounts of generally safe drugs.
TYPES OF DETOXIFICATION
SYMPTOMS OF INTOXICATION
Specific symptoms of intoxication may vary depending on the substance that was ingested. However,
some of the common symptoms of alcohol intoxication include:
i. Ataxia: Ataxia is a condition that impairs walking. A drunk individual can have trouble walking
straight or keep falling over.
ii. Confusion and drowsiness: People who are intoxicated experience acute weariness and
disorientation.
iii. Euphoria: When under the influence, people may feel happy, talk a lot, and act in ways they
wouldn't usually do.
iv. Loss of inhibitions: Even a few drinks may cause people to feel more at ease, vulnerable, and
uninhibited.
v. Poor judgement: Being intoxicated can cause people to make poor choices and take risks, such
as driving while intoxicated.
vi. Speech issues: Other common signs of intoxication include slurred speech and other speech
disorders.
vii. Vomiting: People who are intoxicated may vomit as their body attempts to recover.
PROCESSES OF DETOXIFICATION
i. Evaluation: Upon beginning drug detoxification, a patient is first tested to see which specific
substances are presently circulating in their bloodstream and the amount. Clinicians also evaluate
the patient for potential co-occurring disorders, dual diagnosis, and mental/behavioral issues.
ii. Stabilization: In this stage, the patient is guided through the process of detoxification. This may
be done with or without the use of medications but for the most part the former is more common.
Also, part of stabilization is explaining to the patient what to expect during treatment and the
recovery process.
iii. Guiding the patient into treatment: The last step of the detoxification process is to ready the
patient for the actual recovery process, as drug detoxification only deals with the physical
side of dependency and addiction; further treatment must address its psychological and behavioral aspects.
We call the process of eliminating toxins, “detoxication” or “detoxification,” which is the opposite
of “intoxication.” Different tissues detoxify in varying ways.
a. Lungs- can detoxify by removing gases (gas anesthetics are removed from the body by the
lungs).
A variety of “detoxification” diets, regimens, and therapies sometimes called “detoxes” or “cleanses” have
been suggested as ways to remove toxins from your body, lose weight, or promote health. These include:
i. Fasting
ii. Drinking only juices or similar beverages
iii. Eating only certain foods
iv. Using dietary supplements or other commercial products
v. Using herbs
vi. Cleansing the colon (lower intestinal tract) with enemas, laxatives, or colon hydrotherapy (also
called “colonic irrigation” or “colonics”)
vii. Reducing environmental exposures
viii. Using a sauna.
Answer:
1-d 2-c 3-c 4-d 5-d
1. Correlation can be seen when two sets of data are graphed on a scatter plot, which is a graph with
an X and Y axis. (True/False)
Answer:
1-True 2- True 3-True 4-False 5-False
1-relationship 2-Positive 3-relationships 4-variables 5-Karl Pearson 6-Auguste Bravais
SUMMARY
Toxicology is a field of science that helps us understand the harm that chemicals, substances or
situations can cause to people, animals, and the environment. Toxicologists study the interactions of
chemicals with plants, animals, and humans to determine the effects of chemicals and assess the safety
of compounds.
“It is the physiological or medicinal removal of toxic substances from a living organism.” Additionally, it
can refer to the period of drug withdrawal during which an organism returns to homeostasis after long-
term use of an addictive substance. In medicine, detoxification can be achieved by decontamination of
poison ingestion and the use of antidotes as well as techniques such as dialysis and chelation therapy.
There are generally five types of toxicity: chemical, biological, physical, radioactive and behavioral. Disease-
causing microorganisms and parasites are toxic. Intoxication is the state of being affected by one or more
psychoactive drugs. It can also refer to the effects caused by the ingestion of poison or by the overconsumption
of normally harmless substances.
KEY WORDS
Forecasting - Forecasting is a technique that uses historical data as input to make informed guesses
that predict the direction of future trends.
Reliable - Consistently good in quality or performance; believable
REFERENCES
1. Russell MAH, Cole PY, Idle MS, Adams L. Carbon monoxide yields of cigarettes and their
relation to nicotine yield and type of filter. BMJ 1975; 3:713.
2. Bland JM, Altman DG. Statistical methods for assessing agreement between two methods of
clinical measurement. Lancet 1986; i:307-10.
3. Stulp, Freek, and Olivier Sigaud. Many Regression Algorithms, One Unified Model:
A Review. Neural Networks, vol. 69, Sept. 2015, pp. 60–79.
4. Armitage P, Berry G. In: Statistical Methods in Medical Research, 3rd Ed. Oxford:
Blackwell Scientific Publications, 1994:312-41.
YOUTUBE VIDEOS
1. https://www.youtube.com/watch?v=Tamoj84j64I
2. https://www.youtube.com/watch?v=K6kP9xmOrtk
3. https://www.youtube.com/watch?v=funYuQhPlmc
WIKIPEDIA
1. https://byjus.com/maths/correlation-and-regression/
2. https://www.cuemath.com/data/correlation-and-regression/
3. https://en.wikipedia.org/wiki/Geographical_indication
4. https://en.wikipedia.org/wiki/Toxicity
REFERENCE BOOKS
"Literature always anticipates life. It does not copy it but molds it to its purpose. The nineteenth
century, as we know it, is largely an invention of Balzac."-- Oscar Wilde
INTRODUCTION
The Literature Cited (bibliography) is found at the end of your paper and contains the complete
reference for each of the in-text citations used in your paper. Generally, a citation includes the
author(s), date, title and source of your publication. You should pay careful attention to details of
formatting when you write your own. For papers published in journals you must provide the date,
title, journal name, volume number, and page numbers. For books you need the publication date,
title, publisher, and place of publication.
ii. Learn about the information sources and the research methodologies
vii. Highlight the strengths, weaknesses and controversies in the field of study
The depth and scope of a literature review depend on many factors, most notably the purpose and
audience of the review. For example, if you are writing a literature review to help you write your
dissertation, a very comprehensive review may be required, covering all relevant literature on your
topic as well as relevant sources beyond those readily available for free.
ii Editor's evaluation: The journal checks the structure and formatting of the paper against the
journal's author guidelines to ensure it includes the required sections and follows the required
style. The quality of the paper is not evaluated at this point.
iii Review by the Editor-in-Chief (EIC): The EIC checks whether the paper is suitable for the journal and
whether it is sufficiently original and interesting. If not, the paper may be rejected without
further review.
iv The EIC assigns an Associate Editor (AE): Some journals have associate editors who manage the
peer review process. If so, one is assigned at this stage.
v Invitations to reviewers: The handling editor sends invitations to individuals he or she
considers appropriate reviewers. As responses come in, further invitations are
issued if necessary until the required number of acceptances (usually two) is reached, with
some variation from journal to journal.
vi Responding to invitations: Potential reviewers weigh invitations against their expertise,
conflicts of interest, and availability, and then accept or decline. When declining, they may
also suggest alternative reviewers.
vii Under review: Reviewers read the paper several times. A first reading
gives an initial impression of the work. If serious problems are identified at this stage,
the reviewer may reject the paper without further work. Otherwise, they read the paper a few more
times, taking notes in order to build a detailed point-by-point review. The review is then submitted to
the journal with a recommendation to accept or reject the paper, or with a request for
revision (usually flagged as either major or minor) before the paper is reconsidered.
viii Journal evaluates the reviews: The handling editor considers all returned reviews before making
an overall decision. If the reviews differ significantly, the editor may invite additional
reviewers for further input before deciding.
ix You will be notified of the decision: The editor sends the author a decision email containing
the relevant reviewer comments. Whether comments are anonymous or not depends on the type
of peer review the journal conducts.
x Next steps: At this point, reviewers should also receive an email or letter notifying them of the
outcome of their review. When a paper is returned for revision, reviewers should expect to receive
the new version, unless they have opted out of further participation. However, if only minor changes were
requested, the follow-up review may be performed by the handling editor.
CITATION TYPES
There are many different types of citations, but they typically use one of three basic approaches:
quoting in parentheses, quoting numbers, or quoting footnotes. The most obvious identifying
feature of any citation style is the way the citations are presented in the text (Fig.2.1).
Economics: Harvard
Psychology: APA
TYPES OF REPORTS: Here are some important types of research reports:
i Technical report: In a technical report, the focus is on the methods used, the assumptions
made during the research, and a detailed presentation of the results, including their
limitations and supporting data. For example, a technical report might be prepared when a hotel
project is being conceptualized.
ii Formal or informal report: A well-structured formal report is carefully written, objectively
clear, organized, and detailed enough to allow the reader to understand the concepts. Formal reports are
written impersonally, while an informal report can be direct and concise, using
informal language, e.g. inter-departmental communication such as an announcement or a memo.
iii Popular report: The strength of this report is its simplicity and attractiveness. Simplification
is achieved through clear writing, minimal technical detail (especially mathematical detail),
and heavy use of tables and diagrams. Attractive layouts with large print, many subheadings, and
even an occasional illustration are further features of the popular report.
SCHEMATICS
Diagrams help identify the key elements of a system or process. They should highlight only key
elements, as adding unimportant ones clutters the image. A diagram consists only of
drawings chosen by the author, which provides a degree of flexibility not available with photographs. Diagrams can
also be used in situations where photography is difficult or impossible. Below is a diagram
explaining how to use nanotubes to capture energy from liquids.
"A project proposal is a written document outlining everything stakeholders should know about a
project, including the timeline, budget, objectives, and goals. Your project proposal should
summarize your project details and sell your idea so stakeholders feel inclined to get involved in the
initiative." Project proposals are an extremely important aspect of a project. A proposal must be properly
structured and must contain all necessary and pertinent information regarding the project.
The aim of the project is to create a good product and report; the software, hardware, theory, etc.
that you have developed during the project are only a means to this end. Design documents should be
orderly, well presented, and consistently formatted: such a document is easy to read and suggests a
careful and professional attitude towards preparation. Remember that quantity does not automatically
guarantee quality. A 150-page report is not twice as good as a 75-page report, and a 10,000-line
implementation is not twice as good as a 5,000-line one. Conciseness, clarity, and elegance are
invaluable qualities in report writing, as well as in programming, and will be rewarded. Try to ensure
that your report contains the following (exact structure, chapter titles, etc. are up to you):
TITLE PAGE
This must include the title of the project and the author's name. You can also provide the
name of your supervisor if you wish.
EXECUTIVE SUMMARY
The Executive Summary is a very brief summary of the content of the report; it should be about half
a page. Someone unfamiliar with your project should have a good idea of it after reading
the summary alone and will know whether it interests them.
ACKNOWLEDGMENTS
It is generally a good idea to thank people who have provided exceptionally helpful support,
technical or otherwise, throughout your project. Your supervisor will obviously be happy to be
recognized as they will invest a lot of time monitoring your progress.
TABLE OF CONTENTS
This page should list the main chapters and (sub) sections of your report. Choose meaningful chapter
and section titles and use double-spaced lines for clarity. If possible, you should include a page
number indicating the beginning of each chapter/section. Try to avoid too many levels of subheadings;
three is enough.
INTRODUCTION
This is one of the most important elements of the report. It should begin with a clear statement of the
project's purpose so that the nature and scope of the project can be understood by the casual reader. It
should summarize everything you want to achieve, providing a clear summary of the project's
background, relevance, and key contributions. The introduction should set the scene for the project
and provide the reader with a summary of the main things to look for in the rest of the report. When
detailing contributions, it is helpful to provide hints to the section(s) of the report that provide
relevant technical details. The introduction itself is largely non-technical. It is helpful to state the
main goals of the project as part of the introduction. However, avoid the temptation to list low-level
goals one after another in the introduction and then, in the evaluation (see below), make a claim such as
"All of the project's goals were achieved...".
CONTEXT
The context section of the report should situate the project in its wider context and survey
proposed alternatives for achieving the project's objectives. Background can be included as part of an
introduction but is usually better in a separate chapter, especially if the project involves a large
amount of background work. When referring to other works, cite the sources in which they are
mentioned or used, rather than just listing them at the end.
BODY OF THE PROPOSAL
The central part of the report usually consists of three or four chapters detailing the engineering
work carried out on the project. The structure of these chapters depends a lot on the project. They
may reflect the chronological development of the project, e.g. design, implementation, testing,
optimization, evaluation, etc. If you've built new software, you should describe and demonstrate your
program design at a high level, possibly using an approved graphical form such as UML. It should
also document any issues or interesting features in your implementation. Integration and testing are
also important to discuss in some cases. You should thoroughly discuss the contents of these sections
with your supervisor.
CONCLUSIONS OF WORK
The project conclusion should list what has been learned as a result of the work you did. For
example, "The use of overloading in C++ provides a very elegant mechanism for parallelization
throughout sequential programs". Avoid tedious personal reflections like "I've learned a lot about
C++ programming..." Usually, end a report by listing ways to go further in the project. For example,
this could be a plan to improve the project if you had the opportunity to rework it, turning the
project's deliverables into a more polished final product.
APPENDICES
Appendices contain information outside the main body of the report. This often includes
items such as sections of code, tables, test cases, or other documentation that would break the flow of
the text if it appeared in place. You should try to bind all your documents into one volume to
create the black book.
PROGRAM LIST
A complete program list is NOT included in the report, except in specific cases requested by your
supervisor. We highly recommend spending time looking at student reports from previous projects to
get an idea of what's good and bad. All reports from previous years are available in hard copy in the
CCCF and electronically in the student projects section. These documents are only accessible from
the TIFR IP domain.
"A project report is simply a document that provides detail on the overall status of the project or
specific aspects of the project's progress or performance". Regardless of the type of report, it is made
up of project data based on economic, technical, financial, managerial or production aspects.
Depending on the project and organizational processes, additional project reports with in-depth
analysis and recommendations may also be required at the end of the project. Report writing is a
useful opportunity to evaluate a project, document lessons learned, and enrich your organization's
knowledge base for future projects. Try these steps to write better project reports.
ii Understand your audience: Writing a formal annual report for your stakeholders is very
different from a financial review. Adjust language, data usage, and supporting graphics for your
audience. It is also helpful to consider the reader's personal communication style, for example,
how do they write emails or structure documents? Reflect their preferences as much as possible.
You may need to develop a more formal or informal tone than your own natural style. Applying
this technique will build rapport and make your readers more receptive to your ideas.
iii Format and type of report: Before you begin, check the format and type of the report. Do you
need to submit a written report or make a presentation? Do you need to write a formal, informal,
financial, annual, technical, investigative or problem-solving report? You should also confirm if
templates are available in the organization. Checking out these details can save you time later!
iv Gather facts and data: Include interesting facts and data that will strengthen your argument.
Get started with your collaborative project site and work as needed. Remember to cite sources
such as articles, case studies, and interviews.
DISCUSSION WRITING
INTRODUCTION (DISCUSSION WRITING)
The purpose of the discussion is to interpret and describe the significance of your findings in
light of what was already known about the research problem being investigated, and to explain any
new understanding or fresh insights about the problem after you've taken the findings into
consideration. The discussion will always connect to the introduction by way of the research
questions or hypotheses you posed and the literature you reviewed, but it does not simply repeat or
rearrange the introduction; the discussion should always explain how your study has moved the
reader's understanding of the research problem forward from where you left them at the end of the
introduction.
This section is often considered the most important part of a research paper. It most effectively
demonstrates your ability as a researcher to think critically about an issue, to develop creative
solutions to problems based on the findings, and to formulate a deeper, more profound understanding
of the research problem you are studying (Fig.4.4).
i. The discussion section is where you explore the underlying meaning of your research, its
possible implications in other areas of study, and the possible improvements that can be made
in order to further develop the concerns of your research.
ii. This is the section where you need to present the importance of your study and how it
may be able to contribute to and/or fill existing gaps in the field. If appropriate, the discussion
section is also where you state how the findings from your study revealed new gaps in the
literature that had not been previously exposed or adequately described.
I. General Rules
These are the general rules you should adopt when composing your discussion of the results:
i. Do not be verbose or repetitive.
ii. Be concise and make your points clearly.
iii. Avoid using jargon and follow a logical stream of thought.
iv. Use the present verb tense, especially for established facts; however, refer to specific works
and references in the past tense.
v. If needed, use subheadings to help organize your presentation or to group your interpretations
into themes.
II. The Content
The content of the discussion section of your paper most often includes:
i. Explanation of results: comment on whether or not the results were expected and present
explanations for the results; go into greater depth when explaining findings that were
unexpected or especially profound. If appropriate, note any unusual or unanticipated patterns
or trends that emerged from your results and explain their meaning.
ii. References to previous research: compare your results with the findings from other studies,
or use the studies to support a claim. This can include revisiting key sources already cited in
your literature review section, or saving them to cite here in the discussion section if they
matter more as comparisons with your results than as part of the general background and context
you cited earlier.
iii. Deduction: a claim for how the results can be applied more generally. For example,
describing lessons learned, proposing recommendations that can help improve a situation, or
recommending best practices.
iv. Hypothesis: a more general claim or possible conclusion arising from the results [which may
be proved or disproved in subsequent research].
III. Organization and Structure
Keep the following sequential points in mind as you organize and write the discussion section of
your paper:
i. Think of your discussion as an inverted pyramid. Organize the discussion from the general to
the specific, linking your findings to the literature, then to theory, then to practice [if
appropriate].
The term derives from the Greek biblion (book) and graphia (writing), via the neo-Latin
bibliographia. Greek writers of the first three centuries of our era used it to mean the copying
of books by hand.
DEFINITION
A bibliography is the list of sources a work's author used to create the work. It accompanies just
about every type of academic writing, like essays, research papers, and reports.
A bibliography is a list of all the sources you have used (whether referenced or not) in the course of
researching your work.
In general, a bibliography should include: names of authors, names of works, names and locations of
companies that have published copies of sources. The bibliography must clearly and fully describe
the sources used to prepare the report. This is an alphabetical list by last name of the authors.
TYPES OF BIBLIOGRAPHY
Carter and Barker describe bibliography as a twofold scholarly discipline: the organized listing of books
(enumerative bibliography) and the systematic description of books as physical objects (descriptive
bibliography). These two distinct concepts and practices have different rationales and serve different purposes.
i. An enumerative bibliography
This is a systematic listing of books and other works such as journal articles. Enumerative
bibliographies range from "works cited" lists at the end of books and articles to comprehensive,
independent publications. A notable example of a comprehensive, independent publication is
Gow's A. E. Housman: A Sketch, together with a List of His Writings and Indexes to His Classical
Papers (1936). As separate works, they may appear in bound volumes or other published formats.
Bibliography Format for a Book: A standard bibliography for a book typically consists of the
following information:
i. Author(s)
ii. Title
iii. Publisher
iv. Date of Publication
Example: Surname of author, name or two initials; title taken from the title page (underlined or
in italics); edition (if more than one); volume (if more than one); place of publication;
publisher; date on title page or copyright date. e.g. Kothari, C.R., Research Methods and
Techniques, 1989, New Delhi: Wiley Eastern Limited, 4835/24 Ansari Road, Daryaganj, New Delhi 110 006.
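The book format above can be sketched as a small helper function. The function name, parameter names, and the exact punctuation are illustrative assumptions for this sketch, not a rule from any citation style guide.

```python
def format_book_entry(surname, initials, title, year, place, publisher, edition=None):
    """Assemble a book bibliography entry in the order listed above:
    author, title (italics shown here as *...*), edition, year, place, publisher."""
    parts = [f"{surname}, {initials},", f"*{title}*,"]
    if edition:
        parts.append(f"{edition} ed.,")   # only included when there is more than one edition
    parts.append(f"{year},")
    parts.append(f"{place}: {publisher}.")
    return " ".join(parts)

entry = format_book_entry("Kothari", "C.R.", "Research Methods and Techniques",
                          1989, "New Delhi", "Wiley Eastern Limited")
print(entry)
# → Kothari, C.R., *Research Methods and Techniques*, 1989, New Delhi: Wiley Eastern Limited.
```

A real bibliography tool would also handle multiple authors and volume numbers; this sketch only covers the minimal fields listed above.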
Bibliography Format for a Periodical & Journal Article: An entry for a journal or periodical
article contains the following information:
i. Author(s)
ii. Article Title
iii. Journal Title
iv. Volume Number
v. Pages
vi. Date of Publication
Bibliography Format for Internet Sources: Format for internet sources usually includes the
following information:
i. Author (Website)
ii. Article Title
iii. Publication Information
iv. Version
1) What is a bibliography?
Answer: The term bibliography is used to refer to the list of sources (e.g. books, articles, websites) used
to write an assignment (e.g. an essay). It usually includes all sources referenced even if they are not
directly cited (mentioned) in the assignment.
Answer:
1-d 2-a 3-b 4-c 5-d
1. APA and MLA are the most common styles to use. (True/False)
2. Formatting of report writing also makes easy to writer to write. (True/False)
3. Vague research question and going off-topic is better. (True/False)
4. Corrected proofs are articles in press that contain the author's corrections. (True/False)
5. Online proofing is the process of sharing content for feedback and approval. (True/False)
Column-I Column-II
1. Introduction a. The last part of something, its end or result, summarized
2. Methodology b. A systematic investigation designed to develop a knowledge
3. Result c. A body of methods, rules, and postulates employed
4. Discussion d. Something that happens or exists because of something else
5. Conclusion e. A conversation or debate about a specific topic
Answer:
1-b 2-c 3-d 4-e 5-a
SUMMARY
This section is often considered the most important part of a research paper because it most effectively
demonstrates your ability as a researcher to think critically about an issue, to develop creative solutions
to problems based on the findings, and to formulate a deeper, more profound understanding of the
research problem you are studying. If appropriate, the discussion section is also where you state how the
findings from your study revealed new gaps in the literature that had not been previously exposed or
adequately described.
KEY WORDS
REFERENCES
1. Lawrence, Amanda (2018), Chan, Leslie; Loizides, Fernando (Ed). Influence Seekers: The
Production of Grey Literature for Policy and Practice. Information Services & Use. 37 (4): 389–403.
2. Kukull, W. A.; Ganguli, M. (2012), Generalizability: The trees, the forest, and the low-hanging
fruit. Neurology. 78 (23): 1886–1891.
3. Canagarajah, A. Suresh (1996), From Critical Research Practice to Critical Research Reporting.
TESOL Quarterly. 30 (2): 321–331.
4. Gauch, Jr., H.G. (2003), Scientific method in practice. Cambridge, UK: Cambridge University
Press. 2003 ISBN 0-521-81689-0.
YOUTUBE VIDEOS
1. https://www.youtube.com/watch?v=-ny_eujxhhs
2. https://www.youtube.com/watch?v=cMJWtNDqGzI
3. https://www.youtube.com/watch?v=3iE4WAjaPE0
WIKIPEDIA
1. https://www.brightwork.com/blog/7-steps-effective-report-writing
2. https://www.scribbr.com/category/research-paper/
3. https://library.sacredheart.edu/c.php?g=29803&p=185933
REFERENCE BOOKS
“Stealing music is not right, and I can understand people being very upset about their intellectual
property being stolen.” --- Steve Jobs
INTRODUCTION
Intellectual Property: “Intellectual Property (IP) refers to creations of the mind, such as
inventions; literary and artistic works; designs; and symbols, names and images used in commerce”.
Intellectual Property Rights (IPR) refers to the legal rights given to the inventor or creator to
protect his invention or creation for a certain period of time. These legal rights confer an
exclusive right to the inventor/creator or his assignee to fully utilize his invention/creation for a
given period of time.
Intellectual property rights include patents, copyright, industrial design rights, trademarks, plant
variety rights, trade dress, geographical indications, and in some jurisdictions trade secrets.
There are also more specialized or derived varieties of sui generis exclusive rights, such as circuit
design rights, supplementary protection certificates for pharmaceutical products and database rights.
The term "industrial property" is sometimes used to refer to a large subset of intellectual property rights.
NEED OF IPR
The progress and happiness of mankind depends on the ability to create and invent new works in the
fields of technology and culture. The basic needs are: -
i. Encourage innovation: Legal protection of new creations encourages a commitment to
providing additional resources for new creations,
ii. Economic growth: The promotion and protection of intellectual property stimulates
economic growth, creates jobs, new industries, improves the quality and enjoyment of life,
iii. Protecting Creators' Rights: Intellectual property rights are required to protect creators and
other producers of their intellectual products, goods, and services by granting them certain
time-limited rights,
OBJECTIVES OF IPR
i. Outreach and Promotion - To create public awareness about the economic, social and
cultural benefits of IPRs among all sections of society.
Fig.4.2.2: Patent
The word patent originates from the Latin ‘patere’, which means "to lay open". It is a shortened
version of the term letters patent, which was an open document or instrument issued by a monarch or
government granting exclusive rights to a person, predating the modern patent system. Similar grants
included land patents, which were land grants by early state governments.
A patent is often referred to as a form of intellectual property right, an expression which is also
used to refer to trademarks and copyrights, and which has proponents and detractors. Some other types
of intellectual property rights are also called patents in some jurisdictions: industrial design rights are
called design patents, plant breeders' rights are sometimes called plant patents, and utility models are
sometimes called petty patents or innovation patents. Particular species of patents for inventions include
biological patents, business method patents, chemical patents and software patents.
Under the World Trade Organization's (WTO) TRIPS Agreement, patents should be available in
WTO member states for any invention, in all fields of technology, provided they are new, involve an
inventive step, and are capable of industrial application.
There are variations on what is patentable subject matter from country to country, also among WTO
member states. TRIPS also provides that the term of protection available should be a minimum of 20
years.
Types of patents: There are three types of patents. Such as: -
i Utility patents may be granted to anyone who invents or discovers any new and useful
process, machine, article of manufacture, or composition of matter,
ii Design patents may be granted to anyone who invents a new, original, and ornamental design
for an article of manufacture; and
iii Plant patents may be granted to anyone who invents or discovers and asexually reproduces
any distinct and new variety of plant. Utility and plant patents last for 20 years from the date of
application filing.
COPYRIGHTS
According to the United States Patent and Trademark Office, "original works of authorship" are
protected by copyrights, which are legal protections for creative works of the mind. These include works
of visual art, literary works, other writings, motion pictures, and software. Copyrights prevent others
from copying the work without the express consent of the copyright owner.
Copyrights, like other forms of intellectual property, are granted for a predetermined period of time,
allowing the owner to profit from their creation. Copyrights are granted for a maximum term of 70 years
from the death of the creator; exceptions apply to works made for hire and anonymous works.
The Trade Marks Registry was established in India in 1940 and presently administers the Trade Marks Act, 1999.
GEOGRAPHICAL INDICATIONS
India, as a member of the WTO, enacted the Geographical Indications of Goods (Registration and Protection)
Act, 1999. It entered into effect on September 15, 2003. Geographical Indications are defined
under Article 22(1) of the WTO Agreement on TRIPS.
INDUSTRIAL DESIGNS
From a legal point of view, an industrial design is the ornamental aspect of an article. Two-dimensional
features, such as patterns, lines, or colors, or three-dimensional features, such as an article's shape, can make up
an industrial design.
3) What is a trademark?
Answer: A trademark is a sign capable of distinguishing the goods or services of one enterprise from
those of other enterprises. Trademarks are protected by intellectual property rights.
Patent Applications can either be provisional or complete. Both these filings serve different purposes in
the patent application process. Provisional patent filing establishes patent rights over a product which is
yet to be developed and helps claim an early filing date. The status of the Patent after Provisional patent
filing remains “Pending.” A 12-month window is further provided to develop the invention before a
complete patent application can actually be filed.
It is tempting to think all searching can be done electronically and for the majority of modern patents
(published after 1975) this is essentially true. Patent searchers, especially inventors who need to
thoroughly search the entire realm of patents to ensure their idea hasn't already been patented, have more
limited options available electronically and for free. Pre-1976 U.S. patents are often difficult to find
because the patent pages were put into the USPTO database as scanned images, and full text searching
was added later through machine transcription. Older patents from outside the U.S. can be even more
challenging to find. Below you'll find some basic tips and strategies for locating patents.
Doing a preliminary patent search is an important first step for inventors hoping to patent their new
invention. The following tutorial, produced by the USPTO, details a patent-searching process that can be
adapted for just about any free patent search tool.
Keyword searching using free patent search tools may give you an idea of what is out there (or not,
depending on which terms you use). You can also use this strategy to identify classification codes,
inventor names, and other information you can then use to run additional searches. Keyword searching
should not be your primary or only patent searching strategy if you are conducting a preliminary patent
search or are doing your own prior art search. As: -
i The Lens: The Lens covers over 100 million patent documents from around the world. Includes
classification searching and quick access to patent family information.
ii Google Patents: Quick keyword searching for US and other patents. Full text of older patents
may have issues related to automated character recognition from scanned patent image files.
The patent number is the key to the entire patent information system. It doesn't matter when or
where a patent was issued; as long as you know the number, you can get the full-text patent
in no time. Most free patent search websites will let you type in a U.S. patent number and get a
PDF version of that patent. Some search for patents from other countries as well.
If you know the name of the inventors, owners, or assignees, you can search using Lens and
Espacenet respectively. These search tools allow you to narrow down your search to certain fields
(e.g., assignee name, inventor name, owner name, etc.) in the full text of the patent. This allows
DRAFTING OF PATENT
Patent drafting is part of how an idea is patented and is the process of writing a description and making a
patent claim. This is at the heart of every patent application. When a patent is granted or licensed, the draft
serves as the specification portion of the document.
As a first step, your patent attorney asks you to enter into an invention disclosure agreement. This allows
you to communicate your invention in sufficient detail so that the attorney understands the invention. At
this point, your attorney begins drafting a patent application that begins with the design statements.
Once your attorney has accurately grasped the scope of the invention in the draft claims, the inventor or
drafter begins to prepare all the drawings necessary to help explain the claims. In some
cases, drawings show existing inventions so that their elements can be properly distinguished from the
innovation you are claiming.
During patent drafting, many collaborative discussions take place between you, the designer, and the
attorney. It is not uncommon for the scope of claims to vary slightly during this period. When these
changes occur, it may be an attempt to further differentiate the new invention from the existing ones.
These changes may also involve new or expanded understanding of the invention or its use.
As outlined in 37 CFR 1.77, the non-provisional patent draft includes the following sections:
i. The title of your invention,
ii. A cross-referenced list of any related patent applications,
iii. A statement about any federally sponsored R&D, if applicable,
iv. The names of all parties if there is a joint research agreement,
v. References to a "sequence listing," any tables or computer program listings, any appendix
submitted on a CD or storage device, and the incorporation-by-reference list,
vi. Background information on the invention,
vii. A brief summary of the invention,
viii. A short description of the drawings,
ix. A detailed description of the invention,
x. The claim or claims,
xi. An abstract of the disclosure,
xii. The sequence listing, if not supplied on a CD or storage device.
PATENT REGISTRATION
Patent registration is a legal process that grants exclusive rights of ownership and use to the inventor of a
product, service, or technology. Thus, the inventor gets the exclusive right to his invention for the entire
period of validity of the patent registration. The patent registration process is extremely important for
inventors and businesses to protect their innovative ideas and prevent others from using, selling or
producing their inventions without permission or license. In India, patent registration is governed by the
Patent Act 1970 and administered by the Indian Patent Office.
Benefits of Patent Registration: Benefits of patents are as: -
i. Legal protection: Patent registration provides legal protection to inventors by granting them
exclusive rights to their inventions. It prohibits anyone from making, using, selling or importing a patented
invention without the permission of the inventor.
ii. Market advantage: Patent acquisition gives inventors and companies a competitive edge in the
marketplace. It allows them to make the most of their unique inventions and prevents competitors from
stealing their ideas. With patents, inventors can gain a monopoly in the market, set themselves apart from
competitors, and demand higher prices or licensing fees for their patented technology.
iii. Financial opportunities: Patents can open up a variety of financial opportunities. They can attract
investors and venture capitalists interested in supporting innovative technologies. Patented inventions can
also generate revenue through licensing agreements. Additionally, patents can increase the overall value of
a company, making it more attractive for mergers, acquisitions, or partnerships.
iv. Encourage research and development: Patent registration encourages and rewards innovation by
providing inventors with limited-time exclusivity. This encourages inventors and companies to invest in
research and development (R&D), knowing that they will enjoy exclusive patent rights and potential
financial benefits from their inventions.
Utility patents are one of the most popular types of patents in India. These types of patents cover any
improvement or invention in a product, process, or machine. It is also called "patent of invention". So, if
you created a new electric vehicle, or a solar-related machine, etc., you would be applying for a utility
patent.
Drafting of Application
This is the most important step in the patent application process. As mentioned above, the patent
application is accompanied by Form 2, which asks the inventor to provide the technical specifications of
the invention. It should be as detailed as possible and include the different parts of the invention (if
divided into steps); drawings and diagrams showing the mechanism of inventions; Background of the
invention; a detailed description of the content of the invention, the purpose for which the invention was
created, and how it serves the particular industry to which it is likely to belong; specification summary as
well as the patent abstract.
Submit request
Patent requests can be filed in writing with the patent registry or electronically. Form 1 (with main contact
information of the inventor) or Form 28 (for startups and small organizations); Form 2 (with full
regulations or specifications); Form 3 (in case the application is an international application); Form 5
(statement of all inventors of the invention) and Form 26 (power of attorney, if the application is filed by
an attorney), must be filed together on the portal or with the registry, along with the applicable
patent registration fee. More importantly, where a complete specification is filed within 12 (twelve) months
of the filing date of a provisional application, the inventor retains the earlier filing date of the
originally filed (provisional) application.
Publication
Patent applications are published 18 (eighteen) months from the filing date or from the priority date,
whichever comes first (Rule 24 of the Patent Rules).
It is interesting to note that the Patent Act provides for accelerated publication of an application in cases
where the applicant does not wish to wait the full 18 (eighteen) months. In that case, the applicant may file
Form 9 (request for early publication) with the required fee. On a request for early publication, the
Controller General shall publish the application within 1 (one) month from the date of filing of the request.
Protest/ Objection
If objections are raised in the first examination report, the applicant (inventor) or his or her
designated representative (on behalf of the applicant) must file a response to the objections raised in the
first examination report and comply with any requirements (as stated) within 6 (six) months from the date of
issue of the first examination report.
Patent granting
Provided that all objections are resolved, and the examiner and the controlling authority
conclude that the application and accompanying documents comply with the law, the grant of the patent
shall be notified to the applicant (or to the representative) and subsequently published in the Patent
Journal. In case of objection to the grant of a patent, any person has the right to submit a notice of
opposition to the Controller General within 1 (one) year from the date of publication of the grant
of the patent.
COLUMN-I COLUMN-II
1. Copyright a. It is used for the protection of new inventions.
Answer:
1-c 2-a 3-b 4-e 5-d
Answer:
1- Rule 10 2-proof of right 3- document 4- bill of rights 5- contradiction
SUMMARY
The definition of intellectual property rights is any and all rights associated with intangible assets owned
by a person or company and protected against use without consent. Intangible assets refer to non-
physical property, including right of ownership in intellectual property. Trademarks protect logos,
sounds, words, colors, or symbols used by a company to distinguish its service or product. Trademark
examples include the Twitter logo, McDonald’s golden arches, and the font used by Dunkin. Copyright
law protects the rights of the original creator of original works of intellectual property. Unlike patents,
copyrights must be tangible. For instance, you can’t copyright an idea. But you can write down an
original speech, poem, or song and get a copyright.
Once someone creates an original work of authorship (OWA), the author automatically owns the
copyright. But, registering with the U.S. Copyright Office gives owners a head-start in the legal system.
Trade secrets are a company’s intellectual property that isn’t public, has economic value, and carries
KEY WORDS
Patent -for an invention is granted by government to the inventor, giving the inventor the right to stop
others, for a limited period.
Copyright – A set of rights automatically granted to the person who creates the original work of
authorship, such as literature, song, film, or software.
Parody – Imitation of the style of a certain writer, artist or genre with deliberate exaggeration for comic
effect.
Intellectual Property - Intellectual Property (IP) refers to intellectual creations such as inventions;
literary and artistic works; projects; and symbols, names and images used in trade.
Trademark- A trademark is a sign capable of distinguishing the goods or services of one enterprise from
those of other enterprises. Trademarks are protected by intellectual property rights.
Geographical indications- It is a sign used on products that have a specific geographical origin and
possess qualities or a reputation that are due to that origin.
Traditional knowledge- It is knowledge, know-how, skills and practices that are developed, sustained
and passed on from generation to generation within a community, often forming part of its cultural or
spiritual identity.
YOUTUBE VIDEOS
1. https://www.youtube.com/watch?v=AGpmo-Y8RUk
2. https://www.youtube.com/watch?v=Bj1_z56VEJ0
3. https://www.youtube.com/watch?v=TdePs0s6Ka8
4. https://www.youtube.com/watch?v=VMhcnaOBKvM
WIKIPEDIA
1. https://nyaaya.org/legal-explainer/patient-rights-in-india/
2. https://en.wikipedia.org/wiki/Intellectual_property
3. https://en.wikipedia.org/wiki/Patent
4. https://instr.iastate.libguides.com/patents/USsearch
OER
1. Ahuja V K 2017, Law Relating to Intellectual Property Rights, LexisNexis India Book Stores.
2. Rajagopalan Radhakrishnan, 2008, Intellectual Property Rights, Excel Books.
3. Asha Vijay Durafe and Dhanashree K. Toradmalle, 2020, Intellectual Property Rights, Wiley India
Pvt Ltd.
REFERENCE BOOKS
1. Vaidyanathan, Siva, (2004). The Anarchist in the Library: How the Clash Between Freedom and
Control Is Hacking the Real World and Crashing the System. New York: Basic Books.
2. Shiva, Vandana (2016). Biopiracy: The Plunder of Nature and Knowledge. North Atlantic Books.
3. World Intellectual Property Organization (WIPO) (2016). Understanding Industrial Property. World
Intellectual Property Organization.
4. Rupinder Tewari and Mamta Bhardwaj (2021). Intellectual Property, A Primer for Academia.
Panjab University, Chandigarh.
INTRODUCTION
It is widely acknowledged that the 21st century is driven by innovation and knowledge creation, which are
also the main drivers of economic development in any country, as the last two decades of the 20th century
confirmed. The generation, storage and meaningful use of data related to research results are
vital ingredients for innovation and knowledge creation in any country.
After World War II, the world saw unprecedented growth in research and academia, not just limited to the
sciences. This has posed unprecedented challenges across all facets of academia, from human resource
management in research and higher education institutions to deeper questions about ethics in the
academic world. In fact, the situation is more complicated because most academic activities have shifted
from amateur pursuits to professional activities. When the number of participants in research and related
activities was limited, a peer-led approach was practical and tangible for evaluating the research outcomes,
among other parameters, of any individual, and the ethics involved were fundamentally tied to the integrity
of a peer group.
Indexing improves database performance by minimizing the number of disk hits required to complete a
query. It is a data structure technique used to quickly locate and access data in a database. Several
database fields are used to build the index. The primary key or candidate key of the table is copied in the
first column, which is the search key. To speed up data retrieval, the values are also kept in sorted order. It
should be emphasized that it is not necessary to sort the data. The second column is a data reference or
pointer containing a set of pointers containing the address of the disk block where that particular key value
can be found.
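The two-column structure described above (a search-key column plus a block-pointer column) can be sketched in a few lines of Python. The keys and block addresses here are invented placeholders, and the linear scan stands in for the binary search a real DBMS would use.

```python
# An index as a sorted list of (search_key, block_pointer) pairs.
# The first column holds the table's primary (or candidate) key;
# the second holds the address of the disk block containing the record.
index = sorted([
    (101, "block_7"),
    (205, "block_2"),
    (309, "block_5"),
])

def lookup(index, key):
    """Return the block pointer for an exact search-key match, or None."""
    for k, block in index:   # a real DBMS would binary-search the sorted keys
        if k == key:
            return block
        if k > key:          # keys are kept in sorted order, so we can stop early
            return None
    return None

print(lookup(index, 205))  # → block_2
```

Keeping the keys sorted is what makes fast (logarithmic) lookups possible, which is why the text stresses that index values are stored in sorted order even when the underlying data is not.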
ATTRIBUTES OF INDEXING
Access Types: This refers to the type of access such as value-based search, range access, etc.
Access Time: It refers to the time needed to find a particular data element or set of elements.
Insertion Time: It refers to the time taken to find the appropriate space and insert new data.
Deletion Time: Time taken to find an item and delete it as well as update the index structure.
Space Overhead: It refers to the additional space required by the index.
In general, indexing methods store data using one of three file-organization mechanisms, where the
indices are based on the order in which the values are sorted. These are usually the faster and more
traditional access mechanisms. Such ordered or sequential file organizations can store index entries
in a dense or sparse format.
DENSE INDEX: -
For every search key value in the data file, there is an index record.
This record contains the search key and also a reference to the first data record with that search key value.
SPARSE INDEX: -
The index record appears only for a few items in the data file. Each index record points to a block of records.
To locate a record, we find the index record with the largest search key value less than or equal to the
search key value we are looking for.
We start at that record pointed to by the index record, and proceed along with the pointers in the file (that
is, sequentially) until we find the desired record.
Number of Accesses required=log₂(n)+1, (here n=number of blocks acquired by index file).
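The sparse-index lookup described above (find the index record with the largest key less than or equal to the target, then scan that block sequentially) can be sketched as follows. The block layout is an invented example; `bisect` stands in for the binary search over the index.

```python
import bisect
import math

# Sparse index: one index entry per block, keyed by the block's first record.
blocks = [[2, 5, 8], [11, 14, 17], [20, 23, 26], [29, 32, 35]]
index_keys = [b[0] for b in blocks]   # [2, 11, 20, 29], kept in sorted order

def sparse_lookup(target):
    """Find the index record with the largest key <= target,
    then scan the pointed-to block sequentially."""
    i = bisect.bisect_right(index_keys, target) - 1
    if i < 0:                         # target is smaller than every index key
        return None
    return target if target in blocks[i] else None

print(sparse_lookup(23))   # found by scanning the third block

# Worst-case accesses for the binary search over the index file,
# matching the formula above: log2(n) + 1 for n index blocks.
n = len(blocks)
print(math.floor(math.log2(n)) + 1)   # 3 accesses for n = 4
```

Because the index holds one entry per block rather than per record, a sparse index trades a short sequential scan inside the block for a much smaller index file.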
CLUSTERED INDEXING : -
When two or more related records are stored in the same file, this type of storage is known as cluster
indexing. Cluster indexing reduces the cost of searching, since multiple records relating to the same
thing are stored in one place; it also supports frequent joining of two or more tables (records).
The clustering index is defined on an ordered data file. The data file is ordered on a non-key field.
In some cases, the index is created on non-primary key columns which may not be unique for each
record. In such cases, in order to identify the records faster, we will group two or more columns
together to get the unique values and create an index out of them. This method is known as the
clustering index. Essentially, records with similar properties are grouped together, and indexes for
these groupings are formed.
Students studying each semester, for example, are grouped together. First-semester students,
second-semester students, third-semester students, and so on are categorized.
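The semester example above can be sketched as a simple clustering index in Python, grouping records on a non-key field. The student names are invented for illustration.

```python
from collections import defaultdict

# Student records ordered on a non-key field (semester); names are not unique keys.
students = [
    ("Asha", 1), ("Ravi", 1),
    ("Meena", 2), ("Kiran", 2),
    ("Sunil", 3),
]

# Clustering index: one entry per distinct semester value,
# pointing at the group of records that share that value.
cluster_index = defaultdict(list)
for name, semester in students:
    cluster_index[semester].append(name)

# All records for one semester are fetched from a single group,
# instead of scanning the whole file.
print(cluster_index[2])  # → ['Meena', 'Kiran']
```

Grouping rows that share the non-key value is exactly what cuts the search cost: one index probe retrieves the whole cluster.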
ADVANTAGES OF DATABASE
There are several advantages of using a database management system to store, which are as
follows:
Data organization: Databases provide tools for organizing data in a structured and logical way,
which can make it easier to search, sort, and retrieve data.
Data integrity: Databases enforce rules and constraints on the data to ensure that it is accurate and
consistent.
Data security: Databases provide tools for controlling access to the data and protecting it from
unauthorized access or tampering.
ONLINE DATABASES :
The Web of Science Core Collection consists of six online databases:
i. The Science Citation Index Expanded (SCIE) includes more than 8,500 notable journals spanning
over 150 disciplines, from 1900 to the present day.
ii. The Social Science Citation Index includes more than 3,000 journals in the fields of the social
sciences, also covering the period from 1900 to the present day.
iii. The Arts and Humanities Citation Index includes more than 1,700 arts and humanities journals
since 1975. In addition, 250 major scientific and social science journals are included.
REGIONAL DATABASES : Since 2008, the Web of Science has hosted a number of regional citation indices:
i Chinese Science Citation Database, produced in partnership with the Chinese Academy of
Sciences, was the first indexing database in a language other than English
ii SciELO Citation Index, established in 2013, covering Brazil, Spain, Portugal, the Caribbean and
South Africa, and an additional 12 countries of Latin America
iii Korea Citation Index in 2014, with updates from the National Research Foundation of Korea
iv Russian Science Citation Index in 2015
v Arabic Regional Citation Index in 2020
GOOGLE SCHOLAR
Google Scholar, developed by Google Inc. and launched in 2004, is the world's largest indexing and
citation database for the scientific literature, covering more scholarly journals and scientific articles
than comparable citation databases such as Scopus and Web of Science. It indexes peer-reviewed articles,
theses, books, abstracts and legal opinions from academic publishers and professional associations,
online preprint repositories, universities, subject portals, and other academic institutions. Although
Google does not disclose the size of the Google Scholar database, bibliometric researchers estimate that
it contains about 390 million documents, including articles, citations, and patents, making it the largest
academic research tool in the world. In one coverage study, Google Scholar found 88% of all the citations
examined, many of which were not found by other sources, along with nearly all of those found by the
remaining sources (89-94%). An earlier statistical estimate published in PLOS One, using the
mark-and-recapture method, put coverage at about 79-90% of all articles published in English, around
100 million documents. Google Scholar is also one of the oldest Google services. Its co-creator Anurag
Acharya says this comprehensive database of research papers, legal cases and other scholarly publications
was the fourth search service launched by Google. To celebrate the tool's 18th anniversary, Anurag
shared 18 things you can do in Google Scholar that you might have missed: -
Copy article citations in the style of your choice.
Dig deeper with related searches.
And don’t miss the related articles.
Read the papers you find.
Access Google Scholar tools web with the Scholar Button browser extension.
Learn more about authors through Scholar profiles.
Easily find topic experts.
Search for court opinions with the “Case law” button.
See how those court opinions have been cited.
Understand how a legal opinion depends on another.
CiteSeerX
CiteSeerX (formerly known as CiteSeer) is a public search engine and digital library containing
scientific and scholarly articles, primarily in the fields of computer and information science.
CiteSeer's goal is to improve the dissemination of and access to scientific and academic literature. As a
non-profit service that can be freely used by anyone, it is considered part of the open access movement,
which attempts to change academic and scientific publishing to allow greater access to the scientific
literature. CiteSeer makes Open Archives Initiative metadata available for all indexed documents and
links indexed documents to other metadata sources such as DBLP and the ACM Portal where possible. To
promote open data, CiteSeerX shares its data for non-commercial purposes under a Creative Commons
license. CiteSeerX is a growing digital library and search engine for scientific literature, focusing
primarily on computer and information science. It aims to improve the dissemination of scientific
literature and to improve the functionality, usability, availability, cost, completeness, efficiency, and
speed of access to scientific and academic knowledge. Rather than just creating another digital library,
CiteSeerX strives to provide resources such as algorithms, data, metadata, services, techniques, and
software that can be used to build other digital libraries. CiteSeerX has developed new methods and
algorithms for indexing PostScript and PDF articles found on the web and making them searchable.
CiteSeer was developed in 1997 at the NEC Research Institute in Princeton, New Jersey by Steve
Lawrence, Lee Giles and Kurt Bollacker. The service was transferred to the Pennsylvania State University
College of Information Science and Technology in 2003. Since then, the project has been led by Professor
Lee Giles.
After serving as a public search engine for nearly ten years, CiteSeer, originally designed as a simple
prototype, began to grow far beyond the capabilities of its original architecture. Since its inception, the
original CiteSeer had grown to index more than 750,000 documents and to answer more than 1.5 million
queries per day, pushing the limits of the system's capacity. Based on an analysis of the problems the
original system encountered and the needs of the research community, a new architecture and data model
were developed for the "next generation CiteSeer", or CiteSeerX. Its predecessor CiteSeer (also known as
ResearchIndex) was a digital scientific library aimed primarily at computer scientists, containing
full-text research articles that are free to download from the web.
Articles are indexed by the Autonomous Citation Indexing (ACI) system, which links records together
through the references cited in an article and the citations made to it. It provides links to related
articles and can determine the context of a citation. CiteSeer supports full boolean, phrase and
proximity searches, and you can choose to search the full text of documents or the citations contained
in them. CiteSeer became public in 1998 and had many features unavailable in academic search engines at
that time. These included: -
i Autonomous Citation Indexing automatically created a citation index that can be used for
literature search and evaluation.
ii Citation statistics and related documents were computed for all articles cited in the database, not
just the indexed articles.
iii Reference linking, allowing browsing of the database using citation links.
iv Citation context showed the context of citations to a given paper, allowing a researcher to quickly
and easily see what other researchers have to say about an article of interest.
v Related documents were shown using citation and word-based measures, and an active and
continuously updated bibliography is shown for each document.
IEEE Xplore
The IEEE Xplore Digital Library is a powerful resource for discovering and accessing scientific and
technical content published by the Institute of Electrical and Electronics Engineers (IEEE) and its
publishing partners. IEEE Xplore is the leading academic database for engineering and computer
science. It can be used to search not only journal articles, but also conference papers and books. It mainly
contains material published by the Institute of Electrical and Electronics Engineers (IEEE) and other
partner publishers.
It provides online access to more than 5 million documents from publications in computer science,
electrical engineering, electronics and related fields. Its documents and other materials include more
than 300 peer-reviewed journals, more than 1,900 global conferences, more than 11,000 technical
standards, nearly 5,000 e-books, and more than 500 online courses, with approximately 20,000 new
documents added each month. Anyone can search IEEE Xplore and find bibliographic records and
abstracts within its contents, while access to full-text documents requires a personal or institutional
subscription.
CONTENT TYPES IN IEEE XPLORE: The following content types are available on IEEE Xplore:
i. Books: IEEE Press and IEEE Computer Society Press, together with John Wiley and Sons, Inc.,
develop and publish books in the fields of electrical, computer, and software engineering under
the Wiley-IEEE Press and Wiley-IEEE Computer Society Press imprints.
ii. Conference Proceedings: IEEE publishes more than 1,700 state-of-the-art conference
proceedings annually, recognized worldwide by academia and industry as the most important
compendium of electrical engineering, computer science and related fields.
iii. Courses: In addition to IEEE-USA professional development courses, IEEE Xplore offers online
course titles on the IEEE Learning Network.
iv. Journals and Magazines: IEEE publishes leading journals, transactions, letters and magazines in
the fields of electrical engineering, computing, biotechnology, telecommunications, electricity and
energy, and dozens of other technologies. Articles from IBM, SMPTE, BIAI and TUP journals are
also available in IEEE Xplore.
Pub Med
PubMed Central (PMC) is a free digital repository that archives open access full-text scientific articles
published in biomedical and life sciences journals. PubMed Central is one of the largest research
databases developed by the National Center for Biotechnology Information (NCBI) and is more than a
document repository. PMC submissions are indexed and formatted with enhanced metadata, medical
ontologies and unique identifiers that enrich the XML-structured data of each article. PMC content can be
linked to other NCBI databases and accessed through Entrez search and retrieval systems, further
enhancing the public's ability to find, read, and advance their biomedical knowledge.
As of December 2018, the PMC archive contained more than 5.2 million articles contributed by
publishers or authors who have deposited their manuscripts in the archive in accordance with NIH's public
access policy. Some publishers delay releasing their articles in PubMed Central for a period after
publication, known as an embargo period, which varies from a few months to a few years depending on
the journal (six- or twelve-month embargoes are most common). PubMed Central is a prominent example of
"systematic third-party external distribution", which many publishers continue to prohibit.
PubMed Central® (PMC) is a free full-text archive of biomedical and life science journals from the
National Library of Medicine of the National Institutes of Health (NIH/NLM) of the United States. In
compliance with NLM's Biomedical Literature Collection and Preservation Act, PMC is part of the NLM
Collection, which also includes NLM's extensive print and approved electronic journals, and supports
contemporary biomedical and health research and practice, as well as future scholarship. PMC has been
available to the public online since 2000 and is developed and maintained by NLM's National Center for
Biotechnology Information (NCBI). PMCID (PubMed Central Identifier), also known as PMC reference
number, is the bibliographic identifier of the PubMed Central open access database, just as PMID is the
bibliographic identifier of the PubMed database. However, the two identifiers are different: a PMCID
consists of "PMC" followed by a sequence of seven digits.
Since its inception in 2000, PMC has grown from two publications, PNAS: Proceedings of the National
Academy of Sciences and Molecular Biology of the Cell, to an archive of thousands of journal titles. In
addition, PMC includes author manuscripts deposited through the NIH Manuscript Submission System, as
well as preprints collected through the NIH Preprint Pilot.
OPEN ACCESS
When reviewing DOAJ, it is important to know the open access model of the publication.
Basically, this is a model where authors, their institutions, funding bodies or other stakeholders
pay the publishing costs, instead of using a subscription model or a model where you have to buy
or even rent an individual article.
BENEFITS OF ICI
The Indian Citation Index (ICI) enables the research community to map information published in local and
national journals. Whether you are just starting out in research, or are an experienced academic
researcher, teacher, librarian or administrator, ICI provides objective content and tools to support
your research role.
The ADVANTAGES of ICI's presence in the scientific community are:
i. A comprehensive research and assessment tool for Indian literature
ii. To facilitate comprehensive scientometric and bibliometric studies of Indian literature
iii. Assist in measuring and analyzing individual, institutional, regional and national R and D
performance for strategic planning
iv. A real tool to generate complete and comprehensive analytical reports on R and D health in India
v. ICI can generate national R and D indicators like Indian Journals Citation Reports etc.
COLUMN-I COLUMN-II
1. Web of Science a. Scientific literature digital library and search engine
2. CiteSeerX b. A free digital repository
Answer:
SUMMARY
Summarized here are the widely used databases and those that have some unique features. We have
also included a good number of the freely accessible databases; interestingly, a good deal of useful
information can be extracted using these freely accessible resources.
Indexing is a very useful technique that helps in optimizing the search time in database queries. The
table of a database index consists of a search key and a pointer. There are four types of indexing:
Primary, Secondary, Clustering, and Multivalued Indexing. Primary indexing is divided into two types,
dense and sparse. Dense indexing is used when the index table contains records for every search key.
Sparse indexing is used when the index table does not contain a record for every search key. Multilevel
indexing uses the B+ Tree. The main purpose of indexing is to provide better performance for data
retrieval.
PubMed Central is distinct from PubMed. PubMed Central is a free digital archive of full-text articles,
accessible to anyone from anywhere via a web browser (with varying provisions for reuse). Conversely,
although PubMed is a searchable database of biomedical citations and abstracts, the full-text article resides
elsewhere.
WorldWideScience.org implements federated searching to provide its coverage of global science and
research results. Federated searching technology allows the information patron to search multiple data
sources with a single query in real time. It provides simultaneous access to "deep web" scientific
databases, which are typically not searchable by commercial search engines.
Theses indexed by EThOS have a minimum of a thesis title, author, awarding body and date. Optional
additional metadata may be included such as the thesis abstract, doctoral advisor, sponsor, cross links to
other databases and the full text of the thesis itself.
In the context of looking at DOAJ, it is important to know about the open access model of publishing.
Essentially, it is a model where the authors, their institutions, funding bodies or other stakeholders pay
the publication costs, rather than operating a subscription model, or one where you are required to buy,
or even rent, an individual article.
ICI provides a multidisciplinary research platform covering about 1,000 scholarly journals from India.
The ICI database also produces other useful byproducts like the Indian Science Citation Index (ISCI),
the Indian Social Science and Humanities Citation Index (ISSHCI), and the Indian Journals Citation
Reports (IJCR). The Indian Citation Index is an online bibliographic database containing abstracts and
citations from academic journals. Currently ICI covers more than 1,100 journals from India spanning the
scientific, technical, medical, and social sciences, including the arts and humanities.
KEY WORDS
Database- An organized collection of structured information, or data, typically stored electronically in a
computer system.
Data indexing- A database index is a data structure that improves the speed of data retrieval operations on
a database table at the cost of additional writes and storage.
Web of Science- A selective citation index of scientific and scholarly publishing covering journals,
proceedings, books, and data compilations.
ScienceDirect- It is the world's leading source for scientific, technical, and medical research. Explore
journals, books and articles.
PubMed- PubMed is a free search engine accessing primarily the MEDLINE database of references and
abstracts on life sciences and biomedical topics.
Indian Citation Index- It provides a powerful search engine to perform search and evaluation for
researchers, policy makers, decision makers etc.
Preprint Site- Preprint versions of articles may or may not be peer reviewed or may be the author's final,
peer-reviewed manuscript as accepted for publication.
REFERENCES
1. Gusenbauer M (2019). Google Scholar to overshadow them all? Comparing the sizes of 12 academic
search engines and bibliographic databases. Scientometrics. 118 (1): 177–214.
2. Giri R, Das AK (2011). Indian Citation Index: a new web platform for measuring performance of
Indian research periodicals. Library Hi Tech News. 28 (3): 33–35.
3. Gamble A (2018). Biological Abstracts (Clarivate Analytics). The Charleston Advisor. 20(1):19-25.
4. Kirkwood HP, Kirkwood MC (2011). Econlit and Google Scholar Go Head-to-Head. 35(2): 38–41.
YOUTUBE VIDEO
1. https://www.youtube.com/watch?v=iXGbH2hRsUw
2. https://www.youtube.com/watch?v=_WuYieVbKBU
3. https://www.youtube.com/watch?v=cD1Xml9E1_E
WEB LINKS
1. https://paperpile.com/g/google-scholar-guide/
2. https://libguides.ntu.edu.sg/c.php?g=929556&p=6716121
3. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4800951/
4. https://www.indiancitationindex.com/ici.aspx?target=benifits#
5. https://www.nature.com/articles/nature.2014.16643
REFERENCE BOOK
1. World Intellectual Property Organization (WIPO) (2016). Understanding Industrial Property. World
Intellectual Property Organization.
2. Rupinder Tewari and Mamta Bhardwaj (2021). Intellectual Property, A Primer for Academia. Panjab
University, Chandigarh.
3. Academic integrity and research quality (2021) University Grants Commission, New Delhi.
SUGGESTED READINGS
1. Adie, E., and W. Roe (2013). Altmetric: Enriching scholarly content with article-level discussion and
metrics. Learned Publishing 26 (1): 11–17.
2. Baykoucheva, Svetla, (2015). Managing Scientific Information and Research Data. by Science Direct.
3. Chaddah, P. and S.C. Lakhotia (2018). A Policy Statement on Dissemination and Evaluation of
Research Output in India. Proc. INSA 84 No. 2 June: 319–329.
4. Chakraborty, S., J. Gowrishankar, A. Joshi, P. Kannan, R. K. Kohli, S. C. Lakhotia, G. Misra, C. M.
Nautiyal, K. Ramasubramanian, N. Sathyamurthy and A. K. Singhvi (2020). In Summary for the
Month. NASI, April 2020.
"I believe in innovation and that the way you get innovation is you fund research and you learn the basic
facts."- Bill Gates
RESEARCH METRICS
Research metrics are bibliometric tools used in the publishing industry as indicators of research
performance at both the journal and author levels. The two main components of bibliometric
research are the number of publications and the number of citations to publications. Since its
introduction, the citation-based Journal Impact Factor (JIF) has been one of the most important
parameters for evaluating journals. For a long time, this was the only tool available to evaluate
the performance of scientific journals.
There are now a growing number of different research metrics available at the journal and author
level, from the traditional impact factor to the Eigenfactor, and from the h-index to altmetrics and more. Based on the rich
resources of the SCI (Science Citation Index) database, the Institute for Scientific Information
(ISI) launched a tool to classify academic journals based on their citations and impact in the scientific
community. Beginning in 1975, SCI began publishing the JIF and the Immediacy Index as part of the Journal
Citation Reports (JCR), providing an instant overview of citation data. From the beginning, the SCI
database contained the institutional information of all authors of articles published in the journal.
A journal's impact factor for 2008 would be calculated by taking the number of citations in 2008 to
articles that were published in 2007 and 2006 and dividing that number by the total number of articles
published in that same journal in 2007 and 2006. Thomson calculated the 2008 impact factor for the
journal Academy of Management Review as:
IF(2008) = (citations in 2008 to articles published in 2006 and 2007) ÷ (articles published in 2006 and 2007)
Thus, the Impact Factor of 6.125 for the journal, Academy of Management Review for 2008 indicates that
on average, the articles published in this journal in the past two years have been cited about 6.125 times.
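The two-year calculation above can be written as a short function. The counts below are invented for illustration and chosen only so that the ratio reproduces the 6.125 figure quoted in the text; they are not the actual Academy of Management Review numbers.

```python
def impact_factor(citations, citable_items):
    """JIF for year Y: citations received in Y to items published in
    Y-1 and Y-2, divided by the citable items published in Y-1 and Y-2."""
    return citations / citable_items

# e.g. 490 citations in 2008 to 80 articles from 2006-2007:
print(impact_factor(490, 80))  # 6.125
```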
LIMITATIONS OF JIF
i The IF is an arithmetic mean and it doesn’t adjust for the distribution of citations.
ii The Impact Factor only considers the number of citations, not the nature or quality.
iii Impact Factors cannot be compared across different subject areas.
iv The JCR doesn’t distinguish between citations made to articles, reviews, or editorials.
v IF can show significant variation year-on-year, especially in smaller journals.
04-04-02: CITESCORE
"CiteScore (CS) of an academic journal is a metric that captures the annual average number of
citations to recent articles published in that journal." It is calculated by Elsevier, based on the citations
recorded in the Scopus database. Absolute ranks and percentiles are also presented within each journal's
subject field. This journal evaluation metric was launched in December 2016 as an alternative to
the Journal Citation Reports (JCR) Impact Factor (IF) calculated by Clarivate. Unlike the two- or
five-year windows of the JCR IF, CiteScore is based on citations collected for articles published in the
previous four years. CiteScore's impartiality was questioned upon launch by bibliometrics experts such as
Carl Bergstrom, whose analysis found that it favored Elsevier titles over those of some competitors,
including Nature.
CiteScore is another metric to measure the influence of a journal in Scopus. The calculation of the
current year's CiteScore is based on the number of citations received by the journal during the last 4
years (including the reporting year) divided by the number of documents published in the journal during
these four years. CiteScore 2022, for example, is calculated as:
CiteScore 2022 = (citations received in 2019-2022 to documents published in 2019-2022) ÷ (documents published in 2019-2022)
Note: Document types counted include articles, reviews, conference papers, data papers and book chapters.
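The four-year CiteScore window described above can be sketched as follows; the per-year citation and document counts are invented for illustration.

```python
# Citations received in each year of the window to documents published in
# the window, and peer-reviewed documents published in each year (made up).
citations = {2019: 120, 2020: 150, 2021: 180, 2022: 90}
documents = {2019: 40, 2020: 45, 2021: 50, 2022: 45}

window = range(2019, 2023)  # 2019, 2020, 2021, 2022
citescore_2022 = sum(citations[y] for y in window) / sum(documents[y] for y in window)
print(round(citescore_2022, 1))  # 3.0
```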
CiteScore 2022 values were released in June 2023 with a new methodology: the new CiteScore counts only
peer-reviewed publication types and adopts a 4-year citation window in the numerator (instead of 1
year).
CiteScore metrics are a family of 8 indicators: CiteScore, CiteScore Tracker, CiteScore
Percentile, CiteScore Quartiles, CiteScore Rank, Citation Count, Document Count and Percentage Cited.
FREQUENTLY-USED METRICS
i h-index: measures the cumulative impact of a researcher's output by looking at the number of
citations a work has received.
ii i10-index: created by Google Scholar, it measures the number of publications with at least 10
citations.
iii g-index: aims to improve on the h-index by giving more weight to highly-cited articles.
iv e-index: The aim of the e-index is to differentiate between scientists with similar h-indices but
different citation patterns.
v Altmetrics: Altmetrics stands for "alternative metrics."
vi Unique ID: Digital Object Identifiers (DOIs) are used to uniquely identify digital research works,
and provide a persistent link to the location of the work on the internet.
Citation-based metrics for journals can easily be extended to authors, capturing both their productivity
and their impact on the scientific community. We have already emphasized the importance of citations;
they can likewise be used to assess the contributions of authors at the individual or collective level.
i. h-index
The h-index provides better-quality information than the total number of scientific publications or the
total number of citations received. Knowing the number of publications alone does not indicate how well
those articles have been received by other researchers; similarly, the total number of citations alone
does not show how the citations are distributed across publications.
The Web of Science uses the H-Index to quantify research output by measuring author productivity and
impact.
H-Index = number of papers (h) with a citation number ≥ h.
Example: a scientist with an H-Index of 37 has 37 papers cited at least 37 times.
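The definition above translates directly into code: sort the citation counts in descending order and find the largest rank r whose paper has at least r citations. The citation list is made up for the example.

```python
def h_index(citations):
    """h = number of papers (h) with a citation count >= h."""
    h = 0
    for rank, c in enumerate(sorted(citations, reverse=True), start=1):
        if c >= rank:
            h = rank       # the top `rank` papers all have >= rank citations
        else:
            break
    return h

print(h_index([10, 8, 5, 4, 3]))  # 4: the top 4 papers each have >= 4 citations
```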
ii. i10-index
Created by Google Scholar and used in Google's My Citations feature.
i10-Index = the number of publications with at least 10 citations.
This very simple measure is only used by Google Scholar, and is another way to help gauge the
productivity of a scholar.
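Counting the publications with at least 10 citations is a one-liner; the citation list below is invented for the example.

```python
def i10_index(citations):
    """Number of publications with at least 10 citations each."""
    return sum(1 for c in citations if c >= 10)

print(i10_index([25, 12, 10, 9, 3]))  # 3
```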
ADVANTAGES OF I10-INDEX
i. Very simple and straightforward to calculate
ii. My Citations in Google Scholar is free and easy to use
DISADVANTAGES OF I10-INDEX: it is used only by Google Scholar.
iii. g-index
The g-index was proposed by Leo Egghe in his 2006 paper "Theory and Practise of the g-index" as an
improvement on the h-index. It is defined this way: given a set of articles ranked in decreasing order
of the number of citations that they received, the g-index is the (unique) largest number such that the
top g articles received (together) at least g² citations.
The index is calculated based on the distribution of citations received for the publications of a given
author. Suppose that research papers are ranked in descending order by the number of citations they have
received, then the g-index is the unique largest number such that the best g papers together have received
at least g² citations. It can therefore be defined as the largest number g of highly cited articles with an
average of at least g citations each. The g-index thus lets the excess citations of highly cited articles
compensate for articles with fewer citations.
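Egghe's definition above can be computed by accumulating citations over the descending ranking and keeping the largest rank g whose cumulative total reaches g². A minimal sketch with invented counts:

```python
def g_index(citations):
    """Largest g such that the top g papers together received >= g^2 citations."""
    total, g = 0, 0
    for rank, c in enumerate(sorted(citations, reverse=True), start=1):
        total += c                 # cumulative citations of the top `rank` papers
        if total >= rank * rank:
            g = rank
    return g

print(g_index([10, 8, 5, 4, 3]))  # 5: the five papers together have 30 >= 25 citations
```

For any citation list the g-index is never smaller than the h-index, since the top h papers together have at least h² citations.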
ADVANTAGES OF ALTMETRIC
i Early impact evidence: In practice, the most important advantage of many alternative indicators
is that they give early impact evidence.
ii Wider impact evidence: All Altmetrics and webometrics reflect impact that is at least partly
different from citation impact.
iii Publishers: discover how published research is being used and shared around the world.
iv Institutions: Understand and interpret the attention surrounding your institution's research and
identify areas of strength or those that need improvement for your long-term objectives.
LIMITATIONS OF ALTMETRICS
There are a number of limitations to the use of altmetrics:
i Altmetrics don’t tell the whole story.
ii Like any metric, there’s a potential for gaming of altmetrics.
iii Altmetrics are relatively new, and more research into their use is needed.
iv Data are not normalized.
v Known tracking issues.
2. Review Articles are cited more than other types of articles. (True/False)
Column-I Column-II
1. i10-index a. the highest number of publications of a scientist
2. h-index b. the number of publications with at least 10 citations
3. g-index c. the square root of the excess citations over those used for calculating
the h-index
4. e-index d. the unique largest number such that the top g articles received
together at least g² citations.
Answer:
SUMMARY
For journal articles to be impactful, they have to be discoverable, and online discovery rests almost entirely
on indexing. Journals included in an index are considered to be of higher quality than journals that are not
as these have to go through a vetting process to be included or indexed in reputed bibliographic databases.
Based on the citations, there are several research evaluation metrics for both journals and authors.
In order to address some of the drawbacks of JIF and related metrics, efforts have been made to develop
new-generation metrics, both using WoS and Scopus databases. These metrics involve complex algorithm-
based calculations for assessing the quality of journals using the vast mesh of citations. Eigenfactor and
Article Influence are based on WoS data, whereas SNIP and SJR indicators are based on Scopus data. The
Eigenfactor Score calculation is based on the number of times articles from the journal published in the
past five years have been cited in the JCR data year, but it also considers which journals have contributed
these citations so that highly cited journals will influence the network more than lesser cited journals, with
self-citations not being considered. Related to the Eigenfactor score, the Article Influence (AI) score of a
journal is a measure of the relative importance of each of its articles over the first five years after
publication.
Based on the Scopus database, SNIP attempts to measure contextual citation impact by weighting citations
based on the total number of citations in a subject field and correcting for subject-specific characteristics,
simplifying cross-discipline comparisons between journals. Similarly, SCImago Journal Rank Indicator
(SJR) measures the scientific prestige of the average article in a journal. Both SNIP and SJR use three
years window for taking into account the published papers in the Scopus database.
The citation-based metrics for journals can easily be extended to authors. The h-index is the most widely
known author-level index and is very widely used as a proxy for an author's academic impact.
KEYWORDS
Research metrics- Bibliometric tools used in the publishing industry as indicators of research
performance at both the journal and author levels.
Impact factor- The impact factor (IF) or journal impact factor (JIF) of an academic journal is a
scientometric index, calculated by Clarivate, that reflects the yearly average number of citations to
recent articles published in that journal.
Article influence- The Article Influence Score measures the relative importance of a journal on a
per-article basis.
Unique ID- Digital Object Identifiers (DOIs) are used to uniquely identify digital research works.
i10-index- The number of publications with at least 10 citations.
h-index- The h-index reflects both the number of publications and the number of citations per publication.
g-index- It is the (unique) largest number such that the top g articles received (together) at least g²
citations.
e-index- The square root of the excess citations beyond those used in calculating the h-index; it
differentiates between scientists with similar h-indices but different citation patterns.
m-index- It is another variant of the h-index that displays h-index per year since first publication.
YOUTUBE VIDEO
1. https://www.youtube.com/watch?v=AjsHxxiDrQI
2. https://www.youtube.com/watch?v=lgVuyzke6OY
3. https://www.youtube.com/watch?v=IN587De8Pis
4. https://www.youtube.com/watch?v=yS7oWq2loA4
REFERENCES
1. Giri, Rabishankar; Das, Anup Kumar (2011). Indian Citation Index: a new web platform for
measuring performance of Indian research periodicals. Library Hi Tech News. 28 (3): 33–35.
2. Rupinder Tewari and Mamta Bhardwaj (2021). Intellectual Property, A Primer for Academia. Panjab
University, Chandigarh.
3. Academic integrity and research quality (2021) University Grants Commission, New Delhi.
4. Egghe, Leo. 2006. Theory and Practise of the g-index. Scientometrics. 69 (1): 131–152.
5. Hirsch, J.E. 2005. An Index to Quantify an Individual's Scientific Research Output. PNAS 102 (46): 16569–16572.
SUGGESTED READINGS
1. Das, A.K. 2015. Research Evaluation Metrics. UNESCO.
2. Reitz, Joan M. 2013. Online Dictionary for Library and Information Science: http://www.abc-clio.
com/ODLIS/searchODLIS.aspx.
3. Roemer, Robin Chin and Rachel Borchardt. 2015. Meaningful Metrics: A 21st Century Librarian’s
Guide to Bibliometric and Research Impact. ala.org
4. Rousseau, R., Leo Egghe, and Raf Guns. 2018. Becoming Metric-wise: A bibliometric guide for
researchers. Science Direct.
Dear Student,
Now that you have gone through this book, it is time for you to do some thinking for us.
Please answer the following questions sincerely. Your responses will help us to
analyse our performance and make future editions of this book more useful.
Your responses will be completely confidential and will in no way affect your
examination results. Your suggestions will receive prompt attention from us.
Style
01. Do you feel that this book enables you to learn the subject independently,
without any help from others?
03. Do you feel that the following sections or features, if included, will enhance
self-learning and reduce the need for help from others?
Yes No Not Sure
Index
Glossary
List of “Important Terms Introduced”
Two Colour Printing
Content
04. How will you rate your understanding of the contents of this Book?
05. How will you rate the language used in this Book?
Very Simple Simple Average Complicated Extremely Complicated
06. Do the syllabus and the content of this book complement each other?
Yes No Not Sure
07. Which topics did you find easiest to understand in this book?
Sr.No. Topic Name Page No.
09. List the difficult topics you encountered in this book, and suggest
how they can be improved.
Use the following codes:
Code 1 for “Simplify Text”
Code 2 for “Add Illustrative Figures”
Code 3 for “Provide Audio-Vision (Audio Cassettes with companion Book)”
Code 4 for “Special emphasis on this topic in counseling”
10. List the errors which you might have encountered in this book.
1. 2. 3. 4. 5.