Edited Assessment and Evaluation PPT For 2014
Department of Psychology
Assessment and Evaluation, PGDT 423
Outline
CHAPTER ONE: Definitions of basic terms
Test
Measurement
Assessment
Evaluation
CHAPTER ONE
INTRODUCTION: DEFINITION OF BASIC TERMS
Test: a measuring tool or instrument in education.
C. Diagnostic Evaluation
This is applied during instruction to find out the underlying
causes of students' persistent learning difficulties.
D. Summative evaluation
This is the type of evaluation carried out at the end of the course
of instruction to determine the extent to which the instructional
objectives have been achieved.
It is called a summarizing evaluation because it looks at the
entire course of instruction or programme and can pass judgment
on both the teacher and students, the curriculum and the entire
system.
It is used for certification.
CONT….
Formative evaluation Summative evaluation
80% to 84.99% = A-
75% to 79.99% = B+
70% to 74.99% = B
65% to 69.99% = B-
60% to 64.99% = C+
50% to 59.99% = C
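The grade bands above can be applied mechanically. The following is a minimal sketch, assuming each band "a% to b.99%" means the half-open interval from a up to (but not including) the next band; `GRADE_BANDS` and `letter_grade` are illustrative names. Only the bands listed above are included, so scores outside them return None.

```python
# Grade bands exactly as listed above. Bands above 84.99% or below 50%
# are not shown in the source, so they are deliberately left out.
GRADE_BANDS = [
    (80.0, 85.0, "A-"),
    (75.0, 80.0, "B+"),
    (70.0, 75.0, "B"),
    (65.0, 70.0, "B-"),
    (60.0, 65.0, "C+"),
    (50.0, 60.0, "C"),
]


def letter_grade(percent):
    """Return the letter grade for a percentage, or None if the score
    falls outside the bands listed above."""
    for low, high, grade in GRADE_BANDS:
        if low <= percent < high:
            return grade
    return None


print(letter_grade(72.5))   # B
print(letter_grade(92.0))   # None (band not listed above)
```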
It must be decided whether the test will be used to measure the entry
performance or the previous knowledge acquired by the student on the
subject.
Checking the quality of objectives
Do the objectives appropriately reflect all the intended outcomes?
Are they observable and measurable, and are the outcomes clearly defined?
Are they attainable by the intended learners in the time available?
Do they reflect the course and curriculum aims?
Selecting appropriate item types
The mode of item presentation must be considered: oral, paper and
pencil, …
true-false
completion
multiple choice
matching and
essay
Studies showed that arranging items from easy to hard will
yield higher scores than arranging from hard to easy.
Table of specification, TOS (blueprint)
Contents        Percent
Air pressure    24%
Wind            16%
Temperature     28%
Rainfall        20%
Clouds          12%
Total           100%
Percent per process objective: 28%  32%  16%  16%  4%  4%
Weighting of the Content and Process Objectives
The proportion of test items on each topic depends on the
emphasis placed on it during teaching and the amount of time
spent.
Also, the proportion of items on each process objective
depends on how important you consider that process skill to be
for the level of students to be tested.
• However, it is important that you make the test a balanced one
in terms of the content and the process objectives you have been
trying to achieve through your series of lessons.
Cont….
Percentages are usually assigned to the topics of the content and the
process objectives such that each dimension will add up to 100%.
After this, you should decide on the type of test you want to use and
this will depend on the process objective to be measured, the
content and your own skill in constructing the different types of
tests.
Cont….
Determination of the Total Number of Items
At this stage, you consider the time available for the test, types of
test items to be used (essay or objective) and other factors like the
age, ability level of the students and the type of process objectives
to be measured.
When this decision is made, you then proceed to determine the total
number of items for each topic and each process objective as follows:
To obtain the number of items per topic, you multiply the percentage of
each by the total number of items to be constructed and divide by 100.
This you will record in the column in front of each topic in the extreme
right corner of the blueprint.
In the table below, 24% was assigned to Air Pressure. The total number
of items is 25 hence 6 items for the topic (24% of 25 items = 6 items).
To obtain the number of items per process objective, we also
multiply the percentage of each by the total number of items for
test and divide by 100.
These will be recorded in the bottom row of the blueprint under
each process objective. In the table below:
I. The percentage assigned to comprehension is 32% of the
total number of items which is 25. Hence, there will be 8
items for this objective (32% of 25 items).
II. To decide the number of items in each cell of the blue print,
you simply multiply the total number of items in a topic by
the percentage assigned to the process objective in each row
and divide by 100.
This procedure is repeated for all the cells in the blue print.
For example, to obtain the number of items on wind under
knowledge, you multiply 28 by 4 and divide by 100, i.e. 1.12, which rounds to 1.
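The allocation procedure described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not part of the original material: only "knowledge" and "comprehension" are named in the text, so the remaining objective labels are placeholders, and `share` is a hypothetical helper that rounds to the nearest whole item.

```python
TOTAL_ITEMS = 25

# Topic weights (percent) from the blueprint.
topic_weights = {
    "Air pressure": 24,
    "Wind": 16,
    "Temperature": 28,
    "Rainfall": 20,
    "Clouds": 12,
}

# Process-objective weights (percent). Only "knowledge" and
# "comprehension" are named in the text; the other labels are placeholders.
objective_weights = {
    "knowledge": 28,
    "comprehension": 32,
    "objective_3": 16,
    "objective_4": 16,
    "objective_5": 4,
    "objective_6": 4,
}


def share(percent, total):
    """Return percent of total, rounded to the nearest whole item."""
    return round(percent * total / 100)


# Items per topic and per process objective (percentage of 25 items).
items_per_topic = {t: share(w, TOTAL_ITEMS) for t, w in topic_weights.items()}
items_per_objective = {o: share(w, TOTAL_ITEMS) for o, w in objective_weights.items()}

# First-pass cell counts: items in the topic times the objective's percentage.
cells = {
    t: {o: share(w, n) for o, w in objective_weights.items()}
    for t, n in items_per_topic.items()
}

print(items_per_topic)
print(items_per_objective)
print(cells["Wind"]["knowledge"])   # 28% of 4 = 1.12, which rounds to 1
```

Rounded cell counts do not always add up to the row and column totals, so a few cells usually have to be adjusted by hand afterwards, as the Temperature and Rainfall rows of the blueprint show.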
Instructional Objectives
Contents       Percent   (1)   (2)   (3)   (4)   (5)   (6)   Total
Air pressure   24%       2     2     1     1     -     -     6
Wind           16%       1     1     1     1     -     -     4
Temperature    28%       2     2     1     1     -     1     7
Rainfall       20%       1     2     1     -     1     -     5
Clouds         12%       1     1     -     1     -     -     3
Total          100%      7     8     4     4     1     1     25
Percent                  28%   32%   16%   16%   4%    4%    100%
Columns (1) to (6) are the process objectives; (1) is knowledge and (2) is comprehension.
There are also other ways of developing a test blue print.
For example, the table of specification that we have seen earlier can
be prepared in the following way.
Item Types
Build in a good scoring guide at the point of writing the test items.
Exclude extraneous or irrelevant information.
1. Objective tests
2. Subjective tests
1. Objective test items
Objective tests are highly structured and require the test taker
to select the correct answer from several alternatives or to
supply a word or short phrase to answer a question.
They are called objective because they have a single right or best
answer that can be determined in advance.
Types of Objective test
Completion item: The Ethiopian forces defeated the Italian invaders at Adwa in the
year _____
Advantages of short answer/completion items
Short-answer test items are among the easiest to construct.
They reduce the possibility that students will obtain the correct
answer by guessing.
A drawback is difficulty in scoring, especially where the item is not
phrased clearly enough to require one definitely correct answer, or
where the student's spelling ability affects the response.
Suggestions to make good short-answer type
• Word the item so that the required answer is both brief and
specific.
Examples
o Poor: An animal that eats the flesh of other animals is
______________.
o Better: An animal that eats the flesh of other animals is called
_______________ (carnivorous).
• Do not take statements directly from textbooks to use as a basis
for short-answer items.
Examples
o Poor: Chlorine is _____________.
o Better: Chlorine belongs to a group of elements which
combine with metals to form salts. It is therefore called a
________ (halogen).
A direct question is generally more desirable than an incomplete
statement.
Examples
– Poor: Yuri Gagarin made his orbital flight around the earth in
_____ (1961). (The answer could equally be "a spacecraft".)
– Better: When did Yuri Gagarin make his orbital flight around the
earth? (1961)
If the answer is to be expressed in numerical units, indicate the type of
answer wanted.
Examples
– Poor: Sound travels _________________ in a second.
– Better: Sound travels ________________ meters in a second.
When completion items are used, do not use too many blanks.
The Alternative Response Test Item
They can be scored quickly, reliably, and objectively by anybody using an answer
key.
If carefully constructed, true/false test items also have the advantage of measuring higher
mental processes of understanding, application and interpretation.
Disadvantages of true/false items
When they are used exclusively, they tend to promote
memorization of factual information: names, dates, definitions.
Can often lead a teacher to favor testing of trivial/little knowledge.
They encourage students to guess.
Can often lead a teacher to write ambiguous statements due to the
difficulty of writing statements which are clearly true or false.
They do not discriminate between students of varying ability as well as
other test items do.
Can often include more irrelevant clues than do other item types.
Suggestions to construct good quality true/false test items
Don’t use all inclusive words such as “all, always, never, none,
no” etc within the framework of a true-false test item.
Example
– Poor: The set of integers includes the set of all natural
numbers. T/F
– Better: The set of integers includes the set of natural
numbers. T/F
Don’t use indefinite terms such as “greatly, usually,
frequently, and sometimes” etc.
Examples
– Poor: Validity is usually of more concern to the tester than
reliability in testing. T/F
– Better: Validity is adjudged more important than reliability
in testing. T/F
Suggestions to construct good quality true/false test items
1. A special date that you spent with your friends    A) Echoic memory
2. Watching a sports program on TV                     B) Iconic memory
3. Radio information                                   C) Semantic memory
                                                       D) Episodic memory
Easy to score.
Objective to score.
Verbal association between the stem and the correct answer should
be avoided.
An item should contain only one correct or clearly best answer.
Essay items
Essay tests consist of questions (items) designed to elicit
answers from the learners with freedom of response.
You should use essay questions in the measurement of complex
achievement.
Essay questions should also be used to measure those learning
outcomes that cannot be measured by objective test items.
Students have the freedom to express or state the answers in their
own words.
Classification of Essay Items
There are two types of essay items. These are:
1. Extended response
In this type, questions are asked in such a way that the student is not limited
in the extent to which he or she may discuss the issues raised or the question
asked.
Example
Describe the sampling techniques used in research studies.
Explain the various ways of preventing car accidents in Ethiopia.
2. Restricted response
In this type, the questions are so structured that the students are limited: the scope of the
response is defined and restricted.
The answers given are to some extent controlled.
Example
Give three advantages and two disadvantages of essay tests.
When the number of observations is odd, the median is simply equal to the
middle value.
For example, the following test scores, 7, 7, 7, 20, 23, 23, 24, 25,
26 have a mode of 7.
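The two statements above can be checked on the same nine scores with Python's standard `statistics` module. This is a small sketch; the variable names are illustrative.

```python
import statistics

scores = [7, 7, 7, 20, 23, 23, 24, 25, 26]

# Nine observations (an odd number), so the median is the 5th sorted value.
ordered = sorted(scores)
middle_value = ordered[len(ordered) // 2]

print(middle_value)                 # 23
print(statistics.median(scores))    # 23
print(statistics.mode(scores))      # 7
```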
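[Figure: frequency distribution of scores. x-axis: Score on Exam 1 (75 to 95); y-axis: Frequency]
Bimodal Distributions
[Figure: bimodal frequency distribution. x-axis: Score on Exam 1 (75 to 95); y-axis: Frequency]
Multimodal Distributions
[Figure: multimodal frequency distribution. x-axis: Score on Exam 1 (75 to 95); y-axis: Frequency]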
Relations Between the Measures of Central Tendency
– The less similar the scores are to each other, the higher the
measure of dispersion will be.
Which of the distributions of scores has the larger dispersion?
[Figure: distributions of scores compared. x-axis: 1 to 10; y-axis: 0 to 125]
The three most commonly used measures of variability are the range,
variance, and standard deviation.
The Range
It is the simplest and crudest measure of variability calculated by
subtracting the lowest score from the highest score.
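As a quick illustration, the range can be computed in one line. The scores are reused from the mode example earlier in the chapter; the variable names are illustrative.

```python
scores = [7, 7, 7, 20, 23, 23, 24, 25, 26]

# Range = highest score minus lowest score.
score_range = max(scores) - min(scores)

print(score_range)   # 26 - 7 = 19
```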
The variance is equal to the average squared deviation from the
mean.
To compute it, subtract the mean from each score and
square the result. Then average the squared deviations over all
scores. The result is the variance.
High variance means that most scores are far away from the
mean.
Low variance indicates that most scores cluster tightly about
the mean.
Computing the Variance (N = 5)
         X     Mean   X - Mean   (X - Mean)²
         5     15     -10        100
         10    15     -5         25
         15    15     0          0
         20    15     5          25
         25    15     10         100
Total:   75           0          250
Mean: 75 / 5 = 15     Variance: 250 / 5 = 50
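The table can be reproduced column by column in Python. This is a sketch that divides by N, as the definition above does (the population variance); `statistics.pvariance` from the standard library uses the same divisor and is shown only as a cross-check.

```python
import statistics

scores = [5, 10, 15, 20, 25]
n = len(scores)
mean = sum(scores) / n                       # 75 / 5

deviations = [x - mean for x in scores]      # X - mean column
squared = [d ** 2 for d in deviations]       # (X - mean) squared column

variance = sum(squared) / n                  # 250 / 5

print(mean, sum(deviations), sum(squared), variance)
# statistics.pvariance also divides by N (population variance).
print(statistics.pvariance(scores))
```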
The Standard Deviation
The most useful measure of variability, or spread of scores is the
standard deviation.
It is essentially an average of the degree to which a set of scores
deviates from the mean.
If the Standard Deviation is large, it means the numbers are
spread out from their mean. Common for heterogeneous group/s.
If the Standard Deviation is small, it means the numbers are
close to their mean. Common for homogeneous group/s.
The procedure for calculating a Standard Deviation involves the
following steps:
1. Compute the mean.
2. Subtract the mean from each individual's score.
3. Square each of these deviations.
4. Find the sum of the squared deviations, ∑(X - mean)².
5. Divide the sum obtained in step 4 by N, the number of students, to
get the variance.
6. Find the square root of the result of step 5. This number is the
standard deviation (SD) of the scores.
The formula for the standard deviation (SD) is:
\[ SD = \sqrt{\frac{\sum (X - \bar{X})^2}{N}} \]
Standard deviation = √variance
Variance = (standard deviation)²
Computing the Standard Deviation (N = 5)
         X     Mean   X - Mean   (X - Mean)²
         5     15     -10        100
         10    15     -5         25
         15    15     0          0
         20    15     5          25
         25    15     10         100
Total:   75           0          250
Mean: 75 / 5 = 15     Variance: 250 / 5 = 50
SD = √50 ≈ 7.07
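The six-step procedure for the standard deviation can be followed literally in code. The sketch below uses the five scores from the table and divides by N, as the procedure states; the variable names are illustrative.

```python
import math
import statistics

scores = [5, 10, 15, 20, 25]

# Step 1: compute the mean.
mean = sum(scores) / len(scores)
# Step 2: subtract the mean from each score.
deviations = [x - mean for x in scores]
# Step 3: square each deviation.
squared = [d ** 2 for d in deviations]
# Step 4: sum the squared deviations.
sum_of_squares = sum(squared)
# Step 5: divide by N to get the variance.
variance = sum_of_squares / len(scores)
# Step 6: take the square root to get the standard deviation.
sd = math.sqrt(variance)

print(round(sd, 2))   # 7.07
```

The standard library function `statistics.pstdev(scores)` gives the same value, because it also divides by N rather than N - 1.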
Measures of Relationship
If we have two sets of scores from the same group of people, it is often
desirable to know the degree to which the scores are related.
For example, we may be interested in the relationship between the test
scores of students for the English Subject and their overall scores of
other subjects.
The degree of relationship is expressed in terms of coefficient of
correlation.
The value ranges from -1.00 to +1.00.
A perfect positive correlation is indicated by a coefficient of +1.00 and
a perfect negative correlation by a coefficient of -1.00.
A correlation of 0 indicates no relationship between the two sets of
scores. E.g. the relationship between your shoe size and your salary.
The larger the absolute value of the coefficient (whether positive or
negative), the higher the degree of relationship expressed.
A positive correlation indicates that the variables increase together.
A negative correlation indicates that as one variable increases, the other
decreases.
There are two common measures of relationship expressed as
correlation coefficients.
1. Pearson Product-moment correlation coefficient
The most commonly used and most useful correlation coefficient.
It is indicated by the symbol r.
The Pearson correlation evaluates the linear relationship between two continuous
variables.
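The formula for obtaining the coefficient of correlation is:
\[ r = \frac{N\sum XY - \sum X \sum Y}{\sqrt{\left[N\sum X^2 - \left(\sum X\right)^2\right]\left[N\sum Y^2 - \left(\sum Y\right)^2\right]}} \]
An alternative formula is:
\[ r = \frac{\dfrac{\sum XY}{N} - \dfrac{\sum X}{N}\cdot\dfrac{\sum Y}{N}}{\sqrt{\dfrac{\sum X^2}{N} - \left(\dfrac{\sum X}{N}\right)^2}\;\sqrt{\dfrac{\sum Y^2}{N} - \left(\dfrac{\sum Y}{N}\right)^2}} \]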
The following steps serve as a guide for computing a Pearson
product-moment correlation coefficient.
Begin by writing the pairs of scores to be studied in two columns.
Make certain that the pair of scores for each student is in the same
row.
Square each of the entries in the X column and enter the result in
the X² column.
Square each of the entries in the Y column and enter the result
in the Y² column.
In each row, multiply the entry in the X column by the entry in
the Y column, and enter the result in the XY column.
Add the entries in each column to find the sum of (∑) each
column.
Substitute the obtained values in the formula.
cont….
Student   Score X   Score Y   X²      Y²      XY
1         20        24        400     576     480
2         18        21        324     441     378
3         16        23        256     529     368
4         14        20        196     400     280
5         14        18        196     324     252
6         12        14        144     196     168
7         11        16        121     256     176
8         10        12        100     144     120
9         8         10        64      100     80
10        7         12        49      144     84
Sum       130       170       1850    3110    2386
          (∑X)      (∑Y)      (∑X²)   (∑Y²)   (∑XY)
N = 10
cont….
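\[ \frac{\sum X}{N} = \frac{130}{10} = 13 \qquad \frac{\sum Y}{N} = \frac{170}{10} = 17 \]
\[ \frac{\sum X^2}{N} = \frac{1850}{10} = 185 \qquad \frac{\sum Y^2}{N} = \frac{3110}{10} = 311 \qquad \frac{\sum XY}{N} = \frac{2386}{10} = 238.6 \]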
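The formula for obtaining the coefficient of correlation is:
\[ r = \frac{\dfrac{\sum XY}{N} - \dfrac{\sum X}{N}\cdot\dfrac{\sum Y}{N}}{\sqrt{\dfrac{\sum X^2}{N} - \left(\dfrac{\sum X}{N}\right)^2}\;\sqrt{\dfrac{\sum Y^2}{N} - \left(\dfrac{\sum Y}{N}\right)^2}} \]
Substituting the sums, with numerator and denominator multiplied through by N² = 100:
\[ r = \frac{23860 - 22100}{\sqrt{(18500 - 16900)(31100 - 28900)}} \]
\[ r = \frac{1760}{1876} \approx 0.94 \]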
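The same computation can be checked in Python. This sketch builds the column sums from the ten pairs of scores in the table and applies the raw-score form of the formula used in the substitution above; the variable names are illustrative.

```python
import math

# Scores X and Y for the ten students in the table.
x = [20, 18, 16, 14, 14, 12, 11, 10, 8, 7]
y = [24, 21, 23, 20, 18, 14, 16, 12, 10, 12]
n = len(x)

# Column sums, as in the bottom row of the table.
sum_x = sum(x)
sum_y = sum(y)
sum_x2 = sum(v * v for v in x)
sum_y2 = sum(v * v for v in y)
sum_xy = sum(a * b for a, b in zip(x, y))

# r = (N*sum(XY) - sum(X)*sum(Y)) / sqrt((N*sum(X^2) - sum(X)^2) * (N*sum(Y^2) - sum(Y)^2))
numerator = n * sum_xy - sum_x * sum_y
denominator = math.sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
r = numerator / denominator

print(sum_x, sum_y, sum_x2, sum_y2, sum_xy)   # 130 170 1850 3110 2386
print(round(r, 2))                            # 0.94
```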
Cont…
• Where
– Σ = Sum of
– d = Difference in rank
– n = Number in group
Cont…
Students   English (mark)   Maths (mark)   Rank (English)   Rank (maths)   d   d²
1          56               66
2          75               70
3          45               40
4          71               60
5          62               65
6          64               56
7          58               59
8          80               77
9          76               67
10         61               63
Cont…
Students   Maths (mark)   Rank (English)   Rank (maths)   d   d²
1          66             9                4              5   25
2          70             3                2              1   1
3          40             10               10             0   0
4          60             4                7              3   9
5          65             6                5              1   1
6          56             5                9              4   16
7          59             8                8              0   0
8          77             1                1              0   0
9          67             2                3              1   1
10         63             7                6              1   1
                                                  ∑d² =   54
Where d = difference between ranks and d² = difference squared.
We then calculate the following:
We then substitute this into the main equation with the other
information as follows:
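As an illustration of the calculation, the sketch below assumes that the main equation is the standard Spearman rank-difference formula, r_s = 1 - 6∑d² / (n(n² - 1)), which uses the symbols ∑, d and n defined above. The `ranks` helper is illustrative and assumes there are no tied marks, which holds for this data.

```python
# Marks for the ten students in the table.
english = [56, 75, 45, 71, 62, 64, 58, 80, 76, 61]
maths = [66, 70, 40, 60, 65, 56, 59, 77, 67, 63]


def ranks(marks):
    """Rank marks from highest (rank 1) to lowest; assumes no tied marks."""
    ordered = sorted(marks, reverse=True)
    return [ordered.index(m) + 1 for m in marks]


rank_english = ranks(english)
rank_maths = ranks(maths)

# d = difference between the two ranks of each student; d squared is summed.
d_squared = [(a - b) ** 2 for a, b in zip(rank_english, rank_maths)]
sum_d2 = sum(d_squared)
n = len(english)

# Assumed formula: 1 - 6 * sum(d^2) / (n * (n^2 - 1))
rho = 1 - (6 * sum_d2) / (n * (n ** 2 - 1))

print(rank_english)
print(rank_maths)
print(sum_d2)          # 54
print(round(rho, 2))   # 0.67
```

With ∑d² = 54 and n = 10, this gives 1 - 324/990, or about 0.67, a moderately strong positive relationship between the English and maths ranks.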
Interpretation of correlation coefficient
Chapter Four
Reliability and Validity
Reliability
Reliability is the ability of a test to provide consistent scores
on repeated measurement.
Construct validity
Relates to whether the test is an adequate measure of the underlying
construct.
Face validity
Refers simply to whether or not a test "looks like" it measures what it is
intended to measure.
Factors influencing validity
Unclear direction
Too difficult vocabulary and sentence structure
Inappropriate level of difficulty of the test items
Poorly constructed test items
Ambiguity
Too short test
Identifiable pattern of answers
Factors in test administration and scoring
To select the best available items for future use and keep them
in the item bank.
i.e.
\[ P = \frac{T}{N} \times 100 \]
Thus for item 1 in table 3.1,
\[ P = \frac{14}{20} \times 100 = 0.7 \times 100 = 70\% \]
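The difficulty index can be wrapped in a small function. This is a sketch under an assumption: T is taken to be the number of examinees who answered the item correctly and N the total number of examinees, which fits the worked example (14 out of 20) but is not spelled out above. The name `item_difficulty` is illustrative.

```python
def item_difficulty(correct, examinees):
    """Percentage of examinees who answered the item correctly.

    Assumes T = `correct` and N = `examinees`; the text above does not
    spell out what T and N stand for.
    """
    if examinees <= 0:
        raise ValueError("examinees must be positive")
    return 100 * correct / examinees


print(item_difficulty(14, 20))   # 70.0
```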
The difficulty level of items should not be “too easy” or “too difficult”
Item difficulty interpretation
Formula: