Assessment Course


Assessment and Evaluation of Learning
(PGDT 423)
Unit 1: Assessment: Concept, Purpose, and Principles

• Introduction
• This part will familiarize you with some basic concepts, such as test, measurement, assessment, and evaluation;
• A brief explanation of the role of educational objectives in assessment;
• Important principles that have to be adhered to when assessing students’ learning; and
• The importance of involving students in the assessment process.
Unit One Learning Outcomes
By the end of this unit, you will be able to:
• Define the meaning of test, measurement,
assessment and evaluation.
• Examine the purposes of assessment and evaluation
of learning
• Identify the principles of assessment and evaluation
of learning.
• Apply the principles of assessment and evaluation
of learning in the local context
1.1 Concepts

Before we start studying educational assessment and evaluation, we need to have a clear understanding of certain related concepts.
You might have come across the concepts test, measurement, assessment, and evaluation.
• Reflection:
• What do you know about these concepts? Can you differentiate among them? Please try.
Quiz
• A quiz is a short, narrow in scope, and informal written test.
• It requires very little time and can be given during class hours, at the beginning or the end of the class.
• Test:
• You have been taking tests ever since you started schooling, to determine your academic performance.
• Tests are also used in workplaces to select individuals for a certain job vacancy.
• Thus, a test in the educational context refers to the presentation of a standard set of questions to be answered by students.
• It is one instrument used for collecting information about students’ behaviors or performances.
• Note: there are also other ways that help to collect information about students’ educational performances, such as observations, assignments, project works, portfolios, etc.
Types of test
Tests can be categorized into various types, depending on what base is considered, as follows:
• Considering the items
Choice items (true-false, matching, multiple-choice)
Completion items
Short-answer items
Essay items
• Considering how observations are scored
Objective tests
Subjective tests
• Considering degree of standardization
Standardized tests
Non-standardized tests
• Considering administrative conditions
Individual tests
Group tests
• Considering language emphasis in responses
Verbal tests
Performance tests
• Considering emphasis on time
Power tests
Speed tests
• Considering score-referencing scheme
Norm-referenced tests
Criterion-referenced tests
• Considering what attribute is measured
Achievement tests
Aptitude tests
Personality and adjustment tests
Interest inventories
Attitude and value questionnaires
Uses of test
• Selection decision
• Placement decision
• Classification decision
• Counseling and guidance decision
• Educational diagnostic and remedial decision
• Program improvement and evaluation decision

Other uses: Tests are also used for other purposes such as grading learners, giving feedback to learners, providing feedback about the effectiveness of teaching, motivating learners to study, and serving as a scientific tool in research in education and the social sciences.
Examination
• covers large areas of content,
• is given at the end of the semester or the course,
• includes a large number of items.
• Its main purpose is to assign grades.
Measurement:
• In our day-to-day life there are different things that we measure.
• We measure our height and put it in terms of meters and centimeters. We measure some of our daily consumptions, like sugar in kilograms and liquids in liters.
• We measure temperature and express it in degrees Celsius.
• Hence, to measure these things, appropriate instruments such as a meter, a weighing scale, or a thermometer are mandatory in order to have reliable measurements.
• Similarly, in education measurement is the process
by which the attributes of a person are described in
numbers.
• It is a quantitative description of the behavior or
performance of students.
• As educators we frequently measure human
attributes such as attitudes, academic achievement,
aptitudes, interests, personality and so forth.
• Measurement permits more objective description
concerning traits and facilitates comparisons.
• Hence, to measure we have to use certain
instruments so that we can conclude that a certain
student is better in a certain subject than others.
• E.g., a mathematics test is an instrument containing questions and problems to be solved by students. The number of right responses obtained is an indication of the performance of individual students in mathematics.
• Thus, the purpose of educational measurement is to
represent how much of ‘something’ is possessed by
a person using numbers.
• Measurement is the assignment of a quantitative value to the results of a test or other assessment techniques. Measurement can refer both to the score obtained and to the process itself.
The process of measurement involves:
• Specifying what is to be measured (learning objectives and contents)
• Constructing devices to measure certain attributes
• Defining the attributes to be measured (what behavior is to be measured)
Results of measurement from classroom tests may be used to:
• Direct prescriptive study for the students who took the test
• Indicate whether a student has mastered a well-defined body of subject matter or developed a particular skill
• Indicate where a student stands in relation to others who took the same test
Characteristics of Psychological Measurement
• Psychological measurements have three characteristics that place limitations on the interpretation and use of psychological tests. These are:
1. It is descriptive: a test score only describes performance; it does not interpret or evaluate it.
• E.g., an IQ score of 102 describes a person’s performance on a particular test administration.
• Interpretation of scores requires further information, such as the performance of other people and knowledge of the test taker’s background.
2. It is relative: scores usually are interpreted by comparison with other people, because:
• there are few absolute standards of performance in the areas that we test;
• differences between people usually are more interesting and useful than absolute levels.
3. It is indirect:
• Characteristics are inferred from specific
behavior rather than being measured directly.
Assessment:
• In educational literature the concepts ‘assessment’ and
‘evaluation’ have been used with some confusion.
• Some educators have used them interchangeably to mean
the same thing. Others have used them as two different
concepts. Even when they are used differently there is too
much overlap in the interpretations of the two concepts.
• Cizek (in Phiye, 1997) provides a comprehensive definition of
assessment.
the planned process of gathering and synthesizing information
relevant to the purposes of:
(a) discovering and documenting students' strengths and
weaknesses,
(b) planning and enhancing instruction, or
(c) evaluating progress and making decisions about students.
• Generally, educational assessment is viewed as the
process of collecting information with the purpose of
making decisions about students.
• Instruments used to collect information include tests, observations, checklists, questionnaires, and interviews.
• Rowntree (1974) views assessment as a human
encounter in which one person interacts with another
directly or indirectly with the purpose of obtaining and
interpreting information about the knowledge,
understanding, abilities and attitudes possessed by that
person.
• The key words in the definition of assessment are collecting data to make decisions. Hence, to make decisions one has to evaluate.
Types of Assessment
There are Informal and Formal types of Assessment
Informal Assessment –
• It is the process of gathering information about learning on the spur of the moment, or casually during classroom activity.
• It is not necessarily carefully planned, but it is meant to provide information that is critical to know at that moment.
• It includes a variety of techniques, such as: questioning learners,
• Observing learners’ work,
• Reviewing learners’ homework,
• Talking with learners and listening to learners during interactions,
• Peer and self evaluation, and discussion.
• It doesn’t contribute to a student’s final grade.
Formal assessment
• It is usually a written document, focused on assessing specific competences of the learners.
• In a formal assessment, scores or grades are given on the basis of student performance.
It includes a variety of techniques, such as:
• Test, oral examination.
• Performance assessment tasks,
• Class work/Homework
• Examinations, project work,
• Portfolios and the like.
• Laboratory work, group work
• Research work, term paper
Evaluation:
• It is the process of making a judgment about a given situation.
• It refers to the process of judging the quality of student learning on the basis of established performance standards, and assigning a value to represent the worthiness or quality of that learning or performance.
• It is concerned with determining how well students have learned. When we evaluate, we are saying that something is good, appropriate, valid, or positive.
• Evaluation is based on assessment that provides evidence of student achievement in the grade/course.
• Evaluation includes both quantitative and
qualitative descriptions of student behavior plus
value judgment concerning the desirability of that
behavior.
• Evaluation = Quantitative description of students’
behavior (measurement) + qualitative description of
students’ behavior (non-measurement) + value
judgment
• Thus, evaluation may or may not be based on
measurement (or tests) but when it is, it goes
beyond the simple quantitative description of
students’ behavior.
• Evaluation involves judgment. The quantitative
values that we obtain through measurement will
not have any meaning until they are evaluated
against some standards.
• Educators are constantly evaluating students and it
is usually done in comparison with some standard.
• Thus, evaluation is described as the comparison of
what is measured against some defined criteria and
to determine whether: it has been achieved,
appropriate, reasonable, valid and so forth.
• Evaluation summarizes and communicates what
students know and can do with respect to the
overall curriculum expectations to: parents,
teachers, institutions of further education, and
students themselves.
• In summary, a test is one type of instrument used in assessment. Measurement is the assigning of numbers to the results of a test or other forms of assessment according to a specific rule.
• Assessment is more comprehensive and inclusive
than testing and measurement. It includes the full
range of procedures (observations, rating of
performances, paper and pencil tests, etc)
• It may also include quantitative descriptions
(measurement) and qualitative descriptions (non-
measurement) of students’ behaviors.
• Evaluation, on the other hand, consists of making
judgments about the level of students’ achievement
for the purposes of grading and accountability and
for making decisions about promotion and
graduation.
Distinction between Assessment & Evaluation

• While assessment and evaluation are highly interrelated and are often used interchangeably as terms, they are not synonymous. The process of assessment is to gather, summarize, interpret, and use data to decide a direction for action.
• The process of evaluation is to gather, summarize, interpret, and use data to determine the extent to which an action was successful.
Assessment
• Assessment is the gathering of information about something, such as student performance.
• Assessment is information.
• Assessment is qualitative.
• Assessment pinpoints specific strengths and weaknesses.
• Assessment is diagnostic and formative, as well as summative.
• Assessment focuses on the individual student.
Evaluation
• Evaluation is the act of setting a value on the assessment information.
• Evaluation is a judgment.
• Evaluation is quantitative as well as qualitative.
• Evaluation ranks and sorts individuals within groups.
• Evaluation is only summative.
• Evaluation focuses on the group.
1.2 Importance and Purposes of Assessment
?
• What do you think is the purpose of assessment?
Why do teachers assess their students?
Importance of Assessment
• Classroom assessment provides feedback for teachers about their effectiveness and about students’ progress in learning.
• Classroom assessment helps individual teachers
obtain useful feedback on what, how much, and
how well their students are learning.
• The staff can then use this information to refocus
their teaching to help students make their learning
more efficient and more effective.
Note: in general, assessment in education focuses on
helping learning and improving teaching.

• With regard to the learner, assessment is aimed at providing information that will help us make decisions concerning remediation, enrichment, selection, exceptionality, progress, and certification.
• With regard to teaching, assessment provides
information about the attainment of objectives, the
effectiveness of teaching methods and learning
materials.
Overall, assessment serves the following main
purposes.
• Assessment is used to inform and guide teaching
and learning:
• A good classroom assessment plan gathers evidence
of student learning that informs teachers'
instructional decisions.
• To plan effective instruction, teachers also need to
know what the student misunderstands and where
the misconceptions lie. In addition to helping
teachers formulate the next teaching steps, a good
classroom assessment plan provides a road map for
students.
• Assessment is used to help students set learning
goals:
• Students need frequent opportunities to reflect on where their learning stands and what needs to be done to achieve their learning goals.
• Assessment is used to assign report card grades: Grade reports provide summarized
information about student learning to parents,
schools, and other stakeholders.
• Assessment is used to motivate students: Research
has shown that students will be confident and
motivated when they experience progress and
achievement, rather than failure
Why do we assess students?
Incentive to learn
Feedback to student
To inform instruction
Modification of learning activities
Selection of students
To decide success or failure
Feedback to teacher
Gather evidence of student learning
To motivate students
Increase student achievement
To assign grades/ranks.
1.3. The Role of Educational Objectives in Assessment

• Group Assignment (10%)
• Define educational or learning objectives and learning outcomes.
• Describe “Bloom’s Taxonomy of Educational Objectives”.
1.4. Principles of Assessment

• These principles are expressed in terms of a fair (reliable and valid) assessment system.
• Assessment principles guide the collection of meaningful information that will help inform instructional decisions, promote student engagement, and improve student learning.
• Different educators and school systems have developed different sets of assessment principles. Miller, Linn and Gronlund (2009) have identified a set of general principles of assessment.
• The principles of the New South Wales Department of Education and Training (2008) in Australia are more inclusive than those listed by other educators. These principles are:
1. Assessment should be relevant. Assessment needs
to provide information about students’ knowledge,
skills and understandings of the learning outcomes
specified in the syllabus.
2. Assessment should be appropriate.
Assessment needs to provide information about the particular kind of learning in which we are interested.
This means a variety of assessment methods is needed to provide information about all kinds of learning. For example, some kinds of learning are best assessed by observation, some by project work, and others by having students do paper-and-pencil tasks.
3. Assessment should be fair. Assessment needs to
provide opportunities for every student to
demonstrate what they know, understand and can do.
Assessment must be based on a belief that all learners
are on a path of development and that every learner
is capable of making progress.
No student should be advantaged or disadvantaged on the basis of cultural knowledge, experience, language proficiency, background, and the like.
4. Assessment should be accurate. Assessment
needs to provide evidence that accurately reflects
an individual student’s knowledge, skills and
understandings.

That is, assessments need to be reliable or dependable, in that they consistently measure a student’s knowledge, skills and understandings. They also need to be objective, so that if a second person assesses a student’s work, they will come to the same conclusion.
5. Assessment should provide useful information.
The focus of assessment is to establish where
students are in their learning. This information can be
used for both summative purposes, such as the
awarding of a grade, and formative purposes to feed
directly into the teaching and learning cycle.
6. Assessment should be integrated into the
teaching and learning cycle. Assessment needs to be
an ongoing, integral part of the teaching and learning
cycle. It must allow teachers and students themselves
to monitor learning.

7. Assessment should draw on a wide range of evidence. A complete picture of student achievement in an area of learning depends on evidence that is sampled from the full range of knowledge, skills and understandings that make up the area of learning.
8. Assessment should be manageable. Assessment needs to be efficient, manageable and convenient. It needs to be incorporated into classroom activities and capable of providing information that justifies the time spent.
?
• What are the similarities and differences between the two sets of principles discussed above?
• Based on your experiences, compare and contrast the extent to which each of these principles was followed at the secondary and university education levels.
1.5. Assessment and Some Basic Assumptions

• What assumptions do you hold in mind when preparing assessment tools for assessing students?

?
• Angelo and Cross (1993) have listed seven basic assumptions of classroom assessment, which are described as follows:
1. Although not exclusive, the quality of student learning is directly related to the quality of teaching.
• Therefore, one of the most promising ways to
improve learning is to improve teaching.
2. To improve their effectiveness, teachers need to make their goals and objectives explicit. Then they need specific, comprehensible feedback on the extent to which they are achieving those goals and objectives.
Effective assessment begins with clear goals. Before teachers can assess how well their students are learning, they must identify and clarify what they are trying to teach.
3. To improve their learning, students need to receive
appropriate and focused feedback; they also need to
learn how to assess their own learning
4. The type of assessment most likely to improve
teaching and learning is the assessment conducted
by teachers to answer questions they themselves
have formulated in response to issues or problems in
their own teaching.
To best understand their students’ learning, teachers
need specific and timely information about the
particular individuals in their classes.
• How do you think assessment will help to increase students’ motivation?

?
• According to current cognitive research, people are
motivated to learn by success and competence.
When students feel ownership and have choice in
their learning, they are more likely to invest time
and energy in it.
• Assessment can be a motivator, not through reward
and punishment, but by stimulating students’
intrinsic interest.
Assessment can enhance student motivation by:
• emphasizing progress and achievement rather than
failure
• providing feedback to move learning forward
• reinforcing the idea that students have control over,
and responsibility for, their own learning
• building confidence in students so they can and need
to take risks
• being relevant, and appealing to students’
imaginations
• providing the scaffolding that students need to
genuinely succeed
• There is strong evidence that involving students in
the assessment process can have very definite
educational benefits.

• Activity ?
• 1) As prospective teachers how do you think you
can involve your students in the assessment
process?
• 2) In what ways can students benefit if they are
involved in the assessment process?
• One way in which we can involve our students in
the assessment process is to establish the standards
or assessment criteria with them.
• This will help students understand what is to be
assessed.
• It helps students develop a clear picture of where
they are going, where they are now and how they
can close the gap.
• Another aspect is to involve students in trying to
apply the assessment criteria for themselves. The
evidence is that through trying to apply criteria, or
mark using a model answer, students gain much
greater insight into what is actually being required.
• Third, by self-assessment and peer assessment.
• Self-assessment helps one become aware of one’s own strengths and weaknesses in a particular subject and identify one’s own next steps.
• Self-assessment allows students to think more
carefully about what they do and do not know, and
what they additionally need to know to accomplish
certain tasks.
• Peer assessment is making judgment about other
students’ work.
• Students learn how to make better sense of assessment criteria when they give feedback and/or marks against them. Giving and receiving feedback is an important aspect of student learning and will be a valuable skill for them in professional contexts and for future learning.
Assessment and Teacher Professional Competence in Ethiopia

• Assessment requires much of a teacher’s professional time, both inside and outside the classroom. Therefore, a teacher should have some basic competencies in classroom assessment so as to be able to effectively assess his/her students’ learning.
?
• What competencies do you think you should have in
the area of assessment? Write down your ideas and
compare it with the work of another colleague.
Seven competences required for teachers are
described as follows:
1. Teachers should be skilled in choosing assessment
options appropriate for instructional decisions. They
need to be well-acquainted with the kinds of
information provided by a broad range of assessment
alternatives and their strengths and weaknesses.
2. Teachers should be skilled in developing
assessment methods appropriate for instructional
decisions. Assessment tools may be accurate and fair
(valid) or invalid. Teachers must be able to determine
the quality of the assessment tools they develop.
3. Teachers should be skilled in administering,
scoring, and interpreting the results of assessment
methods. It is not enough that teachers are able to
select and develop good assessment methods; they
must also be able to apply them properly.
4. Teachers should be skilled in using assessment results
when making decisions about individual students, planning
teaching, developing curriculum, and school improvement.
5. Teachers should be skilled in developing valid student
grading procedures that use pupil assessments. Grading
students is an important part of professional practice for
teachers.
6. Teachers should be skilled in communicating assessment results to students, parents, other lay audiences, and other educators. Furthermore, teachers will sometimes be required to defend their own assessment procedures and their interpretations of them.
At other times, teachers may need to help the public interpret assessment results appropriately.
7. Teachers should be skilled in recognizing unethical, illegal, and otherwise inappropriate assessment methods and uses of assessment information. Teachers must be familiar with their own ethical and legal responsibilities in assessment. In addition, they should also identify the inappropriate assessment practices of others whenever they are encountered.
• In Ethiopia, the MoE has developed such assessment-related competences, which professional teachers are expected to possess.
Individual assessment (10%)
When can we conduct assessment?
• 1. Why do we assess our students before starting teaching?
• 2. Why do we assess our students during instruction?
• 3. Why do we assess our students after instruction?
2.3 Specific Contents
2.3.1. Types of assessment
There are different approaches to conducting assessment in the classroom. Here we are going to see three pairs of assessment typologies, namely:
• formal vs. informal,
• criterion referenced vs. norm referenced,
• formative vs. summative assessments.
A) Informal Assessment –
• It is the process of gathering information about learning on the spur of the moment, or casually during classroom activity.
• It is not necessarily carefully planned, but it is meant to provide information that is critical to know at that moment.
• It includes a variety of techniques, such as: questioning learners,
• Observing learners’ work,
• Reviewing learners’ homework,
• Talking with learners and listening to learners during interactions,
• Peer and self evaluation, and discussion.
B) Formal assessment
• It is usually a written document, focused on assessing specific competences of the learners.
• In a formal assessment, scores or grades are given on the basis of student performance.
It includes a variety of techniques, such as:
• Test, oral examination.
• Performance assessment tasks,
• Class work/Homework
• Examinations, project work,
• Portfolios and the like.
• Laboratory work, group work
• Research work, term paper
C) Formative Assessment
• Assessment procedures can be classified
according to their functional role during classroom
instruction.
What is the difference between Formative and
Summative Assessments?
Formative Assessment: It can include both informal and formal assessments, and helps teachers gain a clearer picture of where students are and what they still need help with.
• They can be given before, during, and even after
instruction, as long as the goal is to improve instruction
• Formative assessments are ongoing assessments,
reviews, and observations in a classroom.
• They serve a diagnostic function for both students and
teachers.
• Students receive feedback and use it to adjust and
improve their performance or other aspects of their
engagement in the unit such as study techniques
• Teachers receive feedback on the quality of
learners’ understandings and consequently, can
modify their teaching approaches to provide
enrichment or remedial activities to more
effectively guide learners
• Formative assessment is also known as ‘assessment for learning’. The basic idea of this concept is that the primary purpose of assessment should be to enhance students’ learning.
• There is still another name associated with the concept of formative assessment: ‘continuous assessment’.
• Continuous assessment (as opposed to terminal
assessment) is based on the premise that if
assessment is to help students’ improvement in
their learning and if a teacher is to determine the
progress of students towards the achievement of
the learning goals, it has to be conducted on a
continuous basis.
• Thus, continuous assessment is a teaching approach
as well as a process of deciding to what extent the
educational objectives are actually being realized
during instruction.
• In schools, continuous assessment of learning is usually carried out by teachers on the basis of impressions gained as they observe their students at work, or through various kinds of tests given periodically.
• Therefore, each decision is based on various types of information that are gathered through different assessment methods at different times by teachers.
• In order to assess students' understanding, there are various strategies that you can use.
• What methods of assessment can you use to assess your students for formative purposes? Please try to mention as many strategies as you can.
• The following are some of the strategies of
assessment you can employ in your classrooms:
– You can make your students write their understanding
of vocabulary or concepts before and after instruction.
– You can ask students to summarize the main ideas
they've taken away from your presentation, discussion,
or assigned reading.
– You can make students complete a few problems or
questions at the end of instruction and check answers.
– You can interview students individually or in groups
about their thinking as they solve problems.
– You can assign brief, in-class writing assignments (e.g., "Why is this person or event representative of this time period in history?").
• Tests and homework can also be used formatively
to analyze where students are in their learning
and provide specific, focused feedback regarding
performance and ways to improve it
Formative Assessment
(Assessment for learning)
It is to assist the learning process by providing feedback
It is continuously gathering evidence about learning.
It has the greatest impact on student learning.
It is diagnostic and remedial
It is non-graded
Can be done formally or informally for feedback.
It is process oriented (cultivating the learner)
Carried out during instruction
It is part of the teaching method.
D) Summative Assessment: Assessment of learning
• It typically comes at the end of a course (or unit)
of instruction.
• It evaluates the quality of students’ learning and
assigns a mark to that students’ work based on
how effectively learners have addressed the
performance standards and criteria.
• Assessment tasks conducted during the progress of a semester may be regarded as summative in nature if their only function is to contribute to the final grades of the students.
• The techniques used in summative assessment
are determined by the instructional goals.
Typically, however, they include teacher made
achievement tests, ratings of various types of
performance, and assessment of products
(reports, drawings, etc.).
• A particular assessment task can be both
formative and summative. For example, students
could complete unit 1 of their Module and
complete an assessment task for which they
earned a mark that counted towards their final
grade. In this sense, the task is summative
• They could also receive extensive feedback on their work that guides them to achieve higher levels of performance in subsequent tasks.
• In this sense, the task is formative – because it
helps students form different approaches and
strategies to improve their performance in the
future.
• Methods for informal assessment can be divided into
two main types: unstructured (e.g., student work
samples, journals) and structured (e.g., checklists,
observations).
• The unstructured methods frequently are somewhat
more difficult to score and evaluate, but they can
provide a great deal of valuable information about the
skills of the students.
• Structured methods can be reliable and valid techniques
when time is spent creating the "scoring" procedures.
• Another important aspect of informal assessments is
that they actively involve the students in the evaluation
process - they are not just paper-and-pencil tests
Summative Assessment: (Assessment of Learning)

Carried out at the end of a unit/term/semester.
The purpose is to gather evidence of student achievement after instruction.
Used primarily to make decisions for grading or certification purposes.
To judge the learner’s overall performance.
For checking mastery.
To decide pass or fail.
To determine what has been learned from the lesson.
To summarize student progress.
E) Criterion-referenced and Norm-referenced
Assessments
• These refer to methods of interpreting performance.
1. Criterion-referenced Assessment:
• It allows us to quantify the extent to which students have achieved the goals of a unit of study and a course.
• It is carried out against previously specified criteria and
performance standards.
• It is most appropriate for quickly assessing what concepts and
skills students have learned from a segment of instruction.
• Criterion referenced classrooms are mastery-oriented, informing
all students of the expected standard and teaching them to
succeed on related outcome measures.
• It helps to eliminate competition and may improve cooperation.
2. Norm-referenced Assessment:
• It determines student performance based on a position within a
norm group.
• It is appropriate when comparison across large numbers of
students is made for decisions regarding student placement and
advancement.
• For example, results on the grade 8 national exam are
determined by each student’s relative standing in
comparison to all other students who have taken the exam.
• Thus, when we say that a student has scored at the 80th
percentile, it doesn’t mean that the student has scored an
average of 80%. Rather, it means that the student’s score
stands above those of 80% of the students, while the
remaining 20% of students have scored above that
particular student.
• Assigning ranks to students is another example of
norm-referenced interpretation of students’ performances.
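The percentile-rank idea described above can be sketched in a few lines (the function name and the score data are illustrative, not from the course material):

```python
# Illustrative sketch of a percentile rank: the percentage of examinees
# whose scores fall below a given score. The data here are made up.
def percentile_rank(score, all_scores):
    below = sum(1 for s in all_scores if s < score)
    return 100 * below / len(all_scores)

scores = [55, 60, 62, 68, 70, 74, 78, 81, 85, 90]
print(percentile_rank(81, scores))  # 70.0: this score stands above 70% of the group
```

A score at the 70th percentile here does not mean a 70% average; it means the score is higher than 70% of the norm group, as the slide explains.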
2.3.2. Assessment Strategies
• Assessment strategy refers to the methods or activities
in which students are involved to ensure that the
learning objectives of a subject, a unit, or a lesson have
been adequately addressed.
• Assessment strategies are varied and range from
informal observation to formal examinations.
• One has to be selective and use a strategy that gives
students an opportunity to demonstrate the kind of
behavior that the learning outcome demands.
• Assessment strategies should also be related to the
course material and relevant to students’ lives.
• Categorizing learning goals thoroughly helps to identify
what we want students to know and be able to do.
• Knowledge and understanding: What facts do students
know outright? What information can they retrieve?
What do they understand?
• Reasoning proficiency: Can students analyze, categorize,
and sort into component parts? Can they generalize and
synthesize what they have learned? Can they evaluate
and justify the worth of a process or decision?
• Skills: skills to be mastered such as reading fluently,
working productively in a group, making an oral
presentation, speaking a foreign language, or designing
an experiment.
Assessment strategies that can be used by classroom
teachers:
a. Classroom presentations: requires students to verbalize
their knowledge, select and present samples of finished
work, and organize their thoughts about a topic in order to
present a summary of their learning
b. Conferences: formal or informal meeting between the
teacher and a student for the purpose of exchanging
information or sharing ideas.
• It might be held to explore the student’s thinking and
suggest next steps; assess the student’s level of
understanding of a particular concept or procedure; and
review, clarify, and extend what the student has already
completed
c. Exhibitions/Demonstrations: It is a performance
in a public setting, during which a student explains
and applies a process, procedure, etc., in concrete
ways to show individual achievement of specific
skills and knowledge.
• What type of objectives do you think this
assessment strategy could serve to measure?
d. Interviews: It is a face-to-face conversation in
which teacher and student use inquiry to share
their knowledge and understanding of a topic or
problem.
• Such form of assessment can be used by the teacher to:
• explore the student’s thinking;
• assess the student’s level of understanding of a concept or
procedure; and
• gather information, obtain clarification, determine positions,
and probe for motivations
e. Observation: Observation is a process of systematically viewing
and recording students while they work, for the purpose of
making instruction decisions.
• Observation can take place at any time and in any setting. It
provides information on students' strengths and weaknesses,
learning styles, interests, and attitudes.
• Observations may be informal or highly structured, and
incidental or scheduled over different periods of time in
different learning contexts.
f. Performance tasks: During a performance task,
students create, produce, or present works on "real
world" issues. The performance task may be used to
assess a skill or proficiency, and provides useful
information on the process as well as the product.
g. Portfolios: A portfolio is a collection of samples
of a student’s work over time. It offers a visual
demonstration of a student’s achievement,
capabilities, strengths, weaknesses, knowledge, and
specific skills, over time and in a variety of contexts.
h. Questions and answers:
• It helps the teacher to determine whether students
understand what is being presented;
• it also helps students to extend their thinking.
i) Checklists usually offer a yes/no format in relation to
student demonstration of specific criteria. They may be
used to record observations of an individual, a group or
a whole class.
j) Rating Scales allow teachers to indicate the degree or
frequency of the behaviors, skills and strategies
displayed by the learner. Rating scales state the criteria
and provide three or four response selections to
describe the quality or frequency of student work.
l) Rubrics use a set of criteria to evaluate a student's performance.
• They consist of a fixed measurement scale and detailed description of the
characteristics for each level of performance.
• These descriptions focus on the quality of the product or performance and
not the quantity.
• Rubrics use a set of specific criteria to evaluate student performance.
• They may be used to assess individuals or groups and, as with rating scales,
may be compared over time.
• The purpose of checklists, rating scales and rubrics is to:
- provide tools for systematic recording of observations
- provide tools for self-assessment
- provide samples of criteria for students prior to collecting and evaluating
data on their work
- record the development of specific skills, strategies, attitudes and
behaviours necessary for demonstrating learning
- clarify students' instructional needs by presenting a record of current
accomplishments.
m) One- Minute paper
• During the last few minutes of the class period,
you may ask students to answer on a half-sheet
of paper: "What is the most important point
you learned today?" and, "What point remains
least clear to you?"
• The purpose is to obtain data about students'
comprehension of a particular class session.
Then you can review responses and note any
useful comments. During the next class periods
you can emphasize the issues illuminated by
your students' comments.
n) Muddiest Point: This is similar to ‘One-
Minute Paper’ but only asks students to
describe what they didn't understand and
what they think might help.
• It is an important technique that will help you
to determine which key points of the lesson
were missed by the students.
• Here also you have to review before next class
meeting and use to clarify, correct, or
elaborate.
o) Student- generated test questions: You may allow
students to write test questions and model answers for
specified topics, in a format consistent with course
exams.
• This will give students the opportunity to evaluate the
course topics, reflect on what they understand, and what
good test items are. You may evaluate the questions and
use the good ones as prompts for discussion.
p) Tests: This is the type of assessment that you are mostly
familiar with.
• A test requires students to respond to prompts in order
to demonstrate their knowledge (orally or in writing) or
their skills (e.g., through performance).
2.3.3. Assessment in large classes
• Assessment methods are restricted by class size.
Due to time and resource constraints, teachers
often use less time-demanding assessment
methods.
• However, this may not always optimize student
learning
• Literature has identified various assessment issues
associated with large classes. They include:
1. Surface Learning Approach: teachers rely on
assessment methods such as multiple-choice and
short-answer examinations.
• These assessments often only assess learning at the
lower levels of intellectual complexity.
• students tend to adopt a surface rote learning
• Higher level learning such as critical thinking and
analysis are often not fully assessed
2. Feedback is often inadequate:
• In a large class, teachers may not have time to give
detailed and constructive feedback to every
student.
• Most teachers give general feedback on written
assignments and tests.
3. Inconsistency in marking: A large class usually
consists of a diverse and complex group of students.
• Teachers have to ensure consistency and fairness
in marking and grading
4. Difficulty in monitoring cheating and plagiarism:
• Plagiarism is another challenge in assessing large
classes.
• Some students deliberately cheat in large classes
because they think they may not be identified
within a large group.
• Due to heavy workloads and tight marking
schedules, teachers do not have enough time to
thoroughly check the work submitted by their students.
• To minimize plagiarism, assessment tasks must be
well thought out and well designed.
5. Lack of interaction and engagement:
• Students are often not motivated to engage in a
large-sized lecture.
• Students are less likely to interact with teachers
because they feel less motivated and tend to hide
themselves in a large group.
Methods that help make the assessment of large
classes effective include the following:
1. Front ending: effort at the beginning in preparing
students for the work they are going to do can
improve the submitted work.
• This method can also reduce marking time.
2. Making use of in-class assignments: they are quick and
relatively easy to mark and provide feedback on.
• They also help to identify gaps in understanding.
3. Self-and peer-assessment
Self-assessment
• The emphasis is on the students’ responsibility for learning
• However, there are problems pertaining to their validity and
reliability if it is used for grading purposes
peer-assessment
• effective in getting feedback that staff may be too busy to provide.
• However, it has to be carefully designed. Students need to know
what to do and there needs to be a transparent system by which
students can appeal their marks.
This may benefit students by:
• showing how their peers have tackled a particular piece of work;
• showing how to assess from a model answer sheet; and
• giving them an opportunity to internalize the assessment criteria.
4. Group Assessments: can reduce the marking load.
• However, the problem is that group members may
not contribute equally, which raises fairness concerns.
5. Changing the assessment method, or at least
shortening it:
• Either modify existing assessments or explore new
methods of assessment; for example, reduce the
length of the assessment task without detaching it
from your module's learning outcomes.
Constructing Tests
There are a wide variety of styles & formats for writing test items:
• Objective tests are highly structured and require the test taker
to select the correct answer from several alternatives or to
supply a word or short phrase to answer a question or complete
a statement.
• They are called objective because they have a single right or
best answer that can be determined in advance.
• Performance assessment tasks permit the student to organize
and construct the answer in essay form.
• Other types may require students to use equipment,
generate hypotheses, make observations, construct
something, or perform for an audience.
• For most performance assessment tasks, there is not a single
best or right response. Expert judgment is required to score the
performances.
Constructing Objective Test Items
• There are various types of objective test items.
These can be classified into supply type items and
alternatives (selection type items).
• Supply type items include completion items and
short answer questions.
• Selection type test items include True/False,
multiple choice and matching.
• Each type of test has its unique characteristics,
uses, advantages, limitations, and rules for
construction.
CONSTRUCTING TRUE/FALSE TEST ITEMS
• A true/false test item consists of a declarative
statement that the learner is asked to mark true
or false, right or wrong, correct or incorrect, yes
or no, fact or opinion, agree or disagree, and the
like.
• In each case there are only two possible answers.
It is used to measure facts, definitions of terms,
and statements of principles, which are relatively
simple learning outcomes.
True/False Test Items
Advantages
• It does not require much time to answer.
• It can cover a wide range of content.
• It can be scored quickly, reliably, and objectively.
Disadvantages
• It promotes memorization of factual information:
names, dates, definitions, and so on.
• It encourages guessing, because there is a 50
percent probability of getting the right answer.
• The diagnostic value of such tests is nil.
Suggestions that help to construct true/false test items:
• Avoid negatives, particularly double negatives; if used,
underline or bold them to minimize confusion.
• Restrict single-item statements to single concepts.
Two concepts in a single item statement may call for
two different responses.
• Use approximately equal numbers of true and false
statements.
• Make statements representing both categories equal
in length.
CONSTRUCTING MATCHING TEST ITEMS
The standard matching format consists of two columns:
• the questions or problems to be answered (the
premises), and
• the answers or options (the responses).
• They contain related words, phrases, or symbols.
• Matching test items are ideally suited for testing
terminology, knowledge of facts, charts, diagrams, etc.
• Advantages
• Easy to construct and easy to score
• Enables measuring wide content areas in a short
period of time
• Disadvantages
• Restricted to factual items and based on rote
memorization
• Difficult to get homogeneous materials
Guidelines for the construction of matching items.
• Use fairly brief lists, with the responses on the right. The words and
phrases that make up the premises should be short, and those that make
up the responses should be shorter still.
• Employ homogeneous lists. Both the list of premises and the list of
responses must be composed of similar sorts of things. If not, an alert
student will be able to come up with the correct associations simply by
elimination.
• Include more responses than premises. If you use exactly the same
number of responses as premises, a student who knows half or more of the
correct associations can guess the rest with very good chances.
• List responses in a logical order (alphabetical or chronological) so you
don’t accidentally give hints about which responses connect with which
premises.
• Describe the basis for matching and the number of times a response can
be used. The directions have to clarify the nature of the associations
students are to use; regarding the use of responses, a phrase such as
“each response may be used once, more than once, or not at all” is often
employed.
• Try to place all premises and responses for any matching item on a single
page. This frees students from lots of potentially confusing flipping back
and forth in order to accurately link responses to premises.
CONSTRUCTING SUPPLY TEST ITEMS
• The supply test item is a free-response type of
item in which students give their response in
words, phrases, symbols, or numbers.
• It is categorized into completion items and short-answer
items.
• In the short-answer item a direct question is used,
whereas the completion item consists of an incomplete
statement.
• The short-answer item is suitable for measuring a wide
variety of relatively simple learning outcomes, such as:
• knowledge of terminology, methods or procedures, and
principles.
Advantages
• Easy to construct
• Accurately and efficiently scored
• Reduces guessing

Disadvantages
• Stresses rote recall and encourages students to spend
their time memorizing trivial details rather than
seeking important understanding
• In other words, it is not suitable for measuring
higher-order learning outcomes
Guidelines to construct supply test items
• Omit key words or phrases and substitute blank spaces so that the
required answer will be definite and clear.
• Avoid indefinite statements that may be logically answered in several
ways.
• Avoid excessive blanks in a single item; when too many blanks are left,
an incomplete statement has no meaning or becomes ambiguous.
• Do not take statements directly from the text as the basis for short-answer
items, because this promotes memorization.
• Be sure that the question or statement poses a problem to the
examinee. A direct question is often more desirable than an
incomplete statement.
• Blank spaces should be equal in length so that their length gives no
clue to the students. However, be certain to include sufficient
space for the longest response.
CONSTRUCTING MULTIPLE-CHOICE TEST ITEMS
• The multiple-choice test item is the most widely
applicable type of test. It contains two parts:
• the stem (the problem part), and
• the alternatives, or list of suggested answers, which
may include words, numbers, symbols, and phrases.
• The correct option is called the key.
• The incorrect options are called distractors.
Advantages
• It can measure all levels of learning, from rote
memorization to higher-order learning.
• It is scored quickly and accurately.
• Compared to true/false items, it minimizes guessing.
• It offers broad content sampling.
Disadvantages
• It takes time and skill to construct effectively,
because it is difficult to write plausible and attractive
distractors.
• It restricts the demonstration of knowledge to the range of
provided options.
• It provides clues that are unavailable in practice.
Guidelines for preparing multiple-choice test items
1. Be sure the stem asks a clear question and does not
contain a lot of irrelevant information.
2. Avoid grammatical or verbal clues to the correct
answer.
3. All distractors should be plausible and attractive to
students.
4. Never use “all of the above” as an answer choice, but
use “none of the above” to make items more demanding.
5. Use three to five options.
6. Randomly use all answer positions in approximately
equal numbers; don’t use patterned answers.
Constructing Performance Assessments
• There are learning outcomes that cannot be
measured satisfactorily by objective types of tests.
These include outcomes such as the ability to
organize and integrate ideas, to express oneself in
writing, and to create.
• Such outcomes require less structuring of
responses than objective test items allow, and call
for written essays and other performance-based
assessments.
• In essay questions the students are free to construct,
relate, and present ideas in their own words.
• Learning outcomes concerned with the ability to
conceptualize, construct, organize, relate, and
evaluate ideas require the freedom of response and
the originality provided by essay questions.
Essay questions can be classified into restricted-response
and extended-response types.
• Restricted-response essays limit both the content and the
response. The content is usually restricted by the
scope of the topic to be discussed. Limitations on the
form of response are generally indicated in the
question.
• Extended response Essays: allow students:
• to select any factual information that they think is
relevant,
• to organize the answer in accordance with their
best judgment, and;
• to integrate and evaluate ideas as they deem
appropriate.
This freedom enables them to demonstrate their
ability to analyze problems, organize their ideas,
describe in their own words, and/or develop a
coherent argument.
Advantages
• Measure higher-order thinking skills.
• Extended-response essays focus on the integration
and application of thinking and problem-solving
skills.
• Enable the evaluation of writing skills.
• Easy to construct.
• Have a positive effect on students’ learning.
Disadvantages
• Time-consuming and difficult to score.
• Must be scored by a knowledgeable subject teacher,
unless only one basic response is possible to a
given question or requirement.
• Measure only a limited sample of content.
Suggestions for the construction of essay questions
• Restrict the use of essay questions to those learning outcomes
that cannot be measured satisfactorily by objective items.
• Structure items so that the student’s task is explicitly bounded.
Phrase your essay items so that students will have no doubt
about the response you’re seeking.
• For each question, specify the point value, an acceptable
response length, and a recommended time allocation.
• Employ more questions requiring shorter answers rather than
fewer questions requiring longer answers. This rule is intended
to foster better content sampling in a test’s essay items.
• Don’t employ optional questions. With several options, students
end up taking different tests that are unsuitable for comparison.
Guidelines for scoring essay items more reliably:
• Ensure that you are settled emotionally and
mentally before scoring.
• All responses to one item should be scored before
moving to the next item.
• Write out in advance a model answer to guide
yourself in grading the students’ answers.
• Shuffle exam papers after scoring each question
before moving to the next.
• The names of test takers should not be known
while scoring, to avoid bias.
2.3.5. Table of Specification and Arrangement of Items

Table of Specification
• The validity, reliability, and usability of a test depend
on the care with which the test is planned and
prepared.
• Planning a classroom test involves identifying the
instructional objectives stated earlier and the
subject matter (content) covered during the
teaching/learning process.
• This planning leads to the preparation of a table of
specification (the test blueprint) for the test.
Guides in planning a classroom test.
i. Determine the purpose of the test;
ii. Describe the instructional objectives and content to be
measured;
iii. Determine the relative emphasis to be given to each learning
outcome;
iv. Select the most appropriate item formats (essay or objective);
v. Develop the test blue print to guide the test construction;
vi. Prepare test items that are relevant to the learning outcomes
specified in the test plan;
vii. Decide on the pattern of scoring and the interpretation of result;
viii. Decide on the length and duration of the test, and
ix. Assemble the items into a test, prepare direction and administer
the test.
• A table of specification is a two-way table that
matches the objectives and content
• A table of specification guides the selection of test
items that measures a representative sample of
contents.
• Developing a table of specification involves:
-Preparing a list of learning outcomes, i.e. the type
of performance students are expected to
demonstrate
-Outlining the contents of instruction, i.e. the area
in which each type of performance is to be shown,
and
-Preparing the two way chart that relates the
learning outcomes to the instructional content.
Example table of specification (columns are the instructional
objectives; cells are numbers of items):

Contents                                     | Knowledge | Comprehension | Application | Analysis | Synthesis | Evaluation | Total | Percent
Assessment: Concept, Purpose, and Principles |     2     |       2       |      1      |    1     |     -     |     -      |   6   |  24%
Assessment strategies                        |     1     |       1       |      1      |    1     |     -     |     -      |   4   |  16%
Item analysis                                |     2     |       2       |      1      |    1     |     -     |     1      |   7   |  28%
Interpretation of scores                     |     1     |       2       |      1      |    -     |     1     |     -      |   5   |  20%
Ethical standards of assessment              |     1     |       1       |      -      |    1     |     -     |     -      |   3   |  12%
Total                                        |     7     |       8       |      4      |    4     |     1     |     1      |  25   | 100%
Percent                                      |    28%    |      32%      |     16%     |   16%    |    4%     |    4%      |       |
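The blueprint's content weights translate into item counts by simple multiplication; the sketch below mirrors the example table (the code itself is illustrative, not part of the course material):

```python
# Allocate a fixed number of test items across content areas by
# percentage weight, as a table of specification does.
# The weights mirror the example blueprint above.
content_weights = {
    "Assessment: concept, purpose, and principles": 0.24,
    "Assessment strategies": 0.16,
    "Item analysis": 0.28,
    "Interpretation of scores": 0.20,
    "Ethical standards of assessment": 0.12,
}
total_items = 25
allocation = {area: round(w * total_items) for area, w in content_weights.items()}
print(allocation["Item analysis"])  # 7 items, matching the table
print(sum(allocation.values()))     # 25
```

The same multiplication can be applied down the objective columns, so that every cell of the blueprint is fixed before any item is written.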
Purpose of table of specification
• Identify clearly the scope emphasized by the test
• Relate the objectives to the content
• Use to balance construction of the test
• Prevent testing only those content areas and
taxonomic levels where it is easy to develop
test items
Arrangement of test items
Items can be arranged depending on their purposes.
In an achievement test, the arrangement may be
based on:
• The type of items used
• The learning outcomes measured
• The difficulty of the items, and
• The subject matter measured
Administration of Tests
• It refers to the procedure of presenting the learning
task to the students to ascertain the degree of their
learning
• In administration of tests, all examinees must be
given fair chance to demonstrate their achievement
of the learning outcomes being measured
• This requires the provision of suitable physical and
psychological environment that help them perform
their best
Physical Conditions are the physical setting of the environment such
as
• Adequate place and seat in the room,
• Appropriate light, Ventilation, (comfortable temperature),
• Noise free area and the like.
Psychological conditions
• Psychological conditions are concerned with the learners’
psychological conditions such as:
• The examinees should be relaxed as much as possible.
• Students should be informed of the test in advance.
• Adequate time should be provided to complete the test.
• Do not communicate at all once the examination has
started, and avoid unnecessary interruption.
• Do not compromise with cheating.
• Avoid unnecessary noise.
There are a number of conditions that may create
test anxiety on students and therefore should be
taken care of during test administration. These
include:
• Threatening students with tests if they do not
behave
• Warning students to do their best “because the
test is important”
• Telling students they must work fast in order to
finish on time.
• Threatening direct consequences if they fail
Ensuring Quality in Test Administration
Guidelines and steps that ensure quality in test
administration:
• Make proper seating arrangements to prevent cheating.
• Ensure orderly and proper distribution of question papers
to the test takers.
• Time should not be wasted with unnecessary remarks,
instructions, or threatening learners; this may develop
test anxiety.
• Remind test takers to avoid malpractices before they start
and make it clear that cheating will be penalized.
• avoid giving hints to test takers who ask about particular
items.
• Keep interruptions during the test to a minimum.
Credibility and Civility in Test Administration
• Credibility and Civility are aspects of characteristics of
assessment which have day to day relevance for developing
educational communities.
• Credibility deals with the value the eventual recipients and users
of the results of assessment place on the result with respect to
the grades obtained, certificates issued or the issuing institution.
• While civility on the other hand enquires whether the persons
being assessed are in such conditions as to give their best
without hindrances and burdens in the attributes being
assessed and whether the exercise is seen as integral to or as
external to the learning process.
• Hence, in test administration, effort should be made to see that
the test takers are given a fair and unaided chance to
demonstrate what they have learnt with respect to:
• Instruction
• Duration of test
• Venue and Sitting Arrangement: The test environment should be
learner friendly, with adequate physical conditions such as work
space, good and comfortable writing desks, proper lighting, good
ventilation, moderate temperature, conveniences within reasonable
distance, and the serenity necessary for maximum concentration.
• It is important to provide enough comfortable seats, with a seating
arrangement that suits the test takers’ comfort and reduces
collaboration between them.
• Adequate lighting, good ventilation, and moderate temperature
reduce test anxiety and loss of concentration, which invariably
affect performance on the test.
• Noise is another undesirable factor that has to be adequately
controlled, both within and outside the immediate test
environment, since it affects concentration and test scores.
Factors influencing quality of tests
In the test itself
• Unclear directions:
• Reading vocabulary and sentence too difficult
• Inappropriate level of difficulty of the test items
• Poorly constructed test items
• Ambiguity
• Test items inappropriate for the outcomes being
measured
• Test too short
• Improper arrangement of items
• Identifiable pattern of answers
Some possible barriers in test items
– Ambiguous statements
– Excessive wordiness
– Difficult vocabulary
– Complex sentence structure
– Unclear instruction
– Unclear illustrative materials
– Race, ethnic and gender bias
Avoidance of Unintended Clues to the Answer
• Q) A tortoise is an_____________.
A) plant
B) mammal
C) animal
D) bird
• Q) Which one of the following instruments is used to
determine the direction of the wind?
A) Anemometer
B) Barometer
C) Hydrometer
D) Wind Vane
Some common clues in test items:
• Grammatical inconsistencies
• Verbal associations
• Specific determiners (e.g. always, never, all,
etc)
• Phrasing of incorrect responses
• Length of correct responses
• Location of correct responses
Preparing Directions for the Test or Assessment
• Purpose of the test or assessment
• Time allowed for completing the test or
performing the task
• Directions for responding
• How to record the answers
• The basis for scoring open-ended or extended
responses
Unit 3: Item Analysis

3.1 Introduction
• In the previous unit you learned about various
assessment strategies, planning, construction and
administration of classroom tests.
• In this unit, you are going to learn the techniques
of analyzing responses to test items to determine
their validity and reliability.
• You will also learn about the advantages and
techniques of test item banking.
Learning Outcomes

At the end of this unit you will be able to:
• Define item difficulty and discrimination indices
• Analyze items using difficulty and discrimination
indices.
• Analyze distracters of multiple choice items
• Improve item qualities through response analysis
• Select items for different purposes
• Bank test items for future use.
3.2. Item Analysis

• Item analysis is the process of analyzing testees’
responses to each item with the basic intent of
judging the quality of the items.
• Item analysis helps to determine the adequacy of
the items within a test as well as the adequacy of
the test as a whole.
• Reasons for analyzing test questions include the
following:
• Identify content that has not been adequately
covered and should be re-taught
• Provide feedback to students
• Determine if any items need to be revised in the
event they are to be used again or become part of
an item file or bank
• Identify items that did not function as intended
• Direct the teacher's attention to individual student
weaknesses
• The results of item analysis provide information
about the difficulty of the items and the ability of
the items to discriminate between better and
poorer students
• The statistics used in an item analysis are:
• Item-difficulty index (p): the proportion of
examinees who answered the item correctly.
• Item-discrimination index (D).
• Effectiveness of the distractors.
3.2.1. Item difficulty level index
Item analysis procedures
• Rank the papers in order from the highest to the
lowest score.
• Divide the test papers into two groups, upper and
lower, on the basis of their scores.
• Then take 27% of the test papers from the upper
group (high achievers) and 27% from the lower
group (low achievers).
• For each test item, tabulate the number of students in
the upper and lower groups who selected each option.
• Compute the difficulty of each item (the percentage of
students who got the item right).
• The item difficulty index can be calculated using the
formula:

  p = R / T

  (R = number of examinees who answered the item
  correctly, T = total number of examinees), or, when
  only the upper and lower groups are used,

  p = (Uc + Lc) / T

  (Uc and Lc = correct answers in the upper and lower
  groups, T = total papers in the two groups).
• The difficulty indexes can range between 0.0 and
1.0 and are usually expressed as a percentage.
• A higher value indicates that a greater
proportion of examinees responded to the item
correctly, and it was thus an easier item.
• an average difficulty of .60 is ideal.
• For example: If 243 students answered item no.
1 correctly and 9 students answered incorrectly,
the difficulty level of the item would be 243/252
or .96.
• For instance, if the number of students in the class is 70, the item analysis is computed taking 27% of the upper and lower groups:

  70 x 0.27 = 18.9, which rounds to 19

• Then use 19 test papers from the upper group and 19 test papers from the lower group; together this accounts for 38 papers.
• Activity: Calculate the item difficulty level for the following four-option multiple choice test item. (The sign (*) shows the correct answer.)

                  Response Options
  Groups         A    B    C    D*   Total
  High Scorers   0    1    1    8    10
  Low Scorers    1    1    5    3    10
  Total          1    2    6    11   20
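As a check on the activity above, the difficulty index can be computed directly from the tabulated responses. The following is a minimal Python sketch (the function name is our own; the counts are taken from the activity table):

```python
def difficulty_index(correct_upper, correct_lower, n_upper, n_lower):
    """Item difficulty P = proportion of examinees answering correctly,
    computed from the upper- and lower-group tallies."""
    return (correct_upper + correct_lower) / (n_upper + n_lower)

# Option D is keyed correct: 8 high scorers and 3 low scorers chose it.
p = difficulty_index(correct_upper=8, correct_lower=3, n_upper=10, n_lower=10)
print(p)  # 0.55 -> average difficulty (between .25 and .75)
```

So for this item P = 11/20 = 0.55, which the interpretation table that follows classifies as average difficulty.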
• Item difficulty interpretation

  P-Value             Percent Range   Interpretation
  0.75 or above       75-100          Easy
  0.25 or below       0-25            Difficult
  between .25 & .75   26-74           Average

• For criterion-referenced tests (CRTs), with their
emphasis on mastery-testing, many items on an
exam form will have p-values of .9 or above.
• Norm-referenced tests (NRTs), on the other hand,
are designed to be harder overall and to spread
out the examinees’ scores. Thus, many of the
items on an NRT will have difficulty indexes
between .4 and .6
• 3.2.2. Item discrimination index
• It is a numerical indicator that enables us to determine whether the question discriminates appropriately between lower-scoring and higher-scoring students.
• Formula: D = (UC - LC) / (T/2)
  where UC = the number in the upper group who answered correctly, LC = the number in the lower group who answered correctly, and T = the total number of students in both groups.

Compute the discrimination index for the above item


• The interpretation of an item in relation to item
discrimination power can be seen as follows:
• Discrimination index ranges from negative one
(-1) to positive one (1)
• An item has maximum positive discriminating
power if all pupils from the upper group got the
item right and all from the lower group miss it.
Activity:
• An item has zero discriminating power if ______________.
• The item has negative discriminating power if ___________
In general,
• The higher the discrimination index the better the item is.
• An item is considered to have average discriminating power as its index approaches 0.5.
• An item with a maximum positive discriminating power
would be ________ where all pupils in the upper group got
the item right and all the pupils in the lower group got the
item wrong.
• An item is considered acceptable if its discrimination index
is 0.30 or higher.
• Item discrimination interpretation

  D value          Direction   Strength
  > +0.4           positive    strong
  +0.2 to +0.4     positive    moderate
  -0.2 to +0.2     none        none
• For a small group of students, an index of
discrimination for an item that exceeds 0.20 is
considered satisfactory.
• For larger groups, the index should be higher
because more difference between groups would
be expected.
• For very easy or very difficult items, low
discrimination levels would be expected;
• For items with a difficulty level of about 70
percent, the discrimination should be at least .30
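The discrimination formula can be checked on the earlier multiple-choice activity (a minimal Python sketch; the function name is our own, and the counts come from the activity table where option D was keyed correct):

```python
def discrimination_index(correct_upper, correct_lower, group_size):
    """D = (UC - LC) / (T/2), where group_size is the size of one group (T/2)."""
    return (correct_upper - correct_lower) / group_size

# Upper group: 8 of 10 correct; lower group: 3 of 10 correct.
d = discrimination_index(correct_upper=8, correct_lower=3, group_size=10)
print(d)  # 0.5 -> positive and strong (> +0.4), so the item is acceptable
```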
3.2.3. Distractor Analysis
• Distractor effectiveness is determined by inspection or observation.
• In most cases there is no need to calculate an index. An effective distractor is one that attracts more students from the lower group than from the upper group.
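The inspection rule above can also be automated: for each incorrect option, compare how many lower-group and upper-group students chose it. A minimal Python sketch, using the counts from the earlier activity (the "needs review" label is our own shorthand for a distractor that fails the rule):

```python
# Counts of students choosing each option, as (upper group, lower group).
# Data from the earlier multiple-choice activity; option D is the keyed answer.
responses = {"A": (0, 1), "B": (1, 1), "C": (1, 5), "D": (8, 3)}
key = "D"

verdicts = {}
for option, (upper, lower) in responses.items():
    if option == key:
        continue  # the keyed answer is not a distractor
    # Effective distractor: attracts more lower-group than upper-group students.
    verdicts[option] = "effective" if lower > upper else "needs review"
print(verdicts)
```

Here options A and C pass the rule, while B (chosen equally by both groups) would be flagged for review.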
• Project Work
• In the school where you are placed for your
Practicum activities, take corrected exam papers
of 1 section from the cooperating teacher and by
taking 10 multiple choice questions:
• calculate the difficulty level of each item
• calculate the discrimination power of each item
• analyze the plausibility of the distractors
• Present your work in the form of a report.
3.2.4 Item Banking

• Building a file of effective test items and assessment tasks involves recording the items, adding information from analyses of students' responses, and
• filing the records by both the content area and the objective that the item or task measures.
-

Group Assignment (10%)


1. Compute the difficulty and discrimination levels
2. Judge the effectiveness of the distractors

            Alternatives
  Group    A*   B    C    D

  Upper    8    12   6    4

  Lower    5    10   7    8
Unit 4: Interpretation of Scores

• Introduction
• In unit three you learned how to analyze test items in order to judge the quality of each test item and of the overall test. In this unit you are going to be familiarized with the idea of test score interpretation and the major statistical techniques that can be used to interpret test scores.
Particularly, you will learn about the methods of
interpreting test scores, measures of central
tendency, measures of dispersion or variability,
measures of relative position, and measures of
relationship or association.
• Test interpretation is a process of assigning meaning.
• It is necessary because the raw score obtained from
a test standing on itself rarely has meaning.
• For instance, a score of 60% in one test cannot be
said to be better than a score of 50% obtained by the
same test taker in another test of the same subject.
• The test scores on their own lack a true zero point
and equal units.
• Moreover, they are not based on the same standard
of measurement and as such meaning cannot be
read into the scores on the basis of which academic
and psychological decisions may be taken.
• 4.1 Kinds of scores
• Data differ in terms of what properties of the real number
series (order, distance, or origin) we can attribute to the
scores.
The most common kinds of scores include nominal, ordinal,
interval, and ratio scales.
• A nominal scale involves the assignment of different numerals to categories that are qualitatively different.
• For example, we may assign the numeral 1 for males and
2 for females.
• These symbols do not have any of the three
characteristics (order, distance, or origin) we attribute to
the real number series.
• The 1 does not indicate more of something than the 2.
• An ordinal scale has the order property of a real
number series and gives an indication of rank
order.
• For example, ranking students based on their
performance would involve an ordinal scale.
• We know who is best, second best, third best,
etc.
• But the ranks do not tell us anything about the differences between the scores
• With interval data we can interpret the
distances between scores.
• If, on a test with interval data, Almaz has a score of 60, Abebe a score of 50, and Beshatu a score of 30, we could say that the distance between Abebe's and Beshatu's scores (50 to 30) is twice the distance between Almaz's and Abebe's scores (60 to 50).
• If one measures with a ratio scale, the ratio of the scores has meaning. Thus, a person whose height is 2 meters is twice as tall as a person whose height is 1 meter.
• We can make this statement because a
measurement of 0 actually indicates no height.
That is, there is a meaningful zero point. However,
if a student scored 0 on a spelling test, we would
not interpret the score to mean that the student
had no spelling ability.
4.2 Methods of Interpreting test scores

• A raw score is a numerical summary of a student's test performance; it is not meaningful without further information.
• A raw score can be given meaning either by converting it into a description of the specific tasks the student can perform (criterion-referenced interpretation) or
• by converting it into some type of derived score that indicates the student's relative position in a clearly defined reference group (norm-referenced interpretation).
Criterion referenced interpretation
• Criterion-referenced interpretation is the interpretation of a raw test score based on the conversion of the raw score into a description of the specific tasks that the learner can perform.
• A score is given meaning by comparing it to a standard of performance that is set before the test is given.
• It permits the description of a learner's test performance without referring to the performance of others.
• A pupil's performance is described in terms of the speed with which a task is performed, the precision with which a task is performed, or the percentage of items correct on some clearly defined set of learning tasks.
• It typically involves designing a test that
measures a set of clearly stated learning tasks.
• Enough items are used for each interpretation
to make it possible to describe test performance
in terms of students’ mastery or non-mastery of
learning tasks.
Norm referenced test interpretation
• It is the interpretation of raw score based on the
conversion of the raw score into some type of
derived score that indicates the learner’s relative
position in a clearly defined referenced group.
• This type of interpretation tells us how an
individual compares with other persons who have
taken the same test.
• Norm-referenced interpretation is usually used in classroom tests to rank the test takers' raw scores from highest to lowest.
• It is interpreted by noting the position of an
individual’s score relative to that of other test takers
in the classroom test.
• In this type of test score interpretation, what is
important is a sufficient spread of test scores to
provide reliable ranking.
• The percentage score, or the relative ease or difficulty of the test, is not necessarily important in the interpretation of test scores in terms of relative performance.
4.2.1 Measures of Central Tendency

• The goal of measures of central tendency is to come up with the one single score that best describes a distribution of scores.
• It helps to show whether the distribution tends to be composed of high scores or low scores.
• There are three basic measures of central tendency – the
mean, the mode and the median –
• choosing one over another depends on two different
things:
• 1. The scale of measurement used, so that a summary
makes sense given the nature of the scores.
• 2. The shape of the frequency distribution, so that the
measure accurately summarizes the distribution.
• The Mean
• The mean, or arithmetic average, is the most
widely used measure of central tendency.
• The mean takes into account the value of each
score, and so one extremely high or low score
could have a considerable effect on it.
• It is helpful to know the mean because then
you can see which numbers are above and
below the mean.
• Here is an example of test scores for a math class: 82, 93, 86, 97, 82.
• To find the mean, first you must add up all of the numbers (82+93+86+97+82 = 440).
• Now, since there are 5 test scores, we next divide the sum by 5 (440 ÷ 5 = 88).
• Thus, the mean is 88. The formula used to compute the mean is:

  Mean (X̄) = ΣX / N
The Median
• The mean may not be the best indicator if one or a few students score lower (or higher) than the other students; their scores tend to pull the mean in their direction.
• In this case the median is usually considered a better indicator of student performance.
• There are also some types of scores reported for standardized tests for which the mean is not appropriate (e.g., percentile scores).
• The median is a counting average; the number that
divides a distribution of scores exactly in half.
• It is determined by arranging the scores in order of
size and counting up to (or down to) the midpoint
of the set scores.
• The median will usually be around where most
scores fall. When the number of scores is odd, the
median is the middle score. If the number of scores
is even, the median will be halfway between the
two middle most scores. In this case the median is
not an actual score earned by one of the students.
The Mode
• It is the score that occurs most frequently and is
determined by inspection.
• It is the least reliable type of statistical average
and is frequently used merely as a preliminary
estimate of central tendency.
• A set of scores may sometimes have two or more modes; such distributions are called bimodal or multimodal, respectively.
Shape of Distributions: Skewness
• all three measures of central tendency are
identical when a distribution is symmetrical,
• That means the right half of the distribution is the
mirror image of the left half of the distribution.
Shape of distribution of scores
• In a positively-skewed distribution most of the scores
concentrate at the low end of the distribution.
Example, difficult test
• In a negatively-skewed distribution, the majority of
scores are toward the high end of the distribution.
Example, easy test
NB
• With perfectly bell shaped distributions, the mean,
median, and mode are identical.
• With positively skewed data, the mode is lowest,
followed by the median and mean.
• With negatively skewed data, the mean is lowest,
followed by the median and mode.
4.2.2 Measures of Variability/Dispersion

• Measures of variability (dispersion) show how


scores are scattered around the particular
values.
• The three most commonly used measures of
variability are the range, the quartile deviation,
and the standard deviation.
Eg. Consider the following three sets of scores with equal means but different dispersions.

  No     Set 1   Set 2   Set 3
  1      10      10      19
  2      10      12      16
  3      10      11      13
  4      10      11      10
  5      10      10      7
  6      10      9       4
  7      10      7       1
  Mean   10      10      10

• All sets of scores have the same mean.
• The variability in set 1 is zero, because all seven scores are equal.
• The dispersion of scores in set 3 is greater than in set 2.
Range
• It is the simplest and crudest measure of
variability calculated by subtracting the lowest
score from the highest score. Range = X max – X min
• E.g., the range for Set 1 ____, Set 2 ____, Set 3 _____
• The range provides a quick estimate of variability
but is undependable because it is based on the
position of the two extreme scores
Inter quartile range
• The interquartile range (IQR) measures the data in terms of quarters or percentiles.
• IQR is the distance between the 25th and 75th
percentile or the first and third quarter.
• The range of data is divided into four equal
percentiles or quarters (25%).
• IQR is the range of the middle 50% of the data. Because it uses the middle 50%, it is not affected by outliers or extreme values.
• The IQR is often used with skewed data as it is
insensitive to the extreme scores.
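The range and IQR can be sketched in Python using Set 2 from the earlier example. Note that several conventions exist for computing quartiles; `statistics.quantiles` with its default "exclusive" method is used here, and other conventions can give slightly different IQR values:

```python
import statistics

scores = [10, 12, 11, 11, 10, 9, 7]  # Set 2 from the earlier example

# Range: highest score minus lowest score.
rng = max(scores) - min(scores)  # 12 - 7 = 5

# Quartiles: the three cut points that divide the data into four groups.
q1, q2, q3 = statistics.quantiles(scores, n=4)
iqr = q3 - q1  # range of the middle 50% of the data
print(rng, iqr)
```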
The Standard Deviation

• The standard deviation is the most useful measure of variability, or spread of scores.
• It is essentially an average of the degree to which a set of scores deviates from the mean.
• If the standard deviation is large, it means the numbers are spread out from their mean.
• If the standard deviation is small, it means the numbers are close to their mean.
• Because it takes into account the amount that each score deviates from the mean, it is a more stable measure of variability than either the range or the quartile deviation.
• Variance is the average squared deviation from the mean of a set of data.
• Variance reflects the degree of variability in a group of scores. It is used to find the standard deviation.
• Symbolically, Variance:  S² = ΣX′² / N = Σ(X − X̄)² / N

  where S² = variance and X′ = the deviation score, that is, X′ = X − X̄

• SD is the square root of the variance:  S = √S²
• The value of SD is always positive.

• A. Deviation score method of calculating SD (S)

  Score (X)   Deviation (X − X̄)   (X − X̄)²
  10          2                    4
  7           −1                   1
  9           1                    1
  6           −2                   4
  8           0                    0

  ΣX = 40,  Σ(X − X̄)² = 10
  Mean (X̄) = 40/5 = 8
• Raw score method

  Score (X)   X²
  10          100
  7           49
  9           81
  6           36
  8           64

• ΣX = 40,  ΣX² = 330

• S² = (ΣX² − (ΣX)²/N) / N = (330 − (40)²/5) / 5 = (330 − 320) / 5 = 10/5 = 2

• Then, Variance (S²) = 10/5 = 2
• Standard deviation (S) = √2 ≈ 1.4. This means that the scores on average vary or deviate about 1.4 units from their mean.
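Both methods can be verified in code. A minimal Python sketch using the worked example's scores (the population forms of the formulas, dividing by N, as used above):

```python
import math

scores = [10, 7, 9, 6, 8]  # the worked example above
n = len(scores)
mean = sum(scores) / n  # 40 / 5 = 8.0

# Deviation-score method: S^2 = sum((X - mean)^2) / N
variance = sum((x - mean) ** 2 for x in scores) / n  # 10 / 5 = 2.0
sd = math.sqrt(variance)

# Raw-score method gives the same result: (sum(X^2) - (sum X)^2 / N) / N
variance_raw = (sum(x * x for x in scores) - sum(scores) ** 2 / n) / n
print(variance, variance_raw, round(sd, 2))  # 2.0 2.0 1.41
```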
• Which measure of dispersion to use?
• The quartile deviation is used with the median and is
satisfactory for analyzing a small number of scores.
• Because these scores are obtained by counting and thus are
not affected by the value of each score, they are especially
useful when one or more scores deviate markedly from the
others in the set.
• The standard deviation is used with the mean. It is the most
reliable measure of variability, and is especially useful in
testing.
• In addition to describing the spread of scores in a group, it serves as a basis for computing standard scores, the standard error of measurement, and other statistics used in analyzing and interpreting test scores.
4.2.3. Measures of Relative Position

• Measures of relative position indicate where a score stands relative to the other scores in a distribution. They include:
• Percentiles
• It is a score that indicates the rank of the
student compared to others (same age or same
grade), using a hypothetical group of 100
students. It tells you what percentage of people
you did better than.
• For example, a percentile rank of 87 indicates that the student equals or surpasses 87 out of 100 students.
Converting Data Value to Percentile
1. Arrange the data in ascending order
2. Count how many items are below your value. If for
example your score is 85 and there are multiple 85’s then
count how many are under the first 85.
• For example, in the students’ scores of 76, 77, 80, 83,
85, 85, 85, 90, 96 ,97 there are 4 items below 85.
• Percentile = (number of items below your data + 0.5) / (total number of values) × 100%
• So Percentile = (4 + 0.5) / 10 × 100% = 45th percentile
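The two-step procedure above translates directly into code. A minimal Python sketch (the function name is our own) using the example scores:

```python
def percentile_rank(scores, value):
    """Percentile = (number of scores below the value + 0.5) / N * 100,
    following the formula given above."""
    below = sum(1 for s in scores if s < value)  # count items below the value
    return (below + 0.5) * 100 / len(scores)

scores = [76, 77, 80, 83, 85, 85, 85, 90, 96, 97]
print(percentile_rank(scores, 85))  # (4 + 0.5) / 10 * 100 = 45.0
```

Counting only scores strictly below the value handles ties the way the example does: with three 85s, only the 4 scores under the first 85 are counted.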
Quartiles
• Quartile is another term referred to in
percentile measure.
• The total of 100% is broken into four equal
parts: 25%, 50%, 75% 100%.
• Lower Quartile is the 25th percentile. (0.25)
• Median Quartile is the 50th percentile. (0.50)
• Upper Quartile is the 75th percentile. (0.75)
Standard Scores
• It indicates a pupil's relative position in a group by showing how far the raw score is above or below the average.
• Basically, standard scores express test
performance in terms of standard deviation
units from the mean.
• Standard scores are scores that are based on
mean and standard deviation.
Types of standard scores
• Z Score: It is used for data distributions that are approximately symmetric, as a measure of relative position.
• A z-score gives us an estimate of how many standard deviations a particular score lies from the mean.
• We define the z score as  z = (X − X̄) / s

  where X = the data value in question, X̄ = the mean, and s = the sample standard deviation
Eg, if a person scored a 70 on a test with a mean of
50 and a standard deviation of 10, then they scored
2 standard deviations above the mean.
So, a z score of 2 means the original score was 2
standard deviations above the mean.
• If the z-score is 0 then your data value is the mean
• If the z-score > 0 (positive) then your data value is
above the mean
• If the z-score < 0 (negative) then your data value is below the mean.
• Example. Almaz scored 25 on her math test. Suppose the mean for this exam is 21, with a standard deviation of 4. Dawit scored 60 on an English test which had a mean of 50 with a standard deviation of 5. Who did relatively better?
• Almaz's z-score: (25 − 21) / 4 = 1
• Dawit's z-score: (60 − 50) / 5 = 2
• Dawit did relatively better, since his score lies farther above his class mean.
• T Scores: It refers to any set of normally
distributed standard scores that has a mean score
of 50 and a standard deviation of 10.
• The T – score is obtained by multiplying the Z-
score by 10 and adding the product to 50. That is,
T – Score = 50 + 10(z).
• A score of 60 is one standard deviation above the mean, while a score of 30 is two standard deviations below the mean. How?
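The z- and T-score conversions can be sketched in a few lines of Python (function names are our own), applied to the Almaz/Dawit example:

```python
def z_score(x, mean, sd):
    """z = (X - mean) / s: distance from the mean in SD units."""
    return (x - mean) / sd

def t_score(z):
    """T = 50 + 10(z): rescale so the mean is 50 and the SD is 10."""
    return 50 + 10 * z

# Almaz: math test (mean 21, SD 4); Dawit: English test (mean 50, SD 5).
z_almaz = z_score(25, 21, 4)  # 1.0
z_dawit = z_score(60, 50, 5)  # 2.0 -> Dawit did relatively better
print(t_score(z_almaz), t_score(z_dawit))  # 60.0 70.0
```

On the T scale the answer to "How?" is visible directly: T = 60 corresponds to z = 1 (one SD above the mean), and T = 30 corresponds to z = −2 (two SDs below).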
• 4.2.4 Measures of Relationship
• When two sets of scores are obtained from the same group of people, it is often desirable to know the degree to which the scores are related.
• For example, the relationship between students' test scores in the English subject and their overall scores in other subjects.
• The degree of relationship is expressed in terms of coefficient
of correlation. The value ranges from -1.00 to +1.00.
• A perfect positive correlation is indicated by a coefficient of
+1.00 and
• A perfect negative correlation by a coefficient of -1.00.
• A correlation of .00 indicates no relationship between the
two sets of scores. Obviously, the larger the coefficient
(positive or negative), the higher the degree of relationship
expressed.
• Among the several measures of relationship, the Pearson product-moment correlation coefficient is the most commonly used and most useful.
• It is indicated by the symbol r.
• The formula for obtaining the coefficient of correlation is:

  r = Σ(X − X̄)(Y − Ȳ) / (N · Sx · Sy)

• Where, X = score of a person on one variable
  Y = score of the same person on the other variable
  X̄ = mean of the X distribution
  Ȳ = mean of the Y distribution
  Sx = standard deviation of the X scores
  Sy = standard deviation of the Y scores
  N = number of pairs of scores, OR
nxy  xy
rxy 
nx  x ny  n 
2 2 2 2

Where, n = number of pairs of scores


 = summation
x = Score of a person on one measure
y = Score of the same person on a second measure
  Students   Oral exam score (X)   Written exam score (Y)   X²    Y²      XY
  A          10                    57                       100   3249    570
  B          6                     30                       36    900     180
  C          3                     53                       9     2809    159
  D          5                     44                       25    1936    220
  E          8                     53                       64    2809    424
  F          2                     25                       4     625     50
  G          4                     33                       16    1089    132
  H          5                     48                       25    2304    240
  I          10                    59                       100   3481    590
  J          3                     35                       9     1225    105
  Sum        56                    437                      388   20427   2670

nxy  xy
rxy =
nx 2

 x  ny 2  y 
2 2

102670  56437 
=
10388  56 1020427  437 
2 2

2228
= = 0.71
74413301
• Activity: Calculate the Pearson product
moment correlation coefficient

X 44 20 40 10 11 12

Y 3 4 3 40 12 20
UNIT 5: Ethical Standards of assessment

Introduction
• This unit introduces you to ethics as a mechanism for maintaining quality in assessment practice.
• You will be familiarized with some basic standards expected of professional teachers in order to be ethical in their assessment practices.
• You will also be familiarized with some general
considerations in addressing diversity in the
classroom to make the assessment procedures
accessible and free of bias.
Learning Outcomes
Upon completion of this unit, you should be able to:
• List down ethical and professional standards of
assessment
• Propose contextualized ethical and professional
standards in using assessment
• Recognize the consequences of unethical use of assessments
• Adhere to the ethical standards of tests and test
uses
5.2. Sections and sub-sections

5.2.1 Ethical and Professional Standards of Assessment and its


use
• Ethical standards guide teachers to provide fair tests to all test takers regardless of age, gender, disability, ethnicity, religion, linguistic background, or other personal characteristics.
• Fairness is a primary consideration in all aspects of testing. It:
– helps to ensure that all test takers are given a comparable
opportunity to demonstrate what they know and how they can
perform in the area being tested.
– implies that every test taker has the opportunity to prepare for the
test and is informed about the general nature and content of the
test.
– also extends to the accurate reporting of individual and group test
results.
The following are some ethical standards that
teachers may consider in their assessment practices.
1. Teachers should be skilled in choosing appropriate
assessment methods for instructional decisions.
• Teachers need to be well-acquainted with the
kinds of information provided by a broad range of
assessment alternatives and their strengths and
weaknesses.
• In particular, they should be familiar with criteria
for evaluating and selecting assessment methods
in light of instructional plans.
2. Teachers should develop tests that meet the
intended purpose and that are appropriate for the
intended test takers. This requires teachers to:
• Define the purpose for testing, the content and skills to be
tested, and the intended test takers.
• Develop tests that are appropriate with content, skills
tested, and content coverage for the intended purpose of
testing.
• Develop tests that have clear, accurate, and complete
information.
• Develop tests with appropriately modified forms or
administration procedures for test takers with disabilities
who need special accommodations.
3. In addition to Selecting and developing good
assessment methods, teachers should also be
skilled in administering, scoring and interpreting the
results from diverse assessment methods. This
requires teachers to:
• Provide and document appropriate procedures for test
takers with disabilities who need special accommodations
or those with diverse linguistic backgrounds.
• Protect the security of test materials, including eliminating opportunities for test takers to obtain scores by fraudulent (dishonest, deceiving) means.
• Develop and implement procedures for ensuring the
confidentiality of scores.
4. Teachers should be skilled in using assessment
results when making decisions about individual
students, planning teaching, developing
curriculum, and school improvement.
5. Teachers should be skilled in developing valid
grading procedures.
• Grading students is part of professional
practice for teachers. Grading indicates both
student's level of performance and teacher's
valuing of that performance.
6. Teachers should be skilled in communicating
assessment results to students, parents, other
lay audiences, and other educators.

7. Teachers should be skilled in recognizing unethical,
illegal, and inappropriate assessment methods.
• Fairness and professional ethical behavior must
undergird (support)all student assessment activities,
from the initial planning and gathering of information
to the interpretation, use, and communication of the
results.
• Teachers must be well-versed (familiar) in their own
ethical and legal responsibilities in assessment.
• They should also attempt to have the inappropriate
assessment practices of others discontinued whenever
they are encountered.
• Teachers should also participate with the wider educational community in defining the limits of appropriate assessment practices.
principles can guide the development of a grading system.
• The system of grading should be clear and
understandable (to parents, other stakeholders, and
most especially students).
• The system of grading should be communicated to all
stakeholders (e.g., students, parents, administrators).
• Grading should be fair for all students regardless of
gender, socioeconomic status or any other personal
characteristics.
• Grading should support, enhance, and inform the
instructional process.
5.2.2. Ethnicity and Culture in tests and assessments

• All students have to be provided with equal


opportunity to demonstrate the skills and
knowledge being assessed.
• Fairness is fundamentally a socio-cultural, rather
than a technical issue.
• This section shows how culture and ethnicity may
influence teachers’ assessment practices and
what precautions one has to take to avoid bias
and be accommodative to students from all
cultural groups.
Do you believe that culture and ethnicity have any role in teachers' assessment practices? What has your experience at university been?
• Students represent a variety of cultural and
linguistic backgrounds. If the cultural and linguistic
backgrounds are ignored, students may become
alienated or disengaged from the learning and
assessment process.
• Teachers need to be aware of how such
backgrounds may influence student performance
and the potential impact on learning.
Classroom assessment practices should be sensitive to
the cultural and linguistic diversity of students in order
to obtain accurate information about their learning.
• Assessment practices that attend to issues of cultural diversity include:
• acknowledging students' cultural backgrounds.
• being sensitive to aspects of assessment that may hamper students' ability to demonstrate their knowledge and understanding.
• using this knowledge to adjust or scaffold assessment practices if necessary.
Assessment practices that attend to issues of linguistic diversity include:
• acknowledging students’ differing linguistic
abilities.
• using assessment practices in which the language
demands do not unfairly prevent the students
from understanding what is expected of them.
• using assessment practices that allow students to
accurately demonstrate their understanding by
responding in ways that accommodate their
linguistic abilities.
For an assessment task to be fair, its content,
context, and performance expectations should:
• reflect knowledge, values, and experiences that
are equally familiar and appropriate to all
students;
• tap knowledge and skills that all students have had
adequate time to acquire;
• be as free as possible of cultural and ethnic
stereotypes
5.2.3. Disability and Assessment Practices

• Different world conventions have been held and documents signed towards the implementation of inclusive education. Ethiopia has also accepted inclusive education as a basic principle guiding its policy and practice in relation to the education of students with disabilities.
• Inclusive education is based on the idea that all students, including those with disabilities, should be provided with the best possible education to develop themselves. This implies the provision of all possible accommodations to address the educational needs of students with disabilities, considering the nature of each student's disability.
• There are different strategies that can be
considered to make assessment practices
accessible to students with disabilities depending
on the type of disability. The following strategies
could be considered in summative assessments:
• Modifying assessments: - This should enable
disabled students to have full access to the
assessment without giving them any unfair
advantage.
• Others' support: Students with disabilities may need the support of others in certain assessment activities which they cannot do independently.
• Time allowances: - Disabled students should be given additional time to
complete their assessments which the individual instructor has to decide
based on the purpose and nature of the assessment.
• Rest breaks: Some students may need rest breaks during the
examination. This may be to relieve pain or to attend to personal needs.
• Flexible schedules: In some cases disabled students may require
flexibility in the scheduling of examinations. For example, some students
may find it difficult to manage a number of examinations in quick
succession
• Alternative methods of assessment:- In certain situations where formal
methods of assessment may not be appropriate for disabled students,
the instructor should assess them using non formal methods such as
class works, portfolios, oral presentations, etc.
• Assistive Technology: Specific equipment may need to be available to
the student in an examination. Such arrangements often include the use
of personal computers, voice activated software and screen readers
5.2.4 Gender issues in assessment
• The issues of gender bias and fairness in assessment
are concerned with differences in opportunities for
boys and girls. A test is biased if boys and girls with
the same ability levels tend to obtain different
scores.
Test questions should be checked for:
• material or references that may be offensive to
members of one gender,
• unequal representation of men and women as actors
in test items or representation of members of each
gender only in stereotyped roles
The End
