Ed 312 Assessment 1
ASSESSMENT IN LEARNING 1
(Ed 312)
LEARNER’S MODULE
MARIAN COLLEGE
1st SEMESTER
SY 2021-2022
Our world is experiencing an unprecedented health and economic crisis brought about by the COVID-19 pandemic. This disruption has distressed the workforce across socioeconomic strata, transforming the nature of work and the way we communicate with one another. Schools have had to make adjustments in the teaching and learning process. The Flexible Learning Modality is a proposed mechanism to continue the delivery of educational services during this period.
The Commission on Higher Education suggested three Flexible Learning Modalities, namely: online, offline, and blended.
Taking into account the availability of devices, internet connectivity, and level of digital literacy of our students,
we decided to use blended learning as our flexible mode of delivering instruction and other services. This
module is designed to cater to the needs of our students who do not have access to digital technology. Since the mode is
blended, other students have the option to avail of the online component of blended learning.
You are expected to read the contents of this module, study the examples, practice answering the “Check your
progress” portion and answer the exercises at the end of every module. I expect that you will complete one
module per week. Submit your output every FRIDAY on the designated pigeonhole boxes located at the
Entrance of High School gate.
For any queries regarding the use of this module, or if you encounter difficulty understanding the topic, please
do not hesitate to contact the undersigned at mobile phone number 09305171981. You can also reach me through my
Messenger account Guada Edulan or send an email to [email protected]
I will ask for your contact details during our course orientation so that I can personally monitor your progress in
this course. In case CHED, the LGU, and the IATF allow us to conduct in-campus/face-to-face teaching and
learning, we will inform you immediately through a text message or another medium of communication. May
Almighty God and Mother Mary, our patroness, bless us always.
Guadalupe G. Edulan
Instructor
Chapter 1
Introduction to Assessment
In Learning
To successfully describe the nature of assessment in learning, you will develop a concept map of its basic concepts and document
the experiences of teachers who apply its principles. To do so, you need to read the following information about the basic concepts,
measurement frameworks, and principles in assessing learning. You are expected to read this information before the discussion,
analysis, and evaluation when you meet the teacher face-to-face or in your virtual classroom. If the information provided in this
worktext is not enough, you can search for more information on the internet.
How is assessment in learning similar or different from the concept of measurement or evaluation of learning?
Measurement can be defined as the process of quantifying the attributes of an object, whereas evaluation may refer to the process
of making value judgments on the information collected from measurement based on specified criteria. In the context of assessment
in learning, measurement refers to the actual collection of information on student learning through the use of various strategies and
tools, while evaluation refers to the actual process of making a decision or judgment on student learning based on the information
collected from measurement. Therefore, assessment can be considered an umbrella term that encompasses measurement and
evaluation. However, some authors consider assessment as distinct and separate from evaluation and measurement
(e.g., Huba and Freed 2000, Popham 1998).
The CTT, also known as the true score theory, explains that variations in the performance of examinees on a given measure
are due to variations in their abilities. The CTT assumes that an examinee’s observed score in a given measure is the sum of the
examinee’s true score and some degree of error in the measurement caused by some internal and external conditions. Hence, the
CTT also assumes that all measures are imperfect, and the scores obtained from a measure could differ from the true score (i.e., true
ability) of an examinee.
The CTT provides an estimation of item difficulty based on the frequency or number of examinees who correctly answer
a particular item; items answered correctly by fewer examinees are considered more difficult. The CTT also provides an
estimation of item discrimination based on the number of examinees with higher or lower ability to answer a particular item. If an
item is able to distinguish between examinees with higher ability (i.e., higher total test score) and lower ability (i.e., lower test
score), then an item is considered to have good discrimination. Test reliability can also be estimated using approaches from CTT
(e.g., Kuder-Richardson 20, Cronbach’s alpha). Item analysis based on the CTT has been the dominant approach because of the
simplicity of calculating the statistics (e.g., item difficulty index, item discrimination index, item-total correlation).
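These CTT statistics are simple enough to compute by hand or with a few lines of code. The sketch below is an illustration only: the response data are hypothetical, and splitting the examinees into upper and lower halves is a simplification of the common practice of using the upper and lower 27% groups.

```python
# Illustrative CTT item analysis: difficulty and discrimination indices.
# Rows = examinees, columns = items; 1 = correct, 0 = incorrect.
# (Hypothetical data; the halves split simplifies the usual 27% rule.)

def item_difficulty(responses):
    """Proportion of examinees answering each item correctly."""
    n = len(responses)
    n_items = len(responses[0])
    return [sum(row[i] for row in responses) / n for i in range(n_items)]

def item_discrimination(responses):
    """Difference between upper-group and lower-group difficulty."""
    ranked = sorted(responses, key=sum, reverse=True)  # rank by total score
    half = len(ranked) // 2
    upper, lower = ranked[:half], ranked[-half:]
    return [u - l for u, l in
            zip(item_difficulty(upper), item_difficulty(lower))]

# Six examinees, three items
data = [
    [1, 1, 1],
    [1, 1, 0],
    [1, 0, 1],
    [1, 0, 0],
    [0, 1, 0],
    [0, 0, 0],
]
print(item_difficulty(data))      # first item has the highest p, so it is easiest
print(item_discrimination(data))  # positive values: high scorers do better on the item
```

An item answered correctly by most examinees has a difficulty index near 1.0 (an easy item), and an item whose upper group clearly outperforms the lower group has a large positive discrimination index.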
The IRT, on the other hand, analyzes test items by estimating the probability that an examinee answers an item correctly or
incorrectly. One of the central differences of IRT from CTT is that in IRT, it is assumed that the characteristic of an item can be
estimated independently of the characteristic or ability of the examinee and vice-versa. Aside from item difficulty and item
discrimination indices, IRT analysis can provide significantly more information on items and tests, such as fit statistics, item
characteristic curve (ICC), and test characteristic curve (TCC). There are also different IRT models (e.g., one-parameter model, three-
parameter model) which can provide different item and test information that cannot be estimated using the CTT. In recent years,
there has been an increase in the use of IRT analysis as a measurement framework, owing to the availability of IRT software,
despite the complexity of the analysis involved.
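As a minimal sketch of how IRT models the probability of a correct response (not a full IRT analysis; the ability and difficulty values below are hypothetical), the one-parameter (Rasch) model expresses that probability as a function of the difference between an examinee's ability and an item's difficulty:

```python
import math

def rasch_probability(theta, b):
    """One-parameter (Rasch) IRT model: probability that an examinee
    with ability theta answers an item of difficulty b correctly."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))

# When ability equals item difficulty, the probability is exactly 0.5;
# higher ability (or an easier item) raises it.
print(rasch_probability(0.0, 0.0))   # 0.5
print(rasch_probability(1.0, 0.0))   # ≈ 0.73
print(rasch_probability(-1.0, 0.0))  # ≈ 0.27
```

Plotting this probability against ability for a fixed item produces the item characteristic curve (ICC) mentioned above; the two- and three-parameter models add discrimination and guessing parameters to the same basic form.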
What are the different types of assessment in learning?
Assessment in learning could be of different types. The common types are formative, summative, diagnostic, and
placement. Other experts would describe the types of assessment as traditional and authentic.
Formative Assessment refers to assessment activities that provide information to both teachers and learners on how they
can improve the teaching-learning process. This type of assessment is formative because it is used at the beginning of and during
instruction for teachers to assess learners’ understanding. The information collected on the student learning allows teachers to
make adjustments to their instructional process and strategies to facilitate learning. Through performance reports and teacher
feedback, formative assessment can also inform learners about their strengths and weaknesses to enable them to take steps to
learn better and improve their performance as the class progresses.
Summative Assessment refers to assessment activities that aim to determine learners’ mastery of content or attainment of
learning outcomes. They are summative, as they are supposed to provide information on the quantity or quality of what students
have learned or achieved at the end of instruction. While data from summative assessment are typically used for evaluating learners’
performance in class, these data also provide teachers with information about the effectiveness of their teaching strategies and how
they can improve their instruction in the future. Through performance reports and teacher feedback, summative assessment can
also inform learners about what they have done well and what they need to improve on in their future classes or subjects.
Diagnostic Assessment aims to detect the learning problems or difficulties of the learners so that corrective measures or
interventions can be done to ensure learning. Diagnostic assessment is usually done right after seeing signs of learning problems in the
course of teaching. It can also be done at the beginning of the school year for a spirally-designed curriculum so that corrective actions
can be applied if the prerequisite knowledge and skills for the targets of instruction have not yet been mastered.
Placement Assessment is usually done at the beginning of the school year to determine what the learners already know or
what their needs are, which could inform the design of instruction. Grouping of learners based on the results of placement assessment is
usually done before instruction to make it relevant and to address the needs or accommodate the entry performance of the learners.
The entrance examination given in schools is an example of a placement assessment.
Traditional Assessment refers to the use of conventional strategies or tools to provide information about the learning of
students. Typically, objective (e.g., multiple choice) and subjective (e.g., essay) paper-and-pencil tests are used. Traditional
assessments are often used as the basis for evaluating and grading learners. They are commonly used in classrooms because they are
easier to design and quicker to score. In general, traditional assessments are viewed as an inauthentic type of assessment.
Authentic Assessment refers to the use of assessment strategies or tools that allow learners to perform or create a
product that is meaningful to them, as it is based on real-world contexts. The authenticity of assessment tasks is
best described in terms of degree rather than the presence or absence of authenticity. Hence, an assessment can be more authentic
or less authentic compared with other assessments. The most authentic assessments are those that allow performances that most
closely resemble real-world tasks or applications in real-world settings or environment.
1. Assessment should have a clear purpose. Assessment starts with a clear purpose. The methods used in collecting
information should be based on this purpose. The interpretation of the data collected should be aligned with the purpose
that has been set. This assessment principle is congruent with the outcome-based education (OBE) principles of clarity of
focus and design down.
2. Assessment is not an end in itself. Assessment serves as a means to enhance student learning. It is not a simple recording
or documentation of what learners know and do not know. Collecting information about student learning, whether
formative or summative, should lead to decisions that will allow improvement of the learners.
3. Assessment is an ongoing, continuous, and a formative process. Assessment consists of a series of tasks and activities
conducted over time. It is not a one-shot activity and should be cumulative. Continuous feedback is an important element
of assessment. This assessment principle is congruent with the OBE principle of expanded opportunity.
4. Assessment is learner-centered. Assessment is not about what the teacher does but what the learner can do. Assessment
of learners provides teachers with an understanding on how they can improve their teaching, which corresponds to the goal
of improving student learning.
5. Assessment is both process- and product-oriented. Assessment gives equal importance to learner performance or product
and the process they engage in to perform or produce a product.
6. Assessment must be comprehensive and holistic. Assessment should be performed using a variety of strategies and tools
designed to assess student learning in a holistic way. Assessment should be conducted in multiple periods to assess learning
over time. This assessment principle is also congruent with the OBE principle of expanded opportunity.
7. Assessment requires the use of appropriate measures. For assessment to be valid, the assessment tools or measures used
must have sound psychometric properties, including, but not limited to, validity and reliability. Appropriate measures also
mean that learners must be provided with challenging but age- and context-appropriate assessment tasks. This assessment
principle is consistent with the OBE principle of high expectations.
8. Assessment should be as authentic as possible. Assessment tasks or activities should closely, if not fully, approximate real-
life situations or experiences. Authenticity of assessment can be thought of as a continuum from least authentic to most
authentic, with more authentic tasks expected to be more meaningful for learners.
Week 1- a
Name:_________________________________
Subject/Time:___________________________
Instructor’s name: _______________________
Instructions: a.) Using other bond papers/sheets is strictly prohibited.
DEVELOP
To determine whether you have acquired the needed information about the basic concepts and principles in assessment,
use the space provided to draw a metaphor (i.e., any object, thing, or action you could liken assessment to) that will visually
illustrate what assessment in learning is. Everyone will share and discuss the metaphors they have drawn in class.
EXAMPLE: A thermometer can be drawn as a metaphor for assessment if you consider measurement or collection of information
from a person (i.e., student) as central in the assessment process. A thermometer is a device that collects information about a
person’s body temperature, which provides information on whether a person’s body temperature is normal or not (i.e., high
temperature could be a symptom of fever). The information is then used by medical personnel to make decisions relative to the
collected information. This is similar to the process of assessment.
Application week 1- b
Name:_________________________________
Subject/Time:___________________________
Instructor’s name: _______________________
APPLY
Based on the lessons on the basic concepts and principles in assessment in learning, select five core principles in assessing
learning and explain them in relation to your experience with a previous or current teacher in one of your courses/subjects.
EXAMPLE:
PRINCIPLE: Assessment should be as authentic as possible.
ILLUSTRATION OF PRACTICE: In our practicum course, we were asked to prepare a lesson plan and then execute the plan in front of the students, with my critic teacher around to evaluate my performance. The actual planning of the lesson and its execution in front of the class and the critic teacher is a very authentic way of assessing my ability to design and deliver instruction, rather than being assessed through a demonstration in front of my classmates in the classroom.
Given the example, continue the identification of illustrations of assessment practices guided by the principles discussed in
the class.
Share your insights on how your teacher’s assessment practices allowed you to improve your learning.
Principle Illustration of Practice
1.
2.
3.
4.
5.
Quiz #1
Name:_________________________________
Subject/Time:___________________________
Instructor’s name: _______________________
3. Assessment is not about what the teacher does but what the learner can do.
This statement is most reflective of which principle of assessment?
A. Assessment should be as authentic as possible.
B. Assessment should have a clear purpose.
C. Assessment is not an end in itself.
D. Assessment is learner-centered.
5. Assessment should have a clear purpose. If you are already a classroom teacher,
how would you best demonstrate or practice this assessment principle?
A. Discuss with the class the grading system and your expectations of your students’ performance.
B. When giving tests, the purpose of each test is provided in the first page of the test paper.
C. Explain during the first day of classes your assessment techniques and your reasons for their use.
D. When deciding on an assessment task, its match and consistency with instructional objectives and
learning targets are ascertained.
To be able to successfully prepare an assessment plan based on learning targets, you need to read the following
information about the purposes of assessing learning in the classroom, the basic qualities of effective classroom assessment,
learning targets, and use of appropriate assessment methods. You are expected to read this before discussion, analysis, and
evaluation when you meet the teacher face-to-face in your classroom.
In this lesson,
you are expected to:
explain the purpose of classroom assessment and
formulate learning targets that match appropriate assessment methods.
As discussed in the previous lesson, assessment serves as the mechanism by which teachers are able to determine whether
instruction worked in facilitating the learning of students. Hence, it is very important that assessment is aligned with instruction and
the identified learning outcomes for learners. Knowing what will be taught (curriculum content, competency, and performance
standards) and how it will be taught (instruction) are as important as knowing what we want from the very start (curriculum
outcome) in determining the specific purpose and strategy for assessment. The alignment is easier if teachers have a clear purpose
for why they are performing the assessment. Typically, teachers use classroom assessment for assessment of learning more than
assessment for learning and assessment as learning. Ideally, however, all three purposes of classroom assessment must be used.
While it is difficult to perform an assessment with all three purposes in mind, teachers must be able to understand the three
purposes of assessment, including knowing when and how to use them.
Formative. Teachers conduct assessment because they want to acquire information on the current status and level of
learners’ knowledge and skills or competencies. Teachers may need information (e.g., prior knowledge, strengths) about
the learners prior to instruction, so they can design their instructional plan to better suit the needs of the learners. Teachers
may also need information on learners during instruction to allow them to modify instruction or learning activities to help
learners achieve the learning outcomes. How teachers should facilitate students’ learning may be informed by the
information that may be acquired in the assessment results.
Diagnostic. Teachers can use assessment to identify specific learners’ weaknesses or difficulties that may affect their
achievement of the intended learning outcomes. Identifying these weaknesses allows teachers to focus on specific learning
needs and provide opportunities for instructional intervention or remediation inside or outside the classroom. The
diagnostic role of assessment may also lead to differentiated instruction or even individualized learning plans when deemed
necessary.
Evaluative. Teachers conduct assessment to measure learners’ performance or achievement for the purposes of making
judgment or grading in particular. Teachers need information on whether the learners have met the intended learning
outcomes after the instruction is fully implemented. The learners’ placement or promotion to the next educational level is
informed by the assessment results.
Facilitative. Classroom assessment may affect student learning. On the part of teachers, assessment for learning provides
information on students’ learning and achievement that teachers can use to improve instruction and the learning
experiences of learners. On the part of learners, assessment as learning allows them to monitor, evaluate, and improve
their own learning strategies. In both cases, student learning is facilitated.
Motivational. Classroom assessment can serve as mechanism for learners to be motivated and engaged in learning and
achievement in the classroom. Grades, for instance, can motivate and demotivate learners. Focusing on progress, providing
effective feedback, innovating assessment tasks, and using scaffolding during activities provide opportunities for
assessment to be motivating rather than demotivating.
What are learning targets?
Bloom’s taxonomies of educational objectives provide teachers with a structured guide in formulating more specific learning targets
as they provide an exhaustive list of learning objectives. The taxonomies serve not only as a guide for teachers’ instruction but also
as a guide for their assessment of students’ learning in the classroom. Thus, it is imperative that teachers identify the levels of
expertise that they expect the learners to achieve and demonstrate. This will then inform the assessment method required to
properly assess student learning. It is assumed that a higher level of expertise in a given domain requires more sophisticated
assessment methods or strategies.
Table 2.2 Cognitive Process Dimensions in the Revised Bloom’s Taxonomy of Educational Objectives
Cognitive Level | Description | Illustrative Verbs | Sample Objective
Create | Combining parts to make a whole | compose, produce, develop, formulate, devise, prepare, design, construct, propose, reorganize | Propose a program of action to help solve Metro Manila’s traffic congestion.
Evaluate | Judging the value of information or data | assess, measure, estimate, evaluate, critique, judge | Critique the latest film that you have watched. Use the critique guidelines and format discussed in the class.
Analyze | Breaking down information into parts | analyze, calculate, examine, test, compare, differentiate, organize, classify | Classify the following chemical elements based on some categories/areas.
Apply | Applying the facts, rules, concepts, and ideas in another context | apply, employ, practice, relate, use, implement, carry out, solve | Solve the following problems using the different measures of central tendency.
Understand | Understanding what the information means | describe, determine, interpret, translate, paraphrase, explain | Explain the causes of malnutrition in the country.
Remember | Recognizing and recalling facts | identify, list, name, underline, recall, retrieve, locate | Name the 7th president of the Philippines.
Table 2.3 Knowledge Dimensions in the Revised Bloom’s Taxonomy of Educational Objectives
Knowledge | Description | Sample Question
Factual | This type of knowledge is basic in every discipline. It tells the facts or bits of information one needs to know in a discipline. This type of knowledge usually answers questions that begin with “who”, “where”, and “when”. | What is the capital city of the Philippines?
Conceptual | This type of knowledge is also fundamental in every discipline. It tells the concepts, generalizations, principles, theories, and models that one needs to know in a discipline. This type of knowledge usually answers questions that begin with “what”. | What makes the Philippines the “Pearl of the Orient Sea”?
Procedural | This type of knowledge is also fundamental in every discipline. It tells the processes, steps, techniques, methodologies, or specific skills needed in performing a specific task that one needs to know and be able to do in a discipline. This type of knowledge usually answers questions that begin with “how”. | How do we develop items for an achievement test?
Metacognitive | This type of knowledge makes the discipline relevant to one’s life. It makes one understand the value of learning in one’s life. It requires reflective knowledge and strategies on how to solve problems or perform a cognitive task through an understanding of oneself and context. This type of knowledge usually answers questions that begin with “why”. Questions that begin with “how” and “what” could be used if they are embedded in a situation that one experiences in real life. | Why is Engineering the most suitable course for you?
Learning Targets
A learning target is “a statement of student performance for a relatively restricted type of learning outcome that will be
achieved in a single lesson or a few days” and contains “both a description of what students should know, understand, and be able
to do at the end of instruction and something about the criteria for judging the level of performance demonstrated” (McMillan
2014). In other words, learning targets are statements of what learners are supposed to learn and what they can do because of
instruction. Compared with educational goals, standards, and objectives, learning targets are the most specific and lead to more
specific instructional and assessment activities.
Learning targets should be congruent with the standards prescribed by the program or level and aligned with the instructional
or learning objectives of a subject or course. Teachers must inform learners about the learning targets of lessons prior to classroom
instruction. The learning targets should be meaningful for the learners; hence, they must be as clear and as specific as possible. It is
suggested that learning targets be stated from the learners’ point of view, typically using the phrase “I can ….” For example, “I can
differentiate between instructional objectives and learning targets.”
With clear articulation of learning targets, learners will know what they are expected to learn during the lesson or set of
lessons. Learning targets will also inform learners what they should be able to do or demonstrate as evidence of their learning. Both
classroom instruction and assessment should be aligned with the specified learning targets of a lesson.
McMillan (2014) proposed five criteria for selecting learning targets: (1) establish the right number of learning targets (Are
there too many or too few targets?); (2) establish comprehensive learning targets (Are all important types of learning included?);
(3) establish learning targets that reflect school goals and 21st century skills (Do the targets reflect school goals and 21st century
knowledge, skills, and dispositions?); (4) establish learning targets that are challenging yet feasible (Will the targets challenge
students to do their best work?); and (5) establish learning targets that are consistent with current principles of learning and
motivation (Are the targets consistent with research on learning and motivation?).
Other experts consider a fifth type of learning target, affect, which refers to affective characteristics that students can develop and
demonstrate because of instruction. These include attitudes, beliefs, interests, and values. Some experts use disposition as an
alternative term for affect. The following is an example of an affect or disposition learning target:
I can appreciate the importance of addressing potential ethical issues in the conduct of thesis research.
There are other types of assessment, and it is up to the teachers to select the method of assessment and design
appropriate assessment tasks and activities to measure the identified learning targets.
Week 2
Name:_________________________________
Subject/Time:___________________________
Instructor’s name: _______________________
Instructions: a.) Using other bond papers/sheets is strictly prohibited.
DEVELOP
To know if you have acquired the information you need to learn in this lesson, kindly complete Tables 2.6 and 2.7.
What?
Why?
When?
Table 2.7 Relation between Educational Goals, Standards, Objectives, and Learning Targets
Goals Standards Objectives Learning
Targets
Description
Sample
Statements
Quiz #2
Name:_________________________________
Subject/Time:___________________________
Instructor’s name: _______________________
1. Which purpose of assessment aims to identify students’ needs in order to inform instruction?
A. Assessment as Learning
B. Assessment for Learning
C. Assessment of Learning
D. Assessment with Learning
4. Which of the following types of pen-and-paper test is best matched with reasoning type of learning targets?
A. Essay
B. Matching-Type
C. Multiple-Choice
D. Short-Answer
5. If you are a values education teacher who intends to design an assessment task to determine your learners’
motivation in practicing pro-environmental behaviors, which of the following assessment strategies would
best address your purpose?
A. Learners developing and producing a video of their pro-environmental advocacy
B. Learners answering an essay question on “Why Pro-environmental Behavior Matters?”
C. Learners writing individual blogs on their pro-environmental activities and why they do it.
D. Learners conducting an action research on students’ motivation in pro-environmental behaviors.
In order to plan, create, and select the appropriate kind of assessment, you need to know the characteristics of the
different types of assessment according to purpose, function, and the kind of information needed about learners. You are
expected to read this before you can create your own illustrative scenario.
The information from educational assessment at the beginning of the lesson is used by the teacher to prepare relevant
instruction for learners. For example, if the learning target is for learners to determine the by-product of photosynthesis, then the
teacher can ask learners if they know what the food of plants is. If incorrect answers are provided, then the teacher can recommend
references for them to study. If the learning target is for the learners to divide a three-digit number by a two-digit number, then the
teacher can start with a three-item exercise on the task to identify who can and cannot perform the task. For those who can do the
task, the teacher can provide more exercises; for those who cannot, necessary direct instruction can be provided. At this point of
instruction, the results of the assessment are not graded because the information is used by the teacher to prepare relevant ways
to teach.
Educational assessment during instruction is done when the teacher stops at certain parts of the teaching episode to ask
learners questions or to assign exercises, short essays, board work, and other tasks. If the majority of the learners are still unable to
accomplish the task, then the teacher realizes that further instruction is needed by the learners. The teacher continuously provides a
series of practice drills and exercises until the learners are able to meet the learning target. These drills and exercises are meant to
make learners consolidate the skill until they can execute it with ease. At this point of the instruction, the teacher should be able to
see the progress of the learners in accomplishing the task. The teacher can require the learners to collect the results of their drills
and exercises so that learners can track their own progress as well. This procedure allows learners to become active participants in
their own learning. At this point of the instruction, the results of assessment are not yet graded because the learners are still in the
process of reaching the learning target; and some learners do not progress at the same rate as the others.
When the teacher observes that the majority or all of the learners are able to demonstrate the learning target, the teacher
can now conduct the summative assessment. It is best to have a summative assessment for each learning target so that there is
evidence that learning has taken place. Both the summative and formative assessment should be aligned to the same learning
target; in this case, there should be parallelism between the tasks provided in the formative and summative assessment.
When the learners are provided with word problem-solving tasks in the summative assessment, word problem-solving tasks
should also have been given during the formative assessment. When the learners are asked to identify the parts of a book during the
summative assessment, the same exercises should have been provided during the formative assessment. For physical education, if
the final performance is a folk dance, then learners are given time to practice and a pre-final performance is scheduled to give
feedback. The final dance performance is the summative assessment, and the time for practice and pre-final performance is the
formative assessment.
Psychological assessments, such as tests and scales, are measures that determine the learner’s cognitive and non-cognitive
characteristics. Examples of cognitive tests are those that measure ability, aptitude, intelligence, and critical thinking. Affective
measures are for personality, motivation, attitude, interest, and disposition. The results of these assessments are used by the
school’s guidance counselor to perform interventions on the learners’ academic, career, and social and emotional development.
Can a teacher-made test become a standardized test? Yes, as long as it is valid, reliable, and with a standard procedure for
administering, scoring, and interpreting results.
The norm-referenced test interprets results using the distribution of scores of a sample group. The mean and standard
deviations are computed for the group. The standing of every individual in a norm-referenced test is based on how far they are from
the mean and standard deviation of the sample. Standardized tests usually interpret scores using a norm set from a large sample.
Having an established norm for a test means obtaining the normal or average performance in the distribution of scores. A
normal distribution is obtained by increasing the sample size. A norm is a standard and is based on a very large group of samples.
Norms are reported in the manual of standardized tests.
A normal distribution found in the manual takes the shape of a bell curve. It shows the number of people within a range of
scores. It also reports the percentage of people with particular scores. The norms are used to convert a raw score into standard scores
for interpretability.
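To make the conversion concrete, here is a minimal Python sketch (not from the module) of turning a raw score into standard scores using a norm group's mean and standard deviation; the norm values below are hypothetical:

```python
# Hypothetical norm values for illustration only.
NORM_MEAN = 82.0  # mean raw score of the norm group
NORM_SD = 6.0     # standard deviation of the norm group

def z_score(raw):
    """How many standard deviations a raw score lies from the norm mean."""
    return (raw - NORM_MEAN) / NORM_SD

def t_score(raw):
    """A common standard score scale with mean 50 and SD 10."""
    return 50 + 10 * z_score(raw)

print(z_score(94.0))  # 2.0 (two SDs above the mean)
print(t_score(94.0))  # 70.0
```

A learner whose raw score sits two standard deviations above the norm mean would thus be reported with a T-score of 70, which the bell curve in the manual would place above roughly 98% of the norm group.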
Week 3
Name:_________________________________
Subject/Time:___________________________
Instructor’s name: _______________________
Instructions: a.) Using other bond papers/sheets is strictly prohibited.
1. Create a graphic organizer for the different kinds of tests. You may represent your ideas inside the circles and make connections
among ideas. Explain your graphic organizer to your classmates.
2. To know more about the different kinds of assessment, complete the table by providing other specific examples of each kind of
assessment. You may use other references.
Type Example
Educational
Psychological
Paper-and-Pencil
Performance-based
Teacher-made
Standardized
Achievement
Aptitude
Speed
Power
Norm-referenced
Criterion-referenced
Quiz #3 - A
Name:_________________________________
Subject/Time:___________________________
Instructor’s name: _______________________
Instructions: a.) Using other bond papers/sheets is strictly prohibited.
A. Multiple-Choice
Read carefully each item. Choose the letter of the correct and best answer in every item.
Quiz #3 - b
B. Read each case and identify what kind of assessment is referred to.
5. A teacher made a 10-item spelling test where the word is pronounced and
the learners will write the correct spelling. What kind of assessment is used?
6. A teacher in science tested learners’ familiarity with the parts of the heart.
An illustration of a heart is provided and they need to label the parts. What
is the function of assessment?
9. The learners who got perfect scores in the science achievement test were
invited to join the science club. In this way, how was the score used?
10. The teacher in mathematics wanted to determine how well the learners
have learned about the mathematics curriculum at the end of the school
year. The Iowa test basic skills on math was administered. What kind of
assessment was administered?
Chapter
Development and Administration of Tests
Lesson 4: Planning a Written Test
To be able to learn or enhance your skills in planning for a good classroom test, you need to review your
knowledge on lesson plan development, constructive alignment, and different test formats. It is suggested that you read books and
other references in print or online that could help you design a good written test.
Bloom’s Taxonomy        Revised Bloom’s Taxonomy
Evaluation              Create
Synthesis               Evaluate
Analysis                Analyze
Application             Apply
Comprehension           Understand
Knowledge               Remember
3. Calculate the weight for each topic. Once the test coverage is determined, the weight of each topic covered in the test is
determined. The weight assigned per topic in the test is based on the relevance and the time spent to cover each topic
during instruction. The percentage of time for a topic in a test is determined by dividing the time spent for that topic during
instruction by the total amount of time spent for all topics covered in the test. For example, for a test on the Theories of
Personality for a General Psychology 101 class, the teacher spent a number of 1½-hour class sessions on each topic. As such,
the weight for each topic is as follows:
4. Determine the number of items for the whole test. To determine the number of items to be included in the test, the
amount of time needed to answer the items is considered. As a general rule, students are given 30-60 seconds for each item in
test formats with choices. For a one-hour class, this means that the test should not exceed 60 items. However, because
you also need to give time for test paper/booklet distribution and the giving of instructions, the number of items should be less,
maybe just 50 items.
5. Determine the number of items per topic. To determine the number of items per topic, the weights per
topic are considered. Thus, using the examples above, for a 50-item final test, Theories & Concepts, Humanistic Theories,
Cognitive Theories, Behavioral Theories, and Social Learning Theories will each have 5 items; Trait Theories, 10 items; and
Psychoanalytic Theories, 15 items.
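Steps 3 and 5 above can be sketched in Python. The hours per topic below are hypothetical stand-ins (the module does not show the actual hours), chosen so that the weights reproduce the 5/10/15 item split shown above, totaling 50 items:

```python
# Hypothetical hours spent per topic (illustration only).
hours = {
    "Theories & Concepts": 1.5,
    "Humanistic Theories": 1.5,
    "Cognitive Theories": 1.5,
    "Behavioral Theories": 1.5,
    "Social Learning Theories": 1.5,
    "Trait Theories": 3.0,
    "Psychoanalytic Theories": 4.5,
}
total_hours = sum(hours.values())  # 15.0
total_items = 50                   # decided in step 4

# Step 3: weight of a topic = hours for the topic / total hours.
weights = {topic: h / total_hours for topic, h in hours.items()}

# Step 5: items per topic = weight x total number of items.
items = {topic: round(w * total_items) for topic, w in weights.items()}

print(items["Trait Theories"])           # 10
print(items["Psychoanalytic Theories"])  # 15
print(sum(items.values()))               # 50
```

Rounding each product keeps the counts whole; when rounding makes the counts no longer sum to the planned total, the teacher simply adjusts one or two topics by an item.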
1. One-Way TOS. A one-way TOS maps out the content or topic, test objectives, number of hours spent, and format, number,
and placement of items. This type of TOS is easy to develop and use because it just works around the objectives without
considering the different levels of cognitive behaviors. However, a one-way TOS cannot ensure that all levels of cognitive
behaviors that should have been developed by the course are covered in the test.
2. Two-Way TOS. A two-way TOS reflects not only the content, time spent, and number of items but also the levels of
cognitive behavior targeted per test content based on the theory behind cognitive testing. For example, the common
framework for testing at present in the DepEd Classroom Assessment Policy is the Revised Bloom’s Taxonomy (DepEd,
2015). One advantage of this format is that it allows one to see the levels of cognitive skills and dimensions of knowledge
that are emphasized by the test. It also shows the framework of assessment used in the development of the test. However,
this format is more complex than the one-way format.
3. Three-Way TOS. This type of TOS reflects the features of one-way and two-way TOS. One advantage of this format is that it
challenges the test writers to classify objectives based on the theory behind the assessment. It also shows the variability of
thinking skills targeted by the test. However, it takes much longer to develop this type of TOS.
COMPUTATION
1. % of items = number of hours allotted ÷ total number of hours
= 6 ÷ 24
= 0.25 or 25%
MARIAN COLLEGE
IPIL ZAMBOANGA SIBUGAY
MIDTERM
TABLE OF SPECIFICATIONS
Subject: ASSESSMENT OF STUDENT LEARNING-1
Number of Items: 40 School Year: 2021 – 2022
[Sample two-way TOS row:] 4. Exhibit understanding of the process of applying a variety of assessment tools — 6 hours (25%), 10 items (items 26-35).
TOTAL: 24 hours, 100%, 40 items (9, 6, 5, 6, 10, and 4 items across the cognitive levels)
Prepared By:
Week 4
Name:_________________________________
Subject/Time:___________________________
Instructor’s name: _______________________
DEVELOP
Answer:___________________________________________________________________________________________
__________________________________________________________________________________________________
__________________________________________________________________________________________________
__________________________________________________________________________________________________
__________________________________________________________________________________________________
Answer:___________________________________________________________________________________________
__________________________________________________________________________________________________
__________________________________________________________________________________________________
__________________________________________________________________________________________________
__________________________________________________________________________________________________
3. When constructing a TOS where objectives are set without classifying them according to their cognitive behavior, what
format do you use?
Answer:___________________________________________________________________________________________
__________________________________________________________________________________________________
__________________________________________________________________________________________________
__________________________________________________________________________________________________
__________________________________________________________________________________________________
4. If you designed a two-way TOS for your test, what does this format have?
Answer:___________________________________________________________________________________________
__________________________________________________________________________________________________
__________________________________________________________________________________________________
__________________________________________________________________________________________________
__________________________________________________________________________________________________
Quiz #4
Name:_________________________________
Subject/Time:___________________________
Instructor’s name: _______________________
1. The instructional objective focuses on the development of learners’ knowledge. Can this objective be assessed
using the multiple-choice format?
A. No, this objective requires an essay format.
B. No, this objective is better assessed using matching type of test.
C. Yes, as multiple-choice is appropriate in assessing knowledge.
D. Yes, as multiple-choice is the most valid format when assessing knowledge.
2. You prepared an objective test format for your quarterly test in mathematics. Which of the following could NOT
have been your test objective?
A. Interpret a line graph
B. Construct a line graph
C. Compare the information presented in a line graph
D. Draw conclusions from the data presented in a line graph.
3. Teacher Myrna prepared a TOS as her guide in developing a test. Why is this necessary?
A. To guide the planning of instruction
B. To satisfy the requirements in developing a test
C. To have a test blueprint as accreditation usually requires this plan
D. To ensure that the test is designed to cover what it intends to measure
4. Ms. Zamora prepared a TOS that shows both the objectives and the different levels of cognitive behavior. What
format could she have used?
A. One-way format
B. Two-way format
C. Three-way format
D. Four-way format
5. The school principal wants the teachers to develop a TOS that uses the two-way format rather than a one-way format.
Why do you think this is the principal’s preferred format?
A. So that the different levels of cognitive behaviors to be tested are known
B. So that the formats of the test are known by just looking at the TOS
C. So that the test writer would know the distribution of the test items
D. So that objectives for instruction are also reflected in the TOS
Classroom assessments are an integral part of learners’ learning. They do more than just measure learning. They also inform the
learners what needs to be learned, to what extent, and how to learn it. They also provide the parents some feedback about
their child’s achievement of the desired learning outcomes. The schools also benefit from classroom assessments because
learners’ test results provide them with evidence-based data that are useful for instructional planning and decision-making. As such,
it is important that assessment tasks or tests are meaningful and further promote deep learning, as well as fulfill the criteria and
principles of test construction.
There are many ways by which learners can demonstrate their knowledge and skills and show evidence of their proficiencies at
the end of a lesson, unit, or subject. While authentic/performance-based assessments have been advocated as the better and more
appropriate methods in assessing learning outcomes, particularly as they assess higher-level thinking skills, traditional written
assessment methods, such as multiple-choice tests, are also considered appropriate and efficient classroom assessment tools for
some types of learning targets. This is especially true for large classes and when test results are needed immediately for some
educational decisions. Traditional tests are also deemed reliable and exhibit excellent content and construct validity.
To learn or enhance your skills in developing good and effective test items for a particular test format, you need to go back and
review your prior knowledge on different test formats; how and when to choose a particular test format that is the most appropriate
measure of the identified learning objectives and desired learning outcomes of your subject; and how to construct good and
effective items for each format.
3. Is the test matched or aligned with the course’s DLOs and the course contents or learning activities?
The assessment tasks should be aligned with the instructional activities and the DLOs. Thus, it is important that you are
clear about what DLOs are to be addressed by your test and what course activities or tasks are to be implemented to
achieve the DLOs.
For example, if you want learners to articulate and justify their stand on ethical decision-making and social responsibility
practices in business (i.e., DLO), then an essay test or a class debate is an appropriate measure or task for this learning
outcome. A multiple-choice test may be used but only if you intend to assess learners’ ability to recognize what is ethical
versus unethical decision-making practice. In the same manner, matching-type items may be appropriate if you want to
know whether your students can differentiate and match the different approaches or terms to their definitions.
4. Are the test items realistic to the students?
Test items should be meaningful and realistic to the learners. They should be relevant or related to their everyday
experiences. The use of concepts, terms, or situations that have not been discussed in the class or that they have never
encountered, read, or heard about should be minimized or avoided. This is to prevent learners from making wild guesses,
which will undermine your measurement of what they have really learned from the class.
Constructed-Response Tests require learners to supply answers to a given question or problem. These include:
Short Answer Test. It consists of open-ended questions or incomplete sentences that require learners to create an answer
for each item, which is typically a single word or short phrase. This includes the following types:
o Completion. It consists of incomplete statements that require the learners to fill in the blanks with the correct
word or phrase.
o Identification. It consists of statements that require the learners to identify or recall the terms/concepts, people,
places, or events that are being described.
o Enumeration. It requires the learners to list down all possible answers to the question.
Essay Test. It consists of problems/questions that require learners to compose or construct written responses, usually long
ones with several paragraphs.
Problem-Solving Test. It consists of problems/questions that require learners to solve problems in quantitative or non-
quantitative settings using knowledge and skills in mathematical concepts and procedures and/or other higher-order
cognitive skills (e.g., reasoning, analysis, and critical thinking).
Faulty: Which of the following is a type of statistical procedure used to test a hypothesis regarding significant relationship between
variables, particularly in terms of the extent and direction of association?
A. ANCOVA C. Correlation
B. ANOVA D. t-test
Good: Which of the following is an inferential statistical procedure used to test a hypothesis regarding a significant
association between two qualitative variables?
A. ANCOVA C. Chi-Square
B. ANOVA D. Mann-Whitney Test
2. Do not lift and use statements from the textbook or other learning materials as test questions.
3. Keep the vocabulary simple and understandable based on the level of learners/examinees.
4. Edit and proofread the items for grammatical and spelling errors before administering them to the learners.
Stem:
1. Write the directions in the stem in a clear and understandable manner.
Faulty: Read each question and indicate your answer by shading the circle corresponding to your answer.
Good: This test consists of two parts. Part A is a reading comprehension test, and Part B is a grammar/language test. Each
question is a multiple-choice test item with (5) options. You are to answer each question but will not be penalized for a
wrong answer or for guessing. You can go back and review your answers during the time allotted.
2. Write stems that are consistent in form and structure, that is, present all items either in question form or in descriptive or
declarative form.
Faulty: (1) Who was the Philippine president during Martial Law?
(2) The first president of the Commonwealth of the Philippines was ________.
Good: (1) Who was the Philippine president during Martial Law?
(2) Who was the first president of the Commonwealth of the Philippines?
3. Word the stem positively and avoid double negatives, such as NOT and EXCEPT, in a stem. If a negative word is necessary,
underline or capitalize the word for emphasis.
Faulty: What does DNA stand for, and what is the organic chemical of complex molecular structure found in all cells and
viruses and codes genetic information for the transmission of inherited traits?
Options:
1. Provide three (3) to five (5) options per item, with only one being the correct or best answer/alternative.
2. Write options that are parallel or similar in form and length to avoid giving clues about the correct answer.
Faulty: Which experimental gas law describes how the pressure of a gas tends to increase as the volume of the container
decreases? (i.e., “The absolute pressure exerted by a given mass of an ideal gas is inversely proportional to the volume it
occupies.”)
A. Boyle’s Law D. Avogadro’s Law
B. Charles’s Law E. Faraday’s Law
C. Beer-Lambert Law
Good: Which experimental gas law describes how the pressure of a gas tends to increase as the volume of the container
decreases? (i.e., “The absolute pressure exerted by a given mass of an ideal gas is inversely proportional to the volume it
occupies.”)
A. Avogadro’s Law D. Charles’s Law
B. Beer-Lambert Law E. Faraday’s Law
C. Boyle’s Law
4. Place the correct response randomly to avoid a discernible pattern of correct answers.
5. Use “None of the above” carefully and only when there is one absolutely correct answer, such as in spelling or math items.
Faulty: Who among the following became President of the Philippine Senate?
A. Ferdinand Marcos D. Quintin Paredes
B. Manuel Quezon E. All of the Above
C. Manuel Roxas
Good: Who was the first ever President of the Philippine Senate?
A. Eulogio Rodriguez D. Manuel Roxas
B. Ferdinand Marcos E. Quintin Paredes
C. Manuel Quezon
7. Make all options realistic and reasonable.
Good: Directions: Column I is a list of countries while Column II presents the continent where these countries are located.
Write the letter of the continent corresponding to the country on the line provided in Column I.
Item #1’s instruction is less preferred as it does not detail the basis for matching the stem and the response options.
2. Ensure that the stimuli are longer and the responses are shorter.
3. For each item, include only topics that are related to one another and share the same foundation of information.
Faulty: Match the following:
A B
1. Indonesia A. Asia
2. Malaysia B. Bangkok
3. Philippines C. Jakarta
4. Thailand D. Kuala Lumpur
5. Year ASEAN was established E. Manila
F. 1967
Good: On the line to the left of each country in Column I, write the letter of the country’s capital presented in Column II.
Column I Column II
1. Indonesia A. Bandar Seri Begawan
2. Malaysia B. Bangkok
3. Philippines C. Jakarta
4. Thailand D. Kuala Lumpur
E. Manila
Item #1 is considered an unacceptable item because its response options are not parallel and include different kinds of
information that can provide clues to the correct/wrong answer. On the other hand, Item #2 details the basis for matching,
and its response options include only related concepts.
4. Make the response options short, homogeneous, and arranged in logical order.
Faulty: Match the following fractions with their corresponding decimal equivalents:
A B
¼ A. 0.25
5/4 B. 0.28
7/25 C. 0.90
9/10 D. 1.25
Good: Match the following fractions with their corresponding decimal equivalents:
A B
¼ A. 0.09
5/4 B. 0.25
7/25 C. 0.28
9/10 D. 0.90
E. 1.25
Item #1 is considered inferior to Item #2 because it includes the same number of response options as that of the stimuli, thus
making it more prone to guessing.
What are the general guidelines in writing true or false items?
True or false items are used to measure learners’ ability to identify whether a statement or proposition is correct/true or
incorrect/false. They are best used when learners’ ability to judge or evaluate is one of the desired learning outcomes of the course.
There are different variations of the true or false items. These include the following:
1. T-F Correction or Modified True-or-False Question. In this format, the statement is presented with a key word or phrase
that is underlined, and the learner has to supply the correct word or phrase.
e.g., Multiple-Choice Test is authentic.
2. Yes-No Variation. In this format, the learner has to choose yes or no, rather than true or false.
e.g., The following are kinds of tests. Circle Yes if it is an authentic test and No if it is not.
Multiple Choice Test Yes No
Debates Yes No
End-of-the Term Project Yes No
True or False Test Yes No
3. A-B Variation. In this format, the learner has to choose A or B, rather than true or false.
e.g., Indicate which of the following are traditional or authentic tests by circling A if it is a traditional test and B if it is
authentic.
Traditional Authentic
Multiple Choice Test A B
Debates A B
End-of-the Term Project A B
True or False Test A B
Because true or false test items are prone to guessing, as learners are asked to choose between two options, utmost care
should be exercised in writing true or false items. The following are the general guidelines in writing true or false items:
1. Include statements that are completely true or completely false.
Faulty: The presidential system of government, where the president is both the head of state and the head of government, is
adopted by the United States, Chile, Panama, and South Korea.
Good: The presidential system, where the president is both the head of state and the head of government, is adopted by Chile.
Item #1 is of poor quality because, while the description is right, the countries given are not all correct. While South
Korea has a presidential system of government, it also has a prime minister who governs alongside the
president.
2. Use simple and easy-to-understand statements.
Faulty: Education is a continuous process of higher adjustment for human beings who have evolved physically and mentally,
which is free and conscious of God, as manifested in nature around the intellectual, emotional, and humanity of man.
Good: Education is the process of facilitating learning or the acquisition of knowledge, skills, values, beliefs, and habits.
Item #1 is somewhat confusing, especially for younger learners, because there are many ideas in one statement.
3. Refrain from using negatives—especially double negatives.
Faulty: There is nothing illegal about buying goods through the internet.
4. Avoid absolute terms such as “always” and “never.”
Faulty: The news and information posted on the CNN website is always accurate.
Good: The news and information posted on the CNN website is usually accurate.
Absolute words such as “always” and “never” restrict possibilities and make a statement true 100 percent of the
time. They are also a hint for a “false” answer.
5. Express a single idea in each test item.
Faulty: If an object is accelerating, a net force must be acting on it, and the acceleration of an object is directly proportional
to the net force applied to the object.
Good: If an object is accelerating, a net force must be acting on it.
Item #1 contains two ideas in one statement, which makes it difficult to judge as simply true or false.
6. Avoid the use of unfamiliar words or vocabulary.
Faulty: Esprit de corps among soldiers is important in the face of hardships and opposition in fighting the terrorists.
Good: Military morale is important in the face of hardships and opposition in fighting the terrorists.
Students may have a difficult time understanding the statement, especially if the phrase “esprit de corps” has not
been discussed in class. Using unfamiliar words would likely lead to guessing.
7. Avoid lifting statements from the textbook and other learning materials.
Faulty: The government should start tapping renewable sources of energy, such as ________.
Good: The government should start tapping renewable sources of energy by using turbines called ________.
Item #1 has many possible answers because the statement is very general (e.g., wind, solar, biomass, geothermal,
and hydroelectric). Item #2 is more specific and only requires one correct answer (i.e., wind).
5. Avoid grammatical clues to the correct response.
Extended-Response: How do the leopard and the tiger differ? Support your answer with details and information from the article.
Restricted-Response: Tina is preparing a demonstration to display at her school’s science fair. She needs to show the effects of salt on the buoyancy of an egg.
The following are the general guidelines in constructing good essay questions:
1. Clearly define the intended learning outcomes to be assessed by the essay test.
To design effective essay questions or prompts, the specific intended learning outcomes are identified. If the intended
learning outcomes to be assessed lack clarity and specificity, the questions or prompts may assess something other than
what they intend to assess. Appropriate directive verbs that most closely match the ability that learners should demonstrate
must be used in the prompts. These include verbs such as compose, analyze, interpret, explain, and justify, among others.
2. Refrain from using essay tests for intended learning outcomes that are better assessed by other kinds of assessment.
Some intended learning outcomes can be efficiently and reliably assessed by selected-response tests rather than by essay tests. In
the same manner, there are intended learning outcomes that are better assessed using other authentic assessments, such
as performance test, rather than by essay tests. Thus, it is important to take into consideration the limitations of essay tests
when planning and deciding what assessment method to employ for an intended learning outcome.
3. Clearly define and situate the task within a problem situation as well as the type of thinking required to answer the test.
Essay questions or prompts should provide clear and well-defined tasks to the learners. It is important to carefully choose
the directive verb, to write clearly the object or focus of the directive verb, and to delimit the scope of the task. Having clear
and well-defined tasks will guide learners on what to focus on when answering the prompts, thus avoiding responses that
contain ideas that are unrelated or irrelevant, too long, or focusing only on some part of the task. Emphasizing the type of
thinking required to answer the question will also guide students on the extent to which they should be creative, deep,
complex, and analytical in addressing and responding to the questions.
4. Present tasks that are fair, reasonable, and realistic to the students.
Essay questions should contain tasks or questions that students will be able to do or address. These include those that are
within the level of instruction/training, expertise, and experience of the students.
5. Be specific in the prompts about the time allotment and criteria for grading the response.
Essay prompts and directions should indicate the approximate time given to the students to answer the essay questions to
guide them on how much time they should allocate for each item, especially if several essay questions are presented. How
the responses are to be graded or rated should be clarified to guide the students on what to include in their responses.
Example: Consider the following score distribution: 12, 14, 14, 14, 17, 24, 27, 28, 30. Which of the following is/are
the correct measure/s of central tendency? Indicate all possible answers.
A. Mean = 20 D. Median = 17
B. Mean = 22 E. Mode = 14
C. Median = 16
Options A, D, and E are all correct answers.
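The answers above can be checked with Python's standard `statistics` module:

```python
import statistics

scores = [12, 14, 14, 14, 17, 24, 27, 28, 30]

print(statistics.mean(scores))    # 20 -> option A
print(statistics.median(scores))  # 17 -> option D
print(statistics.mode(scores))    # 14 -> option E
```

The mean is 180 ÷ 9 = 20, the median is the 5th of the nine sorted scores (17), and the mode is the most frequent score (14).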
3. Type-In Answer – This type of question does not provide options to choose from. Instead, the learners are asked to supply
the correct answer. The teacher should inform the learners at the start how their answers will be rated. For example, the
teacher may require just the correct answer or may require learners to present the step-by-step procedures in coming up
with their answers. On the other hand, for non-mathematical problem solving, such as a case study, the teacher may present
a rubric on how their answers will be rated.
Example: Compute the mean of the following score distribution: 32, 44, 56, 69, 75, 77, 95, 96. Indicate your
answer in the blank provided.
In this case, the learners will only need to give the correct answer without having to show the procedures for computation.
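As a quick check of the expected answer for this sample item:

```python
scores = [32, 44, 56, 69, 75, 77, 95, 96]
mean = sum(scores) / len(scores)  # 544 / 8
print(mean)  # 68.0
```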
Example: Lillian, a 55-year-old accountant, has been suffering from frequent dizziness, nausea, and
lightheadedness. During the interview, Lillian was obviously restless and sweating. She reported feeling stressed
and fearful of anything without any apparent reason. She could not sleep or eat well. She also started to
withdraw from family and friends, as she experienced frequent panic attacks. She also said that she was constantly
worrying about everything at work and at home. What might be Lillian’s problem? What should she do to alleviate
all her symptoms?
Problem-solving test items are a good test format as they minimize guessing, measure instructional objectives that focus on
higher cognitive levels, and can cover an extensive amount of content or topics. However, they require more time for teachers
to construct, read, and correct, and are prone to rater bias, especially when scoring rubrics/criteria are not available. It is
therefore important that good quality problem-solving test items are constructed.
The following are some general guidelines in constructing good problem-solving test items:
1. Identify and explain the problem clearly.
Faulty: Tricia was 135.6 lbs. when she started with her zumba/aerobics exercises. After three months of attending the
sessions three times a week, her weight was down to 122.8 lbs. About how many lbs. did she lose after three months?
Write your final answer in the space provided and show your computation. [This question asks “about how many” and does
not indicate whether the learners need to give the exact weight or whether they need to round off their answer and to what
extent.]
Good: Tricia was 135.6 lbs. when she started with her zumba/aerobics exercises. After three months of attending the sessions
three times a week, her weight was down to 122.8 lbs. How many lbs. did she lose after three months? Write your final
answer in the space provided and show your computation. Write the exact weight; do not round off.
2. Be specific and clear about the type of response required from the students.
Faulty: ASEANA Bottlers, Inc. has been producing and selling Tutti Fruity juice in the Philippines, aside from their Singapore
market. The sales for the juice in the Singapore market were S$5million more than those of their Philippine market in 2016,
S$3million more in 2017, and S$4.5million in 2018. If the sales in the Philippine market in 2018 were PHP35million, what were
the sales in the Singapore market during that year? [This is a faulty question because it does not specify in what currency the
answer should be presented.]
Good: ASEANA Bottlers, Inc. has been producing and selling Tutti Fruity juice in the Philippines, aside from their Singapore
market. The sales for the juice in the Singapore market were S$5million more than those of their Philippine market in 2016,
S$3million more in 2017, and S$4.5million in 2018. If the sales in the Philippine market in 2018 were PHP35million, what were the
sales in the Singapore market during that year? Provide your answer in Singapore dollars (1 S$ = PHP36.50). [This is a better item because it
specifies in what currency the answer should be presented, and the exchange rate is given.]
Faulty: VCV Consultancy Firm was commissioned to conduct a survey on the voters’ preferences in Visayas and Mindanao
for the upcoming presidential election. In Visayas, 65% are for Liberal Party (LP) candidate, while 35% are for the Nationalist
Party (NP) candidate. In Mindanao, 70% of the voters are Nationalists, while 30% are LP supporters. A survey was
conducted among 200 voters for each region. What is the probability that the survey will show a greater percentage of
Liberal Party supporters in Mindanao than in the Visayas region? [This question is faulty because it does not specify the
basis for grading the answer.]
Good: VCV Consultancy Firm was commissioned to conduct a survey on the voters’ preferences in Visayas and Mindanao
for the upcoming presidential election. In Visayas, 65% are for Liberal Party (LP) candidate, while 35% are for the Nationalist
Party (NP) candidate. In Mindanao, 70% of the voters are Nationalists, while 30% are LP supporters. A survey was
conducted among 200 voters for each region.
What is the probability that the survey will show a greater percentage of Liberal Party supporters in Mindanao than in the
Visayas region? Please show your solutions to support your answer. Your answer will be graded as follows:
0 point = for wrong answer and wrong solution
1 point = for correct answer only (i.e., without or wrong solution)
3 points = for correct answer with partial solutions
5 points = for correct answer with complete solutions
Week 5
Name:_________________________________
Subject/Time:___________________________
Instructor’s name: _______________________
Instructions: a.) Using other bond papers is strictly prohibited.
DEVELOP
Let us review what you have learned about constructing traditional tests.
To check whether you have learned the important information about constructing the traditional types of tests, please complete
the following graphical representation.
What are the types? When to use? Why choose it? How to construct?
Quiz #5
Name:_________________________________
Subject/Time:___________________________
Instructor’s name: _______________________
Instructions: a.) Using other bond papers is strictly prohibited.
b.) Answer the following items.
1. What do we call the statements of what learners are expected to do or demonstrate as a result of engaging in the learning process?
A. Desired learning outcomes C. Learning intents
B. Learning goals D. Learning objectives
2. Which of the following is NOT a factor to consider when choosing a particular test format?
A. Desired learning outcomes of the lesson
B. Grade level of students
C. Learning activities
D. Level of thinking to be assessed
3. Mr. Tobias is planning to use a traditional/conventional type of classroom assessment for his trigonometry quarterly quiz.
Which of the following test formats will he likely NOT use?
A. Fill-in-the-blank test C. Multiple-choice
B. Matching type D. Oral presentation
4. What is the type of test in which the learners are asked to formulate their own answers?
A. Alternative response test C. Multiple-choice type
B. Constructed-response type D. Selected-response type
5. What is the type of true or false test item in which the statement is presented with a key word or brief phrase that is
underlined, and the student has to supply the correct word or phrase?
A. A-B variation C. T-F substitution variation
B. T-F correction question D. Yes-No variation
6. What is the type of test item in which learners are required to answer a question by filling in a blank with the correct word
or phrase?
A. Essay test
B. Fill-in-the-blank or completion test item
C. Modified true or false test
D. Short answer test
7. What is the most appropriate test format to use if teachers want to measure learners’ higher-order thinking skills,
particularly their abilities to reason, analyze, synthesize, and evaluate?
A. Essay C. Multiple-choice
B. Matching type D. True or False
8. What is the first step when planning to construct a final exam in Algebra?
A. Come up with a table of specifications
B. Decide on the length of the test
C. Define the desired learning outcomes
D. Select the type of test to construct
9. What is the type of learning outcome that Ms. Araneta is assessing if she wants to construct a multiple-choice test for her
Philippine History class?
A. Knowledge C. Problem solving skills
B. Performance D. Product
10. In constructing a fill-in-the-blanks or completion test, what guidelines should be followed?
Answer:______________________________________________________________________________________________
_____________________________________________________________________________________________________
_____________________________________________________________________________________________________
_____________________________________________________________________________________________________
_____________________________________________________________________________________________________
In order to establish the validity and reliability of an assessment tool, you need to know the different ways of establishing
test validity and reliability. You are expected to read this before you can analyze your items.
There are different factors that affect the reliability of a measure. The reliability of a measure can be high or low, depending on the
following factors:
1. The number of items in a test – The more items a test has, the higher the likelihood of reliability. The probability of obtaining
consistent scores is high because of the large pool of items.
2. Individual differences of participants – Every participant possesses characteristics that affect their performance in a test,
such as fatigue, concentration, innate ability, perseverance, and motivation. These individual factors change over time and
affect the consistency of the answers in a test.
3. External environment – The external environment may include room temperature, noise level, depth of instruction,
exposure to materials, and quality of instruction, which could affect changes in the responses of examinees in a test.
There are different ways of determining the reliability of a test. The specific kind of reliability will depend on the (1) variable
you are measuring, (2) type of test, and (3) number of versions of the test.
The different types of reliability, how each is done, and the statistics used are indicated below.
Notice that statistical analysis is needed to determine test reliability.
1. Test-retest
How is this reliability done? You have a test, and you administer it at one time to a group of examinees. You then administer it again at another time to the same group of examinees. For tests that measure stable characteristics, such as standardized aptitude tests, the time interval between the first and second administration is not more than 6 months; the retest can be given with a minimum time interval of 30 minutes. The responses in the test should be more or less the same across the two points in time.
What statistics is used? Correlate the test scores from the first and the second administration. A significant and positive correlation indicates that the test has temporal stability over time. Correlation refers to a statistical procedure in which a linear relationship is expected between two variables. You may use the Pearson Product Moment Correlation or Pearson r because test data are usually on an interval scale (refer to a statistics book for Pearson r).

4. Test of Internal Consistency Using the Kuder-Richardson and Cronbach's Alpha Methods
How is this reliability done? This procedure involves determining whether the scores for each item are consistently answered by the examinees. After administering the test to a group of examinees, determine and record the scores for each item. The idea here is to see whether the responses per item are consistent with each other.
What statistics is used? A statistical analysis called Cronbach's alpha or the Kuder-Richardson formula is used to determine the internal consistency of the items. A Cronbach's alpha value of 0.60 and above indicates that the test items have internal consistency.

5. Inter-rater Reliability
How is this reliability done? This procedure is used to determine the consistency of multiple raters when rating scales and rubrics are used to judge performance. The reliability here refers to the similar or consistent ratings provided by more than one rater or judge using the same assessment tool. Inter-rater reliability is applicable when the assessment requires the use of multiple raters.
What statistics is used? A statistical analysis called Kendall's coefficient of concordance (Kendall's W) is used to determine whether the ratings provided by multiple raters agree with each other. A significant Kendall's W value indicates that the raters concur or agree with each other in their ratings.
You will notice in the table that statistical analysis is required to determine the reliability of a measure. The very basis
of the statistical analysis used to determine reliability is linear regression.
1. Linear regression
Linear regression is demonstrated when you have two measured variables, such as two sets of scores on a test
taken at two different times by the same participants. When the two scores are plotted in a graph (with an X- and a Y-axis),
they tend to form a straight line. The straight line formed by the two sets of scores can produce a linear regression. When
a straight line is formed, we can say that there is a correlation between the two sets of scores. This correlation is shown in
the graph below, which is called a scatterplot. Each point in the scatterplot is a respondent with two scores (one for each test).
[Scatterplot omitted; horizontal axis: Score 1.]
ƩX – Add all the X scores (Monday scores)
ƩY – Add all the Y scores (Tuesday scores)
X² – Square the value of the X scores (Monday scores)
Y² – Square the value of the Y scores (Tuesday scores)
XY – Multiply the X and Y scores
ƩX² – Add the squared values of X
ƩY² – Add the squared values of Y
ƩXY – Add all the products of X and Y
Substitute the values in the formula:

r = [10(1328) – (87)(139)] / √{[10(871) – (87)²][10(2125) – (139)²]}

r = 0.80
The value of the correlation coefficient does not exceed 1.00 or -1.00. A value of 1.00 or -1.00 indicates perfect
correlation. In tests of reliability, though, we aim for a high positive correlation, which means that there is consistency in the way
the students answered the tests.
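To check the computation above, the same summary quantities can be run through a short script. This is an illustrative sketch only (Python is my choice, not the module's, and the function name is made up); it uses the summary values from the worked example.

```python
from math import sqrt

def pearson_r_from_sums(n, sum_x, sum_y, sum_x2, sum_y2, sum_xy):
    """Pearson r computed from summary sums, as in the worked example.
    (Illustrative helper; not part of the module.)"""
    numerator = n * sum_xy - sum_x * sum_y
    denominator = sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    return numerator / denominator

# Summary values from the example (10 students; X = Monday, Y = Tuesday scores):
# ΣX = 87, ΣY = 139, ΣX² = 871, ΣY² = 2125, ΣXY = 1328
r = pearson_r_from_sums(10, 87, 139, 871, 2125, 1328)
print(round(r, 2))  # 0.8
```

A value of about 0.80 matches the result above and indicates consistent scores across the two administrations.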
Suppose a teacher administered a five-item checklist on attitude toward teaching to five students. The teacher wanted to determine if the items have internal consistency.
Student   Item 1   Item 2   Item 3   Item 4   Item 5   Total (X)   Score – Mean   (Score – Mean)²
A            5        4        4        4        1        19            2.8             7.84
B            3        3        3        3        2        15           -1.2             1.44
C            2        3        3        3        3        16           -0.2             0.04
D            1        2        3        3        3        13           -3.2            10.24
E            3        3        4        4        4        18            1.8             3.24
                                              X̅ = 16.2          Ʃ(Score – Mean)² = 22.8
Cronbach's ɑ = (n / (n − 1)) × ((σt² − Ʃσi²) / σt²)

where n is the number of items, σt² is the variance of the total scores (22.8 / 4 = 5.7), and Ʃσi² is the sum of the variances of the individual items (5.2).

Cronbach's ɑ = (5 / (5 − 1)) × ((5.7 − 5.2) / 5.7) ≈ 0.11

The internal consistency of the responses in the attitude-toward-teaching checklist is 0.11, indicating low internal
consistency.
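The same result can be reproduced with a short script. This is a hedged sketch (Python and the function name are my own, not the module's); it uses the summary quantities from the example, and the exact value, 0.1096..., rounds to 0.11, so small differences from a hand computation are due to rounding.

```python
def cronbach_alpha(n_items, total_variance, sum_item_variances):
    """Cronbach's alpha from the quantities in the worked example.
    (Illustrative helper; not part of the module.)"""
    return (n_items / (n_items - 1)) * (total_variance - sum_item_variances) / total_variance

# From the example: 5 items, total-score variance 5.7 (= 22.8 / 4),
# and sum of item variances 5.2
alpha = cronbach_alpha(5, 5.7, 5.2)
print(round(alpha, 2))  # 0.11
```

Since 0.11 is well below the 0.60 benchmark given earlier, the items show low internal consistency.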
The consistency of ratings can also be obtained using a coefficient of concordance. Kendall's W coefficient of
concordance is used to test the agreement among raters.
Below is a performance task demonstrated by five students and rated by three raters. The rubric used a scale of 1 to 4,
wherein 4 is the highest and 1 is the lowest.
Demonstration   Rater 1   Rater 2   Rater 3   Sum of Ratings      D        D²
A                  4         4         3            11            2.6      6.76
B                  3         2         3             8           -0.4      0.16
C                  3         4         4            11            2.6      6.76
D                  3         3         2             8           -0.4      0.16
E                  1         1         2             4           -4.4     19.36
                                        X̅ Ratings = 8.4              ƩD² = 33.2
The ratings given by the three raters are first summed for each demonstration. The mean of the sums of ratings is then
obtained (X̅ Ratings = 8.4) and subtracted from each sum of ratings to get D. The sum of the squared differences, ƩD², is
substituted into the Kendall's W formula, where m is the number of raters and N is the number of demonstrations.
W = 12ƩD² / [m²N(N² − 1)]

W = 12(33.2) / [3²(5)(5² − 1)]

W = 0.37
A Kendall's W coefficient value of 0.37 indicates the degree of agreement of the three raters across the five demonstrations.
There is only moderate concordance among the three raters because the value is far from 1.00.
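The computation can be verified with a short script that follows the module's procedure: summing the ratings per demonstration, taking deviations from the mean, and substituting into the W formula. Python and the function name are my own choices for illustration.

```python
def kendalls_w(ratings):
    """Kendall's W applied to the module's rating table.
    ratings: one inner list per demonstration, one rating per rater.
    (Illustrative helper; the module applies W to raw rubric ratings.)"""
    m = len(ratings[0])  # number of raters
    n = len(ratings)     # number of demonstrations
    sums = [sum(row) for row in ratings]
    mean = sum(sums) / n
    sum_d2 = sum((s - mean) ** 2 for s in sums)  # ΣD²
    return 12 * sum_d2 / (m ** 2 * n * (n ** 2 - 1))

# Ratings of the five demonstrations (A–E) by the three raters
ratings = [[4, 4, 3], [3, 2, 3], [3, 4, 4], [3, 3, 2], [1, 1, 2]]
print(round(kendalls_w(ratings), 2))  # 0.37
```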
Cases are provided for each type of validity to illustrate how it is conducted. After reading the cases and
references about the different kinds of validity, partner with a seatmate and answer the following questions. Discuss your
answers. You may use other references and browse the internet.
1. Content Validity
A coordinator in science is checking the science test paper for grade 4. She asked the grade 4 science teacher to
submit the table of specifications containing the objectives of the lesson and the corresponding items. The coordinator
checked whether each item is aligned with the objectives.
How are the objectives used when creating test items?
How is content validity determined when given the objectives and the items in a test?
What should be present in a test table of specifications when determining content validity?
Who checks the content validity of items?
2. Face validity
The assistant principal browsed through the test paper made by the math teacher. She checked whether the contents of the
items are about mathematics and examined whether the instructions are clear. She also browsed through the items to see
whether the grammar is correct and the vocabulary is within the students' level of understanding.
What can be done in order to ensure that the assessment appears to be effective?
What practices are done in conducting face validity?
Why is face validity the weakest form of validity?
3. Predictive validity
The school admissions office developed an entrance examination. The officials wanted to determine if the results of
the entrance examination are accurate in identifying good students. They took the first-quarter grades of the students
accepted and correlated the entrance exam results with those grades. They found significant and
positive correlations between the entrance examination scores and grades: the entrance examination results predicted
the grades of students after the first quarter. Thus, the examination has predictive validity.
Why are two measures needed in predictive validity?
What is the assumed connection between these two measures?
How can we determine if a measure has predictive validity?
How are the test results of predictive validity interpreted?
4. Concurrent Validity
A school guidance counselor administered a math achievement test to grade 6 students. She also had a copy of
the students' grades in math. She wanted to verify whether the math grades of the students measure the same
competencies as the math achievement test, so she correlated the math achievement scores and math grades to
determine whether they measure the same competencies.
What needs to be available when conducting concurrent validity?
At least how many tests are needed for conducting concurrent validity?
What statistical analysis can be used to establish concurrent validity?
How are the results of a correlation coefficient interpreted for concurrent validity?
5. Construct Validity
A science test was made by a grade 10 teacher composed of four domains: matter, living things, force and
motion, and earth and space. There are 10 items under each domain. The teacher wanted to determine if the 10 items
made under each domain really belonged to that domain. The teacher consulted an expert in test measurement. They
conducted a procedure called factor analysis. Factor analysis is a statistical procedure done to determine whether the
written items load under the domains to which they belong.
What type of test requires construct validity?
What should the test have in order to verify its construct?
What are constructs and factors in a test?
How are these factors verified if they are appropriate for the test?
What results come out in construct validity?
How are the results in construct validity interpreted?
P51.
The construct validity of a measure is reported in journal articles. The following are guide questions used when
searching for the construct validity of a measure from reports:
What was the purpose of construct validity?
What type of test was used?
What are the dimensions or factors that were studied using construct validity?
What procedure was used to establish the construct validity?
What statistic was used for the construct validity?
What were the results of the test's construct validity?
6. Convergent Validity
A math teacher developed a test to be administered at the end of the school year, which measures number
sense, patterns and algebra, measurements, geometry, and statistics. The math teacher assumed that students'
competencies in number sense improve their capacity to learn patterns and algebra and the other concepts. After
administering the test, the scores were separated for each area, and the five domains were intercorrelated using
Pearson r. A positive correlation means that as number sense scores increase, patterns and algebra scores also
increase. This shows that students' learning of number sense scaffolds their patterns and algebra competencies.
What should a test have in order to conduct convergent validity?
What are done with the domains in a test on convergent validity?
What analysis is used to determine convergent validity?
How are the results in convergent validity interpreted?
7. Divergent Validity
An English teacher taught metacognitive awareness strategy to comprehend a paragraph for grade 11 students.
She wanted to determine if the performance of her students in reading comprehension would reflect well in the reading
comprehension test. She administered the same reading comprehension test to another class which was not taught the
metacognitive awareness strategy. She compared the results using a t-test for independent samples and found that the
class that was taught the metacognitive awareness strategy performed significantly better than the other group. The test has
divergent validity.
What conditions are needed to conduct divergent validity?
What assumption is being proved in divergent validity?
What statistical analysis can be used to establish divergent validity?
How are the results of divergent validity interpreted?
How do we determine if an item is easy or difficult?
An item is difficult if the majority of students are unable to provide the correct answer. The item is easy if the majority
of the students are able to answer it correctly. An item can discriminate if the examinees who score high in the test
answer more items correctly than the examinees who get low scores.
Below is a dataset of five items on the addition and subtraction of integers. Follow the procedure to determine
the difficulty and discrimination of each item.
1. Get the total score of each student and arrange scores from the highest to lowest.
2. Obtain the upper and lower 27% of the group. Multiply 0.27 by the total number of students, and you will get a
value of 2.7. The rounded whole number value is 3.0. Get the top three students and the bottom 3 students based
on their total scores. The top three students are students 2, 5, and 9. The bottom three students are students 7, 8,
and 4. The rest of the students are not included in the item analysis.
3. Obtain the proportion correct for each item. This is computed for the upper 27% group and the lower 27% group
by summing the correct answers per item and dividing by the number of students in the group.
Item difficulty = (pH + pL) / 2

where pH is the proportion correct in the upper group and pL is the proportion correct in the lower group.
The difficulty index is interpreted using an interpretation table (very difficult, difficult, average, easy, very easy).
Get the results of your previous exam in the class and conduct an item analysis to determine the difficulty and
discrimination. Tabulate the results for each item below. Indicate the index of difficulty, then write whether the item is very
difficult, difficult, average, easy, or very easy. In the last column, indicate the index of discrimination and write whether it is a very
good item, good item, fair item, or poor item.
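As a worked illustration of this exercise, the two indices can be computed as follows. The sketch assumes the common convention that the discrimination index is the difference between the upper- and lower-group proportions (confirm against your interpretation table), and the sample proportions below are hypothetical.

```python
def item_difficulty(p_upper, p_lower):
    """Index of difficulty: mean of the proportions correct in the
    upper 27% and lower 27% groups (formula from the module)."""
    return (p_upper + p_lower) / 2

def item_discrimination(p_upper, p_lower):
    """Index of discrimination: upper-group minus lower-group proportion.
    (A common convention; confirm against your interpretation table.)"""
    return p_upper - p_lower

# Hypothetical item: 2 of 3 upper-group and 1 of 3 lower-group students correct
p_h, p_l = 2 / 3, 1 / 3
print(round(item_difficulty(p_h, p_l), 2))      # 0.5
print(round(item_discrimination(p_h, p_l), 2))  # 0.33
```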
Week 6
Name: __________________________________
Subject/Time: ___________________________
Instructor’s name: ________________________
Instructions: a.) Using other bond papers is strictly prohibited.
DEVELOP
A. Indicate the type of reliability applicable for each case. Write the type of reliability on the space before the number.
1. Mr. Perez conducted a survey of his students to determine their study habits. Each item is answered using
a five-point scale (always, often, sometimes, rarely, never). He wanted to determine if the responses for each
item are consistent. What reliability technique is recommended?
2. A teacher administered a spelling test to her students. After a day, another spelling test was given with the
same length and stress of words. What reliability can be used for the two spelling tests?
3. A PE teacher requested two judges to rate the dance performance of her students in physical education.
What reliability can be used to determine the reliability of the judgments?
4. An English teacher administered a 20-item test on students' use of verbs given a subject. The scores were
divided into two sets: items 1 to 10 and items 11 to 20. The teacher correlated the two sets of scores from
the same test. What reliability is done here?
5. A computer teacher gave a set of typing tests on Wednesday and gave the same set the following week.
The teacher wanted to know if the students’ typing skills are consistent. What reliability can be used?
B. Indicate the type of validity applicable for each case. Write the type of validity on the blank before the number.
1. The science coordinator developed a science test to determine who among the students will be placed in
an advanced science section. The students who scored high in the science test were selected. After two
quarters, the grades of the students in the advanced science were determined. The scores in the science
test were correlated with science grades to check if the science test was accurate in the selection of
students. What type of validity was used?
2. A test composed of listening comprehension, reading comprehension, and visual comprehension items was
administered to students. The researcher determined whether the scores in each area refer to the same
comprehension skill. The researcher hypothesized a significant and positive relationship among these factors.
What type of validity was established?
3. The guidance counselor conducted an interest inventory that measured the following factors: realistic,
investigative, artistic, scientific, enterprising, and conventional. The guidance counselor wanted to provide
evidence that the items constructed really belong to the factors proposed. After her analysis, the proposed
items had high factor loadings on the domains they belong to. What validity was conducted?
4. The technology and livelihood education teacher developed a performance task to determine student
competency in preparing a dessert. The students were tasked with selecting a dessert, preparing the
ingredients, and making the dessert in the kitchen. The teacher developed a set of criteria to assess the
dessert. What type of validity is shown here?
5. The teacher in a robotics class taught students how to create a program to make the arms of a robot move.
The assessment was a performance task of making a program to produce three kinds of robot arm
movements. The same assessment task was given to students with no robotics class, and the programming
performance of the two classes was compared. What validity was established?
Quiz #6
Name:_________________________________
Subject/Time:___________________________
Instructor’s name: _______________________
Instructions: a.) Using other bond papers is strictly prohibited.
b.) Choose the letter of the correct and best answer in every item.
Test data are better appreciated and communicated if they are arranged, organized, and presented in a clear
and concise manner. Good presentation requires designing a table that can be read easily and quickly. Tables and
graphs are common tools that help readers better understand the test results conveyed to concerned
groups such as teachers, students, parents, administrators, or researchers, and they serve as a basis for developing
programs to improve student learning.
To begin the discussion in this lesson, consider the following group of raw scores (Table 7.1) recorded from a summative test administered to
100 college students at a teacher education university:
53 30 21 42 33 41 42 45 32 58
36 51 42 49 64 46 57 35 45 51
57 38 49 54 61 36 53 48 52 49
41 58 42 43 49 51 42 50 62 60
33 43 78 52 58 45 53 40 60 33
75 66 78 52 58 45 53 40 60 33
46 45 79 34 46 43 47 37 33 64
37 36 36 46 41 43 42 47 56 62
50 53 49 39 52 52 50 37 53 40
34 43 43 57 48 43 42 42 65 35
How do we organize and present ungrouped data through tables?
As you can see in Table 7.1, the test scores are presented as a simple list of raw scores. Raw scores are easy to get
because they are obtained directly from administering a test, a questionnaire, or any inventory rating scale that
measures knowledge, skills, or other attributes of interest. But as presented in the above table, how do these numbers
appeal to you? Most likely, they look neither interesting nor meaningful.
Apparently, the data presented in Tables 7.1 and 7.2 have been condensed as a result of the grouping of scores.
Table 7.3 illustrates a grouped frequency distribution of test scores. The wide range of scores listed in Table 7.2 has
been reduced to 12 class intervals with an interval size of 5. Consider the cumulative percentage in the 5th row, for
the class interval 55–59, which is 87: we say that 87 percent of the students got a score below 60.
The second column lists the midpoints of the test scores in each class interval. As the term itself suggests, the midpoint
is the middle score, halfway between the exact limits of the interval. In Table 7.3, the midpoint of the
class interval 60–64 is 62. The exact limits of this interval are 59.5 (60 – 0.5) and 64.5 (64 + 0.5). While the data have been
condensed to appear simpler, there is a tradeoff. Looking at Table 7.3, how many students scored 48 in the test? How many
got a score of 37? While a grouped frequency distribution condenses the data, it results in a loss of information about the
individual scores themselves.
Following are some conventions in presenting test data grouped in frequency distribution:
1. As much as possible, the size of the class intervals should be equal. Class intervals that are multiples of 5, 10, 100,
etc. are often desirable. At times, when large gaps exist in the data and unequal class intervals are used, such
intervals may cause inconvenience in the preparation of graphs and computation of certain descriptive statistical
measures.
The following formula can be useful in estimating the necessary class interval size:

i = (H – L) / c

where i = size of the class intervals
H = highest test score
L = lowest test score
c = number of classes
The conventional number of classes used to group the data generally varies from 7 to 20. As seen in Table 7.3, the size of
the class interval is 5, which is an odd number; if you look at the midpoints, they are whole numbers. If the class size is
an even number, the midpoints will contain decimal numbers, which may add some difficulty to conventional
computations of some important statistical measures.
2. Start the class interval at a value which is a multiple of the class width. In Table 7.3, we used the class interval of 5
such that we start with the class value of 20, which is a multiple of 5 and where 20-24 includes the lowest test score
of 21, as seen in Table 7.1.
3. As much as possible, open-ended class intervals should be avoided, e.g., 100 and below or 150 and above. These will
cause some problems in graphing and computation of descriptive statistical measures.
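The conventions above can be sketched in a short script that groups scores into intervals of width 5, starting the first interval at a multiple of the width. This is an illustrative helper (Python and the function name are my own, not the module's); it uses the first row of the raw scores in Table 7.1.

```python
def grouped_frequency(scores, width=5):
    """Group scores into class intervals of the given width, starting at a
    multiple of the width, with midpoints and cumulative frequencies.
    (Illustrative helper; not part of the module.)"""
    start = (min(scores) // width) * width  # first interval starts at a multiple of width
    rows, cum = [], 0
    while start <= max(scores):
        end = start + width - 1
        freq = sum(1 for s in scores if start <= s <= end)
        cum += freq
        rows.append((start, end, (start + end) / 2, freq, cum))
        start += width
    return rows

# First row of the raw-score table (Table 7.1)
scores = [53, 30, 21, 42, 33, 41, 42, 45, 32, 58]
for lo, hi, midpoint, freq, cum in grouped_frequency(scores):
    print(f"{lo}-{hi}  midpoint={midpoint}  f={freq}  cf={cum}")
```

Note that with the odd width of 5, the midpoints come out as whole numbers (e.g., 22.0 for the interval 20–24), matching the convention discussed above.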
[Histogram omitted; horizontal axis: Test Scores.]
Figure 7.1. Histogram of Test Scores of College Students
The graph was automatically generated with the use of statistical software. In this case, Statistical Package for
Social Sciences (SPSS) was used. Basic steps in SPSS application include the following:
Step 1. Open the Data Editor window. It is understood that the data has already been entered into the Data editor,
following the data entry process. The assumption here is that you already know the basics of entering data into
a statistical program.
Step 2. On the menu bar, click Analyze, then go to Descriptive Statistics, then to Frequencies. This brings up the
Frequencies dialog box (screenshot not shown). The screenshot shows the Data Editor, with the menu bar across the top:
File – Edit – Data – Transform – Analyze – Graphs.
After you have clicked OK, the desired histogram will automatically be shown. The same process will be followed
in generating other types of graphs.
You may also try to organize the data using your knowledge of Excel. The following web references can be useful for
those who have never used SPSS: https://www.spss-tutorial.com/spss-data-analysis and www.statisticssolutions.com/spss-
statistics-help. More tutorials are available online.
2. Frequency Polygon. This is also used for quantitative data, and it is one of the most commonly used methods of
presenting test scores. It is the line-graph counterpart of the histogram: instead of bars, it uses lines to compare
sets of test data on the same axes. Figure 7.2 illustrates a frequency polygon.
P61.
In a frequency polygon, you have lines across the scores in the horizontal axis. Each point in the frequency
polygon represents two numbers, which are the score in the horizontal axis and the frequency of that class interval in
the vertical axis. Frequency polygons can also be superimposed to compare several frequency distributions, which
cannot be done with histograms.
You can construct a frequency polygon manually using the histogram in Figure 7.1 by following these simple steps:
1. Locate the midpoint at the top of each bar. Bear in mind that the height of each bar represents the frequency of
each class interval, and the width of the bar is the class interval. As such, the point in the middle of the top of each
bar is actually the midpoint of that class interval. In the histogram in Figure 7.1, there are two spaces without bars;
in such a case, the frequency is zero and the midpoint falls on the horizontal axis.
2. Draw a line to connect all the midpoints in consecutive order.
3. The line graph is an estimate of the frequency polygon of the test scores.
Following the above steps, we can draw a frequency polygon using the histogram presented earlier in Figure 7.1
Frequency polygons can also be drawn independently without drawing histograms. From algebra, you need
an ordered pair (x, y) to graph a point in the coordinate system. For this, the midpoints of the class intervals are used to
plot the points. The midpoints will be the x values, and the y values will be the respective frequencies in each class interval. For the data
in Table 7.3, the (x, y) values will be the pairs (Xi, f), where Xi is the midpoint and f is the frequency.
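The plotting points described above can be computed directly from the class intervals. The intervals and frequencies below are illustrative values only, not the actual data of Table 7.3:

```python
# Compute the (midpoint, frequency) plotting points of a frequency polygon.
# The class intervals and frequencies are illustrative, not the Table 7.3 data.
intervals = [(40, 44), (45, 49), (50, 54), (55, 59)]
frequencies = [8, 21, 14, 7]

# Midpoint of a class interval = (lower limit + upper limit) / 2
points = [((lo + hi) / 2, f) for (lo, hi), f in zip(intervals, frequencies)]
print(points)  # [(42.0, 8), (47.0, 21), (52.0, 14), (57.0, 7)]
```

Connecting these points in order, from left to right, traces the frequency polygon.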
3. Cumulative Frequency Polygon. This graph is quite different from a frequency polygon because the cumulative
frequencies are plotted. In addition, you plot each point above the exact upper limit of the interval. As such, a cumulative
frequency polygon gives a picture of the number of observations that fall below a certain score instead of the frequency within
a class interval.
In Table 7.3, the cumulative frequency is in the 4th column; the 5th column is its conversion to cumulative
percentage. A cumulative percentage polygon is more useful when there is more than one frequency distribution with
unequal numbers of observations.
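The 4th and 5th columns described above are built with a running sum. The frequencies below, listed from the lowest class interval upward, are illustrative values:

```python
from itertools import accumulate

# Frequencies listed from the lowest class interval to the highest (illustrative)
frequencies = [2, 6, 10, 14, 9, 7, 2]

cum_freq = list(accumulate(frequencies))      # running total from the bottom up
n = cum_freq[-1]                              # total number of scores
cum_pct = [100 * cf / n for cf in cum_freq]   # convert to cumulative percentage

print(cum_freq)  # [2, 8, 18, 32, 41, 48, 50]
print(cum_pct)   # [4.0, 16.0, 36.0, 64.0, 82.0, 96.0, 100.0]
```

Each cumulative frequency answers "how many scores fall at or below this interval," which is exactly what the ogive plots.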
The following figures show the cumulative frequency polygon and cumulative percentage polygon, respectively, of the
data in Table 7.3. These cumulative frequency polygons are useful for obtaining a number of summary measures. The graphs
display ogive (pronounced as “oh jive”) curves. Again, the images are computer-generated output from statistics
software.
4. Bar Graph. This graph is often used to present frequencies in categories of a qualitative variable. It looks very similar
to a histogram and is constructed in the same manner, but spaces are placed between consecutive bars. The
columns represent the categories, and the height of each bar, as in a histogram, represents the frequency. If
experimental data are graphed, the independent variable in categories is usually plotted on the x-axis, while the
dependent variable, the test score, is plotted on the y-axis. The bars may also be drawn horizontally. Bar graphs are very useful in
comparing the test performance of groups categorized in two or more variables. Following are some examples of bar
graphs.
Figure 7.4. Mean Scores on Mathematics Test of Pre-Service Teachers in Different Countries
As you can see in the graph above, the actual numbers appear at the top of each bar. This is done so the
reader can see the actual values, especially when values are too close to each other and there are many categories on
the baseline axis.
Figure 7.6. Students’ Competency Level in Geometry Test by Majorship
5. Box-and-Whisker Plots. This is a very useful graph depicting the distribution of test scores through their quartiles.
The first quartile, Q1, is the point in the test scale below which 25% of the scores lie. The second quartile, Q2, is the
median, which divides the scores into the upper 50% and lower 50%. The third quartile, Q3, is the point above which 25%
of the scores lie. The data on the test scores of 100 college students produced this image using the box-plot
approach.
Looking at the box-plot graph, the shaded rectangle represents the middle 50% of the test data. The line that
divides the rectangle is actually the median. The rectangle is referred to as the interquartile range box. The top side of
the rectangle is the 3rd quartile (Q3), and the bottom side is the 1st quartile (Q1). Looking at the scale on the left, more
or less, you can approximate the Q1 and Q3. As such, this type of graph will help readers easily see where the scores are
concentrated, how these scores are distributed and divided into quartiles, what score separates each quartile, as well as
the minimum and maximum values.
The whiskers are the lines that extend from the top and bottom sides of the box. These whiskers
represent the range of the bottom 25% and the top 25% of the data values, excluding outliers (i.e., the points you
see beyond the whiskers). Outliers, which may be interpreted as “outcast” data, are the extreme scores. Note that
outliers are not necessarily “bad.” They can send an important message about a certain phenomenon. For example, if you
want to exempt students from a final examination, the outliers at the top will indicate who can be exempted. At the
same time, those at the other extreme might need more attention and assistance to perform well in your class.
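The quartiles that define the box can be obtained in Python with the standard library. The scores below are illustrative (the integers 1 to 100), used only to show the mechanics:

```python
import statistics

# 100 illustrative "test scores" — the integers 1..100, not real examination data
scores = list(range(1, 101))

# n=4 cuts the data into quartiles; 'inclusive' treats the data as a whole population
q1, q2, q3 = statistics.quantiles(scores, n=4, method="inclusive")
print(q1, q2, q3)  # 25.75 50.5 75.25

iqr = q3 - q1  # the interquartile range: the height of the box in a box plot
```

In a box plot, q1 and q3 form the bottom and top sides of the rectangle, q2 is the dividing line, and the whiskers extend outward from q1 and q3.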
6. Pie Graph. One commonly used method to represent categorical data is the circle graph. You have learned in
basic mathematics that there are 360° in a full circle. As such, the categories can be represented by slices of the
circle that make it look like a pie; thus, the name pie graph. The size of each slice is determined by the percentage of
students who belong to each category. For example, in a class of 100 students, results were categorized according to
different levels: 10 students (10%) scored above average, 40 students (40%) average, 30 students (30%)
below average, and 20 students (20%) poor. These percentages determine the sizes of the slices in the full circle. A
simple calculation will show that 10% of 360° is 36°, 40% of 360° is 144°, 30% of 360° is 108°, and 20% of 360° is 72°.
You will note that the sum of these angle measures is 360°, that is, the measure of the whole circle. Making a
pie chart is very easy. You may use an ordinary protractor or compass. Also, with the use of statistical software, you
can produce an attractive chart. You need to label each portion of the pie with different shades to indicate the
categories and label the whole chart as shown below.
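The angle calculation in the example above can be sketched as follows, using the same percentages from the worked example:

```python
# Convert category percentages into pie-slice angles (in degrees).
# Percentages are taken from the worked example above.
percentages = {"above average": 10, "average": 40, "below average": 30, "poor": 20}

# Each slice's angle = (percentage / 100) * 360°
angles = {level: pct / 100 * 360 for level, pct in percentages.items()}
print(angles)                # {'above average': 36.0, 'average': 144.0, 'below average': 108.0, 'poor': 72.0}
print(sum(angles.values()))  # 360.0 — the whole circle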
What is skewness?
Examine the graphs below.
Figure 7.9. Negatively Skewed Distribution Figure 7.10. Positively Skewed Distribution
Figure 7.8 is labeled as normal distribution. Note that half the area of the curve is a mirror reflection of the
other half. In other words, it is a symmetrical distribution, which is also referred to as bell-shaped distribution. The
higher frequencies are concentrated in the middle of the distribution. A number of experiments have shown that IQ
scores, height, and weight of human beings follow a normal distribution.
The graphs in Figures 7.9 and 7.10 are asymmetrical in shape. The degree of asymmetry of a graph is its
skewness. A basic principle of the coordinate system tells you that, as you move toward the right of the x-axis, the numerical
value increases. Likewise, as you move up the y-axis, the scale value becomes higher. Thus, in a negatively-skewed
distribution, more students get higher scores, and the tail, indicating the lower frequencies, points to the left or toward
the lower scores. In a positively-skewed distribution, more students get lower scores, and the tail, indicating the lower frequencies, points to the right or toward the higher scores.
The graph in Figure 7.11 is a rectangular distribution. It occurs when the frequency of each score or class interval
is the same or almost the same, which is why it is also called a uniform distribution.
We have differentiated the four graphs in terms of skewness, which refers to their symmetry or asymmetry.
Another way of characterizing a frequency distribution is with respect to the number of “peaks” seen
on the curve.
Refer to the following graphs.
You see that the curve has only one peak. We refer to the shape of this distribution as unimodal. Now look at
the graph below. There are two peaks appearing at the highest frequencies.
We call this a bimodal distribution. Those with more than two peaks are called multimodal distributions. In
addition, unimodal, bimodal, and multimodal distributions may or may not be symmetric. Look back at the negatively-skewed and
positively-skewed distributions in Figures 7.9 and 7.10. Both have one peak; hence, they are also unimodal distributions.
What is kurtosis?
Another way of differentiating frequency distributions is shown below. Consider now the graphs of three
frequency distributions in Figure 7.14.
Week 7
Name: __________________________________
Subject/Time: ___________________________
Instructor’s name: ________________________
DEVELOP
At this point, let us see how well you understood what has been presented in the preceding sections.
1. Consider the table showing the results of a reading examination of set of students.
Frequency Distribution of Scores in Mid-Term Examination in Reading
Class Interval   Midpoint   F    Cumulative Frequency   Cumulative Percentage
140-144          142         2   ______50______         ______100______
135-139          137         7   ______48______         ______96______
130-134          132         9   ______41______         ______82______
125-129          127        14   ______32______         ______64______
120-124          122        10   ______18______         ______36______
115-119          117         6   ______8______          ______16______
110-114          112         2   ______2______          ______4______
a. What is being described in the table?
d. How did we get the midpoints from the given class interval?
e. What is the lower limit of the class with the highest frequency?
f. What is the upper limit of the class with the lowest frequency?
Quiz #7
Name:_________________________________
Subject/Time:___________________________
Instructor’s name: _______________________
SUSTAIN
Class Interval F
90-94 6
85-89 9
80-84 7
75-79 13
70-74 14
65-69 19
60-64 11
55-59 11
50-54 9
45-49 8
40-44 8
b. What is the exact limit of the class interval with an observed frequency of 13? How did you determine it?
c. Without graphing, how do you see the shape of the graph? Is it symmetrical or skewed? Is it unimodal or
bimodal? Give a statement or two to support your answer.
d. Sketch the graph of the frequency distribution using the data on the table.
Prepare
What are measures of central tendency?
The term “measures of central tendency” refers to measures of central location or the point of convergence of a set of values. Test scores have a tendency
to converge at a central value. This value is the average of the set of scores. In other words, a measure of central
tendency gives a single value that represents a given set of scores. The three commonly used measures of central tendency,
or measures of central location, are the mean, the median, and the mode.
Mean. This is the most preferred measure of central tendency for use with test scores, also referred to as the
arithmetic mean. The computation is very simple. When a student has added up the examination scores he/she made in
a subject during the grading period and divided it by the number of examinations taken, then he/she has computed the
arithmetic mean.
That is, X̅ = ƩX / N, where X̅ = the mean, ƩX = the sum of all the scores, and
N = the number of scores in the set.
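The formula can be carried out in a couple of lines. The scores below are illustrative values, not data from the tables in this lesson:

```python
# Mean of a small set of examination scores (illustrative values)
scores = [85, 78, 92, 70, 88]

mean = sum(scores) / len(scores)  # X̄ = ΣX / N
print(mean)  # 82.6
```

This is exactly what a student does by hand when averaging grading-period examination scores.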
Consider again the test scores of students given in Table 8.1, which is the same set of test scores used in the
previous lesson.
You have many ways of computing the mean. The traditional long computation techniques have outlived their relevance
due to advancement of technology and the emergence of statistical software. Using your scientific calculator, you will
see the symbols X̅ , ƩX. Just follow the simple steps indicated in the guide. There are also simple steps in Excel. Different
versions of the statistical software SPSS offer the fastest way of obtaining the mean, even with hundreds of scores in a
set. There is no loss of original information because you are dealing with original individual scores. The use of statistical
software will be explained later.
While we recognize the power of technology, some information goes unappreciated because of the shorthand
processing of data through mechanical computation. Look at the conventional way of presenting data in a frequency
distribution table, as done in Lesson 7:
X̅ = Ʃ(Xi f) / N
where Xi = midpoint of the class interval
f = frequency of each class interval
N = total frequency
Thus, the mean of the test scores in Table 7.1 is calculated as follows:
X̅ = Ʃ(Xi f) / N = 4720 / 100 = 47.2
Looking at the table, do you find the value reasonable? Why?
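The grouped-data formula can be sketched in code. The midpoints and frequencies below are illustrative values, not the actual entries of Table 7.1:

```python
# Mean from a frequency distribution: X̄ = Σ(Xi·f) / N
# Midpoints and frequencies are illustrative, not the actual Table 7.1 data.
midpoints   = [32, 37, 42, 47, 52, 57, 62]
frequencies = [ 5, 10, 21, 25, 20, 12,  7]

n = sum(frequencies)  # total frequency N
grouped_mean = sum(x * f for x, f in zip(midpoints, frequencies)) / n
print(round(grouped_mean, 2))  # 47.45
```

Each midpoint stands in for every score in its class interval, which is why the grouped mean is an approximation of the mean computed from raw scores.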
The easiest way is to use SPSS by simply following these steps:
1. Open the Data Editor window. It is understood you have prepared the dataset earlier.
2. On the menu bar click Analyze, then Descriptive Statistics, then Frequencies. This opens the Frequencies dialog box.
Press Continue on the Descriptive Option box, then Press OK on the left Descriptive Box, and you will finally see
the following image.
DESCRIPTIVES VARIABLES=scores
/STATISTICS=MEAN STDDEV MIN MAX.
Descriptives
Descriptive Statistics
N Minimum Maximum Mean Std. Deviation
Scores 100 21.00 79.00 47.1500 10.57954
Valid N (listwise) 100
Look again at the earlier computation of the mean for the test scores presented in Table 8.2. It is 47.2.
Round the SPSS value off to the nearest tenth. What did you find out about the mean?
Median. The median is the value that divides the ranked scores into halves, or the middle value of the ranked scores.
If the number of scores is odd, then there is only one middle value, which is the median. However, if the number of
scores in the set is even, then there are two middle values; in this case, the median is the average of these
two middle values. But if there are more than 50 scores, arranging the scores and finding the middle value will take time.
A scientific calculator will not give you the median. Again, statistical software can do this for you with simple steps
similar to finding the mean.
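The odd and even cases described above can be checked with the standard library. The scores are illustrative values:

```python
import statistics

odd_scores  = [70, 85, 78]       # odd count: the middle ranked value is the median
even_scores = [70, 85, 78, 91]   # even count: average of the two middle values

print(statistics.median(odd_scores))   # 78
print(statistics.median(even_scores))  # 81.5
```

Ranked, the odd set is 70, 78, 85, so the median is the middle value 78; the even set is 70, 78, 85, 91, so the median is (78 + 85) / 2 = 81.5.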
1. On the menu bar click on Analyze, then Descriptive Statistic, then Frequencies. This opens the Frequencies dialog
box.
2. Click on the desired variable name in the left box. In the dataset, let us consider the test scores also in Table 8.1.
Move your cursor to Statistics and the Frequency Statistics box will pop out. Click Median.
3. You will also see that you can use the same process to find the mean. Earlier, we opted to use Descriptives
instead of Frequencies. Then click Continue, and press OK.
Again, how do you work it out the conventional way? Either, you rank the 100 scores, which takes time, or you
arrange the scores in the frequency distribution as shown here:
Median = 44.5 + 5 × (100/2 − 44) / 18
       = 44.5 + 5(6)/18
       = 46.17
You will see that this value is not too far from the value of 46.00 generated in the SPSS output. When rounded
off to a whole number, they give the same value.
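The interpolation above follows the grouped-data median formula, which can be sketched with the values from the worked computation:

```python
# Median for grouped data (linear interpolation within the median class):
#   median = L + w * (N/2 - cf_below) / f
# L = exact lower limit of the median class, w = interval width,
# N = total frequency, cf_below = cumulative frequency below the median class,
# f = frequency of the median class. Values follow the worked example above.
L, w, N, cf_below, f = 44.5, 5, 100, 44, 18

median = L + w * (N / 2 - cf_below) / f
print(round(median, 2))  # 46.17
```

The term (N/2 − cf_below) counts how far into the median class the middle score lies, and w/f converts that count into score units.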
Mode. The mode is the easiest measure of central tendency to obtain. It is the score or value with the highest
frequency in the set of scores. If the scores are arranged in a frequency distribution, the mode is estimated as the
midpoint of the class interval with the highest frequency. This class interval with the highest frequency is also
called the modal class. In a graphical representation of the frequency distribution, the mode is the value on the
horizontal axis at which the curve is at its highest point. If there are two highest points, then there are two modes, as
discussed earlier in Lesson 7. When all the scores in a group have the same frequency, the group of scores has no mode.
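Both the single-mode and two-mode cases can be illustrated with the standard library; the scores are illustrative values:

```python
import statistics

single = statistics.mode([4, 4, 5, 6])           # 4 — the most frequent score
both   = statistics.multimode([1, 2, 2, 3, 3])   # [2, 3] — a bimodal set has two modes
print(single, both)
```

`multimode` returns every value tied for the highest frequency, so a bimodal set yields a two-element list.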
Considering the test data in Table 8.2, it can be seen that the highest frequency of 21 occurred in the class interval
40-44. The rough estimate of the mode is therefore 42, the midpoint of that class interval. Using statistical software and
following the steps used for the mean and the median, the following image will appear, which gives the value of the
mode computed directly from the raw data presented in Table 8.2.
Frequencies
Statistics
Scores
Valid 100
Missing 0
Mode 42.00
You see that 42.00 is equal to the earlier estimate we obtained, that is, the midpoint of the modal class.
However, in some cases the value using the conventional method is not exactly equal to the value generated by
statistical programs.
Scale of Measurement. There are four levels of measurement that apply to the treatment of test data: nominal, ordinal,
interval, and ratio. In nominal measurement, the number is used for labeling or identification purposes only. An example
is the student’s identification number or section number. In data processing, instead of labeling gender as female or
male, a code “1” is used to denote Female and “2” to denote Male. While “2” is numerically greater than “1,” in this case
the difference of 1 has no meaning; it does not indicate that Male is better than Female. The purpose is simply to
differentiate or categorize the subjects by gender.
The ordinal level of measurement is used when the values can be ranked in some order of the characteristic. The
numeric values indicate order, not the size of the differences, in the trait under consideration. Academic awards are made on the basis of an
order of performance: first honor, second honor, third honor, and so on. Some assessment tools require students to
rank their interests, hobbies, or even career choices. Percentile ranks in a national assessment test or entrance
examination are examples of measurement on an ordinal scale. Percentile scores become more useful and meaningful
than simple raw scores in university entrance or division-wide examinations.
The interval level of measurement, which has the properties of both the nominal and ordinal scales, is attained
when the values can describe the magnitude of the differences between groups or when the intervals between the
numbers are equal. “Equal interval” means that the distance between the things represented by 3 and 4 is the same as the
distance represented by 4 and 5. The most common example of an interval scale is temperature readings. The difference
between the temperatures 30° and 40° is the same as that between 90° and 100°. However, there is no true zero point.
Zero degrees on the Celsius thermometer does not mean zero or an absence of heat; 0° is an arbitrary value, a
convenient starting point. With an arbitrary zero point, there is a restriction on interval data: you cannot say that an 80°
object is twice as hot as a 40° object. In the educational setting, a student who gets a score of 120 in a reading ability
test is not twice as good a reader as one who got a score of 60 in the same test.
The highest level of measurement is the ratio scale. As such, it carries the properties of the nominal, ordinal, and
interval scales. Its additional advantage is the presence of a true zero point, where zero indicates the total absence of
the trait being measured. A 0 cm as a measure of width means no width, 0 km as a measure of distance means no
distance traveled, and 0 words spelled means no word was spelled at all. Test scores as measure of achievement in
many school subjects are often treated as interval scale. However, if achievement in a performance test in Physical
Education is measured by the number of “push-ups” one can do in a minute or distance run in an hour; or in a Typing
Class where you count the words typed in a minute or words spelled correctly, then these are all on ratio scale.
Now, the questions that most likely cross your mind are: Which measure of central tendency should I use? Do I
have to use all three, since the statistical program can automatically give all three measures just as easily?
Generally, the mean is the most used measure of central tendency because it is appropriate for interval and
ratio variables, which are higher levels of measurement. Its value is affected by a change in any single score, such that it
is regarded as the most accurate measure to represent a set of scores. In research, it is used most, specifically when
you want to make an inference about population characteristics on the basis of an observed sample value.
For the median: in some cases, we could have one very high score (or very few high scores) and many low
scores. This is especially true when the test is difficult, or when students are not well prepared for it. This
results in many low scores and a few high scores, which leads to a positively-skewed distribution. In the same way, when
the test is too easy for students, there will be many high scores, which leads to a negatively-skewed distribution. In both
cases, the mean can give an erroneous impression of central tendency because its value is pulled toward the extreme values,
reducing its role as the representative value of the set of scores. Hence, the median is a better measure. It is the
value that occupies the middle position among the ranked values; thus, it is less likely to be drawn toward the direction
of extreme scores. It is an ordinal statistic but can also be used for interval or ratio data distributions.
The mode is determined by the highest frequency of observations that makes it a nominal statistic.
When the distribution becomes positively-skewed as shown in Figure 8.2, there are variations in their values. The mode
stays at the peak of the curve and its value will be the smallest. The mean will be pulled out from the peak of the
distribution toward the direction of the few high scores. Thus, the mean gets the largest value. The median is between
the mode and the mean.
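The ordering described above, with the mean pulled above the median and mode in a positively-skewed set, can be checked on a small illustrative set of scores:

```python
import statistics

# A positively-skewed set: most scores are low, with one high outlying score (illustrative)
scores = [1, 1, 2, 2, 2, 3, 3, 4, 10]

mode   = statistics.mode(scores)    # stays at the peak of the distribution
median = statistics.median(scores)  # the middle ranked value
mean   = statistics.mean(scores)    # pulled toward the few high scores

print(mode, median, round(mean, 2))  # 2 2 3.11
```

The single high score of 10 pulls the mean well above the median and mode, which is exactly why the median is preferred for skewed test data.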
There are several indices of variability, and the most commonly used in the area of assessment are the following.
Range. The range is the difference between the highest (XH) and the lowest (XL) scores in a distribution. It is the simplest measure
of variability but is also considered the least accurate measure of dispersion because its value is determined by just two
scores in a group. It does not take into consideration the spread of all the scores; its value simply depends on the highest
and lowest scores, and it can be drastically changed by a single value. Consider the following examples:
Determine the range for the following scores: 9, 9, 9, 12, 12, 13, 15, 15, 17, 17, 18, 18, 20, 20, 20.
Range = Highest Score (HS) – Lowest Score (LS)
= 20 – 9
= 11
Now, replace one of the scores, say the last one, with a high score of 50. The range becomes:
Range = HS – LS
= 50 – 9
= 41
You will see that with just a single score, the range increased greatly, which could be interpreted as a large
dispersion of test scores; however, when you look at the individual scores, it is not.
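The example above can be reproduced directly with the scores given in the text:

```python
# Scores from the range example above
scores = [9, 9, 9, 12, 12, 13, 15, 15, 17, 17, 18, 18, 20, 20, 20]
original_range = max(scores) - min(scores)
print(original_range)  # 11

scores[-1] = 50  # replace the last score with 50
new_range = max(scores) - min(scores)
print(new_range)  # 41 — a single score drastically changed the range
```

Changing one value out of fifteen nearly quadrupled the range, even though fourteen scores did not move at all.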
Variance and Standard Deviation. The standard deviation is the most widely used measure of variability and is
considered the most accurate representation of the deviations of individual scores from the mean of the
distribution. For a population, the standard deviation is

σ = √( Ʃ(X − μ)² / N )

where μ = population mean
X = score in the distribution
N = number of scores in the distribution

If we are dealing with sample data and wish to calculate an estimate of σ, the following formula is used for
such a statistic:

S = √( Ʃ(X − X̅)² / (N − 1) )

where X = raw score
X̅ = mean score
N = number of scores in the distribution
This formula is what statisticians term an “unbiased” estimate, and it is more often preferred considering that in
both research and assessment studies, we deal with sample data rather than actual population data.
With the standard deviation, you can also see the differences between two or more distributions.
Using the scores in Class A and Class B in the above dataset, we can apply the formula:
Class A                              Class B
X     (X – X̅)   (X – X̅)²            X     (X – X̅)   (X – X̅)²
22      10        100                16       4         16
18       6         36                15       3          9
16       4         16                15       3          9
14       2          4                14       2          4
12       0          0                12       0          0
11      -1          1                11      -1          1
 9      -3          9                11      -1          1
 7      -5         25                 9      -3          9
 6      -6         36                 9      -3          9
 5      -7         49                 8      -4         16
X̅ = 12   Ʃ(X – X̅)² = 276            X̅ = 12   Ʃ(X – X̅)² = 74
The values 276 and 74 are the sums of the squared deviations of the scores in Class A and Class B, respectively.
Dividing each by the number of scores minus one (N − 1) gives the variance (S²):

S²A = 276 / (10 − 1) = 30.67          S²B = 74 / (10 − 1) = 8.22
The values above are both in squared units, while our original scores are not. Taking their square roots
gives values on the same scale of units as the original set of scores. These are the respective standard deviations (S)
of each class, computed as follows:

SA = √S²A = √30.67 = 5.538          SB = √S²B = √8.22 = 2.867
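The whole computation for both classes can be checked with the standard library, using the scores from the table above:

```python
import statistics

class_a = [22, 18, 16, 14, 12, 11, 9, 7, 6, 5]
class_b = [16, 15, 15, 14, 12, 11, 11, 9, 9, 8]

# statistics.variance and statistics.stdev use the sample (N - 1) denominator
var_a, sd_a = statistics.variance(class_a), statistics.stdev(class_a)
var_b, sd_b = statistics.variance(class_b), statistics.stdev(class_b)
print(round(var_a, 2), round(sd_a, 3))  # 30.67 5.538
print(round(var_b, 2), round(sd_b, 3))  # 8.22 2.867
```

Although both classes have the same mean of 12, Class A's much larger standard deviation shows its scores are far more spread out than Class B's.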
You may be thinking that the process will be difficult if you are dealing with many scores in a distribution. This is
not really a problem if you have a scientific calculator. With the simple steps indicated in the User’s Guide, you can just
enter the scores and you will see the symbol σn, which stands for the standard deviation. You will also see the symbol
σn−1, which is used when dealing with sample scores. When the scores form a sample, σn−1 is used; if you are taking the whole
population, σn is applied. In the example earlier, we used only 10 scores to explain the concepts of variance and standard
deviation; thus, we used N − 1 as the denominator, treating the 10 examinees as a sample
of students.
An alternative formula is what we call the raw score formula, although it does not reflect the concept
of “deviation,” which connotes “difference.”
The mathematical equation is:

SD = √( (ƩX² − (ƩX)²/N) / N )

where:
ƩX² = sum of the squares of the raw scores
(ƩX)² = square of the sum of all the raw scores
N = number of scores or examinees

Again, your scientific calculator can be used to find these values. You will see the functions ƩX² and ƩX on your
calculator.
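The raw-score formula can be checked against the deviation method using the Class A scores. Note that the formula above divides by N (the population form); dividing by N − 1 instead reproduces the sample values computed earlier:

```python
import math

scores = [22, 18, 16, 14, 12, 11, 9, 7, 6, 5]  # Class A scores from the table above
n = len(scores)

sum_x  = sum(scores)                  # ΣX
sum_x2 = sum(x * x for x in scores)   # ΣX²

# Sum of squared deviations via the raw-score identity: ΣX² − (ΣX)²/N
ss = sum_x2 - sum_x ** 2 / n
print(ss)  # 276.0 — matches Σ(X − X̄)² from the deviation method

sd_sample = math.sqrt(ss / (n - 1))   # divide by N − 1 for the sample standard deviation
print(round(sd_sample, 3))  # 5.538
```

The identity saves a pass over the data: you never compute the mean or the individual deviations, only the running sums ΣX and ΣX², which is exactly what the calculator accumulates.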
For a larger number of scores in a distribution, Microsoft Excel or SPSS will be most efficient in obtaining both the variance
and standard deviation. This can be done in a few seconds if you have already entered and saved the data used to get the
measures of central tendency. To illustrate the simple steps in using SPSS, we refer you to the scores for Class A given
earlier.
Week 8
Name: __________________________________
Subject/Time: ___________________________
Instructor’s name: ________________________
APPLY
1. Refer to the figure below showing the frequency polygons representing entrance test scores of three groups of students in
different fields of specialization.
[Figure: overlapping frequency polygons for Education, Business, and Engineering; x-axis: test score (55 to 105), y-axis: frequency]
d. Which group of students had the most dispersed scores in the test? Why do you say so?
e. What distribution is symmetrical? What distribution is skewed? Why do you say so?