French I Unit 2 Test Analysis: Background Information


Stephanie Krive

Midterm Exam Paper


LLT 808 Sec. 730
SS14

French I Unit 2 Test Analysis


Background Information
For the midterm exam review, I have chosen to analyze a test that I give in my French I
classes at about one quarter of the way through the year. At East Middle School in Traverse City,
we follow the Discovering French: Nouveau! curriculum. This curriculum was designed to teach
chunks of vocabulary alongside a grammatical concept, and then to follow up in later months
by revisiting certain material and building on top of that acquired knowledge. For the most part, I
enjoy teaching the course content by following along with its progression. For this reason, I often
use the quizzes and tests provided by this McDougal Littell curriculum. The test that I have
selected to analyze, one of those provided, is a summative assessment of the unit
that covers café vocabulary (foods, beverages, and item cost) and vocabulary we use on a daily
basis (time, day, date, and weather).
The intended purpose of this Unit 2 Test is to gauge how well students can use, or
identify the correct use of, these vocabulary words and expressions. The test
includes a listening portion, ten fill-in-the-blank selected-response items, incomplete lists to fill
in (days of the week and months of the year), and ten culturally based selected-response items.
This test is criterion-referenced, in that answers are marked either correct or incorrect depending
on what the students know regarding the standards (Carr, 10). There is only one possible answer
for each question, and the number of correct answers on each individual test determines each
student's score. How each student performs has no effect on other students' scores. The students
who take this test are native English speakers who are beginning French language learners.
LLT 808 - 730
Krive 1

Because French I can be taken by sixth through eighth graders, the test takers range in age from
11 to 14. As mentioned before, this test is taken at the close of the first quarter of the school year,
when there is a much higher focus on vocabulary, not grammar. The students, parents, and I are
the people most interested in these test scores. The principal (my evaluator) is interested in
overall grades, as well as proof of student growth, but is likely far less interested in this
assessment on its own.
Test Administration
In planning to administer this test, I look ahead and allot a full class period (51 minutes)
for the students to complete this test. I make sure that students are notified as the end of the unit
approaches, and try to announce the test date I have chosen at least a week in advance.
Additionally, I make sure to remind students of the forthcoming test date each day leading up to
it. During these days, we spend a significant amount of time reviewing all of the concepts that
will be on the test, ensuring that students will walk in on test day feeling confident in their
abilities. When the day finally arrives, students are welcomed to our classroom, where we will be
testing. I remind them verbally and with a projected note that we have reached test day, and
students have a five-minute flash review of whatever they would like to study most. When the
five minutes have passed, it is time to get started. I administer the test by explaining each portion
of it before we begin, and then we open on the listening portion together. I have audio CDs with
the listening tracks for each test and quiz, but I also have the script. I believe it is important for
students to hear other voices speaking French (and quickly!) since I cannot possibly always be
the one speaking to them. Once we finish going through the listening portion, activities 1-3,
students are free to complete the rest of the test on their own. Currently, this test is administered
on paper with a pencil. Although many formative assessments are done with miniature clocks or
individual whiteboards, this summative assessment is a simple paper test. The majority of
students finish after about 35 or 40 minutes of testing, but there are always the students who take
all the time they are given. If and when students need a little extra time to complete their test,
they get it. If not right in that moment, then certainly later that day when they can finish without
noise or anxiety.
Test Review
Due to the nature of how I acquired this test through the curriculum I teach, I cannot
justly speak with authority on how the listening passages were written or selected. I imagine the
test specifications offered much insight on what content should and would be assessed with this
test, as well as possible sources of difficulty for students that must be avoided (Carr, 87). As an
administrator and analyzer of this test, I do indeed feel that many of the questions are clear and
construct relevant. Even though the selected response items on the test are all written in the L2,
French, they are quite short in length and simple in structure. This in itself does not leave much
room to veer away from the construct at hand. That being stated, I think there is a slight issue
with the fairness in some of the selected response items. Some offer three choices
that are not necessarily alike but still address the same content, like ordering
something at the café. However, many others offer choices that are all from the unit but
address completely different topics. In several cases, all it takes is one key word to rule out
one or both of the incorrect answers. Though some students find this helpful and easy, I think it
could actually confuse others if they are cued to listen for the wrong topic. In this sense,
questions either give answers away, or function improperly by confusing some students (Winke,
Module 3). Although this test does seem to need some work regarding the fairness of test items, I
think it does a very good job keeping items independent of one another. Questions do not
piggyback off of one another, and this keeps the test as concise as possible while still addressing
all necessary components. In addition, the majority of test takers have described this test as
"easy" or "not bad" when asked upon finishing.
Regarding reliability, this particular test is criterion-referenced and so its reliability is not
really in question. An answer may only be either correct or incorrect, so the margin of error is
quite low. However, the dependability of this test is something I am beginning to question quite a
bit (Carr, 108). In order to discover the truths that lie in data analysis, I decided to create a
spreadsheet that compares the students' scores, as well as the Item Facility (IF) and Item
Discrimination (ID) of each item. Thanks to Chapter 6 in Carr and Module 6, this was a relatively easy task. Low
IFs and IDs are highlighted in pink, while high IFs and IDs are in green. Of my 28 students, 14
of them (chosen at random) have their scores compared in the attached spreadsheet. When
ranked by highest score to lowest, the two median students both scored 21 points. In determining
the IF upper and IF lower, I left out these two median students, creating upper and lower groups of
six people each. As you can see based on the data, this test does not appear to be the most
dependable as it rarely has a high B-Index. Of the 26 test items for which I gathered data, only
four have an ID of 50%, and that is the highest ID recorded. Even if many students answered
questions correctly throughout the assessment, I can call very few items "good" due to the very
low rate of discrimination between the higher performing students and the lower performing
students.
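The calculation behind the spreadsheet can be sketched in a few lines of Python. This is only an illustration under stated assumptions: each student's answers are stored as a list of 1 (correct) and 0 (incorrect), IF is taken as the proportion of students who answered an item correctly, and ID is taken as IF upper minus IF lower. The function names and the small data set are hypothetical and are not the actual class scores.

```python
def item_facility(responses, item):
    """Proportion of students who answered the given item correctly."""
    return sum(student[item] for student in responses) / len(responses)


def item_discrimination(responses, item, group_size):
    """IF of the top-scoring group minus IF of the bottom-scoring group."""
    # Rank students from highest total score to lowest.
    ranked = sorted(responses, key=sum, reverse=True)
    upper = ranked[:group_size]
    lower = ranked[-group_size:]
    return item_facility(upper, item) - item_facility(lower, item)


# Hypothetical data: 6 students x 3 items (1 = correct, 0 = incorrect).
sample = [
    [1, 1, 1],
    [1, 1, 0],
    [1, 1, 0],
    [1, 0, 1],
    [1, 0, 0],
    [1, 0, 0],
]

for item in range(3):
    if_all = item_facility(sample, item)
    id_value = item_discrimination(sample, item, group_size=2)
    print(f"Item {item + 1}: IF = {if_all:.2f}, ID = {id_value:.2f}")
```

With 14 ranked students, passing group_size=6 would leave the two median students out of both groups, as described above. In the sample data, the first item is answered correctly by everyone, so its IF is 1.00 and its ID is 0.00: an item that every student gets right cannot separate the higher performing students from the lower performing ones.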
Strangely, there is a part of me that questions how "bad" these test questions are, even
with the low Item Discrimination. Students were quite successful overall on this test, especially
on questions like Part 3 of listening. In this section, students simply had to identify the weather
forecast they heard by matching it up with the picture displaying that type of weather. Most
students answered all of these questions correctly, and although these questions seem very easy, I
ask myself if that is so bad. My students, being trapped in the middle of their development and so
in need of reassurance, might actually benefit from a strong boost of confidence on a test section.
Perhaps motivation and confidence during a test, on the other hand, are an entirely different issue
to tackle.
Improvements
The question I return to at the end of my study is the original one: Are the scores from the
test useful? The answer I conclude is yes, for two reasons. First, if test questions can be reworked or
rewritten, this can now be done in order to have a more dependable test filled with better
questions. Second, the scores are very helpful to me, the teacher, because they tell me exactly
what content really must be reviewed in class. It is very clear what piece of the content was not
delivered adequately when everyone in the class is getting the answer wrong. Thankfully, as the
test administrator, I have easy access to this information and can use it to help everyone improve
in all areas of study.
Based on the data, it does appear that the test can be improved. The questions that have a
very low item discrimination are telling me that either those questions are too easy, and therefore
have no place on this test, or that they are too difficult or confusing, meaning that keeping them
may do more harm than good. I do feel that many items on the test are beneficial and are
showing me exactly what I want the students to be able to identify or know. For that reason, I don't
believe an entire test rewrite is where I will go first. I would like to revise the test, changing the
items that do not seem to showcase true ability, and administer the test again to see what kind of
results appear in the new data.
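One possible way to sort the low-discrimination items before revising them is sketched below. It follows the reasoning above: an item with low ID and a high IF is probably too easy, and one with low ID and a low IF is probably too difficult or confusing. The function name and all three cutoff values are assumptions chosen for illustration. They do not come from Carr or from the test data.

```python
def triage_item(item_facility, item_discrimination,
                low_id=0.2, easy_if=0.9, hard_if=0.3):
    """Suggest a follow-up for one item.

    All thresholds are hypothetical defaults, not values from the source.
    """
    if item_discrimination >= low_id:
        return "keep"
    if item_facility >= easy_if:
        return "too easy: consider replacing"
    if item_facility <= hard_if:
        return "too hard or confusing: reteach or rewrite"
    return "review wording"


# Hypothetical (IF, ID) pairs for four items.
for if_value, id_value in [(1.0, 0.0), (0.5, 0.5), (0.1, 0.0), (0.6, 0.1)]:
    print(if_value, id_value, triage_item(if_value, id_value))
```

The labels are only a starting point for the revision. An item flagged as too easy might still be worth keeping if its purpose is to build confidence, as discussed in the Test Review section.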

Carr, N.T. (2011). Designing and analyzing language tests. Oxford: Oxford University Press.