A. Two Families of Language Tests
Language testing is used to measure students’ English language abilities and performance. Before designing a particular test, a teacher should know the types of language tests. Language tests fall into two families: norm-referenced tests (NRTs) and criterion-referenced tests (CRTs). The two families have different functions. NRTs help administrators and teachers make program-level decisions (proficiency and placement decisions), while CRTs help teachers make classroom-level decisions (diagnostic and achievement decisions). Both share one purpose: to help language administrators and language teachers make better decisions about their students.
1. Norm-Referenced Test
An NRT is designed to measure global language abilities (for example, overall English language proficiency, listening ability, reading comprehension, or speaking fluency). An NRT may use multiple-choice, true-false, dictation, or essay formats, and students do not know in advance exactly what types of questions will appear.
2. Criterion-Referenced Test
A CRT is usually designed to measure well-defined and fairly specific instructional objectives. The objectives are often specific to a particular course, program, school district, or state. For example: by the end of the course, students will be able to underline the sentences containing the main idea.
Type of interpretation
The two categories of tests differ in how scores are interpreted. On a CRT, a student’s performance is compared to a particular criterion in absolute terms: it is compared only to the amount, or percentage, of material learned. For example, a student who scores 80 percent is taken to know 80 percent of the material. No reference is made to the performance of other students in that score interpretation. “The value of a test score indicates the student’s ability in mastering the material that has been given.”
On an NRT, each student’s performance is compared to those of all other students in relative terms. NRT scores are sometimes expressed with no reference to the actual number of test questions answered correctly. For example, a student who scored in the 84th percentile scored better than 84 out of 100 students in the group as a whole. How many questions did the student answer correctly? We have no way of knowing, because a percentile score only expresses the student’s position relative to other students.
“Test scores are interpreted by comparing a student’s performance with that of other students, rather than by the extent to which the student has mastered the material.”
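The contrast between the two interpretations can be sketched in a few lines of Python. This is only an illustration: the function names, the two groups, and their scores are hypothetical, and the percentile rank is assumed here to mean the percentage of the group scoring below the student, which is one common definition among several.

```python
def percent_correct(num_correct: int, num_items: int) -> float:
    """CRT-style interpretation: share of the tested material answered correctly."""
    return 100.0 * num_correct / num_items


def percentile_rank(score: float, group_scores: list[float]) -> float:
    """NRT-style interpretation: percentage of the group scoring below `score`.

    Assumption: "percentile" is taken as percent-below; other definitions exist.
    """
    below = sum(1 for s in group_scores if s < score)
    return 100.0 * below / len(group_scores)


# Hypothetical raw scores (number correct out of 50 items) for two groups of ten.
group_a = [22, 25, 28, 30, 31, 33, 35, 38, 40, 45]
group_b = [35, 38, 40, 41, 42, 44, 45, 46, 47, 49]

# Absolute interpretation: depends only on this student's own answers.
print(percent_correct(40, 50))        # 80.0

# Relative interpretation: the same raw score ranks differently in each group.
print(percentile_rank(40, group_a))   # 80.0
print(percentile_rank(40, group_b))   # 20.0
```

The same raw score of 40 out of 50 always means 80 percent of the material, but its percentile changes with the group it is compared to, which is why a percentile alone says nothing about how many questions were answered correctly.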
Type of measurement
NRTs are typically most suitable for measuring general abilities, such as reading ability, listening ability, and overall English language proficiency. One widely available example is the TOEFL (Test of English as a Foreign Language). The TOEFL has four subtests (listening, structure, reading, and writing), all of which must necessarily be considered general abilities.
CRTs, in contrast, measure specific, well-defined language points, and these are not limited to grammar points. For example, a subtest on a CRT for a notional-functional language course might consist of an interview that rates students’ abilities to perform greetings, agree or disagree, express an opinion, and end a conversation.
Purpose of testing
NRTs and CRTs both produce scores, but they do so for different purposes. Because NRT interpretation is relative, the purpose of an NRT is to generate scores that spread students out along a continuum of general abilities, so that any existing differences among individuals can be distinguished. The purpose of a CRT, on the other hand, is to assess the amount of knowledge or skill learned by each student. The focus is therefore on individual knowledge or skills, not on the distribution of scores.
Distribution of score
On an NRT, the items are generated, analyzed, selected, and refined so that students’ scores spread out and reflect the differences in their abilities.
On a CRT final examination, all students could score 100 percent if they have studied the material thoroughly. Such a result, with every student earning a high score, may be perfectly logical, acceptable, and even desirable on a CRT.
Test structure
An NRT is typically relatively long and contains a wide variety of question content types. It usually consists of a few subtests on rather general language skills, such as reading comprehension, listening comprehension, and grammar. Each subtest is relatively long (30-50 questions) and covers a wide variety of content.
A CRT usually consists of numerous shorter subtests, each of which represents a different instructional objective. A CRT might have, for example, twelve subtests.
B. Matching Tests to Decision Purposes
In administering and teaching language programs, there are four basic kinds of decisions: proficiency, placement, achievement, and diagnostic. These are called the primary language testing functions. These functions correspond to the NRT and CRT categories: NRTs help in making program-level decisions (proficiency and placement), and CRTs are useful in making classroom-level decisions (diagnostic and achievement). Proficiency and placement decisions are the prerogative of administrators; classroom-level decisions are the prerogative of classroom teachers.
The focus of program-level proficiency decisions is on general knowledge or skills. Proficiency decisions are often based on proficiency tests specifically designed for such decisions, which assess general knowledge or skills. They are typically based on large-scale standardized tests in order to protect the integrity of the institutions involved, to keep students from getting in over their heads, and to prevent students from entering programs that they really do not need.
Placement decisions usually have the goal of grouping together students of similar ability levels. Placement tests are designed to help decide what each student’s appropriate level will be within a specific program, skill area, or course. Their purpose is to reveal which students have more or less of a particular knowledge or skill. A proficiency test tends to be very general in character because it is designed to assess extremely wide bands of abilities, whereas a placement test must be more specifically related to a given program. Both kinds of tests should be norm-referenced instruments, because the decisions must be based on the students’ relative knowledge or skill levels.
All language teachers are in the business of fostering achievement in the form of language learning. Achievement decisions are decisions about the amount of learning that students have accomplished. Achievement tests should be designed with very specific reference to a particular course. This link with a specific course usually means that achievement tests will be directly based on course objectives and will be criterion-referenced. In other words, achievement tests must be specifically designed to measure the objectives of a given course. A good achievement test can tell teachers a great deal about their students’ achievement and about the adequacy of the course.
Diagnostic decisions are made at the beginning or middle of the term and are aimed at fostering achievement by promoting the strengths and eliminating the weaknesses of individual students. Like achievement tests, diagnostic tests are designed to determine the degree to which the specific instructional objectives of the course have already been accomplished.
Why can’t a single test fulfill all four functions? There are two reasons: differences in the range of ability and differences in the variety of content.
A spreadsheet can be used to enter your students’ responses to the items on a test, analyze those responses to see which items are working, calculate the students’ total scores, descriptive statistics, and standardized scores, work out the correlation between their scores on the test and those from other measures, estimate the reliability or dependability of the test, investigate the validity of the test, and keep records of students’ progress through the entire language program.
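Several of these spreadsheet calculations can also be sketched in Python. The example below is a minimal illustration only: the student labels and responses are hypothetical, items are assumed to be scored right or wrong, the standard deviation is taken over the whole group (population formula), and K-R20 is used as one possible reliability estimate, not necessarily the one a given program would choose.

```python
from statistics import mean, pstdev, pvariance

# Hypothetical data: one row per student, one column per item (1 = correct, 0 = wrong).
responses = {
    "Student A": [1, 1, 1, 0, 1],
    "Student B": [1, 0, 1, 0, 0],
    "Student C": [1, 1, 1, 1, 1],
    "Student D": [0, 0, 1, 0, 0],
}

rows = list(responses.values())
num_items = len(rows[0])

# Total score per student (a row sum in a spreadsheet).
totals = {name: sum(items) for name, items in responses.items()}
scores = list(totals.values())

# Item facility: proportion of students answering each item correctly (a column mean).
item_facility = [mean(row[i] for row in rows) for i in range(num_items)]

# Descriptive statistics and standardized (z) scores.
avg = mean(scores)
sd = pstdev(scores)
z_scores = {name: (score - avg) / sd for name, score in totals.items()}

# K-R20 reliability estimate for right/wrong items (an assumed choice of statistic).
sum_pq = sum(p * (1 - p) for p in item_facility)
kr20 = (num_items / (num_items - 1)) * (1 - sum_pq / pvariance(scores))

print("Totals:", totals)
print("Item facility:", item_facility)
print("Mean:", avg, "SD:", round(sd, 3))
print("z-scores:", {name: round(z, 2) for name, z in z_scores.items()})
print("K-R20:", round(kr20, 4))
```

An item that every student answers correctly (facility of 1) does nothing to spread students out, which matters on an NRT, while on a CRT given after instruction the same result may simply show that the objective was learned.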
Reference
Brown, James Dean. 2005. “Testing in Language Programs: A Comprehensive Guide to English Language Assessment”. Singapore: McGraw-Hill Education.