Testing for Language Teachers. Summary

Arthur Hughes has written a very practical guide for teachers and students involved in test design, intended to give them a better understanding of the role that testing plays in teaching. The author pays particular attention to the effect of testing on teaching. From the very first chapter, and in several later ones, he highlights the importance of backwash and its impact not only on teaching but on society as a whole, in order to stress the close relationship between testing and teaching.

Hughes states from the outset that there is mistrust of tests and testers, and that this sentiment is well founded because much language testing is undeniably of very poor quality. The first reason he gives is that the effect of tests on teaching and learning has too often been harmful. Backwash is seen as one part of the impact a test may have on learners and teachers, on educational systems and on society in general. The author distinguishes two kinds of backwash: harmful and beneficial. Backwash is beneficial when preparation for the test shapes all teaching and learning activities in a way that serves the students' learning, which is likely when the stakes are high and the results bear on the institution's decisions or objectives. Harmful backwash takes place when the content and format of the test are not congruent with the objectives of the course, or when a skill is tested with, for example, a multiple-choice format, so that students are given a lot of practice in that item type instead of practicing the skill itself.

The second reason that Hughes gives for the mistrust of tests is that they often fail to measure what they are intended to measure, so their results do not reflect the students' real abilities. This is why tests are considered inaccurate. The author identifies two main sources of inaccuracy. The first concerns test content and test techniques, and the second is a lack of reliability. These concepts (content, techniques and reliability) are elaborated on and exemplified in subsequent chapters. To help overcome the mistrust caused by poor test design, Hughes writes that the book aims to make three contributions to the improvement of testing: to help the teaching profession write better tests, to inform those who are involved in the design of tests, and to put pressure on professional testers and examination boards to improve their tests.

Every testing situation is unique and has its own particular purpose. Therefore, according to Hughes, the first thing testers have to do is to be clear about why they are testing. The author first categorizes tests according to the type of information each provides. Proficiency tests measure people's language ability regardless of any training they have had in the language; achievement tests are administered at the end of a course of study; diagnostic tests identify the test taker's strengths and weaknesses; and placement tests place students at a certain level.

He then contrasts several approaches to testing. Direct testing requires the test taker to perform exactly the skill that is being measured, whereas indirect testing measures the abilities that underlie that skill. Discrete point testing tests one element at a time, whereas integrative testing requires the test taker to combine several language elements in performing a task. Norm-referenced testing relates one test taker's performance to that of other test takers, whereas criterion-referenced testing tells teachers what test takers can actually do with the language. Objective testing requires no judgment on the part of the scorer, whereas subjective testing does. Finally, the author describes computer adaptive testing, which offers an efficient way of collecting test data by computer, and communicative language testing, which focuses on measuring the test takers' communicative language ability.

Hughes points out that a language test is considered valid if it measures what it is intended to measure. What is essentially being measured is the construct, and a test has construct validity when it is able to measure that underlying ability. To provide evidence of construct validity, several subordinate forms of validity can be examined. Content validity means that the test constitutes a representative sample of the language and linguistic structures established in the language program. Criterion-related validity is the degree to which the results of the test agree with a dependable, established criterion. It has two types: concurrent validity, which is concerned with the level of agreement between the test and another test or form of assessment aimed at measuring the same thing, and predictive validity, which is concerned with the degree to which the results of a test can predict a test taker's future performance in the real world. Scoring validity concerns whether responses are scored in a way that is consistent with the construct. Face validity means that the test looks as if it measures what it is supposed to measure.

Like validity, reliability is given an entire chapter. A test is reliable when the scores obtained on one occasion are consistent with the results obtained when the same test is administered to the same students at a different time. Scorer reliability is also an important consideration: it is high when scorers can easily recognize the one correct response, and more difficult to achieve when a degree of judgment is needed on the part of the scorer. Hughes offers advice on how to make a test more reliable. Enough samples of the language should be taken; items that do not discriminate well between weaker and stronger students should be excluded; candidates should not be allowed too much freedom in how they respond; and scoring should be as objective as possible. When subjectivity is called for, testers should agree on acceptable answers and the appropriate scores. Instructions must be unambiguous, clear and explicit; tests should be well laid out and legible; and candidates should be familiar with the test formats and techniques. Tests should be administered under uniform and non-distracting conditions.
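The idea of consistency between two administrations can be illustrated with a short calculation. The sketch below is not taken from Hughes's book: it assumes that consistency is summarized as a Pearson correlation between the two sets of scores, and the function name `pearson_r` and all the scores are hypothetical, invented only for illustration.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation between two equally long lists of scores."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two score lists of the same length (at least 2)")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-products and sums of squared deviations from the means.
    cross = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    if ss_x == 0 or ss_y == 0:
        raise ValueError("scores must vary for a correlation to be defined")
    return cross / sqrt(ss_x * ss_y)


# Hypothetical scores of the same ten students on two sittings of one test.
first_sitting = [68, 46, 19, 89, 43, 56, 43, 27, 76, 62]
second_sitting = [82, 28, 34, 67, 63, 59, 35, 23, 62, 49]

coefficient = pearson_r(first_sitting, second_sitting)
print(f"Test-retest coefficient: {coefficient:.2f}")
```

A coefficient close to 1 would suggest that students are ranked in much the same way on both occasions; a value near 0 would suggest that the scores depend heavily on factors other than the ability being tested.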

Turning to his main concern of helping language teachers write better tests, the author lays down a set of general procedures for test development. Teachers are advised to state the purpose of the test clearly and to write specifications covering content, structure, timing, medium or channel, techniques, the required levels of performance and the scoring procedures. Once the specifications are written, the items are written and then moderated, that is, reviewed and changed where necessary. Next, the items should be trialed informally, preferably on native speakers. Because the author addresses the book not only to classroom teachers but also to professional testers in testing institutions, he also mentions the calibration of scales, validation by other test designers (especially if the stakes are high), the writing of manuals for test takers and test users, and the training of those who will be involved in the testing process.

The author goes on to define common test techniques designed to elicit test behavior, and devotes a whole chapter to testing each of the following: reading, listening, writing, speaking, grammar, vocabulary and the integrated skills, as well as the testing of young learners. In each of these chapters, he gives different purposes for testing the skill, examples of constructs, operations, representative tasks and suggestions for scoring. Towards the end of the textbook, a chapter on test administration gives testers an organized approach to administering a test in a way that protects its validity and reliability.

An appendix at the end demonstrates how the analysis of test data can help to improve tests, in keeping with the author's aim of helping test designers write better tests. He describes statistical analyses that give the tester information useful for making decisions about test results and for improving the tests. Finally, in this appendix, Hughes explains how to look at test statistics and item analysis, each with its related concepts.
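As a rough illustration of what item analysis involves, the sketch below computes two statistics that are widely used for this purpose: a facility value (the proportion of candidates who answer an item correctly) and a simple discrimination index (how many more candidates in the top-scoring group than in the bottom-scoring group answer it correctly, divided by the group size). These particular formulas, the use of top and bottom thirds, the function names and the response data are all assumptions made for illustration, not a reproduction of the procedures in Hughes's appendix.

```python
def facility_value(responses, item):
    """Proportion of candidates who answered the given item correctly."""
    return sum(row[item] for row in responses) / len(responses)


def discrimination_index(responses, item, fraction=1 / 3):
    """Compare the top and bottom scoring groups on one item.

    Returns (correct in top group - correct in bottom group) / group size,
    so values range from -1.0 to 1.0.
    """
    # Rank candidates by total score, highest first.
    ranked = sorted(responses, key=sum, reverse=True)
    group_size = max(1, round(len(ranked) * fraction))
    top = ranked[:group_size]
    bottom = ranked[-group_size:]
    top_correct = sum(row[item] for row in top)
    bottom_correct = sum(row[item] for row in bottom)
    return (top_correct - bottom_correct) / group_size


# Hypothetical results: one row per candidate, one column per item,
# 1 for a correct answer and 0 for an incorrect one.
responses = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 1],
    [1, 0, 0, 0],
    [0, 0, 0, 1],
]

for item in range(len(responses[0])):
    fv = facility_value(responses, item)
    di = discrimination_index(responses, item)
    print(f"Item {item + 1}: facility {fv:.2f}, discrimination {di:.2f}")
```

In this invented data set the last item is answered correctly as often by the weakest candidates as by the strongest, so its discrimination index is 0; an item like that would be a candidate for revision or removal.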

Reference

Hughes, Arthur (2003). Testing for Language Teachers. Second Edition. United Kingdom: Cambridge University Press.

^[a] Research professor, Área Académica de Lingüística, Instituto de Ciencias Sociales y Humanidades, UAEH. Contact: eoccena@uaeh.edu.mx