1. Introduction
1.1 Background of the research
Language testing has been an integral component of language teaching from the very start. Language tests were originally used by teachers to get an overall picture of what students had learned in and out of the classroom. With the passage of time, tests began to serve more purposes, such as placement of test-takers, diagnosis of both learning and teaching, and even scientific research. Since the 1940s, language testing has gradually evolved from a single practice within the field of language teaching into an independent and comprehensive discipline.
Progress in language teaching theories has been accompanied by the maturing and advancing of testing theories and practices. It is widely acknowledged that language testing has gone through three eras: the Pre-Scientific testing era, the Psychometric-Structuralist testing era, and the Psycholinguistic-Sociolinguistic testing era, also known as the Communicative testing era.
There is little point in trying to establish when and where language testing first emerged, but it is generally believed that the 1940s were a turning point. The period before the 1940s is regarded as the first era, the Pre-Scientific testing era, in which developments and achievements in linguistics had little impact on language teaching, and both teaching practices and tests lacked a theoretical and scientific basis. Teaching practices focused mainly on delivering knowledge of the English language itself. Language tests therefore took the form of translation or grammar exercises, designed simply to check whether students had memorized that knowledge, and were scored according to the teachers' experience and judgment alone.
………..
1.2 Research purpose and significance
As an independent institute, HUSTWB has a College English teaching staff and a student academic level similar to those of other independent institutes. Research on its language tests may therefore serve as a reference for other independent institutes.
This thesis aims 1) to make a brief evaluation of the College English Achievement Tests (hereafter "CEAT") in HUSTWB, and 2) to find out to what degree the tests are valid and what washback effects they have on English teaching and learning.
Through research on the validity and washback effects of language tests in HUSTWB, the author is able to draw conclusions about the teaching and learning activities conducted in the university, and therefore to give suggestions on how to improve teaching practices and how to design more scientific language test papers. This research can also serve as a preliminary study for establishing a database of test papers.
This thesis consists of five parts. The first part introduces the development of language testing and the significance and purpose of this research.
The second part defines the two key concepts, "validity" and "washback effect", and briefly reviews theoretical and empirical studies in the field.
The third part describes the methodology adopted in the paper, including the research purpose, research subjects, research instruments, and data collection and analysis.
The fourth part bases its analysis on the data collected in Part Three, exploring to what extent CEAT is valid and what washback effects, if any, the test has on the English learning and teaching process.
The last part is the conclusion, which presents the author's major findings and gives suggestions for language teaching and testing in HUSTWB.
…………
2. Literature Reviews of Validity and Washback Effects
2.1 Validity as a criterion for language testing
As language testing practices and theories have continuously been improved and refined, linguists such as Henning, Alderson, Hughes, and Bachman have put forward and specified criteria by which the quality of a test can be measured, namely reliability, validity, authenticity, difficulty, discrimination, and practicality. Among these, "reliability" and "validity" are generally considered the most important.
What kind of test can be regarded as a good test? Experts and linguists have answered this question in various ways, but they share one concern: "Does the test test what it is supposed to test?" Put another way: "Is the test valid?"
The following definition is given by Henning (1987): "Validity in general refers to the appropriateness of a given test or any of its component parts as a measure of what it is purported to measure. A test is said to be valid to the extent that it measures what it is supposed to measure." According to this definition, the term "valid" or "validity" ought always to be used together with the word "for". Tests are designed to fulfill certain purposes. If they are used for purposes other than those for which they were designed, their validity remains unknown.
The definition given by Messick (1989) is that "validity is an integrated evaluative judgment of the degree to which empirical evidence and theoretical rationales support the adequacy and appropriateness of inferences and actions based on test scores or other modes of assessment."
………….
2.2 Washback effects of language testing
There is little dispute that language tests have always played an irreplaceable role in the process and progress of language teaching and learning. Scholars traditionally concentrated on designing and assessing a test in order to make it a better fit for a given purpose, and so, more often than not, they discounted or overlooked the significance of a test's impact back on language teaching and learning. This impact that tests have on language teaching and learning is known as the "washback effect".
Originally, the term "washback effect" came from the concept of "measurement-driven instruction", first coined by Popham. He argued that "tests should and could drive teaching and hence learning." Since then, it has been referred to by many different terms, such as "test impact", "systemic validity" (Frederiksen & Collins 1989), and "consequential validity" (Messick 1996). Over time, the term "washback" has come to be preferred by most researchers.
Popham's argument was similar to Pearson's (1988) view that "it is commonly asserted that tests have influence that can affect teachers and learners and thereby affect teaching and learning." But Pearson went into more detail: "It is generally accepted that public examinations influence the attitudes, behavior, and motivation of teachers, learners and parents."
………….
3. Research Design
3.1 Research questions
3.2 Research subjects
3.3 Research instruments
3.3.1 Questionnaires
3.3.2 Classroom observations
3.4 Data collection and analysis
4. Results and Discussions
4.1 Content validity of CEAT in HUSTWB
4.2 Construct validity of CEAT in HUSTWB
4.3 Washback effect questionnaires for teachers
4.4 Washback effect questionnaires for students
4.5 Classroom observations
5. Conclusion
5.1 Major findings
5.2 Limitations of the paper
5.3 Suggestions for further researches
4. Results and Discussion
4.1 Content validity of CEAT in HUSTWB
The test paper for CEAT in HUSTWB consists of a written test and an oral test. This thesis focuses only on the written test, which includes five parts: Listening Comprehension (35%), Reading Comprehension (35%), Cloze (10%), Translation (5%), and Writing (15%). Listening and Reading Comprehension are further divided into smaller sections. (See Table 4.1) Among these questions, 70% are objective questions and 30% are subjective questions.
Since content validity is closely related to what is tested, it is advisable to examine content validity by comparing the test paper with the teaching syllabus. HUSTWB has adopted the "College English Curriculum Requirements (For Trial Implementation 2004)" (referred to as the "Requirements" hereafter) as its guidance, and so the test paper should be compared with the "Requirements".
The "Requirements" specifically state that three levels of requirements are set: basic requirements, intermediate requirements, and higher requirements. As HUSTWB is an independent institute whose students did not perform especially well in the college entrance examination, it sets its standards a little higher than the basic requirements and closer to the intermediate requirements.
………..
Conclusion
As the College English Achievement Test falls into the category of achievement tests, content validity is to be measured first. By comparing the test paper with the teaching syllabus, the author found that the listening test covers a wide range of the required areas, which shows good content validity. However, in the reading section, the passages chosen for the test cover only a few of the many possible topics and only two types of writing style, which indicates weak content validity in reading. Overall, the content validity is only partially good.
Construct validity is regarded as the very core of all types of validity. Whether a test has good construct validity is a vital factor in evaluating a test paper. Two approaches were employed to examine the construct validity of the test paper: the correlation matrix and factor analysis. According to the evaluation (made in Chapter 3.3), the test paper under discussion has relatively high construct validity in terms of subjectively and objectively scored questions. But when each subtest is examined individually, the construct validity proves to be poor.
To improve the quality of the test, both content validity and construct validity need to be refined. As for content validity, content coverage and relevance need to be taken into account. As for construct validity, each subtest should be modified. Since CEAT covers a population of over 2,000 students, this cannot be achieved by a single person. It is therefore advisable to recruit a team of test designers to work together toward a scientifically and practically valid test.
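The correlation-matrix approach mentioned above can be illustrated with a minimal sketch. It is not the analysis carried out in this thesis: the subtest names, the six rows of scores, and the helper functions `pearson` and `correlation_matrix` are all hypothetical, chosen only to show how subtest-to-subtest and subtest-to-total correlations might be tabulated.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length score lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def correlation_matrix(scores):
    """Return a nested dict holding the correlation of every pair of columns."""
    names = list(scores)
    return {a: {b: pearson(scores[a], scores[b]) for b in names} for a in names}


# Hypothetical subtest scores for six test-takers (illustration only,
# not data from the CEAT papers discussed in this thesis).
scores = {
    "listening": [28, 22, 30, 18, 25, 20],
    "reading": [27, 24, 31, 17, 26, 19],
    "writing": [11, 9, 13, 8, 10, 9],
}
# Total score per test-taker, used for subtest-to-total correlations.
scores["total"] = [sum(parts) for parts in zip(*scores.values())]

matrix = correlation_matrix(scores)
for row_name, row in matrix.items():
    cells = "  ".join(f"{value:6.2f}" for value in row.values())
    print(f"{row_name:>10}  {cells}")
```

In such a matrix, a common rule of thumb is to expect each subtest to correlate more strongly with the total score than with the other subtests, since each subtest is meant to measure a distinct skill while still contributing to overall proficiency.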
…………
Reference (omitted)