Chapter One Introduction
1.1 Research Background
Since China adopted the reform and opening-up policy, communication between China and other countries has made great progress in various fields, especially in culture. As the largest developing country, China is also deepening its cooperation and communication with Western countries, represented by America and the United Kingdom, and a large number of overseas movies and TV series have been introduced to China. Meanwhile, with China's accession to the WTO and the deepening of globalization and integration, English is becoming the international language; that is to say, English is the language of many international organizations. In addition, owing to our government's attention to English learning, almost 300 million people in China are learning English as a foreign language, and more and more people are realizing the importance of learning spoken English.
As is known to all, this new generation of Chinese students was born in the "viewing era," and they are more visually oriented. As one of the most influential mass media, movies and TV series, classic or merely popular, have exerted their particular charm on people for generations, ever since the first movie came into being. They have undoubtedly fascinated people on the front line of popular culture, for they have striking features that make them different from any other art form. A questionnaire about films, issued in 1999 by the Organizing Committee of China's College Students Film Festival to over 1,200 undergraduates and postgraduates in about 20 universities and institutes in Beijing, indicates that 31.2% of college students make watching a film their first choice in their spare time, 37.5% watch two or three films each month, and 53.2% prefer films imported from the West (Cai Dongdong, 2000).
…………
1.2 Purposes and Significance
This study, based on the Corpus of English Movies and TV Series Scripts (CEMTVS), aims to extract spoken words and expressions, mainly according to computer frequency counts, and to analyze them mainly in terms of collocation, semantic meaning and pragmatic function, so as to improve English learners' ability to understand and use spoken English.
The study is also significant because the corpus-based method reveals the frequency and patterns of use of words and expressions, as well as the contexts in which they are used. In this way more findings can be obtained, some of which may be unfamiliar to us. The study can also provide more examples and evidence for these findings, helping English learners grasp idiomatic spoken English and thereby improve their understanding of English and their ability in interpersonal communication. At the same time, the study is significant for the compilation of dictionaries of spoken English and for English teaching and learning by means of English movies and TV series (EMTVS), and it can also benefit subtitle translation, discourse analysis of EMTVS, and so on.
………..
Chapter Two Literature Review
2.1 Corpus Linguistics
The corpus is a relatively new technology in the study of language, one that depends greatly on the use of computers. It is a body of natural language material stored in computer-readable form. Programs can be written to manipulate the language material in various ways, which makes it a powerful resource for linguistic research. A corpus is defined in terms of both its form and its purpose.
Early grammarians such as Jespersen and Quirk collected, when compiling dictionaries, a large body of texts (Wang Kefei et al., 2004: 3), referred to as a corpus in linguistics (Huang Changning & Li Juanzi, 2003: 1-2). According to Leech (1997: 1), the term corpus was used by early linguists to denote "a body of naturally-occurring (authentic) language data which can be used as a basis for linguistic research."
The typical definition is that "in the language sciences a corpus is a body of written text or transcribed speech which can serve as a basis for linguistic analysis and description" (Kennedy, 2000).
Richards, J. C. et al. define a corpus as "a collection of materials that has been made for a particular purpose, such as a set of textbooks which are being analyzed and compared or a sample of sentences or utterances which are being analyzed for their linguistic features" (Richards & Platt, 2000).
………..
2.2 Relevant Studies
At home and abroad, relevant studies on spoken English (SE) and EMTVS are mainly concerned with language characteristics, the pragmatic functions of a particular type of language or of a single word or phrase, and the application of EMTVS in language learning and teaching.
For a long time, many researchers have studied the language characteristics of spoken English and EMTVS from different aspects, for example vocabulary and grammar. Some researchers study one particular feature, while others study the features comprehensively.
As mentioned in Levinson (1983: 87-88), there are many expressions in English such as but, therefore, in conclusion, to the contrary, still, anyway, well, besides, actually, all in all, so, after all, and so on. These expressions are given the name discourse markers. Vagueness, sometimes called fuzziness, is widely used in human verbal communication and is one of the basic features of natural language, as a large number of scholars have shown (Russell, 1923; Zadeh, 1965; Wu Tieping, 1999). Zhang Liping (2006) studies vagueness tags in English speech.
……….
Chapter Three Research Methodology
3.1 Corpus Building
3.2 Corpus Tools
3.3 Research Procedures
Chapter Four Data Extracting and Analysis
4.1 Individual-Word-Based Extraction and Analysis
4.2 Chunk-Based Extraction and Analysis
4.3 Annotation-Based Extraction and Analysis
4.4 Intuition-Based Extraction and Analysis
4.4.1 Tag Question
4.4.2 Expressing Desire
4.4.3 Expressing Feelings
4.5 Summary
Chapter Five Conclusion
5.1 Major Findings
5.2 Implication
5.3 Limitations and Suggestions for Further Study
Chapter Four Data Extracting and Analysis
4.1 Individual-Word-Based Extraction and Analysis
In CEMTVS there are 17,437 word types, that is, different individual words. Lemmatization is taken into account, so forms such as 'abandon, abandons, abandoning, abandoned' are counted as one word type. Which of these word types are the most frequently used?
Observing the lemmatized wordlist, especially the top 18 most frequent words, each occurring well in excess of 10,000 times (shown in Table 4.1), we find that many of them clearly belong to the traditional province of grammar/function words: articles like a; pronouns like I, you, it, we; auxiliary verbs like do, have; determiners like the; basic prepositions or adverbs like of, in, on; basic conjunctions like and; and demonstratives like that and what. These words are left out here. The words I want to illustrate are content words, that is, nouns, verbs, adjectives and adverbs whose main function is to express meaning, as well as exclamations or interjections. Because each group contains a large number of words and space is limited, the top 20 high-frequency words will be listed for each group, and one word will then be chosen as an example for study.
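The counting step described above can be illustrated with a minimal Python sketch. It is not the corpus tool used in this study: the lemma map, the function-word list and the sample sentence are hypothetical and tiny, chosen only to show how inflected forms collapse into one word type and how function words are left out before ranking.

```python
import re
from collections import Counter

# Hypothetical lemma map (assumption): a real study would use a full lemma list.
LEMMA_MAP = {
    "abandons": "abandon",
    "abandoning": "abandon",
    "abandoned": "abandon",
    "things": "thing",
    "gets": "get",
    "got": "get",
    "getting": "get",
}

# Hypothetical function-word list (assumption), based on the examples in the text.
FUNCTION_WORDS = {
    "a", "the", "i", "you", "it", "we", "do", "have",
    "of", "in", "on", "and", "that", "what",
}


def tokenize(text):
    """Lower-case the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def lemma_frequencies(text, drop_function_words=True):
    """Count word types after mapping each token to its lemma."""
    counts = Counter(LEMMA_MAP.get(tok, tok) for tok in tokenize(text))
    if drop_function_words:
        for word in FUNCTION_WORDS:
            counts.pop(word, None)
    return counts


# Made-up sample text, not taken from CEMTVS.
sample = ("I abandoned the plan. You abandon things and we get "
          "the thing that got abandoned.")
freq = lemma_frequencies(sample)
print(freq.most_common(3))
```

Keeping the lemma map and the function-word list as plain data makes it easy to swap in larger lists without changing the counting logic.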
………
Conclusion
Based on the theories of corpus linguistics, this study has attempted to extract spoken English expressions from English movies and TV series. The major findings can be summarized as follows: frequency lists of individual words, or of individual nouns, verbs, adjectives, adverbs and interjections, can be generated automatically and quickly, and by observing the concordance lines of an individual word we can find some common spoken expressions. For example, the noun thing collocates with another noun in front of it to express a thing of the same or similar category; the verb get is used in the structures get ... to do and get ... doing, which are common in spoken English to express the meaning make somebody do something; the adjective good occurs frequently in good for, meaning it is good or great for; all is common as an adverb in place of completely or very; and well is used frequently as an exclamation to express different kinds of feelings.
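Observing concordance lines, as mentioned in the findings, means viewing each occurrence of a word with a few words of context on either side. The following Python sketch shows the idea in its simplest key-word-in-context form. The function name `concordance`, the window width and the sample sentence are assumptions made for illustration; the study's own corpus tools are not reproduced here.

```python
import re


def concordance(text, keyword, width=3):
    """Return (left context, node word, right context) for each hit.

    `width` is the number of words kept on each side of the node word.
    """
    tokens = re.findall(r"[A-Za-z']+", text)
    lines = []
    for i, tok in enumerate(tokens):
        if tok.lower() == keyword.lower():
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            lines.append((left, tok, right))
    return lines


# Made-up sample text, not taken from CEMTVS.
sample = ("That is good for you. Well, I got him to help, "
          "and it was all good for the team.")
lines = concordance(sample, "good")
for left, node, right in lines:
    # Right-align the left context so the node words line up in a column.
    print(f"{left:>20} [{node}] {right}")
```

Aligning the node word in one column makes recurring right-hand neighbours, such as for after good, easy to spot by eye.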
…………
Reference (omitted)