Funnily enough, I read this article on the train from Coventry to Southampton on my way to the BALEAP PIM last weekend, where I went to a session delivered by Hilary Nesi and Andy Gillet on developing teaching materials using the BAWE. I was disappointed to learn that BAWE is pronounced /bɔ:/ rather than the much cooler-sounding /bæwi:/ that a colleague and I had adopted!
“Issues in the development of the British Academic Written English (BAWE) corpus” by Sian Alsop and Hilary Nesi.
This article explores why the BAWE came into being and the issues faced when compiling a corpus of this size. Since the authors have been through the process twice, it also offers insights into building a corpus.
The BAWE came about to address a gap in the corpora of the time, which were compilations of expert academic writing, like the PERC Corpus of Professional English, or of professionally produced writing, like that found in textbooks, which makes up much of the TOEFL 2000 Spoken and Written Academic Language Corpus (p. 72). Alsop and Nesi point out that TOEFL 2000 claims to be a comprehensive collection of US university registers, yet does not house examples of student-produced texts. Any corpora that did contain student writing had “been designed primarily to monitor non-native-speaker errors and the processes of language acquisition, rather than the development of academic literacy skills and disciplinary knowledge” (p. 72). Alsop and Nesi do acknowledge that some small-scale projects had collected samples of texts that students produced for assessment within a discipline, but these had been compiled for individual research purposes (the authors cite the likes of Woodward-Kron (2004), Moore and Morton (2005), North (2005) and Samraj (2004, 2008)). The fact that a pilot corpus, funded by the University of Warwick Research and Teaching Development Fund, had been well received indicated to the creators that there was a need to build a much larger, more in-depth version.
The lessons learnt from compiling the pilot corpus informed the larger project, and this article focuses on the “strategies and processes … employed in an attempt to achieve greater balance and representation in the full-scale project” (p. 73). The priority for the BAWE, it seems, was to collect work that was representative of the university experience. This meant building the corpus from texts across a wide range of disciplines and across the levels of university study, from first-year undergraduate to master’s level. The range of disciplines was widened by drawing on the provision available at four universities (Warwick University, Reading University, Oxford Brookes University and, latterly, Coventry University) in an effort to increase samples in underrepresented disciplines. The length and breadth of the BAWE were inspired by the shortcomings of the pilot.
Collection of the sample was also more controlled than it had been in the pilot, to ensure a spread of representation across disciplinary groupings (chosen to mirror those of the British Academic Spoken English, or BASE, corpus), and a cap was implemented on the number of assignments required in each discipline. A criterion was also established for the quality of the sample: assignments would only be accepted if they had received a final grade of 60 percent or above. Individual authors were also restricted to submitting no more than three assignments at each level, but Alsop and Nesi admit that this rule had to be altered to accommodate limited numbers of submissions.
Another key factor to consider when undertaking such a project is cost, both financial and in terms of time. Alsop and Nesi detail the actual collection procedure in this article, and the main lesson learnt, they say, is that time is of the essence. It was a lengthy process from initial submission to the student physically signing a copyright form and receiving payment for having done so. The project also ran two fairly large advertising campaigns, at the initial and halfway stages. The second campaign could be more focused and address some issues that were already becoming apparent. It was therefore possible to hold ‘open afternoons’, which advertised the BAWE “in person in targeted departments, [and which] provided information and computer access so that assignments could be submitted and processed on the spot, and contributors could receive immediate payment” (p. 78).
And herein lie the authors’ concluding recommendations. If a project undertakes such advertising activities, their effectiveness should be properly monitored. Data collection needs to address the time issue, although Alsop and Nesi concede that the obvious solution, full automation, is simply not financially feasible. Finally, as the completed corpus ended up slightly under target (p. 81), Alsop and Nesi highlight the need for alternative incentives to reward students for their cooperation, citing Conrad and Albers’ idea of giving students extra credit. They also raise the idea of compulsory submission (p. 81).
Many thoughts struck me while reading this article. Firstly, I was quite surprised that previous corpora of academic writing did not include novice writing; somewhat naively, I suppose, I had not expected students to be so invisible. I think data on the exact percentage of non-native speakers represented in the BAWE would also make interesting reading. I also find it curious that corpora of spoken academic English (BASE and MICASE) were compiled first. I suspect that is simply a matter of ease of sample collection rather than any intention to place greater importance on a particular skill.
Something that resonated with me while reading this article was the questionable role disciplines played in such a project and how instrumental departments are to its success. Across three universities, “departments varied in cooperativeness” (p. 74), and the initial advertising campaign included “an e-mail with departmental endorsement”, as this was understandably “assumed to hold more weight than an e-mail directly from the BAWE team” (p. 77). The second wave of advertising took place in departments, again to give the project more credence. Yet Alsop and Nesi’s closing comments make no mention of the fact that the alternative incentives for sample collection depend entirely on departments being involved in, and positively endorsing, such a project. Recommendations on how to foster this involvement would perhaps be more useful to those considering embarking on a similar project.
I actually think that the very need for a corpus like the BAWE is indicative of a pervasive problem: a lack of collaboration between ESAP practitioners and departments. The BAWE is an invaluable resource, and the great efforts that went into making the project the success it is are testament to our profession’s desire to prepare our students as well as we can for university study. However, something Nesi said in her presentation at the PIM on Saturday has stayed with me: “lecturers aren’t explicit in what they want” from students. So, rather than working together to make explicit what is expected of students on a particular course, we look at above-satisfactory examples of work and try to decipher what lecturers expected. While there is a very real need for resources like the BAWE (and I am glad we have it), I do feel somewhat frustrated that we employ genre study and corpora as a stand-in for departmental cooperation that is absent.
Alsop, S. and Nesi, H. (2009) Issues in the development of the British Academic Written English (BAWE) corpus. Corpora 4 (1) 71-83.