Funnily enough, I read this article on the train journey from Coventry to Southampton for the BALEAP PIM last weekend, where I went to a session delivered by Hilary Nesi and Andy Gillett about the development of teaching materials using the BAWE. I was disappointed to learn that BAWE is pronounced /bɔ:/ rather than the much cooler-sounding /bæwi:/ that a colleague and I had adopted!

“Issues in the development of the British Academic Written English (BAWE) corpus” by Sian Alsop and Hilary Nesi.


This article explores why the BAWE came into being and the issues faced when compiling a corpus of this size. The authors, having been through the process twice, offer insights into building a corpus.

The BAWE came about to address a gap in the corpora of the time, which were compilations of expert academic writing, like the PERC Corpus of Professional English, or of professionally produced writing, like that found in textbooks, which makes up much of the TOEFL 2000 Spoken and Written Academic Language Corpus (p. 72). Alsop and Nesi point out that TOEFL 2000 claims to be a comprehensive collection of US university registers, yet does not house examples of student-produced texts. Those corpora that do contain student writing have “been designed primarily to monitor non-native-speaker errors and the processes of language acquisition, rather than the development of academic literacy skills and disciplinary knowledge” (p. 72). Alsop and Nesi do acknowledge that some small-scale projects had collected samples of student texts produced for assessment within a discipline, but note that these had been compiled for individual research purposes (the authors cite the likes of Woodward-Kron (2004), Moore and Morton (2005), North (2005) and Samraj (2004, 2008)). The fact that a pilot corpus, funded by the University of Warwick Research and Teaching Development Fund, had been well received indicated to the creators that there was a need to build a much larger, more in-depth version.

The lessons learnt from compiling the pilot corpus informed the larger project, and this article focuses on the “strategies and processes … employed in an attempt to achieve greater balance and representation in the full-scale project” (p. 73). The priority for the BAWE, it seems, was to collect work that was representative of the university experience. This meant that the corpus was built up of texts across a wide range of disciplines (increased by the range of provision available from four universities – namely, Warwick University, Reading University, Oxford Brookes University and latterly Coventry University – in an effort to increase samples in underrepresented disciplines) and also across the levels of university study, i.e. from year one undergraduate to master’s level. The length and breadth of the BAWE was inspired by the shortcomings of the pilot.

Collection of the sample was also more controlled than it had been in the pilot, to ensure a spread of representation across disciplinary groupings (chosen to mirror the BASE corpus, British Academic Spoken English), and a cap was implemented on the number of assignments required in each discipline. A criterion was also established for the quality of the sample: assignments would only be accepted if they had received a final grade of 60 percent or above. Individual authors were also restricted to submitting no more than three assignments at each level, but Alsop and Nesi admit that this had to be altered to accommodate limited numbers of submissions.

Another key factor to consider in undertaking such a project is the cost involved, both financially and in terms of time. Alsop and Nesi detail the actual collection procedure in this article, and the main lesson learnt, they say, is that time is of the essence. It was a lengthy process from the initial submission to the student having to physically sign a copyright form and receive their payment for having done so. The project also undertook two quite large advertising campaigns, at the initial and halfway stages of the project. The second could be more focussed and address some issues that were already becoming apparent. It was therefore possible to hold ‘open-afternoons’, which advertised the BAWE “in person in targeted departments, [and which] provided information and computer access so that assignments could be submitted and processed on the spot, and contributors could receive immediate payment” (p. 78).

And herein lie the authors’ concluding recommendations. If a project undertakes such advertising activities, then proper monitoring of their effectiveness should be carried out. The collection of data needs to address the time issue, although Alsop and Nesi concede that the obvious fully automated solution is just not financially feasible. Finally, as the final corpus ended up being slightly under target (p. 81), Alsop and Nesi highlight the need for alternative incentives to offer students in reward for cooperation, naming Conrad and Albers’ idea of giving students extra credit. Alsop and Nesi offer the idea of compulsory submission too (p. 81).

Many thoughts struck me while reading this article. Firstly, I was quite surprised that previous corpora of academic writing would not include novice writing. Somewhat naively, I guess, I did not expect student invisibility. I think the data on the exact percentage of non-native speakers represented in the BAWE would also make interesting reading. I find it curious too that corpora of spoken academic English (BASE and MICASE) would have been compiled first. I guess that is simply a matter of ease of sample collection rather than any idea of placing greater import on a particular skill.

Something that resonated for me while reading this article was the questionable role disciplines played in such a project and how instrumental departments are in its success. Across three universities “departments varied in cooperativeness” (p. 74). The initial advertising campaign included “an e-mail with departmental endorsement” as it was, understandably, “assumed to hold more weight than an e-mail directly from the BAWE team” (p. 77). The second wave of advertising took place in departments, again to give the project more credence. Yet in their closing comments Alsop and Nesi make no mention of the fact that alternative motivations for sample collection are solely dependent on departments being involved in, and positively endorsing, such a project. Recommendations on how to foster this involvement would perhaps be more fruitful for those considering embarking on a similar project.

I actually think that the very need for a corpus like the BAWE is indicative of the pervasive problem of a lack of collaboration between ESAP practitioners and departments. The BAWE is an invaluable resource, and the great lengths gone to in making the project the success it is are testament to our profession’s desire to prepare our students as best we can for their university study. However, Nesi said something in her presentation at the PIM on Saturday that has stayed with me: “lecturers aren’t explicit in what they want” from students. So rather than working together to be explicit about what is expected of students on a particular course, we look at above-satisfactory examples of work and decipher what it was that lecturers expected. While there is a very real need for resources like the BAWE (and I am glad we have it), I do feel somewhat frustrated that we employ genre study and corpora as departmental cooperation in absentia.

Alsop, S. and Nesi, H. (2009) Issues in the development of the British Academic Written English (BAWE) corpus. Corpora 4 (1) 71-83.


2 thoughts on “Reflecting on BAWE”

  1. As one of the people who worked on the BAWE corpus at the pilot stage, I found these reflections very interesting. They made me consider how things have moved on in the thirteen years since the pilot project. I think it is probably still true that departments might want to work more closely with EAP centres to make the requirements of written discourse more explicit, and in many ways they have started to do so. On the other hand, I also feel that EAP still has some quite serious issues of its own to resolve. Firstly, EAP practitioners are not by any means in agreement as to what EAP really is or what its goal should be. At its worst, EAP can come over as a rather dull, sanctimonious discipline, and this clashes strongly with the culture of some academic communities. EAP also has a responsibility, if it is to be taken seriously, to market itself clearly and in a way that harmonises and blends in with research communities. Finally, perish the thought of subject lecturers not being clear about their expectations! Most that I have come across are actually very clear about what they feel is good writing. The fact that they do not always phrase these requirements in terms that EAP practitioners are familiar with perhaps does not lessen their validity? These are just some brief thoughts about your interesting blog entry. There are many challenges that lie ahead for EAP. A starting point is always to make it academically more attractive and interesting to students who are wearily ground down by years and years of IELTS study. No proliferation of corpora will do that in itself, so other strategies may be needed. Could CLIL be one answer, perhaps?

  2. Hi Gerard,
    Thank you for your comments. I absolutely agree with you! EAP is in dire need of resolving some of its issues and I also think that CLIL may bring about a positive transformation of the practice of EAP. In my post ‘Puzzled by CLIL’, Ioannou Georgiou argues that CLIL needs to have its basic principles defined, and I think this is true of EAP too. I also think that we need to include content specialists in this debate. As you say, EAP needs to blend with research communities, and I think those communities need to help inform what EAP actually is, just as I also believe EAP practices can inform the effective delivery of and engagement with subject matter in disciplines (I’m presenting on this idea at BALEAP on Sunday).
    The student body in UK HE institutions is such (in my context Media PG is almost exclusively NNS) that the time is right to bring EAP and research communities together in ways that have previously not been so practical.
    I plan to include discussion of this issue in my session at BALEAP. Gerard, if you’re free, I’d love for you to be there to take part 🙂
