Data Mining and Machine Learning Series

New methods, old data: exploring historical migrant letter corpora

2nd February 2021, 11:00
Emma Moreton

Abstract

This project examines experiences of migration contained in one of the largest historical collections of migrant letters existing today, those of Irish migrants in the US and Canada from the late 18th to the early 20th century. At present, the corpus contains around 4,000 texts (more are being added). The letters have been digitised and basic contextual information has been captured, such as author/recipient name and location, date of letter, relationship between participants (sibling, parent, grandparent etc.) and sex of author/recipient. The letters were collected from two sources: 1) the Documenting Ireland: Parliament, People and Migration (DIPPAM) project, an online archive hosted by Queen’s University, Belfast, and 2) Professor Kerby Miller’s private collection of over 5,000 Irish migrant letters, housed at the University of Missouri.
Over the past decade there has been a growing interest in migrant letters and ‘histories from below’. Social and cultural historians have overwhelmingly used qualitative methods to study both the complex social processes of migration (such as push/pull factors and the role of institutions and communities) and the conditions and daily lives of the migrants themselves (e.g. Erickson, Miller, Kamphoefner et al.). Linguists, by contrast, have mainly used quantitative methods to explore language change and variation, identity construction and pragmatic features of correspondence (e.g. Elspaß, McLelland, Nurmi and Palander-Collin). There is, in other words, a separation of work between, on the one hand, the detailed study of individual migrant families and their trajectories, political contexts and socio-cultural worlds and, on the other hand, the study of how linguistic features differ and evolve. The digitisation of migrant letters, as will be discussed, makes it increasingly possible for these disciplines to work together in new and creative ways. However, finding the best means to interrogate and compare large digitised letter collections is still a challenge. Issues such as lack of punctuation and non-standard spellings are problematic, and finding ways of scaling up micro-studies (e.g. identifying topics and themes in the discourse of one letter collection and then doing the same across the whole corpus) presents technical challenges that cannot easily be addressed using corpus tools and techniques alone.
In this presentation I will outline some of the work that has already been done with the migrant letter corpus (mainly by corpus linguists, historians and political scientists). I will then discuss some of the problems we have faced when working with the corpus and where I see a need for specialists in NLP, data mining and computer science in order to fully explore this type of data.

Biography

Emma Moreton is a researcher and teacher in Applied Linguistics in the Department of English at Liverpool University. She completed her PhD in Corpus Linguistics at the University of Birmingham in 2016. Emma’s publications to date use a mixed-methods approach to examine the language of historical letter collections (including eighteenth-century pauper letters and nineteenth-century letters of migration). She is especially interested in how ego-documents can help us to understand the lives and experiences of ordinary men and women, whilst also providing new perspectives on social, cultural and economic issues of the time. Her recent publications focus, in particular, on how new technologies can be used to analyse and visualise the language and content of digitised correspondence collections, allowing users to identify, for example, topics and themes in the discourse, pragmatic functions, or letter-writing networks (see Moreton and Culy, 2019 and De Felice and Moreton, 2019). She has worked on several JISC-funded projects that explored the use of visualisation tools with corpus data, and in 2014 she was Co-Investigator on an AHRC Research Networking project (Digitising experiences of migration), which examined issues surrounding the transcription, digitisation, annotation and interconnectivity of migrant letter collections from around the world.