In the first ‘Parliaments, Politics and People’ Seminar of 2016, Dr Luke Blaxill (Hertford College, Oxford) spoke on ‘‘Big data’ and the analysis of parliamentary and platform speeches, 1880-present’. Here he discusses his paper…
My paper argued that almost all modern historians – and perhaps especially political historians – are increasingly surrounded by huge digitised textual collections which are too large to read, let alone analyse, in their entirety. Given this, it is increasingly important that we explore the potential of computerised textual analysis methodologies, which allow us to escape our present scholarly confinement to what can be selectively read and quoted. One such methodology is ‘text mining’ (often also called ‘distant reading’), applied to huge collections of texts (known as ‘corpora’) with the aim of answering linguistic research questions.
In this paper, I showed two sets of examples of text mining in action, on corpora which I assembled myself or which are publicly available. Rather than simply showing ‘cool’ things which could be done with text mining, or delivering a sermon on ‘revolutionary’ digital methods, I wanted to connect these examples to existing historiographical debates and demonstrate how text mining could advance them.
My first set of examples was taken from several corpora of election platform speeches, 1880-1910, which I digitised myself from press archives, or took from the British Newspaper Archive. I began my analysis with the issue of Ireland in election platform speeches. This was very simple, and involved only tracking the keyword ‘Ireland’. However, even this basic counting provided interesting insights into synergies and divergences between the Liberal and Conservative parties, and between platform language in a case study region (East Anglia) and national frontbench speakers. It also suggested that, outside of 1886 and 1892, Ireland was not a consistently central issue in elections in this period. My second example was more sophisticated, and investigated the language of imperialism in electoral politics. At this point, I reflected a little more on how issues are modelled and tracked with keywords, and introduced topic modelling and seed words as empirical mechanisms by which this could be accomplished. Using a five-word taxonomy of imperialism, I showed that it was mentioned by Conservatives twice as often as by Liberals, but was not a particularly central issue outside of the elections of 1886, 1900, and 1906. Particularly surprising was the limited impact empire appeared to make on 1895, an election often caricatured as being similar to 1900: a Conservative landslide carried by imperialism. I also showed some examples of how text mining could connect keywords to arguments, which demonstrated the more emotionally charged context of Conservative as opposed to Liberal mentions of empire.
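For readers curious about what this kind of keyword tracking looks like in practice, the following is a minimal sketch in Python, not the method actually used in the paper. It counts mentions of a set of seed words per 10,000 words of speech, grouped by party and election year. The seed words, the function names and the two toy speeches are all invented for illustration; the paper's actual five-word taxonomy is not reproduced here.

```python
import re
from collections import Counter

# Hypothetical seed words, chosen for illustration only; the paper's
# actual five-word taxonomy of imperialism is not given in this post.
SEED_WORDS = {"empire", "imperial", "colonies"}


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def rate_per_10k(speeches, seed_words):
    """Return seed-word mentions per 10,000 words for each (party, year).

    `speeches` is an iterable of (party, year, text) tuples.
    """
    hits = Counter()
    totals = Counter()
    for party, year, text in speeches:
        tokens = tokenize(text)
        totals[(party, year)] += len(tokens)
        hits[(party, year)] += sum(1 for t in tokens if t in seed_words)
    # Normalising by corpus size makes groups of different lengths comparable.
    return {key: 10_000 * hits[key] / totals[key] for key in totals if totals[key]}


# Invented toy speeches, not drawn from any real corpus.
speeches = [
    ("Conservative", 1900, "The empire must stand united and the empire shall endure"),
    ("Liberal", 1900, "We speak of reform at home and of the colonies abroad"),
]
rates = rate_per_10k(speeches, SEED_WORDS)
for (party, year), rate in sorted(rates.items()):
    print(party, year, round(rate, 1))
```

Normalising by the total number of words matters because the corpora for different parties and elections are rarely the same size, so raw counts alone would be misleading.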
My second set of examples showcased some work from the Digging into Linked Parliamentary Data project (Dilipad) on which I am a research associate. The Dilipad project has digitally enriched the existing publicly available Hansard corpus to allow a historian to search MPs’ speeches by gender and (for most of the twentieth century) party. This has transformed the ability of researchers to interrogate this source via text mining. My case study here was on women in Parliament since 1945. The degree to which women MPs utilised a distinctly female parliamentary language – and thereby contributed to the ‘substantive representation’ of women – has been much debated by both political scientists and historians. I showed that women parliamentarians spoke about women around five times as often as their male colleagues, and presented a list of strong gender markers: vocabulary which women used consistently more often than men. Many of these concerned topics such as health and welfare. Perhaps more interestingly, I then demonstrated that this ‘gender gap’ has in fact been reduced by the entry of women into Parliament in large numbers since the 1997 election, and that since this historic contest (where 120 women MPs were elected) women’s parliamentary language has become more similar to men’s. This challenges the influential ‘critical mass’ theory, which suggested that the emergence of a large female parliamentary presence would create a stronger cross-party women’s lobby where gender differences could be more sharply articulated.
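One common way of finding vocabulary that one group of speakers uses more than another is to compare each word's relative frequency in the two groups. The sketch below illustrates that general idea with a smoothed log ratio; it is an assumption on my part that something of this kind is suitable here, and it is not a description of how the Dilipad project computed its gender markers. The function names and the toy sentences are invented for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def marker_scores(texts_a, texts_b):
    """Score each word by the log2 ratio of its relative frequency in A vs B.

    Positive scores mark words used relatively more by group A, negative
    scores words used relatively more by group B. Add-one smoothing keeps
    words that are absent from one group from producing a division by zero.
    """
    counts_a = Counter(t for text in texts_a for t in tokenize(text))
    counts_b = Counter(t for text in texts_b for t in tokenize(text))
    total_a = sum(counts_a.values())
    total_b = sum(counts_b.values())
    vocab = set(counts_a) | set(counts_b)
    size = len(vocab)
    scores = {}
    for word in vocab:
        share_a = (counts_a[word] + 1) / (total_a + size)
        share_b = (counts_b[word] + 1) / (total_b + size)
        scores[word] = math.log2(share_a / share_b)
    return scores


# Invented toy sentences, not drawn from Hansard.
group_a = ["health and welfare of women", "welfare of children"]
group_b = ["defence and trade", "trade and the economy"]
scores = marker_scores(group_a, group_b)

# Words most characteristic of group A come first.
ranked = sorted(scores, key=scores.get, reverse=True)
print(ranked[:3])
```

On a real corpus one would also want to exclude very rare words and check that a marker holds across many speakers and years, since a single prolific MP can otherwise dominate the list.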
In addition to these practical examples, I also made some general theoretical arguments about the utility of corpora for historians. In particular, I stressed the ability of computers to overcome the fallibility of human scholarly intuition and estimations of importance, their ability to communicate quantity with greater precision and verifiability, and the opportunity they afford to work more empirically.
I concluded by reiterating the historic opportunity which today’s scholars possess with these resources at their fingertips, and encouraged historians to experiment boldly with these (and other) techniques, while at the same time not discarding the historian’s traditional skills of close reading.
Our next ‘Parliaments, Politics and People’ seminar will take place later today when Dr Coleman Dennehy (University College, Dublin) will give a paper on ‘Dublin, Westminster, and appellate jurisdiction in early modern Ireland’. Full details are available here. We hope you can join us!