Tim Evans' site on Complex Networks

Tag: digital humanities

What can models tell us about history?

One of my interests has been trying to use ideas from complexity and network science to look at historical questions. In part, this is because historical data is often a real challenge to a researcher like me who is used to modern data sets. At the same time, that gap in our knowledge means there is a real opportunity for modelling to contribute something positive and substantial to historical debates. Part of the challenge is to think about the role of uncertainty in models: the effect of uncertainty in the data, and how to quantify uncertainty in the models themselves. My physics background has taught me that a conclusion without some sense of its accuracy carries little practical meaning. My recent paper with Ray Rivers, Was Thebes Necessary? Contingency in Spatial Modelling (Frontiers in Digital Humanities, 2017, 4, 8), was all about finding ways to probe uncertainties.

So I was delighted to support a suggestion from Chiara Girotto that, along with Ray Rivers, we should organise a session on the challenges faced by modelling in an archaeological context at the next EAA (European Association of Archaeologists) meeting. The session has been accepted, so we are really looking forward to it in Barcelona, 5-8 September 2018. We are looking for contributions from anyone with something interesting to say. All types of contribution will be considered: for instance, presenters might want to highlight the limitations of this approach, or show how some technique from the physical sciences might be adapted to this context. I’d love to hear about examples built on good archaeological methods so I can learn more about the issues that archaeologists may take for granted but that I, with no formal training in archaeology, haven’t even thought about. So do think about contributing or attending.

I really enjoyed the EAA in Maastricht in 2017. A lot was outside my immediate research but still intriguing, and I learnt a lot. There was also a solid core of modellers who made it both an exciting and relevant conference for me. I can see that our session, entitled “Challenging the models: Reflections of reality?”, fits in well with several other sessions, so again there is a really good strand through the meeting to keep me entertained and busy. At the time of writing, the deadline for submissions was 15 February 2018.

Session 545 at EAA Barcelona 5-8 September 2018: Challenging the models: Reflections of reality?

Content:

Currently, modelling is a central part of archaeological behavioural research. Many papers focus on the ability to extract reflections of past social interactions and structures from a variety of archaeological and environmental sources. Especially in the light of highly theoretical archaeological modelling of pre- and protohistory, this often leads to environmentally driven, Darwinian-like models devoid of cognitive human factors, fuzzy decision making, and the possibility of non-rational choice. Considering all the assumptions required for social interaction models, we have to question whether a model might be too complex to operate on the basis of our data. Has it entered the vicious circle of self-affirmation? Are our models questioning our own lack of knowledge? Where are we on an epistemic-ontic scale?
In our session we wish to address and discuss current problems in archaeological behavioural modelling. Questions tackled might include:
• Are we creating Processualism 2.0?
• How are narratives encoded in models, from a theoretical, methodological or practical viewpoint?
• How does the inclusion of social theory and the fuzziness of human decision making alter the results of a model?
• What is the impact of assumptions on modelling results?
• What is the impact of archaeological data on a model’s outcome?
• How can we use the inherent capabilities and inabilities of models to better interpret and narrate our approximations of reality?

Main organiser:

Chiara Girotto  (Goethe University Frankfurt, Germany)

Co-organisers:

Tim Evans (Imperial College London, U.K.), Ray Rivers (Imperial College London, U.K.)

 

Exploring Big Historical Data

I’ve really enjoyed reading my copy of Exploring Big Historical Data: The Historian’s Macroscope (Macroscope for short here) by Shawn Graham, Ian Milligan and Scott Weingart. As the authors suggest, the book will be ideal for students or researchers from the humanities asking if they can use big data ideas in their work. While history is the underlying context here, most of the questions and tools are relevant whenever you have text-based data, large or small. For physical scientists, many of whom are not used to text data, Macroscope prompts you to ask all the right questions. So this is a book which can really cross the disciplines.

Even if some readers are like me and find some aspects of the book very familiar, they will still find some new and stimulating ideas. Failing that, they will be able to draw on the simplicity of the explanations in Macroscope for their own work. I know enough about text and network analysis to see that the details of the methods were skipped over, but enough of a broad overview was given for someone to start using the tools. PageRank and tf-idf (term frequency-inverse document frequency) are examples where that practical approach was followed. The humanities have a lot of experience of working with texts, and a physical scientist like myself can learn a lot from that experience. I have heard these lessons piecemeal in talks and articles over the last ten years or so, but I enjoyed having them reinforced in a coherent way in one place.

I worry a bit that the details in Macroscope of how to use one tool or another will rapidly date, but on the other hand it means a novice has a real chance to try these ideas out from this book alone. It is also where the online resources will come into their own. So I am already planning to recommend this text to my final-year physics students tackling projects involving text. My students can handle the technical aspects without the book, but even there they will find it gives them a quick way in.
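For readers who have not met tf-idf before, the idea is simple enough to sketch in a few lines of Python. This is a toy illustration of the standard formula (raw term frequency, logarithmic inverse document frequency), not code from Macroscope, and the example documents are made up:

```python
import math
from collections import Counter

def tf_idf(docs):
    """Compute tf-idf scores for each term in each document.

    tf  = raw count of the term in the document;
    idf = log(N / number of documents containing the term).
    """
    n = len(docs)
    tokenised = [doc.lower().split() for doc in docs]
    # Document frequency: in how many documents does each term appear?
    df = Counter()
    for tokens in tokenised:
        df.update(set(tokens))
    scores = []
    for tokens in tokenised:
        tf = Counter(tokens)
        scores.append({t: tf[t] * math.log(n / df[t]) for t in tf})
    return scores

docs = [
    "the quantum network model",
    "the historical network archive",
    "the quantum field theory",
]
scores = tf_idf(docs)
# "the" appears in every document, so its idf (hence tf-idf) is zero,
# while rarer terms like "historical" score higher than common ones.
```

The point of the weighting is exactly the one the book makes: words that appear everywhere carry no discriminating information, while words concentrated in a few documents do.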

Imperial College Physics Staff

Staff in the Physics Department of Imperial College London clustered on the basis of the abstracts of their recent papers.

I can see that this book works, as I picked up some of the simpler suggestions and used them on a pet project: looking at the way that the staff in my department are related through their research interests. I want to see if any bottom-up structure of staff research I can produce from texts written by staff matches up to existing and proposed top-down structures of faculties – departments – research groups.

I started by using Python to access the Scopus API. I’m not sure you can call Elsevier’s pages on this API documentation, and even Stack Overflow struggled to help me, but the blog post Getting data from the Scopus API helped a lot. A hand-collected list of Scopus author IDs enabled me to collect all the abstracts from recent papers coauthored by each staff member. I used Python libraries to cluster and display the data, following a couple of useful blogs on this process, and got some very acceptable results.

However, I then realised that I could use the text modelling discussed in the book on the data I had produced. Sure enough, a quick and easy tool was suggested in Macroscope, one I didn’t know: Voyant Tools. I just needed a few more lines in my code to produce text files, initially one per staff member containing all their recent abstracts in one document. With the Macroscope book in one hand, I soon had a first set of topics, something easy to look at and consider. This showed me that words like Physical and American were often keywords, the second of these being quite surprising initially. However, a quick look at the documents with a text editor (a tool that is rightly never far away in Macroscope) revealed that many abstracts start with a copyright statement such as “2015 American Physical Society”, something I might want to remove as this project progresses.
I am very wary of such data clustering in general, but with proper thought, and with checks and balances of the sort which are a key part of Macroscope, you can extract useful information which was otherwise hidden.
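To give a flavour of the kind of similarity measure that sits underneath such clustering, here is a minimal Python sketch: cosine similarity between word-count vectors built from each person’s concatenated abstracts. The staff names and abstract words here are invented for illustration; this is not my actual pipeline.

```python
import math
from collections import Counter

def cosine_similarity(text_a, text_b):
    """Cosine similarity between two documents' word-count vectors."""
    a = Counter(text_a.lower().split())
    b = Counter(text_b.lower().split())
    dot = sum(a[w] * b[w] for w in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    norm = norm_a * norm_b
    return dot / norm if norm else 0.0

# Hypothetical per-staff "documents" made by concatenating abstracts.
staff_docs = {
    "alice": "quantum optics photon entanglement quantum",
    "bob":   "photon entanglement quantum information",
    "carol": "galaxy survey dark matter cosmology",
}
# Alice and Bob share vocabulary, so they come out far more similar
# to each other than either is to Carol, whose words do not overlap.
```

A clustering algorithm then just groups together the staff members whose pairwise similarities are high; the checks and balances come in when you inspect why two people ended up in the same cluster.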

So even for someone like me, who has used or knows about sophisticated tools in this area and is (over)confident about using them, the technical side of Macroscope should provide a very useful shortcut, despite my initial uncertainty. Beyond that, I found having the basic issues and ideas behind these approaches reinforced and well laid out really helpful. Someone starting out, like some of my own physical science masters and bachelors students working on my social science projects, will find this book invaluable. A blog or intro document will often show you how to run a tool, but it will not always emphasise the wider principles and context for such studies, something you get with Macroscope.

I should make clear that I do have some formal connections with this book, one of my contributions to the pool of academic goodwill. I suggested the general topic of digital humanities, and Shawn Graham in particular as a potential author, at an annual meeting of the physics and maths advisory committee for ICP (Imperial College Press). For free sandwiches we pass on ideas for topics or book projects to the publisher. I also commented on the formal proposal from all three authors to ICP, for which I often get a free book; my copy of Macroscope was obtained for reviewing a recent book proposal for ICP. Beyond this I get no remuneration from ICP. It is nice to see a topic and an author I highlighted come together in a real book, but the idea is the easy bit and hardly novel in this case. Taking up the idea and making it into a practical publishing project is down to Alice Oven and her ICP colleagues, and to the authors Shawn Graham, Ian Milligan and Scott Weingart. That’s particularly true here, as the book was produced in an unusual open-source way and ICP had the guts to go along with the authors to try this different type of approach to publishing.

References

Exploring Big Historical Data: The Historian’s Macroscope
Shawn Graham (Carleton University, Canada),
Ian Milligan (University of Waterloo, Canada),
Scott Weingart (Indiana University, USA)
ISBN: 978-1-78326-608-1 (hardback)
ISBN: 978-1-78326-637-1 (paperback)

Myths and Networks

I have just read an intriguing paper by Carron and Kenna entitled ‘Universal properties of mythological networks’. In it they analyse the character networks in three ancient stories: Beowulf, the Iliad and the Irish story Táin Bó Cuailnge. That is, the characters form the nodes of a network and they are connected if they appear together in the same part of the story. It has caused quite a bit of activity. It has prompted two posts on The Networks Network already and has even sparked activity in the UK newspapers (see John Sutherland writing in the Guardian on Wednesday 25 July 2012, and the follow-up comment by Ralph Kenna, one of the authors). Well, summer is the traditional silly season for newspapers.

However, I think it is too easy to dismiss the article. I think Tom Brugmans’ post on The Networks Network has it right that “as an exploratory exercise it would have been fine”. I disagreed with much in the paper, but it did intrigue me, and many papers fail to do even this much. So overall I think it was a useful publication. I think there are ideas there waiting to be developed further.

I like the general idea that there might be some information in the character networks which would enable one to say if a story was based on fact or was pure fiction. That is, if the character networks have the same characteristics as a real social network, it would support the idea that the story was based on historical events. I was intrigued by some of the measures suggested as a way to differentiate between different types of literary work. However, like both Tom Brugmans and Marco Büchler, I was unconvinced the authors’ measures really do the job suggested. I’d really like to see a lot more evidence from many more texts before linking a particular measurement to a particular feature in character networks.

For instance, Carron and Kenna suggest that in hierarchical networks, for every node, the degree times the clustering coefficient is a constant, eqn (2). That is, each of your friends is always connected to the same (on average) number of your friends. By way of contrast, in a classical (Erdős–Rényi) random graph the clustering coefficient is a constant. However, I don’t see that as hierarchical but as an indication that everyone lives in similar-sized communities, some sort of fictional-character Dunbar number. I’m sure you could have a very flat arrangement of communities and get the same result. Perhaps we mean different things by hierarchical.
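The two quantities being compared here, a node’s degree k and its local clustering coefficient C, are easy to compute directly. A minimal sketch on a made-up four-node graph (not the paper’s data), showing that k×C varies from node to node in general:

```python
from itertools import combinations

def local_clustering(adj, v):
    """Fraction of pairs of v's neighbours that are themselves connected."""
    nbrs = adj[v]
    k = len(nbrs)
    if k < 2:
        return 0.0
    links = sum(1 for a, b in combinations(nbrs, 2) if b in adj[a])
    return 2 * links / (k * (k - 1))

# Toy graph: a triangle (0, 1, 2) with node 3 hanging off node 0.
adj = {0: {1, 2, 3}, 1: {0, 2}, 2: {0, 1}, 3: {0}}
# Node 1: degree 2, its two neighbours are connected, so C = 1 and k*C = 2.
# Node 0: degree 3, only one of three neighbour pairs is connected,
#         so C = 1/3 and k*C = 1.
```

Carron and Kenna’s hierarchical condition amounts to asking that k×C come out (roughly) the same for every node, which this little graph visibly fails.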

Another claim was that in collaboration networks less than 90% of nodes are in the giant component. The Newman paper referred to is about scientific collaboration derived from coauthorships, which is very different from the actual social network of scientists (science is not done in isolation; no one is really isolated). I’m not sure the Newman paper tells us anything about character structure in fictional or non-fictional texts. I cannot see why one would introduce any set of characters in any story (fictional or not) who are disconnected from the rest. Perhaps some clever tale with two strands separated in time yet connected in terms other than social relationships (e.g. through geography or action) could do it; David Mitchell’s “Cloud Atlas” comes to mind, but these are pretty contrived structures.
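The giant-component fraction itself is straightforward to measure; a minimal breadth-first-search sketch in Python, run on a made-up character network (the names are placeholders, not data from any of the texts):

```python
from collections import deque

def giant_component_fraction(adj):
    """Fraction of nodes lying in the largest connected component."""
    seen, best = set(), 0
    for start in adj:
        if start in seen:
            continue
        # Breadth-first search to collect start's whole component.
        comp, queue = {start}, deque([start])
        while queue:
            v = queue.popleft()
            for w in adj[v]:
                if w not in comp:
                    comp.add(w)
                    queue.append(w)
        seen |= comp
        best = max(best, len(comp))
    return best / len(adj)

# Five connected characters plus one isolated pair: 5 of 7 nodes sit
# in the giant component, well under the 90% threshold discussed above.
adj = {
    "a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b", "d"},
    "d": {"c", "e"}, "e": {"d"},
    "x": {"y"}, "y": {"x"},
}
```

For a character network the interesting question is whether any disconnected pieces like {x, y} appear at all, which is exactly the point about storytelling made above.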

I think a real problem in the detail of the paper, as Marco Büchler points out, is that these texts and their networks are just too small. There is no way one can talk rigorously about power laws, and certainly not to two-decimal-place accuracy. I thought Michael Stumpf and Mason Porter’s commentary (Critical Truths about Power Laws) was not needed, since everyone knew the issues by now (in fact I don’t agree with some of the interpretation of mathematical results in Stumpf and Porter). Perhaps this mythological networks paper shows I was wrong. At best, power-law forms for small networks (and small to me means under a million nodes in this context) give a reasonable description or summary of the fat-tailed distributions found here, but many other functional forms will do this too. I see no useful information in the specific forms suggested by Carron and Kenna.

Another point raised in the text was the idea that you could extract subnetworks representing ‘friendly’ social networks. That is interesting, but really they are suggesting we need to do a semantic analysis of the links in the text, indicating where links are positive or negative (if they are that simple, of course) and form signed networks (e.g. see Szell et al. on how this might be done on a large scale, http://arxiv.org/abs/1003.5137). I think that is a much harder job to do in these texts than the simple tricks used here suggest, but it is an important aspect of such analysis and I take the authors’ point.

Finally, I was interested that they mention other character networks derived from five other fictional sources. I always liked the Marvel comic character example, for instance (Alberich et al., http://arxiv.org/abs/cond-mat/0202174), as it showed that while networks were indeed trendy and hyped (everything became a network), there was often something useful hiding underneath and trying to get out in even the most bizarre examples. However, what caught my eye in the five extra examples mentioned by Carron and Kenna was that they treated all five as ‘fictional literature’. One, Shakespeare’s Richard III, is surely a fictionalised account of real history, written much closer to the real events and drawing on ‘historical’ accounts. I would have expected it to show the same features as they claim for their three chosen texts.

So I was intrigued, and in that sense the paper was worthwhile to me, as any paper or talk that intrigues me is. However, while I was interested, I’d need to see much more work on the idea. You might try many different tests and measurements and see if they cumulatively point in one direction or another; I imagine a PCA-type plot showing different types of network in tight clusters in some ‘measurement’ space. I’d still need convincing on a large number of trial texts. These do now exist, though, so surely there is a digital humanities project here? Or is it already happening somewhere?
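As a toy illustration of that ‘measurement space’ idea, with entirely invented feature values: each text’s character network is reduced to a vector of measurements (here mean degree, clustering, giant-component fraction), and if networks of the same kind really do cluster, within-class distances should be smaller than between-class ones.

```python
import math

# Hypothetical measurement vectors (mean degree, clustering coefficient,
# giant-component fraction) for a few texts; illustrative numbers only.
measurements = {
    "epic_a":  (7.4, 0.57, 0.98),
    "epic_b":  (7.1, 0.62, 0.97),
    "novel_a": (3.2, 0.21, 0.80),
}

def distance(p, q):
    """Euclidean distance between two measurement vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

# The two "epics" sit close together in measurement space, while the
# "novel" lies far away; PCA would just project such vectors to 2D
# for plotting.
```

In practice one would want the features rescaled to comparable units before computing distances or running PCA, but the principle is the same.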

