Sangam: A Confluence of Knowledge Streams

Securing a data set on allegations of sexual abuse made against the former disc jockey, Jimmy Savile


dc.contributor ESRC - Economic and Social Research Council
dc.contributor McDonnell, Diarmuid
dc.creator Smith, Mark
dc.creator Llewellyn, Clare
dc.creator Ruus, Laine
dc.creator Kirkwood, Steve
dc.creator Burnett, Ros
dc.date 2017-08-31T13:15:05Z
dc.identifier Smith, Mark; Llewellyn, Clare; Ruus, Laine; Kirkwood, Steve; Burnett, Ros. (2017). Securing a data set on allegations of sexual abuse made against the former disc jockey, Jimmy Savile, [dataset]. University of Edinburgh. School of Social and Political Science. Social Work. https://doi.org/10.7488/ds/2126.
dc.identifier https://hdl.handle.net/10283/2809
dc.identifier https://doi.org/10.7488/ds/2126
dc.description ## This dataset has been moved to the Edinburgh DataVault, where it is directly accessible only by authorised University of Edinburgh users. For further information please see https://www.research.ed.ac.uk/en/datasets/securing-a-data-set-on-allegations-of-sexual-abuse-made-against-t ## In this work we look at the initial phase of an ESRC-funded project involving academics from Social Work, Criminology, Informatics and the University of Edinburgh Library. This project collected and analysed a data set on allegations of sexual abuse made against the former disc jockey, Jimmy Savile. The Savile affair has taken place in a public and highly charged arena. It has generated massive media attention and spawned several public reports, most notably the report produced as a result of Operation Yewtree. Early allegations against Savile emanate from former residents at Duncroft, a residential school for 'wayward but intelligent young women'. This project stems from data produced and collected by the blogger 'Anna Raccoon', herself a former resident at the school. As a result of her blog posts on the subject of Savile and Duncroft, she was contacted by others and has collected a variety of information on the subject. The data harvested from the blog are supplemented by official reports and other blogs.
dc.description The initial component of the project involves capturing Anna Raccoon’s blog (The Raccoon Arms). This is a WordPress blog that was taken down by the author. Following previous research approaches [9, 8] we searched for copies of the site in other content management systems. We found that this site had been archived in several frozen states in the Internet Archive’s Wayback Machine (IA). An active blog is a constantly evolving object, and therefore careful consideration needs to be given as to which version or versions should be harvested. Given that the blog is available via the IA, one might question why it is necessary to download a copy at all. There are two main reasons for doing so. Firstly, the IA may at any time, and without notice, remove the objects from its archive. Secondly, a local copy makes it possible to provide additional functionality to support qualitative analysis of the content of the blog, as well as indexing to support additional resource discovery not provided within the blog software or the IA. While harvesting the contents of a blog manually can be a long and arduous process, it can be simplified and automated using a software solution, such as wget. Apart from soliciting permission from the IA, decisions need to be made as to which version or versions should be harvested. Further decisions include the level of recursion for each harvest and whether just the blog text or all files contributing to the content and functionality of the blog should be gathered. Such decisions influence not only the size of the eventual object, but also the richness of the context. There are also concomitant drawbacks: the deeper the recursion, the greater the number of missing files (those that have not been harvested by the IA).
Given that WordPress blogs are based on HTML files, the text portion (apart from any images and other audio-visual files that may be associated with the blog) is in as efficient a format as possible with respect to file storage, and it lends itself to the use of XML for value-added indexing and tagging. Storage capacity requirements depend largely on the number of snapshots of the blog that are harvested and the level of recursion specified in the harvests. The size of one snapshot can range from 53 MiB to 660 MiB (ranging from 1,500 to 88,000 files), depending on the options specified.
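The harvesting decisions described above (which snapshot, what level of recursion, and whether to gather only the blog text or all contributing files) can be sketched as a small Python helper that assembles a wget command line. This is a minimal sketch under stated assumptions, not the procedure the project actually used: the function name, the example.org address, the timestamp and the output directory are hypothetical. The wget options shown are standard ones, and the snapshot address follows the usual Wayback Machine pattern of a 14-digit timestamp followed by the original URL.

```python
import shlex

# Usual Wayback Machine address pattern: <prefix>/<timestamp>/<original URL>
WAYBACK_PREFIX = "https://web.archive.org/web"


def build_wget_command(blog_url, timestamp, depth, full_page=True, out_dir="harvest"):
    """Assemble a wget argument list for one archived snapshot.

    blog_url, timestamp and out_dir are hypothetical inputs chosen by the
    harvester; nothing here is taken from the project's own scripts.
    """
    if depth < 1:
        # wget treats --level=0 as unlimited recursion, so require an explicit bound.
        raise ValueError("depth must be at least 1")
    snapshot_url = f"{WAYBACK_PREFIX}/{timestamp}/{blog_url}"
    cmd = [
        "wget",
        "--recursive",                # follow links from the start page
        f"--level={depth}",           # level of recursion for this harvest
        "--no-parent",                # never ascend above the starting path
        "--wait=1",                   # pause between requests to be polite
        f"--directory-prefix={out_dir}/{timestamp}",  # one folder per snapshot
    ]
    if full_page:
        # Also gather images, stylesheets and scripts needed to render pages,
        # and rewrite links so the local copy can be browsed offline.
        cmd += ["--page-requisites", "--convert-links", "--adjust-extension"]
    cmd.append(snapshot_url)
    return cmd


if __name__ == "__main__":
    # Hypothetical snapshot: placeholder site and timestamp.
    command = build_wget_command("http://example.org/", "20150101000000", depth=2)
    print(shlex.join(command))
```

Keeping one output folder per timestamp makes it easy to compare snapshots and to see how the file count and size grow as the recursion level is raised. In practice, links inside an archived page may point to neighbouring timestamps, so the options would need adjusting for a real harvest.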
dc.format application/zip
dc.format application/pdf
dc.format application/zip
dc.language eng
dc.publisher University of Edinburgh. School of Social and Political Science. Social Work
dc.rights Creative Commons Attribution 4.0 International Public License
dc.subject Jimmy Savile
dc.subject sexual abuse
dc.subject Duncroft
dc.subject Social studies::Social Work
dc.title Securing a data set on allegations of sexual abuse made against the former disc jockey, Jimmy Savile
dc.type dataset


Files in this item

Files                                           Size      Format
admin_docs.zip                                  1.206Gb   application/zip
A Shared Langua ... Sensitive Information.pdf   87.20Kb   application/pdf
Savile KE Event Survey Responses 9.zip          49.67Kb   application/zip
