Social media as a research method


Communication Research and Practice

ISSN: 2204-1451 (Print) 2206-3374 (Online) Journal homepage: http://www.tandfonline.com/loi/rcrp20

To cite this article: Tarrin Wills (2016) Social media as a research method, Communication Research and Practice, 2:1, 7–19, DOI: 10.1080/22041451.2016.1155312

Published online: 25 Apr 2016.


Tarrin Wills
Department of English, University of Sydney, Sydney, Australia


ABSTRACT

Major developments in information technology in the digital society are eventually realised in the way in which research is conducted, particularly in the field of digital humanities (DH). Through a brief historical survey, this paper observes that the adoption of new technologies in DH occurs with some delay from the wide-scale adoption of the same technologies in other areas of society. This delay allows for a prediction about what technologies may be adopted in the near future in DH. In particular, the rise of social media in recent years provides a potential model for future DH research, particularly as it differs greatly from previous technologies in its capacity to engage end-users in digital methods. This paper argues that the techniques by which users interact with data in social media, particularly categorisation and semantic tagging, can be applied to a broad range of humanities research methodologies using similar interfaces to those of social media platforms. It then discusses some research tools developed by the author as a way of facilitating the interaction between researchers and primary sources using digital methods. Although much more limited than social media tools, these show a way forward for implementing social media methods in the field of humanities research.

ARTICLE HISTORY

Received 27 November 2015
Accepted 8 February 2016

KEYWORDS

Digital humanities; social media; hashtag; medieval studies; digital society

Introduction

This paper is based on a presentation at the conference Digging the Data at the University of Sydney (17 April 2015). It incorporates some reflections on the insights gained through the conference on how those working particularly in the discipline of media studies use and analyse the outputs of new media. It addresses a cultural and methodological problem specifically in the field of digital humanities (DH), focusing on humanities research which deals with non-contemporary primary materials, particularly those which do not originate in digital processes. Through a brief historical survey of DH in medieval studies, it identifies a significant delay in the take-up of otherwise mature and widely adopted information technologies in traditional disciplines in the humanities. The consequence of this delay is that we can predict what changes in information technology and culture are likely to influence future research in such fields. One movement that has dominated the period leading up to this paper is that of social media, and the latter part will address the question of how social media can be applied to traditional humanities disciplines as a research methodology in its own right. This paper develops some general principles by which broad types of humanities research could be advanced using methods deriving from and compatible with the ways in which social media users interact with textual and non-textual data.

This paper avoids reference to works which attempt to define DH and digital editing in particular (but see e.g. Robinson, 2013 for a review of these), as well as 'where are we now/going'-type collections and publications in DH, as they often obfuscate the long-term picture and do not necessarily reflect broader trends. Although the author has been working in digital methods for humanities research since the late 1990s, the research has been largely focused on research questions related to the interdisciplinary study of early Scandinavia, rather than DH as a research field in itself.

The adoption of mainstream information technologies in DH

In 1974, Speculum, the highest-impact journal in the field of medieval studies, included a paper entitled 'Report: Computers and the Medievalist' (Bullough, Lusignan, & Ohlgren, 1974). The vast majority of the paper is devoted to describing concordance-generating projects, with music, archaeology, and social data in small sections of their own. What is apparent from this stage of computing in the field is that these methods could only process relatively simple structures (words, numeric data). The report concludes:

But we feel that we are slowly leaving this barbarian age, and moving toward our Carolingian Renaissance. To do more, many of us will have to gain more training, or perhaps demand that our students obtain training. In fact, it is significant that a high proportion of studies reported in this brief review were undertaken by assistant professors or fairly recent Ph.D. graduates. Perhaps they are the coming generation. We should also provide ourselves with the necessary tools to achieve this progress. Data preparation remains the heaviest burden of text processing and here the setting up of a medieval data bank would bring an important relief to the scholar. Medieval text processing calls for its own Irish monastic libraries! (Bullough et al., 1974, p. 402)

What is remarkable about this summary is how much of it remains true to this day. It does point to a fundamental problem in how the field works in comparison with other fields such as new media: to begin with, the texts need to be digitised, and digitised in a way that supports analysis. For this reason, a large amount of energy in the field is devoted to data preparation.

Some 10 years later, a publication, Computer Applications to Medieval Studies (Gilmour-Bryson, 1984), dealt with a few projects in the field. The contributions are notable for their largely descriptive content, rather than addressing specific research questions. The data formats discussed reflect the heyday of mainframe computing: non-relational fixed-length records (i.e. a single table with up to 8 columns). Few projects had the capacity to deal with a character set that encompassed case sensitivity, let alone non-ASCII characters. This technology could still produce simple text concordances, usually distributed in print, and at least one project attempted to use computer processing to collate manuscript versions of texts.


Notably, the complexities of relational databases were largely not available to those who contributed to the 1984 volume, apart from one paper (Bächler). The relational database model had been conceived in 1970 (Codd, 1970), and its language (SQL) not long after (Chamberlin & Boyce, 1974), and by 1984 systems implementing the model were available from IBM and Oracle. There is no mention of marked-up text in the volume – SGML had been under development for several years but only became a public standard in 1985. The projects described in the 1984 volume are still very much restricted to the techniques already seen in 1974, but with a small minority making use of the developments of the previous decade to do more advanced work. The PC revolution was still a long way from impacting on research in the field, and the work of these projects was directed towards dissemination in print, a fundamental restriction on the use that could be made of these projects.

The early 1990s saw the emergence of the internet and in particular the World Wide Web. The development of the web was enabled by the platform-independent and easy-to-develop programming languages that had emerged around the same time, such as Perl 5 (late 1994), Java (1994–1995), and JavaScript (1995). This was a period when computing software was dominated by Microsoft, whose main market was large workplaces. In such enterprises, the employer provides the software and employees use it for their work, and there are consequently few compatibility problems within the workplace. Whole industries tended to use the same software for this reason, that is, to allow exchange between enterprises. The most obvious problem raised by the emergence of the internet was that if people were going to collaborate (either for work or recreation), they would need to be using tools that could 'talk' over the internet. Proprietary file formats – most notably Microsoft Word's .doc format – could cause compatibility problems. The other issue was that, as the internet connected different types of computer and different types of people, when information was exchanged, it would need to be in a format that reflected the meaning and use of the underlying information. This would allow for different applications to process the same information.

In the humanities, word processors, Microsoft Word in particular, did what humanities scholars needed, that is, produce documents that could be published in print publications. Print publications, if expanded to include electronic representations of print (e.g. PDF), are still the dominant medium for not just disseminating results but the process of humanities research itself. This is the fundamental challenge for increasing the take-up of digital methods in the humanities: most projects start and end with a Word document, which has almost no semantic information in its electronic form, apart from possibly the basic structure of the text.

The solution to the problem of semantic structure in the humanities was largely provided through the Text Encoding Initiative (TEI), a standard for the digital representation of texts. Originally a Standard Generalised Markup Language (SGML) application, the first full and public version was published in 1994 (P3 – see http://www.tei-c.org/About/history.xml). It was later updated to be compatible with the emerging Extensible Markup Language (XML) generalised markup standard (XML is a subset of SGML which is easier to process).
TEI provided a comprehensive solution to a number of the early problems: it was an open standard format that encompassed a huge range of humanities projects including textual scholarship (prose, drama, primary sources, critical editions, etc.), dictionaries, and language corpora. TEI is now the de facto standard for producing digitally structured materials in traditional textually focused humanities disciplines.

The process of standardisation of data representation in DH was based on the premise that personal computers, rather than networked servers, would process digital research materials. The internet, in this thinking, exists as a means of convenient file sharing. The presumption remains with TEI that scholars hand-code the XML, possibly with the help of an XML editor. In the digital society more broadly, XML is now ubiquitous – but usually at the front end of platforms and methodologies rather than at the back end (with some exceptions, including TEI and, perhaps ironically, file formats such as Microsoft's .docx). It is telling that the TEI project still does not have a set of publication tools that can reflect anything like the complexity that its semantic model allows.

Many types of DH projects do not use TEI. Dictionaries, for example (such as the Dictionary of Old English and the Dictionary of Old Norse Prose), normally use relational databases as their underlying technology. They require the encoding of complex relationships and normally involve teams of scholars working concurrently on small pieces of text. Networked relational databases allow those scholars, through web or desktop interfaces, to produce and publish their work collaboratively. Although published through web interfaces, in such projects most of the collaboration and interaction occurs through desktop computers and local networks.

These differences between the two basic data structures – XML and relational data – are relevant to this paper because they determine how users interact with data. XML-based projects tend to involve one person at a time on one device producing a single file or set of files, which are then processed. Database projects can involve multiple people on different devices interacting with diverse media. They are also far more scalable, with well-defined mechanisms for the exponential expansion of users and data. For these reasons, the rise of new media has relied on a technological foundation not of XML but of relational data (a brief sketch of this contrast appears below). Most social media platforms (e.g. Facebook, Twitter, Wikipedia) started with, or continue to be developed on, a variation of the LAMP (Linux, Apache, MySQL, PHP/Perl) platform, combining an operating system (normally Linux), a web server (Apache or similar), a database server (especially MySQL), and an application programming interface (API) to connect them and generate web pages or interfaces to mobile apps. There are some DH projects which use something close to this model, but they tend to be in the minority.

This brief survey of major developments in the field of DH consistently shows a 5–15-year delay in the take-up of basic digital technologies such as mainframe computing, generalised markup languages, and relational databases. This delay is not counted from when the technologies are first developed, but rather from when those technologies achieve widespread adoption in the digital society. In other words, there is a considerable time gap in DH between when a new technology is mature, widely available and inexpensive and when the systems and processes are implemented in humanities research. The time lag allows us to make a prediction about the future development of DH based on (relatively) recent trends in digital media.
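To make the XML/relational contrast discussed above concrete, the following is a minimal sketch of the same toy record in the two data models, using only Python's standard library. All element, table, and column names are invented for illustration and do not represent any project mentioned in this paper.

```python
import sqlite3
import xml.etree.ElementTree as ET

# 1. XML: a single self-contained document, typically produced by one
#    person at a time and processed as a whole file.
doc = ET.fromstring(
    "<stanza n='1'><line>First line of verse</line>"
    "<line>Second line of verse</line></stanza>"
)
print([line.text for line in doc.findall("line")])

# 2. Relational: the same information as rows, which many users can
#    insert and update concurrently, and which scales by adding rows.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE line (stanza INTEGER, n INTEGER, text TEXT)")
conn.executemany(
    "INSERT INTO line VALUES (?, ?, ?)",
    [(1, 1, "First line of verse"), (1, 2, "Second line of verse")],
)
print(conn.execute(
    "SELECT text FROM line WHERE stanza = 1 ORDER BY n").fetchall())
```

The XML document is convenient to exchange and archive as a file; the relational rows are convenient to update concurrently and query at scale, which is the property the LAMP-style platforms described above depend on.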
In terms of hardware technologies, the popularity of networked hand-held touch-screen devices, especially phones and tablets, has the potential to improve and develop the way researchers and end-users interact with data and research resources. This paper, however, will focus on software technologies, in particular, social media. The rise of social media in recent years provides an obvious contender for future developments in DH. The following section will explore ways in which social media techniques can be used in a research context.


Social media tools as semantic analysis

Social media can be defined as 'a group of Internet-based applications that build on the ideological and technological foundations of Web 2.0, and that allow the creation and exchange of user-generated content' (Kaplan & Haenlein, 2010, p. 61). This definition applies to the main social media platforms that I will refer to here, such as Facebook, Twitter, and Wikipedia. The success of social media derives from its capacity, as the name suggests, to connect people, and to do so in a way that provides a counterpart to direct, traditional social interactions – sharing experiences, opinions, and jokes; organising events; and so on. In order to do this, social media platforms rely on the ability of users to categorise information according to audience and content. The audience may be the general public (tweets, public posts on other platforms, blogs, etc.) or limited to those known or categorised by the content creator (friends, circles, groups). Importantly for the present study, these platforms provide numerous ways of organising the content itself according to semantic and social fields (hashtags, fan pages), as well as temporal and spatial dimensions (universal timestamps, geotagging), subject (hashtags, URLs, handles), group or community (hashtags, groups), event (calendar-type events, photo albums), and disposition towards the thing referenced (likes, plus-ones, shares, hashtags). Categorisation is an important but perhaps under-recognised tool in managing and analysing data in research. Some social media platforms use categorisation as a foundational principle, notably Pinterest.

Perhaps the most powerful of the techniques for semantic analysis is the hashtag, which was popularised by Twitter and is now used by all major social media platforms (apart from Wikipedia, which has its own in-text referencing system; cf. https://en.wikipedia.org/wiki/Help:Link). The ontology or taxonomical study of hashtags is still in its infancy, although the work of Bruns et al. in this special issue foregrounds a developing typology for hashtags, and Caleffi (2015) offers a linguistic analysis of the phenomenon. Yang et al. identify two broad categories of hashtag (described as '[an] organisational object of information'): the first and most obvious is the 'bookmark of content', but the second is as a marker of a virtual community (Yang, Sun, Zhang, & Mei, 2012, p. 261). What defines that community may be membership of an organisation, attendance at an event, or more commonly a particular disposition towards the content of the information itself. The two roles in fact overlap: the first role of the hashtag labels the content, the second makes it available to those interested in such content. The resulting information structure is sometimes referred to as a folksonomy. Unlike an ontology, it is not structured, but various tools can be used to create an ontology on the basis of the folksonomy (e.g. Christiaens, 2006).


Hashtags allow end-users to reference concepts that are not part of the data structure of the platform. They implement, in a very simple way, a methodology used in DH projects, that is, markup, as well as one used in cataloguing and data curation, that is, restricted vocabularies. Handles also provide a very simple linking mechanism within the platform compared with traditional web methods (i.e. URIs in HTML links – although most platforms allow for the automatic generation of links from URIs). In many platforms, hashtag information is used to tailor content, particularly advertising – but the tools which enable this semantic categorisation can be seen as analogous to many of the analytic techniques which underpin current humanities research.
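As a rough illustration of how little markup this demands of the user, the following Python sketch extracts hashtags from a handful of invented posts and counts them into a flat folksonomy. The sample posts and the simple hashtag pattern are assumptions for the example, not any platform's actual tokenisation rules.

```python
import re
from collections import Counter

# Hypothetical sample posts; real platforms tokenise more elaborately.
posts = [
    "Reading Old Norse poetry for the conference #skaldic #DH",
    "New manuscript images online! #DH #manuscripts",
    "Crowdsourced transcription drive this weekend #manuscripts",
]

# A deliberately simple pattern: '#' followed by letters/digits/underscores.
HASHTAG = re.compile(r"#(\w+)")

# The 'folksonomy' is just the flat, user-generated vocabulary with its
# frequencies: no hierarchy and no predefined ontology.
folksonomy = Counter(
    tag.lower() for post in posts for tag in HASHTAG.findall(post)
)
print(folksonomy.most_common())
# e.g. [('dh', 2), ('manuscripts', 2), ('skaldic', 1)]
```

Frequency lists like this are the raw material from which ontology-building tools of the kind cited above (e.g. Christiaens, 2006) can work.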

Semantic classification and humanities research

Surveys of scholars in medieval studies in 2002 and 2011 show that while there has been a very quick adoption in the field of digital publication of secondary materials, the use of digital editions barely changed in the period (Porter, 2013, p. 9). Part of the problem may be with the ways in which scholars interact with the primary materials, that is, how they are read and analysed. DH researchers often make the distinction between 'close reading' and 'distant reading' as analytical approaches in the field. Much research assumes that the latter is the goal of DH research, whereas close reading is generally thought to require a discursive method, which can only be realised in the process of writing longer texts such as theses, papers, and monographs. However, a good proportion of humanities research requires as its foundation a systematic identification and analysis of social or semantic fields across a corpus or text, which may be done with or without the use of digital techniques. In this section, I will show that such an approach can cut across the categories of close and distant reading and be applied to digital research.

PhD projects can be viewed as an indicator of the future of research. A very large proportion of present and recent PhD projects do not use digital methods but at the same time seek to explore semantic or social fields in a body of cultural products appropriate to their discipline. If we look at the titles of recent projects, we see a remarkably consistent pattern across disciplines of how research is done in the humanities by those who will be the future of the disciplines. For example, a quick browse of some of the more common keywords in the University of Sydney's online repository of PhD theses brings up titles such as: 'Repetition, revision, appropriation and the Western' (Robards, 2014); 'From footnotes to narrative: Welsh noblewomen in the thirteenth century' (Richards, 2005); 'God's Comics: Religious Humour in Contemporary Evangelical Christian and Mormon Comedy' (McIntyre, 2013); and 'Sisterly Subjects: Brother-sister relationships in female-authored domestic novels, 1750–1820' (Clifford, 2013). All of these projects attempt to analyse diverse sociocultural fields (film/narrative techniques, gender and socioeconomic categories, literary techniques, and social fields, respectively) in a particular corpus (film and literature genres, historical-geographical periods). Either or both the sociocultural fields and corpus are appropriate to the discipline of the researcher, but the approach can be generalised: it involves the identification and analysis of examples of the fields realised within the corpus.


The sociocultural fields usually involve an implicit ontology where different aspects of a field are discussed in relation to the corpus. This is a common approach to a large proportion of humanities research, although it may not always be recognised as such. Traditional digital projects rarely belong to such a framework, instead addressing problems of preservation, description and formal analysis rather than social or semantic fields. They are often focused primarily on the text or cultural product itself. This is not necessarily the case in the field of media and communications. The papers presented in the Digging the Data conference at which the present work originated, for example, show an emphasis on the sociocultural fields as the starting point. Such a method can be effectively applied to a digitised body of material when it is in a form that can be electronically analysed.

The difference in approach – between DH focused on traditional disciplines, and digital media and communication studies – may have to do with the primary materials themselves, which are not normally originally created in digital form for traditional humanities fields. In a discipline that deals with media created for digital platforms, the analysis of sociocultural fields is relatively straightforward. Other disciplines, which deal in particular with pre-twenty-first-century media, face the problem that the materials may not be digitised, or may not be digitised in a way that is apparently useful to the researcher. However, a large proportion of the primary materials of a number of disciplines have in fact already been digitised. This is certainly the case for pre-twentieth-century printed works (especially through Project Gutenberg, Internet Archive, Google Books and similar), but now also extends to a very large amount of material available through public institutions (especially Semantic Web/metadata projects) and public social media platforms (Twitter API, etc.), as well as more specialised academic repositories such as the Oxford Text Archive. Copyrighted work remains an obstacle, particularly for pre-digital sources, but much of this can be easily digitised, and for the purpose of semantic analysis can be privately digitised (scanning and OCR), processed and referenced without requiring the reproduction of the copyrighted work for publication.

In order for this type of research to take place, mature technology is required for robust, user-friendly semantic markup and/or ontological analysis of texts on a large scale. Such technology exists: GATE (General Architecture for Text Engineering) has been in development since 1995 and is used by a large number of projects for manual and automatic semantic markup and analysis of large corpora. Originally developed for computing science projects, it is rarely used in DH, despite the relevance of its capabilities. A search for the name of the tool in the journal Digital Scholarship in the Humanities (formerly Literary and Linguistic Computing), for example, reveals only one project using the tool in the principal DH journal (Odat, Groza, & Hunter, 2015). GATE could be used, for example, to import a novel, web pages, or other resources and then either analyse them automatically against an ontology or mark them up manually with semantic labels. This would facilitate finding relevant materials in the corpus and help develop analyses based on ontologies and other relationships between terms and features.
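The following is a hedged sketch, in plain Python rather than GATE itself, of the workflow just described: matching text fragments against a small, invented ontology of semantic fields. It illustrates the general idea only and is not GATE's API.

```python
# A toy ontology mapping semantic fields to invented vocabularies.
ontology = {
    "kinship": {"sister", "brother", "mother", "daughter"},
    "religion": {"god", "temple", "sacrifice", "ritual"},
}

def tag_fragment(fragment):
    """Return the ontology fields whose terms occur in the fragment."""
    words = set(fragment.lower().split())
    return {
        field: sorted(words & terms)
        for field, terms in ontology.items()
        if words & terms
    }

sample = "Her brother offered a sacrifice at the temple"
print(tag_fragment(sample))
# {'kinship': ['brother'], 'religion': ['sacrifice', 'temple']}
```

A production system would of course handle lemmatisation, context, and ambiguity; the point here is only that the analysis of a sociocultural field can be expressed as labels applied systematically across a corpus.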
Non-textual corpora would require manual tagging of relevant features in images, video, and so on. Tools likewise exist for these purposes.

One of the main advantages of a digital method for this approach to humanities research is that the materials could be made available to other researchers, either working on a similar corpus or exploring similar sociocultural fields in related corpora. Using the PhD projects above as an example, Richards' analysis of medieval noblewomen in Welsh sources could be extended to English or Irish sources by other researchers; or the investigations of female-authored domestic novels by Clifford could be extended to other types of relationships, refining the ontologies that arise from the analysis.

Importantly, this approach is not incompatible with the work of those producing scholarly digital editions. In cases where such editions exist, researchers can build on that work by using those editions as the basis of their semantic analysis, producing results based on more reliable and exchangeable corpora than other kinds of digitised works. Even where analyses are based on other digitised work, subsequent digital editing projects can align the semantic analyses with the new editions, provided the metadata are clearly defined. The prerequisite is a clearly defined set of metadata for each corpus, defining individual texts and their parts so that analyses of different versions can be aligned (a sketch of such a metadata record follows below).

A further prerequisite for these methodologies is a set of tools akin to social media interfaces, which allow researchers, and potentially others interested in a field, to interact with the primary materials in order to produce semantic tagging and categorisation. Crowdsourcing projects such as those hosted by Zooniverse (www.zooniverse.org) allow for such work, but within the strict semantic confines of the particular project, with most humanities projects involving transcription. What is needed is a more generalised interface which allows users to interact with text, images, and other creative products which can be easily digitised but may not be originally digital. Social media breaks down the distinction between researchers as end-users and as developers: end-users are the content creators and often drive development, as we have seen in the use of hashtags on platforms such as Facebook before the platform had implemented them.
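As a sketch of what such a clearly defined set of metadata might look like, the following hypothetical Python records identify a tagged fragment by corpus, text, and an edition-independent reference, so that an analysis made against one edition could later be aligned with another. All field names and values are invented for illustration.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class FragmentRef:
    corpus: str    # identifier for the defined corpus
    text: str      # the individual text within the corpus
    ref: str       # edition-independent reference to a part of the text
    edition: str   # the digitisation/edition the tag was made against

@dataclass(frozen=True)
class SemanticTag:
    fragment: FragmentRef
    field: str     # the sociocultural field being analysed
    label: str     # the specific label within that field

# Because 'ref' is defined against the corpus rather than one edition's
# page layout, this hypothetical tag could be realigned with a new edition.
tag = SemanticTag(
    FragmentRef(corpus="corpus-01", text="Text A", ref="ch. 3, l. 12",
                edition="edition-1"),
    field="gender",
    label="noblewoman",
)
print(tag)
```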

A case study: the skaldic project

One of the reasons for the delay in the adoption of mature technologies in DH may be the relatively short-term and restricted nature of funding for many DH projects. With a longer time frame and consistent support, a project may be able to develop in a way that takes advantage of new techniques and technologies in the digital society. At this point I will share my direct experience of a project, 'Skaldic Poetry of the Scandinavian Middle Ages' (http://abdn.ac.uk/skaldic), which may be held up as such an example, with a view to showing the development of a relationship between a platform, its users, and its content.

I have had a long-standing involvement in the project, starting in 1999, during my doctoral research, when I began work with Margaret Clunies Ross on the project. It aims to edit the corpus of skaldic poetry, a complex poetic form composed in Old Norse mainly by Icelanders and Norwegians from the ninth to the fourteenth centuries. The resulting output was originally envisaged as two print volumes with five contributors. (As is typical of these projects, the scope has increased by an order of magnitude, with now around 50 contributors and at least 16 large volumes, published digitally and in print. In some 18 years, only about half the corpus has been finalised.) For my PhD project (1997–2000), I was working on an interactive TEI-based edition of an Old Norse text, and I was invited to advise on how digital methods might be applied to the new project. In 2001, following the submission of my PhD, I began full-time work on the project, and in 2007, I became a member of the editorial board.

TEI seemed like an obvious solution for this type of project, but a few challenges quickly became apparent. The project was very diverse in terms of its metadata and media, incorporating different types of text and image, necessitating a move away from the purely text-encoding approach as originally envisaged. Contributors needed information about and access to the hundreds of manuscripts containing the poetry, as well as a reference point in the existing standard editions of the corpus (in particular Finnur Jónsson, 1912–1915, which defined the organisation of the corpus used by subsequent major editions). The solution was a relational database linking the corpus structure of Finnur Jónsson's edition to its various contexts: manuscript pages (with images where available), prose works in which the poetry is recorded, and the new project and its editorial team. The web seemed an obvious platform for interacting with this database, as contributors were working on three different continents. By 2003, this database had a web interface for editing its contents, built on the LAMP configuration. The project was from a very early stage built on and for the web, although the major research outputs continue to be print publications. (Metadata and material not in the printed volumes are publicly available, and the content of the volumes themselves is available after a 3-year embargo.)

The scope of the project meant that the content of the edition would need to be concurrently updated by numerous contributors and assistants, requiring the data structure to be divided in some way in order to facilitate this process. At this stage TEI was retained for the encoding of the text of individual stanzas of poetry, with the rest of the structure represented by the relational data model. However, the relational database platform proved to be flexible and scalable enough to avoid mixing XML and relational data in this way, particularly with improvements to the open-source MySQL server software. After the first volume was produced in 2007, I converted the remaining TEI to a relational model. The process of conversion did not involve great challenges or compromises. Fundamentally, the tree structure of XML/SGML is compatible with a relational model, allowing for bidirectional linking of data, but requiring additional information and processing to retain the syntagmatic structure of language (see Wills, 2013).

The purpose of moving to a largely database solution (there is some XML tagging within short sections of text) was to enable scalability and to allow for further analysis of the material, which had already expanded into related fields. It also allowed the development of interfaces for the editing of data by the dozen or so editors and assistants involved in this process. The semantic and analytical information in the project – that which is encoded electronically – is largely represented through the linking of various entities (texts, words, dictionaries, manuscripts, editors, bibliographic items, etc.) through the relational links within the database. These are represented to the end-user, as is typical of such projects, by HTML pages linking the entities together.
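A minimal sketch of this kind of linking database follows, again in Python with sqlite3. The tables, identifiers, and values are invented to illustrate the approach and are not the Skaldic Project's actual schema.

```python
import sqlite3

# Invented linking schema: a stanza, ordered by a standard edition, is
# linked to the manuscripts and prose works in which it is recorded.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE stanza     (id INTEGER PRIMARY KEY, siglum TEXT NOT NULL);
CREATE TABLE manuscript (id INTEGER PRIMARY KEY, shelfmark TEXT NOT NULL);
CREATE TABLE prose_work (id INTEGER PRIMARY KEY, title TEXT NOT NULL);
-- Link tables: one stanza may appear in many manuscripts and prose works.
CREATE TABLE stanza_in_ms   (stanza_id INTEGER, ms_id INTEGER, page TEXT);
CREATE TABLE stanza_in_work (stanza_id INTEGER, work_id INTEGER);

INSERT INTO stanza      VALUES (1, 'Stanza 1');
INSERT INTO manuscript  VALUES (1, 'MS A'), (2, 'MS B');
INSERT INTO prose_work  VALUES (1, 'Saga X');
INSERT INTO stanza_in_ms   VALUES (1, 1, '12r'), (1, 2, '33v');
INSERT INTO stanza_in_work VALUES (1, 1);
""")

# All manuscript witnesses of a stanza, resolved through the link table.
for row in conn.execute("""
    SELECT s.siglum, m.shelfmark, l.page
    FROM stanza s
    JOIN stanza_in_ms l ON l.stanza_id = s.id
    JOIN manuscript m   ON m.id = l.ms_id"""):
    print(row)
```

The same pattern of named link tables extends to editors, dictionaries, and bibliographic items, which is what allows a single database to drive both the editorial workflow and the published HTML pages.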
The design, like that of similar projects, conforms to Semantic Web principles in that each relationship has a defined 'triple' representing the subject, predicate, and object of the link (where the predicate is defined by the named link itself).

In many ways the most impressive success of this approach has been to engage scholars, particularly senior scholars who do not see themselves as doing digital research, in the process of electronic editing and analysis. Just two of the retired or soon-to-retire contributors (the editors of the most recently published volumes) have between them made over 70,000 edits to the database. The key here has been to make the digital interface closely mirror the traditional processes of scholarship. In the case of textual editing, this involves gathering the manuscript and secondary sources, transcribing and collating the primary sources, normalising the text and recording variants, providing a close translation and commentary, and, in the case of skaldic poetry, analysing the complex diction and word order. The database exports the information in a format that is used to directly produce traditional print publications. The added value of an electronic approach is that all of the information produced and processed is digitally linked to related information, such as the text, translation, and commentary on individual words, and pieces of text to manuscript images. These links can be represented in a variety of interactive ways through web and similar interfaces.

The processes are implemented digitally through a web interface to the database using a series of interactive forms. The online forms have been gradually modified over the years to almost eliminate technical markup, while maintaining the electronic encoding of semantic structures. The result is a resource which can be exported as TEI and preserves the intricacies of TEI, but which does not require knowledge of the markup language, nor does it permit errors in the markup language itself.

The basic model for incorporating textual editing processes with primary and secondary sources and analysis proved to be generalisable to related fields. In 2012, I was invited to join the international project Pre-Christian Religions of the North (PCRN), and have taken on the role of developing a database of the sources of Northern paganism (see http://abdn.ac.uk/pcrn). This project builds on the corpus-based work of the Skaldic Project, but adds a further semantic dimension to it: the process of amassing a body of evidence for such a broad and diverse phenomenon as religious belief and practice is not very useful unless there is some way of navigating it for specific information regarding those practices. My original conception of the PCRN database was to create a series of complex structures linking the different source types (written, onomastic, archaeological, visual, epigraphic, etc.) with the mythological-religious phenomena they may shed light on (gods, supernatural beings, cultic practices, etc.). The development process has involved recruiting PhD students to incorporate analyses of mythological-religious material into the database, experimenting with different models by adapting their own analytical approaches. The complexity of dealing with the source types as structures requires extensive training and still leaves compromises in the digital representation of the interpretations (see Wills, 2014). The most successful approach so far with this project has been semantic tagging of text fragments (supported in particular by the University of Iceland).
Semantic tagging is a much more accessible technique for the postgraduate-level assistants working on this project, and this practical discovery has informed the above proposal for semantic tagging in this and other humanities fields.
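To illustrate the 'triple' reading of relational links described in this case study, here is a short hedged sketch: rows of named links rendered as subject-predicate-object statements, with the name of the link supplying the predicate. The identifiers and namespace are invented for the example.

```python
# Invented link rows of the kind a relational edition database might hold.
links = [
    ("stanza/17", "recordedIn", "manuscript/MS-A"),
    ("stanza/17", "editedBy", "editor/42"),
    ("fragment/903", "taggedAs", "concept/sacrifice"),
]

BASE = "https://example.org/"  # hypothetical namespace

def as_triples(rows):
    """Render link rows as N-Triples-style statements."""
    for subj, pred, obj in rows:
        yield f"<{BASE}{subj}> <{BASE}{pred}> <{BASE}{obj}> ."

print("\n".join(as_triples(links)))
```

An export along these lines is one way such a database could interoperate with the Semantic Web and metadata projects mentioned earlier, without changing its internal relational design.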


Conclusion

Social media engages end-users on a large scale in processes of semantic analysis and categorisation of data, using interfaces that rely on either no technical markup or very simple markup, such as hashtags and handles. A survey of diverse research outputs in a range of fields, as well as a critical understanding of social media tools for categorisation, tagging and linking, shows that these digital methods can be applied to a wide range of traditional research methodologies. These methodologies include both close and distant readings of primary materials that have been digitised.

The history of DH in traditional fields shows a significant delay of 5–15 years in the adoption of the dominant information technology movements into its research practices, which is perhaps not surprising given the constraints of funding and tradition on the disciplines it serves. This delay suggests that the emerging trends in the digital society of the last 5–10 years should now be ripe for integration into DH methodologies. Crowdsourcing projects, such as those hosted on Zooniverse, show that this approach can be applied to simple methodologies, such as transcription and semantic analysis within narrow categories and primary source types. The emerging challenge is to generalise these approaches to much larger corpora and highly complex ontologies. This will allow for methodologies and analyses to be compared across research projects.

The author's experience in developing digital research interfaces for both senior and emerging researchers demonstrates that researchers can effectively engage with methodologies comparable to those in social media without technical training. The key in all cases is to provide interfaces which allow researchers to pursue traditional readings and analyses using digital methods. The added value of the digital methods is that they can be scaled to much larger corpora, media and semantic fields.

There are a number of challenges if this vision is to be realised. The corpora themselves must be referenced consistently, despite a great deal of variation in editions, reproductions, and digitisations of the primary materials in many fields. Perhaps more importantly, the emerging ontologies must not only be compatible across media, disciplines, and corpora, they must also allow for the ongoing conflicts and debates about the very semantic and social fields that they encompass.

Disclosure statement

No potential conflict of interest was reported by the author.

ORCID

Tarrin Wills http://orcid.org/0000-0001-5360-3495


Notes on contributor

Tarrin Wills (PhD Sydney) is lecturer in English at the University of Sydney, on secondment from the Centre for Scandinavian Studies at the University of Aberdeen. He has made numerous contributions to the fields of Digital Humanities and Old Norse studies, including extensive involvement in the projects Skaldic Poetry of the Scandinavian Middle Ages, Pre-Christian Religions of the North, The Medieval Nordic Text Archive (Menota) and the Medieval Unicode Font Initiative (MUFI). In 2016, he will be taking up a Horizon 2020 Marie Curie fellowship at the University of Copenhagen.


References

Bullough, V. L., Lusignan, S., & Ohlgren, T. H. (1974). Report: Computers and the medievalist. Speculum: A Journal of Mediaeval Studies, 392–402. doi:10.2307/2856091
Caleffi, P.-M. (2015). The 'hashtag': A new word or a new rule? SKASE Journal of Theoretical Linguistics, 12(2), 46–70.
Chamberlin, D. D., & Boyce, R. F. (1974). SEQUEL: A structured English query language. In Proceedings of the 1974 ACM SIGFIDET (now SIGMOD) workshop on data description, access and control. New York, NY: ACM.
Christiaens, S. (2006). Metadata mechanisms: From ontology to folksonomy… and back. In R. Meersman, Z. Tari, & P. Herrero, et al. (Eds.), On the move to meaningful internet systems 2006: OTM 2006 workshops. Berlin Heidelberg: Springer.
Clifford, K. (2013). Sisterly subjects: Brother-sister relationships in female-authored domestic novels, 1750–1820 (PhD thesis). University of Sydney, Sydney.
Codd, E. F. (1970). A relational model of data for large shared data banks. Communications of the ACM, 13(6), 377–387. doi:10.1145/362384.362685
Gilmour-Bryson, A. (Ed.). (1984). Computer applications to medieval studies. Kalamazoo: Western Michigan University.
Jónsson, F. (Ed.). (1912–1915). Den norsk-islandske skjaldedigtning. Copenhagen: Villadsen & Christensen.
Kaplan, A. M., & Haenlein, M. (2010). Users of the world, unite! The challenges and opportunities of social media. Business Horizons, 53(1), 59–68. doi:10.1016/j.bushor.2009.09.003
McIntyre, E. (2013). God's comics: Religious humour in contemporary evangelical Christian and Mormon comedy (PhD thesis). University of Sydney, Sydney.
Odat, S., Groza, T., & Hunter, J. (2015). Extracting structured data from publications in the art conservation domain. Literary and Linguistic Computing, 30(2), 225–245.
Porter, D. (2013). Medievalists and the scholarly digital edition. Scholarly Editing, 34, 1–26.
Richards, G. (2005). From footnotes to narrative: Welsh noblewomen in the thirteenth century (PhD thesis). University of Sydney, Sydney.
Robards, A. (2014). Repetition, revision, appropriation and the Western (PhD thesis). University of Sydney, Sydney.
Robinson, P. (2013). Towards a theory of digital editions. Variants: The Journal of the European Society for Textual Scholarship, 10, 105–131.
Wills, T. (2013). Relational data modelling of textual corpora: The skaldic project and its extensions. Literary and Linguistic Computing. doi:10.1093/llc/fqt045
Wills, T. (2014). Semantic modelling of the Pre-Christian Religions of the North. Digital Medievalist, 9.
Yang, L., Sun, T., Zhang, M., & Mei, Q. (2012). We know what @you #tag: Does the dual role affect hashtag adoption? In Proceedings of the 21st international conference on World Wide Web. New York, NY: ACM.


Digital Resources

Internet Archive
Facebook
General Architecture for Text Engineering (GATE)
Google Books
Oxford Text Archive
Pinterest
Pre-Christian Religions of the North
Project Gutenberg
The Skaldic Project
The Text Encoding Initiative
Twitter
Wikipedia
Zooniverse

