Internet histories and computational methods: a “round-doc” discussion

Why should scholars consider using computational methods when they study the many forms of the internet, including the web and social media? Can you illustrate this with one or more examples?
  • Computational methods are helpful when researchers are interested in processing and analysing large quantities of data that cannot be processed manually or qualitatively.

  • However, the temptation to use computational methods in internet research just because of the “structural fit” might be misleading, as these methods may be helpful in answering certain research questions but not others.

  • Many interesting questions can emerge when you start with a particular computational method or approach and think of ways to apply it to historical sources.

  • I think it’s necessary to distinguish between the use of computational methods in two different broad areas, the first related to the retrieval of information and the second to the quantification of information.

  • Different domains of research have different traditions with regard to methods; information retrieval and quantification both often call for computational methods in this space, and often there is a structural fit.

  • One of the more sterile tropes of much of the discussion about the digital humanities is an opposing of “traditional” and “digital” methods, as if it were necessary that one of the two should be all-sufficient. […] digital methods in general add possibility – to answer the questions that could not feasibly be approached before – and that there is no reason to suppose that traditional historical method is thereby somehow under threat of extinction.

  • Picking up the question of which comes first, method or question, it is surely the case that both are true. I could hardly even fire up a particular application without having some initial question to ask. Over time, my research question will evolve as the work progresses. And (as William suggests) I will most likely emerge at the end with new questions to pursue, some of which have only occurred to me as a result of coming to know more about the method. The two cannot meaningfully be untangled. 

  • Research has changed a great deal since the arrival of the digital world. It is not so much a matter of where you begin, with the what or the how; rather, the two keep changing places through the use of the digital, through the topic of the research, through the question we want to answer.
  • In Colombia, how do we think about the internet?

    Does everything have to be answered with computational methods? What kinds of questions require them? Which do not?

  • Computational methods do not work in Colombia because there are no sources that one can work with.

What should scholars be looking out for when they use these methods? What are the possible pitfalls and challenges?
  • In this case, I identify two pitfalls related to historians’ relationship with technology. The first is historians’ fascination with the belief that the computational tool is inherently objective.
  • A useful guiding question could be whether or not there is added value in introducing computation to the analysis. […] However, when computational tools and methods are critically devised and specifically tailored to answer specific research questions, they open up a variety of exciting new ways of thinking about research questions, and of answering them creatively, reflexively and critically.

  • One of the most valuable outcomes of working with specifically tailored tools is that they not only provide the kind of results that Anat describes, but that they often draw attention to their own limitations.

  • So, in my opinion, researchers should not employ a computational tool just because it is widely adopted in the community; on the contrary, they should critically question it, especially because it is so widely used.

"The more widely adopted a tool is within a community, the less useful it becomes” point that Turkel and Nanni are making is we think worth pausing on.
  • The key thing is to focus on giving them skills to create tools, rather than giving them tools per se.
  • But best of all would be a standard of training that allowed people to build and test their own tools, or to verify that the tools of others were working properly and appropriate to the task at hand.

  • Tool criticism
  • Always be aware of the limitations and restrictions of an approach.
  • It means that when you are using computational methods in the context of internet data you need to know the limitations of your data.
  • Researchers have an obligation to be clear and transparent about these limitations and to provide access to their data where possible so that others can replicate and validate their work.
  • There is a need to create tools adapted to new research questions, which will inevitably be specific and which will also allow for innovative methods and results. On the other hand, researchers who have a good knowledge of the tools and their limitations can also help to advance the development of internet studies among researchers who are new to computational methods.
To use computational methods, the object of study needs to be in digital form. Do you have any thoughts on the extent to which the process of collecting influences computational research? Are the right sources collected? In the right format? By the right institutions?
  • Institutions are often obliged to communicate how data are collected, structured and stored.
  • In turn, this also makes it more difficult for computer scientists to use the data, since they must necessarily be part of a state-funded research project.
  • There is already wide acknowledgement among researchers that data collection practices are never neutral, and that constraints on access and on the ability to use various data pose significant challenges to the types of research that can be done with them.
  • The use of computational methods for internet research and the study of our present times is tightly interconnected with the availability of big data to be analysed by the community.
  • There is also the complexity of obtaining access to such collections, especially for a scholar who is not affiliated with a national library or directly involved in an international project on the topic.
  • There are so many different types of data, and different standards, that the problem of data format remains complex.
  • Part of the problem is the fragility and obsolescence of the technology.
Is there anything that in your mind impedes the use of computational methods in studies of the history of the internet? Are source collections not “researcher-friendly”? Is there a lack of adequate methods and tools? Are there other obstacles?
  • Quite often Digital Humanities researchers spend their entire doctoral studies building up such expertise, and they might never have the time or chance to actually use such tools in substantive research.
  • I believe that the number one issue is the prompt accessibility to web archive data. While this is due to understandable reasons, the lack of access for the broad academic community has limited, among other things, the development of tools specifically tailored for particular web archiving issues and, I would argue, also the perception of the challenges that web archivists are currently facing.
  • Most histories of the internet have been written without computational techniques, just as most histories generally have been. Moreover, many histories of the internet have been written without (citing) web archives, which could be considered a main source of historical material.
  • The Internet Archive’s Wayback Machine is a wonderful interface for viewing the history of a webpage from a qualitative point of view (programmatic access to the same captures is sketched just after this list).
  • I once read somewhere that the most successful interdisciplinary work happens when a single individual is trained in the techniques of multiple disciplines.
  • But in order for us to effectively study the Web in a way that aligns with its fundamental nature, we need methods of transnational discovery and analysis, and if that necessitates government-level action to amend copyright legislation in different nations, then we should be lobbying for that.
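As a minimal sketch of what moving from qualitative viewing to programmatic access can look like, the snippet below queries the Wayback Machine’s public CDX API for a list of captures of a page. The target URL (example.org) and the ten-capture limit are illustrative placeholders, not details taken from the discussion above.

```python
# Minimal sketch: list archived captures of a page via the Wayback Machine CDX API.
# The queried URL ("example.org") and the capture limit are placeholders.
import json
import urllib.request

query = (
    "https://web.archive.org/cdx/search/cdx"
    "?url=example.org&output=json&limit=10"
)

with urllib.request.urlopen(query) as response:
    rows = json.load(response)

if rows:
    # The first row holds the field names (urlkey, timestamp, original, mimetype, ...).
    header, captures = rows[0], rows[1:]
    for capture in captures:
        record = dict(zip(header, capture))
        print(record["timestamp"], record["statuscode"], record["original"])
```

A listing of this kind is what allows a shift from browsing individual snapshots to counting, sampling, or tracking captures of a page over time.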
How do you see the relationship between subject-matter experts like historians and new media scholars, and developers (from systems librarians to programmers)? Should internet historians learn to code, or conversely, is the onus on developers to learn about historical methods?
  • From a cultural history perspective, it does not seem essential to me to know how to code, but it is necessary to have some knowledge of coding and of HTML in general. I believe that historians must develop a digital culture and computer skills in order to be the best possible interlocutors when participating in the design of computational analysis methods with developers.
  • It seems to me that one of the challenges is precisely to succeed in getting everyone (historians, engineers, archivists, programmers, etc.) to cooperate in proposing innovative methods but also easily usable tools that would democratise the use of natively digital sources in history.
  • Interdisciplinary collaboration is successful when the research questions, or the object of study, are interesting enough – scientifically – to all involved.
  • It is true that the research question needs to be perceived as “interesting” by the computer scientist and the computational aspect of the problem needs to be “challenging enough”, but I think that this is often not the main issue.
  • Developing a proper data science profile is actually very challenging, and it could bring you far away from the research question that you originally intended to address, often to a place where it is difficult to demonstrate the relevance of your research to either community, because it is at the same time not “novel” enough for an NLP audience and not “substantive” enough for a historical one.
  • In my experience, the research questions need to be compelling to all involved in a project, but what is ultimately compelling to one person will not be the same for all others.
  • Don’t expect that historians will automatically learn to code, or that computer scientists will learn the nuances of digital humanities scholarship, but it is important to find a common language. Understanding in both directions will ultimately increase the success of the research.
  • Much depends on how research projects are conceived in terms of their staffing, which in turn depends on models of funding.
  • If, on the other hand, the project is conceived as one which speaks both to questions in the humanities and to questions in computer science or library and information studies, then the dynamics will necessarily be different.
  • The question “should historians learn to program” is a slightly unhelpful one. If we were instead to ask: “do we need there always to be some historians who are learning to program”, then the answer is clearly a positive one.
  • But it is (I think) neither possible nor desirable for all historians to be proficient programmers.
  • What scholars do need, however, I think, is a grasp of the basic principles of computer science, data management, archival science, project management and (in particular) of the characteristics of successful development projects.
The use of computational methods in historical study has a history of its own. What are the most defining moments in the history of computational methods?
  • I would note the development of cloud computing, which allowed analyses to scale beyond the constraints of physical memory, and the development of open-source programming languages such as Python and R, which attracted a wide community of users.
  • I consider defining moments for our discipline all the improvements in information retrieval systems, and their impact on our everyday life and our work as historians.
  • Beyond that, there has been a groundswell in workshops and tutorials at annual meetings, over the summer, and online, that has served to create a rich set of educational resources.
How do you see the future of using computational methods for historical studies of the internet? What are the biggest challenges? The biggest opportunities or most exciting projects today? Which type of methods and tools would you like to see developed?
  • On the methods and tools side, I am currently working with Ina on data extraction from web archives. This is important to me because I believe that the creation of corpus analysis tools would facilitate the appropriation of web archives by researchers in the social sciences and humanities (a rough sketch of this kind of extraction follows after this list).
  • I would like to see the development of methods related to visual studies that allow the identification of the path of visual content from pre-existing media archives (print media, television) to and in the archives of the Web.
  • Important computational work is currently being conducted by the Memento project at Old Dominion University and elsewhere, where researchers develop methods, tools and web services for understanding the archived web beyond the boundaries of a single collection or archiving institution; and by “The Archives Unleashed” project, led by researchers from the University of Waterloo, which develops toolkits that facilitate the analysis of large-scale web archives for historical research.
  • There are two areas that require developing new methods and tools: the first is the question of web archiving after social media, and how to facilitate research across different types of web archives and other datasets, and the second is the need to develop tools specifically designed for critiquing the archiving process, or web archives as institutions.
  • As I remarked before, I believe the core challenges for the future of computational methods in historical studies are twofold: on the one hand, the difficulties of accessing (and therefore experiencing) web archived collections, and on the other hand, the lack of critical attitudes towards computational methods.
  • The encrypted ephemerality of messaging apps provides another challenge. These days it’s as if much of the content, however valuable, is out of reach of the archivist. I’m buoyed by the increased use of web archives by scholars and students, and would encourage the development and compilation of teaching units that work with web archives.
  • We do not fully know the scale of the internet and the web, because we have not yet been able to crawl and analyse the full extent of the web.
  • I believe that the current push for educational resources is the field’s greatest strength, as new scholars will continue to expand the domains of computational methods and internet research.
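As a rough illustration of the kind of corpus extraction mentioned above, the sketch below iterates over the archived responses in a web archive (WARC) file and pulls out the URL and HTML payload of each. It is not the workflow any contributor describes: the local file name (example.warc.gz) is a placeholder, and it assumes the open-source warcio library is available.

```python
# Rough sketch: extract archived HTML responses from a WARC file.
# Assumes a local file "example.warc.gz" (placeholder) and the warcio library.
from warcio.archiveiterator import ArchiveIterator

with open("example.warc.gz", "rb") as stream:
    for record in ArchiveIterator(stream):
        # Only "response" records hold the archived server replies.
        if record.rec_type != "response":
            continue
        url = record.rec_headers.get_header("WARC-Target-URI")
        content_type = record.http_headers.get_header("Content-Type", "")
        if "text/html" in content_type:
            html = record.content_stream().read()
            print(url, len(html), "bytes of HTML")
```

From output like this, the step to a text corpus for social science and humanities analysis is essentially a matter of cleaning the HTML and storing it alongside the URL and capture timestamp.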
We’re seeing a good note of optimism here, as we talk about how tools and programming languages are improving and how new research questions can be asked, albeit with some challenges. As this new field comes together, we wonder if we might close our round-doc by asking whether you have any recommendations or thoughts for scholars entering it. Beyond whether they should learn to program or not, what advice would you give a new entrant to the field?
  • I’m optimistic that we are working in an era of academic innovation, and that as scholars working with computational methods we have the opportunity, on the one hand, to look at existing questions in new ways, and, on the other hand, to ask new questions and build new theory.

  • “Do what makes your heart leap rather than simply follow some style or fashion” (Salk, 1991).
  • Get in touch with the research community as early as possible, by going to a conference (and RESAW might be the perfect choice) or taking part in an Archives Unleashed Datathon! For me, both have been incredibly enriching experiences during my Ph.D. research.
  • First, there are dynamic research communities on internet studies and on the history of the Web in relation to computational methods, which are at the origin of an important historiography. Second, I think that neophytes also need to trust each other, because innovation comes from taking risks, from meeting people, but also sometimes from questions that seemed naive.
