Computational methods for the analysis of climate change communication: Towards an integrative and reflexive approach
Abstract
Computational methods, in particular text-as-data or Natural Language Processing (NLP) approaches, have become popular to study climate change communication as a global and large-scale phenomenon. Scholars have discussed opportunities and challenges of these methods for climate change communication, with proponents and critics taking strong positions, either embracing the potential of computational methods or questioning their value. Mirroring developments in the broader social scientific debate, we aim to bring both sides together by proposing a reflexive, integrative approach for computational research on climate change communication: We reflect on strengths (e.g., making data big and small, nowcasting observations) and weaknesses (e.g., introducing empiricist epistemologies, ignoring biases) of computational approaches. Moreover, we provide concrete and constructive guidance on when and how to integrate (or not integrate) these methods based on theoretical considerations. We thereby understand computational methods as part of an ever-growing, diverse toolbox for analyzing climate change communication.
This article is categorized under:
- The Social Status of Climate Change Knowledge > Knowledge and Practice
- The Social Status of Climate Change Knowledge > Sociology/Anthropology of Climate Knowledge
1 INTRODUCTION
Communication about climate change is crucial for developing and implementing societal responses to anthropogenic global warming (Moser, 2010, 2016). Stakeholders and decision-makers from within and beyond science have started communicating on the issue (Schlichting, 2013; Segerberg, 2017; Walter et al., 2019) within an increasingly diversified media ecosystem that now includes (online) news media, social media, instant messengers, and so on (Schäfer, 2012, 2017). Correspondingly, scholarship on climate change communication has grown considerably (Comfort & Park, 2018). A notable trend in research on communication about climate change is the increasing use of computational methods, in particular text-as-data or Natural Language Processing (NLP) approaches, often on large-scale corpora of text sometimes called “big data” (Hase & Schäfer, 2023; Koteyko et al., 2015). Computational methods and “big data” are nothing new in the broader field of climate science, of course: Scholars from STEM disciplines have long relied on simulation models and large-scale geospatial data to model climate change, its causes, and characteristics (Giorgi & Mearns, 1991; Müller, 2010), including societal and sociopolitical impacts like migration patterns (Lu et al., 2016).
In recent years, however, computational methods—automated approaches to collect, structure, and analyze data, from automated content analysis over social network analysis to agent-based simulations—have also reached the social science domain within climate research. Similar to other fields, this has fostered the emergence of Computational Social Science (CSS), a strand of research that uses computational methods to study social phenomena (Lazer et al., 2020). CSS frequently relies on “big data”: often large, granular, and unstructured data that is created in real-time (Kitchin, 2014) and collected via computational methods. Computational methods, however, come not only with opportunities but also with challenges, such as biases related to data acquisition and analysis (Boyd & Crawford, 2012; Mahrt & Scharkow, 2013; Ruths & Pfeffer, 2014).
These opportunities and challenges have led to debates about the use of computational methods in social science more generally (Lazer et al., 2020; Wagner et al., 2021) and for analyzing communication about climate change in particular (for an overview, see Hase & Schäfer, 2023). As communication scholars, we focus on a specific strand of this discussion: the analysis of large-scale text corpora from social media platforms (Pearce et al., 2019) or news media (Grundmann & Scott, 2014; Hase et al., 2021; Kirilenko & Stepchenkova, 2012) via automated content analysis. In the field of climate change communication, NLP approaches have recently faced some criticism (Grundmann, 2021), often with a focus on specific questions and studies (Lahsen, 2021). Taking these discussions as a starting point, we (a) critically reflect upon strengths and weaknesses of NLP approaches beyond single studies or questions, and (b), by relying on theory, provide concrete and constructive guidance on when and how to (not) integrate these methods. Similar to discussions in climate science (Faghmous & Kumar, 2014; Knüsel et al., 2019), we argue for an integrative, reflexive approach to propel computational research on climate change communication forward.
2 COMPUTATIONAL METHODS FOR THE ANALYSIS OF CLIMATE CHANGE COMMUNICATION: A BRIEF OVERVIEW
In research on climate change communication, scholars have used computational methods both to collect and to analyze data. Concerning data collection, they have used such methods, for example (a minimal code sketch follows this list),
- to collect data on how organizational and individual, professional and non-professional communicators position themselves towards climate change by using transcripts of national parliamentary debates (Majdik, 2019), crawling websites and social media accounts of influential stakeholders (Adam et al., 2020), or scraping websites hosting policy documents (Biesbroek et al., 2020);
- to collect data on how public communication about climate change is structured by using programming scripts to access databases on news coverage (Buckingham et al., 2020) or social media platforms (Pearce et al., 2014);
- or to collect data on audience behavior towards climate change by relying on Google search trends (Le Nghiem et al., 2016), individual users' social media content (Williams et al., 2015), or digital traces from web tracking (Yan et al., 2021).
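To make the collection step concrete, the following minimal Python sketch queries a news-archive API and scrapes links to policy documents. The endpoint URL, response fields, and file filter are hypothetical placeholders, not the actual services used in the studies cited above.

```python
# Minimal sketch of computational data collection; the API endpoint and
# response fields are hypothetical, not those of the cited services.
import requests
from bs4 import BeautifulSoup

API_URL = "https://api.example-news-archive.org/v1/search"  # placeholder

def fetch_news(query: str, n: int = 100) -> list:
    """Query a (hypothetical) news-archive API for climate coverage."""
    response = requests.get(API_URL, params={"q": query, "page_size": n})
    response.raise_for_status()
    return response.json().get("articles", [])  # assumed response field

def scrape_policy_links(index_url: str) -> list:
    """Collect links to policy documents from a (hypothetical) index page."""
    html = requests.get(index_url).text
    soup = BeautifulSoup(html, "html.parser")
    return [a["href"] for a in soup.find_all("a", href=True)
            if a["href"].endswith(".pdf")]

articles = fetch_news("climate change")
```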
Concerning data analysis, scholars have used computational methods, for example (see the sketch after this list),
- to analyze the salience of climate change compared with other issues in political debates (Liu et al., 2011), news media (Schmidt et al., 2013), or Google searches (Le Nghiem et al., 2016);
- to identify relevant actors and communicators in public communication about climate change in news media (Grundmann & Scott, 2014), on social media (Pearce et al., 2014), among policy-makers (Biesbroek et al., 2020), or think-tanks and activists (Boussalis & Coan, 2016);
- to reconstruct communities and networks of individuals that share similar, and sometimes opposing, views of climate change, often on social media (Williams et al., 2015) and with a specific focus on denialist communities (Adam et al., 2020; Farrell, 2016);
- to analyze linguistic patterns of climate change communication on the word, sentence, text, and narrative level (for an overview, see Fløttum, 2016);
- or to identify prevalent evaluations, topics, and frames of climate change in news media (Hase et al., 2021; Kirilenko & Stepchenkova, 2012) or social media content (Kirilenko & Stepchenkova, 2014).
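As a concrete example of such analyses, the sketch below fits a simple topic model (latent Dirichlet allocation) to a toy corpus using scikit-learn. Real studies would use far larger corpora and validate the resulting topics carefully; the three documents here are invented.

```python
# Minimal sketch: surfacing topics in a corpus of climate texts with LDA.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

docs = [
    "Glaciers are melting faster than climate models projected.",
    "The carbon tax bill stalled in parliament again.",
    "Activists marched demanding faster emission cuts.",
]  # toy stand-in for a large text corpus

vectorizer = CountVectorizer(stop_words="english")
dtm = vectorizer.fit_transform(docs)          # document-term matrix
lda = LatentDirichletAllocation(n_components=2, random_state=0).fit(dtm)

terms = vectorizer.get_feature_names_out()
for k, weights in enumerate(lda.components_):
    top = [terms[i] for i in weights.argsort()[-5:][::-1]]
    print(f"Topic {k}: {', '.join(top)}")
```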
3 FROM THE “END OF THEORY” TO THE “RESURRECTION OF THEORY”: THE FUTURE OF COMPUTATIONAL METHODS
In line with a computational turn in social science, debates about how to employ computational methods have evolved—with scholars often taking somewhat opposing views on their suitability (see Figure 1).

On the one hand, some scholars push forward a technocratic view on computational advances by highlighting their promises. Especially early on, CSS was considered “an entirely new scientific approach for scientific analysis” (Conte et al., 2012, p. 327). “Big data,” instead of theoretical and conceptual advances, was embraced as driving knowledge generation, even heralding an “end of theory” (Anderson, 2008, no page). Social science, the argument goes, might be transformed “to be more like physics by identifying general principles” (Watts, 2017, p. 1) in the process.1
However, there has been critical pushback by social scientists perceiving such claims as an “opening salvo” (Margolin, 2019, p. 232) attacking social science and the role of theory. Some fear that the field will be colonized by computer science perspectives and practices (McFarland et al., 2016). This has led scholars to criticize a “big data hubris” (Lazer et al., 2014, p. 1203), that is, the lack of theoretically embedded applications of computational advances, often within positivist paradigms reducing science to empiricism (see critically Fuchs, 2017).
Recently, scholars have started to discuss how perspectives embracing or criticizing the computational turn can be better aligned. Instead of resurrecting a “Methodenstreit” that fuels either-or propositions on the suitability of different methods (for a detailed discussion, see Gerring, 2017), they propose a reflexive yet integrative view for advancing computational research: Reflexive means that they ponder and critically discuss strengths and weaknesses arising from the approximation of social science and computer science. Integrative means that they also provide concrete and constructive guidance on how conceptual and theoretical considerations can inform the decision to employ (or not employ) these methods, thus leading to a “resurrection of theory” (Halavais, 2015, p. 586). Best practices include work on the integration of predictive and explanatory modeling (Hofman et al., 2021) or machine learning in social science (Grimmer et al., 2021; Radford & Joseph, 2020).
4 COMPUTATIONAL RESEARCH ON CLIMATE CHANGE COMMUNICATION: A REFLEXIVE, INTEGRATIVE PERSPECTIVE
For the context of climate change communication, existing work has been reflexive by critically discussing shortcomings of computational advances and how not to use them, often for the context of specific studies and questions (e.g., Lahsen, 2021). However, we are not aware of work providing concrete guidance on when and how to apply these methods in a more integrative manner and beyond selected questions. Taking the work by Lahsen (2021) as a starting point and in line with arguments by Grundmann (2021) that computational methods “need to be aligned with theoretical perspectives […] in order to advance research in this field” (p. 395), we propose a reflexive and integrative perspective for computational methods: We reflect on strengths and weaknesses of computational methods for studying climate change communication. Related to these, we also propose guidance on how computational advances could be integrated via theory—including when to opt against using them.
In doing so, we focus solely on how scholars, should they consider employing computational methods, could do so in a more reflexive and integrative way. Work on climate change communication should, and often does, rely on a pluralist methodological toolkit including qualitative, quantitative, and computational methods (Agin & Karlsson, 2021; Hase & Schäfer, 2023). The degree to which researchers employing qualitative and quantitative methods are critical about their methods and rely on theory to integrate approaches can thus serve as a role model for computational research. Our focus on computational methods—and a reflexive, integrative approach for this line of work—does not mean that we consider non-computational research not to be reflexive or integrative. On the contrary, we argue for computational research to better align with principles already established in qualitative or quantitative research.
4.1 Strength 1: Making data big and small
The temporal and spatial granularity of “big data” allows us to understand phenomena on a larger scale and through comparative, cross-national, cross-sectoral, longitudinal perspectives—that is, to make data big. For example, we can compare discussions about climate change across countries and beyond Anglophone contexts via machine translation (Hase et al., 2022; Reber, 2019). Pianta and Sisco (2020), for instance, analyze the salience of climate change in global news in 22 different languages; others use multilanguage approaches to study event-centeredness in global news coverage (Wozniak et al., 2021). Many of these studies work with large-scale geospatial data from climate science, for instance to understand how global temperatures are associated with shifts in climate change communication and respective news coverage (Pianta & Sisco, 2020; Schäfer et al., 2014). But computational methods also allow us to make data “small”: Large data sets enable researchers to identify sub-populations or outliers (Choi, 2020), “thus providing access to the proverbial needle in the digital haystack” (Mahrt & Scharkow, 2013, pp. 24–25). Scholars can identify outliers within “big data” via computational methods and learn more about less (or least) representative cases, which may be hard to reach via traditional “small data.” Examples include the identification of people with strong attitudes towards climate change via those most actively discussing the issue on social media (Williams et al., 2015) or studying conspiracy theories related to climate change as fringe phenomena (Mahl et al., 2021).
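A minimal sketch of this “needle in the haystack” logic, assuming a simple table of social media posts with invented users and column names: identify the most active accounts as outliers and extract their posts as “small data” for closer analysis.

```python
# Minimal sketch: "making data small" by isolating outlier users,
# e.g., the most active accounts discussing climate change (toy data).
import pandas as pd

posts = pd.DataFrame({
    "user": ["a", "a", "a", "b", "c", "c", "d"],
    "text": ["..."] * 7,  # toy stand-in for millions of posts
})

counts = posts.groupby("user").size().sort_values(ascending=False)
threshold = counts.quantile(0.95)            # top 5% most active accounts
outliers = counts[counts >= threshold].index
subsample = posts[posts["user"].isin(outliers)]  # "small" data for close reading
```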
In sum, computational methods allow us to make data big and small, enabling researchers to address specific gaps in analyses of climate change communication: They help us to understand climate change as a truly global crisis by including “big data” from a broad range of national cases (Pearman et al., 2022) and analyzing them via cross-language NLP approaches.2 This extends to addressing theoretical issues such as fragmentation and polarization (Moser, 2016), for instance identifying ideological communities and their communication about climate change by making data small (e.g., Cann et al., 2021; Williams et al., 2015).
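For the cross-language NLP approaches mentioned above, a common building block is machine translation of non-English texts before analysis (cf. Reber, 2019). A minimal sketch, assuming the openly available OPUS-MT models accessed via the Hugging Face transformers library:

```python
# Minimal sketch: machine-translating German news texts to English
# before further NLP analysis, using an open OPUS-MT model.
from transformers import pipeline

translator = pipeline("translation", model="Helsinki-NLP/opus-mt-de-en")

texts_de = ["Der Klimawandel dominierte die Debatte im Bundestag."]
texts_en = [t["translation_text"] for t in translator(texts_de)]
print(texts_en[0])  # English rendering of the German sentence
```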
4.2 Strength 2: Nowcasting observations and recommendations
Another key advantage of “big data” is its creation in (near) real time (Kitchin, 2014). Scholars can use computational methods to make timelier, often longitudinal observations, something that is useful for instantly understanding public reactions towards COPs (Fownes et al., 2018), extreme weather events, or disasters (Ford et al., 2016). This includes nowcasting recommendations for early warning systems or disaster responses. As such, CSS can prove useful for mitigation, preparedness, response, and recovery in disaster management (Yu et al., 2018).
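A minimal sketch of such nowcasting, assuming hourly counts of climate-related posts (the numbers are invented): a simple trailing baseline flags sudden spikes in attention, for example, around an extreme weather event.

```python
# Minimal sketch: detecting sudden attention spikes in a stream of
# climate-related posts (toy data; real systems would poll an API).
import pandas as pd

idx = pd.date_range("2021-07-12", periods=10, freq="h")
counts = pd.Series([12, 14, 11, 13, 15, 90, 140, 120, 60, 30], index=idx)

baseline = counts.rolling(4).mean().shift(1)  # trailing hourly average
spikes = counts[counts > 3 * baseline]        # e.g., a flood event breaking
print(spikes)
```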
While nowcasting observations and recommendations can be useful for approaching applied problems (see similarly Watts, 2017), it is also important for addressing theoretical gaps: As Olausson and Berglez (2014) note, we lack an understanding of climate change communication across analytical levels in the form of (intermedia) agenda-setting and agenda-building, especially of information flows between news media and the public. By instantly tracking communication via NLP approaches, computational studies can show how public communication on social media may try to influence news agendas (Su & Borah, 2019) or how social movements may apply pressure on politicians related to climate change (Haßler et al., 2021) through bottom-up perspectives.
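To illustrate the intermedia logic, the sketch below compares daily salience series for social media and news coverage at different time lags. This is a deliberately simplified stand-in for the time-series designs used in such studies; all counts are invented.

```python
# Minimal sketch: lead-lag comparison of social media and news salience,
# a simplified stand-in for (intermedia) agenda-setting analyses.
import pandas as pd

# toy daily counts of climate-related items
idx = pd.date_range("2021-11-01", periods=14, freq="D")
social = pd.Series([5, 8, 30, 45, 40, 22, 10, 9, 7, 6, 5, 5, 4, 4], index=idx)
news = pd.Series([2, 3, 6, 25, 38, 35, 20, 11, 8, 6, 5, 4, 4, 3], index=idx)

for lag in range(0, 4):
    r = social.corr(news.shift(-lag))  # social leading news by `lag` days
    print(f"social leads news by {lag} day(s): r = {r:.2f}")
```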
4.3 Strength 3: Enabling data-based knowledge generation
Third, scholars in CSS have pushed for “data-driven science that radically modifies the existing scientific method by blending aspects of abduction, induction, and deduction” (Kitchin, 2014, p. 10). Approaches such as machine learning can facilitate more inductive approaches (Grimmer et al., 2021) frequently proposed by qualitative scholars, especially since researchers often employ NLP approaches to explore data via mixed methods (Hase et al., 2022). Grounded theory approaches (Nelson, 2020), for example, can provide an explicit bridge for combining qualitative, quantitative, and computational methods within these more inductive approaches, something also proposed for understanding climate change (Ford et al., 2016).
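A minimal sketch of such an inductive, mixed-methods workflow in the spirit of computational grounded theory: cluster texts computationally, then close-read exemplars from each cluster qualitatively. The corpus and cluster number are toy choices.

```python
# Minimal sketch: cluster texts, then read exemplars qualitatively
# (cf. computational grounded theory, Nelson, 2020).
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.cluster import KMeans

docs = [
    "Sea levels rise and coastal cities prepare adaptation plans.",
    "Adaptation funding for coastal flooding remains scarce.",
    "Protesters blame politicians for weak climate policy.",
    "Party leaders clash over the new climate policy bill.",
]  # invented toy corpus

X = TfidfVectorizer(stop_words="english").fit_transform(docs)
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)

# Qualitative step: researchers close-read exemplars from each cluster
for k in set(labels):
    exemplars = [d for d, l in zip(docs, labels) if l == k]
    print(f"Cluster {k}: {exemplars[0][:60]}...")
```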
This strength of enabling data-based knowledge generation might enable researchers to address another theoretical gap in the field: Scholars have repeatedly criticized that research on climate change communication focuses too narrowly on framing theory (Agin & Karlsson, 2021). We think that, in combination with quantitative or qualitative methods, computational methods can help address this gap by supporting theory-building in the field of climate change communication. While not explicitly focusing on theory-building themselves, recent studies have, for instance, inductively identified peaks in public attention to extend research on climate change-related media events (Olteanu et al., 2021) or combined computational and qualitative approaches to develop typologies of attitudes towards climate change (Tvinnereim et al., 2017)—approaches that can become building blocks for more conceptual and theoretical work.
4.4 Weakness 1: (Re-)introducing empiricist epistemologies
But CSS also introduces new, profound challenges. This includes its data-driven approach, a strength-turned-challenge when combined with positivist paradigms from computer science (see critically Fuchs, 2017). Scholars have repeatedly stressed that CSS pushes an epistemological shift towards an “empiricist mode of knowledge production” (Kitchin, 2014, p. 3).
This weakness of (re-)introducing empiricist epistemologies may perpetuate existing theoretical issues, including a narrow focus on few, selected theories like framing (Agin & Karlsson, 2021). NLP approaches are often used to detect topics or “frames” (Hase et al., 2022)—something that extends to research on climate change communication, as Grundmann (2021) critically reflects. As it is unclear whether NLP approaches measure “frames” in their theoretical sense (Schäfer & O'Neill, 2017), this is highly problematic (Grundmann, 2021). Thus, we urge researchers to opt against the use of computational methods if these cannot adequately measure the theoretical concepts of interest (Baden et al., 2021). Theoretical and conceptual considerations need to guide “big data” studies on climate change (Faghmous & Kumar, 2014; Knüsel et al., 2019), especially within data-driven knowledge generation—otherwise, such studies will perpetuate and even increase the theoretical limitations of the field.
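In practice, this theoretical alignment implies validating automated measures against manual coding before relying on them. A minimal sketch with invented labels; the 0.7 threshold is a common but debated rule of thumb, not a fixed standard:

```python
# Minimal sketch: validating an automated measure against human coding
# before using it (toy labels, invented for illustration).
from sklearn.metrics import cohen_kappa_score, precision_score, recall_score

human = [1, 0, 1, 1, 0, 0, 1, 0, 1, 0]  # manual coding of a "frame"
auto = [1, 0, 0, 1, 0, 1, 1, 0, 1, 0]   # automated classifier output

kappa = cohen_kappa_score(human, auto)
print(f"kappa = {kappa:.2f}, "
      f"precision = {precision_score(human, auto):.2f}, "
      f"recall = {recall_score(human, auto):.2f}")

# If agreement stays low, the theoretically sound choice is to refine
# the measure or opt against the computational approach altogether.
if kappa < 0.7:  # common but debated threshold; an assumption here
    print("Automated measure does not capture the concept; reconsider.")
```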
4.5 Weakness 2: Ignoring bias for precision
Due to its size, “big data” allows for seemingly precise inferences. At the same time, however, it can introduce non-random biases (Salganik, 2018). A recurring weakness of many “big data” studies is that researchers prefer more precise estimates over less biased ones, often for atheoretical reasons. This is particularly evident in terms of representativeness bias—that is, obtaining highly precise estimates, but of less relevant populations (Boyd & Crawford, 2012; Salganik, 2018), for instance by focusing on selected social media samples.
Again, this weakness of ignoring bias for precision may exacerbate existing theoretical issues: An example is the specific, and narrow, set of samples researchers use to test key theories in climate change communication. Currently, most theories are explored and evaluated based on analyses of textual communication, especially from printed newspapers (Comfort & Park, 2018; Schäfer & Schlichting, 2014), which serve as proxies for news coverage in general, or from Twitter, which serves as a proxy for social media communication in general (Pearce et al., 2019). While such narrow foci are not limited to climate change communication (see critically Hase et al., 2022; Jünger et al., 2022), they are particularly problematic in this context since they deepen the existing lack of knowledge on whether theories on climate change communication generalize. This includes testing existing theories on visual communication (Thorsen & Astrupgaard, 2021) or on platforms other than Twitter (Pearce et al., 2019), both understudied but theoretically important populations of interest.
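A small simulation illustrates the trade-off: a massive but skewed sample yields a tightly bounded estimate of the wrong quantity, while a modest unbiased sample remains centered on the truth. All numbers are invented.

```python
# Minimal sketch: a huge biased sample is precise but wrong; a small
# unbiased sample is imprecise but centered on the truth (toy values).
import numpy as np

rng = np.random.default_rng(0)
true_mean = 0.50  # population share holding some attitude

big_biased = rng.normal(true_mean + 0.10, 0.5, size=1_000_000)  # platform skew
small_unbiased = rng.normal(true_mean, 0.5, size=500)

for name, x in [("big, biased", big_biased), ("small, unbiased", small_unbiased)]:
    se = x.std() / np.sqrt(len(x))
    print(f"{name}: estimate = {x.mean():.3f} ± {1.96 * se:.3f}")
# The first interval is tiny but excludes the truth; size cannot fix bias.
```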
5 CONCLUSION
In sum, “big data” and computational methods are not always better (Boyd & Crawford, 2012) than “small data” or non-computational methods—but they are also not always worse. For the use of text-as-data approaches in research on climate change communication, we follow Faghmous and Kumar in arguing that “big data analytics should not be seen as the ‘silver bullet’ of modern research and must be used in addition to other tools” (Faghmous & Kumar, 2014, p. 261). Instead of pitting research paradigms and methods against each other, scholars should employ qualitative, quantitative, and computational perspectives on climate change communication in a complementary fashion (see similarly Boussalis & Coan, 2016; Hase & Schäfer, 2023). Not every scholar, or every study, has to use all of these methods, whether on their own or in combination—instead, “what the field requires is not representativeness in each individual study, but representativeness across studies” (Margolin, 2019, p. 242).
However, should scholars decide to use computational methods, they need to know what computational methods can and cannot do—that is, reflect on their strengths and weaknesses—to decide whether and how to employ these methods adequately. They should rely on theory and conceptual thought to identify meaningful questions, data, measurements, and interpretations (Radford & Joseph, 2020) and, relatedly, decide whether to integrate computer science methods (for an overview, see Table 1). Key strengths of computational methods (i.e., making data big and small, nowcasting observations and recommendations, enabling data-based knowledge generation) can only emerge in light of theory; similarly, key weaknesses (i.e., (re-)introducing empiricist epistemologies, ignoring bias for precision) can only be recognized and alleviated via theory—which may, in many cases, lead researchers to opt against the use of computational methods.
Table 1. A reflexive, integrative perspective on computational methods for research on climate change communication

| Reflexivity | Integration |
| --- | --- |
| Key strengths | In line with theory: integrate computational methods … |
| Making data big and small | … to study climate change as a global crisis; … to analyze fragmentation and polarization concerning climate change |
| Nowcasting observations and recommendations | … for applied disaster management; … to study communication across analytical levels, e.g., (intermedia) agenda-setting and agenda-building |
| Enabling data-based knowledge generation | … for theory-building |
| Key weaknesses | In line with theory: do not integrate computational methods … |
| (Re-)introducing empiricist epistemologies | … if they cannot measure theoretical concepts of interest and thus increase theoretical narrowness in the field |
| Ignoring bias for precision | … if they cannot help to study populations of theoretical interest and thus increase a lack of generalizability in the field |
A last, important point concerns the environmental costs of computational methods: Cloud computing services used for NLP leave a heavy carbon footprint (Strubell et al., 2020). Moreover, NLP models are often optimized for English-language corpora, yet their environmental impact also affects populations in non-English-speaking countries that benefit least from them. Researchers should consider whether they want to contribute to this imbalance by building NLP approaches that “serve the needs of those who already have the most privilege in society” (Bender et al., 2021, p. 613).
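As a rough, back-of-the-envelope illustration, the carbon cost of a computational study can be approximated from runtime, hardware power draw, and grid carbon intensity. All figures below are illustrative assumptions, not values from Strubell et al. (2020).

```python
# Minimal sketch: estimating emissions of a model-training run from
# runtime, power draw, and grid intensity (all values are assumptions).
gpu_hours = 120        # e.g., 5 days on one GPU
power_kw = 0.3         # assumed average draw per GPU in kW
grid_intensity = 0.4   # assumed kg CO2e per kWh

energy_kwh = gpu_hours * power_kw
emissions_kg = energy_kwh * grid_intensity
print(f"{energy_kwh:.0f} kWh ≈ {emissions_kg:.0f} kg CO2e")
# Tools such as codecarbon can log this automatically during runs.
```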
AUTHOR CONTRIBUTIONS
Mike S. Schäfer: Conceptualization (equal); writing – original draft (equal); writing – review and editing (equal). Valerie Hase: Conceptualization (equal); writing – original draft (equal); writing – review and editing (equal).
ACKNOWLEDGMENT
Open Access funding enabled and organized by Projekt DEAL.
RELATED WIREs ARTICLES
Evaluating the computational (“Big Data”) turn in studies of media coverage of climate change
Endnotes
Open Research
DATA AVAILABILITY STATEMENT
Data sharing is not applicable to this article as no new data were created or analyzed in this study.