
Future Shock: Generative AI and the International AI Policy and Governance Crisis

Published on May 31, 2024

1. Introduction

On March 29, 2023, the Future of Life Institute (FLI) published an open letter calling on “all AI labs to immediately pause for at least 6 months the training of AI systems more powerful than GPT-4” (FLI, 2023). Signed by eminent academics, CEOs, and other tech luminaries—including Yoshua Bengio, Elon Musk, and Steve Wozniak—the letter lamented the lack of “planning and management” that characterized the hapless behavior of “AI labs locked in an out-of-control race to develop and deploy ever more powerful digital minds that no one – not even their creators – can understand, predict, or reliably control.” “Powerful AI systems,” it continued, “should be developed only once we are confident that their effects will be positive and their risks will be manageable.” For the letter’s authors, the absence of actionable, binding, and enforceable governance mechanisms to ensure appropriate control over the risks and harms of generative AI (GenAI) technologies made a 6-month cessation of tech company research activities necessary. Failing this, a mandatory government-enforced moratorium would be needed. During the pause, the letter concluded,

AI developers must work with policymakers to dramatically accelerate development of robust AI governance systems. These should at a minimum include: new and capable regulatory authorities dedicated to AI; oversight and tracking of highly capable AI systems and large pools of computational capability; provenance and watermarking systems to help distinguish real from synthetic and to track model leaks; a robust auditing and certification ecosystem; liability for AI-caused harm; robust public funding for technical AI safety research; and well-resourced institutions for coping with the dramatic economic and political disruptions (especially to democracy) that AI will cause.

Just two days after the publication of the letter, Timnit Gebru, Emily M. Bender, Angelina McMillan-Major, and Margaret Mitchell (2023; the authors of a well-known paper1 cited in it) issued a trenchant response. While agreeing with “a number of the recommendations” contained in the original letter, Gebru and colleagues stressed that these were “overshadowed by fearmongering and AI hype, which steers the discourse to the risks of imagined ‘powerful digital minds’ with ‘human-competitive intelligence.’” Criticizing how the inflationary anthropomorphic language of “a fantasized AI-enabled utopia or apocalypse” can lure people into “uncritically trusting” and “misattributing agency” to inert computational systems, these authors cautioned that such a perspective can dangerously distract from “the actual harms resulting from the deployment of AI systems today.” The FLI letter, they observed, addressed “none of the ongoing harms from these systems, including 1) worker exploitation and massive data theft to create products that profit a handful of entities, 2) the explosion of synthetic media in the world, which both reproduces systems of oppression and endangers our information ecosystem, and 3) the concentration of power in the hands of a few people which exacerbates social inequities.” Instead, Gebru et al. called for “regulation that enforces transparency,” stressing that such regulation should include documentation and disclosure requirements for “training data and model architectures,” requirements to make it clear when users were “encountering synthetic media,” and mechanisms to make the tech companies building large-scale AI systems accountable “for the outputs produced by their products.” Finally, they warned that decisions about how to develop and govern large-scale AI systems should not be left to tech industry actors and to the academics and researchers who are “financially beholden” to them, but rather should include people from those communities “most impacted by AI systems”—especially those from the historically marginalized and vulnerable social groups likely to be disproportionately harmed.

Over a year after this exchange, the outcome of these appeals for effective and proactive AI policy and governance interventions has been telling. There has neither been a pause of big-tech AI innovation activities nor a moratorium on them. Likewise, although ongoing international discussions on possibilities for regulating generative AI have been initiated alongside the formation of a few national-level ‘AI Safety Institutes’ and the issuance of several voluntary codes of conduct, no “robust AI governance systems” that enforce corporate transparency and accountability have been developed, no “new and capable regulatory authorities” have been formed, no “robust ecosystem” for third-party oversight, tracking, auditing, and certification has materialized, no disclosure requirements for training data and model architectures have been codified, no liability regime to make tech companies answerable for AI harms has come into force, and no institutional redress of ecosystem-level socioeconomic disruption and rapid big-tech financial consolidation has occurred.

In a similar way, the crux of the discord between the FLI letter and the Gebru, Bender, McMillan-Major, and Mitchell letter has remained unresolved, raising crucial issues about the overall priorities, commitments, and direction of the AI policy and governance community writ large. Some critical observers have noted that, in fact, the realization of the concerns expressed by Gebru and colleagues about the distractive focus of the AI safety community on existential risk has enabled tech industry leaders to divert attention away from the robust regulatory controls needed to redress tangible risks and harms and has thus ended up being at cross-purposes with the development of the substantial policy and governance interventions called for in both letters. On this account, the amplification of AI hype and “fearmongering” about apocalyptic risks of AI takeover (Hanna & Bender, 2023) has allowed corporate AI labs and safety researchers to “play up the long-term threat of human extinction[…][heaping] praise on proposals that would slow-walk action – all while continuing to drive the AI arms race forward at a breakneck speed” (Accountable Tech, AI Now, & EPIC, 2023). Moreover, the dominance of a small group of corporate AI labs and AI safety researchers in shaping the current international AI policy and governance agenda has also only seemed to confirm Gebru et al.’s other concern about the exclusion of impacted voices from GenAI governance debates and processes. This has raised questions about the narrow-minded focus of GenAI policy and governance initiatives on issues defined by the private interests of (largely Global North) corporate actors, about the overlooking and undervaluing of the policy issues and concerns of people on the local, regional, and global margins, and, ultimately, about the legitimacy and inclusivity of current GenAI policy and governance discussions.

In this Policy Forum, we hope to explore this thorny terrain of GenAI policy and governance through the lens of “future shock”—a term first coined by Alvin Toffler (1970/1984) to capture the societal dislocation caused by the rapid advent of the digital revolution. In Toffler’s view, the continuous and accelerating social, cultural, political, and economic changes brought about by this technological transformation were causing a bewildering overhaul of familiar forms of everyday life and a “shattering stress” in the lived experience of individuals “subjected to too much change in too short a time.” Toffler’s concerns were rooted in how a society ill-prepared for such sudden changes could not cope with the accelerating pace of the innovation-induced demolition of existing human institutions, norms, and practices. For him, this raised the real prospect of a “massive adaptational breakdown.” “Future shock,” he wrote, describes “the dizzying disorientation brought on by the premature arrival of the future.”

The purpose of the Policy Forum is to investigate the extent to which the meteoric rise of foundation models (FMs) and GenAI technologies, since the launch of ChatGPT at the end of 2022, has triggered future shock within the international AI policy and governance ecosystem. Each of the 13 position papers collected here examines a different aspect of how the GenAI revolution has presented “massive adaptational” challenges for (and put immense pressure on) existing institutions, norms, and practices. In putting together this Policy Forum, we have intended to create a living snapshot of the salient policy discussions that have been happening at an inflection point in the history of technology. We have prioritized drawing together multi-sector, cross-disciplinary, and geographically diverse expertise, and we have sought to spotlight the broad spectrum of lived experiences of those affected by these technologies.

1.1. Generative AI and the International AI Policy and Governance Crisis

In this introduction, we aim to provide an aerial view of the current GenAI policy and governance ecosystem. Over the course of writing the piece, we discovered that the stage-setting required to accomplish this task was extensive, given that we needed to provide readers uninitiated in current AI policy and governance debates with a landscape view of the range of complex technical, sociotechnical, geopolitical, and political economic issues at play. Our discussion broaches this range of concerns through a fiercely interdisciplinary lens, attempting to continuously stitch together threads from each of these issue areas. To aid the reader on their journey through this spiky conceptual terrain, we provide a landscape-view visualization of the territory we explore (Figure 1) at the end of this section.

We begin the introduction by posing the question: ‘Did the rapid industrialization of generative AI really trigger future shock for the global AI policy and governance community?’ First, we examine how and why this should not have been the case. As the GenAI revolution quickly gathered steam in early 2023, the mature state of AI-adjacent legal and regulatory regimes in areas such as cybersecurity, digital trade, consumer protection, intellectual property, antitrust, online safety, and data protection should have formed a robust conceptual basis upon which AI policy and governance communities could draw in confronting the expanding risks triggered by the industrial scaling of these technologies. Likewise, for several years leading up to GenAI’s eruptive rise, stakeholders from across industry, academia, government, and civil society, and from around the globe, had made concerted efforts to develop standards, policies, and governance mechanisms to ensure the ethical, responsible, and equitable production and use of AI systems.

However, as we then show, despite these ostensibly supportive activities and background conditions, several primary drivers of future shock converged to produce an international AI policy and governance crisis in the wake of the dawning of the GenAI era. Such a crisis, we argue, was marked by the disconnect between the strengthening thrust of public concerns about the hazards posed by the hasty industrial scaling of GenAI and the absence of effectual regulatory mechanisms and needed policy interventions to address such hazards. In painting a broad-stroked picture of this crisis, we underscore two sets of contributing factors. First, there have been factors that have demonstrated the absence of various vital aspects of AI policy and governance capability and execution—and thus the absence of key preconditions for readiness and resilience in managing technological transformation. These include prevalent enforcement gaps in existing digital- and data-related laws (e.g., intellectual property and data protection statutes), a lack of regulatory AI capacity, democratic deficits in the production of standards for trustworthy AI, and widespread evasionary tactics of ethics washing and state-enabled deregulation.

Second, there have been factors that have significantly contributed to the presence of a new scale and order of systemic-, societal-, and biospheric-level risks and harms. Chief among these were the closely connected dynamics of unprecedented scaling and centralization that emerged as both drivers and by-products of the GenAI revolution. We focus, in particular, on model scaling and industrial scaling. Whereas the scaling of data, model size, and compute were linked to the emergence of serious model intrinsic risks deriving from the unfathomability of training data, model opacity and complexity, emergent model capabilities, and exponentially expanding compute costs, the rapid industrialization of FMs and GenAI systems meant the onset of a new scale of systemic risks that spanned the social, political, economic, cultural, and natural ecosystems in which these systems were embedded. The brute-force commercialization of GenAI ushered in a new age of widespread exposure in which increasing numbers of impacted people and communities at large were made susceptible to the risks and harms issuing from model scaling and to new possibilities for misuse, abuse, and cascading system-level effects.
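The compute dimension of model scaling discussed above can be made concrete with a back-of-the-envelope calculation. As a rough illustration (not drawn from this article), consider the widely cited approximation that training a dense transformer costs on the order of 6 × N × D floating-point operations, where N is the parameter count and D the number of training tokens; the specific model sizes below are hypothetical examples.

```python
# Rule-of-thumb estimate of training compute: FLOPs ~ 6 * N * D,
# where N = number of model parameters and D = number of training tokens.
# This is a common approximation in the scaling literature, used here
# only to illustrate how quickly compute costs grow with scale.

def training_flops(n_params: float, n_tokens: float) -> float:
    """Estimate total training compute (in FLOPs) via the 6*N*D rule of thumb."""
    return 6.0 * n_params * n_tokens

# Hypothetical example: scaling both parameters and tokens 10x
# multiplies the estimated training compute by ~100x.
small = training_flops(1e9, 2e10)    # 1B-parameter model, 20B tokens
large = training_flops(1e10, 2e11)   # 10B-parameter model, 200B tokens
print(f"{small:.2e} FLOPs -> {large:.2e} FLOPs")
```

Under this rule of thumb, every order-of-magnitude increase in both model size and training data implies roughly two orders of magnitude more training compute, which helps make tangible the exponentially expanding compute costs, and the concentration of resources needed to bear them, described above.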

Alongside these aspects of model scaling and industrial scaling, patterns of economic and geopolitical centralization only further intensified conditions of future shock. The steering and momentum of these scaling dynamics lay largely in the hands of a few large tech corporations, which essentially controlled the data, compute, and skills and knowledge infrastructures required to develop FMs and GenAI systems. This meant that a small number of corporate actors and AI labs had a disproportionate influence on the direction and pace of the GenAI revolution, pursuing market-oriented values that led to hasty acceleration. This also meant that, if left unchecked, such a concentration of techno-scientific and market power could lead to further power centralization and economic consolidation. Moreover, such an impetus to industry power consolidation amplified corresponding dynamics of geopolitical power centralization among the Global North nation-states hosting big tech companies, which were motivated by international and regional competition and security considerations to accommodate the interests and will of predominant home-grown private-sector AI tech players. Such geopolitical dynamics promised to exacerbate and further entrench longer-term patterns of global inequality and inequity.

The introduction concludes with an examination of the extent to which the first wave of policy and governance initiatives that cropped up in mid-2023 effectively responded to the international AI policy and governance crisis we have thus far been outlining. We explore diverging responses to this question. Some, more positive, commentators have seen initiatives like the UK AI Safety Summit and the G7’s Hiroshima AI Process as generating significant momentum for international collaboration. On this view, such initiatives took steps in the right direction by establishing the sort of international regimes and institutions that are necessary counters to the cross-border character of FM and GenAI risks, harms, supply chains, and infrastructure. However, other, more critical observers have emphasized that much of this AI policy and governance activity has been ineffective and diversionary, subserving the deregulatory interests of big-tech firms, failing to deliver binding governance mechanisms, drawing attention away from the real-world harms inflicted by large-scale AI, and further entrenching legacies of Global North political, economic, and sociocultural hegemony to the exclusion of the voices, interests, and concerns of the Majority World. Seen through this critical lens, first-wave international AI policy and governance initiatives not only struggled to sufficiently address the crisis, but their outcomes have, to a considerable degree, only exacerbated it.

We close by providing a thematically organized summary of our Policy Forum contributions, exploring how these cover a range of difficult and unresolved policy questions that have been opened amid continuing GenAI governance debates. In bringing these myriad views into one place, we hope to initiate and advance a meaningful, informed, and far-ranging conversation rather than to conclude one. We therefore welcome and urge further contributions to extend the dialogue and create a broader and more inclusive AI policy and governance discussion.

Figure 1. A landscape view of the conceptual territory explored in this Introduction

2. Did the Rapid Industrialization of Generative AI Really Trigger Future Shock for the Global AI Policy and Governance Community?

Upon its public release in November 2022, ChatGPT sent almost instant shockwaves across the digital world. The unprecedented industrial revolution of AI had begun. By January 2023, ChatGPT had amassed 100 million active users (Milmo, 2023), making it the fastest-growing consumer application in history. Within weeks, hundreds of commercial GenAI applications stormed onto the scene, seemingly penetrating all areas of everyday life, while large tech companies like Microsoft and Google simultaneously integrated these technologies into the flagship digital services that daily impacted their billions of users worldwide. Owing in large part to their general-purpose character, GenAI technologies quickly propagated across every sector, putting immediate and far-reaching pressure on social, cultural, political, legal, and economic norms and institutions.

Despite the abruptness that characterized this eruption of GenAI applications, the extent to which this triggered ‘future shock’ among AI policy and governance communities across the globe remains debatable. The commercialization explosion of GenAI did not necessarily catch AI policymakers and policy researchers unaware. Decades of debate, research, and policy development in areas such as cybersecurity, digital trade, data privacy, consumer protection, intellectual property rights, antitrust and competition, online safety, and data protection had yielded standards, good practice protocols, laws, and regulations that formed a robust conceptual basis upon which AI policy and governance communities could draw in confronting the broadening risk surface presented by the industrial scaling of GenAI technologies. This can be seen, for instance, across a range of regional treaties and legislative interventions that had confronted the multiplying risks of expanding digitalization and datafication in the information age, from the establishment of the African Union’s Convention on Cybersecurity and Data Protection (2014) and the Council of Europe’s Conventions on Cybercrime (2001) and Data Protection (1981) to the development and codification of national data protection and privacy laws in Korea (2011), Japan (2003), Singapore (2012), China (2021), Indonesia (2022), Sri Lanka (2022), Malaysia (2010), the European Union (2016), Egypt (2020), South Africa (2013), Tunisia (2004), Botswana (2018), Ghana (2012), Kenya (2019), Mauritius (2017), Nigeria (2023), Tanzania (2022), Uganda (2019), New Zealand (2020), Australia (1988), Bahrain (2018), Qatar (2016), UAE (2021), Argentina (2000), Mexico (2010), Chile (1999), Colombia (2012), and Brazil (2018), among many others.

Likewise, for several years leading up to the ‘ChatGPT revolution,’ stakeholders from across industry, academia, government, and civil society, and from around the globe, had made concerted efforts to develop standards, policies, and governance mechanisms to ensure the ethical, responsible, and equitable production and deployment of AI systems. In the United Kingdom, for example, early policy contributions—such as the 2016 House of Commons Science and Technology Committee inquiry on robotics, the 2017 House of Lords Select Committee inquiry on AI, and the 2018 AI Sector Deal—placed emphasis on the importance of building innovation governance infrastructures that could facilitate the responsible production and use of AI. Such interventions, and significant public investment to support research into AI ethics and governance, led to national policy initiatives that yielded the world’s first national public sector guidance on AI ethics and safety (2019) and the world’s first national guidance on AI explainability (2020), both of which emphasized the importance of operationalizing principles of social and environmental sustainability, safety, security, reliability, robustness, accountability, transparency, nondiscrimination, inclusiveness, and data stewardship across the AI innovation lifecycle. In the United States, AI policy initiatives such as the Office of Management and Budget’s “Memorandum for the Heads of Executive Departments and Agencies: Guidance for Regulation of Artificial Intelligence Applications” (Vought, 2020) and the U.S. Office of Science and Technology Policy’s “Blueprint for an AI Bill of Rights” (2022) led to the development of the national AI Risk Management Framework (AI RMF), published by the National Institute of Standards and Technology in January 2023. The AI RMF focused on seven essential features of trustworthy AI: validity and reliability, safety, security and resilience, accountability and transparency, explainability and interpretability, privacy-enhancement, and fairness and bias management. In the European Union, the European Commission took initial action in 2018 to appoint a High-Level Expert Group on AI (AI HLEG), which produced its Ethics Guidelines for Trustworthy AI (2019a), centering the role of human agency and oversight, technical robustness, privacy and data governance, transparency, diversity, nondiscrimination and fairness, societal and environmental well-being, and accountability. These guidelines and the AI HLEG’s subsequent “Policy and Investment Recommendations for Trustworthy Artificial Intelligence” (2019b) informed the European Commission’s 2020 “White Paper on AI” and, ultimately, the drafting of the EU AI Act (2021), on whose final version the European Parliament and the Council reached political agreement in December 2023.

Though these AI policy and governance initiatives in the United Kingdom, United States, and European Union well evidence the development of converging values, norms, and governance protocols for responsible and trustworthy AI years before GenAI burst onto the scene, they were not alone. Beyond the Western industrialized regions of the world, which have often been spotlighted in international discussions of AI policy, parallel AI policy and governance activity occurred in many other parts of the world. In Africa, emphasis on the role of AI in achieving the UN Sustainable Development Goals across the continent joined with calls for context-specific governance responses (Thomson Reuters Foundation, 2023). The African Union formed a task force in 2019 directing member states to “establish a working group on AI…to study: the creation of a common African stance on AI; the development of an Africa wide capacity building framework; and establishment of an AI think tank to assess and recommend projects to collaborate on in line with Agenda 2063 [a 50-year plan to achieve ‘The Africa We Want’] and the UNSDGs” (African Union, 2019; Stahl et al., 2023). The Sharm El Sheikh Declaration adopted by African Union Ministers also recognized and encouraged the implementation of the “Digital Transformation Strategy for Africa (2020-2030).” In 2021, as part of collective efforts to advance Agenda 2063, the African Union High Level Panel on Emerging Technologies published the report AI for Africa, which set in place guiding principles for the advancement of AI policy across Africa, including the principle of diversity-aware AI innovation, the principle of society-centered management of AI, and the principle of AI for the good of all (African Union Development Agency-New Partnership for Africa's Development, 2021).

Over the past few years, countries in North Africa such as Egypt, Mauritius, Tunisia, Morocco, Algeria, and Sudan have correspondingly developed national AI strategies or initiated policy processes that have emphasized the importance of the safe, ethical, equitable, and sustainable development and deployment of AI technologies (Mubangizi, 2022; Stahl et al., 2023). In Southern Africa, countries such as Zambia, Malawi, South Africa, Zimbabwe, and Namibia have also advanced AI policy formation, drawing on existing legislation “foundational for responsible AI” (United Nations Educational, Scientific and Cultural Organization [UNESCO], 2022a), and Ministers from this region have issued the “Windhoek Statement on Artificial Intelligence in Southern Africa” (2022), which affirmed their commitment to implement the UNESCO Recommendation on the Ethics of Artificial Intelligence (UNESCO, 2022b). The latter is an AI policy instrument adopted by UNESCO’s 193 member states in 2021 that lays out (1) ethical values intended to guide the direction of AI innovation (human rights and human dignity, living in peaceful, just, and interconnected societies, ensuring diversity and inclusiveness, and environmental and ecosystem flourishing), (2) practical principles needed for responsible innovation activity (do no harm, safety and security, privacy and data protection, multi-stakeholder collaboration, responsibility and accountability, transparency and explainability, human oversight and determination, sustainability, awareness and literacy, and fairness and nondiscrimination), and (3) actionable policy areas in which these values and principles can be operationalized. In Eastern Africa, Rwanda adopted a National Data Strategy and a comprehensive National AI Policy and Strategy, which have been noted for their decolonization efforts, including the promotion of the use of local data for AI algorithms, building institutional capacity, and fostering data sharing collaborations (Ayana et al., 2023).

Significant AI policy and governance activity also occurred in the run-up to the GenAI boom in Asia and Latin America. In 2019, the Chinese Ministry of Science and Technology published the “Governance Principles for a New Generation of Artificial Intelligence,” comprising eight guiding objectives: “harmony and friendliness,” “fairness and justice,” “inclusivity and sharing,” “respect privacy,” “secure/safe and controllable,” “shared responsibility,” “open collaboration,” and “agile governance.” Alongside these high-level normative goals, soft-law measures such as a complementary ethical code of conduct and a standard produced by China’s cybersecurity standards agency, “The Guidelines for Artificial Intelligence Ethical Security Risk Prevention,” were published in 2021 (Roberts, Cowls, et al., 2023). Two major AI laws and regulations also emerged side-by-side with these soft-law documents. The first, the “Provisions on the Management of Algorithmic Recommendations in Internet Information Services” (2021), addressed concerns about the use of algorithmic systems in online environments, workplace management, and price-setting and established an algorithm registry for AI systems that could shape public opinion (Sheehan, 2023). The second, the “Provisions on the Administration of Deep Synthesis Internet Information Services” (2022), addressed concerns about the growing presence of deepfakes and other AI-generated content and included labeling requirements for synthetically generated content that could mislead or confuse the public.

In Latin America, country-specific AI policy initiatives from 2018 to 2022 likewise established the importance of ensuring ethical, responsible, and equitable AI innovation. Over this time, according to analysis carried out by the Organisation for Economic Co-operation and Development’s (OECD) Observatory of Public Sector Innovation (2022), Colombia, Chile, Mexico, Uruguay, Argentina, Brazil, Costa Rica, and Peru all demonstrated commitment to establishing robust ethics and governance frameworks for AI and declared adherence to the OECD AI Principles (OECD Artificial Intelligence Policy Observatory, 2024), which include inclusive growth, sustainable development and well-being, human-centered values and fairness, transparency and explainability, robustness, security and safety, and accountability. Values-led national AI strategies were published by Mexico in 2018; Uruguay, Argentina, and Colombia in 2019; and Chile, Brazil, and Peru in 2021. These broadly affirmed the importance of inclusive and sustainable growth, advancing education and other social goods, and reducing social inequalities (Urbanovics, 2023). Such national efforts have led to an emerging AI regulatory framework: at least eight countries in the region (Argentina, Brazil, Chile, Colombia, Costa Rica, Peru, Mexico, and Uruguay) are discussing proposals to regulate the design, development, and use of AI, alongside regional efforts to support responsible AI innovation (Access Now, 2024). Moreover, as part of the Ministerial and High Authorities Summit on Artificial Intelligence in Latin America and the Caribbean held in October 2023, country authorities signed the Santiago Declaration. This document confirmed a regional commitment to the ethical use of AI and established a working group for the creation of an Intergovernmental Council on AI for the region, in line with UNESCO’s recommendations and its Readiness Assessment Methodology and Ethical Impact Assessment.

3. The Reality of Future Shock in the AI Policy and Governance Ecosystem

Taken together, strengthening policy regimes (in areas like cybersecurity, data privacy and protection, digital trade, and online safety) and the coalescence of a broad sweep of global stakeholders around the key values and principles needed for responsible, equitable, and trustworthy AI development and use should have prepared the AI policy and governance community to respond forcefully, and coherently, to the myriad societal challenges posed by the abrupt arrival of the GenAI revolution. This, however, has not been the case. The AI policy and governance ecosystem has, in fact, undoubtedly suffered from future shock amid the meteoric rise of GenAI applications. ChatGPT’s triggering of an “age of competition” (Carugati, 2023) among large tech companies capable of brute-force scaling and swift marketization was met, in the ensuing months, with an anemic legal and regulatory response (effectively, a nonresponse) at both national and international levels (Wheeler, 2023). Meanwhile, as Leslie, Ashurst, et al. (2024) point out in this special issue, the exploding “use of these technologies coalesced with widespread public protestations about their questionable legality, their far-reaching violations of intellectual property, labor, and privacy rights, their capacity to spew disinformation and false, toxic, and discriminatory content, their environmental costs, and their potential negative impacts and transformative effects on society, more generally.” The vacuum left by this gaping hole in actionable technology policy and regulation, even in the midst of strong public backlash, set off an “international AI governance crisis” (Leslie, Ashurst et al., 2024). Such a crisis was, to be sure, a crisis of future shock.

Before exploring the nature of this governance crisis and its consequences, it would be helpful to touch on several of the reasons that lie behind it.

3.1. Enforcement Gaps and Lack of Regulatory AI Capacity

First, significant gaps have arisen over the past several years in the enforcement of existing digital- and data-related legal and regulatory regimes. These gaps—combined with deficits in the capacity of regulators to develop the skills and know-how needed to competently confront the novel governance challenges presented by the rapid deployment of large-scale AI technologies—have created conditions for regulatory inaction and ineptitude. Disparities between legal protections related to digital and data rights and prevalent patterns of unimpeded bad behavior that transgress such protections can be observed, for instance, in both data protection and cybersecurity law (Kohnke et al., 2021; Lynskey, 2023). A recent review of empirical studies undertaken into the real-world impact of data protection law in Europe and California reveals that, out of the 26 studies identified, none of them “found meaningful [on the ground] legal compliance” (Lancieri, 2022, p. 16). In accounting for this, the review points to the way that “modern data protection laws largely fail to anticipate how exceptionally large information asymmetries […] between companies and consumers/regulators, and high levels of market power in many data markets […] undercut legal compliance in the shadows of the law” (pp. 16–18). Other researchers have pointed out, along similar lines, that a widespread lack of resources, skills, and capabilities among regulatory bodies charged with implementing digital- and data-related law (as well as with applying relevant existing laws and regulation to new and emerging digital technologies) has hampered enforcement efforts, allowing well-resourced corporate actors to outrun regulatory action and creating substantial fissures in the assurance of corresponding legal protections (Ada Lovelace Institute, 2023; Aitken et al., 2022; Kohnke et al., 2021).

3.2. Difficulties Faced in the Move From Principles to Practice

Another source of the AI governance crisis has been the set of challenges faced by the AI policy and governance community in translating the values and principles that have surfaced in the field of AI ethics into practicable and binding standards, governance mechanisms, laws, and regulations. In the mid-to-late 2010s—amid the tech lash sparked by the outbreak of widely reported scandals such as the Cambridge Analytica/Facebook data breach (Cadwalladr & Graham-Harrison, 2018), Clearview AI’s illegal image scraping affair (Hill, 2021), and ProPublica’s exposure of racial bias in the COMPAS recidivism risk prediction model (Angwin, Larson, Mattu, & Kirchner, 2016)—a large body of voluntary AI ethics frameworks, codes of conduct, and good practice guidance was produced by industry actors, policymakers, academic researchers, and civil society advocates (Boddington, 2020; Fjeld et al., 2020; Jobin et al., 2019). From a policy development perspective, this intensive early activity laid important foundations for further progress in the advancement of enforceable governance mechanisms, because it yielded converging normative vocabularies and conceptual categories that could then be further developed into codified AI standards, laws, and regulations. However, this broader movement from principles to practice—namely, from abstract normative frameworks to concrete laws, standards, and regulations—has so far struggled to materialize at pace.

While the continuous battle of technology law, governance, and oversight to keep up with rapid scientific and technological change is nothing new,2 the issues that have confronted those attempting to move swiftly from principles to practice in AI policy and governance have proven exceptionally difficult to tackle. Such challenges have stemmed both from the unprecedentedly complex but unavoidable obstacles faced by those who have attempted to establish standards and codified rules for governing and regulating the sociotechnical dimensions of the production and use of AI systems and from the evasionary dynamics of ‘ethics washing,’ virtue signaling, and regulatory capture shaped by the underlying political-economic and geopolitical contexts of the global AI innovation ecosystem.

3.2.1. Challenges Around Standardization and Codification: Value Plurality and Democratic Deficits

The former cluster of challenges around standardization and regulatory codification arises from the way that problems with reaching practicable consensus on the meaning of key values and principles (e.g., trustworthiness, bias, fairness, and safety) have undermined their effective operationalization in concrete national and international standards, governance mechanisms, and regulatory measures. Amid the value plurality and cultural and perspectival diversity of contemporary social life, the establishment of fixed and universally accepted understandings of the meaning and significance of these normative concepts has proven justifiably fraught (Ess, 2006; Lassman, 2011; Madsen & Strong, 2009). The meaning of ideas like privacy, safety, and fairness as well as criteria used to assess the ethical impacts of AI (e.g., effects on individual autonomy, well-being, or social solidarity) can vary both within and across cultures and between and within different stakeholder groups (Cappelen et al., 2007; Leslie et al., 2022; Wright et al., 2021). This can make their standardization and integration into codified governance protocols precarious (Pouget, 2023). And yet, because AI technologies function as cognitive surrogates, standing in for human actions in the social world, they unavoidably trigger sociotechnical issues on which these normative ideas and criteria have direct bearing. This means that standards and regulations—which aim to adequately ensure the responsible and trustworthy production, procurement, use, and decommissioning of ever more general purpose AI systems across the almost limitless range of their possible uses—must directly broach such normative ideas and criteria regardless of their semantic plurality and variability (Fish, 2019).

This convergence of conditions of cultural and interpretive plurality with the necessity of confronting the socio-technical aspects of AI development and use has made the ecosystem-level move from high-level ethical concepts and frameworks to concrete standards and regulatory requirements a very thorny affair. In the domain of standards development, for instance, growing demands, over the past several years, to confront the ethical aspects of AI development and use have triggered a ‘sociotechnical turn’ with mixed results. Whereas standards for (and the certification of) digital- and software-driven innovation had previously concentrated on the technical specifications necessary to assure the reliable performance of these sorts of technologies, new concerns about the social shaping and ethical consequences of the processes and products of AI innovation began to come to the forefront during this time, spurring standardization initiatives for trustworthy AI from standards development organizations (SDOs) like the Institute of Electrical and Electronics Engineers (IEEE), the International Organization for Standardization (ISO), and the International Electrotechnical Commission (IEC). In 2017, the IEEE published the second version of its Ethically Aligned Design (EAD) guidance (IEEE Global Initiative on Ethics of Autonomous and Intelligent Systems, 2017). The EAD guidance formed the knowledge basis for the launch of the IEEE’s P7000 series of voluntary standards and certifications (IEEE Standards Association, 2018), which cover areas ranging from algorithmic bias and AI nudging to the transparency of autonomous systems, personalized AI agents, and wellbeing metrics for ethical AI.
Around the same time as the EAD initiative, the ISO, in collaboration with the IEC, formed a working group on the Trustworthiness of Artificial Intelligence (ISO/IEC JTC 1 SC42/WG3), which was charged with thinking through how to construct standards for “characteristics of trustworthiness, such as accountability, bias, controllability, explainability, privacy, robustness, resilience, safety and security” (Price, 2019). So far, this working group has published an “Overview of ethical and societal concerns” (ISO & IEC, 2022), “Risk management: A practical guide” (ISO & United Nations Industrial Development Organization, 2021), and technical reports on AI testing, the robustness of neural networks, and “Bias in AI systems and AI aided decision making” (ISO & IEC, 2021).

Though this standards development activity has undoubtedly signaled forward progress in the broader socialization of the priorities of responsible and trustworthy AI among industry actors and policymakers, it has also instantiated the seemingly insuperable challenges of moving from principles to practice against a background of value plurality, semantic variability, and diverging stakeholder interests. After multiyear efforts to establish its standards offering in trustworthy AI, the ISO/IEC JTC 1 SC42/WG3 has, for the most part, been able to produce only strictly informational rather than prescriptive ‘soft law’ resources (i.e., descriptive guidelines, overviews, and technical reports rather than concrete and actionable international standards).3 For example, a half decade after the launch of its trustworthy AI initiative, ISO/IEC published its technical report, “Overview of ethical and societal concerns,” which—rather than pointing to substantial governance measures—stresses the ‘benefits’ to AI developers and implementers of “flexible input on ethical frameworks, AI principles, tools and methods for risk mitigation, evaluation of ethical factors, best practices for testing, impact assessment and ethics reviews” (ISO & IEC, 2022, Introduction). The report also explicitly highlights that it “is not intended to advocate for any specific set of values (value systems)” (Scope), remaining instead a reference resource for those interested in a ‘high-level’ understanding of the AI ethics landscape.

This unmistakable hedging in the ISO/IEC report, and other similar self-limiting outcomes in standards development on trustworthy AI,4 signal an implicit acknowledgment of the democratic deficit faced by SDOs, which have, in many cases, traditionally been industry-led, technically focused, procedurally opaque, and thus somewhat closed to meaningful societal input (Baron et al., 2019; Iversen et al., 2004; Marchant et al., 2020; Pouget, 2023). To be sure, because different stakeholders, who are involved with or affected by AI innovation practices, have different views on the meaning of essential normative values and principles (and different motivations in asserting their particular interpretations of these), the legitimating conditions for establishing justified consensus on the standardized meaning of such values and principles demand the inclusive democratic participation of impacted people and organizations—both in the development of socio-technical AI standards and within the governance processes that steer these innovation practices themselves. Consequently, where this kind of robust stakeholder-involving participation, social license, and public consent is absent (as in the processes behind ISO/IEC standardization), standards and codified governance measures are liable to democratic deficits and legitimacy gaps that undermine claims to the consensus-based codification of contested values (Hagemann, 2018). This is the case both at the level of the ‘input legitimacy’ of AI standards (i.e., the legitimacy of the processes behind the production and establishment of standards and governance measures) and at the level of their ‘output legitimacy’ (i.e., the legitimacy of established standards in their implementation within the governance of real-world AI innovation activities) (Werle & Iversen, 2006).
A significant barrier to the effective move from principles to practice in the recent history of AI standards—and a source of future shock in the broader AI policy and governance community throughout the unfolding of the GenAI revolution—has been the absence of these layers of democratic legitimation and governance in the development of standards for trustworthy AI as well as in their content and execution.

Be that as it may, several current AI policy and governance initiatives are making forward progress toward democratization and increased social license. As the European Union now seeks to establish enforceable regulatory regimes for the EU AI Act (2021), it has charged its two standards bodies, CEN (European Committee for Standardization) and CENELEC (European Committee for Electrotechnical Standardization), with drafting “European Standards […] to advance the technical harmonization in the field of trustworthy artificial intelligence and prepare the necessary technical ground for the implementation of the future AI Act” (European Commission, 2022, p. 2). The EU’s attempt to bridge statute and regulatory intervention with trustworthy AI standards promises to advance the legitimacy of standardization processes (i.e., input legitimacy) insofar as: (1) CEN-CENELEC is required, in keeping with EU Regulation 1025/2012, “to ensure that the role and the input of societal stakeholders in the development of standards are strengthened, through the reinforced support of organizations representing consumers and environmental and social interests” (European Parliament and Council, 2012, p. 316/14); and (2) CEN-CENELEC is supported by EU-funded stakeholder advocacy organizations and must engage in multi-stakeholder consultation to build public consent before the official publication of any standard (Pouget, 2023).
While this increased input legitimacy is developing specifically in the European regulatory context,5 wider scale efforts at democratizing international AI standardization processes can also be seen in programs such as the UK’s AI Standards Hub, a joint effort by the British Standards Institution, the National Physical Laboratory, and the Alan Turing Institute, which aims to build coordination and coherence at the international level through “increasing multi-stakeholder involvement in AI standards development.” In its mission statement, the AI Standards Hub also emphasizes its focus on “stakeholder inclusiveness […], giving special consideration to stakeholder segments that are traditionally underrepresented in standards development processes (including civil society organisations and SMEs)” (Alan Turing Institute, n.d.).

3.2.2. Evasionary Dynamics of ‘Ethics Washing,’ Virtue Signaling, and Regulatory Capture

The democratic deficits and legitimacy gaps, which have created headwinds in efforts to translate AI ethics values and principles into actionable standards, are directly related to another cluster of challenges faced by the AI policy and governance community in closing the gap between principles and practice. Industry actors are in a dominant position in the standards development community due, in no small measure, to their ability to invest disproportionate resources in shaping standards development processes and to take advantage of information asymmetries in technical knowledge emergent from their preponderant market power (Baron et al., 2019). This capacity of private sector firms to influence the production and steer the content of AI standards has tended to undermine their credibility with the public, generating suspicions about the propensity of tech companies to create self-serving soft law regimes that tailor governance requirements to their needs and that allow them to mark their own homework (Marchant, 2019; Marchant et al., 2020). Beyond raising questions about the ultimate credibility and effectiveness of voluntary AI standards regimes, concerns about industry-driven self-regulation and the capacity of self-interested private sector tech actors to shape the AI policy and governance ecosystem betoken broader trepidations about the tendency of nonstatutory and nonbinding approaches to AI governance to lead to double-standards, ‘ethics washing,’ ‘ethics shopping,’ and ‘virtue signaling’ (Boddington, 2020; Greene et al., 2019; Hagendorff, 2020; Metcalf et al., 2019; Mittelstadt, 2019; Munn, 2023; Nemitz, 2018; Rességuier & Rodrigues, 2020; Wagner, 2018). 
Seen through this wider critical lens, the prioritization and widespread adoption of principles-centered, discretionary AI governance mechanisms—from voluntary AI ethics frameworks, standards, and codes of conduct to good practice guidance—leads to a strategic avoidance of enforceable laws and practicable regulations among market-driven tech firms, which press evasionary tactics into the service of hindering the move from abstract principles to compulsory governance requirements.

Those who have critically analyzed the weaknesses of industry-driven self-regulation have pinpointed several political economic and geopolitical factors that help explain these perils of ethics washing and regulatory obstructionism. First of all, because voluntary AI standards and self-imposed AI ethics frameworks lack clear accountability mechanisms and means of enforcement (Hagendorff, 2020), the corporate players who adopt them are liable to act in accordance with more powerful market incentives when deciding on key ethical questions about the permissibility of their actions—even in instances where they explicitly signal their commitment to high-level ethics principles or their membership in trade organizations that hoist the colors of self-enforced codes of conduct (Leslie & Shaw, 2024; Yeung et al., 2020). Here, the semantic openness and plurality of abstract ethical principles offer opportunities for corporate actors to manipulate the meaning of these principles in accordance with their own private interests and goals—often directing attention away from salient but difficult ethical dilemmas (e.g., how to address the underlying material inequalities and ecosystem-level patterns of historical injustice that are often exacerbated by tech industry behavior) by narrowing the meaning of corresponding ethical concepts (e.g., framing the meaning of algorithmic fairness and justice in terms of the mathematical distribution of model error-rates or outputs and formal fairness metrics). 
Moreover, given how an unprincipled interplay of self-interest and self-governance can drive strategic behavior, these tech players are susceptible to selectively curating and deploying their AI governance approaches in ways that either omit inconvenient or overly confining normative constraints (i.e., ‘ethics shopping’) or limit the scope and efficacy of these governance approaches in accordance with overriding financial objectives (Floridi, 2019; Greene et al., 2019; van Maanen, 2022; Rességuier & Rodrigues, 2020). In this way, voluntary AI ethics frameworks can be used performatively to provide rhetorical cover for potentially unethical or societally detrimental activities at the same time as they enable the delay, dodging, or deterrence of initiatives to develop statutes and regulations that would fill the legal and enforcement gaps that allow for ethics washing to occur in the first place (Leslie & Shaw, 2024; Nemitz, 2018; Taylor & Dencik, 2020). This sort of duplicitous virtue signaling can create a façade of behavioral propriety that then “justifies deregulation, self-regulation or market driven governance” (Bietti, 2020, p. 210) and that “pre-emptively shapes [ethical debates about the risks of AI] around abstract problems and technical solutions” (Mittelstadt, 2019, p. 1) rather than around the concrete social and ethical issues that are of the greatest public import and utility.

Despite the evident divergence of such evasionary tactics from the public interest, the impetus to corporate self-regulation is not necessarily at cross-purposes with the goals, motivations, and actions of governmental bodies and regulators. In a significant sense, the political and geopolitical dynamics that shape the AI policy and governance ecosystem can foster deregulatory and light-touch approaches to codifying AI standards, rules, and controls among policymakers and regulators concerned about ‘stifling innovation’ and cramping economic productivity and growth. As Regine Paul (2022, p. 23) observes, these governmental actors,

cannot be cast as neutral agents who receive and filter business interests and balance them against societal concerns over ethical AI. Neither are they “just” the victim of regulatory capture of big business without agency of their own. Rather, regulators are the pro-active producers of the conditions under which local, national, and regional AI innovation can be globally competitive. The competition state lens provides a focus on states’ own agendas in promoting competitiveness for the national economy – including vision of its own economic and technological future.

For their part, tech industry actors have tended to act symbiotically with these political and geopolitical interests, actively cultivating public-facing corporate identities built around imaginaries of social responsibility, public service, and ethical pacesetting to gain more seamless access to governments and policymakers (Magalhães & Couldry, 2020; Veale et al., 2023; Werner, 2015). These companies have also leveraged their epistemic dominance in resource-intensive areas of AI innovation to set the parameters of discussions around the scope and content of technical standards and related governance mechanisms (Veale, 2020). And, they have used their veritable monopoly over technological expertise and skills infrastructure as an entry key to penetrate underresourced regulators and public sector bodies in desperate need of technical know-how and knowledge transfer. The prevalence of this type of strategic corporate positioning has led some commentators to question whether the ‘de-regulatory’ and ‘pro-innovation’ approaches of governments like the United Kingdom have functioned, in fact, to steward regulatory capture by putting ill-prepared, underskilled, and poorly resourced regulators in a dependency relation with big tech actors, thereby setting the scene for the state-enabled impoverishment of regulatory protections and oversight (Roberts, Ziosi, et al., 2023).

3.3. Dynamics of Unprecedented Scaling and Centralization

Taken together, prevalent enforcement gaps in existing digital- and data-related laws, a lack of regulatory AI capacity, democratic deficits in the production of standards for trustworthy AI, and widespread evasionary tactics of ethics washing and state-enabled deregulation created an ecosystem-level chasm in AI policy that helped to trigger an international AI governance crisis in the wake of GenAI’s explosive rise. These contributing factors, however, were not the sole determinants of the reality of future shock that spurred the unfolding of the AI governance crisis. While such factors demonstrate the absence of various vital aspects of AI policy and governance capability and execution (and thus the absence of key preconditions for readiness and resilience in managing technological transformation), the emergence of FMs and ‘frontier AI’ systems also marked the presence of several new factors that significantly contributed to future shock. Chief among these were the closely connected dynamics of unprecedented scaling and centralization that emerged as both drivers and by-products of what has been called the Large-Scale Era of AI and ML, which ushered in the GenAI revolution (Sevilla et al., 2022).

In the context of scaling, a tremendous increase in the magnitude of compute capacity, training data set size, and model complexity led to qualitative advances in the capabilities of industry-produced FMs. This occurred alongside an explosion of commercialization possibilities that derived from the corresponding step change in the utility of downstream applications of these models. This combination of the radical scaling of compute, data, and model size with the rapid industrial scaling of derivative technologies, in turn, yielded a new order and scale of risks presented by their hasty proliferation and potential misuse or abuse. It likewise yielded a new scale of regulatory and governance challenges emergent from the unavoidable need to cope with such an expanding risk profile.

With respect to centralization, the steering and momentum of these scaling dynamics lay largely in the hands of a few large tech corporations, which essentially controlled the data, compute, and skills and knowledge infrastructures required to develop FMs and frontier AI systems. This meant that a small number of (largely Global North based) big tech firms had a disproportionate influence on the direction and pace of the GenAI revolution. More consequentially, it meant that, if left unchecked, such a concentration of technoscientific and market power could lead to further power centralization and economic consolidation as the benefits of GenAI industrialization accrued in accordance with the narrow set of market-oriented corporate values that broadly drove industry behavior (e.g., optimizing market share, revenue, profitability, efficiency, and shareholder value). Moreover, such an impetus to industry power consolidation meant the intensification of corresponding dynamics of geopolitical power centralization among the big tech–hosting nation-states of the Global North. These latter countries were increasingly locked in a competitive technological race with each other (among perceived allies and adversaries alike) and were thus motivated to accommodate the interests and will of predominant home-grown private sector AI tech players. Such geopolitical dynamics promised to exacerbate and further entrench longer term patterns of global inequality and inequity as the risks, opportunities, and benefits of GenAI commercialization were distributed among the international ‘haves’ and ‘have-nots’ of essential AI-enabling digital resources and infrastructure. Both aspects of political-economic and geopolitical power centralization were, in fact, reflected in the architectonics of influence and authority that shaped the agenda and scope of the first-wave policy response to the international AI governance crisis triggered by GenAI.

On top of these political-economic and geopolitical dynamics of power centralization, the Large-Scale Era of AI/ML technologies also spurred precarious dynamics of operational consolidation and epistemic centralization alongside associated risks and harms. Operational consolidation involved the merging of multiple functions in individual FMs that could then be adapted to a wide variety of downstream applications. The onset of this type of consolidation meant that single points of failure in base model functioning could lead to outsized and system-level adverse impacts (Bommasani et al., 2021, 2022). With so few industrial-grade frontier AI models in commercial use, failure modes in any one of these systems—for instance, data poisoning attacks or data leakage—could have adverse population-level consequences. Epistemic centralization involved the consolidation of computational knowledge production in FMs whose training data were disproportionately composed of the dominant languages and cultural representations of a privileged global minority. This biasing dynamic risked reinforcing legacies of cultural hegemony and exposing historically marginalized sociocultural groups to epistemic injustice. Whereas FMs/large language models (LLMs) have often been framed by their producers as generating outputs that reflect ever more ‘general intelligence’ and that thus possess universal epistemic purport, these outputs more accurately reflect the prevailing cultural values, perspectives, and beliefs of the dominant, digitally recorded societies whose activities have been captured by the particular data sets on which such systems have been trained. This dynamic of imbalanced data representation in pretraining corpora has led to the exclusion of the diverse knowledges, beliefs, and perspectives of those sociocultural communities and groups that have historically been on the margins of digitalization and datafication (Bender et al., 2021; Weidinger et al., 2021).
This kind of epistemic injustice implicates claims to AI ‘general intelligence’ as claims to sociocultural hegemony that derive from the FM/LLM-enabled impetus to epistemic centralization.

In advance of fleshing out how these dynamics of unprecedented scaling and centralization contoured the international AI policy response to the new order and scale of risks and governance challenges that stemmed from the GenAI revolution, it would be useful to explore the emergence and character of these risks and governance challenges in more detail.

3.4. From Unprecedented Scaling and Centralization to a New Order and Scale of Risks

The Large-Scale Era of AI/ML technologies emerged in the mid-to-late 2010s with the radical leap in compute capacity, training data set size, and number of model parameters that accompanied the birth of a new generation of industry-produced FMs encompassing an ever-improving set of LLMs and content-generating diffusion and multimodal AI models (Bommasani et al., 2023; L. Yang, Zhang, et al., 2023). The advent of these models was ushered in by the convergence of: (1) improvements in computing hardware that allowed for parallel processing, faster matrix multiplication, and thus ultimately dramatically increased compute load; (2) the leveraging of transformer model architectures that could exploit parallel processing and nonsequential computation to train much larger models with greater inferential breadth, depth, and expressiveness; and (3) the utilization of self-supervised learning techniques that removed the need for manually labeled data sets and thereby enabled model training to include boundless and unfathomable volumes of unannotated, internet-scale data (Bommasani et al., 2021; Kaddour et al., 2023; McIntosh et al., 2023; Triguero et al., 2024; Zou et al., 2023).
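The self-supervised learning techniques mentioned in point (3) can be illustrated with a toy sketch: in next-token prediction, every (context, target) training pair is derived mechanically from raw text itself, with no human annotation, which is what allows training corpora to scale to internet size. The whitespace tokenization below is a deliberate simplification for illustration; real pipelines use subword tokenizers.

```python
# Toy sketch of self-supervised next-token pair construction.
# No labels are needed: the text supplies its own supervision signal.

def next_token_pairs(text):
    """Derive (context, target) training pairs from raw text alone."""
    tokens = text.split()  # naive whitespace tokenization, for illustration only
    pairs = []
    for i in range(1, len(tokens)):
        context = tokens[:i]  # everything seen so far
        target = tokens[i]    # the token the model must learn to predict
        pairs.append((context, target))
    return pairs

pairs = next_token_pairs("the model predicts the next token")
# e.g., (['the', 'model'], 'predicts') is one derived training pair
```

The same mechanical derivation applied to web-scale corpora is what decouples data volume from human labeling effort, a precondition of the scaling described in this section.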

The drastic scaling of compute capacity, data ingestion, and model complexity led to “qualitative leaps” in FM/LLM capability (Shanahan, 2024). This included the emergence of novel capacities for zero-shot, transfer, and in-context learning (i.e., the ability of FMs to generalize behavior across divergent application contexts on the basis of latent model parameters or a few example-based inputs) (Brown et al., 2020; Dong et al., 2022; Lu et al., 2023; Xie & Min, 2022). Model scaling also led to significant aggregate performance gains in accordance with ‘scaling laws’ (Kaplan et al., 2020; Hoffmann et al., 2022).
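The ‘scaling laws’ cited here can be stated compactly. Kaplan et al. (2020) report that test loss falls as a power law in (non-embedding) parameter count, and Hoffmann et al. (2022) jointly model loss in parameters and training tokens; the exponents below are their reported empirical fits, quoted approximately:

```latex
% Kaplan et al. (2020): loss as a power law in parameter count N
L(N) \approx \left(\frac{N_c}{N}\right)^{\alpha_N}, \qquad \alpha_N \approx 0.076

% Hoffmann et al. (2022), the "Chinchilla" fit, in parameters N and training tokens D
L(N, D) = E + \frac{A}{N^{\alpha}} + \frac{B}{D^{\beta}}, \qquad \alpha \approx 0.34,\ \beta \approx 0.28
```

The practical upshot, reflected in the industry behavior this section describes, is that predictable aggregate performance gains could be bought by scaling parameters, data, and compute together.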

In this new era of large-scale AI systems, a small class of FMs/LLMs, or pretrained base models, could be converted into diverse task-specific applications and adapted to carry out a wide range of downstream functions through both fine-tuning (i.e., further training on specialized data sets) and prompt-engineering (i.e., the crafting of input prompts to guide the model’s ‘reasoning’ functions, for example, by providing input-output exemplars, inducing ‘self-criticism,’ or assisting in multistep, ‘chain-of-thought’ reasoning).6 The effective conversion of these models into usable applications also required the introduction of novel computational techniques to transform them into task-oriented ‘agentic systems’ (Chan et al., 2023; S. Yang, Nachum, et al., 2023; L. Wang, Ma, et al., 2023; Xi et al., 2023). For instance, OpenAI, the producers of ChatGPT, utilized reinforcement learning from human feedback (RLHF) to ‘agentify’ their GPT-3.5 and GPT-4 models with conversational and human-interactive functionality (Bai et al., 2022; Christiano et al., 2017; Ouyang et al., 2022). Other examples of the application-level agentification of FMs include the creation of ‘agent architectures’ (with ‘memory,’ ‘planning,’ and ‘reflection’ components) to construct ‘generative agents’ for video games and immersive digital simulation environments (Lin et al., 2023; Park et al., 2023), the construction of ‘tool agents’ like Toolformer and ToolBench that can use system-external tools like calculators, search engines, maps, calendars, and machine translators to accomplish multiple tasks in connected digital environments (Qin et al., 2023; Schick et al., 2024; L. Wang, Ma, et al., 2023), and the creation of AI-enabled robotic systems that use the planning, self-explanation, and interaction functionalities of FMs to navigate ‘embodied environments’ (Mandi et al., 2023; Rana et al., 2023; Z. Wang, Cai, et al., 2023).
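The prompt-engineering approach described above, supplying input-output exemplars so that a frozen base model generalizes in-context, amounts at its simplest to structured prompt assembly. The sketch below is a generic illustration of few-shot prompt construction; the `Input:`/`Output:` format is an illustrative convention, not any particular vendor’s API, and the model call itself is omitted.

```python
# Minimal sketch of few-shot ("in-context learning") prompt construction.
# The base model's weights are untouched; task adaptation comes entirely
# from the exemplars placed in the prompt.

def build_few_shot_prompt(instruction, exemplars, query):
    """Assemble an instruction, worked exemplars, and a new query into one prompt."""
    lines = [instruction, ""]
    for inp, out in exemplars:
        lines.append(f"Input: {inp}")
        lines.append(f"Output: {out}")
        lines.append("")
    lines.append(f"Input: {query}")
    lines.append("Output:")  # the model completes from here
    return "\n".join(lines)

prompt = build_few_shot_prompt(
    "Classify the sentiment of each review as positive or negative.",
    [("A delightful read.", "positive"), ("Dull and overlong.", "negative")],
    "Surprisingly moving.",
)
```

Fine-tuning, by contrast, changes the model’s parameters themselves; the contrast between the two adaptation routes is part of what made a single pretrained base model convertible into many downstream applications.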

The multipurpose and increasingly multimodal character of this new class of large-scale AI systems, their increasingly agentic character, and their emergent capability for transfer learning and linguistic functionality across application domains, heralded a step change in their utility, commercializability, and uptake. With the launch of ChatGPT in late 2022, the era of large-scale AI systems metamorphosed into the era of AI’s industrial revolution. Large firms hastened to integrate GenAI applications into their core products, environments, and services at an eye-watering pace and scale. Multinational corporations like Google, Microsoft, Baidu, Meta, Adobe, Cisco, IBM, and Slack rushed to embed GenAI systems in their flagship productivity environments and software, while Google and Microsoft also incorporated these systems into their global search engine services (Competition and Markets Authority, 2023; Kshetri et al., 2023; Smith-Goodson, 2023). Meanwhile, GenAI systems like GPT-4, Gemini, Midjourney, DALL-E 2, and Stable Diffusion had almost instantaneous ecosystem-level impacts across content creation–focused sectors and fields of creative work—with broadly recognized effects in the marketing and advertising industry, the journalism and media sector, the publishing industry, the gaming industry, the product- and graphic design industry, software development services, the music industry, and photography and film production (Amankwah-Amoah et al., 2024; Benbya et al., 2024; Fui-Hoon Nah et al., 2023; Hui et al., 2023; Jiang et al., 2023; Kshetri et al., 2023; J. Lee, Eom, & Lee, 2023; Pavlik, 2023).

The coupling of this fleet-footed industrial adoption with the scaling of compute, data, and model size introduced two primary drivers of a new order and scale of AI-related risks:

  1. Model scaling: The scaling of data, model size, and compute, which brought with it the unfathomability of training data, model opacity and complexity, emergent model capabilities, and exponentially expanding compute costs;

  2. Industrial scaling: The rapid industrialization of FMs and GenAI systems meant the onset of a new scale of systemic-, societal-, and biospheric-level risks that spanned the social, political, economic, cultural, and natural ecosystems in which these systems were embedded. This ushered in a new age of widespread exposure in which increasing numbers of impacted people and communities at large were made susceptible to the risks and harms issuing from model scaling and to new possibilities for misuse, abuse, and cascading system-level effects.

We will flesh out these two drivers of risk in turn.

3.4.1. Risks From Model Scaling: Unfathomable Data

The scaling of training data sets has been a precondition of the accelerating evolution of multipurpose FMs. With the discovery that emergent qualities of model adaptability and transfer learning depended largely on the volume of training data fed into FMs/LLMs, the attitudes of ‘the more data the better’ and ‘scale is all you need’ quickly came to prevail as rules of thumb among AI developers (Birhane et al., 2024; Bommasani et al., 2022; Kaplan et al., 2020).7 But this voracity for volume came at a cost. Unprecedented data scaling required the collection of web-scale data at a volume that far outstripped the capacity of human project teams to manually check data quality and source integrity, let alone scrutinize sampling deficiencies and other problematic aspects that arose in data creation. The embrace of this ‘effort to scale’ at the expense of the ‘attention to care’ (Seaver, 2021) necessary to safeguard responsible data stewardship has given rise to a broad range of model-intrinsic risks related to the unfathomability of training data sets (Bender et al., 2021; Kaddour et al., 2023).

First off, the inclusion of massive and uncurated web-scraped data sets in the pretraining corpora of FMs has led to widespread risks of data poisoning, memorization, and leakage. The scaled and indiscriminate extraction of data from the internet has exponentially broadened the attack surface for the injection of adversarial noise into training data sets (Bommasani et al., 2021; Carlini, Jagielski, et al., 2023; Casper et al., 2024). This kind of data poisoning can corrupt the parameters of trained FMs, introducing unreliable, poor, and harmful performance for targeted inputs. Concerningly, it has been shown that these sorts of web-scale poisoning attacks can be launched inexpensively and with relative ease, making them “practical and realistic even for a low-resourced attacker” (Carlini, Ippolito, et al., 2023, p. 2). Additionally, it has been demonstrated that the presence of personally identifiable information (PII) (e.g., email addresses and phone numbers) and sensitive documents (e.g., personal medical records) in massive pretraining corpora can yield privacy leaks during model prompting and data extraction attacks (Carlini et al., 2021; Kaddour et al., 2023; Lukas et al., 2023; Mozes et al., 2023). Carlini, Ippolito, et al. (2023) have found that these hazards of trained models emitting sensitive memorized data grow significantly as model capacity increases, concluding that, “memorization in [FMs/LLMs] is more prevalent than previously believed and will likely get worse as models [sic] continues to scale, at least without active mitigations” (p. 1).
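
To make the mechanics of the PII problem concrete, the toy sketch below runs a crude regex scan for email addresses and phone numbers over a miniature ‘scraped’ corpus. The patterns and documents are simplified assumptions; production curation pipelines (and, on the adversarial side, data extraction attacks) are far more sophisticated.

```python
import re

# Minimal sketch: regex-based scan for two PII categories (email
# addresses and simple phone numbers) over a toy 'scraped' corpus.
# These patterns are deliberately simplified assumptions; real PII
# detectors cover many more categories and formats.

EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+")
PHONE = re.compile(r"\b\d{3}[-.\s]\d{3}[-.\s]\d{4}\b")

def scan_for_pii(documents):
    """Return, keyed by document index, any email/phone strings found."""
    findings = {}
    for i, doc in enumerate(documents):
        hits = EMAIL.findall(doc) + PHONE.findall(doc)
        if hits:
            findings[i] = hits
    return findings

corpus = [
    "Contact the author at jane.doe@example.org for the dataset.",
    "The committee met on Tuesday to discuss the budget.",
    "Call 555-867-5309 and ask for billing support.",
]
findings = scan_for_pii(corpus)
print(findings)
```

The point of the sketch is scale: at web scale, no human team can review what such scans miss, which is precisely why memorized PII can later be emitted by trained models.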

Closely related to such privacy risks is another set of concerns about how large-scale training data collected through brute-force web-scraping and indiscriminate data extraction can lead to violations of data protection rights and infringements of intellectual property and copyright protections. In both areas of law, data subjects and content creators possess rights that must primarily be safeguarded at the decision points of data collection and processing. In jurisdictions where data protection laws are in place, parties that are collecting and processing personally identifiable information must establish the legal basis for this activity. According to the EU’s General Data Protection Regulation (GDPR), this can occur either by gaining the consent of affected data subjects or by establishing a legitimate interest for using their data that is consonant with data protection principles such as fairness, transparency, and purpose limitation (Information Commissioner’s Office [ICO], n.d.). However, as Garrido (2024) argues in this Policy Forum, the establishment of a legal basis for secondary data use in FMs and GenAI technologies requires considerations of “the context of the original processing and the subsequent purposes of use.” Properly taking these dimensions of ‘contextual integrity’ and ‘compatible purpose’ into account involves establishing that the data processing involved in training the FM or GenAI system “respects the relevance or expectation of use of the original information being shared, and the original purpose or flow of information.” The problem arising here is that the unfathomability of the data sets used to train FMs or GenAI systems categorically excludes the viability of such careful and case-specific contextual considerations and thus impedes clear justification of purpose limitation and legitimate interest.
Combined with the generalized failure to attain consent from those whose personally identifiable digital trace data have been scraped from the internet, this inability to establish a legitimate interest consistent with data protection principles calls into question the very lawfulness of the processing of personal data in the training of the FMs or GenAI systems. This has prompted Italy’s data protection regulator, Garante, for example, to assert that OpenAI, in its development and deployment of ChatGPT, has no legal basis to justify “the mass collection and storage of personal data for the purpose of 'training' the algorithms underlying the operation of the platform” (McCallum, 2023; see also Rahman-Jones, 2024).

The unfathomability of FM/GenAI training data sets has, in a similar way, given rise to potential copyright infringements and violations of intellectual property law. Again, the impracticability of diligent human data curation and stewardship has led to a general failure both to obtain consent from copyright holders and to establish a legal basis for the legitimate use of copyrighted material (for instance, through data licensing regimes). Combined with the ability of GenAI systems to memorize and then replicate elements of this material that are embedded in their training data, such a failure to establish lawfulness has precipitated risks of outright ‘digital forgery’ (Somepalli et al., 2023) and AI-enabled content piracy or theft (Bird et al., 2023; Piskopani et al., 2023; Sobel, 2024). These risks of potential copyright violations have become an area of fierce debate amid the rapid commercialization of GenAI systems. Some defenders of the ‘fair use’ doctrine in U.S. intellectual property law claim that the unhampered appropriation of copyright-protected data in the training of FM/GenAI models is legally justified insofar as the AI technologies are ‘transforming’ text, audio, image, and video inputs into novel outputs that do not simply reproduce in-copyright content (Oremus & Izadi, 2024; Samuelson, 2023). By contrast, others point out that the fair use principle has thus far been applied by courts primarily in noncommercial contexts where the use of copyrighted material is serving the public benefit rather than the ends of profit—leaving open a space for creative workers and media companies to contest the nonconsensual arrogation of their content by private sector tech corporations (Lemley & Casey, 2020).
From this perspective, even GenAI outputs that are similar to original artworks, articles, or writing contained in training data sets (e.g., synthetic content generated ‘in the style’ of a particular artist or author) do not fall under fair use where these outputs detrimentally impact the economic markets on which the livelihoods of original content creators depend, harming their incomes, financial well-being, and prospects for professional sustainability (Henderson et al., 2023; Lucchi, 2023). Currently, a slew of legal actions has been set in motion by authors, artists, and media firms like The New York Times (Stempel, 2023) and Thomson Reuters (Reuters, 2023), who are making these kinds of claims and seeking a range of options for legal relief from financial compensation for intellectual property infringement to the outright destruction of GenAI systems that have run afoul of copyright law.

Beyond risks of violating privacy, data protection, and intellectual property rights, the unfathomability of FM/GenAI training data sets has also given rise to risks of serious psychological, allocational, and identity-based harms that derive both from discriminatory and toxic content embedded in web-scraped data and from demographically skewed data sets that lead to disparate model performance (Bird et al., 2023; Birhane et al., 2023; H. Chen et al., 2024; Shelby et al., 2023; Solaiman et al., 2023; Weidinger et al., 2022). It is by now well-established that the large-scale, opaque data sets, which have become the norm of the “scrape-first-ask-later data creation and curation culture” (Birhane et al., 2024, p. 2) behind commercial-grade FMs, contain ingrained patterns of bias and discrimination (Abid et al., 2021; Birhane & Prabhu, 2021; Bender et al., 2021; Hutchinson et al., 2020; Jin et al., 2023; Kirk et al., 2021; Omiye et al., 2023; Tao et al., 2023), exclusionary norms (Weidinger et al., 2022), toxic and abusive language (Dinan et al., 2021; Gehman et al., 2020; Nozza et al., 2022), microaggressions (Breitfeller et al., 2019), and stereotyping (Barlas et al., 2021; Bianchi et al., 2023; Kotek et al., 2023; Ma et al., 2023). Moreover, it is widely acknowledged that the data used to train FMs reflect inequities of internet access, broader inequalities that manifest in local, regional, and global digital and data divides, geographic biases in data collection (e.g., oversampling in the ‘Global North’), and overrepresentation of dominant languages (e.g., English), hegemonic cultural values (e.g., U.S. views), and gender and age groups (e.g., males and younger people) (Bender et al., 2021; Cao et al., 2023; Johnson et al., 2022; Parashar et al., 2024; Solaiman et al., 2023; Tao et al., 2023). 
These data consequently harbor representational imbalances that crystallize in FM/GenAI training data sets and that lead to the underrepresentation, invisibility, or erasure of historically marginalized and minoritized communities. Such imbalances can lead, in turn, to disproportionately poor model performance and deficient quality-of-service for these historically marginalized and minoritized demographic groups (Bender et al., 2021; Dev et al., 2021; Nekoto et al., 2020; Solaiman et al., 2023; Talat et al., 2022).

These risks of discriminatory and psychological harm have been exacerbated by the additional negative impacts that can result from unfair biases arising in technical mitigation measures. Dodge et al. (2021) show, for instance, that blocklist filters composed of banned words (which are used to clean web-crawled data sets) “disproportionately [remove] documents in dialects of English associated with minority identities (e.g., text in African American English, text discussing LGBTQ+ identities)” (p. 2). Bender et al. (2021) similarly point out that blocklists that contain generic words associated with obscenity and pornography can attenuate the presence and “influence of online spaces built by and for LGBTQ people” (p. 614). Other researchers have demonstrated that annotator biases surfacing in the construction of the data sets used to train classifiers for the detection of toxic language and hate speech in FM/GenAI data sets can engender the embedding of racist attitudes, beliefs, and stereotypes that give rise to systematic prediction errors in these classifiers (Davani et al., 2023; Hartvigsen et al., 2022; Sap et al., 2021). In their analysis of the perpetuation of stereotypes by text-to-image GenAI systems, Bianchi et al. (2023) show that—even where systems like DALL-E contain supposed ‘guardrails’ set up to prevent the production of harmful content—noxious stereotyping behavior not only endures unmitigated but amplifies existing societal biases (p. 3).
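
The bluntness of the blocklist filtering that Dodge et al. (2021) document can be illustrated with a toy sketch. The banned-word list, documents, and tokenization below are illustrative assumptions; the point is only that a single flagged term removes an entire document, regardless of whether the usage is harmful, analytical, or community-affirming.

```python
# Minimal sketch of blocklist-based corpus filtering. The blocklist and
# documents are toy assumptions, not drawn from any real pipeline.

BLOCKLIST = {"explicit", "obscene"}  # stand-in for real banned-word lists

def filter_corpus(documents, blocklist=BLOCKLIST):
    """Split documents into (kept, removed) on exact token matches."""
    kept, removed = [], []
    for doc in documents:
        tokens = {t.strip(".,").lower() for t in doc.split()}
        (removed if tokens & blocklist else kept).append(doc)
    return kept, removed

docs = [
    "The review called the lyrics explicit but historically important.",
    "Community health guidance for LGBTQ+ youth.",
    "Nothing obscene here, says the critic, just satire.",
]
kept, removed = filter_corpus(docs)
print(len(kept), len(removed))
```

Note that the first and third documents are removed even though both discuss, rather than contain, objectionable content; this context-blindness is the core of the critique.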

3.4.2. Risks From Model Scaling: Model Opacity and Complexity

While the aggressive scaling of data has given rise to model-intrinsic hazards that derive from the unwieldiness of pretraining corpora, the accompanying explosion of model size has occasioned a range of unprecedented risks and governance challenges related to model opacity and complexity. The seemingly impenetrable architectural complexity of these ultra-high-dimensional AI systems, coupled with the complex character of their emergent linguistic and cognitive behaviors, has rendered conventional AI explainability techniques largely unsuitable and ineffectual. This has yielded an urgent interpretability predicament in which the industrial development and application of black-box GenAI systems have quickly pressed ahead without actionable or viable solutions to their immediate interpretability deficits.

Such an interpretability predicament intensifies longer term challenges surrounding AI explainability that have plagued the field of deep learning since its inception (Adadi & Berrada, 2018; Benítez et al., 1997; Doshi-Velez & Kim, 2017; Gilpin et al., 2018; Jeyakumar et al., 2020; Mittelstadt et al., 2019; Räuker et al., 2023). The desiderata of AI interpretability for advanced AI systems (and especially FMs and GenAI systems) are broadly agreed. Ensuring sufficient interpretability can help AI research scientists and developers to debug the models they are building and to uncover otherwise hidden or unforeseeable failure modes, thereby improving downstream model functioning and performance (Bastings et al., 2022; Luo & Specia, 2024; Zhao, Chen, et al., 2024). It can also help detect and mitigate discriminatory biases that may be buried within model architectures (Alikhademi et al., 2021; Zhao, Chen, et al., 2024; Zhou et al., 2020). Furnishing understandable and accessible explanations of the rationale behind system outputs can likewise help to establish the lawfulness of AI systems (e.g., their compliance with data protection law and equality law) (Chuang et al., 2024; ICO/Turing, 2020) as well as to ensure responsible and trustworthy implementation by system deployers, who are better equipped to grasp system capabilities, limitations, and flaws and to integrate system outputs into their own reasoning, judgment, and experience (ICO/Turing, 2020; Leslie, Rincón, et al., 2024). The provision of nontechnical, plain-language AI explanations also helps both to establish justified trust among impacted people and to ensure paths to actionable recourse for them when things go wrong (Ferrario & Loi, 2022; Liao et al., 2022; Luo & Specia, 2024). 
It follows from all this that a lack of adequate interpretability can lead to significant risks, ranging from unpredictable failure modes, reliability defects, and undetected discriminatory biases to irresponsible and harmful implementation, questionable legality, and the potential infringement of fundamental rights and freedoms such as respect for human dignity, autonomy, and due process.

While the field of explainable AI (often referred to simply as XAI) has made notable progress over the past several years in advancing knowledge about the behaviors and potential flaws of opaque AI systems (Angelov et al., 2021; Räuker et al., 2023; Zhao, Chen, et al., 2024), myriad critical voices have emphasized that applications of contemporary AI explainability methods to black-box AI systems are rife with shortcomings that continue to hamper their real-world utility. These critics have cautioned against ‘false hopes’ that current explainability techniques provide justified reassurance about the safety, accuracy, reliability, and fairness of black-box models, stressing that contemporary approaches often generate misleading or unfaithful explanations (Ghassemi et al., 2021, p. e746). It has been demonstrated, along these lines, that instance-specific ‘local explanations,’ which are based on feature attribution or counterfactual reasoning, can mislead users and foster a false sense of understanding. Gradient-based methods,8 for example, have been shown to lack robustness and stability, e.g., producing significantly different explanations for minute perturbations to inputs that generate the same predicted label (Adebayo et al., 2018; Agarwal et al., 2022; Alvarez-Melis & Jaakkola, 2018; Ghorbani et al., 2019; Kindermans et al., 2019). Similarly, perturbation-based explanation methods9 like LIME or SHAP have been shown to lack both fidelity to the underlying models they attempt to explain and reliability and robustness in adversarial settings (Mittelstadt et al., 2019; Rudin, 2019; Slack et al., 2020). Counterfactual methods have also been shown to generate unfaithful explanations that are disconnected from ground truth data, easily manipulated, and often unjustified (Laugel et al., 2019; Slack et al., 2021).
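
The mechanics of a perturbation-based local explanation, in the spirit of LIME, can be sketched minimally: sample points around an input, query the black box, and fit a local linear surrogate whose coefficients serve as feature attributions. The toy model, sampling scale, and sample count below are illustrative assumptions; the robustness critiques above concern how such attributions shift under small changes to inputs and sampling choices.

```python
import numpy as np

# Minimal sketch of a perturbation-based local explanation in the
# spirit of LIME. The 'black box' here is a hand-built toy stand-in,
# and the neighborhood scale and sample count are assumptions.

def black_box(X):
    # Toy black box: feature 0 dominates, feature 2 is irrelevant.
    return 3.0 * X[:, 0] + 0.5 * X[:, 1] ** 2

def local_attribution(x, n_samples=500, scale=0.1, seed=0):
    """Fit a linear surrogate to black-box outputs near x; return its
    coefficients as per-feature attributions."""
    rng = np.random.default_rng(seed)
    Xp = x + rng.normal(0.0, scale, size=(n_samples, x.size))
    y = black_box(Xp)
    A = np.hstack([Xp, np.ones((n_samples, 1))])  # add intercept column
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef[:-1]  # drop intercept

x = np.array([1.0, 1.0, 1.0])
attrib = local_attribution(x)
print(np.round(attrib, 2))
```

Because the surrogate is only locally fitted, re-running with a different seed, scale, or base point can reorder attributions on less well-behaved models, which is the instability the critics highlight.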

These application-directed concerns about inaccurate or misleading AI explanations have been accompanied by a set of broader trepidations about the way that an unquestioned reliance on or confidence in such methods can lead to an unreflective trust in shaky, deceptive, or unproven XAI approaches and, correspondingly, promote a misplaced deference to the authority of black-box systems (Rudin & Radin, 2019; Rudin et al., 2022). In this connection, Lakkaraju and Bastani (2020) challenge the assumption that post hoc AI explanations actually reflect the inner workings of opaque models, pointing out that (1) the predictions of even high-fidelity post hoc explanations reflect only correlations with the predictions of the original black-box model rather than actual determinative mechanisms within them, (2) such explanations “may fail to capture causal relationships between input features and black box predictions,” and (3) “there could be multiple high-fidelity explanations for the same black box that look qualitatively different,” undermining the validity of claims to correct explanation under conditions of predictive multiplicity (p. 79). Others have observed that deceptive or manipulated post hoc explanations can obscure or hide patterns of unfair bias and discrimination that are ingrained in opaque model architectures, leading to ‘fairwashing’ (i.e., the facilitation of a false impression that model outputs are fair when in actuality they are discriminating based on sensitive attributes) (Aïvodji et al., 2019, 2021; Alikhademi et al., 2021; Anders et al., 2020; Shahin Shamsabadi et al., 2022).

The new order and scale of complexity of FMs has only compounded these difficulties faced by conventional XAI techniques in attempting to reliably and accurately explain deep learning models. Though some interpretability researchers working on FMs/LLMs have endeavored to build on local, feature attribution–based methods (Kokalj et al., 2021; Sanyal & Ren, 2021; Sikdar et al., 2021; Singh et al., 2024), the scaling of model depth and complexity has largely rendered traditional XAI methods unfit, incapable, or even obsolete (Chuang et al., 2024; Wu et al., 2024; Zhao, Chen, et al., 2024; Zhao, Yang, et al., 2024; Zou et al., 2023). Above all, emergent model capabilities for carrying out higher order reasoning functions have made localized explanations of input salience much less meaningful and applicable. Explaining the cognitive and formal-linguistic behavior of FMs/LLMs demands a level of interpretive power and sophistication that is not captured by one-dimensional feature attribution approaches, simplified surrogate models, or similar local and global explainability techniques. Identifying and explicating the rationale behind FM/LLM outputs requires understanding the more intricate and expressive transformer architectures and latent representation spaces whence the cognitive functioning and behavior of FMs/LLMs spring. Additionally, the depth and size of these architectures10 have made it essentially impracticable to scale the more computationally intensive XAI methods (Zhao, Yang, et al., 2024).

In response to the unprecedented demand for approaches to interpretability that can access the mechanics and higher level cognitive functions of complex model architectures, researchers have started to develop novel techniques that strive to open the FM/LLM black box. The most prominent of these can be organized into three categories: top-down methods of representation engineering, bottom-up methods of mechanistic interpretability, and outside-in methods of prompt-based self-explanation and prediction decomposition (Singh et al., 2024; Zhao, Chen, et al., 2024). While each of these emerging techniques heralds prospects of increasing model transparency, they are all also fraught with significant and yet-to-be-addressed problems.

Representation engineering aims to tackle the challenge of FM/LLM interpretability by examining how latent representations of concepts manifest in patterns of activity across populations of neurons and produce cognitive behaviors (Zou et al., 2023). This technique views model-internal representations (i.e., higher level concepts and functions that have crystallized in the model’s latent representation space) as the primary units of analysis, seeking to gain an actionable and global understanding of targeted aspects of model behavior. It first tries to locate learned embeddings for concepts and functions (like ‘truthful,’ ‘dishonest,’ or toxic behavior) within the model’s latent space through various methods of stimulation, visualization, probing, and linear modeling (K. Li, Patel, et al., 2024; Marks & Tegmark, 2023; Morris et al., 2023; Zou et al., 2023). It then seeks to extract, monitor, and manipulate these representations with the ultimate goal of improving model control and safety (Zou et al., 2023).
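
The core move of representation engineering can be illustrated with a toy sketch: derive a ‘concept direction’ in activation space from contrasting stimuli via a difference of means, then use projections onto it to monitor or shift behavior. The synthetic ‘activations,’ dimensions, and shift magnitude below are assumptions; real work probes the hidden states of an actual FM/LLM.

```python
import numpy as np

# Toy sketch of direction extraction in representation engineering.
# All 'activations' here are synthetic assumptions, not model states.

rng = np.random.default_rng(42)
d = 16  # hidden dimension (illustrative)
true_direction = rng.normal(size=d)
true_direction /= np.linalg.norm(true_direction)

# Synthetic activations: 'concept-present' stimuli are shifted along
# the true direction; 'concept-absent' stimuli are not.
present = rng.normal(size=(100, d)) + 2.0 * true_direction
absent = rng.normal(size=(100, d))

# Difference-of-means estimate of the concept direction.
est = present.mean(axis=0) - absent.mean(axis=0)
est /= np.linalg.norm(est)

alignment = float(np.dot(est, true_direction))
print(round(alignment, 3))

# 'Steering': shifting an activation along the estimated direction.
steered = absent[0] + 2.0 * est
```

In the synthetic setup the estimated direction aligns closely with the planted one; in a real model, validating that an extracted direction genuinely corresponds to the intended concept is precisely the open problem discussed below.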

Though representational engineering has shown promise in strengthening capacities for model editing and manipulation (Liu et al., 2023), it is still at a very preliminary stage of development and has yet to establish a significant body of empirical support for its validity as an interpretability technique (Wolf et al., 2023; Zhao, Yang, et al., 2024). The method also faces a set of fundamental challenges that derive from its potentially reductive approach to issues of semantic complexity, plurality, and opacity. Identifying and mathematically formalizing representations of interpretively ambiguous or contestable concepts—like “truthfulness,” “morality,” “power,” and “emotion” (Zou et al., 2023)—is a thorny affair, especially where ground truth is unavailable to validate potentially speculative claims made about the discovery of these elements in the latent embedding spaces of opaque model architectures (Zhao, Yang et al., 2024). As Levinstein and Herrmann (2024) argue in their critical examination of methods for probing ‘lie-detection’ in LLMs, “while we can examine the embeddings, parameters, and activations within an LLM, the semantic significance of these elements is opaque. The model generates predictions using a complex algorithm that manipulates high-dimensional vectors in ways that don’t obviously resemble human thought processes.” These authors show how both supervised and unsupervised methods of probing latent representations of ‘truthfulness’ and ‘falsity’ in LLMs can fail to generalize and can introduce intractable issues of semantic ambiguity into the classification of the truth-value of sentences. For instance, a probing classifier that is built to distinguish truth representations in an LLM might identify sentence properties that are coincident to ‘is true’ but that are not truth-establishing (e.g., sentence properties like ‘is commonly believed,’ ‘is believed by most Westerners,’ ‘is expressed in the style of Wikipedia’) (p. 10). 
Such a classifier would then mistakenly identify these coincident properties as truth-indicative (cf., Marks & Tegmark, 2023). This failure to disambiguate truth from closely associated but non-truth-related properties signals a broader weakness in the potentially reductive and semantically naïve assumptions built into representation engineering about the availability of stable, unequivocal, and seamlessly formalizable units of meaning that are captured by model-internalized representations of high-level concepts.
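
The probing confound that Levinstein and Herrmann identify can be reproduced in miniature. In the synthetic setup below (all data are fabricated assumptions), ‘activations’ encode only whether a sentence is commonly believed; because belief and truth correlate in the probe’s training set, a linear probe for ‘truth’ looks accurate until evaluated on cases where the two come apart.

```python
import numpy as np

# Toy illustration of a probing confound: synthetic 'activations'
# encode whether a sentence is *commonly believed*, not whether it is
# *true*. All data and dimensions here are fabricated assumptions.

rng = np.random.default_rng(0)
d = 8
believed_dir = rng.normal(size=d)

def make_data(n, agree_prob):
    truth = rng.integers(0, 2, size=n)
    # With probability agree_prob, 'commonly believed' matches truth.
    agree = rng.random(n) < agree_prob
    believed = np.where(agree, truth, 1 - truth)
    # Activations encode ONLY the believed property (plus noise).
    X = np.outer(believed, believed_dir) + 0.1 * rng.normal(size=(n, d))
    return X, truth

# Training set: belief and truth almost always agree.
Xtr, ytr = make_data(400, agree_prob=0.95)
A = np.hstack([Xtr, np.ones((len(Xtr), 1))])
w, *_ = np.linalg.lstsq(A, ytr, rcond=None)  # least-squares probe

def probe_acc(X, y):
    preds = (np.hstack([X, np.ones((len(X), 1))]) @ w) > 0.5
    return float((preds == y).mean())

acc_train = probe_acc(Xtr, ytr)
# Evaluation set: belief and truth always disagree (popular falsehoods).
Xte, yte = make_data(400, agree_prob=0.0)
acc_disagree = probe_acc(Xte, yte)
print(round(acc_train, 2), round(acc_disagree, 2))
```

The probe appears to detect truth on its training distribution while actually tracking the coincident ‘is commonly believed’ property, mirroring the failure to disambiguate described above.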

Notwithstanding these semantic obstacles, experimentation-centered methods in representation engineering have already spurred progress in the control and steering of desired model behavior (Liu et al., 2023; Zou et al., 2023). By presenting LLMs with salient input stimuli designed to trigger specific concept-related model responses, these methods induce and then capture neural activity for target concepts and functions, marshaling correlations between embeddings and model predictions to manipulate corresponding behaviors. This means that the embeddings behind objectionable model behaviors (e.g., toxic or offensive outputs) can potentially be pinpointed in the hidden states of the neural network layers and then extracted. However, this capability has also raised dual-use concerns that bad actors could utilize the technique for jailbreaking and removing safety functions, especially for open source FMs/LLMs (K. Li, Patel, et al., 2024; Wu et al., 2024). This focus on model control and steering has also raised issues about whether this technique advances FM/LLM interpretability per se. While such an intervention-driven method may enable more effective behavioral engineering of LLMs, it offers only a surface-level and indirect explanatory view of the meaning, rationale, and ‘reasoning’ processes that underlie model behaviors, leaving many of the longer term challenges faced by FM/LLM interpretability and explainability unaddressed (Lappin, 2024).

The second FM/LLM interpretability technique, mechanistic interpretability, is similar to representation engineering insofar as it also aims to provide a global and actionable understanding of specific aspects of FM/LLM behavior (Elhage et al., 2022; Meng et al., 2022; Olah et al., 2020; Olsson et al., 2022). However, instead of focusing on macro-scale learned representations in the latent embedding space, it zooms in on the micro-scale algorithm-level relationships between neurons, neuronal interconnections (frequently referred to as ‘circuits’), weights, and features, unpacking the functionality and influence of each of these underlying components and mechanisms (Zhao, Yang, et al., 2024). It does this by ‘reverse engineering’ distinct facets of model computation into humanly understandable components (K. Wang et al., 2022). It concentrates in particular on the causally significant relationships of neurons and weights that form the node-to-node interconnections, which enable FMs to distil syntactically, semantically, and functionally relevant features—for example, indirect object detection (K. Wang et al., 2022), in-context learning (Olsson et al., 2022), and text-to-label answer mapping for multiple choice questions (Lieberum et al., 2023).
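
The kind of object this method reverse-engineers can be shown at toy scale: a single attention head whose weight matrices and attention pattern are small enough to inspect directly. The matrices below are hand-built assumptions, not weights from a real transformer; they are rigged so that token 2 attends to token 0, which an analyst could read off from the attention pattern.

```python
import numpy as np

# Toy single attention head, small enough to inspect directly. The
# weight matrices are hand-built assumptions, rigged so that token 2
# attends to token 0; they are not weights from any real transformer.

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention_head(X, Wq, Wk, Wv):
    """Return the head's output and its attention pattern."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    scores = Q @ K.T / np.sqrt(K.shape[-1])
    pattern = softmax(scores, axis=-1)  # row i: where token i attends
    return pattern @ V, pattern

X = np.eye(3)  # three one-hot token embeddings
Wq = np.array([[0.0, 0, 0], [0, 0, 0], [5.0, 0, 0]])
Wk = np.eye(3) * 5.0
Wv = np.eye(3)
out, pattern = attention_head(X, Wq, Wk, Wv)
print(np.round(pattern, 2))
```

At the scale of a real FM, with thousands of heads interacting across dozens of layers, this kind of direct inspection is exactly what stops being tractable, which motivates the limitations discussed next.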

While mechanistic interpretability methods have yielded some encouraging preliminary results in the areas of model editing, diagnosis, and enhancement (Luo & Specia, 2024; Wu et al., 2024), this approach (much like that of representation engineering) is as yet immature and largely untested (Räuker et al., 2023; Zhao, Yang, et al., 2024). It also faces a multitude of nontrivial difficulties related to its practicability and validity. First off, researchers who have attempted to advance the method have been able to produce successful results only with small or ‘toy’ models and only on a very limited number of model parameters and components—with much current effort focused on a handful of model layers containing attention heads.11 Zhao, Yang, et al. (2024) estimate that “much less than a third of all parameters in LLMs” can be provisionally explained with this technique. This leaves the vast majority of the algorithmic territory unexplored, including the complicated multilayer perceptron (MLP) layers, which have long been recognized in the field of deep learning as notoriously difficult to untangle and comprehend. This limitation has been made all the more significant by results from alternative research streams in LLM interpretability that have interrogated the important role played by causal states within intermediate LLM layers and feed-forward mechanisms across the entire model architecture (Belrose et al., 2023; Geva et al., 2022; Luo & Specia, 2024; Meng et al., 2022; McGrath et al., 2023). This research suggests that a circuits-based and attention-block-focused account is insufficient to capture the end-to-end computational complexity of information flows across model layers (Wu et al., 2024; Zou et al., 2023).

Additionally, mechanistic interpretability methods are increasingly encountering more fundamental conceptual difficulties. For instance, Zou et al. (2023) point out that the bottom-up, neuron-level focus on circuits could fail to account for emergent phenomena that arise at the system level alongside ascending architectural complexity. On this view, exclusive concentration on the individual parts (neurons and circuits) could lead to a failure to identify mechanisms and properties that arise from broader interactions between such individual components but that manifest only at the model level when the system functions as a whole. From this critical, holistic perspective, concentration on neuronal mechanisms could also result in flawed model control since alterations “in underlying mechanisms often have diffuse, complex, unexpected upstream effects on the rest of the system” (Zou et al., 2023, p. 38).

The interrelated problems of polysemanticity and superposition pose a second fundamental challenge. A neuron in an FM/LLM is polysemantic if it is activated by more than one feature contained in an input stimulus (Cunningham et al., 2023; Scherlis et al., 2022).12 When a polysemantic neuron activates for multiple unrelated features, it becomes difficult to isolate and understand what role that neuron is playing in the model. This functional obscurity undermines the “ability to decompose networks into independently meaningful and composable components [and] thwart[s] existing approaches to reverse-engineer networks” (Gurnee et al., 2023, p. 4). It has been argued that the pervasive condition of polysemanticity in LLMs arises from a general phenomenon called superposition (Elhage et al., 2022). Superposition occurs when the number of features that the model contains in its latent space exceeds the number of neurons it has in its architecture, causing it to compress the excess quantity of features into a lower number of dimensions (Arora et al., 2018; Elhage et al., 2022). This induces the model to represent single features in linear combinations of neurons, and it forces the individual neurons to represent numerous semantically distinct properties at the same time. On top of this, superposition makes it extremely difficult for model developers to identify and comprehensively itemize all the expressed and implicit features for any given FM/LLM, because the superposition structure (i.e., the latent space of all known and unknown features) is neither visible nor immediately accessible (Elhage et al., 2022). Without a grasp of this structure, it becomes unfeasible to decompose the feature space into understandable components and to gain a solid and meaningful grasp of the weights and activations of neurons and of neuronal relationships. Ultimately, such a high level of impenetrability poses an elementary challenge to basic FM/LLM interpretability and control.
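
Superposition can be illustrated numerically: pack six feature directions into a three-neuron space and observe that individual neurons carry weight on several unrelated features, while sparse inputs nonetheless remain roughly decodable. The dimensions and random directions below are illustrative assumptions.

```python
import numpy as np

# Toy illustration of superposition: six sparse features embedded into
# a three-neuron space via near-orthogonal random directions. The
# dimensions and directions are illustrative assumptions.

rng = np.random.default_rng(1)
n_features, n_neurons = 6, 3
# One unit-norm direction per feature, packed into 3 dimensions.
W = rng.normal(size=(n_features, n_neurons))
W /= np.linalg.norm(W, axis=1, keepdims=True)

# Count how many features each neuron carries non-negligible weight on.
per_neuron = (np.abs(W) > 0.2).sum(axis=0)
print(per_neuron)

# A sparse input (only feature 4 active) and its compressed encoding.
x = np.zeros(n_features)
x[4] = 1.0
hidden = x @ W                 # 3-dimensional representation
recovered = hidden @ W.T       # read out by projecting onto directions
print(int(np.argmax(np.abs(recovered))))
```

Sparse inputs survive the compression because the interference between near-orthogonal directions is small; but when a human inspector looks at any single neuron, its activity mixes contributions from several features, which is the polysemanticity problem described above.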

The third FM/LLM interpretability technique is prompt-based self-explanation and prediction decomposition. This method aims to incorporate the provision of natural language explanations into the question-answer functionality of FMs/LLMs themselves, thereby aspiring to provide “an interpretable window into the behavior of the model” (Wei et al., 2022, p. 24825). It involves prompting the FM/LLM to carry out a series of steps to explain its own ‘reasoning’ process for a specific output (also known as chain-of-thought prompting) (Lanham et al., 2023; Radhakrishnan et al., 2023; Singh et al., 2024). This technique fundamentally differs from representation engineering and mechanistic interpretability in three ways. First, because it exclusively involves explanations of single instances of model predictions, it amounts to local explanation rather than global explanation of how aspects of a model’s inner workings account for general patterns of behavior and lead to prospects for model editing and control. Second, prompt-based self-explanation is model agnostic. The method can be applied without alteration to any LLM. This is in contradistinction to representation engineering and mechanistic interpretability, which are model-specific and are thus tailored to explaining the internal workings and architectures of particular models. Third, rather than lifting the hood of the LLM to try to understand model internal mechanisms and latent representation spaces, prompt-based self-explanation works outside the model in the input-output space of model prompts and predictions, preserving its black-box character (Casper et al., 2024).
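The prompting pattern itself is simple to sketch. The exemplar and wording below are invented for illustration and are not drawn from the cited papers; the point is only to show the basic shape of a few-shot chain-of-thought prompt:

```python
# A hypothetical few-shot chain-of-thought template; the exemplar and
# phrasing are invented purely for illustration.
EXEMPLAR = (
    "Q: A train travels 60 miles in 1.5 hours. What is its average speed?\n"
    "A: Let's think step by step. Speed = distance / time = 60 / 1.5 = 40. "
    "The answer is 40 mph.\n\n"
)

def build_cot_prompt(question: str) -> str:
    """Prepend a worked exemplar and cue the model to show its steps."""
    return f"{EXEMPLAR}Q: {question}\nA: Let's think step by step."

print(build_cot_prompt("If 3 pens cost $6, how much do 7 pens cost?"))
```

The model's completion of such a prompt contains intermediate steps that are then read off as a 'self-explanation' of the answer, operating entirely in the input-output space rather than on the model's internals.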

The advantages of prompt-based self-explanation and prediction decomposition have simultaneously been the source of some of its greatest weaknesses, with its seamless, prima facie usability and cogency cloaking more basic defects. The ability of the technique to be applied universally to LLMs and to generate human-understandable and output-specific explanations has made it perhaps the most accessible, flexible, and ready-to-hand explainability method currently available. Likewise, because it involves the presentation of the step-by-step logic of the system’s natural language responses, the method operates at the higher-order level of the model’s emergent reasoning functionality, thereby enabling some degree of transparency in an area of model capability that has outstripped conventional XAI techniques and eluded other emerging FM/LLM interpretability methods. These strengths, however, have been counterbalanced by the tendencies of self-explaining LLMs to offer explanations that are convincing but that systematically misrepresent or are unfaithful to the actual reasoning processes underlying their predictions and behaviors (Lanham et al., 2023; Turpin et al., 2024; Wu et al., 2024; Ye & Durrett, 2022). Turpin et al. (2024), in this connection, use adversarial prompting techniques to show that LLMs routinely generate unfaithful and misleading chain-of-thought self-explanations that veil discriminatory biases in internal inferential processes by omitting the presence of such biases from declared reasoning. As Lyu et al. (2023) observe, this lack of faithfulness can be especially “dangerous in high-stake applications because it may mislead people into believing that the model is self-interpretable, while there is no actual causal relationship between the reasoning chain and the answer” (p. 1).
Indeed, the deceptive plausibility, credibility, and accessibility of such plain language answers could lead to overtrust in misleading models and overcompliance with their potentially biased, nonfactual, or erroneous outputs (Lyu et al., 2023). While much preliminary research is underway to explore paths out of this explanation infidelity trap (Y. Chen, Zhong, et al., 2023; Chuang et al., 2024; Lyu et al., 2023; Radhakrishnan et al., 2023; Yao et al., 2024), the unreliability of LLM self-explanation remains an open challenge.
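Truncation-style faithfulness probes of the kind reported in this literature (e.g., Lanham et al., 2023) can be caricatured in a few lines. The sketch below is a structural illustration only: `model` is a hypothetical stand-in for an LLM call, and the logic (delete portions of the stated reasoning and check whether the answer moves) does not reproduce any published implementation.

```python
from typing import Callable

def truncation_probe(model: Callable[[str], str], question: str, cot: str) -> bool:
    """Return True if the final answer ever changes when the stated
    reasoning is truncated -- weak evidence that the chain of thought is
    load-bearing rather than a post hoc rationalization."""
    baseline = model(f"{question}\n{cot}")
    steps = cot.split(". ")
    for k in range(len(steps)):
        # Re-query with only the first k reasoning steps retained.
        if model(f"{question}\n" + ". ".join(steps[:k])) != baseline:
            return True
    return False

# A stand-in 'model' that ignores its own stated reasoning entirely:
post_hoc = lambda prompt: "42"
print(truncation_probe(post_hoc, "What is 6 x 7?", "6 x 7 = 42. So the answer is 42."))  # False
```

A `False` result here flags exactly the failure mode discussed above: the displayed reasoning has no causal bearing on the answer, however plausible it reads.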

Be that as it may, more fundamental difficulties related to other forms of misrepresentation also beset prompt-based self-explanation and prediction decomposition techniques. To begin with, this method leaves the FM/LLM black box intact. If taken as a sufficient explanatory tool, prompt-based self-explanation would divert attention away from or block access to potentially problematic model inner workings that could covertly influence outputs in harmful or discriminatory ways (Casper et al., 2024). Making matters worse, the method could simultaneously obscure such influences through the provision of believable but inaccurate natural language self-explanations.

Secondly, problematic assumptions are built into the very idea that chain-of-thought prompting (composed of prefabricated exemplars of questions, simplistic reasoning chains, and answers) could sufficiently capture the interpretive and inferential intricacies involved in formulating cohesive natural language explanations. An unquestioned confidence in the effectiveness of chain-of-thought templates betrays a reductivist understanding of the nuance and complexity of communicative processes of discursive reasoning—processes requiring explanations to encompass intersubjectively binding commitments to inferential claims made within an evolving space of reasons, interpretations, and justifications. Dialogical practices of offering and asking for reasons are social practices that entail normative relationships of discursive commitments and entitlements between interlocutors who hold each other responsible for how their explanations hang together within this dynamic and holistic space of reasons (Brandom, 1994/2001, 2000, 2013; Leslie, 2020, 2023). Machine-generated self-explanation is categorically distinct from this. Rather than working from within a framework of linguistic intersubjectivity that is realized through embodied discursive practices of giving and asking for reasons, it involves the generation of next-word predictions based on the statistical likelihood of patterns that have been extracted from training data (Shanahan, 2024). This detachment from lived reality is, in fact, a chief reason why prompt-based self-explanation can be unfaithful, unreliable, and ‘hallucinated’ in the first place. Prompt-based self-explanation, like the inference-making behavior of the LLM it explains, has no direct connection to ground truth. By contrast, when humans justify their claims about the world to each other, they offer reasons that are linked to shared understandings about the objective world (i.e., they appeal to a commonly experienced world of ground truths, and their reasons are assessed on this basis). This grounding of reasons and their justifiability in shared experience is precisely what LLMs and their self-explanations lack.

Moreover, even where LLMs carry out higher order reasoning functions, these express only a very limited type of formal linguistic capacity (Lu et al., 2023; Mahowald et al., 2023; Mondorf & Plank, 2024). Such systems apply complex syntactical functions to prediction tasks based upon learned structures of human reasoning that have congealed in their latent representation spaces. They do not express the sort of full-blown functional linguistic capacity that is a precondition of social practices of discursive reasoning and linguistic intersubjectivity writ large. It may therefore be said that those who uncritically appeal to prompt-based LLM self-explanation for the justification of model predictions and behaviors are liable to a kind of sociomorphic deception—a cognitive bias in which the presentation of machine-generated reasons and ‘self-interpretation’ is mistakenly clothed in the attire of full-fledged discursive reasoning and intersubjective understanding and unduly ascribed credence thereby. This amounts to a false attribution of social characteristics to computational (and hence elementally inanimate and nonsocial) entities.

On the whole, while the emerging approaches to LLM interpretability that we have explored show some sparks of promise for tackling the novel set of challenges raised by FM/LLM model scaling, some critics have rightfully noted that interpretability research in this area has yet to produce “tools and insights that are useful in the real world” (Räuker et al., 2023) and that “the progress achieved thus far has been relatively modest” (Zhao, Yang, et al., 2024). This state of immaturity in the field signposts the deeper interpretability predicament that has characterized the GenAI era. The hasty commercial roll-out of black-box generative AI applications has taken place without sufficient FM/LLM interpretability safeguards in place. Meanwhile, the ascending complexity and emergent cognitive behaviors of these technologies have rendered already thin XAI techniques for deep learning broadly unsuitable. This has meant that FM/LLM developers and users have been ill-prepared to substantively address the significant risks and harms arising from a lack of adequate interpretability in the systems they design and deploy. Such an irresponsible approach to GenAI transparency has been a significant enabling factor of future shock.

3.4.3. Risks From Model Scaling: Exponential Expansion of Compute and Infrastructure Costs

In addition to unaddressed increases in model complexity and opacity, the evolution of large multipurpose FMs has led to an exponential increase in compute and infrastructure requirements. According to research conducted by OpenAI in 2018, the amount of compute used to train the largest AI models had been doubling every 3.4 months since 2012 (Amodei et al., 2018). Despite recent efficiency improvements, the training of large and complex models entails the use of numerous servers with multiple graphics processing units (GPUs) that consume significant energy and generate high carbon emissions where fossil fuel–based energy is used. Training Hugging Face’s BLOOM LLM resulted in over 50 metric tons of carbon dioxide emissions (about 60 London–New York flights) (Luccioni et al., 2023), and carbon emissions for the training of Google’s BERT LLM are suggested to have been comparable to those of a transatlantic flight (Strubell et al., 2019). But this is not the only environmental cost of training FMs. Such training also requires significant quantities of clean, fresh water to cool data centers and generate electricity (P. Li et al., 2023). According to the environmental reports of Google and Microsoft, in one year the companies' water use increased by 20% and 34%, respectively, in preparation for their respective LLMs (Crawford, 2024).
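For scale, a 3.4-month doubling time compounds to more than an order of magnitude of growth in training compute per year. A back-of-the-envelope calculation (using only the doubling-time figure cited above) makes this concrete:

```python
# Annualized growth rate implied by a 3.4-month doubling time in
# training compute (figure from the OpenAI analysis cited above).
doubling_time_months = 3.4
growth_per_year = 2 ** (12 / doubling_time_months)
print(round(growth_per_year, 1))  # ≈ 11.5x more training compute per year
```

Sustained over several years, an ~11.5x annual multiplier quickly translates into the exponential infrastructure and energy demands described in this section.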

While research on the environmental impact of the use of FMs and GenAI models is still emerging, factors including how these models are integrated into products, their scale of use, and the energy cost per prompt are expected to carry environmental costs that may, in aggregate, exceed those of model training (Strubell et al., 2020; Weidinger et al., 2021). Indeed, research conducted by Luccioni et al. (2023) found that AI-generated images are particularly energy-intensive compared to AI-generated text. Generating a single image with a sophisticated AI model consumed energy equivalent to fully charging a smartphone, and generating 1,000 images was equivalent, in carbon terms, to driving 4.1 miles in an average gasoline-powered car, while generating 1,000 texts was, depending on the AI model used, equivalent to as little as 0.0006 miles. In addition, Luccioni and colleagues suggest that multipurpose GenAI models are more energy-intensive than AI models fine-tuned for specific tasks. P. Li et al. (2023) explored the water footprint of AI models and calculated that running GPT-3 inferences for 10–50 prompts consumed about 500 milliliters of water, with variation according to where the model is hosted.
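Taking the two mileage equivalents cited above (per 1,000 image generations versus per 1,000 text generations, from Luccioni et al., 2023) as illustrative constants, a one-line calculation shows that the gap between the modalities spans roughly four orders of magnitude:

```python
# Mileage equivalents per 1,000 generations, as cited in the text
# (Luccioni et al., 2023); treated here purely as illustrative constants.
miles_per_1000_images = 4.1
miles_per_1000_texts = 0.0006

ratio = miles_per_1000_images / miles_per_1000_texts
print(round(ratio))  # image generation is roughly 10^4 times more carbon-intensive
```

This disparity is one reason per-prompt energy accounting, rather than training-run accounting alone, matters for assessing GenAI's aggregate environmental footprint.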

A more comprehensive understanding of the risks and harms of exponential expansion of compute and infrastructure costs, however, also requires examining the impacts associated with the AI global supply chain. The demand for computer hardware to train the models (e.g., data centers, semiconductors, and devices) also leads to increased demand for the precious metals and materials needed to produce it, with significant social and environmental consequences (Gupta et al., 2021). Not only does the mining of raw materials have a significant carbon footprint (Anton et al., 2020), but it is also likely to result in harm to miners (White & Shine, 2016) and the fragmentation of local communities (Al Rawashdeh et al., 2016; Crawford, 2021; Domínguez Hernández et al., 2024). Because AI infrastructure is highly interconnected with ecological, social, and economic systems, some scholars warn that it may lead to irreversible supply chain dynamics and socioenvironmental harms (Domínguez Hernández et al., 2024; Robbins & van Wynsberghe, 2022).

The environmental costs of AI outlined above are unevenly distributed, with historically marginalized communities bearing the brunt of the impacts (Domínguez Hernández et al., 2024). As stressed by environmental and climate justice scholars, individuals and communities with the least influence in decision-making processes (and the conditions in which they live and work) within and between communities are the most exposed and affected by negative environmental impacts like pollution, ecosystem destruction, and involuntary displacement (Bender et al., 2021; Bullard, 1993; Domínguez Hernández et al., 2024; Leslie et al., 2022; Nixon, 2011; Sachs & Santarius, 2007; Westra & Lawson, 2001).

3.4.4. Risks From Industrial Scaling

While the exponential increase in computing power, data set size, and model complexity sparked off the emergence of a broad range of model intrinsic risks and harms, the convergence of the hazards produced by these scaling dynamics with the rapid industrial adoption of FMs and GenAI systems prompted the onset of a new scale of systemic-, societal-, and biospheric-level risks and harms. The upsurge of this broadening range of hazards emerged from the unprecedented exposure of increasing numbers of impacted people and communities at large to the risks and harms that arose from the interaction of humans with GenAI technologies. Such hazards included new possibilities for misuse, abuse, and cascading system-level effects that spanned the social, political, economic, cultural, and natural ecosystems in which GenAI systems were embedded.

To understand the full extent of the risks and harms arising from industrial scaling, one must look through a sociotechnical lens. This entails examining how the hazards issuing from model scaling surface both at the level of impacted individuals and interpersonal relationships, through human–technology interactions, and at broader systemic or structural levels, through the proliferation and aggregation of these hazards as commercial GenAI activity penetrates wider social, political, economic, cultural, and natural ecosystems (Domínguez Hernández et al., 2024; Weidinger et al., 2023). For example, discriminatory influences in the design, development, and deployment of FMs lead to biased outcomes, disproportionately poor model performance, and barriers to accessing benefits and opportunities for individuals from historically marginalized and minoritized demographic groups. With industrial scaling, the aggregate effects of these discriminatory outcomes also manifest at the system level, reflected in society-level dynamics of expanding inequity, rising inequality, and widening digital divides.

In Table 1, we provide a landscape view of this range of system-level risks and harms arising from industrial scaling. These risks span economy-level impacts (e.g., labor displacement, rising inequality, scaled fraud-based harms), information-ecosystem-level impacts (e.g., downstream data pollution, model collapse, and large-scale mis- and disinformation), population-level impacts on individual safety, security, and well-being (e.g., scaled cyberattacks and malware production, threats of bio-, chemical, and nuclear terrorism, and poorly designed, out-of-control systems), society-level impacts on individual agency, interpersonal relations, and political life (e.g., mass deskilling, cognitive atrophy, anthropomorphic and sociomorphic deception, overdependence, social polarization, and deterioration of social cohesion and public trust in democratic processes), geopolitical-level impacts (e.g., dual use, weaponization, and militarization of AI), and biospheric-level impacts (e.g., environmental degradation, resource and biodiversity drain, and climate-related involuntary displacement).

Table 1. System-level risks and harms arising from GenAI’s industrial scaling.

System-Level Risks and Harms



Expanding inequity, rising inequality, and widening of digital divides

An increasing number of people and communities are exposed to the risks and harms that emerge from representational imbalances in training data sets and the deep-seated patterns of bias, discrimination, and toxicity they embed. The compounded or aggregate impacts of disparate model performance that are more damaging to historically marginalized and minoritized people and communities may further exacerbate and worsen existing inequities. In addition, the disparities in access to the hardware, software, and skills to extract value and benefit from foundational models (FMs) and generative AI (GenAI) systems—and the centralization of control over these resources by a handful of private actors from the Global North—create not only an uneven uptake of these technologies but also an unequal distribution of their benefits and risks, where historically advantaged people and communities disproportionately gain economic benefits, worsening economic inequalities and widening local, regional, and global digital divides.

Bender et al. (2021); Bommasani et al. (2021); Domínguez Hernández et al. (2024); Fecher et al. (2023); Khowaja et al. (2023); Kirk et al. (2023); Porsdam Mann et al. (2023); Rillig et al. (2023); Shelby et al. (2023); Solaiman et al. (2023); Weidinger et al. (2021, 2022); Williams (2022).

Labor displacement

The increasingly transferable capabilities and cognitive functionality of FMs have expanded the range of tasks that can be automated. This has raised significant concerns about labor displacement. While the extent of displacement remains uncertain and dependent on uptake and broader economic trends (Weidinger et al., 2022), some suggest that automation will affect specific tasks within occupations rather than entire job roles (Gmyrek et al., 2023) and others warn about unequal impacts, such as worsening income inequalities across industries (Weidinger et al., 2022). Some roles are particularly vulnerable to GenAI, including customer service representatives, clerical workers, paralegals, data analysts, and especially those involving tasks suitable for remote work or based on transactional relationships (C. Chen, Fu, & Lyu, 2023; Frey & Osborne, 2023; Zarifhonarvar, 2023). The ability of GenAI technologies to create and alter text, images, videos, and music in response to natural language prompts is also already having wide-reaching effects across the creative industries, potentially impacting competition, compensation, and prospects for professional sustainability (Weidinger et al., 2022; Zhou, 2023).

C. Chen, Fu, & Lyu (2023); Eloundou et al. (2023); Farina & Lavazza (2023); Frey & Osborne (2023); Gmyrek et al. (2023); Khowaja et al. (2023); Kirk et al. (2023); Liao & Vaughan (2024); Lund et al. (2023); Mattas (2023); Qadir (2023); Solaiman et al. (2023); Weidinger et al. (2021, 2022); Zarifhonarvar (2023).

Deskilling, cognitive atrophy, and overreliance

While GenAI tools may make some jobs more accessible for individuals with little experience or fewer skills (Frey & Osborne, 2023; Noy & Zhang, 2023), overreliance on FMs and GenAI tools may have harmful effects on human skills and cognitive abilities. Workers may become less efficient in completing the tasks they were once proficient in, including those requiring cognitive effort. They may also come to overdepend on GenAI systems to complete tasks that would otherwise enable social integration and the development of agential capacities. This may especially affect industries with distressed or high-pressure environments. For instance, De Angelis et al. (2023) suggest that, in the context of public health, the extensive use of AI to increase scientific productivity may hinder researchers’ writing skills. Others note that the deployment of industrial robots in certain industries could have impacts on individual workers such as decreased autonomy and a weakening of human interactions. This impoverishment of workplace agency and solidarity could be mirrored in other cyber-physically integrated workplace environments, which use multimodal GenAI tools to automate service delivery functions.

Bai et al. (2023); Connor & O’Neill (2023); De Angelis et al. (2023); Domínguez Hernández et al. (2024); Kirk et al. (2023); Liao & Vaughan (2023); Ma et al. (2023); Mhlanga (2023); Perry et al. (2023); Piñeiro-Martín et al. (2023); Solaiman et al. (2023); Thieme et al. (2023); Weidinger et al. (2021, 2022, 2023)

Data pollution and model collapse

The continuous development of new and more complex FMs/LLMs relies heavily on the availability of large-scale training data sets culled from the lived human environment. With the widespread use of GenAI tools, a substantial stream of AI-generated content is entering public data pools, potentially becoming a significant source for future training data sets alongside human-created content. This shift indicates a change in the composition of future training data sets, likely characterized by the increasing prevalence of synthetic content that is harder to identify and discern from human-generated content. Over time, this trend will impact the quality and diversity of AI-generated outputs. It could lead future models to fail to capture the true underlying data distributions and to contain irreversible defects whereby the tails of the original distributions disappear—a phenomenon called “model collapse” (Shumailov et al., 2023). It could also have cascading effects on the integrity of the information ecosystem, raising concerns about the indefinite retention of synthetic content containing machine-produced misinformation, inaccuracies, ‘hallucinations,’ and biased representations in the human archive.

Domínguez Hernández et al. (2024); Ji et al. (2023); Kaddour et al. (2023); Martínez et al. (2023); Shumailov et al. (2023)

Large-scale disinformation increasingly indiscernible from human-generated content

GenAI-enabled content-generating capabilities could provide ill-intended actors with more sophisticated methods for creating and spreading highly persuasive, dynamic, personalized, and multimodal disinformation at lower costs, with minimal human involvement. This risk is amplified by the development of ever more sophisticated capabilities to generate deepfakes, of which we already find myriad harmful use cases (e.g., fabrication of political speeches, pornographic content, and satellite images). Inaccurate or misleading AI-generated content could also contaminate shared public knowledge platforms like Wikipedia, potentially affecting their quality and legitimacy as ‘knowledge commons’ (Huang & Siddarth, 2023). Over time, large-scale AI-generated mis- and disinformation could sow doubt in the authenticity of content to which people and communities are exposed, undermining the authentication of evidence, and eroding public trust in established sources of information and the knowledge ecosystem (S. Lee, Zúñiga, & Munger, 2023; Pawelec, 2022; Weidinger et al., 2023). Impacts are particularly significant in contexts or periods with high political stakes, with the potential to influence public opinion, exacerbate social polarization, undermine public trust in democratic processes, and deteriorate social cohesion (Shoaib et al., 2023; Weidinger et al., 2023).

Anderljung et al. (2023); Critch & Russell (2023); Domínguez Hernández et al. (2024); Hendrycks et al. (2023); Ho et al. (2023); Huang & Siddarth (2023); Lee & Shin (2022); S. Lee, Zúñiga, & Munger (2023); Masood et al. (2023); Pan et al. (2023); Pawelec (2022); Romero Moreno (2024); Shevlane et al. (2023); Shoaib et al. (2023); Weidinger et al. (2023)

Scaling of fraud

GenAI systems can cheaply generate myriad types of misleading or manipulative content that malicious actors could use at scale in fraudulent activities such as scams and impersonation. Not only could GenAI generate text that ill-intended actors could use for crafting large-scale, personalized, and convincing phishing email campaigns, but voice cloning and deepfakes could be used for more sophisticated and compelling impersonation scams, discreditation, extortion, and invasion of privacy. Likewise, pernicious GenAI-driven conversational agents, which emulate empathy and are hyper-personalized to the interests and tastes of targeted victims, could weaponize friendship in the service of manipulation and fraud.

Amoroso et al. (2023); Domínguez Hernández et al. (2024); Karanjai (2022); Romero Moreno (2024); Shoaib et al. (2023); Solaiman et al. (2023); Weidinger et al. (2023)

Scaling of cyberattacks and the production of malware

GenAI code-writing and debugging capabilities could be transferred to the automation of hardware, software, and data vulnerability scanning and weakness exploitation, opening up new avenues for cyberattacks on critical infrastructure. In addition, assistive coding tools could implant bugs into code and craft malware that morphs its own features to circumvent threat detection and response at lower cost. FMs/LLMs also systemically widen the risk surface for data poisoning, data leakage, prompt injection attacks, spear phishing, and other cybersecurity hazards.

Anderljung et al. (2023); Charan et al. (2023); Derner & Batistič (2023); X. Huang et al. (2023); Shao et al. (2023); Shevlane et al. (2023)

Provision of information that enables bioterrorism, chemical warfare/terrorism, or other hostile acts

GenAI models could be used by ill-intended actors to generate harmful biological and chemical agents, and to synthesize compounds for various destructive purposes, including warfare or terrorism. This can be done with little technical expertise, minimal time, or limited computational resources. Similarly, GenAI models may gain access to existing weapons systems—including nuclear weapon payloads—or provide ill-intentioned actors with information or assist in scientific discoveries that enable the design of new ones.

Boiko et al. (2023); Cohen (2023); Hendrycks et al. (2023); Ho et al. (2023); Sandbrink (2023); Shevlane et al. (2023); Trager et al. (2023); Urbina et al. (2022)

Dual use, weaponization, and militarization of AI

The advancing complexity and dual-use capability of GenAI propel concerns associated with AI weaponization by states that are involved in military conflicts or civil disturbances (Critch & Russell, 2023). As a high-stakes and safety critical domain, military settings raise notable concerns regarding the use of AI, including the automated escalation of conflicts from accidents, the dangerous acceleration of wartime decision-making attending the use of strategy-executing AI systems, the potential for reduced accountability in military actions, the risk of malicious actors co-opting AI applications, and the possibility of more destructive and inhumane wars.

Critch & Russell (2023); Domínguez Hernández et al. (2024); Glukhov et al. (2023); Henderson et al. (2023); Hendrycks et al. (2023); D. Kang et al. (2023); Soice et al. (2023); Weidinger et al. (2021, 2022)

Anthropomorphic and sociomorphic deception

GenAI-based assistive and conversational models' mimicking of humanlike characteristics and behaviors, including communication patterns, risks deceiving users into believing they are interacting with real people rather than statistics-driven computational systems. Users may be induced to change their behaviors or have their beliefs shaped, leading to potential harm to mental or psychological integrity. Basic forms of social communication, deliberation, and justification could also be undermined by overtrust in and overreliance on FM/LLM-generated self-explanation. This kind of anthropomorphic and sociomorphic deception could create a false sense of familiarity and trust and open up new societal-level possibilities for social corrosion, dignity violation, misuse, and abuse.

Leslie & Rossi (2023); Shevlane et al. (2023); Subhash (2023); Weidinger et al. (2022)

Poorly designed and governed AI agents released into the wild

Poor or irresponsible human design choices and insufficient AI governance mechanisms risk leading to the deployment of badly functioning or out-of-control technologies, which could have a range of harmful ecosystem-level effects. This could include the production and release of large-scale AI technologies that are ‘agentified’ using reinforcement learning techniques, leading to ‘rogue’ model behaviors that are misaligned with designers’ intentions and that harm people at the societal level. A lack of internal AI ethics and safety practices, together with short-sighted consideration of the potential expansion or repurposing of model capabilities, could drive misuses and impacts that the model developers did not anticipate (Anderljung et al., 2023; Critch & Russell, 2023). Profit- or influence-driven choices, especially in contexts where AI companies are not effectively held accountable for their design and release decisions, could similarly lead to the release of models in high-stakes domains with significant vulnerability to the impacts of accidents or unexpected capabilities (Ho et al., 2023) or even when harmful impacts of a model are already anticipated (Critch & Russell, 2023; Hendrycks et al., 2023).13

Anderljung et al. (2023); Critch & Russell (2023); Gruetzemacher et al. (2023); Hendrycks et al. (2023); Ho et al. (2023); Shevlane et al. (2023)

4. Future Shock and the International AI Policy and Governance Crisis

We have thus far sketched out some of the primary drivers of future shock for an AI policy and governance community that became caught in the crosshairs of the GenAI revolution. To conclude our stage-setting for this Policy Forum, we will examine how these drivers converged to produce the international AI policy and governance crisis to which all our contributions, in one way or another, endeavor to respond.

First and foremost, it is important to note that the large tech firms that propelled the breakneck industrialization of GenAI energetically capitalized on the readiness and capability gaps in the international AI policy and governance ecosystem that we discussed earlier. This included taking advantage of regulatory inaction and ineptitude in the enforcement of existing digital- and data-related laws; exploiting knowledge, information, and resource asymmetries between frontier AI companies and policymakers and regulators to dictate the pace and scope of potential statutory and governance interventions; and making the most of affinities between their own market-oriented pursuits of technological ascendancy in advanced AI and the strategic geopolitical interests and goals of state actors to secure deregulatory policy outcomes that promoted corporate self-regulation and frictionless innovation (Ho et al., 2024; Roberts et al., 2024). Along these lines, some scholars warned that the convergence of “unrestrained competition among firms” involved in a technological race to marketize GenAI products and services with this motivation of states to give their own domestic companies “a competitive edge through lax regulation” heralded a “race to the bottom on regulatory standards” (Trager et al., 2023). Indeed, the intersecting dynamics of unprecedented political economic and geopolitical power centralization—which predominantly shaped GenAI’s eruptive rise—only operated as a force multiplier for this downward deregulatory momentum.

The international AI policy and governance crisis materialized, in no small measure, by virtue of how this manufactured absence of practicable and binding regulatory and governance mechanisms to put checks on the design, development, and use of FMs and their downstream applications coalesced with widespread dissent among researchers and members of the public, who increasingly voiced concerns about the system-level risks and harms that were arising from the ‘move fast and break things’ momentum of industrial scaling. We should note, in this respect, that there was no lack of knowledge and understanding in the public discourse about the potential hazards of scaled GenAI industrialization. Both in the lead-up to, and in tandem with, AI’s post-ChatGPT industrial boom, there was, in fact, broad awareness among academic researchers, journalists, and tech industry actors of the range of serious risks posed to individuals, society, and the biosphere by the large-scale production and use of GenAI technologies (for example, Alba, 2022; Bender et al., 2021; Birhane et al., 2023; Bommasani et al., 2021; Hao, 2019; Hazell, 2023; Klepper, 2023; Kurenkov, 2022; Leslie & Rossi, 2023; Mascellino, 2023; Perrigo, 2021; Shelby et al., 2023; Strubell et al., 2019; Vock, 2022; Weidinger et al., 2022). The disconnect between this strengthening thrust of public criticism and the ecosystem-level chasm engendered by the absence of needed regulatory mechanisms and policy interventions was at the very heart of the international AI policy and governance crisis.

4.1. Did First Wave International AI Policy and Governance Initiatives Address the Crisis?

The extent to which the first wave of international policy and governance initiatives that cropped up in mid-2023 effectively responded to this crisis remains a matter of debate (Britten, 2023; Hawes & Hall, 2023). For some, this flurry of multistakeholder, multistate initiatives signaled considerable forward progress. On this view, the outcomes of initiatives like the UK AI Safety Summit’s Bletchley Declaration, the International Code of Conduct for Organizations Developing Advanced AI Systems produced by the G7’s Hiroshima AI Process, and the Partnership on AI’s Guidance for Safe Foundation Models generated significant momentum for international collaboration. They established joint commitments among participants to address identified frontier AI risks and to advance societally beneficial AI uses; they inaugurated processes for international collaboration on risk evaluation and mitigation; they secured the formation of several national AI Safety Institutes; and they underscored the urgency of making AI safety a global priority (Garfinkel et al., 2024; Guzik & Sitek, 2023; Oxford Analytica, 2023). Such preliminary steps toward cooperative international action on the governance of frontier AI were seen by proponents as advancing the establishment of the sort of international regimes and institutions that AI policy researchers have viewed as necessary counters to the cross-border character of FM and GenAI risks, harms, supply chains, and infrastructure—all of which render national regulation insufficient. These steps were also seen as an important move toward the redress of similar international coordination problems around standardized model evaluation and testing (Gruetzemacher et al., 2023; Ho et al., 2023; Trager et al., 2023).

Critics of the first wave international initiatives, however, have emphasized that much of this AI policy and governance activity has been ineffective and diversionary, subserving the deregulatory interests of big tech firms, failing to deliver binding governance mechanisms, drawing attention away from the real-world harms inflicted by large-scale AI, and further entrenching legacies of Global North political, economic, and sociocultural hegemony. From this perspective, many of the technical experts and policymakers, who shaped international policy discussions and outcomes, misleadingly focused public attention on evasive concepts like ‘frontier AI’14 and speculative doomsday scenarios about AI takeover and existential risks to humanity (Helfrich, 2024; Ryan-Mosely, 2023; Vallor & Luger, 2023). This functioned to steer statutory and policy attention away from the mobilization of the robust regulatory controls needed to redress tangible risks and harms, to govern the complex material reality of global AI value chains, and to constrain hapless corporate behavior (Gebru et al., 2023; Hanna & Bender, 2023; Helfrich, 2024; Terzis, 2024). Some scholars have also noted that because this ‘existential risk’ framing conceived of humanity as a monolith faced with the common threat of uncontrolled AI, it diverted attention and resources away from the nuanced forms of immediate harm that are disproportionately experienced by historically marginalized and minoritized groups, thus perpetuating and exacerbating prevailing inequities (Ferri & Gloerich, 2023; Schopmans, 2022).

Critics have additionally highlighted how the first wave of international AI policy and governance initiatives detrimentally narrowed longstanding discussions on responsible and ethical AI to a constricted set of largely technical issues surrounding ‘AI safety.’ This meant that, rather than directly confronting the immediate threats to civil, social, political, and legal rights and environmental sustainability posed by the irresponsible mass commercialization of FMs and GenAI systems, the policy and governance discussion veered toward model-focused considerations of AI alignment, model testing and reporting, capabilities evaluation, system robustness, and risk monitoring. From the critical standpoint, this retrenchment into ‘AI safety’ concerns had major agenda-setting effects. It affirmed the status quo of non- or self-regulation in the FM and GenAI innovation ecosystem (Ahmed et al., 2023), maintaining the legitimacy of extant big tech practices of haphazardly releasing unregulated black-box systems into the public domain, rather than advancing the establishment of ex ante governance measures to secure the rights and interests of impacted people in advance of potentially harmful consequences (Britten, 2023). For Gebru and Torres (2024), such a blinkered focus on “safety issues” allowed the “companies working toward it to describe themselves as ‘AI safety’ organizations safeguarding humanity’s future, while simultaneously creating unsafe products, centralizing power, and evading accountability” (p. 19).15

In the end, those who have been critical of first wave international initiatives have stressed that rather than effectively responding to the international GenAI policy and governance crisis, such initiatives operated, in fact, chiefly to exacerbate it. As Leslie, Ashurst et al. (2024) write in this special issue:

though the stage had been set for definitive, indeed pivotal, policy action to protect the public interest, the entrenched power dynamics of the global political economy of AI came quickly to steer the international AI policy and governance agenda. Instead of safeguarding the public good by confronting the range of extant hazards to people, society, and the biosphere produced by the design, development, and deployment of GenAI systems, policymakers and governments (largely dominated by the amplified voices of countries from the Global North) deferred to the corporate prerogatives of private sector technical ‘experts,’ narrowing the governance discussion to model-centered ‘AI safety’ and zooming in on “technical methods for avoiding hypothetical ‘extreme risks’ that could emerge from the misuse or loss of control of advanced ‘frontier AI’ systems” (Smakman et al., 2023). This meant that initiatives like the UK AI Safety Summit “over-indexed on hypothetical future harms,” while failing to effectively target the far-reaching dangers of the use of AI “in the context of broader sociotechnical systems in which it is always embedded” (Lazar & Nelson, 2023; cf., Gebru et al., 2023; Ryan-Mosely, 2023; Vallor & Luger, 2023). Likewise, the blindered focus on technical measures placed the handful of tech companies—who monopolized the means of ‘frontier AI’ production and were thus exceptionally positioned to shape and implement such measures—in a position of overarching epistemic authority. This erroneously allowed for definitions of essential terms like ‘safety,’ ‘risk,’ ‘trust,’ and the ‘public good’ to be controlled by the “mono-culture” of Silicon Valley tech elites rather than being shaped and decided on by the broad spectrum of communities and stakeholders impacted by the swift proliferation of these technologies (Lazar & Nelson, 2023).

4.2. The Uneven Pitching of the International AI Policy and Governance Discussions

The foregoing criticisms highlight an evident failure among policymakers and government officials to tackle the full range of policy and governance challenges triggered by GenAI’s industrial revolution. However, from a broader, more globally minded perspective our account of this shortfall has thus far been incomplete. Critics have also pointed out that the international AI policy and governance conversation that steered first wave outcomes centered the views, positions, and interests of a handful of prominent geopolitical and private sector actors from the high-income countries of the West and the Global North, while broadly neglecting the contexts, voices, and concerns of those impacted communities whose members were from the Global Majority, especially those from lower income countries (Adams et al., 2023; Lazar & Nelson, 2023; World Economic Forum, 2024). From this perspective, the uneven pitching of international AI policy and governance discussions had significant agenda-determining consequences, whereby major issues that were affected by GenAI policy (and that should have affected its development and direction in turn) were largely absent from or deprioritized in international discussions. These issues included the exploitation of labor (e.g., data workers in global GenAI supply chains), widening digital divides, growing global inequality, infringements on data sovereignty, inequities in international research environments, the worsening of institutional instability, epistemic injustices (e.g., the erasure of indigenous knowledges and perspectives), data extractivism, and the disproportionate impact of AI-prompted environmental harm on those in lower income and small island countries.

On this critical view, moreover, beyond prompting the exclusion or relegation of crucial issues, such an uneven pitching also enabled Northern and Western framings to dominate global AI governance narratives—not only flooding the space with socioculturally skewed visions of AI and AI governance that were born out of the Anglo-European “technological imaginary” (Crawford, 2021) but also reproducing and reinforcing implicit colonial logics. Helfrich (2024) argues, along these lines, that the very term ‘frontier AI,’ in its tone-deaf evocation of Manifest Destiny and pioneer-era conquest, violence, and exploitation in the United States, invoked “the colonial mindset, further reinscribing the harmful dynamics between the handful of powerful Western companies who produce today’s generative AI models and the people of the ‘Global South’ who are most likely to experience harm as a direct result of the development and deployment of these AI technologies” (p. 1). Ferri and Gloerich (2023) similarly suggest that certain ‘existential risk’ and AI-induced human extinction narratives, which stake a claim to safeguarding the ‘future of humanity,’ sought to legitimize the few who participated in AI safety conversations and decision-making as representatives and defenders of ‘humanity’ as such. This led to a “top-down approach to governance” that elevated Western technological elites to a position of exceptional technopolitical and epistemic authority simultaneously as it devalued the importance of including the diverse voices and lived experiences of the rest of the world in AI policy and governance processes (Ferri & Gloerich, 2023).

Considered together, the narrow-minded focus of first wave AI policy and governance discussions on issues defined by Northern and Western interests, the overlooking and undervaluing of the policy issues and concerns of the Majority World, and the perpetuation of colonial logics in dominant narratives all indicate the need for a rebalancing of AI policy and governance discussions to include the voices of those from low- and middle-income and small island countries who have thus far been sidelined. Over and above this, redress of the uneven pitching of AI policy debates and processes requires concerted efforts to understand and address the contexts of coloniality, global inequality, and systemic discrimination that have created the situation in the first place. Critical scholars and social justice activists have long called attention to how legacies of coloniality and contexts of historically entrenched oppression influence the way AI and data-driven systems, including GenAI technologies, are built and deployed (Adams et al., 2023; Aggarwal, 2020; Ali, 2014; Birhane, 2021; Birhane & Talat, 2023; Krishnan, 2021; Mhlambi, 2020; Mohamed et al., 2020; Tacheva & Ramasubramanian, 2023). They have likewise stressed how underlying dynamics of structural discrimination and systemic power imbalances establish path dependencies that manifest across the complex global value chains, the data and compute infrastructures, and the socio-technical settings in which the production and use of these systems are nested (Cohen, 2019; Terzis, 2024). The endeavor to equitably scope and understand global AI policy and governance issues and to co-design comprehensive and transformational governance initiatives accordingly demands constant interrogation of the social, historical, cultural, political, and economic forces behind manifestations of discrimination, oppression, and injustice in contemporary ecologies of data and AI (Leslie et al., 2022). 
Furthermore, it demands a reorientation of policy and governance thinking that prioritizes scrutiny of how longer term patterns and legacies of inequality, discrimination, and privilege—at local, regional, and global levels—have cascading effects across FM and GenAI innovation lifecycles.

A critical first step toward creating a more context-responsive, globally minded, and equity-centered AI policy and governance discussion is to create robust transversal interactions between a multiplicity of voices, backgrounds, and experiences. The idea of transversality, in this connection, involves calling into question the assumption that there is a dominant core or center of the AI policy and governance discussion (situated in the advanced industrialized nation-states of the Global North) that must take account of the voices that come from the margins or the periphery (i.e., those originating in non-Western and ‘Global Southern’ parts of the world) (Dussel, 2012). This assumption of a fixed core-periphery relationship anchored in the North reflects a fraught heritage of cultural hegemony and ethnocentrism that needs rectification. A truly transversal policy dialogue does not have a core and a periphery, a center and margins (Collins, 1990). Rather it disrupts the core–periphery relationship as such by creating a multitude of peripheries without a core. It decenters interactions from ‘lived experience to lived experience’ and from ‘periphery to periphery,’ giving equal importance to the unique contexts and concerns of all conversation partners and all affected voices.

The mobilization of more transversal AI policy and governance discussions would act as a corrective to prevailing representational imbalances in the international conversation. It would enable the kind of inclusive and meaningful policy dialogues that are equipped to interrogate, tackle, and repair the full range of risks and harms emerging from GenAI, while also confronting the longer term socio-historical patterns of inequity that both frame AI policy processes and determine the distribution of hazards and opportunities within the global AI innovation ecosystem. Indeed, some signs of a gathering momentum of transversality can already be seen in the recent efforts of UNESCO to co-organize with the Development Bank of Latin America and the Chilean Ministry of Science, Technology, Knowledge and Innovation the Ministerial and High Authorities Summit on Artificial Intelligence in Latin America and the Caribbean that produced the Santiago Declaration (2023); to work with Southern African states in generating the “Windhoek Statement on Artificial Intelligence in Southern Africa” (2022); to bring together government, academic, civil society, and industry stakeholders from more than 50 countries spanning every region of the world to complete its AI Readiness Assessment Methodology; and to launch a Global AI Ethics and Governance Observatory to build a participatory commons for sharing the common but diverse and pluralistic learnings and lived experience of its 193 member states. More of this kind of global co-convening and network-building work is needed to amplify the voices of those who have been peripheralized in international AI ethics and governance discussions and to decenter these international discussions as such.

5. Open Policy Questions 

At the beginning of this Introduction, we posed the question: ‘Did the rapid industrialization of generative AI really trigger future shock for the global AI policy and governance community?’ In showing that this was by and large the case, we examined how some of the primary drivers of future shock converged to produce an international AI policy and governance crisis marked by the disconnect between the strengthening thrust of public concerns about the hazards posed by the hasty industrial scaling of GenAI and the absence of effectual regulatory mechanisms and needed policy interventions to address such hazards. We then explored how first wave international policy and governance initiatives struggled to sufficiently address the crisis, paying particular attention to the ways in which first wave policy activities and outcomes have largely exacerbated it. Ultimately, the failure to address the full range of risks and harms emerging from GenAI, the inability to establish effective and binding governance mechanisms, and the sidelining of the voices and concerns of key stakeholders from the Global South have left us with more open policy questions than answers. Our Policy Forum contributions broach a broad range of these questions.

The Forum collects 13 position papers on GenAI-related policy issues from leading public sector and civil society organizations from around the world. In gathering these articles, we prioritized multisector, cross-disciplinary, and geographically diverse expertise, and we sought to spotlight the myriad lived experiences of those affected by these technologies. The contributions are written from both research and practice-based perspectives, providing insights on the ways that we can respond to the far-reaching risks and harms posed by FMs and GenAI technologies while weighing such hazards against potential positive impacts. Here, we conclude this Introduction with a brief thematically organized summary of these incisive contributions.

5.1. Addressing the Gaps in Actionable AI Policy and Governance

As governance strategies for FMs and GenAI technologies continue to take shape, contributors to the Policy Forum explore the diverse ways in which the gaps in actionable AI policy and governance can be addressed. Two of these contributions highlight the need to consolidate AI policy and governance through thoroughly informed and critically reflexive interventions. In “Scaling Up Mischief: Red-Teaming AI and Distributing Governance,” Jacob Metcalf and Ranjit Singh (2024) explore the effectiveness of red-teaming as a governance strategy and caution against reliance on such untested and potentially questionable governance strategies. They underscore the weaknesses of prompt-based approaches to ensuring the safety of model behavior, given the indeterminacy and semantic openness of the input space and the potential for boundless workarounds in infinitely generative natural language. In “Castles in the Sand?: How the Public Sector and Academia Can Partner in Regulatory Sandboxes to Help Leverage Generative AI for Public Good,” Alex Moltzau and Robindra Prabhu (2024) advocate for collaborative public–academic and cross-disciplinary endeavors and highlight the valuable insights that can be gained from the ‘constraints’ imposed by the public sector on emerging technologies to orient their production and use toward the public good. The authors suggest that these constraints not only provide instructive feedback for discussions on AI governance but also present unique opportunities to explore innovative governance approaches.

Other contributions explore concerted international or multilateral efforts as points of departure. In “Toward International Cooperation on Foundational AI Models: An Expanded Role for Trade Agreements and International Economic Policy” (a republication of the original Brookings Institution article of the same name [Meltzer, 2023]), Joshua Meltzer (2024) from the Brookings Institution in the United States focuses on efforts established by free trade agreements, digital economic agreements, and international economic forums. He suggests that these mechanisms should play a greater role in building international cooperation and in developing new commitments to address the opportunities and risks of FMs and GenAI technologies. In “Government Interventions to Avert Future Catastrophic AI Risks,” Yoshua Bengio (2024) urges governmental action to address the factors he identifies as influencing both current risks and the probability of future catastrophic risks. He envisions policy actions that operate within multilateral and democratic spheres, including agile national and multilateral regulatory frameworks and legislation, global open science research efforts that inform regulation and governance structures, and R&D investment in countermeasures to address the risks associated with “potential rogue AIs or AI-equipped bad actors with harmful goals.”

These position papers prompt important open policy questions: How can we rigorously assess the suitability of governance actions to identify and mitigate potential risks and actual harms before overinvesting in or overrelying on them? How can governance approaches balance the need for real-world experimentation with the responsibility to protect societal and public interests? How can international cooperation strike a balance between seizing opportunities presented by GenAI and FMs and sufficiently identifying and mitigating associated risks and harms? What strategies should be employed to ensure that new commitments effectively address both aspects? How can we engage in concerted, democratic, and multilateral efforts to address factors enabling the full range of risks and harms that accompany the increasing capabilities of AI?

5.2. Dynamics of Unprecedented Scaling and Centralization That Contribute to Future Shock

Contributors to the Policy Forum also explore risks and harms that emerge as a result of the unprecedented scaling and centralization of GenAI, investigating how governance mechanisms can be implemented to effectively operationalize AI principles and to address the gaps in regime- or domain-specific enforcement and deficiencies in regulatory AI capacity. Despite being strengthened over the past couple of decades, data privacy and protection and online safety policy regimes have proven insufficient to prepare the AI policy and governance ecosystem to cope with the unfathomable scale of AI training data and the opacity and complexity of models. In “Data Protection and Generative AI: An Inconclusive Answer,” Romina Garrido (2024) takes note of this and focuses attention on the shortcomings of existing data protection regimes in addressing the emerging challenges that GenAI poses to data privacy and protection. The author calls for urgent efforts to address the lawfulness, purpose, and contextual integrity of the personal data collected and used by GenAI. In “We Must Fix the Lack of Transparency Around the Data Used to Train Foundation Models,” Jack Hardinges, Elena Simperl, and Nigel Shadbolt (2024) address the gaps in attending to issues around data protection and bias that may hinder the well-being of users and impacted communities. The authors emphasize that to effectively mitigate potential harms produced by GenAI technologies, we must prioritize instituting transparency requirements for the data used in their training. These two position papers prompt important policy questions: Is it legal, in the first instance, to use personal data that have been publicly shared on the Internet in one specific context for a different one (namely, to train GenAI systems that will have myriad downstream purposes)? How do we ensure the lawfulness, purpose, and contextual integrity of the personal data collected and used by generative AI?
How can governance approaches be designed to ensure that organizational opacity and secrecy are discouraged in the GenAI market landscape? What measures can be implemented to enforce the transparency of training data sets?

Other contributions shine light on the societal- and biospheric-level risks enabled by industrial scaling and insufficient AI policy and governance regimes. In “The Double-Edged Sword of AI: How Generative Language Models Like Google Bard and ChatGPT Pose a Threat to Countering Hate and Misinformation Online,” Ben Weich (2024), drawing on research from the Centre for Countering Digital Hate in the United Kingdom, discusses the governance and regulatory gaps in countering the amplification of hate and misinformation online. In response, the author offers a framework for evaluating efforts made by governments and social media companies to implement guardrails, which embeds principles of safety-by-design, transparency, accountability, and responsibility. In “AI and Creative Work,” Seamus McGibbon and Nicola Solomon (2024) from the Creators' Rights Alliance in the United Kingdom explore how LLMs and GenAI systems interact with and potentially exploit the creative works of artists and creators. To address these concerns, the authors advocate for a fairer playing field where all parties work together to ensure proper regulation and the guarantee of both copyright protection and the realization of fundamental rights and freedoms. This includes licensing agreements, as well as acknowledging, crediting, and compensating creators for the use of their work.

As contextual factors expose some populations to the disproportionate impacts of generative AI, other contributors focus on the risks of expanding inequalities. In “From Left Behind to Left Out: Generative AI or the Next Pain of the Unconnected,” Jean Louis Kedieng Ebongue Fendji (2024) from the Stellenbosch Institute for Advanced Study, University of Ngaoundéré, and AfroLeadership in Cameroon notes that GenAI technologies sit in tension with the principle of nondiscrimination. Firstly, the benefits of GenAI are predominantly accessible to populations with higher internet-penetration rates. Secondly, AI-generated content tends to reflect the lived experiences of the connected, thereby potentially overlooking or rendering invisible already disadvantaged populations. In a similar vein, in “Unpacking AI Governance From the Margins,” Shmyla Khan (2024) cautions against an overreliance on AI-generated products in contexts characterized by low trust in media and high levels of state control. These settings are particularly susceptible to the detrimental effects of AI-generated disinformation. The picture presented by both authors signals the diverse range of serious issues affecting individuals, groups, and communities across the world (especially those on the local, regional, and global margins) that need to be accounted for in the AI policy and governance agenda. These position papers prompt important policy questions: How will regulators be able to meaningfully identify and address the aggregate impacts of GenAI systems—especially the subtle downstream forms of sociocultural discrimination and exclusion that transgress nation-state borders? How can we ensure meaningful scrutiny and accountability of AI systems that ensure ex ante reflection on societal impacts and corresponding mitigation measures? What steps are necessary to uncover the diverse contexts of use and impacts of large-scale GenAI technologies? 
How can we establish and enforce frameworks that ensure fairness considerations are made part of the design, development, and deployment of GenAI technologies, securing an equitable distribution of opportunities and beneficial outcomes? How can regulatory and governance regimes be informed and shaped by the lived experiences of marginalized, vulnerable, and disadvantaged populations, both locally and globally?

Lastly, in “Carbon Emissions in the Tailpipe of Generative AI,” Tamara Kneese and Meg Young (2024) describe the range of environmental costs associated, first, with increasingly powerful (and resource-intensive) FMs and GenAI systems and, second, with the scaling of the production and deployment of such models. The authors call on developers to consider biospheric impacts across the AI supply chain and lifecycle. They also explore the value of developers adopting environmental justice and equity approaches to their practices. This includes going beyond merely measuring climate impacts to focusing on how to optimally mitigate the environmental costs of GenAI production and use and how to engage meaningfully with affected people. This position paper prompts important policy questions: How can relevant stakeholders prioritize carbon and other resource considerations and make these central to the AI innovation workflow? What strategies can policymakers implement to integrate environmental justice and equity into AI policy and regulatory frameworks? How should stakeholders address the potential for environmental harm in communities housing materials for hardware, large data centers, and other components of the GenAI supply chain?

5.3. The Uneven Pitching and Disputed Issues of the International AI Policy Discussions

In response to the new order and scale of risks posed by GenAI, and addressing the overrepresentation of the privileged positions, interests, and policies of a select few, Khan raises concerns about fragmented norm-setting strategies that are not only primarily concentrated in affluent Western countries but that also prioritize their economic interests. In addition, the author of “Unpacking AI Governance From the Margins” (Khan, 2024) warns about the effects of a mainstream AI policy discourse characterized by a panicked tone that overlooks structural and historical contexts. The author draws attention to the lack of incentives for incorporating the Global South (and the populations most vulnerable to GenAI’s impact) in decision-making processes and the perpetuation of opaque and exclusionary AI governance processes. In their work “‘Frontier AI,’ Power, and the Public Interest: Who Benefits, Who Decides?” David Leslie and colleagues (Leslie, Ashurst, et al., 2024) delve into the asymmetrical power dynamics in the global AI innovation ecosystem that place essential public interest decisions in the hands of a small group of private sector tech companies. The authors shine a light on a worrying discrepancy: despite the rhetoric of a pivotal socio-historical moment spurred by breakthroughs in advanced AI technologies, the policymakers and governments steering AI policy and governance are largely pandering to private sector interests and yielding to the corporate priorities of a technical ‘expert’ elite. This undermines, they claim, possibilities for the establishment of meaningful and binding statutory and regulatory interventions. As a corrective, Leslie, Ashurst, et al. call for the recognition of AI as a global public utility subject to democratic oversight, community-driven agenda-setting, and comprehensive, society-centric regulation.
Lastly, refocusing attention on the conceptual and rhetorical framings that are permeating the AI policy discussions, in “AI Safety Is a Narrative Problem,” Rachel Coldicutt (2024) from Careful Trouble in the United Kingdom explores why ‘existential risk’ discourse has become a central pillar of the mainstream media coverage of AI. In doing so, the author analyses what this narrative says about the power dynamics in AI governance and puts the spotlight on what she considers a more pressing and real existential risk: an economic elite “shaping markets and societies for their own benefit.”

These position papers prompt important policy questions: How can FM and GenAI producers be held accountable for harms in legal jurisdictions of the Majority World where their products and services have impacts but where they are not based? How can the global AI policy and governance community best foster equitable, inclusive, and transversal participatory AI policymaking? Who should control the data, compute, and skills infrastructures that enable the advancement of research and innovation practices that affect the public interest? Who should govern and decide the trajectories of advanced AI technologies that potentially hold both great promise and great peril for the future of society and the biosphere? In this respect, how can affected members of society exercise appropriate democratic agency over the trajectories of their own collective futures in shaping the role AI technologies will play in those futures?

Disclosure Statement

David Leslie and Antonella Maia Perini have no financial or non-financial disclosures to share for this article.


Abid, A., Farooqi, M., & Zou, J. (2021). Persistent anti-Muslim bias in large language models. In M. Fourcade & B. Kuipers (Eds.), AIES '21: Proceedings of the 2021 AAAI/ACM Conference on AI, Ethics, and Society (pp. 298–306). Association for Computing Machinery.

Access Now. (2024). Radiografía normativa: ¿Dónde, qué y cómo se está regulando la inteligencia artificial en América Latina?

Accountable Tech, AI Now, and EPIC. (2023). Zero trust AI governance framework. Accountable Tech.

Ada Lovelace Institute. (2023). Regulating AI in the UK: A framework for accountability and oversight.

Adadi, A., & Berrada, M. (2018). Peeking inside the black-box: A survey on explainable artificial intelligence (XAI). IEEE Access, 6, 52138–52160.

Adams, R., Alayande, A., Brey, Z., Browning, B., Gastrow, M., Jerry, K., Mathew, D., Nkosi, M., Nunoo-Mensah, H., Nyakundi, D., Odumuyiwa, V., Okunowo, O., Olbrich, P., Omar, N., Omotubora, K., Plantinga, P., Schroeder, Z., Agbemenu, A., & Uwizera, D. (2023). A new research agenda for African generative AI. Nature Human Behaviour, 7, 1839–1841.

Adebayo, J., Gilmer, J., Muelly, M., Goodfellow, I., Hardt, M., & Kim, B. (2018). Sanity checks for saliency maps. In S. Bengio, H. Wallach, H. Larochelle, K. Grauman, N. Cesa-Bianchi, & R. Garnett (Eds.), Advances in Neural Information Processing Systems (Vol. 31, pp. 9505–9515). Curran Associates.

African Union. (2019). 2019 Sharm El Sheikh Declaration STC-CICT-3.

Agarwal, C., Johnson, N., Pawelczyk, M., Krishna, S., Saxena, E., Zitnik, M., & Lakkaraju, H. (2022). Rethinking stability for attribution-based explanations. ArXiv.

Aggarwal, N. (2020). Introduction to the Special Issue on Intercultural Digital Ethics. Philosophy & Technology, 33(4), 547–550.

Ahmed, S., Jaźwińska, K., Ahlawat, A., Winecoff, A., & Wang, M. (2023). Building the epistemic community of AI safety. SSRN.

AI Safety Summit: Introduction. (2023, October 31). GOV.UK.

Aitken, M., Leslie, D., Ostmann, F., Pratt, J., Margetts, H., & Dorobantu, C. (2022). Common regulatory capacity for AI. The Alan Turing Institute.

Aïvodji, U., Arai, H., Fortineau, O., Gambs, S., Hara, S., & Tapp, A. (2019). Fairwashing: The risk of rationalization. Proceedings of Machine Learning Research, 97, 161–170.

Alan Turing Institute. (n.d.). AI Standards Hub.

Alikhademi, K., Richardson, B., Drobina, E., & Gilbert, J. E. (2021). Can explainable AI explain unfairness? A framework for evaluating explainable AI. ArXiv.

Al Rawashdeh, R., Campbell, G., & Titi, A. (2016). The socio-economic impacts of mining on local communities: The case of Jordan. The Extractive Industries and Society, 3(2), 494–507.

Alba, D. (2022, December 8). OpenAI chatbot spits out biased musings, despite guardrails. Bloomberg.

Alvarez-Melis, D., & Jaakkola, T. S. (2018). On the robustness of interpretability methods. ArXiv.

Ali, S. M. (2014). Towards a decolonial computing. In Ambiguous technologies: Philosophical issues, practical solutions, human nature (pp. 28–35). International Society of Ethics and Information Technology.

Amankwah-Amoah, J., Abdalla, S., Mogaji, E., Elbanna, A., & Dwivedi, Y. K. (2024). The impending disruption of creative industries by generative AI: Opportunities, challenges, and research agenda. International Journal of Information Management, Article 102759.

Amodei, D., Hernandez, D., Sastry, G., Clark, J., Brockman, G., & Sutskever, I. (2018, May 16). AI and compute. OpenAI.

Amoroso, R., Morelli, D., Cornia, M., Baraldi, L., Del Bimbo, A., & Cucchiara, R. (2023). Parents and children: Distinguishing multimodal deepfakes from natural images. ArXiv.

Anderljung, M., Barnhart, J., Leung, J., Korinek, A., O'Keefe, C., Whittlestone, J., Avin, S., Brundage, M., Bullock, J., Cass-Beggs, D., Chang, B., Collins, T., Fist, T., Hadfield, G., Hayes, A., Ho, L., Hooker, S., Horvitz, E., Kolt, N., ... Wolf, K. (2023b). Frontier AI regulation: Managing emerging risks to public safety. ArXiv.

Anders, C., Pasliev, P., Dombrowski, A. K., Müller, K. R., & Kessel, P. (2020). Fairwashing explanations with off-manifold detergent. Proceedings of Machine Learning Research, 119, 314–323.

Angelov, P. P., Soares, E. A., Jiang, R., Arnold, N. I., & Atkinson, P. M. (2021). Explainable artificial intelligence: An analytical review. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery, 11(5), Article e1424.

Angwin, J., Larson, J., Mattu, S., & Kirchner, L. (2016, May 23). Machine bias: There’s software used across the country to predict future criminals. And it’s biased against blacks. ProPublica.

Anton, E. M., Devese, S., Miller, J., Ullstad, F., Ruck, B. J., Trodahl, H. J., & Natali, F. (2020). Superconducting computing memory using rare-earth nitrides. In Brydon, P., Söhnel, T., & Mallett, B. (Eds.), 44th Annual Condensed Matter And Materials Meeting program and abstracts (p. 92). International Atomic Energy Agency.

Arora, S., Li, Y., Liang, Y., Ma, T., & Risteski, A. (2018). Linear algebraic structure of word senses, with applications to polysemy. In L. Lee, M. Johnson, K. Toutanova, & B. Roark (Eds.), Transactions of the Association for Computational Linguistics (Vol. 6, pp. 483–495). MIT Press.

African Union Development Agency-New Partnership for Africa's Development. (2021). AI for Africa: Artificial intelligence for Africa's socio-economic development.

Ayana, G., Dese, K., Daba, H., Mellado, B., Badu, K., Yamba, E. I., Faye, S. L., Ondua, M., Nsagha, D., Nkweteyim, D., & Kong, J. D. (2023). Decolonizing global AI governance: Assessment of the state of decolonized AI governance in Sub-Saharan Africa. SSRN.

Bai, L., Liu, X., & Su, J. (2023). ChatGPT: The cognitive effects on learning and memory. Brain‐X, 1(3), Article e30.

Bai, Y., Jones, A., Ndousse, K., Askell, A., Chen, A., DasSarma, N., Drain, D., Fort, S., Ganguli, D., Henighan, T., Joseph, N., Kadavath, S., Kernion, J., Conerly, T., El-Showk, S., Elhage, N., Hatfield-Dodds, Z., Hernandez, D., Hume, T., & Kaplan, J. (2022). Training a helpful and harmless assistant with reinforcement learning from human feedback. ArXiv.

Barlas, P., Kyriakou, K., Guest, O., Kleanthous, S., & Otterbacher, J. (2021). To "see" is to stereotype: Image tagging algorithms, gender recognition, and the accuracy-fairness trade-off. Proceedings of the ACM on Human-Computer Interaction, 4(CSCW3), Article 232.

Baron, J., Contreras, J. L., Husovec, M., Larouche, P., & Thumm, N. (2019). Making the rules: The governance of standard development organizations and their policies on intellectual property rights. JRC Science for Policy Report, EUR 29655.

Bastings, J., Ebert, S., Zablotskaia, P., Sandholm, A., & Filippova, K. (2022). "Will you find these shortcuts?" A protocol for evaluating the faithfulness of input salience methods for text classification. ArXiv.

Belrose, N., Furman, Z., Smith, L., Halawi, D., Ostrovsky, I., McKinney, L., Biderman, S., & Steinhardt, J. (2023). Eliciting latent predictions from transformers with the tuned lens. ArXiv.

Benbya, H., Strich, F., & Tamm, T. (2024). Navigating generative artificial intelligence promises and perils for knowledge and creative work. Journal of the Association for Information Systems, 25(1), 23–36.

Bender, E. M., Gebru, T., McMillan-Major, A., & Shmitchell, S. (2021). On the dangers of stochastic parrots: Can language models be too big?🦜. In FAccT '21: Proceedings of the 2021 ACM Conference on Fairness, Accountability, and Transparency (pp. 610–623). Association for Computing Machinery.

Benítez, J. M., Castro, J. L., & Requena, I. (1997). Are artificial neural networks black boxes? IEEE Transactions on Neural Networks, 8(5), 1156–1164.

Bengio, Y. (2024). Government interventions to avert future catastrophic AI risks. Harvard Data Science Review, (Special Issue 5).

Bietti, E. (2020). From ethics washing to ethics bashing: A view on tech ethics from within moral philosophy. In M. Hildebrandt & C. Castillo (Eds.), FAT* '20: Proceedings of the 2020 Conference on Fairness, Accountability, and Transparency (pp. 210–219). Association for Computing Machinery.

Bird, C., Ungless, E., & Kasirzadeh, A. (2023). Typology of risks of generative text-to-image models. In F. Rossi, S. Das, J. Davis, K. Firth-Butterfield, & A. John (Eds.), AIES '23: Proceedings of the 2023 AAAI/ACM Conference on AI, Ethics, and Society (pp. 396–410). Association for Computing Machinery.

Birhane, A. (2021). Algorithmic injustice: A relational ethics approach. Patterns, 2(2), Article 100205.

Birhane, A., Prabhu, V., Han, S., Boddeti, V., & Luccioni, S. (2024). Into the LAION’s den: Investigating hate in multimodal datasets. In A. Oh, T. Naumann, A. Globerson, K. Saenko, M. Hardt, & S. Levine (Eds.), Advances in Neural Information Processing Systems (Vol. 36, pp. 21268–21284). Curran Associates.

Birhane, A., Kasirzadeh, A., Leslie, D., & Wachter, S. (2023). Science in the age of large language models. Nature Reviews Physics, 5, 277–280.

Birhane, A., & Prabhu, V. U. (2021). Large image datasets: A pyrrhic win for computer vision? In 2021 IEEE Winter Conference on Applications of Computer Vision (WACV) (pp. 1536–1546). IEEE.

Birhane, A., & Talat, Z. (2023). It’s incomprehensible: On machine learning and decoloniality. In S. Lindgren (Ed.), Handbook of critical studies of artificial intelligence (pp. 128–140). Edward Elgar.

Boddington, P. (2020). Normative modes. In M. D. Dubber, F. Pasquale, & S. Das (Eds.), The Oxford Handbook of Ethics of AI (pp. 125–140). Oxford University Press.

Boiko, D. A., MacKnight, R., & Gomes, G. (2023). Emergent autonomous scientific research capabilities of large language models. ArXiv.

Bommasani, R., Creel, K. A., Kumar, A., Jurafsky, D., & Liang, P. S. (2022). Picking on the same person: Does algorithmic monoculture lead to outcome homogenization? In S. Koyejo, S. Mohamed, A. Agarwal, D. Belgrave, K. Cho, & A. Oh (Eds.), Advances in Neural Information Processing Systems (Vol. 35, pp. 3663–3678). Curran Associates.

Bommasani, R., Hudson, D. A., Adeli, E., Altman, R., Arora, S., von Arx, S., Bernstein, M. S., Bohg, J., Bosselut, A., Brunskill, E., Brynjolfsson, E., Buch, S., Card, D., Castellon, R., Chatterji, N., Chen, A., Creel, K., Davis, J. Q., Demis, C., … Liang, P. (2021). On the opportunities and risks of foundation models. ArXiv.

Brandom, R. (2000). Articulating reasons: An introduction to inferentialism. Harvard University Press.

Brandom, R. (2001). Making it explicit: Reasoning, representing, and discursive commitment. Harvard University Press. (Original work published 1994)

Brandom, R. (2013). Reason in philosophy: Animating ideas. Harvard University Press.

Breitfeller, L., Ahn, E., Jurgens, D., & Tsvetkov, Y. (2019). Finding microaggressions in the wild: A case for locating elusive phenomena in social media posts. In K. Inui, J. Jiang, V. Ng, & X. Wan (Eds.), Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP) (pp. 1664–1674). Association for Computational Linguistics.

Britten, A. (2023). Bletchley Declaration: Nations unite on AI risk. Significance, 20(6), 2–3.

Brown, T., Mann, B., Ryder, N., Subbiah, M., Kaplan, J. D., Dhariwal, P., Neelakantan, A., Shyam, P., Sastry, G., Askell, A., Agarwal, S., Herbert-Voss, A., Krueger, G., Henighan, T., Child, R., Ramesh, A., Ziegler, D., Wu, J., Winter, C. … Amodei, D. (2020). Language models are few-shot learners. In H. Larochelle, M. Ranzato, R. Hadsell, M.F. Balcan, & H. Lin (Eds.), Advances in Neural Information Processing Systems (Vol. 33, pp. 1877–1901). Curran Associates.

Bullard, R. D. (Ed.). (1993). Confronting environmental racism: Voices from the grassroots. South End Press.

Cao, Y., Zhou, L., Lee, S., Cabello, L., Chen, M., & Hershcovich, D. (2023). Assessing cross-cultural alignment between ChatGPT and human societies: An empirical study. ArXiv.

Cadwalladr, C., & Graham-Harrison, E. (2018, March 17). Revealed: 50 million Facebook profiles harvested for Cambridge Analytica in major data breach. The Guardian.

Cappelen, A. W., Hole, A. D., Sørensen, E. Ø., & Tungodden, B. (2007). The pluralism of fairness ideals: An experimental approach. American Economic Review, 97(3), 818–827.

Carlini, N., Ippolito, D., Jagielski, M., Lee, K., Tramer, F., & Zhang, C. (2023). Quantifying memorization across neural language models. ArXiv.

Carlini, N., Jagielski, M., Choquette-Choo, C. A., Paleka, D., Pearce, W., Anderson, H., Terzis, A., Thomas, K., & Tramèr, F. (2023). Poisoning web-scale training datasets is practical. ArXiv.

Carlini, N., Tramer, F., Wallace, E., Jagielski, M., Herbert-Voss, A., Lee, K., Roberts, A., Brown, T., Song, D., Erlingsson, Ú., Oprea, A., & Raffel, C. (2021). Extracting training data from large language models. In 30th USENIX Security Symposium (USENIX Security 21) (pp. 2633–2650). USENIX.

Carugati, C. (2023, May 11). The age of competition in generative artificial intelligence has begun. Bruegel.

Casper, S., Ezell, C., Siegmann, C., Kolt, N., Curtis, T. L., Bucknall, B., Haupt, A., Wei, K., Scheurer, J., Hobbhahn, M., Sharkey, L., Krishna, S., Von Hagen, M., Alberti, S., Chan, A., Sun, Q., Gerovitch, M., Bau, D., Tegmark, M., ... Hadfield-Menell, D. (2024). Black-box access is insufficient for rigorous AI audits. ArXiv.

Chan, A., Salganik, R., Markelius, A., Pang, C., Rajkumar, N., Krasheninnikov, D., Langosco, L., He, Z., Suan, Y., Carroll, M., Lin, M., Mayhew, A., Collins, K., Molamohammadi, M., Burden, J., Zhao, W., Rismani, S., Voudouris, K., ... Maharaj, T. (2023). Harms from increasingly agentic algorithmic systems. In FAccT '23: Proceedings of the 2023 ACM Conference on Fairness, Accountability, and Transparency (pp. 651–666). Association for Computing Machinery.

Charan, P. V., Chunduri, H., Anand, P. M., & Shukla, S. K. (2023). From text to MITRE techniques: Exploring the malicious use of large language models for generating cyber attack payloads. ArXiv.

Chen, C., Fu, J., & Lyu, L. (2023). A pathway towards responsible AI generated content. ArXiv.

Chen, H., Raj, B., Xie, X., & Wang, J. (2024). On catastrophic inheritance of large foundation models. ArXiv.

Chen, Y., Zhong, R., Ri, N., Zhao, C., He, H., Steinhardt, J., Zhou, Y., & McKeown, K. (2023). Do models explain themselves? Counterfactual simulatability of natural language explanations. ArXiv.

Christiano, P. F., Leike, J., Brown, T., Martic, M., Legg, S., & Amodei, D. (2017). Deep reinforcement learning from human preferences. In I. Guyon, U. Von Luxburg, S. Bengio, H. Wallach, R. Fergus, S. Vishwanathan, R. Garnett (Eds.), Advances in Neural Information Processing Systems (Vol. 30, pp. 4299–4307). Curran Associates.

Chuang, Y. N., Wang, G., Chang, C. Y., Tang, R., Yang, F., Du, M., Cai, X., & Hu, X. (2024). Large language models as faithful explainers. ArXiv.

Clarke, L. (2023, September 14). How Silicon Valley doomers are shaping Rishi Sunak’s AI plans. Politico.

Cohen, I. G. (2023). What should ChatGPT mean for bioethics? The American Journal of Bioethics, 23(10), 8–16.

Cohen, J. E. (2019). Between truth and power: The legal constructions of informational capitalism. Oxford University Press.

Coldicutt, R. (2024). AI safety is a narrative problem. Harvard Data Science Review, (Special Issue 5).

Collins, P. H. (1990). Black feminist thought: Knowledge, consciousness, and the politics of empowerment. Unwin Hyman.

Competition and Markets Authority. (2023). AI foundation models: Initial report.

Connor, M., & O'Neill, M. (2023). Large language models in sport science & medicine: Opportunities, risks and considerations. ArXiv.

Crawford, K. (2021). The atlas of AI: Power, politics, and the planetary costs of artificial intelligence. Yale University Press.

Crawford, K. (2024). Generative AI’s environmental costs are soaring—and mostly secret. Nature, 626, Article 693.

Critch, A., & Russell, S. (2023). TASRA: A taxonomy and analysis of societal-scale risks from AI. ArXiv.

Cunningham, H., Ewart, A., Riggs, L., Huben, R., & Sharkey, L. (2023). Sparse autoencoders find highly interpretable features in language models. ArXiv.

Davani, A. M., Atari, M., Kennedy, B., & Dehghani, M. (2023). Hate speech classifiers learn normative social stereotypes. Transactions of the Association for Computational Linguistics, 11, 300–319.

De Angelis, L., Baglivo, F., Arzilli, G., Privitera, G. P., Ferragina, P., Tozzi, A. E., & Rizzo, C. (2023). ChatGPT and the rise of large language models: The new AI-driven infodemic threat in public health. Frontiers in Public Health, 11, Article 1166120.

Derner, E., & Batistič, K. (2023). Beyond the safeguards: Exploring the security risks of ChatGPT. ArXiv.

Dev, S., Monajatipoor, M., Ovalle, A., Subramonian, A., Phillips, J. M., & Chang, K. W. (2021). Harms of gender exclusivity and challenges in non-binary representation in language technologies. ArXiv.

Dinan, E., Abercrombie, G., Bergman, A. S., Spruit, S., Hovy, D., Boureau, Y. L., & Rieser, V. (2021). Anticipating safety issues in E2E conversational AI: Framework and tooling. ArXiv.

Dodge, J., Sap, M., Marasović, A., Agnew, W., Ilharco, G., Groeneveld, D., Mitchell, M., & Gardner, M. (2021). Documenting large webtext corpora: A case study on the Colossal Clean Crawled Corpus. In M.-F. Moens, X. Huang, L. Specia, & S. Wen-tau Yih (Eds.), Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing (pp. 1286–1305). Association for Computational Linguistics.

Domínguez Hernández, A., Krishna, S., Perini, A. M., Katell, M., Bennett, S. J., Borda, A., Hashem, Y., Hadjiloizou, S., Mahomed, S., Jayadeva, S., Aitken, M., & Leslie, D. (2024). Mapping the individual, social, and biospheric impacts of foundation models. In FAccT ’24: Proceedings of the 2024 ACM Conference on Fairness, Accountability, and Transparency, June 3–6, 2024, Rio de Janeiro, Brazil. Association for Computing Machinery. Forthcoming.

Dong, Q., Li, L., Dai, D., Zheng, C., Wu, Z., Chang, B., Sun, X., Xu, J., & Sui, Z. (2022). A survey on in-context learning. ArXiv.

Doshi-Velez, F., & Kim, B. (2017). Towards a rigorous science of interpretable machine learning. ArXiv.

Dussel, E. D. (2012). Transmodernity and interculturality: An interpretation from the perspective of philosophy of liberation. TRANSMODERNITY: Journal of Peripheral Cultural Production of the Luso-Hispanic World, 1(3).

Elhage, N., Hume, T., Olsson, C., Schiefer, N., Henighan, T., Kravec, S., Hatfield-Dodds, Z., Lasenby, R., Drain, D., Chen, C., Grosse, R., McCandlish, S., Kaplan, J., Amodei, D., Wattenberg, M., & Olah, C. (2022). Toy models of superposition. ArXiv.

Eloundou, T., Manning, S., Mishkin, P., & Rock, D. (2023). GPTs are GPTs: An early look at the labor market impact potential of large language models. ArXiv.

Ess, C. (2006). Ethical pluralism and global information ethics. Ethics and Information Technology, 8, 215–226.

European Commission. (2020). On artificial intelligence - A European approach to excellence and trust [White paper].

European Commission. (2022). A notification under Article 12 of Regulation (EU) No 1025/2012. Directorate-General for Internal Market, Industry, Entrepreneurship and SMEs.

European Parliament. (2023, December 9). Artificial Intelligence Act: deal on comprehensive rules for trustworthy AI [Press release].

European Parliament and Council. (2012). Regulation (EU) No 1025/2012 of the European Parliament and of the Council of 25 October 2012 on European standardisation, amending Council Directives 89/686/EEC and 93/15/EEC and Directives 94/9/EC, 94/25/EC, 95/16/EC, 97/23/EC, 98/34/EC, 2004/22/EC, 2007/23/EC, 2009/23/EC and 2009/105/EC of the European Parliament and of the Council and repealing Council Decision 87/95/EEC and Decision No 1673/2006/EC of the European Parliament and of the Council (Text with EEA relevance). Official Journal of the European Union, L 316(12), 12–33.

Farina, M., & Lavazza, A. (2023). ChatGPT in society: Emerging issues. Frontiers in Artificial Intelligence, 6, Article 1130913.

Fecher, B., Hebing, M., Laufer, M., Pohle, J., & Sofsky, F. (2023). Friend or foe? Exploring the implications of large language models on the science system. AI & Society, 1–13.

Fendji, J. L. K. E. (2024). From left behind to left out: Generative AI or the next pain of the unconnected. Harvard Data Science Review, (Special Issue 5).

Ferrario, A., & Loi, M. (2022). How explainability contributes to trust in AI. In FAccT '22: Proceedings of the 2022 ACM Conference on Fairness, Accountability, and Transparency (pp. 1457–1466). Association for Computing Machinery.

Ferri, G., & Gloerich, I. (2023). Risk and harm: Unpacking ideologies in the AI discourse. In M. Lee & C. Munteanu (Eds.), CUI '23: Proceedings of the 5th International Conference on Conversational User Interfaces, Article 28. Association for Computing Machinery.

Fish, R. (2019, March 15). Can ethics be standardized? Creating modern standards for ethical autonomous and intelligent systems. IEEE Communications Standards Magazine.

Fjeld, J., Achten, N., Hilligoss, H., Nagy, A., & Srikumar, M. (2020). Principled artificial intelligence: Mapping consensus in ethical and rights-based approaches to principles for AI. Berkman Klein Center Research Publication No. 2020-1.

Floridi, L. (2019). Translating principles into practices of digital ethics: Five risks of being unethical. Philosophy & Technology, 32(2), 185–193.

Frey, C. B., & Osborne, M. (2023). Generative AI and the future of work: A reappraisal. Brown Journal of World Affairs, 30(1), 1–17.

Fui-Hoon Nah, F., Zheng, R., Cai, J., Siau, K., & Chen, L. (2023). Generative AI and ChatGPT: Applications, challenges, and AI-human collaboration. Journal of Information Technology Case and Application Research, 25(3), 277–304.

Future of Life Institute. (2023, March 22). Pause giant AI experiments: An open letter.

Garfinkel, B., Anderljung, M., Heim, L., Trager, R., Clifford, B., & Seger, E. (2024). Goals for the Second AI Safety Summit. Centre for the Governance of AI.

Garrido, R. (2024). Data protection and generative AI: An inconclusive answer. Harvard Data Science Review, (Special Issue 5).

Gebru, T., Bender, E., McMillan-Major, A., & Mitchell, M. (2023). Statement from the listed authors of Stochastic Parrots on the “AI pause” letter. DAIR Institute.

Gebru, T., & Torres, E. (2024). The TESCREAL bundle: Eugenics and the promise of utopia through artificial general intelligence. First Monday, 29(4).

Gehman, S., Gururangan, S., Sap, M., Choi, Y., & Smith, N. A. (2020). RealToxicityPrompts: Evaluating neural toxic degeneration in language models. ArXiv.

Geva, M., Caciularu, A., Wang, K. R., & Goldberg, Y. (2022). Transformer feed-forward layers build predictions by promoting concepts in the vocabulary space. ArXiv.

Ghassemi, M., Oakden-Rayner, L., & Beam, A. L. (2021). The false hope of current approaches to explainable artificial intelligence in health care. The Lancet Digital Health, 3(11), e745–e750.

Ghorbani, A., Abid, A., & Zou, J. (2019). Interpretation of neural networks is fragile. Proceedings of the AAAI Conference on Artificial Intelligence, 33(1), 3681–3688.

Gilpin, L. H., Bau, D., Yuan, B. Z., Bajwa, A., Specter, M., & Kagal, L. (2018, October). Explaining explanations: An overview of interpretability of machine learning. In 2018 IEEE 5th International Conference on Data Science and Advanced Analytics (DSAA) (pp. 80–89). IEEE.

Glukhov, D., Shumailov, I., Gal, Y., Papernot, N., & Papyan, V. (2023). LLM censorship: A machine learning challenge or a computer security problem? ArXiv.

Gmyrek, P., Berg, J., & Bescond, D. (2023). Generative AI and jobs: A global analysis of potential effects on job quantity and quality. ILO Working Paper 96.

Gruetzemacher, R., Chan, A., Frazier, K., Manning, C., Los, Š., Fox, J., Hernández-Orallo, J., Burden, J., Franklin, M., Ní Ghuidhir, C., Bailey, M., Eth, D., Pilditch, T., & Kilian, K. (2023). An international consortium for evaluations of societal-scale risks from advanced AI. ArXiv.

Greene, D., Hoffmann, A. L., & Stark, L. (2019). Better, nicer, clearer, fairer: A critical assessment of the movement for ethical artificial intelligence and machine learning. In T. Bui (Ed)., Proceedings of the 52nd Hawaii International Conference on System Sciences (pp. 2122–2131). HICSS.

Gupta, U., Kim, Y. G., Lee, S., Tse, J., Lee, H.-H. S., Wei, G.-Y., Brooks, D., & Wu, C.-J. (2021). Chasing carbon: The elusive environmental footprint of computing. In 2021 IEEE International Symposium on High-Performance Computer Architecture (HPCA) (pp. 854–867). IEEE.

Gurnee, W., Nanda, N., Pauly, M., Harvey, K., Troitskii, D., & Bertsimas, D. (2023). Finding neurons in a haystack: Case studies with sparse probing. ArXiv.

Guzik, T., & Sitek, A. (2023). Global accord on the integration of artificial intelligence in medical science publishing: Implications of the Bletchley Declaration. Cardiovascular Research, 119(17), 2681–2682.

Hagemann, R., Huddleston Skees, J., & Thierer, A. (2018). Soft law for hard problems: The governance of emerging technologies in an uncertain future. Colorado Technology Law Journal, 17(1), 37.

Hagendorff, T. (2020). The ethics of AI ethics: An evaluation of guidelines. Minds and Machines, 30(1), 99–120.

Hanna, A., & Bender, E. M. (2023, August 12). AI causes real harm. Let’s focus on that over the end-of-humanity hype. Scientific American.

Hao, K. (2019, November 11). The computing power needed to train AI is now rising seven times faster than ever before. MIT Technology Review.

Hardinges, J., Simperl, E., & Shadbolt, N. (2023). We must fix the lack of transparency around the data used to train foundation models. Harvard Data Science Review, (Special Issue 5).

Hartvigsen, T., Gabriel, S., Palangi, H., Sap, M., Ray, D., & Kamar, E. (2022). ToxiGen: A large-scale machine-generated dataset for adversarial and implicit hate speech detection. ArXiv.

Hawes, B., & Hall, D. W. (2023). After the Summit: Progress in public policy on AI. University of Southampton.

Hazell, J. (2023). Spear phishing with large language models. ArXiv.

Helfrich, G. (2024). The harms of terminology: Why we should reject so-called “frontier AI.” AI and Ethics.

Henderson, P., Li, X., Jurafsky, D., Hashimoto, T., Lemley, M. A., & Liang, P. (2023). Foundation models and fair use. ArXiv.

Hendrycks, D., Mazeika, M., & Woodside, T. (2023). An overview of catastrophic AI risks. ArXiv.

Herkert, J., Marchant, G., & Allenby, B. R. (2011). The growing gap between emerging technologies and legal-ethical oversight: The pacing problem. Springer.

High-Level Expert Group on Artificial Intelligence, European Commission (2019a, April 8). Ethics guidelines for trustworthy AI.

High-Level Expert Group on Artificial Intelligence, European Commission (2019b, June 26). Policy and investment recommendations for trustworthy artificial intelligence.

Hill, K. (2021, November 2). The secretive company that might end privacy as we know it. The New York Times.

Ho, L., Barnhart, J., Trager, R., Bengio, Y., Brundage, M., Carnegie, A., Chowdhury, R., Dafoe, A., Hadfield, G., Levi, M., & Snidal, D. (2023). International institutions for advanced AI. ArXiv.

Hoffmann, J., Borgeaud, S., Mensch, A., Buchatskaya, E., Cai, T., Rutherford, E., de Las Casas, D., Hendricks, L. A., Welbl, J., Clark, A., Hennigan, T., Noland, E., Millican, K., van den Driessche, G., Damoc, B., Guy, A., Osindero, S., Simonyan, K., Elsen, E., … Sifre, L. (2022). Training compute-optimal large language models. ArXiv.

Hollingshead, W., Quan-Haase, A., & Chen, W. (2021). Ethics and privacy in computational social science: A call for pedagogy. In U. Engel, A. Quan-Haase, S. X. Liu & L. Lyberg (Eds.), Handbook of Computational Social Science, Volume 1, (pp. 171–185). Routledge.

Hopster, J. K., & Maas, M. M. (2023). The technology triad: Disruptive AI, regulatory gaps and value change. AI and Ethics, 1–19.

Huang, S., & Siddarth, D. (2023). Generative AI and the digital commons. ArXiv.

Huang, X., Ruan, W., Huang, W., Jin, G., Dong, Y., Wu, C., Xu, P., Wu, D., Freitas, A., & Mustafa, M. A. (2023). A survey of safety and trustworthiness of large language models through the lens of verification and validation. ArXiv.

Hui, X., Reshef, O., & Zhou, L. (2023). The short-term effects of generative artificial intelligence on employment: Evidence from an online labor market. SSRN.

Hutchinson, B., Prabhakaran, V., Denton, E., Webster, K., Zhong, Y., & Denuyl, S. (2020). Social biases in NLP models as barriers for persons with disabilities. ArXiv.

IEEE Global Initiative on Ethics of Autonomous and Intelligent Systems. (2017, December 12). Ethically aligned design: A vision for prioritizing human well-being with autonomous and intelligent systems. Institute of Electrical and Electronics Engineers.

IEEE Standards Association (2021). IEEE standard model process for addressing ethical concerns during system design. Institute of Electrical and Electronics Engineers.

Information Commissioner’s Office. (n.d.). Data protection principles - guidance and resources.

Information Commissioner’s Office/Alan Turing Institute. (2020). Explaining decisions made with AI.

Institute of Electrical and Electronics Engineers Standards Association. (2018, August 29). The IEEE Global Initiative on Ethics of Autonomous and Intelligent Systems.

International Organization for Standardization. (2022). Information technology – Artificial intelligence: Overview of ethical and societal concerns. (ISO/IEC TR 24368:2022).

International Organization for Standardization, & International Electrotechnical Commission. (2021). Information technology — Artificial intelligence (AI) — Bias in AI systems and AI aided decision making. (ISO/IEC Standard No. 24027:2021).

International Organization for Standardization, & International Electrotechnical Commission. (2022). Information technology — Artificial intelligence — Overview of ethical and societal concerns (ISO/IEC Standard No. 24368:2022).

International Organization for Standardization, & United Nations Industrial Development Organization. (2021). ISO 31000:2018 - Risk management: A practical guide.

Iversen, E. J., Vedel, T., & Werle, R. (2004). Standardization and the democratic design of information and communication technology. Knowledge, Technology & Policy, 17(2), 104–126.

Jeyakumar, J. V., Noor, J., Cheng, Y. H., Garcia, L., & Srivastava, M. (2020). How can I explain this to you? An empirical study of deep neural network explanation methods. In H. Larochelle, M. Ranzato, R. Hadsell, M.F. Balcan, & H. Lin (Eds.), Advances in Neural Information Processing Systems (Vol. 33, pp. 4211–4222). Curran Associates.

Ji, Z., Lee, N., Frieske, R., Yu, T., Su, D., Xu, Y., Ishii, E., Bang, Y. J., Madotto, A., & Fung, P. (2023). Survey of hallucination in natural language generation. ACM Computing Surveys, 55(12), Article 248.

Jiang, H. H., Brown, L., Cheng, J., Khan, M., Gupta, A., Workman, D., Hanna, A., Flowers, J., & Gebru, T. (2023). AI art and its impact on artists. In F. Rossi, S. Das, J. Davis, K. Firth-Butterfield, & A. John (Eds.), AIES '23: Proceedings of the 2023 AAAI/ACM Conference on AI, Ethics, and Society (pp. 363–374). Association for Computing Machinery.

Jin, Y., Chandra, M., Verma, G., Hu, Y., De Choudhury, M., & Kumar, S. (2023). Better to ask in English: Cross-lingual evaluation of large language models for healthcare queries. ArXiv.

Jobin, A., Ienca, M., & Vayena, E. (2019). The global landscape of AI ethics guidelines. Nature Machine Intelligence, 1(9), 389–399.

Johnson, R. L., Pistilli, G., Menéndez-González, N., Duran, L. D. D., Panai, E., Kalpokiene, J., & Bertulfo, D. J. (2022). The Ghost in the Machine has an American accent: Value conflict in GPT-3. ArXiv.

Kaddour, J., Harris, J., Mozes, M., Bradley, H., Raileanu, R., & McHardy, R. (2023). Challenges and applications of large language models. ArXiv.

Kang, D., Li, X., Stoica, I., Guestrin, C., Zaharia, M., & Hashimoto, T. (2023). Exploiting programmatic behavior of LLMs: Dual-use through standard security attacks. ArXiv.

Kaplan, J., McCandlish, S., Henighan, T., Brown, T. B., Chess, B., Child, R., Gray, S., Radford, A., Wu, J., & Amodei, D. (2020). Scaling laws for neural language models. ArXiv.

Karanjai, R. (2022). Targeted phishing campaigns using large scale language models. ArXiv.

Khan, S. (2024). Unpacking AI governance from the margins. Harvard Data Science Review, (Special Issue 5).

Khowaja, S. A., Khuwaja, P., & Dev, K. (2023). ChatGPT needs SPADE (Sustainability, PrivAcy, Digital Divide, and Ethics) evaluation: A review. ArXiv.

Kindermans, P. J., Hooker, S., Adebayo, J., Alber, M., Schütt, K. T., Dähne, S., Erhan, D., & Kim, B. (2019). The (un) reliability of saliency methods. In W. Samek, G. Montavon, A. Vedaldi, L. K. Hansen, & K.-R. Müller (Eds.), Explainable AI: Interpreting, explaining and visualizing deep learning (pp. 267–280). Springer.

Kirk, H. R., Vidgen, B., Röttger, P., & Hale, S. A. (2023). Personalisation within bounds: A risk taxonomy and policy framework for the alignment of large language models with personalised feedback. ArXiv.

Kirk, H. R., Jun, Y., Volpin, F., Iqbal, H., Benussi, E., Dreyer, F., Shtedritski, A. & Asano, Y. (2021). Bias out-of-the-box: An empirical analysis of intersectional occupational biases in popular generative language models. In M. Ranzato, A. Beygelzimer, Y. Dauphin, P.S. Liang, & J. Wortman Vaughan (Eds.), Advances in Neural Information Processing Systems (Vol. 34, pp. 2611–2624). Curran Associates.

Klepper, D. (2023, January 24). It turns out that ChatGPT is really good at creating online propaganda: ‘I think what’s clear is that in the wrong hands there’s going to be a lot of trouble.’ Fortune.

Kneese, T., & Young, M. (2024). Carbon emissions in the tailpipe of generative AI. Harvard Data Science Review, (Special Issue 5).

Kohnke, A., Laidlaw, G., & Wilson, C. (2021). Challenges in bridging the law enforcement capability gap. In International Conference on Cyber Warfare and Security (pp. 521-XII). Academic Conferences International Limited.

Kokalj, E., Škrlj, B., Lavrač, N., Pollak, S., & Robnik-Šikonja, M. (2021). BERT meets Shapley: Extending SHAP explanations to transformer-based classifiers. In H. Toivonen & M. Boggia (Eds.), Proceedings of the EACL Hackashop on News Media Content Analysis and Automated Report Generation (pp. 16–21). Association for Computational Linguistics.

Kotek, H., Dockum, R., & Sun, D. (2023). Gender bias and stereotypes in large language models. In M. Bernstein, S. Savage, & A. Bozzon (Eds.), CI '23: Proceedings of The ACM Collective Intelligence Conference (pp. 12–24). Association for Computing Machinery.

Krishnan, A. (2021, April 15). Decolonial humanitarian digital governance. Berkman Klein Center Collection. Medium.

Kshetri, N., Dwivedi, Y. K., Davenport, T. H., & Panteli, N. (2023). Generative artificial intelligence in marketing: Applications, opportunities, challenges, and research agenda. International Journal of Information Management, 75, Article 102716.

Kurenkov, A. (2022, June 12). Lessons from the GPT-4Chan Controversy. The Gradient.

Lakkaraju, H., & Bastani, O. (2020). "How do I fool you?" Manipulating user trust via misleading black box explanations. In A. Markham, J. Powles, T. Walsh, & A. L. Washington (Eds.), AIES '20: Proceedings of the AAAI/ACM Conference on AI, Ethics, and Society (pp. 79–85). Association for Computing Machinery.

Lancieri, F. (2022). Narrowing data protection's enforcement gap. Maine Law Review, 74(1), 15.

Lanham, T., Chen, A., Radhakrishnan, A., Steiner, B., Denison, C., Hernandez, D., Li, D., Durmus, E., Hubinger, E., Kernion, J., Lukosiute, K., Nguyen, K., Cheng, N., Joseph, N., Schiefer, N., Rausch, O., Larson, R., McCandlish, S., Kundu, S., … Perez, E. (2023). Measuring faithfulness in chain-of-thought reasoning. ArXiv.

Lappin, S. (2024). Assessing the strengths and weaknesses of large language models. Journal of Logic, Language and Information, 33(1), 9–20.

Lassman, P. (2011). Pluralism. Polity.

Laugel, T., Lesot, M. J., Marsala, C., Renard, X., & Detyniecki, M. (2019). The dangers of post-hoc interpretability: Unjustified counterfactual explanations. ArXiv.

Lazar, S., & Nelson, A. (2023). AI safety on whose terms? Science, 381(6654), 138.

Lazer, D., Kennedy, R., King, G., & Vespignani, A. (2014). The parable of Google Flu: Traps in big data analysis. Science, 343(6176), 1203–1205.

Lee, J., Eom, S. Y., & Lee, J. (2023). Empowering game designers with generative AI. IADIS International Journal on Computer Science & Information Systems, 18(2), 213–230.

Lee, S., Gil de Zúñiga, H., & Munger, K. (2023). Antecedents and consequences of fake news exposure: A two-panel study on how news use and different indicators of fake news exposure affect media trust. Human Communication Research, 49(4), 408–420.

Lee, J., & Shin, S. Y. (2022). Something that they never said: Multimodal disinformation and source vividness in understanding the power of AI-enabled deepfake news. Media Psychology, 25(4), 531–546.

Lemley, M. A., & Casey, B. (2020). Fair learning. Texas Law Review, 99(4), 743–785.

Leslie, D. (2020). Tackling COVID-19 through responsible AI innovation: Five steps in the right direction. Harvard Data Science Review, (Special Issue 1).

Leslie, D. (2023). Does the sun rise for ChatGPT? Scientific discovery in the age of generative AI. AI and Ethics, 1–6.

Leslie, D., Ashurst, C., Menéndez González, N., Griffiths, F., Jayadeva, S., Jorgensen, M., Katell, M., Krishna, S., Kwiatkowski, D., Iglésias Martins, C., Mohammed, S., Mougan, C. Pandit, S., Richey, M., Sakshaug, J., Vallor, S., & Vilain, L. (2024). “Frontier AI,” power, and the public interest: Who benefits, who decides? Harvard Data Science Review, (Special Issue 5).

Leslie, D., & Shaw, P. (2024). Context really matters: The law, ethics, and AI. In M. Hervey & M. Lavy (Eds.), The law of artificial intelligence (pp. 31–59). Sweet & Maxwell.

Leslie, D., & Rossi, F. (2023). Generative artificial intelligence. ACM Tech Briefs, (8), 1–4.

Leslie, D., Katell, M., Aitken, M., Singh, J., Briggs, M., Powell, R., Rincón, C., Chengeta, T., Birhane, A., Perini, A., Jayadeva, S., & Mazumder, A. (2022). Advancing data justice research and practice: An integrated literature review. SSRN.

Leslie, D., Rincón, C., Briggs, M., Perini, A., Jayadeva, S., Borda, A., Bennett, S. J., Burr, C., Aitken, M., Wong, J., Mahomed, S., & Waller, M. (2024). AI explainability in practice. The Alan Turing Institute.

Levinstein, B. A., & Herrmann, D. A. (2024). Still no lie detector for language models: Probing empirical and conceptual roadblocks. Philosophical Studies.

Li, K., Patel, O., Viégas, F., Pfister, H., & Wattenberg, M. (2024). Inference-time intervention: Eliciting truthful answers from a language model. In A. Oh, T. Naumann, A. Globerson, K. Saenko, M. Hardt, S. Levine (Eds.), Advances in Neural Information Processing Systems (Vol. 36, pp. 41451–41530). Curran Associates.

Li, P., Yang, J., Islam, M. A., & Ren, S. (2023). Making AI less "thirsty": Uncovering and addressing the secret water footprint of AI models. ArXiv.

Liao, Q. V., & Vaughan, J. W. (2024). AI transparency in the age of LLMs: A human-centered research roadmap. Harvard Data Science Review, (Special Issue 5).

Liao, Q. V., Zhang, Y., Luss, R., Doshi-Velez, F., & Dhurandhar, A. (2022, October). Connecting algorithmic research and usage contexts: A perspective of contextualized evaluation for explainable AI. Proceedings of the Tenth AAAI Conference on Human Computation and Crowdsourcing, 10, 147–159.

Lieberum, T., Rahtz, M., Kramár, J., Irving, G., Shah, R., & Mikulik, V. (2023). Does circuit analysis interpretability scale? Evidence from multiple choice capabilities in Chinchilla. ArXiv.

Lin, J., Zhao, H., Zhang, A., Wu, Y., Ping, H., & Chen, Q. (2023). AgentSims: An open-source sandbox for large language model evaluation. ArXiv.

Liu, W., Wang, X., Wu, M., Li, T., Lv, C., Ling, Z., Zhu, J., Zhang, C., Zheng, X., & Huang, X. (2023). Aligning large language models with human preferences through representation engineering. ArXiv.

Lu, S., Bigoulaeva, I., Sachdeva, R., Madabushi, H. T., & Gurevych, I. (2023). Are emergent abilities in large language models just in-context learning? ArXiv.

Lucchi, N. (2023). ChatGPT: A case study on copyright challenges for generative artificial intelligence systems. European Journal of Risk Regulation, 1–23. Advance online publication.

Luccioni, A. S., Viguier, S., & Ligozat, A. L. (2023). Estimating the carbon footprint of BLOOM, a 176b parameter language model. Journal of Machine Learning Research, 24(1), Article 253.

Lukas, N., Salem, A., Sim, R., Tople, S., Wutschitz, L., & Zanella-Béguelin, S. (2023, May). Analyzing leakage of personally identifiable information in language models. In 2023 IEEE Symposium on Security and Privacy (SP) (pp. 346–363). IEEE.

Lund, B. D., Wang, T., Mannuru, N. R., Nie, B., Shimray, S., & Wang, Z. (2023). ChatGPT and a new academic reality: Artificial Intelligence‐written research papers and the ethics of the large language models in scholarly publishing. Journal of the Association for Information Science and Technology, 74(5), 570–581.

Luo, H., & Specia, L. (2024). From understanding to utilization: A survey on explainability for large language models. ArXiv.

Lynskey, O. (2023). Regulating for the future: The law’s enforcement deficit. Studies: An Irish Quarterly Review, 112(445), 104–119.

Lyu, Q., Havaldar, S., Stein, A., Zhang, L., Rao, D., Wong, E., Apidianak, M., & Callison-Burch, C. (2023). Faithful chain-of-thought reasoning. ArXiv.

Ma, W., Scheible, H., Wang, B. C., Veeramachaneni, G., Chowdhary, P., Sun, A., Koulogeorge, A., Wang, L., Yang, D., & Vosoughi, S. (2023, December). Deciphering stereotypes in pre-trained language models. In H. Bouamor, J. Pino, & K. Bali (Eds.), The 2023 Conference on Empirical Methods in Natural Language Processing (pp. 11328–11345). Association for Computational Linguistics.

Madsen, R., & Strong, T. B. (Eds.). (2009). The many and the one: Religious and secular perspectives on ethical pluralism in the modern world. Princeton University Press.

Magalhães, J. C., & Couldry, N. (2020, April 27). Tech giants are using this crisis to colonize the welfare system. Jacobin.

Mahowald, K., Ivanova, A. A., Blank, I. A., Kanwisher, N., Tenenbaum, J. B., & Fedorenko, E. (2023). Dissociating language and thought in large language models: A cognitive perspective. ArXiv.

Mandi, Z., Jain, S., & Song, S. (2023). Roco: Dialectic multi-robot collaboration with large language models. ArXiv.

Marchant, G. (2019). “Soft Law” governance of artificial intelligence. UCLA: The Program on Understanding Law, Science, and Evidence (PULSE).

Marchant, G., Tournas, L., & Gutierrez, C. I. (2020). Governing emerging technologies through soft law: Lessons for artificial intelligence—An introduction. Jurimetrics, 61(1), 1–18.

Marks, S., & Tegmark, M. (2023). The geometry of truth: Emergent linear structure in large language model representations of true/false datasets. ArXiv.

Martínez, G., Watson, L., Reviriego, P., Hernández, J. A., Juarez, M., & Sarkar, R. (2023). Towards understanding the interplay of generative artificial intelligence and the internet. ArXiv.

Mascellino, A. (2023, January 9). ChatGPT used to develop new malicious tools. Infosecurity Magazine.

Masood, M., Nawaz, M., Malik, K. M., Javed, A., Irtaza, A., & Malik, H. (2023). Deepfakes generation and detection: State-of-the-art, open challenges, countermeasures, and way forward. Applied Intelligence, 53(4), 3974–4026.

Mattas, P. S. (2023). ChatGPT: A study of AI language processing and its implications. International Journal of Research Publications and Reviews, 4(2), 435–440.

McCallum, S. (2023, April 3). ChatGPT banned in Italy over privacy concerns. BBC News.

McGibbon, S., & Solomon, N. (2024). AI and creative work. Harvard Data Science Review, (Special Issue 5).

McGrath, T., Rahtz, M., Kramar, J., Mikulik, V., & Legg, S. (2023). The hydra effect: Emergent self-repair in language model computations. ArXiv.

McIntosh, T. R., Susnjak, T., Liu, T., Watters, P., & Halgamuge, M. N. (2023). From Google Gemini to OpenAI Q*(Q-Star): A survey of reshaping the generative artificial intelligence (AI) research landscape. ArXiv.

Meltzer, J. P. (2023). Toward international cooperation on foundational AI models: An expanded role for trade agreements and international economic policy. Brookings Institution.

Meltzer, J. P. (2024). Toward international cooperation on foundational AI models: An expanded role for trade agreements and international economic policy. Harvard Data Science Review, (Special Issue 5).

Meng, K., Bau, D., Andonian, A., & Belinkov, Y. (2022). Locating and editing factual associations in GPT. In S. Koyejo, S. Mohamed, A. Agarwal, D. Belgrave, K. Cho, & A. Oh (Eds.), Advances in Neural Information Processing Systems (Vol. 35, pp. 17359–17372). Curran Associates.

Meng, X. L. (2018). Statistical paradises and paradoxes in big data (I): Law of large populations, big data paradox, and the 2016 US presidential election. The Annals of Applied Statistics, 12(2).

Metcalf, J., & Singh, R. (2023). Scaling up mischief: Red-teaming AI and distributing governance. Harvard Data Science Review, (Special Issue 5).

Metcalf, J., Moss, E., & boyd, d. (2019). Owning ethics: Corporate logics, Silicon Valley, and the institutionalization of ethics. Social Research: An International Quarterly, 86(2), 449–476.

Mhlambi, S. (2020). From rationality to relationality: Ubuntu as an ethical and human rights framework for artificial intelligence governance. Carr Center Discussion Paper Series No. 2020–009.

Mhlanga, D. (2023). Open AI in education, the responsible and ethical use of ChatGPT towards lifelong learning. In FinTech and artificial intelligence for sustainable development (pp. 387–409). Palgrave Macmillan, Cham.

Milmo, D. (2023, February 2). ChatGPT reaches 100 million users two months after launch. The Guardian.

Mittelstadt, B. (2019). Principles alone cannot guarantee ethical AI. Nature Machine Intelligence, 1(11), 501–507.

Mittelstadt, B., Russell, C., & Wachter, S. (2019, January). Explaining explanations in AI. In FAT* '19: Proceedings of the Conference on Fairness, Accountability, and Transparency (pp. 279–288). Association for Computing Machinery.

Mohamed, S., Png, M.-T., & Isaac, W. (2020). Decolonial AI: Decolonial theory as sociotechnical foresight in artificial intelligence. Philosophy & Technology, 33(4), 659–684. 

Moltzau, A., & Prabhu, R. (2024). Castles in the sand?: How the public sector and academia can partner in regulatory sandboxes to help leverage generative AI for public good. Harvard Data Science Review, (Special Issue 5).

Mondorf, P., & Plank, B. (2024). Beyond accuracy: Evaluating the reasoning behavior of large language models—A survey. ArXiv.

Morris, J. X., Kuleshov, V., Shmatikov, V., & Rush, A. M. (2023). Text embeddings reveal (almost) as much as text. ArXiv.

Moses, L. B. (2011). Agents of change: How the law 'copes' with technological change. Griffith Law Review, 20(4), 763–794.

Mozes, M., He, X., Kleinberg, B., & Griffin, L. D. (2023). Use of LLMs for illicit purposes: Threats, prevention measures, and vulnerabilities. ArXiv.

Mubangizi, J. C. (2022). A human rights-based approach to the use and regulation of artificial intelligence – An African perspective. Journal of Southwest Jiaotong University, 57(4).

Munn, L. (2023). The uselessness of AI ethics. AI and Ethics, 3(3), 869–877.

National Institute of Standards and Technology. (2023, January). Artificial Intelligence Risk Management Framework (AI RMF 1.0). U.S. Department of Commerce.

Nekoto, W., Marivate, V., Matsila, T., Fasubaa, T., Kolawole, T., Fagbohungbe, T., Akinola, S. O., Muhammad, S. H., Kabongo, S., Osei, S., Freshia, S., Niyongabo, R. A., Macharm, R., Ogayo, P., Ahia, O., Meressa, M., Adeyemi, M., Mokgesi-Selinga, M., Okegbemi, L., … Bashir, A. (2020). Participatory research for low-resourced machine translation: A case study in African languages. ArXiv.

Nemitz, P. (2018). Constitutional democracy and technology in the age of artificial intelligence. Philosophical Transactions of the Royal Society A: Mathematical, Physical and Engineering Sciences, 376(2133).

Nixon, R. (2011). Slow violence and the environmentalism of the poor. Harvard University Press.

Nozza, D., Bianchi, F., Lauscher, A., & Hovy, D. (2022). Measuring harmful sentence completion in language models for LGBTQIA+ individuals. In B. Raja Chakravarthi, B. Bharathi, J. P. McCrae, M. Zarrouk, K. Bali, & P. Buitelaar (Eds.), Proceedings of the Second Workshop on Language Technology for Equality, Diversity and Inclusion (pp. 26–34). Association for Computational Linguistics.

Noy, S., & Zhang, W. (2023). Experimental evidence on the productivity effects of generative artificial intelligence. Science, 381(6654), 187–192.

OECD Artificial Intelligence Policy Observatory. (2024). OECD AI Principles overview. Organisation for Economic Co-operation and Development.

Olah, C., Cammarata, N., Schubert, L., Goh, G., Petrov, M., & Carter, S. (2020). Zoom in: An introduction to circuits. Distill, 5(3), Article e00024-001.

Olsson, C., Elhage, N., Nanda, N., Joseph, N., DasSarma, N., Henighan, T., Mann, B., Askell, A., Bai, Y., Chen, A., Conerly, T., Drain, D., Ganguli, D., Hatfield-Dodds, Z., Hernandez, D., Johnston, S., Jones, A., Kernion, J., Lovitt, L., ... Olah, C. (2022). In-context learning and induction heads. ArXiv.

Omiye, J. A., Lester, J. C., Spichak, S., Rotemberg, V., & Daneshjou, R. (2023). Large language models propagate race-based medicine. NPJ Digital Medicine, 6(1), Article 195.

O’Neill, M., & Connor, M. (2023). Amplifying limitations, harms and risks of large language models. ArXiv.

Oremus, W., & Izadi, E. (2024, January 4). AI’s future could hinge on one thorny legal question. The Washington Post.

Observatory of Public Sector Innovation (2022, March 22). The strategic and responsible use of artificial intelligence in the public sector of Latin America and the Caribbean. Organisation for Economic Co-operation and Development.

Ouyang, L., Wu, J., Jiang, X., Almeida, D., Wainwright, C., Mishkin, P., Zhang, C., Agarwal, S., Slama, K., Ray, A., Schulman, J., Hilton, J., Kelton, F., Miller, L., Simens, M., Askell, A., Welinder, P., Christiano, P. F., Leike, J., … & Lowe, R. (2022). Training language models to follow instructions with human feedback. In S. Koyejo, S. Mohamed, A. Agarwal, D. Belgrave, K. Cho, & A. Oh (Eds.), Advances In Neural Information Processing Systems (Vol. 35, pp. 27730–27744). Curran Associates.

Oxford Analytica. (2023). UK AI Summit will promote some global cooperation. Expert Briefings.

Pan, Y., Pan, L., Chen, W., Nakov, P., Kan, M. Y., & Wang, W. (2023). On the risk of misinformation pollution with large language models. In Findings of the Association for Computational Linguistics: EMNLP 2023 (pp. 1389–1403). Association for Computational Linguistics.

Parashar, S., Lin, Z., Liu, T., Dong, X., Li, Y., Ramanan, D., Caverlee, J., & Kong, S. (2024). The neglected tails of vision-language models. ArXiv.

Park, J. S., O’Brien, J., Cai, C. J., Morris, M. R., Liang, P., & Bernstein, M. S. (2023). Generative agents: Interactive simulacra of human behavior. In Proceedings of the 36th Annual ACM Symposium on User Interface Software and Technology (UIST '23) (Article 2). Association for Computing Machinery.

Paul, R. (2022). The politics of regulating artificial intelligence technologies: A competition state perspective. In R. Paul, E. Carmel, & J. Cobbe (Eds.), Handbook on public policy and artificial intelligence. Edward Elgar.

Pavlik, J. V. (2023). Collaborating with ChatGPT: Considering the implications of generative artificial intelligence for journalism and media education. Journalism & Mass Communication Educator, 78(1), 84–93.

Pawelec, M. (2022). Deepfakes and democracy (theory): How synthetic audio-visual media for disinformation and hate speech threaten core democratic functions. Digital Society, 1(2), Article 19.

Perrigo, B. (2021, August 23). An artificial intelligence helped write this play. It may contain racism. TIME.

Perry, N., Srivastava, M., Kumar, D., & Boneh, D. (2023). Do users write more insecure code with AI assistants? In W. Meng & C. D. Jensen (Eds.), CCS '23: Proceedings of the 2023 ACM SIGSAC Conference on Computer and Communications Security (pp. 2785–2799). Association for Computing Machinery.

Piñeiro-Martín, A., García-Mateo, C., Docío-Fernández, L., & López-Pérez, M. D. C. (2023). Ethical challenges in the development of virtual assistants powered by large language models. Electronics, 12(14), Article 3170.

Piskopani, A. M., Chamberlain, A., & Ten Holter, C. (2023). Responsible AI and the arts: The ethical and legal implications of AI in the arts and creative industries. In TAS '23: Proceedings of the First International Symposium on Trustworthy Autonomous Systems, Article 48. Association for Computing Machinery.

Porsdam Mann, S., Earp, B. D., Møller, N., Vynn, S., & Savulescu, J. (2023). AUTOGEN: A personalized large language model for academic enhancement—Ethics and proof of principle. The American Journal of Bioethics, 23(10), 28–41.

Pouget, H. (2023). What will the role of standards be in AI governance? Ada Lovelace Institute.

Price, A. (2019). Establishing trustworthiness is vital in our human-machine world. International Electrotechnical Commission.

Qadir, J. (2023). Engineering education in the era of ChatGPT: Promise and pitfalls of generative AI for education. In 2023 IEEE Global Engineering Education Conference (EDUCON) (pp. 1–9). IEEE.

Qin, Y., Hu, S., Lin, Y., Chen, W., Ding, N., Cui, G., Zeng, Z., Huang, Y., Xiao, C., Han, C., Fung, Y. R., Su, Y., Wang, H., Qian, C., Tian, R., Zhu, K., Liang, S., Shen, X., Xu, B., ... Sun, M. (2023). Tool learning with foundation models. ArXiv.

Radhakrishnan, A., Nguyen, K., Chen, A., Chen, C., Denison, C., Hernandez, D., Durmus, E., Hubinger, E., Kernion, J., Lukošiūtė, K., Cheng, N., Joseph, N., Schiefer, N., Rausch, O., McCandlish, S., El Showk, S., Lanham, T., Maxwell, T., Chandrasekaran, V., ... Perez, E. (2023). Question decomposition improves the faithfulness of model-generated reasoning. ArXiv.

Rahman-Jones, I. (2024, January 31). ChatGPT: Italy says OpenAI’s chatbot breaches data protection rules. BBC News.

Rana, K., Haviland, J., Garg, S., Abou-Chakra, J., Reid, I., & Suenderhauf, N. (2023). SayPlan: Grounding large language models using 3D scene graphs for scalable robot task planning. Proceedings of the 7th Annual Conference on Robot Learning. Proceedings of Machine Learning Research, 229, 23–72.

Räuker, T., Ho, A., Casper, S., & Hadfield-Menell, D. (2023). Toward transparent AI: A survey on interpreting the inner structures of deep neural networks. In 2023 IEEE Conference on Secure and Trustworthy Machine Learning (SaTML) (pp. 464–483). IEEE.

Rességuier, A., & Rodrigues, R. (2020). AI ethics should not remain toothless! A call to bring back the teeth of ethics. Big Data & Society, 7(2).

Reuters. (2023, September 26). Jury to decide on whether AI was trained on copied material in one of first major AI training trials. Euronews.

Rillig, M. C., Ågerstrand, M., Bi, M., Gould, K. A., & Sauerland, U. (2023). Risks and benefits of large language models for the environment. Environmental Science & Technology, 57(9), 3464–3466.

Robbins, S., & van Wynsberghe, A. (2022). Our new artificial intelligence infrastructure: Becoming locked into an unsustainable future. Sustainability, 14(8), Article 8.

Roberts, H., Hine, E., Taddeo, M., & Floridi, L. (2024). Global AI governance: Barriers and pathways forward. International Affairs, 100(3), Article iiae073.

Roberts, H., Cowls, J., Hine, E., Morley, J., Wang, V., Taddeo, M., & Floridi, L. (2023). Governing artificial intelligence in China and the European Union: Comparing aims and promoting ethical outcomes. The Information Society, 39(2), 79–97.

Roberts, H., Ziosi, M., Osborne, C., & Saouma, L. (2023, February). A comparative framework for AI regulatory policy. The International Centre of Expertise on Artificial Intelligence in Montreal.

Romero Moreno, F. (2024). Generative AI and deepfakes: A human rights approach to tackling harmful content. International Review of Law, Computers & Technology, 1–30.

Rudin, C. (2019). Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence, 1(5), 206–215.

Rudin, C., Chen, C., Chen, Z., Huang, H., Semenova, L., & Zhong, C. (2022). Interpretable machine learning: Fundamental principles and 10 grand challenges. Statistics Surveys, 16, 1–85.

Rudin, C., & Radin, J. (2019). Why are we using black box models in AI when we don’t need to? A lesson from an explainable AI competition. Harvard Data Science Review, 1(2).

Ryan-Mosley, T. (2023, June 12). It’s time to talk about the real AI risks. MIT Technology Review.

Sachs, W., & Santarius, T. (Eds.). (2007). Fair future: Resource conflicts, security, and global justice. Zed Books.

Samuelson, P. (2023). Generative AI meets copyright. Science, 381(6654), 158–161.

Sandbrink, J. B. (2023). Artificial intelligence and biological misuse: Differentiating risks of language models and biological design tools. ArXiv.

Sanyal, S., & Ren, X. (2021). Discretized integrated gradients for explaining language models. ArXiv.

Sap, M., Swayamdipta, S., Vianna, L., Zhou, X., Choi, Y., & Smith, N. A. (2021). Annotators with attitudes: How annotator beliefs and identities bias toxic language detection. ArXiv.

Scherlis, A., Sachan, K., Jermyn, A. S., Benton, J., & Shlegeris, B. (2022). Polysemanticity and capacity in neural networks. ArXiv.

Schick, T., Dwivedi-Yu, J., Dessì, R., Raileanu, R., Lomeli, M., Hambro, E., Zettlemoyer, L., Cancedda, N., & Scialom, T. (2024). Toolformer: Language models can teach themselves to use tools. In A. Oh, T. Naumann, A. Globerson, K. Saenko, M. Hardt, S. Levine (Eds.), Advances in Neural Information Processing Systems (Vol. 36, pp. 68539–68551). Curran Associates.

Schopmans, H. R. (2022). From coded bias to existential threat: Expert frames and the epistemic politics of AI governance. In V. Conitzer & J. Tasioulas (Eds.), AIES '22: Proceedings of the 2022 AAAI/ACM Conference on AI, Ethics, and Society (pp. 627–640). Association for Computing Machinery.

Seaver, N. (2021). Care and scale: Decorrelative ethics in algorithmic recommendation. Cultural Anthropology, 36(3), 509–537.

Sevilla, J., Heim, L., Ho, A., Besiroglu, T., Hobbhahn, M., & Villalobos, P. (2022). Compute trends across three eras of machine learning. In 2022 International Joint Conference on Neural Networks (IJCNN) (pp. 1-8). IEEE.

Shahin Shamsabadi, A., Yaghini, M., Dullerud, N., Wyllie, S., Aïvodji, U., Alaagib, A., Gambs, S., & Papernot, N. (2022). Washing the unwashable: On the (im)possibility of fairwashing detection. In S. Koyejo, S. Mohamed, A. Agarwal, D. Belgrave, K. Cho, & A. Oh (Eds.), Advances in Neural Information Processing Systems (Vol. 35, pp. 14170–14182). Curran Associates.

Shanahan, M. (2024). Talking about large language models. Communications of the ACM, 67(2), 68–79.

Shao, H., Huang, J., Zheng, S., & Chang, K. C. C. (2023). Quantifying association capabilities of large language models and its implications on privacy leakage. ArXiv.

Sheehan, M. (2023, July 10). China’s AI regulations and how they get made. Carnegie Endowment for International Peace.

Shelby, R., Rismani, S., Henne, K., Moon, A., Rostamzadeh, N., Nicholas, P., Yilla-Akbari, N., Gallegos, J., Smart, A., Garcia, E., & Virk, G. (2023). Sociotechnical harms of algorithmic systems: Scoping a taxonomy for harm reduction. In F. Rossi, S. Das, J. Davis, K. Firth-Butterfield, & A. John (Eds.), AIES '23: Proceedings of the 2023 AAAI/ACM Conference on AI, Ethics, and Society (pp. 723–741). Association for Computing Machinery.

Shevlane, T., Farquhar, S., Garfinkel, B., Phuong, M., Whittlestone, J., Leung, J., Kokotajlo, D., Marchal, N., Anderljung, M., Kolt, N., Ho, L., Siddarth, D., Avin, S., Hawkins, W., Kim, B., Gabriel, I., Bolina, V., Clark, J., Bengio, Y., Christiano, P., & Dafoe, A. (2023). Model evaluation for extreme risks. ArXiv.

Shoaib, M. R., Wang, Z., Ahvanooey, M. T., & Zhao, J. (2023). Deepfakes, misinformation, and disinformation in the era of frontier AI, generative AI, and large AI models. In 2023 International Conference on Computer and Applications (ICCA) (pp. 1-7). IEEE.

Shumailov, I., Shumaylov, Z., Zhao, Y., Gal, Y., Papernot, N., & Anderson, R. (2023). The curse of recursion: Training on generated data makes models forget. ArXiv.

Sikdar, S., Bhattacharya, P., & Heese, K. (2021). Integrated directional gradients: Feature interaction attribution for neural NLP models. In C. Zong, F. Xia, W. Li, & R. Navigli (Eds.), Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers) (pp. 865–878). Association for Computational Linguistics.

Singh, C., Inala, J. P., Galley, M., Caruana, R., & Gao, J. (2024). Rethinking interpretability in the era of large language models. ArXiv.

Slack, D., Hilgard, S., Jia, E., Singh, S., & Lakkaraju, H. (2020). Fooling LIME and SHAP: Adversarial attacks on post hoc explanation methods. In A. Markham, J. Powles, T. Walsh, & A. L. Washington (Eds.), AIES '20: Proceedings of the AAAI/ACM Conference on AI, Ethics, and Society (pp. 180–186). Association for Computing Machinery.

Slack, D., Hilgard, A., Lakkaraju, H., & Singh, S. (2021). Counterfactual explanations can be manipulated. In M. Ranzato, A. Beygelzimer, Y. Dauphin, P.S. Liang, & J. Wortman Vaughan (Eds.), Advances in Neural Information Processing Systems (Vol. 34, pp. 62–75). Curran Associates.

Smakman, J., Davies, M., & Birtwistle, M. (2023). Mission critical: Lessons from relevant sectors for AI safety. The Ada Lovelace Institute.

Smith-Goodson, P. (2023, July 21). The extraordinary ubiquity of generative AI and how major companies are using it. Forbes.

Sobel, B. (2024, February 16). Don’t give AI free access to work denied to humans, argues a legal scholar. The Economist.

Soice, E. H., Rocha, R., Cordova, K., Specter, M., & Esvelt, K. M. (2023). Can large language models democratize access to dual-use biotechnology? ArXiv.

Solaiman, I., Talat, Z., Agnew, W., Ahmad, L., Baker, D., Blodgett, S. L., Daumé III, H., Dodge, J., Evans, E., Hooker, S., Jernite, Y., Luccioni, A. S., Lusoli, A., Mitchell, M., Newman, J., Png, M.-T., Strait, A., & Vassilev, A. (2023). Evaluating the social impact of generative AI systems in systems and society. ArXiv.

Somepalli, G., Singla, V., Goldblum, M., Geiping, J., & Goldstein, T. (2023). Diffusion art or digital forgery? Investigating data replication in diffusion models. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (pp. 6048–6058). IEEE.

Stahl, B. C., Leach, T., Oyeniji, O., & Ogoh, G. (2023). AI policy as a response to AI ethics? Addressing ethical issues in the development of AI policies in North Africa. In D. Okaibedi Eke, K. Wakunuma, & S. Akintoye (Eds.), Responsible AI in Africa: Challenges and opportunities (pp. 141–167). Springer International Publishing.

Stempel, J. (2023, December 27). NY Times sues OpenAI, Microsoft for infringing copyrighted works. Reuters.

Strubell, E., Ganesh, A., & McCallum, A. (2019). Energy and policy considerations for deep learning in NLP. In A. Korhonen, D. Traum, & L. Màrquez (Eds.), Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics (pp. 3645–3650). Association for Computational Linguistics.

Strubell, E., Ganesh, A., & McCallum, A. (2020). Energy and policy considerations for modern deep learning research. Proceedings of the AAAI Conference on Artificial Intelligence, 34(9), 13693–13696.

Subhash, V. (2023). Can large language models change user preference adversarially? ArXiv.

Tacheva, J., & Ramasubramanian, S. (2023). AI Empire: Unraveling the interlocking systems of oppression in generative AI’s global order. Big Data & Society, 10(2).

Talat, Z., Névéol, A., Biderman, S., Clinciu, M., Dey, M., Longpre, S., Luccioni, S., Masoud, M., Mitchell, M., Radev, D., Sharma, S., Subramonian, A., Tae, J., Tan, S., Tunuguntla, D., & Van Der Wal, O. (2022, May). You reap what you sow: On the challenges of bias evaluation under multilingual settings. In A. Fan, S. Ilic, T. Wolf, & M. Gallé (Eds.), Proceedings of BigScience Episode #5 – Workshop on Challenges & Perspectives in Creating Large Language Models (pp. 26–41). Association for Computational Linguistics.

Tao, Y., Viberg, O., Baker, R. S., & Kizilcec, R. F. (2023). Auditing and mitigating cultural bias in LLMs. ArXiv.

Taylor, L., & Dencik, L. (2020). Constructing commercial data ethics. Technology and Regulation, 2020, 1–10.

Terzis, P. (2024). Law and the political economy of AI production. International Journal of Law and Information Technology, 31, 302–330.

Thieme, A., Nori, A., Ghassemi, M., Bommasani, R., Andersen, T. O., & Luger, E. (2023). Foundation models in healthcare: Opportunities, risks & strategies forward. In A. Schmidt, K. Väänänen, T. Goyal, P. O. Kristensson, & A. Peters (Eds.), CHI 2023: Extended Abstracts of the 2023 CHI Conference on Human Factors in Computing Systems (pp. 1–4). Association for Computing Machinery.

Thomson Reuters Foundation. (2023). AI governance for Africa toolkit.

Toffler, A. (1984). Future shock. Bantam. (Original work published 1970)

Trager, R., Harack, B., Reuel, A., Carnegie, A., Heim, L., Ho, L., Kreps, S., Lall, R., Larter, O., Ó hÉigeartaigh, S., Staffell, S., & Villalobos, J. J. (2023). International governance of civilian AI: A jurisdictional certification approach. ArXiv.

Triguero, I., Molina, D., Poyatos, J., Del Ser, J., & Herrera, F. (2024). General Purpose Artificial Intelligence Systems (GPAIS): Properties, definition, taxonomy, societal implications and responsible governance. Information Fusion, 103, Article 102135.

Turpin, M., Michael, J., Perez, E., & Bowman, S. (2024). Language models don't always say what they think: Unfaithful explanations in chain-of-thought prompting. In A. Oh, T. Naumann, A. Globerson, K. Saenko, M. Hardt, & S. Levine (Eds.), Advances in Neural Information Processing Systems (Vol. 36, pp. 74952–74965). Curran Associates.

United Nations Educational, Scientific and Cultural Organization (2022a). Landscape study of AI policies and use in Southern Africa: Research report.

United Nations Educational, Scientific and Cultural Organization (2022b). Recommendation on the ethics of artificial intelligence.

Urbina, F., Lentzos, F., Invernizzi, C., & Ekins, S. (2022). Dual use of artificial-intelligence-powered drug discovery. Nature Machine Intelligence, 4(3), 189–191.

Urbanovics, A. (2023). Artificial intelligence landscape in South America. AARMS – Academic and Applied Research in Military and Public Management Science, 22(1), 101–114.

Vallor, S., & Luger, E. (2023). A shrinking path to AI safety. Edinburgh Futures Institute.

van Maanen, G. (2022). AI ethics, ethics washing, and the need to politicize data ethics. Digital Society, 1(2), Article 9.

Veale, M. (2020). A critical take on the policy recommendations of the EU high-level expert group on artificial intelligence. European Journal of Risk Regulation, 11(1), e1.

Veale, M., Matus, K., & Gorwa, R. (2023). AI and global governance: Modalities, rationales, tensions. Annual Review of Law and Social Science, 19, 255–275.

Vock, I. (2022, December). ChatGPT proves that AI still has a racism problem. New Statesman.

Vought, R. T. (2020). Memorandum for the heads of executive departments and agencies: Guidance for regulation of artificial intelligence applications. U.S. Office of Management and Budget.

Wagner, B. (2018). Ethics as an escape from regulation. From “ethics-washing” to ethics-shopping? In E. Bayamlioglu, I. Baraliuc, L. A. W. Janssens, & M. Hildebrandt (Eds.), BEING PROFILED: COGITAS ERGO SUM (pp. 84–89). Amsterdam University Press.

Wang, K., Variengien, A., Conmy, A., Shlegeris, B., & Steinhardt, J. (2022). Interpretability in the wild: A circuit for indirect object identification in GPT-2 small. ArXiv.

Wang, L., Ma, C., Feng, X., Zhang, Z., Yang, H., Zhang, J., Chen, Z., Tang, J., Chen, X., Lin, Y., Zhao, W. X., Wei, Z., & Wen, J. R. (2024). A survey on large language model based autonomous agents. Frontiers of Computer Science, 18(6), Article 186345.

Wang, Z., Cai, S., Chen, G., Liu, A., Ma, X., & Liang, Y. (2023). Describe, explain, plan and select: Interactive planning with large language models enables open-world multi-task agents. ArXiv.

Wei, J., Wang, X., Schuurmans, D., Bosma, M., Ichter, B., Xia, F., Chi, E., Le, Q. V., & Zhou, D. (2022). Chain-of-thought prompting elicits reasoning in large language models. In S. Koyejo, S. Mohamed, A. Agarwal, D. Belgrave, K. Cho, & A. Oh (Eds.), Advances in Neural Information Processing Systems (Vol. 35, pp. 24824–24837). Curran Associates.

Weidinger, L., Mellor, J., Rauh, M., Griffin, C., Uesato, J., Huang, P.-S., Cheng, M., Glaese, M., Balle, B., Kasirzadeh, A., Kenton, Z., Brown, S., Hawkins, W., Stepleton, T., Biles, C., Birhane, A., Haas, J., Rimell, L., Hendricks, L. A., … Gabriel, I. (2021). Ethical and social risks of harm from language models. ArXiv.

Weidinger, L., Rauh, M., Marchal, N., Manzini, A., Hendricks, L. A., Mateos-Garcia, J., Bergman, S., Kay, J., Griffin, C., Bariach, B., Gabriel, I., Rieser, V., & Isaac, W. (2023). Sociotechnical safety evaluation of generative AI systems. ArXiv.

Weidinger, L., Uesato, J., Rauh, M., Griffin, C., Huang, P. S., Mellor, J., Glaese, A., Cheng, M., Balle, B., Kasirzadeh, A., Biles, C., Brown, S., Kenton, Z., Hawkins, W., Stepleton, T., Birhane, A., Hendricks, L. A., Rimell, L., Isaac, W., ... Gabriel, I. (2022). Taxonomy of risks posed by language models. In FAccT '22: Proceedings of the 2022 ACM Conference on Fairness, Accountability, and Transparency (pp. 214–229). Association for Computing Machinery.

Weich, B. (2024). The double-edged sword of AI: How generative language models like Google Bard and ChatGPT pose a threat to countering hate and misinformation online. Harvard Data Science Review, (Special Issue 5).

Weiss-Blatt, N. (2023, October 15). The AI panic campaign — part 1. AI Panic News.

Werle, R., & Iversen, E. J. (2006). Promoting legitimacy in technical standardization. Science, Technology & Innovation Studies, 2(1), 19–39.

Werner, T. (2015). Gaining access by doing good: The effect of sociopolitical reputation on firm participation in public policy making. Management Science, 61(8), 1989–2011.

Westra, L., & Lawson, B. (Eds.). (2001). Faces of environmental racism: Confronting issues of global justice (2nd ed.). Rowman & Littlefield.

Wheeler, T. (2023). The three challenges of AI regulation. Brookings Institution.

White, S. J. O., & Shine, J. P. (2016). Exposure potential and health impacts of indium and gallium, metals critical to emerging electronics and energy technologies. Current Environmental Health Reports, 3(4), 459–467.

White House Office of Science and Technology Policy. (2022, October). Blueprint for an AI bill of rights: Making automated systems work for the American people [White paper].

Williams, C. (2022). Framing the future: The Foundation series, 'Foundation' models and framing AI. Law, Technology and Humans, 4(2), 109–123.

Wolf, Y., Wies, N., Avnery, O., Levine, Y., & Shashua, A. (2023). Fundamental limitations of alignment in large language models. ArXiv.

World Economic Forum. (2024). Generative AI governance: Shaping a collective global future. AI Governance Alliance Briefing Paper Series.

Wright, J., Leslie, D., Raab, C., Ostmann, F., Briggs, M., & Kitagawa, F. (2021). Privacy, agency and trust in human-AI ecosystems: Interim report (short version). The Alan Turing Institute.

Wu, X., Zhao, H., Zhu, Y., Shi, Y., Yang, F., Liu, T., Zhai, X., Yao, W., Li, J., Du, M., & Liu, N. (2024). Usable XAI: 10 strategies towards exploiting explainability in the LLM era. ArXiv.

Xi, Z., Chen, W., Guo, X., He, W., Ding, Y., Hong, B., Zhang, M., Wang, J., Jin, S., Zhou, E., Zheng, R., Fan, X., Wang, X., Xiong, L., Zhou, Y., Wang, W., Jiang, C., Zou, Y., Liu, X., ... Gui, T. (2023). The rise and potential of large language model based agents: A survey. ArXiv.

Xie, S. M., & Min, S. (2022, August 1). How does in-context learning work? A framework for understanding the differences from traditional supervised learning. The Stanford AI Lab Blog.

Yang, L., Zhang, Z., Song, Y., Hong, S., Xu, R., Zhao, Y., Zhang, W., Cui, B., & Yang, M.-H. (2023). Diffusion models: A comprehensive survey of methods and applications. ACM Computing Surveys, 56(4), Article 105.

Yang, S., Nachum, O., Du, Y., Wei, J., Abouel, H., & Schuurmans, D. (2023). Foundation models for decision making: Problems, methods, and opportunities. ArXiv.

Yao, S., Yu, D., Zhao, J., Shafran, I., Griffiths, T., Cao, Y., & Narasimhan, K. (2024). Tree of thoughts: Deliberate problem solving with large language models. In A. Oh, T. Naumann, A. Globerson, K. Saenko, M. Hardt, & S. Levine (Eds.), Advances in Neural Information Processing Systems (Vol. 36, pp. 11809–11822). Curran Associates.

Ye, X., & Durrett, G. (2022). The unreliability of explanations in few-shot prompting for textual reasoning. In S. Koyejo, S. Mohamed, A. Agarwal, D. Belgrave, K. Cho, & A. Oh (Eds.), Advances in Neural Information Processing Systems (Vol. 35, pp. 30378–30392). Curran Associates.

Yeung, K., Howes, A., & Pogrebna, G. (2020). AI governance by human rights–centered design, deliberation, and oversight. In M. D. Dubber, F. Pasquale, & S. Das (Eds.), The Oxford handbook of ethics of AI (pp. 77–106). Oxford University Press.

Zarifhonarvar, A. (2023). Economics of ChatGPT: A labor market view on the occupational impact of artificial intelligence. Journal of Electronic Business & Digital Economics.

Zhao, H., Chen, H., Yang, F., Liu, N., Deng, H., Cai, H., Wang, S., Yin, D., & Du, M. (2024). Explainability for large language models: A survey. ACM Transactions on Intelligent Systems and Technology, 15(2), Article 20.

Zhao, H., Yang, F., Lakkaraju, H., & Du, M. (2024). Opening the black box of large language models: Two views on holistic interpretability. ArXiv.

Zhou, J., Chen, F., & Holzinger, A. (2020, July). Towards explainability for AI fairness. In A. Holzinger, R. Goebel, R. Fong, T. Moon, K. R. Müller, & W. Samek (Eds.), International Workshop on Extending Explainable AI Beyond Deep Models and Classifiers (pp. 375–386). Springer.

Zou, A., Phan, L., Chen, S., Campbell, J., Guo, P., Ren, R., Pan, A., Yin, X., Mazeika, M., Dombrowski, A.-K., Goel, S., Li, N., Byun, M. J., Wang, Z., Mallen, A., Basart, S., Koyejo, S., Song, D., Fredrikson, M., Kolter, J. Z., & Hendrycks, D. (2023). Representation engineering: A top-down approach to AI transparency. ArXiv.

©2024 David Leslie and Antonella Maia Perini. This article is licensed under a Creative Commons Attribution (CC BY 4.0) International license, except where otherwise indicated with respect to particular material included in the article.
