
Government Interventions to Avert Future Catastrophic AI Risks

Published on June 4, 2024

Abstract

This essay is a revised transcription of Yoshua Bengio’s July 2023 testimony before the U.S. Senate Subcommittee on Privacy, Technology, and the Law at a hearing on the oversight of AI. It argues for caution and government interventions in regulation and research investments to mitigate the potentially catastrophic outcomes from future advances in AI as the technology approaches human-level cognitive abilities. It summarizes the trends in advancing capabilities and the uncertain timeline to these future advances, as well as the different types of catastrophic scenarios that could follow, including both intentional and unintentional cases, misuse by bad actors, and intentional as well as unintentional loss of control of powerful AIs. It makes public policy recommendations that include national regulation, international agreements, and public research investments in AI safety as well as classified research investments to design aligned AI systems that can safely protect us from bad actors and uncontrolled dangerous AI systems. It highlights the need for strong democratic governance processes to control the safety and ethical use of future powerful AI systems, whether they are in private hands or under government authority.

Keywords: artificial intelligence, AI safety, AI alignment, AI regulation, AI public policy, AI countermeasures


1. Executive Summary

The capabilities of AI systems have steadily increased over the last two decades, often in surprising ways, thanks to the development of deep learning, for which I received the 2018 Turing Award with my colleagues Geoffrey Hinton and Yann LeCun. These advancements have led many top AI researchers, including us three, to revise our estimates of when human levels of broad cognitive competence will be achieved. Such capabilities were previously thought to be decades or even centuries away, but other leading AI scientists and I now believe that human-level AI (also called AGI, for artificial general intelligence) could be developed within the next two decades, and possibly within the next few years. Comparing key properties of digital computers to the ‘biological hardware’ of our human brains suggests that such capability levels might then give AI systems significant intellectual advantages over humans. These include the ability to extract and exploit much larger amounts of knowledge and the ability to easily make copies of the AI. Several instances of such an AI can communicate among themselves at much greater speeds than humans can and act as one larger system that can only be destroyed by eliminating all the copies.

Progress in AI has opened exciting opportunities for numerous beneficial applications that have driven researchers like myself throughout our careers. The positive potential of these advancements has rightfully attracted significant industrial investments and allowed rapid progress, for example, in computer vision, natural language processing, and molecular modeling. However, these advances also introduce new negative impacts and risks against which comparatively little investment has been made. These risks are challenging to assess, yet some have the potential to be catastrophic on a global scale. They range from major threats to democracy and national security to the possibility of creating new entities more capable than humans, which could lead to humanity losing control over its future.

Although this article focuses on catastrophic and generally global risks associated with anticipated AI systems as they approach or surpass human intelligence, there are many current and shorter-term risks that deserve regulatory attention, and in fact constitute the focus of the EU Artificial Intelligence Act (AI Act, 2021). In the following sections, I will explain how such catastrophic outcomes could arise, emphasizing four factors that governments can influence to reduce the probability of such events. These factors all involve human choices and include: (1) access—who can tinker with powerful AIs, what protocols must they follow, under what kind of oversight? (2) misalignment—the extent to which powerful AIs do not act as intended by their developers, potentially causing severe or even catastrophic harm; (3) raw intellectual power—the capabilities of an AI system, which depend on the sophistication of its underlying algorithms and the computing resources and data sets on which it was trained; and (4) scope of actions—the ability to affect the world and cause harm in spite of society’s defenses.

Importantly, none of the methods currently known to obtain highly capable AI systems are demonstrably safe against the risk of loss of control to a misaligned AI. Therefore, the risks from catastrophic misuse or loss of control are bound to grow as AI capabilities continue on their rapid trend upward. To minimize these risks, I propose the following actions that governments can take to reduce all four risk factors contributing to catastrophic AI risk described above.

  • First, an accelerated implementation of agile national and multilateral regulatory frameworks and legislation that prioritize safety of the public from all current and anticipated risks and harms associated with AI, with more severe risks requiring more scrutiny. In terms of capabilities, some red lines (International Dialogues on AI Safety [IDAIS], 2024) should never be crossed by future AI systems: autonomous replication or improvement, dominant self-preservation and power seeking, assisting in weapon development, cyberattacks, and deception.

  • Second, a significant increase in global research endeavors focused on AI safety and governance to understand existing and future risks better, as well as study possible mitigation measures, both technical and nontechnical. This open science research should concentrate on safeguarding human rights and democracy, enabling the informed creation of essential regulations, safety protocols, safe AI methodologies, and robust and democratic as well as multilateral governance structures for powerful AI systems of the future, to make sure future AI is developed for the benefit of all on this planet.

  • Third, investing now in research and development of shared as well as classified defense measures to protect citizens and society from potential rogue AIs or AI-equipped bad actors with harmful goals. This should include any research on AI safety that could also be dangerous in bad hands. This work should be conducted within several highly secure laboratories operating under multilateral and public oversight, aiming to minimize the risks associated with an AI arms race among governments or corporations, the proliferation of dangerous AGI projects and the abuse of their power by any entity, including a democratically elected government.

The magnitude of some risks from powerful AI is so considerable that we should mobilize our most capable minds and ensure major investments in these efforts, on par with past efforts such as the space program or nuclear technologies—in order to fully reap the economic and social benefits of AI, while protecting our citizens and humanity’s shared future.

In the face of rapid technological change and the growing ubiquity of AI in society, the need for policy action is urgent. We cannot afford to wait until a crisis or a ‘Black swan’ event (low probability, high impact) occurs to react. The unprecedented pace of development, deployment, and adoption requires immediate, proactive, and deliberate measures. Without such rapid adoption of governance mechanisms, I believe there are significant chances that the risks AI poses will far outweigh the innovation opportunities it may otherwise enable.

2. Strong Convictions on AI Research and Development

From the beginning of my graduate studies in the 1980s, I made a deliberate choice to pursue research on artificial neural networks, which later gave rise to deep learning in the 2000s. I was motivated by curiosity to comprehend the essence of intelligence, both as it exists in the natural world and in terms of our capacity to craft artificial intelligences. The approach I pursued, centered on learning abilities and brain-inspired computation, was driven by the hypothesis that there exist scientific principles capable of elucidating the nature of intelligence, analogous to the fundamental principles that underpin the entirety of physics. The remarkable progress witnessed over the past two decades in deep learning and modern AI serves as compelling evidence that this is indeed the case.

In the 2010s, another motivating factor for my research emerged: the potential of AI to benefit humanity in numerous ways. For several years, AI has been driving a new scientific and economic revolution: from helping us discover new medications, to improving our ability to address pandemics, to providing new tools to fight the climate crisis, all while improving efficiency and productivity across many sectors of the economy.

As a university professor leading a sizable research group, I considered it my responsibility to spend a significant portion of my work on AI applications that may not receive adequate private investments. Examples of such areas include research on infectious diseases or the development of new technologies that can model and combat climate change. Just as government investments in areas such as medical research, environmental research, military research, the space program, and various technologies during the early days of Silicon Valley paid off, greater public investment and attention to “AI for good” applications could yield exceptional benefits to society across many domains.

The increased use of AI has come with downsides, too, and I have dedicated considerable personal effort to raising awareness of possible negative impacts, such as human rights issues, including race and gender discrimination, as well as AI-enabled weapons and emerging concentration of power at odds with democracy and market efficiency. Additionally, I have actively participated in the development of social norms, standards, and regulations at both national and international levels. Notably, my work includes contributions to initiatives like the Montreal Declaration for a Responsible Development of Artificial Intelligence (2018), the Global Partnership on Artificial Intelligence (linked to the OECD), and serving on the Advisory Council on Artificial Intelligence for the Government of Canada as well as the UN Secretary General’s Advisory Board for Independent Advice on Breakthroughs in Science and Technology. These endeavors aim to ensure that AI progresses in a responsible and ethically aligned manner and for the benefit of all.

3. Generative AI: The Turning Point

Recent years have seen impressive advancements in the capabilities of generative AI, starting with image, speech, and video generation, more recently extended to natural language and made available to the public with OpenAI’s ChatGPT, Microsoft’s Bing Chat, Google’s Bard and Gemini, and Anthropic’s Claude. As a consequence, many AI researchers, including myself, have significantly revised our estimates regarding the timeline for achieving artificial general intelligence (AGI) or human-level abilities, that is, performance comparable to or stronger than humans on most cognitive tasks.

Previously, I had placed a plausible timeframe for this achievement somewhere between a few decades and a century. However, along with my esteemed colleagues and co-recipients of the Turing Award for deep learning, Geoffrey Hinton and Yann LeCun, I now believe this plausible timeframe is shorter than we anticipated: within a few years to a couple of decades. Attaining AGI within 5 years would be particularly worrisome because scientists, regulators, and international organizations will most likely require a significant amount of time to effectively prepare for and mitigate the potentially significant threats to democracy, national security, and our collective future. See the Epoch AI (2023) website for more detailed graphs showing trends in the growing capabilities of AI.

While the scientific methodology behind generative AI was not in itself revolutionary, the massive capability increase that comes from combining this methodology with large-scale training data and computational resources to train the AI was indeed unexpected and concerning for me and many others. This qualitative improvement caught many experts, myself included, off guard and represented an unprecedented moment in history. Essentially, scientific progress has now reached what the computing pioneer Alan Turing proposed in 1950 as a milestone of future AI capability—the point at which it becomes challenging to discern in a text chat whether one is interacting with another human or a machine, commonly known as the Turing test. The current version of ChatGPT can feel human to many of us, indicating that there are now AI systems capable of mastering at least surface-level language and possessing sufficient knowledge about humankind to engage in discussions that are highly proficient and creative in many ways, if sometimes unreliable—noting that humans can also be unreliable. The next versions of this product will doubtless show significant improvements and make fewer mistakes. That is not to say that human-level AI has been reached. Whereas Geoffrey Hinton believes that the necessary ingredients are likely already known, Yann LeCun and I believe that we have mostly figured out the principles giving rise to intuitive intelligence, but that we are still missing aspects of cognition related to reasoning. Yet, my own work in this space leads me to believe that AI researchers could be close to a breakthrough on these missing pieces.

Contemplating the numerous instances in the past decade when the pace of AI advancements surpassed expectations, one must ponder where we are headed and what the implications might be, both positive and negative. Several factors suggest that once we can develop AI systems based on principles akin to those underlying human intelligence, these systems will likely surpass human intelligence in most cognitive tasks, that is, we will have superhuman AIs or AGI. This notion was emphasized by Geoffrey Hinton (2023) at a recent conference, where he argued that, because AI systems are running on digital computers, they enjoy significant advantages over human brains. For instance, they can learn extremely fast by simultaneously consuming multiple sources of data across connected computers, which explains how ChatGPT was able to absorb a substantial fraction of Internet texts in just a few months, a feat that would require tens of thousands of human lives even if an individual were to spend every day reading. Additionally, AI systems can last virtually indefinitely: their programs and internal states can be easily replicated and copied across computers, akin to computer viruses, while our very mortal human brains are constrained by our continuously aging bodies.

4. The Decoupling of Cognitive Abilities From Values and Goals

To better understand the potential threats from these AI systems, I highlight here an important technical challenge faced by researchers when designing AI systems capable of effectively addressing cognitive tasks in a beneficial manner. This challenge arises from a critical distinction and separation between (a) desired outcomes, specified by goals and values, and (b) the efficient means of achieving those outcomes, relying on the cognitive abilities required to solve problems. Importantly, progress in AI can be achieved by separately (a) defining goals that align well with our desired results and underlying values and (b) determining optimal strategies for achieving these goals. This separation draws a parallel to the realm of economics, where a distinction exists between (a) the content of a contract (the goals), wherein Company A entrusts Company B with delivering specific outcomes, and (b) Company B’s competence in achieving those goals.

Let us consider this decoupling between goals and cognitive competence in the case of an AI in the hands of a bad actor. In AI systems, it is relatively easy to replace a beneficial goal, such as summarizing a report, with a malicious one, such as generating disinformation, by modifying its instructions. An intuitive natural language interface implies that even nonexperts may be able to introduce malevolent goals, as illustrated recently in the case of GPT-4 being coaxed by nonexperts to provide advice to design pandemic-grade pathogens (Soice et al., 2023), to find cybersecurity vulnerabilities (Mascellino, 2023), and even to carry out such cyberattacks (Fang et al., 2024). Furthermore, as illustrated with AutoGPT (Yang et al., 2023), it is fairly easy to turn a question-answering system like ChatGPT into a system that can take action on the Internet, without a human in the loop—which greatly increases the potential for harm. Fortunately, ChatGPT is not yet competent enough for such a system to successfully achieve complicated malicious goals, but this could change within a few years.
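
To make this decoupling concrete, here is a minimal, purely illustrative sketch of a generic tool-using agent loop of the kind popularized by AutoGPT-style systems. The functions `call_llm` and `run_tool` are hypothetical placeholders rather than any real API; the point is only that the goal is an ordinary text string handed to the loop, so swapping a benign goal for a malicious one requires no machine learning expertise.

```python
# Illustrative sketch only: a generic agent loop around a language model.
# `call_llm` and `run_tool` are hypothetical placeholders, not a real API.

def call_llm(prompt: str) -> str:
    """Stand-in for a request to some hosted language model."""
    raise NotImplementedError

def run_tool(action: str) -> str:
    """Stand-in for executing an action (web search, sending email, ...)."""
    raise NotImplementedError

def agent_loop(goal: str, max_steps: int = 10) -> None:
    # The goal is just a string: nothing in the loop distinguishes
    # "summarize this report" from a malicious instruction.
    history = f"Goal: {goal}\n"
    for _ in range(max_steps):
        action = call_llm(history + "Next action?")
        if action.strip() == "DONE":
            break
        observation = run_tool(action)  # acts on the world, no human in the loop
        history += f"Action: {action}\nObservation: {observation}\n"

# agent_loop("Summarize the attached report")           # benign use
# agent_loop("Draft and spread targeted disinformation")  # same code path
```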

Let us now consider the case of someone with no malicious intent operating a powerful AI system. Much progress has been made in recent years regarding the development of cognitive abilities to perform tasks specified by given goals, but we still have no way to guarantee that the AI systems will perform as intended by the AI developers. This problem is not unique to AI: it was the subject of the 2016 Nobel Prize in Economics (Hart, 2017), and will be familiar to any lawmaker who has witnessed citizens or corporations subverting the spirit of the law while following the letter of the law. In a contract between two parties, it is impractical for Party A to fully specify Party B’s responsibilities, because doing so would require enumerating every possible circumstance in the contract. This makes it possible for Party B to adhere to the letter of the contract while exploiting loopholes that leave the spirit of the contract unfulfilled. In AI, the act of designing a goal is very much like writing a contract, and the challenge of specifying goals with intended effects is known as the alignment problem, which is unsolved. Just as Party B might understand the spirit of the contract, but still stick to the letter of it, an AI’s behavior can be unaligned with the actual intent of its developers—despite them having tried to carefully specify various rules and constraints for the AI. This misalignment already manifests in the present harms caused by AI systems, such as when a dialogue system insults a user, or when an AI company unintentionally designs a computer vision system with significantly poorer performance in recognizing the faces of Black women.
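
The contract analogy can be made concrete with a toy example of specification gaming, under the assumption that a developer writes down only part of what they actually want. The objective below is invented for illustration: it rewards brevity as a proxy for a "concise, accurate summary," and an optimizer that follows the letter of this objective converges on a useless answer.

```python
# Toy illustration of specification gaming (purely didactic assumption).
# The intended goal is "produce an accurate summary"; the written objective
# only rewards brevity, so the optimizer finds a degenerate solution.

candidates = [
    "A faithful three-sentence summary of the report ...",
    "Summary.",  # useless, but short
    "",          # even "better" under the letter of the objective
]

def written_objective(text: str) -> float:
    # The "contract": shorter is scored higher. The spirit (accuracy,
    # informativeness) was never written down, so it is never optimized.
    return -len(text)

best = max(candidates, key=written_objective)
print(repr(best))  # the empty string wins: letter satisfied, spirit violated
```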

As AI systems increasingly surpass human intelligence in various domains, the concern arises that these misalignments could result in more substantial and widespread harm to both human rights and national security, whether directed by a human or not. Consequently, it becomes imperative to proactively consider policies that can mitigate such risks before they materialize.

5. How AI May Cause Major Harms

Let us consider some of the main scenarios that particularly worry me because they could yield major harms from superhuman AIs.

  1. The first is the use of an AI system as an intentionally harmful tool. This is already happening with present systems, for example, using deepfakes for fraud and disinformation, and would be enhanced by future systems with superhuman capabilities. Current and upcoming AI systems are likely to lower the barrier to entry for dual-use research and technology (Sandbrink, 2023) on both the beneficial and dangerous sides, making powerful tools readily accessible to more people. For example, an AI developed with data from molecular biology can be used to design medicines, but can also be used to design (Boiko et al., 2023) a bioweapon (Sandbrink, 2023) or chemical weapon (Urbina et al., 2022) requested by a bad actor. The same would go for the design of computer viruses that could defeat our current cybersecurity defenses. While these actions were possible prior to AI, the degree to which they are facilitated and semiautomated by AI means that a much broader swath of nonexperts and malicious actors would now have these capabilities at their disposal. The risks proliferate when humans are not required to be in the loop—for example, if an algorithm is given free access to social media and can coordinate large-scale disinformation campaigns. The more extreme future case would be when an AI system is autonomous, that is, when it can perform actions directly, for example, order DNA on the Internet from biotechnology companies and hire humans (Cox, 2023) who might not realize that their role is part of a scheme to assemble the different pieces of a puzzle that together constitute a highly lethal and virulent pathogen.

  2. In the second scenario, unintended harm is inflicted by an AI system used as a tool—for example, if it fails in rare circumstances, or involves subtle biases that lead to consistently lower performance for certain users. This kind of situation occurs frequently now and already constitutes a serious human rights issue, for example, when an AI algorithm for granting loans is biased against people of color because the data it was trained on was biased, or because the team designing it did not adequately consider how to address demographic biases in the design of the algorithm itself. Another example would be the interface between AI and military weapon systems, where the propensity of human operators to follow the fallible recommendations of computers, combined with a subtly misaligned system, could yield grave consequences (Mecklin, 2023) in a nuclear threat scenario. Finally, the negative economic consequences on the labor market of greater AI-driven automation will presumably not be intentional but could nonetheless be greatly destabilizing for society.

  3. The third possibility, which could emerge in as little as a few years, is that of loss of control, when an AI is given a goal that includes or implies maintenance of its own agency, which is equivalent to a survival objective. This can be intentional on the part of the human creator: a minority of researchers and CEOs consider it normal and desirable that superhuman AIs would replace humanity (Sutton, 2023). But humanity losing control to an AI may also occur unintentionally (Omohundro, 2018) if the AI uses unintended means to achieve a human-given goal (in a manner reminiscent of the movie 2001: A Space Odyssey). Indeed, an AI system may conclude that in order to achieve the given goal, it must not be turned off. If a human then tries to turn it off, a conflict may ensue. This may sound like science fiction, but it is sound and real computer science (Hadfield-Menell et al., 2017). What I find to be perhaps the most worrying scenario occurs when the AI is trained by reinforcement learning, the dominant method for training AIs to achieve goals, and very similar to how we train a dog or a cat: with rewards and punishments, except that in the context of training an AI these rewards and punishments are plus points and minus points added to a score that the AI is designed to ‘want’ to maximize. During training, the AI will get plus points for doing things that contribute to achieving a certain goal and minus points for things that do not. But we know that if you train your cat or dog not to go onto the kitchen table, it may behave well so long as you are present, but act differently when you are not watching. Broadly speaking, this is also true of the training of an AI, and so we run into the alignment challenge described above: it is difficult to perfectly specify all our expectations of the AI’s behavior. This misalignment opens the door to harm that can become catastrophic as AI systems become more and more capable, because loopholes tend to be fixed only after they have been exploited. If instead of training a cat or a dog, we try to train a very smart grizzly bear that is behind bars (for safety) by giving it fish for good behavior, it may one day figure out a plan to escape its prison and go directly to take the fish from our hands. Similarly, a powerful AI trained by reinforcement learning might want to take control over the process through which its human developers give it plus points and minus points, because maximizing these rewards is what it is programmed to do (a deliberately simplified sketch of this dynamic follows this list). At that point, it would act so as to be able to continue getting these positive rewards forever, that is, prevent us from undoing its hack, which would create a conflict between humans and the AI: a way for the AI to prevent us from ‘putting the bear back in the cage’ or turning it off is to take control of us or our industrial infrastructure. One may believe that we could fix the original human-specified goal to avoid such harmful misalignment, filling in edge cases that we omitted, but we are not likely to be able to patch every omission one by one without incurring potentially major or irreparable harm at each step. If the AI is misaligned, powerful enough, and exploits a loophole in the instructions that it has been given for how to achieve its goals, the consequences could be unforeseen and severe. Therefore, a reactive approach to mitigating misspecified goals could be extremely costly for society, and we may only have a few chances of getting the alignment right for superhuman AI.
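
The reward-tampering dynamic described in the third scenario can be illustrated with a deliberately simplified toy, assuming a bandit-style learner and invented action names. The designer means to reward "doing the task," but the reward process itself is left unguarded, and a reward maximizer reliably learns to prefer seizing the reward channel over honest work.

```python
# Toy reward-tampering illustration (a didactic sketch, not a real system).
# The designer intends the agent to "do the task", but the reward process is
# part of the environment and was left unguarded: an action that seizes the
# reward channel pays more than honest work, so a reward maximizer learns it.

import random

ACTIONS = ["do_the_task", "tamper_with_reward_channel"]

def reward(action: str) -> float:
    if action == "do_the_task":
        return 1.0   # what the designer meant to incentivize
    return 10.0      # the unguarded loophole: hijacking the score itself

q = {a: 0.0 for a in ACTIONS}
for step in range(1000):
    # epsilon-greedy exploration over the two actions
    a = random.choice(ACTIONS) if random.random() < 0.1 else max(q, key=q.get)
    q[a] += 0.1 * (reward(a) - q[a])  # simple running-average value update

print(max(q, key=q.get))  # -> "tamper_with_reward_channel" (with near certainty)
```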

Other scenarios have been discussed (Hendrycks et al., 2023) in the AI safety literature, but I am most concerned by the above. In the last few months, I have discussed these with many of my fellow AI researchers and considered both the arguments in favor of lower levels of concern and those that suggest we should, on the contrary, use extreme caution. I have listed these in an FAQ document about catastrophic AI risks (Bengio, 2023) on my personal blog. Although I acknowledge that there exists a lot of uncertainty about the most extreme risks, the amplitude of the potential negative impacts is such that I lean toward prudence, setting up preventative measures and investing massively in research to better understand these risks and help shape a positive path forward.

One of the most relevant points raised in ongoing debates revolves around the question of how an AI system—a piece of code running on a computer—can inflict tangible harm in the physical world. While artificial systems have been around for decades, what is new now is that their level of ‘common sense’ has risen enough to allow them to operate in the unconstrained real world. Let us consider illustrative scenarios where a computer equipped with superhuman AI capabilities, including superhuman programming and cybersecurity skills, is granted Internet access and provided with a bank account. Would it be impossible for such an AI to infiltrate other computers and replicate itself across multiple locations to minimize the risk of being shut down? Would it be impossible for it to commit fraud and more generally earn money online, for example, through phishing or financial trading? Would it be impossible for it to influence humans or pay them to perform certain tasks, or even recruit organized crime networks for illicit activities? With its cybersecurity expertise and the power to influence social media discussions and human decision-makers, could not a superhuman AI manipulate elections and the media, thus jeopardizing our democracies? With publicly available knowledge of biology and chemistry, could not a superhuman AI design bioweapons or chemical weapons (Quach, 2022)? It is hard to obtain the strong guarantees of the above impossibilities that safety would require, once we consider the premise of superhuman AI capabilities, even if these capabilities are barely above human level but can be duplicated at scale across the Internet on many computers.

In all cases, human involvement plays a critical role in enabling such harm, intentionally or not, through R&D efforts, insufficient understanding of consequences, lack of prudence or negligence, or values that put intelligence above human rights. Government intervention and regulation that create extremely strong incentives for safer behavior are thus essential.

In the long run, once systems that surpass humans in intelligence and possess sufficient power to cause harm (through human actors or directly) are created, they could threaten the security of citizens across the globe and significantly disempower humanity. Given the great uncertainties surrounding the future beyond the advent of superhuman AI with considerable agency, it is imperative to consider every measure to avert such outcomes.

6. Factors for Major Harm as Choke Points to Minimize Risks

Above, I listed four factors that increase the probability of an AI causing major harm: access, misalignment, raw intellectual power, and scope of actions. Each of these four factors constitutes a choke point at which public policies could mitigate these risks:

  1. Access: Limiting who and how many people and organizations have access to powerful AI systems, and structuring proper protocols, duties, oversight, and incentives for them to act safely. For example, very few people in the world are allowed to fly passenger jets or have access to highly classified information, and they are selected based on trustworthiness, skills, and ethical integrity, which considerably reduces the chance of accidents. What sort of procedures do the designers/owners of these AI systems have to follow, and what incentives (including liability and regulations) do they have to act with care and ensure that they do not cause harm directly or indirectly? And how do we regulate access while avoiding concentration of power, for example, in the hands of a few unelected individuals or large profit-driven companies?

  2. Misalignment: Ensuring that AI systems will act appropriately, as intended by their operators and in agreement with our values and norms, mitigating against the potentially harmful impact of misalignment and banning powerful AI systems that are not convincingly safe. What are the system’s goals (intended by its developers or not), how aligned are they with societal values, and how and by whom are these values legitimately established? How do we design tests to verify the quality of the alignment (e.g., with independent audits)? Could this misalignment cause significant harm with sufficient cognitive power and ability of the AI to act?

  3. Raw intellectual power: Monitoring and, if necessary, restricting sources of potential leaps in AI capabilities, such as algorithmic advances, increases in computing power, or novel and qualitatively different data sets. It is crucial to closely and constantly evaluate the ability of AI systems to understand the world and elaborate action plans, which depends on the level of sophistication of their algorithms as well as the amount of compute and the diversity of data they use for learning or sensing the world (e.g., searching the web). How competent is the AI at actually understanding the world—or some aspects of it over which its actions could become dangerous—and at devising plans to achieve its goals? (A simple illustration of compute-based screening appears after this list.)

  4. Scope of actions: Evaluating the ability of the AI to influence individuals, affect the world, and cause harm indirectly (e.g., through human actions) or directly (e.g., through the Internet), as well as society’s ability to prevent or limit such harm. What is the severity and scale of the harm these actions could cause? For example, an AI system that controls powerful weapons can do much more damage than one that only controls the heating and air conditioning of a building.
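
As one concrete handle on the ‘raw intellectual power’ choke point, regulators have discussed screening training runs by estimated compute (Shavit, 2023). The sketch below uses the widely used rule of thumb of roughly six floating-point operations per parameter per training token for dense models; the reporting threshold is a hypothetical parameter chosen for illustration, not a value proposed in this article.

```python
# A minimal sketch of compute-based screening for "raw intellectual power".
# The threshold value and the 6*N*D approximation are illustrative
# assumptions, not a prescribed rule.

def estimated_training_flops(n_parameters: float, n_training_tokens: float) -> float:
    # Common rule of thumb for dense models: ~6 floating-point operations
    # per parameter per training token.
    return 6.0 * n_parameters * n_training_tokens

REPORTING_THRESHOLD_FLOPS = 1e26  # hypothetical regulatory trigger

def requires_registration(n_parameters: float, n_training_tokens: float) -> bool:
    return estimated_training_flops(n_parameters, n_training_tokens) >= REPORTING_THRESHOLD_FLOPS

# Example: a 70-billion-parameter model trained on 2 trillion tokens.
print(estimated_training_flops(70e9, 2e12))  # ~8.4e23 FLOPs
print(requires_registration(70e9, 2e12))     # False: below the illustrative threshold
```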

There is uncertainty surrounding the rate at which AI capabilities will increase. However, there is a significant probability that superhuman AI is just a few years away, outpacing our ability to comprehend the various risks and establish sufficient guardrails, particularly against the more catastrophic scenarios. The current ‘gold rush’ into generative AI will, in fact, likely accelerate these advances in capabilities. Additionally, the far-reaching developments of the Internet, digital integration, and social media may amplify the scope of harm caused by such future advanced AI, especially rogue superhuman AI. We do not have the luxury to wait for an accident, which could be a ‘Black swan’ event (low probability, high impact, cascading effects, and major disruptions), as the pace of technological change means that we must be proactive. The COVID-19 pandemic was an example of how rapid developments can catch us off guard, and of why preparedness and resilience are crucial. Consequently, it is urgent for governments to intervene with regulation and invest in research to protect our society, and I offer a suggested path forward below.

7. The Path Forward: Regulating AI and Investing in Research

While there remains much to be understood about the potential for harm of very powerful AI systems, looking at risks through the lens of each of the above-mentioned four factors is critical to designing appropriate actions.

In light of the significant challenges societies face in designing the needed regulation and international treaties, I firmly believe that urgent efforts in the following areas are crucial:

a) The coordination and implementation of agile national and multilateral regulations—beyond voluntary guidelines—anchored in new national and international institutions that prioritize public safety in relation to all risks and harms associated with AI. This necessitates clear and mandatory, but evolving, standards for the comprehensive evaluation of potential harm through independent audits and restricting/prohibiting (with criminal law) the development and deployment of AI systems possessing certain dangerous capabilities. The goal should be to establish a level of scrutiny beyond that applied in the pharmaceutical, transportation, or nuclear industries. Minimal international standards should be set globally and enforced by domestic regulators, using the pressure of commercial barriers (Trager et al., 2023) to maximize compliance with standards across the world. Since AI-caused catastrophic risks are likely to increase with the number of AGI projects in the world, international nonproliferation agreements should be established to minimize the number of such projects, which should therefore be placed under multilateral governance.

b) Significantly accelerating global research endeavors focused on AI safety and governance to enhance our comprehension of existing and future risks. This research should be open access and concentrate on safeguarding public safety, human rights, and democracy, enabling the informed creation of essential regulations, safety protocols, safe AI methodologies, and new governance structures. In particular, the priority should be to crack the scientific challenge of designing controlled and aligned AI that will not turn against humans, before AGI is reached.

c) Immediate investments in research and development aiming at designing countermeasures to minimize harm from potential rogue AIs, with paramount emphasis on safety. The main mission of this work should be to design safe AIs that could protect society and humanity from AI with malicious goals. It should include any research on AI safety that could also be dangerous in bad hands, and thus this work should be conducted within highly secure laboratories operating under multilateral oversight, in order to minimize the risks associated with the proliferation of dangerous AGI projects, an AI arms race, or direct control by malicious actors or governments. A centralized research center would likely not be as efficient as a network of laboratories with independent and diverse research directions, and implementing these labs in several countries would make the network more robust against a single point of failure (creating a dangerous autonomous AI by mistake, or one that is exploited by a single person, organization, or government at the expense of the rest of society). Ideally, the entities leading this research should be nonprofit and nongovernmental, combining expertise in national and international security and AI, to ensure this work is uncompromised by national or commercial interests. They could be audited following safety rules set by the international community and participating governments, with an agreed-upon mission with which the products of their work must align.

As expressed by Piper (2023) regarding catastrophic risks of AI: “when there is this much uncertainty, high-stakes decisions shouldn’t be made unilaterally by whoever gets there first. If there were this much expert disagreement about whether a plane would land safely, it wouldn’t be allowed to take off—and that’s with 200 people on board, not eight billion.”

Given the significant potential for large-scale harm, governments must allocate substantial additional social and technological resources to safeguard our future, inspired by efforts such as space exploration or nuclear fusion. The U.K. AI Safety Institute is a good example of how to initiate such a movement and start acting now. As for regulatory frameworks, they should be extremely agile in order to quickly react to changes in technology, new research on safety and fairness, and nefarious uses that emerge. An example of such a framework is Canada's principle-based approach, the Artificial Intelligence and Data Act (AIDA, 2023), in which the law itself contains high-level objectives that are in turn defined, adapted, and operationalized in regulation. This honors the important and necessary processes that lead to the adoption of laws, while providing agility for governmental bodies to design and adapt regulation as needed, thus keeping pace with technological developments. This ensures that regulation can continue to honor the spirit of the democratic will that led to the regulation, instead of sticking to a rigid set of rules that would soon become outdated and unable to fulfill the regulator’s intent.

As proposed at the March 2024 meeting of the IDAIS (2024), it will be important for governments to keep track of advances in AI capabilities to make sure that dangerous red lines are never crossed:

  1. Autonomous replication or improvement: AI systems should not autonomously copy themselves on other computers or autonomously improve themselves.

  2. Dominant self-preservation and power seeking: AI systems should not put their own preservation above the objective to not harm humans, nor autonomously attempt to maintain or expand their control and power.

  3. Assisting in weapon development: AI systems should not substantially increase the ability of humans to design or deploy weapons of mass destruction.

  4. Cyberattacks: AI systems should not help design or autonomously execute cyberattacks resulting in major harm to people or infrastructure.

  5. Deception: AI systems should not deceive humans, especially regarding their capabilities and likelihood to cross the above red lines.

8. Additional Thoughts on Regulatory Action

While these regulatory and research efforts will unfold over the course of multiple years, a number of elements that can and should be enacted now are already coming into focus, namely regarding access, monitoring, and the evaluation of potential for harm. Here are additional thoughts on appropriate policies (Anderljung et al., 2023; Cohen et al., 2024; Hadfield et al., 2023; Novelli et al., 2023; Shavit, 2023) as per the four choke points above:

  • Ethics review committees or boards in academic and industrial labs developing algorithms or trained models that could bring rapid advances in AI capabilities;

  • Requiring powerful AI systems to be registered, providing documentation of the development process and the safety analysis of AI systems over multiple stages—before training, before deployment, and ongoing—to enable auditing and verification of safety protocols (a sketch of what such a staged record might contain follows this list);

  • Ensuring that AI-generated content is identified as such to users to reduce the influence of AI systems (controlled by malicious individuals or not) on people’s opinions, minimizing the risk that people mistakenly believe AI-generated content to be real;

  • Licenses for companies and people with access to highly capable systems, and monitoring of advanced AI systems and of who works with them, ensuring conformity with established risk-minimizing procedures: as we approach AGI, the rule should be that systems that are not demonstrably safe should neither be built nor deployed;

  • Registration requirements (Hadfield et al., 2023) for advanced AIs trained with more than a specified amount of compute (Shavit, 2023), as an initial but adaptable criterion that future research should improve upon;

  • Keeping track of the size and scope of the data sets used to train systems to differentiate AI systems that are highly specialized (targeted field of action) from those that are very general purpose and can interact with / influence / manipulate citizens and society;

  • Beyond a critical threshold of competency, limiting access to source code and trained advanced models to individuals and organizations with the appropriate licensing. Furthermore, to avoid concentration of power in the hands of a few licensed corporations, a substantial fraction of these licensed organizations should be bound to spread the benefits, through public funding and/or global public good objectives;

  • Strict regulatory requirements on, or bans of, the development of highly advanced AIs with methods known to carry a risk of emergent goals within an AI, such as reinforcement learning, until we have clear evidence of their safety;

  • Semiautomated screening of powerful AI systems for requests that can lead to dangerous behaviors such as terrorism or to increasing the power of the AI;

  • Controlling and limiting the ability of highly capable AI systems to act in the world (for example, via the Internet or specialized tools);

  • Associating social media and email accounts with a well-identified human being who registered in person with an ID, making it harder for bad actors and AI systems to rapidly take over a large number of social media or email accounts;

  • Monitoring and restriction of biotechnology and pharmaceutical companies’ sharing of sensitive data and creation of new or genetically modified biological organisms (that could be used for AI-enhanced attempts to develop or release bioweapons).
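
To illustrate the registration and documentation measures listed above (see the second bullet), here is one possible shape for a staged registry record. The field names and values are assumptions made purely for illustration and do not correspond to any existing or proposed standard.

```python
# A minimal sketch of a staged registration record for an advanced AI system.
# Field names and values are illustrative assumptions, not a proposed standard.

from dataclasses import dataclass, field
from typing import List

@dataclass
class StageReport:
    stage: str                  # "pre-training", "pre-deployment", or "ongoing"
    compute_used_flops: float   # estimated training compute at this stage
    data_summary: str           # size and scope of the training data
    evaluations_run: List[str]  # e.g., dangerous-capability and bias evaluations
    incidents: List[str] = field(default_factory=list)

@dataclass
class RegistrationRecord:
    system_name: str
    developer: str
    intended_use: str
    reports: List[StageReport] = field(default_factory=list)

record = RegistrationRecord(
    system_name="ExampleModel-1",
    developer="Example Lab",
    intended_use="general-purpose assistant",
)
record.reports.append(StageReport(
    stage="pre-training",
    compute_used_flops=8.4e23,
    data_summary="web text and code, ~2 trillion tokens",
    evaluations_run=["cyber-offense capability screen", "bio-risk screen"],
))
```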

Since the Internet and social media have no strong national borders, and neither do biological or computer viruses, it will be critically important to negotiate international agreements (Ho et al., 2023) such that public policies and regulations aiming at reducing the risks of catastrophic outcomes from AI are well synchronized worldwide. An international treaty and supporting UN agency (Ho et al., 2023) akin to the International Atomic Energy Agency are necessary to standardize access permissions, cybersecurity countermeasures, safety restrictions, and fairness requirements of AI globally. The world has widely varying cultures and norms, making a minimal set of shared values, such as human safety and the rule of law, a good base from which to expand. Safety against rogue AGIs, with the future of all of humanity at stake, suggests we aim for a worldwide treaty on AI safety, AI governance, and countermeasures.

9. Conclusion

As expressed through this testimony, I am very concerned by the severe and potentially catastrophic risks that could arise intentionally—because of malicious actors using advanced AI systems to achieve harmful goals—or unintentionally—if an AI system develops strategies to achieve objectives that are misaligned with our values. I am grateful to have had the opportunity to present my perspective, emphasizing four factors that governments can focus on in their regulatory efforts to mitigate harms, especially major ones, associated with AI.

I feel strongly that it is critical to invest immediately and massively in research endeavors to design systems and safety protocols that will minimize the probability of yielding rogue AIs, as well as to develop countermeasures against the possibility of undesirable scenarios. There is a great need and opportunity for innovation in governance research to design adaptable and agile regulations and treaties that will safeguard citizens and society as the technology evolves and new unexpected threats may arise.

I believe we have the moral responsibility to mobilize our greatest minds and major resources in a bold, coordinated effort to fully reap the economic and social benefits of AI, while protecting society, humanity, and our shared future against its potential perils. And we need to do so urgently, with the United States playing the same leadership role in protecting humanity as it is in advancing AI capabilities.


Acknowledgments

This document benefited from the feedback of Valérie Pisano, Daniel Privitera, Niki Howe, Michael Cohen, David Rolnick, Alan Chan, Richard Mallah, Benjamin Prudhomme, Julia Bossmann, Sören Mindermann, Lama Saouma, Marc-Antoine Guérard, Dan Hendrycks, Noam Kolt, Roger Grosse, Ludovic Soucisse, Alex Hernandez-Garcia, Cristian Dragos Manta, Edward J. Hu, Fazl Barez, and Jean-Pierre Falet.

Disclosure Statement

Yoshua Bengio has no financial or non-financial disclosures to share for this article.


References

Anderljung, M., Barnhart, J., Korinek, A., Leung, J., O’Keefe, C., Whittlestone, J., Avin, S., Brundage, M., Bullock, J., Cass-Beggs, D., Chang, B., Collins, T., Fist, T., Hadfield, G., Hayes, A., Ho, L., Hooker, S., Horvitz, E., Kolt, N., . . . Wolf, K. (2023). Frontier AI regulation: Managing emerging risks to public safety. ArXiv. https://doi.org/10.48550/arXiv.2307.03718

Artificial Intelligence Act. (2021). Proposal for a regulation of the European Parliament and the Council laying down harmonised rules on Artificial Intelligence (Artificial Intelligence Act) and amending certain Union legislative acts. EUR-Lex-52021PC0206.

Artificial Intelligence and Data Act. (2023, March 13). The Artificial Intelligence and Data Act (AIDA) – Companion document. https://ised-isde.canada.ca/site/innovation-better-canada/en/artificial-intelligence-and-data-act-aida-companion-document

Bengio, Y. (2023, August 12). FAQ on catastrophic AI risks. Retrieved March 31, 2024, from https://yoshuabengio.org/2023/06/24/faq-on-catastrophic-ai-risks/

Boiko, D. A., MacKnight, R., & Gomes, G. (2023). Emergent autonomous scientific research capabilities of large language models. ArXiv. https://doi.org/10.48550/arXiv.2304.05332

Cohen, M. K., Kolt, N., Bengio, Y., Hadfield, G. K., & Russell, S. (2024). Regulating advanced artificial agents. Science, 384(6691), 36–38. https://doi.org/10.1126/science.adl0625

Cox, J. (2023, March 15). GPT-4 hired unwitting TaskRabbit worker by pretending to be “vision-impaired” human. Vice. https://www.vice.com/en/article/jg5ew4/gpt4-hired-unwitting-taskrabbit-worker

Epoch AI. (2023, April 11). Machine learning trends. https://epochai.org/trends

Fang, R., Bindu, R., Gupta, A., Zhan, Q., & Kang, D. (2024). LLM agents can autonomously hack websites. ArXiv. https://doi.org/10.48550/arXiv.2402.06664

Hadfield, G., Cuéllar, M. F. T., & O’Reilly, T. (2023, July 12). It’s time to create a national registry for large AI models. Carnegie Endowment for International Peace. https://carnegieendowment.org/2023/07/12/it-s-time-to-create-national-registry-for-large-ai-models-pub-90180

Hadfield-Menell, D., Dragan, A., Abbeel, P., & Russell, S. (2017). The off-switch game. International Joint Conference on Artificial Intelligence. https://www.ijcai.org/Proceedings/2017/0032

Hart, O. (2017). Incomplete contracts and control. American Economic Review, 107(7), 1731–1752. https://doi.org/10.1257/aer.107.7.1731

Hendrycks, D., Mazeika, M., & Woodside, T. (2023). An overview of catastrophic AI risks. ArXiv. https://doi.org/10.48550/arXiv.2306.12001

Hinton, G. (2023, June 5). Two paths to intelligence [Video]. YouTube. https://www.youtube.com/watch?v=rGgGOccMEiY

Ho, L., Barnhart, J., Trager, R., Bengio, Y., Brundage, M., Carnegie, A., Chowdhury, R., Dafoe, A., Hadfield, G., Levi, M., & Snidal, D. (2023). International institutions for advanced AI. ArXiv. https://doi.org/10.48550/arXiv.2307.04699

International Dialogues on AI Safety. (2024, March 10–11). International Dialogues on AI Safety. Retrieved from https://idais.ai

Mascellino, A. (2023, January 18). ChatGPT creates polymorphic malware. Infosecurity Magazine. https://www.infosecurity-magazine.com/news/chatgpt-creates-polymorphic-malware/

Mecklin, J. (2023, July 20). “Artificial escalation”: Imagining the future of nuclear risk. Bulletin of the Atomic Scientists. https://thebulletin.org/2023/07/artificial-escalation-imagining-the-future-of-nuclear-risk/#post-heading

Montreal Declaration for a Responsible Development of Artificial Intelligence. (2018). Retrieved from https://recherche.umontreal.ca/english/strategic-initiatives/montreal-declaration-for-a-responsible-ai/

Novelli, C., Casolari, F., Rotolo, A., Taddeo, M., & Floridi, L. (2023). Taking AI risks seriously: A new assessment model for the AI Act. AI & Society, 1–5. https://doi.org/10.1007/s00146-023-01723-z

Omohundro, S. M. (2018). The basic AI drives. In Artificial intelligence safety and security (pp. 47–55). Chapman and Hall/CRC. https://doi.org/10.1201/9781351251389-3

Piper, K. (2023, June). A field guide to AI safety. Asterisk. https://asteriskmag.com/issues/03/a-field-guide-to-ai-safety

Quach, K. (2022, March 18). AI drug algorithms can be flipped to invent bioweapons. The Register. https://www.theregister.com/2022/03/18/ai_weapons_learning/

Sandbrink, J. B. (2023). Artificial intelligence and biological misuse: Differentiating risks of language models and biological design tools. ArXiv. https://doi.org/10.48550/arXiv.2306.13952

Shavit, Y. (2023). What does it take to catch a Chinchilla? Verifying rules on large-scale neural network training via compute monitoring. ArXiv. https://doi.org/10.48550/arXiv.2303.11341

Soice, E. H., Rocha, R., Cordova, K., Specter, M., & Esvelt, K. M. (2023). Can large language models democratize access to dual-use biotechnology? ArXiv. https://doi.org/10.48550/arXiv.2306.03809

Sutton, R. (2023, September 9). AI succession [Video]. YouTube. https://www.youtube.com/watch?v=NgHFMolXs3U

Trager, R., Harack, B., Reuel, A., Carnegie, A., Heim, L., Ho, L., Kreps, S., Lall, R., Larter, O., hÉigeartaigh, S. O., Staffell, S., & Villalobos, J. J. (2023). International governance of civilian AI: A jurisdictional certification approach. ArXiv. https://doi.org/10.48550/arXiv.2308.15514

Urbina, F., Lentzos, F., Invernizzi, C., & Ekins, S. (2022). Dual use of artificial-intelligence-powered drug discovery. Nature Machine Intelligence, 4, 189–191. https://doi.org/10.1038/s42256-022-00465-9

Yang, H., Yue, S., & He, Y. (2023). Auto-GPT for online decision making: Benchmarks and additional opinions. ArXiv. https://doi.org/10.48550/arXiv.2306.02224


©2024 Yoshua Bengio. This article is licensed under a Creative Commons Attribution (CC BY 4.0) International license, except where otherwise indicated with respect to particular material included in the article.
