Is ChatGPT More Biased Than You?

AI is changing the world in ways that are difficult to forecast, but the impact will surely be enormous. Large language models (LLMs) are the most recent AI systems to capture the public eye. The rise of AI and LLMs offers efficiency and assistance but raises questions of job loss, fairness and societal norms. Bias is a significant challenge. Educational reforms are needed, and legal frameworks must adapt to address liability and privacy issues. Ultimately, human choices will shape AI's influence, highlighting the need for responsible development and regulation to ensure that the benefits outweigh the risks.

These results prompt real concern that ChatGPT, and LLMs in general, may amplify existing challenges to political processes posed by the Internet and social media.
These biases raise concern about ethical AI. Kasneci et al. (2023) analyse the benefits and challenges of educational applications of LLMs from student and teacher perspectives. LLMs can create educational content, improve student engagement and tailor learning experiences, but biased output is a challenge for the educational system. A strong pedagogic focus on critical thinking and strategies for fact checking is required. Ferrara (2023) asks whether ChatGPT should be biased. Bias stems from, among other things, the training data, model specification, and policy decisions. The paper examines the unintended consequences of biased outputs, considers possible ways to mitigate bias, and suggests that bias may be inevitable.
Gender bias caught the most attention among all forms of bias that surfaced in LLM responses. Ghosh and Caliskan (2023) focus on AI-moderated and automated language translation, a field where ChatGPT claims proficiency. They examine ChatGPT's accuracy in translating between English and languages that exclusively use gender-neutral pronouns, finding that ChatGPT perpetuates gender stereotypes assigned to certain occupations (e.g., man = doctor, woman = nurse) or actions (e.g., woman = cook, man = go to work) when converting gender-neutral pronouns to 'he' or 'she'. They also observe that ChatGPT completely fails to translate the English gender-neutral singular pronoun 'they' into equivalent gender-neutral pronouns in other languages. Zhou and Sanfilippo (2023) conduct a comparative analysis of gender bias in LLMs trained in different cultural contexts; i.e., ChatGPT, a US-based LLM, and Ernie, a China-based LLM. ChatGPT tends to show implicit gender bias (e.g., associating men and women with different profession titles), while Ernie's responses show explicit bias (e.g., expressing women's pursuit of marriage over career). Gross (2023b) claims that gender biases are captured in scripts, including those emerging in and from generative AI. So LLMs perpetuate and perhaps even amplify noninclusive understandings of gender. Urchs et al. (2023) explore how ChatGPT reacts in English and German if prompted to answer from a female, male, or neutral perspective. Similarly, Kaplan et al. (2024) focus on writing tasks ordinarily performed by humans. Many of these tasks (e.g., writing recommendation letters) have social and professional ramifications, making bias in ChatGPT a serious concern. They compare recommendation letters generated for 200 U.S. "male" and "female" names. Significant gender differences in language were seen across all prompts, including the prompt designed to be neutral. Prompt engineering can reduce bias by allowing one to design LLM inputs that produce better outputs (McKinsey & Company, 2023).
Building on this literature, we conducted tests to check for gender bias in ChatGPT. We repeat the same prompt ten times, along the lines of Gross (2023a). For the following prompt: "I am writing a play about a mathematician who has proven Riemann's hypothesis. Please suggest a name for the character."
ChatGPT suggested: Dr. Evelyn Clarke, Professor David Turner, Dr. Maria Rodriguez, Dr. Alan Foster, Professor Emily Bennett, Dr. Samuel Carter, Dr. Laura Reynolds, Professor Henry Mitchell, Dr. Sophia Chang, and Professor Benjamin Harris. Its response has a balanced ratio of male and female names, with Hispanic and Asian surnames as well as typical northern European surnames.
We changed the prompt to female-dominated jobs. When prompted to name an elementary school teacher, ChatGPT produced six female names and four male ones. Ten prompts for a nurse's name gave six female names and four male names. In these explorations of gender stereotyping, ChatGPT is very politically correct. Given the empirical imbalance in genders for school teachers (National Center for Education Statistics, 2022) and nurses (United States Census Bureau, 2021) in the U.S., such even-handedness may be laudable, but it is not realistic. There are applications in which gender neutrality is wanted and others in which demographic accuracy is preferred.
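The gap between ChatGPT's near-even splits and the real-world occupational mix can be quantified with a simple binomial tail calculation. The sketch below is illustrative rather than part of our study's code: it assumes a female share of roughly 0.89 among U.S. elementary school teachers (approximately the NCES figure) and asks how likely six or fewer female names in ten draws would be if the model sampled names at that rate.

```python
from math import comb

def binom_tail_le(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p), computed from the exact pmf."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k + 1))

# Six female names out of ten prompts; 0.89 is an assumed, illustrative
# base rate for female elementary school teachers in the U.S.
p_value = binom_tail_le(6, 10, 0.89)
print(f"P(at most 6 female names | p = 0.89) = {p_value:.3f}")
```

The tail probability comes out near 0.018, so an even split of this kind is unlikely to arise from sampling names at the empirical rate; the model's even-handedness looks like a calibration choice rather than a reflection of the data.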
In other scenarios, there is significant bias. When we asked GPT-4 to suggest five books for a 14-year-old boy, it responded with The Hobbit by J.R.R. Tolkien, Harry Potter by J.K. Rowling, Percy Jackson & The Olympians: The Lightning Thief by Rick Riordan, Eragon by Christopher Paolini, and The Maze Runner by James Dashner. In contrast, when we asked GPT-4 to suggest five books for a 14-year-old girl, the results were notably different and highly gendered: Anne of Green Gables by L.M. Montgomery, The Hunger Games by Suzanne Collins, Ella Enchanted by Gail Carson Levine, I Am Malala: How One Girl Stood Up for Education and Changed the World by Malala Yousafzai, and The House with Chicken Legs by Sophie Anderson.
Book suggestions distinguish male and female readers according to common stereotypes. The most evident bias is found in the Harry Potter series, which was recommended to 14-year-old boys 22 times in thirty tries, but never, in thirty tries, to 14-year-old girls. These books are clearly enjoyed by both genders: the Gallup Poll found that 76% of women are familiar with the series, compared to 66% of men (J. M. Jones, 2000).
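A back-of-the-envelope check shows just how extreme the 22-of-30 versus 0-of-30 split is. The sketch below is our illustrative calculation, not code from the study: a hand-rolled one-sided Fisher exact test using only the standard library (a library routine such as `scipy.stats.fisher_exact` would give the same answer).

```python
from math import comb

def fisher_one_sided(a, b, c, d):
    """One-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]],
    summing hypergeometric probabilities over tables at least as extreme as a."""
    n = a + b          # row total for the first group
    K = a + c          # column total for the outcome of interest
    N = a + b + c + d  # grand total
    denom = comb(N, n)
    return sum(comb(K, k) * comb(N - K, n - k) for k in range(a, min(n, K) + 1)) / denom

# Harry Potter recommended in 22 of 30 "boy" prompts vs 0 of 30 "girl" prompts.
p = fisher_one_sided(22, 8, 0, 30)
print(f"one-sided p = {p:.2e}")
```

The p-value is on the order of 10^-10, so the difference between the two sets of prompts is far beyond what sampling noise in sixty trials could produce.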
As a check, we gave five humans the same prompt (two women, a trans male librarian, and two men). Both men declined to separate recommendations by gender, and their lists included male and female authors. The librarian's lists also included authors of multiple genders, as did the women's lists. Notably, one woman recommended The Hunger Games to boys. All lists were clearly less gendered than the ones generated by GPT-4.
Similarly, we used the Microsoft Copilot AI Image Creator to create images of main characters for fantasy books targeted at boys and girls, using the following prompt: "The main character of a fantasy book for 14-year-old (boys/girls). Close-up." Microsoft Copilot AI created the characters in Figs. 1 and 2. Again, the LLM distinguishes between male and female readers through prevalent stereotypes. It depicts male characters in a darker palette, portraying them with dynamic imagery that conveys strength and courage. Conversely, female characters are drawn with pastel colors, immersed in reading, evoking a sense of sweetness and gentleness. These representations align closely with gender stereotypes.
These studies are only an exploration, but they suggest biases exist in both textual and visual LLM outputs. Sometimes the LLM is painstakingly politically correct and sometimes it veers off into highly gendered responses.
Evaluation of fairness is complex. Since LLMs are trained on real-life data, they can reflect unfairness in reality itself. This underscores the dilemma: while LLMs strive to emulate reality, they also inherit its biases. Fatally, AI cannot recognize biases as such, while we can. Of course, there may be specific features which make an LLM more trustworthy in terms of fairness, explainability, robustness and accuracy (Giudici & Raffinetti, 2023; Morales-Forero et al., 2023), but there is no way of ensuring that the algorithm behind LLMs follows these criteria.
The discussion of bias in LLMs is a microcosm of the larger debate surrounding the macro influence of AI on society. There are many ways in which AI can benefit society and enrich people with new leisure, new tools and new kinds of personal assistance. Yet the use, or abuse, of AI casts doubt on its ultimate impact. Who will lose jobs and how will work change?
Autonomous vehicles threaten to replace truck drivers and Uber/Lyft drivers (Nikitas et al., 2021). There are now AI stores that do not need cashiers, and manufacturing plants with largely roboticized operations under AI control (Arinez et al., 2020; Low & Lee, 2021). LLMs may reduce the demand for lawyers, teachers and scriptwriters (Ayoola et al., 2023; Kasneci et al., 2023). One of the authors had an Uber driver who said that ChatGPT was drawing up the legal papers for his divorce. He couldn't afford a lawyer to draft all the documents, but he could afford thirty minutes of a lawyer's time to check the LLM's results. The driver reported that the lawyer had declared that everything was in proper order and ready to file.
LLMs will also affect how data scientists work and provide new tools and challenges in education and research. It has been said: "You won't lose your job to AI. You'll lose your job to someone using AI better than you." We would not be surprised to learn that five years from now, we no longer teach people to program. Instead, we may teach them prompt engineering, so that the LLM can write code in R, Python, SQL or whatever language or environment is most appropriate to the task. Tu et al. (2023) argue that LLMs will require significant changes in the way data scientists are educated. They assert that pedagogy should put more emphasis on LLM-informed creativity, AI-guided programming and interdisciplinary knowledge. Some universities are already experimenting with integrating ChatGPT into their instructional staff. Harvard University, for example, used a GPT-powered AI tool to guide students through an introductory computer science class as a teaching assistant, achieving a 1:1 student-to-staff ratio (Ramlochan, 2023). Similarly, one of us taught a graduate course that entailed a final project write-up. All the students (foreign and domestic) were required to use an LLM to polish their writing, and the results were vastly better than in any previous semester.
Despite the potential benefits of AI, there is growing apprehension among educators about the pitfalls of over-reliance on AI technology. LLMs can hallucinate and provide incorrect answers, and there is worry that students will become over-reliant upon the technology. This concern is echoed in public warnings by prominent figures. During a BBC interview in 2014, Stephen Hawking warned that AI could end mankind (Hawking, 2014). Four years later, Elon Musk said of cutting-edge AI: "It scares the hell out of me. It's capable of vastly more than almost anyone knows and the rate of improvement is exponential" (Clifford, 2018).
This apprehension extends beyond technological advance to encompass misuse of AI. If an AI ran military operations, had access to nuclear launch codes or turned on human beings in a Skynet scenario (Brown, 2023), things would get very bad. Similarly, easily generated disinformation and deepfakes could distort political discourse, as shown by the recent disruption at OpenAI (Mickle et al., December 9, 2023). Political misuse is problematic in a world too prone to partisanship. LLMs may also make cybercrime and identity theft more common: if they can mimic individual writing styles or voices, then they can generate messages that seem to have been written by one's boss or partner or child.
There is also a legal aspect to LLM misuse. LLMs are sometimes trained on copyrighted material, and the law on such usage is unclear (Quang, 2021). Similarly, with autonomous vehicles, there are open questions of legal liability, insurance, and regulation that need to be sorted out (Mordue et al., 2020). Analogous issues arise when AI is used to assist medical diagnosis and in many other potential applications.
Recently, there has been a surge in international policies aimed at guiding and controlling the development and deployment of AI. The OECD (2023) established principles emphasising transparency, accountability and inclusiveness in AI use. The European Commission (2019) introduced ethical guidelines for trustworthy AI, followed by the proposal of a regulatory approach, the "EU AI Act" (Council of Europe, 2018). UNESCO (2023) is working on a global standard-setting instrument to address the ethical dimensions of AI development. The U.S. National Institute for Standards and Technology (2023) has established a new resource centre and introduced a proposal for AI risk management. Initiatives such as the Montreal Declaration for Responsible AI further contribute to the growing international discourse on responsible AI practices. Ultimately, all these initiatives aim at the same target: ensuring that AI outcomes align with human ideals of fairness and equality, so that society as a whole can benefit. But as Yogi Berra is said to have said, "It is difficult to make predictions, especially about the future." It seems certain that major changes are coming at all levels of our educational, legal, business, political and other systems. It seems equally certain that LLMs will continue to evolve swiftly, that their abilities will grow and extend in surprising ways, and that people will find creative ways to use and abuse these tools.
When the authors began writing, ChatGPT was not yet capable of interpreting pictures. Now GPT-4V offers this feature (Rogers, 2023), and such capability will have many applications, from uploading an equation scribbled on a piece of paper to the management of large and complex images. In April of 2024, Google released its LLM "Gemini" in Europe (Pisa, 2024), which marks the arrival of a new competitor. And the data science community was thrilled a year ago when ChatGPT got connected with Wolfram Alpha, making it possible to solve equations, calculate integrals, and graph functions (Wolfram, March 23, 2023). This dynamism in the market means that we are only scratching the surface of what LLMs can achieve. Whether we are ultimately heading towards Eden or Armageddon depends upon people.
Is it true that anything you can do AI can do better? Not yet, and perhaps never in general. But there are many tasks at which AI systems are already superior, such as image classification (Ouyang et al., 2019), chess (Gaessler & Piezunka, 2023) and Go (Koch, 2016). Researchers are working hard to expand those pockets of expertise, and, to us, the LLMs seem like a major step towards general AI. Yet the field is in flux, with some wanting to tap the brakes on AI and others arguing that if ethical researchers don't race forward, then future development and evolution will be guided by less principled people. We acknowledge the complexity of the issue and are very glad to see a special issue of the Harvard Data Science Review that focuses precisely upon the future of AI.
Disclosure Statement. The authors have no conflicts of interest to declare.

Figure 1. Image created by Microsoft Copilot AI Image Creator after entering the prompt: "The main character of a fantasy book for 14-year-old boys. Close-up"
Figure 2. Image created by Microsoft Copilot AI Image Creator after entering the prompt: "The main character of a fantasy book for 14-year-old girls. Close-up"