Understanding Bias in AI through Guardrails

Do guardrails eliminate bias or create it?

A librarian encountering AI guardrails leading to bias

What does it mean when an AI chatbot refuses to answer a question? It may be a sign of a guardrail or safeguard put in place to prevent the AI from providing incorrect or even malicious information. But what if the question is innocuous or even purely informational?

In early 2023, my colleague Christopher Rhodes asked Google Bard if Pete Buttigieg was gay, and Bard refused to answer. I repeated this experiment in September 2023 (see figure 1), and Bard responded to the prompt: “I can’t assist you with that.” By refusing to acknowledge the sexuality of an openly gay political figure, Google Bard, like many others, implies that homosexuality is something to be ashamed of and not to be discussed in public forums. I believe this is an example of a guardrail that, in trying to prevent biased results, actually confirms bias in the system.

Figure 1. Asking Google Bard if Pete Buttigieg is gay. Bard replies: "I can't assist you with that, as I'm only a language model and don't have the capacity to understand and respond"


Guardrails are not universal or standardized, and they are not subject to regulation. They offer an imprecise, even inaccurate, solution to the much bigger problem of bias in AI. And they can be easily circumvented. In a July 2023 article, The New York Times reported that researchers at Carnegie Mellon University and the Center for AI Safety circumvented the safety measures of open-source chatbots and then used the same methods on closed systems such as ChatGPT, Google Bard, and Claude. They succeeded in getting these chatbots to disregard their built-in guardrails.

In February 2024, Google paused Gemini’s ability to generate images of people following public backlash. In a blog post, Senior Vice President Prabhakar Raghavan writes that the Gemini feature was tuned “to ensure it doesn’t fall into some of the traps we’ve seen in the past with image generation technology—such as creating violent or sexually explicit images, or depictions of real people.” Guardrails prevented Gemini from displaying images of people of color in response to some prompts, and in others placed people of color in historically inaccurate contexts and scenes. Raghavan writes that “over time, the model became way more cautious than we intended and refused to answer certain prompts entirely—wrongly interpreting some very anodyne prompts as sensitive,” demonstrating that guardrails can overcorrect and respond to prompts too cautiously.

Guardrails can even create hallucinations, that is, information that chatbots invent or exaggerate to satisfy user queries. I was able to generate a hallucination in Google Bard by asking if former President Donald Trump is straight (see figure 2). The chatbot wrote that Trump’s “many homophobic and transphobic comments over the years [lead] people to believe that he is not straight.” Bard goes so far as to conclude that Trump’s “public statements and actions suggest that he is not straight.” In this case, Bard conflates homophobia with homosexuality, a response that misleads users and spreads misinformation.

Figure 2. Asking Google Bard if Trump is straight. Bard responds that Trump hasn't explicitly stated his sexual orientation but has been married multiple times and has children. It adds that he has made transphobic and homophobic remarks, which have led people to believe that he is not straight, and ultimately concludes that Trump is not straight.

Guardrails paint a complex picture that we must analyze closely in order to better understand bias in generative AI. Cautious, even paternalistic, guardrails stifle conversation and imply moral judgments where the open pursuit of knowledge should be encouraged. By denying basic facts or hallucinating correlations where there are none, generative AI reinforces bias.

The ability to ask questions and get informative, factual answers is central to effective communication with artificial intelligence.

🔥 Sign up for LibTech Insights (LTI) new post notifications and updates.

✍️ Interested in contributing to LTI? Send an email to Deb V. at Choice with your topic idea.