Considering RAG when Evaluating Generative AI Tools

Not all AI tools are equal for research


Everything was easier back in November 2022 when ChatGPT came on the scene. There was one popular large language model (LLM). While we didn’t know exactly what to do with it, it was really cool to play with. But a year and a half later, the novelty of AI has worn thin.

An April 2024 ACRLChoice webinar poll revealed that 68% of university libraries are “still figuring it out” when it comes to AI. With increased pressure to figure it out, academic librarians are presented with an overwhelming number of AI tools, some of which are incredibly expensive. How do you even begin creating a rubric to evaluate these products?

Categorizing AI applications

By now, most of us know there are different flavors of consumer AI applications:

  • Conversational AI has crept into our lives in the form of digital assistants. While Siri and Alexa are eager to set a timer and tell us cat jokes, they tend to stick to a basic script. They’re like the gonk droid of AI.

  • Predictive AI is currently used in business and health care to forecast trends. Your bank uses predictive AI to scan transactions and flag fraudulent charges. Marketers use it to analyze customer buying trends and feed you online ads. Predictive AI analyzes data but does not create new knowledge.

  • Generative AI is far more powerful and can actually create new content, like stories and fake press releases. Scientists train LLM algorithms on massive amounts of data (for example, millions of articles from the New York Times, allegedly). When the user enters into conversation with the LLM, the algorithm builds an answer based not on the rules of English grammar but on the likelihood of what the next word should be, according to its training (see the toy sketch after this list).
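
To make that next-word idea concrete, here’s a toy sketch in Python. The word probabilities are invented for illustration; a real LLM scores every token in a huge vocabulary using billions of learned parameters, but the sampling step works on the same principle.

```python
import random

# Toy next-word model. The probabilities are invented for illustration;
# a real LLM computes them from its training data.
next_word_probs = {
    ("the", "card"): {"catalog": 0.55, "game": 0.30, "trick": 0.15},
    ("card", "catalog"): {"was": 0.60, "is": 0.30, "smelled": 0.10},
}

def next_word(prev_two):
    """Sample the next word in proportion to its probability."""
    probs = next_word_probs[prev_two]
    words = list(probs)
    weights = [probs[w] for w in words]
    return random.choices(words, weights=weights)[0]

print(next_word(("the", "card")))  # usually "catalog", sometimes "game" or "trick"
```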

Most of the tools developed for libraries are generative AI applications. These are legitimate research tools: they help with reference management and automate literature review and document analysis. They have been created by some of the leading names in library technology: Clarivate, Elsevier, JSTOR, and SirsiDynix, just to name a few. But how do these tools square with the moral panic over students using ChatGPT to cheat on their homework? How can one AI tool be virtuous and the other evil?


🌟 Subscribe to the LibTech Insights newsletter for weekly roundups and bonus content.


Why you need to know RAG

It all comes down to Retrieval Augmented Generation (RAG). RAG is a framework that builds an application around a generative LLM by connecting it to an outside source of information. While it’s fun to play around with ChatGPT and Chatbot Arena, they aren’t particularly useful for scholars.

The limitations of generative AI chatbots are myriad:

  • They don’t provide citations.
  • They are prone to hallucinations.
  • There is no way to reproduce results.
  • There are major issues with copyright and user privacy.

While RAG doesn’t solve all these problems, it begins to address the concerns of the academic community. It’s the first step toward creating an LLM tool that is actually useful for higher education. At its core, RAG is the difference between generative AI for fun and generative AI as a legitimate research tool.

RAG enhances the LLM neural network by bringing in new information (for example, a specific dataset) and optimizing output so that users know how that output was generated (for example, citations). Say I ask Claude to tell me the social consequences of the Supreme Court cases decided in April 2024. You might expect the Claude LLM to pull recent cases from the Supreme Court website. But it turns out the free, non-RAG version of Claude was last trained on data from August 2023. The chatbot freely admits that it doesn’t know anything about what happened last week. This is disappointing, but at least it admits not knowing. After all, other LLMs just make stuff up.

Claude gives a generic "I can't answer this question" response to a query about Supreme Court decisions in 2024
Claude is unable to respond to prompts requesting information more recent than its training data.

The RAG difference

Perplexity, which uses RAG, adds value to the underlying LLM by helping users refine their prompts. It also allows users to choose a dataset and provides citations and links for search results. Claude on its own has limitations, but Perplexity wraps it in tools that create a much more useful and trustworthy experience.

I asked Perplexity my question about the latest Supreme Court decisions and had a completely different experience. First, it prompted me to get specific about the social consequences I was interested in: civil rights, environment, or immigration. That’s a good question, Perplexity.

Perplexity responds to a question about the social consequences of Supreme Court cases in 2024 by asking the user additional questions about the specific consequences they're interested in
Perplexity asks for additional information to tailor its results.

I can still use Claude as my LLM, but RAG allows me to choose my dataset (Semantic Scholar, Reddit, or the full web). It also gives me citations so I can check that the information I’ve received is not a hallucination.

A screenshot of Perplexity's answer to a query, showing its precise sources for the information
Perplexity provides a response with citations.
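
Under the hood, the RAG recipe has three steps: retrieve documents relevant to the question from a chosen dataset, fold them into the prompt as numbered sources, and have the LLM generate an answer that cites those sources. Here’s a minimal, self-contained sketch of that pattern; the toy dataset, the keyword retriever, and the call_llm stub are illustrative stand-ins, not Perplexity’s (or any vendor’s) actual implementation.

```python
# A minimal sketch of the RAG pattern. Everything here is a toy
# stand-in: real products use vector search over millions of documents
# and a real LLM API, but the retrieve-augment-generate shape is the same.

DATASET = [
    {"title": "Doc A", "text": "Retrieval augmented generation grounds answers in sources."},
    {"title": "Doc B", "text": "LLMs trained in 2023 know nothing about 2024 events."},
]

def retrieve(query, k=2):
    """Step 1 (Retrieval): rank documents by crude keyword overlap."""
    words = set(query.lower().split())
    ranked = sorted(DATASET, key=lambda d: -len(words & set(d["text"].lower().split())))
    return ranked[:k]

def call_llm(prompt):
    """Stand-in for a real LLM API call; here it just echoes the prompt."""
    return "(LLM answer grounded in the prompt below)\n" + prompt

def rag_answer(question):
    docs = retrieve(question)
    # Step 2 (Augmentation): number the sources so the model can cite them.
    sources = "\n".join(f"[{i+1}] {d['title']}: {d['text']}" for i, d in enumerate(docs))
    # Step 3 (Generation): ask the model to answer only from those sources.
    prompt = (
        "Answer the question using ONLY the numbered sources, and cite them.\n\n"
        f"Sources:\n{sources}\n\nQuestion: {question}"
    )
    return call_llm(prompt)

print(rag_answer("What do LLMs know about 2024?"))
```

It’s this grounding-and-citing step, not the underlying LLM, that turns a chatbot into something a researcher can verify.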

RAG is pretty powerful stuff, and it’s why subscription tools such as Scopus AI, Scite, PowerNotes, and others add so much value to LLMs. Chatbots may be cool toys, but they are not tools for scholars. RAG technology is still new, but it’s getting better with every product release. Understanding RAG and its role in enhancing LLMs is key for librarians who evaluate and make purchasing recommendations for generative AI tools at their institutions. Students and faculty also need to understand RAG when considering which AI product to use for their own research. AI has permanently changed how we do research, and a good RAG is a must in any scholar’s toolkit.


🔥 Sign up for LibTech Insights (LTI) new post notifications and updates.

✍️ Interested in contributing to LTI? Send an email to Daniel P. at Choice with your topic idea.