Coconut Library Tool: Democratizing Textual Analysis for Everyone!

No coding necessary!

Data mining and textual analysis have no doubt been topics that are rapidly developing within libraries. While I knew this prior to attending the Digital Library Federation Forum (DLF Forum) this year, it was further confirmed by the many talks, presentations, and discussions at the conference. It is a topic that I and many others have become passionate about, especially in terms of making textual analysis more accessible and approachable to librarians.

Data mining and textual analysis unlock the ability to evaluate large quantities of unstructured textual data in order to identify patterns, trends, and deeper insights that would otherwise be unwieldy and overwhelming to analyze without the help of this technology. As information professionals, I am sure we can all think of large amounts of textual data that we interact with or use that would benefit from data mining and textual analysis. This is why it is important that we explore and reflect on how these tools can be implemented across the field of librarianship, even if the tools can seem intimidating or out of reach. We cannot all be data scientists, but that does not mean we should not embrace and take on these tools and technologies.

With this sentiment in mind, I was fortunate enough to be able to collaborate on a new tool with Faizhal Arif Santosa and Dr. Manika Lamba. From this collaboration, the Coconut Library Tool was created. The goal was to have an openly available textual analysis tool that is straightforward to use and more widely accessible for all libraries, no matter the financial obstacles, technological constraints, or variation of expertise in coding and programming. While the tool was created with libraries and librarians in mind, it is not only limited to this audience, further expanding access to anyone and everyone interested in data mining and textual analysis. With these goals in mind, this web-based application that uses cutting-edge Natural Language Processing (NLP) technologies was born. 

What is the Coconut Library Tool?

Just as each part of the coconut tree has an integral function—from the leaves producing oxygen through photosynthesis to the shells, oil, wood, flowers, and husks, which serve a variety of functions—the Coconut Library Tool recreates this environment for data mining and textual analysis with a cohesive and unified all-in-one tool. The Coconut Library Tool can perform topic modeling, network text analysis, sunburst visualizations, and text pre-processing within one unified, web-based interface.

In addition to ensuring that the Coconut Library Tool was functionally unified, another aspect that we prioritized was the approachability and ease of use, no matter one’s level of expertise. Many analysis and visualization techniques can be intimidating to users who may not have as much experience using these methods, resulting in access and usage barriers. Even well-versed librarians and information professionals can be skeptical about their abilities to use, apply, and access these techniques, which is why this tool needed to be user-friendly, inclusive, and publicly available.


Creating the Coconut Library Tool

With these priorities guiding the development of the application, multiple algorithmic NLP techniques were used on the backend, including topic modeling, lemmatization, stemming, and network analysis and visualizations. By incorporating multiple topic modeling algorithms, users can select between Latent Dirichlet Allocation (LDA) modeling (Sievert & Shirley, 2014), Biterm Topic modeling (Yan, Guo, Lan & Cheng, 2013), and BERTopic modeling (Grootendorst, 2022) for their preferred method and use.
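For readers curious about what the "biterm" in Biterm Topic modeling refers to, here is a minimal Python sketch. It is not the tool's code and it does not fit a topic model; it only shows the first step of the approach, which treats every unordered pair of words that co-occur in a short text as one unit. The function name and the sample keyword lists are made up for illustration.

```python
from collections import Counter
from itertools import combinations


def extract_biterms(tokens):
    """Return every unordered pair of tokens that co-occur in one short text."""
    # Sorting each pair makes ("libraries", "ai") and ("ai", "libraries") the same biterm.
    return [tuple(sorted(pair)) for pair in combinations(tokens, 2)]


# Hypothetical keyword lists, one per document (illustrative data only).
documents = [
    ["artificial intelligence", "libraries", "chatbots"],
    ["artificial intelligence", "libraries", "metadata"],
    ["metadata", "cataloging"],
]

# Count how often each biterm appears across the whole collection.
biterm_counts = Counter()
for tokens in documents:
    biterm_counts.update(extract_biterms(tokens))

for biterm, count in biterm_counts.most_common(3):
    print(biterm, count)
```

A full biterm topic model then estimates topics from these pair counts across the whole collection, which is the part the tool handles for you.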

Additionally, the present application allows analysis of custom CSV files in addition to data from indexing databases, like Scopus, Web of Science, and Lens. These algorithms and supported source files are brought together in a Hugging Face Space deployed with the Streamlit Software Development Kit (SDK), which makes for an approachable and functional application that does not overwhelm users or require them to install anything. This further makes the tool accessible to all, no matter their technological expertise.
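If you are preparing a custom CSV file and want to check it before uploading, a short Python sketch like the one below can help. The column names ("Title", "Abstract", "Author Keywords") and the semicolon-separated keyword format are assumptions chosen for illustration; they are not a documented requirement of the tool, so adjust them to match your own data.

```python
import csv
import io

# Hypothetical CSV content; a real dataset would be a file on disk.
SAMPLE_CSV = """Title,Abstract,Author Keywords
AI in libraries,"A survey of chatbots, search, and metadata.",artificial intelligence; libraries
Topic models for OCR,Exploring newspaper text.,topic modeling; OCR
"""


def load_rows(handle, required_columns):
    """Read CSV rows as dictionaries and fail early if a column is missing."""
    reader = csv.DictReader(handle)
    missing = [c for c in required_columns if c not in (reader.fieldnames or [])]
    if missing:
        raise ValueError(f"Missing columns: {missing}")
    return list(reader)


rows = load_rows(io.StringIO(SAMPLE_CSV), ["Title", "Abstract"])
for row in rows:
    # Split the assumed semicolon-separated keyword field into a clean list.
    keywords = [k.strip() for k in row["Author Keywords"].split(";")]
    print(row["Title"], "->", keywords)
```

To run the same check on a real file, replace `io.StringIO(SAMPLE_CSV)` with `open("your_file.csv", newline="", encoding="utf-8")`, where the file name is a placeholder for your own export.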

While I will not go into further details about the intricate creation of this tool, anyone can view the publicly available source code on GitHub.

How to Use the Coconut Library Tool

To use this tool, you will need an internet connection and a textual dataset.

Step 1

Open up the Coconut Library Tool in your browser and use the sidebar to navigate to the analysis you want to use.

Figure: Coconut Library Tool homepage

Step 2

The next step is to have a source file. This can be a personalized CSV file or data indexed from a database, as mentioned above. For this example, I exported bibliographic data from Scopus with the search terms “artificial intelligence” AND “libraries.”

Figure: Uploading your file (screenshot of the upload screen)

Step 3

Upload the source file and choose the method for the particular analysis you selected. There are other parameters that you can further specify at this step. More examples will be shown later, but the example below will use biterm topic modeling.

Figure: Selecting your method for analysis (screenshot of the methodology screen)

Step 4

Click Submit, then begin to analyze and visualize your data!

Figure: The output (screenshot of the data visualization)

How is this tool applicable to various areas within librarianship?

There are a variety of ways that the Coconut Library Tool can be used within librarianship. With the various topic models provided by the tool, librarians can support work such as ontology creation, automatic subject classification, recommendation services, bibliometrics, altmetrics, and resource searching and retrieval. One example is to look at keyword usage to see developing trends in the research landscape, keyword perception, or potential gaps, like my example below using biterm topic modeling. (I explain biterm topic modeling in an earlier blog post.) Topic modeling can even be used to analyze text extracted through optical character recognition (OCR) from digital collections.

To demonstrate a particular usage for digital collections, I will input textual data from a historical newspaper called the Cleveland Bystander. Below you can see the topics extracted and visualized, showing patterns across the data.

Figure: Biterm topic modeling of the keywords used in over 6,000 articles on artificial intelligence and libraries.
Figure: LDA modeling applied to OCR text from digital collections. Data was collected from the Cleveland Bystander Collection in Digital Case, Case Western Reserve University’s Digital Collections Repository. Learn more about the Cleveland Bystander Collection or explore the other collections in Digital Case.

In addition to improving searchability, librarians can use network text analysis to improve knowledge discovery, assess patterns and trends, evaluate library services, and gain better insights into their data. With easy-to-understand and interactive visualizations, librarians can efficiently interpret relationships across the data between different levels of hierarchy.

Figure: Bidirected network of the index keywords in a dataset of 6,000 publications from Scopus, shown with the corresponding visualization of the same data.
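To give a sense of the kind of data that sits behind a keyword network, here is a minimal Python sketch that counts how often pairs of keywords appear together in the same record. This is a simple co-occurrence count, not necessarily the method the tool uses to build its bidirected network, and the sample records and the function name are hypothetical.

```python
from collections import Counter
from itertools import combinations

# Hypothetical index keywords, one semicolon-separated string per publication.
records = [
    "Artificial Intelligence; Libraries; Machine Learning",
    "Libraries; Machine Learning",
    "Artificial Intelligence; Libraries",
]


def keyword_edges(records, separator=";"):
    """Count how often each pair of keywords appears in the same record."""
    edges = Counter()
    for record in records:
        # Normalize case and whitespace, and drop duplicates within a record.
        keywords = sorted({k.strip().lower() for k in record.split(separator) if k.strip()})
        # Each pair of keywords in a record becomes one edge in the network.
        edges.update(combinations(keywords, 2))
    return edges


edges = keyword_edges(records)
for (source, target), weight in edges.most_common():
    print(f"{source} -- {target}: {weight}")
```

Each counted pair corresponds to a line between two keywords in a network diagram, and the count can be read as the strength of that connection.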

Beyond improving the visualization of data, the text pre-processing capabilities in the Coconut Library Tool enhance the ability to capture semantic meaning for analysis in an efficient and unified environment. These specific features can further support library and information professionals in the delivery of high-quality services in an ever-changing and rapidly innovating society. 

Figure: Keyword stemming result with an export from Scopus
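As a rough picture of what text pre-processing involves, the Python sketch below lowercases a sentence, splits it into words, removes a few common stop words, and trims simple suffixes. The stop word list and the suffix rules are deliberately tiny and made up for illustration; they are not the stemming or lemmatization algorithms the tool uses, which handle far more cases.

```python
import re

# A deliberately small, illustrative stop word list.
STOP_WORDS = {"a", "an", "and", "for", "in", "of", "on", "the", "to"}


def naive_stem(word):
    """Trim a few common English suffixes (a toy rule set, not a real stemmer)."""
    if word.endswith("ies") and len(word) > 4:
        return word[:-3] + "y"
    for suffix in ("ing", "s"):
        # Only strip the suffix if at least three letters remain.
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def preprocess(text):
    """Lowercase, tokenize, drop stop words, and stem each remaining word."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [naive_stem(token) for token in tokens if token not in STOP_WORDS]


print(preprocess("Modeling topics in the libraries"))
```

After this step, "libraries" and "library" count as the same term, which is what lets later analysis group related keywords together.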

With AI, NLP, and other advanced technologies flooding society and librarianship, it can be intimidating and overwhelming to see how these tools fit into one’s own work, but the Coconut Library Tool aims to, at the very least, take away the intimidating nature of figuring out complicated coding or arduous, somewhat confusing user interfaces. By lessening these aspects and making the tool easily approachable, all you have to think about is the data that you work with every day.

Where to begin?

To start thinking about how to use these tools, I will ask some questions that I have asked in one of my previous posts to help jump-start this creative exploration. I want you to consider the textual data that you interact with every day, whether it is data for patrons, your own research, or potentially internal library data.

  • Is it in your department, subject area, or patron community? 
  • Is it administrative, strategic planning, or library-wide initiative data? 
  • What about cataloging, metadata, and search terms? 
  • How about subject-specific data in a database or digital library?
  • How about data revolving around your own interactions with patrons, in committees, or even user research, surveys, or feedback? 

In every scenario that you just thought about in the above questions, data mining and textual analysis can help, and these questions only begin to scratch the surface. In terms of creativity in using this tool, it is truly in the eye of the beholder. With the nerve-wracking feeling of having to become an expert in coding or programming eliminated, the Coconut Library Tool allows librarians to creatively and comfortably explore these advanced techniques.

Try out the Coconut Library Tool yourself and see how easy and exciting these analysis and visualization tools can be!

