Preserving Podcasts in Institutional Repositories

How can archivists tackle this rising form of scholarship?

In an interview we recently ran, Vanderbilt University librarians Melissa N. Mallon and Andrew Wesolek discussed the potential for podcasts as a form of scholarship. Their undergraduate fellows used podcasts to communicate their findings on privacy literacy. Some academics already have moved into the podcasting space to share their research with a wider audience and tackle the misinformation that often appears in this media environment. A recent book examines the possibility of extending the peer review process to podcasts. Though it’s unlikely that tenure requirements will broaden to include podcasts in the immediate future, it does seem like podcasts will continue to grow as a venue for public scholarship.

For librarians and archivists, this presents a practical hurdle. How do we store, catalogue, and maintain these files? Archivists Valerie Collins and Erik Moore explore the unique challenges of podcasts in institutional repositories and possible solutions. Their chapter, “Preserving Podcasts in Institutional Repositories,” appears in Rethinking Institutional Repositories: Innovations in Management, Collections, and Inclusion (ACRL, 2024) and is reprinted below with their permission.


Preserving Podcasts in Institutional Repositories

In 2005, Stanford University launched Stanford on iTunes in partnership with Apple to provide the “general public free access to a wide range of Stanford-specific digital audio content.”[1] This academic/business partnership expanded in 2007 when Apple released iTunes U, a platform providing higher education institutions a space to distribute university-produced audio content. The then-newly emerging technology of podcasting was a natural fit for freely available course lectures, sports programs, informational segments, and other types of educational content, and in the intervening years, university-produced podcasts have proliferated. In 2020, however, Apple announced it would no longer support iTunes U and would remove it from its app store by the end of 2021.[2] While there is no shortage of platforms available to distribute podcasts, the end of iTunes U highlights an issue of concern to those involved in the digital preservation of openly available university-produced content. Namely, how will podcasts be preserved and made accessible going forward? What options do colleges and universities have for preserving this institutional content and maintaining the accessibility of the podcasts?

Institutional repositories (IRs) are one possibility for the preservation of and access to university-produced podcasts. Put in terms of podcasting technology, institutional repositories are content management systems that employ marked-up metadata and syndication protocols to distribute and enhance the discoverability of content in the repository.[3] University-created podcasts represent content that is likely to be within the collecting scope of a university’s institutional repository. As a format, podcasts are digital objects—audio files—that can be collected and described at the item level using descriptive and technical metadata created by the podcast producers. In their production and release, podcasts function as serial publications. In terms of being an informational resource, podcasts produced by academic units not only provide educational content, real-time news, or other updates to their audience but also fulfill a documentary role in capturing a historical record of the institution.

This chapter introduces the University of Minnesota Archives’ efforts to locate and ingest University of Minnesota podcasts into the institutional repository, the University Digital Conservancy (UDC). The inclusion of podcast media in IRs rethinks traditional formats in repositories by focusing on non-text-based content. This undertaking extends the IR’s reach to local creators and new contributors at the institution while broadening its reach beyond an academic audience. By looking past traditional IR scholarly content to include podcasts produced by university departments, institutional repositories can also capture a particular record of the institution that might otherwise be lost: the sound of the university engaged in its mission of research, teaching, and outreach.

Podcasts and Institutional Value

The University Archives and Records Program at the University of Michigan conducted a case study on institutional podcasts in 2010. This work applied an institutional functional analysis based on Helen Samuels’ framework to assess the value of university records. Through this analysis, archivists at Michigan determined that podcasts have archival value, as the content they were evaluating—classroom lectures, outreach, and student recruitment—supported the functions of conferring credentials, conveying knowledge, sustaining the institution, and providing public service.[4] They further determined that individual audio files and metadata should be added to their preservation environment and access copies made available through their institutional repository, Deep Blue.[5] The case study identified several challenges in preserving podcasts and methods to mitigate them, including working with podcast producers and using tools to extract metadata from the digital files and web-based submission forms.[6]

Around the same time, Brock Peoples and Carol Tilley identified the institutional podcast. Peoples and Tilley defined institutional podcasts as recordings of course lectures, supplemental instruction material, or discussions of a faculty member’s research. Where the archivists at Michigan connected these matters to their functional analysis, Peoples and Tilley contextualized these works with traditional scholarly outputs, and identified institutional podcasts as fulfilling the definition of grey literature.[7] They continued by suggesting that institutional podcasts “share similarities with scholarly works and forms of grey literature that libraries collect and preserve directly or manage in institutional repositories.”[8] Whereas the Michigan project deposited podcasts in the IR as an access copy for content in their digital preservation environment, Peoples and Tilley proposed institutional repositories as a means to intentionally preserve podcasts for discoverability and access. Reasoning that IRs provide access to non-traditional scholarly works and that these works would not be made available through more traditional catalogs like library OPACs, and combined with the ephemeral nature of grey literature, Peoples and Tilley concluded that placement of institutional podcasts in IRs “is vital for the continued availability and access of these resources.”[9]

As Peoples and Tilley also note, both podcasts and grey literature “circumvent the traditional publishing process” and thus can be produced and released more quickly than traditionally published materials.[10] This timeliness allows podcast producers to quickly respond to news or events that are of interest to their listeners and to produce new episodes more frequently. Today, many institutionally produced podcasts are released in an episodic format as a type of serialized grey literature to deliver informational news, human interest stories, or academic or scholarly content.

The combination of the serial nature of podcasts and their timeliness as an informational resource presents two issues related to their continued preservation. First, their often-irregular release schedule makes it difficult to identify and capture complete runs of these serialized digital publications. Second, without a regular production cycle, these institutional podcasts are more likely to become dormant. In what is known as “podfade,” podcasts may cease to produce new content without warning, disappear from their hosting platforms, or even see their platforms discontinued, as in the case of iTunes U.[11]

Kidd, Nguyen, and Titkemeyer recognize the inherent vulnerability of podcasts hosted on ephemeral websites but not otherwise managed for long-term preservation in their “Preserve this Podcast” project.[12] The vulnerability of podcasts to loss is no different for university-produced podcasts. While some university podcasts are hosted on university websites and pushed out to other distribution channels via an RSS feed, others have no central institutional web-based home, and in any case, the university is continually updating, migrating, transitioning, and deprecating websites—making even an institutional URL unstable over the long term. By demonstrating the IR’s ability to preserve and provide stable access to podcasts, repository staff can point to the IR as part of their outreach to content producers to recruit other material for the repository, including older, discontinued podcast series.

The Records of a Podcast

It became apparent during the onset of the COVID-19 pandemic that existing university podcasts were shifting their topical focus to the pandemic and new podcasts were being created to address COVID information. These serial podcasts were immediately responsive to a major event and documented not only policy and procedural changes in real-time but also changing public attitudes and awareness. On deeper reflection, archives staff realized that the historical value of the podcasts was not specific to the pandemic but rather that the medium had an inherently documentary nature that provided informational and enduring value and could operate as an institutional record.

Although the audio files of individual podcast series presented an obvious focus for collecting, it was also important to recognize that these files were not the sole record connected to the podcast. For instance, the University of Minnesota has a long history of producing audio content that educates, informs, and engages with current events. During the summer of 1946, a polio outbreak in the Upper Midwest region of the United States precipitated the closure of parks, pools, and fairs and delayed the opening of schools to curb the spread of the epidemic. In response, the University of Minnesota radio station, KUOM, developed on-air programming to both entertain and educate children confined to their homes.[13] The university’s engagement through broadcast technology demonstrated how quickly the resources and expertise of the university could be brought to the public’s benefit. Today, these audio recordings in the University Archives are a primary resource documenting the events of the polio epidemic and the response to it.

The collection of archival records from the KUOM radio station includes textual records related to the development of radio programs, pointing to the likelihood of a larger constellation of contemporary records that may be created in the development of podcast programs. These corresponding records are conceivably even more hidden and likely to be lost than the podcast episodes released online. Eric Hoyt cautions that “if researchers only engage with MP3 and XML files, they will miss important production and reception environments that are contributing to the rise of podcasting as a vital and important media format.”[14] In what ways are institutional repositories able to support these contextual records? The inclusion of podcast series and episode descriptions, thumbnails or other artwork, transcripts, and text-based scholarship or datasets connected to the podcast are all starting points.

Collecting Podcasts at the University of Minnesota

For this project, the institutional podcasts identified by University Archives staff were made available in the University of Minnesota’s institutional repository, the University Digital Conservancy (UDC). Launched in 2007, the Digital Conservancy is considered the “digital arm” of the University of Minnesota Archives and is directed and managed by archives staff. As a program, the Conservancy focuses on the public institutional records of the university, including Senate and Regents minutes, reports and strategic planning documents, and other types of systemwide university publications. The UDC is home to the university’s electronic thesis and dissertation (ETD) collection. It serves as a publication clearinghouse for the Extension Division. It houses the university’s data repository and provides open access to self-submitted scholarly works. However, the majority of the 100,000 plus records within the repository represent non-traditional academic content focusing on the grey literature produced by the university’s many centers, programs, and research units.

The current podcasting environment at the university represents a distributed network of content creators. Repository staff needed a way to identify and connect with these producing units and podcast creators and to determine what information to request from them. This guidance came from the creators of the Mellon-funded Preserve This Podcast project. Following their framework to locate the audio files, capture metadata, and consider ownership issues, staff began this process by distributing a brief survey in May 2020 with the goal of identifying current as well as inactive podcasts and making initial contact with podcast producers. The survey acted as a tool to gather information related to relevant podcasts but also as an introduction to the longer-term purpose and intentions of the project.

The survey identified thirty-two podcast programs. The oldest podcast represented in the survey began in 2006, but this was an outlier. The remainder of the podcasts were created after 2015. Twelve of the twenty-three podcast series identified for inclusion in the repository began in 2019 or 2020. These results suggest staff responding to the survey were primarily concerned with, or only aware of, the podcasts they were actively producing and that this kind of general surveying would be inefficient at identifying earlier years of podcasting at the university. The survey responses indicated the number of episodes for each podcast at the time ranged from two to 163, with the average being nineteen episodes. Most of these series were relatively small—generally because they were on occasional release schedules or the podcast had only recently launched.

In response to the survey, between fall 2020 and spring 2021, archives and repository staff uploaded 752 episodes from twenty-three of the podcast series determined to be within scope. Staff tested and documented a variety of methods to upload the podcasts to the IR. Acquiring the digital files and metadata through different processes helped identify available resources for each podcast. Staff used a mix of approaches that included collecting metadata and episodes directly from the podcast website, uploading files individually, using RSS feeds to automate the collecting of metadata, and utilizing a batch import process through the repository to upload a series in bulk.
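
The RSS-based approach mentioned above can be illustrated with a minimal sketch. It is not the workflow the archives staff used; it only shows, under stated assumptions, how episode metadata might be read from a podcast feed using Python's standard library. The feed content, the example.edu URLs, and the function name parse_feed are all hypothetical, and the sample is inlined so the sketch runs without network access.

```python
import xml.etree.ElementTree as ET

# Namespace used by the iTunes podcast extensions to RSS 2.0.
ITUNES_NS = {"itunes": "http://www.itunes.com/dtds/podcast-1.0.dtd"}

# Hypothetical two-episode feed, inlined so the sketch runs offline.
SAMPLE_FEED = """<rss version="2.0"
     xmlns:itunes="http://www.itunes.com/dtds/podcast-1.0.dtd">
  <channel>
    <title>Example Campus Podcast</title>
    <item>
      <title>Welcome to the Show</title>
      <description>The hosts introduce the series.</description>
      <pubDate>Mon, 06 Apr 2020 12:00:00 GMT</pubDate>
      <enclosure url="https://example.edu/audio/ep1.mp3"
                 length="1234567" type="audio/mpeg"/>
      <itunes:duration>00:24:10</itunes:duration>
    </item>
    <item>
      <title>A Second Conversation</title>
      <description>A guest joins the hosts.</description>
      <pubDate>Mon, 13 Apr 2020 12:00:00 GMT</pubDate>
      <enclosure url="https://example.edu/audio/ep2.mp3"
                 length="2345678" type="audio/mpeg"/>
      <itunes:duration>00:31:45</itunes:duration>
    </item>
  </channel>
</rss>"""


def parse_feed(feed_xml):
    """Return the series title and a list of per-episode metadata dicts."""
    channel = ET.fromstring(feed_xml).find("channel")
    series_title = channel.findtext("title", default="")
    episodes = []
    for item in channel.findall("item"):
        # The enclosure element carries the direct link to the audio file.
        enclosure = item.find("enclosure")
        episodes.append({
            "title": item.findtext("title", default=""),
            "description": item.findtext("description", default=""),
            "published": item.findtext("pubDate", default=""),
            "audio_url": enclosure.get("url") if enclosure is not None else None,
            "duration": item.findtext(
                "itunes:duration", default="", namespaces=ITUNES_NS
            ),
        })
    return series_title, episodes


series, episodes = parse_feed(SAMPLE_FEED)
for episode in episodes:
    print(series, "|", episode["title"], "|", episode["audio_url"])
```

In practice the feed text would be fetched from the podcast's published RSS URL, and the resulting dictionaries could feed either individual uploads or a batch import file, depending on what the repository platform accepts.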

This documentation of current and historical podcasts was crucial to understanding the scale of podcasting at the University of Minnesota and, hopefully, gaining some insight into older podcasts that might not have any online presence. As Gerber-Margie et al. explain, “The magnitude of podcast loss can be measured by the loss of references to the original files.”[15] The files might still exist somewhere, but the absence of any references to a URL makes them essentially lost to those who seek them. Repository staff know that the creation of University of Minnesota podcasts dates to at least 2006, as indicated both by responses to the survey and by secondary references to these podcasts in contemporaneous press releases found in the Digital Conservancy, but these and other concurrent podcasts of earlier eras are otherwise lost.[16]

An example podcast included in the IR is The Moos Room, a series produced by the University of Minnesota Extension Beef and Dairy Teams. The Moos Room “discusses relevant topics to help beef and dairy producers be more successful.”[17] The show’s format is an informal conversation between the hosts and their guests. When the survey was completed, The Moos Room had fourteen episodes and had only been in production for a few months. As a weekly podcast, it has since passed a hundred episodes. Although it had fewer than the average number of episodes when first surveyed in 2020, its episode count grew quickly over the following two years.

The Moos Room is available through multiple platforms, like many of the podcasts represented in the survey. While it originally was available on a blog platform using a umn.edu URL as its “home page,” it now directs listeners to a dedicated podcast hosting platform. From this platform, there is a direct download link for each of the audio files as well as additional descriptive metadata to populate the IR record. The available episode synopsis becomes the item abstract. The hosts and any special guests are listed as the authors. No additional descriptive metadata is created for episodes during this process, except to add the runtime of each episode as a descriptive element. The title field uses the format of “Episode Number: Episode Title: Series Name.” Since the IR is indexed by Google Scholar, this title format helps identify the content when it displays in search results. Although The Moos Room does not currently provide transcripts, other podcasts do, and when available, transcripts are collected with the audio file and added to the repository record as a downloadable PDF.
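
The field mapping described above can be sketched in a few lines. This is an illustration rather than the repository's actual import code: the dictionary keys, the function names build_record and format_runtime, and the sample episode are all hypothetical, and rendering the episode number as "Episode 14" is one assumed reading of the title pattern.

```python
def format_runtime(total_seconds):
    """Render a duration given in seconds as H:MM:SS."""
    hours, remainder = divmod(int(total_seconds), 3600)
    minutes, seconds = divmod(remainder, 60)
    return f"{hours}:{minutes:02d}:{seconds:02d}"


def build_record(series_name, episode):
    """Map one episode's podcast metadata onto repository-style fields."""
    return {
        # Title pattern from the text: Episode Number: Episode Title: Series Name
        "title": f"Episode {episode['number']}: {episode['title']}: {series_name}",
        # Hosts and any special guests are listed together as the authors.
        "authors": list(episode["hosts"]) + list(episode.get("guests", [])),
        # The episode synopsis becomes the item abstract.
        "abstract": episode["synopsis"],
        # Runtime is the only descriptive element added during ingest.
        "runtime": format_runtime(episode["runtime_seconds"]),
    }


# Hypothetical episode, used only to show the shape of the output.
sample_episode = {
    "number": 14,
    "title": "Hypothetical Episode Title",
    "hosts": ["Host One", "Host Two"],
    "guests": ["Guest One"],
    "synopsis": "A short synopsis supplied by the podcast producers.",
    "runtime_seconds": 1450,
}

record = build_record("Example Series", sample_episode)
print(record["title"])    # Episode 14: Hypothetical Episode Title: Example Series
print(record["runtime"])  # 0:24:10
```

Keeping the series name at the end of the title means each episode remains identifiable on its own when it appears in a search result list detached from its collection.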

Podcasts and Discoverability

When transcripts are available to download along with the audio file, including them in the repository record is an easy way to increase the discoverability and accessibility of the podcasts. While institutional repositories can ingest audio and video file types, the text-based formats of transcripts work best with character recognition software and keyword indexing. When transcripts are paired with the podcast, repository users download them more often than the audio files from the podcast record page. While skipping the audio file and going straight to the transcript may seem to defeat the purpose of preserving podcasts, podcasts are composed of many parts, including audio and textual content. Both the Preserve this Podcast project and Eric Hoyt note that the additional contextual records created as part of a podcast’s production, like transcripts, are another layer of ephemeral content that remains offline, unseen, and vulnerable to loss.[18] When transcripts are not available, one option is to process the audio file through a media filter to generate an automated transcription as a text file. This text file can be uploaded to the record to provide indexing for the podcast, similar to the way running optical character recognition (OCR) on a digitized document allows it to be indexed in the repository.
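
To make concrete why a text transcript improves discoverability, here is a toy keyword index. It is a simplified sketch and not how any particular repository platform indexes content; the record identifiers, transcript snippets, and function names are hypothetical.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


def build_index(transcripts):
    """Map each word to the set of record ids whose transcript contains it."""
    index = defaultdict(set)
    for record_id, text in transcripts.items():
        for word in tokenize(text):
            index[word].add(record_id)
    return index


def search(index, query):
    """Return ids of records whose transcripts contain every query word."""
    words = tokenize(query)
    if not words:
        return set()
    results = set(index.get(words[0], set()))
    for word in words[1:]:
        results &= index.get(word, set())
    return results


# Hypothetical transcript snippets keyed by hypothetical record ids.
transcripts = {
    "ep1": "Today we talk about calf nutrition.",
    "ep2": "Pasture management and nutrition in winter.",
}
index = build_index(transcripts)
print(sorted(search(index, "nutrition")))       # ['ep1', 'ep2']
print(sorted(search(index, "calf nutrition")))  # ['ep1']
```

An audio file on its own contributes no words to an index like this one, which is why a record with only an MP3 is findable solely through its descriptive metadata.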

IRs often provide stable and persistent URLs to content. Since content added to the University of Minnesota’s IR is included in the libraries’ digital preservation program, content creators can be assured that episodes will be preserved in a central location, even as the “live” version of the podcast moves between different hosting locations. The repository can also serve as a point of metadata control. For Peoples and Tilley in 2011, bibliographic control and optimized access points were important features and could be achieved by cataloging podcasts for discovery via a library’s OPAC.[19] At the University of Minnesota, the institutional repository leverages the Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH) to make its metadata available to the library’s catalog for ingest. The metadata for each podcast then becomes discoverable via the catalog based on the record in the IR. No additional cataloging is required, and the harvesting and indexing in the catalog are automated. In addition to optimizing access to the podcasts in the repository and the catalog, the IR is crawled and indexed by Google Scholar, allowing for this audio-based grey literature to be discoverable in the same environment as other scholarly works.
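
The harvesting side of an OAI-PMH exchange can be sketched as follows. This is a generic illustration of the protocol, not the University of Minnesota's configuration: the base URL, set name, record identifier, and sample response are hypothetical, while the ListRecords verb, the oai_dc metadata prefix, and the XML namespaces come from the OAI-PMH and Dublin Core specifications.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

# Namespaces defined by OAI-PMH 2.0 and unqualified Dublin Core.
NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "oai_dc": "http://www.openarchives.org/OAI/2.0/oai_dc/",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def list_records_url(base_url, set_spec=None):
    """Build a ListRecords request for Dublin Core metadata."""
    params = {"verb": "ListRecords", "metadataPrefix": "oai_dc"}
    if set_spec:
        params["set"] = set_spec  # optionally limit the harvest to one set
    return f"{base_url}?{urlencode(params)}"


# Hypothetical response, inlined so the sketch runs offline.
SAMPLE_RESPONSE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header>
        <identifier>oai:repository.example.edu:12345/678</identifier>
        <datestamp>2021-03-01T00:00:00Z</datestamp>
      </header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>Episode 1: Welcome to the Show: Example Campus Podcast</dc:title>
          <dc:creator>Host, Example</dc:creator>
          <dc:type>Audio</dc:type>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>"""


def parse_records(response_xml):
    """Extract identifier, title, and creators from a ListRecords response."""
    root = ET.fromstring(response_xml)
    records = []
    for record in root.findall("oai:ListRecords/oai:record", NS):
        dc_path = "oai:metadata/oai_dc:dc/"
        records.append({
            "identifier": record.findtext(
                "oai:header/oai:identifier", default="", namespaces=NS
            ),
            "title": record.findtext(
                dc_path + "dc:title", default="", namespaces=NS
            ),
            "creators": [
                el.text for el in record.findall(dc_path + "dc:creator", NS)
            ],
        })
    return records


print(list_records_url("https://repository.example.edu/oai/request", "podcasts"))
for rec in parse_records(SAMPLE_RESPONSE):
    print(rec["identifier"], "|", rec["title"])
```

A real harvester would also follow the resumptionToken element that OAI-PMH uses to page through large result sets; that step is omitted here to keep the sketch short.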

Conclusion

Podcasts are a digital publishing medium that academic institutions use to share information and promote current research activities. When produced by a university department, a podcast may serve as an institutional record documenting a function of the institution. These podcasts may also enhance a more traditional academic publication or serve as a non-traditional scholarly work like other types of grey literature. Although the format of podcasts as an audio file published through online distribution platforms is not traditionally associated with an institutional repository, the relevance of their content as an institutional record or as a serial publication makes IRs an ideal solution for the preservation of and access to university-produced podcasts.

By adding podcasts to an institutional repository, these audio works receive many of the same benefits as other types of traditional repository content, including indexing in Google Scholar, permanency of the resource, and statistical tracking of use. Institutional repositories ensure that these contemporary institutional records are preserved and document the shifting ways the university reaches out through available mediums to engage communities and share its research and learning.

Notes

  1. “Stanford provides access to university content through iTunes,” Stanford News Service, October 20, 2005, web crawled January 13, 2006, https://web.archive.org/web/20060113075443/http://www.stanford.edu/dept/news/pr/2005/pr-itunes-102605.html.
  2. Lindsay McKenzie, “Apple to Discontinue iTunes U App,” Inside Higher Ed (June 12, 2020), https://www.insidehighered.com/quicktakes/2020/06/12/apple-discontinue-itunes-u-app.
  3. Mary Kidd, Sarah Nguyen, and Erica Titkemeyer, “Subscribe, Rate and Preserve Wherever You Get Your Podcasts,” Journal of Archival Organization 17 (2020): 163, https://doi.org/10.1080/15332748.2020.1769997.
  4. Alexis A. Antracoli, “Podcasts in the Archives: Archiving Podcasting Content at the University of Michigan,” Campus Case Studies (Chicago: Society of American Archivists, 2010), 5, http://files.archivists.org/pubs/CampusCaseStudies/CASE12.pdf. The functions listed are four of the seven identified by Helen Samuels in her book Varsity Letters, which outlines an institutional functional analysis for university and college archivists to identify records that document their institutions.
  5. Antracoli, “Podcasts in the Archives,” 7.
  6. Ibid., 9.
  7. Brock Peoples and Carol Tilley, “Podcasts as an Emerging Information Resource,” College & Undergraduate Libraries 18, no. 1 (2011): 53, https://doi.org/10.1080/10691316.2010.550529.
  8. Peoples and Tilley, “Podcasts as an Emerging Information Resource,” 45.
  9. Ibid., 54.
  10. Ibid., 47.
  11. The issue of podfade is discussed in Peoples and Tilley, “Podcasts as an Emerging Information Resource,” 49; Dana Gerber-Margie, Mary Kidd, Molly Schwartz, and Sarah Nguyen, “Preserve This Podcast: A Podcaster-Led Preservation Strategy,” in Saving New Sounds: Podcast Preservation and Historiography, eds. Jeremy Wade Morris and Eric Hoyt (Ann Arbor: University of Michigan Press, 2021), 224. Both references attribute their discussion on “podfade” to Steve Friess, “Podfading Takes Its Toll,” Wired News (February 7, 2006), https://www.wired.com/2006/02/podfading-takes-its-toll/.
  12. Kidd, Nguyen, and Titkemeyer, “Subscribe, Rate and Preserve,” 173.
  13. “In the Public Interest” online exhibit, University of Minnesota Archives, accessed May 11, 2022, https://gallery.lib.umn.edu/exhibits/show/in-the-public-interest/about-the-exhibit.
  14. Eric Hoyt, “Saving Podcasting’s Contexts: Archive Collecting Strategies and Media Historiography,” in Saving New Sounds: Podcast Preservation and Historiography, eds. Jeremy Wade Morris and Eric Hoyt (Ann Arbor: University of Michigan Press, 2021), 237, https://doi.org/10.3998/mpub.11435021.
  15. Gerber-Margie, Kidd, Schwartz, and Nguyen, “Preserve This Podcast,” 231.
  16. “New U of M Podcasts Aim to Help Students Handle Stress, Eliminate Procrastination and Prepare for Final Tests,” University of Minnesota News Service, in Press Releases, January-June 2006, 272, https://hdl.handle.net/11299/137773.
  17. “About: The Moos Room,” University of Minnesota, accessed May 16, 2022, https://moosroom.transistor.fm/about.
  18. Kidd, Nguyen, and Titkemeyer, “Subscribe, Rate and Preserve,” 167–68; Hoyt, “Saving Podcasting’s Contexts,” 240–43.
  19. Peoples and Tilley, “Podcasts as an Emerging Information Resource,” 55.

Bibliography

Antracoli, Alexis A. “Podcasts in the Archives: Archiving Podcasting Content at the University of Michigan.” Campus Case Studies. Chicago: Society of American Archivists, 2010. http://files.archivists.org/pubs/CampusCaseStudies/CASE12.pdf.

Gerber-Margie, Dana, Mary Kidd, Molly Schwartz, and Sarah Nguyen. “Preserve This Podcast: A Podcaster-Led Preservation Strategy.” In Saving New Sounds: Podcast Preservation and Historiography, edited by Jeremy Wade Morris and Eric Hoyt. Ann Arbor: University of Michigan Press, 2021, 224–36.

Hoyt, Eric. “Saving Podcasting’s Contexts: Archive Collecting Strategies and Media Historiography.” In Saving New Sounds: Podcast Preservation and Historiography, edited by Jeremy Wade Morris and Eric Hoyt. Ann Arbor: University of Michigan Press, 2021, 237–55.

Kidd, Mary, Sarah Nguyen, and Erica Titkemeyer. “Subscribe, Rate and Preserve Wherever You Get Your Podcasts.” Journal of Archival Organization 17:1-2 (2020), 161–77. https://doi.org/10.1080/15332748.2020.1769997.

McKenzie, Lindsay. “The End of the Line for iTunes U?” Inside Higher Ed (June 12, 2019). https://web.archive.org/web/20211203223133/https://www.insidehighered.com/digital-learning/article/2019/06/12/apple-winds-down-itunes-u.

Morris, Jeremy Wade, and Eric Hoyt, eds. “Introduction” in Saving New Sounds: Podcast Preservation and Historiography. Ann Arbor: University of Michigan Press, 2021, 1–25.

Peoples, Brock, and Carol Tilley. “Podcasts as an Emerging Information Resource.” College & Undergraduate Libraries 18:1 (2011), 44–57. https://doi.org/10.1080/10691316.2010.550529.

Preserve this Podcast. Accessed May 16, 2022. http://preservethispodcast.org/.

Samuels, Helen Willa. Varsity Letters: Documenting Modern Colleges and Universities. Chicago, IL: Society of American Archivists, 1992.

Stanford News Service. “Stanford provides access to university content through iTunes.” October 20, 2005. https://web.archive.org/web/20060113075443/http://www.stanford.edu/dept/news/pr/2005/pr-itunes-102605.html.

University of Minnesota Archives. “In the Public Interest” (online exhibit). Accessed May 11, 2022. https://gallery.lib.umn.edu/exhibits/show/in-the-public-interest/about-the-exhibit.

University of Minnesota Extension. The Moos Room. “About.” Accessed May 16, 2022. https://moosroom.transistor.fm/about.

University of Minnesota News Service. “New U of M Podcasts Aim to Help Students Handle Stress, Eliminate Procrastination and Prepare for Final Tests.” In Press Releases, January-June 2006, 272–73. https://hdl.handle.net/11299/137773.


About the authors

Valerie Collins is the former Digital Repositories and Records archivist at the University of Minnesota where she worked with the digital records programs of the University Archives, including the university’s institutional repository and web archiving program.

Erik A. Moore is the university archivist for the University of Minnesota. Moore is also director of the University Digital Conservancy, the University of Minnesota’s institutional repository of scholarly works, research data, and institutional digital records.

