The What, Why, and How of Web Archiving

Web content doesn't stay around forever. Here's how to preserve it.

As time moves on, we conduct more and more of our daily lives online. Work, education, communication with friends and family, news, and images are presented and shared through a variety of websites. Many resources that were previously published and distributed on paper are now only available online, like course catalogs, promotional materials, reports, and large data sets. With the recent COVID-19 pandemic, even more of our interactions have moved to the web. As is the case with any digital material, web content is more vulnerable to loss than analog media.

According to a Scientific American report, the average lifespan of a webpage in 1997 was just 44 days; by 2003, it had increased to 100 days. As of 2021, the average lifespan of a website is 2 years and 7 months. Estimates like these depend on many factors, but the numbers are cause for concern. When information can so easily be edited or removed from a website entirely, how can it be preserved? This is where web archiving comes into play.

What is web archiving?

Web archiving is “the process of collecting, preserving, and providing enduring access to web content,” according to the official definition from the Society of American Archivists. It records the entire contents of a web page, including its images, and keeps links interactive. Web archives differ from screenshots, which are static and provide only an image of what a web page looked like. Screenshots are also easy to edit and manipulate, unlike web archives.

People can archive websites with a variety of tools, such as the Save Page Now feature on the Wayback Machine website, a free account on Conifer, or a paid subscription service like Archive-It.

  • The Wayback Machine, founded by the Internet Archive, is the best-known web archiving tool, with over 789 billion web pages saved. Because it is free, individuals often use it to take a quick capture of a website.

  • Archive-It, the paid service offered by the Internet Archive, is used by organizations such as libraries, cultural heritage institutions, and community and educational groups. It provides web archiving tools, training, and a platform to search and discover collections. Archive-It users can add their own metadata to their collections, including searchable subject headings. These collections can also be harvested into library discovery layers to expand their reach.

  • Conifer is an open-source tool that provides 5GB of web archiving storage for free, with additional storage and benefits available through membership. It has a high success rate for capturing more complex websites that include embedded videos and 3D graphics.

While each of these tools has its benefits, the choice of tool depends on the types of collections you would like to archive, your budget, and how you would like users to access them.
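Before choosing a tool, it can help to check whether a page already has a capture in the Wayback Machine. The sketch below is one possible way to do that in Python, assuming the Internet Archive's public availability endpoint at archive.org/wayback/available and the JSON shape it has been documented to return. The function names are hypothetical and chosen for illustration, and the endpoint's behavior may change.

```python
import json
import urllib.parse
import urllib.request

# Assumption: the Internet Archive's public availability endpoint.
AVAILABILITY_ENDPOINT = "https://archive.org/wayback/available"


def availability_query_url(page_url, timestamp=None):
    """Build the lookup URL for a page (hypothetical helper name).

    timestamp is an optional string such as "20210101"; when given,
    the service is asked for the capture closest to that date.
    """
    params = {"url": page_url}
    if timestamp:
        params["timestamp"] = timestamp
    return AVAILABILITY_ENDPOINT + "?" + urllib.parse.urlencode(params)


def closest_snapshot(payload):
    """Return the URL of the closest capture, or None if there is none.

    Assumes a response shaped like:
    {"archived_snapshots": {"closest": {"available": true, "url": "..."}}}
    """
    closest = payload.get("archived_snapshots", {}).get("closest")
    if closest and closest.get("available"):
        return closest.get("url")
    return None


def lookup(page_url, timeout=30):
    """Query the service over the network and return a capture URL or None."""
    with urllib.request.urlopen(availability_query_url(page_url), timeout=timeout) as resp:
        return closest_snapshot(json.load(resp))
```

Calling lookup("example.com") would return the address of an existing capture if the service reports one, or None if the page has not been archived yet, which is a signal that it may be worth saving.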

It is especially important to capture at-risk materials, such as government and social justice websites and social media. Web archiving can be a cross-departmental collaboration involving technical services, special collections, reference, and instruction. Working together, librarians can help preserve the web, while also teaching people about the significance of what is posted online.

Ways to incorporate web archiving at your library

  • Engage students in experiential learning through “citizen archiving.” The “Save Page Now” feature on the Wayback Machine allows anyone to freely archive a single public web page. At our institution, we created a co-curricular learning opportunity where students learned about the impermanence of the web and then selected a website related to the “common good” to archive. The program was created with three learning goals in mind: teach students about the process of web archiving, introduce them to the web archive collections curated by the university libraries, and instill in them the importance of preserving history on the internet. The Internet Archive’s Wayback Machine was useful for this program due to its underlying idea of our collective responsibility to contribute to one data archive.

  • Archive your social media accounts. Due to the potential for instability with social media platforms, now is an important time to archive your university’s and library’s Facebook, Twitter, and Instagram posts. In addition to web archiving tools, you can download all your tweets directly from Twitter.   

  • Highlight your distinctive collections. Our archivists curate several different web archive collections using the Archive-It tool from the Internet Archive. These web collections complement the physical materials in each of our three special collections areas: the Marian Library, the U.S. Catholic Special Collection, and the University Archives and Special Collections. 
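For a citizen archiving session like the one described above, a short script can turn a list of student-selected pages into Save Page Now links that participants then open in a browser. This is a minimal sketch, assuming the commonly used web.archive.org/save/ address pattern; the function names are hypothetical, and the service may apply limits or change how it handles these requests.

```python
from urllib.parse import urlsplit

# Assumption: prefixing a page address with this path asks the
# Wayback Machine's Save Page Now feature to capture that page.
SAVE_PREFIX = "https://web.archive.org/save/"


def save_page_now_url(page_url):
    """Return a Save Page Now link for one public web page (hypothetical helper)."""
    parts = urlsplit(page_url)
    if parts.scheme not in ("http", "https") or not parts.netloc:
        raise ValueError(f"not a public web address: {page_url!r}")
    return SAVE_PREFIX + page_url


def build_capture_list(page_urls):
    """Build links for a workshop, skipping blank lines and duplicates."""
    seen = set()
    links = []
    for url in page_urls:
        url = url.strip()
        if url and url not in seen:
            seen.add(url)
            links.append(save_page_now_url(url))
    return links
```

The script only prepares the links; each student still triggers the capture, which keeps the hands-on element of the exercise.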

Librarians and archivists should pursue web archiving as a form of collection development and preservation. The records created online today are the historical record for tomorrow, and by preserving the internet, we are ensuring a diverse historical record for the future.


Choice and LibTech Insights gratefully acknowledge our launch sponsor, Dimensions, a part of Digital Science. Dimensions is the largest linked research database available and provides a unique view across the whole research ecosystem from idea to impact.
