Why Is There a Lawsuit Against the Internet Archive?

Oral arguments were heard this week in a case that could shape the future of eResources

The Internet Archive hosts one of the largest online collections of cultural artifacts, and this week, the U.S. District Court for the Southern District of New York heard oral arguments in a case that could determine its future. In Hachette v. Internet Archive, four major publishing houses (Hachette, HarperCollins, Wiley, and Penguin Random House) take aim at the Internet Archive’s controlled lending program. Under this program, the Internet Archive lends each book that it or its partner libraries own to one patron at a time, just as other libraries do. In a nutshell, the plaintiffs allege that the Internet Archive is not an actual library, so its lending program infringes their copyrights. Here’s what this case is about and why it matters.

What is the Internet Archive?

The Internet Archive is an online repository for all sorts of artifacts, though it is best known for the Wayback Machine, which preserves old versions of web pages. Its offerings are far broader, however, and include archives of classic television shows, software and games, and community-recorded audiobooks. The Internet Archive also hosts 37 million digitally scanned books. The site has been a particular boon to academic researchers, thanks to its holdings of rare, out-of-print, and foreign-language books.

The Internet Archive’s Open Library, launched in 2006, is at the center of this lawsuit. The library boasts more than 20 million digitally scanned books; many thousands of them, collected from more than 150 libraries and publishers, are publicly available for controlled borrowing. The Open Library has earned praise for its efforts to render many of these books and resources ADA compliant, making them, in some cases, accessible to blind and dyslexic readers for the first time.




What is the argument?

While the Internet Archive hosts many materials that are out of copyright or fall under fair use, its Open Library repository includes many titles that are still under copyright. The Internet Archive has faced legal threats over copyright infringement before. It has tried to ward off these threats by arguing that, because either the Archive itself or one of its partner libraries owns at least one copy of each book it lends out, its controlled digital lending practices mirror those of other libraries across the country.

The current lawsuit, led by Hachette, was filed in June 2020. Months earlier, at the start of the COVID-19 pandemic, the Open Library lifted its lending restrictions and allowed more than one person to borrow a book at the same time (though each loan still expired after 14 days, after which the encrypted file could no longer be opened). This “emergency library” drew the publishing industry’s ire over the Internet Archive’s distribution tactics.

The lawsuit filed against the Internet Archive alleges the following:

  • The Internet Archive is “engaged in willful mass copyright infringement” without “any license or any payment to authors or publishers”;
  • The scale of the Internet Archive’s operations far exceeds that of a normal library and “constitute[s] willful digital piracy on an industrial scale”;
  • The Copyright Act covers both the creation and the dissemination of books and other materials;
  • The Internet Archive hosts many in-copyright books, which makes it a “direct substitute for established markets”;
  • Despite its 501(c)(3) nonprofit status, the Internet Archive is “in fact a highly commercial enterprise with millions of dollars of annual revenues”;
  • Controlled lending is an invented scheme with no basis in copyright law;
  • The Internet Archive does no work of its own and merely profits from the existing work of the industry;
  • The Internet Archive makes an end-run around existing eBook policies and practices by offering digitally scanned physical books;
  • The concept of fair use does not apply to the mass distribution of copyrighted materials;
  • The Internet Archive is not an actual library;
  • By bypassing the publishing industry and copyright law, the Internet Archive “destroy[s] the ecosystem that makes books possible in the first place.”

Defenders of the Internet Archive have rebutted these claims on several fronts: defending the application of fair use, challenging the claims of financial harm, and arguing that the Internet Archive, by digitizing these texts and making them accessible, does in fact do work of its own. But the most interesting argument comes from several university librarians, who wrote in an opinion piece in Inside Higher Ed that the Internet Archive “fulfills the mission of a library in ways we could only dream of a few decades ago.” In other words, the Internet Archive is a library and should be protected as such from the copyright claims of the publishing industry.

What is happening?

The lawsuit has been underway since mid-2020, but the court heard oral arguments this Monday, March 20, 2023. Both the plaintiffs and the defendant have moved for summary judgment, asking the judge to decide the case on the existing record without holding a trial. Summary judgment is available when there is no genuine dispute over the material facts, leaving only questions of law for the judge to resolve.

Why does this matter?

In the most immediate sense, a ruling that shutters the Open Library would affect the many thousands of users who borrow books from it and rely on its vast collection of resources. But this lawsuit also touches on a larger, emerging question about the nature of the internet. Is the internet a private domain, or is it fundamentally a public space, replete with public institutions such as libraries? Twitter has grappled with this question in the context of free speech: if social media constitutes a public forum, the argument goes, then it must uphold public freedoms. While a decision in this lawsuit may come in short order, this broader question is one that courts and the public will have to keep wrestling with.



Choice and LibTech Insights gratefully acknowledge our launch sponsor, Dimensions, a part of Digital Science. Dimensions is the largest linked research database available and provides a unique view across the whole research ecosystem, from idea to impact.



Interested in contributing to LTI? Send an email to Deb V. at Choice with your topic idea.