This past week was the annual gathering of the International Internet Preservation Consortium. This year, the event was hosted online by the Library of Congress, and we were excited to be able to attend sessions from folks all over the globe.
The programming will be available in full in a couple of weeks (we will send the links out with our next newsletter!), but here are some highlights from the live event that we think our community would find particularly relevant:
Arquivo404: This project from Portuguese archive Arquivo uses Memento protocols to allow website administrators to back up pages with various web archives. “This presentation will show use cases of the Arquivo404 service, detail the technologies it uses and provide some insight on the configurations it allows, namely the addition of other web archives for the search.”
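For readers who have not worked with Memento before, here is a rough sketch of the datetime negotiation at the heart of the protocol (RFC 7089): a client asks a "TimeGate" for the capture of a URL closest to a given moment by sending an `Accept-Datetime` header. This is not Arquivo404's code. The TimeGate address and the `build_timegate_request` helper are placeholders we made up for illustration, and the request is only built, never sent.

```python
from datetime import datetime, timezone
from email.utils import format_datetime
from urllib.request import Request

# Hypothetical TimeGate endpoint (an assumption); each archive publishes its own.
TIMEGATE = "https://archive.example.org/timegate/"


def build_timegate_request(original_url: str, when: datetime) -> Request:
    """Build, without sending, a Memento TimeGate request for original_url."""
    # RFC 7089 expects Accept-Datetime as an HTTP-date expressed in GMT.
    accept_datetime = format_datetime(when.astimezone(timezone.utc), usegmt=True)
    return Request(
        TIMEGATE + original_url,
        headers={"Accept-Datetime": accept_datetime},
    )


# Ask for the capture closest to noon UTC on 1 June 2015.
req = build_timegate_request(
    "http://example.com/page",
    datetime(2015, 6, 1, 12, 0, tzinfo=timezone.utc),
)
```

A Memento-aware archive would typically answer a request like this with a redirect to the closest capture it holds, which is what lets a service search several archives for one missing page.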
Optimizing Archival Replay by Eliminating Unnecessary Traffic to Web Archives: Our friends from the Internet Archive and Web Science & Digital Libraries Research Group at Old Dominion University have been conducting research on the speed of archival replay. “We discovered that some replayed web pages cause recurrent requests that lead to unnecessary traffic for the web archive. We looked at the network traffic on numerous archived web pages and found, for example, an archived page that made 945 requests per minute on average.”
WARC Collection Summarization: We send copies of our Perma collection to the Internet Archive as part of our preservation plan, and we have worked with the team there to optimize the way that we share our collection. This presentation is by our collaborator on that team, and is related to our work together. “Items in the Internet Archive’s Petabox collections of various media types like image, video, audio, book, etc. have rich metadata, representative thumbnails, and interactive hero elements. However, web collections, primarily containing WARC files and their corresponding CDX files, often look opaque. We created an open-source CLI tool called ‘CDX Summary’ to process sorted CDX files and generate reports.”
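To give a feel for what summarizing a CDX file can involve, here is a minimal sketch that tallies captures by MIME type, HTTP status, and year. It is not the CDX Summary tool and makes no claim about how that tool works. It assumes the common 11-column, space-separated CDX layout, and the `summarize_cdx` helper and the sample records are invented for illustration.

```python
from collections import Counter

# Field order of the common 11-column CDX layout ("CDX N b a m s k r M S V g").
FIELDS = ["urlkey", "timestamp", "original", "mimetype", "status",
          "digest", "redirect", "meta", "length", "offset", "filename"]


def summarize_cdx(lines):
    """Count captures by MIME type, HTTP status, and capture year."""
    summary = {"captures": 0, "mimetype": Counter(),
               "status": Counter(), "year": Counter()}
    for line in lines:
        line = line.strip()
        # Skip blank lines and the optional " CDX ..." header line.
        if not line or line.startswith("CDX"):
            continue
        parts = line.split(" ")
        if len(parts) != len(FIELDS):
            continue  # not the layout assumed in this sketch
        record = dict(zip(FIELDS, parts))
        summary["captures"] += 1
        summary["mimetype"][record["mimetype"]] += 1
        summary["status"][record["status"]] += 1
        # Timestamps are 14 digits (YYYYMMDDhhmmss); the first four are the year.
        summary["year"][record["timestamp"][:4]] += 1
    return summary


# Made-up sample records; the digests and filenames are placeholders.
sample = [
    " CDX N b a m s k r M S V g",
    "org,example)/ 20200101000000 http://example.org/ text/html 200 AAAA - - 1200 0 sample.warc.gz",
    "org,example)/about 20200102000000 http://example.org/about text/html 200 BBBB - - 900 1200 sample.warc.gz",
    "org,example)/logo.png 20210301000000 http://example.org/logo.png image/png 404 CCCC - - 300 2100 sample.warc.gz",
]
report = summarize_cdx(sample)
```

Even counts this simple start to make a collection less opaque: they show roughly what kinds of files it holds, how many captures succeeded, and which years it covers.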
The Evolving Treatment of Wayback Machine Evidence by U.S. Federal Courts: Friend of LIL Nicholas Taylor took a deep dive into how U.S. federal courts have been evaluating the efficacy of Wayback Machine content for use in court. This chart outlines the four different ways that lawyers have argued for the use of a web archive as evidence:
Keep an eye out for recordings of the full sessions as well as Q&A sessions! Thanks to IIPC and the Library of Congress for pulling all of this together!