Welcome to the ArchivePress website

ArchivePress is a blog-archiving project being undertaken by the University of London Computer Centre and the British Library Digital Preservation department, funded by the JISC Information Environment Programme under its Rapid Innovation Grants Call (03/09).

The project will explore practical issues around the archiving of weblog content, focusing on blogs as records of institutional activity and corporate memory. As an alternative to the web crawling/harvesting approach of the Internet Archive and the UK Web Archive, ArchivePress will test the viability of using RSS feeds and blog APIs to harvest blog content (including comments, embedded content and metadata). The archived content will be stored and managed using instances of WordPress, thereby maintaining the blogs’ native data structures, formats and relationships.
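To give a rough idea of what feed-based harvesting involves, here is a minimal sketch in Python. It is not ArchivePress code: the sample feed, the example.org URLs and the function name extract_posts are all made up for illustration, and it assumes an RSS 2.0 feed that carries full post bodies in content:encoded elements.

```python
import xml.etree.ElementTree as ET

# Namespace of the RSS "content" module, which many blogs use for full post bodies.
CONTENT_NS = "{http://purl.org/rss/1.0/modules/content/}"

# Hypothetical feed, inlined so the sketch runs without network access.
SAMPLE_FEED = """<?xml version="1.0"?>
<rss version="2.0" xmlns:content="http://purl.org/rss/1.0/modules/content/">
  <channel>
    <title>Example institutional blog</title>
    <item>
      <title>First post</title>
      <link>http://blog.example.org/first-post</link>
      <guid>http://blog.example.org/?p=1</guid>
      <pubDate>Mon, 01 Jun 2009 10:00:00 +0000</pubDate>
      <content:encoded><![CDATA[<p>Hello</p>]]></content:encoded>
    </item>
    <item>
      <title>Second post</title>
      <link>http://blog.example.org/second-post</link>
      <guid>http://blog.example.org/?p=2</guid>
      <pubDate>Tue, 02 Jun 2009 10:00:00 +0000</pubDate>
      <content:encoded><![CDATA[<p>More news</p>]]></content:encoded>
    </item>
  </channel>
</rss>"""


def extract_posts(feed_xml):
    """Return one dict per <item> in an RSS 2.0 feed (illustrative helper)."""
    root = ET.fromstring(feed_xml)
    posts = []
    for item in root.iter("item"):
        posts.append({
            "title": item.findtext("title", default=""),
            "link": item.findtext("link", default=""),
            # The guid is what a harvester could use to recognise a post it already holds.
            "guid": item.findtext("guid", default=""),
            "published": item.findtext("pubDate", default=""),
            "content": item.findtext(CONTENT_NS + "encoded", default=""),
        })
    return posts


posts = extract_posts(SAMPLE_FEED)
for post in posts:
    print(post["published"], post["title"], post["link"])
```

A real harvester would fetch the feed over HTTP on a schedule, skip any guid it has already stored, and write new posts into the archive. Comments and embedded content would need their own feeds or API calls.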

We hope to develop tools and methodology that will enable organisations to use simple, free, open source blogging software to manage a central archive of designated institutional blog outputs, even if they are spread over different blog hosts and platforms. The benefits of this approach will include:

  • targeted gathering of selected weblogs
  • improved reliability and authenticity of records
  • citable blog content with persistent identifiers
  • automated, ongoing harvesting, via newsfeeds
  • accessibility of content, using native blog interfaces
  • use of native web and database file formats, compatible with registry-based preservation activities.

Inputs

Outputs

  • This blog, where we can discuss issues such as preservation, authenticity, citation, context, versioning and reuse with the Digital Preservation community.
  • Methodology and guidance for the effective capture and management of blog posts.
  • Scripts/plugins to enable WordPress to be used as a blog aggregator and archiving engine. (Currently in preparation on Google Code.)