If your question isn't answered below then please contact us.
Why do we need a Web Archive?
There are millions of UK websites. They are constantly changing and even disappearing. Often they contain information that is only available on the Web. Responding to the challenge of a potential "digital black hole", web archives are designed to safeguard as many of these websites as is practical. The purpose of the UK Web Archive is to collect, preserve and give permanent access to key UK websites for future generations.
How big is the archive?
The UK Web Archive collects millions of websites each year and billions of individual assets (pages, images, videos, PDFs etc.). As of 2017 we have collected approximately 500 TB of data, and this is increasing by roughly 60 to 70 TB a year.
How frequently are websites collected?
The majority of the archive is collected as part of our large annual UK domain crawl. Selected websites are archived more frequently, based on factors such as the rate of change of the website or its relevance to a particular collection. For example, many news sites are collected daily, whereas others are visited less frequently.
How are websites selected?
As per the Non-Print Legal Deposit regulations, we, the six UK Legal Deposit Libraries, are empowered to collect any and all UK-based websites. In effect, this includes all websites that have a UK top-level domain name such as .UK, .SCOT, .WALES, .CYMRU and .LONDON, plus any websites that are identified, via a geo-IP lookup, as being hosted on a server located physically in the UK. Additionally, if a website contains a UK postal address, or the website owner confirms UK residence or place of business, the website can be included. In order to build comprehensive thematic website collections, we occasionally request permission from the site owner to archive non-UK websites.
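The selection criteria above can be sketched as a simple check. This is only an illustration, not the Archive's actual software: the function names and parameters are hypothetical, the domain list contains just the examples named in the answer, and the geo-IP, postal address and owner checks are passed in as plain true/false values rather than looked up.

```python
from urllib.parse import urlsplit

# Example UK top-level domains named in the answer above (assumed to be
# illustrative rather than a complete list).
UK_TLDS = (".uk", ".scot", ".wales", ".cymru", ".london")


def has_uk_tld(url):
    """Return True if the URL's host name ends in one of the listed UK TLDs."""
    # urlsplit only finds a host name when the URL includes a scheme
    # such as "https://".
    host = (urlsplit(url).hostname or "").lower().rstrip(".")
    return host.endswith(UK_TLDS)


def in_scope(url, hosted_in_uk=False, has_uk_postal_address=False,
             owner_confirms_uk=False):
    """Hypothetical scope test: any one of the four criteria is sufficient.

    The three keyword arguments stand in for checks (geo-IP lookup, address
    detection, owner confirmation) that are not implemented here.
    """
    return (has_uk_tld(url) or hosted_in_uk
            or has_uk_postal_address or owner_confirms_uk)
```

For instance, in_scope("https://example.com", hosted_in_uk=True) would count as in scope under this sketch because of the hosting location, even though the domain name is not a UK one.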
What is Non-Print Legal Deposit?
The Legal Deposit Libraries Act (2003) extended existing Legal Deposit legislation to non-print (electronic) publications, including websites, subject to further enabling Regulations, which were enacted in 2013. Since those Regulations came into force, the UK Legal Deposit Libraries have been archiving UK websites, with the caveat that this material is only available to view on Library premises unless we have additional permission from the website publisher to make the content more widely available.
Is inclusion in the Web Archive an endorsement?
No. Websites are selected for the benefit of future researchers; they are intended to be reflective of contemporary UK life or are chosen for their relevance to a particular collection. Judgement is not made on the validity or quality of website content. Archived websites will vary in sophistication, from the very technical and dynamic to more simple, ‘home grown’ publications.
Where is the archive stored?
The UK Web Archive will be ingested into the Digital Library System, a long term digital repository developed by the British Library and supported by the other UK Legal Deposit Libraries.
The Digital Library System is maintained in a secure environment, with network links protected by firewalls and virus checking systems, and with no public internet access. It has four storage nodes located in St Pancras, Boston Spa, Aberystwyth and Edinburgh. Each node stores a full copy of all the materials held within the system. The nodes are in constant communication with each other across a secure network, with automated routines for self-checking, replication and repair; if a digital file stored in one of the nodes becomes corrupted or lost, it is automatically restored from one of the other nodes. Furthermore, each node also uses a technical arrangement by which files are copied and stored on two or more physical disks, with self-checking and replication between the disks.
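The self-checking and repair routine described above can be illustrated with a small sketch. It assumes, purely for illustration, that each file has a recorded SHA-256 checksum and that each node is a simple in-memory store; the real Digital Library System's checksum algorithm, data structures and function names are not stated here, so everything in the code is hypothetical.

```python
import hashlib


def sha256(data):
    """Return the SHA-256 checksum of a file's bytes as a hex string."""
    return hashlib.sha256(data).hexdigest()


def repair(nodes, manifest):
    """Restore missing or corrupted copies from a healthy node.

    nodes:    {node name: {file id: file bytes}}  (hypothetical layout)
    manifest: {file id: expected checksum}        (hypothetical layout)
    Returns a list of (node name, file id) pairs that were repaired.
    """
    repaired = []
    for file_id, expected in manifest.items():
        # Find any copy whose checksum still matches the recorded value.
        good = next(
            (store[file_id] for store in nodes.values()
             if file_id in store and sha256(store[file_id]) == expected),
            None,
        )
        if good is None:
            continue  # no healthy copy on any node, so nothing to restore from
        for name, store in nodes.items():
            if file_id not in store or sha256(store[file_id]) != expected:
                store[file_id] = good  # overwrite the bad or missing copy
                repaired.append((name, file_id))
    return repaired
```

With four nodes each holding a full copy, a file that is corrupted on one node and missing on another would be restored on both, provided at least one node still holds a copy matching the recorded checksum.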
Can I suggest a website for the archive?
Absolutely! Please use our Save a website form to let us know of any UK-based website that you feel should be archived.
When will my site appear in the archive?
As soon as practically possible, after our basic quality assurance and indexing processes have been run to prepare archival copies for access. You will not see your archived website through the online UK Web Archive unless you have completed a permission form. We currently have a large backlog of content to be made available through the UK Web Archive, so please be patient with us.
Why are some archived websites incomplete, absent or rendering incorrectly?
It is likely that we have not identified the website, either manually or automatically through our domain crawls. This is particularly likely if the website does not have a UK top-level domain name, for example if it is on a .com domain name. If this is the case and the website is UK-based, please tell us about it using our Save a website form. Some websites may exist in our reading room collection; without the additional permission of the website publisher we cannot make them available through the UK Web Archive.
Websites are gathered at a particular point in time by harvesting software and are intended to reflect as completely as possible how the website looked and behaved on the Internet at that time. An attempt is made to gather all of the objects associated with a website, including HTML, images, PDF documents, audio and video files and other objects such as programming scripts.
However, even the state-of-the-art web crawlers used by the UK Web Archive have technical limitations and are currently unable to capture streaming media, deep web or database content requiring user input, interactive components based on programming scripts or content which requires plug-ins for rendering. This means that certain elements in some of the archived websites are not present.
Another reason some archived websites are not complete is that occasionally only one page of the site was ever intended for the archive. This may be due to the kind of permission granted for archiving or may be because the selector regarded the single page as sufficiently representing one aspect within a Special Collection.
Finally, websites often link to other websites and it is not always clear to the user that they do so. Unless the linked-to website has also been archived by the UK Web Archive, a "Resource Not in Archive" message will be displayed.
The UK Web Archive is designed to support as many browsers as possible; however, different browsers may display the same website differently.
How can I get my website or other intellectual property removed from the archive?
We respect intellectual property rights (IPR) and data privacy. We seek the appropriate permissions from rights holders before archiving and operate a Notice and Take-down Procedure to take down websites from the UK Web Archive under exceptional circumstances.
How can I protect my privacy if information about me is archived?
It's important for any person making work publicly available on the Internet to be careful with their personal information and that of others. The UK Web Archive collects web pages that are freely and openly available on the Internet. The UK Web Archive does not collect password-protected content (unless specific permission has been given), or pages on secure servers such as email and intranets.
However, if you have queries about material in the UK Web Archive that you feel may infringe your privacy, please contact the Archive.