To preserve billions of web pages, blogs and e-books which appear on the UK web domain, the British Library will begin to "harvest" the internet.
The library hopes to document the entire domain, and could eventually build a database holding every public tweet or Facebook page.
"If you want a picture of what life is like today in the UK you have to look at the web," explained project leader Lucie Burgess.
"We have already lost a lot of material, particularly around events such as the 7/7 London bombings or the 2008 financial crisis."
"That material has fallen into the digital black hole of the 21st century because we haven't been able to capture it."
"Most of that material has already been lost or taken down. The social media reaction has gone," she added.
The first step in the ambitious project, a three-month operation to harvest an initial 4.8 million websites, or one billion web pages, will begin on Friday.
"We will have to distinguish between content published in the UK and elsewhere, but in principle we will be able to archive the publicly available tweets of any individual, company or organisation," said Burgess.
The project, which has so far cost £3 million, was sparked by a change in regulations, which now allow a small number of libraries to hold digital content without seeking copyright clearance.
"Ten years ago, there was a very real danger of a black hole opening up and swallowing our digital heritage, with millions of web pages, e-publications and other non-print items falling through the cracks of a system that was devised primarily to capture ink and paper," said Roly Keating, chief executive of the British Library.
"The regulations now coming into force make digital legal deposit a reality, and ensure that the Legal Deposit Libraries themselves are able to evolve - collecting, preserving and providing long-term access to the profusion of cultural and intellectual content appearing online or in other digital formats," he added.