A digital museum of a trillion websites

hossain

The online world is like a name or a drawing traced in the sand: one wave from the ocean erases everything. Nothing is permanent in this vast digital world. A favorite website, photo or memory can be lost at any moment to a server crash, an expired domain or a hacker attack. In 2019, for example, MySpace accidentally deleted about 50 million songs uploaded between 2003 and 2015 while changing servers. Those songs, by 14 million artists, can never be recovered.

But one organization is fighting hard to save the internet from this digital oblivion: the Internet Archive. It recently reached an incredible milestone. After almost 30 years of tireless work, this non-profit organization has saved 1 trillion (1 lakh crore) URLs in its collection.

Why it is so important

In 1996, Brewster Kahle founded this archive in San Francisco. Its goal was to keep a permanent record of how the Internet changes over time. The Internet Archive’s best-known service is the Wayback Machine, which lets you see what a website looked like 10 or 20 years ago.

What did the Prothom Alo or BBC Bangla website look like 20 years ago? Or what did your favorite blog, which no longer exists, look like? The Wayback Machine can take you back to that past just like a time machine. It is an invaluable treasure trove for researchers, journalists, and the general public.
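
For readers who want to try this programmatically, here is a minimal sketch in Python. It assumes the Internet Archive's public availability endpoint (https://archive.org/wayback/available), which this article does not itself mention; the function names, the site example.com and the date used in the usage note are placeholders chosen purely for illustration.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumption: the Internet Archive's public "availability" endpoint.
API = "https://archive.org/wayback/available"


def build_query(site, timestamp):
    """Build the lookup URL for `site` near `timestamp` (YYYYMMDD)."""
    return f"{API}?{urlencode({'url': site, 'timestamp': timestamp})}"


def closest_snapshot(payload):
    """Return the URL of the closest archived copy, or None if there is none."""
    closest = payload.get("archived_snapshots", {}).get("closest")
    if closest and closest.get("available"):
        return closest["url"]
    return None


def lookup(site, timestamp):
    """Ask the archive for the snapshot of `site` closest to `timestamp`.

    This performs a network request, so it can fail or time out.
    """
    with urlopen(build_query(site, timestamp), timeout=10) as resp:
        return closest_snapshot(json.load(resp))
```

Calling lookup("example.com", "20050101") would return the address of the archived copy closest to 1 January 2005, if one exists, or None otherwise.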

A huge treasure

The Internet Archive’s collection includes far more than websites. Alongside the 1 trillion web pages saved so far, its digital library holds 41 million books and text documents, 15 million audio recordings (250,000 of them live concerts), more than 10 million videos, 4.4 million images and more than 1 million software programs.

In total, the archive now holds about 100,000 terabytes, or 100 petabytes, of data. To put that in perspective, even 50,000 iPhones with the highest storage capacity currently on the market could not hold it all. On average, about 500 million new web pages are added to the archive every day.
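
The unit conversions behind these figures are easy to check. The short Python sketch below is only a back-of-the-envelope calculation: it assumes decimal units (1 petabyte = 1,000 terabytes) and a constant daily rate, and the function names are made up for illustration.

```python
# Back-of-the-envelope check of the figures quoted in this article.
# Assumptions: decimal storage units and a constant daily rate.

LAKH = 100_000        # 1 lakh  = 10^5
CRORE = 10_000_000    # 1 crore = 10^7
TB_PER_PB = 1_000     # decimal units: 1 petabyte = 1,000 terabytes


def terabytes_to_petabytes(terabytes):
    """Convert terabytes to petabytes using decimal units."""
    return terabytes / TB_PER_PB


def days_to_add(pages, pages_per_day):
    """Days needed to add `pages` at a constant daily rate."""
    return pages / pages_per_day


one_trillion = LAKH * CRORE                        # 1 lakh crore = 10^12
archive_pb = terabytes_to_petabytes(100_000)       # 100,000 TB in petabytes
days = days_to_add(one_trillion, 500_000_000)      # at 500 million pages a day

print(one_trillion == 10**12)  # True
print(archive_pb)              # 100.0
print(days)                    # 2000.0
```

At the quoted rate of 500 million pages a day, adding another trillion pages would take about 2,000 days, or roughly five and a half years, if the rate stayed constant.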

A digital library facing challenges

However, this great task is becoming more difficult by the day. With the rise of artificial intelligence, big technology companies are scraping information from across the Internet to train their AI models. In response, major media companies such as the New York Times, The Guardian and USA Today are now blocking bots, including the Internet Archive’s crawler, to keep their content out of the hands of AI. Copyright complications have made the archive’s work more challenging than ever.

Nevertheless, it is hoped that the Internet Archive will survive all the legal complications, because there is no alternative when it comes to keeping alive this digital ecosystem, the most fragile yet one of the most important in human history. Maybe very soon it will also reach the milestone of 2 trillion web pages!
