News sites are locking out the Internet Archive to stop AI crawling. Is the ‘open web’ closing?
#Tech #AI #OpenWeb #InternetArchive #DigitalRights #MediaPower #Paywalls #Journalism #TechEthics #Business #PublicAccess #DigitalHistory #AITraining #Copyright
https://the-14.com/news-sites-are-locking-out-the-internet-archive-to-stop-ai-crawling-is-the-open-web-closing/
Fixing Links with the Wayback Machine
#WordPress #InternetArchive
https://automattic.com/2026/02/04/wayback-machine-wordpress-link-fixer/
GitHub: https://github.com/a8cteam51/internet-archive-wayback-machine-link-fixer
The CIA just stopped publishing their World Factbook and took down every page, including the archived copies of previous versions!
This sucks. It was public domain, so I recovered the 2020 edition (the last one published as a zip file) and shared it to GitHub https://simonwillison.net/2026/Feb/5/the-world-factbook/
RE: https://fedi.simonwillison.net/@simon/116015180016712361
I have been thinking for quite some time that we need a decentralized network for preserving public domain works. The #InternetArchive is important, but it is a single American organization and thus vulnerable (and the fact that it sometimes plays fast and loose with copyright doesn't help). We need to spread those works out more.
Democracy? What democracy? It's all about the almighty dollar... ->
"When the World Wide Web went live in the early 1990s, its founders hoped it would be a space for anyone to share information and collaborate. But today, the free and open web is shrinking.
The Internet Archive has been recording the history of the internet and making it available to the public through its Wayback Machine since 1996. Now, some of the world’s biggest news outlets are blocking the archive’s access to their pages.
Major publishers – including The Guardian, The New York Times, the Financial Times, and USA Today – have confirmed they’re ending the Internet Archive’s access to their content.
While publishers say they support the archive’s preservation mission, they argue unrestricted access creates unintended consequences, exposing journalism to AI crawlers and members of the public trying to skirt their paywalls.
Yet, publishers don’t simply want to lock out AI crawlers. Rather, they want to sell their content to data-hungry tech companies. Their back catalogues of news, books and other media have become a hot commodity as data to train AI systems."
#OpenWeb #Media #InternetArchive #News #Newspapers #Journalism
💕 I Love Free Software Day 2026 💕
For this year’s I Love Free Software Day I am co-organising two special events, and I am super excited about them!
- 🧶 Knitting Our Internet at Snackbar Frieda, Rotterdam, on Friday Feb 13th at 18:00. All information here.
- 🛹 A reading of @kirschner’s Ada & Zangemann in English and in Dutch, followed by a conversation about Free Software maintenance as care work with @mayel from @Bonfire ❤️🔥. At the @internetarchiveeurope, Oudeschans 16, Amsterdam, on Saturday Feb 14th at 14:00. All information here. Info about the super cool poster in the post below.
#FreeSoftware #SoftwareFreedom #ILoveFS #ILoveFreeSoftware #IloveFS26 #ournet #KnittingOurInternet #SnackbarFrieda #Rotterdam #Amsterdam #InternetArchive #InternetArchiveEurope #AZbook #Ada #AdaZangemann #reading #event #decentralization #InternetHistory #Internet