#scraping

ResearchBuzz: Firehose<p>The Markup: A Guide on How to Legally Web Scrape EU Data. “At The Markup, some of our data journalists recently had questions about the legal risks involved in scraping websites hosted in the European Union. We conducted our own research to answer this question, and offer a summary of what we learned below. Our goal is to help other journalists, researchers, and advocates come up with a […]</p><p><a href="https://rbfirehose.com/2025/04/06/the-markup-a-guide-on-how-to-legally-web-scrape-eu-data/" class="" rel="nofollow noopener noreferrer" target="_blank">https://rbfirehose.com/2025/04/06/the-markup-a-guide-on-how-to-legally-web-scrape-eu-data/</a></p>
Frontend Dogma<p>Web Scraping With Cheerio in 2025, by @apify.bsky.social:</p><p><a href="https://blog.apify.com/web-scraping-with-cheerio/" rel="nofollow noopener noreferrer" translate="no" target="_blank"><span class="invisible">https://</span><span class="ellipsis">blog.apify.com/web-scraping-wi</span><span class="invisible">th-cheerio/</span></a></p><p><a href="https://mas.to/tags/guides" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>guides</span></a> <a href="https://mas.to/tags/scraping" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>scraping</span></a> <a href="https://mas.to/tags/tooling" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>tooling</span></a></p>
zombor.io<p>**<a href="https://mastodon.social/tags/Scraping" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>Scraping</span></a> of <a href="https://mastodon.social/tags/LLM" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>LLM</span></a> ’s explained:**<br>* take a company that lives off of answering people’s questions, e.g. WikiHow<br>* take all WikiHow’s guides and turn them into an answerbot.<br>* make money by providing the answers to WikiHow users with your chatbot<br>* claim that you are not a competitor to WikiHow, and your use of their entire content library is <a href="https://mastodon.social/tags/FairUse" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>FairUse</span></a><br>* repeat with the entire internet</p><p><a href="https://mastodon.social/tags/GenAI" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>GenAI</span></a> <a href="https://mastodon.social/tags/AITraining" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>AITraining</span></a></p>
Synapsenkitzler 🌻<p>11.4 AI working group (AK KI)<br>12 Outlook and concluding remarks <br>13 Appendix<br>13.1 GDPR Art. 51 et seq.<br>13.2 GDPR Art. 85<br>13.3 MStV § 12, § 23, § 113 <br>13.4 TDDDG § 25 <br>13.5 Provisions on the Broadcasting Data Protection Officer<br>13.6 RDSK member list <br>13.7 RDSK administrative agreement</p><p>[END]</p><p><a href="https://digitalcourage.social/tags/Rundfunkdatenschutzbeauftragter" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>Rundfunkdatenschutzbeauftragter</span></a> <a href="https://digitalcourage.social/tags/Datenschutzrecht" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>Datenschutzrecht</span></a> <a href="https://digitalcourage.social/tags/Datenstrategie" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>Datenstrategie</span></a> <a href="https://digitalcourage.social/tags/AIA" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>AIA</span></a> <a href="https://digitalcourage.social/tags/KI" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>KI</span></a> <a href="https://digitalcourage.social/tags/TDDDG" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>TDDDG</span></a> <a href="https://digitalcourage.social/tags/BeschDG" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>BeschDG</span></a> <a href="https://digitalcourage.social/tags/Auskunftsanspruch" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>Auskunftsanspruch</span></a> <a href="https://digitalcourage.social/tags/Scraping" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>Scraping</span></a> <a href="https://digitalcourage.social/tags/EDSA" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>EDSA</span></a> <a 
href="https://digitalcourage.social/tags/Auftragsverarbeitung" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>Auftragsverarbeitung</span></a> <a href="https://digitalcourage.social/tags/DSGVO" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>DSGVO</span></a> <a href="https://digitalcourage.social/tags/personenbezogen" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>personenbezogen</span></a> <a href="https://digitalcourage.social/tags/Datenschutzleitfaden" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>Datenschutzleitfaden</span></a> <a href="https://digitalcourage.social/tags/Nutzungsmessung" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>Nutzungsmessung</span></a> <a href="https://digitalcourage.social/tags/PianoAnalytics" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>PianoAnalytics</span></a> <a href="https://digitalcourage.social/tags/Medienprivileg" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>Medienprivileg</span></a> <a href="https://digitalcourage.social/tags/Facebook" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>Facebook</span></a> <a href="https://digitalcourage.social/tags/Correctiv" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>Correctiv</span></a> <a href="https://digitalcourage.social/tags/Informationspflicht" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>Informationspflicht</span></a> <a href="https://digitalcourage.social/tags/Datenauswertung" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>Datenauswertung</span></a> <a href="https://digitalcourage.social/tags/VVT" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>VVT</span></a> <a 
href="https://digitalcourage.social/tags/Mediathek" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>Mediathek</span></a> <a href="https://digitalcourage.social/tags/WhatsApp" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>WhatsApp</span></a> <a href="https://digitalcourage.social/tags/Gewinnspiel" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>Gewinnspiel</span></a> <a href="https://digitalcourage.social/tags/Rundfunkanstalt" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>Rundfunkanstalt</span></a> <a href="https://digitalcourage.social/tags/Beitragsservice" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>Beitragsservice</span></a> <a href="https://digitalcourage.social/tags/Gesundheitsdaten" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>Gesundheitsdaten</span></a> <a href="https://digitalcourage.social/tags/Bankdaten" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>Bankdaten</span></a> <a href="https://digitalcourage.social/tags/Gerichtsvollzieher" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>Gerichtsvollzieher</span></a> <a href="https://digitalcourage.social/tags/Adresshandel" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>Adresshandel</span></a> <a href="https://digitalcourage.social/tags/Rundfunkdatenschutzkonferenz" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>Rundfunkdatenschutzkonferenz</span></a> <a href="https://digitalcourage.social/tags/RDSK" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>RDSK</span></a> <a href="https://digitalcourage.social/tags/Datenschutzfolgenabsch%C3%A4tzung" class="mention hashtag" rel="nofollow noopener noreferrer" 
target="_blank">#<span>Datenschutzfolgenabschätzung</span></a> <a href="https://digitalcourage.social/tags/Bu%C3%9Fgelder" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>Bußgelder</span></a> <a href="https://digitalcourage.social/tags/AKDSB" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>AKDSB</span></a> <a href="https://digitalcourage.social/tags/Datenschutzkonferenz" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>Datenschutzkonferenz</span></a> <a href="https://digitalcourage.social/tags/DSK" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>DSK</span></a> <a href="https://digitalcourage.social/tags/Medien" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>Medien</span></a> <a href="https://digitalcourage.social/tags/MStV" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>MStV</span></a></p>
Martin Owens :inkscape:<p>I've set up my new <a href="https://floss.social/tags/inkscape" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>inkscape</span></a> website AI bot trap. It works by giving everyone a chance to not fall into it.</p><p>An anchor link that says "I am a bot" and links to /P3W-451/{datetime}/ it's got a fixed position at top -100px so should never be seen</p><p>The robots.txt says "Disallow: /P3W-451/" so if you were reading the robots, you'd know.</p><p>Then <a href="https://floss.social/tags/nginx" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>nginx</span></a> logs the requests to a log of their ip-addresses and browser strings and sends them a 301 redirect to google.com</p><p><a href="https://floss.social/tags/ai" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>ai</span></a> <a href="https://floss.social/tags/Scraping" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>Scraping</span></a> </p><p>1/2</p>
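The trap described above depends on well-behaved crawlers reading robots.txt before following links. A minimal Python sketch of that check, using the standard library's `urllib.robotparser`, might look like the following. The `User-agent: *` line, the `example.org` host, the `ExampleBot` name, and the timestamp in the path are all assumptions made for illustration; only the `Disallow: /P3W-451/` rule comes from the post. The nginx logging and redirect side is not shown.

```python
from urllib.robotparser import RobotFileParser

# The Disallow rule is quoted from the post; the "User-agent: *" line
# is an assumption about how the rule is scoped.
ROBOTS_TXT = """\
User-agent: *
Disallow: /P3W-451/
"""

# Hypothetical constant mirroring the trap prefix from the post.
TRAP_PREFIX = "/P3W-451/"


def may_fetch(user_agent: str, url: str, robots_txt: str = ROBOTS_TXT) -> bool:
    """Return True if robots.txt permits this user agent to fetch the URL."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


def is_trap_hit(path: str) -> bool:
    """Return True if a request path falls inside the disallowed trap area."""
    return path.startswith(TRAP_PREFIX)


if __name__ == "__main__":
    # A polite crawler checks before following the hidden "I am a bot" link.
    trap_url = "https://example.org/P3W-451/20250406T120000/"
    print(may_fetch("ExampleBot", trap_url))
    print(is_trap_hit("/P3W-451/20250406T120000/"))
```

A crawler that calls something like `may_fetch` before every request never reaches the trap, which is the "chance to not fall into it" the post mentions.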
Strypey<p>Joshua Yuvaraj, co-director of the New Zealand Centre for Intellectual Property, was interviewed on RNZ yesterday, about the degree to which copyright law might be used to prevent scraping of the open web by <a href="https://mastodon.nzoss.nz/tags/MOLE" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>MOLE</span></a> Trainers;</p><p><a href="https://www.rnz.co.nz/national/programmes/nights/audio/2018981590/what-can-writers-do-about-their-work-being-used-to-train-ai-models" rel="nofollow noopener noreferrer" translate="no" target="_blank"><span class="invisible">https://www.</span><span class="ellipsis">rnz.co.nz/national/programmes/</span><span class="invisible">nights/audio/2018981590/what-can-writers-do-about-their-work-being-used-to-train-ai-models</span></a></p><p>As Cory Doctorow noted back in 2023;</p><p>"In privacy and labor fights, copyright is a clumsy tool at best."</p><p><a href="https://pluralistic.net/2023/09/17/how-to-think-about-scraping/" rel="nofollow noopener noreferrer" translate="no" target="_blank"><span class="invisible">https://</span><span class="ellipsis">pluralistic.net/2023/09/17/how</span><span class="invisible">-to-think-about-scraping/</span></a></p><p><a href="https://mastodon.nzoss.nz/tags/RNZ" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>RNZ</span></a> <a href="https://mastodon.nzoss.nz/tags/NZCIP" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>NZCIP</span></a> <a href="https://mastodon.nzoss.nz/tags/JoshuaYuvaraj" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>JoshuaYuvaraj</span></a> <a href="https://mastodon.nzoss.nz/tags/LLM" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>LLM</span></a> <a href="https://mastodon.nzoss.nz/tags/scraping" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>scraping</span></a></p>
sheislaurence<p><span class="h-card" translate="no"><a href="https://norrebro.space/@nimi" class="u-url mention" rel="nofollow noopener noreferrer" target="_blank">@<span>nimi</span></a></span> <span class="h-card" translate="no"><a href="https://toot.lv/@papuass" class="u-url mention" rel="nofollow noopener noreferrer" target="_blank">@<span>papuass</span></a></span> <span class="h-card" translate="no"><a href="https://stefanbohacek.online/@stefan" class="u-url mention" rel="nofollow noopener noreferrer" target="_blank">@<span>stefan</span></a></span> <span class="h-card" translate="no"><a href="https://mastodon.social/@freediverx" class="u-url mention" rel="nofollow noopener noreferrer" target="_blank">@<span>freediverx</span></a></span> yeah, except you can't force bad actors to use your commercial API if they still have an open route in that costs them next to nothing. It really doesn't matter that <a href="https://mastodon.social/tags/scraping" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>scraping</span></a> isn't elegant. It works, it's cheap. It's basically an arms race that <a href="https://mastodon.social/tags/opensource" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>opensource</span></a> and <a href="https://mastodon.social/tags/openknowledge" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>openknowledge</span></a> were never designed to wage. My only hope is that the <a href="https://mastodon.social/tags/cyberpunk" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>cyberpunk</span></a> spirit will reorganise itself along those faultlines and fight the good fight.</p>
Liz Probert<p>How crawlers impact the operations of the Wikimedia projects <a href="https://diff.wikimedia.org/2025/04/01/how-crawlers-impact-the-operations-of-the-wikimedia-projects/" rel="nofollow noopener noreferrer" translate="no" target="_blank"><span class="invisible">https://</span><span class="ellipsis">diff.wikimedia.org/2025/04/01/</span><span class="invisible">how-crawlers-impact-the-operations-of-the-wikimedia-projects/</span></a> <a href="https://greennet.social/tags/AI" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>AI</span></a>, <a href="https://greennet.social/tags/Crawlers" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>Crawlers</span></a>, <a href="https://greennet.social/tags/Infrastructure" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>Infrastructure</span></a>, <a href="https://greennet.social/tags/KnowledgeAsAService" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>KnowledgeAsAService</span></a>, <a href="https://greennet.social/tags/KnowledgeContent" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>KnowledgeContent</span></a>, <a href="https://greennet.social/tags/Operations" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>Operations</span></a>, <a href="https://greennet.social/tags/Scraping" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>Scraping</span></a>, <a href="https://greennet.social/tags/ScrapingBots" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>ScrapingBots</span></a>, <a href="https://greennet.social/tags/Traffic" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>Traffic</span></a>, <a href="https://greennet.social/tags/WikimediaFoundation" class="mention hashtag" rel="nofollow noopener noreferrer" 
target="_blank">#<span>WikimediaFoundation</span></a>, <a href="https://greennet.social/tags/WikimediaProjects" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>WikimediaProjects</span></a></p>
Venkatesh-Prasad Ranganath<p>An interesting code hosting related downside of AI. <a href="https://mastodon.social/tags/ai" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>ai</span></a> <a href="https://mastodon.social/tags/ddos" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>ddos</span></a> <a href="https://mastodon.social/tags/web" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>web</span></a> <a href="https://mastodon.social/tags/scraping" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>scraping</span></a> <a href="https://mastodon.social/tags/copyright" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>copyright</span></a> <a href="https://mastodon.social/tags/code" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>code</span></a></p><p><a href="https://techcrunch.com/2025/03/27/open-source-devs-are-fighting-ai-crawlers-with-cleverness-and-vengeance/" rel="nofollow noopener noreferrer" translate="no" target="_blank"><span class="invisible">https://</span><span class="ellipsis">techcrunch.com/2025/03/27/open</span><span class="invisible">-source-devs-are-fighting-ai-crawlers-with-cleverness-and-vengeance/</span></a></p>
Petra van Cronenburg<p><span class="h-card" translate="no"><a href="https://wandering.shop/@susankayequinn" class="u-url mention" rel="nofollow noopener noreferrer" target="_blank">@<span>susankayequinn</span></a></span> Here's another article by <span class="h-card" translate="no"><a href="https://mastodon.social/@brianmerchant" class="u-url mention" rel="nofollow noopener noreferrer" target="_blank">@<span>brianmerchant</span></a></span> : <a href="https://www.bloodinthemachine.com/p/openais-studio-ghibli-meme-factory" rel="nofollow noopener noreferrer" translate="no" target="_blank"><span class="invisible">https://www.</span><span class="ellipsis">bloodinthemachine.com/p/openai</span><span class="invisible">s-studio-ghibli-meme-factory</span></a><br>"AI giants are indeed eating away at the livelihoods and dignity of working artists, and this devouring, appropriating, and automation of the production of art, of culture, at a scale truly never seen before, should not be underestimated as a menace"</p><p><a href="https://mastodon.online/tags/AI" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>AI</span></a> <a href="https://mastodon.online/tags/OpenAI" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>OpenAI</span></a> <a href="https://mastodon.online/tags/StudioGhibli" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>StudioGhibli</span></a> <a href="https://mastodon.online/tags/art" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>art</span></a> <a href="https://mastodon.online/tags/artists" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>artists</span></a> <a href="https://mastodon.online/tags/scraping" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>scraping</span></a> <a href="https://mastodon.online/tags/copyright" class="mention hashtag" rel="nofollow noopener 
noreferrer" target="_blank">#<span>copyright</span></a> <a href="https://mastodon.online/tags/copyrightInfringement" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>copyrightInfringement</span></a> <a href="https://mastodon.online/tags/culture" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>culture</span></a> <a href="https://mastodon.online/tags/billionaires" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>billionaires</span></a></p>
Petra van Cronenburg<p>"GPT-4o is partly (aside from some licensed content) a product of a massive scrape of the Internet without regard to copyright or consent from artists ... GPT-4o's image generation model (and the technology behind it, once open source) feels like it further erodes trust in remotely produced media ... Everyone needs media literacy skills ..." <a href="https://arstechnica.com/ai/2025/03/openais-new-ai-image-generator-is-potent-and-bound-to-provoke/?utm_brand=arstechnica&amp;utm_social-type=owned&amp;utm_source=mastodon&amp;utm_medium=social" rel="nofollow noopener noreferrer" translate="no" target="_blank"><span class="invisible">https://</span><span class="ellipsis">arstechnica.com/ai/2025/03/ope</span><span class="invisible">nais-new-ai-image-generator-is-potent-and-bound-to-provoke/?utm_brand=arstechnica&amp;utm_social-type=owned&amp;utm_source=mastodon&amp;utm_medium=social</span></a> via <span class="h-card" translate="no"><a href="https://mastodon.social/@arstechnica" class="u-url mention" rel="nofollow noopener noreferrer" target="_blank">@<span>arstechnica</span></a></span> </p><p><a href="https://mastodon.online/tags/AI" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>AI</span></a> <a href="https://mastodon.online/tags/generativeAI" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>generativeAI</span></a> <a href="https://mastodon.online/tags/imageGenerator" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>imageGenerator</span></a> <a href="https://mastodon.online/tags/fake" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>fake</span></a> <a href="https://mastodon.online/tags/gpt4o" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>gpt4o</span></a> <a href="https://mastodon.online/tags/artists" class="mention hashtag" rel="nofollow noopener noreferrer" 
target="_blank">#<span>artists</span></a> <a href="https://mastodon.online/tags/copyright" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>copyright</span></a> <a href="https://mastodon.online/tags/scraping" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>scraping</span></a> <a href="https://mastodon.online/tags/mediaLiteracy" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>mediaLiteracy</span></a> <a href="https://mastodon.online/tags/images" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>images</span></a></p>
Simon Hewison<p>another part of my day job involves working around systems designed to prevent mass AI-driven scraping, because humans and well-behaved query scripts are accidentally caught up in all the war-of-the-scrapers, because Cloudflare etc are offering what seems to management to be a magic bullet, and putting the bluntest of tools in front of anywhere that needs to be public, including APIs.<br><a href="https://mastodon.online/tags/scraping" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>scraping</span></a> <a href="https://mastodon.online/tags/api" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>api</span></a></p>
Simon Hewison<p>Part of my day job involves using APIs to retrieve public data from third party public websites, some of which were never designed to publish raw data, so I tread lightly, no more than a human-driven query.<br>Part of my day job is preventing third party machines from hammering servers I run by incessant mass scraping - hundreds of thousands of ridiculous requests humans would never do or want (typically that's AI-driven scraping that doesn't abide by robots.txt).<br>I feel conflicted.<br><a href="https://mastodon.online/tags/scraping" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>scraping</span></a> <a href="https://mastodon.online/tags/api" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>api</span></a></p>
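"Treading lightly" as described above usually comes down to spacing requests out. A minimal Python sketch of a client-side throttle, assuming a fixed minimum interval between requests, could look like this. The `Throttle` class, its parameters, and the two-second interval are hypothetical and not taken from the post. The clock and sleep functions are injectable so the behaviour can be checked without actually waiting.

```python
import time
from typing import Callable, Optional


class Throttle:
    """Enforce a minimum interval between consecutive requests."""

    def __init__(
        self,
        min_interval: float,
        clock: Callable[[], float] = time.monotonic,
        sleep: Callable[[float], None] = time.sleep,
    ) -> None:
        self.min_interval = min_interval
        self._clock = clock
        self._sleep = sleep
        self._last: Optional[float] = None  # time the last request was allowed

    def wait(self) -> float:
        """Block until the next request is allowed; return the seconds slept."""
        now = self._clock()
        delay = 0.0
        if self._last is not None:
            # Sleep only for the part of the interval that has not yet elapsed.
            delay = max(0.0, self.min_interval - (now - self._last))
            if delay > 0:
                self._sleep(delay)
        self._last = now + delay
        return delay


if __name__ == "__main__":
    throttle = Throttle(min_interval=2.0)  # hypothetical: one request per 2 s
    for page in ("a", "b", "c"):
        throttle.wait()
        print("would fetch page", page)  # placeholder for the real request
```

Calling `wait()` before each request keeps the client near the pace of a human-driven query, and the same check on the server side helps tell such clients apart from mass scrapers.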
Thor A. Hopland<p>When it comes to <a href="https://snabelen.no/tags/AI" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>AI</span></a>, could you tf not with all that <a href="https://snabelen.no/tags/scraping" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>scraping</span></a>? Pay-per-packet could be the future now if some people can't control themselves.</p><p>The more you <a href="https://snabelen.no/tags/scrape" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>scrape</span></a>, the more <a href="https://snabelen.no/tags/developers" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>developers</span></a> have to pay, which should yield better and improved infrastructure to decrease the cost, but instead: it could turn the internet into a true "transactional" network.</p><p>Suddenly the <a href="https://snabelen.no/tags/internet" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>internet</span></a> is run on <a href="https://snabelen.no/tags/microtransactions" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>microtransactions</span></a>... hell hath arrived. Granted, this fringe scenario is a bit hyperbolic, but still.</p>
uǝuunɹƃʇǝO<p>Thoughts: AI corps scraping data</p><p>The corporations assert that they can utilize public data without incurring any costs, citing fair use as their justification.</p><p>To address this issue, we should implement a law that compels corporations claiming fair use as a defense to make all their process data publicly available, free of charge. This would ensure that the scraped data, as well as data derived from the freely available data, is accessible to the public.<br><a href="https://mstdn.social/tags/AI" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>AI</span></a> <a href="https://mstdn.social/tags/FairUse" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>FairUse</span></a> <a href="https://mstdn.social/tags/Scraping" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>Scraping</span></a> <a href="https://mstdn.social/tags/WebScraping" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>WebScraping</span></a></p>
Petra van Cronenburg<p><span class="h-card" translate="no"><a href="https://ohai.social/@Garwboy" class="u-url mention" rel="nofollow noopener noreferrer" target="_blank">@<span>Garwboy</span></a></span> As a friend of biodiversity I had nearly stopped reading until there: "I like all of those creatures. I find them fascinating, and they occupy important roles in our society and ecosystem. I would never say that about Mark Zuckerberg."<br>But now I dream of writer troll farms using your inspiring idea to train <a href="https://mastodon.online/tags/AI" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>AI</span></a>: <a href="https://theneuroscienceofeverydaylife.substack.com/p/an-article-for-meta-to-use-to-train" rel="nofollow noopener noreferrer" translate="no" target="_blank"><span class="invisible">https://</span><span class="ellipsis">theneuroscienceofeverydaylife.</span><span class="invisible">substack.com/p/an-article-for-meta-to-use-to-train</span></a> Great! Made my day. 
😂 <br><span class="h-card" translate="no"><a href="https://a.gup.pe/u/writing" class="u-url mention" rel="nofollow noopener noreferrer" target="_blank">@<span>writing</span></a></span> <span class="h-card" translate="no"><a href="https://a.gup.pe/u/writers" class="u-url mention" rel="nofollow noopener noreferrer" target="_blank">@<span>writers</span></a></span> <span class="h-card" translate="no"><a href="https://a.gup.pe/u/writerscommunity" class="u-url mention" rel="nofollow noopener noreferrer" target="_blank">@<span>writerscommunity</span></a></span> </p><p><a href="https://mastodon.online/tags/writers" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>writers</span></a> <a href="https://mastodon.online/tags/authors" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>authors</span></a> <a href="https://mastodon.online/tags/LLM" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>LLM</span></a> <a href="https://mastodon.online/tags/bookstodon" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>bookstodon</span></a> <a href="https://mastodon.online/tags/generativeAI" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>generativeAI</span></a> <a href="https://mastodon.online/tags/copyright" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>copyright</span></a> <a href="https://mastodon.online/tags/scraping" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>scraping</span></a></p>
Petra van Cronenburg<p>Yesterday I made a test, warned against this account with a hashtag of the name and a certain bird, and promptly got the <a href="https://mastodon.online/tags/scam" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>scam</span></a> again. It's the sign that this paragon of a <a href="https://mastodon.online/tags/troll" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>troll</span></a> factory or a narcissistic bot tinkerer hopping instances is not reacting randomly. Don't just block it, it's important to <a href="https://mastodon.online/tags/report" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>report</span></a> it so that it finally comes to an end. Don't click the links. If it's <a href="https://mastodon.online/tags/scraping" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>scraping</span></a>, a joke, or an attack on the Fediverse: a <a href="https://mastodon.online/tags/fediblock" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>fediblock</span></a> would be fine! The phrase pattern could be filtered.</p>
WinFuture.de<p>A criminal gang (<a href="https://mastodon.social/tags/Verbrecherbande" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>Verbrecherbande</span></a>) in the USA has stolen (<a href="https://mastodon.social/tags/gestohlen" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>gestohlen</span></a>) thousands of <a href="https://mastodon.social/tags/iPhones" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>iPhones</span></a> right at delivery. They bribed mobile carrier employees and obtained tracking numbers using <a href="https://mastodon.social/tags/Scraping" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>Scraping</span></a> software. <a href="https://winfuture.de/news,149768.html?utm_source=Mastodon&amp;utm_medium=ManualStatus&amp;utm_campaign=SocialMedia" rel="nofollow noopener noreferrer" translate="no" target="_blank"><span class="invisible">https://</span><span class="ellipsis">winfuture.de/news,149768.html?</span><span class="invisible">utm_source=Mastodon&amp;utm_medium=ManualStatus&amp;utm_campaign=SocialMedia</span></a></p>
Moreno Colaiacovo 🧬🇮🇹<p>In data science classes we were taught that <a href="https://mastodon.uno/tags/scraping" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>scraping</span></a> is a controversial practice bordering on the illegal, to be handled with care. Then <a href="https://mastodon.uno/tags/OpenAI" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>OpenAI</span></a> came along, did massive scraping without mercy, and even built a commercial product out of it. Either my professor was wrong or Sam Altman was! 🤷🏻‍♂️</p>
CCIA Europe<p><a href="https://eupolicy.social/tags/AI" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>AI</span></a> Training: Does <a href="https://eupolicy.social/tags/Data" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>Data</span></a> Scraping Really Impact <a href="https://eupolicy.social/tags/Privacy" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>Privacy</span></a>? 🔒</p><p>👇 Etienne Drouard explains the complex interplay between the EU <a href="https://eupolicy.social/tags/AIAct" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>AIAct</span></a> and <a href="https://eupolicy.social/tags/GDPR" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>GDPR</span></a>, and why <a href="https://eupolicy.social/tags/scraping" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>scraping</span></a> publicly available data for training <a href="https://eupolicy.social/tags/ArtificialIntelligence" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>ArtificialIntelligence</span></a> models isn’t a privacy violation. <a href="https://eupolicy.social/tags/EuropeanAIroundtable" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>EuropeanAIroundtable</span></a> </p><p><a href="https://www.youtube.com/watch?v=vpk1cCx0Odc" rel="nofollow noopener noreferrer" translate="no" target="_blank"><span class="invisible">https://www.</span><span class="ellipsis">youtube.com/watch?v=vpk1cCx0Od</span><span class="invisible">c</span></a></p>