ℍ𝕂-𝟞𝟝

  • 0 Posts
  • 2 Comments
Joined 1 year ago
Cake day: July 14th, 2024

  • AI does not triple traffic. It’s a completely irrational statement to make.

    Multiple testimonials from people who host sites say it does. Multiple Lemmy instances also supported this claim.

    I would bet that the number of requests per year of a resource by an AI scrapper is in the dozens at most.

    You obviously don’t know much about hosting a public server. Try dozens per second.

    There is a booming startup industry all over the world training AI, and scraping data to sell to companies training AI. It’s not just Microsoft, Facebook and Twitter doing it, but also Chinese companies trying to compete. There are also companies developing models for internal use rather than public ones. They all use public cloud IPs, so the traffic comes from all over, incessantly.

    Using as much energy as available per scrapping doesn’t even make physical sense. What does that sentence even mean?

    It means that when Microsoft buys a server for scraping, they are going to run it 24/7 with the CPU and network maxed out, at maximum power use, to get as much data as they can. If the server can scrape 100 sites per minute, it will scrape 100 sites. If it can scrape 1000, it will scrape 1000, and if it can do 10, it will do 10.

    It will never stop scraping, since stopping would be the equivalent of shutting down a production line. Everyone always uses their scrapers as much as they can. Ironically, increasing the cost of scraping would result in less energy consumed in total, since it would force companies to work “smarter” rather than “harder” at scraping and training AI.

    Oh, and it’s S-C-R-A-P-I-N-G, not scrapping. It comes from the word “scrape”, meaning to remove the surface from an object using a sharp instrument, not “scrap”, which means to take something apart for its components.
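
A rough sketch of the fixed-capacity argument above, as plain arithmetic. Every number here is made up for illustration (the 500 W draw and the per-site costs are assumptions, not measurements); the only point is that a box running flat out uses the same energy per day whatever each site costs, so a higher cost per site just means fewer sites.

```python
# Hypothetical numbers, only to illustrate the fixed-budget argument.
SERVER_POWER_WATTS = 500          # assumed draw of one scraper box running flat out
SECONDS_PER_DAY = 24 * 60 * 60


def daily_scrape(cost_joules_per_site: float) -> tuple[float, float]:
    """Return (sites scraped per day, energy used per day in joules)."""
    # The box is always maxed out, so the daily energy budget is fixed.
    energy_budget = SERVER_POWER_WATTS * SECONDS_PER_DAY
    # The only thing the per-site cost changes is how many sites fit in it.
    sites = energy_budget / cost_joules_per_site
    return sites, energy_budget


cheap_sites, cheap_energy = daily_scrape(cost_joules_per_site=300)
costly_sites, costly_energy = daily_scrape(cost_joules_per_site=3000)

print(f"cheap:  {cheap_sites:,.0f} sites/day, {cheap_energy:,.0f} J/day")
print(f"costly: {costly_sites:,.0f} sites/day, {costly_energy:,.0f} J/day")
```

With these assumed figures, making each site ten times more expensive cuts the sites scraped per day by a factor of ten and leaves the daily energy unchanged.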


  • Websites were under a constant background noise of malicious requests even before AI, but now AI scraping of Lemmy instances usually triples traffic. While some sites can cope with this, it means a threefold increase in hosting costs, essentially to fuel investment portfolios.

    AI scrapers will already use as much energy as is available, so making them use more per site means fewer sites being scraped, not more total energy used.

    And this is not a DDoS: the objective of scrapers is to get the data, not to bring the site down. So while the server must reply to all requests, the clients can’t get the data out without doing more work than the server.
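
The comment above does not name a specific mechanism, so take this as one possible illustration only: a minimal hashcash-style proof-of-work sketch, where the client has to brute-force a nonce and the server checks it with a single hash. The function names and the 16-bit difficulty are hypothetical choices for the example, not anything a particular site is known to use.

```python
import hashlib
import os

DIFFICULTY_BITS = 16  # assumed; the client needs ~65,536 hash attempts on average


def make_challenge() -> bytes:
    """Server side: issue a random challenge (cheap)."""
    return os.urandom(16)


def _is_valid(challenge: bytes, nonce: int) -> bool:
    digest = hashlib.sha256(challenge + nonce.to_bytes(8, "big")).digest()
    # Valid when the top DIFFICULTY_BITS bits of the digest are all zero.
    return int.from_bytes(digest, "big") >> (256 - DIFFICULTY_BITS) == 0


def solve(challenge: bytes) -> tuple[int, int]:
    """Client side: brute-force a nonce. Returns (nonce, hash attempts made)."""
    nonce = 0
    while not _is_valid(challenge, nonce):
        nonce += 1
    return nonce, nonce + 1


def verify(challenge: bytes, nonce: int) -> bool:
    """Server side: one hash, no matter how long the client searched."""
    return _is_valid(challenge, nonce)


challenge = make_challenge()
nonce, attempts = solve(challenge)
print(f"client hashed {attempts} times; server verifies with 1 hash:",
      verify(challenge, nonce))
```

The asymmetry is the whole point: the client's cost grows with the difficulty setting, while the server's check stays at one hash per request.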