• kautau@lemmy.world
    7 days ago

    I don’t think even a Raspberry Pi 2 would go down over a web scrape

    That absolutely depends on what software the server is running and whether proper caching is involved. If solving a proof-of-work (PoW) challenge is required to scrape one page, it shouldn’t be too much of an issue, as opposed to a bot blindly following and ingesting every link.
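    The idea behind the PoW challenge is hashcash: the client burns CPU finding a nonce, the server verifies it with a single hash. A minimal Python sketch of the scheme (not Anubis’s actual implementation, which runs SHA-256 in the browser via JavaScript; the difficulty value here is illustrative):

    ```python
    import hashlib
    import itertools
    import os

    def make_challenge() -> str:
        """Server: issue a random challenge string."""
        return os.urandom(16).hex()

    def solve(challenge: str, difficulty: int = 20) -> int:
        """Client: find a nonce so SHA-256(challenge + nonce)
        starts with `difficulty` zero bits (~2**difficulty tries)."""
        target = 1 << (256 - difficulty)
        for nonce in itertools.count():
            h = hashlib.sha256(f"{challenge}{nonce}".encode()).digest()
            if int.from_bytes(h, "big") < target:
                return nonce

    def verify(challenge: str, nonce: int, difficulty: int = 20) -> bool:
        """Server: one hash to check what took the client ~a million tries."""
        h = hashlib.sha256(f"{challenge}{nonce}".encode()).digest()
        return int.from_bytes(h, "big") < (1 << (256 - difficulty))
    ```

    One page costs a scraper a moment of CPU; blindly crawling thousands of links makes that cost add up, while a caching server barely notices the verification.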

    Additionally, you can allowlist “good bots” like the Internet Archive, and the Anubis developers are currently working on a curated list of these:

    https://github.com/TecharoHQ/anubis/blob/main/docs/docs/admin/policies.mdx
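    What a “good bot” allowlist boils down to is matching known crawlers and skipping the challenge for them. A generic Python sketch of the concept, not Anubis’s actual policy schema (the linked policies.mdx documents the real format):

    ```python
    import re

    # Illustrative patterns; "archive.org_bot" is the Internet Archive's
    # real crawler UA token, the second entry is hypothetical.
    GOOD_BOTS = [
        re.compile(r"archive\.org_bot"),
        re.compile(r"some-other-trusted-crawler"),
    ]

    def needs_challenge(user_agent: str) -> bool:
        """Serve the PoW challenge to everyone except allowlisted bots."""
        return not any(p.search(user_agent) for p in GOOD_BOTS)
    ```

    In practice you would also verify the claimed identity (e.g. via reverse-DNS checks), since a User-Agent string is trivially spoofed.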

    AI companies ingesting data nonstop to train their models doesn’t make for an open and free internet, and will likely lead to the opposite, where users no longer even browse the web but trust AI responses that may be hallucinated.

    • daniskarma@lemmy.dbzer0.com
      7 days ago

      There are a small number of AI companies training full LLMs, and they usually do only a few training runs per year. Most of what people see as “AI bots” is not actually that.

      The influence of AI over the net is another topic. But Anubis isn’t doing anything about that either: it just makes AI bots waste more energy getting the data, or at most keeps data under “Anubis protection” out of the training dataset. The AI will still be there.

      Am I on the list of “good bots”? Sometimes I scrape websites for price tracking or change tracking. If I see a website running malware on my end, I would most likely just block that site: one legitimate user less.
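      For what it’s worth, a price or change tracker doesn’t need to put any real load on a server: a conditional GET lets the server answer “304 Not Modified” with no body when nothing changed. A minimal sketch (the product URL is hypothetical; assumes the server sends an ETag):

      ```python
      import urllib.error
      import urllib.request

      URL = "https://example.com/product/123"  # hypothetical page to track
      etag = None  # remembered between polls

      def fetch_if_changed():
          """Conditional GET: returns the body on change, None otherwise."""
          global etag
          req = urllib.request.Request(URL, headers={"User-Agent": "price-tracker/0.1"})
          if etag:
              req.add_header("If-None-Match", etag)
          try:
              with urllib.request.urlopen(req) as resp:
                  etag = resp.headers.get("ETag")
                  return resp.read()
          except urllib.error.HTTPError as err:
              if err.code == 304:
                  return None  # unchanged since the last poll
              raise
      ```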