• 0 Posts
  • 15 Comments
Joined 1 year ago
Cake day: April 30th, 2024



  • daniskarma@lemmy.dbzer0.com to Programmer Humor@programming.dev · lads · edited 8 days ago

    Why would they request the same data so many times a day if the objective were AI model training? It makes zero sense.

    Also, Google's bots obey robots.txt, so they are easy to manage (see the sketch at the end of this comment).

    There may be tons of reasons Google is crawling your website, from ad research to any other kind of research. The only AI-related use I can think of is RAG, but that would actually take some user requests away: if the user got the info through Google's AI answer, they would not visit the website. I suppose that would suck for the website owner, but it won't drastically increase the number of requests.

    But for training I don't see it; there's no need at all to keep constantly scraping the same web pages for model training.
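
    As a rough illustration of how easy well-behaved crawlers are to manage, here is a minimal Python sketch of the robots.txt check a compliant bot like Googlebot performs before fetching a page (the domain and paths are placeholders):

    ```python
    from urllib.robotparser import RobotFileParser

    # A compliant crawler fetches and parses the site's robots.txt first.
    rp = RobotFileParser("https://example.com/robots.txt")
    rp.read()

    # It then checks every URL against the rules before requesting it.
    if rp.can_fetch("Googlebot", "https://example.com/some/page"):
        print("Allowed: the crawler may fetch this page")
    else:
        print("Disallowed: a compliant crawler skips this page")
    ```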


  • daniskarma@lemmy.dbzer0.com to Programmer Humor@programming.dev · lads · edited 8 days ago

    I mean, the number of pirates correlates with global temperature. That doesn't mean causation.

    The rest of the indicators would also match any archiving bot, or any bot in search of big data. We must remember that big data is used for much more than AI. At the end of the day scraping is cheap, but very few companies in the world have access to the processing power needed to train on that amount of data. That's why it seems so illogical to me.

    How many LLM models that are the result of a full training run do we see per year? Ten? Twenty? Even if they update and retrain often, that's not compatible with the volume of requests people attribute to AI scraping, the kind that would put services at risk of DoS. Especially since I would think no AI company would try to scrape the same data twice.

    I have also experienced an increase in bot requests on my host, but I just think it's a result of the internet getting bigger: more people using it with more diverse intentions, some ill, some not. I've also experienced a big increase in probing and attack attempts in general, and I don't think it's OpenAI trying some outdated Apache vulnerability on my server. The internet is just a bigger sea with more fish in it.


  • I don't think it's millions. Take into account that a DDoS attacker is not going to execute JavaScript code, at least not a competent one, so they are not going to run the PoW.

    In fact, unsolicited and unannounced PoW does not provide any more protection against DDoS than a captcha.

    The mitigation comes from the server's responses being smaller and cheaper to produce, so the number of requests needed to saturate the service must increase. By how much? That depends on how demanding the "real" website is in comparison. I doubt the answer is millions. And they would achieve the exact same result with a captcha, without running literal malware on the clients.
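
    To make the asymmetry concrete, here is a minimal sketch of the kind of hash-based proof-of-work scheme such tools rely on (illustrative only, not Anubis' actual parameters or code): solving is expensive for the client, verifying is a single cheap hash for the server. The only saving for the server is that unverified requests get a small challenge page instead of the full site.

    ```python
    import hashlib
    import secrets

    DIFFICULTY_BITS = 16               # hypothetical difficulty, tuned by the operator
    TARGET = 1 << (256 - DIFFICULTY_BITS)

    def make_challenge() -> str:
        """Server: issue a random challenge string."""
        return secrets.token_hex(16)

    def solve(challenge: str) -> int:
        """Client: brute-force a nonce whose SHA-256 falls below the target."""
        nonce = 0
        while int.from_bytes(hashlib.sha256(f"{challenge}{nonce}".encode()).digest(), "big") >= TARGET:
            nonce += 1
        return nonce

    def verify(challenge: str, nonce: int) -> bool:
        """Server: a single hash check, far cheaper than solving."""
        digest = hashlib.sha256(f"{challenge}{nonce}".encode()).digest()
        return int.from_bytes(digest, "big") < TARGET

    challenge = make_challenge()
    nonce = solve(challenge)          # costly on the client side
    assert verify(challenge, nonce)   # trivial on the server side
    ```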


  • daniskarma@lemmy.dbzer0.com to Programmer Humor@programming.dev · lads · edited 8 days ago

    Precisely, that's my point. It fits a very narrow risk profile: people who are going to be DDoSed, but not by a big actor.

    That's not the most common risk profile. Usually DDoS attacks are either very heavy or don't happen at all. These "half-throttle" DDoS attacks are not really common.

    I think that's why, when I read about Anubis, it's never in the context of DDoS protection. It's always in the context of "let's fuck AI", like this very comment thread.



  • I'm not a native English speaker, so I apologize if there's bad English in my response, and I'd be thankful for any corrections.

    That being said, I do host public services, and did so before and after AI became a thing. I have asked many of the people who claim "we are under AI bot attacks" how they are able to tell whether a request comes from an AI scraper or from any other scraper, and there was no satisfying answer.
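
    For what it's worth, the only signal most of us have is the User-Agent header. Here is a rough sketch of the kind of check I mean, assuming an nginx/Apache "combined" log format and a hypothetical access.log path, and remembering that a scraper can simply lie in this header:

    ```python
    import re
    from collections import Counter

    # In the "combined" log format the user agent is the last quoted field.
    UA_RE = re.compile(r'"(?P<ua>[^"]*)"\s*$')

    counts = Counter()
    with open("access.log", encoding="utf-8", errors="replace") as log:
        for line in log:
            match = UA_RE.search(line.rstrip())
            if match:
                counts[match.group("ua")] += 1

    # A self-declared bot ("GPTBot", "ClaudeBot", ...) is easy to spot;
    # anything spoofing a browser User-Agent is indistinguishable here.
    for agent, hits in counts.most_common(10):
        print(f"{hits:8d}  {agent}")
    ```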


  • daniskarma@lemmy.dbzer0.com to Programmer Humor@programming.dev · lads · edited 8 days ago

    There are a small number of AI companies training full LLM models, and they usually do a few training runs per year. What most people see as "AI bots" are not actually that.

    The influence of AI over the net is another topic. But Anubis isn't doing anything about that either: it just makes the AI bots waste more energy getting the data, or at most keeps the data under "Anubis protection" out of the training dataset. The AI will still be there.

    Am I on the list of "good bots"? Sometimes I scrape websites for price tracking or change tracking. If I see a website running malware on my end, I would most likely just block that site: one legitimate user less.
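
    For context, the kind of scraping I mean is a plain HTTP fetch plus a content hash to detect changes, something like this sketch (URL, user agent and state file are placeholders). A client like this never executes JavaScript, so it never solves the PoW and an Anubis-protected page is simply unreadable to it:

    ```python
    import hashlib
    import json
    import urllib.request
    from pathlib import Path

    URL = "https://example.com/product/123"   # page being tracked (placeholder)
    STATE = Path("page_hashes.json")          # where previous hashes are stored

    def fetch(url: str) -> bytes:
        """Plain HTTP GET; no JavaScript runs, so no PoW is ever solved."""
        req = urllib.request.Request(url, headers={"User-Agent": "change-tracker/0.1"})
        with urllib.request.urlopen(req, timeout=30) as resp:
            return resp.read()

    hashes = json.loads(STATE.read_text()) if STATE.exists() else {}
    digest = hashlib.sha256(fetch(URL)).hexdigest()

    if hashes.get(URL) != digest:
        print("Page changed since last check")
        hashes[URL] = digest
        STATE.write_text(json.dumps(hashes, indent=2))
    else:
        print("No change since last check")
    ```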


  • daniskarma@lemmy.dbzer0.com to Programmer Humor@programming.dev · lads · edited 8 days ago

    Some websites being under DDoS attack =/= all sites being under constant DDoS attack, or being unable to exist without DDoS protection.

    First, there's a logical fallacy in there: being used does not mean being useful. Many companies use AI for some task; does that make AI useful? No.

    The logic still stands: all Anubis can do against DDoS is raise the barrier a little before the site goes down. That's called mitigation, not protection. If you are targeted by a DDoS, that mitigation is not going to do much, and your site is going down regardless.


  • AI does not triple traffic. That's a completely irrational statement to make.

    There's a very limited number of companies training big LLM models, and those companies only train a model a few times per year. I would bet that the number of requests per year for a given resource by an AI scraper is in the dozens at most.

    "Using as much energy as is available per scrape" doesn't even make physical sense. What does that sentence even mean?


  • I'm not saying it's not open source or free. I'm saying it does not contribute to making the web free and open. It really only contributes to making everyone waste more energy surfing the web.

    The web is already too heavy; we do NOT need PoW added to that.

    I don't think even a Raspberry Pi 2 would go down over a web scrape. And Anubis cannot protect from a proper DDoS, so…


  • daniskarma@lemmy.dbzer0.com to Programmer Humor@programming.dev · lads · edited 8 days ago

    That whole thing rests on two wrong suppositions.

    It assumes that websites are under constant DDoS and cannot exist without DDoS protection.

    This is false.

    It assumes that Anubis is effective against DDoS attacks, which it is not. It is a mitigation, but any DDoS attack worth its name would have no issue bringing down a site running Anubis, as the server still has to handle the requests even if they are smaller.

    Anubis' only use case is making AI scrapers consume more energy while scraping, while also making many legitimate users consume more energy. It's just being promoted in the anti-AI wave, but I don't really see much usefulness in it.