There’s no substitute for 401 Auth Required with a resource description of “Please enter any username or password to proceed”. Automated scrapers would be legally liable under US criminal hacking statutes if they started attempting random credentials at random sites, even if one claimed that an AI interpreted the message to indicate it was permissible; meanwhile, human beings can simply follow the directions and gain authorized access, with no legal exposure under any common-sense interpretation by a judge.
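For anyone who wants to try this, here is a minimal sketch using only the Python standard library. It assumes the "resource description" means the Basic auth realm string; the names `REALM`, `authorize`, and `AnyCredentialsHandler` are made up for illustration, and this is not a hardened setup.

```python
import base64
from http.server import BaseHTTPRequestHandler
from typing import Optional

# Assumption: the message from the comment is sent as the Basic auth realm.
REALM = "Please enter any username or password to proceed"


def authorize(auth_header: Optional[str]) -> int:
    """Return 200 for any well-formed Basic credentials, otherwise 401."""
    if not auth_header:
        return 401
    scheme, _, token = auth_header.partition(" ")
    if scheme.lower() != "basic":
        return 401
    try:
        decoded = base64.b64decode(token.strip(), validate=True).decode("utf-8")
    except ValueError:  # covers malformed base64 and invalid UTF-8
        return 401
    # Basic credentials have the form "user:password"; any pair is accepted.
    return 200 if ":" in decoded else 401


class AnyCredentialsHandler(BaseHTTPRequestHandler):
    """Hypothetical handler: challenges every GET until some credentials arrive."""

    def do_GET(self) -> None:
        status = authorize(self.headers.get("Authorization"))
        self.send_response(status)
        if status == 401:
            self.send_header("WWW-Authenticate", f'Basic realm="{REALM}"')
        self.send_header("Content-Type", "text/plain; charset=utf-8")
        self.end_headers()
        self.wfile.write(b"Welcome.\n" if status == 200 else b"Authentication required.\n")
```

To serve it locally, something like `HTTPServer(("127.0.0.1", 8000), AnyCredentialsHandler).serve_forever()` (with `HTTPServer` imported from `http.server`) should do. Whether this actually changes the legal picture for scrapers is the comment's claim, not something the code can show.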
There are antibot measures that do not depend on JS or cookies, like asking for the name of a red fruit and having the server add the visitor's IP to an allowlist once the answer is correct.
Very easy to bypass for sure, but custom enough to protect you from the horde of generic bots =p
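A rough sketch of that idea, assuming an in-memory allowlist and a hand-picked answer set. `RED_FRUITS` and `FruitGate` are hypothetical names, and a real deployment would hook this into the web server and expire entries.

```python
from typing import Set

# Hypothetical set of accepted answers; pick your own question and answers.
RED_FRUITS = {"apple", "cherry", "strawberry", "raspberry", "tomato"}


class FruitGate:
    """Tracks which client IPs have answered the challenge correctly."""

    def __init__(self) -> None:
        self.allowed_ips: Set[str] = set()

    def is_allowed(self, ip: str) -> bool:
        # No JS or cookies involved: the decision depends only on the client IP.
        return ip in self.allowed_ips

    def submit_answer(self, ip: str, answer: str) -> bool:
        # Normalize the answer so " Cherry " and "cherry" are treated the same.
        if answer.strip().lower() in RED_FRUITS:
            self.allowed_ips.add(ip)
            return True
        return False


gate = FruitGate()
gate.submit_answer("203.0.113.7", "banana")    # wrong answer, IP stays blocked
gate.submit_answer("203.0.113.7", " Cherry ")  # correct answer, IP is allowed
```

Keying on the IP means everyone behind the same NAT gets in after one correct answer, which is part of why this only filters out generic bots.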
From an end user perspective, the whole "proof that you are human" thing is nothing but hassle.
Back then, the "select all boxes with a traffic light" kind of challenge was already ambiguous enough; now they have even started generating AI images for it and, to be honest, I can't get it right something like 40% of the time...
... And the actual bots are able to do that better than me. What an absurd time to live in.
As a side note, I prefer using the term GPT instead of LLM. OpenAI bullied everyone until they got to choose the language but they no longer control how people use the word GPT.
Imagine if software was actually efficient enough that bots don't affect its functionality.
Software performance, which translates into CPU/memory/disk resources, is only one aspect of the costs incurred by crawling bots, and it falls under the unmetered / virtually infinite category. However, there are also metered resources that do translate (after a certain threshold) into monetary costs: network bandwidth.
Thus, regardless of how well one optimizes site delivery (static site, minification, CDN, caching, etc.), a stampede of crawling bots does in the end become a DDoS, which, if it doesn't take down the infrastructure, might leave a deep hole in one's budget.
For example, on one of the sites I manage, I get daily peaks of ~300 requests per second measured at the backend, and that site already employs heavy caching (client-side, CDN, and even server-side). This wasn't so a few months back, and the site didn't just jump in popularity.
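To put the "metered after a threshold" point into numbers, here is a back-of-the-envelope sketch. Only the ~300 requests per second comes from the comment above; the average response size, the hours per day spent at peak, the free allowance, and the price per GB are all made-up assumptions, so plug in your own provider's figures.

```python
from typing import Tuple


def monthly_egress_cost(
    req_per_sec: float,
    avg_response_bytes: float,
    busy_hours_per_day: float,
    free_gb: float,
    price_per_gb: float,
    days: int = 30,
) -> Tuple[float, float]:
    """Return (total egress in GB, billed cost) for one month of bot traffic."""
    total_bytes = req_per_sec * avg_response_bytes * busy_hours_per_day * 3600 * days
    total_gb = total_bytes / 1e9  # decimal gigabytes
    # Only traffic above the free allowance is billed.
    billable_gb = max(0.0, total_gb - free_gb)
    return total_gb, billable_gb * price_per_gb


# Hypothetical inputs: 20 kB responses, 2 hours/day at peak, 1 TB free, $0.05/GB.
total_gb, cost = monthly_egress_cost(300, 20_000, 2, 1000, 0.05)
print(f"{total_gb:.0f} GB egress, about ${cost:.2f} billed")
```

With these invented inputs the traffic comes to roughly 1.3 TB a month, so the free allowance is exceeded even with small responses; larger pages or longer peaks scale the bill linearly.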
>and you won't be able to "buy" a book, only rent it each time you want to read it
Then I can just start pirating them much more pervasively. Problem solved. Buying mere access to things like books and media has its uses, but to make it obligatory as an alternative to true ownership is a tendency that can go die.
I don't see how respecting one's rights (in this case an author's copyright over their own works) leads to the right to buy books being revoked.
I agree that it is sad to see many online book stores moving from "selling" to "renting". But that is a completely different problem.
As a personal note, I know the pain of not being able to access scientific papers because they were behind paywalls, and of having to search for drafts to be able to read them. But that model was already well in place circa 2010, so it's an old tactic applied to a new field: books (and others).