We successfully replaced thousands of complicated deep-net time-series anomaly detectors at a FANG with statistical (nonparametric, semiparametric) process control ones.
They use three to four orders of magnitude fewer trained parameters and have just enough complexity that a team of three or four can handle several thousand such streams.
The amount of babysitting the deep-net models needed was astronomical, and debugging or understanding what had happened was quite opaque.
For small teams with limited resources, I would still strongly recommend stats-based models for time-series anomaly detection.
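For a sense of how simple these can be, here is a minimal sketch in Python of a nonparametric control chart (rolling median with MAD bands). The window size and threshold are illustrative assumptions, not the values we actually ran in production:

    import numpy as np

    def mad_control_chart(series, window=288, threshold=5.0):
        # Nonparametric control chart: flag points that fall outside
        # a rolling median +/- k * MAD band. No distributional
        # assumptions, and the per-stream "model" is just a window
        # of recent values plus two scalars.
        series = np.asarray(series, dtype=float)
        flags = np.zeros(len(series), dtype=bool)
        for i in range(window, len(series)):
            ref = series[i - window:i]           # recent history
            med = np.median(ref)                 # robust center
            mad = np.median(np.abs(ref - med))   # robust spread
            scale = max(1.4826 * mad, 1e-9)      # ~std dev under normality
            flags[i] = abs(series[i] - med) > threshold * scale
        return flags

Something of this shape is cheap enough that a small team can run thousands of instances and still reason about every single alert.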
It may not be your best career move right now, for political reasons. Those making massive bets do not like to confront the possibility that some of their bets were not well placed, and they may try to make it difficult for contrary evidence to become too visible.
What confuses me about deep nets is that there's rarely enough signal to meaningfully train a large number of parameters. Surely 99% of those parameters are either (a) incredibly unstable or (b) perfectly correlated with other parameters?
They do. There are enormous redundancies. There's a manifold over which the parameters can vary wildly yet do zilch to the output. The nonlinear analogue of a null space.
The current understanding is that this overparameterization makes reaching good configurations easier while keeping the search algorithm as simple as stochastic gradient descent.
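A toy demonstration of that redundancy, using a hypothetical two-layer ReLU network: rescaling one layer up and the next down moves you along exactly such a manifold without changing the output.

    import numpy as np

    rng = np.random.default_rng(0)
    W1 = rng.normal(size=(16, 4))   # first layer weights
    W2 = rng.normal(size=(1, 16))   # second layer weights
    x = rng.normal(size=4)

    def net(A, B):
        return B @ np.maximum(A @ x, 0.0)   # ReLU in between

    # ReLU is positively homogeneous, so scaling W1 by any c > 0 and
    # W2 by 1/c leaves the function unchanged: a whole curve of
    # distinct parameter settings with identical outputs.
    c = 7.3
    print(np.allclose(net(W1, W2), net(c * W1, W2 / c)))   # True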
Super cool, thanks for sharing!
This is one of the reasons I am so skeptical of the current AI hype cycle. There are boring, well-behaved classical solutions for many of the use-cases where fancy ML is pushed today.
You'd think that rational businesses would take the low-risk, snooze-fest, high-margin option any day over unintelligible, unreliable options that demand a lot of resources, and yet...
> This is one of the reasons I am so skeptical of the current AI hype cycle. There are boring, well-behaved classical solutions for many of the use-cases where fancy ML is pushed today.
In 2013 my statistics professor warned that once we are in the real world, "people will come up to you trying to sell fancy machine learning models for big money, though the simple truth is that many problems can be solved better by applying straightforward statistical methods".
There has always been ML hype, but the last couple of years are a whole different level.
It does not work that way in the short term.
Say you have bet billions as a CEO, CTO, or CFO. The decision has already been made, and such a steep price had to come at the cost of many groups, teams, and projects in the company.
Now is not the time to water plants that offer alternatives. You will have a smoother ride choosing tools that justify that billion-dollar bet.
Decision-making in organizations is definitely a hard problem.
I think an uncomfortable reality is that a lot of decisions (technology, strategy, etc.) are not optimal or even rational, but rather just an outcome of personal preferences.
Even data-driven approaches aren't immune since they depend on the analysis and interpretation of the data (which is subjective).
Data informed is good. Purely data driven is a bad idea.
After all, even in physics, big advances have come from thought experiments. Data is one way to reason about a decision; logic and a knowledge base are another. Both can be very powerful if one retains the humility of fallibility.
In organizations, one common failure mode is that the level at which decisions are made is not the level at which their effects will be felt.
It's a really difficult problem to solve. Too much decentralization is also a bad idea: you get the mess of unplanned, congested cities.
> There are boring, well-behaved classical solutions for many of the use-cases where fancy ML is pushed today.
I know some examples but not too many. Care to share more examples?
Some off the top of my head...
- Instead of trying to get LLMs to answer user questions, write better FAQs informed by reviewing tickets submitted by customers
- Instead of RAG for anything involving business data, have some DBA write a bunch of reports that answer specific business questions (see the sketch after this list)
- Instead of putting some copilot chat into tools and telling users to ask it to e.g. "explain recent sales trends", make task-focused wizards and visualizations so users can answer these with hard numbers
- Instead of generating code with LLMs, write more expressive frameworks and libraries that don't require so much plumbing and boilerplate
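To make the second bullet concrete, a hypothetical sketch of such a report in Python/pandas (table and column names invented for illustration):

    import pandas as pd

    def monthly_sales_report(orders: pd.DataFrame) -> pd.DataFrame:
        # Answers one specific business question with hard numbers:
        # revenue and order count per region, per month.
        return (
            orders.assign(month=orders["ordered_at"].dt.to_period("M"))
                  .groupby(["month", "region"])
                  .agg(revenue=("amount", "sum"),
                       orders=("order_id", "count"))
                  .reset_index()
        )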
Of course, maybe there is something I am missing, but these are just my personal observations!
With all due respect, all of those examples are examples of "yesterday" ... that's how we have been bringing money to businesses for decades, no? Today we have AI models that can already perform as well as, almost as well as, or even better than the average human at many, many tasks, including the ones you mentioned.
Businesses are incentivized to be more productive and cost-effective, since they are solely profit-driven, so they naturally see this as an opportunity to make more money by hiring fewer people while keeping the amount of work done roughly the same or even increasing it.
So the "classical" approach to many of these problems is, I think, already a thing of the past.
> Today we have AI models that can already perform as well as, almost as well as, or even better than the average human at many, many tasks, including the ones you mentioned.
We really don't. There are demos that look cool onstage, but there is a big difference between "in-store good" and "at-home good", in the sense that products aren't living up to their marketing in actual use.
IMO there is a lot of room to grow within the traditional approaches of "yesterday". The problem is that large orgs get bogged down in legacy and bureaucracy, and most startups don't understand the business problems well enough to build a better solution. And I don't think there is any technical silver bullet that can solve either of these problems (AI or otherwise).
In the realm of data science, linear models and SAT solvers used cleverly will get you a surprisingly long way.
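As a hypothetical illustration of the linear-model half of that claim, a regularized linear baseline is often embarrassingly competitive and is worth trying before anything deep:

    import numpy as np
    from sklearn.linear_model import RidgeCV
    from sklearn.model_selection import train_test_split

    # Synthetic tabular data standing in for a real business dataset.
    rng = np.random.default_rng(0)
    X = rng.normal(size=(2000, 20))
    y = X @ rng.normal(size=20) + 0.5 * rng.normal(size=2000)

    # Ridge regression with the penalty chosen by cross-validation:
    # a handful of coefficients, fully inspectable.
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
    model = RidgeCV(alphas=np.logspace(-3, 3, 13)).fit(X_tr, y_tr)
    print(f"held-out R^2: {model.score(X_te, y_te):.3f}")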
I thought OCR was one of the obvious examples: a classical technology that already works very well, but that I don't see surviving in the long run. _Generic_ AI models can already do OCR fairly well even though they were never trained for that purpose; it's almost incidental. They were never trained to extract, say, a name and surname from a document with a completely unfamiliar structure, and yet the crazy thing is that it somehow works! Once somebody fine-tunes a model for this purpose alone, I think there's a good chance it will outperform the classical approach in precision and scalability.
In general I agree. For OCR I agree vehemently. Part of the reason is that the structure of the solution (convolutions) matches the space so well.
The failure cases are those where AI solutions have to stay in a continuous debug-retrain-update mode. Then you have to think about the resources you need, in both people and compute, to maintain such a solution.
Because of the way the world works, with its endemic nonstationarity, the debug-retrain-update loop is a common state of affairs even in traditional stats and ML.
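One common shape of that loop, as a sketch (the significance level and the idea of fixed reference/recent windows are assumptions): monitor new data for distribution drift and let a cheap test decide when the retrain step actually has to happen.

    from scipy.stats import ks_2samp

    def needs_recalibration(reference, recent, alpha=0.01):
        # Two-sample Kolmogorov-Smirnov test as a cheap drift alarm.
        # `reference` is data the model was fit on, `recent` is a
        # fresh window; a small p-value says the distribution moved
        # and the debug-retrain-update loop should kick in.
        _, p_value = ks_2samp(reference, recent)
        return p_value < alpha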
I see. Let's take another example here; I hope I understood you. Imagine you have an AI model connected to all of your company's in-house data sources: wiki, chat, Jira, emails, merge requests, Excel sheets, etc. Basically everything that can be deemed useful to query or to build business intelligence on top of. These sources continuously generate more data every day, and by their nature that data is more or less unstructured.
Yet we have such systems in place without having to retrain the model on the ever-growing data. This is just one example, but it suggests that models, at least for some purposes, don't have to be retrained continuously to keep running well.
I also use a technique of explaining to the model something it has not seen before (based on the wrong answer it previously gave me), and it manages to adjust its steps, whatever they are, so that it gives me the correct answer in the end. This also suggests that the models' capacity is larger than what they were trained on.
For beginners to SPC, I wrote a practitioner's guide a couple of years ago. Might provide meaningful context: https://entropicthoughts.com/statistical-process-control-a-p...
On a side note, I love the look and feel of your page!!