I was excited to check out the lecture videos, thinking they were public, but quickly saw that they were closed.
One of the things I miss most about the pandemic was how all of these institutions opened up for the world. Lately they have been not only closing off newer course offerings but also making old videos private. Even MIT OCW falls apart once you get into some advanced graduate courses.
I understand that universities should prioritize their alumni, but there is essentially no marginal cost to making the underlying material (especially the lectures!) available on the internet, and it delivers immense value to the world.
It’s been said that RL is the worst way to train a model, except for all the others. Many prominent scientists seem to doubt that this is how we’ll be training cutting-edge models in a decade. I agree, and I encourage you to try to think of alternative paradigms as you go through this course.
If that seems unlikely, remember that image generation didn’t take off until diffusion models, and GPTs didn’t take off until RLHF. If you’ve been around long enough, it will seem obvious that this isn’t the final step. The challenge for you: find the one that’s better.
You're assuming that people are only interested in image and text generation.
RL excels at control problems. Given enough runtime, it is mathematically guaranteed to converge to an optimal policy for the states and actions you give it (at least in the tabular setting). For some problems (playing computer games), that runtime is surprisingly short.
There is a reason self-driving cars use RL and not GPTs.
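To make that convergence guarantee concrete, here is a minimal tabular Q-learning sketch on a hypothetical toy problem (every name and parameter below is illustrative, not from the course): a 1-D grid where the agent earns a reward for reaching the rightmost state. Under the classic conditions (every state-action pair visited infinitely often, suitable step sizes), tabular Q-learning provably converges to the optimal action values.

    import random

    # Hypothetical 1-D grid world: states 0..4, actions {-1, +1},
    # reward 1 for reaching state 4.
    N_STATES, GOAL = 5, 4
    ACTIONS = (-1, +1)
    Q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
    alpha, gamma, eps = 0.1, 0.9, 0.3

    for episode in range(3000):
        s = random.randrange(N_STATES - 1)   # random non-goal start state
        for t in range(100):                 # cap episode length
            # epsilon-greedy keeps every state-action pair visited
            a = random.choice(ACTIONS) if random.random() < eps \
                else max(ACTIONS, key=lambda x: Q[(s, x)])
            s2 = min(max(s + a, 0), GOAL)
            r = 1.0 if s2 == GOAL else 0.0
            # Q-learning update: bootstrap off the best next action
            Q[(s, a)] += alpha * (r + gamma * max(Q[(s2, b)] for b in ACTIONS) - Q[(s, a)])
            s = s2
            if s == GOAL:
                break

    print({s: max(ACTIONS, key=lambda x: Q[(s, x)]) for s in range(GOAL)})
    # Expected output: every non-goal state prefers +1 (move toward the goal).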
I have been using RL to train an agent on my game, hotlapdaily.
Apparently the AI sets the best time, even better than the pros. It is really useful for optimization in controlled environments.
You are exactly right.
Control theory and reinforcement learning are different ways of looking at the same problem. They have traditionally and culturally focused on different aspects.
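As a concrete point of contact, consider the linear-quadratic regulator: the control-theory approach computes the optimal feedback gain from a known model by solving a Riccati equation, while RL would estimate the same policy from sampled transitions without knowing the model. A minimal sketch of the control-theory side (the dynamics here are an illustrative double integrator, not from any comment above):

    import numpy as np
    from scipy.linalg import solve_discrete_are

    # Discrete-time LQR for a hypothetical double integrator:
    # x' = A x + B u, cost = sum(x'Qx + u'Ru).
    A = np.array([[1.0, 1.0], [0.0, 1.0]])
    B = np.array([[0.0], [1.0]])
    Q = np.eye(2)
    R = np.array([[1.0]])

    # Control-theory view: solve the Riccati equation using the known model.
    P = solve_discrete_are(A, B, Q, R)
    K = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)  # optimal gain, u = -K x
    print(K)
    # The RL view of the same problem: estimate this K (or the Q-function it
    # induces) from sampled (x, u, cost, x') transitions, without A and B.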
RL is barely even a training method; it's more of a dataset generation method.
I feel like both this comment and the parent comment highlight how RL has recently been going through another cycle of misunderstanding, driven by its latest popularity boom from being used to train LLMs.
It's reductive, but also roughly correct.
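One way to read the "dataset generation" framing, sketched below on a hypothetical two-armed bandit (all names and numbers are illustrative): the RL-specific work is rolling out the current policy and scoring the samples; the parameter update itself is an ordinary reward-weighted log-likelihood step, structurally the same move supervised learning makes on labeled data.

    import numpy as np

    rng = np.random.default_rng(0)
    logits = np.zeros(2)                 # policy parameters, 2-armed bandit
    true_means = np.array([0.2, 0.8])    # hypothetical arm payoffs
    lr = 0.1

    for step in range(500):
        # "Dataset generation": sample an action from the current policy
        # and label it with the observed reward.
        probs = np.exp(logits) / np.exp(logits).sum()
        a = rng.choice(2, p=probs)
        r = rng.normal(true_means[a], 0.1)
        # "Training": a reward-weighted log-likelihood gradient step,
        # with the reward playing the role of a sample weight.
        grad_logp = -probs
        grad_logp[a] += 1.0
        logits += lr * r * grad_logp

    probs = np.exp(logits) / np.exp(logits).sum()
    print(probs)  # should concentrate on arm 1, the higher-reward arm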
Care to correct the misunderstanding?
What about combinatorial optimization? When you have a simulation of the world, what other paradigms fit?
More likely we will develop general superintelligent AI before we (together with our superintelligent friends) solve the problem of combinatorial optimization.
There's nothing to solve. The curse of dimensionality kills you no matter what. P = NP, or maybe quantum computing, is the only hope of making serious progress on large-scale combinatorial optimization.
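To put a number on that blow-up (a quick back-of-the-envelope check, not from the thread): the count of distinct tours in a symmetric TSP is (n - 1)!/2, which by n = 60 is already on the order of 10^79, comparable to the estimated number of atoms in the observable universe.

    import math

    # Distinct tours in a symmetric TSP on n cities: (n - 1)! / 2.
    for n in (10, 20, 60):
        print(n, math.factorial(n - 1) // 2)
    # n = 60 already gives ~6.9e79 tours; brute force is hopeless.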
GPT wouldn't even have been possible, let alone taken off, without self-supervised learning.
RL is extremely brittle; it's often difficult to make it converge. Even the Stanford folks admit that. Are there any solutions for this?
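For context on that convergence question, one widely used mitigation in practice is to bound how far any single update can move the policy, e.g. PPO's clipped surrogate objective. A minimal sketch of just the loss (the function name and arguments here are hypothetical):

    import numpy as np

    def ppo_clip_loss(logp_new, logp_old, advantages, eps=0.2):
        """PPO clipped surrogate loss (to be minimized).

        Clipping the probability ratio to [1 - eps, 1 + eps] bounds how
        far one batch can move the policy, which is one of the standard
        remedies for RL's training instability."""
        ratio = np.exp(logp_new - logp_old)
        unclipped = ratio * advantages
        clipped = np.clip(ratio, 1 - eps, 1 + eps) * advantages
        return -np.minimum(unclipped, clipped).mean()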
Are the videos available somewhere?
The spring course is on YouTube: https://m.youtube.com/playlist?list=PLoROMvodv4rN4wG6Nk6sNpT...
Given Ilya's podcast, this is an interesting title.
So, basically AI Winter? :-)
That's how I read it XD "oh no, RL is dead too"
I didn't get the reference. Please elaborate.
He said RL sucks because it narrowly optimizes to solve a certain set of problems under a certain set of conditions.
He compared it to students who win math competitions but can't do anything practical.