Worth a listen!
And the famous quote is, “If we build a machine to achieve our purposes with which we cannot interfere once we’ve started it, then we had better be quite sure that the purpose we put into the machine is the thing we really desire.” And this has continued through the early 21st century, with the thought experiment of the Paperclip Maximizer that turns the universe into paperclips, killing everyone in the process.
But to your point, I don’t think we need these thought experiments anymore. We’re now living with these alignment problems every day. So, one example is there’s a facial recognition data set called Labeled Faces in the Wild. And it was collected by scraping newspaper articles off the web and using the images that came with the articles. Later, this data set was analyzed. And it was found that the most prevalent individuals in the data set were the people who appeared in newspaper articles in the late 2000s.
And so, you get issues like there are twice as many pictures of George W. Bush as of all Black women combined. And so, if you train a model on that data set, you think you’re building facial recognition, but you’re actually building George W. Bush recognition. And so, this is going to have totally unpredictable behavior.
There is a computer science research group that has the, I think, somewhat tongue-in-cheek name of People for the Ethical Treatment of Reinforcement Learning Agents. But there are people who absolutely sincerely think that we should start now thinking about the ethical implications of making a program play Super Mario Brothers for four months straight, 24 hours a day.
You talked about one that did Super Mario Brothers, and it’s just caught in this game that has no more novelty. And it’s a novelty-seeking robot. And I thought it was so sad.
Yeah, it just learns to sit there. Because it’s like, well, why would I jump across this little pipe because it’s just the same old shit on the other side. Like, well, I might as well just do nothing. I might as well just kill myself. And there have been reinforcement learning agents that, because of the nature of the environment, essentially learn to commit suicide as quickly as possible. Because there’s a time penalty being assessed for every second that passes that you don’t achieve some goal. And they can’t achieve it, so they’re like, well, the next best thing is to just like die right now.
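The incentive described here can be sketched with a little arithmetic. This is a minimal illustration, not any particular agent or environment: the function name `episode_return`, the penalty of -1 per step, and the 100-step limit are all assumptions made up for the example. If every step costs something and the goal can never be reached, the shortest episode has the highest total reward.

```python
def episode_return(steps_alive: int, step_penalty: float = -1.0,
                   goal_reward: float = 0.0) -> float:
    """Total undiscounted reward for an episode lasting `steps_alive` steps."""
    return steps_alive * step_penalty + goal_reward


# Hypothetical setup: the goal is unreachable, so goal_reward stays 0.
MAX_STEPS = 100  # assumed episode time limit

# Compare ending the episode at once, a bit later, or at the time limit.
returns = {steps: episode_return(steps) for steps in (1, 10, MAX_STEPS)}

# The episode length with the highest return is the shortest one.
best_steps = max(returns, key=returns.get)
print(returns)
print("best episode length:", best_steps)
```

Under these assumed numbers, a reward-maximizing agent prefers to end the episode on the first step, which is the behavior the time penalty accidentally rewards.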
And again, it’s like we’re somewhere on this slippery slope. I mean, there is this funny thing for me, where the more I study AI, the more concerned I become with animal rights. And I’m not saying that AlphaGo is equivalent to a factory-farmed chicken or something like that, necessarily. But going back to some of the things we’ve talked about, the dopamine system, some of these drives: we are building artificial neural networks that, at least to some degree of approximation, are modeled explicitly on the brain. We’re using TD learning, which is modeled explicitly on the dopamine system. We are building these things in our own image.
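For readers unfamiliar with TD (temporal-difference) learning, here is a minimal sketch of the standard tabular TD(0) update. The state names and the values of `alpha` and `gamma` are assumptions chosen for illustration. The quantity `td_error` is the reward prediction error, the signal that is usually compared to dopamine activity.

```python
def td0_update(value: dict, state: str, reward: float, next_state: str,
               alpha: float = 0.1, gamma: float = 0.9) -> float:
    """Apply one TD(0) update to `value` in place; return the prediction error."""
    # Prediction error: what happened (reward plus discounted next estimate)
    # minus what was expected (the current estimate for this state).
    td_error = reward + gamma * value[next_state] - value[state]
    # Nudge the estimate toward the better-informed target.
    value[state] += alpha * td_error
    return td_error


# Hypothetical two-state example: a cue followed by a rewarding state.
value = {"cue": 0.0, "reward_state": 0.0}
error = td0_update(value, "cue", reward=1.0, next_state="reward_state")
print(error, value["cue"])
```

An unexpected reward produces a positive error and raises the estimate for the preceding state. Once the reward is fully predicted, the error shrinks toward zero.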
And so, the odds of them having some kind of subjective experience, I think, are higher than if we were just writing generic software. This is the huge question in philosophy of mind: are we going to know if we manage to create something with subjectivity or not? I’m not sure. But these questions, I think, are going to go from seemingly crazy now to maybe on a par with something like animal welfare by the end of the century. I think that’s not a crazy prediction to make.