There has to be some way to make progress on it, and what I saw for a long time when I looked at – well, it used to be called Singularity Institute and then MIRI – when I looked at what they were doing, what the people who talked about AI alignment were doing, it seemed like a lot of a priori, philosophical thinking about almost a form of theology.
And I don’t say that derisively. It’s almost just inherent to the subject matter when you are trying to imagine a being that much smarter than yourself or that much more omniscient and omnipotent than yourself. The term that humanity has had for millennia for that exercise has been theology, and so there was a lot of reasoning from first principles about ‘assume an arbitrarily powerful intelligence, what would it do in such and such a situation, or why would such and such approach to aligning it not work? Why would it see so much further ahead of us that we shouldn’t even bother?’ And the whole exercise felt to me like I would feel bad coming in as an outsider and saying, I don’t really see clear progress being made here.
And whether it’s true or it’s not true, in science, you always have to ask a different question, which is what can I make progress on, and I think that the general rule is that to make progress in science, you need at least one of two things. You need a clear mathematical theory, or you need experiments or data, but what is common to both of those is that you need something external to yourself that can tell you when you were wrong, that can tell you when you have to back up and try a different path.
A very good case study here would be string theory. String theory has been trying to make progress in physics in the absence of both experiments that directly bear on the questions they’re asking and a clear mathematical definition of what the theory is.
“We can only see a short distance ahead, but we can see plenty there that needs to be done.”
What I find both more plausible and also more productive to work on is the scenario where the ability to deceive develops gradually just like every other ability that we’ve seen, where before you get an AI that is plotting to make its one move to take over the entire world, you get AIs that are trying to deceive us and doing a pretty bad job at it, or that succeed at deceiving one person, but not another person, and in some sense, we’re already on that path.
I sometimes think about AI safety in terms of the analogy of when you have a really old shower head and the water is freezing cold, and you just want to turn it to make the water hot, and you turn it and nothing’s happening. And the danger is, well, if you turn it too fast, it could go from freezing to scalding, and that’s what you’re trying to avoid. You need to turn the shower head enough that you can feel some heat, because otherwise you’re just not getting any feedback from the system about how much you should be turning it. If you don’t get any feedback, then it’s going to make you just keep turning it more and more. But when you do start getting that feedback, then you have to moderate the speed, and then you have to be learning from what you see and not just blindly continuing to turn.