This is a brief note on artificial intelligence, and why I am, in short, skeptical of doom.
It is certainly true that we do not know how, or if, we can align AI. I have no doubt that the artificial intelligences we construct will be deeply alien to us, that we will not understand, in any deep sense, the mechanics of how they work, and that we will always be uncertain whether we are pointing them toward the right goals.
At the same time, we are extremely uncertain how, or if, we can make a superintelligent AI at all. There is no guarantee that we ever will. There is no guarantee we will ever make an AI which is meaningfully an independent agent, which has desires as we think of them, or which cannot be aligned by simply saying “do this, not that”.
What I want is for our belief in technological change to be two-sided. I cannot see why we should a priori believe that technology will improve faster than our ability to control it. To have faith in one side, but not the other, is simply a mood. It is not a mood shared by those who have to put their money where their mouth is, either — while it is theoretically ambiguous which way interest rates should go if AGI comes and is aligned, a misaligned AI killing everyone would unequivocally make real interest rates go up.
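To make the interest-rate point concrete, here is a minimal sketch of the standard Ramsey-rule logic (my gloss; the functional form is the textbook one, not something taken from any source cited here). With pure time preference ρ, an annual probability of extinction δ, elasticity of marginal utility θ, and expected consumption growth g, the equilibrium real rate is approximately

r ≈ ρ + δ + θ·g

Extinction risk enters exactly like extra impatience: if you expect no future, you save less and consume more today, so lenders must be paid more, and a rise in δ pushes r up unambiguously. An aligned AGI, by contrast, moves g and other terms at the same time, which is why that case is harder to sign.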
It’s okay that we don’t understand AIs. I do not believe we need to understand AIs to align them. We do not understand how humans think, either, nor can we directly intervene upon our preferences, yet we are capable of sustaining the civilization in which we now live. I am sure that we will have a good shot at developing novel methods of alignment which could not be anticipated in advance — after all, won’t we have the best AIs helping us?
Came across a couple of posts yesterday that were more bearish on LLMs than I expected from LW:
https://www.lesswrong.com/posts/oKAFFvaouKKEhbBPm/a-bear-case-my-predictions-regarding-ai-progress
https://www.lesswrong.com/posts/tqmQTezvXGFmfSe7f/how-much-are-llms-actually-boosting-real-world-programmer
I was actually a founder/president of the Berkeley student AI Safety group (https://berkeleyaisafety.com/), but I would say I've always been on the more skeptical side. I might write about the topic at some point.
I'm pretty sympathetic to the "aligned by default" idea; it seems to be the much more plausible outcome. But it needs to be said that this doesn't strictly avoid bad outcomes. It might be trivially easy to adjust the weights toward bad outcomes. And disempowerment scenarios seem likely to me — I think Will MacAskill has adjusted to how AI threats are actually shaping up in a way that LessWrong doomers haven't.