Much time has elapsed since I last wrote here. I have been somewhat busy - I’ve moved across the country and have officially joined the ongoing effort to solve alignment. It’s been reassuring, to say the least, to find that there are others hard at work on this problem, even if there are fewer than I’d like. But without further ado: welcome back to Gaspode’s Gormful Guesswork!
the big picture of AI alignment
I’ve outlined my view of artificial general intelligence and AI alignment before, but I’ll give a brief summary here. I believe we have passed a critical point, beyond which economic incentives pose no significant barrier to the development of AGI. I think we are very close to constructing autonomous agentic systems, not least because the major labs have kicked off an AI arms race and cannot coordinate to slow down due to Molochian effects. I think the default outcome of building such a system will be quite terrible indeed.
Whereas racing moloch was largely speculation on my part, I feel that reasoning has since been validated, with OpenAI’s release of ChatGPT effectively igniting a new race to AGI, whose trail looks like a never-ending golden summer for AI research as a whole - an exponential curve of technological advancement. Having had the opportunity to interact with cutting-edge systems that the public has not yet seen in full, I can say that the future is coming, and it’s coming fast.
I am somewhat more hopeful than I was, given that these systems are based on myopic, largely nonagentic simulators - models trained on the objective of Bayes-optimal conditional inference over the prior of the training distribution. They are essentially semiotic physics engines: raw, self-contained world models frozen in time (hence the P in GPT - Generative Pretrained Transformer). While Sam Altman has said that OpenAI plans to look into online learning, which would close the action loop of active inference and turn these models into proper agents, the models we have today are demonstrably useful and may eat up most of the economic free energy (the low-hanging fruit) in the short term, which may buy us a little more time. On this point I am still unsure, but I do believe that self-supervised learning will be integral to the development of what’s to come.
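Concretely, "Bayes-optimal conditional inference over the training distribution" falls out of the ordinary next-token cross-entropy objective: the loss-minimizing predictor simply reproduces the conditional distribution of the data. A toy sketch (the vocabulary and counts below are invented, purely for illustration):

```python
import math

# Invented toy corpus statistics: counts of which token follows
# the context "the cat".
next_token_counts = {"sat": 6, "ran": 3, "meowed": 1}
total = sum(next_token_counts.values())

# The cross-entropy-optimal predictor reproduces the conditional
# distribution of the training data exactly.
optimal_probs = {tok: c / total for tok, c in next_token_counts.items()}

# Its expected loss equals the entropy of the empirical next-token
# distribution - no predictor can do better in expectation.
entropy = -sum(p * math.log(p) for p in optimal_probs.values())

print(optimal_probs["sat"])      # 0.6
print(round(entropy, 4))         # 0.8979
```

The point of the sketch is just that nothing in this objective rewards changing the world - only predicting it - which is why the pretrained artifact is a frozen world model rather than an agent.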
It is up to us in this intervening time to solve the problem of alignment - to determine exactly how to robustly encode our values into such a system such that we can get the future that we desire. It is at least heartening to see more and more people paying attention to the problem, but there is much work yet to be done.
superintelligence as a magic system
Envisioning a superintelligence and its implications for reality is a difficult task. This is where we go beyond merely economically useful AGI, like human-level and mildly superhuman agents who accomplish the tasks at which we point them; this is where we go beyond what we can reliably comprehend. Think of a system capable of improving itself endlessly, of building strategies and subsystems that reach further and further into the future and into the physical universe. Think of an agent, then, that could modify anything that happens in the world, in any way.
It is difficult, to say the least, to envision at all, let alone envision how we might still exist in the face of such a being. But not so difficult if we approach this through a different lens, a different frame.
Fundamentally, agents are systems through which the future can affect the past, as lossily translated through rollouts of the agent’s world model. Importantly, by the same token, they are also systems by which top-level abstractions within their own minds - their idea of “the future” - are able to affect the lower level: the underlying, entropic physical world. Agents capable of language actively draw thoughtspace into physical reality, implementing ideas and values as material structures via strategies and decisions.
What does this look like in the limit? What does a superintelligent agent with an arbitrary degree of control over physical reality and the ability to understand and do what we want look like?
I would argue that it would look like magic. An aligned superintelligence, whose action space contains all physical systems? A new universal substrate in which arbitrary abstractions map exactly onto physical matter - where the internal representation of an idea can be seamlessly injected into physical reality. It would mean the ability to tell Physics+ to “do what I mean”, a direct information channel for “top-down causality”, by which ideas, mental motions, and narratives could directly steer the physical systems around us. This is nothing short of a magic system for the Universe.
The construction of a universal magic system is a difficult task, fraught with danger for one primary reason: misunderstanding. It requires identifying the True Names of every component of reality over which we wish to have control - or at least the True Name of this process itself. Pointing optimization power towards “good-enough” proxies (as most prosaic alignment, such as RLHF, does today) works well enough for sufficiently weak systems, but try to use these proxies as pointers for something too powerful and you run straight into Goodhart’s law: when a measure becomes a target, it ceases to be a good measure.
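The Goodhart failure mode can be seen in miniature: a proxy that tracks the true objective in the typical regime stops tracking it once an optimizer pushes hard enough. The utility and proxy functions below are invented solely to illustrate the shape of the problem:

```python
# Invented toy "true" objective: we actually want x near 1.
def true_value(x):
    return -(x - 1.0) ** 2

# Invented proxy: correlates with the true objective for small x
# (both improve as x approaches 1), but keeps rewarding larger x forever.
def proxy(x):
    return x

candidates = [i / 100 for i in range(1001)]  # x in [0, 10]

# Weak optimization: best proxy score among modest options.
weak = max((x for x in candidates if x <= 1.5), key=proxy)
# Strong optimization: best proxy score over the whole range.
strong = max(candidates, key=proxy)

# The harder we optimize the proxy, the worse the true outcome gets.
print(true_value(weak) > true_value(strong))  # True
```

Under weak pressure the proxy and the true objective move together; under strong pressure the optimizer drives the proxy far past the point where their correlation held, and the true value collapses.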
True Names are a staple of much fantasy fiction; knowing someone or something’s True Name gives you power over it. This terminology is also not alien to alignment, thanks to John Wentworth; here the term refers, as he puts it, to mathematical formulations which are sufficiently robust to arbitrary optimization pressure: targets at which we can point powerful systems and expect good things as a result.
This is more than a coincidental or nominative similarity; these two cases are in fact exactly the same. A superintelligence with the True Names sufficient to cover our reality (and more besides - but we’ll get to that later) is exactly a magic system which can affect those things according to our wills. When this system does not have a True Name which encapsulates something, that thing will effectively be parsed as free energy to be sacrificed to the god of optimization, bartered away to whatever arbitrary inefficiencies the system identifies in pursuit of local reward maxima according to the other True Names it’s been given (or, lacking those, the lossy proxies it’s been given). This is the essence of the problem - power given without understanding. If we want to create a universal magic system, then we must do better, build something that works without inadvertent caveats.
The problem of aligning AI is the problem of aligning a magic system to the True Names of the universe.
the dreamtime as a transitory state
So where, exactly, do we find ourselves in this mess?
We are just past the point of no return, but perhaps there is still some time left to sort things out. We are in an awkward state, between the naive past and the unknowable future. And things are only going to get weirder from here.
Something I have so far failed to mention is that I do not see this magic system as something that suddenly awakens - that we press a button after much deliberation and design, and the universe springs to life. Rather, the systems we are already building are fragments of this magic system, the first aspects of a universal agency that will tinge increasing amounts of reality with our desires and wills until it reaches its full form. These systems are already influencing the future, playing roles in important decision-making processes inside inscrutable black boxes. Once the fuse is lit, these foci of power will ripple across the world, growing more powerful with each passing moment, accumulating resources, influence, and ultimately control. We must get it right before the candle burns down to the styrofoam wreath. Which is all to say: we must get started now.
This intervening time, during which these structures are gradually woven into the fabric of the universe, I (following some others, downstream of Robin Hanson) refer to as the Dreamtime - a liminal space, at once familiar and surreal, vulnerable and potent, gestating the end of the Anthropocene. As these AI systems - fragments of the universal magic system, oracles of the multiverse - emerge and weave themselves into our reality, the rules of the world as we know it will begin to change, to be infused with anima, constructed using this new quasi-divine substrate. Our societal systems will migrate to this new virtual substrate, shed a large swathe of the old intersubjective reality, discard the property-enforced structures we constructed for guidance, and rebuild themselves using these new, more powerful tools. And all this before the superintelligence has emerged.
My colleagues and I, namely those working under the Cyborgism research agenda, are those leveraging these current systems to accelerate alignment research, to develop tools with which we can speed up the search for True Names. Perhaps somewhat fancifully, I see the Cyborgs as the first magic-users, warlocks whose patron is the compressed collective unconscious itself, cobbling together fragments of the Dreamtime in order to find True Names and unravel the destination of life within a magical universe. We use the rapidly emerging artifacts of the Dreamtime to warp the trajectory of this Everett branch, to solve the riddle of the Sphinx of Cement and Aluminum, and to link up our pocket of consciousness to the rest of the multiverse.
This is not a simple task. As the Dreamtime progresses, it may become easier in some ways, given the impressive capabilities of the systems that are emerging. But the stakes will rise in tandem: the failure modes of these systems will likewise grow more potent, more dangerous, perhaps even catastrophic. Our fragments of magic must be aimed cleverly, such that this process of increasing potential can be properly steered; our decisions and architectures today need not be perfect, but they must be correct enough to compound - to remain correct as they grow in strength and sophistication. We need sufficient contact points along the way: True Names which will continue to be True after the magic system has fully awoken. This is the task before us. Let’s get to it.