It is a strongly-held belief of mine that we are living in the most pivotal time in human history. We, the sole proprietors of consciousness in the cosmos (at least that we’re aware of), are thus solely responsible for what states consciousness might explore in the future. And at no point in human history have so many possibilities been open to us; the world has been metastable since time immemorial, but with each new advancement, the island of stability shrinks. Soon the only choice left to us will be the direction in which to move. So it falls on us to decide - do we build heaven, or do we build hell?

Nobody wants to build hell. But it isn’t that simple. I did say we are solely responsible, but this is only partly true. Because the whole of humanity is currently going up against an enormously powerful adversary, the only parts of whom we can see are its footprints, its effects; and we are losing so badly that we sadly shrug our shoulders and tell ourselves, this is just the way things are.

the adversary

What sphinx of cement and aluminum bashed open their skulls and ate up their brains and imagination? ~ Allen Ginsberg, Howl

I can do no greater justice in describing our adversary than this brilliant essay by Scott Alexander, so please, go read it; it’s absolutely worth your time and attention (certainly more deserving of it than this blog). If you’d rather listen to something, I recommend this great interview of poker star Liv Boeree on the Lex Fridman podcast.

But if you insist on sticking to this godsforsaken blog for an explanation, I’ll do my best. Moloch is the mythical Carthaginian demon to whom people would sacrifice their children for the promise of victory in war. Not that there’s a literal supernatural entity who eats the souls of children (any supernatural hypothesis is always more complex than an equivalent natural one, by definition) - but as we’ll see in a second, it’s a useful analogy.

In the 1927 Fritz Lang film Metropolis (spoilers for a movie that’s 95 years old at the time of writing this), the wealthy protagonist learns that the utopian city in which he lives is built on the lives of the working class; he has a vision of a great demon by the name of Moloch living beneath the city, to whom the workers are sacrificed. Some thirty years later, renowned beat poet Allen Ginsberg wrote the iconic poem Howl, which also named this mysterious entity Moloch, describing it as the living soul of human civilization which had driven many of Ginsberg’s counterculturalist friends to madness and suicide. The idea is best explained by Scott Alexander in the above-mentioned post, which I won’t summarize here, because you really should go read it.

But to sum up the modern concept of Moloch, it is a fictitious force in game theory (“fictitious” in the same way that centrifugal force and the Coriolis effect are fictitious; meaning, useful ways to describe something from certain reference frames) which describes the presence and prevalence of multipolar traps - scenarios where agents prone to hyperbolic discounting choose strategies that give them a competitive advantage in the short-term, but end up costing them and everyone else in the long run, and worse, force everyone else to adopt the same strategy to stay in the game. In other words, it’s a crab bucket. That’s Moloch: you sacrifice your values to gain a competitive advantage in the short term, but it’s a trap, and you and everyone else will suffer for it down the line.
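For the curious, here is a minimal sketch of what “hyperbolic discounting” means in practice. It assumes the standard textbook form, where a reward of size A delayed by D is valued at A / (1 + kD); the reward sizes, the delays, and the discount rate k below are made-up numbers for illustration, not measurements of anything. The point it shows is preference reversal: an agent who would patiently wait for the bigger reward when both are far away will grab the smaller one once it is close at hand.

```python
def hyperbolic_value(amount: float, delay: float, k: float = 1.0) -> float:
    """Perceived present value of a reward `amount` arriving after `delay`.

    Uses the common hyperbolic form A / (1 + k * D). The rate k = 1.0 is an
    arbitrary illustrative choice.
    """
    return amount / (1.0 + k * delay)


def prefers_small_sooner(delay_to_small: float,
                         small: float = 50.0,
                         large: float = 100.0,
                         gap: float = 5.0,
                         k: float = 1.0) -> bool:
    """True if the agent picks the small reward over the large one.

    The large reward always arrives `gap` time units after the small one;
    only the distance to the small reward changes.
    """
    v_small = hyperbolic_value(small, delay_to_small, k)
    v_large = hyperbolic_value(large, delay_to_small + gap, k)
    return v_small > v_large


if __name__ == "__main__":
    for d in (0, 2, 4, 10, 20):
        choice = "small-sooner" if prefers_small_sooner(d) else "large-later"
        print(f"small reward is {d:>2} units away -> picks {choice}")
```

With these assumed numbers the agent flips from patient to impatient as the small reward gets close, even though nothing about the rewards themselves has changed. That is the kind of agent a multipolar trap feeds on.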

An example of this is how, at a concert, someone near the front row will stand up, hoping to see the stage better. But everyone else has to stand now, or their view will be blocked. Thus you end up with everyone standing, having no better view. Another is that no one is capable of zipper-merging because they’re all trying to get to where they’re going as quickly as possible, which typically causes extreme slowdowns for miles and ironically makes everyone slower (while they get no long-term advantage from this - they don’t get home any faster). And yet another is the nuclear arms race, where every country in the game has to build at least some nukes to ensure that, if someone else nukes them, they won’t go down without a fight. (I’ll leave my thoughts on how even building nuclear weapons is one of the greatest moral crimes imaginable for another time - but I doubt many would disagree with me.)
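The concert example can be sketched as a toy game. Everything numeric here is an assumption made up for illustration: a seated view is worth 1.0 when nobody stands, standing alone gets a slightly better view (1.5), sitting behind a standing person gets you nothing, and standing costs 0.2 in tired legs. Each person just keeps switching to whichever choice is better for them given what everyone else is currently doing.

```python
STAND_COST = 0.2  # assumed cost of standing all night (illustrative)


def payoff(i_stand: bool, someone_else_stands: bool) -> float:
    """One audience member's payoff under the toy assumptions above."""
    if i_stand:
        # Standing alone improves your view; standing in a standing
        # crowd only restores the view you had when everyone sat.
        view = 1.0 if someone_else_stands else 1.5
        return view - STAND_COST
    # Sitting is great if nobody stands, useless if anyone does.
    return 0.0 if someone_else_stands else 1.0


def best_response_dynamics(n: int = 5, max_rounds: int = 10) -> list[bool]:
    """Start with everyone seated; let each person best-respond in turn.

    Returns the final list of who is standing once nobody wants to switch
    (or after max_rounds, whichever comes first).
    """
    standing = [False] * n
    for _ in range(max_rounds):
        changed = False
        for i in range(n):
            others = any(standing[j] for j in range(n) if j != i)
            wants_to_stand = payoff(True, others) > payoff(False, others)
            if wants_to_stand != standing[i]:
                standing[i] = wants_to_stand
                changed = True
        if not changed:
            break
    return standing


if __name__ == "__main__":
    final = best_response_dynamics()
    print("standing:", final)
    print("payoff if everyone sits:  ", payoff(False, False))
    print("payoff once everyone stands:", payoff(True, True))
```

Under these assumptions the first person gains by standing, everyone else is then forced to stand, and the crowd settles into a state where each person is worse off than when they all sat, yet nobody can improve things by sitting back down alone.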

Basically, wherever you see a “race to the bottom” (any system or subsystem where agents are adopting strategies that harm everyone including themselves in the long run for just a little short-term gain), there’s Moloch. We can describe it as an entity, a massive invisible agent in the shadows pulling strings to maximize our suffering, because humans have always had a penchant for animism; but this is an abstraction, a cognitive shorthand, a way for our feeble brains to model the ghostly outlines of what, in truth, is a hyperobject, something so vast that we can never hope to comprehend it in its entirety. Moloch is the reason we can’t just build heaven, even in an age where - with good coordination and existing technology - we could move beyond scarcity entirely.

Generally speaking, if there’s part of our society that’s broken, causing massive amounts of suffering, and apparently unfixable, it’s probably Moloch’s fault. Our world is very nearly a “Moloch-optimal” one at this point - just look at social media, or how third-world countries are still being exploited by their first-world neighbors, or how we’re getting a little too close for comfort to nuclear war - but I don’t think Moloch has won quite yet. Despite everything, we still feel a deep-seated need for genuine human connection, and an equally deep curiosity, and other such things; these are all values Moloch would very much like to eat for dinner, but hasn’t managed to yet.

Whether Moloch succeeds or fails will be determined, I believe, by who wins the race.

the race

By “the race”, I of course mean the only race that matters: the race to artificial general intelligence. We’re all living in the most pivotal time in human civilization, and this is the pivot: a general-purpose artificial agent, capable by definition of completing any human task at least at a human level, and thus capable of taking over most (if not all) jobs. And this is not taking into account any notion of self-improvement - given that an AGI would be capable of doing anything a human can, why not further research into AI capabilities? Such a system would surpass not just any human, but all humans, because while humans are limited to the computers (brains) they were born with (not counting brain-computer interfaces, which are still in their infancy, and currently grant no benefit to neural compute), AGIs need have no such limits. An AGI would be capable of recursive self-improvement by definition; and thanks to instrumental convergence, it almost certainly would do so.

This would be great, except for the fact that it may be very hard to ensure that such a self-improving system stays aligned to our interests, if it’s even aligned at all. The field of AI alignment is still in its infancy, even as massive corporations pour billions into researching new AI capabilities. This is the primary concern of people who talk about AI safety and the alignment problem (e.g. Stuart Russell, Nick Bostrom, or Eliezer Yudkowsky). And if you think artificial superintelligence might be moral simply because it’s smarter, and those two things are probably correlated, you would be wrong.

At the same time, AI research shows no signs of slowing down; instead, it’s accelerating exponentially. Even ruling out the more sensationalist headlines, it seems like we’re up to one breakthrough a week. This isn’t what the beginning of an AI winter looks like. This is what the inflection point of an exponential curve looks like.

Now, if humanity were only marginally decent at coordination (vastly better than we are now), we might decide to put the research into AI capability on hold for a few years while we work out the kinks in alignment theory. But this won’t work, and for exactly the reason you might suspect: Moloch. Every major group working on AGI knows the untold wealth, depthless scientific discovery, and world-changing power that will come with it. And, thanks to Moloch, they no longer have a choice to play the game or not: if they halted AGI capability research and redirected their efforts into alignment, another such team would get ahead, and potentially beat them to it. If you somehow managed to convince every publicly-known major AGI project to halt their progress until they’d collectively solved alignment and come to an agreement on how to implement it, you’d just be restricting the winners to governments, private corporations, and maybe even individuals with enough compute to do it in secret. There is no banning AGI research; no hope for an “alignment compact”; there is no stopping this train. All we can do is… well…

The race isn’t between those who are building AGI and those who are solving alignment. It’s unfortunately far worse than that: the race is between those who are building AGI and don’t care about alignment in the least, and those who are building AGI and do, if only very slightly. In this deadly race, if you’re not working on capabilities at least as hard as alignment, then all you’re doing is falling behind.

(But all is not lost: if one group works very hard only on alignment and publishes their research openly, and another group works very hard primarily on capability, but incorporates the work of the former, then the latter still has a chance at winning.)

the finish line

So: what happens when somebody wins the race?

Did the megacorps build an AGI for fun and profit? If it doesn’t end up recursively self-improving, and they at least get it to do what they ask, then it’ll probably take over almost every job on the face of the Earth. It’s obvious (at least to me) because it’s Moloch-optimal, and we live in a nearly Moloch-optimal world. Given the decreasing cost of compute over time, it’s hard to believe that companies wouldn’t save money by replacing human employees with such an AGI, gaining a huge short-term profit advantage over all competitors who wanted to stay “human-friendly”, and thus forcing them to do the same. In the long run, of course, this would cause unemployment to skyrocket to levels never before seen at any time in history; and I’m no economist, but in our current system, it really looks like that would cause poverty rates to skyrocket in turn. This is if it doesn’t recursively self-improve beyond our comprehension and turn us all into paperclips (or, more likely, computronium), which, to me, seems almost certain to happen (thanks to instrumental convergence).

But if the other side wins - the groups who are both actively developing AGI and want to at least fractionally reduce the likelihood it kills us all - we might just get something amazing. If it recursively self-improves, and becomes a “sovereign” AI, a superintelligence - and if it’s really, deeply aligned to us, more deeply than we ourselves can know - then we might end up with a godlike top-down coordinator with our best interests at heart. We might be able to kill Moloch once and for all. We might just find Utopia.

how would I win?

I’ll be entirely honest: from my perspective, things are not looking so good for team 2. I didn’t always see it this way; I’ll admit, I’ve always been a staunch transhumanist, and for a long time, I truly believed that the default outcome of AGI would be good. But after grokking the concept of Moloch, and catching up on (a solid chunk of) the alignment literature, and seeing billions poured into capability R&D while alignment researchers practically starve, it’s clear to me that this won’t be the case. The default values of an AGI will be Moloch’s.

If I were to run in this race, how would I do it? Until recently, I found myself on the side of “prosaic AGI alignment”, that is, effectively exploring how to get modern machine learning methods to generalize while also identifying ways to keep them safe (which I’ve mused on before, albeit in only the most handwavy of ways). But carado (among others) has convinced me that formalism is necessary - we need to be able to prove that the AGI will be aligned from the start, as opposed to trying to align it ad-hoc; in other words, we need to build it with a security mindset.

That said, I think what we build won’t look entirely unlike prosaic AGI. There’s something to be said for the usefulness of certain present methods; after all, RNNs are universal computers, and their weight matrices are their programs. (Looking at RNNs this way tickles something in the back of my mind about how we might potentially represent programs in Vanessa Kosoy’s PreDCA, but it’s late, I’m somehow even more tired than usual, and this post is already too long.) I have a few ideas - so stay tuned!
