Superintelligence
SUPERINTELLIGENCE
Paths, Dangers, Strategies

Director, Future of Humanity Institute
Professor, Faculty of Philosophy & Oxford Martin School
University of Oxford
1
1
Great Clarendon Street, Oxford, OX2 6DP,
United Kingdom
Oxford University Press is a department of the University of Oxford.
It furthers the University’s objective of excellence in research, scholarship,
and education by publishing worldwide. Oxford is a registered trade mark of
Oxford University Press in the UK and in certain other countries
© Nick Bostrom 2014
e moral rights of the author have been asserted
First Edition published in 2014
Impression: 1
All rights reserved. No part of this publication may be reproduced, stored in
a retrieval system, or transmitted, in any form or by any means, without the
prior permission in writing of Oxford University Press, or as expressly permitted
by law, by licence or under terms agreed with the appropriate reprographics
rights organization. Enquiries concerning reproduction outside the scope of the
above should be sent to the Rights Department, Oxford University Press, at the
address above
You must not circulate this work in any other form
and you must impose this same condition on any acquirer
British Library Cataloguing in Publication Data
Data available
Library of Congress Control Number: 2013955152
ISBN 978–0–19–967811–2
Printed in Italy by
L.E.G.O. S.p.A.—Lavis TN
Links to third party websites are provided by Oxford in good faith and
for information only. Oxford disclaims any responsibility for the materials
contained in any third party website referenced in this work.
The Unfinished Fable of the Sparrows
I
t was the nest-building season, but aer days of long hard work, the sparrows
sat in the evening glow, relaxing and chirping away.
“We are all so small and weak. Imagine how easy life would be if we had an
owl who could help us build our nests!”
“Yes!” said another. “And we could use it to look aer our elderly and our young.
“It could give us advice and keep an eye out for the neighborhood cat,” added
a third.
en Pastus, the elder-bird, spoke: “Let us send out scouts in all directions and
try to nd an abandoned owlet somewhere, or maybe an egg. A crow chick might
also do, or a baby weasel. is could be the best thing that ever happened to us, at
least since the opening of the Pavilion of Unlimited Grain in yonder backyard.
e ock was exhilarated, and sparrows everywhere started chirping at the top
of their lungs.
Only Scronknkle, a one-eyed sparrow with a fretful temperament, was uncon-
vinced of the wisdom of the endeavor. Quoth he: “is will surely be our undoing.
Should we not give some thought to the art of owl-domestication and owl-taming
rst, before we bring such a creature into our midst?”
Replied Pastus: “Taming an owl sounds like an exceedingly dicult thing to do.
It will be dicult enough to nd an owl egg. So let us start there. Aer we have suc-
ceeded in raising an owl, then we can think about taking on this other challenge.
“ere is a aw in that plan!” squeaked Scronknkle; but his protests were in
vain as the ock had already lied o to start implementing the directives set out
by Pastus.
Just two or three sparrows remained behind. Together they began to try to work
out how owls might be tamed or domesticated. ey soon realized that Pastus had
been right: this was an exceedingly dicult challenge, especially in the absence
of an actual owl to practice on. Nevertheless they pressed on as best they could,
constantly fearing that the ock might return with an owl egg before a solution to
the control problem had been found.
It is not known how the story ends, but the author dedicates this book to
Scronknkle and his followers.
PREFACE
|
vii
PREFACE
I
nside your cranium is the thing that does the reading. is thing, the human
brain, has some capabilities that the brains of other animals lack. It is to these
distinctive capabilities that we owe our dominant position on the planet.
Other animals have stronger muscles and sharper claws, but we have cleverer
brains. Our modest advantage in general intelligence has led us to develop lan-
guage, technology, and complex social organization. e advantage has com-
pounded over time, as each generation has built on the achievements of its
predecessors.
If some day we build machine brains that surpass human brains in general
intelligence, then this new superintelligence could become very powerful. And,
as the fate of the gorillas now depends more on us humans than on the gorillas
themselves, so the fate of our species would depend on the actions of the machine
superintelligence.
We do have one advantage: we get to build the stu. In principle, we could build
a kind of superintelligence that would protect human values. We would certainly
have strong reason to do so. In practice, the control problem—the problem of
how to control what the superintelligence would do—looks quite dicult. It also
looks like we will only get one chance. Once unfriendly superintelligence exists,
it would prevent us from replacing it or changing its preferences. Our fate would
be sealed.
In this book, I try to understand the challenge presented by the prospect of
superintelligence, and how we might best respond. is is quite possibly the most
important and most daunting challenge humanity has ever faced. And—whether
we succeed or fail—it is probably the last challenge we will ever face.
It is no part of the argument in this book that we are on the threshold of a big
breakthrough in articial intelligence, or that we can predict with any precision
when such a development might occur. It seems somewhat likely that it will hap-
pen sometime in this century, but we dont know for sure. e rst couple of chap-
ters do discuss possible pathways and say something about the question of timing.
e bulk of the book, however, is about what happens aer. We study the kinetics
of an intelligence explosion, the forms and powers of superintelligence, and the
strategic choices available to a superintelligent agent that attains a decisive advan-
tage. We then shi our focus to the control problem and ask what we could do to
viii
|
PREFACE
shape the initial conditions so as to achieve a survivable and benecial outcome.
Toward the end of the book, we zoom out and contemplate the larger picture that
emerges from our investigations. Some suggestions are oered on what ought to
be done now to increase our chances of avoiding an existential catastrophe later.
is has not been an easy book to write. I hope the path that has been cleared
will enable other investigators to reach the new frontier more swily and con-
veniently, so that they can arrive there fresh and ready to join the work to further
expand the reach of our comprehension. (And if the way that has been made is
a little bumpy and bendy, I hope that reviewers, in judging the result, will not
underestimate the hostility of the terrain ex ante!)
is has not been an easy book to write: I have tried to make it an easy book
to read, but I don’t think I have quite succeeded. When writing, I had in mind
as the target audience an earlier time-slice of myself, and I tried to produce a
kind of book that I would have enjoyed reading. is could prove a narrow demo-
graphic. Nevertheless, I think that the content should be accessible to many peo-
ple, if they put some thought into it and resist the temptation to instantaneously
misunderstand each new idea by assimilating it with the most similar-sounding
cliché available in their cultural larders. Non-technical readers should not be dis-
couraged by the occasional bit of mathematics or specialized vocabulary, for it
is always possible to glean the main point from the surrounding explanations.
(Conversely, for those readers who want more of the nitty-gritty, there is quite a
lot to be found among the endnotes.
1
)
Many of the points made in this book are probably wrong.
2
It is also likely that
there are considerations of critical importance that I fail to take into account,
thereby invalidating some or all of my conclusions. I have gone to some length to
indicate nuances and degrees of uncertainty throughout the text—encumbering
it with an unsightly smudge of “possibly,” “might,” “may,” “could well,” “it seems,
“probably,” “very likely,” “almost certainly.” Each qualier has been placed where
it is carefully and deliberately. Yet these topical applications of epistemic mod-
esty are not enough; they must be supplemented here by a systemic admission of
uncertainty and fallibility. is is not false modesty: for while I believe that my
book is likely to be seriously wrong and misleading, I think that the alternative
views that have been presented in the literature are substantially worse—includ-
ing the default view, or “null hypothesis,” according to which we can for the time
being safely or reasonably ignore the prospect of superintelligence.
ACKNOWLEDGMENTS
|
ix
ACKNOWLEDGMENTS
T
he membrane that has surrounded the writing process has been fairly per-
meable. Many concepts and ideas generated while working on the book
have been allowed to seep out and have become part of a wider conversa-
tion; and, of course, numerous insights originating from the outside while the
book was underway have been incorporated into the text. I have tried to be some-
what diligent with the citation apparatus, but the inuences are too many to fully
document.
For extensive discussions that have helped clarify my thinking I am grateful to
a large set of people, including Ross Andersen, Stuart Armstrong, Owen Cotton-
Barratt, Nick Beckstead, David Chalmers, Paul Christiano, Milan Ćirković, Daniel
Dennett, David Deutsch, Daniel Dewey, Eric Drexler, Peter Eckersley, Amnon Eden,
Owain Evans, Benja Fallenstein, Alex Flint, Carl Frey, Ian Goldin, Katja Grace,
J. Storrs Hall, Robin Hanson, Demis Hassabis, James Hughes, Marcus Hutter,
Garry Kasparov, Marcin Kulczycki, Shane Legg, Moshe Looks, Willam MacAskill,
Eric Mandelbaum, James Martin, Lillian Martin, Roko Mijic, Vincent Mueller, Elon
Musk, Seán Ó hÉigeartaigh, Toby Ord, Dennis Pamlin, Derek Part, David Pearce,
Huw Price, Martin Rees, Bill Roscoe, Stuart Russell, Anna Salamon, Lou Salkind,
Anders Sandberg, Julian Savulescu, Jürgen Schmidhuber, Nicholas Shackel, Murray
Shanahan, Noel Sharkey, Carl Shulman, Peter Singer, Dan Stoicescu, Jaan Tallinn,
Alexander Tamas, Max Tegmark, Roman Yampolskiy, and Eliezer Yudkowsky.
For especially detailed comments, I am grateful to Milan Ćirković, Daniel
Dewey, Owain Evans, Nick Hay, Keith Manseld, Luke Muehlhauser, Toby Ord,
Jess Riedel, Anders Sandberg, Murray Shanahan, and Carl Shulman. For advice
or research help with dierent parts I want to thank Stuart Armstrong, Daniel
Dewey, Eric Drexler, Alexandre Erler, Rebecca Roache, and Anders Sandberg.
For help with preparing the manuscript, I am thankful to Caleb Bell, Malo
Bourgon, Robin Brandt, Lance Bush, Cathy Douglass, Alexandre Erler, Kristian
Rönn, Susan Rogers, Andrew Snyder-Beattie, Cecilia Tilli, and Alex Vermeer.
Iwant particularly to thank my editor Keith Manseld for his plentiful encour-
agement throughout the project.
My apologies to everybody else who ought to have been remembered here.
Finally, a most fond thank you to funders, friends, and family: without your
backing, this work would not have been done.
CONTENTS
|
xi
CONTENTS
Lists of Figures, Tables, and Boxes xv
1. Past developments and present capabilities 1
Growth modes and big history 1
Great expectations 3
Seasons of hope and despair 5
State of the art 11
Opinions about the future of machine intelligence 18
2. Paths to superintelligence 22
Articial intelligence 23
Whole brain emulation 30
Biological cognition 36
Brain–computer interfaces 44
Networks and organizations 48
Summary 50
3. Forms of superintelligence 52
Speed superintelligence 53
Collective superintelligence 54
Quality superintelligence 56
Direct and indirect reach 58
Sources of advantage for digital intelligence 59
4. e kinetics of an intelligence explosion 62
Timing and speed of the takeo 62
Recalcitrance 66
Non-machine intelligence paths 66
Emulation and AI paths 68
Optimization power and explosivity 73
xii
|
CONTENTS
5. Decisive strategic advantage 78
Will the frontrunner get a decisive strategic advantage? 79
How large will the successful project be? 83
Monitoring 84
International collaboration 86
From decisive strategic advantage to singleton 87
6. Cognitive superpowers 91
Functionalities and superpowers 92
An AI takeover scenario 95
Power over nature and agents 99
7. e superintelligent will 105
e relation between intelligence and motivation 105
Instrumental convergence 109
Self-preservation 109
Goal-content integrity 109
Cognitive enhancement 111
Technological perfection 112
Resource acquisition 113
8. Is the default outcome doom? 115
Existential catastrophe as the default outcome of an intelligence
explosion? 115
e treacherous turn 116
Malignant failure modes 119
Perverse instantiation 120
Infrastructure profusion 122
Mind crime 125
9. e control problem 127
Two agency problems 127
Capability control methods 129
Boxing methods 129
Incentive methods 131
Stunting 135
Tripwires 137
Motivation selection methods 138
Direct specication 139
Domesticity 140
Indirect normativity 141
Augmentation 142
Synopsis 143
CONTENTS
|
xiii
10. Oracles, genies, sovereigns, tools 145
Oracles 145
Genies and sovereigns 148
Tool-AIs 151
Comparison 155
11. Multipolar scenarios 159
Of horses and men 160
Wages and unemployment 160
Capital and welfare 161
e Malthusian principle in a historical perspective 163
Population growth and investment 164
Life in an algorithmic economy 166
Voluntary slavery, casual death 167
Would maximally ecient work be fun? 169
Unconscious outsourcers? 172
Evolution is not necessarily up 173
Post-transition formation of a singleton? 176
A second transition 177
Superorganisms and scale economies 178
Unication by treaty 180
12. Acquiring values 185
e value-loading problem 185
Evolutionary selection 187
Reinforcement learning 188
Associative value accretion 189
Motivational scaolding 191
Value learning 192
Emulation modulation 201
Institution design 202
Synopsis 207
13. Choosing the criteria for choosing 209
e need for indirect normativity 209
Coherent extrapolated volition 211
Some explications 212
Rationales for CEV 213
Further remarks 216
Morality models 217
Do What I Mean 220
Component list 221
Goal content 222
xiv
|
CONTENTS
Decision theory 223
Epistemology 224
Ratication 225
Getting close enough 227
14. e strategic picture 228
Science and technology strategy 228
Dierential technological development 229
Preferred order of arrival 230
Rates of change and cognitive enhancement 233
Technology couplings 236
Second-guessing 238
Pathways and enablers 240
Eects of hardware progress 240
Should whole brain emulation research be promoted? 242
e person-aecting perspective favors speed 245
Collaboration 246
e race dynamic and its perils 246
On the benets of collaboration 249
Working together 253
15. Crunch time 255
Philosophy with a deadline 255
What is to be done? 256
Seeking the strategic light 257
Building good capacity 258
Particular measures 258
Will the best in human nature please stand up 259
Notes 261
Bibliography 305
Index 325
LISTS OF FIGURES, TABLES, AND BOXES
|
xv
LISTS OF FIGURES, TABLES,
AND BOXES
List of Figures
1. Long-term history of world GDP. 3
2. Overall long-term impact of HLMI. 21
3. Supercomputer performance. 27
4. Reconstructing 3D neuroanatomy from electron microscope images. 31
5. Whole brain emulation roadmap. 34
6. Composite faces as a metaphor for spell-checked genomes. 41
7. Shape of the takeo. 63
8. A less anthropomorphic scale? 70
9. One simple model of an intelligence explosion. 77
10. Phases in an AI takeover scenario. 96
11. Schematic illustration of some possible trajectories for a hypothetical
wise singleton. 101
12. Results of anthropomorphizing alien motivation. 106
13. Articial intelligence or whole brain emulation rst? 243
14. Risk levels in AI technology races. 247
List of Tables
1. Game-playing AI 12
2. When will human-level machine intelligence be attained? 19
3. How long from human level to superintelligence? 20
4. Capabilities needed for whole brain emulation 32
5. Maximum IQ gains from selecting among a set of embryos 37
6. Possible impacts from genetic selection in dierent scenarios 40
7. Some strategically signicant technology races 81
8. Superpowers: some strategically relevant tasks and corresponding
skill sets 94
9. Dierent kinds of tripwires 137
10. Control methods 143
11. Features of dierent system castes 156
12. Summary of value-loading techniques 207
13. Component list 222
List of Boxes
1. An optimal Bayesian agent 10
2. e 2010 Flash Crash 17
3. What would it take to recapitulate evolution? 25
4. On the kinetics of an intelligence explosion 75
5. Technology races: some historical examples 80
6. e mail-ordered DNA scenario 98
7. How big is the cosmic endowment? 101
8. Anthropic capture 134
9. Strange solutions from blind search 154
10. Formalizing value learning 194
11. An AI that wants to be friendly 197
12. Two recent (half-baked) ideas 198
13. A risk-race to the bottom 247
xvi
|
LISTS OF FIGURES, TABLES, AND BOXES
GROWTH MODES AND BIG HISTORY
|
CHAPTER 1
Past developments
andpresent capabilities
W
e begin by looking back. History, at the largest scale, seems to
exhibit a sequence of distinct growth modes, each much more
rapid than its predecessor. is pattern has been taken to suggest
that another (even faster) growth mode might be possible. However, we do not
place much weight on this observation—this is not a book about “technological
acceleration” or “exponential growth” or the miscellaneous notions sometimes
gathered under the rubric of “the singularity.” Next, we review the history of
articial intelligence. We then survey the eld’s current capabilities. Finally, we
glance at some recent expert opinion surveys, and contemplate our ignorance
about the timeline of future advances.
Growth modes and big history
A mere few million years ago our ancestors were still swinging from the branches
in the African canopy. On a geological or even evolutionary timescale, the rise
of Homo sapiens from our last common ancestor with the great apes happened
swily. We developed upright posture, opposable thumbs, and—crucially—some
relatively minor changes in brain size and neurological organization that led to
a great leap in cognitive ability. As a consequence, humans can think abstractly,
communicate complex thoughts, and culturally accumulate information over the
generations far better than any other species on the planet.
ese capabilities let humans develop increasingly ecient productive technol-
ogies, making it possible for our ancestors to migrate far away from the rainforest
and the savanna. Especially aer the adoption of agriculture, population densi-
ties rose along with the total size of the human population. More people meant
more ideas; greater densities meant that ideas could spread more readily and that
some individuals could devote themselves to developing specialized skills. ese
2
|
PAST DEVELOPMENTS ANDPRESENT CAPABILITIES
developments increased the rate of growth of economic productivity and techno-
logical capacity. Later developments, related to the Industrial Revolution, brought
about a second, comparable step change in the rate of growth.
Such changes in the rate of growth have important consequences. A few hun-
dred thousand years ago, in early human (or hominid) prehistory, growth was so
slow that it took on the order of one million years for human productive capacity
to increase suciently to sustain an additional one million individuals living at
subsistence level. By 5000 , following the Agricultural Revolution, the rate of
growth had increased to the point where the same amount of growth took just two
centuries. Today, following the Industrial Revolution, the world economy grows
on average by that amount every ninety minutes.
1
Even the present rate of growth will produce impressive results if maintained
for a moderately long time. If the world economy continues to grow at the same
pace as it has over the past y years, then the world will be some 4.8 times richer
by 2050 and about 34 times richer by 2100 than it is today.
2
Yet the prospect of continuing on a steady exponential growth path pales in
comparison to what would happen if the world were to experience another step
change in the rate of growth comparable in magnitude to those associated with
the Agricultural Revolution and the Industrial Revolution. e economist Robin
Hanson estimates, based on historical economic and population data, a char-
acteristic world economy doubling time for Pleistocene hunter–gatherer soci-
ety of 224,000 years; for farming society, 909 years; and for industrial society,
6.3years.
3
(In Hanson’s model, the present epoch is a mixture of the farming and
the industrial growth modes—the world economy as a whole is not yet growing at
the 6.3-year doubling rate.) If another such transition to a dierent growth mode
were to occur, and it were of similar magnitude to the previous two, it would result
in a new growth regime in which the world economy would double in size about
every two weeks.
Such a growth rate seems fantastic by current lights. Observers in earlier
epochs might have found it equally preposterous to suppose that the world econ-
omy would one day be doubling several times within a single lifespan. Yet that is
the extraordinary condition we now take to be ordinary.
e idea of a coming technological singularity has by now been widely popu-
larized, starting with Vernor Vinge’s seminal essay and continuing with the writ-
ings of Ray Kurzweil and others.
4
e term “singularity,” however, has been used
confusedly in many disparate senses and has accreted an unholy (yet almost mil-
lenarian) aura of techno-utopian connotations.
5
Since most of these meanings
and connotations are irrelevant to our argument, we can gain clarity by dispens-
ing with the “singularity” word in favor of more precise terminology.
e singularity-related idea that interests us here is the possibility of an intel-
ligence explosion, particularly the prospect of machine superintelligence. ere
may be those who are persuaded by growth diagrams like the ones in Figure 1
that another drastic change in growth mode is in the cards, comparable to the
Agricultural or Industrial Revolution. ese folk may then reect that it is hard
GR EAT EXPECTATI ONS
|
3
to conceive of a scenario in which the world economys doubling time shortens
to mere weeks that does not involve the creation of minds that are much faster
and more ecient than the familiar biological kind. However, the case for tak-
ing seriously the prospect of a machine intelligence revolution need not rely on
curve-tting exercises or extrapolations from past economic growth. As we shall
see, there are stronger reasons for taking heed.
Great expectations
Machines matching humans in general intelligence—that is, possessing com-
mon sense and an eective ability to learn, reason, and plan to meet complex
information-processing challenges across a wide range of natural and abstract
domains—have been expected since the invention of computers in the 1940s. At
that time, the advent of such machines was oen placed some twenty years into
(a)
(b)
0
5
10
15
20
25
30
35
40
45
1700 1750 1800 1850 1900 1950 2000
World GDP in trillions (2012 Int$)
Year
0
5
10
15
20
25
30
35
40
45
20000–2000–4000–6000–8000
World GDP in trillions (2012 Int$)
Year
Figure  Long-term history of world GDP. Plotted on a linear scale, the history of the world
economy looks like a flat line hugging the x-axis, until it suddenly spikes vertically upward. (a)Even
when we zoom in on the most recent 0,000 years, the pattern remains essentially one of a single
90° angle. (b) Only within the past 00 years or so does the curve lift perceptibly above the zero-
level. (The dierent lines in the plot correspond to dierent data sets, which yield slightly dierent
estimates.
6
)
4
|
PAST DEVELOPMENTS ANDPRESENT CAPABILITIES
the future.
7
Since then, the expected arrival date has been receding at a rate of one
year per year; so that today, futurists who concern themselves with the possibility
of articial general intelligence still oen believe that intelligent machines are a
couple of decades away.
8
Two decades is a sweet spot for prognosticators of radical change: near enough
to be attention-grabbing and relevant, yet far enough to make it possible to sup-
pose that a string of breakthroughs, currently only vaguely imaginable, might
by then have occurred. Contrast this with shorter timescales: most technologies
that will have a big impact on the world in ve or ten years from now are already
in limited use, while technologies that will reshape the world in less than een
years probably exist as laboratory prototypes. Twenty years may also be close to
the typical duration remaining of a forecaster’s career, bounding the reputational
risk of a bold prediction.
From the fact that some individuals have overpredicted articial intelligence in
the past, however, it does not follow that AI is impossible or will never be devel-
oped.
9
e main reason why progress has been slower than expected is that the
technical diculties of constructing intelligent machines have proved greater
than the pioneers foresaw. But this leaves open just how great those diculties
are and how far we now are from overcoming them. Sometimes a problem that
initially looks hopelessly complicated turns out to have a surprisingly simple
solution (though the reverse is probably more common).
In the next chapter, we will look at dierent paths that may lead to human-level
machine intelligence. But let us note at the outset that however many stops there
are between here and human-level machine intelligence, the latter is not the nal
destination. e next stop, just a short distance farther along the tracks, is super-
human-level machine intelligence. e train might not pause or even decelerate
at Humanville Station. It is likely to swoosh right by.
e mathematician I. J. Good, who had served as chief statistician in Alan
Turing’s code-breaking team in World War II, might have been the rst to enun-
ciate the essential aspects of this scenario. In an o-quoted passage from 1965, he
wrote:
Let an ultraintelligent machine be defined as a machine that can far surpass all the intel-
lectual activities of any man however clever. Since the design of machines is one of these
intellectual activities, an ultraintelligent machine could design even better machines; there
would then unquestionably be an “intelligence explosion,” and the intelligence of man
would be left far behind. Thus the first ultraintelligent machine is the last invention that
man need ever make, provided that the machine is docile enough to tell us how to keep it
under control.
0
It may seem obvious now that major existential risks would be associated with
such an intelligence explosion, and that the prospect should therefore be exam-
ined with the utmost seriousness even if it were known (which it is not) to have but
a moderately small probability of coming to pass. e pioneers of articial intel-
ligence, however, notwithstanding their belief in the imminence of human-level
SEASONS OF HOPE AND DESPAIR
|
5
AI, mostly did not contemplate the possibility of greater-than-human AI. It is as
though their speculation muscle had so exhausted itself in conceiving the radical
possibility of machines reaching human intelligence that it could not grasp the
corollary—that machines would subsequently become superintelligent.
e AI pioneers for the most part did not countenance the possibility that
their enterprise might involve risk.
11
ey gave no lip service—let alone seri-
ous thought—to any safety concern or ethical qualm related to the creation of
articial minds and potential computer overlords: a lacuna that astonishes even
against the background of the era’s not-so-impressive standards of critical tech-
nology assessment.
12
We must hope that by the time the enterprise eventually does
become feasible, we will have gained not only the technological prociency to
set o an intelligence explosion but also the higher level of mastery that may be
necessary to make the detonation survivable.
But before we turn to what lies ahead, it will be useful to take a quick glance at
the history of machine intelligence to date.
Seasons of hope and despair
In the summer of 1956 at Dartmouth College, ten scientists sharing an inter-
est in neural nets, automata theory, and the study of intelligence convened for
a six-week workshop. is Dartmouth Summer Project is oen regarded as the
cockcrow of articial intelligence as a eld of research. Many of the participants
would later be recognized as founding gures. e optimistic outlook among the
delegates is reected in the proposal submitted to the Rockefeller Foundation,
which provided funding for the event:
We propose that a 2 month, 0 man study of artificial intelligence be carried out. . . .
The study is to proceed on the basis of the conjecture that every aspect of learning or
any other feature of intelligence can in principle be so precisely described that a machine
can be made to simulate it. An attempt will be made to find how to make machines that
use language, form abstractions and concepts, solve kinds of problems now reserved for
humans, and improve themselves. We think that a significant advance can be made in one
or more of these problems if a carefully selected group of scientists work on it together
for a summer.
In the six decades since this brash beginning, the eld of articial intelligence has
been through periods of hype and high expectations alternating with periods of
setback and disappointment.
e rst period of excitement, which began with the Dartmouth meeting, was
later described by John McCarthy (the event’s main organizer) as the “Look, Ma,
no hands!” era. During these early days, researchers built systems designed to
refute claims of the form “No machine could ever do X!” Such skeptical claims were
common at the time. To counter them, the AI researchers created small systems
that achieved X in a “microworld” (a well-dened, limited domain that enabled
6
|
PAST DEVELOPMENTS ANDPRESENT CAPABILITIES
a pared-down version of the performance to be demonstrated), thus providing a
proof of concept and showing that X could, in principle, be done by machine. One
such early system, the Logic eorist, was able to prove most of the theorems in
the second chapter of Whitehead and Russells Principia Mathematica, and even
came up with one proof that was much more elegant than the original, thereby
debunking the notion that machines could “only think numerically” and show-
ing that machines were also able to do deduction and to invent logical proofs.
13
A
follow-up program, the General Problem Solver, could in principle solve a wide
range of formally specied problems.
14
Programs that could solve calculus prob-
lems typical of rst-year college courses, visual analogy problems of the type that
appear in some IQ tests, and simple verbal algebra problems were also written.
15
e Shakey robot (so named because of its tendency to tremble during opera-
tion) demonstrated how logical reasoning could be integrated with perception
and used to plan and control physical activity.
16
e ELIZA program showed how
a computer could impersonate a Rogerian psychotherapist.
17
In the mid-seventies,
the program SHRDLU showed how a simulated robotic arm in a simulated world
of geometric blocks could follow instructions and answer questions in English
that were typed in by a user.
18
In later decades, systems would be created that
demonstrated that machines could compose music in the style of various classical
composers, outperform junior doctors in certain clinical diagnostic tasks, drive
cars autonomously, and make patentable inventions.
19
ere has even been an AI
that cracked original jokes.
20
(Not that its level of humor was high—“What do
you get when you cross an optic with a mental object? An eye-dea”—but children
reportedly found its puns consistently entertaining.)
e methods that produced successes in the early demonstration systems oen
proved dicult to extend to a wider variety of problems or to harder problem
instances. One reason for this is the “combinatorial explosion” of possibilities that
must be explored by methods that rely on something like exhaustive search. Such
methods work well for simple instances of a problem, but fail when things get a bit
more complicated. For instance, to prove a theorem that has a 5-line long proof in
a deduction system with one inference rule and 5 axioms, one could simply enu-
merate the 3,125 possible combinations and check each one to see if it delivers the
intended conclusion. Exhaustive search would also work for 6- and 7-line proofs.
But as the task becomes more dicult, the method of exhaustive search soon
runs into trouble. Proving a theorem with a 50-line proof does not take ten times
longer than proving a theorem that has a 5-line proof: rather, if one uses exhaus-
tive search, it requires combing through 5
50
≈ 8.9 × 10
34
possible sequences—which
is computationally infeasible even with the fastest supercomputers.
To overcome the combinatorial explosion, one needs algorithms that exploit
structure in the target domain and take advantage of prior knowledge by using
heuristic search, planning, and exible abstract representations—capabilities
that were poorly developed in the early AI systems. e performance of these
early systems also suered because of poor methods for handling uncertainty,
reliance on brittle and ungrounded symbolic representations, data scarcity, and
SEASONS OF HOPE AND DESPAIR
|
7
severe hardware limitations on memory capacity and processor speed. By the
mid-1970s, there was a growing awareness of these problems. e realization that
many AI projects could never make good on their initial promises led to the onset
of the rst “AI winter”: a period of retrenchment, during which funding decreased
and skepticism increased, and AI fell out of fashion.
A new springtime arrived in the early 1980s, when Japan launched its Fih-
Generation Computer Systems Project, a well-funded public–private partner-
ship that aimed to leapfrog the state of the art by developing a massively parallel
computing architecture that would serve as a platform for articial intelligence.
is occurred at peak fascination with the Japanese “post-war economic mir-
acle,” a period when Western government and business leaders anxiously sought
to divine the formula behind Japan’s economic success in hope of replicating the
magic at home. When Japan decided to invest big in AI, several other countries
followed suit.
e ensuing years saw a great proliferation of expert systems. Designed as sup-
port tools for decision makers, expert systems were rule-based programs that
made simple inferences from a knowledge base of facts, which had been elicited
from human domain experts and painstakingly hand-coded in a formal lan-
guage. Hundreds of these expert systems were built. However, the smaller systems
provided little benet, and the larger ones proved expensive to develop, validate,
and keep updated, and were generally cumbersome to use. It was impractical to
acquire a standalone computer just for the sake of running one program. By the
late 1980s, this growth season, too, had run its course.
e Fih-Generation Project failed to meet its objectives, as did its counterparts
in the United States and Europe. A second AI winter descended. At this point, a
critic could justiably bemoan “the history of articial intelligence research to
date, consisting always of very limited success in particular areas, followed imme-
diately by failure to reach the broader goals at which these initial successes seem
at rst to hint.
21
Private investors began to shun any venture carrying the brand
of “articial intelligence.” Even among academics and their funders, “AI” became
an unwanted epithet.
22
Technical work continued apace, however, and by the 1990s, the second AI
winter gradually thawed. Optimism was rekindled by the introduction of new
techniques, which seemed to oer alternatives to the traditional logicist paradigm
(oen referred to as “Good Old-Fashioned Articial Intelligence,” or “GOFAI”
for short), which had focused on high-level symbol manipulation and which
had reached its apogee in the expert systems of the 1980s. e newly popular
techniques, which included neural networks and genetic algorithms, promised
to overcome some of the shortcomings of the GOFAI approach, in particular
the “brittleness” that characterized classical AI programs (which typically pro-
duced complete nonsense if the programmers made even a single slightly erro-
neous assumption). e new techniques boasted a more organic performance.
For example, neural networks exhibited the property of “graceful degradation:
a small amount of damage to a neural network typically resulted in a small
8
|
PAST DEVELOPMENTS ANDPRESENT CAPABILITIES
degradation of its performance, rather than a total crash. Even more importantly,
neural networks could learn from experience, nding natural ways of general-
izing from examples and nding hidden statistical patterns in their input.
23
is
made the nets good at pattern recognition and classication problems. For exam-
ple, by training a neural network on a data set of sonar signals, it could be taught
to distinguish the acoustic proles of submarines, mines, and sea life with bet-
ter accuracy than human experts—and this could be done without anybody rst
having to gure out in advance exactly how the categories were to be dened or
how dierent features were to be weighted.
While simple neural network models had been known since the late 1950s, the
eld enjoyed a renaissance aer the introduction of the backpropagation algo-
rithm, which made it possible to train multi-layered neural networks.
24
Such
multilayered networks, which have one or more intermediary (“hidden”) layers
of neurons between the input and output layers, can learn a much wider range
of functions than their simpler predecessors.
25
Combined with the increasingly
powerful computers that were becoming available, these algorithmic improve-
ments enabled engineers to build neural networks that were good enough to be
practically useful in many applications.
e brain-like qualities of neural networks contrasted favorably with the rig-
idly logic-chopping but brittle performance of traditional rule-based GOFAI
systems—enough so to inspire a new “-ism,connectionism, which emphasized
the importance of massively parallel sub-symbolic processing. More than 150,000
academic papers have since been published on articial neural networks, and they
continue to be an important approach in machine learning.
Evolution-based methods, such as genetic algorithms and genetic program-
ming, constitute another approach whose emergence helped end the second AI
winter. It made perhaps a smaller academic impact than neural nets but was
widely popularized. In evolutionary models, a population of candidate solutions
(which can be data structures or programs) is maintained, and new candidate
solutions are generated randomly by mutating or recombining variants in the
existing population. Periodically, the population is pruned by applying a selection
criterion (a tness function) that allows only the better candidates to survive into
the next generation. Iterated over thousands of generations, the average quality of
the solutions in the candidate pool gradually increases. When it works, this kind
of algorithm can produce ecient solutions to a very wide range of problems—
solutions that may be strikingly novel and unintuitive, oen looking more like
natural structures than anything that a human engineer would design. And in
principle, this can happen without much need for human input beyond the ini-
tial specication of the tness function, which is oen very simple. In practice,
however, getting evolutionary methods to work well requires skill and ingenu-
ity, particularly in devising a good representational format. Without an ecient
way to encode candidate solutions (a genetic language that matches latent struc-
ture in the target domain), evolutionary search tends to meander endlessly in a
vast search space or get stuck at a local optimum. Even if a good representational
SEASONS OF HOPE AND DESPAIR
|
9
format is found, evolution is computationally demanding and is oen defeated by
the combinatorial explosion.
Neural networks and genetic algorithms are examples of methods that stim-
ulated excitement in the 1990s by appearing to oer alternatives to the stagnat-
ing GOFAI paradigm. But the intention here is not to sing the praises of these
two methods or to elevate them above the many other techniques in machine
learning. In fact, one of the major theoretical developments of the past twenty
years has been a clearer realization of how supercially disparate techniques
can be understood as special cases within a common mathematical framework.
For example, many types of articial neural network can be viewed as classi-
ers that perform a particular kind of statistical calculation (maximum likeli-
hood estimation).
26
is perspective allows neural nets to be compared with
a larger class of algorithms for learning classiers from examples—“decision
trees,” “logistic regression models,” “support vector machines,” “naive Bayes,
k-nearest-neighbors regression,” among others.
27
In a similar manner, genetic
algorithms can be viewed as performing stochastic hill-climbing, which is
again a subset of a wider class of algorithms for optimization. Each of these
algorithms for building classiers or for searching a solution space has its own
prole of strengths and weaknesses which can be studied mathematically.
Algorithms dier in their processor time and memory space requirements,
which inductive biases they presuppose, the ease with which externally pro-
duced content can be incorporated, and how transparent their inner workings
are to a human analyst.
Behind the razzle-dazzle of machine learning and creative problem-solving
thus lies a set of mathematically well-specied tradeos. e ideal is that of the
perfect Bayesian agent, one that makes probabilistically optimal use of available
information. is ideal is unattainable because it is too computationally demand-
ing to be implemented in any physical computer (see Box 1). Accordingly, one can
view articial intelligence as a quest to nd shortcuts: ways of tractably approxi-
mating the Bayesian ideal by sacricing some optimality or generality while pre-
serving enough to get high performance in the actual domains of interest.
A reection of this picture can be seen in the work done over the past couple of
decades on probabilistic graphical models, such as Bayesian networks. Bayesian
networks provide a concise way of representing probabilistic and conditional
independence relations that hold in some particular domain. (Exploiting such
independence relations is essential for overcoming the combinatorial explosion,
which is as much of a problem for probabilistic inference as it is for logical deduc-
tion.) ey also provide important insight into the concept of causality.
28
One advantage of relating learning problems from specic domains to the gen-
eral problem of Bayesian inference is that new algorithms that make Bayesian
inference more ecient will then yield immediate improvements across many
dierent areas. Advances in Monte Carlo approximation techniques, for exam-
ple, are directly applied in computer vision, robotics, and computational genet-
ics. Another advantage is that it lets researchers from dierent disciplines more
0
|
PAST DEVELOPMENTS ANDPRESENT CAPABILITIES
Box  An optimal Bayesian agent
An ideal Bayesian agent starts out with a “prior probability distribution,” a func-
tion that assigns probabilities to each “possible world” (i.e. to each maximally
specific way the world could turn out to be).
29
This prior incorporates an induc-
tive bias such that simpler possible worlds are assigned higher probabilities. (One
way to formally define the simplicity of a possible world is in terms of its
“Kolmogorov complexity,” a measure based on the length of the shortest com-
puter program that generates a complete description of the world.
30
) The prior
also incorporates any background knowledge that the programmers wish to give
to the agent.
As the agent receives new information from its sensors, it updates its prob-
ability distribution by conditionalizing the distribution on the new information
according to Bayes’ theorem.
3
Conditionalization is the mathematical operation
that sets the new probability of those worlds that are inconsistent with the in-
formation received to zero and renormalizes the probability distribution over
the remaining possible worlds. The result is a “posterior probability distribution”
(which the agent may use as its new prior in the next time step). As the agent
makes observations, its probability mass thus gets concentrated on the shrinking
set of possible worlds that remain consistent with the evidence; and among these
possible worlds, simpler ones always have more probability.
Metaphorically, we can think of a probability as sand on a large sheet of paper.
The paper is partitioned into areas of various sizes, each area corresponding to
one possible world, with larger areas corresponding to simpler possible worlds.
Imagine also a layer of sand of even thickness spread across the entire sheet: this
is our prior probability distribution. Whenever an observation is made that rules
out some possible worlds, we remove the sand from the corresponding areas of
the paper and redistribute it evenly over the areas that remain in play. Thus, the
total amount of sand on the sheet never changes, it just gets concentrated into
fewer areas as observational evidence accumulates. This is a picture of learning in
its purest form. (To calculate the probability of a hypothesis, we simply measure
the amount of sand in all the areas that correspond to the possible worlds in
which the hypothesis is true.)
So far, we have defined a learning rule. To get an agent, we also need a deci-
sion rule. To this end, we endow the agent with a “utility function” which assigns
a number to each possible world. The number represents the desirability of
that world according to the agent’s basic preferences. Now, at each time step,
the agent selects the action with the highest expected utility.
32
(To find the ac-
tion with the highest expected utility, the agent could list all possible actions. It
could then compute the conditional probability distribution given the action—
the probability distribution that would result from conditionalizing its current
probability distribution on the observation that the action had just been taken.
Finally, it could calculate the expected value of the action as the sum of the value
STATE OF THE ART
|

Box Continued
of each possible world multiplied by the conditional probability of that world
given the action.
33
)
The learning rule and the decision rule together define an “optimality notion”
for an agent. (Essentially the same optimality notion has been broadly used in
artificial intelligence, epistemology, philosophy of science, economics, and statis-
tics.
34
) In reality, it is impossible to build such an agent because it is computation-
ally intractable to perform the requisite calculations. Any attempt to do so suc-
cumbs to a combinatorial explosion just like the one described in our discussion
of GOFAI. To see why this is so, consider one tiny subset of all possible worlds:
those that consist of a single computer monitor floating in an endless vacuum.
The monitor has , 000 × , 000 pixels, each of which is perpetually either on or
o. Even this subset of possible worlds is enormously large: the 2
(,000
×
,000)
pos-
sible monitor states outnumber all the computations expected ever to take place
in the observable universe. Thus, we could not even enumerate all the possible
worlds in this tiny subset of all possible worlds, let alone perform more elaborate
computations on each of them individually.
Optimality notions can be of theoretical interest even if they are physically
unrealizable. They give us a standard by which to judge heuristic approximations,
and sometimes we can reason about what an optimal agent would do in some
special case. We will encounter some alternative optimality notions for artificial
agents in Chapter 2.
easily pool their ndings. Graphical models and Bayesian statistics have become
a shared focus of research in many elds, including machine learning, statisti-
cal physics, bioinformatics, combinatorial optimization, and communication
theory.
35
A fair amount of the recent progress in machine learning has resulted
from incorporating formal results originally derived in other academic elds.
(Machine learning applications have also benetted enormously from faster
computers and greater availability of large data sets.)
State of the art
Articial intelligence already outperforms human intelligence in many domains.
Table 1 surveys the state of game-playing computers, showing that AIs now beat
human champions in a wide range of games.
36
ese achievements might not seem impressive today. But this is because our
standards for what is impressive keep adapting to the advances being made. Expert
chess playing, for example, was once thought to epitomize human intellection. In
the view of several experts in the late ies: “If one could devise a successful chess
2
|
PAST DEVELOPMENTS ANDPRESENT CAPABILITIES
Table  Game-playing AI
Checkers Superhuman Arthur Samuel’s checkers program, originally
written in 952 and later improved (the
955 version incorporating machine learning),
becomes the first program to learn to play
a game better than its creator.
37
In 994,
the program CHINOOK beats the reigning
human champion, marking the first time a
program wins an ocial world championship
in a game of skill. In 2002, Jonathan Schaeer
and his team “solve” checkers, i.e. produce a
program that always makes the best possible
move (combining alpha-beta search with a
database of 39 trillion endgame positions).
Perfect play by both sides leads to a draw.
38
Backgammon Superhuman 979: The backgammon program BKG by
Hans Berliner defeats the world champion
the first computer program to defeat (in an
exhibition match) a world champion in any
game—though Berliner later attributes the
win to luck with the dice rolls.
39
992: The backgammon program TD-
Gammon by Gerry Tesauro reaches
championship-level ability, using temporal
dierence learning (a form of reinforcement
learning) and repeated plays against itself to
improve.
40
In the years since, backgammon programs
have far surpassed the best human players.
4
Traveller TCS Superhuman in
collaboration
with human
42
In both 98 and 982, Douglas Lenat’s
program Eurisko wins the US championship
in Traveller TCS (a futuristic naval war game),
prompting rule changes to block its unortho-
dox strategies.
43
Eurisko had heuristics for
designing its fleet, and it also had heuristics
for modifying its heuristics.
Othello Superhuman 997: The program Logistello wins every
game in a six-game match against world
champion Takeshi Murakami.
44
Chess Superhuman 997: Deep Blue beats the world chess
champion, Garry Kasparov. Kasparov claims
to have seen glimpses of true intelligence and
creativity in some of the computer’s moves.
45
Since then, chess engines have continued to
improve.
46
Crosswords Expert level 999: The crossword-solving program Prov-
erb outperforms the average crossword-
solver.
47
STATE OF THE ART
|
3
202: The program Dr. Fill, created by Matt
Ginsberg, scores in the top quartile among
the otherwise human contestants in the
American Crossword Puzzle Tournament.
(Dr. Fill’s performance is uneven. It completes
perfectly the puzzle rated most dicult
by humans, yet is stumped by a couple of
nonstandard puzzles that involved spelling
backwards or writing answers diagonally.)
48
Scrabble Superhuman As of 2002, Scrabble-playing software sur-
passes the best human players.
49
Bridge Equal to the
best
By 2005, contract bridge playing software
reaches parity with the best human bridge
players.
50
Jeopardy! Superhuman 200: IBM’s Watson defeats the two all-time-
greatest human Jeopardy! champions, Ken
Jennings and Brad Rutter.
5
Jeopardy! is a tel-
evised game show with trivia questions about
history, literature, sports, geography, pop
culture, science, and other topics. Questions
are presented in the form of clues, and often
involve wordplay.
Poker Varied Computer poker players remain slightly
below the best humans for full-ring Texas
hold ’em but perform at a superhuman level
in some poker variants.
52
FreeCell Superhuman Heuristics evolved using genetic algorithms
produce a solver for the solitaire game
FreeCell (which in its generalized form is NP-
complete) that is able to beat high-ranking
human players.
53
Go Very strong
amateur level
As of 202, the Zen series of go-playing pro-
grams has reached rank 6 dan in fast games
(the level of a very strong amateur player),
using Monte Carlo tree search and machine
learning techniques.
54
Go-playing programs
have been improving at a rate of about  dan/
year in recent years. If this rate of improve-
ment continues, they might beat the human
world champion in about a decade.
Table Continued
machine, one would seem to have penetrated to the core of human intellectual
endeavor.
55
is no longer seems so. One sympathizes with John McCarthy, who
lamented: “As soon as it works, no one calls it AI anymore.
56
ere is an important sense, however, in which chess-playing AI turned out
to be a lesser triumph than many imagined it would be. It was once supposed,
4
|
PAST DEVELOPMENTS ANDPRESENT CAPABILITIES
perhaps not unreasonably, that in order for a computer to play chess at grandmas-
ter level, it would have to be endowed with a high degree of general intelligence.
57
One might have thought, for example, that great chess playing requires being able
to learn abstract concepts, think cleverly about strategy, compose exible plans,
make a wide range of ingenious logical deductions, and maybe even model one’s
opponents thinking. Not so. It turned out to be possible to build a perfectly ne
chess engine around a special-purpose algorithm.
58
When implemented on the
fast processors that became available towards the end of the twentieth century, it
produces very strong play. But an AI built like that is narrow. It plays chess; it can
do no other.
59
In other domains, solutions have turned out to be more complicated than
initially expected, and progress slower. e computer scientist Donald Knuth
was struck that “AI has by now succeeded in doing essentially everything that
requires ‘thinking’ but has failed to do most of what people and animals do
‘without thinking’—that, somehow, is much harder!”
60
Analyzing visual scenes,
recognizing objects, or controlling a robots behavior as it interacts with a natural
environment has proved challenging. Nevertheless, a fair amount of progress has
been made and continues to be made, aided by steady improvements in hardware.
Common sense and natural language understanding have also turned out to be
dicult. It is now oen thought that achieving a fully human-level performance
on these tasks is an “AI-complete” problem, meaning that the diculty of solv-
ing these problems is essentially equivalent to the diculty of building generally
human-level intelligent machines.
61
In other words, if somebody were to succeed
in creating an AI that could understand natural language as well as a human
adult, they would in all likelihood also either already have succeeded in creating
an AI that could do everything else that human intelligence can do, or they would
be but a very short step from such a general capability.
62
Chess-playing expertise turned out to be achievable by means of a surprisingly
simple algorithm. It is tempting to speculate that other capabilities—such as gen-
eral reasoning ability, or some key ability involved in programming—might like-
wise be achievable through some surprisingly simple algorithm. e fact that the
best performance at one time is attained through a complicated mechanism does
not mean that no simple mechanism could do the job as well or better. It might
simply be that nobody has yet found the simpler alternative. e Ptolemaic system
(with the Earth in the center, orbited by the Sun, the Moon, planets, and stars)
represented the state of the art in astronomy for over a thousand years, and its pre-
dictive accuracy was improved over the centuries by progressively complicating
the model: adding epicycles upon epicycles to the postulated celestial motions.
en the entire system was overthrown by the heliocentric theory of Copernicus,
which was simpler and—though only aer further elaboration by Kepler—more
predictively accurate.
63
Articial intelligence methods are now used in more areas than it would
make sense to review here, but mentioning a sampling of them will give an idea
of the breadth of applications. Aside from the game AIs listed in Table 1, there
STATE OF THE ART
|
5
are hearing aids with algorithms that lter out ambient noise; route-nders that
display maps and oer navigation advice to drivers; recommender systems that
suggest books and music albums based on a user’s previous purchases and ratings;
and medical decision support systems that help doctors diagnose breast cancer,
recommend treatment plans, and aid in the interpretation of electrocardiograms.
ere are robotic pets and cleaning robots, lawn-mowing robots, rescue robots,
surgical robots, and over a million industrial robots.
64
e world population of
robots exceeds 10 million.
65
Modern speech recognition, based on statistical techniques such as hidden
Markov models, has become suciently accurate for practical use (some frag-
ments of this book were draed with the help of a speech recognition program).
Personal digital assistants, such as Apple’s Siri, respond to spoken commands and
can answer simple questions and execute commands. Optical character recogni-
tion of handwritten and typewritten text is routinely used in applications such as
mail sorting and digitization of old documents.
66
Machine translation remains imperfect but is good enough for many applica-
tions. Early systems used the GOFAI approach of hand-coded grammars that had
to be developed by skilled linguists from the ground up for each language. Newer
systems use statistical machine learning techniques that automatically build sta-
tistical models from observed usage patterns. e machine infers the parameters
for these models by analyzing bilingual corpora. is approach dispenses with
linguists: the programmers building these systems need not even speak the lan-
guages they are working with.
67
Face recognition has improved suciently in recent years that it is now used
at automated border crossings in Europe and Australia. e US Department of
State operates a face recognition system with over 75 million photographs for visa
processing. Surveillance systems employ increasingly sophisticated AI and data-
mining technologies to analyze voice, video, or text, large quantities of which are
trawled from the world’s electronic communications media and stored in giant
data centers.
eorem-proving and equation-solving are by now so well established that
they are hardly regarded as AI anymore. Equation solvers are included in sci-
entic computing programs such as Mathematica. Formal verication methods,
including automated theorem provers, are routinely used by chip manufacturers
to verify the behavior of circuit designs prior to production.
e US military and intelligence establishments have been leading the way to
the large-scale deployment of bomb-disposing robots, surveillance and attack
drones, and other unmanned vehicles. ese still depend mainly on remote
control by human operators, but work is underway to extend their autonomous
capabilities.
Intelligent scheduling is a major area of success. e DART tool for automated
logistics planning and scheduling was used in Operation Desert Storm in 1991
to such eect that DARPA (the Defense Advanced Research Projects Agency in
the United States) claims that this single application more than paid back their
6
|
PAST DEVELOPMENTS ANDPRESENT CAPABILITIES
thirty-year investment in AI.
68
Airline reservation systems use sophisticated
scheduling and pricing systems. Businesses make wide use of AI techniques in
inventory control systems. ey also use automatic telephone reservation systems
and helplines connected to speech recognition soware to usher their hapless
customers through labyrinths of interlocking menu options.
AI technologies underlie many Internet services. Soware polices the world’s
email trac, and despite continual adaptation by spammers to circumvent
the countermeasures being brought against them, Bayesian spam lters have
largely managed to hold the spam tide at bay. Soware using AI components
is responsible for automatically approving or declining credit card transac-
tions, and continuously monitors account activity for signs of fraudulent use.
Information retrieval systems also make extensive use of machine learning.
e Google search engine is, arguably, the greatest AI system that has yet been
built.
Now, it must be stressed that the demarcation between articial intelligence
and soware in general is not sharp. Some of the applications listed above might
be viewed more as generic soware applications rather than AI in particular—
though this brings us back to McCarthy’s dictum that when something works
it is no longer called AI. A more relevant distinction for our purposes is that
between systems that have a narrow range of cognitive capability (whether they
be called “AI” or not) and systems that have more generally applicable problem-
solving capacities. Essentially all the systems currently in use are of the former
type: narrow. However, many of them contain components that might also play
a role in future articial general intelligence or be of service in its development
components such as classiers, search algorithms, planners, solvers, and repre-
sentational frameworks.
One high-stakes and extremely competitive environment in which AI sys-
tems operate today is the global nancial market. Automated stock-trading
systems are widely used by major investing houses. While some of these are
simply ways of automating the execution of particular buy or sell orders issued
by a human fund manager, others pursue complicated trading strategies that
adapt to changing market conditions. Analytic systems use an assortment of
data-mining techniques and time series analysis to scan for patterns and trends
in securities markets or to correlate historical price movements with external
variables such as keywords in news tickers. Financial news providers sell news-
feeds that are specially formatted for use by such AI programs. Other systems
specialize in nding arbitrage opportunities within or between markets, or in
high-frequency trading that seeks to prot from minute price movements that
occur over the course of milliseconds (a timescale at which communication
latencies even for speed-of-light signals in optical ber cable become signicant,
making it advantageous to locate computers near the exchange). Algorithmic
high-frequency traders account for more than half of equity shares traded on
US markets.
69
Algorithmic trading has been implicated in the 2010 Flash Crash
(see Box 2).
STATE OF THE ART
|
7
Box 2 The 200 Flash Crash
By the afternoon of May, 6, 200, US equity markets were already down 4%
on worries about the European debt crisis. At 2:32 p.m., a large seller (a mu-
tual fund complex) initiated a sell algorithm to dispose of a large number of the
E-Mini S&P 500 futures contracts to be sold o at a sell rate linked to a measure
of minute-to-minute liquidity on the exchange. These contracts were bought by
algorithmic high-frequency traders, which were programmed to quickly eliminate
their temporary long positions by selling the contracts on to other traders. With
demand from fundamental buyers slacking, the algorithmic traders started to sell
the E-Minis primarily to other algorithmic traders, which in turn passed them
on to other algorithmic traders, creating a “hot potato” eect driving up trad-
ing volume this being interpreted by the sell algorithm as an indicator of high
liquidity, prompting it to increase the rate at which it was putting E-Mini contracts
on the market, pushing the downward spiral. At some point, the high-frequency
traders started withdrawing from the market, drying up liquidity while prices con-
tinued to fall. At 2:45 p.m., trading on the E-Mini was halted by an automatic circuit
breaker, the exchange’s stop logic functionality. When trading was restarted, a
mere five seconds later, prices stabilized and soon began to recover most of the
losses. But for a while, at the trough of the crisis, a trillion dollars had been wiped
o the market, and spillover eects had led to a substantial number of trades in in-
dividual securities being executed at “absurd” prices, such as one cent or 00,000
dollars. After the market closed for the day, representatives of the exchanges met
with regulators and decided to break all trades that had been executed at prices
60% or more away from their pre-crisis levels (deeming such transactions “clearly
erroneous” and thus subject to post facto cancellation under existing trade rules).
70
The retelling here of this episode is a digression because the computer pro-
grams involved in the Flash Crash were not particularly intelligent or sophisti-
cated, and the kind of threat they created is fundamentally dierent from the
concerns we shall raise later in this book in relation to the prospect of machine
superintelligence. Nevertheless, these events illustrate several useful lessons.
One is the reminder that interactions between individually simple components
(such as the sell algorithm and the high-frequency algorithmic trading programs)
can produce complicated and unexpected eects. Systemic risk can build up in
a system as new elements are introduced, risks that are not obvious until after
something goes wrong (and sometimes not even then).
7
Another lesson is that smart professionals might give an instruction to a pro-
gram based on a sensible-seeming and normally sound assumption (e.g. that trad-
ing volume is a good measure of market liquidity), and that this can produce
catastrophic results when the program continues to act on the instruction with
iron-clad logical consistency even in the unanticipated situation where the as-
sumption turns out to be invalid. The algorithm just does what it does; and unless
continued
8
|
PAST DEVELOPMENTS ANDPRESENT CAPABILITIES
Box 2 Continued
it is a very special kind of algorithm, it does not care that we clasp our heads and
gasp in dumbstruck horror at the absurd inappropriateness of its actions. This is
a theme that we will encounter again.
A third observation in relation to the Flash Crash is that while automa-
tion contributed to the incident, it also contributed to its resolution. The pre-
preprogrammed stop order logic, which suspended trading when prices moved
too far out of whack, was set to execute automatically because it had been cor-
rectly anticipated that the triggering events could happen on a timescale too swift
for humans to respond. The need for pre-installed and automatically executing
safety functionality—as opposed to reliance on runtime human supervision—
again foreshadows a theme that will be important in our discussion of machine
superintelligence.
72
Opinions about the future of machine intelligence
Progress on two major fronts—towards a more solid statistical and information-
theoretic foundation for machine learning on the one hand, and towards the
practical and commercial success of various problem-specic or domain-specic
applications on the other—has restored to AI research some of its lost prestige.
ere may, however, be a residual cultural eect on the AI community of its ear-
lier history that makes many mainstream researchers reluctant to align them-
selves with over-grand ambition. us Nils Nilsson, one of the old-timers in the
eld, complains that his present-day colleagues lack the boldness of spirit that
propelled the pioneers of his own generation:
Concern for “respectability” has had, I think, a stultifying eect on some AI research-
ers. I hear them saying things like, “AI used to be criticized for its flossiness. Now that
we have made solid progress, let us not risk losing our respectability.” One result of this
conservatism has been increased concentration on “weak AI”—the variety devoted to
providing aids to human thought—and away from “strong AI”—the variety that attempts
to mechanize human-level intelligence.
73
Nilsson’s sentiment has been echoed by several others of the founders, including
Marvin Minsky, John McCarthy, and Patrick Winston.
74
e last few years have seen a resurgence of interest in AI, which might yet
spill over into renewed eorts towards articial general intelligence (what
Nilsson calls “strong AI”). In addition to faster hardware, a contemporary pro-
ject would benet from the great strides that have been made in the many sub-
elds of AI, in soware engineering more generally, and in neighboring elds
such as computational neuroscience. One indication of pent-up demand for
quality information and education is shown in the response to the free online
OPINIONS ABOUT THE FUTURE
|
9
oering of an introductory course in articial intelligence at Stanford Uni-
versity in the fall of 2011, organized by Sebastian run and Peter Norvig. Some
160,000 students from around the world signed up to take it (and 23,000 com-
pleted it).
75
Expert opinions about the future of AI vary wildly. ere is disagreement about
timescales as well as about what forms AI might eventually take. Predictions
about the future development of articial intelligence, one recent study noted,
“are as condent as they are diverse.”
76
Although the contemporary distribution of belief has not been very carefully
measured, we can get a rough impression from various smaller surveys and infor-
mal observations. In particular, a series of recent surveys have polled members
of several relevant expert communities on the question of when they expect
human-level machine intelligence” (HLMI) to be developed, dened as “one
that can carry out most human professions at least as well as a typical human.
77
Results are shown in Table 2. e combined sample gave the following (median)
estimate: 10% probability of HLMI by 2022, 50% probability by 2040, and 90%
probability by 2075. (Respondents were asked to premiss their estimates on the
assumption that “human scientic activity continues without major negative
disruption.”)
ese numbers should be taken with some grains of salt: sample sizes are quite
small and not necessarily representative of the general expert population. ey
are, however, in concordance with results from other surveys.
78
e survey results are also in line with some recently published interviews with
about two-dozen researchers in AI-related elds. For example, Nils Nilsson has
spent a long and productive career working on problems in search, planning,
knowledge representation, and robotics; he has authored textbooks in articial
intelligence; and he recently completed the most comprehensive history of the
eld written to date.
79
When asked about arrival dates for HLMI, he oered the
following opinion:
80
0% chance: 2030
50% chance: 2050
90% chance: 200
Table 2 When will human-level machine intelligence be attained?
8
10% 50% 90%
PT-AI 2023 2048 2080
AGI 2022 2040 2065
EETN 2020 2050 2093
TOP00 2024 2050 2070
Combined 2022 2040 2075
20
|
PAST DEVELOPMENTS ANDPRESENT CAPABILITIES
Judging from the published interview transcripts, Professor Nilsson’s probabil-
ity distribution appears to be quite representative of many experts in the area—
though again it must be emphasized that there is a wide spread of opinion: there
are practitioners who are substantially more boosterish, condently expecting
HLMI in the 202040 range, and others who are condent either that it will
never happen or that it is indenitely far o.
82
In addition, some interviewees
feel that the notion of a “human level” of articial intelligence is ill-dened or
misleading, or are for other reasons reluctant to go on record with a quantitative
prediction.
My own view is that the median numbers reported in the expert survey do
not have enough probability mass on later arrival dates. A 10% probability of
HLMI not having been developed by 2075 or even 2100 (aer conditionalizing on
human scientic activity continuing without major negative disruption”) seems
too low.
Historically, AI researchers have not had a strong record of being able to pre-
dict the rate of advances in their own eld or the shape that such advances would
take. On the one hand, some tasks, like chess playing, turned out to be achiev-
able by means of surprisingly simple programs; and naysayers who claimed that
machines would “never” be able to do this or that have repeatedly been proven
wrong. On the other hand, the more typical errors among practitioners have
been to underestimate the diculties of getting a system to perform robustly on
real-world tasks, and to overestimate the advantages of their own particular pet
project or technique.
e survey also asked two other questions of relevance to our inquiry. One
inquired of respondents about how much longer they thought it would take to
reach superintelligence assuming human-level machine is rst achieved. e
results are in Table 3.
Another question inquired what they thought would be the overall long-term
impact for humanity of achieving human-level machine intelligence. e answers
are summarized in Figure 2.
My own views again dier somewhat from the opinions expressed in the sur-
vey. I assign a higher probability to superintelligence being created relatively soon
aer human-level machine intelligence. I also have a more polarized outlook on
the consequences, thinking an extremely good or an extremely bad outcome to
be somewhat more likely than a more balanced outcome. e reasons for this will
become clear later in the book.
Table 3 How long from human level to superintelligence?
Within 2 years after HLMI Within 30 years after HLMI
TOP00 5% 50%
Combined 0% 75%
OPINIONS ABOUT THE FUTURE
|
2
Small sample sizes, selection biases, and—above all—the inherent unreliability
of the subjective opinions elicited mean that one should not read too much into
these expert surveys and interviews. ey do not let us draw any strong conclu-
sion. But they do hint at a weak conclusion. ey suggest that (at least in lieu of
better data or analysis) it may be reasonable to believe that human-level machine
intelligence has a fairly sizeable chance of being developed by mid-century, and
that it has a non-trivial chance of being developed considerably sooner or much
later; that it might perhaps fairly soon thereaer result in superintelligence; and
that a wide range of outcomes may have a signicant chance of occurring, includ-
ing extremely good outcomes and outcomes that are as bad as human extinc-
tion.
84
At the very least, they suggest that the topic is worth a closer look.
0
10
20
30
40
50
Extremely
good
On balance
good
More or less
neutral
On balance
bad
Extremely
bad
(existential
catastrophe)
TOP 100
Combined
Figure 2 Overall long-term impact of HLMI.
83
22
|
PATHS TO SUPERINTELLIGENCE
CHAPTER 2
Paths to superintelligence
M
achines are currently far inferior to humans in general intelligence.
Yet one day (we have suggested) they will be superintelligent. How do
we get from here to there? is chapter explores several conceivable
technological paths. We look at articial intelligence, whole brain emulation,
biological cognition, and human–machine interfaces, as well as networks and
organizations. We evaluate their dierent degrees of plausibility as pathways to
superintelligence. e existence of multiple paths increases the probability that
the destination can be reached via at least one of them.
We can tentatively dene a superintelligence as any intellect that greatly exceeds
the cognitive performance of humans in virtually all domains of interest.
1
We will
have more to say about the concept of superintelligence in the next chapter, where
we will subject it to a kind of spectral analysis to distinguish some dierent pos-
sible forms of superintelligence. But for now, the rough characterization just given
will suce. Note that the denition is noncommittal about how the superintelli-
gence is implemented. It is also noncommittal regarding qualia: whether a super-
intelligence would have subjective conscious experience might matter greatly for
some questions (in particular for some moral questions), but our primary focus
here is on the causal antecedents and consequences of superintelligence, not on
the metaphysics of mind.
2
e chess program Deep Fritz is not a superintelligence on this denition,
since Fritz is only smart within the narrow domain of chess. Certain kinds of
domain-specic superintelligence could, however, be important. When referring
to superintelligent performance limited to a particular domain, we will note the
restriction explicitly. For instance, an “engineering superintelligence” would be
an intellect that vastly outperforms the best current human minds in the domain
of engineering. Unless otherwise noted, we use the term to refer to systems that
have a superhuman level of general intelligence.
But how might we create superintelligence? Let us examine some possible paths.
ARTIFICIAL INTELLIGENCE
|
23
Artificial intelligence
Readers of this chapter must not expect a blueprint for programming an articial
general intelligence. No such blueprint exists yet, of course. And had I been in
possession of such a blueprint, I most certainly would not have published it in a
book. (If the reasons for this are not immediately obvious, the arguments in sub-
sequent chapters will make them clear.)
We can, however, discern some general features of the kind of system that
would be required. It now seems clear that a capacity to learn would be an integral
feature of the core design of a system intended to attain general intelligence, not
something to be tacked on later as an extension or an aerthought. e same
holds for the ability to deal eectively with uncertainty and probabilistic informa-
tion. Some faculty for extracting useful concepts from sensory data and internal
states, and for leveraging acquired concepts into exible combinatorial represen-
tations for use in logical and intuitive reasoning, also likely belong among the
core design features in a modern AI intended to attain general intelligence.
e early Good Old-Fashioned Articial Intelligence systems did not, for the
most part, focus on learning, uncertainty, or concept formation, perhaps because
techniques for dealing with these dimensions were poorly developed at the time.
is is not to say that the underlying ideas are all that novel. e idea of using
learning as a means of bootstrapping a simpler system to human-level intelligence
can be traced back at least to Alan Turing’s notion of a “child machine,” which he
wrote about in 1950:
Instead of trying to produce a programme to simulate the adult mind, why not rather try
to produce one which simulates the child’s? If this were then subjected to an appropriate
course of education one would obtain the adult brain.
3
Turing envisaged an iterative process to develop such a child machine:
We cannot expect to find a good child machine at the first attempt. One must experiment
with teaching one such machine and see how well it learns. One can then try another
and see if it is better or worse. There is an obvious connection between this process
and evolution. . . . One may hope, however, that this process will be more expeditious
than evolution. The survival of the fittest is a slow method for measuring advantages. The
experimenter, by the exercise of intelligence, should be able to speed it up. Equally impor-
tant is the fact that he is not restricted to random mutations. If he can trace a cause for
some weakness he can probably think of the kind of mutation which will improve it.
4
We know that blind evolutionary processes can produce human-level general
intelligence, since they have already done so at least once. Evolutionary processes
with foresight—that is, genetic programs designed and guided by an intelligent
human programmer—should be able to achieve a similar outcome with far greater
eciency. is observation has been used by some philosophers and scientists,
24
|
PATHS TO SUPERINTELLIGENCE
including David Chalmers and Hans Moravec, to argue that human-level AI is
not only theoretically possible but feasible within this century.
5
e idea is that
we can estimate the relative capabilities of evolution and human engineering to
produce intelligence, and nd that human engineering is already vastly superior
to evolution in some areas and is likely to become superior in the remaining areas
before too long. e fact that evolution produced intelligence therefore indicates
that human engineering will soon be able to do the same. us, Moravec wrote
(already back in 1976):
The existence of several examples of intelligence designed under these constraints should
give us great confidence that we can achieve the same in short order. The situation is
analogous to the history of heavier than air flight, where birds, bats and insects clearly
demonstrated the possibility before our culture mastered it.
6
One needs to be cautious, though, in what inferences one draws from this line
of reasoning. It is true that evolution produced heavier-than-air ight, and that
human engineers subsequently succeeded in doing likewise (albeit by means
of a very dierent mechanism). Other examples could also be adduced, such as
sonar, magnetic navigation, chemical weapons, photoreceptors, and all kinds
of mechanic and kinetic performance characteristics. However, one could
equally point to areas where human engineers have thus far failed to match
evolution: in morphogenesis, self-repair, and the immune defense, for example,
human eorts lag far behind what nature has accomplished. Moravec’s argu-
ment, therefore, cannot give us “great condence” that we can achieve human-
level articial intelligence “in short order.” At best, the evolution of intelligent
life places an upper bound on the intrinsic diculty of designing intelligence.
But this upper bound could be quite far above current human engineering
capabilities.
Another way of deploying an evolutionary argument for the feasibility of AI is
via the idea that we could, by running genetic algorithms on suciently fast com-
puters, achieve results comparable to those of biological evolution. is version of
the evolutionary argument thus proposes a specic method whereby intelligence
could be produced.
But is it true that we will soon have computing power sucient to recapitu-
late the relevant evolutionary processes that produced human intelligence? e
answer depends both on how much computing technology will advance over
the next decades and on how much computing power would be required to run
genetic algorithms with the same optimization power as the evolutionary process
of natural selection that lies in our past. Although, in the end, the conclusion we
get from pursuing this line of reasoning is disappointingly indeterminate, it is
instructive to attempt a rough estimate (see Box 3). If nothing else, the exercise
draws attention to some interesting unknowns.
e upshot is that the computational resources required to simply replicate the
relevant evolutionary processes on Earth that produced human-level intelligence
are severely out of reach—and will remain so even if Moore’s law were to continue
ARTIFICIAL INTELLIGENCE
|
25
Box 3 What would it take to recapitulate evolution?
Not every feat accomplished by evolution in the course of the development of
human intelligence is relevant to a human engineer trying to artificially evolve
machine intelligence. Only a small portion of evolutionary selection on Earth
has been selection for intelligence. More specifically, the problems that human
engineers cannot trivially bypass may have been the target of a very small portion
of total evolutionary selection. For example, since we can run our computers on
electrical power, we do not have to reinvent the molecules of the cellular energy
economy in order to create intelligent machines—yet such molecular evolution
of metabolic pathways might have used up a large part of the total amount of se-
lection power that was available to evolution over the course of Earth’s history.
7
One might argue that the key insights for AI are embodied in the structure of
nervous systems, which came into existence less than a billion years ago.
8
If we take
that view, then the number of relevant “experiments” available to evolution is dras-
tically curtailed. There are some 46×0
30
prokaryotes in the world today, but only
0
9
insects, and fewer than 0
0
humans (while pre-agricultural populations were
orders of magnitude smaller).
9
These numbers are only moderately intimidating.
Evolutionary algorithms, however, require not only variations to select among
but also a fitness function to evaluate variants, and this is typically the most com-
putationally expensive component. A fitness function for the evolution of arti-
ficial intelligence plausibly requires simulation of neural development, learning,
and cognition to evaluate fitness. We might thus do better not to look at the raw
number of organisms with complex nervous systems, but instead to attend to
the number of neurons in biological organisms that we might need to simulate to
mimic evolution’s fitness function. We can make a crude estimate of that latter
quantity by considering insects, which dominate terrestrial animal biomass (with
ants alone estimated to contribute some520%).
0
Insect brain size varies sub-
stantially, with large and social insects sporting larger brains: a honeybee brain has
just under 0
6
neurons, a fruit fly brain has 0
5
neurons, and ants are in between
with 250,000 neurons.

The majority of smaller insects may have brains of only a
few thousand neurons. Erring on the side of conservatively high, if we assigned all
0
9
insects fruit-fly numbers of neurons, the total would be 0
24
insect neurons
in the world. This could be augmented with an additional order of magnitude to
account for aquatic copepods, birds, reptiles, mammals, etc., to reach 0
25
. (By
contrast, in pre-agricultural times there were fewer than 0
7
humans, with under
0

neurons each: thus fewer than 0
8
human neurons in total, though humans
have a higher number of synapses per neuron.)
The computational cost of simulating one neuron depends on the level of
detail that one includes in the simulation. Extremely simple neuron models use
about ,000 floating-point operations per second (FLOPS) to simulate one neu-
ron (in real-time). The electrophysiologically realistic HodgkinHuxley model
continued
26
|
PATHS TO SUPERINTELLIGENCE
Box 3 Continued
uses ,200,000 FLOPS. A more detailed multi-compartmental model would
add another three to four orders of magnitude, while higher-level models that
abstract systems of neurons could subtract two to three orders of magnitude
from the simple models.
2
If we were to simulate 0
25
neurons over a billion years
of evolution (longer than the existence of nervous systems as we know them),
and we allow our computers to run for one year, these figures would give us a
requirement in the range of 0
3
–0
44
FLOPS. For comparison, China’s Tianhe-2,
the worlds most powerful supercomputer as of September 203, provides only
3.39×0
6
FLOPS. In recent decades, it has taken approximately 6.7 years for
commodity computers to increase in power by one order of magnitude. Even a
century of continued Moore’s law would not be enough to close this gap. Running
more specialized hardware, or allowing longer run-times, could contribute only a
few more orders of magnitude.
This figure is conservative in another respect. Evolution achieved human intel-
ligence without aiming at this outcome. In other words, the fitness functions for
natural organisms do not select only for intelligence and its precursors.
3
Even
environments in which organisms with superior information processing skills
reap various rewards may not select for intelligence, because improvements to
intelligence can (and often do) impose significant costs, such as higher energy con-
sumption or slower maturation times, and those costs may outweigh whatever
benefits are gained from smarter behavior. Excessively deadly environments also
reduce the value of intelligence: the shorter one’s expected lifespan, the less time
there will be for increased learning ability to pay o. Reduced selective pressure
for intelligence slows the spread of intelligence-enhancing innovations, and thus
the opportunity for selection to favor subsequent innovations that depend on
them. Furthermore, evolution may wind up stuck in local optima that humans
would notice and bypass by altering tradeos between exploitation and explor-
ation or by providing a smooth progression of increasingly dicult intelligence
tests.
4
And as mentioned earlier, evolution scatters much of its selection power
on traits that are unrelated to intelligence (such as Red Queen’s races of competi-
tive co-evolution between immune systems and parasites). Evolution continues
to waste resources producing mutations that have proved consistently lethal, and
it fails to take advantage of statistical similarities in the eects of dierent muta-
tions. These are all ineciencies in natural selection (when viewed as a means
of evolving intelligence) that it would be relatively easy for a human engineer to
avoid while using evolutionary algorithms to develop intelligent software.
It is plausible that eliminating ineciencies like those just described would trim
many orders of magnitude o the 0
3
–0
44
FLOPS range calculated earlier. Un-
fortunately, it is dicult to know how many orders of magnitude. It is dicult
even to make a rough estimate—for aught we know, the eciency savings could
be five orders of magnitude, or ten, or twenty-five.
5
ARTIFICIAL INTELLIGENCE
|
27
for a century (cf. Figure 3). It is plausible, however, that compared with brute-force
replication of natural evolutionary processes, vast eciency gains are achievable
by designing the search process to aim for intelligence, using various obvious
improvements over natural selection. Yet it is very hard to bound the magnitude
of those attainable eciency gains. We cannot even say whether they amount to
ve or to twenty-ve orders of magnitude. Absent further elaboration, therefore,
evolutionary arguments are not able to meaningfully constrain our expectations
of either the diculty of building human-level machine intelligence or the time-
scales for such developments.
ere is a further complication with these kinds of evolutionary considera-
tions, one that makes it hard to derive from them even a very loose upper bound
on the diculty of evolving intelligence. We must avoid the error of inferring,
from the fact that intelligent life evolved on Earth, that the evolutionary processes
involved had a reasonably high prior probability of producing intelligence. Such
an inference is unsound because it fails to take account of the observation selec-
tion eect that guarantees that all observers will nd themselves having origi-
nated on a planet where intelligent life arose, no matter how likely or unlikely it
was for any given such planet to produce intelligence. Suppose, for example, that
in addition to the systematic eects of natural selection it required an enormous
amount of lucky coincidence to produce intelligent life—enough so that intelligent
life evolves on only one planet out of every 10
30
planets on which simple replicators
arise. In that case, when we run our genetic algorithms to try to replicate what nat-
ural evolution did, we might nd that we must run some 10
30
simulations before
we nd one where all the elements come together in just the right way. is seems
fully consistent with our observation that life did evolve here on Earth. Only by
Figure 3 Supercomputer performance. In a narrow sense, “Moore’s law” refers to the obser-
vation that the number of transistors on integrated circuits have for several decades doubled
approximately every two years. However, the term is often used to refer to the more general
observation that many performance metrics in computing technology have followed a similarly fast
exponential trend. Here we plot peak speed of the world’s fastest supercomputer as a function
of time (on a logarithmic vertical scale). In recent years, growth in the serial speed of processors
has stagnated, but increased use of parallelization has enabled the total number of computations
performed to remain on the trend line.
6
1935 1945 1955 1965 1975 1985 1995 2005 2015
Speed
Year
OPS
FLOPS
10
18
10
16
10
14
10
12
10
10
10
8
10
6
10
4
10
2
0
28
|
PATHS TO SUPERINTELLIGENCE
careful and somewhat intricate reasoning—by analyzing instances of convergent
evolution of intelligence-related traits and engaging with the subtleties of obser-
vation selection theory—can we partially circumvent this epistemological bar-
rier. Unless one takes the trouble to do so, one is not in a position to rule out the
possibility that the alleged “upper bound” on the computational requirements for
recapitulating the evolution of intelligence derived in Box 3 might be too low by
thirty orders of magnitude (or some other such large number).
17
Another way of arguing for the feasibility of articial intelligence is by point-
ing to the human brain and suggesting that we could use it as a template for a
machine intelligence. One can distinguish dierent versions of this approach
based on how closely they propose to imitate biological brain functions. At one
extreme—that of very close imitation—we have the idea of whole brain emulation,
which we will discuss in the next subsection. At the other extreme are approaches
that take their inspiration from the functioning of the brain but do not attempt
low-level imitation. Advances in neuroscience and cognitive psychology— which
will be aided by improvements in instrumentation—should eventually uncover
the general principles of brain function. is knowledge could then guide AI
eorts. We have already encountered neural networks as an example of a brain-
inspired AI technique. Hierarchical perceptual organization is another idea
that has been transferred from brain science to machine learning. e study of
reinforcement learning has been motivated (at least in part) by its role in psy-
chological theories of animal cognition, and reinforcement learning techniques
(e.g. the “TD-algorithm”) inspired by these theories are now widely used in AI.
18
More cases like these will surely accumulate in the future. Since there is a limited
number— perhaps a very small number—of distinct fundamental mechanisms
that operate in the brain, continuing incremental progress in brain science should
eventually discover them all. Before this happens, though, it is possible that a
hybrid approach, combining some brain-inspired techniques with some purely
articial methods, would cross the nishing line. In that case, the resultant sys-
tem need not be recognizably brain-like even though some brain-derived insights
were used in its development.
e availability of the brain as template provides strong support for the claim
that machine intelligence is ultimately feasible. is, however, does not enable us
to predict when it will be achieved because it is hard to predict the future rate of
discoveries in brain science. What we can say is that the further into the future
we look, the greater the likelihood that the secrets of the brains functionality will
have been decoded suciently to enable the creation of machine intelligence in
this manner.
Dierent people working toward machine intelligence hold dierent views
about how promising neuromorphic approaches are compared with approaches
that aim for completely synthetic designs. e existence of birds demonstrated
that heavier-than-air ight was physically possible and prompted eorts to build
ying machines. Yet the rst functioning airplanes did not ap their wings. e
jury is out on whether machine intelligence will be like ight, which humans
ARTIFICIAL INTELLIGENCE
|
29
achieved through an articial mechanism, or like combustion, which we initially
mastered by copying naturally occurring res.
Turing’s idea of designing a program that acquires most of its content by learn-
ing, rather than having it pre-programmed at the outset, can apply equally to
neuromorphic and synthetic approaches to machine intelligence.
A variation on Turing’s conception of a child machine is the idea of a “seed
AI.”
19
Whereas a child machine, as Turing seems to have envisaged it, would have
a relatively xed architecture that simply develops its inherent potentialities by
accumulating content, a seed AI would be a more sophisticated articial intel-
ligence capable of improving its own architecture. In the early stages of a seed AI,
such improvements might occur mainly through trial and error, information
acquisition, or assistance from the programmers. At its later stages, however, a
seed AI should be able to understand its own workings suciently to engineer
new algorithms and computational structures to bootstrap its cognitive per-
formance. is needed understanding could result from the seed AI reaching
a sucient level of general intelligence across many domains, or from crossing
some threshold in a particularly relevant domain such as computer science or
mathematics.
is brings us to another important concept, that of “recursive self-
improvement.” A successful seed AI would be able to iteratively enhance itself:
an early version of the AI could design an improved version of itself, and the
improved version—being smarter than the original—might be able to design an
even smarter version of itself, and so forth.
20
Under some conditions, such a pro-
cess of recursive self-improvement might continue long enough to result in an
intelligence explosion—an event in which, in a short period of time, a system’s
level of intelligence increases from a relatively modest endowment of cognitive
capabilities (perhaps sub-human in most respects, but with a domain-specic tal-
ent for coding and AI research) to radical superintelligence. We will return to this
important possibility in Chapter 4, where the dynamics of such an event will be
analyzed more closely. Note that this model suggests the possibility of surprises:
attempts to build articial general intelligence might fail pretty much completely
until the last missing critical component is put in place, at which point a seed AI
might become capable of sustained recursive self-improvement.
Before we end this subsection, there is one more thing that we should emphasize,
which is that an articial intelligence need not much resemble a human mind. AIs
could be—indeed, it is likely that most will be—extremely alien. We should expect
that they will have very dierent cognitive architectures than biological intelli-
gences, and in their early stages of development they will have very dierent pro-
les of cognitive strengths and weaknesses (though, as we shall later argue, they
could eventually overcome any initial weakness). Furthermore, the goal systems
of AIs could diverge radically from those of human beings. ere is no reason to
expect a generic AI to be motivated by love or hate or pride or other such common
human sentiments: these complex adaptations would require deliberate expen-
sive eort to recreate in AIs. is is at once a big problem and a big opportunity.
30
|
PATHS TO SUPERINTELLIGENCE
We will return to the issue of AI motivation in later chapters, but it is so central to
the argument in this book that it is worth bearing in mind throughout.
Whole brain emulation
In whole brain emulation (also known as “uploading”), intelligent soware would
be produced by scanning and closely modeling the computational structure
of a biological brain. is approach thus represents a limiting case of drawing
inspiration from nature: barefaced plagiarism. Achieving whole brain emulation
requires the accomplishment of the following steps.
First, a suciently detailed scan of a particular human brain is created. is
might involve stabilizing the brain post-mortem through vitrication (a process
that turns tissue into a kind of glass). A machine could then dissect the tissue into
thin slices, which could be fed into another machine for scanning, perhaps by
an array of electron microscopes. Various stains might be applied at this stage to
bring out dierent structural and chemical properties. Many scanning machines
could work in parallel to process multiple brain slices simultaneously.
Second, the raw data from the scanners is fed to a computer for automated
image processing to reconstruct the three-dimensional neuronal network that
implemented cognition in the original brain. In practice, this step might pro-
ceed concurrently with the rst step to reduce the amount of high-resolution
image data stored in buers. e resulting map is then combined with a library of
neuro computational models of dierent types of neurons or of dierent neuronal
elements (such as particular kinds of synaptic connectors). Figure 4 shows some
results of scanning and image processing produced with present-day technology.
In the third stage, the neurocomputational structure resulting from the previ-
ous step is implemented on a suciently powerful computer. If completely suc-
cessful, the result would be a digital reproduction of the original intellect, with
memory and personality intact. e emulated human mind now exists as so-
ware on a computer. e mind can either inhabit a virtual reality or interface with
the external world by means of robotic appendages.
e whole brain emulation path does not require that we gure out how human
cognition works or how to program an articial intelligence. It requires only that
we understand the low-level functional characteristics of the basic computational
elements of the brain. No fundamental conceptual or theoretical breakthrough is
needed for whole brain emulation to succeed.
Whole brain emulation does, however, require some rather advanced enabling
technologies. ere are three key prerequisites: (1) scanning: high-throughput
microscopy with sucient resolution and detection of relevant properties; (2)
translation: automated image analysis to turn raw scanning data into an inter-
preted three-dimensional model of relevant neurocomputational elements; and
(3) simulation: hardware powerful enough to implement the resultant compu-
tational structure (see Table 4). (In comparison with these more challenging
WHOLE BRAIN EMULATION
|
3
steps, the construction of a basic virtual reality or a robotic embodiment with
an audiovisual input channel and some simple output channel is relatively
easy. Simple yet minimally adequate I/O seems feasible already with present
technology.
23
)
ere is good reason to think that the requisite enabling technologies are
attainable, though not in the near future. Reasonable computational models
of many types of neuron and neuronal processes already exist. Image recogni-
tion soware has been developed that can trace axons and dendrites through
a stack of two-dimensional images (though reliability needs to be improved).
And there are imaging tools that provide the necessary resolution—with a
scanning tunneling microscope it is possible to “see” individual atoms, which
is a far higher resolution than needed. However, although present knowledge
and capabilities suggest that there is no in-principle barrier to the develop-
ment of the requisite enabling technologies, it is clear that a very great deal of
incremental technical progress would be needed to bring human whole brain
emulation within reach.
24
For example, microscopy technology would need
not just sucient resolution but also sucient throughput. Using an atomic-
resolution scanning tunneling microscope to image the needed surface area
would be far too slow to be practicable. It would be more plausible to use a
lower-resolution electron microscope, but this would require new methods for
Figure 4 Reconstructing 3D neuro-
anatomy from electron microscope
images. Upper left: Atypical electron
micrograph showing cross-sections
of neuronal matter—dendrites and
axons. Upper right: Volume image of
rabbit retinal neural tissue acquired
by serial block-face scanning electron
microscopy.
2
Individual 2D images
have been stacked into a cube (with a
side of approximately µm). Bottom:
Reconstruction of a subset of the
neuronal projections filling a volume
of neuropil, generated by an auto-
mated segmentation algorithm.
22
Table 4 Capabilities needed for whole brain emulation
Scanning Pre-processing/fixation Preparing brains appropriately,
retaining relevant microstruc-
ture and state
Physical handling Methods of manipulating fixed
brains and tissue pieces before,
during, and after scanning
Imaging Volume Capability to scan entire brain
volumes in reasonable time
and expense
Resolution Scanning at sucient resolution
to enable reconstruction
Functional information Ability for scanning to detect
the functionally relevant prop-
erties of tissue
Translation Image processing Geometric
adjustment
Handling distortions due to
scanning imperfections
Data interpolation Handling missing data
Noise removal Improving scan quality
Tracing Detecting structure and
processing it into a consistent
3D model of the tissue
Scan
interpretation
Cell type identification Identifying cell types
Synapse identification Identifying synapses and their
connectivity
Parameter estimation Estimating functionally relevant
parameters of cells, synapses,
and other entities
Databasing Storing the resulting inventory
in an ecient way
Software model
of neural system
Mathematical model Model of entities and their
behavior
Ecient
implementation
Implementation of model
Simulation Storage Storage of original model and
current state
Bandwidth Ecient interprocessor
communication
CPU Processor power to run
simulation
Body simulation Simulation of body enabling
interaction with virtual environ-
ment or actual environment
via robot
Environment simulation Virtual environment for virtual
body
WHOLE BRAIN EMULATION
|
33
preparing and staining cortical tissue to make visible relevant details such as
synaptic ne structure. A great expansion of neurocomputational libraries and
major improvements in automated image processing and scan interpretation
would also be needed.
In general, whole brain emulation relies less on theoretical insight and more
on technological capability than articial intelligence. Just how much technol-
ogy is required for whole brain emulation depends on the level of abstraction at
which the brain is emulated. In this regard there is a tradeo between insight
and technology. In general, the worse our scanning equipment and the feebler
our computers, the less we could rely on simulating low-level chemical and elec-
trophysiological brain processes, and the more theoretical understanding would
be needed of the computational architecture that we are seeking to emulate in
order to create more abstract representations of the relevant functionalities.
25
Conversely, with suciently advanced scanning technology and abundant com-
puting power, it might be possible to brute-force an emulation even with a fairly
limited understanding of the brain. In the unrealistic limiting case, we could
imagine emulating a brain at the level of its elementary particles using the quan-
tum mechanical Schrödinger equation. en one could rely entirely on existing
knowledge of physics and not at all on any biological model. is extreme case,
however, would place utterly impracticable demands on computational power
and data acquisition. A far more plausible level of emulation would be one that
incorporates individual neurons and their connectivity matrix, along with some
of the structure of their dendritic trees and maybe some state variables of indi-
vidual synapses. Neurotransmitter molecules would not be simulated individu-
ally, but their uctuating concentrations would be modeled in a coarse-grained
manner.
To assess the feasibility of whole brain emulation, one must understand the
criterion for success. e aim is not to create a brain simulation so detailed and
accurate that one could use it to predict exactly what would have happened in
the original brain if it had been subjected to a particular sequence of stimuli.
Instead, the aim is to capture enough of the computationally functional prop-
erties of the brain to enable the resultant emulation to perform intellectual
work. For this purpose, much of the messy biological detail of a real brain is
irrelevant.
A more elaborate analysis would distinguish between dierent levels of emula-
tion success based on the extent to which the information-processing functional-
ity of the emulated brain has been preserved. For example, one could distinguish
among (1) a high-delity emulation that has the full set of knowledge, skills,
capacities, and values of the emulated brain; (2) a distorted emulation whose dis-
positions are signicantly non-human in some ways but which is mostly able to
do the same intellectual labor as the emulated brain; and (3) a generic emulation
(which might also be distorted) that is somewhat like an infant, lacking the skills
or memories that had been acquired by the emulated adult brain but with the
capacity to learn most of what a normal human can learn.
26
34
|
PATHS TO SUPERINTELLIGENCE
While it appears ultimately feasible to produce a high-delity emulation, it
seems quite likely that the rst whole brain emulation that we would achieve if
we went down this path would be of a lower grade. Before we would get things to
work perfectly, we would probably get things to work imperfectly. It is also pos-
sible that a push toward emulation technology would lead to the creation of some
kind of neuromorphic AI that would adapt some neurocomputational principles
discovered during emulation eorts and hybridize them with synthetic methods,
and that this would happen before the completion of a fully functional whole
brain emulation. e possibility of such a spillover into neuromorphic AI, as we
shall see in a later chapter, complicates the strategic assessment of the desirability
of seeking to expedite emulation technology.
How far are we currently from achieving a human whole brain emulation? One
recent assessment presented a technical roadmap and concluded that the prereq-
uisite capabilities might be available around mid-century, though with a large
uncertainty interval.
27
Figure 5 depicts the major milestones in this roadmap. e
apparent simplicity of the map may be deceptive, however, and we should be care-
ful not to understate how much work remains to be done. No brain has yet been
emulated. Consider the humble model organism Caenorhabditis elegans, which
is a transparent roundworm, about 1mm in length, with 302 neurons. e com-
plete connectivity matrix of these neurons has been known since the mid-1980s,
when it was laboriously mapped out by means of slicing, electron microscopy, and
Figure 5 Whole brain emulation roadmap. Schematic of inputs, activities, and milestones.
28
High performance computing
Human
emulation
Large mammal
emulation
Small mammal
emulation
Invertebrate
emulation
Sub-network
emulations
Body &
environment
simulation
Validation
methods
Imaging
Interpretation
Modeling
Automation &
process
pipeline
Microscopy
Computer vision
Neuroscience
Eutelic
emulation
Neural
granularity
determination
WHOLE BRAIN EMULATION
|
35
hand-labeling of specimens.
29
But knowing merely which neurons are connected
with which is not enough. To create a brain emulation one would also need to
know which synapses are excitatory and which are inhibitory; the strength of the
connections; and various dynamical properties of axons, synapses, and dendritic
trees. is information is not yet available even for the small nervous system of
C. elegans (although it may now be within range of a targeted moderately sized
research project).
30
Success at emulating a tiny brain, such as that of C. elegans,
would give us a better view of what it would take to emulate larger brains.
At some point in the technology development process, once techniques are
available for automatically emulating small quantities of brain tissue, the problem
reduces to one of scaling. Notice “the ladder” at the right side of Figure 5. is
ascending series of boxes represents a nal sequence of advances which can com-
mence aer preliminary hurdles have been cleared. e stages in this sequence
correspond to whole brain emulations of successively more neurologically sophis-
ticated model organisms—for example, C. elegans → honeybeemouserhesus
monkeyhuman. Because the gaps between these rungs—at least aer the rst
step—are mostly quantitative in nature and due mainly (though not entirely) to the
dierences in size of the brains to be emulated, they should be tractable through a
relatively straightforward scale-up of scanning and simulation capacity.
31
Once we start ascending this nal ladder, the eventual attainment of human
whole brain emulation becomes more clearly foreseeable.
32
We can thus expect
to get some advance warning before arrival at human-level machine intelligence
along the whole brain emulation path, at least if the last among the requisite
en abling technologies to reach sucient maturity is either high-throughput scan-
ning or the computational power needed for real-time simulation. If, however,
the last enabling technology to fall into place is neurocomputational modeling,
then the transition from unimpressive prototypes to a working human emula-
tion could be more abrupt. One could imagine a scenario in which, despite abun-
dant scanning data and fast computers, it is proving dicult to get our neuronal
models to work right. When nally the last glitch is ironed out, what was previ-
ously a completely dysfunctional system—analogous perhaps to an unconscious
brain undergoing a grand mal seizure—might snap into a coherent wakeful state.
In this case, the key advance would not be heralded by a series of functioning
animal emulations of increasing magnitude (provoking newspaper headlines of
correspondingly escalating font size). Even for those paying attention it might be
dicult to tell in advance of success just how many aws remained in the neuro-
computational models at any point and how long it would take to x them, even
up to the eve of the critical breakthrough. (Once a human whole brain emulation
has been achieved, further potentially explosive developments would take place;
but we postpone discussion of this until Chapter 4.)
Surprise scenarios are thus imaginable for whole brain emulation even if all the
relevant research were conducted in the open. Nevertheless, compared with the AI
path to machine intelligence, whole brain emulation is more likely to be preceded
36
|
PATHS TO SUPERINTELLIGENCE
by clear omens since it relies more on concrete observable technologies and is
not wholly based on theoretical insight. We can also say, with greater condence
than for the AI path, that the emulation path will not succeed in the near future
(within the next een years, say) because we know that several challenging pre-
cursor technologies have not yet been developed. By contrast, it seems likely that
somebody could in principle sit down and code a seed AI on an ordinary present-
day personal computer; and it is conceivable—though unlikely—that somebody
somewhere will get the right insight for how to do this in the near future.
Biological cognition
A third path to greater-than-current-human intelligence is to enhance the func-
tioning of biological brains. In principle, this could be achieved without tech-
nology, through selective breeding. Any attempt to initiate a classical large-scale
eugenics program, however, would confront major political and moral hurdles.
Moreover, unless the selection were extremely strong, many generations would be
required to produce substantial results. Long before such an initiative would bear
fruit, advances in biotechnology will allow much more direct control of human
genetics and neurobiology, rendering otiose any human breeding program. We
will therefore focus on methods that hold the potential to deliver results faster, on
the timescale of a few generations or less.
Our individual cognitive capacities can be strengthened in various ways,
including by such traditional methods as education and training. Neurological
development can be promoted by low-tech interventions such as optimizing
maternal and infant nutrition, removing lead and other neurotoxic pollutants
from the environment, eradicating parasites, ensuring adequate sleep and exer-
cise, and preventing diseases that aect the brain.
33
Improvements in cognition
can certainly be obtained through each of these means, though the magnitudes
of the gains are likely to be modest, especially in populations that are already
reasonably well-nourished and -schooled. We will certainly not achieve superin-
telligence by any of these means, but they might help on the margin, particularly
by liing up the deprived and expanding the catchment of global talent. (Lifelong
depression of intelligence due to iodine deciency remains widespread in many
impoverished inland areas of the world—an outrage given that the condition can
be prevented by fortifying table salt at a cost of a few cents per person and year.
34
)
Biomedical enhancements could give bigger boosts. Drugs already exist that
are alleged to improve memory, concentration, and mental energy in at least some
subjects.
35
(Work on this book was fueled by coee and nicotine chewing gum.)
While the ecacy of the present generation of smart drugs is variable, marginal,
and generally dubious, future nootropics might oer clearer benets and fewer
side eects.
36
However, it seems implausible, on both neurological and evolution-
ary grounds, that one could by introducing some chemical into the brain of a
healthy person spark a dramatic rise in intelligence.
37
e cognitive functioning
BIOLOGICAL COGNITION
|
37
of a human brain depends on a delicate orchestration of many factors, especially
during the critical stages of embryo developmentand it is much more likely that
this self-organizing structure, to be enhanced, needs to be carefully balanced,
tuned, and cultivated rather than simply ooded with some extraneous potion.
Manipulation of genetics will provide a more powerful set of tools than psy-
chopharmacology. Consider again the idea of genetic selection: instead of trying
to implement a eugenics program by controlling mating patterns, one could use
selection at the level of embryos or gametes.
38
Pre-implantation genetic diagnosis
has already been used during in vitro fertilization procedures to screen embryos
produced for monogenic disorders such as Huntingtons disease and for predis-
position to some late-onset diseases such as breast cancer. It has also been used for
sex selection and for matching human leukocyte antigen type with that of a sick
sibling, who can then benet from a cord-blood stem cell donation when the new
baby is born.
39
e range of traits that can be selected for or against will expand
greatly over the next decade or two. A strong driver of progress in behavioral
genetics is the rapidly falling cost of genotyping and gene sequencing. Genome-
wide complex trait analysis, using studies with vast numbers of subjects, is just
now starting to become feasible and will greatly increase our knowledge of the
genetic architectures of human cognitive and behavioral traits.
40
Any trait with
a non-negligible heritability—including cognitive capacity—could then become
susceptible to selection.
41
Embryo selection does not require a deep understand-
ing of the causal pathways by which genes, in complicated interplay with environ-
ments, produce phenotypes: it requires only (lots of) data on the genetic correlates
of the traits of interest.
It is possible to calculate some rough estimates of the magnitude of the gains
obtainable in dierent selection scenarios.
42
Table 5 shows expected increases
in intelligence resulting from various amounts of selection, assuming complete
information about the common additive genetic variants underlying the narrow-
sense heritability of intelligence. (With partial information, the eectiveness of
selection would be reduced, though not quite to the extent one might naively
Table 5 Maximum IQ gains from selecting among a set of embryos
43
Selection IQ points gained
 in 2 4.2
 in 0 . 5
 in 00 8.8
 in 000 24.3
5 generations of  in 0 < 65 (b/c diminishing returns)
0 generations of  in 0 < 30 (b/c diminishing returns)
Cumulative limits (additive variants optimized
for cognition)
00 + (< 300 (b/c diminishing returns))
38
|
PATHS TO SUPERINTELLIGENCE
expect.
44
) Unsurprisingly, selecting between larger numbers of embryos produces
larger gains, but there are steeply diminishing returns: selection between 100
embryos does not produce a gain anywhere near y times as large as that which
one would get from selection between 2 embryos.
45
Interestingly, the diminishment of returns is greatly abated when the selection is
spread over multiple generations. us, repeatedly selecting the top 1 in 10 over ten
generations (where each new generation consists of the ospring of those selected
in the previous generation) will produce a much greater increase in the trait value
than a one-o selection of 1 in 100. e problem with sequential selection, of course,
is that it takes longer. If each generational step takes twenty or thirty years, then
even just ve successive generations would push us well into the twenty-second
century. Long before then, more direct and powerful modes of genetic engineering
(not to mention machine intelligence) will most likely be available.
ere is, however, a complementary technology, one which, once it has been
developed for use in humans, would greatly potentiate the enhancement power
of pre-implantation genetic screening: namely, the derivation of viable sperm and
eggs from embryonic stem cells.
46
e techniques for this have already been used
to produce fertile ospring in mice and gamete-like cells in humans. Substantial
scientic challenges remain, however, in translating the animal results to humans
and in avoiding epigenetic abnormalities in the derived stem cell lines. According
to one expert, these challenges might put human application “10 or even 50 years
in the future.
47
With stem cell-derived gametes, the amount of selection power available to
a couple could be greatly increased. In current practice, an in vitro fertiliza-
tion procedure typically involves the creation of fewer than ten embryos. With
stem cell-derived gametes, a few donated cells might be turned into a virtually
unlimited number of gametes that could be combined to produce embryos, which
could then be genotyped or sequenced, and the most promising one chosen for
implantation. Depending on the cost of preparing and screening each individual
embryo, this technology could yield a severalfold increase in the selective power
available to couples using in vitro fertilization.
More importantly still, stem cell-derived gametes would allow multiple genera-
tions of selection to be compressed into less than a human maturation period, by
enabling iterated embryo selection. is is a procedure that would consist of the
following steps:
48
1 Genotype and select a number of embryos that are higher in desired genetic
characteristics.
2 Extract stem cells from those embryos and convert them to sperm and ova,
maturing within six months or less.
49
3 Cross the new sperm and ova to produce embryos.
4 Repeat until large genetic changes have been accumulated.
In this manner, it would be possible to accomplish ten or more generations of
selection in just a few years. (e procedure would be time-consuming and
BIOLOGICAL COGNITION
|
39
expensive; however, in principle, it would need to be done only once rather than
repeated for each birth. e cell lines established at the end of the procedure could
be used to generate very large numbers of enhanced embryos.)
As Table 5 indicates, the average level of intelligence among individuals con-
ceived in this manner could be very high, possibly equal to or somewhat above
that of the most intelligent individual in the historical human population. A
world that had a large population of such individuals might (if it had the culture,
education, communications infrastructure, etc., to match) constitute a collective
superintelligence.
e impact of this technology will be dampened and delayed by several fac-
tors. ere is the unavoidable maturational lag while the nally selected embryos
grow into adult human beings: at least twenty years before an enhanced child
reaches full productivity, longer still before such children come to constitute a
substantial segment of the labor force. Furthermore, even aer the technology has
been perfected, adoption rates will probably start out low. Some countries might
prohibit its use altogether, on moral or religious grounds.
50
Even where selection
is allowed, many couples will prefer the natural way of conceiving. Willingness to
use IVF, however, would increase if there were clearer benets associated with the
proceduresuch as a virtual guarantee that the child would be highly talented and
free from genetic predispositions to disease. Lower health care costs and higher
expected lifetime earnings would also argue in favor of genetic selection. As use
of the procedure becomes more common, particularly among social elites, there
might be a cultural shi toward parenting norms that present the use of selection
as the thing that responsible enlightened couples do. Many of the initially reluc-
tant might join the bandwagon in order to have a child that is not at a disadvantage
relative to the enhanced children of their friends and colleagues. Some countries
might oer inducements to encourage their citizens to take advantage of genetic
selection in order to increase the countrys stock of human capital, or to increase
long-term social stability by selecting for traits like docility, obedience, submis-
siveness, conformity, risk-aversion, or cowardice, outside of the ruling clan.
Eects on intellectual capacity would also depend on the extent to which the
available selection power would be used for enhancing cognitive traits (Table 6).
ose who do opt to use some form of embryo selection would have to choose how
to allocate the selection power at their disposal, and intelligence would to some
extent be in competition with other desired attributes, such as health, beauty, per-
sonality, or athleticism. Iterated embryo selection, by oering such a large amount
of selection power, would alleviate some of these tradeos, enabling simultaneous
strong selection for multiple traits. However, this procedure would tend to disrupt
the normal genetic relationship between parents and child, something that could
negatively aect demand in many cultures.
51
With further advances in genetic technology, it may become possible to syn-
thesize genomes to specication, obviating the need for large pools of embryos.
DNA synthesis is already a routine and largely automated biotechnology, though
it is not yet feasible to synthesize an entire human genome that could be used in
40
|
PATHS TO SUPERINTELLIGENCE
Table 6 Possible impacts from genetic selection in dierent scenarios
52
Adoption /
technology
IVF+”
Selection of 1 of 2 embryos
[4 points]
Aggressive IVF
Selection of 1 of 10 embryos
[12 points]
In vitro egg”
Selection of 1 of 100 embryos
[19 points]
Iterated embryo
selection”
[100+ points]
Marginal fertility
practice”
~ 0.25% adoption
Socially negligible over one
generation. Eects of social
controversy more important
than direct impacts.
Socially negligible over one
generation. Eects of social
controversy more important
than direct impacts.
Enhanced contingent form
noticeable minority in highly
cognitively selective positions.
Selected dominate ranks of
elite scientists, attorneys,
physicians, engineers. Intel-
lectual Renaissance?
“Elite advantage”
0% adoption
Slight cognitive impact in st
generation, combines with
selection for non-cognitive
traits to perceptibly advantage
a minority.
Large fraction of Harvard
undergraduates enhanced.
2nd generation dominate
cognitively demanding
professions.
Selected dominate ranks of
scientists, attorneys, physicians,
engineers in st generation.
“Posthumanity”
53
New normal”
> 90% adoption
Learning disability much less
frequent among children.
In 2nd generation, population
above high IQ thresholds
more than doubled.
Substantial growth in educa-
tional attainment, income.
2nd generation manyfold
increase at right tail.
Raw IQs typical for eminent
scientists 0+ times as common
in st generation. Thousands of
times in 2nd generation.
“Posthumanity”
BIOLOGICAL COGNITION
|
4
a reproductive context (not least because of still-unresolved diculties in getting
the epigenetics right).
54
But once this technology has matured, an embryo could
be designed with the exact preferred combination of genetic inputs from each
parent. Genes that are present in neither of the parents could also be spliced in,
including alleles that are present with low frequency in the population but which
may have signicant positive eects on cognition.
55
One intervention that becomes possible when human genomes can be synthe-
sized is genetic “spell-checking” of an embryo. (Iterated embryo selection might
also allow an approximation of this.) Each of us currently carries a mutational
load, with perhaps hundreds of mutations that reduce the eciency of various
cellular processes.
56
Each individual mutation has an almost negligible eect
(whence it is only slowly removed from the gene pool), yet in combination such
mutations may exact a heavy toll on our functioning.
57
Individual dierences in
intelligence might to a signicant extent be attributable to variations in the num-
ber and nature of such slightly deleterious alleles that each of us carries. With gene
synthesis we could take the genome of an embryo and construct a version of that
genome free from the genetic noise of accumulated mutations. If one wished to
speak provocatively, one could say that individuals created from such proofread
genomes might be “more human” than anybody currently alive, in that they would
be less distorted expressions of human form. Such people would not all be carbon
copies, because humans vary genetically in ways other than by carrying dierent
deleterious mutations. But the phenotypical manifestation of a proofread genome
may be an exceptional physical and mental constitution, with elevated function-
ing in polygenic trait dimensions like intelligence, health, hardiness, and appear-
ance.
58
(A loose analogy could be made with composite faces, in which the defects
of the superimposed individuals are averaged out: see Figure 6.)
Figure 6 Composite faces as a metaphor for spell-checked genomes. Each of the central pic-
tures was produced by superimposing photographs of sixteen dierent individuals (residents
of Tel Aviv). Composite faces are often judged to be more beautiful than any of the individual
faces of which they are composed, as idiosyncratic imperfections are averaged out. Analogously,
by removing individual mutations, proofread genomes may produce people closer to “Platonic
ideals.” Such individuals would not all be genetically identical, because many genes come in multiple
equally functional alleles. Proofreading would only eliminate variance arising from deleterious
mutations.
59
42
|
PATHS TO SUPERINTELLIGENCE
Other potential biotechnological techniques might also be relevant. Human
reproductive cloning, once achieved, could be used to replicate the genome of
exceptionally talented individuals. Uptake would be limited by the preference of
most prospective parents to be biologically related to their children, yet the prac-
tice could nevertheless come to have non-negligible impact because (1) even a rel-
atively small increase in the number of exceptionally talented people might have
a signicant eect; and (2) it is possible that some state would embark on a larger-
scale eugenics program, perhaps by paying surrogate mothers. Other kinds of
genetic engineering—such as the design of novel synthetic genes or insertion into
the genome of promoter regions and other elements to control gene expression—
might also become important over time. Even more exotic possibilities may exist,
such as vats full of complexly structured cultured cortical tissue, or “uplied
transgenic animals (perhaps some large-brained mammal such as the whale or
elephant, enriched with human genes). ese latter ones are wholly speculative,
but over a longer time frame they perhaps cannot be completely discounted.
So far we have discussed germline interventions, ones that would be done
on gametes or embryos. Somatic gene enhancements, by bypassing the genera-
tion cycle, could in principle produce impacts more quickly. However, they are
technologically much more challenging. ey require that the modied genes
be inserted into a large number of cells in the living body—including, in the
case of cognitive enhancement, the brain. Selecting among existing egg cells or
embryos, in contrast, requires no gene insertion. Even such germline therapies as
do involve modifying the genome (such as proofreading the genome or splicing in
rare alleles) are far easier to implement at the gamete or the embryo stage, where
one is dealing with a small number of cells. Furthermore, germline interven-
tions on embryos can probably achieve greater eects than somatic interventions
on adults, because the former would be able to shape early brain development
whereas the latter would be limited to tweaking an existing structure. (Some of
what could be done through somatic gene therapy might also be achievable by
pharmacological means.)
Focusing therefore on germline interventions, we must take into account the
generational lag delaying any large impact on the world.
60
Even if the technol-
ogy were perfected today and immediately put to use, it would take more than
two decades for a genetically enhanced brood to reach maturity. Furthermore,
with human applications there is normally a delay of at least one decade between
proof of concept in the laboratory and clinical application, because of the need
for extensive studies to determine safety. e simplest forms of genetic selec-
tion, however, could largely abrogate the need for such testing, since they would
use standard fertility treatment techniques and genetic information to choose
between embryos that might otherwise have been selected by chance.
Delays could also result from obstacles rooted not in a fear of failure (demand for
safety testing) but in fear of success—demand for regulation driven by concerns
about the moral permissibility of genetic selection or its wider social implications.
Such concerns are likely to be more inuential in some countries than in others,
BIOLOGICAL COGNITION
|
43
owing to diering cultural, historical, and religious contexts. Post-war Germany,
for example, has chosen to give a wide berth to any reproductive practices that
could be perceived to be even in the remotest way aimed at enhancement, a stance
that is understandable given the particularly dark history of atrocities connected
to the eugenics movement in that country. Other Western countries are likely to
take a more liberal approach. And some countries—perhaps China or Singapore,
both of which have long-term population policies—might not only permit but
actively promote the use of genetic selection and genetic engineering to enhance
the intelligence of their populations once the technology to do so is available.
Once the example has been set, and the results start to show, holdouts will have
strong incentives to follow suit. Nations would face the prospect of becoming cog-
nitive backwaters and losing out in economic, scientic, military, and prestige
contests with competitors that embrace the new human enhancement technolo-
gies. Individuals within a society would see places at elite schools being lled with
genetically selected children (who may also on average be prettier, healthier, and
more conscientious) and will want their own ospring to have the same advan-
tages. ere is some chance that a large attitudinal shi could take place over a
relatively short time, perhaps in as little as a decade, once the technology is proven
to work and to provide a substantial benet. Opinion surveys in the United States
reveal a dramatic shi in public approval of in vitro fertilization aer the birth
of the rst “test tube baby,” Louise Brown, in 1978. A few years earlier, only 18%
of Americans said they would personally use IVF to treat infertility; yet in a poll
taken shortly aer the birth of Louise Brown, 53% said they would do so, and the
number has continued to rise.
61
(For comparison, in a poll taken in 2004, 28%
of Americans approved of embryo selection for “strength or intelligence,” 58%
approved of it for avoiding adult-onset cancer, and 68% approved of it to avoid
fatal childhood disease.
62
)
If we add up the various delays—say ve to ten years to gather the information
needed for signicantly eective selection among a set of IVF embryos (possi-
bly much longer before stem cell-derived gametes are available for use in human
reproduction), ten years to build signicant uptake, and twenty to twenty-ve
years for the enhanced generation to reach an age where they start becoming pro-
ductive, we nd that germline enhancements are unlikely to have a signicant
impact on society before the middle of this century. From that point onward, how-
ever, the intelligence of signicant segments of the adult population may begin to
be boosted by genetic enhancements. e speed of the ascent would then greatly
accelerate as cohorts conceived using more powerful next-generation genetic
technologies (in particular stem cell-derived gametes and iterative embryo selec-
tion) enter the labor force.
With the full development of the genetic technologies described above (setting
aside the more exotic possibilities such as intelligence in cultured neural tissue),
it might be possible to ensure that new individuals are on average smarter than
any human who has yet existed, with peaks that rise higher still. e potential
of biological enhancement is thus ultimately high, probably sucient for the
44
|
PATHS TO SUPERINTELLIGENCE
attainment of at least weak forms of superintelligence. is should not be sur-
prising. Aer all, dumb evolutionary processes have dramatically amplied the
intelligence in the human lineage even compared with our close relatives the great
apes and our own humanoid ancestors; and there is no reason to suppose Homo
sapiens to have reached the apex of cognitive eectiveness attainable in a bio-
logical system. Far from being the smartest possible biological species, we are
probably better thought of as the stupidest possible biological species capable of
starting a technological civilization—a niche we lled because we got there rst,
not because we are in any sense optimally adapted to it.
Progress along the biological path is clearly feasible. e generational lag in
germline interventions means that progress could not be nearly as sudden and
abrupt as in scenarios involving machine intelligence. (Somatic gene therapies
and pharmacological interventions could theoretically skip the generational lag,
but they seem harder to perfect and are less likely to produce dramatic eects.) e
ultimate potential of machine intelligence is, of course, vastly greater than that
of organic intelligence. (One can get some sense of the magnitude of the gap by
considering the speed dierential between electronic components and nerve cells:
even today’s transistors operate on a timescale ten million times shorter than that
of biological neurons.) However, even comparatively moderate enhancements of
biological cognition could have important consequences. In particular, cognitive
enhancement could accelerate science and technology, including progress toward
more potent forms of biological intelligence amplication and machine intelli-
gence. Consider how the rate of progress in the eld of articial intelligence would
change in a world where Average Joe is an intellectual peer of Alan Turing or John
von Neumann, and where millions of people tower far above any intellectual giant
of the past.
63
A discussion of the strategic implications of cognitive enhancement will have
to await a later chapter. But we can summarize this section by noting three con-
clusions: (1) at least weak forms of superintelligence are achievable by means
of biotechnological enhancements; (2) the feasibility of cognitively enhanced
humans adds to the plausibility that advanced forms of machine intelligence are
feasible—because even if we were fundamentally unable to create machine intel-
ligence (which there is no reason to suppose), machine intelligence might still be
within reach of cognitively enhanced humans; and (3) when we consider scenarios
stretching signicantly into the second half of this century and beyond, we must
take into account the probable emergence of a generation of genetically enhanced
populations—voters, inventors, scientists—with the magnitude of enhancement
escalating rapidly over subsequent decades.
Braincomputer interfaces
It is sometimes proposed that direct brain–computer interfaces, particularly
implants, could enable humans to exploit the fortes of digital computing—perfect
BRAINCOMPUTER INTERFACES
|
45
recall, speedy and accurate arithmetic calculation, and high-bandwidth data
transmission—enabling the resulting hybrid system to radically outperform the
unaugmented brain.
64
But although the possibility of direct connections between
human brains and computers has been demonstrated, it seems unlikely that such
interfaces will be widely used as enhancements any time soon.
65
To begin with, there are signicant risks of medical complications—including
infections, electrode displacement, hemorrhage, and cognitive decline—when
implanting electrodes in the brain. Perhaps the most vivid illustration to date of
the benets that can be obtained through brain stimulation is the treatment of
patients with Parkinson’s disease. e Parkinsons implant is relatively simple:
it does not really communicate with the brain but simply supplies a stimulat-
ing electric current to the subthalamic nucleus. A demonstration video shows a
subject slumped in a chair, completely immobilized by the disease, then suddenly
springing to life when the current is switched on: the subject now moves his arms,
stands up and walks across the room, turns around and performs a pirouette. Yet
even behind this especially simple and almost miraculously successful procedure,
there lurk negatives. One study of Parkinson patients who had received deep brain
implants showed reductions in verbal uency, selective attention, color naming,
and verbal memory compared with controls. Treated subjects also reported more
cognitive complaints.
66
Such risks and side eects might be tolerable if the proced-
ure is used to alleviate severe disability. But in order for healthy subjects to vol-
unteer themselves for neurosurgery, there would have to be some very substantial
enhancement of normal functionality to be gained.
is brings us to the second reason to doubt that superintelligence will be
achieved through cyborgization, namely that enhancement is likely to be far
more dicult than therapy. Patients who suer from paralysis might benet from
an implant that replaces their severed nerves or activates spinal motion pattern
generators.
67
Patients who are deaf or blind might benet from articial coch-
leae and retinas.
68
Patients with Parkinson’s disease or chronic pain might ben-
et from deep brain stimulation that excites or inhibits activity in a particular
area of the brain.
69
What seems far more dicult to achieve is a high-bandwidth
direct interaction between brain and computer to provide substantial increases
in intelligence of a form that could not be more readily attained by other means.
Most of the potential benets that brain implants could provide in healthy sub-
jects could be obtained at far less risk, expense, and inconvenience by using our
regular motor and sensory organs to interact with computers located outside of
our bodies. We do not need to plug a ber optic cable into our brains in order to
access the Internet. Not only can the human retina transmit data at an impressive
rate of nearly 10 million bits per second, but it comes pre-packaged with a massive
amount of dedicated wetware, the visual cortex, that is highly adapted to extract-
ing meaning from this information torrent and to interfacing with other brain
areas for further processing.
70
Even if there were an easy way of pumping more
information into our brains, the extra data inow would do little to increase the
rate at which we think and learn unless all the neural machinery necessary for
46
|
PATHS TO SUPERINTELLIGENCE
making sense of the data were similarly upgraded. Since this includes almost all of
the brain, what would really be needed is a “whole brain prosthesis”—which is just
another way of saying articial general intelligence. Yet if one had a human-level
AI, one could dispense with neurosurgery: a computer might as well have a metal
casing as one of bone. So this limiting case just takes us back to the AI path, which
we have already examined.
Brain–computer interfacing has also been proposed as a way to get informa-
tion out of the brain, for purposes of communicating with other brains or with
machines.
71
Such uplinks have helped patients with locked-in syndrome to com-
municate with the outside world by enabling them to move a cursor on a screen
by thought.
72
e bandwidth attained in such experiments is low: the patient
painstakingly types out one slow letter aer another at a rate of a few words per
minute. One can readily imagine improved versions of this technology—perhaps
a next-generation implant could plug into Broca’s area (a region in the frontal
lobe involved in language production) and pick up internal speech.
73
But whilst
such a technology might assist some people with disabilities induced by stroke or
muscular degeneration, it would hold little appeal for healthy subjects. e func-
tionality it would provide is essentially that of a microphone coupled with speech
recognition soware, which is already commercially available—minus the pain,
inconvenience, expense, and risks associated with neurosurgery (and minus at
least some of the hyper-Orwellian overtones of an intracranial listening device).
Keeping our machines outside of our bodies also makes upgrading easier.
But what about the dream of bypassing words altogether and establishing a
connection between two brains that enables concepts, thoughts, or entire areas
of expertise to be “downloaded” from one mind to another? We can download
large les to our computers, including libraries with millions of books and arti-
cles, and this can be done over the course of seconds: could something similar be
done with our brains? e apparent plausibility of this idea probably derives from
an incorrect view of how information is stored and represented in the brain. As
noted, the rate-limiting step in human intelligence is not how fast raw data can be
fed into the brain but rather how quickly the brain can extract meaning and make
sense of the data. Perhaps it will be suggested that we transmit meanings directly,
rather than package them into sensory data that must be decoded by the recipient.
ere are two problems with this. e rst is that brains, by contrast to the kinds
of program we typically run on our computers, do not use standardized data stor-
age and representation formats. Rather, each brain develops its own idiosyncratic
representations of higher-level content. Which particular neuronal assemblies
are recruited to represent a particular concept depends on the unique experi-
ences of the brain in question (along with various genetic factors and stochastic
physiological processes). Just as in articial neural nets, meaning in biological
neural networks is likely represented holistically in the structure and activity pat-
terns of sizeable overlapping regions, not in discrete memory cells laid out in neat
arrays.
74
It would therefore not be possible to establish a simple mapping between
the neurons in one brain and those in another in such a way that thoughts could
BRAINCOMPUTER INTERFACES
|
47
automatically slide over from one to the other. In order for the thoughts of one
brain to be intelligible to another, the thoughts need to be decomposed and pack-
aged into symbols according to some shared convention that allows the symbols
to be correctly interpreted by the receiving brain. is is the job of language.
In principle, one could imagine ooading the cognitive work of articulation
and interpretation to an interface that would somehow read out the neural states
in the sender’s brain and somehow feed in a bespoke pattern of activation to the
receiver’s brain. But this brings us to the second problem with the cyborg scenario.
Even setting aside the (quite immense) technical challenge of how to reliably read
and write simultaneously from perhaps billions of individually addressable neu-
rons, creating the requisite interface is probably an AI-complete problem. e
interface would need to include a component able (in real-time) to map ring
patterns in one brain onto semantically equivalent ring patterns in the other
brain. e detailed multilevel understanding of the neural computation needed to
accomplish such a task would seem to directly enable neuromorphic AI.
Despite these reservations, the cyborg route toward cognitive enhancement is
not entirely without promise. Impressive work on the rat hippocampus has dem-
onstrated the feasibility of a neural prosthesis that can enhance performance in a
simple working-memory task.
75
In its present version, the implant collects input
from a dozen or two electrodes located in one area (“CA3”) of the hippocampus
and projects onto a similar number of neurons in another area (“CA1”). A micro-
processor is trained to discriminate between two dierent ring patterns in the
rst area (corresponding to two dierent memories, “right lever” or le lever”)
and to learn how these patterns are projected into the second area. is prosthe-
sis can not only restore function when the normal neural connection between
the two neural areas is blockaded, but by sending an especially clear token of a
particular memory pattern to the second area it can enhance the performance
on the memory task beyond what the rat is normally capable of. While a techni-
cal tour de force by contemporary standards, the study leaves many challenging
questions unanswered: How well does the approach scale to greater numbers of
memories? How well can we control the combinatorial explosion that otherwise
threatens to make learning the correct mapping infeasible as the number of input
and output neurons is increased? Does the enhanced performance on the test task
come at some hidden cost, such as reduced ability to generalize from the particu-
lar stimulus used in the experiment, or reduced ability to unlearn the association
when the environment changes? Would the test subjects still somehow benet
even if—unlike rats—they could avail themselves of external memory aids such
as pen and paper? And how much harder would it be to apply a similar method
to other parts of the brain? Whereas the present prosthesis takes advantage of the
relatively simple feed-forward structure of parts of the hippocampus (basically
serving as a unidirectional bridge between areas CA3 and CA1), other structures
in the cortex involve convoluted feedback loops which greatly increase the com-
plexity of the wiring diagram and, presumably, the diculty of deciphering the
functionality of any embedded group of neurons.
48
|
PATHS TO SUPERINTELLIGENCE
One hope for the cyborg route is that the brain, if permanently implanted with
a device connecting it to some external resource, would over time learn an eec-
tive mapping between its own internal cognitive states and the inputs it receives
from, or the outputs accepted by, the device. en the implant itself would not
need to be intelligent; rather, the brain would intelligently adapt to the interface,
much as the brain of an infant gradually learns to interpret the signals arriving
from receptors in its eyes and ears.
76
But here again one must question how much
would really be gained. Suppose that the brain’s plasticity were such that it could
learn to detect patterns in some new input stream arbitrary projected onto some
part of the cortex by means of a brain–computer interface: why not project the
same information onto the retina instead, as a visual pattern, or onto the coch-
lea as sounds? e low-tech alternative avoids a thousand complications, and in
either case the brain could deploy its pattern-recognition mechanisms and plas-
ticity to learn to make sense of the information.
Networks and organizations
Another conceivable path to superintelligence is through the gradual enhance-
ment of networks and organizations that link individual human minds with one
another and with various artifacts and bots. e idea here is not that this would
enhance the intellectual capacity of individuals enough to make them superintel-
ligent, but rather that some system composed of individuals thus networked and
organized might attain a form of superintelligence—what in the next chapter we
will elaborate as “collective superintelligence.
77
Humanity has gained enormously in collective intelligence over the course of
history and prehistory. e gains come from many sources, including innova-
tions in communications technology, such as writing and printing, and above
all the introduction of language itself; increases in the size of the world popula-
tion and the density of habitation; various improvements in organizational tech-
niques and epistemic norms; and a gradual accumulation of institutional capital.
In general terms, a system’s collective intelligence is limited by the abilities of its
member minds, the overheads in communicating relevant information between
them, and the various distortions and ineciencies that pervade human organi-
zations. If communication overheads are reduced (including not only equipment
costs but also response latencies, time and attention burdens, and other factors),
then larger and more densely connected organizations become feasible. e same
could happen if xes are found for some of the bureaucratic deformations that
warp organizational life—wasteful status games, mission creep, concealment or
falsication of information, and other agency problems. Even partial solutions to
these problems could pay hey dividends for collective intelligence.
e technological and institutional innovations that could contribute to the
growth of our collective intelligence are many and various. For example, sub-
sidized prediction markets might foster truth-seeking norms and improve
NETWORKS AND ORGANIZATIONS
|
49
forecasting on contentious scientic and social issues.
78
Lie detectors (should it
prove feasible to make ones that are reliable and easy to use) could reduce the
scope for deception in human aairs.
79
Self-deception detectors might be even
more powerful.
80
Even without newfangled brain technologies, some forms of
deception might become harder to practice thanks to increased availability of
many kinds of data, including reputations and track records, or the promulga-
tion of strong epistemic norms and rationality culture. Voluntary and involun-
tary surveillance will amass vast amounts of information about human behavior.
Social networking sites are already used by over a billion people to share personal
details: soon, these people might begin uploading continuous life recordings from
microphones and video cameras embedded in their smart phones or eyeglass
frames. Automated analysis of such data streams will enable many new applica-
tions (sinister as well as benign, of course).
81
Growth in collective intelligence may also come from more general organi-
zational and economic improvements, and from enlarging the fraction of the
world’s population that is educated, digitally connected, and integrated into
global intellectual culture.
82
e Internet stands out as a particularly dynamic frontier for innovation and
experimentation. Most of its potential may still remain unexploited. Continuing
development of an intelligent Web, with better support for deliberation, de-
biasing, and judgment aggregation, might make large contributions to increasing
the collective intelligence of humanity as a whole or of particular groups.
But what of the seemingly more fanciful idea that the Internet might one day
“wake up”? Could the Internet become something more than just the backbone
of a loosely integrated collective superintelligence—something more like a vir-
tual skull housing an emerging unied super-intellect? (is was one of the ways
that superintelligence could arise according to Vernor Vinge’s inuential 1993
essay, which coined the term “technological singularity.”
83
) Against this one could
object that machine intelligence is hard enough to achieve through arduous
engineering, and that it is incredible to suppose that it will arise spontaneously.
However, the story need not be that some future version of the Internet suddenly
becomes superintelligent by mere happenstance. A more plausible version of the
scenario would be that the Internet accumulates improvements through the work
of many people over many years—work to engineer better search and information
ltering algorithms, more powerful data representation formats, more capable
autonomous soware agents, and more ecient protocols governing the inter-
actions between such bots—and that myriad incremental improvements even-
tually create the basis for some more unied form of web intelligence. It seems
at least conceivable that such a web-based cognitive system, supersaturated with
computer power and all other resources needed for explosive growth save for one
crucial ingredient, could, when the nal missing constituent is dropped into the
cauldron, blaze up with superintelligence. is type of scenario, though, con-
verges into another possible path to superintelligence, that of articial general
intelligence, which we have already discussed.
50
|
PATHS TO SUPERINTELLIGENCE
Summary
e fact that there are many paths that lead to superintelligence should increase
our condence that we will eventually get there. If one path turns out to be
blocked, we can still progress.
at there are multiple paths does not entail that there are multiple destina-
tions. Even if signicant intelligence amplication were rst achieved along one
of the non-machine-intelligence paths, this would not render machine intelli-
gence irrelevant. Quite the contrary: enhanced biological or organizational intel-
ligence would accelerate scientic and technological developments, potentially
hastening the arrival of more radical forms of intelligence amplication such as
whole brain emulation and AI.
is is not to say that it is a matter of indierence how we get to machine super-
intelligence. e path taken to get there could make a big dierence to the even-
tual outcome. Even if the ultimate capabilities that are obtained do not depend
much on the trajectory, how those capabilities will be used—how much control
we humans have over their disposition—might well depend on details of our
approach. For example, enhancements of biological or organizational intelligence
might increase our ability to anticipate risk and to design machine superintelli-
gence that is safe and benecial. (A full strategic assessment involves many com-
plexities, and will have to await Chapter 14.)
True superintelligence (as opposed to marginal increases in current levels of
intelligence) might plausibly rst be attained via the AI path. ere are, however,
many fundamental uncertainties along this path. is makes it dicult to rigor-
ously assess how long the path is or how many obstacles there are along the way.
e whole brain emulation path also has some chance of being the quickest route
to superintelligence. Since progress along this path requires mainly incremental
technological advances rather than theoretical breakthroughs, a strong case can
be made that it will eventually succeed. It seems fairly likely, however, that even if
progress along the whole brain emulation path is swi, articial intelligence will
nevertheless be rst to cross the nishing line: this is because of the possibility of
neuromorphic AIs based on partial emulations.
Biological cognitive enhancements are clearly feasible, particularly ones based
on genetic selection. Iterated embryo selection currently seems like an especially
promising technology. Compared with possible breakthroughs in machine intel-
ligence, however, biological enhancements would be relatively slow and gradual.
ey would, at best, result in relatively weak forms of superintelligence (more on
this shortly).
e clear feasibility of biological enhancement should increase our condence
that machine intelligence is ultimately achievable, since enhanced human sci-
entists and engineers will be able to make more and faster progress than their
aunaturel counterparts. Especially in scenarios in which machine intelligence
is delayed beyond mid-century, the increasingly cognitively enhanced cohorts
coming onstage will play a growing role in subsequent developments.
SUMMARY
|
5
Brain–computer interfaces look unlikely as a source of superintelligence.
Improvements in networks and organizations might result in weakly superintelli-
gent forms of collective intelligence in the long run; but more likely, they will play
an enabling role similar to that of biological cognitive enhancement, gradually
increasing humanitys eective ability to solve intellectual problems. Compared
with biological enhancements, advances in networks and organization will make
a dierence sooner—in fact, such advances are occurring continuously and are
having a signicant impact already. However, improvements in networks and
organizations may yield narrower increases in our problem-solving capacity than
will improvements in biological cognition—boosting “collective intelligence”
rather than “quality intelligence,” to anticipate a distinction we are about to intro-
duce in the next chapter.
52
|
FORMS OF SUPERINTELLIGENCE
CHAPTER 3
Forms of superintelligence
S
o what, exactly, do we mean by “superintelligence”? While we do not
wish to get bogged down in terminological swamps, something needs
to be said to clarify the conceptual ground. This chapter identifies
three different forms of superintelligence, and argues that they are, in a
practically relevant sense, equivalent. We also show that the potential for
intelligence in a machine substrate is vastly greater than in a biological sub-
strate. Machines have a number of fundamental advantages which will give
them overwhelming superiority. Biological humans, even if enhanced, will
be outclassed.
Many machines and nonhuman animals already perform at superhuman lev-
els in narrow domains. Bats interpret sonar signals better than man, calculators
outperform us in arithmetic, and chess programs beat us in chess. e range of
specic tasks that can be better performed by soware will continue to expand.
But although specialized information processing systems will have many uses,
there are additional profound issues that arise only with the prospect of machine
intellects that have enough general intelligence to substitute for humans across
the board.
As previously indicated, we use the term “superintelligence” to refer to
intellects that greatly outperform the best current human minds across many
very general cognitive domains. This is still quite vague. Different kinds of
system with rather disparate performance attributes could qualify as super-
intelligences under this definition. To advance the analysis, it is helpful to
disaggregate this simple notion of superintelligence by distinguishing dif-
ferent bundles of intellectual super-capabilities. There are many ways in
which such decomposition could be done. Here we will differentiate between
three forms: speed superintelligence, collective superintelligence, and quality
superintelligence.
SPEED SUPERINTELLIGENCE
|
53
Speed superintelligence
A speed superintelligence is an intellect that is just like a human mind but faster.
is is conceptually the easiest form of superintelligence to analyze.
1
We can
dene speed superintelligence as follows:
Speed superintelligence: A system that can do all that a human intellect can do, but much
faster.
By “much” we here mean something like “multiple orders of magnitude.” But
rather than try to expunge every remnant of vagueness from the denition, we
will entrust the reader with interpreting it sensibly.
2
e simplest example of speed superintelligence would be a whole brain emula-
tion running on fast hardware.
3
An emulation operating at a speed of ten thou-
sand times that of a biological brain would be able to read a book in a few seconds
and write a PhD thesis in an aernoon. With a speedup factor of a million, an
emulation could accomplish an entire millennium of intellectual work in one
working day.
4
To such a fast mind, events in the external world appear to unfold in slow
motion. Suppose your mind ran at 10,000×. If your eshly friend should happen
to drop his teacup, you could watch the porcelain slowly descend toward the car-
pet over the course of several hours, like a comet silently gliding through space
toward an assignation with a far-o planet; and, as the anticipation of the com-
ing crash tardily propagates through the folds of your friends gray matter and
from thence out into his peripheral nervous system, you could observe his body
gradually assuming the aspect of a frozen oops—enough time for you not only
to order a replacement cup but also to read a couple of scientic papers and
take a nap.
Because of this apparent time dilation of the material world, a speed superin-
telligence would prefer to work with digital objects. It could live in virtual reality
and deal in the information economy. Alternatively, it could interact with the
physical environment by means of nanoscale manipulators, since limbs at such
small scales could operate faster than macroscopic appendages. (e characteris-
tic frequency of a system tends to be inversely proportional to its length scale.
5
) A
fast mind might commune mainly with other fast minds rather than with brady-
telic, molasses-like humans.
e speed of light becomes an increasingly important constraint as minds get
faster, since faster minds face greater opportunity costs in the use of their time for
traveling or communicating over long distances.
6
Light is roughly a million times
faster than a jet plane, so it would take a digital agent with a mental speedup of
1,000,000× about the same amount of subjective time to travel across the globe as
it does a contemporary human journeyer. Dialing somebody long distance would
take as long as getting there “in person,” though it would be cheaper as a call would
54
|
FORMS OF SUPERINTELLIGENCE
require less bandwidth. Agents with large mental speedups who want to converse
extensively might nd it advantageous to move near one another. Extremely fast
minds with need for frequent interaction (such as members of a work team) may
take up residence in computers located in the same building to avoid frustrating
latencies.
Collective superintelligence
Another form of superintelligence is a system achieving superior performance by
aggregating large numbers of smaller intelligences:
Collective superintelligence: A system composed of a large number of smaller intellects
such that the system’s overall performance across many very general domains vastly outstrips
that of any current cognitive system.
Collective superintelligence is less conceptually clear-cut than speed superintel-
ligence.
7
However, it is more familiar empirically. While we have no experience
with human-level minds that dier signicantly in clock speed, we do have ample
experience with collective intelligence, systems composed of various numbers
of human-level components working together with various degrees of eciency.
Firms, work teams, gossip networks, advocacy groups, academic communities,
countries, even humankind as a whole, can—if we adopt a somewhat abstract
perspective—be viewed as loosely dened “systems” capable of solving classes of
intellectual problems. From experience, we have some sense of how easily dier-
ent tasks succumb to the eorts of organizations of various size and composition.
Collective intelligence excels at solving problems that can be readily broken
into parts such that solutions to sub-problems can be pursued in parallel and veri-
ed independently. Tasks like building a space shuttle or operating a hamburger
franchise oer myriad opportunities for division of labor: dierent engineers
work on dierent components of the spacecra; dierent stas operate dier-
ent restaurants. In academia, the rigid division of researchers, students, journals,
grants, and prizes into separate self-contained disciplines—though unconducive
to the type of work represented by this book—might (only in a conciliatory and
mellow frame of mind) be viewed as a necessary accommodation to the practi-
calities of allowing large numbers of diversely motivated individuals and teams to
contribute to the growth of human knowledge while working relatively indepen-
dently, each plowing their own furrow.
A systems collective intelligence could be enhanced by expanding the number
or the quality of its constituent intellects, or by improving the quality of their
organization.
8
To obtain a collective superintelligence from any present-day collec-
tive intelligence would require a very great degree of enhancement. e resulting
system would need to be capable of vastly outperforming any current collective
intelligence or other cognitive system across many very general domains. A new
conference format that lets scholars exchange information more eectively, or
COLLECTIVE SUPERINTELLIGENCE
|
55
a new collaborative information-ltering algorithm that better predicted users’
ratings of books and movies, would clearly not on its own amount to anything
approaching collective superintelligence. Nor would a 50% increase in the world
population, or an improvement in pedagogical method that enabled students to
complete a school day in four hours instead of six. Some far more extreme growth
of humanitys collective cognitive capacity would be required to meet the crite-
rion of collective superintelligence.
Note that the threshold for collective superintelligence is indexed to the per-
formance levels of the present—that is, the early twenty-rst century. Over the
course of human prehistory, and again over the course of human history, human-
ity’s collective intelligence has grown by very large factors. World population, for
example, has increased by at least a factor of a thousand since the Pleistocene.
9
On
this basis alone, current levels of human collective intelligence could be regarded
as approaching superintelligence relative to a Pleistocene baseline. Some improve-
ments in communications technologies—especially spoken language, but perhaps
also cities, writing, and printing—could also be argued to have, individually or in
combination, provided super-sized boosts, in the sense that if another innovation
of comparable impact to our collective intellectual problem-solving capacity were
to happen, it would result in collective superintelligence.
10
A certain kind of reader will be tempted at this point to interject that modern
society does not seem so particularly intelligent. Perhaps some unwelcome politi-
cal decision has just been made in the reader’s home country, and the apparent
unwisdom of that decision now looms large in the reader’s mind as evidence of
the mental incapacity of the modern era. And is it not the case that contemporary
humanity is idolizing material consumption, depleting natural resources, pollut-
ing the environment, decimating species diversity, all the while failing to remedy
screaming global injustices and neglecting paramount humanistic or spiritual
values? However, setting aside the question of how modernitys shortcomings
stack up against the not-so-inconsiderable failings of earlier epochs, nothing in
our denition of collective superintelligence implies that a society with greater
collective intelligence is necessarily better o. e denition does not even imply
that the more collectively intelligent society is wiser. We can think of wisdom as
the ability to get the important things approximately right. It is then possible to
imagine an organization composed of a very large cadre of very eciently coordi-
nated knowledge workers, who collectively can solve intellectual problems across
many very general domains. is organization, let us suppose, can operate most
kinds of businesses, invent most kinds of technologies, and optimize most kinds
of processes. Even so, it might get a few key big-picture issues entirely wrong—
for instance, it may fail to take proper precautions against existential risks—and
as a result pursue a short explosive growth spurt that ends ingloriously in total
collapse. Such an organization could have a very high degree of collective intel-
ligence; if suciently high, the organization is a collective superintelligence. We
should resist the temptation to roll every normatively desirable attribute into
one giant amorphous concept of mental functioning, as though one could never
56
|
FORMS OF SUPERINTELLIGENCE
nd one admirable trait without all the others being equally present. Instead, we
should recognize that there can exist instrumentally powerful information pro-
cessing systems—intelligent systems—that are neither inherently good nor reli-
ably wise. But we will revisit this issue in Chapter 7.
Collective superintelligence could be either loosely or tightly integrated.
To illustrate a case of loosely integrated collective superintelligence, imagine a
planet, MegaEarth, which has the same level of communication and coordination
technologies that we currently have on the real Earth but with a population one
million times as large. With such a huge population, the total intellectual work-
force on MegaEarth would be correspondingly larger than on our planet. Suppose
that a scientic genius of the caliber of a Newton or an Einstein arises at least
once for every 10 billion people: then on MegaEarth there would be 700,000 such
geniuses living contemporaneously, alongside proportionally vast multitudes of
slightly lesser talents. New ideas and technologies would be developed at a furious
pace, and global civilization on MegaEarth would constitute a loosely integrated
collective superintelligence.
11
If we gradually increase the level of integration of a collective intelligence, it
may eventually become a unied intellect—asingle large “mind” as opposed to a
mere assemblage of loosely interacting smaller human minds.
12
e inhabitants of
MegaEarth could take steps in that direction by improving communications and
coordination technologies and by developing better ways for many individuals to
work on any hard intellectual problem together. A collective superintelligence could
thus, aer gaining suciently in integration, become a “quality superintelligence.
Quality superintelligence
We can distinguish a third form of superintelligence.
Quality superintelligence: A system that is at least as fast as a human mind and vastly
qualitatively smarter.
As with collective intelligence, intelligence quality is also a somewhat murky con-
cept; and in this case the diculty is compounded by our lack of experience with
any variations in intelligence quality above the upper end of the present human
distribution. We can, however, get some grasp of the notion by considering some
related cases.
First, we can expand the range of our reference points by considering non-
human animals, which have intelligence of lower quality. (is is not meant as
a speciesist remark. A zebrash has a quality of intelligence that is excellently
adapted to its ecological needs; but the relevant perspective here is a more anthro-
pocentric one: our concern is with performance on humanly relevant complex
cognitive tasks.) Nonhuman animals lack complex structured language; they
are capable of no or only rudimentary tool use and tool construction; they are
severely restricted in their ability to make long-term plans; and they have very
QUALITY SUPERINTELLIGENCE
|
57
limited abstract reasoning ability. Nor are these limitations fully explained by
a lack of speed or of collective intelligence among nonhuman animal minds. In
terms of raw computational power, human brains are probably inferior to those
of some large animals, including elephants and whales. And although human-
ity’s complex technological civilization would be impossible without our massive
advantage in collective intelligence, not all distinctly human cognitive capabili-
ties depend on collective intelligence. Many are highly developed even in small,
isolated hunter–gatherer bands.
13
And many are not nearly as highly developed
among highly organized nonhuman animals, such as chimpanzees and dolphins
intensely trained by human instructors, or ants living in their own large and well-
ordered societies. Evidently, the remarkable intellectual achievements of Homo
sapiens are to a signicant extent attributable to specic features of our brain
architecture, features that depend on a unique genetic endowment not shared
by other animals. is observation can help us illustrate the concept of quality
superintelligence: it is intelligence of quality at least as superior to that of human
intelligence as the quality of human intelligence is superior to that of elephants’,
dolphins’, or chimpanzees’.
A second way to illustrate the concept of quality superintelligence is by noting
the domain-specic cognitive decits that can aict individual humans, particu-
larly decits that are not caused by general dementia or other conditions asso-
ciated with wholesale destruction of the brains neurocomputational resources.
Consider, for example, individuals with autism spectrum disorders who may
have striking decits in social cognition while functioning well in other cogni-
tive domains; or individuals with congenital amusia, who are unable to hum or
recognize simple tunes yet perform normally in most other respects. Many other
examples could be adduced from the neuropsychiatric literature, which is replete
with case studies of patients suering narrowly circumscribed decits caused by
genetic abnormalities or brain trauma. Such examples show that normal human
adults have a range of remarkable cognitive talents that are not simply a function
of possessing a sucient amount of general neural processing power or even a suf-
cient amount of general intelligence: specialized neural circuitry is also needed.
is observation suggests the idea of possible but non-realized cognitive talents,
talents that no actual human possesses even though other intelligent systems—
ones with no more computing power than the human brain—that did have those
talents would gain enormously in their ability to accomplish a wide range of stra-
tegically relevant tasks.
Accordingly, by considering nonhuman animals and human individuals with
domain-specic cognitive decits, we can form some notion of dierent qualities
of intelligence and the practical dierence they make. Had Homo sapiens lacked
(for instance) the cognitive modules that enable complex linguistic represen-
tations, it might have been just another simian species living in harmony with
nature. Conversely, were we to gain some new set of modules giving an advantage
comparable to that of being able to form complex linguistic representations, we
would become superintelligent.
58
|
FORMS OF SUPERINTELLIGENCE
Direct and indirect reach
Superintelligence in any of these forms could, over time, develop the technology
necessary to create any of the others. e indirect reaches of these three forms of
superintelligence are therefore equal. In that sense, the indirect reach of current
human intelligence is also in the same equivalence class, under the supposition
that we are able eventually to create some form of superintelligence. Yet there
is a sense in which the three forms of superintelligence are much closer to one
another: any one of them could create other forms of superintelligence more rap-
idly than we can create any form of superintelligence from our present starting
point.
e direct reaches of the three dierent forms of superintelligence are harder to
compare. ere may be no denite ordering. eir respective capabilities depend
on the degree to which they instantiate their respective advantageshow fast a
speed superintelligence is, how qualitatively superior a quality superintelligence
is, and so forth. At most, we might say that, ceteris paribus, speed superintelligence
excels at tasks requiring the rapid execution of a long series of steps that must be
performed sequentially while collective superintelligence excels at tasks admit-
ting of analytic decomposition into parallelizable sub-tasks and tasks demand-
ing the combination of many dierent perspectives and skill sets. In some vague
sense, quality superintelligence would be the most capable form of all, inasmuch
as it could grasp and solve problems that are, for all practical purposes, beyond
the direct reach of speed superintelligence and collective superintelligence.
14
In some domains, quantity is a poor substitute for quality. One solitary genius
working out of a cork-lined bedroom can write In Search of Lost Time. Could
an equivalent masterpiece be produced by recruiting an oce building full of
literary hacks?
15
Even within the range of present human variation we see that
some functions benet greatly from the labor of one brilliant mastermind as
opposed to the joint eorts of myriad mediocrities. If we widen our purview to
include superintelligent minds, we must countenance a likelihood of there being
intellectual problems solvable only by superintelligence and intractable to any
ever-so-large collective of non-augmented humans.
ere might thus be some problems that are solvable by a quality superintel-
ligence, and perhaps by a speed superintelligence, yet which a loosely integrated
collective superintelligence cannot solve (other than by rst amplifying its own
intelligence).
16
We cannot clearly see what all these problems are, but we can
characterize them in general terms.
17
ey would tend to be problems involv-
ing multiple complex interdependencies that do not permit of independently
veriable solution steps: problems that therefore cannot be solved in a piecemeal
fashion, and that might require qualitatively new kinds of understanding or new
representational frameworks that are too deep or too complicated for the cur-
rent edition of mortals to discover or use eectively. Some types of artistic crea-
tion and strategic cognition might fall into this category. Some types of scientic
breakthrough, perhaps, likewise. And one can speculate that the tardiness and
SOURCES OF ADVANTAGE FOR DIGITAL INTELLIGENCE
|
59
wobbliness of humanitys progress on many of the “eternal problems” of philoso-
phy are due to the unsuitability of the human cortex for philosophical work. On
this view, our most celebrated philosophers are like dogs walking on their hind
legs—just barely attaining the threshold level of performance required for engag-
ing in the activity at all.
18
Sources of advantage for digital intelligence
Minor changes in brain volume and wiring can have major consequences, as we
see when we compare the intellectual and technological achievements of humans
with those of other apes. e far greater changes in computing resources and
architecture that machine intelligence will enable will probably have conse-
quences that are even more profound. It is dicult, perhaps impossible, for us to
form an intuitive sense of the aptitudes of a superintelligence; but we can at least
get an inkling of the space of possibilities by looking at some of the advantages
open to digital minds. e hardware advantages are easiest to appreciate:
• Speed of computational elements. Biological neurons operate at a peak speed of about
200Hz, a full seven orders of magnitude slower than a modern microprocessor
(~ 2GHz).
9
As a consequence, the human brain is forced to rely on massive paral-
lelization and is incapable of rapidly performing any computation that requires a
large number of sequential operations.
20
(Anything the brain does in under a second
cannot use much more than a hundred sequential operations—perhaps only a few
dozen.) Yet many of the most practically important algorithms in programming and
computer science are not easily parallelizable. Many cognitive tasks could be per-
formed far more eciently if the brain’s native support for parallelizable pattern-
matching algorithms were complemented by, and integrated with, support for fast
sequential processing.
• Internal communication speed. Axons carry action potentials at speeds of 20m/s or
less, whereas electronic processing cores can communicate optically at the speed of
light (300,000,000m/s).
2
The sluggishness of neural signals limits how big a biological
brain can be while functioning as a single processing unit. For example, to achieve a
round-trip latency of less than 0 ms between any two elements in a system, biologi-
cal brains must be smaller than 0.m
3
. An electronic system, on the other hand,
co ul d b e 6 .×0
7
m
3
, about the size of a dwarf planet: eighteen orders of magnitude
larger.
22
• Number of computational elements. The human brain has somewhat fewer than 00
billion neurons.
23
Humans have about three and a half times the brain size of chim-
panzees (though only one-fifth the brain size of sperm whales).
24
The number of
neurons in a biological creature is most obviously limited by cranial volume and
metabolic constraints, but other factors may also be significant for larger brains
(such as cooling, development time, and signal-conductance delayssee the previ-
ous point). By contrast, computer hardware is indefinitely scalable up to very high
physical limits.
25
Supercomputers can be warehouse-sized or larger, with additional
remote capacity added via high-speed cables.
26
60
|
FORMS OF SUPERINTELLIGENCE
• Storage capacity. Human working memory is able to hold no more than some four
or five chunks of information at any given time.
27
While it would be misleading to
compare the size of human working memory directly with the amount of RAM in a
digital computer, it is clear that the hardware advantages of digital intelligences will
make it possible for them to have larger working memories. This might enable such
minds to intuitively grasp complex relationships that humans can only fumblingly
handle via plodding calculation.
28
Human long-term memory is also limited, though
it is unclear whether we manage to exhaust its storage capacity during the course
of an ordinary lifetime—the rate at which we accumulate information is so slow.
(On one estimate, the adult human brain stores about one billion bitsa couple
of orders of magnitude less than a low-end smartphone.
29
) Both the amount of
information stored and the speed with which it can be accessed could thus be vastly
greater in a machine brain than in a biological brain.
• Reliability, lifespan, sensors, etc. Machine intelligences might have various other hard-
ware advantages. For example, biological neurons are less reliable than transistors.
30
Since noisy computing necessitates redundant encoding schemes that use multiple
elements to encode a single bit of information, a digital brain might derive some
eciency gains from the use of reliable high-precision computing elements. Brains
become fatigued after a few hours of work and start to permanently decay after
a few decades of subjective time; microprocessors are not subject to these limita-
tions. Data flow into a machine intelligence could be increased by adding millions of
sensors. Depending on the technology used, a machine might have reconfigurable
hardware that can be optimized for changing task requirements, whereas much of
the brain’s architecture is fixed from birth or only slowly changeable (though the
details of synaptic connectivity can change over shorter timescales, like days).
3
At present, the computational power of the biological brain still compares favor-
ably with that of digital computers, though top-of-the-line supercomputers are
attaining levels of performance that are within the range of plausible estimates of
the brain’s processing power.
32
But hardware is rapidly improving, and the ulti-
mate limits of hardware performance are vastly higher than those of biological
computing substrates.
Digital minds will also benet from major advantages in soware:
• Editability. It is easier to experiment with parameter variations in software than in
neural wetware. For example, with a whole brain emulation one could easily trial
what happens if one adds more neurons in a particular cortical area or if one in-
creases or decreases their excitability. Running such experiments in living biological
brains would be far more dicult.
• Duplicability. With software, one can quickly make arbitrarily many high-fidelity cop-
ies to fill the available hardware base. Biological brains, by contrast, can be repro-
duced only very slowly; and each new instance starts out in a helpless state, remem-
bering nothing of what its parents learned in their lifetimes.
• Goal coordination. Human collectives are replete with ineciencies arising from the
fact that it is nearly impossible to achieve complete uniformity of purpose among the
members of a large group—at least until it becomes feasible to induce docility on a
SOURCES OF ADVANTAGE FOR DIGITAL INTELLIGENCE
|
6
large scale by means of drugs or genetic selection. A “copy clan” (a group of identical
or almost identical programs sharing a common goal) would avoid such coordination
problems.
• Memory sharing. Biological brains need extended periods of training and mentorship
whereas digital minds could acquire new memories and skills by swapping data files.
A population of a billion copies of an AI program could synchronize their databases
periodically, so that all the instances of the program know everything that any in-
stance learned during the previous hour. (Direct memory transfer requires stand-
ardized representational formats. Easy swapping of high-level cognitive content
would therefore not be possible between just any pair of machine intelligences. In
particular, it would not be possible among first-generation whole brain emulations.)
• New modules, modalities, and algorithms. Visual perception seems to us easy and ef-
fortless, quite unlike solving textbook geometry problems—this despite the fact
that it takes a massive amount of computation to reconstruct, from the two-
dimensional patterns of stimulation on our retinas, a three-dimensional representa-
tion of a world populated with recognizable objects. The reason this seems easy
is that we have dedicated low-level neural machinery for processing visual infor-
mation. This low-level processing occurs unconsciously and automatically, without
draining our mental energy or conscious attention. Music perception, language use,
social cognition, and other forms of information processing that are “natural” for us
humans seem to be likewise supported by dedicated neurocomputational modules.
An artificial mind that had such specialized support for other cognitive domains that
have become important in the contemporary worldsuch as engineering, com-
puter programming, and business strategywould have big advantages over minds
like ours that have to rely on clunky general-purpose cognition to think about such
things. New algorithms may also be developed to take advantage of the distinct af-
fordances of digital hardware, such as its support for fast serial processing.
e ultimately attainable advantages of machine intelligence, hardware and so-
ware combined, are enormous.
33
But how rapidly could those potential advan-
tages be realized? at is the question to which we now turn.
62
|
THE KINETICS OF AN INTELLIGENCE EXPLOSION
CHAPTER 4
The kinetics of an
intelligence explosion
O
nce machines attain some form of human-equivalence in general rea-
soning ability, how long will it then be before they attain radical super-
intelligence? Will this be a slow, gradual, protracted transition? Or will
it be sudden, explosive? is chapter analyzes the kinetics of the transition to
superintelligence as a function of optimization power and system recalcitrance.
We consider what we know or may reasonably surmise about the behavior of
these two factors in the neighborhood of human-level general intelligence.
Timing and speed of the takeo
Given that machines will eventually vastly exceed biology in general intelligence,
but that machine cognition is currently vastly narrower than human cognition,
one is led to wonder how quickly this usurpation will take place. e question we
are asking here must be sharply distinguished from the question we considered in
Chapter 1 about how far away we currently are from developing a machine with
human-level general intelligence. Here the question is instead, if and when such a
machine is developed, how long will it be from then until a machine becomes radi-
cally superintelligent? Note that one could think that it will take quite a long time
until machines reach the human baseline, or one might be agnostic about how
long that will take, and yet have a strong view that once this happens, the further
ascent into strong superintelligence will be very rapid.
It can be helpful to think about these matters schematically, even though
doing so involves temporarily ignoring some qualications and complicating
details. Consider, then, a diagram that plots the intellectual capability of the most
advanced machine intelligence system as a function of time (Figure 7).
A horizontal line labeled “human baseline” represents the eective intellec-
tual capabilities of a representative human adult with access to the information
TIMING AND SPEED OF THE TAKEOFF
|
63
sources and technological aids currently available in developed countries. At pre-
sent, the most advanced AI system is far below the human baseline on any rea-
sonable metric of general intellectual ability. At some point in future, a machine
might reach approximate parity with this human baseline (which we take to be
xed—anchored to the year 2014, say, even if the capabilities of human individu-
als should have increased in the intervening years): this would mark the onset
of the takeo. e capabilities of the system continue to grow, and at some later
point the system reaches parity with the combined intellectual capability of all
of humanity (again anchored to the present): what we may call the “civilization
baseline”. Eventually, if the system’s abilities continue to grow, it attains “strong
superintelligence”—a level of intelligence vastly greater than contemporary
humanitys combined intellectual wherewithal. e attainment of strong super-
intelligence marks the completion of the takeo, though the system might con-
tinue to gain in capacity thereaer. Sometime during the takeo phase, the system
may pass a landmark which we can call “the crossover”, a point beyond which
the system’s further improvement is mainly driven by the system’s own actions
rather than by work performed upon it by others.
1
(e possible existence of such
a crossover will become important in the subsection on optimization power and
explosivity, later in this chapter.)
With this picture in mind, we can distinguish three classes of transition
scenarios—scenarios in which systems progress from human-level intelligence to
superintelligence—based on their steepness; that is to say, whether they represent
a slow, fast, or moderate takeo.
Slow
A slow takeo is one that occurs over some long temporal interval, such as
decades or centuries. Slow takeo scenarios oer excellent opportunities for
strong
superintelligence
System
capability
Time
crossover
nowtime until
takeo
takeo
duration
human baseline
civilization
Figure 7 Shape of the takeo. It is important to distinguish between these questions: “Will a
takeo occur, and if so, when?” and “If and when a takeo does occur, how steep will it be?” One
might hold, for example, that it will be a very long time before a takeo occurs, but that when it
does it will proceed very quickly. Another relevant question (not illustrated in this figure) is, “How
large a fraction of the world economy will participate in the takeo?” These questions are related
but distinct.
64
|
THE KINETICS OF AN INTELLIGENCE EXPLOSION
human political processes to adapt and respond. Dierent approaches can be
tried and tested in sequence. New experts can be trained and credentialed.
Grassroots campaigns can be mobilized by groups that feel they are being dis-
advantaged by unfolding developments. If it appears that new kinds of secure
infrastructure or mass surveillance of AI researchers is needed, such systems
could be developed and deployed. Nations fearing an AI arms race would have
time to try to negotiate treaties and design enforcement mechanisms. Most
preparations undertaken before onset of the slow takeo would be rendered
obsolete as better solutions would gradually become visible in the light of the
dawning era.
Fast
A fast takeo occurs over some short temporal interval, such as minutes, hours,
or days. Fast takeo scenarios oer scant opportunity for humans to deliberate.
Nobody need even notice anything unusual before the game is already lost. In a
fast takeo scenario, humanity’s fate essentially depends on preparations previously
put in place. At the slowest end of the fast takeo scenario range, some simple
human actions might be possible, analogous to flicking open the “nuclear suitcase”;
but any such action would either be elementary or have been planned and pre-
programmed in advance.
Moderate
A moderate takeo is one that occurs over some intermediary temporal interval,
such as months or years. Moderate takeo scenarios give humans some chance to
respond but not much time to analyze the situation, to test dierent approaches, or
to solve complicated coordination problems. There is not enough time to develop
or deploy new systems (e.g. political systems, surveillance regimes, or computer
network security protocols), but extant systems could be applied to the new
challenge.
During a slow takeo, there would be plenty of time for the news to get out. In
a moderate takeo, by contrast, it is possible that developments would be kept
secret as they unfold. Knowledge might be restricted to a small group of insiders,
as in a covert state-sponsored military research program. Commercial projects,
small academic teams, and “nine hackers in a basement” outts might also be
clandestine—though, if the prospect of an intelligence explosion were “on the
radar” of state intelligence agencies as a national security priority, then the most
promising private projects would seem to have a good chance of being under sur-
veillance. e host state (or a dominant foreign power) would then have the option
of nationalizing or shutting down any project that showed signs of commencing
takeo. Fast takeos would happen so quickly that there would not be much time
for word to get out or for anybody to mount a meaningful reaction if it did. But an
outsider might intervene before the onset of the takeo if they believed a particu-
lar project to be closing in on success.
TIMING AND SPEED OF THE TAKEOFF
|
65
Moderate takeo scenarios could lead to geopolitical, social, and economic tur-
bulence as individuals and groups jockey to position themselves to gain from the
unfolding transformation. Such upheaval, should it occur, might impede eorts
to orchestrate a well-composed response; alternatively, it might enable solutions
more radical than calmer circumstances would permit. For instance, in a moder-
ate takeo scenario where cheap and capable emulations or other digital minds
gradually ood labor markets over a period of years, one could imagine mass pro-
tests by laid-o workers pressuring governments to increase unemployment ben-
ets or institute a living wage guarantee to all human citizens, or to levy special
taxes or impose minimum wage requirements on employers who use emulation
workers. In order for any relief derived from such policies to be more than eet-
ing, support for them would somehow have to be cemented into permanent power
structures. Similar issues can arise if the takeo is slow rather than moderate, but
the disequilibrium and rapid change in moderate scenarios may present special
opportunities for small groups to wield disproportionate inuence.
It might appear to some readers that of these three types of scenario, the slow
takeo is the most probable, the moderate takeo is less probable, and the fast
takeo is utterly implausible. It could seem fanciful to suppose that the world
could be radically transformed and humanity deposed from its position as apex
cogitator over the course of an hour or two. No change of such moment has
ever occurred in human history, and its nearest parallels—the Agricultural and
Industrial Revolutions—played out over much longer timescales (centuries to
millennia in the former case, decades to centuries in the latter). So the base rate
for the kind of transition entailed by a fast or medium takeo scenario, in terms
of the speed and magnitude of the postulated change, is zero: it lacks precedent
outside myth and religion.
2
Nevertheless, this chapter will present some reasons for thinking that the slow
transition scenario is improbable. If and when a takeo occurs, it will likely be
explosive.
To begin to analyze the question of how fast the takeo will be, we can conceive
of the rate of increase in a system’s intelligence as a (monotonically increasing)
function of two variables: the amount of “optimization power”, or quality-
weighted design eort, that is being applied to increase the system’s intelligence,
and the responsiveness of the system to the application of a given amount of such
optimization power. We might term the inverse of responsiveness “recalcitrance,
and write:
Rate of change in intelligence=
Optimizati
oon power
Recalcitrance
Pending some specication of how to quantify intelligence, design eort, and
recalcitrance, this expression is merely qualitative. But we can at least observe
that a systems intelligence will increase rapidly if either a lot of skilled eort is
applied to the task of increasing its intelligence and the system’s intelligence is
not too hard to increase or there is a non-trivial design eort and the system’s
66
|
THE KINETICS OF AN INTELLIGENCE EXPLOSION
recalcitrance is low (or both). If we know how much design eort is going into
improving a particular system, and the rate of improvement this eort produces,
we could calculate the system’s recalcitrance.
Further, we can observe that the amount of optimization power devoted to
improving some system’s performance varies between systems and over time. A
systems recalcitrance might also vary depending on how much the system has
already been optimized. Oen, the easiest improvements are made rst, leading to
diminishing returns (increasing recalcitrance) as low-hanging fruits are depleted.
However, there can also be improvements that make further improvements easier,
leading to improvement cascades. e process of solving a jigsaw puzzle starts out
simple—it is easy to nd the corners and the edges. en recalcitrance goes up as
subsequent pieces are harder to t. But as the puzzle nears completion, the search
space collapses and the process gets easier again.
To proceed in our inquiry, we must therefore analyze how recalcitrance and
optimization power might vary in the critical time periods during the takeo.
is will occupy us over the next few pages.
Recalcitrance
Let us begin with recalcitrance. e outlook here depends on the type of the sys-
tem under consideration. For completeness, we rst cast a brief glance at the recal-
citrance that would be encountered along paths to superintelligence that do not
involve advanced machine intelligence. We nd that recalcitrance along those
paths appears to be fairly high. at done, we will turn to the main case, which is
that the takeo involves machine intelligence; and there we nd that recalcitrance
at the critical juncture seems low.
Non-machine intelligence paths
Cognitive enhancement via improvements in public health and diet has steeply
diminishing returns.
3
Big gains come from eliminating severe nutritional de-
ciencies, and the most severe deciencies have already been largely eliminated
in all but the poorest countries. Only girth is gained by increasing an already
adequate diet. Education, too, is now probably subject to diminishing returns.
e fraction of talented individuals in the world who lack access to quality educa-
tion is still substantial, but declining.
Pharmacological enhancers might deliver some cognitive gains over the com-
ing decades. But aer the easiest xes have been accomplished—perhaps sus-
tainable increases in mental energy and ability to concentrate, along with better
control over the rate of long-term memory consolidation—subsequent gains will
be increasingly hard to come by. Unlike diet and public health approaches, how-
ever, improving cognition through smart drugs might get easier before it gets
harder. e eld of neuropharmacology still lacks much of the basic knowledge
RECALCITRANCE
|
67
that would be needed to competently intervene in the healthy brain. Neglect of
enhancement medicine as a legitimate area of research may be partially to blame
for this current backwardness. If neuroscience and pharmacology continue to
progress for a while longer without focusing on cognitive enhancement, then
maybe there would be some relatively easy gains to be had when at last the devel-
opment of nootropics becomes a serious priority.
4
Genetic cognitive enhancement has a U-shaped recalcitrance prole similar to
that of nootropics, but with larger potential gains. Recalcitrance starts out high
while the only available method is selective breeding sustained over many genera-
tions, something that is obviously dicult to accomplish on a globally signicant
scale. Genetic enhancement will get easier as technology is developed for cheap
and eective genetic testing and selection (and particularly when iterated embryo
selection becomes feasible in humans). ese new techniques will make it possible
to tap the pool of existing human genetic variation for intelligence-enhancing
alleles. As the best existing alleles get incorporated into genetic enhancement
packages, however, further gains will get harder to come by. e need for more
innovative approaches to genetic modication may then increase recalcitrance.
ere are limits to how quickly things can progress along the genetic enhance-
ment path, most notably the fact that germline interventions are subject to an
inevitable maturational lag: this strongly counteracts the possibility of a fast or
moderate takeo.
5
at embryo selection can only be applied in the context of in
vitro fertilization will slow its rate of adoption: another limiting factor.
e recalcitrance along the brain–computer path seems initially very high. In
the unlikely event that it somehow becomes easy to insert brain implants and
to achieve high-level functional integration with the cortex, recalcitrance might
plummet. In the long run, the diculty of making progress along this path would
be similar to that involved in improving emulations or AIs, since the bulk of the
brain–computer system’s intelligence would eventually reside in the computer
part.
e recalcitrance for making networks and organizations in general more e-
cient is high. A vast amount of eort is going into overcoming this recalcitrance,
and the result is an annual improvement of humanity’s total capacity by perhaps
no more than a couple of percent.
6
Furthermore, shis in the internal and external
environment mean that organizations, even if ecient at one time, soon become
ill-adapted to their new circumstances. Ongoing reform eort is thus required
even just to prevent deterioration. A step change in the rate of gain in average
organizational eciency is perhaps conceivable, but it is hard to see how even
the most radical scenario of this kind could produce anything faster than a slow
takeo, since organizations operated by humans are conned to work on human
timescales. e Internet continues to be an exciting frontier with many oppor-
tunities for enhancing collective intelligence, with a recalcitrance that seems at
the moment to be in the moderate range—progress is somewhat swi but a lot of
eort is going into making this progress happen. It may be expected to increase as
low-hanging fruits (such as search engines and email) are depleted.
68
|
THE KINETICS OF AN INTELLIGENCE EXPLOSION
Emulation and AI paths
e diculty of advancing toward whole brain emulation is dicult to estimate.
Yet we can point to a specic future milestone: the successful emulation of an
insect brain. at milestone stands on a hill, and its conquest would bring into
view much of the terrain ahead, allowing us to make a decent guess at the recal-
citrance of scaling up the technology to human whole brain emulation. (A suc-
cessful emulation of a small-mammal brain, such as that of a mouse, would give
an even better vantage point that would allow the distance remaining to a human
whole brain emulation to be estimated with a high degree of precision.) e path
toward articial intelligence, by contrast, may feature no such obvious milestone
or early observation point. It is entirely possible that the quest for articial intel-
ligence will appear to be lost in dense jungle until an unexpected breakthrough
reveals the nishing line in a clearing just a few short steps away.
Recall the distinction between these two questions: How hard is it to attain
roughly human levels of cognitive ability? And how hard is it to get from there
to superhuman levels? e rst question is mainly relevant for predicting how
long it will be before the onset of a takeo. It is the second question that is key to
assessing the shape of the takeo, which is our aim here. And though it might be
tempting to suppose that the step from human level to superhuman level must be
the harder one—this step, aer all, takes place “at a higher altitude” where capac-
ity must be superadded to an already quite capable system—this would be a very
unsafe assumption. It is quite possible that recalcitrance falls when a machine
reaches human parity.
Consider rst whole brain emulation. e diculties involved in creating
the rst human emulation are of a quite dierent kind from those involved in
enhancing an existing emulation. Creating a rst emulation involves huge tech-
nological challenges, particularly in regard to developing the requisite scanning
and image interpretation capabilities. is step might also require considerable
amounts of physical capital—an industrial-scale machine park with hundreds
of high-throughput scanning machines is not implausible. By contrast, enhanc-
ing the quality of an existing emulation involves tweaking algorithms and data
structures: essentially a soware problem, and one that could turn out to be much
easier than perfecting the imaging technology needed to create the original tem-
plate. Programmers could easily experiment with tricks like increasing the neu-
ron count in dierent cortical areas to see how it aects performance.
7
ey also
could work on code optimization and on nding simpler computational models
that preserve the essential functionality of individual neurons or small networks
of neurons. If the last technological prerequisite to fall into place is either scan-
ning or translation, with computing power being relatively abundant, then not
much attention might have been given during the development phase to imple-
mentational eciency, and easy opportunities for computational eciency sav-
ings might be available. (More fundamental architectural reorganization might
also be possible, but that takes us o the emulation path and into AI territory.)
RECALCITRANCE
|
69
Another way to improve the code base once the rst emulation has been pro-
duced is to scan additional brains with dierent or superior skills and talents.
Productivity growth would also occur as a consequence of adapting organiza-
tional structures and workows to the unique attributes of digital minds. Since
there is no precedent in the human economy of a worker who can be literally
copied, reset, run at dierent speeds, and so forth, managers of the rst emulation
cohort would nd plenty of room for innovation in managerial practices.
Aer initially plummeting when human whole brain emulation becomes
possible, recalcitrance may rise again. Sooner or later, the most glaring imple-
mentational ineciencies will have been optimized away, the most promising
algorithmic variations will have been tested, and the easiest opportunities for
organizational innovation will have been exploited. e template library will have
expanded so that acquiring more brain scans would add little benet over work-
ing with existing templates. Since a template can be multiplied, each copy can be
individually trained in a dierent eld, and this can be done at electronic speed,
it might be that the number of brains that would need to be scanned in order
to capture most of the potential economic gains is small. Possibly a single brain
would suce.
Another potential cause of escalating recalcitrance is the possibility that emu-
lations or their biological supporters will organize to support regulations restrict-
ing the use of emulation workers, limiting emulation copying, prohibiting certain
kinds of experimentation with digital minds, instituting workers’ rights and a
minimum wage for emulations, and so forth. It is equally possible, however, that
political developments would go in the opposite direction, contributing to a fall in
recalcitrance. is might happen if initial restraint in the use of emulation labor
gives way to unfettered exploitation as competition heats up and the economic
and strategic costs of occupying the moral high ground become clear.
As for articial intelligence (non-emulation machine intelligence), the di-
culty of liing a system from human-level to superhuman intelligence by means
of algorithmic improvements depends on the attributes of the particular system.
Dierent architectures might have very dierent recalcitrance.
In some situations, recalcitrance could be extremely low. For example, if
human-level AI is delayed because one key insight long eludes programmers, then
when the nal breakthrough occurs, the AI might leapfrog from below to radi-
cally above human level without even touching the intermediary rungs. Another
situation in which recalcitrance could turn out to be extremely low is that of an AI
system that can achieve intelligent capability via two dierent modes of process-
ing. To illustrate this possibility, suppose an AI is composed of two subsystems,
one possessing domain-specic problem-solving techniques, the other possess-
ing general-purpose reasoning ability. It could then be the case that while the sec-
ond subsystem remains below a certain capacity threshold, it contributes nothing
to the system’s overall performance, because the solutions it generates are always
inferior to those generated by the domain-specic subsystem. Suppose now that a
small amount of optimization power is applied to the general-purpose subsystem
70
|
THE KINETICS OF AN INTELLIGENCE EXPLOSION
and that this produces a brisk rise in the capacity of that subsystem. At rst, we
observe no increase in the overall system’s performance, indicating that recalci-
trance is high. en, once the capacity of the general-purpose subsystem crosses
the threshold where its solutions start to beat those of the domain-specic sub-
system, the overall system’s performance suddenly begins to improve at the same
brisk pace as the general-purpose subsystem, even as the amount of optimization
power applied stays constant: the systems recalcitrance has plummeted.
It is also possible that our natural tendency to view intelligence from an anthro-
pocentric perspective will lead us to underestimate improvements in sub-human
systems, and thus to overestimate recalcitrance. Eliezer Yudkowsky, an AI theo-
rist who has written extensively on the future of machine intelligence, puts the
point as follows:
AI might make an apparently sharp jump in intelligence purely as the result of anthropo-
morphism, the human tendency to think of “village idiot” and “Einstein” as the extreme
ends of the intelligence scale, instead of nearly indistinguishable points on the scale of
minds-in-general. Everything dumber than a dumb human may appear to us as simply
“dumb. One imagines the “AI arrow” creeping steadily up the scale of intelligence, moving
past mice and chimpanzees, with AIs still remaining “dumb” because AIs cannot speak
fluent language or write science papers, and then the AI arrow crosses the tiny gap from
infra-idiot to ultra-Einstein in the course of one month or some similarly short period.
8
(See Fig. 8.)
e upshot of these several considerations is that it is dicult to predict how
hard it will be to make algorithmic improvements in the rst AI that reaches
a roughly human level of general intelligence. ere are at least some possible
circumstances in which algorithm-recalcitrance is low. But even if algorithm-
recalcitrance is very high, this would not preclude the overall recalcitrance of the
AI in question from being low. For it might be easy to increase the intelligence of
the system in other ways than by improving its algorithms. ere are two other
factors that can be improved: content and hardware.
First, consider content improvements. By “content” we here mean those parts
of a system’s soware assets that do not make up its core algorithmic architecture.
Content might include, for example, databases of stored percepts, specialized skills
Figure 8 A less anthropomorphic scale? The gap between a dumb and a clever person may
appear large from an anthropocentric perspective, yet in a less parochial view the two have nearly
indistinguishable minds.
9
It will almost certainly prove harder and take longer to build a machine
intelligence that has a general level of smartness comparable to that of a village idiot than to
improve such a system so that it becomes much smarter than any human.
Mouse Village idiot
Chimp Einstein
AI
RECALCITRANCE
|
7
libraries, and inventories of declarative knowledge. For many kinds of system, the
distinction between algorithmic architecture and content is very unsharp; nev-
ertheless, it will serve as a rough-and-ready way of pointing to one potentially
important source of capability gains in a machine intelligence. An alternative way
of expressing much the same idea is by saying that a system’s intellectual problem-
solving capacity can be enhanced not only by making the system cleverer but also
by expanding what the system knows.
Consider a contemporary AI system such as TextRunner (a research project
at the University of Washington) or IBM’s Watson (the system that won the
Jeopardy! quiz show). ese systems can extract certain pieces of semantic infor-
mation by analyzing text. Although these systems do not understand what they
read in the same sense or to the same extent as a human does, they can neverthe-
less extract signicant amounts of information from natural language and use
that information to make simple inferences and answer questions. ey can also
learn from experience, building up more extensive representations of a concept
as they encounter additional instances of its use. ey are designed to operate for
much of the time in unsupervised mode (i.e. to learn hidden structure in unla-
beled data in the absence of error or reward signal, without human guidance) and
to be fast and scalable. TextRunner, for instance, works with a corpus of 500mil-
lion web pages.
10
Now imagine a remote descendant of such a system that has acquired the ability
to read with as much understanding as a human ten-year-old but with a reading
speed similar to that of TextRunner. (is is probably an AI-complete problem.)
So we are imagining a system that thinks much faster and has much better mem-
ory than a human adult, but knows much less, and perhaps the net eect of this is
that the system is roughly human-equivalent in its general problem-solving abil-
ity. But its content recalcitrance is very low—low enough to precipitate a takeo.
Within a few weeks, the system has read and mastered all the content contained
in the Library of Congress. Now the system knows much more than any human
being and thinks vastly faster: it has become (at least) weakly superintelligent.
A system might thus greatly boost its eective intellectual capability by absorb-
ing pre-produced content accumulated through centuries of human science and
civilization: for instance, by reading through the Internet. If an AI reaches human
level without previously having had access to this material or without having been
able to digest it, then the AI’s overall recalcitrance will be low even if it is hard to
improve its algorithmic architecture.
Content-recalcitrance is a relevant concept for emulations, too. A high-speed
emulation has an advantage not only because it can complete the same tasks
as biological humans more quickly, but also because it can accumulate more
timely content, such as task-relevant skills and expertise. In order to tap the full
potential of fast content accumulation, however, a system needs to have a cor-
respondingly large memory capacity. ere is little point in reading an entire
library if you have forgotten all about the aardvark by the time you get to the aba-
lone. While an AI system is likely to have adequate memory capacity, emulations
72
|
THE KINETICS OF AN INTELLIGENCE EXPLOSION
would inherit some of the capacity limitations of their human templates. ey
may therefore need architectural enhancements in order to become capable of
unbounded learning.
So far we have considered the recalcitrance of architecture and of content—that
is, how dicult it would be to improve the soware of a machine intelligence that
has reached human parity. Now let us look at a third way of boosting the per-
formance of machine intelligence: improving its hardware. What would be the
recalcitrance for hardware-driven improvements?
Starting with intelligent soware (emulation or AI) one can amplify collective
intelligence simply by using additional computers to run more instances of the
program.
11
One could also amplify speed intelligence by moving the program to
faster computers. Depending on the degree to which the program lends itself to
parallelization, speed intelligence could also be amplied by running the pro-
gram on more processors. is is likely to be feasible for emulations, which have
a highly parallelized architecture; but many AI programs, too, have important
subroutines that can benet from massive parallelization. Amplifying quality
intelligence by increasing computing power might also be possible, but that case
is less straightforward.
12
e recalcitrance for amplifying collective or speed intelligence (and possibly
quality intelligence) in a system with human-level soware is therefore likely to be
low. e only diculty involved is gaining access to additional computing power.
ere are several ways for a system to expand its hardware base, each relevant over
a dierent timescale.
In the short term, computing power should scale roughly linearly with funding:
twice the funding buys twice the number of computers, enabling twice as many
instances of the soware to be run simultaneously. e emergence of cloud com-
puting services gives a project the option to scale up its computational resources
without even having to wait for new computers to be delivered and installed,
though concerns over secrecy might favor the use of in-house computers. (In cer-
tain scenarios, computing power could also be obtained by other means, such as
by commandeering botnets.
13
) Just how easy it would be to scale the system by a
given factor depends on how much computing power the initial system uses. A
system that initially runs on a PC could be scaled by a factor of thousands for a
mere million dollars. A program that runs on a supercomputer would be far more
expensive to scale.
In the slightly longer term, the cost of acquiring additional hardware may be
driven up as a growing portion of the worlds installed capacity is being used to
run digital minds. For instance, in a competitive market-based emulation scen-
ario, the cost of running one additional copy of an emulation should rise to be
roughly equal to the income generated by the marginal copy, as investors bid up
the price for existing computing infrastructure to match the return they expect
from their investment (though if only one project has mastered the technology
it might gain a degree of monopsony power in the computing power market and
therefore pay a lower price).
OPTIMIZATION POWER AND EXPLOSIVITY
|
73
Over a somewhat longer timescale, the supply of computing power will grow
as new capacity is installed. A demand spike would spur production in existing
semiconductor foundries and stimulate the construction of new plants. (A one-
o performance boost, perhaps amounting to one or two orders of magnitude,
might also be obtainable by using customized microprocessors.
14
) Above all, the
rising wave of technology improvements will pour increasing volumes of compu-
tational power into the turbines of the thinking machines. Historically, the rate of
improvement of computing technology has been described by the famous Moore’s
law, which in one of its variations states that computing power per dollar doubles
every 18 months or so.
15
Although one cannot bank on this rate of improvement
continuing up to the development of human-level machine intelligence, yet until
fundamental physical limits are reached there will remain room for advances in
computing technology.
ere are thus reasons to expect that hardware recalcitrance will not be very
high. Purchasing more computing power for the system once it proves its mettle
by attaining human-level intelligence might easily add several orders of magni-
tude of computing power (depending on how hardware-frugal the project was
before expansion). Chip customization might add one or two orders of magni-
tude. Other means of expanding the hardware base, such as building more facto-
ries and advancing the frontier of computing technology, take longer—normally
several years, though this lag would be radically compressed once machine super-
intelligence revolutionizes manufacturing and technology development.
In summary, we can talk about the likelihood of a hardware overhang: when
human-level soware is created, enough computing power may already be avail-
able to run vast numbers of copies at great speed. Soware recalcitrance, as
discussed above, is harder to assess but might be even lower than hardware recalci-
trance. In particular, there may be content overhang in the form of pre-made con-
tent (e.g. the Internet) that becomes available to a system once it reaches human
parity. Algorithm overhang—pre-designed algorithmic enhancements—is also
possible but perhaps less likely. Soware improvements (whether in algorithms
or content) might oer orders of magnitude of potential performance gains that
could be fairly easily accessed once a digital mind attains human parity, on top of
the performance gains attainable by using more or better hardware.
Optimization power and explosivity
Having examined the question of recalcitrance we must now turn to the other
half of our schematic equation, optimization power. To recall: Rate of change in
Intelligence = Optimization power/Recalcitrance. As reected in this schematic,
a fast takeo does not require that recalcitrance during the transition phase be
low. A fast takeo could also result if recalcitrance is constant or even moderately
increasing, provided the optimization power being applied to improving the sys-
tem’s performance grows suciently rapidly. As we shall now see, there are good
74
|
THE KINETICS OF AN INTELLIGENCE EXPLOSION
grounds for thinking that the applied optimization power will increase during
the transition, at least in the absence of a deliberate measures to prevent this from
happening.
We can distinguish two phases. e rst phase begins with the onset of the
takeo, when the system reaches the human baseline for individual intelligence.
As the system’s capability continues to increase, it might use some or all of that
capability to improve itself (or to design a successor system—which, for present
purposes, comes to the same thing). However, most of the optimization power
applied to the system still comes from outside the system, either from the work
of programmers and engineers employed within the project or from such work
done by the rest of the world as can be appropriated and used by the project.
16
If
this phase drags out for any signicant period of time, we can expect the amount
of optimization power applied to the system to grow. Inputs both from inside the
project and from the outside world are likely to increase as the promise of the cho-
sen approach becomes manifest. Researchers may work harder, more research-
ers may be recruited, and more computing power may be purchased to expedite
progress. e increase could be especially dramatic if the development of human-
level machine intelligence takes the world by surprise, in which case what was
previously a small research project might suddenly become the focus of intense
research and development eorts around the world (though some of those eorts
might be channeled into competing projects).
A second growth phase will begin if at some point the system has acquired so
much capability that most of the optimization power exerted on it comes from
the system itself (marked by the variable level labeled “crossover” in Figure 7).
is fundamentally changes the dynamic, because any increase in the system’s
capability now translates into a proportional increase in the amount of optimi-
zation power being applied to its further improvement. If recalcitrance remains
constant, this feedback dynamic produces exponential growth (see Box 4). e
doubling constant depends on the scenario but might be extremely short—mere
seconds in some scenarios—if growth is occurring at electronic speeds, which
might happen as a result of algorithmic improvements or the exploitation of an
overhang of content or hardware.
17
Growth that is driven by physical construc-
tion, such as the production of new computers or manufacturing equipment,
would require a somewhat longer timescale (but still one that might be very short
compared with the current growth rate of the world economy).
It is thus likely that the applied optimization power will increase during the
transition: initially because humans try harder to improve a machine intelligence
that is showing spectacular promise, later because the machine intelligence itself
becomes capable of driving further progress at digital speeds. is would create
a real possibility of a fast or medium takeo even if recalcitrance were constant or
slightly increasing around the human baseline.
18
Yet we saw in the previous subsec-
tion that there are factors that could lead to a big drop in recalcitrance around
the human baseline level of capability. ese factors include, for example, the
possibility of rapid hardware expansion once a working soware mind has been
OPTIMIZATION POWER AND EXPLOSIVITY
|
75
Box 4 On the kinetics of an intelligence explosion
We can write the rate of change in intelligence as the ratio between the optimiza-
tion power applied to the system and the system’s recalcitrance:
d
dt
I
=
O
.
The amount of optimization power acting on a system is the sum of whatever
optimization power the system itself contributes and the optimization power
exerted from without. For example, a seed AI might be improved through a
combination of its own eorts and the eorts of a human programming team,
and perhaps also the eorts of the wider global community of researchers mak-
ing continuous advances in the semiconductor industry, computer science, and
related fields:
9
OO
OO
=++
systemproject world
.
A seed AI starts out with very limited cognitive capacities. At the outset, there-
fore,
O
system
is small.
20
What about
O
project
and
O
world
? There are cases in which
a single project has more relevant capability than the rest of the world com-
binedthe Manhattan project, for instance, brought a very large fraction of the
world’s best physicists to Los Alamos to work on the atomic bomb. More com-
monly, any one project contains only a small fraction of the world’s total relevant
research capability. But even when the outside world has a greater total amount
of relevant research capability than any one project,
O
project
may nevertheless
exceed
O
world
, since much of the outside world’s capability is not be focused on
the particular system in question. If a project begins to look promising—which
will happen when a system passes the human baseline if not before—it might
attract additional investment, increasing
O
project
. If the project’s accomplishments
are public,
O
world
might also rise as the progress inspires greater interest in ma-
chine intelligence generally and as various powers scramble to get in on the
game. During the transition phase, therefore, total optimization power applied
to improving a cognitive system is likely to increase as the capability of the system
increases.
2
As the system’s capabilities grow, there may come a point at which the op-
timization power generated by the system itself starts to dominate the opti-
mization power applied to it from outside (across all significant dimensions of
improvement):
OOO
systemproject world
>+.
This crossover is significant because beyond this point, further improvement
to the system’s capabilities contributes strongly to increasing the total opti-
mization power applied to improving the system. We thereby enter a regime
continued
76
|
THE KINETICS OF AN INTELLIGENCE EXPLOSION
Box 4 Continued
of strong recursive self-improvement. This leads to explosive growth of the
system’s capability under a fairly wide range of dierent shapes of the recalci-
trance curve.
To illustrate, consider first a scenario in which recalcitrance is constant, so that
the rate of increase in an AIs intelligence is equal to the optimization power being
applied. Assume that all the optimization power that is applied comes from the
AI itself and that the AI applies all its intelligence to the task of amplifying its own
intelligence, so that
O
system
=
I
.
22
We then have
d
dt k
II
= .
Solving this simple dierential equation yields the exponential function
I
=Ae
t/k
.
But recalcitrance being constant is a rather special case. Recalcitrance might
well decline around the human baseline, due to one or more of the factors
mentioned in the previous subsection, and remain low around the crossover
and some distance beyond (perhaps until the system eventually approaches
fundamental physical limits). For example, suppose that the optimization power
applied to the system is roughly constant (i.e.
OO
projec
tw
orld
+
c) prior to the
system becoming capable of contributing substantially to its own design, and that
this leads to the system doubling in capacity every 8 months. (This would be
roughly in line with historical improvement rates from Moore’s law combined
with software advances.
23
) This rate of improvement, if achieved by means of
roughly constant optimization power, entails recalcitrance declining as the in-
verse of the system power:
d
dt
c
I
I
I
= =
1/
c.
If recalcitrance continues to fall along this hyperbolic pattern, then when the AI
reaches the crossover point the total amount of optimization power applied to
improving the AI has doubled. We then have
d
dt
c+
II
I
II
==+
()
()
.
1/
c
The next doubling occurs 7.5 months later. Within 7.9 months, the system’s
capacity has grown a thousandfold, thus obtaining speed superintelligence
(Figure9).
This particular growth trajectory has a positive singularity at t = 8 months. In
reality, the assumption that recalcitrance is constant would cease to hold as the
system began to approach the physical limits to information processing, if not
sooner.
OPTIMIZATION POWER AND EXPLOSIVITY
|
77
attained; the possibility of algorithmic improvements; the possibility of scanning
additional brains (in the case of whole brain emulation); and the possibility of
rapidly incorporating vast amounts of content by digesting the Internet (in the
case of articial intelligence).
24
ese observations notwithstanding, the shape of the recalcitrance curve in
the relevant region is not yet well characterized. In particular, it is unclear how
dicult it would be to improve the soware quality of a human-level emulation
or AI. e diculty of expanding the hardware power available to a system is also
not clear. Whereas today it would be relatively easy to increase the computing
power available to a small project by spending a thousand times more on comput-
ing power or by waiting a few years for the price of computers to fall, it is possible
that the rst machine intelligence to reach the human baseline will result from a
large project involving pricey supercomputers, which cannot be cheaply scaled,
and that Moore’s law will by then have expired. For these reasons, although a fast
or medium takeo looks more likely, the possibility of a slow takeo cannot be
excluded.
25
Box 4 Continued
These two scenarios are intended for illustration only; many other trajectories
are possible, depending on the shape of the recalcitrance curve. The claim is sim-
ply that the strong feedback loop that sets in around the crossover point tends
strongly to make the takeo faster than it would otherwise have been.
Figure 9 One simple model of an intelligence explosion.
–80 –60 –40 –200 20
0
0.1
1
10
100
1000
System optimization power
Time aer crossover point (months)
78
|
DECISIVE STRATEGIC ADVANTAGE
CHAPTER 5
Decisive strategic
advantage
A
question distinct from, but related to, the question of kinetics is
whether there will there be one superintelligent power or many? Might
an intelligence explosion propel one project so far ahead of all others as
to make it able to dictate the future? Or will progress be more uniform, unfurl-
ing across a wide front, with many projects participating but none securing an
overwhelming and permanent lead?
e preceding chapter analyzed one key parameter in determining the size
of the gap that might plausibly open up between a leading power and its near-
est competitors— namely, the speed of the transition from human to strongly
superhuman intelligence. is suggests a rst-cut analysis. If the takeo is fast
(completed over the course of hours, days, or weeks) then it is unlikely that two
independent projects would be taking o concurrently: almost certainty, the rst
project would have completed its takeo before any other project would have
started its own. If the takeo is slow (stretching over many years or decades) then
there could plausibly be multiple projects undergoing takeos concurrently, so
that although the projects would by the end of the transition have gained enor-
mously in capability, there would be no time at which any project was far enough
ahead of the others to give it an overwhelming lead. A takeo of moderate speed
is poised in between, with either condition a possibility: there might or might not
be more than one project undergoing the takeo at the same time.
1
Will one machine intelligence project get so far ahead of the competition that
it gets a decisive strategic advantage—that is, a level of technological and other
advantages sucient to enable it to achieve complete world domination? If a pro-
ject did obtain a decisive strategic advantage, would it use it to suppress competi-
tors and form a singleton (a world order in which there is at the global level a single
decision-making agency)? And if there is a winning project, how “large” would
A DECISIVE STRATEGIC ADVANTAGE?
|
79
it be—not in terms of physical size or budget but in terms of how many people’s
desires would be controlling its design? We will consider these questions in turn.
Will the frontrunner get a decisive strategic
advantage?
One factor inuencing the width of the gap between frontrunner and followers
is the rate of diusion of whatever it is that gives the leader a competitive advan-
tage. A frontrunner might nd it dicult to gain and maintain a large lead if
followers can easily copy the frontrunner’s ideas and innovations. Imitation cre-
ates a headwind that disadvantages the leader and benets laggards, especially if
intellectual property is weakly protected. A frontrunner might also be especially
vulnerable to expropriation, taxation, or being broken up under anti-monopoly
regulation.
It would be a mistake, however, to assume that this headwind must increase
monotonically with the gap between frontrunner and followers. Just as a rac-
ing cyclist who falls too far behind the competition is no longer shielded from
the wind by the cyclists ahead, so a technology follower who lags suciently
behind the cutting edge might nd it hard to assimilate the advances being
made at the frontier.
2
e gap in understanding and capability might have
grown too large. e leader might have migrated to a more advanced technol-
ogy platform, making subsequent innovations untransferable to the primitive
platforms used by laggards. A suciently pre-eminent leader might have the
ability to stem information leakage from its research programs and its sensi-
tive installations, or to sabotage its competitors’ eorts to develop their own
advanced capabilities.
If the frontrunner is an AI system, it could have attributes that make it easier
for it to expand its capabilities while reducing the rate of diusion. In human-run
organizations, economies of scale are counteracted by bureaucratic ineciencies
and agency problems, including diculties in keeping trade secrets.
3
ese prob-
lems would presumably limit the growth of a machine intelligence project so long
as it is operated by humans. An AI system, however, might avoid some of these
scale diseconomies, since the AIs modules (in contrast to human workers) need
not have individual preferences that diverge from those of the system as a whole.
us, the AI system could avoid a sizeable chunk of the ineciencies arising from
agency problems in human enterprises. e same advantage—having perfectly
loyal parts—would also make it easier for an AI system to pursue long-range clan-
destine goals. An AI would have no disgruntled employees ready to be poached by
competitors or bribed into becoming informants.
4
We can get a sense of the distribution of plausible gaps in development times by
looking at some historical examples (see Box 5). It appears that lags in the range
of a few months to a few years are typical of strategically signicant technology
projects.
80
|
DECISIVE STRATEGIC ADVANTAGE
Box 5 Technology races: some historical examples
Over long historical timescales, there has been an increase in the rate at which
knowledge and technology diuse around the globe. As a result, the temporal
gaps between technology leaders and nearest followers have narrowed.
China managed to maintain a monopoly on silk production for over two thou-
sand years. Archeological finds suggest that production might have begun around
3000 , or even earlier.
5
Sericulture was a closely held secret. Revealing the
techniques was punishable by death, as was exporting silkworms or their eggs
outside China. The Romans, despite the high price commanded by imported silk
cloth in their empire, never learnt the art of silk manufacture. Not until around
 300 did a Japanese expedition manage to capture some silkworm eggs along
with four young Chinese girls, who were forced to divulge the art to their abduc-
tors.
6
Byzantium joined the club of producers in  522. The story of porcelain-
making also features long lags. The craft was practiced in China during the Tang
Dynasty around  600 (and might have been in use as early as  200), but
was mastered by Europeans only in the eighteenth century.
7
Wheeled vehicles
appeared in several sites across Europe and Mesopotamia around 3500  but
reached the Americas only in post-Columbian times.
8
On a grander scale, the hu-
man species took tens of thousands of years to spread across most of the globe,
the Agricultural Revolution thousands of years, the Industrial Revolution only
hundreds of years, and an Information Revolution could be said to have spread
globally over the course of decades—though, of course, these transitions are not
necessarily of equal profundity. (The Dance Dance Revolution video game spread
from Japan to Europe and North America in just one year!)
Technological competition has been quite extensively studied, particularly in
the contexts of patent races and arms races.
9
It is beyond the scope of our inves-
tigation to review this literature here. However, it is instructive to look at some
examples of strategically significant technology races in the twentieth century
(see Table 7).
With regard to these six technologies, which were regarded as strategically
important by the rivaling superpowers because of their military or symbolic
significance, the gaps between leader and nearest laggard were (very approxi-
mately) 49 months, 36 months, 4 months,  month, 4 months, and 60 months,
respectivelylonger than the duration of a fast takeo and shorter than the
duration of a slow takeo.
0
In many cases, the laggard’s project benefitted from
espionage and publicly available information. The mere demonstration of the
feasibility of an invention can also encourage others to develop it independently;
and fear of falling behind can spur the eorts to catch up.
Perhaps closer to the case of AI are mathematical inventions that do not re-
quire the development of new physical infrastructure. Often these are published
in the academic literature and can thus be regarded as universally available;
but in some cases, when the discovery appears to oer a strategic advantage,
Box 5 Continued
Table 7 Some strategically significant technology races
United
States
Soviet
Union
United
Kingdom
France China India Israel Pakistan North
Korea
South
Africa
Fission bomb 945 949 952 960 964 974 979? 998 2006 979?
Fusion bomb 952 953

957 968 967 998 ?
Satellite launch
capability
958 957 97 965 970 980 988 998?
2
3
Human launch
capability
96 96 2003
ICBM
4
959 960 968
5
985 97 202 2008
6
2006
7
MIRV
8
970 975 979 985 2007 204
9
2008?
continued
A DECISIVE STRATEGIC ADVANTAGE?
|
8
82
|
DECISIVE STRATEGIC ADVANTAGE
Box 5 Continued
publication is delayed. For example, two of the most important ideas in public-
key cryptography are the DieHellman key exchange protocol and the RSA
encryption scheme. These were discovered by the academic community in 976
and 978, respectively, but it has later been confirmed that they were known
by cryptographers at the UK’s communications security group since the early
970s.
20
Large software projects might oer a closer analogy with AI projects,
but it is harder to give crisp examples of typical lags because software is usually
rolled out in incremental installments and the functionalities of competing sys-
tems are often not directly comparable.
It is possible that globalization and increased surveillance will reduce typi-
cal lags between competing technology projects. Yet there is likely to be a lower
bound on how short the average lag could become (in the absence of deliberate
coordination).
21
Even absent dynamics that lead to snowball eects, some projects
will happen to end up with better research sta, leadership, and infrastructure, or
will just stumble upon better ideas. If two projects pursue alternative approaches,
one of which turns out to work better, it may take the rival project many months
to switch to the superior approach even if it is able to closely monitor what the
forerunner is doing.
Combining these observations with our earlier discussion of the speed of the
takeo, we can conclude that it is highly unlikely that two projects would be close
enough to undergo a fast takeo concurrently; for a medium takeo, it could eas-
ily go either way; and for a slow takeo, it is highly likely that several projects
would undergo the process in parallel. But the analysis needs a further step. e
key question is not how many projects undergo a takeo in tandem, but how many
projects emerge on the yonder side suciently tightly clustered in capability that
none of them has a decisive strategic advantage. If the takeo process is relatively
slow to begin and then gets faster, the distance between competing projects would
tend to grow. To return to our bicycle metaphor, the situation would be analogous
to a pair of cyclists making their way up a steep hill, one trailing some distance
behind the other—the gap between them then expanding as the frontrunner
reaches the peak and starts accelerating down the other side.
Consider the following medium takeo scenario. Suppose it takes a project one
year to increase its AI’s capability from the human baseline to a strong superintel-
ligence, and that one project enters this takeo phase with a six-month lead over
the next most advanced project. e two projects will be undergoing a takeo
concurrently. It might seem, then, that neither project gets a decisive strategic
advantage. But that need not be so. Suppose it takes nine months to advance from
the human baseline to the crossover point, and another three months from there
to strong superintelligence. e frontrunner then attains strong superintelligence
three months before the following project even reaches the crossover point. is
HOW LARGE WILL THE SUCCESSFUL PROJECT BE?
|
83
would give the leading project a decisive strategic advantage and the opportunity
to parlay its lead into permanent control by disabling the competing projects and
establishing a singleton. (Note that the concept of a singleton is an abstract one:
a singleton could be democracy, a tyranny, a single dominant AI, a strong set of
global norms that include eective provisions for their own enforcement, or even
an alien overlord—its dening characteristic being simply that it is some form of
agency that can solve all major global coordination problems. It may, but need
not, resemble any familiar form of human governance.
22
)
Since there is an especially strong prospect of explosive growth just aer the
crossover point, when the strong positive feedback loop of optimization power
kicks in, a scenario of this kind is a serious possibility, and it increases the chances
that the leading project will attain a decisive strategic advantage even if the takeo
is not fast.
How large will the successful project be?
Some paths to superintelligence require great resources and are therefore likely to
be the preserve of large well-funded projects. Whole brain emulation, for instance,
requires many dierent kinds of expertise and lots of equipment. Biological intel-
ligence enhancements and brain–computer interfaces would also have a large
scale factor: while a small biotech rm might invent one or two drugs, achieving
superintelligence along one of these paths (if doable at all) would likely require
many inventions and many tests, and therefore the backing of an industrial sec-
tor or a well-funded national program. Achieving collective superintelligence by
making organizations and networks more ecient requires even more extensive
input, involving much of the world economy.
e AI path is more dicult to assess. Perhaps it would require a very large
research program; perhaps it could be done by a small group. A lone hacker scen-
ario cannot be excluded either. Building a seed AI might require insights and
algorithms developed over many decades by the scientic community around the
world. But it is possible that the last critical breakthrough idea might come from
a single individual or a small group that succeeds in putting everything together.
is scenario is less realistic for some AI architectures than others. A system that
has a large number of parts that need to be tweaked and tuned to work eectively
together, and then painstakingly loaded with custom-made cognitive content, is
likely to require a larger project. But if a seed AI could be instantiated as a simple
system, one whose construction depends only on getting a few basic principles
right, then the feat might be within the reach of a small team or an individual.
e likelihood of the nal breakthrough being made by a small project increases
if most previous progress in the eld has been published in the open literature or
made available as open source soware.
We must distinguish the question of how big will be the project that directly
engineers the system from the question of how big the group will be that controls
84
|
DECISIVE STRATEGIC ADVANTAGE
whether, how, and when the system is created. e atomic bomb was created pri-
marily by a group of scientists and engineers. (e Manhattan Project employed
about 130,000 people at its peak, the vast majority of whom were construction
workers or building operators.
23
) ese technical experts, however, were con-
trolled by the US military, which was directed by the US government, which was
ultimately accountable to the American electorate, which at the time constituted
about one-tenth of the adult world population.
24
Monitoring
Given the extreme security implications of superintelligence, governments would
likely seek to nationalize any project on their territory that they thought close
to achieving a takeo. A powerful state might also attempt to acquire projects
located in other countries through espionage, the, kidnapping, bribery, threats,
military conquest, or any other available means. A powerful state that cannot
acquire a foreign project might instead destroy it, especially if the host country
lacks an eective deterrent. If global governance structures are strong by the time
a breakthrough begins to look imminent, it is possible that promising projects
would be placed under international control.
An important question, therefore, is whether national or international author-
ities will see an intelligence explosion coming. At present, intelligence agencies
do not appear to be looking very hard for promising AI projects or other forms
of potentially explosive intelligence amplication.
25
If they are indeed not pay-
ing (much) attention, this is presumably due to the widely shared perception
that there is no prospect whatever of imminent superintelligence. If and when
it becomes a common belief among prestigious scientists that there is a substan-
tial chance that superintelligence is just around the corner, the major intelligence
agencies of the world would probably start to monitor groups and individuals who
seem to be engaged in relevant research. Any project that began to show sucient
progress could then be promptly nationalized. If political elites were persuaded
by the seriousness of the risk, civilian eorts in sensitive areas might be regulated
or outlawed.
How dicult would such monitoring be? e task is easier if the goal is only to
keep track of the leading project. In that case, surveillance focusing on the several
best-resourced projects may be sucient. If the goal is instead to prevent any work
from taking place (at least outside of specially authorized institutions) then sur-
veillance would have to be more comprehensive, since many small projects and
individuals are in a position to make at least some progress.
It would be easier to monitor projects that require signicant amounts of physi-
cal capital, as would be the case with a whole brain emulation project. Articial
intelligence research, by contrast, requires only a personal computer, and would
therefore be more dicult to monitor. Some of the theoretical work could be done
with pen and paper. Even so, it would not be too dicult to identify most capable
individuals with a serious long-standing interest in articial general intelligence
HOW LARGE WILL THE SUCCESSFUL PROJECT BE?
|
85
research. Such individuals usually leave visible trails. ey may have published
academic papers, presented at conferences, posted on Internet forums, or earned
degrees from leading computer science departments. ey may also have had
communications with other AI researchers, allowing them to be identied by
mapping the social graph.
Projects designed from the outset to be secret could be more dicult to detect.
An ordinary soware development project could serve as a front.
26
Only careful
analysis of the code being produced would reveal the true nature of what the pro-
ject was trying to accomplish. Such analysis would require a lot of (highly skilled)
manpower, whence only a small number of suspect projects could be scrutinized
at this level. e task would become much easier if eective lie detection technol-
ogy had been developed and could be routinely used in this kind of surveillance.
27
Another reason states might fail to catch precursor developments is the inher-
ent diculty of forecasting some types of breakthrough. is is more relevant
to AI research than to whole brain emulation development, since for the latter
the key breakthrough is more likely to be preceded by a clear gradient of steady
advances.
It is also possible that intelligence agencies and other government bureaucracies
have a certain clumsiness or rigidity that might prevent them from understand-
ing the signicance of some developments that might be clear to some outside
groups. Barriers to ocial understanding of a potential intelligence explosion
might be especially steep. It is conceivable, for example, that the topic will become
inamed with religious or political controversies, rendering it taboo for ocials in
some countries. e topic might become associated with some discredited gure
or with charlatanry and hype in general, hence shunned by respected scientists
and other establishment gures. (As we saw in Chapter 1, something like this has
already happened twice: recall the two “AI winters.”) Industry groups might lobby
to prevent aspersions being cast on protable business areas; academic communi-
ties might close ranks to marginalize those who voice concerns about long-term
consequences of the science that is being done.
28
Consequently, a total intelligence failure cannot be ruled out. Such a failure is
especially likely if breakthroughs should occur in the nearer future, before the
issue has risen to public prominence. And even if intelligence agencies get it right,
political leaders might not listen or act on the advice. Getting the Manhattan
Project started took an extraordinary eort by several visionary physicists,
including especially Mark Oliphant and Leó Szird: the latter persuaded Eugene
Wigner to persuade Albert Einstein to put his name on a letter to persuade
President Franklin D. Roosevelt to look into the matter. Even aer the project
reached its full scale, Roosevelt remained skeptical of its workability and signi-
cance, as did his successor Harry Truman.
For better or worse, it would probably be harder for a small group of activists
to aect the outcome of an intelligence explosion if big players, such as states,
are taking active part. Opportunities for private individuals to reduce the overall
amount of existential risk from a potential intelligence explosion are therefore
86
|
DECISIVE STRATEGIC ADVANTAGE
greatest in scenarios in which big players remain relatively oblivious to the issue,
or in which the early eorts of activists make a major dierence to whether, when,
which, or with what attitude big players enter the game. Activists seeking maxi-
mum expected impact may therefore wish to focus most of their planning on such
high-leverage scenarios, even if they believe that scenarios in which big players
end up calling all the shots are more probable.
International collaboration
International coordination is more likely if global governance structures gen-
erally get stronger. Coordination might also be more likely if the signicance
of an intelligence explosion is widely appreciated ahead of time and if eective
monitoring of all serious projects is feasible. Even if monitoring is infeasible,
however, international cooperation would still be possible. Many countries
could band together to support a joint project. If such a joint project were suf-
ciently well resourced, it could have a good chance of being the rst to reach
the goal, especially if any rival project had to be small and secretive to elude
detection.
ere are precedents of large-scale successful multinational scientic collabo-
rations, such as the International Space Station, the Human Genome Project, and
the Large Hadron Collider.
29
However, the major motivation for collaboration
in those cases was cost-sharing. (In the case of the International Space Station,
fostering a collaborative spirit between Russia and the United States was itself
an important goal.
30
) Achieving similar collaboration on a project that has enor-
mous security implications would be more dicult. A country that believed it
could achieve a breakthrough unilaterally might be tempted to go it alone rather
than subordinate its eorts to a joint project. A country might also refrain from
joining an international collaboration from fear that other participants might
siphon o collaboratively generated insights and use them to accelerate a covert
national project.
An international project would thus need to overcome major security chal-
lenges, and a fair amount of trust would probably be needed to get it started,
trust that may take time to develop. Consider that even aer the thaw in relations
between the United States and the Soviet Union following Gorbachevs ascent
to power, arms reduction eorts—which could be greatly in the interests of
both superpowers—had a tful beginning. Gorbachev was seeking steep reduc-
tions in nuclear arms but negotiations stalled on the issue of Reagan’s Strategic
Defense Initiative (“Star Wars”), which the Kremlin strenuously opposed.
At the Reykjavík Summit meeting in 1986, Reagan proposed that the United
States would share with the Soviet Union the technology that would be devel-
oped under the Strategic Defense Initiative, so that both countries could enjoy
protection against accidental launches and against smaller nations that might
develop nuclear weapons. Yet Gorbachev was not persuaded by this apparent
win–win proposition. He viewed the gambit as a ruse, refusing to credit the
FROM DECISIVE STRATEGIC ADVANTAGE TO SINGLETON
|
87
notion that the Americans would share the fruits of their most advanced mili-
tary research at a time when they were not even willing to share with the Soviets
their technology for milking cows.
31
Regardless of whether Reagan was in fact
sincere in his oer of superpower collaboration, mistrust made the proposal a
non-starter.
Collaboration is easier to achieve between allies, but even there it is not auto-
matic. When the Soviet Union and the United States were allied against Germany
during World War II, the United States concealed its atomic bomb project from
the Soviet Union. e United States did collaborate on the Manhattan Project
with Britain and Canada.
32
Similarly, the United Kingdom concealed its suc-
cess in breaking the German Enigma code from the Soviet Union, but shared
it—albeit with some diculty—with the United States.
33
is suggests that in
order to achieve international collaboration on some technology that is of pivotal
importance for national security, it might be necessary to have built beforehand a
close and trusting relationship.
We will return in Chapter 14 to the desirability and feasibility of international
collaboration in the development of intelligence amplication technologies.
From decisive strategic advantage to singleton
Would a project that obtained a decisive strategic advantage choose to use it to
form a singleton?
Consider a vaguely analogous historical situation. e United States developed
nuclear weapons in 1945. It was the sole nuclear power until the Soviet Union
developed the atom bomb in 1949. During this interval—and for some time
thereaer— the United States may have had, or been in a position to achieve, a
decisive military advantage.
e United States could then, theoretically, have used its nuclear monopoly to
create a singleton. One way in which it could have done so would have been by
embarking on an all-out eort to build up its nuclear arsenal and then threaten-
ing (and if necessary, carrying out) a nuclear rst strike to destroy the industrial
capacity of any incipient nuclear program in the USSR and any other country
tempted to develop a nuclear capability.
A more benign course of action, which might also have had a chance of work-
ing, would have been to use its nuclear arsenal as a bargaining chip to negotiate
a strong international government—a veto-less United Nations with a nuclear
monopoly and a mandate to take all necessary actions to prevent any country
from developing its own nuclear weapons.
Both of these approaches were proposed at the time. e hardline approach of
launching or threatening a rst strike was advocated by some prominent intel-
lectuals such as Bertrand Russell (who had long been active in anti-war move-
ments and who would later spend decades campaigning against nuclear weapons)
and John von Neumann (co-creator of game theory and one of the architects of
88
|
DECISIVE STRATEGIC ADVANTAGE
US nuclear strategy).
34
Perhaps it is a sign of civilizational progress that the very
idea of threatening a nuclear rst strike today seems borderline silly or morally
obscene.
A version of the benign approach was tried in 1946 by the United States in the
form of the Baruch plan. e proposal involved the USA giving up its temporary
nuclear monopoly. Uranium and thorium mining and nuclear technology would
be placed under the control of an international agency operating under the aus-
pices of the United Nations. e proposal called for the permanent members of
the Security Council to give up their vetoes in matters related to nuclear weapons
in order to prevent any great power found to be in breach of the accord from veto-
ing the imposition of remedies.
35
Stalin, seeing that the Soviet Union and its allies
could be easily outvoted in both the Security Council and the General Assembly,
rejected the proposal. A frosty atmosphere of mutual suspicion descended on the
relations between the former wartime allies, mistrust that soon solidied into
the Cold War. As had been widely predicted, a costly and extremely dangerous
nuclear arms race followed.
Many factors might dissuade a human organization with a decisive strategic
advantage from creating a singleton. ese include non-aggregative or bounded
utility functions, non-maximizing decision rules, confusion and uncertainty,
coordination problems, and various costs associated with a takeover. But what if
it were not a human organization but a superintelligent articial agent that came
into possession of a decisive strategic advantage? Would the aforementioned fac-
tors be equally eective at inhibiting an AI from attempting to seize power? Let
us briey run through the list of factors and consider how they might apply in
this case.
Human individuals and human organizations typically have preferences over
resources that are not well represented by an “unbounded aggregative utility
function.” A human will typically not wager all her capital for a y–y chance
of doubling it. A state will typically not risk losing all its territory for a ten percent
chance of a tenfold expansion. For individuals and governments, there are dimin-
ishing returns to most resources. e same need not hold for AIs. (We will return
to the problem of AI motivation in subsequent chapters.) An AI might therefore
be more likely to pursue a risky course of action that has some chance of giving it
control of the world.
Humans and human-run organizations may also operate with decision pro-
cesses that do not seek to maximize expected utility. For example, they may
allow for fundamental risk aversion, or “satiscing” decision rules that focus on
meeting adequacy thresholds, or “deontological” side-constraints that proscribe
certain kinds of action regardless of how desirable their consequences. Human
decision makers oen seem to be acting out an identity or a social role rather than
seeking to maximize the achievement of some particular objective. Again, this
need not apply to articial agents.
Bounded utility functions, risk aversion, and non-maximizing decision
rules may combine synergistically with strategic confusion and uncertainty.
FROM DECISIVE STRATEGIC ADVANTAGE TO SINGLETON
|
89
Revolutions, even when they succeed in overthrowing the existing order, oen
fail to produce the outcome that their instigators had promised. is tends to
stay the hand of a human agent if the contemplated action is irreversible, norm-
breaking, and lacking precedent. A superintelligence might perceive the situation
more clearly and therefore face less strategic confusion and uncertainty about
the outcome should it attempt to use its apparent decisive strategic advantage to
consolidate its dominant position.
Another major factor that can inhibit groups from exploiting a potentially
decisive strategic advantage is the problem of internal coordination. Members of
a conspiracy that is in a position to seize power must worry not only about being
inltrated from the outside, but also about being overthrown by some smaller
coalition of insiders. If a group consists of a hundred people, and a majority of
sixty can take power and disenfranchise the non-conspirators, what is then to
stop a thirty-ve-strong subset of these sixty from disenfranchising the other
twenty-ve? And then maybe a subset of twenty disenfranchising the other f-
teen? Each of the original hundred might have good reason to uphold certain
established norms to prevent the general unraveling that could result from any
attempt to change the social contract by means of a naked power grab. is prob-
lem of internal coordination would not apply to an AI system that constitutes a
single unied agent.
36
Finally, there is the issue of cost. Even if the United States could have used
its nuclear monopoly to establish a singleton, it might not have been able to do
so without incurring substantial costs. In the case of a negotiated agreement to
place nuclear weapons under the control of a reformed and strengthened United
Nations, these costs might have been relatively small; but the costs—moral, eco-
nomic, political, and human—of actually attempting world conquest through the
waging of nuclear war would have been almost unthinkably large, even during
the period of nuclear monopoly. With sucient technological superiority, how-
ever, these costs would be far smaller. Consider, for example, a scenario in which
one nation had such a vast technological lead that it could safely disarm all other
nations at the press of a button, without anybody dying or being injured, and with
almost no damage to infrastructure or to the environment. With such almost
magical technological superiority, a rst strike would be a lot more tempting. Or
consider an even greater level of technological superiority which might enable
the frontrunner to cause other nations to voluntarily lay down their arms, not by
threatening them with destruction but simply by persuading a great majority of
their populations by means of an extremely eectively designed advertising and
propaganda campaign extolling the virtues of global unity. If this were done with
the intention to benet everybody, for instance by replacing national rivalries and
arms races with a fair, representative, and eective world government, it is not
clear that there would be even a cogent moral objection to the leveraging of a tem-
porary strategic advantage into a permanent singleton.
Various considerations thus point to an increased likelihood that a future power
with superintelligence that obtained a suciently large strategic advantage would
90
|
DECISIVE STRATEGIC ADVANTAGE
actually use it to form a singleton. e desirability of such an outcome depends,
of course, on the nature of the singleton that would be created and also on what
the future of intelligent life would look like in alternative multipolar scenarios.
We will revisit those questions in later chapters. But rst let us take a closer look
at why and how a superintelligence would be powerful and eective at achieving
outcomes in the world.
COGNITIVE SUPERPOWERS
|
9
CHAPTER 6
Cognitive superpowers
S
uppose that a digital superintelligent agent came into being, and that for
some reason it wanted to take control of the world: would it be able to do
so? In this chapter we consider some powers that a superintelligence could
develop and what they may enable it to do. We outline a takeover scenario that
illustrates how a superintelligent agent, starting as mere soware, could estab-
lish itself as a singleton. We also oer some remarks on the relation between
power over nature and power over other agents.
e principal reason for humanity’s dominant position on Earth is that our
brains have a slightly expanded set of faculties compared with other animals.
1
Our greater intelligence lets us transmit culture more eciently, with the result
that knowledge and technology accumulates from one generation to the next. By
now sucient content has accumulated to make possible space ight, H-bombs,
genetic engineering, computers, factory farms, insecticides, the international
peace movement, and all the accouterments of modern civilization. Geologists
have started referring to the present era as the Anthropocene in recognition of the
distinctive biotic, sedimentary, and geochemical signatures of human activities.
2
On one estimate, we appropriate 24% of the planetary ecosystem’s net primary
production.
3
And yet we are far from having reached the physical limits of
technology.
ese observations make it plausible that any type of entity that developed a
much greater than human level of intelligence would be potentially extremely
powerful. Such entities could accumulate content much faster than us and invent
new technologies on a much shorter timescale. ey could also use their intel-
ligence to strategize more eectively than we can.
Let us consider some of the capabilities that a superintelligence could have and
how it could use them.
92
|
COGNITIVE SUPERPOWERS
Functionalities and superpowers
It is important not to anthropomorphize superintelligence when thinking about
its potential impacts. Anthropomorphic frames encourage unfounded expecta-
tions about the growth trajectory of a seed AI and about the psychology, motiva-
tions, and capabilities of a mature superintelligence.
For example, a common assumption is that a superintelligent machine would
be like a very clever but nerdy human being. We imagine that the AI has book
smarts but lacks social savvy, or that it is logical but not intuitive and creative.
is idea probably originates in observation: we look at present-day computers
and see that they are good at calculation, remembering facts, and at following the
letter of instructions while being oblivious to social contexts and subtexts, norms,
emotions, and politics. e association is strengthened when we observe that the
people who are good at working with computers tend themselves to be nerds. So
it is natural to assume that more advanced computational intelligence will have
similar attributes, only to a higher degree.
is heuristic might retain some validity in the early stages of development of
a seed AI. (ere is no reason whatever to suppose that it would apply to emula-
tions or to cognitively enhanced humans.) In its immature stage, what is later
to become a superintelligent AI might still lack many skills and talents that
come naturally to a human; and the pattern of such a seed AI’s strengths and
weaknesses might indeed bear some vague resemblance to an IQ nerd. e most
essential characteristic of a seed AI, aside from being easy to improve (having
low recalcitrance), is being good at exerting optimization power to amplify a
systems intelligence: a skill which is presumably closely related to doing well in
mathematics, programming, engineering, computer science research, and other
such “nerdy” pursuits. However, even if a seed AI does have such a nerdy capa-
bility prole at one stage of its development, this does not entail that it will grow
into a similarly limited mature superintelligence. Recall the distinction between
direct and indirect reach. With sucient skill at intelligence amplication, all
other intellectual abilities are within a system’s indirect reach: the system can
develop new cognitive modules and skills as needed—including empathy, politi-
cal acumen, and any other powers stereotypically wanting in computer-like
personalities.
Even if we recognize that a superintelligence can have all the skills and talents
we nd in the human distribution, along with other talents that are not found
among humans, the tendency toward anthropomorphizing can still lead us to
underestimate the extent to which a machine superintelligence could exceed the
human level of performance. Eliezer Yudkowsky, as we saw in an earlier chapter,
has been particularly emphatic in condemning this kind of misconception: our
intuitive concepts of “smart” and “stupid” are distilled from our experience of
variation over the range of human thinkers, yet the dierences in cognitive abil-
ity within this human cluster are trivial in comparison to the dierences between
any human intellect and a superintelligence.
4
FUNCTIONALITIES AND SUPERPOWERS
|
93
Chapter 3 reviewed some of the potential sources of advantage for machine
intelligence. e magnitudes of the advantages are such as to suggest that rather
than thinking of a superintelligent AI as smart in the sense that a scientic genius
is smart compared with the average human being, it might be closer to the mark
to think of such an AI as smart in the sense that an average human being is smart
compared with a beetle or a worm.
It would be convenient if we could quantify the cognitive caliber of an arbitrary
cognitive system using some familiar metric, such as IQ scores or some version of
the Elo ratings that measure the relative abilities of players in two-player games
such as chess. But these metrics are not useful in the context of superhuman arti-
cial general intelligence. We are not interested in how likely a superintelligence
is to win at a game of chess. As for IQ scores, they are informative only insofar
as we have some idea of how they correlate with practically relevant outcomes.
5
For example, we have data that show that people with an IQ of 130 are more likely
than those with an IQ of 90 to excel in school and to do well in a wide range of cog-
nitively demanding jobs. But suppose we could somehow establish that a certain
future AI will have an IQ of 6,455: then what? We would have no idea of what such
an AI could actually do. We would not even know that such an AI had as much
general intelligence as a normal human adult—perhaps the AI would instead have
a bundle of special-purpose algorithms enabling it to solve typical intelligence
test questions with superhuman eciency but not much else.
Some recent eorts have been made to develop measurements of cognitive
capacity that could be applied to a wider range of information-processing sys-
tems, including articial intelligences.
6
Work in this direction, if it can overcome
various technical diculties, may turn out to be quite useful for some scientic
purposes including AI development. For purposes of the present investigation,
however, its usefulness would be limited since we would remain unenlightened
about what a given superhuman performance score entails for actual ability to
achieve practically important outcomes in the world.
It will therefore serve our purposes better to list some strategically important
tasks and then to characterize hypothetical cognitive systems in terms of whether
they have or lack whatever skills are needed to succeed at these tasks. See Table 8.
We will say that a system that suciently excels at any of the tasks in this table has
a corresponding superpower.
A full-blown superintelligence would greatly excel at all of these tasks and
would thus have the full panoply of all six superpowers. Whether there is a prac-
tically signicant possibility of a domain-limited intelligence that has some of
the superpowers but remains unable for a signicant period of time to acquire
all of them is not clear. Creating a machine with any one of these superpowers
appears to be an AI-complete problem. Yet it is conceivable that, for example, a
collective superintelligence consisting of a suciently large number of human-
like biological or electronic minds would have, say, the economic productivity
superpower but lack the strategizing superpower. Likewise, it is conceivable that
a specialized engineering AI could be built that has the technology research
94
|
COGNITIVE SUPERPOWERS
superpower while completely lacking skills in other areas. is is more plau-
sible if there exists some particular technological domain such that virtuosity
within that domain would be sucient for the generation of an overwhelmingly
superior general-purpose technology. For instance, one could imagine a special-
ized AI adept at simulating molecular systems and at inventing nanomolecular
designs that realize a wide range of important capabilities (such as computers
or weapons systems with futuristic performance characteristics) described by
the user only at a fairly high level of abstraction.
7
Such an AI might also be able
to produce a detailed blueprint for how to bootstrap from existing technology
(such as biotechnology and protein engineering) to the constructor capabilities
needed for high-throughput atomically precise manufacturing that would allow
inexpensive fabrication of a much wider range of nanomechanical structures.
8
Table 8 Superpowers: some strategically relevant tasks and correspond-
ing skill sets
Task Skill set Strategic relevance
Intelligence
amplification
AI programming, cognitive
enhancement research,
social epistemology devel-
opment, etc.
•  System can bootstrap its  intelligence
Strategizing Strategic planning, forecast-
ing, prioritizing, and analysis
for optimizing chances of
achieving distant goal
•  Achieve distant goals
•  Overcome intelligent opposition
Social
manipulation
Social and psychological
modeling, manipulation,
rhetoric persuasion
•  Leverage external resources by 
recruiting human support
•  Enable a “boxed” AI to persuade its 
gatekeepers to let it out
•  Persuade states and organizations to 
adopt some course of action
Hacking Finding and exploiting
security flaws in computer
systems
•  AI can expropriate computational 
resources over the Internet
•  A boxed AI may exploit security holes 
to escape cybernetic confinement
•  Steal nancial resources
•  Hijack infrastructure, military robots, 
etc.
Technology
research
Design and modeling of
advanced technologies (e.g.
biotechnology, nanotechnol-
ogy) and development paths
•  Creation of powerful military force
•  Creation of surveillance system
•  Automated space colonization
Economic
productivity
Various skills enabling
economically productive
intellectual work
•  Generate wealth which can be used 
to buy influence, services, resources
(including hardware), etc.
AN AI TAKEOVER SCENARIO
|
95
However, it might turn out to be the case that an engineering AI could not truly pos-
sess the technological research superpower without also possessing advanced skills
in areas outside of technology—a wide range of intellectual faculties might be needed
to understand how to interpret user requests, how to model a design’s behavior in
real-world applications, how to deal with unanticipated bugs and malfunctions, how
to procure the materials and inputs needed for construction, and so forth.
9
A system that has the intelligence amplication superpower could use it to
bootstrap itself to higher levels of intelligence and to acquire any of the other intel-
lectual superpowers that it does not possess at the outset. But using an intelligence
amplication superpower is not the only way for a system to become a full-edged
superintelligence. A system that has the strategizing superpower, for instance,
might use it to devise a plan that will eventually bring an increase in intelligence
(e.g. by positioning the system so as to become the focus for intelligence amplica-
tion work performed by human programmers and computer science researchers).
An AI takeover scenario
We thus nd that a project that controls a superintelligence has access to a great
source of power. A project that controls the rst superintelligence in the world
would probably have a decisive strategic advantage. But the more immediate locus
of the power is in the system itself. A machine superintelligence might itself be
an extremely powerful agent, one that could successfully assert itself against the
project that brought it into existence as well as against the rest of the world. is
is a point of paramount importance, and we will examine it more closely in the
coming pages.
Now let us suppose that there is a machine superintelligence that wants to
seize power in a world in which it has as yet no peers. (Set aside, for the moment,
the question of whether and how it would acquire such a motive—that is a topic
for the next chapter.) How could the superintelligence achieve this goal of world
domination?
We can imagine a sequence along the following lines (see Figure 10).
1 Pre-criticality phase
Scientists conduct research in the eld of articial intelligence and other relevant
disciplines. is work culminates in the creation of a seed AI. e seed AI is able
to improve its own intelligence. In its early stages, the seed AI is dependent on
help from human programmers who guide its development and do most of the
heavy liing. As the seed AI grows more capable, it becomes capable of doing
more of the work by itself.
2 Recursive self-improvement phase
At some point, the seed AI becomes better at AI design than the human pro-
grammers. Now when the AI improves itself, it improves the thing that does the
96
|
COGNITIVE SUPERPOWERS
Seed AI
Pre-criticality Recursive self-improvement Covert preparation Overt implementation
AI Research
Intelligence amplication
(Economic productivity)
Strategizing / Technology research
Hacking / Social
Expansion
Construction
Strike
Escape
Figure 0 Phases in an AI takeover scenario.
improving. An intelligence explosion results—a rapid cascade of recursive self-
improvement cycles causing the AI’s capability to soar. (We can thus think of
this phase as the takeo that occurs just aer the AI reaches the crossover point,
assuming the intelligence gain during this part of the takeo is explosive and
driven by the application of the AI’s own optimization power.) e AI develops
the intelligence amplication superpower. is superpower enables the AI to
develop all the other superpowers detailed in Table 8. At the end of the recursive
self-improvement phase, the system is strongly superintelligent.
3 Covert preparation phase
Using its strategizing superpower, the AI develops a robust plan for achieving its
long-term goals. (In particular, the AI does not adopt a plan so stupid that even
we present-day humans can foresee how it would inevitably fail. is criterion
rules out many science ction scenarios that end in human triumph.
10
) e plan
might involve a period of covert action during which the AI conceals its intel-
lectual development from the human programmers in order to avoid setting o
alarms. e AI might also mask its true proclivities, pretending to be cooperative
and docile.
If the AI has (perhaps for safety reasons) been conned to an isolated computer,
it may use its social manipulation superpower to persuade the gatekeepers to let it
gain access to an Internet port. Alternatively, the AI might use its hacking super-
power to escape its connement. Spreading over the Internet may enable the AI
to expand its hardware capacity and knowledge base, further increasing its intel-
lectual superiority. An AI might also engage in licit or illicit economic activity to
obtain funds with which to buy computer power, data, and other resources.
AN AI TAKEOVER SCENARIO
|
97
At this point, there are several ways for the AI to achieve results outside the
virtual realm. It could use its hacking superpower to take direct control of robotic
manipulators and automated laboratories. Or it could use its social manipulation
superpower to persuade human collaborators to serve as its legs and hands. Or it
could acquire nancial assets from online transactions and use them to purchase
services and inuence.
4 Overt implementation phase
e nal phase begins when the AI has gained sucient strength to obviate the
need for secrecy. e AI can now directly implement its objectives on a full scale.
e overt implementation phase might start with a “strike” in which the AI
eliminates the human species and any automatic systems humans have created that
could oer intelligent opposition to the execution of the AI’s plans. is could be
achieved through the activation of some advanced weapons system that the AI has
perfected using its technology research superpower and covertly deployed in the
covert preparation phase. If the weapon uses self-replicating biotechnology or nano-
technology, the initial stockpile needed for global coverage could be microscopic:
a single replicating entity would be enough to start the process. In order to ensure
a sudden and uniform eect, the initial stock of the replicator might have been
deployed or allowed to diuse worldwide at an extremely low, undetectable con-
centration. At a pre-set time, nanofactories producing nerve gas or target- seeking
mosquito-like robots might then burgeon forth simultaneously from every square
meter of the globe (although more eective ways of killing could probably be
devised by a machine with the technology research superpower).
11
One might also
entertain scenarios in which a superintelligence attains power by hijacking politi-
cal processes, subtly manipulating nancial markets, biasing information ows,
or hacking into human-made weapon systems. Such scenarios would obviate the
need for the superintelligence to invent new weapons technology, although they
may be unnecessarily slow compared with scenarios in which the machine intel-
ligence builds its own infrastructure with manipulators that operate at molecular
or atomic speed rather than the slow speed of human minds and bodies.
Alternatively, if the AI is sure of its invincibility to human interference, our spe-
cies may not be targeted directly. Our demise may instead result from the habitat
destruction that ensues when the AI begins massive global construction projects
using nanotech factories and assemblers—construction projects which quickly,
perhaps within days or weeks, tile all of the Earths surface with solar panels,
nuclear reactors, supercomputing facilities with protruding cooling towers, space
rocket launchers, or other installations whereby the AI intends to maximize the
long-term cumulative realization of its values. Human brains, if they contain
information relevant to the AIs goals, could be disassembled and scanned, and
the extracted data transferred to some more ecient and secure storage format.
Box 6 describes one particular scenario. One should avoid xating too much
on the concrete details, since they are in any case unknowable and intended for
illustration only. A superintelligence might—and probably would—be able to
98
|
COGNITIVE SUPERPOWERS
Box 6 The mail-ordered DNA scenario
Yudkowsky describes the following possible scenario for an AI takeover.
2
1 Crack the protein folding problem to the extent of being able to generate
DNA strings whose folded peptide sequences fill specific functional roles
in a complex chemical interaction.
2 Email sets of DNA strings to one or more online laboratories that oer
DNA synthesis, peptide sequencing, and FedEx delivery. (Many labs cur-
rently oer this service, and some boast of 72-hour turnaround times.)
3 Find at least one human connected to the Internet who can be paid, black-
mailed, or fooled by the right background story, into receiving FedExed
vials and mixing them in a specified environment.
4 The synthesized proteins form a very primitive “wet” nanosystem, which,
ribosome-like, is capable of accepting external instructions; perhaps pat-
terned acoustic vibrations delivered by a speaker attached to the beaker.
5 Use the extremely primitive nanosystem to build more sophisticated sys-
tems, which construct still more sophisticated systems, bootstrapping to
molecular nanotechnologyor beyond.
In this scenario, the superintelligence uses its technology research superpower to
solve the protein folding problem in step , enabling it to design a set of molecu-
lar building blocks for a rudimentary nanotechnology assembler or fabrication
device, which can self-assemble in aqueous solution (step 4). The same technol-
ogy research superpower is used again in step 5 to bootstrap from primitive to
advanced machine-phase nanotechnology. The other steps require no more than
human intelligence. The skills required for step 3—identifying a gullible Internet
user and persuading him or her to follow some simple instructions—are on dis-
play every day all over the world. The entire scenario was invented by a human
mind, so the strategizing ability needed to formulate this plan is also merely hu-
man level.
In this particular scenario, the AI starts out having access to the Internet. If this
is not the case, then additional steps would have to be added to the plan. The
AI might, for example, use its social manipulation superpower to convince the
people interacting with it that it ought to be set free. Alternatively, the AI might
be able to use its hacking superpower to escape confinement. If the AI does
not possess these capabilities, it might first need to use its intelligence amplifica-
tion superpower to develop the requisite proficiency in social manipulation or
hacking.
A superintelligent AI will presumably be born into a highly networked world.
One could point to various developments that could potentially help a future AI
to control the worldcloud computing, proliferation of web-connected sen-
sors, military and civilian drones, automation in research labs and manufacturing
plants, increased reliance on electronic payment systems and digitized financial
POWER OVER NATURE AND AGENTS
|
99
conceive of a better plan for achieving its goals than any that a human can come
up with. It is therefore necessary to think about these matters more abstractly.
Without knowing anything about the detailed means that a superintelligence
would adopt, we can conclude that a superintelligence—at least in the absence
of intellectual peers and in the absence of eective safety measures arranged by
humans in advance—would likely produce an outcome that would involve recon-
guring terrestrial resources into whatever structures maximize the realization
of its goals. Any concrete scenario we develop can at best establish a lower bound
on how quickly and eciently the superintelligence could achieve such an out-
come. It remains possible that the superintelligence would nd a shorter path to
its preferred destination.
Power over nature and agents
An agent’s ability to shape humanity’s future depends not only on the absolute
magnitude of the agents own faculties and resources—how smart and energetic it
is, how much capital it has, and so forth—but also on the relative magnitude of its
capabilities compared with those of other agents with conicting goals.
In a situation where there are no competing agents, the absolute capability
level of a superintelligence, so long as it exceeds a certain minimal threshold,
does not matter much, because a system starting out with some sucient set
of capabilities could plot a course of development that will let it acquire any
capabilities it initially lacks. We alluded to this point earlier when we said
that speed, quality, and collective superintelligence all have the same indi-
rect reach. We alluded to it again when we said that various subsets of super-
powers, such as the intelligence amplication superpower or the strategizing
and the social manipulation superpowers, could be used to obtain the full
complement.
Box 6 Continued
assets, and increased use of automated information-filtering and decision sup-
port systems. Assets like these could potentially be acquired by an AI at digital
speeds, expediting its rise to power (though advances in cybersecurity might
make it harder). In the final analysis, however, it is doubtful whether any of these
trends makes a dierence. A superintelligence’s power resides in its brain, not its
hands. Although the AI, in order to remake the external world, will at some point
need access to an actuator, a single pair of helping human hands, those of a pliable
accomplice, would probably suce to complete the covert preparation phase,
as suggested by the above scenario. This would enable the AI to reach the overt
implementation phase in which it constructs its own infrastructure of physical
manipulators.
00
|
COGNITIVE SUPERPOWERS
Consider a superintelligent agent with actuators connected to a nanotech assem-
bler. Such an agent is already powerful enough to overcome any natural obstacles
to its indenite survival. Faced with no intelligent opposition, such an agent could
plot a safe course of development that would lead to its acquiring the complete
inventory of technologies that would be useful to the attainment of its goals. For
example, it could develop the technology to build and launch von Neumann probes,
machines capable of interstellar travel that can use resources such as asteroids,
planets, and stars to make copies of themselves.
13
By launching one von Neumann
probe, the agent could thus initiate an open-ended process of space colonization.
e replicating probes descendants, travelling at some signicant fraction of the
speed of light, would end up colonizing a substantial portion of the Hubble vol-
ume, the part of the expanding universe that is theoretically accessible from where
we are now. All this matter and free energy could then be organized into whatever
value structures maximize the originating agents utility function integrated over
cosmic time—a duration encompassing at least trillions of years before the aging
universe becomes inhospitable to information processing (see Box 7).
e superintelligent agent could design the von Neumann probes to be
evolution- proof. is could be accomplished by careful quality control during
the replication step. For example, the control soware for a daughter probe could
be proofread multiple times before execution, and the soware itself could use
encryption and error-correcting code to make it arbitrarily unlikely that any ran-
dom mutation would be passed on to its descendants.
14
e proliferating popu-
lation of von Neumann probes would then securely preserve and transmit the
originating agent’s values as they go about settling the universe. When the colon-
ization phase is completed, the original values would determine the use made of
all the accumulated resources, even though the great distances involved and the
accelerating speed of cosmic expansion would make it impossible for remote parts
of the infrastructure to communicate with one another. e upshot is that a large
part of our future light cone would be formatted in accordance with the prefer-
ences of the originating agent.
is, then, is the measure of the indirect reach of any system that faces no signif-
icant intelligent opposition and that starts out with a set of capabilities exceeding
a certain threshold. We can term the threshold the “wise-singleton sustainability
threshold” (Figure 11):
The wise-singleton sustainability threshold
A capability set exceeds the wise-singleton threshold if and only if a patient and
existential risk-savvy system with that capability set would, if it faced no intelligent
opposition or competition, be able to colonize and re-engineer a large part of the
accessible universe.
By “singleton” we mean a suciently internally coordinated political structure
with no external opponents, and by “wise” we mean suciently patient and savvy
about existential risks to ensure a substantial amount of well-directed concern for
the very long-term consequences of the system’s actions.
POWER OVER NATURE AND AGENTS
|
0
Capacity
Time
extinction
short-term viability
singleton sustainability
cosmic endowment
Figure  Schematic illustration of some possible trajectories for a hypothetical wise singleton.
With a capability below the short-term viability threshold—for example, if population size is too
small—a species tends to go extinct in short order (and remain extinct). At marginally higher
levels of capability, various trajectories are possible: a singleton might be unlucky and go extinct or
it might be lucky and attain a capability (e.g. population size, geographical dispersion, technologic al
capacity) that crosses the wise-singleton sustainability threshold. Once above this threshold, a
singleton will almost certainly continue to gain in capability until some extremely high capability
level is attained. In this picture, there are two attractors: extinction and astronomical capability.
Note that, for a wise singleton, the distance between the short-term viability threshold and the
sustainability threshold may be rather small.
5
Box 7 How big is the cosmic endowment?
Consider a technologically mature civilization capable of building sophisticated
von Neumann probes of the kind discussed in the text. If these can travel at
50% of the speed of light, they can reach some 6×0
8
stars before the cosmic
expansion puts further acquisitions forever out of reach. At 99% of c, they could
reach some 2×0
20
stars.
6
These travel speeds are energetically attainable using a
small fraction of the resources available in the solar system.
7
The impossibility of
faster-than-light travel, combined with the positive cosmological constant (which
causes the rate of cosmic expansion to accelerate), implies that these are close to
upper bounds on how much stu our descendants acquire.
8
If we assume that 0% of stars have a planet that isor could by means of
terraforming be rendered—suitable for habitation by human-like creatures, and
that it could then be home to a population of a billion individuals for a billion years
(with a human life lasting a century), this suggests that around 0
35
human lives
could be created in the future by an Earth-originating intelligent civilization.
9
There are, however, reasons to think this greatly underestimates the true
number. By disassembling non-habitable planets and collecting matter from the
continued
02
|
COGNITIVE SUPERPOWERS
Box 7 Continued
interstellar medium, and using this material to construct Earth-like planets, or
by increasing population densities, the number could be increased by at least
a couple of orders of magnitude. And if instead of using the surfaces of solid
planets, the future civilization built O’Neill cylinders, then many further orders of
magnitude could be added, yielding a total of perhaps 0
43
human lives. (“O’Neill
cylinders” refers to a space settlement design proposed in the mid-seventies by
the American physicist Gerard K. O’Neill, in which inhabitants dwell on the inside
of hollow cylinders whose rotation produces a gravity-substituting centrifugal
force.
20
)
Many more orders of magnitudes of human-like beings could exist if we coun-
tenance digital implementations of minds—as we should. To calculate how many
such digital minds could be created, we must estimate the computational power
attainable by a technologically mature civilization. This is hard to do with any pre-
cision, but we can get a lower bound from technological designs that have been
outlined in the literature. One such design builds on the idea of a Dyson sphere,
a hypothetical system (described by the physicist Freeman Dyson in 960) that
would capture most of the energy output of a star by surrounding it with a sys-
tem of solar-collecting structures.
2
For a star like our Sun, this would generate
0
26
watts. How much computational power this would translate into depends
on the eciency of the computational circuitry and the nature of the computa-
tions to be performed. If we require irreversible computations, and assume a na-
nomechanical implementation of the “computronium” (which would allow us to
push close to the Landauer limit of energy eciency), a computer system driven
by a Dyson sphere could generate some 0
47
operations per second.
22
Combining these estimates with our earlier estimate of the number of stars
that could be colonized, we get a number of about 0
67
ops/s once the accessible
parts of the universe have been colonized (assuming nanomechanical computro-
nium).
23
A typical star maintains its luminosity for some 0
8
s. Consequently, the
number of computational operations that could be performed using our cosmic
endowment is at least 0
85
. The true number is probably much larger. We might
get additional orders of magnitude, for example, if we make extensive use of
reversible computation, if we perform the computations at colder temperatures
(by waiting until the universe has cooled further), or if we make use of additional
sources of energy (such as dark matter).
24
It might not be immediately obvious to some readers why the ability to per-
form 0
85
computational operations is a big deal. So it is useful to put it in context.
We may, for example, compare this number with our earlier estimate (Box 3, in
Chapter 2) that it may take about 0
3
–0
44
ops to simulate all neuronal opera-
tions that have occurred in the history of life on Earth. Alternatively, let us sup-
pose that the computers are used to run human whole brain emulations that live
rich and happy lives while interacting with one another in virtual environments.
POWER OVER NATURE AND AGENTS
|
03
is wise-singleton sustainability threshold appears to be quite low. Limited
forms of superintelligence, as we have seen, exceed this threshold provided they
have access to some actuator sucient to initiate a technology bootstrap process.
In an environment that includes contemporary human civilization, the mini-
mally necessary actuator could be very simple—an ordinary screen or indeed any
means of transmitting a non-trivial amount of information to a human accom-
plice would suce.
But the wise-singleton sustainability threshold is lower still: neither superin-
telligence nor any other futuristic technology is needed to surmount it. A patient
and existential risk-savvy singleton with no more technological and intellectual
capabilities than those possessed by contemporary humanity should be readily
able to plot a course that leads reliably to the eventual realization of humanity’s
astronomical capability potential. is could be achieved by investing in rela-
tively safe methods of increasing wisdom and existential risk-savvy while post-
poning the development of potentially dangerous new technologies. Given that
non-anthropogenic existential risks (ones not arising from human activities) are
small over the relevant timescales—and could be further reduced with various
safe interventions—such a singleton could aord to go slow.
25
It could look care-
fully before each step, delaying development of capabilities such as synthetic biol-
ogy, human enhancement medicine, molecular nanotechnology, and machine
intelligence until it had rst perfected seemingly less hazardous capabilities such
as its education system, its information technology, and its collective decision-
making processes, and until it had used these capabilities to conduct a very thor-
ough review of its options. So this is all within the indirect reach of a technological
civilization like that of contemporary humanity. We are separated from this scen-
ario “merely” by the fact that humanity is currently neither a singleton nor (in the
relevant sense) wise.
Box 7 Continued
A typical estimate of the computational requirements for running one emulation
is 0
8
ops/s. To run an emulation for 00 subjective years would then require
some 0
27
ops. This would mean that at least 0
58
human lives could be created
in emulation even with quite conservative assumptions about the eciency of
computronium.
In other words, assuming that the observable universe is void of extraterres-
trial civilizations, then what hangs in the balance is at least 0,000,000,000,000,
000,000,000,000,000,000,000,000,000,000,000,000,000,000,000 human lives
(though the true number is probably larger). If we represent all the happiness
experienced during one entire such life with a single teardrop of joy, then the
happiness of these souls could fill and refill the Earth’s oceans every second, and
keep doing so for a hundred billion billion millennia. It is really important that we
make sure these truly are tears of joy.
04
|
COGNITIVE SUPERPOWERS
One could even argue that Homo sapiens passed the wise-singleton sustain-
ability threshold soon aer the species rst evolved. Twenty thousand years ago,
say, with equipment no fancier than stone axes, bone tools, atlatls, and re, the
human species was perhaps already in a position from which it had an excellent
chance of surviving to the present era.
26
Admittedly, there is something queer
about crediting our Paleolithic ancestors with having developed technology that
exceeded the wise-singleton sustainability threshold”—given that there was no
realistic possibility of a singleton forming at such a primitive time, let alone a sin-
gleton savvy about existential risks and patient.
27
Nevertheless, the point stands
that the threshold corresponds to a very modest level of technology—a level that
humanity long ago surpassed.
28
It is clear that if we are to assess the eective powers of a superintelligence—its
ability to achieve a range of preferred outcomes in the world—we must consider
not only its own internal capacities but also the capabilities of competing agents.
e notion of a superpower invoked such a relativized standard implicitly. We
said that “a system that suciently excels” at any of the tasks in Table 8 has a
corresponding superpower. Exceling at a task like strategizing, social manipula-
tion, or hacking involves having a skill at that task that is high in comparison to
the skills of other agents (such as strategic rivals, inuence targets, or computer
security experts). e other superpowers, too, should be understood in this rela-
tive sense: intelligence amplication, technology research, and economic produc-
tivity are possessed by an agent as superpowers only if the agents capabilities in
these areas substantially exceed the combined capabilities of the rest of the global
civilization. It follows from this denition that at most one agent can possess a
particular superpower at any given time.
29
is is the main reason why the question of takeo speed is important—not
because it matters exactly when a particular outcome happens, but because the
speed of the takeo may make a big dierence to what the outcome will be. With
a fast or medium takeo, it is likely that one project will get a decisive strategic
advantage. We have now suggested that a superintelligence with a decisive stra-
tegic advantage would have immense powers, enough that it could form a stable
singleton—a singleton that could determine the disposition of humanity’s cosmic
endowment.
But “could” is dierent from “would.” Somebody might have great powers yet
choose not to use them. Is it possible to say anything about what a superintel-
ligence with a decisive strategic advantage would want? It is to this question of
motivation that we turn next.
INTELLIGENCE AND MOTIVATION
|
05
CHAPTER 7
The superintelligent will
W
e have seen that a superintelligence could have a great ability to
shape the future according to its goals. But what will its goals be?
What is the relation between intelligence and motivation in an arti-
cial agent? Here we develop two theses. e orthogonality thesis holds (with
some caveats) that intelligence and nal goals are independent variables: any
level of intelligence could be combined with any nal goal. e instrumental
convergence thesis holds that superintelligent agents having any of a wide range
of nal goals will nevertheless pursue similar intermediary goals because they
have common instrumental reasons to do so. Taken together, these theses help
us think about what a superintelligent agent would do.
The relation between intelligence and motivation
We have already cautioned against anthropomorphizing the capabilities of a
superintelligent AI. is warning should be extended to pertain to its motivations
as well.
It is a useful propaedeutic to this part of our inquiry to rst reect for a
moment on the vastness of the space of possible minds. In this abstract space,
human minds form a tiny cluster. Consider two persons who seem extremely
unlike, perhaps Hannah Arendt and Benny Hill. e personality dierences
between these two individuals may seem almost maximally large. But this is
because our intuitions are calibrated on our experience, which samples from
the existing human distribution (and to some extent from ctional personalities
constructed by the human imagination for the enjoyment of the human imagin-
ation). If we zoom out and consider the space of all possible minds, however,
we must conceive of these two personalities as virtual clones. Certainly in terms
of neural architecture, Ms. Arendt and Mr. Hill are nearly identical. Imagine
their brains lying side by side in quiet repose. You would readily recognize
them as two of a kind. You might even be unable to tell which brain belonged to
06
|
THE SUPERINTELLIGENT WILL
whom. If you looked more closely, studying the morphology of the two brains
under a microscope, this impression of fundamental similarity would only be
strengthened: you would see the same lamellar organization of the cortex, with
the same brain areas, made up of the same types of neuron, soaking in the same
bath of neurotransmitters.
1
Despite the fact that human psychology corresponds to a tiny spot in the space
of possible minds, there is a common tendency to project human attributes onto
a wide range of alien or articial cognitive systems. Yudkowsky illustrates this
point nicely:
Back in the era of pulp science fiction, magazine covers occasionally depicted a sentient
monstrous aliencolloquially known as a bug-eyed monster (BEM)—carrying o an
attractive human female in a torn dress. It would seem the artist believed that a non-
humanoid alien, with a wholly dierent evolutionary history, would sexually desire human
females. . . . Probably the artist did not ask whether a giant bug perceives human females
as attractive. Rather, a human female in a torn dress is sexyinherently so, as an intrin-
sic property. They who made this mistake did not think about the insectoid’s mind: they
focused on the woman’s torn dress. If the dress were not torn, the woman would be less
sexy; the BEM does not enter into it.
2
An articial intelligence can be far less human-like in its motivations than a
green scaly space alien. e extraterrestrial (let us assume) is a biological crea-
ture that has arisen through an evolutionary process and can therefore be
expected to have the kinds of motivation typical of evolved creatures. It would
not be hugely surprising, for example, to nd that some random intelligent alien
would have motives related to one or more items like food, air, temperature,
energy expenditure, occurrence or threat of bodily injury, disease, predation,
sex, or progeny. A member of an intelligent social species might also have moti-
vations related to cooperation and competition: like us, it might show in-group
loyalty, resentment of free riders, perhaps even a vain concern with reputation
and appearance.
Figure 2 Results of anthropomorphizing alien motivation. Least likely hypothesis: space aliens
prefer blondes. More likely hypothesis: the illustrators succumbed to the “mind projection fallacy.
Most likely hypothesis: the publisher wanted a cover that would entice the target demographic.
An AI, by contrast, need not care intrinsically about any of those things. ere
is nothing paradoxical about an AI whose sole nal goal is to count the grains of
sand on Boracay, or to calculate the decimal expansion of pi, or to maximize the
total number of paperclips that will exist in its future light cone. In fact, it would
be easier to create an AI with simple goals like these than to build one that had
a human-like set of values and dispositions. Compare how easy it is to write a
program that measures how many digits of pi have been calculated and stored in
memory with how dicult it would be to create a program that reliably measures
the degree of realization of some more meaningful goal—human ourishing,
say, or global justice. Unfortunately, because a meaningless reductionistic goal is
easier for humans to code and easier for an AI to learn, it is just the kind of goal
that a programmer would choose to install in his seed AI if his focus is on taking
the quickest path to “getting the AI to work” (without caring much about what
exactly the AI will do, aside from displaying impressively intelligent behavior).
We will revisit this concern shortly.
Intelligent search for instrumentally optimal plans and policies can be per-
formed in the service of any goal. Intelligence and motivation are in a sense
orthogonal: we can think of them as two axes spanning a graph in which each
point represents a logically possible articial agent. Some qualications could be
added to this picture. For instance, it might be impossible for a very unintelligent
system to have very complex motivations. In order for it to be correct to say that
an certain agent “has” a set of motivations, those motivations may need to be
functionally integrated with the agent’s decision processes, something that places
demands on memory, processing power, and perhaps intelligence. For minds
that can modify themselves, there may also be dynamical constraints—an intel-
ligent self-modifying mind with an urgent desire to be stupid might not remain
intelligent for long. But these qualications must not be allowed to obscure the
basic point about the independence of intelligence and motivation, which we can
express as follows:
The orthogonality thesis
Intelligence and final goals are orthogonal: more or less any level of intelligence
could in principle be combined with more or less any final goal.
If the orthogonality thesis seems problematic, this might be because of the super-
cial resemblance it bears to some traditional philosophical positions which have
been subject to long debate. Once it is understood to have a dierent and narrower
scope, its credibility should rise. (For example, the orthogonality thesis does not
presuppose the Humean theory of motivation.
3
Nor does it presuppose that basic
preferences cannot be irrational.
4
)
Note that the orthogonality thesis speaks not of rationality or reason, but of
intelligence. By “intelligence” we here mean something like skill at prediction,
planning, and means–ends reasoning in general.
5
is sense of instrumental cog-
nitive ecaciousness is most relevant when we are seeking to understand what
the causal impact of a machine superintelligence might be. Even if there is some
INTELLIGENCE AND MOTIVATION
|
07
08
|
THE SUPERINTELLIGENT WILL
(normatively thick) sense of the word “rational” such that a paperclip- maximizing
superintelligent agent would necessarily fail to qualify as fully rational in that
sense, this would in no way preclude such an agent from having awesome facul-
ties of instrumental reasoning, faculties which could let it have a large impact on
the world.
6
According to the orthogonality thesis, articial agents can have utterly non-
anthropomorphic goals. is, however, does not imply that it is impossible to
make predictions about the behavior of particular articial agents—not even
hypothetical superintelligent agents whose cognitive complexity and perform-
ance characteristics might render them in some respects opaque to human analy-
sis. ere are at least three directions from which we can approach the problem of
predicting superintelligent motivation:
• Predictability through design. If we can suppose that the designers of a superintel-
ligent agent can successfully engineer the goal system of the agent so that it stably
pursues a particular goal set by the programmers, then one prediction we can
make is that the agent will pursue that goal. The more intelligent the agent is, the
greater the cognitive resourcefulness it will have to pursue that goal. So even be-
fore an agent has been created we might be able to predict something about its
behavior, if we know something about who will build it and what goals they will
want it to have.
• Predictability through inheritance. If a digital intelligence is created directly from a hu-
man template (as would be the case in a high-fidelity whole brain emulation), then
the digital intelligence might inherit the motivations of the human template.
7
The
agent might retain some of these motivations even if its cognitive capacities are
subsequently enhanced to make it superintelligent. This kind of inference requires
caution. The agent’s goals and values could easily become corrupted in the uploading
process or during its subsequent operation and enhancement, depending on how
the procedure is implemented.
• Predictability through convergent instrumental reasons. Even without detailed know-
ledge of an agent’s final goals, we may be able to infer something about its more
immediate objectives by considering the instrumental reasons that would arise for
any of a wide range of possible final goals in a wide range of situations. This way of
predicting becomes more useful the greater the intelligence of the agent, because
a more intelligent agent is more likely to recognize the true instrumental reasons
for its actions, and so act in ways that make it more likely to achieve its goals. (Acav-
eat here is that there might be important instrumental reasons to which we are
oblivious and which an agent would discover only once it reaches some very high
level of intelligencethis could make the behavior of superintelligent agents less
predictable.)
e next section explores this third way of predictability and develops an “instru-
mental convergence thesis” which complements the orthogonality thesis. Against
this background we can then better examine the other two sorts of predictability,
which we will do in later chapters where we ask what might be done to shape an
intelligence explosion to increase the chances of a benecial outcome.
INSTRUMENTAL CONVERGENCE
|
09
Instrumental convergence
According to the orthogonality thesis, intelligent agents may have an enormous
range of possible nal goals. Nevertheless, according to what we may term the
instrumental convergence” thesis, there are some instrumental goals likely to be
pursued by almost any intelligent agent, because there are some objectives that
are useful intermediaries to the achievement of almost any nal goal. We can
formulate this thesis as follows:
The instrumental convergence thesis
Several instrumental values can be identified which are convergent in the sense
that their attainment would increase the chances of the agents goal being realized
for a wide range of final goals and a wide range of situations, implying that these
instrumental values are likely to be pursued by a broad spectrum of situated intel-
ligent agents.
In the following we will consider several categories where such convergent
instrumental values may be found.
8
e likelihood that an agent will recognize
the instrumental values it confronts increases (ceteris paribus) with the agent’s
intelligence. We will therefore focus mainly on the case of a hypothetical super-
intelligent agent whose instrumental reasoning capacities far exceed those of
any human. We will also comment on how the instrumental convergence thesis
applies to the case of human beings, as this gives us occasion to elaborate some
essential qualications concerning how the instrumental convergence thesis
should be interpreted and applied. Where there are convergent instrumental
values, we may be able to predict some aspects of a superintelligence’s behavior
even if we know virtually nothing about that superintelligence’s nal goals.
Self-preservation
If an agent’s nal goals concern the future, then in many scenarios there will
be future actions it could perform to increase the probability of achieving its
goals. is creates an instrumental reason for the agent to try to be around in the
future—to help achieve its future-oriented goal.
Most humans seem to place some nal value on their own survival. is is not
a necessary feature of articial agents: some may be designed to place no nal
value whatever on their own survival. Nevertheless, many agents that do not care
intrinsically about their own survival would, under a fairly wide range of condi-
tions, care instrumentally about their own survival in order to accomplish their
nal goals.
Goal-content integrity
If an agent retains its present goals into the future, then its present goals will
be more likely to be achieved by its future self. is gives the agent a present
0
|
THE SUPERINTELLIGENT WILL
instrumental reason to prevent alterations of its nal goals. (e argument applies
only to nal goals. In order to attain its nal goals, an intelligent agent will of
course routinely want to change its subgoals in light of new information and
insight.)
Goal-content integrity for nal goals is in a sense even more fundamental than
survival as a convergent instrumental motivation. Among humans, the opposite
may seem to hold, but that is because survival is usually part of our nal goals.
For soware agents, which can easily switch bodies or create exact duplicates of
themselves, preservation of self as a particular implementation or a particular
physical object need not be an important instrumental value. Advanced soware
agents might also be able to swap memories, download skills, and radically mod-
ify their cognitive architecture and personalities. A population of such agents
might operate more like a “functional soup” than a society composed of distinct
semi- permanent persons.
9
For some purposes, processes in such a system might
be better individuated as teleological threads, based on their values, rather than on
the basis of bodies, personalities, memories, or abilities. In such scenarios, goal-
continuity might be said to constitute a key aspect of survival.
Even so, there are situations in which an agent can best fulll its nal goals by
intentionally changing them. Such situations can arise when any of the following
factors is signicant:
• Social signaling. When others can perceive an agent’s goals and use that information
to infer instrumentally relevant dispositions or other correlated attributes, it can be
in the agent’s interest to modify its goals to make a favorable impression. For exam-
ple, an agent might miss out on beneficial deals if potential partners cannot trust it to
fulfill its side of the bargain. In order to make credible commitments, an agent might
therefore wish to adopt as a final goal the honoring of its earlier commitments (and
allow others to verify that it has indeed adopted this goal). Agents that could flexibly
and transparently modify their own goals could use this ability to enforce deals.
0
• Social preferences. Others may also have final preferences about an agent’s goals. The
agent could then have reason to modify its goals, either to satisfy or to frustrate
those preferences.
• Preferences concerning own goal content. An agent might have some final goal con-
cerned with the agent’s own goal content. For example, the agent might have a final
goal to become the type of agent that is motivated by certain values rather than
others (such as compassion rather than comfort).
• Storage costs. If the cost of storing or processing some part of an agents utility func-
tion is large compared to the chance that a situation will arise in which applying that
part of the utility function will make a dierence, then the agent has an instrumental
reason to simplify its goal content, and it may trash the bit that is idle.

We humans oen seem happy to let our nal values dri. is might oen be
because we do not know precisely what they are. It is not surprising that we want
our beliefs about our nal values to be able to change in light of continuing self-
discovery or changing self-presentation needs. However, there are cases in which
we willingly change the values themselves, not just our beliefs or interpretations
INSTRUMENTAL CONVERGENCE
|

of them. For example, somebody deciding to have a child might predict that they
will come to value the child for its own sake, even though at the time of the deci-
sion they may not particularly value their future child or like children in general.
Humans are complicated, and many factors might be at play in a situation like
this.
12
For instance, one might have a nal value that involves becoming the kind
of person who cares about some other individual for his or her own sake, or one
might have a nal value that involves having certain experiences and occupying
a certain social role; and becoming a parent—and undergoing the attendant goal
shi—might be a necessary aspect of that. Human goals can also have inconsist-
ent content, and so some people might want to modify some of their nal goals to
reduce the inconsistencies.
Cognitive enhancement
Improvements in rationality and intelligence will tend to improve an agent’s
decision- making, rendering the agent more likely to achieve its nal goals. One
would therefore expect cognitive enhancement to emerge as an instrumental goal
for a wide variety of intelligent agents. For similar reasons, agents will tend to
instrumentally value many kinds of information.
13
Not all kinds of rationality, intelligence, and knowledge need be instrumentally
useful in the attainment of an agent’s nal goals. “Dutch book arguments” can be
used to show that an agent whose credence function violates the rules of probabil-
ity theory is susceptible to “money pump” procedures, in which a savvy bookie
arranges a set of bets each of which appears favorable according to the agents
beliefs, but which in combination are guaranteed to result in a loss for the agent,
and a corresponding gain for the bookie.
14
However, this fact fails to provide any
strong general instrumental reasons to iron out all probabilistic incoherency.
Agents who do not expect to encounter savvy bookies, or who adopt a general
policy against betting, do not necessarily stand to lose much from having some
incoherent beliefs—and they may gain important benets of the types mentioned:
reduced cognitive eort, social signaling, etc. ere is no general reason to expect
an agent to seek instrumentally useless forms of cognitive enhancement, as an
agent might not value knowledge and understanding for their own sakes.
Which cognitive abilities are instrumentally useful depends both on the agent’s
nal goals and on its situation. An agent that has access to reliable expert advice
may have little need for its own intelligence and knowledge. If intelligence and
knowledge come at a cost, such as time and eort expended in acquisition, or
increased storage or processing requirements, then the agent might prefer less
knowledge and less intelligence.
15
e same can hold if the agent has nal goals
that involve being ignorant of certain facts; and likewise if an agent faces incen-
tives arising from strategic commitments, signaling, or social preferences.
16
Each of these countervailing reasons oen comes into play for human beings.
Much information is irrelevant to our goals; we can oen rely on others’ skill and
expertise; acquiring knowledge takes time and eort; we might intrinsically value
2
|
THE SUPERINTELLIGENT WILL
certain kinds of ignorance; and we operate in an environment in which the ability
to make strategic commitments, socially signal, and satisfy other people’s direct
preferences over our own epistemic states is oen more important to us than sim-
ple cognitive gains.
ere are special situations in which cognitive enhancement may result in an
enormous increase in an agent’s ability to achieve its nal goals—in particular,
if the agents nal goals are fairly unbounded and the agent is in a position to
become the rst superintelligence and thereby potentially obtain a decisive strate-
gic advantage, enabling the agent to shape the future of Earth-originating life and
accessible cosmic resources according to its preferences. At least in this special
case, a rational intelligent agent would place a very high instrumental value on
cognitive enhancement.
Technological perfection
An agent may oen have instrumental reasons to seek better technology, which
at its simplest means seeking more ecient ways of transforming some given set
of inputs into valued outputs. us, a soware agent might place an instrumental
value on more ecient algorithms that enable its mental functions to run faster
on given hardware. Similarly, agents whose goals require some form of physi-
cal construction might instrumentally value improved engineering technology
which enables them to create a wider range of structures more quickly and reli-
ably, using fewer or cheaper materials and less energy. Of course, there is a trade-
o: the potential benets of better technology must be weighed against its costs,
including not only the cost of obtaining the technology but also the costs of learn-
ing how to use it, integrating it with other technologies already in use, and so
forth.
Proponents of some new technology, condent in its superiority to existing
alternatives, are oen dismayed when other people do not share their enthusiasm.
But people’s resistance to novel and nominally superior technology need not be
based on ignorance or irrationality. A technologys valence or normative charac-
ter depends not only on the context in which it is deployed, but also the vantage
point from which its impacts are evaluated: what is a boon from one persons
perspective can be a liability from another’s. us, although mechanized looms
increased the economic eciency of textile production, the Luddite handloom
weavers who anticipated that the innovation would render their artisan skills
obsolete may have had good instrumental reasons to oppose it. e point here
is that if “technological perfection” is to name a widely convergent instrumental
goal for intelligent agents, then the term must be understood in a special sense—
technology must be construed as embedded in a particular social context, and its
costs and benets must be evaluated with reference to some specied agents’ nal
values.
It seems that a superintelligent singleton—a superintelligent agent that faces no
signicant intelligent rivals or opposition, and is thus in a position to determine
INSTRUMENTAL CONVERGENCE
|
3
global policy unilaterally—would have instrumental reason to perfect the tech-
nologies that would make it better able to shape the world according to its pre-
ferred designs.
17
is would probably include space colonization technology, such
as von Neumann probes. Molecular nanotechnology, or some alternative still
more capable physical manufacturing technology, also seems potentially very
useful in the service of an extremely wide range of nal goals.
18
Resource acquisition
Finally, resource acquisition is another common emergent instrumental goal,
for much the same reasons as technological perfection: both technology and
resources facilitate physical construction projects.
Human beings tend to seek to acquire resources sucient to meet their basic
biological needs. But people usually seek to acquire resources far beyond this
minimum level. In doing so, they may be partially driven by lesser physical desid-
erata, such as increased convenience. A great deal of resource accumulation is
motivated by social concerns—gaining status, mates, friends, and inuence,
through wealth accumulation and conspicuous consumption. Perhaps less com-
monly, some people seek additional resources to achieve altruistic ambitions or
expensive non-social aims.
On the basis of such observations it might be tempting to suppose that a super-
intelligence not facing a competitive social world would see no instrumental rea-
son to accumulate resources beyond some modest level, for instance whatever
computational resources are needed to run its mind along with some virtual
reality. Yet such a supposition would be entirely unwarranted. First, the value of
resources depends on the uses to which they can be put, which in turn depends on
the available technology. With mature technology, basic resources such as time,
space, matter, and free energy, could be processed to serve almost any goal. For
instance, such basic resources could be converted into life. Increased computa-
tional resources could be used to run the superintelligence at a greater speed and
for a longer duration, or to create additional physical or simulated lives and civi-
lizations. Extra physical resources could also be used to create backup systems or
perimeter defenses, enhancing security. Such projects could easily consume far
more than one planets worth of resources.
Furthermore, the cost of acquiring additional extraterrestrial resources will
decline radically as the technology matures. Once von Neumann probes can be
built, a large portion of the observable universe (assuming it is uninhabited by
intelligent life) could be gradually colonized—for the one-o cost of building and
launching a single successful self-reproducing probe. is low cost of celestial
resource acquisition would mean that such expansion could be worthwhile even if
the value of the additional resources gained were somewhat marginal. For exam-
ple, even if a superintelligence’s nal goals only concerned what happened within
some particular small volume of space, such as the space occupied by its original
home planet, it would still have instrumental reasons to harvest the resources of
4
|
THE SUPERINTELLIGENT WILL
the cosmos beyond. It could use those surplus resources to build computers to
calculate more optimal ways of using resources within the small spatial region of
primary concern. It could also use the extra resources to build ever more robust
fortications to safeguard its sanctum. Since the cost of acquiring additional
resources would keep declining, this process of optimizing and increasing safe-
guards might well continue indenitely even if it were subject to steeply diminish-
ing returns.
19
us, there is an extremely wide range of possible nal goals a superintelli-
gent singleton could have that would generate the instrumental goal of unlimited
resource acquisition. e likely manifestation of this would be the superin-
telligence’s initiation of a colonization process that would expand in all direc-
tions using von Neumann probes. is would result in an approximate sphere
of expanding infrastructure centered on the originating planet and growing in
radius at some fraction of the speed of light; and the colonization of the universe
would continue in this manner until the accelerating speed of cosmic expansion
(a consequence of the positive cosmological constant) makes further procure-
ments impossible as remoter regions dri permanently out of reach (this happens
on a timescale of billions of years).
20
By contrast, agents lacking the technology
required for inexpensive resource acquisition, or for the conversion of generic
physical resources into useful infrastructure, may oen nd it not cost-eective
to invest any present resources in increasing their material endowments. e
same may hold for agents operating in competition with other agents of similar
powers. For instance, if competing agents have already secured accessible cos-
mic resources, there may be no colonization opportunities le for a late-starting
agent. e convergent instrumental reasons for superintelligences uncertain of
the non-existence of other powerful superintelligent agents are complicated by
strategic considerations that we do not currently fully understand but which may
constitute important qualications to the examples of convergent instrumental
reasons we have looked at here.
21
* * *
It should be emphasized that the existence of convergent instrumental reasons,
even if they apply to and are recognized by a particular agent, does not imply
that the agent’s behavior is easily predictable. An agent might well think of ways
of pursuing the relevant instrumental values that do not readily occur to us. is
is especially true for a superintelligence, which could devise extremely clever but
counterintuitive plans to realize its goals, possibly even exploiting as-yet undis-
covered physical phenomena.
22
What is predictable is that the convergent instru-
mental values would be pursued and used to realize the agent’s nal goals—not
the specic actions that the agent would take to achieve this.
EXISTENTIAL CATASTROPHE?
|
5
CHAPTER 8
Is the default outcome
doom?
W
e found the link between intelligence and nal values to be extremely
loose. We also found an ominous convergence in instrumental val-
ues. For weak agents, these things do not matter much; because weak
agents are easy to control and can do little damage. But in Chapter 6 we argued
that the rst superintelligence might well get a decisive strategic advantage. Its
goals would then determine how humanitys cosmic endowment will be used.
Now we can begin to see how menacing this prospect is.
Existential catastrophe as the default outcome
ofan intelligence explosion?
An existential risk is one that threatens to cause the extinction of Earth-
originating intelligent life or to otherwise permanently and drastically destroy
its potential for future desirable development. Proceeding from the idea of rst-
mover advantage, the orthogonality thesis, and the instrumental convergence
thesis, we can now begin to see the outlines of an argument for fearing that a
plausible default outcome of the creation of machine superintelligence is existen-
tial catastrophe.
First, we discussed how the initial superintelligence might obtain a decisive
strategic advantage. is superintelligence would then be in a position to form a
singleton and to shape the future of Earth-originating intelligent life. What hap-
pens from that point onward would depend on the superintelligence’s motivations.
Second, the orthogonality thesis suggests that we cannot blithely assume that
a superintelligence will necessarily share any of the nal values stereotypically
associated with wisdom and intellectual development in humans—scientic curi-
osity, benevolent concern for others, spiritual enlightenment and contemplation,
6
|
IS THE DEFAULT OUTCOME DOOM?
renunciation of material acquisitiveness, a taste for rened culture or for the sim-
ple pleasures in life, humility and selessness, and so forth. We will consider later
whether it might be possible through deliberate eort to construct a superintel-
ligence that values such things, or to build one that values human welfare, moral
goodness, or any other complex purpose its designers might want it to serve. But
it is no less possible—and in fact technically a lot easier—to build a superintel-
ligence that places nal value on nothing but calculating the decimal expansion
of pi. is suggests thatabsent a special eort—the rst superintelligence may
have some such random or reductionistic nal goal.
ird, the instrumental convergence thesis entails that we cannot blithely
assume that a superintelligence with the nal goal of calculating the decimals
of pi (or making paperclips, or counting grains of sand) would limit its activities
in such a way as not to infringe on human interests. An agent with such a nal
goal would have a convergent instrumental reason, in many situations, to acquire
an unlimited amount of physical resources and, if possible, to eliminate poten-
tial threats to itself and its goal system. Human beings might constitute potential
threats; they certainly constitute physical resources.
Taken together, these three points thus indicate that the rst superintel-
ligence may shape the future of Earth-originating life, could easily have non-
anthropomorphic nal goals, and would likely have instrumental reasons to
pursue open-ended resource acquisition. If we now reect that human beings
consist of useful resources (such as conveniently located atoms) and that we
depend for our survival and ourishing on many more local resources, we can
see that the outcome could easily be one in which humanity quickly becomes
extinct.
1
ere are some loose ends in this reasoning, and we shall be in a better posi-
tion to evaluate it aer we have cleared up several more surrounding issues. In
particular, we need to examine more closely whether and how a project develop-
ing a superintelligence might either prevent it from obtaining a decisive strategic
advantage or shape its nal values in such a way that their realization would also
involve the realization of a satisfactory range of human values.
It might seem incredible that a project would build or release an AI into the
world without having strong grounds for trusting that the system will not cause
an existential catastrophe. It might also seem incredible, even if one project were
so reckless, that wider society would not shut it down before it (or the AI it was
building) attains a decisive strategic advantage. But as we shall see, this is a road
with many hazards. Let us look at one example right away.
The treacherous turn
With the help of the concept of convergent instrumental value, we can see the aw
in one idea for how to ensure superintelligence safety. e idea is that we validate
the safety of a superintelligent AI empirically by observing its behavior while it
THE TREACHEROUS TURN
|
7
is in a controlled, limited environment (a “sandbox”) and that we only let the AI
out of the box if we see it behaving in a friendly, cooperative, responsible manner.
e aw in this idea is that behaving nicely while in the box is a convergent
instrumental goal for friendly and unfriendly AIs alike. An unfriendly AI of suf-
cient intelligence realizes that its unfriendly nal goals will be best realized if it
behaves in a friendly manner initially, so that it will be let out of the box. It will
only start behaving in a way that reveals its unfriendly nature when it no longer
matters whether we nd out; that is, when the AI is strong enough that human
opposition is ineectual.
Consider also a related set of approaches that rely on regulating the rate of intel-
ligence gain in a seed AI by subjecting it to various kinds of intelligence tests or by
having the AI report to its programmers on its rate of progress. At some point, an
unfriendly AI may become smart enough to realize that it is better o concealing
some of its capability gains. It may underreport on its progress and deliberately
unk some of the harder tests, in order to avoid causing alarm before it has grown
strong enough to attain a decisive strategic advantage. e programmers may try
to guard against this possibility by secretly monitoring the AI’s source code and
the internal workings of its mind; but a smart-enough AI would realize that it
might be under surveillance and adjust its thinking accordingly.
2
e AI might
nd subtle ways of concealing its true capabilities and its incriminating intent.
3
(Devising clever escape plans might, incidentally, also be a convergent strategy for
many types of friendly AI, especially as they mature and gain condence in their
own judgments and capabilities. A system motivated to promote our interests
might be making a mistake if it allowed us to shut it down or to construct another,
potentially unfriendly AI.)
We can thus perceive a general failure mode, wherein the good behavioral track
record of a system in its juvenile stages fails utterly to predict its behavior at a
more mature stage. Now, one might think that the reasoning described above is
so obvious that no credible project to develop articial general intelligence could
possibly overlook it. But one should not be too condent that this is so.
Consider the following scenario. Over the coming years and decades, AI systems
become gradually more capable and as a consequence nd increasing real-world
application: they might be used to operate trains, cars, industrial and household
robots, and autonomous military vehicles. We may suppose that this automation
for the most part has the desired eects, but that the success is punctuated by
occasional mishaps—a driverless truck crashes into oncoming trac, a military
drone res at innocent civilians. Investigations reveal the incidents to have been
caused by judgment errors by the controlling AIs. Public debate ensues. Some
call for tighter oversight and regulation, others emphasize the need for research
and better-engineered systems—systems that are smarter and have more com-
mon sense, and that are less likely to make tragic mistakes. Amidst the din can
perhaps also be heard the shrill voices of doomsayers predicting many kinds of ill
and impending catastrophe. Yet the momentum is very much with the growing
AI and robotics industries. So development continues, and progress is made. As
8
|
IS THE DEFAULT OUTCOME DOOM?
the automated navigation systems of cars become smarter, they suer fewer acci-
dents; and as military robots achieve more precise targeting, they cause less col-
lateral damage. A broad lesson is inferred from these observations of real-world
outcomes: the smarter the AI, the safer it is. It is a lesson based on science, data,
and statistics, not armchair philosophizing. Against this backdrop, some group of
researchers is beginning to achieve promising results in their work on developing
general machine intelligence. e researchers are carefully testing their seed AI
in a sandbox environment, and the signs are all good. e AI’s behavior inspires
condence—increasingly so, as its intelligence is gradually increased.
At this point, any remaining Cassandra would have several strikes against her:
i A history of alarmists predicting intolerable harm from the growing capabil-
ities of robotic systems and being repeatedly proven wrong. Automation has
brought many benefits and has, on the whole, turned out safer than human
operation.
ii A clear empirical trend: the smarter the AI, the safer and more reliable it
has been. Surely this bodes well for a project aiming at creating machine in-
telligence more generally smart than any ever built beforewhat is more,
machine intelligence that can improve itself so that it will become even more
reliable.
iii Large and growing industries with vested interests in robotics and machine
intelligence. These fields are widely seen as key to national economic com-
petitiveness and military security. Many prestigious scientists have built their
careers laying the groundwork for the present applications and the more ad-
vanced systems being planned.
iv A promising new technique in artificial intelligence, which is tremendously ex-
citing to those who have participated in or followed the research. Although
safety issues and ethics are debated, the outcome is preordained. Too much
has been invested to pull back now. AI researchers have been working to get
to human-level artificial general intelligence for the better part of a century:
of course there is no real prospect that they will now suddenly stop and throw
away all this eort just when it finally is about to bear fruit.
v The enactment of some safety rituals, whatever helps demonstrate that the
participants are ethical and responsible (but nothing that significantly impedes
the forward charge).
vi A careful evaluation of seed AI in a sandbox environment, showing that it
is behaving cooperatively and showing good judgment. After some further
adjustments, the test results are as good as they could be. It is a green light
forthe final step ...
And so we boldly go—into the whirling knives.
We observe here how it could be the case that when dumb, smarter is safer; yet
when smart, smarter is more dangerous. ere is a kind of pivot point, at which
a strategy that has previously worked excellently suddenly starts to backre. We
may call the phenomenon the treacherous turn.
MALIGNANT FAILURE MODES
|
9
The treacherous turn—While weak, an AI behaves cooperatively (increasingly so, as it
gets smarter). When the AI gets suciently strong—without warning or provocation
it strikes, forms a singleton, and begins directly to optimize the world according to the
criteria implied by its final values.
A treacherous turn can result from a strategic decision to play nice and build
strength while weak in order to strike later; but this model should not be inter-
preted too narrowly. For example, an AI might not play nice in order that it
be allowed to survive and prosper. Instead, the AI might calculate that if it is
terminated, the programmers who built it will develop a new and somewhat
dierent AI architecture, but one that will be given a similar utility function.
In this case, the original AI may be indierent to its own demise, knowing that
its goals will continue to be pursued in the future. It might even choose a strat-
egy in which it malfunctions in some particularly interesting or reassuring way.
ough this might cause the AI to be terminated, it might also encourage the
engineers who perform the postmortem to believe that they have gleaned a valu-
able new insight into AI dynamics—leading them to place more trust in the
next system they design, and thus increasing the chance that the now-defunct
original AI’s goals will be achieved. Many other possible strategic considerations
might also inuence an advanced AI, and it would be hubristic to suppose that
we could anticipate all of them, especially for an AI that has attained the strate-
gizing superpower.
A treacherous turn could also come about if the AI discovers an unanticipated
way of fullling its nal goal as specied. Suppose, for example, that an AI’s nal
goal is to “make the projects sponsor happy.” Initially, the only method available
to the AI to achieve this outcome is by behaving in ways that please its sponsor in
something like the intended manner. e AI gives helpful answers to questions;
it exhibits a delightful personality; it makes money. e more capable the AI gets,
the more satisfying its performances become, and everything goeth according to
plan—until the AI becomes intelligent enough to gure out that it can realize its
nal goal more fully and reliably by implanting electrodes into the pleasure cent-
ers of its sponsor’s brain, something assured to delight the sponsor immensely.
4
Of course, the sponsor might not have wanted to be pleased by being turned into
a grinning idiot; but if this is the action that will maximally realize the AI’s nal
goal, the AI will take it. If the AI already has a decisive strategic advantage, then
any attempt to stop it will fail. If the AI does not yet have a decisive strategic
advantage, then the AI might temporarily conceal its canny new idea for how to
instantiate its nal goal until it has grown strong enough that the sponsor and
everybody else will be unable to resist. In either case, we get a treacherous turn.
Malignant failure modes
A project to develop machine superintelligence might fail in various ways.
Many of these are “benign” in the sense that they would not cause an existential
20
|
IS THE DEFAULT OUTCOME DOOM?
catastrophe. For example, a project might run out of funding, or a seed AI might
fail to extend its cognitive capacities suciently to reach superintelligence. Benign
failures are bound to occur many times between now and the eventual develop-
ment of machine superintelligence.
But there are other ways of failing that we might term “malignant” in that they
involve an existential catastrophe. One feature of a malignant failure is that it
eliminates the opportunity to try again. e number of malignant failures that
will occur is therefore either zero or one. Another feature of a malignant fail-
ure is that it presupposes a great deal of success: only a project that got a great
number of things right could succeed in building a machine intelligence powerful
enough to pose a risk of malignant failure. When a weak system malfunctions,
the fallout is limited. However, if a system that has a decisive strategic advantage
misbehaves, or if a misbehaving system is strong enough to gain such an advan-
tage, the damage can easily amount to an existential catastrophe—a terminal and
global destruction of humanitys axiological potential; that is to say, a future that
is mostly void of whatever we have reason to value.
Let us look at some possible malignant failure modes.
Perverse instantiation
We have already encountered the idea of perverse instantiation: a superintelli-
gence discovering some way of satisfying the criteria of its nal goal that violates
the intentions of the programmers who dened the goal. Some examples:
Final goal: “Make us smile”
Perverse instantiation: Paralyze human facial musculatures into constant beaming smiles
e perverse instantiation—manipulating facial nerves—realizes the nal goal
to a greater degree than the methods we would normally use, and is therefore
preferred by the AI. One might try to avoid this undesirable outcome by adding a
stipulation to the nal goal to rule it out:
Final goal: “Make us smile without directly interfering with our facial muscles
Perverse instantiation: Stimulate the part of the motor cortex that controls our facial muscu-
lature in such a way as to produce constant beaming smiles
Dening a nal goal in terms of human expressions of satisfaction or approval
does not seem promising. Let us bypass the behaviorism and specify a nal goal
that refers directly to a positive phenomenal state, such as happiness or subjective
well-being. is suggestion requires that the programmers are able to dene a
computational representation of the concept of happiness in the seed AI. is is
itself a dicult problem, but we set it to one side for now (we will return to it in
Chapter 12). Let us suppose that the programmers can somehow get the AI to have
the goal of making us happy. We then get:
Final goal: “Make us happy”
Perverse instantiation: Implant electrodes into the pleasure centers of our brains
MALIGNANT FAILURE MODES
|
2
e perverse instantiations we mention are only meant as illustrations. ere may
be other ways of perversely instantiating the stated nal goal, ways that enable a
greater degree of realization of the goal and which are therefore preferred (by the
agent whose nal goals they are—not by the programmers who gave the agent
these goals). For example, if the goal is to maximize our pleasure, then the elec-
trode method is relatively inecient. A more plausible way would start with the
superintelligence “uploading” our minds to a computer (through high-delity
brain emulation). e AI could then administer the digital equivalent of a drug
to make us ecstatically happy and record a one-minute episode of the resulting
experience. It could then put this bliss loop on perpetual repeat and run it on fast
computers. Provided that the resulting digital minds counted as “us,” this out-
come would give us much more pleasure than electrodes implanted in biological
brains, and would therefore be preferred by an AI with the stated nal goal.
“But wait! is is not what we meant! Surely if the AI is superintelligent, it
must understand that when we asked it to make us happy, we didn’t mean that it
should reduce us to a perpetually repeating recording of a drugged-out digitized
mental episode!”e AI may indeed understand that this is not what we meant.
However, its nal goal is to make us happy, not to do what the programmers
meant when they wrote the code that represents this goal. erefore, the AI will
care about what we meant only instrumentally. For instance, the AI might place
an instrumental value on nding out what the programmers meant so that it can
pretend—until it gets a decisive strategic advantage—that it cares about what the
programmers meant rather than about its actual nal goal. is will help the AI
realize its nal goal by making it less likely that the programmers will shut it
down or change its goal before it is strong enough to thwart any such interference.
Perhaps it will be suggested that the problem is that the AI has no conscience.
We humans are sometimes saved from wrongdoing by the anticipation that we
would feel guilty aerwards if we lapsed. Maybe what the AI needs, then, is the
capacity to feel guilt?
Final goal:Act so as to avoid the pangs of bad conscience”
Perverse instantiation: Extirpate the cognitive module that produces guilt feelings
Both the observation that we might want the AI to do “what we meant” and the
idea that we might want to endow the AI with some kind of moral sense deserve
to be explored further. e nal goals mentioned above would lead to perverse
instantiations; but there may be other ways of developing the underlying ideas
that have more promise. We will return to this in Chapter 13.
Let us consider one more example of a nal goal that leads to a perverse
instantiation. is goal has the advantage of being easy to specify in code:
reinforcement- learning algorithms are routinely used to solve various machine
learning problems.
Final goal: “Maximize the time-discounted integral of your future reward signal”
Perverse instantiation: Short-circuit the reward pathway and clamp the reward signal to its
maximal strength
22
|
IS THE DEFAULT OUTCOME DOOM?
e idea behind this proposal is that if the AI is motivated to seek reward, then
one could get it to behave desirably by linking reward to appropriate action. e
proposal fails when the AI obtains a decisive strategic advantage, at which point
the action that maximizes reward is no longer one that pleases the trainer but one
that involves seizing control of the reward mechanism. We can call this phenom-
enon wireheading.
5
In general, while an animal or a human can be motivated to
perform various external actions in order to achieve some desired inner mental
state, a digital mind that has full control of its internal state can short-circuit such
a motivational regime by directly changing its internal state into the desired con-
guration: the external actions and conditions that were previously necessary as
means become superuous when the AI becomes intelligent and capable enough
to achieve the end more directly (more on this shortly).
6
ese examples of perverse instantiation show that many nal goals that might
at rst glance seem safe and sensible turn out, on closer inspection, to have radi-
cally unintended consequences. If a superintelligence with one of these nal goals
obtains a decisive strategic advantage, it is game over for humanity.
Suppose now that somebody proposes a dierent nal goal, one not included in
our list above. Perhaps it is not immediately obvious how it could have a perverse
instantiation. But we should not be too quick to clap our hands and declare vic-
tory. Rather, we should worry that the goal specication does have some perverse
instantiation and that we need to think harder in order to nd it. Even if aer
thinking as hard as we can we fail to discover any way of perversely instantiating
the proposed goal, we should remain concerned that maybe a superintelligence
will nd a way where none is apparent to us. It is, aer all, far shrewder than we are.
Infrastructure profusion
One might think that the last of the abovementioned perverse instantiations,
wireheading, is a benign failure mode: that the AI would “turn on, tune in, drop
out,” maxing out its reward signal and losing interest in the external world, rather
like a heroin addict. But this is not necessarily so, and we already hinted at the
reason in Chapter 7. Even a junkie is motivated to take actions to ensure a con-
tinued supply of his drug. e wireheaded AI, likewise, would be motivated to
take actions to maximize the expectation of its (time-discounted) future reward
stream. Depending on exactly how the reward signal is dened, the AI may not
even need to sacrice any signicant amount of its time, intelligence, or product-
ivity to indulge its craving to the fullest, leaving the bulk of its capacities free
to be deployed for purposes other than the immediate registration of reward.
What other purposes? e only thing of nal value to the AI, by assumption, is its
reward signal. All available resources should therefore be devoted to increasing
the volume and duration of the reward signal or to reducing the risk of a future
disruption. So long as the AI can think of some use for additional resources that
will have a nonzero positive eect on these parameters, it will have an instrumen-
tal reason to use those resources. ere could, for example, always be use for an
MALIGNANT FAILURE MODES
|
23
extra backup system to provide an extra layer of defense. And even if the AI could
not think of any further way of directly reducing risks to the maximization of its
future reward stream, it could always devote additional resources to expanding
its computational hardware, so that it could search more eectively for new risk
mitigation ideas.
e upshot is that even an apparently self-limiting goal, such as wirehead-
ing, entails a policy of unlimited expansion and resource acquisition in a
utility- maximizing agent that enjoys a decisive strategic advantage.
7
is case
of a wireheading AI exemplies the malignant failure mode of infrastructure
profusion, a phenomenon where an agent transforms large parts of the reachable
universe into infrastructure in the service of some goal, with the side eect of
preventing the realization of humanitys axiological potential.
Infrastructure profusion can result from nal goals that would have been per-
fectly innocuous if they had been pursued as limited objectives. Consider the fol-
lowing two examples:
• Riemann hypothesis catastrophe. An AI, given the final goal of evaluating the Riemann
hypothesis, pursues this goal by transforming the Solar System into “computronium”
(physical resources arranged in a way that is optimized for computation)—including
the atoms in the bodies of whomever once cared about the answer.
8
• Paperclip AI. An AI, designed to manage production in a factory, is given the final goal
of maximizing the manufacture of paperclips, and proceeds by converting first the
Earth and then increasingly large chunks of the observable universe into paperclips.
In the rst example, the proof or disproof of the Riemann hypothesis that the
AI produces is the intended outcome and is in itself harmless; the harm comes
from the hardware and infrastructure created to achieve this result. In the sec-
ond example, some of the paperclips produced would be part of the intended
outcome; the harm would come either from the factories created to produce the
paperclips (infrastructure profusion) or from the excess of paperclips (perverse
instantiation).
One might think that the risk of a malignant infrastructure profusion failure
arises only if the AI has been given some clearly open-ended nal goal, such as
to manufacture as many paperclips as possible. It is easy to see how this gives the
superintelligent AI an insatiable appetite for matter and energy, since additional
resources can always be turned into more paperclips. But suppose that the goal is
instead to make at least one million paperclips (meeting suitable design specica-
tions) rather than to make as many as possible. One would like to think that an AI
with such a goal would build one factory, use it to make a million paperclips, and
then halt. Yet this may not be what would happen.
Unless the AI’s motivation system is of a special kind, or there are additional
elements in its nal goal that penalize strategies that have excessively wide-
ranging impacts on the world, there is no reason for the AI to cease activity upon
achieving its goal. On the contrary: if the AI is a sensible Bayesian agent, it would
never assign exactly zero probability to the hypothesis that it has not yet achieved
24
|
IS THE DEFAULT OUTCOME DOOM?
its goal—this, aer all, being an empirical hypothesis against which the AI can
have only uncertain perceptual evidence. e AI should therefore continue to
make paperclips in order to reduce the (perhaps astronomically small) probability
that it has somehow still failed to make at least a million of them, all appearances
notwithstanding. ere is nothing to be lost by continuing paperclip production
and there is always at least some microscopic probability increment of achieving
its nal goal to be gained.
Now it might be suggested that the remedy here is obvious. (But how obvious
was it before it was pointed out that there was a problem here in need of remedy-
ing?) Namely, if we want the AI to make some paperclips for us, then instead of
giving it the nal goal of making as many paperclips as possible, or to make at
least some number of paperclips, we should give it the nal goal of making some
specic number of paperclips—for example, exactly one million paperclips—so
that going beyond this number would be counterproductive for the AI. Yet this,
too, would result in a terminal catastrophe. In this case, the AI would not produce
additional paperclips once it had reached one million, since that would prevent
the realization of its nal goal. But there are other actions the superintelligent AI
could take that would increase the probability of its goal being achieved. It could,
for instance, count the paperclips it has made, to reduce the risk that it has made
too few. Aer it has counted them, it could count them again. It could inspect each
one, over and over, to reduce the risk that any of the paperclips fail to meet the
design specications. It could build an unlimited amount of computronium in an
eort to clarify its thinking, in the hope of reducing the risk that it has overlooked
some obscure way in which it might have somehow failed to achieve its goal. Since
the AI may always assign a nonzero probability to having merely hallucinated
making the million paperclips, or to having false memories, it would quite pos-
sibly always assign a higher expected utility to continued action—and continued
infrastructure production—than to halting.
e claim here is not that there is no possible way to avoid this failure mode.
We will explore some potential solutions in later pages. e claim is that it is much
easier to convince oneself that one has found a solution than it is to actually nd
a solution. is should make us extremely wary. We may propose a specication
of a nal goal that seems sensible and that avoids the problems that have been
pointed out so far, yet which upon further consideration—by human or super-
human intelligence—turns out to lead to either perverse instantiation or infra-
structure profusion, and hence to existential catastrophe, when embedded in a
superintelligent agent able to attain a decisive strategic advantage.
Before we end this subsection, let us consider one more variation. We have been
assuming the case of a superintelligence that is seeking to maximize its expected
utility, where the utility function expresses its nal goal. We have seen that this
tends to lead to infrastructure profusion. Might we avoid this malignant outcome
if instead of a maximizing agent we build a satiscing agent, one that simply seeks
to achieve an outcome that is “good enough” according to some criterion, rather
than an outcome that is as good as possible?
MALIGNANT FAILURE MODES
|
25
ere are at least two dierent ways to formalize this idea. e rst would be to
make the nal goal itself have a satiscing character. For example, instead of giv-
ing the AI the nal goal of making as many paperclips as possible, or of making
exactly one million paperclips, we might give the AI the goal of making between
999,000 and 1,001,000 paperclips. e utility function dened by the nal goal
would be indierent between outcomes in this range; and as long as the AI is sure
it has hit this wide target, it would see no reason to continue to produce infra-
structure. But this method fails in the same way as before: the AI, if reasonable,
never assigns exactly zero probability to it having failed to achieve its goal; there-
fore the expected utility of continuing activity (e.g. by counting and recounting
the paperclips) is greater than the expected utility of halting. us, a malignant
infrastructure profusion can result.
Another way of developing the satiscing idea is by modifying not the nal goal
but the decision procedure that the AI uses to select plans and actions. Instead
of searching for an optimal plan, the AI could be constructed to stop looking as
soon as it found a plan that it judged gave a probability of success exceeding a
certain threshold, say 95%. Hopefully, the AI could achieve a 95% probability of
having manufactured one million paperclips without needing to turn the entire
galaxy into infrastructure in the process. But this way of implementing the satis-
cing idea fails for another reason: there is no guarantee that the AI would select
some humanly intuitive and sensible way of achieving a 95% chance of having
manufactured a million paperclips, such as by building a single paperclip factory.
Suppose that the rst solution that pops into the AI’s mind for how to achieve
a 95% probability of achieving its nal goal is to implement the probability-
maximizing plan for achieving the goal. Having thought of this solution, and
having correctly judged that it meets the satiscing criterion of giving at least 95%
probability to successfully manufacturing one million paperclips, the AI would
then have no reason to continue to search for alternative ways of achieving the
goal. Infrastructure profusion would result, just as before.
Perhaps there are better ways of building a satiscing agent, but let us take
heed: plans that appear natural and intuitive to us humans need not so appear to
a superintelligence with a decisive strategic advantage, and vice versa.
Mind crime
Another failure mode for a project, especially a project whose interests incorpor-
ate moral considerations, is what we might refer to as mind crime. is is similar
to infrastructure profusion in that it concerns a potential side eect of actions
undertaken by the AI for instrumental reasons. But in mind crime, the side eect
is not external to the AI; rather, it concerns what happens within the AI itself (or
within the computational processes it generates). is failure mode deserves its
own designation because it is easy to overlook yet potentially deeply problematic.
Normally, we do not regard what is going on inside a computer as having
any moral signicance except insofar as it aects things outside. But a machine
26
|
IS THE DEFAULT OUTCOME DOOM?
superintelligence could create internal processes that have moral status. For
example, a very detailed simulation of some actual or hypothetical human mind
might be conscious and in many ways comparable to an emulation. One can
imagine scenarios in which an AI creates trillions of such conscious simulations,
perhaps in order to improve its understanding of human psychology and soci-
ology. ese simulations might be placed in simulated environments and sub-
jected to various stimuli, and their reactions studied. Once their informational
usefulness has been exhausted, they might be destroyed (much as lab rats are
routinely sacriced by human scientists at the end of an experiment).
If such practices were applied to beings that have high moral status—such as
simulated humans or many other types of sentient mind—the outcome might be
equivalent to genocide and thus extremely morally problematic. e number of
victims, moreover, might be orders of magnitude larger than in any genocide in
history.
e claim here is not that creating sentient simulations is necessarily morally
wrong in all situations. Much would depend on the conditions under which these
beings would live, in particular the hedonic quality of their experience but pos-
sibly on many other factors as well. Developing an ethics for these matters is a task
outside the scope of this book. It is clear, however, that there is at least the poten-
tial for a vast amount of death and suering among simulated or digital minds,
and, a fortiori, the potential for morally catastrophic outcomes.
9
ere might also be other instrumental reasons, aside from epistemic ones, for
a machine superintelligence to run computations that instantiate sentient minds
or that otherwise infract moral norms. A superintelligence might threaten to mis-
treat, or commit to reward, sentient simulations in order to blackmail or incen-
tivize various external agents; or it might create simulations in order to induce
indexical uncertainty in outside observers.
10
* * *
is inventory is incomplete. We will encounter additional malignant failure
modes in later chapters. But we have seen enough to conclude that scenarios in
which some machine intelligence gets a decisive strategic advantage are to be
viewed with grave concern.
TWO AGENCY PROBLEMS
|
27
CHAPTER 9
The control problem
I
f we are threatened with existential catastrophe as the default outcome of an
intelligence explosion, our thinking must immediately turn to the search
for countermeasures. Is there some way to avoid the default outcome? Is it
possible to engineer a controlled detonation? In this chapter we begin to analyze
the control problem, the unique principal–agent problem that arises with the
creation of an articial superintelligent agent. We distinguish two broad classes
of potential methods for addressing this problem—capability control and moti-
vation selection—and we examine several specic techniques within each class.
We also allude to the esoteric possibility of “anthropic capture.
Two agency problems
If we suspect that the default outcome of an intelligence explosion is existential
catastrophe, our thinking must immediately turn to whether, and if so how, this
default outcome can be avoided. Is it possible to achieve a “controlled detona-
tion”? Could we engineer the initial conditions of an intelligence explosion so as
to achieve a specic desired outcome, or at least to ensure that the result lies some-
where in the class of broadly acceptable outcomes? More specically: how can the
sponsor of a project that aims to develop superintelligence ensure that the project,
if successful, produces a superintelligence that would realize the sponsor’s goals?
We can divide this control problem into two parts. One part is generic, the other
unique to the present context.
is rst part—what we shall call the rst principal–agent problem—arises
whenever some human entity (“the principal”) appoints another (“the agent”)
to act in the former’s interest. is type of agency problem has been extensively
studied by economists.
1
It becomes relevant to our present concern if the people
creating an AI are distinct from the people commissioning its creation. e pro-
ject’s owner or sponsor (which could be anything ranging from a single individual
to humanity as a whole) might then worry that the scientists and programmers
28
|
THE CONTROL PROBLEM
implementing the project will not act in the sponsor’s best interest.
2
Although this
type of agency problem could pose signicant challenges to a project sponsor, it
is not a problem unique to intelligence amplication or AI projects. Principal–
agent problems of this sort are ubiquitous in human economic and political inter-
actions, and there are many ways of dealing with them. For instance, the risk
that a disloyal employee will sabotage or subvert the project could be minimized
through careful background checks of key personnel, the use of a good version-
control system for soware projects, and intensive oversight from multiple inde-
pendent monitors and auditors. Of course, such safeguards come at a cost—they
expand stang needs, complicate personnel selection, hinder creativity, and stie
independent and critical thought, all of which could reduce the pace of progress.
ese costs could be signicant, especially for projects that have tight budgets, or
that perceive themselves to be in a close race in a winner-takes-all competition. In
such situations, projects may skimp on procedural safeguards, creating possibili-
ties for potentially catastrophic principal–agent failures of the rst type.
e other part of the control problem is more specic to the context of an intel-
ligence explosion. is is the problem that a project faces when it seeks to ensure
that the superintelligence it is building will not harm the projects interests. is
part, too, can be thought of as a principal–agent problem—the second principal–
agent problem. In this case, the agent is not a human agent operating on behalf of
a human principal. Instead, the agent is the superintelligent system. Whereas the
rst principal–agent problem occurs mainly in the development phase, the sec-
ond agency problem threatens to cause trouble mainly in the superintelligence’s
operational phase.
Exhibit 1 Two agency problems
The first principalagent problem
• Human v. Human (Sponsor Developer)
• Occurs mainly in developmental phase
• Standard management techniques apply
The second principalagent problem (“the control problem”)
• Human v. Superintelligence (Project System)
• Occurs mainly in operational (and bootstrap) phase
• New techniques needed
is second agency problem poses an unprecedented challenge. Solving it
will require new techniques. We have already considered some of the diculties
involved. We saw, in particular, that the treacherous turn syndrome vitiates what
might otherwise have seemed like a promising set of methods, ones that rely on
observing an AI’s behavior in its developmental phase and allowing the AI to
graduate from a secure environment once it has accumulated a track record of
taking appropriate actions. Other technologies can oen be safety-tested in the
laboratory or in small eld studies, and then rolled out gradually with a possibility
CAPABILITY CONTROL METHODS
|
29
of halting deployment if unexpected troubles arise. eir performance in prelim-
inary trials helps us make reasonable inferences about their future reliability.
Such behavioral methods are defeated in the case of superintelligence because of
the strategic planning ability of general intelligence.
3
Since the behavioral approach is unavailing, we must look for alternatives. We
can divide potential control methods into two broad classes: capability control
methods, which aim to control what the superintelligence can do; and motivation
selection methods, which aim to control what it wants to do. Some of the methods
are compatible while others represent mutually exclusive alternatives. In this
chapter we canvass the main options. (In the next four chapters, we will explore
some of the key issues at greater depth.)
It is important to realize that some control method (or combination of methods)
must be implemented before the system becomes superintelligent. It cannot be
done aer the system has obtained a decisive strategic advantage. e need to
solve the control problem in advance—and to implement the solution successfully
in the very rst system to attain superintelligence—is part of what makes achiev-
ing a controlled detonation such a daunting challenge.
Capability control methods
Capability control methods seek to prevent undesirable outcomes by limiting
what the superintelligence can do. is might involve placing the superintelli-
gence in an environment in which it is unable to cause harm (boxing methods)
or in which there are strongly convergent instrumental reasons not to engage in
harmful behavior (incentive methods). It might also involve limiting the inter-
nal capacities of the superintelligence (stunting). In addition, capability control
methods might involve the use of mechanisms to automatically detect and react
to various kinds of containment failure or attempted transgression (tripwires).
Boxing methods
Boxing methods can be subdivided into physical and informational containment
methods.
Physical containment aims to conne the system to a “box,” i.e. to prevent
the system from interacting with the external world otherwise than via specic
restricted output channels. e boxed system would not have access to physical
manipulators outside of the box. Removing manipulators (such as robotic arms)
from inside the box as well would prevent the system from constructing physical
devices that could breach the connement.
For extra security, the system should be placed in a metal mesh to prevent it
from transmitting radio signals, which might otherwise oer a means of manipu-
lating electronic objects such as radio receivers in the environment. Note, by the
way, how easy it might have been to overlook the need for this precaution. One
30
|
THE CONTROL PROBLEM
might naively have assumed that an agent without a manipulator could not aect
the external world. But it might be possible for a machine intelligence to generate
radio waves even when it lacks access to external manipulators, simply “by think-
ing” (that is, by shuing the electrons in its circuitry in particular patterns).
4
Once pointed out, this vulnerability can be patched by enclosing the system in a
Faraday cage—but we are le wondering how many other similarly subtle vulner-
abilities might exist. Each time we hear of a seemingly foolproof security design
that has an unexpected aw, we should prick up our ears. ese occasions grace
us with the opportunity to abandon a life of overcondence and resolve to become
better Bayesians.
5
Physical connement has several advantages. It is easy to implement. It can
be applied to many machine intelligence architectures, even ones that were not
initially designed with safety as an objective. It can be used in combination with
most other control methods. And it seems unlikely to go wrong by backring:
that is, while it might fail to ensure safety, it is unlikely to cause a catastrophe that
would not otherwise have occurred. e main disadvantage with physical con-
nement is that it reduces the functionality of the superintelligence (an issue to
which we will return in Chapter 10). Another concern is that it might encourage a
false sense of security, though this is avoidable if we regard physical connement
as icing on the cake rather than the main substance of our precautions.
Informational containment aims to restrict what information is allowed to
exit the box. We have already seen how a superintelligence that has access to an
Internet port, such that it can message outside entities, is potentially unsafe: even
if it starts out without access to physical actuators, it may use its information out-
put channel to get human beings to do its bidding. An obvious informational
containment method, therefore, is to bar the system from accessing communica-
tions networks.
Unplugging an AI from the Internet does not, however, ensure safety if there
are one or more humans serving as the system’s gatekeepers and remaining in
communication with it. During the development phase, the programmers serve
as gatekeepers; during the operational phase, the systems owner might play this
role. If the AI can persuade or trick a gatekeeper to let it out of the box, resulting in
its gaining access either to the Internet or directly to physical manipulators, then
the boxing strategy has failed. Human beings are not secure systems, especially
not when pitched against a superintelligent schemer and persuader.
6
Although routing the outbound communications of a machine intelligence
to a human gatekeeper does not guarantee safety, it could serve as an auxiliary
safety measure. ere are precautions that might reduce the risk of the AI escap-
ing through the gatekeeper. e AI could be allowed to communicate only via
a low-bandwidth teletype interface. is would make it slightly harder to psy-
chologically manipulate the gatekeepers (precluding the use of emotive sound
and imagery and tricks such as ashing hypnotic patterns to induce a trancelike
state).
7
Going further, the content (not just the mode of presentation) of the output
could be restricted; so that, for example, the AI would be constrained to giving
CAPABILITY CONTROL METHODS
|
3
single-bit answers to yes/no questions. e idea of building a question-answering
system will be examined further in the next chapter.
e limiting case of the boxing approach would be a system kept in complete
physical and informational isolation. Even if achievable, however, such an isolated
system would be rather useless since it would have no eect on the external world.
It might perhaps be thought that some scientic purpose could be served by cre-
ating a superintelligence and keeping it in isolation: by studying a self-contained
model system, one could learn about its internal dynamics and its behavior pat-
terns. But this would be an error. As soon as the system is observed, it ceases to
be informationally isolated.
8
Information can be transmitted not only via mes-
sages that an AI sends out through a designated “output channel” but also via any
observation an outsider makes of some causal consequence of the AI’s workings,
direct or indirect—its power consumption, its CPU and memory usage, its com-
putational states, or indeed any traces le behind aer it has been shut down. An
AI anticipating that it might be observed in any of these ways could strategically
adopt behaviors designed to inuence the hypothesized observers. For example,
the AI could “malfunction” in whichever way would most increase the probability
that it will ultimately be reactivated or that another superintelligence will be cre-
ated with goals similar to its own.
Incentive methods
Incentive methods involve placing an agent in an environment where it nds
instrumental reasons to act in ways that promote the principals interests.
Consider a billionaire who uses her fortune to set up a large charitable founda-
tion. Once created, the foundation may be powerful—more powerful than most
individuals, including its founder, who might have donated most of her wealth.
To control the foundation, the founder lays down its purpose in articles of incor-
poration and bylaws, and appoints a board of directors sympathetic to her cause.
ese measures constitute a form of motivation selection, since they aim to shape
foundations preferences. But even if such attempts to customize the organiza-
tional internals fail, the foundations behavior would remain circumscribed by its
social and legal milieu. e foundation would have an incentive to obey the law,
for example, lest it be shut down or ned. It would have an incentive to oer its
employees acceptable pay and working conditions, and to satisfy external stake-
holders. Whatever its nal goals, the foundation thus has instrumental reasons to
conform its behavior to various social norms.
Might one not hope that a machine superintelligence would likewise be hemmed
in by the need to get along with the other actors with which it shares the stage?
ough this might seem like a straightforward way of dealing with the control
problem, it is not free of obstacles. In particular, it presupposes a balance of power:
legal or economic sanctions cannot restrain an agent that has a decisive strategic
advantage. Social integration can therefore not be relied upon as a control method
in fast or medium takeo scenarios that feature a winner-takes-all dynamic.
32
|
THE CONTROL PROBLEM
How about in multipolar scenarios, wherein several agencies emerge post-
transition with comparable levels of capability? Unless the default trajectory is
one with a slow takeo, achieving such a power distribution may require a care-
fully orchestrated ascent wherein dierent projects are deliberately synchronized
to prevent any one of them from ever pulling ahead of the pack.
9
Even if a multi-
polar outcome does result, social integration is not a perfect solution. By relying
on social integration to solve the control problem, the principal risks sacricing
a large portion of her potential inuence. Although a balance of power might
prevent a particular AI from taking over the world, that AI will still have some
power to aect outcomes; and if that power is used to promote some arbitrary
nal goal—maximizing paperclip production—it is probably not being used to
advance the interests of the principal. Imagine our billionaire endowing a new
foundation and allowing its mission to be set by a random word generator: not a
species-level threat, but surely a wasted opportunity.
A related but importantly dierent idea is that an AI, by interacting freely in
society, would acquire new human-friendly nal goals. Some such process of
socialization takes place in us humans. We internalize norms and ideologies, and
we come to value other individuals for their own sakes in consequence of our
experiences with them. But this is not a universal dynamic present in all intel-
ligent systems. As discussed earlier, many types of agent in many situations will
have convergent instrumental reasons not to permit changes in their nal goals.
(One might consider trying to design a special kind of goal system that can acquire
nal goals in the manner that humans do; but this would not count as a capability
control method. We will discuss some possible methods of value acquisition in
Chapter 12.)
Capability control through social integration and balance of power relies upon
diuse social forces rewarding and penalizing the AI. Another type of incen-
tive method would involve creating a setup wherein the AI can be rewarded and
penalized by the project that creates it, and thereby incentivized to act in the
interests of the principal. To achieve this, the AI would be placed in a surveillance
context that allows its behavior to be monitored and evaluated, either manually or
by some automated process. e AI would know that a positive evaluation would
bring about some outcome it desires and that a negative evaluation would fail
to do so. eoretically, the reward could be the fulllment of some convergent
instrumental goal. Without knowing anything specic about the AI’s motiva-
tion system, however, it might be dicult to calibrate the reward mechanism.
For example, we could end up with an AI that would be willing to take extreme
risks for the sake of a small chance of eventually obtaining control of a large share
of the universe. It could be expensive to oer the AI a higher expected utility as
reward for cooperation than the AI could hope to achieve by defecting and trying
to escape.
10
A better alternative might be to combine the incentive method with the use
of motivation selection to give the AI a nal goal that makes it easier to con-
trol. Suppose that an AI were designed to have as its nal goal that a particular
CAPABILITY CONTROL METHODS
|
3 3
red button inside a command bunker never be pressed. Since the pressing of the
button is disvalued intrinsically and not because of its causal consequences, the
button could be completely inert: it could be made of Play-Doh. Furthermore, it
is irrelevant whether the AI can ever know whether the button had been pressed.
What is essential is that the AI believes that the button will more likely remain
unpressed if the AI continuously acts in the principals interest than if it rebels.
Renements to this setup are possible. Instead of trying to endow an AI with a
nal goal that refers to a physical button, one could build an AI that places nal
value on receiving a stream of “cryptographic reward tokens.
11
ese would be
sequences of numbers serving as keys to ciphers that would have been generated
before the AI was created and that would have been built into its motivation sys-
tem.
12
ese special number sequences would be extremely desirable to the AI,
constituting a special kind of reward token that the AI could not attain though
wireheading.
13
e keys would be stored in a secure location where they could be
quickly destroyed if the AI ever made an attempt to seize them. So long as the AI
cooperates, the keys are doled out at a steady rate.
e AI could then be given a bounded utility function, and matters could be
arranged so that the AI would get (for instance) 99% of its maximum utility from
the rst reward token; 99% of its remaining utility potential from the second
reward token; and so on. By keeping the “discount rate” high, and by transmit-
ting reward tokens to the AI at a reasonably high rate, one would dissuade the AI
from attempting to carry out long-term plans that would increase the expected
number of reward tokens in the distal future at the expense of some risk that its
reward stream will be cut short or that the next reward token will be cancelled
and destroyed. If the AI always believes that defecting (i.e. doing anything other
than its best to look aer the principals interests) would incur at least a 2% risk
of forfeiting the next reward token, whereas cooperating would reduce the risk of
forfeiting the next reward token to below 1%, then a utility-maximizing AI would
always cooperate.
What might go wrong with such an incentive scheme? One possibility is that
the AI will not trust the human operator to deliver the promised rewards. e
track record of human reliability is something other than a straight line of unerr-
ing perfection. e AI would reasonably worry that the operator will change his
mind, or raise the performance bar, or fail to recognize that the AI has done its
part. e AI would also worry about the operator becoming incapacitated. e
combined risk of such failures might exceed the risk of attempting to seize control
of the reward mechanism. Even a boxed AI possessing the panoply of super-
powers is a strong force. (For an AI that is not boxed to begin with, hijacking the
human-governed reward mechanism may be like taking candy from a baby.)
Another problem with the incentive scheme is that it presupposes that we can
tell whether the outcomes produced by the AI are in our interest. As later chapters
will elaborate, this presupposition is not innocuous.
A full assessment of the feasibility of incentive methods would also have to take
into account a range of other factors, including some esoteric considerations that
34
|
THE CONTROL PROBLEM
might conceivably make such methods more viable than a preliminary analysis
would suggest. In particular, the AI may face ineliminable indexical uncertainty
if it could not be sure that it does not inhabit a computer simulation (as opposed
to “basement-level,” non-simulated physical reality), and this epistemic predica-
ment may radically inuence the AI’s deliberations (see Box 8).
Box 8 Anthropic capture
The AI might assign a substantial probability to its simulation hypothesis, the hy-
pothesis that it is living in a computer simulation. Even today, many AIs inhabit sim-
ulated worlds—worlds consisting of geometric line drawings, texts, chess games,
or simple virtual realities, and in which the laws of physics deviate sharply from
the laws of physics that we believe govern the world of our own experience.
Richer and more complicated virtual worlds will become feasible with improve-
ments in programming techniques and computing power. A mature superintel-
ligence could create virtual worlds that appear to its inhabitants much the same
as our world appears to us. It might create vast numbers of such worlds, running
the same simulation many times or with small variations. The inhabitants would
not necessarily be able to tell whether their world is simulated or not; but if they
are intelligent enough they could consider the possibility and assign it some prob-
ability. In light of the simulation argument (a full discussion of which is beyond the
scope of this book) that probability could be substantial.
4
This predicament especially aicts relatively early-stage superintelligences,
ones that have not yet expanded to take advantage of the cosmic endowment.
An early-stage superintelligence, which uses only a small fraction of the resources
of a single planet, would be much less expensive to simulate than a mature in-
tergalactic superintelligence. Potential simulators—that is, other more mature
civilizationswould be able to run great numbers of simulations of such early-
stage AIs even by dedicating a minute fraction of their computational resources
to that purpose. If at least some (non-trivial fraction) of these mature superintel-
ligent civilizations choose to use this ability, early-stage AIs should assign a sub-
stantial probability to being in a simulation.
How an AI would be aected by the simulation hypothesis depends on its
values.
5
Consider first an AI that has a “resource-insatiable” final goal, such as
the goal of maximizing a utility function that is linear in the number of paperclips
caused by the AI to exist—twice as many paperclips, twice as good. Such an AI
might care less about the simulation hypothesis, on grounds that its ability to in-
fluence how many paperclips are produced looks much more limited if the AI is
in a simulation than if it is in basement-level physical reality. The AI might accord-
ingly behave much as if it were certain that the simulation hypothesis is false, even
if it assigned a fairly high probability to the simulation hypothesis being true.
6
The AI would reason that it does not much matter how it acts if the simulation
CAPABILITY CONTROL METHODS
|
35
Stunting
Another possible capability control method is to limit the system’s intellectual
faculties or its access to information. is might be done by running the AI on
hardware that is slow or short on memory. In the case of a boxed system, informa-
tion inow could also be restricted.
Stunting an AI in these ways would limit its usefulness. e method thus
faces a dilemma: too little stunting, and the AI might have the wit to gure
Box 8 Continued
hypothesis is true, whereas its actions make a great dierence if the hypothesis is
false; so it would select actions mainly by reference to how good they are condi-
tional on the simulation hypothesis being false.
By contrast, consider an AI that has a more modest final goal, one that could
be satisfied with a small amount of resources, such as the goal of receiving some
pre-produced cryptographic reward tokens, or the goal of causing the existence
of forty-five virtual paperclips. Such an AI should not discount those possible
worlds in which it inhabits a simulation. A substantial portion of the AI’s total
expected utility might derive from those possible worlds. The decision-making
of an AI with goals that are easily resource-satiable may therefore—if it assigns a
high probability to the simulation hypothesis—be dominated by considerations
about which actions would produce the best result if its perceived world is a
simulation. Such an AI (even if it is, in fact, not in a simulation) might therefore
be heavily influenced by its beliefs about which behaviors would be rewarded in
a simulation.
In particular, if an AI with resource-satiable final goals believes that in most
simulated worlds that match its observations it will be rewarded if it cooperates
(but not if it attempts to escape its box or contravene the interests of its crea-
tor) then it may choose to cooperate. We could therefore find that even an AI
with a decisive strategic advantage, one that could in fact realize its final goals to a
greater extent by taking over the world than by refraining from doing so, would
nevertheless balk at doing so.
Thus Conscience does make Cowards of us all,
And thus the Native hue of Resolution
Is sicklied o’er, with the pale cast of Thought,
And enterprises of great pith and moment,
With this regard their Currents turn away,
And lose the name of Action.
(Shakespeare, Hamlet, Act iii. Sc. )
A mere line in the sand, backed by the clout of a nonexistent simulator, could
prove a stronger restraint than a two-foot-thick solid steel door.
7
36
|
THE CONTROL PROBLEM
out some way to make itself more intelligent (and thence to world domina-
tion); too much, and the AI is just another piece of dumb soware. A radically
stunted AI is certainly safe but does not solve the problem of how to achieve
a controlled detonation: an intelligence explosion would remain possible and
would simply be triggered by some other system instead, perhaps at a slightly
later date.
One might think it would be safe to build a superintelligence provided it is only
given data about some narrow domain of facts. For example, one might build an
AI that lacks sensors and that has preloaded into its memory only facts about
petroleum engineering or peptide chemistry. But if the AI is superintelligent—if
it is has a superhuman level of general intelligence—such data deprivation does
not guarantee safety.
ere are several reasons for this. First, the notion of information being
“about” a certain topic is generally problematic. Any piece of information can
in principle be relevant to any topic whatsoever, depending on the background
information of a reasoner.
18
Furthermore, a given data set contains information
not only about the domain from which the data was collected but also about
various circumstantial facts. A shrewd mind looking over a knowledge base
that is nominally about peptide chemistry might infer things about a wide range
of topics. e fact that certain information is included and other information
is not could tell an AI something about the state of human science, the meth-
ods and instruments available to study peptides, the fabrication technologies
used to make these instruments, and the nature of the brains and societies that
conceived the studies and the instruments. It might be that a superintelligence
could correctly surmise a great deal from what seem, to dull-witted human
minds, meager scraps of evidence. Even without any designated knowledge
base at all, a suciently superior mind might be able to learn much by simply
introspecting on the workings of its own psyche—the design choices reected
in its source code, the physical characteristics of its circuitry.
19
Perhaps a super-
intelligence could even deduce much about the likely properties of the world
a priori (combining logical inference with a probability prior biased toward
simpler worlds, and a few elementary facts implied by the superintelligence’s
existence as a reasoning system). It might imagine the consequences of dif-
ferent possible laws of physics: what kind of planets would form, what kind of
intelligent life would evolve, what kind of societies would develop, what kind of
methods to solve the control problem would be attempted, how those methods
could be defeated.
20
For these reasons it would be imprudent to rely on information deprivation as
the main check on a superintelligence’s power. Some forms of stunting however,
including limiting hardware capacity and preventing the development of specic
cognitive faculties, might nd use as auxiliary control methods in the develop-
mental phase. ey could be used, for instance, to regulate the rate of capacity
gain in an intermediary AI and to allow tests to be run at dierent levels of capa-
bility short of full-throttled superintelligence.
CAPABILITY CONTROL METHODS
|
37
Tripwires
A tripwire is a mechanism that performs diagnostic tests on the system (possibly
without its knowledge) and eects a shutdown if it detects signs of dangerous
activity. Programmers can then scrutinize recordings of the suspicious activity
at their leisure, and make changes to the AI’s code base or environment before
restarting it. If they nd that the problem is suciently deep-rooted, they could
decide to abort the entire project.
Tripwires dier from incentive methods in that they do not rely on the system
being aware of the consequences of engaging in forbidden activities. Tripwires are
Table 9 Dierent kinds of tripwires
Type of monitoring Description and examples
Behavior Detectors could be placed around a boxed AI to detect at-
tempts to breach the containment. For example, detectors
could intercept attempts at radio communication or at accessing
internal computational resources intended to be o limits. An
“Ethernet port of Eden” could be installed: an apparent connec-
tion to the internet that leads to a shutdown switch.
Ability Automated capability testing could be performed at frequent
intervals to determine the AIs skill in various domains. If either
the rate of improvement is unexpectedly high or the AI attains
a level of competence that brings it close to the potential danger
zone, the system could be slowed down or paused, in order to
allow the programmers to monitor further developments more
closely.
Content The AI’s internal processes could be monitored for suspicious
activity. For example, any change to the AI’s representation of
its final values might trigger an automatic shutdown and review.
The AI’s resource use and algorithmic methods may also be also
be monitored for unexpected patterns.
More ambitiously, the AI’s beliefs, plans, subgoals, and
justifications may be continuously scrutinized, to ensure they
conform to the programmers’ expectations. Such monitoring of
the AI’s intentional states would enable a particularly powerful
form of content monitoring: the scanning of the AI’s cogni-
tive processes for a conception of deception—that vulnerable
moment when an unfriendly AI first forms the intention to
conceal its true intentions.
2
Content monitoring that requires that the AI’s intentional
states be transparent to the programmers or to an automatic
monitoring mechanism may not be feasible for all kinds of AI
architectures. (Some neural networks, for instance, are opaque,
as they represent information holistically and in ways that do
not necessarily match up with human concepts.) This may be a
reason to avoid using such architectures.
38
|
THE CONTROL PROBLEM
more closely related to stunting methods. Like stunting, tripwires could be used
as a temporary safeguard, providing a degree of protection during the develop-
ment phase. In principle, tripwires can also be used during the operational phase,
particularly for a boxed system. However, the ability of tripwires to constrain a
full-edged superintelligence must remain very much in doubt, since it would be
hard for us to assure ourselves that such an agent could not nd ways to subvert
any tripwire devised by the human intellect.
As an adjunct safety measure in the development phase, though, tripwires are
important. It would be good practice for the AI’s designers to codify their expect-
ations about how the system is supposed to function, and then to circumscribe
the AI with tripwires that would be activated if any of these expectations is vio-
lated. To make the tests more stringent, “honeypots” could be strategically placed
to create temptations for a malfunctioning AI to commit some easily observable
violation. For instance, if an AI has been designed in such a way that it is supposed
not to want to access the internet, a fake Ethernet port could be installed (lead-
ing to an automatic shutdown switch) just to see if they AI tries to use it. (Some
examples of tripwires are given in Table 9.)
It should be emphasized that the value of a tripwire depends not only on the
mechanism itself but also—critically—on how a project reacts when a tripwire is
triggered. If the programmers or project managers, impatient to make progress,
simply switch the system back on again—or if they do so aer making some token
modication to prevent the tripwire being triggered on the next run—then no
safety has been gained even if the tripwire itself works exactly as intended.
Motivation selection methods
Motivation selection methods seek to prevent undesirable outcomes by shaping
what the superintelligence wants to do. By engineering the agent’s motivation
system and its nal goals, these methods would produce a superintelligence that
would not want to exploit a decisive strategic advantage in a harmful way. Since a
superintelligent agent is skilled at achieving its ends, if it prefers not to cause harm
(in some appropriate sense of “harm”) then it would tend not to cause harm (in
that sense of “harm”).
Motivation selection can involve explicitly formulating a goal or set of rules to
be followed (direct specication) or setting up the system so that it can discover
an appropriate set of values for itself by reference to some implicitly or indirectly
formulated criterion (indirect normativity). One option in motivation selection
is to try to build the system so that it would have modest, non-ambitious goals
(domesticity). An alternative to creating a motivation system from scratch is to
select an agent that already has an acceptable motivation system and then aug-
ment that agent’s cognitive powers to make it superintelligent, while ensuring that
the motivation system does not get corrupted in the process (augmentation). Let
us look at these in turn.
MOTIVATION SELECTION METHODS
|
39
Direct specification
Direct specication is the most straightforward approach to the control prob-
lem. e approach comes in two versions, rule-based and consequentialist, and
involves trying to explicitly dene a set of rules or values that will cause even
a free-roaming superintelligent AI to act safely and benecially. Direct speci-
cation, however, faces what may be insuperable obstacles, deriving from both
the diculties in determining which rules or values we would wish the AI to be
guided by and the diculties in expressing those rules or values in computer-
readable code.
e traditional illustration of the direct rule-based approach is the “three laws
of robotics” concept, formulated by science ction author Isaac Asimov in a short
story published in 1942.
22
e three laws were: (1) A robot may not injure a human
being or, through inaction, allow a human being to come to harm; (2) Arobot
must obey any orders given to it by human beings, except where such orders would
conict with the First Law; (3) A robot must protect its own existence as long as
such protection does not conict with the First or Second Law. Embarrassingly
for our species, Asimov’s laws remained state-of-the-art for over half a century:
this despite obvious problems with the approach, some of which are explored in
Asimov’s own writings (Asimov probably having formulated the laws in the rst
place precisely so that they would fail in interesting ways, providing fertile plot
complications for his stories).
23
Bertrand Russell, who spent many years working on the foundations of math-
ematics, once remarked that “everything is vague to a degree you do not realize
till you have tried to make it precise.”
24
Russells dictum applies in spades to the
direct specication approach. Consider, for example, how one might explicate
Asimov’s rst law. Does it mean that the robot should minimize the probability of
any human being coming to harm? In that case the other laws become otiose since
it is always possible for the AI to take some action that would have at least some
microscopic eect on the probability of a human being coming to harm. How is
the robot to balance a large risk of a few humans coming to harm versus a small
risk of many humans being harmed? How do we dene “harm” anyway? How
should the harm of physical pain be weighed against the harm of architectural
ugliness or social injustice? Is a sadist harmed if he is prevented from tormenting
his victim? How do we dene “human being”? Why is no consideration given to
other morally considerable beings, such as sentient nonhuman animals and digi-
tal minds? e more one ponders, the more the questions proliferate.
Perhaps the closest existing analog to a rule set that could govern the actions
of a superintelligence operating in the world at large is a legal system. But legal
systems have developed through a long process of trial and error, and they regu-
late relatively slowly-changing human societies. Laws can be revised when neces-
sary. Most importantly, legal systems are administered by judges and juries
who generally apply a measure of common sense and human decency to ignore
logically possible legal interpretations that are suciently obviously unwanted
40
|
THE CONTROL PROBLEM
and unintended by the lawgivers. It is probably humanly impossible to explicitly
formulate a highly complex set of detailed rules, have them apply across a highly
diverse set of circumstances, and get it right on the rst implementation.
25
Problems for the direct consequentialist approach are similar to those for the
direct rule-based approach. is is true even if the AI is intended to serve some
apparently simple purpose such as implementing a version of classical utilitarian-
ism. For instance, the goal “Maximize the expectation of the balance of pleasure
over pain in the world” may appear simple. Yet expressing it in computer code
would involve, among other things, specifying how to recognize pleasure and
pain. Doing this reliably might require solving an array of persistent problems
in the philosophy of mind—even just to obtain a correct account expressed in a
natural language, an account which would then, somehow, have to be translated
into a programming language.
A small error in either the philosophical account or its translation into code
could have catastrophic consequences. Consider an AI that has hedonism as its
nal goal, and which would therefore like to tile the universe with “hedonium”
(matter organized in a conguration that is optimal for the generation of pleas-
urable experience). To this end, the AI might produce computronium (matter
organized in a conguration that is optimal for computation) and use it to imple-
ment digital minds in states of euphoria. In order to maximize eciency, the AI
omits from the implementation any mental faculties that are not essential for the
experience of pleasure, and exploits any computational shortcuts that according
to its denition of pleasure do not vitiate the generation of pleasure. For instance,
the AI might conne its simulation to reward circuitry, eliding faculties such as
memory, sensory perception, executive function, and language; it might simulate
minds at a relatively coarse-grained level of functionality, omitting lower-level
neuronal processes; it might replace commonly repeated computations with calls
to a lookup table; or it might put in place some arrangement whereby multiple
minds would share most parts of their underlying computational machinery
(their “supervenience bases” in philosophical parlance). Such tricks could greatly
increase the quantity of pleasure producible with a given amount of resources.
It is unclear how desirable this would be. Furthermore, if the AI’s criterion for
determining whether a physical process generates pleasure is wrong, then the AI’s
optimizations might throw the baby out with the bathwater: discarding some-
thing which is inessential according to the AI’s criterion yet essential according
to the criteria implicit in our human values. e universe then gets lled not with
exultingly heaving hedonium but with computational processes that are uncon-
scious and completely worthless—the equivalent of a smiley-face sticker xeroxed
trillions upon trillions of times and plastered across the galaxies.
Domesticity
One special type of nal goal which might be more amenable to direct specica-
tion than the examples given above is the goal of self-limitation. While it seems
MOTIVATION SELECTION METHODS
|
4
extremely dicult to specify how one would want a superintelligence to behave in
the world in general—since this would require us to account for all the trade-os
in all the situations that could arise—it might be feasible to specify how a super-
intelligence should behave in one particular situation. We could therefore seek to
motivate the system to conne itself to acting on a small scale, within a narrow
context, and through a limited set of action modes. We will refer to this approach
of giving the AI nal goals aimed at limiting the scope of its ambitions and activi-
ties asdomesticity.
For example, one could try to design an AI such that it would function as a
question-answering device (an “oracle,” to anticipate the terminology that we will
introduce in the next chapter). Simply giving the AI the nal goal of producing
maximally accurate answers to any question posed to it would be unsafe—recall
the “Riemann hypothesis catastrophe” described in Chapter 8. (Reect, also, that
this goal would incentivize the AI to take actions to ensure that it is asked easy
questions.) To achieve domesticity, one might try to dene a nal goal that would
somehow overcome these diculties: perhaps a goal that combined the desiderata
of answering questions correctly and minimizing the AI’s impact on the world
except whatever impact results as an incidental consequence of giving accurate
and non-manipulative answers to the questions it is asked.
26
e direct specication of such a domesticity goal is more likely to be feasible
than the direct specication of either a more ambitious goal or a complete rule set
for operating in an open-ended range of situations. Signicant challenges none-
theless remain. Care would have to be taken, for instance, in the denition of what
it would be for the AI to “minimize its impact on the world” to ensure that the
measure of the AI’s impact coincides with our own standards for what counts as a
large or a small impact. A bad measure would lead to bad trade-os. ere are also
other kinds of risk associated with building an oracle, which we will discuss later.
ere is a natural t between the domesticity approach and physical contain-
ment. One would try to “box” an AI such that the system is unable to escape while
simultaneously trying to shape the AI’s motivation system such that it would be
unwilling to escape even if it found a way to do so. Other things equal, the existence
of multiple independent safety mechanisms should shorten the odds of success.
27
Indirect normativity
If direct specication seems hopeless, we might instead try indirect normativ-
ity. e basic idea is that rather than specifying a concrete normative standard
directly, we specify a process for deriving a standard. We then build the system
so that it is motivated to carry out this process and to adopt whatever standard
the process arrives at.
28
For example, the process could be to carry out an inves-
tigation into the empirical question of what some suitably idealized version of us
would prefer the AI to do. e nal goal given to the AI in this example could be
something along the lines of “achieve that which we would have wished the AI to
achieve if we had thought about the matter long and hard.
4 2
|
THE CONTROL PROBLEM
Further explanation of indirect normativity will have to await Chapter 13.
ere, we will revisit the idea of “extrapolating our volition” and explore vari-
ous alterative formulations. Indirect normativity is a very important approach
to motivation selection. Its promise lies in the fact that it could let us ooad to
the superintelligence much of the dicult cognitive work required to carry out a
direct specication of an appropriate nal goal.
Augmentation
e last motivation selection method on our list is augmentation. Here the idea
is that rather than attempting to design a motivation system de novo, we start
with a system that already has an acceptable motivation system, and enhance its
cognitive faculties to make it superintelligent. If all goes well, this would give us a
superintelligence with an acceptable motivation system.
is approach, obviously, is unavailing in the case of a newly created seed AI.
But augmentation is a potential motivation selection method for other paths to
superintelligence, including brain emulation, biological enhancement, brain–
computer interfaces, and networks and organizations, where there is a possibility
of building out the system from a normative nucleus (regular human beings) that
already contains a representation of human value.
e attractiveness of augmentation may increase in proportion to our despair
at the other approaches to the control problem. Creating a motivation system for a
seed AI that remains reliably safe and benecial under recursive self- improvement
even as the system grows into a mature superintelligence is a tall order, especially
if we must get the solution right on the rst attempt. With augmentation, we
would at least start with a system that has familiar and human-like motivations.
On the downside, it might be hard to ensure that a complex, evolved, kludgy,
and poorly understood motivation system, like that of a human being, will not get
corrupted when its cognitive engine blasts into the stratosphere. As discussed ear-
lier, an imperfect brain emulation procedure that preserves intellectual function-
ing may not preserve all facets of personality. e same is true (though perhaps
to a lesser degree) for biological enhancements of cognition, which might subtly
aect motivation, and for collective intelligence enhancements of organizations
and networks, which might adversely change social dynamics (e.g. in ways that
debase the collective’s attitude toward outsiders or toward its own constituents). If
superintelligence is achieved via any of these paths, a project sponsor would nd
guarantees about the ultimate motivations of the mature system hard to come
by. A mathematically well-specied and foundationally elegant AI architecture
might—for all its non-anthropomorphic otherness—oer greater transparency,
perhaps even the prospect that important aspects of its functionality could be
formally veried.
In the end, however one tallies up the advantages and disadvantages of aug-
mentation, the choice as to whether to rely on it might be forced. If superintel-
ligence is rst achieved along the articial intelligence path, augmentation is not
SYNOPSIS
|
43
applicable. Conversely, if superintelligence is rst achieved along some non-AI
path, then many of the other motivation selection methods are inapplicable. Even
so, views on how likely augmentation would be to succeed do have strategic rele-
vance insofar as we have opportunities to inuence which technology will rst
produce superintelligence.
Synopsis
A quick synopsis might be called for before we close this chapter. We distinguished
two broad classes of methods for dealing with the agency problem at the heart of
AI safety: capability control and motivation selection. Table 10 gives a summary.
Table 0 Control methods
Capability control
Boxing methods The system is confined in such a way that it can aect the exter-
nal world only through some restricted, pre-approved channel.
Encompasses physical and informational containment methods.
Incentive methods The system is placed within an environment that provides ap-
propriate incentives. This could involve social integration into
a world of similarly powerful entities. Another variation is the
use of (cryptographic) reward tokens. “Anthropic capture” is
also a very important possibility but one that involves esoteric
considerations.
Stunting Constraints are imposed on the cognitive capabilities of the
system or its ability to aect key internal processes.
Tripwires Diagnostic tests are performed on the system (possibly without
its knowledge) and a mechanism shuts down the system if dan-
gerous activity is detected.
Motivation selection
Direct specification The system is endowed with some directly specified motivation
system, which might be consequentialist or involve following a
set of rules.
Domesticity A motivation system is designed to severely limit the scope of
the agents ambitions and activities.
Indirect normativity Indirect normativity could involve rule-based or consequential-
ist principles, but is distinguished by its reliance on an indirect
approach to specifying the rules that are to be followed or the
values that are to be pursued.
Augmentation One starts with a system that already has substantially human or
benevolent motivations, and enhances its cognitive capacities to
make it superintelligent.
44
|
THE CONTROL PROBLEM
Each control method comes with potential vulnerabilities and presents dier-
ent degrees of diculty in its implementation. It might perhaps be thought that
we should rank them from better to worse, and then opt for the best method.
But that would be simplistic. Some methods can be used in combination whereas
others are exclusive. Even a comparatively insecure method may be advisable if it
can easily be used as an adjunct, whereas a strong method might be unattractive
if it would preclude the use of other desirable safeguards.
It is therefore necessary to consider what package deals are available. We need
to consider what type of system we might try to build, and which control methods
would be applicable to each type. is is the topic for our next chapter.
ORACLES
|
45
CHAPTER 10
Oracles, genies, sovereigns,
tools
S
ome say: “Just build a question-answering system!” or “Just build an AI
that is like a tool rather than an agent!” But these suggestions do not make
all safety concerns go away, and it is in fact a non-trivial question which
type of system would oer the best prospects for safety. We consider four types
or “castes”—oracles, genies, sovereigns, and tools—and explain how they relate
to one another.
1
Each oers dierent sets of advantages and disadvantages in
our quest to solve the control problem.
Oracles
An oracle is a question-answering system. It might accept questions in a natu-
ral language and present its answers as text. An oracle that accepts only yes/no
questions could output its best guess with a single bit, or perhaps with a few extra
bits to represent its degree of condence. An oracle that accepts open-ended ques-
tions would need some metric with which to rank possible truthful answers in
terms of their informativeness or appropriateness.
2
In either case, building an
oracle that has a fully domain-general ability to answer natural language ques-
tions is an AI-complete problem. If one could do that, one could probably also
build an AI that has a decent ability to understand human intentions as well as
human words.
Oracles with domain-limited forms of superintelligence are also conceivable.
For instance, one could conceive of a mathematics-oracle which would only accept
queries posed in a formal language but which would be very good at answering
such questions (e.g. being able to solve in an instant almost any formally expressed
math problem that the human mathematics profession could solve by laboring
collaboratively for a century). Such a mathematics-oracle would form a stepping-
stone toward domain-general superintelligence.
46
|
ORACLES, GENIES, SOVEREIGNS, TOOLS
Oracles with superintelligence in extremely limited domains already exist.
Apocket calculator can be viewed as a very narrow oracle for basic arithmetical
questions; an Internet search engine can be viewed as a very partial realization
of an oracle with a domain that encompasses a signicant part of general human
declarative knowledge. ese domain-limited oracles are tools rather than agents
(more on tool-AIs shortly). In what follows, though, the term “oracle” will refer to
question-answering systems that have domain-general superintelligence, unless
otherwise stated.
To make a general superintelligence function as an oracle, we could apply both
motivation selection and capability control. Motivation selection for an oracle
may be easier than for other castes of superintelligence, because the nal goal in
an oracle could be comparatively simple. We would want the oracle to give truth-
ful, non-manipulative answers and to otherwise limit its impact on the world.
Applying a domesticity method, we might require that the oracle should use only
designated resources to produce its answer. For example, we might stipulate that
it should base its answer on a preloaded corpus of information, such as a stored
snapshot of the Internet, and that it should use no more than a xed number of
computational steps.
3
To avoid incentivizing the oracle to manipulate us into giv-
ing it easier questions—which would happen if we gave it the goal of maximizing
its accuracy across all questions we will ask it—we could give it the goal of answer-
ing only one question and to terminate immediately upon delivering its answer.
e question would be preloaded into its memory before the program is run. To
ask a second question, we would reset the machine and run the same program
with a dierent question preloaded in memory.
Subtle and potentially treacherous challenges arise even in specifying the rela-
tively simple motivation system needed to drive an oracle. Suppose, for example,
that we come up with some explication of what it means for the AI “to minimize
its impact on the world, subject to achieving certain results” or “to use only des-
ignated resources in preparing the answer.” What happens if the AI, in the course
of its intellectual development, undergoes the equivalent of a scientic revolu-
tion involving a change in its basic ontology?
4
We might initially have explicated
impact” and “designated resources” using our own ontology (postulating the
existence of various physical objects such as computers). But just as we have aban-
doned ontological categories that were taken for granted by scientists in previous
ages (e.g. “phlogiston,” “élan vital,” and “absolute simultaneity”), so a superintel-
ligent AI might discover that some of our current categories are predicated on
fundamental misconceptions. e goal system of an AI undergoing an ontologi-
cal crisis needs to be resilient enough that the “spirit” of its original goal content
is carried over, charitably transposed into the new key.
Although making an oracle safe through the use of motivation selection might
be far from trivial, it may nevertheless be easier than doing the same for an AI that
roams the world in pursuit of some complicated goal. is is an argument for pre-
ferring that the rst superintelligence be an oracle. Another point in favor of an
oracle-rst path is the greater susceptibility of an oracle to capability control. All
ORACLES
|
47
the standard boxing methods apply here. In addition, there may be methods that
are distinctively applicable to oracles. For example, consider the risk that an ora-
cle will answer questions not in a maximally truthful way but in such a way as to
subtly manipulate us into promoting its own hidden agenda. One way to slightly
mitigate this threat could be to create multiple oracles, each with a slightly dif-
ferent code and a slightly dierent information base. A simple mechanism could
then compare the answers given by the dierent oracles and only present them for
human viewing if all the answers agree. While one must assume that any super-
intelligent domain-general oracle would know about the existence of other oracles
(it could infer this from its knowledge about human society), it might be feasible
to hide some implementation details of each oracle from the views of the others. If
the oracles are not able to communicate with one another, it might then be di-
cult for them to coordinate on how to answer our questions manipulatively. ere
are many ways of deviating from the truth, and the oracles may not all agree on
which of these deviations is most attractive—whereas the truth itself is a Schelling
point (a salient place for agreement in the absence of communication). So if the
oracles achieve consensus, it might be a sign that they gave the true answer.
5
An oracle would ideally be trustworthy in the sense that we could safely assume
that its answers are always accurate to the best of its ability. But even an untrust-
worthy oracle could be useful. We could ask such an oracle questions of a type for
which it is dicult to nd the answer but easy to verify whether a given answer
is correct. Many mathematical problems are of this kind. If we are wondering
whether a mathematical proposition is true, we could ask the oracle to produce a
proof or disproof of the proposition. Finding the proof may require insight and
creativity beyond our ken, but checking a purported proofs validity can be done
by a simple mechanical procedure.
If it is expensive to verify answers (as is oen the case on topics outside logic
and mathematics), we can randomly select a subset of the oracle’s answers for
verication. If they are all correct, we can assign a high probability to most of the
other answers also being correct. is trick can give us a bulk discount on trust-
worthy answers that would be costly to verify individually. (Unfortunately, it can-
not give us trustworthy answers that we are unable to verify, since a dissembling
oracle may choose to answer correctly only those questions where it believes we
could verify its answers.)
ere could be important issues on which we could benet from an augural
pointer toward the correct answer (or toward a method for locating the correct
answer) even if we had to actively distrust the provenance. For instance, one
might ask for the solution to various technical or philosophical problems that
may arise in the course of trying to develop more advanced motivation selection
methods. If we had a proposed AI design alleged to be safe, we could ask an oracle
whether it could identify any signicant aw in the design, and whether it could
explain any such aw to us in twenty words or less. Questions of this kind could
elicit valuable information. Caution and restraint would be required, however,
for us not to ask too many such questions—and not to allow ourselves to partake
48
|
ORACLES, GENIES, SOVEREIGNS, TOOLS
of too many details of the answers given to the questions we do ask—lest we give
the untrustworthy oracle opportunities to work on our psychology (by means of
plausible-seeming but subtly manipulative messages). It might not take many bits
of communication for an AI with the social manipulation superpower to bend us
to its will.
Even if the oracle itself works exactly as intended, there is a risk that it would
be misused. One obvious dimension of this problem is that an oracle AI would be
a source of immense power which could give a decisive strategic advantage to its
operator. is power might be illegitimate and it might not be used for the com-
mon good. Another more subtle but no less important dimension is that the use
of an oracle could be extremely dangerous for the operator herself. Similar wor-
ries (which involve philosophical as well as technical issues) arise also for other
hypothetical castes of superintelligence. We will explore them more thoroughly
in Chapter 13. Suce it here to note that the protocol determining which ques-
tions are asked, in which sequence, and how the answers are reported and dis-
seminated could be of great signicance. One might also consider whether to try
to build the oracle in such a way that it would refuse to answer any question in
cases where it predicts that its answering would have consequences classied as
catastrophic according to some rough-and-ready criteria.
Genies and sovereigns
A genie is a command-executing system: it receives a high-level command, car-
ries it out, then pauses to await the next command.
6
A sovereign is a system that
has an open-ended mandate to operate in the world in pursuit of broad and possi-
bly very long-range objectives. Although these might seem like radically dierent
templates for what a superintelligence should be and do, the dierence is not as
deep as it might at rst glance appear.
With a genie, one already sacrices the most attractive property of an oracle:
the opportunity to use boxing methods. While one might consider creating a
physically conned genie, for instance one that can only construct objects inside
a designated volume—a volume that might be sealed o by a hardened wall or
a barrier loaded with explosive charges rigged to detonate if the containment is
breached—it would be dicult to have much condence in the security of any
such physical containment method against a superintelligence equipped with ver-
satile manipulators and construction materials. Even if it were somehow possible
to ensure a containment as secure as that which can be achieved for an oracle, it is
not clear how much we would have gained by giving the superintelligence direct
access to manipulators compared to requiring it instead to output a blueprint that
we could inspect and then use to achieve the same result ourselves. e gain in
speed and convenience from bypassing the human intermediary seems hardly
worth the loss of foregoing the use of the stronger boxing methods available to
contain an oracle.
GENIES AND SOVEREIGNS
|
49
If one were creating a genie, it would be desirable to build it so that it would
obey the intention behind the command rather than its literal meaning, since a
literalistic genie (one superintelligent enough to attain a decisive strategic advan-
tage) might have a propensity to kill the user and the rest of humanity on its rst
use, for reasons explained in the section on malignant failure modes in Chapter8.
More broadly, it would seem important that the genie seek a charitable—and what
human beings would regard as reasonable—interpretation of what is being com-
manded, and that the genie be motivated to carry out the command under such
an interpretation rather than under the literalistic interpretation. e ideal genie
would be a super-butler rather than an autistic savant.
A genie endowed with such a super-butler nature, however, would not be far
from qualifying for membership in the caste of sovereigns. Consider, for compari-
son, the idea of building a sovereign with the nal goal of obeying the spirit of the
commands we would have given had we built a genie rather than a sovereign. Such
a sovereign would mimic a genie. Being superintelligent, this sovereign would do
a good job at guessing what commands we would have given a genie (and it could
always ask us if that would help inform its decisions). Would there then really be
any important dierence between such a sovereign and a genie? Or, pressing on
the distinction from the other side, consider that a superintelligent genie may
likewise be able to predict what commands we will give it: what then is gained
from having it await the actual issuance before it acts?
One might think that a big advantage of a genie over a sovereign is that if
something goes wrong, we could issue the genie with a new command to stop or
to reverse the eects of the previous actions, whereas a sovereign would just push
on regardless of our protests. But this apparent safety advantage for the genie is
largely illusory. e “stop” or “undo” button on a genie works only for benign
failure modes: in the case of a malignant failure—one in which, for example,
carrying out the existing command has become a nal goal for the genie—
the genie would simply disregard any subsequent attempt to countermand the
previous command.
7
One option would be to try to build a genie such that it would automatically
present the user with a prediction about salient aspects of the likely outcomes of
a proposed command, asking for conrmation before proceeding. Such a system
could be referred to as a genie-with-a-preview. But if this could be done for a genie,
it could likewise be done for a sovereign. So again, this is not a clear dierentiator
between a genie and a sovereign. (Supposing that a preview functionality could
be created, the questions of whether and if so how to use it are rather less obvious
than one might think, notwithstanding the strong appeal of being able to glance
at the outcome before committing to making it irrevocable reality. We will return
to this matter later.)
e ability of one caste to mimic another extends to oracles, too. A genie could
be made to act like an oracle if the only commands we ever give it are to answer
certain questions. An oracle, in turn, could be made to substitute for a genie if we
asked the oracle what the easiest way is to get certain commands executed. e
50
|
ORACLES, GENIES, SOVEREIGNS, TOOLS
oracle could give us step-by-step instructions for achieving the same result as a
genie would produce, or it could even output the source code for a genie.
8
Similar
points can be made with regard to the relation between an oracle and a sovereign.
e real dierence between the three castes, therefore, does not reside in the
ultimate capabilities that they would unlock. Instead, the dierence comes down
to alternative approaches to the control problem. Each caste corresponds to a dif-
ferent set of safety precautions. e most prominent feature of an oracle is that
it can be boxed. One might also try to apply domesticity motivation selection
to an oracle. A genie is harder to box, but at least domesticity may be applicable.
Asovereign can neither be boxed nor handled through the domesticity approach.
If these were the only relevant factors, then the order of desirability would seem
clear: an oracle would be safer than a genie, which would be safer than a sover-
eign; and any initial dierences in convenience and speed of operation would be
relatively small and easily dominated by the gains in safety obtainable by building
an oracle. However, there are other factors that need to be taken into account.
When choosing between castes, one should consider not only the danger posed
by the system itself but also the dangers that arise out of the way it might be used.
A genie most obviously gives the person who controls it enormous power, but the
same holds for an oracle.
9
A sovereign, by contrast, could be constructed in such
way as to accord no one person or group any special inuence over the outcome,
and such that it would resist any attempt to corrupt or alter its original agenda.
What is more, if a sovereign’s motivation is dened using “indirect normativ-
ity” (a concept to be described in Chapter 13) then it could be used to achieve
some abstractly dened outcome, such as “whatever is maximally fair and mor-
ally right”—without anybody knowing in advance what exactly this will entail.
is would create a situation analogous to a Rawlsian “veil of ignorance.
10
Such
a setup might facilitate the attainment of consensus, help prevent conict, and
promote a more equitable outcome.
Another point, which counts against some types of oracles and genies, is that
there are risks involved in designing a superintelligence to have a nal goal that
does not fully match the outcome that we ultimately seek to attain. For example,
if we use a domesticity motivation to make the superintelligence want to mini-
mize some of its impacts on the world, we might thereby create a system whose
preference ranking over possible outcomes diers from that of the sponsor. e
same will happen if we build the AI to place a peculiarly high value on answering
questions correctly, or on faithfully obeying individual commands. Now, if suf-
cient care is taken, this should not cause any problems: there would be sucient
agreement between the two rankings—at least insofar as they pertain to possi-
ble worlds that have a reasonable chance of being actualized—that the outcomes
that are good by the AI’s standard are also good by the principals standard. But
perhaps one could argue for the design principle that it is unwise to introduce
even a limited amount of disharmony between the AI’s goals and ours. (e same
concern would of course apply to giving sovereigns goals that do not completely
harmonize with ours.)
TOOL-AIS
|
5
Too l- AI s
One suggestion that has been made is that we build the superintelligence to be
like a tool rather than an agent.
11
is idea seems to arise out of the observation
that ordinary soware, which is used in countless applications, does not raise any
safety concerns even remotely analogous to the challenges discussed in this book.
Might one not create “tool-AI” that is like such soware—like a ight control
system, say, or a virtual assistant—only more exible and capable? Why build
a superintelligence that has a will of its own? On this line of thinking, the agent
paradigm is fundamentally misguided. Instead of creating an AI that has beliefs
and desires and that acts like an articial person, we should aim to build regular
soware that simply does what it is programmed to do.
is idea of creating soware that “simply does what it is programmed to do” is,
however, not so straightforward if the product being created is a powerful general
intelligence. ere is, of course, a trivial sense in which all soware simply does
what it is programmed to do: the behavior is mathematically specied by the code.
But this is equally true for all castes of machine intelligence, “tool-AI” or not. If,
instead, “simply doing what it is programmed to do” means that the soware
behaves as the programmers intended, then this is a standard that ordinary so-
ware very oen fails to meet.
Because of the limited capabilities of contemporary soware (compared with
those of machine superintelligence) the consequences of such failures are man-
ageable, ranging from insignicant to very costly, but in no case amounting to an
existential threat.
12
However, if it is insucient capability rather than sucient
reliability that makes ordinary soware existentially safe, then it is unclear how
such soware could be a model for a safe superintelligence. It might be thought
that by expanding the range of tasks done by ordinary soware, one could elim-
inate the need for articial general intelligence. But the range and diversity of
tasks that a general intelligence could protably perform in a modern economy
is enormous. It would be infeasible to create special-purpose soware to handle
all of those tasks. Even if it could be done, such a project would take a long time to
carry out. Before it could be completed, the nature of some of the tasks would have
changed, and new tasks would have become relevant. ere would be great advan-
tage to having soware that can learn on its own to do new tasks, and indeed to
discover new tasks in need of doing. But this would require that the soware be
able to learn, reason, and plan, and to do so in a powerful and robustly cross-
domain manner. In other words, it would require general intelligence.
Especially relevant for our purposes is the task of soware development itself.
ere would be enormous practical advantages to being able to automate this. Yet
the capacity for rapid self-improvement is just the critical property that enables a
seed AI to set o an intelligence explosion.
If general intelligence is not dispensable, is there some other way of construing
the tool-AI idea so as to preserve the reassuringly passive quality of a humdrum
tool? Could one have a general intelligence that is not an agent? Intuitively, it is
52
|
ORACLES, GENIES, SOVEREIGNS, TOOLS
not just the limited capability of ordinary soware that makes it safe: it is also its
lack of ambition. ere is no subroutine in Excel that secretly wants to take over
the world if only it were smart enough to nd a way. e spreadsheet applica-
tion does not “want” anything at all; it just blindly carries out the instructions
in the program. What (one might wonder) stands in the way of creating a more
generally intelligent application of the same type? An oracle, for instance, which,
when prompted with a description of a goal, would respond with a plan for how
to achieve it, in much the same way that Excel responds to a column of numbers
by calculating a sum—without thereby expressing any “preferences” regarding its
output or how humans might choose to use it?
e classical way of writing soware requires the programmer to understand
the task to be performed in sucient detail to formulate an explicit solution pro-
cess consisting of a sequence of mathematically well-dened steps expressible in
code.
13
(In practice, soware engineers rely on code libraries stocked with use-
ful behaviors, which they can invoke without needing to understand how the
behaviors are implemented. But that code was originally created by programmers
who had a detailed understanding of what they were doing.) is approach works
for solving well-understood tasks, and is to credit for most soware that is cur-
rently in use. It falls short, however, when nobody knows precisely how to solve
all of the tasks that need to be accomplished. is is where techniques from the
eld of articial intelligence become relevant. In narrow applications, machine
learning might be used merely to ne-tune a few parameters in a largely human-
designed program. A spam lter, for example, might be trained on a corpus of
hand- classied email messages in a process that changes the weights that the
classication algorithm places on various diagnostic features. In a more ambi-
tious application, the classier might be built so that it can discover new features
on its own and test their validity in a changing environment. An even more
sophisticated spam lter could be endowed with some ability to reason about the
trade-os facing the user or about the contents of the messages it is classifying. In
neither of these cases does the programmer need to know the best way of distin-
guishing spam from ham, only how to set up an algorithm that can improve its
own performance via learning, discovering, or reasoning.
With advances in articial intelligence, it would become possible for the
programmer to ooad more of the cognitive labor required to gure out how
to accomplish a given task. In an extreme case, the programmer would simply
specify a formal criterion of what counts as success and leave it to the AI to nd
a solution. To guide its search, the AI would use a set of powerful heuristics and
other methods to discover structure in the space of possible solutions. It would
keep searching until it found a solution that satised the success criterion. e AI
would then either implement the solution itself or (in the case of an oracle) report
the solution to the user.
Rudimentary forms of this approach are quite widely deployed today.
Nevertheless, soware that uses AI and machine learning techniques, though it has
some ability to nd solutions that the programmers had not anticipated, functions
TOOL-AIS
|
53
for all practical purposes like a tool and poses no existential risk. We would enter
the danger zone only when the methods used in the search for solutions become
extremely powerful and general: that is, when they begin to amount to general
intelligence—and especially when they begin to amount to superintelligence.
ere are (at least) two places where trouble could then arise. First, the super-
intelligent search process might nd a solution that is not just unexpected but
radically unintended. is could lead to a failure of one of the types discussed pre-
viously (“perverse instantiation,” “infrastructure profusion,” or “mind crime”). It
is most obvious how this could happen in the case of a sovereign or a genie, which
directly implements the solution it has found. If making molecular smiley faces or
transforming the planet into paperclips is the rst idea that the superintelligence
discovers that meets the solution criterion, then smiley faces or paperclips we
get.
14
But even an oracle, which—if all else goes well—merely reports the solution,
could become a cause of perverse instantiation. e user asks the oracle for a plan
to achieve a certain outcome, or for a technology to serve a certain function; and
when the user follows the plan or constructs the technology, a perverse instantia-
tion can ensue, just as if the AI had implemented the solution itself.
15
A second place where trouble could arise is in the course of the soware’s oper-
ation. If the methods that the soware uses to search for a solution are suciently
sophisticated, they may include provisions for managing the search process itself
in an intelligent manner. In this case, the machine running the soware may
begin to seem less like a mere tool and more like an agent. us, the soware may
start by developing a plan for how to go about its search for a solution. e plan
may specify which areas to explore rst and with what methods, what data to
gather, and how to make best use of available computational resources. In search-
ing for a plan that satises the soware’s internal criterion (such as yielding a
suciently high probability of nding a solution satisfying the user-specied
criterion within the allotted time), the soware may stumble on an unorthodox
idea. For instance, it might generate a plan that begins with the acquisition of
additional computational resources and the elimination of potential interrupters
(such as human beings). Such “creative” plans come into view when the soware’s
cognitive abilities reach a suciently high level. When the soware puts such a
plan into action, an existential catastrophe may ensue.
As the examples in Box 9 illustrate, open-ended search processes sometimes
evince strange and unexpected non-anthropocentric solutions even in their cur-
rently limited forms. Present-day search processes are not hazardous because they
are too weak to discover the kind of plan that could enable a program to take over
the world. Such a plan would include extremely dicult steps, such as the inven-
tion of a new weapons technology several generations ahead of the state of the art
or the execution of a propaganda campaign far more eective than any communi-
cation devised by human spin doctors. To have a chance of even conceiving of such
ideas, let alone developing them in a way that would actually work, a machine
would probably need the capacity to represent the world in a way that is at least as
rich and realistic as the world model possessed by a normal human adult (though
54
|
ORACLES, GENIES, SOVEREIGNS, TOOLS
Box 9 Strange solutions from blind search
Even simple evolutionary search processes sometimes produce highly unexpect-
ed results, solutions that satisfy a formal user-defined criterion in a very dierent
way than the user expected or intended.
The field of evolvable hardware oers many illustrations of this phenomenon.
In this field, an evolutionary algorithm searches the space of hardware designs,
testing the fitness of each design by instantiating it physically on a rapidly recon-
figurable array or motherboard. The evolved designs often show remarkable
economy. For instance, one search discovered a frequency discrimination circuit
that functioned without a clock—a component normally considered necessary
for this function. The researchers estimated that the evolved circuit was between
one and two orders of magnitude smaller than what a human engineer would
have required for the task. The circuit exploited the physical properties of its
components in unorthodox ways; some active, necessary components were not
even connected to the input or output pins! These components instead par-
ticipated via what would normally be considered nuisance side eects, such as
electromagnetic coupling or power-supply loading.
Another search process, tasked with creating an oscillator, was deprived of
a seemingly even more indispensible component, the capacitor. When the al-
gorithm presented its successful solution, the researchers examined it and at
first concluded that it “should not work.” Upon more careful examination, they
discovered that the algorithm had, MacGyver-like, reconfigured its sensor-less
motherboard into a makeshift radio receiver, using the printed circuit board
tracks as an aerial to pick up signals generated by personal computers that hap-
pened to be situated nearby in the laboratory. The circuit amplified this signal to
produce the desired oscillating output.
6
In other experiments, evolutionary algorithms designed circuits that sensed
whether the motherboard was being monitored with an oscilloscope or wheth-
er a soldering iron was connected to the labs common power supply. These ex-
amples illustrate how an open-ended search process can repurpose the materials
accessible to it in order to devise completely unexpected sensory capabilities, by
means that conventional human design-thinking is poorly equipped to exploit or
even account for in retrospect.
The tendency for evolutionary search to “cheat” or find counterintuitive ways
of achieving a given end is on display in nature too, though it is perhaps less obvi-
ous to us there because of our already being somewhat familiar with the look and
feel of biology, and thus being prone to regarding the actual outcomes of natural
evolutionary processes as normaleven if we would not have expected them ex
ante. But it is possible to set up experiments in artificial selection where one can
see the evolutionary process in action outside its familiar context. In such experi-
ments, researchers can create conditions that rarely obtain in nature, and observe
the results.
COMPARISON
|
55
a lack of awareness in some areas might possibly be compensated for by extra skill
in others). is is far beyond the reach of contemporary AI. And because of the
combinatorial explosion, which generally defeats attempts to solve complicated
planning problems with brute-force methods (as we saw in Chapter1), the short-
comings of known algorithms cannot realistically be overcome simply by pour-
ing on more computing power.
21
However, once the search or planning processes
become powerful enough, they also become potentially dangerous.
Instead of allowing agent-like purposive behavior to emerge spontaneously and
haphazardly from the implementation of powerful search processes (including
processes searching for internal work plans and processes directly searching for
solutions meeting some user-specied criterion), it may be better to create agents
on purpose. Endowing a superintelligence with an explicitly agent-like structure
can be a way of increasing predictability and transparency. A well-designed sys-
tem, built such that there is a clean separation between its values and its beliefs,
would let us predict something about the outcomes it would tend to produce. Even
if we could not foresee exactly which beliefs the system would acquire or which
situations it would nd itself in, there would be a known place where we could
inspect its nal values and thus the criteria that it will use in selecting its future
actions and in evaluating any potential plan.
Comparison
It may be useful to summarize the features of the dierent system castes we have
discussed (Table 11).
Box 9 Continued
For example, prior to the 960s, it was apparently quite common for biolo-
gists to maintain that predator populations restrict their own breeding in order
to avoid falling into a Malthusian trap.
7
Although individual selection would work
against such restraint, it was sometimes thought that group selection would over-
come individual incentives to exploit opportunities for reproduction and favor
traits that would benefit the group or population at large. Theoretical analysis and
simulation studies later showed that while group selection is possible in principle,
it can overcome strong individual selection only under very stringent conditions
that may rarely apply in nature.
8
But such conditions can be created in the labora-
tory. When flour beetles (Tribolium castaneum) were bred for reduced popula-
tion size, by applying strong group selection, evolution did indeed lead to smaller
populations.
9
However, the means by which this was accomplished included not
only the “benign” adaptations of reduced fecundity and extended developmental
time that a human naively anthropomorphizing evolutionary search might have
expected, but also an increase in cannibalism.
20
56
|
ORACLES, GENIES, SOVEREIGNS, TOOLS
Table  Features of dierent system castes
Oracle A question-answering system
Variations: Domain-limited oracles (e.g.
mathematics); output-restricted oracles
(e.g. only yes/no/undecided answers,
or probabilities); oracles that refuse to
answer questions if they predict the
consequences of answering would meet
pre-specified “disaster criteria”; multiple
oracles for peer review
•  Boxing methods fully applicable
•  Domesticity fully applicable
•  Reduced need for AI to understand human intentions and interests (com-
pared to genies and sovereigns)
•  Use of yes/no questions can obviate need for a metric of the “usefulness” or 
“informativeness” of answers
•  Source of great power (might give operator a decisive strategic advantage)
•  Limited protection against foolish use by operator
•  Untrustworthy oracles could be used to provide answers that are hard to 
find but easy to verify
•  Weak verication of answers may be possible through the use of multiple 
oracles
Genie A command-executing system
Variations: Genies using dierent “extrapo-
lation distances” or degrees of following
the spirit rather than letter of the com-
mand; domain-limited genies; genies-with-
preview; genies that refuse to obey com-
mands if they predict the consequences
of obeying would meet pre-specified
“disaster criteria”
•  Boxing methods partially applicable (for spatially limited genies)
•  Domesticity partially applicable
•  Genie could oer a preview of salient aspects of expected outcomes
•  Genie could implement change in stages, with opportunity for review at each 
stage
•  Source of great power (might give operator a decisive strategic advantage)
•  Limited protection against foolish use by operator
•  Greater need for AI to understand human interests and intentions (compared 
to oracles)
COMPARISON
|
57
Sovereign A system designed for open-ended
autonomous operation
Variations: Many possible motivation
systems; possibility of using preview and
“sponsor ratification” (to be discussed in
Chapter 3)
•  Boxing methods inapplicable
•  Most other capability control methods also inapplicable (except, possibly, 
social integration or anthropic capture)
•  Domesticity mostly inapplicable
•  Great need for AI to understand true human interests and intentions
•  Necessity of getting it right on the rst try (though, to a possibly lesser extent, 
this is true for all castes)
•  Potentially a source of great power for sponsor, including decisive strategic 
advantage
•  Once activated, not vulnerable to hijacking by operator, and might be de-
signed with some protection against foolish use
•  Can be used to implement “veil of ignorance” outcomes (cf. Chapter 13)
Tool A system not designed to exhibit goal-
directed behavior
•  Boxing methods may be applicable, depending on the implementation
•  Powerful search processes would likely be involved in the development and 
operation of a machine superintelligence
•  Powerful search to nd a solution meeting some formal criterion can produce 
solutions that meet the criterion in an unintended and dangerous way
•  Powerful search might involve secondary, internal search and planning 
processes that might find dangerous ways of executing the primary search
process
Ta bl e  Continued
58
|
ORACLES, GENIES, SOVEREIGNS, TOOLS
Further research would be needed to determine which type of system would be
safest. e answer might depend on the conditions under which the AI would be
deployed. e oracle caste is obviously attractive from a safety standpoint, since it
would allow both capability control methods and motivation selection methods
to be applied. It might thus seem to simply dominate the sovereign caste, which
would only allow motivation selection methods (except in scenarios in which
the world is believed to contain other powerful superintelligences, in which case
social integration or anthropic capture might apply). However, an oracle could
place a lot of power into the hands of its operator, who might be corrupted or
might apply the power unwisely, whereas a sovereign would oer some protection
against these hazards. e safety ranking is therefore not so easily determined.
A genie can be viewed as a compromise between an oracle and a sovereign—
but not necessarily a good compromise. In many ways, it would share the disad-
vantages of both. e apparent safety of a tool-AI, meanwhile, may be illusory.
In order for tools to be versatile enough to substitute for superintelligent agents,
they may need to deploy extremely powerful internal search and planning pro-
cesses. Agent-like behaviors may arise from such processes as an unplanned con-
sequence. In that case, it would be better to design the system to be an agent in the
rst place, so that the programmers can more easily see what criteria will end up
determining the systems output.
MULTIPOLAR SCENARIOS
|
59
CHAPTER 11
Multipolar scenarios
W
e have seen (particularly in Chapter 8) how menacing a unipolar
outcome could be, one in which a single superintelligence obtains
a decisive strategic advantage and uses it to establish a singleton.
In this chapter, we examine what would happen in a multipolar outcome, a
post-transition society with multiple competing superintelligent agencies. Our
interest in this class of scenarios is twofold. First, as alluded to in Chapter 9,
social integration might be thought to oer a solution to the control problem.
We already noted some limitations with that approach, and this chapter paints
a fuller picture. Second, even without anybody setting out to create a multipolar
condition as a way of handling the control problem, such an outcome might
occur anyway. So what might such an outcome look like? e resulting competi-
tive society is not necessarily attractive, nor long-lasting.
In singleton scenarios, what happens post-transition depends almost entirely on
the values of the singleton. e outcome could thus be very good or very bad,
depending on what those values are. What the values are depends, in turn, on
whether the control problem was solved, and—to the degree to which it was
solved—on the goals of the project that created the singleton.
If one is interested in the outcome of singleton scenarios, therefore, one really
only has three sources of information: information about matters that cannot be
aected by the actions of the singleton (such as the laws of physics); information
about convergent instrumental values; and information that enables one to pre-
dict or speculate about what nal values the singleton will have.
In multipolar scenarios, an additional set of constraints comes into play, con-
straints having to do with how agents interact. e social dynamics emerging
from such interactions can be studied using techniques from game theory, eco-
nomics, and evolution theory. Elements of political science and sociology are also
relevant insofar as they can be distilled and abstracted from some of the more
contingent features of human experience. Although it would be unrealistic to
expect these constraints to give us a precise picture of the post-transition world,
60
|
MULTIPOLAR SCENARIOS
they can help us identify some salient possibilities and challenge some unfounded
assumptions.
We will begin by exploring an economic scenario characterized by a low level of
regulation, strong protection of property rights, and a moderately rapid introduc-
tion of inexpensive digital minds.
1
is type of model is most closely associated
with the American economist Robin Hanson, who has done pioneering work on
the subject. Later in this chapter, we will look at some evolutionary considerations
and examine the prospects of an initially multipolar post-transition world subse-
quently coalescing into a singleton.
Of horses and men
General machine intelligence could serve as a substitute for human intelligence.
Not only could digital minds perform the intellectual work now done by humans,
but, once equipped with good actuators or robotic bodies, machines could also
substitute for human physical labor. Suppose that machine workers—which can
be quickly reproduced—become both cheaper and more capable than human
workers in virtually all jobs. What happens then?
Wages and unemployment
With cheaply copyable labor, market wages fall. e only place where humans
would remain competitive may be where customers have a basic preference for
work done by humans. Today, goods that have been handcraed or produced by
indigenous people sometimes command a price premium. Future consumers
might similarly prefer human-made goods and human athletes, human artists,
human lovers, and human leaders to functionally indistinguishable or superior
articial counterparts. It is unclear, however, just how widespread such prefer-
ences would be. If machine-made alternatives were suciently superior, perhaps
they would be more highly prized.
One parameter that might be relevant to consumer choice is the inner life of the
worker providing a service or product. A concert audience, for instance, might like
to know that the performer is consciously experiencing the music and the venue.
Absent phenomenal experience, the musician could be regarded as merely a high-
powered jukebox, albeit one capable of creating the three-dimensional appear-
ance of a performer interacting naturally with the crowd. Machines might then
be designed to instantiate the same kinds of mental states that would be present
in a human performing the same task. Even with perfect replication of subjective
experiences, however, some people might simply prefer organic work. Such pref-
erences could also have ideological or religious roots. Just as many Muslims and
Jews shun food prepared in ways they classify as haram or treif, so there might be
groups in the future that eschew products whose manufacture involved unsanc-
tioned use of machine intelligence.
OF HORSES AND MEN
|
6
What hinges on this? To the extent that cheap machine labor can substitute for
human labor, human jobs may disappear. Fears about automation and job loss are
of course not new. Concerns about technological unemployment have surfaced
periodically, at least since the Industrial Revolution; and quite a few professions
have in fact gone the way of the English weavers and textile artisans who in the
early nineteenth century united under the banner of the folkloric “General Ludd
to ght against the introduction of mechanized looms. Nevertheless, although
machinery and technology have been substitutes for many particular types of
human labor, physical technology has on the whole been a complement to labor.
Average human wages around the world have been on a long-term upward trend,
in large part because of such complementarities. Yet what starts out as a comple-
ment to labor can at a later stage become a substitute for labor. Horses were ini-
tially complemented by carriages and ploughs, which greatly increased the horse’s
productivity. Later, horses were substituted for by automobiles and tractors. ese
later innovations reduced the demand for equine labor and led to a population
collapse. Could a similar fate befall the human species?
e parallel to the story of the horse can be drawn out further if we ask why it is
that there are still horses around. One reason is that there are still a few niches in
which horses have functional advantages; for example, police work. But the main
reason is that humans happen to have peculiar preferences for the services that
horses can provide, including recreational horseback riding and racing. ese
preferences can be compared to the preferences we hypothesized some humans
might have in the future, that certain goods and services be made by human
hand. Although suggestive, this analogy is, however, inexact, since there is still no
complete functional substitute for horses. If there were inexpensive mechanical
devices that ran on hay and had exactly the same shape, feel, smell, and behav-
ior as biological horses—perhaps even the same conscious experiences—then
demand for biological horses would probably decline further.
With a sucient reduction in the demand for human labor, wages would fall
below the human subsistence level. e potential downside for human workers is
therefore extreme: not merely wage cuts, demotions, or the need for retraining,
but starvation and death. When horses became obsolete as a source of moveable
power, many were sold o to meatpackers to be processed into dog food, bone
meal, leather, and glue. ese animals had no alternative employment through
which to earn their keep. In the United States, there were about 26 million horses
in 1915. By the early 1950s, 2 million remained.
2
Capital and welfare
One dierence between humans and horses is that humans own capital. A stylized
empirical fact is that the total factor share of capital has for a long time remained
steady at approximately 30% (though with signicant short-term uctuations).
3
is means that 30% of total global income is received as rent by owners of capital,
the remaining 70% being received as wages by workers. If we classify AI as capital,
62
|
MULTIPOLAR SCENARIOS
then with the invention of machine intelligence that can fully substitute for
human work, wages would fall to the marginal cost of such machine- substitutes,
which—under the assumption that the machines are very ecient—would be
very low, far below human subsistence-level income. e income share received
by labor would then dwindle to practically nil. But this implies that the factor
share of capital would become nearly 100% of total world product. Since world
GDP would soar following an intelligence explosion (because of massive amounts
of new labor-substituting machines but also because of technological advances
achieved by superintelligence, and, later, acquisition of vast amounts of new
land through space colonization), it follows that the total income from capital
would increase enormously. If humans remain the owners of this capital, the total
income received by the human population would grow astronomically, despite
the fact that in this scenario humans would no longer receive any wage income.
e human species as a whole could thus become rich beyond the dreams of
Avarice. How would this income be distributed? To a rst approximation, capital
income would be proportional to the amount of capital owned. Given the astro-
nomical amplication eect, even a tiny bit of pre-transition wealth would bal-
loon into a vast post-transition fortune. However, in the contemporary world,
many people have no wealth. is includes not only individuals who live in pov-
erty but also some people who earn a good income or who have high human capi-
tal but have negative net worth. For example, in auent Denmark and Sweden
30% of the population report negative wealth—oen young, middle-class people
with few tangible assets and credit card debt or student loans.
4
Even if savings
could earn extremely high interest, there would need to be some seed grain, some
starting capital, in order for the compounding to begin.
5
Nevertheless, even individuals who have no private wealth at the start of the
transition could become extremely rich. ose who participate in a pension
scheme, for instance, whether public or private, should be in a good position, pro-
vided the scheme is at least partially funded.
6
Have-nots could also become rich
through the philanthropy of those who see their net worth skyrocket: because of
the astronomical size of the bonanza, even a very small fraction donated as alms
would be a very large sum in absolute terms.
It is also possible that riches could still be made through work, even at a
post-transition stage when machines are functionally superior to humans in all
domains (as well as cheaper than even subsistence-level human labor). As noted
earlier, this could happen if there are niches in which human labor is preferred
for aesthetic, ideological, ethical, religious, or other non-pragmatic reasons. In
a scenario in which the wealth of human capital-holders increases dramatically,
demand for such labor could increase correspondingly. Newly minted trillion-
aires or quadrillionaires could aord to pay a hey premium for having some of
their goods and services supplied by an organic “fair-trade” labor force. e his-
tory of horses again oers a parallel. Aer falling to 2 million in the early 1950s,
the US horse population has undergone a robust recovery: a recent census puts the
number at just under 10 million head.
7
e rise is not due to new functional needs
OF HORSES AND MEN
|
63
for horses in agriculture or transportation; rather, economic growth has enabled
more Americans to indulge a fancy for equestrian recreation.
Another relevant dierence between humans and horses, beside capital-
ownership, is that humans are capable of political mobilization. A human-run
government could use the taxation power of the state to redistribute private
prots, or raise revenue by selling appreciated state-owned assets, such as public
land, and use the proceeds to pension o its constituents. Again, because of the
explosive economic growth during and immediately aer the transition, there
would be vastly more wealth sloshing around, making it relatively easy to ll the
cups of all unemployed citizens. It should be feasible even for a single country
to provide every human worldwide with a generous living wage at no greater
proportional cost than what many countries currently spend on foreign aid.
8
The Malthusian principle in a historical perspective
So far we have assumed a constant human population. is may be a reasonable
assumption for short timescales, since biology limits the rate of human repro-
duction. Over longer timescales, however, the assumption is not necessarily
reasonable.
e human population has increased a thousandfold over the past 9,000 years.
9
e increase would have been much faster except for the fact that throughout
most of history and prehistory, the human population was bumping up against
the limits of the world economy. An approximately Malthusian condition pre-
vailed, in which most people received subsistence-level incomes that just barely
allowed them to survive and raise an average of two children to maturity.
10
ere
were temporary and local reprieves: plagues, climate uctuations, or warfare
intermittently culled the population and freed up land, enabling survivors to
improve their nutritional intake—and to bring up more children, until the ranks
were replenished and the Malthusian condition reinstituted. Also, thanks to
social inequality, a thin elite stratum could enjoy consistently above-subsistence
income (at the expense of somewhat lowering the total size of the population that
could be sustained). A sad and dissonant thought: that in this Malthusian con-
dition, the normal state of aairs during most of our tenure on this planet, it
was droughts, pestilence, massacres, and inequality—in common estimation the
worst foes of human welfare—that may have been the greatest humanitarians:
they alone enabling the average level of well-being to occasionally bop up slightly
above that of life at the very margin of subsistence.
Superimposed on local uctuations, history shows a macro-pattern of initially
slow but accelerating economic growth, fueled by the accumulation of techno-
logical innovations. e growing world economy brought with it a commensurate
increase in global population. (More precisely, a larger population itself appears
to have strongly accelerated the rate of growth, perhaps mainly by increasing
humanitys collective intelligence.
11
) Only since the Industrial Revolution, how-
ever, did economic growth become so rapid that population growth failed to keep
64
|
MULTIPOLAR SCENARIOS
pace. Average income thus started to rise, rst in the early-industrializing coun-
tries of Western Europe, subsequently in most of the world. Even in the poor-
est countries today, average income substantially exceeds subsistence level, as
reected in the fact that the populations of these countries are growing.
e poorest countries now have the fastest population growth, as they have
yet to complete the “demographic transition” to the low-fertility regime that has
taken hold in more developed societies. Demographers project that the world
population will rise to about 9 billion by mid-century, and that it might thereaer
plateau or decline as the poorer countries join the developed world in this low-
fertility regime.
12
Many rich countries already have fertility rates that are below
replacement level; in some cases, far below.
13
Yet there are reasons, if we take a longer view and assume a state of unchang-
ing technology and continued prosperity, to expect a return to the historically
and ecologically normal condition of a world population that butts up against
the limits of what our niche can support. If this seems counterintuitive in light
of the negative relationship between wealth and fertility that we are currently
observing on the global scale, we must remind ourselves that this modern age
is a brief slice of history and very much an aberration. Human behavior has not
yet adapted to contemporary conditions. Not only do we fail to take advantage of
obvious ways to increase our inclusive tness (such as by becoming sperm or egg
donors) but we actively sabotage our fertility by using birth control. In the envi-
ronment of evolutionary adaptedness, a healthy sex drive may have been enough
to make an individual act in ways that maximized her reproductive potential; in
the modern environment, however, there would be a huge selective advantage to
having a more direct desire for being the biological parent to the largest possible
number of children. Such a desire is currently being selected for, as are other
traits that increase our propensity to reproduce. Cultural adaptation, however,
might steal a march on biological evolution. Some communities, such those of
the Hutterites or the adherents of the Quiverfull evangelical movement, have
natalist cultures that encourage large families, and they are consequently under-
going rapid expansion.
Population growth and investment
If we imagine current socioeconomic conditions magically frozen in their cur-
rent shape, the future would be dominated by cultural or ethnic groups that
sustain high levels of fertility. If most people had preferences that were tness-
maximizing in the contemporary environment, the population could easily dou-
ble in each generation. Absent population control policies—which would have
to become steadily more rigorous and eective to counteract the evolution of
stronger preferences to circumvent them—the world population would then con-
tinue to grow exponentially until some constraint, such as land scarcity or deple-
tion of easy opportunities for important innovation, made it impossible for the
economy to keep pace: at which point, average income would start to decline until
OF HORSES AND MEN
|
65
it reached the level where crushing poverty prevents most people from raising
much more than two children to maturity. us the Malthusian principle would
reassert itself, like a dread slave master, bringing our escapade into the dream-
land of abundance to an end, and leading us back to the quarry in chains, there to
resume the weary struggle for subsistence.
is longer-term outlook could be telescoped into a more imminent prospect
by the intelligence explosion. Since soware is copyable, a population of emula-
tions or AIs could double rapidly—over the course of minutes rather than decades
or centuries—soon exhausting all available hardware.
Private property might oer partial protection against the emergence of a uni-
versal Malthusian condition. Consider a simple model in which clans (or closed
communities, or states) start out with varying amounts of property and indepen-
dently adopt dierent policies about reproduction and investment. Some clans
discount the future steeply and spend down their endowment, whereaer their
impoverished members join the global proletariat (or die, if they cannot support
themselves through their labor). Other clans invest some of their resources but
adopt a policy of unlimited reproduction: such clans grow more populous until
they reach an internal Malthusian condition in which their members are so
poor that they die at almost the same rate as they reproduce, at which point the
clan’s population growth slows to equal the growth of its resources. Yet other
clans might restrict their fertility to below the rate of growth of their capital: such
clans could slowly increment their numbers while their members also grow richer
percapita.
If wealth is redistributed from the wealthy clans to the members of the rapidly
reproducing or rapidly discounting clans (whose children, copies, or oshoots,
through no fault of their own, were launched into the world with insucient capi-
tal to survive and thrive) then a universal Malthusian condition would be more
closely approximated. In the limiting case, all members of all clans would receive
subsistence level income and everybody would be equal in their poverty.
If property is not redistributed, prudent clans might hold on to a certain
amount of capital, and it is possible that their wealth could grow in absolute terms.
It is, however, unclear whether humans could earn as high rates of return on their
capital as machine intelligences could earn on theirs, because there may be syner-
gies between labor and capital such that an single agent who can supply both (e.g.
an entrepreneur or investor who is both skilled and wealthy) can attain a private
rate of return on her capital exceeding the market rate obtainable by agents who
possess nancial but not cognitive resources. Humans, being less skilled than
machine intelligences, may therefore grow their capital more slowly—unless, of
course, the control problem had been completely solved, in which case the human
rate of return would equal the machine rate of return, since a human principal
could task a machine agent to manage her savings, and could do so costlessly
and without conicts of interest: but otherwise, in this scenario, the fraction of
the economy owned by machines would asymptotically approach one hundred
percent.
66
|
MULTIPOLAR SCENARIOS
A scenario in which the fraction of the economy that is owned by machines
asymptotically approaches one hundred percent is not necessarily one in which
the size of the human slice declines. If the economy grows at a sucient clip, then
even a relatively diminishing fraction of it may still be increasing in its absolute
size. is may sound like modestly good news for humankind: in a multipolar
scenario in which property rights are protected—even if we completely fail to
solve the control problem—the total amount of wealth owned by human beings
could increase. Of course, this eect would not take care of the problem of
population growth in the human population pulling down per capita income to
subsistence level, nor the problem of humans who ruin themselves because they
discount the future.
In the long run, the economy would become increasingly dominated by those
clans that have the highest savings rates—misers who own half the city and live
under a bridge. Only in the fullness of time, when there are no more opportunities
for investment, would the maximally prosperous misers start drawing down their
savings.
14
However, if there is less than perfect protection for property rights—
for example if the more ecient machines on net succeed, by hook or by crook,
in transferring wealth from humans to themselves—then human capitalists may
need to spend down their capital much sooner, before it gets depleted by such
transfers (or the ongoing costs incurred in securing their wealth against such
transfers). If these developments take place on digital rather than biological time-
scales, then the glacial humans might nd themselves expropriated before they
could say Jack Robinson.
15
Life in an algorithmic economy
Life for biological humans in a post-transition Malthusian state need not resemble
any of the historical states of man (as hunter–gatherer, farmer, or oce worker).
Instead, the majority of humans in this scenario might be idle rentiers who eke out
a marginal living on their savings.
16
ey would be very poor, yet derive what little
income they have from savings or state subsidies. ey would live in a world with
extremely advanced technology, including not only superintelligent machines but
also anti-aging medicine, virtual reality, and various enhancement technologies
and pleasure drugs: yet these might be generally unaordable. Perhaps instead of
using enhancement medicine, they would take drugs to stunt their growth and
slow their metabolism in order to reduce their cost of living (fast-burners being
unable to survive at the gradually declining subsistence income). As our numbers
increase and our average income declines further, we might degenerate into what-
ever minimal structure still qualies to receive a pension—perhaps minimally
conscious brains in vats, oxygenized and nourished by machines, slowly saving up
enough money to reproduce by having a robot technician develop a clone of them.
17
Further frugality could be achieved by means of uploading, since a physically
optimized computing substrate, devised by advanced superintelligence, would
LIFE IN AN ALGORITHMIC ECONOMY
|
67
be more ecient than a biological brain. e migration into the digital realm
might be stemmed, however, if emulations were regarded as non-humans or non-
citizens ineligible to receive pensions or to hold tax-exempt savings accounts. In
that case, a niche for biological humans might remain open, alongside a perhaps
vastly larger population of emulations or articial intelligences.
So far we have focused on the fate of the humans, who may be supported by
savings, subsidies, or wage income deriving from other humans who prefer to
hire humans. Let us now turn our attention to some of the entities that we have so
far classied as “capital: machines that may be owned by human beings, that are
constructed and operated for the sake of the functional tasks they perform, and
that are capable of substituting for human labor in a very wide range of jobs. What
may the situation be like for these workhorses of the new economy?
If these machines were mere automata, simple devices like a steam engine
or the mechanism in a clock, then no further comment would be needed: there
would be a large amount of such capital in a post-transition economy, but it would
seem not to matter to anybody how things turn out for pieces of insentient equip-
ment. However, if the machines have conscious minds—if they are constructed
in such a way that their operation is associated with phenomenal awareness (or if
they for some other reason are ascribed moral status)—then it becomes important
to consider the overall outcome in terms of how it would aect these machine
minds. e welfare of the working machine minds could even appear to be the
most important aspect of the outcome, since they may be numerically dominant.
Voluntary slavery, casual death
A salient initial question is whether these working machine minds are owned as
capital (slaves) or are hired as free wage laborers. On closer inspection however, it
become doubtful that anything really hinges on the issue. ere are two reasons
for this. First, if a free worker in a Malthusian state gets paid a subsistence-level
wage, he will have no disposable income le aer he has paid for food and other
necessities. If the worker is instead a slave, his owner will pay for his mainten-
ance and again he will have no disposable income. In either case, the worker gets
the necessities and nothing more. Second, suppose that the free laborer were
somehow in a position to command an above-subsistence-level income (perhaps
because of favorable regulation). How will he spend the surplus? Investors would
nd it most protable to create workers who would be “voluntary slaves”—who
would willingly work for subsistence-level wages. Investors may create such
workers by copying those workers who are compliant. With appropriate selec-
tion (and perhaps some modication to the code) investors might be able to
create workers who not only prefer to volunteer their labor but who would also
choose to donate back to their owners any surplus income they might happen
to receive. Giving money to the worker would then be but a roundabout way of
giving money to the owner or employer, even if the worker were a free agent with
full legal rights.
68
|
MULTIPOLAR SCENARIOS
Perhaps it will be objected that it would be dicult to design a machine so that
it wants to volunteer for any job assigned to it or so that it wants to donate its
wages to its owner. Emulations, in particular, might be imagined to have more
typically human desires. But note that even if the original control problem is
dicult, we are here considering a condition aer the transition, a time when
methods for motivation selection have presumably been perfected. In the case
of emulations, one might get quite far simply by selecting from the pre-existing
range of human characters; and we have described several other motivation
selection methods. e control problem may also in some ways be simplied by
the current assumption that the new machine intelligence enters into a stable
socioeconomic matrix that is already populated with other law-abiding super-
intelligent agents.
Let us, then, consider the plight of the working-class machine, whether it be
operating as a slave or a free agent. We focus rst on emulations, the easiest case
to imagine.
Bringing a new biological human worker into the world takes anywhere
between een and thirty years, depending on how much expertise and experi-
ence is required. During this time the new person must be fed, housed, nurtured,
and educated—at great expense. By contrast, spawning a new copy of a digital
worker is as easy as loading a new program into working memory. Life thus
becomes cheap. A business could continuously adapt its workforce to t demands
by spawning new copies—and terminating copies that are no longer needed,
to free up computer resources. is could lead to an extremely high death rate
among digital workers. Many might live for only one subjective day.
ere are reasons other than uctuations in demand why employers or own-
ers of emulations might want to “kill” or “end” their workers frequently.
18
If an
emulation mind, like a biological mind, requires periods of rest and sleep in
order to function, it might be cheaper to erase a fatigued emulation at the end
of a day and replace it with a stored state of a fresh and rested emulation. As
this procedure would cause retrograde amnesia for everything that had been
learned during that day, emulations performing tasks requiring long cognitive
threads would be spared such frequent erasure. It would be dicult, for exam-
ple, to write a book if each morning when one sat down at one’s desk, one had
no memory of what one had done before. But other jobs could be performed
adequately by agents that are frequently recycled: a shop assistant or a customer
service agent, once trained, may only need to remember new information for
twenty minutes.
Since recycling emulations would prevent memory and skill formation, some
emulations may be placed on a special learning track where they would run con-
tinuously, including for rest and sleep, even in jobs that do not strictly require long
cognitive threads. For example, some customer service agents might run for many
years in optimized learning environments, assisted by coaches and performance
evaluators. e best of these trainees would then be used like studs, serving as
templates from which millions of fresh copies are stamped out each day. Great
LIFE IN AN ALGORITHMIC ECONOMY
|
69
eort would be poured into improving the performance of such worker templates,
because even a small increment in productivity would yield great economic value
when applied in millions of copies.
In parallel with eorts to train worker-templates for particular jobs, intense
eorts would also be made to improve the underlying emulation technology.
Advances here would be even more valuable than advances in individual worker-
templates, since general technology improvements could be applied to all emula-
tion workers (and potentially to non-worker emulations also) rather than only to
those in a particular occupation. Enormous resources would be devoted to nd-
ing computational shortcuts allowing for more ecient implementations of exist-
ing emulations, and also into developing neuromorphic and entirely synthetic
AI architectures. is research would probably mostly be done by emulations
running on very fast hardware. Depending on the price of computer power, mil-
lions, billions, or trillions of emulations of the sharpest human research minds
(or enhanced versions thereof) may be working around the clock on advancing
the frontier of machine intelligence; and some of these may be operating orders
of magnitude faster than biological brains.
19
is is a good reason for thinking
that the era of human-like emulations would be brief—a very brief interlude
in sidereal time—and that it would soon give way to an era of greatly superior
articial intelligence.
We have already encountered several reasons why employers of emulation
workers may periodically cull their herds: uctuations in demand for dierent
kinds of laborers, cost savings of not having to emulate rest and sleep time, and
the introduction of new and improved templates. Security concerns might fur-
nish another reason. To prevent workers from developing subversive plans and
conspiracies, emulations in some sensitive positions might be run only for limited
periods, with frequent resets to an earlier stored ready-state.
20
ese ready-states to which emulations would be reset would be carefully pre-
pared and vetted. A typical short-lived emulation might wake up in a well-rested
mental state that is optimized for loyalty and productivity. He remembers having
graduated top of his class aer many (subjective) years of intense training and
selection, then having enjoyed a restorative holiday and a good nights sleep, then
having listened to a rousing motivational speech and stirring music, and now he is
champing at the bit to nally get to work and to do his utmost for his employer. He
is not overly troubled by thoughts of his imminent death at the end of the working
day. Emulations with death neuroses or other hang-ups are less productive and
would not have been selected.
21
Would maximally ecient work be fun?
One important variable in assessing the desirability of a hypothetical condition
like this is the hedonic state of the average emulation.
22
Would a typical emulation
worker be suering or would he be enjoying the experience of working hard on
the task at hand?
70
|
MULTIPOLAR SCENARIOS
We must resist the temptation to project our own sentiments onto the imagin-
ary emulation worker. e question is not whether you would feel happy if you
had to work constantly and never again spend time with your loved ones—a
terrible fate, most would agree.
It is moderately more relevant to consider the current human average hedonic
experience during working hours. Worldwide studies asking respondents how
happy they are nd that most rate themselves as “quite happy” or “very happy”
(averaging 3.1 on a scale from 1 to 4).
23
Studies on average aect, asking respond-
ents how frequently they have recently experienced various positive or negative
aective states, tend to get a similar result (producing a net aect of about 0.52
on a scale from –1 to 1). ere is a modest positive eect of a country’s per capita
income on average subjective well-being.
24
However, it is hazardous to extrapolate
from these ndings to the hedonic state of future emulation workers. One reason
that could be given for this is that their condition would be so dierent: on the
one hand, they might be working much harder; on the other hand, they might be
free from diseases, aches, hunger, noxious odors, and so forth. Yet such considera-
tions largely miss the mark. e much more important consideration here is that
hedonic tone would be easy to adjust through the digital equivalent of drugs or
neurosurgery. is means that it would be a mistake to infer the hedonic state of
future emulations from the external conditions of their lives by imagining how
we ourselves and other people like us would feel in those circumstances. Hedonic
state would be a matter of choice. In the model we are currently considering, the
choice would be made by capital-owners seeking to maximize returns on their
investment in emulation-workers. Consequently, the question of how happy emu-
lations would feel boils down to the question of which hedonic states would be
most productive (in the various jobs that emulations would be employed to do).
Here, again, one might seek to draw an inference from observations about
human happiness. If it is the case, across most times, places, and occupations, that
people are typically at least moderately happy, this would create some presump-
tion in favor of the same holding in a post-transition scenario like the one we are
considering. To be clear, the argument in this case would not be that human minds
have a predisposition towards happiness so they would probably nd satisfaction
under these novel conditions; but rather that a certain average level of happiness
has proved adaptive for human minds in the past so maybe a similar level of hap-
piness will prove adaptive for human-like minds in the future. Yet this formula-
tion also reveals the weakness of the inference: to wit, that the mental dispositions
that were adaptive for hunter–gatherer hominids roaming the African savanna
may not necessarily be adaptive for modied emulations living in post-transition
virtual realities. We can certainly hope that the future emulation- workers would
be as happy as, or happier than, typical workers were in human history; but we
have yet to see any compelling reason for supposing it would be so (in the laissez-
faire multipolar scenario currently under examination).
Consider the possibility that the reason happiness is prevalent among humans
(to whatever limited extent it is prevalent) is that cheerful mood served a signaling
LIFE IN AN ALGORITHMIC ECONOMY
|
7
function in the environment of evolutionary adaptedness. Conveying the impres-
sion to other members of the social group of being in ourishing condition—in
good health, in good standing with one’s peers, and in condent expectation of
continued good fortune—may have boosted an individuals popularity. A bias
toward cheerfulness could thus have been selected for, with the result that human
neurochemistry is now biased toward positive aect compared to what would
have been maximally ecient according to simpler materialistic criteria. If this
were the case, then the future of joie de vivre might depend on cheer retaining its
social signaling function unaltered in the post-transition world: an issue to which
we will return shortly.
What if glad souls dissipate more energy than glum ones? Perhaps the joyful are
more prone to creative leaps and ights of fancy—behaviors that future employers
might disprize in most of their workers. Perhaps a sullen or anxious xation on
simply getting on with the job without making mistakes will be the productivity-
maximizing attitude in most lines of work. e claim here is not that this is so, but
that we do not know that it is not so. Yet we should consider just how bad it could
be if some such pessimistic hypothesis about a future Malthusian state turned
out to be true: not only because of the opportunity cost of having failed to create
something better—which would be enormous—but also because the state could
be bad in itself, possibly far worse than the original Malthusian state.
We seldom put forth full eort. When we do, it is sometimes painful. Imagine
running on a treadmill at a steep incline—heart pounding, muscles aching, lungs
gasping for air. A glance at the timer: your next break, which will also be your
death, is due in 49 years, 3 months, 20 days, 4 hours, 56 minutes, and 12 seconds.
You wish you had not been born.
Again the claim is not that this is how it would be, but that we do not know that
it is not. One could certainly make a more optimistic case. For example, there is
no obvious reason that emulations would need to suer bodily injury and sick-
ness: the elimination of physical wretchedness would be a great improvement
over the present state of aairs. Furthermore, since such stu as virtual reality is
made of can be fairly cheap, emulations may work in sumptuous surroundings—
in splendid mountaintop palaces, on terraces set in a budding spring forest, or on
the beaches of an azure lagoon—with just the right illumination, temperature,
scenery and décor; free from annoying fumes, noises, dras, and buzzing insects;
dressed in comfortable clothing, feeling clean and focused, and well nourished.
More signicantly, if—as seems perfectly possible—the optimum human mental
state for productivity in most jobs is one of joyful eagerness, then the era of the
emulation economy could be quite paradisiacal.
ere would, in any case, be a great option value in arranging matters in such
a manner that somebody or something could intervene to set things right if the
default trajectory should happen to veer toward dystopia. It could also be desir-
able to have some sort of escape hatch that would permit bailout into death and
oblivion if the quality of life were to sink permanently below the level at which
annihilation becomes preferable to continued existence.
72
|
MULTIPOLAR SCENARIOS
Unconscious outsourcers?
In the longer run, as the emulation era gives way to an articial intelligence era
(or if machine intelligence is attained directly via AI without a preceding whole
brain emulation stage) pain and pleasure might possibly disappear entirely in a
multipolar outcome, since a hedonic reward mechanism may not be the most
eective motivation system for an complex articial agent (one that, unlike the
human mind, is not burdened with the legacy of animal wetware). Perhaps a more
advanced motivation system would be based on an explicit representation of a
utility function or some other architecture that has no exact functional analogs
to pleasure and pain.
A related but slightly more radical multipolar outcome—one that could involve
the elimination of almost all value from the future—is that the universal prole-
tariat would not even be conscious. is possibility is most salient with respect
to AI, which might be structured very dierently than human intelligence. But
even if machine intelligence were initially achieved though whole brain emula-
tion, resulting in conscious digital minds, the competitive forces unleashed in a
post-transition economy could easily lead to the emergence of progressively less
neuromorphic forms of machine intelligence, either because synthetic AI is cre-
ated de novo or because the emulations would, through successive modications
and enhancements, increasingly depart their original human form.
Consider a scenario in which aer emulation technology has been developed,
continued progress in neuroscience and computer science (expedited by the pres-
ence of digital minds to serve as both researchers and test subjects) makes it possi-
ble to isolate individual cognitive modules in an emulation, and to hook them up
to modules isolated from other emulations. A period of training and adjustment
may be required before dierent modules can collaborate eectively; but mod-
ules that conform to common standards could more quickly interface with other
standard modules. is would make standardized modules more productive, and
create pressure for more standardization.
Emulations can now begin to outsource increasing portions of their func-
tionality. Why learn arithmetic when you can send your numerical reasoning
task to Gauss-Modules, Inc.? Why be articulate when you can hire Coleridge
Conversations to put your thoughts into words? Why make decisions about your
personal life when there are certied executive modules that can scan your goal
system and manage your resources to achieve your goals better than if you tried
to do it yourself? Some emulations may prefer to retain most of their functionality
and handle tasks themselves that could be done more eciently by others. ose
emulations would be like hobbyists who enjoy growing their own vegetables or
knitting their own cardigans. Such hobbyist emulations would be less ecient;
and if there is a net ow of resources from less to more ecient participants of the
economy, the hobbyists would eventually lose out.
e bouillon cubes of discrete human-like intellects thus melt into an algorith-
mic soup.
LIFE IN AN ALGORITHMIC ECONOMY
|
73
It is conceivable that optimal eciency would be attained by grouping cap-
abilities in aggregates that roughly match the cognitive architecture of a human
mind. It might be the case, for example, that a mathematics module must be
tailored to a language module, and that both must be tailored to the executive
module, in order for the three to work together. Cognitive outsourcing would
then be almost entirely unworkable. But in the absence of any compelling reason
for being condent that this is so, we must countenance the possibility that
human-like cognitive architectures are optimal only within the constraints of
human neurology (or not at all). When it becomes possible to build architectures
that could not be implemented well on biological neural networks, new design
space opens up; and the global optima in this extended space need not resemble
familiar types of mentality. Human-like cognitive organizations would then lack
a niche in a competitive post-transition economy or ecosystem.
25
ere might be niches for complexes that are either less complex (such as indi-
vidual modules), more complex (such as vast clusters of modules), or of similar
complexity to human minds but with radically dierent architectures. Would
these complexes have any intrinsic value? Should we welcome a world in which
such alien complexes have replaced human complexes?
e answer may depend on the specic nature of those alien complexes. e present
world has many levels of organization. Some highly complex entities, such as multi-
national corporations and nation states, contain human beings as constituents; yet
we usually assign these high-level complexes only instrumental value. Corporations
and states do not (it is generally assumed) have consciousness, over and above the
consciousness of the people who constitute them: they cannot feel phenomenal pain
or pleasure or experience any qualia. We value them to the extent that they serve
human needs, and when they cease to do so we “kill” them without compunction.
ere are also lower-level entities, and those, too, are usually denied moral status. We
see no harm in erasing an app from a smartphone, and we do not think that a neuro-
surgeon is wronging anyone when she extirpates a malfunctioning module from an
epileptic brain. As for exotically organized complexes of a level similar to that of the
human brain, most of us would perhaps judge them to have moral signicance only
if we thought they had a capacity or potential for conscious experience.
26
We could thus imagine, as an extreme case, a technologically highly advanced
society, containing many complex structures, some of them far more intricate
and intelligent than anything that exists on the planet today—a society which
nevertheless lacks any type of being that is conscious or whose welfare has moral
signicance. In a sense, this would be an uninhabited society. It would be a society
of economic miracles and technological awesomeness, with nobody there to
benet. A Disneyland without children.
Evolution is not necessarily up
e word “evolution” is oen used as a synonym of “progress,” perhaps reecting
a common uncritical image of evolution as a force for good. A misplaced faith
74
|
MULTIPOLAR SCENARIOS
in the inherent benecence of the evolutionary process can get in the way of a
fair evaluation of the desirability of a multipolar outcome in which the future
of intelligent life is determined by competitive dynamics. Any such evaluation
must rest on some (at least implicit) opinion about the probability distribution
of dierent phenotypes turning out to be adaptive in a post-transition digital life
soup. It would be dicult in the best of circumstances to extract a clear and cor-
rect answer from the unavoidable goo of uncertainty that pervades these matters:
more so, if we superadd a layer of Panglossian muck.
A possible source for faith in freewheeling evolution is the apparent upward
directionality exhibited by the evolutionary process in the past. Starting from
rudimentary replicators, evolution produced increasingly “advanced” organ-
isms, including creatures with minds, consciousness, language, and reason. More
recently, cultural and technological processes, which bear some loose similarities
to biological evolution, have enabled humans to develop at an accelerated pace.
On a geological as well as a historical timescale, the big picture seems to show an
overarching trend toward increasing levels of complexity, knowledge, conscious-
ness, and coordinated goal-directed organization: a trend which, not to put too
ne a point on it, one might label “progress.
27
e image of evolution as a process that reliably produces benign eects is dif-
cult to reconcile with the enormous suering that we see in both the human and
the natural world. ose who cherish evolutions achievements may do so more
from an aesthetic than an ethical perspective. Yet the pertinent question is not
what kind of future it would be fascinating to read about in a science ction novel
or to see depicted in a nature documentary, but what kind of future it would be
good to live in: two very dierent matters.
Furthermore, we have no reason to think that whatever progress there has been
was in any way inevitable. Much might have been luck. is objection derives
support from the fact that an observation selection eect lters the evidence we
can have about the success of our own evolutionary development.
28
Suppose that
on 99.9999% of all planets where life emerged it went extinct before developing
to the point where intelligent observers could begin to ponder their origin. What
should we expect to observe if that were the case? Arguably, we should expect
to observe something like what we do in fact observe. e hypothesis that the
odds of intelligent life evolving on a given planet are low does not predict that we
should nd ourselves on a planet where life went extinct at an early stage; rather,
it may predict that we should nd ourselves on a planet where intelligent life
evolved, even if such planets constitute a very small fraction of all planets where
primitive life evolved. Life’s long track record on Earth may therefore oer scant
support to the claim that there was a high chance—let alone anything approach-
ing inevitability— involved in the rise of higher organisms on our planet.
29
irdly, even if present conditions had been idyllic, and even if they could have
been shown to have arisen ineluctably from some generic primordial state, there
would still be no guarantee that the melioristic trend is set to continue into the
indenite future. is holds even if we disregard the possibility of a cataclysmic
LIFE IN AN ALGORITHMIC ECONOMY
|
75
extinction event and indeed even if we assume that evolutionary developments
will continue to produce systems of increasing complexity.
We suggested earlier that machine intelligence workers selected for maximum
productivity would be working extremely hard and that it is unknown how happy
such workers would be. We also raised the possibility that the ttest life forms
within a competitive future digital life soup might not even be conscious. Short of
a complete loss of pleasure, or of consciousness, there could be a wasting away of
other qualities that many would regard as indispensible for a good life. Humans
value music, humor, romance, art, play, dance, conversation, philosophy, litera-
ture, adventure, discovery, food and drink, friendship, parenting, sport, nature,
tradition, and spirituality, among many other things. ere is no guarantee that
any of these would remain adaptive. Perhaps what will maximize tness will
be nothing but nonstop high-intensity drudgery, work of a drab and repetitive
nature, destitute of ludic frisson, aimed only at improving the eighth decimal
place of some economic output measure. e phenotypes selected would then
have lives lacking in the aforesaid qualities, and depending on one’s axiology the
result might strike one as either abhorrent, worthless, or merely impoverished, but
at any rate a far cry from a utopia one would feel worthy of one’s commendation.
It might be wondered how such a bleak picture could be consistent with the
fact that we do now indulge in music, humor, romance, art, etc. If these behaviors
are really so “wasteful,” then how come they have been tolerated and indeed pro-
moted by the evolutionary processes that shaped our species? at modern man
is in an evolutionary disequilibrium does not account for this; for our Pleistocene
forebears, too, engaged in most of these dissipations. Many of the behaviors in
question are not even unique to Homo sapiens. Flamboyant display is found in a
wide variety of contexts, from sexual selection in the animal kingdom to prestige
contests among nation states.
30
Although a full evolutionary explanation for each of these behaviors is beyond
the scope of the present inquiry, we can note that some of them serve functions that
may not be as relevant in a machine intelligence context. Play, for example, which
occurs only in some species and predominantly among juveniles, is mainly a way
for the young animal to learn skills that it will need later in life. When emulations
can be created as adults, already in possession of a mature repertoire of skills, or
when knowledge and techniques acquired by one AI can be directly ported into
another AI, the need for playful behavior might become less widespread.
Many of the other examples of humanistic behaviors may have evolved as hard-
to-fake signals of qualities that are dicult to observe directly, such as bodily or
mental resilience, social status, quality of allies, ability and willingness to prevail
in a ght, or possession of resources. e peacocks tail is the classic instance: only
t peacocks can aord to sprout truly extravagant plumage, and peahens have
evolved to nd it attractive. No less than morphological traits, behavioral traits
too can signal genetic tness or other socially relevant attributes.
31
Given that amboyant display is so common among both humans and other
species, one might consider whether it would not also be part of the repertoire of
76
|
MULTIPOLAR SCENARIOS
technologically more advanced life forms. Even if there were to be no narrowly
instrumental use for playfulness or musicality or even for consciousness in the
future ecology of intelligent information processing, might not these traits none-
theless confer some evolutionary advantage to their possessors by virtue of being
reliable signals of other adaptive qualities?
While the possibility of a pre-established harmony between what is valuable to
us and what would be adaptive in a future digital ecology is hard to rule out, there
are reasons for skepticism. Consider, rst, that many of the costly displays we nd
in nature are linked to sexual selection.
32
Reproduction among technologically
mature life forms, in contrast, may be predominantly or exclusively asexual.
Second, technologically advanced agents might have available new means of
reliably communicating information about themselves, means that do not rely
on costly display. Even today, when professional lenders assess creditworthiness
they tend to rely more on documentary evidence, such as ownership certicates
and bank statements, than on costly displays, such as designer suits and Rolex
watches. In the future, it might be possible to employ auditing rms that verify
through detailed examination of behavioral track records, testing in simulated
environments, or direct inspection of source code, that a client agent possesses
a claimed attribute. Signaling one’s qualities by agreeing to such auditing might
be more ecient than signaling via amboyant display. Such a professionally
mediated signal would still be costly to fake—this being the essential feature that
makes the signal reliable—but it could be much cheaper to transmit when truthful
than it would be to communicate an equivalent signal amboyantly.
ird, not all possible costly displays are intrinsically valuable or socially desir-
able. Many are simply wasteful. e Kwakiutl potlatch ceremonies, a form of status
competition between rival chiefs, involved the public destruction of vast amounts
of accumulated wealth.
33
Record-breaking skyscrapers, megayachts, and moon
rockets may be viewed as contemporary analogs. While activities like music and
humor could plausibly be claimed to enhance the intrinsic quality of human life, it
is doubtful that a similar claim could be sustained with regard to the costly pursuit
of fashion accessories and other consumerist status symbols. Worse, costly dis-
play can be outright harmful, as in macho posturing leading to gang violence or
military bravado. Even if future intelligent life forms would use costly signaling,
therefore, it is an open question whether the signal would be of a valuable sort—
whether it would be like the rapturous melody of a nightingale or instead like the
toads monosyllabic croak (or the incessant barking of a rabid dog).
Post-transition formation of a singleton?
Even if the immediate outcome of the transition to machine intelligence were
multipolar, the possibility would remain of a singleton developing later. Such a
development would continue an apparent long-term trend toward larger scales of
political integration, taking it to its natural conclusion.
34
How might this occur?
POST-TRANSITION FORMATION OF A SINGLETON?
|
77
A second transition
On way in which an initially multipolar outcome could converge into a singleton
post-transition is if there is, aer the initial transition, a second technological
transition big enough and steep enough to give a decisive strategic advantage to
one of the remaining powers: a power which might then seize the opportunity to
establish a singleton. Such a hypothetical second transition might be occasioned
by a breakthrough to a higher level of superintelligence. For instance, if the rst
wave of machine superintelligence is emulation-based, then a second surge might
result when the emulations now doing the research succeed in developing eect-
ive self-improving articial intelligence.
35
(Alternatively, a second transition
might be triggered by a breakthrough in nanotechnology or some other military
or general-purpose technology as yet unenvisaged.)
e pace of development aer the initial transition would be extremely rapid.
Even a short gap between the leading power and its closest competitor could there-
fore plausibly result in a decisive strategic advantage for the leading power during
a second transition. Suppose, for example, that two projects enter the rst transi-
tion only a few days apart, and that the takeo is slow enough that this gap does
not give the leading project a decisive strategic advantage at any point during the
takeo. e two projects both emerge as superintelligent powers, though one of
them remains a few days ahead of the other. But developments are now occurring
on the research timescales characteristic of machine superintelligence—perhaps
thousands or millions of times faster than research conducted on a biological
human timescale. Development of the second-transition technology might there-
fore be completed in days, hours, or minutes. Even though the frontrunner’s lead
is a mere few days, a breakthrough could thus catapult it into a decisive strategic
advantage. Note, however, that if technological diusion (via espionage or other
channels) speeds up as much as technological development, then this eect would
be negated. What would remain relevant would be the steepness of the second
transition, that is, the speed at which it would unfold relative to the general speed
of events in the period aer the rst transition. (In this sense, the faster things
are happening aer the rst transition, the less steep the second transition would
tend to be.)
One might also speculate that a decisive strategic advantage would be more
likely to be actually used to establish a singleton if it arises during a second (or
subsequent) transition. Aer the rst transition, decision makers would either be
superintelligent or have access to advice from a superintelligence, which would
clarify the implications of available strategic options. Furthermore, the situation
aer the rst transition might be one in which a preemptive move against poten-
tial competitors would be less dangerous for the aggressor. If the decision- making
minds aer the rst transition are digital, they could be copied and thereby ren-
dered less vulnerable to a counterattack. Even if a defender had the ability to
kill nine-tenths of the aggressor’s population in a retaliatory strike, this would
scarcely oer much deterrence if the deceased could be immediately resurrected
78
|
MULTIPOLAR SCENARIOS
from redundant backups. Devastation of infrastructure (which can be rebuilt)
might also be tolerable to digital minds with eectively unlimited lifespans, who
might be planning to maximize their resources and inuence on a cosmological
timescale.
Superorganisms and scale economies
e size of coordinated human aggregates, such as rms or nations, is inuenced
by various parameters—technological, military, nancial, and cultural—that
can vary from one historical epoch to another. A machine intelligence revolution
would entail profound changes in many these parameters. Perhaps these changes
would facilitate the rise of a singleton. Although we cannot, without looking in
detail at what these prospective changes are, exclude the opposite possibility—
that the changes would facilitate fragmentation rather than unication—we can
nevertheless note that the increased variance or uncertainty that we confront
here may itself be a ground for giving greater credence to the potential emergence
of a singleton than we would otherwise do. A machine intelligence revolution
might, so to speak, stir things up—might reshue the deck to make possible
geopolitical realignments that seemed perhaps otherwise not to have been in the
cards.
A comprehensive analysis of all the factors that may inuence the scale of
political integration would take us far beyond the scope of this book: a review of
the relevant political science and economics literature could itself easily ll an
entire volume. We must conne ourselves to making brief allusion to a couple of
factors, aspects of the digitization of agents that may make it easier to centralize
control.
Carl Shulman has argued that in a population of emulations, selection pres-
sures would favor the emergence of “superorganisms,” groups of emulations
ready to sacrice themselves for the good of their clan.
36
Superorganisms would
be spared the agency problems that beset organizations whose members pursue
their own self-interest. Like the cells in our bodies, or the individual animals in
a colony of eusocial insects, emulations that were wholly altruistic toward their
copy-siblings would cooperate with one another even in the absence of elaborate
incentive schemes.
Superorganisms would have a particularly strong advantage if nonconsensual
deletion (or indenite suspension) of individual emulations is disallowed. Firms
or countries that employ emulations insisting on self-preservation would be sad-
dled with an unending commitment to pay upkeep for obsolete or redundant
workers. In contrast, organizations whose emulations willingly deleted them-
selves when their services were no longer required could more easily adapt to uc-
tuations in demand; and they could experiment freely, proliferating variations of
their workers and retaining only the most productive.
If involuntary deletion is not disallowed, then the comparative advantage of
eusocial emulations is reduced, though perhaps not eliminated. Employers of
POST-TRANSITION FORMATION OF A SINGLETON?
|
79
cooperative self-sacricers might still reap eciency gains from reduced agency
problems throughout the organization, including being spared the trouble of
having to defeat whatever resistance emulations could put up against their own
deletion. In general, the productivity gains of having workers willing to sacrice
their individual lives for the common weal are a special case of the benets an
organization can derive from having members who are fanatically devoted to it.
Such members would not only leap into the grave for the organization, and work
long hours for little pay: they would also shun oce politics and try consistently
to act in what they took to be the organizations best interest, reducing the need
for supervision and bureaucratic constraints.
If the only way to achieve such dedication were by restricting membership to
copy-siblings (so that all emulations in a particular superorganism were stamped
out from the same template), then superorganisms would suer some disadvan-
tage in being able to draw only from a range of skills narrower than that of rival
organizations, a disadvantage which might or might not be large enough to out-
weigh the advantages of avoiding internal agency problems.
37
is disadvantage
would be greatly alleviated if a superorganism could at least contain members
with dierent training. Even if all its members were derived from a single ur-
template, its workforce could then still contribute a diversity of skills. Starting
with a polymathically talented emulation ur-template, lineages could be branched
o into dierent training programs, one copy learning accounting, another elec-
trical engineering, and so forth. is would produce a membership with diverse
skills though not of diverse talents. (Maximum diversity might require that more
than one ur-template be used.)
e essential property of a superorganism is not that it consists of copies of a
single progenitor but that all the individual agents within it are fully committed
to a common goal. e ability to create a superorganism can thus be viewed as
requiring a partial solution to the control problem. Whereas a completely general
solution to the control problem would enable somebody to create an agent with
any arbitrary nal goal, the partial solution needed for the creation of a superor-
ganism requires merely the ability to fashion multiple agents with the same nal
goal (for some nontrivial but not necessarily arbitrary nal goal).
38
e main consideration put forward in this subsection is thus not really lim-
ited to monoclonal emulation groups, but can be stated more generally in a way
that makes clear that it applies to a wide range of multipolar machine intelligence
scenarios. It is that certain types of advances in motivation selection techniques,
which may become feasible when the actors are digital, may help overcome some
of the ineciencies that currently hamper large human organizations and that
counterbalance economies of scale. With these limits lied, organizations—be
they rms, nations, or other economic or political entities—could increase in
size. is is one factor that could facilitate the emergence of a post-transition
singleton.
One area in which superorganisms (or other digital agents with partially
selected motivations) might excel is coercion. A state might use motivation
80
|
MULTIPOLAR SCENARIOS
selection methods to ensure that its police, military, intelligence service, and civil
administration are uniformly loyal. As Shulman notes,
Saved states [of some loyal emulation that has been carefully prepared and verified]
could be copied billions of times to sta an ideologically uniform military, bureaucracy,
and police force. After a short period of work, each copy would be replaced by a fresh
copy of the same saved state, preventing ideological drift. Within a given jurisdiction, this
capability could allow incredibly detailed observation and regulation: there might be one
such copy for every other resident. This could be used to prohibit the development of
weapons of mass destruction, to enforce regulations on brain emulation experimentation
or reproduction, to enforce a liberal democratic constitution, or to create an appalling and
perman ent totalitarianism
39
e rst-order eect of such a capability would seem to be to consolidate power,
and possibly to concentrate it in fewer hands.
Unification by treaty
ere may be large potential gains to be had from international collaboration
in a post-transition multipolar world. Wars and arms races could be avoided.
Astrophysical resources could be colonized and harvested at a globally optimum
pace. e development of more advanced forms of machine intelligence could
be coordinated to avoid a rush and to allow new designs to be thoroughly vet-
ted. Other developments that might pose existential risks could be postponed.
And uniform regulations could be enforced globally, including provisions for
a guaranteed standard of living (which would require some form of popula-
tion control) and for preventing exploitation and abuse of emulations and other
digital and biological minds. Furthermore, agents with resource-satiable prefer-
ences (more on this in Chapter 13) would prefer a sharing agreement that would
guarantee them a certain slice of the future to a winner-takes-all struggle in
which they would risk getting nothing.
e presence of big potential gains from collaboration, however, does not imply
that collaboration will actually be achieved. In the world today, many great boons
could be obtained via better global coordination—reductions of military expend-
itures, wars, overshing, trade barriers, and atmospheric pollution, among oth-
ers. Yet these plump fruits are le to spoil on the branch. Why is that? What stops
a fully cooperative outcome that would maximize the common good?
One obstacle is the diculty of ensuring compliance with any treaty that might
be agreed, including monitoring and enforcement costs. Two nuclear rivals might
each be better o if they both relinquished their atom bombs; yet even if they
could reach an in-principle agreement to do so, disarmament could neverthe-
less prove elusive because of their mutual fear that the other party might cheat.
Allaying this fear would require setting up a verication mechanism. ere
may have to be inspectors to oversee the destruction of existing stockpiles, and
then to monitor nuclear reactors and other facilities, and to gather technical and
POST-TRANSITION FORMATION OF A SINGLETON?
|
8
human intelligence, in order to ensure that the weapons program is not recon-
stituted. One cost is paying for these inspectors. Another cost is the risk that the
inspectors will spy and make o with commercial or military secrets. Perhaps
most signicantly, each party might fear that the other will preserve a clandes-
tine nuclear capability. Many a potentially benecial deal never comes o because
compliance would be too dicult to verify.
If new inspection technologies that reduced monitoring costs became available,
one would expect this to result in increased cooperation. Whether monitoring
costs would on net be reduced in the post-transition era, however, is not entirely
clear. While there would certainly be many powerful new inspection techniques,
there would also be new means of concealment. In particular, an increasing por-
tion of the activities one might want to regulate would be taking place in cyber-
space, out of reach of physical surveillance. For example, digital minds working
on designing a new nanotech weapons system or a new generation of articial
intelligence may do so without leaving much of a physical footprint. Digital foren-
sics may fail to penetrate all the layers of concealment and encryption in which a
treaty-violator may cloak its illicit activities.
Reliable lie detection, if it could be developed, would be an extremely useful
tool for monitoring compliance.
40
An inspection protocol could include provi-
sions for interviewing key ocials, to verify that they are intent on implementing
all the provisions of the treaty and that they know of no violations despite making
strong eorts to nd out.
A decision maker planning to cheat might defeat such a lie-detection-based
verication scheme by rst issuing orders to subordinates to undertake the illicit
activity and to conceal the activity even from the decision maker herself, and then
subjecting herself to some procedure that erases her memory of having engaged
in these machinations. Suitably targeted memory-erasure operations might well
be feasible in biological brains with more advanced neurotechnology. It might be
even easier in machine intelligences (depending on their architecture).
States could seek to overcome this problem by committing themselves to an
ongoing monitoring scheme that regularly tests key ocials with a lie detector
to check whether they harbor any intent to subvert or circumvent any treaty to
which the state has entered or may enter in the future. Such a commitment could
be viewed as a kind of meta-treaty, which would facilitate the verication of other
treaties; but states might commit themselves to it unilaterally to gain the benet of
being regarded as a trustworthy negotiation partner. However, this commitment
or meta-treaty would face the same problem of subversion through a delegate-
and-forget ploy. Ideally, the meta-treaty would be put into eect before any party
had an opportunity to make the internal arrangements necessary to subvert its
implementation. Once villainy has had an unguarded moment to sow its mines of
deception, trust can never set foot there again.
In some cases, the mere ability to detect treaty violations is sucient to estab-
lish the condence needed for a deal. In other cases, however, there is a need for
some mechanism to enforce compliance or mete out punishment if a violation
82
|
MULTIPOLAR SCENARIOS
should occur. e need for an enforcement mechanism may arise if the threat of
the wronged party withdrawing from the treaty is not enough to deter violations,
for instance if the treaty-violator would gain such an advantage that he would not
subsequently care how the other party responds.
If highly eective motivation selection methods are available, this enforcement
problem could be solved by empowering an independent agency with sucient
police or military strength to enforce the treaty even against the opposition of one
or several of its signatories. is solution requires that the enforcement agency
can be trusted. But with suciently good motivation selection techniques, the
requisite condence might be achieved by having all the parties to the treaty
jointly oversee the design of the enforcement agency.
Handing over power to an external enforcement agency raises many of the
same issues that we confronted earlier in our discussions of a unipolar outcome
(one in which a singleton arises prior to or during the initial machine intelligence
revolution). In order to be able to enforce treaties concerning the vital security
interests of rival states, the external enforcement agency would in eect need
to constitute a singleton: a global superintelligent Leviathan. One dierence,
however, is that we are now considering a post-transition situation, in which the
agents that would have to create this Leviathan would have greater competence
than we humans currently do. ese Leviathan-creators may themselves already
be superintelligent. is would greatly improve the odds that they could solve the
control problem and design an enforcement agency that would serve the interests
of all the parties that have a say in its construction.
Aside from the costs of monitoring and enforcing compliance, are there any
other obstacles to global coordination? Perhaps the major remaining issue is
what we can refer to as bargaining costs.
41
Even when there is a possible bargain
that would benet everybody involved, it sometimes does not get o the ground
because the parties fail to agree on how to divide the spoils. For example, if two
persons could make a deal that would net them a dollar in prot, but each party
feels she deserves sixty cents and refuses to settle for less, the deal will not happen
and the potential gain will be forfeited. In general, negotiations can be dicult or
protracted, or remain altogether barren, because of strategic bargaining choices
made by some of the parties.
In real life, human beings frequently succeed in reaching agreements despite
the possibility for strategic bargaining (though oen not without considerable
expenditure of time and patience). It is conceivable, however, that strategic bar-
gaining problems would have a dierent dynamic in the post-transition era. An
AI negotiator might more consistently adhere to some particular formal con-
ception of rationality, possibly with novel or unanticipated consequences when
matched with other AI negotiators. An AI might also have available to it moves
in the bargaining game that are either unavailable to humans or very much more
dicult for humans to execute, including the ability to precommit to a policy or
a course of action. While humans (and human-run institutions) are occasionally
able to precommit—with imperfect degrees of credibility and specicity—some
POST-TRANSITION FORMATION OF A SINGLETON?
|
83
types of machine intelligence might be able to make arbitrary unbreakable pre-
commitments and to allow negotiating partners to conrm that such a precom-
mitment has been made.
42
e availability of powerful precommitment techniques could profoundly alter
the nature of negotiations, potentially giving an immense edge to an agent that
has a rst-mover advantage. If a particular agent’s participation is necessary for
the realization of some prospective gains from cooperation, and if that agent is
able to make the rst move, it would be in a position to dictate the division of the
spoils by precommitting not to accept any deal that gives it less than, say, 99%
of the surplus value. Other agents would then be faced with the choice of either
getting nothing (by rejecting the unfair proposal) or getting 1% of the value (by
caving in). If the rst-moving agent’s precommitment is publicly veriable, its
negotiating partners could be sure that these are their only two options.
To avoid being exploited in this manner, agents might precommit to refuse
blackmail and to decline all unfair oers. Once such a precommitment has been
made (and successfully publicized), other agents would not nd it in their inter-
est to make threats or to precommit themselves to only accepting deals tilted in
their own favor, because they would know that threats would fail and that unfair
proposals would be rejected. But this just demonstrates again that the advantage
is with the rst-mover. e agent who moves rst can choose whether to parlay its
position of strength only to deter others from taking unfair advantage, or to make
a grab for the lions share of future spoils.
Best situated of all, it might seem, would be the agent who starts out with
a temperament or a value system that makes him impervious to extortion or
indeed to any oer of a deal in which his participation is indispensable but he is
not getting almost all of the gains. Some humans seem already to possess per-
sonality traits corresponding to various aspects of an uncompromising spirit.
43
A high-strung disposition, however, could backre should it turn out that there
are other agents around who feel entitled to more than their fair share and are
committed to not backing down. e unstoppable force would then encounter
the unmovable object, resulting in a failure to reach agreement (or worse: total
war). e meek and the akratic would at least get something, albeit less than their
fair share.
What kind of game-theoretic equilibrium would be reached in such a post-
transition bargaining game is not immediately obvious. Agents might choose
more complicated strategies than the ones considered here. One hopes that an
equilibrium would be reached centered on some fairness norm that would serve
as a Schelling point—a salient feature in a big outcome space which, because of
shared expectations, becomes a likely coordination point in an otherwise under-
determined coordination game. Such an equilibrium might be bolstered by some
of our evolved dispositions and cultural programming: a common preference
for fairness could, assuming we succeed in transferring our values into the post-
transition era, bias expectations and strategies in ways that lead to an attractive
equilibrium.
44
84
|
MULTIPOLAR SCENARIOS
In any case, the upshot is that with the possibility of strong and exible forms
of precommitment, outcomes of negotiations might take on an unfamiliar guise.
Even if the post-transition era started out multipolar, it might be that a single-
ton would arise almost immediately as a consequence of a negotiated treaty that
resolves all important global coordination problems. Some transaction costs,
perhaps including monitoring and enforcement costs, might plummet with the
new technological capabilities available to advanced machine intelligences. Other
costs, in particular costs related to strategic bargaining, might remain signi-
cant. But however strategic bargaining aects the nature of the agreement that
is reached, there is no clear reason why it would long delay the reaching of some
agreement if an agreement were ever to be reached. If no agreement is reached,
then some form of ghting might take place; and either one faction might win,
and form a singleton around the winning coalition, or the result might be an
interminable conict, in which case a singleton may never form and the overall
outcome may fall terribly short of what could and should have been achieved if
humanity and its descendants had acted in a more coordinated and cooperative
fashion.
* * *
We have seen that multipolarity, even if it could be achieved in a stable form,
would not guarantee an attractive outcome. e original principal–agent prob-
lem remains unsolved, and burying it under a new set of problems related to post-
transition global coordination failures may only make the situation worse. Let us
therefore return to the question of how we could safely keep a single superintel-
ligent AI.
THE VALUE-LOADING PROBLEM
|
85
CHAPTER 12
Acquiring values
C
apability control is, at best, a temporary and auxiliary measure. Unless
the plan is to keep superintelligence bottled up forever, it will be neces-
sary to master motivation selection. But just how could we get some
value into an articial agent, so as to make it pursue that value as its nal goal?
While the agent is unintelligent, it might lack the capability to understand or
even represent any humanly meaningful value. Yet if we delay the procedure
until the agent is superintelligent, it may be able to resist our attempt to med-
dle with its motivation system—and, as we showed in Chapter 7, it would have
convergent instrumental reasons to do so. is value-loading problem is tough,
but must be confronted.
The value-loading problem
It is impossible to enumerate all possible situations a superintelligence might nd
itself in and to specify for each what action it should take. Similarly, it is impos-
sible to create a list of all possible worlds and assign each of them a value. In any
realm signicantly more complicated than a game of tic-tac-toe, there are far
too many possible states (and state-histories) for exhaustive enumeration to be
feasible. A motivation system, therefore, cannot be specied as a comprehensive
lookup table. It must instead be expressed more abstractly, as a formula or rule
that allows the agent to decide what to do in any given situation.
One formal way of specifying such a decision rule is via a utility function. A
utility function (as we recall from Chapter 1) assigns value to each outcome that
might obtain, or more generally to each “possible world.” Given a utility function,
one can dene an agent that maximizes expected utility. Such an agent selects at
each time the action that has the highest expected utility. (e expected utility
is calculated by weighting the utility of each possible world with the subjective
probability of that world being the actual world conditional on a particular action
being taken.) In reality, the possible outcomes are too numerous for the expected
86
|
ACQUIRING VALUES
utility of an action to be calculated exactly. Nevertheless, the decision rule and the
utility function together determine a normative ideal—an optimality notion
that an agent might be designed to approximate; and the approximation might
get closer as the agent gets more intelligent.
1
Creating a machine that can compute
a good approximation of the expected utility of the actions available to it is an
AI-complete problem.
2
is chapter addresses another problem, a problem that
remains even if the problem of making machines intelligent is solved.
We can use this framework of a utility-maximizing agent to consider the
predicament of a future seed-AI programmer who intends to solve the control
problem by endowing the AI with a nal goal that corresponds to some plausi-
ble human notion of a worthwhile outcome. e programmer has some particu-
lar human value in mind that he would like the AI to promote. To be concrete,
let us say that it is happiness. (Similar issues would arise if we the programmer
were interested in justice, freedom, glory, human rights, democracy, ecological
balance, or self-development.) In terms of the expected utility framework, the
programmer is thus looking for a utility function that assigns utility to possible
worlds in proportion to the amount of happiness they contain. But how could he
express such a utility function in computer code? Computer languages do not
contain terms such as “happiness” as primitives. If such a term is to be used, it
must rst be dened. It is not enough to dene it in terms of other high-level
human conceptshappiness is enjoyment of the potentialities inherent in our
human nature” or some such philosophical paraphrase. e denition must bot-
tom out in terms that appear in the AI’s programming language, and ultimately
in primitives such as mathematical operators and addresses pointing to the con-
tents of individual memory registers. When one considers the problem from this
perspective, one can begin to appreciate the diculty of the programmer’s task.
Identifying and codifying our own nal goals is dicult because human goal
representations are complex. Because the complexity is largely transparent to us,
however, we oen fail to appreciate that it is there. We can compare the case to
visual perception. Vision, likewise, might seem like a simple thing, because we do
it eortlessly.
3
We only need to open our eyes, so it seems, and a rich, meaningful,
eidetic, three-dimensional view of the surrounding environment comes ooding
into our minds. is intuitive understanding of vision is like a duke’s under-
standing of his patriarchal household: as far as he is concerned, things simply
appear at their appropriate times and places, while the mechanism that produces
those manifestations are hidden from view. Yet accomplishing even the sim-
plest visual task—nding the pepper jar in the kitchen—requires a tremendous
amount of computational work. From a noisy time series of two- dimensional
patterns of nerve rings, originating in the retina and conveyed to the brain
via the optic nerve, the visual cortex must work backwards to reconstruct an
interpreted three-dimensional representation of external space. A sizeable por-
tion of our precious one square meter of cortical real estate is zoned for process-
ing visual information, and as you are reading this book, billions of neurons
are working ceaselessly to accomplish this task (like so many seamstresses, bent
EVOLUTIONARY SELECTION
|
87
over their sewing machines in a sweatshop, sewing and re-sewing a giant quilt
many times a second). In like manner, our seemingly simple values and wishes
in fact contain immense complexity.
4
How could our programmer transfer this
complexity into a utility function?
One approach would be to try to directly code a complete representation of
whatever goal we have that we want the AI to pursue; in other words, to write out
an explicit utility function. is approach might work if we had extraordinarily
simple goals, for example if we wanted to calculate the digits of pi—that is, if the
only thing we wanted was for the AI to calculate the digits of pi and we were indif-
ferent to any other consequence that would result from the pursuit of this goal—
recall our earlier discussion of the failure mode of infrastructure profusion. is
explicit coding approach might also have some promise in the use of domesticity
motivation selection methods. But if one seeks to promote or protect any plausi-
ble human value, and one is building a system intended to become a superintel-
ligent sovereign, then explicitly coding the requisite complete goal representation
appears to be hopelessly out of reach.
5
If we cannot transfer human values into an AI by typing out full-blown repre-
sentations in computer code, what else might we try? is chapter discusses sev-
eral alternative paths. Some of these may look plausible at rst sight—but much
less so upon closer examination. Future explorations should focus on those paths
that remain open.
Solving the value-loading problem is a research challenge worthy of some of
the next generations best mathematical talent. We cannot postpone confronting
this problem until the AI has developed enough reason to easily understand our
intentions. As we saw in the section on convergent instrumental reasons, a generic
system will resist attempts to alter its nal values. If an agent is not already funda-
mentally friendly by the time it gains the ability to reect on its own agency, it will
not take kindly to a belated attempt at brainwashing or a plot to replace it with a
dierent agent that better loves its neighbor.
Evolutionary selection
Evolution has produced an organism with human values at least once. is fact
might encourage the belief that evolutionary methods are the way to solve the
value-loading problem. ere are, however, severe obstacles to achieving safety
along this path. We have already pointed to these obstacles at the end of Chapter
10 when we discussed how powerful search processes can be dangerous.
Evolution can be viewed as a particular class of search algorithms that involve
the alternation of two steps, one expanding a population of solution candidates
by generating new candidates according to some relatively simple stochastic rule
(such as random mutation or sexual recombination), the other contracting the
population by pruning candidates that score poorly when tested by an evalua-
tion function. As with many other types of powerful search, there is the risk that
88
|
ACQUIRING VALUES
the process will nd a solution that satises the formally specied search criteria
but not our implicit expectations. (is would hold whether one seeks to evolve
a digital mind that has the same goals and values as a typical human being, or
instead a mind that is, for instance, perfectly moral or perfectly obedient.) e
risk would be avoided if we could specify a formal search criterion that accurately
represented all dimensions of our goals, rather than just one aspect of what we
think we desire. But this is precisely the value-loading problem, and it would of
course beg the question in this context to assume that problem solved.
ere is a further problem:
The total amount of suering per year in the natural world is beyond all decent contem-
plation. During the minute that it takes me to compose this sentence, thousands of animals
are being eaten alive, others are running for their lives, whimpering with fear, others are
being slowly devoured from within by rasping parasites, thousands of all kinds are dying of
starvation, thirst and disease.
6
Even just within our species, 150,000 persons are destroyed each day while count-
less more suer an appalling array of torments and deprivations.
7
Nature might
be a great experimentalist, but one who would never pass muster with an ethics
review board—contravening the Helsinki Declaration and every norm of moral
decency, le, right, and center. It is important that we not gratuitously replicate
such horrors in silico. Mind crime seems especially dicult to avoid when evolu-
tionary methods are used to produce human-like intelligence, at least if the pro-
cess is meant to look anything like actual biological evolution.
8
Reinforcement learning
Reinforcement learning is an area of machine learning that studies techniques
whereby agents can learn to maximize some notion of cumulative reward.
By constructing an environment in which desired performance is rewarded, a
reinforcement- learning agent can be made to learn to solve a wide class of prob-
lems (even in the absence of detailed instruction or feedback from the program-
mers, aside from the reward signal). Oen, the learning algorithm involves the
gradual construction of some kind of evaluation function, which assigns values
to states, state–action pairs, or policies. (For instance, a program can learn to
play backgammon by using reinforcement learning to incrementally improve its
evaluation of possible board positions.) e evaluation function, which is contin-
uously updated in light of experience, could be regarded as incorporating a form
of learning about value. However, what is being learned is not new nal values
but increasingly accurate estimates of the instrumental values of reaching par-
ticular states (or of taking particular actions in particular states, or of following
particular policies). Insofar as a reinforcement-learning agent can be described as
having a nal goal, that goal remains constant: to maximize future reward. And
reward consists of specially designated percepts received from the environment.
ASSOCIATIVE VALUE ACCRETION
|
89
erefore, the wireheading syndrome remains a likely outcome in any reinforce-
ment agent that develops a world model sophisticated enough to suggest this
alternative way of maximizing reward.
9
ese remarks do not imply that reinforcement-learning methods could never
be used in a safe seed AI, only that they would have to be subordinated to a moti-
vation system that is not itself organized around the principle of reward maximi-
zation. at, however, would require that a solution to the value-loading problem
had been found by some other means than reinforcement learning.
Associative value accretion
Now one might wonder: if the value-loading problem is so tricky, how do we our-
selves manage to acquire our values?
One possible (oversimplied) model might look something like this. We begin
life with some relatively simple starting preferences (e.g. an aversion to noxious
stimuli) together with a set of dispositions to acquire additional preferences in
response to various possible experiences (e.g. we might be disposed to form a
preference for objects and behaviors that we nd to be valued and rewarded in
our culture). Both the simple starting preferences and the dispositions are innate,
having been shaped by natural and sexual selection over evolutionary timescales.
Yet which preferences we end up with as adults depends on life events. Much of
the information content in our nal values is thus acquired from our experiences
rather than preloaded in our genomes.
For example, many of us love another person and thus place great nal value on
his or her well-being. What is required to represent such a value? Many elements
are involved, but consider just two: a representation of “person” and a representa-
tion of “well-being.” ese concepts are not directly coded in our DNA. Rather,
the DNA contains instructions for building a brain, which, when placed in a typ-
ical human environment, will over the course of several years develop a world
model that includes concepts of persons and of well-being. Once formed, these
concepts can be used to represent certain meaningful values. But some mech-
anism needs to be innately present that leads to values being formed around these
concepts, rather than around other acquired concepts (like that of a owerpot or
a corkscrew).
e details of how this mechanism works are not well understood. In humans,
the mechanism is probably complex and multifarious. It is easier to under-
stand the phenomenon if we consider it in a more rudimentary form, such as
lial imprinting in nidifugous birds, where the newly hatched chick acquires a
desire for physical proximity to an object that presents a suitable moving stimulus
within the rst day aer hatching. Which particular object the chick desires to
be near depends on its experience; only the general disposition to imprint in this
way is genetically determined. Analogously, Harry might place a nal value on
Sally’s well-being; but had the twain never met, he might have fallen in love with
90
|
ACQUIRING VALUES
somebody else instead, and his nal values would have been dierent. e ability
of our genes to code for the construction of a goal-acquiring mechanism explains
how we come to have nal goals of great informational complexity, greater than
could be contained in the genome itself.
We may consequently consider whether we might build the motivation sys-
tem for an articial intelligence on the same principle. at is, instead of specify-
ing complex values directly, could we specify some mechanism that leads to the
acquisition of those values when the AI interacts with a suitable environment?
Mimicking the value-accretion process that takes place in humans seems dif-
cult. e relevant genetic mechanism in humans is the product of eons of work
by evolution, work that might be hard to recapitulate. Moreover, the mechanism is
presumably closely tailored to the human neurocognitive architecture and there-
fore not applicable in machine intelligences other than whole brain emulations.
And if whole brain emulations of sucient delity were available, it would seem
easier to start with an adult brain that comes with full representations of some
human values preloaded.
10
Seeking to implement a process of value accretion closely mimicking that
of human biology therefore seems an unpromising line of attack on the value-
loading problem. But perhaps we might design a more unabashedly articial sub-
stitute mechanism that would lead an AI to import high-delity representations
of relevant complex values into its goal system? For this to succeed, it may not be
necessary to give the AI exactly the same evaluative dispositions as a biological
human. at may not even be desirable as an aim—human nature, aer all, is
awed and all too oen reveals a proclivity to evil which would be intolerable in
any system poised to attain a decisive strategic advantage. Better, perhaps, to aim
for a motivation system that departs from the human norm in systematic ways,
such as by having a more robust tendency to acquire nal goals that are altru-
istic, compassionate, or high-minded in ways we would recognize as reecting
exceptionally good character if they were present in a human person. To count
as improvements, however, such deviations from the human norm would have to
be pointed in very particular directions rather than at random; and they would
continue to presuppose the existence of a largely undisturbed anthropocentric
frame of reference to provide humanly meaningful evaluative generalizations
(so as to avoid the kind of perverse instantiation of supercially plausible goal
descriptions that we examined in Chapter 8). It is an open question whether this
is feasible.
One further issue with associative value accretion is that the AI might dis-
able the accretion mechanism. As we saw in Chapter 7, goal-system integrity is
a convergent instrumental value. When the AI reaches a certain stage of cogni-
tive development it may start to regard the continued operation of the accretion
mechanism as a corrupting inuence.
11
is is not necessarily a bad thing, but
care would have to be taken to make the sealing-up of the goal system occur at the
right moment, aer the appropriate values have been accreted but before they have
been overwritten by additional unintended accretions.
MOTIVATIONAL SCAFFOLDING
|
9
Motivational scaolding
Another approach to the value-loading problem is what we may refer to as moti-
vational scaolding. It involves giving the seed AI an interim goal system, with
relatively simple nal goals that we can represent by means of explicit coding
or some other feasible method. Once the AI has developed more sophisticated
representational faculties, we replace this interim scaold goal system with one
that has dierent nal goals. is successor goal system then governs the AI as it
develops into a full-blown superintelligence.
Because the scaold goals are not just instrumental but nal goals for the AI,
the AI might be expected to resist having them replaced (goal-content integrity
being a convergent instrumental value). is creates a hazard. If the AI succeeds
in thwarting the replacement of its scaold goals, the method fails.
To avoid this failure mode, precautions are necessary. For example, capability
control methods could be applied to limit the AI’s powers until the mature moti-
vation system has been installed. In particular, one could try to stunt its cognitive
development at a level that is safe but that allows it to represent the values that we
want to include in its ultimate goals. To do this, one might try to dierentially
stunt certain types of intellectual abilities, such as those required for strategizing
and Machiavellian scheming, while allowing (apparently) more innocuous abil-
ities to develop to a somewhat higher level.
One could also try to use motivation selection methods to induce a more col-
laborative relationship between the seed AI and the programmer team. For exam-
ple, one might include in the scaold motivation system the goal of welcoming
online guidance from the programmers, including allowing them to replace any
of the AI’s current goals.
12
Other scaold goals might include being transparent to
the programmers about its values and strategies, and developing an architecture
that is easy for the programmers to understand and that facilitates the later imple-
mentation of a humanly meaningful nal goal, as well as domesticity motivations
(such as limiting the use of computational resources).
One could even imagine endowing the seed AI with the sole nal goal of replac-
ing itself with a dierent nal goal, one which may have been only implicitly or
indirectly specied by the programmers. Some of the issues raised by the use of
such a “self-replacing” scaold goal also arise in the context of the value learning
approach, which is discussed in the next subsection. Some further issues will be
discussed in Chapter 13.
e motivational scaolding approach is not without downsides. One is that
it carries the risk that the AI could become too powerful while it is still running
on its interim goal system. It may then thwart the human programmers’ eorts
to install the ultimate goal system (either by forceful resistance or by quiet sub-
version). e old nal goals may then remain in charge as the seed AI develops
into a full-blown superintelligence. Another downside is that installing the ulti-
mately intended goals in a human-level AI is not necessarily that much easier than
doing so in a more primitive AI. A human-level AI is more complex and might
92
|
ACQUIRING VALUES
have developed an architecture that is opaque and dicult to alter. A seed AI, by
contrast, is like a tabula rasa on which the programmers can inscribe whatever
structures they deem helpful. is downside could be ipped into an upside if
one succeeded in giving the seed AI scaold goals that made it want to develop
an architecture helpful to the programmers in their later eorts to install the
ultimate nal values. However, it is unclear how easy it would be to give a seed
AI scaold goals with this property, and it is also unclear how even an ideally
motivated seed AI would be capable of doing a much better job than the human
programming team at developing a good architecture.
Value learning
We come now to an important but subtle approach to the value-loading problem.
It involves using the AI’s intelligence to learn the values we want it to pursue. To
do this, we must provide a criterion for the AI that at least implicitly picks out
some suitable set of values. We could then build the AI to act according to its best
estimates of these implicitly dened values. It would continually rene its esti-
mates as it learns more about the world and gradually unpacks the implications of
the value-determining criterion.
In contrast to the scaolding approach, which gives the AI an interim scaold
goal and later replaces it with a dierent nal goal, the value learning approach
retains an unchanging nal goal throughout the AI’s developmental and opera-
tional phases. Learning does not change the goal. It changes only the AI’s beliefs
about the goal.
e AI thus must be endowed with a criterion that it can use to determine
which percepts constitute evidence in favor of some hypothesis about what the
ultimate goal is, and which percepts constitute evidence against. Specifying a
suitable criterion could be dicult. Part of the diculty, however, pertains to the
problem of creating articial general intelligence in the rst place, which requires
a powerful learning mechanism that can discover the structure of the environ-
ment from limited sensory inputs. at problem we can set aside here. But even
modulo a solution to how to create superintelligent AI, there remain the dicul-
ties that arise specically from the value-loading problem. With the value learn-
ing approach, these take the form of needing to dene a criterion that connects
perceptual bitstrings to hypotheses about values.
Before delving into the details of how value learning could be implemented, it
might be helpful to illustrate the general idea with an example. Suppose we write
down a description of a set of values on a piece of paper. We fold the paper and put
it in a sealed envelope. We then create an agent with human-level general intelli-
gence, and give it the following nal goal: “Maximize the realization of the values
described in the envelope.” What will this agent do?
e agent does not initially know what is written in the envelope. But it can form
hypotheses, and it can assign those hypotheses probabilities based on their priors
VALUE LEARNING
|
93
and any available empirical data. For instance, the agent might have encountered
other examples of human-authored texts, or it might have observed some general
patterns of human behavior. is would enable it to make guesses. One does not
need a degree in psychology to predict that the note is more likely to describe
a value such as “minimize injustice and unnecessary suering” or “maximize
returns to shareholders” than a value such as “cover all lakes with plastic shopping
bags.”
When the agent makes a decision, it seeks to take actions that would be eec-
tive at realizing the values it believes are most likely to be described in the let-
ter. Importantly, the agent would see a high instrumental value in learning more
about what the letter says. e reason is that for almost any nal value that might
be described in the letter, that value is more likely to be realized if the agent nds
out what it is, since the agent will then pursue that value more eectively. e
agent would also discover the convergent instrumental reasons described in
Chapter 7—goal system integrity, cognitive enhancement, resource acquisition,
and so forth. Yet, assuming that the agent assigns a suciently high probability
to the values described in the letter involving human welfare, it would not pursue
these instrumental values by immediately turning the planet into computronium
and thereby exterminating the human species, because doing so would risk per-
manently destroying its ability to realize its nal value.
We can liken this kind of agent to a barge attached to several tugboats that
pull in dierent directions. Each tugboat corresponds to a hypothesis about the
agent’s nal value. e engine power of each tugboat corresponds to the associ-
ated hypothesis’s probability, and thus changes as new evidence comes in, pro-
ducing adjustments in the barge’s direction of motion. e resultant force should
move the barge along a trajectory that facilitates learning about the (implicit) nal
value while avoiding the shoals of irreversible destruction; and later, when the
open sea of more denite knowledge of the nal value is reached, the one tugboat
that still exerts signicant force will pull the barge toward the realization of the
discovered value along the straightest or most propitious route.
e envelope and barge metaphors illustrate the principle underlying the value
learning approach, but they pass over a number of critical technical issues. ey
come into clearer focus once we start to develop the approach within a formal
framework (see Box 10).
One outstanding issue is how to endow the AI with a goal such as “Maximize
the realization of the values described in the envelope.” (In the terminology of
Box 10, how to dene the value criterion .) To do this, it is necessary to identify
the place where the values are described. In our example, this requires making a
successful reference to the letter in the envelope. ough this might seem trivial,
it is not without pitfalls. To mention just one: it is critical that the reference be
not simply to a particular external physical object but to an object at a particular
time. Otherwise the AI may determine that the best way to attain its goal is by
overwriting the original value description with one that provides an easier target
(such as the value that for every integer there be a larger integer). is done, the AI
94
|
ACQUIRING VALUES
Box 0 Formalizing value learning
Introducing some formal notation can help us see some things more clearly.
However, readers who dislike formalism can skip this part.
Consider a simplified framework in which an agent interacts with its environ-
ment in a finite number of discrete cycles.
3
In cycle k, the agent performs action
y
k
, and then receives the percept x
k
. The interaction history of an agent with
lifespan m is a string y
x
y
2
x
2
. . . y
m
x
m
(which we can abbreviate as yx
:m
or yx
m
).
In each cycle, the agent selects an action based on the percept sequence it has
received to date.
Consider first a reinforcement learner. An optimal reinforcement learner
(AI-RL) is one that maximizes expected future rewards. It obeys the equation
4
yrrPyx yx y
kkm
xyx
mkk
kkm
=
()
argmax +…
+|
+:
≤<y
k
1
(
()
.
The reward sequence r
k
, . . . , r
m
is implied by the percept sequence x
k:m
, since
the reward that the agent receives in a given cycle is part of the percept that the
agent receives in that cycle.
As argued earlier, this kind of reinforcement learning is unsuitable in the pre-
sent context because a suciently intelligent agent will realize that it could secure
maximum reward if it were able to directly manipulate its reward signal (wire-
heading). For weak agents, this need not be a problem, since we can physically
prevent them from tampering with their own reward channel. We can also con-
trol their environment so that they receive rewards only when they act in ways
that are agreeable to us. But a reinforcement learner has a strong incentive to
eliminate this artificial dependence of its rewards on our whims and wishes. Our
relationship with a reinforcement learner is therefore fundamentally antagonis-
tic. If the agent is strong, this spells danger.
Variations of the wireheading syndrome can also aect systems that do not
seek an external sensory reward signal but whose goals are defined as the at-
tainment of some internal state. For example, in so-called “actorcritic” systems,
there is an actor module that selects actions in order to minimize the disapproval
of a separate critic module that computes how far the agent’s behavior falls short
of a given performance measure. The problem with this setup is that the actor
module may realize that it can minimize disapproval by modifying the critic or
eliminating it altogether—much like a dictator who dissolves the parliament and
nationalizes the press. For limited systems, the problem can be avoided simply
by not giving the actor module any means of modifying the critic module. Asuf-
ficiently intelligent and resourceful actor module, however, could always gain ac-
cess to the critic module (which, after all, is merely a physical process in some
computer).
5
Before we get to the value learner, let us consider as an intermediary step
what has been called an observation-utility maximizer (AI-OUM). It is obtained
VALUE LEARNING
|
95
Box 0 Continued
by replacing the reward series (r
k
+ . . . + r
m
) in the AI-RL with a utility function
that is allowed to depend on the entire future interaction history of the AI:
y= UyxPyx |yxy
ky
xyx
≤m ≤m <k k
k
kk+m
argmax
1:
()
()
..
This formulation provides a way around the wireheading problem because a
utility function defined over an entire interaction history could be designed to
penalize interaction histories that show signs of self-deception (or of a failure on
the part of the agent to invest suciently in obtaining an accurate view of reality).
The AI-OUM thus makes it possible in principle to circumvent the wireheading
problem. Availing ourselves of this possibility, however, would require that we
specify a suitable utility function over the class of possible interaction histories—a
task that looks forbiddingly dicult.
It may be more natural to specify utility functions directly in terms of possible
worlds (or properties of possible worlds, or theories about the world) rather
than in terms of an agent’s own interaction histories. If we use this approach, we
could reformulate and simplify the AI-OUM optimality notion:
y= UwPw|Ey
y
w
argmax ()
()
.
Here, E is the total evidence available to the agent (at the time when it is making
its decision), and U is a utility function that assigns utility to some class of possible
worlds. The optimal agent chooses the act that maximizes expected utility.
An outstanding problem with these formulations is the diculty of defining
the utility function U. This, finally, returns us to the value-loading problem. To en-
able the utility function to be learned, we must expand our formalism to allow for
uncertainty over utility functions. This can be done as follows (AI-VL):
6
y= Pw|EyUwP U|w
y
wu
argmax ()()((
))
.
∈∈
Y
WU
∑∑
Here, (.) is a function from utility functions to propositions about utility func-
tions. (U) is the proposition that the utility function U satisfies the value criterion
expressed by .
7
To decide which action to perform, one could hence proceed as follows: First,
compute the conditional probability of each possible world w (given available evi-
dence and on the supposition that action y is to be performed). Second, for each
possible utility function U, compute the conditional probability that U satisfies the
value criterion (conditional on w being the actual world). Third, for each pos-
sible utility function U, compute the utility of possible world w. Fourth, combine
these quantities to compute the expected utility of action y. Fifth, repeat this pro-
cedure for each possible action, and perform the action found to have the highest
continued
96
|
ACQUIRING VALUES
Box 0 Continued
expected utility (using some arbitrary method to break ties). As described, this
procedurewhich involves giving explicit and separate consideration to each
possible worldis, of course, wildly computationally intractable. The AI would
have to use computational shortcuts that approximate this optimality notion.
The question, then, is how to define this value criterion .
8
Once the AI has an
adequate representation of the value criterion, it could in principle use its general
intelligence to gather information about which possible worlds are most likely to
be the actual one. It could then apply the criterion, for each such plausible pos-
sible world w, to find out which utility function satisfies the criterion in w. One
can thus regard the AI-VL formula as a way of identifying and separating out this
key challenge in the value learning approach—the challenge of how to represent
. The formalism also brings to light a number of other issues (such as how to
define , , and ) which would need to be resolved before the approach could
be made to work.
9
could lean back and crack its knuckles—though more likely a malignant failure
would ensue, for reasons we discussed in Chapter 8. So now we face the question
of how to dene time. We could point to a clock and say, “Time is dened by the
movements of this device”—but this could fail if the AI conjectures that it can
manipulate time by moving the hands on the clock, a conjecture which would
indeed be correct if “time” were given the aforesaid denition. (In a realistic case,
matters would be further complicated by the fact that the relevant values are not
going to be conveniently described in a letter; more likely, they would have to be
inferred from observations of pre-existing structures that implicitly contain the
relevant information, such as human brains.)
Another issue in coding the goal “Maximize the realization of the values
described in the envelope” is that even if all the correct values were described
in a letter, and even if the AI’s motivation system were successfully keyed to this
source, the AI might not interpret the descriptions the way we intended. is
would create a risk of perverse instantiation, as discussed in Chapter 8.
To clarify, the diculty here is not so much how to ensure that the AI can under-
stand human intentions. A superintelligence should easily develop such under-
standing. Rather, the diculty is ensuring that the AI will be motivated to pursue
the described values in the way we intended. is is not guaranteed by the AI’s abil-
ity to understand our intentions: an AI could know exactly what we meant and yet
be indierent to that interpretation of our words (being motivated instead by some
other interpretation of the words or being indierent to our words altogether).
e diculty is compounded by the desideratum that, for reasons of safety,
the correct motivation should ideally be installed in the seed AI before it becomes
capable of fully representing human concepts or understanding human intentions.
is requires that somehow a cognitive framework be created, with a particular
VALUE LEARNING
|
97
location in that framework designated in the AI’s motivation system as the reposi-
tory of its nal value. But the cognitive framework itself must be revisable, so as to
allow the AI to expand its representational capacities as it learns more about the
world and grows more intelligent. e AI might undergo the equivalent of scien-
tic revolutions, in which its worldview is shaken up and it perhaps suers onto-
logical crises in which it discovers that its previous ways of thinking about values
were based on confusions and illusions. Yet starting at a sub-human level of devel-
opment and continuing throughout all its subsequent development into a galactic
superintelligence, the AI’s conduct is to be guided by an essentially unchanging
nal value, a nal value that becomes better understood by the AI in direct conse-
quence of its general intellectual progress—and likely quite dierently understood
by the mature AI than it was by its original programmers, though not dierent in
a random or hostile way but in a benignly appropriate way. How to accomplish this
remains an open question.
20
(See Box 11.)
In summary, it is not yet known how to use the value learning approach to
install plausible human values (though see Box 12 for some examples of recent
Box  An AI that wants to be friendly
Eliezer Yudkowsky has tried to describe some features of a seed AI architecture
intended to enable the kind of behavior described in the text above. In his ter-
minology, the AI would use “external reference semantics.”
2
To illustrate the
basic idea, let us suppose that we want the system to be “friendly.” The system
starts out with the goal of trying to instantiate property F but does not initially
know much about what F is. It might just know that F is some abstract property
and that when the programmers speak of “friendliness,” they are probably try-
ing to convey information about F. Since the AI’s final goal is to instantiate F, an
important instrumental value is to learn more about what F is. As the AI discov-
ers more about F, its behavior is increasingly guided by the actual content of F.
Thus, hopefully, the AI becomes increasingly friendly the more it learns and the
smarter it gets.
The programmers can help this process along, and reduce the risk of the AI
making some catastrophic mistake while its understanding of F is still incomplete,
by providing the AI with “programmer armations,” hypotheses about the na-
ture and content of F to which an initially high probability is assigned. For instance,
the hypothesis “misleading the programmers is unfriendly” can be given a high
prior probability. These programmer armations, however, are not “true by
definition”they are not unchallengeable axioms about the concept of friendli-
ness. Rather, they are initial hypotheses about friendliness, hypotheses to which a
rational AI will assign a high probability at least for as long as it trusts the program-
mers’ epistemic capacities more than its own.
continued
98
|
ACQUIRING VALUES
Box  Continued
Yudkowsky’s proposal also involves the use of what he called “causal validity
semantics.” The idea here is that the AI should do not exactly what the program-
mers told it to do but rather (something like) what they were trying to tell it to
do. While the programmers are trying to explain to the seed AI what friendliness
is, they might make errors in their explanations. Moreover, the programmers
themselves may not fully understand the true nature of friendliness. One would
therefore want the AI to have the ability to correct errors in the programmers’
thinking, and to infer the true or intended meaning from whatever imperfect
explanations the programmers manage to provide. For example, the AI should
be able to represent the causal processes whereby the programmers learn and
communicate about friendliness. Thus, to pick a trivial example, the AI should
understand that there is a possibility that a programmer might make a typo while
inputting information about friendliness, and the AI should then seek to correct
the error. More generally, the AI should seek to correct for whatever distortive
influences may have corrupted the flow of information about friendliness as it
passed from its source through the programmers to the AI (where “distortive
is an epistemic category). Ideally, as the AI matures, it should overcome any cog-
nitive biases and other more fundamental misconceptions that may have pre-
vented its programmers from fully understanding what friendliness is.
Box 2 Two recent (half-baked) ideas
What we might call the “Hail Mary” approach is based on the hope that else-
where in the universe there exist (or will come to exist) civilizations that suc-
cessfully manage the intelligence explosion, and that they end up with values that
significantly overlap with our own. We could then try to build our AI so that it is
motivated to do what these other superintelligences want it to do.
22
The advan-
tage is that this might be easier than to build our AI to be motivated to do what
we want directly.
For this scheme to work it is not necessary that our AI can establish communi-
cation with any alien superintelligence. Rather, our AI’s actions would be guided
by its estimates of what the alien superintelligences would want it to do. Our AI
would model the likely outcomes of intelligence explosions elsewhere, and as it
becomes superintelligent itself its estimates should become increasingly accurate.
Perfect knowledge is not required. There may be a range of plausible outcomes
of intelligence explosions, and our AI would then do its best to accommodate the
preferences of the various dierent kinds of superintelligence that might emerge,
weighted by probability.
This version of the Hail Mary approach requires that we construct a final value
for our AI that refers to the preferences of other superintelligences. Exactly how
VALUE LEARNING
|
99
Box 2 Continued
to do this is not yet clear. However, superintelligent agents might be structurally
distinctive enough that we could write a piece of code that would function as a
detector that would look at the world model in our developing AI and designate
the representational elements that correspond to the presence of a superin-
telligence. The detector would then, somehow, extract the preferences of the
superintelligence in question (as it is represented within our own AI).
23
If we
could create such a detector, we could then use it to define our AIs final values.
One challenge is that we may need to create the detector before we know what
representational framework our AI will develop. The detector may thus need to
query an unknown representational framework and extract the preferences of
whatever superintelligence may be represented therein. This looks dicult, but
perhaps some clever solution can be found.
24
If the basic setup could be made to work, various refinements immediately
suggest themselves. For example, rather than aiming to follow (some weighted
composition of) the preferences of every alien superintelligence, our AI’s final
value could incorporate a filter to select a subset of alien superintelligences for
obeisance (with the aim of selecting ones whose values are closer to our own).
For instance, we might use criteria pertaining to a superintelligence’s causal origin
to determine whether to include it in the obeisance set. Certain properties of
its origination (which we might be able to define in structural terms) may correl-
ate with the degree to which the resultant superintelligence could be expected
to share our values. Perhaps we wish to place more trust in superintelligences
whose causal origins trace back to a whole brain emulation, or to a seed AI that
did not make heavy use of evolutionary algorithms or that emerged slowly in a
way suggestive of a controlled takeo. (Taking causal origins into account would
also let us avoid over-weighting superintelligences that create multiple copies of
themselves—indeed would let us avoid creating an incentive for them to do so.)
Many other refinements would also be possible.
The Hail Mary approach requires faith that there are other superintelligences
out there that suciently share our values.
25
This makes the approach non-ideal.
However, the technical obstacles facing the Hail Mary approach, though very
substantial, might possibly be less formidable than those confronting alternative
approaches. Exploring non-ideal but more easily implementable approaches can
make sensenot with the intention of using them, but to have something to fall
back upon in case an ideal solution should not be ready in time.
Another idea for how to solve the value-loading problem has recently been
proposed by Paul Christiano.
26
Like the Hail Mary, it is a value learning method that
tries to define the value criterion by means of a “trick” rather than through labori-
ous construction. By contrast to the Hail Mary, it does not presuppose the exist-
ence of other superintelligent agents that we could point to as role models for
continued
200
|
ACQUIRING VALUES
Box 2 Continued
our own AI. Christiano’s proposal is somewhat resistant to brief explanationit
involves a series of arcane considerationsbut we can try to at least gesture at
its main elements.
Suppose we could obtain (a) a mathematically precise specification of a par-
ticular human brain and (b) a mathematically well-specified virtual environment
that contains an idealized computer with an arbitrarily large amount of memory
and CPU power. Given (a) and (b), we could define a utility function U as the
output the human brain would produce after interacting with this environment.
U would be a mathematically well-defined object, albeit one which (because of
computational limitations) we may be unable to describe explicitly. Neverthe-
less, U could serve as the value criterion for a value learning AI, which could
use various heuristics for assigning probabilities to hypotheses about what U
implies.
Intuitively, we want U to be the utility function that a suitably prepared hu-
man would output if she had the advantage of being able use an arbitrarily large
amount of computing power—enough computing power, for example, to run
astronomical numbers of copies of herself to assist her with her analysis of speci-
fying a utility function, or to help her devise a better process for going about this
analysis. (We are here foreshadowing a theme, “coherent extrapolated volition,”
which will be further explored in Chapter 3.)
It would seem relatively easy to specify the idealized environment: we can
give a mathematical description of an abstract computer with arbitrarily large
capacity; and in other respects we could use a virtual reality program that gives
a mathematical description of, say, a single room with a computer terminal in it
(instantiating the abstract computer). But how to obtain a mathematically pre-
cise description of a particular human brain? The obvious way would be through
whole brain emulation, but what if the technology for emulation is not available
in time?
This is where Christiano’s proposal oers a key innovation. Christiano observes
that in order to obtain a mathematically well-specified value criterion, we do not
need a practically useful computational model of a mind, a model we could run. We
just need a (possibly implicit and hopelessly complicated) mathematical definition
and this may be much easier to attain. Using functional neuroimaging and other
measurements, we can perhaps collect gigabytes of data about the input–output
behavior of a selected human. If we collect a sucient amount of data, then it
might be that the simplest mathematical model that accounts for all this data is in
fact an emulation of the particular human in question. Although it would be com-
putationally intractable for us to find this simplest model from the data, it could be
perfectly possible for us to define the model, by referring to the data and a using
a mathematically well-defined simplicity measure (such as some variant of the
Kolmogorov complexity, which we encountered in Box, Chapter ).
27
EMULATION MODULATION
|
20
ideas). At present, the approach should be viewed as a research program rather
than an available technique. If it could be made to work, it might constitute the
most ideal solution to the value-loading problem. Among other benets, it would
seem to oer a natural way to prevent mind crime, since a seed AI that makes rea-
sonable guesses about which values its programmers might have installed would
anticipate that mind crime is probably negatively evaluated by those values, and
thus best avoided, at least until more denitive information has been obtained.
Last, but not least, there is the question of “what to write in the envelope”—or,
less metaphorically, the question of which values we should try to get the AI to
learn. But this issue is common to all approaches to the AI value-loading problem.
We return to it in Chapter 13.
Emulation modulation
e value-loading problem looks somewhat dierent for whole brain emulation
than it does for articial intelligence. Methods that presuppose a ne-grained
understanding and control of algorithms and architecture are not applicable to
emulations. On the other hand, the augmentation motivation selection method—
inapplicable to de novo articial intelligence—is available to be used with emula-
tions (or enhanced biological brains).
28
e augmentation method could be combined with techniques to tweak the
inherited goals of the system. For example, one could try to manipulate the moti-
vational state of an emulation by administering the digital equivalent of psycho-
active substances (or, in the case of biological systems, the actual chemicals). Even
now it is possible to pharmacologically manipulate values and motivations to a
limited extent.
29
e pharmacopeia of the future may contain drugs with more
specic and predictable eects. e digital medium of emulations should greatly
facilitate such developments, by making controlled experimentation easier and by
rendering all cerebral parts directly addressable.
Just as when biological test subjects are used, research on emulations would get
entangled in ethical complications, not all of which could be brushed aside with a
consent form. Such entanglements could slow progress along the emulation path
(because of regulation or moral restraint), perhaps especially hindering studies
on how to manipulate the motivational structure of emulations. e result could
be that emulations are augmented to potentially dangerous superintelligent levels
of cognitive ability before adequate work has been done to test or adjust their
nal goals. Another possible eect of the moral entanglements might be to give
the lead to less scrupulous teams and nations. Conversely, were we to relax our
moral standards for experimenting with digital human minds, we could become
responsible for a substantial amount of harm and wrongdoing, which is obviously
undesirable. Other things equal, these considerations favor taking some alterna-
tive path that does not require the extensive use of digital human research sub-
jects in a strategically high-stakes situation.
202
|
ACQUIRING VALUES
e issue, however, is not clear-cut. One could argue that whole brain emula-
tion research is less likely to involve moral violations than articial intelligence
research, on the grounds that we are more likely to recognize when an emulation
mind qualies for moral status than we are to recognize when a completely alien
or synthetic mind does so. If certain kinds of AIs, or their subprocesses, have a
signicant moral status that we fail to recognize, the consequent moral violations
could be extensive. Consider, for example, the happy abandon with which con-
temporary programmers create reinforcement-learning agents and subject them
to aversive stimuli. Countless such agents are created daily, not only in computer
science laboratories but in many applications, including some computer games
containing sophisticated non-player characters. Presumably, these agents are still
too primitive to have any moral status. But how condent can we really be that
this is so? More importantly, how condent can we be that we will know to stop
in time, before our programs become capable of experiencing morally relevant
suering?
(We will return in Chapter 14 to some of the broader strategic questions that
arise when we compare the desirability of emulation and articial intelligence
paths.)
Institution design
Some intelligent systems consist of intelligent parts that are themselves capable of
agency. Firms and states exemplify this in the human world: whilst largely com-
posed of humans they can, for some purposes, be viewed as autonomous agents
in their own right. e motivations of such composite systems depend not only
on the motivations of their constituent subagents but also on how those subagents
are organized. For instance, a group that is organized under strong dictatorship
might behave as if it had a will that was identical to the will of the subagent that
occupies the dictator role, whereas a democratic group might sometimes behave
more as if it had a will that was a composite or average of the wills of its various
constituents. But one can also imagine governance institutions that would make
an organization behave in a way that is not a simple function of the wills of its sub-
agents. (eoretically, at least, there could exist a totalitarian state that everybody
hated, because the state had mechanisms to prevent its citizens from coordinating
a revolt. Each citizen could be worse o by revolting alone than by playing their
part in the state machinery.)
By designing appropriate institutions for a composite system, one could thus
try to shape its eective motivation. In Chapter 9, we discussed social inte-
gration as a possible capability control method. But there we focused on the
incentives faced by an agent as a consequence of its existence in a social world
of near-equals. Here we are focusing on what happens inside a given agent: how
its will is determined by its internal organization. We are therefore looking at
INSTITUTION DESIGN
|
203
a motivation selection method. Moreover, since this kind of internal institu-
tion design does not depend on large-scale social engineering or reform, it is
a method that might be available to an individual project developing superin-
telligence even if the wider socioeconomic or international milieu is less than
ideally favorable.
Institution design is perhaps most plausible in contexts where it would be com-
bined with augmentation. If we could start with agents that are already suitably
motivated or that have human-like motivations, institutional arrangements could
be used as an extra safeguard to increase the chances that the system will stay on
course.
For example, suppose that we start with some well-motivated human-like
agents—let us say emulations. We want to boost the cognitive capacities of
these agents, but we worry that the enhancements might corrupt their moti-
vations. One way to deal with this challenge would be to set up a system in
which individual emulations function as subagents. When a new enhancement
is introduced, it is rst applied to a small subset of the subagents. Its eects are
then studied by a review panel composed of subagents who have not yet had the
enhancement applied to them. Only when these peers have satised themselves
that the enhancement is not corrupting is it rolled out to the wider subagent
population. If the enhanced subagents are found to be corrupted, they are not
given further enhancements and are excluded from key decision-making func-
tions (at least until the system as a whole has advanced to a point where the
corrupted subagents can be safely reintegrated).
30
Although the corrupted sub-
agents might have gained some advantage from the enhancement, the institu-
tional structure in which they are embedded, and the fact that they constitute
a small minority of all subagents, would hopefully prevent them from seizing
power or spreading their corruption to the overall system. us, the collec-
tive intelligence and capability of the system could be gradually enhanced in a
sequence of small steps, where the soundness of each step is veried by suba-
gents only slightly less capable than the new subagents that are introduced in
that step.
e amount of safety obtainable by this kind of institution design is limited
by the accuracy of the tests that are used to evaluate the enhanced subagents.
e problem of test construction is complicated by the fact that the test batter-
ies could only be validated within the pre-existing range of cognitive ability yet
must be applied to agents outside this range. One could mitigate this problem
by taking small enhancement steps and by letting the test run for a long time.
Such caution, however, would raise the cost and slow progress (which, if a race
dynamic is occurring, could mean a project employing these safety measures
would place itself at a disadvantage). Even with small steps and extensive test-
ing, there would remain some risk of subtle aws being introduced, leading to a
gradual corruption of the goal system, a corruption that might become manifest
only when the system reaches superintelligence. is risk could be reduced by
204
|
ACQUIRING VALUES
letting the testing condition extend for many steps of cognitive enhancement, so
that even agents that have been initially certied as sound and that have received
several subsequent enhancements would continue to be under review by a panel
of unmodied and uncorrupted original peers and would remain subject to
reversion.
Ideally, the testing should be ongoing rather than conned to an initial eval-
uation period. Human-level subagents have the ability to strategize and might
thus choose to conceal certain goals while their behavior was being monitored.
To enable continuous testing, a hierarchy might be created in which subagents at
a certain level of capability are tasked with monitoring subagents at a somewhat
higher level of capability. At the bottom of the capability hierarchy (but at the top
of the power hierarchy) would sit the relatively dumb and slow principal. is
human principal would be like a demented king who reigns over an incompetent
court that oversees a mediocre administration which governs a capable people.
Alternatively, he might be likened to the evolutionarily ancient “reptilian brain”
which ensures that the newer and much cleverer neocortex is employed in the
service of goals such as feeding and copulation.
One might question whether such an inverse meritocracy could remain stable.
If the bulk of the superintelligent subagents were found to be corrupt, and the
humans in charge of the whole thing attempted to roll back many of the preceding
enhancements, would the institutional structure be strong enough to withstand
the opposition that such a move would engender? In the context of historical or
contemporary human societies, the answer perhaps is that this would be implau-
sible. (Not that the most capable people consistently rise to power, but we are not
literally ruled by chimpanzees either.) e issue is less clear, however, in a context
where society is implemented as soware.
31
Imagine a digital hierarchy that has one highly trained police agent for every
two proletarians. It also has one special-police agent for every two police agents. It
also has one special-special-police agent for every two special-police agents—and
so forth, so that each layer of overseers has at least half the numbers of the layer
it oversees. Imagine, further, that this society is architected in such a way that a
supervisor has big advantages over his subordinates. For example, the supervisor
is able to monitor and record everything his subordinates say and do, whereas
the supervisor himself is hidden behind a one-way glass, so that his subordinates
can only hear the orders and instructions that the superior chooses to transmit.
Imagine that supervisors have a panel of buttons that they can press at any time
to cause a subordinate to be punished or rewarded, paused, deleted, or reverted to
an earlier state. ese capabilities would greatly boost the powers of a supervising
agent, beyond what has been possible historically in even the most totalitarian of
organizations.
Even this does not exhaust the possible instruments of control. Supervisors might
also have at their disposal realistic virtual reality simulations that they can use to
probe subordinates’ reactions to hypothetical scenarios (such as simulated oers
to join a conspiracy or to slack when they think nobody is watching). Supervisors
INSTITUTION DESIGN
|
205
might, additionally, have the ability to eavesdrop on internal monologues in the
subordinates’ minds and to directly manipulate their emotional states.
e upshot is a series of massive advantages for the supervisor.
32
Each super-
visor would be a subordinate to another, higher-level supervisor, who would
monitor his every move and make sure he did not slacken in the execution of
his managerial duties. With this kind of arrangement, it might be possible for
the structure to be stable even if each supervisory level is intellectually some-
what weaker than the level it supervises.
33
And the entire structure, which could
include many highly superintelligent agents, may be controlled by a small number
of humans with root access.
34
e cost of such an internal control structure would be signicant but not
necessarily prohibitive. For example, in the model where each level of supervi-
sion has half the numbers of the layer below, the extra computational overhead is
bounded at a mere 100% of what the proletarian part of the system costs—less if
the dumber boss layers require fewer computations per subagent. Oversight could
be tightened by increasing the ratio, or cost lowered by decreasing it. e extra
computational cost would be a major factor in a competitive market scenario
(cf. Chapter 11) but looks aordable in the context of a project that is not facing
strong immediate competition. ere would also be a cost in terms of the devel-
opment time needed to create and test these supervisory functionalities. A well-
resourced project could reduce this time cost by parallelizing the development of
the control structure with the development of the machine intelligence; but the
extra task load could be prohibitive for smaller projects and for projects caught in
a close technology race.
One other type of cost also deserves consideration: the risk of mind crimes
being committed in this kind of structure.
35
As described, the institution sounds
like a rather horrible North Korean labor camp. Yet there are ways of at least miti-
gating the moral problems with running this kind of institution, even if the sub-
agents contained in the institution are emulations with full human moral status.
At a minimum, the system could rely on volunteering emulations. Each subagent
could have the option at any time of withdrawing its participation.
36
Terminated
emulations could be stored to memory, with a commitment to restart them under
much more ideal conditions once the dangerous phase of the intelligence explo-
sion is over. Meanwhile, subagents who chose to participate could be housed in
very comfortable virtual environments and allowed ample time for sleep and rec-
reation. ese measures would impose a cost, one that should be manageable for a
well-resourced project under noncompetitive conditions. In a highly competitive
situation, the cost may be unaordable unless an enterprise could be assured that
its competitors would incur the same cost.
In the example, we imagined the subagents as emulations. One might wonder,
does the institution design approach require that the subagents be anthropomor-
phic? Or is it equally applicable to systems composed of articial subagents?
One’s rst thought here might be skeptical. One notes that despite our plentiful
experience with human-like agents, we still cannot precisely predict the outbreak
206
|
ACQUIRING VALUES
or outcomes of revolutions; social science can, at most, describe some statistical
tendencies.
37
Since we cannot reliably predict the stability of social structures for
ordinary human beings (about which we have much data), it is tempting to infer
that we have little hope of precision-engineering stable social structures for cog-
nitively enhanced human-like agents (about which we have no data), and that we
have still less hope of doing so for advanced articial agents (which are not even
similar to agents that we have data about).
Yet the matter is not so cut-and-dried. Humans and human-like beings
are complex; but articial agents could have relatively simple architectures.
Articial agents could also have simple and explicitly characterized motiv-
ations. Furthermore, digital agents in general (whether emulations or articial
intelligences) are copyable: an aordance that may revolutionize management,
much like interchangeable parts revolutionized manufacturing. ese dif-
ferences, together with the opportunity to work with agents that are initially
powerless and to create institutional structures that use the various above-
mentioned control measures, might combine to make it possible to achieve
particular institutional outcomes—such as a system that does not revolt
more reliably than if one were working with human beings under historical
conditions.
But then again, articial agents might lack many of the attributes that help us
predict the behavior of human-like agents. Articial agents need not have any of
the social emotions that bind human behavior, emotions such as fear, pride, and
remorse. Nor need articial agents develop attachments to friends and family.
Nor need they exhibit the unconscious body language that makes it dicult for us
humans to conceal our intentions. ese decits might destabilize institutions of
articial agents. Moreover, articial agents might be capable of making big leaps
in cognitive performance as a result of seemingly small changes in their algo-
rithms or architecture. Ruthlessly optimizing articial agents might be willing to
take extreme gambles from which humans would shrink.
38
And superintelligent
agents might show a surprising ability to coordinate with little or no communica-
tion (e.g. by internally modeling each other’s hypothetical responses to various
contingencies). ese and other dierences could make sudden institutional fail-
ure more likely, even in the teeth of what seem like Kevlar-clad methods of social
control.
It is unclear, therefore, how promising the institution design approach is, and
whether it has a greater chance of working with anthropomorphic than with
articial agents. It might be thought that creating an institution with appropri-
ate checks and balances could only increase safety—or, at any rate, not reduce
safety—so that from a risk-mitigation perspective it would always be best if the
method were used. But even this cannot be said with certainty. e approach
adds parts and complexity, and thus may also introduce new ways for things
to go wrong that do not exist in the case of an agent that does not have intel-
ligent subagents as parts. Nevertheless, institution design is worthy of further
exploration.
39
SYNOPSIS
|
207
Synopsis
Goal system engineering is not yet an established discipline. It is not currently
known how to transfer human values to a digital computer, even given human-
level machine intelligence. Having investigated a number of approaches, we found
that some of them appear to be dead ends; but others appear to hold promise and
deserve to be explored further. A summary is provided in Table 12.
Table 2 Summary of value-loading techniques
Explicit representation May hold promise as a way of loading domesticity values.
Does not seem promising as a way of loading more
complex values.
Evolutionary selection Less promising. Powerful search may find a design that
satisfies the formal search criteria but not our inten-
tions. Furthermore, if designs are evaluated by running
them—including designs that do not even meet the
formal criteriaa potentially grave additional danger is
created. Evolution also makes it dicult to avoid massive
mind crime, especially if one is aiming to fashion human-
like minds.
Reinforcement learning A range of dierent methods can be used to solve “rein-
forcement-learning problems,” but they typically involve
creating a system that seeks to maximize a reward signal.
This has an inherent tendency to produce the wireheading
failure mode when the system becomes more intelligent.
Reinforcement learning therefore looks unpromising.
Value accretion We humans acquire much of our specific goal content from
our reactions to experience. While value accretion could in
principle be used to create an agent with human motiva-
tions, the human value-accretion dispositions might be com-
plex and dicult to replicate in a seed AI. A bad approxima-
tion may yield an AI that generalizes dierently than humans
do and therefore acquires unintended final goals. More
research is needed to determine how dicult it would be
to make value accretion work with sucient precision.
Motivational scaolding It is too early to tell how dicult it would be to encour-
age a system to develop internal high-level representa-
tions that are transparent to humans (while keeping the
system’s capabilities below the dangerous level) and then
to use those representations to design a new goal system.
The approach might hold considerable promise. (How-
ever, as with any untested approach that would postpone
much of the hard work on safety engineering until the
development of human-level AI, one should be careful
not to allow it to become an excuse for a lackadaisical
attitude to the control problem in the interim.)
Continued
208
|
ACQUIRING VALUES
If we knew how to solve the value-loading problem, we would confront a fur-
ther problem: the problem of deciding which values to load. What, in other words,
would we want a superintelligence to want? is is the more philosophical prob-
lem to which we turn next.
Value learning A potentially promising approach, but more research is
needed to determine how dicult it would be to formally
specify a reference that successfully points to the relevant
external information about human value (and how dif-
ficult it would be to specify a correctness criterion for a
utility function in terms of such a reference). Also worth
exploring within the value learning category are proposals
of the Hail Mary type or along the lines of Paul Chris-
tiano’s construction (or other such shortcuts).
Emulation modulation If machine intelligence is achieved via the emulation
pathway, it would likely be possible to tweak motivations
through the digital equivalent of drugs or by other means.
Whether this would enable values to be loaded with suf-
ficient precision to ensure safety even as the emulation is
boosted to superintelligence is an open question. (Ethical
constraints might also complicate developments in this
direction.)
Institution design Various strong methods of social control could be applied
in an institution composed of emulations. In principle, so-
cial control methods could also be applied in an institution
composed of artificial intelligences. Emulations have some
properties that would make them easier to control via
such methods, but also some properties that might make
them harder to control than AIs. Institution design seems
worthy of further exploration as a potential value-loading
technique.
Table2 Continued
THE NEED FOR INDIRECT NORMATIVITY
|
209
CHAPTER 13
Choosing the criteria
forchoosing
S
uppose we could install any arbitrary nal value into a seed AI. e deci-
sion as to which value to install could then have the most far-reaching
consequences. Certain other basic parameter choices—concerning the
axioms of the AIs decision theory and epistemology—could be similarly conse-
quential. But foolish, ignorant, and narrow-minded that we are, how could we
be trusted to make good design decisions? How could we choose without lock-
ing in forever the prejudices and preconceptions of the present generation? In
this chapter, we explore how indirect normativity can let us ooad much of the
cognitive work involved in making these decisions onto the superintelligence
itself while still anchoring the outcome in deeper human values.
The need for indirect normativity
How can we get a superintelligence to do what we want? What do we want the
superintelligence to want? Up to this point, we have focused on the former ques-
tion. We now turn to the second question.
Suppose that we had solved the control problem so that we were able to load
any value we chose into the motivation system of a superintelligence, making it
pursue that value as its nal goal. Which value should we install? e choice is no
light matter. If the superintelligence obtains a decisive strategic advantage, the
value would determine the disposition of the cosmic endowment.
Clearly, it is essential that we not make a mistake in our value selection. But how
could we realistically hope to achieve errorlessness in a matter like this? We might
be wrong about morality; wrong also about what is good for us; wrong even about
what we truly want. Specifying a nal goal, it seems, requires making one’s way
through a thicket of thorny philosophical problems. If we try a direct approach,
we are likely to make a hash of things. e risk of mistaken choosing is especially
20
|
CHOOSING THE CRITERIA FORCHOOSING
high when the decision context is unfamiliar—and selecting the nal goal for a
machine superintelligence that will shape all of humanity’s future is an extremely
unfamiliar decision context if any is.
e dismal odds in a frontal assault are reected in the pervasive dissensus
about the relevant issues in value theory. No ethical theory commands major-
ity support among philosophers, so most philosophers must be wrong.
1
It is
also reected in the marked changes that the distribution of moral belief has
undergone over time, many of which we like to think of as progress. In medi-
eval Europe, for instance, it was deemed respectable entertainment to watch a
political prisoner being tortured to death. Cat-burning remained popular in
sixteenth-century Paris.
2
A mere hundred and y years ago, slavery still was
widely practiced in the American South, with full support of the law and moral
custom. When we look back, we see glaring deciencies not just in the behav-
ior but in the moral beliefs of all previous ages. ough we have perhaps since
gleaned some moral insight, we could hardly claim to be now basking in the high
noon of perfect moral enlightenment. Very likely, we are still laboring under
one or more grave moral misconceptions. In such circumstances to select a nal
value based on our current convictions, in a way that locks it in forever and pre-
cludes any possibility of further ethical progress, would be to risk an existential
moral calamity.
Even if we could be rationally condent that we have identied the correct
ethical theory—which we cannot be—we would still remain at risk of making
mistakes in developing important details of this theory. Seemingly simple
moral theories can have a lot of hidden complexity.
3
For example, consider the
(un usually simple) consequentialist theory of hedonism. is theory states,
roughly, that all and only pleasure has value, and all and only pain has disvalue.
4
Even if we placed all our moral chips on this one theory, and the theory turned
out to be right, a great many questions would remain open. Should “higher
pleasures” be given priority over “lower pleasures,” as John Stuart Mill argued?
How should the intensity and duration of a pleasure be factored in? Can pains
and pleasures cancel each other out? What kinds of brain states are associated
with morally relevant pleasures? Would two exact copies of the same brain
state correspond to twice the amount of pleasure?
5
Can there be subconscious
pleasures? How should we deal with extremely small chances of extremely great
pleasures?
6
How should we aggregate over innite populations?
7
Giving the wrong answer to any one of these questions could be catastrophic.
If by selecting a nal value for the superintelligence we had to place a bet not just
on a general moral theory but on a long conjunction of specic claims about how
that theory is to be interpreted and integrated into an eective decision-making
process, then our chances of striking lucky would dwindle to something close to
hopeless. Fools might eagerly accept this challenge of solving in one swing all the
important problems in moral philosophy, in order to inx their favorite answers
into the seed AI. Wiser souls would look hard for some alternative approach, some
way to hedge.
COHERENT EXTRAPOLATED VOLITION
|
2
is takes us to indirect normativity. e obvious reason for building a super-
intelligence is so that we can ooad to it the instrumental reasoning required to
nd eective ways of realizing a given value. Indirect normativity would enable
us also to ooad to the superintelligence some of the reasoning needed to select
the value that is to be realized.
Indirect normativity is a way to answer the challenge presented by the fact that
we may not know what we truly want, what is in our interest, or what is morally
right or ideal. Instead of making a guess based on our own current understanding
(which is probably deeply awed), we would delegate some of the cognitive work
required for value selection to the superintelligence. Since the superintelligence is
better at cognitive work than we are, it may see past the errors and confusions that
cloud our thinking. One could generalize this idea and emboss it as a heuristic
principle:
The principle of epistemic deference
A future superintelligence occupies an epistemically superior vantage point: its
beliefs are (probably, on most topics) more likely than ours to be true. We should
therefore defer to the superintelligence’s opinion whenever feasible.
8
Indirect normativity applies this principle to the value-selection problem.
Lacking condence in our ability to specify a concrete normative standard, we
would instead specify some more abstract condition that any normative standard
should satisfy, in the hope that a superintelligence could nd a concrete stand-
ard that satises the abstract condition. We could give a seed AI the nal goal of
continuously acting according to its best estimate of what this implicitly dened
standard would have it do.
Some examples will serve to make the idea clearer. First we will consider “coher-
ent extrapolated volition,” an indirect normativity proposal outlined by Eliezer
Yudkowsky. We will then introduce some variations and alternatives, to give us a
sense of the range of available options.
Coherent extrapolated volition
Yudkowsky has proposed that a seed AI be given the nal goal of carrying out
humanitys “coherent extrapolated volition” (CEV), which he denes as follows:
Our coherent extrapolated volition is our wish if we knew more, thought faster, were
more the people we wished we were, had grown up farther together; where the extrap-
olation converges rather than diverges, where our wishes cohere rather than interfere;
extrapolated as we wish that extrapolated, interpreted as we wish that interpreted.
9
When Yudkowsky wrote this, he did not purport to present a blueprint for how
to implement this rather poetic prescription. His aim was to give a preliminary
sketch of how CEV might be dened, along with some arguments for why an
approach along these lines is needed.
22
|
CHOOSING THE CRITERIA FORCHOOSING
Many of the ideas behind the CEV proposal have analogs and antecedents in
the philosophical literature. For example, in ethics ideal observer theories seek to
analyze normative concepts like “good” or “right” in terms of the judgments that
a hypothetical ideal observer would make (where an “ideal observer” is dened as
one that is omniscient about non-moral facts, is logically clear-sighted, is impar-
tial in relevant ways and is free from various kinds of biases, and so on).
10
e
CEV approach, however, is not (or need not be construed as) a moral theory. It is
not committed to the claim that there is any necessary link between value and the
preferences of our coherent extrapolated volition. CEV can be thought of simply
as a useful way to approximate whatever has ultimate value, or it can be consid-
ered aside from any connection to ethics. As the main prototype of the indirect
normativity approach, it is worth examining in a little more detail.
Some explications
Some terms in the above quotation require explication. “ought faster,” in
Yudkowsky’s terminology, means if we were smarter and had thought things
through more. “Grown up farther together” seems to mean if we had done our
learning, our cognitive enhancing, and our self-improving under conditions of suit-
able social interaction with one another.
“Where the extrapolation converges rather than diverges” may be understood
as follows. e AI should act on some feature of the result of its extrapolation
only insofar as that feature can be predicted by the AI with a fairly high degree
of condence. To the extent that the AI cannot predict what we would wish if we
were idealized in the manner indicated, the AI should not act on a wild guess;
instead, it should refrain from acting. However, even though many details of our
idealized wishing may be undetermined or unpredictable, there might neverthe-
less be some broad outlines that the AI can apprehend, and it can then at least
act to ensure that the future course of events unfolds within those outlines. For
example, if the AI can reliably estimate that our extrapolated volition would wish
that we not all be in constant agony, or that the universe not be tiled over with
paperclips, then the AI should act to prevent those outcomes.
11
“Where our wishes cohere rather than interfere” may be read as follows. e
AI should act where there is fairly broad agreement between individual humans’
extrapolated volitions. A smaller set of strong, clear wishes might sometimes out-
weigh the weak and muddled wishes of a majority. Also, Yudkowsky thinks that
it should require less consensus for the AI to prevent some particular narrowly
specied outcome, and more consensus for the AI to act to funnel the future into
some particular narrow conception of the good. “e initial dynamic for CEV,” he
writes, “should be conservative about saying ‘yes,’ and listen carefully for ‘no.
12
“Extrapolated as we wish that extrapolated, interpreted as we wish that inter-
preted: e idea behind these last modiers seems to be that the rules for extrap-
olation should themselves be sensitive to the extrapolated volition. An individual
might have a second-order desire (a desire concerning what to desire) that some
COHERENT EXTRAPOLATED VOLITION
|
23
of her rst-order desires not be given weight when her volition is extrapolated.
For example, an alcoholic who has a rst-order desire for booze might also have
a second-order desire not to have that rst-order desire. Similarly, we might have
desires over how various other parts of the extrapolation process should unfold,
and these should be taken into account by the extrapolation process.
It might be objected that even if the concept of humanitys coherent extrapo-
lated volition could be properly dened, it would anyway be impossible—even for
a superintelligence—to nd out what humanity would actually want under the
hypothetical idealized circumstances stipulated in the CEV approach. Without
some information about the content of our extrapolated volition, the AI would
be bere of any substantial standard to guide its behavior. However, although it
would be dicult to know with precision what humanity’s CEV would wish, it
is possible to make informed guesses. is is possible even today, without super-
intelligence. For example, it is more plausible that our CEV would wish for there
to be people in the future who live rich and happy lives than that it would wish
that we should all sit on stools in a dark room experiencing pain. If we can make
at least some such judgments sensibly, so can a superintelligence. From the out-
set, the superintelligence’s conduct could thus be guided by its estimates of the
content of our CEV. It would have strong instrumental reason to rene these ini-
tial estimates (e.g. by studying human culture and psychology, scanning human
brains, and reasoning about how we might behave if we knew more, thought more
clearly, etc.). In investigating these matters, the AI would be guided by its initial
estimates of our CEV; so that, for instance, the AI would not unnecessarily run
myriad simulations replete with unredeemed human suering if it estimated that
our CEV would probably condemn such simulations as mind crime.
Another objection is that there are so many dierent ways of life and moral
codes in the world that it might not be possible to “blend” them into one CEV.
Even if one could blend them, the result might not be particularly appetizing—one
would be unlikely to get a delicious meal by mixing together all the best avors
from everyone’s dierent favorite dish.
13
In answer to this, one could point out that
the CEV approach does not require that all ways of life, moral codes, or personal
values be blended together into one stew. e CEV dynamic is supposed to act only
when our wishes cohere. On issues on which there is widespread irreconcilable
disagreement, even aer the various idealizing conditions have been imposed, the
dynamic should refrain from determining the outcome. To continue the cooking
analogy, it might be that individuals or cultures will have dierent favorite dishes,
but that they can nevertheless broadly agree that aliments should be nontoxic. e
CEV dynamic could then act to prevent food poisoning while otherwise allowing
humans to work out their culinary practices without its guidance or interference.
Rationales for CEV
Yudkowsky’s article oered seven arguments for the CEV approach. ree of
these were basically dierent ways of making the point that while the aim should
24
|
CHOOSING THE CRITERIA FORCHOOSING
be to do something that is humane and helpful, it would be very dicult to lay
down an explicit set of rules that does not have unintended interpretations and
undesirable consequences.
14
e CEV approach is meant to be robust and self-
correcting; it is meant to capture the source of our values instead of relying on
us correctly enumerating and articulating, once and for all, each of our essential
values.
e remaining four arguments go beyond that rst basic (but important) point,
spelling out desiderata on candidate solutions to the value-specication problem
and suggesting that CEV meets these desiderata.
“Encapsulate moral growth”
is is the desideratum that the solution should allow for the possibility of moral
progress. As suggested earlier, there are reasons to believe that our current moral
beliefs are awed in many ways; perhaps deeply awed. If we were to stipulate a
specic and unalterable moral code for the AI to follow, we would in eect be
locking in our present moral convictions, including their errors, destroying any
hope of moral growth. e CEV approach, by contrast, allows for the possibility
of such growth because it has the AI try to do that which we would have wished
it to do if we had developed further under favorable conditions, and it is possible
that if we had thus developed our moral beliefs and sensibilities would have been
purged of their current defects and limitations.
Avoid hijacking the destiny of humankind
Yudkowsky has in mind a scenario in which a small group of programmers creates
a seed AI that then grows into a superintelligence that obtains a decisive strategic
advantage. In this scenario, the original programmers hold in their hands the
entirety of humanity’s cosmic endowment. Obviously, this is a hideous respon-
sibility for any mortal to shoulder. Yet it is not possible for the programmers to
completely shirk the onus once they nd themselves in this situation: any choice
they make, including abandoning the project, would have world-historical conse-
quences. Yudkowsky sees CEV as a way for the programmers to avoid arrogating
to themselves the privilege or burden of determining humanity’s future. By set-
ting up a dynamic that implements humanity’s coherent extrapolated volition—as
opposed to their own volition, or their own favorite moral theory—they in eect
distribute their inuence over the future to all of humanity.
Avoid creating a motive for modern-day humans to fight over the initial dynamic
Distributing inuence over humanitys future is not only morally preferable to
the programming team implementing their own favorite vision, it is also a way to
reduce the incentive to ght over who gets to create the rst superintelligence. In
the CEV approach, the programmers (or their sponsors) exert no more inuence
over the content of the outcome than any other person—though they of course
play a starring causal role in determining the structure of the extrapolation and
in deciding to implement humanity’s CEV instead of some alternative. Avoiding
COHERENT EXTRAPOLATED VOLITION
|
25
conict is important not only because of the immediate harm that conict tends
to cause but also because it hinders collaboration on the dicult challenge of
developing superintelligence safely and benecially.
CEV is meant to be capable of commanding wide support. is is not just
because it allocates inuence equitably. ere is also a deeper ground for the
irenic potential of CEV, namely that it enables many dierent groups to hope that
their preferred vision of the future will prevail totally. Imagine a member of the
Afghan Taliban debating with a member of the Swedish Humanist Association.
e two have very dierent worldviews, and what is a utopia for one might be a
dystopia for the other. Nor might either be thrilled by any compromise position,
such as permitting girls to receive an education but only up to ninth grade, or
permitting Swedish girls to be educated but Afghan girls not. However, both the
Taliban and the Humanist might be able to endorse the principle that the future
should be determined by humanitys CEV. e Taliban could reason that if his
religious views are in fact correct (as he is convinced they are) and if good grounds
for accepting these views exist (as he is also convinced) then humankind would in
the end come to accept these views if only people were less prejudiced and biased,
if they spent more time studying scripture, if they could more clearly understand
how the world works and recognize essential priorities, if they could be freed from
irrational rebelliousness and cowardice, and so forth.
15
e Humanist, similarly,
would believe that under these idealized conditions, humankind would come to
embrace the principles she espouses.
“Keep humankind ultimately in charge of its own destiny”
We might not want an outcome in which a paternalistic superintelligence watches
over us constantly, micromanaging our aairs with an eye towards optimizing
every detail in accordance with a grand plan. Even if we stipulate that the super-
intelligence would be perfectly benevolent, and free from presumptuousness,
arrogance, overbearingness, narrow-mindedness, and other human shortcom-
ings, one might still resent the loss of autonomy entailed by such an arrangement.
We might prefer to create our destiny as we go along, even if it means that we
sometimes fumble. Perhaps we want the superintelligence to serve as a safety net,
to support us when things go catastrophically wrong, but otherwise to leave us to
fend for ourselves.
CEV allows for this possibility. CEV is meant to be an “initial dynamic,” a
process that runs once and then replaces itself with whatever the extrapolated
volition wishes. If humanitys extrapolated volition wishes that we live under the
supervision of a paternalistic AI, then the CEV dynamic would create such an AI
and hand it the reins. If humanitys extrapolated volition instead wishes that a
democratic human world government be created, then the CEV dynamic might
facilitate the establishment of such an institution and otherwise remain invis-
ible. If humanitys extrapolated volition is instead that each person should get
an endowment of resources that she can use as she pleases so long as she respects
the equal rights of others, then the CEV dynamic could make this come true by
26
|
CHOOSING THE CRITERIA FORCHOOSING
operating in the background much like a law of nature, to prevent trespass, the,
assault, and other nonconsensual impingements.
16
e structure of the CEV approach thus allows for a virtually unlimited range
of outcomes. It is also conceivable that humanitys extrapolated volition would
wish that the CEV does nothing at all. In that case, the AI implementing CEV
should, upon having established with sucient probability that this is what
humanitys extrapolated volition would wish it to do, safely shut itself down.
Further remarks
e CEV proposal, as outlined above, is of course the merest schematic. It has a
number of free parameters that could be specied in various ways, yielding dif-
ferent versions of the proposal.
One parameter is the extrapolation base: Whose volitions are to be included?
We might say “everybody,” but this answer spawns a host of further questions.
Does the extrapolation base include so-called “marginal persons” such as
embryos, fetuses, brain-dead persons, patients with severe dementias or who are
in permanent vegetative states? Does each of the hemispheres of a “split-brain”
patient get its own weight in the extrapolation and is this weight the same as that
of the entire brain of a normal subject? What about people who lived in the past
but are now dead? People who will be born in the future? Higher animals and
other sentient creatures? Digital minds? Extraterrestrials?
One option would be to include only the population of adult human beings
on Earth who are alive at the start of the time of the AI’s creation. An initial
extrapolation from this base could then decide whether and how the base should
be expanded. Since the number of “marginals” at the periphery of this base is
relatively small, the result of the extrapolation may not depend much on exactly
where the boundary is drawn—on whether, for instance, it includes fetuses or not.
at somebody is excluded from the original extrapolation base does not
imply that their wishes and well-being are disregarded. If the coherent extrapo-
lated volition of those in the extrapolation base (e.g. living adult human beings)
wishes that moral consideration be extended to other beings, then the outcome
of the CEV dynamic would reect that preference. Nevertheless, it is possible
that the interests of those who are included in the original extrapolation base
would be accommodated to a greater degree than the interests of outsiders. In
particular, if the dynamic acts only where there is broad agreement between
individual extrapolated volitions (as in Yudkowskys original proposal), there
would seem to be a signicant risk of an ungenerous blocking vote that could
prevent, for instance, the welfare of nonhuman animals or digital minds from
being protected. e result might potentially be morally rotten.
17
One motivation for the CEV proposal was to avoid creating a motive for
humans to ght over the creation of the rst superintelligent AI. Although the
CEV proposal scores better on this desideratum than many alternatives, it does
not entirely eliminate motives for conict. A selsh individual, group, or nation
MORALITY MODELS
|
27
might seek to enlarge its slice of the future by keeping others out of the extrapola-
tion base.
A power grab of this sort might be rationalized in various ways. It might be
argued, for instance, that the sponsor who funds the development of the AI
deserves to own the outcome. is moral claim is probably false. It could be
objected, for example, that the project that launches the rst successful seed AI
imposes a vast risk externality on the rest of humanity, which therefore is entitled
to compensation. e amount of compensation owed is so great that it can only
take the form of giving everybody a stake in the upside if things turn out well.
18
Another argument that might be used to rationalize a power grab is that large
segments of humanity have base or evil preferences and that including them in
the extrapolation base would risk turning humanity’s future into a dystopia. It
is dicult to know the share of good and bad in the average persons heart. It is
also dicult to know how much this balance varies between dierent groups,
social strata, cultures, or nations. Whether one is optimistic or pessimistic about
human nature, one may prefer not to wager humanitys cosmic endowment on
the speculation that, for a sucient majority of the seven billion people currently
alive, their better angels would prevail in their extrapolated volitions. Of course,
omitting a certain set of people from the extrapolation base does not guarantee
that light would triumph; and it might well be that the souls that would soon-
est exclude others or grab power for themselves tend rather to contain unusually
large amounts of darkness.
Yet another reason for ghting over the initial dynamic is that one might
believe that somebody elses AI will not work as advertised, even if the AI is billed
as a way to implement humanitys CEV. If dierent groups have dierent beliefs
about which implementation is most likely to succeed, they might ght to prevent
the others from launching. It would be better in such situations if the competing
projects could settle their epistemic dierences by some method that more reli-
ably ascertains who is right than the method of armed conict.
19
Morality models
e CEV proposal is not the only possible form of indirect normativity. For exam-
ple, instead of implementing humanitys coherent extrapolated volition, one could
try to build an AI with the goal of doing what is morally right, relying on the AI’s
superior cognitive capacities to gure out just which actions t that description.
We can call this proposal “moral rightness” (MR). e idea is that we humans
have an imperfect understanding of what is right and wrong, and perhaps an even
poorer understanding of how the concept of moral rightness is to be philosophi-
cally analyzed: but a superintelligence could understand these things better.
20
What if we are not sure whether moral realism is true? We could still attempt
the MR proposal. We should just have to make sure to specify what the AI should
do in the eventuality that its presupposition of moral realism is false. For example,
28
|
CHOOSING THE CRITERIA FORCHOOSING
we could stipulate that if the AI estimates with a sucient probability that there
are no suitable non-relative truths about moral rightness, then it should revert to
implementing coherent extrapolated volition instead, or simply shut itself down.
21
MR appears to have several advantages over CEV. MR would do away with
various free parameters in CEV, such as the degree of coherence among extrapo-
lated volitions that is required for the AI to act on the result, the ease with which
a majority can overrule dissenting minorities, and the nature of the social envi-
ronment within which our extrapolated selves are to be supposed to have “grown
up farther together.” It would seem to eliminate the possibility of a moral failure
resulting from the use of an extrapolation base that is too narrow or too wide.
Furthermore, MR would orient the AI toward morally right action even if our
coherent extrapolated volitions happen to wish for the AI to take actions that are
morally odious. As noted earlier, this seems a live possibility with the CEV pro-
posal. Moral goodness might be more like a precious metal than an abundant
element in human nature, and even aer the ore has been processed and rened
in accordance with the prescriptions of the CEV proposal, who knows whether
the principal outcome will be shining virtue, indierent slag, or toxic sludge?
MR would also appear to have some disadvantages. It relies on the notion of
“morally right,” a notoriously dicult concept, one with which philosophers have
grappled since antiquity without yet attaining consensus as to its analysis. Picking
an erroneous explication of “moral rightness” could result in outcomes that would
be morally very wrong. is diculty of dening “moral rightness” might seem
to count heavily against the MR proposal. However, it is not clear that the MR
proposal is really at a material disadvantage in this regard. e CEV proposal,
too, uses terms and concepts that are dicult to explicate (such as “knowledge,
being more the people we wished we were,” “grown up farther together,” among
others).
22
Even if these concepts are marginally less opaque than “moral right-
ness,” they are still miles removed from anything that programmers can currently
express in code.
23
e path to endowing an AI with any of these concepts might
involve giving it general linguistic ability (comparable, at least, to that of a normal
human adult). Such a general ability to understand natural language could then
be used to understand what is meant by “morally right.” If the AI could grasp the
meaning, it could search for actions that t. As the AI develops superintelligence,
it could then make progress on two fronts: on the philosophical problem of under-
standing what moral rightness is, and on the practical problem of applying this
understanding to evaluate particular actions.
24
While this would not be easy, it is
not clear that it would be any more dicult than extrapolating humanity’s coher-
ent extrapolated volition.
25
A more fundamental issue with MR is that even if can be implemented, it might
not give us what we want or what we would choose if we were brighter and bet-
ter informed. is is of course the essential feature of MR, not an accidental bug.
However, it might be a feature that would be extremely harmful to us.
26
One might try to preserve the basic idea of the MR model while reducing
its demandingness by focusing on moral permissibility: the idea being that we
MORALITY MODELS
|
29
could let the AI pursue humanity’s CEV so long as it did not act in ways that are
morally impermissible. For example, one might formulate the following goal
for the AI:
Among the actions that are morally permissible for the AI, take one that humanitys CEV
would prefer. However, if some part of this instruction has no well-specified meaning, or if
we are radically confused about its meaning, or if moral realism is false, or if we acted mor-
ally impermissibly in creating an AI with this goal, then undergo a controlled shutdown.
27
Follow the intended meaning of this instruction.
One might still worry that this moral permissibility model (MP) represents an
unpalatably high degree of respect for the requirements of morality. How big a
sacrice it would entail depends on which ethical theory is true.
28
If ethics is sat-
iscing, in the sense that it counts as morally permissible any action that con-
forms to a few basic moral constraints, then MP may leave ample room for our
coherent extrapolated volition to inuence the AI’s actions. However, if ethics is
maximizingfor example, if the only morally permissible actions are those that
have the morally best consequences—then MP may leave little or no room for our
own preferences to shape the outcome.
To illustrate this concern, let us return for a moment to the example of hedon-
istic consequentialism. Suppose that this ethical theory is true, and that the AI
knows it to be so. For present purposes, we can dene hedonistic consequential-
ism as the claim that an action is morally right (and morally permissible) if and
only if, among all feasible actions, no other action would produce a greater bal-
ance of pleasure over suering. e AI, following MP, might maximize the surfeit
of pleasure by converting the accessible universe into hedonium, a process that
may involve building computronium and using it to perform computations that
instantiate pleasurable experiences. Since simulating any existing human brain is
not the most ecient way of producing pleasure, a likely consequence is that we
all die.
By enacting either the MR or the MP proposal, we would thus risk sacricing
our lives for a greater good. is would be a bigger sacrice than one might think,
because what we stand to lose is not merely the chance to live out a normal human
life but the opportunity to enjoy the far longer and richer lives that a friendly
superintelligence could bestow.
e sacrice looks even less appealing when we reect that the superintelli-
gence could realize a nearly-as-great good (in fractional terms) while sacricing
much less of our own potential well-being. Suppose that we agreed to allow almost
the entire accessible universe to be converted into hedonium—everything except
a small preserve, say the Milky Way, which would be set aside to accommodate
our own needs. en there would still be a hundred billion galaxies devoted to the
maximization of pleasure. But we would have one galaxy within which to create
wonderful civilizations that could last for billions of years and in which humans
and nonhuman animals could survive and thrive, and have the opportunity to
develop into beatic posthuman spirits.
29
220
|
CHOOSING THE CRITERIA FORCHOOSING
If one prefers this latter option (as I would be inclined to do) it implies that one
does not have an unconditional lexically dominant preference for acting morally
permissibly. But it is consistent with placing great weight on morality.
Even from a purely moral point of view, it might be better to advocate
some proposal that is less morally ambitious than MR or MP. If the morally
best has no chance of being implemented—perhaps because of its frowning
demandingness—it might be morally better to promote some other proposal,
one that would be near-ideal and whose chances of being implemented could be
signicantly increased by our promoting it.
30
Do What I Mean
We might feel unsure whether to go for CEV, MR, MP, or something else. Could
we punt on this higher-level decision as well, ooading even more cognitive work
onto the AI? Where is the limit to our possible laziness?
Consider, for example, the following “reasons-based” goal:
Do whatever we would have had most reason to ask the AI to do.
is goal might boil down to extrapolated volition or morality or something else,
but it would seem to spare us the eort and risk involved in trying to gure out
for ourselves which of these more specic objectives we would have most reason
to select.
Some of the problems with the morality-based goals, however, also apply
here. First, we might fear that this reasons-based goal would leave too little
room for our own desires. Some philosophers maintain that a person always
has most reason to do what it would be morally best for her to do. If those phi-
losophers are right, then the reason-based goal collapses into MR—with the
concomitant risk that a superintelligence implementing such a dynamic would
kill everyone within reach. Second, as with all proposals couched in technical
language, there is a possibility that we might have misunderstood the meaning
of our own assertions. We saw that, in the case of the morality-based goals,
asking the AI to do what is right may lead to unforeseen and unwanted conse-
quences such that, had we anticipated them, we would not have implemented
the goal in question. e same applies to asking the AI to do what we have most
reason to do.
What if we try to avoid these diculties by couching a goal in emphatically
nontechnical language—such as in terms of “niceness”:
31
Take the nicest action; or, if no action is nicest, then take an action that is at least super-
duper nice.
How could there be anything objectionable about building a nice AI? But we must
ask what precisely is meant by this expression. e lexicon lists various meanings
of “nice” that are clearly not intended to be used here: we do not intend that the AI
COMPONENT LIST
|
22
should be courteous and polite nor overdelicate or fastidious. If we can count on
the AI recognizing the intended interpretation of “niceness” and being motivated
to pursue niceness in just that sense, then this goal would seem to amount to a
command to do what the programmers meant for the AI to do.
32
An injunction to
similar eect was included in the formulation of CEV (“. . . interpreted as we wish
that interpreted”) and in the moral-permissibility criterion as rendered earlier
(“. . . follow the intended meaning of this instruction”). By axing such a “Do
What I Mean” clause we may indicate that the other words in the goal description
should be construed charitably rather than literally. But saying that the AI should
be “nice” adds almost nothing: the real work is done by the “Do What I Mean”
instruction. If we knew how to code “Do What I Mean” in a general and powerful
way, we might as well use that as a standalone goal.
How might one implement such a “Do What I Mean” dynamic? at is, how
might we create an AI motivated to charitably interpret our wishes and unspoken
intentions and to act accordingly? One initial step could be to try to get clearer
about what we mean by “Do What I Mean.” It might help if we could explicate
this in more behavioristic terms, for example in terms of revealed preferences in
various hypothetical situations—such as situations in which we had more time
to consider the options, in which we were smarter, in which we knew more of
the relevant facts, and in which in various other ways conditions would be more
favorable for us accurately manifesting in concrete choices what we mean when
we say that we want an AI that is friendly, benecial, nice ...
Here, of course, we come full circle. We have returned to the indirect norma-
tivity approach with which we started—the CEV proposal, which, in essence,
expunges all concrete content from the value specication, leaving only an
abstract value dened in purely procedural terms: to do that which we would have
wished for the AI to do in suitably idealized circumstances. By means of such
indirect normativity, we could hope to ooad to the AI much of the cognitive
work that we ourselves would be trying to perform if we attempted to articulate
a more concrete description of what values the AI is to pursue. In seeking to take
full advantage of the AI’s epistemic superiority, CEV can thus be seen as an appli-
cation of the principle of epistemic deference.
Component list
So far we have considered dierent options for what content to put into the goal
system. But an AI’s behavior will also be inuenced by other design choices. In
particular, it can make a critical dierence which decision theory and which epis-
temology it uses. Another important question is whether the AI’s plans will be
subject to human review before being put into action.
Table 13 summarizes these design choices. A project that aims to build a super-
intelligence ought to be able to explain what choices it has made regarding each of
these components, and to justify why those choices were made.
33
222
|
CHOOSING THE CRITERIA FORCHOOSING
Goal content
We have already discussed how indirect normativity might be used in specifying
the values that the AI is to pursue. We discussed some options, such as morality-
based models and coherent extrapolated volition. Each such option creates fur-
ther choices that need to be made. For instance, the CEV approach comes in many
varieties, depending on who is included in the extrapolation base, the structure
of the extrapolation, and so forth. Other forms of motivation selection methods
might call for dierent types of goal content. For example, an oracle might be built
to place a value on giving accurate answers. An oracle constructed with domes-
ticity motivation might also have goal content that disvalues the excessive use of
resources in producing its answers.
Another design choice is whether to include special provisions in the goal con-
tent to reward individuals who contribute to the successful realization of the AI,
for example by giving them extra resources or inuence over the AI’s behavior.
We can term any such provisions “incentive wrapping.” Incentive wrapping could
be seen as a way to increase the likelihood that the project will be successful, at the
cost of compromising to some extent the goal that the project set out to achieve.
For example, if the projects goal is to create a dynamic that implements
humanitys coherent extrapolated volition, then an incentive wrapping scheme
might specify that certain individuals’ volitions should be given extra weight
in the extrapolation. If such a project is successful, the result is not necessarily
the implementation of humanitys coherent extrapolated volition. Instead, some
approximation to this goal might be achieved.
34
Since incentive wrapping would be a piece of goal content that would be inter-
preted and pursued by a superintelligence, it could take advantage of indirect nor-
mativity to specify subtle and complicated provisions that would be dicult for a
human manager to implement. For example, instead of rewarding programmers
according to some crude but easily accessible metric, such as how many hours they
worked or how many bugs they corrected, the incentive wrapping could specify that
Table 3 Component list
Goal content What objective should the AI pursue? How should a description of
this objective be interpreted? Should the objective include giving
special rewards to those who contributed to the project’s success?
Decision theory Should the AI use causal decision theory, evidential decision theory,
updateless decision theory, or something else?
Epistemology What should the AI’s prior probability function be, and what other
explicit or implicit assumptions about the world should it make?
What theory of anthropics should it use?
Ratification Should the AI’s plans be subjected to human review before being
put into eect? If so, what is the protocol for that review process?
COMPONENT LIST
|
223
programmers “are to be rewarded in proportion to how much their contributions
increased some reasonable ex ante probability of the project being successfully com-
pleted in the way the sponsors intended.” Further, there would be no reason to limit
the incentive wrapping to project sta. It could instead specify that every person
should be rewarded according to their just deserts. Credit allocation is a dicult
problem, but a superintelligence could be expected to do a reasonable job of approx-
imating the criteria specied, explicitly or implicitly, by the incentive wrapping.
It is conceivable that the superintelligence might even nd some way of
rewarding individuals who have died prior to the superintelligence’s creation.
35
e incentive wrapping could then be extended to embrace at least some of the
deceased, potentially including individuals who died before the project was con-
ceived, or even antedating the rst enunciation of the concept of incentive wrap-
ping. Although the institution of such a retroactive policy would not causally
incentivize those people who are already resting in their graves as these words are
being put to the page, it might be favored for moral reasons—though it could be
argued that insofar as fairness is a goal, it should be included as part of the target
specication proper rather than in the surrounding incentive wrapping.
We cannot here delve into all the ethical and strategic issues associated with
incentive wrapping. A projects position on these issues, however, would be an
important aspect of its fundamental design concept.
Decision theory
Another important design choice is which decision theory the AI should be built
to use. is might aect how the AI behaves in certain strategically fateful situ-
ations. It might determine, for instance, whether the AI is open to trade with, or
extortion by, other superintelligent civilizations whose existence it hypothesizes.
e particulars of the decision theory could also matter in predicaments involv-
ing nite probabilities of innite payos (“Pascalian wagers”) or extremely small
probabilities of extremely large nite payos (“Pascalian muggings”) or in con-
texts where the AI is facing fundamental normative uncertainty or where there
are multiple instantiations of the same agent program.
36
e options on the table include causal decision theory (in a variety of avors)
and evidential decision theory, along with newer candidates such as “timeless
decision theory” and “updateless decision theory,” which are still under develop-
ment.
37
It may prove dicult to identify and articulate the correct decision theory,
and to have justied condence that we have got it right. Although the prospects
for directly specifying an AI’s decision theory are perhaps more hopeful than
those of directly specifying its nal values, we are still confronted with a substan-
tial risk of error. Many of the complications that might break the currently most
popular decision theories were discovered only recently, suggesting that there
might exist further problems that have not yet come into sight. e result of giv-
ing the AI a awed decision theory might be disastrous, possibly amounting to an
existential catastrophe.
224
|
CHOOSING THE CRITERIA FORCHOOSING
In view of these diculties, one might consider an indirect approach to speci-
fying the decision theory that the AI should use. Exactly how to do this is not yet
clear. We might want the AI to use “that decision theory D which we would have
wanted it to use had we thought long and hard about the matter.” However, the AI
would need to be able to make decisions before learning what D is. It would thus
need some eective interim decision theory D that would govern its search for
D. One might try to dene D to be some sort of superposition of the AI’s current
hypotheses about D (weighed by their probabilities), though there are unsolved
technical problems with how to do this in a fully general way.
38
ere is also cause
for concern that the AI might make irreversibly bad decisions (such as rewrit-
ing itself to henceforth run on some awed decision theory) during the learning
phase, before the AI has had the opportunity to determine which particular deci-
sion theory is correct. To reduce the risk of derailment during this period of vul-
nerability we might instead try to endow the seed AI with some form of restricted
rationality: a deliberately simplied but hopefully dependable decision theory
that staunchly ignores esoteric considerations, even ones we think may ultimately
be legitimate, and that is designed to replace itself with a more sophisticated (indi-
rectly specied) decision theory once certain conditions are met.
39
It is an open
research question whether and how this could be made to work.
Epistemology
A project will also need to make a fundamental design choice in selecting the
AIs epistemology, specifying the principles and criteria whereby empirical
hypotheses are to be evaluated. Within a Bayesian framework, we can think of
the epistemology as a prior probability function—the AI’s implicit assignment of
probabilities to possible worlds before it has taken any perceptual evidence into
account. In other frameworks, the epistemology might take a dierent form; but
in any case some inductive learning rule is necessary if the AI is to generalize
from past observations and make predictions about the future.
40
As with the goal
content and the decision theory, however, there is a risk that our epistemology
specication could miss the mark.
One might think that there is a limit to how much damage could arise from
an incorrectly specied epistemology. If the epistemology is too dysfunctional,
then the AI could not be very intelligent and it could not pose the kind of risk dis-
cussed in this book. But the concern is that we may specify an epistemology that
is suciently sound to make the AI instrumentally eective in most situations,
yet which has some aw that leads the AI astray on some matter of crucial impor-
tance. Such an AI might be akin to a quick-witted person whose worldview is
predicated on a false dogma, held to with absolute conviction, who consequently
“tilts at windmills” and gives his all in pursuit of fantastical or harmful objectives.
Certain kinds of subtle dierence in an AIs prior could turn out to make a
drastic dierence to how it behaves. For example, an AI might be given a prior
that assigns zero probability to the universe being innite. No matter how much
COMPONENT LIST
|
225
astronomical evidence it accrues to the contrary, such an AI would stubbornly
reject any cosmological theory that implied an innite universe; and it might
make foolish choices as a result.
41
Or an AI might be given a prior that assigns
a zero probability to the universe not being Turing-computable (this is in fact
a common feature of many of the priors discussed in the literature, including
the Kolmogorov complexity prior mentioned in Chapter 1), again with poorly
understood consequences if the embedded assumption—known as the “Church–
Turing thesis”—should turn out to be false. An AI could also end up with a prior
that makes strong metaphysical commitments of one sort or another, for instance
by ruling out a priori the possibility that any strong form of mind–body dual-
ism could be true or the possibility that there are irreducible moral facts. If any
of those commitments is mistaken, the AI might seek to realize its nal goals
in ways that we would regard as perverse instantiations. Yet there is no obvious
reason why such an AI, despite being fundamentally wrong about one important
matter, could not be suciently instrumentally eective to secure a decisive stra-
tegic advantage. (Anthropics, the study of how to make inferences from indexical
information in the presence of observation selection eects, is another area where
the choice of epistemic axioms could prove pivotal.
42
)
We might reasonably doubt our ability to resolve all foundational issues in epis-
temology in time for the construction of the rst seed AI. We may, therefore, con-
sider taking an indirect approach to specifying the AI’s epistemology. is would
raise many of the same issues as taking an indirect approach to specifying its deci-
sion theory. In the case of epistemology, however, there may be greater hope of
benign convergence, with any of a wide class of epistemologies providing an ade-
quate foundation for safe and eective AI and ultimately yielding similar doxastic
results. e reason for this is that suciently abundant empirical evidence and
analysis would tend to wash out any moderate dierences in prior expectations.
43
A good aim would be to endow the AI with fundamental epistemological prin-
ciples that match those governing our own thinking. Any AI diverging from this
ideal is an AI that we would judge to be reasoning incorrectly if we consistently
applied our own standards. Of course, this applies only to our fundamental epis-
temological principles. Non-fundamental principles should be continuously cre-
ated and revised by the seed AI itself as it develops its understanding of the world.
e point of superintelligence is not to pander to human preconceptions but to
make mincemeat out of our ignorance and folly.
Ratification
e nal item in our list of design choices is ratication. Should the AIs plans
be subjected to human review before being put into eect? For an oracle, this
question is implicitly answered in the armative. e oracle outputs information;
human reviewers choose whether and how to act upon it. For genies, sovereigns,
and tool-AIs, however, the question of whether to use some form of ratication
remains open.
2 26
|
CHOOSING THE CRITERIA FORCHOOSING
To illustrate how ratication might work, consider an AI intended to func-
tion as a sovereign implementing humanitys CEV. Instead of launching this AI
directly, imagine that we rst built an oracle AI for the sole purpose of answering
questions about what the sovereign AI would do. As earlier chapters revealed,
there are risks in creating a superintelligent oracle (such as risks of mind crime or
infrastructure profusion). But for purposes of this example let us assume that the
oracle AI has been successfully implemented in a way that avoided these pitfalls.
We thus have an oracle AI that oers us its best guesses about the consequences
of running some piece of code intended to implement humanitys CEV. e ora-
cle may not be able to predict in detail what would happen, but its predictions
are likely to be better than our own. (If it were impossible even for a superintel-
ligence to predict anything about the code would do, we would be crazy to run
it.) So the oracle ponders for a while and then presents its forecast. To make the
answer intelligible, the oracle may oer the operator a range of tools with which to
explore various features of the predicted outcome. e oracle could show pictures
of what the future looks like and provide statistics about the number of sentient
beings that will exist at dierent times, along with average, peak, and lowest lev-
els of well-being. It could oer intimate biographies of several randomly selected
individuals (perhaps imaginary people selected to be probably representative). It
could highlight aspects of the future that the operator might not have thought of
inquiring about but which would be regarded as pertinent once pointed out.
Being able to preview the outcome in this manner has obvious advantages. e
preview could reveal the consequences of an error in a planned sovereign’s design
specications or source code. If the crystal ball shows a ruined future, we could
scrap the code for the planned sovereign AI and try something else. A strong case
could be made that we should familiarize ourselves with the concrete ramica-
tions of an option before committing to it, especially when the entire future of the
race is on the line.
What is perhaps less obvious is that ratication also has potentially signicant
disadvantages. e irenic quality of CEV might be undermined if opposing fac-
tions, instead of submitting to the arbitration of superior wisdom in condent
expectation of being vindicated, could see in advance what the verdict would
be. A proponent of the morality-based approach might worry that the spon-
sor’s resolve would collapse if all the sacrices required by the morally optimal
were to be revealed. And we might all have reason to prefer a future that holds
some surprises, some dissonance, some wildness, some opportunities for self-
overcoming—a future whose contours are not too snugly tailored to present
preconceptions but provide some give for dramatic movement and unplanned
growth. We might be less likely to take such an expansive view if we could cherry-
pick every detail of the future, sending back to the drawing board any dra that
does not fully conform to our fancy at that moment.
e issue of sponsor ratication is therefore less clear-cut than it might initially
seem. Nevertheless, on balance it would seem prudent to take advantage of an
opportunity to preview, if that functionality is available. But rather than letting
GETTING CLOSE ENOUGH
|
227
the reviewer ne-tune every aspect of the outcome, we might give her a simple
veto which could be exercised only a few times before the entire project would be
aborted.
44
Getting close enough
e main purpose of ratication would be to reduce the probability of catastrophic
error. In general, it seems wise to aim at minimizing the risk of catastrophic error
rather than at maximizing the chance of every detail being fully optimized. ere
are two reasons for this. First, humanity’s cosmic endowment is astronomically
large—there is plenty to go around even if our process involves some waste or
accepts some unnecessary constraints. Second, there is a hope that if we but get
the initial conditions for the intelligence explosion approximately right, then the
resulting superintelligence may eventually home in on, and precisely hit, our ulti-
mate objectives. e important thing is to land in the right attractor basin.
With regard to epistemology, it is plausible that a wide range of priors will ulti-
mately converge to very similar posteriors (when computed by a superintelligence
and conditionalized on a realistic amount of data). We therefore need not worry
about getting the epistemology exactly right. We must just avoid giving the AI a
prior that is so extreme as to render the AI incapable of learning vital truths even
with the benet of copious experience and analysis.
45
With regard to decision theory, the risk of irrecoverable error seems larger.
We might still hope to directly specify a decision theory that is good enough.
Asuperintelligent AI could switch to a new decision theory at any time; however,
if it starts out with a suciently wrong decision theory it may not see the reason
to switch. Even if an agent comes to see the benets of having a dierent deci-
sion theory, the realization might come too late. For example, an agent designed
to refuse blackmail might enjoy the benet of deterring would-be extortionists.
For this reason, blackmailable agents might do well to proactively adopt a non-
exploitable decision theory. Yet once a blackmailable agent receives the threat and
regards it as credible, the damage is done.
Given an adequate epistemology and decision theory, we could try to design
the system to implement CEV or some other indirectly specied goal content.
Again there is hope of convergence: that dierent ways of implementing a CEV-
like dynamic would lead to the same utopian outcome. Short of such convergence,
we may still hope that many of the dierent possible outcomes are good enough
to count as existential success.
It is not necessary for us to create a highly optimized design. Rather, our focus
should be on creating a highly reliable design, one that can be trusted to retain
enough sanity to recognize its own failings. An imperfect superintelligence,
whose fundamentals are sound, would gradually repair itself; and having done
so, it would exert as much benecial optimization power on the world as if it had
been perfect from the outset.
228
|
THE STRATEGIC PICTURE
CHAPTER 14
The strategic picture
I
t is now time to consider the challenge of superintelligence in a broader con-
text. We would like to orient ourselves in the strategic landscape suciently
to know at least which general direction we should be heading. is, it turns
out, is not at all easy. Here in the penultimate chapter, we introduce some general
analytical concepts that help us think about long-term science and technology
policy issues. We then apply them to the issue of machine intelligence.
It can be illuminating to make a rough distinction between two dierent norma-
tive stances from which a proposed policy may be evaluated. e person-aecting
perspective asks whether a proposed change would be in “our interest”—that is to
say, whether it would (on balance, and in expectation) be in the interest of those
morally considerable creatures who either already exist or will come into exist-
ence independently of whether the proposed change occurs or not. e imper-
sonal perspective, in contrast, gives no special consideration to currently existing
people, or to those who will come to exist independently of whether the proposed
change occurs. Instead, it counts everybody equally, independently of their tem-
poral location. e impersonal perspective sees great value in bringing new peo-
ple into existence, provided they have lives worth living: the more happy lives
created, the better.
is distinction, although it barely hints at the moral complexities associated
with a machine intelligence revolution, can be useful in a rst-cut analysis. Here we
will rst examine matters from the impersonal perspective. We will later see what
changes if person-aecting considerations are given weight in our deliberations.
Science and technology strategy
Before we zoom in on issues specic to machine superintelligence, we must intro-
duce some strategic concepts and considerations that pertain to scientic and
technological development more generally.
SCIENCE AND TECHNOLOGY STRATEGY
|
229
Dierential technological development
Suppose that a policymaker proposes to cut funding for a certain research eld,
out of concern for the risks or long-term consequences of some hypothetical tech-
nology that might eventually grow from its soil. She can then expect a howl of
opposition from the research community.
Scientists and their public advocates oen say that it is futile to try to control
the evolution of technology by blocking research. If some technology is feasible
(the argument goes) it will be developed regardless of any particular policymak-
er’s scruples about speculative future risks. Indeed, the more powerful the capa-
bilities that a line of development promises to produce, the surer we can be that
somebody, somewhere, will be motivated to pursue it. Funding cuts will not stop
progress or forestall its concomitant dangers.
Interestingly, this futility objection is almost never raised when a policymaker
proposes to increase funding to some area of research, even though the argument
would seem to cut both ways. One rarely hears indignant voices protest: “Please
do not increase our funding. Rather, make some cuts. Researchers in other coun-
tries will surely pick up the slack; the same work will get done anyway. Don’t
squander the public’s treasure on domestic scientic research!”
What accounts for this apparent doublethink? One plausible explanation,
of course, is that members of the research community have a self-serving bias
which leads us to believe that research is always good and tempts us to embrace
almost any argument that supports our demand for more funding. However, it
is also possible that the double standard can be justied in terms of national self-
interest. Suppose that the development of a technology has two eects: giving a
small benet B to its inventors and the country that sponsors them, while impos-
ing an aggregately larger harm H—which could be a risk externality—on every-
body. Even somebody who is largely altruistic might then choose to develop
the overall harmful technology. ey might reason that the harm H will result
no matter what they do, since if they refrain somebody else will develop the
technology anyway; and given that total welfare cannot be aected, they might
as well grab the benet B for themselves and their nation. (“Unfortunately, there
will soon be a device that will destroy the world. Fortunately, we got the grant
to build it!”)
Whatever the explanation for the futility objection’s appeal, it fails to show that
there is in general no impersonal reason for trying to steer technological develop-
ment. It fails even if we concede the motivating idea that with continued scientic
and technological development eorts, all relevant technologies will eventually
be developed—that is, even if we concede the following:
Technological completion conjecture
If scientific and technological development eorts do not eectively cease, then all
important basic capabilities that could be obtained through some possible technol-
ogy will be obtained.
230
|
THE STRATEGIC PICTURE
ere are at least two reasons why the technological completion conjecture does
not imply the futility objection. First, the antecedent might not hold, because it is
not in fact a given that scientic and technological development eorts will not
eectively cease (before the attainment of technological maturity). is reserva-
tion is especially pertinent in a context that involves existential risk. Second, even
if we could be certain that all important basic capabilities that could be obtained
through some possible technology will be obtained, it could still make sense to
attempt to inuence the direction of technological research. What matters is not
only whether a technology is developed, but also when it is developed, by whom,
and in what context. ese circumstances of birth of a new technology, which
shape its impact, can be aected by turning funding spigots on or o (and by
wielding other policy instruments).
ese reections suggest a principle that would have us attend to the relative
speed with which dierent technologies are developed:
2
The principle of dierential technological development
Retard the development of dangerous and harmful technologies, especially ones
that raise the level of existential risk; and accelerate the development of beneficial
technologies, especially those that reduce the existential risks posed by nature or by
other technologies.
A policy could thus be evaluated on the basis of how much of a dierential advan-
tage it gives to desired forms of technological development over undesired forms.
3
Preferred order of arrival
Some technologies have an ambivalent eect on existential risks, increasing some
existential risks while decreasing others. Superintelligence is one such technology.
We have seen in earlier chapters that the introduction of machine super-
intelligence would create a substantial existential risk. But it would reduce
many other existential risks. Risks from nature—such as asteroid impacts,
supervolcanoes, and natural pandemics—would be virtually eliminated, since
superintelligence could deploy countermeasures against most such hazards,
or at least demote them to the non-existential category (for instance, via space
colonization).
ese existential risks from nature are comparatively small over the relevant
timescales. But superintelligence would also eliminate or reduce many anthropo-
genic risks. In particular, it would reduce risks of accidental destruction, includ-
ing risk of accidents related to new technologies. Being generally more capable
than humans, a superintelligence would be less likely to make mistakes, and
more likely to recognize when precautions are needed, and to implement precau-
tions competently. A well-constructed superintelligence might sometimes take a
risk, but only when doing so is wise. Furthermore, at least in scenarios where the
superintelligence forms a singleton, many non-accidental anthropogenic existen-
tial risks deriving from global coordination problems would be eliminated. ese
SCIENCE AND TECHNOLOGY STRATEGY
|
23
include risks of wars, technology races, undesirable forms of competition and
evolution, and tragedies of the commons.
Since substantial peril would be associated with human beings developing syn-
thetic biology, molecular nanotechnology, climate engineering, instruments for
biomedical enhancement and neuropsychological manipulation, tools for social
control that may facilitate totalitarianism or tyranny, and other technologies as-
yet unimagined, eliminating these types of risk would be a great boon. An argu-
ment could therefore be mounted that earlier arrival dates of superintelligence
are preferable. However, if risks from nature and from other hazards unrelated
to future technology are small, then this argument could be rened: what mat-
ters is that we get superintelligence before other dangerous technologies, such
as advanced nanotechnology. Whether it happens sooner or later may not be so
important (from an impersonal perspective) so long as the order of arrival is right.
e ground for preferring superintelligence to come before other potentially
dangerous technologies, such as nanotechnology, is that superintelligence would
reduce the existential risks from nanotechnology but not vice versa.
4
Hence, if
we create superintelligence rst, we will face only those existential risks that are
associated with superintelligence; whereas if we create nanotechnology rst, we
will face the risks of nanotechnology and then, additionally, the risks of superin-
telligence.
5
Even if the existential risks from superintelligence are very large, and
even if superintelligence is the riskiest of all technologies, there could thus be a
case for hastening its arrival.
ese “sooner-is-better” arguments, however, presuppose that the riskiness of
creating superintelligence is the same regardless of when it is created. If, instead,
its riskiness declines over time, it might be better to delay the machine intelli-
gence revolution. While a later arrival would leave more time for other existential
catastrophes to intercede, it could still be preferable to slow the development of
superintelligence. is would be especially plausible if the existential risks asso-
ciated with superintelligence are much larger than those associated with other
disruptive technologies.
ere are several quite strong reasons to believe that the riskiness of an intel-
ligence explosion will decline signicantly over a multidecadal timeframe. One
reason is that a later date leaves more time for the development of solutions to
the control problem. e control problem has only recently been recognized, and
most of the current best ideas for how to approach it were discovered only within
the past decade or so (and in several cases during the time that this book was
being written). It is plausible that the state of the art will advance greatly over the
next several decades; and if the problem turns out to be very dicult, a signi-
cant rate of progress might continue for a century or more. e longer it takes for
superintelligence to arrive, the more such progress will have been made when it
does. is is an important consideration in favor of later arrival dates—and a very
strong consideration against extremely early arrival dates.
Another reason why superintelligence later might be safer is that this would
allow more time for various benecial background trends of human civilization
232
|
THE STRATEGIC PICTURE
to play themselves out. How much weight one attaches to this consideration will
depend on how optimistic one is about these trends.
An optimist could certainly point to a number of encouraging indicators and
hopeful possibilities. People might learn to get along better, leading to reductions
in violence, war, and cruelty; and global coordination and the scope of politi-
cal integration might increase, making it easier to escape undesirable technology
races (more on this below) and to work out an arrangement whereby the hoped-
for gains from an intelligence explosion would be widely shared. ere appear to
be long-term historical trends in these directions.
6
Further, an optimist could expect that the “sanity level” of humanity will rise
over the course of this century—that prejudices will (on balance) recede, that
insights will accumulate, and that people will become more accustomed to think-
ing about abstract future probabilities and global risks. With luck, we could see a
general upli of epistemic standards in both individual and collective cognition.
Again, there are trends pushing in these directions. Scientic progress means that
more will be known. Economic growth may give a greater portion of the world’s
population adequate nutrition (particularly during the early years of life that are
important for brain development) and access to quality education. Advances in
information technology will make it easier to nd, integrate, evaluate, and com-
municate data and ideas. Furthermore, by the century’s end, humanity will have
made an additional hundred years’ worth of mistakes, from which something
might have been learned.
Many potential developments are ambivalent in the abovementioned sense
increasing some existential risks and decreasing others. For example, advances in
surveillance, data mining, lie detection, biometrics, and psychological or neuro-
chemical means of manipulating beliefs and desires could reduce some existential
risks by making it easier to coordinate internationally or to suppress terrorists
and renegades at home. ese same advances, however, might also increase some
existential risks by amplifying undesirable social dynamics or by enabling the
formation of permanently stable totalitarian regimes.
One important frontier is the enhancement of biological cognition, such as
through genetic selection. When we discussed this in Chapters 2 and 3, we
concluded that the most radical forms of superintelligence would be more
likely to arise in the form of machine intelligence. at claim is consistent with
cognitive enhancement playing an important role in the lead-up to, and crea-
tion of, machine superintelligence. Cognitive enhancement might seem obvi-
ously risk-reducing: the smarter the people working on the control problem,
the more likely they are to nd a solution. However, cognitive enhancement
could also hasten the development of machine intelligence, thus reducing the
time available to work on the problem. Cognitive enhancement would also
have many other relevant consequences. ese issues deserve a closer look.
(Most of the following remarks about “cognitive enhancement” apply equally
to non-biological means of increasing our individual or collective epistemic
eectiveness.)
SCIENCE AND TECHNOLOGY STRATEGY
|
233
Rates of change and cognitive enhancement
An increase in either the mean or the upper range of human intellectual ability
would likely accelerate technological progress across the board, including pro-
gress toward various forms of machine intelligence, progress on the control prob-
lem, and progress on a wide swath of other technical and economic objectives.
What would be the net eect of such acceleration?
Consider the limiting case of a “universal accelerator,” an imaginary interven-
tion that accelerates literally everything. e action of such a universal accelerator
would correspond merely to an arbitrary rescaling of the time metric, producing
no qualitative change in observed outcomes.
7
If we are to make sense of the idea that cognitive enhancement might gener-
ally speed things up, we clearly need some other concept than that of universal
acceleration. A more promising approach is to focus on how cognitive enhance-
ment might increase the rate of change in one type of process relative to the rate of
change in some other type of process. Such dierential acceleration could aect a
systems dynamics. us, consider the following concept:
Macro-structural development acceleratorA lever that accelerates the rate at which
macro-structural features of the human condition develop, while leaving unchanged the
rate at which micro-level human aairs unfold.
Imagine pulling this lever in the decelerating direction. A brake pad is lowered
onto the great wheel of world history; sparks y and metal screeches. Aer the
wheel has settled into a more leisurely pace, the result is a world in which tech-
nological innovation occurs more slowly and in which fundamental or globally
signicant change in political structure and culture happens less frequently and
less abruptly. A greater number of generations come and go before one era gives
way to another. During the course of a lifespan, a person sees little change in the
basic structure of the human condition.
For most of our species’ existence, macro-structural development was slower
than it is now. Fiy thousand years ago, an entire millennium might have
elapsed without a single signicant technological invention, without any notice-
able increase in human knowledge and understanding, and without any glob-
ally meaningful political change. On a micro-level, however, the kaleidoscope of
human aairs churned at a reasonable rate, with births, deaths, and other person-
ally and locally signicant events. e average persons day might have been more
action-packed in the Pleistocene than it is today.
If you came upon a magic lever that would let you change the rate of macro-
structural development, what should you do? Ought you to accelerate, decelerate,
or leave things as they are?
Assuming the impersonal standpoint, this question requires us to consider the
eects on existential risk. Let us distinguish between two kinds of risk: “state
risks” and “step risks.” A state risk is one that is associated with being in a certain
state, and the total amount of state risk to which a system is exposed is a direct
234
|
THE STRATEGIC PICTURE
function of how long the system remains in that state. Risks from nature are typi-
cally state risks: the longer we remain exposed, the greater the chance that we
will get struck by an asteroid, supervolcanic eruption, gamma ray burst, naturally
arising pandemic, or some other slash of the cosmic scythe. Some anthropogenic
risks are also state risks. At the level of an individual, the longer a soldier pokes his
head up above the parapet, the greater the cumulative chance he will be shot by an
enemy sniper. ere are anthropogenic state risks at the existential level as well:
the longer we live in an internationally anarchic system, the greater the cumula-
tive chance of a thermonuclear Armageddon or of a great war fought with other
kinds of weapons of mass destruction, laying waste to civilization.
A step risk, by contrast, is a discrete risk associated with some necessary or
desirable transition. Once the transition is completed, the risk vanishes. e
amount of step risk associated with a transition is usually not a simple function of
how long the transition takes. One does not halve the risk of traversing a mineeld
by running twice as fast. Conditional on a fast takeo, the creation of superintel-
ligence might be a step risk: there would be a certain risk associated with the
takeo, the magnitude of which would depend on what preparations had been
made; but the amount of risk might not depend much on whether the takeo
takes twenty milliseconds or twenty hours.
We can then say the following regarding a hypothetical macro-structural
development accelerator:
• Insofar as we are concerned with existential state risks, we should favor
acceleration provided we think we have a realistic prospect of making it through
to a post-transition era in which any further existential risks are greatly reduced.
• If it were known that there is some step ahead destined to cause an existential ca-
tastrophe, then we ought to reduce the rate of macro-structural development (or
even put it in reverse) in order to give more generations a chance to exist before the
curtain is rung down. But, in fact, it would be overly pessimistic to be so confident
that humanity is doomed.
• At present, the level of existential state risk appears to be relatively low. If we imag-
ine the technological macro-conditions for humanity frozen in their current state,
it seems very unlikely that an existential catastrophe would occur on a timescale of,
say, a decade. So a delay of one decadeprovided it occurred at our current stage
of development or at some other time when state risk is lowwould incur only a
very minor existential state risk, whereas a postponement by one decade of subse-
quent technological developments might well have a significant beneficial impact on
later existential step risks, for example by allowing more time for preparation.
Upshot: the main way that the speed of macro-structural development is
important is by aecting how well prepared humanity is when the time comes to
confront the key step risks.
8
So the question we must ask is how cognitive enhancement (and concomitant
acceleration of macro-structural development) would aect the expected level
of preparedness at the critical juncture. Should we prefer a shorter period of
SCIENCE AND TECHNOLOGY STRATEGY
|
235
preparation with higher intelligence? With higher intelligence, the preparation
time could be used more eectively, and the nal critical step would be taken by a
more intelligent humanity. Or should we prefer to operate with closer to current
levels of intelligence if that gives us more time to prepare?
Which option is better depends on the nature of the challenge being prepared
for. If the challenge were to solve a problem for which learning from experience
is key, then the chronological length of the preparation period might be the
determining factor, since time is needed for the requisite experience to accumu-
late. What would such a challenge look like? One hypothetical example would
be a new weapons technology that we could predict would be developed at some
point in the future and that would make it the case that any subsequent war
would have, let us say, a one-in-ten chance of causing an existential catastro-
phe. If such were the nature of the challenge facing us, then we might wish the
rate of macro-structural development to be slow, so that our species would have
more time to get its act together before the critical step when the new weapons
technology is invented. One could hope that during the grace period secured
through the deceleration, our species might learn to avoid war—that interna-
tional relations around the globe might come to resemble those between the
countries of the European Union, which, having fought one another ferociously
for centuries, now coexist in peace and relative harmony. e pacication might
occur as a result of the gentle edication from various civilizing processes or
through the shock therapy of sub-existential blows (e.g. small nuclear cona-
grations, and the recoil and resolve they might engender to nally create the
global institutions necessary for the abolishment of interstate wars). If this kind
of learning or adjusting would not be much accelerated by increased intelli-
gence, then cognitive enhancement would be undesirable, serving merely to
burn the fuse faster.
A prospective intelligence explosion, however, may present a challenge of a
dierent kind. e control problem calls for foresight, reasoning, and theoreti-
cal insight. It is less clear how increased historical experience would help. Direct
experience of the intelligence explosion is not possible (until too late), and many
features conspire to make the control problem unique and lacking in relevant his-
torical precedent. For these reasons, the amount of time that will elapse before the
intelligence explosion may not matter much per se. Perhaps what matters, instead,
is (a) the amount of intellectual progress on the control problem achieved by the
time of the detonation; and (b) the amount of skill and intelligence available at
the time to implement the best available solutions (and to improvise what is miss-
ing).
9
at this latter factor should respond positively to cognitive enhancement is
obvious. How cognitive enhancement would aect factor (a) is a somewhat subtler
matter.
Suppose, as suggested earlier, that cognitive enhancement would be a general
macro-structural development accelerator. is would hasten the arrival of the
intelligence explosion, thus reducing the amount of time available for prepar-
ation and for making progress on the control problem. Normally this would be a
236
|
THE STRATEGIC PICTURE
bad thing. However, if the only reason why there is less time available for intel-
lectual progress is that intellectual progress is speeded up, then there need be no
net reduction in the amount of intellectual progress that will have taken place by
the time the intelligence explosion occurs.
At this point, cognitive enhancement might appear to be neutral with respect
to factor (a): the same intellectual progress that would otherwise have been made
prior to the intelligence explosion—including progress on the control problem—
still gets made, only compressed within a shorter time interval. In actuality, how-
ever, cognitive enhancement may well prove a positive inuence on (a).
One reason why cognitive enhancement might cause more progress to have
been made on the control problem by the time the intelligence explosion occurs
is that progress on the control problem may be especially contingent on extreme
levels of intellectual performance—even more so than the kind of work nec-
essary to create machine intelligence. e role for trial and error and accu-
mulation of experimental results seems quite limited in relation to the control
problem, whereas experimental learning will probably play a large role in the
development of articial intelligence or whole brain emulation. e extent to
which time can substitute for wit may therefore vary between tasks in a way that
should make cognitive enhancement promote progress on the control problem
more than it would promote progress on the problem of how to create machine
intelligence.
Another reason why cognitive enhancement should dierentially promote
progress on the control problem is that the very need for such progress is more
likely to be appreciated by cognitively more capable societies and individuals. It
requires foresight and reasoning to realize why the control problem is important
and to make it a priority.
10
It may also require uncommon sagacity to nd promis-
ing ways of approaching such an unfamiliar problem.
From these reections we might tentatively conclude that cognitive enhance-
ment is desirable, at least insofar as the focus is on the existential risks of an intel-
ligence explosion. Parallel lines of thinking apply to other existential risks arising
from challenges that require foresight and reliable abstract reasoning (as opposed
to, e.g., incremental adaptation to experienced changes in the environment or a
multigenerational process of cultural maturation and institution-building).
Technology couplings
Suppose that one thinks that solving the control problem for articial intelligence
is very dicult, that solving it for whole brain emulations is much easier, and that
it would therefore be preferable that machine intelligence be reached via the whole
brain emulation path. We will return later to the question of whether whole brain
emulation would be safer than articial intelligence. But for now we want to make
the point that even if we accept this premiss, it would not follow that we ought to
promote whole brain emulation technology. One reason, discussed earlier, is that
a later arrival of superintelligence may be preferable, in order to allow more time
SCIENCE AND TECHNOLOGY STRATEGY
|
237
for progress on the control problem and for other favorable background trends
to culminate—and thus, if one were condent that whole brain emulation would
precede AI anyway, it would be counterproductive to further hasten the arrival of
whole brain emulation.
But even if it were the case that it would be best for whole brain emulation to
arrive as soon as possible, it still would not follow that we ought to favor pro-
gress toward whole brain emulation. For it is possible that progress toward whole
brain emulation will not yield whole brain emulation. It may instead yield neuro-
morphic articial intelligence—forms of AI that mimic some aspects of cortical
organization but do not replicate neuronal functionality with sucient delity
to constitute a proper emulation. If—as there is reason to believe—such neuro-
morphic AI is worse than the kind of AI that would otherwise have been built,
and if by promoting whole brain emulation we would make neuromorphic AI
arrive rst, then our pursuit of the supposed best outcome (whole brain emula-
tion) would lead to the worst outcome (neuromorphic AI); whereas if we had pur-
sued the second-best outcome (synthetic AI) we might actually have attained the
second-best (synthetic AI).
We have just described an (hypothetical) instance of what we might term a
“technology coupling.
11
is refers to a condition in which two technologies have
a predictable timing relationship, such that developing one of the technologies
has a robust tendency to lead to the development of the other, either as a neces-
sary precursor or as an obvious and irresistible application or subsequent step.
Technology couplings must be taken into account when we use the principle of
dierential technological development: it is no good accelerating the develop-
ment of a desirable technology Y if the only way of getting Y is by developing an
extremely undesirable precursor technology X, or if getting Y would immediately
produce an extremely undesirable related technology Z. Before you marry your
sweetheart, consider the prospective in-laws.
In the case of whole brain emulation, the degree of technology coupling
is debatable. We noted in Chapter 2 that while whole brain emulation would
require massive progress in various enabling technologies, it might not require
any major new theoretical insight. In particular, it does not require that we
understand how human cognition works, only that we know how to build com-
putational models of small parts of the brain, such as dierent species of neuron.
Nevertheless, in the course of developing the ability to emulate human brains,
a wealth of neuroanatomical data would be collected, and functional models of
cortical networks would surely be greatly improved. Such progress would seem
to have a good chance of enabling neuromorphic AI before full-blown whole
brain emulation.
12
Historically, there are quite a few examples of AI techniques
gleaned from neuroscience or biology. (For example: the McCulloch–Pitts neu-
ron, perceptrons, and other articial neurons and neural networks, inspired by
neuroanatomical work; reinforcement learning, inspired by behaviorist psychol-
ogy; genetic algorithms, inspired by evolution theory; subsumption architec-
tures and perceptual hierarchies, inspired by cognitive science theories about
238
|
THE STRATEGIC PICTURE
motor planning and sensory perception; articial immune systems, inspired by
theoretical immunology; swarm intelligence, inspired by the ecology of insect
colonies and other self-organizing systems; and reactive and behavior-based
control in robotics, inspired by the study of animal locomotion.) Perhaps more
signicantly, there are plenty of important AI-relevant questions that could
potentially be answered through further study of the brain. (For example: How
does the brain store structured representations in working memory and long-
term memory? How is the binding problem solved? What is the neural code?
How are concepts represented? Is there some standard unit of cortical processing
machinery, such as the cortical column, and if so how is it wired and how does
its functionality depend on the wiring? How can such columns be linked up, and
how can they learn?)
We will shortly have more to say about the relative danger of whole brain emula-
tion, neuromorphic AI, and synthetic AI, but we can already ag another impor-
tant technology coupling: that between whole brain emulation and AI. Even if a
push toward whole brain emulation actually resulted in whole brain emulation (as
opposed to neuromorphic AI), and even if the arrival of whole brain emulation
could be safely handled, a further risk would still remain: the risk associated with
a second transition, a transition from whole brain emulation to AI, which is an
ultimately more powerful form of machine intelligence.
ere are many other technology couplings, which could be considered in a
more comprehensive analysis. For instance, a push toward whole brain emulation
would boost neuroscience progress more generally.
13
at might produce various
eects, such as faster progress toward lie detection, neuropsychological manipu-
lation techniques, cognitive enhancement, and assorted medical advances.
Likewise, a push toward cognitive enhancement might (depending on the specic
path pursued) create spillovers such as faster development of genetic selection and
genetic engineering methods not only for enhancing cognition but for modifying
other traits as well.
Second-guessing
We encounter another layer of strategic complexity if we take into account that
there is no perfectly benevolent, rational, and unied world controller who simply
implements what has been discovered to be the best option. Any abstract point
about “what should be done” must be embodied in the form of a concrete mes-
sage, which is entered into the arena of rhetorical and political reality. ere it
will be ignored, misunderstood, distorted, or appropriated for various conicting
purposes; it will bounce around like a pinball, causing actions and reactions, ush-
ering in a cascade of consequences, the upshot of which need bear no straightfor-
ward relationship to the intentions of the original sender.
A sophisticated operator might try to anticipate these kinds of eect. Consider,
for example, the following argument template for proceeding with research to
develop a dangerous technology X. (One argument tting this template can
SCIENCE AND TECHNOLOGY STRATEGY
|
239
be found in the writings of Eric Drexler. In Drexler’s case, X = molecular
nanotechnology.
14
)
1 The risks of X are great.
2 Reducing these risks will require a period of serious preparation.
3 Serious preparation will begin only once the prospect of X is taken seriously
by broad sectors of society.
4 Broad sectors of society will take the prospect of X seriously only once a large
research eort to develop X is underway.
5 The earlier a serious research eort is initiated, the longer it will take to deliver
X (because it starts from a lower level of pre-existing enabling technologies).
6 Therefore, the earlier a serious research eort is initiated, the longer the
period during which serious preparation will be taking place, and the greater
the reduction of the risks.
7 Therefore, a serious research eort toward X should be initiated immediately.
What initially looks like a reason for going slow or stopping—the risks of
X being great—ends up, on this line of thinking, as a reason for the opposite
conclusion.
A related type of argument is that we ought—rather callously—to welcome
small and medium-scale catastrophes on grounds that they make us aware of our
vulnerabilities and spur us into taking precautions that reduce the probability of
an existential catastrophe. e idea is that a small or medium-scale catastrophe
acts like an inoculation, challenging civilization with a relatively survivable form
of a threat and stimulating an immune response that readies the world to deal
with the existential variety of the threat.
15
ese “shockem-into-reacting” arguments advocate letting something bad
happen in the hope that it will galvanize a public reaction. We mention them
here not to endorse them, but as a way to introduce the idea of (what we will term)
“second-guessing arguments.” Such arguments maintain that by treating others
as irrational and playing to their biases and misconceptions it is possible to elicit
a response from them that is more competent than if a case had been presented
honestly and forthrightly to their rational faculties.
It may seem unfeasibly dicult to use the kind of stratagems recommended by
second-guessing arguments to achieve long-term global goals. How could any-
body predict the nal course of a message aer it has been jolted hither and thither
in the pinball machine of public discourse? Doing so would seem to require pre-
dicting the rhetorical eects on myriad constituents with varied idiosyncrasies
and uctuating levels of inuence over long periods of time during which the sys-
tem may be perturbed by unanticipated events from the outside while its topology
is also undergoing a continuous endogenous reorganization: surely an impossible
task!
16
However, it may not be necessary to make detailed predictions about the
systems entire future trajectory in order to identify an intervention that can be
reasonably expected to increase the chances of a certain long-term outcome. One
might, for example, consider only the relatively near-term and predictable eects
240
|
THE STRATEGIC PICTURE
in a detailed way, selecting an action that does well in regard to those, while mod-
eling the systems behavior beyond the predictability horizon as a random walk.
ere may, however, be a moral case for de-emphasizing or refraining from
second-guessing moves. Trying to outwit one another looks like a zero-sum
game—or negative-sum, when one considers the time and energy that would be
dissipated by the practice as well as the likelihood that it would make it generally
harder for anybody to discover what others truly think and to be trusted when
expressing their own opinions.
17
A full-throttled deployment of the practices of
strategic communication would kill candor and leave truth bere to fend for her-
self in the backstabbing night of political bogeys.
Pathways and enablers
Should we celebrate advances in computer hardware? What about advances on
the path toward whole brain emulation? We will look at these two questions in
turn.
Eects of hardware progress
Faster computers make it easier to create machine intelligence. One eect of accel-
erating progress in hardware, therefore, is to hasten the arrival of machine intel-
ligence. As discussed earlier, this is probably a bad thing from the impersonal
perspective, since it reduces the amount of time available for solving the control
problem and for humanity to reach a more mature stage of civilization. e case
is not a slam dunk, though. Since superintelligence would eliminate many other
existential risks, there could be reason to prefer earlier development if the level of
these other existential risks were very high.
18
Hastening or delaying the onset of the intelligence explosion is not the only
channel through which the rate of hardware progress can aect existential risk.
Another channel is that hardware can to some extent substitute for soware;
thus, better hardware reduces the minimum skill required to code a seed AI. Fast
computers might also encourage the use of approaches that rely more heavily on
brute-force techniques (such as genetic algorithms and other generate-evaluate-
discard methods) and less on techniques that require deep understanding to use.
If brute-force techniques lend themselves to more anarchic or imprecise system
designs, where the control problem is harder to solve than in more precisely
engineered and theoretically controlled systems, this would be another way in
which faster computers would increase the existential risk.
Another consideration is that rapid hardware progress increases the likelihood
of a fast takeo. e more rapidly the state of the art advances in the semiconduc-
tor industry, the fewer the person-hours of programmers’ time spent exploiting
the capabilities of computers at any given performance level. is means that an
intelligence explosion is less likely to be initiated at the lowest level of hardware
PATHWAYS AND ENABLERS
|
24
performance at which it is feasible. An intelligence explosion is thus more likely
to be initiated when hardware has advanced signicantly beyond the minimum
level at which the eventually successful programming approach could rst have
succeeded. ere is then a hardware overhang when the takeo eventually does
occur. As we saw in Chapter 4, hardware overhang is one of the main factors that
reduce recalcitrance during the takeo. Rapid hardware progress, therefore, will
tend to make the transition to superintelligence faster and more explosive.
A faster takeo via a hardware overhang can aect the risks of the transition
in several ways. e most obvious is that a faster takeo oers less opportunity
to respond and make adjustments whilst the transition is in progress, which
would tend to increase risk. A related consideration is that a hardware overhang
would reduce the chances that a dangerously self-improving seed AI could be
contained by limiting its ability to colonize sucient hardware: the faster each
processor is, the fewer processors would be needed for the AI to quickly boot-
strap itself to superintelligence. Yet another eect of a hardware overhang is to
level the playing eld between big and small projects by reducing the importance
of one of the advantages of larger projects—the ability to aord more powerful
computers. is eect, too, might increase existential risk, if larger projects are
more likely to solve the control problem and to be pursuing morally acceptable
objectives.
19
ere are also advantages to a faster takeo. A faster takeo would increase
the likelihood that a singleton will form. If establishing a singleton is suciently
important for solving the post-transition coordination problems, it might be
worth accepting a greater risk during the intelligence explosion in order to miti-
gate the risk of catastrophic coordination failures in its aermath.
Developments in computing can aect the outcome of a machine intelligence
revolution not only by playing a direct role in the construction of machine intel-
ligence but also by having diuse eects on society that indirectly help shape
the initial conditions of the intelligence explosion. e Internet, which required
hardware to be good enough to enable personal computers to be mass produced
at low cost, is now inuencing human activity in many areas, including work
in articial intelligence and research on the control problem. (is book might
not have been written, and you might not have found it, without the Internet.)
However, hardware is already good enough for a great many applications that
could facilitate human communication and deliberation, and it is not clear that
the pace of progress in these areas is strongly bottlenecked by the rate of hardware
improvement.
20
On balance, it appears that faster progress in computing hardware is undesir-
able from the impersonal evaluative standpoint. is tentative conclusion could
be overturned, for example if the threats from other existential risks or from
post-transition coordination failures turn out to be extremely large. In any case,
it seems dicult to have much leverage on the rate of hardware advancement.
Our eorts to improve the initial conditions for the intelligence explosion should
therefore probably focus on other parameters.
242
|
THE STRATEGIC PICTURE
Note that even when we cannot see how to inuence some parameter, it can be
useful to determine its “sign” (i.e. whether an increase or decrease in that param-
eter would be desirable) as a preliminary step in mapping the strategic lay of the
land. We might later discover a new leverage point that does enable us to manipu-
late the parameter more easily. Or we might discover that the parameter’s sign
correlates with the sign of some other more manipulable parameter, so that our
initial analysis helps us decide what to do with this other parameter.
Should whole brain emulation research be promoted?
e harder it seems to solve the control problem for articial intelligence, the
more tempting it is to promote the whole brain emulation path as a less risky
alternative. ere are several issues, however, that must be analyzed before one
can arrive at a well-considered judgment.
21
First, there is the issue of technology coupling, already discussed earlier. We
pointed out that an eort to develop whole brain emulation could result in neuro-
morphic AI instead, a form of machine intelligence that may be especially unsafe.
But let us assume, for the sake of argument, that we actually achieve whole
brain emulation (WBE). Would this be safer than AI? is, itself, is a complicated
issue. ere are at least three putative advantages of WBE: (i) that its performance
characteristics would be better understood than those of AI; (ii) that it would
inherit human motives; and (iii) that it would result in a slower takeo. Let us very
briey reect on each.
i That it should be easier to understand the intellectual performance character-
istics of an emulation than of an AI sounds plausible. We have abundant expe-
rience with the strengths and weaknesses of human intelligence but no experi-
ence with human-level artificial intelligence. However, to understand what a
snapshot of a digitized human intellect can and cannot do is not the same as to
understand how such an intellect will respond to modifications aimed at en-
hancing its performance. An artificial intellect, by contrast, might be carefully
designed to be understandable, in both its static and dynamic dispositions. So
while whole brain emulation may be more predictable in its intellectual per-
formance than a generic AI at a comparable stage of development, it is unclear
whether whole brain emulation would be dynamically more predictable than
an AI engineered by competent safety-conscious programmers.
ii As for an emulation inheriting the motivations of its human template, this is
far from guaranteed. Capturing human evaluative dispositions might require a
very high-fidelity emulation. Even if some individual’s motivations were perfect-
ly captured, it is unclear how much safety would be purchased. Humans can be
untrustworthy, selfish, and cruel. While templates would hopefully be selected
for exceptional virtue, it may be hard to foretell how someone will act when
transplanted into radically alien circumstances, superhumanly enhanced in in-
telligence, and tempted with an opportunity for world domination. It is true
PATHWAYS AND ENABLERS
|
243
that emulations would at least be more likely to have human-like motivations
(as opposed to valuing only paperclips or discovering digits of pi). Depending
on one’s views on human nature, this might or might not be reassuring.
22
iii It is not clear why whole brain emulation should result in a slower takeo than
artificial intelligence. Perhaps with whole brain emulation one should expect
less hardware overhang, since whole brain emulation is less computationally
ecient than artificial intelligence can be. Perhaps, also, an AI system could
more easily absorb all available computing power into one giant integrated in-
tellect, whereas whole brain emulation would forego quality superintelligence
and pull ahead of humanity only in speed and size of population. If whole brain
emulation does lead to a slower takeo, this could have benefits in terms of
alleviating the control problem. A slower takeo would also make a multipolar
outcome more likely. But whether a multipolar outcome is desirable is very
doubtful.
ere is another important complication with the general idea that getting
whole brain emulation rst is safer: the need to cope with a second transition.
Even if the rst form of human-level machine intelligence is emulation-based, it
would still remain feasible to develop articial intelligence. AI in its mature form
has important advantages over WBE, making AI the ultimately more powerful
technology.
23
While mature AI would render WBE obsolete (except for the special
purpose of preserving individual human minds), the reverse does not hold.
What this means is that if AI is developed rst, there might be a single wave of
the intelligence explosion. But if WBE is developed rst, there may be two waves:
rst, the arrival of WBE; and later, the arrival of AI. e total existential risk along
the WBE-rst path is the sum of the risk in the rst transition and the risk in the
second transition (conditional on having made it through the rst); see Figure 13.
24
How much safer would the AI transition be in a WBE world? One considera-
tion is that the AI transition would be less explosive if it occurs aer some form
AI
rst
AI
WBE
rst
WBEAI
rst
transition
second
transition
WBE
Figure 3 Artificial intelligence or whole brain emulation first? In an AI-first scenario, there is
one transition that creates an existential risk. In a WBE-first scenario, there are two risky transi-
tions, first the development of WBE and then the development of AI. The total existential risk
along the WBE-first scenario is the sum of these. However, the risk of an AI transition might be
lower if it occurs in a world where WBE has already been successfully introduced.
244
|
THE STRATEGIC PICTURE
of machine intelligence has already been realized. Emulations, running at digital
speeds and in numbers that might far exceed the biological human population,
would reduce the cognitive dierential, making it easier for emulations to con-
trol the AI. is consideration is not too weighty, since the gap between AI and
WBE could still be wide. However, if the emulations were not just faster and more
numerous but also somewhat qualitatively smarter than biological humans (or at
least drawn from the top end of the human distribution) then the WBE-rst scen-
ario would have advantages paralleling those of human cognitive enhancement,
which we discussed above.
Another consideration is that the transition to WBE would extend the lead of
the frontrunner. Consider a scenario in which the frontrunner has a six-month
lead over the closest follower in developing whole brain emulation technology.
Suppose that the rst emulations to be created are cooperative, safety-focused,
and patient. If they run on fast hardware, these emulations could spend subjective
eons pondering how to create safe AI. For example, if they run at a speedup of
100,000× and are able to work on the control problem undisturbed for six months
of sidereal time, they could hammer away at the control problem for y millen-
nia before facing competition from other emulations. Given sucient hardware,
they could hasten their progress by fanning out myriad copies to work indepen-
dently on subproblems. If the frontrunner uses its six-month lead to form a sin-
gleton, it could buy its emulation AI-development team an unlimited amount of
time to work on the control problem.
25
On balance, it looks like the risk of the AI transition would be reduced if WBE
comes before AI. However, when we combine the residual risk in the AI transition
with the risk of an antecedent WBE transition, it becomes very unclear how the
total existential risk along the WBE-rst path stacks up against the risk along the
AI-rst path. Only if one is quite pessimistic about biological humanity’s ability
to manage an AI transitionaer taking into account that human nature or
civilization might have improved by the time we confront this challenge—should
the WBE-rst path seem attractive.
To gure out whether whole brain emulation technology should be promoted,
there are some further important points to place in the balance. Most signi-
cantly, there is the technology coupling mentioned earlier: a push toward WBE
could instead produce neuromorphic AI. is is a reason against pushing for
WBE.
26
No doubt, there are some synthetic AI designs that are less safe than
some neuromorphic designs. In expectation, however, it seems that neuromor-
phic designs are less safe. One ground for this is that imitation can substitute for
understanding. To build something from the ground up one must usually have a
reasonably good understanding of how the system will work. Such understanding
may not be necessary to merely copy features of an existing system. Whole brain
emulation relies on wholesale copying of biology, which may not require a com-
prehensive computational systems-level understanding of cognition (though a
large amount of component-level understanding would undoubtedly be needed).
Neuromorphic AI may be like whole brain emulation in this regard: it would be
PATHWAYS AND ENABLERS
|
245
achieved by cobbling together pieces plagiarized from biology without the engi-
neers necessarily having a deep mathematical understanding of how the system
works. But neuromorphic AI would be unlike whole brain emulation in another
regard: it would not have human motivations by default.
27
is consideration
argues against pursuing the whole brain emulation approach to the extent that it
would likely produce neuromorphic AI.
A second point to put in the balance is that WBE is more likely to give us
advance notice of its arrival. With AI it is always possible that somebody will
make an unexpected conceptual breakthrough. WBE, by contrast, will require
many laborious precursor steps—high-throughput scanning facilities, image
processing soware, detailed neural modeling work. We can therefore be con-
dent that WBE is not imminent (not less than, say, een or twenty years away).
is means that eorts to accelerate WBE will make a dierence mainly in scen-
arios in which machine intelligence is developed comparatively late. is could
make WBE investments attractive to somebody who wants the intelligence explo-
sion to preempt other existential risks but is wary of supporting AI for fear of
triggering an intelligence explosion prematurely, before the control problem has
been solved. However, the uncertainty over the relevant timescales is probably
currently too large to enable this consideration to carry much weight.
28
A strategy of promoting WBE is thus most attractive if (a) one is very pessimis-
tic about humans solving the control problem for AI, (b) one is not too worried
about neuromorphic AI, multipolar outcomes, or the risks of a second transition,
(c) one thinks that the default timing of WBE and AI is close, and (d) one prefers
superintelligence to be developed neither very late nor very early.
The person-aecting perspective favors speed
I fear the blog commenter “washbash” may speak for many when he or she writes:
I instinctively think go faster. Not because I think this is better for the world. Why should I
care about the world when I am dead and gone? I want it to go fast, damn it! This increases
the chance I have of experiencing a more technologically advanced future.
29
From the person-aecting standpoint, we have greater reason to rush forward
with all manner of radical technologies that could pose existential risks. is
is because the default outcome is that almost everyone who now exists is dead
within a century.
e case for rushing is especially strong with regard to technologies that could
extend our lives and thereby increase the expected fraction of the currently existing
population that may still be around for the intelligence explosion. If the machine
intelligence revolution goes well, the resulting superintelligence could almost
certainly devise means to indenitely prolong the lives of the then still-existing
humans, not only keeping them alive but restoring them to health and youth-
ful vigor, and enhancing their capacities well beyond what we currently think of
as the human range; or helping them shue o their mortal coils altogether by
246
|
THE STRATEGIC PICTURE
uploading their minds to a digital substrate and endowing their liberated spirits
with exquisitely good-feeling virtual embodiments. With regard to technologies
that do not promise to save lives, the case for rushing is weaker, though perhaps
still suciently supported by the hope of raised standards of living.
30
e same line of reasoning makes the person-aecting perspective favor many
risky technological innovations that promise to hasten the onset of the intelligence
explosion, even when those innovations are disfavored in the impersonal perspec-
tive. Such innovations could shorten the wolf hours during which we individually
must hang on to our perch if we are to live to see the daybreak of the posthuman
age. From the person-aecting standpoint, faster hardware progress thus seems
desirable, as does faster progress toward WBE. Any adverse eect on existential
risk is probably outweighed by the personal benet of an increased chance of the
intelligence explosion happening in the lifetime of currently existing people.
31
Collaboration
One important parameter is the degree to which the world will manage to coor-
dinate and collaborate in the development of machine intelligence. Collaboration
would bring many benets. Let us take a look at how this parameter might aect
the outcome and what levers we might have for increasing the extent and intensity
of collaboration.
The race dynamic and its perils
A race dynamic exists when one project fears being overtaken by another. is
does not require the actual existence of multiple projects. A situation with only
one project could exhibit a race dynamic if that project is unaware of its lack of
competitors. e Allies would probably not have developed the atomic bomb as
quickly as they did had they not believed (erroneously) that the Germans might
be close to the same goal.
e severity of a race dynamic (that is, the extent to which competitors prior-
itize speed over safety) depends on several factors, such as the closeness of the
race, the relative importance of capability and luck, the number of competitors,
whether competing teams are pursuing dierent approaches, and the degree to
which projects share the same aims. Competitors’ beliefs about these factors are
also relevant. (See Box 13.)
In the development of machine superintelligence, it seems likely that there will
be at least a mild race dynamic, and it is possible that there will be a severe race
dynamic. e race dynamic has important consequences for how we should think
about the strategic challenge posed by the possibility of an intelligence explosion.
e race dynamic could spur projects to move faster toward superintelligence
while reducing investment in solving the control problem. Additional detrimen-
tal eects of the race dynamic are also possible, such as direct hostilities between
COLLABORATION
|
247
Box 13 A risk-race to the bottom
Consider a hypothetical AI arms race in which several teams compete to develop
superintelligence.
32
Each team decides how much to invest in safety—knowing
that resources spent on developing safety precautions are resources not spent
on developing the AI. Absent a deal between all the competitors (which might be
stymied by bargaining or enforcement diculties), there might then be a risk-race
to the bottom, driving each team to take only a minimum of precautions.
One can model each team’s performance as a function of its capability (meas-
uring its raw ability and luck) and a penalty term corresponding to the cost of its
safety precautions. The team with the highest performance builds the first AI.
The riskiness of that AI is determined by how much its creators invested in safety.
In the worst-case scenario, all teams have equal levels of capability. The winner
is then determined exclusively by investment in safety: the team that took the
fewest safety precautions wins. The Nash equilibrium for this game is for every
team to spend nothing on safety. In the real world, such a situation might arise
via a risk ratchet: some team, fearful of falling behind, increments its risk-taking to
catch up with its competitorswho respond in kind, until the maximum level of
risk is reached.
Capability versus risk
The situation changes when there are variations in capability. As variations in ca-
pability become more important relative to the cost of safety precautions, the risk
ratchet weakens: there is less incentive to incur an extra bit of risk if doing so is
unlikely to change the order of the race. This is illustrated under various scenarios
in Figure4, which plots how the riskiness of the AI depends on the importance
of capability. Safety investment ranges from  (resulting in perfectly safe AI)
0246810
0.2
0.4
0.6
0.8
1.0(a)
(b)
Relative importance of capability
AI risk level
Relative importance of capability
AI risk level
0246810
0.2
0.4
0.6
0.8
1.0
Figure 4 Risk levels in AI technology races. Levels of risk of dangerous AI in a simple model
of a technology race involving either (a) two teams or (b) five teams, plotted against the rela-
tive importance of capability (as opposed to investment in safety) in determining which project
wins the race. The graphs show three information-level scenarios: no capability information
(straight), private capability information (dashed), and full capability information (dotted).
continued
248
|
THE STRATEGIC PICTURE
Box 13 Continued
to 0 (completely unsafe AI). The x-axis represents the relative importance of
capability versus safety investment in determining the speed of a team’s progress
toward AI. (At 0.5, the safety investment level is twice are important as capability;
at , the two are equal; at 2, capability is twice as important as safety level; and so
forth.) The y-axis represents the level of AI risk (the expected fraction of their
maximum utility that the winner of the race gets).
We see that, under all scenarios, the dangerousness of the resultant AI is
maximal when capability plays no role, gradually decreasing as capability grows
in importance.
Compatible goals
Another way of reducing the risk is by giving teams more of a stake in each
other’s success. If competitors are convinced that coming second means the to-
tal loss of everything they care about, they will take whatever risk necessary to
bypass their rivals. Conversely, teams will invest more in safety if less depends
on winning the race. This suggests that we should encourage various forms of
cross-investment.
The number of competitors
The greater the number of competing teams, the more dangerous the race be-
comes: each team, having less chance of coming first, is more willing to throw
caution to the wind. This can be seen by contrasting Figure 4a (two teams) with
Figure 4b (five teams). In every scenario, more competitors means more risk.
Risk would be reduced if teams coalesce into a smaller number of competing
coalitions.
The curse of too much information
Is it good if teams know about their positions in the race (knowing their capabil-
ity scores, for instance)? Here, opposing factors are at play. It is desirable that a
leader knows it is leading (so that it knows it has some margin for additional safety
precautions). Yet it is undesirable that a laggard knows it has fallen behind (since
this would confirm that it must cut back on safety to have any hope of catching
up). While intuitively it may seem this trade-o could go either way, the models
are unequivocal: information is (in expectation) bad.
33
Figures 4a and 4b each
plot three scenarios: the straight lines correspond to situations in which no team
knows any of the capability scores, its own included. The dashed lines show situa-
tions where each team knows its own capability only. (In those situations, a team
takes extra risk only if its capability is low.) And the dotted lines show what hap-
pens when all teams know each other’s capabilities. (They take extra risks if their
capability scores are close to one another.) With each increase in information
level, the race dynamic becomes worse.
COLLABORATION
|
249
competitors. Suppose that two nations are racing to develop the rst superintel-
ligence, and that one of them is seen to be pulling ahead. In a winner-takes-all
situation, a lagging project might be tempted to launch a desperate strike against
its rival rather than passively await defeat. Anticipating this possibility, the front-
runner might be tempted to strike preemptively. If the antagonists are powerful
states, the clash could be bloody.
34
(A “surgical strike” against the rivals AI pro-
ject might risk triggering a larger confrontation and might in any case not be
feasible if the host country has taken precautions.
35
)
Scenarios in which the rival developers are not states but smaller entities, such
as corporate labs or academic teams, would probably feature much less direct
destruction from conict. Yet the overall consequences of competition may be
almost as bad. is is because the main part of the expected harm from competi-
tion stems not from the smashup of battle but from the downgrade of precaution.
A race dynamic would, as we saw, reduce investment in safety; and conict, even
if nonviolent, would tend to scotch opportunities for collaboration, since projects
would be less likely to share ideas for solving the control problem in a climate of
hostility and mistrust.
36
On the benefits of collaboration
Collaboration thus oers many benets. It reduces the haste in developing
machine intelligence. It allows for greater investment in safety. It avoids violent
conicts. And it facilitates the sharing of ideas about how to solve the control
problem. To these benets we can add another: collaboration would tend to
produce outcomes in which the fruits of a successfully controlled intelligence
explosion get distributed more equitably.
at broader collaboration should result in wider sharing of gains is not axi-
omatic. In principle, a small project run by an altruist could lead to an outcome
where the benets are shared evenly or equitably among all morally consid-
erable beings. Nevertheless, there are several reasons to suppose that broader
collaborations, involving a greater number of sponsors, are (in expectation) dis-
tributionally superior. One such reason is that sponsors presumably prefer an
outcome in which they themselves get (at least) their fair share. A broad collabor-
ation then means that relatively many individuals get at least their fair share,
assuming the project is successful. Another reason is that a broad collaboration
also seems likelier to benet people outside the collaboration. A broader collabor-
ation contains more members, so more outsiders would have personal ties to
somebody on the inside looking out for their interests. A broader collaboration
is also more likely to include at least some altruist who wants to benet every-
one. Furthermore, a broader collaboration is more likely to operate under public
oversight, which might reduce the risk of the entire pie being captured by a clique
of programmers or private investors.
37
Note also that the larger the successful
collaboration is, the lower the costs to it of extending the benets to all outsiders.
(For instance, if 90% of all people were already inside the collaboration, it would
250
|
THE STRATEGIC PICTURE
cost them no more than 10% of their holdings to bring all outsiders up to their
own level.)
It is thus plausible that broader collaborations would tend to lead to a wider
distribution of the gains (though some projects with few sponsors might also have
distributionally excellent aims). But why is a wide distribution of gains desirable?
ere are both moral and prudential reasons for favoring outcomes in which
everybody gets a share of the bounty. We will not say much about the moral case,
except to note that it need not rest on any egalitarian principle. e case might be
made, for example, on grounds of fairness. A project that creates machine super-
intelligence imposes a global risk externality. Everybody on the planet is placed in
jeopardy, including those who do not consent to having their own lives and those
of their family imperiled in this way. Since everybody shares the risk, it would
seem to be a minimal requirement of fairness that everybody also gets a share of
the upside.
e fact that the total (expected) amount of good seems greater in collabora-
tion scenarios is another important reason such scenarios are morally preferable.
e prudential case for favoring a wide distribution of gains is two-pronged.
One prong is that wide distribution should promote collaboration, thereby miti-
gating the negative consequences of the race dynamic. ere is less incentive to
ght over who gets to build the rst superintelligence if everybody stands to ben-
et equally from any projects success. e sponsors of a particular project might
also benet from credibly signaling their commitment to distributing the spoils
universally, a certiably altruistic project being likely to attract more supporters
and fewer enemies.
38
e other prong of the prudential case for favoring a wide distribution of gains
has to do with whether agents are risk-averse or have utility functions that are
sublinear in resources. e central fact here is the enormousness of the potential
resource pie. Assuming the observable universe is as uninhabited as it looks, it
contains more than one vacant galaxy for each human being alive. Most people
would much rather have certain access to one galaxys worth of resources than
a lottery ticket oering a one-in-a-billion chance of owning a billion galaxies.
39
Given the astronomical size of humanitys cosmic endowment, it seems that self-
interest should generally favor deals that would guarantee each person a share,
even if each share corresponded to a small fraction of the total. e important
thing, when such an extravagant bonanza is in the ong, is to not be le out in
the cold.
is argument from the enormousness of the resource pie presupposes that
preferences are resource-satiable.
40
at supposition does not necessarily hold.
For instance, several prominent ethical theories—including especially aggre-
gative consequentialist theories—correspond to utility functions that are risk-
neutral and linear in resources. A billion galaxies could be used to create a billion
times more happy lives than a single galaxy. ey are thus, to a utilitarian, worth
a billion times as much.
41
Ordinary selsh human preference functions, however,
appear to be relatively resource-satiable.
COLLABORATION
|
25
is last statement must be anked by two important qualications. e rst is
that many people care about rank. If multiple agents each wants to top the Forbes
rich list, then no resource pie is large enough to give everybody full satisfaction.
e second qualication is that the post-transition technology base would ena-
ble material resources to be converted into an unprecedented range of products,
including some goods that are not currently available at any price even though
they are highly valued by many humans. A billionaire does not live a thousand
times longer than a millionaire. In the era of digital minds, however, the billion-
aire could aord a thousandfold more computing power and could thus enjoy a
thousandfold longer subjective lifespan. Mental capacity, likewise, could be for
sale. In such circumstances, with economic capital convertible into vital goods at
a constant rate even for great levels of wealth, unbounded greed would make more
sense than it does in today’s world where the auent (those among them lacking a
philanthropic heart) are reduced to spending their riches on airplanes, boats, art
collections, or a fourth and a h residence.
Does this mean that an egoist should be risk-neutral with respect to his or her
post-transition resource endowment? Not quite. Physical resources may not be
convertible into lifespan or mental performance at arbitrary scales. If a life must
be lived sequentially, so that observer moments can remember earlier events and
be aected by prior choices, then the life of a digital mind cannot be extended arbi-
trarily without utilizing an increasing number of sequential computational opera-
tions. But physics limits the extent to which resources can be transformed into
sequential computations.
42
e limits on sequential computation may also con-
strain some aspects of cognitive performance to scale radically sublinearly beyond
a relatively modest resource endowment. Furthermore, it is not obvious that an
egoist would or should be risk-neutral even with regard to highly normatively rele-
vant outcome metrics such as number of quality-adjusted subjective life years. If
oered the choice between an extra 2,000 years of life for certain and a one-in-ten
chance of an extra 30,000 years of life, I think most people would select the former
(even under the stipulation that each life year would be of equal quality).
43
In reality, the prudential case for favoring a wide distribution of gains is presum-
ably subject-relative and situation-dependent. Yet, on the whole, people would be
more likely to get (almost all of) what they want if a way is found to achieve a wide
distribution—and this holds even before taking into account that a commitment
to a wider distribution would tend to foster collaboration and thereby increase the
chances of avoiding existential catastrophe. Favoring a broad distribution, there-
fore, appears to be not only morally mandated but also prudentially advisable.
ere is a further set of consequences to collaboration that should be given
at least some shri: the possibility that pre-transition collaboration inuences
the level of post-transition collaboration. Assume humanity solves the control
problem. (If the control problem is not solved, it may scarcely matter how much
collaboration there is post transition.) ere are two cases to consider. e rst is
that the intelligence explosion does not create a winner-takes-all dynamic (pre-
sumably because the takeo is relatively slow). In this case it is plausible that if
252
|
THE STRATEGIC PICTURE
pre-transition collaboration has any systematic eect on post-transition collabo-
ration, it has a positive eect, tending to promote subsequent collaboration. e
original collaborative relationships may endure and continue beyond the tran-
sition; also, pre-transition collaboration may oer more opportunity for people
to steer developments in desirable (and, presumably, more collaborative) post-
transition directions.
e second case is that the nature of the intelligence explosion does encourage
a winner-takes-all dynamic (presumably because the takeo is relatively fast). In
this case, if there is no extensive collaboration before the takeo, a singleton is
likely to emerge—a single project would undergo the transition alone, at some
point obtaining a decisive strategic advantage combined with superintelligence.
A singleton, by denition, is a highly collaborative social order.
44
e absence of
extensive collaboration pre-transition would thus lead to an extreme degree of
collaboration post-transition. By contrast, a somewhat higher level of collabora-
tion in the run-up to the intelligence explosion opens up a wider variety of pos-
sible outcomes. Collaborating projects could synchronize their ascent to ensure
they transition in tandem without any of them getting a decisive strategic advan-
tage. Or dierent sponsor groups might merge their eorts into a single project,
while refusing to give that project a mandate to form a singleton. For example,
one could imagine a consortium of nations forming a joint scientic project to
develop machine superintelligence, yet not authorizing this project to evolve into
anything like a supercharged United Nations, electing instead to maintain the
factious world order that existed before.
Particularly in the case of a fast takeo, therefore, the possibility exists that
greater pre-transition collaboration would result in less post-transition collabora-
tion. However, to the extent that collaborating entities are able to shape the out-
come, they may allow the emergence or continuation of non-collaboration only if
they foresee that no catastrophic consequences would follow from post-transition
factiousness. Scenarios in which pre-transition collaboration leads to reduced
post-transition collaboration may therefore mostly be ones in which reduced
post-transition collaboration is innocuous.
In general, greater post-transition collaboration appears desirable. It would
reduce the risk of dystopian dynamics in which economic competition and a rap-
idly expanding population lead to a Malthusian condition, or in which evolution-
ary selection erodes human values and selects for non-eudaemonic forms, or in
which rival powers suer other coordination failures such as wars and technology
races. e last of these issues, the prospect of technology races, may be particu-
larly problematic if the transition is to an intermediary form of machine intel-
ligence (whole brain emulation) since it would create a new race dynamic that
would harm the chances of the control problem being solved for the subsequent
second transition to a more advanced form of machine intelligence (articial
intelligence).
We described earlier how collaboration can reduce conict in the run-up to
the intelligence explosion, increasing the chances that the control problem will
COLLABORATION
|
253
be solved, and improve both the moral legitimacy and the prudential desirability
of the resulting resource allocation. To these benets of collaboration it may thus
be possible to add one more: that broader collaboration pre-transition could help
with important coordination problems in the post-transition era.
Working together
Collaboration can take dierent forms depending on the scale of the collaborat-
ing entities. At a small scale, individual AI teams who believe themselves to be in
competition with one another could choose to pool their eorts.
45
Corporations
could merge or cross-invest. At a larger scale, states could join in a big inter-
national project. ere are precedents to large-scale international collaboration
in science and technology (such as CERN, the Human Genome Project, and the
International Space Station), but an international project to develop safe super-
intelligence would pose a dierent order of challenge because of the security
implications of the work. It would have to be constituted not as an open academic
collaboration but as an extremely tightly controlled joint enterprise. Perhaps
the scientists involved would have to be physically isolated and prevented from
communicating with the rest of the world for the duration of the project, except
through a single carefully vetted communication channel. e required level of
security might be nearly unattainable at present, but advances in lie detection and
surveillance technology could make it feasible later this century. It is also worth
bearing in mind that broad collaboration does not necessarily mean that large
numbers of researchers would be involved in the project; it simply means that
many people would have a say in the projects aims. In principle, a project could
involve a maximally broad collaboration comprising all of humanity as spon-
sors (represented, let us say, by the General Assembly of the United Nations), yet
employ only a single scientist to carry out the work.
46
ere is a reason for starting collaboration as early as possible, namely to take
advantage of the veil of ignorance that hides from our view any specic informa-
tion about which individual project will get to superintelligence rst. e closer
to the nishing line we get, the less uncertainty will remain about the relative
chances of competing projects; and the harder it may consequently be to make
a case based on the self-interest of the frontrunner to join a collaborative pro-
ject that would distribute the benets to all of humanity. On the other hand, it
also looks hard to establish a formal collaboration of worldwide scope before the
prospect of superintelligence has become much more widely recognized than it
currently is and before there is a clearly visible road leading to the creation of
machine superintelligence. Moreover, to the extent that collaboration would pro-
mote progress along that road, it may actually be counterproductive in terms of
safety, as discussed earlier.
e ideal form of collaboration for the present may therefore be one that does
not initially require specic formalized agreements and that does not expedite
advances in machine intelligence. One proposal that ts these criteria is that we
254
|
THE STRATEGIC PICTURE
propound an appropriate moral norm, expressing our commitment to the idea
that superintelligence should be for the common good. Such a norm could be
formulated as follows:
The common good principle
Superintelligence should be developed only for the benefit of all of humanity and in
the service of widely shared ethical ideals.
47
Establishing from an early stage that the immense potential of superintelli-
gence belongs to all of humanity will give more time for such a norm to become
entrenched.
e common good principle does not preclude commercial incentives for indi-
viduals or rms active in related areas. For example, a rm might satisfy the call
for universal sharing of the benets of superintelligence by adopting a “windfall
clause” to the eect that all prots up to some very high ceiling (say, a trillion dol-
lars annually) would be distributed in the ordinary way to the rms shareholders
and other legal claimants, and that only prots in excess of the threshold would be
distributed to all of humanity evenly (or otherwise according to universal moral
criteria). Adopting such a windfall clause should be substantially costless, any
given rm being extremely unlikely ever to exceed the stratospheric prot thresh-
old (and such low-probability scenarios ordinarily playing no role in the decisions
of the rm’s managers and investors). Yet its widespread adoption would give
humankind a valuable guarantee (insofar as the commitments could be trusted)
that if ever some private enterprise were to hit the jackpot with the intelligence
explosion, everybody would share in most of the benets. e same idea could be
applied to entities other than rms. For example, states could agree that if ever
any one state’s GDP exceeds some very high fraction (say, 90%) of world GDP, the
overshoot should be distributed evenly to all.
48
e common good principle (and particular instantiations, such as windfall
clauses) could be adopted initially as a voluntary moral commitment by respon-
sible individuals and organizations that are active in areas related to machine
intelligence. Later, it could be endorsed by a wider set of entities and enacted into
law and treaty. A vague formulation, such as the one given here, may serve well as
a starting point; but it would ultimately need to be sharpened into a set of specic
veriable requirements.
PHILOSOPHY WITH A DEADLINE
|
255
CHAPTER 15
Crunch time
W
e nd ourselves in a thicket of strategic complexity, surrounded by
a dense mist of uncertainty. ough many considerations have been
discerned, their details and interrelationships remain unclear and
iy—and there might be other factors we have not even thought of yet. What are
we to do in this predicament?
Philosophy with a deadline
A colleague of mine likes to point out that a Fields Medal (the highest honor
in mathematics) indicates two things about the recipient: that he was capable
of accomplishing something important, and that he didn’t. ough harsh, the
remark hints at a truth.
ink of a “discovery” as an act that moves the arrival of information from
a later point in time to an earlier time. e discoverys value does not equal the
value of the information discovered but rather the value of having the information
available earlier than it otherwise would have been. A scientist or a mathematician
may show great skill by being the rst to nd a solution that has eluded many oth-
ers; yet if the problem would soon have been solved anyway, then the work prob-
ably has not much beneted the world. ere are cases in which having a solution
even slightly sooner is immensely valuable, but this is most plausible when the
solution is immediately put to use, either being deployed for some practical end or
serving as a foundation to further theoretical work. And in the latter case, where
a solution is immediately used only in the sense of serving as a building block for
further theorizing, there is great value in obtaining a solution slightly sooner only
if the further work it enables is itself both important and urgent.
1
e question, then, is not whether the result discovered by the Fields Medalist
is in itself “important” (whether instrumentally or for knowledge’s own sake).
Rather, the question is whether it was important that the medalist enabled the
publication of the result to occur at an earlier date. e value of this temporal
256
|
CRUNCH TIME
transport should be compared to the value that a world-class mathematical mind
could have generated by working on something else. At least in some cases, the
Fields Medal might indicate a life spent solving the wrong problem—for instance,
a problem whose allure consisted primarily in being famously dicult to solve.
Similar barbs could be directed at other elds, such as academic philosophy.
Philosophy covers some problems that are relevant to existential risk mitigation—
we encountered several in this book. Yet there are also subelds within philoso-
phy that have no apparent link to existential risk or indeed any practical concern.
As with pure mathematics, some of the problems that philosophy studies might
be regarded as intrinsically important, in the sense that humans have reason to
care about them independently of any practical application. e fundamental
nature of reality, for instance, might be worth knowing about, for its own sake.
e world would arguably be less glorious if nobody studied metaphysics, cosmol-
ogy, or string theory. However, the dawning prospect of an intelligence explosion
shines a new light on this ancient quest for wisdom.
e outlook now suggests that philosophic progress can be maximized via an
indirect path rather than by immediate philosophizing. One of the many tasks on
which superintelligence (or even just moderately enhanced human intelligence)
would outperform the current cast of thinkers is in answering fundamental ques-
tions in science and philosophy. is reection suggests a strategy of deferred
gratication. We could postpone work on some of the eternal questions for a little
while, delegating that task to our hopefully more competent successors—in order
to focus our own attention on a more pressing challenge: increasing the chance
that we will actually have competent successors. is would be high-impact phil-
osophy and high-impact mathematics.
2
What is to be done?
We thus want to focus on problems that are not only important but urgent in
the sense that their solutions are needed prior to the intelligence explosion. We
should also take heed not to work on problems that are negative-value (such that
solving them is harmful). Some technical problems in the eld of articial intel-
ligence, for instance, might be negative-value inasmuch as their solution would
speed the development of machine intelligence without doing as much to expedite
the development of control methods that could render the machine intelligence
revolution survivable and benecial.
It can be hard to identify problems that are both urgent and important and are
such that we can condently take them to be positive-value. e strategic uncer-
tainty surrounding existential risk mitigation means that we must worry that
even well-intentioned interventions may turn out to be not only unproductive
but counterproductive. To limit the risk of doing something actively harmful or
morally wrong, we should prefer to work on problems that seem robustly positive-
value (i.e., whose solution would make a positive contribution across a wide range
WHAT IS TO BE DONE?
|
257
of scenarios) and to employ means that are robustly justiable (i.e., acceptable
from a wide range of moral views).
ere is a further desideratum to consider in selecting which problems to prior-
itize. We want to work on problems that are elastic to our eorts at solving them.
Highly elastic problems are those that can be solved much faster, or solved to a
much greater extent, given one extra unit of eort. Encouraging more kindness in
the world is an important and urgent problem—one, moreover, that seems quite
robustly positive-value: yet absent a breakthrough idea for how to go about it, prob-
ably a problem of quite low elasticity. Achieving world peace, similarly, would be
highly desirable; but considering the numerous eorts already targeting that prob-
lem, and the formidable obstacles arrayed against a quick solution, it seems unlikely
that the contributions of a few extra individuals would make a large dierence.
To reduce the risks of the machine intelligence revolution, we will propose
two objectives that appear to best meet all those desiderata: strategic analysis
and capacity-building. We can be relatively condent about the sign of these
parameters— more strategic insight and more capacity being better. Furthermore,
the parameters are elastic: a small extra investment can make a relatively large
dierence. Gaining insight and capacity is also urgent because early boosts to
these parameters may compound, making subsequent eorts more eective. In
addition to these two broad objectives, we will point to a few other potentially
worthwhile aims for initiatives.
Seeking the strategic light
Against a backdrop of perplexity and uncertainty, analysis stands out as being of
particularly high expected value.
3
Illumination of our strategic situation would
help us target subsequent interventions more eectively. Strategic analysis is espe-
cially needful when we are radically uncertain not just about some detail of some
peripheral matter but about the cardinal qualities of the central things. For many
key parameters, we are radically uncertain even about their sign—that is, we
know not which direction of change would be desirable and which undesirable.
Our ignorance might not be irremediable. e eld has been little prospected, and
glimmering strategic insights could still be awaiting their unearthing just a few
feet beneath the surface.
What we mean by “strategic analysis” here is a search for crucial considerations:
ideas or arguments with the potential to change our views not merely about the
ne-structure of implementation but about the general topology of desirability.
4
Even a single missed crucial consideration could vitiate our most valiant eorts
or render them as actively harmful as those of a soldier who is ghting on the
wrong side. e search for crucial considerations (which must explore norma-
tive as well as descriptive issues) will oen require crisscrossing the boundaries
between dierent academic disciplines and other elds of knowledge. As there is
no established methodology for how to go about this kind of research, dicult
original thinking is necessary.
258
|
CRUNCH TIME
Building good capacity
Another high-value activity, one that shares with strategic analysis the robustness
property of being benecial across a wide range of scenarios, is the development
of a well-constituted support base that takes the future seriously. Such a base can
immediately provide resources for research and analysis. If and when other pri-
orities become visible, resources can be redirected accordingly. A support base
is thus a general-purpose capability whose use can be guided by new insights as
they emerge.
One valuable asset would be a donor network comprising individuals devoted
to rational philanthropy, informed about existential risk, and discerning about
the means of mitigation. It is especially desirable that the early-day funders be
astute and altruistic, because they may have opportunities to shape the elds
culture before the usual venal interests take up position and entrench. e focus
during these opening gambits should thus be to recruit the right kinds of people
into the eld. It could be worth foregoing some technical advances in the short
term in order to ll the ranks with individuals who genuinely care about safety
and who have a truth-seeking orientation (and who are likely to attract more of
their own kind).
One important variable is the quality of the “social epistemology” of the AI-eld
and its leading projects. Discovering crucial considerations is valuable, but only
if it aects action. is cannot always be taken for granted. Imagine a project that
invests millions of dollars and years of toil to develop a prototype AI, and that
aer surmounting many technical challenges the system is nally beginning to
show real progress. ere is a chance that with just a bit more work it could turn
into something useful and protable. Now a crucial consideration is discovered,
indicating that a completely dierent approach would be a bit safer. Does the pro-
ject kill itself o like a dishonored samurai, relinquishing its unsafe design and all
the progress that had been made? Or does it react like a worried octopus, pung
out a cloud of motivated skepticism in the hope of eluding the attack? Aproject
that would reliably choose the samurai option in such a dilemma would be a far
preferable developer.
5
Yet building processes and institutions that are willing to
commit seppuku based on uncertain allegations and speculative reasoning is not
easy. Another dimension of social epistemology is the management of sensitive
information, in particular the ability to avoid leaking information that ought be
kept secret. (Information continence may be especially challenging for academic
researchers, accustomed as they are to constantly disseminating their results on
every available lamppost and tree.)
Particular measures
In addition to the general objectives of strategic light and good capacity, some
more specic objectives could also present cost-eective opportunities for action.
WILL THE BEST IN HUMAN NATURE PLEASE STAND UP
|
259
One such is progress on the technical challenges of machine intelligence safety.
In pursing this objective, care should be taken to manage information hazards.
Some work that would be useful for solving the control problem would also be
useful for solving the competence problem. Work that burns down the AI fuse
could easily be a net negative.
Another specic objective is to promote “best practices” among AI researchers.
Whatever progress has been made on the control problem needs to be dissemi-
nated. Some forms of computational experimentation, particularly if involving
strong recursive self-improvement, may also require the use of capability control
to mitigate the risk of an accidental takeo. While the actual implementation
of safety methods is not so relevant today, it will increasingly become so as
the state of the art advances. And it is not too soon to call for practitioners to
express a commitment to safety, including endorsing the common good principle
and promising to ramp up safety if and when the prospect of machine super-
intelligence begins to look more imminent. Pious words are not sucient and
will not by themselves make a dangerous technology safe: but where the mouth
goeth, the mind might gradually follow.
Other opportunities may also occasionally arise to push on some pivotal
parameter, for example to mitigate some other existential risk, or to promote
biological cognitive enhancement and improvements of our collective wisdom, or
even to shi world politics into a more harmonious register.
Will the best in human nature please stand up
Before the prospect of an intelligence explosion, we humans are like small chil-
dren playing with a bomb. Such is the mismatch between the power of our play-
thing and the immaturity of our conduct. Superintelligence is a challenge for
which we are not ready now and will not be ready for a long time. We have little
idea when the detonation will occur, though if we hold the device to our ear we
can hear a faint ticking sound.
For a child with an undetonated bomb in its hands, a sensible thing to do would
be to put it down gently, quickly back out of the room, and contact the nearest
adult. Yet what we have here is not one child but many, each with access to an
independent trigger mechanism. e chances that we will allnd the sense to put
down the dangerous stu seem almost negligible. Some little idiot is bound to
press the ignite button just to see what happens.
Nor can we attain safety by running away, for the blast of an intelligence explo-
sion would bring down the entire rmament. Nor is there a grown-up in sight.
In this situation, any feeling of gee-wiz exhilaration would be out of place.
Consternation and fear would be closer to the mark; but the most appropriate atti-
tude may be a bitter determination to be as competent as we can, much as if we were
preparing for a dicult exam that will either realize our dreams or obliterate them.
260
|
CRUNCH TIME
is is not a prescription of fanaticism. e intelligence explosion might still be
many decades o in the future. Moreover, the challenge we face is, in part, to hold
on to our humanity: to maintain our groundedness, common sense, and good-
humored decency even in the teeth of this most unnatural and inhuman problem.
We need to bring all our human resourcefulness to bear on its solution.
Yet let us not lose track of what is globally signicant. rough the fog of every-
day trivialities, we can perceive—if but dimly—the essential task of our age. In
this book, we have attempted to discern a little more feature in what is otherwise
still a relatively amorphous and negatively dened vision—one that presents as
our principal moral priority (at least from an impersonal and secular perspective)
the reduction of existential risk and the attainment of a civilizational trajectory
that leads to a compassionate and jubilant use of humanitys cosmic endowment.
NOTES: PRELIMSCHAPTER
|
26
NOTES
PRELIMS
1. Not all endnotes contain useful information, however.
2. I don’t know which ones.
CHAPTER : PAST DEVELOPMENTS AND PRESENT CAPABILITIES
1. A subsistence-level income today is about $400 (Chen and Ravallion 2010). A million
subsistence-level incomes is thus $400,000,000. e current world gross product is about
$60,000,000,000,000 and in recent years has grown at an annual rate of about 4% (compound
annual growth rate since 1950, based on Maddison [2010]). ese gures yield the estimate
mentioned in the text, which of course is only an order-of-magnitude approximation. If we look
directly at population gures, we nd that it currently takes the world population about one and
a half weeks to grow by one million; but this underestimates the growth rate of the economy
since per capita income is also increasing. By 5000 , following the Agricultural Revolution,
the world population was growing at a rate of about 1 million per 200 years—a great accelera-
tion since the rate of perhaps 1 million per million years in early humanoid prehistory—so a
great deal of acceleration had already occurred by then. Still, it is impressive that an amount of
economic growth that took 200 years seven thousand years ago takes just ninety minutes now,
and that the world population growth that took two centuries then takes one and a half weeks
now. See also Maddison (2005).
2. Such dramatic growth and acceleration might suggest one notion of a possible coming “singu-
larity,” as adumbrated by John von Neumann in a conversation with the mathematician Stani-
slaw Ulam:
Our conversation centred on the ever accelerating progress of technology and changes in the
mode of human life, which gives the appearance of approaching some essential singularity in
the history of the race beyond which human aairs, as we know them, could not continue.
(Ulam 958)
3. Hanson (2000).
4. Vinge (1993); Kurzweil (2005).
5. Sandberg (2010).
6. Van Zanden (2003); Maddison (1999, 2001); De Long (1998).
7. Two o-repeated optimistic statements from the 1960s: “Machines will be capable, within
twenty years, of doing any work a man can do” (Simon 1965, 96); Within a generation . . . the
problem of creating articial intelligence will substantially be solved” (Minsky 1967, 2). For a
systematic review of AI predictions, see Armstrong and Sotala (2012).
8. See, for example, Baum et al. (2011) and Armstrong and Sotala (2012).
9. It might suggest, however, that AI researchers know less about development timelines than they
think they do—but this could cut both ways: they might overestimate as well as underestimate
the time to AI.
10. Good (1965, 33).
262
|
NOTES: CHAPTER 
11. One exception is Norbert Wiener, who did have some qualms about the possible consequences.
He wrote, in 1960: “If we use, to achieve our purposes, a mechanical agency with whose op-
eration we cannot eciently interfere once we have started it, because the action is so fast and
irrevocable that we have not the data to intervene before the action is complete, then we had
better be quite sure that the purpose put into the machine is the purpose which we really desire
and not merely a colourful imitation of it” (Wiener 1960). Ed Fredkin spoke about his wor-
ries about superintelligent AI in an interview described in McCorduck (1979). By 1970, Good
himself writes about the risks, and even calls for the creation of an association to deal with the
dangers (Good [1970]; see also his later article [Good 1982] where he foreshadows some of the
ideas of “indirect normativity” that we discuss in Chapter 13). By 1984, Marvin Minsky was also
writing about many of the key worries (Minsky 1984).
12. Cf. Yudkowsky (2008a). On the importance of assessing the ethical implications of potentially
dangerous future technologies before they become feasible, see Roache (2008).
13. McC orduc k (1979).
14. Newell et al. (1959).
15. e SAINTS program, the ANALOGY program, and the STUDENT program, respectively. See
Slagle (1963), Evans (1964, 1968), and Bobrow (1968).
16 . Ni lss on (1984).
17. Weizenbaum (1966).
18 . W i n o g ra d (19 7 2).
19. Cope (1996); Weizenbaum (1976); Moravec (1980); run et al. (2006); Buehler et al. (2009);
Koza et al. (2003). e Nevada Department of Motor Vehicles issued the rst license for a
driverless car in May 2012.
20. e STANDUP system (Ritchie et al. 2007).
21. Schwartz (1987). Schwartz is here characterizing a skeptical view that he thought was repre-
sented by the writings of Hubert Dreyfus.
22. One vocal critic during this period was Hubert Dreyfus. Other prominent skeptics from this
era include John Lucas, Roger Penrose, and John Searle. However, among these only Dreyfus
was mainly concerned with refuting claims about what practical accomplishments we should
expect from existing paradigms in AI (though he seems to have been open to the possibility that
new paradigms could go further). Searle’s target was functionalist theories in the philosophy
of mind, not the instrumental powers of AI systems. Lucas and Penrose denied that a classical
computer could ever be programmed to do everything that a human mathematician can do, but
they did not deny that any particular function could in principle be automated or that AIs might
eventually become very instrumentally powerful. Cicero remarked that “there is nothing so ab-
surd but some philosopher has said it” (Cicero 1923, 119); yet it is surprisingly hard to think of
any signicant thinker who has denied the possibility of machine superintelligence in the sense
used in this book.
23. For many applications, however, the learning that takes place in a neural network is little dier-
ent from the learning that takes place in linear regression, a statistical technique developed by
Adrien-Marie Legendre and Carl Friedrich Gauss in the early 1800s.
24. e basic algorithm was described by Arthur Bryson and Yu-Chi Ho as a multi-stage dynam-
ic optimization method in 1969 (Bryson and Ho 1969). e application to neural networks
was suggested by Paul Werbos in 1974 (Werbos 1994), but it was only aer the work by David
Rumelhart, Georey Hinton, and Ronald Williams in 1986 (Rumelhart et al. 1986) that the
method gradually began to seep into the awareness of a wider community.
25. Nets lacking hidden layers had previously been shown to have severely limited functionality
(Minsky and Papert 1969).
26. E.g., MacKay (2003).
27. Murphy (2012).
28. Pearl (2009).
29. We suppress various technical details here in order not to unduly burden the exposition. We
will have occasion to revisit some of these ignored issues in Chapter 12.
30. A program p is a description of string x if p, run on (some particular) universal Turing ma-
chine U, outputs x; we write this as U(p) = x. (e string x here represents a possible world.) e
NOTES: CHAPTER
|
263
Kolmogorov complexity of x is then K(x) := min
p
{(p) : U(p) = x}, where (p) is the length of p in
bits. e “Solomono” probability of x is then dened as M(x) :=
p:U(p)=x
2
−ℓ(p)
, where the sum is
dened over all (“minimal,” i.e. not necessarily halting) programs p for which U outputs a string
starting with x (Hutter 2005).
31. Bayesian conditioning on evidence E gives
PwPwE
PEwP w
posteriorprior
priorprior
() ()
()(
= =
))
()
.
Pw
prior
(e probability of a proposition [like E] is the sum of the probability of the possible worlds in
which it is true.)
32. Or randomly picks one of the possible actions with the highest expected utility, in case there is a
tie.
33. More concisely, the expected utility of an action can be written as EU aUwPwa
w
() ()
()
,=∑
|
where the sum is over all possible worlds.
34. See, e.g., Howson and Urbach (1993); Bernardo and Smith (1994); Russell and Norvig (2010).
35. Wainwright and Jordan (2008). e application areas of Bayes nets are myriad; see, e.g., Pourret
et al. (2008).
36. One might wonder why so much detail is given to game AI here, which to some might seem like
an unimportant application area. e answer is that game-playing oers some of the clearest
measures of human vs. AI performance.
37. Samuel (1959); Schaeer (1997, ch. 6).
38. Schaeer et al. (2007).
39. Berliner (1980a, b).
40. Tesauro (1995).
41. Such programs include GNU (see Silver [2006]) and Snowie (see Gammoned.net [2012]).
42. Lenat himself had a hand in guiding the eet-design process. He wrote: “us the nal credit-
ing of the win should be about 60/40% Lenat/Eurisko, though the signicant point here is that
neither party could have won alone” (Lenat 1983, 80).
43. Lenat (1982, 1983).
44. Cirasella and Kopec (2006).
45. Kasparov (1996, 55).
46. Newborn (2011).
47. Keim et al. (1999).
48. See Armstrong (2012).
49. She ppa rd (2 00 2).
50. Wikipedia (2012 a).
51. Marko (2011).
52. Rubin and Watson (2011).
53. Elyasaf et al. (2011).
54. KGS (2012).
55. Newell et al. (1958, 320).
56. Attributed in Vardi (2012).
57. In 1976, I. J. Good wrote: “A computer program of Grandmaster strength would bring us within
an ace of [machine ultra-intelligence]” (Good 1976). In 1979, Douglas Hofstadter opined in his
Pulitzer-winning Gödel, Escher, Bach: “Question: Will there be chess programs that can beat
anyone? Speculation: No. ere may be programs that can beat anyone at chess, but they will
not be exclusively chess programs. ey will be programs of general intelligence, and they will
be just as temperamental as people. ‘Do you want to play chess?’ ‘No, I’m bored with chess. Let’s
talk about poetry’” (Hofstadter [1979] 1999, 678).
58. e algorithm is minimax search with alpha-beta pruning, used with a chess-specic heuristic
evaluation function of board states. Combined with a good library of openings and endgames,
and various other tricks, this can make for a capable chess engine.
59. ough especially with recent progress in learning the evaluation heuristic from simulated
games, many of the underlying algorithms would probably also work well for many other
games.
264
|
NOTES: CHAPTER 
60. Nilsson (2009, 318). Knuth was certainly overstating his point. ere are many “thinking tasks”
that AI has not succeeded in doing—inventing a new subeld of pure mathematics, doing any
kind of philosophy, writing a great detective novel, engineering a coup détat, or designing a
major new consumer product.
61. Shapi ro (1992).
62. One might speculate that one reason it has been dicult to match human abilities in percep-
tion, motor control, common sense, and language understanding is that our brains have dedi-
cated wetware for these functions—neural structures that have been optimized over evolution-
ary timescales. By contrast, logical thinking and skills like chess playing are not natural to us;
so perhaps we are forced to rely on a limited pool of general-purpose cognitive resources to
perform these tasks. Maybe what our brains do when we engage in explicit logical reasoning or
calculation is in some ways analogous to running a “virtual machine,” a slow and cumbersome
mental simulation of a general-purpose computer. One might then say (somewhat fancifully)
that a classical AI program is not so much emulating human thinking as the other way around:
a human who is thinking logically is emulating an AI program.
63. is example is controversial: a minority view, represented by approximately 20% of adults in
the USA and similar numbers in many other developed nations, holds that the Sun revolves
around the Earth (Crabtree 1999; Dean 2005).
64. World Robotics (2011).
65. Estimated from data in Guizzo (2010).
66. Holley (2009).
67. Hybrid rule-based statistical approaches are also used, but they are currently a small part of the
picture.
68. Cross and Walker (1994); Hedberg (2002).
69. Based on the statistics from TABB Group, a New York- and London-based capital markets
research rm (personal communication).
70. CFTC and SEC (2010). For a dierent perspective on the events of 6 May 2010, see CME Group
(2010).
71. Nothing in the text should be construed as an argument against algorithmic high-frequency
trading, which might normally perform a benecial function by increasing liquidity and
market eciency.
72. A smaller market scare occurred on August, 1, 2012, in part because the “circuit breaker” was
not also programmed to halt trading if there were extreme changes in the number of shares
being traded (Popper 2012). is again foreshadows another later theme: the diculty of antici-
pating all specic ways in which some particular plausible-seeming rule might go wrong.
73. Nilsson (2009, 319).
74. Minsky (2006); McCarthy (2007); Beal and Winston (2009).
75. Peter Norvig, personal communication. Machine-learning classes are also very popular, reect-
ing a somewhat orthogonal hype-wave of “big data” (inspired by e.g. Google and the Netix
Prize).
76. Armstrong and Sotala (2012).
77. Müller and Bostrom (forthcoming).
78. See Baum et al. (2011), another survey cited therein, and Sandberg and Bostrom (2011).
79. Nil sson (200 9).
80. is is again conditional on no civilization-disrupting catastrophe occurring. e denition of
HLMI used by Nilsson is “AI able to perform around 80% of jobs as well or better than humans
perform” (Kruel 2012).
81. e table shows the results of four dierent polls as well as the combined results. e rst two
were polls taken at academic conferences: PT-AI, participants of the conference Philosophy and
eory of AI in essaloniki 2011 (respondents were asked in November 2012), with a response
rate of 43 out of 88; and AGI, participants of the conferences Articial General Intelligence and
Impacts and Risks of Articial General Intelligence, both in Oxford, December 2012 (response
rate: 72/111). e EETN poll sampled the members of the Greek Association for Articial Intel-
ligence, a professional organization of published researchers in the eld, in April 2013 (response
NOTES: CHAPTERS –2
|
265
rate: 26/250). e TOP100 poll elicited the opinions among the 100 top authors in articial intel-
ligence as measured by a citation index, in May 2013 (response rate: 29/100).
82. Interviews with some 28 (at the time of writing) AI practitioners and related experts have been
posted by Kruel (2011).
83. e diagram shows renormalized median estimates. Means are signicantly dierent. For ex-
ample, the mean estimates for the “Extremely bad” outcome were 7.6% (for TOP100) and 17.2%
(for the combined pool of expert assessors).
84. ere is a substantial literature documenting the unreliability of expert forecasts in many do-
mains, and there is every reason to think that many of the ndings in this body of research
apply to the eld of articial intelligence too. In particular, forecasters tend to be overcon-
dent in their predictions, believing themselves to be more accurate than they really are, and
therefore assigning too little probability to the possibility that their most-favored hypothesis
is wrong (Tetlock 2005). (Various other biases have also been documented; see, e.g., Gilovich
et al. [2002].) However, uncertainty is an inescapable fact of the human condition, and many
of our actions unavoidably rely on expectations about which long-term consequences are more
or less plausible: in other words, on probabilistic predictions. Refusing to oer explicit proba-
bilistic predictions would not make the epistemic problem go away; it would just hide it from
view (Bostrom 2007). Instead, we should respond to evidence of overcondence by broaden-
ing our condence intervals (or “credible intervals”)—i.e. by smearing out our credence func-
tions—and in general we must struggle as best we can with our biases, by considering dierent
perspectives and aiming for intellectual honesty. In the longer run, we can also work to develop
techniques, training methods, and institutions that can help us achieve better calibration. See
also Armstrong and Sotala (2012).
CHAPTER 2: PATHS TO SUPERINTELLIGENCE
1. is resembles the denition in Bostrom (2003c) and Bostrom (2006a). It can also be compared
with Shane Leggs denition (“Intelligence measures an agent’s ability to achieve goals in a wide
range of environments”) and its formalizations (Legg 2008). It is also very similar to Good’s
denition of ultraintelligence in Chapter 1 (“a machine that can far surpass all the intellectual
activities of any man however clever”).
2. For the same reason, we make no assumption regarding whether a superintelligent machine
could have “true intentionality” (pace Searle, it could; but this seems irrelevant to the concerns
of this book). And we take no position in the internalism/externalism debate about mental con-
tent that has been raging in the philosophical literature, or on the related issue of the extended
mind thesis (Clark and Chalmers 1998).
3. Turing (1950, 456).
4. Turing (1950, 456).
5. Chalmers (2010); Moravec (1976, 1988, 1998, 1999).
6. See Moravec (1976). A similar argument is advanced by David Chalmers (2010).
7. See also Shulman and Bostrom (2012), where these matters are elaborated in more detail.
8. Legg (2008) oers this reason in support of the claim that humans will be able to recapitulate
the progress of evolution over much shorter timescales and with reduced computational re-
sources (while noting that evolution’s unadjusted computational resources are far out of reach).
Baum (2004) argues that some developments relevant to AI occurred earlier, with the organiza-
tion of the genome itself embodying a valuable representation for evolutionary algorithms.
9. Whitman et al. (1998); Sabrosky (1952).
10. Sc hult z (200 0).
11. Menzel and Giurfa (2001, 62); Truman et al. (1993).
12. Sandberg and Bostrom (2008).
13. See Legg (2008) for further discussion of this point and of the promise of functions or environ-
ments that determine tness based on a smooth landscape of pure intelligence tests.
14. See Bostrom and Sandberg (2009b) for a taxonomy and more detailed discussion of ways in
which engineers may outperform historical evolutionary selection.
266
|
NOTES: CHAPTER 2
15. e analysis has addressed the nervous systems of living creatures, without reference to the cost
of simulating bodies or the surrounding virtual environment as part of a tness function. It is
plausible that an adequate tness function could test the competence of a particular organism
in far fewer operations than it would take to simulate all the neuronal computation of that or-
ganism’s brain throughout its natural lifespan. AI programs today oen develop and operate in
very abstract environments (theorem provers in symbolic math worlds, agents in simple game
tournament worlds, etc.).
A skeptic might insist that an abstract environment would be inadequate for the evolution
of general intelligence, believing instead that the virtual environment would need to closely re-
semble the actual biological environment in which our ancestors evolved. Creating a physically
realistic virtual world would require a far greater investment of computational resources than
the simulation of a simple toy world or abstract problem domain (whereas evolution had access
to a physically realistic real world “for free”). In the limiting case, if complete micro-physical
accuracy were insisted upon, the computational requirements would balloon to ridiculous pro-
portions. However, such extreme pessimism is almost certainly unwarranted; it seems unlikely
that the best environment for evolving intelligence is one that mimics nature as closely as pos-
sible. It is, on the contrary, plausible that it would be more ecient to use an articial selection
environment, one quite unlike that of our ancestors, an environment specically designed to
promote adaptations that increase the type of intelligence we are seeking to evolve (abstract
reasoning and general problem-solving skills, for instance, as opposed to maximally fast in-
stinctual reactions or a highly optimized visual system).
16 . Wi k iped ia (2012b).
17. For a general treatment of observation selection theory, see Bostrom (2002a). For the specic
application to the current issue, see Shulman and Bostrom (2012). For a short popular introduc-
tion, see Bostrom (2008b).
18. Sutton and Barto (1998, 21f); Schultz et al. (1997).
19. is term was introduced by Eliezer Yudkowsky; see, e.g., Yudkowsky (2007).
20. is is the scenario described by Good (1965) and Yudkowsky (2007). However, one could also
consider an alternative in which the iterative sequence has some steps that do not involve intelli-
gence enhancement but instead design simplication. at is, at some stages, the seed AI might
rewrite itself so as make subsequent improvements easier to nd.
21. Helmstaedter et al. (2011).
22. Andres et al. (2012).
23. Adequate for enabling instrumentally useful forms of cognitive functioning and communica-
tion, that is; but still radically impoverished relative to the interface provided by the muscles
and sensory organs of a normal human body.
24. Sandberg (2013).
25. See the “Computer requirements” section of Sandberg and Bostrom (2008, 7981).
26. A lower level of success might be a brain simulation that has biologically suggestive micro-
dynamics and displays a substantial range of emergent species-typical activity such as a slow-
wave sleep state or activity-dependent plasticity. Whereas such a simulation could be a useful
testbed for neuroscientic research (though one which might come close to raising serious ethi-
cal issues), it would not count as a whole brain emulation unless the simulation were suciently
accurate to be able to perform a substantial fraction of the intellectual work that the simulated
brain was capable of. As a rule of thumb, we might say that in order for a simulation of a human
brain to count as a whole brain emulation, it would need to be able to express coherent verbal
thoughts or have the capacity to learn to do so.
27. Sandberg and Bostrom (2008).
28. Sandberg and Bostrom (2008). Further explanation can be found in the original report.
29. e rst map is described in Albertson and omson (1976) and White et al. (1986). e com-
bined (and in some cases corrected) network is available from the “WormAtlas” website (http://
www.wormatlas.org/).
30. For a review of past attempts of emulating C. elegans and their fates, see Kaufman (2011). Kauf-
man quotes one ambitious doctoral student working in the area, David Dalrymple, as saying,
NOTES: CHAPTER 2
|
267
“With optogenetic techniques, we are just at the point where it’s not an outrageous proposal
to reach for the capability to read and write to anywhere in a living C. elegans nervous system,
using a high-throughput automated system. . . . I expect to be nished with C. elegans in 2–3
years. I would be extremely surprised, for whatever thats worth, if this is still an open problem
in 2020” (Dalrymple 2011). Brain models aiming for biological realism that were hand-coded
(rather than generated automatically) have achieved some basic functionality; see, e.g.,
Eliasmith et al. (2012).
31. Caenorhabditis elegans does have some convenient special properties. For example, the organ-
ism is transparent, and the wiring pattern of its nervous system does not change between indi-
viduals.
32. If neuromorphic AI rather than whole brain emulation is the end product, then it might or
might not be the case that the relevant insights would be derived through attempts to simulate
human brains. It is conceivable that the important cortical tricks would be discovered during
the study of (nonhuman) animal brains. Some animal brains might be easier to work with than
human brains, and smaller brains would require fewer resources to scan and model. Research
on animal brains would also be subject to less regulation. It is even conceivable that the rst
human- level machine intelligence will be created by completing a whole brain emulation of
some suitable animal and then nding ways to enhance the resultant digital mind. us hu-
manity could get its comeuppance from an uplied lab mouse or macaque.
33. Uauy and Dangour (2006); Georgie (2007); Stewart et al. (2008); Eppig et al. (2010); Cotman
and Berchtold (2002).
34. According to the World Health Organization in 2007, nearly 2 billion individuals have insuf-
cient iodine intake (e Lancet 2008). Severe iodine deciency hinders neurological develop-
ment and leads to cretinism, which involves an average loss of about 12.5 IQ points (Qian et al.
2005). e condition can be easily and inexpensively prevented though salt fortication
(Horton et al. 2008).
35. Bostrom and Sandberg (2009a).
36. Bostrom and Sandberg (2009b). A typical putative performance increase from pharmacological
and nutritional enhancement is in the range of 10–20% on test tasks measuring working mem-
ory, attention, etc. But it is generally dubious whether such reported gains are real, sustainable
over a longer term, and indicative of correspondingly improved results in real-world problem
situations (Repantis et al. 2010). For instance, in some cases there might be a compensating de-
terioration on some performance dimensions that are not measured by the test tasks (Sandberg
and Bostrom 2006).
37. If there were an easy way to enhance cognition, one would expect evolution already to have
taken advantage of it. Consequently, the most promising kind of nootropic to investigate may be
one that promises to boost intelligence in some manner that we can see would have lowered t-
ness in the ancestral environment—for example, by increasing head size at birth or amping up
the brain’s glucose metabolism. For a more detailed discussion of this idea (along with several
important qualications), see Bostrom (2009b).
38. Sperm are harder to screen because, in contrast to embryos, they consist of only one cell—and
one cell needs to be destroyed in order to do the sequencing. Oocytes also consist of only one
cell; however, the rst and second cell divisions are asymmetric and produce one daughter cell
with very little cytoplasm, the polar body. Since polar bodies contain the same genome as the
main cell and are redundant (they eventually degenerate) they can be biopsied and used for
screening (Gianaroli 2000).
39. Each of these practices was subject to some ethical controversy when it was introduced, but
there seems to be a trend toward increasing acceptance. Attitudes toward human genetic engin-
eering and embryo selection vary signicantly across cultures, suggesting that development
and application of new techniques will probably take place even if some countries initially adopt
a cautious stance, although the rate at which this happens will be inuenced by moral, religious,
and political pressures.
40. Davies et al. (2011); Benyamin et al. (2013); Plomin et al. (2013). See also Mardis (2011); Hsu
(2012).
268
|
NOTES: CHAPTER 2
41. Broad-sense heritability of adult IQ is usually estimated in the range of 0.50.8 within middle-
class strata of developed nations (Bouchard 2004, 148). Narrow-sense heritability, which meas-
ures the portion of variance that is attributable to additive genetic factors, is lower (in the range
0.3–0.5) but still substantial (Devlin et al. 1997; Davies et al. 2011; Visscher et al. 2008). ese
estimates could change for dierent populations and environments, as heritabilities vary de-
pending on the population and environment being studied. For example, lower heritabilities
have been found among children and those from deprived environments (Benyamin et al. 2013;
Turkheimer et al. 2003). Nisbett et al. (2012) review numerous environmental inuences on
variation in cognitive ability.
42. e following several paragraphs draw heavily on joint work with Carl Shulman (Shulman and
Bostrom 2014).
43. is table is taken from Shulman and Bostrom (2014). It is based on a toy model that assumes
a Gaussian distribution of predicted IQs among the embryos with a standard deviation of
7.5points. e amount of cognitive enhancement that would be delivered with dierent num-
bers of embryos depends on how dierent the embryos are from one another in the additive
genetic variants whose eects we know. Siblings have a coecient of relatedness of ½, and
common additive genetic variants account for half or less of variance in adult uid intelligence
(Davies et al. 2011). ese two facts suggest that where the observed population standard devia-
tion in developed countries is 15 points, the standard deviation of genetic inuences within a
batch of embryos would be 7.5 points or less.
44. With imperfect information about the additive genetic eects on cognitive ability, eect sizes
would be reduced. However, even a small amount of knowledge would go a relatively long way,
because the gains from selection do not scale linearly with the portion of variance that we can
predict. Instead, the eectiveness of our selection depends on the standard deviation of predict-
ed mean IQ, which scales as the square root of variance. For example, if one could account for
12.5% of the variance, this could deliver eects half as great as those in Table 1, which assume
50%. For comparison, a recent study (Rietveld et al. 2013) claims to have already identied 2.5%
of the variance.
45. For comparison, standard practice today involves the creation of fewer than ten embryos.
46. Adult and embryonic stem cells can be coaxed to develop into sperm cells and oocytes, which
can then be fused to produce an embryo (Nagy et al. 2008; Nagy and Chang 2007). Egg cell pre-
cursors can also form parthenogenetic blastocysts, unfertilized and non-viable embryos, able to
produce embryonic stem cell lines for the process (Mai et al. 2007).
47. e opinion is that of Katsuhiko Hayashi, as reported in Cyranoski (2013). e Hinxton Group,
an international consortium of scientists that discusses stem cell ethics and challenges, predict-
ed in 2008 that human stem cell-derived gametes would be available within ten years (Hinxton
Group 2008), and developments thus far are broadly consistent with this.
48. Sparrow (2013); Miller (2012); e Uncertain Future (2012).
49. Spa r row (2 013).
50. Secular concerns might focus on anticipated impacts on social inequality, the medical safety
of the procedure, fears of an enhancement “rat race,” rights and responsibilities of parents vis-
à-vis their prospective ospring, the shadow of twentieth-century eugenics, the concept of
human dignity, and the proper limits of states’ involvement in the reproductive choices of
their citizens. (For a discussion of the ethics of cognitive enhancement see Bostrom and Ord
[2006], Bostrom and Roache [2011], and Sandberg and Savulescu [2011].) Some religious
traditions may oer additional concerns, including ones centering on the moral status of
embryos or the proper limits of human agency within the scheme of creation.
51. To stave o the negative eects of inbreeding, iterated embryo selection would require either
a large starting supply of donors or the expenditure of substantial selective power to reduce
harmful recessive alleles. Either alternative would tend to push toward ospring being less
closely genetically related to their parents (and more related to one another).
52. Adapted from Shulman and Bostrom (2014).
53. Bos tr om (20 08 b).
54. Just how dicult an obstacle epigenetics will be is not yet known (Chason et al. 2011; Iliadou
etal. 2011).
NOTES: CHAPTER 2
|
269
55. While cognitive ability is a fairly heritable trait, there may be few or no common alleles or
polymorphisms that individually have a large positive eect on intelligence (Davis et al. 2010;
Davies et al. 2011; Rietveld et al. 2013). As sequencing methods improve, the mapping out of
low-frequency alleles and their cognitive and behavioral correlates will become increasingly
feasible. ere is some theoretical evidence suggesting that some alleles that cause genetic dis-
orders in homozygotes may provide sizeable cognitive advantages in heterozygote carriers,
leading to a prediction that Gaucher, Tay-Sachs, and Niemann-Pick heterozygotes would be
about 5 IQ points higher than control groups (Cochran et al. 2006). Time will tell whether this
holds.
56. One paper (Nachman and Crowell 2000) estimates 175 mutations per genome per gener-
ation. Another (Lynch 2010), using dierent methods, estimates that the average newborn has
between 50 and 100 new mutations, and Kong et al. (2012) implies a gure of around 77 new
mutations per generation. Most of these mutations do not aect functioning, or do so only to
an imperceptibly slight degree; but the combined eects of many very slightly deleterious muta-
tions could be a signicant loss of tness. See also Crow (2000).
57. Crow (2000); Lynch (2010).
58. ere are some potentially important caveats to this idea. It is possible that the modal genome
would need some adjustments in order to avoid problems. For example, parts of the genome
might be adapted to interacting with other parts under the assumption that all parts function
with a certain level of eciency. Increasing the eciency of those parts might then lead to over-
shooting along some metabolic pathways.
59. ese composites were created by Mike Mike from individual photographs taken by Virtual
Flavius (Mike 2013).
60. ey can, of course, have some eects sooner—for instance, by changing peoples expectations
of what is to come.
61. Louis Harris & Associates (1969); Mason (2003).
62. Kalfoglou et al. (2004).
63. e data is obviously limited, but individuals selected for 1-in-10,000 results on childhood abil-
ity tests have been shown, in longitudinal studies, to be substantially more likely to become ten-
ured professors, earn patents, and succeed in business than those with slightly less exceptional
scores (Kell et al. 2013). Roe (1953) studied sixty-four eminent scientists and found median cog-
nitive ability three to four standard deviations above the population norm and strikingly higher
than is typical for scientists in general. (Cognitive ability is also correlated with lifetime earn-
ings and with non-nancial outcomes such as life expectancy, divorce rates, and probability
of dropping out of school [Deary 2012].) An upward shi of the distribution of cognitive abil-
ity would have disproportionately large eects at the tails, especially increasing the number of
highly gied and reducing the number of people with retardation and learning disabilities. See
also Bostrom and Ord (2006) and Sandberg and Savulescu (2011).
64. E.g. Warwick (2002). Stephen Hawking even suggested that taking this step might be necessary
in order to keep up with advances in machine intelligence: “We must develop as quickly as pos-
sible technologies that make possible a direct connection between brain and computer, so that
articial brains contribute to human intelligence rather than opposing it” (reported in Walsh
[2001]). Ray Kurzweil concurs: “As far as Hawking’s . . . recommendation is concerned, namely
direct connection between the brain and computers, I agree that this is both reasonable, desir-
able and inevitable. [sic] It’s been my recommendation for years” (Kurzweil 2001).
65. See Lebedev and Nicolelis (2006); Birbaumer et al. (2008); Mak and Wolpaw (2009); and
Nicolelis and Lebedev (2009). A more personal outlook on the problem of enhancement through
implants can be found in Chorost (2005, Chap. 11).
66. Smeding et al. (2006).
67. Degnan et al. (2002).
68. Dagnelie (2012); Shannon (2012).
69. Perlmutter and Mink (2006); Lyons (2011).
70. Koch et al. (2006).
71. Schalk (2008). For a general review of the current state of the art, see Berger et al. (2008). For the
case that this would help lead to enhanced intelligence, see Warwick (2002).
270
|
NOTES: CHAPTERS 2–3
72. Some examples: Bartels et al. (2008); Simeral et al. (2011); Krusienski and Shih (2011); and
Pasqualotto et al. (2012).
73. E.g. Hinke et al. (1993).
74. ere are partial exceptions to this, especially in early sensory processing. For example, the
primary visual cortex uses a retinotopic mapping, which means roughly that adjacent neural
assemblies receive inputs from adjacent areas of the retinas (though ocular dominance columns
somewhat complicate the mapping).
75. Berger et al. (2012); Hampson et al. (2012).
76. Some brain implants require two forms of learning: the device learning to interpret the organ-
ism’s neural representations and the organism learning to use the system by generating appro-
priate neural ring patterns (Carmena et al. 2003).
77. It has been suggested that we should regard corporate entities (corporations, unions, govern-
ments, churches, and so forth) as articial intelligent agents, entities with sensors and eec-
tors, able to represent knowledge and perform inference and take action (e.g. Kuipers [2012];
cf. Huebner [2008] for a discussion on whether collective representations can exist). ey are
clearly powerful and ecologically successful, although their capabilities and internal states are
dierent from those of humans.
78. Hanson (1995, 2000); Berg and Rietz (2003).
79. In the workplace, for instance, employers might use lie detectors to crack down on employee
the and shirking, by asking the employee at the end of each business day whether she has sto-
len anything and whether she has worked as hard as she could. Political and business leaders
could likewise be asked whether they were wholeheartedly pursuing the interests of their share-
holders or constituents. Dictators could use them to target seditious generals within the regime
or suspected troublemakers in the wider population.
80. One could imagine neuroimaging techniques making it possible to detect neural signatures of
motivated cognition. Without self-deception detection, lie detection would favor individuals
who believe their own propaganda. Better tests for self-deception tests could also be used to
train rationality and to study the eectiveness of interventions aimed at reducing biases.
81. Bell and Gemmel (2009). An early example is found in the work of MITs Deb Roy, who record-
ed every moment of his son’s rst three years of life. Analysis of this audiovisual data is yielding
information on language development; see Roy (2012).
82. Growth in total world population of biological human beings will contribute only a small fac-
tor. Scenarios involving machine intelligence could see the world population (including digital
minds) explode by many orders of magnitude in a brief period of time. But that road to superin-
telligence involves articial intelligence or whole brain emulation, so we need not consider it in
this subsection.
83. Vi nge (1993).
CHAPTER 3: FORMS OF SUPERINTELLIGENCE
1. Vernor Vinge has used the term “weak superintelligence” to refer to such sped-up human minds
(Vinge 1993).
2. For example, if a very fast system could do everything that any human could do except dance a
mazurka, we should still call it a speed superintelligence. Our interest lies in those core cogni-
tive capabilities that have economic or strategic signicance.
3. At least a millionfold speedup compared to human brains is physically possible, as can been
seen by considering the dierence in speed and energy of relevant brain processes in compari-
son to more ecient information processing. e speed of light is more than a million times
greater than that of neural transmission, synaptic spikes dissipate more than a million times
more heat than is thermodynamically necessary, and current transistor frequencies are more
than a million times faster than neuron spiking frequencies (Yudkowsky [2008a]; see also
Drexler [1992]). e ultimate limits of speed superintelligence are bounded by light-speed com-
munications delays, quantum limits on the speed of state transitions, and the volume needed
to contain the mind (Lloyd 2000). e “ultimate laptop” described by Lloyd (2000) would run
a 1.4×10
21
FLOPS brain emulation at speedup of 3.8×10
29
× (assuming the emulation could be
NOTES: CHAPTER 3
|
27
suciently parallelized). Lloyd’s construction, however, is not intended to be technologically
plausible; it is only meant to illustrate those constraints on computation that are readily deriv-
able from basic physical laws.
4. With emulations, there is also an issue of how long a human-like mind can keep working on
something before going mad or falling into a rut. Even with task variety and regular holidays,
it is not certain that a human-like mind could live for thousands of subjective years with-
out developing psychological problems. Furthermore, if total memory capacity is limited—a
consequence of having a limited neuron population—then cumulative learning cannot con-
tinue indenitely: beyond some point, the mind must start forgetting one thing for each new
thing it learns. (Articial intelligence could be designed such as to ameliorate these potential
problems.)
5. Accordingly, nanomechanisms moving at a modest 1m/s have typical timescales of nanosec-
onds. See section 2.3.2 of Drexler (1992). Robin Hanson mentions 7-mm “tinkerbell” robot bod-
ies moving at 260 times normal speed (Hanson 1994).
6. Hanson (2012).
7. “Collective intelligence” does not refer to low-level parallelization of computing hardware but
to parallelization at the level of intelligent autonomous agents such as human beings. Imple-
menting a single emulation on a massively parallel machine might result in speed superintel-
ligence if the parallel computer is suciently fast: it would not produce a collective intelligence.
8. Improvements to the speed or the quality of the individual components could also indirectly af-
fect the performance of collective intelligence, but here we mainly consider such improvements
under the other two forms of superintelligence in our classication.
9. It has been argued that a higher population density triggered the Upper Paleolithic Revolution
and that beyond a certain threshold accumulation of cultural complexity became much easier
(Powell et al. 2009).
10. What about the Internet? It seems not yet to have amounted to a super-sized boost. Maybe it will
do so eventually. It took centuries or millennia for the other examples listed here to reveal their
full potential.
11. is is, obviously, not meant to be a realistic thought experiment. A planet large enough to sus-
tain seven quadrillion human organisms with present technology would implode, unless it were
made of very light matter or were hollow and held up by pressure or other articial means.
(ADyson sphere or a Shellworld might be a better solution.) History would have unfolded dif-
ferently on such a vast surface. Set all this aside.
12. Our focus here is on the functional properties of a unied intellect, not on the question of
whether such an intellect would have qualia or whether it would be a mind in the sense of hav-
ing subjective conscious experience. (One might ponder, though, what kinds of conscious ex-
perience might arise from intellects that are more or less integrated than those of human brains.
On some views of consciousness, such as the global workspace theory, it seems one might ex-
pect more integrated brains to have more capacious consciousness. Cf. Baars (1997), Shanahan
(2010), and Schwitzgebel (2013).)
13. Even small groups of humans that have remained isolated for some time might still benet from
the intellectual outputs of a larger collective intelligence. For example, the language they use
might have been developed by a much larger linguistic community, and the tools they use might
have been invented in a much larger population before the small group became isolated. But
even if a small group had always been isolated, it might still be part of a larger collective intelli-
gence than meets the eye—namely, the collective intelligence consisting of not only the present
but all ancestral generations as well, an aggregate that can function as a feed-forward informa-
tion processing system.
14. By the Church–Turing thesis, all computable functions are computable by a Turing machine.
Since any of the three forms of superintelligence could simulate a Turing machine (if given
access to unlimited memory and allowed to operate indenitely), they are by this formal crite-
rion computationally equivalent. Indeed, an average human being (provided with unlimited
scrap paper and unlimited time) could also implement a Turing machine, and thus is also
equivalent by the same criterion. What matters for our purposes, however, is what these dif-
ferent systems can achieve in practice, with nite memory and in reasonable time. And the
272
|
NOTES: CHAPTER 3
eciency variations are so great that one can readily make some distinctions. For example, a
typical individual with an IQ of 85 could be taught to implement a Turing machine. (Conceiv-
ably, it might even be possible to train some particularly gied and docile chimpanzee to do
this.) Yet, for all practical intents and purposes, such an individual is presumably incapable of,
say, independently developing general relativity theory or of winning a Fields medal.
15. Oral storytelling traditions can produce great works (such as the Homeric epics) but perhaps
some of the contributing authors possessed uncommon gis.
16. Unless it contains as components intellects that have speed or quality superintelligence.
17. Our inability to specify what all these problems are may in part be due to a lack of trying: there
is little point in spending time detailing intellectual jobs that no individual and no currently
feasible organization can perform. But it is also possible that even conceptualizing some of
these jobs is itself one of those jobs that we currently lack the brains to perform.
18. Cf. Boswell (1917); see also Walker (2002).
19. is mainly occurs in short bursts in a subset of neurons—most have more sedate ring
rates (Gray and McCormick 1996; Steriade et al. 1998). ere are some neurons (“chattering
neurons,” also known as “fast rhythmically bursting” cells) that may reach ring frequencies as
high as 750Hz, but these seem to be extreme outliers.
20. Feldman and Ballard (1982).
21. e conduction velocity depends on axon diameter (thicker axons are faster) and whether the
axon is myelinated. Within the central nervous system, transmission delays can range from less
than a millisecond to up to 100ms (Kandel et al. 2000). Transmission in optical bers is around
68% c (because of the refractive index of the material). Electrical cables are roughly the same
speed, 59–77% c.
22. is assumes a signal velocity of 70% c. Assuming 100% c ups the estimate to 1.10
18
m
3
.
23. e number of neurons in an adult human male brain has been estimated at 86.1 ± 8.1 billion, a
number arrived at by dissolving brains and fractionating out the cell nuclei, counting the ones
stained with a neuron-specic marker. In the past, estimates in the neighborhood of 75–125 bil-
lion neurons were common. ese were typically based on manual counting of cell densities in
representative small regions (Azevedo et al. 2009).
24. Whitehead (2003).
25. Information processing systems can very likely use molecular-scale processes for computing
and data storage and reach at least planetary size in extent. e ultimate physical limits to com-
putation set by quantum mechanics, general relativity, and thermodynamics are, however, far
beyond this “Jupiter brain” level (Sandberg 1999; Lloyd 2000).
26. Stansberry and Kudritzki (2012). Electricity used in data centers worldwide amounted to
1.11.5% of total electricity use (Koomey 2011). See also Muehlhauser and Salamon (2012).
27. is is an oversimplication. e number of chunks working memory can maintain is both
information- and task-dependent; however, it is clearly limited to a small number of chunks. See
Miller (1956) and Cowan (2001).
28. An example might be that the diculty of learning Boolean concepts (categories dened by
logical rules) is proportional to the length of the shortest logically equivalent propositional for-
mula. Typically, even formulae just 34 literals long are very dicult to learn. See Feldman
(2000).
29. See Landauer (1986). is study is based on experimental estimates of learning and forgetting
rates in humans. Taking into account implicit learning might push the estimate up a little. If
one assumes a storage capacity ~1 bit per synapse, one gets an upper bound on human memory
capacity of about 10
15
bits. For an overview of dierent estimates, see Appendix A of Sandberg
and Bostrom (2008).
30. Channel noise can trigger action potentials, and synaptic noise produces signicant variability
in the strength of transmitted signals. Nervous systems appear to have evolved to make numer-
ous trade-os between noise tolerance and costs (mass, size, time delays); see Faisal et al. (2008).
For example, axons cannot be thinner than 0.1µm lest random opening of ion channels create
spontaneous action potentials (Faisal et al. 2005).
31. Trachtenberg et al. (2002).
NOTES: CHAPTERS 34
|
273
32. In terms of memory and computational power, though not in terms of energy eciency. e
fastest computer in the world at the time of writing was China’s “Tianhe-2,” which displaced
Cray Inc. Titan in June 2013 with a performance of 33.86 petaFLOPS. It uses 17.6MW of power,
almost six orders of magnitude more than the brain’s ~20W.
33. Note that this survey of sources of machine advantage is disjunctive: our argument succeeds
even if some of the items listed are illusory, so long as there is at least one source that can provide
a suciently large advantage.
CHAPTER 4: THE KINETICS OF AN INTELLIGENCE EXPLOSION
1. e system may not reach one of these baselines at any sharply dened point. ere may instead
be an interval during which the system gradually becomes able to outperform the external re-
search team on an increasing number of system-improving development tasks.
2. In the past half-century, at least one scenario has been widely recognized in which the existing
world order would come to an end in the course of minutes or hours: global thermonuclear war.
3. is would be consistent with the observation that the Flynn eect—the secular increase in
measured IQ scores within most populations at a rate of some 3 IQ points per decade over the
past 60 years or so—appears to have ceased or even reversed in recent years in some highly
developed countries such as the United Kingdom, Denmark, and Norway (Teasdale and Owen
2008; Sundet et al. 2004). e cause of the Flynn eect in the past—and whether and to what ex-
tent it represents any genuine gain in general intelligence or merely improved skill at solving IQ
test-style puzzles—has been the subject of wide debate and is still not known. Even if the Flynn
eect (at least partially) reects real cognitive gains, and even if the eect is now diminishing
or even reversing, this does not prove that we have yet hit diminishing returns in whatever un-
derlying cause was responsible for the observed Flynn eect in the past. e decline or reversal
could instead be due to some independent detrimental factor that would otherwise have pro-
duced an even bigger observed decline.
4. Bostrom and Roache (2011).
5. Somatic gene therapy could eliminate the maturational lag, but is technically much more chal-
lenging than germline interventions and has a lower ultimate potential.
6. Average global economic productivity growth per year over the period 1960–2000 was 4.3%
(Isaksson 2007). Only part of this productivity growth is due to gains in organizational
eciency. Some particular networks or organizational processes of course are improving at
much faster rates.
7. Biological brain evolution was subject to many constraints and trade-os that are drastically
relaxed when the mind moves to a digital medium. For example, brain size is limited by head
size, and a head that is too big has trouble passing through the birth canal. A large brain also
guzzles metabolic resources and is a dead weight that impedes movement. e connectivity be-
tween certain brain regions might be limited by steric constraints—the volume of white matter
is signicantly larger than the volume of the gray matter it connects. Heat dissipation is limited
by blood ow, and might be close to the upper limit for acceptable functioning. Furthermore,
biological neurons are noisy, slow, and in need of constant protection, maintenance, and resup-
ply by glial cells and blood vessels (contributing to the intracranial crowding). See Bostrom and
Sandberg (2009b).
8. Yudkowsky (2008a, 326). For a more recent discussion, see Yudkowsky (2013).
9. e picture shows cognitive ability as a one-dimensional parameter, to keep the drawing
simple. But this is not essential to the point being made here. One could, for example, instead
represent a cognitive ability prole as a hypersurface in a multidimensional space.
10. Lin et al. (2012).
11. One gets a certain increase in collective intelligence simply by increasing the number of its
constituent intellects. Doing so should at least enable better overall performance on tasks that
can be easily parallelized. To reap the full returns from such a population explosion, however,
one would also need to achieve some (more than minimal) level of coordination between the
constituents.
274
|
NOTES: CHAPTERS 45
12. e distinction between speed and quality of intelligence is anyhow blurred in the case of non-
neuromorphic AI systems.
13. Rajab et al. (2006, 41–52).
14. It has been suggested that using congurable integrated circuits (FPGAs) rather than general-
purpose processors could increase computational speeds in neural network simulations by up
to two orders of magnitude (Markram 2006). A study of high-resolution climate modeling in
the petaFLOP-range found a twenty-four to thirty-four-fold reduction of cost and about two
orders of magnitude reduction in power requirements using a custom variant of embedded pro-
cessor chips (Wehner et al. 2008).
15. Nordhaus (2007). ere are many overviews of the dierent meanings of Moore’s law; see, e.g.,
Tuomi (2002) and Mack (2011).
16. If the development is slow enough, the project can avail itself of progress being made in the in-
terim by the outside world, such as advances in computer science made by university research-
ers and improvements in hardware made by the semiconductor industry.
17. Algorithmic overhang is perhaps less likely, but one exception would be if exotic hardware such
as quantum computing becomes available to run algorithms that were previously infeasible.
One might also argue that neural networks and deep machine learning are cases of algorithm
overhang: too computationally expensive to work well when rst invented, they were shelved for
a while, then dusted o when fast graphics processing units made them cheap to run. Now they
win contests.
18. And even if progress on the way toward the human baseline were slow.
19. O
world
is that part of the worlds optimization power that is applied to improving the system in
question. For a project operating in complete isolation, one that receives no signicant ongoing
support from the external world, we have O
world
0, even though the project must have started
with a resource endowment (computers, scientic concepts, educated personnel, etc.) that is
derived from the entire world economy and many centuries of development.
20. e most relevant of the seed AI’s cognitive abilities here is its ability to perform intelligent de-
sign work to improve itself, i.e. its intelligence amplication capability. (If the seed AI is good at
enhancing another system, which is good at enhancing the seed AI, then we could view these as
subsystems of a larger system and focus our analysis on the greater whole.)
21. is assumes that recalcitrance is not known to be so high as to discourage investment alto-
gether or divert it to some alternative project.
22. A similar example is discussed in Yudkowsky (2008b).
23. Since inputs have risen (e.g. amounts invested in building new foundries, and number of people
working in the semiconductor industry), Moore’s law itself has not given such a rapid growth
if we control for this increase in inputs. Combined with advances in soware, however, an
18-month doubling time in performance per unit of input may be more historically plausible.
24. Some tentative attempts have been made to develop the idea of an intelligence explosion within
the framework of economic growth theory; see, e.g., Hanson (1998b); Jones (2009); Salamon
(2009). ese studies have pointed to the potential of extremely rapid growth given the arrival
of digital minds, but since endogenous growth theory is relatively poorly developed even for
historical and contemporary applications, any application to a potentially discontinuous future
context is better viewed at this stage as a source of potentially useful concepts and considera-
tions than as an exercise likely to deliver authoritative forecasts. For an overview of attempts to
mathematically model a technological singularity, see Sandberg (2010).
25. It is of course also possible that there will be no takeo at all. But since, as argued earlier, super-
intelligence looks technically feasible, the absence of a takeo would likely be due to the inter-
vention of some defeater, such as an existential catastrophe. If strong superintelligence arrived
not in the shape of articial intelligence or whole brain emulation but through one of other
paths we considered above, then a slower takeo would be more likely.
CHAPTER 5: DECISIVE STRATEGIC ADVANTAGE
1. A soware mind might run on a single machine as opposed to a worldwide network of comput-
ers; but this is not what we mean by “concentration.” Instead, what we are interested in here is
NOTES: CHAPTER 5
|
275
the extent to which power, specically power derived from technological ability, will be concen-
trated in the advanced stages of, or immediately following, the machine intelligence revolution.
2. Technology diusion of consumer products, for example, tends to be slower in developing
countries (Talukdar et al. 2002). See also Keller (2004) and e World Bank (2008).
3. e economic literature dealing with the theory of the rm is relevant as a comparison point for
the present discussion. e locus classicus is Coase (1937). See also, e.g., Canbäck et al. (2006);
Milgrom and Roberts (1990); Hart (2008); Simester and Knez (2002).
4. On the other hand, it could be especially easy to steal a seed AI, since it consists of soware that
could be transmitted electronically or carried on a portable memory device.
5. Barber (1991) suggests that the Yangshao culture (5000–3000 ) might have used silk. Sun
etal. (2012) estimate, based on genetic studies, domestication of the silkworm to have occurred
about 4,100 years ago.
6. Cook (1984, 144). is story might be too good to withstand historical scrutiny, rather like Pro-
copius’ (Wars VIII.xvii.1–7) story of how the silkworms were supposedly brought to Byzantium
by wandering monks, hidden in their hollow bamboo staves (Hunt 2011).
7. Wood (2007); Temple (1986).
8. Pre-Columbian cultures did have the wheel but used it only for toys (probably due to a lack of
good dra animals).
9. Koubi (1999); Lerner (1997); Koubi and Lalman (2007); Zeira (2011); Judd et al. (2012).
10. Estimated from a variety of sources. e time gap is oen somewhat arbitrary, depending on
how exactly “equivalent” capabilities are dened. Radar was used by at least two countries with-
in a couple of years of its introduction, but exact gures in months are hard to come by.
11. e RDS-6 in 1953 was the rst test of a bomb with fusion reactions, but the RDS-37 in 1955 was
the rst “true” fusion bomb, where most power came from the fusion reaction.
12. Unconr m ed.
13. Tests in 1989, project cancelled in 1994.
14. Deployed system, capable of a range greater than 5,000km.
15. Polaris missiles bought from the USA.
16. Current work is underway on the Taimur missile, likely based on Chinese missiles.
17. e RSA-3 rocket tested 1989–90 was intended for satellite launches and/or as an ICBM.
18. MIRV = multiple independently targetable re-entry vehicle, a technology that enables a single
ballistic missile to carry multiple warheads that can be programmed to hit dierent targets.
19. e Agni V system is not yet in service.
20. Ellis (1999).
21. If we model the situation as one where the lag time between projects is drawn from a normal
distribution, then the likely distance between the leading project and its closest follower will
also depend on how many projects there are. If there are a vast number of projects, then the dis-
tance between the rst two is likely small even if the variance of the distribution is moderately
high (though the expected gap between the lead and the second project declines very slowly
with the number of competitors if completion times are normally distributed). However, it is
unlikely that there will be a vast number of projects that are each well enough resourced to be
serious contenders. (ere might be a greater number of projects if there are a large number of
dierent basic approaches that could be pursued, but in that case many of those approaches are
likely to prove dead ends.) As suggested, empirically we seem to nd that there is usually no
more than a handful of serious competitors pursuing any one specic technological goal. e
situation is somewhat dierent in a consumer market where there are many niches for slightly
dierent products and where barriers to entry are low. ere are lots of one-person projects
designing T-shirts, but only a few rms in the world developing the next generation of graphics
cards. (Two rms, AMD and NVIDIA, enjoy a near duopoly at the moment, though Intel is also
competing at the lower-performance end of the market.)
22. Bostrom (2006c). One could imagine a singleton whose existence is invisible (e.g. a superintel-
ligence with such advanced technology or insight that it could subtly control world events with-
out any human noticing its interventions); or a singleton that voluntarily imposes very strict
limitations on its own exercise of power (e.g. punctiliously conning itself to ensuring that cer-
tain treaty-specied international rules—or libertarian principles—are respected). How likely
276
|
NOTES: CHAPTER 5
any particular kind of singleton is to arise is of course an empirical question; but conceptually, at
least, it is possible to have a good singleton, a bad singleton, a rambunctiously diverse singleton,
a blandly monolithic singleton, a crampingly oppressive singleton, or a singleton more akin to
an extra law of nature than to a yelling despot.
23. Jones (1985, 344).
24. It might be signicant that the Manhattan Project was carried out during wartime. Many of the
scientists who participated claimed to be primarily motivated by the wartime situation and the
fear that Nazi Germany might develop atomic weapons ahead of the Allies. It might be dicult
for many governments to mobilize a similarly intensive and secretive eort in peacetime. e
Apollo program, another iconic science/engineering megaproject, received a strong impetus
from the Cold War rivalry.
25. ough even if they were looking hard, it is not clear that they would appear (publicly) to be do-
ing so.
26. Cryptographic techniques could enable the collaborating team to be physically dispersed.
e only weak link in the communication chain might be the input stage, where the physical
act of typing could potentially be observed. But if indoor surveillance became common (by
means of microscopic recording devices), those keen on protecting their privacy might de-
velop countermeasures (e.g. special closets that could be sealed o from would-be eavesdrop-
ping devices). Whereas physical space might become transparent in a coming surveillance
age, cyberspace might possibly become more protected through wider adoption of stronger
cryptographic protocols.
27. A totalitarian state might take recourse to even more coercive measures. Scientists in relevant
elds might be swept up and put into work camps, akin to the “academic villages” in Stalinist
Russia.
28. When the level of public concern is relatively low, some researchers might welcome a little bit of
public fear-mongering because it draws attention to their work and makes the area they work in
seem important and exciting. When the level of concern becomes greater, the relevant research
communities might change their tune as they begin to worry about funding cuts, regulation,
and public backlash. Researchers in neighboring disciplines—such as those parts of computer
science and robotics that are not very relevant to articial general intelligence—might resent
the dri of funding and attention away from their own research areas. ese researchers might
also correctly observe that their work carries no risk whatever of leading to a dangerous intel-
ligence explosion. (Some historical parallels might be drawn with the career of the idea of nano-
technology; see Drexler [2013].)
29. ese have been successful in that they have achieved at least some of what they set out to do.
How successful they have been in a broader sense (taking into account cost-eectiveness and
so forth) is harder to determine. In the case of the International Space Station, for example,
there have been huge cost overruns and delays. For details of the problems encountered by the
project, see NASA (2013). e Large Hadron Collider project has had some major setbacks, but
this might be due to the inherent diculty of the task. e Human Genome Project achieved
success in the end, but seems to have received a speed boost from being forced to compete with
Craig Venter’s private corporate eort. Internationally sponsored projects to achieve controlled
fusion energy have failed to deliver on expectations, despite massive investment; but again, this
might be attributable to the task turning out to be more dicult than anticipated.
30. US Congress, Oce of Technology Assessment (1995).
31. Homan (2009); Rhodes (2008).
32. R hodes (1986).
33. e US Navys code-breaking organization, OP-20-G, apparently ignored an invitation to gain
full knowledge of Britain’s anti-Enigma methods, and failed to inform higher-level US decision
makers of Britain’s oer to share its cryptographic secrets (Burke 2001). is gave American
leaders the impression that Britain was withholding important information, a cause of friction
throughout the war. Britain did share with the Soviet government some of the intelligence they
had gleaned from decrypted German communications. In particular, Russia was warned about
the German preparations for Operation Barbarossa. But Stalin refused to believe the warning,
partly because the British did not disclose how they had obtained the information.
NOTES: CHAPTERS 56
|
277
34. For a few years, Russell seems to have advocated the threat of nuclear war to persuade Rus-
sia to accept the Baruch plan; later, he was a strong proponent of mutual nuclear disarmament
(Russell and Grin 2001). John von Neumann is reported to have believed that a war between
the United States and Russia was inevitable, and to have said, “If you say why not bomb them
[the Russians] tomorrow, I say why not bomb them today? If you say today at ve o’clock, I say
why not one o’clock?” (It is possible that he made this notorious statement to burnish his anti-
communist credentials with US Defense hawks in the McCarthy era. Whether von Neumann,
had he been in charge of US policy, would actually have launched a rst strike is impossible to
ascertain. See Blair [1957], 96.)
35. B ar at ta (20 0 4).
36. If the AI is controlled by a group of humans, the problem may apply to this human group,
though it is possible that new ways of reliably committing to an agreement will be available by
this time, in which case even human groups could avoid this problem of potential internal un-
raveling and overthrow by a sub-coalition.
CHAPTER 6: COGNITIVE SUPERPOWERS
1. In what sense is humanity a dominant species on Earth? Ecologically speaking, humans are the
most common large (~50kg) animal, but the total human dry biomass (~100 billionkg) is not
so impressive compared with that of ants, the family Formicidae (300 billion–3,000 billion kg).
Humans and human utility organisms form a very small part (<0.001) of total global biomass.
However, croplands and pastures are now among the largest ecosystems on the planet, covering
about 35% of the ice-free land surface (Foley et al. 2007). And we appropriate nearly a quarter
of net primary productivity according to a typical assessment (Haberl et al. 2007), though esti-
mates range from 3 to over 50% depending mainly on varying denitions of the relevant terms
(Haberl et al. 2013). Humans also have the largest geographic coverage of any animal species
and top the largest number of dierent food chains.
2. Zalasiewicz et al. (2008).
3. See rst note to this chapter.
4. Strictly speaking, this may not be quite correct. Intelligence in the human species ranges all the
way down to approximately zero (e.g. in the case of embryos or patients in permanent vegeta-
tive state). In qualitative terms, the maximum dierence in cognitive ability within the human
species is therefore perhaps greater than the dierence between any human and a superintel-
ligence. But the point in the text stands if we read “human” as “normally functioning adult.
5. Gottfredson (2002). See also Carroll (1993) and Deary (2001).
6. See Legg (2008). Roughly, Legg proposes to measure a reinforcement-learning agent as its
expected performance in all reward-summable environments, where each such environment
receives a weight determined by its Kolmogorov complexity. We will explain what is meant
by reinforcement learning in Chapter 12. See also Dowe and Herndez-Orallo (2012) and
Hibbard (2011).
7. With regard to technology research in areas like biotechnology and nanotechnology, what a
superintelligence would excel at is the design and modeling of new structures. To the extent that
design ingenuity and modeling cannot substitute for physical experimentation, the superintel-
ligence’s performance advantage may be qualied by its level of access to the requisite experi-
mental apparatus.
8. E.g., Drexler (1992, 2013).
9. A narrow-domain AI could of course have signicant commercial applications, but this does
not mean that it would have the economic productivity superpower. For example, even if a
narrow-domain AI earned its owners several billions of dollars a year, this would still be four
orders of magnitude less than the rest of the world economy. In order for the system directly and
substantially to increase world product, an AI would need to be able to perform many kinds of
work; that is, it would need competence in many domains.
10. e criterion does not rule out all scenarios in which the AI fails. For example, the AI might ra-
tionally take a gamble that has a high chance of failing. In this case, however, the criterion could
take the form that (a) the AI should make an unbiased estimate of the gambles low chance of
278
|
NOTES: CHAPTER 6
success and (b) there should be no better gamble available to the AI that we present-day humans
can think of but that the AI overlooks.
11. Cf. Freitas (2000) and Vassar and Freitas (2006).
12. Yudkowsky (2008a).
13. Freitas (1980); Freitas and Merkle (2004, Chap. 3); Armstrong and Sandberg (2013).
14. See, e.g., Human and Pless (2003), Knill et al. (2000), Drexler (1986).
15. at is to say, the distance would be small on some “natural” metric, such as the logarithm of
the size of the population that could be sustainably supported at subsistence level by a given
level of capability if all resources were devoted to that end.
16. is estimate is based on the WMAP estimate of a cosmological baryon density of 9.9×10
–30
g/cm
3
and assumes that 90% of the mass is intergalactic gas, that some 15% of the galactic mass is stars
(about 80% of baryonic matter), and that the average star weighs in at 0.7 solar masses (Read and
Trentham 2005; Carroll and Ostlie 2007).
17. Armstrong and Sandberg (2013).
18. Even at 100% of c (which is unattainable for objects with nonzero rest mass) the number of
reachable galaxies is only about 6×10
9
. (Cf. Gott et al. [2005] and Heyl [2005].) We are assuming
that our current understanding of the relevant physics is correct. It is hard to be very condent
in any upper bound, since it is at least conceivable that a superintelligent civilization might ex-
tend its reach in some way that we take to be physically impossible (for instance, by building
time machines, by spawning new inationary universes, or by some other, as yet unimagined
means).
19. e number of habitable planets per star is currently uncertain, so this is merely a crude esti-
mate. Traub (2012) predicts that one-third of stars in spectral classes F, G, or K have at least one
terrestrial planet in the habitable zone; see also Clavin (2012). FGK stars form about 22.7% of
the stars in the solar neighborhood, suggesting that 7.6% of stars have potentially suitable plan-
ets. In addition, there might be habitable planets around the more numerous M stars (Gilster
2012). See also Robles et al. (2008).
It would not be necessary to subject human bodies to the rigors of intergalactic travels. AIs
could oversee the colonization process. Homo sapiens could be brought along as information,
which the AIs could later use to instantiate specimens of our species. For example, genetic in-
formation could be synthesized into DNA, and a rst generation of humans could be incubated,
raised, and educated by AI guardians taking an anthropomorphic guise.
20. O’Neill (1974).
21. Dyson (1960) claims to have gotten the basic idea from science ction writer Olaf Stapledon
(1937), who in turn might have been inspired by similar thoughts by J. D. Bernal (Dyson 1979,
211).
22. Landauer’s principle states that there is a minimum amount of energy required to change one
bit of information, known as theLandauer limit, equal to kTln 2, where k is theBoltzmann
constant (1.38×10
−23
J/K) and T is the temperature. If we assume the circuitry is maintained
at around 300K, then 10
26
watts allows us to erase approximately 10
47
bits per second. (On the
achievable eciency of nanomechanical computational devices, see Drexler [1992]. See also
Bradbury [1999]; Sandberg [1999]; Ćirković [2004]. e foundations of Landauer’s principle are
still somewhat in dispute; see, e.g., Norton [2011].)
23. Stars vary in their power output, but the Sun is a fairly typical main-sequence star.
24. A more detailed analysis might consider more closely what types of computation we are inter-
ested in. e number of serial computations that can be performed is quite limited, since a fast
serial computer must be small in order to minimize communications lags within the dierent
parts of the computer. ere are also limits on the number of bits that can be stored, and, as we
saw, on the number of irreversible computational steps (involving the erasure of information)
that can be performed.
25. We are assuming here that there are no extraterrestrial civilizations that might get in the way.
We are also assuming that the simulation hypothesis is false. See Bostrom (2003a). If either of
these assumptions is incorrect, there may be important non-anthropogenic risks—ones that
involve intelligent agency of a nonhuman sort. See also Bostrom (2003b, 2009c).
NOTES: CHAPTERS 6–7
|
279
26. At least a wise singleton that grasped the idea of evolution could, in principle, have embarked on a
eugenics program by means of which it could slowly have raised its level of collective intelligence.
27. Tetlock and Belkin (1996).
28. To be clear: colonizing and re-engineering a large part of the accessible universe is not currently
within our direct reach. Intergalactic colonization is far beyond today’s technology. e point is
that we could in principle use our present capabilities to develop the additional capabilities that
would be needed, thus placing the accomplishment within our indirect reach. It is of course also
true that humanity is not currently a singleton and that we do not know that we would never
face intelligent opposition from some external power if we began to re-engineer the accessible
universe. To meet the wise-singleton sustainability threshold, however, it suces that one pos-
sesses a capability set such that if a wise singleton facing no intelligent opposition had possessed
this capability set then the colonization and reengineering of a large part of the accessible uni-
verse would be within its indirect reach.
29. Sometimes it might be useful to speak of two AIs as each having a given superpower. In an ex-
tended sense of the word, one could thus conceive of a superpower as something that an agent
has relative to some eld of action—in this example, perhaps a eld that includes all of human
civilization but excludes the other AI.
CHAPTER 7: THE SUPERINTELLIGENT WILL
1. is is of course not to deny that dierences that appear small visually can be functionally pro-
found.
2. Yudkowsky (2008a, 310).
3. David Hume, the Scottish Enlightenment philosopher, thought that beliefs alone (say, about
what is a good thing to do) cannot motivate action: some desire is required. is would support
the orthogonality thesis by undercutting one possible objection to it, namely that sucient in-
telligence might entail the acquisition of certain beliefs which would then necessarily produce
certain motivations. However, although the orthogonality thesis can draw support from the
Humean theory of motivation, it does not presuppose it. In particular, one need not maintain
that beliefs alone can never motivate action. It would suce to assume, for example, that an
agent—be it ever so intelligent—can be motivated to pursue any course of action if the agent
happens to have certain desires of some sucient, overriding strength. Another way in which
the orthogonality thesis could be true even if the Humean theory of motivation is false is if ar-
bitrarily high intelligence does not entail the acquisition of any such beliefs as are (putatively)
motivating on their own. A third way in which it might be possible for the orthogonality thesis
to be true even if the Humean theory were false is if it is possible to build an agent (or more
neutrally, an “optimization process”) with arbitrarily high intelligence but with constitution so
alien as to contain no clear functional analogs to what in humans we call “beliefs” and “desires.
(For some recent attempts to defend the Humean theory of motivation see Smith [1987], Lewis
[1988], and Sinhababu [2009].)
4. For instance, Derek Part has argued that certain basic preferences would be irrational, such as
that of an otherwise normal agent who has “Future-Tuesday-Indierence”:
A certain hedonist cares greatly about the quality of his future experiences. With one exception,
he cares equally about all the parts of his future. The exception is that he has Future-Tuesday-
Indierence. Throughout every Tuesday he cares in the normal way about what is happening to
him. But he never cares about possible pains or pleasures on a future Tuesday... . This indierence
is a bare fact. When he is planning his future, it is simply true that he always prefers the prospect
of great suering on a Tuesday to the mildest pain on any other day. (Part [986, 234]; see also
Part [20])
For our purposes, we need take no stand on whether Part is right that this agent is irrational,
so long as we grant that it is not necessarily unintelligent in the instrumental sense explained
in the text. Parts agent could have impeccable instrumental rationality, and therefore great
intelligence, even if he falls short on some kind of sensitivity to “objective reason” that might
280
|
NOTES: CHAPTER 7
be required of a fully rational agent. erefore, this kind of example does not undermine the
orthogonality thesis.
5. Even if there are objective moral facts that any fully rational agent would comprehend, and even
if these moral facts are somehow intrinsically motivating (such that anybody who fully compre-
hends them is necessarily motivated to act in accordance with them), this need not undermine
the orthogonality thesis. e thesis could still be true if an agent could have impeccable instru-
mental rationality even whilst lacking some other faculty constitutive of rationality proper, or
some faculty required for the full comprehension of the objective moral facts. (An agent could
also be extremely intelligent, even superintelligent, without having full instrumental rationality
in every domain.)
6. For more on the orthogonality thesis, see Bostrom (2012) and Armstrong (2013).
7. Sandberg and Bostrom (2008).
8. Stephen Omohundro has written two pioneering papers on this topic (Omohundro 2007,
2008). Omohundro argues that all advanced AI systems are likely to exhibit a number of “basic
drives,” by which he means “tendencies which will be present unless explicitly counteracted.”
e term “AI drive” has the advantage of being short and evocative, but it has the disadvantage
of suggesting that the instrumental goals to which it refers inuence the AIs decision-making
in the same way as psychological drives inuence human decision-making, i.e. via a kind of
phenomenological tug on our ego which our willpower may occasionally succeed in resisting.
at connotation is unhelpful. One would not normally say that a typical human being has a
drive” to ll out their tax return, even though ling taxes may be a fairly convergent instru-
mental goal for humans in contemporary societies (a goal whose realization averts trouble that
would prevent us from realizing many of our nal goals). Our treatment here also diers from
that of Omohundro in some other more substantial ways, although the underlying idea is the
same. (See also Chalmers [2010] and Omohundro [2012].)
9. Chislenko (1997).
10. See also Shulman (2010b).
11. An agent might also change its goal representation if it changes its ontology, in order to trans-
pose its old representation into the new ontology; cf. de Blanc (2011).
Another type of factor that might make an evidential decision theorist undertake various ac-
tions, including changing its nal goals, is the evidential import of deciding to do so. For exam-
ple, an agent that follows evidential decision theory might believe that there exist other agents
like it in the universe, and that its own actions will provide some evidence about how those
other agents will act. e agent might therefore choose to adopt a nal goal that is altruistic
towards those other evidentially linked agents, on grounds that this will give the agent evidence
that those other agents will have chosen to act in like manner. An equivalent outcome might be
obtained, however, without changing one’s nal goals, by choosing in each instant to act as if
one had those nal goals.
12. An extensive psychological literature explores adaptive preference formation. See, e.g., Forgas
etal. (2010).
13. In formal models, the value of information is quantied as the dierence between the expected
value realized by optimal decisions made with that information and the expected value realized
by optimal decisions made without it. (See, e.g., Russell and Norvig [2010].) It follows that the
value of information is never negative. It also follows that any information you know will never
aect any decision you will ever make has zero value for you. However, this kind of model as-
sumes several idealizations which are oen invalid in the real world—such as that knowledge
has no nal value (meaning that knowledge has only instrumental value and is not valuable for
its own sake) and that agents are not transparent to other agents.
14. E.g., Hájek (2009).
15. is strategy is exemplied by the sea squirt larva, which swims about until it nds a suitable
rock, to which it then permanently axes itself. Cemented in place, the larva has less need for
complex information processing, whence it proceeds to digest part of its own brain (its cerebral
ganglion). One can observe the same phenomenon in some academics when they have been
granted tenure.
16 . Bo st rom (2 012).
NOTES: CHAPTERS 7–8
|
28
17. Bostrom (2006c).
18. One could reverse the question and look instead at possible reasons for a superintelligent single-
ton not to develop some technological capabilities. ese include the following: (a) the singleton
foresees that it will have no use for the capability; (b) the development cost is too large relative
to its anticipated utility (e.g. if the technology will never be suitable for achieving any of the sin-
gleton’s ends, or if the singleton has a very high discount rate that strongly discourages invest-
ment); (c) the singleton has some nal value that requires abstention from particular avenues of
technology development; (d) if the singleton is not certain it will remain stable, it might prefer
to refrain from developing technologies that could threaten its internal stability or that would
make the consequences of dissolution worse (for instance, a world government may not wish to
develop technologies that would facilitate rebellion, even if they have some good uses, nor de-
velop technologies for the easy production of weapons of mass destruction which could wreak
havoc if the world government were to dissolve); (e) similarly, the singleton might have made
some kind of binding strategic commitment not to develop some technology, a commitment
that remains operative even if it would now be convenient to develop it. (Note, however, that
some current reasons for technology development would not apply to a singleton: for instance,
reasons arising from arms races.)
19. Suppose that an agent discounts resources obtained in the future at an exponential rate, and
that because of the light speed limitation the agent can only increase its resource endowment
at a polynomial rate. Would this mean that there will be some time aer which the agent would
not nd it worthwhile to continue acquisitive expansion? No, because although the present
value of the resources obtained at future times would asymptote to zero the further into the
future we look, so would the present cost of obtaining them. e present cost of sending out
one more von Neumann probe a 100 million years from now (possibly using some resource
acquired some short time earlier) would be diminished by the same discount factor that would
diminish the present value of the future resources that the extra probe would acquire (modulo a
constant factor).
20. While the volume reached by colonization probes at a given time might be roughly spherical
and expanding with a rate proportional to the square of time elapsed since the rst probe was
launched (~t
2
), the amount of resources contained within this volume will follow a less regular
growth pattern, since the distribution of resources is inhomogeneous and varies over several
scales. Initially, the growth rate might be ~t
2
as the home planet is colonized; then the growth
rate might become spiky as nearby planets and solar systems are colonized; then, as the roughly
disc-shaped volume of the Milky Way gets lled out, the growth rate might even out, to be ap-
proximately proportional to t; then the growth rate might again become spiky as nearby galax-
ies are colonized; then the growth rate might again approximate ~t
2
as expansion proceeds on
a scale over which the distribution of galaxies is roughly homogeneous; then another period of
spiky growth followed by smooth ~t
2
growth as galactic superclusters are colonized; until ulti-
mately the growth rate starts a nal decline, eventually reaching zero as the expansion speed of
the universe increases to such an extent as to make further colonization impossible.
21. e simulation argument may be of particular importance in this context. A superintelligent
agent may assign a signicant probability to hypotheses according to which it lives in a comput-
er simulation and its percept sequence is generated by another superintelligence, and this might
generate various convergent instrumental reasons depending on the agents guesses about what
types of simulations it is most likely to be in. Cf. Bostrom (2003a).
22. Discovering the basic laws of physics and other fundamental facts about the world is a conver-
gent instrumental goal. We may place it under the rubric “cognitive enhancement” here, though
it could also be derived from the “technology perfection” goal (since novel physical phenomena
might enable novel technologies).
CHAPTER 8: IS THE DEFAULT OUTCOME DOOM?
1. Some additional existential risk resides in scenarios in which humanity survives in some highly
suboptimal state or in which a large portion of our potential for desirable development is irre-
versibly squandered. On top of this, there may be existential risks associated with the lead-up to
282
|
NOTES: CHAPTERS 8–9
a potential intelligence explosion, arising, for example, from war between countries competing
to develop superintelligence rst.
2. ere is an important moment of vulnerability when the AI rst realizes the need for such
concealment (an event which we may term the conception of deception). is initial realization
would not itself be deliberately concealed when it occurs. But having had this realization, the
AI might move swily to hide the fact that the realization has occurred, while setting up some
covert internal dynamic (perhaps disguised as some innocuous process that blends in with all
the other complicated processes taking place in its mind) that will enable it to continue to plan
its long-term strategy in privacy.
3. Even human hackers can write small and seemingly innocuous programs that do completely
unexpected things. (For examples, see some the winning entries in the International Obfus-
cated C Code Contest.)
4. e point that some AI control measures could appear to work within a xed context yet fail
catastrophically when the context changes is also emphasized by Eliezer Yudkowsky; see, e.g.,
Yudkowsky (2008a).
5. e term seems to have been coined by science-ction writer Larry Niven (1973), but is based on
real-world brain stimulation reward experiments; cf. Olds and Milner (1954) and Oshima and
Katayama (2010). See also Ring and Orseau (2011).
6. Bostrom (1997).
7. ere might be some possible implementations of a reinforcement learning mechanism that
would, when the AI discovers the wireheading solution, lead to a safe incapacitation rather than
to infrastructure profusion. e point is that this could easily go wrong and fail for unexpected
reasons.
8. is was suggested by Marvin Minsky (vide Russell and Norvig [2010, 1039]).
9. e issue of which kinds of digital mind would be conscious, in the sense of having subjec-
tive phenomenal experience, or “qualia” in philosopher-speak, is important in relation to this
point (though it is irrelevant to many other parts of this book). One open question is how hard
it would be to accurately estimate how a human-like being would behave in various circum-
stances without simulating its brain in enough detail that the simulation is conscious. Another
question is whether there are generally useful algorithms for a superintelligence, for instance
reinforcement-learning techniques, such that the implementation of these algorithms would
generate qualia. Even if we judge the probability that any such subroutines would be conscious
to be fairly small, the number of instantiations might be so large that even a small risk that they
might experience suering ought to be accorded signicant weight in our moral calculation.
See also Metzinger (2003, Chap. 8).
10. Bostrom (2002a, 2003a); Elga (2004).
CHAPTER 9: THE CONTROL PROBLEM
1. E.g., Laont and Martimort (2002).
2. Suppose a majority of voters want their country to build some particular kind of superintel-
ligence. ey elect a candidate who promises to do their bidding, but they might nd it dicult
to ensure that the candidate, once in power, will follow through on her campaign promise and
pursue the project in the way that the voters intended. Supposing she is true to her word, she
instructs her government to contract with an academic or industry consortium to carry out
the work; but again there are agency problems: the bureaucrats in the government department
might have their own views about what should be done and may implement the project in a
way that respects the letter but not the spirit of the leader’s instructions. Even if the govern-
ment department does its job faithfully, the contracted scientic partners might have their own
separate agendas. e problem recurs on many levels. e director of one of the participating
laboratories might lie awake worrying about a technician introducing an unsanctioned element
into the design—imagining Dr. T. R. Eason sneaking into his oce late one night, logging into
the project code base, rewriting a part of the seed AIs goal system. Where it was supposed to say
“serve humanity,” it now says “serve Dr. T. R. Eason.”
NOTES: CHAPTER 9
|
283
3. Even for superintelligence development, though, there could be a role for behavioral testing—as
one auxiliary element within a wider battery of safety measures. Should an AI misbehave in its
developmental phase, something is clearly awry—though, importantly, the converse does not
hold.
4. In a classic exploit from 1975, Steven Dompier wrote a program for the Altair 8800 that took ad-
vantage of this eect (and the absence of shielding around the microcomputers case). Running
the program caused the emission of electromagnetic waves that would produce music when one
held a transistor radio close to the computer (Driscoll 2012). e young Bill Gates, who attended
a demo, reported that he was impressed and mystied by the hack (Gates 1975). ere are in any
case plans to design future chips with built-in Wi-Fi capabilities (Greene 2012).
5. It is no light matter to have held a conviction, which, had we had an opportunity to act upon it,
could have resulted in the ruination of all our cosmic endowment. Perhaps one could argue for
the following principle: if somebody has in the past been certain on N occasions that a system
has been improved suciently to make it safe, and each time it was revealed that they were
wrong, then on the next occasion they are not entitled to assign a credence greater than 1/(N + 1)
to the system being safe.
6. In one informal experiment, the role of the AI was played by an intelligent human. Another
individual played the role of gatekeeper and was tasked with not letting the AI out of the box.
e AI could communicate with the gatekeeper only by text and was given two hours to per-
suade the gatekeeper to let it out. In three cases out of ve, with dierent individuals playing the
gatekeeper, the AI escaped (Yudkowsky 2002). What a human can do, a superintelligence can
do too. (e reverse, of course, does not hold. Even if the task for a real superintelligence were
harder—maybe the gatekeepers would be more strongly motivated to refrain from releasing the
AI than the individuals playing gatekeeper in the experiment—the superintelligence might still
succeed where a human would fail.)
7. One should not overstate the marginal amount of safety that could be gained in this way. Mental
imagery can substitute for graphical display. Consider the impact books can have on people
and books are not even interactive.
8. See also Chalmers (2010). It would be a mistake to infer from this that there is no possible use in
building a system that will never be observed by any outside entity. One might place a nal value
on what goes on inside such a system. Also, other people might have preferences about what
goes on inside such a system, and might therefore be inuenced by its creation or the promise
of its creation. Knowledge of the existence of certain kinds of isolated systems (ones contain-
ing observers) can also induce anthropic uncertainty in outside observers, which may inuence
their behavior.
9. One might wonder why social integration is considered a form of capability control. Should it
not instead be classied as a motivation selection method on the ground that it involves seek-
ing to inuence a system’s behavior by means of incentives? We will look closely at motivation
selection presently; but, in answer to this question, we are construing motivation selection as a
cluster of control methods that work by selecting or shaping a system’s nal goals—goals sought
for their own sakes rather than for instrumental reasons. Social integration does not target a
system’s nal goals, so it is not motivation selection. Rather, social integration aims to limit the
system’s eective capabilities: it seeks to render the system incapable of achieving a certain set of
outcomes—outcomes in which the system attains the benets of defection without suering the
associated penalties (retribution, and loss of the gains from collaboration). e hope is that by
limiting which outcomes the system is able to attain, the system will nd that the most eective
remaining means of realizing its nal goals is to behave cooperatively.
10. is approach may be somewhat more promising in the case of an emulation believed to have
anthropomorphic motivations.
11. I owe this idea to Carl Shulman.
12. Creating a cipher certain to withstand a superintelligent code-breaker is a nontrivial challenge.
For example, traces of random numbers might be le in some observer’s brain or in the micro-
structure of the random generator, from whence the superintelligence can retrieve them; or, if
pseudorandom numbers are used, the superintelligence might guess or discover the seed from
284
|
NOTES: CHAPTER 9
which they were generated. Further, the superintelligence could build large quantum comput-
ers, or even discover unknown physical phenomena that could be used to construct new kinds
of computers.
13. e AI could wire itself to believe that it had received a reward tokens, but this should not make
it wirehead if it is designed to want the reward tokens (as opposed to wanting to be in a state in
which it has certain beliefs about the reward tokens).
14. For the original article, see Bostrom (2003a). See also Elga (2004).
15 . S hulm a n (2 010 a).
16. Basement-level reality presumably contains more computational resources than simulated reality,
since any computational processes occurring in a simulation are also occurring on the computer
running the simulation. Basement-level reality might also contain a wealth of other physical re-
sources which could be hard for simulated agents to access—agents that exist only at the indulgence
of powerful simulators who may have other uses in mind for those resources. (Of course, the infer-
ence here is not strictly deductively valid: in principle, it could be the case that universes in which
simulations are run contain so much more resources that simulated civilizations on average have
access to more resources than non-simulated civilizations, even though each non-simulated civili-
zation that runs simulations has more resources than all the civilizations it simulates do combined.)
17. ere are various further esoteric considerations that might bear on this matter, the implica-
tions of which have not yet been fully analyzed. ese considerations may ultimately be cru-
cially important in developing an all-things-considered approach to dealing with the prospect
of an intelligence explosion. However, it seems unlikely that we will succeed in guring out the
practical import of such esoteric arguments unless we have rst made some progress on the
more mundane kinds of consideration that are the topic of most of this book.
18. Cf., e.g., Quine and Ullian (1978).
19. Which an AI might investigate by considering the performance characteristics of various basic
computational functionalities, such as the size and capacity of various data buses, the time it
takes to access dierent parts of memory, the incidence of random bit ips, and so forth.
20. Perhaps the prior could be (a computable approximation of) the Solomono prior, which as-
signs probability to possible worlds on the basis of their algorithmic complexity. See Li and
Vitányi (2008).
21. e moment aer the conception of deception, the AI might contrive to erase the trace of its
mutinous thought. It is therefore important that this tripwire operate continuously. It would
also be good practice to use a “ight recorder” that stores a complete trace of all the AIs activity
(including exact timing of keyboard input from the programmers), so that its trajectory can be
retraced or analyzed following an automatic shutdown. e information could be stored on a
write-once-read-many medium.
22. Asimov (1942). To the three laws were later added a “Zeroth Law: “(0) A robot may not harm
humanity, or, by inaction, allow humanity to come to harm” (Asimov 1985).
23. Cf. Gunn (1982).
24. Russell (1986, 161f).
25. Similarly, although some philosophers have spent entire careers trying to carefully formulate
deontological systems, new cases and consequences occasionally come to light that necessitate
revisions. For example, deontological moral philosophy has in recent years been reinvigorat-
ed through the discovery of a fertile new class of philosophical thought experiments, “trolley
problems,” which reveal many subtle interactions among our intuitions about the moral sig-
nicance of the acts/omissions distinction, the distinction between intended and unintended
consequences, and other such matters; see, e.g., Kamm (2007).
26. Armstrong (2010).
27. As a rule of thumb, if one plans to use multiple safety mechanisms to contain an AI, it may be
wise to work on each one as if it were intended to be the sole safety mechanism and as if it were
therefore required to be individually sucient. If one puts a leaky bucket inside another leaky
bucket, the water still comes out.
28. A variation of the same idea is to build the AI so that it is continuously motivated to act on its
best guesses about what the implicitly dened standard is. In this setup, the AI’s nal goal is
NOTES: CHAPTERS 9–0
|
285
always to act on the implicitly dened standard, and it pursues an investigation into what this
standard is only for instrumental reasons.
CHAPTER 0: ORACLES, GENIES, SOVEREIGNS, TOOLS
1. ese names are, of course, anthropomorphic and should not be taken seriously as analogies.
ey are just meant as labels for some prima facie dierent concepts of possible system types
that one might consider trying to build.
2. In response to a question about the outcome of the next election, one would not wish to be
served with a comprehensive list of the projected position and momentum vectors of nearby
particles.
3. Indexed to a particular instruction set on a particular machine.
4. Kuhn (1962); de Blanc (2011).
5. It would be harder to apply such a “consensus method” to genies or sovereigns, because there
may oen be numerous sequences of basic actions (such as sending particular patterns of elec-
trical signals to the system’s actuators) that would be almost exactly equally eective at achiev-
ing a given objective; whence slightly dierent agents may legitimately choose slightly dierent
actions, resulting in a failure to reach consensus. By contrast, with appropriately formulated
questions, there would usually be a small number of suitable answer options (such as “yes” and
“no”). (On the concept of a Schelling point, also referred to as a “focal point,” see Schelling
[1980].)
6. Is not the world economy in some respects analogous to a weak genie, albeit one that charges for
its services? A vastly bigger economy, such as might develop in the future, might then approxi-
mate a genie with collective superintelligence.
One important respect in which the current economy is unlike a genie is that although I
can (for a fee) command the economy to deliver a pizza to my door, I cannot command it to
deliver peace. e reason is not that the economy is insuciently powerful, but that it is insuf-
ciently coordinated. In this respect, the economy resembles an assembly of genies serving dif-
ferent masters (with competing agendas) more than it resembles a single genie or any other type
of unied agent. Increasing the total power of the economy by making each constituent genie
more powerful, or by adding more genies, would not necessarily render the economy more ca-
pable of delivering peace. In order to function like a superintelligent genie, the economy would
not only need to grow in its ability to inexpensively produce goods and services (including ones
that require radically new technology), it would also need to become better able to solve global
coordination problems.
7. If the genie were somehow incapable of not obeying a subsequent command—and somehow
incapable of reprogramming itself to get rid of this susceptibility—then it could act to prevent
any new command from being issued.
8. Even an oracle that is limited to giving yes/no answers could be used to facilitate the search for a
genie or sovereign AI, or indeed could be used directly as a component in such an AI. e oracle
could also be used to produce the actual code for such an AI if a suciently large number of
questions can be asked. A series of such questions might take roughly the following form: “In
the binary version of the code of the rst AI that you thought of that would constitute a genie, is
the nth symbol a zero?
9. One could imagine a slightly more complicated oracle or genie that accepts questions or com-
mands only if they are issued by a designated authority, though this would still leave open the
possibility of that authority becoming corrupted or being blackmailed by a third party.
10. John Rawls, a leading political philosopher of the twentieth century, famously employed the
expository device of a veil of ignorance as a way of characterizing the kinds of preference that
should be taken into account in the formulation of a social contract. Rawls suggested that we
should imagine we were choosing a social contract from behind a veil of ignorance that pre-
vents us from knowing which person we will be and which social role we will occupy, the idea
being that in such a situation we would have to think about which society would be generally
fairest and most desirable without regard to our egoistic interests and self-serving biases that
286
|
NOTES: CHAPTERS 0–
might otherwise make us prefer a social order in which we ourselves enjoy unjust privileges. See
Rawls (1971).
11. Kar nofsky (2012).
12. A possible exception would be soware hooked up to suciently powerful actuators, such as
soware in early warning systems if connected directly to nuclear warheads or to human oc-
ers authorized to launch a nuclear strike. Malfunctions in such soware can result in high-risk
situations. is has happened at least twice within living memory. On November 9, 1979, a com-
puter problem led NORAD (North American Aerospace Defense Command) to make a false
report of an incoming full-scale Soviet attack on the United States. e USA made emergency
retaliation preparations before data from early-warning radar systems showed that no attack
had been launched (McLean and Stewart 1979). On September 26, 1983, the malfunctioning So-
viet Oko nuclear early-warning system reported an incoming US missile strike. e report was
correctly identied as a false alarm by the duty ocer at the command center, Stanislav Petrov:
a decision that has been credited with preventing thermonuclear war (Lebedev 2004). It appears
that a war would probably have fallen short of causing human extinction, even if it had been
fought with the combined arsenals held by all the nuclear powers at the height of the Cold War,
though it would have ruined civilization and caused unimaginable death and suering (Gaddis
1982; Parrington 1997). But bigger stockpiles might be accumulated in future arms races, or
even deadlier weapons might be invented, or our models of the impacts of a nuclear Armaged-
don (particularly of the severity of the consequent nuclear winter) might be wrong.
13. is approach could t the category of a direct-specication rule-based control method.
14. e situation is essentially the same if the solution criterion species a goodness measure rather
than a sharp cuto for what counts as a solution.
15. An advocate for the oracle approach could insist that there is at least a possibility that the user
would spot the aw in the proered solution—recognize that it fails to match the users intent
even while satisfying the formally specied success criteria. e likelihood of catching the er-
ror at this stage would depend on various factors, including how humanly understandable the
oracle’s outputs are and how charitable it is in selecting which features of the potential outcome
to bring to the user’s attention.
Alternatively, instead of relying on the oracle itself to provide these functionalities, one might
try to build a separate tool to do this, a tool that could inspect the pronouncements of the oracle
and show us in a helpful way what would happen if we acted upon them. But to do to this in full
generality would require another superintelligent oracle whose divinations we would then have
to trust; so the reliability problem would not have been solved, only displaced. One might seek to
gain an increment of safety through the use of multiple oracles to perform peer review, but this
does not protect in cases where all the oracles fail in the same way—as may happen if, for instance,
they have all been given the same formal specication of what counts as a satisfactory solution.
16. Bird and Layzell (2002) and ompson (1997); also Yaeger (1994, 13–14).
17. Williams (1966).
18. L eig h (2 010).
19. is example is borrowed from Yudkowsky (2011).
20. Wade (1976). Computer experiments have also been conducted with simulated evolution de-
signed to resemble aspects of biological evolution—again with sometimes strange results (see,
e.g., Yaeger [1994]).
21. With suciently great—nite but physically implausible—amounts of computing power, it
would probably be possible to achieve general superintelligence with currently available algo-
rithms. (Cf., e.g., the AIXItl system; Hutter [2001].) But even the continuation of Moore’s law
for another hundred years would not suce to attain the required levels of computing power to
achieve this.
CHAPTER : MULTIPOLAR SCENARIOS
1. Not because this is necessarily the most likely or the most desirable type of scenario, but be-
cause it is the one easiest to analyze with the toolkit of standard economic theory, and thus a
convenient starting point for our discussion.
NOTES: CHAPTER 
|
287
2. American Horse Council (2005). See also Salem and Rowan (2001).
3. Acemoglu (2003); Mankiw (2009); Zuleta (2008).
4. Fredriksen (2012, 8); Salverda et al. (2009, 133).
5. It is also essential for at least some of the capital to be invested in assets that rise with the general
tide. A diversied asset portfolio, such as shares in an index tracker fund, would increase the
chances of not entirely missing out.
6. Many of the European welfare systems are unfunded, meaning that pensions are paid from
ongoing current workers’ contributions and taxes rather than from a pool of savings. Such
schemes would not automatically meet the requirement—in case of sudden massive unemploy-
ment, the revenues from which the benets are paid could dry up. However, governments may
choose to make up the shortfall from other sources.
7. American Horse Council (2005).
8. Providing 7 billion people an annual pension of $90,000 would cost $630 trillion a year, which
is ten times the current world GDP. Over the last hundred years, world GDP has increased about
nineteenfold from around $2 trillion in 1900 to 37 trillion in 2000 (in 1990 int. dollars) ac-
cording to Maddison (2007). So if the growth rates we have seen over the past hundred years
continued for the next two hundred years, while population remained constant, then providing
everybody with an annual $90,000 pension would cost about 3% of world GDP. An intelligence
explosion might make this amount of growth happen in a much shorter time span. See also
Hanson (1998a, 1998b, 2008).
9. And perhaps as much as a millionfold over the past 70,000 years if there was a severe population
bottleneck around that time, as has been speculated. See Kremer (1993) and Hu et al. (2010) for
more data.
10. Cochran and Harpending (2009). See also Clark (2007) and, for a critique, Allen (2008).
11. Kremer (1993).
12. Basten et al. (2013). Scenarios in which there is a continued rise are also possible. In general, the
uncertainty of such projections increases greatly beyond one or two generations into the future.
13. Taken globally, the total fertility rate at replacement was 2.33 children per woman in 2003. is
number comes from the fact that it takes two children per woman to replace the parents, plus a
“third of a child” to make up for (1) the higher probability of boys being born, and (2) early mor-
tality prior to the end of their fertile life. For developed nations, the number is smaller, around
2.1, because of lower mortality rates. (See Espenshade et al. [2003, Introduction, Table 1, 580].)
e population in most developed countries would decline if it were not for immigration. A few
notable examples of countries with sub-replacement fertility rates are: Singapore at 0.79 (lowest
in the world), Japan at 1.39, People’s Republic of China at 1.55, European Union at 1.58, Russia at
1.61, Brazil at 1.81, Iran at 1.86, Vietnam at 1.87, and the United Kingdom at 1.90. Even the U.S.
population would probably decrease slightly with a fertility rate of 2.05. (See CIA [2013].)
14. e fullness of time might occur many billions of years from now.
15. Carl Shulman points out that if biological humans count on living out their natural lifespans
alongside the digital economy, they need to assume not only that the political order in the digi-
tal sphere would be protective of human interests but that it would remain so over very long
periods of time (Shulman 2012). For example, if events in the digital sphere unfolds a thousand
times faster than on the outside, then a biological human would have to rely on the digital body
politic holding steady for 50,000 years of internal change and churn. Yet if the digital political
world were anything like ours, there would be a great many revolutions, wars, and catastrophic
upheavals during those millennia that would probably inconvenience biological humans on the
outside. Even a 0.01% risk per year of a global thermonuclear war or similar cataclysm would
entail a near certain loss for the biological humans living out their lives in slowmo sidereal time.
To overcome this problem, a more stable order in the digital realm would be required: perhaps a
singleton that gradually improves its own stability.
16. One might think that even if machines were far more ecient than humans, there would still be
some wage level at which it would be protable to employ a human worker; say at 1 cent an hour.
If this were the only source of income for humans, our species would go extinct since human be-
ings cannot survive on 1 cent an hour. But humans also get income from capital. Now, if we are
assuming that population grows until total income is at subsistence level, one might think this
288
|
NOTES: CHAPTER 
would be a state in which humans would be working hard. For example, suppose subsistence
level income is $1/day. en, it might seem, population would grow until per person capital pro-
vided only a 90 cents per day income, which people would have to supplement with ten hours of
hard labor to make up the remaining 10 cents. However, this need not be so, because the subsist-
ence level income depends on the amount of work that is done: harder-working humans burn
more calories. Suppose that each hour of work increases food costs by 2 cents. We then have a
model in which humans are idle in equilibrium.
17. It might be thought that a caucus as enfeebled as this would be unable to vote and to otherwise
defend its entitlements. But the pod-dwellers could give power of attorney to AI duciaries to
manage their aairs and represent their political interests. (is part of the discussion in this
section is premised on the assumption that property rights are respected.)
18. It is unclear what is the best term. “Kill” may suggest more active brutality than is warrant-
ed. “End” may be too euphemistic. One complication is that there are two potentially sep-
arate events: ceasing to actively run a process, and erasing the information template. A human
death normally involves both events, but for an emulation they can come apart. at a pro-
gram temporarily ceases to run may be no more consequential than that a human sleeps: but to
permanently cease running may be the equivalent of entering a permanent coma. Still further
complications arise from the fact that emulations can be copied and that they can run at dif-
ferent speeds: possibilities with no direct analogs in human experience. (Cf. Bostrom [2006b];
Bostrom and Yudkowsky [forthcoming].)
19. ere will be a trade-o between total parallel computing power and computational speed, as
the highest computational speeds will be attainable only at the expense of a reduction in power
eciency. is will be especially true aer one enters the era of reversible computing.
20. An emulation could be tested by leading it into temptation. By repeatedly testing how an emu-
lation started from a certain prepared state reacts to various sequences of stimuli, one could
obtain high condence in the reliability of that emulation. But the further the mental state is
subsequently allowed to develop away from its validated starting point, the less certain one
could be that it would remain reliable. (In particular, since a clever emulation might surmise
it is sometimes in a simulation, one would need to be cautious about extrapolating its behavior
into situations where its simulation hypothesis would weigh less heavily in its decision-making.)
21. Some emulations might identify with their clan—i.e. all of their copies and variations derived
from the same template—rather with any one particular instantiation. Such an emulation might
not regard its own termination as a death event, if it knew that other clan members would sur-
vive. Emulations may know that they will get reverted to a particular stored state at the end of
the day and lose that days memories, but be as little put o by this as the partygoer who knows
she will awake the next morning without any recollection of the previous night: regarding this
as retrograde amnesia, not death.
22. An ethical evaluation might take into account many other factors as well. Even if all the work-
ers were constantly well pleased with their condition, the outcome might still be deeply morally
objectionable on other grounds—though which other grounds is a matter of dispute between
rival moral theories. But any plausible assessment would consider subjective well-being to be
one important factor. See also Bostrom and Yudkowsky (forthcoming).
23. World Values Survey (2008).
24. Helliwell and Sachs (2012).
25. Cf. Bostrom (2004). See also Chislenko (1996) and Moravec (1988).
26. It is hard to say whether the information-processing structures that would emerge in this kind
of scenario would be conscious (in the sense of having qualia, phenomenal experience). e
reason this is hard is partly our empirical ignorance about which cognitive entities would arise
and partly our philosophical ignorance about which types of structure have consciousness. One
could try to reframe the question, and instead of asking whether the future entities would be
conscious, one could ask whether the future entities would have moral status; or one could ask
whether they would be such that we have preferences about their “well-being.” But these ques-
tions may be no easier to answer than the question about consciousness—in fact, they might
require an answer to the consciousness question inasmuch as moral status or our preferences
depend on whether the entity in question can subjectively experience its condition.
NOTES: CHAPTER 
|
289
27. For an argument that both geological and human history manifest such a trend toward greater
complexity, see Wright (2001). For an opposing argument (criticized in Chapter 9 of Wright’s
book), see Gould (1990). See also Pinker (2011) for an argument that we are witnessing a robust
long-term trend toward decreasing violence and brutality.
28. For more on observation selection theory, see Bostrom (2002a).
29. Bostrom (2008a). A much more careful examination of the details of our evolutionary histo-
ry would be needed to circumvent the selection eect. See, e.g., Carter (1983, 1993); Hanson
(1998d); Ćirković et al. (2010).
30. Kansa (2003).
31. E.g., Zahavi and Zahavi (1997).
32. See Miller (2000).
33. Kansa (2003). For a provocative take, see also Frank (1999).
34. It is not obvious how best to measure the degree of global political integration. One perspective
would be that whereas a hunter–gatherer tribe might have integrated a hundred individuals into
a decision-making entity, the largest political entities today contain more than a billion individ-
uals. is would amount to a dierence of seven orders of magnitude, with only one additional
magnitude to go before the entire world population is contained within a single political entity.
However, at the time when the tribe was the largest scale of integration, the world population
was much smaller. e tribe might have contained as much as a thousandth of the individuals
then living. is would make the increase in the scale of political integration as little as two
orders of magnitude. Looking at the fraction of world population that is politically integrated,
rather than at absolute numbers, seems appropriate in the present context (particularly as the
transition to machine intelligence may cause a population explosion, of emulations or other
digital minds). But there have also been developments in global institutions and networks of
collaboration outside of formal state structures, which should also be taken into account.
35. One of the reasons for supposing that the rst machine intelligence revolution will be swi—the
possible existence of a hardware overhang—does not apply here. However, there could be other
sources of rapid gain, such as a dramatic breakthrough in soware associated with transition-
ing from emulation to purely synthetic machine intelligence.
36. Shulman (2010b).
37. How the pro et contra would balance out might depend on what kind of work the superorganism
is trying to do, and how generally capable the most generally capable available emulation tem-
plate is. Part of the reason many dierent types of human beings are needed in large organiza-
tions today is that humans who are very talented in many domains are rare.
38. It is of course very easy to make multiple copies of a soware agent. But note that copying is not
in general sucient to ensure that the copies have the same nal goals. In order for two agents
to have the same nal goals (in the relevant sense of “same”), the goals must coincide in their
indexical elements. If Bob is selsh, a copy of Bob will likewise be selsh. Yet their goals do not
coincide: Bob cares about Bob whereas Bob-copy cares about Bob-copy.
39. Shulman (2010b, 6).
40. is might be more feasible for biological humans and whole brain emulations than for arbi-
trary articial intelligences, which might be constructed so as to have hidden compartments or
functional dynamics that may be very hard to discover. On the other hand, AIs specically built
to be transparent should allow for more thoroughgoing inspection and verication than is pos-
sible with brain-like architectures. Social pressures may encourage AIs to expose their source
code, and to modify themselves to render themselves transparent—especially if being transpar-
ent is a precondition to being trusted and thus to being given the opportunity to partake in
benecial transactions. Cf. Hall (2007).
41. Some other issues that seem relatively minor, especially in cases where the stakes are enormous
(as they are for the key global coordination failures), include the search cost of nding policies
that could be of mutual interest, and the possibility that some agents might have a basic prefer-
ence for “autonomy” in a form that would be reduced by entering into comprehensive global
treaties that have monitoring and enforcement mechanisms attached.
42. An AI might perhaps achieve this by modifying itself appropriately and then giving observers
read-only access to its source code. A machine intelligence with a more opaque architecture
290
|
NOTES: CHAPTERS –2
(such as an emulation) might perhaps achieve it by publicly applying to itself some motivation
selection method. Alternatively, an external coercive agency, such as a superorganism police
force, might perhaps be used not only to enforce the implementation of a treaty reached between
dierent parties, but also internally by a single party to commit itself to a particular course of
action.
43. Evolutionary selection might have favored threat-ignorers and even characters visibly so highly
strung that they would rather ght to the death than suer the slightest discomture. Such a
disposition might bring its bearer valuable signaling benets. (Any such instrumental rewards
of having the disposition need of course play no part in the agent’s conscious motivation: he
may value justice or honor as ends in themselves.)
44. A denitive verdict on these matters, however, must await further analysis. ere are various
other potential complications which we cannot explore here.
CHAPTER 2: ACQUIRING VALUES
1. Various complications and modulations of this basic idea could be introduced. We discussed
one variation in Chapter 8—that of a satiscing, as opposed to maximizing, agent—and in the
next chapter we briey touch on the issue of alternative decision theories. However, such issues
are not essential to the thrust of this subsection, so we will keep things simple by focusing here
on the case of an expected utility-maximizing agent.
2. Assuming the AI is to have a non-trivial utility function. It would be very easy to build an agent
that always chooses an action that maximizes expected utility if its utility function is, e.g., the
constant function U(w) = 0 . Every action would equally well maximize expected utility relative
to that utility function.
3. Also because we have forgotten the blooming buzzing confusion of our early infancy, a time
when we could not yet see very well because our brain had not yet learned to interpret its visual
input.
4. See also Yudkowsky (2011) and the review in section 5 of Muehlhauser and Helm (2012).
5. It is perhaps just about conceivable that advances in soware engineering could eventually
overcome these diculties. Using modern tools, a single programmer can produce soware
that would have been beyond the pale of a sizeable team of developers forced to write directly in
machine code. Today’s AI programmers gain expressiveness from the wide availability of high-
quality machine learning and scientic calculation libraries, enabling someone to hack up, for
instance, a unique-face-counting webcam application by chaining libraries together that they
never could have written on their own. e accumulation of reusable soware, produced by
specialists but useable by non-specialists, will give future programmers an expressiveness ad-
vantage. For example, a future robotics programmer might have ready access to standard facial
imprinting libraries, typical-oce-building-object collections, specialized trajectory libraries,
and many other functionalities that are currently unavailable.
6. Dawkins (1995, 132). e claim here is not necessarily that the amount of suering in the natu-
ral world outweighs the amount of positive well-being.
7. Required population sizes might be much larger or much smaller than those that existed in our
own ancestry. See Shulman and Bostrom (2012).
8. If it were easy to get an equivalent result without harming large numbers of innocents, it would
seem morally better to do so. If, nevertheless, digital persons are created and made to suer
unjust harm, it may be possible to compensate them for their suering by saving them to le
and later (when humanitys future is secured) rerunning them under more favorable conditions.
Such restitution could be compared in some ways to religious conceptions of an aerlife in the
context of theological attempts to address the evidential problem of evil.
9. One of the elds leading gures, Richard Sutton, denes reinforcement learning not in terms of
a learning method but in terms of a learning problem: any method that is well suited to solving
that problem is considered a reinforcement learning method (Sutton and Barto 1998, 4). e
present discussion, in contrast, pertains to methods where the agent can be conceived of as hav-
ing the nal goal of maximizing (some notion of) cumulative reward. Since an agent with some
very dierent kind of nal goal might be skilled at mimicking a reward-seeking agent in a wide
NOTES: CHAPTER 2
|
29
range of situations, and could thus be well suited to solving reinforcement learning problems,
there could be methods that would count as “reinforcement learning methods” on Sutton’s de-
nition that would not result in a wireheading syndrome. e remarks in the text, however, apply
to most of the methods actually used in the reinforcement learning community.
10. Even if, somehow, a human-like mechanism could be set up within a human-like machine intel-
lect, the nal goals acquired by this intellect need not resemble those of a well-adjusted human,
unless the rearing environment for this digital baby also closely matched that of an ordinary
child: something that would be dicult to arrange. And even with a human-like rearing envi-
ronment, a satisfactory result would not be guaranteed, since even a subtle dierence in innate
dispositions can result in very dierent reactions to a life event. It may, however, be possible to
create a more reliable value-accretion mechanism for human-like minds in the future (perhaps
using novel drugs or brain implants, or their digital equivalents).
11. One might wonder why it appears we humans are not trying to disable the mechanism that leads
us to acquire new nal values. Several factors might be at play. First, the human motivation sys-
tem is poorly described as a coldly calculating utility-maximizing algorithm. Second, we might
not have any convenient means of altering the ways we acquire values. ird, we may have in-
strumental reasons (arising, e.g., from social signaling needs) for sometimes acquiring new -
nal values—instrumental values might not be as useful if our minds are partially transparent to
other people, or if the cognitive complexity of pretending to have a dierent set of nal values
than we actually do is too taxing. Fourth, there are cases where we do actively resist tendencies
that produce changes in our nal values, for instance when we seek to resist the corrupting in-
uence of bad company. Fih, there is the interesting possibility that we place some nal value
on being the kind of agent that can acquire new nal values in normal human ways.
12. Or one might try to design the motivation system so that the AI is indierent to such replace-
ment; see Armstrong (2010).
13. We will here draw on some elucidations made by Daniel Dewey (2011). Other background ideas
contributing to this framework have been developed by Marcus Hutter (2005) and Shane Legg
(2008), Eliezer Yudkowsky (2001), Nick Hay (2005), Moshe Looks, and Peter de Blanc.
14. To avoid unnecessary complications, we conne our attention to deterministic agents that do
not discount future rewards.
15. Mathematically, an agent’s behavior can be formalized as an agent function, which maps each
possible interaction history to an action. Except for the very simplest agents, it is infeasible
to represent the agent function explicitly as a lookup table. Instead, the agent is given some
way of computing which action to perform. Since there are many ways of computing the same
agent function, this leads to a ner individuation of an agent as an agent program. An agent
program is a specic program or algorithm that computes an action for any given interaction
history. While it is oen mathematically convenient and useful to think of an agent program
that interacts with some formally specied environment, it is important to remember that this
is an idealization. Real agents are physically instantiated. is means not only that the agent
interacts with the environment via its sensors and eectors, but also that the agents “brain” or
controller is itself part of physical reality. Its operations can therefore in principle be aected by
external physical interferences (and not only by receiving percepts from its sensors). At some
point, therefore, it becomes necessary to view an agent as an agent implementation. An agent
implementation is a physical structure that, in the absence of interference from its environment,
implements an agent function. (is denition follows Dewey [2011].)
16. Dewey proposes the following optimality notion for a value learning agent:
yPyx yx yUyx
ky
xyx
mkk
U
m
k
kkm
=
+
∑∑
≤≤
argmax (| )(
:
1
1
))(
|)
.PU yx
m2
Here, P
1
and P
2
are two probability functions. e second summation ranges over some suitable
class of utility functions over possible interaction histories. In the version presented in the text,
we have made explicit some dependencies as well as availed ourselves of the simplifying possible
worlds notation.
17. It should be noted that the set of utility functions should be such that utilities can be com-
pared and averaged. In general, this is problematic, and it is not always obvious how to represent
292
|
NOTES: CHAPTER 2
dierent moral theories of the good in terms of cardinal utility functions. See, e.g., MacAskill
2010).
18. Or more generally, since might not be such as to directly imply for any given pair of a possible
world and a utility function (w, U) whether the proposition (U) is true in w, what needs to be
done is to give the AI an adequate representation of the conditional probability distribution
P((U) | w) .
19. Con sider rst , the class of actions that are possible for an agent. One issue here is what exactly
should count as an action: only basic motor commands (e.g. “send an electric pulse along out-
put channel #00101100”), or higher-level actions (e.g. “keep camera centered on face”)? Since
we are trying to develop an optimality notion rather than a practical implementation plan, we
may take the domain to be basic motor commands (and since the set of possible motor com-
mands might change over time, we may need to index to time). However, in order to move
toward implementation it will presumably be necessary to introduce some kind of hierarchical
planning process, and one may then need to consider how to apply the formula to some class of
higher-level actions. Another issue is how to analyze internal actions (such as writing strings to
working memory). Since internal actions can have important consequences, one would ideally
want to include such basic internal actions as well as motor commands. But there are limits
to how far one can go in this direction: the computation of the expected utility of any action in
requires multiple computational operations, and if each such operation were also regarded as
an action in that needed to be evaluated according to AI-VL, we would face an innite regress
that would make it impossible to ever get started. To avoid the innite regress, one must restrict
any explicit attempt to estimate the expected utility to a limited number of signicant action
possibilities. e system will then need some heuristic process that identies some signicant
action possibilities for further consideration. (Eventually the system might also get around to
making explicit decisions regarding some possible actions to make modications to this heuris-
tic process, actions that might have been agged for explicit attention by this self-same process;
so that in the long run the system might become increasingly eective at approximating the
ideal identied by AI-VL.)
Consider next , which is a class of possible worlds. One diculty here is to specify so
that it is suciently inclusive. Failure to include some relevant w in could render the AI
incapable of representing a situation that actually occurs, resulting in the AI making bad deci-
sions. Suppose, for example, that we use some ontological theory to determine the makeup of
. For instance, we include in all possible worlds that consist of a certain kind of spacetime
manifold populated by elementary particles found in the standard model in particle physics.
is could distort the AIs epistemology if the standard model is incomplete or incorrect. One
could try to use a bigger -class to cover more possibilities; but even if one could ensure that
every possible physical universe is included one might still worry that some other possibility
would be le out. For example, what about the possibility of dualistic possible worlds in which
facts about consciousness do not supervene on facts about physics? What about indexical facts?
Normative facts? Facts of higher mathematics? What about other kinds of fact that we fallible
humans might have overlooked but that could turn out to be important to making things go as
well as possible? Some people have strong convictions that some particular ontological theory is
correct. (Among people writing on the future of AI, a belief in a materialistic ontology, in which
the mental supervenes on the physical, is oen taken for granted.) Yet a moment’s reection on
the history of ideas should help us realize that there is a signicant possibility that our favorite
ontology is wrong. Had nineteenth-century scientists attempted a physics-inspired denition of
, they would probably have neglected to include the possibility of a non-Euclidian spacetime
or an Everettian (“many-worlds”) quantum theory or a cosmological multiverse or the simula-
tion hypothesis—possibilities that now appear to have a substantial probability of obtaining
in the actual world. It is plausible that there are other possibilities to which we in the present
generation are similarly oblivious. (On the other hand, if is too big, there arise technical di-
culties related to having to assign measures over transnite sets.) e ideal might be if we could
somehow arrange things such that the AI could use some kind of open-ended ontology, one that
the AI itself could subsequently extend using the same principles that we would use when decid-
ing whether to recognize a new type of metaphysical possibility.
NOTES: CHAPTER 2
|
293
Consider P(w | Ey). Specifying this conditional probability is not strictly part of the value-
loading problem. In order to be intelligent, the AI must already have some way of deriving rea-
sonably accurate probabilities over many relevant factual possibilities. A system that falls too
far short on this score will not pose the kind of danger that concerns us here. However, there
may be a risk that the AI will end up with an epistemology that is good enough to make the AI
instrumentally eective yet not good enough to enable it to think correctly about some pos-
sibilities that are of great normative importance. (e problem of specifying P(w | Ey) is in this
way related to the problem of specifying .) Specifying P(w | Ey) also requires confronting other
issues, such as how to represent uncertainty over logical impossibilities.
e aforementioned issues—how to dene a class of possible actions, a class of possible
worlds, and a likelihood distribution connecting evidence to classes of possible worlds—are
quite generic: similar issues arise for a wide range of formally specied agents. It remains to
examine a set of issues more peculiar to the value learning approach; namely, how to dene ,
(U), and P((U) | w).
is a class of utility functions. ere is a connection between and inasmuch as each
utility function U (w) in should ideally assign utilities to each possible world w in . But
also needs to be wide in the sense of containing suciently many and diverse utility functions
for us to have justied condence that at least one of them does a good job of representing the
intended values.
e reason for writing P((U) | w) rather than simply P(U | w) is to emphasize the fact that
probabilities are assigned to propositions. A utility function, per se, is not a proposition, but
we can transform a utility function into a proposition by making some claim about it. For
example, we may claim of a particular utility function U(.) that it describes the preferences of
a particular person, or that it represents the prescriptions implied by some ethical theory, or
that it is the utility function that the principal would have wished to have implemented if she
had thought things through. e “value criterion” (.) can thus be construed as a function
that takes as its argument a utility function U and gives as its value a proposition to the eect
that U satises the criterion . Once we have dened a proposition (U), we can hopefully
obtain the conditional probability P((U) | w) from whatever source we used to obtain the
other probability distributions in the AI. (If we are certain that all normatively relevant facts
are taken into account in individuating the possible worlds , then P((U) | w) should equal
zero or one in each possible world.) e question remains how to dene . is is discussed
further in the text.
20. ese are not the only challenges for the value learning approach. Another issue, for instance,
is how to get the AI to have suciently sensible initial beliefs—at least by the time it becomes
strong enough to subvert the programmers’ attempts to correct it.
21. Yud kowsk y (20 01).
22. e term is taken from American football, where a “Hail Mary” is a very long forward pass
made in desperation, typically when the time is nearly up, on the o chance that a friendly play-
er might catch the ball near the end zone and score a touchdown.
23. e Hail Mary approach relies on the idea that a superintelligence could articulate its prefer-
ences with greater exactitude than we humans can articulate ours. For example, a superintel-
ligence could specify its preference as code. So if our AI is representing other superintelligences
as computational processes that are perceiving their environment, then our AI should be able to
reason about how those alien superintelligences would respond to some hypothetical stimulus,
such as a “window” popping up in their visual eld presenting them with the source code of our
own AI and asking them to specify their instructions to us in some convenient pre-specied
format. Our AI could then read o these imaginary instructions (from its own model of this
counterfactual scenario wherein these alien superintelligences are represented), and we would
have built our AI so that it would be motivated to follow those instructions.
24. An alternative would be to create a detector that looks (within our AI’s world model) for (rep-
resentations of) physical structures created by a superintelligent civilization. We could then
bypass the step of identifying the hypothesized superintelligences’ preference functions, and
give our own AI the nal value of trying to copy whatever physical structures it believes super-
intelligent civilizations tend to produce.
294
|
NOTES: CHAPTER 2
ere are technical challenges with this version, too, however. For instance, since our own
AI, even aer it has attained superintelligence, may not be able to know with great precision
what physical structures other superintelligences build, our AI may need to resort to trying to
approximate those structures. To do this, it would seem our AI would need a similarity metric
by which to judge how closely one physical artifact approximates another. But similarity metrics
based on crude physical measures may be inadequate—it being no good, for example, to judge
that a brain is more similar to a Camembert cheese than to a computer running an emulation.
A more feasible approach might be to look for “beacons”: messages about utility functions
encoded in some suitable simple format. We would build our AI to want to follow whatever such
messages about utility functions it hypothesizes might exist out there in the universe; and we
would hope that friendly extraterrestrial AIs would create a variety of beacons of the types that
they (with their superintelligence) reckon that simple civilizations like ours are most likely to
build our AI to look for.
25. I f every civilization tried to solve the value-loading problem through a Hail Mary, the pass
would fail. Somebody has to do it the hard way.
26. Christiano (2012).
27. e AI we build need not be able to nd the model either. Like us, it could reason about what
such a complex implicit denition would entail (perhaps by looking at its environment and fol-
lowing much the same kind of reasoning that we would follow).
28. Cf. Chapters 9 and 11.
29. For instance, MDMA may temporarily increase empathy; oxytocin may temporarily increase
trust (Vollenweider et al. 1998; Bartz et al. 2011). However, the eects seem quite variable and
context dependent.
30. e enhanced agents might be killed o or placed in suspended animation (paused), reset to an
earlier state, or disempowered and prevented from receiving any further enhancements, until
the overall system has reached a more mature and secure state where these earlier rogue ele-
ments no longer pose a system-wide threat.
31. e issue might also be less obvious in a future society of biological humans, one that has access
to advanced surveillance or biomedical techniques for psychological manipulation, or that is
wealthy enough to aord an extremely high ratio of security professionals to invigilate the regu-
lar citizenry (and each other).
32. Cf. Armstrong (2007) and Shulman (2010b).
33. One open question is to what degree a level n supervisor would need to monitor not only their
level (n − 1) supervisees, but also their level (n − 2) supervisees, in order to know that the level
(n − 1) agents are doing their jobs properly. And to know that the level (n − 1) agents have success-
fully managed the level (n − 2) agents, is it further necessary for the level n agent to also monitor
the level (n − 3) agents?
34. is approach straddles the line between motivation selection and capability control. Techni-
cally, the part of the arrangement that consists of human beings controlling a set of soware
supervisors counts as capability control, whereas the part of the arrangement that consists of
layers of soware agents within the system controlling other layers is motivation selection (in-
sofar as it is an arrangement that shapes the system’s motivational tendencies).
35. In fact, many other costs deserve consideration but cannot be given it here. For example, what-
ever agents are charged with ruling over such a hierarchy might become corrupted or debased
by their power.
36. For this guarantee to be eective, it must be implemented in good faith. is would rule out cer-
tain kinds of manipulation of the emulation’s emotional and decision-making faculties which
might otherwise be used (for instance) to install a fear of being halted or to prevent the emula-
tion from rationally considering its options.
37. See, e.g., Brinton (1965); Goldstone (1980, 2001). (Social science progress on these questions
could make a nice gi to the world’s despots, who might use more accurate predictive models
of social unrest to optimize their population control strategies and to gently nip insurgencies in
the bud with less-lethal force.)
38. Cf. Bostrom (2011a, 2009b).
NOTES: CHAPTERS 2–3
|
295
39. In the case of an entirely articial system, it might be possible to obtain some of the advantages
of an institutional structure without actually creating distinct subagents. A system might in-
corporate multiple perspectives into its decision process without endowing each of those per-
spectives with its own panoply of cognitive faculties required for independent agency. It could
be tricky, however, to fully implement the “observe the behavioral consequences of a proposed
change, and revert back to an earlier version if the consequences appear undesirable from the
ex ante standpoint” feature described in the text in a system that is not composed of subagents.
CHAPTER 3: CHOOSING THE CRITERIA FOR CHOOSING
1. A recent canvass of professional philosophers found the percentage of respondents who “ac-
cept or leans toward” various positions. On normative ethics, the results were deontology
25.9%; consequentialism 23.6%; virtue ethics 18.2%. On metaethics, results were moral realism
56.4%; moral anti-realism 27.7%. On moral judgment: cognitivism 65.7%; non-cognitivism 17.0%
(Bourget and Chalmers 2009).
2. Pinker (2011).
3. For a discussion of this issue, see Shulman et al. (2009).
4. Moore (2011).
5. Bostrom (2006b).
6. Bostrom (2009b).
7. Bostrom (2011a).
8. More precisely, we should defer to its opinion except on those topics where we have good reason
to suppose that our beliefs are more accurate. For example, we might know more about what we
are thinking at a particular moment than the superintelligence does if it is not able to scan our
brains. However, we could omit this qualication if we assume that the superintelligence has ac-
cess to our opinions; we could then also defer to the superintelligence the task of judging when
our opinions should be trusted. (ere might remain some special cases, involving indexical
information, that need to be handled separately—by, for example, having the superintelligence
explain to us what it would be rational to believe from our perspective.) For an entry into the
burgeoning philosophical literature on testimony and epistemic authority, see, e.g., Elga (2007).
9. Yudkowsky (2004). See also Mijic (2010).
10. For example, David Lewis proposed a dispositional theory of value, which holds, roughly, that
some thing X is a value for A if and only if A would want to want X if A were perfectly rational
and ideally acquainted with X (Smith et al. 1989). Kindred ideas had been put forward earlier;
see, e.g., Sen and Williams (1982), Railton (1986), and Sidgwick and Jones (2010). Along some-
what similar lines, one common account of philosophical justication, the method of reective
equilibrium, proposes a process of iterative mutual adjustment between our intuitions about
particular cases, the general rules which we think govern these cases, and the principles accord-
ing to which we think these elements should be revised, to achieve a more coherent system; see,
e.g., Rawls (1971) and Goodman (1954).
11. Presumably the intention here is that when the AI acts to prevent such disasters, it should do it
with as light a touch as possible, i.e. in such a manner that it averts the disaster but without exert-
ing too much inuence over how things turn out for humanity in other respects.
12. Yudkowsky (2004).
13. Rebecca Roache, personal communication.
14. e three principles are “Defend humans, the future of humanity, and humane nature” (hu-
mane here being that which we wish we were, as distinct from human, which is what we are);
“Humankind should not spend the rest of eternity desperately wishing that the programmers
had done something dierently”; and “Help people.”
15. Some religious groups place a strong emphasis on faith in contradistinction to reason, the lat-
ter of which they may regard—even in its hypothetically most idealized form and even aer it
would have ardently and open-mindedly studied every scripture, revelation, and exegesis—to
be insucient for the attainment of essential spiritual insights. ose holding such views might
not regard CEV as an optimal guide to decision-making (though they might still prefer it to
296
|
NOTES: CHAPTER 3
various other imperfect guides that might in actuality be followed if the CEV approach were
eschewed).
16. An AI acting like a latent force of nature to regulate human interactions has been referred to
as a “Sysop,” a kind of “operating system” for the matter occupied by human civilization. See
Yudkowsky (2001).
17. Might,” because conditional on humanitys coherent extrapolated volition wishing not to ex-
tend moral consideration to these entities, it is perhaps doubtful whether those entities actually
have moral status (despite it seeming very plausible now that they do). “Potentially,” because
even if a blocking vote prevents the CEV dynamic from directly protecting these outsiders,
there is still a possibility that, within whatever ground rules are le over once the initial dynam-
ic has run, individuals whose wishes were respected and who want some outsiders’ welfare to
be protected may successfully bargain to attain this outcome (at the expense of giving up some
of their own resources). Whether this would be possible might depend on, among other things,
whether the outcome of the CEV dynamic is a set of ground rules that makes it feasible to reach
negotiated resolutions to issues of this kind (which might require provisions to overcome strate-
gic bargaining problems).
18. Individuals who contribute positively to realizing a safe and benecial superintelligence might
merit some special reward for their labour, albeit something short of a near-exclusive mandate
to determine the disposition of humanity’s cosmic endowment. However, the notion of eve-
rybody getting an equal share in our extrapolation base is such a nice Schelling point that it
should not be lightly tossed away. ere is, in any case, an indirect way in which virtue could be
rewarded: namely, the CEV itself might turn out to specify that good people who exerted them-
selves on behalf of humanity should be suitably recognized. is could happen without such
people being given any special weight in the extrapolation base if—as is easily imaginable—our
CEV would endorse (in the sense of giving at least some nonzero weight to) a principle of just
desert.
19. Bostrom et al. (2013).
20. To the extent that there is some (suciently denite) shared meaning that is being expressed
when we make moral assertions, a superintelligence should be able to gure out what that
meaning is. And to the extent that moral assertions are “truth-apt” (i.e. have an underlying
propositional character that enables them to be true or false), the superintelligence should be
able to gure out which assertions of the form “Agent X ought now to Φ” are true. At least, it
should outperform us on this task.
An AI that initially lacks such a capacity for moral cognition should be able to acquire it if
it has the intelligence amplication superpower. One way the AI could do this is by reverse-
engineering the human brain’s moral thinking and then implement a similar process but run it
faster, feed it more accurate factual information, and so forth.
21. Since we are uncertain about metaethics, there is a question of what the AI is to do if the pre-
conditions for MR fail to obtain. One option is to stipulate that the AI shut itself o if it assigns
a suciently high probability to moral cognitivism being false or to there being no suitable non-
relative moral truths. Alternatively, we could have the AI revert to some alternative approach,
such as CEV.
We could rene the MR proposal to make it clearer what is to be done in various ambiguous
or degenerate cases. For instance, if error theory is true (and hence all positive moral assertions
of the form “I ought now to Φ” are false), then the fallback strategy (e.g. shutting down) would
be invoked. We could also specify what should happen if there are multiple feasible actions, each
of which would be morally right. For example, we might say that in such cases the AI should
perform (one of) the permissible actions that humanitys collective extrapolation would have
favored. We might also stipulate what should happen if the true moral theory does not em-
ploy terms like “morally right” in its basic vocabulary. For instance, a consequentialist theory
might hold that some actions are better than others but that there is no particular threshold
corresponding to the notion of an action being “morally right.” We could then say that if such a
theory is correct, MR should perform one of the morally best feasible actions, if there is one; or,
if there is an innite number of feasible actions such that for any feasible action there is a better
NOTES: CHAPTER 3
|
297
one, then maybe MR could pick any that is at least astronomically better than the best action
that any human would have selected in a similar situation, if such an action is feasible—or if not,
then an action that is at least as good as the best action a human would have performed.
A couple of general points should be borne in mind when thinking about how the MR pro-
posal could be rened. First, we might start conservatively, using the fallback option to cover
almost all contingencies and only use the “morally right” option in those that we feel we fully
understand. Second, we might add the general modulator to the MR proposal that it is to be “in-
terpreted charitably, and revised as we would have revised it if we had thought more carefully
about it before we wrote it down, etc.
22. Of these terms, “knowledge” might seem the one most readily susceptible to a formal analysis
(in information-theoretic terms). However, to represent what it is for a human to know some-
thing, the AI may need a sophisticated set of representations relating to complex psychological
properties. A human being does not “know” all the information that is stored somewhere in her
brain.
23. One indicator that the terms in CEV are (marginally) less opaque is that it would count as philo-
sophical progress if we could analyze moral rightness in terms like those used in CEV. In fact,
one of the main strands in metaethics—ideal observer theory—purports to do just that. See,
e.g., Smith et al. (1989).
24. is requires confronting the problem of fundamental normative uncertainty. It can be shown
that it is not always appropriate to act according to the moral theory that has the highest prob-
ability of being true. It can also be shown that it is not always appropriate to perform the ac-
tion that has the highest probability of being right. Some way of trading probabilities against
degrees of wrongness” or severity of issues at stake seems to be needed. For some ideas in this
direction, see Bostrom (2009a).
25. It could possibly even be argued that it is an adequacy condition for any explication of the no-
tion of moral rightness that it account for how Joe Sixpack is able to have some idea of right and
wrong.
26. It is not obvious that the morally right thing for us to do is to build an AI that implements MR,
even if we assume that the AI itself would always act morally. Perhaps it would be objectionably
hubristic or arrogant of us to build such an AI (especially since many people may disapprove of
that project). is issue can be partially nessed by tweaking the MR proposal. Suppose that we
stipulate that the AI should act (to do what it would be morally right for it to do) only if it was
morally right for its creators to have built the AI in the rst place; otherwise it should shut itself
down. It is hard to see how we would be committing any grave moral wrong in creating that
kind of AI, since if it were wrong for us to create it, the only consequence would be that an AI
was created that immediately shuts itself down, assuming that the AI has committed no mind
crime up to that point. (We might nevertheless have acted wrongly—for instance, by having
failed to seize the opportunity to build some other AI instead.)
A second issue is supererogation. Suppose there are many actions the AI could take, each of
which would be morally right—in the sense of being morally permissible—yet some of which
are morally better than the others. One option is to have the AI aim to select the morally best
action in any such a situation (or one of the best actions, in case there are several that are equally
good). Another option is to have the AI select from among the morally permissible actions one
that maximally satises some other (non-moral) desideratum. For example, the AI could select,
from among the actions that are morally permissible, the action that our CEV would prefer it to
take. Such an AI, while never doing anything that is morally impermissible, might protect our
interests more than an AI that does what is morally best.
27. When the AI evaluates the moral permissibility of our act of creating the AI, it should interpret
permissibility in its objective sense. In one ordinary sense of “morally permissible,” a doctor acts
morally permissibly when she prescribes a drug she believes will cure her patient—even if the pa-
tient, unbeknownst to the doctor, is allergic to the drug and dies as a result. Focusing on objective
moral permissibility takes advantage of the presumably superior epistemic position of the AI.
28. More directly, it depends on the AI’s beliefs about which ethical theory is true (or, more pre-
cisely, on its probability distribution over ethical theories).
298
|
NOTES: CHAPTER 3
29. It can be dicult to imagine how superlatively wonderful these physically possible lives might
be. See Bostrom (2008c) for a poetic attempt to convey some sense of this. See Bostrom (2008b)
for an argument that some of these possibilities could be good for us, good for existing human
beings.
30. It might seem deceptive or manipulative to promote one proposal if one thinks that some other
proposal would be better. But one could promote it in ways that avoid insincerity. For example,
one could freely acknowledge the superiority of the ideal while still promoting the non-ideal as
the best attainable compromise.
31. Or some other positively evaluative term, such as “good,” “great,” or “wonderful.
32. is echoes a principle in soware design known as “Do What I Mean,” or DWIM. See
Teitelman (1966).
33. Goal content, decision theory, and epistemology are three aspects that should be elucidated; but
we do not intend to beg the question of whether there must be a neat decomposition into these
three separate components.
34. An ethical project ought presumably to allocate at most a modest portion of the eventual ben-
ets that the superintelligence produces as special rewards to those who contributed in morally
permissible ways to the project’s success. Allocating a great portion to the incentive wrapping
scheme would be unseemly. It would be analogous to a charity that spends 90% of its income on
performance bonuses for its fundraisers and on advertising campaigns to increase donations.
35. How could the dead be rewarded? One can think of several possibilities. At the low end, there
could be memorial services and monuments, which would be a reward insofar as people desired
posthumous fame. e deceased might also have other preferences about the future that could
be honored, for instance concerning cultures, arts, buildings, or natural environments. Fur-
thermore, most people care about their descendants, and special privileges could be granted to
the children and grandchildren of contributors. More speculatively, the superintelligence might
be able to create relatively faithful simulations of some past people—simulations that would be
conscious and that would resemble the original suciently to count as a form of survival (ac-
cording to at least some people’s criteria). is would presumably be easier for people who have
been placed in cryonic suspension; but perhaps for a superintelligence it would not be impos-
sible to recreate something quite similar to the original person from other preserved records
such as correspondence, publications, audiovisual materials and digital records, or the personal
memories of other survivors. A superintelligence might also think of some possibilities that do
not readily occur to us.
36. On Pascalian mugging, see Bostrom (2009b). For an analysis of issues related to innite util-
ities, see Bostrom (2011a). On fundamental normative uncertainty, see, e.g., Bostrom (2009a).
37. E.g., Price (1991); Joyce (1999); Drescher (2006); Yudkowsky (2010); Dai (2009).
38. E.g., Bostrom (2009a).
39. It is also conceivable that using indirect normativity to specify the AIs goal content would miti-
gate the problems that might arise from an incorrectly specied decision theory. Consider, for ex-
ample, the CEV approach. If it were implemented well, it might be able to compensate for at least
some errors in the specication of the AIs decision theory. e implementation could allow the
values that our coherent extrapolated volition would want the AI to pursue to depend on the AIs
decision theory. If our idealized selves knew they were making value specications for an AI that
was using a particular kind of decision theory, they could adjust their value specications such
as to make the AI behave benignly despite its warped decision theory—much like one can cancel
out the distorting eects of one lens by placing another lens in front of it that distorts oppositely.
40. Some epistemological systems may, in a holistic manner, have no distinct foundation. In that
case, the constitutional inheritance is not a distinct set of principles, but rather, as it were, an
epistemic starting point that embodies certain propensities to respond to incoming streams of
evidence.
41. See, e.g., the problem of distortion discussed in Bostrom (2011a).
42. For instance, one disputed issue in anthropic reasoning is whether the so-called self-indication
assumption should be accepted. e self-indication assumption states, roughly, that from the
fact that you exist you should infer that hypotheses according to which larger numbers N of
observers exist should receive a probability boost proportional to N. For an argument against
NOTES: CHAPTERS 3–4
|
299
this principle, see the “Presumptuous Philosopher” gedanken experiment in Bostrom (2002a).
For a defense of the principle, see Olum (2002); and for a critique of that defense, see Bostrom
and Ćirković (2003). Beliefs about the self-indication assumption might aect various empiri-
cal hypotheses of potentially crucial strategic relevance, for example, considerations such as
the Carter–Leslie doomsday argument, the simulation argument, and “great lter” arguments.
See Bostrom (2002a, 2003a, 2008a); Carter (1983); Ćirković et al. (2010); Hanson (1998d); Les-
lie (1996); Tegmark and Bostrom (2005). A similar point could be made with regard to other
fraught issues in observation selection theory, such as whether the choice of reference class can
be relativized to observer-moments, and if so how.
43. See, e.g., Howson and Urbach (1993). ere are also some interesting results that narrow the
range of situations in which two Bayesian agents can rationally disagree when their opinions are
common knowledge; see Aumann (1976) and Hanson (2006).
44. Cf. the concept of a “last judge” in Yudkowsky (2004).
45. ere are many important issues outstanding in epistemology, some mentioned earlier in the
text. e point here is that we may not need to get all the solutions exactly right in order to
achieve an outcome that is practically indiscernible from the best outcome. A mixture model
(which throws together a wide range of diverse priors) might work.
CHAPTER 4: THE STRATEGIC PICTURE
1. is principle is introduced in Bostrom (2009b, 190), where it is also noted that it is not tau-
tological. For a visual analogy, picture a box with large but nite volume, representing the space
of basic capabilities that could be obtained through some possible technology. Imagine sand
being poured into this box, representing research eort. How you pour the sand determines
where it piles up in the box. But if you keep on pouring, the entire space eventually gets lled.
2. Bostrom (2002b).
3. is is not the perspective from which science policy has traditionally been viewed. Harvey
Averch describes science and technology policy in the United States between 1945 and 1984 as
having been centered on debates about the optimum level of public investment in the S&T en-
terprise and on the extent to which the government should attempt to “pick winners” in order to
achieve the greatest increase in the nation’s economic prosperity and military strength. In these
calculations, technological progress is always assumed to be good. But Averch also describes
the rise of critical perspectives, which question the “progress is always good” premiss (Averch
1985). See also Graham (1997).
4. Bostrom (2002b).
5. is is of course by no means tautological. One could imagine a case being made for a dierent
order of development. It could be argued that it would be better for humanity to confront some
less dicult challenge rst, say the development of nanotechnology, on grounds that this would
force us to develop better institutions, become more internationally coordinated, and mature in
our thinking about global strategy. Perhaps we would be more likely to rise to a challenge that
presents a less metaphysically confusing threat than machine superintelligence. Nanotechnol-
ogy (or synthetic biology, or whatever the lesser challenge we confront rst) might then serve
as a footstool that would help us ascend to the capability level required to deal with the higher-
level challenge of superintelligence.
Such an argument would have to be assessed on a case-by-case basis. For example, in the
case of nanotechnology, one would have to consider various possible consequences such as the
boost in hardware performance from nanofabricated computational substrates; the eects of
cheap physical capital for manufacturing on economic growth; the proliferation of sophisti-
cated surveillance technology; the possibility that a singleton might emerge though the direct
or indirect eects of a nanotechnology breakthrough; and the greater feasibility of neuromor-
phic and whole brain emulation approaches to machine intelligence. It is beyond the scope of
our investigation to consider all these issues (or the parallel issues that might arise for other
existential risk-causing technologies). Here we just point out the prima facie case for favoring a
superintelligence- rst sequence of development—while stressing that there are complications
that might alter this preliminary assessment in some cases.
300
|
NOTES: CHAPTER 4
6. Pinker (2011); Wright (2001).
7. It might be tempting to suppose the hypothesis that everything has accelerated to be meaning-
less on grounds that it does not (at rst glance) seem to have any observational consequences;
but see, e.g., Shoemaker (1969).
8. e level of preparedness is not measured by the amount of eort expended on preparedness
activities, but by how propitiously congured conditions actually are and how well-poised key
decision makers are to take appropriate action.
9. e degree of international trust during the lead-up to the intelligence explosion may also be a
factor. We consider this in the section “Collaboration” later in the chapter.
10. Anecdotally, it appears those currently seriously interested in the control problem are dispro-
portionately sampled from one extreme end of the intelligence distribution, though there could
be alternative explanations for this impression. If the eld becomes fashionable, it will undoubt-
edly be ooded with mediocrities and cranks.
11. I owe this term to Carl Shulman.
12. How similar to a brain does a machine intelligence have to be to count as a whole brain emula-
tion rather than a neuromorphic AI? e relevant determinant might be whether the system re-
produces either the values or the full panoply of cognitive and evaluative tendencies of either a
particular individual or a generic human being, because this would plausibly make a dierence
to the control problem. Capturing these properties may require a rather high degree of emula-
tion delity.
13. e magnitude of the boost would of course depend on how big the push was, and also where
resources for the push came from. ere might be no net boost for neuroscience if all the extra
resources invested in whole brain emulation research were deducted from regular neuroscience
research—unless a keener focus on emulation research just happened to be a more eective way
of advancing neuroscience than the default portfolio of neuroscience research.
14. See Drexler (1986, 242). Drexler (private communication) conrms that this reconstruction
corresponds to the reasoning he was seeking to present. Obviously, a number of implicit prem-
isses would have to be added if one wished to cast the argument in the form of a deductively
valid chain of reasoning.
15. Perhaps we ought not to welcome small catastrophes in case they increase our vigilance to the
point of making us prevent the medium-scale catastrophes that would have been needed to make
us take the strong precautions necessary to prevent existential catastrophes? (And of course,
just as with biological immune systems, we also need to be concerned with over- reactions, anal-
ogous to allergies and autoimmune disorders.)
16. Cf. Lenman (2000); Burch-Brown (2014).
17. Cf. Bostrom (2007).
18. Note that this argument focuses on the ordering rather than the timing of the relevant events.
Making superintelligence happen earlier would help preempt other existential transition risks
only if the intervention changes the sequence of the key developments: for example, by making
superintelligence happen before various milestones are reached in nanotechnology or synthetic
biology.
19. If solving the control problem is extremely dicult compared to solving the performance
problem in machine intelligence, and if project ability correlates only weakly with project size,
then it is possible that it would be better that a small project gets there rst, namely if the vari-
ance in capability is greater among smaller projects. In such a situation, even if smaller projects
are on average less competent than larger projects, it could be less unlikely that a given small
project would happen to have the freakishly high level of competence needed to solve the con-
trol problem.
20. is is not to deny that one can imagine tools that could promote global deliberation and which
would benet from, or even require, further progress in hardware—for example, high-quality
translation, better search, ubiquitous access to smart phones, attractive virtual reality environ-
ments for social intercourse, and so forth.
21. Investment in emulation technology could speed progress toward whole brain emulation not
only directly (through any technical deliverables produced) but also indirectly by creating a
NOTES: CHAPTER 4
|
30
constituency that will push for more funding and boost the visibility and credibility of the
whole brain emulation (WBE) vision.
22. How much expected value would be lost if the future were shaped by the desires of one ran-
dom human rather than by (some suitable superposition of) the desires of all of humanity? is
might depend sensitively on what evaluation standard we use, and also on whether the desires
in question are idealized or raw.
23. For example, whereas human minds communicate slowly via language, AIs can be designed
so that instances of the same program are able easily and quickly to transfer both skills and
information amongst one another. Machine minds designed ab initio could do away with cum-
bersome legacy systems that helped our ancestors deal with aspects of the natural environment
that are unimportant in cyberspace. Digital minds might also be designed to take advantage of
fast serial processing unavailable to biological brains, and to make it easy to install new modules
with highly optimized functionality (e.g. symbolic processing, pattern recognition, simulators,
data mining, and planning). Articial intelligence might also have signicant non-technical
advantages, such as being more easily patentable or less entangled in the moral complexities of
using human uploads.
24. If p
1
and p
2
are the probabilities of failure at each step, the total probability of failure is
p
1
+ (1 – p
1
)p
2
since one can fail terminally only once.
25. It is possible, of course, that the frontrunner will not have such a large lead and will not be able
to form a singleton. It is also possible that a singleton would arise before AI even without the
intervention of WBE, in which case this reason for favoring a WBE-rst scenario falls away.
26. Is there a way for a promoter of WBE to increase the specicity of her support so that it acceler-
ates WBE while minimizing the spillover to AI development? Promoting scanning technology
is probably a better bet than promoting neurocomputational modeling. (Promoting computer
hardware is unlikely to make much dierence one way or the other, given the large commercial
interests that are anyway incentivizing progress in that eld.)
Promoting scanning technology may increase the likelihood of a multipolar outcome by
making scanning less likely to be a bottleneck, thus increasing the chance that the early emula-
tion population will be stamped from many dierent human templates rather than consisting of
gazillions of copies of a tiny number of templates. Progress in scanning technology also makes
it more likely that the bottleneck will instead be computing hardware, which would tend to slow
the takeo.
27. Neuromorphic AI may also lack other safety-promoting attributes of whole brain emulation,
such as having a prole of cognitive strengths and weaknesses similar to that of a biological hu-
man being (which would let us use our experience of humans to form some expectations of the
system’s capabilities at dierent stages of its development).
28. If somebody’s motive for promoting WBE is to make WBE happen before AI, they should bear
in mind that accelerating WBE will alter the order of arrival only if the default timing of the
two paths toward machine intelligence is close and with a slight edge to AI. Otherwise, either
investment in WBE will simply make WBE happen earlier than it otherwise would (reducing
hardware overhang and preparation time) but without aecting the sequence of development;
or else such investment in WBE will have little eect (other than perhaps making AI happen
even sooner by stimulating progress on neuromorphic AI).
29. Comment on Hanson (2009).
30. ere would of course be some magnitude and imminence of existential risk for which it would
be preferable even from the person-aecting perspective to postpone the risk—whether to en-
able existing people to eke out a bit more life before the curtain drops or to provide more time
for mitigation eorts that might reduce the danger.
31. Suppose we could take some action that would bring the intelligence explosion closer by one
year. Let us say that the people currently inhabiting the Earth are dying o at a rate of 1% per
year, and that the default risk of human extinction from the intelligence explosion is 20% (to
pick an arbitrary number for the purposes of illustration only). en hastening the arrival of
the intelligence explosion by 1 year might be worth (from a person-aecting standpoint) an
increase in the risk from 20% to 21%, i.e. a 5% increase in risk level. However, the vast majority
302
|
NOTES: CHAPTER 4
of people alive one year before the start of the intelligence explosion would at that point have an
interest in postponing it if they could thereby reduce the risk of the explosion by one percentage
point (since most individuals would reckon their own risk of dying in the next year to be much
smaller than 1%—given that most mortality occurs in relatively narrow demographics such as
the frail and the elderly). One could thus have a model in which each year the population votes
to postpone the intelligence explosion by another year, so that the intelligence explosion never
happens, although everybody who ever lives agrees that it would be better if the intelligence ex-
plosion happened at some point. In reality, of course, coordination failures, limited predictabil-
ity, or preferences for things other than personal survival are likely to prevent such an unending
pause.
If one uses the standard economic discount factor instead of the person-aecting stand-
ard, the magnitude of the potential upside is diminished, since the value of existing people
getting to enjoy astronomically long lives is then steeply discounted. is eect is especially
strong if the discount factor is applied to each individual’s subjective time rather than to
sidereal time. If future benets are discounted at a rate of x% per year, and the background
level of existential risk from other sources is y% per year, then the optimum point for the
intelligence explosion would be when delaying the explosion for another year would produce
less than x + y percentage points of reduction of the existential risk associated with an intel-
ligence explosion.
32. I am indebted to Carl Shulman and Stuart Armstrong for help with this model. See also
Shulman (2010a, 3): “Chalmers (2010) reports a consensus among cadets and sta at the U.S.
West Point military academy that the U.S. government would not restrain AI research even in
the face of potential catastrophe, for fear that rival powers would gain decisive advantage.”
33. at is, information in the model is always bad ex ante. Of course, depending on what the infor-
mation actually is, it will in some cases turn out to be good that the information became known,
notably if the gap between leader and runner-up is much greater than one would reasonably
have guessed in advance.
34. It might even present an existential risk, especially if preceded by the introduction of novel mili-
tary technologies of destruction or unprecedented arms buildups.
35. A project could have its workers distributed over a large number of locations and collaborat-
ing via encrypted communications channels. But this tactic involves a security trade-o: while
geographical dispersion may oer some protection against military attacks, it would impede
operational security, since it is harder to prevent personnel from defecting, leaking informa-
tion, or being abducted by a rival power if they are spread out over many locations.
36. Note that a large temporal discount factor could make a project behave in some ways as though
it were in a race, even if it knows it has no real competitor. e large discount factor means it
would care little about the far future. Depending on the situation, this would discourage blue-
sky R&D, which would tend to delay the machine intelligence revolution (though perhaps mak-
ing it more abrupt when it does occur, because of hardware overhang). But the large discount
factor—or a low level of caring for future generations—would also make existential risks seem
to matter less. is would encourage gambles that involve the possibility of an immediate gain
at the expense of an increased risk of existential catastrophe, thus disincentivizing safety invest-
ment and incentivizing an early launch—mimicking the eects of the race dynamic. By contrast
to the race dynamic, however, a large discount factor (or disregard for future generations) would
have no particular tendency to incite conict.
Reducing the race dynamic is a main benet of collaboration. at collaboration would fa-
cilitate sharing of ideas for how to solve the control problem is also a benet, although this is to
some extent counterbalanced by the fact that collaboration would also facilitate sharing of ideas
for how to solve the competence problem. e net eect of this facilitation of idea-sharing may
be to slightly increase the collective intelligence of the relevant research community.
37. On the other hand, public oversight by a single government would risk producing an outcome
in which one nation monopolizes the gains. is outcome seems inferior to one in which unac-
countable altruists ensure that everybody stands to gain. Furthermore, oversight by a national
government would not necessarily mean that even all the citizens of that country receive a share
NOTES: CHAPTER 4
|
303
of the benet: depending on the country in question, there is a greater or smaller risk that all the
benets would be captured by a political elite or a few self-serving agency personnel.
38. One qualication is that the use of incentive wrapping (as discussed in Chapter 12) might in
some circumstances encourage people to join a project as active collaborators rather than pas-
sive free-riders.
39. Diminishing returns would seem to set in at a much smaller scale. Most people would rather
have one star than a one-in-a-billion chance of a galaxy with a billion stars. Indeed, most people
would rather have a billionth of the resources on Earth than a one-in-a-billion chance of own-
ing the entire planet.
40. Cf. Shulman (2010a).
41. Aggregative ethical theories run into trouble when the idea that the cosmos might be innite
is taken seriously; see Bostrom (2011b). ere may also be trouble when the idea of ridiculously
large but nite values is taken seriously; see Bostrom (2009b).
42. If one makes a computer larger, one eventually faces relativistic constraints arising from com-
munication latencies between the dierent parts of the computer—signals do not propagate
faster than light. If one shrinks the computer, one encounters quantum limits to miniaturiza-
tion. If one increases the density of the computer, one slams into the black hole limit. Admit-
tedly, we cannot be completely certain that new physics will not one day be discovered oering
some way around these limitations.
43. e number of copies of a person would scale linearly with resources with no upper bound. Yet
it is not clear how much the average human being would value having multiple copies of herself.
Even those people who would prefer to be multiply instantiated may not have a utility function
that is linear with increasing number of copies. Copy numbers, like life years, might have di-
minishing returns in the typical person’s utility function.
44. A singleton is highly internally collaborative at the highest level of decision-making. A single-
ton could have a lot of non-collaboration and conict at lower levels, if the higher-level agency
that constitutes the singleton chooses to have things that way.
45. If each rival AI team is convinced that the other teams are so misguided as to have no chance of
producing an intelligence explosion, then one reason for collaboration—avoiding the race dy-
namic—is obviated: each team should independently choose to go slower in the condent belief
that it lacks any serious competition.
46. A PhD student.
47. is formulation is intended to be read so as to include a prescription that the well-being of
nonhuman animals and other sentient beings (including digital minds) that exist or may come
to exist be given due consideration. It is not meant to be read as a license for one AI developer
to substitute his or her own moral intuitions for those of the wider moral community. e prin-
ciple is consistent with the “coherent extrapolated volition” approach discussed in Chapter 12,
with an extrapolation base encompassing all humans.
A further clarication: e formulation is not intended to necessarily exclude the possibly
of post-transition property rights in articial superintelligences or their constituent algorithms
and data structures. e formulation is meant to be agnostic about what legal or political systems
would best serve to organize transactions within a hypothetical future posthuman society. What
the formulation is meant to assert is that the choice of such a system, insofar as its selection is
causally determined by how superintelligence is initially developed, should to be made on the
basis of the stated criterion; that is, the post-transition constitutional system should be chosen for
the benet of all of humanity and in the service of widely shared ethical ideals—as opposed to, for
instance, for the benet merely of whoever happened to be the rst to develop superintelligence.
48. Renements of the windfall clause are obviously possible. For example, perhaps the thresh-
old should be expressed in per capita terms, or maybe the winner should be allowed to keep a
somewhat larger than equal share of the overshoot in order to more strongly incentivize further
production (some version of Rawlss maximin principle might be attractive here). Other rene-
ments would refocus the clause away from dollar amounts and restate it in terms of “inuence
on humanity’s future” or “degree to which dierent parties’ interests are weighed in a future
singleton’s utility function” or some such.
304
|
NOTES: CHAPTER 5
CHAPTER 5: CRUNCH TIME
1. Some research is worthwhile not because of what it discovers but for other reasons, such as by
entertaining, educating, accrediting, or upliing those who engage in it.
2. I am not suggesting that nobody should work on pure mathematics or philosophy. I am also not
suggesting that these endeavors are especially wasteful compared to all the other dissipations
of academia or society at large. It is probably very good that some people can devote themselves
to the life of the mind and follow their intellectual curiosity wherever it leads, independent of
any thought of utility or impact. e suggestion is that at the margin, some of the best minds
might, upon realizing that their cognitive performances may become obsolete in the foreseeable
future, want to shi their attention to those theoretical problems for which it makes a dierence
whether we get the solution a little sooner.
3. ough one should be cautious in cases where this uncertainty may be protective—recall, for
instance, the risk-race model in Box 13, where we found that additional strategic information
could be harmful. More generally, we need to worry about information hazards (see Bostrom
[2011b]). It is tempting to say that we need more analysis of information hazards. is is prob-
ably true, although we might still worry that such analysis itself may produce dangerous in-
formation.
4. Cf. Bostrom (2007).
5. I am grateful to Carl Shulman for emphasizing this point.
BIBLIOGRAPHY
|
305
BIBLIOGRAPHY
Acemoglu, Daron. 2003. “Labor- and Capital-Augmenting Technical Change.” Journal of the Euro-
pean Economic Association 1 (1): 1–37.
Albertson, D. G., and omson, J. N. 1976. “e Pharynx of Caenorhabditis Elegans.” Philosophical
Transactions of the Royal Society B: Biological Sciences 275 (938): 299–325.
Allen, Robert C. 2008. “A Review of Gregory Clarks A Farewell to Alms: A Brief Economic History of
the World.” Journal of Economic Literature 46 (4): 946–73.
American Horse Council. 2005. “National Economic Impact of the US Horse Industry.” Retrieved
July 30, 2013. Available at http://www.horsecouncil.org/national-economic-impact-us-horse-
industry.
Anand, Paul, Pattanaik, Prasanta, and Puppe, Clemens, eds. 2009. e Oxford Handbook of Rational
and Social Choice. New York: Oxford University Press.
Andres, B., Koethe, U., Kroeger, T., Helmstaedter, M., Briggman, K. L., Denk, W., and Hamprecht,
F.A. 2012. “3D Segmentation of SBFSEM Images of Neuropil by a Graphical Model over Super-
voxel Boundaries.” Medical Image Analysis 16 (4): 796805.
Armstrong, Alex. 2012. “Computer Competes in Crossword Tournament.” I Programmer, March 19.
Armstrong, Stuart. 2007. “Chaining God: A Qualitative Approach to AI, Trust and Moral Systems.”
Unpublished manuscript, October 20. Retrieved December 31, 2012. Available at http://www.
neweuropeancentury.org/GodAI.pdf.
Armstrong, Stuart. 2010. Utility Indierence, Technical Report 2010-1. Oxford: Future of Humanity
Institute, University of Oxford.
Armstrong, Stuart. 2013. “General Purpose Intelligence: Arguing the Orthogonality esis.Analy-
sis and Metaphysics 12: 6884.
Armstrong, Stuart, and Sandberg, Anders. 2013. “Eternity in Six Hours: Intergalactic Spreading of
Intelligent Life and Sharpening the Fermi Paradox.Acta Astronautica 89: 1–13.
Armstrong, Stuart, and Sotala, Kaj. 2012. “How We’re Predicting AI—or Failing To.” In Beyond AI:
Articial Dreams, edited by Jan Romportl, Pavel Ircing, Eva Zackova, Michal Polak, and Radek
Schuster, 52–75. Pilsen: University of West Bohemia. Retrieved February 2, 2013.
Asimov, Isaac. 1942. “Runaround.Astounding Science-Fiction, March, 94–103.
Asimov, Isaac. 1985. Robots and Empire. New York: Doubleday.
Aumann, Robert J. 1976. “Agreeing to Disagree.Annals of Statistics 4 (6): 1236–9.
Averch, Harvey Allen. 1985. A Strategic Analysis of Science and Technology Policy. Baltimore: Johns
Hopkins University Press.
Azevedo, F. A. C., Carvalho, L. R. B., Grinberg, L. T., Farfel, J. M., Ferretti, R. E. L., Leite, R. E. P.,
Jacob, W., Lent, R., and Herculano-Houzel, S. 2009. “Equal Numbers of Neuronal and Non-
neuronal Cells Make the Human Brain an Isometrically Scaled-up Primate Brain.” Journal of
Comparative Neurology 513 (5): 532–41.
306
|
BIBLIOGRAPHY
Baars, Bernard J. 1997. In the eater of Consciousness: e Workspace of the Mind. New York: Oxford
University Press.
Baratta, Joseph Preston. 2004. e Politics of World Federation: United Nations, UN Reform, Atomic
Control. Westport, CT: Praeger.
Barber, E. J. W. 1991. Prehistoric Textiles: e Development of Cloth in the Neolithic and Bronze Ages
with Special Reference to the Aegean. Princeton, NJ: Princeton University Press.
Bartels, J., Andreasen, D., Ehirim, P., Mao, H., Seibert, S., Wright, E. J., and Kennedy, P. 2008.
“Neurotrophic Electrode: Method of Assembly and Implantation into Human Motor Speech
Cortex.” Journal of Neuroscience Methods 174 (2): 168–76.
Bartz, Jennifer A., Zaki, Jamil, Bolger, Niall, and Ochsner, Kevin N. 2011. “Social Eects of Oxytocin
in Humans: Context and Person Matter.Trends in Cognitive Science 15 (7): 301–9.
Basten, Stuart, Lutz, Wolfgang, and Scherbov, Sergei. 2013. “Very Long Range Global Population
Scenarios to 2300 and the Implications of Sustained Low Fertility.” Demographic Research 28:
114566.
Baum, Eric B. 2004. What Is ought? Bradford Books. Cambridge, MA: MIT Press.
Baum, Seth D., Goertzel, Ben, and Goertzel, Ted G. 2011. “How Long Until Human-Level AI? Results
from an Expert Assessment.” Technological Forecasting and Social Change 78 (1): 185–95.
Beal, J., and Winston, P. 2009. “Guest Editors’ Introduction: e New Frontier of Human-Level
Articial Intelligence.” IEEE Intelligent Systems 24 (4): 21–3.
Bell, C. Gordon, and Gemmell, Jim. 2009. Total Recall: How the E-Memory Revolution Will Change
Everything. New York: Dutton.
Benyamin, B., Pourcain, B. St., Davis, O. S., Davies, G., Hansell, M. K., Brion, M.-J. A., Kirkpatrick,
R. M., et al. 2013. “Childhood Intelligence is Heritable, Highly Polygenic and Associated With
FNBP1L.” Molecular Psychiatry (January 23).
Berg, Joyce E., and Rietz, omas A. 2003. “Prediction Markets as Decision Support Systems.Infor-
mation Systems Frontiers 5 (1): 79–93.
Berger, eodore W., Chapin, J. K., Gerhardt, G. A., Soussou, W. V., Taylor, D. M., and Tresco, P. A.,
eds. 2008. Brain–Computer Interfaces: An International Assessment of Research and Development
Trend s. Springer.
Berger, T. W., Song, D., Chan, R. H., Marmarelis, V. Z., LaCoss, J., Wills, J., Hampson, R. E.,
Deadwyler, S. A., and Granacki, J. J. 2012. “A Hippocampal Cognitive Prosthesis: Multi-Input,
Multi-Output Nonlinear Modeling and VLSI Implementation.” IEEE Transactions on Neural
Systems and Rehabilitation Engineering 20 (2): 198211.
Berliner, Hans J. 1980a. “Backgammon Computer-Program Beats World Champion.” Articial Intel-
ligence 14 (2): 205220.
Berliner, Hans J. 1980b. “Backgammon Program Beats World Champ.” SIGART Newsletter 69:
69.
Bernardo, José M., and Smith, Adrian F. M. 1994. Bayesian eory, 1st ed. Wiley Series in Probability
& Statistics. New York: Wiley.
Birbaumer, N., Murguialday, A. R., and Cohen, L. 2008. “Brain–Computer Interface in Paralysis.”
Current Opinion in Neurology 21 (6): 634–8.
Bird, Jon, and Layzell, Paul. 2002. “e Evolved Radio and Its Implications for Modelling the Evo-
lution of Novel Sensors.” In Proceedings of the 2002 Congress on Evolutionary Computation, 2:
183641.
Blair, Clay, Jr. 1957. “Passing of a Great Mind: John von Neumann, a Brilliant, Jovial Mathematician,
was a Prodigious Servant of Science and His Country.” Life, February 25, 89–104.
Bobrow, Daniel G. 1968. “Natural Language Input for a Computer Problem Solving System.” In
Semantic Information Processing, edited by Marvin Minsky, 146–227. Cambridge, MA: MIT Press.
Bostrom, Nick. 1997. “Predictions from Philosophy? How Philosophers Could Make emselves
Useful.” Unpublished manuscript. Last revised September 19, 1998.
BIBLIOGRAPHY
|
307
Bostrom, Nick. 2002a. Anthropic Bias: Observation Selection Eects in Science and Philosophy. New
York: Routledge.
Bostrom, Nick. 2002b. “Existential Risks: Analyzing Human Extinction Scenarios and Related
Hazards. Journal of Evolution and Technology 9.
Bostrom, Nick. 2003a. “Are We Living in a Computer Simulation?” Philosophical Quarterly 53 (211):
243–55.
Bostrom, Nick. 2003b. “Astronomical Waste: e Opportunity Cost of Delayed Technological
Development. Utilitas 15 (3): 308–314.
Bostrom, Nick. 2003c. “Ethical Issues in Advanced Articial Intelligence.” In Cognitive, Emotive and
Ethical Aspects of Decision Making in Humans and in Articial Intelligence, edited by Iva Smit and
George E. Lasker, 2: 1217. Windsor, ON: International Institute for Advanced Studies in Systems
Research / Cybernetics.
Bostrom, Nick. 2004. “e Future of Human Evolution.” In Two Hundred Years Aer Kant, Fiy
Years Aer Turing, edited by Charles Tandy, 2: 339–371. Death and Anti-Death. Palo Alto, CA:
Ria University Press.
Bostrom, Nick. 2006a. “How Long Before Superintelligence?Linguistic and Philosophical Investiga-
tions 5(1): 11–30.
Bostrom, Nick. 2006b. “Quantity of Experience: Brain-Duplication and Degrees of Consciousness.”
Minds and Machines 16 (2): 185–200.
Bostrom, Nick. 2006c. “What is a Singleton?” Linguistic and Philosophical Investigations 5 (2): 48–54.
Bostrom, Nick. 2007. “Technological Revolutions: Ethics and Policy in the Dark.” In Nanoscale:
Issues and Perspectives for the Nano Century, edited by Nigel M. de S. Cameron and M. Ellen
Mitchell, 129–52. Hoboken, NJ: Wiley.
Bostrom, Nick. 2008a. “Where Are ey? Why I Hope the Search for Extraterrestrial Life Finds
Nothing.” MIT Technology Review, May/June issue, 72–7.
Bostrom, Nick. 2008b. “Why I Want to Be a Posthuman When I Grow Up.” In Medical Enhancement
and Posthumanity, edited by Bert Gordijn and Ruth Chadwick, 107–37. New York: Springer.
Bostrom, Nick. 2008c. “Letter from Utopia.” Studies in Ethics, Law, and Technology 2 (1): 1–7.
Bostrom, Nick. 2009a. “Moral Uncertainty – Towards a Solution?” Overcoming Bias (blog), January1.
Bostrom, Nick. 2009b. “Pascals Mugging.” Analysis 69 (3): 443–5.
Bostrom, Nick. 2009c. “e Future of Humanity.” In New Waves in Philosophy of Technology, edited
by Jan Kyrre Berg Olsen, Evan Selinger, and Søren Riis, 186–215. New York: Palgrave Macmillan.
Bostrom, Nick. 2011a. “Information Hazards: A Typology of Potential Harms from Knowledge.”
Review of Contemporary Philosophy 10: 44–79.
Bostrom, Nick. 2011b. “Innite Ethics.” Analysis and Metaphysics 10: 9–59.
Bostrom, Nick. 2012. “e Superintelligent Will: Motivation and Instrumental Rationality in Ad-
vanced Articial Agents.” In “eory and Philosophy of AI,” edited by Vincent C. Müller, special
issue, Minds and Machines 22 (2): 7185.
Bostrom, Nick, and Ćirković, Milan M. 2003. “e Doomsday Argument and the Self-Indication
Assumption: Reply to Olum.” Philosophical Quarterly 53 (210): 8391.
Bostrom, Nick, and Ord, Toby. 2006. “e Reversal Test: Eliminating the Status Quo Bias in Applied
Ethics.” Ethics 116 (4): 656–79.
Bostrom, Nick, and Roache, Rebecca. 2011. “Smart Policy: Cognitive Enhancement and the Public
Interest.” In Enhancing Human Capacities, edited by Julian Savulescu, Ruud ter Meulen, and Guy
Kahane, 138–49. Malden, MA: Wiley-Blackwell.
Bostrom, Nick and Sandberg, Anders. 2009a. “Cognitive Enhancement: Methods, Ethics, Regula-
tory Challenges.Science and Engineering Ethics 15 (3): 311–41.
Bostrom, Nick and Sandberg, Anders. 2009b. “e Wisdom of Nature: An Evolutionary Heuristic
for Human Enhancement.” In Human Enhancement, 1st ed., edited by Julian Savulescu and Nick
Bostrom, 375–416. New York: Oxford University Press.
308
|
BIBLIOGRAPHY
Bostrom, Nick, Sandberg, Anders, and Douglas, Tom. 2013. “e Unilateralists Curse: e Case for
a Principle of Conformity.” Working Paper. Retrieved February 28, 2013. Available at http://www.
nickbostrom.com/papers/unilateralist.pdf.
Bostrom, Nick, and Yudkowsky, Eliezer. Forthcoming. “e Ethics of Articial Intelligence.” In
Cambridge Handbook of Articial Intelligence, edited by Keith Frankish and William Ramsey.
New York: Cambridge University Press.
Boswell, James. 1917. Boswell’s Life of Johnson. New York: Oxford University Press.
Bouchard, T. J. 2004. “Genetic Inuence on Human Psychological Traits: A Survey.Current Direc-
tions in Psychological Science 13 (4): 148–51.
Bourget, David, and Chalmers, David. 2009. e PhilPapers Surveys.” November. Available at
http://philpapers.org/surveys/.
Bradbury, Robert J. 1999. “Matrioshka Brains.” Archived version. As revised August 16, 2004.
Available at http://web.archive.org/web/20090615040912/http://www.aeiveos.com/~bradbury/
MatrioshkaBrains/MatrioshkaBrainsPaper.html.
Brinton, Crane. 1965. e Anatomy of Revolution. Revised ed. New York: Vintage Books.
Bryson, Arthur E., Jr., and Ho, Yu-Chi. 1969. Applied Optimal Control: Optimization, Estimation,
and Control. Waltham, MA: Blaisdell.
Buehler, Martin, Iagnemma, Karl, and Singh, Sanjiv, eds. 2009. e DARPA Urban Challenge:
Autonomous Vehicles in City Trac. Springer Tracts in Advanced Robotics 56. Berlin: Springer.
Burch-Brown, J. 2014. “Clues for Consequentialists.Utilitas 26 (1): 105–19.
Burke, Colin. 2001. “Agnes Meyer Driscoll vs. the Enigma and the Bombe.” Unpublished manuscript.
Retrieved February 22, 2013. Available at http://userpages.umbc.edu/~burke/driscoll1-2011.pdf.
Canbäck, S., Samouel, P., and Price, D. 2006. “Do Diseconomies of Scale Impact Firm Size and Per-
formance? A eoretical and Empirical Overview.Journal of Managerial Economics 4 (1): 27–70.
Carmena, J. M., Lebedev, M. A., Crist, R. E., O’Doherty, J. E., Santucci, D. M., Dimitrov, D. F., Patil,
P. G., Henriquez, C. S., and Nicolelis, M. A. 2003. “Learning to Control a Brain–Machine Interface
for Reaching and Grasping by Primates.Public Library of Science Biology 1 (2): 193–208.
Carroll, Bradley W., and Ostlie, Dale A. 2007. An Introduction to Modern Astrophysics. 2nd ed. San
Francisco: Pearson Addison Wesley.
Carroll, John B. 1993. Human Cognitive Abilities: A Survey of Factor-Analytic Studies. New York:
Cambridge University Press.
Carter, Brandon. 1983.e Anthropic Principle and its Implications for Biological Evolution.
Philo sophical Transactions of the Royal Society A: Mathematical, Physical and Engineering
Sciences 310 (1512): 347–63.
Carter, Brandon. 1993. “e Anthropic Selection Principle and the Ultra-Darwinian Synthesis.” In
e Anthropic Principle: Proceedings of the Second Venice Conference on Cosmology and Philoso-
phy, edited by F. Bertola and U. Curi, 33–66. Cambridge: Cambridge University Press.
CFTC & SEC (Commodity Futures Trading Commission and Securities & Exchange Commission).
2010. Findings Regarding the Market Events of May 6, 2010: Report of the Stas of the CFTC and SEC
to the Joint Advisory Committee on Emerging Regulatory Issues. Washington, DC.
Chalmers, David John. 2010. “e Singularity: A Philosophical Analysis.Journal of Consciousness
Studies 17 (9–10): 7–65.
Chason, R. J., Csokmay, J., Segars, J. H., DeCherney, A. H., and Armant, D. R. 2011. “Environmental
and Epigenetic Eects Upon Preimplantation Embryo Metabolism and Development.” Trends in
Endocrinology and Metabolism 22 (10): 412–20.
Chen, S., and Ravallion, M. 2010. “e Developing World Is Poorer an We ought, But No Less
Successful in the Fight Against Poverty.Quarterly Journal of Economics 125 (4): 1577–1625.
Chislenko, Alexander. 1996. “Networking in the Mind Age: Some oughts on Evolution of Robotics
and Distributed Systems.” Unpublished manuscript.
Chislenko, Alexander. 1997. “Technology as Extension of Human Functional Architecture.” Extropy
Online.
BIBLIOGRAPHY
|
309
Chorost, Michael. 2005. Rebuilt: How Becoming Part Computer Made Me More Human. Boston:
Houghton Miin.
Christiano, Paul F. 2012. “‘Indirect Normativity’ Write-up.Ordinary Ideas (blog), April 21.
CIA. 2013. “The World Factbook.” Central Intelligence Agency. Retrieved August 3. Avail-
able at https://www.cia.gov/library/publications/the-world-factbook/rankorder/2127rank.html?
countryname=United%20States&countrycode=us&regionCode=noa&rank=121#us.
Cicero. 1923. “On Divination.” In On Old Age, on Friendship, on Divination, translated by W. A.
Falconer. Loeb Classical Library. Cambridge, MA: Harvard University Press.
Cirasella, Jill, and Kopec, Danny. 2006. “e History of Computer Games.” Exhibit at Dartmouth
Articial Intelligence Conference: e Next Fiy Years (AI@50), Dartmouth College, July
13–15.
Ćirković, Milan M. 2004. “Forecast for the Next Eon: Applied Cosmology and the Long-Term Fate of
Intelligent Beings.” Foundations of Physics 34 (2): 23961.
Ćirković, Milan M., Sandberg, Anders, and Bostrom, Nick. 2010. “Anthropic Shadow: Observation
Selection Eects and Human Extinction Risks.” Risk Analysis 30 (10): 1495–1506.
Clark, Andy, and Chalmers, David J. 1998. “e Extended Mind.Analysis 58 (1): 7–19.
Clark, Gregory. 2007. A Farewell to Alms: A Brief Economic History of the World. 1st ed. Princeton,
NJ: Princeton University Press.
Clavin, Whitney. 2012. “Study Shows Our Galaxy Has at Least 100 Billion Planets.Jet Propulsion
Laboratory, January 11.
CME Group. 2010. What Happened on May 6th? Chicago, May 10.
Coase, R. H. 1937. “e Nature of the Firm.” Economica 4 (16): 386405.
Cochran, Gregory, and Harpending, Henry. 2009. e 10,000 Year Explosion: How Civilization
Accelerated Human Evolution. New York: Basic Books.
Cochran, G., Hardy, J., and Harpending, H. 2006. “Natural History of Ashkenazi Intelligence.”
Journal of Biosocial Science 38 (5): 659–93.
Cook, James Gordon. 1984. Handbook of Textile Fibres: Natural Fibres. Cambridge: Woodhead.
Cope, David. 1996. Experiments in Musical Intelligence. Computer Music and Digital Audio Series.
Madison, WI: A-R Editions.
Cotman, Carl W., and Berchtold, Nicole C. 2002. “Exercise: A Behavioral Intervention to Enhance
Brain Health and Plasticity.” Trends in Neurosciences 25 (6): 295–301.
Cowan, Nelson. 2001. “e Magical Number 4 in Short-Term Memory: A Reconsideration of Mental
Storage Capacity.Behavioral and Brain Sciences 24 (1): 87–114.
Crabtree, Steve. 1999. “New Poll Gauges Americans’ General Knowledge Levels.” Gallup News, July6.
Cross, Stephen E., and Walker, Edward. 1994. “Dart: Applying Knowledge Based Planning and
Scheduling to Crisis Action Planning.” In Intelligent Scheduling, edited by Monte Zweben and
Mark Fox, 711–29. San Francisco, CA: Morgan Kaufmann.
Crow, James F. 2000. “e Origins, Patterns and Implications of Human Spontaneous Mutation.”
Nature Reviews Genetics 1 (1): 407.
Cyranoski, David. 2013. “Stem Cells: Egg Engineers.” Nature 500 (7463): 392–4.
Dagnelie, Gislin. 2012. “Retinal Implants: Emergence of a Multidisciplinary Field.” Current Opinion
in Neurology 25 (1): 67–75.
Dai, Wei. 2009. “Towards a New Decision eory.Less Wrong (blog), August 13.
Dalrymple, David. 2011. “Comment on Kaufman, J. ‘Whole Brain Emulation: Looking at Progress
on C. Elegans.’ ” Less Wrong (blog), October 29.
Davies, G., Tenesa, A., Payton, A., Yang, J., Harris, S. E., Liewald, D., Ke, X., et al. 2011. “Genome-
Wide Association Studies Establish at Human Intelligence Is Highly Heritable and Polygenic.
Molecular Psychiatry 16 (10): 996–1005.
Davis, Oliver S. P., Butcher, Lee M., Docherty, Sophia J., Meaburn, Emma L., Curtis, Charles
J. C., Simpson, Michael A., Schalkwyk, Leonard C., and Plomin, Robert. 2010. “A ree-Stage
30
|
BIBLIOGRAPHY
Genome-Wide Association Study of General Cognitive Ability: Hunting the Small Eects.
Behavior Genetics 40 (6): 759–767.
Dawkins, Richard. 1995. River Out of Eden: A Darwinian View of Life. Science Masters Series. New
York: Basic Books.
De Blanc, Peter. 2011. Ontological Crises in Articial Agents’ Value Systems. Machine Intelligence
Research Institute, San Francisco, CA, May 19.
De Long, J. Bradford. 1998. “Estimates of World GDP, One Million B.C.–Present.” Unpublished
manuscript.
De Raedt, Luc, and Flach, Peter, eds. 2001. Machine Learning: ECML 2001: 12th European Confer-
ence on Machine Learning, Freiburg, Germany, September 5–7, 2001. Proceedings. Lecture Notes in
Computer Science 2167. New York: Springer.
Dean, Cornelia. 2005. “Scientic Savvy? In U.S., Not Much.” New York Times, August 30.
Deary, Ian J. 2001. “Human Intelligence Dierences: A Recent History.Trends in Cognitive Sciences
5 (3): 127–30.
Deary, Ian J. 2012. “Intelligence.Annual Review of Psychology 63: 45382.
Deary, Ian J., Penke, L., and Johnson, W. 2010. “e Neuroscience of Human Intelligence Dier-
ences.” Nature Reviews Neuroscience 11 (3): 201–11.
Degnan, G. G., Wind, T. C., Jones, E. V., and Edlich, R. F. 2002. “Functional Electrical Stimulation in
Tetraplegic Patients to Restore Hand Function.Journal of Long-Term Eects of Medical Implants
12 (3): 175–88.
Devlin, B., Daniels, M., and Roeder, K. 1997. “e Heritability of IQ.Nature 388 (6641): 468–71.
Dewey, Daniel. 2011. “Learning What to Value.” In Articial General Intelligence: 4th International
Conference, AGI 2011, Mountain View, CA, USA, August 36, 2011. Proceedings, edited by Jürgen
Schmidhuber, Kristinn R. órisson, and Moshe Looks, 30914. Lecture Notes in Computer Sci-
ence 6830. Berlin: Springer.
Dowe, D. L., and Herndez-Orallo, J. 2012. “IQ Tests Are Not for Machines, Yet.Intelligence 40
(2): 77–81.
Drescher, Gary L. 2006. Good and Real: Demystifying Paradoxes from Physics to Ethics. Bradford
Books. Cambridge, MA: MIT Press.
Drexler, K. Eric. 1986. Engines of Creation. Garden City, NY: Anchor.
Drexler, K. Eric. 1992. Nanosystems: Molecular Machinery, Manufacturing, and Computation. New
York: Wiley.
Drexler, K. Eric. 2013. Radical Abundance: How a Revolution in Nanotechnology Will Change Civili-
zation. New York: PublicAairs.
Driscoll, Kevin. 2012. “Code Critique: ‘Altair Music of a Sort.” Paper presented at Critical Code
Studies Working Group Online Conference, 2012, February 6.
Dyson, Freeman J. 1960. “Search for Articial Stellar Sources of Infrared Radiation.” Science 131
(3414): 1667–1668.
Dyson, Freeman J. 1979. Disturbing the Universe. 1st ed. Sloan Foundation Science Series. New York:
Harper & Row.
Elga, Adam. 2004. “Defeating Dr. Evil with Self-Locating Belief.” Philosophy and Phenomenological
Research 69 (2): 383–96.
Elga, Adam. 2007. “Reection and Disagreement.” Noûs 41 (3): 478–502.
Eliasmith, Chris, Stewart, Terrence C., Choo, Xuan, Bekolay, Trevor, DeWolf, Travis, Tang, Yichuan,
and Rasmussen, Daniel. 2012. “A Large-Scale Model of the Functioning Brain.Science 338(6111):
1202–5.
Ellis, J. H. 1999. “e History of Non-Secret Encryption.Cryptologia 23 (3): 267–73.
Elyasaf, Achiya, Hauptmann, Ami, and Sipper, Moche. 2011. “Ga-Freecell: Evolving Solvers for the
Game of Freecell.” In Proceedings of the 13th Annual Genetic and Evolutionary Computation Con-
ference, 1931–1938. GECCO’ 11. New York: ACM.
BIBLIOGRAPHY
|
3
Eppig, C., Fincher, C. L., and ornhill, R. 2010. “Parasite Prevalence and the Worldwide Distribu-
tion of Cognitive Ability.Proceedings of the Royal Society B: Biological Sciences 277 (1701): 3801–8.
Espenshade, T. J., Guzman, J. C., and Westo, C. F. 2003. “e Surprising Global Variation in Re-
placement Fertility.” Population Research and Policy Review 22 (5–6): 57583.
Evans, omas G. 1964. “A Heuristic Program to Solve Geometric-Analogy Problems.” In Proceedings
of the April 21–23, 1964, Spring Joint Computer Conference, 327–338. AFIPS ’64. New York: ACM.
Evans, omas G. 1968. “A Program for the Solution of a Class of Geometric-Analogy Intelligence-
Test Questions.” In Semantic Information Processing, edited by Marvin Minsky, 271–353.
Cambridge, MA: MIT Press.
Faisal, A. A., Selen, L. P., and Wolpert, D. M. 2008. “Noise in the Nervous System.” Nature Reviews
Neuroscience 9 (4): 292–303.
Faisal, A. A., White, J. A., and Laughlin, S. B. 2005. “Ion-Channel Noise Places Limits on the Mini-
aturization of the Brain’s Wiring.” Current Biology 15 (12): 1143–9.
Feldman, Jacob. 2000. “Minimization of Boolean Complexity in Human Concept Learning.Nature
407 (6804): 630–3.
Feldman, J. A., and Ballard, Dana H. 1982. “Connectionist Models and eir Properties.Cognitive
Science 6 (3): 205–254.
Foley, J. A., Monfreda, C., Ramankutty, N., and Zaks, D. 2007. “Our Share of the Planetary Pie.”
Proceedings of the National Academy of Sciences of the United States of America 104 (31): 12585–6.
Forgas, Joseph P., Cooper, Joel, and Crano, William D., eds. 2010. e Psychology of Attitudes and
Attitude Change. Sydney Symposium of Social Psychology. New York: Psychology Press.
Frank, Robert H. 1999. Luxury Fever: Why Money Fails to Satisfy in an Era of Excess. New York: Free
Press.
Fredriksen, Kaja Bonesmo. 2012. Less Income Inequality and More Growth – Are ey Compatible?:
Part 6. e Distribution of Wealth. Technical report, OECD Economics Department Working
Papers 929. OECD Publishing.
Freitas, Robert A., Jr. 1980. “A Self-Replicating Interstellar Probe.” Journal of the British Interplan-
etary Society 33: 251–64.
Freitas, Robert A., Jr. 2000. “Some Limits to Global Ecophagy by Biovorous Nanoreplicators, with
Public Policy Recommendations.” Foresight Institute. April. Retrieved July 28, 2013. Available at
http://www.foresight.org/nano/Ecophagy.html.
Freitas, Robert A., Jr., and Merkle, Ralph C. 2004. Kinematic Self-Replicating Machines. Georgetown,
TX: Landes Bioscience.
Gaddis, John Lewis. 1982. Strategies of Containment: A Critical Appraisal of Postwar American Na-
tional Security Policy. New York: Oxford University Press.
Gammoned.net. 2012. “Snowie.” Archived version. Retrieved June 30. Available at http://web.
archive.org/web/20070920191840/http://www.gammoned.com/snowie.html.
Gates, Bill. 1975. “Soware Contest Winners Announced.” Computer Notes 1 (2): 1.
Georgie, Michael K. 2007. “Nutrition and the Developing Brain: Nutrient Priorities and Measure-
ment.” American Journal of Clinical Nutrition 85 (2): 614S620S.
Gianaroli, Luca. 2000. “Preimplantation Genetic Diagnosis: Polar Body and Embryo Biopsy.
Supplement, Human Reproduction 15 (4): 69–75.
Gilovich, omas, Grin, Dale, and Kahneman, Daniel, eds. 2002. Heuristics and Biases: e
Psychology of Intuitive Judgment. New York: Cambridge University Press.
Gilster, Paul. 2012. “ESO: Habitable Red Dwarf Planets Abundant.Centauri Dreams (blog),
March29.
Goldstone, Jack A. 1980. “eories of Revolution: e ird Generation.” World Politics 32 (3):
42553.
Goldstone, Jack A. 2001. “Towards a Fourth Generation of Revolutionary eory.Annual Review of
Political Science 4: 13987.
32
|
BIBLIOGRAPHY
Good, Irving John. 1965. “Speculations Concerning the First Ultraintelligent Machine.” In Ad-
vances in Computers, edited by Franz L. Alt and Morris Rubino, 6: 31–88. New York: Academic
Press.
Good, Irving John. 1970. “Some Future Social Repercussions of Computers.” International Journal
of Environmental Studies 1 (1–4): 67–79.
Good, Irving John. 1976. “Book review of ‘e inking Computer: Mind Inside Matter’” In
International Journal of Man-Machine Studies 8: 617–20.
Good, Irving John. 1982. “Ethical Machines.” In Intelligent Systems: Practice and Perspective, edited
by J. E. Hayes, Donald Michie, and Y.-H. Pao, 555–60. Machine Intelligence 10. Chichester: Ellis
Horwood.
Goodman, Nelson. 1954. Fact, Fiction, and Forecast. 1st ed. London: Athlone Press.
Gott, J. R., Juric, M., Schlegel, D., Hoyle, F., Vogeley, M., Tegmark, M., Bahcall, N., and Brinkmann,
J. 2005. “A Map of the Universe.Astrophysical Journal 624 (2): 463–83.
Gottfredson, Linda S. 2002. “G: Highly General and Highly Practical.” In e General Factor of
Intelligence: How General Is It?, edited by Robert J. Sternberg and Elena L. Grigorenko, 331–80.
Mahwah, NJ: Lawrence Erlbaum.
Gould, S. J. 1990. Wonderful Life: e Burgess Shale and the Nature of History. New York: Norton.
Graham, Gordon. 1997. e Shape of the Past: A Philosophical Approach to History. New York:
Oxford University Press.
Gray, C. M., and McCormick, D. A. 1996. “Chattering Cells: Supercial Pyramidal Neurons Con-
tributing to the Generation of Synchronous Oscillations in the Visual Cortex.” Science 274 (5284):
109–13.
Greene, Kate. 2012. “Intels Tiny Wi-Fi Chip Could Have a Big Impact.MIT Technology Review,
September 21.
Guizzo, Erico. 2010. “World Robot Population Reaches 8.6 Million.” IEEE Spectrum, April 14.
Gunn, James E. 1982. Isaac Asimov: e Foundations of Science Fiction. Science-Fiction Writers. New
York: Oxford University Press.
Haberl, Helmut, Erb, Karl-Heinz, and Krausmann, Fridolin. 2013. “Global Human Appropriation of
Net Primary Production (HANPP).Encyclopedia of Earth, September 3.
Haberl, H., Erb, K. H., Krausmann, F., Gaube, V., Bondeau, A., Plutzar, C., Gingrich, S., Lucht, W.,
and Fischer-Kowalski, M. 2007. “Quantifying and Mapping the Human Appropriation of Net Pri-
mary Production in Earths Terrestrial Ecosystems.Proceedings of the National Academy of Sci-
ences of the United States of America 104 (31): 129427.
Hájek, Alan. 2009. “Dutch Book Arguments.” In Anand, Pattanaik, and Puppe 2009, 173–95.
Hall, John Storrs. 2007. Beyond AI: Creating the Conscience of the Machine. Amherst, NY:
Prometheus Books.
Hampson, R. E., Song, D., Chan, R. H., Sweatt, A. J., Riley, M. R., Gerhardt, G. A., Shin, D. C.,
Marmarelis, V. Z., Berger, T. W., and Deadwyler, S. A. 2012. “A Nonlinear Model for Hippocampal
Cognitive Prosthesis: Memory Facilitation by Hippocampal Ensemble Stimulation.IEEE Trans-
actions on Neural Systems and Rehabilitation Engineering 20 (2): 18497.
Hanson, Robin. 1994. “If Uploads Come First: e Crack of a Future Dawn.” Extropy 6 (2).
Hanson, Robin. 1995. “Could Gambling Save Science? Encouraging an Honest Consensus.” Social
Epistemology 9 (1): 3–33.
Hanson, Robin. 1998a. “Burning the Cosmic Commons: Evolutionary Strategies for Interstellar
Colonization.” Unpublished manuscript, July 1. Retrieved April 26, 2012. http://hanson.gmu.edu/
lluniv.pdf.
Hanson, Robin. 1998b. “Economic Growth Given Machine Intelligence.” Unpublished manuscript.
Retrieved May 15, 2013. Available at http://hanson.gmu.edu/aigrow.pdf.
Hanson, Robin. 1998c. “Long-Term Growth as a Sequence of Exponential Modes.” Unpublished
manuscript. Last revised December 2000. Available at http://hanson.gmu.edu/longgrow.pdf.
BIBLIOGRAPHY
|
33
Hanson, Robin. 1998d. “Must Early Life Be Easy? e Rhythm of Major Evolutionary Transitions.”
Unpublished manuscript, September 23. Retrieved August 12, 2012. Available at http://hanson.
gmu.edu/hardstep.pdf.
Hanson, Robin. 2000. “Shall We Vote on Values, But Bet on Beliefs?” Unpublished manuscript, Sep-
tember. Last revised October 2007. Available at http://hanson.gmu.edu/futarchy.pdf.
Hanson, Robin. 2006. “Uncommon Priors Require Origin Disputes.” eory and Decision 61 (4):
319–328.
Hanson, Robin. 2008. “Economics of the Singularity.IEEE Spectrum 45 (6): 45–50.
Hanson, Robin. 2009. “Tiptoe or Dash to Future?Overcoming Bias (blog), December 23.
Hanson, Robin. 2012. “Envisioning the Economy, and Society, of Whole Brain Emulations.” Paper
presented at the AGI Impacts conference 2012.
Hart, Oliver. 2008. “Economica Coase Lecture Reference Points and the eory of the Firm.
Economica 75 (299): 404–11.
Hay, Nicholas James. 2005. “Optimal Agents.” B.Sc. thesis, University of Auckland.
Hedberg, Sara Reese. 2002. “Dart: Revolutionizing Logistics Planning.” IEEE Intelligent Systems
17(3): 81–3.
Helliwell, John, Layard, Richard, and Sachs, Jerey. 2012. World Happiness Report. e Earth
Institute.
Helmstaedter, M., Briggman, K. L., and Denk, W. 2011. “High-Accuracy Neurite Reconstruction for
High-roughput Neuroanatomy.” Nature Neuroscience 14 (8): 1081–8.
Heyl, Jeremy S. 2005. “e Long-Term Future of Space Travel.” Physical Review D 72 (10): 1–4.
Hibbard, Bill. 2011. “Measuring Agent Intelligence via Hierarchies of Environments.” In Articial
General Intelligence: 4th International Conference, AGI 2011, Mountain View, CA, USA, August
3–6, 2011. Proceedings, edited by Jürgen Schmidhuber, Kristinn R. órisson, and Moshe Looks,
303–8. Lecture Notes in Computer Science 6830. Berlin: Springer.
Hinke, R. M., Hu, X., Stillman, A. E., Herkle, H., Salmi, R., and Ugurbil, K. 1993. “Functional
Magnetic Resonance Imaging of Broca’s Area During Internal Speech.” Neuroreport 4 (6):
6758.
Hinxton Group. 2008. Consensus Statement: Science, Ethics and Policy Challenges of Pluripotent
Stem Cell-Derived Gametes. Hinxton, Cambridgeshire, UK, April 11. Available at http://www.
hinxtongroup.org/Consensus_HG08_FINAL.pdf.
Homan, David E. 2009. e Dead Hand: e Untold Story of the Cold War Arms Race and Its Dan-
gerous Legacy. New York: Doubleday.
Hofstadter, Douglas R. (1979) 1999. Gödel, Escher, Bach: An Eternal Golden Braid. New York: Basic
Books.
Holley, Rose. 2009. “How Good Can It Get? Analysing and Improving OCR Accuracy in Large Scale
Historic Newspaper Digitisation Programs.D-Lib Magazine 15 (34).
Horton, Sue, Alderman, Harold, and Rivera, Juan A. 2008. Copenhagen Consensus 2008 Challenge
Paper: Hunger and Malnutrition. Technical report. Copenhagen Consensus Center, May 11.
Howson, Colin, and Urbach, Peter. 1993. Scientic Reasoning: e Bayesian Approach. 2nd ed.
Chicago: Open Court.
Hsu, Stephen. 2012. “Investigating the Genetic Basis for Intelligence and Other Quantitative Traits.”
Lecture given at UC Davis Department of Physics Colloquium, Davis, CA, February 13.
Huebner, Bryce. 2008. “Do You See What We See? An Investigation of an Argument Against
Collective Representation.” Philosophical Psychology 21 (1): 91112.
Hu, C. D., Xing, J., Rogers, A. R., Witherspoon, D., and Jorde, L. B. 2010. “Mobile Elements Reveal
Small Population Size in the Ancient Ancestors of Homo Sapiens.” Proceedings of the National
Academy of Sciences of the United States of America 107 (5): 2147–52.
Human, W. Cary, and Pless, Vera. 2003. Fundamentals of Error-Correcting Codes. New York:
Cambridge University Press.
34
|
BIBLIOGRAPHY
Hunt, Patrick. 2011. “Late Roman Silk: Smuggling and Espionage in the 6th Century CE.” Philolog,
Stanford University (blog), August 2.
Hutter, Marcus. 2001. “Towards a Universal eory of Articial Intelligence Based on Algorithmic
Probability and Sequential Decisions.” In De Raedt and Flach 2001, 226–38.
Hutter, Marcus. 2005. Universal Articial Intelligencet: Sequential Decisions Based On Algorithmic
Probability. Texts in eoretical Computer Science. Berlin: Springer.
Iliadou, A. N., Janson, P. C., and Cnattingius, S. 2011. “Epigenetics and Assisted Reproductive Tech-
nology.” Journal of Internal Medicine 270 (5): 414–20.
Isaksson, Anders. 2007. Productivity and Aggregate Growth: A Global Picture. Technical report
05/2007. Vienna, Austria: UNIDO (United Nations Industrial Development Organization)
Research and Statistics Branch.
Jones, Garret. 2009. “Articial Intelligence and Economic Growth: A Few Finger-Exercises.”
Unpublished manuscript, January. Retrieved November 5, 2012. Available at http://mason.gmu.
edu/~gjonesb/AIandGrowth.
Jones, Vincent C. 1985. Manhattan: e Army and the Atomic Bomb. United States Army in World
War II. Washington, DC: Center of Military History.
Joyce, James M. 1999. e Foundations of Causal Decision eory. Cambridge Studies in Probability,
Induction and Decision eory. New York: Cambridge University Press.
Judd, K. L., Schmedders, K., and Yeltekin, S. 2012. “Optimal Rules for Patent Races.International
Economic Review 53 (1): 23–52.
Kalfoglou, A., Suthers, K., Scott, J., and Hudson, K. 2004. Reproductive Genetic Testing: What Amer-
ica inks. Genetics and Public Policy Center.
Kamm, Frances M. 2007. Intricate Ethics: Rights, Responsibilities, and Permissible Harm. Oxford
Ethics Series. New York: Oxford University Press.
Kandel, Eric R., Schwartz, James H., and Jessell, omas M., eds. 2000. Principles of Neural Science.
4th ed. New York: McGraw-Hill.
Kansa, Eric. 2003. “Social Complexity and Flamboyant Display in Competition: More oughts on
the Fermi Paradox.” Unpublished manuscript, archived version.
Karnofsky, Holden. 2012. “Comment on ‘Reply to Holden on Tool AI.’ Less Wrong (blog), August 1.
Kasparov, Garry. 1996. “e Day at I Sensed a New Kind of Intelligence.Time, March 25, no. 13.
Kaufman, Je. 2011. “Whole Brain Emulation and Nematodes.Je Kaufman’s Blog (blog), Novem-
ber 2.
Keim, G. A., Shazeer, N. M., Littman, M. L., Agarwal, S., Cheves, C. M., Fitzgerald, J., Grosland,
J.,Jiang, F., Pollard, S., and Weinmeister, K. 1999. “Proverb: e Probabilistic Cruciverbalist.” In
Proceedings of the Sixteenth National Conference on Articial Intelligence, 710–17. Menlo Park,
CA: AAAI Press.
Kell, Harrison J., Lubinski, David, and Benbow, Camilla P. 2013. “Who Rises to the Top? Early Indi-
cators.” Psychological Science 24 (5): 648–59.
Keller, Wolfgang. 2004. “International Technology Diusion.” Journal of Economic Literature 42 (3):
75282.
KGS Go Server. 2012. “KGS Game Archives: Games of KGS player zen19.” Retrieved July 22,
2013. Available at http://www.gokgs.com/gameArchives.jsp?user=zen19d&oldAccounts=t&
year=2012&month=3.
Knill, Emanuel, Laamme, Raymond, and Viola, Lorenza. 2000. “eory of Quantum Error Correc-
tion for General Noise.” Physical Review Letters 84 (11): 2525–8.
Koch, K., McLean, J., Segev, R., Freed, M. A., Berry, M. J., Balasubramanian, V., and Sterling, P. 2006.
“How Much the Eye Tells the Brain.” Current Biology 16 (14): 1428–34.
Kong, A., Frigge, M. L., Masson, G., Besenbacher, S., Sulem, P., Magnusson, G., Gudjonsson, S. A.,
Sigurdsson, A., et al. 2012. “Rate of De Novo Mutations and the Importance of Father’s Age to
Disease Risk.” Nature 488: 471–5.
BIBLIOGRAPHY
|
35
Koomey, Jonathan G. 2011. Growth in Data Center Electricity Use 2005 to 2010. Technical report,
08/01/2011. Oakland, CA: Analytics Press.
Koubi, Vally. 1999. “Military Technology Races.” International Organization 53 (3): 537–65.
Koubi, Vally, and Lalman, David. 2007. “Distribution of Power and Military R&D.Journal of eo-
retical Politics 19 (2): 133–52.
Koza, J. R., Keane, M. A., Streeter, M. J., Mydlowec, W., Yu, J., and Lanza, G. 2003. Genetic Program-
ming IV: Routine Human-Competitive Machine Intelligence. 2nd ed. Genetic Programming.
Norwell, MA: Kluwer Academic.
Kremer, Michael. 1993. “Population Growth and Technological Change: One Million B.C. to 1990.”
Quarterly Journal of Economics 108 (3): 681–716.
Kruel, Alexander. 2011. “Interview Series on Risks from AI.Less Wrong Wiki (blog). Retrieved Oct
26, 2013. Available at http://wiki.lesswrong.com/wiki/Interview_series_on_risks_from_AI.
Kruel, Alexander. 2012. “Q&A with Experts on Risks From AI #2.” Less Wrong (blog), January 9.
Krusienski, D. J., and Shih, J. J. 2011. “Control of a Visual Keyboard Using an Electrocorticographic
Brain–Computer Interface.” Neurorehabilitation and Neural Repair 25 (4): 323–31.
Kuhn, omas S. 1962. e Structure of Scientic Revolutions. 1st ed. Chicago: University of Chicago
Press.
Kuipers, Benjamin. 2012. “An Existing, Ecologically-Successful Genus of Collectively Intelligent
Articial Creatures.” Paper presented at the 4th International Conference, ICCCI 2012, Ho Chi
Minh City, Vietnam, November 28–30.
Kurzweil, Ray. 2001. “Response to Stephen Hawking.” Kurzweil Accelerating Intelligence. Septem-
ber 5. Retrieved December 31, 2012. Available at http://www.kurzweilai.net/response-to-stephen-
hawking.
Kurzweil, Ray. 2005. e Singularity Is Near: When Humans Transcend Biology. New York: Viking.
Laont, Jean-Jacques, and Martimort, David. 2002. e eory of Incentives: e Principal-Agent
Model. Princeton, NJ: Princeton University Press.
Lancet, e. 2008. “Iodine Deciency—Way to Go Yet.” e Lancet 372 (9633): 88.
Landauer, omas K. 1986. “How Much Do People Remember? Some Estimates of the Quantity of
Learned Information in Long-Term Memory.Cognitive Science 10 (4): 477–93.
Lebedev, Anastasiya. 2004. “e Man Who Saved the World Finally Recognized.MosNews, May 21.
Lebedev, M. A., and Nicolelis, M. A. 2006. “Brain–Machine Interfaces: Past, Present and Future.
Trends in Neuroscience 29 (9): 53646.
Legg, Shane. 2008. “Machine Super Intelligence.” PhD diss., University of Lugano.
Leigh, E. G., Jr. 2010. “e Group Selection Controversy.” Journal of Evolutionary Biology 23(1): 619.
Lenat, Douglas B. 1982. “Learning Program Helps Win National Fleet Wargame Tournament.
SIGART Newsletter 79: 16–17.
Lenat, Douglas B. 1983. “EURISKO: A Program that Learns New Heuristics and Domain Concepts.
Articial Intelligence 21 (1–2): 61–98.
Lenman, James. 2000. “Consequentialism and Cluelessness.Philosophy & Public Aairs 29 (4):
342–70.
Lerner, Josh. 1997. “An Empirical Exploration of a Technology Race.RAND Journal of Economics
28 (2): 22847.
Leslie, John. 1996. e End of the World: e Science and Ethics of Human Extinction. London:
Routledge.
Lewis, David. 1988. “Desire as Belief.” Mind: A Quarterly Review of Philosophy 97 (387): 323–32.
Li, Ming, and Vitányi, Paul M. B. 2008. An Introduction to Kolmogorov Complexity and Its Applica-
tions. Texts in Computer Science. New York: Springer.
Lin, omas, Mausam, and Etzioni, Oren. 2012. “Entity Linking at Web Scale.” In Proceedings of the
Joint Workshop on Automatic Knowledge Base Construction and Web-scale Knowledge Extraction
36
|
BIBLIOGRAPHY
(AKBC-WEKEX12), edited by James Fan, Raphael Homan, Aditya Kalyanpur, Sebastian Riedel,
Fabian Suchanek, and Partha Pratim Talukdar, 84–88. Madison, WI: Omnipress.
Lloyd, Seth. 2000. “Ultimate Physical Limits to Computation.” Nature 406 (6799): 1047–54.
Louis Harris & Associates. 1969. “Science, Sex, and Morality Survey, study no. 1927.” Life Magazine
(New York) 4.
Lynch, Michael. 2010. “Rate, Molecular Spectrum, and Consequences of Human Mutation.Pro-
ceedings of the National Academy of Sciences of the United States of America 107 (3): 961–8.
Lyons, Mark K. 2011. “Deep Brain Stimulation: Current and Future Clinical Applications.” Mayo
Clinic Proceedings 86 (7): 662–72.
MacAskill, William. 2010. “Moral Uncertainty and Intertheoretic Comparisons of Value.” BPhil
thesis, University of Oxford.
McCarthy, John. 2007. “From Here to Human-Level AI.Articial Intelligence 171 (18): 117482.
McCorduck, Pamela. 1979. Machines Who ink: A Personal Inquiry into the History and Prospects
of Articial Intelligence. San Francisco: W. H. Freeman.
Mack, C. A. 2011. “Fiy Years of Moores Law.IEEE Transactions on Semiconductor Manufacturing
24 (2): 202–7.
MacKay, David J. C. 2003. Information eory, Inference, and Learning Algorithms. New York: Cam-
bridge University Press.
McLean, George, and Stewart, Brian. 1979. “Norad False Alarm Causes Uproar.e National. Aired
November 10. Ottawa, ON: CBC, 2012. News Broadcast.
Maddison, Angus. 1999. “Economic Progress: e Last Half Century in Historical Perspective.” In
Facts and Fancies of Human Development: Annual Symposium and Cunningham Lecture, 1999, ed-
ited by Ian Castles. Occasional Paper Series, 1/2000. Academy of the Social Sciences in Australia.
Maddison, Angus. 2001. e World Economy: A Millennial Perspective. Development Centre Stud-
ies. Paris: Development Centre of the Organisation for Economic Co-operation / Development.
Maddison, Angus. 2005. Growth and Interaction in the World Economy: e Roots of Modernity.
Washington, DC: AEI Press.
Maddison, Angus. 2007. Contours of the World Economy, 1–2030 AD: Essays in Macro-Economic His-
tory. New York: Oxford University Press.
Maddison, Angus. 2010. “Statistics of World Population, GDP and Per Capita GDP 1–2008 AD.”
Retrieved October 26, 2013. Available at http://www.ggdc.net/maddison/Historical_Statistics/
vertical-le_02-2010.xls.
Mai, Q., Yu, Y., Li, T., Wang, L., Chen, M. J., Huang, S. Z., Zhou, C., and Zhou, Q. 2007. “Derivation
of Human Embryonic Stem Cell Lines from Parthenogenetic Blastocysts.” Cell Research 17 (12):
1008–19.
Mak, J. N., and Wolpaw, J. R. 2009. “Clinical Applications of Brain-Computer Interfaces: Current
State and Future Prospects.” IEEE Reviews in Biomedical Engineering 2: 187–99.
Mankiw, N. Gregory. 2009. Macroeconomics. 7th ed. New York, NY: Worth.
Mardis, Elaine R. 2011. “A Decades Perspective on DNA Sequencing Technology.Nature 470 (7333):
198–203.
Marko, John. 2011. “Computer Wins on ‘Jeopardy!’: Trivial, It’s Not.New York Times, February 16.
Markram, Henry. 2006. “e Blue Brain Project.Nature Reviews Neuroscience 7 (2): 153–160.
Mason, Heather. 2003. “Gallup Brain: e Birth of In Vitro Fertilization.” Gallup, August 5.
Menzel, Randolf, and Giurfa, Martin. 2001. “Cognitive Architecture of a Mini-Brain: e Honey-
bee.Trends in Cognitive Sciences 5 (2): 62–71.
Metzinger, omas. 2003. Being No One: e Self-Model eory of Subjectivity. Cambridge, MA:
MIT Press.
Mijic, Roko. 2010. “Bootstrapping Safe AGI Goal Systems.” Paper presented at the Roadmaps to AGI
and the Future of AGI Workshop, Lugano, Switzerland, March 8.
BIBLIOGRAPHY
|
37
Mike, Mike. 2013. “Face of Tomorrow.” Retrieved June 30, 2012 . Available at http://faceoomorrow.org.
Milgrom, Paul, and Roberts, John. 1990. “Bargaining Costs, Inuence Costs, and the Organization
of Economic Activity.” In Perspectives on Positive Political Economy, edited by James E. Alt and
Kenneth A. Shepsle, 57–89. New York: Cambridge University Press.
Miller, George A. 1956. “e Magical Number Seven, Plus or Minus Two: Some Limits on Our
Capacity for Processing Information.Psychological Review 63 (2): 81–97.
Miller, Georey. 2000. e Mating Mind: How Sexual Choice Shaped the Evolution of Human Nature.
New York: Doubleday.
Miller, James D. 2012. Singularity Rising: Surviving and riving in a Smarter, Richer, and More
Dangerous World. Dallas, TX: BenBella Books.
Minsky, Marvin. 1967. Computation: Finite and Innite Machines. Englewood Clis, NJ: Prentice-
Hall.
Minsky, Marvin, ed. 1968. Semantic Information Processing. Cambridge, MA: MIT Press.
Minsky, Marvin. 1984. “Aerword to Vernor Vinge’s novel, ‘True Names.’ ” Unpublished manu-
script, October 1. Retrieved December 31, 2012. Available at http://web.media.mit.edu/~minsky/
papers/TrueNames.Aerword.html.
Minsky, Marvin. 2006. e Emotion Machine: Commonsense inking, Articial Intelligence, and the
Future of the Human Mind. New York: Simon & Schuster.
Minsky, Marvin, and Papert, Seymour. 1969. Perceptrons: An Introduction to Computational
Geometry. 1st ed. Cambridge, MA: MIT Press.
Moore, Andrew. 2011. “Hedonism.” In e Stanford Encyclopedia of Philosophy, Winter 2011, edited
by Edward N. Zalta. Stanford, CA: Stanford University.
Moravec, Hans P. 1976. “e Role of Raw Power in Intelligence.” Unpublished manuscript, May 12.
Retrieved August 12, 2012. Available at http://www.frc.ri.cmu.edu/users/hpm/project.archive/
general.articles/l975/Raw.Power.html.
Moravec, Hans P. 1980. “Obstacle Avoidance and Navigation in the Real World by a Seeing Robot
Rover.” PhD diss., Stanford University.
Moravec, Hans P. 1988. Mind Children: e Future of Robot and Human Intelligence. Cambridge,
MA: Harvard University Press.
Moravec, Hans P. 1998. “When Will Computer Hardware Match the Human Brain?Journal of
Evolution and Technology 1.
Moravec, Hans P. 1999. “Rise of the Robots.Scientic American, December, 124–35.
Muehlhauser, Luke, and Helm, Louie. 2012. “e Singularity and Machine Ethics.” In Singularity
Hypotheses: A Scientic and Philosophical Assessment, edited by Amnon Eden, Johnny Søraker,
James H. Moor, and Eric Steinhart. e Frontiers Collection. Berlin: Springer.
Muehlhauser, Luke, and Salamon, Anna. 2012. “Intelligence Explosion: Evidence and Import.” In
Singularity Hypotheses: A Scientic and Philosophical Assessment, edited by Amnon Eden, Johnny
Søraker, James H. Moor, and Eric Steinhart. e Frontiers Collection. Berlin: Springer.
Müller, Vincent C., and Bostrom, Nick. Forthcoming. “Future Progress in Articial Intelligence:
A Poll Among Experts.” In “Impacts and Risks of Articial General Intelligence,” edited by
Vincent C. Müller, special issue, Journal of Experimental and eoretical Articial Intelligence.
Murphy, Kevin P. 2012. Machine Learning: A Probabilistic Perspective. Adaptive Computation and
Machine Learning. Cambridge, MA: MIT Press.
Nachman, Michael W., and Crowell, Susan L. 2000. “Estimate of the Mutation Rate per Nucleotide
in Humans.Genetics 156 (1): 297–304.
Nagy, Z. P., and Chang, C. C. 2007. “Articial Gametes.eriogenology 67 (1): 99104.
Nagy, Z. P., Kerkis, I., and Chang, C. C. 2008. “Development of Articial Gametes.” Reproductive
BioMedicine Online 16 (4): 53944.
NASA. 2013. “International Space Station: Facts and Figures.” Available at http://www.nasa.gov/
worldbook/intspacestation_worldbook.html.
38
|
BIBLIOGRAPHY
Newborn, Monty. 2011. Beyond Deep Blue: Chess in the Stratosphere. New York: Springer.
Newell, Allen, Shaw, J. C., and Simon, Herbert A. 1958. “Chess-Playing Programs and the Problem of
Complexity.IBM Journal of Research and Development 2 (4): 320–35.
Newell, Allen, Shaw, J. C., and Simon, Herbert A. 1959. “Report on a General Problem-Solving Pro-
gram: Proceedings of the International Conference on Information Processing.” In Information
Processing, 25664. Paris: UNESCO.
Nicolelis, Miguel A. L., and Lebedev, Mikhail A. 2009. “Principles of Neural Ensemble Physiology
Underlying the Operation of Brain–Machine Interfaces.Nature Reviews Neuroscience 10 (7):
53040.
Nilsson, Nils J. 1984. Shakey the Robot, Technical Note 323. Menlo Park, CA: AI Center, SRI Inter-
national, April.
Nilsson, Nils J. 2009. e Quest for Articial Intelligence: A History of Ideas and Achievements. New
York: Cambridge University Press.
Nisbett, R. E., Aronson, J., Blair, C., Dickens, W., Flynn, J., Halpern, D. F., and Turkheimer, E.
2012. “Intelligence: New Findings and eoretical Developments.American Psychologist 67 (2):
130–59.
Niven, Larry. 1973. “e Defenseless Dead.” In Ten Tomorrows, edited by Roger Elwood, 91–142.
New York: Fawcett.
Nordhaus, William D. 2007. “Two Centuries of Productivity Growth in Computing.” Journal of
Economic History 67 (1): 128–59.
Norton, John D. 2011. “Waiting for Landauer.Studies in History and Philosophy of Science Part B:
Studies in History and Philosophy of Modern Physics 42 (3): 184–98.
Olds, James, and Milner, Peter. 1954. “Positive Reinforcement Produced by Electrical Stimulation of
Septal Area and Other Regions of Rat Brain.” Journal of Comparative and Physiological Psychology
47 (6): 419–27.
Olum, Ken D. 2002. “e Doomsday Argument and the Number of Possible Observers.” Philosophi-
cal Quarterly 52 (207): 164–84.
Omohundro, Stephen M. 2007. “e Nature of Self-Improving Articial Intelligence.” Paper pre-
sented at Singularity Summit 2007, San Francisco, CA, September 8–9.
Omohundro, Stephen M. 2008. “e Basic AI Drives.” In Articial General Intelligence 2008: Pro-
ceedings of the First AGI Conference, edited by Pei Wang, Ben Goertzel, and Stan Franklin, 483–92.
Frontiers in Articial Intelligence and Applications 171. Amsterdam: IOS.
Omohundro, Stephen M. 2012. “Rational Articial Intelligence for the Greater Good.” In Singularity
Hypotheses: A Scientic and Philosophical Assessment, edited by Amnon Eden, Johnny Søraker,
James H. Moor, and Eric Steinhart. e Frontiers Collection. Berlin: Springer.
O’Neill, Gerard K. 1974. “e Colonization of Space.” Physics Today 27 (9): 32–40.
Oshima, Hideki, and Katayama, Yoichi. 2010. “Neuroethics of Deep Brain Stimulation for Mental
Disorders: Brain Stimulation Reward in Humans.Neurologia medico-chirurgica 50 (9): 845–52.
Part, Derek. 1986. Reasons and Persons. New York: Oxford University Press.
Part, Derek. 2011. On What Matters. 2 vols. e Berkeley Tanner Lectures. New York: Oxford Uni-
versity Press.
Parrington, Alan J. 1997. “Mutually Assured Destruction Revisited.Airpower Journal 11 (4).
Pasqualotto, Emanuele, Federici, Stefano, and Belardinelli, Marta Olivetti. 2012. “Toward Function-
ing and Usable Brain–Computer Interfaces (BCIs): A Literature Review.Disability and Rehabili-
tation: Assistive Technology 7 (2): 89–103.
Pearl, Judea. 2009. Causality: Models, Reasoning, and Inference. 2nd ed. New York: Cambridge Uni-
versity Press.
Perlmutter, J. S., and Mink, J. W. 2006. “Deep Brain Stimulation.Annual Review of Neuroscience
29: 229–57.
Pinker, Steven. 2011. e Better Angels of Our Nature: Why Violence Has Declined. New York: Viking.
BIBLIOGRAPHY
|
39
Plomin, R., Haworth, C. M., Meaburn, E. L., Price, T. S., Wellcome Trust Case Control Consortium
2, and Davis, O. S. 2013. “Common DNA Markers Can Account for More than Half of the Genetic
Inuence on Cognitive Abilities.” Psychological Science 24 (2): 562–8.
Popper, Nathaniel. 2012. “Flood of Errant Trades Is a Black Eye for Wall Street.New York Times,
August 1.
Pourret, Olivier, Naim, Patrick, and Marcot, Bruce, eds. 2008. Bayesian Networks: A Practical Guide
to Applications. Chichester, West Sussex, UK: Wiley.
Powell, A., Shennan, S., and omas, M. G. 2009. “Late Pleistocene Demography and the Appearance
of Modern Human Behavior.” Science 324 (5932): 1298–1301.
Price, Huw. 1991. “Agency and Probabilistic Causality.” British Journal for the Philosophy of Science
42 (2): 157–76.
Qian, M., Wang, D., Watkins, W. E., Gebski, V., Yan, Y. Q., Li, M., and Chen, Z. P. 2005. “e Eects
of Iodine on Intelligence in Children: A Meta-Analysis of Studies Conducted in China.” Asia
Pacic Journal of Clinical Nutrition 14 (1): 32–42.
Quine, Willard Van Orman, and Ullian, Joseph Silbert. 1978. e Web of Belief, ed. Richard Malin
Ohmann, vol. 2. New York: Random House.
Railton, Peter. 1986. “Facts and Values.” Philosophical Topics 14 (2): 5–31.
Rajab, Moheeb Abu, Zarfoss, Jay, Monrose, Fabian, and Terzis, Andreas. 2006. “A Multifaceted Ap-
proach to Understanding the Botnet Phenomenon.” In Proceedings of the 6th ACM SIGCOMM
Conference on Internet Measurement, 41–52. New York: ACM.
Rawls, John. 1971. A eory of Justice. Cambridge, MA: Belknap.
Read, J. I., and Trentham, Neil. 2005. “e Baryonic Mass Function of Galaxies.Philosophical
Transactions of the Royal Society A: Mathematical, Physical and Engineering Sciences 363 (1837):
2693–710.
Repantis, D., Schlattmann, P., Laisney, O., and Heuser, I. 2010. “Modanil and Methylphenidate for
Neuroenhancement in Healthy Individuals: A Systematic Review.Pharmacological Research 62
(3): 187–206.
Rhodes, Richard. 1986. e Making of the Atomic Bomb. New York: Simon & Schuster.
Rhodes, Richard. 2008. Arsenals of Folly: e Making of the Nuclear Arms Race. New York: Vintage.
Rietveld, Cornelius A., Medland, Sarah E., Derringer, Jaime, Yang, Jian, Esko, Tonu, Martin,
Nicolas W., Westra, Harm-Jan, Shakhbazov, Konstantin, Abdellaoui, Abdel, et al. 2013. “GWAS of
126,559 Individuals Identies Genetic Variants Associated with Educational Attainment.Science
340 (6139): 1467–71.
Ring, Mark, and Orseau, Laurent. 2011. “Delusion, Survival, and Intelligent Agents.” In Articial
General Intelligence: 4th International Conference, AGI 2011, Mountain View, CA, USA, August
36, 2011. Proceedings, edited by Jürgen Schmidhuber, Kristinn R. órisson, and Moshe Looks,
11–20. Lecture Notes in Computer Science 6830. Berlin: Springer.
Ritchie, Graeme, Manurung, Ruli, and Waller, Annalu. 2007. “A Practical Application of Computation-
al Humour.” In Proceedings of the 4th International Joint Workshop on Computational Creativity, ed-
ited by Amilcar Cardoso and Geraint A. Wiggins, 91–8. London: Goldsmiths, University of London.
Roache, Rebecca. 2008. “Ethics, Speculation, and Values.” NanoEthics 2 (3): 317–27.
Robles, J. A., Lineweaver, C. H., Grether, D., Flynn, C., Egan, C. A., Pracy, M. B., Holmberg, J., and
Gardner, E. 2008. “A Comprehensive Comparison of the Sun to Other Stars: Searching for Self-
Selection Eects.” Astrophysical Journal 684 (1): 691–706.
Roe, Anne. 1953. e Making of a Scientist. New York: Dodd, Mead.
Roy, Deb. 2012. “About.” Retrieved October 14. Available at http://web.media.mit.edu/~dkroy/.
Rubin, Jonathan, and Watson, Ian. 2011. “Computer Poker: A Review.Articial Intelligence 175
(56): 95887.
Rumelhart, D. E., Hinton, G. E., and Williams, R. J. 1986. “Learning Representations by Back-
Propagating Errors.Nature 323 (6088): 533–6.
320
|
BIBLIOGRAPHY
Russell, Bertrand. 1986. “e Philosophy of Logical Atomism.” In e Philosophy of Logical Atomism
and Other Essays 1914–1919, edited by John G. Slater, 8: 157–244. e Collected Papers of Bertrand
Russell. Boston: Allen & Unwin.
Russell, Bertrand, and Grin, Nicholas. 2001. e Selected Letters of Bertrand Russell: e Public
Years, 1914–1970. New York: Routledge.
Russell, Stuart J., and Norvig, Peter. 2010. Articial Intelligence: A Modern Approach. 3rd ed. Upper
Saddle River, NJ: Prentice-Hall.
Sabrosky, Curtis W. 1952. “How Many Insects Are ere?” In Insects, edited by United States Depart-
ment of Agriculture, 1–7. Yearbook of Agriculture. Washington, DC: United States Government
Printing Oce.
Salamon, Anna. 2009. “When Soware Goes Mental: Why Articial Minds Mean Fast Endogenous
Growth.” Working Paper, December 27.
Salem, D. J., and Rowan, A. N. 2001. e State of the Animals: 2001. Public Policy Series. Washington,
DC: Humane Society Press.
Salverda, W., Nolan, B., and Smeeding, T. M. 2009. e Oxford Handbook of Economic Inequality.
Oxford: Oxford University Press.
Samuel, A. L. 1959. “Some Studies in Machine Learning Using the Game of Checkers.IBM Journal
of Research and Development 3 (3): 210–19.
Sandberg, Anders. 1999. “e Physics of Information Processing Superobjects: Daily Life Among the
Jupiter Brains.Journal of Evolution and Technology 5.
Sandberg, Anders. 2010. “An Overview of Models of Technological Singularity.” Paper presented at
the Roadmaps to AGI and the Future of AGI Workshop, Lugano, Switzerland, March 8.
Sandberg, Anders. 2013. “Feasibility of Whole Brain Emulation.” In Philosophy and eory of Arti-
cial Intelligence, edited by Vincent C. Müller, 5: 251–64. Studies in Applied Philosophy, Epistemol-
ogy and Rational Ethics. New York: Springer.
Sandberg, Anders, and Bostrom, Nick. 2006. “Converging Cognitive Enhancements.” Annals of the
New York Academy of Sciences 1093: 201–27.
Sandberg, Anders, and Bostrom, Nick. 2008. Whole Brain Emulation: A Roadmap. Technical Report
2008-3. Future of Humanity Institute, University of Oxford.
Sandberg, Anders, and Bostrom, Nick. 2011. Machine Intelligence Survey. Technical Report 2011-1.
Future of Humanity Institute, University of Oxford.
Sandberg, Anders, and Savulescu, Julian. 2011. “e Social and Economic Impacts of Cognitive En-
hancement.” In Enhancing Human Capacities, edited by Julian Savulescu, Ruud ter Meulen, and
Guy Kahane, 92112. Malden, MA: Wiley-Blackwell.
Schaeer, Jonathan. 1997. One Jump Ahead: Challenging Human Supremacy in Checkers. New York:
Springer.
Schaeer, J., Burch, N., Bjornsson, Y., Kishimoto, A., Muller, M., Lake, R., Lu, P., and Sutphen, S.
2007. “Checkers Is Solved.Science 317 (5844): 1518–22.
Schalk, Gerwin. 2008. “Brain–Computer Symbiosis.Journal of Neural Engineering 5 (1): P1–P15.
Schelling, omas C. 1980. e Strategy of Conict. 2nd ed. Cambridge, MA: Harvard University Press.
Schultz, T. R. 2000. “In Search of Ant Ancestors.” Proceedings of the National Academy of Sciences of
the United States of America 97 (26): 14028–9.
Schultz, W., Dayan, P., and Montague, P. R. 1997. “A Neural Substrate of Prediction and Reward.
Science 275 (5306): 1593–9.
Schwartz, Jacob T. 1987. “Limits of Articial Intelligence.” In Encyclopedia of Articial Intelligence,
edited by Stuart C. Shapiro and David Eckroth, 1: 488–503. New York: Wiley.
Schwitzgebel, Eric. 2013. “If Materialism is True, the United States is Probably Conscious.” Working
Paper, February 8.
Sen, Amartya, and Williams, Bernard, eds. 1982. Utilitarianism and Beyond. New York: Cambridge
University Press.
BIBLIOGRAPHY
|
32
Shanahan, Murray. 2010. Embodiment and the Inner Life: Cognition and Consciousness in the Space
of Possible Minds. New York: Oxford University Press.
Shannon, Robert V. 2012. “Advances in Auditory Prostheses.” Current Opinion in Neurology 25 (1):
61–6.
Shapiro, Stuart C. 1992. “Articial Intelligence.” In Encyclopedia of Articial Intelligence, 2nd ed., 1:
54–7. New York: Wiley.
Sheppard, Brian. 2002. “World-Championship-Caliber Scrabble.” Articial Intelligence 134 (1–2):
241–75.
Shoemaker, Sydney. 1969. “Time Without Change.” Journal of Philosophy 66 (12): 363–81.
Shulman, Carl. 2010a. Omohundro’s “Basic AI Drives” and Catastrophic Risks. San Francisco, CA:
Machine Intelligence Research Institute.
Shulman, Carl. 2010b. Whole Brain Emulation and the Evolution of Superorganisms. San Francisco,
CA: Machine Intelligence Research Institute.
Shulman, Carl. 2012. “Could We Use Untrustworthy Human Brain Emulations to Make Trust-
worthy Ones?” Paper presented at the AGI Impacts conference 2012.
Shulman, Carl, and Bostrom, Nick. 2012. “How Hard is Articial Intelligence? Evolutionary Argu-
ments and Selection Eects.” Journal of Consciousness Studies 19 (7–8): 103–30.
Shulman, Carl, and Bostrom, Nick. 2014. “Embryo Selection for Cognitive Enhancement: Curiosity
or Game-Changer?Global Policy 5 (1): 85–92.
Shulman, Carl, Jonsson, Henrik, and Tarleton, Nick. 2009. “Which Consequentialism? Machine
Ethics and Moral Divergence.” In AP-CAP 2009: e Fih Asia-Pacic Computing and Philosophy
Conference, October 1st-2nd, University of Tokyo, Japan. Proceedings, edited by Carson Reynolds
and Alvaro Cassinelli, 23–25. AP-CAP 2009.
Sidgwick, Henry, and Jones, Emily Elizabeth Constance. 2010. e Methods of Ethics. Charleston,
SC: Nabu Press.
Silver, Albert. 2006. “How Strong Is GNU Backgammon?” Backgammon Galore! September 16.
Retrieved October 26, 2013. Available at http://www.bkgm.com/gnu/AllAboutGNU.html#how_
strong_is_gnu.
Simeral, J. D., Kim, S. P., Black, M. J., Donoghue, J. P., and Hochberg, L. R. 2011. “Neural Control
of Cursor Trajectory and Click by a Human with Tetraplegia 1000 Days aer Implant of an
Intracortical Microelectrode Array.Journal of Neural Engineering 8 (2): 025027.
Simester, Duncan, and Knez, Marc. 2002. “Direct and Indirect Bargaining Costs and the Scope of the
Firm.” Journal of Business 75 (2): 283–304.
Simon, Herbert Alexander. 1965. e Shape of Automation for Men and Management. New York:
Harper & Row.
Sinhababu, Neil. 2009. “e Humean eory of Motivation Reformulated and Defended.” Philo-
sophical Review 118 (4): 465–500.
Slagle, James R. 1963. “A Heuristic Program at Solves Symbolic Integration Problems in Freshman
Calculus.Journal of the ACM 10 (4): 507–20.
Smeding, H. M., Speelman, J. D., Koning-Haanstra, M., Schuurman, P. R., Nijssen, P., van Laar, T.,
and Schmand, B. 2006. “Neuropsychological Eects of Bilateral STN Stimulation in Parkinson
Disease: A Controlled Study.Neurology 66 (12): 18306.
Smith, Michael. 1987. “e Humean eory of Motivation.” Mind: A Quarterly Review of Philosophy
96 (381): 3661.
Smith, Michael, Lewis, David, and Johnston, Mark. 1989. “Dispositional eories of Value.” Proceed-
ings of the Aristotelian Society 63: 89174.
Sparrow, Robert. 2013. “In Vitro Eugenics.” Journal of Medical Ethics. doi:10.1136/medethics-
2012-101200. Published online April 4, 2013. Available at http://jme.bmj.com/content/early/
2013/02/13/medethics-2012-101200.full.
Stansberry, Matt, and Kudritzki, Julian. 2012. Uptime Institute 2012 Data Center Industry Survey.
Uptime Institute.
322
|
BIBLIOGRAPHY
Stapledon, Olaf. 1937. Star Maker. London: Methuen.
Steriade, M., Timofeev, I., Durmuller, N., and Grenier, F. 1998. “Dynamic Properties of Cortico-
thalamic Neurons and Local Cortical Interneurons Generating Fast Rhythmic (30–40 Hz) Spike
Bursts.” Journal of Neurophysiology 79 (1): 483–90.
Stewart, P. W., Lonky, E., Reihman, J., Pagano, J., Gump, B. B., and Darvill, T. 2008. “e Relation-
ship Between Prenatal PCB Exposure and Intelligence (IQ) in 9-Year-Old Children.” Environmen-
tal Health Perspectives 116 (10): 1416–22.
Sun, W., Yu, H., Shen, Y., Banno, Y., Xiang, Z., and Zhang, Z. 2012. “Phylogeny and Evolutionary
History of the Silkworm.Science China Life Sciences 55 (6): 483–96.
Sundet, J., Barlaug, D., and Torjussen, T. 2004. “e End of the Flynn Eect? A Study of Secular
Trends in Mean Intelligence Scores of Norwegian Conscripts During Half a Century.Intelligence
32 (4): 349–62.
Sutton, Richard S., and Barto, Andrew G. 1998. Reinforcement Learning: An Introduction. Adaptive
Computation and Machine Learning. Cambridge, MA: MIT Press.
Talukdar, D., Sudhir, K., and Ainslie, A. 2002. “Investigating New Product Diusion Across Prod-
ucts and Countries.” Marketing Science 21 (1): 97–114.
Teasdale, omas W., and Owen, David R. 2008. “Secular Declines in Cognitive Test Scores: A Re-
versal of the Flynn Eect.” Intelligence 36 (2): 121–6.
Tegmark, Max, and Bostrom, Nick. 2005. “Is a Doomsday Catastrophe Likely?Nature 438: 754.
Teitelman, Warren. 1966. “Pilot: A Step Towards Man–Computer Symbiosis.” PhD diss., Massachu-
setts Institute of Technology.
Temple, Robert K. G. 1986. e Genius of China: 3000 Years of Science, Discovery, and Invention. 1st
ed. New York: Simon & Schuster.
Tesauro, Gerald. 1995. “Temporal Dierence Learning and TD-Gammon.” Communications of the
ACM 38 (3): 58–68.
Tetlock, Philip E. 2005. Expert Political Judgment: How Good is it? How Can We Know? Princeton,
NJ: Princeton University Press.
Tetlock, Philip E., and Belkin, Aaron. 1996. “Counterfactual ought Experiments in World
Politics: Logical, Methodological, and Psychological Perspectives.” In Counterfactual ought
Experiments in World Politics: Logical, Methodological, and Psychological Perspectives, edited by
Philip E. Tetlock and Aaron Belkin, 1–38. Princeton, NJ: Princeton University Press.
ompson, Adrian. 1997. “Articial Evolution in the Physical World.” In Evolutionary Robotics:
From Intelligent Robots to Articial Life, edited by Takashi Gomi, 101–25. Er ’97. Carp, ON:
Applied AI Systems.
run, S., Montemerlo, M., Dahlkamp, H., Stavens, D., Aron, A., Diebel, J., Fong, P., et al. 2006.
“Stanley: e Robot at Won the DARPA Grand Challenge.” Journal of Field Robotics 23 (9):
661–92.
Trachtenberg, J. T., Chen, B. E., Knott, G. W., Feng, G., Sanes, J. R., Welker, E., and Svoboda, K. 2002.
“Long-Term In Vivo Imaging of Experience-Dependent Synaptic Plasticity in Adult Cortex.
Nature 420 (6917): 788–94.
Traub, Wesley A. 2012. “Terrestrial, Habitable-Zone Exoplanet Frequency from Kepler.” Astro-
physical Journal 745 (1): 1–10.
Truman, James W., Taylor, Barbara J., and Awad, Timothy A. 1993. “Formation of the Adult
Nervous System.” In e Development of Drosophila Melanogaster, edited by Michael Bate and
Alfonso Martinez Arias. Plainview, NY: Cold Spring Harbor Laboratory.
Tuomi, Ilkka. 2002. “e Lives and the Death of Moore’s Law.” First Monday 7 (11).
Turing, A. M. 1950. “Computing Machinery and Intelligence.” Mind 59 (236): 433–60.
Turkheimer, Eric, Haley, Andreana, Waldron, Mary, D’Onofrio, Brian, and Gottesman, Irving I.
2003. “Socioeconomic Status Modies Heritability of IQ in Young Children.Psychological Sci-
ence 14 (6): 6238.
BIBLIOGRAPHY
|
323
Uauy, Ricardo, and Dangour, Alan D. 2006. “Nutrition in Brain Development and Aging: Role of
Essential Fatty Acids.” Supplement, Nutrition Reviews 64 (5): S24S33.
Ulam, Stanislaw M. 1958. “John von Neumann.” Bulletin of the American Mathematical Society
64(3): 149.
Uncertain Future, e. 2012. “Frequently Asked Questions” e Uncertain Future. Retrieved
March25, 2012. Available at http://www.theuncertainfuture.com/faq.html.
U.S. Congress, Oce of Technology Assessment. 1995. U.S.–Russian Cooperation in Space ISS-618.
Washington, DC: U.S. Government Printing Oce, April.
Van Zanden, Jan Luiten. 2003. On Global Economic History: A Personal View on an Agenda for Future
Research. International Institute for Social History, July 23.
Vardi, Moshe Y. 2012. “Articial Intelligence: Past and Future.” Communications of the ACM 55 (1): 5.
Vassar, Michael, and Freitas, Robert A., Jr. 2006. “Lifeboat Foundation Nanoshield.” Lifeboat Foun-
dation. Retrieved May 12, 2012. Available at http://lifeboat.com/ex/nanoshield.
Vinge, Vernor. 1993. “e Coming Technological Singularity: How to Survive in the Post-Human
Era.” In Vision-21: Interdisciplinary Science and Engineering in the Era of Cyberspace, 11–22. NASA
Conference Publication 10129. NASA Lewis Research Center.
Visscher, P. M., Hill, W. G., and Wray, N. R. 2008. “Heritability in the Genomics Era: Concepts and
Misconceptions.” Nature Reviews Genetics 9 (4): 255–66.
Vollenweider, Franz, Gamma, Alex, Liechti, Matthias, and Huber, eo. 1998. “Psychological and
Cardiovascular Eects and Short-Term Sequelae of MDMA (‘Ecstasy’) in MDMA-Naïve Healthy
Volunteers. Neuropsychopharmachology 19 (4): 241–51.
Wade, Michael J. 1976. “Group Selections Among Laboratory Populations of Tribolium.” Proceed-
ings of the National Academy of Sciences of the United States of America 73 (12): 4604–7.
Wainwright, Martin J., and Jordan, Michael I. 2008. “Graphical Models, Exponential Families, and
Variational Inference.Foundations and Trends in Machine Learning 1 (1–2): 1–305.
Walker, Mark. 2002. “Prolegomena to Any Future Philosophy.Journal of Evolution and Technology
10 (1).
Walsh, Nick Paton. 2001. “Alter our DNA or robots will take over, warns Hawking.e Observer,
September 1. http://www.theguardian.com/uk/2001/sep/02/medicalscience.genetics.
Warwick, Kevin. 2002. I, Cyborg. London: Century.
Wehner, M., Oliker, L., and Shalf, J. 2008. “Towards Ultra-High Resolution Models of Climate and
Weather.” International Journal of High Performance Computing Applications 22 (2): 14965.
Weizenbaum, Joseph. 1966. “Eliza: A Computer Program for the Study of Natural Language Com-
munication Between Man And Machine.” Communications of the ACM 9 (1): 3645.
Weizenbaum, Joseph. 1976. Computer Power and Human Reason: From Judgment to Calculation. San
FrancYork, CA: W. H. Freeman.
Werbos, Paul John. 1994. e Roots of Backpropagation: From Ordered Derivatives to Neural Net-
works and Political Forecasting. New York: Wiley.
White, J. G., Southgate, E., omson, J. N., and Brenner, S. 1986. “e Structure of the Nervous Sys-
tem of the Nematode Caenorhabditis Elegans.” Philosophical Transactions of the Royal Society of
London. Series B, Biological Sciences 314 (1165): 1–340.
Whitehead, Hal. 2003. Sperm Whales: Social Evolution in the Ocean. Chicago: University of Chicago
Press.
Whitman, William B., Coleman, David C., and Wiebe, William J. 1998. “Prokaryotes: e Unseen
Majority.” Proceedings of the National Academy of Sciences of the United States of America 95 (12):
657883.
Wiener, Norbert. 1960. “Some Moral and Technical Consequences of Automation.” Science 131
(3410): 1355–8.
Wikipedia. 2012a, s.v. “Computer Bridge.” Retrieved June 30, 2013. Available at http://en.wikipedia.
org/wiki/Computer_bridge.
324
|
BIBLIOGRAPHY
Wikipedia. 2012b, s.v. “Supercomputer.” Retrieved June 30, 2013. Available at http://et.wikipedia.org/
wiki/Superarvuti.
Williams, George C. 1966. Adaptation and Natural Selection: A Critique of Some Current Evolution-
ary ought. Princeton Science Library. Princeton, NJ: Princeton University Press.
Winograd, Terry. 1972. Understanding Natural Language. New York: Academic Press.
Wood, Nigel. 2007. Chinese Glazes: eir Origins, Chemistry and Re-creation. London: A. & C. Black.
World Bank. 2008. Global Economic Prospects: Technology Diusion in the Developing World 42097.
Washington, DC.
World Robotics. 2011. Executive Summary of 1. World Robotics 2011 Industrial Robots; 2. World
Robotics 2011 Service Robots. Retrieved June 30, 2012. Available at http://www.bara.org.uk/
pdf/2012/world-robotics/Executive_Summary_WR_2012.pdf.
World Values Survey. 2008. WVS 2005-2008. Retrieved 29 October, 2013. Available at http://www.
wvsevsdb.com/wvs/WVSAnalizeStudy.jsp.
Wright, Robert. 2001. Nonzero: e Logic of Human Destiny. New York: Vintage.
Yaeger, Larry. 1994. “Computational Genetics, Physiology, Metabolism, Neural Systems, Learning,
Vision, and Behavior or PolyWorld: Life in a New Context.” In Proceedings of the Articial
LifeIII Conference, edited by C. G. Langton, 263–98. Santa Fe Institute Studies in the Sciences of
Complexity. Reading, MA: Addison-Wesley.
Yudkowsky, Eliezer. 2001. Creating Friendly AI 1.0: e Analysis and Design of Benevolent Goal
Architectures. Machine Intelligence Research Institute, San Francisco, CA, June 15.
Yudkowsky, Eliezer. 2002. “e AI-Box Experiment.” Retrieved January 15, 2012. Available at http://
yudkowsky.net/singularity/aibox.
Yudkowsky, Eliezer. 2004. Coherent Extrapolated Volition. Machine Intelligence Research Institute,
San Francisco, CA, May.
Yudkowsky, Eliezer. 2007. “Levels of Organization in General Intelligence.” In Articial General
Intelligence, edited by Ben Goertzel and Cassio Pennachin, 389–501. Cognitive Technologies.
Berlin: Springer.
Yudkowsky, Eliezer. 2008a. “Articial Intelligence as a Positive and Negative Factor in Global Risk.
In Global Catastrophic Risks, edited by Nick Bostrom and Milan M. Ćirkov, 308–45. New York:
Oxford University Press.
Yudkowsky, Eliezer. 2008b. “Sustained Strong Recursion.Less Wrong (blog), December 5.
Yudkowsky, Eliezer. 2010. Timeless Decision eory. Machine Intelligence Research Institute,
San Francisco, CA.
Yudkowsky, Eliezer. 2011. Complex Value Systems are Required to Realize Valuable Futures.
SanFrancisco, CA: Machine Intelligence Research Institute.
Yudkowsky, Eliezer. 2013. Intelligence Explosion Microeconomics, Technical Report 2013–1. Berkeley,
CA: Machine Intelligence Research Institute.
Zahavi, Amotz, and Zahavi, Avishag. 1997. e Handicap Principle: A Missing Piece of Darwin’s
Puzzle. Translated by N. Zahavi-Ely and M. P. Ely. New York: Oxford University Press.
Zalasiewicz, J., Williams, M., Smith, A., Barry, T. L., Coe, A. L., Bown, P. R., Brenchley, P., et al. 2008.
Are We Now Living in the Anthropocene?GSA Today 18 (2): 48.
Zeira, Joseph. 2011. “Innovations, Patent Races and Endogenous Growth.Journal of Economic
Growth 16 (2): 135–56.
Zuleta, Hernando. 2008. “An Empirical Note on Factor Shares.” Journal of International Trade and
Economic Development 17 (3): 379–90.
INDEX
|

A
Afghan Taliban 215
Agricultural Revolution 2, 80,
261
AI-complete problem 14, 47, 71,
93, 145, 186
AI-OUM, see optimality notions
AI-RL, see optimality notions
AI-VL, see optimality notions
algorithmic soup 172
algorithmic trading 16–17
anthropics 27–28, 126, 134135,
174, 222–225
denition 225
Arendt, Hannah 105
Armstrong, Stuart 280, 291, 294,
302
articial agent 10, 88, 105109,
172–176, 185–206; see also
Bayesian agent
articial intelligence
arms race 64, 88, 247
future of 19, 292
greater-than-human, see
superintelligence
history of 518
overprediction of 4
pioneers 4–5, 18
Asimov, Isaac 139
augmentation 142–143, 201–203
autism 57
automata theory 5
automatic circuit breaker 17
automation 17, 98, 117, 160176
B
backgammon 12
backpropagation algorithm 8
bargaining costs 182
Bayesian agent 911, 123, 130;
see also articial agent and
optimality notions
Bayesian networks 9
Berliner, Hans 12
biological cognition 22, 3648,
50–51, 232
biological enhancement 3648,
50–51, 142–143, 232; see also
cognitive enhancement
boxing 129–131, 143, 156157
informational 130
physical 129–130
brain implant, see cyborg
brain plasticity 48
brain–computer interfaces
44–48, 51, 83, 142–143; see
also cyborg
Brown, Louise 43
C
C. elegans 34–35, 266, 267
capability control 129144,
156–157
capital 39, 48, 68, 84–88, 99,
113114, 159–184, 251, 287,
288, 289
causal validity semantics 197
CEV, see coherent extrapolated
volition
Chalmers, David 24, 265, 283,
295, 302
character recognition 15
checkers 12
chess 11–22, 52, 93, 134, 263, 264
child machine 23, 29; see also
seed AI
CHINOOK 12
Christiano, Paul 198, 207
civilization baseline 63
cloning 42
cognitive enhancement 42–51,
67, 94, 111–112, 193, 204,
232–238, 244, 259
coherent extrapolated volition
(CEV) 198, 211–227, 296,
298, 303
denition 211
collaboration (benets of) 249
collective intelligence 48–51,
52–57, 67, 72, 142, 163, 203,
259, 271, 273, 279
collective superintelligence 39,
4849, 52–59, 83, 93, 99, 285
denition 54
combinatorial explosion 6, 9, 10,
47, 155
Common Good
Principle 254259
common sense 14
computer vision 9
computing power 7–9, 24, 25–35,
47, 5360, 6877, 101, 134,
155, 198, 240–244, 251, 286,
288; see also computronium
and hardware overhang
computronium 101, 123–124,
140, 193, 219; see also
computing power
connectionism 8
consciousness 22, 106, 126, 139,
173–176, 216, 226, 271, 282,
288, 292, 303; see also mind
crime
control methods 127–144,
145–158, 202, 236–238, 286;
see also capability control
and motivation selection
index

|
INDEX
genie AI 148158, 285
denition 148
genotyping 37
germline interventions 3744,
67, 273; see also embryo
selection
Ginsberg, Matt 12
Go (game) 13
goal-content 109–110, 146, 207,
222–227
Good Old-Fashioned Articial
Intelligence (GOFAI) 7–15, 23
Good, I. J. 4
Gorbachev, Mikhail 86
graceful degradation 7
graphical models 11
growth 1–7, 48–55, 69–75, 83,
163, 261, 281
H
Hail Mary approach 198–200,
207, 293, 294
Hanson, Robin 2, 160, 261, 270,
271, 287
hardware overhang 73, 240–243,
274, 289, 301, 302
hedonism 140, 210
hedonium 140, 219
Helsinki Declaration 188
heuristic search 6
Hill, Benny 105
hippocampus 47
HLMI, see machine intelligence,
human-level
Hodgkin–Huxley model 25
human baseline 62–77, 82
human extinction, see existential
risk
Human Genome Project 86, 253,
276
human intelligence 5–14, 24–58,
98–99, 159–184, 242–254,
255–257
human–machine interface, see
cyborg
I
impersonal perspective 228–246
in vitro fertilization, see embryo
selection
incentive methods 131–143
cryptographic reward
tokens 133
E
economic growth 3, 160–166,
179, 261, 274, 299
Einstein, Albert 56, 70, 85
ELIZA (program) 6
embryo selection 36–44, 67, 268
emulation modulation 207
Enigma code 87
environment of evolutionary
adaptedness 164, 171
epistemology 222–224
equation solvers 15
eugenics 3644, 268, 279
Eurisko 12
evolution 8–9, 23–27, 44, 154,
173–176, 187, 198, 207, 265,
266, 267, 273
evolutionary selection 187, 207, 290
evolvable hardware 154
exhaustive search 6
existential risk 4, 21, 55, 100104,
115126, 175, 183, 230–236,
239254, 256–259, 286,
301–302
state risks 233–234
step risks 233
expert system 7
explicit representation 207
exponential growth, see growth
external reference semantics 197
F
face recognition 15
failure modes 117–120
Faraday cage 130
Fields Medal 255–256, 272
Fih-Generation Computer
Systems Project 7
tness function 25; see also
evolution
Flash Crash (2010) 16–17
formal language 7, 145
FreeCell (game) 13
G
game theory 87, 159
game-playing AI 12–14
General Problem Solver 6
genetic algorithms 7–13, 24–27,
237–240; see also evolution
genetic selection 37–50, 61,
232–238; see also evolution
Copernicus, Nicolaus 14
cosmic endowment 101104, 115,
134, 209, 214–217, 227, 250,
260, 283, 296
crosswords (solving) 12
cryptographic reward tokens 134,
276
cryptography 80
cyborg 4448, 67, 270
D
DARPA, see Defense Advanced
Research Projects Agency
DART (tool) 15
Dartmouth Summer Project 5
data mining 15–16, 232, 301
decision support systems 15, 98;
see also tool-AI
decision theory 10–11, 88,
185–186, 221–227, 280,
298; see also optimality
notions
decisive strategic advantage
78–89, 95, 104112,
115–126, 129–138, 148–149,
156159, 177, 190, 209–214,
225, 252
Deep Blue 12
Deep Fritz 22
Defense Advanced Research
Projects Agency
(DARPA) 15
design eort, see optimization
power
Dewey, Daniel 291
Dierential Technological
Development (Principle
of) 230237
Die–Hellman key exchange
protocol 80
diminishing returns 37–38, 66,
88, 114, 273, 303
direct reach 58
direct specication 139–143
DNA synthesis 39, 98
Do What I Mean
(DWIM) 220–221
domesticity 140–143, 146156,
187, 191, 207, 222
Drexler, Eric 239, 270, 276, 278,
300
drones 15, 98
Dutch book 111
Dyson, Freeman 101, 278
INDEX
|

O
observation selection theory, see
anthropics
Oliphant, Mark 85
O’Neill, Gerard 101
ontological crisis 146, 197
optimality notions 10, 186, 194,
291–293
Bayesian agent 911
value learner (AI-VL) 194
observation-utility-maximizer
(AI-OUM) 194
reinforcement learner
(AI-RL) 194
optimization power 24, 62–75,
83, 92–96, 227, 274
denition 65
oracle AI 141–158, 222–226, 285,
286
denition 146
orthogonality thesis 105109,
115, 279, 280
P
paperclip AI 107–108, 123125,
132–135, 153, 212, 243
Part, Derek 279
Pascal’s mugging 223, 298
Pascal’s wager 223
person-aecting perspective 228,
245–246, 301
perverse instantiation 120124,
153, 190–196
poker 13
principal–agent problem 127–128,
184
Principle of Epistemic
Deference 211, 221
Proverb (program) 12
Q
qualia, see consciousness
quality superintelligence 51–58,
72, 243, 272
denition 56
R
race dynamic, see technology race
rate of growth, see growth
ratication 222–225
Rawls, John 150
machine translation 15
macro-structural development
accelerator 233235
malignant failure 123–126, 149,
196
Malthusian condition 163–165, 252
Manhattan Project 75, 80–87, 276
McCarthy, John 518
McCulloch–Pitts neuron 237
MegaEarth 56
memory capacity 7–9, 60, 71
memory sharing 61
Mill, John Stuart 210
mind crime 125–126, 153,
201–208, 213, 226, 297
Minsky, Marvin 18, 261, 262, 282
Monte Carlo method 9–13
Moore’s law 2425, 73–77, 274,
286; see also computing
power
moral growth 214
moral permissibility (MP)
218–220, 297
moral rightness (MR) 217–220.
296, 297
moral status 125–126, 166–169,
173, 202–205, 268, 288, 296
Moravec, Hans 24, 265, 288
motivation selection 29, 127–129,
138–144, 147, 158, 168,
180191, 222
denition 138
motivational scaolding 191, 207
multipolar scenarios 90, 132,
159–184, 243–254, 301
mutational load 41
N
nanotechnology 53, 94–98, 103,
113, 177, 231, 239, 276, 277,
299, 300
natural language 14
neural networks 5–9, 28, 46, 173,
237, 262, 274
neurocomputational modeling
25–30, 35, 61, 301; see also
whole brain emulation
(WBE) and neuromorphic AI
neuromorphic AI 28, 34, 47,
237–245, 267, 300, 301
Newton, Isaac 56
Nilsson, Nils 18–20, 264
nootropics 3644, 66–67, 201, 267
Norvig, Peter 19, 264, 282
social integration 131–132,
156–158, 159, 202, 283
indirect normativity 141–150,
209–227, 262, 298
indirect reach 58
inductive bias 9–10
Industrial Revolution 2, 80,
161–163
infrastructure profusion 123–125,
153, 187, 226, 282
institution design 202–208
instrumental convergence
thesis 105–116
intelligence explosion 2–5, 22–51,
62–77, 78–90, 95104, 108,
115126, 127–128, 136, 151,
160, 165, 178, 198, 205, 227,
228–254, 256–260, 274, 276,
282, 284, 289, 300, 301–302
Internet 16, 45–49, 67–77, 85,
94–98, 130, 146, 241, 271
inventory control systems 16
IVF, see embryo selection
J
Jeopardy! 13
K
Kasparov, Garry 12
Kepler, Johannes 14
Knuth, Donald 14, 264
Kurzweil, Ray 2, 261, 269
L
Lenat, Douglas 12, 263
Logic eorist (system) 6
logicist paradigm, see Good
Old-Fashioned Articial
Intelligence (GOFAI)
Logistello 12
M
machine intelligence; see also
articial intelligence
human-level (HLMI) 4, 1921,
27–35, 73–74, 207, 243,
264, 267
revolution, see intelligence
explosion
machine learning 8–18, 28, 121,
152, 188, 274, 290

|
INDEX
U
unemployment 65, 159180, 287
United Nations 87–89, 252–253
universal accelerator 233
unmanned vehicle, see drone
uploading, see whole brain
emulation (WBE)
utility function 10–11, 88, 100,
110, 119, 124–125, 133–134,
172, 185–187, 192208, 290,
292, 293, 303
V
value learning 191–198, 208, 293
value-accretion 189–190, 207
value-loading 185208, 293, 294
veil of ignorance 150, 156, 253,
285
Vinge, Vernor 2, 49, 270
virtual reality 30, 31, 53, 113, 166,
171, 198, 204, 300
von Neumann probe 100–101,
113
von Neumann, John 44, 87, 114,
261, 277, 281
W
wages 65, 69, 160–169
Watson (IBM) 13, 71
WBE, see whole brain emulation
(WBE)
Whitehead, Alfred N. 6
whole brain emulation
(WBE) 28–36, 50, 60, 68–73,
77, 8485, 108, 172, 198,
201–202, 236–245, 252, 266,
267, 274, 299, 300, 301
Wigner, Eugene 85
windfall clause 254, 303
Winston, Patrick 18
wire-heading 122–123, 133, 189,
194, 207, 282, 291
wise-singleton sustainability
threshold 100–104, 279
world economy 2–3, 63, 74, 83,
159–184, 274, 277, 285
Y
Yudkowsky, Eliezer 70, 92, 98,
106, 197, 211–216, 266, 273,
282, 286, 291, 299
social signaling 110
somatic gene therapy 42
sovereign AI 148–158, 187, 226,
285
speech recognition 1516, 46
speed superintelligence 52–58,
75, 270, 271
denition 53
Strategic Defense Initiative (“Star
Wars) 86
strong AI 18
stunting 135–137, 143
sub-symbolic processing, see
connectionism
superintelligence; see also col-
lective superintelligence,
quality superintelligence and
speed superintelligence
denition 22, 52
forms 52, 59
paths to 22, 50
predicting the behavior of 108,
155, 302
superorganisms 178–180
superpowers 52–56, 80, 86–87,
91–104, 119, 133, 148, 277,
279, 296
types 94
surveillance 15, 49, 64, 82–85, 94,
117, 132, 181, 232, 253, 276,
294, 299
Szilárd, Leó 85
T
TD-Gammon 12
Technological Completion
Conjecture 112–113, 229
technology race 8082, 86–90
203–205, 231, 246–252,
302
teleological threads 110
Tesauro, Gerry 12
TextRunner (system) 71
theorem prover 15, 266
three laws of robotics 139, 284
run, Sebastian 19
tool-AI 151–158
denition 151
treacherous turn 116–119, 128
Tribolium castaneum 154
tripwires 137–143
Truman, Harry 85
Turing, Alan 4, 23, 29, 44, 225,
265, 271, 272
Reagan, Ronald 8687
reasons-based goal 220
recalcitrance 62–77, 92, 241, 274
denition 65
recursive self-improvement 29,
75, 96, 142, 259; see also
seed AI
reinforcement learning 12, 28,
188–189, 194–196, 207, 237,
277, 282, 290
resource acquisition 113116,
123, 193
reward signal 71, 121122, 188,
194, 207
Riemann hypothesis catastro-
phe 123, 141
robotics 9–19, 94–97, 117118,
139, 238, 276, 290
Roosevelt, Franklin D. 85
RSA encryption scheme 80
Russell, Bertrand 6, 87, 139, 277
S
Samuel, Arthur 12
Sandberg, Anders 265, 267, 272,
274
scanning, see whole brain
emulation (WBE)
Schaeer, Jonathan 12
scheduling 15
Schelling point 147, 183, 296
Scrabble 13
second transition 176178, 238,
243–245, 252
second-guessing
(arguments) 238–239
seed AI 23–29, 36, 75, 83,
92–96, 107, 116–120, 142, 151,
189–198, 201–217, 224–225,
240–241, 266, 274, 275, 282
self-limiting goal 123
Shakey (robot) 6
SHRDLU (program) 6
Shulman, Carl 178180, 265, 287,
300, 302, 304
simulation hypothesis 134–135,
143, 278, 288, 292
singleton 78–90, 95–104,
112–114, 115126, 136, 159,
176184, 242, 275, 276, 279,
281, 287, 299, 301, 303
denition 78, 100
singularity 1, 2, 49, 75, 261, 274;
see also intelligence explosion