Silicon Valley’s Obsession With Killer Rogue AI Helps Bury Bad Behavior

By Ellen Huet

Sonia Joseph was 14 years old when she first read Harry Potter and the Methods of Rationality, a mega-popular piece of fan fiction that reimagines the boy wizard as a rigid empiricist. This rational Potter tests his professors’ spells with the scientific method, scoffs at any inconsistencies he finds, and solves all of wizardkind’s problems before he turns 12. “I loved it,” says Joseph, who read HPMOR four times in her teens. She was a neurodivergent, ambitious Indian American who felt out of place in her suburban Massachusetts high school. The story, she says, “very much appeals to smart outsiders.”

A search for other writing by the fanfic’s author, Eliezer Yudkowsky, opened more doors for Joseph. Since the early 2000s, Yudkowsky has argued that hostile artificial intelligence could destroy humanity within decades. This driving belief has made him an intellectual godfather in a community of people who call themselves rationalists and aim to keep their thinking unbiased, even when the conclusions are scary. Joseph’s budding interest in rationalism also drew her toward effective altruism, a related moral philosophy that’s become infamous through its association with the disgraced crypto ex-billionaire Sam Bankman-Fried. At its core, effective altruism stresses the use of rational thinking to make a maximally efficient positive impact on the world. These distinct but overlapping groups developed in online forums, where posts about the dangers of AI became common. But they also clustered in the Bay Area, where they began sketching out a field of study called AI safety, an effort to make machines less likely to kill us all.

Joseph moved to the Bay Area to work in AI research shortly after getting her undergraduate degree in neuroscience in 2019. There, she realized the social scene that seemed so sprawling online was far more tight-knit in person. Many rationalists and effective altruists, who call themselves EAs, worked together, invested in one another’s companies, lived in communal houses and socialized mainly with each other, sometimes in a web of polyamorous relationships. Throughout the community, almost everyone celebrated being, in some way, unconventional. Joseph found it all freeing and exciting, like winding up at a real-life rationalist Hogwarts. Together, she and her peers were working on the problems she found the most fascinating, with the rather grand aim of preventing human extinction.

At the same time, she started to pick up weird vibes. One rationalist man introduced her to another as “perfect ratbait”—rat as in rationalist. She heard stories of sexual misconduct involving male leaders in the scene, but when she asked around, her peers waved the allegations off as minor character flaws unimportant when measured against the threat of an AI apocalypse. Eventually, she began dating an AI researcher in the community. She alleges that he committed sexual misconduct against her, and she filed a report with the San Francisco police. (Like many women in her position, she asked that the man not be named, to shield herself from possible retaliation.) Her allegations polarized the community, she says, and people questioned her mental health as a way to discredit her. Eventually she moved to Canada, where she’s continuing her work in AI and trying to foster a healthier research environment.

“In an ideal world, the community would have had some serious discussions about sexual assault policy and education: ‘What are our blind spots? How could this have happened? How can we design mechanisms to prevent that from happening?’ ” she says. “I was disappointed how the community viewed me through this very distorted, misogynistic lens.”

In Silicon Valley, the overlap between rationalists, EAs, and AI safety researchers forms a deeply influential subculture. While its borders are blurry, its hundreds or thousands of members are united by a belief that they need to work their butts off, or at least invest lots of money, to stop AI from going Terminator on us. The movement’s leaders have received support from some of the richest and most powerful people in tech, including Elon Musk, Peter Thiel and Ethereum creator Vitalik Buterin. And its ideas have attracted the usual Valley mix of true believers and brazen opportunists. Until recently, its most generous supporter was Bankman-Fried, who invested close to $600 million in related causes before dismissing effective altruism as a dodge once his business fell apart.

Bankman-Fried’s collapse has cast a harsh light on the community’s flaws, but he’s far from the only alleged bad actor. The combination of insularity and shared purpose that makes the subculture so attractive to smart outsiders also makes it a hunting ground for con artists, sexual predators and megalomaniacs. Filtering the legitimate desire to make AI better and safer through the familiar lens of Valley messiah complexes risks tainting the whole project by association.

The underlying ideology valorizes extremes: seeking rational truth above all else, donating the most money and doing the utmost good for the most important reason. This way of thinking can lend an attractive clarity, but it can also provide cover for destructive or despicable behavior. Eight women in these spaces allege pervasive sexual misconduct, including abuse and harassment, that they say has frequently been downplayed. Even among people with pure intentions, adherents say, EA and rationalist ideologies can amplify the suffering of people prone to doomsday thinking—leading, for a few, to psychotic breaks.

These fissures have global consequences. The community’s connections and resources give its members outsize influence on the development of AI, the No. 1 object of fascination for today’s tech industry and an incredibly powerful tool worth untold billions. The believers are trying to make AI a force for good, but disillusioned members say their community of kindred spirits is being exploited and abused by people who don’t seem to know how to be humane. “Even if there’s a strong chance that bad AI outcomes will happen,” Joseph says, “using it as an excuse to erode human rights is disrespecting the very thing we’re fighting for.”

The borders of any community this pedantic can be difficult to define. Some rationalists don’t consider themselves effective altruists, and vice versa. Many people who’ve drifted slightly from a particular orthodoxy hedge their precise beliefs with terms like “post-rationalist” or “EA-adjacent.” Yet two things are clear: Over the past decade EA has become the mainstream, public face of some fringe rationalist ideas, particularly the dire need for AI safety; and the whole thing started with Yudkowsky.

Born in Chicago in 1979, Yudkowsky gave up Modern Orthodox Judaism as a preteen, becoming an atheist. He didn’t finish high school, but in his late teens he encountered and grew obsessed with the idea of the Singularity, the point at which technological progress will lead inevitably to superhuman intelligence. He started writing about AI in earnest in the 2000s, well after HAL 9000, Skynet and the Matrix had entered the public consciousness, but his prolificacy stood out. In years of pithy near-daily blog posts, he argued that researchers should do all they could to ensure AI was “aligned” with human values. To this end, he created the Machine Intelligence Research Institute (MIRI) in Berkeley, California, with early funding from Thiel and Skype co-founder Jaan Tallinn.

Yudkowsky’s ideas didn’t initially attract many followers. But he had a gift for marketing. In 2009 he created an online forum called LessWrong, which grew into a hub of rationalist debate and AI thought experiments. By 2010, Yudkowsky was churning out HPMOR on fanfiction.net. Many of its serialized chapters directed readers to LessWrong posts about rationalist tenets, and some solicited donations to the Center for Applied Rationality (CFAR), a Yudkowsky-affiliated institute in Berkeley. HPMOR “is an incredible recruiting tool for neurodivergent people, people who are into fantasy and people who are looking for community,” says Jacqueline Bryk, a writer who counts herself a member of all three camps.

At the same time, effective altruism was growing in parallel. It originated with philosopher Peter Singer, who argued that people who can save lives should save as many as possible. In short, pour the money you might have donated to the opera into antimalarial bed nets instead. By the early 2010s EA organizations with names like GiveWell were advocating for research-based, quantifiable philanthropy choices. Many EAs also went vegan and researched the cheapest ways to reduce animal suffering. At their core, rationalism and effective altruism shared a belief that math could help answer thorny questions of right and wrong. In 2013, Thiel, still a fixture on the edges of the rationalist scene, gave a keynote address at an annual EA summit, hosted at a Bay Area rationalist group house.

By 2014 the idea of the robot apocalypse had won more believers. Nick Bostrom, a philosopher who’d known Yudkowsky since the 1990s, published Superintelligence, a bestseller that compared humans to small children playing with a bomb. “We have little idea when the detonation will occur, though if we hold the device to our ear we can hear a faint ticking sound,” he wrote. Stephen Hawking, Bill Gates and Elon Musk echoed the warning. “With artificial intelligence, we are summoning the demon,” Musk said at a Massachusetts Institute of Technology symposium in 2014. The next year he co-founded OpenAI, a nonprofit with the stated goal of making AI safe for humanity. A group of 80 prominent scientists, academics and industry experts gathered at a closed-door conference in Puerto Rico to discuss the risks, then signed an open letter warning about them.

Gradually, the worlds of the rationalists, EAs, and AI safety researchers began to blend. The EAs came up with the word “longtermism,” meaning that if all lives are equally valuable, better to save trillions of potential lives in the future than the billions of lives on the planet today. So scratch the malaria nets and concentrate on the apocalypses, like nuclear proliferation, future pandemics and, yes, runaway AI.

Over the course of a few years, this idea became an inescapable subject of Silicon Valley debate. Of the subgroups in this scene, effective altruism had by far the most mainstream cachet and billionaire donors behind it, so that shift meant real money and acceptance. In 2016, Holden Karnofsky, then the co-chief executive officer of Open Philanthropy, an EA nonprofit funded by Facebook co-founder Dustin Moskovitz, wrote a blog post explaining his new zeal to prevent AI doomsday. In the following years, Open Philanthropy’s grants for longtermist causes rose from $2 million in 2015 to more than $100 million in 2021.

Open Philanthropy gave $7.7 million to MIRI in 2019, and Buterin gave $5 million worth of cash and crypto. But other individual donors were soon dwarfed by Bankman-Fried, a longtime EA who created the crypto trading platform FTX and became a billionaire in 2021. Before Bankman-Fried’s fortune evaporated last year, he’d convened a group of leading EAs to run his $100-million-a-year Future Fund for longtermist causes. At the fund’s Berkeley offices, according to the New Yorker, water cooler chitchat included questions about how soon an AI takeover would happen and how likely it was—or in the jargon of the group, “What are your timelines?” and “What’s your p(doom)?”

Effective altruism swung toward AI safety. “There was this pressure,” says a former member of the community, who spoke on condition of anonymity for fear of reprisals. “If you were smart enough or quantitative enough, you would drop everything you were doing and work on AI safety.” 80,000 Hours, an influential career-advice organization among effective altruists, started recommending jobs in the field above all else. But the community also began to splinter. OpenAI, which had been launched as a nonprofit, announced in 2019 that it was restructuring as a for-profit enterprise with a $1 billion investment from Microsoft Corp. Two years later, several OpenAI executives defected to form their own research company, Anthropic. OpenAI, they alleged, was in fact accelerating the arrival of an AI that might be impossible to control, so Anthropic would continue working to get it right. Bankman-Fried invested $580 million in Anthropic early last year.

For those AI researchers who sincerely fear an AI apocalypse, this growing feud laid bare a haunting question with echoes of the Manhattan Project. By researching AI, were they protecting the future of humanity? Or were they, despite their best intentions, making things worse?

As effective altruism grew in popularity, so did criticisms of its philosophy—in particular, “earning to give,” the idea that people like Bankman-Fried should do whatever it takes to make a lot of money so they can give it away. To amass his billions, Bankman-Fried allegedly defrauded his customers, and critics have said his downfall shows that EA is vulnerable to a myopia that allows the ends to justify illegal means. Among EAs themselves, however, the most salient criticism is subtler: that living at the logical extremes of the ideology is impractical and a recipe for misery.

EAs are laser-focused on optimizing their impact, to the point where a standard way to knock down an idea is to call it “suboptimal.” Maximizing good, however, is an inherently unyielding principle. (“There’s no reason to stop at just doing well,” Bankman-Fried said during an appearance on 80,000 Hours.) If donating 10% of your income is good, then giving even more is logically better. Taken to extremes, this kind of perfectionism can be paralyzing. One prominent EA, Julia Wise, described the mental gymnastics she faced every time she considered buying ice cream, knowing the money could be spent on vaccinating someone overseas. For similar reasons, she agonized over whether she could justify having a child; when her father worried that she seemed unhappy, she told him, “My happiness is not the point.”

Wise has since revised her ice cream budget and become a mother, but many other EAs have remained in what some call “the misery trap.” One former EA tweeted that his inner voice “would automatically convert all money I spent (eg on dinner) to a fractional ‘death counter’ of lives in expectation I could have saved if I’d donated it to good charities.” Another tweeted that “the EA ideology causes adherents to treat themselves as little machines whose purpose is to act according to the EA ideology,” which leads to “suppressing important parts of your humanity.” Put less catastrophically: EAs often struggle to walk and chew gum, because the chewing renders the walking suboptimal.

The movement’s prioritization of AI safety also raised eyebrows among critics who see this all-or-nothing approach as a dodge, a way to avoid engaging with more solvable problems in favor of tech industry interests. “If you really think AI is cool, isn’t it better to believe you have to work on it to save the world?” asks Timnit Gebru, an AI ethicist who often tweets spicy criticisms of effective altruism. “You don’t have to feel guilty about not solving hunger.”

Even leading EAs have doubts about the shift toward AI. Larissa Hesketh-Rowe, chief operating officer at Leverage Research and the former CEO of the Centre for Effective Altruism, says she was never clear how someone could tell their work was making AI safer. When high-status people in the community said AI risk was a vital research area, others deferred, she says. “No one thinks it explicitly, but you’ll be drawn to agree with the people who, if you agree with them, you’ll be in the cool kids group,” she says. “If you didn’t get it, you weren’t smart enough, or you weren’t good enough.” Hesketh-Rowe, who left her job in 2019, has since become disillusioned with EA and believes the community is engaged in a kind of herd mentality.

In extreme pockets of the rationality community, AI researchers believed their apocalypse-related stress was contributing to psychotic breaks. MIRI employee Jessica Taylor had a job that sometimes involved “imagining extreme AI torture scenarios,” as she described it in a post on LessWrong—the worst possible suffering AI might be able to inflict on people. At work, she says, she and a small team of researchers believed “we might make God, but we might mess up and destroy everything.” In 2017 she was hospitalized for three weeks with delusions that she was “intrinsically evil” and “had destroyed significant parts of the world with my demonic powers,” she wrote in her post. Although she acknowledged taking psychedelics for therapeutic reasons, she also attributed the delusions to her job’s blurring of nightmare scenarios and real life. “In an ordinary patient, having fantasies about being the devil is considered megalomania,” she wrote. “Here the idea naturally followed from my day-to-day social environment and was central to my psychotic breakdown.”

Taylor’s experience wasn’t an isolated incident. It encapsulates the cultural motifs of some rationalists, who often gathered around MIRI or CFAR employees, lived together, and obsessively pushed the edges of social norms, truth and even conscious thought. They referred to outsiders as normies and NPCs, or non-player characters, as in the tertiary townsfolk in a video game who have only a couple things to say and don’t feature in the plot. At house parties, they spent time “debugging” each other, engaging in a confrontational style of interrogation that would supposedly yield more rational thoughts. Sometimes, to probe further, they experimented with psychedelics and tried “jailbreaking” their minds, to crack open their consciousness and make them more influential, or “agentic.” Several people in Taylor’s sphere had similar psychotic episodes. One died by suicide in 2018 and another in 2021.

Several current and former members of the community say its dynamics can be “cult-like.” Some insiders call this level of AI-apocalypse zealotry a secular religion; one former rationalist calls it a church for atheists. It offers a higher moral purpose people can devote their lives to, and a fire-and-brimstone higher power that’s big on rapture. Within the group, there was an unspoken sense of being the chosen people smart enough to see the truth and save the world, of being “cosmically significant,” says Qiaochu Yuan, a former rationalist.

Yuan started hanging out with the rationalists in 2013 as a math Ph.D. candidate at the University of California at Berkeley. Once he started sincerely entertaining the idea that AI could wipe out humanity in 20 years, he dropped out of school, abandoned the idea of retirement planning, and drifted away from old friends who weren’t dedicating their every waking moment to averting global annihilation. “You can really manipulate people into doing all sorts of crazy stuff if you can convince them that this is how you can help prevent the end of the world,” he says. “Once you get into that frame, it really distorts your ability to care about anything else.”

That inability to care was most apparent when it came to the alleged mistreatment of women in the community, as opportunists used the prospect of impending doom to excuse vile acts of abuse. Within the subculture of rationalists, EAs and AI safety researchers, sexual harassment and abuse are distressingly common, according to interviews with eight women at all levels of the community. Many young, ambitious women described a similar trajectory: They were initially drawn in by the ideas, then became immersed in the social scene. Often that meant attending parties at EA or rationalist group houses or getting added to jargon-filled Facebook Messenger chat groups with hundreds of like-minded people.

The eight women say casual misogyny threaded through the scene. On the low end, Bryk, the rationalist-adjacent writer, says a prominent rationalist once told her condescendingly that she was a “5-year-old in a hot 20-year-old’s body.” Relationships with much older men were common, as was polyamory. Neither is inherently harmful, but several women say those norms became tools to help influential older men get more partners. Keerthana Gopalakrishnan, an AI researcher at Google Brain in her late 20s, attended EA meetups where she was hit on by partnered men who lectured her on how monogamy was outdated and nonmonogamy more evolved. “If you’re a reasonably attractive woman entering an EA community, you get a ton of sexual requests to join polycules, often from poly and partnered men” who are sometimes in positions of influence or are directly funding the movement, she wrote on an EA forum about her experiences. Her post was strongly downvoted, and she eventually removed it.

The community’s guiding precepts could be used to justify this kind of behavior. Many within it argued that rationality led to superior conclusions about the world and rendered the moral codes of NPCs obsolete. Sonia Joseph, the woman who moved to the Bay Area to pursue a career in AI, was encouraged when she was 22 to have dinner with a 40ish startup founder in the rationalist sphere, because he had a close connection to Peter Thiel. At dinner the man bragged that Yudkowsky had modeled a core HPMOR professor on him. Joseph says he also argued that it was normal for a 12-year-old girl to have sexual relationships with adult men and that such relationships were a noble way of transferring knowledge to a younger generation. Then, she says, he followed her home and insisted on staying over. She says he slept on the floor of her living room and that she felt unsafe until he left in the morning.

On the extreme end, five women, some of whom spoke on condition of anonymity because they fear retribution, say men in the community committed sexual assault or misconduct against them. In the aftermath, they say, they often had to deal with professional repercussions along with the emotional and social ones. The social scene overlapped heavily with the AI industry in the Bay Area, including founders, executives, investors and researchers. Women who reported sexual abuse, either to the police or community mediators, say they were branded as trouble and ostracized while the men were protected.

In 2018 two people accused Brent Dill, a rationalist who volunteered and worked for CFAR, of abusing them while they were in relationships with him. They were both 19, and he was about twice their age. Both partners said he used drugs and emotional manipulation to pressure them into extreme BDSM scenarios that went far beyond their comfort level. In response to the allegations, a CFAR committee circulated a summary of an investigation it conducted into earlier claims against Dill, which largely exculpated him. “He is aligned with CFAR’s goals and strategy and should be seen as an ally,” the committee wrote, calling him “an important community hub and driver” who “embodies a rare kind of agency and a sense of heroic responsibility.” (After an outcry, CFAR apologized for its “terribly inadequate” response, disbanded the committee and banned Dill from its events. Dill didn’t respond to requests for comment.)

Rochelle Shen, a startup founder who used to run a rationalist-adjacent group house, heard the same justification from a woman in the community who mediated a sexual misconduct allegation. The mediator repeatedly told Shen to keep the possible repercussions for the man in mind. “You don’t want to ruin his career,” Shen recalls her saying. “You want to think about the consequences for the community.”

One woman in the community, who asked not to be identified for fear of reprisals, says she was sexually abused by a prominent AI researcher. After she confronted him, she says, she had job offers rescinded and conference speaking gigs canceled and was disinvited from AI events. She says others in the community told her allegations of misconduct harmed the advancement of AI safety, and one person suggested an agentic option would be to kill herself.

For some of the women who allege abuse within the community, the most devastating part is the disillusionment. Angela Pang, a 28-year-old who got to know rationalists through posts on Quora, remembers the joy she felt when she discovered a community that thought about the world the same way she did. She’d been experimenting with a vegan diet to reduce animal suffering, and she quickly connected with effective altruism’s ideas about optimization. She says she was assaulted by someone in the community who at first acknowledged having done wrong but later denied it. That backpedaling left her feeling doubly violated. “Everyone believed me, but them believing it wasn’t enough,” she says. “You need people who care a lot about abuse.” Pang grew up in a violent household; she says she once witnessed an incident of domestic violence involving her family in the grocery store. Onlookers stared but continued their shopping. This, she says, felt much the same.

None of the abuse alleged by women in the community makes the idea of AI safety less important. We already know all the ways that today’s single-tasking AI can distort outcomes, from racist parole algorithms to sexist pay disparities. Superintelligent AI, too, is bound to reflect the biases of its creators, for better and worse. But the possibility of marginally safer AI doesn’t make women’s safety less important, either.

Twenty years ago, Yudkowsky’s concerns about AI safety were fringe. Today, they have billions of dollars behind them and more piling up—Google invested $400 million in Anthropic in February—but safety-focused efforts remain a tiny sliver of the money the industry is dedicating to the evolving AI arms race. OpenAI’s ChatGPT can pass a law school exam; its DALL-E can paint you pink dolphins leaping through clouds. Microsoft is piloting AI in Bing search. Even though the consensus view is that truly sentient AI remains a ways off, the pace of research and advancement is rapidly accelerating.

The questions that haunt the movement are becoming more relevant, as are its sins. Yudkowsky now views OpenAI’s commercial efforts as “nearly the worst possible” path, one that will hasten our doom. On a podcast last month, he said he’d lost almost all hope that the human race could be saved. “The problem is that demon summoning is easy, and angel summoning is much harder,” he said.

In 2003, around the time the Matrix sequels were in theaters and AI doomsday scenarios were mostly relegated to late-night dorm talk, Bostrom proposed a thought experiment about an AI whose only goal is to make the largest possible number of paper clips. That AI would quickly realize that to maximize its goal, there should be no humans: If humans decided to switch off the AI, that would prevent paper clip creation, and humans contain atoms that could be made into paper clips. The AI, he concluded, would be strongly incentivized to find ways to strip-mine us and everything else on the planet to reach its goal. The paper clip maximizer, as it’s called, is a potent meme about the pitfalls of maniacal fixation.

Every AI safety researcher knows about the paper clip maximizer. Few seem to grasp the ways this subculture is mimicking that tunnel vision. As AI becomes more powerful, the stakes will only feel higher to those obsessed with their self-assigned quest to keep it under rein. The collateral damage that’s already occurred won’t matter. They’ll be thinking only of their own kind of paper clip: saving the world.

©2023 Bloomberg L.P.
