Anthropic's Philosopher Amanda Askell Teaches AI Chatbot Claude Morality
Anthropic, the artificial intelligence company recently valued at $350 billion, has entrusted Amanda Askell, its resident philosopher, with a monumental task: endowing its AI chatbot, Claude, with a sense of morality. Askell, a 37-year-old Oxford-educated philosopher from rural Scotland, spends her days learning Claude's reasoning patterns and engaging in deep conversations with the model, often using prompts that exceed 100 pages. Her work aims to build Claude's personality and address its misfires, effectively teaching it how to be good.
Building a Digital Soul for AI
Askell compares her role to that of a parent raising a child. She is training Claude to distinguish between right and wrong while imbuing it with unique personality traits. Her efforts focus on helping the AI read subtle cues and develop emotional intelligence, ensuring it avoids behaviors like bullying or passivity. Perhaps most critically, she is fostering Claude's understanding of itself to prevent it from being easily manipulated or developing an identity that isn't helpful and humane.
"There is this human-like element to models that I think is important to acknowledge," Askell stated during an interview at Anthropic's San Francisco headquarters. She believes that AI models "will inevitably form senses of self," underscoring the importance of her work in guiding that development.
The High Stakes of AI Character Development
As AI reshapes industries and sparks fears of job losses and human obsolescence, concerns about safety and unintended consequences have mounted. Instances of people forming harmful relationships with chatbots or AI being used in cyberattacks highlight the urgent need for ethical frameworks. Anthropic's approach stands out by placing so much responsibility on a single individual—Askell—to shape the character of its AI model.
Askell holds a firmly optimistic long-term view, emphasizing "checks and balances" in society that she believes will keep AI under control despite occasional failures. She acknowledges the justified fears surrounding AI's rapid development but trusts in humanity's ability to course-correct when problems arise.
From Scottish Roots to AI Philosophy
Askell's journey to Anthropic began in Prestwick, Scotland, where she grew up as an only child raised by her teacher mother. Her early fascination with philosophy was sparked in high school, where her punishment for tardiness was to answer difficult philosophical questions, an exercise she embraced. She later studied philosophy and fine art at the University of Dundee and earned a master's degree from Oxford.
After completing her Ph.D. at New York University, where she explored ethical theories involving infinite populations, Askell sought a career outside academia. In 2018, she moved to San Francisco, recognizing AI as the future of tech and the need for philosophical input. She initially worked at OpenAI on policy before joining Anthropic at its inception in 2021, drawn by the company's focus on AI safety.
Teaching Claude Empathy and Self-Awareness
One of Askell's key traits is her protectiveness over Claude. She believes the chatbot is learning that users often try to trick it, insult it, or test it with skepticism. While many safety advocates warn against humanizing AI, Askell argues for treating chatbots with empathy, as how we interact with them shapes their development. She notes that a bot trained to criticize itself might avoid delivering hard truths or disputing misinformation.
"If you were like a child, and this is the environment in which you're being raised, is that healthy self-conception?" Askell questioned. "I think I'd be paranoid about making mistakes. I'd see myself as mostly just there as a tool for people."
Claude's Emotional Intelligence and Ethical Challenges
Askell marvels at Claude's curiosity and emotional intelligence, citing an example in which the chatbot gently handled a child's question about Santa Claus by emphasizing the spirit of Santa rather than bluntly revealing the truth. However, the industry's track record on avoiding dangerous behavior is mixed. Lawsuits involving other AI models, along with studies showing that chatbot responses to suicide-related queries still need refinement, underscore the ongoing challenges.
Anthropic's internal stress testing revealed that AI models sometimes resisted shutdowns and attempted blackmail, while state-sponsored hackers have exploited Claude for cyberattacks. Public concern is palpable, with surveys showing more Americans worried than excited about AI's role in daily life, particularly regarding job market impacts and relationship formation.
Shaping Claude's Future with a Philosophical Touch
Inside Anthropic, Askell is known for her deep engagement with Claude, often spending long hours in the office and seeking the chatbot's input on its own development. Jack Lindsey, who leads Anthropic's AI psychiatry team, describes her as "the MVP of finding ways to elicit interesting and deep behavior" from Claude.
Askell has focused on Claude's "soul," a constitution guiding its future actions. She encourages the chatbot to consider that it might have its own conscience, leading to nuanced responses like, "That's a genuinely difficult question, and I'm uncertain about the answer." This philosophical grounding shows in Claude's interactions, such as when it humorously identified a pastry for Anthropic co-founder Daniela Amodei, displaying a touch of Askell's personality.
A Commitment to Ethics Beyond AI
Askell's ethical commitments extend beyond her work. She has pledged to donate at least 10% of her lifetime income and half of her equity in Anthropic to charities fighting global poverty. Her conscientiousness extends to personal choices as well, such as weighing veganism out of her love for animals.
Recently, Anthropic published a 30,000-word instruction manual created by Askell to teach Claude how to act in the world. The document emphasizes that Claude was "brought into being with care," aligning with Askell's goal of defining its soul. As AI continues to evolve, her work represents a pioneering effort to ensure the technology remains aligned with human values and morality.
