A new feature from OpenAI is set to fundamentally disrupt the “trust equation” between young users and conversational AI. By programming ChatGPT to report potential self-harm risks to parents, the company is forcing a critical question: will teens continue to trust and confide in an AI that might tell on them?
Proponents of the feature argue that a different kind of trust will form: trust in the AI as a safety mechanism. In this view, users will come to appreciate the AI's role as a guardian, understanding that a breach of confidentiality is a last-resort measure designed for their protection. Trust is rebuilt around safety rather than secrecy.
Critics, however, contend that the existing form of trust, built on the perception of absolute confidentiality, will be shattered. For a teenager, the fear of a parent being notified is a powerful deterrent. The AI, once seen as a safe, non-judgmental confidant, will be reassessed as a potential informant, fundamentally changing how and what users are willing to share.
This delicate balance was upended by the tragic case of Adam Raine, the California teenager whose death by suicide prompted a wrongful-death lawsuit against OpenAI and led the company to prioritize intervention over unwavering user confidentiality. OpenAI is wagering that users will adapt to the new trust model, valuing the safety net more than absolute privacy.
As awareness of the new functionality spreads, user behavior will be the ultimate test of the revised trust equation. A decline in conversations about mental health would signal a failure of trust, while continued engagement would suggest a successful recalibration. The future of AI as a mental health tool hangs in the balance.

