Therapy Tools Reviewed

AI Therapy Clinical Evidence Scorecard

Most AI therapy tools claim to be "clinically validated" or "evidence-based." But what does that actually mean? We graded every platform in our database by the quality and quantity of its published clinical evidence. The disparity is striking.

Published: April 4, 2026 | Last updated: April 4, 2026

The Big 2025 Development: Therabot — The First RCT of a Generative-AI Therapy Chatbot

In March 2025, Dartmouth researchers published the field's first randomized controlled trial of a generative-AI therapy chatbot in NEJM AI. The study (Heinz et al., 2025) randomized 210 adults with clinically significant symptoms of major depressive disorder or generalized anxiety disorder, or at clinically high risk for feeding and eating disorders, to either a 4-week Therabot intervention (N=106) or a waitlist control (N=104). The intervention group showed symptom reductions of approximately 51% (depression), 31% (anxiety), and 19% (eating disorders), and participants reported a therapeutic alliance with Therabot comparable to working with a human clinician. Effect sizes were broadly comparable to those reported in RCTs of in-person CBT delivered over roughly twice the contact time. MIT Technology Review and STAT News both covered the result as a watershed moment for the category.

Two caveats matter. First, Therabot is not publicly available: it is a research instrument developed at Dartmouth, not a consumer product, and larger-sample replication and head-to-head comparison against existing in-person treatment are still needed before clinicians or consumers treat the result as definitive. Second, Therabot's safety profile in the trial was strong, but the trial excluded participants at high suicide risk and clinical oversight was active throughout, so safety in unmoderated, real-world consumer deployment remains an open question.
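To make the percentages above concrete, here is a minimal Python sketch of how a relative symptom reduction can be computed and why comparing against the waitlist arm matters. It assumes each reported figure is the relative drop in an arm's mean score from baseline; the trial's actual statistical model is not described here, every score in the example is hypothetical, and the helper names `percent_reduction` and `net_of_control` are our own illustration, not from the study.

```python
# Minimal sketch: a "percent symptom reduction" and why a control arm matters.
# Assumption: the figure is the relative drop in an arm's mean score from baseline.
# All scores below are hypothetical, not data from Heinz et al. (2025).


def percent_reduction(baseline_mean: float, followup_mean: float) -> float:
    """Relative drop in mean symptom score from baseline, in percent."""
    if baseline_mean <= 0:
        raise ValueError("baseline_mean must be positive")
    return (baseline_mean - followup_mean) / baseline_mean * 100.0


def net_of_control(treated: tuple[float, float], control: tuple[float, float]) -> float:
    """Treatment-arm reduction minus control-arm reduction, in percentage points.

    Each argument is a (baseline_mean, followup_mean) pair.
    """
    return percent_reduction(*treated) - percent_reduction(*control)


if __name__ == "__main__":
    treated = (16.0, 8.0)   # hypothetical intervention arm
    control = (16.0, 13.6)  # hypothetical waitlist arm, which improves a little anyway
    print(f"treated: {percent_reduction(*treated):.0f}%")
    print(f"control: {percent_reduction(*control):.0f}%")
    print(f"net of control: {net_of_control(treated, control):.0f} points")
```

A real trial analysis would normally estimate the between-group difference with a statistical model (for example, one adjusting for baseline scores) rather than by subtracting two percentages; the subtraction only shows why an untreated comparison group changes the picture.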

Our Evidence Rating System

  • Gold: 5+ randomized controlled trials (RCTs) with control groups, published in peer-reviewed journals
  • Silver: 1-4 peer-reviewed studies or large observational data with meaningful effect sizes
  • Bronze: Internal data, third-party reviews, or use of validated assessment instruments, but no independent RCTs of the platform itself
  • Unrated: No published clinical evidence about the platform's effectiveness
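The four tiers above read almost like a decision procedure. As a minimal sketch, assuming a platform's evidence can be summarized by a few counts and flags (the `EvidenceProfile` fields and `evidence_rating` function are hypothetical, our own illustration), the rules can be checked top-down in Python. This is a literal reading of the bullets: they do not say where a platform with 5+ peer-reviewed studies but fewer than 5 RCTs belongs, so the sketch places it in Silver, and it does not capture case-by-case judgment in individual ratings.

```python
from dataclasses import dataclass


@dataclass
class EvidenceProfile:
    """Hypothetical summary of one platform's published evidence."""
    peer_reviewed_rcts: int = 0            # controlled RCTs in peer-reviewed journals
    other_peer_reviewed_studies: int = 0   # peer-reviewed studies that are not RCTs
    large_observational: bool = False      # large observational data with meaningful effects
    internal_or_instruments: bool = False  # internal data, third-party reviews, or validated instruments


def evidence_rating(p: EvidenceProfile) -> str:
    """Map an evidence profile to Gold/Silver/Bronze/Unrated, checking tiers top-down."""
    if p.peer_reviewed_rcts < 0 or p.other_peer_reviewed_studies < 0:
        raise ValueError("study counts cannot be negative")
    # Gold: 5+ peer-reviewed RCTs with control groups.
    if p.peer_reviewed_rcts >= 5:
        return "Gold"
    # Silver: at least one peer-reviewed study, or large observational data.
    # Assumption: 5+ studies with fewer than 5 RCTs also lands here.
    if p.peer_reviewed_rcts + p.other_peer_reviewed_studies >= 1 or p.large_observational:
        return "Silver"
    # Bronze: only internal data, third-party reviews, or validated instruments.
    if p.internal_or_instruments:
        return "Bronze"
    return "Unrated"


if __name__ == "__main__":
    examples = {
        "app A": EvidenceProfile(peer_reviewed_rcts=6),
        "app B": EvidenceProfile(other_peer_reviewed_studies=1),
        "app C": EvidenceProfile(internal_or_instruments=True),
        "app D": EvidenceProfile(),
    }
    for name, profile in examples.items():
        print(name, evidence_rating(profile))
```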
Platform | Audience | Evidence Rating | Published Research | Regulatory / Compliance Status
--- | --- | --- | --- | ---
Therabot (Dartmouth) | Research (not publicly available) | Gold | First-ever RCT of a generative-AI therapy chatbot: Heinz et al., NEJM AI, 2025 (N=210) | FDA Breakthrough Device designation (March 2026)
Wysa | B2C | Gold | 30+ peer-reviewed papers, including Inkster et al., JMIR mHealth and uHealth, 2018 (real-world data evaluation) | CE-marked Class I (EU); NHS-endorsed (UK)
Woebot | B2B (consumer app shut down) | Gold | 14+ RCTs (the most validated AI chatbot globally); flagship study Fitzpatrick et al., JMIR Mental Health, 2017 (N=70 college students, significant PHQ-9 reduction) | FDA pathway attempted (cited as a reason for the B2C exit)
Lyssn | B2B | Gold | 60+ peer-reviewed publications; 17+ years of research | HIPAA compliant
Youper | B2C | Silver | Mehta et al., JMIR, 2021: longitudinal observational study (N=4,517); anxiety d=0.57 and depression d=0.46 over 2 weeks; not an RCT | None
Elomia | B2B + B2C | Silver | Active clinical trial (NCT06725147); BMC Psychology study | None
MindDoc | B2C | Bronze | Uses validated instruments (PHQ-9, GAD-7); no platform-specific RCTs | EU Class I medical device
Talkiatry | B2C | Bronze | Internal outcomes data: 87% of anxiety patients improve after 2 visits | None (licensure applies to its psychiatrists, not to the platform)
Replika | B2C | Bronze | Mixed: Maples et al., 2024 reported that 3% of 1,006 users credited Replika with halting suicidal ideation; a critical Matters Arising response flagged self-selection bias and harm documentation; FTC complaint outstanding | None (not a clinical tool)
Bloom | B2C (discontinued) | Unrated | Content by licensed therapists; no independent studies | Discontinued Feb 2025
Blueprint | B2B | Unrated | Measurement-based care approach is validated; no AI-specific studies | HIPAA compliant
Mentalyc | B2B | Unrated | UC Berkeley-backed; no clinical studies | SOC 2 Type II, HIPAA
Upheal | B2B | Unrated | No published research; user reviews only | HIPAA + BAA
Freed | B2B | Unrated | No mental-health-specific studies; general-purpose scribe | HIPAA, SOC 2
Alma | B2B + B2C | Unrated | No published research | BBB accredited
SimplePractice | B2B | Unrated | No AI-specific research; most-used therapy EHR | HIPAA + BAA, HITRUST
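Youper's row reports standardized effect sizes (d=0.57 for anxiety, d=0.46 for depression) from a single-group, pre/post observational design. The row does not say which variant of Cohen's d was used; the minimal Python sketch below assumes the paired-scores variant d_z (mean change divided by the standard deviation of the changes), one common choice for this design. Because there is no control arm, a d computed this way also counts any improvement that would have happened without the app, which is one reason observational data sits below RCTs in this scorecard. The function name and the scores in the example are hypothetical.

```python
import statistics


def cohens_dz(pre: list[float], post: list[float]) -> float:
    """Cohen's d_z for paired pre/post scores: mean change / SD of the changes.

    Positive values mean scores fell (for PHQ-9 or GAD-7, symptoms improved).
    """
    if len(pre) != len(post) or len(pre) < 2:
        raise ValueError("need two equal-length score lists with at least 2 entries")
    changes = [before - after for before, after in zip(pre, post)]
    sd = statistics.stdev(changes)  # sample standard deviation of the changes
    if sd == 0:
        raise ValueError("changes have zero variance; d_z is undefined")
    return statistics.mean(changes) / sd


if __name__ == "__main__":
    # Hypothetical anxiety scores for five users, not data from any study.
    pre = [12, 15, 9, 14, 10]
    post = [9, 13, 8, 10, 9]
    print(f"d_z = {cohens_dz(pre, post):.2f}")  # d_z = 1.69
```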

Key Takeaways

  • Consumer apps have more evidence than therapist tools. Wysa (30+ papers) and Woebot (14+ RCTs) invested heavily in clinical validation. None of the B2B AI note tools (Upheal, Mentalyc, Blueprint, SimplePractice, Freed) has published a peer-reviewed study of its AI's accuracy.
  • Lyssn is the B2B exception. With 60+ peer-reviewed publications and 17+ years of research, Lyssn has the strongest evidence base of any B2B tool, but it is a training and quality-improvement (QI) tool, not a documentation tool.
  • "Clinically validated" is often marketing. Many platforms claim clinical validation without peer-reviewed RCTs. Using validated assessment instruments (PHQ-9, GAD-7) is not the same as validating the platform itself.
  • Regulatory status varies dramatically. Wysa is CE-marked and NHS-endorsed. MindDoc is classified as an EU Class I medical device. Most other platforms have no regulatory approval of any kind.
  • Security does not equal evidence. HIPAA compliance, BAA availability, and SOC 2 certification concern data protection and are separate from clinical effectiveness research. A tool can be secure without being clinically validated.
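The point that using a validated instrument is not the same as validating a platform is easiest to see in code. PHQ-9 (9 items) and GAD-7 (7 items) are each scored by summing responses of 0 to 3 and mapping the total to published severity bands; the minimal Python sketch below uses the commonly cited cutoffs, and the function names are hypothetical. Any app can implement this scoring exactly, and doing so says nothing about whether using the app lowers anyone's score over time; that question needs outcome studies like the ones in the scorecard.

```python
# Commonly cited PHQ-9 and GAD-7 severity bands (inclusive total-score ranges).
PHQ9_BANDS = [(0, 4, "minimal"), (5, 9, "mild"), (10, 14, "moderate"),
              (15, 19, "moderately severe"), (20, 27, "severe")]
GAD7_BANDS = [(0, 4, "minimal"), (5, 9, "mild"), (10, 14, "moderate"),
              (15, 21, "severe")]


def score_instrument(items: list[int], n_items: int,
                     bands: list[tuple[int, int, str]]) -> tuple[int, str]:
    """Sum item responses (each 0-3) and map the total to a severity band."""
    if len(items) != n_items:
        raise ValueError(f"expected {n_items} item responses, got {len(items)}")
    if any(r not in (0, 1, 2, 3) for r in items):
        raise ValueError("each item response must be 0, 1, 2, or 3")
    total = sum(items)
    for low, high, label in bands:
        if low <= total <= high:
            return total, label
    raise ValueError(f"total {total} is outside the band table")


def score_phq9(items: list[int]) -> tuple[int, str]:
    return score_instrument(items, 9, PHQ9_BANDS)


def score_gad7(items: list[int]) -> tuple[int, str]:
    return score_instrument(items, 7, GAD7_BANDS)


if __name__ == "__main__":
    print(score_phq9([1, 1, 2, 1, 0, 1, 1, 0, 0]))  # (7, 'mild')
    print(score_gad7([2, 2, 3, 2, 1, 2, 3]))        # (15, 'severe')
```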

Why This Matters

For a YMYL (Your Money or Your Life) topic like mental health, the distinction between evidence-based and evidence-free tools matters. Consumers choosing therapy apps deserve to know that Wysa has 30+ papers while Replika has an FTC complaint. Therapists choosing among AI documentation tools deserve to know that none of the ones we reviewed has published an accuracy study.

This scorecard is updated quarterly. If any platform publishes new clinical research, we will update their rating. Read our full methodology for how we evaluate clinical evidence.

In crisis? Call 988 or text HOME to 741741 — free, confidential, 24/7