Disclaimer: not medical advice.

About This Project

Endo Central is a community-sourced directory of endometriosis specialists, built by mining patient experiences from Reddit. The goal is to help patients make more informed decisions by providing transparent, evidence-based doctor reviews backed by actual posts.

How It Works

Data Collection

We scanned 332,000 Reddit posts across 10 endometriosis-related subreddits, including r/Endo, r/endometriosis, r/hysterectomy, r/adenomyosis, r/pelvicfloor, and related communities.

Doctor Discovery

Doctors are identified by LLM-based extraction that reads each post in context; this step produced 16,799 extracted doctor mentions. Each doctor is then checked against the NPI (National Provider Identifier) registry to confirm identity, location, credentials, and specialties. 275 doctors were matched to verified NPI records.
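
The matching step can be pictured with a minimal sketch. It is not the project's actual code: the NpiRecord type, the match_unambiguously helper, and the sample records (including the NPI numbers) are hypothetical, and it assumes candidates have already been fetched from the registry. The idea it illustrates is that a doctor is accepted only when exactly one candidate record fits.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class NpiRecord:
    """Hypothetical, simplified view of one registry entry."""
    npi: str
    first_name: str
    last_name: str
    state: str


def _norm(value: str) -> str:
    # Compare names case-insensitively and ignore stray whitespace.
    return value.strip().casefold()


def match_unambiguously(
    first_name: str,
    last_name: str,
    state: Optional[str],
    candidates: Iterable[NpiRecord],
) -> Optional[NpiRecord]:
    """Return the single matching record, or None if zero or several match."""
    hits = [
        c for c in candidates
        if _norm(c.first_name) == _norm(first_name)
        and _norm(c.last_name) == _norm(last_name)
        and (state is None or _norm(c.state) == _norm(state))
    ]
    return hits[0] if len(hits) == 1 else None


# Made-up records for illustration only.
registry = [
    NpiRecord("0000000001", "Jane", "Smith", "CA"),
    NpiRecord("0000000002", "Jane", "Smith", "NY"),
]
print(match_unambiguously("Jane", "Smith", None, registry))  # ambiguous
print(match_unambiguously("jane", "smith", "CA", registry))  # single hit
```

Under this assumption, a common name mentioned without a location stays unmatched rather than being attached to the wrong provider.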

Sentiment Classification

Each post is classified as positive, negative, mixed, or neutral using LLM-based sentiment analysis. Reading the full post in context, rather than isolated sentences, is intended to give more accurate labels for nuanced, sarcastic, or complex posts.

Patient vs. Other Users

Posts are classified as "patient experiences" if they contain first-person language indicating direct interaction with the doctor. Other mentions (recommendations, questions, secondhand reports) are shown separately.
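
To make "first-person language" concrete, here is a rough sketch using regular expressions. It is a stand-in for illustration, not the classifier the project uses: the pattern list and the looks_like_patient_experience function are assumptions, and "Dr. Lee" is a made-up name.

```python
import re

# Hypothetical patterns that suggest the author dealt with the doctor directly.
FIRST_PERSON_PATTERNS = [
    r"\bmy (?:surgeon|doctor|surgery|appointment|consult(?:ation)?)\b",
    r"\bI (?:saw|see|had|went to|met with)\b",
    r"\b(?:operated on|treated|diagnosed|dismissed) me\b",
]
_FIRST_PERSON_RE = re.compile("|".join(FIRST_PERSON_PATTERNS), re.IGNORECASE)


def looks_like_patient_experience(text: str) -> bool:
    """Return True if the post contains any first-person interaction cue."""
    return _FIRST_PERSON_RE.search(text) is not None


print(looks_like_patient_experience(
    "I saw Dr. Lee in March and she operated on me in June."))
print(looks_like_patient_experience(
    "Has anyone heard anything about Dr. Lee?"))
```

A keyword heuristic like this misses indirect phrasing and can be fooled by quoted text, which is one reason the two groups of posts are shown separately rather than merged.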

Approval Rates

Approval rates are calculated from patient users only. A doctor's approval rate is the percentage of patient users whose overall sentiment, taken across all of their posts about that doctor, is positive. Each Reddit user is counted once per doctor, regardless of how many posts they made.
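
The calculation can be sketched as follows. How a user's "overall" sentiment is derived from several posts is not spelled out here, so this example assumes a simple rule: a user counts as positive when their positive posts outnumber their negative ones. The approval_rate function and the sample data are hypothetical.

```python
from collections import defaultdict
from typing import Iterable, Optional, Tuple


def approval_rate(posts: Iterable[Tuple[str, str]]) -> Optional[float]:
    """Approval rate (0-100) for ONE doctor.

    `posts` holds (user, sentiment) pairs from patient users only.
    Assumed rule: a user approves if positive posts > negative posts.
    """
    net = defaultdict(int)
    for user, sentiment in posts:
        if sentiment == "positive":
            net[user] += 1
        elif sentiment == "negative":
            net[user] -= 1
        else:
            net[user] += 0  # mixed/neutral: register the user, no score change
    if not net:
        return None  # no patient users, so no rate to report
    approving = sum(1 for score in net.values() if score > 0)
    return 100.0 * approving / len(net)


sample = [
    ("u1", "positive"), ("u1", "positive"), ("u1", "negative"),
    ("u2", "negative"),
    ("u3", "mixed"),
    ("u4", "positive"),
]
print(approval_rate(sample))  # 50.0
```

In the sample, u1 posts three times but is counted once; two of the four users (u1 and u4) end up positive overall, giving 50%.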

Known Limitations

  • Reddit skews younger, more tech-savvy, and more willing to share negative experiences
  • Patients with extreme experiences are more likely to post
  • 167 doctors with common names could not be unambiguously matched to NPI records
  • LLM-based classification is highly accurate but not perfect, so always read the actual posts
  • Approval rates based on small samples (under ~20 patients) should be interpreted with caution
  • Data is a snapshot in time and may not reflect a doctor's current practice
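
To illustrate why small samples call for caution, the sketch below computes a 95% Wilson score interval for an approval rate. This is a standard statistical tool added here for illustration; the site itself is not stated to compute or display such intervals, and the wilson_interval function name is an assumption.

```python
import math
from typing import Tuple


def wilson_interval(positive: int, total: int, z: float = 1.96) -> Tuple[float, float]:
    """Wilson score interval for a proportion (z=1.96 gives roughly 95%)."""
    if total <= 0:
        raise ValueError("total must be positive")
    p = positive / total
    denom = 1 + z * z / total
    centre = (p + z * z / (2 * total)) / denom
    half = z * math.sqrt(p * (1 - p) / total + z * z / (4 * total * total)) / denom
    return centre - half, centre + half


# Same 80% approval rate, very different certainty.
print(wilson_interval(8, 10))    # wide interval
print(wilson_interval(80, 100))  # much narrower interval
```

With 8 positive users out of 10, the interval spans roughly 49% to 94%, so an "80% approval rate" from ten patients is compatible with anything from a coin flip to near-universal approval.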

Privacy

Reddit usernames have been anonymized ("Patient 1", "Patient 2", etc.). Each post includes a "View on Reddit" link to the original public post for verification purposes. All source data comes from publicly accessible Reddit posts.
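
A minimal sketch of this kind of labeling is shown below. The anonymize function is hypothetical, and it assumes labels are handed out in order of first appearance; whether the real numbering is per doctor or site-wide is not stated here.

```python
from typing import Dict, Iterable


def anonymize(usernames: Iterable[str]) -> Dict[str, str]:
    """Map each distinct username to a stable "Patient N" label."""
    labels: Dict[str, str] = {}
    for name in usernames:
        if name not in labels:
            # Number users in the order they are first seen.
            labels[name] = f"Patient {len(labels) + 1}"
    return labels


print(anonymize(["user_a", "user_b", "user_a", "user_c"]))
```

Because the same username always maps to the same label, several posts by one person remain recognizable as coming from a single patient. The "View on Reddit" link still leads to the original public post, so the labels reduce on-page exposure rather than making users untraceable.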

Open Source

This project was built as a personal research tool. The data pipeline, analysis scripts, and web platform are open source.