
URMC-CTSI

Leveraging LLMs to Identify Engaging Positive Mental Health Messages on Social Media 

Introduction

Public health need

60 million U.S. adults experience mental illness annually

1 in 5 youth experience major depression; >50% untreated

Millions report suicidal ideation

Gap

Most research focuses on detection

Less focus on what makes content engaging

Objectives and Significance

Characterize the content of social media posts about mental health resilience, then identify which messages drive engagement and why.

Provide valuable guidance for designing effective health communication messages to improve resilience

Dataset

Source: Twitter/X (public posts)

Content: English-language tweets related to mental health resilience

Location: United States–based users

Timeframe: September 1, 2024 – December 1, 2025

Dataset size: ~100,000 tweets

Inclusion focus: resilience-related keywords, themes, and discourse

Methodology & Data Pipeline

Data: ~100k tweets → ~14.6k relevant (filtered)

Manual annotation:

500 tweets labeled for relevance + sentiment (double-coded)

Cohen’s κ used to ensure inter-rater reliability
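As an illustration of the reliability check, Cohen's κ compares the agreement observed between two coders with the agreement expected by chance. The sketch below is a minimal pure-Python version. The function name and the toy labels are assumptions made for this example, not the study's data or code.

```python
from collections import Counter

def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters who labeled the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("need two non-empty label lists of equal length")
    n = len(rater_a)
    # Observed agreement: share of items where both raters chose the same label.
    p_observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: sum over labels of the product of marginal frequencies.
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    p_expected = sum(counts_a[k] * counts_b[k] for k in counts_a) / (n * n)
    if p_expected == 1.0:
        return 1.0  # both raters used one identical label throughout
    return (p_observed - p_expected) / (1 - p_expected)

# Toy labels, invented for illustration only (not from the study).
coder_1 = ["relevant", "relevant", "irrelevant", "relevant", "irrelevant", "irrelevant"]
coder_2 = ["relevant", "irrelevant", "irrelevant", "relevant", "irrelevant", "relevant"]
kappa = cohens_kappa(coder_1, coder_2)
print(f"kappa = {kappa:.3f}")
```

In this toy case the coders agree on 4 of 6 items and chance agreement is 0.5, so κ is about 0.33. With real double-coded data, the same function would be applied once per label dimension (relevance, sentiment).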

Feature labeling:

250 relevant tweets annotated for content features

Guidelines refined iteratively; reliability re-evaluated

LLM classification:

Annotation rules converted into structured GPT prompts

Model validated on labeled data (F1 ≥ 0.70)
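One way annotation rules might be turned into a structured prompt is sketched below. The prompt wording, the function names, and the yes/no answer format are all assumptions for illustration. The poster does not show the actual prompts, and no model API call is included here.

```python
def build_feature_prompt(tweet, feature, definition, examples=()):
    """Assemble a zero-shot (no examples) or few-shot classification prompt."""
    lines = [
        "You are annotating tweets about mental health resilience.",
        f"Feature: {feature}",
        f"Definition: {definition}",
        "Reply with exactly one word: yes or no.",
        "",
    ]
    # Few-shot examples, if any, are shown in the same format as the final query.
    for text, label in examples:
        lines += [f"Tweet: {text}", f"Answer: {label}", ""]
    lines += [f"Tweet: {tweet}", "Answer:"]
    return "\n".join(lines)

def parse_answer(raw):
    """Map a model reply to True/False; return None if it is not a clean yes/no."""
    cleaned = raw.strip().lower().rstrip(".")
    return {"yes": True, "no": False}.get(cleaned)

# Hypothetical rule and example, written for this sketch only.
prompt = build_feature_prompt(
    tweet="Two years ago I could not get out of bed. Today I ran a 5k.",
    feature="first-person",
    definition="The author describes their own experience using I, me, or my.",
    examples=[("Resources are available for anyone who is struggling.", "no")],
)
print(prompt)
```

Returning None for unparseable replies keeps malformed model output from being silently counted as a negative label.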

Scaling:

Applied LLM classification to ~14k tweets

Analysis pipeline:

BERTopic → theme discovery

Negative Binomial regression → engagement modeling

LLM Prompting

Used GPT prompting (zero-/few-shot) to classify content features

Model performance validated on labeled data

F1 ≥ 0.70 across all features
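The validation step can be sketched as a per-feature F1 check against the human labels. The feature names echo those mentioned on this poster, but the label vectors are toy values invented for this example, and binary (0/1) labels per feature are an assumption.

```python
def f1_binary(y_true, y_pred):
    """F1 for the positive class of a binary (0/1) label vector."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0  # no true positives: precision and recall are zero or undefined
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

# Toy human (gold) and model labels, invented for illustration only.
gold = {"storytelling": [1, 1, 0, 0, 1], "cta": [0, 1, 0, 1, 0]}
pred = {"storytelling": [1, 0, 0, 0, 1], "cta": [0, 1, 1, 1, 0]}

THRESHOLD = 0.70  # acceptance threshold stated on the poster
scores = {name: f1_binary(gold[name], pred[name]) for name in gold}
passed = all(score >= THRESHOLD for score in scores.values())
for name, score in scores.items():
    print(f"{name}: F1 = {score:.2f}")
```

Checking each feature separately matters because a strong average can hide one feature the model classifies poorly.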

Results: Topic Modeling

BERTopic applied to ~14.6k relevant tweets to identify dominant themes across sentiment subsets

Across all tweets, topics are highly concentrated

~99% in Support & Awareness

Positive tweets are more diverse

Mindfulness (37%), Recovery (33%), Support (29%)

Neutral tweets emphasize different content

Support (42%), Stress (34%), Teletherapy (25%)
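Percentages like those above can be derived from per-tweet topic assignments by counting topics within each sentiment subset. The sketch below assumes each tweet already carries a sentiment label and one topic label. The rows are toy values, not the study's data, and the topic modeling step itself is not shown.

```python
from collections import Counter, defaultdict

def topic_shares(rows):
    """rows: iterable of (sentiment, topic) pairs.

    Returns {sentiment: {topic: percent of that sentiment's tweets}}.
    """
    counts = defaultdict(Counter)
    for sentiment, topic in rows:
        counts[sentiment][topic] += 1
    shares = {}
    for sentiment, topic_counts in counts.items():
        total = sum(topic_counts.values())
        shares[sentiment] = {t: 100 * c / total for t, c in topic_counts.items()}
    return shares

# Toy assignments, invented for illustration only.
toy_rows = [
    ("positive", "Mindfulness"),
    ("positive", "Recovery"),
    ("positive", "Mindfulness"),
    ("neutral", "Support"),
    ("positive", "Support"),
]
shares = topic_shares(toy_rows)
print(shares)
```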

Results: Regression

Modeled engagement (likes) using a Negative Binomial regression

Inputs included:

  • Content features (storytelling, first-person, CTA, etc.)
  • Topic categories (BERTopic outputs)
  • Structural features (length, emojis, links)

N = 13,807; Negative Binomial regression (likes)

Pseudo R² ≈ 0.04 → modest explanatory power (expected for social media data)

*Note: Model 1 is the regression restricted to tweets with positive sentiment.
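For reference, a standard way to write a Negative Binomial model for like counts is given below. The poster does not state the parameterization or which pseudo R² was reported, so the NB2 form with a log link and McFadden's pseudo R² are assumptions here. The symbols y_i (likes on tweet i), x_ij (feature j of tweet i), and α (dispersion) are introduced for this sketch.

```latex
\begin{aligned}
y_i &\sim \mathrm{NegBin}(\mu_i, \alpha), \\
\log \mu_i &= \beta_0 + \sum_{j} \beta_j x_{ij}, \\
\operatorname{Var}(y_i) &= \mu_i + \alpha \mu_i^{2}, \\
R^2_{\text{McFadden}} &= 1 - \frac{\ln L_{\text{full}}}{\ln L_{\text{null}}}
\end{aligned}
```

Under the log link, exp(β_j) is the multiplicative change in expected likes for a one-unit increase in feature j, so a positive coefficient on a binary feature such as CTA corresponds to a ratio above 1. The extra α μ² term in the variance is what lets the model accommodate overdispersed counts, which a Poisson model (α = 0) cannot.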

Key Results

First-person language and storytelling → higher engagement

Call-to-action (CTA) → positive effect on likes

Topic effects:

  • Recovery & mindfulness content tends to perform better than generic awareness

Structural features → weaker, inconsistent effects

Interpretation

  • Engagement is driven more by how content is written than just the topic
  • Narrative + personal framing makes content more relatable and engaging
  • Informational or generic posts are less likely to drive interaction

Conclusion

Mental health discourse on social media is not uniform: it spans awareness, recovery, self-care, and service access. While the overall conversation is dominated by general support language, engagement is driven more by how content is written than by topic alone. Posts that use personal, narrative-driven language and include clear calls to action tend to generate higher interaction. These findings suggest that effective mental health communication should prioritize relatability, storytelling, and actionable messaging.