Leveraging LLMs to Identify Engaging Positive Mental Health Messages on Social Media
Introduction
Public health need
60 million U.S. adults experience mental illness annually
1 in 5 youth experience major depression; >50% untreated
Millions report suicidal ideation
Gap
Most research focuses on detection
Less focus on what makes content engaging
Objectives and Significance
Understand the content of social media posts related to mental health resilience, then identify which messages drive engagement and why
Provide valuable guidance for designing effective health communication messages to improve resilience
Dataset
Source: Twitter/X (public posts)
Content: English-language tweets related to mental health resilience
Location: United States–based users
Timeframe: September 1, 2024 – December 1, 2025
Dataset size: ~100,000 tweets
Inclusion focus: resilience-related keywords, themes, and discourse
Methodology & Data Pipeline
Data: ~100k tweets → ~14.6k relevant (filtered)
Manual annotation:
500 tweets labeled for relevance + sentiment (double-coded)
Cohen’s κ used to ensure inter-rater reliability
Feature labeling:
250 relevant tweets annotated for content features
Guidelines refined iteratively; reliability re-evaluated
LLM classification:
Annotation rules converted into structured GPT prompts
Model validated on labeled data (F1 ≥ 0.70)
Scaling:
Applied LLM classification to the ~14.6k relevant tweets
Analysis pipeline:
BERTopic → theme discovery
Negative Binomial regression → engagement modeling
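The inter-rater reliability check in the manual annotation step can be illustrated with a minimal sketch of Cohen's κ for two coders. The function name, the label values, and the toy data are illustrative assumptions, not the study's annotations or code.

```python
from collections import Counter

def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators who labeled the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("label lists must be non-empty and of equal length")
    n = len(labels_a)
    # Observed agreement: fraction of items given identical labels.
    p_o = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement from each annotator's marginal label frequencies.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    p_e = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if p_e == 1.0:
        return 1.0  # both annotators used one identical label throughout
    return (p_o - p_e) / (1 - p_e)

# Toy relevance labels (hypothetical, for illustration only).
coder_1 = ["rel", "rel", "irr", "rel", "irr", "irr", "rel", "irr"]
coder_2 = ["rel", "irr", "irr", "rel", "irr", "rel", "rel", "irr"]
kappa = cohens_kappa(coder_1, coder_2)
print(f"kappa = {kappa:.2f}")
```

κ corrects raw agreement for the agreement expected by chance, which is why it is preferred over simple percent agreement when label frequencies are unbalanced.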
LLM Prompting
Used GPT prompting (zero-/few-shot) to classify content features
Model performance validated on labeled data
F1 ≥ 0.70 across all features
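The validation step can be sketched as a per-feature F1 check against the 0.70 threshold. This assumes each content feature is a binary label and that F1 is computed on the positive class; the slides do not state the averaging scheme. The feature names and labels below are hypothetical.

```python
def f1_score(gold, pred, positive=1):
    """F1 for the positive class, from human (gold) and model (pred) labels."""
    tp = sum(g == positive and p == positive for g, p in zip(gold, pred))
    fp = sum(g != positive and p == positive for g, p in zip(gold, pred))
    fn = sum(g == positive and p != positive for g, p in zip(gold, pred))
    if tp == 0:
        return 0.0  # no true positives means precision and recall are both 0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

F1_THRESHOLD = 0.70  # acceptance threshold reported in the slides

# Hypothetical (human, model) labels per content feature.
labeled = {
    "storytelling": ([1, 0, 1, 1, 0, 1], [1, 0, 1, 0, 0, 1]),
    "call_to_action": ([0, 1, 1, 0, 1, 0], [0, 1, 1, 1, 1, 0]),
}
scores = {name: f1_score(gold, pred) for name, (gold, pred) in labeled.items()}
passed = {name: score >= F1_THRESHOLD for name, score in scores.items()}

for name, score in scores.items():
    print(f"{name}: F1 = {score:.2f}, passes = {passed[name]}")
```

A feature that falls below the threshold would signal that its prompt or annotation rule needs another round of refinement before scaling.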

Results: Topic Modeling
BERTopic applied to ~14.6k relevant tweets to identify dominant themes across sentiment subsets
Across all tweets, topics are highly concentrated
~99% in Support & Awareness
Positive tweets are more diverse
Mindfulness (37%), Recovery (33%), Support (29%)
Neutral tweets emphasize different content
Support (42%), Stress (34%), Teletherapy (25%)
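The percentages above read as topic shares within each sentiment subset. A minimal sketch of that tabulation follows, assuming each tweet already carries a sentiment label and a topic label from the topic model. The rows are toy data, not the study's output.

```python
from collections import Counter, defaultdict

def topic_shares(rows):
    """Return {sentiment: {topic: percent of that subset}} from (sentiment, topic) pairs."""
    counts = defaultdict(Counter)
    for sentiment, topic in rows:
        counts[sentiment][topic] += 1
    return {
        sentiment: {
            topic: 100 * n / sum(counter.values())
            for topic, n in counter.items()
        }
        for sentiment, counter in counts.items()
    }

# Toy (sentiment, topic) assignments, for illustration only.
rows = [
    ("positive", "Mindfulness"), ("positive", "Mindfulness"),
    ("positive", "Mindfulness"), ("positive", "Recovery"),
    ("positive", "Recovery"), ("positive", "Support"),
    ("neutral", "Support"), ("neutral", "Support"),
    ("neutral", "Stress"), ("neutral", "Teletherapy"),
]
shares = topic_shares(rows)
for sentiment, by_topic in shares.items():
    print(sentiment, {t: round(p, 1) for t, p in by_topic.items()})
```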

Results: Regression
Modeled engagement (likes) using a Negative Binomial regression
Inputs included:
- Content features (storytelling, first-person, CTA, etc.)
- Topic categories (BERTopic outputs)
- Structural features (length, emojis, links)
N = 13,807; Negative Binomial regression (likes)
Pseudo R² ≈ 0.04 → modest explanatory power (expected for social media data)
*Note: Model 1 is the regression fitted on positive-sentiment tweets only
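To make the regression quantities concrete, here is a minimal sketch of the Negative Binomial (NB2) log-likelihood, a McFadden-style pseudo R², and an incidence rate ratio for one binary feature. Several assumptions apply: the slides do not say which pseudo R² was reported, the dispersion `alpha` is fixed by hand here instead of being estimated, and the like counts and `has_story` flags are toy data. With a single binary predictor and fixed dispersion, the maximum-likelihood fitted means are simply the group means, so no optimizer is needed.

```python
import math

def nb2_logpmf(y, mu, alpha):
    """Log-probability of count y under NB2: mean mu, variance mu + alpha * mu**2."""
    r = 1.0 / alpha
    return (math.lgamma(y + r) - math.lgamma(r) - math.lgamma(y + 1)
            + r * math.log(r / (r + mu)) + y * math.log(mu / (r + mu)))

def log_likelihood(counts, mus, alpha):
    """Sum of NB2 log-probabilities for observed counts and fitted means."""
    return sum(nb2_logpmf(y, mu, alpha) for y, mu in zip(counts, mus))

def mcfadden_pseudo_r2(ll_model, ll_null):
    """McFadden's pseudo R-squared: 1 - LL(model) / LL(intercept-only)."""
    return 1.0 - ll_model / ll_null

# Toy data (hypothetical): likes per tweet and a binary storytelling flag.
likes = [0, 2, 1, 15, 3, 0, 40, 5]
has_story = [0, 0, 0, 1, 0, 0, 1, 1]
alpha = 1.5  # assumed dispersion, held fixed for simplicity

# Intercept-only model: every tweet gets the overall mean.
overall_mean = sum(likes) / len(likes)
ll_null = log_likelihood(likes, [overall_mean] * len(likes), alpha)

# One-predictor model: fitted mean is the mean of the tweet's group.
group_mean = {
    g: sum(y for y, s in zip(likes, has_story) if s == g) / has_story.count(g)
    for g in (0, 1)
}
ll_model = log_likelihood(likes, [group_mean[s] for s in has_story], alpha)

r2 = mcfadden_pseudo_r2(ll_model, ll_null)
irr = group_mean[1] / group_mean[0]  # equals exp(beta) under a log link
print(f"pseudo R2 = {r2:.3f}, IRR for storytelling = {irr:.2f}")
```

An incidence rate ratio above 1 means the feature is associated with multiplicatively more expected likes. Small pseudo R² values are common for count models of engagement, since most variation in likes comes from factors outside the text.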

Key Results
First-person language and storytelling → higher engagement
Call-to-action (CTA) → positive effect on likes
Topic effects:
- Recovery & mindfulness content tends to perform better than generic awareness
Structural features → weaker, inconsistent effects
Interpretation
- Engagement is driven more by how content is written than just the topic
- Narrative + personal framing makes content more relatable and engaging
- Informational or generic posts less likely to drive interaction
Conclusion
Mental health discourse on social media is not uniform: it spans awareness, recovery, self-care, and service access. While the overall conversation is dominated by general support language, engagement is driven more by how content is written than by topic alone. Posts that use personal, narrative-driven language and include clear calls to action tend to generate higher interaction. These findings suggest that effective mental health communication should prioritize relatability, storytelling, and actionable messaging.