Documentation — Dr. Hermes

Overview

System architecture

The platform continuously ingests data from 8 independent public sources, scores 190+ compounds across 20 categories, and serves real-time updates to the dashboard. The scoring engine runs on FastAPI with SQLite time-series storage, Bayesian rating persistence, and server-sent events for live updates.

Reddit · Google Trends · OpenAlex · PubMed · ClinicalTrials.gov · FDA · arXiv · Yahoo Finance

↓

Scrapers → ln(x+1) Transform → Weighted Composite → EMA Smoothing → Bayesian Update → Advancement Index

Scoring Engine

Three-layer composite scoring

Every compound score is constructed through three sequential layers. Layer 1 combines six raw signal dimensions using fixed weights. Layer 2 applies exponential moving average smoothing to filter transient noise. Layer 3 blends the EMA with a Bayesian rating that encodes longer-term trajectory confidence.

Raw Score = Σ(dimension_i × weight_i)
EMA_t = EMA_{t−1} + α × (Raw_t − EMA_{t−1})
α = 1 − 2^{−(Δt / 12h)}
Advancement Index = 0.7 × EMA + 0.3 × Bayesian μ

All raw counts undergo a ln(x + 1) transform before normalization. This stabilizes variance across power-law distributions: a compound with 10,000 papers and one with 10 land on a comparable logarithmic scale (ln 10,001 ≈ 9.2 versus ln 11 ≈ 2.4), a gap of roughly 4x rather than 1,000x.

Dimensions

Six independent signal vectors

Each dimension is scored independently on a 0-100 scale, then combined using fixed weights. Research velocity carries the highest weight because clinical and academic evidence is the least gameable signal.

Dimension	Weight
Research Velocity	30%
Social Signal	25%
Search Momentum	15%
Regulatory Signal	15%
Sentiment	10%
Market Signal	5%
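The Layer 1 weighted sum can be sketched as follows. This is a minimal illustration that assumes each dimension has already been scored on the 0-100 scale; the dictionary keys and the function name are hypothetical, not identifiers taken from the platform.

```python
# Fixed dimension weights as listed above (they sum to 1.0).
WEIGHTS = {
    "research_velocity": 0.30,
    "social_signal": 0.25,
    "search_momentum": 0.15,
    "regulatory_signal": 0.15,
    "sentiment": 0.10,
    "market_signal": 0.05,
}


def raw_score(dimensions: dict[str, float]) -> float:
    """Weighted sum of six 0-100 dimension scores (hypothetical helper)."""
    missing = WEIGHTS.keys() - dimensions.keys()
    if missing:
        raise KeyError(f"missing dimensions: {sorted(missing)}")
    return sum(dimensions[name] * weight for name, weight in WEIGHTS.items())


# Made-up dimension scores, for illustration only.
example = {
    "research_velocity": 80.0,
    "social_signal": 60.0,
    "search_momentum": 40.0,
    "regulatory_signal": 60.0,
    "sentiment": 50.0,
    "market_signal": 20.0,
}
print(round(raw_score(example), 2))
```

Because the weights sum to 1.0, the raw score stays on the same 0-100 scale as its inputs.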

Social Signal (25%)

Aggregated from Reddit post volume, engagement, and subreddit spread across 65+ monitored communities. Volume and engagement each contribute 40%, and spread contributes 20%. Within engagement, each comment counts three times as much as one point of post score.

volume = ln_norm(posts_7d, scale=200)
engagement = ln_norm(score + comments × 3, scale=5000)
spread = ln_norm(subreddits, scale=15)
social = volume × 0.4 + engagement × 0.4 + spread × 0.2
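The four lines above translate almost directly into Python. In this sketch the body of `ln_norm` is an assumption (the log of the count divided by the log of the scale factor, capped at 100), and the function and parameter names are illustrative.

```python
import math


def ln_norm(count: float, scale: float) -> float:
    """Assumed form: ln(count + 1) rescaled so that `scale` maps to 100, capped at 100."""
    return min(100.0, 100.0 * math.log1p(max(count, 0.0)) / math.log1p(scale))


def social_signal(posts_7d: int, score: int, comments: int, subreddits: int) -> float:
    """Combine volume, engagement, and spread with the 40/40/20 split."""
    volume = ln_norm(posts_7d, scale=200)
    engagement = ln_norm(score + comments * 3, scale=5000)
    spread = ln_norm(subreddits, scale=15)
    return volume * 0.4 + engagement * 0.4 + spread * 0.2


# Made-up counts, for illustration only.
print(round(social_signal(posts_7d=40, score=900, comments=300, subreddits=6), 1))
```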

Search Momentum (15%)

Google Trends interest score over 90-day windows. Composite of average interest (50%), momentum percentage change (30%), and count of rising related queries (20%).

Research Velocity (30%)

Papers, citations, and clinical trial activity across OpenAlex, PubMed, and ClinicalTrials.gov. Recent papers carry the highest sub-weight (30%), followed by actively recruiting trials (25%), which represent current investment in the compound. Total papers, citations, and total trials each contribute 15%.

papers = ln_norm(openalex + pubmed, scale=30000) × 0.15
recent = ln_norm(recent_papers, scale=5000) × 0.30
citations = ln_norm(citations, scale=50000) × 0.15
trials = ln_norm(trials_total, scale=500) × 0.15
active = ln_norm(recruiting, scale=50) × 0.25

Regulatory Signal (15%)

FDA approval status provides a +60 point base. Additional signal comes from label count (10%), completed-trial phase progression (15%), and the actively recruiting pipeline (15%).

Sentiment (10%)

Keyword-based sentiment analysis on Reddit discourse. Positive and negative keyword dictionaries produce a score from -1 to +1, mapped linearly to 0-100.

sentiment = max(0, min(100, (avg + 1) × 50))
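A minimal sketch of the keyword approach follows. The two keyword sets, the per-post polarity formula, and the neutral default for an empty input are all assumptions made for illustration; only the final mapping line comes from the formula above.

```python
# Hypothetical keyword dictionaries; the real lists are not given in this document.
POSITIVE = {"effective", "improved", "great"}
NEGATIVE = {"nausea", "worse", "scam"}


def post_polarity(text: str) -> float:
    """Score one post from -1 to +1 by counting keyword hits (assumed formula)."""
    words = text.lower().split()
    pos = sum(word in POSITIVE for word in words)
    neg = sum(word in NEGATIVE for word in words)
    total = pos + neg
    return 0.0 if total == 0 else (pos - neg) / total


def sentiment_score(posts: list[str]) -> float:
    """Average the per-post polarity, then map [-1, +1] linearly onto [0, 100]."""
    if not posts:
        return 50.0  # assumption: no data is treated as neutral
    avg = sum(post_polarity(post) for post in posts) / len(posts)
    return max(0.0, min(100.0, (avg + 1) * 50))


print(sentiment_score(["great results and improved sleep", "nausea got worse"]))
```

Whitespace tokenization keeps the sketch short; punctuation attached to a word would prevent a keyword match here.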

Market Signal (5%)

Stock price momentum and trading volume for companies with exposure to each compound. Sourced from Yahoo Finance. Weighted lowest because market signals follow advancement rather than leading it.

Normalization

Logarithmic scale factors

Every raw count passes through ln(x + 1) before normalization. The scale factor defines what count maps to 100 points on the logarithmic curve. Counts beyond the scale factor are capped at 100.

Metric	Scale Factor	Meaning
Reddit posts (7d)	200	200 posts/week = max score
Reddit engagement	5,000	score + comments × 3
Subreddit spread	15	15 unique subreddits
Total papers	30,000	OpenAlex + PubMed combined
Recent papers	5,000	Last 2 years
Citations	50,000	Total citation count
Clinical trials	500	Total registered trials
Recruiting trials	50	Actively recruiting
FDA labels	10	Approved label count
Completed trials	100	Phase progression signal
Stock volume	50,000,000	Daily trading volume
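The section describes the curve but not its exact formula. One reading consistent with "the scale factor maps to 100, larger counts are capped" is sketched below; the formula and the dictionary keys are assumptions, while the scale values are copied from the table.

```python
import math

# Scale factors from the table above; the key names are hypothetical.
SCALE_FACTORS = {
    "reddit_posts_7d": 200,
    "reddit_engagement": 5_000,
    "subreddit_spread": 15,
    "total_papers": 30_000,
    "recent_papers": 5_000,
    "citations": 50_000,
    "clinical_trials": 500,
    "recruiting_trials": 50,
    "fda_labels": 10,
    "completed_trials": 100,
    "stock_volume": 50_000_000,
}


def normalize(metric: str, count: float) -> float:
    """Assumed curve: 100 * ln(count + 1) / ln(scale + 1), capped at 100."""
    scale = SCALE_FACTORS[metric]
    return min(100.0, 100.0 * math.log1p(max(count, 0.0)) / math.log1p(scale))


for posts in (0, 10, 200, 1_000):
    print(posts, round(normalize("reddit_posts_7d", posts), 1))
```

On this assumed curve, 10 posts a week already earns roughly 45 of the 100 available points, which shows how strongly the logarithm compresses large counts.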

EMA Smoothing

Exponential moving average with 12-hour half-life

Raw scores are noisy. A single viral Reddit post can move a compound's social signal by 40 points in a day. The EMA layer smooths this by weighting recent observations exponentially, with a half-life of 12 hours.

α = 1 − 2^{−(Δt / 12h)}
α = clamp(α, 0.001, 1.0)
EMA_new = EMA_old + α × (Raw − EMA_old)

When 12 hours separate two updates, the new observation receives ~50% weight in the EMA. At 24 hours the weight is ~75%, and at 48 hours ~94%. This means genuine sustained trends propagate within 1-2 days, while transient spikes are damped within hours.
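The update rule is small enough to check numerically. The sketch below follows the three formula lines above, with Δt measured in hours; the function names are illustrative.

```python
HALF_LIFE_HOURS = 12.0


def ema_alpha(dt_hours: float) -> float:
    """Smoothing factor for an update arriving dt_hours after the previous one."""
    alpha = 1.0 - 2.0 ** (-(dt_hours / HALF_LIFE_HOURS))
    return max(0.001, min(1.0, alpha))  # clamp(alpha, 0.001, 1.0)


def ema_update(ema_old: float, raw: float, dt_hours: float) -> float:
    """Move the EMA toward the new raw score by a fraction alpha of the gap."""
    return ema_old + ema_alpha(dt_hours) * (raw - ema_old)


for hours in (12, 24, 48, 72):
    print(hours, round(ema_alpha(hours), 4))
# prints: 12 0.5, 24 0.75, 48 0.9375, 72 0.9844

# Two 6-hour updates toward the same raw score equal one 12-hour update.
two_steps = ema_update(ema_update(40.0, 80.0, 6), 80.0, 6)
print(round(two_steps, 6), ema_update(40.0, 80.0, 12))
```

Because α depends on elapsed time rather than on the number of updates, the smoothing behaves the same whether scores arrive every few minutes or every few hours.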

Bayesian Layer

Encoding compound trajectory as a probability distribution

Each compound is modeled as a Bayesian rating with a mean (μ) and uncertainty (σ), using the OpenSkill library's Plackett-Luce model. New compounds start with high uncertainty (μ = 25, σ = 8.33). As observations accumulate, the model tightens its confidence interval.

Each scoring cycle compares the compound's current raw score against its previous score. Score increases are treated as "wins" against a baseline, decreases as "losses." The Plackett-Luce model updates μ and σ accordingly.

Advancement Index = 0.7 × EMA + 0.3 × μ_bayesian

A compound that scores consistently high earns a stable prior that resists transient drops. A compound riding a single spike will see its Bayesian score remain conservative until sustained evidence confirms the trend. State is persisted to disk between server restarts.
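The two pieces of this layer that are fully specified here are the win/loss framing and the final blend. The sketch below covers only those; the rating update itself is delegated to OpenSkill and is not reproduced. The function names and the handling of an unchanged score are assumptions.

```python
def outcome(previous_raw: float, current_raw: float) -> str:
    """Classify one scoring cycle as a win or loss against the previous score."""
    if current_raw > previous_raw:
        return "win"
    if current_raw < previous_raw:
        return "loss"
    return "draw"  # assumption: the document does not say how ties are treated


def advancement_index(ema: float, mu: float) -> float:
    """Blend the smoothed score with the Bayesian mean (70% / 30%)."""
    return 0.7 * ema + 0.3 * mu


# Made-up values: a spike lifts the EMA from 40 to 90 while mu stays at 30.
before = advancement_index(ema=40.0, mu=30.0)
after = advancement_index(ema=90.0, mu=30.0)
print(outcome(40.0, 90.0), round(before, 1), round(after, 1))
```

In this example a 50-point jump in the EMA moves the index by 35 points, because the 30% share held by μ does not react until the rating itself is updated.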

Data Sources

Eight independent public sources

Source	What We Collect	Method
Reddit	Post volume, score, comments, sentiment, subreddit spread across 65+ communities (peptides, Semaglutide, Biohackers, longevity, DrugNerds, etc.)	Apify Reddit Scraper Pro, batched 10 subreddits/request
Google Trends	Average interest (0-100), momentum %, peak interest, rising queries, regional breakdown	pytrends library, 90-day windows
OpenAlex	Total papers, recent papers (last 2 years), citation count, top titles	REST API, sorted by citation count
PubMed	Total papers, recent papers, paper titles, publication dates	NCBI E-utilities (esearch + esummary)
ClinicalTrials.gov	Total trials, recruiting trials, completed trials, recent trial details	v2 REST API, paginated
FDA / openFDA	Approval status, label count, brand names	Drug label API
arXiv	Preprint count, recent preprints, titles	Atom feed API
Yahoo Finance	Stock price, change %, volume, 52-week range for pharma companies mapped to compounds	Chart API (free, no auth), US (.NYQ) and India (.NS) tickers

Reddit communities monitored

Tier 1 (core): peptides, Peptides, PeptideScience, researchchemicals. Tier 2 (compound-specific): Semaglutide, Ozempic, Wegovy, tirzepatide, Mounjaro. Tier 3 (health): Biohackers, longevity, Nootropics, PEDs, fitness, diabetes. Tier 4 (research): DrugNerds, medicine, pharmacology, neuroscience. Total: 65+ subreddits.

Stock ticker mappings

40+ global pharma companies are mapped to compounds. US tickers include NVO (Novo Nordisk), LLY (Eli Lilly), AZN (AstraZeneca), PFE (Pfizer), AMGN (Amgen). Indian NSE tickers include SUNPHARMA.NS, DRREDDY.NS, CIPLA.NS, ZYDUSLIFE.NS, LUPIN.NS, GLENMARK.NS for post-patent generic manufacturers.

Refresh Cycles

Multi-layer real-time data pipeline

Data freshness varies by source. Social signals refresh most frequently because they move fastest. Research data refreshes less often because papers and trials update on longer timescales.

Cycle	Interval	What it does
Full Rescore	4h	All compounds re-scored via cron. Reddit + all APIs.
History Snapshots	15min	SQLite time-series snapshots for trend charts.
SSE Broadcast	30s	Server-sent events push live scores to connected dashboards.
Frontend Refresh	5min	GLP market views auto-refresh stock tickers and stats.
Reddit Scan	5min	Background auto-refresh of social signals.
Google Trends	1h	Search momentum and rising query refresh.

GLP Market Tracking

US and India GLP-1 market intelligence

Dedicated market views track the GLP-1 therapeutic landscape in the United States and India. Both views pull live stock data from Yahoo Finance, map compounds to companies and brands, and display macro health statistics.

US Market

Tracks 11 GLP-1 compounds with FDA status, brand names, originator companies, and stock performance. Top-line stats: CDC obesity rate (40.3%, 100M+ adults), diabetes prevalence (12%, 37M adults), annual economic burden ($260B), GLP-1 market size ($54B in 2024), and top combined revenue ($41.2B for semaglutide + tirzepatide).

Key tickers: NVO (Novo Nordisk, semaglutide), LLY (Eli Lilly, tirzepatide/retatrutide), AZN (AstraZeneca), ALT (Altimmune, pemvidutide).

India Market

Following semaglutide's patent expiry on March 20, 2026, 40+ Indian companies launched 50+ generic brands. The India view tracks originator products alongside domestic generics, with live NSE stock tickers for companies like Dr. Reddy's, Sun Pharma, Cipla, Zydus Lifesciences, Lupin, and Glenmark.

India stats: obesity rate (24%, 350M+ overweight), diabetes prevalence (89.8M, 10.5%), annual burden ($29B), GLP-1 market ($118M, projected $530M by 2030), and 40+ generic manufacturers post-patent.

Key generic brands (India, post-patent)

Company	Brand	NSE Ticker
Dr. Reddy's	Obeda	DRREDDY.NS
Sun Pharma	Noveltreat	SUNPHARMA.NS
Zydus Lifesciences	Semaglyn, Alterme	ZYDUSLIFE.NS
Cipla	Yurpeak (tirzepatide)	CIPLA.NS
Glenmark	GLIPIQ, Lirafit	GLENMARK.NS
Natco Pharma	Semanat	NATCOPHARMA.NS
Lupin	Semanext	LUPIN.NS
Alkem	Semasize	ALKEM.NS
Torrent	Sembolic	TORNTPHARM.NS
Mankind Pharma	Samakind	MANKIND.NS

Labels & Alerts

Signal classification and automated alerts

Each compound receives a signal label based on its Advancement Index and search momentum. Labels are descriptive, not predictive.

Label	Criteria
Surging	Score ≥ 80, or score ≥ 70 with momentum > 5%
Rising	Score ≥ 55, or score ≥ 40 with momentum > 10%
Stable	Score ≥ 30
Cooling	Score ≥ 15
Dormant	Score < 15
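The criteria overlap (a score of 85 also satisfies "Score ≥ 30"), so they only make sense if checked from the top down. The sketch below assumes first-match-wins ordering; the function name is illustrative.

```python
def signal_label(score: float, momentum_pct: float) -> str:
    """Return the first label whose criteria match, checking from Surging down."""
    if score >= 80 or (score >= 70 and momentum_pct > 5):
        return "Surging"
    if score >= 55 or (score >= 40 and momentum_pct > 10):
        return "Rising"
    if score >= 30:
        return "Stable"
    if score >= 15:
        return "Cooling"
    return "Dormant"


# Made-up (score, momentum %) pairs, for illustration only.
for score, momentum in [(72, 6), (72, 2), (45, 12), (45, 0), (20, 0), (5, 50)]:
    print(score, momentum, signal_label(score, momentum))
```

Momentum can promote a compound by one tier but never rescues a low score: a compound at 5 stays Dormant even with 50% search growth.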

Automated alert rules

Trigger	Alert
Social buzz ≥ 80	Exceptional social media activity
Search +30%	Search interest surging
Search -20%	Search interest declining
Recruiting ≥ 3	Multiple trials actively recruiting
FDA approved	Regulatory milestone reached
Posts ≥ 20/week	High Reddit activity
5+ subreddits	Cross-community discussion
Sentiment < -0.3	Negative sentiment, possible safety concerns
50+ recent papers	Active research surge
5,000+ citations	Highly cited compound
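These rules can be expressed as a list of (message, predicate) pairs evaluated against a compound snapshot. In this sketch the field names are partly borrowed from the history schema and partly hypothetical (`subreddits`, `sentiment_avg`, `recent_papers`, `citations`), and the search thresholds are read as "at least +30%" and "at most -20%".

```python
# Each rule pairs an alert message with a test on a compound snapshot dict.
ALERT_RULES = [
    ("Exceptional social media activity", lambda c: c.get("social_buzz", 0) >= 80),
    ("Search interest surging", lambda c: c.get("google_momentum_pct", 0) >= 30),
    ("Search interest declining", lambda c: c.get("google_momentum_pct", 0) <= -20),
    ("Multiple trials actively recruiting", lambda c: c.get("trials_recruiting", 0) >= 3),
    ("Regulatory milestone reached", lambda c: bool(c.get("fda_approved", 0))),
    ("High Reddit activity", lambda c: c.get("reddit_posts_7d", 0) >= 20),
    ("Cross-community discussion", lambda c: c.get("subreddits", 0) >= 5),
    ("Negative sentiment, possible safety concerns", lambda c: c.get("sentiment_avg", 0.0) < -0.3),
    ("Active research surge", lambda c: c.get("recent_papers", 0) >= 50),
    ("Highly cited compound", lambda c: c.get("citations", 0) >= 5000),
]


def active_alerts(compound: dict) -> list[str]:
    """Return the messages of every rule the snapshot triggers, in rule order."""
    return [message for message, rule in ALERT_RULES if rule(compound)]


# Made-up snapshot, for illustration only.
sample = {
    "social_buzz": 85, "google_momentum_pct": -25, "trials_recruiting": 1,
    "fda_approved": 1, "reddit_posts_7d": 40, "subreddits": 2,
    "sentiment_avg": -0.5, "recent_papers": 10, "citations": 6000,
}
print(active_alerts(sample))
```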

Time-Series History

SQLite append-only signal history

Every 15 minutes, the system snapshots all compound scores into an SQLite database. This produces ~96 data points per compound per day, enabling trend visualization and historical analysis.

Column	Type	Description
slug	TEXT	Compound identifier
timestamp	TEXT	ISO 8601 timestamp
signal_score	REAL	Raw composite score
advancement_index	REAL	Final blended score
social_buzz	REAL	Social dimension (0-100)
search_momentum	REAL	Search dimension (0-100)
research_velocity	REAL	Research dimension (0-100)
sentiment	REAL	Sentiment dimension (0-100)
mu	REAL	Bayesian mean
sigma	REAL	Bayesian uncertainty
reddit_posts_7d	INTEGER	Reddit post count
google_interest	REAL	Google Trends score
google_momentum_pct	REAL	Search momentum %
openalex_recent	INTEGER	Recent paper count
trials_recruiting	INTEGER	Active recruiting trials
fda_approved	INTEGER	FDA approval flag
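The column list maps onto a single SQLite table. The sketch below uses Python's built-in `sqlite3` module with an in-memory database; the table name `signal_history`, the index, and the inserted values are assumptions made for illustration.

```python
import sqlite3

# Columns and types follow the table above; table and index names are hypothetical.
SCHEMA = """
CREATE TABLE IF NOT EXISTS signal_history (
    slug                TEXT,
    timestamp           TEXT,
    signal_score        REAL,
    advancement_index   REAL,
    social_buzz         REAL,
    search_momentum     REAL,
    research_velocity   REAL,
    sentiment           REAL,
    mu                  REAL,
    sigma               REAL,
    reddit_posts_7d     INTEGER,
    google_interest     REAL,
    google_momentum_pct REAL,
    openalex_recent     INTEGER,
    trials_recruiting   INTEGER,
    fda_approved        INTEGER
);
CREATE INDEX IF NOT EXISTS idx_history_slug_time ON signal_history (slug, timestamp);
"""

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)

# Append two made-up snapshots, deliberately out of chronological order.
conn.executemany(
    "INSERT INTO signal_history (slug, timestamp, signal_score, advancement_index) "
    "VALUES (?, ?, ?, ?)",
    [
        ("semaglutide", "2026-01-01T00:15:00+00:00", 71.2, 68.9),
        ("semaglutide", "2026-01-01T00:00:00+00:00", 70.8, 68.7),
    ],
)

# ISO 8601 strings with a fixed offset sort chronologically as plain text.
rows = conn.execute(
    "SELECT timestamp, advancement_index FROM signal_history "
    "WHERE slug = ? ORDER BY timestamp",
    ("semaglutide",),
).fetchall()
print(rows)
```

Storing timestamps as ISO 8601 text keeps range queries simple, since lexical order matches time order as long as every row uses the same offset.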

API Reference

REST endpoints

All endpoints are served from the FastAPI application on port 8420. CORS is enabled. Responses are JSON unless otherwise noted.

Endpoint	Method	Description
/api/signals/cache	GET	Full ranked compound cache, sorted by advancement_index
/api/signals/compound/{slug}	GET	Single compound detail. Pass ?refresh=true to force live rescore.
/api/signals/top?limit=20	GET	Top N movers with score, label, momentum, alerts
/api/signals/history/{slug}?days=30	GET	Time-series history from SQLite
/api/signals/alerts	GET	All active alerts across all compounds
/api/signals/refresh	POST	Trigger full rescan and rescore (10-min timeout)
/api/glp/market?region=us	GET	GLP-1 market data, stats, and live stock tickers. Regions: us, india
/api/signals/stream	GET	Server-sent events stream for live dashboard updates (30s interval)
/api/waitlist	POST	Email subscription. Stores locally, sends confirmation via Resend.
/api/contact	POST	Contact form. Stores locally, notifies via Resend.
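A small standard-library client shows how the GET endpoints and their query parameters fit together. The host in `BASE_URL` is an assumption (only the port is documented), the helper names are illustrative, and the response shape is not specified here, so the sketch returns the decoded JSON as-is.

```python
import json
from urllib.parse import quote, urlencode
from urllib.request import urlopen

BASE_URL = "http://localhost:8420"  # assumption: host is hypothetical, port is documented


def endpoint_url(path: str, **params: object) -> str:
    """Build a full URL for an API path, appending query parameters if any."""
    query = urlencode(params)
    return f"{BASE_URL}{path}" + (f"?{query}" if query else "")


def history_url(slug: str, days: int = 30) -> str:
    """URL for /api/signals/history/{slug}?days=N, with the slug percent-encoded."""
    return endpoint_url(f"/api/signals/history/{quote(slug, safe='')}", days=days)


def get_json(url: str, timeout: float = 10.0) -> object:
    """Fetch a URL and decode the JSON body (requires a running server)."""
    with urlopen(url, timeout=timeout) as response:
        return json.load(response)


print(endpoint_url("/api/signals/top", limit=20))
print(history_url("bpc-157", days=7))
print(endpoint_url("/api/glp/market", region="india"))
# With the server running: top = get_json(endpoint_url("/api/signals/top", limit=20))
```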

Compound Catalog

Self-expanding compound universe

The system does not maintain a fixed list. It continuously scans social and research sources for new compound mentions, validates them against academic databases, and adds them to the catalog automatically. It started with 73 seed compounds and now tracks 190+ across 20 categories.

Category	Examples
GLP-1 / Weight Loss	Semaglutide, Tirzepatide, Retatrutide, Survodutide, Cagrisema, Orforglipron
Growth Hormone	Tesamorelin, MK-677, CJC-1295, Ipamorelin, Sermorelin, Hexarelin
Healing & Repair	BPC-157, TB-500, TB4-FRAG
Nootropic	Semax, Selank, Dihexa, P21, Cerebrolysin, Cortexin
Longevity	SS-31, Rapamycin, NAD+, Humanin, MOTS-c, FOXO4-DRI, Epitalon
Skin, Hair & Aging	Melanotan-2, GHK-Cu, SNAP-8, Argireline, Matrixyl
Immune	Thymosin Alpha-1, LL-37, KPV, Larazotide
Muscle & Performance	Follistatin-344, ACE-031, AOD-9604, IGF-1 LR3
SARMs	Enclomiphene, Ostarine, RAD-140, LGD-4033, Cardarine
Bioregulators	Thymalin, Thymogen, Ovagen, Vesugen, Chelohart (Khavinson peptides)
Sexual Health	PT-141, Kisspeptin
Clinical Pipeline	VX-548, ER-100
Nucleotides	NAD+, NMN, NR (Dinucleotide, Mononucleotide)
Antimicrobial	LL-37, Thymosin peptides

Fuzzy matching

Compounds are discovered via fuzzy string matching against social mentions. Thresholds: basic similarity ≥ 85%, token-flexible ≥ 80%, phonetic similarity ≥ 0.9. This catches misspellings (e.g., "semiglutide" → semaglutide) and brand name references.

Not medical advice. Not investment advice. The Advancement Index is provided for informational and research purposes only. Signal data reflects observable public activity and does not constitute a recommendation to buy, sell, or use any compound. Data sources are public APIs subject to their own terms of service.
