Ember
Eval

How well is Ember doing? Daily thumbs from Telegram, rubric scores from the LLM judge, and backtest performance against curated past events.

Daily ratings

Thumbs from Telegram

No ratings yet. Once the eval harness is wired through Hermes, daily thumbs land here.

Rubric score

LLM-as-judge, 5 dimensions

2.3 / 5

Backtest

Curated historical events

Curate ~10–15 significant past events to measure whether Ember would have caught them.

Rating trend (last 30 days)

Daily positive % from Telegram thumbs

Ratings will accumulate here as Jim rates daily briefings via Telegram.
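
A minimal sketch of how the daily positive % could be computed once ratings exist. The input shape, a list of (day, thumbs-up flag) pairs, and the function name `daily_positive_pct` are assumptions for illustration; the sample values are made up and are not real Ember ratings.

```python
from collections import defaultdict
from datetime import date


def daily_positive_pct(ratings):
    """Return {day: percent of thumbs-up} from (day, is_thumbs_up) pairs.

    Hypothetical helper: the real rating storage format is not specified.
    """
    ups = defaultdict(int)
    totals = defaultdict(int)
    for day, is_up in ratings:
        totals[day] += 1
        if is_up:
            ups[day] += 1
    # Days with no ratings never appear in totals, so no division by zero.
    return {day: 100.0 * ups[day] / totals[day] for day in sorted(totals)}


# Made-up sample data, for illustration only.
sample = [
    (date(2024, 1, 1), True),
    (date(2024, 1, 1), False),
    (date(2024, 1, 2), True),
]
trend = daily_positive_pct(sample)
```

Days without any rating are simply absent from the result, which lets the chart show a gap instead of a misleading 0%.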

Rubric breakdown

Mean LLM-judge score across 5 dimensions
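
One way the breakdown and the headline score could be derived, as a sketch under assumptions: each judged briefing yields one score per dimension, the per-dimension value is the mean over briefings, and the headline is the mean of those five values. The dimension names and scores below are placeholders, since the page does not list the actual rubric.

```python
from statistics import mean


def rubric_breakdown(judgements):
    """Return (per-dimension means, overall mean) from a list of
    {dimension: score} dicts, one dict per judged briefing.

    Hypothetical helper: assumes every dict has the same dimensions.
    """
    if not judgements:
        return {}, None
    dimensions = list(judgements[0].keys())
    per_dimension = {d: mean(j[d] for j in judgements) for d in dimensions}
    overall = mean(per_dimension.values())
    return per_dimension, overall


# Placeholder dimension names and made-up scores on a 5-point scale.
judgements = [
    {"relevance": 2, "accuracy": 3, "timeliness": 2, "clarity": 2, "coverage": 3},
    {"relevance": 4, "accuracy": 3, "timeliness": 2, "clarity": 2, "coverage": 1},
]
per_dimension, overall = rubric_breakdown(judgements)
```

Averaging per dimension first keeps each dimension equally weighted in the headline number, even if some dimensions are later scored on only a subset of briefings.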