weatherbots

mirror of https://github.com/JHenzi/weatherbots synced 2026-07-28 06:33:35 +00:00


Jupyter Notebook 93.9%
Python 5.1%
HTML 0.8%
Shell 0.2%


J Henzi 555b4613dd continuing		2026-07-27 21:13:04 -04:00
.cursor/plans	it's not real	2026-02-03 15:29:26 -05:00
__pycache__	continuing	2026-07-27 00:34:16 -04:00
bandit	it's not real	2026-03-14 12:08:37 -04:00
CS542 Common Task Report a6dd308f10b84adea1ea460a38d5040d	Updated README	2024-06-05 21:00:53 +05:30
Data	continuing	2026-07-27 21:13:04 -04:00
db	it's not real	2026-02-17 07:35:05 -05:00
documentation	it's not real	2026-03-22 23:52:56 -04:00
ops	it's not real	2026-03-14 23:58:48 -04:00
pgdata	continuing	2026-07-27 00:34:16 -04:00
scripts	continuing	2026-07-27 00:34:16 -04:00
tests	it's not real	2026-03-02 18:17:09 -05:00
.cursorrules	it's not real	2026-01-26 16:41:55 -05:00
.dockerignore	it's not real	2026-01-23 21:58:06 -05:00
.DS_Store	continuing	2026-07-26 23:54:21 -04:00
.env.example	it's not real	2026-03-03 10:24:56 -05:00
.gitattributes	Initial commit	2024-02-28 12:42:41 -05:00
.gitignore	it's not real	2026-02-02 20:17:09 -05:00
AdaptiveLearningEngine.md	it's not real	2026-01-24 10:58:27 -05:00
AUDIT_2026-07-11.md	continuing	2026-07-11 21:58:07 -04:00
bandit_update.py	it's not real	2026-02-17 07:35:05 -05:00
calibrate_sources.py	it's not real	2026-02-20 22:30:31 -05:00
CHANGELOG.md	it's not real	2026-03-15 00:34:52 -04:00
daily_metrics.py	it's not real	2026-01-29 23:17:46 -05:00
daily_prediction.py	it's not real	2026-01-24 10:58:27 -05:00
data_fetcher_new.ipynb	Final Code and Repo Cleanup	2024-06-05 20:55:06 +05:30
data_lstm.ipynb	Final Code and Repo Cleanup	2024-06-05 20:55:06 +05:30
db.py	it's not real	2026-02-17 07:35:05 -05:00
docker-compose.yml	it's not real	2026-03-14 11:45:18 -04:00
Dockerfile	continuing	2026-07-17 18:51:12 -04:00
exit_manager.py	it's not real	2026-03-14 23:58:48 -04:00
ForecasterLearningImprovements.md	it's not real	2026-01-23 22:15:24 -05:00
Forecasts.png	it's not real	2026-02-02 19:28:48 -05:00
ForecastsVsActualsChart.png	it's not real	2026-02-02 19:07:01 -05:00
hourly_pulse.py	it's not real	2026-01-24 10:58:27 -05:00
intraday_pulse.py	continuing	2026-07-11 21:58:07 -04:00
Kalshi-Recent-Activity-Pranav.csv	Backup cuz im scared	2024-04-02 14:34:07 -04:00
kalshi_trader.py	continuing	2026-07-27 00:34:16 -04:00
log_market_prices.py	it's not real	2026-03-03 10:24:56 -05:00
morning_trader.py	it's not real	2026-03-14 23:58:48 -04:00
normalize_timestamps.py	it's not real	2026-01-24 21:41:51 -05:00
pgoyanka.zip	Final Code and Repo Cleanup	2024-06-05 20:55:06 +05:30
prediction_mae.py	it's not real	2026-02-02 20:10:02 -05:00
predictions_final.csv	it's not real	2026-01-23 19:11:47 -05:00
predictions_final_all_days_with_actual.csv	Final Code and Repo Cleanup	2024-06-05 20:55:06 +05:30
ProfitLossOverTime.png	Backup cuz im scared	2024-04-02 14:34:07 -04:00
README.md	it's not real	2026-03-02 19:51:11 -05:00
requirements.txt	it's not real	2026-01-29 12:56:20 -05:00
RLSYS.md	starting	2026-01-23 15:36:54 -05:00
run_daily.py	it's not real	2026-02-20 22:57:44 -05:00
SECURITY.md	it's not real	2026-02-02 20:17:09 -05:00
settle_eval.py	it's not real	2026-01-24 10:58:27 -05:00
ToDo.md	it's not real	2026-01-25 20:33:44 -05:00
train_models.py	it's not real	2026-01-24 10:58:27 -05:00
truth_engine.py	it's not real	2026-02-20 22:30:31 -05:00
update_city_metadata.py	it's not real	2026-03-15 15:23:16 -04:00
VotingModel.md	it's not real	2026-01-28 21:31:53 -05:00

README.md

How To Trade Weather on Kalshi — A Weather Prediction Market Bot

If you want to understand how weather prediction markets work, how traders try to find an edge in temperature contracts, or how to automate a weather-trading strategy, this repo is built to answer exactly those questions.

Weather Trader is an automated system for Kalshi daily high-temperature markets. It pulls forecasts from eight weather sources, scores them against NWS settlement truth, learns which sources and prediction modes are working best, and only takes trades that clear configurable guardrails.

In plain English: this is a working example of how someone can approach weather markets with data, discipline, and automation instead of guessing.

Why People Use This Repo

Learn how a real weather prediction market workflow is built end to end.
Study how traders can look for edge in data-rich, rules-based markets instead of relying on gut feel.
Run a live dashboard, a paper-trading loop, and a fully automated pipeline on your own machine.
Explore "how to bet on weather" in a technical, measurable way: forecasts in, probabilities out, orders gated by risk rules.

Important

This is research and automation code, not a promise of profit. Prediction markets are risky. The safest way to start is KALSHI_ENV=demo with WT_SEND_ORDERS=false.

Quick Start in Demo Mode

cp .env.example .env      # fill in KALSHI_API_KEY_ID + KALSHI_PRIVATE_KEY_PATH
docker compose up -d --build
open http://localhost:8080  # live dashboard

Minimum required secrets: KALSHI_API_KEY_ID, KALSHI_PRIVATE_KEY_PATH, KALSHI_ENV. Recommended first run: keep KALSHI_ENV=demo and WT_SEND_ORDERS=false until you trust the pipeline.

Important

Setup guides: Docker setup · Operational runbook · Environment variables

Never commit .env or private keys. See SECURITY.md if secrets were exposed.

Without Docker: pip install -r requirements.txt, configure .env, run scripts manually. Scheduling is up to you.

What It Does

Multi-source weather forecast aggregation

Pulls daily high-temperature forecasts from Open-Meteo, Visual Crossing, Tomorrow.io, WeatherAPI, OpenWeatherMap, Pirate Weather, NWS Weather.gov, and Google Weather. Each source is scored continuously against NWS CLI settlement truth and reweighted nightly via inverse-MAE weighting.
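Below is a minimal sketch of inverse-MAE weighting. It assumes each source's weight is simply 1/MAE, normalized so the weights sum to 1, and that the consensus is the weighted mean of whichever sources reported that day. The source keys and MAE values are hypothetical. calibrate_sources.py may also apply floors, smoothing, or exponents that are not shown here.

```python
from typing import Dict


def inverse_mae_weights(mae_by_source: Dict[str, float], eps: float = 1e-6) -> Dict[str, float]:
    """Weight each source by 1/MAE, normalized so the weights sum to 1."""
    # eps guards against division by zero for a (hypothetical) perfect source
    inv = {src: 1.0 / max(mae, eps) for src, mae in mae_by_source.items()}
    total = sum(inv.values())
    return {src: v / total for src, v in inv.items()}


def weighted_consensus(forecasts: Dict[str, float], weights: Dict[str, float]) -> float:
    """Weighted mean over sources that have both a forecast and a weight."""
    common = [s for s in forecasts if s in weights]
    if not common:
        raise ValueError("no source has both a forecast and a weight")
    # Renormalize over today's sources, since a provider may be missing
    wsum = sum(weights[s] for s in common)
    return sum(weights[s] * forecasts[s] for s in common) / wsum


# Hypothetical 14-day MAEs (°F) and today's forecasts (°F)
mae = {"visual_crossing": 1.0, "open_meteo": 2.0, "openweathermap": 4.0}
weights = inverse_mae_weights(mae)  # ≈ 0.571, 0.286, 0.143
consensus = weighted_consensus(
    {"visual_crossing": 80.0, "open_meteo": 82.0, "openweathermap": 86.0}, weights
)  # ≈ 81.43
```

A source with twice the MAE gets half the weight. A single badly calibrated provider can still pull the mean, which is why the spread and divergence guardrails further down exist.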

Kalshi weather market execution

Targets Kalshi daily high-temperature markets for New York (KXHIGHNY), Chicago (KXHIGHCHI), Austin (KXHIGHAUS), and Miami (KXHIGHMIA). Selects the highest-EV bucket using a live orderbook snapshot, with confidence/spread guardrails and per-city daily budget controls.
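One way to pick "the highest-EV bucket" is sketched below. It assumes the day's high follows a Normal(mean, sigma) distribution and that each bucket is an interval on the integer NWS reading, widened by 0.5°F on each side. It also assumes a YES contract costs its ask price and pays $1 if it settles YES. The bucket labels, prices, and normal model are illustrative assumptions. Fees, orderbook depth, and the budget controls are ignored.

```python
import math
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Bucket:
    label: str             # hypothetical label, e.g. "80-81"
    lo: Optional[float]    # lower edge in °F (None = open-ended)
    hi: Optional[float]    # upper edge in °F (None = open-ended)
    yes_ask: float         # cost in dollars (0-1) of one YES contract


def normal_cdf(x: float, mu: float, sigma: float) -> float:
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))


def bucket_probability(b: Bucket, mu: float, sigma: float) -> float:
    """P(lo <= high < hi) under a Normal(mu, sigma) forecast."""
    upper = 1.0 if b.hi is None else normal_cdf(b.hi, mu, sigma)
    lower = 0.0 if b.lo is None else normal_cdf(b.lo, mu, sigma)
    return max(0.0, upper - lower)


def best_bucket(buckets: List[Bucket], mu: float, sigma: float) -> Optional[Tuple[Bucket, float, float]]:
    """Return (bucket, probability, EV) with the highest positive EV per YES contract."""
    best = None
    for b in buckets:
        p = bucket_probability(b, mu, sigma)
        ev = p * 1.0 - b.yes_ask  # $1 payout if YES settles; fees ignored
        if ev > 0 and (best is None or ev > best[2]):
            best = (b, p, ev)
    return best  # None means no bucket is worth buying


# Made-up ladder: integer readings, so "80-81" covers 79.5 <= high < 81.5
buckets = [
    Bucket("<78", None, 77.5, 0.03),
    Bucket("78-79", 77.5, 79.5, 0.15),
    Bucket("80-81", 79.5, 81.5, 0.45),
    Bucket("82-83", 81.5, 83.5, 0.20),
    Bucket(">83", 83.5, None, 0.03),
]
choice = best_bucket(buckets, mu=80.6, sigma=1.2)
```

With these made-up prices, the 80-81 bucket has a modeled probability of roughly 59% against a 45¢ ask, which gives it the largest positive edge. Returning None when no bucket has positive EV keeps "no trade" as a first-class outcome.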

Contextual bandit model selection

A LinUCB contextual bandit (bandit/) learns which prediction mode performs best given real-time weather context (sky condition, cloud cover, provider disagreement, season). It selects between:

forecast — raw MAE-weighted ensemble
blend — bias-corrected forecast (adds per-city rolling cold-bias correction to the ensemble)

Context features include NWS/provider sky condition votes, cloud cover, spread, provider count, and city/date signals. Full decision and reward telemetry is logged for post-settlement learning.
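For readers new to LinUCB, here is a minimal disjoint LinUCB sketch using the two actions named above. It keeps one A matrix and one b vector per action, which are the quantities Data/bandit_state.json is described as persisting. The feature-vector layout, the reward definition, and the dictionary shape are hypothetical, and bandit/policy.py may differ on all three.

```python
from typing import Dict, List

import numpy as np


class LinUCB:
    """Disjoint LinUCB: one ridge-regression model (A, b) per action."""

    def __init__(self, arms: List[str], dim: int, alpha: float = 0.7):
        self.arms = list(arms)
        self.alpha = alpha  # exploration weight (cf. WT_BANDIT_ALPHA)
        self.A = {a: np.eye(dim) for a in self.arms}
        self.b = {a: np.zeros(dim) for a in self.arms}

    def scores(self, x: np.ndarray) -> Dict[str, float]:
        out = {}
        for a in self.arms:
            A_inv = np.linalg.inv(self.A[a])
            theta = A_inv @ self.b[a]  # ridge estimate of the reward model
            # Upper confidence bound: expected reward + alpha * uncertainty
            out[a] = float(theta @ x + self.alpha * np.sqrt(x @ A_inv @ x))
        return out

    def select(self, x: np.ndarray) -> str:
        s = self.scores(x)
        return max(self.arms, key=lambda a: s[a])  # ties go to the first arm listed

    def update(self, arm: str, x: np.ndarray, reward: float) -> None:
        self.A[arm] += np.outer(x, x)
        self.b[arm] += reward * x

    def to_dict(self) -> dict:
        """JSON-friendly snapshot of the policy state (shape is hypothetical)."""
        return {
            "alpha": self.alpha,
            "A": {a: self.A[a].tolist() for a in self.arms},
            "b": {a: self.b[a].tolist() for a in self.arms},
        }


# Hypothetical context: [bias term, cloud cover 0-1, spread / 3°F, clear-sky vote share]
x = np.array([1.0, 0.5, 0.3, 1.0])
policy = LinUCB(["forecast", "blend"], dim=len(x), alpha=0.7)
arm = policy.select(x)
# After settlement, feed back a reward, e.g. negative absolute error (assumption)
policy.update(arm, x, reward=-0.6)
```

Each action's uncertainty term shrinks as that action accumulates observations in similar contexts. This is how a LinUCB policy shifts from exploring `blend` to exploiting it once its rewards are consistently better.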

Self-calibrating bias correction

The system tracks signed prediction error per city over a 14-day rolling window. The resulting bias_correction_f (stored in Data/city_metadata.json) is applied by the blend action to correct the ensemble's systematic cold bias, typically +0.4–0.6°F across all four cities.
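The rolling correction can be sketched as the mean signed error over the last 14 calendar days of settled results. The sketch assumes signed error means actual minus predicted, so a positive value means the ensemble ran cold. It also assumes days without a settled actual are simply absent from the history. The history format and function names are hypothetical.

```python
from datetime import date, timedelta
from typing import Iterable, Tuple


def rolling_bias_correction(
    history: Iterable[Tuple[date, float, float]],  # (day, predicted_f, actual_f)
    as_of: date,
    window_days: int = 14,
) -> float:
    """Mean of (actual - predicted) over settled days in [as_of - window_days, as_of)."""
    start = as_of - timedelta(days=window_days)
    errors = [actual - predicted for day, predicted, actual in history if start <= day < as_of]
    # No settled days in the window: apply no correction
    return sum(errors) / len(errors) if errors else 0.0


def blend(forecast_f: float, bias_correction_f: float) -> float:
    """The blend action: raw ensemble forecast plus the per-city correction."""
    return forecast_f + bias_correction_f


today = date(2026, 3, 1)
history = [
    (date(2026, 2, 28), 80.0, 80.5),
    (date(2026, 2, 27), 70.0, 70.7),
    (date(2026, 2, 26), 60.0, 60.3),
    (date(2026, 2, 1), 50.0, 55.0),  # outside the 14-day window, ignored
]
bias = rolling_bias_correction(history, today)  # ≈ 0.5
adjusted = blend(81.0, bias)                    # ≈ 81.5
```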

Intraday refresh and trade gates

Forecasts refresh every hour. Trade decisions fire at 13:00 local time per city (NY/FL at 13:00 ET, IL/TX at 14:00 ET). A second intraday pulse runs at 14:00 ET for remaining cities.

Why that can matter in prediction markets

Weather markets are attractive because they settle on objective public data, update throughout the day, and often show visible disagreement between sources. This repo is built around that edge hypothesis: if you can measure forecast quality better than the market prices it, you may be able to find better entries than a casual trader.

Cities

City	Code	Kalshi Series	Lat/Lon
NYC (Central Park)	`ny`	KXHIGHNY	40.79736, -73.97785
Chicago (Midway)	`il`	KXHIGHCHI	41.78701, -87.77166
Austin (Bergstrom)	`tx`	KXHIGHAUS	30.14440, -97.66876
Miami	`fl`	KXHIGHMIA	25.77380, -80.19360

Architecture

intraday_pulse.py          ← fetch forecasts, run bandit, write predictions_latest.csv
    └── bandit/policy.py   ← LinUCB selects forecast vs blend
    └── bandit/modes.py    ← blend = forecast + bias_correction_f
    └── bandit/context.py  ← sky/condition voting from provider payloads
kalshi_trader.py           ← read predictions, select market bucket, place orders
calibrate_sources.py       ← nightly: update source weights from settled actuals
bandit_update.py           ← nightly: update bandit policy from settled rewards
update_city_metadata.py    ← nightly: update per-city MAE + rolling bias correction

Data Artifacts

File	Purpose
`Data/source_performance.csv`	Per-source signed and absolute error vs NWS actual
`Data/city_metadata.json`	Per-city MAE and rolling bias correction
`Data/weights.json`	Learned source weights (inverse-MAE)
`Data/eval_history.csv`	Per-trade outcome, bucket hit, realized P&L
`Data/bandit_decisions_history.csv`	Per-city bandit action selected and applied
`Data/bandit_rewards_history.csv`	Post-settlement reward signal per action
`Data/bandit_state.json`	Persisted LinUCB policy (A and b matrices)

Note

Full schemas: Data reference

Key Environment Variables

Variable	Default	Description
`KALSHI_ENV`	`demo`	`demo` or `prod`
`WT_ENV`	`demo`	Trading environment passed to trader
`WT_SEND_ORDERS`	`false`	Set `true` to place real orders
`WT_DAILY_BUDGET`	`50`	Max dollars per day across all cities
`WT_BANDIT_MODE`	`live`	Bandit mode: `off` / `shadow` / `canary` / `live`
`WT_BANDIT_ALPHA`	`0.7`	LinUCB exploration parameter
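The sketch below reads these variables using the defaults from the table. It assumes that anything other than an explicit truthy string leaves WT_SEND_ORDERS off, so a typo fails safe. The TraderConfig type, the accepted truthy spellings, and the validation rules are illustrative and are not the repo's actual loader.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional

BANDIT_MODES = {"off", "shadow", "canary", "live"}


def _truthy(value: str) -> bool:
    # Only an explicit "1"/"true"/"yes"/"on" enables order placement
    return value.strip().lower() in {"1", "true", "yes", "on"}


@dataclass(frozen=True)
class TraderConfig:
    kalshi_env: str
    wt_env: str
    send_orders: bool
    daily_budget: float
    bandit_mode: str
    bandit_alpha: float


def load_config(env: Optional[Mapping[str, str]] = None) -> TraderConfig:
    env = os.environ if env is None else env
    cfg = TraderConfig(
        kalshi_env=env.get("KALSHI_ENV", "demo"),
        wt_env=env.get("WT_ENV", "demo"),
        send_orders=_truthy(env.get("WT_SEND_ORDERS", "false")),
        daily_budget=float(env.get("WT_DAILY_BUDGET", "50")),
        bandit_mode=env.get("WT_BANDIT_MODE", "live"),
        bandit_alpha=float(env.get("WT_BANDIT_ALPHA", "0.7")),
    )
    if cfg.kalshi_env not in {"demo", "prod"}:
        raise ValueError(f"KALSHI_ENV must be demo or prod, got {cfg.kalshi_env!r}")
    if cfg.bandit_mode not in BANDIT_MODES:
        raise ValueError(f"WT_BANDIT_MODE must be one of {sorted(BANDIT_MODES)}")
    if cfg.daily_budget < 0:
        raise ValueError("WT_DAILY_BUDGET must be non-negative")
    return cfg
```

Accepting an explicit mapping makes the loader easy to test without touching the real process environment.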

Tip

Full env var table: Environment variables

Documentation

Important

Operational runbook — budget, live trading, idempotency, logs, dry-run, commands

Important

Docker setup — .env, Kalshi key mount, logs, schedule, timezone

Tip

Data flow — ingestion → consensus → prediction modes → Kalshi execution

Tip

Kalshi markets — series tickers, NWS resolution, contract selection, authentication

Tip

Dashboard — web UI, TUI, observations, analytics

Tip

Mathematical foundations — weights, sigma, probability, EV

Note

System architecture · Data reference · Audit report · Improvement roadmap

Caution

SECURITY.md — key rotation, purging git history, deleting cache files

Weather Prediction Betting Intelligence

This section documents observed system performance, source quality, and the guardrail decisions made from live production data (Jan–Mar 2026). Numbers are derived from settled NWS CLI actuals vs. predictions logged at trade time.

Observed Prediction Accuracy (MAE)

14-day rolling MAE by city (consensus ensemble, production runs):

City	MAE (14d)	Typical cold bias	Bias correction applied
Miami (FL)	0.67°F	+0.89°F	Yes — `blend` mode
Chicago (IL)	0.78°F	+0.55°F	Yes — `blend` mode
Austin (TX)	0.84°F	+0.55°F	Yes — `blend` mode
NYC (NY)	0.87°F	+0.51°F	Yes — `blend` mode

On stable weather days (no front transitions), production MAE averages ~1.0°F. In this regime the providers cluster tightly and confidence scores are high.

On weather-transition days (warm or cold fronts), errors of 3–7°F are possible. These are not bugs — every provider fails simultaneously because the NWP models miss the front timing. The guardrails below address this.

Source Quality Ranking

Eight sources are scored continuously against NWS CLI actuals. Weights are updated nightly via inverse-MAE.

Source	Role	Notes
Visual Crossing	Highest weight (40–59% across cities)	Most consistent; best for IL, TX
Tomorrow.io	Second tier (15–30%)	Strong for TX and transitional days
Open-Meteo	Second tier (10–30%)	Good for IL, NY
Pirate Weather	Third tier (10–20%)	Reliable background signal
WeatherAPI	Low weight, used as divergence signal	Runs 1–4°F warm; useful as a leading warm-front indicator
OpenWeatherMap	Very low weight (<1%)	Low accuracy vs. NWS actuals
Google Weather	Very low weight (<1%)	Included but rarely decisive
Weather.gov (NWS)	Very low weight (<1%)	Ironically not the most accurate at forecast time
~~LSTM~~	Retired	Was 20–35°F off due to stale training data

Guardrails and Trades Avoided

The system applies layered gates before any order is placed:

1. Spread (sigma) guardrail — skips if cross-source standard deviation exceeds 3.0°F.

2. Max-source-divergence guardrail (added Mar 2026) — if any single source deviates more than 3.0°F from the weighted consensus mean, sigma is widened to max(sigma, max_deviation / 2). This lowers the confidence score and can trigger a skip even when most sources agree.

Example: Feb 26, NYC. The actual high was 49°F. All sources predicted 40.9–42.2°F except WeatherAPI at 46.4°F, a 4.5°F divergence from consensus. The original sigma was 1.09°F (the sources appeared to agree). With the divergence guardrail, the effective sigma would have widened to 2.25°F (4.5 / 2). That would have dropped conf_final to ~0.44 and skipped a trade that lost $2.82 on a 7.1°F miss.

3. Confidence threshold — currently 0.60, applied to effective_confidence = 0.7 × confidence + 0.3 × conviction. It was raised from 0.50 to 0.75 in Feb 2026 to reduce trading. In Mar 2026 it was lowered to 0.60 after diagnosis showed that the 0.75 threshold was blocking legitimate trades on normal days.

4. Intraday signal gate — requires a stable or monotonically increasing prediction trend across the last four 30-minute pulses before allowing an order.
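The four gates can be chained as in the sketch below. It assumes sigma is the weighted standard deviation around the weighted consensus. The sigma-to-confidence mapping is not documented here, so the sketch takes it as a caller-supplied function. The toy mapping, source names, and weights in the usage lines are hypothetical. The thresholds match the values stated above.

```python
import math
from typing import Callable, Dict, Sequence, Tuple

SIGMA_MAX_F = 3.0      # gate 1: skip if cross-source spread exceeds this
DIVERGENCE_F = 3.0     # gate 2: widen sigma if any source is this far from consensus
CONF_THRESHOLD = 0.60  # gate 3: minimum effective confidence


def widened_sigma(sigma: float, max_deviation: float) -> float:
    """Divergence guardrail: sigma = max(sigma, max_deviation / 2) when one source strays."""
    return max(sigma, max_deviation / 2.0) if max_deviation > DIVERGENCE_F else sigma


def trend_ok(recent: Sequence[float], pulses: int = 4) -> bool:
    """Gate 4: stable or rising (non-decreasing) prediction over the last `pulses` pulses."""
    if len(recent) < pulses:
        return False
    window = list(recent)[-pulses:]
    return all(b >= a for a, b in zip(window, window[1:]))


def gate(
    forecasts: Dict[str, float],
    weights: Dict[str, float],  # assumes every forecast source has a weight
    confidence_from_sigma: Callable[[float], float],  # hypothetical mapping
    conviction: float,
    recent_predictions: Sequence[float],
) -> Tuple[bool, str]:
    total_w = sum(weights[s] for s in forecasts)
    mean = sum(weights[s] * forecasts[s] for s in forecasts) / total_w
    var = sum(weights[s] * (forecasts[s] - mean) ** 2 for s in forecasts) / total_w
    sigma = math.sqrt(var)
    if sigma > SIGMA_MAX_F:
        return False, "spread"
    max_dev = max(abs(v - mean) for v in forecasts.values())
    sigma = widened_sigma(sigma, max_dev)
    effective = 0.7 * confidence_from_sigma(sigma) + 0.3 * conviction
    if effective < CONF_THRESHOLD:
        return False, "confidence"
    if not trend_ok(recent_predictions):
        return False, "trend"
    return True, "ok"


def toy_conf(s: float) -> float:
    return max(0.0, 1.0 - s / 4.0)  # illustrative only, not the repo's formula


weights = {"visual_crossing": 0.3, "tomorrow_io": 0.3, "open_meteo": 0.3, "weatherapi": 0.1}
outlier_day = {"visual_crossing": 41.0, "tomorrow_io": 41.5, "open_meteo": 42.0, "weatherapi": 46.4}
decision = gate(outlier_day, weights, toy_conf, conviction=0.6,
                recent_predictions=[41.5, 41.8, 41.9, 42.0])  # → (False, 'confidence')
```

With these made-up numbers, the weighted sigma is about 1.52°F, so the spread gate passes. WeatherAPI, however, sits about 4.41°F from consensus, which widens sigma to about 2.2°F. Under the toy mapping, that widening alone pushes effective confidence below 0.60.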

What the Data Shows About Bad Trades

From 18 settled production trades (Feb 4 – Feb 28 2026):

Scenario	Trades	Avg error	Notes
Normal days (sources agree)	14	1.05°F	Sources within 2°F of each other
Outlier/transition days	4	5.12°F	Feb 18 TX cold snap, Feb 21 IL cold snap, Feb 25–26 NY warm front
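Weighting the two rows by trade count shows how much the four outlier days dominate the error budget. The calculation below uses only the table values.

```latex
\bar{e} = \frac{14 \times 1.05 + 4 \times 5.12}{18}
        = \frac{14.70 + 20.48}{18}
        = \frac{35.18}{18} \approx 1.95\,^{\circ}\mathrm{F},
\qquad
\frac{20.48}{35.18} \approx 0.58
```

In other words, 4 of the 18 trades (about 22%) account for roughly 58% of the total error.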

All four outlier trades shared the same root cause: every provider simultaneously missed a front transition. Three of the four had tight cross-source agreement (sigma < 2°F), which is why the spread guardrail didn't fire. The new divergence guardrail specifically addresses the Feb 26 NY case where WeatherAPI was signalling the warm move while the others were not.

Key observation: later-in-the-day predictions are more accurate (hour 15 averages 1.81°F MAE vs. 2.08°F at hour 13 across all settled data), because providers issue NWP model updates in the early afternoon. However, the system intentionally trades at 13:00 local time — by 15:00, Kalshi market liquidity drops significantly and the prices available no longer offer good value. The accuracy improvement is not worth the liquidity cost.

Prediction Mode Performance

The contextual bandit chooses between two actions:

forecast — raw MAE-weighted ensemble mean
blend — forecast + bias_correction_f (adds the 14-day rolling signed bias to correct systematic under-prediction)

In the current production window, the bandit has predominantly selected forecast mode. Blend selection is expected to increase as the reward signal accumulates post-settlement. The 14-day bias corrections (+0.5–0.9°F across cities) are directionally correct, and the LinUCB policy should learn to exploit them in the appropriate sky/spread context.

References

LSTM-Automated-Trading-System — Kalshi Weather Prediction Common Task, BU CS542 Spring 2024
Predicting Temperature of Major Cities Using Machine Learning and Deep Learning
Kalshi API Documentation
