
How To Trade Weather on Kalshi — A Weather Prediction Market Bot

If you are searching for how weather prediction markets work, how people try to get an edge in temperature contracts, or how to automate a weather-trading strategy, this repo is built for exactly that problem.

Weather Trader is an automated system for Kalshi daily high-temperature markets. It pulls forecasts from eight weather sources, scores them against NWS settlement truth, learns which sources and prediction modes are working best, and only takes trades that clear configurable guardrails.

In plain English: this is a working example of how someone can approach weather markets with data, discipline, and automation instead of guessing.

Next trade view: prediction, confidence, source set, and weights.


Why People Use This Repo

  • Learn how a real weather prediction market workflow is built end to end.
  • Study how traders can look for edge in data-rich, rules-based markets instead of relying on gut feel.
  • Run a live dashboard, a paper-trading loop, and a fully automated pipeline on your own machine.
  • Explore "how to bet on weather" in a technical, measurable way: forecasts in, probabilities out, orders gated by risk rules.

Important

This is research and automation code, not a promise of profit. Prediction markets are risky. The safest way to start is KALSHI_ENV=demo with WT_SEND_ORDERS=false.


Quick Start in Demo Mode

cp .env.example .env      # fill in KALSHI_API_KEY_ID + KALSHI_PRIVATE_KEY_PATH
docker compose up -d --build
open http://localhost:8080  # live dashboard (macOS; on Linux use xdg-open, or visit the URL in a browser)

Minimum required settings: KALSHI_API_KEY_ID and KALSHI_PRIVATE_KEY_PATH (the secrets), plus KALSHI_ENV. Recommended first run: keep KALSHI_ENV=demo and WT_SEND_ORDERS=false until you trust the pipeline.

Important

Setup guides: Docker setup · Operational runbook · Environment variables

Never commit .env or private keys. See SECURITY.md if secrets were exposed.

Without Docker: pip install -r requirements.txt, configure .env, run scripts manually. Scheduling is up to you.


What It Does

Multi-source weather forecast aggregation

Pulls daily high-temperature forecasts from Open-Meteo, Visual Crossing, Tomorrow.io, WeatherAPI, OpenWeatherMap, Pirate Weather, NWS Weather.gov, and Google Weather. Each source is scored continuously against NWS CLI settlement truth and reweighted nightly using inverse-MAE weights.
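Inverse-MAE weighting gives each source a weight proportional to 1/MAE over its settled history, then normalises the weights so they sum to 1. The sketch below is a minimal illustration under assumptions: errors are stored per source as lists of (forecast minus actual) values, and the source keys and the eps guard are hypothetical, not the repo's actual schema.

```python
from statistics import fmean


def inverse_mae_weights(errors_by_source: dict[str, list[float]], eps: float = 1e-6) -> dict[str, float]:
    """Weight each source by 1 / MAE, normalised so the weights sum to 1.

    Assumption: each list holds signed errors (forecast minus actual) in °F.
    """
    inv: dict[str, float] = {}
    for source, errors in errors_by_source.items():
        if not errors:
            continue  # no settled history yet: leave the source out
        mae = fmean(abs(e) for e in errors)
        inv[source] = 1.0 / (mae + eps)  # eps guards against a zero-MAE source
    total = sum(inv.values())
    return {s: w / total for s, w in inv.items()}


def weighted_consensus(forecasts: dict[str, float], weights: dict[str, float]) -> float:
    """Weighted mean of today's forecasts, using only sources that have a weight."""
    used = {s: f for s, f in forecasts.items() if s in weights}
    norm = sum(weights[s] for s in used)
    return sum(weights[s] * f for s, f in used.items()) / norm
```

A source with half the MAE of another gets twice its weight, which is the property the test below checks.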

Kalshi weather market execution

Targets Kalshi daily high-temperature markets for New York (KXHIGHNY), Chicago (KXHIGHCHI), Austin (KXHIGHAUS), and Miami (KXHIGHMIA). Selects the highest-EV bucket using a live orderbook snapshot, with confidence/spread guardrails and per-city daily budget controls.
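The README does not spell out the probability model used for bucket selection, so the following is only a sketch under stated assumptions: the settled high is treated as normally distributed around the consensus mu with standard deviation sigma, NWS reports whole degrees (so a bucket [low, high] covers low - 0.5 to high + 0.5), a YES contract pays $1, and fees are ignored. The Bucket type, labels, and prices are hypothetical.

```python
import math
from dataclasses import dataclass
from typing import Optional


@dataclass
class Bucket:
    label: str               # hypothetical display label, e.g. "41-42"
    low: Optional[float]     # lowest whole °F in the bucket (None = open below)
    high: Optional[float]    # highest whole °F in the bucket (None = open above)
    yes_ask: float           # price to buy YES, in dollars (0 to 1)


def normal_cdf(x: float, mu: float, sigma: float) -> float:
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))


def bucket_probability(b: Bucket, mu: float, sigma: float) -> float:
    # Whole-degree settlement: widen each bucket edge by half a degree.
    upper = 1.0 if b.high is None else normal_cdf(b.high + 0.5, mu, sigma)
    lower = 0.0 if b.low is None else normal_cdf(b.low - 0.5, mu, sigma)
    return upper - lower


def best_bucket(buckets: list[Bucket], mu: float, sigma: float) -> tuple[Bucket, float]:
    """Return the bucket with the highest expected value per YES contract."""
    scored = [(bucket_probability(b, mu, sigma) - b.yes_ask, b) for b in buckets]
    ev, bucket = max(scored, key=lambda t: t[0])
    return bucket, ev
```

For example, with mu = 42.0 and sigma = 1.5, a "41-42" bucket priced at $0.40 has a modelled probability of about 0.47, so its EV is roughly +$0.07 per contract before fees.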

Contextual bandit model selection

A LinUCB contextual bandit (bandit/) learns which prediction mode performs best given real-time weather context (sky condition, cloud cover, provider disagreement, season). It selects between:

  • forecast — raw MAE-weighted ensemble
  • blend — bias-corrected forecast (adds per-city rolling cold-bias correction to the ensemble)

Context features include NWS/provider sky condition votes, cloud cover, spread, provider count, and city/date signals. Full decision and reward telemetry is logged for post-settlement learning.
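The README names LinUCB but not its exact formulation. The sketch below is the standard disjoint variant: one ridge-regression model per action, whose A and b matrices correspond to what Data/bandit_state.json is described as persisting, with each action scored as estimated reward plus an exploration bonus scaled by alpha (compare WT_BANDIT_ALPHA). The feature vector and reward value are hypothetical; the repo's real context encoding and reward definition may differ.

```python
import json

import numpy as np


class LinUCB:
    """Disjoint LinUCB: a separate (A, b) ridge model for each action."""

    def __init__(self, actions: list[str], n_features: int, alpha: float = 0.7):
        self.actions = list(actions)
        self.alpha = alpha
        self.A = {a: np.eye(n_features) for a in self.actions}   # X^T X + I
        self.b = {a: np.zeros(n_features) for a in self.actions}  # X^T rewards

    def select(self, x) -> str:
        x = np.asarray(x, dtype=float)
        scores = {}
        for a in self.actions:
            A_inv = np.linalg.inv(self.A[a])  # fine for a handful of features
            theta = A_inv @ self.b[a]  # ridge estimate of reward weights
            bonus = self.alpha * np.sqrt(x @ A_inv @ x)  # uncertainty bonus
            scores[a] = float(theta @ x + bonus)
        return max(scores, key=scores.get)  # ties go to the first action listed

    def update(self, action: str, x, reward: float) -> None:
        x = np.asarray(x, dtype=float)
        self.A[action] += np.outer(x, x)
        self.b[action] += reward * x

    def to_json(self) -> str:
        return json.dumps({
            "alpha": self.alpha,
            "A": {a: m.tolist() for a, m in self.A.items()},
            "b": {a: v.tolist() for a, v in self.b.items()},
        })
```

With no history both actions tie and forecast (listed first) is chosen; after a positive reward for blend in a given context, select returns blend for that same context.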

Self-calibrating bias correction

The system tracks signed prediction error per city over a 14-day rolling window. This value, bias_correction_f (stored in Data/city_metadata.json), is applied by the blend action to correct the systematic cold bias in the ensemble, typically +0.4 to +0.6°F across all four cities.
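A minimal sketch of the rolling correction, assuming signed error is defined as actual minus predicted (so a cold bias comes out positive and is added back by blend). The record layout and function names are hypothetical; the same window also yields the 14-day MAE tracked per city.

```python
from datetime import date, timedelta
from statistics import fmean
from typing import Iterable, Optional


def rolling_bias_and_mae(
    records: Iterable[tuple[date, float, float]],
    as_of: date,
    window_days: int = 14,
) -> tuple[float, Optional[float]]:
    """records: (settle_date, predicted_f, actual_f) for one city.

    Returns (bias_correction_f, mae_f) over the window ending at as_of.
    Assumption: signed error = actual - predicted, so cold bias > 0.
    """
    start = as_of - timedelta(days=window_days)
    errors = [actual - predicted for d, predicted, actual in records if start < d <= as_of]
    if not errors:
        return 0.0, None  # no settled days yet: apply no correction
    return fmean(errors), fmean(abs(e) for e in errors)


def blend(forecast_f: float, bias_correction_f: float) -> float:
    """The blend action: ensemble forecast plus the rolling bias correction."""
    return forecast_f + bias_correction_f
```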

Intraday refresh and trade gates

Forecasts refresh every hour. Trade decisions fire at 13:00 local time per city (NY/FL at 13:00 ET, IL/TX at 14:00 ET). A second intraday pulse runs at 14:00 ET for remaining cities.
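The mapping from "13:00 local" to Eastern time can be derived with the standard zoneinfo module rather than hard-coded. This sketch assumes IANA zone names for each city code (Miami is Eastern, Austin and Chicago are Central); CITY_TZ and trade_time_et are hypothetical names, and zoneinfo needs system tz data (or the tzdata package on Windows).

```python
from datetime import date, datetime, time
from zoneinfo import ZoneInfo

# Hypothetical lookup from city code to IANA time zone.
CITY_TZ = {
    "ny": "America/New_York",
    "fl": "America/New_York",
    "il": "America/Chicago",
    "tx": "America/Chicago",
}


def trade_time_et(city: str, day: date, local_hour: int = 13) -> datetime:
    """Convert a city's local trade hour on `day` to US Eastern time."""
    local = datetime.combine(day, time(local_hour), tzinfo=ZoneInfo(CITY_TZ[city]))
    return local.astimezone(ZoneInfo("America/New_York"))
```

Because Central and Eastern time switch to and from daylight saving time on the same dates, 13:00 Central maps to 14:00 ET all year.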

Why that can matter in prediction markets

Weather markets are attractive because they settle on objective public data, update throughout the day, and often show visible disagreement between sources. This repo is built around that edge hypothesis: if you can measure forecast quality better than the market prices it, you may be able to find better entries than a casual trader.


Cities

| City | Code | Kalshi series | Lat/Lon |
|---|---|---|---|
| NYC (Central Park) | ny | KXHIGHNY | 40.79736, -73.97785 |
| Chicago (Midway) | il | KXHIGHCHI | 41.78701, -87.77166 |
| Austin (Bergstrom) | tx | KXHIGHAUS | 30.14440, -97.66876 |
| Miami | fl | KXHIGHMIA | 25.77380, -80.19360 |

Architecture

intraday_pulse.py          ← fetch forecasts, run bandit, write predictions_latest.csv
    └── bandit/policy.py   ← LinUCB selects forecast vs blend
    └── bandit/modes.py    ← blend = forecast + bias_correction_f
    └── bandit/context.py  ← sky/condition voting from provider payloads
kalshi_trader.py           ← read predictions, select market bucket, place orders
calibrate_sources.py       ← nightly: update source weights from settled actuals
bandit_update.py           ← nightly: update bandit policy from settled rewards
update_city_metadata.py    ← nightly: update per-city MAE + rolling bias correction

Data Artifacts

| File | Purpose |
|---|---|
| Data/source_performance.csv | Per-source signed and absolute error vs NWS actual |
| Data/city_metadata.json | Per-city MAE and rolling bias correction |
| Data/weights.json | Learned source weights (inverse-MAE) |
| Data/eval_history.csv | Per-trade outcome, bucket hit, realized P&L |
| Data/bandit_decisions_history.csv | Per-city bandit action selected and applied |
| Data/bandit_rewards_history.csv | Post-settlement reward signal per action |
| Data/bandit_state.json | Persisted LinUCB policy (A and b matrices) |

Note

Full schemas: Data reference


Key Environment Variables

| Variable | Default | Description |
|---|---|---|
| KALSHI_ENV | demo | demo or prod |
| WT_ENV | demo | Trading environment passed to the trader |
| WT_SEND_ORDERS | false | Set to true to place real orders |
| WT_DAILY_BUDGET | 50 | Max dollars per day across all cities |
| WT_BANDIT_MODE | live | Bandit mode: off / shadow / canary / live |
| WT_BANDIT_ALPHA | 0.7 | LinUCB exploration parameter |
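A minimal sketch of loading these variables with the documented defaults and failing fast on invalid values. TraderConfig and load_config are hypothetical names; the repo's own parsing (for example, which strings count as true) may differ.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional

BANDIT_MODES = ("off", "shadow", "canary", "live")


@dataclass(frozen=True)
class TraderConfig:
    kalshi_env: str
    wt_env: str
    send_orders: bool
    daily_budget: float
    bandit_mode: str
    bandit_alpha: float


def load_config(env: Optional[Mapping[str, str]] = None) -> TraderConfig:
    """Read settings from `env` (defaults to os.environ) using the table's defaults."""
    env = os.environ if env is None else env
    kalshi_env = env.get("KALSHI_ENV", "demo").strip().lower()
    if kalshi_env not in ("demo", "prod"):
        raise ValueError(f"KALSHI_ENV must be demo or prod, got {kalshi_env!r}")
    mode = env.get("WT_BANDIT_MODE", "live").strip().lower()
    if mode not in BANDIT_MODES:
        raise ValueError(f"WT_BANDIT_MODE must be one of {BANDIT_MODES}, got {mode!r}")
    return TraderConfig(
        kalshi_env=kalshi_env,
        wt_env=env.get("WT_ENV", "demo").strip().lower(),
        # Only an explicit opt-in enables real orders.
        send_orders=env.get("WT_SEND_ORDERS", "false").strip().lower() in ("1", "true", "yes"),
        daily_budget=float(env.get("WT_DAILY_BUDGET", "50")),
        bandit_mode=mode,
        bandit_alpha=float(env.get("WT_BANDIT_ALPHA", "0.7")),
    )
```

Treating anything other than an explicit true value as false keeps the safe demo setup as the default.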

Tip

Full env var table: Environment variables


Documentation

Important

Operational runbook — budget, live trading, idempotency, logs, dry-run, commands

Important

Docker setup: .env, Kalshi key mount, logs, schedule, timezone

Tip

Data flow — ingestion → consensus → prediction modes → Kalshi execution

Tip

Kalshi markets — series tickers, NWS resolution, contract selection, authentication

Tip

Dashboard — web UI, TUI, observations, analytics

Tip

Mathematical foundations — weights, sigma, probability, EV

Note

System architecture · Data reference · Audit report · Improvement roadmap

Caution

SECURITY.md — key rotation, purging git history, deleting cache files


Weather Prediction Betting Intelligence

This section documents observed system performance, source quality, and the guardrail decisions made from live production data (Jan to Mar 2026). Numbers are derived from settled NWS CLI actuals vs. predictions logged at trade time.

Observed Prediction Accuracy (MAE)

14-day rolling MAE by city (consensus ensemble, production runs):

| City | MAE (14d) | Typical cold bias | Bias correction applied |
|---|---|---|---|
| Miami (FL) | 0.67°F | +0.89°F | Yes (blend mode) |
| Chicago (IL) | 0.78°F | +0.55°F | Yes (blend mode) |
| Austin (TX) | 0.84°F | +0.55°F | Yes (blend mode) |
| NYC (NY) | 0.87°F | +0.51°F | Yes (blend mode) |

On stable weather days (no front transitions), production MAE averages ~1.0°F. This is the regime where the main providers cluster tightly and confidence scores are high.

On weather-transition days (warm or cold fronts), errors of 3 to 7°F are possible. These are not bugs: every provider fails simultaneously because the NWP models miss the front timing. The guardrails below address this.

Source Quality Ranking

Eight sources are scored continuously against NWS CLI actuals. Weights are updated nightly via inverse-MAE weighting.

| Source | Role | Notes |
|---|---|---|
| Visual Crossing | Highest weight (40-59% across cities) | Most consistent; best for IL, TX |
| Tomorrow.io | Second tier (15-30%) | Strong for TX and transitional days |
| Open-Meteo | Second tier (10-30%) | Good for IL, NY |
| Pirate Weather | Third tier (10-20%) | Reliable background signal |
| WeatherAPI | Low weight, used as divergence signal | Runs 1-4°F warm; useful as a leading warm-front indicator |
| OpenWeatherMap | Very low weight (<1%) | Low accuracy vs. NWS actuals |
| Google Weather | Very low weight (<1%) | Included but rarely decisive |
| Weather.gov (NWS) | Very low weight (<1%) | Ironically not the most accurate at forecast time |
| LSTM | Retired | Was 20-35°F off due to stale training data |

Guardrails and Trades Avoided

The system applies layered gates before any order is placed:

1. Spread (sigma) guardrail — skips if cross-source standard deviation exceeds 3.0°F.

2. Max-source-divergence guardrail (added Mar 2026) — if any single source deviates more than 3.0°F from the weighted consensus mean, sigma is widened to max(sigma, max_deviation / 2). This lowers the confidence score and can trigger a skip even when most sources agree.

Example: Feb 26 NYC. The actual high was 49°F. All sources predicted 40.9 to 42.2°F except WeatherAPI at 46.4°F, a 4.5°F divergence from consensus. The original sigma was 1.09°F (the sources appeared to agree), but with the divergence guardrail the effective sigma would have widened to 2.25°F, dropping conf_final to ~0.44 and skipping a trade that lost $2.82 on a 7.1°F miss.

3. Confidence threshold: currently 0.60 (effective_confidence = 0.7 × confidence + 0.3 × conviction). It was raised from 0.50 to 0.75 in Feb 2026 to reduce trading, then lowered to 0.60 in Mar 2026 after diagnosis showed that the 0.75 threshold was blocking legitimate trades on normal days.

4. Intraday signal gate — requires a stable or monotonically increasing prediction trend across the last four 30-minute pulses before allowing an order.
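The four gates can be sketched as plain functions. This is an illustration under assumptions, not the repo's implementation: "cross-source standard deviation" is taken to be the weighted standard deviation around the weighted consensus, "stable or monotonically increasing" is read as non-decreasing, and because the README does not give the mapping from sigma to conf_final, confidence and conviction are passed in and the widened sigma is only checked against the 3.0°F spread limit. Source keys and weights in the test are made up.

```python
import math

SIGMA_MAX_F = 3.0      # spread guardrail threshold
DIVERGENCE_F = 3.0     # single-source divergence threshold
CONF_THRESHOLD = 0.60  # current effective-confidence threshold


def weighted_stats(preds: dict[str, float], weights: dict[str, float]) -> tuple[float, float]:
    """Weighted consensus mean and weighted standard deviation (assumed definition)."""
    total = sum(weights[s] for s in preds)
    mean = sum(weights[s] * p for s, p in preds.items()) / total
    var = sum(weights[s] * (p - mean) ** 2 for s, p in preds.items()) / total
    return mean, math.sqrt(var)


def effective_sigma(preds: dict[str, float], weights: dict[str, float]) -> float:
    mean, sigma = weighted_stats(preds, weights)
    max_dev = max(abs(p - mean) for p in preds.values())
    if max_dev > DIVERGENCE_F:
        # Divergence guardrail: widen sigma even when most sources agree.
        sigma = max(sigma, max_dev / 2)
    return sigma


def effective_confidence(confidence: float, conviction: float) -> float:
    return 0.7 * confidence + 0.3 * conviction


def trend_ok(pulses: list[float]) -> bool:
    """The last four pulse predictions must be flat or rising."""
    last = pulses[-4:]
    return len(last) == 4 and all(b >= a for a, b in zip(last, last[1:]))


def passes_gates(preds, weights, confidence, conviction, pulses) -> tuple[bool, str]:
    if effective_sigma(preds, weights) > SIGMA_MAX_F:
        return False, "spread"
    if effective_confidence(confidence, conviction) < CONF_THRESHOLD:
        return False, "confidence"
    if not trend_ok(pulses):
        return False, "trend"
    return True, "ok"
```

With numbers shaped like the Feb 26 NYC case (three sources between 40.9 and 42.2°F, one at 46.4°F carrying a small weight), the outlier sits about 4.7°F from the weighted consensus, so sigma widens from about 1.2°F to about 2.36°F.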

What the Data Shows About Bad Trades

From 18 settled production trades (Feb 4 to Feb 28, 2026):

| Scenario | Trades | Avg error | Notes |
|---|---|---|---|
| Normal days (sources agree) | 14 | 1.05°F | Sources within 2°F of each other |
| Outlier/transition days | 4 | 5.12°F | Feb 18 TX cold snap, Feb 21 IL cold snap, Feb 25-26 NY warm front |

All four outlier trades shared the same root cause: every provider simultaneously missed a front transition. Three of the four had tight cross-source agreement (sigma < 2°F), which is why the spread guardrail didn't fire. The new divergence guardrail specifically addresses the Feb 26 NY case where WeatherAPI was signalling the warm move while the others were not.

Key observation: later-in-the-day predictions are more accurate (hour 15 averages 1.81°F MAE vs. 2.08°F at hour 13 across all settled data), because providers issue NWP model updates in the early afternoon. However, the system intentionally trades at 13:00 local time — by 15:00, Kalshi market liquidity drops significantly and the prices available no longer offer good value. The accuracy improvement is not worth the liquidity cost.
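A comparison like "hour 15 vs. hour 13" comes down to grouping absolute errors by the hour each prediction was logged. The sketch below assumes rows of (prediction_hour, predicted_f, actual_f); the row layout and function name are hypothetical, and the repo's logs may store these fields differently.

```python
from collections import defaultdict
from statistics import fmean
from typing import Iterable


def mae_by_hour(rows: Iterable[tuple[int, float, float]]) -> dict[int, float]:
    """rows: (prediction_hour, predicted_f, actual_f). Returns {hour: MAE in °F}."""
    errs: dict[int, list[float]] = defaultdict(list)
    for hour, predicted, actual in rows:
        errs[hour].append(abs(actual - predicted))
    return {hour: fmean(e) for hour, e in sorted(errs.items())}
```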

Prediction Mode Performance

The contextual bandit chooses between two actions:

  • forecast: raw MAE-weighted ensemble mean
  • blend: forecast + bias_correction_f (adds the 14-day rolling signed bias to correct systematic under-prediction)

In the current production window, the bandit has predominantly selected forecast mode. As the reward signal accumulates post-settlement, blend selection is expected to increase: the 14-day bias corrections (+0.5 to +0.9°F across cities) are directionally correct, and the LinUCB policy should learn to exploit them in the appropriate sky/spread context.


References