SAR Multi-Crop Acreage Estimator

Run & Deploy

Prerequisites

Node.js 18+ (developed/tested on Node 24). No npm install required: the pipeline is dependency-free and offline.

Run locally

cd delivery-package/sar-crop-acreage
node run.mjs        # synthetic demo -> data/submission.csv + data/metrics.json
node verify.mjs     # verification suite -> verification-report.json/.md

Producing a real submission

The model is data-ready; the only stage that needs a geospatial stack is turning SAR tiles into per-village zonal statistics. That stage is documented here as a seam (Stage A) and consumed by the pipeline (Stage B).

Stage A — raster → per-village zonal statistics (outside Node)

For each acquisition date and polarisation in the season:

Calibrate DN → σ⁰ (linear), then to dB. (X-band: e.g. ICEYE / TerraSAR-X / Capella products.)

(Optional) terrain-flatten σ⁰ → γ⁰ using a DEM to remove slope effects.
Speckle-reduce (or rely on the pipeline's multi-temporal Lee filter).
Zonal statistics: compute the mean backscatter per village polygon.

Emit one long-format CSV row per (village, date, polarisation):

village_id,date,pol,backscatter_db,village_area_ha
V0001,2025-06-15,VV,-12.43,842.10
V0001,2025-06-15,VH,-18.02,842.10
...
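The last two steps and the row format can be sketched in a few lines. This is only an illustration of the arithmetic, assuming the pixel values inside one village polygon have already been extracted as calibrated linear σ⁰ by one of the geospatial tools. The function names (toDb, zonalMeanDb, zonalRow) are hypothetical and not part of this repo. The sketch averages in the linear domain before converting to dB, which is a common choice; if your workflow takes the mean of per-pixel dB values instead, swap the order.

```javascript
// Hypothetical helpers for illustration; none of these names exist in the repo.

// Convert a linear-power backscatter value to decibels.
function toDb(linear) {
  return 10 * Math.log10(linear);
}

// Mean backscatter (dB) over the pixels that fall inside one village polygon.
// Assumption: average in the linear domain, then convert the mean to dB.
function zonalMeanDb(linearPixels) {
  // Drop nodata (NaN) and non-positive values, which have no dB representation.
  const valid = linearPixels.filter((v) => Number.isFinite(v) && v > 0);
  if (valid.length === 0) return NaN;
  const mean = valid.reduce((sum, v) => sum + v, 0) / valid.length;
  return toDb(mean);
}

// Format one row: village_id,date,pol,backscatter_db,village_area_ha
function zonalRow(villageId, date, pol, linearPixels, villageAreaHa) {
  const db = zonalMeanDb(linearPixels);
  return [villageId, date, pol, db.toFixed(2), villageAreaHa.toFixed(2)].join(",");
}

console.log("village_id,date,pol,backscatter_db,village_area_ha");
console.log(zonalRow("V0001", "2025-06-15", "VV", [0.1, 0.1, NaN], 842.1));
```

The pixel values above are made up for the demonstration; a real Stage A would read them from the calibrated raster masked by the village polygon.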

Tools that can do Stage A: GDAL (gdalwarp, gdal_calc), rasterio + rasterstats, ESA SNAP, or Google Earth Engine reduceRegions. This stage is intentionally not bundled because it requires a heavy geospatial environment; everything downstream is pure Node.

Stage B — pipeline (this repo)

node run.mjs --zonal zonal.csv --labels train-labels.csv
# -> data/submission.csv

src/ingest.mjs::parseZonalCsv pivots the long table into per-village { id, villageAreaHa, co[T], cross[T] } stacks ordered by date — exactly what the model consumes. No code changes needed.
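To make the pivot concrete, here is a minimal sketch of the long-to-stack transformation. It is not the repo's parseZonalCsv, whose internals are not shown here; pivotZonal is a hypothetical name, and the sketch assumes that VV/HH map to the co-polarised stack, VH/HV to the cross-polarised stack, and that dates are ISO formatted. The sample rows are illustrative values only.

```javascript
// Illustrative only: the repo's parseZonalCsv may differ in details.
function pivotZonal(csvText) {
  const [header, ...lines] = csvText.trim().split(/\r?\n/);
  const cols = header.split(",").map((c) => c.trim());
  const villages = new Map();

  for (const line of lines) {
    if (!line.trim()) continue;
    const cells = line.split(",").map((c) => c.trim());
    const row = Object.fromEntries(cols.map((c, i) => [c, cells[i]]));

    let v = villages.get(row.village_id);
    if (!v) {
      v = { id: row.village_id, villageAreaHa: Number(row.village_area_ha), byDate: new Map() };
      villages.set(row.village_id, v);
    }
    const entry = v.byDate.get(row.date) ?? {};
    // Assumption: VV/HH are co-polarised, VH/HV are cross-polarised.
    const channel = row.pol === "VV" || row.pol === "HH" ? "co" : "cross";
    entry[channel] = Number(row.backscatter_db);
    v.byDate.set(row.date, entry);
  }

  return [...villages.values()].map(({ id, villageAreaHa, byDate }) => {
    // ISO dates (YYYY-MM-DD) sort chronologically as plain strings.
    const dates = [...byDate.keys()].sort();
    return {
      id,
      villageAreaHa,
      co: dates.map((d) => byDate.get(d).co ?? NaN),
      cross: dates.map((d) => byDate.get(d).cross ?? NaN),
    };
  });
}

// Rows deliberately out of date order to show the sort.
const sample = [
  "village_id,date,pol,backscatter_db,village_area_ha",
  "V0001,2025-06-27,VV,-11.90,842.10",
  "V0001,2025-06-15,VH,-18.02,842.10",
  "V0001,2025-06-15,VV,-12.43,842.10",
  "V0001,2025-06-27,VH,-17.40,842.10",
].join("\n");

const stacks = pivotZonal(sample);
console.log(JSON.stringify(stacks));
```

Missing (date, polarisation) pairs come out as NaN in this sketch; how the real ingest handles gaps is not specified here.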

Evaluate on the official dataset (same verification structure)

To run verification on the real data in addition to the synthetic benchmark, place the official files here:

data/official/zonal.csv          # long-format zonal statistics (Stage A output)
data/official/train-labels.csv   # ID,Rice_ha,Cotton_ha,Maize_ha,Bajra_ha,Groundnut_ha

Then:

node verify.mjs   # auto-detects data/official, runs the SAME checks via k-fold CV,
                  # writes data/official/submission.csv, report -> "official + synthetic"

Because the leaderboard test labels are hidden, real MSE is estimated by out-of-fold cross-validation on the labelled villages — every village is predicted once by a model that never saw it. You can also score directly:

node run.mjs --zonal data/official/zonal.csv --labels data/official/train-labels.csv --cv
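The out-of-fold idea can be sketched as follows. This is a toy illustration, not the pipeline's actual --cv implementation: outOfFoldMse, fitMean, and predictMean are hypothetical names, folds are assigned by index modulo k rather than by any scheme the repo may use, and a single scalar target stands in for the five crop columns.

```javascript
// Out-of-fold MSE: each sample is predicted by a model fit without it.
// `fit` and `predict` are hypothetical stand-ins for the real model.
function outOfFoldMse(xs, ys, k, fit, predict) {
  let sumSq = 0;
  for (let fold = 0; fold < k; fold++) {
    const trainX = [];
    const trainY = [];
    const testIdx = [];
    xs.forEach((x, i) => {
      if (i % k === fold) {
        testIdx.push(i); // held out for this fold
      } else {
        trainX.push(x);
        trainY.push(ys[i]);
      }
    });
    const model = fit(trainX, trainY);
    for (const i of testIdx) {
      const err = predict(model, xs[i]) - ys[i];
      sumSq += err * err;
    }
  }
  // Every sample is held out exactly once, so divide by the full count.
  return sumSq / xs.length;
}

// Toy model: predict the training-set mean, ignoring the features.
const fitMean = (_xs, ys) => ys.reduce((s, y) => s + y, 0) / ys.length;
const predictMean = (model, _x) => model;

const xs = [0, 1, 2, 3];
const ys = [10, 20, 30, 40];
const cvMse = outOfFoldMse(xs, ys, 2, fitMean, predictMean);
console.log(cvMse);
```

Because the held-out villages never influence the model that predicts them, the resulting MSE is an estimate of performance on unseen data rather than a training-set fit.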

No competition files yet? Generate an official-format fixture from the physics model to dry-run the whole path:

node tools/synth-to-official.mjs data/official --n 1000 --T 12 --test 0.2

Deploy the Season Explorer (static)

public/tool.html is a single self-contained file. Host it on any static server (GitHub Pages, S3, nginx) or open locally:

# any static server, e.g.
npx serve public        # then open the printed URL

CI / reproducibility

verify.mjs exits non-zero on any failed check, so it can gate CI:

node verify.mjs   # exit 0 = all checks pass

Runs are deterministic (seeded PRNG), so verification output is stable across machines.
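For readers unfamiliar with seeded generators, the sketch below shows why a fixed seed gives identical output everywhere. It uses mulberry32, a small well-known 32-bit generator, purely as an example; which PRNG the pipeline actually uses is not stated here, so treat this as an assumption.

```javascript
// mulberry32: a compact seeded PRNG returning floats in [0, 1).
// Shown as an example only; the repo's own generator may differ.
function mulberry32(seed) {
  let a = seed >>> 0; // keep the state as an unsigned 32-bit integer
  return function next() {
    a = (a + 0x6d2b79f5) >>> 0;
    let t = a;
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

// Two generators with the same seed produce the same sequence.
const runA = mulberry32(42);
const runB = mulberry32(42);
const seqA = [runA(), runA(), runA()];
const seqB = [runB(), runB(), runB()];
console.log(seqA.every((v, i) => v === seqB[i]));
```

The generator relies only on 32-bit integer arithmetic, which JavaScript engines implement identically, so the sequence does not depend on the machine or Node version.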