SAR Multi-Crop Acreage Estimator

User Guide

This guide covers the three ways to use the pipeline: the synthetic demo, the real-data run, and the interactive Season Explorer.

1. Synthetic demo (no data needed)

node run.mjs

This generates a physically motivated synthetic Kharif dataset, holds out 30% of villages as a test set, trains the model, writes data/submission.csv, and prints metrics:

Synthetic Kharif run
  train=1050 test=450 acquisitions=12 (trained in ~100 ms)
  MSE = ~15000 ha^2   baseline(mean) = ~38000   skill = ~60%
  R^2 = ~0.59
    Rice/Cotton/Maize/Bajra/Groundnut  RMSE per crop (ha)
  submission -> data/submission.csv
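The guide does not define how MSE, the mean baseline, skill and R² are computed. The sketch below assumes common definitions: MSE pooled over every (village, crop) cell, a baseline that predicts each crop's training mean, skill = 1 - MSE / baseline, and R² centred on each crop's test mean. The function names are hypothetical, not taken from run.mjs.

```javascript
// Headline metrics under ASSUMED definitions (not taken from run.mjs):
//   MSE      = mean squared error over every (village, crop) cell of the test set
//   baseline = MSE of predicting each crop's *training* mean for every test village
//   skill    = 1 - MSE / baseline
//   R^2      = 1 - SS_res / SS_tot, with SS_tot taken around each crop's *test* mean
// Rows are arrays of hectares in submission order: Rice, Cotton, Maize, Bajra, Groundnut.

function columnMeans(rows) {
  const K = rows[0].length;
  const sums = new Array(K).fill(0);
  for (const row of rows) row.forEach((v, k) => { sums[k] += v; });
  return sums.map((s) => s / rows.length);
}

function headlineMetrics(trainY, testY, predY) {
  const trainMean = columnMeans(trainY);
  const testMean = columnMeans(testY);
  let sse = 0, sseBase = 0, sst = 0;
  testY.forEach((row, i) => {
    row.forEach((y, k) => {
      sse += (y - predY[i][k]) ** 2;      // model residuals
      sseBase += (y - trainMean[k]) ** 2; // mean-baseline residuals
      sst += (y - testMean[k]) ** 2;      // spread around the test mean
    });
  });
  const cells = testY.length * testY[0].length;
  const mse = sse / cells;
  const baseline = sseBase / cells;
  return { mse, baseline, skill: 1 - mse / baseline, r2: 1 - sse / sst };
}
```

With the demo's rounded figures, 1 - 15000/38000 ≈ 0.6, consistent with the printed skill of about 60%. Because each crop's test mean is the best constant predictor on the test set, the baseline can never beat SS_tot, so under these definitions skill is always at least R². That fits the demo's 60% versus 0.59.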

Useful flags:

--n 2000 — number of synthetic villages.
--T 14 — acquisitions across the season.
--seed 7 — reproducibility seed.
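The guide does not say how the 30% hold-out is drawn or how --seed feeds into it. The following is a minimal sketch of a reproducible split, assuming a seeded shuffle. mulberry32 is a small, widely used 32-bit PRNG swapped in here for illustration, and splitVillages is a hypothetical name.

```javascript
// mulberry32: a small, well-known 32-bit PRNG, used here only as a stand-in.
function mulberry32(seed) {
  let a = seed >>> 0;
  return () => {
    a = (a + 0x6d2b79f5) >>> 0;
    let t = a;
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296; // uniform in [0, 1)
  };
}

// Shuffle a copy of the village IDs (Fisher-Yates), then cut off the test fraction.
function splitVillages(ids, testFrac, seed) {
  const rand = mulberry32(seed);
  const order = ids.slice();
  for (let i = order.length - 1; i > 0; i--) {
    const j = Math.floor(rand() * (i + 1));
    [order[i], order[j]] = [order[j], order[i]];
  }
  const nTest = Math.round(order.length * testFrac);
  return { test: order.slice(0, nTest), train: order.slice(nTest) };
}

// Same seed, same split; 1500 villages at 30% gives 1050 train / 450 test.
const ids = Array.from({ length: 1500 }, (_, i) => i + 1);
const a = splitVillages(ids, 0.3, 7);
const b = splitVillages(ids, 0.3, 7);
```

Seeding the shuffle rather than the model alone means the same --seed reproduces both the hold-out membership and the reported metrics.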

data/metrics.json captures the full run metrics for record-keeping.

2. Real competition data

Once you have produced a per-village zonal-statistics CSV from the official SAR tiles (see run-deploy-instructions.md for the raster recipe), run:

node run.mjs --zonal zonal.csv --labels train-labels.csv

zonal.csv is in long format, with one row per village, acquisition date and polarization:

village_id,date,pol,backscatter_db,village_area_ha

where pol is VV or HH (co-pol), or VH or HV (cross-pol).
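Before modelling, the long-format rows have to be pivoted into one feature vector per village. Below is a minimal sketch of that reshaping. It assumes unquoted CSV fields, ISO dates (YYYY-MM-DD) that sort chronologically as strings, and one row per (village, date, pol). pivotZonal and its output shape are hypothetical, not the pipeline's own API.

```javascript
// Map each polarization label onto the two channels described above.
const CHANNEL = { VV: "co", HH: "co", VH: "cross", HV: "cross" };

// ASSUMPTIONS: plain comma-separated fields (no quoting) and ISO dates.
function pivotZonal(csvText) {
  const [header, ...lines] = csvText.trim().split(/\r?\n/);
  const idx = Object.fromEntries(header.split(",").map((c, i) => [c.trim(), i]));
  const villages = new Map(); // village_id -> { areaHa, byDate: Map(date -> {co, cross}) }
  for (const line of lines) {
    if (!line.trim()) continue;
    const f = line.split(",");
    const id = f[idx.village_id].trim();
    const date = f[idx.date].trim();
    const channel = CHANNEL[f[idx.pol].trim().toUpperCase()];
    if (!channel) throw new Error(`unknown pol: ${f[idx.pol]}`);
    if (!villages.has(id)) {
      villages.set(id, { areaHa: Number(f[idx.village_area_ha]), byDate: new Map() });
    }
    const v = villages.get(id);
    // A missing polarization stays NaN, so gaps are visible rather than silently filled.
    if (!v.byDate.has(date)) v.byDate.set(date, { co: NaN, cross: NaN });
    v.byDate.get(date)[channel] = Number(f[idx.backscatter_db]);
  }
  const out = new Map();
  for (const [id, v] of villages) {
    const dates = [...v.byDate.keys()].sort(); // chronological for ISO dates
    const features = dates.flatMap((d) => [v.byDate.get(d).co, v.byDate.get(d).cross]);
    out.set(id, { areaHa: v.areaHa, dates, features });
  }
  return out;
}
```

Interleaving co- and cross-pol per date keeps each acquisition's pair adjacent, which makes derived features such as a per-date cross/co ratio easy to add later.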

train-labels.csv uses the submission schema:

ID,Rice_ha,Cotton_ha,Maize_ha,Bajra_ha,Groundnut_ha

Villages present in the labels are used for training; the rest (the test set) are predicted into data/submission.csv. Add --cv to print an honest out-of-fold cross-validated MSE/R²/skill on the labelled villages.
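The split and the out-of-fold idea can be sketched as follows. Villages with labels form the training set, everything else is predicted, and under cross-validation every labelled village is scored by a model that never saw its label. The guide does not state the number of folds or how villages are assigned to them. Round-robin assignment, the names partitionByLabels and outOfFold, and the fitPredict callback are all assumptions for illustration.

```javascript
// Labelled villages train the model; every other village is a test village to predict.
function partitionByLabels(allIds, labelledIds) {
  const labelled = new Set(labelledIds);
  return {
    train: allIds.filter((id) => labelled.has(id)),
    test: allIds.filter((id) => !labelled.has(id)),
  };
}

// Out-of-fold predictions: each labelled village is predicted by a model fitted only
// on the other k - 1 folds, so no village is scored against its own label.
// fitPredict(trainIds, heldOutIds) is a HYPOTHETICAL callback that returns one
// prediction per held-out ID.
function outOfFold(ids, k, fitPredict) {
  const oof = new Map();
  for (let f = 0; f < k; f++) {
    const heldOut = ids.filter((_, i) => i % k === f); // round-robin fold assignment
    const trainIds = ids.filter((_, i) => i % k !== f);
    const preds = fitPredict(trainIds, heldOut);
    heldOut.forEach((id, j) => oof.set(id, preds[j]));
  }
  return oof;
}

// Usage with a trivial "predict the training mean" model on a single crop.
const label = { a: 1, b: 2, c: 3, d: 4 };
const meanModel = (trainIds, heldOut) => {
  const m = trainIds.reduce((s, id) => s + label[id], 0) / trainIds.length;
  return heldOut.map(() => m);
};
const oof = outOfFold(["a", "b", "c", "d"], 2, meanModel);
// oof: a -> 3 and c -> 3 (fitted on b, d); b -> 2 and d -> 2 (fitted on a, c)
```

The cross-validated MSE is then computed from oof against the labels. If village IDs are ordered geographically, round-robin folds interleave neighbours, so a spatially blocked assignment may give a less optimistic estimate.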

Official-data verification (same structure)

Drop data/official/zonal.csv and data/official/train-labels.csv in place, then run node verify.mjs. It runs the identical evaluation checks on the real data via k-fold cross-validation and writes data/official/submission.csv. To rehearse the path before the real files arrive:

node tools/synth-to-official.mjs data/official --n 1000 --T 12 --test 0.2
node verify.mjs

3. Season Explorer (interactive)

Open public/tool.html in a browser (or serve the folder). Move the crop-area sliders and watch the multi-temporal VV/VH backscatter curves respond, then see the linear-unmixing estimate recover the crop areas from the (noisy) curves. This makes the SAR ↔ acreage forward/inverse relationship tangible.
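The forward/inverse pair the Explorer visualises can be written down in a few lines. The sketch below assumes that a village's curve is the area-fraction-weighted sum of per-crop signature curves, mixed in linear power rather than dB, and that the inverse is ridge least squares clamped to [0, 1]. The Explorer's actual code and signatures are not shown in this guide. The signatures and the function names dbToLin, mixCurve, solve and unmix are hypothetical.

```javascript
// ASSUMED model: curve = sum_k fraction_k * signature_k (linear power units).
// Rows are acquisitions (VV and VH rows can simply be stacked); columns are crops.

const dbToLin = (db) => 10 ** (db / 10); // backscatter power adds linearly, not in dB

// Forward: the village curve is the fraction-weighted sum of the crop signatures.
function mixCurve(signatures, fractions) {
  return signatures.map((row) => row.reduce((s, v, k) => s + v * fractions[k], 0));
}

// Solve A x = b (small, dense) by Gaussian elimination with partial pivoting.
function solve(A, b) {
  const n = b.length;
  const M = A.map((row, i) => [...row, b[i]]);
  for (let c = 0; c < n; c++) {
    let p = c;
    for (let r = c + 1; r < n; r++) if (Math.abs(M[r][c]) > Math.abs(M[p][c])) p = r;
    [M[c], M[p]] = [M[p], M[c]];
    for (let r = c + 1; r < n; r++) {
      const m = M[r][c] / M[c][c];
      for (let j = c; j <= n; j++) M[r][j] -= m * M[c][j];
    }
  }
  const x = new Array(n).fill(0);
  for (let i = n - 1; i >= 0; i--) {
    let s = M[i][n];
    for (let j = i + 1; j < n; j++) s -= M[i][j] * x[j];
    x[i] = s / M[i][i];
  }
  return x;
}

// Inverse: ridge least squares (S^T S + lambda I) f = S^T y, then clamp to [0, 1].
function unmix(signatures, curve, lambda) {
  const K = signatures[0].length;
  const A = Array.from({ length: K }, (_, i) =>
    Array.from({ length: K }, (_, j) =>
      signatures.reduce((s, row) => s + row[i] * row[j], 0) + (i === j ? lambda : 0)));
  const b = Array.from({ length: K }, (_, i) =>
    signatures.reduce((s, row, t) => s + row[i] * curve[t], 0));
  return solve(A, b).map((f) => Math.min(1, Math.max(0, f)));
}

// Usage: two crops, four dates, a noise-free curve built from known fractions.
const sig = [[-20, -12], [-15, -14], [-10, -18], [-12, -16]].map((r) => r.map(dbToLin));
const curve = mixCurve(sig, [0.3, 0.5]);
const fractions = unmix(sig, curve, 0); // close to [0.3, 0.5]
```

Hectares follow as fraction × village_area_ha. Because λ is added to SᵀS, it has to be chosen relative to the squared linear-power scale of the signatures. A fuller model would also include a signature for non-crop land instead of treating it as zero backscatter.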

Reading the output

data/submission.csv is the deliverable:

ID,Rice_ha,Cotton_ha,Maize_ha,Bajra_ha,Groundnut_ha
1,182.34,0,140.10,55.20,0
2,0,210.55,0,0,98.40
...

One row per village; areas in hectares, rounded to 2 decimals, never negative, never exceeding the village area.
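The guarantees in that sentence imply a post-processing step on raw model outputs. The sketch below assumes that "never exceeding the village area" caps the row total. It clips negatives, shrinks over-full rows proportionally, and then truncates to two decimals so that rounding cannot push the total back over the cap. finaliseRow and toCsvLine are hypothetical names, and the proportional rescale is one reasonable choice among several.

```javascript
// ASSUMPTION: "never exceeding the village area" is read as a cap on the row total.
function finaliseRow(rawHa, villageAreaHa) {
  const clipped = rawHa.map((x) => Math.max(0, x)); // never negative
  const total = clipped.reduce((s, x) => s + x, 0);
  const scale = total > villageAreaHa ? villageAreaHa / total : 1; // shrink over-full rows
  // Truncate (not round) to 2 decimals so the total cannot creep above the area again;
  // the 1e-6 nudge keeps products like x * 100 that land a hair below an integer
  // in binary floating point from losing a cent.
  return clipped.map((x) => Math.floor(x * scale * 100 + 1e-6) / 100);
}

// Zeros print as "0" and other values with two decimals, matching the sample rows.
function toCsvLine(id, hectares) {
  return [id, ...hectares.map((h) => (h === 0 ? "0" : h.toFixed(2)))].join(",");
}

// Usage: raw predictions for village 1, whose area comfortably exceeds their total.
const line = toCsvLine(1, finaliseRow([182.344, -3, 140.1, 55.2, -0.5], 1000));
```

Truncating rather than rounding costs at most 0.01 ha per crop, which is negligible next to typical per-crop errors and keeps the area constraint strict.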

Tips for lowering MSE on real data

More acquisitions across the season improve crop separability (more phenological contrast is captured).
Enable the residual forest (new AcreageModel({ useForest: true })) for a small accuracy gain at the cost of higher runtime.
Tune the ridge lambda to the noise level of your zonal statistics.
Improve the upstream zonal statistics (terrain flattening, more careful speckle reduction): features in, accuracy out.
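Tuning lambda usually means sweeping a log-spaced grid and keeping the value with the lowest held-out error. Noisier zonal statistics typically favour a larger lambda. The guide does not name AcreageModel's lambda option, so this sketch sweeps a stand-in one-feature, no-intercept ridge whose closed form fits on one line. fitRidge1D, mseOf and sweepLambda are hypothetical names.

```javascript
// Stand-in model: ridge with one feature and no intercept, w = sum(x*y) / (sum(x*x) + lambda).
function fitRidge1D(xs, ys, lambda) {
  let sxy = 0, sxx = 0;
  xs.forEach((x, i) => { sxy += x * ys[i]; sxx += x * x; });
  return sxy / (sxx + lambda);
}

function mseOf(xs, ys, w) {
  return xs.reduce((s, x, i) => s + (ys[i] - w * x) ** 2, 0) / xs.length;
}

// Keep the lambda with the lowest validation MSE.
function sweepLambda(train, valid, grid) {
  let best = null;
  for (const lambda of grid) {
    const w = fitRidge1D(train.x, train.y, lambda);
    const err = mseOf(valid.x, valid.y, w);
    if (best === null || err < best.mse) best = { lambda, w, mse: err };
  }
  return best;
}

// Usage: a log-spaced grid (0 included) scored on held-out villages.
const grid = [0, 0.01, 0.1, 1, 10, 100];
```

Score the sweep on villages the model did not train on, such as the out-of-fold predictions from --cv. Otherwise the smallest lambda always wins.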