SAR Multi-Crop Acreage Estimator

User Guide



This guide covers the three ways to use the pipeline: the synthetic demo, the real-data run, and the interactive Season Explorer.

1. Synthetic demo (no data needed)

node run.mjs

This generates a physically motivated synthetic Kharif dataset, holds out 30% of the villages as a test set, trains the model, writes data/submission.csv, and prints metrics:

Synthetic Kharif run
  train=1050 test=450 acquisitions=12 (trained in ~100 ms)
  MSE = ~15000 ha^2   baseline(mean) = ~38000   skill = ~60%
  R^2 = ~0.59
    Rice/Cotton/Maize/Bajra/Groundnut  RMSE per crop (ha)
  submission -> data/submission.csv
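
To make the printed figures easier to interpret, here is a minimal sketch of how MSE, the mean baseline, and skill relate to each other. It assumes skill is defined as 1 - MSE / baseline MSE, with the baseline predicting a constant mean; the pipeline's own definition is not stated in this guide and may differ. The function names are hypothetical.

```javascript
// Sketch only: assumes skill = 1 - MSE / baseline MSE (not confirmed by the guide).
function mse(yTrue, yPred) {
  let sum = 0;
  for (let i = 0; i < yTrue.length; i++) {
    const err = yTrue[i] - yPred[i];
    sum += err * err;
  }
  return sum / yTrue.length;
}

// Compare the model against a baseline that always predicts `trainMean`.
function skillVsMean(yTrue, yPred, trainMean) {
  const model = mse(yTrue, yPred);
  const baseline = mse(yTrue, yTrue.map(() => trainMean));
  return { mse: model, baseline, skill: 1 - model / baseline };
}

// Plugging in the approximate figures from the sample run above:
const approxSkill = 1 - 15000 / 38000;
console.log(approxSkill.toFixed(2)); // 0.61
```

Under this definition, skill equals R² only when the baseline mean is computed on the same villages being scored. A baseline built from the training mean could explain why the two printed values differ slightly.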

Useful flags:

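  • --n 2000: number of synthetic villages (the sample run above used 1500: train=1050 plus test=450).
  • --T 14: number of acquisitions across the season (12 in the sample run above).
  • --seed 7: seed for reproducible runs.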

data/metrics.json captures the full run metrics for record-keeping.

2. Real competition data

Once you have produced a per-village zonal-statistics CSV from the official SAR tiles (see run-deploy-instructions.md for the raster recipe), run:

node run.mjs --zonal zonal.csv --labels train-labels.csv

  • zonal.csv is long-format: village_id,date,pol,backscatter_db,village_area_ha, with pol in {VV/HH (co-pol), VH/HV (cross-pol)}.
  • train-labels.csv uses the submission schema: ID,Rice_ha,Cotton_ha,Maize_ha,Bajra_ha,Groundnut_ha.
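
As an illustration of the long format, the sketch below pivots zonal rows into one time series per village and polarisation channel. The column names come from the schema above; the function name, the ISO date format, the sample values, and the grouping of VV/HH as "co" and VH/HV as "cross" are assumptions made for the example, not a description of how run.mjs parses the file.

```javascript
// Sketch only: a naive comma split, so quoted fields are not supported.
function pivotZonal(csvText) {
  const [headerLine, ...lines] = csvText.trim().split(/\r?\n/);
  const cols = headerLine.split(",").map((c) => c.trim());
  const idx = Object.fromEntries(cols.map((c, i) => [c, i]));
  const villages = new Map();
  for (const line of lines) {
    if (!line.trim()) continue;
    const f = line.split(",").map((s) => s.trim());
    const id = f[idx.village_id];
    const pol = f[idx.pol].toUpperCase();
    let channel;
    if (pol === "VV" || pol === "HH") channel = "co";
    else if (pol === "VH" || pol === "HV") channel = "cross";
    else throw new Error(`unexpected pol: ${pol}`);
    if (!villages.has(id)) {
      villages.set(id, { areaHa: Number(f[idx.village_area_ha]), co: {}, cross: {} });
    }
    // Key each backscatter value (dB) by its acquisition date.
    villages.get(id)[channel][f[idx.date]] = Number(f[idx.backscatter_db]);
  }
  return villages;
}

// Made-up rows, for illustration only.
const sample = `village_id,date,pol,backscatter_db,village_area_ha
1,2023-06-05,VV,-9.8,412.5
1,2023-06-05,VH,-16.1,412.5
1,2023-06-17,VV,-8.9,412.5
2,2023-06-05,VV,-10.4,300`;
const v = pivotZonal(sample);
console.log(v.get("1").co["2023-06-05"]); // -9.8
```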

Villages present in the labels file are used for training; all other villages (the test set) are predicted and written to data/submission.csv. Add --cv to print out-of-fold cross-validated MSE, R², and skill for the labelled villages.
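
For readers unfamiliar with out-of-fold evaluation, the idea can be sketched as follows: each labelled village is predicted by a model that never saw it during training. The fit and predict callbacks are hypothetical stand-ins for the real model, and the interleaved fold assignment is a simplification; this is not the pipeline's own --cv code.

```javascript
// Sketch only: `fit` and `predict` are hypothetical callbacks.
function kFoldOutOfFold(X, y, k, fit, predict) {
  const n = X.length;
  const oof = new Array(n);
  for (let fold = 0; fold < k; fold++) {
    const trainIdx = [];
    const testIdx = [];
    // Row i belongs to the held-out fold when i % k === fold.
    for (let i = 0; i < n; i++) (i % k === fold ? testIdx : trainIdx).push(i);
    const model = fit(trainIdx.map((i) => X[i]), trainIdx.map((i) => y[i]));
    for (const i of testIdx) oof[i] = predict(model, X[i]);
  }
  return oof;
}

// Toy usage: a "model" that always predicts the training mean.
const fitMean = (_X, y) => y.reduce((a, b) => a + b, 0) / y.length;
const predictMean = (mean, _x) => mean;
const oof = kFoldOutOfFold([[0], [1], [2], [3]], [10, 20, 30, 40], 2, fitMean, predictMean);
console.log(oof); // [ 30, 20, 30, 20 ]
```

Scoring the out-of-fold predictions against the labels gives an error estimate that is not inflated by training-set fit. With real data, shuffling the rows with a fixed seed before assigning folds is usually safer than relying on file order.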

Official-data verification (same structure)

Place zonal.csv and train-labels.csv in data/official/, then run node verify.mjs. It applies the same evaluation checks to the real data using k-fold cross-validation and writes data/official/submission.csv. To rehearse this path before the real files arrive:

node tools/synth-to-official.mjs data/official --n 1000 --T 12 --test 0.2
node verify.mjs

3. Season Explorer (interactive)

Open public/tool.html in a browser (or serve the folder). Move the crop-area sliders and watch the multi-temporal VV/VH backscatter curves respond, then see the linear-unmixing estimate recover the crop areas from the (noisy) curves. This makes the SAR ↔ acreage forward/inverse relationship tangible.
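
The forward/inverse relationship can be sketched in a few lines. This sketch assumes the village curve is a weighted sum of per-crop signature curves (in linear power rather than dB) and that the inverse is ridge-regularised least squares; the signature values and function names are invented for illustration and are not taken from tool.html.

```javascript
// Sketch only: linear mixing model with made-up signatures.
function forward(signatures, weights) {
  // signatures[c][t]: signature of crop c at acquisition t.
  const T = signatures[0].length;
  const y = new Array(T).fill(0);
  signatures.forEach((sig, c) => sig.forEach((s, t) => (y[t] += weights[c] * s)));
  return y;
}

// Gaussian elimination with partial pivoting for a small square system.
function solveLinear(A, b) {
  const n = b.length;
  const M = A.map((row, i) => [...row, b[i]]);
  for (let col = 0; col < n; col++) {
    let piv = col;
    for (let r = col + 1; r < n; r++) {
      if (Math.abs(M[r][col]) > Math.abs(M[piv][col])) piv = r;
    }
    [M[col], M[piv]] = [M[piv], M[col]];
    for (let r = col + 1; r < n; r++) {
      const f = M[r][col] / M[col][col];
      for (let c = col; c <= n; c++) M[r][c] -= f * M[col][c];
    }
  }
  const x = new Array(n).fill(0);
  for (let i = n - 1; i >= 0; i--) {
    let s = M[i][n];
    for (let j = i + 1; j < n; j++) s -= M[i][j] * x[j];
    x[i] = s / M[i][i];
  }
  return x;
}

// Inverse step: solve (S^T S + lambda I) a = S^T y, then clip negatives to 0.
function unmix(signatures, y, lambda = 0) {
  const A = signatures.map((si, i) =>
    signatures.map((sj, j) => si.reduce((acc, v, t) => acc + v * sj[t], 0) + (i === j ? lambda : 0))
  );
  const b = signatures.map((si) => si.reduce((acc, v, t) => acc + v * y[t], 0));
  return solveLinear(A, b).map((a) => Math.max(0, a));
}

const signatures = [
  [0.1, 0.2, 0.3, 0.2], // hypothetical crop A
  [0.3, 0.2, 0.1, 0.1], // hypothetical crop B
];
const y = forward(signatures, [0.6, 0.4]);
const est = unmix(signatures, y);
console.log(est.map((a) => a.toFixed(3))); // [ '0.600', '0.400' ]
```

With noise-free curves and distinct signatures the weights are recovered exactly. Adding noise to y, or making two signatures nearly identical, shows why a nonzero lambda and more acquisitions help.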

Reading the output

data/submission.csv is the deliverable:

ID,Rice_ha,Cotton_ha,Maize_ha,Bajra_ha,Groundnut_ha
1,182.34,0,140.10,55.20,0
2,0,210.55,0,0,98.40
...

One row per village; areas in hectares, rounded to 2 decimals, never negative, never exceeding the village area.
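
A minimal sketch of how such constraints could be enforced on raw predictions is shown below. It assumes "never exceeding the village area" applies to the sum of the five crops and is handled by proportional rescaling; the pipeline may instead cap each crop individually. The function names are hypothetical, and the exact number formatting of the real file (for example trailing zeros) is not reproduced.

```javascript
// Sketch only: clip, rescale to the village area, round to 2 decimals.
const CROPS = ["Rice", "Cotton", "Maize", "Bajra", "Groundnut"];

function constrainRow(raw, villageAreaHa) {
  const clipped = raw.map((v) => Math.max(0, v)); // never negative
  const total = clipped.reduce((a, b) => a + b, 0);
  // Assumption: shrink all crops proportionally if their sum exceeds the area.
  const scale = total > villageAreaHa ? villageAreaHa / total : 1;
  return clipped.map((v) => Math.round(v * scale * 100) / 100);
}

function toCsvLine(id, areas) {
  return [id, ...areas].join(",");
}

const header = ["ID", ...CROPS.map((c) => `${c}_ha`)].join(",");
const row = constrainRow([182.338, -3.2, 140.1, 55.2, 0], 500);
console.log(header); // ID,Rice_ha,Cotton_ha,Maize_ha,Bajra_ha,Groundnut_ha
console.log(toCsvLine(1, row)); // 1,182.34,0,140.1,55.2,0
```

Because rounding happens after rescaling, the rounded sum can overshoot the village area by a few hundredths of a hectare; rounding down instead would avoid that if the limit must be strict.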

Tips for lowering MSE on real data

  • More acquisitions across the season improve crop separability (more phenological contrast captured).
  • Enable the residual forest (new AcreageModel({ useForest: true })) for a small accuracy gain at higher runtime.
  • Tune the ridge lambda to the noise level of your zonal statistics.
  • Improve the upstream zonal stats (terrain flattening, more careful speckle reduction): features in, accuracy out.
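
One common way to tune a ridge lambda is to try a small grid and keep the value with the lowest validation error. The sketch below does this for a single-feature ridge fit without intercept, where the weight is sum(x*y) / (sum(x*x) + lambda). The grid, the toy data, and the function names are illustrative assumptions; this guide does not specify how the real model exposes its lambda.

```javascript
// Sketch only: one feature, no intercept, toy numbers.
function ridge1d(x, y, lambda) {
  let sxy = 0;
  let sxx = 0;
  for (let i = 0; i < x.length; i++) {
    sxy += x[i] * y[i];
    sxx += x[i] * x[i];
  }
  return sxy / (sxx + lambda);
}

// Return the grid value with the lowest MSE on the validation split.
function pickLambda(train, valid, grid) {
  let chosen = { lambda: grid[0], mse: Infinity };
  for (const lambda of grid) {
    const w = ridge1d(train.x, train.y, lambda);
    const mse =
      valid.x.reduce((acc, xi, i) => acc + (valid.y[i] - w * xi) ** 2, 0) / valid.x.length;
    if (mse < chosen.mse) chosen = { lambda, mse };
  }
  return chosen;
}

// Training targets are noisy (biased high); validation follows y = 2x.
const best = pickLambda({ x: [1, 2], y: [3, 5] }, { x: [1, 3], y: [2, 6] }, [0, 1.5, 10]);
console.log(best.lambda); // 1.5
```

In this toy case the unregularised fit overshoots and the largest lambda shrinks too far, so the middle value wins. Noisier zonal statistics generally favour larger values.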