User Guide — SAR Multi-Crop Acreage Estimator
This guide covers the three ways to use the pipeline: the synthetic demo, the real-data run, and the interactive Season Explorer.
1. Synthetic demo (no data needed)
node run.mjs
This generates a physically-motivated synthetic Kharif dataset, holds out 30% of villages as a test set, trains the model, writes data/submission.csv, and prints metrics:
Synthetic Kharif run
train=1050 test=450 acquisitions=12 (trained in ~100 ms)
MSE = ~15000 ha^2 baseline(mean) = ~38000 skill = ~60%
R^2 = ~0.59
Rice/Cotton/Maize/Bajra/Groundnut RMSE per crop (ha)
submission -> data/submission.csv
Useful flags:
- --n 2000: number of synthetic villages.
- --T 14: acquisitions across the season.
- --seed 7: reproducibility seed.
data/metrics.json captures the full run metrics for record-keeping.
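To make the printed metrics easier to interpret, here is a minimal sketch of how MSE, the mean baseline, and skill could relate to each other. It assumes skill is defined as 1 - MSE(model) / MSE(baseline), which is consistent with the sample figures above (1 - 15000/38000 is about 0.6), and that the baseline predicts each crop's mean training area for every village. The function names are hypothetical and are not taken from run.mjs.

```javascript
// Mean squared error over every village-by-crop cell (units: ha^2).
// `actual` and `predicted` are arrays of rows, one row of crop areas per village.
function mse(actual, predicted) {
  let sum = 0;
  let n = 0;
  for (let i = 0; i < actual.length; i++) {
    for (let c = 0; c < actual[i].length; c++) {
      const d = actual[i][c] - predicted[i][c];
      sum += d * d;
      n++;
    }
  }
  return sum / n;
}

// Assumed baseline: predict the per-crop mean of the training labels
// for each of the `nTest` held-out villages.
function meanBaseline(trainLabels, nTest) {
  const nCrops = trainLabels[0].length;
  const means = new Array(nCrops).fill(0);
  for (const row of trainLabels) {
    row.forEach((v, c) => { means[c] += v / trainLabels.length; });
  }
  return Array.from({ length: nTest }, () => means.slice());
}

// Assumed definition: fraction of the baseline error removed by the model.
function skill(modelMse, baselineMse) {
  return 1 - modelMse / baselineMse;
}
```

Under this definition a skill of 0 means the model is no better than predicting the mean, and a skill of 1 means a perfect fit.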
2. Real competition data
Once you have produced a per-village zonal-statistics CSV from the official SAR tiles (see run-deploy-instructions.md for the raster recipe), run:
node run.mjs --zonal zonal.csv --labels train-labels.csv
zonal.csv is long-format: village_id,date,pol,backscatter_db,village_area_ha
with pol in {VV/HH (co-pol), VH/HV (cross-pol)}.
train-labels.csv uses the submission schema:
ID,Rice_ha,Cotton_ha,Maize_ha,Bajra_ha,Groundnut_ha.
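As an illustration of the long format, the sketch below pivots zonal rows into one record per village, with a co-pol and a cross-pol time series keyed by date. It assumes a header row, comma separators, and no quoted fields; `pivotZonal` is a hypothetical helper, not part of the pipeline's API.

```javascript
// Pivot long-format zonal rows into one record per village.
// Assumes: header row present, comma-separated, no quoted fields.
function pivotZonal(csvText) {
  const lines = csvText.trim().split(/\r?\n/);
  const header = lines[0].split(',').map((h) => h.trim());
  const col = (name) => header.indexOf(name);
  const villages = new Map();

  for (const line of lines.slice(1)) {
    if (!line.trim()) continue; // skip blank lines
    const f = line.split(',').map((s) => s.trim());
    const id = f[col('village_id')];
    const pol = f[col('pol')].toUpperCase();

    // Fold VV/HH into the co-pol channel and VH/HV into the cross-pol channel.
    let channel = null;
    if (pol === 'VV' || pol === 'HH') channel = 'co';
    else if (pol === 'VH' || pol === 'HV') channel = 'cross';
    if (channel === null) throw new Error(`unknown polarisation: ${pol}`);

    if (!villages.has(id)) {
      villages.set(id, {
        areaHa: Number(f[col('village_area_ha')]),
        co: {},
        cross: {},
      });
    }
    // One backscatter value (dB) per village, channel, and date.
    villages.get(id)[channel][f[col('date')]] = Number(f[col('backscatter_db')]);
  }
  return villages;
}
```

Keying each series by date keeps the acquisitions aligned across villages even if some rows are missing.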
Villages present in the labels are used for training; the rest (the test set) are predicted into data/submission.csv. Add --cv to print an honest out-of-fold cross-validated MSE/R²/skill on the labelled villages.
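The split by label presence and the out-of-fold idea behind --cv can be sketched as follows. This is an illustration only: the helper names are hypothetical, and the round-robin fold assignment is an assumption, since the guide does not say how run.mjs assigns folds.

```javascript
// Villages with a label row go to training; all others form the test set.
function splitByLabels(villageIds, labelledIds) {
  const labelled = new Set(labelledIds.map(String));
  const train = [];
  const test = [];
  for (const id of villageIds) {
    (labelled.has(String(id)) ? train : test).push(id);
  }
  return { train, test };
}

// Round-robin k-fold assignment (an assumed scheme). Each fold is held out
// once for validation while the remaining folds are used for training, so
// every labelled village is scored by a model that never saw it.
function kFolds(ids, k) {
  const folds = Array.from({ length: k }, () => []);
  ids.forEach((id, i) => folds[i % k].push(id));
  return folds.map((held, f) => ({
    validate: held,
    train: folds.filter((_, g) => g !== f).flat(),
  }));
}
```

Out-of-fold scores are "honest" in the sense that no prediction is evaluated on a village the model was trained on.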
Official-data verification (same structure)
Drop data/official/zonal.csv and data/official/train-labels.csv in place, then run node verify.mjs. It runs the identical evaluation checks on the real data via k-fold cross-validation and writes data/official/submission.csv. To rehearse the path before the real files arrive:
node tools/synth-to-official.mjs data/official --n 1000 --T 12 --test 0.2
node verify.mjs
3. Season Explorer (interactive)
Open public/tool.html in a browser (or serve the folder). Move the crop-area sliders and watch the multi-temporal VV/VH backscatter curves respond, then see the linear-unmixing estimate recover the crop areas from the (noisy) curves. This makes the SAR ↔ acreage forward/inverse relationship tangible.
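The forward/inverse relationship can be sketched in a few lines. The example below assumes each crop has a known temporal signature and that a village's curve is the fraction-weighted sum of those signatures; the inverse step is ordinary least squares with an optional ridge term. The function names, the signature layout (one row per crop), and the `lambda` parameter are illustrative assumptions and may differ from what tool.html does.

```javascript
// Forward model: curve[t] = sum over crops of fractions[c] * signatures[c][t].
function forward(signatures, fractions) {
  const T = signatures[0].length;
  return Array.from({ length: T }, (_, t) =>
    signatures.reduce((acc, sig, c) => acc + fractions[c] * sig[t], 0));
}

// Solve A x = b by Gaussian elimination with partial pivoting.
function solve(A, b) {
  const n = b.length;
  const M = A.map((row, i) => [...row, b[i]]); // augmented matrix
  for (let k = 0; k < n; k++) {
    let p = k;
    for (let i = k + 1; i < n; i++) {
      if (Math.abs(M[i][k]) > Math.abs(M[p][k])) p = i;
    }
    const tmp = M[k];
    M[k] = M[p];
    M[p] = tmp;
    for (let i = k + 1; i < n; i++) {
      const f = M[i][k] / M[k][k];
      for (let j = k; j <= n; j++) M[i][j] -= f * M[k][j];
    }
  }
  const x = new Array(n).fill(0);
  for (let i = n - 1; i >= 0; i--) {
    let s = M[i][n];
    for (let j = i + 1; j < n; j++) s -= M[i][j] * x[j];
    x[i] = s / M[i][i];
  }
  return x;
}

// Inverse: solve the normal equations (G + lambda * I) a = r, where
// G[i][j] = signatures[i] . signatures[j] and r[i] = signatures[i] . curve.
// lambda = 0 gives plain least squares; lambda > 0 adds ridge shrinkage.
function unmix(signatures, curve, lambda = 0) {
  const dot = (u, v) => u.reduce((acc, x, t) => acc + x * v[t], 0);
  const G = signatures.map((si, i) =>
    signatures.map((sj, j) => dot(si, sj) + (i === j ? lambda : 0)));
  const r = signatures.map((si) => dot(si, curve));
  return solve(G, r);
}
```

With noise-free curves and linearly independent signatures, `unmix(signatures, forward(signatures, fractions))` recovers the fractions up to floating-point error. With noisy curves the estimate degrades, which is the behaviour the sliders let you explore.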
Reading the output
data/submission.csv is the deliverable:
ID,Rice_ha,Cotton_ha,Maize_ha,Bajra_ha,Groundnut_ha
1,182.34,0,140.10,55.20,0
2,0,210.55,0,0,98.40
...
One row per village; areas in hectares, rounded to 2 decimals, never negative, never exceeding the village area.
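A minimal sketch of how raw predictions could be brought into line with these constraints is shown below. It reads "never exceeding the village area" as a cap on the total across crops, which is an assumption; if the cap is meant per crop, clamp each value to the village area instead. `constrainAreas` is a hypothetical helper, not the pipeline's own post-processing.

```javascript
// Enforce the submission constraints on one village's raw predictions (ha).
function constrainAreas(raw, villageAreaHa) {
  // 1. Areas can never be negative.
  const nonNeg = raw.map((v) => Math.max(0, v));

  // 2. If the total exceeds the village area, scale all crops down
  //    proportionally so the total equals the village area.
  const total = nonNeg.reduce((a, b) => a + b, 0);
  const scale = total > villageAreaHa ? villageAreaHa / total : 1;

  // 3. Round to 2 decimals. Rounding can move the total by a few
  //    hundredths of a hectare in either direction.
  return nonNeg.map((v) => Math.round(v * scale * 100) / 100);
}
```

Proportional scaling preserves the predicted crop mix while respecting the area cap.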
Tips for lowering MSE on real data
- More acquisitions across the season improve crop separability (more
phenological contrast captured).
- Enable the residual forest (new AcreageModel({ useForest: true })) for a
small accuracy gain at higher runtime.
- Tune the ridge lambda to the noise level of your zonal statistics.
- Improve the upstream zonal stats (terrain flattening, more careful speckle
reduction): features in, accuracy out.