dc-l2-lagos-snap-normalize
Analyst notes
Description
Multi-step data-cleaning chain on a hand-crafted Lagos zoning fixture. The task tests whether the agent can read the data carefully enough to infer the four canonical zoning families from the variant spellings, order the cleanup steps so the dissolve actually unifies adjacent parcels, and resist the temptation to reproject to Web Mercator for a portal export. The deliberate hidden gotcha is the sub-millimetre vertex offset between adjacent parcels: a dissolve without a snap looks like it works but leaves the output riddled with tiny interior rings along the shared edges.
Approach
- Read the GPKG and look at the zoning_class values to figure out the four canonical families behind the variant spellings.
- Snap every coordinate to a 1 mm grid so adjacent parcels' shared corners coincide exactly.
- Drop the zero-area ghost polygons left over from collinear-vertex parcels.
- Normalise each non-blank class label to its canonical TitleCase value and drop the blank or whitespace-only rows.
- Dissolve per canonical class and recompute area_m2 from the resulting geometry in metres.
- Write the four-row result as GPKG, keeping the input CRS intact.
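The pre-dissolve steps above can be sketched in plain Python. This is a minimal illustration, not the reference solution: the function names, the sample labels, and the assumption that variant spellings differ only by case and whitespace are all hypothetical (abbreviations would need an explicit lookup table). The dissolve, area recomputation, and GPKG write are left to a GIS library.

```python
from collections import defaultdict

GRID = 0.001  # 1 mm, expressed in the CRS unit (metres)


def snap(ring, grid=GRID):
    """Round every (x, y) vertex to the nearest multiple of `grid`."""
    return [(round(x / grid) * grid, round(y / grid) * grid) for x, y in ring]


def ring_area(ring):
    """Unsigned shoelace area; works for open or closed rings."""
    total = 0.0
    for i in range(len(ring)):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % len(ring)]
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


def canonical_label(raw):
    """TitleCase a label, or return None if it is blank or whitespace-only.

    Assumption: variants differ only by case and whitespace.
    """
    if raw is None:
        return None
    cleaned = " ".join(raw.split())  # trim and collapse internal whitespace
    return cleaned.title() if cleaned else None


def prepare(parcels, grid=GRID, min_area=1e-9):
    """Snap, drop ghosts and blank labels, then group rings per class."""
    groups = defaultdict(list)
    for label, ring in parcels:
        snapped = snap(ring, grid)
        if ring_area(snapped) <= min_area:
            continue  # zero-area ghost polygon (collinear vertices)
        canon = canonical_label(label)
        if canon is None:
            continue  # blank or whitespace-only class
        groups[canon].append(snapped)
    return dict(groups)


# Hypothetical sample: labels and coordinates are made up for illustration.
parcels = [
    ("RESIDENTIAL", [(0.0, 0.0), (10.0000004, 0.0), (10.0000004, 10.0), (0.0, 10.0)]),
    (" residential ", [(9.9999997, 0.0), (20.0, 0.0), (20.0, 10.0), (9.9999997, 10.0)]),
    ("Commercial", [(0.0, 10.0), (5.0, 10.0), (10.0, 10.0)]),  # collinear ghost
    ("   ", [(20.0, 0.0), (30.0, 0.0), (30.0, 10.0), (20.0, 10.0)]),  # blank label
]
groups = prepare(parcels)
print(sorted(groups))
```

After snapping, the shared corner of the two sample parcels is bit-for-bit identical in both rings, which is the property the dissolve relies on to merge them without slivers.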
Pitfalls
- Dissolving without snapping first looks correct on a quick check but produces MultiPolygons with hundreds of sub-millimetre interior holes along the internal grid lines, so the strict-Polygon and no-interior-holes subchecks both fail.
- Reprojecting to Web Mercator before writing, because it feels like the universal web-portal CRS, costs two subchecks: the soft-CRS policy reprojects the submission back to EPSG:26331 so the geometric checks still run, but the canonical and meaningful-set checks both fail.
- Picking ALL-CAPS canonical labels instead of TitleCase fails the canonical-class subcheck, even though the per-class area still matches.
- Leaving the blank-class rows in produces a fifth row that trips the count gate.
- Skipping the dissolve and emitting per-parcel rows blows the count gate completely (10 000 vs 4).
- Carrying area_m2 over from the stale input column instead of recomputing it from the dissolved geometry leaves the values at the per-parcel 100 m² nominal, so the area-recomputed subcheck fails.
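The pitfalls above can be turned into a rough pre-submission self-check. The sketch below is an assumption-laden approximation, not the grader: the row layout (plain dicts with `geom_type`, `n_interior_rings`, and `geometry_area` keys), the function name, the class labels, and the sample areas are all hypothetical, and the returned strings merely echo the subcheck names used in these notes.

```python
def self_check(rows, expected_classes, crs, expected_crs="EPSG:26331", tol=1e-6):
    """Return labels for the pitfalls a candidate output appears to have hit."""
    problems = []
    if crs != expected_crs:
        problems.append("crs")  # e.g. reprojected to Web Mercator
    if len(rows) != len(expected_classes):
        problems.append("count")  # blank rows kept, or dissolve skipped
    if {r["zoning_class"] for r in rows} != set(expected_classes):
        problems.append("canonical-class")  # e.g. ALL-CAPS labels
    if any(r["geom_type"] != "Polygon" for r in rows):
        problems.append("strict-polygon")
    if any(r["n_interior_rings"] > 0 for r in rows):
        problems.append("no-interior-holes")  # dissolve without snap
    if any(abs(r["area_m2"] - r["geometry_area"]) > tol for r in rows):
        problems.append("area-recomputed")  # stale input column
    return problems


# Hypothetical labels and areas, for illustration only.
classes = ["Residential", "Commercial", "Industrial", "Mixed"]
good = [
    dict(zoning_class=c, geom_type="Polygon", n_interior_rings=0,
         area_m2=2500.0, geometry_area=2500.0)
    for c in classes
]
bad = [
    dict(good[0], zoning_class="RESIDENTIAL", geom_type="MultiPolygon",
         n_interior_rings=300, area_m2=100.0)
] + good[1:]

print(self_check(good, classes, "EPSG:26331"))
print(self_check(bad, classes, "EPSG:3857"))
```

In practice the per-row summaries would be derived from the written GPKG with a GIS library; the dict form is used here only to keep the sketch dependency-free.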
Recent runs (task v3)
| adapter | started | score | steps | duration | cost | status |
|---|---|---|---|---|---|---|
| openrouter-gemma4-26b-basic | 2026-06-18T07:32:32Z | pending | — | — | — | pending |
| openrouter-deepseek-v4-flash-basic | 2026-06-18T03:08:04Z | 0.00 | 14 | 2:25 | 0.60¢ | done |
| openrouter-deepseek-v4-flash-detailed | 2026-06-17T22:01:33Z | 0.00 | 23 | 3:47 | 1.37¢ | done |
| openrouter-gemma4-26b-detailed | 2026-06-17T19:47:47Z | 0.00 | 8 | 0:46 | 0.41¢ | done |
| openrouter-deepseek-v4-flash-basic | 2026-06-16T21:43:55Z | 1.00 | 16 | 2:21 | 0.55¢ | done |