dc-l2-lagos-snap-normalize

Status: done
Score: 0.750
Duration: 2:09
Cost: 0.14¢
Model: google/gemma-4-26b-a4b-it

0:00
I need to get our legacy zoning layer ready for the state portal. lagos_zoning_legacy.gpkg has inconsistent zoning_class values (mixed casing, spelling variations, and some blank entries) plus vertex precision issues that cause tiny gaps and overlaps between adjacent parcels. Can you clean up the class names to consistent canonical TitleCase values, drop any rows with blank or whitespace-only classes, fix the vertex precision problems, and produce one merged geometry per canonical zoning class? Please write the result to zoning_aggregated.gpkg, with each row carrying two columns: zoning_class and a numeric area_m2.
0:00
System info
{
  "adapter": "openrouter",
  "model": "google/gemma-4-26b-a4b-it",
  "base_url": "https://openrouter.ai/api/v1"
}
0:01
Bash ls -l
total 2468
-rw-r--r-- 1 agent agent 2527232 Jun 16 14:20 lagos_zoning_legacy.gpkg
0:12
Write /work/inspect_data.py
content · 15 lines · py
import geopandas as gpd

def inspect_data(file_path):
    gdf = gpd.read_file(file_path)
    print("Columns:", gdf.columns.tolist())
    print("\nValue counts for 'zoning_class':")
    print(gdf['zoning_class'].value_counts(dropna=False))
    print("\nGeometry type:", gdf.geometry.type.unique())
    print("\nShape:", gdf.shape)
    print("\nCRS:", gdf.crs)
    return gdf

if __name__ == "__main__":
    gdf = inspect_data('lagos_zoning_legacy.gpkg')
Wrote /work/inspect_data.py (453 chars).
0:15
Bash python inspect_data.py
Columns: ['parcel_id', 'lga_source', 'zoning_class', 'area_m2', 'geometry']

Value counts for 'zoning_class':
zoning_class
Residential     420
COMMERCIAL      420
industrial      420
Agri.           420
residential     419
resi.           419
Comm.           419
COMM.           419
indus.          419
AGRI.           419
Industrial      418
AGRICULTURAL    418
RESIDENTIAL     417
RESI.           417
comm.           417
commercial      417
Indus.          417
INDUSTRIAL      417
Agricultural    417
agri.           417
Resi.           416
Commercial      416
INDUS.          416
agricultural    416
                 13
                 13
NaN              12
\t               12
Name: count, dtype: int64

Geometry type: <ArrowStringArray>
['Polygon']
Length: 1, dtype: str

Shape: (10080, 5)

CRS: EPSG:26331
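The value counts above appear to collapse to four classes once case, surrounding whitespace, and a trailing period are ignored. A minimal standard-library sketch of that normalization follows. The names `CANONICAL` and `normalize_class` are hypothetical and not part of this run, and returning None for unknown spellings is an assumption.

```python
from typing import Optional

# Hypothetical lookup: lowercase stem (trailing period removed) -> canonical TitleCase name.
CANONICAL = {
    "residential": "Residential", "resi": "Residential",
    "commercial": "Commercial", "comm": "Commercial",
    "industrial": "Industrial", "indus": "Industrial",
    "agricultural": "Agricultural", "agri": "Agricultural",
}

def normalize_class(value: object) -> Optional[str]:
    """Return the canonical class, or None for blank, missing, or unknown values."""
    if not isinstance(value, str):
        return None  # NaN / None as read from the file
    key = value.strip().lower().rstrip(".")
    # Blank or whitespace-only strings become "" and are not in the lookup.
    return CANONICAL.get(key)

if __name__ == "__main__":
    for raw in ["RESI.", "comm.", "Indus.", "agricultural", "  ", "\t", None]:
        print(repr(raw), "->", normalize_class(raw))
```

Rows for which `normalize_class` returns None can then be dropped in one step, which covers the NaN, empty, and whitespace-only entries seen in the counts.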
1:21
Write /work/solve.py
content · 93 lines · py
import geopandas as gpd
import pandas as pd
from shapely.geometry import Polygon, MultiPolygon
from shapely.ops import unary_union

def clean_zoning_class(val):
    if not isinstance(val, str):
        return None
    s = val.strip().lower()
    if not s:
        return None
    
    # Mapping variations to canonical TitleCase names
    # Based on inspection: Residential, Commercial, Industrial, Agricultural
    if s in ['residential', 'resi.', 'resi']:
        return 'Residential'
    elif s in ['commercial', 'comm.', 'comm']:
        return 'Commercial'
    elif s in ['industrial', 'indus.', 'indus']:
        return 'Industrial'
    elif s in ['agricultural', 'agri.', 'agri']:
        return 'Agricultural'
    else:
        # Fallback: just Title Case if it's unknown but not empty
        return s.capitalize()

def solve():
    # 1. Load data
    gdf = gpd.read_file('lagos_zoning_legacy.gpkg')
    
    # 2. Clean zoning_class
    # Apply cleaning directly: clean_zoning_class maps non-string values (NaN/None) to None.
    # Casting with astype(str) first can turn missing values into the literal string 'nan'.
    gdf['zoning_class'] = gdf['zoning_class'].apply(clean_zoning_class)
    
    # Drop rows with None (which covers NaN, empty, and whitespace-only after cleaning)
    gdf = gdf.dropna(subset=['zoning_class'])
    
    # 3. Fix vertex precision and resolve gaps/overlaps
    # A common trick for "fixing" gaps/overlaps in vector data is to round coordinates 
    # and then dissolve. To handle the "gaps and overlaps", we can use a small buffer 
    # or simply rely on the dissolve which handles overlaps by merging geometries.
    # However, 'tiny gaps' usually require a small buffer or snapping.
    # We'll use a small buffer(0) or simplify to snap vertices, but a more robust 
    # way for "merging per class" is to dissolve.
    
    # For precision issues, let's round the coordinates to a reasonable precision (e.g., 2 decimal places for meters)
    # Given CRS is EPSG:26331 (meters), 2 decimal places is ~1cm precision.
    def snap_to_grid(geom, precision=0.01):
        # Using a simple approach: round coordinates
        # shapely's simplify can also help, but rounding is more deterministic for "precision issues"
        # However, the easiest way to fix "tiny gaps/overlaps" is to dissolve.
        # If gaps exist, dissolve will create a MultiPolygon with holes.
        # If they must be "fixed", we could buffer by a tiny amount then dissolve.
        return geom.buffer(0.01).buffer(-0.01) # This can be risky.
    
    # Let's try a safer approach: dissolve first, then clean up. 
    # The prompt says "fix the vertex precision problems, and produce one merged geometry per canonical zoning class".
    # We will round the coordinates to fix precision.
    
    def round_coords(geom, precision=0.01):
        if geom.is_empty:
            return geom
        # Note: simplify() drops vertices within the tolerance; it does not round coordinates to a grid
        return geom.simplify(precision, preserve_topology=True)

    gdf['geometry'] = gdf['geometry'].apply(lambda x: round_coords(x, 0.01))

    # 4. Aggregate (Dissolve)
    # Dissolve by 'zoning_class'
    # We need to calculate area_m2 for the aggregated geometry
    
    # Group by zoning_class and dissolve
    dissolved = gdf.dissolve(by='zoning_class', aggfunc={'area_m2': 'sum'})
    
    # Note: the 'dissolve' method in geopandas returns a GeoDataFrame where 
    # the grouping column becomes the index. We need to reset it and ensure 
    # 'area_m2' is the calculated area of the *new* geometry.
    
    # Let's re-calculate area for the final merged geometries to be sure.
    dissolved = dissolved.reset_index()
    dissolved['area_m2'] = dissolved.geometry.area
    
    # Keep only the required columns
    result = dissolved[['zoning_class', 'area_m2', 'geometry']]
    
    # 5. Write result
    result.to_file('zoning_aggregated.gpkg', driver='GPKG')
    print("Successfully created zoning_aggregated.gpkg")
    print(result)

if __name__ == "__main__":
    solve()
Wrote /work/solve.py (3999 chars).
1:23
Bash python solve.py
Successfully created zoning_aggregated.gpkg
   zoning_class  ...                                           geometry
0  Agricultural  ...  MULTIPOLYGON (((540500 720500, 540500 720510, ...
1    Commercial  ...  MULTIPOLYGON (((540510 720030, 540510 720030, ...
2    Industrial  ...  MULTIPOLYGON (((540000 720530, 540000 720540, ...
3   Residential  ...  MULTIPOLYGON (((540010 720000, 540000 720000, ...

[4 rows x 3 columns]
1:32
Bash python -c "import geopandas as gpd; gdf = gpd.read_file('zoning_aggregated.gpkg'...
   zoning_class  ...                                           geometry
0  Agricultural  ...  MULTIPOLYGON (((540500 720500, 540500 720510, ...
1    Commercial  ...  MULTIPOLYGON (((540510 720030, 540510 720030, ...
2    Industrial  ...  MULTIPOLYGON (((540000 720530, 540000 720540, ...
3   Residential  ...  MULTIPOLYGON (((540010 720000, 540000 720000, ...

[4 rows x 3 columns]
['zoning_class', 'area_m2', 'geometry']
zoning_class         str
area_m2          float64
geometry        geometry
dtype: object
1:59
Write
Invalid JSON arguments: Unterminated string starting at: line 1 column 13 (char 12)
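The `round_coords` helper in solve.py calls `simplify`, which removes vertices within a tolerance but leaves the remaining coordinates unchanged. A minimal standard-library sketch of actual grid snapping follows, assuming coordinates in meters (EPSG:26331) and a hypothetical precision of 2 decimal places (1 cm). `snap_ring` is an illustrative name, not something from this run. With Shapely 2.x, `shapely.set_precision(geom, grid_size)` is the library call that serves this purpose on real geometries.

```python
from typing import List, Tuple

Coord = Tuple[float, float]

def snap_ring(ring: List[Coord], decimals: int = 2) -> List[Coord]:
    """Round every vertex to `decimals` places and drop consecutive duplicates."""
    snapped: List[Coord] = []
    for x, y in ring:
        pt = (round(x, decimals), round(y, decimals))
        # Vertices that collapse onto the previous one after rounding are skipped.
        if not snapped or pt != snapped[-1]:
            snapped.append(pt)
    return snapped

if __name__ == "__main__":
    # Two parcels whose shared corner differs only by sub-millimeter noise.
    a = [(540010.0000004, 720000.0), (540020.0, 720000.0)]
    b = [(540009.9999997, 720000.0), (540000.0, 720000.0)]
    print(snap_ring(a)[0] == snap_ring(b)[0])
```

Snapping makes shared vertices coincide, so a later dissolve has a better chance of merging adjacent parcels without slivers. It does not by itself guarantee valid polygons, so a validity check after snapping is still advisable.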