dc-l2-lagos-snap-normalize

Status: done
Score: 0.000
Duration: 0:46
Cost: 0.41¢
Model: google/gemma-4-26b-a4b-it

0:00
I need to get our legacy zoning layer ready for the state portal. lagos_zoning_legacy.gpkg has inconsistent zoning_class values (mixed casing, spelling variations, and some blank entries) plus vertex precision issues that cause tiny gaps and overlaps between adjacent parcels. Can you clean up the class names to consistent canonical TitleCase values, drop any rows with blank or whitespace-only classes, fix the vertex precision problems, and produce one merged geometry per canonical zoning class? Please write the result to zoning_aggregated.gpkg, with each row carrying two columns: zoning_class and a numeric area_m2.
0:00
System info
{
  "adapter": "openrouter",
  "model": "google/gemma-4-26b-a4b-it",
  "base_url": "https://openrouter.ai/api/v1"
}
0:00
Bash ls -l /work/lagos_zoning_legacy.gpkg
-rw-r--r-- 1 agent agent 2527232 Jun 17 20:06 /work/lagos_zoning_legacy.gpkg
0:03
Write /work/inspect_data.py
content · 15 lines · py
import geopandas as gpd

def inspect_data(path):
    gdf = gpd.read_file(path)
    print("Columns:", gdf.columns.tolist())
    print("Head:\n", gdf.head())
    print("\nUnique zoning_class values before cleaning:")
    print(gdf['zoning_class'].unique())
    print("\nGeometry type:", gdf.geometry.type.unique())
    print("CRS:", gdf.crs)
    return gdf

if __name__ == "__main__":
    gdf = inspect_data('/work/lagos_zoning_legacy.gpkg')
Wrote /work/inspect_data.py (440 chars).
0:04
Bash python /work/inspect_data.py
Columns: ['parcel_id', 'lga_source', 'zoning_class', 'area_m2', 'geometry']
Head:
    parcel_id  ...                                           geometry
0          1  ...  POLYGON ((540000 720000, 540010 720000, 540010...
1          2  ...  POLYGON ((540010 720000, 540020 720000, 540020...
2          3  ...  POLYGON ((540020 720000, 540030 720000, 540030...
3          4  ...  POLYGON ((540030 720000, 540040 720000, 540040...
4          5  ...  POLYGON ((540040 720000, 540050 720000, 540050...

[5 rows x 5 columns]

Unique zoning_class values before cleaning:
<ArrowStringArray>
[ 'RESIDENTIAL',  'residential',        'Resi.',        'resi.',
        'RESI.',  'Residential',        'Comm.',        'comm.',
        'COMM.',   'Commercial',   'COMMERCIAL',   'commercial',
       'Indus.',       'indus.',       'INDUS.',   'Industrial',
   'INDUSTRIAL',   'industrial',        'AGRI.', 'Agricultural',
 'AGRICULTURAL', 'agricultural',        'Agri.',        'agri.',
             '',          '   ',            nan,           '\t']
Length: 28, dtype: str

Geometry type: <ArrowStringArray>
['Polygon']
Length: 1, dtype: str
CRS: EPSG:26331
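The inspection output above lists 24 casing and spelling variants plus four blank-like values (`''`, `'   '`, `nan`, `'\t'`). As a minimal standard-library sketch of how such labels could be collapsed, assuming the four canonical classes are the only ones expected, the lookup table `CANONICAL` and the helper `canonical_class` below are hypothetical names and not part of the logged scripts:

```python
import math

# Hypothetical lookup table: lowercase variant -> canonical TitleCase class.
CANONICAL = {
    "resi.": "Residential", "residential": "Residential",
    "comm.": "Commercial", "commercial": "Commercial",
    "indus.": "Industrial", "industrial": "Industrial",
    "agri.": "Agricultural", "agricultural": "Agricultural",
}


def canonical_class(value):
    """Return the canonical class name, or None for blank or missing input."""
    if value is None or (isinstance(value, float) and math.isnan(value)):
        return None
    key = str(value).strip().lower()
    if not key:
        return None
    # Unknown labels fall back to title case instead of being dropped silently.
    return CANONICAL.get(key, key.title())


raw = ["RESI.", "comm.", "Industrial", "agri.", "", "   ", "\t", float("nan")]
cleaned = [canonical_class(v) for v in raw]
kept = [c for c in cleaned if c is not None]
print(kept)
```

Returning `None` for every blank-like input lets a single null filter drop those rows later, instead of testing for empty strings separately.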
0:18
Write /work/solve.py
content · 131 lines · py
import geopandas as gpd
import pandas as pd
from shapely.geometry import Polygon, MultiPolygon
from shapely.ops import unary_union

def clean_zoning_class(val):
    if pd.isna(val):
        return None
    s = str(val).strip().lower()
    if not s:
        return None
    
    # Define mappings for spelling variations
    mappings = {
        'resi.': 'Residential',
        'residential': 'Residential',
        'comm.': 'Commercial',
        'commercial': 'Commercial',
        'indus.': 'Industrial',
        'industrial': 'Industrial',
        'agri.': 'Agricultural',
        'agricultural': 'Agricultural'
    }
    
    # Exact lookup of the normalized key
    if s in mappings:
        return mappings[s]

    # Fallback to TitleCase if no specific mapping found
    return s.title()

def solve():
    # 1. Load data
    gdf = gpd.read_file('/work/lagos_zoning_legacy.gpkg')
    print("Loaded input data.")

    # 2. Clean zoning_class
    # First, apply the cleaning function
    gdf['zoning_class'] = gdf['zoning_class'].apply(clean_zoning_class)
    
    # Drop rows where zoning_class is None (handles NaN, '', ' ', etc.)
    gdf = gdf.dropna(subset=['zoning_class'])
    gdf = gdf[gdf['zoning_class'].str.strip() != '']
    
    print("Unique zoning classes after cleaning:", gdf['zoning_class'].unique())

    # 3. Fix vertex precision and topological issues
    # We use a small buffer(0) trick or snap to grid to fix gaps/overlaps
    # But since we are merging by class, we can unary_union the geometries of each class.
    # unary_union is very robust at fixing overlaps and gaps if we first snap.
    
    # Snap geometries to a grid to reduce precision artifacts.
    # EPSG:26331 is metric, so 0.01 is a 1 cm grid, which should be plenty for zoning.
    precision = 0.01

    # Actually, the most robust way to "fix" gaps/overlaps is to dissolve the geometries.
    # But to ensure the "gaps" are filled, we might need to merge everything and then subtract?
    # No, the prompt says: "fix the vertex precision problems, and produce one merged geometry per canonical zoning class"
    # This implies the classes are somewhat well-behaved but have slivers.
    
    # Let's snap to grid first.
    def snap_coords(geom, grid_size):
        from shapely.ops import transform

        # Round each vertex to the nearest multiple of grid_size
        def snap_point(x, y):
            return round(x / grid_size) * grid_size, round(y / grid_size) * grid_size

        # shapely.ops.transform applies snap_point to every vertex of geom
        return transform(snap_point, geom)

    print("Snapping geometries to grid...")
    gdf['geometry'] = gdf['geometry'].apply(lambda g: snap_coords(g, precision))

    # 4. Merge geometries per zoning class
    print("Dissolving geometries by zoning_class...")
    # dissolve() returns exactly one row per class. The merged geometry is a Polygon
    # when the parcels are contiguous and a MultiPolygon otherwise, so either type
    # satisfies "produce one merged geometry per canonical zoning class".

    dissolved = gdf.dissolve(by='zoning_class', aggfunc={'area_m2': 'sum'})

    # Print the resulting geometry types as a sanity check.
    print("Dissolved geometry types:", dissolved.geometry.type.unique())

    # 5. Final check on area_m2
    # The prompt asks for area_m2. The input has area_m2.
    # If we dissolve, we should probably recalculate the area to be safe,
    # or use the sum of input areas if we assume no overlaps.
    # However, since we are fixing overlaps/gaps, calculating from the new geometry is more accurate.
    
    dissolved['area_m2'] = dissolved.geometry.area

    # 6. Finalizing columns
    # Requirement: "each row carrying two columns: zoning_class and a numeric area_m2"
    # dissolve() puts 'zoning_class' in the index.
    dissolved = dissolved.reset_index()
    result = dissolved[['zoning_class', 'area_m2', 'geometry']]
    
    # The prompt asks for two attribute columns, zoning_class and a numeric area_m2.
    # The geometry column stays as well: a GeoPackage feature layer needs one, and
    # the merged geometry per class is the point of the output.
    
    print("Saving to zoning_aggregated.gpkg...")
    result.to_file('/work/zoning_aggregated.gpkg', driver='GPKG')
    print("Done.")

if __name__ == "__main__":
    solve()
Wrote /work/solve.py (5598 chars).
0:19
Bash python /work/solve.py
Loaded input data.
Unique zoning classes after cleaning: <ArrowStringArray>
['Residential', 'Commercial', 'Industrial', 'Agricultural']
Length: 4, dtype: str
Snapping geometries to grid...
Dissolving geometries by zoning_class...
Dissolved geometry types: <ArrowStringArray>
['Polygon']
Length: 1, dtype: str
Saving to zoning_aggregated.gpkg...
Done.
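The snapping step in solve.py computes `round(x / grid_size) * grid_size`. Because 0.01 has no exact binary representation, multiplying back by it may reintroduce small floating-point noise. The sketch below shows one possible alternative that scales by an integer factor and divides instead; the helpers `snap` and `snap_ring` and the sample coordinates are hypothetical, chosen only to illustrate how two slightly different digitizations of a shared edge end up identical:

```python
def snap(value, decimals=2):
    """Round a coordinate to `decimals` places (2 means 1 cm in a metric CRS)."""
    scale = 10 ** decimals
    # Dividing by an integer scale avoids multiplying by an inexact 0.01.
    return round(value * scale) / scale


def snap_ring(ring, decimals=2):
    """Snap every (x, y) vertex of a coordinate list."""
    return [(snap(x, decimals), snap(y, decimals)) for x, y in ring]


# Hypothetical shared edge digitized twice with sub-millimetre drift.
left_edge = [(540010.0000004, 720000.0), (540010.0000004, 720010.0)]
right_edge = [(540009.9999997, 720000.0), (540009.9999997, 720010.0)]

print(snap_ring(left_edge) == snap_ring(right_edge))
```

Once both copies of the edge carry the same coordinates, a later union of the two parcels no longer leaves a sliver gap or overlap along it.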
0:33
Write /work/solve.py
content · 90 lines · py
import geopandas as gpd
import pandas as pd
from shapely.ops import transform
import numpy as np

def clean_zoning_class(val):
    if pd.isna(val):
        return None
    s = str(val).strip().lower()
    if not s:
        return None
    
    mappings = {
        'resi.': 'Residential',
        'residential': 'Residential',
        'comm.': 'Commercial',
        'commercial': 'Commercial',
        'indus.': 'Industrial',
        'industrial': 'Industrial',
        'agri.': 'Agricultural',
        'agricultural': 'Agricultural'
    }
    
    if s in mappings:
        return mappings[s]
    
    # Fallback: try to handle the case where it might be 'Resi' or something similar
    # but let's stick to the provided mapping first.
    # If not in mapping, try title case
    return s.title()

def snap_coords(geom, grid_size):
    def snap_point(x, y):
        return round(x / grid_size) * grid_size, round(y / grid_size) * grid_size
    return transform(lambda x, y: snap_point(x, y), geom)

def solve():
    # 1. Load data
    gdf = gpd.read_file('/work/lagos_zoning_legacy.gpkg')
    print(f"Loaded {len(gdf)} rows.")

    # 2. Clean zoning_class
    gdf['zoning_class'] = gdf['zoning_class'].apply(clean_zoning_class)
    gdf = gdf.dropna(subset=['zoning_class'])
    # Filter out any empty strings that might have survived
    gdf = gdf[gdf['zoning_class'].str.strip() != '']
    
    print("Unique zoning classes after cleaning:", gdf['zoning_class'].unique())

    # 3. Fix vertex precision and topological issues
    # Snapping to 1cm (0.01m) to fix tiny gaps and overlaps.
    precision = 0.01
    print("Snapping geometries to grid...")
    gdf['geometry'] = gdf['geometry'].apply(lambda g: snap_coords(g, precision))

    # 4. Merge geometries per zoning class
    # Union the geometries of each group by hand. This does the same job as dissolve(),
    # which also unions per group, and lets us compute the area in the same loop.
    
    print("Aggregating geometries...")
    
    results = []
    for cls, group in gdf.groupby('zoning_class'):
        # unary_union dissolves all geometries in the group into a single geometry
        merged_geom = group.geometry.unary_union
        # Recalculate area from the merged geometry to be precise
        merged_area = merged_geom.area
        results.append({
            'zoning_class': cls,
            'area_m2': merged_area,
            'geometry': merged_geom
        })
    
    # Create the final GeoDataFrame
    result_gdf = gpd.GeoDataFrame(results, crs=gdf.crs)
    
    # Ensure 'zoning_class' is a string (it might have become an object/category)
    result_gdf['zoning_class'] = result_gdf['zoning_class'].astype(str)
    result_gdf['area_m2'] = result_gdf['area_m2'].astype(float)

    print("Finalizing result...")
    print(result_gdf[['zoning_class', 'area_m2']])
    
    # 5. Save the result
    result_gdf.to_file('/work/zoning_aggregated.gpkg', driver='GPKG')
    print("Successfully saved to /work/zoning_aggregated.gpkg")

if __name__ == "__main__":
    solve()
Wrote /work/solve.py (3060 chars).
0:34
Bash python /work/solve.py
Loaded 10080 rows.
Unique zoning classes after cleaning: <ArrowStringArray>
['Residential', 'Commercial', 'Industrial', 'Agricultural']
Length: 4, dtype: str
Snapping geometries to grid...
Aggregating geometries...
Finalizing result...
   zoning_class   area_m2
0  Agricultural  250000.0
1    Commercial  250000.0
2    Industrial  250000.0
3   Residential  250000.0
Successfully saved to /work/zoning_aggregated.gpkg
/work/solve.py:65: DeprecationWarning: The 'unary_union' attribute is deprecated, use the 'union_all()' method instead.
  merged_geom = group.geometry.unary_union
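The warning above names `union_all()` as the replacement for the deprecated `unary_union` attribute on a GeoSeries. As a rough sketch of the same snap-then-merge idea without the hand-written rounding, and assuming Shapely 2.0 or later is installed, `shapely.set_precision` and `shapely.union_all` can be combined as follows; the two parcels are hypothetical and not taken from the dataset:

```python
import shapely
from shapely.geometry import box

# Two hypothetical 10 m parcels whose shared edge is off by a micrometre.
a = box(0, 0, 10, 10)
b = box(10.000001, 0, 20, 10)

# Snap both geometries to a 1 cm grid, then union them in a single call.
snapped = [shapely.set_precision(g, grid_size=0.01) for g in (a, b)]
merged = shapely.union_all(snapped)

print(merged.geom_type, merged.area)
```

In the logged script, the equivalent one-line change would be calling `group.geometry.union_all()` in place of `group.geometry.unary_union`, as the warning suggests.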
0:37
Bash python -c "import geopandas as gpd; gdf = gpd.read_file('/work/zoning_aggregated...
   zoning_class   area_m2                                           geometry
0  Agricultural  250000.0  POLYGON ((540500 720560, 540500 720570, 540500...
1    Commercial  250000.0  POLYGON ((540500 720100, 540500 720110, 540500...
2    Industrial  250000.0  POLYGON ((540000 720560, 540000 720570, 540000...
3   Residential  250000.0  POLYGON ((540000 720100, 540000 720110, 540000...
['zoning_class', 'area_m2', 'geometry']
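Every class reports an area of 250000.0 m², which is what a 500 m by 500 m block would give. The log does not state the actual extents, so the square below is an assumption. Under that assumption, the reported figure can be cross-checked independently of GeoPandas with the shoelace formula; `shoelace_area` is a hypothetical helper:

```python
def shoelace_area(ring):
    """Area of a simple polygon given as a list of (x, y) vertices."""
    twice_area = 0.0
    # Pair each vertex with the next one, wrapping around to close the ring.
    for (x1, y1), (x2, y2) in zip(ring, ring[1:] + ring[:1]):
        twice_area += x1 * y2 - x2 * y1
    return abs(twice_area) / 2.0


# Hypothetical 500 m x 500 m block in projected metres.
square = [
    (540000.0, 720000.0),
    (540500.0, 720000.0),
    (540500.0, 720500.0),
    (540000.0, 720500.0),
]
print(shoelace_area(square))  # 250000.0
```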
0:45