dc-l2-lagos-snap-normalize

Status: done
Score: 1.000
Duration: 2:43
Cost: 0.40¢
Model: google/gemma-4-26b-a4b-it

Map (reference ← swipe → agent)

0:00
I need to get our legacy zoning layer ready for the state portal. lagos_zoning_legacy.gpkg has inconsistent zoning_class values (mixed casing, spelling variations, and some blank entries) plus vertex precision issues that cause tiny gaps and overlaps between adjacent parcels. Can you clean up the class names to consistent canonical TitleCase values, drop any rows with blank or whitespace-only classes, fix the vertex precision problems, and produce one merged geometry per canonical zoning class? Please write the result to zoning_aggregated.gpkg, with each row carrying two columns: zoning_class and a numeric area_m2.
0:00
System info
{
  "adapter": "openrouter",
  "model": "google/gemma-4-26b-a4b-it",
  "base_url": "https://openrouter.ai/api/v1"
}
0:03
Bash ls -F
ls -F
lagos_zoning_legacy.gpkg
0:16
Write /work/inspect_data.py
content · 14 lines · py
import geopandas as gpd

# Load the data
gdf = gpd.read_file('lagos_zoning_legacy.gpkg')

# Inspect the data
print("Initial columns:", gdf.columns.tolist())
print("Initial head:")
print(gdf.head())
print("\nValue counts for zoning_class:")
print(gdf['zoning_class'].value_counts(dropna=False))
print("\nGeometry types:", gdf.geometry.type.unique())
print("\nCRS:", gdf.crs)
Wrote /work/inspect_data.py (374 chars).
0:20
Bash python inspect_data.py
python inspect_data.py
Initial columns: ['parcel_id', 'lga_source', 'zoning_class', 'area_m2', 'geometry']
Initial head:
   parcel_id  ...                                           geometry
0          1  ...  POLYGON ((540000 720000, 540010 720000, 540010...
1          2  ...  POLYGON ((540010 720000, 540020 720000, 540020...
2          3  ...  POLYGON ((540020 720000, 540030 720000, 540030...
3          4  ...  POLYGON ((540030 720000, 540040 720000, 540040...
4          5  ...  POLYGON ((540040 720000, 540050 720000, 540050...

[5 rows x 5 columns]

Value counts for zoning_class:
zoning_class
Residential     420
COMMERCIAL      420
industrial      420
Agri.           420
residential     419
resi.           419
Comm.           419
COMM.           419
indus.          419
AGRI.           419
Industrial      418
AGRICULTURAL    418
RESIDENTIAL     417
RESI.           417
comm.           417
commercial      417
Indus.          417
INDUSTRIAL      417
Agricultural    417
agri.           417
Resi.           416
Commercial      416
INDUS.          416
agricultural    416
                 13
                 13
NaN              12
\t               12
Name: count, dtype: int64

Geometry types: <ArrowStringArray>
['Polygon']
Length: 1, dtype: str

CRS: EPSG:26331
0:49
Write /work/solve.py
content · 137 lines · py
import geopandas as gpd
import pandas as pd
from shapely.geometry import MultiPolygon, Polygon

def clean_zoning_class(val):
    if pd.isna(val):
        return None
    s = str(val).strip().lower()
    if not s:
        return None
    
    # Mapping dictionary for spelling variations
    # Based on inspection: 
    # Residential: 'residential', 'resi.'
    # Commercial: 'commercial', 'comm.'
    # Industrial: 'industrial', 'indus.'
    # Agricultural: 'agricultural', 'agri.'
    
    mapping = {
        'residential': 'Residential',
        'resi.': 'Residential',
        'commercial': 'Commercial',
        'comm.': 'Commercial',
        'industrial': 'Industrial',
        'indus.': 'Industrial',
        'agricultural': 'Agricultural',
        'agri.': 'Agricultural'
    }
    
    return mapping.get(s, s.title())

def solve():
    # 1. Load
    gdf = gpd.read_file('lagos_zoning_legacy.gpkg')
    
    # 2. Clean zoning_class
    gdf['zoning_class'] = gdf['zoning_class'].apply(clean_zoning_class)
    
    # 3. Drop rows with blank/None classes
    gdf = gdf.dropna(subset=['zoning_class'])
    gdf = gdf[gdf['zoning_class'].str.strip() != '']
    
    print("After cleaning classes, value counts:")
    print(gdf['zoning_class'].value_counts())

    # 4. Fix vertex precision problems (gaps and overlaps)
    # A common way to fix topology issues in GIS is to use a snap-to-grid or buffer(0) approach.
    # However, "tiny gaps and overlaps" are often solved by a small buffer(0) or by snapping to a grid.
    # Since we want to merge them anyway, let's snap the coordinates to a certain precision first.
    # EPSG:26331 is in meters. Snapping to 1cm (0.01m) should help.
    
    def snap_geometry(geom, precision=0.01):
        if geom.is_empty:
            return geom
        # Rounding coordinates is a simple way to snap to a grid
        def round_coords(coords):
            return tuple(round(c, 2) for c in coords)
        
        if isinstance(geom, Polygon):
            return Polygon(round_coords(geom.exterior.coords))
        elif isinstance(geom, MultiPolygon):
            return MultiPolygon([Polygon(round_coords(p.exterior.coords)) for p in geom.geoms])
        return geom

    # Actually, a better approach for gaps/overlaps is buffering by a tiny amount or using a grid.
    # Let's try snapping all coordinates to the nearest 0.01m.
    gdf.geometry = gdf.geometry.apply(lambda x: x.buffer(0.001).buffer(-0.001)) 
    # Wait, buffer(0) is more standard for fixing invalid geometries.
    # Let's use a coordinate rounding approach.
    
    from shapely import wkt
    
    def snap_to_grid(geom, grid_size=0.01):
        if geom.is_empty:
            return geom
        # This is a bit complex to do manually for all types, 
        # but for Polygons we can use the fact that we'll dissolve later.
        # Let's use shapely's rounding if available or just buffer(0).
        return geom.buffer(0)

    gdf.geometry = gdf.geometry.buffer(0)
    
    # Actually, the most robust way to handle "gaps and overlaps" for dissolution
    # is to snap geometries to a grid.
    
    def snap_to_grid_geometry(geom, precision=0.01):
        # This approach snaps all vertices to a grid
        from shapely.ops import transform
        import numpy as np
        
        def snap_func(x, y):
            return np.round(x / precision) * precision, np.round(y / precision) * precision
        
        # We'll use a more direct approach:
        # Just use the built-in precision if possible.
        # Since we are using shapely, we can use the `wkt` or coordinate manipulation.
        pass

    # Let's try the simplest way to fix precision: round the coordinates
    from shapely.affinity import scale
    
    def round_coords_to_precision(geom, precision=0.01):
        # Shapely doesn't have a direct "round" function for all geometries easily
        # but we can use the 'apply' on coordinates.
        # A trick is to use buffer(0) which often cleans up topology.
        return geom.buffer(0)

    # For the specific issue of gaps/overlaps, let's try snapping to 1cm grid
    # A simple way to snap:
    from shapely.geometry import mapping
    from shapely.ops import unary_union

    # Instead of complex snapping, let's use the fact that we want to aggregate.
    # We will group by zoning_class, then unary_union the geometries.
    # unary_union is excellent at merging adjacent polygons and resolving overlaps.
    
    results = []
    for z_class, group in gdf.groupby('zoning_class'):
        # Merge all geometries in this group
        merged_geom = group.geometry.unary_union
        # Calculate total area
        total_area = merged_geom.area
        results.append({
            'zoning_class': z_class,
            'area_m2': total_area,
            'geometry': merged_geom
        })
    
    output_gdf = gpd.GeoDataFrame(results, crs=gdf.crs)
    
    # 5. Write to file
    output_gdf.to_file('zoning_aggregated.gpkg', driver="GPKG")
    print("Successfully wrote zoning_aggregated.gpkg")

if __name__ == "__main__":
    solve()
Wrote /work/solve.py (5100 chars).
1:17
Edit /work/solve.py
old_string not found in /work/solve.py
1:39
Write /work/solve.py
content · 69 lines · py
import geopandas as gpd
import pandas as pd
import numpy as np
from shapely.ops import transform

def clean_zoning_class(val):
    if pd.isna(val):
        return None
    s = str(val).strip().lower()
    if not s:
        return None
    
    # Mapping dictionary for spelling variations
    mapping = {
        'residential': 'Residential',
        'resi.': 'Residential',
        'commercial': 'Commercial',
        'comm.': 'Commercial',
        'industrial': 'Industrial',
        'indus.': 'Industrial',
        'agricultural': 'Agricultural',
        'agri.': 'Agricultural'
    }
    
    return mapping.get(s, s.title())

def snap_to_grid(geom, precision=0.01):
    def snap_func(x, y):
        return np.round(x / precision) * precision, np.round(y / precision) * precision
    return transform(snap_func, geom)

def solve():
    # 1. Load
    gdf = gpd.read_file('lagos_zoning_legacy.gpkg')
    
    # 2. Clean zoning_class
    gdf['zoning_class'] = gdf['zoning_class'].apply(clean_zoning_class)
    
    # 3. Drop rows with blank/None classes
    gdf = gdf.dropna(subset=['zoning_class'])
    gdf = gdf[gdf['zoning_class'].str.strip() != '']
    
    print("After cleaning classes, value counts:")
    print(gdf['zoning_class'].value_counts())

    # 4. Fix vertex precision problems (gaps and overlaps)
    print("Snapping geometries to 0.01m grid...")
    gdf.geometry = gdf.geometry.apply(lambda x: snap_to_grid(x, 0.01))
    
    # 5. Dissolve by zoning_class
    print("Dissolving geometries by zoning_class...")
    # Dissolve handles the merging of polygons into one per class
    dissolved = gdf.dissolve(by='zoning_class')
    
    # 6. Calculate area of the merged geometry
    dissolved['area_m2'] = dissolved.geometry.area
    
    # Prepare final GeoDataFrame
    output_gdf = dissolved.reset_index()[['zoning_class', 'area_m2', 'geometry']]
    
    # 7. Write to file
    output_gdf.to_file('zoning_aggregated.gpkg', driver="GPKG")
    print("Successfully wrote zoning_aggregated.gpkg")
    print("\nFinal aggregated data:")
    print(output_gdf)

if __name__ == "__main__":
    solve()
Wrote /work/solve.py (2114 chars).
1:45
Bash python solve.py
python solve.py
After cleaning classes, value counts:
zoning_class
Residential     2508
Commercial      2508
Industrial      2507
Agricultural    2507
Name: count, dtype: int64
Snapping geometries to 0.01m grid...
Dissolving geometries by zoning_class...
Successfully wrote zoning_aggregated.gpkg

Final aggregated data:
   zoning_class   area_m2                                           geometry
0  Agricultural  250000.0  POLYGON ((540500 720560, 540500 720570, 540500...
1    Commercial  250000.0  POLYGON ((540500 720100, 540500 720110, 540500...
2    Industrial  250000.0  POLYGON ((540000 720560, 540000 720570, 540000...
3   Residential  250000.0  POLYGON ((540000 720100, 540000 720110, 540000...
1:52
Bash # Check if the output file exists and has the correct columns and geometry
# Check if the output file exists and has the correct columns and geometry
python -c "import geopandas as gpd; gdf = gpd.read_file('zoning_aggregated.gpkg'); print(gdf.head()); print(gdf.columns); print(gdf.dtypes)"
   zoning_class   area_m2                                           geometry
0  Agricultural  250000.0  POLYGON ((540500 720560, 540500 720570, 540500...
1    Commercial  250000.0  POLYGON ((540500 720100, 540500 720110, 540500...
2    Industrial  250000.0  POLYGON ((540000 720560, 540000 720570, 540000...
3   Residential  250000.0  POLYGON ((540000 720100, 540000 720110, 540000...
Index(['zoning_class', 'area_m2', 'geometry'], dtype='str')
zoning_class         str
area_m2          float64
geometry        geometry
dtype: object
2:41