dc-l2-lagos-snap-normalize

Status: done
Score: 1.000
Duration: 1:52
Cost: 0.18¢
Model: google/gemma-4-26b-a4b-it

0:00
I need to get our legacy zoning layer ready for the state portal. lagos_zoning_legacy.gpkg has inconsistent zoning_class values (mixed casing, spelling variations, and some blank entries) plus vertex precision issues that cause tiny gaps and overlaps between adjacent parcels. Can you clean up the class names to consistent canonical TitleCase values, drop any rows with blank or whitespace-only classes, fix the vertex precision problems, and produce one merged geometry per canonical zoning class? Please write the result to zoning_aggregated.gpkg, with each row carrying two columns: zoning_class and a numeric area_m2.
0:00
System info
{
  "adapter": "openrouter",
  "model": "google/gemma-4-26b-a4b-it",
  "base_url": "https://openrouter.ai/api/v1"
}
0:01
Bash ls -l
total 2468
-rw-r--r-- 1 agent agent 2527232 Jun 16 12:14 lagos_zoning_legacy.gpkg
0:04
Write /work/inspect_data.py
content · 12 lines · py
import geopandas as gpd

# Load the data
gdf = gpd.read_file('lagos_zoning_legacy.gpkg')

print("Columns:", gdf.columns.tolist())
print("First few rows:")
print(gdf.head())
print("\nValue counts for zoning_class:")
print(gdf['zoning_class'].value_counts(dropna=False))
print("\nCRS:", gdf.crs)
Wrote /work/inspect_data.py (294 chars).
0:06
Bash python inspect_data.py
Columns: ['parcel_id', 'lga_source', 'zoning_class', 'area_m2', 'geometry']
First few rows:
   parcel_id  ...                                           geometry
0          1  ...  POLYGON ((540000 720000, 540010 720000, 540010...
1          2  ...  POLYGON ((540010 720000, 540020 720000, 540020...
2          3  ...  POLYGON ((540020 720000, 540030 720000, 540030...
3          4  ...  POLYGON ((540030 720000, 540040 720000, 540040...
4          5  ...  POLYGON ((540040 720000, 540050 720000, 540050...

[5 rows x 5 columns]

Value counts for zoning_class:
zoning_class
Residential     420
COMMERCIAL      420
industrial      420
Agri.           420
residential     419
resi.           419
Comm.           419
COMM.           419
indus.          419
AGRI.           419
Industrial      418
AGRICULTURAL    418
RESIDENTIAL     417
RESI.           417
comm.           417
commercial      417
Indus.          417
INDUSTRIAL      417
Agricultural    417
agri.           417
Resi.           416
Commercial      416
INDUS.          416
agricultural    416
                 13
                 13
NaN              12
\t               12
Name: count, dtype: int64

CRS: EPSG:26331
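
The value counts above can be used to predict how many rows each canonical class should receive before any geometry work starts. The following is a minimal sketch, not the agent's method: `canonicalize` is a hypothetical helper that matches by prefix instead of a lookup table, and the two unlabeled 13-row entries are assumed to be an empty string and a run of spaces.

```python
from collections import Counter

# Raw label -> row count, copied from the value_counts output above.
# Assumptions: the two unlabeled 13-row entries are "" and "   ",
# and None stands in for NaN.
RAW_COUNTS = {
    "Residential": 420, "COMMERCIAL": 420, "industrial": 420, "Agri.": 420,
    "residential": 419, "resi.": 419, "Comm.": 419, "COMM.": 419,
    "indus.": 419, "AGRI.": 419, "Industrial": 418, "AGRICULTURAL": 418,
    "RESIDENTIAL": 417, "RESI.": 417, "comm.": 417, "commercial": 417,
    "Indus.": 417, "INDUSTRIAL": 417, "Agricultural": 417, "agri.": 417,
    "Resi.": 416, "Commercial": 416, "INDUS.": 416, "agricultural": 416,
    "": 13, "   ": 13, None: 12, "\t": 12,
}

CANONICAL = ("Residential", "Commercial", "Industrial", "Agricultural")


def canonicalize(label):
    """Return the canonical class, or None for blank or missing labels."""
    if label is None:
        return None
    key = label.strip().rstrip(".").casefold()
    if not key:
        return None
    for name in CANONICAL:
        # Accept the full word or an abbreviation of at least four letters.
        if len(key) >= 4 and name.casefold().startswith(key):
            return name
    # Fail loudly so an unexpected spelling is not silently title-cased.
    raise ValueError(f"unrecognized zoning_class: {label!r}")


kept = Counter()
dropped = 0
for raw, n in RAW_COUNTS.items():
    canonical = canonicalize(raw)
    if canonical is None:
        dropped += n
    else:
        kept[canonical] += n

print(dict(kept), "dropped:", dropped)
```

Under these assumptions the tally is 2508 rows each for Residential and Commercial, 2507 each for Industrial and Agricultural, and 50 dropped rows, which gives a quick cross-check against the cleaned layer.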
0:52
Write /work/solve.py
content · 95 lines · py
import geopandas as gpd
import pandas as pd
from shapely.geometry import Polygon, MultiPolygon

def clean_zoning_class(val):
    if pd.isna(val):
        return None
    s = str(val).strip().lower()
    if not s:
        return None
    
    # Mapping dictionary for variations
    mapping = {
        'residential': 'Residential',
        'resi.': 'Residential',
        'resi': 'Residential',
        'commercial': 'Commercial',
        'comm.': 'Commercial',
        'comm': 'Commercial',
        'industrial': 'Industrial',
        'indus.': 'Industrial',
        'indus': 'Industrial',
        'agricultural': 'Agricultural',
        'agri.': 'Agricultural',
        'agri': 'Agricultural'
    }
    
    # Try direct mapping first
    if s in mapping:
        return mapping[s]
    
    # Fallback to Title Case if not in mapping, but let's see if we missed something
    return s.title()

def solve():
    # 1. Load data
    print("Loading data...")
    gdf = gpd.read_file('lagos_zoning_legacy.gpkg')

    # 2. Clean zoning_class
    print("Cleaning zoning_class...")
    gdf['zoning_class'] = gdf['zoning_class'].apply(clean_zoning_class)
    
    # Drop rows where zoning_class is None or whitespace-only (already handled by None in clean_zoning_class)
    gdf = gdf.dropna(subset=['zoning_class'])
    gdf = gdf[gdf['zoning_class'].str.strip() != '']

    # 3. Fix vertex precision issues (tiny gaps/overlaps)
    # A common way is to round coordinates to a certain precision or use a snap-to-grid approach.
    # Since it's a metric CRS (EPSG:26331), 1mm is 0.001. Let's round to 1cm (0.01) or 1mm (0.001).
    # Let's try rounding to 1mm (0.001) to avoid losing too much detail but snap vertices.
    print("Fixing vertex precision...")
    def snap_geometry(geom, precision=0.001):
        if geom is None:
            return None
        # Rounding coordinates is a simple way to snap to a grid
        from shapely.ops import transform
        def round_coords(x, y, z=None):
            return (round(x / precision) * precision, round(y / precision) * precision)
        return transform(lambda x, y, z=None: round_coords(x, y), geom)

    gdf['geometry'] = gdf['geometry'].apply(lambda g: snap_geometry(g))

    # 4. Merge geometries by zoning_class
    print("Merging geometries by zoning_class...")
    # We need to dissolve the geometries based on zoning_class.
    # To ensure we handle overlaps/gaps properly after snapping, 
    # we should ideally use a buffer(0) or union. 
    # Dissolve in geopandas is a union operation.
    
    # Group by zoning_class and dissolve
    dissolved = gdf.dissolve(by='zoning_class', aggfunc={'area_m2': 'sum'})
    
    # The dissolve operation results in 'zoning_class' being the index.
    # We need to move it back to a column and ensure 'area_m2' is preserved.
    dissolved = dissolved.reset_index()
    
    # Note: The area_m2 in the prompt might mean the area of the merged polygon.
    # "each row carrying two columns: zoning_class and a numeric area_m2"
    # If they want the total area of the resulting polygon, we should recalculate it.
    print("Recalculating merged area...")
    dissolved['area_m2'] = dissolved['geometry'].area

    # 5. Save to file
    print(f"Saving to zoning_aggregated.gpkg...")
    # Ensure columns are in correct order/subset
    output_gdf = dissolved[['zoning_class', 'area_m2', 'geometry']]
    output_gdf.to_file('zoning_aggregated.gpkg', driver='GPKG')
    
    print("Done!")
    print(output_gdf.head())

if __name__ == "__main__":
    solve()
Wrote /work/solve.py (3555 chars).
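
The snapping step in solve.py rounds each coordinate as `round(x / precision) * precision`; multiplying back by a non-representable step such as 0.001 may leave small floating-point residue. As a hedged illustration of the same snap-to-grid idea, the sketch below rounds to a number of decimals instead. The two rings are hypothetical parcels invented for the example, and `snap_ring` is not part of the logged script. Shapely 2.x also offers `shapely.set_precision(geometry, grid_size)` for this purpose, which is not shown here.

```python
def snap_ring(ring, ndigits=3):
    """Round every (x, y) vertex to `ndigits` decimals (3 = 1 mm in a metric CRS)."""
    return [(round(x, ndigits), round(y, ndigits)) for x, y in ring]


# Hypothetical neighbours whose shared edge differs by sub-millimetre noise.
left = [
    (540000.0, 720000.0), (540010.0004, 720000.0),
    (540010.0004, 720010.0), (540000.0, 720010.0),
]
right = [
    (540009.9997, 720000.0), (540020.0, 720000.0),
    (540020.0, 720010.0), (540009.9997, 720010.0),
]

snapped_left = snap_ring(left)
snapped_right = snap_ring(right)

# After snapping, both parcels share identical vertices along x = 540010.
shared = set(snapped_left) & set(snapped_right)
print(sorted(shared))
```

Snapping of this kind only closes gaps and overlaps smaller than about half the grid step; larger mismatches would need a coarser grid or an explicit topology fix.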
0:53
Bash python solve.py
Loading data...
Cleaning zoning_class...
Fixing vertex precision...
Merging geometries by zoning_class...
Recalculating merged area...
Saving to zoning_aggregated.gpkg...
Done!
   zoning_class   area_m2                                           geometry
0  Agricultural  250000.0  POLYGON ((540500 720560, 540500 720570, 540500...
1    Commercial  250000.0  POLYGON ((540500 720100, 540500 720110, 540500...
2    Industrial  250000.0  POLYGON ((540000 720560, 540000 720570, 540000...
3   Residential  250000.0  POLYGON ((540000 720100, 540000 720110, 540000...
1:09
Bash python -c "import geopandas as gpd; gdf = gpd.read_file('zoning_aggregated.gpkg'...
   zoning_class   area_m2                                           geometry
0  Agricultural  250000.0  POLYGON ((540500 720560, 540500 720570, 540500...
1    Commercial  250000.0  POLYGON ((540500 720100, 540500 720110, 540500...
2    Industrial  250000.0  POLYGON ((540000 720560, 540000 720570, 540000...
3   Residential  250000.0  POLYGON ((540000 720100, 540000 720110, 540000...
Index(['zoning_class', 'area_m2', 'geometry'], dtype='str')
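
The script overwrites the summed legacy `area_m2` with the area of the dissolved geometry, so the reported value should be reproducible from coordinates alone. A minimal sketch of that check using the shoelace formula follows; the 500 m by 500 m square is an assumed shape, chosen only because it has the same area as the rows above, and `ring_area` is a hypothetical helper.

```python
def ring_area(ring):
    """Planar area of a simple polygon ring via the shoelace formula."""
    total = 0.0
    # Pair each vertex with the next one, wrapping around to the first.
    for (x1, y1), (x2, y2) in zip(ring, ring[1:] + ring[:1]):
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


# Assumed 500 m x 500 m square in projected metres (not read from the file).
square = [
    (540000.0, 720000.0), (540500.0, 720000.0),
    (540500.0, 720500.0), (540000.0, 720500.0),
]

area_m2 = ring_area(square)
print(area_m2)
```

Because the coordinates are in a projected CRS with metre units, the result is directly in square metres.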
1:32
Write
Invalid JSON arguments: Unterminated string starting at: line 1 column 13 (char 12)
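
This message matches what Python's `json` module reports when a string value is cut off before its closing quote. The sketch below reproduces it with a hypothetical payload: the key name `content` and the text after it are assumptions, picked only because a seven-character key places the opening quote of the value at character 12. The actual arguments of the failed call are not shown in the log.

```python
import json

# Hypothetical truncated tool arguments; the key and value are assumptions.
truncated_args = '{"content": "import geopandas as gpd'

try:
    json.loads(truncated_args)
    message = None
except json.JSONDecodeError as exc:
    # The reported position is the opening quote of the unterminated string.
    message = str(exc)

print(message)
```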