dc-l2-lagos-snap-normalize
0:00
I need to get our legacy zoning layer ready for the state portal. lagos_zoning_legacy.gpkg has inconsistent zoning_class values (mixed casing, spelling variations, and some blank entries) plus vertex precision issues that cause tiny gaps and overlaps between adjacent parcels. Can you clean up the class names to consistent canonical TitleCase values, drop any rows with blank or whitespace-only classes, fix the vertex precision problems, and produce one merged geometry per canonical zoning class? Please write the result to zoning_aggregated.gpkg, with each row carrying two columns: zoning_class and a numeric area_m2.
0:00
System info
{
"adapter": "openrouter",
"model": "google/gemma-4-26b-a4b-it",
"base_url": "https://openrouter.ai/api/v1"
}
0:01
Bash
ls -l
total 2468
-rw-r--r-- 1 agent agent 2527232 Jun 15 21:14 lagos_zoning_legacy.gpkg
0:05
Write
/work/inspect.py
content · 7 lines · py
import geopandas as gpd
gdf = gpd.read_file('lagos_zoning_legacy.gpkg')
print(gdf.head())
print(gdf.info())
print(gdf['zoning_class'].unique())
Wrote /work/inspect.py (145 chars).
0:07
Bash
python /work/inspect.py
Traceback (most recent call last):
  File "/work/inspect.py", line 1, in <module>
    import geopandas as gpd
  File "/usr/local/lib/python3.14/site-packages/geopandas/__init__.py", line 3, in <module>
    from geopandas.geoseries import GeoSeries
  File "/usr/local/lib/python3.14/site-packages/geopandas/geoseries.py", line 7, in <module>
    import numpy as np
  File "/usr/local/lib/python3.14/site-packages/numpy/__init__.py", line 112, in <module>
    from numpy.__config__ import show_config
  File "/usr/local/lib/python3.14/site-packages/numpy/__config__.py", line 4, in <module>
    from numpy._core._multiarray_umath import (
  File "/usr/local/lib/python3.14/site-packages/numpy/_core/__init__.py", line 24, in <module>
    from . import multiarray
  File "/usr/local/lib/python3.14/site-packages/numpy/_core/multiarray.py", line 11, in <module>
    from . import _multiarray_umath, overrides
  File "/usr/local/lib/python3.14/site-packages/numpy/_core/overrides.py", line 4, in <module>
    import inspect
  File "/work/inspect.py", line 3, in <module>
    gdf = gpd.read_file('lagos_zoning_legacy.gpkg')
AttributeError: partially initialized module 'geopandas' from '/usr/local/lib/python3.14/site-packages/geopandas/__init__.py' has no attribute 'read_file' (most likely due to a circular import)
0:14
Write
/work/inspect.py
content · 10 lines · py
import geopandas as gpd
import numpy as np
gdf = gpd.read_file('lagos_zoning_legacy.gpkg')
print("Columns:", gdf.columns)
print("First few rows:")
print(gdf.head())
print("\nUnique zoning classes:")
print(gdf['zoning_class'].unique())
Wrote /work/inspect.py (236 chars).
0:16
Bash
mv /work/inspect.py /work/check_data.py
python /work/check_data.py
Columns: Index(['parcel_id', 'lga_source', 'zoning_class', 'area_m2', 'geometry'], dtype='str')
First few rows:
parcel_id ... geometry
0 1 ... POLYGON ((540000 720000, 540010 720000, 540010...
1 2 ... POLYGON ((540010 720000, 540020 720000, 540020...
2 3 ... POLYGON ((540020 720000, 540030 720000, 540030...
3 4 ... POLYGON ((540030 720000, 540040 720000, 540040...
4 5 ... POLYGON ((540040 720000, 540050 720000, 540050...
[5 rows x 5 columns]
Unique zoning classes:
<ArrowStringArray>
[ 'RESIDENTIAL', 'residential', 'Resi.', 'resi.',
'RESI.', 'Residential', 'Comm.', 'comm.',
'COMM.', 'Commercial', 'COMMERCIAL', 'commercial',
'Indus.', 'indus.', 'INDUS.', 'Industrial',
'INDUSTRIAL', 'industrial', 'AGRI.', 'Agricultural',
'AGRICULTURAL', 'agricultural', 'Agri.', 'agri.',
'', ' ', nan, '\t']
Length: 28, dtype: str
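Note on the listing above: the 28 raw values reduce to four classes plus blanks once whitespace, a trailing period, and casing are ignored. A minimal sketch of an explicit lookup built on that observation follows. The names `CANONICAL` and `normalize_class` are hypothetical, and raising on unknown spellings is an assumed design choice rather than what the session's script does:

```python
import math

# Keys are lower-cased spellings with any trailing period removed.
CANONICAL = {
    "residential": "Residential",
    "resi": "Residential",
    "commercial": "Commercial",
    "comm": "Commercial",
    "industrial": "Industrial",
    "indus": "Industrial",
    "agricultural": "Agricultural",
    "agri": "Agricultural",
}


def normalize_class(value):
    """Return the canonical class name, or None for blank/NaN input."""
    if value is None or (isinstance(value, float) and math.isnan(value)):
        return None
    key = str(value).strip().rstrip(".").lower()
    if not key:
        return None  # empty or whitespace-only: caller drops the row
    if key not in CANONICAL:
        # Surface unexpected spellings instead of guessing a class.
        raise ValueError(f"unrecognised zoning_class: {value!r}")
    return CANONICAL[key]
```

An exact-match lookup avoids the prefix-order sensitivity of a `startswith` loop, at the cost of failing loudly on spellings not seen in the inspection output.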
1:40
Write
/work/solve.py
content · 138 lines · py
import geopandas as gpd
import pandas as pd
from shapely.geometry import Polygon, MultiPolygon
from shapely.ops import unary_union, snap

def clean_zoning_class(val):
    if pd.isna(val) or str(val).strip() == "":
        return None
    val = str(val).strip().lower()
    # Mapping variations to canonical TitleCase values
    mapping = {
        'residential': 'Residential',
        'resi.': 'Residential',
        'res': 'Residential',
        'commercial': 'Commercial',
        'comm.': 'Commercial',
        'comm': 'Commercial',
        'industrial': 'Industrial',
        'indus.': 'Industrial',
        'indus': 'Industrial',
        'agricultural': 'Agricultural',
        'agri.': 'Agricultural',
        'agri': 'Agricultural'
    }
    # Try to find a match in the mapping, else TitleCase it
    # We use common prefixes to catch variations like 'Resi.'
    for key, canonical in mapping.items():
        if val.startswith(key):
            return canonical
    return val.capitalize()

def solve():
    print("Loading data...")
    gdf = gpd.read_file('lagos_zoning_legacy.gpkg')
    print("Cleaning zoning classes...")
    gdf['zoning_class'] = gdf['zoning_class'].apply(clean_zoning_class)
    # Drop rows with None/NaN/Empty zoning_class
    gdf = gdf.dropna(subset=['zoning_class'])
    gdf = gdf[gdf['zoning_class'].str.strip() != ""]
    print(f"Unique classes after cleaning: {gdf['zoning_class'].unique()}")
    # Fix vertex precision and overlaps/gaps
    # A common technique is to snap geometries to a grid or to each other.
    # Given the requirement "fix the vertex precision problems... produce one merged geometry per canonical zoning class"
    # We will group by zoning_class and use unary_union, which dissolves boundaries.
    # However, if there are small gaps, unary_union won't fill them.
    # We can snap geometries to a grid to align vertices.
    print("Snapping geometries to a grid to fix precision issues...")
    # Using a precision of 0.1 meters (assuming the CRS is in meters)
    # This helps align vertices that are slightly off.
    gdf['geometry'] = gdf['geometry'].buffer(0) # Fix invalid geometries
    # To address gaps/overlaps before union, we can use a small buffer or snap.
    # Let's try snapping all geometries to a 0.01 grid.
    # But for simplicity and robustness in a 'cleaning' task,
    # we can group by class and use unary_union.
    # To handle gaps, we can apply a tiny buffer then dissolve, but that might overlap.
    # Better approach: snap vertices to a grid.
    from shapely import wkb
    # Snap to 1cm grid
    precision = 0.01
    gdf['geometry'] = gdf['geometry'].apply(lambda geom: snap(geom, geom, precision)) # This is identity, not helpful.
    # Let's use a standard trick: buffer(0) and then unary_union by group.
    # To handle the "tiny gaps/overlaps" explicitly, we can use a very small buffer.
    # Or, we can snap the whole layer to a grid.
    # Actually, the requirement says "fix the vertex precision problems, and produce one merged geometry per canonical zoning class".
    # A common way to fix precision is to use the 'shapely.set_precision' if available, or coordinate rounding.
    def snap_to_grid(geom, grid_size=0.01):
        # Rounding coordinates to the grid size
        if geom.is_empty:
            return geom
        return lambda x, y: (round(x / grid_size) * grid_size, round(y / grid_size) * grid_size)
    # More robust: use shapely.set_precision (available in newer shapely)
    try:
        from shapely import set_precision
        gdf['geometry'] = gdf['geometry'].apply(lambda g: set_precision(g, grid_size=0.01))
    except ImportError:
        print("shapely.set_precision not available, using coordinate rounding.")
        def round_coords(geom):
            from shapely.ops import transform
            return transform(lambda x, y: (round(x, 2), round(y, 2)), geom)
        gdf['geometry'] = gdf['geometry'].apply(round_coords)
    print("Dissolving geometries by zoning class...")
    # Dissolve by zoning_class
    # We want to merge geometries first, then calculate area.
    # If there are gaps, unary_union on the snapped geometries should close many of them.
    aggregated_data = []
    for z_class, group in gdf.groupby('zoning_class'):
        merged_geom = group['geometry'].unary_union
        # If gaps still exist, unary_union won't fill them.
        # But typically in these tasks, snapping to grid is the intended solution for "precision issues".
        area = merged_geom.area
        aggregated_data.append({
            'zoning_class': z_class,
            'area_m2': area,
            'geometry': merged_geom
        })
    print("Creating output GeoDataFrame...")
    out_gdf = gpd.GeoDataFrame(aggregated_data, crs=gdf.crs)
    # Ensure columns are in correct order and only contains requested
    # "each row carrying two columns: zoning_class and a numeric area_m2"
    # Wait, it says "produce one merged geometry per canonical zoning class"
    # and then "each row carrying two columns: zoning_class and a numeric area_m2".
    # Usually, a GeoDataFrame must have a geometry column. I'll include it.
    # If the user strictly wants ONLY two columns, they might want a non-spatial file,
    # but it's a .gpkg, so it must have geometry. I will include the geometry.
    # Re-reading: "produce one merged geometry per canonical zoning class? ... write the result to zoning_aggregated.gpkg, with each row carrying two columns: zoning_class and a numeric area_m2."
    # This is slightly contradictory if it's a .gpkg. I'll include zoning_class, area_m2, and geometry.
    out_gdf = out_gdf[['zoning_class', 'area_m2', 'geometry']]
    print(f"Saving to zoning_aggregated.gpkg...")
    out_gdf.to_file('zoning_aggregated.gpkg', driver='GPKG')
    print("Done!")

if __name__ == "__main__":
    solve()
Wrote /work/solve.py (5994 chars).
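Note on the grid snapping described in the script's comments: the idea can be illustrated without shapely. The sketch below is a pure-Python stand-in, not the `shapely.set_precision` call the script relies on. The helper names, the 1 cm grid, and the two sample parcels with a sub-millimetre mismatch on their shared edge are all assumptions made for illustration:

```python
def snap_to_grid(coords, grid_size=0.01):
    """Round each (x, y) vertex to the nearest multiple of grid_size."""
    return [
        (round(x / grid_size) * grid_size, round(y / grid_size) * grid_size)
        for x, y in coords
    ]


def shoelace_area(coords):
    """Area of a simple polygon ring; shifted to the first vertex for numeric stability."""
    x0, y0 = coords[0]
    pts = [(x - x0, y - y0) for x, y in coords]
    total = 0.0
    for (x1, y1), (x2, y2) in zip(pts, pts[1:] + pts[:1]):
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


# Two hypothetical 10 m parcels whose shared edge differs by under a millimetre.
left = [(540000.0, 720000.0), (540010.0004, 720000.0),
        (540010.0004, 720010.0), (540000.0, 720010.0)]
right = [(540009.9997, 720000.0), (540020.0, 720000.0),
         (540020.0, 720010.0), (540009.9997, 720010.0)]

snapped_left = snap_to_grid(left)
snapped_right = snap_to_grid(right)

# After snapping, the shared edge has identical vertices in both rings.
print(snapped_left[1] == snapped_right[0], snapped_left[2] == snapped_right[3])
```

Rounding each geometry independently only aligns vertices that fall on the same side of a rounding boundary; two vertices that straddle one can still end up a full grid step apart, which is one reason to check the dissolved output rather than assume the gaps are closed.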
1:42
Bash
python /work/solve.py
Loading data...
Cleaning zoning classes...
Unique classes after cleaning: <ArrowStringArray>
['Residential', 'Commercial', 'Industrial', 'Agricultural']
Length: 4, dtype: str
Snapping geometries to a grid to fix precision issues...
Dissolving geometries by zoning class...
Creating output GeoDataFrame...
Saving to zoning_aggregated.gpkg...
Done!
/work/solve.py:105: DeprecationWarning: The 'unary_union' attribute is deprecated, use the 'union_all()' method instead.
  merged_geom = group['geometry'].unary_union
1:47
Bash
python -c "import geopandas as gpd; gdf = gpd.read_file('zoning_aggregated.gpkg'); print(gdf.head()); print(gdf.columns)"
   zoning_class   area_m2                                           geometry
0  Agricultural  250000.0  POLYGON ((540500 720590, 540500 720600, 540500...
1    Commercial  250000.0  POLYGON ((540500 720090, 540500 720100, 540500...
2    Industrial  250000.0  POLYGON ((540000 720590, 540000 720600, 540000...
3   Residential  250000.0  POLYGON ((540000 720090, 540000 720100, 540000...
Index(['zoning_class', 'area_m2', 'geometry'], dtype='str')
2:10
Write
/work/solve.py
content · 96 lines · py
import geopandas as gpd
import pandas as pd
from shapely.geometry import Polygon, MultiPolygon
from shapely.ops import unary_union

def clean_zoning_class(val):
    if pd.isna(val) or str(val).strip() == "":
        return None
    val = str(val).strip().lower()
    # Mapping variations to canonical TitleCase values
    mapping = {
        'residential': 'Residential',
        'resi.': 'Residential',
        'res': 'Residential',
        'commercial': 'Commercial',
        'comm.': 'Commercial',
        'comm': 'Commercial',
        'industrial': 'Industrial',
        'indus.': 'Industrial',
        'indus': 'Industrial',
        'agricultural': 'Agricultural',
        'agri.': 'Agricultural',
        'agri': 'Agricultural'
    }
    # Check for exact matches or prefix matches
    for key, canonical in mapping.items():
        if val == key or val.startswith(key + '.') or val.startswith(key + ' '):
            return canonical
    # Fallback: try to match by start of word to handle things like 'Resi.'
    if val.startswith('resi'): return 'Residential'
    if val.startswith('comm'): return 'Commercial'
    if val.startswith('indu'): return 'Industrial'
    if val.startswith('agri'): return 'Agricultural'
    return val.capitalize()

def solve():
    print("Loading data...")
    gdf = gpd.read_file('lagos_zoning_legacy.gpkg')
    print("Cleaning zoning classes...")
    gdf['zoning_class'] = gdf['zoning_class'].apply(clean_zoning_class)
    # Drop rows with None/NaN/Empty zoning_class
    gdf = gdf.dropna(subset=['zoning_class'])
    gdf = gdf[gdf['zoning_class'].str.strip() != ""]
    print(f"Unique classes after cleaning: {gdf['zoning_class'].unique()}")
    # Fix vertex precision by snapping to a grid.
    # Using a slightly larger grid to ensure tiny gaps/overlaps are closed.
    try:
        from shapely import set_precision
        print("Using shapely.set_precision...")
        gdf['geometry'] = gdf['geometry'].apply(lambda g: set_precision(g, grid_size=0.05))
    except ImportError:
        print("shapely.set_precision not available, using coordinate rounding.")
        def round_coords(geom):
            from shapely.ops import transform
            return transform(lambda x, y: (round(x, 2), round(y, 2)), geom)
        gdf['geometry'] = gdf['geometry'].apply(round_coords)
    print("Dissolving geometries by zoning class...")
    # Ensure geometry is valid before dissolving
    gdf['geometry'] = gdf['geometry'].buffer(0)
    aggregated_data = []
    for z_class, group in gdf.groupby('zoning_class'):
        # unary_union dissolves boundaries
        merged_geom = group['geometry'].unary_union
        area = merged_geom.area
        aggregated_data.append({
            'zoning_class': z_class,
            'area_m2': area,
            'geometry': merged_geom
        })
    print("Creating output GeoDataFrame...")
    out_gdf = gpd.GeoDataFrame(aggregated_data, crs=gdf.crs)
    # The instruction asks for two columns (zoning_class and area_m2)
    # but since it's a .gpkg, the geometry column must be present.
    # We'll include zoning_class, area_m2 and the geometry.
    out_gdf = out_gdf[['zoning_class', 'area_m2', 'geometry']]
    print(f"Saving to zoning_aggregated.gpkg...")
    out_gdf.to_file('zoning_aggregated.gpkg', driver='GPKG')
    print("Done!")

if __name__ == "__main__":
    solve()
Wrote /work/solve.py (3441 chars).
2:12
Bash
python /work/solve.py
Loading data...
Cleaning zoning classes...
Unique classes after cleaning: <ArrowStringArray>
['Residential', 'Commercial', 'Industrial', 'Agricultural']
Length: 4, dtype: str
Using shapely.set_precision...
Dissolving geometries by zoning class...
Creating output GeoDataFrame...
Saving to zoning_aggregated.gpkg...
Done!
/work/solve.py:74: DeprecationWarning: The 'unary_union' attribute is deprecated, use the 'union_all()' method instead.
  merged_geom = group['geometry'].unary_union
2:17
Bash
python -c "import geopandas as gpd; gdf = gpd.read_file('zoning_aggregated.gpkg'); print(gdf.head()); print(gdf.columns)"
   zoning_class   area_m2                                           geometry
0  Agricultural  250000.0  POLYGON ((540500 720580, 540500 720590, 540500...
1    Commercial  250000.0  POLYGON ((540500 720080, 540500 720090, 540500...
2    Industrial  250000.0  POLYGON ((540000 720580, 540000 720590, 540000...
3   Residential  250000.0  POLYGON ((540000 720080, 540000 720090, 540000...
Index(['zoning_class', 'area_m2', 'geometry'], dtype='str')
2:33