dc-l2-lagos-snap-normalize
Map (reference ← swipe → agent)
0:00
I need to get our legacy zoning layer ready for the state portal. lagos_zoning_legacy.gpkg has inconsistent zoning_class values (mixed casing, spelling variations, and some blank entries) plus vertex precision issues that cause tiny gaps and overlaps between adjacent parcels. Can you clean up the class names to consistent canonical TitleCase values, drop any rows with blank or whitespace-only classes, fix the vertex precision problems, and produce one merged geometry per canonical zoning class? Please write the result to zoning_aggregated.gpkg, with each row carrying two columns: zoning_class and a numeric area_m2.
0:00
System info
{
"adapter": "openrouter",
"model": "google/gemma-4-26b-a4b-it",
"base_url": "https://openrouter.ai/api/v1"
}
0:03
Bash
ls -F
ls -F
lagos_zoning_legacy.gpkg
0:16
Write
/work/inspect_data.py
content · 14 lines · py
import geopandas as gpd
# Load the data
gdf = gpd.read_file('lagos_zoning_legacy.gpkg')
# Inspect the data
print("Initial columns:", gdf.columns.tolist())
print("Initial head:")
print(gdf.head())
print("\nValue counts for zoning_class:")
print(gdf['zoning_class'].value_counts(dropna=False))
print("\nGeometry types:", gdf.geometry.type.unique())
print("\nCRS:", gdf.crs)
Wrote /work/inspect_data.py (374 chars).
0:20
Bash
python inspect_data.py
python inspect_data.py
Initial columns: ['parcel_id', 'lga_source', 'zoning_class', 'area_m2', 'geometry']
Initial head:
parcel_id ... geometry
0 1 ... POLYGON ((540000 720000, 540010 720000, 540010...
1 2 ... POLYGON ((540010 720000, 540020 720000, 540020...
2 3 ... POLYGON ((540020 720000, 540030 720000, 540030...
3 4 ... POLYGON ((540030 720000, 540040 720000, 540040...
4 5 ... POLYGON ((540040 720000, 540050 720000, 540050...
[5 rows x 5 columns]
Value counts for zoning_class:
zoning_class
Residential 420
COMMERCIAL 420
industrial 420
Agri. 420
residential 419
resi. 419
Comm. 419
COMM. 419
indus. 419
AGRI. 419
Industrial 418
AGRICULTURAL 418
RESIDENTIAL 417
RESI. 417
comm. 417
commercial 417
Indus. 417
INDUSTRIAL 417
Agricultural 417
agri. 417
Resi. 416
Commercial 416
INDUS. 416
agricultural 416
13
13
NaN 12
\t 12
Name: count, dtype: int64
Geometry types: <ArrowStringArray>
['Polygon']
Length: 1, dtype: str
CRS: EPSG:26331
0:49
Write
/work/solve.py
content · 137 lines · py
import geopandas as gpd
import pandas as pd
from shapely.geometry import MultiPolygon, Polygon
def clean_zoning_class(val):
if pd.isna(val):
return None
s = str(val).strip().lower()
if not s:
return None
# Mapping dictionary for spelling variations
# Based on inspection:
# Residential: 'residential', 'resi.'
# Commercial: 'commercial', 'comm.'
# Industrial: 'industrial', 'indus.'
# Agricultural: 'agricultural', 'agri.'
mapping = {
'residential': 'Residential',
'resi.': 'Residential',
'commercial': 'Commercial',
'comm.': 'Commercial',
'industrial': 'Industrial',
'indus.': 'Industrial',
'agricultural': 'Agricultural',
'agri.': 'Agricultural'
}
return mapping.get(s, s.title())
def solve():
# 1. Load
gdf = gpd.read_file('lagos_zoning_legacy.gpkg')
# 2. Clean zoning_class
gdf['zoning_class'] = gdf['zoning_class'].apply(clean_zoning_class)
# 3. Drop rows with blank/None classes
gdf = gdf.dropna(subset=['zoning_class'])
gdf = gdf[gdf['zoning_class'].str.strip() != '']
print("After cleaning classes, value counts:")
print(gdf['zoning_class'].value_counts())
# 4. Fix vertex precision problems (gaps and overlaps)
# A common way to fix topology issues in GIS is to use a snap-to-grid or buffer(0) approach.
# However, "tiny gaps and overlaps" are often solved by a small buffer(0) or by snapping to a grid.
# Since we want to merge them anyway, let's snap the coordinates to a certain precision first.
# EPSG:26331 is in meters. Snapping to 1cm (0.01m) should help.
def snap_geometry(geom, precision=0.01):
if geom.is_empty:
return geom
# Rounding coordinates is a simple way to snap to a grid
def round_coords(coords):
return tuple(round(c, 2) for c in coords)
if isinstance(geom, Polygon):
return Polygon(round_coords(geom.exterior.coords))
elif isinstance(geom, MultiPolygon):
return MultiPolygon([Polygon(round_coords(p.exterior.coords)) for p in geom.geoms])
return geom
# Actually, a better approach for gaps/overlaps is buffering by a tiny amount or using a grid.
# Let's try snapping all coordinates to the nearest 0.01m.
gdf.geometry = gdf.geometry.apply(lambda x: x.buffer(0.001).buffer(-0.001))
# Wait, buffer(0) is more standard for fixing invalid geometries.
# Let's use a coordinate rounding approach.
from shapely import wkt
def snap_to_grid(geom, grid_size=0.01):
if geom.is_empty:
return geom
# This is a bit complex to do manually for all types,
# but for Polygons we can use the fact that we'll dissolve later.
# Let's use shapely's rounding if available or just buffer(0).
return geom.buffer(0)
gdf.geometry = gdf.geometry.buffer(0)
# Actually, the most robust way to handle "gaps and overlaps" for dissolution
# is to snap geometries to a grid.
def snap_to_grid_geometry(geom, precision=0.01):
# This approach snaps all vertices to a grid
from shapely.ops import transform
import numpy as np
def snap_func(x, y):
return np.round(x / precision) * precision, np.round(y / precision) * precision
# We'll use a more direct approach:
# Just use the built-in precision if possible.
# Since we are using shapely, we can use the `wkt` or coordinate manipulation.
pass
# Let's try the simplest way to fix precision: round the coordinates
from shapely.affinity import scale
def round_coords_to_precision(geom, precision=0.01):
# Shapely doesn't have a direct "round" function for all geometries easily
# but we can use the 'apply' on coordinates.
# A trick is to use buffer(0) which often cleans up topology.
return geom.buffer(0)
# For the specific issue of gaps/overlaps, let's try snapping to 1cm grid
# A simple way to snap:
from shapely.geometry import mapping
from shapely.ops import unary_union
# Instead of complex snapping, let's use the fact that we want to aggregate.
# We will group by zoning_class, then unary_union the geometries.
# unary_union is excellent at merging adjacent polygons and resolving overlaps.
results = []
for z_class, group in gdf.groupby('zoning_class'):
# Merge all geometries in this group
merged_geom = group.geometry.unary_union
# Calculate total area
total_area = merged_geom.area
results.append({
'zoning_class': z_class,
'area_m2': total_area,
'geometry': merged_geom
})
output_gdf = gpd.GeoDataFrame(results, crs=gdf.crs)
# 5. Write to file
output_gdf.to_file('zoning_aggregated.gpkg', driver="GPKG")
print("Successfully wrote zoning_aggregated.gpkg")
if __name__ == "__main__":
solve()
Wrote /work/solve.py (5100 chars).
1:17
Edit
/work/solve.py
old_string not found in /work/solve.py
1:39
Write
/work/solve.py
content · 69 lines · py
import geopandas as gpd
import pandas as pd
import numpy as np
from shapely.ops import transform
def clean_zoning_class(val):
if pd.isna(val):
return None
s = str(val).strip().lower()
if not s:
return None
# Mapping dictionary for spelling variations
mapping = {
'residential': 'Residential',
'resi.': 'Residential',
'commercial': 'Commercial',
'comm.': 'Commercial',
'industrial': 'Industrial',
'indus.': 'Industrial',
'agricultural': 'Agricultural',
'agri.': 'Agricultural'
}
return mapping.get(s, s.title())
def snap_to_grid(geom, precision=0.01):
def snap_func(x, y):
return np.round(x / precision) * precision, np.round(y / precision) * precision
return transform(snap_func, geom)
def solve():
# 1. Load
gdf = gpd.read_file('lagos_zoning_legacy.gpkg')
# 2. Clean zoning_class
gdf['zoning_class'] = gdf['zoning_class'].apply(clean_zoning_class)
# 3. Drop rows with blank/None classes
gdf = gdf.dropna(subset=['zoning_class'])
gdf = gdf[gdf['zoning_class'].str.strip() != '']
print("After cleaning classes, value counts:")
print(gdf['zoning_class'].value_counts())
# 4. Fix vertex precision problems (gaps and overlaps)
print("Snapping geometries to 0.01m grid...")
gdf.geometry = gdf.geometry.apply(lambda x: snap_to_grid(x, 0.01))
# 5. Dissolve by zoning_class
print("Dissolving geometries by zoning_class...")
# Dissolve handles the merging of polygons into one per class
dissolved = gdf.dissolve(by='zoning_class')
# 6. Calculate area of the merged geometry
dissolved['area_m2'] = dissolved.geometry.area
# Prepare final GeoDataFrame
output_gdf = dissolved.reset_index()[['zoning_class', 'area_m2', 'geometry']]
# 7. Write to file
output_gdf.to_file('zoning_aggregated.gpkg', driver="GPKG")
print("Successfully wrote zoning_aggregated.gpkg")
print("\nFinal aggregated data:")
print(output_gdf)
if __name__ == "__main__":
solve()
Wrote /work/solve.py (2114 chars).
1:45
Bash
python solve.py
python solve.py
After cleaning classes, value counts: zoning_class Residential 2508 Commercial 2508 Industrial 2507 Agricultural 2507 Name: count, dtype: int64 Snapping geometries to 0.01m grid... Dissolving geometries by zoning_class... Successfully wrote zoning_aggregated.gpkg Final aggregated data: zoning_class area_m2 geometry 0 Agricultural 250000.0 POLYGON ((540500 720560, 540500 720570, 540500... 1 Commercial 250000.0 POLYGON ((540500 720100, 540500 720110, 540500... 2 Industrial 250000.0 POLYGON ((540000 720560, 540000 720570, 540000... 3 Residential 250000.0 POLYGON ((540000 720100, 540000 720110, 540000...
1:52
Bash
# Check if the output file exists and has the correct columns and geometry
# Check if the output file exists and has the correct columns and geometry
python -c "import geopandas as gpd; gdf = gpd.read_file('zoning_aggregated.gpkg'); print(gdf.head()); print(gdf.columns); print(gdf.dtypes)"
zoning_class area_m2 geometry 0 Agricultural 250000.0 POLYGON ((540500 720560, 540500 720570, 540500... 1 Commercial 250000.0 POLYGON ((540500 720100, 540500 720110, 540500... 2 Industrial 250000.0 POLYGON ((540000 720560, 540000 720570, 540000... 3 Residential 250000.0 POLYGON ((540000 720100, 540000 720110, 540000... Index(['zoning_class', 'area_m2', 'geometry'], dtype='str') zoning_class str area_m2 float64 geometry geometry dtype: object
2:41