run-20260618-030804Z / dc-l2-lagos-snap-normalize

0:00

I need to get our legacy zoning layer ready for the state portal. lagos_zoning_legacy.gpkg has inconsistent zoning_class values (mixed casing, spelling variations, and some blank entries) plus vertex precision issues that cause tiny gaps and overlaps between adjacent parcels. Can you clean up the class names to consistent canonical TitleCase values, drop any rows with blank or whitespace-only classes, fix the vertex precision problems, and produce one merged geometry per canonical zoning class? Please write the result to zoning_aggregated.gpkg, with each row carrying two columns: zoning_class and a numeric area_m2.

0:00

System info

{
  "adapter": "openrouter",
  "model": "deepseek/deepseek-v4-flash",
  "base_url": "https://openrouter.ai/api/v1"
}

0:08

Let me start by examining the input file to understand its structure and content.

0:08

Bash cd /work && python3 -c "

cd /work && python3 -c "
import geopandas as gpd
gdf = gpd.read_file('lagos_zoning_legacy.gpkg')
print('Columns:', gdf.columns.tolist())
print('Shape:', gdf.shape)
print('CRS:', gdf.crs)
print('Geometry type:', gdf.geometry.type.unique())
print()
print('--- zoning_class value counts ---')
print(gdf['zoning_class'].value_counts(dropna=False))
print()
print('--- dtypes ---')
print(gdf.dtypes)
print()
print('--- head ---')
print(gdf.head())
"

Columns: ['parcel_id', 'lga_source', 'zoning_class', 'area_m2', 'geometry']
Shape: (10080, 5)
CRS: EPSG:26331
Geometry type: <ArrowStringArray>
['Polygon']
Length: 1, dtype: str

--- zoning_class value counts ---
zoning_class
Residential     420
COMMERCIAL      420
industrial      420
Agri.           420
residential     419
resi.           419
Comm.           419
COMM.           419
indus.          419
AGRI.           419
Industrial      418
AGRICULTURAL    418
RESIDENTIAL     417
RESI.           417
comm.           417
commercial      417
Indus.          417
INDUSTRIAL      417
Agricultural    417
agri.           417
Resi.           416
Commercial      416
INDUS.          416
agricultural    416
                 13
                 13
NaN              12
\t               12
Name: count, dtype: int64

--- dtypes ---
parcel_id          int64
lga_source           str
zoning_class         str
area_m2          float64
geometry        geometry
dtype: object

--- head ---
   parcel_id  ...                                           geometry
0          1  ...  POLYGON ((540000 720000, 540010 720000, 540010...
1          2  ...  POLYGON ((540010 720000, 540020 720000, 540020...
2          3  ...  POLYGON ((540020 720000, 540030 720000, 540030...
3          4  ...  POLYGON ((540030 720000, 540040 720000, 540040...
4          5  ...  POLYGON ((540040 720000, 540050 720000, 540050...

[5 rows x 5 columns]

0:18

Let me analyze the data:

1. **zoning_class** column has many variations. I need to map them to canonical TitleCase values:
   - 'Residential', 'residential', 'RESIDENTIAL', 'Resi.', 'resi.', 'RESI.' → 'Residential'
   - 'COMMERCIAL', 'commercial', 'Commercial', 'Comm.', 'comm.', 'COMM.' → 'Commercial'
   - 'industrial', 'Industrial', 'Indus.', 'indus.', 'INDUS.', 'INDUSTRIAL' → 'Industrial'
   - 'Agri.', 'AGRI.', 'agri.', 'Agricultural', 'agricultural', 'AGRICULTURAL' → 'Agricultural'
   - Blank/whitespace/NaN → drop these rows

2. **Geometry issues**: There are likely vertex precision issues causing tiny gaps/overlaps between adjacent parcels. I should snap/snap to a grid to fix precision, then use unary_union/dissolve to merge by zoning_class.

3. **Output**: One merged geometry per canonical zoning class, with columns: zoning_class and area_m2.

Let me look at the geometry coordinates to understand the precision:

The example polygon shows coordinates like 540000, 720000, incrementing by 10. So the precision is likely integer coordinates. Let me check more carefully.

0:18

Bash cd /work && python3 -c "

cd /work && python3 -c "
import geopandas as gpd
gdf = gpd.read_file('lagos_zoning_legacy.gpkg')
# Check coordinate precision in some geometries
for i in range(min(10, len(gdf))):
    geom = gdf.iloc[i].geometry
    coords = list(geom.exterior.coords)
    print(f'Row {i}: {coords}')
"

Row 0: [(540000.0, 720000.0), (540010.00002, 720000.000005), (540010.000005, 720010.000015), (540000.000025, 720010.00002), (540000.0, 720000.0)]
Row 1: [(540010.00001, 720000.000025), (540020.0, 720000.000005), (540020.00002, 720010.00001), (540010.000005, 720010.00002), (540010.00001, 720000.000025)]
Row 2: [(540020.000025, 720000.00002), (540030.000015, 720000.0), (540030.0, 720010.00001), (540020.00002, 720010.000015), (540020.000025, 720000.00002)]
Row 3: [(540030.000005, 720000.00002), (540040.000025, 720000.000025), (540040.000015, 720010.000005), (540030.0, 720010.000015), (540030.000005, 720000.00002)]
Row 4: [(540040.00002, 720000.000015), (540050.000005, 720000.000025), (540050.00003, 720010.0), (540040.000015, 720010.00001), (540040.00002, 720000.000015)]
Row 5: [(540050.0, 720000.000015), (540060.00002, 720000.00002), (540060.00001, 720010.0), (540050.00003, 720010.000005), (540050.0, 720000.000015)]
Row 6: [(540060.000015, 720000.00001), (540070.0, 720000.00002), (540070.00002, 720010.000025), (540060.00001, 720010.000005), (540060.000015, 720000.00001)]
Row 7: [(540070.00003, 720000.000005), (540080.000015, 720000.000015), (540080.0, 720010.000025), (540070.000025, 720010.0), (540070.00003, 720000.000005)]
Row 8: [(540080.00001, 720000.000005), (540090.00003, 720000.00001), (540090.000015, 720010.00002), (540080.000005, 720010.0), (540080.00001, 720000.000005)]
Row 9: [(540090.000025, 720000.0), (540100.00001, 720000.00001), (540100.00003, 720010.000015), (540090.000015, 720010.000025), (540090.000025, 720000.0)]

0:40

Now I understand the data. Let me plan the solution:

1. **Clean zoning_class**: Map all variations to canonical TitleCase values:
- residential, Resi., RESI., resi. → Residential
- commercial, COMM., Comm., comm., COMMERCIAL, Commercial → Commercial
- industrial, Indus., INDUS., indus., INDUSTRIAL, Industrial → Industrial
- agricultural, AGRI., agri., Agri., AGRICULTURAL, Agricultural, AGRI. → Agricultural
- blank, whitespace-only, NaN → drop these rows

2. **Fix vertex precision**: Round coordinates to 3 decimal places (since CRS is EPSG:26331 which is likely meters, so mm precision is fine), then snap/cluster to fix gaps/overlaps. A common approach is to snap geometries to a grid (snap to 0.001 or 0.01), then run a union/dissolve by class.

Actually, looking at the coordinates, the values are like 540000.0, 540010.00002 etc. The intended precision seems to be 1-unit (1 meter) grid. Let me use snapping to 0.001 (mm) or even 0.01 (cm). Actually the vertex perturbations are in the 5th decimal (0.00001-0.00003 range), so rounding to 3 decimal places (mm) would fix that.

But for fixing the tiny gaps/overlaps between adjacent parcels, rounding alone might not suffice — after rounding, adjacent parcels might still have slight misalignments. A better approach is to:
1. Round coordinates to 3 decimal places
2. Use `snap` to snap geometries to each other (or use a grid approach)
3. Then dissolve by class

Actually, a cleaner approach:
1. Round geometry coordinates to e.g. 3 decimal places (or even integer since CRS is in meters and the grid looks like 10m intervals)
2. Clean class names
3. Dissolve/drop rows with bad classes
4. Union by class

Let me check the coordinate pattern more - values are like 540000, 540010, 540020... every 10 units, but with tiny perturbations. So snapping to integer (0 decimal places) or 0.001 would work. But snapping to integer might cause overlaps if boundaries are supposed to be shared. Let me think...

Actually, the safest approach for fixing vertex precision issues in a way that eliminates tiny gaps/overlaps:
1. Round all coordinates to 3 decimal places (mm precision - removes the 5th decimal noise)
2. Use `snap` to snap vertices of nearby geometries
3. Then dissolve by class

Or even better, use a snapping tolerance when doing the dissolve. Let me use geopandas' `dissolve` after snapping to a grid.

Let me write a comprehensive solution.

For the class mapping, I'll create a canonical mapping dictionary.

Let me also check what the valid classes should be:
- Residential
- Commercial
- Industrial
- Agricultural

Those seem to be the 4 canonical classes.

Let me write the script.

0:40

Write /work/solve.py

content · 123 lines · py

import geopandas as gpd
import pandas as pd
import numpy as np
from shapely.geometry import Polygon, MultiPolygon
from shapely.ops import unary_union, snap
from shapely.validation import make_valid

# ------------------------------------------------------------
# 1. Read data
# ------------------------------------------------------------
gdf = gpd.read_file('/work/lagos_zoning_legacy.gpkg')
print(f"Input rows: {len(gdf)}")

# ------------------------------------------------------------
# 2. Canonical zoning_class mapping
# ------------------------------------------------------------
class_map = {
    'residential': 'Residential',
    'resi.':       'Residential',
    'RESIDENTIAL': 'Residential',
    'RESI.':       'Residential',
    'Resi.':       'Residential',
    'commercial':  'Commercial',
    'comm.':       'Commercial',
    'COMMERCIAL':  'Commercial',
    'COMM.':       'Commercial',
    'Comm.':       'Commercial',
    'industrial':  'Industrial',
    'indus.':      'Industrial',
    'INDUSTRIAL':  'Industrial',
    'INDUS.':      'Industrial',
    'Indus.':      'Industrial',
    'agricultural':'Agricultural',
    'agri.':       'Agricultural',
    'AGRICULTURAL':'Agricultural',
    'AGRI.':       'Agricultural',
    'Agricultural':'Agricultural',
    'Agri.':       'Agricultural',
}

# Apply mapping (case-insensitive fallback for any unlisted)
def normalize_class(val):
    if val is None or (isinstance(val, str) and val.strip() == ''):
        return None
    v = str(val).strip()
    # Direct map lookup
    if v in class_map:
        return class_map[v]
    # Case-insensitive fallback: try lowercased
    v_lower = v.lower()
    for k, canon in class_map.items():
        if k.lower() == v_lower:
            return canon
    # Capitalize first letter as fallback
    return v.capitalize()

gdf['zoning_class'] = gdf['zoning_class'].apply(normalize_class)

# Drop rows with None (blank / whitespace / NaN)
before = len(gdf)
gdf = gdf.dropna(subset=['zoning_class'])
after = len(gdf)
print(f"Dropped {before - after} rows with blank/whitespace classes")
print(f"Remaining: {len(gdf)}")

# Check distribution
print("\nCanonical class counts:")
print(gdf['zoning_class'].value_counts())

# ------------------------------------------------------------
# 3. Fix vertex precision – snap to a 0.001 grid (mm)
# ------------------------------------------------------------
# Use Shapely's snap + coordinate rounding to eliminate micro-gaps
# First round coordinates to 3 decimal places
def round_geom(geom, ndigits=3):
    """Round all coordinates in a geometry to fixed decimal places."""
    if geom is None:
        return None
    return shapely.set_precision(geom, grid_size=10**-ndigits)

import shapely

gdf['geometry'] = gdf['geometry'].apply(lambda g: round_geom(g, ndigits=3))

# ------------------------------------------------------------
# 4. Snap adjacent geometries within a tiny tolerance
# ------------------------------------------------------------
# Dissolve by class first (temporary) to get a merged per-class shape
# But we need to fix overlaps/gaps between classes too.
# Strategy: use unary_union per class after snapping, which will
# cleanly merge overlapping/snapped boundaries.

# Ensure valid geometries
gdf['geometry'] = gdf['geometry'].apply(lambda g: make_valid(g) if g is not None else g)

# ------------------------------------------------------------
# 5. Dissolve by zoning_class – merge geometries
# ------------------------------------------------------------
dissolved = gdf.dissolve(by='zoning_class', aggfunc='mean')

# ------------------------------------------------------------
# 6. Compute area_m2 for each merged geometry
# ------------------------------------------------------------
# CRS is EPSG:26331 (projected, meters) – area from geometry is in m²
dissolved['area_m2'] = dissolved.geometry.area

# Round area to 2 decimal places for cleanliness
dissolved['area_m2'] = dissolved['area_m2'].round(2)

# Keep only the two requested columns
result = dissolved[['area_m2']].copy()
result.index.name = 'zoning_class'
result = result.reset_index()

print("\nResult:")
print(result)
print(f"\nTotal area: {result['area_m2'].sum():.2f} m²")

# ------------------------------------------------------------
# 7. Write output
# ------------------------------------------------------------
result.to_file('/work/zoning_aggregated.gpkg', layer='zoning_aggregated', driver='GPKG')
print("\nWritten to /work/zoning_aggregated.gpkg")

Wrote /work/solve.py (4501 chars).

0:51

Wait, I used `shapely.set_precision` but I need to import shapely properly. Also I wrote `shapely` in the function but imported it later. Let me fix the script.

Actually, looking at the script again, I'm importing `shapely` after calling `round_geom` which uses `shapely.set_precision`. Let me reorder. Also `make_valid` is imported but `shapely.set_precision` needs the module. Let me rewrite more carefully.

Also I should consider: the dissolve by class approach will do unary_union internally. But there might still be issues with tiny gaps between different classes. However, since the task says "one merged geometry per canonical zoning class", I think dissolving by class is the right approach, and snapping to grid + make_valid should handle the topology issues.

Let me rewrite the script more cleanly.

0:51

Write /work/solve.py

content · 111 lines · py

import geopandas as gpd
import pandas as pd
import numpy as np
import shapely
from shapely.validation import make_valid

# ------------------------------------------------------------
# 1. Read data
# ------------------------------------------------------------
gdf = gpd.read_file('/work/lagos_zoning_legacy.gpkg')
print(f"Input rows: {len(gdf)}")

# ------------------------------------------------------------
# 2. Canonical zoning_class mapping
# ------------------------------------------------------------
class_map = {
    'residential': 'Residential',
    'resi.':       'Residential',
    'RESIDENTIAL': 'Residential',
    'RESI.':       'Residential',
    'Resi.':       'Residential',
    'commercial':  'Commercial',
    'comm.':       'Commercial',
    'COMMERCIAL':  'Commercial',
    'COMM.':       'Commercial',
    'Comm.':       'Commercial',
    'industrial':  'Industrial',
    'indus.':      'Industrial',
    'INDUSTRIAL':  'Industrial',
    'INDUS.':      'Industrial',
    'Indus.':      'Industrial',
    'agricultural':'Agricultural',
    'agri.':       'Agricultural',
    'AGRICULTURAL':'Agricultural',
    'AGRI.':       'Agricultural',
    'Agricultural':'Agricultural',
    'Agri.':       'Agricultural',
}

def normalize_class(val):
    if val is None:
        return None
    if isinstance(val, str) and val.strip() == '':
        return None
    # Also catch 'nan' strings or blank-only strings
    v = str(val).strip()
    if v.lower() == 'nan' or v == '':
        return None
    # Direct map lookup
    if v in class_map:
        return class_map[v]
    # Case-insensitive fallback
    v_lower = v.lower()
    for k, canon in class_map.items():
        if k.lower() == v_lower:
            return canon
    # Fallback: TitleCase
    return v.capitalize()

gdf['zoning_class'] = gdf['zoning_class'].apply(normalize_class)

# Drop rows with None (blank / whitespace / NaN)
before = len(gdf)
gdf = gdf.dropna(subset=['zoning_class'])
after = len(gdf)
print(f"Dropped {before - after} rows with blank/whitespace classes")
print(f"Remaining: {len(gdf)}")

# Check distribution
print("\nCanonical class counts:")
print(gdf['zoning_class'].value_counts())

# ------------------------------------------------------------
# 3. Fix vertex precision – snap to a 0.001 grid
# ------------------------------------------------------------
def snap_to_grid(geom, grid_size=0.001):
    """Round all coordinates in a geometry to a fixed grid."""
    if geom is None:
        return None
    return shapely.set_precision(geom, grid_size=grid_size, mode='valid_output')

gdf['geometry'] = gdf['geometry'].apply(lambda g: snap_to_grid(g, 0.001))

# Ensure valid geometries
gdf['geometry'] = gdf['geometry'].apply(lambda g: make_valid(g) if g is not None else g)

# ------------------------------------------------------------
# 4. Dissolve by zoning_class – merge geometries
# ------------------------------------------------------------
dissolved = gdf.dissolve(by='zoning_class', aggfunc='mean')

# ------------------------------------------------------------
# 5. Compute area_m2 for each merged geometry
# ------------------------------------------------------------
# CRS is EPSG:26331 (projected, metres) – area from geometry is in m²
dissolved['area_m2'] = dissolved.geometry.area.round(2)

# Keep only the two requested columns
result = dissolved[['area_m2']].copy()
result.index.name = 'zoning_class'
result = result.reset_index()

print("\nResult:")
print(result)
print(f"\nTotal area: {result['area_m2'].sum():.2f} m²")

# ------------------------------------------------------------
# 6. Write output
# ------------------------------------------------------------
result.to_file('/work/zoning_aggregated.gpkg', layer='zoning_aggregated', driver='GPKG')
print("\nWritten to /work/zoning_aggregated.gpkg")

Wrote /work/solve.py (3832 chars).

0:57

Let me check if shapely.set_precision actually exists, and also think about potential issues.

The function `shapely.set_precision` might not exist in older versions. Let me check what's available.

0:57

Bash cd /work && python3 -c "import shapely; print(dir(shapely))" 2>&1 | tr ',' '\n' ...

 'get_precision'
 'set_precision'

1:01

Good, `shapely.set_precision` exists. Let me also check `make_valid` location.

1:01

Bash cd /work && python3 -c "

cd /work && python3 -c "
import shapely
help(shapely.set_precision)
" 2>&1 | head -30

Help on function set_precision in module shapely._geometry:

set_precision(geometry, grid_size, mode='valid_output', **kwargs)
    Return geometry with the precision set to a precision grid size.

    By default, geometries use double precision coordinates (grid_size = 0).

    Coordinates will be rounded if the precision grid specified is less precise
    than the input geometry. Duplicated vertices will be dropped from lines and
    polygons for grid sizes greater than 0. Line and polygon geometries may
    collapse to empty geometries if all vertices are closer together than
    ``grid_size`` or if a polygon becomes significantly narrower than
    ``grid_size``. Spikes or sections in polygons narrower than ``grid_size``
    after rounding the vertices will be removed, which can lead to multipolygons
    or empty geometries. Z values, if present, will not be modified.

    Notes
    -----
    * subsequent operations will always be performed in the precision of the
      geometry with higher precision (smaller "grid_size"). That same precision
      will be attached to the operation outputs.
    * input geometries should be geometrically valid; unexpected results may
      occur if input geometries are not.
    * the geometry returned will be in
      :ref:`mild canonical form <canonical-form>`, and the order of vertices can
      change and should not be relied upon.
    * returns None if geometry is None.

    Parameters
    ----------

1:05

Good. So `shapely.set_precision` can snap coordinates to a grid. Let me run the script.

1:05

Bash cd /work && python solve.py 2>&1

Input rows: 10080
Dropped 50 rows with blank/whitespace classes
Remaining: 10030

Canonical class counts:
zoning_class
Residential     2508
Commercial      2508
Industrial      2507
Agricultural    2507
Name: count, dtype: int64
Traceback (most recent call last):
  File "/work/solve.py", line 90, in <module>
    dissolved = gdf.dissolve(by='zoning_class', aggfunc='mean')
  File "/usr/local/lib/python3.14/site-packages/geopandas/geodataframe.py", line 2274, in dissolve
    aggregated_data = data.groupby(**groupby_kwargs).agg(aggfunc, **kwargs)
  File "/usr/local/lib/python3.14/site-packages/pandas/core/groupby/generic.py", line 2291, in aggregate
    result = op.agg()
  File "/usr/local/lib/python3.14/site-packages/pandas/core/apply.py", line 291, in agg
    return self.apply_str()
           ~~~~~~~~~~~~~~^^
  File "/usr/local/lib/python3.14/site-packages/pandas/core/apply.py", line 701, in apply_str
    return self._apply_str(obj, func, *self.args, **self.kwargs)
           ~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.14/site-packages/pandas/core/apply.py", line 792, in _apply_str
    return f(*args, **kwargs)
  File "/usr/local/lib/python3.14/site-packages/pandas/core/groupby/groupby.py", line 2285, in mean
    result = self._cython_agg_general(
        "mean",
    ...<4 lines>...
        skipna=skipna,
    )
  File "/usr/local/lib/python3.14/site-packages/pandas/core/groupby/groupby.py", line 1808, in _cython_agg_general
    new_mgr = data.grouped_reduce(array_func)
  File "/usr/local/lib/python3.14/site-packages/pandas/core/internals/managers.py", line 1654, in grouped_reduce
    applied = blk.apply(func)
  File "/usr/local/lib/python3.14/site-packages/pandas/core/internals/blocks.py", line 347, in apply
    result = func(self.values, **kwargs)
  File "/usr/local/lib/python3.14/site-packages/pandas/core/groupby/groupby.py", line 1783, in array_func
    result = self._grouper._cython_operation(
        "aggregate",
    ...<4 lines>...
        **kwargs,
    )
  File "/usr/local/lib/python3.14/site-packages/pandas/core/groupby/ops.py", line 963, in _cython_operation
    return cy_op.cython_operation(
           ~~~~~~~~~~~~~~~~~~~~~~^
        values=values,
        ^^^^^^^^^^^^^^
    ...<4 lines>...
        **kwargs,
        ^^^^^^^^^
    )
    ^
  File "/usr/local/lib/python3.14/site-packages/pandas/core/groupby/ops.py", line 553, in cython_operation
    return values._groupby_op(
           ~~~~~~~~~~~~~~~~~~^
        how=self.how,
        ^^^^^^^^^^^^^
    ...<4 lines>...
        **kwargs,
        ^^^^^^^^^
    )
    ^
  File "/usr/local/lib/python3.14/site-packages/pandas/core/arrays/arrow/array.py", line 2677, in _groupby_op
    raise TypeError(
        f"dtype '{self.dtype}' does not support operation '{how}'"
    )
TypeError: dtype 'str' does not support operation 'mean'

1:15

The issue is that `aggfunc='mean'` tries to average string columns like `lga_source` and `parcel_id`. I should just drop the extra columns and only keep geometry and zoning_class, or use a different aggfunc. Let me simplify: drop the non-needed columns first, then dissolve.

1:15

Write /work/solve.py

content · 111 lines · py

import geopandas as gpd
import pandas as pd
import numpy as np
import shapely
from shapely.validation import make_valid

# ------------------------------------------------------------
# 1. Read data
# ------------------------------------------------------------
gdf = gpd.read_file('/work/lagos_zoning_legacy.gpkg')
print(f"Input rows: {len(gdf)}")

# ------------------------------------------------------------
# 2. Canonical zoning_class mapping
# ------------------------------------------------------------
class_map = {
    'residential': 'Residential',
    'resi.':       'Residential',
    'RESIDENTIAL': 'Residential',
    'RESI.':       'Residential',
    'Resi.':       'Residential',
    'commercial':  'Commercial',
    'comm.':       'Commercial',
    'COMMERCIAL':  'Commercial',
    'COMM.':       'Commercial',
    'Comm.':       'Commercial',
    'industrial':  'Industrial',
    'indus.':      'Industrial',
    'INDUSTRIAL':  'Industrial',
    'INDUS.':      'Industrial',
    'Indus.':      'Industrial',
    'agricultural':'Agricultural',
    'agri.':       'Agricultural',
    'AGRICULTURAL':'Agricultural',
    'AGRI.':       'Agricultural',
    'Agricultural':'Agricultural',
    'Agri.':       'Agricultural',
}

def normalize_class(val):
    if val is None:
        return None
    if isinstance(val, str) and val.strip() == '':
        return None
    v = str(val).strip()
    if v.lower() == 'nan' or v == '':
        return None
    # Direct map lookup
    if v in class_map:
        return class_map[v]
    # Case-insensitive fallback
    v_lower = v.lower()
    for k, canon in class_map.items():
        if k.lower() == v_lower:
            return canon
    # Fallback: TitleCase
    return v.capitalize()

gdf['zoning_class'] = gdf['zoning_class'].apply(normalize_class)

# Drop rows with None (blank / whitespace / NaN)
before = len(gdf)
gdf = gdf.dropna(subset=['zoning_class'])
after = len(gdf)
print(f"Dropped {before - after} rows with blank/whitespace classes")
print(f"Remaining: {len(gdf)}")

print("\nCanonical class counts:")
print(gdf['zoning_class'].value_counts())

# ------------------------------------------------------------
# 3. Fix vertex precision – snap to a 0.001 grid
# ------------------------------------------------------------
def snap_to_grid(geom, grid_size=0.001):
    """Round all coordinates in a geometry to a fixed grid."""
    if geom is None:
        return None
    return shapely.set_precision(geom, grid_size=grid_size, mode='valid_output')

gdf['geometry'] = gdf['geometry'].apply(lambda g: snap_to_grid(g, 0.001))

# Ensure valid geometries
gdf['geometry'] = gdf['geometry'].apply(lambda g: make_valid(g) if g is not None else g)

# ------------------------------------------------------------
# 4. Dissolve by zoning_class – merge geometries
# ------------------------------------------------------------
# Keep only essential columns for dissolve
gdf_simple = gdf[['zoning_class', 'geometry']].copy()
dissolved = gdf_simple.dissolve(by='zoning_class')

# ------------------------------------------------------------
# 5. Compute area_m2 for each merged geometry
# ------------------------------------------------------------
# CRS is EPSG:26331 (projected, metres) – area from geometry is in m²
dissolved['area_m2'] = dissolved.geometry.area.round(2)

# Keep only the two requested columns
result = dissolved[['area_m2']].copy()
result.index.name = 'zoning_class'
result = result.reset_index()

print("\nResult:")
print(result)
print(f"\nTotal area: {result['area_m2'].sum():.2f} m²")

# ------------------------------------------------------------
# 6. Write output
# ------------------------------------------------------------
result.to_file('/work/zoning_aggregated.gpkg', layer='zoning_aggregated', driver='GPKG')
print("\nWritten to /work/zoning_aggregated.gpkg")

Wrote /work/solve.py (3846 chars).