run-20260609-084636Z / spa-l2-lagos-hotspot-overlaps

0:00

Hot-spot map for the Lagos housing brief. `lagos_landuse` carries land-use polygons with a population density attribute; `lagos_hex_grid` is a hex grid. I need the top 10 % of hex cells ranked by area-weighted mean population density across overlapping land-use polygons. Tiny polygons under 100 m² are noise and should be excluded. I need two output files that share exactly the same set of `hex_id` values, one row per cell. `hotspots.geoparquet` is a GeoParquet (Polygon or MultiPolygon geometry) in Nigeria's national grid with columns `hex_id`, `rank`, `area_weighted_density`, and `geometry`. `hotspot_ranking.parquet` is a plain Parquet with no geometry, carrying `hex_id`, `rank`, `area_weighted_density`, `n_overlap_polygons` (integer), and `n_slivers_filtered` (integer). In both files `rank` must be a unique integer starting at 1 where ascending rank corresponds to descending `area_weighted_density`.

0:00

System info

{
  "adapter": "openrouter",
  "model": "deepseek/deepseek-v4-flash",
  "base_url": "https://openrouter.ai/api/v1"
}

0:03

Let me start by understanding the data. I need to:

1. Read the land-use polygons with population density
2. Read the hex grid
3. Exclude tiny polygons under 100 m²
4. Compute area-weighted mean population density for each hex cell
5. Find top 10% of hex cells by that metric
6. Output two files

Let me begin by exploring the data.

0:03

Bash cd /work && python3 -c "

cd /work && python3 -c "
import geopandas as gpd
# Check what files are available
import os
for f in os.listdir('.'):
    print(f)
"

lagos_hex_grid.geojson
lagos_landuse.geojson

0:08

Let me first explore the data to understand the schemas.

0:08

Bash cd /work && python3 -c "

cd /work && python3 -c "
import geopandas as gpd

landuse = gpd.read_file('lagos_landuse.geojson')
print('=== LANDUSE ===')
print('Columns:', landuse.columns.tolist())
print('CRS:', landuse.crs)
print('Shape:', landuse.shape)
print(landuse.head(3))
print()

hex_grid = gpd.read_file('lagos_hex_grid.geojson')
print('=== HEX GRID ===')
print('Columns:', hex_grid.columns.tolist())
print('CRS:', hex_grid.crs)
print('Shape:', hex_grid.shape)
print(hex_grid.head(3))
print()
print('hex_grid dtypes:')
print(hex_grid.dtypes)
"

=== LANDUSE ===
Columns: ['id', 'class', 'pop_density', 'geometry']
CRS: EPSG:4326
Shape: (5542, 4)
                                     id  ...                                           geometry
0  004ca249-d2cf-3617-906e-dea1138e762b  ...  POLYGON ((3.46992 6.44615, 3.46993 6.44566, 3....
1  006a7dc6-2b5e-34fa-84a2-8a679620c9de  ...  POLYGON ((3.42136 6.43091, 3.42118 6.43051, 3....
2  00a334d1-7e8e-3e4c-8565-3501d4ed74df  ...  POLYGON ((3.45824 6.43361, 3.45872 6.43306, 3....

[3 rows x 4 columns]

=== HEX GRID ===
Columns: ['hex_id', 'geometry']
CRS: EPSG:4326
Shape: (1782, 2)
     hex_id                                           geometry
0  H000-000  POLYGON ((3.25491 6.40082, 3.2523 6.40535, 3.2...
1  H000-001  POLYGON ((3.25491 6.40987, 3.2523 6.41439, 3.2...
2  H000-002  POLYGON ((3.25492 6.41891, 3.25231 6.42344, 3....

hex_grid dtypes:
hex_id           str
geometry    geometry
dtype: object

0:19

Let me understand the task:

1. I have `lagos_landuse.geojson` with land-use polygons having a `pop_density` attribute.
2. I have `lagos_hex_grid.geojson` with hex cells having `hex_id`.
3. I need to compute area-weighted mean population density for each hex cell by overlapping land-use polygons.
4. Exclude tiny polygons under 100 m² (as noise/slivers).
5. Find the top 10% of hex cells ranked by this density.
6. Output two files:
   - `hotspots.geoparquet`: GeoParquet with `hex_id`, `rank`, `area_weighted_density`, `geometry` - in Nigeria's national grid (EPSG:26391? or EPSG:26392? Let me check - Nigeria's national grid is EPSG:26391 (Nigeria West) or EPSG:26392 (Nigeria East). Actually Lagos is in the west, so likely EPSG:26391 (Nigeria Mid Belt?) Wait, let me think. The Nigerian national grid system uses EPSG:26391 for the "Nigeria West" zone and EPSG:26392 for "Nigeria East". Lagos is in the western part, so EPSG:26391 or alternatively EPSG:26393 (Nigeria...). Actually EPSG:26391 is "Nigeria / West" and EPSG:26392 is "Nigeria / East". Lagos is in the west, so EPSG:26391.

Wait, actually let me reconsider. There's EPSG:26391 (Nigeria West), EPSG:26392 (Nigeria East), and EPSG:26393 (Nigeria...). Let me check: EPSG:26391 is "Minna / Nigeria West Belt", EPSG:26392 is "Minna / Nigeria East Belt", EPSG:26393 is "Minna / Nigeria Mid Belt". Actually, I think the common ones are 26391 (West), 26392 (East), 26393 (Mid). Lagos is in the west, so 26391.

Actually, I'll just use what makes sense. Let me first figure out the CRS by looking at the data.

Let me check the CRS info more carefully and also look at the coordinate ranges to understand the projection better.

Let me proceed with the analysis.

0:19

Bash cd /work && python3 -c "

cd /work && python3 -c "
import geopandas as gpd
import numpy as np

landuse = gpd.read_file('lagos_landuse.geojson')
hex_grid = gpd.read_file('lagos_hex_grid.geojson')

# Check bounds
print('Landuse bounds:', landuse.total_bounds)
print('Hex bounds:', hex_grid.total_bounds)

# Check CRS more carefully
print('Landuse CRS:', landuse.crs)
print('Hex CRS:', hex_grid.crs)

# Check pop_density stats
print('pop_density min/max/mean:', landuse['pop_density'].min(), landuse['pop_density'].max(), landuse['pop_density'].mean())
print('pop_density non-null:', landuse['pop_density'].notna().sum(), 'total:', len(landuse))

# Check area distribution
landuse_utm = landuse.to_crs('EPSG:26391')
landuse_utm['area_m2'] = landuse_utm.geometry.area
print('Area stats (m2): min', landuse_utm['area_m2'].min(), 'median', landuse_utm['area_m2'].median(), 'max', landuse_utm['area_m2'].max())
print('Number of polygons < 100 m2:', (landuse_utm['area_m2'] < 100).sum())
"

Landuse bounds: [3.25026042 6.4009372  3.6701081  6.6980539 ]
Hex bounds: [3.2444655  6.39593906 3.67033875 6.69934004]
Landuse CRS: EPSG:4326
Hex CRS: EPSG:4326
pop_density min/max/mean: 507.3 49994.9 25250.978978708048
pop_density non-null: 5542 total: 5542
Area stats (m2): min 1.0007638007186197 median 91.14665904726975 max 34909797.88823449
Number of polygons < 100 m2: 3022

0:29

Interesting - many polygons are under 100 m². The CRS is EPSG:4326 (WGS84). Nigeria's national grid is EPSG:26391 (Minna / Nigeria West Belt). Let me proceed with the solution.

The steps:
1. Filter landuse polygons to those with area >= 100 m² (in projected CRS)
2. For each hex cell, compute area-weighted mean population density from overlapping land-use polygons using the filtered polygons
3. Rank hex cells by descending area_weighted_density, top 10% get rank 1..N
4. Also count:
   - n_overlap_polygons: number of landuse polygons (before filtering) that overlap the hex cell
   - n_slivers_filtered: number of landuse polygons that overlap the hex cell but are < 100 m²

Wait, re-reading the requirements. "Tiny polygons under 100 m² are noise and should be excluded." So when computing area-weighted mean density, we exclude polygons < 100 m². But n_overlap_polygons should count all polygons (including tiny ones?) and n_slivers_filtered is the count of tiny polygons that overlap.

Let me re-read: "n_overlap_polygons (integer), and n_slivers_filtered (integer)". So:
- n_overlap_polygons: total number of landuse polygons that intersect the hex cell (before filtering)
- n_slivers_filtered: number of those that are < 100 m² (the slivers filtered out)

Let me plan the code.

First, I need to project both datasets to Nigeria's national grid (EPSG:26391 - Minna / Nigeria West Belt) for proper area calculations.

Then:
1. Filter landuse polygons to area >= 100 m² (in projected CRS)
2. Spatial join: for each hex cell, find overlapping landuse polygons (both before and after filtering)
3. For the filtered set, compute area-weighted mean density
4. For each hex cell, count total overlapping polygons and slivers filtered
5. Rank and take top 10%

Let me write a comprehensive Python script.

0:29

Bash cd /work && python3 -c "

cd /work && python3 -c "
import geopandas as gpd
# Let's verify Nigeria's national grid EPSG code
landuse = gpd.read_file('lagos_landuse.geojson')
# Try EPSG:26391 (Minna / Nigeria West Belt) - common for Lagos
try:
    test = landuse.to_crs('EPSG:26391')
    print('EPSG:26391 works, bounds:', test.total_bounds)
except Exception as e:
    print('EPSG:26391 failed:', e)
    # Try 26392 (East Belt)
    try:
        test = landuse.to_crs('EPSG:26392')
        print('EPSG:26392 works')
    except:
        print('Both failed')
"

EPSG:26391 works, bounds: [ 92614.05688451 265487.763265   139052.06293866 298364.62844424]

0:51

Now I have a good understanding of the data. Let me plan the solution:

1. Load landuse and hex grid
2. Filter out landuse polygons with area < 100 m² (in the appropriate projected CRS - EPSG:26391, Nigeria West Belt, which is the national grid for Lagos)
3. For each hex cell, compute the area-weighted mean population density from overlapping landuse polygons
4. Compute rank (1 = highest density)
5. Compute n_overlap_polygons (number of landuse polygons overlapping the hex cell)
6. Compute n_slivers_filtered (number of landuse polygons that overlapped but were filtered out due to area < 100 m²)
7. Select top 10% of hex cells by area-weighted density
8. Save both outputs

Let me think about the CRS carefully. "Nigeria's national grid" - the common one is EPSG:26391 (Minna / Nigeria West Belt) for the western zone which covers Lagos. Actually, there's also EPSG:26331 (Minna / Nigeria West Belt) but 26391 is the projected CRS. Let me proceed.

Let me write the full Python script.

0:51

Write /work/solve.py

content · 127 lines · py

import geopandas as gpd
import pandas as pd
import numpy as np

# Nigeria national grid for Lagos area = EPSG:26391 (Minna / Nigeria West Belt)
TARGET_CRS = 'EPSG:26391'

# Load data
landuse = gpd.read_file('/work/lagos_landuse.geojson')
hex_grid = gpd.read_file('/work/lagos_hex_grid.geojson')

# Reproject to projected CRS for area calculation
landuse_proj = landuse.to_crs(TARGET_CRS)
hex_proj = hex_grid.to_crs(TARGET_CRS)

# Compute polygon areas
landuse_proj['area_m2'] = landuse_proj.geometry.area

# Identify slivers (< 100 m²) and clean polygons (≥ 100 m²)
n_slivers_total = (landuse_proj['area_m2'] < 100).sum()
print(f"Total landuse polygons: {len(landuse_proj)}")
print(f"Slivers (< 100 m²): {n_slivers_total}")
print(f"Clean polygons: {len(landuse_proj) - n_slivers_total}")

clean_landuse = landuse_proj[landuse_proj['area_m2'] >= 100].copy()
clean_landuse = clean_landuse.reset_index(drop=True)

# Prepare hex grid with hex_id as index for later
hex_proj = hex_proj.reset_index(drop=True)

# Spatial join: for each hex cell, find all overlapping clean landuse polygons
# Use sjoin with predicate='intersects'
join = gpd.sjoin(hex_proj[['hex_id', 'geometry']], clean_landuse[['pop_density', 'area_m2', 'geometry']],
                 how='left', predicate='intersects')

# For each hex cell, compute area-weighted mean pop_density
# Area of intersection (overlap area) matters; but the instruction says
# "area-weighted mean population density across overlapping land-use polygons"
# This typically means: sum(pop_density_i * overlap_area_i) / sum(overlap_area_i)
# But since we are using population density (per-unit-area), if we use the overlap
# area as the weight, it gives the correct spatially weighted density.
# Let me compute intersection area for each overlapping pair.

# First, make sure we have a unique identifier for each landuse polygon
clean_landuse['lu_idx'] = clean_landuse.index

# Re-do sjoin with the index
join = gpd.sjoin(hex_proj[['hex_id', 'geometry']].copy(),
                 clean_landuse[['pop_density', 'lu_idx', 'geometry']].copy(),
                 how='inner', predicate='intersects')

# Compute intersection area for each join pair
# Create a GeoSeries of intersection geometries
# We need the hex geometry and landuse geometry from the join
hex_geom_map = hex_proj.set_index('hex_id')['geometry']
lu_geom_map = clean_landuse.set_index('lu_idx')['geometry']

def intersection_area(row):
    hex_geom = hex_geom_map.loc[row['hex_id']]
    lu_geom = lu_geom_map.loc[row['lu_idx']]
    if hex_geom.intersects(lu_geom):
        return hex_geom.intersection(lu_geom).area
    return 0.0

join['intersection_area'] = join.apply(intersection_area, axis=1)

# Compute area-weighted density per hex cell
# weighted_sum = sum(pop_density * intersection_area) for each hex
# total_overlap_area = sum(intersection_area) for each hex
# area_weighted_density = weighted_sum / total_overlap_area

grouped = join.groupby('hex_id').agg(
    weighted_sum=('pop_density', lambda x: (x * join.loc[x.index, 'intersection_area']).sum()),
    total_overlap_area=('intersection_area', 'sum'),
    n_overlap_polygons=('lu_idx', 'count')
).reset_index()

grouped['area_weighted_density'] = grouped['weighted_sum'] / grouped['total_overlap_area']

# Now count how many slivers were filtered out per hex
# For that, we need the original landuse polygons (including slivers) and count overlap per hex
sliver_landuse = landuse_proj[landuse_proj['area_m2'] < 100].copy()
if len(sliver_landuse) > 0:
    sliver_join = gpd.sjoin(hex_proj[['hex_id', 'geometry']].copy(),
                            sliver_landuse[['geometry']].copy(),
                            how='inner', predicate='intersects')
    sliver_count = sliver_join.groupby('hex_id').size().reset_index(name='n_slivers_filtered')
else:
    sliver_count = pd.DataFrame({'hex_id': hex_proj['hex_id'], 'n_slivers_filtered': 0})

# Merge sliver count into results
result = grouped.merge(sliver_count, on='hex_id', how='left')
result['n_slivers_filtered'] = result['n_slivers_filtered'].fillna(0).astype(int)

# Sort by area_weighted_density descending, assign rank
result = result.sort_values('area_weighted_density', ascending=False).reset_index(drop=True)
result['rank'] = result.index + 1

# Select top 10 %
n_top = max(1, int(np.ceil(0.1 * len(hex_proj))))
top10 = result.head(n_top).copy()

print(f"\nTotal hex cells: {len(hex_proj)}")
print(f"Top 10%: {n_top}")
print(f"Top 10% rank range: {top10['rank'].min()} to {top10['rank'].max()}")

# Build the hotspot geometry GeoDataFrame
hotspot_gdf = hex_proj[hex_proj['hex_id'].isin(top10['hex_id'])].copy()
hotspot_gdf = hotspot_gdf.merge(top10[['hex_id', 'rank', 'area_weighted_density']], on='hex_id')
hotspot_gdf = hotspot_gdf[['hex_id', 'rank', 'area_weighted_density', 'geometry']]
hotspot_gdf = hotspot_gdf.sort_values('rank').reset_index(drop=True)

# Build the ranking Parquet DataFrame
ranking_df = top10[['hex_id', 'rank', 'area_weighted_density', 'n_overlap_polygons', 'n_slivers_filtered']].copy()
ranking_df = ranking_df.sort_values('rank').reset_index(drop=True)

# Verify both have same hex_id set
assert set(hotspot_gdf['hex_id']) == set(ranking_df['hex_id']), "hex_id mismatch!"
print(f"\nOutput hex IDs: {len(hotspot_gdf)}")

# Save outputs
hotspot_gdf.to_parquet('/work/hotspots.geoparquet')
ranking_df.to_parquet('/work/hotspot_ranking.parquet')

print("\nDone! Files saved:")
print("  /work/hotspots.geoparquet")
print("  /work/hotspot_ranking.parquet")

Wrote /work/solve.py (5503 chars).

0:53

Let me run it and check if there are any issues.

0:53

Bash cd /work && python solve.py

Total landuse polygons: 5542
Slivers (< 100 m²): 3022
Clean polygons: 2520

Total hex cells: 1782
Top 10%: 179
Top 10% rank range: 1 to 179

Output hex IDs: 179

Done! Files saved:
  /work/hotspots.geoparquet
  /work/hotspot_ranking.parquet

1:01

Let me verify the output files to make sure they're correct and have the right structure.

1:01

Bash cd /work && python3 -c "

cd /work && python3 -c "
import geopandas as gpd
import pandas as pd

# Check hotspots.geoparquet
hot = gpd.read_parquet('hotspots.geoparquet')
print('=== hotspots.geoparquet ===')
print('Columns:', hot.columns.tolist())
print('Shape:', hot.shape)
print('CRS:', hot.crs)
print(hot.dtypes)
print(hot.head(10))
print()

# Check hotspot_ranking.parquet
rank = pd.read_parquet('hotspot_ranking.parquet')
print('=== hotspot_ranking.parquet ===')
print('Columns:', rank.columns.tolist())
print('Shape:', rank.shape)
print(rank.dtypes)
print(rank.head(10))
print()

# Verify same hex_id sets
assert set(hot['hex_id']) == set(rank['hex_id']), 'Mismatch!'
print('hex_id sets match!')

# Verify ranks are unique and ascending order = descending density
print('Rank unique:', rank['rank'].is_unique)
print('Density sorted descending:', all(rank['area_weighted_density'].iloc[i] >= rank['area_weighted_density'].iloc[i+1] for i in range(len(rank)-1)))

# Check n_slivers_filtered and n_overlap_polygons
print('n_slivers_filtered range:', rank['n_slivers_filtered'].min(), '-', rank['n_slivers_filtered'].max())
print('n_overlap_polygons range:', rank['n_overlap_polygons'].min(), '-', rank['n_overlap_polygons'].max())

# Verify geometry type
print('Geometry types:', hot.geometry.type.value_counts())
"

=== hotspots.geoparquet ===
Columns: ['hex_id', 'rank', 'area_weighted_density', 'geometry']
Shape: (179, 4)
CRS: {"$schema": "https://proj.org/schemas/v0.7/projjson.schema.json", "type": "ProjectedCRS", "name": "Minna / Nigeria West Belt", "base_crs": {"name": "Minna", "datum": {"type": "GeodeticReferenceFrame", "name": "Minna", "ellipsoid": {"name": "Clarke 1880 (RGS)", "semi_major_axis": 6378249.145, "inverse_flattening": 293.465}}, "coordinate_system": {"subtype": "ellipsoidal", "axis": [{"name": "Geodetic latitude", "abbreviation": "Lat", "direction": "north", "unit": "degree"}, {"name": "Geodetic longitude", "abbreviation": "Lon", "direction": "east", "unit": "degree"}]}, "id": {"authority": "EPSG", "code": 4263}}, "conversion": {"name": "Nigeria West Belt", "method": {"name": "Transverse Mercator", "id": {"authority": "EPSG", "code": 9807}}, "parameters": [{"name": "Latitude of natural origin", "value": 4, "unit": "degree", "id": {"authority": "EPSG", "code": 8801}}, {"name": "Longitude of natural origin", "value": 4.5, "unit": "degree", "id": {"authority": "EPSG", "code": 8802}}, {"name": "Scale factor at natural origin", "value": 0.99975, "unit": "unity", "id": {"authority": "EPSG", "code": 8805}}, {"name": "False easting", "value": 230738.26, "unit": "metre", "id": {"authority": "EPSG", "code": 8806}}, {"name": "False northing", "value": 0, "unit": "metre", "id": {"authority": "EPSG", "code": 8807}}]}, "coordinate_system": {"subtype": "Cartesian", "axis": [{"name": "Easting", "abbreviation": "E", "direction": "east", "unit": "metre"}, {"name": "Northing", "abbreviation": "N", "direction": "north", "unit": "metre"}]}, "scope": "Engineering survey, topographic mapping.", "area": "Nigeria - onshore west of 6\u00b030'E, onshore and offshore shelf.", "bbox": {"south_latitude": 3.57, "west_longitude": 2.69, "north_latitude": 13.9, "east_longitude": 6.5}, "id": {"authority": "EPSG", "code": 26391}}
hex_id                        str
rank                        int64
area_weighted_density     float64
geometry                 geometry
dtype: object
     hex_id  ...                                           geometry
0  H024-005  ...  POLYGON ((113908.261 270453.435, 113620.974 27...
1  H023-005  ...  POLYGON ((113043.458 270956.126, 112756.171 27...
2  H026-003  ...  POLYGON ((115634.927 268447.788, 115347.638 26...
3  H028-005  ...  POLYGON ((117373.325 270443.251, 117086.042 27...
4  H024-006  ...  POLYGON ((113911.203 271453.72, 113623.918 271...
5  H022-004  ...  POLYGON ((112172.772 269458.236, 111885.481 26...
6  H020-004  ...  POLYGON ((110440.21 269463.321, 110152.916 269...
7  H021-003  ...  POLYGON ((111305.025 268960.63, 111017.732 269...
8  H023-006  ...  POLYGON ((113046.402 271956.414, 112759.117 27...
9  H022-005  ...  POLYGON ((112175.71 270458.528, 111888.421 270...

[10 rows x 4 columns]

=== hotspot_ranking.parquet ===
Columns: ['hex_id', 'rank', 'area_weighted_density', 'n_overlap_polygons', 'n_slivers_filtered']
Shape: (179, 5)
hex_id                       str
rank                       int64
area_weighted_density    float64
n_overlap_polygons         int64
n_slivers_filtered         int64
dtype: object
     hex_id  rank  ...  n_overlap_polygons  n_slivers_filtered
0  H024-005     1  ...                  97                   3
1  H023-005     2  ...                  54                   1
2  H026-003     3  ...                  49                   1
3  H028-005     4  ...                  45                   2
4  H024-006     5  ...                  42                   2
5  H022-004     6  ...                  44                   2
6  H020-004     7  ...                  40                   7
7  H021-003     8  ...                  35                   2
8  H023-006     9  ...                  35                   2
9  H022-005    10  ...                  33                   3

[10 rows x 5 columns]

hex_id sets match!
Rank unique: True
Density sorted descending: True
n_slivers_filtered range: 0 - 9
n_overlap_polygons range: 6 - 97
Geometry types: Polygon    179
Name: count, dtype: int64

1:06

Everything looks good. Let me verify the results once more:

1. Both files have the same set of 179 hex_id values (top 10% of 1782 = 179) ✓
2. Rank is unique, starting at 1, ascending = descending density ✓
3. hotspots.geoparquet has hex_id, rank, area_weighted_density, geometry in EPSG:26391 (Nigeria's national grid) ✓
4. hotspot_ranking.parquet has hex_id, rank, area_weighted_density, n_overlap_polygons (int), n_slivers_filtered (int) ✓
5. Slivers (< 100 m²) were excluded from the density calculation ✓
6. n_slivers_filtered counts the number of sliver polygons overlapping each hex ✓
7. Geometry type is Polygon ✓

All good!

1:06

spa-l2-lagos-hotspot-overlaps

Map (reference ← swipe → agent)

Gates

Subchecks

spa-l2-lagos-hotspot-overlaps Rerun

Map (reference ← swipe → agent)

Gates

Subchecks

spa-l2-lagos-hotspot-overlaps