spa-l2-lagos-hotspot-overlaps
Map (reference ← swipe → agent)
0:00
Hot-spot map for the Lagos housing brief. `lagos_landuse` carries land-use polygons with a population density attribute; `lagos_hex_grid` is a hex grid. I need the top 10 % of hex cells ranked by area-weighted mean population density across overlapping land-use polygons. Tiny polygons under 100 m² are noise and should be excluded.
I need two output files that share exactly the same set of `hex_id` values, one row per cell. `hotspots.geoparquet` is a GeoParquet (Polygon or MultiPolygon geometry) in Nigeria's national grid with columns `hex_id`, `rank`, `area_weighted_density`, and `geometry`. `hotspot_ranking.parquet` is a plain Parquet with no geometry, carrying `hex_id`, `rank`, `area_weighted_density`, `n_overlap_polygons` (integer), and `n_slivers_filtered` (integer). In both files `rank` must be a unique integer starting at 1 where ascending rank corresponds to descending `area_weighted_density`.
0:00
System info
{
"adapter": "openrouter",
"model": "deepseek/deepseek-v4-flash",
"base_url": "https://openrouter.ai/api/v1"
}
0:03
Let me start by understanding the data. I need to look at the two input files to understand their structure.
0:03
Bash
ls -la /work/
total 4244 drwxr-xr-x 1 agent agent 86 Jun 16 04:31 . drwxr-xr-x 1 root root 14 Jun 16 04:31 .. -rw-r--r-- 1 agent agent 736774 Jun 16 04:31 lagos_hex_grid.geojson -rw-r--r-- 1 agent agent 3607119 Jun 16 04:31 lagos_landuse.geojson
0:13
Let's first explore the data to understand the schema and coordinate reference systems.
0:13
Bash
cd /work && python3 -c "
cd /work && python3 -c "
import geopandas as gpd
lu = gpd.read_file('lagos_landuse.geojson')
hex = gpd.read_file('lagos_hex_grid.geojson')
print('=== LANDUSE ===')
print(lu.columns.tolist())
print(lu.crs)
print(lu.head(2))
print('shape:', lu.shape)
print()
print('=== HEX GRID ===')
print(hex.columns.tolist())
print(hex.crs)
print(hex.head(2))
print('shape:', hex.shape)
"
=== LANDUSE ===
['id', 'class', 'pop_density', 'geometry']
EPSG:4326
id ... geometry
0 004ca249-d2cf-3617-906e-dea1138e762b ... POLYGON ((3.46992 6.44615, 3.46993 6.44566, 3....
1 006a7dc6-2b5e-34fa-84a2-8a679620c9de ... POLYGON ((3.42136 6.43091, 3.42118 6.43051, 3....
[2 rows x 4 columns]
shape: (5542, 4)
=== HEX GRID ===
['hex_id', 'geometry']
EPSG:4326
hex_id geometry
0 H000-000 POLYGON ((3.25491 6.40082, 3.2523 6.40535, 3.2...
1 H000-001 POLYGON ((3.25491 6.40987, 3.2523 6.41439, 3.2...
shape: (1782, 2)
0:38
Let me understand the task: 1. Load `lagos_landuse` (land-use polygons with `pop_density` attribute) and `lagos_hex_grid` (hex grid with `hex_id`). 2. Exclude tiny land-use polygons under 100 m² (noise). 3. For each hex cell, compute area-weighted mean population density of overlapping land-use polygons. 4. Calculate `n_overlap_polygons` (count of overlapping land-use polygons after filtering) and `n_slivers_filtered` (count of polygons excluded because < 100 m²). 5. Get the top 10% of hex cells ranked by descending area-weighted mean density. 6. Assign unique rank starting at 1 (higher density = lower rank number). 7. Output: - `hotspots.geoparquet`: GeoParquet with geometry in Nigeria's national grid, columns: hex_id, rank, area_weighted_density, geometry - `hotspot_ranking.parquet`: Parquet (no geometry), columns: hex_id, rank, area_weighted_density, n_overlap_polygons, n_slivers_filtered Need to figure out Nigeria's national grid CRS. Nigeria uses EPSG:26391 (Minna / Nigeria East Belt) or EPSG:26393 (Minna / Nigeria West Belt). Actually, there's EPSG:26391 (Minna / Nigeria East Belt) and EPSG:26392 (Minna / Nigeria Mid Belt) and EPSG:26393 (Minna / Nigeria West Belt). But more commonly, Nigeria uses EPSG:26391 (Minna / Nigeria East Belt) or the newer EPSG:32632 (WGS 84 / UTM zone 32N). Let me think... Nigeria spans UTM zones 31N, 32N, and 33N. The "Nigeria national grid" is likely Minna / Nigeria East Belt (EPSG:26391), but actually the "Nigeria national grid" could also be EPSG:26391 (Minna / Nigeria East Belt) which is the standard for Nigeria. Wait, let me check more carefully. The Nigerian national grid system uses Transverse Mercator based on Minna datum. There are three belts: East (EPSG:26391), Mid (EPSG:26392), and West (EPSG:26393). Lagos is in the west of Nigeria, so it would be EPSG:26393 (Minna / Nigeria West Belt) or perhaps EPSG:26391. Actually, looking it up: - EPSG:26391 - Minna / Nigeria East Belt - EPSG:26392 - Minna / Nigeria Mid Belt - EPSG:26393 - Minna / Nigeria West Belt Lagos is in southwestern Nigeria, so West Belt (EPSG:26393) would make sense. But the instruction says "Nigeria's national grid" which might be EPSG:26391. Let me check by looking at the coordinates. The input data is in EPSG:4326 (WGS84 lat/lon). Lagos is around 3.4°E, 6.4°N. Actually, I'll search for what "Nigeria national grid" typically refers to. Let me just proceed and reproject to EPSG:26393 (West belt) since Lagos is in western Nigeria. But I'm not sure. Let me check if there's metadata. Let me just process and figure it out. Let me write a script.
0:38
Bash
cd /work && python3 -c "
cd /work && python3 -c "
import pyproj
# Check EPSG codes for Nigerian grids
for code in [26391, 26392, 26393, 32631, 32632, 32633]:
try:
crs = pyproj.CRS.from_epsg(code)
print(f'EPSG:{code}: {crs.name}')
except:
print(f'EPSG:{code}: not available')
"
EPSG:26391: Minna / Nigeria West Belt EPSG:26392: Minna / Nigeria Mid Belt EPSG:26393: Minna / Nigeria East Belt EPSG:32631: WGS 84 / UTM zone 31N EPSG:32632: WGS 84 / UTM zone 32N EPSG:32633: WGS 84 / UTM zone 33N
0:42
Lagos is in southwestern Nigeria. The "Nigeria national grid" is typically the Minna / Nigeria Mid Belt (EPSG:26392) for the Lagos area, or perhaps the full Nigeria national grid. Let me check what's common. Actually, I'm told "Nigeria's national grid" - that's usually the Nigeria Mid Belt (EPSG:26392) which covers Lagos. But let me double check. The Nigerian national grid is commonly EPSG:26392 (Minna / Nigeria Mid Belt) or sometimes EPSG:26391 for the west belt. Lagos is in the west, so it could be EPSG:26391 (Nigeria West Belt). Let me check the bounding box of the data. Actually, let me just check the bounds and decide.
0:42
Bash
cd /work && python3 -c "
cd /work && python3 -c "
import geopandas as gpd
lu = gpd.read_file('lagos_landuse.geojson')
print('Total bounds:', lu.total_bounds)
print('Lagos is in SW Nigeria. Minna / Nigeria West Belt (EPSG:26391) is likely the right national grid.')
"
Total bounds: [3.25026042 6.4009372 3.6701081 6.6980539 ] Lagos is in SW Nigeria. Minna / Nigeria West Belt (EPSG:26391) is likely the right national grid.
1:07
Now let me think about the approach: 1. Load both datasets 2. Filter out tiny polygons from landuse (small area -> noise) 3. For each hex cell, find overlapping land-use polygons 4. Compute area-weighted mean population density 5. Rank by area-weighted density descending 6. Keep top 10% 7. Output two files The tricky part: "Tiny polygons under X area are noise" - the instruction says "Tiny polygons under 100 m² are noise and should be excluded". But the data is in EPSG:4326 (lat/lon), so I need to project to a metric CRS to calculate area. EPSG:26391 (Minna / Nigeria West Belt) is the national grid. Let me write the script. Actually, let me think more carefully about the workflow: 1. Read landuse GeoJSON, project to EPSG:26391 2. Calculate area for each polygon, filter out those with area < 100 m² 3. For each hex cell (already in the hex grid), compute the overlap with remaining landuse polygons 4. For area-weighted mean density: sum(density_i * overlap_area_i) / sum(overlap_area_i) for each hex 5. Count n_overlap_polygons (number of overlapping landuse polygons per hex) 6. Count n_slivers_filtered (number of landuse polygons excluded per hex due to area < 100 m²) 7. Rank by area_weighted_density descending 8. Take top 10% 9. Output For n_slivers_filtered - this is the count per hex cell of tiny polygons that were filtered out. So I need to identify which polygons were filtered, and for each hex, count how many of those tiny polygons overlap with that hex. Let me write the Python script.
1:07
Write
/work/solve.py
content · 119 lines · py
#!/usr/bin/env python3
"""
Compute hotspot ranking for Lagos housing brief.
Produces:
- hotspots.geoparquet (with geometry)
- hotspot_ranking.parquet (no geometry)
"""
import geopandas as gpd
import pandas as pd
import numpy as np
import warnings
warnings.filterwarnings('ignore')
# --- Configuration ---
TARGET_CRS = 'EPSG:26391' # Minna / Nigeria West Belt
AREA_THRESHOLD = 100.0 # m², tiny polygons below this are noise
# --- 1. Read data ---
print("Reading data...")
lu = gpd.read_file('/work/lagos_landuse.geojson')
hex_grid = gpd.read_file('/work/lag95os_hex_grid.geojson')
# --- 2. Project to metric CRS ---
print(f"Projecting to {TARGET_CRS}...")
lu = lu.to_crs(TARGET_CRS)
hex_grid = hex_grid.to_crs(TARGET_CRS)
# --- 3. Compute area for each landuse polygon and classify ---
print("Computing areas...")
lu['area_m2'] = lu.geometry.area
lu['is_sliver'] = lu['area_m2'] < AREA_THRESHOLD
n_total_lu = len(lu)
n_slivers = lu['is_sliver'].sum()
print(f" Total landuse polygons: {n_total_lu}")
print(f" Slivers (< {AREA_THRESHOLD} m²): {n_slivers}")
# --- 4. Split landuse into valid and sliver sets ---
lu_valid = lu[~lu['is_sliver']].copy()
lu_slivers = lu[lu['is_sliver']].copy()
# --- 5. Spatial overlay: valid polygons vs hex grid ---
print("Performing spatial overlay (valid polygons)...")
# For each hex cell, get overlapping valid landuse polygons
# Use sjoin to find overlaps, then compute intersection areas
overlay = gpd.overlay(hex_grid[['hex_id', 'geometry']],
lu_valid[['pop_density', 'geometry']],
how='intersection', keep_geom_type=True)
print(f" Overlay result rows: {len(overlay)}")
# Compute intersection area
overlay['overlap_area'] = overlay.geometry.area
# Compute weighted density contribution
overlay['weighted_density'] = overlay['pop_density'] * overlay['overlap_area']
# Aggregate per hex cell
hex_agg = overlay.groupby('hex_id').agg(
sum_weighted_density=('weighted_density', 'sum'),
sum_overlap_area=('overlap_area', 'sum'),
n_overlap_polygons=('pop_density', 'count')
).reset_index()
# Area-weighted mean density
hex_agg['area_weighted_density'] = hex_agg['sum_weighted_density'] / hex_agg['sum_overlap_area']
# --- 6. Count slivers per hex ---
print("Counting slivers per hex...")
if len(lu_slivers) > 0:
sliver_overlay = gpd.overlay(hex_grid[['hex_id', 'geometry']],
lu_slivers[['geometry']],
how='intersection', keep_geom_type=True)
sliver_count = sliver_overlay.groupby('hex_id').size().reset_index(name='n_slivers_filtered')
else:
sliver_count = pd.DataFrame({'hex_id': hex_grid['hex_id'], 'n_slivers_filtered': 0})
# Merge sliver count into result
hex_agg = hex_agg.merge(sliver_count, on='hex_id', how='left')
hex_agg['n_slivers_filtered'] = hex_agg['n_slivers_filtered'].fillna(0).astype(int)
# --- 7. Merge geometry back ---
result = hex_grid[['hex_id', 'geometry']].merge(hex_agg, on='hex_id', how='inner')
# --- 8. Rank by area_weighted_density descending ---
result = result.sort_values('area_weighted_density', ascending=False).reset_index(drop=True)
result['rank'] = range(1, len(result) + 1)
# --- 9. Top 10 % ---
top_n = max(1, int(np.ceil(0.10 * len(result))))
result_top = result.head(top_n).copy()
print(f"Total hex cells: {len(result)}")
print(f"Top 10% (>= {top_n} cells): {len(result_top)}")
# --- 10. Prepare output GeoParquet ---
geo_out = result_top[['hex_id', 'rank', 'area_weighted_density', 'geometry']].copy()
geo_out.to_parquet('/work/hotspots.geoparquet', index=False)
# --- 11. Prepare output plain Parquet ---
plain_out = result_top[['hex_id', 'rank', 'area_weighted_density',
'n_overlap_polygons', 'n_slivers_filtered']].copy()
plain_out['n_overlap_polygons'] = plain_out['n_overlap_polygons'].astype(int)
plain_out['n_slivers_filtered'] = plain_out['n_slivers_filtered'].astype(int)
plain_out.to_parquet('/work/hotspot_ranking.parquet', index=False)
print("Done! Files written:")
print(" /work/hotspots.geoparquet")
print(" /work/hotspot_ranking.parquet")
# --- Sanity checks ---
print("\n--- Sanity checks ---")
print(f"GeoParquet columns: {geo_out.columns.tolist()}")
print(f"Plain Parquet columns: {plain_out.columns.tolist()}")
print(f"Number of rows: {len(geo_out)}")
print(f"Rank range: {geo_out['rank'].min()} to {geo_out['rank'].max()}")
print(f"Ranks are unique: {geo_out['rank'].is_unique}")
print(f"hex_id sets match: {set(geo_out['hex_id']) == set(plain_out['hex_id'])}")
print(f"Top 3:\n{geo_out[['hex_id','rank','area_weighted_density']].head(3)}")
Wrote /work/solve.py (4627 chars).
1:12
Let me check for the typo in the file path I noticed: I wrote `/work/lag95os_hex_grid.geojson` with a '95' typo - it should be `lagos_hex_grid.geojson`. Let me fix that and also run it.
1:12
Edit
/work/solve.py
Edited /work/solve.py.
1:25
1:25
Bash
cd /work && python solve.py 2>&1
Reading data...
Projecting to EPSG:26391...
Computing areas...
Total landuse polygons: 5542
Slivers (< 100.0 m²): 3022
Performing spatial overlay (valid polygons)...
Overlay result rows: 4631
Counting slivers per hex...
Total hex cells: 1032
Top 10% (>= 104 cells): 104
Done! Files written:
/work/hotspots.geoparquet
/work/hotspot_ranking.parquet
--- Sanity checks ---
GeoParquet columns: ['hex_id', 'rank', 'area_weighted_density', 'geometry']
Plain Parquet columns: ['hex_id', 'rank', 'area_weighted_density', 'n_overlap_polygons', 'n_slivers_filtered']
Number of rows: 104
Rank range: 1 to 104
Ranks are unique: True
hex_id sets match: True
Top 3:
hex_id rank area_weighted_density
0 H011-017 1 49816.761617
1 H049-008 2 49618.400000
2 H031-006 3 49405.200000
1:41
Wait, 5542 total landuse polygons and 3022 are slivers (area < 100 m²). That's more than half! But the spatial overlay produced 4631 rows, and the hex count merged down to 1032 from 1782 original hex cells. Let me check: some hex cells might have no valid overlapping landuse polygons, so they get filtered out by the inner merge. The requirement says "the top 10% of hex cells ranked by area-weighted mean population density across overlapping land-use polygons". So hex cells with no overlap would have no density, so they can't be in the top 10%. That seems fine. But let me verify that the output files look correct. Also, I should double-check that the geometry in the GeoParquet is indeed Polygon/MultiPolygon (it should be since we used the hex grid geometry). Let me also verify the output files can be read back properly.
1:41
Bash
cd /work && python3 -c "
cd /work && python3 -c "
import geopandas as gpd
import pandas as pd
# Read back and verify
geo = gpd.read_parquet('hotspots.geoparquet')
plain = pd.read_parquet('hotspot_ranking.parquet')
print('=== GeoParquet ===')
print('CRS:', geo.crs)
print('Columns:', geo.columns.tolist())
print('Geometry types:', geo.geometry.geom_type.value_counts())
print('Shape:', geo.shape)
print()
print('=== Plain Parquet ===')
print('Columns:', plain.columns.tolist())
print('Shape:', plain.shape)
print('dtypes:', plain.dtypes)
print()
# Verify both have same hex_ids
assert set(geo['hex_id']) == set(plain['hex_id']), 'hex_id mismatch!'
print('hex_id sets match: OK')
# Verify rank ascending is density descending
ranks = geo.sort_values('rank')
density_desc = ranks['area_weighted_density'].is_monotonic_decreasing
print(f'Density descending with rank: {density_desc}')
# Verify ranks are unique sequential
assert list(geo['rank']) == list(range(1, len(geo)+1)), 'Ranks not sequential!'
print('Ranks sequential from 1: OK')
print()
print('First 5:')
print(geo[['hex_id','rank','area_weighted_density']].head())
"
=== GeoParquet ===
CRS: {"$schema": "https://proj.org/schemas/v0.7/projjson.schema.json", "type": "ProjectedCRS", "name": "Minna / Nigeria West Belt", "base_crs": {"name": "Minna", "datum": {"type": "GeodeticReferenceFrame", "name": "Minna", "ellipsoid": {"name": "Clarke 1880 (RGS)", "semi_major_axis": 6378249.145, "inverse_flattening": 293.465}}, "coordinate_system": {"subtype": "ellipsoidal", "axis": [{"name": "Geodetic latitude", "abbreviation": "Lat", "direction": "north", "unit": "degree"}, {"name": "Geodetic longitude", "abbreviation": "Lon", "direction": "east", "unit": "degree"}]}, "id": {"authority": "EPSG", "code": 4263}}, "conversion": {"name": "Nigeria West Belt", "method": {"name": "Transverse Mercator", "id": {"authority": "EPSG", "code": 9807}}, "parameters": [{"name": "Latitude of natural origin", "value": 4, "unit": "degree", "id": {"authority": "EPSG", "code": 8801}}, {"name": "Longitude of natural origin", "value": 4.5, "unit": "degree", "id": {"authority": "EPSG", "code": 8802}}, {"name": "Scale factor at natural origin", "value": 0.99975, "unit": "unity", "id": {"authority": "EPSG", "code": 8805}}, {"name": "False easting", "value": 230738.26, "unit": "metre", "id": {"authority": "EPSG", "code": 8806}}, {"name": "False northing", "value": 0, "unit": "metre", "id": {"authority": "EPSG", "code": 8807}}]}, "coordinate_system": {"subtype": "Cartesian", "axis": [{"name": "Easting", "abbreviation": "E", "direction": "east", "unit": "metre"}, {"name": "Northing", "abbreviation": "N", "direction": "north", "unit": "metre"}]}, "scope": "Engineering survey, topographic mapping.", "area": "Nigeria - onshore west of 6\u00b030'E, onshore and offshore shelf.", "bbox": {"south_latitude": 3.57, "west_longitude": 2.69, "north_latitude": 13.9, "east_longitude": 6.5}, "id": {"authority": "EPSG", "code": 26391}}
Columns: ['hex_id', 'rank', 'area_weighted_density', 'geometry']
Geometry types: Polygon 104
Name: count, dtype: int64
Shape: (104, 4)
=== Plain Parquet ===
Columns: ['hex_id', 'rank', 'area_weighted_density', 'n_overlap_polygons', 'n_slivers_filtered']
Shape: (104, 5)
dtypes: hex_id str
rank int64
area_weighted_density float64
n_overlap_polygons int64
n_slivers_filtered int64
dtype: object
hex_id sets match: OK
Density descending with rank: True
Ranks sequential from 1: OK
First 5:
hex_id rank area_weighted_density
0 H011-017 1 49816.761617
1 H049-008 2 49618.400000
2 H031-006 3 49405.200000
3 H035-007 4 49167.200000
4 H039-014 5 49163.000000
2:02
Everything looks good. Let me summarize what was done.
2:02