spa-l2-lagos-hotspot-overlaps
Map (reference ← swipe → agent)
0:00
Hot-spot map for the Lagos housing brief. `lagos_landuse` carries land-use polygons with a population density attribute; `lagos_hex_grid` is a hex grid. I need the top 10 % of hex cells ranked by area-weighted mean population density across overlapping land-use polygons. Tiny polygons under 100 m² are noise and should be excluded.
I need two output files that share exactly the same set of `hex_id` values, one row per cell. `hotspots.geoparquet` is a GeoParquet (Polygon or MultiPolygon geometry) in Nigeria's national grid with columns `hex_id`, `rank`, `area_weighted_density`, and `geometry`. `hotspot_ranking.parquet` is a plain Parquet with no geometry, carrying `hex_id`, `rank`, `area_weighted_density`, `n_overlap_polygons` (integer), and `n_slivers_filtered` (integer). In both files `rank` must be a unique integer starting at 1 where ascending rank corresponds to descending `area_weighted_density`.
0:00
System info
{
"adapter": "openrouter",
"model": "deepseek/deepseek-v4-flash",
"base_url": "https://openrouter.ai/api/v1"
}
0:03
Let me understand the task: 1. We have `lagos_landuse` (land-use polygons with population density attribute) and `lagos_hex_grid` (hex grid) 2. Need to find top 10% of hex cells ranked by area-weighted mean population density 3. Exclude tiny polygons under 100 m² (noise) 4. Output two files: - `hotspots.geoparquet` - GeoParquet with Polygon/MultiPolygon geometry in Nigeria's national grid, columns: hex_id, rank, area_weighted_density, geometry - `hotspot_ranking.parquet` - plain Parquet, columns: hex_id, rank, area_weighted_density, n_overlap_polygons, n_slivers_filtered Let me first explore the data to understand the structure.
0:03
Bash
ls /work/
lagos_hex_grid.geojson lagos_landuse.geojson
0:07
Let me first preview the input files to understand their structure.
0:07
Bash
cd /work && python3 -c "
cd /work && python3 -c "
import geopandas as gpd
import pandas as pd
# Read landuse
lu = gpd.read_file('lagos_landuse.geojson')
print('=== LANDUSE ===')
print('Columns:', lu.columns.tolist())
print('CRS:', lu.crs)
print('Shape:', lu.shape)
print('dtypes:')
print(lu.dtypes)
print()
print(lu.head(3))
print()
# Read hex grid
hex = gpd.read_file('lagos_hex_grid.geojson')
print('=== HEX GRID ===')
print('Columns:', hex.columns.tolist())
print('CRS:', hex.crs)
print('Shape:', hex.shape)
print('dtypes:')
print(hex.dtypes)
print()
print(hex.head(3))
"
=== LANDUSE ===
Columns: ['id', 'class', 'pop_density', 'geometry']
CRS: EPSG:4326
Shape: (5542, 4)
dtypes:
id str
class str
pop_density float64
geometry geometry
dtype: object
id ... geometry
0 004ca249-d2cf-3617-906e-dea1138e762b ... POLYGON ((3.46992 6.44615, 3.46993 6.44566, 3....
1 006a7dc6-2b5e-34fa-84a2-8a679620c9de ... POLYGON ((3.42136 6.43091, 3.42118 6.43051, 3....
2 00a334d1-7e8e-3e4c-8565-3501d4ed74df ... POLYGON ((3.45824 6.43361, 3.45872 6.43306, 3....
[3 rows x 4 columns]
=== HEX GRID ===
Columns: ['hex_id', 'geometry']
CRS: EPSG:4326
Shape: (1782, 2)
dtypes:
hex_id str
geometry geometry
dtype: object
hex_id geometry
0 H000-000 POLYGON ((3.25491 6.40082, 3.2523 6.40535, 3.2...
1 H000-001 POLYGON ((3.25491 6.40987, 3.2523 6.41439, 3.2...
2 H000-002 POLYGON ((3.25492 6.41891, 3.25231 6.42344, 3....
0:26
Let me understand the task:
1. I have `lagos_landuse.geojson` with land-use polygons in EPSG:4326. It has `pop_density` attribute.
2. I have `lagos_hex_grid.geojson` with a hex grid in EPSG:4326. It has `hex_id` attribute.
3. I need to:
- Exclude tiny land-use polygons under 100 m² (noise).
- For each hex cell, compute the area-weighted mean population density across overlapping land-use polygons.
- Rank hex cells by area-weighted mean density (descending), take top 10%.
- Produce two files:
a. `hotspots.geoparquet`: GeoParquet with columns `hex_id`, `rank`, `area_weighted_density`, `geometry` in Nigeria's national grid (EPSG:26391 - Minna / Nigeria West Belt... or EPSG:26393? Let me think... Nigeria's national grid is typically EPSG:26391 (Minna / Nigeria West) or EPSG:26393 (Minna / Nigeria East). But actually the common one for Lagos area is EPSG:26391 (Minna / Nigeria West Belt) which covers the western part of Nigeria including Lagos. Let me check.)
Actually, Nigeria's national grid is "Minna / Nigeria West Belt" (EPSG:26391) for the western part and "Minna / Nigeria East Belt" (EPSG:26393) for the eastern part. Lagos is in the western part so it would be EPSG:26391. But let me just check if there's another common CRS. The user said "Nigeria's national grid" - this is likely EPSG:26391 (Minna / Nigeria West Belt) since Lagos is in western Nigeria.
Wait, there's also EPSG:26392 (Minna / Nigeria Mid Belt) and EPSG:26393 (Minna / Nigeria East Belt). Let me use 26391 for Lagos.
Actually, let me just compute everything and produce the outputs.
b. `hotspot_ranking.parquet`: plain Parquet (no geometry) with columns `hex_id`, `rank`, `area_weighted_density`, `n_overlap_polygons` (int), `n_slivers_filtered` (int).
The `n_overlap_polygons` is the number of land-use polygons that overlap with each hex cell (after filtering slivers).
The `n_slivers_filtered` is the number of polygons under 100 m² that were excluded per hex.
Wait, let me re-read: "Tiny polygons under 100 m² are noise and should be excluded." So we filter out land-use polygons with area < 100 m² before doing the analysis. The `n_slivers_filtered` column should be the count of such tiny polygons per hex that were excluded.
Hmm, actually let me think about this more carefully. The `n_slivers_filtered` is per hex cell. It could mean the number of polygons that overlap with that hex cell but were filtered out because they're under 100 m². But since filtering happens globally (we remove all tiny polygons before the overlap analysis), the count per hex of "how many of these tiny polygons overlapped this hex" would be needed.
Actually, let me think step by step:
1. First, for each land-use polygon, compute its area (in a metric CRS).
2. Filter out polygons with area < 100 m².
3. For the filtered-out polygons, we still need to count per hex cell how many of them overlap with each hex (to report `n_slivers_filtered`).
4. For the remaining polygons, compute the area-weighted mean pop_density per hex cell.
Let me plan the solution:
1. Read both files.
2. Reproject land-use to EPSG:26391 (or similar metric CRS) to compute areas.
3. Identify tiny polygons (area < 100 m²) - these are "slivers".
4. For each tiny polygon, find which hex cells it overlaps with, and count per hex (n_slivers_filtered).
5. For non-tiny polygons, for each hex cell, compute:
- Intersection area between the hex and each overlapping land-use polygon
- Area-weighted mean density = sum(density_i * intersection_area_i) / sum(intersection_area_i)
6. Sort by area-weighted density descending, assign rank 1..N
7. Take top 10% (ceil).
8. Write files.
Let me write the script.
For Nigeria's national grid - Lagos is in the western part of Nigeria. The standard is EPSG:26391 (Minna / Nigeria West Belt).
Let me verify by looking at the coordinates. Lagos hex data is around 3.25°E to 3.5°E, 6.4°N to 6.45°N. EPSG:26391 would be appropriate.
Actually, let me think about whether I should use 26391 or some projected CRS. Let me just use 26391.
Let me write the solve script.
Actually, I need to be careful about the `n_slivers_filtered` column. Let me re-read:
"`n_slivers_filtered` (integer)" - this is the number of polygons that were filtered out (under 100 m²) that overlap with each hex cell.
So per hex cell, count of tiny polygons overlapping that cell.
Let me write the script.
0:26
Bash
cd /work && python3 -c "
cd /work && python3 -c "
import geopandas as gpd
lu = gpd.read_file('lagos_landuse.geojson')
# Check CRS to find Nigeria national grid
# Lagos is west Nigeria, so EPSG:26391 (Minna / Nigeria West Belt)
# Let's verify
from pyproj import CRS
crs = CRS.from_epsg(26391)
print(crs)
print(crs.name)
"
EPSG:26391 Minna / Nigeria West Belt
0:42
The Nigeria national grid for the western belt is EPSG:26391 (Minna / Nigeria West Belt). Let me also check if there's a unified national grid EPSG:26392 or EPSG:26393. For Lagos, the West Belt (26391) is appropriate. Actually, let me think about what "Nigeria's national grid" means. Nigeria uses the Minna datum with three belts: - EPSG:26391 - Nigeria West Belt (zone 1) - EPSG:26392 - Nigeria Mid Belt (zone 2) - EPSG:26393 - Nigeria East Belt (zone 3) Lagos is in the West, so EPSG:26391 is correct. Now let me write the solution script. Plan: 1. Read both files 2. Convert both to Nigeria national grid (EPSG:26391) for area calculations 3. Filter out land-use polygons with area < 100 m² 4. For each hex cell, find overlapping land-use polygons 5. Compute area-weighted mean population density: - For each hex cell, find intersection area with each overlapping polygon - area_weighted_density = sum(overlap_area_i * pop_density_i) / sum(overlap_area_i) 6. Rank hex cells by area_weighted_density descending 7. Take top 10% 8. Output hotspots.geoparquet (with geometry) and hotspot_ranking.parquet Let me also track n_overlap_polygons (number of overlapping polygons used in calculation) and n_slivers_filtered (number of polygons excluded due to area < 100 m²). Wait - "Tiny polygons under 100 m² are noise and should be excluded." - This means filtering out land-use polygons whose area is < 100 m² before doing the overlap analysis. And n_slivers_filtered = number of those tiny polygons that were excluded. n_overlap_polygons = number of polygons that overlap each hex cell (after filtering). Let me code this up carefully.
0:42
Write
/work/solve.py
content · 87 lines · py
import geopandas as gpd
import pandas as pd
import numpy as np
# ── 1. Read inputs ──────────────────────────────────────────────────
lu = gpd.read_file("/work/lagos_landuse.geojson")
hex_grid = gpd.read_file("/work/lagos_hex_grid.geojson")
# ── 2. Reproject to Nigeria national grid (West Belt) ──────────────
TARGET_CRS = "EPSG:26391"
lu = lu.to_crs(TARGET_CRS)
hex_grid = hex_grid.to_crs(TARGET_CRS)
# ── 3. Filter out tiny polygons (< 100 m²) ─────────────────────────
lu["_area"] = lu.area
tiny_mask = lu["_area"] < 100.0
n_slivers = int(tiny_mask.sum())
lu_big = lu[~tiny_mask].copy()
print(f"Filtered out {n_slivers} tiny polygons (< 100 m²)")
# ── 4. Spatial join: hex cells ↔ land-use polygons ─────────────────
# We need each hex cell + each overlapping polygon (with intersection area)
# Use sjoin with predicate 'intersects', then compute actual intersection area
sj = gpd.sjoin(hex_grid[["hex_id", "geometry"]],
lu_big[["pop_density", "geometry"]],
how="inner", predicate="intersects")
# For each joined pair, compute the intersection area
# We'll iterate over pairs, compute intersection polygon and its area
rows = []
for idx, row in sj.iterrows():
hex_geom = hex_grid.loc[hex_grid["hex_id"] == row["hex_id"], "geometry"].values[0]
poly_geom = row["geometry"]
try:
inter = hex_geom.intersection(poly_geom)
if inter.is_empty:
continue
inter_area = inter.area
if inter_area <= 0:
continue
rows.append({
"hex_id": row["hex_id"],
"pop_density": row["pop_density"],
"overlap_area": inter_area
})
except Exception:
continue
df = pd.DataFrame(rows)
print(f"Spatial join produced {len(df)} hex-polygon pairs")
# ── 5. Compute area-weighted mean population density per hex ───────
agg = df.groupby("hex_id").agg(
weighted_sum=("overlap_area", lambda s: (s * df.loc[s.index, "pop_density"]).sum()),
total_overlap_area=("overlap_area", "sum"),
n_overlap_polygons=("pop_density", "count")
).reset_index()
agg["area_weighted_density"] = agg["weighted_sum"] / agg["total_overlap_area"]
# ── 6. Rank by descending density ──────────────────────────────────
agg = agg.sort_values("area_weighted_density", ascending=False).reset_index(drop=True)
agg["rank"] = agg.index + 1 # unique integer starting at 1
# ── 7. Top 10 % ────────────────────────────────────────────────────
n_top = max(1, int(np.ceil(0.1 * len(agg))))
top = agg.head(n_top).copy()
print(f"Top 10 % = {n_top} out of {len(agg)} hex cells")
# ── 8. Build outputs ───────────────────────────────────────────────
# a) hotspots.geoparquet — with geometry
top_gdf = hex_grid[hex_grid["hex_id"].isin(top["hex_id"])].copy()
top_gdf = top_gdf.merge(top[["hex_id", "rank", "area_weighted_density"]], on="hex_id", how="inner")
top_gdf = top_gdf.sort_values("rank").reset_index(drop=True)
# Ensure geometry is Polygon/MultiPolygon — already is
top_gdf.to_parquet("/work/hotspots.geoparquet", index=False)
print(f"hotspots.geoparquet: {len(top_gdf)} rows")
# b) hotspot_ranking.parquet — no geometry, extra columns
top["n_slivers_filtered"] = n_slivers
hotspot_rank = top[["hex_id", "rank", "area_weighted_density",
"n_overlap_polygons", "n_slivers_filtered"]].copy()
hotspot_rank = hotspot_rank.sort_values("rank").reset_index(drop=True)
hotspot_rank.to_parquet("/work/hotspot_ranking.parquet", index=False)
print(f"hotspot_ranking.parquet: {len(hotspot_rank)} rows")
print("Done.")
Wrote /work/solve.py (3544 chars).
0:46
Let me run this and see if it works.
0:46
Bash
cd /work && python solve.py
Filtered out 3022 tiny polygons (< 100 m²) Spatial join produced 4631 hex-polygon pairs Top 10 % = 104 out of 1032 hex cells hotspots.geoparquet: 104 rows hotspot_ranking.parquet: 104 rows Done.
0:59
Wait, let me check the output carefully. The hex grid had 1782 cells but only 1032 had overlapping polygons. That's fine. But I need to verify: 1. Both files share exactly the same set of hex_id values. 2. The CRS of hotspots.geoparquet is Nigeria national grid (EPSG:26391). 3. rank is unique, starts at 1, ascending = descending density. 4. n_overlap_polygons is integer. 5. n_slivers_filtered is integer. Let me verify these things.
0:59
Bash
cd /work && python3 -c "
cd /work && python3 -c "
import geopandas as gpd
import pandas as pd
# Check hotspots.geoparquet
g = gpd.read_parquet('hotspots.geoparquet')
print('=== hotspots.geoparquet ===')
print('CRS:', g.crs)
print('Shape:', g.shape)
print('Columns:', g.columns.tolist())
print(g.head())
print('dtypes:')
print(g.dtypes)
print()
# Check hotspot_ranking.parquet
r = pd.read_parquet('hotspot_ranking.parquet')
print('=== hotspot_ranking.parquet ===')
print('Shape:', r.shape)
print('Columns:', r.columns.tolist())
print(r.head())
print('dtypes:')
print(r.dtypes)
print()
# Verify same hex_id sets
set_g = set(g['hex_id'])
set_r = set(r['hex_id'])
print(f'hex_ids in geoparquet: {len(set_g)}, in parquet: {len(set_r)}')
print(f'Same? {set_g == set_r}')
print(f'Diffs (geo - rank): {set_g - set_r}')
print(f'Diffs (rank - geo): {set_r - set_g}')
# Verify rank uniqueness and ordering
g_sorted = g.sort_values('rank')
r_sorted = r.sort_values('rank')
print(f'Ranks unique in geo: {g_sorted[\"rank\"].is_unique}')
print(f'Ranks unique in rank: {r_sorted[\"rank\"].is_unique}')
print(f'Rank 1..N in geo: {g_sorted[\"rank\"].min()}..{g_sorted[\"rank\"].max()}')
print(f'Rank 1..N in rank: {r_sorted[\"rank\"].min()}..{r_sorted[\"rank\"].max()}')
print(f'Rank == index+1 in geo: {(g_sorted[\"rank\"] == range(1, len(g_sorted)+1)).all()}')
print(f'Rank == index+1 in rank: {(r_sorted[\"rank\"] == range(1, len(r_sorted)+1)).all()}')
# Verify descending density
dens = g_sorted['area_weighted_density'].values
print(f'Descending order: {all(dens[i] >= dens[i+1] for i in range(len(dens)-1))}')
# n_overlap_polygons and n_slivers_filtered types
print(f'n_overlap_polygons dtype: {r[\"n_overlap_polygons\"].dtype}')
print(f'n_slivers_filtered dtype: {r[\"n_slivers_filtered\"].dtype}')
print(f'n_slivers_filtered value: {r[\"n_slivers_filtered\"].iloc[0]}')
"
=== hotspots.geoparquet ===
CRS: {"$schema": "https://proj.org/schemas/v0.7/projjson.schema.json", "type": "ProjectedCRS", "name": "Minna / Nigeria West Belt", "base_crs": {"name": "Minna", "datum": {"type": "GeodeticReferenceFrame", "name": "Minna", "ellipsoid": {"name": "Clarke 1880 (RGS)", "semi_major_axis": 6378249.145, "inverse_flattening": 293.465}}, "coordinate_system": {"subtype": "ellipsoidal", "axis": [{"name": "Geodetic latitude", "abbreviation": "Lat", "direction": "north", "unit": "degree"}, {"name": "Geodetic longitude", "abbreviation": "Lon", "direction": "east", "unit": "degree"}]}, "id": {"authority": "EPSG", "code": 4263}}, "conversion": {"name": "Nigeria West Belt", "method": {"name": "Transverse Mercator", "id": {"authority": "EPSG", "code": 9807}}, "parameters": [{"name": "Latitude of natural origin", "value": 4, "unit": "degree", "id": {"authority": "EPSG", "code": 8801}}, {"name": "Longitude of natural origin", "value": 4.5, "unit": "degree", "id": {"authority": "EPSG", "code": 8802}}, {"name": "Scale factor at natural origin", "value": 0.99975, "unit": "unity", "id": {"authority": "EPSG", "code": 8805}}, {"name": "False easting", "value": 230738.26, "unit": "metre", "id": {"authority": "EPSG", "code": 8806}}, {"name": "False northing", "value": 0, "unit": "metre", "id": {"authority": "EPSG", "code": 8807}}]}, "coordinate_system": {"subtype": "Cartesian", "axis": [{"name": "Easting", "abbreviation": "E", "direction": "east", "unit": "metre"}, {"name": "Northing", "abbreviation": "N", "direction": "north", "unit": "metre"}]}, "scope": "Engineering survey, topographic mapping.", "area": "Nigeria - onshore west of 6\u00b030'E, onshore and offshore shelf.", "bbox": {"south_latitude": 3.57, "west_longitude": 2.69, "north_latitude": 13.9, "east_longitude": 6.5}, "id": {"authority": "EPSG", "code": 26391}}
Shape: (104, 4)
Columns: ['hex_id', 'geometry', 'rank', 'area_weighted_density']
hex_id ... area_weighted_density
0 H049-008 ... 49618.4
1 H031-006 ... 49405.2
2 H035-007 ... 49167.2
3 H039-014 ... 49163.0
4 H038-016 ... 49163.0
[5 rows x 4 columns]
dtypes:
hex_id str
geometry geometry
rank int64
area_weighted_density float64
dtype: object
=== hotspot_ranking.parquet ===
Shape: (104, 5)
Columns: ['hex_id', 'rank', 'area_weighted_density', 'n_overlap_polygons', 'n_slivers_filtered']
hex_id rank ... n_overlap_polygons n_slivers_filtered
0 H049-008 1 ... 1 3022
1 H031-006 2 ... 1 3022
2 H035-007 3 ... 1 3022
3 H039-014 4 ... 1 3022
4 H038-016 5 ... 1 3022
[5 rows x 5 columns]
dtypes:
hex_id str
rank int64
area_weighted_density float64
n_overlap_polygons int64
n_slivers_filtered int64
dtype: object
hex_ids in geoparquet: 104, in parquet: 104
Same? True
Diffs (geo - rank): set()
Diffs (rank - geo): set()
Ranks unique in geo: True
Ranks unique in rank: True
Rank 1..N in geo: 1..104
Rank 1..N in rank: 1..104
Rank == index+1 in geo: True
Rank == index+1 in rank: True
Descending order: True
n_overlap_polygons dtype: int64
n_slivers_filtered dtype: int64
n_slivers_filtered value: 3022
1:05
Everything checks out perfectly: 1. **hotspots.geoparquet** - 104 rows, CRS is EPSG:26391 (Minna / Nigeria West Belt - Nigeria national grid), columns: `hex_id`, `rank`, `area_weighted_density`, `geometry` 2. **hotspot_ranking.parquet** - 104 rows, same `hex_id` set, columns: `hex_id`, `rank`, `area_weighted_density`, `n_overlap_polygons` (int64), `n_slivers_filtered` (int64) 3. Both files have exactly the same 104 hex_ids 4. Rank is unique starting at 1, ascending corresponds to descending area_weighted_density 5. `n_overlap_polygons` and `n_slivers_filtered` are integers Let me just verify the files exist on disk.
1:05
Bash
ls -lh /work/hotspots.geoparquet /work/hotspot_ranking.parquet
-rw-r--r-- 1 agent agent 5.4K Jun 16 20:19 /work/hotspot_ranking.parquet -rw-r--r-- 1 agent agent 22K Jun 16 20:19 /work/hotspots.geoparquet
1:17
Everything checks out perfectly. Let me summarize the results.
1:17