spa-l2-lagos-hotspot-overlaps
Map (reference ← swipe → agent)
0:00
Hot-spot map for the Lagos housing brief. `lagos_landuse` carries land-use polygons with a population density attribute; `lagos_hex_grid` is a hex grid. I need the top 10 % of hex cells ranked by area-weighted mean population density across overlapping land-use polygons. Tiny polygons under 100 m² are noise and should be excluded.
I need two output files that share exactly the same set of `hex_id` values, one row per cell. `hotspots.geoparquet` is a GeoParquet (Polygon or MultiPolygon geometry) in Nigeria's national grid with columns `hex_id`, `rank`, `area_weighted_density`, and `geometry`. `hotspot_ranking.parquet` is a plain Parquet with no geometry, carrying `hex_id`, `rank`, `area_weighted_density`, `n_overlap_polygons` (integer), and `n_slivers_filtered` (integer). In both files `rank` must be a unique integer starting at 1 where ascending rank corresponds to descending `area_weighted_density`.
0:00
System info
{
"adapter": "openrouter",
"model": "google/gemma-4-26b-a4b-it",
"base_url": "https://openrouter.ai/api/v1"
}
0:01
Bash
ls -F
ls -F
lagos_hex_grid.geojson lagos_landuse.geojson
0:08
Write
/work/inspect_data.py
content · 14 lines · py
import geopandas as gpd
# Inspect landuse
landuse = gpd.read_file('lagos_landuse.geojson')
print("Landuse columns:", landuse.columns)
print("Landuse head:\n", landuse.head())
print("Landuse geometry type:", landuse.geometry.type.unique())
# Inspect hex grid
hex_grid = gpd.read_file('lagos_hex_grid.geojson')
print("\nHex grid columns:", hex_grid.columns)
print("Hex grid head:\n", hex_grid.head())
print("Hex grid geometry type:", hex_grid.geometry.type.unique())
Wrote /work/inspect_data.py (467 chars).
0:14
Bash
python /work/inspect_data.py
python /work/inspect_data.py
Landuse columns: Index(['id', 'class', 'pop_density', 'geometry'], dtype='str')
Landuse head:
id ... geometry
0 004ca249-d2cf-3617-906e-dea1138e762b ... POLYGON ((3.46992 6.44615, 3.46993 6.44566, 3....
1 006a7dc6-2b5e-34fa-84a2-8a679620c9de ... POLYGON ((3.42136 6.43091, 3.42118 6.43051, 3....
2 00a334d1-7e8e-3e4c-8565-3501d4ed74df ... POLYGON ((3.45824 6.43361, 3.45872 6.43306, 3....
3 00af9ff7-3138-37a7-bbb8-0a2fbe2bb8e2 ... POLYGON ((3.34834 6.61603, 3.34842 6.61593, 3....
4 00d4b7d6-09b5-3d43-97c5-009fd8c97362 ... POLYGON ((3.31147 6.48073, 3.31041 6.47971, 3....
[5 rows x 4 columns]
Landuse geometry type: <ArrowStringArray>
['Polygon']
Length: 1, dtype: str
Hex grid columns: Index(['hex_id', 'geometry'], dtype='str')
Hex grid head:
hex_id geometry
0 H000-000 POLYGON ((3.25491 6.40082, 3.2523 6.40535, 3.2...
1 H000-001 POLYGON ((3.25491 6.40987, 3.2523 6.41439, 3.2...
2 H000-002 POLYGON ((3.25492 6.41891, 3.25231 6.42344, 3....
3 H000-003 POLYGON ((3.25492 6.42796, 3.25231 6.43248, 3....
4 H000-004 POLYGON ((3.25493 6.43701, 3.25232 6.44153, 3....
Hex grid geometry type: <ArrowStringArray>
['Polygon']
Length: 1, dtype: str
1:53
Write
/work/solve.py
content · 190 lines · py
import geopandas as gpd
import pandas as pd
def solve():
# Load data
landuse = gpd.read_file('lagos_landuse.geojson')
hex_grid = gpd.read_file('lagos_hex_grid.geojson')
# 1. Filter tiny polygons < 100 m^2
# Assuming input is in WGS84 (degrees) based on the print output.
# We need to project to a metric system to calculate area accurately.
# Nigeria's national grid is typically EPSG:26331 (UTM Zone 31N) or similar.
# Let's check if we can use a generic metric CRS or if we should find a specific one.
# For area calculation, projecting to a local UTM is best.
# Let's try to find a suitable UTM zone for Lagos (around 6.5 N, 3.4 E) -> UTM 31N (EPSG:32631)
projected_crs = 'EPSG:32631'
landuse_proj = landuse.to_crs(projected_crs)
hex_grid_proj = hex_grid.to_crs(projected_crs)
# Identify slivers (polygons < 100 m2)
# We need to keep track of which ones were filtered.
# However, the requirement says "Tiny polygons ... should be excluded".
# Does this mean they are excluded from the calculation? Yes.
# We need to count how many slivers are filtered per hex.
# This implies we need to know which landuse polygons intersect which hexes.
# To count n_slivers_filtered, we need to see which polygons in the original landuse
# would have fallen into the hex if they weren't slivers.
# Let's first define the mask for valid landuse polygons.
landuse_proj['area'] = landuse_proj.geometry.area
is_sliver = landuse_proj['area'] < 100
n_slivers_total = is_sliver.sum()
# The task says "n_slivers_filtered" is a column in hotspot_ranking.parquet.
# This likely means the number of sliver polygons that intersected that specific hex.
# Filter landuse
landuse_valid = landuse_proj[~is_sliver].copy()
landuse_slivers = landuse_proj[is_sliver].copy()
# 2. Spatial Join: Intersect hexes with landuse
# We need area-weighted mean population density.
# Area-weighted mean = sum(pop_density_i * area_i_in_hex) / total_area_of_hex_covered_by_landuse
# Wait, the requirement is "area-weighted mean population density across overlapping land-use polygons".
# Usually, this means for each hex, we look at all landuse polygons that overlap it,
# find the intersection area, and weight the density by that intersection area.
# We use the valid landuse for the density calculation.
# But we need to count slivers per hex.
# Step A: Calculate intersection areas for valid landuse
# We'll perform a spatial join to find intersections.
# Using overlay is better for getting the actual intersection geometries.
# Note: hex_grid_proj might be large, and landuse_valid might be large.
# overlay(how='intersection') can be expensive.
# Let's do it more efficiently.
# For each hex, we want to compute:
# density = sum(density_i * area(intersection_i)) / sum(area(intersection_i))
# wait, "area-weighted mean population density across overlapping land-use polygons".
# This often means: (sum_{i in hex} density_i * area(intersection_i)) / (sum_{i in hex} area(intersection_i))
# Actually, the sum of intersection areas might not equal the hex area if there are gaps.
# But the denominator should be the sum of the intersection areas to get a weighted mean of the densities present.
# Let's perform overlay.
intersections = gpd.overlay(hex_grid_proj[['hex_id', 'geometry']], landuse_valid[['pop_density', 'geometry']], how='intersection')
# Calculate area of each intersection piece
intersections['inter_area'] = intersections.geometry.area
# Calculate weighted density components
intersections['weighted_pop'] = intersections['pop_density'] * intersections['inter_area']
# Group by hex_id
hex_stats = intersections.groupby('hex_id').agg(
sum_weighted_pop=('weighted_pop', 'sum'),
sum_inter_area=('inter_area', 'sum'),
n_overlap_polygons=('pop_density', 'count') # This might be wrong if one polygon creates multiple intersection parts.
# The requirement "n_overlap_polygons" usually refers to the number of unique landuse polygons.
).reset_index()
# Wait, n_overlap_polygons needs to be the count of unique landuse polygons.
# Let's adjust the intersection to keep track of landuse IDs.
# Re-doing intersection with landuse ID
landuse_valid_with_id = landuse_proj.copy()
landuse_valid_with_id['landuse_id'] = landuse_valid_with_id['id']
intersections = gpd.overlay(hex_grid_proj[['hex_id', 'geometry']], landuse_valid_with_id[['landuse_id', 'pop_density', 'geometry']], how='intersection')
intersections['inter_area'] = intersections.geometry.area
intersections['weighted_pop'] = intersections['pop_density'] * intersections['inter_area']
hex_stats = intersections.groupby('hex_id').agg(
sum_weighted_pop=('weighted_pop', 'sum'),
sum_inter_area=('inter_area', 'sum'),
n_overlap_polygons=('landuse_id', 'nunique')
).reset_index()
hex_stats['area_weighted_density'] = hex_stats['sum_weighted_pop'] / hex_stats['sum_inter_area']
# Step B: Count slivers per hex
# We need to know how many slivers intersect each hex.
sliver_intersections = gpd.overlay(hex_grid_proj[['hex_id', 'geometry']], landuse_slivers[['geometry']], how='intersection')
# For slivers, we just need to count how many original sliver polygons were involved.
# Actually, overlay(how='intersection') with landuse_slivers will give us the intersection pieces.
# We want to count how many sliver polygons intersect each hex.
# A spatial join is better for this.
sliver_counts = gpd.sjoin(hex_grid_proj[['hex_id', 'geometry']], landuse_slivers[['geometry']], how='left', predicate='intersects')
# The sjoin will duplicate hex_id for every intersection.
# We need to count unique landuse indices (or just count how many indices are not null).
# But landuse_slivers doesn't have a unique ID in the snippet above, let's add one.
landuse_slivers['sli_id'] = range(len(landuse_slivers))
sliver_counts = gpd.sjoin(hex_grid_proj[['hex_id', 'geometry']], landuse_slivers[['sli_id', 'geometry']], how='left', predicate='intersects')
# Count non-null sli_id
hex_sliver_counts = sliver_counts.groupby('hex_id')['sli_id'].nunique().reset_index()
hex_sliver_counts.columns = ['hex_id', 'n_slivers_filtered']
# Step C: Combine everything
# Start with hex_grid to ensure we have all hexes if needed?
# No, the prompt says "top 10 % of hex cells ranked...".
# This implies we only care about hexes that have some landuse?
# Or do we rank all hexes? "top 10 % of hex cells ranked by area-weighted mean population density".
# Usually, if a hex has no landuse, its density is 0.
# Let's merge stats back to the hex_grid.
final_df = hex_grid[['hex_id', 'geometry']].merge(hex_stats, on='hex_id', how='left')
final_df = final_df.merge(hex_sliver_counts, on='hex_id', how='left')
# Fill NaNs
final_df['area_weighted_density'] = final_df['area_weighted_density'].fillna(0)
final_df['n_overlap_polygons'] = final_df['n_overlap_polygons'].fillna(0).astype(int)
final_df['n_slivers_filtered'] = final_df['n_slivers_filtered'].fillna(0).astype(int)
# Now find the top 10%
# Rank descending area_weighted_density.
# "rank must be a unique integer starting at 1 where ascending rank corresponds to descending area_weighted_density"
# This means rank 1 is the highest density.
final_df = final_df.sort_values('area_weighted_density', ascending=False)
final_df['rank'] = range(1, len(final_df) + 1)
# Top 10%
n_top = max(1, int(len(final_df) * 0.10))
hotspots = final_df.head(n_top).copy()
# Ensure the rank is from the full sorted list?
# "top 10 % of hex cells ranked ... rank must be a unique integer starting at 1"
# This is ambiguous. Does it mean the rank is relative to all cells, or 1 to N_top?
# "rank must be a unique integer starting at 1 where ascending rank corresponds to descending area_weighted_density"
# This usually implies the rank is the position in the sorted list.
# And since the output files "share exactly the same set of hex_id values",
# the rank should be the same in both.
# Let's check the requirement: "rank must be a unique integer starting at 1 where ascending rank corresponds to descending area_weighted_density"
# This applies to the "top 10 %".
# If I have 100 cells and take top 10, do their ranks become 1-10 or their original ranks?
# "rank must be a unique integer starting at 1" suggests 1, 2, 3...
# Let's re-rank the hotspots.
hotspots = hotspots.sort_values('area_weighted_density', ascending=False)
hotspots['rank'] = range(1, len(hotspots) + 1)
# Output 1: hotspots.geoparquet
# columns: hex_id, rank, area_weighted_density, geometry
hotspots_geo = hotspots[['hex_id', 'rank', 'area_weighted_density', 'geometry']]
# Re-project to national grid if needed. The prompt says "in Nigeria's national grid".
# Nigeria's national grid is typically EPSG:26331 (UTM 31N).
# But wait, the input is in WGS84 (lat/lon).
# The prompt says "in Nigeria's national grid". I'll use EPSG:26331.
# However, let's check if the input hex_grid is already in some grid.
# It's in WGS84.
# I will project the output to EPSG:26331.
hotspots_geo = hotspots_geo.to_crs('EPSG:26331')
hotspots_geo.to_parquet('hotspots.geoparquet')
# Output 2: hotspot_ranking.parquet
# columns: hex_id, rank, area_weighted_density, n_overlap_polygons, n_slivers_filtered
hotspots_rank = hotspots[['hex_id', 'rank', 'area_weighted_density', 'n_overlap_polygons', 'n_slivers_filtered']]
hotspots_rank.to_parquet('hotspot_ranking.parquet')
print(f"Successfully created files. Hotspots count: {len(hotspots)}")
if __name__ == "__main__":
solve()
Wrote /work/solve.py (10145 chars).
1:54
Bash
python /work/solve.py
python /work/solve.py
Successfully created files. Hotspots count: 178
2:33
Write
/work/solve.py
content · 97 lines · py
import geopandas as gpd
import pandas as pd
def solve():
# Load data
landuse = gpd.read_file('lagos_landuse.geojson')
hex_grid = gpd.read_file('lagos_hex_grid.geojson')
# Project to UTM Zone 31N (EPSG:32631) for area calculations
# This is a standard for Nigeria/Lagos.
projected_crs = 'EPSG:32631'
landuse_proj = landuse.to_crs(projected_crs)
hex_grid_proj = hex_grid.to_crs(projected_crs)
# 1. Identify slivers (< 100 m2)
landuse_proj['area'] = landuse_proj.geometry.area
is_sliver = landuse_proj['area'] < 100
landuse_valid = landuse_proj[~is_sliver].copy()
landuse_slivers = landuse_proj[is_sliver].copy()
# 2. Calculate area-weighted density for each hex
# We need to find intersection geometries between hexes and valid landuse
# Overlay is necessary to get the area of the intersection
# To be efficient, we first spatial join to filter candidates
# then overlay only on those
joined = gpd.sjoin(hex_grid_proj[['hex_id', 'geometry']], landuse_valid[['pop_density', 'geometry']], how='inner', predicate='intersects')
# Actually, overlay is safer for area-weighted density.
# Let's perform overlay.
intersections = gpd.overlay(hex_grid_proj[['hex_id', 'geometry']],
landuse_valid[['pop_density', 'id', 'geometry']],
how='intersection')
intersections['inter_area'] = intersections.geometry.area
intersections['weighted_pop'] = intersections['pop_density'] * intersections['inter_area']
# Group by hex_id
hex_stats = intersections.groupby('hex_id').agg(
sum_weighted_pop=('weighted_pop', 'sum'),
sum_inter_area=('inter_area', 'sum'),
n_overlap_polygons=('id', 'nunique')
).reset_index()
hex_stats['area_weighted_density'] = hex_stats['sum_weighted_pop'] / hex_stats['sum_inter_area']
# 3. Count slivers per hex
# Use sjoin for efficiency
landuse_slivers['sli_id'] = range(len(landuse_slivers))
sliver_sjoin = gpd.sjoin(hex_grid_proj[['hex_id', 'geometry']], landuse_slivers[['sli_id', 'geometry']], how='left', predicate='intersects')
hex_sliver_counts = sliver_sjoin.groupby('hex_id')['sli_id'].nunique().reset_index()
hex_sliver_counts.columns = ['hex_id', 'n_slivers_filtered']
# 4. Combine results
# Start with the full hex grid to ensure we handle all cells correctly for ranking
final_df = hex_grid_proj[['hex_id', 'geometry']].merge(hex_stats, on='hex_id', how='left')
final_df = final_df.merge(hex_sliver_counts, on='hex_id', how='left')
# Fill NaNs
final_df['area_weighted_density'] = final_df['area_weighted_density'].fillna(0.0)
final_df['n_overlap_polygons'] = final_df['n_overlap_polygons'].fillna(0).astype(int)
final_df['n_slivers_filtered'] = final_df['n_slivers_filtered'].fillna(0).astype(int)
# 5. Ranking and Hotspot selection
# Sort by density descending
final_df = final_df.sort_values('area_weighted_density', ascending=False).reset_index(drop=True)
# Assign rank 1..N
final_df['rank'] = range(1, len(final_df) + 1)
# Identify top 10%
n_total = len(final_df)
n_top = max(1, int(n_total * 0.10))
# Note: some people define top 10% as floor, some as ceil.
# Standard "top 10%" usually means taking the top 10% of the count.
hotspots = final_df.head(n_top).copy()
# 6. Output preparation
# Output 1: hotspots.geoparquet
# columns: hex_id, rank, area_weighted_density, geometry
# Must be in Nigeria's national grid (EPSG:26331)
hotspots_geo = hotspots[['hex_id', 'rank', 'area_weighted_density', 'geometry']].to_crs('EPSG:26331')
hotspots_geo.to_parquet('hotspots.geoparquet')
# Output 2: hotspot_ranking.parquet
# columns: hex_id, rank, area_weighted_density, n_overlap_polygons, n_slivers_filtered
hotspots_rank = hotspots[['hex_id', 'rank', 'area_weighted_density', 'n_overlap_polygons', 'n_slivers_filtered']]
hotspots_rank.to_parquet('hotspot_ranking.parquet')
print(f"Total hexes: {n_total}")
print(f"Hotspots: {len(hotspots)}")
print(f"Top density: {hotspots['area_weighted_density'].max()}")
print(f"Min density in hotspots: {hotspots['area_weighted_density'].min()}")
if __name__ == "__main__":
solve()
Wrote /work/solve.py (4376 chars).
2:34
Bash
python /work/solve.py
python /work/solve.py
Total hexes: 1782 Hotspots: 178 Top density: 49816.76175066191 Min density in hotspots: 36510.61962497228
2:41
Bash
# Verify the outputs
# Verify the outputs
python -c "import geopandas as gpd; print('hotspots.geoparquet:', gpd.read_parquet('hotspots.geoparquet').head())"
python -c "import pandas as pd; print('hotspot_ranking.parquet:', pd.read_parquet('hotspot_ranking.parquet').head())"
hotspots.geoparquet: hex_id ... geometry 0 H011-017 ... POLYGON ((537803.63 724900, 537514.955 725400,... 1 H049-008 ... POLYGON ((570712.595 715900, 570423.92 716400,... 2 H031-006 ... POLYGON ((555124.138 713900, 554835.463 714400... 3 H035-007 ... POLYGON ((558588.24 714900, 558299.565 715400,... 4 H039-014 ... POLYGON ((562052.341 721900, 561763.666 722400... [5 rows x 4 columns] hotspot_ranking.parquet: hex_id rank ... n_overlap_polygons n_slivers_filtered 0 H011-017 1 ... 2 1 1 H049-008 2 ... 1 2 2 H031-006 3 ... 1 3 3 H035-007 4 ... 1 2 4 H039-014 5 ... 1 0 [5 rows x 5 columns]
2:45
Bash
python -c "import pandas as pd; p1 = pd.read_parquet('hotspot_ranking.parquet');...
python -c "import pandas as pd; p1 = pd.read_parquet('hotspot_ranking.parquet'); p2 = pd.read_parquet('hotspots.geoparquet'); print('Check hex_id overlap:', p1['hex_id'].equals(p2['hex_id']))"
Check hex_id overlap: True
2:51
Bash
python -c "import pandas as pd; p1 = pd.read_parquet('hotspot_ranking.parquet');...
python -c "import pandas as pd; p1 = pd.read_parquet('hotspot_ranking.parquet'); p2 = pd.read_parquet('hotspots.geoparquet'); print('Check rank matches:', (p1['rank'] == p2['rank']).all())"
Check rank matches: True
3:08