run-20260617-194747Z / spa-l2-lagos-hotspot-overlaps

0:00

Hot-spot map for the Lagos housing brief. `lagos_landuse` carries land-use polygons with a population density attribute; `lagos_hex_grid` is a hex grid. I need the top 10 % of hex cells ranked by area-weighted mean population density across overlapping land-use polygons. Tiny polygons under 100 m² are noise and should be excluded. I need two output files that share exactly the same set of `hex_id` values, one row per cell. `hotspots.geoparquet` is a GeoParquet (Polygon or MultiPolygon geometry) in Nigeria's national grid with columns `hex_id`, `rank`, `area_weighted_density`, and `geometry`. `hotspot_ranking.parquet` is a plain Parquet with no geometry, carrying `hex_id`, `rank`, `area_weighted_density`, `n_overlap_polygons` (integer), and `n_slivers_filtered` (integer). In both files `rank` must be a unique integer starting at 1 where ascending rank corresponds to descending `area_weighted_density`.

0:00

System info

{
  "adapter": "openrouter",
  "model": "google/gemma-4-26b-a4b-it",
  "base_url": "https://openrouter.ai/api/v1"
}

0:01

Bash ls -F

lagos_hex_grid.geojson
lagos_landuse.geojson

0:08

Write /work/inspect_data.py

content · 16 lines · py

import geopandas as gpd

# Check the CRS and columns of the input files
landuse = gpd.read_file('lagos_landuse.geojson')
hex_grid = gpd.read_file('lagos_hex_grid.geojson')

print("--- Landuse ---")
print(landuse.head())
print(landuse.crs)
print(landuse.columns)

print("\n--- Hex Grid ---")
print(hex_grid.head())
print(hex_grid.crs)
print(hex_grid.columns)

Wrote /work/inspect_data.py (358 chars).

0:09

Bash python /work/inspect_data.py

--- Landuse ---
                                     id  ...                                           geometry
0  004ca249-d2cf-3617-906e-dea1138e762b  ...  POLYGON ((3.46992 6.44615, 3.46993 6.44566, 3....
1  006a7dc6-2b5e-34fa-84a2-8a679620c9de  ...  POLYGON ((3.42136 6.43091, 3.42118 6.43051, 3....
2  00a334d1-7e8e-3e4c-8565-3501d4ed74df  ...  POLYGON ((3.45824 6.43361, 3.45872 6.43306, 3....
3  00af9ff7-3138-37a7-bbb8-0a2fbe2bb8e2  ...  POLYGON ((3.34834 6.61603, 3.34842 6.61593, 3....
4  00d4b7d6-09b5-3d43-97c5-009fd8c97362  ...  POLYGON ((3.31147 6.48073, 3.31041 6.47971, 3....

[5 rows x 4 columns]
EPSG:4326
Index(['id', 'class', 'pop_density', 'geometry'], dtype='str')

--- Hex Grid ---
     hex_id                                           geometry
0  H000-000  POLYGON ((3.25491 6.40082, 3.2523 6.40535, 3.2...
1  H000-001  POLYGON ((3.25491 6.40987, 3.2523 6.41439, 3.2...
2  H000-002  POLYGON ((3.25492 6.41891, 3.25231 6.42344, 3....
3  H000-003  POLYGON ((3.25492 6.42796, 3.25231 6.43248, 3....
4  H000-004  POLYGON ((3.25493 6.43701, 3.25232 6.44153, 3....
EPSG:4326
Index(['hex_id', 'geometry'], dtype='str')
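
Both layers report EPSG:4326, so their coordinates are in degrees and a threshold such as 100 m² cannot be compared against raw geometry areas. The sketch below is only a rough illustration of that scale gap, assuming a spherical Earth and a small rectangle near Lagos (about 6.45° N). The helper name `approx_cell_area_m2` and the sample side length are hypothetical and are not part of the run.

```python
import math

EARTH_RADIUS_M = 6_371_008.8  # mean Earth radius; spherical approximation (assumption)


def approx_cell_area_m2(lat_deg, dlon_deg, dlat_deg):
    """Approximate area in m^2 of a small lon/lat rectangle centred at lat_deg."""
    metres_per_deg_lat = math.pi * EARTH_RADIUS_M / 180.0
    # Longitude degrees shrink with the cosine of the latitude.
    metres_per_deg_lon = metres_per_deg_lat * math.cos(math.radians(lat_deg))
    return (dlon_deg * metres_per_deg_lon) * (dlat_deg * metres_per_deg_lat)


side_deg = 0.0001                      # hypothetical square, 0.0001 degrees on a side
area_deg2 = side_deg ** 2              # "area" in square degrees, as EPSG:4326 would give
area_m2 = approx_cell_area_m2(6.45, side_deg, side_deg)

print(f"square degrees: {area_deg2:.1e}")
print(f"approx. square metres: {area_m2:.1f}")
```

The same square is on the order of 1e-8 in square degrees but on the order of 100 m² on the ground, which is why any area test needs a projected, metric CRS first.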

0:24

Write /work/solve.py

content · 116 lines · py

import geopandas as gpd
import pandas as pd
import numpy as np

def solve():
    # 1. Load data
    landuse = gpd.read_file('lagos_landuse.geojson')
    hex_grid = gpd.read_file('lagos_hex_grid.geojson')

    # 2. CRS handling
    # We need to work in a metric CRS for area calculations.
    # Nigeria's national grid (UTM Zone 31N or similar) is often EPSG:26331 or EPSG:32631.
    # Let's use EPSG:32631 (WGS 84 / UTM zone 31N) as it's standard for Lagos.
    target_crs = 'EPSG:32631'
    
    landuse_proj = landuse.to_crs(target_crs)
    hex_grid_proj = hex_grid.to_crs(target_crs)

    # 3. Filter slivers (< 100 m2)
    # We must count how many were filtered per hex later.
    # It's easier to calculate landuse area first.
    landuse_proj['landuse_area'] = landuse_proj.geometry.area
    
    # Pre-calculate sliver mask
    is_sliver = landuse_proj['landuse_area'] < 100
    n_slivers_total = is_sliver.sum()
    
    # Let's actually track slivers per hex. 
    # To do this properly, we need to know which hex a sliver belongs to.
    # But a sliver might overlap multiple hexes. 
    # The prompt says "n_slivers_filtered (integer)" in the ranking file.
    # Since it's "per cell", we should count how many slivers overlap each cell.
    
    landuse_clean = landuse_proj[~is_sliver].copy()

    # 4. Spatial Join
    # Join hexes with cleaned landuse
    # We need to calculate area-weighted mean: sum(density_i * area_intersect_i) / sum(area_intersect_i)
    # intersection = overlay(hex, landuse)
    
    # To count slivers per hex:
    # We can do an intersection between hex_grid and the slivers.
    slivers = landuse_proj[is_sliver].copy()
    hex_slivers = gpd.sjoin(slivers, hex_grid_proj, how='inner', predicate='intersects')
    sliver_counts = hex_slivers.groupby('hex_id').size().reset_index(name='n_slivers_filtered')

    # Now the main calculation:
    # Intersection of hex_grid and landuse_clean
    intersections = gpd.overlay(hex_grid_proj, landuse_clean, how='intersection')
    
    # Area of intersection in metric CRS
    intersections['inter_area'] = intersections.geometry.area
    
    # Weighted density: density * area
    intersections['weighted_pop'] = intersections['pop_density'] * intersections['inter_area']
    
    # Group by hex_id to get the weighted mean
    grouped = intersections.groupby('hex_id').agg(
        total_weighted_pop=('weighted_pop', 'sum'),
        total_inter_area=('inter_area', 'sum'),
        n_overlap_polygons=('id', 'count') # Using 'id' from landuse as proxy for unique polygons
    ).reset_index()
    
    # Area-weighted mean = sum(pop * area) / sum(area)
    # If a hex has no intersection area (though overlay handles this), avoid division by zero.
    grouped['area_weighted_density'] = grouped['total_weighted_pop'] / grouped['total_inter_area']

    # 5. Select Top 10%
    # The ranking is based on area_weighted_density.
    # "Top 10% of hex cells ranked by area-weighted mean population density"
    # Usually this means the 10% of cells with the highest density.
    
    num_to_keep = max(1, int(np.ceil(len(grouped) * 0.10)))
    
    # Sort by density descending
    grouped = grouped.sort_values('area_weighted_density', ascending=False)
    
    # We take the top 10%
    top_hexes = grouped.head(num_to_keep).copy()
    
    # 6. Ranking
    # "rank must be a unique integer starting at 1 where ascending rank corresponds to descending area_weighted_density"
    top_hexes['rank'] = range(1, len(top_hexes) + 1)
    
    # 7. Finalize DataFrames
    # Merge sliver counts back
    top_hexes = top_hexes.merge(sliver_counts, on='hex_id', how='left')
    top_hexes['n_slivers_filtered'] = top_hexes['n_slivers_filtered'].fillna(0).astype(int)
    
    # The files must share EXACTLY the same set of hex_id values.
    # The instructions say: "I need two output files that share exactly the same set of hex_id values, one row per cell."
    # This refers to the top 10% cells.
    
    # Prepare hotspot_ranking.parquet
    # columns: hex_id, rank, area_weighted_density, n_overlap_polygons, n_slivers_filtered
    ranking_df = top_hexes[['hex_id', 'rank', 'area_weighted_density', 'n_overlap_polygons', 'n_slivers_filtered']]
    
    # Prepare hotspots.geoparquet
    # columns: hex_id, rank, area_weighted_density, geometry
    # Needs to be in Nigeria's national grid (EPSG:32631)
    # We need the geometry from the original hex_grid.
    
    hotspots_gdf = hex_grid_proj.merge(top_hexes[['hex_id', 'rank', 'area_weighted_density']], on='hex_id')
    hotspots_gdf = hotspots_gdf[['hex_id', 'rank', 'area_weighted_density', 'geometry']]
    
    # 8. Save files
    ranking_df.to_parquet('hotspot_ranking.parquet', index=False)
    # GeoDataFrame.to_parquet writes GeoParquet; GeoDataFrame has no to_geoparquet method
    hotspots_gdf.to_parquet('hotspots.geoparquet', index=False)

    print("Success.")
    print(f"Top 10% size: {len(top_hexes)}")
    print(f"Rankings: {ranking_df.head()}")

if __name__ == "__main__":
    solve()

Wrote /work/solve.py (4936 chars).
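
The aggregation and ranking steps in `solve.py` can be checked in isolation from the geometry. The following is a minimal sketch in plain Python, assuming the overlay has already been reduced to `(hex_id, pop_density, intersection_area_m2)` tuples. The function names `area_weighted_density` and `rank_top_fraction`, the sample values, and the tie-break on `hex_id` are all assumptions for illustration and are not taken from the run.

```python
import math
from collections import defaultdict


def area_weighted_density(pieces):
    """Return {hex_id: sum(density * area) / sum(area)} over intersection pieces.

    pieces: iterable of (hex_id, pop_density, intersection_area_m2).
    """
    weighted = defaultdict(float)
    area = defaultdict(float)
    for hex_id, density, piece_area in pieces:
        weighted[hex_id] += density * piece_area
        area[hex_id] += piece_area
    # Cells with zero total overlap area are skipped to avoid dividing by zero.
    return {h: weighted[h] / area[h] for h in area if area[h] > 0}


def rank_top_fraction(density_by_hex, fraction=0.10):
    """Keep the top fraction of cells and assign unique ranks starting at 1.

    Ties in density are broken by hex_id (an assumption) so the ranks
    are reproducible from run to run.
    """
    ordered = sorted(density_by_hex.items(), key=lambda kv: (-kv[1], kv[0]))
    keep = max(1, math.ceil(len(ordered) * fraction))
    return [(rank, h, d) for rank, (h, d) in enumerate(ordered[:keep], start=1)]


# Hypothetical intersection pieces: (hex_id, pop_density, intersection_area_m2)
pieces = [
    ("H000-000", 100.0, 300.0),
    ("H000-000", 400.0, 100.0),
    ("H000-001", 250.0, 50.0),
    ("H000-002", 175.0, 200.0),
]

densities = area_weighted_density(pieces)
top = rank_top_fraction(densities, fraction=0.5)  # 0.5 only to keep the toy data readable
for rank, hex_id, density in top:
    print(rank, hex_id, density)
```

In this toy data `H000-000` and `H000-002` have the same weighted density, so the explicit secondary sort key decides which one gets the lower rank. Sorting on density alone would leave that order up to the sort implementation.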