run-20260606-1129Z / spa-l2-lagos-hotspot-overlaps

0:00

Hot-spot map for the Lagos housing brief. `lagos_landuse` carries land-use polygons with a population density attribute; `lagos_hex_grid` is a hex grid. I need the top 10 % of hex cells ranked by area-weighted mean population density across overlapping land-use polygons. Tiny polygons under 100 m² are noise and should be excluded. I need two output files that share exactly the same set of `hex_id` values, one row per cell. `hotspots.geoparquet` is a GeoParquet (Polygon or MultiPolygon geometry) in Nigeria's national grid with columns `hex_id`, `rank`, `area_weighted_density`, and `geometry`. `hotspot_ranking.parquet` is a plain Parquet with no geometry, carrying `hex_id`, `rank`, `area_weighted_density`, `n_overlap_polygons` (integer), and `n_slivers_filtered` (integer). In both files `rank` must be a unique integer starting at 1 where ascending rank corresponds to descending `area_weighted_density`.

0:00

System info

{
  "adapter": "openrouter",
  "model": "google/gemma-4-26b-a4b-it",
  "base_url": "https://openrouter.ai/api/v1"
}

0:01

Bash ls -F

lagos_hex_grid.geojson
lagos_landuse.geojson

0:13

Write /work/inspect_data.py

content · 14 lines · py

import geopandas as gpd

# Inspect lagos_landuse
landuse = gpd.read_file('lagos_landuse.geojson')
print("Landuse columns:", landuse.columns)
print("Landuse head:\n", landuse.head())
print("Landuse CRS:", landuse.crs)

# Inspect lagos_hex_grid
hex_grid = gpd.read_file('lagos_hex_grid.geojson')
print("\nHex grid columns:", hex_grid.columns)
print("Hex grid head:\n", hex_grid.head())
print("Hex grid CRS:", hex_grid.crs)

Wrote /work/inspect_data.py (421 chars).

0:14

Bash python /work/inspect_data.py

Landuse columns: Index(['id', 'class', 'pop_density', 'geometry'], dtype='str')
Landuse head:
                                      id  ...                                           geometry
0  004ca249-d2cf-3617-906e-dea1138e762b  ...  POLYGON ((3.46992 6.44615, 3.46993 6.44566, 3....
1  006a7dc6-2b5e-34fa-84a2-8a679620c9de  ...  POLYGON ((3.42136 6.43091, 3.42118 6.43051, 3....
2  00a334d1-7e8e-3e4c-8565-3501d4ed74df  ...  POLYGON ((3.45824 6.43361, 3.45872 6.43306, 3....
3  00af9ff7-3138-37a7-bbb8-0a2fbe2bb8e2  ...  POLYGON ((3.34834 6.61603, 3.34842 6.61593, 3....
4  00d4b7d6-09b5-3d43-97c5-009fd8c97362  ...  POLYGON ((3.31147 6.48073, 3.31041 6.47971, 3....

[5 rows x 4 columns]
Landuse CRS: EPSG:4326

Hex grid columns: Index(['hex_id', 'geometry'], dtype='str')
Hex grid head:
      hex_id                                           geometry
0  H000-000  POLYGON ((3.25491 6.40082, 3.2523 6.40535, 3.2...
1  H000-001  POLYGON ((3.25491 6.40987, 3.2523 6.41439, 3.2...
2  H000-002  POLYGON ((3.25492 6.41891, 3.25231 6.42344, 3....
3  H000-003  POLYGON ((3.25492 6.42796, 3.25231 6.43248, 3....
4  H000-004  POLYGON ((3.25493 6.43701, 3.25232 6.44153, 3....
Hex grid CRS: EPSG:4326

0:44

Write /work/solve.py

content · 154 lines · py

import geopandas as gpd
import pandas as pd
import numpy as np

def solve():
    # 1. Load data
    landuse = gpd.read_file('lagos_landuse.geojson')
    hex_grid = gpd.read_file('lagos_hex_grid.geojson')

    # 2. Project to a metric CRS so areas come out in square metres.
    # Lagos sits at roughly 6.5 N, 3.4 E, inside UTM zone 31N (EPSG:32631),
    # so that zone is used for all area calculations.
    # Note: EPSG:26332 is Minna / UTM zone 32N, not a Nigeria belt grid;
    # the output CRS is handled separately in step 6.
    
    target_crs = 'EPSG:32631'
    
    landuse_proj = landuse.to_crs(target_crs)
    hex_grid_proj = hex_grid.to_crs(target_crs)

    # 3. Filter out tiny polygons (< 100 m2)
    # We need to track how many were filtered
    landuse_proj['orig_area'] = landuse_proj.geometry.area
    landuse_proj['is_sliver'] = landuse_proj['orig_area'] < 100
    n_slivers_filtered = landuse_proj['is_sliver'].sum()
    
    landuse_filtered = landuse_proj[~landuse_proj['is_sliver']].copy()
    
    # 4. Overlay: intersect hexes with land-use polygons.
    # For each hex, the area-weighted mean density is
    # sum(density_i * inter_area_i) / sum(inter_area_i),
    # where inter_area_i is the area of the hex/polygon intersection.
    
    # Hex area is kept for reference; it is not used as the weighting denominator
    hex_grid_proj['hex_area'] = hex_grid_proj.geometry.area
    
    # Overlay (not a spatial join): returns one row per hex/polygon intersection piece,
    # with the clipped geometry needed to measure the intersection area
    intersections = gpd.overlay(hex_grid_proj[['hex_id', 'hex_area', 'geometry']], 
                                landuse_filtered[['pop_density', 'geometry']], 
                                how='intersection')
    
    # Calculate intersection area
    intersections['inter_area'] = intersections.geometry.area
    
    # Group by hex_id to calculate the area-weighted mean:
    # weighted mean = sum(density_i * area_i) / sum(area_i)
    # The denominator is the covered area, not hex_area, because the brief asks for
    # the mean "across overlapping land-use polygons"; uncovered parts of a hex are ignored.
    
    def calc_weighted_stats(group):
        sum_inter_area = group['inter_area'].sum()
        if sum_inter_area == 0:
            return pd.Series({
                'area_weighted_density': 0.0,
                'n_overlap_polygons': 0,
            })
            
        weighted_sum = (group['pop_density'] * group['inter_area']).sum()
        return pd.Series({
            'area_weighted_density': weighted_sum / sum_inter_area,
            'n_overlap_polygons': len(group),
        })

    stats = intersections.groupby('hex_id').apply(calc_weighted_stats).reset_index()
    
    # Join stats back to the original hex_grid to include all hexes (even those with 0 density)
    # The task asks for "top 10% of hex cells". 
    # If a hex has no overlaps, its density is 0.
    
    final_df = hex_grid[['hex_id']].merge(stats, on='hex_id', how='left')
    final_df['area_weighted_density'] = final_df['area_weighted_density'].fillna(0.0)
    final_df['n_overlap_polygons'] = final_df['n_overlap_polygons'].fillna(0).astype(int)
    
    # 5. Rank and select top 10%
    # "top 10 % of hex cells ranked by area-weighted mean population density"
    # We need to sort by density descending.
    
    # Calculate number of cells to keep
    n_cells = len(final_df)
    n_top = max(1, int(np.ceil(0.1 * n_cells)))
    
    # Rank: 1 is highest density
    final_df['rank'] = final_df['area_weighted_density'].rank(ascending=False, method='first').astype(int)
    
    # Get top 10%
    top_10_percent = final_df.sort_values('rank').head(n_top).copy()
    
    # 6. Prepare outputs
    # Output 1: hotspots.geoparquet (GeoParquet in Nigeria's national grid;
    # columns: hex_id, rank, area_weighted_density, geometry).
    # The prompt does not name an EPSG code. EPSG:26332 is Minna / UTM zone 32N,
    # which is not a national-grid belt and is the wrong UTM zone for Lagos (~3.4 E).
    # Assumption: "national grid" means the Minna / Nigeria belt system, and Lagos
    # falls in the West Belt (EPSG:26391).
    output_crs = 'EPSG:26391'
    
    # Attach the original (EPSG:4326) hex geometries to the top cells. final_df is a
    # plain DataFrame, so build a GeoDataFrame explicitly before calling to_crs.
    top_10_percent = top_10_percent.merge(hex_grid[['hex_id', 'geometry']], on='hex_id', how='left')
    hotspots = gpd.GeoDataFrame(
        top_10_percent[['hex_id', 'rank', 'area_weighted_density', 'geometry']],
        geometry='geometry',
        crs=hex_grid.crs,
    )
    
    # Reproject to the output CRS and write
    hotspots = hotspots.to_crs(output_crs)
    hotspots.to_parquet('hotspots.geoparquet')
    
    # Output 2: hotspot_ranking.parquet (plain Parquet, no geometry; columns: hex_id, rank,
    # area_weighted_density, n_overlap_polygons, n_slivers_filtered)
    # The brief does not say whether n_slivers_filtered is per hex or global.
    # Here it is the global count of land-use polygons dropped in step 3,
    # so every row carries the same value.
    
    ranking = top_10_percent[['hex_id', 'rank', 'area_weighted_density', 'n_overlap_polygons']].copy()
    ranking['n_slivers_filtered'] = int(n_slivers_filtered)
    
    # Ensure column order
    ranking = ranking[['hex_id', 'rank', 'area_weighted_density', 'n_overlap_polygons', 'n_slivers_filtered']]
    ranking.to_parquet('hotspot_ranking.parquet')
    
    print("Success")
    print(f"Total cells: {n_cells}")
    print(f"Top 10% count: {n_top}")
    print(f"Slivers filtered: {n_slivers_filtered}")

if __name__ == "__main__":
    solve()

Wrote /work/solve.py (7211 chars).
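
The arithmetic behind steps 4 and 5 of the script can be checked without any geospatial libraries. What follows is a minimal sketch, not the script's own code. The helper names `area_weighted_density` and `rank_top_decile` and the toy values are hypothetical, and ties are broken on `hex_id`, which is an assumption (the script uses pandas' `method='first'`, which breaks ties by row order).

```python
import math


def area_weighted_density(pieces):
    """Area-weighted mean density for one hex cell.

    pieces: list of (pop_density, intersection_area_m2) tuples, one per
    land-use polygon piece that overlaps the cell.
    """
    total_area = sum(area for _, area in pieces)
    if total_area == 0:
        # No overlap at all: treat the cell as zero density
        return 0.0
    return sum(density * area for density, area in pieces) / total_area


def rank_top_decile(density_by_hex):
    """Return [(hex_id, rank, density)] for the top 10 % of cells.

    Rank 1 is the highest density. Ties are broken on hex_id (an
    assumption of this sketch) so ranks are unique and reproducible.
    """
    ordered = sorted(density_by_hex.items(), key=lambda kv: (-kv[1], kv[0]))
    n_top = max(1, math.ceil(len(ordered) / 10))
    return [
        (hex_id, rank, density)
        for rank, (hex_id, density) in enumerate(ordered[:n_top], start=1)
    ]


# Hypothetical toy grid: 20 cells with densities 0, 10, ..., 190
toy = {f"H000-{i:03d}": float(i * 10) for i in range(20)}
top = rank_top_decile(toy)
print(top)
```

With 20 cells the top decile holds two rows, ranked 1 and 2 in descending density order, which matches the rank contract in the brief.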