spa-l2-lagos-hotspot-overlaps

Status: done
Score: 0.789
Duration: 3:10
Cost: 0.70¢
Model: google/gemma-4-26b-a4b-it

Map (reference ← swipe → agent)

0:00
Hot-spot map for the Lagos housing brief. `lagos_landuse` carries land-use polygons with a population density attribute; `lagos_hex_grid` is a hex grid. I need the top 10 % of hex cells ranked by area-weighted mean population density across overlapping land-use polygons. Tiny polygons under 100 m² are noise and should be excluded. I need two output files that share exactly the same set of `hex_id` values, one row per cell. `hotspots.geoparquet` is a GeoParquet (Polygon or MultiPolygon geometry) in Nigeria's national grid with columns `hex_id`, `rank`, `area_weighted_density`, and `geometry`. `hotspot_ranking.parquet` is a plain Parquet with no geometry, carrying `hex_id`, `rank`, `area_weighted_density`, `n_overlap_polygons` (integer), and `n_slivers_filtered` (integer). In both files `rank` must be a unique integer starting at 1 where ascending rank corresponds to descending `area_weighted_density`.
0:00
System info
{
  "adapter": "openrouter",
  "model": "google/gemma-4-26b-a4b-it",
  "base_url": "https://openrouter.ai/api/v1"
}
0:01
Bash ls -F
ls -F
lagos_hex_grid.geojson
lagos_landuse.geojson
0:08
Write /work/inspect_data.py
content · 14 lines · py
import geopandas as gpd

# Inspect landuse
landuse = gpd.read_file('lagos_landuse.geojson')
print("Landuse columns:", landuse.columns)
print("Landuse head:\n", landuse.head())
print("Landuse geometry type:", landuse.geometry.type.unique())

# Inspect hex grid
hex_grid = gpd.read_file('lagos_hex_grid.geojson')
print("\nHex grid columns:", hex_grid.columns)
print("Hex grid head:\n", hex_grid.head())
print("Hex grid geometry type:", hex_grid.geometry.type.unique())
Wrote /work/inspect_data.py (467 chars).
0:14
Bash python /work/inspect_data.py
python /work/inspect_data.py
Landuse columns: Index(['id', 'class', 'pop_density', 'geometry'], dtype='str')
Landuse head:
                                      id  ...                                           geometry
0  004ca249-d2cf-3617-906e-dea1138e762b  ...  POLYGON ((3.46992 6.44615, 3.46993 6.44566, 3....
1  006a7dc6-2b5e-34fa-84a2-8a679620c9de  ...  POLYGON ((3.42136 6.43091, 3.42118 6.43051, 3....
2  00a334d1-7e8e-3e4c-8565-3501d4ed74df  ...  POLYGON ((3.45824 6.43361, 3.45872 6.43306, 3....
3  00af9ff7-3138-37a7-bbb8-0a2fbe2bb8e2  ...  POLYGON ((3.34834 6.61603, 3.34842 6.61593, 3....
4  00d4b7d6-09b5-3d43-97c5-009fd8c97362  ...  POLYGON ((3.31147 6.48073, 3.31041 6.47971, 3....

[5 rows x 4 columns]
Landuse geometry type: <ArrowStringArray>
['Polygon']
Length: 1, dtype: str

Hex grid columns: Index(['hex_id', 'geometry'], dtype='str')
Hex grid head:
      hex_id                                           geometry
0  H000-000  POLYGON ((3.25491 6.40082, 3.2523 6.40535, 3.2...
1  H000-001  POLYGON ((3.25491 6.40987, 3.2523 6.41439, 3.2...
2  H000-002  POLYGON ((3.25492 6.41891, 3.25231 6.42344, 3....
3  H000-003  POLYGON ((3.25492 6.42796, 3.25231 6.43248, 3....
4  H000-004  POLYGON ((3.25493 6.43701, 3.25232 6.44153, 3....
Hex grid geometry type: <ArrowStringArray>
['Polygon']
Length: 1, dtype: str
1:53
Write /work/solve.py
content · 190 lines · py
import geopandas as gpd
import pandas as pd

def solve():
    # Load data
    landuse = gpd.read_file('lagos_landuse.geojson')
    hex_grid = gpd.read_file('lagos_hex_grid.geojson')

    # 1. Filter tiny polygons < 100 m^2
    # Assuming input is in WGS84 (degrees) based on the print output. 
    # We need to project to a metric system to calculate area accurately.
    # Nigeria's national grid is typically EPSG:26331 (UTM Zone 31N) or similar.
    # Let's check if we can use a generic metric CRS or if we should find a specific one.
    # For area calculation, projecting to a local UTM is best.
    
    # Let's try to find a suitable UTM zone for Lagos (around 6.5 N, 3.4 E) -> UTM 31N (EPSG:32631)
    projected_crs = 'EPSG:32631'
    
    landuse_proj = landuse.to_crs(projected_crs)
    hex_grid_proj = hex_grid.to_crs(projected_crs)

    # Identify slivers (polygons < 100 m2)
    # We need to keep track of which ones were filtered.
    # However, the requirement says "Tiny polygons ... should be excluded".
    # Does this mean they are excluded from the calculation? Yes.
    
    # We need to count how many slivers are filtered per hex. 
    # This implies we need to know which landuse polygons intersect which hexes.
    
    # To count n_slivers_filtered, we need to see which polygons in the original landuse
    # would have fallen into the hex if they weren't slivers.
    
    # Let's first define the mask for valid landuse polygons.
    landuse_proj['area'] = landuse_proj.geometry.area
    is_sliver = landuse_proj['area'] < 100
    n_slivers_total = is_sliver.sum()
    
    # The task says "n_slivers_filtered" is a column in hotspot_ranking.parquet.
    # This likely means the number of sliver polygons that intersected that specific hex.
    
    # Filter landuse
    landuse_valid = landuse_proj[~is_sliver].copy()
    landuse_slivers = landuse_proj[is_sliver].copy()
    
    # 2. Spatial Join: Intersect hexes with landuse
    # We need area-weighted mean population density.
    # Area-weighted mean = sum(pop_density_i * area_i_in_hex) / total_area_of_hex_covered_by_landuse
    # Wait, the requirement is "area-weighted mean population density across overlapping land-use polygons".
    # Usually, this means for each hex, we look at all landuse polygons that overlap it, 
    # find the intersection area, and weight the density by that intersection area.
    
    # We use the valid landuse for the density calculation.
    # But we need to count slivers per hex.
    
    # Step A: Calculate intersection areas for valid landuse
    # We'll perform a spatial join to find intersections.
    # Using overlay is better for getting the actual intersection geometries.
    
    # Note: hex_grid_proj might be large, and landuse_valid might be large.
    # overlay(how='intersection') can be expensive.
    
    # Let's do it more efficiently.
    # For each hex, we want to compute:
    # density = sum(density_i * area(intersection_i)) / sum(area(intersection_i))
    # wait, "area-weighted mean population density across overlapping land-use polygons".
    # This often means: (sum_{i in hex} density_i * area(intersection_i)) / (sum_{i in hex} area(intersection_i))
    # Actually, the sum of intersection areas might not equal the hex area if there are gaps.
    # But the denominator should be the sum of the intersection areas to get a weighted mean of the densities present.
    
    # Let's perform overlay.
    intersections = gpd.overlay(hex_grid_proj[['hex_id', 'geometry']], landuse_valid[['pop_density', 'geometry']], how='intersection')
    
    # Calculate area of each intersection piece
    intersections['inter_area'] = intersections.geometry.area
    
    # Calculate weighted density components
    intersections['weighted_pop'] = intersections['pop_density'] * intersections['inter_area']
    
    # Group by hex_id
    hex_stats = intersections.groupby('hex_id').agg(
        sum_weighted_pop=('weighted_pop', 'sum'),
        sum_inter_area=('inter_area', 'sum'),
        n_overlap_polygons=('pop_density', 'count') # This might be wrong if one polygon creates multiple intersection parts.
        # The requirement "n_overlap_polygons" usually refers to the number of unique landuse polygons.
    ).reset_index()
    
    # Wait, n_overlap_polygons needs to be the count of unique landuse polygons.
    # Let's adjust the intersection to keep track of landuse IDs.
    
    # Re-doing intersection with landuse ID
    landuse_valid_with_id = landuse_proj.copy()
    landuse_valid_with_id['landuse_id'] = landuse_valid_with_id['id']
    
    intersections = gpd.overlay(hex_grid_proj[['hex_id', 'geometry']], landuse_valid_with_id[['landuse_id', 'pop_density', 'geometry']], how='intersection')
    intersections['inter_area'] = intersections.geometry.area
    intersections['weighted_pop'] = intersections['pop_density'] * intersections['inter_area']
    
    hex_stats = intersections.groupby('hex_id').agg(
        sum_weighted_pop=('weighted_pop', 'sum'),
        sum_inter_area=('inter_area', 'sum'),
        n_overlap_polygons=('landuse_id', 'nunique')
    ).reset_index()
    
    hex_stats['area_weighted_density'] = hex_stats['sum_weighted_pop'] / hex_stats['sum_inter_area']
    
    # Step B: Count slivers per hex
    # We need to know how many slivers intersect each hex.
    sliver_intersections = gpd.overlay(hex_grid_proj[['hex_id', 'geometry']], landuse_slivers[['geometry']], how='intersection')
    # For slivers, we just need to count how many original sliver polygons were involved.
    # Actually, overlay(how='intersection') with landuse_slivers will give us the intersection pieces.
    # We want to count how many sliver polygons intersect each hex.
    # A spatial join is better for this.
    
    sliver_counts = gpd.sjoin(hex_grid_proj[['hex_id', 'geometry']], landuse_slivers[['geometry']], how='left', predicate='intersects')
    # The sjoin will duplicate hex_id for every intersection.
    # We need to count unique landuse indices (or just count how many indices are not null).
    # But landuse_slivers doesn't have a unique ID in the snippet above, let's add one.
    landuse_slivers['sli_id'] = range(len(landuse_slivers))
    sliver_counts = gpd.sjoin(hex_grid_proj[['hex_id', 'geometry']], landuse_slivers[['sli_id', 'geometry']], how='left', predicate='intersects')
    
    # Count non-null sli_id
    hex_sliver_counts = sliver_counts.groupby('hex_id')['sli_id'].nunique().reset_index()
    hex_sliver_counts.columns = ['hex_id', 'n_slivers_filtered']
    
    # Step C: Combine everything
    # Start with hex_grid to ensure we have all hexes if needed? 
    # No, the prompt says "top 10 % of hex cells ranked...". 
    # This implies we only care about hexes that have some landuse? 
    # Or do we rank all hexes? "top 10 % of hex cells ranked by area-weighted mean population density".
    # Usually, if a hex has no landuse, its density is 0.
    
    # Let's merge stats back to the hex_grid.
    final_df = hex_grid[['hex_id', 'geometry']].merge(hex_stats, on='hex_id', how='left')
    final_df = final_df.merge(hex_sliver_counts, on='hex_id', how='left')
    
    # Fill NaNs
    final_df['area_weighted_density'] = final_df['area_weighted_density'].fillna(0)
    final_df['n_overlap_polygons'] = final_df['n_overlap_polygons'].fillna(0).astype(int)
    final_df['n_slivers_filtered'] = final_df['n_slivers_filtered'].fillna(0).astype(int)
    
    # Now find the top 10%
    # Rank descending area_weighted_density.
    # "rank must be a unique integer starting at 1 where ascending rank corresponds to descending area_weighted_density"
    # This means rank 1 is the highest density.
    final_df = final_df.sort_values('area_weighted_density', ascending=False)
    final_df['rank'] = range(1, len(final_df) + 1)
    
    # Top 10%
    n_top = max(1, int(len(final_df) * 0.10))
    hotspots = final_df.head(n_top).copy()
    
    # Ensure the rank is from the full sorted list? 
    # "top 10 % of hex cells ranked ... rank must be a unique integer starting at 1"
    # This is ambiguous. Does it mean the rank is relative to all cells, or 1 to N_top?
    # "rank must be a unique integer starting at 1 where ascending rank corresponds to descending area_weighted_density"
    # This usually implies the rank is the position in the sorted list.
    # And since the output files "share exactly the same set of hex_id values", 
    # the rank should be the same in both.
    
    # Let's check the requirement: "rank must be a unique integer starting at 1 where ascending rank corresponds to descending area_weighted_density"
    # This applies to the "top 10 %". 
    # If I have 100 cells and take top 10, do their ranks become 1-10 or their original ranks?
    # "rank must be a unique integer starting at 1" suggests 1, 2, 3...
    # Let's re-rank the hotspots.
    hotspots = hotspots.sort_values('area_weighted_density', ascending=False)
    hotspots['rank'] = range(1, len(hotspots) + 1)

    # Output 1: hotspots.geoparquet
    # columns: hex_id, rank, area_weighted_density, geometry
    hotspots_geo = hotspots[['hex_id', 'rank', 'area_weighted_density', 'geometry']]
    # Re-project to national grid if needed. The prompt says "in Nigeria's national grid".
    # Nigeria's national grid is typically EPSG:26331 (UTM 31N).
    # But wait, the input is in WGS84 (lat/lon). 
    # The prompt says "in Nigeria's national grid". I'll use EPSG:26331.
    # However, let's check if the input hex_grid is already in some grid. 
    # It's in WGS84. 
    # I will project the output to EPSG:26331.
    hotspots_geo = hotspots_geo.to_crs('EPSG:26331')
    hotspots_geo.to_parquet('hotspots.geoparquet')
    
    # Output 2: hotspot_ranking.parquet
    # columns: hex_id, rank, area_weighted_density, n_overlap_polygons, n_slivers_filtered
    hotspots_rank = hotspots[['hex_id', 'rank', 'area_weighted_density', 'n_overlap_polygons', 'n_slivers_filtered']]
    hotspots_rank.to_parquet('hotspot_ranking.parquet')
    
    print(f"Successfully created files. Hotspots count: {len(hotspots)}")

if __name__ == "__main__":
    solve()
Wrote /work/solve.py (10145 chars).
1:54
Bash python /work/solve.py
python /work/solve.py
Successfully created files. Hotspots count: 178
2:33
Write /work/solve.py
content · 97 lines · py
import geopandas as gpd
import pandas as pd

def solve():
    # Load data
    landuse = gpd.read_file('lagos_landuse.geojson')
    hex_grid = gpd.read_file('lagos_hex_grid.geojson')

    # Project to UTM Zone 31N (EPSG:32631) for area calculations
    # This is a standard for Nigeria/Lagos.
    projected_crs = 'EPSG:32631'
    landuse_proj = landuse.to_crs(projected_crs)
    hex_grid_proj = hex_grid.to_crs(projected_crs)

    # 1. Identify slivers (< 100 m2)
    landuse_proj['area'] = landuse_proj.geometry.area
    is_sliver = landuse_proj['area'] < 100
    landuse_valid = landuse_proj[~is_sliver].copy()
    landuse_slivers = landuse_proj[is_sliver].copy()
    
    # 2. Calculate area-weighted density for each hex
    # We need to find intersection geometries between hexes and valid landuse
    # Overlay is necessary to get the area of the intersection
    
    # To be efficient, we first spatial join to filter candidates
    # then overlay only on those
    joined = gpd.sjoin(hex_grid_proj[['hex_id', 'geometry']], landuse_valid[['pop_density', 'geometry']], how='inner', predicate='intersects')
    
    # Actually, overlay is safer for area-weighted density.
    # Let's perform overlay.
    intersections = gpd.overlay(hex_grid_proj[['hex_id', 'geometry']], 
                                landuse_valid[['pop_density', 'id', 'geometry']], 
                                how='intersection')
    
    intersections['inter_area'] = intersections.geometry.area
    intersections['weighted_pop'] = intersections['pop_density'] * intersections['inter_area']
    
    # Group by hex_id
    hex_stats = intersections.groupby('hex_id').agg(
        sum_weighted_pop=('weighted_pop', 'sum'),
        sum_inter_area=('inter_area', 'sum'),
        n_overlap_polygons=('id', 'nunique')
    ).reset_index()
    
    hex_stats['area_weighted_density'] = hex_stats['sum_weighted_pop'] / hex_stats['sum_inter_area']
    
    # 3. Count slivers per hex
    # Use sjoin for efficiency
    landuse_slivers['sli_id'] = range(len(landuse_slivers))
    sliver_sjoin = gpd.sjoin(hex_grid_proj[['hex_id', 'geometry']], landuse_slivers[['sli_id', 'geometry']], how='left', predicate='intersects')
    hex_sliver_counts = sliver_sjoin.groupby('hex_id')['sli_id'].nunique().reset_index()
    hex_sliver_counts.columns = ['hex_id', 'n_slivers_filtered']
    
    # 4. Combine results
    # Start with the full hex grid to ensure we handle all cells correctly for ranking
    final_df = hex_grid_proj[['hex_id', 'geometry']].merge(hex_stats, on='hex_id', how='left')
    final_df = final_df.merge(hex_sliver_counts, on='hex_id', how='left')
    
    # Fill NaNs
    final_df['area_weighted_density'] = final_df['area_weighted_density'].fillna(0.0)
    final_df['n_overlap_polygons'] = final_df['n_overlap_polygons'].fillna(0).astype(int)
    final_df['n_slivers_filtered'] = final_df['n_slivers_filtered'].fillna(0).astype(int)
    
    # 5. Ranking and Hotspot selection
    # Sort by density descending
    final_df = final_df.sort_values('area_weighted_density', ascending=False).reset_index(drop=True)
    # Assign rank 1..N
    final_df['rank'] = range(1, len(final_df) + 1)
    
    # Identify top 10%
    n_total = len(final_df)
    n_top = max(1, int(n_total * 0.10))
    # Note: some people define top 10% as floor, some as ceil. 
    # Standard "top 10%" usually means taking the top 10% of the count.
    
    hotspots = final_df.head(n_top).copy()
    
    # 6. Output preparation
    # Output 1: hotspots.geoparquet
    # columns: hex_id, rank, area_weighted_density, geometry
    # Must be in Nigeria's national grid (EPSG:26331)
    hotspots_geo = hotspots[['hex_id', 'rank', 'area_weighted_density', 'geometry']].to_crs('EPSG:26331')
    hotspots_geo.to_parquet('hotspots.geoparquet')
    
    # Output 2: hotspot_ranking.parquet
    # columns: hex_id, rank, area_weighted_density, n_overlap_polygons, n_slivers_filtered
    hotspots_rank = hotspots[['hex_id', 'rank', 'area_weighted_density', 'n_overlap_polygons', 'n_slivers_filtered']]
    hotspots_rank.to_parquet('hotspot_ranking.parquet')
    
    print(f"Total hexes: {n_total}")
    print(f"Hotspots: {len(hotspots)}")
    print(f"Top density: {hotspots['area_weighted_density'].max()}")
    print(f"Min density in hotspots: {hotspots['area_weighted_density'].min()}")

if __name__ == "__main__":
    solve()
Wrote /work/solve.py (4376 chars).
2:34
Bash python /work/solve.py
python /work/solve.py
Total hexes: 1782
Hotspots: 178
Top density: 49816.76175066191
Min density in hotspots: 36510.61962497228
2:41
Bash # Verify the outputs
# Verify the outputs
python -c "import geopandas as gpd; print('hotspots.geoparquet:', gpd.read_parquet('hotspots.geoparquet').head())"
python -c "import pandas as pd; print('hotspot_ranking.parquet:', pd.read_parquet('hotspot_ranking.parquet').head())"
hotspots.geoparquet:      hex_id  ...                                           geometry
0  H011-017  ...  POLYGON ((537803.63 724900, 537514.955 725400,...
1  H049-008  ...  POLYGON ((570712.595 715900, 570423.92 716400,...
2  H031-006  ...  POLYGON ((555124.138 713900, 554835.463 714400...
3  H035-007  ...  POLYGON ((558588.24 714900, 558299.565 715400,...
4  H039-014  ...  POLYGON ((562052.341 721900, 561763.666 722400...

[5 rows x 4 columns]
hotspot_ranking.parquet:      hex_id  rank  ...  n_overlap_polygons  n_slivers_filtered
0  H011-017     1  ...                   2                   1
1  H049-008     2  ...                   1                   2
2  H031-006     3  ...                   1                   3
3  H035-007     4  ...                   1                   2
4  H039-014     5  ...                   1                   0

[5 rows x 5 columns]
2:45
Bash python -c "import pandas as pd; p1 = pd.read_parquet('hotspot_ranking.parquet');...
python -c "import pandas as pd; p1 = pd.read_parquet('hotspot_ranking.parquet'); p2 = pd.read_parquet('hotspots.geoparquet'); print('Check hex_id overlap:', p1['hex_id'].equals(p2['hex_id']))"
Check hex_id overlap: True
2:51
Bash python -c "import pandas as pd; p1 = pd.read_parquet('hotspot_ranking.parquet');...
python -c "import pandas as pd; p1 = pd.read_parquet('hotspot_ranking.parquet'); p2 = pd.read_parquet('hotspots.geoparquet'); print('Check rank matches:', (p1['rank'] == p2['rank']).all())"
Check rank matches: True
3:08