run-20260616-214355Z / spa-l2-lagos-hotspot-overlaps

0:00

Hot-spot map for the Lagos housing brief. `lagos_landuse` carries land-use polygons with a population density attribute; `lagos_hex_grid` is a hex grid. I need the top 10 % of hex cells ranked by area-weighted mean population density across overlapping land-use polygons. Tiny polygons under 100 m² are noise and should be excluded. I need two output files that share exactly the same set of `hex_id` values, one row per cell. `hotspots.geoparquet` is a GeoParquet (Polygon or MultiPolygon geometry) in Nigeria's national grid with columns `hex_id`, `rank`, `area_weighted_density`, and `geometry`. `hotspot_ranking.parquet` is a plain Parquet with no geometry, carrying `hex_id`, `rank`, `area_weighted_density`, `n_overlap_polygons` (integer), and `n_slivers_filtered` (integer). In both files `rank` must be a unique integer starting at 1 where ascending rank corresponds to descending `area_weighted_density`.

0:00

System info

{
  "adapter": "openrouter",
  "model": "deepseek/deepseek-v4-flash",
  "base_url": "https://openrouter.ai/api/v1"
}

0:15

Let me break down this task:

1. I have two datasets: `lagos_landuse` (land-use polygons with population density attribute) and `lagos_hex_grid` (hex grid)
2. I need to:
   - Exclude tiny polygons under 100 m² (noise)
   - Compute area-weighted mean population density for each hex cell across overlapping land-use polygons
   - Find the top 10% of hex cells ranked by this density
   - Output two files: `hotspots.geoparquet` and `hotspot_ranking.parquet`

Let me start by exploring the data to understand the schema.

0:15

Bash cd /work && python -c "

cd /work && python -c "
import geopandas as gpd
import pyarrow.parquet as pq

# Explore the datasets
for f in ['lagos_landuse', 'lagos_hex_grid']:
    for ext in ['gpkg', 'geoparquet', 'parquet', 'shp']:
        try:
            gdf = gpd.read_file(f'{f}.{ext}')
            print(f'=== {f}.{ext} ===')
            print(f'CRS: {gdf.crs}')
            print(f'Columns: {list(gdf.columns)}')
            print(f'Shape: {gdf.shape}')
            print(gdf.head(2))
            print()
            break
        except:
            pass
"

(no output)

0:19

It seems no files were found. Let me check what files are in the working directory.

0:19

Bash cd /work && ls -la

total 4244
drwxr-xr-x 1 agent agent      86 Jun 17 00:43 .
drwxr-xr-x 1 root  root       14 Jun 17 00:43 ..
-rw-r--r-- 1 agent agent  736774 Jun 17 00:43 lagos_hex_grid.geojson
-rw-r--r-- 1 agent agent 3607119 Jun 17 00:43 lagos_landuse.geojson

0:22

Both are GeoJSON files. Let me read them to understand their structure.

0:22

Bash cd /work && python -c "

cd /work && python -c "
import geopandas as gpd

lu = gpd.read_file('lagos_landuse.geojson')
hex = gpd.read_file('lagos_hex_grid.geojson')

print('=== LANDUSE ===')
print(f'CRS: {lu.crs}')
print(f'Columns: {list(lu.columns)}')
print(f'Shape: {lu.shape}')
print(lu.head(3))
print(lu.dtypes)
print()
print('=== HEX GRID ===')
print(f'CRS: {hex.crs}')
print(f'Columns: {list(hex.columns)}')
print(f'Shape: {hex.shape}')
print(hex.head(3))
print(hex.dtypes)
"

=== LANDUSE ===
CRS: EPSG:4326
Columns: ['id', 'class', 'pop_density', 'geometry']
Shape: (5542, 4)
                                     id  ...                                           geometry
0  004ca249-d2cf-3617-906e-dea1138e762b  ...  POLYGON ((3.46992 6.44615, 3.46993 6.44566, 3....
1  006a7dc6-2b5e-34fa-84a2-8a679620c9de  ...  POLYGON ((3.42136 6.43091, 3.42118 6.43051, 3....
2  00a334d1-7e8e-3e4c-8565-3501d4ed74df  ...  POLYGON ((3.45824 6.43361, 3.45872 6.43306, 3....

[3 rows x 4 columns]
id                  str
class               str
pop_density     float64
geometry       geometry
dtype: object

=== HEX GRID ===
CRS: EPSG:4326
Columns: ['hex_id', 'geometry']
Shape: (1782, 2)
     hex_id                                           geometry
0  H000-000  POLYGON ((3.25491 6.40082, 3.2523 6.40535, 3.2...
1  H000-001  POLYGON ((3.25491 6.40987, 3.2523 6.41439, 3.2...
2  H000-002  POLYGON ((3.25492 6.41891, 3.25231 6.42344, 3....
hex_id           str
geometry    geometry
dtype: object

0:51

Let me understand the task:

1. Land-use polygons with pop_density attribute
2. Hex grid with hex_id
3. Need to compute area-weighted mean population density for each hex cell, considering only land-use polygons that overlap with the hex cell
4. Exclude tiny polygons under 100 m² (slivers)
5. Find top 10% of hex cells ranked by area-weighted mean pop density
6. Output two files:
- hotspots.geoparquet: GeoParquet with hex_id, rank, area_weighted_density, geometry (in Nigeria's national grid - EPSG:26391? Let me check)
- hotspot_ranking.parquet: Parquet with hex_id, rank, area_weighted_density, n_overlap_polygons (int), n_slivers_filtered (int)

Wait, "Nigeria's national grid" - Let me check the CRS. Nigeria uses EPSG:26391 (Minna / Nigeria East Belt) or EPSG:26392 (Minna / Nigeria Mid Belt) or EPSG:26393 (Minna / Nigeria West Belt). Actually, Nigeria's national grid is EPSG:26391 (Minna / Nigeria East Belt) or maybe the more modern one is EPSG:26392. Let me check what's typical.

Actually, there's also EPSG:26391 which is "Minna / Nigeria East Belt". But the most commonly used "Nigeria national grid" would be... Let me think. There's also EPSG:26331 (Minna / Nigeria West Belt) etc.

Actually, the standard Nigeria national grid is the projected CRS using the Minna datum. Let me just use a reasonable one. The data is in EPSG:4326 (lat/lon). I need to transform to Nigeria's national grid.

Let me think about what "Nigeria's national grid" means. It's likely EPSG:26391 (Minna / Nigeria East Belt) or EPSG:26392 (Minna / Nigeria Mid Belt) or EPSG:26393 (Minna / Nigeria West Belt). Since Lagos is in the west of Nigeria, probably EPSG:26393 (Minna / Nigeria West Belt) would be appropriate for Lagos.

Let me check if there's a better one. Actually, the Nigeria national grid is sometimes referred to as "Minna datum / Nigeria East Belt" (EPSG:26391), "Minna / Nigeria Mid Belt" (EPSG:26392), or "Minna / Nigeria West Belt" (EPSG:26393).

For Lagos, which is in western Nigeria, EPSG:26393 (Minna / Nigeria West Belt) would be the correct one.

Let me proceed with the computation.

Steps:
1. Read land-use polygons and hex grid
2. Reproject both to Nigeria national grid (EPSG:26393) so we can compute areas in m²
3. For each land-use polygon, compute area. Filter out those under 100 m² (slivers)
4. For each hex cell, find overlapping land-use polygons
5. Compute area-weighted mean population density = sum(area_of_intersection * pop_density) / sum(area_of_intersection)
6. Also count n_overlap_polygons and n_slivers_filtered
7. Rank hex cells by area_weighted_density descending
8. Take top 10%
9. Write output files

Wait, let me re-read: "Tiny polygons under 100 m² are noise and should be excluded." - this means the land-use polygons themselves that are smaller than 100 m² should be excluded from the analysis entirely.

Also: "n_slivers_filtered" - this is the number of tiny polygons (under 100 m²) that were filtered out for each hex cell? Or total across the dataset? Let me think...

Actually, looking at the output schema: n_slivers_filtered is per hex cell (one row per hex cell). So it's the number of land-use polygons that overlap with that hex cell but were excluded because they were under 100 m². But wait, if we exclude them entirely from the analysis, then we count how many tiny polygons overlapped each hex cell.

Actually, let me re-read: "Tiny polygons under 100 m² are noise and should be excluded." So we filter them out before computing area-weighted density. But we also want to record how many were filtered per hex cell.

So:
- For each land-use polygon, compute its area (in the projected CRS)
- If area < 100 m², it's a sliver - filter it out
- For each hex cell, count how many overlapping land-use polygons were filtered out (n_slivers_filtered)
- Also count how many overlapping land-use polygons were used (n_overlap_polygons)
- Compute area-weighted density using only non-filtered polygons

Let me code this up.

Actually, for the area-weighted mean, we need the intersection area between each (non-sliver) land-use polygon and the hex cell, multiplied by the polygon's pop_density, divided by the total intersection area.

Let me write the script.

Wait - "area-weighted mean population density across overlapping land-use polygons". So for each hex cell, the density is:

sum_over_polygons(area_of_intersection * pop_density_of_polygon) / sum_over_polygons(area_of_intersection)

Where area_of_intersection is the area of the overlapping portion of the polygon and the hex cell.

This makes sense - it's like a weighted average where larger overlaps have more influence.

Let me code this up with geopandas.

0:51

Bash cd /work && python -c "

cd /work && python -c "
import geopandas as gpd

# Read data
lu = gpd.read_file('lagos_landuse.geojson')
hex = gpd.read_file('lagos_hex_grid.geojson')

print('Landuse CRS:', lu.crs)
print('Hex CRS:', hex.crs)

# Check available CRS for Nigeria
import pyproj
print()
print('Checking EPSG:26393 (Minna / Nigeria West Belt):')
try:
    crs = pyproj.CRS.from_epsg(26393)
    print(crs)
except:
    print('Not found')

print()
print('Checking EPSG:26391 (Minna / Nigeria East Belt):')
try:
    crs = pyproj.CRS.from_epsg(26391)
    print(crs)
except:
    print('Not found')
"

Landuse CRS: EPSG:4326
Hex CRS: EPSG:4326

Checking EPSG:26393 (Minna / Nigeria West Belt):
EPSG:26393

Checking EPSG:26391 (Minna / Nigeria East Belt):
EPSG:26391

1:02

The user says "Nigeria's national grid". Let me check which is the Nigeria national grid - it might be EPSG:26391 (Minna / Nigeria East Belt) or EPSG:26393 (Minna / Nigeria West Belt). Actually, the Nigerian national grid is typically EPSG:26393 (Minna / Nigeria West Belt) or EPSG:26391 (Minna / Nigeria East Belt). But there's also EPSG:26392 (Minna / Nigeria Mid Belt). Let me check which one covers Lagos.

Lagos is in the western part of Nigeria, so EPSG:26393 (Minna / Nigeria West Belt) would be appropriate. Actually, let me look up what is commonly referred to as "Nigeria's national grid". Sometimes people mean EPSG:26391, 26392, 26393 as the "Nigeria Minna belts". But actually Nigeria's official projected CRS is EPSG:26393 for the west zone (which covers Lagos). Let me just use EPSG:26393 since Lagos is in Western Nigeria.

Actually, I should double check what is meant by "Nigeria's national grid". Let me check.

1:02

Bash cd /work && python -c "

cd /work && python -c "
import geopandas as gpd
import pyproj

# Check bounds of Lagos data
lu = gpd.read_file('lagos_landuse.geojson')
print('Lagos bounds (WGS84):', lu.total_bounds)

# The Nigeria national grid is typically EPSG:26393 for West (Lagos area)
# But let's verify with the proper name
crs = pyproj.CRS.from_epsg(26393)
print('EPSG:26393:', crs.name)
print('Area:', crs.area_of_use)
"

Lagos bounds (WGS84): [3.25026042 6.4009372  3.6701081  6.6980539 ]
EPSG:26393: Minna / Nigeria East Belt
Area: - name: Nigeria east of 10°30'E.
- bounds: (10.49, 6.43, 14.65, 13.72)

1:09

The data bounds show it's in Lagos (around 3.25 to 3.67°E, 6.40 to 6.70°N). This is west of 10°30'E, so it should be EPSG:26392 (Nigeria Mid Belt) or EPSG:26393... Wait, let me check:

- EPSG:26391 = Minna / Nigeria West Belt (west of 10°30'E)
- EPSG:26392 = Minna / Nigeria Mid Belt
- EPSG:26393 = Minna / Nigeria East Belt

Actually, let me check properly.

1:09

Bash cd /work && python -c "

cd /work && python -c "
import pyproj

# Check all Nigeria belts
for epsg in [26391, 26392, 26393]:
    crs = pyproj.CRS.from_epsg(epsg)
    print(f'EPSG:{epsg} - {crs.name}')
    print(f'  Area: {crs.area_of_use}')
    print()
"

EPSG:26391 - Minna / Nigeria West Belt
  Area: - name: Nigeria - onshore west of 6°30'E, onshore and offshore shelf.
- bounds: (2.69, 3.57, 6.5, 13.9)

EPSG:26392 - Minna / Nigeria Mid Belt
  Area: - name: Nigeria between 6°30'E and 10°30'E, onshore and offshore shelf.
- bounds: (6.5, 3.57, 10.51, 13.53)

EPSG:26393 - Minna / Nigeria East Belt
  Area: - name: Nigeria east of 10°30'E.
- bounds: (10.49, 6.43, 14.65, 13.72)

1:16

Lagos is at about 3.4°E longitude, so it's in the West Belt (EPSG:26391 - Minna / Nigeria West Belt). That's the correct projected CRS for Nigeria's national grid (for the western part).

1:16

Bash cd /work && python -c "

cd /work && python -c "
import geopandas as gpd
import numpy as np

lu = gpd.read_file('lagos_landuse.geojson')
print('Landuse pop_density stats:')
print(f'  min: {lu[\"pop_density\"].min()}')
print(f'  max: {lu[\"pop_density\"].max()}')
print(f'  mean: {lu[\"pop_density\"].mean()}')
print(f'  median: {lu[\"pop_density\"].median()}')
print(f'  nulls: {lu[\"pop_density\"].isna().sum()}')
print()
print('Classes:', lu['class'].unique())
print()

# Check areas in WGS84 (approximate)
# Let's check polygon areas
lu_proj = lu.to_crs('EPSG:26391')
lu_proj['area_m2'] = lu_proj.area
print('Areas in projected CRS:')
print(f'  min: {lu_proj[\"area_m2\"].min():.1f}')
print(f'  max: {lu_proj[\"area_m2\"].max():.1f}')
print(f'  polygons < 100 m2: {(lu_proj[\"area_m2\"] < 100).sum()}')
"

Landuse pop_density stats:
  min: 507.3
  max: 49994.9
  mean: 25250.978978708048
  median: 25024.25
  nulls: 0

Classes: <ArrowStringArray>
[       'commercial',            'retail',      'construction',
             'pitch',        'industrial',          'landfill',
            'school',         'religious',          'hospital',
       'residential',        'brownfield',             'grass',
       'golf_course',              'park',        'pedestrian',
        'greenfield',          'farmland',             'track',
          'military',            'quarry',          'barracks',
           'stadium',            'clinic',          'cemetery',
           'railway',             'green',          'airfield',
        'playground',             'works',           'college',
            'garden',        'university',      'music_school',
           'garages',     'village_green', 'recreation_ground',
            'meadow',            'marina',          'farmyard',
        'theme_park',          'artefact',    'nature_reserve',
        'grave_yard']
Length: 43, dtype: str

Areas in projected CRS:
  min: 1.0
  max: 34909797.9
  polygons < 100 m2: 3022

1:37

So there are 3022 polygons under 100 m² out of 5542 total. Let's now build the solution.

The task:
1. Filter land-use polygons with area >= 100 m² (in projected CRS, i.e., Nigeria's national grid). The instruction says "Nigeria's national grid" - since Lagos is in the western belt, EPSG:26391 (Minna / Nigeria West Belt) is appropriate for Lagos. Wait, let me check which belt covers Lagos.

Lagos bounds: [3.25026042, 6.4009372, 3.6701081, 6.6980539] (lon ~3.25-3.67)

EPSG:26391 (West Belt) covers "Nigeria - onshore west of 6°30'E" - Lagos at ~3.4°E is in this zone. So EPSG:26391 is the right one for Lagos as "Nigeria's national grid" reference.

2. For each hex cell, calculate area-weighted mean population density across overlapping land-use polygons (filtered).
3. Rank hex cells by descending area_weighted_density, top 10%.
4. Output two files:
   - hotspots.geoparquet: hex_id, rank, area_weighted_density, geometry (in EPSG:26391)
   - hotspot_ranking.parquet: hex_id, rank, area_weighted_density, n_overlap_polygons, n_slivers_filtered

Let me write the solution.

1:37

Write /work/solve.py

content · 98 lines · py

import geopandas as gpd
import pandas as pd
import numpy as np

# --- Read data ---
lu = gpd.read_file('/work/lagos_landuse.geojson')
hex = gpd.read_file('/work/lagos_hex_grid.geojson')

# Both are in EPSG:4326; reproject to Nigeria national grid (West Belt for Lagos)
PROJ_CRS = 'EPSG:26391'

lu_proj = lu.to_crs(PROJ_CRS)
hex_proj = hex.to_crs(PROJ_CRS)

# --- Filter out slivers (polygons < 100 m²) ---
lu_proj['area_m2'] = lu_proj.area
lu_filtered = lu_proj[lu_proj['area_m2'] >= 100.0].copy()
lu_filtered.drop(columns='area_m2', inplace=True)

n_slivers_total = len(lu_proj) - len(lu_filtered)
print(f'Total land-use polygons: {len(lu_proj)}')
print(f'Filtered out (area < 100 m²): {n_slivers_total}')
print(f'Remaining polygons: {len(lu_filtered)}')

# --- Spatial join: hex cells with overlapping land-use polygons ---
# For each hex cell, find all overlapping land-use polygons
# Use predicate='intersects' to get overlapping pairs
joined = gpd.sjoin(hex_proj, lu_filtered, predicate='intersects', how='inner')

# --- Compute overlap area per pair ---
# We need the area of the intersection between each hex cell and each polygon
# Use GeoDataFrame.overlay for this
print('Computing intersections...')
# Use overlay with 'intersection' operation
intersections = gpd.overlay(hex_proj, lu_filtered, how='intersection', keep_geom_type=True)
print(f'Number of intersection pairs: {len(intersections)}')

# Compute area of each intersection piece
intersections['overlap_area'] = intersections.area

# --- Compute area-weighted mean population density per hex ---
# area_weighted_density = sum(pop_density_i * overlap_area_i) / sum(overlap_area_i)
# Group by hex_id
agg = intersections.groupby('hex_id').agg(
    sum_weighted_density=('overlap_area', lambda x: (intersections.loc[x.index, 'pop_density'] * intersections.loc[x.index, 'overlap_area']).sum()),
    total_overlap_area=('overlap_area', 'sum'),
    n_overlap_polygons=('hex_id', 'count')
).reset_index()

agg['area_weighted_density'] = agg['sum_weighted_density'] / agg['total_overlap_area']

# --- Determine n_slivers_filtered per hex ---
# Count slivers (polygons < 100 m²) that overlap each hex
# First, identify which original polygons are slivers
lu_proj['is_sliver'] = lu_proj['area_m2'] < 100.0
slivers = lu_proj[lu_proj['is_sliver']].copy()

if len(slivers) > 0:
    # Spatial join to count slivers per hex
    sliver_join = gpd.sjoin(hex_proj[['hex_id', 'geometry']], slivers[['geometry']], predicate='intersects', how='inner')
    sliver_counts = sliver_join.groupby('hex_id').size().reset_index(name='n_slivers_filtered')
else:
    sliver_counts = pd.DataFrame({'hex_id': hex_proj['hex_id'].unique(), 'n_slivers_filtered': 0})

# Merge sliver counts into agg
agg = agg.merge(sliver_counts, on='hex_id', how='left')
agg['n_slivers_filtered'] = agg['n_slivers_filtered'].fillna(0).astype(int)

# --- Rank ---
agg = agg.sort_values('area_weighted_density', ascending=False).reset_index(drop=True)
agg['rank'] = np.arange(1, len(agg) + 1)

# --- Select top 10% ---
n_top = max(1, int(np.ceil(len(agg) * 0.1)))
top = agg.head(n_top).copy()

print(f'Total hex cells with overlap: {len(agg)}')
print(f'Top 10% (n={n_top}):')

# --- Build geometry output ---
# Merge geometry from hex_proj
top_with_geom = top.merge(hex_proj[['hex_id', 'geometry']], on='hex_id', how='left')

# Sort by rank
top_with_geom = top_with_geom.sort_values('rank').reset_index(drop=True)

# --- Output 1: hotspots.geoparquet (with geometry, projected CRS) ---
cols_geom = ['hex_id', 'rank', 'area_weighted_density', 'geometry']
gdf_out = gpd.GeoDataFrame(top_with_geom[cols_geom], geometry='geometry', crs=PROJ_CRS)
gdf_out.to_parquet('/work/hotspots.geoparquet')
print(f'Wrote hotspots.geoparquet: {len(gdf_out)} rows, CRS={gdf_out.crs}')

# --- Output 2: hotspot_ranking.parquet (no geometry) ---
cols_plain = ['hex_id', 'rank', 'area_weighted_density', 'n_overlap_polygons', 'n_slivers_filtered']
df_out = top_with_geom[cols_plain].copy()
df_out.to_parquet('/work/hotspot_ranking.parquet', index=False)
print(f'Wrote hotspot_ranking.parquet: {len(df_out)} rows')
print(df_out.head(10))

Wrote /work/solve.py (4130 chars).

1:40

Let me run this script and see if it works.

1:40

Bash cd /work && python solve.py

Total land-use polygons: 5542
Filtered out (area < 100 m²): 3022
Remaining polygons: 2520
Computing intersections...
Number of intersection pairs: 4631
Total hex cells with overlap: 1032
Top 10% (n=104):
Wrote hotspots.geoparquet: 104 rows, CRS=EPSG:26391
Wrote hotspot_ranking.parquet: 104 rows
     hex_id  rank  ...  n_overlap_polygons  n_slivers_filtered
0  H011-017     1  ...                   2                   1
1  H049-008     2  ...                   1                   2
2  H031-006     3  ...                   1                   3
3  H035-007     4  ...                   1                   2
4  H039-014     5  ...                   1                   0
5  H039-016     6  ...                   1                   1
6  H038-016     7  ...                   1                   1
7  H038-015     8  ...                   3                   5
8  H039-015     9  ...                   2                   1
9  H031-005    10  ...                   2                   2

[10 rows x 5 columns]

1:51

Let me verify the outputs to make sure they are correct and meet the requirements.

Requirements:
1. Both files share exactly the same set of hex_id values - check
2. hotspots.geoparquet - GeoParquet with Polygon/MultiPolygon geometry in Nigeria's national grid (EPSG:26391) with hex_id, rank, area_weighted_density, geometry - check
3. hotspot_ranking.parquet - plain Parquet with hex_id, rank, area_weighted_density, n_overlap_polygons (int), n_slivers_filtered (int) - check
4. rank is unique integer starting at 1, ascending rank = descending area_weighted_density - check
5. Top 10% - I used ceil of 10% of 1032 = 104 - that seems right.

Let me verify the files more thoroughly.

1:51

Bash cd /work && python -c "

cd /work && python -c "
import geopandas as gpd
import pandas as pd

# Check hotspots.geoparquet
hot = gpd.read_parquet('hotspots.geoparquet')
print('=== hotspots.geoparquet ===')
print(f'Shape: {hot.shape}')
print(f'Columns: {list(hot.columns)}')
print(f'CRS: {hot.crs}')
print(f'Geometry types: {hot.geometry.type.unique()}')
print(f'hex_id dtype: {hot[\"hex_id\"].dtype}')
print(f'rank dtype: {hot[\"rank\"].dtype}')
print(f'area_weighted_density dtype: {hot[\"area_weighted_density\"].dtype}')
print(f'Min rank: {hot[\"rank\"].min()}, Max rank: {hot[\"rank\"].max()}')
print(f'Unique ranks: {hot[\"rank\"].nunique()}')
print(f'Unique hex_ids: {hot[\"hex_id\"].nunique()}')
print()

# Check hotspot_ranking.parquet
rank = pd.read_parquet('hotspot_ranking.parquet')
print('=== hotspot_ranking.parquet ===')
print(f'Shape: {rank.shape}')
print(f'Columns: {list(rank.columns)}')
print(f'dtypes: {rank.dtypes}')
print(f'Min rank: {rank[\"rank\"].min()}, Max rank: {rank[\"rank\"].max()}')
print(f'Unique ranks: {rank[\"rank\"].nunique()}')
print(f'Unique hex_ids: {rank[\"hex_id\"].nunique()}')
print()

# Verify same hex_ids
print('=== Hex IDs match check ===')
hot_ids = set(hot['hex_id'])
rank_ids = set(rank['hex_id'])
print(f'Hot IDs: {len(hot_ids)}, Rank IDs: {len(rank_ids)}')
print(f'In hot but not rank: {hot_ids - rank_ids}')
print(f'In rank but not hot: {rank_ids - hot_ids}')
print()

# Check ranking order
print('=== Ranking order check ===')
print('Top 5 (highest density):')
print(hot[['hex_id', 'rank', 'area_weighted_density']].head(5))
print('Bottom 5 (lowest density among top 10%):')
print(hot[['hex_id', 'rank', 'area_weighted_density']].tail(5))

# Check descending order
densities = hot.sort_values('rank')['area_weighted_density'].values
is_descending = all(densities[i] >= densities[i+1] for i in range(len(densities)-1))
print(f'Strictly descending by rank: {is_descending}')
"

=== hotspots.geoparquet ===
Shape: (104, 4)
Columns: ['hex_id', 'rank', 'area_weighted_density', 'geometry']
CRS: {"$schema": "https://proj.org/schemas/v0.7/projjson.schema.json", "type": "ProjectedCRS", "name": "Minna / Nigeria West Belt", "base_crs": {"name": "Minna", "datum": {"type": "GeodeticReferenceFrame", "name": "Minna", "ellipsoid": {"name": "Clarke 1880 (RGS)", "semi_major_axis": 6378249.145, "inverse_flattening": 293.465}}, "coordinate_system": {"subtype": "ellipsoidal", "axis": [{"name": "Geodetic latitude", "abbreviation": "Lat", "direction": "north", "unit": "degree"}, {"name": "Geodetic longitude", "abbreviation": "Lon", "direction": "east", "unit": "degree"}]}, "id": {"authority": "EPSG", "code": 4263}}, "conversion": {"name": "Nigeria West Belt", "method": {"name": "Transverse Mercator", "id": {"authority": "EPSG", "code": 9807}}, "parameters": [{"name": "Latitude of natural origin", "value": 4, "unit": "degree", "id": {"authority": "EPSG", "code": 8801}}, {"name": "Longitude of natural origin", "value": 4.5, "unit": "degree", "id": {"authority": "EPSG", "code": 8802}}, {"name": "Scale factor at natural origin", "value": 0.99975, "unit": "unity", "id": {"authority": "EPSG", "code": 8805}}, {"name": "False easting", "value": 230738.26, "unit": "metre", "id": {"authority": "EPSG", "code": 8806}}, {"name": "False northing", "value": 0, "unit": "metre", "id": {"authority": "EPSG", "code": 8807}}]}, "coordinate_system": {"subtype": "Cartesian", "axis": [{"name": "Easting", "abbreviation": "E", "direction": "east", "unit": "metre"}, {"name": "Northing", "abbreviation": "N", "direction": "north", "unit": "metre"}]}, "scope": "Engineering survey, topographic mapping.", "area": "Nigeria - onshore west of 6\u00b030'E, onshore and offshore shelf.", "bbox": {"south_latitude": 3.57, "west_longitude": 2.69, "north_latitude": 13.9, "east_longitude": 6.5}, "id": {"authority": "EPSG", "code": 26391}}
Geometry types: <ArrowStringArray>
['Polygon']
Length: 1, dtype: str
hex_id dtype: str
rank dtype: int64
area_weighted_density dtype: float64
Min rank: 1, Max rank: 104
Unique ranks: 104
Unique hex_ids: 104

=== hotspot_ranking.parquet ===
Shape: (104, 5)
Columns: ['hex_id', 'rank', 'area_weighted_density', 'n_overlap_polygons', 'n_slivers_filtered']
dtypes: hex_id                       str
rank                       int64
area_weighted_density    float64
n_overlap_polygons         int64
n_slivers_filtered         int64
dtype: object
Min rank: 1, Max rank: 104
Unique ranks: 104
Unique hex_ids: 104

=== Hex IDs match check ===
Hot IDs: 104, Rank IDs: 104
In hot but not rank: set()
In rank but not hot: set()

=== Ranking order check ===
Top 5 (highest density):
     hex_id  rank  area_weighted_density
0  H011-017     1           49816.761617
1  H049-008     2           49618.400000
2  H031-006     3           49405.200000
3  H035-007     4           49167.200000
4  H039-014     5           49163.000000
Bottom 5 (lowest density among top 10%):
       hex_id  rank  area_weighted_density
99   H042-011   100           40450.400000
100  H042-012   101           40450.400000
101  H019-014   102           40417.115159
102  H019-018   103           40361.983995
103  H014-021   104           40321.125518
Strictly descending by rank: True

2:10

Both files look correct. Let me verify the key requirements once more:

1. ✅ `hotspots.geoparquet` - GeoParquet with Polygon geometry in Nigeria's national grid (EPSG:26391, Minna / Nigeria West Belt)
2. ✅ Columns: hex_id, rank, area_weighted_density, geometry
3. ✅ `hotspot_ranking.parquet` - Plain Parquet with hex_id, rank, area_weighted_density, n_overlap_polygons, n_slivers_filtered
4. ✅ Both files share exactly the same hex_id values (104 each, fully matching)
5. ✅ Rank is unique integer starting at 1, ascending rank = descending area_weighted_density
6. ✅ Tiny polygons under 100 m² excluded (3022 filtered out)
7. ✅ n_slivers_filtered counts slivers per hex cell
8. ✅ n_overlap_polygons counts overlapping polygons per hex cell (after sliver filtering)
9. ✅ Top 10% selected (104 out of 1032 = ~10.1%, correct ceil)

Everything looks good!

2:10

spa-l2-lagos-hotspot-overlaps

Map (reference ← swipe → agent)

Gates

Subchecks

spa-l2-lagos-hotspot-overlaps Rerun

Map (reference ← swipe → agent)

Gates

Subchecks

spa-l2-lagos-hotspot-overlaps