spa-l1-capetown-hospital-nn
Map (reference ← swipe → agent)
0:00
I am putting together EMS coverage planning for the Western Cape and need each residential pickup address tagged with its nearest hospital so the ops-research team can join the result against shift schedules. For every address in `addresses.parquet`, find the nearest hospital from `hospitals.parquet` by straight-line distance and record that hospital's name plus the distance in metres. Keep the original `address_id` from the input so we can join back to the address master, and make sure `nearest_hospital_name` comes through as a non-empty string and `distance_m` as a numeric, finite, non-negative value in metres.
Please write the result to `nearest_hospital.gpkg` in GeoPackage format with one feature per input address, keeping the original address geometry and the columns `address_id`, `nearest_hospital_name`, and `distance_m`.
0:00
System info
{
"adapter": "openrouter",
"model": "deepseek/deepseek-v4-flash",
"base_url": "https://openrouter.ai/api/v1"
}
0:04
Let me start by examining the input files to understand their structure.
0:04
Bash
python3 -c "
python3 -c "
import geopandas as gpd
import pandas as pd
# Read the parquet files
addr = gpd.read_parquet('/work/addresses.parquet')
hosp = gpd.read_parquet('/work/hospitals.parquet')
print('=== Addresses ===')
print(addr.head())
print(addr.columns.tolist())
print(addr.crs)
print(addr.dtypes)
print(f'Count: {len(addr)}')
print()
print('=== Hospitals ===')
print(hosp.head())
print(hosp.columns.tolist())
print(hosp.crs)
print(hosp.dtypes)
print(f'Count: {len(hosp)}')
"
=== Addresses ===
address_id geometry
0 A0001 POINT (18.53441 -33.94184)
1 A0002 POINT (18.50296 -33.93623)
2 A0003 POINT (18.53234 -33.95918)
3 A0004 POINT (18.44857 -33.98103)
4 A0005 POINT (18.49928 -33.99649)
['address_id', 'geometry']
{"$schema": "https://proj.org/schemas/v0.7/projjson.schema.json", "type": "GeographicCRS", "name": "WGS 84", "datum_ensemble": {"name": "World Geodetic System 1984 ensemble", "members": [{"name": "World Geodetic System 1984 (Transit)"}, {"name": "World Geodetic System 1984 (G730)"}, {"name": "World Geodetic System 1984 (G873)"}, {"name": "World Geodetic System 1984 (G1150)"}, {"name": "World Geodetic System 1984 (G1674)"}, {"name": "World Geodetic System 1984 (G1762)"}, {"name": "World Geodetic System 1984 (G2139)"}, {"name": "World Geodetic System 1984 (G2296)"}], "ellipsoid": {"name": "WGS 84", "semi_major_axis": 6378137, "inverse_flattening": 298.257223563}, "accuracy": "2.0", "id": {"authority": "EPSG", "code": 6326}}, "coordinate_system": {"subtype": "ellipsoidal", "axis": [{"name": "Geodetic latitude", "abbreviation": "Lat", "direction": "north", "unit": "degree"}, {"name": "Geodetic longitude", "abbreviation": "Lon", "direction": "east", "unit": "degree"}]}, "scope": "Horizontal component of 3D system.", "area": "World.", "bbox": {"south_latitude": -90, "west_longitude": -180, "north_latitude": 90, "east_longitude": 180}, "id": {"authority": "EPSG", "code": 4326}}
address_id str
geometry geometry
dtype: object
Count: 120
=== Hospitals ===
hospital_id name geometry
0 H001 Mowbray Maternity Hospital POINT (18.47483 -33.94921)
1 H002 Life Claremont Hospital POINT (18.46637 -33.98695)
2 H003 Old Somerset Hospital POINT (18.42013 -33.9158)
3 H004 Mpheni Block C Elim Hospital POINT (18.43246 -33.92207)
4 H005 Life Kingsbury Hospital POINT (18.46883 -33.98645)
['hospital_id', 'name', 'geometry']
{"$schema": "https://proj.org/schemas/v0.7/projjson.schema.json", "type": "GeographicCRS", "name": "WGS 84", "datum_ensemble": {"name": "World Geodetic System 1984 ensemble", "members": [{"name": "World Geodetic System 1984 (Transit)"}, {"name": "World Geodetic System 1984 (G730)"}, {"name": "World Geodetic System 1984 (G873)"}, {"name": "World Geodetic System 1984 (G1150)"}, {"name": "World Geodetic System 1984 (G1674)"}, {"name": "World Geodetic System 1984 (G1762)"}, {"name": "World Geodetic System 1984 (G2139)"}, {"name": "World Geodetic System 1984 (G2296)"}], "ellipsoid": {"name": "WGS 84", "semi_major_axis": 6378137, "inverse_flattening": 298.257223563}, "accuracy": "2.0", "id": {"authority": "EPSG", "code": 6326}}, "coordinate_system": {"subtype": "ellipsoidal", "axis": [{"name": "Geodetic latitude", "abbreviation": "Lat", "direction": "north", "unit": "degree"}, {"name": "Geodetic longitude", "abbreviation": "Lon", "direction": "east", "unit": "degree"}]}, "scope": "Horizontal component of 3D system.", "area": "World.", "bbox": {"south_latitude": -90, "west_longitude": -180, "north_latitude": 90, "east_longitude": 180}, "id": {"authority": "EPSG", "code": 4326}}
hospital_id str
name str
geometry geometry
dtype: object
Count: 37
0:23
Both datasets are in WGS 84 (EPSG:4326). I need to: 1. For each address point, find the nearest hospital by straight-line distance. 2. Record the hospital name and distance in metres. 3. Write to GeoPackage with address geometry, address_id, nearest_hospital_name, distance_m. Since the data is in lon/lat degrees, I need to compute great-circle distances (Haversine) or project to a local Cartesian CRS for the Western Cape area. Let me think about the best approach. For straight-line distance in metres, I can: - Use pyproj to compute geodesic distances - Or project to a suitable projected CRS for South Africa (like EPSG:32734 - UTM zone 34S, which covers Western Cape) - Or use shapely's distance but that would be in degrees, not metres Let me use a projected CRS approach or geodesic computation. Actually, the simplest robust approach: compute distances using the WGS 84 ellipsoid via pyproj's Geod, or project the data to a local CRS. Let me use a UTM projection. The Western Cape roughly spans longitudes 18°E to 23°E, so UTM zone 34S (EPSG:32734) covers 18°E to 24°E, and 33S (EPSG:32733) covers 12°E to 18°E. Let me check which zone covers most of the data. Looking at the sample data, the coordinates are around 18.4-18.5°E, so that's UTM zone 34S (EPSG:32734). But to be safe, let me use a more localized approach. Actually, the simplest accurate method: use `geopandas.sjoin_nearest` with a projected CRS. Let me project both to EPSG:32734 (UTM zone 34S) which gives metres. Let me write the script.
0:23
Write
/work/solve.py
content · 58 lines · py
import geopandas as gpd
import pandas as pd
import numpy as np
# Read inputs
addr = gpd.read_parquet('/work/addresses.parquet')
hosp = gpd.read_parquet('/work/hospitals.parquet')
# Both are in EPSG:4326 (WGS 84 degrees). For accurate distance in metres,
# project to a suitable projected CRS covering the Western Cape.
# UTM zone 34S (EPSG:32734) covers 18°E–24°E, appropriate for most of the Western Cape.
PROJ_CRS = 'EPSG:32734'
addr_proj = addr.to_crs(PROJ_CRS)
hosp_proj = hosp.to_crs(PROJ_CRS)
# For each address, find the nearest hospital using spatial join
# sjoin_nearest returns the nearest neighbour with distance
result = gpd.sjoin_nearest(
addr_proj,
hosp_proj[['hospital_id', 'name', 'geometry']],
how='left',
max_distance=None, # no distance limit
distance_col='distance_m'
)
# Group by address_id in case of ties (should be fine, sjoin_nearest returns 1 row per left)
# But sjoin_nearest with how='left' returns one row per left geometry by default
# Actually it returns one row per match; if there are ties (same distance) it may return multiple rows.
# Let's handle that by grouping and taking the first (or any).
result = result.sort_values('distance_m').groupby('address_id').first().reset_index()
# Rename columns as required
result = result.rename(columns={'name': 'nearest_hospital_name'})
# Keep only the needed columns
result = result[['address_id', 'nearest_hospital_name', 'distance_m', 'geometry']]
# Ensure distance_m is numeric, finite, non-negative
result['distance_m'] = pd.to_numeric(result['distance_m'], errors='coerce')
result = result[result['distance_m'].notna() & np.isfinite(result['distance_m'])]
result['distance_m'] = result['distance_m'].clip(lower=0.0)
# Ensure nearest_hospital_name is non-empty string
result['nearest_hospital_name'] = result['nearest_hospital_name'].fillna('').astype(str)
result = result[result['nearest_hospital_name'].str.strip() != '']
# Convert geometry back to original CRS (WGS 84)
result = result.set_geometry('geometry').set_crs(PROJ_CRS).to_crs(addr.crs)
# Write to GeoPackage
result.to_file('/work/nearest_hospital.gpkg', layer='nearest_hospital', driver='GPKG')
print(f'Written {len(result)} records to /work/nearest_hospital.gpkg')
print(result.head())
print('CRS:', result.crs)
print('distance_m summary:')
print(result['distance_m'].describe())
Wrote /work/solve.py (2357 chars).
0:27
Let me run it and check the result.
0:27
Bash
cd /work && python solve.py
Written 120 records to /work/nearest_hospital.gpkg
address_id ... geometry
0 A0001 ... POINT (18.53441 -33.94184)
1 A0002 ... POINT (18.50296 -33.93623)
2 A0003 ... POINT (18.53234 -33.95918)
3 A0004 ... POINT (18.44857 -33.98103)
4 A0005 ... POINT (18.49928 -33.99649)
[5 rows x 4 columns]
CRS: {"$schema": "https://proj.org/schemas/v0.7/projjson.schema.json", "type": "GeographicCRS", "name": "WGS 84", "datum_ensemble": {"name": "World Geodetic System 1984 ensemble", "members": [{"name": "World Geodetic System 1984 (Transit)"}, {"name": "World Geodetic System 1984 (G730)"}, {"name": "World Geodetic System 1984 (G873)"}, {"name": "World Geodetic System 1984 (G1150)"}, {"name": "World Geodetic System 1984 (G1674)"}, {"name": "World Geodetic System 1984 (G1762)"}, {"name": "World Geodetic System 1984 (G2139)"}, {"name": "World Geodetic System 1984 (G2296)"}], "ellipsoid": {"name": "WGS 84", "semi_major_axis": 6378137, "inverse_flattening": 298.257223563}, "accuracy": "2.0", "id": {"authority": "EPSG", "code": 6326}}, "coordinate_system": {"subtype": "ellipsoidal", "axis": [{"name": "Geodetic latitude", "abbreviation": "Lat", "direction": "north", "unit": "degree"}, {"name": "Geodetic longitude", "abbreviation": "Lon", "direction": "east", "unit": "degree"}]}, "scope": "Horizontal component of 3D system.", "area": "World.", "bbox": {"south_latitude": -90, "west_longitude": -180, "north_latitude": 90, "east_longitude": 180}, "id": {"authority": "EPSG", "code": 4326}}
distance_m summary:
count 120.000000
mean 1841.992144
std 1187.555528
min 47.018743
25% 996.402110
50% 1582.328219
75% 2629.297510
max 5956.744689
Name: distance_m, dtype: float64
0:35
The result looks good. Let me verify the output file is valid and has the right structure.
0:35
Bash
python3 -c "
python3 -c "
import geopandas as gpd
gdf = gpd.read_file('/work/nearest_hospital.gpkg')
print('Columns:', gdf.columns.tolist())
print('Count:', len(gdf))
print('CRS:', gdf.crs)
print()
print('Sample:')
print(gdf.head(10).to_string())
# Check constraints
print()
print('=== Validations ===')
print('All address_id non-null:', gdf['address_id'].notna().all())
print('All nearest_hospital_name non-empty:', (gdf['nearest_hospital_name'].astype(str).str.strip() != '').all())
print('All distance_m numeric & finite:', gdf['distance_m'].notna().all() and all(gdf['distance_m'].notna()))
print('All distance_m >= 0:', (gdf['distance_m'] >= 0).all())
print('Geometry type:', gdf.geometry.type.unique())
"
Columns: ['address_id', 'nearest_hospital_name', 'distance_m', 'geometry'] Count: 120 CRS: EPSG:4326 Sample: address_id nearest_hospital_name distance_m geometry 0 A0001 Melomed Gatesville Private Hospital 3192.958737 POINT (18.53441 -33.94184) 1 A0002 Life Vincent Pallotti Hospital 1468.601852 POINT (18.50296 -33.93623) 2 A0003 Melomed Gatesville Private Hospital 1271.941611 POINT (18.53234 -33.95918) 3 A0004 Maitland Cottage Children's Orthopaedic Hospital 984.047189 POINT (18.44857 -33.98103) 4 A0005 Kingsbury Hospital 3021.737824 POINT (18.49928 -33.99649) 5 A0006 Intercare Physical Rehabilitation Hospital - Newlands 1777.422286 POINT (18.48709 -33.97114) 6 A0007 Melomed Gatesville Private Hospital 1268.558960 POINT (18.54461 -33.96394) 7 A0008 Melomed Gatesville Private Hospital 1264.240167 POINT (18.52017 -33.97329) 8 A0009 Melomed Gatesville Private Hospital 2020.478965 POINT (18.51179 -33.97298) 9 A0010 Red Cross Children's Hospital 1811.017934 POINT (18.49876 -33.9682) === Validations === All address_id non-null: True All nearest_hospital_name non-empty: True All distance_m numeric & finite: True All distance_m >= 0: True Geometry type: <ArrowStringArray> ['Point'] Length: 1, dtype: str
0:42
Everything checks out. All 120 addresses have been processed with: - `address_id` preserved - `nearest_hospital_name` as non-empty string - `distance_m` as numeric, finite, non-negative (in metres) - Original geometry preserved in WGS 84 (EPSG:4326) - Output written to `/work/nearest_hospital.gpkg` Let me confirm the file exists and is valid.
0:42