spa-l1-capetown-hospital-nn
Map (reference ← swipe → agent)
0:00
I am putting together EMS coverage planning for the Western Cape and need each residential pickup address tagged with its nearest hospital so the ops-research team can join the result against shift schedules. For every address in `addresses.parquet`, find the nearest hospital from `hospitals.parquet` by straight-line distance and record that hospital's name plus the distance in metres. Keep the original `address_id` from the input so we can join back to the address master, and make sure `nearest_hospital_name` comes through as a non-empty string and `distance_m` as a numeric, finite, non-negative value in metres.
Please write the result to `nearest_hospital.gpkg` in GeoPackage format with one feature per input address, keeping the original address geometry and the columns `address_id`, `nearest_hospital_name`, and `distance_m`.
0:00
System info
{
"adapter": "openrouter",
"model": "deepseek/deepseek-v4-flash",
"base_url": "https://openrouter.ai/api/v1"
}
0:04
Let me start by reading the input files to understand their structure.
0:04
Bash
cd /work && python3 -c "
cd /work && python3 -c "
import pandas as pd
df = pd.read_parquet('addresses.parquet')
print('Addresses columns:', list(df.columns))
print('Addresses shape:', df.shape)
print(df.head())
print()
df2 = pd.read_parquet('hospitals.parquet')
print('Hospitals columns:', list(df2.columns))
print('Hospitals shape:', df2.shape)
print(df2.head())
"
Addresses columns: ['address_id', 'geometry'] Addresses shape: (120, 2) address_id geometry 0 A0001 b'\x01\x01\x00\x00\x00\xc2\x99$,\xcf\x882@\xcd... 1 A0002 b'\x01\x01\x00\x00\x00\xfcg\xbf\x12\xc2\x802@t... 2 A0003 b'\x01\x01\x00\x00\x00\xb8\xa8\x80\x7fG\x882@r... 3 A0004 b'\x01\x01\x00\x00\x00\x01Vxv\xd5r2@\x02\\\xc4... 4 A0005 b'\x01\x01\x00\x00\x00\xcc\xa7\xbb\xf0\xd0\x7f... Hospitals columns: ['hospital_id', 'name', 'geometry'] Hospitals shape: (37, 3) hospital_id ... geometry 0 H001 ... b"\x01\x01\x00\x00\x00\xb7''^\x8ey2@\x12\xf4z\... 1 H002 ... b'\x01\x01\x00\x00\x00\xb5\x01THdw2@\x15\xe1`N... 2 H003 ... b'\x01\x01\x00\x00\x00\x0b\xe8K\x90\x8dk2@\xd6... 3 H004 ... b'\x01\x01\x00\x00\x00\x02\xd4\xd4\xb2\xb5n2@D... 4 H005 ... b'\x01\x01\x00\x00\x00?ke_\x05x2@\xa8\xe8\xe5\... [5 rows x 3 columns]
0:09
The data has: - Addresses: columns = ['address_id', 'geometry'] with 120 rows - Hospitals: columns = ['hospital_id', 'name', 'geometry'] with 37 rows The geometry is stored as WKB hex or bytes. Let me check the CRS and geometry type more carefully.
0:09
Bash
cd /work && python3 -c "
cd /work && python3 -c "
import geopandas as gpd
addr = gpd.read_parquet('addresses.parquet')
hosp = gpd.read_parquet('hospitals.parquet')
print('Addresses:')
print(addr.head())
print(addr.crs)
print()
print('Hospitals:')
print(hosp.head())
print(hosp.crs)
"
Addresses:
address_id geometry
0 A0001 POINT (18.53441 -33.94184)
1 A0002 POINT (18.50296 -33.93623)
2 A0003 POINT (18.53234 -33.95918)
3 A0004 POINT (18.44857 -33.98103)
4 A0005 POINT (18.49928 -33.99649)
{"$schema": "https://proj.org/schemas/v0.7/projjson.schema.json", "type": "GeographicCRS", "name": "WGS 84", "datum_ensemble": {"name": "World Geodetic System 1984 ensemble", "members": [{"name": "World Geodetic System 1984 (Transit)"}, {"name": "World Geodetic System 1984 (G730)"}, {"name": "World Geodetic System 1984 (G873)"}, {"name": "World Geodetic System 1984 (G1150)"}, {"name": "World Geodetic System 1984 (G1674)"}, {"name": "World Geodetic System 1984 (G1762)"}, {"name": "World Geodetic System 1984 (G2139)"}, {"name": "World Geodetic System 1984 (G2296)"}], "ellipsoid": {"name": "WGS 84", "semi_major_axis": 6378137, "inverse_flattening": 298.257223563}, "accuracy": "2.0", "id": {"authority": "EPSG", "code": 6326}}, "coordinate_system": {"subtype": "ellipsoidal", "axis": [{"name": "Geodetic latitude", "abbreviation": "Lat", "direction": "north", "unit": "degree"}, {"name": "Geodetic longitude", "abbreviation": "Lon", "direction": "east", "unit": "degree"}]}, "scope": "Horizontal component of 3D system.", "area": "World.", "bbox": {"south_latitude": -90, "west_longitude": -180, "north_latitude": 90, "east_longitude": 180}, "id": {"authority": "EPSG", "code": 4326}}
Hospitals:
hospital_id name geometry
0 H001 Mowbray Maternity Hospital POINT (18.47483 -33.94921)
1 H002 Life Claremont Hospital POINT (18.46637 -33.98695)
2 H003 Old Somerset Hospital POINT (18.42013 -33.9158)
3 H004 Mpheni Block C Elim Hospital POINT (18.43246 -33.92207)
4 H005 Life Kingsbury Hospital POINT (18.46883 -33.98645)
{"$schema": "https://proj.org/schemas/v0.7/projjson.schema.json", "type": "GeographicCRS", "name": "WGS 84", "datum_ensemble": {"name": "World Geodetic System 1984 ensemble", "members": [{"name": "World Geodetic System 1984 (Transit)"}, {"name": "World Geodetic System 1984 (G730)"}, {"name": "World Geodetic System 1984 (G873)"}, {"name": "World Geodetic System 1984 (G1150)"}, {"name": "World Geodetic System 1984 (G1674)"}, {"name": "World Geodetic System 1984 (G1762)"}, {"name": "World Geodetic System 1984 (G2139)"}, {"name": "World Geodetic System 1984 (G2296)"}], "ellipsoid": {"name": "WGS 84", "semi_major_axis": 6378137, "inverse_flattening": 298.257223563}, "accuracy": "2.0", "id": {"authority": "EPSG", "code": 6326}}, "coordinate_system": {"subtype": "ellipsoidal", "axis": [{"name": "Geodetic latitude", "abbreviation": "Lat", "direction": "north", "unit": "degree"}, {"name": "Geodetic longitude", "abbreviation": "Lon", "direction": "east", "unit": "degree"}]}, "scope": "Horizontal component of 3D system.", "area": "World.", "bbox": {"south_latitude": -90, "west_longitude": -180, "north_latitude": 90, "east_longitude": 180}, "id": {"authority": "EPSG", "code": 4326}}
0:25
Both datasets are in WGS 84 (EPSG:4326) with point geometries. I need to find the nearest hospital for each address using straight-line distance. Since the data is in lat/lon, I should reproject to a suitable projected CRS for South Africa (like EPSG:32734 - UTM zone 34S which covers Western Cape) to compute accurate Euclidean distances, or I can use the haversine formula for great-circle distance on the sphere. Actually, the simplest approach: use GeoPandas' `sjoin_nearest` which can compute distances using the CRS. Since the CRS is geographic (degrees), the distances would be in degrees, not metres. Better to reproject to a suitable projected CRS first. Western Cape, South Africa falls in UTM zone 34S (EPSG:32734). Let me use that. Let me write a script to: 1. Read both parquet files as GeoDataFrames 2. Reproject to EPSG:32734 (UTM zone 34S, metres) 3. For each address, find the nearest hospital using sjoin_nearest or a manual nearest neighbor calculation 4. Compute the distance in metres 5. Write to GeoPackage with the required columns Actually, `sjoin_nearest` can give us the nearest hospital for each address. Let me use that. Let me write the script.
0:25
Write
/work/solve.py
content · 44 lines · py
import geopandas as gpd
import numpy as np
# Read the data
addr = gpd.read_parquet('/work/addresses.parquet')
hosp = gpd.read_parquet('/work/hospitals.parquet')
# Both are in EPSG:4326 (WGS 84). Reproject to UTM zone 34S (EPSG:32734)
# which covers Western Cape and uses metres.
addr_utm = addr.to_crs('EPSG:32734')
hosp_utm = hosp.to_crs('EPSG:32734')
# For each address, find the nearest hospital using sjoin_nearest
# This performs a spatial join, matching each address to its nearest hospital.
result = gpd.sjoin_nearest(
addr_utm,
hosp_utm[['hospital_id', 'name', 'geometry']],
how='left',
distance_col='distance_m'
)
# Rename 'name' to 'nearest_hospital_name'
result = result.rename(columns={'name': 'nearest_hospital_name'})
# Keep only the required columns and the original geometry
result = result[['address_id', 'nearest_hospital_name', 'distance_m', 'geometry']]
# The geometry is still in UTM; we want the output in the original CRS (EPSG:4326)
result = result.set_geometry('geometry').set_crs('EPSG:32734')
result = result.to_crs('EPSG:4326')
# Ensure nearest_hospital_name is a non-empty string (all should be fine)
# Ensure distance_m is numeric, finite, non-negative
result['distance_m'] = result['distance_m'].astype(float)
assert result['distance_m'].notna().all(), "NaN distances found"
assert np.isfinite(result['distance_m']).all(), "Non-finite distances found"
assert (result['distance_m'] >= 0).all(), "Negative distances found"
# Write to GeoPackage
result.to_file('/work/nearest_hospital.gpkg', layer='nearest_hospital', driver='GPKG')
print("Done! Output written to /work/nearest_hospital.gpkg")
print(f"Rows: {len(result)}")
print(result.head())
Wrote /work/solve.py (1697 chars).
0:35
Let me run the script.
0:35
Bash
cd /work && python solve.py
Done! Output written to /work/nearest_hospital.gpkg Rows: 120 address_id ... geometry 0 A0001 ... POINT (18.53441 -33.94184) 1 A0002 ... POINT (18.50296 -33.93623) 2 A0003 ... POINT (18.53234 -33.95918) 3 A0004 ... POINT (18.44857 -33.98103) 4 A0005 ... POINT (18.49928 -33.99649) [5 rows x 4 columns]
0:41
Let me verify the output to make sure it's correct - check that the columns are as expected, the distance values are reasonable, and the GeoPackage loads properly.
0:41
Bash
cd /work && python3 -c "
cd /work && python3 -c "
import geopandas as gpd
result = gpd.read_file('/work/nearest_hospital.gpkg')
print('Columns:', list(result.columns))
print('Shape:', result.shape)
print()
print(result[['address_id', 'nearest_hospital_name', 'distance_m']].head(10))
print()
print('Distance stats:')
print(result['distance_m'].describe())
print()
print('CRS:', result.crs)
print()
# Check no empty names
print('Empty names:', (result['nearest_hospital_name'] == '').sum())
print('NaN names:', result['nearest_hospital_name'].isna().sum())
print('NaN distances:', result['distance_m'].isna().sum())
print('Neg distances:', (result['distance_m'] < 0).sum())
print('Inf distances:', (result['distance_m'] == float('inf')).sum())
"
Columns: ['address_id', 'nearest_hospital_name', 'distance_m', 'geometry'] Shape: (120, 4) address_id nearest_hospital_name distance_m 0 A0001 Melomed Gatesville Private Hospital 3192.958737 1 A0002 Life Vincent Pallotti Hospital 1468.601852 2 A0003 Melomed Gatesville Private Hospital 1271.941611 3 A0004 Maitland Cottage Children's Orthopaedic Hospital 984.047189 4 A0005 Kingsbury Hospital 3021.737824 5 A0006 Intercare Physical Rehabilitation Hospital - N... 1777.422286 6 A0007 Melomed Gatesville Private Hospital 1268.558960 7 A0008 Melomed Gatesville Private Hospital 1264.240167 8 A0009 Melomed Gatesville Private Hospital 2020.478965 9 A0010 Red Cross Children's Hospital 1811.017934 Distance stats: count 120.000000 mean 1841.992144 std 1187.555528 min 47.018743 25% 996.402110 50% 1582.328219 75% 2629.297510 max 5956.744689 Name: distance_m, dtype: float64 CRS: EPSG:4326 Empty names: 0 NaN names: 0 NaN distances: 0 Neg distances: 0 Inf distances: 0
0:51
Everything looks good. The output has: - 120 rows (one per input address) - Columns: address_id, nearest_hospital_name, distance_m, geometry - All distances are numeric, finite, non-negative - No empty or NaN names - CRS is EPSG:4326 (WGS 84) - The original geometry is preserved The output is written to `/work/nearest_hospital.gpkg` as required.
0:51