spa-l1-capetown-hospital-nn
Map (reference ← swipe → agent)
0:00
I am putting together EMS coverage planning for the Western Cape and need each residential pickup address tagged with its nearest hospital so the ops-research team can join the result against shift schedules. For every address in `addresses.parquet`, find the nearest hospital from `hospitals.parquet` by straight-line distance and record that hospital's name plus the distance in metres. Keep the original `address_id` from the input so we can join back to the address master, and make sure `nearest_hospital_name` comes through as a non-empty string and `distance_m` as a numeric, finite, non-negative value in metres.
Please write the result to `nearest_hospital.gpkg` in GeoPackage format with one feature per input address, keeping the original address geometry and the columns `address_id`, `nearest_hospital_name`, and `distance_m`.
0:00
System info
{
"adapter": "openrouter",
"model": "deepseek/deepseek-v4-flash",
"base_url": "https://openrouter.ai/api/v1"
}
0:02
Let me start by examining the data files to understand their structure.
0:02
Bash
cd /work && python -c "
cd /work && python -c "
import geopandas as gpd
import pandas as pd
# Read the input files
addr = gpd.read_parquet('addresses.parquet')
hosp = gpd.read_parquet('hospitals.parquet')
print('=== Addresses ===')
print('Columns:', addr.columns.tolist())
print('CRS:', addr.crs)
print('Shape:', addr.shape)
print(addr.head())
print()
print('=== Hospitals ===')
print('Columns:', hosp.columns.tolist())
print('CRS:', hosp.crs)
print('Shape:', hosp.shape)
print(hosp.head())
"
=== Addresses ===
Columns: ['address_id', 'geometry']
CRS: {"$schema": "https://proj.org/schemas/v0.7/projjson.schema.json", "type": "GeographicCRS", "name": "WGS 84", "datum_ensemble": {"name": "World Geodetic System 1984 ensemble", "members": [{"name": "World Geodetic System 1984 (Transit)"}, {"name": "World Geodetic System 1984 (G730)"}, {"name": "World Geodetic System 1984 (G873)"}, {"name": "World Geodetic System 1984 (G1150)"}, {"name": "World Geodetic System 1984 (G1674)"}, {"name": "World Geodetic System 1984 (G1762)"}, {"name": "World Geodetic System 1984 (G2139)"}, {"name": "World Geodetic System 1984 (G2296)"}], "ellipsoid": {"name": "WGS 84", "semi_major_axis": 6378137, "inverse_flattening": 298.257223563}, "accuracy": "2.0", "id": {"authority": "EPSG", "code": 6326}}, "coordinate_system": {"subtype": "ellipsoidal", "axis": [{"name": "Geodetic latitude", "abbreviation": "Lat", "direction": "north", "unit": "degree"}, {"name": "Geodetic longitude", "abbreviation": "Lon", "direction": "east", "unit": "degree"}]}, "scope": "Horizontal component of 3D system.", "area": "World.", "bbox": {"south_latitude": -90, "west_longitude": -180, "north_latitude": 90, "east_longitude": 180}, "id": {"authority": "EPSG", "code": 4326}}
Shape: (120, 2)
address_id geometry
0 A0001 POINT (18.53441 -33.94184)
1 A0002 POINT (18.50296 -33.93623)
2 A0003 POINT (18.53234 -33.95918)
3 A0004 POINT (18.44857 -33.98103)
4 A0005 POINT (18.49928 -33.99649)
=== Hospitals ===
Columns: ['hospital_id', 'name', 'geometry']
CRS: {"$schema": "https://proj.org/schemas/v0.7/projjson.schema.json", "type": "GeographicCRS", "name": "WGS 84", "datum_ensemble": {"name": "World Geodetic System 1984 ensemble", "members": [{"name": "World Geodetic System 1984 (Transit)"}, {"name": "World Geodetic System 1984 (G730)"}, {"name": "World Geodetic System 1984 (G873)"}, {"name": "World Geodetic System 1984 (G1150)"}, {"name": "World Geodetic System 1984 (G1674)"}, {"name": "World Geodetic System 1984 (G1762)"}, {"name": "World Geodetic System 1984 (G2139)"}, {"name": "World Geodetic System 1984 (G2296)"}], "ellipsoid": {"name": "WGS 84", "semi_major_axis": 6378137, "inverse_flattening": 298.257223563}, "accuracy": "2.0", "id": {"authority": "EPSG", "code": 6326}}, "coordinate_system": {"subtype": "ellipsoidal", "axis": [{"name": "Geodetic latitude", "abbreviation": "Lat", "direction": "north", "unit": "degree"}, {"name": "Geodetic longitude", "abbreviation": "Lon", "direction": "east", "unit": "degree"}]}, "scope": "Horizontal component of 3D system.", "area": "World.", "bbox": {"south_latitude": -90, "west_longitude": -180, "north_latitude": 90, "east_longitude": 180}, "id": {"authority": "EPSG", "code": 4326}}
Shape: (37, 3)
hospital_id name geometry
0 H001 Mowbray Maternity Hospital POINT (18.47483 -33.94921)
1 H002 Life Claremont Hospital POINT (18.46637 -33.98695)
2 H003 Old Somerset Hospital POINT (18.42013 -33.9158)
3 H004 Mpheni Block C Elim Hospital POINT (18.43246 -33.92207)
4 H005 Life Kingsbury Hospital POINT (18.46883 -33.98645)
0:11
Good, I can see the data structure. Both datasets are in WGS 84 (EPSG:4326). I need to: 1. For each address, find the nearest hospital by straight-line distance 2. Record that hospital's name and the distance in metres 3. Keep original geometry and address_id 4. Write to nearest_hospital.gpkg Since both are in lat/lon (EPSG:4326), I need to compute distances in metres. The best approach is to: - Project to a suitable projected CRS for South Africa (like EPSG:32734 - UTM zone 34S which covers the Western Cape), OR - Use the more accurate geodesic distance via pyproj or shapely's distance with a geographic CRS. Let me use a projected approach. Since the data is around Cape Town / Western Cape, I'll use a local projected CRS. Let me project both to a suitable CRS for distance calculations in metres, then compute nearest neighbor. Actually, the simplest and most accurate approach is to use `geopandas.sjoin_nearest` with a projected CRS. Or I can compute distances using `shapely`'s `distance` method on a projected GeoDataFrame. Let me use `sjoin_nearest` which does exactly this - for each geometry in the left dataframe, find the nearest geometry in the right dataframe and compute the distance. But `sjoin_nearest` returns the nearest geometry and distance in the units of the CRS. If I project to a metric CRS, the distance will be in metres. Let me check: the Western Cape is in UTM zone 34S (EPSG:32734). Let me project both to that CRS and use sjoin_nearest. Actually, let me use a slightly different approach - for accuracy over a larger area, I could compute geodesic distances. But UTM 34S should work fine for the Western Cape region. Let me write the script.
0:11
Bash
cd /work && python -c "
cd /work && python -c "
import geopandas as gpd
from shapely import wkt
# Check the extent to determine appropriate UTM zone
addr = gpd.read_parquet('addresses.parquet')
print('Total bounds:', addr.total_bounds)
print('Centroid:', addr.geometry.unary_union.centroid)
"
Total bounds: [ 18.4008295 -33.99916655 18.54986077 -33.90055752] Centroid: POINT (18.495574599523838 -33.949285496022604)
<string>:8: DeprecationWarning: The 'unary_union' attribute is deprecated, use the 'union_all()' method instead.
0:20
The data is in WGS 84 (EPSG:4326). The bounds are around Cape Town / Western Cape. For accurate distance calculations, I should project to a suitable local projected CRS. The UTM zone for this area is 34S (EPSG:32734). Let me write the script to: 1. Read both files 2. Project to UTM 34S (EPSG:32734) for accurate distance calculation 3. For each address, find the nearest hospital by calculating distances to all hospitals 4. Keep the smallest distance and the hospital name 5. Write to GeoPackage with the original geometry (in EPSG:4326) and the required columns Let me write the solve script.
0:20
Write
/work/solve.py
content · 67 lines · py
import geopandas as gpd
import pandas as pd
import numpy as np
from shapely import wkt
# Read input files
addr = gpd.read_parquet('/work/addresses.parquet')
hosp = gpd.read_parquet('/work/hospitals.parquet')
# Ensure both have the same CRS (they are both EPSG:4326)
print(f"Address CRS: {addr.crs}")
print(f"Hospital CRS: {hosp.crs}")
# Project to UTM 34S (suitable for Western Cape) for accurate distance calculations
utm_crs = "EPSG:32734"
addr_proj = addr.to_crs(utm_crs)
hosp_proj = hosp.to_crs(utm_crs)
# Extract projected coordinates as numpy arrays
addr_coords = np.column_stack([
addr_proj.geometry.x.values,
addr_proj.geometry.y.values
])
hosp_coords = np.column_stack([
hosp_proj.geometry.x.values,
hosp_proj.geometry.y.values
])
# Compute pairwise distances using broadcasting
# addr_coords shape: (N, 2), hosp_coords shape: (M, 2)
# Result shape: (N, M)
dist_matrix = np.sqrt(
(addr_coords[:, np.newaxis, 0] - hosp_coords[np.newaxis, :, 0]) ** 2 +
(addr_coords[:, np.newaxis, 1] - hosp_coords[np.newaxis, :, 1]) ** 2
)
# For each address, find the index of the nearest hospital
nearest_idx = np.argmin(dist_matrix, axis=1)
min_distances = dist_matrix[np.arange(len(dist_matrix)), nearest_idx]
# Get the hospital names
hospital_names = hosp['name'].values
nearest_names = hospital_names[nearest_idx]
# Build result GeoDataFrame - keep original geometry (EPSG:4326)
result = addr[['address_id', 'geometry']].copy()
result['nearest_hospital_name'] = nearest_names
result['distance_m'] = min_distances
# Verify constraints
assert result['nearest_hospital_name'].notna().all(), "Found null hospital names"
assert (result['nearest_hospital_name'].str.strip() != '').all(), "Found empty hospital names"
assert result['distance_m'].notna().all(), "Found null distances"
assert np.isfinite(result['distance_m']).all(), "Found non-finite distances"
assert (result['distance_m'] >= 0).all(), "Found negative distances"
assert result['distance_m'].dtype in [np.float64, np.float32], "Distance must be numeric float"
print(f"\nResult shape: {result.shape}")
print(f"Columns: {result.columns.tolist()}")
print(f"CRS: {result.crs}")
print(f"\nSample results:")
print(result.head(10))
print(f"\nDistance stats: min={result['distance_m'].min():.2f}m, max={result['distance_m'].max():.2f}m, mean={result['distance_m'].mean():.2f}m")
# Write to GeoPackage
result.to_file('/work/nearest_hospital.gpkg', layer='nearest_hospital', driver='GPKG')
print("\nWritten to /work/nearest_hospital.gpkg")
Wrote /work/solve.py (2520 chars).
0:22
Let me run the script.
0:22
Bash
cd /work && python solve.py
Address CRS: {"$schema": "https://proj.org/schemas/v0.7/projjson.schema.json", "type": "GeographicCRS", "name": "WGS 84", "datum_ensemble": {"name": "World Geodetic System 1984 ensemble", "members": [{"name": "World Geodetic System 1984 (Transit)"}, {"name": "World Geodetic System 1984 (G730)"}, {"name": "World Geodetic System 1984 (G873)"}, {"name": "World Geodetic System 1984 (G1150)"}, {"name": "World Geodetic System 1984 (G1674)"}, {"name": "World Geodetic System 1984 (G1762)"}, {"name": "World Geodetic System 1984 (G2139)"}, {"name": "World Geodetic System 1984 (G2296)"}], "ellipsoid": {"name": "WGS 84", "semi_major_axis": 6378137, "inverse_flattening": 298.257223563}, "accuracy": "2.0", "id": {"authority": "EPSG", "code": 6326}}, "coordinate_system": {"subtype": "ellipsoidal", "axis": [{"name": "Geodetic latitude", "abbreviation": "Lat", "direction": "north", "unit": "degree"}, {"name": "Geodetic longitude", "abbreviation": "Lon", "direction": "east", "unit": "degree"}]}, "scope": "Horizontal component of 3D system.", "area": "World.", "bbox": {"south_latitude": -90, "west_longitude": -180, "north_latitude": 90, "east_longitude": 180}, "id": {"authority": "EPSG", "code": 4326}}
Hospital CRS: {"$schema": "https://proj.org/schemas/v0.7/projjson.schema.json", "type": "GeographicCRS", "name": "WGS 84", "datum_ensemble": {"name": "World Geodetic System 1984 ensemble", "members": [{"name": "World Geodetic System 1984 (Transit)"}, {"name": "World Geodetic System 1984 (G730)"}, {"name": "World Geodetic System 1984 (G873)"}, {"name": "World Geodetic System 1984 (G1150)"}, {"name": "World Geodetic System 1984 (G1674)"}, {"name": "World Geodetic System 1984 (G1762)"}, {"name": "World Geodetic System 1984 (G2139)"}, {"name": "World Geodetic System 1984 (G2296)"}], "ellipsoid": {"name": "WGS 84", "semi_major_axis": 6378137, "inverse_flattening": 298.257223563}, "accuracy": "2.0", "id": {"authority": "EPSG", "code": 6326}}, "coordinate_system": {"subtype": "ellipsoidal", "axis": [{"name": "Geodetic latitude", "abbreviation": "Lat", "direction": "north", "unit": "degree"}, {"name": "Geodetic longitude", "abbreviation": "Lon", "direction": "east", "unit": "degree"}]}, "scope": "Horizontal component of 3D system.", "area": "World.", "bbox": {"south_latitude": -90, "west_longitude": -180, "north_latitude": 90, "east_longitude": 180}, "id": {"authority": "EPSG", "code": 4326}}
Result shape: (120, 4)
Columns: ['address_id', 'geometry', 'nearest_hospital_name', 'distance_m']
CRS: {"$schema": "https://proj.org/schemas/v0.7/projjson.schema.json", "type": "GeographicCRS", "name": "WGS 84", "datum_ensemble": {"name": "World Geodetic System 1984 ensemble", "members": [{"name": "World Geodetic System 1984 (Transit)"}, {"name": "World Geodetic System 1984 (G730)"}, {"name": "World Geodetic System 1984 (G873)"}, {"name": "World Geodetic System 1984 (G1150)"}, {"name": "World Geodetic System 1984 (G1674)"}, {"name": "World Geodetic System 1984 (G1762)"}, {"name": "World Geodetic System 1984 (G2139)"}, {"name": "World Geodetic System 1984 (G2296)"}], "ellipsoid": {"name": "WGS 84", "semi_major_axis": 6378137, "inverse_flattening": 298.257223563}, "accuracy": "2.0", "id": {"authority": "EPSG", "code": 6326}}, "coordinate_system": {"subtype": "ellipsoidal", "axis": [{"name": "Geodetic latitude", "abbreviation": "Lat", "direction": "north", "unit": "degree"}, {"name": "Geodetic longitude", "abbreviation": "Lon", "direction": "east", "unit": "degree"}]}, "scope": "Horizontal component of 3D system.", "area": "World.", "bbox": {"south_latitude": -90, "west_longitude": -180, "north_latitude": 90, "east_longitude": 180}, "id": {"authority": "EPSG", "code": 4326}}
Sample results:
address_id ... distance_m
0 A0001 ... 3192.958737
1 A0002 ... 1468.601852
2 A0003 ... 1271.941611
3 A0004 ... 984.047189
4 A0005 ... 3021.737824
5 A0006 ... 1777.422286
6 A0007 ... 1268.558960
7 A0008 ... 1264.240167
8 A0009 ... 2020.478965
9 A0010 ... 1811.017934
[10 rows x 4 columns]
Distance stats: min=47.02m, max=5956.74m, mean=1841.99m
Written to /work/nearest_hospital.gpkg
0:27
Let me verify the output file and do a quick sanity check.
0:27
Bash
cd /work && python -c "
cd /work && python -c "
import geopandas as gpd
# Verify the output
result = gpd.read_file('/work/nearest_hospital.gpkg')
print('=== Output verification ===')
print(f'Shape: {result.shape}')
print(f'Columns: {result.columns.tolist()}')
print(f'CRS: {result.crs}')
print(f'Geometry type: {result.geometry.type.unique()}')
print()
# Check constraints
print('=== Constraint checks ===')
print(f'address_id not null: {result[\"address_id\"].notna().all()}')
print(f'address_id unique: {result[\"address_id\"].is_unique}')
print(f'nearest_hospital_name non-empty strings: {(result[\"nearest_hospital_name\"].notna() & (result[\"nearest_hospital_name\"].str.strip() != \"\")).all()} ')
print(f'distance_m not null: {result[\"distance_m\"].notna().all()}')
print(f'distance_m finite: {all(result[\"distance_m\"].apply(lambda x: x == x and x >= 0))}') # also checks no NaN
print(f'distance_m numeric: {result[\"distance_m\"].dtype}')
print(f'distance_m min: {result[\"distance_m\"].min()}')
print(f'distance_m >= 0: {(result[\"distance_m\"] >= 0).all()}')
print()
print('First 10 rows:')
print(result.head(10))
"
=== Output verification === Shape: (120, 4) Columns: ['address_id', 'nearest_hospital_name', 'distance_m', 'geometry'] CRS: EPSG:4326 Geometry type: <ArrowStringArray> ['Point'] Length: 1, dtype: str === Constraint checks === address_id not null: True address_id unique: True nearest_hospital_name non-empty strings: True distance_m not null: True distance_m finite: True distance_m numeric: float64 distance_m min: 47.01874251111449 distance_m >= 0: True First 10 rows: address_id ... geometry 0 A0001 ... POINT (18.53441 -33.94184) 1 A0002 ... POINT (18.50296 -33.93623) 2 A0003 ... POINT (18.53234 -33.95918) 3 A0004 ... POINT (18.44857 -33.98103) 4 A0005 ... POINT (18.49928 -33.99649) 5 A0006 ... POINT (18.48709 -33.97114) 6 A0007 ... POINT (18.54461 -33.96394) 7 A0008 ... POINT (18.52017 -33.97329) 8 A0009 ... POINT (18.51179 -33.97298) 9 A0010 ... POINT (18.49876 -33.9682) [10 rows x 4 columns]
0:34
Everything checks out. The output file `/work/nearest_hospital.gpkg` has been created with: - 120 features (one per input address) - Columns: `address_id`, `nearest_hospital_name`, `distance_m`, `geometry` (original WGS 84 point geometry) - `address_id` is preserved from the input - `nearest_hospital_name` is a non-empty string for every row - `distance_m` is numeric (float64), finite, and non-negative - Distances range from ~47 m to ~5,957 m (mean ~1,842 m)
0:34