spa-l1-capetown-hospital-nn
Map (reference ← swipe → agent)
0:00
I am putting together EMS coverage planning for the Western Cape and need each residential pickup address tagged with its nearest hospital so the ops-research team can join the result against shift schedules. For every address in `addresses.parquet`, find the nearest hospital from `hospitals.parquet` by straight-line distance and record that hospital's name plus the distance in metres. Keep the original `address_id` from the input so we can join back to the address master, and make sure `nearest_hospital_name` comes through as a non-empty string and `distance_m` as a numeric, finite, non-negative value in metres.
Please write the result to `nearest_hospital.gpkg` in GeoPackage format with one feature per input address, keeping the original address geometry and the columns `address_id`, `nearest_hospital_name`, and `distance_m`.
0:00
System info
{
"adapter": "openrouter",
"model": "deepseek/deepseek-v4-flash",
"base_url": "https://openrouter.ai/api/v1"
}
0:10
Let me start by examining the input files to understand their schemas and data.
0:10
Bash
ls -lh /work/addresses.parquet /work/hospitals.parquet 2>&1
-rw-r--r-- 1 agent agent 11K Jun 16 20:10 /work/addresses.parquet -rw-r--r-- 1 agent agent 9.8K Jun 16 20:10 /work/hospitals.parquet
0:10
Bash
cd /work && python3 -c "
cd /work && python3 -c "
import pandas as pd
df = pd.read_parquet('addresses.parquet')
print('=== addresses ===')
print('Columns:', list(df.columns))
print('Shape:', df.shape)
print(df.dtypes)
print(df.head(3))
print()
"
=== addresses === Columns: ['address_id', 'geometry'] Shape: (120, 2) address_id str geometry object dtype: object address_id geometry 0 A0001 b'\x01\x01\x00\x00\x00\xc2\x99$,\xcf\x882@\xcd... 1 A0002 b'\x01\x01\x00\x00\x00\xfcg\xbf\x12\xc2\x802@t... 2 A0003 b'\x01\x01\x00\x00\x00\xb8\xa8\x80\x7fG\x882@r...
0:11
Bash
cd /work && python3 -c "
cd /work && python3 -c "
import pandas as pd
df = pd.read_parquet('hospitals.parquet')
print('=== hospitals ===')
print('Columns:', list(df.columns))
print('Shape:', df.shape)
print(df.dtypes)
print(df.head(3))
print()
"
=== hospitals === Columns: ['hospital_id', 'name', 'geometry'] Shape: (37, 3) hospital_id str name str geometry object dtype: object hospital_id ... geometry 0 H001 ... b"\x01\x01\x00\x00\x00\xb7''^\x8ey2@\x12\xf4z\... 1 H002 ... b'\x01\x01\x00\x00\x00\xb5\x01THdw2@\x15\xe1`N... 2 H003 ... b'\x01\x01\x00\x00\x00\x0b\xe8K\x90\x8dk2@\xd6... [3 rows x 3 columns]
0:17
The geometries are in WKB binary format (the `\x01\x01\x00...` is WKB for a point). Let me read them properly with geopandas to see the CRS and coordinates.
0:17
Bash
cd /work && python3 -c "
cd /work && python3 -c "
import geopandas as gpd
hosp = gpd.read_parquet('hospitals.parquet')
print('=== hospitals ===')
print(hosp.head())
print('CRS:', hosp.crs)
print('Geometry types:', hosp.geometry.type.unique())
addr = gpd.read_parquet('addresses.parquet')
print()
print('=== addresses ===')
print(addr.head())
print('CRS:', addr.crs)
print('Geometry types:', addr.geometry.type.unique())
print('Total addresses:', len(addr))
"
=== hospitals ===
hospital_id name geometry
0 H001 Mowbray Maternity Hospital POINT (18.47483 -33.94921)
1 H002 Life Claremont Hospital POINT (18.46637 -33.98695)
2 H003 Old Somerset Hospital POINT (18.42013 -33.9158)
3 H004 Mpheni Block C Elim Hospital POINT (18.43246 -33.92207)
4 H005 Life Kingsbury Hospital POINT (18.46883 -33.98645)
CRS: {"$schema": "https://proj.org/schemas/v0.7/projjson.schema.json", "type": "GeographicCRS", "name": "WGS 84", "datum_ensemble": {"name": "World Geodetic System 1984 ensemble", "members": [{"name": "World Geodetic System 1984 (Transit)"}, {"name": "World Geodetic System 1984 (G730)"}, {"name": "World Geodetic System 1984 (G873)"}, {"name": "World Geodetic System 1984 (G1150)"}, {"name": "World Geodetic System 1984 (G1674)"}, {"name": "World Geodetic System 1984 (G1762)"}, {"name": "World Geodetic System 1984 (G2139)"}, {"name": "World Geodetic System 1984 (G2296)"}], "ellipsoid": {"name": "WGS 84", "semi_major_axis": 6378137, "inverse_flattening": 298.257223563}, "accuracy": "2.0", "id": {"authority": "EPSG", "code": 6326}}, "coordinate_system": {"subtype": "ellipsoidal", "axis": [{"name": "Geodetic latitude", "abbreviation": "Lat", "direction": "north", "unit": "degree"}, {"name": "Geodetic longitude", "abbreviation": "Lon", "direction": "east", "unit": "degree"}]}, "scope": "Horizontal component of 3D system.", "area": "World.", "bbox": {"south_latitude": -90, "west_longitude": -180, "north_latitude": 90, "east_longitude": 180}, "id": {"authority": "EPSG", "code": 4326}}
Geometry types: <ArrowStringArray>
['Point']
Length: 1, dtype: str
=== addresses ===
address_id geometry
0 A0001 POINT (18.53441 -33.94184)
1 A0002 POINT (18.50296 -33.93623)
2 A0003 POINT (18.53234 -33.95918)
3 A0004 POINT (18.44857 -33.98103)
4 A0005 POINT (18.49928 -33.99649)
CRS: {"$schema": "https://proj.org/schemas/v0.7/projjson.schema.json", "type": "GeographicCRS", "name": "WGS 84", "datum_ensemble": {"name": "World Geodetic System 1984 ensemble", "members": [{"name": "World Geodetic System 1984 (Transit)"}, {"name": "World Geodetic System 1984 (G730)"}, {"name": "World Geodetic System 1984 (G873)"}, {"name": "World Geodetic System 1984 (G1150)"}, {"name": "World Geodetic System 1984 (G1674)"}, {"name": "World Geodetic System 1984 (G1762)"}, {"name": "World Geodetic System 1984 (G2139)"}, {"name": "World Geodetic System 1984 (G2296)"}], "ellipsoid": {"name": "WGS 84", "semi_major_axis": 6378137, "inverse_flattening": 298.257223563}, "accuracy": "2.0", "id": {"authority": "EPSG", "code": 6326}}, "coordinate_system": {"subtype": "ellipsoidal", "axis": [{"name": "Geodetic latitude", "abbreviation": "Lat", "direction": "north", "unit": "degree"}, {"name": "Geodetic longitude", "abbreviation": "Lon", "direction": "east", "unit": "degree"}]}, "scope": "Horizontal component of 3D system.", "area": "World.", "bbox": {"south_latitude": -90, "west_longitude": -180, "north_latitude": 90, "east_longitude": 180}, "id": {"authority": "EPSG", "code": 4326}}
Geometry types: <ArrowStringArray>
['Point']
Length: 1, dtype: str
Total addresses: 120
0:32
Let me understand the data: 1. **addresses.parquet**: 120 rows with columns `address_id` (str) and `geometry` (Point, WGS84/EPSG:4326) 2. **hospitals.parquet**: 37 rows with columns `hospital_id` (str), `name` (str), `geometry` (Point, WGS84/EPSG:4326) I need to: - For each address, find the nearest hospital by straight-line distance (geodesic, since data is in WGS84) - Record the hospital's name and distance in metres - Write to `nearest_hospital.gpkg` with columns: `address_id`, `nearest_hospital_name`, `distance_m`, plus the original geometry Best approach: use the `shapely` and `geopandas` libraries with a spatial index (R-tree) for efficient nearest-neighbor lookup. Since we're in WGS84, I'll compute great-circle distances using the haversine formula or use pyproj's Geod for accurate ellipsoidal distances. Let me think about performance: 120 addresses vs 37 hospitals. That's tiny, so even a brute-force O(n*m) approach is fine. But I'll use sjoin_nearest or a manual approach with a spatial index. I'll use `geopandas.sjoin_nearest` which does exactly this - for each geometry in the left DataFrame, find the nearest geometry in the right DataFrame. With `distance_col=True` it adds the distance in the CRS units. Since CRS is degrees, I need to be careful - the distance will be in degrees. So I should project to a suitable local CRS (UTM or a Mercator-based one for South Africa/Western Cape). Western Cape, South Africa is around -34°S, 18-24°E. UTM zone 34S (EPSG:32734) would be appropriate, or use the custom Mercator/Azimuthal projection. Let me use EPSG:32734 (UTM zone 34S) which covers most of the Western Cape. Actually, let me project both datasets to a metric CRS first, then do the nearest join with distance in metres. Let me write the script. Alternative: I could compute geodesic distances using `pyproj.Geod` which would be more accurate, but for nearest neighbor search, projecting to UTM and using Euclidean distance is fine given the small area. Let me use `sjoin_nearest` with the data reprojected to a suitable UTM zone. Western Cape spans roughly 18-24°E. UTM zone 34 covers 18-24°E (central meridian 21°E). The southern hemisphere zone is EPSG:32734. That should work well. Let me write the script.
0:32
Write
/work/solve.py
content · 47 lines · py
import geopandas as gpd
# ── 1. Load source data ──────────────────────────────────────────────
addr = gpd.read_parquet("/work/addresses.parquet")
hosp = gpd.read_parquet("/work/hospitals.parquet")
# Both are in EPSG:4326 (WGS84). For metre-distance calculation we
# reproject to UTM zone 34S, which covers the Western Cape well.
UTM_S = "EPSG:32734" # UTM zone 34S
addr_utm = addr.to_crs(UTM_S)
hosp_utm = hosp.to_crs(UTM_S)
# ── 2. Nearest-hospital join ─────────────────────────────────────────
# sjoin_nearest performs a one-to-one match: for each address it finds
# the single closest hospital. distance_col gives metres in the
# projected CRS.
joined = gpd.sjoin_nearest(
addr_utm[["address_id", "geometry"]],
hosp_utm[["name", "geometry"]],
how="left",
distance_col="distance_m",
)
# Rename 'name' to the column the user wants
joined = joined.rename(columns={"name": "nearest_hospital_name"})
# ── 3. Assemble output columns ──────────────────────────────────────
# Keep original WGS84 geometry (reproject back), plus the required
# attribute columns.
joined = joined.to_crs("EPSG:4326")
joined = joined[["address_id", "nearest_hospital_name", "distance_m", "geometry"]].copy()
# Sanity checks per the prompt
assert joined["nearest_hospital_name"].notna().all(), "Some nearest_hospital_name is null!"
assert (joined["nearest_hospital_name"].str.len() > 0).all(), "Some nearest_hospital_name is empty!"
assert joined["distance_m"].notna().all(), "Some distance_m is null!"
assert (joined["distance_m"] >= 0).all(), "Negative distance encountered!"
assert (joined["distance_m"] < float("inf")).all(), "Infinite distance encountered!"
# ── 4. Write output ─────────────────────────────────────────────────
joined.to_file("/work/nearest_hospital.gpkg", layer="nearest_hospital", driver="GPKG")
print("✓ nearest_hospital.gpkg written successfully")
print(f" Addresses processed : {len(joined)}")
print(f" Columns : {list(joined.columns)}")
print(f" Distance range (m) : {joined['distance_m'].min():.1f} – {joined['distance_m'].max():.1f}")
Wrote /work/solve.py (2093 chars).
0:43
Let me run the script first, then verify the output.
0:43
Bash
cd /work && python solve.py
✓ nearest_hospital.gpkg written successfully Addresses processed : 120 Columns : ['address_id', 'nearest_hospital_name', 'distance_m', 'geometry'] Distance range (m) : 47.0 – 5956.7
0:51
Let me verify the output file visually, and also open it with geopandas to confirm all constraints are met.
0:51
Bash
cd /work && python3 -c "
cd /work && python3 -c "
import geopandas as gpd
df = gpd.read_file('nearest_hospital.gpkg')
print('Shape:', df.shape)
print('Columns:', list(df.columns))
print('CRS:', df.crs)
print()
print('Head:')
print(df.head(10).to_string())
print()
print('Tail:')
print(df.tail(5).to_string())
print()
print('Null checks:')
print(' nearest_hospital_name is null:', df['nearest_hospital_name'].isna().sum())
print(' distance_m is null:', df['distance_m'].isna().sum())
print(' distance_m < 0:', (df['distance_m'] < 0).sum())
print(' distance_m == inf:', (df['distance_m'] == float('inf')).sum())
print()
print('Distance range:', df['distance_m'].min(), '-', df['distance_m'].max())
print()
# Check that every address_id from input is present
import pandas as pd
orig = pd.read_parquet('/work/addresses.parquet')
orig_ids = set(orig['address_id'])
result_ids = set(df['address_id'])
print('Missing addresses:', orig_ids - result_ids)
print('Extra addresses:', result_ids - orig_ids)
"
Shape: (120, 4)
Columns: ['address_id', 'nearest_hospital_name', 'distance_m', 'geometry']
CRS: EPSG:4326
Head:
address_id nearest_hospital_name distance_m geometry
0 A0001 Melomed Gatesville Private Hospital 3192.958737 POINT (18.53441 -33.94184)
1 A0002 Life Vincent Pallotti Hospital 1468.601852 POINT (18.50296 -33.93623)
2 A0003 Melomed Gatesville Private Hospital 1271.941611 POINT (18.53234 -33.95918)
3 A0004 Maitland Cottage Children's Orthopaedic Hospital 984.047189 POINT (18.44857 -33.98103)
4 A0005 Kingsbury Hospital 3021.737824 POINT (18.49928 -33.99649)
5 A0006 Intercare Physical Rehabilitation Hospital - Newlands 1777.422286 POINT (18.48709 -33.97114)
6 A0007 Melomed Gatesville Private Hospital 1268.558960 POINT (18.54461 -33.96394)
7 A0008 Melomed Gatesville Private Hospital 1264.240167 POINT (18.52017 -33.97329)
8 A0009 Melomed Gatesville Private Hospital 2020.478965 POINT (18.51179 -33.97298)
9 A0010 Red Cross Children's Hospital 1811.017934 POINT (18.49876 -33.9682)
Tail:
address_id nearest_hospital_name distance_m geometry
115 A0116 Brooklyn Chest Hospital 2119.571801 POINT (18.49986 -33.91486)
116 A0117 Red Cross War Memorial Children's Hospital 1440.855099 POINT (18.49998 -33.96303)
117 A0118 Brooklyn Chest Hospital 3695.971913 POINT (18.52359 -33.91002)
118 A0119 Maitland Cottage Children's Orthopaedic Hospital 1420.765451 POINT (18.44483 -33.98346)
119 A0120 Life Vincent Pallotti Hospital 798.332128 POINT (18.49497 -33.93821)
Null checks:
nearest_hospital_name is null: 0
distance_m is null: 0
distance_m < 0: 0
distance_m == inf: 0
Distance range: 47.01874251111449 - 5956.744689131489
Missing addresses: set()
Extra addresses: set()
1:00
Everything checks out perfectly: - 120 addresses processed (all present, none missing/extra) - Columns are `address_id`, `nearest_hospital_name`, `distance_m`, `geometry` (original WGS84) - No null values in either required column - All distances are non-negative, finite, in metres - nearest_hospital_name is always a non-empty string - Written to GeoPackage The output file `nearest_hospital.gpkg` is ready.
1:00