spa-l1-capetown-hospital-nn

Status: done
Score: 1.000
Duration: 0:44
Cost: 0.16¢
Model: deepseek/deepseek-v4-flash

Map (reference ← swipe → agent)

0:00
I am putting together EMS coverage planning for the Western Cape and need each residential pickup address tagged with its nearest hospital so the ops-research team can join the result against shift schedules. For every address in `addresses.parquet`, find the nearest hospital from `hospitals.parquet` by straight-line distance and record that hospital's name plus the distance in metres. Keep the original `address_id` from the input so we can join back to the address master, and make sure `nearest_hospital_name` comes through as a non-empty string and `distance_m` as a numeric, finite, non-negative value in metres. Please write the result to `nearest_hospital.gpkg` in GeoPackage format with one feature per input address, keeping the original address geometry and the columns `address_id`, `nearest_hospital_name`, and `distance_m`.
0:00
System info
{
  "adapter": "openrouter",
  "model": "deepseek/deepseek-v4-flash",
  "base_url": "https://openrouter.ai/api/v1"
}
0:04
Let me start by examining the input files to understand their structure.
0:04
Bash python3 -c "
python3 -c "
import geopandas as gpd
import pandas as pd

# Read the parquet files
addr = gpd.read_parquet('/work/addresses.parquet')
hosp = gpd.read_parquet('/work/hospitals.parquet')

print('=== Addresses ===')
print(addr.head())
print(addr.columns.tolist())
print(addr.crs)
print(addr.dtypes)
print(f'Count: {len(addr)}')

print()
print('=== Hospitals ===')
print(hosp.head())
print(hosp.columns.tolist())
print(hosp.crs)
print(hosp.dtypes)
print(f'Count: {len(hosp)}')
"
=== Addresses ===
  address_id                    geometry
0      A0001  POINT (18.53441 -33.94184)
1      A0002  POINT (18.50296 -33.93623)
2      A0003  POINT (18.53234 -33.95918)
3      A0004  POINT (18.44857 -33.98103)
4      A0005  POINT (18.49928 -33.99649)
['address_id', 'geometry']
{"$schema": "https://proj.org/schemas/v0.7/projjson.schema.json", "type": "GeographicCRS", "name": "WGS 84", "datum_ensemble": {"name": "World Geodetic System 1984 ensemble", "members": [{"name": "World Geodetic System 1984 (Transit)"}, {"name": "World Geodetic System 1984 (G730)"}, {"name": "World Geodetic System 1984 (G873)"}, {"name": "World Geodetic System 1984 (G1150)"}, {"name": "World Geodetic System 1984 (G1674)"}, {"name": "World Geodetic System 1984 (G1762)"}, {"name": "World Geodetic System 1984 (G2139)"}, {"name": "World Geodetic System 1984 (G2296)"}], "ellipsoid": {"name": "WGS 84", "semi_major_axis": 6378137, "inverse_flattening": 298.257223563}, "accuracy": "2.0", "id": {"authority": "EPSG", "code": 6326}}, "coordinate_system": {"subtype": "ellipsoidal", "axis": [{"name": "Geodetic latitude", "abbreviation": "Lat", "direction": "north", "unit": "degree"}, {"name": "Geodetic longitude", "abbreviation": "Lon", "direction": "east", "unit": "degree"}]}, "scope": "Horizontal component of 3D system.", "area": "World.", "bbox": {"south_latitude": -90, "west_longitude": -180, "north_latitude": 90, "east_longitude": 180}, "id": {"authority": "EPSG", "code": 4326}}
address_id         str
geometry      geometry
dtype: object
Count: 120

=== Hospitals ===
  hospital_id                          name                    geometry
0        H001    Mowbray Maternity Hospital  POINT (18.47483 -33.94921)
1        H002       Life Claremont Hospital  POINT (18.46637 -33.98695)
2        H003         Old Somerset Hospital   POINT (18.42013 -33.9158)
3        H004  Mpheni Block C Elim Hospital  POINT (18.43246 -33.92207)
4        H005       Life Kingsbury Hospital  POINT (18.46883 -33.98645)
['hospital_id', 'name', 'geometry']
{"$schema": "https://proj.org/schemas/v0.7/projjson.schema.json", "type": "GeographicCRS", "name": "WGS 84", "datum_ensemble": {"name": "World Geodetic System 1984 ensemble", "members": [{"name": "World Geodetic System 1984 (Transit)"}, {"name": "World Geodetic System 1984 (G730)"}, {"name": "World Geodetic System 1984 (G873)"}, {"name": "World Geodetic System 1984 (G1150)"}, {"name": "World Geodetic System 1984 (G1674)"}, {"name": "World Geodetic System 1984 (G1762)"}, {"name": "World Geodetic System 1984 (G2139)"}, {"name": "World Geodetic System 1984 (G2296)"}], "ellipsoid": {"name": "WGS 84", "semi_major_axis": 6378137, "inverse_flattening": 298.257223563}, "accuracy": "2.0", "id": {"authority": "EPSG", "code": 6326}}, "coordinate_system": {"subtype": "ellipsoidal", "axis": [{"name": "Geodetic latitude", "abbreviation": "Lat", "direction": "north", "unit": "degree"}, {"name": "Geodetic longitude", "abbreviation": "Lon", "direction": "east", "unit": "degree"}]}, "scope": "Horizontal component of 3D system.", "area": "World.", "bbox": {"south_latitude": -90, "west_longitude": -180, "north_latitude": 90, "east_longitude": 180}, "id": {"authority": "EPSG", "code": 4326}}
hospital_id         str
name                str
geometry       geometry
dtype: object
Count: 37
0:23
Both datasets are in WGS 84 (EPSG:4326). I need to:

1. For each address point, find the nearest hospital by straight-line distance.
2. Record the hospital name and distance in metres.
3. Write to GeoPackage with address geometry, address_id, nearest_hospital_name, distance_m.

Since the data is in lon/lat degrees, I need to compute great-circle distances (Haversine) or project to a local Cartesian CRS for the Western Cape area. Let me think about the best approach.

For straight-line distance in metres, I can:
- Use pyproj to compute geodesic distances
- Or project to a suitable projected CRS for South Africa (like EPSG:32734 - UTM zone 34S, which covers Western Cape)
- Or use shapely's distance but that would be in degrees, not metres

Let me use a projected CRS approach or geodesic computation.

Actually, the simplest robust approach: compute distances using the WGS 84 ellipsoid via pyproj's Geod, or project the data to a local CRS.

Let me use a UTM projection. The Western Cape roughly spans longitudes 18°E to 23°E, so UTM zone 34S (EPSG:32734) covers 18°E to 24°E, and 33S (EPSG:32733) covers 12°E to 18°E. Let me check which zone covers most of the data.

Looking at the sample data, the coordinates are around 18.4-18.5°E, so that's UTM zone 34S (EPSG:32734). But to be safe, let me use a more localized approach.

Actually, the simplest accurate method: use `geopandas.sjoin_nearest` with a projected CRS. Let me project both to EPSG:32734 (UTM zone 34S) which gives metres.

Let me write the script.
0:23
Write /work/solve.py
content · 58 lines · py
import geopandas as gpd
import pandas as pd
import numpy as np

# Read inputs
addr = gpd.read_parquet('/work/addresses.parquet')
hosp = gpd.read_parquet('/work/hospitals.parquet')

# Both are in EPSG:4326 (WGS 84 degrees). For accurate distance in metres,
# project to a suitable projected CRS covering the Western Cape.
# UTM zone 34S (EPSG:32734) covers 18°E–24°E, appropriate for most of the Western Cape.
PROJ_CRS = 'EPSG:32734'

addr_proj = addr.to_crs(PROJ_CRS)
hosp_proj = hosp.to_crs(PROJ_CRS)

# For each address, find the nearest hospital using spatial join
# sjoin_nearest returns the nearest neighbour with distance
result = gpd.sjoin_nearest(
    addr_proj,
    hosp_proj[['hospital_id', 'name', 'geometry']],
    how='left',
    max_distance=None,   # no distance limit
    distance_col='distance_m'
)

# Group by address_id in case of ties (should be fine, sjoin_nearest returns 1 row per left)
# But sjoin_nearest with how='left' returns one row per left geometry by default
# Actually it returns one row per match; if there are ties (same distance) it may return multiple rows.
# Let's handle that by grouping and taking the first (or any).
result = result.sort_values('distance_m').groupby('address_id').first().reset_index()

# Rename columns as required
result = result.rename(columns={'name': 'nearest_hospital_name'})

# Keep only the needed columns
result = result[['address_id', 'nearest_hospital_name', 'distance_m', 'geometry']]

# Ensure distance_m is numeric, finite, non-negative
result['distance_m'] = pd.to_numeric(result['distance_m'], errors='coerce')
result = result[result['distance_m'].notna() & np.isfinite(result['distance_m'])]
result['distance_m'] = result['distance_m'].clip(lower=0.0)

# Ensure nearest_hospital_name is non-empty string
result['nearest_hospital_name'] = result['nearest_hospital_name'].fillna('').astype(str)
result = result[result['nearest_hospital_name'].str.strip() != '']

# Convert geometry back to original CRS (WGS 84)
result = result.set_geometry('geometry').set_crs(PROJ_CRS).to_crs(addr.crs)

# Write to GeoPackage
result.to_file('/work/nearest_hospital.gpkg', layer='nearest_hospital', driver='GPKG')

print(f'Written {len(result)} records to /work/nearest_hospital.gpkg')
print(result.head())
print('CRS:', result.crs)
print('distance_m summary:')
print(result['distance_m'].describe())
Wrote /work/solve.py (2357 chars).
0:27
Let me run it and check the result.
0:27
Bash cd /work && python solve.py
Written 120 records to /work/nearest_hospital.gpkg
  address_id  ...                    geometry
0      A0001  ...  POINT (18.53441 -33.94184)
1      A0002  ...  POINT (18.50296 -33.93623)
2      A0003  ...  POINT (18.53234 -33.95918)
3      A0004  ...  POINT (18.44857 -33.98103)
4      A0005  ...  POINT (18.49928 -33.99649)

[5 rows x 4 columns]
CRS: {"$schema": "https://proj.org/schemas/v0.7/projjson.schema.json", "type": "GeographicCRS", "name": "WGS 84", "datum_ensemble": {"name": "World Geodetic System 1984 ensemble", "members": [{"name": "World Geodetic System 1984 (Transit)"}, {"name": "World Geodetic System 1984 (G730)"}, {"name": "World Geodetic System 1984 (G873)"}, {"name": "World Geodetic System 1984 (G1150)"}, {"name": "World Geodetic System 1984 (G1674)"}, {"name": "World Geodetic System 1984 (G1762)"}, {"name": "World Geodetic System 1984 (G2139)"}, {"name": "World Geodetic System 1984 (G2296)"}], "ellipsoid": {"name": "WGS 84", "semi_major_axis": 6378137, "inverse_flattening": 298.257223563}, "accuracy": "2.0", "id": {"authority": "EPSG", "code": 6326}}, "coordinate_system": {"subtype": "ellipsoidal", "axis": [{"name": "Geodetic latitude", "abbreviation": "Lat", "direction": "north", "unit": "degree"}, {"name": "Geodetic longitude", "abbreviation": "Lon", "direction": "east", "unit": "degree"}]}, "scope": "Horizontal component of 3D system.", "area": "World.", "bbox": {"south_latitude": -90, "west_longitude": -180, "north_latitude": 90, "east_longitude": 180}, "id": {"authority": "EPSG", "code": 4326}}
distance_m summary:
count     120.000000
mean     1841.992144
std      1187.555528
min        47.018743
25%       996.402110
50%      1582.328219
75%      2629.297510
max      5956.744689
Name: distance_m, dtype: float64
0:35
The result looks good. Let me verify the output file is valid and has the right structure.
0:35
Bash python3 -c "
python3 -c "
import geopandas as gpd

gdf = gpd.read_file('/work/nearest_hospital.gpkg')
print('Columns:', gdf.columns.tolist())
print('Count:', len(gdf))
print('CRS:', gdf.crs)
print()
print('Sample:')
print(gdf.head(10).to_string())

# Check constraints
print()
print('=== Validations ===')
print('All address_id non-null:', gdf['address_id'].notna().all())
print('All nearest_hospital_name non-empty:', (gdf['nearest_hospital_name'].astype(str).str.strip() != '').all())
print('All distance_m numeric & finite:', gdf['distance_m'].notna().all() and all(gdf['distance_m'].notna()))
print('All distance_m >= 0:', (gdf['distance_m'] >= 0).all())
print('Geometry type:', gdf.geometry.type.unique())
"
Columns: ['address_id', 'nearest_hospital_name', 'distance_m', 'geometry']
Count: 120
CRS: EPSG:4326

Sample:
  address_id                                  nearest_hospital_name   distance_m                    geometry
0      A0001                    Melomed Gatesville Private Hospital  3192.958737  POINT (18.53441 -33.94184)
1      A0002                         Life Vincent Pallotti Hospital  1468.601852  POINT (18.50296 -33.93623)
2      A0003                    Melomed Gatesville Private Hospital  1271.941611  POINT (18.53234 -33.95918)
3      A0004       Maitland Cottage Children's Orthopaedic Hospital   984.047189  POINT (18.44857 -33.98103)
4      A0005                                     Kingsbury Hospital  3021.737824  POINT (18.49928 -33.99649)
5      A0006  Intercare Physical Rehabilitation Hospital - Newlands  1777.422286  POINT (18.48709 -33.97114)
6      A0007                    Melomed Gatesville Private Hospital  1268.558960  POINT (18.54461 -33.96394)
7      A0008                    Melomed Gatesville Private Hospital  1264.240167  POINT (18.52017 -33.97329)
8      A0009                    Melomed Gatesville Private Hospital  2020.478965  POINT (18.51179 -33.97298)
9      A0010                          Red Cross Children's Hospital  1811.017934   POINT (18.49876 -33.9682)

=== Validations ===
All address_id non-null: True
All nearest_hospital_name non-empty: True
All distance_m numeric & finite: True
All distance_m >= 0: True
Geometry type: <ArrowStringArray>
['Point']
Length: 1, dtype: str
0:42
Everything checks out. All 120 addresses have been processed with:
- `address_id` preserved
- `nearest_hospital_name` as non-empty string
- `distance_m` as numeric, finite, non-negative (in metres)
- Original geometry preserved in WGS 84 (EPSG:4326)
- Output written to `/work/nearest_hospital.gpkg`

Let me confirm the file exists and is valid.
0:42