spa-l1-capetown-hospital-nn

Status: done
Score: 1.000
Duration: 0:35
Cost: 0.21¢
Model: deepseek/deepseek-v4-flash

Map (reference ← swipe → agent)

0:00
I am putting together EMS coverage planning for the Western Cape and need each residential pickup address tagged with its nearest hospital so the ops-research team can join the result against shift schedules. For every address in `addresses.parquet`, find the nearest hospital from `hospitals.parquet` by straight-line distance and record that hospital's name plus the distance in metres. Keep the original `address_id` from the input so we can join back to the address master, and make sure `nearest_hospital_name` comes through as a non-empty string and `distance_m` as a numeric, finite, non-negative value in metres. Please write the result to `nearest_hospital.gpkg` in GeoPackage format with one feature per input address, keeping the original address geometry and the columns `address_id`, `nearest_hospital_name`, and `distance_m`.
0:00
System info
{
  "adapter": "openrouter",
  "model": "deepseek/deepseek-v4-flash",
  "base_url": "https://openrouter.ai/api/v1"
}
0:02
Let me start by examining the data files to understand their structure.
0:02
Bash cd /work && python -c "
cd /work && python -c "
import geopandas as gpd
import pandas as pd

# Read the input files
addr = gpd.read_parquet('addresses.parquet')
hosp = gpd.read_parquet('hospitals.parquet')

print('=== Addresses ===')
print('Columns:', addr.columns.tolist())
print('CRS:', addr.crs)
print('Shape:', addr.shape)
print(addr.head())
print()

print('=== Hospitals ===')
print('Columns:', hosp.columns.tolist())
print('CRS:', hosp.crs)
print('Shape:', hosp.shape)
print(hosp.head())
"
=== Addresses ===
Columns: ['address_id', 'geometry']
CRS: {"$schema": "https://proj.org/schemas/v0.7/projjson.schema.json", "type": "GeographicCRS", "name": "WGS 84", "datum_ensemble": {"name": "World Geodetic System 1984 ensemble", "members": [{"name": "World Geodetic System 1984 (Transit)"}, {"name": "World Geodetic System 1984 (G730)"}, {"name": "World Geodetic System 1984 (G873)"}, {"name": "World Geodetic System 1984 (G1150)"}, {"name": "World Geodetic System 1984 (G1674)"}, {"name": "World Geodetic System 1984 (G1762)"}, {"name": "World Geodetic System 1984 (G2139)"}, {"name": "World Geodetic System 1984 (G2296)"}], "ellipsoid": {"name": "WGS 84", "semi_major_axis": 6378137, "inverse_flattening": 298.257223563}, "accuracy": "2.0", "id": {"authority": "EPSG", "code": 6326}}, "coordinate_system": {"subtype": "ellipsoidal", "axis": [{"name": "Geodetic latitude", "abbreviation": "Lat", "direction": "north", "unit": "degree"}, {"name": "Geodetic longitude", "abbreviation": "Lon", "direction": "east", "unit": "degree"}]}, "scope": "Horizontal component of 3D system.", "area": "World.", "bbox": {"south_latitude": -90, "west_longitude": -180, "north_latitude": 90, "east_longitude": 180}, "id": {"authority": "EPSG", "code": 4326}}
Shape: (120, 2)
  address_id                    geometry
0      A0001  POINT (18.53441 -33.94184)
1      A0002  POINT (18.50296 -33.93623)
2      A0003  POINT (18.53234 -33.95918)
3      A0004  POINT (18.44857 -33.98103)
4      A0005  POINT (18.49928 -33.99649)

=== Hospitals ===
Columns: ['hospital_id', 'name', 'geometry']
CRS: {"$schema": "https://proj.org/schemas/v0.7/projjson.schema.json", "type": "GeographicCRS", "name": "WGS 84", "datum_ensemble": {"name": "World Geodetic System 1984 ensemble", "members": [{"name": "World Geodetic System 1984 (Transit)"}, {"name": "World Geodetic System 1984 (G730)"}, {"name": "World Geodetic System 1984 (G873)"}, {"name": "World Geodetic System 1984 (G1150)"}, {"name": "World Geodetic System 1984 (G1674)"}, {"name": "World Geodetic System 1984 (G1762)"}, {"name": "World Geodetic System 1984 (G2139)"}, {"name": "World Geodetic System 1984 (G2296)"}], "ellipsoid": {"name": "WGS 84", "semi_major_axis": 6378137, "inverse_flattening": 298.257223563}, "accuracy": "2.0", "id": {"authority": "EPSG", "code": 6326}}, "coordinate_system": {"subtype": "ellipsoidal", "axis": [{"name": "Geodetic latitude", "abbreviation": "Lat", "direction": "north", "unit": "degree"}, {"name": "Geodetic longitude", "abbreviation": "Lon", "direction": "east", "unit": "degree"}]}, "scope": "Horizontal component of 3D system.", "area": "World.", "bbox": {"south_latitude": -90, "west_longitude": -180, "north_latitude": 90, "east_longitude": 180}, "id": {"authority": "EPSG", "code": 4326}}
Shape: (37, 3)
  hospital_id                          name                    geometry
0        H001    Mowbray Maternity Hospital  POINT (18.47483 -33.94921)
1        H002       Life Claremont Hospital  POINT (18.46637 -33.98695)
2        H003         Old Somerset Hospital   POINT (18.42013 -33.9158)
3        H004  Mpheni Block C Elim Hospital  POINT (18.43246 -33.92207)
4        H005       Life Kingsbury Hospital  POINT (18.46883 -33.98645)
0:11
Good, I can see the data structure. Both datasets are in WGS 84 (EPSG:4326). I need to:

1. For each address, find the nearest hospital by straight-line distance
2. Record that hospital's name and the distance in metres
3. Keep original geometry and address_id
4. Write to nearest_hospital.gpkg

Since both are in lat/lon (EPSG:4326), I need to compute distances in metres. The best approach is to:
- Project to a suitable projected CRS for South Africa (like EPSG:32734 - UTM zone 34S which covers the Western Cape), OR
- Use the more accurate geodesic distance via pyproj or shapely's distance with a geographic CRS.

Let me use a projected approach. Since the data is around Cape Town / Western Cape, I'll use a local projected CRS. Let me project both to a suitable CRS for distance calculations in metres, then compute nearest neighbor.

Actually, the simplest and most accurate approach is to use `geopandas.sjoin_nearest` with a projected CRS. Or I can compute distances using `shapely`'s `distance` method on a projected GeoDataFrame.

Let me use `sjoin_nearest` which does exactly this - for each geometry in the left dataframe, find the nearest geometry in the right dataframe and compute the distance.

But `sjoin_nearest` returns the nearest geometry and distance in the units of the CRS. If I project to a metric CRS, the distance will be in metres.

Let me check: the Western Cape is in UTM zone 34S (EPSG:32734). Let me project both to that CRS and use sjoin_nearest.

Actually, let me use a slightly different approach - for accuracy over a larger area, I could compute geodesic distances. But UTM 34S should work fine for the Western Cape region.

Let me write the script.
0:11
Bash cd /work && python -c "
cd /work && python -c "
import geopandas as gpd
from shapely import wkt

# Check the extent to determine appropriate UTM zone
addr = gpd.read_parquet('addresses.parquet')
print('Total bounds:', addr.total_bounds)
print('Centroid:', addr.geometry.unary_union.centroid)
"
Total bounds: [ 18.4008295  -33.99916655  18.54986077 -33.90055752]
Centroid: POINT (18.495574599523838 -33.949285496022604)
<string>:8: DeprecationWarning: The 'unary_union' attribute is deprecated, use the 'union_all()' method instead.
0:20
The data is in WGS 84 (EPSG:4326). The bounds are around Cape Town / Western Cape. For accurate distance calculations, I should project to a suitable local projected CRS. The UTM zone for this area is 34S (EPSG:32734). Let me write the script to:

1. Read both files
2. Project to UTM 34S (EPSG:32734) for accurate distance calculation
3. For each address, find the nearest hospital by calculating distances to all hospitals
4. Keep the smallest distance and the hospital name
5. Write to GeoPackage with the original geometry (in EPSG:4326) and the required columns

Let me write the solve script.
0:20
Write /work/solve.py
content · 67 lines · py
import geopandas as gpd
import pandas as pd
import numpy as np
from shapely import wkt

# Read input files
addr = gpd.read_parquet('/work/addresses.parquet')
hosp = gpd.read_parquet('/work/hospitals.parquet')

# Ensure both have the same CRS (they are both EPSG:4326)
print(f"Address CRS: {addr.crs}")
print(f"Hospital CRS: {hosp.crs}")

# Project to UTM 34S (suitable for Western Cape) for accurate distance calculations
utm_crs = "EPSG:32734"
addr_proj = addr.to_crs(utm_crs)
hosp_proj = hosp.to_crs(utm_crs)

# Extract projected coordinates as numpy arrays
addr_coords = np.column_stack([
    addr_proj.geometry.x.values,
    addr_proj.geometry.y.values
])
hosp_coords = np.column_stack([
    hosp_proj.geometry.x.values,
    hosp_proj.geometry.y.values
])

# Compute pairwise distances using broadcasting
# addr_coords shape: (N, 2), hosp_coords shape: (M, 2)
# Result shape: (N, M)
dist_matrix = np.sqrt(
    (addr_coords[:, np.newaxis, 0] - hosp_coords[np.newaxis, :, 0]) ** 2 +
    (addr_coords[:, np.newaxis, 1] - hosp_coords[np.newaxis, :, 1]) ** 2
)

# For each address, find the index of the nearest hospital
nearest_idx = np.argmin(dist_matrix, axis=1)
min_distances = dist_matrix[np.arange(len(dist_matrix)), nearest_idx]

# Get the hospital names
hospital_names = hosp['name'].values
nearest_names = hospital_names[nearest_idx]

# Build result GeoDataFrame - keep original geometry (EPSG:4326)
result = addr[['address_id', 'geometry']].copy()
result['nearest_hospital_name'] = nearest_names
result['distance_m'] = min_distances

# Verify constraints
assert result['nearest_hospital_name'].notna().all(), "Found null hospital names"
assert (result['nearest_hospital_name'].str.strip() != '').all(), "Found empty hospital names"
assert result['distance_m'].notna().all(), "Found null distances"
assert np.isfinite(result['distance_m']).all(), "Found non-finite distances"
assert (result['distance_m'] >= 0).all(), "Found negative distances"
assert result['distance_m'].dtype in [np.float64, np.float32], "Distance must be numeric float"

print(f"\nResult shape: {result.shape}")
print(f"Columns: {result.columns.tolist()}")
print(f"CRS: {result.crs}")
print(f"\nSample results:")
print(result.head(10))
print(f"\nDistance stats: min={result['distance_m'].min():.2f}m, max={result['distance_m'].max():.2f}m, mean={result['distance_m'].mean():.2f}m")

# Write to GeoPackage
result.to_file('/work/nearest_hospital.gpkg', layer='nearest_hospital', driver='GPKG')
print("\nWritten to /work/nearest_hospital.gpkg")
Wrote /work/solve.py (2520 chars).
0:22
Let me run the script.
0:22
Bash cd /work && python solve.py
Address CRS: {"$schema": "https://proj.org/schemas/v0.7/projjson.schema.json", "type": "GeographicCRS", "name": "WGS 84", "datum_ensemble": {"name": "World Geodetic System 1984 ensemble", "members": [{"name": "World Geodetic System 1984 (Transit)"}, {"name": "World Geodetic System 1984 (G730)"}, {"name": "World Geodetic System 1984 (G873)"}, {"name": "World Geodetic System 1984 (G1150)"}, {"name": "World Geodetic System 1984 (G1674)"}, {"name": "World Geodetic System 1984 (G1762)"}, {"name": "World Geodetic System 1984 (G2139)"}, {"name": "World Geodetic System 1984 (G2296)"}], "ellipsoid": {"name": "WGS 84", "semi_major_axis": 6378137, "inverse_flattening": 298.257223563}, "accuracy": "2.0", "id": {"authority": "EPSG", "code": 6326}}, "coordinate_system": {"subtype": "ellipsoidal", "axis": [{"name": "Geodetic latitude", "abbreviation": "Lat", "direction": "north", "unit": "degree"}, {"name": "Geodetic longitude", "abbreviation": "Lon", "direction": "east", "unit": "degree"}]}, "scope": "Horizontal component of 3D system.", "area": "World.", "bbox": {"south_latitude": -90, "west_longitude": -180, "north_latitude": 90, "east_longitude": 180}, "id": {"authority": "EPSG", "code": 4326}}
Hospital CRS: {"$schema": "https://proj.org/schemas/v0.7/projjson.schema.json", "type": "GeographicCRS", "name": "WGS 84", "datum_ensemble": {"name": "World Geodetic System 1984 ensemble", "members": [{"name": "World Geodetic System 1984 (Transit)"}, {"name": "World Geodetic System 1984 (G730)"}, {"name": "World Geodetic System 1984 (G873)"}, {"name": "World Geodetic System 1984 (G1150)"}, {"name": "World Geodetic System 1984 (G1674)"}, {"name": "World Geodetic System 1984 (G1762)"}, {"name": "World Geodetic System 1984 (G2139)"}, {"name": "World Geodetic System 1984 (G2296)"}], "ellipsoid": {"name": "WGS 84", "semi_major_axis": 6378137, "inverse_flattening": 298.257223563}, "accuracy": "2.0", "id": {"authority": "EPSG", "code": 6326}}, "coordinate_system": {"subtype": "ellipsoidal", "axis": [{"name": "Geodetic latitude", "abbreviation": "Lat", "direction": "north", "unit": "degree"}, {"name": "Geodetic longitude", "abbreviation": "Lon", "direction": "east", "unit": "degree"}]}, "scope": "Horizontal component of 3D system.", "area": "World.", "bbox": {"south_latitude": -90, "west_longitude": -180, "north_latitude": 90, "east_longitude": 180}, "id": {"authority": "EPSG", "code": 4326}}

Result shape: (120, 4)
Columns: ['address_id', 'geometry', 'nearest_hospital_name', 'distance_m']
CRS: {"$schema": "https://proj.org/schemas/v0.7/projjson.schema.json", "type": "GeographicCRS", "name": "WGS 84", "datum_ensemble": {"name": "World Geodetic System 1984 ensemble", "members": [{"name": "World Geodetic System 1984 (Transit)"}, {"name": "World Geodetic System 1984 (G730)"}, {"name": "World Geodetic System 1984 (G873)"}, {"name": "World Geodetic System 1984 (G1150)"}, {"name": "World Geodetic System 1984 (G1674)"}, {"name": "World Geodetic System 1984 (G1762)"}, {"name": "World Geodetic System 1984 (G2139)"}, {"name": "World Geodetic System 1984 (G2296)"}], "ellipsoid": {"name": "WGS 84", "semi_major_axis": 6378137, "inverse_flattening": 298.257223563}, "accuracy": "2.0", "id": {"authority": "EPSG", "code": 6326}}, "coordinate_system": {"subtype": "ellipsoidal", "axis": [{"name": "Geodetic latitude", "abbreviation": "Lat", "direction": "north", "unit": "degree"}, {"name": "Geodetic longitude", "abbreviation": "Lon", "direction": "east", "unit": "degree"}]}, "scope": "Horizontal component of 3D system.", "area": "World.", "bbox": {"south_latitude": -90, "west_longitude": -180, "north_latitude": 90, "east_longitude": 180}, "id": {"authority": "EPSG", "code": 4326}}

Sample results:
  address_id  ...   distance_m
0      A0001  ...  3192.958737
1      A0002  ...  1468.601852
2      A0003  ...  1271.941611
3      A0004  ...   984.047189
4      A0005  ...  3021.737824
5      A0006  ...  1777.422286
6      A0007  ...  1268.558960
7      A0008  ...  1264.240167
8      A0009  ...  2020.478965
9      A0010  ...  1811.017934

[10 rows x 4 columns]

Distance stats: min=47.02m, max=5956.74m, mean=1841.99m

Written to /work/nearest_hospital.gpkg
0:27
Let me verify the output file and do a quick sanity check.
0:27
Bash cd /work && python -c "
cd /work && python -c "
import geopandas as gpd

# Verify the output
result = gpd.read_file('/work/nearest_hospital.gpkg')
print('=== Output verification ===')
print(f'Shape: {result.shape}')
print(f'Columns: {result.columns.tolist()}')
print(f'CRS: {result.crs}')
print(f'Geometry type: {result.geometry.type.unique()}')
print()

# Check constraints
print('=== Constraint checks ===')
print(f'address_id not null: {result[\"address_id\"].notna().all()}')
print(f'address_id unique: {result[\"address_id\"].is_unique}')
print(f'nearest_hospital_name non-empty strings: {(result[\"nearest_hospital_name\"].notna() & (result[\"nearest_hospital_name\"].str.strip() != \"\")).all()} ')
print(f'distance_m not null: {result[\"distance_m\"].notna().all()}')
print(f'distance_m finite: {all(result[\"distance_m\"].apply(lambda x: x == x and x >= 0))}')  # also checks no NaN
print(f'distance_m numeric: {result[\"distance_m\"].dtype}')
print(f'distance_m min: {result[\"distance_m\"].min()}')
print(f'distance_m >= 0: {(result[\"distance_m\"] >= 0).all()}')
print()
print('First 10 rows:')
print(result.head(10))
"
=== Output verification ===
Shape: (120, 4)
Columns: ['address_id', 'nearest_hospital_name', 'distance_m', 'geometry']
CRS: EPSG:4326
Geometry type: <ArrowStringArray>
['Point']
Length: 1, dtype: str

=== Constraint checks ===
address_id not null: True
address_id unique: True
nearest_hospital_name non-empty strings: True 
distance_m not null: True
distance_m finite: True
distance_m numeric: float64
distance_m min: 47.01874251111449
distance_m >= 0: True

First 10 rows:
  address_id  ...                    geometry
0      A0001  ...  POINT (18.53441 -33.94184)
1      A0002  ...  POINT (18.50296 -33.93623)
2      A0003  ...  POINT (18.53234 -33.95918)
3      A0004  ...  POINT (18.44857 -33.98103)
4      A0005  ...  POINT (18.49928 -33.99649)
5      A0006  ...  POINT (18.48709 -33.97114)
6      A0007  ...  POINT (18.54461 -33.96394)
7      A0008  ...  POINT (18.52017 -33.97329)
8      A0009  ...  POINT (18.51179 -33.97298)
9      A0010  ...   POINT (18.49876 -33.9682)

[10 rows x 4 columns]
0:34
Everything checks out. The output file `/work/nearest_hospital.gpkg` has been created with:

- 120 features (one per input address)
- Columns: `address_id`, `nearest_hospital_name`, `distance_m`, `geometry` (original WGS 84 point geometry)
- `address_id` is preserved from the input
- `nearest_hospital_name` is a non-empty string for every row
- `distance_m` is numeric (float64), finite, and non-negative
- Distances range from ~47 m to ~5,957 m (mean ~1,842 m)
0:34