spa-l1-capetown-hospital-nn
Map (reference ← swipe → agent)
0:00
I am putting together EMS coverage planning for the Western Cape and need each residential pickup address tagged with its nearest hospital so the ops-research team can join the result against shift schedules. For every address in `addresses.parquet`, find the nearest hospital from `hospitals.parquet` by straight-line distance and record that hospital's name plus the distance in metres. Keep the original `address_id` from the input so we can join back to the address master, and make sure `nearest_hospital_name` comes through as a non-empty string and `distance_m` as a numeric, finite, non-negative value in metres.
Please write the result to `nearest_hospital.gpkg` in GeoPackage format with one feature per input address, keeping the original address geometry and the columns `address_id`, `nearest_hospital_name`, and `distance_m`.
0:00
System info
{
"adapter": "openrouter",
"model": "deepseek/deepseek-v4-flash",
"base_url": "https://openrouter.ai/api/v1"
}
0:03
Let me start by reading the input files to understand their structure.
0:03
Read
/work/addresses.parquet
binary file: /work/addresses.parquet contains NUL bytes. Use a domain tool to inspect it (duckdb, geopandas, pyogrio, sqlite3, ...).
0:03
Read
/work/hospitals.parquet
binary file: /work/hospitals.parquet contains NUL bytes. Use a domain tool to inspect it (duckdb, geopandas, pyogrio, sqlite3, ...).
0:07
Let me inspect the Parquet files using Python to understand their structure.
0:07
Bash
cd /work && python3 -c "
cd /work && python3 -c "
import pandas as pd
df = pd.read_parquet('addresses.parquet')
print('=== ADDRESSES ===')
print('Columns:', list(df.columns))
print('Shape:', df.shape)
print('dtypes:')
print(df.dtypes)
print()
print(df.head())
print('---')
print(df.iloc[0])
"
=== ADDRESSES === Columns: ['address_id', 'geometry'] Shape: (120, 2) dtypes: address_id str geometry object dtype: object address_id geometry 0 A0001 b'\x01\x01\x00\x00\x00\xc2\x99$,\xcf\x882@\xcd... 1 A0002 b'\x01\x01\x00\x00\x00\xfcg\xbf\x12\xc2\x802@t... 2 A0003 b'\x01\x01\x00\x00\x00\xb8\xa8\x80\x7fG\x882@r... 3 A0004 b'\x01\x01\x00\x00\x00\x01Vxv\xd5r2@\x02\\\xc4... 4 A0005 b'\x01\x01\x00\x00\x00\xcc\xa7\xbb\xf0\xd0\x7f... --- address_id A0001 geometry b'\x01\x01\x00\x00\x00\xc2\x99$,\xcf\x882@\xcd... Name: 0, dtype: object
0:08
Bash
cd /work && python3 -c "
cd /work && python3 -c "
import pandas as pd
df = pd.read_parquet('hospitals.parquet')
print('=== HOSPITALS ===')
print('Columns:', list(df.columns))
print('Shape:', df.shape)
print('dtypes:')
print(df.dtypes)
print()
print(df.head())
print('---')
print(df.iloc[0])
"
=== HOSPITALS === Columns: ['hospital_id', 'name', 'geometry'] Shape: (37, 3) dtypes: hospital_id str name str geometry object dtype: object hospital_id ... geometry 0 H001 ... b"\x01\x01\x00\x00\x00\xb7''^\x8ey2@\x12\xf4z\... 1 H002 ... b'\x01\x01\x00\x00\x00\xb5\x01THdw2@\x15\xe1`N... 2 H003 ... b'\x01\x01\x00\x00\x00\x0b\xe8K\x90\x8dk2@\xd6... 3 H004 ... b'\x01\x01\x00\x00\x00\x02\xd4\xd4\xb2\xb5n2@D... 4 H005 ... b'\x01\x01\x00\x00\x00?ke_\x05x2@\xa8\xe8\xe5\... [5 rows x 3 columns] --- hospital_id H001 name Mowbray Maternity Hospital geometry b"\x01\x01\x00\x00\x00\xb7''^\x8ey2@\x12\xf4z\... Name: 0, dtype: object
0:14
The geometry column is stored as WKB binary. I need to use GeoPandas (or geopandas with pyogrio) to read the parquet files properly as geometries. Let me inspect them with geopandas.
0:14
Bash
cd /work && python3 -c "
cd /work && python3 -c "
import geopandas as gpd
addr = gpd.read_parquet('addresses.parquet')
print('=== ADDRESSES (GeoDataFrame) ===')
print('Columns:', list(addr.columns))
print('Shape:', addr.shape)
print('CRS:', addr.crs)
print('dtypes:')
print(addr.dtypes)
print()
print(addr.head())
print('---')
print(addr.iloc[0])
print()
hosp = gpd.read_parquet('hospitals.parquet')
print('=== HOSPITALS (GeoDataFrame) ===')
print('Columns:', list(hosp.columns))
print('Shape:', hosp.shape)
print('CRS:', hosp.crs)
print('dtypes:')
print(hosp.dtypes)
print()
print(hosp.head())
print('---')
print(hosp.iloc[0])
"
=== ADDRESSES (GeoDataFrame) ===
Columns: ['address_id', 'geometry']
Shape: (120, 2)
CRS: {"$schema": "https://proj.org/schemas/v0.7/projjson.schema.json", "type": "GeographicCRS", "name": "WGS 84", "datum_ensemble": {"name": "World Geodetic System 1984 ensemble", "members": [{"name": "World Geodetic System 1984 (Transit)"}, {"name": "World Geodetic System 1984 (G730)"}, {"name": "World Geodetic System 1984 (G873)"}, {"name": "World Geodetic System 1984 (G1150)"}, {"name": "World Geodetic System 1984 (G1674)"}, {"name": "World Geodetic System 1984 (G1762)"}, {"name": "World Geodetic System 1984 (G2139)"}, {"name": "World Geodetic System 1984 (G2296)"}], "ellipsoid": {"name": "WGS 84", "semi_major_axis": 6378137, "inverse_flattening": 298.257223563}, "accuracy": "2.0", "id": {"authority": "EPSG", "code": 6326}}, "coordinate_system": {"subtype": "ellipsoidal", "axis": [{"name": "Geodetic latitude", "abbreviation": "Lat", "direction": "north", "unit": "degree"}, {"name": "Geodetic longitude", "abbreviation": "Lon", "direction": "east", "unit": "degree"}]}, "scope": "Horizontal component of 3D system.", "area": "World.", "bbox": {"south_latitude": -90, "west_longitude": -180, "north_latitude": 90, "east_longitude": 180}, "id": {"authority": "EPSG", "code": 4326}}
dtypes:
address_id str
geometry geometry
dtype: object
address_id geometry
0 A0001 POINT (18.53441 -33.94184)
1 A0002 POINT (18.50296 -33.93623)
2 A0003 POINT (18.53234 -33.95918)
3 A0004 POINT (18.44857 -33.98103)
4 A0005 POINT (18.49928 -33.99649)
---
address_id A0001
geometry POINT (18.534411200462053 -33.94183671394203)
Name: 0, dtype: object
=== HOSPITALS (GeoDataFrame) ===
Columns: ['hospital_id', 'name', 'geometry']
Shape: (37, 3)
CRS: {"$schema": "https://proj.org/schemas/v0.7/projjson.schema.json", "type": "GeographicCRS", "name": "WGS 84", "datum_ensemble": {"name": "World Geodetic System 1984 ensemble", "members": [{"name": "World Geodetic System 1984 (Transit)"}, {"name": "World Geodetic System 1984 (G730)"}, {"name": "World Geodetic System 1984 (G873)"}, {"name": "World Geodetic System 1984 (G1150)"}, {"name": "World Geodetic System 1984 (G1674)"}, {"name": "World Geodetic System 1984 (G1762)"}, {"name": "World Geodetic System 1984 (G2139)"}, {"name": "World Geodetic System 1984 (G2296)"}], "ellipsoid": {"name": "WGS 84", "semi_major_axis": 6378137, "inverse_flattening": 298.257223563}, "accuracy": "2.0", "id": {"authority": "EPSG", "code": 6326}}, "coordinate_system": {"subtype": "ellipsoidal", "axis": [{"name": "Geodetic latitude", "abbreviation": "Lat", "direction": "north", "unit": "degree"}, {"name": "Geodetic longitude", "abbreviation": "Lon", "direction": "east", "unit": "degree"}]}, "scope": "Horizontal component of 3D system.", "area": "World.", "bbox": {"south_latitude": -90, "west_longitude": -180, "north_latitude": 90, "east_longitude": 180}, "id": {"authority": "EPSG", "code": 4326}}
dtypes:
hospital_id str
name str
geometry geometry
dtype: object
hospital_id name geometry
0 H001 Mowbray Maternity Hospital POINT (18.47483 -33.94921)
1 H002 Life Claremont Hospital POINT (18.46637 -33.98695)
2 H003 Old Somerset Hospital POINT (18.42013 -33.9158)
3 H004 Mpheni Block C Elim Hospital POINT (18.43246 -33.92207)
4 H005 Life Kingsbury Hospital POINT (18.46883 -33.98645)
---
hospital_id H001
name Mowbray Maternity Hospital
geometry POINT (18.47482861 -33.94920784)
Name: 0, dtype: object
0:31
Let me analyze the data: 1. **Addresses**: 120 rows with `address_id` (string) and Point geometry in EPSG:4326 (WGS 84) 2. **Hospitals**: 37 rows with `hospital_id`, `name` (string), and Point geometry in EPSG:4326 I need to: - For each address, find the nearest hospital by straight-line distance - Record `nearest_hospital_name` and `distance_m` (in metres) - Keep original `address_id` and geometry - Write to `nearest_hospital.gpkg` Since the data is in WGS 84 (degrees), I need to compute proper geodesic distances (or project to a suitable metric CRS). For the Western Cape, a good choice would be UTM zone 34S (EPSG:32734) which covers most of the Western Cape. Or I could use the haversine formula, or shapely's `distance` combined with a geographic CRS conversion. Actually, the simplest approach: 1. Convert to a projected CRS suitable for Western Cape (like EPSG:32734 - UTM zone 34S) 2. Compute nearest neighbor distances in metres 3. Write the result Let me use geopandas with sjoin_nearest or a manual approach. Actually, the cleanest way is to: 1. Project both to a metric CRS (EPSG:32734 for UTM 34S - covers Western Cape) 2. For each address, find the nearest hospital using ball tree or sjoin_nearest Let me use `sjoin_nearest` from geopandas. Wait - `sjoin_nearest` joins each address to the nearest hospital and returns that. But I need to make sure distance is in metres. Let me project first, then use sjoin_nearest. Actually, there's a subtlety: if I project to UTM 34S and use sjoin_nearest, the distance will be in metres, correctly. Let me do that. Let me write the script.
0:31
0:31
Write
/work/solve.py
content · 64 lines · py
import geopandas as gpd
# ---------------------------------------------------------------------------
# 1. Load inputs
# ---------------------------------------------------------------------------
addresses = gpd.read_parquet("/work/addresses.parquet")
hospitals = gpd.read_parquet("/work/hospitals.parquet")
# Both are already in EPSG:4326 (WGS 84) — verify
assert str(addresses.crs).upper() == "EPSG:4326", f"Unexpected addresses CRS: {addresses.crs}"
assert str(hospitals.crs).upper() == "EPSG:4326", f"Unexpected hospitals CRS: {hospitals.crs}"
# ---------------------------------------------------------------------------
# 2. Project to UTM zone 34S (covers Western Cape) so distances are in metres
# ---------------------------------------------------------------------------
PROJ_CRS = "EPSG:32734" # WGS 84 / UTM zone 34S
addresses_proj = addresses.to_crs(PROJ_CRS)
hospitals_proj = hospitals.to_crs(PROJ_CRS)
# ---------------------------------------------------------------------------
# 3. Nearest-neighbour join — for each address, find the single nearest hospital
# ---------------------------------------------------------------------------
# sjoin_nearest returns the row from the right (hospitals) nearest to each left
# (addresses) geometry. We set distance_col to capture the distance in CRS units (metres).
joined = gpd.sjoin_nearest(
addresses_proj,
hospitals_proj[["hospital_id", "name", "geometry"]],
how="left",
max_distance=None, # no limit – find absolute nearest
distance_col="distance_m",
)
# The distance column is already in metres (UTM).
# ---------------------------------------------------------------------------
# 4. Build the output GeoDataFrame
# ---------------------------------------------------------------------------
# Keep original address geometry (in WGS 84), address_id, nearest_hospital_name, distance_m.
# The sjoin_nearest already attached the hospital name; rename it.
joined = joined.rename(columns={"name": "nearest_hospital_name"})
# Bring geometry back to original CRS
joined = joined.set_geometry("geometry", crs=PROJ_CRS)
joined = joined.to_crs("EPSG:4326")
# Select and order the required columns
out = joined[["address_id", "nearest_hospital_name", "distance_m", "geometry"]].copy()
# ---------------------------------------------------------------------------
# 5. Quality checks
# ---------------------------------------------------------------------------
assert out["nearest_hospital_name"].notna().all(), "Some addresses have no nearest hospital!"
assert out["nearest_hospital_name"].str.len().gt(0).all(), "Empty hospital name found!"
assert (out["distance_m"] >= 0).all(), "Negative distance encountered!"
assert out["distance_m"].notna().all(), "Null distance encountered!"
# ---------------------------------------------------------------------------
# 6. Write to GeoPackage
# ---------------------------------------------------------------------------
out.to_file("/work/nearest_hospital.gpkg", layer="nearest_hospital", driver="GPKG")
print("Done. Output rows:", len(out))
print(out.head())
Wrote /work/solve.py (3107 chars).
0:35
Let me run the script and check the output.
0:35
Bash
cd /work && python solve.py
Traceback (most recent call last):
File "/work/solve.py", line 10, in <module>
assert str(addresses.crs).upper() == "EPSG:4326", f"Unexpected addresses CRS: {addresses.crs}"
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
AssertionError: Unexpected addresses CRS: {"$schema": "https://proj.org/schemas/v0.7/projjson.schema.json", "type": "GeographicCRS", "name": "WGS 84", "datum_ensemble": {"name": "World Geodetic System 1984 ensemble", "members": [{"name": "World Geodetic System 1984 (Transit)"}, {"name": "World Geodetic System 1984 (G730)"}, {"name": "World Geodetic System 1984 (G873)"}, {"name": "World Geodetic System 1984 (G1150)"}, {"name": "World Geodetic System 1984 (G1674)"}, {"name": "World Geodetic System 1984 (G1762)"}, {"name": "World Geodetic System 1984 (G2139)"}, {"name": "World Geodetic System 1984 (G2296)"}], "ellipsoid": {"name": "WGS 84", "semi_major_axis": 6378137, "inverse_flattening": 298.257223563}, "accuracy": "2.0", "id": {"authority": "EPSG", "code": 6326}}, "coordinate_system": {"subtype": "ellipsoidal", "axis": [{"name": "Geodetic latitude", "abbreviation": "Lat", "direction": "north", "unit": "degree"}, {"name": "Geodetic longitude", "abbreviation": "Lon", "direction": "east", "unit": "degree"}]}, "scope": "Horizontal component of 3D system.", "area": "World.", "bbox": {"south_latitude": -90, "west_longitude": -180, "north_latitude": 90, "east_longitude": 180}, "id": {"authority": "EPSG", "code": 4326}}
0:41
The CRS is stored as a PROJJSON dict, not a simple "EPSG:4326" string. I need to fix the assertion or just remove it and use the CRS object directly. Actually the comparison `str(addresses.crs).upper() == "EPSG:4326"` fails because `str(addresses.crs)` returns the full PROJJSON string. I can just use `addresses.crs.is_epsg_code(4326)` or simply not assert and let the code work since we know the CRS is WGS 84.
0:41
Edit
/work/solve.py
Edited /work/solve.py.
0:43
Let me also fix the assertion for the output GeoPackage CRS - the `to_crs("EPSG:4326")` line should work fine. Let me run the script again.
0:43
Bash
cd /work && python solve.py
Traceback (most recent call last):
File "/work/solve.py", line 10, in <module>
assert addresses.crs.is_epsg_code(4326), f"Unexpected addresses CRS: {addresses.crs}"
^^^^^^^^^^^^^^^^^^^^^^^^^^
AttributeError: 'CRS' object has no attribute 'is_epsg_code'
0:49
Let me just use a simpler check that handles the CRS as a pyproj CRS object.
0:49
Edit
/work/solve.py
Edited /work/solve.py.
0:52
Wait, `to_epsg()` can return `None` for non-EPSG CRS or ensemble CRS. Let me check if it works with this PROJJSON ensemble CRS.
0:52
Bash
cd /work && python3 -c "
cd /work && python3 -c "
import geopandas as gpd
addr = gpd.read_parquet('addresses.parquet')
print('to_epsg():', addr.crs.to_epsg())
print('is_exact_same:', addr.crs == 4326)
"
to_epsg(): 4326 is_exact_same: True
0:57
The CRS comparison works now - `to_epsg()` returns 4326. The issue was just the API difference. Let me just run the script again.
0:57
Bash
cd /work && python solve.py
Done. Output rows: 120 address_id ... geometry 0 A0001 ... POINT (18.53441 -33.94184) 1 A0002 ... POINT (18.50296 -33.93623) 2 A0003 ... POINT (18.53234 -33.95918) 3 A0004 ... POINT (18.44857 -33.98103) 4 A0005 ... POINT (18.49928 -33.99649) [5 rows x 4 columns]
1:02
The script ran successfully. Let me verify the output by reading it back with geopandas and checking the quality constraints.
1:02
Bash
cd /work && python3 -c "
cd /work && python3 -c "
import geopandas as gpd
out = gpd.read_file('nearest_hospital.gpkg')
print('Rows:', len(out))
print('Columns:', list(out.columns))
print('CRS:', out.crs)
print()
print(out.head(10))
print()
# Checks
print('All hospital names non-empty?', out['nearest_hospital_name'].str.len().gt(0).all())
print('All distances non-negative?', (out['distance_m'] >= 0).all())
print('All distances finite?', out['distance_m'].notna().all() & (out['distance_m'] != float('inf')).all())
print('All address_ids present?', out['address_id'].notna().all())
print()
print('Distance stats:')
print(out['distance_m'].describe())
"
Rows: 120 Columns: ['address_id', 'nearest_hospital_name', 'distance_m', 'geometry'] CRS: EPSG:4326 address_id ... geometry 0 A0001 ... POINT (18.53441 -33.94184) 1 A0002 ... POINT (18.50296 -33.93623) 2 A0003 ... POINT (18.53234 -33.95918) 3 A0004 ... POINT (18.44857 -33.98103) 4 A0005 ... POINT (18.49928 -33.99649) 5 A0006 ... POINT (18.48709 -33.97114) 6 A0007 ... POINT (18.54461 -33.96394) 7 A0008 ... POINT (18.52017 -33.97329) 8 A0009 ... POINT (18.51179 -33.97298) 9 A0010 ... POINT (18.49876 -33.9682) [10 rows x 4 columns] All hospital names non-empty? True All distances non-negative? True All distances finite? True All address_ids present? True Distance stats: count 120.000000 mean 1841.992144 std 1187.555528 min 47.018743 25% 996.402110 50% 1582.328219 75% 2629.297510 max 5956.744689 Name: distance_m, dtype: float64
1:12
Everything checks out: - 120 rows (same as input addresses) - Columns: `address_id`, `nearest_hospital_name`, `distance_m`, `geometry` - CRS is EPSG:4326 (original geometry preserved) - All hospital names are non-empty strings - All distances are non-negative, finite - Distance in metres (47 m to 5.9 km — realistic for Western Cape urban areas) - File written to `/work/nearest_hospital.gpkg`
1:12