geo-l2-nyc-park-symdiff
Model error
Retrospective
assessed 2026-06-09T12:06:46Z
Map (reference ← swipe → agent)
0:00
We're reconciling our official Parks polygons against an OSM-derived parks export. Both feed the public 'find a park' map and they disagree in dozens of places. `nyc_parks.gpkg` is a GeoPackage with official and OSM-derived park layers. I need every patch claimed by exactly one side, with adjacent disagreement patches merged together so that a shifted park's gain and loss form a single feature rather than two. One MultiPolygon per merged group with a `source` attribute whose value is exactly one of `parks_official`, `parks_osm`, or `both` (those exact strings, matching the input GPKG layer names). For each group I also need a label-anchor point that falls strictly inside the group's geometry.
Please write two GeoJSON FeatureCollections. First, `parks_disagreement.geojson` with the `source` column. Second, `park_label_anchors.geojson` with the same `source` column, with exactly one anchor per disagreement group in matching order.
0:00
System info
{
"adapter": "openrouter",
"model": "google/gemma-4-26b-a4b-it",
"base_url": "https://openrouter.ai/api/v1"
}
0:01
Bash
ls -F
nyc_parks.gpkg
0:02
Bash
ogrinfo nyc_parks.gpkg
INFO: Open of `nyc_parks.gpkg'
using driver `GPKG' successful.
1: parks_official (Multi Polygon)
2: parks_osm (Multi Polygon)
1:01
Write
/work/solve.py
content · 159 lines · py
import geopandas as gpd
import pandas as pd
from shapely.geometry import MultiPolygon, Polygon, Point
from shapely.ops import unary_union
def solve():
# 1. Load data
gdf_official = gpd.read_file("nyc_parks.gpkg", layer="parks_official")
gdf_osm = gpd.read_file("nyc_parks.gpkg", layer="parks_osm")
# Ensure CRS is consistent (though they should be from same GPKG)
if gdf_official.crs != gdf_osm.crs:
gdf_osm = gdf_osm.to_crs(gdf_official.crs)
# 2. Compute Differences
# "Claimed by exactly one side" means:
# (Official - OSM) -> Official only
# (OSM - Official) -> OSM only
# Dissolve both to ensure we are dealing with unified geometries for difference ops
union_official = gdf_official.unary_union
union_osm = gdf_osm.unary_union
diff_official = union_official.difference(union_osm)
diff_osm = union_osm.difference(union_official)
# 3. Handle MultiPolygon/Polygon fragmentation
# We need to identify "adjacent disagreement patches merged together".
# A "shifted park's gain and loss" should be a single feature.
# Actually, if a park shifts, the area it left is (Official - OSM)
# and the area it gained is (OSM - Official).
# These two areas are adjacent.
# Let's build a single geometry representing all disagreements.
# All disagreement = (Official XOR OSM)
# But we need to group them.
# Approach:
# Create a single geometry of all disagreements.
# Then, find connected components of these disagreement areas.
# Wait, the prompt says "every patch claimed by exactly one side, with adjacent
# disagreement patches merged together".
# If a park shifts, the "loss" (Official \ OSM) and "gain" (OSM \ Official)
# share a boundary.
# We can merge all disagreement polygons into one set of polygons,
# then group them by adjacency.
# Let's collect all "disagreement" polygons.
all_disagreements = []
# Convert diffs to a list of Polygons
def get_polys(geom):
if geom.is_empty:
return []
if isinstance(geom, Polygon):
return [geom]
if isinstance(geom, MultiPolygon):
return list(geom.geoms)
return []
polys_official_only = get_polys(diff_official)
polys_osm_only = get_polys(diff_osm)
# We need to group them. If a poly from official_only touches a poly from osm_only,
# they belong to the same "disagreement group".
# Combine all disjoint pieces into a list of geometries with their intended source.
# However, the "source" attribute for a merged group must be one of:
# 'parks_official', 'parks_osm', or 'both'.
# Wait, the prompt says: "A source attribute whose value is exactly one of
# 'parks_official', 'parks_osm', or 'both'".
# But it also says: "Every patch claimed by exactly one side".
# If a group contains patches from BOTH sides (like a shifted park),
# the source should be 'both'.
all_pieces = []
for p in polys_official_only:
all_pieces.append({'geometry': p, 'source': 'parks_official'})
for p in polys_osm_only:
all_pieces.append({'geometry': p, 'source': 'parks_osm'})
if not all_pieces:
print("No disagreements found.")
return
# Grouping by adjacency:
# We can use a Disjoint Set Union (DSU) or a graph approach.
# Two pieces are "adjacent" if they intersect or touch (share a boundary).
# In Shapely, .touches() or .intersects()?
# If they share a boundary, they are adjacent.
# Since they are from a difference operation, they won't overlap, they will touch.
num_pieces = len(all_pieces)
adj = [[] for _ in range(num_pieces)]
for i in range(num_pieces):
for j in range(i + 1, num_pieces):
if all_pieces[i]['geometry'].touches(all_pieces[j]['geometry']):
adj[i].append(j)
adj[j].append(i)
# Find connected components
visited = [False] * num_pieces
groups = []
for i in range(num_pieces):
if not visited[i]:
component = []
stack = [i]
visited[i] = True
while stack:
u = stack.pop()
component.append(u)
for v in adj[u]:
if not visited[v]:
visited[v] = True
stack.append(v)
groups.append(component)
# Create the merged features
merged_features = []
for group_indices in groups:
group_geoms = [all_pieces[idx]['geometry'] for idx in group_indices]
merged_geom = unary_union(group_geoms)
# Determine source
sources = set(all_pieces[idx]['source'] for idx in group_indices)
if len(sources) > 1:
final_source = 'both'
else:
final_source = list(sources)[0]
# Label anchor: point strictly inside
# representative_point() is guaranteed to be inside.
anchor = merged_geom.representative_point()
merged_features.append({
'geometry': merged_geom,
'source': final_source,
'anchor': anchor
})
# Create GeoDataFrames
# For parks_disagreement.geojson
df_disag = gpd.GeoDataFrame([
{'geometry': f['geometry'], 'source': f['source']} for f in merged_features
], crs=gdf_official.crs)
df_disag.to_file("parks_disagreement.geojson", driver="GeoJSON")
# For park_label_anchors.geojson
df_anchors = gpd.GeoDataFrame([
{'geometry': f['anchor'], 'source': f['source']} for f in merged_features
], crs=gdf_official.crs)
df_anchors.to_file("park_label_anchors.geojson", driver="GeoJSON")
print("Successfully created output files.")
if __name__ == "__main__":
solve()
Wrote /work/solve.py (5887 chars).
1:03
Bash
python /work/solve.py
Successfully created output files.
/work/solve.py:21: DeprecationWarning: The 'unary_union' attribute is deprecated, use the 'union_all()' method instead. union_official = gdf_official.unary_union /work/solve.py:22: DeprecationWarning: The 'unary_union' attribute is deprecated, use the 'union_all()' method instead. union_osm = gdf_osm.unary_union
1:06
Bash
ls -l parks_disagreement.geojson park_label_anchors.geojson
-rw-r--r-- 1 agent agent 7733 Jun 7 15:51 park_label_anchors.geojson -rw-r--r-- 1 agent agent 169362 Jun 7 15:51 parks_disagreement.geojson
1:14
Bash
python -c "import geopandas as gpd; d = gpd.read_file('parks_disagreement.geojso...
python -c "import geopandas as gpd; d = gpd.read_file('parks_disagreement.geojson'); print(d.head()); print(d['source'].unique())"
python -c "import geopandas as gpd; d = gpd.read_file('park_label_anchors.geojson'); print(d.head()); print(d['source'].unique())"
source geometry
0 parks_official POLYGON ((976145.648 208296.516, 976120.339 20...
1 parks_official POLYGON ((973618.104 221247.219, 973613.812 22...
2 parks_official POLYGON ((970496.405 230792.269, 970474.375 23...
3 both MULTIPOLYGON (((980789.804 220491.549, 980805....
4 parks_official POLYGON ((980566.935 233198.034, 980456.847 23...
<ArrowStringArray>
['parks_official', 'both', 'parks_osm']
Length: 3, dtype: str
source geometry
0 parks_official POINT (976123.09 208245.917)
1 parks_official POINT (973656.824 221358.283)
2 parks_official POINT (970748.291 230848.511)
3 both POINT (980838.201 220522.885)
4 parks_official POINT (980625.989 233218.086)
<ArrowStringArray>
['parks_official', 'both', 'parks_osm']
Length: 3, dtype: str
1:27
Bash
# Let's check if the number of features matches
# Let's check if the number of features matches
python -c "import geopandas as gpd; d1 = gpd.read_file('parks_disagreement.geojson'); d2 = gpd.read_file('park_label_anchors.geojson'); print(f'disagreement count: {len(d1)}'); print(f'anchors count: {len(d2)}'); print(f'match: {len(d1) == len(d2)}')"
disagreement count: 46 anchors count: 46 match: True
1:38