geo-l2-nyc-park-symdiff

Status: done
Score: 0.850
Duration: 1:41
Cost: 0.41¢
Model: google/gemma-4-26b-a4b-it
Model error Retrospective assessed 2026-06-09T12:06:46Z

Map (reference ← swipe → agent)

0:00
We're reconciling our official Parks polygons against an OSM-derived parks export. Both feed the public 'find a park' map and they disagree in dozens of places. `nyc_parks.gpkg` is a GeoPackage with official and OSM-derived park layers. I need every patch claimed by exactly one side, with adjacent disagreement patches merged together so that a shifted park's gain and loss form a single feature rather than two. One MultiPolygon per merged group with a `source` attribute whose value is exactly one of `parks_official`, `parks_osm`, or `both` (those exact strings, matching the input GPKG layer names). For each group I also need a label-anchor point that falls strictly inside the group's geometry. Please write two GeoJSON FeatureCollections. First, `parks_disagreement.geojson` with the `source` column. Second, `park_label_anchors.geojson` with the same `source` column, with exactly one anchor per disagreement group in matching order.
0:00
System info
{
  "adapter": "openrouter",
  "model": "google/gemma-4-26b-a4b-it",
  "base_url": "https://openrouter.ai/api/v1"
}
0:01
Bash ls -F
nyc_parks.gpkg
0:02
Bash ogrinfo nyc_parks.gpkg
INFO: Open of `nyc_parks.gpkg'
      using driver `GPKG' successful.
1: parks_official (Multi Polygon)
2: parks_osm (Multi Polygon)
1:01
Write /work/solve.py
content · 159 lines · py
import geopandas as gpd
import pandas as pd
from shapely.geometry import MultiPolygon, Polygon, Point
from shapely.ops import unary_union

def solve():
    # 1. Load data
    gdf_official = gpd.read_file("nyc_parks.gpkg", layer="parks_official")
    gdf_osm = gpd.read_file("nyc_parks.gpkg", layer="parks_osm")

    # Ensure CRS is consistent (though they should be from same GPKG)
    if gdf_official.crs != gdf_osm.crs:
        gdf_osm = gdf_osm.to_crs(gdf_official.crs)

    # 2. Compute Differences
    # "Claimed by exactly one side" means:
    # (Official - OSM) -> Official only
    # (OSM - Official) -> OSM only
    
    # Dissolve both to ensure we are dealing with unified geometries for difference ops
    union_official = gdf_official.unary_union
    union_osm = gdf_osm.unary_union

    diff_official = union_official.difference(union_osm)
    diff_osm = union_osm.difference(union_official)

    # 3. Handle MultiPolygon/Polygon fragmentation
    # We need to identify "adjacent disagreement patches merged together".
    # A "shifted park's gain and loss" should be a single feature.
    # Actually, if a park shifts, the area it left is (Official - OSM) 
    # and the area it gained is (OSM - Official).
    # These two areas are adjacent.
    
    # Let's build a single geometry representing all disagreements.
    # All disagreement = (Official XOR OSM)
    # But we need to group them.
    
    # Approach: 
    # Create a single geometry of all disagreements.
    # Then, find connected components of these disagreement areas.
    # Wait, the prompt says "every patch claimed by exactly one side, with adjacent 
    # disagreement patches merged together".
    
    # If a park shifts, the "loss" (Official \ OSM) and "gain" (OSM \ Official) 
    # share a boundary. 
    # We can merge all disagreement polygons into one set of polygons, 
    # then group them by adjacency.
    
    # Let's collect all "disagreement" polygons.
    all_disagreements = []
    
    # Convert diffs to a list of Polygons
    def get_polys(geom):
        if geom.is_empty:
            return []
        if isinstance(geom, Polygon):
            return [geom]
        if isinstance(geom, MultiPolygon):
            return list(geom.geoms)
        return []

    polys_official_only = get_polys(diff_official)
    polys_osm_only = get_polys(diff_osm)

    # We need to group them. If a poly from official_only touches a poly from osm_only, 
    # they belong to the same "disagreement group".
    
    # Combine all disjoint pieces into a list of geometries with their intended source.
    # However, the "source" attribute for a merged group must be one of:
    # 'parks_official', 'parks_osm', or 'both'.
    # Wait, the prompt says: "A source attribute whose value is exactly one of 
    # 'parks_official', 'parks_osm', or 'both'".
    # But it also says: "Every patch claimed by exactly one side".
    # If a group contains patches from BOTH sides (like a shifted park), 
    # the source should be 'both'.
    
    all_pieces = []
    for p in polys_official_only:
        all_pieces.append({'geometry': p, 'source': 'parks_official'})
    for p in polys_osm_only:
        all_pieces.append({'geometry': p, 'source': 'parks_osm'})

    if not all_pieces:
        print("No disagreements found.")
        return

    # Grouping by adjacency:
    # We can use a Disjoint Set Union (DSU) or a graph approach.
    # Two pieces are "adjacent" if they intersect or touch (share a boundary).
    # In Shapely, .touches() or .intersects()? 
    # If they share a boundary, they are adjacent. 
    # Since they are from a difference operation, they won't overlap, they will touch.
    
    num_pieces = len(all_pieces)
    adj = [[] for _ in range(num_pieces)]
    for i in range(num_pieces):
        for j in range(i + 1, num_pieces):
            if all_pieces[i]['geometry'].touches(all_pieces[j]['geometry']):
                adj[i].append(j)
                adj[j].append(i)

    # Find connected components
    visited = [False] * num_pieces
    groups = []
    for i in range(num_pieces):
        if not visited[i]:
            component = []
            stack = [i]
            visited[i] = True
            while stack:
                u = stack.pop()
                component.append(u)
                for v in adj[u]:
                    if not visited[v]:
                        visited[v] = True
                        stack.append(v)
            groups.append(component)

    # Create the merged features
    merged_features = []
    for group_indices in groups:
        group_geoms = [all_pieces[idx]['geometry'] for idx in group_indices]
        merged_geom = unary_union(group_geoms)
        
        # Determine source
        sources = set(all_pieces[idx]['source'] for idx in group_indices)
        if len(sources) > 1:
            final_source = 'both'
        else:
            final_source = list(sources)[0]
        
        # Label anchor: point strictly inside
        # representative_point() is guaranteed to be inside.
        anchor = merged_geom.representative_point()
        
        merged_features.append({
            'geometry': merged_geom,
            'source': final_source,
            'anchor': anchor
        })

    # Create GeoDataFrames
    # For parks_disagreement.geojson
    df_disag = gpd.GeoDataFrame([
        {'geometry': f['geometry'], 'source': f['source']} for f in merged_features
    ], crs=gdf_official.crs)
    df_disag.to_file("parks_disagreement.geojson", driver="GeoJSON")

    # For park_label_anchors.geojson
    df_anchors = gpd.GeoDataFrame([
        {'geometry': f['anchor'], 'source': f['source']} for f in merged_features
    ], crs=gdf_official.crs)
    df_anchors.to_file("park_label_anchors.geojson", driver="GeoJSON")
    
    print("Successfully created output files.")

if __name__ == "__main__":
    solve()
Wrote /work/solve.py (5887 chars).
1:03
Bash python /work/solve.py
Successfully created output files.
/work/solve.py:21: DeprecationWarning: The 'unary_union' attribute is deprecated, use the 'union_all()' method instead.
  union_official = gdf_official.unary_union
/work/solve.py:22: DeprecationWarning: The 'unary_union' attribute is deprecated, use the 'union_all()' method instead.
  union_osm = gdf_osm.unary_union
1:06
Bash ls -l parks_disagreement.geojson park_label_anchors.geojson
-rw-r--r-- 1 agent agent   7733 Jun  7 15:51 park_label_anchors.geojson
-rw-r--r-- 1 agent agent 169362 Jun  7 15:51 parks_disagreement.geojson
1:14
Bash python -c "import geopandas as gpd; d = gpd.read_file('parks_disagreement.geojso...
python -c "import geopandas as gpd; d = gpd.read_file('parks_disagreement.geojson'); print(d.head()); print(d['source'].unique())"
python -c "import geopandas as gpd; d = gpd.read_file('park_label_anchors.geojson'); print(d.head()); print(d['source'].unique())"
           source                                           geometry
0  parks_official  POLYGON ((976145.648 208296.516, 976120.339 20...
1  parks_official  POLYGON ((973618.104 221247.219, 973613.812 22...
2  parks_official  POLYGON ((970496.405 230792.269, 970474.375 23...
3            both  MULTIPOLYGON (((980789.804 220491.549, 980805....
4  parks_official  POLYGON ((980566.935 233198.034, 980456.847 23...
<ArrowStringArray>
['parks_official', 'both', 'parks_osm']
Length: 3, dtype: str
           source                       geometry
0  parks_official   POINT (976123.09 208245.917)
1  parks_official  POINT (973656.824 221358.283)
2  parks_official  POINT (970748.291 230848.511)
3            both  POINT (980838.201 220522.885)
4  parks_official  POINT (980625.989 233218.086)
<ArrowStringArray>
['parks_official', 'both', 'parks_osm']
Length: 3, dtype: str
1:27
Bash # Let's check if the number of features matches
# Let's check if the number of features matches
python -c "import geopandas as gpd; d1 = gpd.read_file('parks_disagreement.geojson'); d2 = gpd.read_file('park_label_anchors.geojson'); print(f'disagreement count: {len(d1)}'); print(f'anchors count: {len(d2)}'); print(f'match: {len(d1) == len(d2)}')"
disagreement count: 46
anchors count: 46
match: True
1:38