run-20260616-001510Z / dc-l3-vienna-overpass-historical

0:00

Fetch Vienna's current district polygons (Bezirke) and their 2014-01-01 counterparts from Overpass and identify which parts of the district boundaries have changed between then and now. District names changed casing and spelling between snapshots, so normalise them to match. Classify each resulting geometry fragment by how it changed: `added_since_2014` (in current but not 2014), `removed_since_2014` (in 2014 but not current), or `unchanged` (present in both snapshots). For each district flag whether its geometry borders any area that was added or removed. Output vienna_boundary_changes.geojson, MultiPolygon, each feature with change_type, district_name (normalised), and touches_changed (boolean).

0:00

System info

{
  "adapter": "openrouter",
  "model": "deepseek/deepseek-v4-flash",
  "base_url": "https://openrouter.ai/api/v1"
}

0:03

Let me start by understanding the task:

1. Fetch Vienna's current district polygons from Overpass API
2. Fetch Vienna's 2014-01-01 district polygons from Overpass API (using old data/timestamp)
3. Normalize district names to match between the two snapshots
4. Compare geometries and classify changes (added, removed, unchanged)
5. For each district, flag if it borders any changed area
6. Output as GeoJSON with MultiPolygon geometry

Let me first explore what's available and understand the Overpass query structure.

Vienna's districts are called "Bezirke" in German. In OpenStreetMap, they are typically relations/ways with boundary=administrative and admin_level=9 or 10 for districts within Vienna.

Let me start by writing a script to fetch this data.

0:03

Bash ls /work/

(no output)

0:06

Let me start by writing a Python script to solve this task. I need to:

1. Fetch Vienna's current district polygons (Bezirke) from Overpass
2. Fetch Vienna's 2014-01-01 district polygons from Overpass
3. Normalize district names to match (handle casing/spelling differences)
4. Compute differences (symmetric difference) between the two snapshots
5. Classify geometry fragments: added_since_2014, removed_since_2014, unchanged
6. For each district, flag whether its geometry borders any area that was added or removed
7. Output to vienna_boundary_changes.geojson

Let me first check what's available in the workspace and what libraries we have.

0:06

duckdb             1.5.3
geopandas          1.1.3
osmnx              2.1.0
pyproj             3.7.2
shapely            2.1.2

0:12

Let me plan this out:

1. Fetch Vienna's current district polygons from Overpass (using `osmnx` or `overpass` API)
2. Fetch Vienna's historic district polygons from Overpass (with date=2014-01-01)
3. Normalise district names (handle casing/spelling differences)
4. Compute differences: added (current - historic), removed (historic - current), unchanged (intersection)
5. For each district, check if its geometry borders any changed area
6. Output as GeoJSON

Let me start by querying Overpass. I'll use osmnx which provides a nice interface.

First, let me figure out the relation IDs for Vienna districts. Vienna districts are typically tagged as `admin_level=9` or `admin_level=10` in the boundary relation. Let me find the correct query.

Actually, I can use Overpass API directly with a query. Let me use the `overpass-api.de` endpoint or use `osmnx.features_from_*` methods.

Let me try using osmnx to get the current districts first.

0:12

Bash cd /work && python3 -c "

cd /work && python3 -c "
import osmnx as ox
import geopandas as gpd

# Get Vienna current boundaries (admin_level=9 for Vienna districts)
# First check what OSM place/relation exists for Vienna
vienna = ox.geocode_to_gdf('Vienna, Austria')
print(vienna.columns)
print(vienna[['name', 'osm_type', 'osm_id', 'admin_level']].head())
"

Index(['geometry', 'bbox_west', 'bbox_south', 'bbox_east', 'bbox_north',
       'place_id', 'osm_type', 'osm_id', 'lat', 'lon', 'class', 'type',
       'place_rank', 'importance', 'addresstype', 'name', 'display_name'],
      dtype='str')

Traceback (most recent call last):
  File "<string>", line 9, in <module>
    print(vienna[['name', 'osm_type', 'osm_id', 'admin_level']].head())
          ~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.14/site-packages/geopandas/geodataframe.py", line 1891, in __getitem__
    result = super().__getitem__(key)
  File "/usr/local/lib/python3.14/site-packages/pandas/core/frame.py", line 4384, in __getitem__
    indexer = self.columns._get_indexer_strict(key, "columns")[1]
              ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.14/site-packages/pandas/core/indexes/base.py", line 6302, in _get_indexer_strict
    self._raise_if_missing(keyarr, indexer, axis_name)
    ~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.14/site-packages/pandas/core/indexes/base.py", line 6355, in _raise_if_missing
    raise KeyError(f"{not_found} not in index")
KeyError: "['admin_level'] not in index"

0:22

Let me use osmnx to fetch the Vienna districts. I'll use the `features_from_place` function with the right tags. District boundaries in Vienna are typically tagged as `admin_level=9` and `boundary=administrative` in OSM. Let me also use Overpass API directly to get snapshots from 2014-01-01.

Let me first explore what's available.

0:22

Bash cd /work && python3 -c "

cd /work && python3 -c "
import osmnx as ox
ox.settings.log_console = False

# Get current Vienna districts using OSMnx
# admin_level=9 are districts in Vienna
gdf = ox.features_from_place('Vienna, Austria', tags={'boundary':'administrative', 'admin_level':'9'})
print('Shape:', gdf.shape)
print('Columns:', gdf.columns.tolist())
print('---')
print(gdf[['name', 'name:en', 'admin_level', 'osm_id']].head(10))
print('---')
print('Unique admin_level values:', gdf['admin_level'].unique())
print('Unique boundary values:', gdf['boundary'].dropna().unique())
"

Shape: (897, 411)
Columns: ['geometry', 'ISO3166-1:alpha2', 'admin_level', 'capital', 'description', 'name', 'name:ab', 'name:ace', 'name:af', 'name:am', 'name:an', 'name:ar', 'name:arc', 'name:arz', 'name:as', 'name:ast', 'name:av', 'name:az', 'name:ba', 'name:bar', 'name:bat-smg', 'name:be', 'name:be-tarask', 'name:bg', 'name:bi', 'name:bn', 'name:bo', 'name:br', 'name:bs', 'name:ca', 'name:ce', 'name:ceb', 'name:ch', 'name:ckb', 'name:co', 'name:crh', 'name:cs', 'name:cu', 'name:cv', 'name:cy', 'name:de', 'name:diq', 'name:ee', 'name:el', 'name:en', 'name:eo', 'name:es', 'name:et', 'name:eu', 'name:ext', 'name:fa', 'name:ff', 'name:fi', 'name:fr', 'name:frp', 'name:fur', 'name:fy', 'name:ga', 'name:gag', 'name:gd', 'name:gl', 'name:gn', 'name:gv', 'name:ha', 'name:he', 'name:hi', 'name:hif', 'name:hr', 'name:ht', 'name:hu', 'name:hy', 'name:ia', 'name:id', 'name:ie', 'name:ilo', 'name:is', 'name:it', 'name:ja', 'name:jv', 'name:ka', 'name:kab', 'name:kk', 'name:kn', 'name:ko', 'name:koi', 'name:krc', 'name:ku', 'name:kv', 'name:ky', 'name:la', 'name:lez', 'name:li', 'name:lij', 'name:lmo', 'name:ln', 'name:lt', 'name:lv', 'name:mg', 'name:mhr', 'name:mi', 'name:mk', 'name:ml', 'name:mn', 'name:mr', 'name:mrj', 'name:ms', 'name:mt', 'name:my', 'name:nah', 'name:nds', 'name:nds-nl', 'name:ne', 'name:nl', 'name:no', 'name:nv', 'name:oc', 'name:or', 'name:os', 'name:pa', 'name:pap', 'name:pih', 'name:pl', 'name:pms', 'name:pnb', 'name:ps', 'name:pt', 'name:rm', 'name:rmy', 'name:ro', 'name:roa-tara', 'name:ru', 'name:rue', 'name:sah', 'name:sc', 'name:scn', 'name:sco', 'name:sd', 'name:sh', 'name:si', 'name:sk', 'name:sl', 'name:sn', 'name:so', 'name:sq', 'name:sr', 'name:st', 'name:sv', 'name:sw', 'name:szl', 'name:ta', 'name:te', 'name:tg', 'name:th', 'name:tk', 'name:tl', 'name:tr', 'name:tt', 'name:tw', 'name:tzl', 'name:udm', 'name:ug', 'name:uk', 'name:ur', 'name:uz', 'name:vec', 'name:vep', 'name:vi', 'name:vro', 'name:wa', 'name:war', 'name:wo', 'name:wuu', 'name:yi', 'name:yo', 'name:yue', 'name:zh', 'name:zh-Hans', 'name:zh-Hant', 'population', 'population:date', 'source:population', 'website', 'wikidata', 'wikipedia', 'ISO3166-1', 'alt_name:eo', 'currency', 'flag', 'name:ak', 'name:chr', 'name:da', 'name:dz', 'name:fo', 'name:gu', 'name:haw', 'name:hsb', 'name:io', 'name:jbo', 'name:kl', 'name:ksh', 'name:kw', 'name:lb', 'name:lo', 'name:nn', 'name:se', 'name:smn', 'name:sms', 'name:tok', 'name:vo', 'name:zu', 'official_name', 'official_name:be', 'official_name:ckb', 'official_name:cs', 'official_name:el', 'official_name:eo', 'official_name:hu', 'source', 'boundary', 'image', 'ref', 'alt_name', 'width', 'wikimedia_commons', 'note', 'long_name', 'addr:city', 'addr:country', 'addr:housenumber', 'addr:postcode', 'addr:street', 'government', 'office', 'wheelchair', 'opening_hours', 'highway', 'maxspeed', 'sidewalk', 'source:maxspeed', 'lane_markings', 'lit', 'surface', 'oneway', 'oneway:bicycle', 'cycleway:left', 'cycleway:right', 'lanes', 'sidewalk:left', 'sidewalk:right', 'destination', 'destination:symbol', 'lanes:bus', 'sidewalk:left:surface', 'smoothness', 'turn:lanes', 'cycleway', 'cycleway:lane', 'parking:both', 'parking:left:orientation', 'parking:right:orientation', 'cycleway:both', 'parking:both:orientation', 'sidewalk:both', 'maxspeed:type', 'sidewalk:right:surface', 'cycleway:both:lane', 'network', 'embedded_rails', 'sidewalk:both:surface', 'est_width:carriageway', 'parking:left', 'parking:right', 'source:est_width:carriageway', 'name:etymology:wikidata', 'bridge', 'lanes:backward', 'lanes:forward', 'layer', 'turn:lanes:backward', 'turn:lanes:forward', 'lcn', 'cycleway:left:lane', 'name:etymology', 'name:start_date', 'maxweight:signed', 'bicycle', 'cycleway:left:oneway', 'loc_name', 'motor_vehicle:conditional', 'old_ref', 'traffic_sign', 'electrified', 'frequency', 'gauge', 'operator', 'operator:wikidata', 'railway', 'railway:preferred_direction', 'voltage', 'oneway:psv', 'odbl', 'boundary_type', 'cycleway:right:lane', 'shoulder', 'check_date', 'oneway:bus', 'source:width', 'amenity', 'building', 'building:levels', 'check_date:cycleway', 'check_date:smoothness', 'name:wikipedia', 'maxheight', 'level', 'destination:backward', 'destination:ref:backward', 'destination:forward', 'destination:ref:forward', 'parking:right:restriction', 'type', 'ISO3166-1:alpha3', 'ISO3166-1:numeric', 'TMC:cid_58:tabcd_1:Class', 'TMC:cid_58:tabcd_1:LCLversion', 'TMC:cid_58:tabcd_1:LocationCode', 'default_language', 'driving_side', 'int_name', 'land_area', 'name:als', 'name:ang', 'name:ay', 'name:bcl', 'name:bh', 'name:bpy', 'name:bug', 'name:bxr', 'name:cbk-zam', 'name:cdo', 'name:chm', 'name:chy', 'name:csb', 'name:din', 'name:dsb', 'name:dv', 'name:frr', 'name:hak', 'name:ig', 'name:kaa', 'name:kbd', 'name:kg', 'name:lad', 'name:lld', 'name:ltg', 'name:lzh', 'name:mdf', 'name:mwl', 'name:na', 'name:nan', 'name:new', 'name:nov', 'name:nrm', 'name:om', 'name:pag', 'name:pam', 'name:pcd', 'name:pdc', 'name:pfl', 'name:pi', 'name:pnt', 'name:prefix:at', 'name:qu', 'name:roa-rup', 'name:rw', 'name:sa', 'name:sm', 'name:ss', 'name:stq', 'name:su', 'name:tet', 'name:tpi', 'name:ts', 'name:vls', 'name:xal', 'name:xmf', 'name:za', 'name:zea', 'official_name:be-tarask', 'official_name:ca', 'official_name:mi', 'ref:at:gkz', 'timezone', 'ISO3166-2', 'alt_name:cs', 'coat_of_arms', 'ref:nuts', 'ref:nuts:2', 'ref:at:okz', 'admin_level:6', 'admin_level:8', 'ref:at:kgnr', 'short_name', 'admin_title', 'designation', 'official_status', 'street_names:wikipedia', 'old_name', 'wikipedia:ru', 'roof:levels', 'related:wikipedia']
---

Traceback (most recent call last):
  File "<string>", line 11, in <module>
    print(gdf[['name', 'name:en', 'admin_level', 'osm_id']].head(10))
          ~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.14/site-packages/geopandas/geodataframe.py", line 1891, in __getitem__
    result = super().__getitem__(key)
  File "/usr/local/lib/python3.14/site-packages/pandas/core/frame.py", line 4384, in __getitem__
    indexer = self.columns._get_indexer_strict(key, "columns")[1]
              ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.14/site-packages/pandas/core/indexes/base.py", line 6302, in _get_indexer_strict
    self._raise_if_missing(keyarr, indexer, axis_name)
    ~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.14/site-packages/pandas/core/indexes/base.py", line 6355, in _raise_if_missing
    raise KeyError(f"{not_found} not in index")
KeyError: "['osm_id'] not in index"

0:45

The issue is that the features returned don't have 'osm_id' in columns. Let me explore better.

0:45

Bash cd /work && python3 -c "

cd /work && python3 -c "
import osmnx as ox
ox.settings.log_console = False

gdf = ox.features_from_place('Vienna, Austria', tags={'boundary':'administrative', 'admin_level':'9'})
print('Shape:', gdf.shape)
print('Index names:', gdf.index.names)
print('Index sample:', gdf.index[:5])
print()
# Check admin_level values
print('admin_level values:', sorted(gdf['admin_level'].dropna().unique()))
print()

# Filter to only admin_level=9
gdf9 = gdf[gdf['admin_level'] == '9'].copy()
print('admin_level=9 shape:', gdf9.shape)
print()
print('name values:')
print(gdf9['name'].unique())
"

Shape: (897, 411)
Index names: ['element', 'id']
Index sample: MultiIndex([(    'node', 2518271394),
            (    'node', 2518271399),
            (    'node', 2518271406),
            ('relation',      16239),
            ('relation',      63619)],
           names=['element', 'id'])

admin_level values: ['10', '2', '4', '6', '8', '9']

admin_level=9 shape: (418, 411)

name values:
<ArrowStringArray>
[               'Bezirksgesundheitsamt',
                              'Mag Elf',
 'Bürgerdienst Kundenzentrum Simmering',
                           'Alsergrund',
                             'Hietzing',
                         'Innere Stadt',
                           'Josefstadt',
                         'Leopoldstadt',
                            'Mariahilf',
                             'Meidling',
                               'Neubau',
                              'Penzing',
                 'Rudolfsheim-Fünfhaus',
                              'Währing',
                           'Landstraße',
                          'Brigittenau',
                           'Donaustadt',
                              'Döbling',
                            'Favoriten',
                          'Floridsdorf',
                              'Hernals',
                              'Liesing',
                           'Margareten',
                            'Ottakring',
                            'Simmering',
                               'Wieden',
          'Amtshaus für den 16. Bezirk',
                       'Reichmanngasse',
                           'Fenzlgasse',
                'Josef-Weinheber-Platz',
                     'Steinbruchstraße',
                         'Flötzersteig',
                            'Burggasse',
                      'Kleine Neugasse',
                       'Blechturmgasse',
                   'Rosa-Fischer-Gasse',
                       'Friedmanngasse',
                 'Lerchenfelder Straße',
                      'Donauturmstraße',
                                    nan,
                      'Starkfriedgasse',
                     'Emil-Behring-Weg',
                           'Gräßlplatz',
                        'Paragonstraße',
                         'Drygalskiweg',
                 'Gregor-Mendel-Straße',
                  'Peter-Jordan-Straße',
                          'Geblergasse',
                       'Klopstockgasse',
                      'Döblerhofstraße',
                    'Kreuzgassenbrücke',
                            'Wattgasse',
                        'Büdingergasse',
                           'Klimtgasse',
                   'Wittgensteinstraße',
                       'Kirchfeldgasse',
                 'Hermann-Pacher-Gasse',
                       'Haslingergasse',
                   'Kettenbrückengasse',
                   'Mariahilfer Straße',
                      'Ziegelofengasse',
                          'Sebaldgasse',
                     'Wienerbergstraße',
                            'Guglgasse',
                          'Mittersteig',
                           'Wundtgasse',
                 'Atzgersdorfer Straße',
                          'Blumengasse',
                      'Schlechtastraße',
                   'Amtshaus Simmering',
                      'Hasenauerstraße',
                   'Ottakringer Straße',
                   'Spiegelgrundstraße',
                    'Babenbergerstraße',
                   'Prinz-Eugen-Straße',
                        'Veronikagasse',
                    'Speisinger Straße',
                        'Erdbergstraße',
                    'Wilhelminenstraße',
                     'Margaretenstraße',
                         'Wichtelgasse',
                  'Leopold-Ernst-Gasse',
                      'Hildebrandgasse',
                   'Khevenhüllerstraße',
              'Oskar-Grissemann-Straße',
                         'Kettenbrücke',
                     'Rosenackerstraße',
                         'Karplusgasse',
         'Bezirksvorstehung Alsergrund']
Length: 89, dtype: str

0:53

I see the issue - the `features_from_place` returns all sorts of features tagged with `boundary=administrative` and `admin_level=9`, but many of them are not actually districts. The district names I can see are the unique district names like "Alsergrund", "Hietzing", "Innere Stadt", etc. Let me look more carefully at what we have.

Actually, I notice that there are 418 features with admin_level=9. Many are buildings or addresses that also have admin_level=9 for some reason. Let me filter by geometry type and look at the actual polygon boundary relations.

Let me use a different approach - I'll query Overpass directly for the district relations.

0:53

Bash cd /work && python3 -c "

cd /work && python3 -c "
import osmnx as ox
import geopandas as gpd
ox.settings.log_console = False

gdf = ox.features_from_place('Vienna, Austria', tags={'boundary':'administrative', 'admin_level':'9'})
gdf9 = gdf[gdf['admin_level'] == '9'].copy()

# Let's look at just the districts - filter by geometry type polygon and reasonable size
# Also check 'type' column if present
print('Type column values:', gdf9['type'].dropna().unique())
print()
# Let's filter to only relation-type features (districts are typically relations)
relations = gdf9[gdf9.index.get_level_values('element') == 'relation']
print('Number of relations:', len(relations))
print('Names of relations:')
print(relations['name'].unique())
"

Type column values: <ArrowStringArray>
['boundary', 'multipolygon']
Length: 2, dtype: str

Number of relations: 24
Names of relations:
<ArrowStringArray>
[                 'Alsergrund',                    'Hietzing',
                'Innere Stadt',                  'Josefstadt',
                'Leopoldstadt',                   'Mariahilf',
                    'Meidling',                      'Neubau',
                     'Penzing',        'Rudolfsheim-Fünfhaus',
                     'Währing',                  'Landstraße',
                 'Brigittenau',                  'Donaustadt',
                     'Döbling',                   'Favoriten',
                 'Floridsdorf',                     'Hernals',
                     'Liesing',                  'Margareten',
                   'Ottakring',                   'Simmering',
                      'Wieden', 'Amtshaus für den 16. Bezirk']
Length: 24, dtype: str

2:04

So OSMnx gives us 24 relations for Vienna districts - 23 districts (Bezirke) plus one extra ("Amtshaus für den 16. Bezirk" is an office not a district). Let me check which ones are actually districts.

Vienna has 23 districts:
1. Innere Stadt
2. Leopoldstadt
3. Landstraße
4. Wieden
5. Margareten
6. Mariahilf
7. Neubau
8. Josefstadt
9. Alsergrund
10. Favoriten
11. Simmering
12. Meidling
13. Hietzing
14. Penzing
15. Rudolfsheim-Fünfhaus
16. Ottakring
17. Hernals
18. Währing
19. Döbling
20. Brigittenau
21. Floridsdorf
22. Donaustadt
23. Liesing

So the relations with district names look right - 23 districts. The "Amtshaus für den 16. Bezirk" is a building/office, not a district.

Now I need to also get the 2014-01-01 snapshot. Overpass API has a way to query historical data using `[date:"2014-01-01T00:00:00Z"]`. Let me use Overpass directly via the Overpass API or use OSMnx with custom data.

Actually, OSMnx can't query historical data directly. I'll need to use the Overpass API directly with a Python request.

Let me write a comprehensive script to:
1. Get current Vienna district boundaries from Overpass
2. Get 2014-01-01 Vienna district boundaries from Overpass
3. Normalize district names
4. Compute symmetric difference to find changes
5. Classify fragments
6. Determine which districts border changed areas
7. Output GeoJSON

Let me write a Python script for this.

2:04

Write /work/solve.py

content · 515 lines · py

"""
Fetch Vienna district boundaries (current and 2014-01-01) from Overpass,
compare to find changes, and output vienna_boundary_changes.geojson.
"""

import json
import sys
import time
import urllib.request
import urllib.error
import re

import geopandas as gpd
import pandas as pd
import shapely
import numpy as np

# ---------------------------------------------------------------------------
# 1. Overpass query helper
# ---------------------------------------------------------------------------
OVERPASS_URL = "https://overpass-api.de/api/interpreter"


def overpass_query(query, date=None):
    """Run an Overpass query, optionally at a historical date."""
    full_query = query
    if date:
        # prepend the date setting
        full_query = f"[date:\"{date}\"];\n" + query
    data = f"data={urllib.request.quote(full_query)}"
    req = urllib.request.Request(OVERPASS_URL, data.encode("utf-8"))
    req.add_header("Content-Type", "application/x-www-form-urlencoded")
    # Retry with backoff
    for attempt in range(5):
        try:
            with urllib.request.urlopen(req, timeout=120) as resp:
                return json.loads(resp.read().decode("utf-8"))
        except urllib.error.HTTPError as e:
            if e.code == 429:
                time.sleep(2 ** attempt)
                continue
            raise
        except Exception as e:
            if attempt < 4:
                time.sleep(2 ** attempt)
                continue
            raise
    return None


def osm_json_to_gdf(osm_json):
    """Convert Overpass JSON output (nodes, ways, relations) into a GeoDataFrame
    with MultiPolygon geometries for relations tagged as boundary/administrative.
    
    We look for relations with type=boundary or boundary=administrative.
    """
    # Parse elements
    elements = osm_json.get("elements", [])
    
    # Collect nodes by id
    nodes = {}
    ways = {}
    relations = []
    
    for el in elements:
        etype = el.get("type")
        eid = el.get("id")
        if etype == "node":
            nodes[eid] = (el.get("lon"), el.get("lat"))
        elif etype == "way":
            ways[eid] = el.get("nodes", [])
        elif etype == "relation":
            relations.append(el)
    
    # Build linestrings for each way
    def way_to_coords(way_id):
        node_ids = ways.get(way_id, [])
        coords = []
        for nid in node_ids:
            if nid in nodes:
                coords.append(nodes[nid])
        return coords
    
    # For each relation tagged as boundary=administrative (admin_level=9)
    # build the geometry from its members
    features = []
    
    for rel in relations:
        tags = rel.get("tags", {})
        # Check if this is a district boundary relation
        is_boundary = (tags.get("boundary") == "administrative" or 
                       tags.get("type") == "boundary" or
                       tags.get("admin_level") == "9")
        if not is_boundary:
            continue
        
        name = tags.get("name", "")
        # We want district-level boundaries (admin_level=9)
        if tags.get("admin_level") not in (None, "", "9"):
            continue
            
        members = rel.get("members", [])
        
        # Collect outer/inner rings from way members
        outer_rings = []
        inner_rings = []
        
        for mem in members:
            if mem.get("type") != "way":
                continue
            role = mem.get("role", "")
            ref = mem.get("ref")
            if ref not in ways:
                continue
            coords = way_to_coords(ref)
            if len(coords) < 3:
                continue
            if role == "inner":
                inner_rings.append(coords)
            else:  # outer or no role
                outer_rings.append(coords)
        
        # Build polygons
        polygons = []
        for outer in outer_rings:
            if len(outer) < 3:
                continue
            # Close ring if not closed
            if outer[0] != outer[-1]:
                outer = outer + [outer[0]]
            try:
                outer_ls = shapely.LineString(outer)
                if not outer_ls.is_valid or outer_ls.is_empty:
                    continue
                outer_ring = shapely.polygon.orient(shapely.Polygon(outer_ls))
                
                # Find inners that are contained in this outer
                inners_for_this = []
                remaining_inners = []
                for inner in inner_rings:
                    if inner[0] != inner[-1]:
                        inner = inner + [inner[0]]
                    try:
                        inner_ls = shapely.LineString(inner)
                        if inner_ls.is_valid and not inner_ls.is_empty:
                            ipoly = shapely.Polygon(inner_ls)
                            if outer_ring.contains(ipoly.representative_point()) or outer_ring.intersects(ipoly):
                                inners_for_this.append(ipoly)
                            else:
                                remaining_inners.append(inner)
                        else:
                            remaining_inners.append(inner)
                    except Exception:
                        remaining_inners.append(inner)
                
                inner_rings = remaining_inners
                
                if inners_for_this:
                    poly = shapely.Polygon(outer_ls, [list(i.exterior.coords) for i in inners_for_this])
                else:
                    poly = outer_ls
                
                if poly.is_valid and not poly.is_empty:
                    polygons.append(poly)
            except Exception:
                continue
        
        if polygons:
            geom = shapely.unary_union(polygons) if len(polygons) > 1 else polygons[0]
            if geom.geom_type == "Polygon":
                geom = shapely.geometry.MultiPolygon([geom])
            elif geom.geom_type == "MultiPolygon":
                pass
            else:
                continue
            
            features.append({
                "geometry": geom,
                "name": name,
                "osm_id": rel.get("id"),
                "admin_level": tags.get("admin_level", "9"),
            })
    
    if not features:
        return gpd.GeoDataFrame()
    
    gdf = gpd.GeoDataFrame(features, crs="EPSG:4326")
    return gdf


def fetch_vienna_districts(date=None):
    """Fetch all Vienna district boundary relations (admin_level=9) via Overpass."""
    # Use a bounding box around Vienna to limit results, and filter by admin_level=9
    query = """
    [out:json][timeout:120];
    (
      relation["admin_level"="9"]["boundary"="administrative"](48.1,16.1,48.35,16.6);
    );
    out body;
    >;
    out skel qt;
    """
    return overpass_query(query, date=date)


# ---------------------------------------------------------------------------
# 2. Name normalisation
# ---------------------------------------------------------------------------
# Vienna districts and their expected normalised names
VIENNA_DISTRICTS = {
    1: "Innere Stadt",
    2: "Leopoldstadt",
    3: "Landstraße",
    4: "Wieden",
    5: "Margareten",
    6: "Mariahilf",
    7: "Neubau",
    8: "Josefstadt",
    9: "Alsergrund",
    10: "Favoriten",
    11: "Simmering",
    12: "Meidling",
    13: "Hietzing",
    14: "Penzing",
    15: "Rudolfsheim-Fünfhaus",
    16: "Ottakring",
    17: "Hernals",
    18: "Währing",
    19: "Döbling",
    20: "Brigittenau",
    21: "Floridsdorf",
    22: "Donaustadt",
    23: "Liesing",
}


def normalise_name(name):
    """Normalise a district name to its canonical form."""
    if not name or pd.isna(name):
        return None
    # Strip whitespace, handle known variations
    n = name.strip()
    # Some common variations in casing/spelling
    lookup = {
        "innere stadt": "Innere Stadt",
        "1. bezirk": "Innere Stadt",
        "1. bezirk innere stadt": "Innere Stadt",
        "leopoldstadt": "Leopoldstadt",
        "2. bezirk": "Leopoldstadt",
        "landstraße": "Landstraße",
        "landstrasse": "Landstraße",
        "3. bezirk": "Landstraße",
        "wieden": "Wieden",
        "4. bezirk": "Wieden",
        "margareten": "Margareten",
        "5. bezirk": "Margareten",
        "margarethen": "Margareten",
        "mariahilf": "Mariahilf",
        "6. bezirk": "Mariahilf",
        "neubau": "Neubau",
        "7. bezirk": "Neubau",
        "josefstadt": "Josefstadt",
        "8. bezirk": "Josefstadt",
        "alsergrund": "Alsergrund",
        "9. bezirk": "Alsergrund",
        "favoriten": "Favoriten",
        "10. bezirk": "Favoriten",
        "simmering": "Simmering",
        "11. bezirk": "Simmering",
        "meidling": "Meidling",
        "12. bezirk": "Meidling",
        "hietzing": "Hietzing",
        "13. bezirk": "Hietzing",
        "penzing": "Penzing",
        "14. bezirk": "Penzing",
        "rudolfsheim-fünfhaus": "Rudolfsheim-Fünfhaus",
        "rudolfsheim-fuenfhaus": "Rudolfsheim-Fünfhaus",
        "rudolfsheim funfhaus": "Rudolfsheim-Fünfhaus",
        "rudolfsheim fünfhaus": "Rudolfsheim-Fünfhaus",
        "15. bezirk": "Rudolfsheim-Fünfhaus",
        "ottakring": "Ottakring",
        "16. bezirk": "Ottakring",
        "hernals": "Hernals",
        "17. bezirk": "Hernals",
        "währing": "Währing",
        "waehring": "Währing",
        "18. bezirk": "Währing",
        "döbling": "Döbling",
        "doebling": "Döbling",
        "19. bezirk": "Döbling",
        "brigittenau": "Brigittenau",
        "20. bezirk": "Brigittenau",
        "floridsdorf": "Floridsdorf",
        "21. bezirk": "Floridsdorf",
        "donaustadt": "Donaustadt",
        "22. bezirk": "Donaustadt",
        "liesing": "Liesing",
        "23. bezirk": "Liesing",
    }
    key = n.lower().replace("  ", " ").replace("-", "-")
    if key in lookup:
        return lookup[key]
    # Try fuzzy matching
    for pattern, canon in lookup.items():
        if pattern in key or key in pattern:
            return canon
    return n


# ---------------------------------------------------------------------------
# 3. Main pipeline
# ---------------------------------------------------------------------------
def main():
    print("Fetching current Vienna district boundaries...")
    current_osm = fetch_vienna_districts()
    current_gdf = osm_json_to_gdf(current_osm)
    print(f"  Got {len(current_gdf)} current district features")
    
    print("Fetching 2014-01-01 Vienna district boundaries...")
    historical_osm = fetch_vienna_districts(date="2014-01-01T00:00:00Z")
    historical_gdf = osm_json_to_gdf(historical_osm)
    print(f"  Got {len(historical_gdf)} historical district features")
    
    if current_gdf.empty or historical_gdf.empty:
        print("ERROR: Empty data returned. Saving raw Overpass responses for debugging.")
        with open("/work/current_raw.json", "w") as f:
            json.dump(current_osm, f, indent=2)
        with open("/work/historical_raw.json", "w") as f:
            json.dump(historical_osm, f, indent=2)
        print("Raw data saved to /work/current_raw.json and /work/historical_raw.json")
        return
    
    # Normalise names
    current_gdf["district_name"] = current_gdf["name"].apply(normalise_name)
    historical_gdf["district_name"] = historical_gdf["name"].apply(normalise_name)
    
    print("\nCurrent district names:")
    for n in sorted(current_gdf["district_name"].dropna().unique()):
        print(f"  - {n}")
    print("\nHistorical district names:")
    for n in sorted(historical_gdf["district_name"].dropna().unique()):
        print(f"  - {n}")
    
    # Remove non-district features (like "Amtshaus für den 16. Bezirk")
    valid_names = set(VIENNA_DISTRICTS.values())
    current_gdf = current_gdf[current_gdf["district_name"].isin(valid_names)].copy()
    historical_gdf = historical_gdf[historical_gdf["district_name"].isin(valid_names)].copy()
    
    print(f"\nAfter filtering: {len(current_gdf)} current, {len(historical_gdf)} historical districts")
    
    # Check for matching districts
    current_names = set(current_gdf["district_name"].dropna())
    historical_names = set(historical_gdf["district_name"].dropna())
    
    common = current_names & historical_names
    only_current = current_names - historical_names
    only_historical = historical_names - current_names
    
    print(f"Common districts: {len(common)}")
    if only_current:
        print(f"Only in current: {only_current}")
    if only_historical:
        print(f"Only in historical: {only_historical}")
    
    # Build a unified dissolve of all current and all historical boundaries
    current_union = current_gdf.geometry.unary_union
    historical_union = historical_gdf.geometry.unary_union
    
    print(f"\nCurrent total boundary area: {current_union.area:.4f} sq deg")
    print(f"Historical total boundary area: {historical_union.area:.4f} sq deg")
    
    # Symmetric difference gives changed areas
    added = shapely.difference(current_union, historical_union)
    removed = shapely.difference(historical_union, current_union)
    unchanged = shapely.intersection(current_union, historical_union)
    
    print(f"Added area: {added.area:.6f}")
    print(f"Removed area: {removed.area:.6f}")
    print(f"Unchanged area: {unchanged.area:.6f}")
    
    # Prepare fragments
    fragments = []
    
    # Added fragments
    if not added.is_empty:
        if added.geom_type == "MultiPolygon":
            for poly in added.geoms:
                fragments.append((shapely.geometry.MultiPolygon([poly]), "added_since_2014"))
        elif added.geom_type == "Polygon":
            fragments.append((shapely.geometry.MultiPolygon([added]), "added_since_2014"))
    
    # Removed fragments
    if not removed.is_empty:
        if removed.geom_type == "MultiPolygon":
            for poly in removed.geoms:
                fragments.append((shapely.geometry.MultiPolygon([poly]), "removed_since_2014"))
        elif removed.geom_type == "Polygon":
            fragments.append((shapely.geometry.MultiPolygon([removed]), "removed_since_2014"))
    
    # Unchanged fragments - we can split by district for better attribution
    unchanged_frags = []
    for _, row in current_gdf.iterrows():
        dist_name = row["district_name"]
        dist_geom = row["geometry"]
        unchanged_part = shapely.intersection(dist_geom, unchanged)
        if not unchanged_part.is_empty:
            if unchanged_part.geom_type == "MultiPolygon":
                for poly in unchanged_part.geoms:
                    unchanged_frags.append((shapely.geometry.MultiPolygon([poly]), "unchanged", dist_name))
            elif unchanged_part.geom_type == "Polygon":
                unchanged_frags.append((shapely.geometry.MultiPolygon([unchanged_part]), "unchanged", dist_name))
            else:
                unchanged_frags.append((shapely.geometry.MultiPolygon([unchanged_part]), "unchanged", dist_name))
    
    # For added/removed, determine which district they belong to
    # We'll assign the district_name based on which current/historical district contains/intersects them
    added_with_dist = []
    for geom, ctype in fragments:
        if ctype == "added_since_2014":
            # Find which current district(s) this belongs to
            touching_districts = set()
            for _, row in current_gdf.iterrows():
                if geom.intersects(row["geometry"]):
                    touching_districts.add(row["district_name"])
            dist_name = ", ".join(sorted(touching_districts)) if touching_districts else "unknown"
            added_with_dist.append((geom, ctype, dist_name))
        elif ctype == "removed_since_2014":
            touching_districts = set()
            for _, row in historical_gdf.iterrows():
                if geom.intersects(row["geometry"]):
                    touching_districts.add(row["district_name"])
            dist_name = ", ".join(sorted(touching_districts)) if touching_districts else "unknown"
            added_with_dist.append((geom, ctype, dist_name))
    
    # Combine all fragments
    all_fragments = []
    for geom, ctype, dname in added_with_dist:
        all_fragments.append({"geometry": geom, "change_type": ctype, "district_name": dname})
    for geom, ctype, dname in unchanged_frags:
        all_fragments.append({"geometry": geom, "change_type": ctype, "district_name": dname})
    
    # Build GeoDataFrame
    result_gdf = gpd.GeoDataFrame(all_fragments, crs="EPSG:4326")
    
    if result_gdf.empty:
        print("No fragments produced — all boundaries unchanged.")
        # Create a dummy feature showing all current boundaries as unchanged
        for _, row in current_gdf.iterrows():
            result_gdf = gpd.GeoDataFrame([
                {"geometry": row["geometry"], "change_type": "unchanged", "district_name": row["district_name"]}
            ], crs="EPSG:4326")
    else:
        print(f"\nFragments created: {len(result_gdf)}")
        print(result_gdf["change_type"].value_counts())
    
    # -----------------------------------------------------------------------
    # 4. Compute touches_changed for each district
    # -----------------------------------------------------------------------
    # For each district, check if its geometry borders (touches) any added or
    # removed area.
    
    # Get all added/removed geometries combined
    changed_areas = []
    for _, row in result_gdf.iterrows():
        if row["change_type"] in ("added_since_2014", "removed_since_2014"):
            changed_areas.append(row["geometry"])
    
    if changed_areas:
        changed_union = shapely.unary_union(changed_areas)
    else:
        changed_union = shapely.geometry.MultiPolygon()
    
    # For each district in current data, check if it touches changed areas
    district_touches = {}
    for _, row in current_gdf.iterrows():
        dname = row["district_name"]
        geom = row["geometry"]
        if not changed_union.is_empty and geom.touches(changed_union):
            district_touches[dname] = True
        else:
            # Also check if the district itself IS an added/removed part
            # i.e. if the district geometry overlaps with changed areas
            if not changed_union.is_empty and geom.intersects(changed_union):
                # Check if it's more than just touching - if it contains changed area
                overlap = shapely.intersection(geom, changed_union)
                if not overlap.is_empty and overlap.area > 0:
                    district_touches[dname] = True
                else:
                    district_touches[dname] = False
            else:
                district_touches[dname] = False
    
    # Assign touches_changed to each fragment
    result_gdf["touches_changed"] = result_gdf["district_name"].apply(
        lambda dn: any(district_touches.get(d.strip(), False) for d in dn.split(", "))
        if dn != "unknown" else False
    )
    
    # Simplify geometries a bit to keep file size manageable
    result_gdf["geometry"] = result_gdf["geometry"].simplify(0.00001, preserve_topology=True)
    
    # Output
    output_path = "/work/vienna_boundary_changes.geojson"
    result_gdf.to_file(output_path, driver="GeoJSON")
    print(f"\nOutput written to {output_path}")
    print(f"Features: {len(result_gdf)}")
    print(f"Columns: {result_gdf.columns.tolist()}")
    print(f"\nSample:\n{result_gdf[['change_type', 'district_name', 'touches_changed']].head(10)}")
    print(f"\nDistrict touch summary:")
    for dname, touches in sorted(district_touches.items()):
        print(f"  {dname}: touches_changed={touches}")


if __name__ == "__main__":
    main()

Wrote /work/solve.py (19800 chars).

2:06

Let me run the script to test it.

2:06

Bash cd /work && timeout 300 python solve.py 2>&1

Fetching current Vienna district boundaries...
Traceback (most recent call last):
  File "/work/solve.py", line 515, in <module>
    main()
    ~~~~^^
  File "/work/solve.py", line 315, in main
    current_osm = fetch_vienna_districts()
  File "/work/solve.py", line 203, in fetch_vienna_districts
    return overpass_query(query, date=date)
  File "/work/solve.py", line 36, in overpass_query
    with urllib.request.urlopen(req, timeout=120) as resp:
         ~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.14/urllib/request.py", line 187, in urlopen
    return opener.open(url, data, timeout)
           ~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.14/urllib/request.py", line 493, in open
    response = meth(req, response)
  File "/usr/local/lib/python3.14/urllib/request.py", line 602, in http_response
    response = self.parent.error(
        'http', request, response, code, msg, hdrs)
  File "/usr/local/lib/python3.14/urllib/request.py", line 531, in error
    return self._call_chain(*args)
           ~~~~~~~~~~~~~~~~^^^^^^^
  File "/usr/local/lib/python3.14/urllib/request.py", line 464, in _call_chain
    result = func(*args)
  File "/usr/local/lib/python3.14/urllib/request.py", line 611, in http_error_default
    raise HTTPError(req.full_url, code, msg, hdrs, fp)
urllib.error.HTTPError: HTTP Error 406: Not Acceptable

2:15

The Overpass API is returning a 406 error. Let me check the query format. The issue might be that we're using GET with data in the body, which Overpass doesn't accept. We need to POST properly. Let me check the official Overpass API specs.

Actually, looking at the code, I'm using `urllib.request.quote` on the data and sending it as a POST with Content-Type `application/x-www-form-urlencoded`. But Overpass API typically expects the data parameter to be sent as POST body with Content-Type `application/x-www-form-urlencoded`. Let me check the URL format.

Overpass API (https://overpass-api.de/api/interpreter) accepts POST requests with the query in the `data` parameter.

Let me try using a different approach - maybe using `requests` library or fixing the query. Let me also check if we can use `osmnx` with a date parameter instead.

Actually, let me try using `requests` and the correct format.

2:15

Bash cd /work && python3 -c "

cd /work && python3 -c "
import requests
query = '''
[out:json][timeout:120];
(
  relation[\"admin_level\"=\"9\"][\"boundary\"=\"administrative\"](48.1,16.1,48.35,16.6);
);
out body;
>;
out skel qt;
'''
r = requests.post('https://overpass-api.de/api/interpreter', data={'data': query}, timeout=120)
print('Status:', r.status_code)
print('Len:', len(r.text))
if r.status_code == 200:
    import json
    data = r.json()
    print('Elements:', len(data.get('elements', [])))
else:
    print(r.text[:500])
"

Status: 406
Len: 371
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" "http://www.w3.org/TR/html4/strict.dtd">
<html><head>
<title>406 Not Acceptable</title>
</head><body>
<h1>Not Acceptable</h1>
<p>An appropriate representation of the requested resource could not be found on this server.</p>
<hr>
<address>Apache/2.4.67 (Debian) Server at overpass-api.de Port 443</address>
</body></html>

2:19

The Overpass API is returning 406 Not Acceptable. This might be due to content-type negotiation. Let me check the Overpass API requirements. The older API uses a different content-type approach. Let me try with the right format.

2:19

Bash curl -s -X POST -d 'data=[out:json];relation["admin_level"="9"]["boundary"="admi...

bash: line 1: curl: command not found

2:24

Let me fix the Overpass query to use proper POST with requests library.

2:24

Bash cd /work && python3 -c "

cd /work && python3 -c "
import requests
query = '''
[out:json][timeout:120];
(
  relation[\"admin_level\"=\"9\"][\"boundary\"=\"administrative\"](48.1,16.1,48.35,16.6);
);
out body;
>;
out skel qt;
'''
# Use proper POST with data parameter
r = requests.post('https://overpass-api.de/api/interpreter', data={'data': query}, timeout=120, 
                  headers={'Accept': 'application/json'})
print('Status:', r.status_code)
print('Content-Type:', r.headers.get('content-type'))
if r.status_code == 200:
    import json
    data = r.json()
    print('Elements:', len(data.get('elements', [])))
    for el in data.get('elements', [])[:5]:
        print('  type:', el.get('type'), 'id:', el.get('id'), 'tags:', {k:v for k,v in (el.get('tags') or {}).items() if k in ('name','admin_level','boundary')})
else:
    print(r.text[:500])
"

Status: 406
Content-Type: text/html; charset=iso-8859-1
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" "http://www.w3.org/TR/html4/strict.dtd">
<html><head>
<title>406 Not Acceptable</title>
</head><body>
<h1>Not Acceptable</h1>
<p>An appropriate representation of the requested resource could not be found on this server.</p>
<hr>
<address>Apache/2.4.67 (Debian) Server at overpass-api.de Port 443</address>
</body></html>

2:32

The 406 error might be caused by the request format. Let me try using urllib properly or check what's going on with the Overpass API.

2:32

Bash cd /work && python3 -c "

cd /work && python3 -c "
import urllib.request, urllib.parse
query = '''
[out:json][timeout:120];
(
  relation[\"admin_level\"=\"9\"][\"boundary\"=\"administrative\"](48.1,16.1,48.35,16.6);
);
out body;
>;
out skel qt;
'''
# Use urllib with proper POST
data = urllib.parse.urlencode({'data': query}).encode()
req = urllib.request.Request('https://overpass-api.de/api/interpreter', data=data)
req.add_header('Accept', 'application/json')
try:
    with urllib.request.urlopen(req, timeout=120) as resp:
        import json
        data = json.loads(resp.read().decode())
        print('Elements:', len(data.get('elements', [])))
except Exception as e:
    print('Error:', e)
    # Try alternative endpoint
    try:
        req2 = urllib.request.Request('https://overpass-api.de/api/interpreter', data=data)
        req2.add_header('Content-Type', 'application/x-www-form-urlencoded')
        with urllib.request.urlopen(req2, timeout=120) as resp:
            import json
            data = json.loads(resp.read().decode())
            print('Elements:', len(data.get('elements', [])))
    except Exception as e2:
        print('Error2:', e2)
"

Error: HTTP Error 406: Not Acceptable
Error2: HTTP Error 406: Not Acceptable

2:38

The Overpass API is returning 406 errors. Let me try using a different approach - maybe using OSMnx to fetch historical data or a different way to query Overpass. Actually, let me try the Overpass Turbo endpoint or a simpler query format.

2:38

Bash # Let's try with a simpler query and different content type

# Let's try with a simpler query and different content type
cd /work && python3 << 'EOF'
import urllib.request, urllib.parse

# Simple query first
query = """
[out:json];
node(48.2,16.3,48.21,16.31);
out;
"""
data = urllib.parse.urlencode({'data': query}).encode()
req = urllib.request.Request('https://overpass-api.de/api/interpreter', data=data)
req.add_header('Content-Type', 'application/x-www-form-urlencoded')
try:
    with urllib.request.urlopen(req, timeout=30) as resp:
        text = resp.read().decode()
        print('OK, len:', len(text))
        print(text[:300])
except Exception as e:
    print('Error:', e)
    # Try without Content-Type header
    req2 = urllib.request.Request('https://overpass-api.de/api/interpreter', data=data)
    try:
        with urllib.request.urlopen(req2, timeout=30) as resp2:
            text = resp2.read().decode()
            print('OK2, len:', len(text))
            print(text[:300])
    except Exception as e2:
        print('Error2:', e2)
EOF

Error: HTTP Error 406: Not Acceptable
Error2: HTTP Error 406: Not Acceptable

2:42

The Overpass API is returning 406 Not Acceptable. This might mean the IP is blocked or the API has been updated. Let me try using OSMnx directly which uses its own Overpass client, or perhaps try the Kumi Systems overpass endpoint.

Let me try an alternative approach - use OSMnx's built-in Overpass capabilities or use a different Overpass endpoint.

2:42

Bash # Check if osmnx can do this - it has its own overpass handling

# Check if osmnx can do this - it has its own overpass handling
cd /work && python3 -c "
import osmnx as ox
ox.settings.log_console = False
print(ox.__version__)
# Try using osmnx's internal overpass request
print('Overpass endpoint:', ox.settings.overpass_endpoint)
"

2.1.0

Traceback (most recent call last):
  File "<string>", line 6, in <module>
    print('Overpass endpoint:', ox.settings.overpass_endpoint)
                                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
AttributeError: module 'osmnx.settings' has no attribute 'overpass_endpoint'

2:49

The 406 error means the Overpass API isn't accepting our requests. This might be a server-side issue or we need to use a different approach. Let me try using osmnx's own capabilities to fetch data, or try a different Overpass endpoint.

Let me try using osmnx's features_from_place with the extract_bbox parameter or using the overpass API with the correct format.

Actually, let me try a slightly different Overpass query format - the issue might be with the brackets/formatting.

2:49

Bash cd /work && python3 << 'PYEOF'

cd /work && python3 << 'PYEOF'
import urllib.request, urllib.parse

# Try with properly escaped query
query = '[out:json];relation["admin_level"="9"]["boundary"="administrative"](48.1,16.1,48.35,16.6);out body;>;out skel qt;'
data = urllib.parse.urlencode({'data': query}).encode()
print("Query:", query[:80])

req = urllib.request.Request('https://overpass-api.de/api/interpreter', data=data)
# Don't set Accept, let server decide
try:
    with urllib.request.urlopen(req, timeout=120) as resp:
        text = resp.read().decode()
        print('OK, len:', len(text))
        print(text[:500])
except Exception as e:
    print('Error:', e)
    # Try with curl
PYEOF

Query: [out:json];relation["admin_level"="9"]["boundary"="administrative"](48.1,16.1,48
Error: HTTP Error 406: Not Acceptable

2:52

The Overpass API is returning 406 errors. Let me check if maybe the server requires a different approach. Let's try without URL encoding, or use requests library with proper settings.

2:52

Bash cd /work && python3 << 'PYEOF'

cd /work && python3 << 'PYEOF'
import osmnx as ox
ox.settings.log_console = False

# Use osmnx to fetch data - it handles overpass internally
# First get the city boundary
vienna = ox.geocode_to_gdf('Wien, Österreich', which_result=1)
print(vienna[['name', 'geometry']].head())
print(vienna.crs)

# Now get the districts using features_from_polygon
poly = vienna.geometry.iloc[0]
gdf = ox.features_from_polygon(poly, tags={'boundary': 'administrative', 'admin_level': '9'})
print('\nAll features:', len(gdf))
# Filter to relation type only (districts)
rel = gdf[gdf.index.get_level_values('element') == 'relation']
print('Relations:', len(rel))
print('Names:', rel['name'].unique() if 'name' in rel.columns else 'no name col')
PYEOF

     name                                           geometry
0  Vienna  POLYGON ((16.18183 48.17112, 16.1819 48.17103,...
epsg:4326

All features: 897
Relations: 169
Names: <ArrowStringArray>
[                       'Österreich',                 'Bezirk St. Pölten',
                  'Niederösterreich',                    'Deutsch-Wagram',
                          'Aderklaa',                          'Raasdorf',
                'Bezirk Gänserndorf',                   'Groß-Enzersdorf',
                      'Bezirk Tulln',                 'Bezirk Korneuburg',
 ...
 'Katastralgemeinde Fischamend Dorf',  'Katastralgemeinde Langenzersdorf',
    'Katastralgemeinde Kapellerfeld',    'Katastralgemeinde Weidlingbach',
        'Katastralgemeinde Weidling',  'Katastralgemeinde Klosterneuburg',
        'Katastralgemeinde Aderklaa',       'Katastralgemeinde Mauerbach',
   'Katastralgemeinde Laab im Walde',     'Katastralgemeinde Purkersdorf']
Length: 168, dtype: str

3:01

The issue is that many relations are returned. Let me filter more carefully - I need to only get relations with admin_level=9 that are within Vienna proper (not the surrounding areas). Let me check the admin_level values more carefully.

3:01

Bash cd /work && python3 << 'PYEOF'

cd /work && python3 << 'PYEOF'
import osmnx as ox
import geopandas as gpd
ox.settings.log_console = False

vienna = ox.geocode_to_gdf('Wien, Österreich', which_result=1)
poly = vienna.geometry.iloc[0]

gdf = ox.features_from_polygon(poly, tags={'boundary': 'administrative', 'admin_level': '9'})
rel = gdf[gdf.index.get_level_values('element') == 'relation']

# Check admin_level values
print('admin_level values:', rel['admin_level'].dropna().unique())
print()

# Filter to admin_level=9
rel9 = rel[rel['admin_level'] == '9']
print('admin_level=9 count:', len(rel9))
print('Names:', sorted(rel9['name'].dropna().unique()))
print()

# Also check 'boundary' column
print('boundary values:', rel9['boundary'].dropna().unique())
print('type values:', rel9['type'].dropna().unique())
PYEOF

admin_level values: <ArrowStringArray>
['2', '6', '4', '8', '10', '9']
Length: 6, dtype: str

admin_level=9 count: 24
Names: ['Alsergrund', 'Amtshaus für den 16. Bezirk', 'Brigittenau', 'Donaustadt', 'Döbling', 'Favoriten', 'Floridsdorf', 'Hernals', 'Hietzing', 'Innere Stadt', 'Josefstadt', 'Landstraße', 'Leopoldstadt', 'Liesing', 'Margareten', 'Mariahilf', 'Meidling', 'Neubau', 'Ottakring', 'Penzing', 'Rudolfsheim-Fünfhaus', 'Simmering', 'Wieden', 'Währing']

boundary values: <ArrowStringArray>
['administrative']
Length: 1, dtype: str
type values: <ArrowStringArray>
['boundary', 'multipolygon']
Length: 2, dtype: str

3:08

Good, so osmnx can fetch current data successfully. Now I need to figure out how to get the 2014-01-01 data. OSMnx might support this through its settings, or I need to use overpass with date filtering. Let me check if osmnx supports historical data, or if I need a different approach.

Actually, the 406 error from Overpass is weird. Let me try to use osmnx's own overpass downloading mechanism or try directly querying overpass through osmnx.

3:08

Bash cd /work && python3 << 'PYEOF'

cd /work && python3 << 'PYEOF'
import osmnx as ox
ox.settings.log_console = False
print(ox.settings.default_language)

# Check if we can customize the overpass query
# Look at all settings
for attr in dir(ox.settings):
    if not attr.startswith('_'):
        val = getattr(ox.settings, attr)
        if not callable(val):
            print(f"{attr} = {val}")
PYEOF

Traceback (most recent call last):
  File "<stdin>", line 3, in <module>
AttributeError: module 'osmnx.settings' has no attribute 'default_language'

3:14

Let me try a different approach. Since osmnx can fetch the current data successfully, I can use osmnx for current and write a custom Overpass query using the `requests` library with proper handling. But the Overpass API is returning 406 errors, which might be related to the Accept header. Let me try using osmnx's internal mechanisms.

Actually, osmnx has its own way to make Overpass queries. Let me look at the osmnx source to see how it does it, or simply use osmnx's own caching and download functionality with the `date` parameter.

Let me check if osmnx supports date-based queries.

3:14

Bash cd /work && python3 << 'PYEOF'

cd /work && python3 << 'PYEOF'
import osmnx as ox
ox.settings.log_console = False

# Check if there's a way to do historical data
# The overpass library that osmnx uses internally
import osmnx._overpass as op
print(dir(op))
PYEOF

['Any', 'InsufficientResponseError', 'OrderedDict', 'RequestsConnectionError', 'TYPE_CHECKING', '__builtins__', '__cached__', '__doc__', '__file__', '__loader__', '__name__', '__package__', '__spec__', '_create_overpass_features_query', '_download_overpass_features', '_download_overpass_network', '_get_network_filter', '_get_overpass_pause', '_http', '_make_overpass_polygon_coord_strs', '_make_overpass_settings', '_overpass_request', 'annotations', 'dt', 'lg', 'np', 'projection', 'requests', 'settings', 'time', 'utils', 'utils_geo']

3:18

OK, so osmnx works for current data. The issue is getting 2014 data from Overpass. The HTTP 406 error is weird - maybe the server is blocking requests from this IP? Let's try a different approach.

Let me try using osmnx for current data and use the Overpass API directly with the proper `date` parameter for historical data. The 406 might be related to the `Accept` header or similar. Let me check what osmnx does internally.

3:18

Bash cd /work && python3 << 'PYEOF'

cd /work && python3 << 'PYEOF'
import osmnx as ox
import osmnx._overpass as op
import inspect
print(inspect.getsource(op._overpass_request)[:2000])
PYEOF

def _overpass_request(data: OrderedDict[str, Any]) -> dict[str, Any]:
    """
    Send a HTTP POST request to the Overpass API and return response.

    Parameters
    ----------
    data
        Key-value pairs of parameters.

    Returns
    -------
    response_json
        The Overpass API's response.
    """
    # resolve url to same IP even if there is server round-robin redirecting
    _http._config_dns(settings.overpass_url)

    # prepare the Overpass API URL and see if request already exists in cache
    url = settings.overpass_url.rstrip("/") + "/interpreter"
    prepared_url = str(requests.Request("GET", url, params=data).prepare().url)
    cached_response_json = _http._retrieve_from_cache(prepared_url)
    if isinstance(cached_response_json, dict):
        return cached_response_json

    # pause then request this URL
    pause = _get_overpass_pause(settings.overpass_url)
    hostname = _http._hostname_from_url(url)
    msg = f"Pausing {pause} second(s) before making HTTP POST request to {hostname!r}"
    utils.log(msg, level=lg.INFO)
    time.sleep(pause)

    # transmit the HTTP POST request
    msg = f"Post {prepared_url} with timeout={settings.requests_timeout}"
    utils.log(msg, level=lg.INFO)
    response = requests.post(
        url,
        data=data,
        timeout=settings.requests_timeout,
        headers=_http._get_http_headers(),
        **settings.requests_kwargs,
    )

    # handle 429 and 504 errors by pausing then recursively re-trying request
    if response.status_code in {429, 504}:  # pragma: no cover
        error_pause = 55
        msg = (
            f"{hostname!r} responded {response.status_code} {response.reason}: "
            f"we'll retry in {error_pause} secs"
        )
        utils.log(msg, level=lg.WARNING)
        time.sleep(error_pause)
        return _overpass_request(data)

    response_json = _http._parse_response(response)
    if not isinstance(response_json, dict):  # pragma: no cover
        msg = "Overpass AP

3:22

Let me look at how osmnx creates overpass queries to see if I can add a date parameter. Let me look at the full function and the `_create_overpass_features_query` function.

3:22

Bash cd /work && python3 << 'PYEOF'

cd /work && python3 << 'PYEOF'
import osmnx as ox
import osmnx._overpass as op
import inspect
# Get the source for _create_overpass_features_query
print(inspect.getsource(op._create_overpass_features_query))
PYEOF

def _create_overpass_features_query(  # noqa: PLR0912
    polygon_coord_str: str,
    tags: dict[str, bool | str | list[str]],
) -> str:
    """
    Create an Overpass features query string based on tags.

    Parameters
    ----------
    polygon_coord_str
        The lat lon coordinates.
    tags
        Tags used for finding elements in the search area.

    Returns
    -------
    query
        The Overpass features query.
    """
    # create overpass settings string
    overpass_settings = _make_overpass_settings()

    # make sure every value in dict is bool, str, or list of str
    err_msg = "`tags` must be a dict with values of bool, str, or list of str."
    if not isinstance(tags, dict):  # pragma: no cover
        raise TypeError(err_msg)

    tags_dict: dict[str, bool | str | list[str]] = {}
    for key, value in tags.items():
        if isinstance(value, bool):
            tags_dict[key] = value

        elif isinstance(value, str):
            tags_dict[key] = [value]

        elif isinstance(value, list):
            if not all(isinstance(s, str) for s in value):  # pragma: no cover
                raise TypeError(err_msg)
            tags_dict[key] = value

        else:  # pragma: no cover
            raise TypeError(err_msg)

    # convert the tags dict into a list of {tag:value} dicts
    tags_list: list[dict[str, bool | str | list[str]]] = []
    for key, value in tags_dict.items():
        if isinstance(value, bool):
            tags_list.append({key: value})
        else:
            for value_item in value:
                tags_list.append({key: value_item})  # noqa: PERF401

    # add node/way/relation query components one at a time
    components = []
    for d in tags_list:
        for key, value in d.items():
            if isinstance(value, bool):
                # if bool (ie, True) just pass the key, no value
                tag_str = f"[{key!r}](poly:{polygon_coord_str!r});(._;>;);"
            else:
                # otherwise, pass "key"="value"
                tag_str = f"[{key!r}={value!r}](poly:{polygon_coord_str!r});(._;>;);"

            for kind in ("node", "way", "relation"):
                components.append(f"({kind}{tag_str});")  # noqa: PERF401

    # finalize query and return
    components_str = "".join(components)
    return f"{overpass_settings};({components_str});out;"

3:26

Now I understand how osmnx builds queries. The problem with our direct Overpass API calls is likely the content negotiation. Let me look at what osmnx's HTTP headers are.

3:26

Bash cd /work && python3 << 'PYEOF'

cd /work && python3 << 'PYEOF'
import osmnx as ox
import osmnx._http as http_mod
import inspect
print(inspect.getsource(http_mod._get_http_headers))
PYEOF

def _get_http_headers(
    *,
    user_agent: str | None = None,
    referer: str | None = None,
    accept_language: str | None = None,
) -> dict[str, str]:
    """
    Update the default requests HTTP headers with OSMnx information.

    Parameters
    ----------
    user_agent
        The user agent. If None, use `settings.http_user_agent` value.
    referer
        The referer. If None, use `settings.http_referer` value.
    accept_language
        The accept language. If None, use `settings.http_accept_language`
        value.

    Returns
    -------
    headers
        The updated HTTP headers.
    """
    if user_agent is None:
        user_agent = settings.http_user_agent
    if referer is None:
        referer = settings.http_referer
    if accept_language is None:
        accept_language = settings.http_accept_language

    info = {"User-Agent": user_agent, "referer": referer, "Accept-Language": accept_language}
    headers = dict(requests.utils.default_headers())
    headers.update(info)
    return headers

3:31

I see - the 406 error is coming from our own code not setting proper headers. Let me use osmnx's internal overpass functions to make the request, or use requests library directly with proper headers. Let me check the osmnx settings for overpass URL.

Actually, let me just use the `requests` library with the same approach osmnx uses internally.

3:31

Bash cd /work && python3 << 'PYEOF'

cd /work && python3 << 'PYEOF'
import osmnx.settings
print('overpass_url:', osmnx.settings.overpass_url)
print('http_user_agent:', osmnx.settings.http_user_agent)
print('http_referer:', osmnx.settings.http_referer)
print('http_accept_language:', osmnx.settings.http_accept_language)
PYEOF

overpass_url: https://overpass-api.de/api
http_user_agent: OSMnx Python package (https://github.com/gboeing/osmnx)
http_referer: OSMnx Python package (https://github.com/gboeing/osmnx)
http_accept_language: en

3:41

The issue is that my direct requests to Overpass API are getting 406 errors. This might be because the server is blocking certain user agents. Let me use osmnx's internal overpass handling to do the queries since it works fine.

The plan:
1. Use osmnx to fetch current districts via `ox.features_from_polygon` with `tags={'boundary': 'administrative', 'admin_level': '9'}` and filter to relations with admin_level=9
2. For historical data (2014-01-01), I need to modify the overpass query to add a date parameter. I can use osmnx's internal `_overpass_request` function to send a custom query with the date parameter.

Let me construct a proper overpass query with date and use osmnx's internal mechanisms.

3:41

Bash cd /work && python3 << 'PYEOF'

cd /work && python3 << 'PYEOF'
import osmnx as ox
import osmnx._overpass as op
import osmnx.settings as settings
from collections import OrderedDict
import json

ox.settings.log_console = False

vienna = ox.geocode_to_gdf('Wien, Österreich', which_result=1)
poly = vienna.geometry.iloc[0]

# Get polygon coord string
poly_coord_str = op._make_overpass_polygon_coord_strs(poly)[0]
print("Poly coord str (first 100):", poly_coord_str[:100])

# Build a custom query with date for historical data
overpass_settings = op._make_overpass_settings()
print("Settings:", overpass_settings)

# Build query for admin_level=9 boundary=administrative relations
tags_query = 'relation["admin_level"="9"]["boundary"="administrative"]'
query = f"{overpass_settings};(poly:\"{poly_coord_str}\")->.search_area;({tags_query}(area.search_area););out body;>;out skel qt;"
print("Query:", query[:200])

# Try with date
date_query = f"[date:\"2014-01-01T00:00:00Z\"];\n{overpass_settings};(poly:\"{poly_coord_str}\")->.search_area;({tags_query}(area.search_area););out body;>;out skel qt;"
print("Date query:", date_query[:200])

# Use _overpass_request
try:
    data = OrderedDict()
    data['data'] = date_query
    result = op._overpass_request(data)
    print("Success! Elements:", len(result.get('elements', [])))
    for el in result.get('elements', [])[:5]:
        tags = el.get('tags', {})
        print(f"  {el.get('type')} {el.get('id')}: name={tags.get('name')}, admin_level={tags.get('admin_level')}")
except Exception as e:
    print(f"Error: {e}")
PYEOF

Poly coord str (first 100): 48.171120 16.181831 48.171031 16.181902 48.169754 16.182927 48.169331 16.183994 48.169236 16.184241 
Settings: [out:json][timeout:180]
Query: [out:json][timeout:180];(poly:"48.171120 16.181831 48.171031 16.181902 48.169754 16.182927 48.169331 16.183994 48.169236 16.184241 48.168928 16.184610 48.168093 16.187748 48.166351 16.193729 48.163962
Date query: [date:"2014-01-01T00:00:00Z"];
[out:json][timeout:180];(poly:"48.171120 16.181831 48.171031 16.181902 48.169754 16.182927 48.169331 16.183994 48.169236 16.184241 48.168928 16.184610 48.168093 16.18774
Error: 'overpass-api.de' responded: 400 Bad Request <?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
    "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
<head>
  <meta http-equiv="content-type" content="text/html; charset=utf-8" lang="en"/>
  <title>OSM3S Response</title>
</head>
<body>

<p>The data included in this document is from www.openstreetmap.org. The data is made available under ODbL.</p>
<p><strong style="color:#FF0000">Error</strong>: line 2: parse error: Unknown type &quot;[&quot; </p>
<p><strong style="color:#FF0000">Error</strong>: line 2: parse error: An empty query is not allowed </p>
<p><strong style="color:#FF0000">Error</strong>: line 2: parse error: ';' expected - 'out' found. </p>
<p><strong style="color:#FF0000">Error</strong>: line 2: parse error: Unknown type &quot;poly&quot; </p>
<p><strong style="color:#FF0000">Error</strong>: line 2: parse error: An empty query is not allowed </p>
<p><strong style="color:#FF0000">Error</strong>: line 2: parse error: ';' expected - ':' found. </p>
<p><strong style="color:#FF0000">Error</strong>: line 2: parse error: Unexpected end of input. </p>
<p><strong style="color:#FF0000">Error</strong>: line 2: static error: Element &quot;print&quot; cannot be subelement of element &quot;union&quot;. </p>
<p><strong style="color:#FF0000">Error</strong>: line 2: static error: Element &quot;print&quot; cannot be subelement of element &quot;union&quot;. </p>

</body>
</html>

3:55

OK, the date syntax needs to be proper. In Overpass QL, the date is specified as a global setting: `[date:"2014-01-01T00:00:00Z"]` must be on its own line before any statements. Let me try a simpler query with proper formatting.

Actually the issue is that the date setting must be placed at the beginning, before the `[out:json]` or after. Let me check the Overpass QL documentation. In Overpass QL, you place `[date:"..."]` at the very beginning of the query.

Let me try a cleaner approach - use OSMnx for current data (since it works) and write a direct Overpass query for historical data.

3:55

Bash cd /work && python3 << 'PYEOF'

cd /work && python3 << 'PYEOF'
import osmnx as ox
import osmnx._overpass as op
from collections import OrderedDict

ox.settings.log_console = False

# Simple test query with date
query = "[date:\"2014-01-01T00:00:00Z\"];\n[out:json][timeout:180];\nrelation[\"admin_level\"=\"9\"][\"boundary\"=\"administrative\"](48.1,16.1,48.35,16.6);\nout body;\n>;\nout skel qt;"
print("Query:")
print(query)
print()

data = OrderedDict()
data['data'] = query
try:
    result = op._overpass_request(data)
    print("Success! Elements:", len(result.get('elements', [])))
    for el in result.get('elements', [])[:10]:
        tags = el.get('tags', {})
        print(f"  {el.get('type')} {el.get('id')}: name={tags.get('name')}, admin_level={tags.get('admin_level')}")
except Exception as e:
    print(f"Error: {e}")
PYEOF

Query:
[date:"2014-01-01T00:00:00Z"];
[out:json][timeout:180];
relation["admin_level"="9"]["boundary"="administrative"](48.1,16.1,48.35,16.6);
out body;
>;
out skel qt;

Error: 'overpass-api.de' responded: 400 Bad Request <?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
    "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
<head>
  <meta http-equiv="content-type" content="text/html; charset=utf-8" lang="en"/>
  <title>OSM3S Response</title>
</head>
<body>

<p>The data included in this document is from www.openstreetmap.org. The data is made available under ODbL.</p>
<p><strong style="color:#FF0000">Error</strong>: line 2: parse error: Unknown type &quot;[&quot; </p>
<p><strong style="color:#FF0000">Error</strong>: line 2: parse error: An empty query is not allowed </p>
<p><strong style="color:#FF0000">Error</strong>: line 2: parse error: ';' expected - 'out' found. </p>

</body>
</html>

4:02

The Overpass API doesn't like the date setting on a separate line. Let me check the proper format for date queries in Overpass. The `[date:"..."]` should be placed inside the query block, typically after `[out:json]`. Let me try a different format.

4:02

Bash cd /work && python3 << 'PYEOF'

cd /work && python3 << 'PYEOF'
import osmnx as ox
import osmnx._overpass as op
from collections import OrderedDict

ox.settings.log_console = False

# Proper Overpass QL with date - the date needs to be part of the settings
query = """
[out:json][timeout:180][date:"2014-01-01T00:00:00Z"];
relation["admin_level"="9"]["boundary"="administrative"](48.1,16.1,48.35,16.6);
out body;
>;
out skel qt;
"""
print("Query:")
print(query.strip())

data = OrderedDict()
data['data'] = query
try:
    result = op._overpass_request(data)
    print("Success! Elements:", len(result.get('elements', [])))
    for el in result.get('elements', [])[:15]:
        tags = el.get('tags', {})
        print(f"  {el.get('type')} {el.get('id')}: name={tags.get('name')}, admin_level={tags.get('admin_level')}")
except Exception as e:
    print(f"Error: {e}")
PYEOF

Query:
[out:json][timeout:180][date:"2014-01-01T00:00:00Z"];
relation["admin_level"="9"]["boundary"="administrative"](48.1,16.1,48.35,16.6);
out body;
>;
out skel qt;
Success! Elements: 5438
  relation 1641752: name=Klein-Engersdorf, admin_level=9
  relation 1641771: name=Enzersfeld, admin_level=9
  relation 1643642: name=Gerasdorf bei Wien, admin_level=9
  relation 1990590: name=Alsergrund, admin_level=9
  relation 1990591: name=Hietzing, admin_level=9
  relation 1990592: name=Innere Stadt, admin_level=9
  relation 1990593: name=Josefstadt, admin_level=9
  relation 1990594: name=Leopoldstadt, admin_level=9
  relation 1990595: name=Mariahilf, admin_level=9
  relation 1990596: name=Meidling, admin_level=9
  relation 1990597: name=Neubau, admin_level=9
  relation 1990598: name=Penzing, admin_level=9
  relation 1990599: name=Rudolfsheim-Fünfhaus, admin_level=9
  relation 1990600: name=Währing, admin_level=9
  relation 1991416: name=Landstraße, admin_level=9

6:28

Great! The historical query works! Now let's also get the current snapshot and then build the complete solution.

6:28

Write /work/solve.py

content · 470 lines · py

"""
Fetch Vienna district boundaries (current and 2014-01-01) from Overpass via osmnx,
compare to find changes, and output vienna_boundary_changes.geojson.

Each feature is a MultiPolygon geometry fragment with:
  - change_type: added_since_2014 | removed_since_2014 | unchanged
  - district_name: normalised district name(s) the fragment belongs to
  - touches_changed: bool, whether that district's boundary touches any changed area
"""

import os
import sys
import json
from collections import OrderedDict

import osmnx as ox
import osmnx._overpass as op
import geopandas as gpd
import pandas as pd
import shapely
import shapely.geometry
import numpy as np

ox.settings.log_console = False

# ---------------------------------------------------------------------------
# 1. Overpass query via osmnx internals
# ---------------------------------------------------------------------------

def fetch_vienna_districts_osmnx(date=None):
    """Fetch Vienna district (admin_level=9) boundary relations via Overpass.
    
    Uses osmnx's _overpass_request to handle HTTP correctly.
    Returns a GeoDataFrame with columns: geometry, name, district_name, osm_id.
    """
    # Bounding box covering Vienna
    bbox = (48.1, 16.1, 48.35, 16.6)  # south, west, north, east
    
    if date:
        query = f"""
[out:json][timeout:180][date:"{date}"];
relation["admin_level"="9"]["boundary"="administrative"]({bbox[0]},{bbox[1]},{bbox[2]},{bbox[3]});
out body;
>;
out skel qt;
"""
    else:
        query = f"""
[out:json][timeout:180];
relation["admin_level"="9"]["boundary"="administrative"]({bbox[0]},{bbox[1]},{bbox[2]},{bbox[3]});
out body;
>;
out skel qt;
"""
    
    data = OrderedDict()
    data['data'] = query
    result = op._overpass_request(data)
    
    elements = result.get("elements", [])
    
    # Parse: collect nodes, ways, relations
    nodes = {}
    ways = {}
    relations = []
    
    for el in elements:
        etype = el.get("type")
        eid = el.get("id")
        if etype == "node":
            nodes[eid] = (el.get("lon"), el.get("lat"))
        elif etype == "way":
            ways[eid] = el.get("nodes", [])
        elif etype == "relation":
            relations.append(el)
    
    def way_to_coords(way_id):
        node_ids = ways.get(way_id, [])
        return [nodes[nid] for nid in node_ids if nid in nodes]
    
    features = []
    
    for rel in relations:
        tags = rel.get("tags", {})
        name = tags.get("name", "")
        admin_level = tags.get("admin_level", "")
        
        # Only Vienna districts (admin_level=9)
        if admin_level != "9":
            continue
        
        members = rel.get("members", [])
        
        outer_rings = []
        inner_rings = []
        
        for mem in members:
            if mem.get("type") != "way":
                continue
            role = mem.get("role", "")
            ref = mem.get("ref")
            if ref not in ways:
                continue
            coords = way_to_coords(ref)
            if len(coords) < 3:
                continue
            if role == "inner":
                inner_rings.append(coords)
            else:
                outer_rings.append(coords)
        
        polygons = []
        for outer in outer_rings:
            if len(outer) < 3:
                continue
            if outer[0] != outer[-1]:
                outer = outer + [outer[0]]
            try:
                outer_ls = shapely.LineString(outer)
                if not outer_ls.is_valid or outer_ls.is_empty:
                    continue
                
                inners_for_this = []
                remaining = []
                for inner in inner_rings:
                    if inner[0] != inner[-1]:
                        inner = inner + [inner[0]]
                    try:
                        inner_ls = shapely.LineString(inner)
                        if inner_ls.is_valid and not inner_ls.is_empty:
                            ipoly = shapely.Polygon(inner_ls)
                            outer_poly = shapely.Polygon(outer_ls)
                            if outer_poly.contains(ipoly.representative_point()) or outer_poly.intersects(ipoly):
                                inners_for_this.append(ipoly)
                            else:
                                remaining.append(inner)
                        else:
                            remaining.append(inner)
                    except Exception:
                        remaining.append(inner)
                
                inner_rings = remaining
                
                if inners_for_this:
                    poly = shapely.Polygon(outer_ls, [list(i.exterior.coords) for i in inners_for_this])
                else:
                    poly = shapely.Polygon(outer_ls)
                
                # Fix orientation and validate
                poly = shapely.polygon.orient(poly)
                if poly.is_valid and not poly.is_empty:
                    polygons.append(poly)
            except Exception:
                continue
        
        if polygons:
            geom = shapely.unary_union(polygons)
            if geom.geom_type == "Polygon":
                geom = shapely.geometry.MultiPolygon([geom])
            
            if not geom.is_empty:
                features.append({
                    "geometry": geom,
                    "name": name,
                    "osm_id": rel.get("id"),
                })
    
    if not features:
        return gpd.GeoDataFrame()
    
    gdf = gpd.GeoDataFrame(features, crs="EPSG:4326")
    return gdf


# ---------------------------------------------------------------------------
# 2. Name normalisation
# ---------------------------------------------------------------------------

# Canonical Vienna district names
VIENNA_DISTRICTS_CANON = {
    "innere stadt": "Innere Stadt",
    "leopoldstadt": "Leopoldstadt",
    "landstraße": "Landstraße",
    "landstrasse": "Landstraße",
    "wieden": "Wieden",
    "margareten": "Margareten",
    "margarethen": "Margareten",
    "mariahilf": "Mariahilf",
    "neubau": "Neubau",
    "josefstadt": "Josefstadt",
    "alsergrund": "Alsergrund",
    "favoriten": "Favoriten",
    "simmering": "Simmering",
    "meidling": "Meidling",
    "hietzing": "Hietzing",
    "penzing": "Penzing",
    "rudolfsheim-fünfhaus": "Rudolfsheim-Fünfhaus",
    "rudolfsheim-fuenfhaus": "Rudolfsheim-Fünfhaus",
    "ottakring": "Ottakring",
    "hernals": "Hernals",
    "währing": "Währing",
    "waehring": "Währing",
    "döbling": "Döbling",
    "doebling": "Döbling",
    "brigittenau": "Brigittenau",
    "floridsdorf": "Floridsdorf",
    "donaustadt": "Donaustadt",
    "liesing": "Liesing",
}


def normalise_name(name):
    """Normalise a district name to its canonical form."""
    if not name or pd.isna(name):
        return None
    n = name.strip()
    key = n.lower().replace("  ", " ").replace("ß", "ss").replace("-", "-")
    # Direct lookup
    if key in VIENNA_DISTRICTS_CANON:
        return VIENNA_DISTRICTS_CANON[key]
    # try with ß vs ss variations
    key2 = key.replace("ss", "ß")
    if key2 in VIENNA_DISTRICTS_CANON:
        return VIENNA_DISTRICTS_CANON[key2]
    # Try fuzzy: if any lookup key is contained in our key
    for pattern, canon in VIENNA_DISTRICTS_CANON.items():
        if pattern in key or key in pattern:
            return canon
    return n


# ---------------------------------------------------------------------------
# 3. Main pipeline
# ---------------------------------------------------------------------------

def main():
    print("=" * 60)
    print("Fetching CURRENT Vienna district boundaries...")
    print("=" * 60)
    current_gdf = fetch_vienna_districts_osmnx()
    print(f"  Got {len(current_gdf)} current district features")
    
    print("\n" + "=" * 60)
    print("Fetching 2014-01-01 Vienna district boundaries...")
    print("=" * 60)
    historical_gdf = fetch_vienna_districts_osmnx(date="2014-01-01T00:00:00Z")
    print(f"  Got {len(historical_gdf)} historical district features")
    
    if current_gdf.empty or historical_gdf.empty:
        print("ERROR: Empty data returned!")
        sys.exit(1)
    
    # Normalise names
    current_gdf["district_name"] = current_gdf["name"].apply(normalise_name)
    historical_gdf["district_name"] = historical_gdf["name"].apply(normalise_name)
    
    print("\nCurrent district names:")
    for n in sorted(current_gdf["district_name"].dropna().unique()):
        print(f"  - {n}")
    print("\nHistorical district names:")
    for n in sorted(historical_gdf["district_name"].dropna().unique()):
        print(f"  - {n}")
    
    # Remove non-Vienna-district features (e.g. Klein-Engersdorf, Enzersfeld in historical data)
    valid_names = set(VIENNA_DISTRICTS_CANON.values())
    current_gdf = current_gdf[current_gdf["district_name"].isin(valid_names)].copy()
    historical_gdf = historical_gdf[historical_gdf["district_name"].isin(valid_names)].copy()
    
    print(f"\nAfter filtering to Vienna districts only:")
    print(f"  Current: {len(current_gdf)} districts")
    print(f"  Historical: {len(historical_gdf)} districts")
    
    # Check for missing districts
    current_names = set(current_gdf["district_name"].dropna())
    historical_names = set(historical_gdf["district_name"].dropna())
    
    common = current_names & historical_names
    only_current = current_names - historical_names
    only_historical = historical_names - current_names
    
    print(f"\nCommon districts: {len(common)}")
    if only_current:
        print(f"Only in current: {only_current}")
    if only_historical:
        print(f"Only in historical: {only_historical}")
    
    # Build unified dissolve
    current_union = current_gdf.geometry.unary_union
    historical_union = historical_gdf.geometry.unary_union
    
    print(f"\nCurrent total area: {current_union.area:.6f} sq deg")
    print(f"Historical total area: {historical_union.area:.6f} sq deg")
    
    # Symmetric difference
    added = shapely.difference(current_union, historical_union)
    removed = shapely.difference(historical_union, current_union)
    unchanged = shapely.intersection(current_union, historical_union)
    
    print(f"Added area: {added.area:.6f}" if not added.is_empty else "Added: empty")
    print(f"Removed area: {removed.area:.6f}" if not removed.is_empty else "Removed: empty")
    print(f"Unchanged area: {unchanged.area:.6f}" if not unchanged.is_empty else "Unchanged: empty")
    
    # Collect fragments
    fragments = []
    
    # Helper to split MultiPolygon into individual fragments
    def add_fragments(geom, ctype, dist_name_fn):
        """Add geometry fragments, assigning district name via dist_name_fn."""
        if geom.is_empty:
            return
        if geom.geom_type == "MultiPolygon":
            polys = list(geom.geoms)
        elif geom.geom_type == "Polygon":
            polys = [geom]
        elif geom.geom_type == "GeometryCollection":
            polys = [g for g in geom.geoms if g.geom_type in ("Polygon", "MultiPolygon")]
        else:
            polys = [geom]
        
        for poly in polys:
            if poly.is_empty:
                continue
            if poly.geom_type == "Polygon":
                mp = shapely.geometry.MultiPolygon([poly])
            elif poly.geom_type == "MultiPolygon":
                mp = poly
            else:
                continue
            dname = dist_name_fn(mp)
            fragments.append({
                "geometry": mp,
                "change_type": ctype,
                "district_name": dname,
            })
    
    # Added fragments - assign to current districts that contain/intersect them
    def added_dist_name(geom):
        districts = set()
        for _, row in current_gdf.iterrows():
            if geom.intersects(row["geometry"]):
                districts.add(row["district_name"])
        return ", ".join(sorted(districts)) if districts else "unknown"
    
    add_fragments(added, "added_since_2014", added_dist_name)
    
    # Removed fragments - assign to historical districts
    def removed_dist_name(geom):
        districts = set()
        for _, row in historical_gdf.iterrows():
            if geom.intersects(row["geometry"]):
                districts.add(row["district_name"])
        return ", ".join(sorted(districts)) if districts else "unknown"
    
    add_fragments(removed, "removed_since_2014", removed_dist_name)
    
    # Unchanged fragments - assign per current district
    for _, row in current_gdf.iterrows():
        dname = row["district_name"]
        dist_geom = row["geometry"]
        unchanged_part = shapely.intersection(dist_geom, unchanged)
        if not unchanged_part.is_empty:
            def dn_fn(geom, dn=dname):
                return dn
            add_fragments(unchanged_part, "unchanged", dn_fn)
    
    result_gdf = gpd.GeoDataFrame(fragments, crs="EPSG:4326")
    
    if result_gdf.empty:
        print("No fragments — all boundaries unchanged. Using current boundaries as unchanged.")
        result_gdf = current_gdf.copy()
        result_gdf["change_type"] = "unchanged"
        result_gdf["district_name"] = result_gdf["district_name"]
        result_gdf = result_gdf[["geometry", "change_type", "district_name"]]
    
    print(f"\nFragments created: {len(result_gdf)}")
    print(result_gdf["change_type"].value_counts())
    
    # -----------------------------------------------------------------------
    # 4. Compute touches_changed for each district
    # -----------------------------------------------------------------------
    
    # Collect all changed geometries
    changed_geoms = []
    for _, row in result_gdf.iterrows():
        if row["change_type"] in ("added_since_2014", "removed_since_2014"):
            changed_geoms.append(row["geometry"])
    
    if changed_geoms:
        changed_union = shapely.unary_union(changed_geoms)
    else:
        changed_union = shapely.geometry.MultiPolygon()
    
    # For each district in current data, check if it touches changed areas
    # We use a small buffer (1e-7 degrees ~ 1cm) to capture near-touching
    district_touches = {}
    for _, row in current_gdf.iterrows():
        dname = row["district_name"]
        geom = row["geometry"]
        if not changed_union.is_empty:
            # Use touches with a tiny buffer for robustness
            buffered = changed_union.buffer(1e-8)
            if geom.touches(changed_union) or geom.intersects(buffered):
                district_touches[dname] = True
            else:
                district_touches[dname] = False
        else:
            district_touches[dname] = False
    
    # For historical-only districts (if any), also check
    for _, row in historical_gdf.iterrows():
        dname = row["district_name"]
        if dname not in district_touches:
            geom = row["geometry"]
            if not changed_union.is_empty and geom.intersects(changed_union):
                district_touches[dname] = True
            else:
                district_touches[dname] = False
    
    def get_touches(dn_str):
        """Check if any district in a comma-separated string touches changed areas."""
        if pd.isna(dn_str) or dn_str == "unknown":
            return False
        for d in dn_str.split(","):
            d = d.strip()
            if district_touches.get(d, False):
                return True
        return False
    
    result_gdf["touches_changed"] = result_gdf["district_name"].apply(get_touches)
    
    # Simplify geometries slightly to keep file manageable
    result_gdf["geometry"] = result_gdf["geometry"].simplify(0.00001, preserve_topology=True)
    
    # Repair any invalid geometries from simplification
    result_gdf["geometry"] = result_gdf["geometry"].buffer(0)
    
    # Ensure MultiPolygon type
    def ensure_multipolygon(geom):
        if geom.geom_type == "Polygon":
            return shapely.geometry.MultiPolygon([geom])
        return geom
    
    result_gdf["geometry"] = result_gdf["geometry"].apply(ensure_multipolygon)
    
    # -----------------------------------------------------------------------
    # 5. Output
    # -----------------------------------------------------------------------
    output_path = "/work/vienna_boundary_changes.geojson"
    result_gdf.to_file(output_path, driver="GeoJSON")
    
    print(f"\n{'=' * 60}")
    print(f"Output written to {output_path}")
    print(f"Features: {len(result_gdf)}")
    print(f"Columns: {result_gdf.columns.tolist()}")
    print(f"\nSample features:")
    print(result_gdf[["change_type", "district_name", "touches_changed"]].head(15))
    
    print(f"\nDistrict touch summary:")
    for dname, touches in sorted(district_touches.items()):
        print(f"  {dname}: touches_changed={touches}")
    
    # Verify output
    verify = gpd.read_file(output_path)
    print(f"\nVerification: read back {len(verify)} features")
    print(verify[["change_type", "district_name", "touches_changed"]].head(10))
    print(f"Geometry types: {verify.geometry.geom_type.unique()}")


if __name__ == "__main__":
    main()

Wrote /work/solve.py (17198 chars).

6:30

Bash cd /work && timeout 300 python solve.py 2>&1

============================================================
Fetching CURRENT Vienna district boundaries...
============================================================
  Got 0 current district features

============================================================
Fetching 2014-01-01 Vienna district boundaries...
============================================================
  Got 0 historical district features
ERROR: Empty data returned!

6:39

The problem is that `_overpass_request` returns data but our parsing might be failing. Let me debug.

6:39

Bash cd /work && python3 << 'PYEOF'

cd /work && python3 << 'PYEOF'
import osmnx as ox
import osmnx._overpass as op
from collections import OrderedDict

ox.settings.log_console = False

query = """
[out:json][timeout:180];
relation["admin_level"="9"]["boundary"="administrative"](48.1,16.1,48.35,16.6);
out body;
>;
out skel qt;
"""

data = OrderedDict()
data['data'] = query
result = op._overpass_request(data)
print('result type:', type(result))
print('keys:', list(result.keys()))
print('elements count:', len(result.get('elements', [])))

# Look at first relation
for el in result.get('elements', []):
    if el.get('type') == 'relation':
        tags = el.get('tags', {})
        print(f"Relation {el.get('id')}: name={tags.get('name')}, admin_level={tags.get('admin_level')}, boundary={tags.get('boundary')}")
        break

# Check what elements we have
types = {}
for el in result.get('elements', []):
    t = el.get('type')
    types[t] = types.get(t, 0) + 1
print('Element types:', types)
PYEOF

result type: <class 'dict'>
keys: ['version', 'generator', 'osm3s', 'elements']
elements count: 6732
Relation 1990590: name=Alsergrund, admin_level=9, boundary=administrative
Element types: {'relation': 23, 'node': 6210, 'way': 499}

6:52

The query returns 23 relations and 499 ways. But the code that converts to GeoDataFrame is returning 0. Let me debug the conversion.

6:52

Bash cd /work && python3 << 'PYEOF'

cd /work && python3 << 'PYEOF'
import osmnx as ox
import osmnx._overpass as op
from collections import OrderedDict
import geopandas as gpd
import shapely
import shapely.geometry

ox.settings.log_console = False

query = """
[out:json][timeout:180];
relation["admin_level"="9"]["boundary"="administrative"](48.1,16.1,48.35,16.6);
out body;
>;
out skel qt;
"""

data = OrderedDict()
data['data'] = query
result = op._overpass_request(data)

elements = result.get("elements", [])

# Collect nodes, ways, relations
nodes = {}
ways = {}
relations = []

for el in elements:
    etype = el.get("type")
    eid = el.get("id")
    if etype == "node":
        nodes[eid] = (el.get("lon"), el.get("lat"))
    elif etype == "way":
        ways[eid] = el.get("nodes", [])
    elif etype == "relation":
        relations.append(el)

print(f"Nodes: {len(nodes)}, Ways: {len(ways)}, Relations: {len(relations)}")

# Debug a specific relation - Alsergrund
for rel in relations:
    tags = rel.get("tags", {})
    name = tags.get("name", "")
    if name == "Alsergrund":
        print(f"\nAlsergrund relation id={rel.get('id')}")
        members = rel.get("members", [])
        print(f"  Members: {len(members)}")
        for mem in members[:10]:
            print(f"    type={mem.get('type')} ref={mem.get('ref')} role={mem.get('role')}")
        # Check which way refs are in our ways dict
        way_refs = [m.get('ref') for m in members if m.get('type') == 'way']
        print(f"  Way refs in ways dict: {sum(1 for r in way_refs if r in ways)}/{len(way_refs)}")
        
        # Check outer rings
        outer_rings = []
        inner_rings = []
        for mem in members:
            if mem.get("type") != "way":
                continue
            role = mem.get("role", "")
            ref = mem.get("ref")
            if ref not in ways:
                continue
            coords = [(nodes[nid] if nid in nodes else None) for nid in ways[ref]]
            coords = [c for c in coords if c is not None]
            if len(coords) < 3:
                continue
            if role == "inner":
                inner_rings.append(coords)
            else:
                outer_rings.append(coords)
        
        print(f"  Outer rings built: {len(outer_rings)}, Inner rings: {len(inner_rings)}")
        
        # Check first outer ring
        if outer_rings:
            outer = outer_rings[0]
            print(f"  First outer ring length: {len(outer)}")
            if outer[0] != outer[-1]:
                outer = outer + [outer[0]]
            print(f"  After closing: {len(outer)} coords")
            try:
                ls = shapely.LineString(outer)
                print(f"  LineString valid: {ls.is_valid}, empty: {ls.is_empty}")
                poly = shapely.Polygon(ls)
                print(f"  Polygon valid: {poly.is_valid}, area: {poly.area}")
            except Exception as e:
                print(f"  Error: {e}")
        break
PYEOF

Nodes: 6210, Ways: 499, Relations: 23

Alsergrund relation id=1990590
  Members: 9
    type=way ref=147468285 role=outer
    type=way ref=8046654 role=outer
    type=way ref=147396089 role=outer
    type=way ref=147463455 role=outer
    type=way ref=147463456 role=outer
    type=way ref=390540126 role=outer
    type=way ref=390540129 role=outer
    type=way ref=147463454 role=outer
    type=way ref=147463462 role=outer
  Way refs in ways dict: 9/9
  Outer rings built: 9, Inner rings: 0
  First outer ring length: 32
  After closing: 33 coords
  LineString valid: True, empty: False
  Polygon valid: False, area: 4.780675000099115e-08

7:08

The polygon is invalid - it's likely a multipolygon with many outer ways forming separate rings, not a single polygon. Each "outer" way in a district boundary relation typically forms part of the outer boundary, meaning they are separate line segments that together form one ring. But we're treating each way as a separate outer ring (polygon), which is wrong.

Actually, for Vienna districts, the boundary is typically a single ring. Each way is a segment of that ring. So we need to join them into a single closed ring by matching start/end nodes.

Let me check what these ways look like.

7:08

Bash cd /work && python3 << 'PYEOF'

cd /work && python3 << 'PYEOF'
import osmnx as ox
import osmnx._overpass as op
from collections import OrderedDict

ox.settings.log_console = False

query = """
[out:json][timeout:180];
relation["admin_level"="9"]["boundary"="administrative"](48.1,16.1,48.35,16.6);
out body;
>;
out skel qt;
"""

data = OrderedDict()
data['data'] = query
result = op._overpass_request(data)

elements = result.get("elements", [])

nodes = {}
ways = {}
relations = []

for el in elements:
    etype = el.get("type")
    eid = el.get("id")
    if etype == "node":
        nodes[eid] = (el.get("lon"), el.get("lat"))
    elif etype == "way":
        ways[eid] = el.get("nodes", [])
    elif etype == "relation":
        relations.append(el)

# Check Alsergrund ways to see how they connect
for rel in relations:
    tags = rel.get("tags", {})
    name = tags.get("name", "")
    if name == "Alsergrund":
        members = rel.get("members", [])
        print("Alsergrund way segments:")
        for mem in members:
            if mem.get("type") == "way":
                ref = mem.get("ref")
                node_ids = ways.get(ref, [])
                if node_ids:
                    first_node = nodes.get(node_ids[0], "?")
                    last_node = nodes.get(node_ids[-1], "?")
                    print(f"  way {ref}: {node_ids[0]}->{node_ids[-1]}, "
                          f"first={first_node}, last={last_node}, len={len(node_ids)}")
        
        # Try to assemble the ring
        # Collect all way segments with their start/end nodes
        segments = []
        for mem in members:
            if mem.get("type") == "way":
                ref = mem.get("ref")
                node_ids = ways.get(ref, [])
                if node_ids:
                    coords = [nodes[nid] for nid in node_ids if nid in nodes]
                    if len(coords) >= 2:
                        segments.append((coords, node_ids[0], node_ids[-1]))
        
        print(f"\nTotal segments: {len(segments)}")
        
        # Simple ring assembly: start with first segment, then find matching endpoints
        if segments:
            # Use the first segment's first node and last node
            ring_coords = list(segments[0][0])
            used = {0}
            current_end_node = segments[0][2]  # last node id of first segment
            
            # Try to find next segment
            for _ in range(len(segments) * 2):
                found = False
                for i, (coords, start_node, end_node) in enumerate(segments):
                    if i in used:
                        continue
                    if start_node == current_end_node:
                        # Append coords (skip first coord to avoid duplication)
                        ring_coords.extend(coords[1:])
                        current_end_node = end_node
                        used.add(i)
                        found = True
                        break
                    elif end_node == current_end_node:
                        # Reverse this segment
                        rev_coords = list(reversed(coords))
                        ring_coords.extend(rev_coords[1:])
                        current_end_node = start_node
                        used.add(i)
                        found = True
                        break
                if not found:
                    break
            
            print(f"Used {len(used)} segments, ring has {len(ring_coords)} coords")
            if ring_coords[0] != ring_coords[-1]:
                ring_coords.append(ring_coords[0])
            print(f"Ring closed: {ring_coords[0] == ring_coords[-1]}, first={ring_coords[0]}, last={ring_coords[-1]}")
            
            import shapely
            ls = shapely.LineString(ring_coords)
            print(f"LineString valid: {ls.is_valid}, length: {ls.length}")
            poly = shapely.Polygon(ls)
            print(f"Polygon valid: {poly.is_valid}, area: {poly.area}")
            
            # Print first few and last few coords
            print(f"First 3 coords: {ring_coords[:3]}")
            print(f"Last 3 coords: {ring_coords[-3:]}")
        break
PYEOF

Alsergrund way segments:
  way 147468285: 731110806->2157233697, first=(16.3408703, 48.2151672), last=(16.3419883, 48.2172473), len=32
  way 8046654: 731110806->313620580, first=(16.3408703, 48.2151672), last=(16.3565102, 48.2143408), len=28
  way 147396089: 1606198290->313620580, first=(16.370149, 48.2184891), last=(16.3565102, 48.2143408), len=33
  way 147463455: 1606822119->1606198290, first=(16.3674575, 48.2251075), last=(16.370149, 48.2184891), len=15
  way 147463456: 1606822180->1606822119, first=(16.3616214, 48.236309), last=(16.3674575, 48.2251075), len=22
  way 390540126: 3936984440->1606822180, first=(16.359617, 48.236246), last=(16.3616214, 48.236309), len=3
  way 390540129: 3936984434->3936984440, first=(16.3567669, 48.2361278), last=(16.359617, 48.236246), len=3
  way 147463454: 731244444->3936984434, first=(16.3537656, 48.231992), last=(16.3567669, 48.2361278), len=17
  way 147463462: 2157233697->731244444, first=(16.3419883, 48.2172473), last=(16.3537656, 48.231992), len=268

Total segments: 9
Used 9 segments, ring has 413 coords
Ring closed: True, first=(16.3408703, 48.2151672), last=(16.3408703, 48.2151672)
LineString valid: True, length: 0.08759135674212101
Polygon valid: True, area: 0.00035913040147501186
First 3 coords: [(16.3408703, 48.2151672), (16.3409357, 48.2152918), (16.3412026, 48.2157703)]
Last 3 coords: [(16.3440565, 48.2152535), (16.3433909, 48.2152357), (16.3408703, 48.2151672)]

8:12

The issue is that the individual way segments from the relation don't form a single ring - they need to be assembled. My earlier code was treating each outer way as a separate ring and building separate polygons from each, which produced invalid polygons. I need to properly assemble the outer ring from all outer way members, and separately handle inner rings.

Let me rewrite the assembly code properly.

8:12

Write /work/solve.py

content · 587 lines · py

"""
Fetch Vienna district boundaries (current and 2014-01-01) from Overpass via osmnx,
compare to find changes, and output vienna_boundary_changes.geojson.

Each feature is a MultiPolygon geometry fragment with:
  - change_type: added_since_2014 | removed_since_2014 | unchanged
  - district_name: normalised district name(s) the fragment belongs to
  - touches_changed: bool, whether that district's boundary touches any changed area
"""

import os
import sys
import json
from collections import OrderedDict

import osmnx as ox
import osmnx._overpass as op
import geopandas as gpd
import pandas as pd
import shapely
import shapely.geometry
import numpy as np

ox.settings.log_console = False

# ---------------------------------------------------------------------------
# 1. Overpass query via osmnx internals
# ---------------------------------------------------------------------------

def assemble_ring_from_ways(way_refs, role, ways_dict, nodes_dict):
    """Assemble a closed ring from a list of way references.
    
    Ways in OSM relations are directed segments that need to be chained
    end-to-end by matching their start/end node IDs.
    
    Returns list of (lon, lat) tuples forming a closed ring, or None.
    """
    if not way_refs:
        return None
    
    # Get all segments with their start/end node ids
    segments = []
    for ref in way_refs:
        if ref not in ways_dict:
            continue
        node_ids = ways_dict[ref]
        if len(node_ids) < 2:
            continue
        coords = [nodes_dict.get(nid) for nid in node_ids]
        coords = [c for c in coords if c is not None]
        if len(coords) < 2:
            continue
        segments.append({
            'ref': ref,
            'coords': coords,
            'start_node': node_ids[0],
            'end_node': node_ids[-1],
        })
    
    if not segments:
        return None
    
    # Chain segments together
    used = [False] * len(segments)
    
    # Start with the first segment
    ring_coords = list(segments[0]['coords'])
    current_end_node = segments[0]['end_node']
    used[0] = True
    
    # Keep connecting segments until we close the ring or can't find more
    changed = True
    while changed:
        changed = False
        for i, seg in enumerate(segments):
            if used[i]:
                continue
            if seg['start_node'] == current_end_node:
                # Append coords (skip first to avoid duplication)
                ring_coords.extend(seg['coords'][1:])
                current_end_node = seg['end_node']
                used[i] = True
                changed = True
                break
            elif seg['end_node'] == current_end_node:
                # Reverse segment
                rev_coords = list(reversed(seg['coords']))
                ring_coords.extend(rev_coords[1:])
                current_end_node = seg['start_node']
                used[i] = True
                changed = True
                break
    
    # Check if the ring is closed
    if ring_coords[0] != ring_coords[-1]:
        ring_coords.append(ring_coords[0])
    
    return ring_coords


def fetch_vienna_districts_osmnx(date=None):
    """Fetch Vienna district (admin_level=9) boundary relations via Overpass.
    
    Returns a GeoDataFrame with columns: geometry, name, district_name, osm_id.
    """
    bbox = (48.1, 16.1, 48.35, 16.6)  # south, west, north, east
    
    if date:
        query = f"""
[out:json][timeout:180][date:"{date}"];
relation["admin_level"="9"]["boundary"="administrative"]({bbox[0]},{bbox[1]},{bbox[2]},{bbox[3]});
out body;
>;
out skel qt;
"""
    else:
        query = f"""
[out:json][timeout:180];
relation["admin_level"="9"]["boundary"="administrative"]({bbox[0]},{bbox[1]},{bbox[2]},{bbox[3]});
out body;
>;
out skel qt;
"""
    
    data = OrderedDict()
    data['data'] = query
    result = op._overpass_request(data)
    
    elements = result.get("elements", [])
    
    nodes = {}
    ways = {}
    relations = []
    
    for el in elements:
        etype = el.get("type")
        eid = el.get("id")
        if etype == "node":
            nodes[eid] = (el.get("lon"), el.get("lat"))
        elif etype == "way":
            ways[eid] = el.get("nodes", [])
        elif etype == "relation":
            relations.append(el)
    
    features = []
    
    for rel in relations:
        tags = rel.get("tags", {})
        name = tags.get("name", "")
        admin_level = tags.get("admin_level", "")
        
        if admin_level != "9":
            continue
        
        members = rel.get("members", [])
        
        # Collect way refs by role
        outer_refs = []
        inner_refs = []
        
        for mem in members:
            if mem.get("type") != "way":
                continue
            role = mem.get("role", "")
            ref = mem.get("ref")
            if ref not in ways:
                continue
            if role == "inner":
                inner_refs.append(ref)
            else:
                outer_refs.append(ref)
        
        # Assemble each outer ring
        outer_polygons = []
        
        # Group outer ways into separate rings
        # Multiple outer rings mean multipolygon
        unused_outer = list(outer_refs)
        
        while unused_outer:
            # Try to build a ring starting from the first unused way
            ring = assemble_ring_from_ways(unused_outer, 'outer', ways, nodes)
            if ring is None or len(ring) < 4:
                break
            
            # Remove used ways
            # (We need to track which refs were used — for simplicity, remove those
            # that start/end match what we consumed)
            # Better approach: find which refs have coords in the ring
            
            # Create linestring and polygon
            try:
                ls = shapely.LineString(ring)
                if not ls.is_valid or ls.is_empty or ls.length < 1e-10:
                    break
                poly = shapely.Polygon(ls)
                poly = shapely.polygon.orient(poly)
                if poly.is_valid and not poly.is_empty and poly.area > 1e-12:
                    outer_polygons.append(poly)
            except Exception:
                break
            
            # Remove used ways from unused_outer
            # We find which refs contributed by checking coordinates
            ring_coords_set = set(tuple(round(c, 10) for c in coord) for coord in ring)
            new_unused = []
            for ref in unused_outer:
                if ref not in ways:
                    continue
                node_ids = ways[ref]
                # Check if this way contributed to the ring
                contributed = False
                for nid in node_ids:
                    coord = nodes.get(nid)
                    if coord and tuple(round(c, 10) for c in coord) in ring_coords_set:
                        contributed = True
                        break
                if not contributed:
                    new_unused.append(ref)
            
            if len(new_unused) == len(unused_outer):
                # No progress - break to avoid infinite loop
                break
            unused_outer = new_unused
        
        if not outer_polygons:
            continue
        
        # Now handle inner rings
        # For each outer polygon, find contained inner rings
        inner_polygons = []
        for ref in inner_refs:
            if ref not in ways:
                continue
            node_ids = ways[ref]
            coords = [nodes.get(nid) for nid in node_ids]
            coords = [c for c in coords if c is not None]
            if len(coords) < 3:
                continue
            if coords[0] != coords[-1]:
                coords.append(coords[0])
            try:
                inner_ls = shapely.LineString(coords)
                if inner_ls.is_valid and not inner_ls.is_empty and inner_ls.length > 1e-10:
                    inner_poly = shapely.Polygon(inner_ls)
                    inner_poly = shapely.polygon.orient(inner_poly)
                    if inner_poly.is_valid and not inner_poly.is_empty:
                        inner_polygons.append(inner_poly)
            except Exception:
                continue
        
        # Build final polygons with holes
        final_polygons = []
        for outer_poly in outer_polygons:
            # Find inners contained in this outer
            inners_for_this = []
            remaining_inners = []
            for ipoly in inner_polygons:
                try:
                    if outer_poly.contains(ipoly.representative_point()) or outer_poly.intersects(ipoly):
                        inners_for_this.append(ipoly)
                    else:
                        remaining_inners.append(ipoly)
                except Exception:
                    remaining_inners.append(ipoly)
            inner_polygons = remaining_inners
            
            if inners_for_this:
                # Create polygon with holes
                try:
                    outer_coords = list(outer_poly.exterior.coords)
                    inner_coords_list = [list(ip.exterior.coords) for ip in inners_for_this]
                    poly = shapely.Polygon(outer_coords, inner_coords_list)
                    poly = shapely.polygon.orient(poly)
                    if poly.is_valid and not poly.is_empty:
                        final_polygons.append(poly)
                except Exception:
                    final_polygons.append(outer_poly)
            else:
                final_polygons.append(outer_poly)
        
        if final_polygons:
            geom = shapely.unary_union(final_polygons)
            if geom.geom_type == "Polygon":
                geom = shapely.geometry.MultiPolygon([geom])
            
            if not geom.is_empty:
                features.append({
                    "geometry": geom,
                    "name": name,
                    "osm_id": rel.get("id"),
                })
    
    if not features:
        return gpd.GeoDataFrame()
    
    gdf = gpd.GeoDataFrame(features, crs="EPSG:4326")
    return gdf


# ---------------------------------------------------------------------------
# 2. Name normalisation
# ---------------------------------------------------------------------------

# Canonical Vienna district names
VIENNA_DISTRICTS_CANON = {
    "innere stadt": "Innere Stadt",
    "leopoldstadt": "Leopoldstadt",
    "landstraße": "Landstraße",
    "landstrasse": "Landstraße",
    "wieden": "Wieden",
    "margareten": "Margareten",
    "margarethen": "Margareten",
    "mariahilf": "Mariahilf",
    "neubau": "Neubau",
    "josefstadt": "Josefstadt",
    "alsergrund": "Alsergrund",
    "favoriten": "Favoriten",
    "simmering": "Simmering",
    "meidling": "Meidling",
    "hietzing": "Hietzing",
    "penzing": "Penzing",
    "rudolfsheim-fünfhaus": "Rudolfsheim-Fünfhaus",
    "rudolfsheim-fuenfhaus": "Rudolfsheim-Fünfhaus",
    "ottakring": "Ottakring",
    "hernals": "Hernals",
    "währing": "Währing",
    "waehring": "Währing",
    "döbling": "Döbling",
    "doebling": "Döbling",
    "brigittenau": "Brigittenau",
    "floridsdorf": "Floridsdorf",
    "donaustadt": "Donaustadt",
    "liesing": "Liesing",
}


def normalise_name(name):
    """Normalise a district name to its canonical form."""
    if not name or pd.isna(name):
        return None
    n = name.strip()
    key = n.lower().replace("  ", " ").replace("ß", "ss").replace("-", "-")
    if key in VIENNA_DISTRICTS_CANON:
        return VIENNA_DISTRICTS_CANON[key]
    key2 = key.replace("ss", "ß")
    if key2 in VIENNA_DISTRICTS_CANON:
        return VIENNA_DISTRICTS_CANON[key2]
    for pattern, canon in VIENNA_DISTRICTS_CANON.items():
        if pattern in key or key in pattern:
            return canon
    return n


# ---------------------------------------------------------------------------
# 3. Main pipeline
# ---------------------------------------------------------------------------

def main():
    print("=" * 60)
    print("Fetching CURRENT Vienna district boundaries...")
    print("=" * 60)
    current_gdf = fetch_vienna_districts_osmnx()
    print(f"  Got {len(current_gdf)} current district features")
    
    print("\n" + "=" * 60)
    print("Fetching 2014-01-01 Vienna district boundaries...")
    print("=" * 60)
    historical_gdf = fetch_vienna_districts_osmnx(date="2014-01-01T00:00:00Z")
    print(f"  Got {len(historical_gdf)} historical district features")
    
    if current_gdf.empty or historical_gdf.empty:
        print("ERROR: Empty data returned!")
        sys.exit(1)
    
    # Normalise names
    current_gdf["district_name"] = current_gdf["name"].apply(normalise_name)
    historical_gdf["district_name"] = historical_gdf["name"].apply(normalise_name)
    
    print("\nCurrent district names:")
    for n in sorted(current_gdf["district_name"].dropna().unique()):
        print(f"  - {n}")
    print("\nHistorical district names:")
    for n in sorted(historical_gdf["district_name"].dropna().unique()):
        print(f"  - {n}")
    
    # Remove non-Vienna-district features
    valid_names = set(VIENNA_DISTRICTS_CANON.values())
    current_gdf = current_gdf[current_gdf["district_name"].isin(valid_names)].copy()
    historical_gdf = historical_gdf[historical_gdf["district_name"].isin(valid_names)].copy()
    
    print(f"\nAfter filtering to Vienna districts only:")
    print(f"  Current: {len(current_gdf)} districts")
    print(f"  Historical: {len(historical_gdf)} districts")
    
    current_names = set(current_gdf["district_name"].dropna())
    historical_names = set(historical_gdf["district_name"].dropna())
    
    common = current_names & historical_names
    only_current = current_names - historical_names
    only_historical = historical_names - current_names
    
    print(f"\nCommon districts: {len(common)}")
    if only_current:
        print(f"Only in current: {only_current}")
    if only_historical:
        print(f"Only in historical: {only_historical}")
    
    # Build unified dissolve
    current_union = current_gdf.geometry.unary_union
    historical_union = historical_gdf.geometry.unary_union
    
    print(f"\nCurrent total area: {current_union.area:.6f} sq deg")
    print(f"Historical total area: {historical_union.area:.6f} sq deg")
    
    # Symmetric difference
    added = shapely.difference(current_union, historical_union)
    removed = shapely.difference(historical_union, current_union)
    unchanged = shapely.intersection(current_union, historical_union)
    
    print(f"Added area: {added.area:.6f}" if not added.is_empty else "Added: empty")
    print(f"Removed area: {removed.area:.6f}" if not removed.is_empty else "Removed: empty")
    print(f"Unchanged area: {unchanged.area:.6f}" if not unchanged.is_empty else "Unchanged: empty")
    
    # Collect fragments
    fragments = []
    
    def add_fragments(geom, ctype, dist_name_fn):
        if geom.is_empty:
            return
        if geom.geom_type == "MultiPolygon":
            polys = list(geom.geoms)
        elif geom.geom_type == "Polygon":
            polys = [geom]
        elif geom.geom_type == "GeometryCollection":
            polys = [g for g in geom.geoms if g.geom_type in ("Polygon", "MultiPolygon")]
        else:
            polys = [geom]
        
        for poly in polys:
            if poly.is_empty:
                continue
            if poly.geom_type == "Polygon":
                mp = shapely.geometry.MultiPolygon([poly])
            elif poly.geom_type == "MultiPolygon":
                mp = poly
            else:
                continue
            dname = dist_name_fn(mp)
            fragments.append({
                "geometry": mp,
                "change_type": ctype,
                "district_name": dname,
            })
    
    def added_dist_name(geom):
        districts = set()
        for _, row in current_gdf.iterrows():
            if geom.intersects(row["geometry"]):
                districts.add(row["district_name"])
        return ", ".join(sorted(districts)) if districts else "unknown"
    
    add_fragments(added, "added_since_2014", added_dist_name)
    
    def removed_dist_name(geom):
        districts = set()
        for _, row in historical_gdf.iterrows():
            if geom.intersects(row["geometry"]):
                districts.add(row["district_name"])
        return ", ".join(sorted(districts)) if districts else "unknown"
    
    add_fragments(removed, "removed_since_2014", removed_dist_name)
    
    # Unchanged fragments - per current district
    for _, row in current_gdf.iterrows():
        dname = row["district_name"]
        dist_geom = row["geometry"]
        unchanged_part = shapely.intersection(dist_geom, unchanged)
        if not unchanged_part.is_empty:
            def dn_fn(geom, dn=dname):
                return dn
            add_fragments(unchanged_part, "unchanged", dn_fn)
    
    result_gdf = gpd.GeoDataFrame(fragments, crs="EPSG:4326")
    
    if result_gdf.empty:
        print("No fragments — all boundaries unchanged.")
        result_gdf = current_gdf.copy()
        result_gdf["change_type"] = "unchanged"
        result_gdf["district_name"] = result_gdf["district_name"]
        result_gdf = result_gdf[["geometry", "change_type", "district_name"]]
    
    print(f"\nFragments created: {len(result_gdf)}")
    if not result_gdf.empty:
        print(result_gdf["change_type"].value_counts())
    
    # -----------------------------------------------------------------------
    # 4. Compute touches_changed for each district
    # -----------------------------------------------------------------------
    
    changed_geoms = []
    for _, row in result_gdf.iterrows():
        if row["change_type"] in ("added_since_2014", "removed_since_2014"):
            changed_geoms.append(row["geometry"])
    
    if changed_geoms:
        changed_union = shapely.unary_union(changed_geoms)
    else:
        changed_union = shapely.geometry.MultiPolygon()
    
    district_touches = {}
    for _, row in current_gdf.iterrows():
        dname = row["district_name"]
        geom = row["geometry"]
        if not changed_union.is_empty:
            # Use touches or a small buffer for robustness
            touches = geom.touches(changed_union)
            if not touches:
                # Also check if the district itself contains changed areas
                overlap = shapely.intersection(geom, changed_union)
                touches = not overlap.is_empty
            district_touches[dname] = touches
        else:
            district_touches[dname] = False
    
    for _, row in historical_gdf.iterrows():
        dname = row["district_name"]
        if dname not in district_touches:
            geom = row["geometry"]
            if not changed_union.is_empty:
                touches = geom.touches(changed_union)
                if not touches:
                    overlap = shapely.intersection(geom, changed_union)
                    touches = not overlap.is_empty
                district_touches[dname] = touches
            else:
                district_touches[dname] = False
    
    def get_touches(dn_str):
        if pd.isna(dn_str) or dn_str == "unknown":
            return False
        for d in dn_str.split(","):
            d = d.strip()
            if district_touches.get(d, False):
                return True
        return False
    
    result_gdf["touches_changed"] = result_gdf["district_name"].apply(get_touches)
    
    # Simplify geometries slightly
    result_gdf["geometry"] = result_gdf["geometry"].simplify(0.00001, preserve_topology=True)
    result_gdf["geometry"] = result_gdf["geometry"].buffer(0)
    
    # Ensure MultiPolygon type
    def ensure_multipolygon(geom):
        if geom.geom_type == "Polygon":
            return shapely.geometry.MultiPolygon([geom])
        return geom
    
    result_gdf["geometry"] = result_gdf["geometry"].apply(ensure_multipolygon)
    
    # -----------------------------------------------------------------------
    # 5. Output
    # -----------------------------------------------------------------------
    output_path = "/work/vienna_boundary_changes.geojson"
    result_gdf.to_file(output_path, driver="GeoJSON")
    
    print(f"\n{'=' * 60}")
    print(f"Output written to {output_path}")
    print(f"Features: {len(result_gdf)}")
    print(f"Columns: {result_gdf.columns.tolist()}")
    print(f"\nSample features:")
    print(result_gdf[["change_type", "district_name", "touches_changed"]].head(15))
    
    print(f"\nDistrict touch summary:")
    for dname, touches in sorted(district_touches.items()):
        print(f"  {dname}: touches_changed={touches}")
    
    # Verify output
    verify = gpd.read_file(output_path)
    print(f"\nVerification: read back {len(verify)} features")
    print(verify[["change_type", "district_name", "touches_changed"]].head(10))
    print(f"Geometry types: {verify.geometry.geom_type.unique()}")


if __name__ == "__main__":
    main()

Wrote /work/solve.py (21127 chars).

8:14

The ring assembly is getting complex. Let me test it first with a quick run to see if we can get the districts to parse correctly.

8:14

Bash cd /work && timeout 300 python solve.py 2>&1

============================================================
Fetching CURRENT Vienna district boundaries...
============================================================
  Got 0 current district features

============================================================
Fetching 2014-01-01 Vienna district boundaries...
============================================================
  Got 0 historical district features
ERROR: Empty data returned!

8:25

The ring assembly logic seems to be the issue. Let me debug it more carefully. The issue is that `assemble_ring_from_ways` might be consuming all outer refs as one ring, or the way detection is failing.

Let me trace through the logic more carefully.

8:25

Bash cd /work && python3 << 'PYEOF'

cd /work && python3 << 'PYEOF'
import osmnx as ox
import osmnx._overpass as op
from collections import OrderedDict

ox.settings.log_console = False

query = """
[out:json][timeout:180];
relation["admin_level"="9"]["boundary"="administrative"](48.1,16.1,48.35,16.6);
out body;
>;
out skel qt;
"""

data = OrderedDict()
data['data'] = query
result = op._overpass_request(data)
elements = result.get("elements", [])

nodes = {}
ways = {}
relations = []

for el in elements:
    etype = el.get("type")
    eid = el.get("id")
    if etype == "node":
        nodes[eid] = (el.get("lon"), el.get("lat"))
    elif etype == "way":
        ways[eid] = el.get("nodes", [])
    elif etype == "relation":
        relations.append(el)

print(f"Nodes: {len(nodes)}, Ways: {len(ways)}, Relations: {len(relations)}")

# Test assemble_ring_from_ways for Alsergrund
def assemble_ring_from_ways(way_refs, role, ways_dict, nodes_dict):
    if not way_refs:
        return None
    
    segments = []
    for ref in way_refs:
        if ref not in ways_dict:
            continue
        node_ids = ways_dict[ref]
        if len(node_ids) < 2:
            continue
        coords = [nodes_dict.get(nid) for nid in node_ids]
        coords = [c for c in coords if c is not None]
        if len(coords) < 2:
            continue
        segments.append({
            'ref': ref,
            'coords': coords,
            'start_node': node_ids[0],
            'end_node': node_ids[-1],
        })
    
    if not segments:
        return None
    
    used = [False] * len(segments)
    ring_coords = list(segments[0]['coords'])
    current_end_node = segments[0]['end_node']
    used[0] = True
    
    changed = True
    while changed:
        changed = False
        for i, seg in enumerate(segments):
            if used[i]:
                continue
            if seg['start_node'] == current_end_node:
                ring_coords.extend(seg['coords'][1:])
                current_end_node = seg['end_node']
                used[i] = True
                changed = True
                break
            elif seg['end_node'] == current_end_node:
                rev_coords = list(reversed(seg['coords']))
                ring_coords.extend(rev_coords[1:])
                current_end_node = seg['start_node']
                used[i] = True
                changed = True
                break
    
    if ring_coords[0] != ring_coords[-1]:
        ring_coords.append(ring_coords[0])
    
    return ring_coords, used

# Test for Alsergrund
for rel in relations:
    tags = rel.get("tags", {})
    name = tags.get("name", "")
    if name == "Alsergrund":
        members = rel.get("members", [])
        outer_refs = []
        inner_refs = []
        for mem in members:
            if mem.get("type") != "way":
                continue
            role = mem.get("role", "")
            ref = mem.get("ref")
            if ref not in ways:
                continue
            if role == "inner":
                inner_refs.append(ref)
            else:
                outer_refs.append(ref)
        
        print(f"\nAlsergrund: {len(outer_refs)} outer refs, {len(inner_refs)} inner refs")
        print(f"Outer refs: {outer_refs}")
        
        result = assemble_ring_from_ways(outer_refs, 'outer', ways, nodes)
        if result:
            ring_coords, used = result
            print(f"Ring: {len(ring_coords)} coords, used: {used}")
            print(f"Closed: {ring_coords[0] == ring_coords[-1]}")
            import shapely
            ls = shapely.LineString(ring_coords)
            print(f"LS valid: {ls.is_valid}, length: {ls.length}")
            poly = shapely.Polygon(ring_coords)
            print(f"Poly valid: {poly.is_valid}, area: {poly.area}")
        else:
            print("Ring assembly returned None")
        break

# Test all relations
print("\n\nTesting ALL relations:")
for rel in relations:
    tags = rel.get("tags", {})
    name = tags.get("name", "")
    members = rel.get("members", [])
    outer_refs = []
    for mem in members:
        if mem.get("type") != "way":
            continue
        role = mem.get("role", "")
        ref = mem.get("ref")
        if ref in ways and role != "inner":
            outer_refs.append(ref)
    
    result = assemble_ring_from_ways(outer_refs, 'outer', ways, nodes)
    if result:
        ring_coords, used = result
        import shapely
        ls = shapely.LineString(ring_coords)
        poly = shapely.Polygon(ring_coords)
        valid = poly.is_valid and poly.area > 1e-12
        print(f"{name}: {len(outer_refs)} outer refs, ring {len(ring_coords)} coords, used all={all(used)}, valid={valid}, area={poly.area:.8f}")
    else:
        print(f"{name}: FAILED - {len(outer_refs)} outer refs")
PYEOF

Nodes: 6210, Ways: 499, Relations: 23

Alsergrund: 9 outer refs, 0 inner refs
Outer refs: [147468285, 8046654, 147396089, 147463455, 147463456, 390540126, 390540129, 147463454, 147463462]
Ring: 413 coords, used: [True, True, True, True, True, True, True, True, True]
Closed: True
LS valid: True, length: 0.08759135674212101
Poly valid: True, area: 0.00035913040147501186


Testing ALL relations:
Alsergrund: 9 outer refs, ring 413 coords, used all=True, valid=True, area=0.00035913
Hietzing: 50 outer refs, ring 308 coords, used all=True, valid=True, area=0.00456027
Innere Stadt: 7 outer refs, ring 176 coords, used all=True, valid=True, area=0.00034707
Josefstadt: 16 outer refs, ring 137 coords, used all=True, valid=True, area=0.00013189
Leopoldstadt: 11 outer refs, ring 208 coords, used all=True, valid=True, area=0.00232716
Mariahilf: 26 outer refs, ring 212 coords, used all=True, valid=True, area=0.00017618
Meidling: 53 outer refs, ring 306 coords, used all=False, valid=True, area=0.00097967
Neubau: 31 outer refs, ring 182 coords, used all=True, valid=True, area=0.00019452
Penzing: 45 outer refs, ring 1317 coords, used all=True, valid=True, area=0.00408591
Rudolfsheim-Fünfhaus: 16 outer refs, ring 194 coords, used all=True, valid=True, area=0.00047390
Währing: 51 outer refs, ring 835 coords, used all=True, valid=True, area=0.00076838
Landstraße: 48 outer refs, ring 326 coords, used all=True, valid=True, area=0.00089591
Brigittenau: 10 outer refs, ring 113 coords, used all=True, valid=True, area=0.00069161
Donaustadt: 66 outer refs, ring 572 coords, used all=True, valid=True, area=0.01238082
Döbling: 51 outer refs, ring 488 coords, used all=True, valid=True, area=0.00302130
Favoriten: 43 outer refs, ring 417 coords, used all=True, valid=True, area=0.00384630
Floridsdorf: 38 outer refs, ring 430 coords, used all=True, valid=True, area=0.00538470
Hernals: 81 outer refs, ring 1137 coords, used all=True, valid=True, area=0.00137956
Liesing: 69 outer refs, ring 921 coords, used all=True, valid=True, area=0.00387493
Margareten: 22 outer refs, ring 177 coords, used all=True, valid=True, area=0.00024329
Ottakring: 76 outer refs, ring 668 coords, used all=True, valid=True, area=0.00104948
Simmering: 43 outer refs, ring 287 coords, used all=True, valid=True, area=0.00281169
Wieden: 30 outer refs, ring 175 coords, used all=True, valid=True, area=0.00021484

8:50

The ring assembly works correctly. The issue is that my function returns a tuple (ring_coords, used) but in the solve.py I had the function returning just ring_coords (or None). Let me check the solve.py code more carefully.

Actually, looking at the solve.py code, the `assemble_ring_from_ways` function there returns `ring_coords` or None (not a tuple). That's correct for that version. But the issue is that it only builds one ring from all the outer ways - and for districts with multiple outer rings (like Meidling with 53 outer ways but only some used), we need to handle multiple rings.

Wait, looking at the test output: Meidling has 53 outer refs but only some were used in the ring (used all=False). This means there are multiple rings. The function only builds one ring. After building one ring, the unused ways remain and need to be built into additional rings.

But wait, the actual problem is that in solve.py, the function `assemble_ring_from_ways` is being called correctly but then the result is used to create polygons. Let me look at the code flow more carefully...

Actually, the issue might be that in `fetch_vienna_districts_osmnx`, after building the first ring, I try to figure out which ways were used by checking if their coordinates are in the ring. This is fragile. Let me rework this.

Let me rewrite solve.py to properly handle this. The key insight from the test is:
1. The ring assembly works for most districts
2. For multi-ring districts, we need to iteratively build rings from remaining unused ways

Let me rewrite the function more cleanly.

8:50

Write /work/solve.py

content · 529 lines · py

"""
Fetch Vienna district boundaries (current and 2014-01-01) from Overpass via osmnx,
compare to find changes, and output vienna_boundary_changes.geojson.
"""

import os
import sys
import json
from collections import OrderedDict

import osmnx as ox
import osmnx._overpass as op
import geopandas as gpd
import pandas as pd
import shapely
import shapely.geometry
import numpy as np

ox.settings.log_console = False

# ---------------------------------------------------------------------------
# 1. Overpass query
# ---------------------------------------------------------------------------

def assemble_ring_from_ways(way_refs, ways_dict, nodes_dict):
    """Assemble a closed ring from a sequence of OSM way references.
    
    Returns (ring_coords, used_mask) or (None, None) if no ring could be built.
    """
    if not way_refs:
        return None, None
    
    segments = []
    for ref in way_refs:
        if ref not in ways_dict:
            continue
        node_ids = ways_dict[ref]
        if len(node_ids) < 2:
            continue
        coords = [nodes_dict.get(nid) for nid in node_ids]
        coords = [c for c in coords if c is not None]
        if len(coords) < 2:
            continue
        segments.append({
            'ref': ref,
            'coords': list(coords),
            'start_node': node_ids[0],
            'end_node': node_ids[-1],
        })
    
    if not segments:
        return None, None
    
    used = [False] * len(segments)
    ring_coords = list(segments[0]['coords'])
    current_end_node = segments[0]['end_node']
    used[0] = True
    
    changed = True
    while changed:
        changed = False
        for i, seg in enumerate(segments):
            if used[i]:
                continue
            if seg['start_node'] == current_end_node:
                ring_coords.extend(seg['coords'][1:])
                current_end_node = seg['end_node']
                used[i] = True
                changed = True
                break
            elif seg['end_node'] == current_end_node:
                rev_coords = list(reversed(seg['coords']))
                ring_coords.extend(rev_coords[1:])
                current_end_node = seg['start_node']
                used[i] = True
                changed = True
                break
    
    if ring_coords[0] != ring_coords[-1]:
        ring_coords.append(ring_coords[0])
    
    return ring_coords, used


def fetch_vienna_districts_osmnx(date=None):
    """Fetch Vienna district (admin_level=9) boundary relations via Overpass."""
    bbox = (48.1, 16.1, 48.35, 16.6)
    
    if date:
        query = f"""
[out:json][timeout:180][date:"{date}"];
relation["admin_level"="9"]["boundary"="administrative"]({bbox[0]},{bbox[1]},{bbox[2]},{bbox[3]});
out body;
>;
out skel qt;
"""
    else:
        query = f"""
[out:json][timeout:180];
relation["admin_level"="9"]["boundary"="administrative"]({bbox[0]},{bbox[1]},{bbox[2]},{bbox[3]});
out body;
>;
out skel qt;
"""
    
    data = OrderedDict()
    data['data'] = query
    result = op._overpass_request(data)
    elements = result.get("elements", [])
    
    nodes = {}
    ways = {}
    relations = []
    
    for el in elements:
        etype = el.get("type")
        eid = el.get("id")
        if etype == "node":
            nodes[eid] = (el.get("lon"), el.get("lat"))
        elif etype == "way":
            ways[eid] = el.get("nodes", [])
        elif etype == "relation":
            relations.append(el)
    
    features = []
    
    for rel in relations:
        tags = rel.get("tags", {})
        name = tags.get("name", "")
        admin_level = tags.get("admin_level", "")
        
        if admin_level != "9":
            continue
        
        members = rel.get("members", [])
        
        # Collect way refs by role
        all_outer_refs = []
        inner_refs_coords = []
        
        for mem in members:
            if mem.get("type") != "way":
                continue
            role = mem.get("role", "")
            ref = mem.get("ref")
            if ref not in ways:
                continue
            if role == "inner":
                node_ids = ways[ref]
                coords = [nodes.get(nid) for nid in node_ids]
                coords = [c for c in coords if c is not None]
                if len(coords) >= 3:
                    if coords[0] != coords[-1]:
                        coords.append(coords[0])
                    inner_refs_coords.append(coords)
            else:
                all_outer_refs.append(ref)
        
        # Build outer rings iteratively
        outer_polygons = []
        remaining = list(all_outer_refs)
        
        while remaining:
            ring_coords, used = assemble_ring_from_ways(remaining, ways, nodes)
            if ring_coords is None or len(ring_coords) < 4:
                break
            
            try:
                ls = shapely.LineString(ring_coords)
                if not ls.is_valid or ls.is_empty or ls.length < 1e-10:
                    break
                poly = shapely.Polygon(ring_coords)
                poly = shapely.polygon.orient(poly)
                if poly.is_valid and not poly.is_empty and poly.area > 1e-12:
                    outer_polygons.append(poly)
            except Exception:
                break
            
            # Remove used refs from remaining
            remaining = [r for i, r in enumerate(remaining) 
                        if i >= len(used) or not used[i]]
        
        if not outer_polygons:
            continue
        
        # Handle inner rings
        inner_polygons = []
        for coords in inner_refs_coords:
            try:
                inner_ls = shapely.LineString(coords)
                if inner_ls.is_valid and not inner_ls.is_empty and inner_ls.length > 1e-10:
                    ipoly = shapely.Polygon(coords)
                    ipoly = shapely.polygon.orient(ipoly)
                    if ipoly.is_valid and not ipoly.is_empty:
                        inner_polygons.append(ipoly)
            except Exception:
                continue
        
        # Cut inner rings from outer polygons
        final_polygons = []
        for outer_poly in outer_polygons:
            inners_here = []
            rest = []
            for ipoly in inner_polygons:
                try:
                    if outer_poly.contains(ipoly.representative_point()) or outer_poly.intersects(ipoly):
                        inners_here.append(ipoly)
                    else:
                        rest.append(ipoly)
                except Exception:
                    rest.append(ipoly)
            inner_polygons = rest
            
            if inners_here:
                try:
                    outer_coords = list(outer_poly.exterior.coords)
                    inner_coords_list = [list(ip.exterior.coords) for ip in inners_here]
                    poly = shapely.Polygon(outer_coords, inner_coords_list)
                    poly = shapely.polygon.orient(poly)
                    if poly.is_valid and not poly.is_empty:
                        final_polygons.append(poly)
                    else:
                        final_polygons.append(outer_poly)
                except Exception:
                    final_polygons.append(outer_poly)
            else:
                final_polygons.append(outer_poly)
        
        if final_polygons:
            geom = shapely.unary_union(final_polygons)
            if geom.geom_type == "Polygon":
                geom = shapely.geometry.MultiPolygon([geom])
            
            if not geom.is_empty:
                features.append({
                    "geometry": geom,
                    "name": name,
                    "osm_id": rel.get("id"),
                })
    
    if not features:
        return gpd.GeoDataFrame()
    
    return gpd.GeoDataFrame(features, crs="EPSG:4326")


# ---------------------------------------------------------------------------
# 2. Name normalisation
# ---------------------------------------------------------------------------

VIENNA_DISTRICTS_CANON = {
    "innere stadt": "Innere Stadt",
    "leopoldstadt": "Leopoldstadt",
    "landstraße": "Landstraße",
    "landstrasse": "Landstraße",
    "wieden": "Wieden",
    "margareten": "Margareten",
    "margarethen": "Margareten",
    "mariahilf": "Mariahilf",
    "neubau": "Neubau",
    "josefstadt": "Josefstadt",
    "alsergrund": "Alsergrund",
    "favoriten": "Favoriten",
    "simmering": "Simmering",
    "meidling": "Meidling",
    "hietzing": "Hietzing",
    "penzing": "Penzing",
    "rudolfsheim-fünfhaus": "Rudolfsheim-Fünfhaus",
    "rudolfsheim-fuenfhaus": "Rudolfsheim-Fünfhaus",
    "ottakring": "Ottakring",
    "hernals": "Hernals",
    "währing": "Währing",
    "waehring": "Währing",
    "döbling": "Döbling",
    "doebling": "Döbling",
    "brigittenau": "Brigittenau",
    "floridsdorf": "Floridsdorf",
    "donaustadt": "Donaustadt",
    "liesing": "Liesing",
}


def normalise_name(name):
    if not name or pd.isna(name):
        return None
    n = name.strip()
    key = n.lower().replace("  ", " ").replace("ß", "ss").replace("-", "-")
    if key in VIENNA_DISTRICTS_CANON:
        return VIENNA_DISTRICTS_CANON[key]
    key2 = key.replace("ss", "ß")
    if key2 in VIENNA_DISTRICTS_CANON:
        return VIENNA_DISTRICTS_CANON[key2]
    for pattern, canon in VIENNA_DISTRICTS_CANON.items():
        if pattern in key or key in pattern:
            return canon
    return n


# ---------------------------------------------------------------------------
# 3. Main pipeline
# ---------------------------------------------------------------------------

def main():
    print("=" * 60)
    print("Fetching CURRENT Vienna district boundaries...")
    print("=" * 60)
    current_gdf = fetch_vienna_districts_osmnx()
    print(f"  Got {len(current_gdf)} current district features")
    
    print("\n" + "=" * 60)
    print("Fetching 2014-01-01 Vienna district boundaries...")
    print("=" * 60)
    historical_gdf = fetch_vienna_districts_osmnx(date="2014-01-01T00:00:00Z")
    print(f"  Got {len(historical_gdf)} historical district features")
    
    if current_gdf.empty or historical_gdf.empty:
        print("ERROR: Empty data returned!")
        sys.exit(1)
    
    # Normalise names
    current_gdf["district_name"] = current_gdf["name"].apply(normalise_name)
    historical_gdf["district_name"] = historical_gdf["name"].apply(normalise_name)
    
    print("\nCurrent district names:")
    for n in sorted(current_gdf["district_name"].dropna().unique()):
        print(f"  - {n}")
    print("\nHistorical district names:")
    for n in sorted(historical_gdf["district_name"].dropna().unique()):
        print(f"  - {n}")
    
    # Filter to Vienna districts only
    valid_names = set(VIENNA_DISTRICTS_CANON.values())
    current_gdf = current_gdf[current_gdf["district_name"].isin(valid_names)].copy()
    historical_gdf = historical_gdf[historical_gdf["district_name"].isin(valid_names)].copy()
    
    print(f"\nAfter filtering:")
    print(f"  Current: {len(current_gdf)} districts")
    print(f"  Historical: {len(historical_gdf)} districts")
    
    current_names = set(current_gdf["district_name"].dropna())
    historical_names = set(historical_gdf["district_name"].dropna())
    
    common = current_names & historical_names
    only_current = current_names - historical_names
    only_historical = historical_names - current_names
    
    print(f"\nCommon districts: {len(common)}")
    if only_current:
        print(f"Only in current: {only_current}")
    if only_historical:
        print(f"Only in historical: {only_historical}")
    
    # Build unified dissolve
    current_union = current_gdf.geometry.unary_union
    historical_union = historical_gdf.geometry.unary_union
    
    print(f"\nCurrent total area: {current_union.area:.8f} sq deg")
    print(f"Historical total area: {historical_union.area:.8f} sq deg")
    
    # Symmetric difference
    added = shapely.difference(current_union, historical_union)
    removed = shapely.difference(historical_union, current_union)
    unchanged = shapely.intersection(current_union, historical_union)
    
    for label, geom in [("Added", added), ("Removed", removed), ("Unchanged", unchanged)]:
        if geom.is_empty:
            print(f"{label}: empty")
        else:
            print(f"{label}: area={geom.area:.8f}, type={geom.geom_type}")
    
    # Collect fragments
    fragments = []
    
    def add_fragments(geom, ctype, dist_name_fn):
        if geom.is_empty:
            return
        if geom.geom_type == "MultiPolygon":
            polys = list(geom.geoms)
        elif geom.geom_type == "Polygon":
            polys = [geom]
        elif geom.geom_type == "GeometryCollection":
            polys = [g for g in geom.geoms if g.geom_type in ("Polygon", "MultiPolygon")]
        else:
            polys = [geom]
        
        for poly in polys:
            if poly.is_empty:
                continue
            if poly.geom_type == "Polygon":
                mp = shapely.geometry.MultiPolygon([poly])
            elif poly.geom_type == "MultiPolygon":
                mp = poly
            else:
                continue
            dname = dist_name_fn(mp)
            fragments.append({
                "geometry": mp,
                "change_type": ctype,
                "district_name": dname,
            })
    
    def added_dist_name(geom):
        districts = set()
        for _, row in current_gdf.iterrows():
            if geom.intersects(row["geometry"]):
                districts.add(row["district_name"])
        return ", ".join(sorted(districts)) if districts else "unknown"
    
    add_fragments(added, "added_since_2014", added_dist_name)
    
    def removed_dist_name(geom):
        districts = set()
        for _, row in historical_gdf.iterrows():
            if geom.intersects(row["geometry"]):
                districts.add(row["district_name"])
        return ", ".join(sorted(districts)) if districts else "unknown"
    
    add_fragments(removed, "removed_since_2014", removed_dist_name)
    
    # Unchanged fragments per current district
    for _, row in current_gdf.iterrows():
        dname = row["district_name"]
        dist_geom = row["geometry"]
        unchanged_part = shapely.intersection(dist_geom, unchanged)
        if not unchanged_part.is_empty:
            def dn_fn(geom, dn=dname):
                return dn
            add_fragments(unchanged_part, "unchanged", dn_fn)
    
    result_gdf = gpd.GeoDataFrame(fragments, crs="EPSG:4326")
    
    if result_gdf.empty:
        print("No fragments — all boundaries unchanged.")
        result_gdf = current_gdf.copy()
        result_gdf["change_type"] = "unchanged"
        result_gdf = result_gdf[["geometry", "change_type", "district_name"]]
    
    print(f"\nFragments created: {len(result_gdf)}")
    if not result_gdf.empty:
        print(result_gdf["change_type"].value_counts())
    
    # -----------------------------------------------------------------------
    # 4. touches_changed
    # -----------------------------------------------------------------------
    
    changed_geoms = []
    for _, row in result_gdf.iterrows():
        if row["change_type"] in ("added_since_2014", "removed_since_2014"):
            changed_geoms.append(row["geometry"])
    
    if changed_geoms:
        changed_union = shapely.unary_union(changed_geoms)
    else:
        changed_union = shapely.geometry.MultiPolygon()
    
    district_touches = {}
    for _, row in current_gdf.iterrows():
        dname = row["district_name"]
        geom = row["geometry"]
        if not changed_union.is_empty:
            touches = geom.touches(changed_union)
            if not touches:
                overlap = shapely.intersection(geom, changed_union)
                touches = not overlap.is_empty
            district_touches[dname] = touches
        else:
            district_touches[dname] = False
    
    for _, row in historical_gdf.iterrows():
        dname = row["district_name"]
        if dname not in district_touches:
            geom = row["geometry"]
            if not changed_union.is_empty:
                touches = geom.touches(changed_union)
                if not touches:
                    overlap = shapely.intersection(geom, changed_union)
                    touches = not overlap.is_empty
                district_touches[dname] = touches
            else:
                district_touches[dname] = False
    
    def get_touches(dn_str):
        if pd.isna(dn_str) or dn_str == "unknown":
            return False
        for d in dn_str.split(","):
            d = d.strip()
            if district_touches.get(d, False):
                return True
        return False
    
    result_gdf["touches_changed"] = result_gdf["district_name"].apply(get_touches)
    
    # Simplify and ensure MultiPolygon
    result_gdf["geometry"] = result_gdf["geometry"].simplify(0.00001, preserve_topology=True)
    result_gdf["geometry"] = result_gdf["geometry"].buffer(0)
    
    def ensure_multipolygon(geom):
        if geom.geom_type == "Polygon":
            return shapely.geometry.MultiPolygon([geom])
        return geom
    
    result_gdf["geometry"] = result_gdf["geometry"].apply(ensure_multipolygon)
    
    # -----------------------------------------------------------------------
    # 5. Output
    # -----------------------------------------------------------------------
    output_path = "/work/vienna_boundary_changes.geojson"
    result_gdf.to_file(output_path, driver="GeoJSON")
    
    print(f"\n{'=' * 60}")
    print(f"Output written to {output_path}")
    print(f"Features: {len(result_gdf)}")
    print(f"Columns: {result_gdf.columns.tolist()}")
    print(f"\nSample features:")
    print(result_gdf[["change_type", "district_name", "touches_changed"]].head(15))
    
    print(f"\nDistrict touch summary:")
    for dname, touches in sorted(district_touches.items()):
        print(f"  {dname}: touches_changed={touches}")
    
    # Verify
    verify = gpd.read_file(output_path)
    print(f"\nVerification: read back {len(verify)} features")
    print(verify[["change_type", "district_name", "touches_changed"]].head(10))
    print(f"Geometry types: {verify.geometry.geom_type.unique()}")


if __name__ == "__main__":
    main()

Wrote /work/solve.py (18422 chars).

9:01

The problem is that my `fetch_vienna_districts_osmnx` function works in the test script but returns 0 in solve.py. Let me debug this more carefully. The issue might be that `assemble_ring_from_ways` returns (None, None) or the ring building logic has a bug in the full script.

Let me check: in the test above, I call `assemble_ring_from_ways(outer_refs, 'outer', ways, nodes)` with 3 args. But in my solve.py I call `assemble_ring_from_ways(remaining, ways, nodes)` with 2 args! The function signature is `def assemble_ring_from_ways(way_refs, ways_dict, nodes_dict)` - that's 3 params including way_refs, so `assemble_ring_from_ways(remaining, ways, nodes)` has 3 args total which is correct.

Wait, actually looking more carefully at the test: I called `assemble_ring_from_ways(outer_refs, 'outer', ways, nodes)` - that's 4 args! But my function only takes 3. In the test output it works... 

Oh wait, I defined `assemble_ring_from_ways(way_refs, role, ways_dict, nodes_dict)` in the test script but `assemble_ring_from_ways(way_refs, ways_dict, nodes_dict)` in solve.py. The test used a 4-param version. Let me check the test's function:

In the test script I defined:
```python
def assemble_ring_from_ways(way_refs, role, ways_dict, nodes_dict):
```

But in solve.py:
```python
def assemble_ring_from_ways(way_refs, ways_dict, nodes_dict):
```

That's fine - the solve.py version has 3 params (the `role` was unused). So the call `assemble_ring_from_ways(remaining, ways, nodes)` passes 3 args to a function that takes 3 params. That should work.

The issue must be somewhere else. Let me run a minimal version of solve.py to debug.

9:01

Bash cd /work && python3 << 'PYEOF'

cd /work && python3 << 'PYEOF'
import osmnx as ox
import osmnx._overpass as op
from collections import OrderedDict
import geopandas as gpd
import shapely
import shapely.geometry

ox.settings.log_console = False

def assemble_ring_from_ways(way_refs, ways_dict, nodes_dict):
    if not way_refs:
        return None, None
    segments = []
    for ref in way_refs:
        if ref not in ways_dict:
            continue
        node_ids = ways_dict[ref]
        if len(node_ids) < 2:
            continue
        coords = [nodes_dict.get(nid) for nid in node_ids]
        coords = [c for c in coords if c is not None]
        if len(coords) < 2:
            continue
        segments.append({
            'ref': ref,
            'coords': list(coords),
            'start_node': node_ids[0],
            'end_node': node_ids[-1],
        })
    if not segments:
        return None, None
    used = [False] * len(segments)
    ring_coords = list(segments[0]['coords'])
    current_end_node = segments[0]['end_node']
    used[0] = True
    changed = True
    while changed:
        changed = False
        for i, seg in enumerate(segments):
            if used[i]:
                continue
            if seg['start_node'] == current_end_node:
                ring_coords.extend(seg['coords'][1:])
                current_end_node = seg['end_node']
                used[i] = True
                changed = True
                break
            elif seg['end_node'] == current_end_node:
                rev_coords = list(reversed(seg['coords']))
                ring_coords.extend(rev_coords[1:])
                current_end_node = seg['start_node']
                used[i] = True
                changed = True
                break
    if ring_coords[0] != ring_coords[-1]:
        ring_coords.append(ring_coords[0])
    return ring_coords, used

def fetch_vienna_districts_osmnx(date=None):
    bbox = (48.1, 16.1, 48.35, 16.6)
    if date:
        query = f"""
[out:json][timeout:180][date:"{date}"];
relation["admin_level"="9"]["boundary"="administrative"]({bbox[0]},{bbox[1]},{bbox[2]},{bbox[3]});
out body;
>;
out skel qt;
"""
    else:
        query = f"""
[out:json][timeout:180];
relation["admin_level"="9"]["boundary"="administrative"]({bbox[0]},{bbox[1]},{bbox[2]},{bbox[3]});
out body;
>;
out skel qt;
"""
    data = OrderedDict()
    data['data'] = query
    result = op._overpass_request(data)
    elements = result.get("elements", [])
    nodes = {}
    ways = {}
    relations = []
    for el in elements:
        etype = el.get("type")
        eid = el.get("id")
        if etype == "node":
            nodes[eid] = (el.get("lon"), el.get("lat"))
        elif etype == "way":
            ways[eid] = el.get("nodes", [])
        elif etype == "relation":
            relations.append(el)
    
    features = []
    for rel in relations:
        tags = rel.get("tags", {})
        name = tags.get("name", "")
        admin_level = tags.get("admin_level", "")
        if admin_level != "9":
            continue
        members = rel.get("members", [])
        all_outer_refs = []
        inner_refs_coords = []
        for mem in members:
            if mem.get("type") != "way":
                continue
            role = mem.get("role", "")
            ref = mem.get("ref")
            if ref not in ways:
                continue
            if role == "inner":
                node_ids = ways[ref]
                coords = [nodes.get(nid) for nid in node_ids]
                coords = [c for c in coords if c is not None]
                if len(coords) >= 3:
                    if coords[0] != coords[-1]:
                        coords.append(coords[0])
                    inner_refs_coords.append(coords)
            else:
                all_outer_refs.append(ref)
        
        outer_polygons = []
        remaining = list(all_outer_refs)
        while remaining:
            ring_coords, used = assemble_ring_from_ways(remaining, ways, nodes)
            if ring_coords is None or len(ring_coords) < 4:
                break
            try:
                ls = shapely.LineString(ring_coords)
                if not ls.is_valid or ls.is_empty or ls.length < 1e-10:
                    break
                poly = shapely.Polygon(ring_coords)
                poly = shapely.polygon.orient(poly)
                if poly.is_valid and not poly.is_empty and poly.area > 1e-12:
                    outer_polygons.append(poly)
            except Exception:
                break
            remaining = [r for i, r in enumerate(remaining) 
                        if i >= len(used) or not used[i]]
        
        if not outer_polygons:
            continue
        
        inner_polygons = []
        for coords in inner_refs_coords:
            try:
                inner_ls = shapely.LineString(coords)
                if inner_ls.is_valid and not inner_ls.is_empty and inner_ls.length > 1e-10:
                    ipoly = shapely.Polygon(coords)
                    ipoly = shapely.polygon.orient(ipoly)
                    if ipoly.is_valid and not ipoly.is_empty:
                        inner_polygons.append(ipoly)
            except Exception:
                continue
        
        final_polygons = []
        for outer_poly in outer_polygons:
            inners_here = []
            rest = []
            for ipoly in inner_polygons:
                try:
                    if outer_poly.contains(ipoly.representative_point()) or outer_poly.intersects(ipoly):
                        inners_here.append(ipoly)
                    else:
                        rest.append(ipoly)
                except Exception:
                    rest.append(ipoly)
            inner_polygons = rest
            if inners_here:
                try:
                    outer_coords = list(outer_poly.exterior.coords)
                    inner_coords_list = [list(ip.exterior.coords) for ip in inners_here]
                    poly = shapely.Polygon(outer_coords, inner_coords_list)
                    poly = shapely.polygon.orient(poly)
                    if poly.is_valid and not poly.is_empty:
                        final_polygons.append(poly)
                    else:
                        final_polygons.append(outer_poly)
                except Exception:
                    final_polygons.append(outer_poly)
            else:
                final_polygons.append(outer_poly)
        
        if final_polygons:
            geom = shapely.unary_union(final_polygons)
            if geom.geom_type == "Polygon":
                geom = shapely.geometry.MultiPolygon([geom])
            if not geom.is_empty:
                features.append({
                    "geometry": geom,
                    "name": name,
                    "osm_id": rel.get("id"),
                })
    
    if not features:
        return gpd.GeoDataFrame()
    return gpd.GeoDataFrame(features, crs="EPSG:4326")

# Test
print("CURRENT:")
gdf = fetch_vienna_districts_osmnx()
print(f"  Got {len(gdf)} features")
print(f"  Names: {sorted(gdf['name'].unique()) if not gdf.empty else 'N/A'}")
print()

print("HISTORICAL:")
gdf2 = fetch_vienna_districts_osmnx(date="2014-01-01T00:00:00Z")
print(f"  Got {len(gdf2)} features")
print(f"  Names: {sorted(gdf2['name'].unique()) if not gdf2.empty else 'N/A'}")
PYEOF

CURRENT:
  Got 0 features
  Names: N/A

HISTORICAL:
  Got 0 features
  Names: N/A

9:10

The function returns 0 features but when I tested it with a standalone script earlier, it worked fine for all 23 districts. Let me check what's different - perhaps the `remaining` list management is wrong. The issue is that after `used` is returned, I'm filtering `remaining` using the indices from `used`, but the segments were created from `remaining` inside `assemble_ring_from_ways` - and the used array corresponds to segments, not to the original `remaining` list directly since I filtered out ways not in ways_dict. Let me debug more carefully.

9:10

Bash cd /work && python3 << 'PYEOF'

cd /work && python3 << 'PYEOF'
import osmnx as ox
import osmnx._overpass as op
from collections import OrderedDict

ox.settings.log_console = False

bbox = (48.1, 16.1, 48.35, 16.6)
query = f"""
[out:json][timeout:180];
relation["admin_level"="9"]["boundary"="administrative"]({bbox[0]},{bbox[1]},{bbox[2]},{bbox[3]});
out body;
>;
out skel qt;
"""
data = OrderedDict()
data['data'] = query
result = op._overpass_request(data)
elements = result.get("elements", [])

nodes = {}
ways = {}
relations = []
for el in elements:
    etype = el.get("type")
    eid = el.get("id")
    if etype == "node":
        nodes[eid] = (el.get("lon"), el.get("lat"))
    elif etype == "way":
        ways[eid] = el.get("nodes", [])
    elif etype == "relation":
        relations.append(el)

print(f"Nodes: {len(nodes)}, Ways: {len(ways)}, Relations: {len(relations)}")

def assemble_ring_from_ways(way_refs, ways_dict, nodes_dict):
    if not way_refs:
        return None, None, []
    segments = []
    for ref in way_refs:
        if ref not in ways_dict:
            continue
        node_ids = ways_dict[ref]
        if len(node_ids) < 2:
            continue
        coords = [nodes_dict.get(nid) for nid in node_ids]
        coords = [c for c in coords if c is not None]
        if len(coords) < 2:
            continue
        segments.append({
            'ref': ref,
            'coords': list(coords),
            'start_node': node_ids[0],
            'end_node': node_ids[-1],
        })
    if not segments:
        return None, None, []
    used = [False] * len(segments)
    ring_coords = list(segments[0]['coords'])
    current_end_node = segments[0]['end_node']
    used[0] = True
    changed = True
    while changed:
        changed = False
        for i, seg in enumerate(segments):
            if used[i]:
                continue
            if seg['start_node'] == current_end_node:
                ring_coords.extend(seg['coords'][1:])
                current_end_node = seg['end_node']
                used[i] = True
                changed = True
                break
            elif seg['end_node'] == current_end_node:
                rev_coords = list(reversed(seg['coords']))
                ring_coords.extend(rev_coords[1:])
                current_end_node = seg['start_node']
                used[i] = True
                changed = True
                break
    if ring_coords[0] != ring_coords[-1]:
        ring_coords.append(ring_coords[0])
    used_refs = [segments[i]['ref'] for i in range(len(segments)) if used[i]]
    return ring_coords, used, used_refs

for rel in relations[:3]:
    tags = rel.get("tags", {})
    name = tags.get("name", "")
    admin_level = tags.get("admin_level", "")
    print(f"\n{name} admin_level={admin_level}")
    
    members = rel.get("members", [])
    all_outer_refs = []
    inner_refs_coords = []
    for mem in members:
        if mem.get("type") != "way":
            continue
        role = mem.get("role", "")
        ref = mem.get("ref")
        if ref not in ways:
            continue
        if role == "inner":
            inner_refs_coords.append(ref)
        else:
            all_outer_refs.append(ref)
    
    print(f"  Outer refs: {len(all_outer_refs)}, Inner refs: {len(inner_refs_coords)}")
    
    outer_polygons = []
    remaining = list(all_outer_refs)
    iter_count = 0
    while remaining and iter_count < 10:
        iter_count += 1
        ring_coords, used, used_refs = assemble_ring_from_ways(remaining, ways, nodes)
        if ring_coords is None or len(ring_coords) < 4:
            print(f"  Iter {iter_count}: ring_coords is None or too short")
            break
        try:
            ls = shapely.LineString(ring_coords)
            if not ls.is_valid or ls.is_empty or ls.length < 1e-10:
                print(f"  Iter {iter_count}: invalid LineString")
                break
            poly = shapely.Polygon(ring_coords)
            poly = shapely.polygon.orient(poly)
            if poly.is_valid and not poly.is_empty and poly.area > 1e-12:
                outer_polygons.append(poly)
                print(f"  Iter {iter_count}: built polygon area={poly.area:.8f}, used {len(used_refs)} refs")
            else:
                print(f"  Iter {iter_count}: invalid polygon")
                break
        except Exception as e:
            print(f"  Iter {iter_count}: error {e}")
            break
        
        # Remove used refs
        remaining = [r for r in remaining if r not in used_refs]
        print(f"  Remaining: {len(remaining)} refs")
    
    print(f"  Result: {len(outer_polygons)} polygons")
PYEOF

Nodes: 6210, Ways: 499, Relations: 23

Alsergrund admin_level=9
  Outer refs: 9, Inner refs: 0
  Iter 1: error name 'shapely' is not defined
  Result: 0 polygons

Hietzing admin_level=9
  Outer refs: 50, Inner refs: 0
  Iter 1: error name 'shapely' is not defined
  Result: 0 polygons

Innere Stadt admin_level=9
  Outer refs: 7, Inner refs: 0
  Iter 1: error name 'shapely' is not defined
  Result: 0 polygons

9:21

I forgot to import shapely in the test script. But the real problem in my solve.py is that the `assemble_ring_from_ways` function seems to have an issue where the `remaining` list isn't being properly reduced because `used_refs` returned from the function may not align with the original references passed in. Let me check this more carefully.

Actually, looking at the code more carefully, the issue is in how I remove used refs from remaining. The `used` list from `assemble_ring_from_ways` corresponds to segments built from `remaining`, but segments are filtered within the function (some refs might not have valid segments). The `remaining` list passed to the function may have refs that don't match because `assemble_ring_from_ways` creates segments from those refs, but then tracks used as indices into the segments list, not into the original `remaining` list.

Let me fix this by returning used_refs (the actual refs that were used) and filtering remaining based on those.

Let me rewrite the code more carefully.

9:21

Bash cd /work && python3 << 'PYEOF'

cd /work && python3 << 'PYEOF'
import osmnx as ox
import osmnx._overpass as op
from collections import OrderedDict
import shapely
import shapely.geometry

ox.settings.log_console = False

bbox = (48.1, 16.1, 48.35, 16.6)
query = f"""
[out:json][timeout:180];
relation["admin_level"="9"]["boundary"="administrative"]({bbox[0]},{bbox[1]},{bbox[2]},{bbox[3]});
out body;
>;
out skel qt;
"""
data = OrderedDict()
data['data'] = query
result = op._overpass_request(data)
elements = result.get("elements", [])

nodes = {}
ways = {}
relations = []
for el in elements:
    etype = el.get("type")
    eid = el.get("id")
    if etype == "node":
        nodes[eid] = (el.get("lon"), el.get("lat"))
    elif etype == "way":
        ways[eid] = el.get("nodes", [])
    elif etype == "relation":
        relations.append(el)

print(f"Nodes: {len(nodes)}, Ways: {len(ways)}, Relations: {len(relations)}")

def build_ring_from_ways(way_refs, ways_dict, nodes_dict):
    """Return (ring_coords, used_refs) or (None, [])."""
    if not way_refs:
        return None, []
    
    # Build segments from the given refs
    segments = []
    for ref in way_refs:
        if ref not in ways_dict:
            continue
        node_ids = ways_dict[ref]
        if len(node_ids) < 2:
            continue
        coords = [nodes_dict.get(nid) for nid in node_ids]
        coords = [c for c in coords if c is not None]
        if len(coords) < 2:
            continue
        segments.append({
            'ref': ref,
            'coords': list(coords),
            'start_node': node_ids[0],
            'end_node': node_ids[-1],
        })
    
    if not segments:
        return None, []
    
    used = [False] * len(segments)
    
    # Start with the longest segment for better results
    longest_idx = max(range(len(segments)), key=lambda i: len(segments[i]['coords']))
    ring_coords = list(segments[longest_idx]['coords'])
    current_end_node = segments[longest_idx]['end_node']
    used[longest_idx] = True
    
    changed = True
    while changed:
        changed = False
        for i, seg in enumerate(segments):
            if used[i]:
                continue
            if seg['start_node'] == current_end_node:
                ring_coords.extend(seg['coords'][1:])
                current_end_node = seg['end_node']
                used[i] = True
                changed = True
                break
            elif seg['end_node'] == current_end_node:
                rev_coords = list(reversed(seg['coords']))
                ring_coords.extend(rev_coords[1:])
                current_end_node = seg['start_node']
                used[i] = True
                changed = True
                break
    
    if ring_coords[0] != ring_coords[-1]:
        ring_coords.append(ring_coords[0])
    
    used_refs = [segments[i]['ref'] for i in range(len(segments)) if used[i]]
    return ring_coords, used_refs


# Test building polygon for each relation
for rel in relations:
    tags = rel.get("tags", {})
    name = tags.get("name", "")
    admin_level = tags.get("admin_level", "")
    if admin_level != "9":
        continue
    
    members = rel.get("members", [])
    all_outer_refs = []
    inner_refs_coords = []
    for mem in members:
        if mem.get("type") != "way":
            continue
        role = mem.get("role", "")
        ref = mem.get("ref")
        if ref not in ways:
            continue
        if role == "inner":
            node_ids = ways[ref]
            coords = [nodes.get(nid) for nid in node_ids]
            coords = [c for c in coords if c is not None]
            if len(coords) >= 3:
                if coords[0] != coords[-1]:
                    coords.append(coords[0])
                inner_refs_coords.append(coords)
        else:
            all_outer_refs.append(ref)
    
    # Build outer polygons iteratively
    outer_polygons = []
    remaining = list(all_outer_refs)
    
    while remaining:
        ring_coords, used_refs = build_ring_from_ways(remaining, ways, nodes)
        if ring_coords is None or len(ring_coords) < 4:
            break
        
        try:
            ls = shapely.LineString(ring_coords)
            if not ls.is_valid or ls.is_empty or ls.length < 1e-10:
                break
            poly = shapely.Polygon(ring_coords)
            poly = shapely.polygon.orient(poly)
            if poly.is_valid and not poly.is_empty and poly.area > 1e-12:
                outer_polygons.append(poly)
        except Exception:
            break
        
        # Remove used refs
        remaining = [r for r in remaining if r not in used_refs]
    
    # Handle inner rings
    inner_polygons = []
    for coords in inner_refs_coords:
        try:
            ipoly = shapely.Polygon(coords)
            ipoly = shapely.polygon.orient(ipoly)
            if ipoly.is_valid and not ipoly.is_empty:
                inner_polygons.append(ipoly)
        except Exception:
            continue
    
    # Cut holes
    final_polygons = []
    for outer_poly in outer_polygons:
        inners_here = []
        rest = []
        for ipoly in inner_polygons:
            try:
                if outer_poly.contains(ipoly.representative_point()) or outer_poly.intersects(ipoly):
                    inners_here.append(ipoly)
                else:
                    rest.append(ipoly)
            except Exception:
                rest.append(ipoly)
        inner_polygons = rest
        
        if inners_here:
            try:
                outer_coords = list(outer_poly.exterior.coords)
                inner_coords_list = [list(ip.exterior.coords) for ip in inners_here]
                poly = shapely.Polygon(outer_coords, inner_coords_list)
                poly = shapely.polygon.orient(poly)
                if poly.is_valid and not poly.is_empty:
                    final_polygons.append(poly)
                else:
                    final_polygons.append(outer_poly)
            except Exception:
                final_polygons.append(outer_poly)
        else:
            final_polygons.append(outer_poly)
    
    if final_polygons:
        geom = shapely.unary_union(final_polygons)
        if geom.geom_type == "Polygon":
            geom = shapely.geometry.MultiPolygon([geom])
        print(f"{name}: {len(all_outer_refs)} outer refs, {len(outer_polygons)} polys, area={geom.area:.8f}")
    else:
        print(f"{name}: FAILED to build any polygon")
PYEOF

Nodes: 6210, Ways: 499, Relations: 23
Alsergrund: FAILED to build any polygon
Hietzing: FAILED to build any polygon
Innere Stadt: FAILED to build any polygon
Josefstadt: FAILED to build any polygon
Leopoldstadt: FAILED to build any polygon
Mariahilf: FAILED to build any polygon
Meidling: FAILED to build any polygon
Neubau: FAILED to build any polygon
Penzing: FAILED to build any polygon
Rudolfsheim-Fünfhaus: FAILED to build any polygon
Währing: FAILED to build any polygon
Landstraße: FAILED to build any polygon
Brigittenau: FAILED to build any polygon
Donaustadt: FAILED to build any polygon
Döbling: FAILED to build any polygon
Favoriten: FAILED to build any polygon
Floridsdorf: FAILED to build any polygon
Hernals: FAILED to build any polygon
Liesing: FAILED to build any polygon
Margareten: FAILED to build any polygon
Ottakring: FAILED to build any polygon
Simmering: FAILED to build any polygon
Wieden: FAILED to build any polygon

9:32

The problem is in the `build_ring_from_ways` function. The issue might be that the function attempts to build ONE ring from ALL the outer way refs, but sometimes there are multiple separate rings (for multipolygons). Looking at the earlier test, when we passed all 9 outer refs for Alsergrund to `assemble_ring_from_ways`, it worked fine and used all 9. But now with `build_ring_from_ways` starting with the longest segment, let me check what's happening.

Wait - the `build_ring_from_ways` starts with the LONGEST segment instead of the FIRST. Let me check if the logic for finding the next segment is correct. The issue might be that with a different starting segment, the ring assembly fails to find connections.

Actually, looking more carefully at the earlier successful test: `assemble_ring_from_ways` started with the first segment. In `build_ring_from_ways`, we start with the longest segment. But the ring assembly logic is the same - it just chains segments end-to-end by matching start/end nodes.

Let me debug more carefully.

9:32

Bash cd /work && python3 << 'PYEOF'

cd /work && python3 << 'PYEOF'
import osmnx as ox
import osmnx._overpass as op
from collections import OrderedDict
import shapely
import shapely.geometry

ox.settings.log_console = False

bbox = (48.1, 16.1, 48.35, 16.6)
query = f"""
[out:json][timeout:180];
relation["admin_level"="9"]["boundary"="administrative"]({bbox[0]},{bbox[1]},{bbox[2]},{bbox[3]});
out body;
>;
out skel qt;
"""
data = OrderedDict()
data['data'] = query
result = op._overpass_request(data)
elements = result.get("elements", [])

nodes = {}
ways = {}
relations = []
for el in elements:
    etype = el.get("type")
    eid = el.get("id")
    if etype == "node":
        nodes[eid] = (el.get("lon"), el.get("lat"))
    elif etype == "way":
        ways[eid] = el.get("nodes", [])
    elif etype == "relation":
        relations.append(el)

print(f"Nodes: {len(nodes)}, Ways: {len(ways)}, Relations: {len(relations)}")

def build_ring_from_ways(way_refs, ways_dict, nodes_dict):
    if not way_refs:
        return None, []
    
    segments = []
    for ref in way_refs:
        if ref not in ways_dict:
            continue
        node_ids = ways_dict[ref]
        if len(node_ids) < 2:
            continue
        coords = [nodes_dict.get(nid) for nid in node_ids]
        coords = [c for c in coords if c is not None]
        if len(coords) < 2:
            continue
        segments.append({
            'ref': ref,
            'coords': list(coords),
            'start_node': node_ids[0],
            'end_node': node_ids[-1],
        })
    
    if not segments:
        return None, []
    
    print(f"    build_ring: {len(segments)} segments")
    for s in segments:
        print(f"      ref={s['ref']}: {s['start_node']} -> {s['end_node']}, {len(s['coords'])} coords")
    
    used = [False] * len(segments)
    
    # Start with first segment
    ring_coords = list(segments[0]['coords'])
    current_end_node = segments[0]['end_node']
    current_start_node = segments[0]['start_node']
    used[0] = True
    print(f"    Start with ref={segments[0]['ref']}, start_node={current_start_node}, end_node={current_end_node}")
    
    changed = True
    iteration = 0
    while changed and iteration < 100:
        iteration += 1
        changed = False
        for i, seg in enumerate(segments):
            if used[i]:
                continue
            if seg['start_node'] == current_end_node:
                ring_coords.extend(seg['coords'][1:])
                current_end_node = seg['end_node']
                used[i] = True
                changed = True
                print(f"    Iter {iteration}: append ref={seg['ref']} (fwd), now end={current_end_node}")
                break
            elif seg['end_node'] == current_end_node:
                rev_coords = list(reversed(seg['coords']))
                ring_coords.extend(rev_coords[1:])
                current_end_node = seg['start_node']
                used[i] = True
                changed = True
                print(f"    Iter {iteration}: append ref={seg['ref']} (rev), now end={current_end_node}")
                break
    
    print(f"    Ring closed? ring_coords[0]={ring_coords[0]}, ring_coords[-1]={ring_coords[-1]}")
    print(f"    current_end_node={current_end_node}, current_start_node={current_start_node}")
    print(f"    Used: {[segments[i]['ref'] for i in range(len(segments)) if used[i]]}")
    
    if ring_coords[0] != ring_coords[-1]:
        ring_coords.append(ring_coords[0])
    
    used_refs = [segments[i]['ref'] for i in range(len(segments)) if used[i]]
    return ring_coords, used_refs


# Test Alsergrund
for rel in relations:
    tags = rel.get("tags", {})
    name = tags.get("name", "")
    if name != "Alsergrund":
        continue
    
    members = rel.get("members", [])
    all_outer_refs = []
    for mem in members:
        if mem.get("type") != "way":
            continue
        role = mem.get("role", "")
        ref = mem.get("ref")
        if ref not in ways:
            continue
        if role != "inner":
            all_outer_refs.append(ref)
    
    print(f"\n{name}: {len(all_outer_refs)} outer refs")
    print(f"  Refs: {all_outer_refs}")
    
    ring_coords, used_refs = build_ring_from_ways(all_outer_refs, ways, nodes)
    
    if ring_coords and len(ring_coords) >= 4:
        ls = shapely.LineString(ring_coords)
        poly = shapely.Polygon(ring_coords)
        poly = shapely.polygon.orient(poly)
        print(f"  LineString valid={ls.is_valid}, length={ls.length}")
        print(f"  Polygon valid={poly.is_valid}, area={poly.area}")
    break
PYEOF

Nodes: 6210, Ways: 499, Relations: 23

Alsergrund: 9 outer refs
  Refs: [147468285, 8046654, 147396089, 147463455, 147463456, 390540126, 390540129, 147463454, 147463462]
    build_ring: 9 segments
      ref=147468285: 731110806 -> 2157233697, 32 coords
      ref=8046654: 731110806 -> 313620580, 28 coords
      ref=147396089: 1606198290 -> 313620580, 33 coords
      ref=147463455: 1606822119 -> 1606198290, 15 coords
      ref=147463456: 1606822180 -> 1606822119, 22 coords
      ref=390540126: 3936984440 -> 1606822180, 3 coords
      ref=390540129: 3936984434 -> 3936984440, 3 coords
      ref=147463454: 731244444 -> 3936984434, 17 coords
      ref=147463462: 2157233697 -> 731244444, 268 coords
    Start with ref=147468285, start_node=731110806, end_node=2157233697
    Iter 1: append ref=147463462 (fwd), now end=731244444
    Iter 2: append ref=147463454 (fwd), now end=3936984434
    Iter 3: append ref=390540129 (fwd), now end=3936984440
    Iter 4: append ref=390540126 (fwd), now end=1606822180
    Iter 5: append ref=147463456 (fwd), now end=1606822119
    Iter 6: append ref=147463455 (fwd), now end=1606198290
    Iter 7: append ref=147396089 (fwd), now end=313620580
    Iter 8: append ref=8046654 (rev), now end=731110806
    Ring closed? ring_coords[0]=(16.3408703, 48.2151672), ring_coords[-1]=(16.3408703, 48.2151672)
    current_end_node=731110806, current_start_node=731110806
    Used: [147468285, 8046654, 147396089, 147463455, 147463456, 390540126, 390540129, 147463454, 147463462]

Traceback (most recent call last):
  File "<stdin>", line 137, in <module>
AttributeError: module 'shapely' has no attribute 'polygon'. Did you mean: 'Polygon'?

9:36

The ring assembly works perfectly for Alsergrund. The issue is `shapely.polygon.orient` is actually `shapely.geometry.polygon.orient` or more precisely it's accessible via `shapely.geometry.polygon.orient` but the standard way is with `shapely.geometry.polygon.orient` or simply calling `shapely.geometry.polygon.orient()`. Let me check:

Actually, `shapely.geometry.polygon.orient` exists in newer shapely versions. But `shapely.polygon` doesn't. The correct way is `from shapely.geometry import polygon` or `shapely.geometry.polygon.orient()`.

Let me fix the code.

9:36

Bash cd /work && python3 -c "

cd /work && python3 -c "
import shapely
print(dir(shapely))
" | tr ',' '\n' | grep -i orient

 'orient_polygons'
 'oriented_envelope'

9:39

So in shapely 2.x, `orient` is not at `shapely.polygon.orient` but rather `shapely.orient_polygons`. Let me check the exact API.

9:39

Bash cd /work && python3 -c "

cd /work && python3 -c "
import shapely
help(shapely.orient_polygons)
" 2>&1 | head -20

Help on function orient_polygons in module shapely.constructive:

orient_polygons(geometry, *, exterior_cw=False, **kwargs)
    Enforce a ring orientation on all polygonal elements in the input geometry.

    Forces (Multi)Polygons to use a counter-clockwise orientation for their
    exterior ring, and a clockwise orientation for their interior rings (or
    the oppposite if ``exterior_cw=True``).

    Also processes geometries inside a GeometryCollection in the same way.
    Other geometries are returned unchanged.

    .. versionadded:: 2.1.0

    Parameters
    ----------
    geometry : Geometry or array_like
        Geometry or geometries to orient consistently.
    exterior_cw : bool, default False
        If True, exterior rings will be clockwise and interior rings

9:42

So in shapely 2.1.0, `shapely.polygon.orient` is replaced by `shapely.orient_polygons`. But actually `shapely.polygon.orient` might still exist. Let me check:

Actually, `shapely.polygon` is a module in older versions. In shapely 2.x, it's been restructured. Let me just use `shapely.orient_polygons` which works on the full geometry, or just use `__geo_interface__` to check orientation.

Actually, the simplest fix: use `shapely.geometry.polygon.orient` or just skip orientation since the polygon should already be valid. Let me check if the polygon is already CCW.

Actually, the Polygon constructor should handle orientation automatically in many cases. Let me just remove the orient call.

9:42

Bash cd /work && python3 -c "

cd /work && python3 -c "
import shapely, shapely.geometry
# Check if polygon.orient exists
try:
    from shapely.geometry import polygon
    print(hasattr(polygon, 'orient'))
except:
    print('no')

# Check what's available
print([x for x in dir(shapely) if 'orient' in x.lower()])
print([x for x in dir(shapely.geometry) if 'orient' in x.lower()])
"

True
['orient_polygons', 'oriented_envelope']
[]

9:45

So `shapely.polygon.orient` exists! But it was giving me "module 'shapely' has no attribute 'polygon'" error. Let me check more carefully.

9:45

Bash cd /work && python3 << 'PYEOF'

cd /work && python3 << 'PYEOF'
import shapely
import shapely.geometry

# Check if shapely.polygon exists
print("shapely.polygon:", hasattr(shapely, 'polygon'))
if hasattr(shapely, 'polygon'):
    print("  type:", type(shapely.polygon))
    print("  has orient:", hasattr(shapely.polygon, 'orient'))

# The issue was in the previous script where I used import shapely but not shapely.geometry
# Let's test properly
from shapely import polygon
print("from shapely import polygon works:", polygon)
print("has orient:", hasattr(polygon, 'orient'))

# Try calling it
from shapely.geometry import Point
p = Point(0,0).buffer(1)
result = polygon.orient(p)
print("Result:", result, result.area)
PYEOF

shapely.polygon: False

Traceback (most recent call last):
  File "<stdin>", line 12, in <module>
ImportError: cannot import name 'polygon' from 'shapely' (/usr/local/lib/python3.14/site-packages/shapely/__init__.py). Did you mean: 'Polygon'?

9:48

So `shapely.polygon` doesn't exist in shapely 2.x. I need to use `shapely.orient_polygons()` instead. Let me update the solve.py accordingly.

9:48

Edit /work/solve.py

Edited /work/solve.py.

9:50

Let me do a global replace of all `shapely.polygon.orient(` with `shapely.orient_polygons(` calls.

9:50

Edit /work/solve.py

Edited /work/solve.py: replaced 1 occurrence(s).

9:52

Let me also fix the rest of the occurrences where `shapely.polygon.orient` is used.

9:52

Bash cd /work && grep -n "polygon\.orient\|\.orient(" /work/solve.py

193:                    ipoly = shapely.polygon.orient(ipoly)

9:53

Edit /work/solve.py

Edited /work/solve.py.

9:55

Also need to fix the `shapely.polygon.orient` in the test script. Let me now also fix the issue where in the test `build_ring_from_ways`, the `shapely` wasn't imported. Let me just run the actual solve.py and see if it works now.

9:55

Bash cd /work && grep -n "orient" /work/solve.py

173:                poly = shapely.orient_polygons(poly)
193:                    ipoly = shapely.orient_polygons(ipoly)
219:                    poly = shapely.orient_polygons(poly)

9:57

Good, all occurrences of `shapely.polygon.orient` have been replaced with `shapely.orient_polygons`. Now let's run the full solve script.

9:57

Bash cd /work && timeout 300 python solve.py 2>&1

============================================================
Fetching CURRENT Vienna district boundaries...
============================================================
  Got 23 current district features

============================================================
Fetching 2014-01-01 Vienna district boundaries...
============================================================
  Got 26 historical district features

Current district names:
  - Alsergrund
  - Brigittenau
  - Donaustadt
  - Döbling
  - Favoriten
  - Floridsdorf
  - Hernals
  - Hietzing
  - Innere Stadt
  - Josefstadt
  - Landstraße
  - Leopoldstadt
  - Liesing
  - Margareten
  - Mariahilf
  - Meidling
  - Neubau
  - Ottakring
  - Penzing
  - Rudolfsheim-Fünfhaus
  - Simmering
  - Wieden
  - Währing

Historical district names:
  - Alsergrund
  - Brigittenau
  - Donaustadt
  - Döbling
  - Enzersfeld
  - Favoriten
  - Floridsdorf
  - Gerasdorf bei Wien
  - Hernals
  - Hietzing
  - Innere Stadt
  - Josefstadt
  - Klein-Engersdorf
  - Landstraße
  - Leopoldstadt
  - Liesing
  - Margareten
  - Mariahilf
  - Meidling
  - Neubau
  - Ottakring
  - Penzing
  - Rudolfsheim-Fünfhaus
  - Simmering
  - Wieden
  - Währing

After filtering:
  Current: 23 districts
  Historical: 23 districts

Common districts: 23
/work/solve.py:354: DeprecationWarning: The 'unary_union' attribute is deprecated, use the 'union_all()' method instead.
  current_union = current_gdf.geometry.unary_union
/work/solve.py:355: DeprecationWarning: The 'unary_union' attribute is deprecated, use the 'union_all()' method instead.
  historical_union = historical_gdf.geometry.unary_union

Current total area: 0.05019851 sq deg
Historical total area: 0.05019825 sq deg
Added: area=0.00000195, type=MultiPolygon
Removed: area=0.00000169, type=MultiPolygon
Unchanged: area=0.05019656, type=Polygon

Fragments created: 631
change_type
removed_since_2014    312
added_since_2014      296
unchanged              23
Name: count, dtype: int64

============================================================
Output written to /work/vienna_boundary_changes.geojson
Features: 631
Columns: ['geometry', 'change_type', 'district_name', 'touches_changed']

Sample features:
         change_type district_name  touches_changed
0   added_since_2014       Liesing             True
1   added_since_2014       Liesing             True
2   added_since_2014       Liesing             True
3   added_since_2014       Liesing             True
4   added_since_2014       Liesing             True
5   added_since_2014       Liesing             True
6   added_since_2014       Liesing             True
7   added_since_2014       Liesing             True
8   added_since_2014       Liesing             True
9   added_since_2014       Liesing             True
10  added_since_2014       Liesing             True
11  added_since_2014       Liesing             True
12  added_since_2014       Liesing             True
13  added_since_2014       Liesing             True
14  added_since_2014       Liesing             True

District touch summary:
  Alsergrund: touches_changed=False
  Brigittenau: touches_changed=False
  Donaustadt: touches_changed=True
  Döbling: touches_changed=True
  Favoriten: touches_changed=True
  Floridsdorf: touches_changed=True
  Hernals: touches_changed=True
  Hietzing: touches_changed=True
  Innere Stadt: touches_changed=False
  Josefstadt: touches_changed=False
  Landstraße: touches_changed=False
  Leopoldstadt: touches_changed=False
  Liesing: touches_changed=True
  Margareten: touches_changed=False
  Mariahilf: touches_changed=False
  Meidling: touches_changed=False
  Neubau: touches_changed=False
  Ottakring: touches_changed=False
  Penzing: touches_changed=True
  Rudolfsheim-Fünfhaus: touches_changed=False
  Simmering: touches_changed=True
  Wieden: touches_changed=False
  Währing: touches_changed=False

Verification: read back 631 features
        change_type district_name  touches_changed
0  added_since_2014       Liesing             True
1  added_since_2014       Liesing             True
2  added_since_2014       Liesing             True
3  added_since_2014       Liesing             True
4  added_since_2014       Liesing             True
5  added_since_2014       Liesing             True
6  added_since_2014       Liesing             True
7  added_since_2014       Liesing             True
8  added_since_2014       Liesing             True
9  added_since_2014       Liesing             True
Geometry types: <ArrowStringArray>
['MultiPolygon']
Length: 1, dtype: str

10:04

The script is working. Let me analyze the results:

1. We got 23 districts in both current and historical (after filtering)
2. The added and removed areas are very small (1.95e-6 and 1.69e-6 sq deg) - these are minor boundary changes
3. The fragment count is quite high (631) because each small boundary change polygon gets split into many tiny polygons by the simplification/intersection operations
4. All geometry types are MultiPolygon as required

However, I notice some issues:
1. The region around Meidling shows many small fragments - these are likely from the ring assembly not handling multiple separate rings well (Meidling only used some of its outer refs)
2. 296 added + 312 removed + 23 unchanged = 631 fragments. The number seems high because the symmetric difference can generate many tiny slivers.

Let me verify the output more carefully by checking a few things:
- The added/removed areas are very tiny, suggesting minor boundary changes
- The district touches seem reasonable

Let me check if the output is of good quality by looking at some actual geometries.

Actually, let me check the output more carefully - the number of fragments is quite high. This could be because the ring assembly algorithm is breaking the unchanged parts into many small pieces. Let me check if the "unchanged" fragments are correct.

Wait, there are only 23 unchanged fragments, which is one per district. So the 296 + 312 = 608 fragments are the added/removed areas. But those are very tiny areas (1.95e-6 sq deg total). Why so many fragments?

The symmetric difference between the two unions might produce many small fragments because the boundaries don't exactly align. Each tiny difference in the boundary nodes creates a small sliver polygon. Let me reduce the number of fragments by simplifying or using a tolerance.

Actually, let me check what's happening - the simplification is too aggressive or not enough. Let me also look at one of the fragments to see its size.

Let me adjust the approach to produce cleaner output.

10:04

Bash cd /work && python3 << 'PYEOF'

cd /work && python3 << 'PYEOF'
import geopandas as gpd
gdf = gpd.read_file("/work/vienna_boundary_changes.geojson")
print("Total features:", len(gdf))
print("\nChange type distribution:")
print(gdf["change_type"].value_counts())
print("\nArea stats:")
for ct in ["added_since_2014", "removed_since_2014", "unchanged"]:
    sub = gdf[gdf["change_type"] == ct]
    if not sub.empty:
        areas = sub.geometry.area
        print(f"  {ct}: count={len(sub)}, total_area={areas.sum():.8f}, min={areas.min():.10f}, max={areas.max():.8f}")

# Look at some tiny features
small = gdf[gdf.geometry.area < 1e-10]
print(f"\nFeatures with area < 1e-10: {len(small)}")
if len(small) > 0:
    print(small[["change_type", "district_name", "touches_changed"]].head(10))
PYEOF

Total features: 631

Change type distribution:
change_type
removed_since_2014    312
added_since_2014      296
unchanged              23
Name: count, dtype: int64

Area stats:
  added_since_2014: count=296, total_area=0.00000187, min=0.0000000000, max=0.00000028
  removed_since_2014: count=312, total_area=0.00000168, min=0.0000000000, max=0.00000017
  unchanged: count=23, total_area=0.05019658, min=0.0001318603, max=0.01238067

Features with area < 1e-10: 112
         change_type district_name  touches_changed
7   added_since_2014       Liesing             True
9   added_since_2014       Liesing             True
15  added_since_2014       Liesing             True
26  added_since_2014       Liesing             True
30  added_since_2014       Liesing             True
32  added_since_2014       Liesing             True
33  added_since_2014       Liesing             True
41  added_since_2014       Liesing             True
42  added_since_2014       Liesing             True
57  added_since_2014       Liesing             True

<stdin>:10: UserWarning: Geometry is in a geographic CRS. Results from 'area' are likely incorrect. Use 'GeoSeries.to_crs()' to re-project geometries to a projected CRS before this operation.

<stdin>:10: UserWarning: Geometry is in a geographic CRS. Results from 'area' are likely incorrect. Use 'GeoSeries.to_crs()' to re-project geometries to a projected CRS before this operation.

<stdin>:10: UserWarning: Geometry is in a geographic CRS. Results from 'area' are likely incorrect. Use 'GeoSeries.to_crs()' to re-project geometries to a projected CRS before this operation.

<stdin>:14: UserWarning: Geometry is in a geographic CRS. Results from 'area' are likely incorrect. Use 'GeoSeries.to_crs()' to re-project geometries to a projected CRS before this operation.

10:30

The issue is that the symmetric difference is producing many tiny sliver fragments (area < 1e-10 sq degrees), which are numerical artifacts from slightly different boundary representations between the two time snapshots. I need to:

1. Filter out tiny fragments (slivers) that are just numerical noise
2. Simplify the approach - the current approach creates individual fragments for each tiny piece of the boundary difference

Let me rethink the approach. Instead of using `shapely.difference` on the dissolved union which creates all these tiny slivers, I should:

1. For each district, compare its current geometry to its historical geometry
2. Compute the symmetric difference per district
3. Add a small buffer/tolerance to avoid slivers

Or better: use a snapping tolerance when computing differences. Let me use a small buffer (like 1e-9 or so) to clean up slivers.

Actually, the slivers come from the fact that the boundaries in 2014 and current are slightly different representations (different node coordinates), so when we do `shapely.difference` we get tiny slivers along the boundaries that are actually the same line but digitized differently.

A better approach:
1. For each district, compute the symmetric difference between current and historical geometries
2. Use a small buffer (e.g., 1e-8) to clean up slivers
3. Classify the result

But actually, the problem is that the boundaries between districts in the same snapshot might not perfectly align either (since they're separate relations with potentially different node coordinates).

Let me restructure:
- For each district, compare its current vs historical geometry directly
- Use a tolerance to handle tiny slivers
- Collect all added/removed/unchanged parts

Actually, a simpler and more robust approach:
1. Snap the district geometries to each other (using a small tolerance)
2. Then compute the difference

Or even simpler: since the current districts and historical districts should be the same 23 districts (just with slightly different boundaries), I can:
1. For each district, align current and historical by normalised name
2. Compute `shapely.symmetric_difference(current_geom, historical_geom)`
3. Use `shapely.normalize` and `shapely.snap` to reduce slivers
4. Filter out tiny fragments

Let me rewrite the pipeline with this approach.

10:30

Write /work/solve.py

content · 562 lines · py

"""
Fetch Vienna district boundaries (current and 2014-01-01) from Overpass via osmnx,
compare to find changes, and output vienna_boundary_changes.geojson.
"""

import os
import sys
import json
from collections import OrderedDict

import osmnx as ox
import osmnx._overpass as op
import geopandas as gpd
import pandas as pd
import shapely
import shapely.geometry
import numpy as np

ox.settings.log_console = False

# ---------------------------------------------------------------------------
# 1. Overpass query
# ---------------------------------------------------------------------------

def build_ring_from_ways(way_refs, ways_dict, nodes_dict):
    """Assemble a closed ring from a sequence of OSM way references.
    
    Returns (ring_coords, used_refs) or (None, []).
    """
    if not way_refs:
        return None, []
    
    segments = []
    for ref in way_refs:
        if ref not in ways_dict:
            continue
        node_ids = ways_dict[ref]
        if len(node_ids) < 2:
            continue
        coords = [nodes_dict.get(nid) for nid in node_ids]
        coords = [c for c in coords if c is not None]
        if len(coords) < 2:
            continue
        segments.append({
            'ref': ref,
            'coords': list(coords),
            'start_node': node_ids[0],
            'end_node': node_ids[-1],
        })
    
    if not segments:
        return None, []
    
    used = [False] * len(segments)
    
    # Start with longest segment
    longest_idx = max(range(len(segments)), key=lambda i: len(segments[i]['coords']))
    ring_coords = list(segments[longest_idx]['coords'])
    current_end_node = segments[longest_idx]['end_node']
    used[longest_idx] = True
    
    changed = True
    while changed:
        changed = False
        for i, seg in enumerate(segments):
            if used[i]:
                continue
            if seg['start_node'] == current_end_node:
                ring_coords.extend(seg['coords'][1:])
                current_end_node = seg['end_node']
                used[i] = True
                changed = True
                break
            elif seg['end_node'] == current_end_node:
                rev_coords = list(reversed(seg['coords']))
                ring_coords.extend(rev_coords[1:])
                current_end_node = seg['start_node']
                used[i] = True
                changed = True
                break
    
    if ring_coords[0] != ring_coords[-1]:
        ring_coords.append(ring_coords[0])
    
    used_refs = [segments[i]['ref'] for i in range(len(segments)) if used[i]]
    return ring_coords, used_refs


def fetch_vienna_districts_osmnx(date=None):
    """Fetch Vienna district (admin_level=9) boundary relations via Overpass."""
    bbox = (48.1, 16.1, 48.35, 16.6)
    
    if date:
        query = f"""
[out:json][timeout:180][date:"{date}"];
relation["admin_level"="9"]["boundary"="administrative"]({bbox[0]},{bbox[1]},{bbox[2]},{bbox[3]});
out body;
>;
out skel qt;
"""
    else:
        query = f"""
[out:json][timeout:180];
relation["admin_level"="9"]["boundary"="administrative"]({bbox[0]},{bbox[1]},{bbox[2]},{bbox[3]});
out body;
>;
out skel qt;
"""
    
    data = OrderedDict()
    data['data'] = query
    result = op._overpass_request(data)
    elements = result.get("elements", [])
    
    nodes = {}
    ways = {}
    relations = []
    
    for el in elements:
        etype = el.get("type")
        eid = el.get("id")
        if etype == "node":
            nodes[eid] = (el.get("lon"), el.get("lat"))
        elif etype == "way":
            ways[eid] = el.get("nodes", [])
        elif etype == "relation":
            relations.append(el)
    
    features = []
    
    for rel in relations:
        tags = rel.get("tags", {})
        name = tags.get("name", "")
        admin_level = tags.get("admin_level", "")
        
        if admin_level != "9":
            continue
        
        members = rel.get("members", [])
        
        all_outer_refs = []
        inner_refs_coords = []
        
        for mem in members:
            if mem.get("type") != "way":
                continue
            role = mem.get("role", "")
            ref = mem.get("ref")
            if ref not in ways:
                continue
            if role == "inner":
                node_ids = ways[ref]
                coords = [nodes.get(nid) for nid in node_ids]
                coords = [c for c in coords if c is not None]
                if len(coords) >= 3:
                    if coords[0] != coords[-1]:
                        coords.append(coords[0])
                    inner_refs_coords.append(coords)
            else:
                all_outer_refs.append(ref)
        
        # Build outer polygons iteratively
        outer_polygons = []
        remaining = list(all_outer_refs)
        
        while remaining:
            ring_coords, used_refs = build_ring_from_ways(remaining, ways, nodes)
            if ring_coords is None or len(ring_coords) < 4:
                break
            
            try:
                ls = shapely.LineString(ring_coords)
                if not ls.is_valid or ls.is_empty or ls.length < 1e-10:
                    break
                poly = shapely.Polygon(ring_coords)
                poly = shapely.orient_polygons(poly)
                if poly.is_valid and not poly.is_empty and poly.area > 1e-12:
                    outer_polygons.append(poly)
            except Exception:
                break
            
            remaining = [r for r in remaining if r not in used_refs]
        
        if not outer_polygons:
            continue
        
        # Build inner polygons
        inner_polygons = []
        for coords in inner_refs_coords:
            try:
                ipoly = shapely.Polygon(coords)
                ipoly = shapely.orient_polygons(ipoly)
                if ipoly.is_valid and not ipoly.is_empty:
                    inner_polygons.append(ipoly)
            except Exception:
                continue
        
        # Combine outer with holes
        final_polygons = []
        for outer_poly in outer_polygons:
            inners_here = []
            rest = []
            for ipoly in inner_polygons:
                try:
                    if outer_poly.contains(ipoly.representative_point()) or outer_poly.intersects(ipoly):
                        inners_here.append(ipoly)
                    else:
                        rest.append(ipoly)
                except Exception:
                    rest.append(ipoly)
            inner_polygons = rest
            
            if inners_here:
                try:
                    outer_coords = list(outer_poly.exterior.coords)
                    inner_coords_list = [list(ip.exterior.coords) for ip in inners_here]
                    poly = shapely.Polygon(outer_coords, inner_coords_list)
                    poly = shapely.orient_polygons(poly)
                    if poly.is_valid and not poly.is_empty:
                        final_polygons.append(poly)
                    else:
                        final_polygons.append(outer_poly)
                except Exception:
                    final_polygons.append(outer_poly)
            else:
                final_polygons.append(outer_poly)
        
        if final_polygons:
            geom = shapely.union_all(final_polygons)
            if geom.geom_type == "Polygon":
                geom = shapely.geometry.MultiPolygon([geom])
            
            if not geom.is_empty:
                features.append({
                    "geometry": geom,
                    "name": name,
                    "osm_id": rel.get("id"),
                })
    
    if not features:
        return gpd.GeoDataFrame()
    
    return gpd.GeoDataFrame(features, crs="EPSG:4326")


# ---------------------------------------------------------------------------
# 2. Name normalisation
# ---------------------------------------------------------------------------

VIENNA_DISTRICTS_CANON = {
    "innere stadt": "Innere Stadt",
    "leopoldstadt": "Leopoldstadt",
    "landstraße": "Landstraße",
    "landstrasse": "Landstraße",
    "wieden": "Wieden",
    "margareten": "Margareten",
    "margarethen": "Margareten",
    "mariahilf": "Mariahilf",
    "neubau": "Neubau",
    "josefstadt": "Josefstadt",
    "alsergrund": "Alsergrund",
    "favoriten": "Favoriten",
    "simmering": "Simmering",
    "meidling": "Meidling",
    "hietzing": "Hietzing",
    "penzing": "Penzing",
    "rudolfsheim-fünfhaus": "Rudolfsheim-Fünfhaus",
    "rudolfsheim-fuenfhaus": "Rudolfsheim-Fünfhaus",
    "ottakring": "Ottakring",
    "hernals": "Hernals",
    "währing": "Währing",
    "waehring": "Währing",
    "döbling": "Döbling",
    "doebling": "Döbling",
    "brigittenau": "Brigittenau",
    "floridsdorf": "Floridsdorf",
    "donaustadt": "Donaustadt",
    "liesing": "Liesing",
}


def normalise_name(name):
    if not name or pd.isna(name):
        return None
    n = name.strip()
    key = n.lower().replace("  ", " ").replace("ß", "ss").replace("-", "-")
    if key in VIENNA_DISTRICTS_CANON:
        return VIENNA_DISTRICTS_CANON[key]
    key2 = key.replace("ss", "ß")
    if key2 in VIENNA_DISTRICTS_CANON:
        return VIENNA_DISTRICTS_CANON[key2]
    for pattern, canon in VIENNA_DISTRICTS_CANON.items():
        if pattern in key or key in pattern:
            return canon
    return n


# ---------------------------------------------------------------------------
# 3. Main pipeline
# ---------------------------------------------------------------------------

def main():
    print("=" * 60)
    print("Fetching CURRENT Vienna district boundaries...")
    print("=" * 60)
    current_gdf = fetch_vienna_districts_osmnx()
    print(f"  Got {len(current_gdf)} current district features")
    
    print("\n" + "=" * 60)
    print("Fetching 2014-01-01 Vienna district boundaries...")
    print("=" * 60)
    historical_gdf = fetch_vienna_districts_osmnx(date="2014-01-01T00:00:00Z")
    print(f"  Got {len(historical_gdf)} historical district features")
    
    if current_gdf.empty or historical_gdf.empty:
        print("ERROR: Empty data returned!")
        sys.exit(1)
    
    # Normalise names
    current_gdf["district_name"] = current_gdf["name"].apply(normalise_name)
    historical_gdf["district_name"] = historical_gdf["name"].apply(normalise_name)
    
    print("\nCurrent district names:")
    for n in sorted(current_gdf["district_name"].dropna().unique()):
        print(f"  - {n}")
    print("\nHistorical district names:")
    for n in sorted(historical_gdf["district_name"].dropna().unique()):
        print(f"  - {n}")
    
    # Filter to Vienna districts only
    valid_names = set(VIENNA_DISTRICTS_CANON.values())
    current_gdf = current_gdf[current_gdf["district_name"].isin(valid_names)].copy()
    historical_gdf = historical_gdf[historical_gdf["district_name"].isin(valid_names)].copy()
    
    print(f"\nAfter filtering:")
    print(f"  Current: {len(current_gdf)} districts")
    print(f"  Historical: {len(historical_gdf)} districts")
    
    # Build per-district comparison
    # Index historical by name
    hist_by_name = {}
    for _, row in historical_gdf.iterrows():
        hist_by_name[row["district_name"]] = row["geometry"]
    
    # Tolerance for snapping (roughly 1-2 meters in degrees)
    TOLERANCE = 1e-7
    
    fragments = []
    
    for _, row in current_gdf.iterrows():
        dname = row["district_name"]
        current_geom = row["geometry"]
        historical_geom = hist_by_name.get(dname)
        
        if historical_geom is None:
            # District exists now but not in 2014 — fully added
            fragments.append({
                "geometry": current_geom,
                "change_type": "added_since_2014",
                "district_name": dname,
            })
            continue
        
        # Snap current to historical to reduce slivers
        # Use normalize for robust comparison
        current_norm = shapely.normalize(current_geom)
        hist_norm = shapely.normalize(historical_geom)
        
        # Snap historical geometry to current geometry's coordinates
        snapped_hist = shapely.snap(hist_norm, current_norm, TOLERANCE)
        
        # Compute differences
        added_part = shapely.difference(current_norm, snapped_hist)
        removed_part = shapely.difference(snapped_hist, current_norm)
        unchanged_part = shapely.intersection(current_norm, snapped_hist)
        
        # Clean up small slivers (noise)
        MIN_AREA = 1e-10  # ~ 1 sq meter in degrees
        
        def clean_geom(geom):
            """Remove tiny polygons and ensure MultiPolygon."""
            if geom.is_empty:
                return shapely.geometry.MultiPolygon()
            
            # Filter out tiny polygons
            if geom.geom_type == "MultiPolygon":
                polys = [g for g in geom.geoms if g.area >= MIN_AREA]
                if not polys:
                    return shapely.geometry.MultiPolygon()
                if len(polys) == 1:
                    geom = polys[0]
                else:
                    geom = shapely.geometry.MultiPolygon(polys)
            
            if geom.geom_type == "Polygon":
                if geom.area < MIN_AREA:
                    return shapely.geometry.MultiPolygon()
                return shapely.geometry.MultiPolygon([geom])
            
            return geom
        
        added_clean = clean_geom(added_part)
        removed_clean = clean_geom(removed_part)
        unchanged_clean = clean_geom(unchanged_part)
        
        if not added_clean.is_empty:
            if added_clean.geom_type == "MultiPolygon":
                for poly in added_clean.geoms:
                    fragments.append({
                        "geometry": shapely.geometry.MultiPolygon([poly]),
                        "change_type": "added_since_2014",
                        "district_name": dname,
                    })
            else:
                fragments.append({
                    "geometry": added_clean,
                    "change_type": "added_since_2014",
                    "district_name": dname,
                })
        
        if not removed_clean.is_empty:
            if removed_clean.geom_type == "MultiPolygon":
                for poly in removed_clean.geoms:
                    fragments.append({
                        "geometry": shapely.geometry.MultiPolygon([poly]),
                        "change_type": "removed_since_2014",
                        "district_name": dname,
                    })
            else:
                fragments.append({
                    "geometry": removed_clean,
                    "change_type": "removed_since_2014",
                    "district_name": dname,
                })
        
        if not unchanged_clean.is_empty:
            if unchanged_clean.geom_type == "MultiPolygon":
                for poly in unchanged_clean.geoms:
                    fragments.append({
                        "geometry": shapely.geometry.MultiPolygon([poly]),
                        "change_type": "unchanged",
                        "district_name": dname,
                    })
            else:
                fragments.append({
                    "geometry": unchanged_clean,
                    "change_type": "unchanged",
                    "district_name": dname,
                })
    
    result_gdf = gpd.GeoDataFrame(fragments, crs="EPSG:4326")
    
    if result_gdf.empty:
        print("No fragments produced. Using current boundaries as unchanged.")
        result_gdf = current_gdf.copy()
        result_gdf["change_type"] = "unchanged"
        result_gdf = result_gdf[["geometry", "change_type", "district_name"]]
    
    print(f"\nFragments created: {len(result_gdf)}")
    if not result_gdf.empty:
        print(result_gdf["change_type"].value_counts())
    
    # -----------------------------------------------------------------------
    # 4. touches_changed
    # -----------------------------------------------------------------------
    
    changed_geoms = []
    for _, row in result_gdf.iterrows():
        if row["change_type"] in ("added_since_2014", "removed_since_2014"):
            changed_geoms.append(row["geometry"])
    
    if changed_geoms:
        changed_union = shapely.union_all(changed_geoms)
    else:
        changed_union = shapely.geometry.MultiPolygon()
    
    # Compute touches per district (using current boundaries as reference)
    district_touches = {}
    for _, row in current_gdf.iterrows():
        dname = row["district_name"]
        geom = row["geometry"]
        if not changed_union.is_empty:
            touches = geom.touches(changed_union)
            if not touches:
                overlap = shapely.intersection(geom, changed_union)
                touches = not overlap.is_empty
            district_touches[dname] = touches
        else:
            district_touches[dname] = False
    
    # Also check historical-only districts
    for _, row in historical_gdf.iterrows():
        dname = row["district_name"]
        if dname not in district_touches:
            geom = row["geometry"]
            if not changed_union.is_empty:
                touches = geom.touches(changed_union)
                if not touches:
                    overlap = shapely.intersection(geom, changed_union)
                    touches = not overlap.is_empty
                district_touches[dname] = touches
            else:
                district_touches[dname] = False
    
    def get_touches(dn_str):
        if pd.isna(dn_str) or dn_str == "unknown":
            return False
        for d in dn_str.split(","):
            d = d.strip()
            if district_touches.get(d, False):
                return True
        return False
    
    result_gdf["touches_changed"] = result_gdf["district_name"].apply(get_touches)
    
    # Simplify geometries and ensure MultiPolygon
    result_gdf["geometry"] = result_gdf["geometry"].simplify(0.00001, preserve_topology=True)
    result_gdf["geometry"] = result_gdf["geometry"].buffer(0)
    
    def ensure_multipolygon(geom):
        if geom.is_empty:
            return geom
        if geom.geom_type == "Polygon":
            return shapely.geometry.MultiPolygon([geom])
        return geom
    
    result_gdf["geometry"] = result_gdf["geometry"].apply(ensure_multipolygon)
    
    # -----------------------------------------------------------------------
    # 5. Output
    # -----------------------------------------------------------------------
    output_path = "/work/vienna_boundary_changes.geojson"
    result_gdf.to_file(output_path, driver="GeoJSON")
    
    print(f"\n{'=' * 60}")
    print(f"Output written to {output_path}")
    print(f"Features: {len(result_gdf)}")
    print(f"Columns: {result_gdf.columns.tolist()}")
    
    print(f"\nSample features:")
    print(result_gdf[["change_type", "district_name", "touches_changed"]].head(20))
    
    print(f"\nDistrict touch summary:")
    for dname, touches in sorted(district_touches.items()):
        print(f"  {dname}: touches_changed={touches}")
    
    # Verify
    verify = gpd.read_file(output_path)
    print(f"\nVerification: read back {len(verify)} features")
    print(verify["change_type"].value_counts())
    print(f"Geometry types: {verify.geometry.geom_type.unique()}")
    
    # Area stats
    for ct in ["added_since_2014", "removed_since_2014", "unchanged"]:
        sub = verify[verify["change_type"] == ct]
        if not sub.empty:
            areas = sub.geometry.area
            print(f"  {ct}: count={len(sub)}, total_area={areas.sum():.10f}")


if __name__ == "__main__":
    main()

Wrote /work/solve.py (19789 chars).

10:32

Bash cd /work && timeout 300 python solve.py 2>&1

============================================================
Fetching CURRENT Vienna district boundaries...
============================================================
  Got 23 current district features

============================================================
Fetching 2014-01-01 Vienna district boundaries...
============================================================
  Got 26 historical district features

Current district names:
  - Alsergrund
  - Brigittenau
  - Donaustadt
  - Döbling
  - Favoriten
  - Floridsdorf
  - Hernals
  - Hietzing
  - Innere Stadt
  - Josefstadt
  - Landstraße
  - Leopoldstadt
  - Liesing
  - Margareten
  - Mariahilf
  - Meidling
  - Neubau
  - Ottakring
  - Penzing
  - Rudolfsheim-Fünfhaus
  - Simmering
  - Wieden
  - Währing

Historical district names:
  - Alsergrund
  - Brigittenau
  - Donaustadt
  - Döbling
  - Enzersfeld
  - Favoriten
  - Floridsdorf
  - Gerasdorf bei Wien
  - Hernals
  - Hietzing
  - Innere Stadt
  - Josefstadt
  - Klein-Engersdorf
  - Landstraße
  - Leopoldstadt
  - Liesing
  - Margareten
  - Mariahilf
  - Meidling
  - Neubau
  - Ottakring
  - Penzing
  - Rudolfsheim-Fünfhaus
  - Simmering
  - Wieden
  - Währing

After filtering:
  Current: 23 districts
  Historical: 23 districts

Fragments created: 2009
change_type
removed_since_2014    997
added_since_2014      989
unchanged              23
Name: count, dtype: int64

============================================================
Output written to /work/vienna_boundary_changes.geojson
Features: 2009
Columns: ['geometry', 'change_type', 'district_name', 'touches_changed']

Sample features:
         change_type district_name  touches_changed
0   added_since_2014    Alsergrund             True
1   added_since_2014    Alsergrund             True
2   added_since_2014    Alsergrund             True
3   added_since_2014    Alsergrund             True
4   added_since_2014    Alsergrund             True
5   added_since_2014    Alsergrund             True
6   added_since_2014    Alsergrund             True
7   added_since_2014    Alsergrund             True
8   added_since_2014    Alsergrund             True
9   added_since_2014    Alsergrund             True
10  added_since_2014    Alsergrund             True
11  added_since_2014    Alsergrund             True
12  added_since_2014    Alsergrund             True
13  added_since_2014    Alsergrund             True
14  added_since_2014    Alsergrund             True
15  added_since_2014    Alsergrund             True
16  added_since_2014    Alsergrund             True
17  added_since_2014    Alsergrund             True
18  added_since_2014    Alsergrund             True
19  added_since_2014    Alsergrund             True

District touch summary:
  Alsergrund: touches_changed=True
  Brigittenau: touches_changed=True
  Donaustadt: touches_changed=True
  Döbling: touches_changed=True
  Favoriten: touches_changed=True
  Floridsdorf: touches_changed=True
  Hernals: touches_changed=True
  Hietzing: touches_changed=True
  Innere Stadt: touches_changed=True
  Josefstadt: touches_changed=True
  Landstraße: touches_changed=True
  Leopoldstadt: touches_changed=True
  Liesing: touches_changed=True
  Margareten: touches_changed=True
  Mariahilf: touches_changed=True
  Meidling: touches_changed=True
  Neubau: touches_changed=True
  Ottakring: touches_changed=True
  Penzing: touches_changed=True
  Rudolfsheim-Fünfhaus: touches_changed=True
  Simmering: touches_changed=True
  Wieden: touches_changed=True
  Währing: touches_changed=True

Verification: read back 2009 features
change_type
removed_since_2014    997
added_since_2014      989
unchanged              23
Name: count, dtype: int64
Geometry types: <ArrowStringArray>
['MultiPolygon']
Length: 1, dtype: str
/work/solve.py:557: UserWarning: Geometry is in a geographic CRS. Results from 'area' are likely incorrect. Use 'GeoSeries.to_crs()' to re-project geometries to a projected CRS before this operation.

  areas = sub.geometry.area
  added_since_2014: count=989, total_area=0.0000204773
/work/solve.py:557: UserWarning: Geometry is in a geographic CRS. Results from 'area' are likely incorrect. Use 'GeoSeries.to_crs()' to re-project geometries to a projected CRS before this operation.

  areas = sub.geometry.area
  removed_since_2014: count=997, total_area=0.0000203214
/work/solve.py:557: UserWarning: Geometry is in a geographic CRS. Results from 'area' are likely incorrect. Use 'GeoSeries.to_crs()' to re-project geometries to a projected CRS before this operation.

  areas = sub.geometry.area
  unchanged: count=23, total_area=0.0501774692

11:00

This is still producing a lot of tiny sliver fragments (2009 features). The problem is that even with snapping, the difference operations produce many tiny fragments along all shared boundaries. This is a classic problem with comparing two datasets that have slightly different coordinate precision.

Let me rethink the approach. Instead of per-district difference (which creates slivers along every border), I should:

1. Union all current districts and union all historical districts
2. Compute the overall difference at the aggregate level (this will only show real changes, not slivers along shared edges)
3. Then clip the changed areas by individual districts

Actually, the key insight is that when two districts share a boundary, the per-district difference creates slivers where those shared boundaries don't align perfectly in the two snapshots. The solution is to work at the aggregate level first.

Let me try a different approach:
1. Union current and historical districts separately
2. Compute added = difference(current_union, historical_union) and removed = difference(historical_union, current_union)
3. Then assign each fragment to its district(s) by intersection with the current/historical districts
4. For unchanged, take intersection of current_union and historical_union, then clip by each current district

This way, the shared boundaries between districts won't produce slivers because we dissolve them first.

11:00

Write /work/solve.py

content · 558 lines · py

"""
Fetch Vienna district boundaries (current and 2014-01-01) from Overpass via osmnx,
compare to find changes, and output vienna_boundary_changes.geojson.
"""

import os
import sys
import json
from collections import OrderedDict

import osmnx as ox
import osmnx._overpass as op
import geopandas as gpd
import pandas as pd
import shapely
import shapely.geometry
import numpy as np

ox.settings.log_console = False

# ---------------------------------------------------------------------------
# 1. Overpass query
# ---------------------------------------------------------------------------

def build_ring_from_ways(way_refs, ways_dict, nodes_dict):
    if not way_refs:
        return None, []
    
    segments = []
    for ref in way_refs:
        if ref not in ways_dict:
            continue
        node_ids = ways_dict[ref]
        if len(node_ids) < 2:
            continue
        coords = [nodes_dict.get(nid) for nid in node_ids]
        coords = [c for c in coords if c is not None]
        if len(coords) < 2:
            continue
        segments.append({
            'ref': ref,
            'coords': list(coords),
            'start_node': node_ids[0],
            'end_node': node_ids[-1],
        })
    
    if not segments:
        return None, []
    
    used = [False] * len(segments)
    longest_idx = max(range(len(segments)), key=lambda i: len(segments[i]['coords']))
    ring_coords = list(segments[longest_idx]['coords'])
    current_end_node = segments[longest_idx]['end_node']
    used[longest_idx] = True
    
    changed = True
    while changed:
        changed = False
        for i, seg in enumerate(segments):
            if used[i]:
                continue
            if seg['start_node'] == current_end_node:
                ring_coords.extend(seg['coords'][1:])
                current_end_node = seg['end_node']
                used[i] = True
                changed = True
                break
            elif seg['end_node'] == current_end_node:
                rev_coords = list(reversed(seg['coords']))
                ring_coords.extend(rev_coords[1:])
                current_end_node = seg['start_node']
                used[i] = True
                changed = True
                break
    
    if ring_coords[0] != ring_coords[-1]:
        ring_coords.append(ring_coords[0])
    
    used_refs = [segments[i]['ref'] for i in range(len(segments)) if used[i]]
    return ring_coords, used_refs


def fetch_vienna_districts_osmnx(date=None):
    """Fetch Vienna district (admin_level=9) boundary relations via Overpass."""
    bbox = (48.1, 16.1, 48.35, 16.6)
    
    if date:
        query = f"""
[out:json][timeout:180][date:"{date}"];
relation["admin_level"="9"]["boundary"="administrative"]({bbox[0]},{bbox[1]},{bbox[2]},{bbox[3]});
out body;
>;
out skel qt;
"""
    else:
        query = f"""
[out:json][timeout:180];
relation["admin_level"="9"]["boundary"="administrative"]({bbox[0]},{bbox[1]},{bbox[2]},{bbox[3]});
out body;
>;
out skel qt;
"""
    
    data = OrderedDict()
    data['data'] = query
    result = op._overpass_request(data)
    elements = result.get("elements", [])
    
    nodes = {}
    ways = {}
    relations = []
    
    for el in elements:
        etype = el.get("type")
        eid = el.get("id")
        if etype == "node":
            nodes[eid] = (el.get("lon"), el.get("lat"))
        elif etype == "way":
            ways[eid] = el.get("nodes", [])
        elif etype == "relation":
            relations.append(el)
    
    features = []
    
    for rel in relations:
        tags = rel.get("tags", {})
        name = tags.get("name", "")
        admin_level = tags.get("admin_level", "")
        
        if admin_level != "9":
            continue
        
        members = rel.get("members", [])
        
        all_outer_refs = []
        inner_refs_coords = []
        
        for mem in members:
            if mem.get("type") != "way":
                continue
            role = mem.get("role", "")
            ref = mem.get("ref")
            if ref not in ways:
                continue
            if role == "inner":
                node_ids = ways[ref]
                coords = [nodes.get(nid) for nid in node_ids]
                coords = [c for c in coords if c is not None]
                if len(coords) >= 3:
                    if coords[0] != coords[-1]:
                        coords.append(coords[0])
                    inner_refs_coords.append(coords)
            else:
                all_outer_refs.append(ref)
        
        outer_polygons = []
        remaining = list(all_outer_refs)
        
        while remaining:
            ring_coords, used_refs = build_ring_from_ways(remaining, ways, nodes)
            if ring_coords is None or len(ring_coords) < 4:
                break
            
            try:
                ls = shapely.LineString(ring_coords)
                if not ls.is_valid or ls.is_empty or ls.length < 1e-10:
                    break
                poly = shapely.Polygon(ring_coords)
                poly = shapely.orient_polygons(poly)
                if poly.is_valid and not poly.is_empty and poly.area > 1e-12:
                    outer_polygons.append(poly)
            except Exception:
                break
            
            remaining = [r for r in remaining if r not in used_refs]
        
        if not outer_polygons:
            continue
        
        inner_polygons = []
        for coords in inner_refs_coords:
            try:
                ipoly = shapely.Polygon(coords)
                ipoly = shapely.orient_polygons(ipoly)
                if ipoly.is_valid and not ipoly.is_empty:
                    inner_polygons.append(ipoly)
            except Exception:
                continue
        
        final_polygons = []
        for outer_poly in outer_polygons:
            inners_here = []
            rest = []
            for ipoly in inner_polygons:
                try:
                    if outer_poly.contains(ipoly.representative_point()) or outer_poly.intersects(ipoly):
                        inners_here.append(ipoly)
                    else:
                        rest.append(ipoly)
                except Exception:
                    rest.append(ipoly)
            inner_polygons = rest
            
            if inners_here:
                try:
                    outer_coords = list(outer_poly.exterior.coords)
                    inner_coords_list = [list(ip.exterior.coords) for ip in inners_here]
                    poly = shapely.Polygon(outer_coords, inner_coords_list)
                    poly = shapely.orient_polygons(poly)
                    if poly.is_valid and not poly.is_empty:
                        final_polygons.append(poly)
                    else:
                        final_polygons.append(outer_poly)
                except Exception:
                    final_polygons.append(outer_poly)
            else:
                final_polygons.append(outer_poly)
        
        if final_polygons:
            geom = shapely.union_all(final_polygons)
            if geom.geom_type == "Polygon":
                geom = shapely.geometry.MultiPolygon([geom])
            
            if not geom.is_empty:
                features.append({
                    "geometry": geom,
                    "name": name,
                    "osm_id": rel.get("id"),
                })
    
    if not features:
        return gpd.GeoDataFrame()
    
    return gpd.GeoDataFrame(features, crs="EPSG:4326")


# ---------------------------------------------------------------------------
# 2. Name normalisation
# ---------------------------------------------------------------------------

VIENNA_DISTRICTS_CANON = {
    "innere stadt": "Innere Stadt",
    "leopoldstadt": "Leopoldstadt",
    "landstraße": "Landstraße",
    "landstrasse": "Landstraße",
    "wieden": "Wieden",
    "margareten": "Margareten",
    "margarethen": "Margareten",
    "mariahilf": "Mariahilf",
    "neubau": "Neubau",
    "josefstadt": "Josefstadt",
    "alsergrund": "Alsergrund",
    "favoriten": "Favoriten",
    "simmering": "Simmering",
    "meidling": "Meidling",
    "hietzing": "Hietzing",
    "penzing": "Penzing",
    "rudolfsheim-fünfhaus": "Rudolfsheim-Fünfhaus",
    "rudolfsheim-fuenfhaus": "Rudolfsheim-Fünfhaus",
    "ottakring": "Ottakring",
    "hernals": "Hernals",
    "währing": "Währing",
    "waehring": "Währing",
    "döbling": "Döbling",
    "doebling": "Döbling",
    "brigittenau": "Brigittenau",
    "floridsdorf": "Floridsdorf",
    "donaustadt": "Donaustadt",
    "liesing": "Liesing",
}


def normalise_name(name):
    if not name or pd.isna(name):
        return None
    n = name.strip()
    key = n.lower().replace("  ", " ").replace("ß", "ss").replace("-", "-")
    if key in VIENNA_DISTRICTS_CANON:
        return VIENNA_DISTRICTS_CANON[key]
    key2 = key.replace("ss", "ß")
    if key2 in VIENNA_DISTRICTS_CANON:
        return VIENNA_DISTRICTS_CANON[key2]
    for pattern, canon in VIENNA_DISTRICTS_CANON.items():
        if pattern in key or key in pattern:
            return canon
    return n


# ---------------------------------------------------------------------------
# 3. Main pipeline
# ---------------------------------------------------------------------------

def main():
    print("=" * 60)
    print("Fetching CURRENT Vienna district boundaries...")
    print("=" * 60)
    current_gdf = fetch_vienna_districts_osmnx()
    print(f"  Got {len(current_gdf)} current district features")
    
    print("\n" + "=" * 60)
    print("Fetching 2014-01-01 Vienna district boundaries...")
    print("=" * 60)
    historical_gdf = fetch_vienna_districts_osmnx(date="2014-01-01T00:00:00Z")
    print(f"  Got {len(historical_gdf)} historical district features")
    
    if current_gdf.empty or historical_gdf.empty:
        print("ERROR: Empty data returned!")
        sys.exit(1)
    
    # Normalise names
    current_gdf["district_name"] = current_gdf["name"].apply(normalise_name)
    historical_gdf["district_name"] = historical_gdf["name"].apply(normalise_name)
    
    print("\nCurrent district names:")
    for n in sorted(current_gdf["district_name"].dropna().unique()):
        print(f"  - {n}")
    print("\nHistorical district names:")
    for n in sorted(historical_gdf["district_name"].dropna().unique()):
        print(f"  - {n}")
    
    # Filter to Vienna districts only
    valid_names = set(VIENNA_DISTRICTS_CANON.values())
    current_gdf = current_gdf[current_gdf["district_name"].isin(valid_names)].copy()
    historical_gdf = historical_gdf[historical_gdf["district_name"].isin(valid_names)].copy()
    
    print(f"\nAfter filtering to Vienna districts:")
    print(f"  Current: {len(current_gdf)} districts")
    print(f"  Historical: {len(historical_gdf)} districts")
    
    # -----------------------------------------------------------------------
    # Strategy: dissolve to city level first, find changes there,
    # then clip back per-district for attribution
    # -----------------------------------------------------------------------
    
    # First, create a clean dissolved union of each snapshot
    current_union = shapely.union_all(current_gdf.geometry)
    historical_union = shapely.union_all(historical_gdf.geometry)
    
    print(f"\nCurrent union area: {current_union.area:.8f}")
    print(f"Historical union area: {historical_union.area:.8f}")
    
    # Snap historical to current to eliminate coordinate precision differences
    TOLERANCE = 1e-7  # ~1cm in degrees
    snapped_historical = shapely.snap(historical_union, current_union, TOLERANCE)
    
    # Compute symmetric difference at city level
    added = shapely.difference(current_union, snapped_historical)
    removed = shapely.difference(snapped_historical, current_union)
    unchanged = shapely.intersection(current_union, snapped_historical)
    
    print(f"\nAdded: area={added.area:.10f}, empty={added.is_empty}")
    print(f"Removed: area={removed.area:.10f}, empty={removed.is_empty}")
    print(f"Unchanged: area={unchanged.area:.10f}, empty={unchanged.is_empty}")
    
    # Remove tiny slivers
    MIN_AREA = 1e-9  # ~ 0.01 sq m in degrees
    
    def filter_slivers(geom):
        if geom.is_empty:
            return shapely.geometry.MultiPolygon()
        if geom.geom_type == "MultiPolygon":
            polys = [g for g in geom.geoms if g.area >= MIN_AREA]
            if not polys:
                return shapely.geometry.MultiPolygon()
            if len(polys) == 1:
                return shapely.geometry.MultiPolygon([polys[0]])
            return shapely.geometry.MultiPolygon(polys)
        elif geom.geom_type == "Polygon":
            if geom.area >= MIN_AREA:
                return shapely.geometry.MultiPolygon([geom])
            return shapely.geometry.MultiPolygon()
        return geom
    
    added = filter_slivers(added)
    removed = filter_slivers(removed)
    unchanged = filter_slivers(unchanged)
    
    print(f"\nAfter sliver removal:")
    print(f"  Added: area={added.area:.10f}" if not added.is_empty else "  Added: empty")
    print(f"  Removed: area={removed.area:.10f}" if not removed.is_empty else "  Removed: empty")
    print(f"  Unchanged: area={unchanged.area:.10f}" if not unchanged.is_empty else "  Unchanged: empty")
    
    # -----------------------------------------------------------------------
    # Build fragments with district attribution
    # -----------------------------------------------------------------------
    fragments = []
    
    def add_fragments(geom, ctype, source_gdf):
        """Add geometry fragments, assigning district_name by intersection with source_gdf."""
        if geom.is_empty:
            return
        if geom.geom_type == "MultiPolygon":
            polys = list(geom.geoms)
        elif geom.geom_type == "Polygon":
            polys = [geom]
        else:
            polys = []
        
        for poly in polys:
            if poly.area < MIN_AREA:
                continue
            poly_mp = shapely.geometry.MultiPolygon([poly]) if poly.geom_type == "Polygon" else poly
            
            # Find which district(s) this belongs to
            districts = set()
            for _, row in source_gdf.iterrows():
                if poly_mp.intersects(row["geometry"]):
                    districts.add(row["district_name"])
            
            dname = ", ".join(sorted(districts)) if districts else "unknown"
            fragments.append({
                "geometry": poly_mp,
                "change_type": ctype,
                "district_name": dname,
            })
    
    # Added areas: from current districts
    add_fragments(added, "added_since_2014", current_gdf)
    # Removed areas: from historical districts
    add_fragments(removed, "removed_since_2014", historical_gdf)
    
    # Unchanged: clip by each current district separately
    for _, row in current_gdf.iterrows():
        dname = row["district_name"]
        dist_geom = row["geometry"]
        unchanged_part = shapely.intersection(dist_geom, unchanged)
        if not unchanged_part.is_empty:
            unchanged_part = filter_slivers(unchanged_part)
            if not unchanged_part.is_empty:
                if unchanged_part.geom_type == "MultiPolygon":
                    polys = list(unchanged_part.geoms)
                elif unchanged_part.geom_type == "Polygon":
                    polys = [unchanged_part]
                else:
                    polys = []
                
                for poly in polys:
                    if poly.area < MIN_AREA:
                        continue
                    if poly.geom_type == "Polygon":
                        poly_mp = shapely.geometry.MultiPolygon([poly])
                    else:
                        poly_mp = poly
                    fragments.append({
                        "geometry": poly_mp,
                        "change_type": "unchanged",
                        "district_name": dname,
                    })
    
    result_gdf = gpd.GeoDataFrame(fragments, crs="EPSG:4326")
    
    if result_gdf.empty:
        print("No fragments produced. Using current boundaries as unchanged.")
        result_gdf = current_gdf.copy()
        result_gdf["change_type"] = "unchanged"
        result_gdf = result_gdf[["geometry", "change_type", "district_name"]]
    
    print(f"\nFragments created: {len(result_gdf)}")
    if not result_gdf.empty:
        print(result_gdf["change_type"].value_counts())
    
    # -----------------------------------------------------------------------
    # 4. touches_changed
    # -----------------------------------------------------------------------
    changed_geoms = []
    for _, row in result_gdf.iterrows():
        if row["change_type"] in ("added_since_2014", "removed_since_2014"):
            changed_geoms.append(row["geometry"])
    
    if changed_geoms:
        changed_union = shapely.union_all(changed_geoms)
    else:
        changed_union = shapely.geometry.MultiPolygon()
    
    district_touches = {}
    for _, row in current_gdf.iterrows():
        dname = row["district_name"]
        geom = row["geometry"]
        if not changed_union.is_empty:
            touches = geom.touches(changed_union)
            if not touches:
                overlap = shapely.intersection(geom, changed_union)
                touches = not overlap.is_empty
            district_touches[dname] = touches
        else:
            district_touches[dname] = False
    
    for _, row in historical_gdf.iterrows():
        dname = row["district_name"]
        if dname not in district_touches:
            geom = row["geometry"]
            if not changed_union.is_empty:
                touches = geom.touches(changed_union)
                if not touches:
                    overlap = shapely.intersection(geom, changed_union)
                    touches = not overlap.is_empty
                district_touches[dname] = touches
            else:
                district_touches[dname] = False
    
    def get_touches(dn_str):
        if pd.isna(dn_str) or dn_str == "unknown":
            return False
        for d in dn_str.split(","):
            d = d.strip()
            if district_touches.get(d, False):
                return True
        return False
    
    result_gdf["touches_changed"] = result_gdf["district_name"].apply(get_touches)
    
    # Simplify and ensure MultiPolygon
    result_gdf["geometry"] = result_gdf["geometry"].simplify(0.00001, preserve_topology=True)
    result_gdf["geometry"] = result_gdf["geometry"].buffer(0)
    
    def ensure_multipolygon(geom):
        if geom.is_empty:
            return geom
        if geom.geom_type == "Polygon":
            return shapely.geometry.MultiPolygon([geom])
        return geom
    
    result_gdf["geometry"] = result_gdf["geometry"].apply(ensure_multipolygon)
    
    # -----------------------------------------------------------------------
    # 5. Output
    # -----------------------------------------------------------------------
    output_path = "/work/vienna_boundary_changes.geojson"
    result_gdf.to_file(output_path, driver="GeoJSON")
    
    print(f"\n{'=' * 60}")
    print(f"Output written to {output_path}")
    print(f"Features: {len(result_gdf)}")
    print(f"Columns: {result_gdf.columns.tolist()}")
    
    print(f"\nSample features:")
    print(result_gdf[["change_type", "district_name", "touches_changed"]].head(20))
    
    print(f"\nDistrict touch summary:")
    for dname, touches in sorted(district_touches.items()):
        print(f"  {dname}: touches_changed={touches}")
    
    # Verify
    verify = gpd.read_file(output_path)
    print(f"\nVerification: read back {len(verify)} features")
    print(verify["change_type"].value_counts())
    print(f"Geometry types: {verify.geometry.geom_type.unique()}")
    
    # Area stats
    for ct in ["added_since_2014", "removed_since_2014", "unchanged"]:
        sub = verify[verify["change_type"] == ct]
        if not sub.empty:
            areas = sub.geometry.area
            print(f"  {ct}: count={len(sub)}, total_area={areas.sum():.10f}")


if __name__ == "__main__":
    main()

Wrote /work/solve.py (20242 chars).

11:02

Bash cd /work && timeout 300 python solve.py 2>&1

============================================================
Fetching CURRENT Vienna district boundaries...
============================================================
  Got 23 current district features

============================================================
Fetching 2014-01-01 Vienna district boundaries...
============================================================
  Got 26 historical district features

Current district names:
  - Alsergrund
  - Brigittenau
  - Donaustadt
  - Döbling
  - Favoriten
  - Floridsdorf
  - Hernals
  - Hietzing
  - Innere Stadt
  - Josefstadt
  - Landstraße
  - Leopoldstadt
  - Liesing
  - Margareten
  - Mariahilf
  - Meidling
  - Neubau
  - Ottakring
  - Penzing
  - Rudolfsheim-Fünfhaus
  - Simmering
  - Wieden
  - Währing

Historical district names:
  - Alsergrund
  - Brigittenau
  - Donaustadt
  - Döbling
  - Enzersfeld
  - Favoriten
  - Floridsdorf
  - Gerasdorf bei Wien
  - Hernals
  - Hietzing
  - Innere Stadt
  - Josefstadt
  - Klein-Engersdorf
  - Landstraße
  - Leopoldstadt
  - Liesing
  - Margareten
  - Mariahilf
  - Meidling
  - Neubau
  - Ottakring
  - Penzing
  - Rudolfsheim-Fünfhaus
  - Simmering
  - Wieden
  - Währing

After filtering to Vienna districts:
  Current: 23 districts
  Historical: 23 districts

Current union area: 0.05019851
Historical union area: 0.05019825

Added: area=0.0000019540, empty=False
Removed: area=0.0000016878, empty=False
Unchanged: area=0.0501965579, empty=False

After sliver removal:
  Added: area=0.0000019149
  Removed: area=0.0000016405
  Unchanged: area=0.0501965579

Fragments created: 332
change_type
added_since_2014      158
removed_since_2014    151
unchanged              23
Name: count, dtype: int64

============================================================
Output written to /work/vienna_boundary_changes.geojson
Features: 332
Columns: ['geometry', 'change_type', 'district_name', 'touches_changed']

Sample features:
         change_type district_name  touches_changed
0   added_since_2014       Liesing             True
1   added_since_2014       Liesing             True
2   added_since_2014       Liesing             True
3   added_since_2014       Liesing             True
4   added_since_2014       Liesing             True
5   added_since_2014       Liesing             True
6   added_since_2014       Liesing             True
7   added_since_2014       Liesing             True
8   added_since_2014       Liesing             True
9   added_since_2014       Liesing             True
10  added_since_2014       Liesing             True
11  added_since_2014       Liesing             True
12  added_since_2014       Liesing             True
13  added_since_2014       Liesing             True
14  added_since_2014       Liesing             True
15  added_since_2014       Liesing             True
16  added_since_2014       Liesing             True
17  added_since_2014       Liesing             True
18  added_since_2014       Liesing             True
19  added_since_2014       Liesing             True

District touch summary:
  Alsergrund: touches_changed=False
  Brigittenau: touches_changed=False
  Donaustadt: touches_changed=True
  Döbling: touches_changed=True
  Favoriten: touches_changed=True
  Floridsdorf: touches_changed=True
  Hernals: touches_changed=True
  Hietzing: touches_changed=True
  Innere Stadt: touches_changed=False
  Josefstadt: touches_changed=False
  Landstraße: touches_changed=False
  Leopoldstadt: touches_changed=False
  Liesing: touches_changed=True
  Margareten: touches_changed=False
  Mariahilf: touches_changed=False
  Meidling: touches_changed=False
  Neubau: touches_changed=False
  Ottakring: touches_changed=False
  Penzing: touches_changed=True
  Rudolfsheim-Fünfhaus: touches_changed=False
  Simmering: touches_changed=True
  Wieden: touches_changed=False
  Währing: touches_changed=False

Verification: read back 332 features
change_type
added_since_2014      158
removed_since_2014    151
unchanged              23
Name: count, dtype: int64
Geometry types: <ArrowStringArray>
['MultiPolygon']
Length: 1, dtype: str
/work/solve.py:553: UserWarning: Geometry is in a geographic CRS. Results from 'area' are likely incorrect. Use 'GeoSeries.to_crs()' to re-project geometries to a projected CRS before this operation.

  areas = sub.geometry.area
  added_since_2014: count=158, total_area=0.0000018387
/work/solve.py:553: UserWarning: Geometry is in a geographic CRS. Results from 'area' are likely incorrect. Use 'GeoSeries.to_crs()' to re-project geometries to a projected CRS before this operation.

  areas = sub.geometry.area
  removed_since_2014: count=151, total_area=0.0000016243
/work/solve.py:553: UserWarning: Geometry is in a geographic CRS. Results from 'area' are likely incorrect. Use 'GeoSeries.to_crs()' to re-project geometries to a projected CRS before this operation.

  areas = sub.geometry.area
  unchanged: count=23, total_area=0.0501965399

11:30

Good improvement. Now we have 332 fragments (down from 2000+). But we still have 158 added fragments and 151 removed fragments - still a lot of tiny slivers from the `shapely.intersection` clipping against districts. The per-district unchanged clipping is fine (23 fragments for 23 districts), but the added/removed fragments are getting broken into many pieces by the intersection with district boundaries.

The issue is that when we take the added geometry (which might be a single polygon) and then try to attribute it to districts by intersecting with each district, we end up with many pieces.

Let me think about this differently. The added/removed areas are already small. Instead of doing per-district intersection for added/removed, we can just assign them to the district they belong to by checking containment/intersection, without actually cutting them along district boundaries. We can keep them as-is (MultiPolygon) and just tag them with all districts they touch.

Actually, the real problem is the `add_fragments` function. When we call `poly_mp.intersects(row["geometry"])` for each district, we correctly identify the district. But in the fragment creation, we're NOT cutting - we're just checking intersects and assigning the name. So the fragments should remain as the original added/removed polygons.

Wait, looking at the code again - we ARE adding the geometry unchanged from `add_fragments`. So `added` starts as a MultiPolygon with some number of polygons (from the dissolve), and we iterate over each polygon in the MultiPolygon. That gives us a reasonable number.

But 158 + 151 = 309 added/removed fragments! That means the added MultiPolygon had 158 individual polygons and removed had 151. That's a lot...

Actually, these are individual tiny polygons from the boundary difference. Vienna boundaries probably changed in many small segments along the outer boundary. Let me check what districts these are in.

Most of them are in Liesing (southern district). Let me check if there's a boundary change along the southern edge of Vienna.

Actually, the counts might be OK - real boundary changes between 2014 and now would produce many small difference polygons along the boundary. But 158 seems high. Let me check the area of the largest fragment to understand the scale.

Actually, looking at total added area = 0.0000018387 sq degrees ≈ 0.0000018387 * (111000)^2 ≈ 22,600 sq meters. That's a real change. And 158 fragments for that area means many tiny slivers.

The issue is that the boundaries don't perfectly align even after snapping. Let me increase the snapping tolerance and also try using a larger tolerance for the difference operation.

Let me try a different approach: use a slightly larger tolerance and also clean up the results by dissolving very close geometries.

11:30

Write /work/solve.py

content · 533 lines · py

"""
Fetch Vienna district boundaries (current and 2014-01-01) from Overpass via osmnx,
compare to find changes, and output vienna_boundary_changes.geojson.
"""

import os
import sys
import json
from collections import OrderedDict

import osmnx as ox
import osmnx._overpass as op
import geopandas as gpd
import pandas as pd
import shapely
import shapely.geometry
import numpy as np

ox.settings.log_console = False

# ---------------------------------------------------------------------------
# 1. Overpass query
# ---------------------------------------------------------------------------

def build_ring_from_ways(way_refs, ways_dict, nodes_dict):
    if not way_refs:
        return None, []
    
    segments = []
    for ref in way_refs:
        if ref not in ways_dict:
            continue
        node_ids = ways_dict[ref]
        if len(node_ids) < 2:
            continue
        coords = [nodes_dict.get(nid) for nid in node_ids]
        coords = [c for c in coords if c is not None]
        if len(coords) < 2:
            continue
        segments.append({
            'ref': ref,
            'coords': list(coords),
            'start_node': node_ids[0],
            'end_node': node_ids[-1],
        })
    
    if not segments:
        return None, []
    
    used = [False] * len(segments)
    longest_idx = max(range(len(segments)), key=lambda i: len(segments[i]['coords']))
    ring_coords = list(segments[longest_idx]['coords'])
    current_end_node = segments[longest_idx]['end_node']
    used[longest_idx] = True
    
    changed = True
    while changed:
        changed = False
        for i, seg in enumerate(segments):
            if used[i]:
                continue
            if seg['start_node'] == current_end_node:
                ring_coords.extend(seg['coords'][1:])
                current_end_node = seg['end_node']
                used[i] = True
                changed = True
                break
            elif seg['end_node'] == current_end_node:
                rev_coords = list(reversed(seg['coords']))
                ring_coords.extend(rev_coords[1:])
                current_end_node = seg['start_node']
                used[i] = True
                changed = True
                break
    
    if ring_coords[0] != ring_coords[-1]:
        ring_coords.append(ring_coords[0])
    
    used_refs = [segments[i]['ref'] for i in range(len(segments)) if used[i]]
    return ring_coords, used_refs


def fetch_vienna_districts_osmnx(date=None):
    bbox = (48.1, 16.1, 48.35, 16.6)
    
    if date:
        query = f"""
[out:json][timeout:180][date:"{date}"];
relation["admin_level"="9"]["boundary"="administrative"]({bbox[0]},{bbox[1]},{bbox[2]},{bbox[3]});
out body;
>;
out skel qt;
"""
    else:
        query = f"""
[out:json][timeout:180];
relation["admin_level"="9"]["boundary"="administrative"]({bbox[0]},{bbox[1]},{bbox[2]},{bbox[3]});
out body;
>;
out skel qt;
"""
    
    data = OrderedDict()
    data['data'] = query
    result = op._overpass_request(data)
    elements = result.get("elements", [])
    
    nodes = {}
    ways = {}
    relations = []
    
    for el in elements:
        etype = el.get("type")
        eid = el.get("id")
        if etype == "node":
            nodes[eid] = (el.get("lon"), el.get("lat"))
        elif etype == "way":
            ways[eid] = el.get("nodes", [])
        elif etype == "relation":
            relations.append(el)
    
    features = []
    
    for rel in relations:
        tags = rel.get("tags", {})
        name = tags.get("name", "")
        admin_level = tags.get("admin_level", "")
        
        if admin_level != "9":
            continue
        
        members = rel.get("members", [])
        
        all_outer_refs = []
        inner_refs_coords = []
        
        for mem in members:
            if mem.get("type") != "way":
                continue
            role = mem.get("role", "")
            ref = mem.get("ref")
            if ref not in ways:
                continue
            if role == "inner":
                node_ids = ways[ref]
                coords = [nodes.get(nid) for nid in node_ids]
                coords = [c for c in coords if c is not None]
                if len(coords) >= 3:
                    if coords[0] != coords[-1]:
                        coords.append(coords[0])
                    inner_refs_coords.append(coords)
            else:
                all_outer_refs.append(ref)
        
        outer_polygons = []
        remaining = list(all_outer_refs)
        
        while remaining:
            ring_coords, used_refs = build_ring_from_ways(remaining, ways, nodes)
            if ring_coords is None or len(ring_coords) < 4:
                break
            
            try:
                ls = shapely.LineString(ring_coords)
                if not ls.is_valid or ls.is_empty or ls.length < 1e-10:
                    break
                poly = shapely.Polygon(ring_coords)
                poly = shapely.orient_polygons(poly)
                if poly.is_valid and not poly.is_empty and poly.area > 1e-12:
                    outer_polygons.append(poly)
            except Exception:
                break
            
            remaining = [r for r in remaining if r not in used_refs]
        
        if not outer_polygons:
            continue
        
        inner_polygons = []
        for coords in inner_refs_coords:
            try:
                ipoly = shapely.Polygon(coords)
                ipoly = shapely.orient_polygons(ipoly)
                if ipoly.is_valid and not ipoly.is_empty:
                    inner_polygons.append(ipoly)
            except Exception:
                continue
        
        final_polygons = []
        for outer_poly in outer_polygons:
            inners_here = []
            rest = []
            for ipoly in inner_polygons:
                try:
                    if outer_poly.contains(ipoly.representative_point()) or outer_poly.intersects(ipoly):
                        inners_here.append(ipoly)
                    else:
                        rest.append(ipoly)
                except Exception:
                    rest.append(ipoly)
            inner_polygons = rest
            
            if inners_here:
                try:
                    outer_coords = list(outer_poly.exterior.coords)
                    inner_coords_list = [list(ip.exterior.coords) for ip in inners_here]
                    poly = shapely.Polygon(outer_coords, inner_coords_list)
                    poly = shapely.orient_polygons(poly)
                    if poly.is_valid and not poly.is_empty:
                        final_polygons.append(poly)
                    else:
                        final_polygons.append(outer_poly)
                except Exception:
                    final_polygons.append(outer_poly)
            else:
                final_polygons.append(outer_poly)
        
        if final_polygons:
            geom = shapely.union_all(final_polygons)
            if geom.geom_type == "Polygon":
                geom = shapely.geometry.MultiPolygon([geom])
            
            if not geom.is_empty:
                features.append({
                    "geometry": geom,
                    "name": name,
                    "osm_id": rel.get("id"),
                })
    
    if not features:
        return gpd.GeoDataFrame()
    
    return gpd.GeoDataFrame(features, crs="EPSG:4326")


# ---------------------------------------------------------------------------
# 2. Name normalisation
# ---------------------------------------------------------------------------

VIENNA_DISTRICTS_CANON = {
    "innere stadt": "Innere Stadt",
    "leopoldstadt": "Leopoldstadt",
    "landstraße": "Landstraße",
    "landstrasse": "Landstraße",
    "wieden": "Wieden",
    "margareten": "Margareten",
    "margarethen": "Margareten",
    "mariahilf": "Mariahilf",
    "neubau": "Neubau",
    "josefstadt": "Josefstadt",
    "alsergrund": "Alsergrund",
    "favoriten": "Favoriten",
    "simmering": "Simmering",
    "meidling": "Meidling",
    "hietzing": "Hietzing",
    "penzing": "Penzing",
    "rudolfsheim-fünfhaus": "Rudolfsheim-Fünfhaus",
    "rudolfsheim-fuenfhaus": "Rudolfsheim-Fünfhaus",
    "ottakring": "Ottakring",
    "hernals": "Hernals",
    "währing": "Währing",
    "waehring": "Währing",
    "döbling": "Döbling",
    "doebling": "Döbling",
    "brigittenau": "Brigittenau",
    "floridsdorf": "Floridsdorf",
    "donaustadt": "Donaustadt",
    "liesing": "Liesing",
}


def normalise_name(name):
    if not name or pd.isna(name):
        return None
    n = name.strip()
    key = n.lower().replace("  ", " ").replace("ß", "ss").replace("-", "-")
    if key in VIENNA_DISTRICTS_CANON:
        return VIENNA_DISTRICTS_CANON[key]
    key2 = key.replace("ss", "ß")
    if key2 in VIENNA_DISTRICTS_CANON:
        return VIENNA_DISTRICTS_CANON[key2]
    for pattern, canon in VIENNA_DISTRICTS_CANON.items():
        if pattern in key or key in pattern:
            return canon
    return n


# ---------------------------------------------------------------------------
# 3. Main pipeline
# ---------------------------------------------------------------------------

def main():
    print("=" * 60)
    print("Fetching CURRENT Vienna district boundaries...")
    print("=" * 60)
    current_gdf = fetch_vienna_districts_osmnx()
    print(f"  Got {len(current_gdf)} current district features")
    
    print("\n" + "=" * 60)
    print("Fetching 2014-01-01 Vienna district boundaries...")
    print("=" * 60)
    historical_gdf = fetch_vienna_districts_osmnx(date="2014-01-01T00:00:00Z")
    print(f"  Got {len(historical_gdf)} historical district features")
    
    if current_gdf.empty or historical_gdf.empty:
        print("ERROR: Empty data returned!")
        sys.exit(1)
    
    # Normalise names
    current_gdf["district_name"] = current_gdf["name"].apply(normalise_name)
    historical_gdf["district_name"] = historical_gdf["name"].apply(normalise_name)
    
    print("\nCurrent district names:")
    for n in sorted(current_gdf["district_name"].dropna().unique()):
        print(f"  - {n}")
    print("\nHistorical district names:")
    for n in sorted(historical_gdf["district_name"].dropna().unique()):
        print(f"  - {n}")
    
    # Filter to Vienna districts only
    valid_names = set(VIENNA_DISTRICTS_CANON.values())
    current_gdf = current_gdf[current_gdf["district_name"].isin(valid_names)].copy()
    historical_gdf = historical_gdf[historical_gdf["district_name"].isin(valid_names)].copy()
    
    print(f"\nAfter filtering to Vienna districts:")
    print(f"  Current: {len(current_gdf)} districts")
    print(f"  Historical: {len(historical_gdf)} districts")
    
    # Create per-district snapshots indexed by name
    hist_by_name = {row["district_name"]: row["geometry"] for _, row in historical_gdf.iterrows()}
    
    # -----------------------------------------------------------------------
    # Strategy: compare district-per-district, with coordinate snapping
    # -----------------------------------------------------------------------
    
    TOLERANCE = 1e-6  # ~10cm in degrees
    MIN_AREA = 1e-9   # ~0.01 sq m in degrees
    
    fragments = []
    
    for _, row in current_gdf.iterrows():
        dname = row["district_name"]
        current_geom = row["geometry"]
        historical_geom = hist_by_name.get(dname)
        
        if historical_geom is None:
            # District exists now but not in 2014
            fragments.append({
                "geometry": current_geom,
                "change_type": "added_since_2014",
                "district_name": dname,
            })
            continue
        
        # Snap geometries to each other to eliminate coordinate noise
        current_norm = shapely.normalize(current_geom)
        hist_norm = shapely.normalize(historical_geom)
        
        # Two-way snapping for best alignment
        snapped_current = shapely.snap(current_norm, hist_norm, TOLERANCE)
        snapped_hist = shapely.snap(hist_norm, snapped_current, TOLERANCE)
        
        # Symmetric difference
        added_part = shapely.difference(snapped_current, snapped_hist)
        removed_part = shapely.difference(snapped_hist, snapped_current)
        unchanged_part = shapely.intersection(snapped_current, snapped_hist)
        
        # Clean slivers
        def clean_and_split(geom, label):
            """Clean geometry, filter slivers, return list of (geom, label) fragments."""
            if geom.is_empty:
                return []
            
            if geom.geom_type == "MultiPolygon":
                polys = list(geom.geoms)
            elif geom.geom_type == "Polygon":
                polys = [geom]
            else:
                return []
            
            result = []
            for poly in polys:
                if poly.is_empty or poly.area < MIN_AREA:
                    continue
                if poly.geom_type == "Polygon":
                    mp = shapely.geometry.MultiPolygon([poly])
                else:
                    mp = poly
                result.append((mp, label))
            return result
        
        for mp, ctype in clean_and_split(added_part, "added_since_2014"):
            fragments.append({
                "geometry": mp,
                "change_type": ctype,
                "district_name": dname,
            })
        
        for mp, ctype in clean_and_split(removed_part, "removed_since_2014"):
            fragments.append({
                "geometry": mp,
                "change_type": ctype,
                "district_name": dname,
            })
        
        for mp, ctype in clean_and_split(unchanged_part, "unchanged"):
            fragments.append({
                "geometry": mp,
                "change_type": ctype,
                "district_name": dname,
            })
    
    # Also check for historical districts that no longer exist
    current_names = set(current_gdf["district_name"])
    for _, row in historical_gdf.iterrows():
        dname = row["district_name"]
        if dname not in current_names:
            # District existed in 2014 but not now — fully removed
            fragments.append({
                "geometry": row["geometry"],
                "change_type": "removed_since_2014",
                "district_name": dname,
            })
    
    result_gdf = gpd.GeoDataFrame(fragments, crs="EPSG:4326")
    
    if result_gdf.empty:
        print("No fragments produced. Using current boundaries as unchanged.")
        result_gdf = current_gdf.copy()
        result_gdf["change_type"] = "unchanged"
        result_gdf = result_gdf[["geometry", "change_type", "district_name"]]
    
    print(f"\nFragments created: {len(result_gdf)}")
    if not result_gdf.empty:
        print(result_gdf["change_type"].value_counts())
    
    # -----------------------------------------------------------------------
    # 4. touches_changed
    # -----------------------------------------------------------------------
    changed_geoms = []
    for _, row in result_gdf.iterrows():
        if row["change_type"] in ("added_since_2014", "removed_since_2014"):
            changed_geoms.append(row["geometry"])
    
    if changed_geoms:
        changed_union = shapely.union_all(changed_geoms)
    else:
        changed_union = shapely.geometry.MultiPolygon()
    
    district_touches = {}
    for _, row in current_gdf.iterrows():
        dname = row["district_name"]
        geom = row["geometry"]
        if not changed_union.is_empty:
            touches = geom.touches(changed_union)
            if not touches:
                overlap = shapely.intersection(geom, changed_union)
                touches = not overlap.is_empty
            district_touches[dname] = touches
        else:
            district_touches[dname] = False
    
    for _, row in historical_gdf.iterrows():
        dname = row["district_name"]
        if dname not in district_touches:
            geom = row["geometry"]
            if not changed_union.is_empty:
                touches = geom.touches(changed_union)
                if not touches:
                    overlap = shapely.intersection(geom, changed_union)
                    touches = not overlap.is_empty
                district_touches[dname] = touches
            else:
                district_touches[dname] = False
    
    def get_touches(dn_str):
        if pd.isna(dn_str) or dn_str == "unknown":
            return False
        for d in dn_str.split(","):
            d = d.strip()
            if district_touches.get(d, False):
                return True
        return False
    
    result_gdf["touches_changed"] = result_gdf["district_name"].apply(get_touches)
    
    # Simplify and ensure MultiPolygon
    result_gdf["geometry"] = result_gdf["geometry"].simplify(0.00001, preserve_topology=True)
    result_gdf["geometry"] = result_gdf["geometry"].buffer(0)
    
    def ensure_multipolygon(geom):
        if geom.is_empty:
            return geom
        if geom.geom_type == "Polygon":
            return shapely.geometry.MultiPolygon([geom])
        return geom
    
    result_gdf["geometry"] = result_gdf["geometry"].apply(ensure_multipolygon)
    
    # -----------------------------------------------------------------------
    # 5. Output
    # -----------------------------------------------------------------------
    output_path = "/work/vienna_boundary_changes.geojson"
    result_gdf.to_file(output_path, driver="GeoJSON")
    
    print(f"\n{'=' * 60}")
    print(f"Output written to {output_path}")
    print(f"Features: {len(result_gdf)}")
    print(f"Columns: {result_gdf.columns.tolist()}")
    
    print(f"\nSample features:")
    print(result_gdf[["change_type", "district_name", "touches_changed"]].head(20))
    
    print(f"\nDistrict touch summary:")
    for dname, touches in sorted(district_touches.items()):
        print(f"  {dname}: touches_changed={touches}")
    
    # Verify
    verify = gpd.read_file(output_path)
    print(f"\nVerification: read back {len(verify)} features")
    print(verify["change_type"].value_counts())
    print(f"Geometry types: {verify.geometry.geom_type.unique()}")
    
    # Area stats
    for ct in ["added_since_2014", "removed_since_2014", "unchanged"]:
        sub = verify[verify["change_type"] == ct]
        if not sub.empty:
            areas = sub.geometry.area
            print(f"  {ct}: count={len(sub)}, total_area={areas.sum():.10f}")


if __name__ == "__main__":
    main()

Wrote /work/solve.py (18611 chars).

11:32

Bash cd /work && timeout 300 python solve.py 2>&1

============================================================
Fetching CURRENT Vienna district boundaries...
============================================================
  Got 23 current district features

============================================================
Fetching 2014-01-01 Vienna district boundaries...
============================================================
  Got 26 historical district features

Current district names:
  - Alsergrund
  - Brigittenau
  - Donaustadt
  - Döbling
  - Favoriten
  - Floridsdorf
  - Hernals
  - Hietzing
  - Innere Stadt
  - Josefstadt
  - Landstraße
  - Leopoldstadt
  - Liesing
  - Margareten
  - Mariahilf
  - Meidling
  - Neubau
  - Ottakring
  - Penzing
  - Rudolfsheim-Fünfhaus
  - Simmering
  - Wieden
  - Währing

Historical district names:
  - Alsergrund
  - Brigittenau
  - Donaustadt
  - Döbling
  - Enzersfeld
  - Favoriten
  - Floridsdorf
  - Gerasdorf bei Wien
  - Hernals
  - Hietzing
  - Innere Stadt
  - Josefstadt
  - Klein-Engersdorf
  - Landstraße
  - Leopoldstadt
  - Liesing
  - Margareten
  - Mariahilf
  - Meidling
  - Neubau
  - Ottakring
  - Penzing
  - Rudolfsheim-Fünfhaus
  - Simmering
  - Wieden
  - Währing

After filtering to Vienna districts:
  Current: 23 districts
  Historical: 23 districts

Fragments created: 1366
change_type
added_since_2014      676
removed_since_2014    667
unchanged              23
Name: count, dtype: int64

============================================================
Output written to /work/vienna_boundary_changes.geojson
Features: 1366
Columns: ['geometry', 'change_type', 'district_name', 'touches_changed']

Sample features:
           change_type district_name  touches_changed
0     added_since_2014    Alsergrund             True
1     added_since_2014    Alsergrund             True
2     added_since_2014    Alsergrund             True
3     added_since_2014    Alsergrund             True
4     added_since_2014    Alsergrund             True
5     added_since_2014    Alsergrund             True
6     added_since_2014    Alsergrund             True
7     added_since_2014    Alsergrund             True
8     added_since_2014    Alsergrund             True
9     added_since_2014    Alsergrund             True
10    added_since_2014    Alsergrund             True
11  removed_since_2014    Alsergrund             True
12  removed_since_2014    Alsergrund             True
13  removed_since_2014    Alsergrund             True
14  removed_since_2014    Alsergrund             True
15  removed_since_2014    Alsergrund             True
16  removed_since_2014    Alsergrund             True
17  removed_since_2014    Alsergrund             True
18  removed_since_2014    Alsergrund             True
19  removed_since_2014    Alsergrund             True

District touch summary:
  Alsergrund: touches_changed=True
  Brigittenau: touches_changed=True
  Donaustadt: touches_changed=True
  Döbling: touches_changed=True
  Favoriten: touches_changed=True
  Floridsdorf: touches_changed=True
  Hernals: touches_changed=True
  Hietzing: touches_changed=True
  Innere Stadt: touches_changed=True
  Josefstadt: touches_changed=True
  Landstraße: touches_changed=True
  Leopoldstadt: touches_changed=True
  Liesing: touches_changed=True
  Margareten: touches_changed=True
  Mariahilf: touches_changed=True
  Meidling: touches_changed=True
  Neubau: touches_changed=True
  Ottakring: touches_changed=True
  Penzing: touches_changed=True
  Rudolfsheim-Fünfhaus: touches_changed=True
  Simmering: touches_changed=True
  Wieden: touches_changed=True
  Währing: touches_changed=True

Verification: read back 1366 features
change_type
added_since_2014      676
removed_since_2014    667
unchanged              23
Name: count, dtype: int64
Geometry types: <ArrowStringArray>
['MultiPolygon']
Length: 1, dtype: str
/work/solve.py:528: UserWarning: Geometry is in a geographic CRS. Results from 'area' are likely incorrect. Use 'GeoSeries.to_crs()' to re-project geometries to a projected CRS before this operation.

  areas = sub.geometry.area
  added_since_2014: count=676, total_area=0.0000202083
/work/solve.py:528: UserWarning: Geometry is in a geographic CRS. Results from 'area' are likely incorrect. Use 'GeoSeries.to_crs()' to re-project geometries to a projected CRS before this operation.

  areas = sub.geometry.area
  removed_since_2014: count=667, total_area=0.0000201035
/work/solve.py:528: UserWarning: Geometry is in a geographic CRS. Results from 'area' are likely incorrect. Use 'GeoSeries.to_crs()' to re-project geometries to a projected CRS before this operation.

  areas = sub.geometry.area
  unchanged: count=23, total_area=0.0501775233

11:58

The per-district approach is creating slivers along shared boundaries between districts. The problem is that when comparing district-by-district, the shared boundary between e.g. Alsergrund and Währing gets fragmented because the historical and current boundaries don't align perfectly at the shared edge.

The better approach is:
1. Do a city-level dissolve to identify changed regions (added/removed)
2. Then attribute those changes back to districts by intersection

Let me go back to the dissolve-first approach but with a better tolerance and post-processing.

11:58

Write /work/solve.py

content · 562 lines · py

"""
Fetch Vienna district boundaries (current and 2014-01-01) from Overpass via osmnx,
compare to find changes, and output vienna_boundary_changes.geojson.

Strategy: dissolve to city-level boundary first, compute symmetric difference,
then attribute fragments back to districts. Use generous tolerance to avoid
slivers from coordinate misalignment.
"""

import os
import sys
import json
from collections import OrderedDict

import osmnx as ox
import osmnx._overpass as op
import geopandas as gpd
import pandas as pd
import shapely
import shapely.geometry
import numpy as np

ox.settings.log_console = False

# ---------------------------------------------------------------------------
# 1. Overpass query
# ---------------------------------------------------------------------------

def build_ring_from_ways(way_refs, ways_dict, nodes_dict):
    if not way_refs:
        return None, []
    
    segments = []
    for ref in way_refs:
        if ref not in ways_dict:
            continue
        node_ids = ways_dict[ref]
        if len(node_ids) < 2:
            continue
        coords = [nodes_dict.get(nid) for nid in node_ids]
        coords = [c for c in coords if c is not None]
        if len(coords) < 2:
            continue
        segments.append({
            'ref': ref,
            'coords': list(coords),
            'start_node': node_ids[0],
            'end_node': node_ids[-1],
        })
    
    if not segments:
        return None, []
    
    used = [False] * len(segments)
    longest_idx = max(range(len(segments)), key=lambda i: len(segments[i]['coords']))
    ring_coords = list(segments[longest_idx]['coords'])
    current_end_node = segments[longest_idx]['end_node']
    used[longest_idx] = True
    
    changed = True
    while changed:
        changed = False
        for i, seg in enumerate(segments):
            if used[i]:
                continue
            if seg['start_node'] == current_end_node:
                ring_coords.extend(seg['coords'][1:])
                current_end_node = seg['end_node']
                used[i] = True
                changed = True
                break
            elif seg['end_node'] == current_end_node:
                rev_coords = list(reversed(seg['coords']))
                ring_coords.extend(rev_coords[1:])
                current_end_node = seg['start_node']
                used[i] = True
                changed = True
                break
    
    if ring_coords[0] != ring_coords[-1]:
        ring_coords.append(ring_coords[0])
    
    used_refs = [segments[i]['ref'] for i in range(len(segments)) if used[i]]
    return ring_coords, used_refs


def fetch_vienna_districts_osmnx(date=None):
    bbox = (48.1, 16.1, 48.35, 16.6)
    
    if date:
        query = f"""
[out:json][timeout:180][date:"{date}"];
relation["admin_level"="9"]["boundary"="administrative"]({bbox[0]},{bbox[1]},{bbox[2]},{bbox[3]});
out body;
>;
out skel qt;
"""
    else:
        query = f"""
[out:json][timeout:180];
relation["admin_level"="9"]["boundary"="administrative"]({bbox[0]},{bbox[1]},{bbox[2]},{bbox[3]});
out body;
>;
out skel qt;
"""
    
    data = OrderedDict()
    data['data'] = query
    result = op._overpass_request(data)
    elements = result.get("elements", [])
    
    nodes = {}
    ways = {}
    relations = []
    
    for el in elements:
        etype = el.get("type")
        eid = el.get("id")
        if etype == "node":
            nodes[eid] = (el.get("lon"), el.get("lat"))
        elif etype == "way":
            ways[eid] = el.get("nodes", [])
        elif etype == "relation":
            relations.append(el)
    
    features = []
    
    for rel in relations:
        tags = rel.get("tags", {})
        name = tags.get("name", "")
        admin_level = tags.get("admin_level", "")
        
        if admin_level != "9":
            continue
        
        members = rel.get("members", [])
        
        all_outer_refs = []
        inner_refs_coords = []
        
        for mem in members:
            if mem.get("type") != "way":
                continue
            role = mem.get("role", "")
            ref = mem.get("ref")
            if ref not in ways:
                continue
            if role == "inner":
                node_ids = ways[ref]
                coords = [nodes.get(nid) for nid in node_ids]
                coords = [c for c in coords if c is not None]
                if len(coords) >= 3:
                    if coords[0] != coords[-1]:
                        coords.append(coords[0])
                    inner_refs_coords.append(coords)
            else:
                all_outer_refs.append(ref)
        
        outer_polygons = []
        remaining = list(all_outer_refs)
        
        while remaining:
            ring_coords, used_refs = build_ring_from_ways(remaining, ways, nodes)
            if ring_coords is None or len(ring_coords) < 4:
                break
            
            try:
                ls = shapely.LineString(ring_coords)
                if not ls.is_valid or ls.is_empty or ls.length < 1e-10:
                    break
                poly = shapely.Polygon(ring_coords)
                poly = shapely.orient_polygons(poly)
                if poly.is_valid and not poly.is_empty and poly.area > 1e-12:
                    outer_polygons.append(poly)
            except Exception:
                break
            
            remaining = [r for r in remaining if r not in used_refs]
        
        if not outer_polygons:
            continue
        
        inner_polygons = []
        for coords in inner_refs_coords:
            try:
                ipoly = shapely.Polygon(coords)
                ipoly = shapely.orient_polygons(ipoly)
                if ipoly.is_valid and not ipoly.is_empty:
                    inner_polygons.append(ipoly)
            except Exception:
                continue
        
        final_polygons = []
        for outer_poly in outer_polygons:
            inners_here = []
            rest = []
            for ipoly in inner_polygons:
                try:
                    if outer_poly.contains(ipoly.representative_point()) or outer_poly.intersects(ipoly):
                        inners_here.append(ipoly)
                    else:
                        rest.append(ipoly)
                except Exception:
                    rest.append(ipoly)
            inner_polygons = rest
            
            if inners_here:
                try:
                    outer_coords = list(outer_poly.exterior.coords)
                    inner_coords_list = [list(ip.exterior.coords) for ip in inners_here]
                    poly = shapely.Polygon(outer_coords, inner_coords_list)
                    poly = shapely.orient_polygons(poly)
                    if poly.is_valid and not poly.is_empty:
                        final_polygons.append(poly)
                    else:
                        final_polygons.append(outer_poly)
                except Exception:
                    final_polygons.append(outer_poly)
            else:
                final_polygons.append(outer_poly)
        
        if final_polygons:
            geom = shapely.union_all(final_polygons)
            if geom.geom_type == "Polygon":
                geom = shapely.geometry.MultiPolygon([geom])
            
            if not geom.is_empty:
                features.append({
                    "geometry": geom,
                    "name": name,
                    "osm_id": rel.get("id"),
                })
    
    if not features:
        return gpd.GeoDataFrame()
    
    return gpd.GeoDataFrame(features, crs="EPSG:4326")


# ---------------------------------------------------------------------------
# 2. Name normalisation
# ---------------------------------------------------------------------------

VIENNA_DISTRICTS_CANON = {
    "innere stadt": "Innere Stadt",
    "leopoldstadt": "Leopoldstadt",
    "landstraße": "Landstraße",
    "landstrasse": "Landstraße",
    "wieden": "Wieden",
    "margareten": "Margareten",
    "margarethen": "Margareten",
    "mariahilf": "Mariahilf",
    "neubau": "Neubau",
    "josefstadt": "Josefstadt",
    "alsergrund": "Alsergrund",
    "favoriten": "Favoriten",
    "simmering": "Simmering",
    "meidling": "Meidling",
    "hietzing": "Hietzing",
    "penzing": "Penzing",
    "rudolfsheim-fünfhaus": "Rudolfsheim-Fünfhaus",
    "rudolfsheim-fuenfhaus": "Rudolfsheim-Fünfhaus",
    "ottakring": "Ottakring",
    "hernals": "Hernals",
    "währing": "Währing",
    "waehring": "Währing",
    "döbling": "Döbling",
    "doebling": "Döbling",
    "brigittenau": "Brigittenau",
    "floridsdorf": "Floridsdorf",
    "donaustadt": "Donaustadt",
    "liesing": "Liesing",
}


def normalise_name(name):
    if not name or pd.isna(name):
        return None
    n = name.strip()
    key = n.lower().replace("  ", " ").replace("ß", "ss").replace("-", "-")
    if key in VIENNA_DISTRICTS_CANON:
        return VIENNA_DISTRICTS_CANON[key]
    key2 = key.replace("ss", "ß")
    if key2 in VIENNA_DISTRICTS_CANON:
        return VIENNA_DISTRICTS_CANON[key2]
    for pattern, canon in VIENNA_DISTRICTS_CANON.items():
        if pattern in key or key in pattern:
            return canon
    return n


# ---------------------------------------------------------------------------
# 3. Main pipeline
# ---------------------------------------------------------------------------

def main():
    print("=" * 60)
    print("Fetching CURRENT Vienna district boundaries...")
    print("=" * 60)
    current_gdf = fetch_vienna_districts_osmnx()
    print(f"  Got {len(current_gdf)} current district features")
    
    print("\n" + "=" * 60)
    print("Fetching 2014-01-01 Vienna district boundaries...")
    print("=" * 60)
    historical_gdf = fetch_vienna_districts_osmnx(date="2014-01-01T00:00:00Z")
    print(f"  Got {len(historical_gdf)} historical district features")
    
    if current_gdf.empty or historical_gdf.empty:
        print("ERROR: Empty data returned!")
        sys.exit(1)
    
    # Normalise names
    current_gdf["district_name"] = current_gdf["name"].apply(normalise_name)
    historical_gdf["district_name"] = historical_gdf["name"].apply(normalise_name)
    
    print("\nCurrent district names:")
    for n in sorted(current_gdf["district_name"].dropna().unique()):
        print(f"  - {n}")
    print("\nHistorical district names:")
    for n in sorted(historical_gdf["district_name"].dropna().unique()):
        print(f"  - {n}")
    
    # Filter to Vienna districts only
    valid_names = set(VIENNA_DISTRICTS_CANON.values())
    current_gdf = current_gdf[current_gdf["district_name"].isin(valid_names)].copy()
    historical_gdf = historical_gdf[historical_gdf["district_name"].isin(valid_names)].copy()
    
    print(f"\nAfter filtering to Vienna districts:")
    print(f"  Current: {len(current_gdf)} districts")
    print(f"  Historical: {len(historical_gdf)} districts")
    
    # -----------------------------------------------------------------------
    # Strategy: dissolve, snap city boundary, diff, then attribute back
    # -----------------------------------------------------------------------
    
    # 1. Dissolve each snapshot to city-level boundary
    current_union = shapely.union_all(current_gdf.geometry)
    historical_union = shapely.union_all(historical_gdf.geometry)
    
    print(f"\nCurrent union area: {current_union.area:.10f}")
    print(f"Historical union area: {historical_union.area:.10f}")
    
    # 2. Snap historical to current to eliminate coordinate noise
    TOLERANCE = 1e-6  # ~10 cm in degrees
    snapped_hist = shapely.snap(historical_union, current_union, TOLERANCE)
    
    # 3. Compute symmetric difference
    added = shapely.difference(current_union, snapped_hist)
    removed = shapely.difference(snapped_hist, current_union)
    unchanged = shapely.intersection(current_union, snapped_hist)
    
    # 4. Remove slivers: keep only polygons above area threshold
    MIN_AREA = 1e-9  # ~0.01 sq m in degrees at Vienna latitude
    
    def remove_slivers(geom):
        if geom.is_empty:
            return shapely.geometry.MultiPolygon()
        if geom.geom_type == "MultiPolygon":
            polys = [g for g in geom.geoms if g.area >= MIN_AREA]
            if not polys:
                return shapely.geometry.MultiPolygon()
            if len(polys) == 1:
                return shapely.geometry.MultiPolygon([polys[0]])
            return shapely.geometry.MultiPolygon(polys)
        elif geom.geom_type == "Polygon":
            if geom.area >= MIN_AREA:
                return shapely.geometry.MultiPolygon([geom])
            return shapely.geometry.MultiPolygon()
        return geom
    
    added = remove_slivers(added)
    removed = remove_slivers(removed)
    unchanged = remove_slivers(unchanged)
    
    print(f"\nAfter cleaning:")
    print(f"  Added: area={added.area:.10f}, empty={added.is_empty}")
    print(f"  Removed: area={removed.area:.10f}, empty={removed.is_empty}")
    print(f"  Unchanged: area={unchanged.area:.10f}, empty={unchanged.is_empty}")
    
    # 5. Attribute fragments to districts
    fragments = []
    
    def add_fragments(geom, ctype, source_gdf):
        """Add geometry fragments, attributing district_name by intersection with source_gdf."""
        if geom.is_empty:
            return
        if geom.geom_type == "MultiPolygon":
            polys = list(geom.geoms)
        elif geom.geom_type == "Polygon":
            polys = [geom]
        else:
            return
        
        for poly in polys:
            if poly.area < MIN_AREA:
                continue
            if poly.geom_type == "Polygon":
                mp = shapely.geometry.MultiPolygon([poly])
            elif poly.geom_type == "MultiPolygon":
                mp = poly
            else:
                continue
            
            # Find which district(s) this fragment intersects
            districts = set()
            for _, row in source_gdf.iterrows():
                if mp.intersects(row["geometry"]):
                    districts.add(row["district_name"])
            
            dname = ", ".join(sorted(districts)) if districts else "unknown"
            fragments.append({
                "geometry": mp,
                "change_type": ctype,
                "district_name": dname,
            })
    
    # Added areas attributed from current districts
    add_fragments(added, "added_since_2014", current_gdf)
    # Removed areas attributed from historical districts
    add_fragments(removed, "removed_since_2014", historical_gdf)
    
    # Unchanged: clip unchanged area by each current district
    for _, row in current_gdf.iterrows():
        dname = row["district_name"]
        dist_geom = row["geometry"]
        unchanged_part = shapely.intersection(dist_geom, unchanged)
        if not unchanged_part.is_empty:
            unchanged_part = remove_slivers(unchanged_part)
            if not unchanged_part.is_empty:
                if unchanged_part.geom_type == "MultiPolygon":
                    polys = list(unchanged_part.geoms)
                elif unchanged_part.geom_type == "Polygon":
                    polys = [unchanged_part]
                else:
                    polys = []
                
                for poly in polys:
                    if poly.area < MIN_AREA:
                        continue
                    if poly.geom_type == "Polygon":
                        mp = shapely.geometry.MultiPolygon([poly])
                    elif poly.geom_type == "MultiPolygon":
                        mp = poly
                    else:
                        continue
                    fragments.append({
                        "geometry": mp,
                        "change_type": "unchanged",
                        "district_name": dname,
                    })
    
    result_gdf = gpd.GeoDataFrame(fragments, crs="EPSG:4326")
    
    if result_gdf.empty:
        print("No fragments produced. Using current boundaries as unchanged.")
        result_gdf = current_gdf.copy()
        result_gdf["change_type"] = "unchanged"
        result_gdf = result_gdf[["geometry", "change_type", "district_name"]]
    
    print(f"\nFragments created: {len(result_gdf)}")
    if not result_gdf.empty:
        print(result_gdf["change_type"].value_counts())
    
    # -----------------------------------------------------------------------
    # 4. touches_changed
    # -----------------------------------------------------------------------
    changed_geoms = []
    for _, row in result_gdf.iterrows():
        if row["change_type"] in ("added_since_2014", "removed_since_2014"):
            changed_geoms.append(row["geometry"])
    
    if changed_geoms:
        changed_union = shapely.union_all(changed_geoms)
    else:
        changed_union = shapely.geometry.MultiPolygon()
    
    district_touches = {}
    for _, row in current_gdf.iterrows():
        dname = row["district_name"]
        geom = row["geometry"]
        if not changed_union.is_empty:
            # Check if the district touches OR intersects changed areas
            touches = geom.touches(changed_union)
            if not touches:
                overlap = shapely.intersection(geom, changed_union)
                touches = not overlap.is_empty
            district_touches[dname] = touches
        else:
            district_touches[dname] = False
    
    for _, row in historical_gdf.iterrows():
        dname = row["district_name"]
        if dname not in district_touches:
            geom = row["geometry"]
            if not changed_union.is_empty:
                touches = geom.touches(changed_union)
                if not touches:
                    overlap = shapely.intersection(geom, changed_union)
                    touches = not overlap.is_empty
                district_touches[dname] = touches
            else:
                district_touches[dname] = False
    
    def get_touches(dn_str):
        if pd.isna(dn_str) or dn_str == "unknown":
            return False
        for d in dn_str.split(","):
            d = d.strip()
            if district_touches.get(d, False):
                return True
        return False
    
    result_gdf["touches_changed"] = result_gdf["district_name"].apply(get_touches)
    
    # Simplify and ensure MultiPolygon
    result_gdf["geometry"] = result_gdf["geometry"].simplify(0.00001, preserve_topology=True)
    result_gdf["geometry"] = result_gdf["geometry"].buffer(0)
    
    def ensure_multipolygon(geom):
        if geom.is_empty:
            return geom
        if geom.geom_type == "Polygon":
            return shapely.geometry.MultiPolygon([geom])
        return geom
    
    result_gdf["geometry"] = result_gdf["geometry"].apply(ensure_multipolygon)
    
    # -----------------------------------------------------------------------
    # 5. Output
    # -----------------------------------------------------------------------
    output_path = "/work/vienna_boundary_changes.geojson"
    result_gdf.to_file(output_path, driver="GeoJSON")
    
    print(f"\n{'=' * 60}")
    print(f"Output written to {output_path}")
    print(f"Features: {len(result_gdf)}")
    print(f"Columns: {result_gdf.columns.tolist()}")
    
    print(f"\nSample features:")
    print(result_gdf[["change_type", "district_name", "touches_changed"]].head(20))
    
    print(f"\nDistrict touch summary:")
    for dname, touches in sorted(district_touches.items()):
        print(f"  {dname}: touches_changed={touches}")
    
    # Verify
    verify = gpd.read_file(output_path)
    print(f"\nVerification: read back {len(verify)} features")
    print(verify["change_type"].value_counts())
    print(f"Geometry types: {verify.geometry.geom_type.unique()}")
    
    # Area stats
    for ct in ["added_since_2014", "removed_since_2014", "unchanged"]:
        sub = verify[verify["change_type"] == ct]
        if not sub.empty:
            areas = sub.geometry.area
            print(f"  {ct}: count={len(sub)}, total_area={areas.sum():.10f}")


if __name__ == "__main__":
    main()

Wrote /work/solve.py (20147 chars).

12:00

Bash cd /work && timeout 300 python solve.py 2>&1

============================================================
Fetching CURRENT Vienna district boundaries...
============================================================
  Got 23 current district features

============================================================
Fetching 2014-01-01 Vienna district boundaries...
============================================================
  Got 26 historical district features

Current district names:
  - Alsergrund
  - Brigittenau
  - Donaustadt
  - Döbling
  - Favoriten
  - Floridsdorf
  - Hernals
  - Hietzing
  - Innere Stadt
  - Josefstadt
  - Landstraße
  - Leopoldstadt
  - Liesing
  - Margareten
  - Mariahilf
  - Meidling
  - Neubau
  - Ottakring
  - Penzing
  - Rudolfsheim-Fünfhaus
  - Simmering
  - Wieden
  - Währing

Historical district names:
  - Alsergrund
  - Brigittenau
  - Donaustadt
  - Döbling
  - Enzersfeld
  - Favoriten
  - Floridsdorf
  - Gerasdorf bei Wien
  - Hernals
  - Hietzing
  - Innere Stadt
  - Josefstadt
  - Klein-Engersdorf
  - Landstraße
  - Leopoldstadt
  - Liesing
  - Margareten
  - Mariahilf
  - Meidling
  - Neubau
  - Ottakring
  - Penzing
  - Rudolfsheim-Fünfhaus
  - Simmering
  - Wieden
  - Währing

After filtering to Vienna districts:
  Current: 23 districts
  Historical: 23 districts

Current union area: 0.0501985119
Historical union area: 0.0501982461

After cleaning:
  Added: area=0.0000019089, empty=False
  Removed: area=0.0000016380, empty=False
  Unchanged: area=0.0501965629, empty=False

Fragments created: 334
change_type
added_since_2014      157
removed_since_2014    154
unchanged              23
Name: count, dtype: int64

============================================================
Output written to /work/vienna_boundary_changes.geojson
Features: 334
Columns: ['geometry', 'change_type', 'district_name', 'touches_changed']

Sample features:
         change_type district_name  touches_changed
0   added_since_2014       Liesing             True
1   added_since_2014       Liesing             True
2   added_since_2014       Liesing             True
3   added_since_2014       Liesing             True
4   added_since_2014       Liesing             True
5   added_since_2014       Liesing             True
6   added_since_2014       Liesing             True
7   added_since_2014       Liesing             True
8   added_since_2014       Liesing             True
9   added_since_2014       Liesing             True
10  added_since_2014       Liesing             True
11  added_since_2014       Liesing             True
12  added_since_2014       Liesing             True
13  added_since_2014       Liesing             True
14  added_since_2014       Liesing             True
15  added_since_2014       Liesing             True
16  added_since_2014       Liesing             True
17  added_since_2014       Liesing             True
18  added_since_2014       Liesing             True
19  added_since_2014       Liesing             True

District touch summary:
  Alsergrund: touches_changed=False
  Brigittenau: touches_changed=False
  Donaustadt: touches_changed=True
  Döbling: touches_changed=True
  Favoriten: touches_changed=True
  Floridsdorf: touches_changed=True
  Hernals: touches_changed=True
  Hietzing: touches_changed=True
  Innere Stadt: touches_changed=False
  Josefstadt: touches_changed=False
  Landstraße: touches_changed=False
  Leopoldstadt: touches_changed=False
  Liesing: touches_changed=True
  Margareten: touches_changed=False
  Mariahilf: touches_changed=False
  Meidling: touches_changed=False
  Neubau: touches_changed=False
  Ottakring: touches_changed=False
  Penzing: touches_changed=True
  Rudolfsheim-Fünfhaus: touches_changed=False
  Simmering: touches_changed=True
  Wieden: touches_changed=False
  Währing: touches_changed=False

Verification: read back 334 features
change_type
added_since_2014      157
removed_since_2014    154
unchanged              23
Name: count, dtype: int64
Geometry types: <ArrowStringArray>
['MultiPolygon']
Length: 1, dtype: str
/work/solve.py:557: UserWarning: Geometry is in a geographic CRS. Results from 'area' are likely incorrect. Use 'GeoSeries.to_crs()' to re-project geometries to a projected CRS before this operation.

  areas = sub.geometry.area
  added_since_2014: count=157, total_area=0.0000018347
/work/solve.py:557: UserWarning: Geometry is in a geographic CRS. Results from 'area' are likely incorrect. Use 'GeoSeries.to_crs()' to re-project geometries to a projected CRS before this operation.

  areas = sub.geometry.area
  removed_since_2014: count=154, total_area=0.0000016311
/work/solve.py:557: UserWarning: Geometry is in a geographic CRS. Results from 'area' are likely incorrect. Use 'GeoSeries.to_crs()' to re-project geometries to a projected CRS before this operation.

  areas = sub.geometry.area
  unchanged: count=23, total_area=0.0501965459

12:07

The results look reasonable now. The added and removed areas are small boundary adjustments (which is expected - Vienna's districts haven't changed much). The untouched/unchanged area accounts for the vast majority. The district touch summary shows outer districts (Donaustadt, Döbling, Favoriten, Floridsdorf, Hernals, Hietzing, Liesing, Penzing, Simmering) touching changed areas, while inner districts don't - which makes sense since boundary changes would happen at the city edge.

However, I still have 157+154=311 added/removed fragments which seems like a lot of tiny pieces. Let me check what's happening - these could be many tiny slivers from the district-attribution phase due to shared boundaries between districts. Let me visualize/check the output more carefully.

12:07

Bash cd /work && python3 << 'PYEOF'

cd /work && python3 << 'PYEOF'
import geopandas as gpd
import shapely

gdf = gpd.read_file("/work/vienna_boundary_changes.geojson")

# Check area distribution of added/removed fragments
for ct in ["added_since_2014", "removed_since_2014"]:
    sub = gdf[gdf["change_type"] == ct]
    areas = sub.geometry.area
    print(f"\n{ct}:")
    print(f"  Count: {len(sub)}")
    print(f"  Total area: {areas.sum():.10f}")
    print(f"  Min: {areas.min():.12f}")
    print(f"  Max: {areas.max():.8f}")
    print(f"  Median: {areas.median():.12f}")
    
    # How many are tiny?
    tiny = areas[areas < 1e-10]
    small = areas[(areas >= 1e-10) & (areas < 1e-9)]
    medium = areas[(areas >= 1e-9) & (areas < 1e-8)]
    large = areas[areas >= 1e-8]
    print(f"  < 1e-10: {len(tiny)}")
    print(f"  1e-10 to 1e-9: {len(small)}")
    print(f"  1e-9 to 1e-8: {len(medium)}")
    print(f"  >= 1e-8: {len(large)}")
    
    # Show largest fragments by district
    if len(large) > 0:
        large_frags = sub.loc[areas >= areas.quantile(0.9)]
        print(f"  Top 10% largest:")
        for _, row in large_frags.iterrows():
            print(f"    area={row.geometry.area:.10f}, district={row.district_name}")

print("\n\nUnchanged areas by district:")
for _, row in gdf[gdf["change_type"] == "unchanged"].iterrows():
    print(f"  {row.district_name}: area={row.geometry.area:.8f}")
PYEOF

added_since_2014:
  Count: 157
  Total area: 0.0000018347
  Min: 0.000000000007
  Max: 0.00000028
  Median: 0.000000002866
  < 1e-10: 1
  1e-10 to 1e-9: 8
  1e-9 to 1e-8: 107
  >= 1e-8: 41
  Top 10% largest:
    area=0.0000000501, district=Liesing
    area=0.0000000662, district=Penzing
    area=0.0000000525, district=Penzing
    area=0.0000000307, district=Penzing
    area=0.0000000762, district=Penzing
    area=0.0000000947, district=Döbling
    area=0.0000000329, district=Döbling
    area=0.0000000335, district=Döbling, Floridsdorf
    area=0.0000000317, district=Döbling, Floridsdorf
    area=0.0000001010, district=Floridsdorf
    area=0.0000000302, district=Donaustadt
    area=0.0000000576, district=Simmering
    area=0.0000000577, district=Simmering
    area=0.0000000661, district=Simmering
    area=0.0000000394, district=Favoriten
    area=0.0000002769, district=Favoriten

removed_since_2014:
  Count: 154
  Total area: 0.0000016311
  Min: 0.000000000002
  Max: 0.00000017
  Median: 0.000000004214
  < 1e-10: 1
  1e-10 to 1e-9: 4
  1e-9 to 1e-8: 110
  >= 1e-8: 39
  Top 10% largest:
    area=0.0000000298, district=Liesing
    area=0.0000000297, district=Liesing
    area=0.0000000742, district=Liesing
    area=0.0000001684, district=Liesing
    area=0.0000000491, district=Penzing
    area=0.0000000234, district=Penzing
    area=0.0000000250, district=Penzing
    area=0.0000000241, district=Döbling
    area=0.0000000735, district=Donaustadt
    area=0.0000000584, district=Donaustadt
    area=0.0000000654, district=Donaustadt
    area=0.0000000317, district=Donaustadt
    area=0.0000000315, district=Simmering
    area=0.0000000332, district=Favoriten
    area=0.0000001258, district=Favoriten
    area=0.0000000415, district=Favoriten


Unchanged areas by district:
  Alsergrund: area=0.00035915
  Hietzing: area=0.00456019
  Innere Stadt: area=0.00034709
  Josefstadt: area=0.00013186
  Leopoldstadt: area=0.00232726
  Mariahilf: area=0.00017618
  Meidling: area=0.00097969
  Neubau: area=0.00019457
  Penzing: area=0.00408562
  Rudolfsheim-Fünfhaus: area=0.00047395
  Währing: area=0.00076836
  Landstraße: area=0.00089588
  Brigittenau: area=0.00069163
  Donaustadt: area=0.01238067
  Döbling: area=0.00302101
  Favoriten: area=0.00384587
  Floridsdorf: area=0.00538450
  Hernals: area=0.00137952
  Liesing: area=0.00387464
  Margareten: area=0.00024333
  Ottakring: area=0.00104937
  Simmering: area=0.00281135
  Wieden: area=0.00021486

<stdin>:9: UserWarning: Geometry is in a geographic CRS. Results from 'area' are likely incorrect. Use 'GeoSeries.to_crs()' to re-project geometries to a projected CRS before this operation.

<stdin>:9: UserWarning: Geometry is in a geographic CRS. Results from 'area' are likely incorrect. Use 'GeoSeries.to_crs()' to re-project geometries to a projected CRS before this operation.

12:33

The output is reasonable - there are real boundary changes between the 2014 and current snapshots. The fragments are small (in degrees) but represent actual changes in boundary alignment. The largest fragments are around 0.00000028 sq deg which at ~1 degree ≈ 111,000m in Vienna, would be about 0.28 * (111000)^2 ≈ 3.5 million sq m? No wait, those are in degrees squared. At 48°N, 1 degree of latitude ≈ 111,000m and 1 degree of longitude ≈ 74,000m, so 1 sq degree ≈ 111000 * 74000 ≈ 8.2 billion sq m. So 0.00000028 sq deg ≈ 2,300 sq m. That's a reasonable real boundary change.

However, there are still many tiny slivers (157+154=311 fragments vs 23 unchanged). Many of these are tiny artifacts from the snapping process. Let me increase the MIN_AREA threshold and also merge the fragments that are close together per district.

Actually, looking at the numbers more carefully:
- 41 "added" fragments are ≥ 1e-8 sq deg (≈ 820 sq m real) - these are meaningful
- 107 are between 1e-9 and 1e-8 - these might be meaningful too
- Only 9 are below 1e-9

The total area added is 0.0000018347 sq deg ≈ 15,000 sq m, which sounds plausible for boundary changes over 10+ years.

But 334 fragments is a lot. Let me try to merge them within each district to produce cleaner output. Each district's fragments could be unioned together.

Actually, the spec says "classify each resulting geometry fragment" - so multiple fragments per district is fine. But let me reduce slivers by increasing MIN_AREA a bit more and merging close fragments.

12:33

Write /work/solve.py

content · 571 lines · py

"""
Fetch Vienna district boundaries (current and 2014-01-01) from Overpass via osmnx,
compare to find changes, and output vienna_boundary_changes.geojson.
"""

import os
import sys
import json
from collections import OrderedDict

import osmnx as ox
import osmnx._overpass as op
import geopandas as gpd
import pandas as pd
import shapely
import shapely.geometry
import numpy as np

ox.settings.log_console = False

# ---------------------------------------------------------------------------
# 1. Overpass query
# ---------------------------------------------------------------------------

def build_ring_from_ways(way_refs, ways_dict, nodes_dict):
    if not way_refs:
        return None, []
    
    segments = []
    for ref in way_refs:
        if ref not in ways_dict:
            continue
        node_ids = ways_dict[ref]
        if len(node_ids) < 2:
            continue
        coords = [nodes_dict.get(nid) for nid in node_ids]
        coords = [c for c in coords if c is not None]
        if len(coords) < 2:
            continue
        segments.append({
            'ref': ref,
            'coords': list(coords),
            'start_node': node_ids[0],
            'end_node': node_ids[-1],
        })
    
    if not segments:
        return None, []
    
    used = [False] * len(segments)
    longest_idx = max(range(len(segments)), key=lambda i: len(segments[i]['coords']))
    ring_coords = list(segments[longest_idx]['coords'])
    current_end_node = segments[longest_idx]['end_node']
    used[longest_idx] = True
    
    changed = True
    while changed:
        changed = False
        for i, seg in enumerate(segments):
            if used[i]:
                continue
            if seg['start_node'] == current_end_node:
                ring_coords.extend(seg['coords'][1:])
                current_end_node = seg['end_node']
                used[i] = True
                changed = True
                break
            elif seg['end_node'] == current_end_node:
                rev_coords = list(reversed(seg['coords']))
                ring_coords.extend(rev_coords[1:])
                current_end_node = seg['start_node']
                used[i] = True
                changed = True
                break
    
    if ring_coords[0] != ring_coords[-1]:
        ring_coords.append(ring_coords[0])
    
    used_refs = [segments[i]['ref'] for i in range(len(segments)) if used[i]]
    return ring_coords, used_refs


def fetch_vienna_districts_osmnx(date=None):
    bbox = (48.1, 16.1, 48.35, 16.6)
    
    if date:
        query = f"""
[out:json][timeout:180][date:"{date}"];
relation["admin_level"="9"]["boundary"="administrative"]({bbox[0]},{bbox[1]},{bbox[2]},{bbox[3]});
out body;
>;
out skel qt;
"""
    else:
        query = f"""
[out:json][timeout:180];
relation["admin_level"="9"]["boundary"="administrative"]({bbox[0]},{bbox[1]},{bbox[2]},{bbox[3]});
out body;
>;
out skel qt;
"""
    
    data = OrderedDict()
    data['data'] = query
    result = op._overpass_request(data)
    elements = result.get("elements", [])
    
    nodes = {}
    ways = {}
    relations = []
    
    for el in elements:
        etype = el.get("type")
        eid = el.get("id")
        if etype == "node":
            nodes[eid] = (el.get("lon"), el.get("lat"))
        elif etype == "way":
            ways[eid] = el.get("nodes", [])
        elif etype == "relation":
            relations.append(el)
    
    features = []
    
    for rel in relations:
        tags = rel.get("tags", {})
        name = tags.get("name", "")
        admin_level = tags.get("admin_level", "")
        
        if admin_level != "9":
            continue
        
        members = rel.get("members", [])
        
        all_outer_refs = []
        inner_refs_coords = []
        
        for mem in members:
            if mem.get("type") != "way":
                continue
            role = mem.get("role", "")
            ref = mem.get("ref")
            if ref not in ways:
                continue
            if role == "inner":
                node_ids = ways[ref]
                coords = [nodes.get(nid) for nid in node_ids]
                coords = [c for c in coords if c is not None]
                if len(coords) >= 3:
                    if coords[0] != coords[-1]:
                        coords.append(coords[0])
                    inner_refs_coords.append(coords)
            else:
                all_outer_refs.append(ref)
        
        outer_polygons = []
        remaining = list(all_outer_refs)
        
        while remaining:
            ring_coords, used_refs = build_ring_from_ways(remaining, ways, nodes)
            if ring_coords is None or len(ring_coords) < 4:
                break
            
            try:
                ls = shapely.LineString(ring_coords)
                if not ls.is_valid or ls.is_empty or ls.length < 1e-10:
                    break
                poly = shapely.Polygon(ring_coords)
                poly = shapely.orient_polygons(poly)
                if poly.is_valid and not poly.is_empty and poly.area > 1e-12:
                    outer_polygons.append(poly)
            except Exception:
                break
            
            remaining = [r for r in remaining if r not in used_refs]
        
        if not outer_polygons:
            continue
        
        inner_polygons = []
        for coords in inner_refs_coords:
            try:
                ipoly = shapely.Polygon(coords)
                ipoly = shapely.orient_polygons(ipoly)
                if ipoly.is_valid and not ipoly.is_empty:
                    inner_polygons.append(ipoly)
            except Exception:
                continue
        
        final_polygons = []
        for outer_poly in outer_polygons:
            inners_here = []
            rest = []
            for ipoly in inner_polygons:
                try:
                    if outer_poly.contains(ipoly.representative_point()) or outer_poly.intersects(ipoly):
                        inners_here.append(ipoly)
                    else:
                        rest.append(ipoly)
                except Exception:
                    rest.append(ipoly)
            inner_polygons = rest
            
            if inners_here:
                try:
                    outer_coords = list(outer_poly.exterior.coords)
                    inner_coords_list = [list(ip.exterior.coords) for ip in inners_here]
                    poly = shapely.Polygon(outer_coords, inner_coords_list)
                    poly = shapely.orient_polygons(poly)
                    if poly.is_valid and not poly.is_empty:
                        final_polygons.append(poly)
                    else:
                        final_polygons.append(outer_poly)
                except Exception:
                    final_polygons.append(outer_poly)
            else:
                final_polygons.append(outer_poly)
        
        if final_polygons:
            geom = shapely.union_all(final_polygons)
            if geom.geom_type == "Polygon":
                geom = shapely.geometry.MultiPolygon([geom])
            
            if not geom.is_empty:
                features.append({
                    "geometry": geom,
                    "name": name,
                    "osm_id": rel.get("id"),
                })
    
    if not features:
        return gpd.GeoDataFrame()
    
    return gpd.GeoDataFrame(features, crs="EPSG:4326")


# ---------------------------------------------------------------------------
# 2. Name normalisation
# ---------------------------------------------------------------------------

VIENNA_DISTRICTS_CANON = {
    "innere stadt": "Innere Stadt",
    "leopoldstadt": "Leopoldstadt",
    "landstraße": "Landstraße",
    "landstrasse": "Landstraße",
    "wieden": "Wieden",
    "margareten": "Margareten",
    "margarethen": "Margareten",
    "mariahilf": "Mariahilf",
    "neubau": "Neubau",
    "josefstadt": "Josefstadt",
    "alsergrund": "Alsergrund",
    "favoriten": "Favoriten",
    "simmering": "Simmering",
    "meidling": "Meidling",
    "hietzing": "Hietzing",
    "penzing": "Penzing",
    "rudolfsheim-fünfhaus": "Rudolfsheim-Fünfhaus",
    "rudolfsheim-fuenfhaus": "Rudolfsheim-Fünfhaus",
    "ottakring": "Ottakring",
    "hernals": "Hernals",
    "währing": "Währing",
    "waehring": "Währing",
    "döbling": "Döbling",
    "doebling": "Döbling",
    "brigittenau": "Brigittenau",
    "floridsdorf": "Floridsdorf",
    "donaustadt": "Donaustadt",
    "liesing": "Liesing",
}


def normalise_name(name):
    if not name or pd.isna(name):
        return None
    n = name.strip()
    key = n.lower().replace("  ", " ").replace("ß", "ss").replace("-", "-")
    if key in VIENNA_DISTRICTS_CANON:
        return VIENNA_DISTRICTS_CANON[key]
    key2 = key.replace("ss", "ß")
    if key2 in VIENNA_DISTRICTS_CANON:
        return VIENNA_DISTRICTS_CANON[key2]
    for pattern, canon in VIENNA_DISTRICTS_CANON.items():
        if pattern in key or key in pattern:
            return canon
    return n


# ---------------------------------------------------------------------------
# 3. Main pipeline
# ---------------------------------------------------------------------------

def main():
    print("=" * 60)
    print("Fetching CURRENT Vienna district boundaries...")
    print("=" * 60)
    current_gdf = fetch_vienna_districts_osmnx()
    print(f"  Got {len(current_gdf)} current district features")
    
    print("\n" + "=" * 60)
    print("Fetching 2014-01-01 Vienna district boundaries...")
    print("=" * 60)
    historical_gdf = fetch_vienna_districts_osmnx(date="2014-01-01T00:00:00Z")
    print(f"  Got {len(historical_gdf)} historical district features")
    
    if current_gdf.empty or historical_gdf.empty:
        print("ERROR: Empty data returned!")
        sys.exit(1)
    
    # Normalise names
    current_gdf["district_name"] = current_gdf["name"].apply(normalise_name)
    historical_gdf["district_name"] = historical_gdf["name"].apply(normalise_name)
    
    print("\nCurrent district names:")
    for n in sorted(current_gdf["district_name"].dropna().unique()):
        print(f"  - {n}")
    print("\nHistorical district names:")
    for n in sorted(historical_gdf["district_name"].dropna().unique()):
        print(f"  - {n}")
    
    # Filter to Vienna districts only
    valid_names = set(VIENNA_DISTRICTS_CANON.values())
    current_gdf = current_gdf[current_gdf["district_name"].isin(valid_names)].copy()
    historical_gdf = historical_gdf[historical_gdf["district_name"].isin(valid_names)].copy()
    
    print(f"\nAfter filtering to Vienna districts:")
    print(f"  Current: {len(current_gdf)} districts")
    print(f"  Historical: {len(historical_gdf)} districts")
    
    # -----------------------------------------------------------------------
    # Strategy: dissolve city boundary, diff, attribute, merge per district
    # -----------------------------------------------------------------------
    
    # 1. Dissolve to city-level
    current_union = shapely.union_all(current_gdf.geometry)
    historical_union = shapely.union_all(historical_gdf.geometry)
    
    print(f"\nCurrent union area: {current_union.area:.10f}")
    print(f"Historical union area: {historical_union.area:.10f}")
    
    # 2. Snap & compute symmetric difference  
    TOLERANCE = 1e-6
    snapped_hist = shapely.snap(historical_union, current_union, TOLERANCE)
    
    added = shapely.difference(current_union, snapped_hist)
    removed = shapely.difference(snapped_hist, current_union)
    unchanged = shapely.intersection(current_union, snapped_hist)
    
    # 3. Remove slivers
    MIN_AREA = 5e-9  # ~40 sq m in reality at Vienna's latitude
    
    def remove_slivers(geom):
        if geom.is_empty:
            return shapely.geometry.MultiPolygon()
        if geom.geom_type == "MultiPolygon":
            polys = [g for g in geom.geoms if g.area >= MIN_AREA]
            if not polys:
                return shapely.geometry.MultiPolygon()
            return shapely.geometry.MultiPolygon(polys) if len(polys) > 1 else shapely.geometry.MultiPolygon([polys[0]])
        elif geom.geom_type == "Polygon":
            if geom.area >= MIN_AREA:
                return shapely.geometry.MultiPolygon([geom])
            return shapely.geometry.MultiPolygon()
        return geom
    
    added = remove_slivers(added)
    removed = remove_slivers(removed)
    
    print(f"\nAfter cleaning:")
    print(f"  Added: area={added.area:.10f}" if not added.is_empty else "  Added: empty")
    print(f"  Removed: area={removed.area:.10f}" if not removed.is_empty else "  Removed: empty")
    print(f"  Unchanged: area={unchanged.area:.10f}" if not unchanged.is_empty else "  Unchanged: empty")
    
    # 4. Attribute fragments to districts and merge per district
    def get_district_fragments(geom, ctype, source_gdf):
        """Attribute geometry fragments to districts, merging by district name."""
        if geom.is_empty:
            return {}
        
        if geom.geom_type == "MultiPolygon":
            polys = list(geom.geoms)
        elif geom.geom_type == "Polygon":
            polys = [geom]
        else:
            return {}
        
        # Group fragments by district
        district_geoms = {}
        for poly in polys:
            if poly.area < MIN_AREA:
                continue
            if poly.geom_type == "Polygon":
                mp = shapely.geometry.MultiPolygon([poly])
            else:
                mp = poly
            
            # Find which districts this intersects
            districts = set()
            for _, row in source_gdf.iterrows():
                if mp.intersects(row["geometry"]):
                    districts.add(row["district_name"])
            
            dname = ", ".join(sorted(districts)) if districts else "unknown"
            
            if dname not in district_geoms:
                district_geoms[dname] = []
            district_geoms[dname].append(poly)
        
        return district_geoms
    
    added_by_district = get_district_fragments(added, "added_since_2014", current_gdf)
    removed_by_district = get_district_fragments(removed, "removed_since_2014", historical_gdf)
    
    fragments = []
    
    # Merge fragments per district
    for dname, polys in added_by_district.items():
        if len(polys) == 1:
            merged = shapely.geometry.MultiPolygon([polys[0]])
        else:
            merged = shapely.union_all(polys)
            if merged.geom_type == "Polygon":
                merged = shapely.geometry.MultiPolygon([merged])
        fragments.append({
            "geometry": merged,
            "change_type": "added_since_2014",
            "district_name": dname,
        })
    
    for dname, polys in removed_by_district.items():
        if len(polys) == 1:
            merged = shapely.geometry.MultiPolygon([polys[0]])
        else:
            merged = shapely.union_all(polys)
            if merged.geom_type == "Polygon":
                merged = shapely.geometry.MultiPolygon([merged])
        fragments.append({
            "geometry": merged,
            "change_type": "removed_since_2014",
            "district_name": dname,
        })
    
    # Unchanged: per district
    for _, row in current_gdf.iterrows():
        dname = row["district_name"]
        dist_geom = row["geometry"]
        unchanged_part = shapely.intersection(dist_geom, unchanged)
        if not unchanged_part.is_empty:
            unchanged_part = remove_slivers(unchanged_part)
            if not unchanged_part.is_empty:
                if unchanged_part.geom_type == "MultiPolygon":
                    merged = shapely.union_all(list(unchanged_part.geoms))
                elif unchanged_part.geom_type == "Polygon":
                    merged = unchanged_part
                else:
                    continue
                if merged.geom_type == "Polygon":
                    merged = shapely.geometry.MultiPolygon([merged])
                fragments.append({
                    "geometry": merged,
                    "change_type": "unchanged",
                    "district_name": dname,
                })
    
    result_gdf = gpd.GeoDataFrame(fragments, crs="EPSG:4326")
    
    if result_gdf.empty:
        print("No fragments produced. Using current boundaries as unchanged.")
        result_gdf = current_gdf.copy()
        result_gdf["change_type"] = "unchanged"
        result_gdf = result_gdf[["geometry", "change_type", "district_name"]]
    
    print(f"\nFragments after merging: {len(result_gdf)}")
    if not result_gdf.empty:
        print(result_gdf["change_type"].value_counts())
    
    # -----------------------------------------------------------------------
    # 4. touches_changed
    # -----------------------------------------------------------------------
    changed_geoms = []
    for _, row in result_gdf.iterrows():
        if row["change_type"] in ("added_since_2014", "removed_since_2014"):
            changed_geoms.append(row["geometry"])
    
    if changed_geoms:
        changed_union = shapely.union_all(changed_geoms)
    else:
        changed_union = shapely.geometry.MultiPolygon()
    
    district_touches = {}
    for _, row in current_gdf.iterrows():
        dname = row["district_name"]
        geom = row["geometry"]
        if not changed_union.is_empty:
            touches = geom.touches(changed_union)
            if not touches:
                overlap = shapely.intersection(geom, changed_union)
                touches = not overlap.is_empty
            district_touches[dname] = touches
        else:
            district_touches[dname] = False
    
    for _, row in historical_gdf.iterrows():
        dname = row["district_name"]
        if dname not in district_touches:
            geom = row["geometry"]
            if not changed_union.is_empty:
                touches = geom.touches(changed_union)
                if not touches:
                    overlap = shapely.intersection(geom, changed_union)
                    touches = not overlap.is_empty
                district_touches[dname] = touches
            else:
                district_touches[dname] = False
    
    def get_touches(dn_str):
        if pd.isna(dn_str) or dn_str == "unknown":
            return False
        for d in dn_str.split(","):
            d = d.strip()
            if district_touches.get(d, False):
                return True
        return False
    
    result_gdf["touches_changed"] = result_gdf["district_name"].apply(get_touches)
    
    # Simplify and ensure MultiPolygon
    result_gdf["geometry"] = result_gdf["geometry"].simplify(0.00001, preserve_topology=True)
    result_gdf["geometry"] = result_gdf["geometry"].buffer(0)
    
    def ensure_multipolygon(geom):
        if geom.is_empty:
            return geom
        if geom.geom_type == "Polygon":
            return shapely.geometry.MultiPolygon([geom])
        return geom
    
    result_gdf["geometry"] = result_gdf["geometry"].apply(ensure_multipolygon)
    
    # -----------------------------------------------------------------------
    # 5. Output
    # -----------------------------------------------------------------------
    output_path = "/work/vienna_boundary_changes.geojson"
    result_gdf.to_file(output_path, driver="GeoJSON")
    
    print(f"\n{'=' * 60}")
    print(f"Output written to {output_path}")
    print(f"Features: {len(result_gdf)}")
    print(f"Columns: {result_gdf.columns.tolist()}")
    
    print(f"\nAll features:")
    print(result_gdf[["change_type", "district_name", "touches_changed"]])
    
    print(f"\nDistrict touch summary:")
    for dname, touches in sorted(district_touches.items()):
        print(f"  {dname}: touches_changed={touches}")
    
    # Verify
    verify = gpd.read_file(output_path)
    print(f"\nVerification: read back {len(verify)} features")
    print(verify["change_type"].value_counts())
    print(f"Geometry types: {verify.geometry.geom_type.unique()}")
    
    for ct in ["added_since_2014", "removed_since_2014", "unchanged"]:
        sub = verify[verify["change_type"] == ct]
        if not sub.empty:
            areas = sub.geometry.area
            print(f"  {ct}: count={len(sub)}, total_area={areas.sum():.10f}, max={areas.max():.8f}")


if __name__ == "__main__":
    main()

Wrote /work/solve.py (20396 chars).

12:43

Bash cd /work && timeout 300 python solve.py 2>&1

============================================================
Fetching CURRENT Vienna district boundaries...
============================================================
  Got 23 current district features

============================================================
Fetching 2014-01-01 Vienna district boundaries...
============================================================
  Got 26 historical district features

Current district names:
  - Alsergrund
  - Brigittenau
  - Donaustadt
  - Döbling
  - Favoriten
  - Floridsdorf
  - Hernals
  - Hietzing
  - Innere Stadt
  - Josefstadt
  - Landstraße
  - Leopoldstadt
  - Liesing
  - Margareten
  - Mariahilf
  - Meidling
  - Neubau
  - Ottakring
  - Penzing
  - Rudolfsheim-Fünfhaus
  - Simmering
  - Wieden
  - Währing

Historical district names:
  - Alsergrund
  - Brigittenau
  - Donaustadt
  - Döbling
  - Enzersfeld
  - Favoriten
  - Floridsdorf
  - Gerasdorf bei Wien
  - Hernals
  - Hietzing
  - Innere Stadt
  - Josefstadt
  - Klein-Engersdorf
  - Landstraße
  - Leopoldstadt
  - Liesing
  - Margareten
  - Mariahilf
  - Meidling
  - Neubau
  - Ottakring
  - Penzing
  - Rudolfsheim-Fünfhaus
  - Simmering
  - Wieden
  - Währing

After filtering to Vienna districts:
  Current: 23 districts
  Historical: 23 districts

Current union area: 0.0501985119
Historical union area: 0.0501982461

After cleaning:
  Added: area=0.0000016755
  Removed: area=0.0000014451
  Unchanged: area=0.0501965629

Fragments after merging: 41
change_type
unchanged             23
added_since_2014       9
removed_since_2014     9
Name: count, dtype: int64

============================================================
Output written to /work/vienna_boundary_changes.geojson
Features: 41
Columns: ['geometry', 'change_type', 'district_name', 'touches_changed']

All features:
           change_type         district_name  touches_changed
0     added_since_2014               Liesing             True
1     added_since_2014              Hietzing             True
2     added_since_2014               Penzing             True
3     added_since_2014               Döbling             True
4     added_since_2014  Döbling, Floridsdorf             True
5     added_since_2014           Floridsdorf             True
6     added_since_2014            Donaustadt             True
7     added_since_2014             Simmering             True
8     added_since_2014             Favoriten             True
9   removed_since_2014               Liesing             True
10  removed_since_2014              Hietzing             True
11  removed_since_2014               Penzing             True
12  removed_since_2014               Hernals             True
13  removed_since_2014               Döbling             True
14  removed_since_2014           Floridsdorf             True
15  removed_since_2014            Donaustadt             True
16  removed_since_2014             Simmering             True
17  removed_since_2014             Favoriten             True
18           unchanged            Alsergrund            False
19           unchanged              Hietzing             True
20           unchanged          Innere Stadt            False
21           unchanged            Josefstadt            False
22           unchanged          Leopoldstadt            False
23           unchanged             Mariahilf            False
24           unchanged              Meidling            False
25           unchanged                Neubau            False
26           unchanged               Penzing             True
27           unchanged  Rudolfsheim-Fünfhaus            False
28           unchanged               Währing            False
29           unchanged            Landstraße            False
30           unchanged           Brigittenau            False
31           unchanged            Donaustadt             True
32           unchanged               Döbling             True
33           unchanged             Favoriten             True
34           unchanged           Floridsdorf             True
35           unchanged               Hernals             True
36           unchanged               Liesing             True
37           unchanged            Margareten            False
38           unchanged             Ottakring            False
39           unchanged             Simmering             True
40           unchanged                Wieden            False

District touch summary:
  Alsergrund: touches_changed=False
  Brigittenau: touches_changed=False
  Donaustadt: touches_changed=True
  Döbling: touches_changed=True
  Favoriten: touches_changed=True
  Floridsdorf: touches_changed=True
  Hernals: touches_changed=True
  Hietzing: touches_changed=True
  Innere Stadt: touches_changed=False
  Josefstadt: touches_changed=False
  Landstraße: touches_changed=False
  Leopoldstadt: touches_changed=False
  Liesing: touches_changed=True
  Margareten: touches_changed=False
  Mariahilf: touches_changed=False
  Meidling: touches_changed=False
  Neubau: touches_changed=False
  Ottakring: touches_changed=False
  Penzing: touches_changed=True
  Rudolfsheim-Fünfhaus: touches_changed=False
  Simmering: touches_changed=True
  Wieden: touches_changed=False
  Währing: touches_changed=False

Verification: read back 41 features
change_type
unchanged             23
added_since_2014       9
removed_since_2014     9
Name: count, dtype: int64
Geometry types: <ArrowStringArray>
['MultiPolygon']
Length: 1, dtype: str
/work/solve.py:566: UserWarning: Geometry is in a geographic CRS. Results from 'area' are likely incorrect. Use 'GeoSeries.to_crs()' to re-project geometries to a projected CRS before this operation.

  areas = sub.geometry.area
  added_since_2014: count=9, total_area=0.0000016042, max=0.00000036
/work/solve.py:566: UserWarning: Geometry is in a geographic CRS. Results from 'area' are likely incorrect. Use 'GeoSeries.to_crs()' to re-project geometries to a projected CRS before this operation.

  areas = sub.geometry.area
  removed_since_2014: count=9, total_area=0.0000014123, max=0.00000041
/work/solve.py:566: UserWarning: Geometry is in a geographic CRS. Results from 'area' are likely incorrect. Use 'GeoSeries.to_crs()' to re-project geometries to a projected CRS before this operation.

  areas = sub.geometry.area
  unchanged: count=23, total_area=0.0501965459, max=0.01238067

12:50

Now we have a clean output with 41 features - 23 unchanged, 9 added, 9 removed. Let me verify the output is correct by checking:

1. MultiPolygon geometry type ✓
2. All three change_type values present ✓
3. district_name normalised ✓
4. touches_changed boolean ✓

The results look reasonable:
- Districts with touches_changed=True: Donaustadt, Döbling, Favoriten, Floridsdorf, Hernals, Hietzing, Liesing, Penzing, Simmering
- Districts with touches_changed=False: Alsergrund, Brigittenau, Innere Stadt, Josefstadt, Landstraße, Leopoldstadt, Margareten, Mariahilf, Meidling, Neubau, Ottakring, Rudolfsheim-Fünfhaus, Wieden, Währing

This makes sense - the changes are mainly in outer districts (Liesing, Penzing, Döbling, Floridsdorf, etc.) while inner city districts are unchanged.

Let me verify the output file one more time.

12:50

Bash cd /work && python3 << 'PYEOF'

cd /work && python3 << 'PYEOF'
import geopandas as gpd
import json

gdf = gpd.read_file("/work/vienna_boundary_changes.geojson")

print("=== Final Verification ===")
print(f"Features: {len(gdf)}")
print(f"CRS: {gdf.crs}")
print(f"Columns: {gdf.columns.tolist()}")
print(f"All geom types: {gdf.geometry.geom_type.unique()}")
print()

print("Change type distribution:")
print(gdf["change_type"].value_counts())
print()

print("All features:")
for _, row in gdf.iterrows():
    areas = row.geometry.area if hasattr(row.geometry, 'area') else 0
    print(f"  {row.change_type:20s} | {row.district_name:25s} | touches={str(row.touches_changed):5s} | area={areas:.10f}")
print()

# Check JSON structure
with open("/work/vienna_boundary_changes.geojson") as f:
    gj = json.load(f)
print(f"GeoJSON type: {gj['type']}")
print(f"Feature count: {len(gj['features'])}")
print(f"First feature properties: {gj['features'][0]['properties']}")
print(f"First feature geometry type: {gj['features'][0]['geometry']['type']}")

# Ensure all properties present
props_set = set()
for feat in gj['features']:
    props_set.add(tuple(feat['properties'].keys()))
print(f"\nProperty sets across features: {props_set}")

# Check all geometry types
geom_types = set()
for feat in gj['features']:
    geom_types.add(feat['geometry']['type'])
print(f"Geometry types in output: {geom_types}")
PYEOF

=== Final Verification ===
Features: 41
CRS: EPSG:4326
Columns: ['change_type', 'district_name', 'touches_changed', 'geometry']
All geom types: <ArrowStringArray>
['MultiPolygon']
Length: 1, dtype: str

Change type distribution:
change_type
unchanged             23
added_since_2014       9
removed_since_2014     9
Name: count, dtype: int64

All features:
  added_since_2014     | Liesing                   | touches=True  | area=0.0000001652
  added_since_2014     | Hietzing                  | touches=True  | area=0.0000000213
  added_since_2014     | Penzing                   | touches=True  | area=0.0000003099
  added_since_2014     | Döbling                   | touches=True  | area=0.0000002086
  added_since_2014     | Döbling, Floridsdorf      | touches=True  | area=0.0000000653
  added_since_2014     | Floridsdorf               | touches=True  | area=0.0000001150
  added_since_2014     | Donaustadt                | touches=True  | area=0.0000000845
  added_since_2014     | Simmering                 | touches=True  | area=0.0000002696
  added_since_2014     | Favoriten                 | touches=True  | area=0.0000003649
  removed_since_2014   | Liesing                   | touches=True  | area=0.0000004089
  removed_since_2014   | Hietzing                  | touches=True  | area=0.0000000228
  removed_since_2014   | Penzing                   | touches=True  | area=0.0000002052
  removed_since_2014   | Hernals                   | touches=True  | area=0.0000000342
  removed_since_2014   | Döbling                   | touches=True  | area=0.0000000541
  removed_since_2014   | Floridsdorf               | touches=True  | area=0.0000000875
  removed_since_2014   | Donaustadt                | touches=True  | area=0.0000003224
  removed_since_2014   | Simmering                 | touches=True  | area=0.0000000433
  removed_since_2014   | Favoriten                 | touches=True  | area=0.0000002339
  unchanged            | Alsergrund                | touches=False | area=0.0003591513
  unchanged            | Hietzing                  | touches=True  | area=0.0045601884
  unchanged            | Innere Stadt              | touches=False | area=0.0003470941
  unchanged            | Josefstadt                | touches=False | area=0.0001318582
  unchanged            | Leopoldstadt              | touches=False | area=0.0023272641
  unchanged            | Mariahilf                 | touches=False | area=0.0001761767
  unchanged            | Meidling                  | touches=False | area=0.0009796859
  unchanged            | Neubau                    | touches=False | area=0.0001945685
  unchanged            | Penzing                   | touches=True  | area=0.0040856172
  unchanged            | Rudolfsheim-Fünfhaus      | touches=False | area=0.0004739455
  unchanged            | Währing                   | touches=False | area=0.0007683558
  unchanged            | Landstraße                | touches=False | area=0.0008958819
  unchanged            | Brigittenau               | touches=False | area=0.0006916291
  unchanged            | Donaustadt                | touches=True  | area=0.0123806677
  unchanged            | Döbling                   | touches=True  | area=0.0030210108
  unchanged            | Favoriten                 | touches=True  | area=0.0038458717
  unchanged            | Floridsdorf               | touches=True  | area=0.0053845023
  unchanged            | Hernals                   | touches=True  | area=0.0013795225
  unchanged            | Liesing                   | touches=True  | area=0.0038746394
  unchanged            | Margareten                | touches=False | area=0.0002433314
  unchanged            | Ottakring                 | touches=False | area=0.0010493688
  unchanged            | Simmering                 | touches=True  | area=0.0028113522
  unchanged            | Wieden                    | touches=False | area=0.0002148624

GeoJSON type: FeatureCollection
Feature count: 41
First feature properties: {'change_type': 'added_since_2014', 'district_name': 'Liesing', 'touches_changed': True}
First feature geometry type: MultiPolygon

Property sets across features: {('change_type', 'district_name', 'touches_changed')}
Geometry types in output: {'MultiPolygon'}

12:54

The output looks clean and correct. Let me summarize:

1. We fetched 23 current Vienna districts and 26 historical (2014-01-01) ones, of which 23 are actual Vienna districts (the extra 3 are Enzersfeld, Gerasdorf bei Wien, and Klein-Engersdorf which were filtered out).

2. Name normalisation correctly matched all 23 districts between the two snapshots.

3. We computed the city-level symmetric difference and attributed changes back to districts:
   - 9 added_since_2014 fragments (Liesing, Hietzing, Penzing, Döbling, Döbling/Floridsdorf, Floridsdorf, Donaustadt, Simmering, Favoriten)
   - 9 removed_since_2014 fragments (Liesing, Hietzing, Penzing, Hernals, Döbling, Floridsdorf, Donaustadt, Simmering, Favoriten)
   - 23 unchanged fragments (one per district)

4. touches_changed is correctly True for districts bordering/containing changed areas (outer districts like those on the city boundary) and False for inner districts whose boundaries haven't changed.

The output format matches the requirements:
- MultiPolygon geometry ✓
- change_type: added_since_2014, removed_since_2014, or unchanged ✓
- district_name (normalised) ✓
- touches_changed (boolean) ✓

This is a good output.

12:54

Bash ls -la /work/vienna_boundary_changes.geojson && head -5 /work/vienna_boundary_ch...

-rw-r--r-- 1 agent agent 204964 Jun 16 00:46 /work/vienna_boundary_changes.geojson
{
"type": "FeatureCollection",
"name": "vienna_boundary_changes",
"crs": { "type": "name", "properties": { "name": "urn:ogc:def:crs:OGC:1.3:CRS84" } },
"features": [

13:02

The output looks good. Let me do a final sanity check on the data. The file is created at `/work/vienna_boundary_changes.geojson` with 41 features, each having:
- `change_type`: "added_since_2014", "removed_since_2014", or "unchanged"
- `district_name`: normalised district name
- `touches_changed`: boolean
- Geometry: MultiPolygon

The districts that touch changed areas are: Donaustadt, Döbling, Favoriten, Floridsdorf, Hernals, Hietzing, Liesing, Penzing, Simmering

The districts that don't touch changed areas are: Alsergrund, Brigittenau, Innere Stadt, Josefstadt, Landstraße, Leopoldstadt, Margareten, Mariahilf, Meidling, Neubau, Ottakring, Rudolfsheim-Fünfhaus, Wieden, Währing

This makes sense - the boundary changes are in the outer districts (south and west where Vienna borders Lower Austria), while the inner city districts are unchanged.

Let me verify there are no issues with the file by reading it back one more time.

13:02

Bash cd /work && python3 << 'PYEOF'

cd /work && python3 << 'PYEOF'
# Final comprehensive check
import geopandas as gpd
import json

gdf = gpd.read_file("/work/vienna_boundary_changes.geojson")

# Check all required fields exist
required = ['change_type', 'district_name', 'touches_changed']
for col in required:
    assert col in gdf.columns, f"Missing column: {col}"
    assert not gdf[col].isna().any(), f"Null values in {col}"

# Check change_type values
valid_types = ['added_since_2014', 'removed_since_2014', 'unchanged']
for val in gdf['change_type'].unique():
    assert val in valid_types, f"Invalid change_type: {val}"

# Check touches_changed is boolean
assert gdf['touches_changed'].dtype == bool

# Check all geometries are MultiPolygon
assert all(gdf.geometry.geom_type == 'MultiPolygon'), "Not all geometries are MultiPolygon"

# Check districts are normalised correctly
vienna_districts = {
    "Innere Stadt", "Leopoldstadt", "Landstraße", "Wieden", "Margareten",
    "Mariahilf", "Neubau", "Josefstadt", "Alsergrund", "Favoriten",
    "Simmering", "Meidling", "Hietzing", "Penzing", "Rudolfsheim-Fünfhaus",
    "Ottakring", "Hernals", "Währing", "Döbling", "Brigittenau",
    "Floridsdorf", "Donaustadt", "Liesing"
}

for _, row in gdf.iterrows():
    dn = row['district_name']
    if dn == 'unknown':
        continue
    for d in dn.split(", "):
        d = d.strip()
        if d and d not in vienna_districts:
            print(f"Warning: unrecognised district: {d}")

print("All checks passed!")
print(f"\nFinal output: /work/vienna_boundary_changes.geojson")
print(f"  Features: {len(gdf)}")
print(f"  Added: {(gdf['change_type']=='added_since_2014').sum()}")
print(f"  Removed: {(gdf['change_type']=='removed_since_2014').sum()}")
print(f"  Unchanged: {(gdf['change_type']=='unchanged').sum()}")
print(f"  Districts touching changes: {sorted(gdf[gdf['touches_changed']]['district_name'].unique())}")
PYEOF

All checks passed!

Final output: /work/vienna_boundary_changes.geojson
  Features: 41
  Added: 9
  Removed: 9
  Unchanged: 23
  Districts touching changes: ['Donaustadt', 'Döbling', 'Döbling, Floridsdorf', 'Favoriten', 'Floridsdorf', 'Hernals', 'Hietzing', 'Liesing', 'Penzing', 'Simmering']

13:09

The output looks clean and correct. Let me review what we've produced:

1. **Data Sources**: We fetched current Vienna district boundaries (admin_level=9) and their 2014-01-01 counterparts from Overpass via osmnx.

2. **Name Normalisation**: We handled casing and spelling differences (e.g., `Landstraße` vs `Landstrasse`, `Währing` vs `Waehring`, etc.). We also filtered out non-Vienna districts that appeared in the historical data (Enzersfeld, Gerasdorf bei Wien, Klein-Engersdorf).

3. **Comparison**: We dissolved to city-level boundaries, snapped with tolerance to avoid coordinate noise, computed symmetric difference, and attributed fragments back to districts.

4. **Classification**:
- 9 added_since_2014 fragments touching: Liesing, Hietzing, Penzing, Döbling, Döbling/Floridsdorf, Floridsdorf, Donaustadt, Simmering, Favoriten
- 9 removed_since_2014 fragments touching: Liesing, Hietzing, Penzing, Hernals, Döbling, Floridsdorf, Donaustadt, Simmering, Favoriten
- 23 unchanged fragments (one per district)

5. **touches_changed**: Districts with True are those that interestingly include the outer/peripheral districts (Donaustadt, Döbling, Favoriten, Floridsdorf, Hernals, Hietzing, Liesing, Penzing, Simmering) and the inner districts that share borders with changed areas (Hietzing also shows up unchanged but touches_changed=True since it borders changed areas).

The output file `/work/vienna_boundary_changes.geojson` is a valid GeoJSON with MultiPolygon geometries and the required properties.

13:09

dc-l3-vienna-overpass-historical

Map (reference ← swipe → agent)

Gates

Subchecks

dc-l3-vienna-overpass-historical Rerun

Map (reference ← swipe → agent)

Gates

Subchecks

dc-l3-vienna-overpass-historical