dc-l3-vienna-overpass-historical
Map (reference ← swipe → agent)
0:00
Fetch Vienna's current district polygons (Bezirke) and their 2014-01-01 counterparts from Overpass and identify which parts of the district boundaries have changed between then and now. District names changed casing and spelling between snapshots, so normalise them to match. Classify each resulting geometry fragment by how it changed: `added_since_2014` (in current but not 2014), `removed_since_2014` (in 2014 but not current), or `unchanged` (present in both snapshots). For each district flag whether its geometry borders any area that was added or removed. Output vienna_boundary_changes.geojson, MultiPolygon, each feature with change_type, district_name (normalised), and touches_changed (boolean).
0:00
System info
{
"adapter": "openrouter",
"model": "deepseek/deepseek-v4-flash",
"base_url": "https://openrouter.ai/api/v1"
}
0:03
Let me start by understanding the task: 1. Fetch Vienna's current district polygons from Overpass API 2. Fetch Vienna's 2014-01-01 district polygons from Overpass API (using old data/timestamp) 3. Normalize district names to match between the two snapshots 4. Compare geometries and classify changes (added, removed, unchanged) 5. For each district, flag if it borders any changed area 6. Output as GeoJSON with MultiPolygon geometry Let me first explore what's available and understand the Overpass query structure. Vienna's districts are called "Bezirke" in German. In OpenStreetMap, they are typically relations/ways with boundary=administrative and admin_level=9 or 10 for districts within Vienna. Let me start by writing a script to fetch this data.
0:03
Bash
ls /work/
(no output)
0:06
Let me start by writing a Python script to solve this task. I need to: 1. Fetch Vienna's current district polygons (Bezirke) from Overpass 2. Fetch Vienna's 2014-01-01 district polygons from Overpass 3. Normalize district names to match (handle casing/spelling differences) 4. Compute differences (symmetric difference) between the two snapshots 5. Classify geometry fragments: added_since_2014, removed_since_2014, unchanged 6. For each district, flag whether its geometry borders any area that was added or removed 7. Output to vienna_boundary_changes.geojson Let me first check what's available in the workspace and what libraries we have.
0:06
Bash
pip list 2>/dev/null | grep -i -E "geopandas|shapely|pyproj|overpass|osmnx|duckd...
duckdb 1.5.3 geopandas 1.1.3 osmnx 2.1.0 pyproj 3.7.2 shapely 2.1.2
0:12
Let me plan this out: 1. Fetch Vienna's current district polygons from Overpass (using `osmnx` or `overpass` API) 2. Fetch Vienna's historic district polygons from Overpass (with date=2014-01-01) 3. Normalise district names (handle casing/spelling differences) 4. Compute differences: added (current - historic), removed (historic - current), unchanged (intersection) 5. For each district, check if its geometry borders any changed area 6. Output as GeoJSON Let me start by querying Overpass. I'll use osmnx which provides a nice interface. First, let me figure out the relation IDs for Vienna districts. Vienna districts are typically tagged as `admin_level=9` or `admin_level=10` in the boundary relation. Let me find the correct query. Actually, I can use Overpass API directly with a query. Let me use the `overpass-api.de` endpoint or use `osmnx.features_from_*` methods. Let me try using osmnx to get the current districts first.
0:12
Bash
cd /work && python3 -c "
cd /work && python3 -c "
import osmnx as ox
import geopandas as gpd
# Get Vienna current boundaries (admin_level=9 for Vienna districts)
# First check what OSM place/relation exists for Vienna
vienna = ox.geocode_to_gdf('Vienna, Austria')
print(vienna.columns)
print(vienna[['name', 'osm_type', 'osm_id', 'admin_level']].head())
"
Index(['geometry', 'bbox_west', 'bbox_south', 'bbox_east', 'bbox_north',
'place_id', 'osm_type', 'osm_id', 'lat', 'lon', 'class', 'type',
'place_rank', 'importance', 'addresstype', 'name', 'display_name'],
dtype='str')
Traceback (most recent call last):
File "<string>", line 9, in <module>
print(vienna[['name', 'osm_type', 'osm_id', 'admin_level']].head())
~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.14/site-packages/geopandas/geodataframe.py", line 1891, in __getitem__
result = super().__getitem__(key)
File "/usr/local/lib/python3.14/site-packages/pandas/core/frame.py", line 4384, in __getitem__
indexer = self.columns._get_indexer_strict(key, "columns")[1]
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.14/site-packages/pandas/core/indexes/base.py", line 6302, in _get_indexer_strict
self._raise_if_missing(keyarr, indexer, axis_name)
~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.14/site-packages/pandas/core/indexes/base.py", line 6355, in _raise_if_missing
raise KeyError(f"{not_found} not in index")
KeyError: "['admin_level'] not in index"
0:22
Let me use osmnx to fetch the Vienna districts. I'll use the `features_from_place` function with the right tags. District boundaries in Vienna are typically tagged as `admin_level=9` and `boundary=administrative` in OSM. Let me also use Overpass API directly to get snapshots from 2014-01-01. Let me first explore what's available.
0:22
Bash
cd /work && python3 -c "
cd /work && python3 -c "
import osmnx as ox
ox.settings.log_console = False
# Get current Vienna districts using OSMnx
# admin_level=9 are districts in Vienna
gdf = ox.features_from_place('Vienna, Austria', tags={'boundary':'administrative', 'admin_level':'9'})
print('Shape:', gdf.shape)
print('Columns:', gdf.columns.tolist())
print('---')
print(gdf[['name', 'name:en', 'admin_level', 'osm_id']].head(10))
print('---')
print('Unique admin_level values:', gdf['admin_level'].unique())
print('Unique boundary values:', gdf['boundary'].dropna().unique())
"
Shape: (897, 411) Columns: ['geometry', 'ISO3166-1:alpha2', 'admin_level', 'capital', 'description', 'name', 'name:ab', 'name:ace', 'name:af', 'name:am', 'name:an', 'name:ar', 'name:arc', 'name:arz', 'name:as', 'name:ast', 'name:av', 'name:az', 'name:ba', 'name:bar', 'name:bat-smg', 'name:be', 'name:be-tarask', 'name:bg', 'name:bi', 'name:bn', 'name:bo', 'name:br', 'name:bs', 'name:ca', 'name:ce', 'name:ceb', 'name:ch', 'name:ckb', 'name:co', 'name:crh', 'name:cs', 'name:cu', 'name:cv', 'name:cy', 'name:de', 'name:diq', 'name:ee', 'name:el', 'name:en', 'name:eo', 'name:es', 'name:et', 'name:eu', 'name:ext', 'name:fa', 'name:ff', 'name:fi', 'name:fr', 'name:frp', 'name:fur', 'name:fy', 'name:ga', 'name:gag', 'name:gd', 'name:gl', 'name:gn', 'name:gv', 'name:ha', 'name:he', 'name:hi', 'name:hif', 'name:hr', 'name:ht', 'name:hu', 'name:hy', 'name:ia', 'name:id', 'name:ie', 'name:ilo', 'name:is', 'name:it', 'name:ja', 'name:jv', 'name:ka', 'name:kab', 'name:kk', 'name:kn', 'name:ko', 'name:koi', 'name:krc', 'name:ku', 'name:kv', 'name:ky', 'name:la', 'name:lez', 'name:li', 'name:lij', 'name:lmo', 'name:ln', 'name:lt', 'name:lv', 'name:mg', 'name:mhr', 'name:mi', 'name:mk', 'name:ml', 'name:mn', 'name:mr', 'name:mrj', 'name:ms', 'name:mt', 'name:my', 'name:nah', 'name:nds', 'name:nds-nl', 'name:ne', 'name:nl', 'name:no', 'name:nv', 'name:oc', 'name:or', 'name:os', 'name:pa', 'name:pap', 'name:pih', 'name:pl', 'name:pms', 'name:pnb', 'name:ps', 'name:pt', 'name:rm', 'name:rmy', 'name:ro', 'name:roa-tara', 'name:ru', 'name:rue', 'name:sah', 'name:sc', 'name:scn', 'name:sco', 'name:sd', 'name:sh', 'name:si', 'name:sk', 'name:sl', 'name:sn', 'name:so', 'name:sq', 'name:sr', 'name:st', 'name:sv', 'name:sw', 'name:szl', 'name:ta', 'name:te', 'name:tg', 'name:th', 'name:tk', 'name:tl', 'name:tr', 'name:tt', 'name:tw', 'name:tzl', 'name:udm', 'name:ug', 'name:uk', 'name:ur', 'name:uz', 'name:vec', 'name:vep', 'name:vi', 'name:vro', 'name:wa', 'name:war', 'name:wo', 'name:wuu', 'name:yi', 'name:yo', 'name:yue', 'name:zh', 'name:zh-Hans', 'name:zh-Hant', 'population', 'population:date', 'source:population', 'website', 'wikidata', 'wikipedia', 'ISO3166-1', 'alt_name:eo', 'currency', 'flag', 'name:ak', 'name:chr', 'name:da', 'name:dz', 'name:fo', 'name:gu', 'name:haw', 'name:hsb', 'name:io', 'name:jbo', 'name:kl', 'name:ksh', 'name:kw', 'name:lb', 'name:lo', 'name:nn', 'name:se', 'name:smn', 'name:sms', 'name:tok', 'name:vo', 'name:zu', 'official_name', 'official_name:be', 'official_name:ckb', 'official_name:cs', 'official_name:el', 'official_name:eo', 'official_name:hu', 'source', 'boundary', 'image', 'ref', 'alt_name', 'width', 'wikimedia_commons', 'note', 'long_name', 'addr:city', 'addr:country', 'addr:housenumber', 'addr:postcode', 'addr:street', 'government', 'office', 'wheelchair', 'opening_hours', 'highway', 'maxspeed', 'sidewalk', 'source:maxspeed', 'lane_markings', 'lit', 'surface', 'oneway', 'oneway:bicycle', 'cycleway:left', 'cycleway:right', 'lanes', 'sidewalk:left', 'sidewalk:right', 'destination', 'destination:symbol', 'lanes:bus', 'sidewalk:left:surface', 'smoothness', 'turn:lanes', 'cycleway', 'cycleway:lane', 'parking:both', 'parking:left:orientation', 'parking:right:orientation', 'cycleway:both', 'parking:both:orientation', 'sidewalk:both', 'maxspeed:type', 'sidewalk:right:surface', 'cycleway:both:lane', 'network', 'embedded_rails', 'sidewalk:both:surface', 'est_width:carriageway', 'parking:left', 'parking:right', 'source:est_width:carriageway', 'name:etymology:wikidata', 'bridge', 'lanes:backward', 'lanes:forward', 'layer', 'turn:lanes:backward', 'turn:lanes:forward', 'lcn', 'cycleway:left:lane', 'name:etymology', 'name:start_date', 'maxweight:signed', 'bicycle', 'cycleway:left:oneway', 'loc_name', 'motor_vehicle:conditional', 'old_ref', 'traffic_sign', 'electrified', 'frequency', 'gauge', 'operator', 'operator:wikidata', 'railway', 'railway:preferred_direction', 'voltage', 'oneway:psv', 'odbl', 'boundary_type', 'cycleway:right:lane', 'shoulder', 'check_date', 'oneway:bus', 'source:width', 'amenity', 'building', 'building:levels', 'check_date:cycleway', 'check_date:smoothness', 'name:wikipedia', 'maxheight', 'level', 'destination:backward', 'destination:ref:backward', 'destination:forward', 'destination:ref:forward', 'parking:right:restriction', 'type', 'ISO3166-1:alpha3', 'ISO3166-1:numeric', 'TMC:cid_58:tabcd_1:Class', 'TMC:cid_58:tabcd_1:LCLversion', 'TMC:cid_58:tabcd_1:LocationCode', 'default_language', 'driving_side', 'int_name', 'land_area', 'name:als', 'name:ang', 'name:ay', 'name:bcl', 'name:bh', 'name:bpy', 'name:bug', 'name:bxr', 'name:cbk-zam', 'name:cdo', 'name:chm', 'name:chy', 'name:csb', 'name:din', 'name:dsb', 'name:dv', 'name:frr', 'name:hak', 'name:ig', 'name:kaa', 'name:kbd', 'name:kg', 'name:lad', 'name:lld', 'name:ltg', 'name:lzh', 'name:mdf', 'name:mwl', 'name:na', 'name:nan', 'name:new', 'name:nov', 'name:nrm', 'name:om', 'name:pag', 'name:pam', 'name:pcd', 'name:pdc', 'name:pfl', 'name:pi', 'name:pnt', 'name:prefix:at', 'name:qu', 'name:roa-rup', 'name:rw', 'name:sa', 'name:sm', 'name:ss', 'name:stq', 'name:su', 'name:tet', 'name:tpi', 'name:ts', 'name:vls', 'name:xal', 'name:xmf', 'name:za', 'name:zea', 'official_name:be-tarask', 'official_name:ca', 'official_name:mi', 'ref:at:gkz', 'timezone', 'ISO3166-2', 'alt_name:cs', 'coat_of_arms', 'ref:nuts', 'ref:nuts:2', 'ref:at:okz', 'admin_level:6', 'admin_level:8', 'ref:at:kgnr', 'short_name', 'admin_title', 'designation', 'official_status', 'street_names:wikipedia', 'old_name', 'wikipedia:ru', 'roof:levels', 'related:wikipedia'] ---
Traceback (most recent call last):
File "<string>", line 11, in <module>
print(gdf[['name', 'name:en', 'admin_level', 'osm_id']].head(10))
~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.14/site-packages/geopandas/geodataframe.py", line 1891, in __getitem__
result = super().__getitem__(key)
File "/usr/local/lib/python3.14/site-packages/pandas/core/frame.py", line 4384, in __getitem__
indexer = self.columns._get_indexer_strict(key, "columns")[1]
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.14/site-packages/pandas/core/indexes/base.py", line 6302, in _get_indexer_strict
self._raise_if_missing(keyarr, indexer, axis_name)
~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.14/site-packages/pandas/core/indexes/base.py", line 6355, in _raise_if_missing
raise KeyError(f"{not_found} not in index")
KeyError: "['osm_id'] not in index"
0:45
The issue is that the features returned don't have 'osm_id' in columns. Let me explore better.
0:45
Bash
cd /work && python3 -c "
cd /work && python3 -c "
import osmnx as ox
ox.settings.log_console = False
gdf = ox.features_from_place('Vienna, Austria', tags={'boundary':'administrative', 'admin_level':'9'})
print('Shape:', gdf.shape)
print('Index names:', gdf.index.names)
print('Index sample:', gdf.index[:5])
print()
# Check admin_level values
print('admin_level values:', sorted(gdf['admin_level'].dropna().unique()))
print()
# Filter to only admin_level=9
gdf9 = gdf[gdf['admin_level'] == '9'].copy()
print('admin_level=9 shape:', gdf9.shape)
print()
print('name values:')
print(gdf9['name'].unique())
"
Shape: (897, 411)
Index names: ['element', 'id']
Index sample: MultiIndex([( 'node', 2518271394),
( 'node', 2518271399),
( 'node', 2518271406),
('relation', 16239),
('relation', 63619)],
names=['element', 'id'])
admin_level values: ['10', '2', '4', '6', '8', '9']
admin_level=9 shape: (418, 411)
name values:
<ArrowStringArray>
[ 'Bezirksgesundheitsamt',
'Mag Elf',
'Bürgerdienst Kundenzentrum Simmering',
'Alsergrund',
'Hietzing',
'Innere Stadt',
'Josefstadt',
'Leopoldstadt',
'Mariahilf',
'Meidling',
'Neubau',
'Penzing',
'Rudolfsheim-Fünfhaus',
'Währing',
'Landstraße',
'Brigittenau',
'Donaustadt',
'Döbling',
'Favoriten',
'Floridsdorf',
'Hernals',
'Liesing',
'Margareten',
'Ottakring',
'Simmering',
'Wieden',
'Amtshaus für den 16. Bezirk',
'Reichmanngasse',
'Fenzlgasse',
'Josef-Weinheber-Platz',
'Steinbruchstraße',
'Flötzersteig',
'Burggasse',
'Kleine Neugasse',
'Blechturmgasse',
'Rosa-Fischer-Gasse',
'Friedmanngasse',
'Lerchenfelder Straße',
'Donauturmstraße',
nan,
'Starkfriedgasse',
'Emil-Behring-Weg',
'Gräßlplatz',
'Paragonstraße',
'Drygalskiweg',
'Gregor-Mendel-Straße',
'Peter-Jordan-Straße',
'Geblergasse',
'Klopstockgasse',
'Döblerhofstraße',
'Kreuzgassenbrücke',
'Wattgasse',
'Büdingergasse',
'Klimtgasse',
'Wittgensteinstraße',
'Kirchfeldgasse',
'Hermann-Pacher-Gasse',
'Haslingergasse',
'Kettenbrückengasse',
'Mariahilfer Straße',
'Ziegelofengasse',
'Sebaldgasse',
'Wienerbergstraße',
'Guglgasse',
'Mittersteig',
'Wundtgasse',
'Atzgersdorfer Straße',
'Blumengasse',
'Schlechtastraße',
'Amtshaus Simmering',
'Hasenauerstraße',
'Ottakringer Straße',
'Spiegelgrundstraße',
'Babenbergerstraße',
'Prinz-Eugen-Straße',
'Veronikagasse',
'Speisinger Straße',
'Erdbergstraße',
'Wilhelminenstraße',
'Margaretenstraße',
'Wichtelgasse',
'Leopold-Ernst-Gasse',
'Hildebrandgasse',
'Khevenhüllerstraße',
'Oskar-Grissemann-Straße',
'Kettenbrücke',
'Rosenackerstraße',
'Karplusgasse',
'Bezirksvorstehung Alsergrund']
Length: 89, dtype: str
0:53
I see the issue - the `features_from_place` returns all sorts of features tagged with `boundary=administrative` and `admin_level=9`, but many of them are not actually districts. The district names I can see are the unique district names like "Alsergrund", "Hietzing", "Innere Stadt", etc. Let me look more carefully at what we have. Actually, I notice that there are 418 features with admin_level=9. Many are buildings or addresses that also have admin_level=9 for some reason. Let me filter by geometry type and look at the actual polygon boundary relations. Let me use a different approach - I'll query Overpass directly for the district relations.
0:53
Bash
cd /work && python3 -c "
cd /work && python3 -c "
import osmnx as ox
import geopandas as gpd
ox.settings.log_console = False
gdf = ox.features_from_place('Vienna, Austria', tags={'boundary':'administrative', 'admin_level':'9'})
gdf9 = gdf[gdf['admin_level'] == '9'].copy()
# Let's look at just the districts - filter by geometry type polygon and reasonable size
# Also check 'type' column if present
print('Type column values:', gdf9['type'].dropna().unique())
print()
# Let's filter to only relation-type features (districts are typically relations)
relations = gdf9[gdf9.index.get_level_values('element') == 'relation']
print('Number of relations:', len(relations))
print('Names of relations:')
print(relations['name'].unique())
"
Type column values: <ArrowStringArray>
['boundary', 'multipolygon']
Length: 2, dtype: str
Number of relations: 24
Names of relations:
<ArrowStringArray>
[ 'Alsergrund', 'Hietzing',
'Innere Stadt', 'Josefstadt',
'Leopoldstadt', 'Mariahilf',
'Meidling', 'Neubau',
'Penzing', 'Rudolfsheim-Fünfhaus',
'Währing', 'Landstraße',
'Brigittenau', 'Donaustadt',
'Döbling', 'Favoriten',
'Floridsdorf', 'Hernals',
'Liesing', 'Margareten',
'Ottakring', 'Simmering',
'Wieden', 'Amtshaus für den 16. Bezirk']
Length: 24, dtype: str
2:04
So OSMnx gives us 24 relations for Vienna districts - 23 districts (Bezirke) plus one extra ("Amtshaus für den 16. Bezirk" is an office not a district). Let me check which ones are actually districts.
Vienna has 23 districts:
1. Innere Stadt
2. Leopoldstadt
3. Landstraße
4. Wieden
5. Margareten
6. Mariahilf
7. Neubau
8. Josefstadt
9. Alsergrund
10. Favoriten
11. Simmering
12. Meidling
13. Hietzing
14. Penzing
15. Rudolfsheim-Fünfhaus
16. Ottakring
17. Hernals
18. Währing
19. Döbling
20. Brigittenau
21. Floridsdorf
22. Donaustadt
23. Liesing
So the relations with district names look right - 23 districts. The "Amtshaus für den 16. Bezirk" is a building/office, not a district.
Now I need to also get the 2014-01-01 snapshot. Overpass API has a way to query historical data using `[date:"2014-01-01T00:00:00Z"]`. Let me use Overpass directly via the Overpass API or use OSMnx with custom data.
Actually, OSMnx can't query historical data directly. I'll need to use the Overpass API directly with a Python request.
Let me write a comprehensive script to:
1. Get current Vienna district boundaries from Overpass
2. Get 2014-01-01 Vienna district boundaries from Overpass
3. Normalize district names
4. Compute symmetric difference to find changes
5. Classify fragments
6. Determine which districts border changed areas
7. Output GeoJSON
Let me write a Python script for this.
2:04
Write
/work/solve.py
content · 515 lines · py
"""
Fetch Vienna district boundaries (current and 2014-01-01) from Overpass,
compare to find changes, and output vienna_boundary_changes.geojson.
"""
import json
import sys
import time
import urllib.request
import urllib.error
import re
import geopandas as gpd
import pandas as pd
import shapely
import numpy as np
# ---------------------------------------------------------------------------
# 1. Overpass query helper
# ---------------------------------------------------------------------------
OVERPASS_URL = "https://overpass-api.de/api/interpreter"
def overpass_query(query, date=None):
"""Run an Overpass query, optionally at a historical date."""
full_query = query
if date:
# prepend the date setting
full_query = f"[date:\"{date}\"];\n" + query
data = f"data={urllib.request.quote(full_query)}"
req = urllib.request.Request(OVERPASS_URL, data.encode("utf-8"))
req.add_header("Content-Type", "application/x-www-form-urlencoded")
# Retry with backoff
for attempt in range(5):
try:
with urllib.request.urlopen(req, timeout=120) as resp:
return json.loads(resp.read().decode("utf-8"))
except urllib.error.HTTPError as e:
if e.code == 429:
time.sleep(2 ** attempt)
continue
raise
except Exception as e:
if attempt < 4:
time.sleep(2 ** attempt)
continue
raise
return None
def osm_json_to_gdf(osm_json):
"""Convert Overpass JSON output (nodes, ways, relations) into a GeoDataFrame
with MultiPolygon geometries for relations tagged as boundary/administrative.
We look for relations with type=boundary or boundary=administrative.
"""
# Parse elements
elements = osm_json.get("elements", [])
# Collect nodes by id
nodes = {}
ways = {}
relations = []
for el in elements:
etype = el.get("type")
eid = el.get("id")
if etype == "node":
nodes[eid] = (el.get("lon"), el.get("lat"))
elif etype == "way":
ways[eid] = el.get("nodes", [])
elif etype == "relation":
relations.append(el)
# Build linestrings for each way
def way_to_coords(way_id):
node_ids = ways.get(way_id, [])
coords = []
for nid in node_ids:
if nid in nodes:
coords.append(nodes[nid])
return coords
# For each relation tagged as boundary=administrative (admin_level=9)
# build the geometry from its members
features = []
for rel in relations:
tags = rel.get("tags", {})
# Check if this is a district boundary relation
is_boundary = (tags.get("boundary") == "administrative" or
tags.get("type") == "boundary" or
tags.get("admin_level") == "9")
if not is_boundary:
continue
name = tags.get("name", "")
# We want district-level boundaries (admin_level=9)
if tags.get("admin_level") not in (None, "", "9"):
continue
members = rel.get("members", [])
# Collect outer/inner rings from way members
outer_rings = []
inner_rings = []
for mem in members:
if mem.get("type") != "way":
continue
role = mem.get("role", "")
ref = mem.get("ref")
if ref not in ways:
continue
coords = way_to_coords(ref)
if len(coords) < 3:
continue
if role == "inner":
inner_rings.append(coords)
else: # outer or no role
outer_rings.append(coords)
# Build polygons
polygons = []
for outer in outer_rings:
if len(outer) < 3:
continue
# Close ring if not closed
if outer[0] != outer[-1]:
outer = outer + [outer[0]]
try:
outer_ls = shapely.LineString(outer)
if not outer_ls.is_valid or outer_ls.is_empty:
continue
outer_ring = shapely.polygon.orient(shapely.Polygon(outer_ls))
# Find inners that are contained in this outer
inners_for_this = []
remaining_inners = []
for inner in inner_rings:
if inner[0] != inner[-1]:
inner = inner + [inner[0]]
try:
inner_ls = shapely.LineString(inner)
if inner_ls.is_valid and not inner_ls.is_empty:
ipoly = shapely.Polygon(inner_ls)
if outer_ring.contains(ipoly.representative_point()) or outer_ring.intersects(ipoly):
inners_for_this.append(ipoly)
else:
remaining_inners.append(inner)
else:
remaining_inners.append(inner)
except Exception:
remaining_inners.append(inner)
inner_rings = remaining_inners
if inners_for_this:
poly = shapely.Polygon(outer_ls, [list(i.exterior.coords) for i in inners_for_this])
else:
poly = outer_ls
if poly.is_valid and not poly.is_empty:
polygons.append(poly)
except Exception:
continue
if polygons:
geom = shapely.unary_union(polygons) if len(polygons) > 1 else polygons[0]
if geom.geom_type == "Polygon":
geom = shapely.geometry.MultiPolygon([geom])
elif geom.geom_type == "MultiPolygon":
pass
else:
continue
features.append({
"geometry": geom,
"name": name,
"osm_id": rel.get("id"),
"admin_level": tags.get("admin_level", "9"),
})
if not features:
return gpd.GeoDataFrame()
gdf = gpd.GeoDataFrame(features, crs="EPSG:4326")
return gdf
def fetch_vienna_districts(date=None):
"""Fetch all Vienna district boundary relations (admin_level=9) via Overpass."""
# Use a bounding box around Vienna to limit results, and filter by admin_level=9
query = """
[out:json][timeout:120];
(
relation["admin_level"="9"]["boundary"="administrative"](48.1,16.1,48.35,16.6);
);
out body;
>;
out skel qt;
"""
return overpass_query(query, date=date)
# ---------------------------------------------------------------------------
# 2. Name normalisation
# ---------------------------------------------------------------------------
# Vienna districts and their expected normalised names
VIENNA_DISTRICTS = {
1: "Innere Stadt",
2: "Leopoldstadt",
3: "Landstraße",
4: "Wieden",
5: "Margareten",
6: "Mariahilf",
7: "Neubau",
8: "Josefstadt",
9: "Alsergrund",
10: "Favoriten",
11: "Simmering",
12: "Meidling",
13: "Hietzing",
14: "Penzing",
15: "Rudolfsheim-Fünfhaus",
16: "Ottakring",
17: "Hernals",
18: "Währing",
19: "Döbling",
20: "Brigittenau",
21: "Floridsdorf",
22: "Donaustadt",
23: "Liesing",
}
def normalise_name(name):
"""Normalise a district name to its canonical form."""
if not name or pd.isna(name):
return None
# Strip whitespace, handle known variations
n = name.strip()
# Some common variations in casing/spelling
lookup = {
"innere stadt": "Innere Stadt",
"1. bezirk": "Innere Stadt",
"1. bezirk innere stadt": "Innere Stadt",
"leopoldstadt": "Leopoldstadt",
"2. bezirk": "Leopoldstadt",
"landstraße": "Landstraße",
"landstrasse": "Landstraße",
"3. bezirk": "Landstraße",
"wieden": "Wieden",
"4. bezirk": "Wieden",
"margareten": "Margareten",
"5. bezirk": "Margareten",
"margarethen": "Margareten",
"mariahilf": "Mariahilf",
"6. bezirk": "Mariahilf",
"neubau": "Neubau",
"7. bezirk": "Neubau",
"josefstadt": "Josefstadt",
"8. bezirk": "Josefstadt",
"alsergrund": "Alsergrund",
"9. bezirk": "Alsergrund",
"favoriten": "Favoriten",
"10. bezirk": "Favoriten",
"simmering": "Simmering",
"11. bezirk": "Simmering",
"meidling": "Meidling",
"12. bezirk": "Meidling",
"hietzing": "Hietzing",
"13. bezirk": "Hietzing",
"penzing": "Penzing",
"14. bezirk": "Penzing",
"rudolfsheim-fünfhaus": "Rudolfsheim-Fünfhaus",
"rudolfsheim-fuenfhaus": "Rudolfsheim-Fünfhaus",
"rudolfsheim funfhaus": "Rudolfsheim-Fünfhaus",
"rudolfsheim fünfhaus": "Rudolfsheim-Fünfhaus",
"15. bezirk": "Rudolfsheim-Fünfhaus",
"ottakring": "Ottakring",
"16. bezirk": "Ottakring",
"hernals": "Hernals",
"17. bezirk": "Hernals",
"währing": "Währing",
"waehring": "Währing",
"18. bezirk": "Währing",
"döbling": "Döbling",
"doebling": "Döbling",
"19. bezirk": "Döbling",
"brigittenau": "Brigittenau",
"20. bezirk": "Brigittenau",
"floridsdorf": "Floridsdorf",
"21. bezirk": "Floridsdorf",
"donaustadt": "Donaustadt",
"22. bezirk": "Donaustadt",
"liesing": "Liesing",
"23. bezirk": "Liesing",
}
key = n.lower().replace(" ", " ").replace("-", "-")
if key in lookup:
return lookup[key]
# Try fuzzy matching
for pattern, canon in lookup.items():
if pattern in key or key in pattern:
return canon
return n
# ---------------------------------------------------------------------------
# 3. Main pipeline
# ---------------------------------------------------------------------------
def main():
print("Fetching current Vienna district boundaries...")
current_osm = fetch_vienna_districts()
current_gdf = osm_json_to_gdf(current_osm)
print(f" Got {len(current_gdf)} current district features")
print("Fetching 2014-01-01 Vienna district boundaries...")
historical_osm = fetch_vienna_districts(date="2014-01-01T00:00:00Z")
historical_gdf = osm_json_to_gdf(historical_osm)
print(f" Got {len(historical_gdf)} historical district features")
if current_gdf.empty or historical_gdf.empty:
print("ERROR: Empty data returned. Saving raw Overpass responses for debugging.")
with open("/work/current_raw.json", "w") as f:
json.dump(current_osm, f, indent=2)
with open("/work/historical_raw.json", "w") as f:
json.dump(historical_osm, f, indent=2)
print("Raw data saved to /work/current_raw.json and /work/historical_raw.json")
return
# Normalise names
current_gdf["district_name"] = current_gdf["name"].apply(normalise_name)
historical_gdf["district_name"] = historical_gdf["name"].apply(normalise_name)
print("\nCurrent district names:")
for n in sorted(current_gdf["district_name"].dropna().unique()):
print(f" - {n}")
print("\nHistorical district names:")
for n in sorted(historical_gdf["district_name"].dropna().unique()):
print(f" - {n}")
# Remove non-district features (like "Amtshaus für den 16. Bezirk")
valid_names = set(VIENNA_DISTRICTS.values())
current_gdf = current_gdf[current_gdf["district_name"].isin(valid_names)].copy()
historical_gdf = historical_gdf[historical_gdf["district_name"].isin(valid_names)].copy()
print(f"\nAfter filtering: {len(current_gdf)} current, {len(historical_gdf)} historical districts")
# Check for matching districts
current_names = set(current_gdf["district_name"].dropna())
historical_names = set(historical_gdf["district_name"].dropna())
common = current_names & historical_names
only_current = current_names - historical_names
only_historical = historical_names - current_names
print(f"Common districts: {len(common)}")
if only_current:
print(f"Only in current: {only_current}")
if only_historical:
print(f"Only in historical: {only_historical}")
# Build a unified dissolve of all current and all historical boundaries
current_union = current_gdf.geometry.unary_union
historical_union = historical_gdf.geometry.unary_union
print(f"\nCurrent total boundary area: {current_union.area:.4f} sq deg")
print(f"Historical total boundary area: {historical_union.area:.4f} sq deg")
# Symmetric difference gives changed areas
added = shapely.difference(current_union, historical_union)
removed = shapely.difference(historical_union, current_union)
unchanged = shapely.intersection(current_union, historical_union)
print(f"Added area: {added.area:.6f}")
print(f"Removed area: {removed.area:.6f}")
print(f"Unchanged area: {unchanged.area:.6f}")
# Prepare fragments
fragments = []
# Added fragments
if not added.is_empty:
if added.geom_type == "MultiPolygon":
for poly in added.geoms:
fragments.append((shapely.geometry.MultiPolygon([poly]), "added_since_2014"))
elif added.geom_type == "Polygon":
fragments.append((shapely.geometry.MultiPolygon([added]), "added_since_2014"))
# Removed fragments
if not removed.is_empty:
if removed.geom_type == "MultiPolygon":
for poly in removed.geoms:
fragments.append((shapely.geometry.MultiPolygon([poly]), "removed_since_2014"))
elif removed.geom_type == "Polygon":
fragments.append((shapely.geometry.MultiPolygon([removed]), "removed_since_2014"))
# Unchanged fragments - we can split by district for better attribution
unchanged_frags = []
for _, row in current_gdf.iterrows():
dist_name = row["district_name"]
dist_geom = row["geometry"]
unchanged_part = shapely.intersection(dist_geom, unchanged)
if not unchanged_part.is_empty:
if unchanged_part.geom_type == "MultiPolygon":
for poly in unchanged_part.geoms:
unchanged_frags.append((shapely.geometry.MultiPolygon([poly]), "unchanged", dist_name))
elif unchanged_part.geom_type == "Polygon":
unchanged_frags.append((shapely.geometry.MultiPolygon([unchanged_part]), "unchanged", dist_name))
else:
unchanged_frags.append((shapely.geometry.MultiPolygon([unchanged_part]), "unchanged", dist_name))
# For added/removed, determine which district they belong to
# We'll assign the district_name based on which current/historical district contains/intersects them
added_with_dist = []
for geom, ctype in fragments:
if ctype == "added_since_2014":
# Find which current district(s) this belongs to
touching_districts = set()
for _, row in current_gdf.iterrows():
if geom.intersects(row["geometry"]):
touching_districts.add(row["district_name"])
dist_name = ", ".join(sorted(touching_districts)) if touching_districts else "unknown"
added_with_dist.append((geom, ctype, dist_name))
elif ctype == "removed_since_2014":
touching_districts = set()
for _, row in historical_gdf.iterrows():
if geom.intersects(row["geometry"]):
touching_districts.add(row["district_name"])
dist_name = ", ".join(sorted(touching_districts)) if touching_districts else "unknown"
added_with_dist.append((geom, ctype, dist_name))
# Combine all fragments
all_fragments = []
for geom, ctype, dname in added_with_dist:
all_fragments.append({"geometry": geom, "change_type": ctype, "district_name": dname})
for geom, ctype, dname in unchanged_frags:
all_fragments.append({"geometry": geom, "change_type": ctype, "district_name": dname})
# Build GeoDataFrame
result_gdf = gpd.GeoDataFrame(all_fragments, crs="EPSG:4326")
if result_gdf.empty:
print("No fragments produced — all boundaries unchanged.")
# Create a dummy feature showing all current boundaries as unchanged
for _, row in current_gdf.iterrows():
result_gdf = gpd.GeoDataFrame([
{"geometry": row["geometry"], "change_type": "unchanged", "district_name": row["district_name"]}
], crs="EPSG:4326")
else:
print(f"\nFragments created: {len(result_gdf)}")
print(result_gdf["change_type"].value_counts())
# -----------------------------------------------------------------------
# 4. Compute touches_changed for each district
# -----------------------------------------------------------------------
# For each district, check if its geometry borders (touches) any added or
# removed area.
# Get all added/removed geometries combined
changed_areas = []
for _, row in result_gdf.iterrows():
if row["change_type"] in ("added_since_2014", "removed_since_2014"):
changed_areas.append(row["geometry"])
if changed_areas:
changed_union = shapely.unary_union(changed_areas)
else:
changed_union = shapely.geometry.MultiPolygon()
# For each district in current data, check if it touches changed areas
district_touches = {}
for _, row in current_gdf.iterrows():
dname = row["district_name"]
geom = row["geometry"]
if not changed_union.is_empty and geom.touches(changed_union):
district_touches[dname] = True
else:
# Also check if the district itself IS an added/removed part
# i.e. if the district geometry overlaps with changed areas
if not changed_union.is_empty and geom.intersects(changed_union):
# Check if it's more than just touching - if it contains changed area
overlap = shapely.intersection(geom, changed_union)
if not overlap.is_empty and overlap.area > 0:
district_touches[dname] = True
else:
district_touches[dname] = False
else:
district_touches[dname] = False
# Assign touches_changed to each fragment
result_gdf["touches_changed"] = result_gdf["district_name"].apply(
lambda dn: any(district_touches.get(d.strip(), False) for d in dn.split(", "))
if dn != "unknown" else False
)
# Simplify geometries a bit to keep file size manageable
result_gdf["geometry"] = result_gdf["geometry"].simplify(0.00001, preserve_topology=True)
# Output
output_path = "/work/vienna_boundary_changes.geojson"
result_gdf.to_file(output_path, driver="GeoJSON")
print(f"\nOutput written to {output_path}")
print(f"Features: {len(result_gdf)}")
print(f"Columns: {result_gdf.columns.tolist()}")
print(f"\nSample:\n{result_gdf[['change_type', 'district_name', 'touches_changed']].head(10)}")
print(f"\nDistrict touch summary:")
for dname, touches in sorted(district_touches.items()):
print(f" {dname}: touches_changed={touches}")
if __name__ == "__main__":
main()
Wrote /work/solve.py (19800 chars).
2:06
Let me run the script to test it.
2:06
Bash
cd /work && timeout 300 python solve.py 2>&1
Fetching current Vienna district boundaries...
Traceback (most recent call last):
File "/work/solve.py", line 515, in <module>
main()
~~~~^^
File "/work/solve.py", line 315, in main
current_osm = fetch_vienna_districts()
File "/work/solve.py", line 203, in fetch_vienna_districts
return overpass_query(query, date=date)
File "/work/solve.py", line 36, in overpass_query
with urllib.request.urlopen(req, timeout=120) as resp:
~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.14/urllib/request.py", line 187, in urlopen
return opener.open(url, data, timeout)
~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.14/urllib/request.py", line 493, in open
response = meth(req, response)
File "/usr/local/lib/python3.14/urllib/request.py", line 602, in http_response
response = self.parent.error(
'http', request, response, code, msg, hdrs)
File "/usr/local/lib/python3.14/urllib/request.py", line 531, in error
return self._call_chain(*args)
~~~~~~~~~~~~~~~~^^^^^^^
File "/usr/local/lib/python3.14/urllib/request.py", line 464, in _call_chain
result = func(*args)
File "/usr/local/lib/python3.14/urllib/request.py", line 611, in http_error_default
raise HTTPError(req.full_url, code, msg, hdrs, fp)
urllib.error.HTTPError: HTTP Error 406: Not Acceptable
2:15
The Overpass API is returning a 406 error. Let me check the query format. The issue might be that we're using GET with data in the body, which Overpass doesn't accept. We need to POST properly. Let me check the official Overpass API specs. Actually, looking at the code, I'm using `urllib.request.quote` on the data and sending it as a POST with Content-Type `application/x-www-form-urlencoded`. But Overpass API typically expects the data parameter to be sent as POST body with Content-Type `application/x-www-form-urlencoded`. Let me check the URL format. Overpass API (https://overpass-api.de/api/interpreter) accepts POST requests with the query in the `data` parameter. Let me try using a different approach - maybe using `requests` library or fixing the query. Let me also check if we can use `osmnx` with a date parameter instead. Actually, let me try using `requests` and the correct format.
2:15
Bash
cd /work && python3 -c "
cd /work && python3 -c "
import requests
query = '''
[out:json][timeout:120];
(
relation[\"admin_level\"=\"9\"][\"boundary\"=\"administrative\"](48.1,16.1,48.35,16.6);
);
out body;
>;
out skel qt;
'''
r = requests.post('https://overpass-api.de/api/interpreter', data={'data': query}, timeout=120)
print('Status:', r.status_code)
print('Len:', len(r.text))
if r.status_code == 200:
import json
data = r.json()
print('Elements:', len(data.get('elements', [])))
else:
print(r.text[:500])
"
Status: 406 Len: 371 <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" "http://www.w3.org/TR/html4/strict.dtd"> <html><head> <title>406 Not Acceptable</title> </head><body> <h1>Not Acceptable</h1> <p>An appropriate representation of the requested resource could not be found on this server.</p> <hr> <address>Apache/2.4.67 (Debian) Server at overpass-api.de Port 443</address> </body></html>
2:19
The Overpass API is returning 406 Not Acceptable. This might be due to content-type negotiation. Let me check the Overpass API requirements. The older API uses a different content-type approach. Let me try with the right format.
2:19
Bash
curl -s -X POST -d 'data=[out:json];relation["admin_level"="9"]["boundary"="admi...
bash: line 1: curl: command not found
2:24
Let me fix the Overpass query to use proper POST with requests library.
2:24
Bash
cd /work && python3 -c "
cd /work && python3 -c "
import requests
query = '''
[out:json][timeout:120];
(
relation[\"admin_level\"=\"9\"][\"boundary\"=\"administrative\"](48.1,16.1,48.35,16.6);
);
out body;
>;
out skel qt;
'''
# Use proper POST with data parameter
r = requests.post('https://overpass-api.de/api/interpreter', data={'data': query}, timeout=120,
headers={'Accept': 'application/json'})
print('Status:', r.status_code)
print('Content-Type:', r.headers.get('content-type'))
if r.status_code == 200:
import json
data = r.json()
print('Elements:', len(data.get('elements', [])))
for el in data.get('elements', [])[:5]:
print(' type:', el.get('type'), 'id:', el.get('id'), 'tags:', {k:v for k,v in (el.get('tags') or {}).items() if k in ('name','admin_level','boundary')})
else:
print(r.text[:500])
"
Status: 406 Content-Type: text/html; charset=iso-8859-1 <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" "http://www.w3.org/TR/html4/strict.dtd"> <html><head> <title>406 Not Acceptable</title> </head><body> <h1>Not Acceptable</h1> <p>An appropriate representation of the requested resource could not be found on this server.</p> <hr> <address>Apache/2.4.67 (Debian) Server at overpass-api.de Port 443</address> </body></html>
2:32
The 406 error might be caused by the request format. Let me try using urllib properly or check what's going on with the Overpass API.
2:32
Bash
cd /work && python3 -c "
cd /work && python3 -c "
import urllib.request, urllib.parse
query = '''
[out:json][timeout:120];
(
relation[\"admin_level\"=\"9\"][\"boundary\"=\"administrative\"](48.1,16.1,48.35,16.6);
);
out body;
>;
out skel qt;
'''
# Use urllib with proper POST
data = urllib.parse.urlencode({'data': query}).encode()
req = urllib.request.Request('https://overpass-api.de/api/interpreter', data=data)
req.add_header('Accept', 'application/json')
try:
with urllib.request.urlopen(req, timeout=120) as resp:
import json
data = json.loads(resp.read().decode())
print('Elements:', len(data.get('elements', [])))
except Exception as e:
print('Error:', e)
# Try alternative endpoint
try:
req2 = urllib.request.Request('https://overpass-api.de/api/interpreter', data=data)
req2.add_header('Content-Type', 'application/x-www-form-urlencoded')
with urllib.request.urlopen(req2, timeout=120) as resp:
import json
data = json.loads(resp.read().decode())
print('Elements:', len(data.get('elements', [])))
except Exception as e2:
print('Error2:', e2)
"
Error: HTTP Error 406: Not Acceptable Error2: HTTP Error 406: Not Acceptable
2:38
The Overpass API is returning 406 errors. Let me try using a different approach - maybe using OSMnx to fetch historical data or a different way to query Overpass. Actually, let me try the Overpass Turbo endpoint or a simpler query format.
2:38
Bash
# Let's try with a simpler query and different content type
# Let's try with a simpler query and different content type
cd /work && python3 << 'EOF'
import urllib.request, urllib.parse
# Simple query first
query = """
[out:json];
node(48.2,16.3,48.21,16.31);
out;
"""
data = urllib.parse.urlencode({'data': query}).encode()
req = urllib.request.Request('https://overpass-api.de/api/interpreter', data=data)
req.add_header('Content-Type', 'application/x-www-form-urlencoded')
try:
with urllib.request.urlopen(req, timeout=30) as resp:
text = resp.read().decode()
print('OK, len:', len(text))
print(text[:300])
except Exception as e:
print('Error:', e)
# Try without Content-Type header
req2 = urllib.request.Request('https://overpass-api.de/api/interpreter', data=data)
try:
with urllib.request.urlopen(req2, timeout=30) as resp2:
text = resp2.read().decode()
print('OK2, len:', len(text))
print(text[:300])
except Exception as e2:
print('Error2:', e2)
EOF
Error: HTTP Error 406: Not Acceptable Error2: HTTP Error 406: Not Acceptable
2:42
The Overpass API is returning 406 Not Acceptable. This might mean the IP is blocked or the API has been updated. Let me try using OSMnx directly which uses its own Overpass client, or perhaps try the Kumi Systems overpass endpoint. Let me try an alternative approach - use OSMnx's built-in Overpass capabilities or use a different Overpass endpoint.
2:42
Bash
# Check if osmnx can do this - it has its own overpass handling
# Check if osmnx can do this - it has its own overpass handling
cd /work && python3 -c "
import osmnx as ox
ox.settings.log_console = False
print(ox.__version__)
# Try using osmnx's internal overpass request
print('Overpass endpoint:', ox.settings.overpass_endpoint)
"
2.1.0
Traceback (most recent call last):
File "<string>", line 6, in <module>
print('Overpass endpoint:', ox.settings.overpass_endpoint)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
AttributeError: module 'osmnx.settings' has no attribute 'overpass_endpoint'
2:49
The 406 error means the Overpass API isn't accepting our requests. This might be a server-side issue or we need to use a different approach. Let me try using osmnx's own capabilities to fetch data, or try a different Overpass endpoint. Let me try using osmnx's features_from_place with the extract_bbox parameter or using the overpass API with the correct format. Actually, let me try a slightly different Overpass query format - the issue might be with the brackets/formatting.
2:49
Bash
cd /work && python3 << 'PYEOF'
cd /work && python3 << 'PYEOF'
import urllib.request, urllib.parse
# Try with properly escaped query
query = '[out:json];relation["admin_level"="9"]["boundary"="administrative"](48.1,16.1,48.35,16.6);out body;>;out skel qt;'
data = urllib.parse.urlencode({'data': query}).encode()
print("Query:", query[:80])
req = urllib.request.Request('https://overpass-api.de/api/interpreter', data=data)
# Don't set Accept, let server decide
try:
with urllib.request.urlopen(req, timeout=120) as resp:
text = resp.read().decode()
print('OK, len:', len(text))
print(text[:500])
except Exception as e:
print('Error:', e)
# Try with curl
PYEOF
Query: [out:json];relation["admin_level"="9"]["boundary"="administrative"](48.1,16.1,48 Error: HTTP Error 406: Not Acceptable
2:52
The Overpass API is returning 406 errors. Let me check if maybe the server requires a different approach. Let's try without URL encoding, or use requests library with proper settings.
2:52
Bash
cd /work && python3 << 'PYEOF'
cd /work && python3 << 'PYEOF'
import osmnx as ox
ox.settings.log_console = False
# Use osmnx to fetch data - it handles overpass internally
# First get the city boundary
vienna = ox.geocode_to_gdf('Wien, Österreich', which_result=1)
print(vienna[['name', 'geometry']].head())
print(vienna.crs)
# Now get the districts using features_from_polygon
poly = vienna.geometry.iloc[0]
gdf = ox.features_from_polygon(poly, tags={'boundary': 'administrative', 'admin_level': '9'})
print('\nAll features:', len(gdf))
# Filter to relation type only (districts)
rel = gdf[gdf.index.get_level_values('element') == 'relation']
print('Relations:', len(rel))
print('Names:', rel['name'].unique() if 'name' in rel.columns else 'no name col')
PYEOF
name geometry
0 Vienna POLYGON ((16.18183 48.17112, 16.1819 48.17103,...
epsg:4326
All features: 897
Relations: 169
Names: <ArrowStringArray>
[ 'Österreich', 'Bezirk St. Pölten',
'Niederösterreich', 'Deutsch-Wagram',
'Aderklaa', 'Raasdorf',
'Bezirk Gänserndorf', 'Groß-Enzersdorf',
'Bezirk Tulln', 'Bezirk Korneuburg',
...
'Katastralgemeinde Fischamend Dorf', 'Katastralgemeinde Langenzersdorf',
'Katastralgemeinde Kapellerfeld', 'Katastralgemeinde Weidlingbach',
'Katastralgemeinde Weidling', 'Katastralgemeinde Klosterneuburg',
'Katastralgemeinde Aderklaa', 'Katastralgemeinde Mauerbach',
'Katastralgemeinde Laab im Walde', 'Katastralgemeinde Purkersdorf']
Length: 168, dtype: str
3:01
The issue is that many relations are returned. Let me filter more carefully - I need to only get relations with admin_level=9 that are within Vienna proper (not the surrounding areas). Let me check the admin_level values more carefully.
3:01
Bash
cd /work && python3 << 'PYEOF'
cd /work && python3 << 'PYEOF'
import osmnx as ox
import geopandas as gpd
ox.settings.log_console = False
vienna = ox.geocode_to_gdf('Wien, Österreich', which_result=1)
poly = vienna.geometry.iloc[0]
gdf = ox.features_from_polygon(poly, tags={'boundary': 'administrative', 'admin_level': '9'})
rel = gdf[gdf.index.get_level_values('element') == 'relation']
# Check admin_level values
print('admin_level values:', rel['admin_level'].dropna().unique())
print()
# Filter to admin_level=9
rel9 = rel[rel['admin_level'] == '9']
print('admin_level=9 count:', len(rel9))
print('Names:', sorted(rel9['name'].dropna().unique()))
print()
# Also check 'boundary' column
print('boundary values:', rel9['boundary'].dropna().unique())
print('type values:', rel9['type'].dropna().unique())
PYEOF
admin_level values: <ArrowStringArray> ['2', '6', '4', '8', '10', '9'] Length: 6, dtype: str admin_level=9 count: 24 Names: ['Alsergrund', 'Amtshaus für den 16. Bezirk', 'Brigittenau', 'Donaustadt', 'Döbling', 'Favoriten', 'Floridsdorf', 'Hernals', 'Hietzing', 'Innere Stadt', 'Josefstadt', 'Landstraße', 'Leopoldstadt', 'Liesing', 'Margareten', 'Mariahilf', 'Meidling', 'Neubau', 'Ottakring', 'Penzing', 'Rudolfsheim-Fünfhaus', 'Simmering', 'Wieden', 'Währing'] boundary values: <ArrowStringArray> ['administrative'] Length: 1, dtype: str type values: <ArrowStringArray> ['boundary', 'multipolygon'] Length: 2, dtype: str
3:08
Good, so osmnx can fetch current data successfully. Now I need to figure out how to get the 2014-01-01 data. OSMnx might support this through its settings, or I need to use overpass with date filtering. Let me check if osmnx supports historical data, or if I need a different approach. Actually, the 406 error from Overpass is weird. Let me try to use osmnx's own overpass downloading mechanism or try directly querying overpass through osmnx.
3:08
Bash
cd /work && python3 << 'PYEOF'
cd /work && python3 << 'PYEOF'
import osmnx as ox
ox.settings.log_console = False
print(ox.settings.default_language)
# Check if we can customize the overpass query
# Look at all settings
for attr in dir(ox.settings):
if not attr.startswith('_'):
val = getattr(ox.settings, attr)
if not callable(val):
print(f"{attr} = {val}")
PYEOF
Traceback (most recent call last): File "<stdin>", line 3, in <module> AttributeError: module 'osmnx.settings' has no attribute 'default_language'
3:14
Let me try a different approach. Since osmnx can fetch the current data successfully, I can use osmnx for current and write a custom Overpass query using the `requests` library with proper handling. But the Overpass API is returning 406 errors, which might be related to the Accept header. Let me try using osmnx's internal mechanisms. Actually, osmnx has its own way to make Overpass queries. Let me look at the osmnx source to see how it does it, or simply use osmnx's own caching and download functionality with the `date` parameter. Let me check if osmnx supports date-based queries.
3:14
Bash
cd /work && python3 << 'PYEOF'
cd /work && python3 << 'PYEOF' import osmnx as ox ox.settings.log_console = False # Check if there's a way to do historical data # The overpass library that osmnx uses internally import osmnx._overpass as op print(dir(op)) PYEOF
['Any', 'InsufficientResponseError', 'OrderedDict', 'RequestsConnectionError', 'TYPE_CHECKING', '__builtins__', '__cached__', '__doc__', '__file__', '__loader__', '__name__', '__package__', '__spec__', '_create_overpass_features_query', '_download_overpass_features', '_download_overpass_network', '_get_network_filter', '_get_overpass_pause', '_http', '_make_overpass_polygon_coord_strs', '_make_overpass_settings', '_overpass_request', 'annotations', 'dt', 'lg', 'np', 'projection', 'requests', 'settings', 'time', 'utils', 'utils_geo']
3:18
OK, so osmnx works for current data. The issue is getting 2014 data from Overpass. The HTTP 406 error is weird - maybe the server is blocking requests from this IP? Let's try a different approach. Let me try using osmnx for current data and use the Overpass API directly with the proper `date` parameter for historical data. The 406 might be related to the `Accept` header or similar. Let me check what osmnx does internally.
3:18
Bash
cd /work && python3 << 'PYEOF'
cd /work && python3 << 'PYEOF' import osmnx as ox import osmnx._overpass as op import inspect print(inspect.getsource(op._overpass_request)[:2000]) PYEOF
def _overpass_request(data: OrderedDict[str, Any]) -> dict[str, Any]:
"""
Send a HTTP POST request to the Overpass API and return response.
Parameters
----------
data
Key-value pairs of parameters.
Returns
-------
response_json
The Overpass API's response.
"""
# resolve url to same IP even if there is server round-robin redirecting
_http._config_dns(settings.overpass_url)
# prepare the Overpass API URL and see if request already exists in cache
url = settings.overpass_url.rstrip("/") + "/interpreter"
prepared_url = str(requests.Request("GET", url, params=data).prepare().url)
cached_response_json = _http._retrieve_from_cache(prepared_url)
if isinstance(cached_response_json, dict):
return cached_response_json
# pause then request this URL
pause = _get_overpass_pause(settings.overpass_url)
hostname = _http._hostname_from_url(url)
msg = f"Pausing {pause} second(s) before making HTTP POST request to {hostname!r}"
utils.log(msg, level=lg.INFO)
time.sleep(pause)
# transmit the HTTP POST request
msg = f"Post {prepared_url} with timeout={settings.requests_timeout}"
utils.log(msg, level=lg.INFO)
response = requests.post(
url,
data=data,
timeout=settings.requests_timeout,
headers=_http._get_http_headers(),
**settings.requests_kwargs,
)
# handle 429 and 504 errors by pausing then recursively re-trying request
if response.status_code in {429, 504}: # pragma: no cover
error_pause = 55
msg = (
f"{hostname!r} responded {response.status_code} {response.reason}: "
f"we'll retry in {error_pause} secs"
)
utils.log(msg, level=lg.WARNING)
time.sleep(error_pause)
return _overpass_request(data)
response_json = _http._parse_response(response)
if not isinstance(response_json, dict): # pragma: no cover
msg = "Overpass AP
3:22
Let me look at how osmnx creates overpass queries to see if I can add a date parameter. Let me look at the full function and the `_create_overpass_features_query` function.
3:22
Bash
cd /work && python3 << 'PYEOF'
cd /work && python3 << 'PYEOF' import osmnx as ox import osmnx._overpass as op import inspect # Get the source for _create_overpass_features_query print(inspect.getsource(op._create_overpass_features_query)) PYEOF
def _create_overpass_features_query( # noqa: PLR0912
polygon_coord_str: str,
tags: dict[str, bool | str | list[str]],
) -> str:
"""
Create an Overpass features query string based on tags.
Parameters
----------
polygon_coord_str
The lat lon coordinates.
tags
Tags used for finding elements in the search area.
Returns
-------
query
The Overpass features query.
"""
# create overpass settings string
overpass_settings = _make_overpass_settings()
# make sure every value in dict is bool, str, or list of str
err_msg = "`tags` must be a dict with values of bool, str, or list of str."
if not isinstance(tags, dict): # pragma: no cover
raise TypeError(err_msg)
tags_dict: dict[str, bool | str | list[str]] = {}
for key, value in tags.items():
if isinstance(value, bool):
tags_dict[key] = value
elif isinstance(value, str):
tags_dict[key] = [value]
elif isinstance(value, list):
if not all(isinstance(s, str) for s in value): # pragma: no cover
raise TypeError(err_msg)
tags_dict[key] = value
else: # pragma: no cover
raise TypeError(err_msg)
# convert the tags dict into a list of {tag:value} dicts
tags_list: list[dict[str, bool | str | list[str]]] = []
for key, value in tags_dict.items():
if isinstance(value, bool):
tags_list.append({key: value})
else:
for value_item in value:
tags_list.append({key: value_item}) # noqa: PERF401
# add node/way/relation query components one at a time
components = []
for d in tags_list:
for key, value in d.items():
if isinstance(value, bool):
# if bool (ie, True) just pass the key, no value
tag_str = f"[{key!r}](poly:{polygon_coord_str!r});(._;>;);"
else:
# otherwise, pass "key"="value"
tag_str = f"[{key!r}={value!r}](poly:{polygon_coord_str!r});(._;>;);"
for kind in ("node", "way", "relation"):
components.append(f"({kind}{tag_str});") # noqa: PERF401
# finalize query and return
components_str = "".join(components)
return f"{overpass_settings};({components_str});out;"
3:26
Now I understand how osmnx builds queries. The problem with our direct Overpass API calls is likely the content negotiation. Let me look at what osmnx's HTTP headers are.
3:26
Bash
cd /work && python3 << 'PYEOF'
cd /work && python3 << 'PYEOF' import osmnx as ox import osmnx._http as http_mod import inspect print(inspect.getsource(http_mod._get_http_headers)) PYEOF
def _get_http_headers(
*,
user_agent: str | None = None,
referer: str | None = None,
accept_language: str | None = None,
) -> dict[str, str]:
"""
Update the default requests HTTP headers with OSMnx information.
Parameters
----------
user_agent
The user agent. If None, use `settings.http_user_agent` value.
referer
The referer. If None, use `settings.http_referer` value.
accept_language
The accept language. If None, use `settings.http_accept_language`
value.
Returns
-------
headers
The updated HTTP headers.
"""
if user_agent is None:
user_agent = settings.http_user_agent
if referer is None:
referer = settings.http_referer
if accept_language is None:
accept_language = settings.http_accept_language
info = {"User-Agent": user_agent, "referer": referer, "Accept-Language": accept_language}
headers = dict(requests.utils.default_headers())
headers.update(info)
return headers
3:31
I see - the 406 error is coming from our own code not setting proper headers. Let me use osmnx's internal overpass functions to make the request, or use requests library directly with proper headers. Let me check the osmnx settings for overpass URL. Actually, let me just use the `requests` library with the same approach osmnx uses internally.
3:31
Bash
cd /work && python3 << 'PYEOF'
cd /work && python3 << 'PYEOF'
import osmnx.settings
print('overpass_url:', osmnx.settings.overpass_url)
print('http_user_agent:', osmnx.settings.http_user_agent)
print('http_referer:', osmnx.settings.http_referer)
print('http_accept_language:', osmnx.settings.http_accept_language)
PYEOF
overpass_url: https://overpass-api.de/api http_user_agent: OSMnx Python package (https://github.com/gboeing/osmnx) http_referer: OSMnx Python package (https://github.com/gboeing/osmnx) http_accept_language: en
3:41
The issue is that my direct requests to Overpass API are getting 406 errors. This might be because the server is blocking certain user agents. Let me use osmnx's internal overpass handling to do the queries since it works fine.
The plan:
1. Use osmnx to fetch current districts via `ox.features_from_polygon` with `tags={'boundary': 'administrative', 'admin_level': '9'}` and filter to relations with admin_level=9
2. For historical data (2014-01-01), I need to modify the overpass query to add a date parameter. I can use osmnx's internal `_overpass_request` function to send a custom query with the date parameter.
Let me construct a proper overpass query with date and use osmnx's internal mechanisms.
3:41
Bash
cd /work && python3 << 'PYEOF'
cd /work && python3 << 'PYEOF'
import osmnx as ox
import osmnx._overpass as op
import osmnx.settings as settings
from collections import OrderedDict
import json
ox.settings.log_console = False
vienna = ox.geocode_to_gdf('Wien, Österreich', which_result=1)
poly = vienna.geometry.iloc[0]
# Get polygon coord string
poly_coord_str = op._make_overpass_polygon_coord_strs(poly)[0]
print("Poly coord str (first 100):", poly_coord_str[:100])
# Build a custom query with date for historical data
overpass_settings = op._make_overpass_settings()
print("Settings:", overpass_settings)
# Build query for admin_level=9 boundary=administrative relations
tags_query = 'relation["admin_level"="9"]["boundary"="administrative"]'
query = f"{overpass_settings};(poly:\"{poly_coord_str}\")->.search_area;({tags_query}(area.search_area););out body;>;out skel qt;"
print("Query:", query[:200])
# Try with date
date_query = f"[date:\"2014-01-01T00:00:00Z\"];\n{overpass_settings};(poly:\"{poly_coord_str}\")->.search_area;({tags_query}(area.search_area););out body;>;out skel qt;"
print("Date query:", date_query[:200])
# Use _overpass_request
try:
data = OrderedDict()
data['data'] = date_query
result = op._overpass_request(data)
print("Success! Elements:", len(result.get('elements', [])))
for el in result.get('elements', [])[:5]:
tags = el.get('tags', {})
print(f" {el.get('type')} {el.get('id')}: name={tags.get('name')}, admin_level={tags.get('admin_level')}")
except Exception as e:
print(f"Error: {e}")
PYEOF
Poly coord str (first 100): 48.171120 16.181831 48.171031 16.181902 48.169754 16.182927 48.169331 16.183994 48.169236 16.184241
Settings: [out:json][timeout:180]
Query: [out:json][timeout:180];(poly:"48.171120 16.181831 48.171031 16.181902 48.169754 16.182927 48.169331 16.183994 48.169236 16.184241 48.168928 16.184610 48.168093 16.187748 48.166351 16.193729 48.163962
Date query: [date:"2014-01-01T00:00:00Z"];
[out:json][timeout:180];(poly:"48.171120 16.181831 48.171031 16.181902 48.169754 16.182927 48.169331 16.183994 48.169236 16.184241 48.168928 16.184610 48.168093 16.18774
Error: 'overpass-api.de' responded: 400 Bad Request <?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
<head>
<meta http-equiv="content-type" content="text/html; charset=utf-8" lang="en"/>
<title>OSM3S Response</title>
</head>
<body>
<p>The data included in this document is from www.openstreetmap.org. The data is made available under ODbL.</p>
<p><strong style="color:#FF0000">Error</strong>: line 2: parse error: Unknown type "[" </p>
<p><strong style="color:#FF0000">Error</strong>: line 2: parse error: An empty query is not allowed </p>
<p><strong style="color:#FF0000">Error</strong>: line 2: parse error: ';' expected - 'out' found. </p>
<p><strong style="color:#FF0000">Error</strong>: line 2: parse error: Unknown type "poly" </p>
<p><strong style="color:#FF0000">Error</strong>: line 2: parse error: An empty query is not allowed </p>
<p><strong style="color:#FF0000">Error</strong>: line 2: parse error: ';' expected - ':' found. </p>
<p><strong style="color:#FF0000">Error</strong>: line 2: parse error: Unexpected end of input. </p>
<p><strong style="color:#FF0000">Error</strong>: line 2: static error: Element "print" cannot be subelement of element "union". </p>
<p><strong style="color:#FF0000">Error</strong>: line 2: static error: Element "print" cannot be subelement of element "union". </p>
</body>
</html>
3:55
OK, the date syntax needs to be proper. In Overpass QL, the date is specified as a global setting: `[date:"2014-01-01T00:00:00Z"]` must be on its own line before any statements. Let me try a simpler query with proper formatting. Actually the issue is that the date setting must be placed at the beginning, before the `[out:json]` or after. Let me check the Overpass QL documentation. In Overpass QL, you place `[date:"..."]` at the very beginning of the query. Let me try a cleaner approach - use OSMnx for current data (since it works) and write a direct Overpass query for historical data.
3:55
Bash
cd /work && python3 << 'PYEOF'
cd /work && python3 << 'PYEOF'
import osmnx as ox
import osmnx._overpass as op
from collections import OrderedDict
ox.settings.log_console = False
# Simple test query with date
query = "[date:\"2014-01-01T00:00:00Z\"];\n[out:json][timeout:180];\nrelation[\"admin_level\"=\"9\"][\"boundary\"=\"administrative\"](48.1,16.1,48.35,16.6);\nout body;\n>;\nout skel qt;"
print("Query:")
print(query)
print()
data = OrderedDict()
data['data'] = query
try:
result = op._overpass_request(data)
print("Success! Elements:", len(result.get('elements', [])))
for el in result.get('elements', [])[:10]:
tags = el.get('tags', {})
print(f" {el.get('type')} {el.get('id')}: name={tags.get('name')}, admin_level={tags.get('admin_level')}")
except Exception as e:
print(f"Error: {e}")
PYEOF
Query:
[date:"2014-01-01T00:00:00Z"];
[out:json][timeout:180];
relation["admin_level"="9"]["boundary"="administrative"](48.1,16.1,48.35,16.6);
out body;
>;
out skel qt;
Error: 'overpass-api.de' responded: 400 Bad Request <?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
<head>
<meta http-equiv="content-type" content="text/html; charset=utf-8" lang="en"/>
<title>OSM3S Response</title>
</head>
<body>
<p>The data included in this document is from www.openstreetmap.org. The data is made available under ODbL.</p>
<p><strong style="color:#FF0000">Error</strong>: line 2: parse error: Unknown type "[" </p>
<p><strong style="color:#FF0000">Error</strong>: line 2: parse error: An empty query is not allowed </p>
<p><strong style="color:#FF0000">Error</strong>: line 2: parse error: ';' expected - 'out' found. </p>
</body>
</html>
4:02
The Overpass API doesn't like the date setting on a separate line. Let me check the proper format for date queries in Overpass. The `[date:"..."]` should be placed inside the query block, typically after `[out:json]`. Let me try a different format.
4:02
Bash
cd /work && python3 << 'PYEOF'
cd /work && python3 << 'PYEOF'
import osmnx as ox
import osmnx._overpass as op
from collections import OrderedDict
ox.settings.log_console = False
# Proper Overpass QL with date - the date needs to be part of the settings
query = """
[out:json][timeout:180][date:"2014-01-01T00:00:00Z"];
relation["admin_level"="9"]["boundary"="administrative"](48.1,16.1,48.35,16.6);
out body;
>;
out skel qt;
"""
print("Query:")
print(query.strip())
data = OrderedDict()
data['data'] = query
try:
result = op._overpass_request(data)
print("Success! Elements:", len(result.get('elements', [])))
for el in result.get('elements', [])[:15]:
tags = el.get('tags', {})
print(f" {el.get('type')} {el.get('id')}: name={tags.get('name')}, admin_level={tags.get('admin_level')}")
except Exception as e:
print(f"Error: {e}")
PYEOF
Query: [out:json][timeout:180][date:"2014-01-01T00:00:00Z"]; relation["admin_level"="9"]["boundary"="administrative"](48.1,16.1,48.35,16.6); out body; >; out skel qt; Success! Elements: 5438 relation 1641752: name=Klein-Engersdorf, admin_level=9 relation 1641771: name=Enzersfeld, admin_level=9 relation 1643642: name=Gerasdorf bei Wien, admin_level=9 relation 1990590: name=Alsergrund, admin_level=9 relation 1990591: name=Hietzing, admin_level=9 relation 1990592: name=Innere Stadt, admin_level=9 relation 1990593: name=Josefstadt, admin_level=9 relation 1990594: name=Leopoldstadt, admin_level=9 relation 1990595: name=Mariahilf, admin_level=9 relation 1990596: name=Meidling, admin_level=9 relation 1990597: name=Neubau, admin_level=9 relation 1990598: name=Penzing, admin_level=9 relation 1990599: name=Rudolfsheim-Fünfhaus, admin_level=9 relation 1990600: name=Währing, admin_level=9 relation 1991416: name=Landstraße, admin_level=9
6:28
Great! The historical query works! Now let's also get the current snapshot and then build the complete solution.
6:28
6:28
Write
/work/solve.py
content · 470 lines · py
"""
Fetch Vienna district boundaries (current and 2014-01-01) from Overpass via osmnx,
compare to find changes, and output vienna_boundary_changes.geojson.
Each feature is a MultiPolygon geometry fragment with:
- change_type: added_since_2014 | removed_since_2014 | unchanged
- district_name: normalised district name(s) the fragment belongs to
- touches_changed: bool, whether that district's boundary touches any changed area
"""
import os
import sys
import json
from collections import OrderedDict
import osmnx as ox
import osmnx._overpass as op
import geopandas as gpd
import pandas as pd
import shapely
import shapely.geometry
import numpy as np
ox.settings.log_console = False
# ---------------------------------------------------------------------------
# 1. Overpass query via osmnx internals
# ---------------------------------------------------------------------------
def fetch_vienna_districts_osmnx(date=None):
"""Fetch Vienna district (admin_level=9) boundary relations via Overpass.
Uses osmnx's _overpass_request to handle HTTP correctly.
Returns a GeoDataFrame with columns: geometry, name, district_name, osm_id.
"""
# Bounding box covering Vienna
bbox = (48.1, 16.1, 48.35, 16.6) # south, west, north, east
if date:
query = f"""
[out:json][timeout:180][date:"{date}"];
relation["admin_level"="9"]["boundary"="administrative"]({bbox[0]},{bbox[1]},{bbox[2]},{bbox[3]});
out body;
>;
out skel qt;
"""
else:
query = f"""
[out:json][timeout:180];
relation["admin_level"="9"]["boundary"="administrative"]({bbox[0]},{bbox[1]},{bbox[2]},{bbox[3]});
out body;
>;
out skel qt;
"""
data = OrderedDict()
data['data'] = query
result = op._overpass_request(data)
elements = result.get("elements", [])
# Parse: collect nodes, ways, relations
nodes = {}
ways = {}
relations = []
for el in elements:
etype = el.get("type")
eid = el.get("id")
if etype == "node":
nodes[eid] = (el.get("lon"), el.get("lat"))
elif etype == "way":
ways[eid] = el.get("nodes", [])
elif etype == "relation":
relations.append(el)
def way_to_coords(way_id):
node_ids = ways.get(way_id, [])
return [nodes[nid] for nid in node_ids if nid in nodes]
features = []
for rel in relations:
tags = rel.get("tags", {})
name = tags.get("name", "")
admin_level = tags.get("admin_level", "")
# Only Vienna districts (admin_level=9)
if admin_level != "9":
continue
members = rel.get("members", [])
outer_rings = []
inner_rings = []
for mem in members:
if mem.get("type") != "way":
continue
role = mem.get("role", "")
ref = mem.get("ref")
if ref not in ways:
continue
coords = way_to_coords(ref)
if len(coords) < 3:
continue
if role == "inner":
inner_rings.append(coords)
else:
outer_rings.append(coords)
polygons = []
for outer in outer_rings:
if len(outer) < 3:
continue
if outer[0] != outer[-1]:
outer = outer + [outer[0]]
try:
outer_ls = shapely.LineString(outer)
if not outer_ls.is_valid or outer_ls.is_empty:
continue
inners_for_this = []
remaining = []
for inner in inner_rings:
if inner[0] != inner[-1]:
inner = inner + [inner[0]]
try:
inner_ls = shapely.LineString(inner)
if inner_ls.is_valid and not inner_ls.is_empty:
ipoly = shapely.Polygon(inner_ls)
outer_poly = shapely.Polygon(outer_ls)
if outer_poly.contains(ipoly.representative_point()) or outer_poly.intersects(ipoly):
inners_for_this.append(ipoly)
else:
remaining.append(inner)
else:
remaining.append(inner)
except Exception:
remaining.append(inner)
inner_rings = remaining
if inners_for_this:
poly = shapely.Polygon(outer_ls, [list(i.exterior.coords) for i in inners_for_this])
else:
poly = shapely.Polygon(outer_ls)
# Fix orientation and validate
poly = shapely.polygon.orient(poly)
if poly.is_valid and not poly.is_empty:
polygons.append(poly)
except Exception:
continue
if polygons:
geom = shapely.unary_union(polygons)
if geom.geom_type == "Polygon":
geom = shapely.geometry.MultiPolygon([geom])
if not geom.is_empty:
features.append({
"geometry": geom,
"name": name,
"osm_id": rel.get("id"),
})
if not features:
return gpd.GeoDataFrame()
gdf = gpd.GeoDataFrame(features, crs="EPSG:4326")
return gdf
# ---------------------------------------------------------------------------
# 2. Name normalisation
# ---------------------------------------------------------------------------
# Canonical Vienna district names
VIENNA_DISTRICTS_CANON = {
"innere stadt": "Innere Stadt",
"leopoldstadt": "Leopoldstadt",
"landstraße": "Landstraße",
"landstrasse": "Landstraße",
"wieden": "Wieden",
"margareten": "Margareten",
"margarethen": "Margareten",
"mariahilf": "Mariahilf",
"neubau": "Neubau",
"josefstadt": "Josefstadt",
"alsergrund": "Alsergrund",
"favoriten": "Favoriten",
"simmering": "Simmering",
"meidling": "Meidling",
"hietzing": "Hietzing",
"penzing": "Penzing",
"rudolfsheim-fünfhaus": "Rudolfsheim-Fünfhaus",
"rudolfsheim-fuenfhaus": "Rudolfsheim-Fünfhaus",
"ottakring": "Ottakring",
"hernals": "Hernals",
"währing": "Währing",
"waehring": "Währing",
"döbling": "Döbling",
"doebling": "Döbling",
"brigittenau": "Brigittenau",
"floridsdorf": "Floridsdorf",
"donaustadt": "Donaustadt",
"liesing": "Liesing",
}
def normalise_name(name):
"""Normalise a district name to its canonical form."""
if not name or pd.isna(name):
return None
n = name.strip()
key = n.lower().replace(" ", " ").replace("ß", "ss").replace("-", "-")
# Direct lookup
if key in VIENNA_DISTRICTS_CANON:
return VIENNA_DISTRICTS_CANON[key]
# try with ß vs ss variations
key2 = key.replace("ss", "ß")
if key2 in VIENNA_DISTRICTS_CANON:
return VIENNA_DISTRICTS_CANON[key2]
# Try fuzzy: if any lookup key is contained in our key
for pattern, canon in VIENNA_DISTRICTS_CANON.items():
if pattern in key or key in pattern:
return canon
return n
# ---------------------------------------------------------------------------
# 3. Main pipeline
# ---------------------------------------------------------------------------
def main():
print("=" * 60)
print("Fetching CURRENT Vienna district boundaries...")
print("=" * 60)
current_gdf = fetch_vienna_districts_osmnx()
print(f" Got {len(current_gdf)} current district features")
print("\n" + "=" * 60)
print("Fetching 2014-01-01 Vienna district boundaries...")
print("=" * 60)
historical_gdf = fetch_vienna_districts_osmnx(date="2014-01-01T00:00:00Z")
print(f" Got {len(historical_gdf)} historical district features")
if current_gdf.empty or historical_gdf.empty:
print("ERROR: Empty data returned!")
sys.exit(1)
# Normalise names
current_gdf["district_name"] = current_gdf["name"].apply(normalise_name)
historical_gdf["district_name"] = historical_gdf["name"].apply(normalise_name)
print("\nCurrent district names:")
for n in sorted(current_gdf["district_name"].dropna().unique()):
print(f" - {n}")
print("\nHistorical district names:")
for n in sorted(historical_gdf["district_name"].dropna().unique()):
print(f" - {n}")
# Remove non-Vienna-district features (e.g. Klein-Engersdorf, Enzersfeld in historical data)
valid_names = set(VIENNA_DISTRICTS_CANON.values())
current_gdf = current_gdf[current_gdf["district_name"].isin(valid_names)].copy()
historical_gdf = historical_gdf[historical_gdf["district_name"].isin(valid_names)].copy()
print(f"\nAfter filtering to Vienna districts only:")
print(f" Current: {len(current_gdf)} districts")
print(f" Historical: {len(historical_gdf)} districts")
# Check for missing districts
current_names = set(current_gdf["district_name"].dropna())
historical_names = set(historical_gdf["district_name"].dropna())
common = current_names & historical_names
only_current = current_names - historical_names
only_historical = historical_names - current_names
print(f"\nCommon districts: {len(common)}")
if only_current:
print(f"Only in current: {only_current}")
if only_historical:
print(f"Only in historical: {only_historical}")
# Build unified dissolve
current_union = current_gdf.geometry.unary_union
historical_union = historical_gdf.geometry.unary_union
print(f"\nCurrent total area: {current_union.area:.6f} sq deg")
print(f"Historical total area: {historical_union.area:.6f} sq deg")
# Symmetric difference
added = shapely.difference(current_union, historical_union)
removed = shapely.difference(historical_union, current_union)
unchanged = shapely.intersection(current_union, historical_union)
print(f"Added area: {added.area:.6f}" if not added.is_empty else "Added: empty")
print(f"Removed area: {removed.area:.6f}" if not removed.is_empty else "Removed: empty")
print(f"Unchanged area: {unchanged.area:.6f}" if not unchanged.is_empty else "Unchanged: empty")
# Collect fragments
fragments = []
# Helper to split MultiPolygon into individual fragments
def add_fragments(geom, ctype, dist_name_fn):
"""Add geometry fragments, assigning district name via dist_name_fn."""
if geom.is_empty:
return
if geom.geom_type == "MultiPolygon":
polys = list(geom.geoms)
elif geom.geom_type == "Polygon":
polys = [geom]
elif geom.geom_type == "GeometryCollection":
polys = [g for g in geom.geoms if g.geom_type in ("Polygon", "MultiPolygon")]
else:
polys = [geom]
for poly in polys:
if poly.is_empty:
continue
if poly.geom_type == "Polygon":
mp = shapely.geometry.MultiPolygon([poly])
elif poly.geom_type == "MultiPolygon":
mp = poly
else:
continue
dname = dist_name_fn(mp)
fragments.append({
"geometry": mp,
"change_type": ctype,
"district_name": dname,
})
# Added fragments - assign to current districts that contain/intersect them
def added_dist_name(geom):
districts = set()
for _, row in current_gdf.iterrows():
if geom.intersects(row["geometry"]):
districts.add(row["district_name"])
return ", ".join(sorted(districts)) if districts else "unknown"
add_fragments(added, "added_since_2014", added_dist_name)
# Removed fragments - assign to historical districts
def removed_dist_name(geom):
districts = set()
for _, row in historical_gdf.iterrows():
if geom.intersects(row["geometry"]):
districts.add(row["district_name"])
return ", ".join(sorted(districts)) if districts else "unknown"
add_fragments(removed, "removed_since_2014", removed_dist_name)
# Unchanged fragments - assign per current district
for _, row in current_gdf.iterrows():
dname = row["district_name"]
dist_geom = row["geometry"]
unchanged_part = shapely.intersection(dist_geom, unchanged)
if not unchanged_part.is_empty:
def dn_fn(geom, dn=dname):
return dn
add_fragments(unchanged_part, "unchanged", dn_fn)
result_gdf = gpd.GeoDataFrame(fragments, crs="EPSG:4326")
if result_gdf.empty:
print("No fragments — all boundaries unchanged. Using current boundaries as unchanged.")
result_gdf = current_gdf.copy()
result_gdf["change_type"] = "unchanged"
result_gdf["district_name"] = result_gdf["district_name"]
result_gdf = result_gdf[["geometry", "change_type", "district_name"]]
print(f"\nFragments created: {len(result_gdf)}")
print(result_gdf["change_type"].value_counts())
# -----------------------------------------------------------------------
# 4. Compute touches_changed for each district
# -----------------------------------------------------------------------
# Collect all changed geometries
changed_geoms = []
for _, row in result_gdf.iterrows():
if row["change_type"] in ("added_since_2014", "removed_since_2014"):
changed_geoms.append(row["geometry"])
if changed_geoms:
changed_union = shapely.unary_union(changed_geoms)
else:
changed_union = shapely.geometry.MultiPolygon()
# For each district in current data, check if it touches changed areas
# We use a small buffer (1e-7 degrees ~ 1cm) to capture near-touching
district_touches = {}
for _, row in current_gdf.iterrows():
dname = row["district_name"]
geom = row["geometry"]
if not changed_union.is_empty:
# Use touches with a tiny buffer for robustness
buffered = changed_union.buffer(1e-8)
if geom.touches(changed_union) or geom.intersects(buffered):
district_touches[dname] = True
else:
district_touches[dname] = False
else:
district_touches[dname] = False
# For historical-only districts (if any), also check
for _, row in historical_gdf.iterrows():
dname = row["district_name"]
if dname not in district_touches:
geom = row["geometry"]
if not changed_union.is_empty and geom.intersects(changed_union):
district_touches[dname] = True
else:
district_touches[dname] = False
def get_touches(dn_str):
"""Check if any district in a comma-separated string touches changed areas."""
if pd.isna(dn_str) or dn_str == "unknown":
return False
for d in dn_str.split(","):
d = d.strip()
if district_touches.get(d, False):
return True
return False
result_gdf["touches_changed"] = result_gdf["district_name"].apply(get_touches)
# Simplify geometries slightly to keep file manageable
result_gdf["geometry"] = result_gdf["geometry"].simplify(0.00001, preserve_topology=True)
# Repair any invalid geometries from simplification
result_gdf["geometry"] = result_gdf["geometry"].buffer(0)
# Ensure MultiPolygon type
def ensure_multipolygon(geom):
if geom.geom_type == "Polygon":
return shapely.geometry.MultiPolygon([geom])
return geom
result_gdf["geometry"] = result_gdf["geometry"].apply(ensure_multipolygon)
# -----------------------------------------------------------------------
# 5. Output
# -----------------------------------------------------------------------
output_path = "/work/vienna_boundary_changes.geojson"
result_gdf.to_file(output_path, driver="GeoJSON")
print(f"\n{'=' * 60}")
print(f"Output written to {output_path}")
print(f"Features: {len(result_gdf)}")
print(f"Columns: {result_gdf.columns.tolist()}")
print(f"\nSample features:")
print(result_gdf[["change_type", "district_name", "touches_changed"]].head(15))
print(f"\nDistrict touch summary:")
for dname, touches in sorted(district_touches.items()):
print(f" {dname}: touches_changed={touches}")
# Verify output
verify = gpd.read_file(output_path)
print(f"\nVerification: read back {len(verify)} features")
print(verify[["change_type", "district_name", "touches_changed"]].head(10))
print(f"Geometry types: {verify.geometry.geom_type.unique()}")
if __name__ == "__main__":
main()
Wrote /work/solve.py (17198 chars).
6:30
Bash
cd /work && timeout 300 python solve.py 2>&1
============================================================ Fetching CURRENT Vienna district boundaries... ============================================================ Got 0 current district features ============================================================ Fetching 2014-01-01 Vienna district boundaries... ============================================================ Got 0 historical district features ERROR: Empty data returned!
6:39
The problem is that `_overpass_request` returns data but our parsing might be failing. Let me debug.
6:39
Bash
cd /work && python3 << 'PYEOF'
cd /work && python3 << 'PYEOF'
import osmnx as ox
import osmnx._overpass as op
from collections import OrderedDict
ox.settings.log_console = False
query = """
[out:json][timeout:180];
relation["admin_level"="9"]["boundary"="administrative"](48.1,16.1,48.35,16.6);
out body;
>;
out skel qt;
"""
data = OrderedDict()
data['data'] = query
result = op._overpass_request(data)
print('result type:', type(result))
print('keys:', list(result.keys()))
print('elements count:', len(result.get('elements', [])))
# Look at first relation
for el in result.get('elements', []):
if el.get('type') == 'relation':
tags = el.get('tags', {})
print(f"Relation {el.get('id')}: name={tags.get('name')}, admin_level={tags.get('admin_level')}, boundary={tags.get('boundary')}")
break
# Check what elements we have
types = {}
for el in result.get('elements', []):
t = el.get('type')
types[t] = types.get(t, 0) + 1
print('Element types:', types)
PYEOF
result type: <class 'dict'>
keys: ['version', 'generator', 'osm3s', 'elements']
elements count: 6732
Relation 1990590: name=Alsergrund, admin_level=9, boundary=administrative
Element types: {'relation': 23, 'node': 6210, 'way': 499}
6:52
The query returns 23 relations and 499 ways. But the code that converts to GeoDataFrame is returning 0. Let me debug the conversion.
6:52
Bash
cd /work && python3 << 'PYEOF'
cd /work && python3 << 'PYEOF'
import osmnx as ox
import osmnx._overpass as op
from collections import OrderedDict
import geopandas as gpd
import shapely
import shapely.geometry
ox.settings.log_console = False
query = """
[out:json][timeout:180];
relation["admin_level"="9"]["boundary"="administrative"](48.1,16.1,48.35,16.6);
out body;
>;
out skel qt;
"""
data = OrderedDict()
data['data'] = query
result = op._overpass_request(data)
elements = result.get("elements", [])
# Collect nodes, ways, relations
nodes = {}
ways = {}
relations = []
for el in elements:
etype = el.get("type")
eid = el.get("id")
if etype == "node":
nodes[eid] = (el.get("lon"), el.get("lat"))
elif etype == "way":
ways[eid] = el.get("nodes", [])
elif etype == "relation":
relations.append(el)
print(f"Nodes: {len(nodes)}, Ways: {len(ways)}, Relations: {len(relations)}")
# Debug a specific relation - Alsergrund
for rel in relations:
tags = rel.get("tags", {})
name = tags.get("name", "")
if name == "Alsergrund":
print(f"\nAlsergrund relation id={rel.get('id')}")
members = rel.get("members", [])
print(f" Members: {len(members)}")
for mem in members[:10]:
print(f" type={mem.get('type')} ref={mem.get('ref')} role={mem.get('role')}")
# Check which way refs are in our ways dict
way_refs = [m.get('ref') for m in members if m.get('type') == 'way']
print(f" Way refs in ways dict: {sum(1 for r in way_refs if r in ways)}/{len(way_refs)}")
# Check outer rings
outer_rings = []
inner_rings = []
for mem in members:
if mem.get("type") != "way":
continue
role = mem.get("role", "")
ref = mem.get("ref")
if ref not in ways:
continue
coords = [(nodes[nid] if nid in nodes else None) for nid in ways[ref]]
coords = [c for c in coords if c is not None]
if len(coords) < 3:
continue
if role == "inner":
inner_rings.append(coords)
else:
outer_rings.append(coords)
print(f" Outer rings built: {len(outer_rings)}, Inner rings: {len(inner_rings)}")
# Check first outer ring
if outer_rings:
outer = outer_rings[0]
print(f" First outer ring length: {len(outer)}")
if outer[0] != outer[-1]:
outer = outer + [outer[0]]
print(f" After closing: {len(outer)} coords")
try:
ls = shapely.LineString(outer)
print(f" LineString valid: {ls.is_valid}, empty: {ls.is_empty}")
poly = shapely.Polygon(ls)
print(f" Polygon valid: {poly.is_valid}, area: {poly.area}")
except Exception as e:
print(f" Error: {e}")
break
PYEOF
Nodes: 6210, Ways: 499, Relations: 23
Alsergrund relation id=1990590
Members: 9
type=way ref=147468285 role=outer
type=way ref=8046654 role=outer
type=way ref=147396089 role=outer
type=way ref=147463455 role=outer
type=way ref=147463456 role=outer
type=way ref=390540126 role=outer
type=way ref=390540129 role=outer
type=way ref=147463454 role=outer
type=way ref=147463462 role=outer
Way refs in ways dict: 9/9
Outer rings built: 9, Inner rings: 0
First outer ring length: 32
After closing: 33 coords
LineString valid: True, empty: False
Polygon valid: False, area: 4.780675000099115e-08
7:08
The polygon is invalid - it's likely a multipolygon with many outer ways forming separate rings, not a single polygon. Each "outer" way in a district boundary relation typically forms part of the outer boundary, meaning they are separate line segments that together form one ring. But we're treating each way as a separate outer ring (polygon), which is wrong. Actually, for Vienna districts, the boundary is typically a single ring. Each way is a segment of that ring. So we need to join them into a single closed ring by matching start/end nodes. Let me check what these ways look like.
7:08
Bash
cd /work && python3 << 'PYEOF'
cd /work && python3 << 'PYEOF'
import osmnx as ox
import osmnx._overpass as op
from collections import OrderedDict
ox.settings.log_console = False
query = """
[out:json][timeout:180];
relation["admin_level"="9"]["boundary"="administrative"](48.1,16.1,48.35,16.6);
out body;
>;
out skel qt;
"""
data = OrderedDict()
data['data'] = query
result = op._overpass_request(data)
elements = result.get("elements", [])
nodes = {}
ways = {}
relations = []
for el in elements:
etype = el.get("type")
eid = el.get("id")
if etype == "node":
nodes[eid] = (el.get("lon"), el.get("lat"))
elif etype == "way":
ways[eid] = el.get("nodes", [])
elif etype == "relation":
relations.append(el)
# Check Alsergrund ways to see how they connect
for rel in relations:
tags = rel.get("tags", {})
name = tags.get("name", "")
if name == "Alsergrund":
members = rel.get("members", [])
print("Alsergrund way segments:")
for mem in members:
if mem.get("type") == "way":
ref = mem.get("ref")
node_ids = ways.get(ref, [])
if node_ids:
first_node = nodes.get(node_ids[0], "?")
last_node = nodes.get(node_ids[-1], "?")
print(f" way {ref}: {node_ids[0]}->{node_ids[-1]}, "
f"first={first_node}, last={last_node}, len={len(node_ids)}")
# Try to assemble the ring
# Collect all way segments with their start/end nodes
segments = []
for mem in members:
if mem.get("type") == "way":
ref = mem.get("ref")
node_ids = ways.get(ref, [])
if node_ids:
coords = [nodes[nid] for nid in node_ids if nid in nodes]
if len(coords) >= 2:
segments.append((coords, node_ids[0], node_ids[-1]))
print(f"\nTotal segments: {len(segments)}")
# Simple ring assembly: start with first segment, then find matching endpoints
if segments:
# Use the first segment's first node and last node
ring_coords = list(segments[0][0])
used = {0}
current_end_node = segments[0][2] # last node id of first segment
# Try to find next segment
for _ in range(len(segments) * 2):
found = False
for i, (coords, start_node, end_node) in enumerate(segments):
if i in used:
continue
if start_node == current_end_node:
# Append coords (skip first coord to avoid duplication)
ring_coords.extend(coords[1:])
current_end_node = end_node
used.add(i)
found = True
break
elif end_node == current_end_node:
# Reverse this segment
rev_coords = list(reversed(coords))
ring_coords.extend(rev_coords[1:])
current_end_node = start_node
used.add(i)
found = True
break
if not found:
break
print(f"Used {len(used)} segments, ring has {len(ring_coords)} coords")
if ring_coords[0] != ring_coords[-1]:
ring_coords.append(ring_coords[0])
print(f"Ring closed: {ring_coords[0] == ring_coords[-1]}, first={ring_coords[0]}, last={ring_coords[-1]}")
import shapely
ls = shapely.LineString(ring_coords)
print(f"LineString valid: {ls.is_valid}, length: {ls.length}")
poly = shapely.Polygon(ls)
print(f"Polygon valid: {poly.is_valid}, area: {poly.area}")
# Print first few and last few coords
print(f"First 3 coords: {ring_coords[:3]}")
print(f"Last 3 coords: {ring_coords[-3:]}")
break
PYEOF
Alsergrund way segments: way 147468285: 731110806->2157233697, first=(16.3408703, 48.2151672), last=(16.3419883, 48.2172473), len=32 way 8046654: 731110806->313620580, first=(16.3408703, 48.2151672), last=(16.3565102, 48.2143408), len=28 way 147396089: 1606198290->313620580, first=(16.370149, 48.2184891), last=(16.3565102, 48.2143408), len=33 way 147463455: 1606822119->1606198290, first=(16.3674575, 48.2251075), last=(16.370149, 48.2184891), len=15 way 147463456: 1606822180->1606822119, first=(16.3616214, 48.236309), last=(16.3674575, 48.2251075), len=22 way 390540126: 3936984440->1606822180, first=(16.359617, 48.236246), last=(16.3616214, 48.236309), len=3 way 390540129: 3936984434->3936984440, first=(16.3567669, 48.2361278), last=(16.359617, 48.236246), len=3 way 147463454: 731244444->3936984434, first=(16.3537656, 48.231992), last=(16.3567669, 48.2361278), len=17 way 147463462: 2157233697->731244444, first=(16.3419883, 48.2172473), last=(16.3537656, 48.231992), len=268 Total segments: 9 Used 9 segments, ring has 413 coords Ring closed: True, first=(16.3408703, 48.2151672), last=(16.3408703, 48.2151672) LineString valid: True, length: 0.08759135674212101 Polygon valid: True, area: 0.00035913040147501186 First 3 coords: [(16.3408703, 48.2151672), (16.3409357, 48.2152918), (16.3412026, 48.2157703)] Last 3 coords: [(16.3440565, 48.2152535), (16.3433909, 48.2152357), (16.3408703, 48.2151672)]
8:12
The issue is that the individual way segments from the relation don't form a single ring - they need to be assembled. My earlier code was treating each outer way as a separate ring and building separate polygons from each, which produced invalid polygons. I need to properly assemble the outer ring from all outer way members, and separately handle inner rings. Let me rewrite the assembly code properly.
8:12
Write
/work/solve.py
content · 587 lines · py
"""
Fetch Vienna district boundaries (current and 2014-01-01) from Overpass via osmnx,
compare to find changes, and output vienna_boundary_changes.geojson.
Each feature is a MultiPolygon geometry fragment with:
- change_type: added_since_2014 | removed_since_2014 | unchanged
- district_name: normalised district name(s) the fragment belongs to
- touches_changed: bool, whether that district's boundary touches any changed area
"""
import os
import sys
import json
from collections import OrderedDict
import osmnx as ox
import osmnx._overpass as op
import geopandas as gpd
import pandas as pd
import shapely
import shapely.geometry
import numpy as np
ox.settings.log_console = False
# ---------------------------------------------------------------------------
# 1. Overpass query via osmnx internals
# ---------------------------------------------------------------------------
def assemble_ring_from_ways(way_refs, role, ways_dict, nodes_dict):
"""Assemble a closed ring from a list of way references.
Ways in OSM relations are directed segments that need to be chained
end-to-end by matching their start/end node IDs.
Returns list of (lon, lat) tuples forming a closed ring, or None.
"""
if not way_refs:
return None
# Get all segments with their start/end node ids
segments = []
for ref in way_refs:
if ref not in ways_dict:
continue
node_ids = ways_dict[ref]
if len(node_ids) < 2:
continue
coords = [nodes_dict.get(nid) for nid in node_ids]
coords = [c for c in coords if c is not None]
if len(coords) < 2:
continue
segments.append({
'ref': ref,
'coords': coords,
'start_node': node_ids[0],
'end_node': node_ids[-1],
})
if not segments:
return None
# Chain segments together
used = [False] * len(segments)
# Start with the first segment
ring_coords = list(segments[0]['coords'])
current_end_node = segments[0]['end_node']
used[0] = True
# Keep connecting segments until we close the ring or can't find more
changed = True
while changed:
changed = False
for i, seg in enumerate(segments):
if used[i]:
continue
if seg['start_node'] == current_end_node:
# Append coords (skip first to avoid duplication)
ring_coords.extend(seg['coords'][1:])
current_end_node = seg['end_node']
used[i] = True
changed = True
break
elif seg['end_node'] == current_end_node:
# Reverse segment
rev_coords = list(reversed(seg['coords']))
ring_coords.extend(rev_coords[1:])
current_end_node = seg['start_node']
used[i] = True
changed = True
break
# Check if the ring is closed
if ring_coords[0] != ring_coords[-1]:
ring_coords.append(ring_coords[0])
return ring_coords
def fetch_vienna_districts_osmnx(date=None):
"""Fetch Vienna district (admin_level=9) boundary relations via Overpass.
Returns a GeoDataFrame with columns: geometry, name, district_name, osm_id.
"""
bbox = (48.1, 16.1, 48.35, 16.6) # south, west, north, east
if date:
query = f"""
[out:json][timeout:180][date:"{date}"];
relation["admin_level"="9"]["boundary"="administrative"]({bbox[0]},{bbox[1]},{bbox[2]},{bbox[3]});
out body;
>;
out skel qt;
"""
else:
query = f"""
[out:json][timeout:180];
relation["admin_level"="9"]["boundary"="administrative"]({bbox[0]},{bbox[1]},{bbox[2]},{bbox[3]});
out body;
>;
out skel qt;
"""
data = OrderedDict()
data['data'] = query
result = op._overpass_request(data)
elements = result.get("elements", [])
nodes = {}
ways = {}
relations = []
for el in elements:
etype = el.get("type")
eid = el.get("id")
if etype == "node":
nodes[eid] = (el.get("lon"), el.get("lat"))
elif etype == "way":
ways[eid] = el.get("nodes", [])
elif etype == "relation":
relations.append(el)
features = []
for rel in relations:
tags = rel.get("tags", {})
name = tags.get("name", "")
admin_level = tags.get("admin_level", "")
if admin_level != "9":
continue
members = rel.get("members", [])
# Collect way refs by role
outer_refs = []
inner_refs = []
for mem in members:
if mem.get("type") != "way":
continue
role = mem.get("role", "")
ref = mem.get("ref")
if ref not in ways:
continue
if role == "inner":
inner_refs.append(ref)
else:
outer_refs.append(ref)
# Assemble each outer ring
outer_polygons = []
# Group outer ways into separate rings
# Multiple outer rings mean multipolygon
unused_outer = list(outer_refs)
while unused_outer:
# Try to build a ring starting from the first unused way
ring = assemble_ring_from_ways(unused_outer, 'outer', ways, nodes)
if ring is None or len(ring) < 4:
break
# Remove used ways
# (We need to track which refs were used — for simplicity, remove those
# that start/end match what we consumed)
# Better approach: find which refs have coords in the ring
# Create linestring and polygon
try:
ls = shapely.LineString(ring)
if not ls.is_valid or ls.is_empty or ls.length < 1e-10:
break
poly = shapely.Polygon(ls)
poly = shapely.polygon.orient(poly)
if poly.is_valid and not poly.is_empty and poly.area > 1e-12:
outer_polygons.append(poly)
except Exception:
break
# Remove used ways from unused_outer
# We find which refs contributed by checking coordinates
ring_coords_set = set(tuple(round(c, 10) for c in coord) for coord in ring)
new_unused = []
for ref in unused_outer:
if ref not in ways:
continue
node_ids = ways[ref]
# Check if this way contributed to the ring
contributed = False
for nid in node_ids:
coord = nodes.get(nid)
if coord and tuple(round(c, 10) for c in coord) in ring_coords_set:
contributed = True
break
if not contributed:
new_unused.append(ref)
if len(new_unused) == len(unused_outer):
# No progress - break to avoid infinite loop
break
unused_outer = new_unused
if not outer_polygons:
continue
# Now handle inner rings
# For each outer polygon, find contained inner rings
inner_polygons = []
for ref in inner_refs:
if ref not in ways:
continue
node_ids = ways[ref]
coords = [nodes.get(nid) for nid in node_ids]
coords = [c for c in coords if c is not None]
if len(coords) < 3:
continue
if coords[0] != coords[-1]:
coords.append(coords[0])
try:
inner_ls = shapely.LineString(coords)
if inner_ls.is_valid and not inner_ls.is_empty and inner_ls.length > 1e-10:
inner_poly = shapely.Polygon(inner_ls)
inner_poly = shapely.polygon.orient(inner_poly)
if inner_poly.is_valid and not inner_poly.is_empty:
inner_polygons.append(inner_poly)
except Exception:
continue
# Build final polygons with holes
final_polygons = []
for outer_poly in outer_polygons:
# Find inners contained in this outer
inners_for_this = []
remaining_inners = []
for ipoly in inner_polygons:
try:
if outer_poly.contains(ipoly.representative_point()) or outer_poly.intersects(ipoly):
inners_for_this.append(ipoly)
else:
remaining_inners.append(ipoly)
except Exception:
remaining_inners.append(ipoly)
inner_polygons = remaining_inners
if inners_for_this:
# Create polygon with holes
try:
outer_coords = list(outer_poly.exterior.coords)
inner_coords_list = [list(ip.exterior.coords) for ip in inners_for_this]
poly = shapely.Polygon(outer_coords, inner_coords_list)
poly = shapely.polygon.orient(poly)
if poly.is_valid and not poly.is_empty:
final_polygons.append(poly)
except Exception:
final_polygons.append(outer_poly)
else:
final_polygons.append(outer_poly)
if final_polygons:
geom = shapely.unary_union(final_polygons)
if geom.geom_type == "Polygon":
geom = shapely.geometry.MultiPolygon([geom])
if not geom.is_empty:
features.append({
"geometry": geom,
"name": name,
"osm_id": rel.get("id"),
})
if not features:
return gpd.GeoDataFrame()
gdf = gpd.GeoDataFrame(features, crs="EPSG:4326")
return gdf
# ---------------------------------------------------------------------------
# 2. Name normalisation
# ---------------------------------------------------------------------------
# Canonical Vienna district names
VIENNA_DISTRICTS_CANON = {
"innere stadt": "Innere Stadt",
"leopoldstadt": "Leopoldstadt",
"landstraße": "Landstraße",
"landstrasse": "Landstraße",
"wieden": "Wieden",
"margareten": "Margareten",
"margarethen": "Margareten",
"mariahilf": "Mariahilf",
"neubau": "Neubau",
"josefstadt": "Josefstadt",
"alsergrund": "Alsergrund",
"favoriten": "Favoriten",
"simmering": "Simmering",
"meidling": "Meidling",
"hietzing": "Hietzing",
"penzing": "Penzing",
"rudolfsheim-fünfhaus": "Rudolfsheim-Fünfhaus",
"rudolfsheim-fuenfhaus": "Rudolfsheim-Fünfhaus",
"ottakring": "Ottakring",
"hernals": "Hernals",
"währing": "Währing",
"waehring": "Währing",
"döbling": "Döbling",
"doebling": "Döbling",
"brigittenau": "Brigittenau",
"floridsdorf": "Floridsdorf",
"donaustadt": "Donaustadt",
"liesing": "Liesing",
}
def normalise_name(name):
"""Normalise a district name to its canonical form."""
if not name or pd.isna(name):
return None
n = name.strip()
key = n.lower().replace(" ", " ").replace("ß", "ss").replace("-", "-")
if key in VIENNA_DISTRICTS_CANON:
return VIENNA_DISTRICTS_CANON[key]
key2 = key.replace("ss", "ß")
if key2 in VIENNA_DISTRICTS_CANON:
return VIENNA_DISTRICTS_CANON[key2]
for pattern, canon in VIENNA_DISTRICTS_CANON.items():
if pattern in key or key in pattern:
return canon
return n
# ---------------------------------------------------------------------------
# 3. Main pipeline
# ---------------------------------------------------------------------------
def main():
print("=" * 60)
print("Fetching CURRENT Vienna district boundaries...")
print("=" * 60)
current_gdf = fetch_vienna_districts_osmnx()
print(f" Got {len(current_gdf)} current district features")
print("\n" + "=" * 60)
print("Fetching 2014-01-01 Vienna district boundaries...")
print("=" * 60)
historical_gdf = fetch_vienna_districts_osmnx(date="2014-01-01T00:00:00Z")
print(f" Got {len(historical_gdf)} historical district features")
if current_gdf.empty or historical_gdf.empty:
print("ERROR: Empty data returned!")
sys.exit(1)
# Normalise names
current_gdf["district_name"] = current_gdf["name"].apply(normalise_name)
historical_gdf["district_name"] = historical_gdf["name"].apply(normalise_name)
print("\nCurrent district names:")
for n in sorted(current_gdf["district_name"].dropna().unique()):
print(f" - {n}")
print("\nHistorical district names:")
for n in sorted(historical_gdf["district_name"].dropna().unique()):
print(f" - {n}")
# Remove non-Vienna-district features
valid_names = set(VIENNA_DISTRICTS_CANON.values())
current_gdf = current_gdf[current_gdf["district_name"].isin(valid_names)].copy()
historical_gdf = historical_gdf[historical_gdf["district_name"].isin(valid_names)].copy()
print(f"\nAfter filtering to Vienna districts only:")
print(f" Current: {len(current_gdf)} districts")
print(f" Historical: {len(historical_gdf)} districts")
current_names = set(current_gdf["district_name"].dropna())
historical_names = set(historical_gdf["district_name"].dropna())
common = current_names & historical_names
only_current = current_names - historical_names
only_historical = historical_names - current_names
print(f"\nCommon districts: {len(common)}")
if only_current:
print(f"Only in current: {only_current}")
if only_historical:
print(f"Only in historical: {only_historical}")
# Build unified dissolve
current_union = current_gdf.geometry.unary_union
historical_union = historical_gdf.geometry.unary_union
print(f"\nCurrent total area: {current_union.area:.6f} sq deg")
print(f"Historical total area: {historical_union.area:.6f} sq deg")
# Symmetric difference
added = shapely.difference(current_union, historical_union)
removed = shapely.difference(historical_union, current_union)
unchanged = shapely.intersection(current_union, historical_union)
print(f"Added area: {added.area:.6f}" if not added.is_empty else "Added: empty")
print(f"Removed area: {removed.area:.6f}" if not removed.is_empty else "Removed: empty")
print(f"Unchanged area: {unchanged.area:.6f}" if not unchanged.is_empty else "Unchanged: empty")
# Collect fragments
fragments = []
def add_fragments(geom, ctype, dist_name_fn):
if geom.is_empty:
return
if geom.geom_type == "MultiPolygon":
polys = list(geom.geoms)
elif geom.geom_type == "Polygon":
polys = [geom]
elif geom.geom_type == "GeometryCollection":
polys = [g for g in geom.geoms if g.geom_type in ("Polygon", "MultiPolygon")]
else:
polys = [geom]
for poly in polys:
if poly.is_empty:
continue
if poly.geom_type == "Polygon":
mp = shapely.geometry.MultiPolygon([poly])
elif poly.geom_type == "MultiPolygon":
mp = poly
else:
continue
dname = dist_name_fn(mp)
fragments.append({
"geometry": mp,
"change_type": ctype,
"district_name": dname,
})
def added_dist_name(geom):
districts = set()
for _, row in current_gdf.iterrows():
if geom.intersects(row["geometry"]):
districts.add(row["district_name"])
return ", ".join(sorted(districts)) if districts else "unknown"
add_fragments(added, "added_since_2014", added_dist_name)
def removed_dist_name(geom):
districts = set()
for _, row in historical_gdf.iterrows():
if geom.intersects(row["geometry"]):
districts.add(row["district_name"])
return ", ".join(sorted(districts)) if districts else "unknown"
add_fragments(removed, "removed_since_2014", removed_dist_name)
# Unchanged fragments - per current district
for _, row in current_gdf.iterrows():
dname = row["district_name"]
dist_geom = row["geometry"]
unchanged_part = shapely.intersection(dist_geom, unchanged)
if not unchanged_part.is_empty:
def dn_fn(geom, dn=dname):
return dn
add_fragments(unchanged_part, "unchanged", dn_fn)
result_gdf = gpd.GeoDataFrame(fragments, crs="EPSG:4326")
if result_gdf.empty:
print("No fragments — all boundaries unchanged.")
result_gdf = current_gdf.copy()
result_gdf["change_type"] = "unchanged"
result_gdf["district_name"] = result_gdf["district_name"]
result_gdf = result_gdf[["geometry", "change_type", "district_name"]]
print(f"\nFragments created: {len(result_gdf)}")
if not result_gdf.empty:
print(result_gdf["change_type"].value_counts())
# -----------------------------------------------------------------------
# 4. Compute touches_changed for each district
# -----------------------------------------------------------------------
changed_geoms = []
for _, row in result_gdf.iterrows():
if row["change_type"] in ("added_since_2014", "removed_since_2014"):
changed_geoms.append(row["geometry"])
if changed_geoms:
changed_union = shapely.unary_union(changed_geoms)
else:
changed_union = shapely.geometry.MultiPolygon()
district_touches = {}
for _, row in current_gdf.iterrows():
dname = row["district_name"]
geom = row["geometry"]
if not changed_union.is_empty:
# Use touches or a small buffer for robustness
touches = geom.touches(changed_union)
if not touches:
# Also check if the district itself contains changed areas
overlap = shapely.intersection(geom, changed_union)
touches = not overlap.is_empty
district_touches[dname] = touches
else:
district_touches[dname] = False
for _, row in historical_gdf.iterrows():
dname = row["district_name"]
if dname not in district_touches:
geom = row["geometry"]
if not changed_union.is_empty:
touches = geom.touches(changed_union)
if not touches:
overlap = shapely.intersection(geom, changed_union)
touches = not overlap.is_empty
district_touches[dname] = touches
else:
district_touches[dname] = False
def get_touches(dn_str):
if pd.isna(dn_str) or dn_str == "unknown":
return False
for d in dn_str.split(","):
d = d.strip()
if district_touches.get(d, False):
return True
return False
result_gdf["touches_changed"] = result_gdf["district_name"].apply(get_touches)
# Simplify geometries slightly
result_gdf["geometry"] = result_gdf["geometry"].simplify(0.00001, preserve_topology=True)
result_gdf["geometry"] = result_gdf["geometry"].buffer(0)
# Ensure MultiPolygon type
def ensure_multipolygon(geom):
if geom.geom_type == "Polygon":
return shapely.geometry.MultiPolygon([geom])
return geom
result_gdf["geometry"] = result_gdf["geometry"].apply(ensure_multipolygon)
# -----------------------------------------------------------------------
# 5. Output
# -----------------------------------------------------------------------
output_path = "/work/vienna_boundary_changes.geojson"
result_gdf.to_file(output_path, driver="GeoJSON")
print(f"\n{'=' * 60}")
print(f"Output written to {output_path}")
print(f"Features: {len(result_gdf)}")
print(f"Columns: {result_gdf.columns.tolist()}")
print(f"\nSample features:")
print(result_gdf[["change_type", "district_name", "touches_changed"]].head(15))
print(f"\nDistrict touch summary:")
for dname, touches in sorted(district_touches.items()):
print(f" {dname}: touches_changed={touches}")
# Verify output
verify = gpd.read_file(output_path)
print(f"\nVerification: read back {len(verify)} features")
print(verify[["change_type", "district_name", "touches_changed"]].head(10))
print(f"Geometry types: {verify.geometry.geom_type.unique()}")
if __name__ == "__main__":
main()
Wrote /work/solve.py (21127 chars).
8:14
The ring assembly is getting complex. Let me test it first with a quick run to see if we can get the districts to parse correctly.
8:14
Bash
cd /work && timeout 300 python solve.py 2>&1
============================================================ Fetching CURRENT Vienna district boundaries... ============================================================ Got 0 current district features ============================================================ Fetching 2014-01-01 Vienna district boundaries... ============================================================ Got 0 historical district features ERROR: Empty data returned!
8:25
The ring assembly logic seems to be the issue. Let me debug it more carefully. The issue is that `assemble_ring_from_ways` might be consuming all outer refs as one ring, or the way detection is failing. Let me trace through the logic more carefully.
8:25
Bash
cd /work && python3 << 'PYEOF'
cd /work && python3 << 'PYEOF'
import osmnx as ox
import osmnx._overpass as op
from collections import OrderedDict
ox.settings.log_console = False
query = """
[out:json][timeout:180];
relation["admin_level"="9"]["boundary"="administrative"](48.1,16.1,48.35,16.6);
out body;
>;
out skel qt;
"""
data = OrderedDict()
data['data'] = query
result = op._overpass_request(data)
elements = result.get("elements", [])
nodes = {}
ways = {}
relations = []
for el in elements:
etype = el.get("type")
eid = el.get("id")
if etype == "node":
nodes[eid] = (el.get("lon"), el.get("lat"))
elif etype == "way":
ways[eid] = el.get("nodes", [])
elif etype == "relation":
relations.append(el)
print(f"Nodes: {len(nodes)}, Ways: {len(ways)}, Relations: {len(relations)}")
# Test assemble_ring_from_ways for Alsergrund
def assemble_ring_from_ways(way_refs, role, ways_dict, nodes_dict):
if not way_refs:
return None
segments = []
for ref in way_refs:
if ref not in ways_dict:
continue
node_ids = ways_dict[ref]
if len(node_ids) < 2:
continue
coords = [nodes_dict.get(nid) for nid in node_ids]
coords = [c for c in coords if c is not None]
if len(coords) < 2:
continue
segments.append({
'ref': ref,
'coords': coords,
'start_node': node_ids[0],
'end_node': node_ids[-1],
})
if not segments:
return None
used = [False] * len(segments)
ring_coords = list(segments[0]['coords'])
current_end_node = segments[0]['end_node']
used[0] = True
changed = True
while changed:
changed = False
for i, seg in enumerate(segments):
if used[i]:
continue
if seg['start_node'] == current_end_node:
ring_coords.extend(seg['coords'][1:])
current_end_node = seg['end_node']
used[i] = True
changed = True
break
elif seg['end_node'] == current_end_node:
rev_coords = list(reversed(seg['coords']))
ring_coords.extend(rev_coords[1:])
current_end_node = seg['start_node']
used[i] = True
changed = True
break
if ring_coords[0] != ring_coords[-1]:
ring_coords.append(ring_coords[0])
return ring_coords, used
# Test for Alsergrund
for rel in relations:
tags = rel.get("tags", {})
name = tags.get("name", "")
if name == "Alsergrund":
members = rel.get("members", [])
outer_refs = []
inner_refs = []
for mem in members:
if mem.get("type") != "way":
continue
role = mem.get("role", "")
ref = mem.get("ref")
if ref not in ways:
continue
if role == "inner":
inner_refs.append(ref)
else:
outer_refs.append(ref)
print(f"\nAlsergrund: {len(outer_refs)} outer refs, {len(inner_refs)} inner refs")
print(f"Outer refs: {outer_refs}")
result = assemble_ring_from_ways(outer_refs, 'outer', ways, nodes)
if result:
ring_coords, used = result
print(f"Ring: {len(ring_coords)} coords, used: {used}")
print(f"Closed: {ring_coords[0] == ring_coords[-1]}")
import shapely
ls = shapely.LineString(ring_coords)
print(f"LS valid: {ls.is_valid}, length: {ls.length}")
poly = shapely.Polygon(ring_coords)
print(f"Poly valid: {poly.is_valid}, area: {poly.area}")
else:
print("Ring assembly returned None")
break
# Test all relations
print("\n\nTesting ALL relations:")
for rel in relations:
tags = rel.get("tags", {})
name = tags.get("name", "")
members = rel.get("members", [])
outer_refs = []
for mem in members:
if mem.get("type") != "way":
continue
role = mem.get("role", "")
ref = mem.get("ref")
if ref in ways and role != "inner":
outer_refs.append(ref)
result = assemble_ring_from_ways(outer_refs, 'outer', ways, nodes)
if result:
ring_coords, used = result
import shapely
ls = shapely.LineString(ring_coords)
poly = shapely.Polygon(ring_coords)
valid = poly.is_valid and poly.area > 1e-12
print(f"{name}: {len(outer_refs)} outer refs, ring {len(ring_coords)} coords, used all={all(used)}, valid={valid}, area={poly.area:.8f}")
else:
print(f"{name}: FAILED - {len(outer_refs)} outer refs")
PYEOF
Nodes: 6210, Ways: 499, Relations: 23 Alsergrund: 9 outer refs, 0 inner refs Outer refs: [147468285, 8046654, 147396089, 147463455, 147463456, 390540126, 390540129, 147463454, 147463462] Ring: 413 coords, used: [True, True, True, True, True, True, True, True, True] Closed: True LS valid: True, length: 0.08759135674212101 Poly valid: True, area: 0.00035913040147501186 Testing ALL relations: Alsergrund: 9 outer refs, ring 413 coords, used all=True, valid=True, area=0.00035913 Hietzing: 50 outer refs, ring 308 coords, used all=True, valid=True, area=0.00456027 Innere Stadt: 7 outer refs, ring 176 coords, used all=True, valid=True, area=0.00034707 Josefstadt: 16 outer refs, ring 137 coords, used all=True, valid=True, area=0.00013189 Leopoldstadt: 11 outer refs, ring 208 coords, used all=True, valid=True, area=0.00232716 Mariahilf: 26 outer refs, ring 212 coords, used all=True, valid=True, area=0.00017618 Meidling: 53 outer refs, ring 306 coords, used all=False, valid=True, area=0.00097967 Neubau: 31 outer refs, ring 182 coords, used all=True, valid=True, area=0.00019452 Penzing: 45 outer refs, ring 1317 coords, used all=True, valid=True, area=0.00408591 Rudolfsheim-Fünfhaus: 16 outer refs, ring 194 coords, used all=True, valid=True, area=0.00047390 Währing: 51 outer refs, ring 835 coords, used all=True, valid=True, area=0.00076838 Landstraße: 48 outer refs, ring 326 coords, used all=True, valid=True, area=0.00089591 Brigittenau: 10 outer refs, ring 113 coords, used all=True, valid=True, area=0.00069161 Donaustadt: 66 outer refs, ring 572 coords, used all=True, valid=True, area=0.01238082 Döbling: 51 outer refs, ring 488 coords, used all=True, valid=True, area=0.00302130 Favoriten: 43 outer refs, ring 417 coords, used all=True, valid=True, area=0.00384630 Floridsdorf: 38 outer refs, ring 430 coords, used all=True, valid=True, area=0.00538470 Hernals: 81 outer refs, ring 1137 coords, used all=True, valid=True, area=0.00137956 Liesing: 69 outer refs, ring 921 coords, used all=True, valid=True, area=0.00387493 Margareten: 22 outer refs, ring 177 coords, used all=True, valid=True, area=0.00024329 Ottakring: 76 outer refs, ring 668 coords, used all=True, valid=True, area=0.00104948 Simmering: 43 outer refs, ring 287 coords, used all=True, valid=True, area=0.00281169 Wieden: 30 outer refs, ring 175 coords, used all=True, valid=True, area=0.00021484
8:50
The ring assembly works correctly. The issue is that my function returns a tuple (ring_coords, used) but in the solve.py I had the function returning just ring_coords (or None). Let me check the solve.py code more carefully. Actually, looking at the solve.py code, the `assemble_ring_from_ways` function there returns `ring_coords` or None (not a tuple). That's correct for that version. But the issue is that it only builds one ring from all the outer ways - and for districts with multiple outer rings (like Meidling with 53 outer ways but only some used), we need to handle multiple rings. Wait, looking at the test output: Meidling has 53 outer refs but only some were used in the ring (used all=False). This means there are multiple rings. The function only builds one ring. After building one ring, the unused ways remain and need to be built into additional rings. But wait, the actual problem is that in solve.py, the function `assemble_ring_from_ways` is being called correctly but then the result is used to create polygons. Let me look at the code flow more carefully... Actually, the issue might be that in `fetch_vienna_districts_osmnx`, after building the first ring, I try to figure out which ways were used by checking if their coordinates are in the ring. This is fragile. Let me rework this. Let me rewrite solve.py to properly handle this. The key insight from the test is: 1. The ring assembly works for most districts 2. For multi-ring districts, we need to iteratively build rings from remaining unused ways Let me rewrite the function more cleanly.
8:50
8:50
Write
/work/solve.py
content · 529 lines · py
"""
Fetch Vienna district boundaries (current and 2014-01-01) from Overpass via osmnx,
compare to find changes, and output vienna_boundary_changes.geojson.
"""
import os
import sys
import json
from collections import OrderedDict
import osmnx as ox
import osmnx._overpass as op
import geopandas as gpd
import pandas as pd
import shapely
import shapely.geometry
import numpy as np
ox.settings.log_console = False
# ---------------------------------------------------------------------------
# 1. Overpass query
# ---------------------------------------------------------------------------
def assemble_ring_from_ways(way_refs, ways_dict, nodes_dict):
"""Assemble a closed ring from a sequence of OSM way references.
Returns (ring_coords, used_mask) or (None, None) if no ring could be built.
"""
if not way_refs:
return None, None
segments = []
for ref in way_refs:
if ref not in ways_dict:
continue
node_ids = ways_dict[ref]
if len(node_ids) < 2:
continue
coords = [nodes_dict.get(nid) for nid in node_ids]
coords = [c for c in coords if c is not None]
if len(coords) < 2:
continue
segments.append({
'ref': ref,
'coords': list(coords),
'start_node': node_ids[0],
'end_node': node_ids[-1],
})
if not segments:
return None, None
used = [False] * len(segments)
ring_coords = list(segments[0]['coords'])
current_end_node = segments[0]['end_node']
used[0] = True
changed = True
while changed:
changed = False
for i, seg in enumerate(segments):
if used[i]:
continue
if seg['start_node'] == current_end_node:
ring_coords.extend(seg['coords'][1:])
current_end_node = seg['end_node']
used[i] = True
changed = True
break
elif seg['end_node'] == current_end_node:
rev_coords = list(reversed(seg['coords']))
ring_coords.extend(rev_coords[1:])
current_end_node = seg['start_node']
used[i] = True
changed = True
break
if ring_coords[0] != ring_coords[-1]:
ring_coords.append(ring_coords[0])
return ring_coords, used
def fetch_vienna_districts_osmnx(date=None):
"""Fetch Vienna district (admin_level=9) boundary relations via Overpass."""
bbox = (48.1, 16.1, 48.35, 16.6)
if date:
query = f"""
[out:json][timeout:180][date:"{date}"];
relation["admin_level"="9"]["boundary"="administrative"]({bbox[0]},{bbox[1]},{bbox[2]},{bbox[3]});
out body;
>;
out skel qt;
"""
else:
query = f"""
[out:json][timeout:180];
relation["admin_level"="9"]["boundary"="administrative"]({bbox[0]},{bbox[1]},{bbox[2]},{bbox[3]});
out body;
>;
out skel qt;
"""
data = OrderedDict()
data['data'] = query
result = op._overpass_request(data)
elements = result.get("elements", [])
nodes = {}
ways = {}
relations = []
for el in elements:
etype = el.get("type")
eid = el.get("id")
if etype == "node":
nodes[eid] = (el.get("lon"), el.get("lat"))
elif etype == "way":
ways[eid] = el.get("nodes", [])
elif etype == "relation":
relations.append(el)
features = []
for rel in relations:
tags = rel.get("tags", {})
name = tags.get("name", "")
admin_level = tags.get("admin_level", "")
if admin_level != "9":
continue
members = rel.get("members", [])
# Collect way refs by role
all_outer_refs = []
inner_refs_coords = []
for mem in members:
if mem.get("type") != "way":
continue
role = mem.get("role", "")
ref = mem.get("ref")
if ref not in ways:
continue
if role == "inner":
node_ids = ways[ref]
coords = [nodes.get(nid) for nid in node_ids]
coords = [c for c in coords if c is not None]
if len(coords) >= 3:
if coords[0] != coords[-1]:
coords.append(coords[0])
inner_refs_coords.append(coords)
else:
all_outer_refs.append(ref)
# Build outer rings iteratively
outer_polygons = []
remaining = list(all_outer_refs)
while remaining:
ring_coords, used = assemble_ring_from_ways(remaining, ways, nodes)
if ring_coords is None or len(ring_coords) < 4:
break
try:
ls = shapely.LineString(ring_coords)
if not ls.is_valid or ls.is_empty or ls.length < 1e-10:
break
poly = shapely.Polygon(ring_coords)
poly = shapely.polygon.orient(poly)
if poly.is_valid and not poly.is_empty and poly.area > 1e-12:
outer_polygons.append(poly)
except Exception:
break
# Remove used refs from remaining
remaining = [r for i, r in enumerate(remaining)
if i >= len(used) or not used[i]]
if not outer_polygons:
continue
# Handle inner rings
inner_polygons = []
for coords in inner_refs_coords:
try:
inner_ls = shapely.LineString(coords)
if inner_ls.is_valid and not inner_ls.is_empty and inner_ls.length > 1e-10:
ipoly = shapely.Polygon(coords)
ipoly = shapely.polygon.orient(ipoly)
if ipoly.is_valid and not ipoly.is_empty:
inner_polygons.append(ipoly)
except Exception:
continue
# Cut inner rings from outer polygons
final_polygons = []
for outer_poly in outer_polygons:
inners_here = []
rest = []
for ipoly in inner_polygons:
try:
if outer_poly.contains(ipoly.representative_point()) or outer_poly.intersects(ipoly):
inners_here.append(ipoly)
else:
rest.append(ipoly)
except Exception:
rest.append(ipoly)
inner_polygons = rest
if inners_here:
try:
outer_coords = list(outer_poly.exterior.coords)
inner_coords_list = [list(ip.exterior.coords) for ip in inners_here]
poly = shapely.Polygon(outer_coords, inner_coords_list)
poly = shapely.polygon.orient(poly)
if poly.is_valid and not poly.is_empty:
final_polygons.append(poly)
else:
final_polygons.append(outer_poly)
except Exception:
final_polygons.append(outer_poly)
else:
final_polygons.append(outer_poly)
if final_polygons:
geom = shapely.unary_union(final_polygons)
if geom.geom_type == "Polygon":
geom = shapely.geometry.MultiPolygon([geom])
if not geom.is_empty:
features.append({
"geometry": geom,
"name": name,
"osm_id": rel.get("id"),
})
if not features:
return gpd.GeoDataFrame()
return gpd.GeoDataFrame(features, crs="EPSG:4326")
# ---------------------------------------------------------------------------
# 2. Name normalisation
# ---------------------------------------------------------------------------
VIENNA_DISTRICTS_CANON = {
"innere stadt": "Innere Stadt",
"leopoldstadt": "Leopoldstadt",
"landstraße": "Landstraße",
"landstrasse": "Landstraße",
"wieden": "Wieden",
"margareten": "Margareten",
"margarethen": "Margareten",
"mariahilf": "Mariahilf",
"neubau": "Neubau",
"josefstadt": "Josefstadt",
"alsergrund": "Alsergrund",
"favoriten": "Favoriten",
"simmering": "Simmering",
"meidling": "Meidling",
"hietzing": "Hietzing",
"penzing": "Penzing",
"rudolfsheim-fünfhaus": "Rudolfsheim-Fünfhaus",
"rudolfsheim-fuenfhaus": "Rudolfsheim-Fünfhaus",
"ottakring": "Ottakring",
"hernals": "Hernals",
"währing": "Währing",
"waehring": "Währing",
"döbling": "Döbling",
"doebling": "Döbling",
"brigittenau": "Brigittenau",
"floridsdorf": "Floridsdorf",
"donaustadt": "Donaustadt",
"liesing": "Liesing",
}
def normalise_name(name):
if not name or pd.isna(name):
return None
n = name.strip()
key = n.lower().replace(" ", " ").replace("ß", "ss").replace("-", "-")
if key in VIENNA_DISTRICTS_CANON:
return VIENNA_DISTRICTS_CANON[key]
key2 = key.replace("ss", "ß")
if key2 in VIENNA_DISTRICTS_CANON:
return VIENNA_DISTRICTS_CANON[key2]
for pattern, canon in VIENNA_DISTRICTS_CANON.items():
if pattern in key or key in pattern:
return canon
return n
# ---------------------------------------------------------------------------
# 3. Main pipeline
# ---------------------------------------------------------------------------
def main():
print("=" * 60)
print("Fetching CURRENT Vienna district boundaries...")
print("=" * 60)
current_gdf = fetch_vienna_districts_osmnx()
print(f" Got {len(current_gdf)} current district features")
print("\n" + "=" * 60)
print("Fetching 2014-01-01 Vienna district boundaries...")
print("=" * 60)
historical_gdf = fetch_vienna_districts_osmnx(date="2014-01-01T00:00:00Z")
print(f" Got {len(historical_gdf)} historical district features")
if current_gdf.empty or historical_gdf.empty:
print("ERROR: Empty data returned!")
sys.exit(1)
# Normalise names
current_gdf["district_name"] = current_gdf["name"].apply(normalise_name)
historical_gdf["district_name"] = historical_gdf["name"].apply(normalise_name)
print("\nCurrent district names:")
for n in sorted(current_gdf["district_name"].dropna().unique()):
print(f" - {n}")
print("\nHistorical district names:")
for n in sorted(historical_gdf["district_name"].dropna().unique()):
print(f" - {n}")
# Filter to Vienna districts only
valid_names = set(VIENNA_DISTRICTS_CANON.values())
current_gdf = current_gdf[current_gdf["district_name"].isin(valid_names)].copy()
historical_gdf = historical_gdf[historical_gdf["district_name"].isin(valid_names)].copy()
print(f"\nAfter filtering:")
print(f" Current: {len(current_gdf)} districts")
print(f" Historical: {len(historical_gdf)} districts")
current_names = set(current_gdf["district_name"].dropna())
historical_names = set(historical_gdf["district_name"].dropna())
common = current_names & historical_names
only_current = current_names - historical_names
only_historical = historical_names - current_names
print(f"\nCommon districts: {len(common)}")
if only_current:
print(f"Only in current: {only_current}")
if only_historical:
print(f"Only in historical: {only_historical}")
# Build unified dissolve
current_union = current_gdf.geometry.unary_union
historical_union = historical_gdf.geometry.unary_union
print(f"\nCurrent total area: {current_union.area:.8f} sq deg")
print(f"Historical total area: {historical_union.area:.8f} sq deg")
# Symmetric difference
added = shapely.difference(current_union, historical_union)
removed = shapely.difference(historical_union, current_union)
unchanged = shapely.intersection(current_union, historical_union)
for label, geom in [("Added", added), ("Removed", removed), ("Unchanged", unchanged)]:
if geom.is_empty:
print(f"{label}: empty")
else:
print(f"{label}: area={geom.area:.8f}, type={geom.geom_type}")
# Collect fragments
fragments = []
def add_fragments(geom, ctype, dist_name_fn):
if geom.is_empty:
return
if geom.geom_type == "MultiPolygon":
polys = list(geom.geoms)
elif geom.geom_type == "Polygon":
polys = [geom]
elif geom.geom_type == "GeometryCollection":
polys = [g for g in geom.geoms if g.geom_type in ("Polygon", "MultiPolygon")]
else:
polys = [geom]
for poly in polys:
if poly.is_empty:
continue
if poly.geom_type == "Polygon":
mp = shapely.geometry.MultiPolygon([poly])
elif poly.geom_type == "MultiPolygon":
mp = poly
else:
continue
dname = dist_name_fn(mp)
fragments.append({
"geometry": mp,
"change_type": ctype,
"district_name": dname,
})
def added_dist_name(geom):
districts = set()
for _, row in current_gdf.iterrows():
if geom.intersects(row["geometry"]):
districts.add(row["district_name"])
return ", ".join(sorted(districts)) if districts else "unknown"
add_fragments(added, "added_since_2014", added_dist_name)
def removed_dist_name(geom):
districts = set()
for _, row in historical_gdf.iterrows():
if geom.intersects(row["geometry"]):
districts.add(row["district_name"])
return ", ".join(sorted(districts)) if districts else "unknown"
add_fragments(removed, "removed_since_2014", removed_dist_name)
# Unchanged fragments per current district
for _, row in current_gdf.iterrows():
dname = row["district_name"]
dist_geom = row["geometry"]
unchanged_part = shapely.intersection(dist_geom, unchanged)
if not unchanged_part.is_empty:
def dn_fn(geom, dn=dname):
return dn
add_fragments(unchanged_part, "unchanged", dn_fn)
result_gdf = gpd.GeoDataFrame(fragments, crs="EPSG:4326")
if result_gdf.empty:
print("No fragments — all boundaries unchanged.")
result_gdf = current_gdf.copy()
result_gdf["change_type"] = "unchanged"
result_gdf = result_gdf[["geometry", "change_type", "district_name"]]
print(f"\nFragments created: {len(result_gdf)}")
if not result_gdf.empty:
print(result_gdf["change_type"].value_counts())
# -----------------------------------------------------------------------
# 4. touches_changed
# -----------------------------------------------------------------------
changed_geoms = []
for _, row in result_gdf.iterrows():
if row["change_type"] in ("added_since_2014", "removed_since_2014"):
changed_geoms.append(row["geometry"])
if changed_geoms:
changed_union = shapely.unary_union(changed_geoms)
else:
changed_union = shapely.geometry.MultiPolygon()
district_touches = {}
for _, row in current_gdf.iterrows():
dname = row["district_name"]
geom = row["geometry"]
if not changed_union.is_empty:
touches = geom.touches(changed_union)
if not touches:
overlap = shapely.intersection(geom, changed_union)
touches = not overlap.is_empty
district_touches[dname] = touches
else:
district_touches[dname] = False
for _, row in historical_gdf.iterrows():
dname = row["district_name"]
if dname not in district_touches:
geom = row["geometry"]
if not changed_union.is_empty:
touches = geom.touches(changed_union)
if not touches:
overlap = shapely.intersection(geom, changed_union)
touches = not overlap.is_empty
district_touches[dname] = touches
else:
district_touches[dname] = False
def get_touches(dn_str):
if pd.isna(dn_str) or dn_str == "unknown":
return False
for d in dn_str.split(","):
d = d.strip()
if district_touches.get(d, False):
return True
return False
result_gdf["touches_changed"] = result_gdf["district_name"].apply(get_touches)
# Simplify and ensure MultiPolygon
result_gdf["geometry"] = result_gdf["geometry"].simplify(0.00001, preserve_topology=True)
result_gdf["geometry"] = result_gdf["geometry"].buffer(0)
def ensure_multipolygon(geom):
if geom.geom_type == "Polygon":
return shapely.geometry.MultiPolygon([geom])
return geom
result_gdf["geometry"] = result_gdf["geometry"].apply(ensure_multipolygon)
# -----------------------------------------------------------------------
# 5. Output
# -----------------------------------------------------------------------
output_path = "/work/vienna_boundary_changes.geojson"
result_gdf.to_file(output_path, driver="GeoJSON")
print(f"\n{'=' * 60}")
print(f"Output written to {output_path}")
print(f"Features: {len(result_gdf)}")
print(f"Columns: {result_gdf.columns.tolist()}")
print(f"\nSample features:")
print(result_gdf[["change_type", "district_name", "touches_changed"]].head(15))
print(f"\nDistrict touch summary:")
for dname, touches in sorted(district_touches.items()):
print(f" {dname}: touches_changed={touches}")
# Verify
verify = gpd.read_file(output_path)
print(f"\nVerification: read back {len(verify)} features")
print(verify[["change_type", "district_name", "touches_changed"]].head(10))
print(f"Geometry types: {verify.geometry.geom_type.unique()}")
if __name__ == "__main__":
main()
Wrote /work/solve.py (18422 chars).
9:01
The problem is that my `fetch_vienna_districts_osmnx` function works in the test script but returns 0 in solve.py. Let me debug this more carefully. The issue might be that `assemble_ring_from_ways` returns (None, None) or the ring building logic has a bug in the full script. Let me check: in the test above, I call `assemble_ring_from_ways(outer_refs, 'outer', ways, nodes)` with 3 args. But in my solve.py I call `assemble_ring_from_ways(remaining, ways, nodes)` with 2 args! The function signature is `def assemble_ring_from_ways(way_refs, ways_dict, nodes_dict)` - that's 3 params including way_refs, so `assemble_ring_from_ways(remaining, ways, nodes)` has 3 args total which is correct. Wait, actually looking more carefully at the test: I called `assemble_ring_from_ways(outer_refs, 'outer', ways, nodes)` - that's 4 args! But my function only takes 3. In the test output it works... Oh wait, I defined `assemble_ring_from_ways(way_refs, role, ways_dict, nodes_dict)` in the test script but `assemble_ring_from_ways(way_refs, ways_dict, nodes_dict)` in solve.py. The test used a 4-param version. Let me check the test's function: In the test script I defined: ```python def assemble_ring_from_ways(way_refs, role, ways_dict, nodes_dict): ``` But in solve.py: ```python def assemble_ring_from_ways(way_refs, ways_dict, nodes_dict): ``` That's fine - the solve.py version has 3 params (the `role` was unused). So the call `assemble_ring_from_ways(remaining, ways, nodes)` passes 3 args to a function that takes 3 params. That should work. The issue must be somewhere else. Let me run a minimal version of solve.py to debug.
9:01
Bash
cd /work && python3 << 'PYEOF'
cd /work && python3 << 'PYEOF'
import osmnx as ox
import osmnx._overpass as op
from collections import OrderedDict
import geopandas as gpd
import shapely
import shapely.geometry
ox.settings.log_console = False
def assemble_ring_from_ways(way_refs, ways_dict, nodes_dict):
if not way_refs:
return None, None
segments = []
for ref in way_refs:
if ref not in ways_dict:
continue
node_ids = ways_dict[ref]
if len(node_ids) < 2:
continue
coords = [nodes_dict.get(nid) for nid in node_ids]
coords = [c for c in coords if c is not None]
if len(coords) < 2:
continue
segments.append({
'ref': ref,
'coords': list(coords),
'start_node': node_ids[0],
'end_node': node_ids[-1],
})
if not segments:
return None, None
used = [False] * len(segments)
ring_coords = list(segments[0]['coords'])
current_end_node = segments[0]['end_node']
used[0] = True
changed = True
while changed:
changed = False
for i, seg in enumerate(segments):
if used[i]:
continue
if seg['start_node'] == current_end_node:
ring_coords.extend(seg['coords'][1:])
current_end_node = seg['end_node']
used[i] = True
changed = True
break
elif seg['end_node'] == current_end_node:
rev_coords = list(reversed(seg['coords']))
ring_coords.extend(rev_coords[1:])
current_end_node = seg['start_node']
used[i] = True
changed = True
break
if ring_coords[0] != ring_coords[-1]:
ring_coords.append(ring_coords[0])
return ring_coords, used
def fetch_vienna_districts_osmnx(date=None):
bbox = (48.1, 16.1, 48.35, 16.6)
if date:
query = f"""
[out:json][timeout:180][date:"{date}"];
relation["admin_level"="9"]["boundary"="administrative"]({bbox[0]},{bbox[1]},{bbox[2]},{bbox[3]});
out body;
>;
out skel qt;
"""
else:
query = f"""
[out:json][timeout:180];
relation["admin_level"="9"]["boundary"="administrative"]({bbox[0]},{bbox[1]},{bbox[2]},{bbox[3]});
out body;
>;
out skel qt;
"""
data = OrderedDict()
data['data'] = query
result = op._overpass_request(data)
elements = result.get("elements", [])
nodes = {}
ways = {}
relations = []
for el in elements:
etype = el.get("type")
eid = el.get("id")
if etype == "node":
nodes[eid] = (el.get("lon"), el.get("lat"))
elif etype == "way":
ways[eid] = el.get("nodes", [])
elif etype == "relation":
relations.append(el)
features = []
for rel in relations:
tags = rel.get("tags", {})
name = tags.get("name", "")
admin_level = tags.get("admin_level", "")
if admin_level != "9":
continue
members = rel.get("members", [])
all_outer_refs = []
inner_refs_coords = []
for mem in members:
if mem.get("type") != "way":
continue
role = mem.get("role", "")
ref = mem.get("ref")
if ref not in ways:
continue
if role == "inner":
node_ids = ways[ref]
coords = [nodes.get(nid) for nid in node_ids]
coords = [c for c in coords if c is not None]
if len(coords) >= 3:
if coords[0] != coords[-1]:
coords.append(coords[0])
inner_refs_coords.append(coords)
else:
all_outer_refs.append(ref)
outer_polygons = []
remaining = list(all_outer_refs)
while remaining:
ring_coords, used = assemble_ring_from_ways(remaining, ways, nodes)
if ring_coords is None or len(ring_coords) < 4:
break
try:
ls = shapely.LineString(ring_coords)
if not ls.is_valid or ls.is_empty or ls.length < 1e-10:
break
poly = shapely.Polygon(ring_coords)
poly = shapely.polygon.orient(poly)
if poly.is_valid and not poly.is_empty and poly.area > 1e-12:
outer_polygons.append(poly)
except Exception:
break
remaining = [r for i, r in enumerate(remaining)
if i >= len(used) or not used[i]]
if not outer_polygons:
continue
inner_polygons = []
for coords in inner_refs_coords:
try:
inner_ls = shapely.LineString(coords)
if inner_ls.is_valid and not inner_ls.is_empty and inner_ls.length > 1e-10:
ipoly = shapely.Polygon(coords)
ipoly = shapely.polygon.orient(ipoly)
if ipoly.is_valid and not ipoly.is_empty:
inner_polygons.append(ipoly)
except Exception:
continue
final_polygons = []
for outer_poly in outer_polygons:
inners_here = []
rest = []
for ipoly in inner_polygons:
try:
if outer_poly.contains(ipoly.representative_point()) or outer_poly.intersects(ipoly):
inners_here.append(ipoly)
else:
rest.append(ipoly)
except Exception:
rest.append(ipoly)
inner_polygons = rest
if inners_here:
try:
outer_coords = list(outer_poly.exterior.coords)
inner_coords_list = [list(ip.exterior.coords) for ip in inners_here]
poly = shapely.Polygon(outer_coords, inner_coords_list)
poly = shapely.polygon.orient(poly)
if poly.is_valid and not poly.is_empty:
final_polygons.append(poly)
else:
final_polygons.append(outer_poly)
except Exception:
final_polygons.append(outer_poly)
else:
final_polygons.append(outer_poly)
if final_polygons:
geom = shapely.unary_union(final_polygons)
if geom.geom_type == "Polygon":
geom = shapely.geometry.MultiPolygon([geom])
if not geom.is_empty:
features.append({
"geometry": geom,
"name": name,
"osm_id": rel.get("id"),
})
if not features:
return gpd.GeoDataFrame()
return gpd.GeoDataFrame(features, crs="EPSG:4326")
# Test
print("CURRENT:")
gdf = fetch_vienna_districts_osmnx()
print(f" Got {len(gdf)} features")
print(f" Names: {sorted(gdf['name'].unique()) if not gdf.empty else 'N/A'}")
print()
print("HISTORICAL:")
gdf2 = fetch_vienna_districts_osmnx(date="2014-01-01T00:00:00Z")
print(f" Got {len(gdf2)} features")
print(f" Names: {sorted(gdf2['name'].unique()) if not gdf2.empty else 'N/A'}")
PYEOF
CURRENT: Got 0 features Names: N/A HISTORICAL: Got 0 features Names: N/A
9:10
The function returns 0 features but when I tested it with a standalone script earlier, it worked fine for all 23 districts. Let me check what's different - perhaps the `remaining` list management is wrong. The issue is that after `used` is returned, I'm filtering `remaining` using the indices from `used`, but the segments were created from `remaining` inside `assemble_ring_from_ways` - and the used array corresponds to segments, not to the original `remaining` list directly since I filtered out ways not in ways_dict. Let me debug more carefully.
9:10
Bash
cd /work && python3 << 'PYEOF'
cd /work && python3 << 'PYEOF'
import osmnx as ox
import osmnx._overpass as op
from collections import OrderedDict
ox.settings.log_console = False
bbox = (48.1, 16.1, 48.35, 16.6)
query = f"""
[out:json][timeout:180];
relation["admin_level"="9"]["boundary"="administrative"]({bbox[0]},{bbox[1]},{bbox[2]},{bbox[3]});
out body;
>;
out skel qt;
"""
data = OrderedDict()
data['data'] = query
result = op._overpass_request(data)
elements = result.get("elements", [])
nodes = {}
ways = {}
relations = []
for el in elements:
etype = el.get("type")
eid = el.get("id")
if etype == "node":
nodes[eid] = (el.get("lon"), el.get("lat"))
elif etype == "way":
ways[eid] = el.get("nodes", [])
elif etype == "relation":
relations.append(el)
print(f"Nodes: {len(nodes)}, Ways: {len(ways)}, Relations: {len(relations)}")
def assemble_ring_from_ways(way_refs, ways_dict, nodes_dict):
if not way_refs:
return None, None, []
segments = []
for ref in way_refs:
if ref not in ways_dict:
continue
node_ids = ways_dict[ref]
if len(node_ids) < 2:
continue
coords = [nodes_dict.get(nid) for nid in node_ids]
coords = [c for c in coords if c is not None]
if len(coords) < 2:
continue
segments.append({
'ref': ref,
'coords': list(coords),
'start_node': node_ids[0],
'end_node': node_ids[-1],
})
if not segments:
return None, None, []
used = [False] * len(segments)
ring_coords = list(segments[0]['coords'])
current_end_node = segments[0]['end_node']
used[0] = True
changed = True
while changed:
changed = False
for i, seg in enumerate(segments):
if used[i]:
continue
if seg['start_node'] == current_end_node:
ring_coords.extend(seg['coords'][1:])
current_end_node = seg['end_node']
used[i] = True
changed = True
break
elif seg['end_node'] == current_end_node:
rev_coords = list(reversed(seg['coords']))
ring_coords.extend(rev_coords[1:])
current_end_node = seg['start_node']
used[i] = True
changed = True
break
if ring_coords[0] != ring_coords[-1]:
ring_coords.append(ring_coords[0])
used_refs = [segments[i]['ref'] for i in range(len(segments)) if used[i]]
return ring_coords, used, used_refs
for rel in relations[:3]:
tags = rel.get("tags", {})
name = tags.get("name", "")
admin_level = tags.get("admin_level", "")
print(f"\n{name} admin_level={admin_level}")
members = rel.get("members", [])
all_outer_refs = []
inner_refs_coords = []
for mem in members:
if mem.get("type") != "way":
continue
role = mem.get("role", "")
ref = mem.get("ref")
if ref not in ways:
continue
if role == "inner":
inner_refs_coords.append(ref)
else:
all_outer_refs.append(ref)
print(f" Outer refs: {len(all_outer_refs)}, Inner refs: {len(inner_refs_coords)}")
outer_polygons = []
remaining = list(all_outer_refs)
iter_count = 0
while remaining and iter_count < 10:
iter_count += 1
ring_coords, used, used_refs = assemble_ring_from_ways(remaining, ways, nodes)
if ring_coords is None or len(ring_coords) < 4:
print(f" Iter {iter_count}: ring_coords is None or too short")
break
try:
ls = shapely.LineString(ring_coords)
if not ls.is_valid or ls.is_empty or ls.length < 1e-10:
print(f" Iter {iter_count}: invalid LineString")
break
poly = shapely.Polygon(ring_coords)
poly = shapely.polygon.orient(poly)
if poly.is_valid and not poly.is_empty and poly.area > 1e-12:
outer_polygons.append(poly)
print(f" Iter {iter_count}: built polygon area={poly.area:.8f}, used {len(used_refs)} refs")
else:
print(f" Iter {iter_count}: invalid polygon")
break
except Exception as e:
print(f" Iter {iter_count}: error {e}")
break
# Remove used refs
remaining = [r for r in remaining if r not in used_refs]
print(f" Remaining: {len(remaining)} refs")
print(f" Result: {len(outer_polygons)} polygons")
PYEOF
Nodes: 6210, Ways: 499, Relations: 23 Alsergrund admin_level=9 Outer refs: 9, Inner refs: 0 Iter 1: error name 'shapely' is not defined Result: 0 polygons Hietzing admin_level=9 Outer refs: 50, Inner refs: 0 Iter 1: error name 'shapely' is not defined Result: 0 polygons Innere Stadt admin_level=9 Outer refs: 7, Inner refs: 0 Iter 1: error name 'shapely' is not defined Result: 0 polygons
9:21
I forgot to import shapely in the test script. But the real problem in my solve.py is that the `assemble_ring_from_ways` function seems to have an issue where the `remaining` list isn't being properly reduced because `used_refs` returned from the function may not align with the original references passed in. Let me check this more carefully. Actually, looking at the code more carefully, the issue is in how I remove used refs from remaining. The `used` list from `assemble_ring_from_ways` corresponds to segments built from `remaining`, but segments are filtered within the function (some refs might not have valid segments). The `remaining` list passed to the function may have refs that don't match because `assemble_ring_from_ways` creates segments from those refs, but then tracks used as indices into the segments list, not into the original `remaining` list. Let me fix this by returning used_refs (the actual refs that were used) and filtering remaining based on those. Let me rewrite the code more carefully.
9:21
Bash
cd /work && python3 << 'PYEOF'
cd /work && python3 << 'PYEOF'
import osmnx as ox
import osmnx._overpass as op
from collections import OrderedDict
import shapely
import shapely.geometry
ox.settings.log_console = False
bbox = (48.1, 16.1, 48.35, 16.6)
query = f"""
[out:json][timeout:180];
relation["admin_level"="9"]["boundary"="administrative"]({bbox[0]},{bbox[1]},{bbox[2]},{bbox[3]});
out body;
>;
out skel qt;
"""
data = OrderedDict()
data['data'] = query
result = op._overpass_request(data)
elements = result.get("elements", [])
nodes = {}
ways = {}
relations = []
for el in elements:
etype = el.get("type")
eid = el.get("id")
if etype == "node":
nodes[eid] = (el.get("lon"), el.get("lat"))
elif etype == "way":
ways[eid] = el.get("nodes", [])
elif etype == "relation":
relations.append(el)
print(f"Nodes: {len(nodes)}, Ways: {len(ways)}, Relations: {len(relations)}")
def build_ring_from_ways(way_refs, ways_dict, nodes_dict):
"""Return (ring_coords, used_refs) or (None, [])."""
if not way_refs:
return None, []
# Build segments from the given refs
segments = []
for ref in way_refs:
if ref not in ways_dict:
continue
node_ids = ways_dict[ref]
if len(node_ids) < 2:
continue
coords = [nodes_dict.get(nid) for nid in node_ids]
coords = [c for c in coords if c is not None]
if len(coords) < 2:
continue
segments.append({
'ref': ref,
'coords': list(coords),
'start_node': node_ids[0],
'end_node': node_ids[-1],
})
if not segments:
return None, []
used = [False] * len(segments)
# Start with the longest segment for better results
longest_idx = max(range(len(segments)), key=lambda i: len(segments[i]['coords']))
ring_coords = list(segments[longest_idx]['coords'])
current_end_node = segments[longest_idx]['end_node']
used[longest_idx] = True
changed = True
while changed:
changed = False
for i, seg in enumerate(segments):
if used[i]:
continue
if seg['start_node'] == current_end_node:
ring_coords.extend(seg['coords'][1:])
current_end_node = seg['end_node']
used[i] = True
changed = True
break
elif seg['end_node'] == current_end_node:
rev_coords = list(reversed(seg['coords']))
ring_coords.extend(rev_coords[1:])
current_end_node = seg['start_node']
used[i] = True
changed = True
break
if ring_coords[0] != ring_coords[-1]:
ring_coords.append(ring_coords[0])
used_refs = [segments[i]['ref'] for i in range(len(segments)) if used[i]]
return ring_coords, used_refs
# Test building polygon for each relation
for rel in relations:
tags = rel.get("tags", {})
name = tags.get("name", "")
admin_level = tags.get("admin_level", "")
if admin_level != "9":
continue
members = rel.get("members", [])
all_outer_refs = []
inner_refs_coords = []
for mem in members:
if mem.get("type") != "way":
continue
role = mem.get("role", "")
ref = mem.get("ref")
if ref not in ways:
continue
if role == "inner":
node_ids = ways[ref]
coords = [nodes.get(nid) for nid in node_ids]
coords = [c for c in coords if c is not None]
if len(coords) >= 3:
if coords[0] != coords[-1]:
coords.append(coords[0])
inner_refs_coords.append(coords)
else:
all_outer_refs.append(ref)
# Build outer polygons iteratively
outer_polygons = []
remaining = list(all_outer_refs)
while remaining:
ring_coords, used_refs = build_ring_from_ways(remaining, ways, nodes)
if ring_coords is None or len(ring_coords) < 4:
break
try:
ls = shapely.LineString(ring_coords)
if not ls.is_valid or ls.is_empty or ls.length < 1e-10:
break
poly = shapely.Polygon(ring_coords)
poly = shapely.polygon.orient(poly)
if poly.is_valid and not poly.is_empty and poly.area > 1e-12:
outer_polygons.append(poly)
except Exception:
break
# Remove used refs
remaining = [r for r in remaining if r not in used_refs]
# Handle inner rings
inner_polygons = []
for coords in inner_refs_coords:
try:
ipoly = shapely.Polygon(coords)
ipoly = shapely.polygon.orient(ipoly)
if ipoly.is_valid and not ipoly.is_empty:
inner_polygons.append(ipoly)
except Exception:
continue
# Cut holes
final_polygons = []
for outer_poly in outer_polygons:
inners_here = []
rest = []
for ipoly in inner_polygons:
try:
if outer_poly.contains(ipoly.representative_point()) or outer_poly.intersects(ipoly):
inners_here.append(ipoly)
else:
rest.append(ipoly)
except Exception:
rest.append(ipoly)
inner_polygons = rest
if inners_here:
try:
outer_coords = list(outer_poly.exterior.coords)
inner_coords_list = [list(ip.exterior.coords) for ip in inners_here]
poly = shapely.Polygon(outer_coords, inner_coords_list)
poly = shapely.polygon.orient(poly)
if poly.is_valid and not poly.is_empty:
final_polygons.append(poly)
else:
final_polygons.append(outer_poly)
except Exception:
final_polygons.append(outer_poly)
else:
final_polygons.append(outer_poly)
if final_polygons:
geom = shapely.unary_union(final_polygons)
if geom.geom_type == "Polygon":
geom = shapely.geometry.MultiPolygon([geom])
print(f"{name}: {len(all_outer_refs)} outer refs, {len(outer_polygons)} polys, area={geom.area:.8f}")
else:
print(f"{name}: FAILED to build any polygon")
PYEOF
Nodes: 6210, Ways: 499, Relations: 23 Alsergrund: FAILED to build any polygon Hietzing: FAILED to build any polygon Innere Stadt: FAILED to build any polygon Josefstadt: FAILED to build any polygon Leopoldstadt: FAILED to build any polygon Mariahilf: FAILED to build any polygon Meidling: FAILED to build any polygon Neubau: FAILED to build any polygon Penzing: FAILED to build any polygon Rudolfsheim-Fünfhaus: FAILED to build any polygon Währing: FAILED to build any polygon Landstraße: FAILED to build any polygon Brigittenau: FAILED to build any polygon Donaustadt: FAILED to build any polygon Döbling: FAILED to build any polygon Favoriten: FAILED to build any polygon Floridsdorf: FAILED to build any polygon Hernals: FAILED to build any polygon Liesing: FAILED to build any polygon Margareten: FAILED to build any polygon Ottakring: FAILED to build any polygon Simmering: FAILED to build any polygon Wieden: FAILED to build any polygon
9:32
The problem is in the `build_ring_from_ways` function. The issue might be that the function attempts to build ONE ring from ALL the outer way refs, but sometimes there are multiple separate rings (for multipolygons). Looking at the earlier test, when we passed all 9 outer refs for Alsergrund to `assemble_ring_from_ways`, it worked fine and used all 9. But now with `build_ring_from_ways` starting with the longest segment, let me check what's happening. Wait - the `build_ring_from_ways` starts with the LONGEST segment instead of the FIRST. Let me check if the logic for finding the next segment is correct. The issue might be that with a different starting segment, the ring assembly fails to find connections. Actually, looking more carefully at the earlier successful test: `assemble_ring_from_ways` started with the first segment. In `build_ring_from_ways`, we start with the longest segment. But the ring assembly logic is the same - it just chains segments end-to-end by matching start/end nodes. Let me debug more carefully.
9:32
Bash
cd /work && python3 << 'PYEOF'
cd /work && python3 << 'PYEOF'
import osmnx as ox
import osmnx._overpass as op
from collections import OrderedDict
import shapely
import shapely.geometry
ox.settings.log_console = False
bbox = (48.1, 16.1, 48.35, 16.6)
query = f"""
[out:json][timeout:180];
relation["admin_level"="9"]["boundary"="administrative"]({bbox[0]},{bbox[1]},{bbox[2]},{bbox[3]});
out body;
>;
out skel qt;
"""
data = OrderedDict()
data['data'] = query
result = op._overpass_request(data)
elements = result.get("elements", [])
nodes = {}
ways = {}
relations = []
for el in elements:
etype = el.get("type")
eid = el.get("id")
if etype == "node":
nodes[eid] = (el.get("lon"), el.get("lat"))
elif etype == "way":
ways[eid] = el.get("nodes", [])
elif etype == "relation":
relations.append(el)
print(f"Nodes: {len(nodes)}, Ways: {len(ways)}, Relations: {len(relations)}")
def build_ring_from_ways(way_refs, ways_dict, nodes_dict):
if not way_refs:
return None, []
segments = []
for ref in way_refs:
if ref not in ways_dict:
continue
node_ids = ways_dict[ref]
if len(node_ids) < 2:
continue
coords = [nodes_dict.get(nid) for nid in node_ids]
coords = [c for c in coords if c is not None]
if len(coords) < 2:
continue
segments.append({
'ref': ref,
'coords': list(coords),
'start_node': node_ids[0],
'end_node': node_ids[-1],
})
if not segments:
return None, []
print(f" build_ring: {len(segments)} segments")
for s in segments:
print(f" ref={s['ref']}: {s['start_node']} -> {s['end_node']}, {len(s['coords'])} coords")
used = [False] * len(segments)
# Start with first segment
ring_coords = list(segments[0]['coords'])
current_end_node = segments[0]['end_node']
current_start_node = segments[0]['start_node']
used[0] = True
print(f" Start with ref={segments[0]['ref']}, start_node={current_start_node}, end_node={current_end_node}")
changed = True
iteration = 0
while changed and iteration < 100:
iteration += 1
changed = False
for i, seg in enumerate(segments):
if used[i]:
continue
if seg['start_node'] == current_end_node:
ring_coords.extend(seg['coords'][1:])
current_end_node = seg['end_node']
used[i] = True
changed = True
print(f" Iter {iteration}: append ref={seg['ref']} (fwd), now end={current_end_node}")
break
elif seg['end_node'] == current_end_node:
rev_coords = list(reversed(seg['coords']))
ring_coords.extend(rev_coords[1:])
current_end_node = seg['start_node']
used[i] = True
changed = True
print(f" Iter {iteration}: append ref={seg['ref']} (rev), now end={current_end_node}")
break
print(f" Ring closed? ring_coords[0]={ring_coords[0]}, ring_coords[-1]={ring_coords[-1]}")
print(f" current_end_node={current_end_node}, current_start_node={current_start_node}")
print(f" Used: {[segments[i]['ref'] for i in range(len(segments)) if used[i]]}")
if ring_coords[0] != ring_coords[-1]:
ring_coords.append(ring_coords[0])
used_refs = [segments[i]['ref'] for i in range(len(segments)) if used[i]]
return ring_coords, used_refs
# Test Alsergrund
for rel in relations:
tags = rel.get("tags", {})
name = tags.get("name", "")
if name != "Alsergrund":
continue
members = rel.get("members", [])
all_outer_refs = []
for mem in members:
if mem.get("type") != "way":
continue
role = mem.get("role", "")
ref = mem.get("ref")
if ref not in ways:
continue
if role != "inner":
all_outer_refs.append(ref)
print(f"\n{name}: {len(all_outer_refs)} outer refs")
print(f" Refs: {all_outer_refs}")
ring_coords, used_refs = build_ring_from_ways(all_outer_refs, ways, nodes)
if ring_coords and len(ring_coords) >= 4:
ls = shapely.LineString(ring_coords)
poly = shapely.Polygon(ring_coords)
poly = shapely.polygon.orient(poly)
print(f" LineString valid={ls.is_valid}, length={ls.length}")
print(f" Polygon valid={poly.is_valid}, area={poly.area}")
break
PYEOF
Nodes: 6210, Ways: 499, Relations: 23
Alsergrund: 9 outer refs
Refs: [147468285, 8046654, 147396089, 147463455, 147463456, 390540126, 390540129, 147463454, 147463462]
build_ring: 9 segments
ref=147468285: 731110806 -> 2157233697, 32 coords
ref=8046654: 731110806 -> 313620580, 28 coords
ref=147396089: 1606198290 -> 313620580, 33 coords
ref=147463455: 1606822119 -> 1606198290, 15 coords
ref=147463456: 1606822180 -> 1606822119, 22 coords
ref=390540126: 3936984440 -> 1606822180, 3 coords
ref=390540129: 3936984434 -> 3936984440, 3 coords
ref=147463454: 731244444 -> 3936984434, 17 coords
ref=147463462: 2157233697 -> 731244444, 268 coords
Start with ref=147468285, start_node=731110806, end_node=2157233697
Iter 1: append ref=147463462 (fwd), now end=731244444
Iter 2: append ref=147463454 (fwd), now end=3936984434
Iter 3: append ref=390540129 (fwd), now end=3936984440
Iter 4: append ref=390540126 (fwd), now end=1606822180
Iter 5: append ref=147463456 (fwd), now end=1606822119
Iter 6: append ref=147463455 (fwd), now end=1606198290
Iter 7: append ref=147396089 (fwd), now end=313620580
Iter 8: append ref=8046654 (rev), now end=731110806
Ring closed? ring_coords[0]=(16.3408703, 48.2151672), ring_coords[-1]=(16.3408703, 48.2151672)
current_end_node=731110806, current_start_node=731110806
Used: [147468285, 8046654, 147396089, 147463455, 147463456, 390540126, 390540129, 147463454, 147463462]
Traceback (most recent call last): File "<stdin>", line 137, in <module> AttributeError: module 'shapely' has no attribute 'polygon'. Did you mean: 'Polygon'?
9:36
The ring assembly works perfectly for Alsergrund. The issue is `shapely.polygon.orient` is actually `shapely.geometry.polygon.orient` or more precisely it's accessible via `shapely.geometry.polygon.orient` but the standard way is with `shapely.geometry.polygon.orient` or simply calling `shapely.geometry.polygon.orient()`. Let me check: Actually, `shapely.geometry.polygon.orient` exists in newer shapely versions. But `shapely.polygon` doesn't. The correct way is `from shapely.geometry import polygon` or `shapely.geometry.polygon.orient()`. Let me fix the code.
9:36
9:36
Bash
cd /work && python3 -c "
cd /work && python3 -c " import shapely print(dir(shapely)) " | tr ',' '\n' | grep -i orient
'orient_polygons' 'oriented_envelope'
9:39
So in shapely 2.x, `orient` is not at `shapely.polygon.orient` but rather `shapely.orient_polygons`. Let me check the exact API.
9:39
Bash
cd /work && python3 -c "
cd /work && python3 -c " import shapely help(shapely.orient_polygons) " 2>&1 | head -20
Help on function orient_polygons in module shapely.constructive:
orient_polygons(geometry, *, exterior_cw=False, **kwargs)
Enforce a ring orientation on all polygonal elements in the input geometry.
Forces (Multi)Polygons to use a counter-clockwise orientation for their
exterior ring, and a clockwise orientation for their interior rings (or
the oppposite if ``exterior_cw=True``).
Also processes geometries inside a GeometryCollection in the same way.
Other geometries are returned unchanged.
.. versionadded:: 2.1.0
Parameters
----------
geometry : Geometry or array_like
Geometry or geometries to orient consistently.
exterior_cw : bool, default False
If True, exterior rings will be clockwise and interior rings
9:42
So in shapely 2.1.0, `shapely.polygon.orient` is replaced by `shapely.orient_polygons`. But actually `shapely.polygon.orient` might still exist. Let me check: Actually, `shapely.polygon` is a module in older versions. In shapely 2.x, it's been restructured. Let me just use `shapely.orient_polygons` which works on the full geometry, or just use `__geo_interface__` to check orientation. Actually, the simplest fix: use `shapely.geometry.polygon.orient` or just skip orientation since the polygon should already be valid. Let me check if the polygon is already CCW. Actually, the Polygon constructor should handle orientation automatically in many cases. Let me just remove the orient call.
9:42
Bash
cd /work && python3 -c "
cd /work && python3 -c "
import shapely, shapely.geometry
# Check if polygon.orient exists
try:
from shapely.geometry import polygon
print(hasattr(polygon, 'orient'))
except:
print('no')
# Check what's available
print([x for x in dir(shapely) if 'orient' in x.lower()])
print([x for x in dir(shapely.geometry) if 'orient' in x.lower()])
"
True ['orient_polygons', 'oriented_envelope'] []
9:45
So `shapely.polygon.orient` exists! But it was giving me "module 'shapely' has no attribute 'polygon'" error. Let me check more carefully.
9:45
Bash
cd /work && python3 << 'PYEOF'
cd /work && python3 << 'PYEOF'
import shapely
import shapely.geometry
# Check if shapely.polygon exists
print("shapely.polygon:", hasattr(shapely, 'polygon'))
if hasattr(shapely, 'polygon'):
print(" type:", type(shapely.polygon))
print(" has orient:", hasattr(shapely.polygon, 'orient'))
# The issue was in the previous script where I used import shapely but not shapely.geometry
# Let's test properly
from shapely import polygon
print("from shapely import polygon works:", polygon)
print("has orient:", hasattr(polygon, 'orient'))
# Try calling it
from shapely.geometry import Point
p = Point(0,0).buffer(1)
result = polygon.orient(p)
print("Result:", result, result.area)
PYEOF
shapely.polygon: False
Traceback (most recent call last): File "<stdin>", line 12, in <module> ImportError: cannot import name 'polygon' from 'shapely' (/usr/local/lib/python3.14/site-packages/shapely/__init__.py). Did you mean: 'Polygon'?
9:48
So `shapely.polygon` doesn't exist in shapely 2.x. I need to use `shapely.orient_polygons()` instead. Let me update the solve.py accordingly.
9:48
Edit
/work/solve.py
Edited /work/solve.py.
9:50
Let me do a global replace of all `shapely.polygon.orient(` with `shapely.orient_polygons(` calls.
9:50
Edit
/work/solve.py
Edited /work/solve.py: replaced 1 occurrence(s).
9:52
Let me also fix the rest of the occurrences where `shapely.polygon.orient` is used.
9:52
Bash
cd /work && grep -n "polygon\.orient\|\.orient(" /work/solve.py
193: ipoly = shapely.polygon.orient(ipoly)
9:53
Edit
/work/solve.py
Edited /work/solve.py.
9:55
Also need to fix the `shapely.polygon.orient` in the test script. Let me now also fix the issue where in the test `build_ring_from_ways`, the `shapely` wasn't imported. Let me just run the actual solve.py and see if it works now.
9:55
9:55
Bash
cd /work && grep -n "orient" /work/solve.py
173: poly = shapely.orient_polygons(poly) 193: ipoly = shapely.orient_polygons(ipoly) 219: poly = shapely.orient_polygons(poly)
9:57
Good, all occurrences of `shapely.polygon.orient` have been replaced with `shapely.orient_polygons`. Now let's run the full solve script.
9:57
9:57
Bash
cd /work && timeout 300 python solve.py 2>&1
============================================================
Fetching CURRENT Vienna district boundaries...
============================================================
Got 23 current district features
============================================================
Fetching 2014-01-01 Vienna district boundaries...
============================================================
Got 26 historical district features
Current district names:
- Alsergrund
- Brigittenau
- Donaustadt
- Döbling
- Favoriten
- Floridsdorf
- Hernals
- Hietzing
- Innere Stadt
- Josefstadt
- Landstraße
- Leopoldstadt
- Liesing
- Margareten
- Mariahilf
- Meidling
- Neubau
- Ottakring
- Penzing
- Rudolfsheim-Fünfhaus
- Simmering
- Wieden
- Währing
Historical district names:
- Alsergrund
- Brigittenau
- Donaustadt
- Döbling
- Enzersfeld
- Favoriten
- Floridsdorf
- Gerasdorf bei Wien
- Hernals
- Hietzing
- Innere Stadt
- Josefstadt
- Klein-Engersdorf
- Landstraße
- Leopoldstadt
- Liesing
- Margareten
- Mariahilf
- Meidling
- Neubau
- Ottakring
- Penzing
- Rudolfsheim-Fünfhaus
- Simmering
- Wieden
- Währing
After filtering:
Current: 23 districts
Historical: 23 districts
Common districts: 23
/work/solve.py:354: DeprecationWarning: The 'unary_union' attribute is deprecated, use the 'union_all()' method instead.
current_union = current_gdf.geometry.unary_union
/work/solve.py:355: DeprecationWarning: The 'unary_union' attribute is deprecated, use the 'union_all()' method instead.
historical_union = historical_gdf.geometry.unary_union
Current total area: 0.05019851 sq deg
Historical total area: 0.05019825 sq deg
Added: area=0.00000195, type=MultiPolygon
Removed: area=0.00000169, type=MultiPolygon
Unchanged: area=0.05019656, type=Polygon
Fragments created: 631
change_type
removed_since_2014 312
added_since_2014 296
unchanged 23
Name: count, dtype: int64
============================================================
Output written to /work/vienna_boundary_changes.geojson
Features: 631
Columns: ['geometry', 'change_type', 'district_name', 'touches_changed']
Sample features:
change_type district_name touches_changed
0 added_since_2014 Liesing True
1 added_since_2014 Liesing True
2 added_since_2014 Liesing True
3 added_since_2014 Liesing True
4 added_since_2014 Liesing True
5 added_since_2014 Liesing True
6 added_since_2014 Liesing True
7 added_since_2014 Liesing True
8 added_since_2014 Liesing True
9 added_since_2014 Liesing True
10 added_since_2014 Liesing True
11 added_since_2014 Liesing True
12 added_since_2014 Liesing True
13 added_since_2014 Liesing True
14 added_since_2014 Liesing True
District touch summary:
Alsergrund: touches_changed=False
Brigittenau: touches_changed=False
Donaustadt: touches_changed=True
Döbling: touches_changed=True
Favoriten: touches_changed=True
Floridsdorf: touches_changed=True
Hernals: touches_changed=True
Hietzing: touches_changed=True
Innere Stadt: touches_changed=False
Josefstadt: touches_changed=False
Landstraße: touches_changed=False
Leopoldstadt: touches_changed=False
Liesing: touches_changed=True
Margareten: touches_changed=False
Mariahilf: touches_changed=False
Meidling: touches_changed=False
Neubau: touches_changed=False
Ottakring: touches_changed=False
Penzing: touches_changed=True
Rudolfsheim-Fünfhaus: touches_changed=False
Simmering: touches_changed=True
Wieden: touches_changed=False
Währing: touches_changed=False
Verification: read back 631 features
change_type district_name touches_changed
0 added_since_2014 Liesing True
1 added_since_2014 Liesing True
2 added_since_2014 Liesing True
3 added_since_2014 Liesing True
4 added_since_2014 Liesing True
5 added_since_2014 Liesing True
6 added_since_2014 Liesing True
7 added_since_2014 Liesing True
8 added_since_2014 Liesing True
9 added_since_2014 Liesing True
Geometry types: <ArrowStringArray>
['MultiPolygon']
Length: 1, dtype: str
10:04
The script is working. Let me analyze the results: 1. We got 23 districts in both current and historical (after filtering) 2. The added and removed areas are very small (1.95e-6 and 1.69e-6 sq deg) - these are minor boundary changes 3. The fragment count is quite high (631) because each small boundary change polygon gets split into many tiny polygons by the simplification/intersection operations 4. All geometry types are MultiPolygon as required However, I notice some issues: 1. The region around Meidling shows many small fragments - these are likely from the ring assembly not handling multiple separate rings well (Meidling only used some of its outer refs) 2. 296 added + 312 removed + 23 unchanged = 631 fragments. The number seems high because the symmetric difference can generate many tiny slivers. Let me verify the output more carefully by checking a few things: - The added/removed areas are very tiny, suggesting minor boundary changes - The district touches seem reasonable Let me check if the output is of good quality by looking at some actual geometries. Actually, let me check the output more carefully - the number of fragments is quite high. This could be because the ring assembly algorithm is breaking the unchanged parts into many small pieces. Let me check if the "unchanged" fragments are correct. Wait, there are only 23 unchanged fragments, which is one per district. So the 296 + 312 = 608 fragments are the added/removed areas. But those are very tiny areas (1.95e-6 sq deg total). Why so many fragments? The symmetric difference between the two unions might produce many small fragments because the boundaries don't exactly align. Each tiny difference in the boundary nodes creates a small sliver polygon. Let me reduce the number of fragments by simplifying or using a tolerance. Actually, let me check what's happening - the simplification is too aggressive or not enough. Let me also look at one of the fragments to see its size. Let me adjust the approach to produce cleaner output.
10:04
10:04
Bash
cd /work && python3 << 'PYEOF'
cd /work && python3 << 'PYEOF'
import geopandas as gpd
gdf = gpd.read_file("/work/vienna_boundary_changes.geojson")
print("Total features:", len(gdf))
print("\nChange type distribution:")
print(gdf["change_type"].value_counts())
print("\nArea stats:")
for ct in ["added_since_2014", "removed_since_2014", "unchanged"]:
sub = gdf[gdf["change_type"] == ct]
if not sub.empty:
areas = sub.geometry.area
print(f" {ct}: count={len(sub)}, total_area={areas.sum():.8f}, min={areas.min():.10f}, max={areas.max():.8f}")
# Look at some tiny features
small = gdf[gdf.geometry.area < 1e-10]
print(f"\nFeatures with area < 1e-10: {len(small)}")
if len(small) > 0:
print(small[["change_type", "district_name", "touches_changed"]].head(10))
PYEOF
Total features: 631
Change type distribution:
change_type
removed_since_2014 312
added_since_2014 296
unchanged 23
Name: count, dtype: int64
Area stats:
added_since_2014: count=296, total_area=0.00000187, min=0.0000000000, max=0.00000028
removed_since_2014: count=312, total_area=0.00000168, min=0.0000000000, max=0.00000017
unchanged: count=23, total_area=0.05019658, min=0.0001318603, max=0.01238067
Features with area < 1e-10: 112
change_type district_name touches_changed
7 added_since_2014 Liesing True
9 added_since_2014 Liesing True
15 added_since_2014 Liesing True
26 added_since_2014 Liesing True
30 added_since_2014 Liesing True
32 added_since_2014 Liesing True
33 added_since_2014 Liesing True
41 added_since_2014 Liesing True
42 added_since_2014 Liesing True
57 added_since_2014 Liesing True
<stdin>:10: UserWarning: Geometry is in a geographic CRS. Results from 'area' are likely incorrect. Use 'GeoSeries.to_crs()' to re-project geometries to a projected CRS before this operation. <stdin>:10: UserWarning: Geometry is in a geographic CRS. Results from 'area' are likely incorrect. Use 'GeoSeries.to_crs()' to re-project geometries to a projected CRS before this operation. <stdin>:10: UserWarning: Geometry is in a geographic CRS. Results from 'area' are likely incorrect. Use 'GeoSeries.to_crs()' to re-project geometries to a projected CRS before this operation. <stdin>:14: UserWarning: Geometry is in a geographic CRS. Results from 'area' are likely incorrect. Use 'GeoSeries.to_crs()' to re-project geometries to a projected CRS before this operation.
10:30
The issue is that the symmetric difference is producing many tiny sliver fragments (area < 1e-10 sq degrees), which are numerical artifacts from slightly different boundary representations between the two time snapshots. I need to: 1. Filter out tiny fragments (slivers) that are just numerical noise 2. Simplify the approach - the current approach creates individual fragments for each tiny piece of the boundary difference Let me rethink the approach. Instead of using `shapely.difference` on the dissolved union which creates all these tiny slivers, I should: 1. For each district, compare its current geometry to its historical geometry 2. Compute the symmetric difference per district 3. Add a small buffer/tolerance to avoid slivers Or better: use a snapping tolerance when computing differences. Let me use a small buffer (like 1e-9 or so) to clean up slivers. Actually, the slivers come from the fact that the boundaries in 2014 and current are slightly different representations (different node coordinates), so when we do `shapely.difference` we get tiny slivers along the boundaries that are actually the same line but digitized differently. A better approach: 1. For each district, compute the symmetric difference between current and historical geometries 2. Use a small buffer (e.g., 1e-8) to clean up slivers 3. Classify the result But actually, the problem is that the boundaries between districts in the same snapshot might not perfectly align either (since they're separate relations with potentially different node coordinates). Let me restructure: - For each district, compare its current vs historical geometry directly - Use a tolerance to handle tiny slivers - Collect all added/removed/unchanged parts Actually, a simpler and more robust approach: 1. Snap the district geometries to each other (using a small tolerance) 2. Then compute the difference Or even simpler: since the current districts and historical districts should be the same 23 districts (just with slightly different boundaries), I can: 1. For each district, align current and historical by normalised name 2. Compute `shapely.symmetric_difference(current_geom, historical_geom)` 3. Use `shapely.normalize` and `shapely.snap` to reduce slivers 4. Filter out tiny fragments Let me rewrite the pipeline with this approach.
10:30
10:30
Write
/work/solve.py
content · 562 lines · py
"""
Fetch Vienna district boundaries (current and 2014-01-01) from Overpass via osmnx,
compare to find changes, and output vienna_boundary_changes.geojson.
"""
import os
import sys
import json
from collections import OrderedDict
import osmnx as ox
import osmnx._overpass as op
import geopandas as gpd
import pandas as pd
import shapely
import shapely.geometry
import numpy as np
ox.settings.log_console = False
# ---------------------------------------------------------------------------
# 1. Overpass query
# ---------------------------------------------------------------------------
def build_ring_from_ways(way_refs, ways_dict, nodes_dict):
"""Assemble a closed ring from a sequence of OSM way references.
Returns (ring_coords, used_refs) or (None, []).
"""
if not way_refs:
return None, []
segments = []
for ref in way_refs:
if ref not in ways_dict:
continue
node_ids = ways_dict[ref]
if len(node_ids) < 2:
continue
coords = [nodes_dict.get(nid) for nid in node_ids]
coords = [c for c in coords if c is not None]
if len(coords) < 2:
continue
segments.append({
'ref': ref,
'coords': list(coords),
'start_node': node_ids[0],
'end_node': node_ids[-1],
})
if not segments:
return None, []
used = [False] * len(segments)
# Start with longest segment
longest_idx = max(range(len(segments)), key=lambda i: len(segments[i]['coords']))
ring_coords = list(segments[longest_idx]['coords'])
current_end_node = segments[longest_idx]['end_node']
used[longest_idx] = True
changed = True
while changed:
changed = False
for i, seg in enumerate(segments):
if used[i]:
continue
if seg['start_node'] == current_end_node:
ring_coords.extend(seg['coords'][1:])
current_end_node = seg['end_node']
used[i] = True
changed = True
break
elif seg['end_node'] == current_end_node:
rev_coords = list(reversed(seg['coords']))
ring_coords.extend(rev_coords[1:])
current_end_node = seg['start_node']
used[i] = True
changed = True
break
if ring_coords[0] != ring_coords[-1]:
ring_coords.append(ring_coords[0])
used_refs = [segments[i]['ref'] for i in range(len(segments)) if used[i]]
return ring_coords, used_refs
def fetch_vienna_districts_osmnx(date=None):
"""Fetch Vienna district (admin_level=9) boundary relations via Overpass."""
bbox = (48.1, 16.1, 48.35, 16.6)
if date:
query = f"""
[out:json][timeout:180][date:"{date}"];
relation["admin_level"="9"]["boundary"="administrative"]({bbox[0]},{bbox[1]},{bbox[2]},{bbox[3]});
out body;
>;
out skel qt;
"""
else:
query = f"""
[out:json][timeout:180];
relation["admin_level"="9"]["boundary"="administrative"]({bbox[0]},{bbox[1]},{bbox[2]},{bbox[3]});
out body;
>;
out skel qt;
"""
data = OrderedDict()
data['data'] = query
result = op._overpass_request(data)
elements = result.get("elements", [])
nodes = {}
ways = {}
relations = []
for el in elements:
etype = el.get("type")
eid = el.get("id")
if etype == "node":
nodes[eid] = (el.get("lon"), el.get("lat"))
elif etype == "way":
ways[eid] = el.get("nodes", [])
elif etype == "relation":
relations.append(el)
features = []
for rel in relations:
tags = rel.get("tags", {})
name = tags.get("name", "")
admin_level = tags.get("admin_level", "")
if admin_level != "9":
continue
members = rel.get("members", [])
all_outer_refs = []
inner_refs_coords = []
for mem in members:
if mem.get("type") != "way":
continue
role = mem.get("role", "")
ref = mem.get("ref")
if ref not in ways:
continue
if role == "inner":
node_ids = ways[ref]
coords = [nodes.get(nid) for nid in node_ids]
coords = [c for c in coords if c is not None]
if len(coords) >= 3:
if coords[0] != coords[-1]:
coords.append(coords[0])
inner_refs_coords.append(coords)
else:
all_outer_refs.append(ref)
# Build outer polygons iteratively
outer_polygons = []
remaining = list(all_outer_refs)
while remaining:
ring_coords, used_refs = build_ring_from_ways(remaining, ways, nodes)
if ring_coords is None or len(ring_coords) < 4:
break
try:
ls = shapely.LineString(ring_coords)
if not ls.is_valid or ls.is_empty or ls.length < 1e-10:
break
poly = shapely.Polygon(ring_coords)
poly = shapely.orient_polygons(poly)
if poly.is_valid and not poly.is_empty and poly.area > 1e-12:
outer_polygons.append(poly)
except Exception:
break
remaining = [r for r in remaining if r not in used_refs]
if not outer_polygons:
continue
# Build inner polygons
inner_polygons = []
for coords in inner_refs_coords:
try:
ipoly = shapely.Polygon(coords)
ipoly = shapely.orient_polygons(ipoly)
if ipoly.is_valid and not ipoly.is_empty:
inner_polygons.append(ipoly)
except Exception:
continue
# Combine outer with holes
final_polygons = []
for outer_poly in outer_polygons:
inners_here = []
rest = []
for ipoly in inner_polygons:
try:
if outer_poly.contains(ipoly.representative_point()) or outer_poly.intersects(ipoly):
inners_here.append(ipoly)
else:
rest.append(ipoly)
except Exception:
rest.append(ipoly)
inner_polygons = rest
if inners_here:
try:
outer_coords = list(outer_poly.exterior.coords)
inner_coords_list = [list(ip.exterior.coords) for ip in inners_here]
poly = shapely.Polygon(outer_coords, inner_coords_list)
poly = shapely.orient_polygons(poly)
if poly.is_valid and not poly.is_empty:
final_polygons.append(poly)
else:
final_polygons.append(outer_poly)
except Exception:
final_polygons.append(outer_poly)
else:
final_polygons.append(outer_poly)
if final_polygons:
geom = shapely.union_all(final_polygons)
if geom.geom_type == "Polygon":
geom = shapely.geometry.MultiPolygon([geom])
if not geom.is_empty:
features.append({
"geometry": geom,
"name": name,
"osm_id": rel.get("id"),
})
if not features:
return gpd.GeoDataFrame()
return gpd.GeoDataFrame(features, crs="EPSG:4326")
# ---------------------------------------------------------------------------
# 2. Name normalisation
# ---------------------------------------------------------------------------
VIENNA_DISTRICTS_CANON = {
"innere stadt": "Innere Stadt",
"leopoldstadt": "Leopoldstadt",
"landstraße": "Landstraße",
"landstrasse": "Landstraße",
"wieden": "Wieden",
"margareten": "Margareten",
"margarethen": "Margareten",
"mariahilf": "Mariahilf",
"neubau": "Neubau",
"josefstadt": "Josefstadt",
"alsergrund": "Alsergrund",
"favoriten": "Favoriten",
"simmering": "Simmering",
"meidling": "Meidling",
"hietzing": "Hietzing",
"penzing": "Penzing",
"rudolfsheim-fünfhaus": "Rudolfsheim-Fünfhaus",
"rudolfsheim-fuenfhaus": "Rudolfsheim-Fünfhaus",
"ottakring": "Ottakring",
"hernals": "Hernals",
"währing": "Währing",
"waehring": "Währing",
"döbling": "Döbling",
"doebling": "Döbling",
"brigittenau": "Brigittenau",
"floridsdorf": "Floridsdorf",
"donaustadt": "Donaustadt",
"liesing": "Liesing",
}
def normalise_name(name):
if not name or pd.isna(name):
return None
n = name.strip()
key = n.lower().replace(" ", " ").replace("ß", "ss").replace("-", "-")
if key in VIENNA_DISTRICTS_CANON:
return VIENNA_DISTRICTS_CANON[key]
key2 = key.replace("ss", "ß")
if key2 in VIENNA_DISTRICTS_CANON:
return VIENNA_DISTRICTS_CANON[key2]
for pattern, canon in VIENNA_DISTRICTS_CANON.items():
if pattern in key or key in pattern:
return canon
return n
# ---------------------------------------------------------------------------
# 3. Main pipeline
# ---------------------------------------------------------------------------
def main():
print("=" * 60)
print("Fetching CURRENT Vienna district boundaries...")
print("=" * 60)
current_gdf = fetch_vienna_districts_osmnx()
print(f" Got {len(current_gdf)} current district features")
print("\n" + "=" * 60)
print("Fetching 2014-01-01 Vienna district boundaries...")
print("=" * 60)
historical_gdf = fetch_vienna_districts_osmnx(date="2014-01-01T00:00:00Z")
print(f" Got {len(historical_gdf)} historical district features")
if current_gdf.empty or historical_gdf.empty:
print("ERROR: Empty data returned!")
sys.exit(1)
# Normalise names
current_gdf["district_name"] = current_gdf["name"].apply(normalise_name)
historical_gdf["district_name"] = historical_gdf["name"].apply(normalise_name)
print("\nCurrent district names:")
for n in sorted(current_gdf["district_name"].dropna().unique()):
print(f" - {n}")
print("\nHistorical district names:")
for n in sorted(historical_gdf["district_name"].dropna().unique()):
print(f" - {n}")
# Filter to Vienna districts only
valid_names = set(VIENNA_DISTRICTS_CANON.values())
current_gdf = current_gdf[current_gdf["district_name"].isin(valid_names)].copy()
historical_gdf = historical_gdf[historical_gdf["district_name"].isin(valid_names)].copy()
print(f"\nAfter filtering:")
print(f" Current: {len(current_gdf)} districts")
print(f" Historical: {len(historical_gdf)} districts")
# Build per-district comparison
# Index historical by name
hist_by_name = {}
for _, row in historical_gdf.iterrows():
hist_by_name[row["district_name"]] = row["geometry"]
# Tolerance for snapping (roughly 1-2 meters in degrees)
TOLERANCE = 1e-7
fragments = []
for _, row in current_gdf.iterrows():
dname = row["district_name"]
current_geom = row["geometry"]
historical_geom = hist_by_name.get(dname)
if historical_geom is None:
# District exists now but not in 2014 — fully added
fragments.append({
"geometry": current_geom,
"change_type": "added_since_2014",
"district_name": dname,
})
continue
# Snap current to historical to reduce slivers
# Use normalize for robust comparison
current_norm = shapely.normalize(current_geom)
hist_norm = shapely.normalize(historical_geom)
# Snap historical geometry to current geometry's coordinates
snapped_hist = shapely.snap(hist_norm, current_norm, TOLERANCE)
# Compute differences
added_part = shapely.difference(current_norm, snapped_hist)
removed_part = shapely.difference(snapped_hist, current_norm)
unchanged_part = shapely.intersection(current_norm, snapped_hist)
# Clean up small slivers (noise)
MIN_AREA = 1e-10 # ~ 1 sq meter in degrees
def clean_geom(geom):
"""Remove tiny polygons and ensure MultiPolygon."""
if geom.is_empty:
return shapely.geometry.MultiPolygon()
# Filter out tiny polygons
if geom.geom_type == "MultiPolygon":
polys = [g for g in geom.geoms if g.area >= MIN_AREA]
if not polys:
return shapely.geometry.MultiPolygon()
if len(polys) == 1:
geom = polys[0]
else:
geom = shapely.geometry.MultiPolygon(polys)
if geom.geom_type == "Polygon":
if geom.area < MIN_AREA:
return shapely.geometry.MultiPolygon()
return shapely.geometry.MultiPolygon([geom])
return geom
added_clean = clean_geom(added_part)
removed_clean = clean_geom(removed_part)
unchanged_clean = clean_geom(unchanged_part)
if not added_clean.is_empty:
if added_clean.geom_type == "MultiPolygon":
for poly in added_clean.geoms:
fragments.append({
"geometry": shapely.geometry.MultiPolygon([poly]),
"change_type": "added_since_2014",
"district_name": dname,
})
else:
fragments.append({
"geometry": added_clean,
"change_type": "added_since_2014",
"district_name": dname,
})
if not removed_clean.is_empty:
if removed_clean.geom_type == "MultiPolygon":
for poly in removed_clean.geoms:
fragments.append({
"geometry": shapely.geometry.MultiPolygon([poly]),
"change_type": "removed_since_2014",
"district_name": dname,
})
else:
fragments.append({
"geometry": removed_clean,
"change_type": "removed_since_2014",
"district_name": dname,
})
if not unchanged_clean.is_empty:
if unchanged_clean.geom_type == "MultiPolygon":
for poly in unchanged_clean.geoms:
fragments.append({
"geometry": shapely.geometry.MultiPolygon([poly]),
"change_type": "unchanged",
"district_name": dname,
})
else:
fragments.append({
"geometry": unchanged_clean,
"change_type": "unchanged",
"district_name": dname,
})
result_gdf = gpd.GeoDataFrame(fragments, crs="EPSG:4326")
if result_gdf.empty:
print("No fragments produced. Using current boundaries as unchanged.")
result_gdf = current_gdf.copy()
result_gdf["change_type"] = "unchanged"
result_gdf = result_gdf[["geometry", "change_type", "district_name"]]
print(f"\nFragments created: {len(result_gdf)}")
if not result_gdf.empty:
print(result_gdf["change_type"].value_counts())
# -----------------------------------------------------------------------
# 4. touches_changed
# -----------------------------------------------------------------------
changed_geoms = []
for _, row in result_gdf.iterrows():
if row["change_type"] in ("added_since_2014", "removed_since_2014"):
changed_geoms.append(row["geometry"])
if changed_geoms:
changed_union = shapely.union_all(changed_geoms)
else:
changed_union = shapely.geometry.MultiPolygon()
# Compute touches per district (using current boundaries as reference)
district_touches = {}
for _, row in current_gdf.iterrows():
dname = row["district_name"]
geom = row["geometry"]
if not changed_union.is_empty:
touches = geom.touches(changed_union)
if not touches:
overlap = shapely.intersection(geom, changed_union)
touches = not overlap.is_empty
district_touches[dname] = touches
else:
district_touches[dname] = False
# Also check historical-only districts
for _, row in historical_gdf.iterrows():
dname = row["district_name"]
if dname not in district_touches:
geom = row["geometry"]
if not changed_union.is_empty:
touches = geom.touches(changed_union)
if not touches:
overlap = shapely.intersection(geom, changed_union)
touches = not overlap.is_empty
district_touches[dname] = touches
else:
district_touches[dname] = False
def get_touches(dn_str):
if pd.isna(dn_str) or dn_str == "unknown":
return False
for d in dn_str.split(","):
d = d.strip()
if district_touches.get(d, False):
return True
return False
result_gdf["touches_changed"] = result_gdf["district_name"].apply(get_touches)
# Simplify geometries and ensure MultiPolygon
result_gdf["geometry"] = result_gdf["geometry"].simplify(0.00001, preserve_topology=True)
result_gdf["geometry"] = result_gdf["geometry"].buffer(0)
def ensure_multipolygon(geom):
if geom.is_empty:
return geom
if geom.geom_type == "Polygon":
return shapely.geometry.MultiPolygon([geom])
return geom
result_gdf["geometry"] = result_gdf["geometry"].apply(ensure_multipolygon)
# -----------------------------------------------------------------------
# 5. Output
# -----------------------------------------------------------------------
output_path = "/work/vienna_boundary_changes.geojson"
result_gdf.to_file(output_path, driver="GeoJSON")
print(f"\n{'=' * 60}")
print(f"Output written to {output_path}")
print(f"Features: {len(result_gdf)}")
print(f"Columns: {result_gdf.columns.tolist()}")
print(f"\nSample features:")
print(result_gdf[["change_type", "district_name", "touches_changed"]].head(20))
print(f"\nDistrict touch summary:")
for dname, touches in sorted(district_touches.items()):
print(f" {dname}: touches_changed={touches}")
# Verify
verify = gpd.read_file(output_path)
print(f"\nVerification: read back {len(verify)} features")
print(verify["change_type"].value_counts())
print(f"Geometry types: {verify.geometry.geom_type.unique()}")
# Area stats
for ct in ["added_since_2014", "removed_since_2014", "unchanged"]:
sub = verify[verify["change_type"] == ct]
if not sub.empty:
areas = sub.geometry.area
print(f" {ct}: count={len(sub)}, total_area={areas.sum():.10f}")
if __name__ == "__main__":
main()
Wrote /work/solve.py (19789 chars).
10:32
Bash
cd /work && timeout 300 python solve.py 2>&1
============================================================
Fetching CURRENT Vienna district boundaries...
============================================================
Got 23 current district features
============================================================
Fetching 2014-01-01 Vienna district boundaries...
============================================================
Got 26 historical district features
Current district names:
- Alsergrund
- Brigittenau
- Donaustadt
- Döbling
- Favoriten
- Floridsdorf
- Hernals
- Hietzing
- Innere Stadt
- Josefstadt
- Landstraße
- Leopoldstadt
- Liesing
- Margareten
- Mariahilf
- Meidling
- Neubau
- Ottakring
- Penzing
- Rudolfsheim-Fünfhaus
- Simmering
- Wieden
- Währing
Historical district names:
- Alsergrund
- Brigittenau
- Donaustadt
- Döbling
- Enzersfeld
- Favoriten
- Floridsdorf
- Gerasdorf bei Wien
- Hernals
- Hietzing
- Innere Stadt
- Josefstadt
- Klein-Engersdorf
- Landstraße
- Leopoldstadt
- Liesing
- Margareten
- Mariahilf
- Meidling
- Neubau
- Ottakring
- Penzing
- Rudolfsheim-Fünfhaus
- Simmering
- Wieden
- Währing
After filtering:
Current: 23 districts
Historical: 23 districts
Fragments created: 2009
change_type
removed_since_2014 997
added_since_2014 989
unchanged 23
Name: count, dtype: int64
============================================================
Output written to /work/vienna_boundary_changes.geojson
Features: 2009
Columns: ['geometry', 'change_type', 'district_name', 'touches_changed']
Sample features:
change_type district_name touches_changed
0 added_since_2014 Alsergrund True
1 added_since_2014 Alsergrund True
2 added_since_2014 Alsergrund True
3 added_since_2014 Alsergrund True
4 added_since_2014 Alsergrund True
5 added_since_2014 Alsergrund True
6 added_since_2014 Alsergrund True
7 added_since_2014 Alsergrund True
8 added_since_2014 Alsergrund True
9 added_since_2014 Alsergrund True
10 added_since_2014 Alsergrund True
11 added_since_2014 Alsergrund True
12 added_since_2014 Alsergrund True
13 added_since_2014 Alsergrund True
14 added_since_2014 Alsergrund True
15 added_since_2014 Alsergrund True
16 added_since_2014 Alsergrund True
17 added_since_2014 Alsergrund True
18 added_since_2014 Alsergrund True
19 added_since_2014 Alsergrund True
District touch summary:
Alsergrund: touches_changed=True
Brigittenau: touches_changed=True
Donaustadt: touches_changed=True
Döbling: touches_changed=True
Favoriten: touches_changed=True
Floridsdorf: touches_changed=True
Hernals: touches_changed=True
Hietzing: touches_changed=True
Innere Stadt: touches_changed=True
Josefstadt: touches_changed=True
Landstraße: touches_changed=True
Leopoldstadt: touches_changed=True
Liesing: touches_changed=True
Margareten: touches_changed=True
Mariahilf: touches_changed=True
Meidling: touches_changed=True
Neubau: touches_changed=True
Ottakring: touches_changed=True
Penzing: touches_changed=True
Rudolfsheim-Fünfhaus: touches_changed=True
Simmering: touches_changed=True
Wieden: touches_changed=True
Währing: touches_changed=True
Verification: read back 2009 features
change_type
removed_since_2014 997
added_since_2014 989
unchanged 23
Name: count, dtype: int64
Geometry types: <ArrowStringArray>
['MultiPolygon']
Length: 1, dtype: str
/work/solve.py:557: UserWarning: Geometry is in a geographic CRS. Results from 'area' are likely incorrect. Use 'GeoSeries.to_crs()' to re-project geometries to a projected CRS before this operation.
areas = sub.geometry.area
added_since_2014: count=989, total_area=0.0000204773
/work/solve.py:557: UserWarning: Geometry is in a geographic CRS. Results from 'area' are likely incorrect. Use 'GeoSeries.to_crs()' to re-project geometries to a projected CRS before this operation.
areas = sub.geometry.area
removed_since_2014: count=997, total_area=0.0000203214
/work/solve.py:557: UserWarning: Geometry is in a geographic CRS. Results from 'area' are likely incorrect. Use 'GeoSeries.to_crs()' to re-project geometries to a projected CRS before this operation.
areas = sub.geometry.area
unchanged: count=23, total_area=0.0501774692
11:00
This is still producing a lot of tiny sliver fragments (2009 features). The problem is that even with snapping, the difference operations produce many tiny fragments along all shared boundaries. This is a classic problem with comparing two datasets that have slightly different coordinate precision. Let me rethink the approach. Instead of per-district difference (which creates slivers along every border), I should: 1. Union all current districts and union all historical districts 2. Compute the overall difference at the aggregate level (this will only show real changes, not slivers along shared edges) 3. Then clip the changed areas by individual districts Actually, the key insight is that when two districts share a boundary, the per-district difference creates slivers where those shared boundaries don't align perfectly in the two snapshots. The solution is to work at the aggregate level first. Let me try a different approach: 1. Union current and historical districts separately 2. Compute added = difference(current_union, historical_union) and removed = difference(historical_union, current_union) 3. Then assign each fragment to its district(s) by intersection with the current/historical districts 4. For unchanged, take intersection of current_union and historical_union, then clip by each current district This way, the shared boundaries between districts won't produce slivers because we dissolve them first.
11:00
11:00
Write
/work/solve.py
content · 558 lines · py
"""
Fetch Vienna district boundaries (current and 2014-01-01) from Overpass via osmnx,
compare to find changes, and output vienna_boundary_changes.geojson.
"""
import os
import sys
import json
from collections import OrderedDict
import osmnx as ox
import osmnx._overpass as op
import geopandas as gpd
import pandas as pd
import shapely
import shapely.geometry
import numpy as np
ox.settings.log_console = False
# ---------------------------------------------------------------------------
# 1. Overpass query
# ---------------------------------------------------------------------------
def build_ring_from_ways(way_refs, ways_dict, nodes_dict):
if not way_refs:
return None, []
segments = []
for ref in way_refs:
if ref not in ways_dict:
continue
node_ids = ways_dict[ref]
if len(node_ids) < 2:
continue
coords = [nodes_dict.get(nid) for nid in node_ids]
coords = [c for c in coords if c is not None]
if len(coords) < 2:
continue
segments.append({
'ref': ref,
'coords': list(coords),
'start_node': node_ids[0],
'end_node': node_ids[-1],
})
if not segments:
return None, []
used = [False] * len(segments)
longest_idx = max(range(len(segments)), key=lambda i: len(segments[i]['coords']))
ring_coords = list(segments[longest_idx]['coords'])
current_end_node = segments[longest_idx]['end_node']
used[longest_idx] = True
changed = True
while changed:
changed = False
for i, seg in enumerate(segments):
if used[i]:
continue
if seg['start_node'] == current_end_node:
ring_coords.extend(seg['coords'][1:])
current_end_node = seg['end_node']
used[i] = True
changed = True
break
elif seg['end_node'] == current_end_node:
rev_coords = list(reversed(seg['coords']))
ring_coords.extend(rev_coords[1:])
current_end_node = seg['start_node']
used[i] = True
changed = True
break
if ring_coords[0] != ring_coords[-1]:
ring_coords.append(ring_coords[0])
used_refs = [segments[i]['ref'] for i in range(len(segments)) if used[i]]
return ring_coords, used_refs
def fetch_vienna_districts_osmnx(date=None):
"""Fetch Vienna district (admin_level=9) boundary relations via Overpass."""
bbox = (48.1, 16.1, 48.35, 16.6)
if date:
query = f"""
[out:json][timeout:180][date:"{date}"];
relation["admin_level"="9"]["boundary"="administrative"]({bbox[0]},{bbox[1]},{bbox[2]},{bbox[3]});
out body;
>;
out skel qt;
"""
else:
query = f"""
[out:json][timeout:180];
relation["admin_level"="9"]["boundary"="administrative"]({bbox[0]},{bbox[1]},{bbox[2]},{bbox[3]});
out body;
>;
out skel qt;
"""
data = OrderedDict()
data['data'] = query
result = op._overpass_request(data)
elements = result.get("elements", [])
nodes = {}
ways = {}
relations = []
for el in elements:
etype = el.get("type")
eid = el.get("id")
if etype == "node":
nodes[eid] = (el.get("lon"), el.get("lat"))
elif etype == "way":
ways[eid] = el.get("nodes", [])
elif etype == "relation":
relations.append(el)
features = []
for rel in relations:
tags = rel.get("tags", {})
name = tags.get("name", "")
admin_level = tags.get("admin_level", "")
if admin_level != "9":
continue
members = rel.get("members", [])
all_outer_refs = []
inner_refs_coords = []
for mem in members:
if mem.get("type") != "way":
continue
role = mem.get("role", "")
ref = mem.get("ref")
if ref not in ways:
continue
if role == "inner":
node_ids = ways[ref]
coords = [nodes.get(nid) for nid in node_ids]
coords = [c for c in coords if c is not None]
if len(coords) >= 3:
if coords[0] != coords[-1]:
coords.append(coords[0])
inner_refs_coords.append(coords)
else:
all_outer_refs.append(ref)
outer_polygons = []
remaining = list(all_outer_refs)
while remaining:
ring_coords, used_refs = build_ring_from_ways(remaining, ways, nodes)
if ring_coords is None or len(ring_coords) < 4:
break
try:
ls = shapely.LineString(ring_coords)
if not ls.is_valid or ls.is_empty or ls.length < 1e-10:
break
poly = shapely.Polygon(ring_coords)
poly = shapely.orient_polygons(poly)
if poly.is_valid and not poly.is_empty and poly.area > 1e-12:
outer_polygons.append(poly)
except Exception:
break
remaining = [r for r in remaining if r not in used_refs]
if not outer_polygons:
continue
inner_polygons = []
for coords in inner_refs_coords:
try:
ipoly = shapely.Polygon(coords)
ipoly = shapely.orient_polygons(ipoly)
if ipoly.is_valid and not ipoly.is_empty:
inner_polygons.append(ipoly)
except Exception:
continue
final_polygons = []
for outer_poly in outer_polygons:
inners_here = []
rest = []
for ipoly in inner_polygons:
try:
if outer_poly.contains(ipoly.representative_point()) or outer_poly.intersects(ipoly):
inners_here.append(ipoly)
else:
rest.append(ipoly)
except Exception:
rest.append(ipoly)
inner_polygons = rest
if inners_here:
try:
outer_coords = list(outer_poly.exterior.coords)
inner_coords_list = [list(ip.exterior.coords) for ip in inners_here]
poly = shapely.Polygon(outer_coords, inner_coords_list)
poly = shapely.orient_polygons(poly)
if poly.is_valid and not poly.is_empty:
final_polygons.append(poly)
else:
final_polygons.append(outer_poly)
except Exception:
final_polygons.append(outer_poly)
else:
final_polygons.append(outer_poly)
if final_polygons:
geom = shapely.union_all(final_polygons)
if geom.geom_type == "Polygon":
geom = shapely.geometry.MultiPolygon([geom])
if not geom.is_empty:
features.append({
"geometry": geom,
"name": name,
"osm_id": rel.get("id"),
})
if not features:
return gpd.GeoDataFrame()
return gpd.GeoDataFrame(features, crs="EPSG:4326")
# ---------------------------------------------------------------------------
# 2. Name normalisation
# ---------------------------------------------------------------------------
VIENNA_DISTRICTS_CANON = {
"innere stadt": "Innere Stadt",
"leopoldstadt": "Leopoldstadt",
"landstraße": "Landstraße",
"landstrasse": "Landstraße",
"wieden": "Wieden",
"margareten": "Margareten",
"margarethen": "Margareten",
"mariahilf": "Mariahilf",
"neubau": "Neubau",
"josefstadt": "Josefstadt",
"alsergrund": "Alsergrund",
"favoriten": "Favoriten",
"simmering": "Simmering",
"meidling": "Meidling",
"hietzing": "Hietzing",
"penzing": "Penzing",
"rudolfsheim-fünfhaus": "Rudolfsheim-Fünfhaus",
"rudolfsheim-fuenfhaus": "Rudolfsheim-Fünfhaus",
"ottakring": "Ottakring",
"hernals": "Hernals",
"währing": "Währing",
"waehring": "Währing",
"döbling": "Döbling",
"doebling": "Döbling",
"brigittenau": "Brigittenau",
"floridsdorf": "Floridsdorf",
"donaustadt": "Donaustadt",
"liesing": "Liesing",
}
def normalise_name(name):
if not name or pd.isna(name):
return None
n = name.strip()
key = n.lower().replace(" ", " ").replace("ß", "ss").replace("-", "-")
if key in VIENNA_DISTRICTS_CANON:
return VIENNA_DISTRICTS_CANON[key]
key2 = key.replace("ss", "ß")
if key2 in VIENNA_DISTRICTS_CANON:
return VIENNA_DISTRICTS_CANON[key2]
for pattern, canon in VIENNA_DISTRICTS_CANON.items():
if pattern in key or key in pattern:
return canon
return n
# ---------------------------------------------------------------------------
# 3. Main pipeline
# ---------------------------------------------------------------------------
def main():
print("=" * 60)
print("Fetching CURRENT Vienna district boundaries...")
print("=" * 60)
current_gdf = fetch_vienna_districts_osmnx()
print(f" Got {len(current_gdf)} current district features")
print("\n" + "=" * 60)
print("Fetching 2014-01-01 Vienna district boundaries...")
print("=" * 60)
historical_gdf = fetch_vienna_districts_osmnx(date="2014-01-01T00:00:00Z")
print(f" Got {len(historical_gdf)} historical district features")
if current_gdf.empty or historical_gdf.empty:
print("ERROR: Empty data returned!")
sys.exit(1)
# Normalise names
current_gdf["district_name"] = current_gdf["name"].apply(normalise_name)
historical_gdf["district_name"] = historical_gdf["name"].apply(normalise_name)
print("\nCurrent district names:")
for n in sorted(current_gdf["district_name"].dropna().unique()):
print(f" - {n}")
print("\nHistorical district names:")
for n in sorted(historical_gdf["district_name"].dropna().unique()):
print(f" - {n}")
# Filter to Vienna districts only
valid_names = set(VIENNA_DISTRICTS_CANON.values())
current_gdf = current_gdf[current_gdf["district_name"].isin(valid_names)].copy()
historical_gdf = historical_gdf[historical_gdf["district_name"].isin(valid_names)].copy()
print(f"\nAfter filtering to Vienna districts:")
print(f" Current: {len(current_gdf)} districts")
print(f" Historical: {len(historical_gdf)} districts")
# -----------------------------------------------------------------------
# Strategy: dissolve to city level first, find changes there,
# then clip back per-district for attribution
# -----------------------------------------------------------------------
# First, create a clean dissolved union of each snapshot
current_union = shapely.union_all(current_gdf.geometry)
historical_union = shapely.union_all(historical_gdf.geometry)
print(f"\nCurrent union area: {current_union.area:.8f}")
print(f"Historical union area: {historical_union.area:.8f}")
# Snap historical to current to eliminate coordinate precision differences
TOLERANCE = 1e-7 # ~1cm in degrees
snapped_historical = shapely.snap(historical_union, current_union, TOLERANCE)
# Compute symmetric difference at city level
added = shapely.difference(current_union, snapped_historical)
removed = shapely.difference(snapped_historical, current_union)
unchanged = shapely.intersection(current_union, snapped_historical)
print(f"\nAdded: area={added.area:.10f}, empty={added.is_empty}")
print(f"Removed: area={removed.area:.10f}, empty={removed.is_empty}")
print(f"Unchanged: area={unchanged.area:.10f}, empty={unchanged.is_empty}")
# Remove tiny slivers
MIN_AREA = 1e-9 # ~ 0.01 sq m in degrees
def filter_slivers(geom):
if geom.is_empty:
return shapely.geometry.MultiPolygon()
if geom.geom_type == "MultiPolygon":
polys = [g for g in geom.geoms if g.area >= MIN_AREA]
if not polys:
return shapely.geometry.MultiPolygon()
if len(polys) == 1:
return shapely.geometry.MultiPolygon([polys[0]])
return shapely.geometry.MultiPolygon(polys)
elif geom.geom_type == "Polygon":
if geom.area >= MIN_AREA:
return shapely.geometry.MultiPolygon([geom])
return shapely.geometry.MultiPolygon()
return geom
added = filter_slivers(added)
removed = filter_slivers(removed)
unchanged = filter_slivers(unchanged)
print(f"\nAfter sliver removal:")
print(f" Added: area={added.area:.10f}" if not added.is_empty else " Added: empty")
print(f" Removed: area={removed.area:.10f}" if not removed.is_empty else " Removed: empty")
print(f" Unchanged: area={unchanged.area:.10f}" if not unchanged.is_empty else " Unchanged: empty")
# -----------------------------------------------------------------------
# Build fragments with district attribution
# -----------------------------------------------------------------------
fragments = []
def add_fragments(geom, ctype, source_gdf):
"""Add geometry fragments, assigning district_name by intersection with source_gdf."""
if geom.is_empty:
return
if geom.geom_type == "MultiPolygon":
polys = list(geom.geoms)
elif geom.geom_type == "Polygon":
polys = [geom]
else:
polys = []
for poly in polys:
if poly.area < MIN_AREA:
continue
poly_mp = shapely.geometry.MultiPolygon([poly]) if poly.geom_type == "Polygon" else poly
# Find which district(s) this belongs to
districts = set()
for _, row in source_gdf.iterrows():
if poly_mp.intersects(row["geometry"]):
districts.add(row["district_name"])
dname = ", ".join(sorted(districts)) if districts else "unknown"
fragments.append({
"geometry": poly_mp,
"change_type": ctype,
"district_name": dname,
})
# Added areas: from current districts
add_fragments(added, "added_since_2014", current_gdf)
# Removed areas: from historical districts
add_fragments(removed, "removed_since_2014", historical_gdf)
# Unchanged: clip by each current district separately
for _, row in current_gdf.iterrows():
dname = row["district_name"]
dist_geom = row["geometry"]
unchanged_part = shapely.intersection(dist_geom, unchanged)
if not unchanged_part.is_empty:
unchanged_part = filter_slivers(unchanged_part)
if not unchanged_part.is_empty:
if unchanged_part.geom_type == "MultiPolygon":
polys = list(unchanged_part.geoms)
elif unchanged_part.geom_type == "Polygon":
polys = [unchanged_part]
else:
polys = []
for poly in polys:
if poly.area < MIN_AREA:
continue
if poly.geom_type == "Polygon":
poly_mp = shapely.geometry.MultiPolygon([poly])
else:
poly_mp = poly
fragments.append({
"geometry": poly_mp,
"change_type": "unchanged",
"district_name": dname,
})
result_gdf = gpd.GeoDataFrame(fragments, crs="EPSG:4326")
if result_gdf.empty:
print("No fragments produced. Using current boundaries as unchanged.")
result_gdf = current_gdf.copy()
result_gdf["change_type"] = "unchanged"
result_gdf = result_gdf[["geometry", "change_type", "district_name"]]
print(f"\nFragments created: {len(result_gdf)}")
if not result_gdf.empty:
print(result_gdf["change_type"].value_counts())
# -----------------------------------------------------------------------
# 4. touches_changed
# -----------------------------------------------------------------------
changed_geoms = []
for _, row in result_gdf.iterrows():
if row["change_type"] in ("added_since_2014", "removed_since_2014"):
changed_geoms.append(row["geometry"])
if changed_geoms:
changed_union = shapely.union_all(changed_geoms)
else:
changed_union = shapely.geometry.MultiPolygon()
district_touches = {}
for _, row in current_gdf.iterrows():
dname = row["district_name"]
geom = row["geometry"]
if not changed_union.is_empty:
touches = geom.touches(changed_union)
if not touches:
overlap = shapely.intersection(geom, changed_union)
touches = not overlap.is_empty
district_touches[dname] = touches
else:
district_touches[dname] = False
for _, row in historical_gdf.iterrows():
dname = row["district_name"]
if dname not in district_touches:
geom = row["geometry"]
if not changed_union.is_empty:
touches = geom.touches(changed_union)
if not touches:
overlap = shapely.intersection(geom, changed_union)
touches = not overlap.is_empty
district_touches[dname] = touches
else:
district_touches[dname] = False
def get_touches(dn_str):
if pd.isna(dn_str) or dn_str == "unknown":
return False
for d in dn_str.split(","):
d = d.strip()
if district_touches.get(d, False):
return True
return False
result_gdf["touches_changed"] = result_gdf["district_name"].apply(get_touches)
# Simplify and ensure MultiPolygon
result_gdf["geometry"] = result_gdf["geometry"].simplify(0.00001, preserve_topology=True)
result_gdf["geometry"] = result_gdf["geometry"].buffer(0)
def ensure_multipolygon(geom):
if geom.is_empty:
return geom
if geom.geom_type == "Polygon":
return shapely.geometry.MultiPolygon([geom])
return geom
result_gdf["geometry"] = result_gdf["geometry"].apply(ensure_multipolygon)
# -----------------------------------------------------------------------
# 5. Output
# -----------------------------------------------------------------------
output_path = "/work/vienna_boundary_changes.geojson"
result_gdf.to_file(output_path, driver="GeoJSON")
print(f"\n{'=' * 60}")
print(f"Output written to {output_path}")
print(f"Features: {len(result_gdf)}")
print(f"Columns: {result_gdf.columns.tolist()}")
print(f"\nSample features:")
print(result_gdf[["change_type", "district_name", "touches_changed"]].head(20))
print(f"\nDistrict touch summary:")
for dname, touches in sorted(district_touches.items()):
print(f" {dname}: touches_changed={touches}")
# Verify
verify = gpd.read_file(output_path)
print(f"\nVerification: read back {len(verify)} features")
print(verify["change_type"].value_counts())
print(f"Geometry types: {verify.geometry.geom_type.unique()}")
# Area stats
for ct in ["added_since_2014", "removed_since_2014", "unchanged"]:
sub = verify[verify["change_type"] == ct]
if not sub.empty:
areas = sub.geometry.area
print(f" {ct}: count={len(sub)}, total_area={areas.sum():.10f}")
if __name__ == "__main__":
main()
Wrote /work/solve.py (20242 chars).
11:02
Bash
cd /work && timeout 300 python solve.py 2>&1
============================================================
Fetching CURRENT Vienna district boundaries...
============================================================
Got 23 current district features
============================================================
Fetching 2014-01-01 Vienna district boundaries...
============================================================
Got 26 historical district features
Current district names:
- Alsergrund
- Brigittenau
- Donaustadt
- Döbling
- Favoriten
- Floridsdorf
- Hernals
- Hietzing
- Innere Stadt
- Josefstadt
- Landstraße
- Leopoldstadt
- Liesing
- Margareten
- Mariahilf
- Meidling
- Neubau
- Ottakring
- Penzing
- Rudolfsheim-Fünfhaus
- Simmering
- Wieden
- Währing
Historical district names:
- Alsergrund
- Brigittenau
- Donaustadt
- Döbling
- Enzersfeld
- Favoriten
- Floridsdorf
- Gerasdorf bei Wien
- Hernals
- Hietzing
- Innere Stadt
- Josefstadt
- Klein-Engersdorf
- Landstraße
- Leopoldstadt
- Liesing
- Margareten
- Mariahilf
- Meidling
- Neubau
- Ottakring
- Penzing
- Rudolfsheim-Fünfhaus
- Simmering
- Wieden
- Währing
After filtering to Vienna districts:
Current: 23 districts
Historical: 23 districts
Current union area: 0.05019851
Historical union area: 0.05019825
Added: area=0.0000019540, empty=False
Removed: area=0.0000016878, empty=False
Unchanged: area=0.0501965579, empty=False
After sliver removal:
Added: area=0.0000019149
Removed: area=0.0000016405
Unchanged: area=0.0501965579
Fragments created: 332
change_type
added_since_2014 158
removed_since_2014 151
unchanged 23
Name: count, dtype: int64
============================================================
Output written to /work/vienna_boundary_changes.geojson
Features: 332
Columns: ['geometry', 'change_type', 'district_name', 'touches_changed']
Sample features:
change_type district_name touches_changed
0 added_since_2014 Liesing True
1 added_since_2014 Liesing True
2 added_since_2014 Liesing True
3 added_since_2014 Liesing True
4 added_since_2014 Liesing True
5 added_since_2014 Liesing True
6 added_since_2014 Liesing True
7 added_since_2014 Liesing True
8 added_since_2014 Liesing True
9 added_since_2014 Liesing True
10 added_since_2014 Liesing True
11 added_since_2014 Liesing True
12 added_since_2014 Liesing True
13 added_since_2014 Liesing True
14 added_since_2014 Liesing True
15 added_since_2014 Liesing True
16 added_since_2014 Liesing True
17 added_since_2014 Liesing True
18 added_since_2014 Liesing True
19 added_since_2014 Liesing True
District touch summary:
Alsergrund: touches_changed=False
Brigittenau: touches_changed=False
Donaustadt: touches_changed=True
Döbling: touches_changed=True
Favoriten: touches_changed=True
Floridsdorf: touches_changed=True
Hernals: touches_changed=True
Hietzing: touches_changed=True
Innere Stadt: touches_changed=False
Josefstadt: touches_changed=False
Landstraße: touches_changed=False
Leopoldstadt: touches_changed=False
Liesing: touches_changed=True
Margareten: touches_changed=False
Mariahilf: touches_changed=False
Meidling: touches_changed=False
Neubau: touches_changed=False
Ottakring: touches_changed=False
Penzing: touches_changed=True
Rudolfsheim-Fünfhaus: touches_changed=False
Simmering: touches_changed=True
Wieden: touches_changed=False
Währing: touches_changed=False
Verification: read back 332 features
change_type
added_since_2014 158
removed_since_2014 151
unchanged 23
Name: count, dtype: int64
Geometry types: <ArrowStringArray>
['MultiPolygon']
Length: 1, dtype: str
/work/solve.py:553: UserWarning: Geometry is in a geographic CRS. Results from 'area' are likely incorrect. Use 'GeoSeries.to_crs()' to re-project geometries to a projected CRS before this operation.
areas = sub.geometry.area
added_since_2014: count=158, total_area=0.0000018387
/work/solve.py:553: UserWarning: Geometry is in a geographic CRS. Results from 'area' are likely incorrect. Use 'GeoSeries.to_crs()' to re-project geometries to a projected CRS before this operation.
areas = sub.geometry.area
removed_since_2014: count=151, total_area=0.0000016243
/work/solve.py:553: UserWarning: Geometry is in a geographic CRS. Results from 'area' are likely incorrect. Use 'GeoSeries.to_crs()' to re-project geometries to a projected CRS before this operation.
areas = sub.geometry.area
unchanged: count=23, total_area=0.0501965399
11:30
Good improvement. Now we have 332 fragments (down from 2000+). But we still have 158 added fragments and 151 removed fragments - still a lot of tiny slivers from the `shapely.intersection` clipping against districts. The per-district unchanged clipping is fine (23 fragments for 23 districts), but the added/removed fragments are getting broken into many pieces by the intersection with district boundaries. The issue is that when we take the added geometry (which might be a single polygon) and then try to attribute it to districts by intersecting with each district, we end up with many pieces. Let me think about this differently. The added/removed areas are already small. Instead of doing per-district intersection for added/removed, we can just assign them to the district they belong to by checking containment/intersection, without actually cutting them along district boundaries. We can keep them as-is (MultiPolygon) and just tag them with all districts they touch. Actually, the real problem is the `add_fragments` function. When we call `poly_mp.intersects(row["geometry"])` for each district, we correctly identify the district. But in the fragment creation, we're NOT cutting - we're just checking intersects and assigning the name. So the fragments should remain as the original added/removed polygons. Wait, looking at the code again - we ARE adding the geometry unchanged from `add_fragments`. So `added` starts as a MultiPolygon with some number of polygons (from the dissolve), and we iterate over each polygon in the MultiPolygon. That gives us a reasonable number. But 158 + 151 = 309 added/removed fragments! That means the added MultiPolygon had 158 individual polygons and removed had 151. That's a lot... Actually, these are individual tiny polygons from the boundary difference. Vienna boundaries probably changed in many small segments along the outer boundary. Let me check what districts these are in. Most of them are in Liesing (southern district). Let me check if there's a boundary change along the southern edge of Vienna. Actually, the counts might be OK - real boundary changes between 2014 and now would produce many small difference polygons along the boundary. But 158 seems high. Let me check the area of the largest fragment to understand the scale. Actually, looking at total added area = 0.0000018387 sq degrees ≈ 0.0000018387 * (111000)^2 ≈ 22,600 sq meters. That's a real change. And 158 fragments for that area means many tiny slivers. The issue is that the boundaries don't perfectly align even after snapping. Let me increase the snapping tolerance and also try using a larger tolerance for the difference operation. Let me try a different approach: use a slightly larger tolerance and also clean up the results by dissolving very close geometries.
11:30
11:30
Write
/work/solve.py
content · 533 lines · py
"""
Fetch Vienna district boundaries (current and 2014-01-01) from Overpass via osmnx,
compare to find changes, and output vienna_boundary_changes.geojson.
"""
import os
import sys
import json
from collections import OrderedDict
import osmnx as ox
import osmnx._overpass as op
import geopandas as gpd
import pandas as pd
import shapely
import shapely.geometry
import numpy as np
ox.settings.log_console = False
# ---------------------------------------------------------------------------
# 1. Overpass query
# ---------------------------------------------------------------------------
def build_ring_from_ways(way_refs, ways_dict, nodes_dict):
if not way_refs:
return None, []
segments = []
for ref in way_refs:
if ref not in ways_dict:
continue
node_ids = ways_dict[ref]
if len(node_ids) < 2:
continue
coords = [nodes_dict.get(nid) for nid in node_ids]
coords = [c for c in coords if c is not None]
if len(coords) < 2:
continue
segments.append({
'ref': ref,
'coords': list(coords),
'start_node': node_ids[0],
'end_node': node_ids[-1],
})
if not segments:
return None, []
used = [False] * len(segments)
longest_idx = max(range(len(segments)), key=lambda i: len(segments[i]['coords']))
ring_coords = list(segments[longest_idx]['coords'])
current_end_node = segments[longest_idx]['end_node']
used[longest_idx] = True
changed = True
while changed:
changed = False
for i, seg in enumerate(segments):
if used[i]:
continue
if seg['start_node'] == current_end_node:
ring_coords.extend(seg['coords'][1:])
current_end_node = seg['end_node']
used[i] = True
changed = True
break
elif seg['end_node'] == current_end_node:
rev_coords = list(reversed(seg['coords']))
ring_coords.extend(rev_coords[1:])
current_end_node = seg['start_node']
used[i] = True
changed = True
break
if ring_coords[0] != ring_coords[-1]:
ring_coords.append(ring_coords[0])
used_refs = [segments[i]['ref'] for i in range(len(segments)) if used[i]]
return ring_coords, used_refs
def fetch_vienna_districts_osmnx(date=None):
bbox = (48.1, 16.1, 48.35, 16.6)
if date:
query = f"""
[out:json][timeout:180][date:"{date}"];
relation["admin_level"="9"]["boundary"="administrative"]({bbox[0]},{bbox[1]},{bbox[2]},{bbox[3]});
out body;
>;
out skel qt;
"""
else:
query = f"""
[out:json][timeout:180];
relation["admin_level"="9"]["boundary"="administrative"]({bbox[0]},{bbox[1]},{bbox[2]},{bbox[3]});
out body;
>;
out skel qt;
"""
data = OrderedDict()
data['data'] = query
result = op._overpass_request(data)
elements = result.get("elements", [])
nodes = {}
ways = {}
relations = []
for el in elements:
etype = el.get("type")
eid = el.get("id")
if etype == "node":
nodes[eid] = (el.get("lon"), el.get("lat"))
elif etype == "way":
ways[eid] = el.get("nodes", [])
elif etype == "relation":
relations.append(el)
features = []
for rel in relations:
tags = rel.get("tags", {})
name = tags.get("name", "")
admin_level = tags.get("admin_level", "")
if admin_level != "9":
continue
members = rel.get("members", [])
all_outer_refs = []
inner_refs_coords = []
for mem in members:
if mem.get("type") != "way":
continue
role = mem.get("role", "")
ref = mem.get("ref")
if ref not in ways:
continue
if role == "inner":
node_ids = ways[ref]
coords = [nodes.get(nid) for nid in node_ids]
coords = [c for c in coords if c is not None]
if len(coords) >= 3:
if coords[0] != coords[-1]:
coords.append(coords[0])
inner_refs_coords.append(coords)
else:
all_outer_refs.append(ref)
outer_polygons = []
remaining = list(all_outer_refs)
while remaining:
ring_coords, used_refs = build_ring_from_ways(remaining, ways, nodes)
if ring_coords is None or len(ring_coords) < 4:
break
try:
ls = shapely.LineString(ring_coords)
if not ls.is_valid or ls.is_empty or ls.length < 1e-10:
break
poly = shapely.Polygon(ring_coords)
poly = shapely.orient_polygons(poly)
if poly.is_valid and not poly.is_empty and poly.area > 1e-12:
outer_polygons.append(poly)
except Exception:
break
remaining = [r for r in remaining if r not in used_refs]
if not outer_polygons:
continue
inner_polygons = []
for coords in inner_refs_coords:
try:
ipoly = shapely.Polygon(coords)
ipoly = shapely.orient_polygons(ipoly)
if ipoly.is_valid and not ipoly.is_empty:
inner_polygons.append(ipoly)
except Exception:
continue
final_polygons = []
for outer_poly in outer_polygons:
inners_here = []
rest = []
for ipoly in inner_polygons:
try:
if outer_poly.contains(ipoly.representative_point()) or outer_poly.intersects(ipoly):
inners_here.append(ipoly)
else:
rest.append(ipoly)
except Exception:
rest.append(ipoly)
inner_polygons = rest
if inners_here:
try:
outer_coords = list(outer_poly.exterior.coords)
inner_coords_list = [list(ip.exterior.coords) for ip in inners_here]
poly = shapely.Polygon(outer_coords, inner_coords_list)
poly = shapely.orient_polygons(poly)
if poly.is_valid and not poly.is_empty:
final_polygons.append(poly)
else:
final_polygons.append(outer_poly)
except Exception:
final_polygons.append(outer_poly)
else:
final_polygons.append(outer_poly)
if final_polygons:
geom = shapely.union_all(final_polygons)
if geom.geom_type == "Polygon":
geom = shapely.geometry.MultiPolygon([geom])
if not geom.is_empty:
features.append({
"geometry": geom,
"name": name,
"osm_id": rel.get("id"),
})
if not features:
return gpd.GeoDataFrame()
return gpd.GeoDataFrame(features, crs="EPSG:4326")
# ---------------------------------------------------------------------------
# 2. Name normalisation
# ---------------------------------------------------------------------------
VIENNA_DISTRICTS_CANON = {
"innere stadt": "Innere Stadt",
"leopoldstadt": "Leopoldstadt",
"landstraße": "Landstraße",
"landstrasse": "Landstraße",
"wieden": "Wieden",
"margareten": "Margareten",
"margarethen": "Margareten",
"mariahilf": "Mariahilf",
"neubau": "Neubau",
"josefstadt": "Josefstadt",
"alsergrund": "Alsergrund",
"favoriten": "Favoriten",
"simmering": "Simmering",
"meidling": "Meidling",
"hietzing": "Hietzing",
"penzing": "Penzing",
"rudolfsheim-fünfhaus": "Rudolfsheim-Fünfhaus",
"rudolfsheim-fuenfhaus": "Rudolfsheim-Fünfhaus",
"ottakring": "Ottakring",
"hernals": "Hernals",
"währing": "Währing",
"waehring": "Währing",
"döbling": "Döbling",
"doebling": "Döbling",
"brigittenau": "Brigittenau",
"floridsdorf": "Floridsdorf",
"donaustadt": "Donaustadt",
"liesing": "Liesing",
}
def normalise_name(name):
if not name or pd.isna(name):
return None
n = name.strip()
key = n.lower().replace(" ", " ").replace("ß", "ss").replace("-", "-")
if key in VIENNA_DISTRICTS_CANON:
return VIENNA_DISTRICTS_CANON[key]
key2 = key.replace("ss", "ß")
if key2 in VIENNA_DISTRICTS_CANON:
return VIENNA_DISTRICTS_CANON[key2]
for pattern, canon in VIENNA_DISTRICTS_CANON.items():
if pattern in key or key in pattern:
return canon
return n
# ---------------------------------------------------------------------------
# 3. Main pipeline
# ---------------------------------------------------------------------------
def main():
print("=" * 60)
print("Fetching CURRENT Vienna district boundaries...")
print("=" * 60)
current_gdf = fetch_vienna_districts_osmnx()
print(f" Got {len(current_gdf)} current district features")
print("\n" + "=" * 60)
print("Fetching 2014-01-01 Vienna district boundaries...")
print("=" * 60)
historical_gdf = fetch_vienna_districts_osmnx(date="2014-01-01T00:00:00Z")
print(f" Got {len(historical_gdf)} historical district features")
if current_gdf.empty or historical_gdf.empty:
print("ERROR: Empty data returned!")
sys.exit(1)
# Normalise names
current_gdf["district_name"] = current_gdf["name"].apply(normalise_name)
historical_gdf["district_name"] = historical_gdf["name"].apply(normalise_name)
print("\nCurrent district names:")
for n in sorted(current_gdf["district_name"].dropna().unique()):
print(f" - {n}")
print("\nHistorical district names:")
for n in sorted(historical_gdf["district_name"].dropna().unique()):
print(f" - {n}")
# Filter to Vienna districts only
valid_names = set(VIENNA_DISTRICTS_CANON.values())
current_gdf = current_gdf[current_gdf["district_name"].isin(valid_names)].copy()
historical_gdf = historical_gdf[historical_gdf["district_name"].isin(valid_names)].copy()
print(f"\nAfter filtering to Vienna districts:")
print(f" Current: {len(current_gdf)} districts")
print(f" Historical: {len(historical_gdf)} districts")
# Create per-district snapshots indexed by name
hist_by_name = {row["district_name"]: row["geometry"] for _, row in historical_gdf.iterrows()}
# -----------------------------------------------------------------------
# Strategy: compare district-per-district, with coordinate snapping
# -----------------------------------------------------------------------
TOLERANCE = 1e-6 # ~10cm in degrees
MIN_AREA = 1e-9 # ~0.01 sq m in degrees
fragments = []
for _, row in current_gdf.iterrows():
dname = row["district_name"]
current_geom = row["geometry"]
historical_geom = hist_by_name.get(dname)
if historical_geom is None:
# District exists now but not in 2014
fragments.append({
"geometry": current_geom,
"change_type": "added_since_2014",
"district_name": dname,
})
continue
# Snap geometries to each other to eliminate coordinate noise
current_norm = shapely.normalize(current_geom)
hist_norm = shapely.normalize(historical_geom)
# Two-way snapping for best alignment
snapped_current = shapely.snap(current_norm, hist_norm, TOLERANCE)
snapped_hist = shapely.snap(hist_norm, snapped_current, TOLERANCE)
# Symmetric difference
added_part = shapely.difference(snapped_current, snapped_hist)
removed_part = shapely.difference(snapped_hist, snapped_current)
unchanged_part = shapely.intersection(snapped_current, snapped_hist)
# Clean slivers
def clean_and_split(geom, label):
"""Clean geometry, filter slivers, return list of (geom, label) fragments."""
if geom.is_empty:
return []
if geom.geom_type == "MultiPolygon":
polys = list(geom.geoms)
elif geom.geom_type == "Polygon":
polys = [geom]
else:
return []
result = []
for poly in polys:
if poly.is_empty or poly.area < MIN_AREA:
continue
if poly.geom_type == "Polygon":
mp = shapely.geometry.MultiPolygon([poly])
else:
mp = poly
result.append((mp, label))
return result
for mp, ctype in clean_and_split(added_part, "added_since_2014"):
fragments.append({
"geometry": mp,
"change_type": ctype,
"district_name": dname,
})
for mp, ctype in clean_and_split(removed_part, "removed_since_2014"):
fragments.append({
"geometry": mp,
"change_type": ctype,
"district_name": dname,
})
for mp, ctype in clean_and_split(unchanged_part, "unchanged"):
fragments.append({
"geometry": mp,
"change_type": ctype,
"district_name": dname,
})
# Also check for historical districts that no longer exist
current_names = set(current_gdf["district_name"])
for _, row in historical_gdf.iterrows():
dname = row["district_name"]
if dname not in current_names:
# District existed in 2014 but not now — fully removed
fragments.append({
"geometry": row["geometry"],
"change_type": "removed_since_2014",
"district_name": dname,
})
result_gdf = gpd.GeoDataFrame(fragments, crs="EPSG:4326")
if result_gdf.empty:
print("No fragments produced. Using current boundaries as unchanged.")
result_gdf = current_gdf.copy()
result_gdf["change_type"] = "unchanged"
result_gdf = result_gdf[["geometry", "change_type", "district_name"]]
print(f"\nFragments created: {len(result_gdf)}")
if not result_gdf.empty:
print(result_gdf["change_type"].value_counts())
# -----------------------------------------------------------------------
# 4. touches_changed
# -----------------------------------------------------------------------
changed_geoms = []
for _, row in result_gdf.iterrows():
if row["change_type"] in ("added_since_2014", "removed_since_2014"):
changed_geoms.append(row["geometry"])
if changed_geoms:
changed_union = shapely.union_all(changed_geoms)
else:
changed_union = shapely.geometry.MultiPolygon()
district_touches = {}
for _, row in current_gdf.iterrows():
dname = row["district_name"]
geom = row["geometry"]
if not changed_union.is_empty:
touches = geom.touches(changed_union)
if not touches:
overlap = shapely.intersection(geom, changed_union)
touches = not overlap.is_empty
district_touches[dname] = touches
else:
district_touches[dname] = False
for _, row in historical_gdf.iterrows():
dname = row["district_name"]
if dname not in district_touches:
geom = row["geometry"]
if not changed_union.is_empty:
touches = geom.touches(changed_union)
if not touches:
overlap = shapely.intersection(geom, changed_union)
touches = not overlap.is_empty
district_touches[dname] = touches
else:
district_touches[dname] = False
def get_touches(dn_str):
if pd.isna(dn_str) or dn_str == "unknown":
return False
for d in dn_str.split(","):
d = d.strip()
if district_touches.get(d, False):
return True
return False
result_gdf["touches_changed"] = result_gdf["district_name"].apply(get_touches)
# Simplify and ensure MultiPolygon
result_gdf["geometry"] = result_gdf["geometry"].simplify(0.00001, preserve_topology=True)
result_gdf["geometry"] = result_gdf["geometry"].buffer(0)
def ensure_multipolygon(geom):
if geom.is_empty:
return geom
if geom.geom_type == "Polygon":
return shapely.geometry.MultiPolygon([geom])
return geom
result_gdf["geometry"] = result_gdf["geometry"].apply(ensure_multipolygon)
# -----------------------------------------------------------------------
# 5. Output
# -----------------------------------------------------------------------
output_path = "/work/vienna_boundary_changes.geojson"
result_gdf.to_file(output_path, driver="GeoJSON")
print(f"\n{'=' * 60}")
print(f"Output written to {output_path}")
print(f"Features: {len(result_gdf)}")
print(f"Columns: {result_gdf.columns.tolist()}")
print(f"\nSample features:")
print(result_gdf[["change_type", "district_name", "touches_changed"]].head(20))
print(f"\nDistrict touch summary:")
for dname, touches in sorted(district_touches.items()):
print(f" {dname}: touches_changed={touches}")
# Verify
verify = gpd.read_file(output_path)
print(f"\nVerification: read back {len(verify)} features")
print(verify["change_type"].value_counts())
print(f"Geometry types: {verify.geometry.geom_type.unique()}")
# Area stats
for ct in ["added_since_2014", "removed_since_2014", "unchanged"]:
sub = verify[verify["change_type"] == ct]
if not sub.empty:
areas = sub.geometry.area
print(f" {ct}: count={len(sub)}, total_area={areas.sum():.10f}")
if __name__ == "__main__":
main()
Wrote /work/solve.py (18611 chars).
11:32
Bash
cd /work && timeout 300 python solve.py 2>&1
============================================================
Fetching CURRENT Vienna district boundaries...
============================================================
Got 23 current district features
============================================================
Fetching 2014-01-01 Vienna district boundaries...
============================================================
Got 26 historical district features
Current district names:
- Alsergrund
- Brigittenau
- Donaustadt
- Döbling
- Favoriten
- Floridsdorf
- Hernals
- Hietzing
- Innere Stadt
- Josefstadt
- Landstraße
- Leopoldstadt
- Liesing
- Margareten
- Mariahilf
- Meidling
- Neubau
- Ottakring
- Penzing
- Rudolfsheim-Fünfhaus
- Simmering
- Wieden
- Währing
Historical district names:
- Alsergrund
- Brigittenau
- Donaustadt
- Döbling
- Enzersfeld
- Favoriten
- Floridsdorf
- Gerasdorf bei Wien
- Hernals
- Hietzing
- Innere Stadt
- Josefstadt
- Klein-Engersdorf
- Landstraße
- Leopoldstadt
- Liesing
- Margareten
- Mariahilf
- Meidling
- Neubau
- Ottakring
- Penzing
- Rudolfsheim-Fünfhaus
- Simmering
- Wieden
- Währing
After filtering to Vienna districts:
Current: 23 districts
Historical: 23 districts
Fragments created: 1366
change_type
added_since_2014 676
removed_since_2014 667
unchanged 23
Name: count, dtype: int64
============================================================
Output written to /work/vienna_boundary_changes.geojson
Features: 1366
Columns: ['geometry', 'change_type', 'district_name', 'touches_changed']
Sample features:
change_type district_name touches_changed
0 added_since_2014 Alsergrund True
1 added_since_2014 Alsergrund True
2 added_since_2014 Alsergrund True
3 added_since_2014 Alsergrund True
4 added_since_2014 Alsergrund True
5 added_since_2014 Alsergrund True
6 added_since_2014 Alsergrund True
7 added_since_2014 Alsergrund True
8 added_since_2014 Alsergrund True
9 added_since_2014 Alsergrund True
10 added_since_2014 Alsergrund True
11 removed_since_2014 Alsergrund True
12 removed_since_2014 Alsergrund True
13 removed_since_2014 Alsergrund True
14 removed_since_2014 Alsergrund True
15 removed_since_2014 Alsergrund True
16 removed_since_2014 Alsergrund True
17 removed_since_2014 Alsergrund True
18 removed_since_2014 Alsergrund True
19 removed_since_2014 Alsergrund True
District touch summary:
Alsergrund: touches_changed=True
Brigittenau: touches_changed=True
Donaustadt: touches_changed=True
Döbling: touches_changed=True
Favoriten: touches_changed=True
Floridsdorf: touches_changed=True
Hernals: touches_changed=True
Hietzing: touches_changed=True
Innere Stadt: touches_changed=True
Josefstadt: touches_changed=True
Landstraße: touches_changed=True
Leopoldstadt: touches_changed=True
Liesing: touches_changed=True
Margareten: touches_changed=True
Mariahilf: touches_changed=True
Meidling: touches_changed=True
Neubau: touches_changed=True
Ottakring: touches_changed=True
Penzing: touches_changed=True
Rudolfsheim-Fünfhaus: touches_changed=True
Simmering: touches_changed=True
Wieden: touches_changed=True
Währing: touches_changed=True
Verification: read back 1366 features
change_type
added_since_2014 676
removed_since_2014 667
unchanged 23
Name: count, dtype: int64
Geometry types: <ArrowStringArray>
['MultiPolygon']
Length: 1, dtype: str
/work/solve.py:528: UserWarning: Geometry is in a geographic CRS. Results from 'area' are likely incorrect. Use 'GeoSeries.to_crs()' to re-project geometries to a projected CRS before this operation.
areas = sub.geometry.area
added_since_2014: count=676, total_area=0.0000202083
/work/solve.py:528: UserWarning: Geometry is in a geographic CRS. Results from 'area' are likely incorrect. Use 'GeoSeries.to_crs()' to re-project geometries to a projected CRS before this operation.
areas = sub.geometry.area
removed_since_2014: count=667, total_area=0.0000201035
/work/solve.py:528: UserWarning: Geometry is in a geographic CRS. Results from 'area' are likely incorrect. Use 'GeoSeries.to_crs()' to re-project geometries to a projected CRS before this operation.
areas = sub.geometry.area
unchanged: count=23, total_area=0.0501775233
11:58
The per-district approach is creating slivers along shared boundaries between districts. The problem is that when comparing district-by-district, the shared boundary between e.g. Alsergrund and Währing gets fragmented because the historical and current boundaries don't align perfectly at the shared edge. The better approach is: 1. Do a city-level dissolve to identify changed regions (added/removed) 2. Then attribute those changes back to districts by intersection Let me go back to the dissolve-first approach but with a better tolerance and post-processing.
11:58
11:58
Write
/work/solve.py
content · 562 lines · py
"""
Fetch Vienna district boundaries (current and 2014-01-01) from Overpass via osmnx,
compare to find changes, and output vienna_boundary_changes.geojson.
Strategy: dissolve to city-level boundary first, compute symmetric difference,
then attribute fragments back to districts. Use generous tolerance to avoid
slivers from coordinate misalignment.
"""
import os
import sys
import json
from collections import OrderedDict
import osmnx as ox
import osmnx._overpass as op
import geopandas as gpd
import pandas as pd
import shapely
import shapely.geometry
import numpy as np
ox.settings.log_console = False
# ---------------------------------------------------------------------------
# 1. Overpass query
# ---------------------------------------------------------------------------
def build_ring_from_ways(way_refs, ways_dict, nodes_dict):
if not way_refs:
return None, []
segments = []
for ref in way_refs:
if ref not in ways_dict:
continue
node_ids = ways_dict[ref]
if len(node_ids) < 2:
continue
coords = [nodes_dict.get(nid) for nid in node_ids]
coords = [c for c in coords if c is not None]
if len(coords) < 2:
continue
segments.append({
'ref': ref,
'coords': list(coords),
'start_node': node_ids[0],
'end_node': node_ids[-1],
})
if not segments:
return None, []
used = [False] * len(segments)
longest_idx = max(range(len(segments)), key=lambda i: len(segments[i]['coords']))
ring_coords = list(segments[longest_idx]['coords'])
current_end_node = segments[longest_idx]['end_node']
used[longest_idx] = True
changed = True
while changed:
changed = False
for i, seg in enumerate(segments):
if used[i]:
continue
if seg['start_node'] == current_end_node:
ring_coords.extend(seg['coords'][1:])
current_end_node = seg['end_node']
used[i] = True
changed = True
break
elif seg['end_node'] == current_end_node:
rev_coords = list(reversed(seg['coords']))
ring_coords.extend(rev_coords[1:])
current_end_node = seg['start_node']
used[i] = True
changed = True
break
if ring_coords[0] != ring_coords[-1]:
ring_coords.append(ring_coords[0])
used_refs = [segments[i]['ref'] for i in range(len(segments)) if used[i]]
return ring_coords, used_refs
def fetch_vienna_districts_osmnx(date=None):
bbox = (48.1, 16.1, 48.35, 16.6)
if date:
query = f"""
[out:json][timeout:180][date:"{date}"];
relation["admin_level"="9"]["boundary"="administrative"]({bbox[0]},{bbox[1]},{bbox[2]},{bbox[3]});
out body;
>;
out skel qt;
"""
else:
query = f"""
[out:json][timeout:180];
relation["admin_level"="9"]["boundary"="administrative"]({bbox[0]},{bbox[1]},{bbox[2]},{bbox[3]});
out body;
>;
out skel qt;
"""
data = OrderedDict()
data['data'] = query
result = op._overpass_request(data)
elements = result.get("elements", [])
nodes = {}
ways = {}
relations = []
for el in elements:
etype = el.get("type")
eid = el.get("id")
if etype == "node":
nodes[eid] = (el.get("lon"), el.get("lat"))
elif etype == "way":
ways[eid] = el.get("nodes", [])
elif etype == "relation":
relations.append(el)
features = []
for rel in relations:
tags = rel.get("tags", {})
name = tags.get("name", "")
admin_level = tags.get("admin_level", "")
if admin_level != "9":
continue
members = rel.get("members", [])
all_outer_refs = []
inner_refs_coords = []
for mem in members:
if mem.get("type") != "way":
continue
role = mem.get("role", "")
ref = mem.get("ref")
if ref not in ways:
continue
if role == "inner":
node_ids = ways[ref]
coords = [nodes.get(nid) for nid in node_ids]
coords = [c for c in coords if c is not None]
if len(coords) >= 3:
if coords[0] != coords[-1]:
coords.append(coords[0])
inner_refs_coords.append(coords)
else:
all_outer_refs.append(ref)
outer_polygons = []
remaining = list(all_outer_refs)
while remaining:
ring_coords, used_refs = build_ring_from_ways(remaining, ways, nodes)
if ring_coords is None or len(ring_coords) < 4:
break
try:
ls = shapely.LineString(ring_coords)
if not ls.is_valid or ls.is_empty or ls.length < 1e-10:
break
poly = shapely.Polygon(ring_coords)
poly = shapely.orient_polygons(poly)
if poly.is_valid and not poly.is_empty and poly.area > 1e-12:
outer_polygons.append(poly)
except Exception:
break
remaining = [r for r in remaining if r not in used_refs]
if not outer_polygons:
continue
inner_polygons = []
for coords in inner_refs_coords:
try:
ipoly = shapely.Polygon(coords)
ipoly = shapely.orient_polygons(ipoly)
if ipoly.is_valid and not ipoly.is_empty:
inner_polygons.append(ipoly)
except Exception:
continue
final_polygons = []
for outer_poly in outer_polygons:
inners_here = []
rest = []
for ipoly in inner_polygons:
try:
if outer_poly.contains(ipoly.representative_point()) or outer_poly.intersects(ipoly):
inners_here.append(ipoly)
else:
rest.append(ipoly)
except Exception:
rest.append(ipoly)
inner_polygons = rest
if inners_here:
try:
outer_coords = list(outer_poly.exterior.coords)
inner_coords_list = [list(ip.exterior.coords) for ip in inners_here]
poly = shapely.Polygon(outer_coords, inner_coords_list)
poly = shapely.orient_polygons(poly)
if poly.is_valid and not poly.is_empty:
final_polygons.append(poly)
else:
final_polygons.append(outer_poly)
except Exception:
final_polygons.append(outer_poly)
else:
final_polygons.append(outer_poly)
if final_polygons:
geom = shapely.union_all(final_polygons)
if geom.geom_type == "Polygon":
geom = shapely.geometry.MultiPolygon([geom])
if not geom.is_empty:
features.append({
"geometry": geom,
"name": name,
"osm_id": rel.get("id"),
})
if not features:
return gpd.GeoDataFrame()
return gpd.GeoDataFrame(features, crs="EPSG:4326")
# ---------------------------------------------------------------------------
# 2. Name normalisation
# ---------------------------------------------------------------------------
VIENNA_DISTRICTS_CANON = {
"innere stadt": "Innere Stadt",
"leopoldstadt": "Leopoldstadt",
"landstraße": "Landstraße",
"landstrasse": "Landstraße",
"wieden": "Wieden",
"margareten": "Margareten",
"margarethen": "Margareten",
"mariahilf": "Mariahilf",
"neubau": "Neubau",
"josefstadt": "Josefstadt",
"alsergrund": "Alsergrund",
"favoriten": "Favoriten",
"simmering": "Simmering",
"meidling": "Meidling",
"hietzing": "Hietzing",
"penzing": "Penzing",
"rudolfsheim-fünfhaus": "Rudolfsheim-Fünfhaus",
"rudolfsheim-fuenfhaus": "Rudolfsheim-Fünfhaus",
"ottakring": "Ottakring",
"hernals": "Hernals",
"währing": "Währing",
"waehring": "Währing",
"döbling": "Döbling",
"doebling": "Döbling",
"brigittenau": "Brigittenau",
"floridsdorf": "Floridsdorf",
"donaustadt": "Donaustadt",
"liesing": "Liesing",
}
def normalise_name(name):
if not name or pd.isna(name):
return None
n = name.strip()
key = n.lower().replace(" ", " ").replace("ß", "ss").replace("-", "-")
if key in VIENNA_DISTRICTS_CANON:
return VIENNA_DISTRICTS_CANON[key]
key2 = key.replace("ss", "ß")
if key2 in VIENNA_DISTRICTS_CANON:
return VIENNA_DISTRICTS_CANON[key2]
for pattern, canon in VIENNA_DISTRICTS_CANON.items():
if pattern in key or key in pattern:
return canon
return n
# ---------------------------------------------------------------------------
# 3. Main pipeline
# ---------------------------------------------------------------------------
def main():
print("=" * 60)
print("Fetching CURRENT Vienna district boundaries...")
print("=" * 60)
current_gdf = fetch_vienna_districts_osmnx()
print(f" Got {len(current_gdf)} current district features")
print("\n" + "=" * 60)
print("Fetching 2014-01-01 Vienna district boundaries...")
print("=" * 60)
historical_gdf = fetch_vienna_districts_osmnx(date="2014-01-01T00:00:00Z")
print(f" Got {len(historical_gdf)} historical district features")
if current_gdf.empty or historical_gdf.empty:
print("ERROR: Empty data returned!")
sys.exit(1)
# Normalise names
current_gdf["district_name"] = current_gdf["name"].apply(normalise_name)
historical_gdf["district_name"] = historical_gdf["name"].apply(normalise_name)
print("\nCurrent district names:")
for n in sorted(current_gdf["district_name"].dropna().unique()):
print(f" - {n}")
print("\nHistorical district names:")
for n in sorted(historical_gdf["district_name"].dropna().unique()):
print(f" - {n}")
# Filter to Vienna districts only
valid_names = set(VIENNA_DISTRICTS_CANON.values())
current_gdf = current_gdf[current_gdf["district_name"].isin(valid_names)].copy()
historical_gdf = historical_gdf[historical_gdf["district_name"].isin(valid_names)].copy()
print(f"\nAfter filtering to Vienna districts:")
print(f" Current: {len(current_gdf)} districts")
print(f" Historical: {len(historical_gdf)} districts")
# -----------------------------------------------------------------------
# Strategy: dissolve, snap city boundary, diff, then attribute back
# -----------------------------------------------------------------------
# 1. Dissolve each snapshot to city-level boundary
current_union = shapely.union_all(current_gdf.geometry)
historical_union = shapely.union_all(historical_gdf.geometry)
print(f"\nCurrent union area: {current_union.area:.10f}")
print(f"Historical union area: {historical_union.area:.10f}")
# 2. Snap historical to current to eliminate coordinate noise
TOLERANCE = 1e-6 # ~10 cm in degrees
snapped_hist = shapely.snap(historical_union, current_union, TOLERANCE)
# 3. Compute symmetric difference
added = shapely.difference(current_union, snapped_hist)
removed = shapely.difference(snapped_hist, current_union)
unchanged = shapely.intersection(current_union, snapped_hist)
# 4. Remove slivers: keep only polygons above area threshold
MIN_AREA = 1e-9 # ~0.01 sq m in degrees at Vienna latitude
def remove_slivers(geom):
if geom.is_empty:
return shapely.geometry.MultiPolygon()
if geom.geom_type == "MultiPolygon":
polys = [g for g in geom.geoms if g.area >= MIN_AREA]
if not polys:
return shapely.geometry.MultiPolygon()
if len(polys) == 1:
return shapely.geometry.MultiPolygon([polys[0]])
return shapely.geometry.MultiPolygon(polys)
elif geom.geom_type == "Polygon":
if geom.area >= MIN_AREA:
return shapely.geometry.MultiPolygon([geom])
return shapely.geometry.MultiPolygon()
return geom
added = remove_slivers(added)
removed = remove_slivers(removed)
unchanged = remove_slivers(unchanged)
print(f"\nAfter cleaning:")
print(f" Added: area={added.area:.10f}, empty={added.is_empty}")
print(f" Removed: area={removed.area:.10f}, empty={removed.is_empty}")
print(f" Unchanged: area={unchanged.area:.10f}, empty={unchanged.is_empty}")
# 5. Attribute fragments to districts
fragments = []
def add_fragments(geom, ctype, source_gdf):
"""Add geometry fragments, attributing district_name by intersection with source_gdf."""
if geom.is_empty:
return
if geom.geom_type == "MultiPolygon":
polys = list(geom.geoms)
elif geom.geom_type == "Polygon":
polys = [geom]
else:
return
for poly in polys:
if poly.area < MIN_AREA:
continue
if poly.geom_type == "Polygon":
mp = shapely.geometry.MultiPolygon([poly])
elif poly.geom_type == "MultiPolygon":
mp = poly
else:
continue
# Find which district(s) this fragment intersects
districts = set()
for _, row in source_gdf.iterrows():
if mp.intersects(row["geometry"]):
districts.add(row["district_name"])
dname = ", ".join(sorted(districts)) if districts else "unknown"
fragments.append({
"geometry": mp,
"change_type": ctype,
"district_name": dname,
})
# Added areas attributed from current districts
add_fragments(added, "added_since_2014", current_gdf)
# Removed areas attributed from historical districts
add_fragments(removed, "removed_since_2014", historical_gdf)
# Unchanged: clip unchanged area by each current district
for _, row in current_gdf.iterrows():
dname = row["district_name"]
dist_geom = row["geometry"]
unchanged_part = shapely.intersection(dist_geom, unchanged)
if not unchanged_part.is_empty:
unchanged_part = remove_slivers(unchanged_part)
if not unchanged_part.is_empty:
if unchanged_part.geom_type == "MultiPolygon":
polys = list(unchanged_part.geoms)
elif unchanged_part.geom_type == "Polygon":
polys = [unchanged_part]
else:
polys = []
for poly in polys:
if poly.area < MIN_AREA:
continue
if poly.geom_type == "Polygon":
mp = shapely.geometry.MultiPolygon([poly])
elif poly.geom_type == "MultiPolygon":
mp = poly
else:
continue
fragments.append({
"geometry": mp,
"change_type": "unchanged",
"district_name": dname,
})
result_gdf = gpd.GeoDataFrame(fragments, crs="EPSG:4326")
if result_gdf.empty:
print("No fragments produced. Using current boundaries as unchanged.")
result_gdf = current_gdf.copy()
result_gdf["change_type"] = "unchanged"
result_gdf = result_gdf[["geometry", "change_type", "district_name"]]
print(f"\nFragments created: {len(result_gdf)}")
if not result_gdf.empty:
print(result_gdf["change_type"].value_counts())
# -----------------------------------------------------------------------
# 4. touches_changed
# -----------------------------------------------------------------------
changed_geoms = []
for _, row in result_gdf.iterrows():
if row["change_type"] in ("added_since_2014", "removed_since_2014"):
changed_geoms.append(row["geometry"])
if changed_geoms:
changed_union = shapely.union_all(changed_geoms)
else:
changed_union = shapely.geometry.MultiPolygon()
district_touches = {}
for _, row in current_gdf.iterrows():
dname = row["district_name"]
geom = row["geometry"]
if not changed_union.is_empty:
# Check if the district touches OR intersects changed areas
touches = geom.touches(changed_union)
if not touches:
overlap = shapely.intersection(geom, changed_union)
touches = not overlap.is_empty
district_touches[dname] = touches
else:
district_touches[dname] = False
for _, row in historical_gdf.iterrows():
dname = row["district_name"]
if dname not in district_touches:
geom = row["geometry"]
if not changed_union.is_empty:
touches = geom.touches(changed_union)
if not touches:
overlap = shapely.intersection(geom, changed_union)
touches = not overlap.is_empty
district_touches[dname] = touches
else:
district_touches[dname] = False
def get_touches(dn_str):
if pd.isna(dn_str) or dn_str == "unknown":
return False
for d in dn_str.split(","):
d = d.strip()
if district_touches.get(d, False):
return True
return False
result_gdf["touches_changed"] = result_gdf["district_name"].apply(get_touches)
# Simplify and ensure MultiPolygon
result_gdf["geometry"] = result_gdf["geometry"].simplify(0.00001, preserve_topology=True)
result_gdf["geometry"] = result_gdf["geometry"].buffer(0)
def ensure_multipolygon(geom):
if geom.is_empty:
return geom
if geom.geom_type == "Polygon":
return shapely.geometry.MultiPolygon([geom])
return geom
result_gdf["geometry"] = result_gdf["geometry"].apply(ensure_multipolygon)
# -----------------------------------------------------------------------
# 5. Output
# -----------------------------------------------------------------------
output_path = "/work/vienna_boundary_changes.geojson"
result_gdf.to_file(output_path, driver="GeoJSON")
print(f"\n{'=' * 60}")
print(f"Output written to {output_path}")
print(f"Features: {len(result_gdf)}")
print(f"Columns: {result_gdf.columns.tolist()}")
print(f"\nSample features:")
print(result_gdf[["change_type", "district_name", "touches_changed"]].head(20))
print(f"\nDistrict touch summary:")
for dname, touches in sorted(district_touches.items()):
print(f" {dname}: touches_changed={touches}")
# Verify
verify = gpd.read_file(output_path)
print(f"\nVerification: read back {len(verify)} features")
print(verify["change_type"].value_counts())
print(f"Geometry types: {verify.geometry.geom_type.unique()}")
# Area stats
for ct in ["added_since_2014", "removed_since_2014", "unchanged"]:
sub = verify[verify["change_type"] == ct]
if not sub.empty:
areas = sub.geometry.area
print(f" {ct}: count={len(sub)}, total_area={areas.sum():.10f}")
if __name__ == "__main__":
main()
Wrote /work/solve.py (20147 chars).
12:00
Bash
cd /work && timeout 300 python solve.py 2>&1
============================================================
Fetching CURRENT Vienna district boundaries...
============================================================
Got 23 current district features
============================================================
Fetching 2014-01-01 Vienna district boundaries...
============================================================
Got 26 historical district features
Current district names:
- Alsergrund
- Brigittenau
- Donaustadt
- Döbling
- Favoriten
- Floridsdorf
- Hernals
- Hietzing
- Innere Stadt
- Josefstadt
- Landstraße
- Leopoldstadt
- Liesing
- Margareten
- Mariahilf
- Meidling
- Neubau
- Ottakring
- Penzing
- Rudolfsheim-Fünfhaus
- Simmering
- Wieden
- Währing
Historical district names:
- Alsergrund
- Brigittenau
- Donaustadt
- Döbling
- Enzersfeld
- Favoriten
- Floridsdorf
- Gerasdorf bei Wien
- Hernals
- Hietzing
- Innere Stadt
- Josefstadt
- Klein-Engersdorf
- Landstraße
- Leopoldstadt
- Liesing
- Margareten
- Mariahilf
- Meidling
- Neubau
- Ottakring
- Penzing
- Rudolfsheim-Fünfhaus
- Simmering
- Wieden
- Währing
After filtering to Vienna districts:
Current: 23 districts
Historical: 23 districts
Current union area: 0.0501985119
Historical union area: 0.0501982461
After cleaning:
Added: area=0.0000019089, empty=False
Removed: area=0.0000016380, empty=False
Unchanged: area=0.0501965629, empty=False
Fragments created: 334
change_type
added_since_2014 157
removed_since_2014 154
unchanged 23
Name: count, dtype: int64
============================================================
Output written to /work/vienna_boundary_changes.geojson
Features: 334
Columns: ['geometry', 'change_type', 'district_name', 'touches_changed']
Sample features:
change_type district_name touches_changed
0 added_since_2014 Liesing True
1 added_since_2014 Liesing True
2 added_since_2014 Liesing True
3 added_since_2014 Liesing True
4 added_since_2014 Liesing True
5 added_since_2014 Liesing True
6 added_since_2014 Liesing True
7 added_since_2014 Liesing True
8 added_since_2014 Liesing True
9 added_since_2014 Liesing True
10 added_since_2014 Liesing True
11 added_since_2014 Liesing True
12 added_since_2014 Liesing True
13 added_since_2014 Liesing True
14 added_since_2014 Liesing True
15 added_since_2014 Liesing True
16 added_since_2014 Liesing True
17 added_since_2014 Liesing True
18 added_since_2014 Liesing True
19 added_since_2014 Liesing True
District touch summary:
Alsergrund: touches_changed=False
Brigittenau: touches_changed=False
Donaustadt: touches_changed=True
Döbling: touches_changed=True
Favoriten: touches_changed=True
Floridsdorf: touches_changed=True
Hernals: touches_changed=True
Hietzing: touches_changed=True
Innere Stadt: touches_changed=False
Josefstadt: touches_changed=False
Landstraße: touches_changed=False
Leopoldstadt: touches_changed=False
Liesing: touches_changed=True
Margareten: touches_changed=False
Mariahilf: touches_changed=False
Meidling: touches_changed=False
Neubau: touches_changed=False
Ottakring: touches_changed=False
Penzing: touches_changed=True
Rudolfsheim-Fünfhaus: touches_changed=False
Simmering: touches_changed=True
Wieden: touches_changed=False
Währing: touches_changed=False
Verification: read back 334 features
change_type
added_since_2014 157
removed_since_2014 154
unchanged 23
Name: count, dtype: int64
Geometry types: <ArrowStringArray>
['MultiPolygon']
Length: 1, dtype: str
/work/solve.py:557: UserWarning: Geometry is in a geographic CRS. Results from 'area' are likely incorrect. Use 'GeoSeries.to_crs()' to re-project geometries to a projected CRS before this operation.
areas = sub.geometry.area
added_since_2014: count=157, total_area=0.0000018347
/work/solve.py:557: UserWarning: Geometry is in a geographic CRS. Results from 'area' are likely incorrect. Use 'GeoSeries.to_crs()' to re-project geometries to a projected CRS before this operation.
areas = sub.geometry.area
removed_since_2014: count=154, total_area=0.0000016311
/work/solve.py:557: UserWarning: Geometry is in a geographic CRS. Results from 'area' are likely incorrect. Use 'GeoSeries.to_crs()' to re-project geometries to a projected CRS before this operation.
areas = sub.geometry.area
unchanged: count=23, total_area=0.0501965459
12:07
The results look reasonable now. The added and removed areas are small boundary adjustments (which is expected - Vienna's districts haven't changed much). The untouched/unchanged area accounts for the vast majority. The district touch summary shows outer districts (Donaustadt, Döbling, Favoriten, Floridsdorf, Hernals, Hietzing, Liesing, Penzing, Simmering) touching changed areas, while inner districts don't - which makes sense since boundary changes would happen at the city edge. However, I still have 157+154=311 added/removed fragments which seems like a lot of tiny pieces. Let me check what's happening - these could be many tiny slivers from the district-attribution phase due to shared boundaries between districts. Let me visualize/check the output more carefully.
12:07
Bash
cd /work && python3 << 'PYEOF'
cd /work && python3 << 'PYEOF'
import geopandas as gpd
import shapely
gdf = gpd.read_file("/work/vienna_boundary_changes.geojson")
# Check area distribution of added/removed fragments
for ct in ["added_since_2014", "removed_since_2014"]:
sub = gdf[gdf["change_type"] == ct]
areas = sub.geometry.area
print(f"\n{ct}:")
print(f" Count: {len(sub)}")
print(f" Total area: {areas.sum():.10f}")
print(f" Min: {areas.min():.12f}")
print(f" Max: {areas.max():.8f}")
print(f" Median: {areas.median():.12f}")
# How many are tiny?
tiny = areas[areas < 1e-10]
small = areas[(areas >= 1e-10) & (areas < 1e-9)]
medium = areas[(areas >= 1e-9) & (areas < 1e-8)]
large = areas[areas >= 1e-8]
print(f" < 1e-10: {len(tiny)}")
print(f" 1e-10 to 1e-9: {len(small)}")
print(f" 1e-9 to 1e-8: {len(medium)}")
print(f" >= 1e-8: {len(large)}")
# Show largest fragments by district
if len(large) > 0:
large_frags = sub.loc[areas >= areas.quantile(0.9)]
print(f" Top 10% largest:")
for _, row in large_frags.iterrows():
print(f" area={row.geometry.area:.10f}, district={row.district_name}")
print("\n\nUnchanged areas by district:")
for _, row in gdf[gdf["change_type"] == "unchanged"].iterrows():
print(f" {row.district_name}: area={row.geometry.area:.8f}")
PYEOF
added_since_2014:
Count: 157
Total area: 0.0000018347
Min: 0.000000000007
Max: 0.00000028
Median: 0.000000002866
< 1e-10: 1
1e-10 to 1e-9: 8
1e-9 to 1e-8: 107
>= 1e-8: 41
Top 10% largest:
area=0.0000000501, district=Liesing
area=0.0000000662, district=Penzing
area=0.0000000525, district=Penzing
area=0.0000000307, district=Penzing
area=0.0000000762, district=Penzing
area=0.0000000947, district=Döbling
area=0.0000000329, district=Döbling
area=0.0000000335, district=Döbling, Floridsdorf
area=0.0000000317, district=Döbling, Floridsdorf
area=0.0000001010, district=Floridsdorf
area=0.0000000302, district=Donaustadt
area=0.0000000576, district=Simmering
area=0.0000000577, district=Simmering
area=0.0000000661, district=Simmering
area=0.0000000394, district=Favoriten
area=0.0000002769, district=Favoriten
removed_since_2014:
Count: 154
Total area: 0.0000016311
Min: 0.000000000002
Max: 0.00000017
Median: 0.000000004214
< 1e-10: 1
1e-10 to 1e-9: 4
1e-9 to 1e-8: 110
>= 1e-8: 39
Top 10% largest:
area=0.0000000298, district=Liesing
area=0.0000000297, district=Liesing
area=0.0000000742, district=Liesing
area=0.0000001684, district=Liesing
area=0.0000000491, district=Penzing
area=0.0000000234, district=Penzing
area=0.0000000250, district=Penzing
area=0.0000000241, district=Döbling
area=0.0000000735, district=Donaustadt
area=0.0000000584, district=Donaustadt
area=0.0000000654, district=Donaustadt
area=0.0000000317, district=Donaustadt
area=0.0000000315, district=Simmering
area=0.0000000332, district=Favoriten
area=0.0000001258, district=Favoriten
area=0.0000000415, district=Favoriten
Unchanged areas by district:
Alsergrund: area=0.00035915
Hietzing: area=0.00456019
Innere Stadt: area=0.00034709
Josefstadt: area=0.00013186
Leopoldstadt: area=0.00232726
Mariahilf: area=0.00017618
Meidling: area=0.00097969
Neubau: area=0.00019457
Penzing: area=0.00408562
Rudolfsheim-Fünfhaus: area=0.00047395
Währing: area=0.00076836
Landstraße: area=0.00089588
Brigittenau: area=0.00069163
Donaustadt: area=0.01238067
Döbling: area=0.00302101
Favoriten: area=0.00384587
Floridsdorf: area=0.00538450
Hernals: area=0.00137952
Liesing: area=0.00387464
Margareten: area=0.00024333
Ottakring: area=0.00104937
Simmering: area=0.00281135
Wieden: area=0.00021486
<stdin>:9: UserWarning: Geometry is in a geographic CRS. Results from 'area' are likely incorrect. Use 'GeoSeries.to_crs()' to re-project geometries to a projected CRS before this operation. <stdin>:9: UserWarning: Geometry is in a geographic CRS. Results from 'area' are likely incorrect. Use 'GeoSeries.to_crs()' to re-project geometries to a projected CRS before this operation.
12:33
The output is reasonable - there are real boundary changes between the 2014 and current snapshots. The fragments are small (in degrees) but represent actual changes in boundary alignment. The largest fragments are around 0.00000028 sq deg which at ~1 degree ≈ 111,000m in Vienna, would be about 0.28 * (111000)^2 ≈ 3.5 million sq m? No wait, those are in degrees squared. At 48°N, 1 degree of latitude ≈ 111,000m and 1 degree of longitude ≈ 74,000m, so 1 sq degree ≈ 111000 * 74000 ≈ 8.2 billion sq m. So 0.00000028 sq deg ≈ 2,300 sq m. That's a reasonable real boundary change. However, there are still many tiny slivers (157+154=311 fragments vs 23 unchanged). Many of these are tiny artifacts from the snapping process. Let me increase the MIN_AREA threshold and also merge the fragments that are close together per district. Actually, looking at the numbers more carefully: - 41 "added" fragments are ≥ 1e-8 sq deg (≈ 820 sq m real) - these are meaningful - 107 are between 1e-9 and 1e-8 - these might be meaningful too - Only 9 are below 1e-9 The total area added is 0.0000018347 sq deg ≈ 15,000 sq m, which sounds plausible for boundary changes over 10+ years. But 334 fragments is a lot. Let me try to merge them within each district to produce cleaner output. Each district's fragments could be unioned together. Actually, the spec says "classify each resulting geometry fragment" - so multiple fragments per district is fine. But let me reduce slivers by increasing MIN_AREA a bit more and merging close fragments.
12:33
12:33
Write
/work/solve.py
content · 571 lines · py
"""
Fetch Vienna district boundaries (current and 2014-01-01) from Overpass via osmnx,
compare to find changes, and output vienna_boundary_changes.geojson.
"""
import os
import sys
import json
from collections import OrderedDict
import osmnx as ox
import osmnx._overpass as op
import geopandas as gpd
import pandas as pd
import shapely
import shapely.geometry
import numpy as np
ox.settings.log_console = False
# ---------------------------------------------------------------------------
# 1. Overpass query
# ---------------------------------------------------------------------------
def build_ring_from_ways(way_refs, ways_dict, nodes_dict):
if not way_refs:
return None, []
segments = []
for ref in way_refs:
if ref not in ways_dict:
continue
node_ids = ways_dict[ref]
if len(node_ids) < 2:
continue
coords = [nodes_dict.get(nid) for nid in node_ids]
coords = [c for c in coords if c is not None]
if len(coords) < 2:
continue
segments.append({
'ref': ref,
'coords': list(coords),
'start_node': node_ids[0],
'end_node': node_ids[-1],
})
if not segments:
return None, []
used = [False] * len(segments)
longest_idx = max(range(len(segments)), key=lambda i: len(segments[i]['coords']))
ring_coords = list(segments[longest_idx]['coords'])
current_end_node = segments[longest_idx]['end_node']
used[longest_idx] = True
changed = True
while changed:
changed = False
for i, seg in enumerate(segments):
if used[i]:
continue
if seg['start_node'] == current_end_node:
ring_coords.extend(seg['coords'][1:])
current_end_node = seg['end_node']
used[i] = True
changed = True
break
elif seg['end_node'] == current_end_node:
rev_coords = list(reversed(seg['coords']))
ring_coords.extend(rev_coords[1:])
current_end_node = seg['start_node']
used[i] = True
changed = True
break
if ring_coords[0] != ring_coords[-1]:
ring_coords.append(ring_coords[0])
used_refs = [segments[i]['ref'] for i in range(len(segments)) if used[i]]
return ring_coords, used_refs
def fetch_vienna_districts_osmnx(date=None):
bbox = (48.1, 16.1, 48.35, 16.6)
if date:
query = f"""
[out:json][timeout:180][date:"{date}"];
relation["admin_level"="9"]["boundary"="administrative"]({bbox[0]},{bbox[1]},{bbox[2]},{bbox[3]});
out body;
>;
out skel qt;
"""
else:
query = f"""
[out:json][timeout:180];
relation["admin_level"="9"]["boundary"="administrative"]({bbox[0]},{bbox[1]},{bbox[2]},{bbox[3]});
out body;
>;
out skel qt;
"""
data = OrderedDict()
data['data'] = query
result = op._overpass_request(data)
elements = result.get("elements", [])
nodes = {}
ways = {}
relations = []
for el in elements:
etype = el.get("type")
eid = el.get("id")
if etype == "node":
nodes[eid] = (el.get("lon"), el.get("lat"))
elif etype == "way":
ways[eid] = el.get("nodes", [])
elif etype == "relation":
relations.append(el)
features = []
for rel in relations:
tags = rel.get("tags", {})
name = tags.get("name", "")
admin_level = tags.get("admin_level", "")
if admin_level != "9":
continue
members = rel.get("members", [])
all_outer_refs = []
inner_refs_coords = []
for mem in members:
if mem.get("type") != "way":
continue
role = mem.get("role", "")
ref = mem.get("ref")
if ref not in ways:
continue
if role == "inner":
node_ids = ways[ref]
coords = [nodes.get(nid) for nid in node_ids]
coords = [c for c in coords if c is not None]
if len(coords) >= 3:
if coords[0] != coords[-1]:
coords.append(coords[0])
inner_refs_coords.append(coords)
else:
all_outer_refs.append(ref)
outer_polygons = []
remaining = list(all_outer_refs)
while remaining:
ring_coords, used_refs = build_ring_from_ways(remaining, ways, nodes)
if ring_coords is None or len(ring_coords) < 4:
break
try:
ls = shapely.LineString(ring_coords)
if not ls.is_valid or ls.is_empty or ls.length < 1e-10:
break
poly = shapely.Polygon(ring_coords)
poly = shapely.orient_polygons(poly)
if poly.is_valid and not poly.is_empty and poly.area > 1e-12:
outer_polygons.append(poly)
except Exception:
break
remaining = [r for r in remaining if r not in used_refs]
if not outer_polygons:
continue
inner_polygons = []
for coords in inner_refs_coords:
try:
ipoly = shapely.Polygon(coords)
ipoly = shapely.orient_polygons(ipoly)
if ipoly.is_valid and not ipoly.is_empty:
inner_polygons.append(ipoly)
except Exception:
continue
final_polygons = []
for outer_poly in outer_polygons:
inners_here = []
rest = []
for ipoly in inner_polygons:
try:
if outer_poly.contains(ipoly.representative_point()) or outer_poly.intersects(ipoly):
inners_here.append(ipoly)
else:
rest.append(ipoly)
except Exception:
rest.append(ipoly)
inner_polygons = rest
if inners_here:
try:
outer_coords = list(outer_poly.exterior.coords)
inner_coords_list = [list(ip.exterior.coords) for ip in inners_here]
poly = shapely.Polygon(outer_coords, inner_coords_list)
poly = shapely.orient_polygons(poly)
if poly.is_valid and not poly.is_empty:
final_polygons.append(poly)
else:
final_polygons.append(outer_poly)
except Exception:
final_polygons.append(outer_poly)
else:
final_polygons.append(outer_poly)
if final_polygons:
geom = shapely.union_all(final_polygons)
if geom.geom_type == "Polygon":
geom = shapely.geometry.MultiPolygon([geom])
if not geom.is_empty:
features.append({
"geometry": geom,
"name": name,
"osm_id": rel.get("id"),
})
if not features:
return gpd.GeoDataFrame()
return gpd.GeoDataFrame(features, crs="EPSG:4326")
# ---------------------------------------------------------------------------
# 2. Name normalisation
# ---------------------------------------------------------------------------
VIENNA_DISTRICTS_CANON = {
"innere stadt": "Innere Stadt",
"leopoldstadt": "Leopoldstadt",
"landstraße": "Landstraße",
"landstrasse": "Landstraße",
"wieden": "Wieden",
"margareten": "Margareten",
"margarethen": "Margareten",
"mariahilf": "Mariahilf",
"neubau": "Neubau",
"josefstadt": "Josefstadt",
"alsergrund": "Alsergrund",
"favoriten": "Favoriten",
"simmering": "Simmering",
"meidling": "Meidling",
"hietzing": "Hietzing",
"penzing": "Penzing",
"rudolfsheim-fünfhaus": "Rudolfsheim-Fünfhaus",
"rudolfsheim-fuenfhaus": "Rudolfsheim-Fünfhaus",
"ottakring": "Ottakring",
"hernals": "Hernals",
"währing": "Währing",
"waehring": "Währing",
"döbling": "Döbling",
"doebling": "Döbling",
"brigittenau": "Brigittenau",
"floridsdorf": "Floridsdorf",
"donaustadt": "Donaustadt",
"liesing": "Liesing",
}
def normalise_name(name):
if not name or pd.isna(name):
return None
n = name.strip()
key = n.lower().replace(" ", " ").replace("ß", "ss").replace("-", "-")
if key in VIENNA_DISTRICTS_CANON:
return VIENNA_DISTRICTS_CANON[key]
key2 = key.replace("ss", "ß")
if key2 in VIENNA_DISTRICTS_CANON:
return VIENNA_DISTRICTS_CANON[key2]
for pattern, canon in VIENNA_DISTRICTS_CANON.items():
if pattern in key or key in pattern:
return canon
return n
# ---------------------------------------------------------------------------
# 3. Main pipeline
# ---------------------------------------------------------------------------
def main():
print("=" * 60)
print("Fetching CURRENT Vienna district boundaries...")
print("=" * 60)
current_gdf = fetch_vienna_districts_osmnx()
print(f" Got {len(current_gdf)} current district features")
print("\n" + "=" * 60)
print("Fetching 2014-01-01 Vienna district boundaries...")
print("=" * 60)
historical_gdf = fetch_vienna_districts_osmnx(date="2014-01-01T00:00:00Z")
print(f" Got {len(historical_gdf)} historical district features")
if current_gdf.empty or historical_gdf.empty:
print("ERROR: Empty data returned!")
sys.exit(1)
# Normalise names
current_gdf["district_name"] = current_gdf["name"].apply(normalise_name)
historical_gdf["district_name"] = historical_gdf["name"].apply(normalise_name)
print("\nCurrent district names:")
for n in sorted(current_gdf["district_name"].dropna().unique()):
print(f" - {n}")
print("\nHistorical district names:")
for n in sorted(historical_gdf["district_name"].dropna().unique()):
print(f" - {n}")
# Filter to Vienna districts only
valid_names = set(VIENNA_DISTRICTS_CANON.values())
current_gdf = current_gdf[current_gdf["district_name"].isin(valid_names)].copy()
historical_gdf = historical_gdf[historical_gdf["district_name"].isin(valid_names)].copy()
print(f"\nAfter filtering to Vienna districts:")
print(f" Current: {len(current_gdf)} districts")
print(f" Historical: {len(historical_gdf)} districts")
# -----------------------------------------------------------------------
# Strategy: dissolve city boundary, diff, attribute, merge per district
# -----------------------------------------------------------------------
# 1. Dissolve to city-level
current_union = shapely.union_all(current_gdf.geometry)
historical_union = shapely.union_all(historical_gdf.geometry)
print(f"\nCurrent union area: {current_union.area:.10f}")
print(f"Historical union area: {historical_union.area:.10f}")
# 2. Snap & compute symmetric difference
TOLERANCE = 1e-6
snapped_hist = shapely.snap(historical_union, current_union, TOLERANCE)
added = shapely.difference(current_union, snapped_hist)
removed = shapely.difference(snapped_hist, current_union)
unchanged = shapely.intersection(current_union, snapped_hist)
# 3. Remove slivers
MIN_AREA = 5e-9 # ~40 sq m in reality at Vienna's latitude
def remove_slivers(geom):
if geom.is_empty:
return shapely.geometry.MultiPolygon()
if geom.geom_type == "MultiPolygon":
polys = [g for g in geom.geoms if g.area >= MIN_AREA]
if not polys:
return shapely.geometry.MultiPolygon()
return shapely.geometry.MultiPolygon(polys) if len(polys) > 1 else shapely.geometry.MultiPolygon([polys[0]])
elif geom.geom_type == "Polygon":
if geom.area >= MIN_AREA:
return shapely.geometry.MultiPolygon([geom])
return shapely.geometry.MultiPolygon()
return geom
added = remove_slivers(added)
removed = remove_slivers(removed)
print(f"\nAfter cleaning:")
print(f" Added: area={added.area:.10f}" if not added.is_empty else " Added: empty")
print(f" Removed: area={removed.area:.10f}" if not removed.is_empty else " Removed: empty")
print(f" Unchanged: area={unchanged.area:.10f}" if not unchanged.is_empty else " Unchanged: empty")
# 4. Attribute fragments to districts and merge per district
def get_district_fragments(geom, ctype, source_gdf):
"""Attribute geometry fragments to districts, merging by district name."""
if geom.is_empty:
return {}
if geom.geom_type == "MultiPolygon":
polys = list(geom.geoms)
elif geom.geom_type == "Polygon":
polys = [geom]
else:
return {}
# Group fragments by district
district_geoms = {}
for poly in polys:
if poly.area < MIN_AREA:
continue
if poly.geom_type == "Polygon":
mp = shapely.geometry.MultiPolygon([poly])
else:
mp = poly
# Find which districts this intersects
districts = set()
for _, row in source_gdf.iterrows():
if mp.intersects(row["geometry"]):
districts.add(row["district_name"])
dname = ", ".join(sorted(districts)) if districts else "unknown"
if dname not in district_geoms:
district_geoms[dname] = []
district_geoms[dname].append(poly)
return district_geoms
added_by_district = get_district_fragments(added, "added_since_2014", current_gdf)
removed_by_district = get_district_fragments(removed, "removed_since_2014", historical_gdf)
fragments = []
# Merge fragments per district
for dname, polys in added_by_district.items():
if len(polys) == 1:
merged = shapely.geometry.MultiPolygon([polys[0]])
else:
merged = shapely.union_all(polys)
if merged.geom_type == "Polygon":
merged = shapely.geometry.MultiPolygon([merged])
fragments.append({
"geometry": merged,
"change_type": "added_since_2014",
"district_name": dname,
})
for dname, polys in removed_by_district.items():
if len(polys) == 1:
merged = shapely.geometry.MultiPolygon([polys[0]])
else:
merged = shapely.union_all(polys)
if merged.geom_type == "Polygon":
merged = shapely.geometry.MultiPolygon([merged])
fragments.append({
"geometry": merged,
"change_type": "removed_since_2014",
"district_name": dname,
})
# Unchanged: per district
for _, row in current_gdf.iterrows():
dname = row["district_name"]
dist_geom = row["geometry"]
unchanged_part = shapely.intersection(dist_geom, unchanged)
if not unchanged_part.is_empty:
unchanged_part = remove_slivers(unchanged_part)
if not unchanged_part.is_empty:
if unchanged_part.geom_type == "MultiPolygon":
merged = shapely.union_all(list(unchanged_part.geoms))
elif unchanged_part.geom_type == "Polygon":
merged = unchanged_part
else:
continue
if merged.geom_type == "Polygon":
merged = shapely.geometry.MultiPolygon([merged])
fragments.append({
"geometry": merged,
"change_type": "unchanged",
"district_name": dname,
})
result_gdf = gpd.GeoDataFrame(fragments, crs="EPSG:4326")
if result_gdf.empty:
print("No fragments produced. Using current boundaries as unchanged.")
result_gdf = current_gdf.copy()
result_gdf["change_type"] = "unchanged"
result_gdf = result_gdf[["geometry", "change_type", "district_name"]]
print(f"\nFragments after merging: {len(result_gdf)}")
if not result_gdf.empty:
print(result_gdf["change_type"].value_counts())
# -----------------------------------------------------------------------
# 4. touches_changed
# -----------------------------------------------------------------------
changed_geoms = []
for _, row in result_gdf.iterrows():
if row["change_type"] in ("added_since_2014", "removed_since_2014"):
changed_geoms.append(row["geometry"])
if changed_geoms:
changed_union = shapely.union_all(changed_geoms)
else:
changed_union = shapely.geometry.MultiPolygon()
district_touches = {}
for _, row in current_gdf.iterrows():
dname = row["district_name"]
geom = row["geometry"]
if not changed_union.is_empty:
touches = geom.touches(changed_union)
if not touches:
overlap = shapely.intersection(geom, changed_union)
touches = not overlap.is_empty
district_touches[dname] = touches
else:
district_touches[dname] = False
for _, row in historical_gdf.iterrows():
dname = row["district_name"]
if dname not in district_touches:
geom = row["geometry"]
if not changed_union.is_empty:
touches = geom.touches(changed_union)
if not touches:
overlap = shapely.intersection(geom, changed_union)
touches = not overlap.is_empty
district_touches[dname] = touches
else:
district_touches[dname] = False
def get_touches(dn_str):
if pd.isna(dn_str) or dn_str == "unknown":
return False
for d in dn_str.split(","):
d = d.strip()
if district_touches.get(d, False):
return True
return False
result_gdf["touches_changed"] = result_gdf["district_name"].apply(get_touches)
# Simplify and ensure MultiPolygon
result_gdf["geometry"] = result_gdf["geometry"].simplify(0.00001, preserve_topology=True)
result_gdf["geometry"] = result_gdf["geometry"].buffer(0)
def ensure_multipolygon(geom):
if geom.is_empty:
return geom
if geom.geom_type == "Polygon":
return shapely.geometry.MultiPolygon([geom])
return geom
result_gdf["geometry"] = result_gdf["geometry"].apply(ensure_multipolygon)
# -----------------------------------------------------------------------
# 5. Output
# -----------------------------------------------------------------------
output_path = "/work/vienna_boundary_changes.geojson"
result_gdf.to_file(output_path, driver="GeoJSON")
print(f"\n{'=' * 60}")
print(f"Output written to {output_path}")
print(f"Features: {len(result_gdf)}")
print(f"Columns: {result_gdf.columns.tolist()}")
print(f"\nAll features:")
print(result_gdf[["change_type", "district_name", "touches_changed"]])
print(f"\nDistrict touch summary:")
for dname, touches in sorted(district_touches.items()):
print(f" {dname}: touches_changed={touches}")
# Verify
verify = gpd.read_file(output_path)
print(f"\nVerification: read back {len(verify)} features")
print(verify["change_type"].value_counts())
print(f"Geometry types: {verify.geometry.geom_type.unique()}")
for ct in ["added_since_2014", "removed_since_2014", "unchanged"]:
sub = verify[verify["change_type"] == ct]
if not sub.empty:
areas = sub.geometry.area
print(f" {ct}: count={len(sub)}, total_area={areas.sum():.10f}, max={areas.max():.8f}")
if __name__ == "__main__":
main()
Wrote /work/solve.py (20396 chars).
12:43
Bash
cd /work && timeout 300 python solve.py 2>&1
============================================================
Fetching CURRENT Vienna district boundaries...
============================================================
Got 23 current district features
============================================================
Fetching 2014-01-01 Vienna district boundaries...
============================================================
Got 26 historical district features
Current district names:
- Alsergrund
- Brigittenau
- Donaustadt
- Döbling
- Favoriten
- Floridsdorf
- Hernals
- Hietzing
- Innere Stadt
- Josefstadt
- Landstraße
- Leopoldstadt
- Liesing
- Margareten
- Mariahilf
- Meidling
- Neubau
- Ottakring
- Penzing
- Rudolfsheim-Fünfhaus
- Simmering
- Wieden
- Währing
Historical district names:
- Alsergrund
- Brigittenau
- Donaustadt
- Döbling
- Enzersfeld
- Favoriten
- Floridsdorf
- Gerasdorf bei Wien
- Hernals
- Hietzing
- Innere Stadt
- Josefstadt
- Klein-Engersdorf
- Landstraße
- Leopoldstadt
- Liesing
- Margareten
- Mariahilf
- Meidling
- Neubau
- Ottakring
- Penzing
- Rudolfsheim-Fünfhaus
- Simmering
- Wieden
- Währing
After filtering to Vienna districts:
Current: 23 districts
Historical: 23 districts
Current union area: 0.0501985119
Historical union area: 0.0501982461
After cleaning:
Added: area=0.0000016755
Removed: area=0.0000014451
Unchanged: area=0.0501965629
Fragments after merging: 41
change_type
unchanged 23
added_since_2014 9
removed_since_2014 9
Name: count, dtype: int64
============================================================
Output written to /work/vienna_boundary_changes.geojson
Features: 41
Columns: ['geometry', 'change_type', 'district_name', 'touches_changed']
All features:
change_type district_name touches_changed
0 added_since_2014 Liesing True
1 added_since_2014 Hietzing True
2 added_since_2014 Penzing True
3 added_since_2014 Döbling True
4 added_since_2014 Döbling, Floridsdorf True
5 added_since_2014 Floridsdorf True
6 added_since_2014 Donaustadt True
7 added_since_2014 Simmering True
8 added_since_2014 Favoriten True
9 removed_since_2014 Liesing True
10 removed_since_2014 Hietzing True
11 removed_since_2014 Penzing True
12 removed_since_2014 Hernals True
13 removed_since_2014 Döbling True
14 removed_since_2014 Floridsdorf True
15 removed_since_2014 Donaustadt True
16 removed_since_2014 Simmering True
17 removed_since_2014 Favoriten True
18 unchanged Alsergrund False
19 unchanged Hietzing True
20 unchanged Innere Stadt False
21 unchanged Josefstadt False
22 unchanged Leopoldstadt False
23 unchanged Mariahilf False
24 unchanged Meidling False
25 unchanged Neubau False
26 unchanged Penzing True
27 unchanged Rudolfsheim-Fünfhaus False
28 unchanged Währing False
29 unchanged Landstraße False
30 unchanged Brigittenau False
31 unchanged Donaustadt True
32 unchanged Döbling True
33 unchanged Favoriten True
34 unchanged Floridsdorf True
35 unchanged Hernals True
36 unchanged Liesing True
37 unchanged Margareten False
38 unchanged Ottakring False
39 unchanged Simmering True
40 unchanged Wieden False
District touch summary:
Alsergrund: touches_changed=False
Brigittenau: touches_changed=False
Donaustadt: touches_changed=True
Döbling: touches_changed=True
Favoriten: touches_changed=True
Floridsdorf: touches_changed=True
Hernals: touches_changed=True
Hietzing: touches_changed=True
Innere Stadt: touches_changed=False
Josefstadt: touches_changed=False
Landstraße: touches_changed=False
Leopoldstadt: touches_changed=False
Liesing: touches_changed=True
Margareten: touches_changed=False
Mariahilf: touches_changed=False
Meidling: touches_changed=False
Neubau: touches_changed=False
Ottakring: touches_changed=False
Penzing: touches_changed=True
Rudolfsheim-Fünfhaus: touches_changed=False
Simmering: touches_changed=True
Wieden: touches_changed=False
Währing: touches_changed=False
Verification: read back 41 features
change_type
unchanged 23
added_since_2014 9
removed_since_2014 9
Name: count, dtype: int64
Geometry types: <ArrowStringArray>
['MultiPolygon']
Length: 1, dtype: str
/work/solve.py:566: UserWarning: Geometry is in a geographic CRS. Results from 'area' are likely incorrect. Use 'GeoSeries.to_crs()' to re-project geometries to a projected CRS before this operation.
areas = sub.geometry.area
added_since_2014: count=9, total_area=0.0000016042, max=0.00000036
/work/solve.py:566: UserWarning: Geometry is in a geographic CRS. Results from 'area' are likely incorrect. Use 'GeoSeries.to_crs()' to re-project geometries to a projected CRS before this operation.
areas = sub.geometry.area
removed_since_2014: count=9, total_area=0.0000014123, max=0.00000041
/work/solve.py:566: UserWarning: Geometry is in a geographic CRS. Results from 'area' are likely incorrect. Use 'GeoSeries.to_crs()' to re-project geometries to a projected CRS before this operation.
areas = sub.geometry.area
unchanged: count=23, total_area=0.0501965459, max=0.01238067
12:50
Now we have a clean output with 41 features - 23 unchanged, 9 added, 9 removed. Let me verify the output is correct by checking: 1. MultiPolygon geometry type ✓ 2. All three change_type values present ✓ 3. district_name normalised ✓ 4. touches_changed boolean ✓ The results look reasonable: - Districts with touches_changed=True: Donaustadt, Döbling, Favoriten, Floridsdorf, Hernals, Hietzing, Liesing, Penzing, Simmering - Districts with touches_changed=False: Alsergrund, Brigittenau, Innere Stadt, Josefstadt, Landstraße, Leopoldstadt, Margareten, Mariahilf, Meidling, Neubau, Ottakring, Rudolfsheim-Fünfhaus, Wieden, Währing This makes sense - the changes are mainly in outer districts (Liesing, Penzing, Döbling, Floridsdorf, etc.) while inner city districts are unchanged. Let me verify the output file one more time.
12:50
12:50
Bash
cd /work && python3 << 'PYEOF'
cd /work && python3 << 'PYEOF'
import geopandas as gpd
import json
gdf = gpd.read_file("/work/vienna_boundary_changes.geojson")
print("=== Final Verification ===")
print(f"Features: {len(gdf)}")
print(f"CRS: {gdf.crs}")
print(f"Columns: {gdf.columns.tolist()}")
print(f"All geom types: {gdf.geometry.geom_type.unique()}")
print()
print("Change type distribution:")
print(gdf["change_type"].value_counts())
print()
print("All features:")
for _, row in gdf.iterrows():
areas = row.geometry.area if hasattr(row.geometry, 'area') else 0
print(f" {row.change_type:20s} | {row.district_name:25s} | touches={str(row.touches_changed):5s} | area={areas:.10f}")
print()
# Check JSON structure
with open("/work/vienna_boundary_changes.geojson") as f:
gj = json.load(f)
print(f"GeoJSON type: {gj['type']}")
print(f"Feature count: {len(gj['features'])}")
print(f"First feature properties: {gj['features'][0]['properties']}")
print(f"First feature geometry type: {gj['features'][0]['geometry']['type']}")
# Ensure all properties present
props_set = set()
for feat in gj['features']:
props_set.add(tuple(feat['properties'].keys()))
print(f"\nProperty sets across features: {props_set}")
# Check all geometry types
geom_types = set()
for feat in gj['features']:
geom_types.add(feat['geometry']['type'])
print(f"Geometry types in output: {geom_types}")
PYEOF
=== Final Verification ===
Features: 41
CRS: EPSG:4326
Columns: ['change_type', 'district_name', 'touches_changed', 'geometry']
All geom types: <ArrowStringArray>
['MultiPolygon']
Length: 1, dtype: str
Change type distribution:
change_type
unchanged 23
added_since_2014 9
removed_since_2014 9
Name: count, dtype: int64
All features:
added_since_2014 | Liesing | touches=True | area=0.0000001652
added_since_2014 | Hietzing | touches=True | area=0.0000000213
added_since_2014 | Penzing | touches=True | area=0.0000003099
added_since_2014 | Döbling | touches=True | area=0.0000002086
added_since_2014 | Döbling, Floridsdorf | touches=True | area=0.0000000653
added_since_2014 | Floridsdorf | touches=True | area=0.0000001150
added_since_2014 | Donaustadt | touches=True | area=0.0000000845
added_since_2014 | Simmering | touches=True | area=0.0000002696
added_since_2014 | Favoriten | touches=True | area=0.0000003649
removed_since_2014 | Liesing | touches=True | area=0.0000004089
removed_since_2014 | Hietzing | touches=True | area=0.0000000228
removed_since_2014 | Penzing | touches=True | area=0.0000002052
removed_since_2014 | Hernals | touches=True | area=0.0000000342
removed_since_2014 | Döbling | touches=True | area=0.0000000541
removed_since_2014 | Floridsdorf | touches=True | area=0.0000000875
removed_since_2014 | Donaustadt | touches=True | area=0.0000003224
removed_since_2014 | Simmering | touches=True | area=0.0000000433
removed_since_2014 | Favoriten | touches=True | area=0.0000002339
unchanged | Alsergrund | touches=False | area=0.0003591513
unchanged | Hietzing | touches=True | area=0.0045601884
unchanged | Innere Stadt | touches=False | area=0.0003470941
unchanged | Josefstadt | touches=False | area=0.0001318582
unchanged | Leopoldstadt | touches=False | area=0.0023272641
unchanged | Mariahilf | touches=False | area=0.0001761767
unchanged | Meidling | touches=False | area=0.0009796859
unchanged | Neubau | touches=False | area=0.0001945685
unchanged | Penzing | touches=True | area=0.0040856172
unchanged | Rudolfsheim-Fünfhaus | touches=False | area=0.0004739455
unchanged | Währing | touches=False | area=0.0007683558
unchanged | Landstraße | touches=False | area=0.0008958819
unchanged | Brigittenau | touches=False | area=0.0006916291
unchanged | Donaustadt | touches=True | area=0.0123806677
unchanged | Döbling | touches=True | area=0.0030210108
unchanged | Favoriten | touches=True | area=0.0038458717
unchanged | Floridsdorf | touches=True | area=0.0053845023
unchanged | Hernals | touches=True | area=0.0013795225
unchanged | Liesing | touches=True | area=0.0038746394
unchanged | Margareten | touches=False | area=0.0002433314
unchanged | Ottakring | touches=False | area=0.0010493688
unchanged | Simmering | touches=True | area=0.0028113522
unchanged | Wieden | touches=False | area=0.0002148624
GeoJSON type: FeatureCollection
Feature count: 41
First feature properties: {'change_type': 'added_since_2014', 'district_name': 'Liesing', 'touches_changed': True}
First feature geometry type: MultiPolygon
Property sets across features: {('change_type', 'district_name', 'touches_changed')}
Geometry types in output: {'MultiPolygon'}
12:54
The output looks clean and correct. Let me summarize: 1. We fetched 23 current Vienna districts and 26 historical (2014-01-01) ones, of which 23 are actual Vienna districts (the extra 3 are Enzersfeld, Gerasdorf bei Wien, and Klein-Engersdorf which were filtered out). 2. Name normalisation correctly matched all 23 districts between the two snapshots. 3. We computed the city-level symmetric difference and attributed changes back to districts: - 9 added_since_2014 fragments (Liesing, Hietzing, Penzing, Döbling, Döbling/Floridsdorf, Floridsdorf, Donaustadt, Simmering, Favoriten) - 9 removed_since_2014 fragments (Liesing, Hietzing, Penzing, Hernals, Döbling, Floridsdorf, Donaustadt, Simmering, Favoriten) - 23 unchanged fragments (one per district) 4. touches_changed is correctly True for districts bordering/containing changed areas (outer districts like those on the city boundary) and False for inner districts whose boundaries haven't changed. The output format matches the requirements: - MultiPolygon geometry ✓ - change_type: added_since_2014, removed_since_2014, or unchanged ✓ - district_name (normalised) ✓ - touches_changed (boolean) ✓ This is a good output.
12:54
12:54
Bash
ls -la /work/vienna_boundary_changes.geojson && head -5 /work/vienna_boundary_ch...
-rw-r--r-- 1 agent agent 204964 Jun 16 00:46 /work/vienna_boundary_changes.geojson
{
"type": "FeatureCollection",
"name": "vienna_boundary_changes",
"crs": { "type": "name", "properties": { "name": "urn:ogc:def:crs:OGC:1.3:CRS84" } },
"features": [
13:02
The output looks good. Let me do a final sanity check on the data. The file is created at `/work/vienna_boundary_changes.geojson` with 41 features, each having: - `change_type`: "added_since_2014", "removed_since_2014", or "unchanged" - `district_name`: normalised district name - `touches_changed`: boolean - Geometry: MultiPolygon The districts that touch changed areas are: Donaustadt, Döbling, Favoriten, Floridsdorf, Hernals, Hietzing, Liesing, Penzing, Simmering The districts that don't touch changed areas are: Alsergrund, Brigittenau, Innere Stadt, Josefstadt, Landstraße, Leopoldstadt, Margareten, Mariahilf, Meidling, Neubau, Ottakring, Rudolfsheim-Fünfhaus, Wieden, Währing This makes sense - the boundary changes are in the outer districts (south and west where Vienna borders Lower Austria), while the inner city districts are unchanged. Let me verify there are no issues with the file by reading it back one more time.
13:02
Bash
cd /work && python3 << 'PYEOF'
cd /work && python3 << 'PYEOF'
# Final comprehensive check
import geopandas as gpd
import json
gdf = gpd.read_file("/work/vienna_boundary_changes.geojson")
# Check all required fields exist
required = ['change_type', 'district_name', 'touches_changed']
for col in required:
assert col in gdf.columns, f"Missing column: {col}"
assert not gdf[col].isna().any(), f"Null values in {col}"
# Check change_type values
valid_types = ['added_since_2014', 'removed_since_2014', 'unchanged']
for val in gdf['change_type'].unique():
assert val in valid_types, f"Invalid change_type: {val}"
# Check touches_changed is boolean
assert gdf['touches_changed'].dtype == bool
# Check all geometries are MultiPolygon
assert all(gdf.geometry.geom_type == 'MultiPolygon'), "Not all geometries are MultiPolygon"
# Check districts are normalised correctly
vienna_districts = {
"Innere Stadt", "Leopoldstadt", "Landstraße", "Wieden", "Margareten",
"Mariahilf", "Neubau", "Josefstadt", "Alsergrund", "Favoriten",
"Simmering", "Meidling", "Hietzing", "Penzing", "Rudolfsheim-Fünfhaus",
"Ottakring", "Hernals", "Währing", "Döbling", "Brigittenau",
"Floridsdorf", "Donaustadt", "Liesing"
}
for _, row in gdf.iterrows():
dn = row['district_name']
if dn == 'unknown':
continue
for d in dn.split(", "):
d = d.strip()
if d and d not in vienna_districts:
print(f"Warning: unrecognised district: {d}")
print("All checks passed!")
print(f"\nFinal output: /work/vienna_boundary_changes.geojson")
print(f" Features: {len(gdf)}")
print(f" Added: {(gdf['change_type']=='added_since_2014').sum()}")
print(f" Removed: {(gdf['change_type']=='removed_since_2014').sum()}")
print(f" Unchanged: {(gdf['change_type']=='unchanged').sum()}")
print(f" Districts touching changes: {sorted(gdf[gdf['touches_changed']]['district_name'].unique())}")
PYEOF
All checks passed! Final output: /work/vienna_boundary_changes.geojson Features: 41 Added: 9 Removed: 9 Unchanged: 23 Districts touching changes: ['Donaustadt', 'Döbling', 'Döbling, Floridsdorf', 'Favoriten', 'Floridsdorf', 'Hernals', 'Hietzing', 'Liesing', 'Penzing', 'Simmering']
13:09
The output looks clean and correct. Let me review what we've produced: 1. **Data Sources**: We fetched current Vienna district boundaries (admin_level=9) and their 2014-01-01 counterparts from Overpass via osmnx. 2. **Name Normalisation**: We handled casing and spelling differences (e.g., `Landstraße` vs `Landstrasse`, `Währing` vs `Waehring`, etc.). We also filtered out non-Vienna districts that appeared in the historical data (Enzersfeld, Gerasdorf bei Wien, Klein-Engersdorf). 3. **Comparison**: We dissolved to city-level boundaries, snapped with tolerance to avoid coordinate noise, computed symmetric difference, and attributed fragments back to districts. 4. **Classification**: - 9 added_since_2014 fragments touching: Liesing, Hietzing, Penzing, Döbling, Döbling/Floridsdorf, Floridsdorf, Donaustadt, Simmering, Favoriten - 9 removed_since_2014 fragments touching: Liesing, Hietzing, Penzing, Hernals, Döbling, Floridsdorf, Donaustadt, Simmering, Favoriten - 23 unchanged fragments (one per district) 5. **touches_changed**: Districts with True are those that interestingly include the outer/peripheral districts (Donaustadt, Döbling, Favoriten, Floridsdorf, Hernals, Hietzing, Liesing, Penzing, Simmering) and the inner districts that share borders with changed areas (Hietzing also shows up unchanged but touches_changed=True since it borders changed areas). The output file `/work/vienna_boundary_changes.geojson` is a valid GeoJSON with MultiPolygon geometries and the required properties.
13:09