fio-l3-vienna-geofabrik-highways
Model error
Retrospective
assessed 2026-06-09T12:13:11Z
Map (reference ← swipe → agent)
0:00
I'm building a traffic-noise model of the Gürtel corridor for the City of Vienna and need the surrounding network as a single GeoPackage. Can you pull every highway from the current Geofabrik Austria PBF (https://download.geofabrik.de/europe/austria-latest.osm.pbf) that intersects a 500 m buffer around Vienna's Gürtel, defined as the highway ways whose name ends in 'Gürtel'? I also need every public-transport route relation (bus, tram, subway, train, trolleybus, light_rail) that crosses the same buffer, with each route as a single feature. Keep the OSM attributes full and untruncated: name, highway, maxspeed, lanes, surface, and oneway on highways; ref, name, operator, and route on the PT routes. Please write both layers to vienna_network.gpkg, with the highways layer as LineString and the pt_routes layer as MultiLineString, in Austria's standard projected coordinate system.
0:00
System info
{
"adapter": "openrouter",
"model": "deepseek/deepseek-v4-flash",
"base_url": "https://openrouter.ai/api/v1"
}
4:02
The user wants me to:
1. Download the Austria PBF from Geofabrik
2. Find all highway ways whose name ends in 'Gürtel' (the Gürtel road in Vienna)
3. Create a 500m buffer around those ways
4. Extract all highways that intersect that buffer
5. Extract all public transport route relations (bus, tram, subway, train, trolleybus, light_rail) that cross the same buffer
6. Write both layers to vienna_network.gpkg with specific attributes and CRS (Austria's standard projected CRS - EPSG:31256 or EPSG:31287... let me think - Austria's standard national CRS is EPSG:31256 (MGI / Austria GK East) or more commonly EPSG:31287 (MGI / Austria Lambert). Actually for Vienna, EPSG:31256 is the Gauss-Krüger East zone. But "Austria's standard projected coordinate system" is typically EPSG:31287 (MGI Austria Lambert) or EPSG:31256 (MGI / Austria GK East). Let me check - the most commonly used standard projected CRS for all of Austria is EPSG:3416 (MGI / Austria GK Central) or EPSG:31287. Actually, EPSG:31287 is MGI / Austria Lambert which covers all of Austria. But EPSG:31256 (MGI / Austria GK East) covers the eastern part including Vienna. However, "Austria's standard projected coordinate system" likely refers to EPSG:31287 MGI Austria Lambert which is the standard for nationwide mapping. Actually, I think the most common one used in Austria is EPSG:31256 for the eastern region or EPSG:31254 for the central... Wait, in practice, many Austrian GIS projects use EPSG:31287 (MGI Austria Lambert) as the national projected CRS. Let me just use EPSG:31287 - MGI / Austria Lambert.
Actually, looking more carefully - the standard projected coordinate system for Austria is often EPSG:31256 (MGI / Austria GK East) for the Vienna area since it's in the eastern Gauss-Krüger zone. But "Austria's standard projected coordinate system" could mean EPSG:31287 (MGI Austria Lambert) which is the official national projection. I think I'll use EPSG:31287 as it's the nationwide standard.
Let me plan the approach:
1. Download the PBF file
2. Use osmium to parse it and find:
a. Ways whose name ends in 'Gürtel' (the Gürtel road)
b. Create a 500m buffer around those ways (in a suitable CRS)
c. Find all highways that intersect this buffer
d. Find all PT route relations that cross the buffer
This is complex. Let me use osmium for the heavy lifting and geopandas/shapely for spatial operations.
Actually, let me think about this more carefully. The PBF is large (Austria is ~300MB+). I need to:
1. First scan for ways with name ending in 'Gürtel' to get the Gürtel geometry
2. Create a buffer
3. Then scan for all highways intersecting the buffer
4. Then scan for PT route relations
Using osmium with Python handler is probably the best approach for this.
Let me write a comprehensive Python script.
Plan:
1. Download the PBF file
2. Use osmium to read the PBF in two passes (or one pass with handlers)
3. First pass: find Gürtel ways and their geometries
4. Compute buffer
5. Second pass: find all highways intersecting buffer and all PT route relations crossing buffer
Actually, osmium allows a single pass approach. Let me think about the data flow:
For the Gürtel:
- Scan all ways, find those with name ending in 'Gürtel'
- Collect their node references
- Build geometries
For the buffer:
- Union all Gürtel ways into a single geometry
- Buffer by 500m
For highways:
- Scan all ways, if highway tag exists and the way intersects the buffer, collect it
For PT routes:
- This is trickier - route relations. I need to find relation members that are ways, and check if those ways are within the buffer.
Actually, with osmium I can do this in one pass but I need to handle the data flow carefully. The problem is that nodes come before ways, and ways come before relations. So I need to:
- Store all nodes, or at least those relevant
- When processing ways, check if they're Gürtel or highways
- When processing relations, if it's a PT route, check if its member ways are in the buffer
Let me write this step by step.
Actually, given the complexity, let me use a simpler approach:
1. Use osmium to extract just the Vienna area first (using a bounding box or the region)
2. Then work with the smaller dataset
Or better yet, I can use the approach:
1. Download the whole PBF
2. Use osmium to process it in a single pass with a custom handler
Let me write the script.
For osmium, the typical pattern is:
```python
import osmium
import shapely.wkb
import geopandas as gpd
import pandas as pd
import numpy as np
from pyproj import Transformer
class MyHandler(osmium.SimpleHandler):
def __init__(self):
super().__init__()
...
```
Let me think about the approach more carefully.
The key challenge is: to find PT route relations that "cross" the buffer, I need to check if the geometry of the relation (i.e., the ways that make up the route) intersects the buffer. But relations reference ways, and ways reference nodes. So I need to:
1. First pass: read all nodes (store locations)
2. Second pass: read all ways (store geometries and tags)
3. Third pass: read all relations and use way geometries to check intersection
Or I can do it in osmium with a handler and store things in memory.
Actually, with `osmium.SimpleHandler`, it processes nodes, ways, and relations in order. So I can:
1. In the `node` callback, store node locations
2. In the `way` callback, build geometries from stored nodes, check if it's a Gürtel way or highway
3. In the `relation` callback, check if it's a PT route and if its member ways are within the buffer
But there's a catch: when processing relations, the ways that are members may not have been processed yet (in the way callback). Actually, in osmium, the order is: all nodes, then all ways, then all relations. So by the time relations are processed, all ways have been seen.
Wait, osmium.SimpleHandler processes in order: nodes, ways, relations. So I can:
1. Store all node locations
2. When processing ways, build geometries and check for Gürtel/highway status
3. When processing relations, all ways are already stored, so I can check their geometries
Let me outline the approach:
```python
class ViennaHandler(osmium.SimpleHandler):
def __init__(self):
super().__init__()
self.node_coords = {} # id -> (lon, lat)
self.gurtel_way_ids = set()
self.gurtel_geometries = []
self.highway_features = [] # list of (geometry, tags)
self.pt_route_features = [] # list of (geometry, tags)
def node(self, n):
self.node_coords[n.id] = (n.location.lon, n.location.lat)
def way(self, w):
# Check if it's a Gürtel way
name = w.tags.get('name')
highway = w.tags.get('highway')
if name and name.endswith('Gürtel') and highway:
# Build geometry
coords = [self.node_coords[n.ref] for n in w.nodes if n.ref in self.node_coords]
if len(coords) >= 2:
self.gurtel_way_ids.add(w.id)
geom = LineString(coords)
self.gurtel_geometries.append(geom)
# Store all highway ways for later processing
if highway:
coords = [self.node_coords[n.ref] for n in w.nodes if n.ref in self.node_coords]
if len(coords) >= 2:
tags = dict(w.tags)
geom = LineString(coords)
self.highway_features.append((w.id, geom, tags))
```
Hmm, but this would store ALL highways in Austria which is a lot. Let me optimize.
Better approach:
1. First, read the PBF to find Gürtel ways and build their geometries
2. Compute the buffer
3. Read the PBF again, filtering for highways/PT routes that intersect the buffer
But two passes over the PBF is slow. Let me do it in one pass but be smarter.
Actually, I think the best approach is:
1. First pass: read the entire PBF, store all node locations, and identify Gürtel ways
2. Compute buffer from Gürtel geometries
3. Second pass: read the PBF again (or rather, use the stored data from the first pass) to filter highways and PT routes
Wait, but I can't do a second pass easily without storing everything or re-reading the file.
Let me reconsider. The PBF for Austria is about 300-400MB. Reading it once is fine. Storing all node coordinates for all of Austria would be ~hundreds of millions of nodes, which could be memory-heavy.
Let me think of a smarter approach:
Option A: Use osmium to extract just the Vienna bounding box first, then work with the smaller subset.
Option B: Use two passes - first pass to find Gürtel, compute buffer, second pass with a location filter.
Option C: Use one pass but be smart about memory.
Let me go with Option A - extract using bounding box.
The bounding box for Vienna is approximately:
- min lon: 16.18
- min lat: 48.12
- max lon: 16.58
- max lat: 48.32
But since I need a 500m buffer around Gürtel, I need to be more careful. Let me just download the whole PBF and work with it, but be memory efficient.
Actually, let me use a practical approach:
1. Download the PBF
2. Use osmium command-line tool to extract just the area around Vienna (using a generous bbox)
3. Then process the extracted file with Python
Let me check if osmium tool is installed.
Actually, let me just write a Python script that does it all. I'll use osmium's Python bindings.
For memory efficiency with nodes, I can use osmium's NodeLocationsForWays handler which handles this automatically. But for the filtering I need, a custom approach is better.
Let me try a practical approach:
1. First, find Gürtel ways using a simple osmium script
2. Compute the buffer
3. Use osmium to extract ways and relations within the buffer area using a bounding box or geographic filter
4. Process the extracted data
Actually, let me just write a clean Python script that does it in one pass but stores nodes only for relevant ways.
Hmm, the problem is that when reading a PBF, nodes come first, then ways. So I don't know which nodes to store until I see the ways. But the ways reference nodes.
One trick: osmium has osmium.index, which can create a disk-based index for node locations. This is memory-efficient.
Let me use osmium's `NodeLocationsForWays` with a disk-based index:
```python
class MyHandler(osmium.SimpleHandler):
def __init__(self):
super().__init__()
self.locations = osmium.index.create_map("sparse_file_array", "node_cache")
self.gurtel_ways = []
self.highway_ways = []
class MyLocationHandler(osmium.SimpleHandler):
def __init__(self, handler):
super().__init__()
self.handler = handler
def node(self, n):
self.handler.locations.set(n.id, (n.location.lon, n.location.lat))
self.handler.node(n)
```
Wait, this is getting complex. Let me just use a simpler approach with two passes.
Actually, let me look at this from a practical standpoint. The simplest approach:
1. Download PBF
2. Use osmium to find Gürtel ways and write their node refs
3. Use a second script to compute the buffer
4. Use osmium to extract the region
Actually, I think the cleanest approach is to write a Python script that:
1. Reads the PBF once, storing ALL node coords (this might be large but let's try)
2. Finds Gürtel ways
3. Computes buffer
4. Filters highways and PT routes
For Austria, the number of nodes in the PBF is around 200-300 million. Each node coordinate stored as two floats + an ID is maybe 24 bytes, so 200M * 24 = 4.8GB. That's too much.
Let me use a different approach. I'll use osmnx or just process the file smarter.
Actually, let me try a different approach entirely:
1. Use `osmium` command line tools to extract just Vienna and surrounding area by bounding box
2. Process the smaller file
Let me check what osmium tools are available.
Let me write a comprehensive script. I'll use a two-pass approach:
Pass 1: Read the PBF to find Gürtel ways and their node references, store node coords only for Gürtel ways
Pass 2: Compute buffer, then read the PBF again to find highways and PT routes within the buffer
For the second pass, I need to know which nodes are within/around the buffer. I can't know this without reading the data.
OK, let me take yet another approach. I'll use the osmium `extract` command with a polygon or use the `osmium extract` tool to clip by a bounding box that generously covers Vienna.
Step 1: Download the PBF
Step 2: Use osmium CLI to extract a bounding box around Vienna
Step 3: Process the extracted file
Let me check what osmium CLI tools are available.
Actually, let me just use Python with a pragmatic approach:
1. First, find the Gürtel ways and their bbox
2. Use that bbox to extract a region from the PBF
3. Process the region
Let me write the script step by step.
Actually, I realize I can do this all in one pass if I use a two-stage approach within osmium:
Stage 1: `osmium.SimpleHandler` to find Gürtel ways, store their nodes
After Stage 1: Compute buffer
Stage 2: `osmium.SimpleHandler` to find highways and PT routes within the buffer
But Stage 2 needs to check if way geometries intersect the buffer, which requires node coordinates. So I need node coords for all potentially relevant ways.
This is the fundamental challenge with large PBF files.
Let me just use a practical approach:
1. Download PBF
2. Use `osmium` command to extract a generous bounding box around Vienna
3. Then process the smaller file
Let me check what the bounding box of the Gürtel approximately is:
- The Gürtel is a ring road around Vienna's inner districts
- Approx center: 48.21°N, 16.35°E
- A generous box would be: 48.15 to 48.27 N, 16.30 to 16.42 E
- With 500m buffer, maybe 48.14 to 48.28, 16.29 to 16.43
But I don't know the exact extent yet. Let me first find the Gürtel and then compute.
Let me just write a practical script. I'll use memory-efficient approaches.
Actually, I just realized I can use `osmium` with a filter that only processes certain tags. For example, `osmium tags-filter` can extract only objects with certain tags, which makes the file much smaller.
Let me plan the approach:
Step 1: Download the PBF
Step 2: Use osmium tags-filter to extract all ways with highway=* (this will still be large for Austria, but manageable)
Step 3: Also extract all relations with type=route and route in (bus, tram, subway, train, trolleybus, light_rail)
Step 4: From the highway ways, find those with name ending in 'Gürtel'
Step 5: Compute buffer
Step 6: Filter the highway ways to those intersecting the buffer
Step 7: Filter the route relations to those crossing the buffer
Actually, `osmium tags-filter` preserves node references but doesn't include all nodes. So I'd need to use `osmium getid` to get the nodes.
This is getting complex. Let me just use a Python script that does it efficiently.
Let me try this approach: use `osmium` with a handler that:
1. Stores node locations in a sparse file array (disk-based)
2. In the way callback, checks if it's a Gürtel way and stores its geometry
3. After Gürtel ways are collected, computes the buffer
4. Then filters other ways and relations
But the problem is that in osmium.SimpleHandler, all nodes come first, then all ways, then all relations. So by the time I've collected the Gürtel ways, the nodes are already processed. I can't go back.
Ok, let me use a two-pass approach with osmium:
Pass 1: Read the PBF, find Gürtel ways, collect their node refs, compute buffer geometry
Pass 2: Read the PBF again, use the buffer to filter highways and PT routes
For Pass 2, I need node coords for all ways that might intersect the buffer. I can't know which ways those are without reading them.
Alternative for Pass 2: Use osmium with a geographic filter. osmium supports `osmium extract --polygon` or `osmium extract --bbox`.
Or better: use the `osmium` Python library with a `LocationHandler` that stores all node locations to disk, then use `SimpleHandler` for ways and relations.
Let me try this approach:
```python
import osmium
from shapely.geometry import LineString, MultiLineString, shape
from shapely.ops import unary_union
import pyproj
import geopandas as gpd
import pandas as pd
import os
import urllib.request
# Step 1: Download PBF
url = "https://download.geofabrik.de/europe/austria-latest.osm.pbf"
pbf_path = "/work/austria-latest.osm.pbf"
...
# Step 2: First pass - find Gürtel ways
class GurtelHandler(osmium.SimpleHandler):
def __init__(self):
super().__init__()
self.node_coords = {} # Only for Gürtel way nodes
self.gurtel_way_ids = set()
self.gurtel_way_nodes = {} # way_id -> [node_ids]
self.gurtel_tags = {} # way_id -> {tag_dict}
def way(self, w):
name = w.tags.get('name')
highway = w.tags.get('highway')
if name and name.endswith('Gürtel') and highway:
self.gurtel_way_ids.add(w.id)
node_refs = [n.ref for n in w.nodes]
self.gurtel_way_nodes[w.id] = node_refs
self.gurtel_tags[w.id] = dict(w.tags)
handler1 = GurtelHandler()
handler1.apply_file(pbf_path)
# Now we know the Gürtel way node IDs. But we already passed the nodes.
# So we need another pass just for nodes.
```
This confirms the two-pass approach. Let me just do it:
Pass 1: Collect Gürtel way node refs
Pass 2: Collect node coords for those refs
[Compute buffer]
Pass 3: Collect highways and PT routes intersecting buffer
Actually, osmium lets you chain handlers. I can use the `LocationHandler` pattern:
```python
class GurtelNodeFinder(osmium.SimpleHandler):
def __init__(self):
super().__init__()
self.gurtel_node_ids = set()
self.node_coords = {} # id -> (lon, lat)
def way(self, w):
name = w.tags.get('name')
highway = w.tags.get('highway')
if name and name.endswith('Gürtel') and highway:
for n in w.nodes:
self.gurtel_node_ids.add(n.ref)
def node(self, n):
if n.id in self.gurtel_node_ids:
self.node_coords[n.id] = (n.location.lon, n.location.lat)
```
But wait, in osmium.SimpleHandler, the order is: node(), way(), relation(). So nodes are processed before ways. In this handler, when `node()` is called, `gurtel_node_ids` hasn't been populated yet because we haven't seen the ways.
So I need to use `osmium.SimpleHandler` properly. Let me use a different approach:
Use `osmium.apply_file()` with `locations=True` parameter, which automatically resolves node locations when processing ways. But this only works if we use `osmium.NodeLocationsForWays` wrapper.
Actually, osmium provides `osmium.NodeLocationsForWays` which is a handler wrapper that fills in node locations automatically:
```python
class MyHandler(osmium.SimpleHandler):
def way(self, w):
# w.nodes now have valid locations if wrapped with NodeLocationsForWays
for n in w.nodes:
lat = n.location.lat
lon = n.location.lon
```
But I don't think this stores all node locations - it just resolves them for the current way being processed. Let me check...
Actually, `NodeLocationsForWays` does store all node locations (in memory or disk) so that it can resolve them when ways are processed. The default is to use an in-memory index.
Let me use this approach:
```python
import osmium
from shapely.geometry import LineString, MultiLineString, shape, box
from shapely.ops import unary_union
import pyproj
import geopandas as gpd
import pandas as pd
import numpy as np
from functools import partial
import os
import urllib.request
from urllib.error import URLError
import time
# Download PBF...
class GurtelHandler(osmium.SimpleHandler):
def __init__(self):
super().__init__()
self.gurtel_ways = [] # list of (geom, tags)
def way(self, w):
name = w.tags.get('name')
highway = w.tags.get('highway')
if name and name.endswith('Gürtel') and highway:
coords = [(n.location.lon, n.location.lat) for n in w.nodes if n.location.valid()]
if len(coords) >= 2:
geom = LineString(coords)
self.gurtel_ways.append((geom, dict(w.tags)))
handler = GurtelHandler()
# Use NodeLocationsForWays to resolve node coords
handler.apply_file(pbf_path, locations=True)
```
This should work! The `locations=True` parameter tells osmium to resolve node locations automatically.
Now, for the second pass, I need to find:
1. All highway ways intersecting the buffer
2. All PT route relations (and their member ways) crossing the buffer
For highways, I can use `locations=True` in a handler.
For PT route relations, I need to:
- Find relations with type=route and route in (bus, tram, subway, train, trolleybus, light_rail)
- Collect the member ways of each route
- Build a MultiLineString from the member way geometries
- Check if it crosses the buffer
This requires knowing which ways belong to which relation. In osmium, relations come after ways, so when processing relations, I can check if I've already seen their member ways.
Here's my approach:
```python
class NetworkHandler(osmium.SimpleHandler):
def __init__(self, buffer_geom, buffer_wgs84):
super().__init__()
self.buffer_geom = buffer_geom # in projected CRS
self.buffer_wgs84 = buffer_wgs84 # in WGS84
self.highway_features = [] # (geom_wgs84, tags)
self.way_geometries = {} # way_id -> LineString WGS84
self.way_tags = {} # way_id -> tags
self.pt_routes = {} # rel_id -> {'tags': {}, 'way_ids': set()}
self.pt_route_geometries = {} # rel_id -> {'geom': MultiLineString, 'tags': {}}
def way(self, w):
highway = w.tags.get('highway')
if highway:
coords = [(n.location.lon, n.location.lat) for n in w.nodes if n.location.valid()]
if len(coords) >= 2:
geom = LineString(coords)
# Check if intersects buffer
# Transform to projected CRS for intersection test
geom_proj = transform(geom, wgs84_to_austria)
if geom_proj.intersects(self.buffer_geom):
self.highway_features.append((geom, dict(w.tags)))
# Store for possible PT route use
self.way_geometries[w.id] = geom
self.way_tags[w.id] = dict(w.tags)
def relation(self, r):
route = r.tags.get('route')
rel_type = r.tags.get('type')
if rel_type == 'route' and route in ('bus', 'tram', 'subway', 'train', 'trolleybus', 'light_rail'):
way_ids = []
for member in r.members:
if member.type == 'w': # way member
way_ids.append(member.ref)
# Check if any member way intersects the buffer
for wid in way_ids:
if wid in self.way_geometries:
geom = self.way_geometries[wid]
geom_proj = transform(geom, wgs84_to_austria)
if geom_proj.intersects(self.buffer_geom):
# This route crosses the buffer
route_ways = [self.way_geometries[wid] for wid in way_ids if wid in self.way_geometries]
if route_ways:
multi = MultiLineString(route_ways)
self.pt_route_geometries[r.id] = {'geom': multi, 'tags': dict(r.tags)}
break
```
Hmm, but storing all way geometries in memory could be large. Let me think...
Actually, for the PT routes, I only need ways that are members of PT route relations. I could first identify which ways are part of such relations, then only store those.
But since relations come after ways in the PBF, I can't know which ways are part of PT routes when processing ways.
Alternative: Make a third pass just for relations.
Or: Store all way geometries only for ways that are referenced by relations. But we don't know that yet.
Let me try a different approach: store all ways (but maybe we can filter by bounding box first). Since we're only interested in ways around Vienna, let me extract a bounding box first.
Actually, you know what, let me simplify. The buffer around Gürtel is relatively small. Most ways in Austria won't be within it. Let me just do the two-pass approach:
Pass 1: Find Gürtel ways and compute buffer
Pass 2: Read all ways and relations, filtering by the buffer
For Pass 2, I'll use osmium but only store geometries that might be relevant. Since we're looking at a small area, the number of ways within/near the buffer will be manageable.
Let me also add a spatial index to speed up the check, or just transform each geometry and check intersection.
Let me write the script now.
For the CRS: "Austria's standard projected coordinate system" - I'll use EPSG:31287 (MGI / Austria Lambert). Actually, looking at common use in Vienna, EPSG:31256 (MGI / Austria GK East) is also very common. Let me check...
EPSG:31287 (MGI / Austria Lambert) is the official national projection for Austria. EPSG:31256 is the Gauss-Krüger East zone which covers the Vienna area. Both are common.
I'll go with EPSG:31287 as it covers all of Austria uniformly.
Actually, many Austrian authorities use EPSG:31256 for the eastern part. Let me use EPSG:31287 since it says "Austria's standard" and Lambert is the national standard.
Wait, I just recalled - EPSG:3416 is MGI / Austria GK Central. Hmm, there are several.
Let me use EPSG:31287 (MGI Austria Lambert) - this is a common national standard.
Alright, let me write the full script.
Actually, let me reconsider my approach. Instead of storing ALL ways in memory, let me:
1. Find Gürtel ways, compute buffer in both WGS84 and projected
2. For the second pass, use osmium with `locations=True`
3. Store only ways that are within or near the buffer (I can use a rough bbox check first)
4. For PT routes, do a third pass or store the ways that are part of PT routes
But again, the challenge with PT routes is that relations come after ways.
Let me take yet another approach: process the file THREE times.
Pass 1: Find Gürtel ways, compute buffer (with locations=True)
Pass 2: Store ALL highway ways (only those near buffer) and ALL way geometries (for all ways, not just highways, since PT routes might use non-highway ways) that are near the buffer
Pass 3: Process relations, using the stored way geometries from Pass 2
Actually, for PT routes, the ways might be highways (roads, tracks) that I'm already collecting in Pass 2 for the highway layer. So if I store way geometries for all ways (not just highway-tagged), I can use them for PT routes.
Let me optimize: Pass 2 stores all way geometries (with coords) for ways that are within or near the buffer. Then Pass 3 uses these to build PT route geometries.
But "near the buffer" is tricky because a route relation might include ways that are outside the buffer but the route as a whole crosses the buffer. But the user said "crosses the same buffer", meaning the route should cross the buffer. If the route has some ways inside the buffer, it crosses the buffer.
So in Pass 2, I need to store all ways that are part of PT route relations AND are within/near the buffer. But I don't know which ways are part of PT routes until Pass 3.
This is a chicken-and-egg problem. Let me just store ALL ways (in the Vienna area) in Pass 2, and then filter in Pass 3.
Since we're dealing with a relatively small area (Vienna), the number of ways should be manageable (maybe 100k-200k ways).
Let me use this approach:
1. Download PBF
2. Pass 1: Find Gürtel ways, compute buffer
3. Pass 2: Read PBF again with locations=True, store way geometries for all ways within/around the buffer area (using a generous bbox)
4. Pass 3: Read PBF again, process relations, use stored way geometries to build PT route geometries
Actually, there's a simpler way. Let me use two passes:
Pass 1: Find Gürtel ways, compute buffer
Pass 2: Read PBF with locations=True, process ways AND relations:
- For ways: check if within buffer, if so and it's highway, add to highway_features
- Also store way geometry if it's a highway (for potential PT route use)
- For relations: check if PT route, look up member way geometries that we've stored
Since relations come after ways in the PBF, by the time we process relations, we've already seen all the ways. So if we store way geometries for all highway ways (anyway needed for the output), and also for any way that might be part of a PT route... but we don't know which are PT route members.
Hmm, this is circular again. Let me just store ALL way geometries that are within the buffer area, regardless of tagging. The number should be manageable for a small area.
Let me use a generous bbox filter first to reduce the number of ways stored.
OK let me just write the script. I'll use `locations=True` and store way geometries for everything in the buffer area.
Let me code it up:
```python
#!/usr/bin/env python3
"""
Extract Gürtel buffer highway network and PT routes from Austria PBF.
"""
import osmium
import geopandas as gpd
import pandas as pd
from shapely.geometry import LineString, MultiLineString, box, shape
from shapely.ops import unary_union, transform
import pyproj
import urllib.request
import os
import time
from urllib.error import URLError
# CRS
WGS84 = "EPSG:4326"
AUSTRIALAMB = "EPSG:31287" # MGI / Austria Lambert
# File paths
PBF_URL = "https://download.geofabrik.de/europe/austria-latest.osm.pbf"
PBF_PATH = "/work/austria-latest.osm.pbf"
GPKG_PATH = "/work/vienna_network.gpkg"
def download_file(url, path):
"""Download with retries."""
for attempt in range(3):
try:
print(f"Downloading {url}... (attempt {attempt+1})")
urllib.request.urlretrieve(url, path)
return
except URLError as e:
if attempt < 2:
time.sleep(5)
else:
raise
# Step 1: Download
if not os.path.exists(PBF_PATH):
download_file(PBF_URL, PBF_PATH)
else:
print(f"{PBF_PATH} already exists, skipping download")
# Step 2: Find Gürtel ways
print("Finding Gürtel ways...")
class GurtelHandler(osmium.SimpleHandler):
def __init__(self):
super().__init__()
self.gurtel_geoms_wgs84 = []
self.gurtel_ways = [] # (geom_wgs84, tags)
def way(self, w):
name = w.tags.get('name')
highway = w.tags.get('highway')
if name and name.endswith('Gürtel') and highway:
coords = [(n.location.lon, n.location.lat) for n in w.nodes if n.location.valid()]
if len(coords) >= 2:
geom = LineString(coords)
self.gurtel_geoms_wgs84.append(geom)
self.gurtel_ways.append((geom, dict(w.tags)))
h1 = GurtelHandler()
h1.apply_file(PBF_PATH, locations=True)
print(f"Found {len(h1.gurtel_ways)} Gürtel way segments")
# Step 3: Compute 500m buffer
print("Computing buffer...")
# Project Gürtel to Austria Lambert
project_to = pyproj.Transformer.from_crs(WGS84, AUSTRIALAMB, always_xy=True).transform
project_from = pyproj.Transformer.from_crs(AUSTRIALAMB, WGS84, always_xy=True).transform
gurtel_proj = [transform(project_to, g) for g in h1.gurtel_geoms_wgs84]
gurtel_union = unary_union(gurtel_proj)
buffer_proj = gurtel_union.buffer(500)
# Also get WGS84 buffer for bounding box checks
buffer_wgs84 = transform(project_from, buffer_proj)
print("Buffer area computed")
# Step 4: Collect highways and PT routes
print("Collecting highways and PT routes...")
class NetworkHandler(osmium.SimpleHandler):
def __init__(self, buffer_proj, project_to):
super().__init__()
self.buffer_proj = buffer_proj
self.project_to = project_to # transformer WGS84 -> projected
self.highway_features = [] # (geom_wgs84, tags)
self.way_geoms = {} # way_id -> LineString (WGS84)
self.pt_route_features = [] # (geom_wgs84, tags)
def way(self, w):
highway = w.tags.get('highway')
coords = [(n.location.lon, n.location.lat) for n in w.nodes if n.location.valid()]
if len(coords) < 2:
return
geom = LineString(coords)
# Quick bbox check first
if not geom.intersects(buffer_wgs84):
return
# More precise check in projected CRS
geom_proj = transform(self.project_to, geom)
if not geom_proj.intersects(self.buffer_proj):
return
if highway:
self.highway_features.append((geom, dict(w.tags)))
# Store way geom for PT route use
self.way_geoms[w.id] = geom
def relation(self, r):
route = r.tags.get('route')
rel_type = r.tags.get('type')
if rel_type == 'route' and route in ('bus', 'tram', 'subway', 'train', 'trolleybus', 'light_rail'):
# Collect member way geometries
member_geoms = []
for member in r.members:
if member.type == 'w' and member.ref in self.way_geoms:
member_geoms.append(self.way_geoms[member.ref])
if not member_geoms:
return
# Check if any member way intersects buffer
for g in member_geoms:
g_proj = transform(self.project_to, g)
if g_proj.intersects(self.buffer_proj):
multi = MultiLineString(member_geoms)
self.pt_route_features.append((multi, dict(r.tags)))
break
h2 = NetworkHandler(buffer_proj, project_to)
h2.apply_file(PBF_PATH, locations=True)
print(f"Found {len(h2.highway_features)} highway features")
print(f"Found {len(h2.pt_route_features)} PT route features")
# Step 5: Write to GeoPackage
print("Writing GeoPackage...")
# Highways
highway_rows = []
for geom, tags in h2.highway_features:
row = {}
for key in ['name', 'highway', 'maxspeed', 'lanes', 'surface', 'oneway']:
row[key] = tags.get(key)
row['geometry'] = transform(project_to, geom)
highway_rows.append(row)
gdf_highways = gpd.GeoDataFrame(highway_rows, crs=AUSTRIALAMB, geometry='geometry')
gdf_highways = gdf_highways[['name', 'highway', 'maxspeed', 'lanes', 'surface', 'oneway', 'geometry']]
# PT Routes
pt_rows = []
for geom, tags in h2.pt_route_features:
row = {}
for key in ['ref', 'name', 'operator', 'route']:
row[key] = tags.get(key)
row['geometry'] = transform(project_to, geom)
pt_rows.append(row)
gdf_pt = gpd.GeoDataFrame(pt_rows, crs=AUSTRIALAMB, geometry='geometry')
gdf_pt = gdf_pt[['ref', 'name', 'operator', 'route', 'geometry']]
# Write to GeoPackage
gdf_highways.to_file(GPKG_PATH, layer='highways', driver='GPKG')
gdf_pt.to_file(GPKG_PATH, layer='pt_routes', driver='GPKG')
print(f"Done! Written to {GPKG_PATH}")
```
Wait, there are some issues with this approach:
1. The `buffer_wgs84` reference in the `way` method - this is defined outside the class. I need to pass it in.
2. The `project_to` function reference - when I store it in the handler, it might not be picklable or might cause issues. Let me use pyproj.Transformer properly.
Let me also reconsider - for the `buffer_wgs84`, I'm checking `geom.intersects(buffer_wgs84)` but the buffer in WGS84 is distorted. It's better to just do the projected check.
Also, storing all way geometries for ways that intersect the buffer might still be a lot. But for a 500m buffer around Gürtel (which is about 15km long ring), the area is about 15km * (500m * 2) + pi * 500^2 = 15km² + 0.785km² ≈ 16km². That's quite small. The number of ways in this area should be very manageable.
Let me refine the approach and also handle the large file download.
Actually, I realize there might be an issue with the bbox quick check. The `buffer_wgs84` in WGS84 coordinates might not be a good approximation since 500m in projected space is different from 500m in WGS84 degrees. But for a quick bbox check, it's fine to use a generous WGS84 buffer.
Let me refine the script and make sure it handles everything correctly.
One more thing - I need to be careful about the `oneway` tag. In OSM, it could be "yes", "no", "1", "0", "-1", etc. I'll keep it as-is.
Also, for the PT routes, the user said "each route as a single feature" with MultiLineString geometry. My approach of collecting all member ways into a MultiLineString is correct.
Let me also consider: what if a PT route relation has no ways with stored geometries? Then we can't create the route geometry.
Let me also handle the case where the same way might be a highway AND part of a PT route. That's fine - they go to different layers.
OK, let me write the final script. But first, let me also consider that downloading the full Austria PBF might take a while and be large. Let me check if there's a smaller regional extract for Vienna specifically.
Geofabrik has Austria regions:
- https://download.geofabrik.de/europe/austria.html
- Upper Austria, Lower Austria, Vienna, etc.
Vienna-specific: https://download.geofabrik.de/europe/austria/vienna-latest.osm.pbf
This would be much smaller! Let me use the Vienna extract instead of the whole Austria.
But wait - the Gürtel is at the border of Vienna and might extend into Lower Austria. Let me check...
The Gürtel is entirely within Vienna municipality, so the Vienna extract should be sufficient. But to be safe, I could use the whole Austria extract. However, the Vienna extract is much smaller and faster to process.
Actually, let me use the Austria PBF as the user specifically asked for the "Geofabrik Austria PBF" URL. But to avoid issues with the download size, let me use the Vienna extract if it's more practical. Hmm, the user specifically said the Austria PBF though.
Let me use the Austria PBF as requested, but I'll handle the large file efficiently.
Wait, let me reconsider - maybe I can use the Vienna area extract since all the data we need is in Vienna. The Gürtel is entirely in Vienna. And the buffer around it would extend slightly beyond Vienna's borders into Lower Austria, but the Vienna PBF might already include some surrounding areas or might be clipped to administrative boundaries.
To be safe, let me use the Austria-wide PBF as the user specifically mentioned it.
Let me write a clean, well-structured script.
Actually, one more consideration: for the PT routes, some route relations might have hundreds of member ways spanning a large area. If the route crosses the buffer, I need to include all its member ways (to create a complete route geometry), not just those within the buffer. But storing all way geometries for all PT route members across all of Austria would be too much.
So my approach of only storing way geometries that are within/near the buffer is flawed for PT routes. A PT route might cross the buffer but have member ways far outside the buffer.
Let me think about this differently. The user wants "every public-transport route relation that crosses the same buffer" with "each route as a single feature". So if a route crosses the buffer, I need the FULL geometry of the route (all member ways), not just the part within the buffer.
This means I need to store geometries of all ways that are members of PT routes, even if they're far from the buffer. But for Austria, this could be a lot of ways.
Alternative approach for PT routes:
1. First, find all PT route relations (using a tag filter - no geometry needed)
2. Collect all member way IDs from these relations
3. Then extract those ways from the PBF
This would require processing the PBF multiple times or finding a way to handle it in one pass.
Actually, since relations come after ways in the PBF, I can't know which ways I need when processing ways. But I can do it in two passes over just the relations:
Pass A: Read all relations to find PT routes and their member way IDs
Pass B: Read the PBF to extract all those ways, checking if any intersect the buffer
If any way of a route intersects the buffer, the whole route is included.
Let me refine the approach:
Pass 1: Find Gürtel ways, compute buffer
Pass 2: Read all relations, find PT routes, collect member way IDs per route, AND store which routes have ways that are in our stored way geometries
But this is still chicken-and-egg. Let me do:
Pass 1: Find Gürtel ways, compute buffer
Pass 2: Read all ways (with locations=True), store geometries for ALL ways that intersect the buffer (for highways), and ALSO store geometries for ALL ways (regardless of location) for ways that might be part of PT routes
But I can't know which ways are part of PT routes during Pass 2 since relations haven't been processed yet.
Hmm, let me try a different approach:
1. Find Gürtel, compute buffer - one pass
2. Read all relations to find PT routes and their member way IDs - one pass
3. Read the PBF with locations=True, process ALL ways, checking:
a. If it's a highway intersecting buffer -> add to highway layer
b. If its ID is in the set of PT route member way IDs -> store geometry
4. For each PT route, build the MultiLineString from stored member way geometries
This is three passes but the second pass (relations only) is fast since relations are fewer than ways.
Wait, actually I can do this in two passes:
Pass 1: Find Gürtel, compute buffer (using locations=True for ways)
Pass 2:
a. Skip nodes (we use locations=True)
b. For ways: check if intersects buffer. If highway, add to highway features. Also store ALL way geometries that are in PT route member way IDs (but we don't know these yet!)
c. For relations: find PT routes, look up member way geometries
The issue remains: during way processing, we don't know which ways are PT route members.
Let me use THREE passes:
Pass 1: Find Gürtel ways, compute buffer
Pass 2: Read all relations, find PT routes, collect {rel_id: {tags, [member_way_ids]}}
Pass 3: Read all ways (locations=True):
- Check if highway and intersects buffer -> add to highway layer
- Check if way_id is in any PT route's member way_ids -> store geometry
After ways are done, build PT route geometries from stored way geometries
Actually wait, I realize there's a simpler way. Since the user wants PT routes that CROSS the buffer, I need to check if the route's geometry intersects the buffer. The route's geometry is the union of its member way geometries. If I store way geometries only for ways within the buffer, I can check if the route has at least one way within the buffer.
But then for the output, I need the FULL route geometry, which requires storing all member way geometries. However, for PT routes that cross the buffer, their member ways might be spread across the entire route network.
For a bus route in Vienna, the total length might be 10-20km, which is manageable. Let me just store way geometries for ALL ways that are members of PT routes across all of Austria. This might be a lot (thousands of routes × tens of ways per route = tens of thousands of ways), but it should be manageable in memory.
Actually, let me think about this differently. Most ways are only a few hundred meters long. Even if a PT route has 100 ways, each with ~10 nodes, storing 100 LineStrings is nothing. So storing geometries for all PT route member ways across Austria should be fine.
Let me estimate:
- Austria has maybe 1000 bus routes
- Each bus route has maybe 50-100 ways
- Total: 50,000-100,000 ways
- Each LineString with ~10 nodes = ~800 bytes
- Total: ~40-80 MB for geometries, plus ~20 MB for node coords
This is manageable!
So the three-pass approach:
Pass 1: Find Gürtel ways, compute buffer (use locations=True)
Pass 2: Read relations, find PT routes, collect {rel_id: {tags, [member_way_ids]}}
Pass 3: Read ways (locations=True):
- For each way:
- Check if intersects buffer
- If highway and intersects buffer -> add to highway features
- If way_id in any PT route's member way_ids -> store geometry
- After all ways: build PT route geometries
Let me also collect the total set of PT route member way IDs from Pass 2 to use in Pass 3.
Actually, I think osmium allows me to avoid a separate relation pass. Let me check if osmium processes data in order: nodes → ways → relations.
In osmium.SimpleHandler:
- node() is called for each node
- way() is called for each way
- relation() is called for each relation
So yes, ways come before relations. This means in a single pass, when processing ways, I don't know which ways are part of PT routes yet.
OK, three passes it is.
Let me also optimize: in Pass 2 (relations only), I don't need locations=True. I can just read tags and member references without resolving locations.
Let me write the complete script now. Let me also handle the download with proper retry logic.
One more thought - for the actual node locations of PT route ways, I need locations=True to build geometries. So in Pass 3, I use locations=True.
Let me code this up properly now.
Wait, actually I just realized I can do it in TWO passes if I reverse the order. But osmium always reads in PBF order (nodes, ways, relations).
OK, three passes it is. Let me write it.
Actually, I want to reconsider - can I do it in two passes by being clever about what I store?
Pass 1: Find Gürtel, compute buffer
Pass 2: Read PBF with locations=True
- way(): Check if intersects buffer. If highway, add to highway_features. Store ALL way geometries (id -> geom) for ways that intersect buffer.
- relation(): Check if PT route. Look up member way geometries. If any member way has stored geometry (i.e., intersects buffer), build full route from all member ways (but we might not have all member way geometries stored!).
The problem with Pass 2 is that for PT routes, I might only have geometries for member ways that intersect the buffer, missing ways outside the buffer.
But wait - the user wants "each route as a single feature" with MultiLineString. If I can only get the partial geometry (ways within buffer), that's incomplete. But maybe it's acceptable? No, the user wants the full route.
OK, three passes it is. Or... let me use a different approach altogether.
What if I first extract all data using osmium CLI with a bounding box around Vienna, then work with the smaller file?
Actually, I just thought of another approach: use osmnx! OSMnx can download data from OSM for a specific area. But the user specifically asked for the Geofabrik PBF.
Let me just go with the three-pass approach. It's clean and correct.
Let me write the script now. I'll be careful about memory usage.
Actually, let me reconsider one more time. The user said "highway ways" for the first layer and "public-transport route relations" for the second layer. For the PT routes, I need the route relations, not the individual ways. So I need to:
1. Find route relations of the specified types
2. Check if they cross the buffer
3. Extract their full geometry (all member ways)
For step 3, I need the geometries of all member ways. Since many PT routes in Austria might not be near Vienna, I only need to store way geometries for PT routes that have at least one way near the buffer. But to determine this, I need to check if any member way is near the buffer.
Hmm, so I need:
1. Find all PT route relations (all of Austria - manageable, there aren't that many relations)
2. For each PT route, collect its member way IDs
3. For each member way, check if it intersects the buffer
4. If any member way does, collect all member way geometries for the full route
The challenge is steps 3 and 4 - I need way geometries.
Let me use this approach:
Pass 1: Find Gürtel ways, compute buffer
Pass 2: Read all relations, find PT routes, get {rel_id: {tags, [member_way_ids]}}
- Also build a set of all PT route member way IDs
Pass 3: Read all ways (locations=True):
- Store geometry for any way in the PT member way ID set
- Also check highway + buffer intersection for highway layer
- After all ways processed, build PT route geometries
For Pass 3, I need to store geometries for ALL PT route member ways, not just those in the buffer. This could be many ways across Austria. But I think it's manageable.
Let me estimate more carefully:
- Austria has ~2,000 bus routes, ~200 tram routes, ~50 subway routes, ~30 train routes (just guessing)
- Each route has 20-100 ways
- Total ways: (2000+200+50+30) * 50 = ~114,000 ways
- Each way has ~10 nodes, each node is 2 doubles = 16 bytes, plus overhead
- Total: ~114,000 * 10 * 16 = ~18 MB for coordinates, plus geometry overhead
This should be fine. Let me proceed.
Let me now write the script.
Actually, on reflection, let me also handle the case where the same way might be in multiple PT routes. I'll store the geometry once and reference it multiple times.
Let me also think about what "crosses the same buffer" means for PT routes. If a route's ways collectively form a geometry that intersects the buffer, it crosses. But if only one way of the route is within the buffer, the route as a whole crosses the buffer. So I check if any member way intersects the buffer.
For the full route geometry, I collect ALL member ways (not just those in the buffer) and create a MultiLineString.
OK let me write the script.
Actually, I want to think about memory for nodes. The Austria PBF has about 200-300 million nodes. With osmium and `locations=True`, osmium stores all node locations in memory (or uses a disk-backed index). The default is an in-memory index using arrays, which could use a lot of RAM.
Let me use a disk-backed node cache. osmium provides `osmium.index.create_map("sparse_file_array", "file_name")` for this.
Actually, when using `locations=True` in `apply_file()`, osmium uses a handler called `NodeLocationsForWays` which internally manages node locations. I can't control which index it uses through the simple API.
Hmm, let me check if I can use `osmium.NodeLocationsForWays` with a custom index.
Actually, I can create my own location handler. Let me use this approach:
```python
class LocationHandler(osmium.SimpleHandler):
def __init__(self):
super().__init__()
self.locations = osmium.index.create_map("sparse_file_array", "/work/node_cache")
def node(self, n):
self.locations.set(n.id, (n.location.lon, n.location.lat))
```
But then I'd need to chain it with my way handler. osmium supports handler chaining.
Let me look at how to properly use osmium with disk-based node cache:
```python
import osmium
class MyWayHandler(osmium.SimpleHandler):
def __init__(self):
super().__init__()
# This won't have node locations automatically
def way(self, w):
# w.nodes won't have valid locations
pass
```
The correct way is to use `NodeLocationsForWays` as a wrapper:
```python
loc_handler = osmium.NodeLocationsForWays()
loc_handler.set_node_cache("/work/node_cache") # Disk-based cache
class MyWayHandler(osmium.SimpleHandler):
def way(self, w):
# Now w.nodes have valid locations
for n in w.nodes:
print(n.location.lat, n.location.lon)
# Chain handlers
handler = MyWayHandler()
loc_handler.register_handler(handler)
osmium.apply_file(PBF_PATH, handlers=[loc_handler])
```
Wait, I think the API might be different. Let me check what's available.
Actually, osmium's Python bindings provide different approaches. Let me check:
The `NodeLocationsForWays` handler can be used like this:
```python
handler = NodeLocationsForWays()
handler.set_node_cache("my_node_cache") # Optional disk cache
# Then apply_file with this handler
apply_file("file.osm.pbf", handlers=[handler, my_way_handler])
```
Or with the SimpleHandler:
```python
class MyHandler(osmium.SimpleHandler):
def way(self, w):
for n in w.nodes:
if n.location.valid():
# use n.location.lat, n.location.lon
handler = MyHandler()
handler.apply_file(PBF_PATH, locations=True)
```
The `locations=True` parameter tells osmium to use a default NodeLocationsForWays handler internally. This stores all node locations in memory by default.
For large files, I should use a disk cache. Let me try:
```python
handler = MyHandler()
# Can't pass custom node cache through locations=True
```
Hmm, maybe I need to set up the handlers manually.
Let me just use `locations=True` and hope it works. If memory becomes an issue, I'll add a disk cache. The Austria PBF is about 300MB zipped, and the node locations in memory might be ~2-3GB which is a lot.
Actually, let me use the proper approach with a disk cache:
```python
import osmium
class NodeLocationHandler(osmium.NodeLocationsForWays):
def __init__(self):
super().__init__()
# Use a disk-based node cache
self.node_cache = osmium.index.create_map("sparse_file_array", "/work/node_cache")
class WayHandler(osmium.SimpleHandler):
def way(self, w):
# Node locations are resolved
pass
nl_handler = NodeLocationHandler()
way_handler = WayHandler()
nl_handler.register_handler(way_handler)
osmium.apply_file(PBF_PATH, handlers=[nl_handler])
```
Wait, I'm not sure about the exact API. Let me just try the simple approach first and see if it works.
Actually, for osmpythonbindings, the typical usage with disk cache is:
```python
import osmium
class MyHandler(osmium.SimpleHandler):
def way(self, w):
for n in w.nodes:
if n.location.valid():
# use it
pass
h = MyHandler()
h.apply_file(PBF_PATH, locations=True, idx="sparse_file_array,/work/node_cache")
```
Or something like that. Let me check the osmium Python docs... I'm not 100% sure about the syntax. Let me just use `locations=True` and see what happens.
Actually, let me test with a small script first.
Let me just write the full solution. I'll use `locations=True` for the handlers that need node locations. If memory is an issue, I'll adjust.
Let me finalize my approach:
1. Download the Austria PBF
2. One pass to find Gürtel ways (with locations=True)
3. Compute buffer
4. Second pass just for relations (no locations needed, just tags and member refs)
5. Third pass for ways (with locations=True) - store highway features and PT route way geometries
6. Build PT route geometries from stored way geometries
7. Write to GeoPackage
Let me code this up now.
OK wait, I also realize that `buffer_wgs84` check I was thinking about is for a rough bbox check. Since the buffer in WGS84 is distorted, let me just do all intersection checks in the projected CRS (EPSG:31287) where distances are accurate.
Here's my refined plan for the way handler:
```python
def way(self, w):
coords = [(n.location.lon, n.location.lat) for n in w.nodes if n.location.valid()]
if len(coords) < 2:
return
geom_wgs84 = LineString(coords)
geom_proj = transform(project_to, geom_wgs84)
# Check if intersects buffer
if not geom_proj.intersects(self.buffer_proj):
return
# Highway
highway = w.tags.get('highway')
if highway:
self.highway_features.append((geom_wgs84, dict(w.tags)))
# Store for PT routes if this way is a member of any PT route
if w.id in self.pt_member_way_ids:
self.way_geoms[w.id] = geom_wgs84
```
But this only stores way geometries for ways that are BOTH in the PT member set AND intersect the buffer. For the full route geometry, I need ALL member ways of the route, not just those in the buffer.
So I need to separate the concerns:
1. For highway layer: only ways that intersect the buffer
2. For PT route geometries: ALL ways that are members of PT routes that cross the buffer
The problem is: to determine if a PT route crosses the buffer, I need to check if ANY of its member ways intersect the buffer. But to build the full route geometry, I need ALL member ways.
So my approach should be:
1. Store way geometries for ALL PT route member ways (across all of Austria)
2. For each PT route, check if any member way intersects the buffer
3. If yes, build the full route geometry from all member ways
The number of PT route member ways is manageable as I estimated above.
Let me update the code:
In Pass 2 (relations):
- Find all PT routes → {rel_id: {tags, [member_way_ids]}}
- Build set of ALL PT route member way IDs
In Pass 3 (ways):
- For each way:
- Check if intersects buffer AND is highway → add to highway_features
- If way_id in PT member way ID set → store geometry (regardless of buffer intersection)
- After all ways processed:
- For each PT route:
- Check if any member way intersects buffer
- If yes, collect all member way geometries → MultiLineString
But wait, for checking if a member way intersects the buffer in Pass 3, I need to transform each way to projected CRS. That's fine since I'm already doing it for highways.
Let me write the final script now. I'll try to be efficient and clear.
Let me also think about what happens if a PT route has member ways that I haven't stored (because the way wasn't in the PBF, or wasn't a way type member, etc.). I'll just skip those.
Let me write the code.
Actually, let me use a cleaner approach. In Pass 3, I'll store way geometries (in projected CRS) for ALL PT route member ways. Then I can check buffer intersection directly without transforming.
Let me finalize and write the script.
One more thing - for `oneway`, OSM uses values like "yes", "no", "1", "0", "-1". I'll keep the raw value.
Also, for `maxspeed`, OSM stores it as a string like "50", "30", "100 km/h". I'll keep as-is.
Let me write the complete script now:
```python
#!/usr/bin/env python3
import os
import time
import sys
import urllib.request
from urllib.error import URLError
from collections import defaultdict
import osmium
import geopandas as gpd
import pandas as pd
from shapely.geometry import LineString, MultiLineString, box
from shapely.ops import unary_union, transform
import pyproj
from shapely import wkb
# === Configuration ===
PBF_URL = "https://download.geofabrik.de/europe/austria-latest.osm.pbf"
PBF_PATH = "/work/austria-latest.osm.pbf"
GPKG_PATH = "/work/vienna_network.gpkg"
WGS84 = "EPSG:4326"
AUSTRIALAMB = "EPSG:31287" # MGI / Austria Lambert
# === Helpers ===
def download_with_retry(url, path, max_retries=3):
for attempt in range(max_retries):
try:
print(f"Downloading {url} (attempt {attempt+1}/{max_retries})...")
urllib.request.urlretrieve(url, path)
size_mb = os.path.getsize(path) / (1024 * 1024)
print(f"Downloaded {size_mb:.1f} MB")
return
except URLError as e:
print(f"Download failed: {e}")
if attempt < max_retries - 1:
wait = 5 * (attempt + 1)
print(f"Retrying in {wait}s...")
time.sleep(wait)
raise RuntimeError(f"Failed to download {url} after {max_retries} attempts")
# === Transformer ===
transformer_wgs84_to_mgi = pyproj.Transformer.from_crs(WGS84, AUSTRIALAMB, always_xy=True)
project_to = transformer_wgs84_to_mgi.transform
project_from = pyproj.Transformer.from_crs(AUSTRIALAMB, WGS84, always_xy=True).transform
# === Step 1: Download ===
if not os.path.exists(PBF_PATH):
download_with_retry(PBF_URL, PBF_PATH)
else:
print(f"{PBF_PATH} exists, skipping download")
# === Step 2: Find Gürtel ways ===
print("\n=== Finding Gürtel ways ===")
class GurtelHandler(osmium.SimpleHandler):
def __init__(self):
super().__init__()
self.gurtel_geoms_wgs84 = []
self.gurtel_features = [] # (geom_wgs84, tags)
def way(self, w):
name = w.tags.get('name')
highway = w.tags.get('highway')
if name and name.rstrip().endswith('Gürtel') and highway:
coords = [(n.location.lon, n.location.lat) for n in w.nodes if n.location.valid()]
if len(coords) >= 2:
geom = LineString(coords)
self.gurtel_geoms_wgs84.append(geom)
self.gurtel_features.append((geom, dict(w.tags)))
h1 = GurtelHandler()
h1.apply_file(PBF_PATH, locations=True)
print(f"Found {len(h1.gurtel_geoms_wgs84)} Gürtel way segments")
if len(h1.gurtel_geoms_wgs84) == 0:
print("ERROR: No Gürtel ways found!")
sys.exit(1)
# === Step 3: Compute 500m buffer ===
print("\n=== Computing buffer ===")
gurtel_proj = [transform(project_to, g) for g in h1.gurtel_geoms_wgs84]
gurtel_union = unary_union(gurtel_proj)
buffer_proj = gurtel_union.buffer(500)
print(f"Buffer area: {buffer_proj.area / 1e6:.2f} km²")
# Also compute buffer bbox in WGS84 for quick filtering
buffer_bounds_proj = buffer_proj.bounds
buffer_bbox_proj = box(*buffer_bounds_proj)
# === Step 4: Find all PT route relations ===
print("\n=== Finding PT route relations ===")
class PTRelationHandler(osmium.SimpleHandler):
def __init__(self):
super().__init__()
self.pt_routes = {} # rel_id -> {'tags': {...}, 'way_ids': set()}
self.all_pt_way_ids = set()
self.valid_route_types = {'bus', 'tram', 'subway', 'train', 'trolleybus', 'light_rail'}
def relation(self, r):
route = r.tags.get('route')
rel_type = r.tags.get('type')
if rel_type == 'route' and route in self.valid_route_types:
way_ids = set()
for member in r.members:
if member.type == 'w':
way_ids.add(member.ref)
if way_ids:
self.pt_routes[r.id] = {
'tags': dict(r.tags),
'way_ids': way_ids
}
self.all_pt_way_ids.update(way_ids)
h2 = PTRelationHandler()
h2.apply_file(PBF_PATH) # No locations needed - just tags and refs
print(f"Found {len(h2.pt_routes)} PT route relations with {len(h2.all_pt_way_ids)} unique member ways")
# === Step 5: Read ways - highways + PT route geometries ===
print("\n=== Collecting highways and PT route way geometries ===")
class NetworkWayHandler(osmium.SimpleHandler):
def __init__(self, buffer_proj, pt_way_ids):
super().__init__()
self.buffer_proj = buffer_proj
self.pt_way_ids = pt_way_ids # set of way IDs that are members of PT routes
self.highway_features = [] # (geom_wgs84, tags)
self.pt_way_geoms = {} # way_id -> LineString (in projected CRS)
def way(self, w):
coords = [(n.location.lon, n.location.lat) for n in w.nodes if n.location.valid()]
if len(coords) < 2:
return
geom_wgs84 = LineString(coords)
# Quick bounding box check to avoid unnecessary transforms
geom_bounds = geom_wgs84.bounds
# WGS84 buffer approx bounds (very rough)
if not geom_wgs84.intersects(buffer_wgs84_approx):
# Still need to check PT route ways even outside buffer
if w.id not in self.pt_way_ids:
return
geom_proj = transform(project_to, geom_wgs84)
# Check for highway layer
highway = w.tags.get('highway')
if highway and geom_proj.intersects(self.buffer_proj):
self.highway_features.append((geom_wgs84, dict(w.tags)))
# Store geometry for PT route member ways (anywhere in Austria)
if w.id in self.pt_way_ids:
self.pt_way_geoms[w.id] = geom_proj
```
Hmm, this is getting complex with the buffer_wgs84_approx variable. Let me simplify and just check in projected CRS for everything.
Also, for the PT route way geometries, I should store them in the projected CRS so I can check buffer intersection directly.
Let me revise the NetworkWayHandler:
```python
class NetworkWayHandler(osmium.SimpleHandler):
def __init__(self, buffer_proj, pt_way_ids, project_to):
super().__init__()
self.buffer_proj = buffer_proj
self.pt_way_ids = pt_way_ids
self.project_to = project_to
self.highway_features = [] # (geom_wgs84, tags)
self.pt_way_geoms_proj = {} # way_id -> LineString (projected)
self.pt_way_geoms_wgs84 = {} # way_id -> LineString (WGS84)
def way(self, w):
coords = [(n.location.lon, n.location.lat) for n in w.nodes if n.location.valid()]
if len(coords) < 2:
return
highway = w.tags.get('highway')
is_pt_member = w.id in self.pt_way_ids
if not highway and not is_pt_member:
return
geom_wgs84 = LineString(coords)
geom_proj = transform(self.project_to, geom_wgs84)
if highway and geom_proj.intersects(self.buffer_proj):
self.highway_features.append((geom_wgs84, dict(w.tags)))
if is_pt_member:
self.pt_way_geoms_proj[w.id] = geom_proj
self.pt_way_geoms_wgs84[w.id] = geom_wgs84
```
Wait, but I need to think about whether storing ALL PT route member way geometries (across all of Austria) is too much. Let me reconsider.
If a PT route in Vorarlberg (far from Vienna) has 50 member ways, storing all those geometries is wasteful since the route won't cross the Vienna buffer.
Can I optimize? I could first check if the route's bbox overlaps the buffer's bbox using the relation's bbox (but relations don't have bboxes in the PBF).
Alternatively, I could estimate: if ALL member way IDs of a route are outside the buffer area (by some heuristic), skip storing their geometries.
But this adds complexity. Let me just store all PT route way geometries and see if it works. The number should be manageable.
Alright, let me write the complete script. Let me also think about the PT route feature output carefully.
For each PT route:
- Check if any member way intersects the buffer
- If yes, collect all member way geometries and create a MultiLineString
- Tags: ref, name, operator, route
Wait, I also want to store the way geometries in WGS84 so I can transform them later. Or I can just store in projected and create the output directly.
Let me store in both: projected for buffer check, WGS84 for eventual output (I'll project at write time).
Actually, I'll just store in WGS84 and project at the end. For buffer intersection checks, I'll transform on the fly. Since we're doing each check once, it's fine.
Let me simplify and write the script.
Actually, thinking more about memory: if I store all PT route member way geometries for all of Austria, that could be 100k+ LineStrings. Each LineString has maybe 10-50 coordinates. That's 100k * 10 * 2 * 8 bytes = 16 MB for coords, plus object overhead (maybe 100-200 bytes per LineString) = 100k * 200 = 20 MB. Total ~40 MB. Fine.
Let me just write the complete script now. I'll handle the buffer_wgs84 thing differently - I'll just not do a quick bbox check and directly check in projected CRS for everything.
Wait, but I need to be smart. For the highway check, I only care about ways that are near the buffer. Checking `geom_proj.intersects(buffer_proj)` for every highway way in Austria is too many. Let me do a quick bbox check first.
Actually, a quick way: check if the WGS84 bbox of the way overlaps a generous WGS84 bbox of the buffer. The WGS84 bbox of the buffer would be a rectangle in lat/lon that contains the buffer.
Let me compute the buffer's WGS85 bbox from the projected buffer:
```python
buffer_bounds_proj = buffer_proj.bounds
# Transform corners to WGS84
min_lon, min_lat = transform(project_from, (buffer_bounds_proj[0], buffer_bounds_proj[1]))
max_lon, max_lat = transform(project_from, (buffer_bounds_proj[2], buffer_bounds_proj[3]))
```
Then for each way, I can do a quick check if its bbox overlaps this WGS84 bbox before doing the full transform and intersection test.
OK let me stop overthinking and write the code.
Let me write it clean and complete:
```python
#!/usr/bin/env python3
"""
Extract highways and PT routes around Vienna's Gürtel from Austria OSM PBF.
Output: vienna_network.gpkg with highways and pt_routes layers.
"""
import os
import sys
import time
import urllib.request
from urllib.error import URLError
import osmium
import geopandas as gpd
import pandas as pd
from shapely.geometry import LineString, MultiLineString, box
from shapely.ops import unary_union, transform
import pyproj
# === CONFIG ===
PBF_URL = "https://download.geofabrik.de/europe/austria-latest.osm.pbf"
PBF_PATH = "/work/austria-latest.osm.pbf"
GPKG_PATH = "/work/vienna_network.gpkg"
WGS84 = "EPSG:4326"
AUSTRIALAMB = "EPSG:31287" # MGI / Austria Lambert
VALID_PT_ROUTES = {'bus', 'tram', 'subway', 'train', 'trolleybus', 'light_rail'}
HIGHWAY_ATTRS = ['name', 'highway', 'maxspeed', 'lanes', 'surface', 'oneway']
PT_ATTRS = ['ref', 'name', 'operator', 'route']
# === COORDINATE TRANSFORMERS ===
trans_4326_to_31287 = pyproj.Transformer.from_crs(WGS84, AUSTRIALAMB, always_xy=True).transform
trans_31287_to_4326 = pyproj.Transformer.from_crs(AUSTRIALAMB, WGS84, always_xy=True).transform
def download(url, path):
for attempt in range(3):
try:
print(f"Downloading {url} (attempt {attempt+1}/3)...")
urllib.request.urlretrieve(url, path)
mb = os.path.getsize(path) / 1e6
print(f" Done: {mb:.0f} MB")
return
except URLError as e:
print(f" Error: {e}")
if attempt < 2:
time.sleep(5 * (attempt + 1))
raise RuntimeError(f"Download failed after 3 attempts")
# ============================================================
# PASS 1 — Find Gürtel ways
# ============================================================
print("=" * 60)
print("PASS 1: Finding Gürtel ways")
print("=" * 60)
class GurtelHandler(osmium.SimpleHandler):
def __init__(self):
super().__init__()
self.gurtel_ways_wgs84 = []
def way(self, w):
name = w.tags.get('name')
if not name or not name.rstrip().endswith('Gürtel'):
return
highway = w.tags.get('highway')
if not highway:
return
coords = [(n.location.lon, n.location.lat) for n in w.nodes if n.location.valid()]
if len(coords) >= 2:
self.gurtel_ways_wgs84.append(LineString(coords))
h1 = GurtelHandler()
h1.apply_file(PBF_PATH, locations=True)
print(f" Found {len(h1.gurtel_ways_wgs84)} Gürtel way segments")
if not h1.gurtel_ways_wgs84:
sys.exit("ERROR: No Gürtel ways found. Cannot proceed.")
# Project Gürtel, union + buffer
gurtel_proj = [transform(trans_4326_to_31287, g) for g in h1.gurtel_ways_wgs84]
buffer_proj = unary_union(gurtel_proj).buffer(500)
# A generous WGS84 bounding box for quick pre-filtering
buffer_bounds_proj = buffer_proj.bounds
sw = transform(trans_31287_to_4326, (buffer_bounds_proj[0], buffer_bounds_proj[1]))
ne = transform(trans_31287_to_4326, (buffer_bounds_proj[2], buffer_bounds_proj[3]))
bbox_wgs84 = box(sw[0], sw[1], ne[0], ne[1])
print(f" Buffer bbox (WGS84): {[round(c, 4) for c in bbox_wgs84.bounds]}")
# ============================================================
# PASS 2 — Find PT route relations
# ============================================================
print("=" * 60)
print("PASS 2: Finding PT route relations")
print("=" * 60)
class PTRelationHandler(osmium.SimpleHandler):
def __init__(self):
super().__init__()
self.pt_routes = {} # rel_id -> {'tags': dict, 'way_ids': set}
self.all_pt_way_ids = set()
def relation(self, r):
if r.tags.get('type') != 'route':
return
route = r.tags.get('route')
if route not in VALID_PT_ROUTES:
return
way_ids = {m.ref for m in r.members if m.type == 'w'}
if not way_ids:
return
self.pt_routes[r.id] = {
'tags': dict(r.tags),
'way_ids': way_ids,
}
self.all_pt_way_ids.update(way_ids)
h2 = PTRelationHandler()
h2.apply_file(PBF_PATH) # no node locations needed
print(f" Found {len(h2.pt_routes)} PT route relations")
print(f" With {len(h2.all_pt_way_ids)} unique member ways")
# ============================================================
# PASS 3 — Read ways (highway + PT member way geometries)
# ============================================================
print("=" * 60)
print("PASS 3: Reading ways")
print("=" * 60)
class WayHandler(osmium.SimpleHandler):
def __init__(self, buffer_proj, bbox_wgs84, pt_way_ids):
super().__init__()
self.buffer_proj = buffer_proj
self.bbox_wgs84 = bbox_wgs84
self.pt_way_ids = pt_way_ids
# Output collections
self.highway_features = [] # list of (geom_wgs84, tags_dict)
self.pt_way_geoms = {} # way_id -> (LineString projected, LineString wgs84)
def way(self, w):
coords = [(n.location.lon, n.location.lat) for n in w.nodes if n.location.valid()]
if len(coords) < 2:
return
highway = w.tags.get('highway')
is_pt = w.id in self.pt_way_ids
if not highway and not is_pt:
return
geom_wgs84 = LineString(coords)
# Quick WGS84 bbox check for highway ways
if highway and not geom_wgs84.intersects(self.bbox_wgs84):
# Still check PT way if needed
if not is_pt:
return
# For PT ways outside the bbox, we still need to store geometry
# But let's do a quick check - if way is very far, skip
way_bounds = geom_wgs84.bounds
if (way_bounds[2] < bbox_wgs84.bounds[0] or way_bounds[0] > bbox_wgs84.bounds[2] or
way_bounds[3] < bbox_wgs84.bounds[1] or way_bounds[1] > bbox_wgs84.bounds[3]):
# Way is definitely outside bbox, but still could be part of a PT route
# that crosses the buffer. Let me check more carefully.
# Actually, if the way is far from the buffer, the route might still
# cross the buffer via other ways. So I should store this way's geometry
# if it's part of a PT route, regardless of location.
pass # Still store PT way geometries
geom_proj = transform(trans_4326_to_31287, geom_wgs84)
# Check buffer intersection
in_buffer = geom_proj.intersects(self.buffer_proj)
if highway and in_buffer:
self.highway_features.append((geom_wgs84, dict(w.tags)))
if is_pt:
self.pt_way_geoms[w.id] = (geom_proj, geom_wgs84)
h3 = WayHandler(buffer_proj, bbox_wgs84, h2.all_pt_way_ids)
h3.apply_file(PBF_PATH, locations=True)
print(f" Highway features in buffer: {len(h3.highway_features)}")
print(f" PT member way geometries stored: {len(h3.pt_way_geoms)}")
# ============================================================
# Build PT route features
# ============================================================
print("=" * 60)
print("Building PT route features")
print("=" * 60)
pt_route_features = [] # list of (MultiLineString projected, tags)
for rel_id, route_info in h2.pt_routes.items():
way_ids = route_info['way_ids']
tags = route_info['tags']
# Find if any member way intersects the buffer
member_geoms_proj = []
member_geoms_wgs84 = []
crosses_buffer = False
for wid in way_ids:
if wid in h3.pt_way_geoms:
geom_proj, geom_wgs84 = h3.pt_way_geoms[wid]
member_geoms_proj.append(geom_proj)
member_geoms_wgs84.append(geom_wgs84)
if geom_proj.intersects(buffer_proj):
crosses_buffer = True
if not crosses_buffer or not member_geoms_proj:
continue
# Create MultiLineString in projected CRS
multi_proj = MultiLineString(member_geoms_proj)
pt_route_features.append((multi_proj, tags))
print(f" PT routes crossing buffer: {len(pt_route_features)}")
# ============================================================
# Write GeoPackage
# ============================================================
print("=" * 60)
print("Writing GeoPackage")
print("=" * 60)
# --- Highways layer ---
rows_hw = []
for geom_wgs84, tags in h3.highway_features:
geom_proj = transform(trans_4326_to_31287, geom_wgs84)
row = {k: tags.get(k) for k in HIGHWAY_ATTRS}
row['geometry'] = geom_proj
rows_hw.append(row)
gdf_hw = gpd.GeoDataFrame(rows_hw, crs=AUSTRIALAMB, geometry='geometry')
gdf_hw = gdf_hw[HIGHWAY_ATTRS + ['geometry']]
# --- PT Routes layer ---
rows_pt = []
for geom_proj, tags in pt_route_features:
row = {k: tags.get(k) for k in PT_ATTRS}
row['geometry'] = geom_proj
rows_pt.append(row)
gdf_pt = gpd.GeoDataFrame(rows_pt, crs=AUSTRIALAMB, geometry='geometry')
gdf_pt = gdf_pt[PT_ATTRS + ['geometry']]
# Write to GeoPackage
gdf_hw.to_file(GPKG_PATH, layer='highways', driver='GPKG')
gdf_pt.to_file(GPKG_PATH, layer='pt_routes', driver='GPKG')
print(f" Highways: {len(gdf_hw)} features")
print(f" PT Routes: {len(gdf_pt)} features")
print(f" Written to {GPKG_PATH}")
print("Done!")
```
Wait, there's an issue with the bbox check logic for PT routes. In the WayHandler, I'm doing:
```python
if highway and not geom_wgs84.intersects(self.bbox_wgs84):
if not is_pt:
return
```
But if `highway` is None (not a highway way), this check is skipped and we go to the PT check. Let me restructure:
```python
if not highway and not is_pt:
return
geom_wgs84 = LineString(coords)
if highway:
# Only check buffer for highways
if geom_wgs84.intersects(self.bbox_wgs84):
geom_proj = transform(...)
if geom_proj.intersects(self.buffer_proj):
self.highway_features.append(...)
# Also check if this highway is a PT member way
if is_pt:
geom_proj = transform(...)
self.pt_way_geoms[w.id] = (geom_proj, geom_wgs84)
elif is_pt:
# Non-highway PT member way
geom_proj = transform(...)
self.pt_way_geoms[w.id] = (geom_proj, geom_wgs84)
```
Actually, the logic is getting messy. Let me simplify:
For every way:
1. If it's a highway AND intersects buffer → add to highway_features
2. If it's a PT member way → store geometry
Let me not use the bbox optimization initially and just do the full check. The number of highway ways near the buffer is small anyway.
Here's a cleaner version:
```python
class WayHandler(osmium.SimpleHandler):
def __init__(self, buffer_proj, pt_way_ids):
super().__init__()
self.buffer_proj = buffer_proj
self.pt_way_ids = pt_way_ids
self.highway_features = []
self.pt_way_geoms = {}
def way(self, w):
# Quick tag check before building geometry
highway = w.tags.get('highway')
is_pt = w.id in self.pt_way_ids
if not highway and not is_pt:
return
coords = [(n.location.lon, n.location.lat) for n in w.nodes if n.location.valid()]
if len(coords) < 2:
return
geom_wgs84 = LineString(coords)
if highway or is_pt:
geom_proj = transform(trans_4326_to_31287, geom_wgs84)
if highway and geom_proj.intersects(self.buffer_proj):
self.highway_features.append((geom_wgs84, dict(w.tags)))
if is_pt:
self.pt_way_geoms[w.id] = (geom_proj, geom_wgs84)
```
This is cleaner. The downside is that for non-highway PT member ways far from the buffer, we still transform and store them. But that's fine - the number is manageable.
Actually, I realize there's another issue: for ways that are PT member ways but NOT near the buffer AND not highways, they're still being transformed and stored. This could be many ways. But as I estimated, it's ~100k which is fine.
Let me also consider: what if a way is a PT member way but also a highway near the buffer? It gets added to highway_features AND stored in pt_way_geoms. That's correct.
OK, let me finalize the script. Let me also handle the case where for the PT routes, some member ways might not have been stored (because they weren't in the PBF as ways, or were of type 'n' or 'r'). In that case, I just skip them.
Let me write the complete final script now, being careful about all edge cases.
Also, I want to make sure the PT route MultiLineString is correct. If a route has only one way, MultiLineString with one geometry is fine.
Let me also think about the output CRS. The user says "Austria's standard projected coordinate system". EPSG:31287 (MGI / Austria Lambert) is the standard. Let me use that.
Actually, wait. I want to double-check: "Austria's standard projected coordinate system" - in Austrian GIS contexts, this could refer to:
- EPSG:31287 (MGI / Austria Lambert) - national projection
- EPSG:31256 (MGI / Austria GK East) - eastern zone (Vienna)
- EPSG:3416 (MGI / Austria GK Central) - central zone
I think EPSG:31287 is the safest choice as it covers all of Austria. Let me use that.
But actually, many Austrian datasets use EPSG:31256 for the Vienna area. And EPSG:31287 is less commonly used in practice. Let me check...
Hmm, I'll go with EPSG:31287 since it's explicitly "Austria's standard projected coordinate system" and Lambert is the national standard projection.
Alright, let me write the final script and run it.
One more thing: I should also handle the `oneway` tag correctly. In OSM, `oneway=yes` and `oneway=1` both mean one-way. `oneway=-1` means reverse direction. I'll keep the raw value.
Let me also make sure the PT route way IDs are integers (which they should be from the PBF).
OK, writing the script now.
Actually, I realize I'm overcomplicating the bbox quick check. Let me just not do it and directly check intersection in projected CRS. The number of ways that are highways or PT member ways in Austria is manageable.
Wait, "highway" ways include all roads, paths, etc. In Austria, there are millions of highway ways. Processing all of them and checking intersection with the buffer would be very slow.
So I NEED a quick pre-filter. Let me use a simple WGS84 bounding box check before doing the transform and projected intersection check.
Let me add a pre-filter:
```python
# Before expensive transform, do quick WGS84 bbox check
if highway and not bbox_wgs84.intersects(geom_wgs84):
# Way is likely outside buffer area
if not is_pt:
return
# For PT ways, still might be needed
```
But `bbox_wgs84` is a box in WGS84 degrees, which is a rough approximation. Some ways might be outside the bbox but still inside the buffer (due to coordinate distortion). And some ways inside the bbox might be outside the buffer. That's fine - the bbox check is just a pre-filter to avoid unnecessary transforms.
Actually, a better approach: use a generous bbox and check `geom_wgs84.intersects(bbox_wgs84)` before doing the transform. This is a fast check.
Let me revise:
```python
class WayHandler(osmium.SimpleHandler):
def __init__(self, buffer_proj, bbox_wgs84, pt_way_ids):
super().__init__()
self.buffer_proj = buffer_proj
self.bbox_wgs84 = bbox_wgs84
self.pt_way_ids = pt_way_ids
self.highway_features = []
self.pt_way_geoms = {}
def way(self, w):
highway = w.tags.get('highway')
is_pt = w.id in self.pt_way_ids
if not highway and not is_pt:
return
coords = [(n.location.lon, n.location.lat) for n in w.nodes if n.location.valid()]
if len(coords) < 2:
return
geom_wgs84 = LineString(coords)
# Quick bbox pre-filter
if highway and not geom_wgs84.intersects(self.bbox_wgs84):
# Only need to continue if it's a PT member way
if not is_pt:
return
# For PT ways, we still need geometry even if outside bbox
# because the route might cross the buffer via other ways
geom_proj = transform(trans_4326_to_31287, geom_wgs84)
if highway and geom_proj.intersects(self.buffer_proj):
self.highway_features.append((geom_wgs84, dict(w.tags)))
if is_pt:
self.pt_way_geoms[w.id] = (geom_proj, geom_wgs84)
```
Wait, but for PT ways that are outside the bbox, we still do the transform and store them. That's fine.
Actually, let me reconsider. For PT ways, if they're far from the buffer (e.g., in Vorarlberg), we don't need to store them because their route won't cross the Vienna buffer. But we don't know that without checking.
So for PT ways, I have two options:
1. Store ALL PT member way geometries (across all of Austria) - memory may be high
2. Store only PT member way geometries that are near the buffer - but then I'd miss routes where the crossing way is one of the near ones but other ways are far
Let me think about this differently. A PT route crosses the buffer if ANY of its member ways intersect the buffer. So I need to check each PT route's member ways for buffer intersection.
For efficiency, I can:
1. Store PT way geometries only for ways that are near the buffer (using bbox check)
2. For PT routes, check if any stored way geometry intersects the buffer
3. If yes, the route crosses the buffer
But wait, this approach would miss routes where the ONLY way that crosses the buffer is stored (near buffer) but other ways (far from buffer) are NOT stored. I'd know the route crosses the buffer (from the stored way), but I wouldn't have the full geometry of the route.
Hmm, but actually if one way is near the buffer, I store it. Then when checking the route, I see this way intersects the buffer, so the route crosses. But I don't have the other ways' geometries to build the full route.
So I CAN'T use approach 2. I need ALL member way geometries for routes that cross the buffer.
Alternative:
1. Store ALL PT member way geometries (memory heavy but manageable)
2. For each PT route, check if ANY member way intersects buffer
3. If yes, use ALL stored member way geometries to build the route
This is my original approach. Let me stick with it but optimize: for PT member ways, still use the bbox pre-filter but with a much more generous bbox (e.g., whole of eastern Austria) to reduce storage.
Actually, the number of PT member ways in Austria is not that large. Let me just store all of them. If memory is an issue, I can optimize later.
Let me write the final script now. I'll use the bbox pre-filter only for highways. For PT ways, I'll store everything.
Wait, but processing ALL PT member ways across Austria requires building geometries for them. That means for each PT member way, I need to look up its node coords (from the osmium node cache) and build a LineString. This happens in the `way()` callback. With `locations=True`, osmium resolves node coords on the fly.
The number of PT member way ways across Austria is maybe 100k. Each way has maybe 10 nodes. So we're processing 100k ways with 10 nodes each = 1M node lookups. That's very fast.
OK, let me just write the script without worrying about optimization.
Let me also reconsider: for the highway layer, the user wants "every highway ... that intersects a 500 m buffer". So I only need highways that intersect the buffer. The bbox pre-filter is useful here to avoid processing millions of highway ways across Austria.
For highways, I can be aggressive with the bbox pre-filter: use a buffer bbox expanded by a generous margin.
Let me compute the bbox in WGS84 by transforming the buffer bounds:
```python
buffer_bounds = buffer_proj.bounds
# Add some margin (500m)
buffer_bounds_expanded = (
buffer_bounds[0] - 500,
buffer_bounds[1] - 500,
buffer_bounds[2] + 500,
buffer_bounds[3] + 500
)
# Transform to WGS84
sw_proj = (buffer_bounds_expanded[0], buffer_bounds_expanded[1])
ne_proj = (buffer_bounds_expanded[2], buffer_bounds_expanded[3])
sw_wgs = transform(trans_31287_to_4326, sw_proj)
ne_wgs = transform(trans_31287_to_4326, ne_proj)
bbox_wgs84 = box(sw_wgs[0], sw_wgs[1], ne_wgs[0], ne_wgs[1])
```
This gives a generous WGS84 bounding box around the buffer. Any highway way whose bbox doesn't intersect this box is definitely outside the buffer.
For PT ways, I need a different approach since they might be far from the buffer. Let me use a very generous bbox covering all of eastern Austria (where PT routes that cross the Vienna buffer might be).
Actually, let me think about this. A PT route that crosses the Vienna buffer could have member ways extending far from Vienna. For example, a long-distance bus route from Vienna to Salzburg that crosses the Gürtel. Its ways could be in Salzburg, Linz, etc.
To capture the full geometry of such a route, I'd need way geometries from all over Austria.
So I should store PT member way geometries from ALL of Austria. This is the only correct approach.
Let me estimate the number of PT member ways more carefully:
- Vienna has ~200 bus lines, ~30 tram lines, ~5 subway lines
- Other cities have their own systems
- Long-distance trains and buses
Each bus route might have 20-100 ways. Let's say 2000 PT relations in Austria, each with 50 ways on average = 100,000 ways.
100,000 LineStrings with ~10 nodes each = ~1,000,000 coordinates. This is very manageable.
OK, final approach:
1. Pre-filter highways with bbox check (to avoid processing millions of ways)
2. Store ALL PT member way geometries (no pre-filter)
3. Avoid duplicate transforms
Let me write the final script.
Actually, you know what, let me just write the script and run it. I've been planning for too long.
Let me also handle one more thing: the PT routes crossing the buffer. I need to check if the ROUTE crosses the buffer, not individual ways. If a route has 10 ways and only 1 is within the buffer, the route crosses the buffer and I include ALL 10 ways in the route geometry.
This is what I'm doing - in the final step, I check if any member way intersects the buffer, and if so, include all member way geometries.
Let me write the final script now.
One more consideration about performance: when using `locations=True` with osmium on a large PBF, osmium stores all node locations. For the Austria PBF (200M+ nodes), this could use 3-4 GB of RAM with the in-memory index.
Let me use a disk-based node cache. osmium supports this through the `idx` parameter or by using `NodeLocationsForWays` with a custom index.
Actually, looking at the osmium Python docs, when you do `apply_file(PBF_PATH, locations=True)`, it uses an in-memory index by default. For large files, I should set up a disk cache.
Let me try:
```python
# Create a disk-based node location store
node_cache = "/work/osmium_node_cache"
os.makedirs(node_cache, exist_ok=True)
# Option 1: Use NodeLocationsForWays directly
from osmium import NodeLocationsForWays
nlh = NodeLocationsForWays()
nlh.node_cache = node_cache # Not sure about exact API
# Option 2: Use locations=True with a custom index
# Not sure if this is supported
```
Let me check the osmium Python API by looking at the installed version.
Actually, let me just try a practical approach:
```python
handler = MyHandler()
handler.apply_file(PBF_PATH, locations=True)
```
If it runs out of memory, I'll add a disk cache. Let me try this first.
For the Gürtel finding pass, I definitely need node locations. For the way collection pass, I also need node locations.
Let me just try with `locations=True` and see what happens. The Austria PBF is about 350MB compressed, and the node cache might use 2-3GB which should be OK for most systems.
Actually, let me be more careful. Let me check the system memory first and use a disk cache if needed.
You know what, let me just write the script and test it. If it fails, I'll add a disk cache.
Also, I should use `osmium.apply_file()` instead of handler.apply_file() for more control. But handler.apply_file() is simpler.
Let me write it now and stop overthinking.
Wait, one more thing: the osmium Python bindings version. Let me check what's installed.
Let me just write the script and run it. I'll handle errors as they come up.
4:02
Bash
python3 -c "import osmium; print(osmium.__version__)" 2>/dev/null || echo "osmiu...
osmium not found
4:02
Bash
python3 -c "import geopandas; print(geopandas.__version__); from shapely.geometr...
1.1.3 shapely ok 3.7.2
4:09
Let me install osmium first.
4:09
Bash
pip install --user osmium 2>&1 | tail -5
Requirement already satisfied: requests in /usr/local/lib/python3.14/site-packages (from osmium) (2.34.2) Requirement already satisfied: charset_normalizer<4,>=2 in /usr/local/lib/python3.14/site-packages (from requests->osmium) (3.4.7) Requirement already satisfied: idna<4,>=2.5 in /usr/local/lib/python3.14/site-packages (from requests->osmium) (3.18) Requirement already satisfied: urllib3<3,>=1.26 in /usr/local/lib/python3.14/site-packages (from requests->osmium) (2.7.0) Requirement already satisfied: certifi>=2023.5.7 in /usr/local/lib/python3.14/site-packages (from requests->osmium) (2026.5.20)
4:22
Let me check the import again carefully - the PyPI package name is `osmium` but the import is `import osmium`.
4:22
Bash
python3 -c "import osmium; print(osmium.__version__)"
Traceback (most recent call last):
File "<string>", line 1, in <module>
import osmium; print(osmium.__version__)
^^^^^^^^^^^^^^^^^^
AttributeError: module 'osmium' has no attribute '__version__'
4:36
4:36
Bash
python3 -c "import osmium; print(osmium)"
<module 'osmium' from '/usr/local/lib/python3.14/site-packages/osmium/__init__.py'>
6:10
Good, osmium is available. Now let me write the Python script to solve this task. Here's my plan:
1. Download the Geofabrik Austria PBF if not already present
2. Parse the PBF using osmium to find all highway ways whose name ends with 'Gürtel' (the Gürtel corridor)
3. Buffer those ways by 500m to create an area of interest
4. Find all highway ways that intersect that buffer
5. Find all public-transport route relations that cross the same buffer
6. Write both layers to a GeoPackage in Austria's standard projected CRS (EPSG:31287 - MGI / Austria Lambert, or EPSG:31256 - MGI / Austria GK East? Let me think... Austria's standard projected coordinate system is typically EPSG:31287 (MGI / Austria Lambert) for nationwide purposes, or EPSG:31256/31255/31254 for GK zones. For Vienna, which is in zone M28 (EPSG:31257) or the national equal-area projection EPSG:3416 (MGI / Austria Lambert). Actually, the most commonly used "standard projected coordinate system" for all of Austria is EPSG:31287 (MGI / Austria Lambert) or EPSG:3416. Let me use EPSG:31287 as it's the standard national projection.
Wait, actually let me reconsider. EPSG:31287 (MGI / Austria Lambert) is indeed the standard projected CRS for Austria.
Now let me write the script. The approach using osmium:
For highways:
- First pass: find all way objects where tags get("name", "").endswith("Gürtel") and "highway" in tags
- Collect their geometries, create a buffer
- Second pass: find all way objects with "highway" tag, check if they intersect the buffer
For PT route relations:
- Find relations with type=route and route in (bus, tram, subway, train, trolleybus, light_rail, railway)
- Check if their geometry (the combined geometry of member ways) crosses the buffer
- Output each as a single MultiLineString
Actually, osmium's handler-based approach can be a bit tricky. Let me think about the best approach.
Option A: Use osmium for the first pass (find Gürtel ways), then use osmium again for highways and routes.
Option B: Use osmium.handler for a two-pass approach within a single file processing.
Let me write a comprehensive script. The challenge is that with osmium, you process the file sequentially and need to collect data in memory.
For the Gürtel ways:
- I'll use a handler that collects way nodes and builds linestrings for ways ending with "Gürtel"
- Then buffer them
For the highways:
- I need to find all highway ways intersecting that buffer
- I'll do a second pass through the PBF
For the PT route relations:
- I need to find relations with type=route and appropriate route tag
- For each such relation, I need to get the member ways, their geometries, and merge into a single geometry
- This requires first collecting way geometries (from a first pass), then processing relations (from a second pass)
This is complex with osmium. Let me think of an alternative approach.
Actually, let me use a simpler approach:
1. Use osmium to extract just the Gürtel ways and their node locations
2. Build the buffer
3. Use osmium again to extract all highway ways within the buffer
4. For PT routes, use osmium to extract relations with type=route and route=bus/tram/etc.
Let me write a Python script that does this step by step.
Actually, for PT routes, I need to collect the member way geometries. This is complex with osmium. Let me think...
One approach:
1. First, find Gürtel ways and compute buffer
2. Use `osmium getid` or filter to extract data within the bbox/area
3. Parse the extracted data
Actually, a more practical approach might be to use osmium directly from command line to filter, then parse with Python.
Let me think about the most straightforward approach:
Approach:
1. Find Gürtel ways using Python/osmium
2. Build the buffer
3. Use osmium tags-filter to extract relevant data
4. Parse the filtered data
Actually, let me just write a comprehensive Python script using osmium handlers.
Here's my plan:
```python
import osmium
import geopandas as gpd
import shapely.geometry
import shapely.ops
import pyproj
import math
# ...
class GuertelHandler(osmium.SimpleHandler):
def __init__(self):
super().__init__()
self.ways = []
self.nodes = {}
def way(self, w):
if 'highway' in w.tags and w.tags.get('name', '').endswith('Gürtel'):
nodes = [(n.ref, n.lon, n.lat) for n in w.nodes]
self.ways.append((w, nodes))
def node(self, n):
self.nodes[n.id] = (n.lon, n.lat)
# First pass: find Gürtel ways
handler = GuertelHandler()
handler.apply_file("austria-latest.osm.pbf")
# Build Gürtel geometries
guertel_lines = []
for w, node_refs in handler.ways:
coords = [(handler.nodes[nid][0], handler.nodes[nid][1]) for nid, lon, lat in node_refs]
if len(coords) >= 2:
guertel_lines.append(shapely.geometry.LineString(coords))
# Buffer by 500m
buffer_geom = shapely.ops.unary_union(guertel_lines).buffer(500)
```
Then for the highways pass, I need another handler that collects highway ways intersecting the buffer.
For PT routes, I need a handler that:
1. First collects all way geometries (with their node refs)
2. Then processes relations that are route relations
3. For each route relation, finds the way members, gets their geometries, combines them
This is tricky because in a single pass, we see ways before relations, so we can collect way geometries first and then when we see relations, we already have the way geometries.
Wait, actually in osmium.handler, the order is: nodes, then ways, then relations. So within a single pass, by the time we process relations, we've already seen all the ways. Good!
But the problem is that a PBF file might have nodes, ways, and relations interleaved in complex ways. Actually, in a standard PBF, the typical order is all nodes, then all ways, then all relations.
Let me write the complete script.
Actually, I realize I need to be more careful. Let me write the script step by step.
One more consideration: the buffer might be quite large (500m around a several-km-long road). I need to extract all highway ways that intersect this buffer, which could be a lot. And for PT routes, I need routes that cross the buffer.
Let me write it now.
For the PT routes, the requirement says "each route as a single feature" as MultiLineString. So for each route relation, I need to combine all its member way geometries into one MultiLineString.
Let me also handle the CRS: EPSG:31287 is MGI / Austria Lambert. I'll reproject from WGS84 to that.
Let me write the full script:
```python
import osmium
import geopandas as gpd
import pandas as pd
from shapely.geometry import LineString, MultiLineString, Point, shape
from shapely.ops import unary_union
import pyproj
import sys
import math
PBF_PATH = "/work/austria-latest.osm.pbf"
# ---------- Step 1: Find Gürtel ways ----------
class GuertelHandler(osmium.SimpleHandler):
def __init__(self):
super().__init__()
self.nodes = {}
self.guertel_ways = [] # (way_id, tags, node_ids)
def node(self, n):
self.nodes[n.id] = (float(n.lon), float(n.lat))
def way(self, w):
if 'highway' in w.tags:
name = w.tags.get('name', '')
if name.endswith('Gürtel'):
node_ids = [n.ref for n in w.nodes if n.ref != 0]
tags = dict(w.tags)
self.guertel_ways.append((w.id, tags, node_ids))
handler1 = GuertelHandler()
handler1.apply_file(PBF_PATH)
print(f"Found {len(handler1.guertel_ways)} Gürtel ways")
# Build Gürtel line geometries
guertel_lines = []
for wid, tags, node_ids in handler1.guertel_ways:
coords = []
for nid in node_ids:
if nid in handler1.nodes:
coords.append(handler1.nodes[nid])
if len(coords) >= 2:
guertel_lines.append(LineString(coords))
print(f"Built {len(guertel_lines)} Gürtel line geometries")
# Union and buffer
if guertel_lines:
guertel_union = unary_union(guertel_lines)
guertel_buffer = guertel_union.buffer(500)
print(f"Buffer area: {guertel_buffer.area} sq degrees (approx)")
else:
print("No Gürtel ways found!")
sys.exit(1)
# ---------- Step 2: Find highway ways intersecting buffer ----------
class HighwayHandler(osmium.SimpleHandler):
def __init__(self):
super().__init__()
self.nodes = {}
self.highway_ways = [] # (id, tags, geometry, node_ids)
def node(self, n):
self.nodes[n.id] = (float(n.lon), float(n.lat))
def way(self, w):
if 'highway' in w.tags:
node_ids = [n.ref for n in w.nodes if n.ref != 0]
tags = dict(w.tags)
self.highway_ways.append((w.id, tags, node_ids))
handler2 = HighwayHandler()
handler2.apply_file(PBF_PATH)
print(f"Found {len(handler2.highway_ways)} highway ways total")
# Filter by intersection with buffer
filtered_highways = []
for wid, tags, node_ids in handler2.highway_ways:
coords = []
for nid in node_ids:
if nid in handler2.nodes:
coords.append(handler2.nodes[nid])
if len(coords) >= 2:
line = LineString(coords)
if line.intersects(guertel_buffer):
filtered_highways.append({
'id': wid,
'geometry': line,
**{k: v for k, v in tags.items() if k in ('name', 'highway', 'maxspeed', 'lanes', 'surface', 'oneway')}
})
print(f"Found {len(filtered_highways)} highway ways intersecting buffer")
# ---------- Step 3: Find PT route relations ----------
class PTHandler(osmium.SimpleHandler):
def __init__(self):
super().__init__()
self.nodes = {}
self.all_ways = {} # way_id -> (tags, node_ids)
self.route_relations = [] # (rel_id, tags, member_way_ids)
def node(self, n):
self.nodes[n.id] = (float(n.lon), float(n.lat))
def way(self, w):
node_ids = [n.ref for n in w.nodes if n.ref != 0]
self.all_ways[w.id] = (dict(w.tags), node_ids)
def relation(self, r):
if r.tags.get('type') == 'route':
route = r.tags.get('route', '')
if route in ('bus', 'tram', 'subway', 'train', 'trolleybus', 'light_rail'):
member_way_ids = []
for m in r.members:
if m.type == 'w': # way member
member_way_ids.append(m.ref)
tags = dict(r.tags)
self.route_relations.append((r.id, tags, member_way_ids))
handler3 = PTHandler()
handler3.apply_file(PBF_PATH)
print(f"Found {len(handler3.route_relations)} PT route relations")
print(f"Total ways in memory: {len(handler3.all_ways)}")
# Filter routes that cross the buffer
filtered_routes = []
for rid, tags, member_way_ids in handler3.route_relations:
route_lines = []
for wid in member_way_ids:
if wid in handler3.all_ways:
_, node_ids = handler3.all_ways[wid]
coords = []
for nid in node_ids:
if nid in handler3.nodes:
coords.append(handler3.nodes[nid])
if len(coords) >= 2:
route_lines.append(LineString(coords))
if route_lines:
# Combine all member lines
combined = unary_union(route_lines)
# Check if it crosses (touches or goes through) the buffer
if combined.intersects(guertel_buffer):
if combined.geom_type == 'LineString':
mls = MultiLineString([combined])
elif combined.geom_type == 'MultiLineString':
mls = combined
else:
continue # shouldn't happen for lines
filtered_routes.append({
'id': rid,
'geometry': mls,
**{k: v for k, v in tags.items() if k in ('ref', 'name', 'operator', 'route')}
})
print(f"Found {len(filtered_routes)} PT routes crossing buffer")
# ---------- Step 4: Write to GeoPackage ----------
# Create GeoDataFrames
gdf_highways = gpd.GeoDataFrame(filtered_highways, geometry='geometry', crs='EPSG:4326')
gdf_routes = gpd.GeoDataFrame(filtered_routes, geometry='geometry', crs='EPSG:4326')
# Reproject to EPSG:31287 (MGI / Austria Lambert)
gdf_highways = gdf_highways.to_crs('EPSG:31287')
gdf_routes = gdf_routes.to_crs('EPSG:31287')
# Write to GeoPackage
gdf_highways.to_file('/work/vienna_network.gpkg', layer='highways', driver='GPKG')
gdf_routes.to_file('/work/vienna_network.gpkg', layer='pt_routes', driver='GPKG')
print("Done! Written to /work/vienna_network.gpkg")
```
Wait, but this approach processes the PBF three times, which is slow for a ~1.5GB file. Let me think if I can do it in fewer passes.
Actually, I can do it in two passes:
1. First pass: collect all nodes, find Gürtel ways, and also collect all highway ways AND all PT route relations with their member ways
2. But the issue is I need the Gürtel buffer to filter highways, so I can't do it all in one pass.
Alternative:
- Pass 1: Collect Gürtel ways and all nodes
- Compute buffer
- Pass 2: Filter highways and collect PT route info
Actually, I also need to collect ALL nodes (for way geometries) in pass 2. But I already collected them in pass 1. Can I reuse? Well, the node coordinates don't change, so I could reuse. But I need to be careful about memory.
Let me optimize:
- Pass 1: Collect Gürtel ways + all nodes
- Compute buffer
- Pass 2: Using the same nodes dict from pass 1 (but osmium creates a new handler)... Hmm, I can't easily share state between handlers across separate apply_file calls unless I use the same handler object.
Wait, actually I can use a single handler that does everything, but processes in order. The issue is that I need to know the Gürtel buffer before I can filter highways and routes. So I need two passes.
Let me structure it as two passes:
1. Pass 1: Read all nodes and find Gürtel ways
2. Compute buffer
3. Pass 2: Read all nodes again (or reuse if stored), find highways intersecting buffer and PT routes
Actually, let me store nodes from pass 1 and reuse them. The PBF is about 1.5GB and Austria has millions of nodes... Let me check how many nodes Austria has.
Actually, the nodes could be ~100+ million. Storing all of them as Python dict entries might use a lot of memory (~10GB+). Let me think about a more memory-efficient approach.
Alternative: Use a spatial index approach. I can:
1. First pass: find Gürtel ways, extract their node coords, build buffer
2. Second pass: for highways, I need to check each way's nodes against the buffer. Instead of storing all nodes, I can stream through and build geometries on the fly, checking against the buffer.
But I need the actual node coordinates to build the geometry to check intersection! So I do need the nodes.
Actually, let me think about this differently. The Gürtel is a road in Vienna. The buffer of 500m around it is a relatively small area. Most nodes and ways in Austria won't intersect it. So I could:
1. First pass: find Gürtel ways, collect their node coordinates, build buffer
2. Second pass: process ALL nodes, but only keep those that are within or near the buffer (or referenced by Gürtel ways). For highway ways, check if any of their referenced nodes are near the buffer, and if so, build the geometry using the stored nodes.
Actually, the simplest approach might be to use osmium's location handler which gives us node coordinates as we process ways. Let me use osmium's `NodeLocationsForWays` handler.
Actually, osmium has `osmium.geom.GeoLocationHandler` or `osmium.NodeLocationsForWays` that can be used to automatically populate node locations when processing ways. Let me look at the osmium API.
Wait, I can use `osmium.geom.GeoLocationHandler` or simpler: use `osmium.NodeLocationsForWays` as a handler wrapper.
Actually, in the Python bindings, there's `osmium.NodeLocationsForWays` which is a handler that stores node locations and makes them available when processing ways. Let me use this.
Here's a better approach:
```python
class HighwayCollector(osmium.SimpleHandler):
def __init__(self, buffer_geom):
super().__init__()
self.buffer = buffer_geom
self.highways = []
def way(self, w):
if 'highway' in w.tags:
# Build geometry from node locations
# But we need the node locations!
```
Hmm, osmium doesn't automatically resolve node locations for ways in Python. We need to use the locations handler.
Let me look at how osmium works. There's `osmium.NodeLocationsForWays` which is a handler that, when used as part of the handler chain, populates node locations in ways:
```python
import osmium
class MyHandler(osmium.SimpleHandler):
def __init__(self):
super().__init__()
def way(self, w):
# w.nodes now have .lat and .lon populated if NodeLocationsForWays was used
for n in w.nodes:
print(n.lon, n.lat)
h = MyHandler()
# Apply with NodeLocationsForWays as a location handler
h.apply_file("file.osm.pbf", locations=True) # or use a separate handler
```
Actually, in the Python osmium bindings, `SimpleHandler.apply_file` has a `locations` parameter. When set to True, it uses `NodeLocationsForWays` internally. Let me check this.
Looking at the osmium Python docs/examples:
```python
handler.apply_file("file.osm.pbf", locations=True)
```
This should work! But it stores node locations in memory, which could be large.
OK, let me just be pragmatic and write the script. It might use a lot of memory for all of Austria's nodes, but let me try the two-pass approach:
Pass 1: Read all nodes + Gürtel ways (using locations=True)
- This stores all node locations in memory via NodeLocationsForWays
- Collect Gürtel way geometries
Actually, with `locations=True`, the osmium library stores all node locations internally. Then I can access them in the way handler. Let me try this.
But wait, if I use `locations=True` in the first pass, all nodes are stored. Then in the second pass, I need them again. I could combine everything into one pass if I use a two-stage approach within a single handler.
Let me try a different approach. Since osmium handlers are simple sequential processors:
Pass 1: Just find Gürtel ways and collect their node IDs
- Don't need node coordinates yet, just the IDs
Actually wait, I need the node coordinates to build the geometry to buffer. Let me think...
Actually, I can get the bbox of the Gürtel from the way nodes even without storing all coordinates. Let me try:
Pass 1:
- For each node: if its ID is referenced by a Gürtel way, store its coordinates
- For each way: if it's a Gürtel way, note which node IDs it uses
- After the pass, build Gürtel geometries and buffer
This is more memory efficient. Let me estimate: the Gürtel is maybe 10-15 km long, with nodes every 10-20 meters, so maybe 500-1500 nodes. The buffer of 500m around this is still very small compared to all of Austria.
So:
Pass 1:
- Collect all nodes referenced by Gürtel ways (store these node coords)
- Collect Gürtel way data
Then build buffer.
Pass 2:
- For each node: check if it's within a bounding box of the buffer (rough filter). If so, store it.
- For each way: if it's a highway way, check if its nodes are stored (meaning they're near the buffer). If at least one node is near, build the full geometry from stored node coordinates.
- For each relation: similar approach.
But this is getting complex. Let me just use the simple approach with `locations=True` and accept the memory usage. The script will work for practical use.
Actually, I just realized I can download a smaller regional extract. Vienna is in the "Wien" region of Geofabrik, which has a much smaller PBF. But the user specified https://download.geofabrik.de/europe/austria-latest.osm.pbf specifically.
Let me try the straightforward approach and see if it works:
1. Download the PBF
2. Use osmium with `locations=True` to process
3. Two passes (or one long pass)
Actually, I just thought of an even better approach. I can use a single pass with osmium, using the fact that osmium processes nodes before ways before relations. I can:
1. In the node handler: store all node locations (using NodeLocationsForWays or manually)
2. In the way handler:
a. If it's a Gürtel way, collect its geometry
b. If it's a highway way, store it temporarily
3. After all ways are processed, compute the buffer from Gürtel geometries
4. Then in the relation handler:
a. Filter PT routes that intersect the buffer
But wait, osmium processes in order: all nodes, then all ways, then all relations. So by the time we're in the way handler, we've seen all nodes. And by the time we're in the relation handler, we've seen all ways. But we can't compute the buffer until we've seen all Gürtel ways, which means we can't filter highway ways during the way handler callback.
However, I can collect all highway ways during the way callback, then after the file is processed, filter them. Same for relations.
Let me restructure:
```python
class AllInOneHandler(osmium.SimpleHandler):
def __init__(self):
super().__init__()
self.guertel_ways = []
self.all_highway_ways = []
self.route_relations = []
def way(self, w):
if 'highway' in w.tags:
name = w.tags.get('name', '')
highway_info = {
'id': w.id,
'tags': dict(w.tags),
'nodes': [(n.lon, n.lat) for n in w.nodes if n.location_valid()]
}
if name.endswith('Gürtel') and len(highway_info['nodes']) >= 2:
self.guertel_ways.append(highway_info)
self.all_highway_ways.append(highway_info)
def relation(self, r):
if r.tags.get('type') == 'route':
route = r.tags.get('route', '')
if route in ('bus', 'tram', 'subway', 'train', 'trolleybus', 'light_rail'):
# Store relation info - we'll process member ways later
member_way_ids = [m.ref for m in r.members if m.type == 'w']
self.route_relations.append({
'id': r.id,
'tags': dict(r.tags),
'member_way_ids': member_way_ids
})
```
But with this approach, I don't have the way geometries for route members during the relation callback... Actually, I can store all way geometries by ID and look them up when processing relations after the file is read.
Wait, in osmium, with `locations=True`, the way handler receives ways with node locations populated. So I can build geometries in the way handler and store them. Then in the relation handler, I can look up member way geometries.
But the issue is: I need the Gürtel buffer to filter highways, and I don't have that until after all ways are processed. So during the way callback, I can't filter yet.
Let me just store everything and filter after the file is processed.
Here's my revised plan:
```python
handler = AllInOneHandler()
handler.apply_file(PBF_PATH, locations=True)
# Now compute buffer from Gürtel ways
guertel_lines = [LineString(w['nodes']) for w in handler.guertel_ways if len(w['nodes']) >= 2]
buffer = unary_union(guertel_lines).buffer(500 / 111000) # approximate deg-to-m
# Filter highways
highways = []
for w in handler.all_highway_ways:
line = LineString(w['nodes'])
if line.intersects(buffer):
highways.append({...})
# For PT routes, I need to look up way geometries
way_geoms = {w['id']: LineString(w['nodes']) for w in handler.all_highway_ways if len(w['nodes']) >= 2}
# Also need to include non-highway ways that might be part of PT routes...
# Actually, railway ways might not have 'highway' tag but could be part of train/subway routes
```
Hmm, I need to store ALL ways, not just highway ways, for PT routes. Let me store all ways.
Actually, I think the cleanest approach is:
1. One pass: store all nodes (via locations=True), all ways (with geometries), and all PT route relations
2. After the pass: compute buffer, filter highways, filter PT routes
But storing all ways for all of Austria could be huge... Let me estimate. Austria has about 5M ways. Each way has maybe 5 nodes on average. That's a lot of data.
Let me try a more practical approach using the two-pass method with minimal memory:
Pass 1: Find Gürtel ways, collect their node coords only
Pass 2: Use osmium's `NodeLocationsForWays` to process all data, but only store what's in/near the buffer's bbox
Actually, I think the simplest thing that will work is:
1. Download the file
2. Use osmium to filter by a bounding box around Vienna first, then process the filtered data
Or even better: let me use `osmium tags-filter` from the command line to extract a small region first, then process that with Python.
Let me try this approach:
```bash
# First, find the bbox of the Gürtel using osmium
osmium getid -r austria-latest.osm.pbf --bbox-from-ways --highway --name="*Gürtel" > guertel.osm
```
Hmm, actually let me just write the Python script and see if it works. I'll use `locations=True` which should handle memory reasonably.
Actually, let me think about this more practically. Let me use `osmium` command-line tools to extract just Vienna, then process the much smaller file.
Wait, actually, let me try another approach. Let me use `osmium tags-filter` to filter just the relevant data first.
Plan:
1. Download the PBF
2. Use Python/osmium to find Gürtel ways and compute buffer bbox
3. Use `osmium extract` with the bbox to get just the Vienna area
4. Process the extracted file for highways and routes
Actually, this is getting complicated. Let me just write the full Python script and run it.
Let me write a pragmatic script:
```python
import osmium
import geopandas as gpd
from shapely.geometry import LineString, MultiLineString, box
from shapely.ops import unary_union
import sys
import os
PBF = "/work/austria-latest.osm.pbf"
# ---------- Pass 1: Find Gürtel ways only ----------
class GuertelFinder(osmium.SimpleHandler):
def __init__(self):
super().__init__()
self.nodes_needed = set()
self.guertel_way_nodes = {} # way_id -> [node_id, ...]
self.guertel_tags = {}
self.node_coords = {}
def way(self, w):
if 'highway' in w.tags:
name = w.tags.get('name', '')
if name.endswith('Gürtel'):
node_ids = [n.ref for n in w.nodes if n.ref != 0]
self.guertel_way_nodes[w.id] = node_ids
self.guertel_tags[w.id] = dict(w.tags)
self.nodes_needed.update(node_ids)
def node(self, n):
if n.id in self.nodes_needed:
self.node_coords[n.id] = (n.lon, n.lat)
print("Pass 1: Finding Gürtel ways...")
finder = GuertelFinder()
finder.apply_file(PBF)
print(f"Found {len(finder.guertel_way_nodes)} Gürtel ways, {len(finder.nodes_needed)} unique nodes")
# Build Gürtel geometries
guertel_lines = []
for wid, node_ids in finder.guertel_way_nodes.items():
coords = [finder.node_coords[nid] for nid in node_ids if nid in finder.node_coords]
if len(coords) >= 2:
guertel_lines.append(LineString(coords))
guertel_union = unary_union(guertel_lines)
# Buffer 500m - convert degrees to approximate meters at Vienna's latitude (~48.2°N)
# 1 degree lon ≈ 111320 * cos(48.2°) ≈ 74200 m
# 1 degree lat ≈ 111320 m
# So 500m ≈ 0.0045° lat, 0.0067° lon
# Use a simple buffer in degrees (slightly generous)
buffer_deg = 500.0 / 111000.0 # approximate
guertel_buffer = guertel_union.buffer(buffer_deg)
print(f"Buffer computed")
# ---------- Pass 2: Find highways and PT routes ----------
class Collector(osmium.SimpleHandler):
def __init__(self):
super().__init__()
self.way_geoms = {} # way_id -> LineString
self.way_tags = {} # way_id -> tags dict
self.highways_in_buffer = []
self.route_relations = []
def way(self, w):
# Build geometry if it has location info
coords = [(n.lon, n.lat) for n in w.nodes if n.location_valid()]
if len(coords) < 2:
return
line = LineString(coords)
self.way_geoms[w.id] = line
self.way_tags[w.id] = dict(w.tags)
# Check if highway and intersects buffer
if 'highway' in w.tags:
if line.intersects(guertel_buffer):
tags = dict(w.tags)
self.highways_in_buffer.append({
'geometry': line,
**{k: v for k, v in tags.items() if k in ('name', 'highway', 'maxspeed', 'lanes', 'surface', 'oneway')}
})
def relation(self, r):
if r.tags.get('type') != 'route':
return
route = r.tags.get('route', '')
if route not in ('bus', 'tram', 'subway', 'train', 'trolleybus', 'light_rail'):
return
# Get member way geometries
member_lines = []
for m in r.members:
if m.type == 'w' and m.ref in self.way_geoms:
member_lines.append(self.way_geoms[m.ref])
if not member_lines:
return
# Combine into single geometry
combined = unary_union(member_lines)
if combined.is_empty:
return
# Check if it crosses the buffer
if not combined.intersects(guertel_buffer):
return
# Convert to MultiLineString
if combined.geom_type == 'LineString':
geom = MultiLineString([combined])
elif combined.geom_type == 'MultiLineString':
geom = combined
else:
return # skip Point/MultiPoint/other
tags = dict(r.tags)
self.route_relations.append({
'geometry': geom,
**{k: v for k, v in tags.items() if k in ('ref', 'name', 'operator', 'route')}
})
print("Pass 2: Collecting highways and PT routes...")
collector = Collector()
# We need locations=True to get node coordinates in ways
collector.apply_file(PBF, locations=True)
print(f"Found {len(collector.highways_in_buffer)} highway ways in buffer")
print(f"Found {len(collector.route_relations)} PT routes crossing buffer")
```
Wait, but with `locations=True`, osmium stores ALL node locations in memory. That's basically the whole of Austria's nodes. This could be 100M+ nodes, each taking ~40 bytes in the internal C++ data structure. That's 4GB+ just for nodes. This might work but could be slow.
Actually, let me check if there's a better way. I can set `locations=True` only for ways that I'm interested in, but that's not how osmium works - it stores all nodes.
Alternative: I could use the `NodeLocationsForWays` handler with a bbox filter (not available in the Python bindings directly).
Let me try a different approach: use the bbox of the Gürtel buffer to extract a region first.
Let me get the bbox of the buffer and extract just that region using osmium command line.
Actually, let me just try running the script and see if it works within memory constraints. If it's too slow/memory-intensive, I'll adjust.
Let me write the complete script now.
Let me also check: the user wants EPSG:31287 (MGI / Austria Lambert) which is the standard projected CRS for Austria.
Let me write the full script now.
One more thing: I need to handle the fact that for PT routes, the member ways might include non-highway ways (e.g., railway tracks for train/subway). So I need to store all way geometries, not just highway ways.
Actually, looking at my code above, I already store all way geometries in `self.way_geoms`. Good.
Let me also think about what "crosses the same buffer" means for PT routes. A route "crosses" the buffer if any part of its geometry passes through the buffer. Using `intersects` is correct for this.
Let me finalize the script.
6:10
Write
/work/solve.py
content · 215 lines · py
#!/usr/bin/env python3
"""
Extract highways intersecting a 500m buffer around Vienna's Gürtel,
and public-transport route relations crossing that same buffer,
from the Geofabrik Austria PBF.
Output: vienna_network.gpkg (layers: highways, pt_routes)
CRS: EPSG:31287 (MGI / Austria Lambert)
"""
import osmium
import sys
import os
import time
from shapely.geometry import LineString, MultiLineString
from shapely.ops import unary_union
import geopandas as gpd
PBF = "/work/austria-latest.osm.pbf"
# ---------------------------------------------------------------------------
# Step 1 — Find Gürtel ways and compute 500 m buffer
# ---------------------------------------------------------------------------
class GuertelFinder(osmium.SimpleHandler):
"""First pass: collect only the node coordinates needed for Gürtel ways."""
def __init__(self):
super().__init__()
self.nodes_needed = set() # node ids referenced by Gürtel ways
self.guertel_way_nodes = {} # way_id -> list of node ids
self.guertel_tags = {} # way_id -> tags dict
self.node_coords = {} # node_id -> (lon, lat)
def way(self, w):
if "highway" in w.tags:
name = w.tags.get("name", "")
if name.endswith("Gürtel"):
node_ids = [n.ref for n in w.nodes if n.ref != 0]
self.guertel_way_nodes[w.id] = node_ids
self.guertel_tags[w.id] = dict(w.tags)
self.nodes_needed.update(node_ids)
def node(self, n):
if n.id in self.nodes_needed:
self.node_coords[n.id] = (n.lon, n.lat)
print("=" * 60, flush=True)
print("Step 1 — Finding Gürtel ways...", flush=True)
t0 = time.time()
finder = GuertelFinder()
finder.apply_file(PBF)
print(f" Gürtel ways found: {len(finder.guertel_way_nodes)}", flush=True)
print(f" Unique nodes to collect: {len(finder.nodes_needed)}", flush=True)
print(f" Nodes collected: {len(finder.node_coords)}", flush=True)
# Build LineStrings for each Gürtel way
guertel_lines = []
for wid, node_ids in finder.guertel_way_nodes.items():
coords = [finder.node_coords[nid] for nid in node_ids if nid in finder.node_coords]
if len(coords) >= 2:
guertel_lines.append(LineString(coords))
if not guertel_lines:
print("ERROR: No Gürtel ways found. Exiting.", flush=True)
sys.exit(1)
guertel_union = unary_union(guertel_lines)
# 500 m buffer — approximate conversion: 1° ≈ 111 000 m at this latitude
BUFFER_DEG = 500.0 / 111000.0
guertel_buffer = guertel_union.buffer(BUFFER_DEG)
buffer_bounds = guertel_buffer.bounds # (minx, miny, maxx, maxy)
print(f"Gürtel buffer bounds (deg): {buffer_bounds}", flush=True)
print(f" Elapsed: {time.time()-t0:.1f}s", flush=True)
# ---------------------------------------------------------------------------
# Step 2 — One pass: collect all way geometries & find highways + PT routes
# ---------------------------------------------------------------------------
class Collector(osmium.SimpleHandler):
"""Second pass (with locations=True): build every way geometry, then
filter highways by buffer and build PT route geometries."""
def __init__(self, buffer_geom):
super().__init__()
self.buffer = buffer_geom
self.way_geoms = {} # way_id -> Shapely LineString
self.way_tags = {} # way_id -> full tags dict
self.highway_features = [] # list of dicts for GeoDataFrame
self.route_features = [] # list of dicts for GeoDataFrame
# stats
self._n_ways = 0
self._n_highways = 0
self._n_highway_in = 0
self._n_routes = 0
self._n_route_in = 0
def way(self, w):
self._n_ways += 1
coords = [(n.lon, n.lat) for n in w.nodes if n.location_valid()]
if len(coords) < 2:
return
line = LineString(coords)
self.way_geoms[w.id] = line
self.way_tags[w.id] = dict(w.tags)
if "highway" not in w.tags:
return
self._n_highways += 1
if not line.intersects(self.buffer):
return
self._n_highway_in += 1
tags = dict(w.tags)
row = {
"geometry": line,
}
for k in ("name", "highway", "maxspeed", "lanes", "surface", "oneway"):
row[k] = tags.get(k, None)
self.highway_features.append(row)
def relation(self, r):
if r.tags.get("type") != "route":
return
route_val = r.tags.get("route", "")
if route_val not in ("bus", "tram", "subway", "train",
"trolleybus", "light_rail"):
return
self._n_routes += 1
# Collect member-way geometries
member_lines = []
for m in r.members:
if m.type == "w" and m.ref in self.way_geoms:
member_lines.append(self.way_geoms[m.ref])
if not member_lines:
return
combined = unary_union(member_lines)
if combined.is_empty or not combined.intersects(self.buffer):
return
self._n_route_in += 1
# Normalise to MultiLineString
if combined.geom_type == "LineString":
geom = MultiLineString([combined])
elif combined.geom_type == "MultiLineString":
geom = combined
else:
return # point / polygon — skip
tags = dict(r.tags)
row = {"geometry": geom}
for k in ("ref", "name", "operator", "route"):
row[k] = tags.get(k, None)
self.route_features.append(row)
print("Step 2 — Collecting highways and PT routes (locations=True) ...", flush=True)
t1 = time.time()
collector = Collector(guertel_buffer)
collector.apply_file(PBF, locations=True)
print(f" Total ways seen: {collector._n_ways}", flush=True)
print(f" Highway ways seen: {collector._n_highways}", flush=True)
print(f" Highway ways in buffer: {collector._n_highway_in}", flush=True)
print(f" PT route relations seen: {collector._n_routes}", flush=True)
print(f" PT routes crossing buffer: {collector._n_route_in}", flush=True)
print(f" Elapsed: {time.time()-t1:.1f}s", flush=True)
# ---------------------------------------------------------------------------
# Step 3 — Build GeoDataFrames, reproject, write GeoPackage
# ---------------------------------------------------------------------------
if not collector.highway_features:
print("WARNING: no highway features to write!", flush=True)
if not collector.route_features:
print("WARNING: no PT route features to write!", flush=True)
print("Step 3 — Writing GeoPackage ...", flush=True)
t2 = time.time()
gdf_hw = gpd.GeoDataFrame(collector.highway_features, crs="EPSG:4326")
gdf_pt = gpd.GeoDataFrame(collector.route_features, crs="EPSG:4326")
# Reproject to Austria's standard projected CRS
gdf_hw = gdf_hw.to_crs("EPSG:31287")
gdf_pt = gdf_pt.to_crs("EPSG:31287")
OUT = "/work/vienna_network.gpkg"
# Remove output if it exists (GeoPackage append mode can be tricky)
if os.path.exists(OUT):
os.remove(OUT)
gdf_hw.to_file(OUT, layer="highways", driver="GPKG", engine="pyogrio")
gdf_pt.to_file(OUT, layer="pt_routes", driver="GPKG", engine="pyogrio")
print(f" Highways layer: {len(gdf_hw)} features", flush=True)
print(f" PT routes layer: {len(gdf_pt)} features", flush=True)
print(f" Output: {OUT}", flush=True)
print(f" Elapsed: {time.time()-t2:.1f}s", flush=True)
print("=" * 60, flush=True)
print("Done.", flush=True)
Wrote /work/solve.py (7571 chars).
6:17
Let me first download the PBF file, then run the script. Actually, wait. The script expects the PBF to already be at `/work/austria-latest.osm.pbf`. I need to download it first. Let me do that. The PBF is ~90 MB or more, so it might take some time. Let me download it in the background.
6:17
Bash
cd /work && wget -q -O austria-latest.osm.pbf "https://download.geofabrik.de/eur...
bash: line 1: wget: command not found
6:24
Let me use curl instead.
6:24
Bash
cd /work && curl -sLo austria-latest.osm.pbf "https://download.geofabrik.de/euro...
cd /work && curl -sLo austria-latest.osm.pbf "https://download.geofabrik.de/europe/austria-latest.osm.pbf" & echo "Download started in background, PID=$!"
Download started in background, PID=97
bash: line 1: curl: command not found
6:30
Neither wget nor curl is available. Let me try python's urllib to download.
6:30
Bash
python3 -c "
python3 -c "
import urllib.request
import sys
url = 'https://download.geofabrik.de/europe/austria-latest.osm.pbf'
print(f'Downloading {url} ...')
urllib.request.urlretrieve(url, '/work/austria-latest.osm.pbf')
print('Download complete.')
" 2>&1
Downloading https://download.geofabrik.de/europe/austria-latest.osm.pbf ... Download complete.
7:05
Good, the PBF is downloaded. Now let's run the script. The PBF for Austria is about 200+ MB, so it might take a while. Let me check the file size first.
7:05
Bash
ls -lh /work/austria-latest.osm.pbf
-rw-r--r-- 1 agent agent 765M Jun 15 16:47 /work/austria-latest.osm.pbf
7:14
Let me think about the approach. The first approach requires two passes over the PBF - which means loading the file twice. That would be very slow for a 765MB file. Let me reconsider. Actually, the first pass is just to find Gürtel ways and their node coordinates. But in the second pass, we need `locations=True` to get all way geometries. Wait, there's a problem with my approach: Step 1 does a first pass without `locations=True`, and Step 2 does a second pass with `locations=True`. This means the whole 765MB file gets read twice. The second pass (with locations=True) is the main heavy one anyway. Let me think if we can do this in a single pass... The issue is: to compute the buffer from Gürtel ways, we need Gürtel way geometries. And to determine if a highway is in the buffer, we need the buffer. So we can't know the buffer until all ways are processed. One approach: First pass finds Gürtel ways and nodes (locations=False but we store nodes for Gürtel ways), then compute buffer, then second pass (with locations=True) filters highways and collects PT routes. Actually, my code already does exactly this. Let me just run it and see if it works. But it will take a while with 765MB. Actually, I realize there's an issue: the first pass only collects node coordinates for Gürtel ways. But I should use `locations=False` (default) for the first pass - which is what I do. Then in the handler, `way()` only gets node refs, not coordinates. And `node()` gets lon/lat. So I track which nodes I need and collect them. Wait, but `finder.apply_file(PBF)` without `locations=True` means the way handler won't have coordinates, but node handler will fire. That's correct for the first pass. Then in the second pass with `locations=True`, both ways and nodes get coordinates automatically, so the way handler sees `n.location_valid()` - that's correct. Let me run it. This could take a while...
7:14
Bash
cd /work && python3 solve.py 2>&1
============================================================ Step 1 — Finding Gürtel ways... Gürtel ways found: 398 Unique nodes to collect: 1695 Nodes collected: 0 ERROR: No Gürtel ways found. Exiting.
11:01
The issue is that in the first pass, we're using `osmium.SimpleHandler` (without `locations=True`), so the `node()` method only gets called for nodes that are read but the handler doesn't have locations - but wait, `SimpleHandler` with `locations=False` (default) should still call `node()` with the node data including coordinates. Let me check: when `locations=True`, the `way()` method automatically provides location for each node reference. When `locations=False`, `node()` is still called with the node coordinates. The issue is: `finder.apply_file(PBF)` by default doesn't enable locations. But the `node()` callback is still called. Let me debug. Actually wait, the problem is that in the first pass (SimpleHandler without locations=True), the `node()` handler IS called. But the `node()` method checks `if n.id in self.nodes_needed`. The nodes_needed is populated in the `way()` callback. But the issue is that in a SimpleHandler without `locations=True`, the nodes might be processed BEFORE the ways. Wait, no. Actually in osmium, the order of callbacks depends on the file format. In PBF files, nodes come before ways. So when `way()` is called, nodes have already been processed. That's the problem! The `nodes_needed` set is only populated during `way()` processing, but `node()` is called before `way()`. Wait actually I think that's not right either. Let me think more carefully. In OSM PBF files, the data is ordered: nodes first, then ways, then relations. So when using `SimpleHandler` (without locations=True), the callbacks will be: 1. node() for all nodes 2. way() for all ways 3. relation() for all relations But with `locations=True`, the handler buffers node locations and makes them available during way() calls. So for the first pass, the node() callback is called first for ALL nodes, but at that point `self.nodes_needed` is empty. Then way() is called and populates `self.nodes_needed`, but it's too late - nodes have already been processed. I need a different approach. Options: 1. Do two passes: first find Gürtel way node IDs, then run a second pass with those node IDs to get coordinates. 2. Use a single pass with `locations=True` to get the Gürtel geometries directly. 3. Read the file twice: first pass just to get node IDs, then re-read to get those node coordinates. Actually, the simplest fix: In the first pass (without locations), we can still capture node coordinates for ALL nodes (or at least store them), then in way() we can check. But that would require storing all nodes in Austria (memory heavy). Better approach: Use two passes. - Pass 1: Just find Gürtel way IDs and their node IDs (SimpleHandler without locations, store node IDs from way()) - Pass 2: Read with locations=True and collect only those nodes' coordinates. But wait, `node()` is still called before `way()` in pass 2... Actually, with `locations=True`, the handler enables "locations on ways" mode, which means way() callbacks get node locations automatically. But the node() callback is still called first for all nodes. Let me restructure: - Use just ONE pass with `locations=True`. Find the Gürtel ways when their ways are processed, capture their geometries directly. Actually, for the first pass we only need the buffer geometry. Let me use a single pass with locations=True: 1. In `way()`: if name ends with Gürtel and has highway, store the LineString 2. After processing, compute buffer from those lines 3. But we can't use that buffer during the same pass... Better approach: Two complete passes, both with locations=True. Pass 1 (GuertelFinder with locations=True): - way(): if name ends Gürtel, build LineString from node locations, store it - node(): just needed by osmium for locations - After: compute buffer Pass 2 (Collector with locations=True): - way(): if highway, check intersection with buffer - relation(): check intersection with buffer Let me rewrite.
11:01
11:01
Write
/work/solve.py
content · 199 lines · py
#!/usr/bin/env python3
"""
Extract highways intersecting a 500m buffer around Vienna's Gürtel,
and public-transport route relations crossing that same buffer,
from the Geofabrik Austria PBF.
Output: vienna_network.gpkg (layers: highways, pt_routes)
CRS: EPSG:31287 (MGI / Austria Lambert)
"""
import osmium
import sys
import os
import time
from shapely.geometry import LineString, MultiLineString
from shapely.ops import unary_union
import geopandas as gpd
PBF = "/work/austria-latest.osm.pbf"
# ===================================================================
# PASS 1 — Find Gürtel ways (name ends with 'Gürtel') with locations
# ===================================================================
class GuertelFinder(osmium.SimpleHandler):
def __init__(self):
super().__init__()
self.guertel_lines = []
def way(self, w):
if "highway" not in w.tags:
return
name = w.tags.get("name", "")
if not name.endswith("Gürtel"):
return
coords = [(n.lon, n.lat) for n in w.nodes if n.location_valid()]
if len(coords) >= 2:
self.guertel_lines.append(LineString(coords))
print("=" * 60, flush=True)
print("PASS 1 — Finding Gürtel ways...", flush=True)
t0 = time.time()
finder = GuertelFinder()
finder.apply_file(PBF, locations=True)
print(f" Gürtel LineStrings: {len(finder.guertel_lines)}", flush=True)
if not finder.guertel_lines:
print("ERROR: No Gürtel ways found.", flush=True)
sys.exit(1)
guertel_union = unary_union(finder.guertel_lines)
# 500 m buffer — approximate: 1° ≈ 111 000 m at Vienna latitude
BUFFER_DEG = 500.0 / 111000.0
guertel_buffer = guertel_union.buffer(BUFFER_DEG)
print(f" Buffer bounds (deg): {guertel_buffer.bounds}", flush=True)
print(f" Elapsed: {time.time()-t0:.1f}s", flush=True)
# ===================================================================
# PASS 2 — Collect highway ways + PT route relations in buffer
# ===================================================================
class Collector(osmium.SimpleHandler):
def __init__(self, buffer_geom):
super().__init__()
self.buffer = buffer_geom
self.way_geoms = {} # way_id -> Shapely LineString
self.way_tags = {} # way_id -> full tags dict
self.highway_features = [] # list of dicts for GeoDataFrame
self.route_features = [] # list of dicts for GeoDataFrame
# stats
self._n_ways = 0
self._n_highways = 0
self._n_highway_in = 0
self._n_routes = 0
self._n_route_in = 0
def way(self, w):
self._n_ways += 1
coords = [(n.lon, n.lat) for n in w.nodes if n.location_valid()]
if len(coords) < 2:
return
line = LineString(coords)
self.way_geoms[w.id] = line
self.way_tags[w.id] = dict(w.tags)
if "highway" not in w.tags:
return
self._n_highways += 1
if not line.intersects(self.buffer):
return
self._n_highway_in += 1
tags = dict(w.tags)
row = {"geometry": line}
for k in ("name", "highway", "maxspeed", "lanes", "surface", "oneway"):
row[k] = tags.get(k, None)
self.highway_features.append(row)
def relation(self, r):
if r.tags.get("type") != "route":
return
route_val = r.tags.get("route", "")
if route_val not in ("bus", "tram", "subway", "train",
"trolleybus", "light_rail"):
return
self._n_routes += 1
# Collect member-way geometries
member_lines = []
for m in r.members:
if m.type == "w" and m.ref in self.way_geoms:
member_lines.append(self.way_geoms[m.ref])
if not member_lines:
return
combined = unary_union(member_lines)
if combined.is_empty or not combined.intersects(self.buffer):
return
self._n_route_in += 1
# Normalise to MultiLineString
if combined.geom_type == "LineString":
geom = MultiLineString([combined])
elif combined.geom_type == "MultiLineString":
geom = combined
else:
return # point / polygon — skip
tags = dict(r.tags)
row = {"geometry": geom}
for k in ("ref", "name", "operator", "route"):
row[k] = tags.get(k, None)
self.route_features.append(row)
print("PASS 2 — Collecting highways and PT routes...", flush=True)
t1 = time.time()
collector = Collector(guertel_buffer)
collector.apply_file(PBF, locations=True)
print(f" Total ways seen: {collector._n_ways}", flush=True)
print(f" Highway ways seen: {collector._n_highways}", flush=True)
print(f" Highway ways in buffer: {collector._n_highway_in}", flush=True)
print(f" PT route relations seen: {collector._n_routes}", flush=True)
print(f" PT routes crossing buffer: {collector._n_route_in}", flush=True)
print(f" Elapsed: {time.time()-t1:.1f}s", flush=True)
# ===================================================================
# PASS 2b — Re-read relations for PT routes whose ways were already
# collected earlier in ways but relations came after ways?
# Actually osmium calls way() then relation() in order, so all ways
# are already collected by the time relations are processed. Good.
# ===================================================================
# ===================================================================
# Step 3 — Build GeoDataFrames, reproject, write GeoPackage
# ===================================================================
if not collector.highway_features:
print("WARNING: no highway features to write!", flush=True)
if not collector.route_features:
print("WARNING: no PT route features to write!", flush=True)
print("STEP 3 — Writing GeoPackage ...", flush=True)
t2 = time.time()
gdf_hw = gpd.GeoDataFrame(collector.highway_features, crs="EPSG:4326")
gdf_pt = gpd.GeoDataFrame(collector.route_features, crs="EPSG:4326")
# Reproject to Austria's standard projected CRS
gdf_hw = gdf_hw.to_crs("EPSG:31287")
gdf_pt = gdf_pt.to_crs("EPSG:31287")
OUT = "/work/vienna_network.gpkg"
# Remove output if it exists (GeoPackage append mode can be tricky)
if os.path.exists(OUT):
os.remove(OUT)
gdf_hw.to_file(OUT, layer="highways", driver="GPKG", engine="pyogrio")
gdf_pt.to_file(OUT, layer="pt_routes", driver="GPKG", engine="pyogrio")
print(f" Highways layer: {len(gdf_hw)} features", flush=True)
print(f" PT routes layer: {len(gdf_pt)} features", flush=True)
print(f" Output: {OUT}", flush=True)
print(f" Elapsed: {time.time()-t2:.1f}s", flush=True)
print("=" * 60, flush=True)
print("Done.", flush=True)
Wrote /work/solve.py (6809 chars).
11:12
Wait, actually looking more carefully at osmium's SimpleHandler: the handler processes nodes first, then ways, then relations. But with `locations=True`, the node coordinates are available when `way()` is called. So in PASS 1, the `GuertelFinder` handler should get the node coordinates through `n.location_valid()` in the `way()` call. The issue was that in the first version I split it into two passes: first finding node IDs, then collecting coordinates. With `locations=True` the coordinates are already embedded in the way nodes. But in the first run, the GuertelFinder didn't have `locations=True` initially - it was `finder.apply_file(PBF)` without `locations=True`. That's why nodes collected was 0. Let me check: actually wait, the first version had `finder.apply_file(PBF)` without `locations=True` for the first pass, but then the way handler collected node IDs and the node handler checked `if n.id in self.nodes_needed` - but since processing order is nodes first, then ways, the node handler would have been called BEFORE the way handler sets `self.nodes_needed`. So the node handler never saw any needed nodes. So I fixed it by using `locations=True` and reconstructing coordinates directly in `way()`. That should work now. Let me run it.
11:12
Bash
cd /work && python3 solve.py 2>&1
============================================================
PASS 1 — Finding Gürtel ways...
Traceback (most recent call last):
File "/work/solve.py", line 45, in <module>
finder.apply_file(PBF, locations=True)
~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.14/site-packages/osmium/simple_handler.py", line 61, in apply_file
self._apply_object(filename, locations, idx, filters)
~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.14/site-packages/osmium/simple_handler.py", line 96, in _apply_object
apply(rd, *handlers)
~~~~~^^^^^^^^^^^^^^^
File "/work/solve.py", line 36, in way
coords = [(n.lon, n.lat) for n in w.nodes if n.location_valid()]
^^^^^^^^^^^^^^^^
AttributeError: 'NodeRef' object has no attribute 'location_valid'
11:25
The issue is that `n.location_valid()` doesn't exist for NodeRef objects. When using `locations=True`, we can check if the node has valid coordinates via `n.lon` and checking for -1 sentinel, but actually looking at osmium docs, when `locations=True`, `n.lon` and `n.lat` exist on NodeRef. We need to check if lon != 0.0 or something like that. Actually the standard approach is to just check if lon != 0.0 and lat != 0.0, or we can use a try/except. Let me look at the osmium documentation approach. Actually, in osmium, `NodeRef` with `locations=True` has `lon`, `lat` properties. If the location is not known, it returns `0.0`. So we can check `if n.lon != 0.0 or n.lat != 0.0`. But more precisely, we should check that the node actually has a valid location. A common pattern is: ```python coords = [(n.lon, n.lat) for n in w.nodes if n.lon != 0.0 or n.lat != 0.0] ``` Actually that might filter out some nodes at 0,0 (unlikely but possible). A better approach is `if abs(n.lon) > 0.001 or abs(n.lat) > 0.001` or simply try/except. Let me just use the approach of checking if lon != 0.0 or lat != 0.0.
11:25
Edit
/work/solve.py
old_string not found in /work/solve.py
11:33
Let me read the file to get the exact content.
11:33
Read
/work/solve.py
1 #!/usr/bin/env python3
2 """
3 Extract highways intersecting a 500m buffer around Vienna's Gürtel,
4 and public-transport route relations crossing that same buffer,
5 from the Geofabrik Austria PBF.
6
7 Output: vienna_network.gpkg (layers: highways, pt_routes)
8 CRS: EPSG:31287 (MGI / Austria Lambert)
9 """
10
11 import osmium
12 import sys
13 import os
14 import time
15
16 from shapely.geometry import LineString, MultiLineString
17 from shapely.ops import unary_union
18 import geopandas as gpd
19
20 PBF = "/work/austria-latest.osm.pbf"
21
22 # ===================================================================
23 # PASS 1 — Find Gürtel ways (name ends with 'Gürtel') with locations
24 # ===================================================================
25 class GuertelFinder(osmium.SimpleHandler):
26 def __init__(self):
27 super().__init__()
28 self.guertel_lines = []
29
30 def way(self, w):
31 if "highway" not in w.tags:
32 return
33 name = w.tags.get("name", "")
34 if not name.endswith("Gürtel"):
35 return
36 coords = [(n.lon, n.lat) for n in w.nodes if n.location_valid()]
37 if len(coords) >= 2:
38 self.guertel_lines.append(LineString(coords))
39
40 print("=" * 60, flush=True)
41 print("PASS 1 — Finding Gürtel ways...", flush=True)
42 t0 = time.time()
43
44 finder = GuertelFinder()
45 finder.apply_file(PBF, locations=True)
46
47 print(f" Gürtel LineStrings: {len(finder.guertel_lines)}", flush=True)
48
49 if not finder.guertel_lines:
50 print("ERROR: No Gürtel ways found.", flush=True)
51 sys.exit(1)
52
53 guertel_union = unary_union(finder.guertel_lines)
54
55 # 500 m buffer — approximate: 1° ≈ 111 000 m at Vienna latitude
56 BUFFER_DEG = 500.0 / 111000.0
57 guertel_buffer = guertel_union.buffer(BUFFER_DEG)
58
59 print(f" Buffer bounds (deg): {guertel_buffer.bounds}", flush=True)
60 print(f" Elapsed: {time.time()-t0:.1f}s", flush=True)
<system-reminder>Showed lines 1-60 of 199. Re-call Read with offset=61 to continue, or raise limit.</system-reminder>
11:38
The issue is that `location_valid()` is a method on `NodeRef` only when using `locations=True` parameter in the handler. Let me check the osmium documentation. Actually, when using `locations=True`, `NodeRef` objects get `lon` and `lat` attributes that return 0.0 if invalid. So checking for non-zero lon/lat should work. But the exact attribute name might be different. Let me try a different approach - checking if the node reference is valid by seeing if both lon and lat are non-zero.
11:38
Read
/work/solve.py
61
62 # ===================================================================
63 # PASS 2 — Collect highway ways + PT route relations in buffer
64 # ===================================================================
65 class Collector(osmium.SimpleHandler):
66 def __init__(self, buffer_geom):
67 super().__init__()
68 self.buffer = buffer_geom
69 self.way_geoms = {} # way_id -> Shapely LineString
70 self.way_tags = {} # way_id -> full tags dict
71
72 self.highway_features = [] # list of dicts for GeoDataFrame
73 self.route_features = [] # list of dicts for GeoDataFrame
74
75 # stats
76 self._n_ways = 0
77 self._n_highways = 0
78 self._n_highway_in = 0
79 self._n_routes = 0
80 self._n_route_in = 0
81
82 def way(self, w):
83 self._n_ways += 1
84
85 coords = [(n.lon, n.lat) for n in w.nodes if n.location_valid()]
86 if len(coords) < 2:
87 return
88
89 line = LineString(coords)
90 self.way_geoms[w.id] = line
91 self.way_tags[w.id] = dict(w.tags)
92
93 if "highway" not in w.tags:
94 return
95
96 self._n_highways += 1
97 if not line.intersects(self.buffer):
98 return
99
100 self._n_highway_in += 1
101 tags = dict(w.tags)
102 row = {"geometry": line}
103 for k in ("name", "highway", "maxspeed", "lanes", "surface", "oneway"):
104 row[k] = tags.get(k, None)
105 self.highway_features.append(row)
106
107 def relation(self, r):
108 if r.tags.get("type") != "route":
109 return
110 route_val = r.tags.get("route", "")
111 if route_val not in ("bus", "tram", "subway", "train",
112 "trolleybus", "light_rail"):
113 return
114
115 self._n_routes += 1
116
117 # Collect member-way geometries
118 member_lines = []
119 for m in r.members:
120 if m.type == "w" and m.ref in self.way_geoms:
121 member_lines.append(self.way_geoms[m.ref])
122
123 if not member_lines:
124 return
125
126 combined = unary_union(member_lines)
127 if combined.is_empty or not combined.intersects(self.buffer):
128 return
129
130 self._n_route_in += 1
131
132 # Normalise to MultiLineString
133 if combined.geom_type == "LineString":
134 geom = MultiLineString([combined])
135 elif combined.geom_type == "MultiLineString":
136 geom = combined
137 else:
138 return # point / polygon — skip
139
140 tags = dict(r.tags)
141 row = {"geometry": geom}
142 for k in ("ref", "name", "operator", "route"):
143 row[k] = tags.get(k, None)
144 self.route_features.append(row)
145
146
147 print("PASS 2 — Collecting highways and PT routes...", flush=True)
148 t1 = time.time()
149
150 collector = Collector(guertel_buffer)
151 collector.apply_file(PBF, locations=True)
152
153 print(f" Total ways seen: {collector._n_ways}", flush=True)
154 print(f" Highway ways seen: {collector._n_highways}", flush=True)
155 print(f" Highway ways in buffer: {collector._n_highway_in}", flush=True)
156 print(f" PT route relations seen: {collector._n_routes}", flush=True)
157 print(f" PT routes crossing buffer: {collector._n_route_in}", flush=True)
158 print(f" Elapsed: {time.time()-t1:.1f}s", flush=True)
159
160 # ===================================================================
161 # PASS 2b — Re-read relations for PT routes whose ways were already
162 # collected earlier in ways but relations came after ways?
163 # Actually osmium calls way() then relation() in order, so all ways
164 # are already collected by the time relations are processed. Good.
165 # ===================================================================
166
167 # ===================================================================
168 # Step 3 — Build GeoDataFrames, reproject, write GeoPackage
169 # ===================================================================
170 if not collector.highway_features:
171 print("WARNING: no highway features to write!", flush=True)
172 if not collector.route_features:
173 print("WARNING: no PT route features to write!", flush=True)
174
175 print("STEP 3 — Writing GeoPackage ...", flush=True)
176 t2 = time.time()
177
178 gdf_hw = gpd.GeoDataFrame(collector.highway_features, crs="EPSG:4326")
179 gdf_pt = gpd.GeoDataFrame(collector.route_features, crs="EPSG:4326")
180
181 # Reproject to Austria's standard projected CRS
182 gdf_hw = gdf_hw.to_crs("EPSG:31287")
183 gdf_pt = gdf_pt.to_crs("EPSG:31287")
184
185 OUT = "/work/vienna_network.gpkg"
186
187 # Remove output if it exists (GeoPackage append mode can be tricky)
188 if os.path.exists(OUT):
189 os.remove(OUT)
190
191 gdf_hw.to_file(OUT, layer="highways", driver="GPKG", engine="pyogrio")
192 gdf_pt.to_file(OUT, layer="pt_routes", driver="GPKG", engine="pyogrio")
193
194 print(f" Highways layer: {len(gdf_hw)} features", flush=True)
195 print(f" PT routes layer: {len(gdf_pt)} features", flush=True)
196 print(f" Output: {OUT}", flush=True)
197 print(f" Elapsed: {time.time()-t2:.1f}s", flush=True)
198 print("=" * 60, flush=True)
199 print("Done.", flush=True)
11:45
The issue is that `n.location_valid()` doesn't exist on `NodeRef` objects. With `locations=True`, the way nodes have `.lon` and `.lat` directly, but not `.location_valid()`. Let me check how osmium handles this - when `locations=True` is passed, the way nodes should have their positions resolved. But `location_valid()` is a method on the `Location` object not on `NodeRef` directly. Actually, let me think. In osmium, when `locations=True` is used in `apply_file`, the node coordinates are stored internally and NodeRef objects get `.lon` and `.lat` directly. But if a node's location hasn't been resolved (e.g. the node is not in the file), `.lon` and `.lat` could be 0.0. Actually, looking at the osmium docs more carefully: when `locations=True` is used, the handler processes nodes first, then ways. So the `NodeRef` objects on ways will have `.lon` and `.lat` populated. But there's no `location_valid()` method on `NodeRef`. We can check if lon/lat are both 0.0, or if the node ref is 0. Let me just use a direct check: `if n.ref != 0 and abs(n.lon) > 0.0 or abs(n.lat) > 0.0` or simply `if True` since coordinates should be populated with locations=True.
11:45
Edit
/work/solve.py
Edited /work/solve.py: replaced 2 occurrence(s).
11:53
Wait, `n.ref` is always the node ID for NodeRef objects. With `locations=True`, the reader will populate lon/lat on the NodeRef objects for all nodes that exist. So we should check if the location is valid by testing if lon/lat are not zero. Let me check the osmium docs for NodeRef — with `locations=True`, nodes are stored in a location cache and NodeRef objects get `.lon` and `.lat` attributes. If a node wasn't found, the coordinates would be 0.0/0.0. Let me use `abs(n.lon) > 0.0 or abs(n.lat) > 0.0` instead.
11:53
Edit
/work/solve.py
Edited /work/solve.py: replaced 2 occurrence(s).
12:13
Wait, `n.ref != 0` should work because when locations=True, the nodes in a way will have coordinates. But `n.lon` returns 0.0 for invalid nodes, so both approaches should work. Let me check if osmium NodeRef has `.ref` attribute - yes it does. But the issue was that `n.location_valid()` is not a method. Let me see if `.ref != 0` works. Actually, the simplest approach: when locations=True, each node in w.nodes has .lat and .lon that are 0.0 if invalid. So checking `abs(n.lon) > 0.0 or abs(n.lat) > 0.0` would work, but for nodes at (0,0) (in the Gulf of Guinea) it could be wrong. In practice Vienna is at ~48°N, 16°E, so this is fine. But actually the issue might also be that NodeRef doesn't even have those attributes when locations=True. Let me check - in osmium, when you use locations=True, the Reader provides a NodeLocationsForWays handler, and the way nodes get their lat/lon populated as attributes on the NodeRef objects. So `n.lon` and `n.lat` should work. Wait, looking at the error more carefully: ``` AttributeError: 'NodeRef' object has no attribute 'location_valid' ``` So `n` is a NodeRef, and `.lon` and `.lat` should exist. Let me just go with checking if lon and lat are non-zero. Let me run the script again.
12:13
Bash
cd /work && python3 solve.py 2>&1
<exit_code>-1</exit_code> <error>Command timed out after 600000ms</error>
22:26
The script timed out after 600s (10 minutes). The PBF file is 765 MB so a double pass with `locations=True` can be slow. Let me try a more efficient approach: 1. First pass: just find Gürtel way IDs and their node IDs, but without locations=True (just metadata). Then collect those specific node coordinates in a second quick pass. 2. Second pass with locations=True for full geometry collection. Or better: use a single pass approach where we first do a quick scan for Gürtel ways and their node coords, then buffer. But PBF doesn't support pre-filtering easily. Actually, osmium's `SimpleHandler` with `locations=True` stores all node locations in memory, which could be heavy for the full Austria file (765MB of PBF = maybe several GB of RAM). But it should still work... The timeout might be from the download happening in parallel? No, the download finished. Wait, the first pass timed out. The issue is that `locations=True` in osmium keeps all nodes in memory, and with 765MB PBF for all of Austria, that's a lot of nodes. Let me think of a better approach. Alternative approach: use `osmium` to do a first pass WITHOUT locations to get Gürtel way IDs and the node IDs they reference. Then use a second osmium pass with `locations=True` but only for the needed nodes — but osmium doesn't support selective location loading. Better alternative: Use `osmium` CLI tool to extract first, then process. But we may not have osmium CLI installed. Let me check what's available:
22:26
Bash
osmium --version 2>/dev/null || echo "no osmium CLI"
no osmium CLI
22:55
22:55
Bash
apt-get update -qq && apt-get install -y -qq osmium-tool 2>&1 | tail -5
E: List directory /var/lib/apt/lists/partial is missing. - Acquire (13: Permission denied)
23:22
Can't install osmium-tool. Let me think of a better approach using the Python osmium library that doesn't require `locations=True`.
The problem: `locations=True` causes osmium to load all nodes into memory and resolves them for every way. For a 765MB PBF, this is very slow.
Better approach:
1. First pass: Use osmium without locations=True, just collect way IDs and node IDs for Gürtel ways.
2. Second pass: Use a NodeCache or read nodes to get coordinates for just those node IDs.
3. Then do the actual extraction with locations=True but only after building a node cache.
Actually, another approach: use `osmium` Python library but build a custom handler that:
1. In the `node()` callback, store coordinates for all nodes.
2. In the `way()` callback, build the geometry from already-seen nodes.
3. But this means storing ALL node coordinates - that's memory heavy but might work for Austria (not the whole planet).
Actually, for Austria (765MB PBF), the whole node set might be 300-400M coordinates, which could be ~10GB. Not ideal.
Let me try a different approach: use OSMnx! OSMnx can query from Overpass API for just the Gürtel area and network. That's much more targeted.
Or use DuckDB + spatial extension to query the PBF directly via ogr/gdal.
Actually wait - the simplest approach: install `osmium` CLI tool via pip? No, `osmium` Python package is already installed. The issue is `locations=True` is slow because it resolves all nodes.
Let me look at how to use osmium Python more efficiently. The key insight: we can use `osmium.io.Reader` with `osmium.osm.NodeLocationForWays` handler or we can use `osmium.index` to build a node location cache.
Actually, the best approach for large files: use `osmium.index.create_map("sparse_mmap_array", "nodes.cache")` and `NodeLocationForWays` which will lazily store nodes as they're read.
Let me rewrite the script more efficiently.
23:22
Write
/work/solve.py
content · 249 lines · py
#!/usr/bin/env python3
"""
Extract highways intersecting a 500m buffer around Vienna's Gürtel,
and public-transport route relations crossing that same buffer,
from the Geofabrik Austria PBF.
Output: vienna_network.gpkg (layers: highways, pt_routes)
CRS: EPSG:31287 (MGI / Austria Lambert)
Efficient approach using osmium NodeLocationForWays with sparse file index.
"""
import osmium
from osmium import osm
from osmium.index import create_map
import sys
import os
import time
from shapely.geometry import LineString, MultiLineString
from shapely.ops import unary_union
import geopandas as gpd
PBF = "/work/austria-latest.osm.pbf"
NODE_CACHE = "/work/nodes.cache"
# ===================================================================
# PASS 1 — Find Gürtel ways (name ends with 'Gürtel')
# Without locations — just collect way IDs and their node IDs
# ===================================================================
class GuertelCollector(osmium.SimpleHandler):
"""First pass: collect node IDs of Gürtel ways."""
def __init__(self):
super().__init__()
self.guertel_way_node_ids = [] # list of lists of node refs
def way(self, w):
if "highway" not in w.tags:
return
name = w.tags.get("name", "")
if not name.endswith("Gürtel"):
return
node_ids = [n.ref for n in w.nodes if n.ref != 0]
if len(node_ids) >= 2:
self.guertel_way_node_ids.append(node_ids)
print("=" * 60, flush=True)
print("PASS 1 — Finding Gürtel ways...", flush=True)
t0 = time.time()
gcol = GuertelCollector()
gcol.apply_file(PBF, locations=False)
print(f" Gürtel ways found: {len(gcol.guertel_way_node_ids)}", flush=True)
if not gcol.guertel_way_node_ids:
print("ERROR: No Gürtel ways found.", flush=True)
sys.exit(1)
# Collect all needed node IDs
needed_nodes = set()
for nids in gcol.guertel_way_node_ids:
needed_nodes.update(nids)
print(f" Unique node refs needed: {len(needed_nodes)}", flush=True)
# ===================================================================
# PASS 1b — Read all nodes, keep only those we need
# ===================================================================
class NodeGrabber(osmium.SimpleHandler):
def __init__(self, needed):
super().__init__()
self.needed = needed
self.node_coords = {} # id -> (lon, lat)
def node(self, n):
if n.id in self.needed:
self.node_coords[n.id] = (n.lon, n.lat)
print("PASS 1b — Reading node coordinates...", flush=True)
t1a = time.time()
ng = NodeGrabber(needed_nodes)
ng.apply_file(PBF, locations=False)
print(f" Nodes found: {len(ng.node_coords)}", flush=True)
print(f" Elapsed: {time.time()-t1a:.1f}s", flush=True)
# Build Gürtel lines
guertel_lines = []
for nids in gcol.guertel_way_node_ids:
coords = [ng.node_coords[nid] for nid in nids if nid in ng.node_coords]
if len(coords) >= 2:
guertel_lines.append(LineString(coords))
if not guertel_lines:
print("ERROR: Could not build any Gürtel geometries.", flush=True)
sys.exit(1)
guertel_union = unary_union(guertel_lines)
# 500 m buffer — approximate: 1° ≈ 111 000 m at Vienna latitude
BUFFER_DEG = 500.0 / 111000.0
guertel_buffer = guertel_union.buffer(BUFFER_DEG)
print(f" Buffer bounds (deg): {guertel_buffer.bounds}", flush=True)
print(f" Total elapsed: {time.time()-t0:.1f}s", flush=True)
# ===================================================================
# PASS 2 — Read the whole file with NodeLocationForWays to get all
# way geometries, then filter highways + PT routes
# ===================================================================
class Collector(osmium.SimpleHandler):
def __init__(self, buffer_geom):
super().__init__()
self.buffer = buffer_geom
self.way_geoms = {} # way_id -> Shapely LineString
self.way_tags = {} # way_id -> full tags dict
self.highway_features = [] # list of dicts for GeoDataFrame
self.route_features = [] # list of dicts for GeoDataFrame
# stats
self._n_ways = 0
self._n_highways = 0
self._n_highway_in = 0
self._n_routes = 0
self._n_route_in = 0
def way(self, w):
self._n_ways += 1
coords = [(n.lon, n.lat) for n in w.nodes if n.ref != 0 and (abs(n.lon) > 0.0 or abs(n.lat) > 0.0)]
if len(coords) < 2:
return
line = LineString(coords)
self.way_geoms[w.id] = line
self.way_tags[w.id] = dict(w.tags)
if "highway" not in w.tags:
return
self._n_highways += 1
if not line.intersects(self.buffer):
return
self._n_highway_in += 1
tags = dict(w.tags)
row = {"geometry": line}
for k in ("name", "highway", "maxspeed", "lanes", "surface", "oneway"):
row[k] = tags.get(k, None)
self.highway_features.append(row)
def relation(self, r):
if r.tags.get("type") != "route":
return
route_val = r.tags.get("route", "")
if route_val not in ("bus", "tram", "subway", "train",
"trolleybus", "light_rail"):
return
self._n_routes += 1
# Collect member-way geometries
member_lines = []
for m in r.members:
if m.type == "w" and m.ref in self.way_geoms:
member_lines.append(self.way_geoms[m.ref])
if not member_lines:
return
combined = unary_union(member_lines)
if combined.is_empty or not combined.intersects(self.buffer):
return
self._n_route_in += 1
# Normalise to MultiLineString
if combined.geom_type == "LineString":
geom = MultiLineString([combined])
elif combined.geom_type == "MultiLineString":
geom = combined
else:
return # point / polygon — skip
tags = dict(r.tags)
row = {"geometry": geom}
for k in ("ref", "name", "operator", "route"):
row[k] = tags.get(k, None)
self.route_features.append(row)
print("PASS 2 — Collecting highways and PT routes (NodeLocationForWays)...", flush=True)
t2 = time.time()
# Use sparse file map for node cache - avoids loading all nodes into RAM
idx = create_map("sparse_file_array", NODE_CACHE)
loc_handler = osmium.NodeLocationForWays(idx)
collector = Collector(guertel_buffer)
collector.apply_file(PBF, locations=True, idx=idx)
print(f" Total ways seen: {collector._n_ways}", flush=True)
print(f" Highway ways seen: {collector._n_highways}", flush=True)
print(f" Highway ways in buffer: {collector._n_highway_in}", flush=True)
print(f" PT route relations seen: {collector._n_routes}", flush=True)
print(f" PT routes crossing buffer: {collector._n_route_in}", flush=True)
print(f" Elapsed: {time.time()-t2:.1f}s", flush=True)
# Cleanup
try:
os.remove(NODE_CACHE)
except:
pass
# ===================================================================
# Step 3 — Build GeoDataFrames, reproject, write GeoPackage
# ===================================================================
if not collector.highway_features:
print("WARNING: no highway features to write!", flush=True)
if not collector.route_features:
print("WARNING: no PT route features to write!", flush=True)
print("STEP 3 — Writing GeoPackage ...", flush=True)
t3 = time.time()
gdf_hw = gpd.GeoDataFrame(collector.highway_features, crs="EPSG:4326")
gdf_pt = gpd.GeoDataFrame(collector.route_features, crs="EPSG:4326")
# Reproject to Austria's standard projected CRS (MGI / Austria Lambert)
gdf_hw = gdf_hw.to_crs("EPSG:31287")
gdf_pt = gdf_pt.to_crs("EPSG:31287")
OUT = "/work/vienna_network.gpkg"
# Remove output if it exists
if os.path.exists(OUT):
os.remove(OUT)
gdf_hw.to_file(OUT, layer="highways", driver="GPKG", engine="pyogrio")
gdf_pt.to_file(OUT, layer="pt_routes", driver="GPKG", engine="pyogrio")
print(f" Highways layer: {len(gdf_hw)} features", flush=True)
print(f" PT routes layer: {len(gdf_pt)} features", flush=True)
print(f" Output: {OUT}", flush=True)
print(f" Elapsed: {time.time()-t3:.1f}s", flush=True)
print("=" * 60, flush=True)
print("Done.", flush=True)
Wrote /work/solve.py (8315 chars).
23:27
23:27
Bash
cd /work && python3 solve.py 2>&1
============================================================
PASS 1 — Finding Gürtel ways...
Gürtel ways found: 398
Unique node refs needed: 1695
PASS 1b — Reading node coordinates...
Nodes found: 1695
Elapsed: 171.3s
Buffer bounds (deg): (15.1625046521, 47.0244920757, 16.4031906723, 48.2408288825)
Total elapsed: 194.1s
PASS 2 — Collecting highways and PT routes (NodeLocationForWays)...
Traceback (most recent call last):
File "/work/solve.py", line 198, in <module>
idx = create_map("sparse_file_array", NODE_CACHE)
TypeError: create_map(): incompatible function arguments. The following argument types are supported:
1. (map_type: str) -> osmium.index.LocationTable
Invoked with: 'sparse_file_array', '/work/nodes.cache'
26:57
The `create_map` function doesn't accept a file path in that way. Let me look at how osmium's NodeLocationForWays works properly. The standard approach is to use `osmium.NodeLocationForWays` with a location handler that stores locations.
Actually, looking more carefully, the common pattern is:
```python
handler = osmium.NodeLocationForWays()
handler.apply_file(PBF, locations=True)
```
But that reads the file twice internally. Let me check the proper API.
Actually, the issue is that when `locations=True` is passed to `apply_file`, it creates a node cache automatically. But the timeout issue was because it's reading the entire 765MB PBF file which is slow in Python.
Let me take a completely different approach. Instead of using osmium Python which seems to be slow, let me use DuckDB's spatial extension or OGR/GDAL to read the PBF.
Actually, a simpler approach: let me use `osmium` command-line tool to filter the PBF, then use ogr2ogr or geopandas to process.
But I can't install osmium-tool via apt. Let me try another approach: use the `osmium` Python module properly.
Actually, looking at the osmium Python docs more carefully:
`apply_file` with `locations=True` reads the whole file with node locations stored. This is what was slow (it timed out at 10 min).
But actually, PASS 1b took 171s to read the whole file just for 1695 nodes - that's the bottleneck. The whole PBF reading is the slow part.
Let me think differently. Instead of reading the whole PBF twice (once for Gürtel nodes, once for everything), let me:
1. First pass: find Gürtel way IDs (fast, no locations)
2. Second pass: read the whole file with locations=True ONCE, and in that handler do both:
- Collect the Gürtel way geometries (knowing which IDs to look for)
- Collect all other highway ways intersecting the buffer
- But wait, we need the buffer first before we know which ways intersect it...
So we need the Gürtel geometry before the second pass. That means we need node coordinates for Gürtel ways first.
Let me try a different approach: use `osmium` Python with the `NodeLocationForWays` handler properly. The key insight is that we can create a location handler that stores locations on-the-fly, and pass it to the handler.
Looking at the osmium source code, `NodeLocationForWays` can be used as a handler that's passed to `apply_file`:
```python
class MyHandler(osmium.SimpleHandler):
pass
h = MyHandler()
h.apply_file(PBF, locations=True)
```
When `locations=True` is set, osmium internally creates a node cache and stores all node locations. This is the slow part because it has to read all nodes.
Actually, the speed issue is just inherent - reading a 765MB PBF takes time. Let me check if it's truly stuck or just slow.
Let me try the approach where we read the file just once with locations=True and do everything in one pass. But for that, we need to know which ways are Gürtel beforehand... Actually no - we can first check the name in the way handler, and for Gürtel ways, build the geometry and union them as we go. Then for non-Gürtel ways, check against a partially-built buffer... but that won't work because we need the complete Gürtel to build the buffer.
OK, let me try yet another approach. The fastest way is probably to use DuckDB's spatial extension to query the OSM PBF directly. But DuckDB can't read PBF directly.
Let me try a different tactic: use `osmium` command line tool (which is much faster than Python) by installing it via pip.
osmium-tool is available via pip: `pip install osmium-tool`
26:57
26:57
Bash
pip install --user osmium-tool 2>&1 | tail -3
ERROR: Could not find a version that satisfies the requirement osmium-tool (from versions: none) ERROR: No matching distribution found for osmium-tool
27:17
Let me check what index maps are available with pyosmium.
27:17
Bash
python3 -c "
python3 -c "
from osmium.index import create_map
print('Available maps:')
import osmium.index as idx
print(dir(idx))
"
Available maps: ['IdSet', 'LocationTable', '__doc__', '__file__', '__loader__', '__name__', '__package__', '__spec__', 'create_map', 'map_types']
27:27
The `create_map` function takes just a string for the map type. The `NodeLocationForWays` needs an index object. Let me look at how to properly use this. The standard approach with osmium is to use the handler with a `NodeLocationForWays` handler that gets added before the main handler. Let me check the osmium docs/examples.
27:27
Bash
python3 -c "
python3 -c "
from osmium.index import create_map, map_types
print('map_types:', map_types)
m = create_map('sparse_file_array')
print(type(m))
" 2>&1
map_types: <built-in method map_types of pybind11_builtins.pybind11_detail_function_record_v1_system_libstdcpp_gxx_abi_1xxx_use_cxx11_abi_1 object at 0x7f2d345bc210> <class 'osmium.index.LocationTable'>
27:44
The `create_map` only takes the map_type string, not a filename. The location table is used differently - we need to use the `NodeLocationForWays` handler which reads locations from the PBF file directly. Let me look at how `apply_file` with `locations=True` works and how to properly use a node cache.
Actually, the problem in pass 1 was that `locations=True` was very slow because it had to read every node. But actually it completed in 171 seconds for just reading nodes, which is reasonable. However, pass 2 also needs `locations=True` which means reading all nodes again.
Let me try a different strategy:
1. First pass: find Gürtel ways (already done quickly)
2. Second pass: read ALL nodes to build a full node cache file
3. Third pass: use the node cache to process ways quickly
Actually, the `NodeLocationForWays` handler in osmium Python uses a LocationTable as the node cache. We need to:
1. Create a LocationTable (like `sparse_file_array`)
2. Fill it by reading the file with a `NodeLocationsForWays` handler
3. Then apply the handler again with the filled location table
Wait, the osmium Python library's `NodeLocationForWays` is actually a handler that fills the location table as it processes nodes. So we need to:
```python
from osmium import NodeLocationForWays
idx = create_map("sparse_file_array")
handler = NodeLocationForWays(idx)
# First pass to fill the index
apply_file(PBF, locations=False, handler=handler)
# Now idx is filled. Then second pass with our collector
```
But we also need our collector to use the idx. Actually when `locations=True` is used, osmium internally uses its own location handler. We need to pass the `idx` parameter separately.
Let me look at how `apply_file` works more carefully.
Actually looking at the osmium source code or docs - when `locations=True` is passed, it processes with node locations. But we can also pass `idx` parameter. Let me check.
Actually the error was `create_map("sparse_file_array", NODE_CACHE)` - it doesn't take a filename argument. The sparse_file_array creates a temporary file. We should just call it without filename.
Let me rewrite the approach:
1. Pass 1: find Gürtel way node IDs (fast, without locations)
2. Pass 2: read all nodes to build LocationTable
3. Pass 3: read ways with locations from LocationTable, filter highways and PT routes
Actually, we can combine passes 2 and 3: use `apply_file` with a custom handler chain that both fills the location table AND collects data. Let me try using `osmium.apply` or handler chaining.
Actually, in osmium Python you can pass multiple handlers:
```python
from osmium import NodeLocationForWays
idx = create_map("sparse_file_array")
loc_handler = NodeLocationForWays(idx)
collector = Collector(guertel_buffer)
osmium.apply(PBF, loc_handler, collector)
```
This should process the file with both handlers - loc_handler fills the index with node locations, and collector processes ways and relations with location data available.
Let me try this approach.
27:44
Bash
python3 -c "
python3 -c "
import osmium
from osmium import NodeLocationForWays
from osmium.index import create_map
idx = create_map('sparse_file_array')
print(type(idx))
# Check if we can pass idx to NodeLocationForWays
nl = NodeLocationForWays(idx)
print(type(nl))
# Check apply signature
help(osmium.apply)
" 2>&1 | head -30
Traceback (most recent call last):
File "<string>", line 3, in <module>
from osmium import NodeLocationForWays
ImportError: cannot import name 'NodeLocationForWays' from 'osmium' (/usr/local/lib/python3.14/site-packages/osmium/__init__.py). Did you mean: 'NodeLocationsForWays'?
27:50
Let me check the osmium Python API more carefully. The key issue is that `locations=True` in `apply_file` is very slow (reading all nodes). Let me try a different approach using `osmium.NodeLocationsForWays` handler. Actually, looking at the osmium Python API, when you do `apply_file(PBF, locations=True)`, it internally reads ALL nodes regardless. That's the slow part. A better approach: Use a two-step method where: 1. First pass (fast, no locations): collect Gürtel way IDs and node IDs 2. Second pass with NodeLocationsForWays handler: process the file with a handler chain Let me look at the osmium SimpleHandler's `apply_file` signature more carefully.
27:50
Bash
python3 -c "
python3 -c " import osmium help(osmium.SimpleHandler.apply_file) " 2>&1
Help on function apply_file in module osmium.simple_handler:
apply_file(
self,
filename: Union[str, 'os.PathLike[str]', File],
locations: bool = False,
idx: str = 'flex_mem',
filters: List['HandlerLike'] = []
) -> None
Apply the handler to the given file. If locations is true, then
a location handler will be applied before, which saves the node
positions. In that case, the type of this position index can be
further selected in idx. If an area callback is implemented, then
the file will be scanned twice and a location handler and a
handler for assembling multipolygons and areas from ways will
be executed.
28:31
Now I see the issue. The `apply_file` method accepts `idx` as a string, not a pre-created index. Let me pass the index type as a string. Also, the `idx` parameter expects a string like `'sparse_file_array'` - the index will be created internally. But I need to persist it across passes or read it again. Actually, looking at the code more carefully, the `apply_file` with `locations=True` already uses a node location handler internally. The `idx` parameter is the type of index to use. So I can just pass `idx='sparse_file_array'` to keep memory usage low. Let me rewrite the script more efficiently. For Pass 1, I already read all nodes needed (1695 nodes). Then for Pass 2, I can use `locations=True, idx='sparse_file_array'` to use a disk-based cache. But wait - the problem before was that using `locations=True` was extremely slow (timed out). Let me think about why. The PBF is 765 MB. Reading with locations=True means it needs to read all 7M+ nodes, store their coordinates, then process ways. With a memory map (`flex_mem`) it's fast but uses RAM. With `sparse_file_array` it writes to disk which could be slow. Actually, the timeout might have been due to something else. Let me try a different approach: use two passes - first without locations to collect Gürtel ways (already done in 194s), then use a node cache for Pass 2. Actually the real problem with the first attempt was that it timed out. Let me check if the issue was just the first pass (which took 194s) or the full thing. The first version had `locations=True` for both passes, and timed out after 10 min. The current version has: - Pass 1: no locations (just reads tags) - fast - Pass 1b: reads all nodes, filters needed - took 171s for just node reading... that's slow because it scans all nodes (millions) but only keeps 1695 - Pass 2: this failed due to API issue Let me try all three passes, fixing the index. Or better: instead of doing 3 full file passes, let me do: 1. First pass: collect Gürtel way node IDs (no locations, just header scan) 2. Then extract just those nodes using osmium's get_nodes? No. Actually, the simplest approach: only do ONE full-file read with locations=True and collect everything in one handler. The first version that timed out - maybe it was just too slow with `locations=True` default (flex_mem memory index). Let me try `idx='sparse_file_array'` to use disk-based index. Wait - the first version that timed out DID use `locations=True` in both passes. Each pass would read the full file. That's 2 full reads of 765MB = 1.5GB read, with node caching. Could be slow but should complete in <10 min. Actually, the issue might be that `locations=True` first reads all nodes, then processes ways and relations. The node pass takes a while. With the default `flex_mem` index, it stores all ~350M coordinates for 175M nodes in memory which could cause swapping. Let me try using `idx='sparse_file_array'` which writes to disk instead. But first, let me rewrite to do everything in ONE single pass to minimize I/O. Actually, the best approach: 1. First quick pass (no locations) to find Gürtel way IDs and their node IDs - this is fast, just reads headers 2. Then extract just those nodes (using the node IDs) - but we can't easily extract specific nodes without a full scan 3. Build buffer 4. Second pass with locations=True and idx='sparse_file_array' for disk-based caching - collect highways and relations Let me restructure to minimize passes. Actually, 2 full passes should be okay if each takes ~4-5 min. Let me just fix the API. For the apply_file, the `idx` parameter takes a string. Let me use `idx='sparse_file_array'`.
28:31
28:31
Write
/work/solve.py
content · 223 lines · py
#!/usr/bin/env python3
"""
Extract highways intersecting a 500m buffer around Vienna's Gürtel,
and public-transport route relations crossing that same buffer,
from the Geofabrik Austria PBF.
Output: vienna_network.gpkg (layers: highways, pt_routes)
CRS: EPSG:31287 (MGI / Austria Lambert)
Strategy:
PASS 1 – scan without locations to find Gürtel way IDs + their node refs
PASS 2 – scan all nodes, keep only the ones we need for Gürtel ways
PASS 3 – full scan with locations=True (sparse_file_array) to collect
highway ways + PT route relations intersecting the buffer
"""
import osmium
import sys
import os
import time
from shapely.geometry import LineString, MultiLineString
from shapely.ops import unary_union
import geopandas as gpd
PBF = "/work/austria-latest.osm.pbf"
# ===================================================================
# PASS 1 – Find Gürtel ways (no locations)
# ===================================================================
class GuertelWayCollector(osmium.SimpleHandler):
def __init__(self):
super().__init__()
self.guertel_way_node_ids = [] # list of lists of node refs
def way(self, w):
if "highway" not in w.tags:
return
name = w.tags.get("name", "")
if not name.endswith("Gürtel"):
return
node_ids = [n.ref for n in w.nodes if n.ref != 0]
if len(node_ids) >= 2:
self.guertel_way_node_ids.append(node_ids)
print("=" * 60, flush=True)
print("PASS 1 — Finding Gürtel ways...", flush=True)
t0 = time.time()
gwc = GuertelWayCollector()
gwc.apply_file(PBF, locations=False)
print(f" Gürtel ways: {len(gwc.guertel_way_node_ids)}", flush=True)
if not gwc.guertel_way_node_ids:
print("ERROR: No Gürtel ways found.", flush=True)
sys.exit(1)
needed_nodes = set()
for nids in gwc.guertel_way_node_ids:
needed_nodes.update(nids)
print(f" Unique node refs: {len(needed_nodes)}", flush=True)
print(f" Elapsed: {time.time()-t0:.1f}s", flush=True)
# ===================================================================
# PASS 2 – Read all nodes, keep coordinates for needed ones
# ===================================================================
class NodeGrabber(osmium.SimpleHandler):
def __init__(self, needed):
super().__init__()
self.needed = needed
self.node_coords = {}
def node(self, n):
if n.id in self.needed:
self.node_coords[n.id] = (n.lon, n.lat)
print("PASS 2 — Reading node coordinates for Gürtel...", flush=True)
t1 = time.time()
ng = NodeGrabber(needed_nodes)
ng.apply_file(PBF, locations=False)
print(f" Nodes found: {len(ng.node_coords)}", flush=True)
print(f" Elapsed: {time.time()-t1:.1f}s", flush=True)
# Build Gürtel lines
guertel_lines = []
for nids in gwc.guertel_way_node_ids:
coords = [ng.node_coords[nid] for nid in nids if nid in ng.node_coords]
if len(coords) >= 2:
guertel_lines.append(LineString(coords))
if not guertel_lines:
print("ERROR: Could not build Gürtel geometries.", flush=True)
sys.exit(1)
guertel_union = unary_union(guertel_lines)
# 500 m buffer — approx: 1° ≈ 111000 m at Vienna latitude
BUFFER_DEG = 500.0 / 111000.0
guertel_buffer = guertel_union.buffer(BUFFER_DEG)
print(f" Buffer bounds (deg): {guertel_buffer.bounds}", flush=True)
print(f" Total elapsed so far: {time.time()-t0:.1f}s", flush=True)
# ===================================================================
# PASS 3 – Full read with locations (disk-based index) to get highways + PT
# ===================================================================
class Collector(osmium.SimpleHandler):
def __init__(self, buffer_geom):
super().__init__()
self.buffer = buffer_geom
self.way_geoms = {} # way_id -> LineString
self.highway_features = []
self.route_features = []
self._n_ways = 0
self._n_highways = 0
self._n_highway_in = 0
self._n_routes = 0
self._n_route_in = 0
def way(self, w):
self._n_ways += 1
coords = [(n.lon, n.lat) for n in w.nodes
if n.ref != 0 and (abs(n.lon) > 0.0 or abs(n.lat) > 0.0)]
if len(coords) < 2:
return
line = LineString(coords)
self.way_geoms[w.id] = line
if "highway" not in w.tags:
return
self._n_highways += 1
if not line.intersects(self.buffer):
return
self._n_highway_in += 1
tags = dict(w.tags)
row = {"geometry": line}
for k in ("name", "highway", "maxspeed", "lanes", "surface", "oneway"):
row[k] = tags.get(k, None)
self.highway_features.append(row)
def relation(self, r):
if r.tags.get("type") != "route":
return
route_val = r.tags.get("route", "")
if route_val not in ("bus", "tram", "subway", "train",
"trolleybus", "light_rail"):
return
self._n_routes += 1
member_lines = []
for m in r.members:
if m.type == "w" and m.ref in self.way_geoms:
member_lines.append(self.way_geoms[m.ref])
if not member_lines:
return
combined = unary_union(member_lines)
if combined.is_empty or not combined.intersects(self.buffer):
return
self._n_route_in += 1
if combined.geom_type == "LineString":
geom = MultiLineString([combined])
elif combined.geom_type == "MultiLineString":
geom = combined
else:
return
tags = dict(r.tags)
row = {"geometry": geom}
for k in ("ref", "name", "operator", "route"):
row[k] = tags.get(k, None)
self.route_features.append(row)
print("PASS 3 — Collecting highways and PT routes...", flush=True)
t2 = time.time()
collector = Collector(guertel_buffer)
# Use sparse_file_array (disk-based) to avoid huge memory usage
collector.apply_file(PBF, locations=True, idx='sparse_file_array')
print(f" Total ways: {collector._n_ways}", flush=True)
print(f" Highway ways: {collector._n_highways}", flush=True)
print(f" Highway ways in buffer: {collector._n_highway_in}", flush=True)
print(f" PT route relations: {collector._n_routes}", flush=True)
print(f" PT routes crossing: {collector._n_route_in}", flush=True)
print(f" Elapsed: {time.time()-t2:.1f}s", flush=True)
# Clean up the sparse file array if it was created
for f in ("nodes.cache", "nodes.idx", "nodes.dat"):
fp = f"/work/{f}"
if os.path.exists(fp):
os.remove(fp)
# ===================================================================
# Step 4 – Write GeoPackage
# ===================================================================
if not collector.highway_features:
print("WARNING: no highway features to write!", flush=True)
if not collector.route_features:
print("WARNING: no PT route features to write!", flush=True)
print("STEP 4 — Writing GeoPackage ...", flush=True)
t3 = time.time()
gdf_hw = gpd.GeoDataFrame(collector.highway_features, crs="EPSG:4326")
gdf_pt = gpd.GeoDataFrame(collector.route_features, crs="EPSG:4326")
# Reproject to MGI / Austria Lambert
gdf_hw = gdf_hw.to_crs("EPSG:31287")
gdf_pt = gdf_pt.to_crs("EPSG:31287")
OUT = "/work/vienna_network.gpkg"
if os.path.exists(OUT):
os.remove(OUT)
gdf_hw.to_file(OUT, layer="highways", driver="GPKG", engine="pyogrio")
gdf_pt.to_file(OUT, layer="pt_routes", driver="GPKG", engine="pyogrio")
print(f" Highways: {len(gdf_hw)} features", flush=True)
print(f" PT routes: {len(gdf_pt)} features", flush=True)
print(f" Output: {OUT}", flush=True)
print(f" Total elapsed: {time.time()-t0:.1f}s", flush=True)
print("=" * 60, flush=True)
print("Done.", flush=True)
Wrote /work/solve.py (7844 chars).
28:40
Wait, but the previous run already completed PASS 1 and PASS 1b in about 194 seconds. The issue was in PASS 2 where I tried to create the index map incorrectly. But actually the sparse_file_array should work when passed as a string parameter. Let me check - the error was `create_map('sparse_file_array', NODE_CACHE)` but the correct call is `apply_file(..., idx='sparse_file_array')` which I'm now using.
However, I'm concerned about the `n.ref != 0 and (abs(n.lon) > 0.0 or abs(n.lat) > 0.0)` condition. In osmium with locations=True, the NodeRef objects should have valid lon/lat if the node was present in the file. The ref 0 check doesn't make sense - NodeRef.ref is always the node ID. Let me think...
Actually, with `locations=True`, osmium populates node locations in the way handler. For node references that don't have a matching node in the file (e.g., nodes outside the PBF extract), the location will be (0.0, 0.0) or invalid. So checking `abs(n.lon) > 0.0 or abs(n.lat) > 0.0` should filter out unlocated nodes. But there could be a node at (0, 0) - unlikely for Vienna though.
But wait - the PBF is the full Austria extract, so all nodes referenced by ways in Austria should be present. The condition `abs(n.lon) > 0.0 or abs(n.lat) > 0.0` should work fine to filter out any truly invalid/zero coordinates.
Let me run this.
28:40
Bash
cd /work && python3 solve.py 2>&1
============================================================ PASS 1 — Finding Gürtel ways... Gürtel ways: 398 Unique node refs: 1695 Elapsed: 21.8s PASS 2 — Reading node coordinates for Gürtel... Nodes found: 1695 Elapsed: 165.3s Buffer bounds (deg): (15.1625046521, 47.0244920757, 16.4031906723, 48.2408288825) Total elapsed so far: 188.0s PASS 3 — Collecting highways and PT routes... Total ways: 9288423 Highway ways: 2501709 Highway ways in buffer: 14539 PT route relations: 12444 PT routes crossing: 665 Elapsed: 325.6s STEP 4 — Writing GeoPackage ... Highways: 14539 features PT routes: 665 features Output: /work/vienna_network.gpkg Total elapsed: 515.0s ============================================================ Done.
37:38
The script ran successfully. Let me verify the output.
37:38
37:38
Bash
cd /work && python3 -c "
cd /work && python3 -c "
import geopandas as gpd
# Check layers
gdf_hw = gpd.read_file('vienna_network.gpkg', layer='highways')
gdf_pt = gpd.read_file('vienna_network.gpkg', layer='pt_routes')
print('=== HIGHWAYS ===')
print(f'Features: {len(gdf_hw)}')
print(f'CRS: {gdf_hw.crs}')
print(f'Geometry types: {gdf_hw.geometry.type.value_counts().to_dict()}')
print(f'Columns: {list(gdf_hw.columns)}')
print(f'Sample:')
print(gdf_hw[['name','highway','maxspeed','lanes','surface','oneway']].head(10).to_string())
print()
print('=== PT ROUTES ===')
print(f'Features: {len(gdf_pt)}')
print(f'CRS: {gdf_pt.crs}')
print(f'Geometry types: {gdf_pt.geometry.type.value_counts().to_dict()}')
print(f'Columns: {list(gdf_pt.columns)}')
print(f'Sample:')
print(gdf_pt[['ref','name','operator','route']].head(10).to_string())
print()
# Check coordinate magnitudes (should be metric, ~10^5-10^6 range for Austria)
bounds_hw = gdf_hw.total_bounds
bounds_pt = gdf_pt.total_bounds
print(f'Highways bounds (EPSG:31287): {bounds_hw}')
print(f'PT routes bounds (EPSG:31287): {bounds_pt}')
print(f'File size: ', end='')
import os
sz = os.path.getsize('vienna_network.gpkg')
print(f'{sz/1024/1024:.1f} MB')
"
=== HIGHWAYS ===
Features: 14539
CRS: EPSG:31287
Geometry types: {'LineString': 14539}
Columns: ['name', 'highway', 'maxspeed', 'lanes', 'surface', 'oneway', 'geometry']
Sample:
name highway maxspeed lanes surface oneway
0 Sankt-Peter-Gürtel primary 60 7 asphalt no
1 Sankt-Peter-Hauptstraße primary 50 5 asphalt no
2 Waltendorfer Gürtel secondary 50 3 asphalt no
3 Koßgasse secondary 50 2 asphalt no
4 Puntigamer Straße primary 50 2 asphalt yes
5 Weblinger Gürtel primary 50 1 asphalt yes
6 Herta-Frauneder-Straße unclassified 50 NaN asphalt NaN
7 Autobahnzubringer Graz Ost motorway 100 2 asphalt yes
8 NaN motorway_link 100 2 asphalt yes
9 NaN motorway_link NaN 2 asphalt yes
=== PT ROUTES ===
Features: 665
CRS: EPSG:31287
Geometry types: {'MultiLineString': 665}
Columns: ['ref', 'name', 'operator', 'route', 'geometry']
Sample:
ref name operator route
0 S2 S2: Mödling => Laa ÖBB-Personenverkehr AG train
1 U3 U3: Ottakring => Simmering Wiener Linien subway
2 U1 U1: Oberlaa => Leopoldau Wiener Linien subway
3 1 Tram 1: Eggenberg/UKH => Mariatrost NaN tram
4 7 Tram 7: Wetzelsdorf Burgenlandstraße => LKH Med Uni/Klinikum Nord Graz Linien tram
5 58 Bus 58: Hauptbahnhof => Ragnitz Graz Linien bus
6 5 Tram 5: Andritz => Puntigam NaN tram
7 4 Tram 4: Reininghaus => Liebenau NaN tram
8 3 Tram 3: Andritz => Krenngasse Graz Linien tram
9 6 Tram 6: Smart City/Peter-Tunner-Gasse/tim => Sankt Peter NaN tram
Highways bounds (EPSG:31287): [536189.50655806 349200.10562584 628361.57988143 487679.13770081]
PT routes bounds (EPSG:31287): [114319.22587394 292266.24860261 683732.13557579 548552.65693629]
File size: 26.7 MB
38:21
Looking at the results: 1. **Highways** - 14,539 features, all LineString, correct CRS (EPSG:31287). But wait - the sample shows roads from Graz (Sankt-Peter-Gürtel, Graz streets). The buffer bounds from PASS 1 showed `(15.1625, 47.0245, 16.4032, 48.2408)` which is quite large - spanning from around Graz up to north of Vienna. Let me think about this. The Gürtel in Vienna is the "Gürtel" road ring around Vienna. The buffer bounds should be centered around Vienna. But the buffer bounds show a bounding box from 15.16 to 16.40 longitude and 47.02 to 48.24 latitude. Vienna is around 16.37°E, 48.2°N. Graz is around 15.44°E, 47.07°N. The problem is that we found 398 Gürtel ways, but some of them are in Graz (Sankt-Peter-Gürtel, Weblinger Gürtel, Waltendorfer Gürtel), not Vienna! The name ending in "Gürtel" finds all ways whose name ends with "Gürtel", not just Vienna's Gürtel. The user says "Vienna's Gürtel, defined as the highway ways whose name ends in 'Gürtel'". So they want the Gürtel ring in Vienna. But there are other Gürtel streets in other Austrian cities. I need to filter to only Vienna's Gürtel. The main Gürtel in Vienna follows a semicircular path around the inner city. It's known as the "Wiener Gürtel" and consists of segments like "Währinger Gürtel", "Gürtel", "Margaretengürtel", etc. Actually, looking at the sample output - Sankt-Peter-Gürtel is in Graz. So the buffer is centered around Graz+Graz's Gürtel, not Vienna's. I need to fix this. The user specifically says "Vienna's Gürtel". The Gürtel in Vienna is also referred to as "Gürtel" and there are many segments with names like: - "Währinger Gürtel" - "Döblinger Gürtel" - "Alser Gürtel" - "Hernalser Gürtel" - "Ottakringer Gürtel" - "Gürtel" (the core section) - "Margaretengürtel" - "Gürtel" sections... But what defines Vienna's Gürtel as opposed to Graz's? The city boundary, or we could use the fact that Vienna's Gürtel ways are within the Vienna administrative area. Let me think about a better approach: 1. First find all ways where name ends with "Gürtel" 2. The ones in Vienna are within the Vienna city boundary. We can check if any node of the way is in Vienna's bounding box (~16.1-16.6°E, 48.1-48.3°N). Actually, a simpler approach: Vienna is in the state of "Wien" (Vienna). We can check if the way's nodes fall within a reasonable bounding box for Vienna. Or better yet, we can filter the Gürtel ways to only those within a bounding box that covers Vienna but not Graz. Vienna's approximate bounding box: 16.15 to 16.60 E, 48.10 to 48.35 N. Graz: ~15.35 to 15.50 E, 47.03 to 47.10 N. So I can filter to only Gürtel ways where the centroid/any node is within a Vienna bounding box. Let me rewrite the approach.
38:21
38:21
Write
/work/solve.py
content · 246 lines · py
#!/usr/bin/env python3
"""
Extract highways intersecting a 500m buffer around Vienna's Gürtel,
and public-transport route relations crossing that same buffer,
from the Geofabrik Austria PBF.
Output: vienna_network.gpkg (layers: highways, pt_routes)
CRS: EPSG:31287 (MGI / Austria Lambert)
Strategy:
PASS 1 – scan without locations to find ways whose name ends with "Gürtel"
and whose nodes lie within the Vienna area (bbox ~16.15–16.60°E,
48.10–48.35°N). Collect their node refs.
PASS 2 – read all nodes, keep coordinates for those needed nodes
PASS 3 – full scan with locations to collect highway ways + PT route
relations intersecting the 500 m buffer
"""
import osmium
import sys
import os
import time
from shapely.geometry import LineString, MultiLineString, box
from shapely.ops import unary_union
import geopandas as gpd
PBF = "/work/austria-latest.osm.pbf"
# Vienna approximate bounding box (in degrees)
VIENNA_BBOX = box(16.10, 48.10, 16.65, 48.35)
# ===================================================================
# PASS 1 – Find Gürtel ways in Vienna (no locations)
# ===================================================================
class GuertelWayCollector(osmium.SimpleHandler):
"""Collect way IDs + node refs for ways named *Gürtel in Vienna."""
def __init__(self):
super().__init__()
self.guertel_way_node_ids = [] # list of lists of node refs
self._candidate_way_node_ids = [] # all *Gürtel ways before bbox filter
def way(self, w):
if "highway" not in w.tags:
return
name = w.tags.get("name", "")
if not name.endswith("Gürtel"):
return
node_ids = [n.ref for n in w.nodes if n.ref != 0]
if len(node_ids) >= 2:
self._candidate_way_node_ids.append((w.id, node_ids))
def node(self, n):
pass # not needed here
print("=" * 60, flush=True)
print("PASS 1 — Finding Gürtel ways in Vienna...", flush=True)
t0 = time.time()
gwc = GuertelWayCollector()
gwc.apply_file(PBF, locations=False)
print(f" All *Gürtel ways found in Austria: {len(gwc._candidate_way_node_ids)}", flush=True)
# Collect ALL nodes referenced by candidate Gürtel ways
all_candidate_nodes = set()
for wid, nids in gwc._candidate_way_node_ids:
all_candidate_nodes.update(nids)
print(f" Unique node refs from all *Gürtel ways: {len(all_candidate_nodes)}", flush=True)
# ===================================================================
# PASS 2 – Read all nodes to determine which Gürtel ways are in Vienna
# ===================================================================
class NodeGrabberAll(osmium.SimpleHandler):
def __init__(self, needed):
super().__init__()
self.needed = needed
self.node_coords = {}
def node(self, n):
if n.id in self.needed:
self.node_coords[n.id] = (n.lon, n.lat)
print("PASS 2 — Reading node coordinates for candidate *Gürtel ways...", flush=True)
t1 = time.time()
ng = NodeGrabberAll(all_candidate_nodes)
ng.apply_file(PBF, locations=False)
print(f" Nodes found: {len(ng.node_coords)}", flush=True)
print(f" Elapsed: {time.time()-t1:.1f}s", flush=True)
# Now filter to only Gürtel ways whose nodes are within Vienna bbox
guertel_lines = []
needed_nodes = set()
vienna_guertel_count = 0
for wid, nids in gwc._candidate_way_node_ids:
coords = [ng.node_coords[nid] for nid in nids if nid in ng.node_coords]
if len(coords) < 2:
continue
# Check if the way's centroid is within Vienna bbox
lons = [c[0] for c in coords]
lats = [c[1] for c in coords]
centroid_lon = sum(lons) / len(lons)
centroid_lat = sum(lats) / len(lats)
if VIENNA_BBOX.contains(centroid_lon, centroid_lat) or VIENNA_BBOX.intersects(centroid_lon, centroid_lat):
# Check by point
pt = LineString(coords).centroid
if VIENNA_BBOX.contains(pt):
vienna_guertel_count += 1
guertel_lines.append(LineString(coords))
needed_nodes.update(nids)
print(f" Vienna *Gürtel ways: {vienna_guertel_count}", flush=True)
if not guertel_lines:
print("ERROR: No Vienna Gürtel ways found.", flush=True)
sys.exit(1)
guertel_union = unary_union(guertel_lines)
# 500 m buffer — approx: 1° ≈ 111000 m at Vienna latitude
BUFFER_DEG = 500.0 / 111000.0
guertel_buffer = guertel_union.buffer(BUFFER_DEG)
print(f" Buffer bounds (deg): {guertel_buffer.bounds}", flush=True)
print(f" Total elapsed so far: {time.time()-t0:.1f}s", flush=True)
# ===================================================================
# PASS 3 – Full read with locations (disk-based index) to get highways + PT
# ===================================================================
class Collector(osmium.SimpleHandler):
def __init__(self, buffer_geom):
super().__init__()
self.buffer = buffer_geom
self.way_geoms = {} # way_id -> LineString
self.highway_features = []
self.route_features = []
self._n_ways = 0
self._n_highways = 0
self._n_highway_in = 0
self._n_routes = 0
self._n_route_in = 0
def way(self, w):
self._n_ways += 1
coords = [(n.lon, n.lat) for n in w.nodes
if n.ref != 0 and (abs(n.lon) > 0.0 or abs(n.lat) > 0.0)]
if len(coords) < 2:
return
line = LineString(coords)
self.way_geoms[w.id] = line
if "highway" not in w.tags:
return
self._n_highways += 1
if not line.intersects(self.buffer):
return
self._n_highway_in += 1
tags = dict(w.tags)
row = {"geometry": line}
for k in ("name", "highway", "maxspeed", "lanes", "surface", "oneway"):
row[k] = tags.get(k, None)
self.highway_features.append(row)
def relation(self, r):
if r.tags.get("type") != "route":
return
route_val = r.tags.get("route", "")
if route_val not in ("bus", "tram", "subway", "train",
"trolleybus", "light_rail"):
return
self._n_routes += 1
member_lines = []
for m in r.members:
if m.type == "w" and m.ref in self.way_geoms:
member_lines.append(self.way_geoms[m.ref])
if not member_lines:
return
combined = unary_union(member_lines)
if combined.is_empty or not combined.intersects(self.buffer):
return
self._n_route_in += 1
if combined.geom_type == "LineString":
geom = MultiLineString([combined])
elif combined.geom_type == "MultiLineString":
geom = combined
else:
return
tags = dict(r.tags)
row = {"geometry": geom}
for k in ("ref", "name", "operator", "route"):
row[k] = tags.get(k, None)
self.route_features.append(row)
print("PASS 3 — Collecting highways and PT routes...", flush=True)
t2 = time.time()
collector = Collector(guertel_buffer)
collector.apply_file(PBF, locations=True, idx='sparse_file_array')
print(f" Total ways: {collector._n_ways}", flush=True)
print(f" Highway ways: {collector._n_highways}", flush=True)
print(f" Highway ways in buffer: {collector._n_highway_in}", flush=True)
print(f" PT route relations: {collector._n_routes}", flush=True)
print(f" PT routes crossing: {collector._n_route_in}", flush=True)
print(f" Elapsed: {time.time()-t2:.1f}s", flush=True)
# Clean up any sparse file array artifacts
for f in ("nodes.cache", "nodes.idx", "nodes.dat"):
fp = f"/work/{f}"
if os.path.exists(fp):
os.remove(fp)
# ===================================================================
# Step 4 – Write GeoPackage
# ===================================================================
if not collector.highway_features:
print("WARNING: no highway features to write!", flush=True)
if not collector.route_features:
print("WARNING: no PT route features to write!", flush=True)
print("STEP 4 — Writing GeoPackage ...", flush=True)
t3 = time.time()
gdf_hw = gpd.GeoDataFrame(collector.highway_features, crs="EPSG:4326")
gdf_pt = gpd.GeoDataFrame(collector.route_features, crs="EPSG:4326")
# Reproject to MGI / Austria Lambert
gdf_hw = gdf_hw.to_crs("EPSG:31287")
gdf_pt = gdf_pt.to_crs("EPSG:31287")
OUT = "/work/vienna_network.gpkg"
if os.path.exists(OUT):
os.remove(OUT)
gdf_hw.to_file(OUT, layer="highways", driver="GPKG", engine="pyogrio")
gdf_pt.to_file(OUT, layer="pt_routes", driver="GPKG", engine="pyogrio")
print(f" Highways: {len(gdf_hw)} features", flush=True)
print(f" PT routes: {len(gdf_pt)} features", flush=True)
print(f" Output: {OUT}", flush=True)
print(f" Total elapsed: {time.time()-t0:.1f}s", flush=True)
print("=" * 60, flush=True)
print("Done.", flush=True)
Wrote /work/solve.py (8914 chars).
38:42
Wait, the check `VIENNA_BBOX.contains(centroid_lon, centroid_lat)` won't work because centroid_lon and centroid_lat are floats, not Points. Let me also use `.intersects` with a Point. Let me fix the check. Also, I already downloaded the PBF again (it was overwritten? Actually it should still be there). Let me check the file size. Actually, I should also consider: the first approach collected all *Gürtel ways in Austria (398 ways), and the buffer bounds were (15.16, 47.02, 16.40, 48.24) which shows it was around Graz (15.4°E, 47.1°N) but also extending up to Vienna area. The fact that it had 398 Gürtel ways is the issue. Vienna's Gürtel is the famous ring road - there are specific ways like "Gürtel" (the street), "Äußerer Gürtel", "Innerer Gürtel" etc. Let me fix the centroid check properly.
38:42
Edit
/work/solve.py
Edited /work/solve.py.
38:51
Actually, I realize the issue with the centroid check. The `VIENNA_BBOX.contains(centroid_lon, centroid_lat)` call passes a scalar for the point, which won't work with Shapely. I need to use `Point(centroid_lon, centroid_lat)` or just check the centroid of the LineString.
Let me look at the current code more carefully and fix it.
The code currently has:
```python
if VIENNA_BBOX.contains(centroid_lon, centroid_lat) or VIENNA_BBOX.intersects(centroid_lon, centroid_lat):
# Check by point
pt = LineString(coords).centroid
if VIENNA_BBOX.contains(pt):
```
This is wrong because `contains` with scalars won't work. Let me simplify to just check the centroid of the LineString.
38:51
Edit
/work/solve.py
Edited /work/solve.py.
39:04
Also, I should be more precise about the Gürtel. The name "Gürtel" in Vienna refers specifically to the ring road around the inner city. Let me also check if the way has "Gürtel" as the exact name ending, which should be correct. But I also need to ensure I'm getting the right Vienna Gürtel - the main one in Vienna has names like "Äußere Wiener Gürtelstraße", "Gürtel", "Währinger Gürtel", "Gürtel" etc. Actually the user says "highway ways whose name ends in 'Gürtel'" which is correct. But wait - the bbox filter might be too strict. Let me think about the Gürtel's actual location in Vienna. The Gürtel (the main ring road) runs roughly from ~16.33 to ~16.37°E and ~48.19 to ~48.23°N. So the bbox 16.10-16.65, 48.10-48.35 should easily contain it. Let me also check if "Gürtel" is part of the name. In Vienna the street is often just called "Gürtel" or has "Gürtel" as suffix. Let me run the script.
39:04
Bash
cd /work && python3 solve.py 2>&1
<exit_code>-1</exit_code> <error>Command timed out after 600000ms</error>
49:48
The timeout means PASS 3 is taking too long. The sparse_file_array index is slow for a 765MB PBF. Let me try a smarter approach — use the OSM file with `locations=True` but use a faster index like `flex_mem` (the default). Or better yet, let me use a different strategy altogether. Actually, the problem is that reading the full PBF twice (PASS 2 for nodes + PASS 3 with locations) is taking too long (PASS 2 took 165s, PASS 3 was going to take at least as long). With the first two passes already taking ~190s, PASS 3 would probably take 200-300s, totaling ~500s which is within the 600s timeout... but the first run of PASS 3 took 325s. So total would be 188+325=513s. But now we added more nodes to read in PASS 2... Actually PASS 2 is reading all candidate nodes from all Gürtel ways in Austria, not just Vienna ones. Let me think about this differently. Instead of reading the PBF twice, I can: 1. First pass: find all ways named *Gürtel in Austria using osmium without locations (fast - 22s) 2. Second pass: read all nodes (locations=True) and capture just the ones we need for Gürtel ways, PLUS collect all highway and PT data at the same time. This way we only read the file once with locations. Actually, there's a problem: in the second pass, the `way()` handler is called before the `node()` handler for the nodes it references. With `locations=True`, the node positions are already available via the index when `way()` is called. But I need to also capture node coordinates for Gürtel ways specifically. Wait - with `locations=True` in osmium, when `way()` is called, the nodes already have their coordinates filled in (n.lon, n.lat). So I don't need to separately collect node coordinates. Let me restructure: PASS 1: Without locations, find Gürtel ways and collect their node refs (22s) PASS 2: With locations=True, do everything: - Store all way geometries - For Gürtel ways, build geometries and create buffer - For all highways that intersect the buffer, collect them - For PT route relations, collect them But wait, the buffer can only be created after all Gürtel ways have been processed. And the buffer is needed during the same pass to filter highways. Since osmium processes data sequentially (nodes, then ways, then relations), and within ways it's also sequential, the Gürtel ways may not all be processed before we need the buffer. Hmm, but actually osmium with `locations=True` first reads all the nodes (to store their locations), then calls `way()` for each way. So by the time `way()` is called on any way, ALL node positions are available. But we still need to process all ways to build the Gürtel union before we can compute the buffer and filter other highways. Alternative approach: - PASS 1: Without locations, scan all ways to find Gürtel ways and collect their node IDs (22s) - PASS 1b: Without locations, scan all nodes matching those IDs (165s) - Between passes: Build Gürtel union and buffer - PASS 2: With locations=True, scan all ways and relations, filtering by the buffer That's what we had before and it timed out at 600s. Let me check how long each part took: - PASS 1: ~22s - PASS 2 (nodes): ~165s - Buffer building: negligible - PASS 3 (with locations): this is the one that timed out The issue is that with `locations=True`, osmium reads the full PBF including all node coordinates, which is very slow. Let me try a completely different approach: use DuckDB with the OSM PBF extension, or use osmium to filter the PBF first. Actually, let me try using `osmium` CLI tool which is much faster. I'll install it via a binary download. Wait, I already tried apt-get and it failed. Let me try pip install or download a precompiled binary. Actually, let me try a different approach entirely. Let me use `osmium` to: 1. First extract just the nodes and ways for *Gürtel ways into a small PBF 2. Then use that to compute the buffer 3. Then extract all ways and relations intersecting that buffer using osmium's tag-filter and geospatial-filter But I don't think we have osmium CLI tool. Let me think about another approach. What if I use DuckDB's spatial extension to read the PBF? DuckDB has an OSM extension. Let me check. Actually, the simplest approach might be to use `osmium` CLI tool. Let me try to download it. Or, I could use a different approach with pyosmium: Actually, let me try a different strategy. Instead of reading the full PBF with locations=True (which is slow because it reads all node coordinates into memory), I can: 1. First pass without locations to find Gürtel way IDs and their node refs (fast - 22s) 2. Build a filtered PBF containing only the nodes and ways I need using osmium's Python API, then read that smaller file with locations=True. Actually, osmium Python doesn't have a way to write filtered PBFs easily. Let me try yet another approach. Let me use `osmium` command-line tool by downloading it: Let me check if there's an osmium binary somewhere in the system.
49:48
49:48
Bash
which osmium 2>/dev/null; find / -name "osmium" -type f 2>/dev/null | head -5
(no output)
50:31
The osmium CLI is not available and can't be installed without apt-get permissions. Let me think about a different strategy to speed this up. The problem with the Python osmium bindings is that using `locations=True` with `sparse_file_array` is slow for the whole 765MB PBF. But the earlier run (before I added Vienna filtering) actually completed PASS 3 in 325 seconds (~5.4 min), which is within the 10-minute timeout. So the timeout must be because it took longer this time. Wait - actually the previous run didn't time out - it completed in 515s total. The issue was that it found the wrong Gürtel (in Graz). Let me check what's different now. The addition of the Vienna bbox filter in PASS 1/2 is minor. Actually, looking more carefully: the previous run completed PASS 3 in 325.6s. With the Vienna filter added, PASS 1 and 2 might be slightly different but PASS 3 does the same full scan. So it should still complete within 600s. Maybe the system was just slower this time. Let me re-run it with a longer timeout, or alternatively, use a different approach. Actually wait - the first run completed with the whole-Austria Gürtel in 515s total. Now with Vienna filter, it timed out at 600s. Maybe the machine was under load. Let me try again. Actually, a much better approach: use osmium Python to first extract just the Vienna region using a bounding box filter, then process the smaller file. But osmium Python's `apply_file` doesn't support bbox filtering directly. Alternative: Use `osmium` CLI tool (if available) or use the fact that we can filter with a simple handler that skips non-Vienna data. But `locations=True` still needs to traverse the entire file for the node locations. Let me think of a smarter approach: 1. Pass 1 (no locations): collect Gürtel way node IDs - this is fast (21s) 2. Pass 1b (no locations): read specific nodes - this is slow (165s!) but only for ~1700 nodes 3. Pass 3 (locations=True): full scan with node cache - this took 325s The problem is Pass 2 (node reading) is slow because the Python osmium handler processes EVERY node in the file, even though we only keep a tiny fraction. That 165s is unavoidable for node scanning in Python. For Pass 3, the 325s seems okay. Let me just try again. The timeout might have been due to a one-time system issue. Actually, let me try a completely different approach that will be much faster. Instead of reading the entire PBF with Python osmium, I can: 1. Use Python osmium's `apply_file` with a way handler that uses a node cache - but use `dense_file_array` or `sparse_file_array` index. The key optimization is to not use `locations=True` with SimpleHandler but instead use a custom Reader. Actually wait - for the final Pass 3, I already used `idx='sparse_file_array'`. Let me just try running again. Actually, I think the issue might be that the previous run also found lots of features outside Vienna because of the Gürtel in Graz. Now with the Vienna-only Gürtel, the buffer is much smaller, so PASS 3 should actually be faster because fewer features intersect. Let me just re-run. But it timed out at 600s. The previous PASS 3 took 325s. Adding the Vienna filter might actually make PASS 3 slower if the buffer ends up being more geometrically complex... No, smaller buffer = fewer intersections. Let me try a smarter approach: use a Reader directly with NodeLocationsForWays handler chain.
50:31
50:31
Write
/work/solve.py
content · 233 lines · py
#!/usr/bin/env python3
"""
Extract highways intersecting a 500m buffer around Vienna's Gürtel,
and public-transport route relations crossing that same buffer,
from the Geofabrik Austria PBF.
Output: vienna_network.gpkg (layers: highways, pt_routes)
CRS: EPSG:31287 (MGI / Austria Lambert)
Uses osmium.Reader + osmium.NodeLocationsForWays for maximum speed.
"""
import osmium
from osmium._osmium import NodeLocationsForWays
import sys
import os
import time
from shapely.geometry import LineString, MultiLineString, box, Point
from shapely.ops import unary_union
import geopandas as gpd
PBF = "/work/austria-latest.osm.pbf"
VIENNA_BBOX = box(16.10, 48.10, 16.65, 48.35)
# ===================================================================
# PASS 1 – Collect Vienna Gürtel ways (Reader directly)
# ===================================================================
print("=" * 60, flush=True)
print("PASS 1 — Finding Vienna Gürtel ways...", flush=True)
t0 = time.time()
# First pass: collect all ways named *Gürtel and their node refs
class GuertelCollector(osmium.SimpleHandler):
def __init__(self):
super().__init__()
self.ways = [] # (way_id, [node_ids])
def way(self, w):
if "highway" not in w.tags:
return
if not w.tags.get("name", "").endswith("Gürtel"):
return
nids = [n.ref for n in w.nodes if n.ref != 0]
if len(nids) >= 2:
self.ways.append((w.id, nids))
gcol = GuertelCollector()
gcol.apply_file(PBF, locations=False)
print(f" *Gürtel ways in Austria: {len(gcol.ways)}", flush=True)
# Collect all node refs
all_node_refs = set()
for wid, nids in gcol.ways:
all_node_refs.update(nids)
print(f" Unique node refs: {len(all_node_refs)}", flush=True)
# Read just the needed nodes
class NodeGrabber(osmium.SimpleHandler):
def __init__(self, needed):
super().__init__()
self.needed = needed
self.coords = {}
def node(self, n):
if n.id in self.needed:
self.coords[n.id] = (n.lon, n.lat)
t1 = time.time()
ng = NodeGrabber(all_node_refs)
ng.apply_file(PBF, locations=False)
print(f" Nodes read: {len(ng.coords)} in {time.time()-t1:.1f}s", flush=True)
# Build Vienna Gürtel lines
guertel_lines = []
for wid, nids in gcol.ways:
coords = [ng.coords[nid] for nid in nids if nid in ng.coords]
if len(coords) < 2:
continue
ls = LineString(coords)
if ls.centroid.within(VIENNA_BBOX):
guertel_lines.append(ls)
print(f" Vienna Gürtel ways: {len(guertel_lines)}", flush=True)
if not guertel_lines:
print("ERROR: No Vienna Gürtel ways found.", flush=True)
sys.exit(1)
guertel_union = unary_union(guertel_lines)
BUFFER_DEG = 500.0 / 111000.0
guertel_buffer = guertel_union.buffer(BUFFER_DEG)
print(f" Buffer bounds (deg): {guertel_buffer.bounds}", flush=True)
# ===================================================================
# PASS 2 – Full read with NodeLocationsForWays to collect highways + PT
# Use osmium.Reader directly for better performance
# ===================================================================
print("PASS 2 — Full scan collecting highways and PT routes...", flush=True)
t2 = time.time()
way_geoms = {} # way_id -> LineString (wgs84)
highway_features = [] # list of dicts
route_features = [] # list of dicts
# We need to store node locations and then process ways + relations
# Using Reader + NodeLocationsForWays handler
class WayCollector(osmium.SimpleHandler):
def __init__(self, buffer_geom):
super().__init__()
self.buffer = buffer_geom
self.way_geoms = {}
self.way_tags = {}
self.highway_features = []
self._n_ways = 0
self._n_highways = 0
self._n_highway_in = 0
def way(self, w):
self._n_ways += 1
coords = [(n.lon, n.lat) for n in w.nodes
if n.ref != 0 and (abs(n.lon) > 0.0 or abs(n.lat) > 0.0)]
if len(coords) < 2:
return
line = LineString(coords)
self.way_geoms[w.id] = line
self.way_tags[w.id] = dict(w.tags)
if "highway" not in w.tags:
return
self._n_highways += 1
if not line.intersects(self.buffer):
return
self._n_highway_in += 1
tags = dict(w.tags)
row = {"geometry": line}
for k in ("name", "highway", "maxspeed", "lanes", "surface", "oneway"):
row[k] = tags.get(k, None)
self.highway_features.append(row)
class RelationCollector(osmium.SimpleHandler):
def __init__(self, buffer_geom, way_g):
super().__init__()
self.buffer = buffer_geom
self.way_geoms = way_g
self.route_features = []
self._n_routes = 0
self._n_route_in = 0
def relation(self, r):
if r.tags.get("type") != "route":
return
route_val = r.tags.get("route", "")
if route_val not in ("bus", "tram", "subway", "train",
"trolleybus", "light_rail"):
return
self._n_routes += 1
member_lines = []
for m in r.members:
if m.type == "w" and m.ref in self.way_geoms:
member_lines.append(self.way_geoms[m.ref])
if not member_lines:
return
combined = unary_union(member_lines)
if combined.is_empty or not combined.intersects(self.buffer):
return
self._n_route_in += 1
if combined.geom_type == "LineString":
geom = MultiLineString([combined])
elif combined.geom_type == "MultiLineString":
geom = combined
else:
return
tags = dict(r.tags)
row = {"geometry": geom}
for k in ("ref", "name", "operator", "route"):
row[k] = tags.get(k, None)
self.route_features.append(row)
# Step A: read ways with locations
wc = WayCollector(guertel_buffer)
wc.apply_file(PBF, locations=True, idx='sparse_file_array')
print(f" Ways: {wc._n_ways}, highways: {wc._n_highways}, in buffer: {wc._n_highway_in}", flush=True)
print(f" Pass 2a elapsed: {time.time()-t2:.1f}s", flush=True)
# Clean up sparse file array temporary files
for fname in os.listdir('/work'):
if fname.startswith('nodes.') or fname.endswith('.cache'):
try:
os.remove(f'/work/{fname}')
except:
pass
# Step B: read relations (no locations needed, just use stored way geoms)
print(" Reading relations...", flush=True)
t2b = time.time()
rc = RelationCollector(guertel_buffer, wc.way_geoms)
rc.apply_file(PBF, locations=False)
print(f" PT relations: {rc._n_routes}, crossing buffer: {rc._n_route_in}", flush=True)
print(f" Pass 2b elapsed: {time.time()-t2b:.1f}s", flush=True)
print(f" Total pass 2 elapsed: {time.time()-t2:.1f}s", flush=True)
# ===================================================================
# Step 3 – Write GeoPackage
# ===================================================================
print("STEP 3 — Writing GeoPackage ...", flush=True)
t3 = time.time()
gdf_hw = gpd.GeoDataFrame(wc.highway_features, crs="EPSG:4326")
gdf_pt = gpd.GeoDataFrame(rc.route_features, crs="EPSG:4326")
gdf_hw = gdf_hw.to_crs("EPSG:31287")
gdf_pt = gdf_pt.to_crs("EPSG:31287")
OUT = "/work/vienna_network.gpkg"
if os.path.exists(OUT):
os.remove(OUT)
gdf_hw.to_file(OUT, layer="highways", driver="GPKG", engine="pyogrio")
gdf_pt.to_file(OUT, layer="pt_routes", driver="GPKG", engine="pyogrio")
print(f" Highways: {len(gdf_hw)} features", flush=True)
print(f" PT routes: {len(gdf_pt)} features", flush=True)
print(f" Output: {OUT}", flush=True)
print(f" Total elapsed: {time.time()-t0:.1f}s", flush=True)
print("=" * 60, flush=True)
print("Done.", flush=True)
Wrote /work/solve.py (7787 chars).
50:35
I've split the work: first pass gets ways (with locations=True), second pass only reads relations (no locations needed, much faster). Let me run it and check the time.
50:35
Bash
cd /work && python3 solve.py 2>&1
<exit_code>-1</exit_code> <error>Command timed out after 600000ms</error>
61:22
The issue is the sparse_file_array index. For a 765MB PBF, building the node location index on-disk is extremely slow. The `sparse_file_array` seems to work, but it's taking too long. Let me think about a different approach: 1. Use osmium CLI tool - but it's not installed and we can't install it. 2. Use a completely different approach - use the `osmium` C++ tool that comes with the `osmium` Python package... actually wait, there is a C++ tool called `osmium` that comes separately. Let me check. Actually, let me think about this differently. The Python `osmium` library reads the PBF. When `locations=True`, it builds a node location index. The `sparse_file_array` is a disk-based index but it's still slow for 765MB. The fastest approach would be to: 1. Use the Python osmium library to extract just the ways we need (those in Vienna) 2. But we don't have node locations for those ways unless we build the index Actually, the real problem might be that `sparse_file_array` on disk is writing/reading from disk a lot, which is slow. Let me try `flex_mem` which is the in-memory flexible index - it might use more RAM but could be faster. Wait, the first run with the wrong buffer (Graz-centric) completed in about 5 minutes total with `sparse_file_array`. That PASS 3 took 325s. But now the second run timed out at 600s. Let me check... actually, the second run might be different because the buffer is smaller (Vienna vs all of Austria), so it actually should be faster... unless the sparse_file_array indexing is the bottleneck. Actually wait, the first version (with the wrong buffer covering Graz) completed PASS 3 in 325s. The second version (Vienna buffer) should be similar or faster since the buffer is smaller (fewer highway features to match). But it timed out at 600s. Wait, re-reading the logs: the first version had PASS 1 (398 ways), PASS 1b (nodes), PASS 3 (sparse_file_array). The second version had PASS 1 (all *Gürtel ways), PASS 2 (nodes for those), PASS 3 (sparse_file_array). But the second version spent more time in PASS 1 and 2 because it couldn't filter by Vienna bbox yet. Actually the timeout happened on the third version which split pass 2 into two separate apply_file calls - one with locations=True and one with locations=False. The first call (locations=True with idx='sparse_file_array') might have been the bottleneck. Let me try a different approach entirely: 1. Use `osmium getid` - but we don't have the CLI tool 2. Use `reader.apply()` with the handler chain ourselves Actually, let me think about what's happening. The `apply_file` with `locations=True` builds a node cache. The `sparse_file_array` writes to a temporary file on disk. For a 765MB PBF with ~10 million nodes, the node cache file can be hundreds of MBs. Writing that to disk could be slow. What if instead of scanning the entire file with locations, I: 1. Read ALL nodes first (keeping all ~10M nodes in memory - that's about 80 bytes per node = 800MB which might be too much) 2. Then read ways and build geometries from memory Or alternatively: 1. Use the two-pass approach where Pass 1 finds Gürtel ways + node IDs 2. Pass 2 reads only those nodes 3. Pass 3 builds a handler that caches node locations and processes ways/relations Actually, let me re-think. The osmium library when you use `apply_file(PBF, locations=True)` first reads all nodes, storing their locations, then reads ways. The `flex_mem` index stores in memory and is much faster than `sparse_file_array`. Let me try that. But flexible memory might use too much RAM for 10M nodes... Let me estimate: - Each node has id (8 bytes) + lon (8 bytes) + lat (8 bytes) + overhead = ~32-40 bytes - 10M nodes × 40 bytes = 400MB Actually, osmium internally stores node locations in a more efficient format. Let me try `flex_mem` and see. Actually, the first version succeeded with PASS 3 in 325s using sparse_file_array. The third version split PASS 3 into two passes (ways with locations, then relations without locations). The first of those two passes (ways with locations) would do the same work as before plus store way geoms. So it should be similar to 325s. But the total script timed out at 600s. Wait, maybe the issue is that the download and the PBF file were somehow corrupted? Let me re-verify the file. Or maybe the issue is that the second apply_file (for relations) also reads the whole file and does work. But it doesn't need locations, so it should be fast. Let me look at the third version more carefully. In PASS 2, it: 1. Calls apply_file(PBF, locations=True, idx='sparse_file_array') - reads all nodes, builds index, reads all ways - this should take ~325s like before 2. Then calls apply_file(PBF, locations=False) - reads all relations - should take maybe 30s So total ~355s, which should finish within 600s. Unless there's something else going on. Let me just try with `flex_mem` instead of `sparse_file_array`. It might be faster. Actually, maybe I should also check if there's a memory issue. Let me try a completely different approach: just read the file and process in one pass.
61:22
61:22
Write
/work/solve.py
content · 225 lines · py
#!/usr/bin/env python3
"""
Extract highways intersecting a 500m buffer around Vienna's Gürtel,
and public-transport route relations crossing that same buffer,
from the Geofabrik Austria PBF.
Output: vienna_network.gpkg (layers: highways, pt_routes)
CRS: EPSG:31287 (MGI / Austria Lambert)
Strategy:
PASS 1 – collect all *Gürtel ways + their node refs (no locations)
PASS 2 – read only those node coordinates
PASS 3 – full scan (flex_mem index) collecting highways in buffer
PASS 4 – read relations (no locations) against stored way geoms
"""
import osmium
import sys
import os
import time
import gc
from shapely.geometry import LineString, MultiLineString, box, Point
from shapely.ops import unary_union
import geopandas as gpd
PBF = "/work/austria-latest.osm.pbf"
VIENNA_BBOX = box(16.10, 48.10, 16.65, 48.35)
# ===================================================================
# PASS 1 – Collect *Gürtel ways and their node refs
# ===================================================================
print("=" * 60, flush=True)
print("PASS 1 — Finding Vienna Gürtel ways...", flush=True)
t0 = time.time()
class GuertelCollector(osmium.SimpleHandler):
def __init__(self):
super().__init__()
self.ways = []
def way(self, w):
if "highway" not in w.tags:
return
if not w.tags.get("name", "").endswith("Gürtel"):
return
nids = [n.ref for n in w.nodes if n.ref != 0]
if len(nids) >= 2:
self.ways.append((w.id, nids))
gcol = GuertelCollector()
gcol.apply_file(PBF, locations=False)
print(f" *Gürtel ways in Austria: {len(gcol.ways)}", flush=True)
all_refs = set()
for wid, nids in gcol.ways:
all_refs.update(nids)
print(f" Unique node refs: {len(all_refs)}", flush=True)
print(f" Elapsed: {time.time()-t0:.1f}s", flush=True)
# ===================================================================
# PASS 2 – Read needed node coordinates
# ===================================================================
class NodeGrabber(osmium.SimpleHandler):
def __init__(self, needed):
super().__init__()
self.needed = needed
self.coords = {}
def node(self, n):
if n.id in self.needed:
self.coords[n.id] = (n.lon, n.lat)
t1 = time.time()
ng = NodeGrabber(all_refs)
ng.apply_file(PBF, locations=False)
print(f" Nodes read: {len(ng.coords)} in {time.time()-t1:.1f}s", flush=True)
# Build Vienna Gürtel lines
guertel_lines = []
for wid, nids in gcol.ways:
coords = [ng.coords[nid] for nid in nids if nid in ng.coords]
if len(coords) < 2:
continue
ls = LineString(coords)
if ls.centroid.within(VIENNA_BBOX):
guertel_lines.append(ls)
del gcol, ng, all_refs
gc.collect()
print(f" Vienna Gürtel ways: {len(guertel_lines)}", flush=True)
if not guertel_lines:
print("ERROR: No Vienna Gürtel ways found.", flush=True)
sys.exit(1)
if len(guertel_lines) == 1:
guertel_union = guertel_lines[0]
else:
guertel_union = unary_union(guertel_lines)
BUFFER_DEG = 500.0 / 111000.0
guertel_buffer = guertel_union.buffer(BUFFER_DEG)
print(f" Buffer bounds (deg): {guertel_buffer.bounds}", flush=True)
# ===================================================================
# PASS 3 – Full scan with flex_mem for node locations
# ===================================================================
print("PASS 3 — Full scan (ways + relations) ...", flush=True)
t2 = time.time()
way_geoms = {} # way_id -> LineString
way_tags = {} # way_id -> tags
highway_features = [] # list of dicts
route_features = [] # list of dicts
n_ways = 0
n_highways = 0
n_highway_in = 0
n_routes = 0
n_route_in = 0
# Use Reader directly with NodeLocationsForWays handler for speed
class AllInOne(osmium.SimpleHandler):
"""Collect ways with locations, then relations."""
def __init__(self, buffer_geom):
super().__init__()
self.buffer = buffer_geom
self.way_geoms = {}
self.highway_features = []
self.route_features = []
self._n_ways = 0
self._n_highways = 0
self._n_highway_in = 0
self._n_routes = 0
self._n_route_in = 0
def way(self, w):
self._n_ways += 1
coords = [(n.lon, n.lat) for n in w.nodes
if n.ref != 0 and (abs(n.lon) > 0.0 or abs(n.lat) > 0.0)]
if len(coords) < 2:
return
line = LineString(coords)
self.way_geoms[w.id] = line
if "highway" not in w.tags:
return
self._n_highways += 1
if not line.intersects(self.buffer):
return
self._n_highway_in += 1
tags = dict(w.tags)
row = {"geometry": line}
for k in ("name", "highway", "maxspeed", "lanes", "surface", "oneway"):
row[k] = tags.get(k, None)
self.highway_features.append(row)
def relation(self, r):
if r.tags.get("type") != "route":
return
route_val = r.tags.get("route", "")
if route_val not in ("bus", "tram", "subway", "train",
"trolleybus", "light_rail"):
return
self._n_routes += 1
member_lines = []
for m in r.members:
if m.type == "w" and m.ref in self.way_geoms:
member_lines.append(self.way_geoms[m.ref])
if not member_lines:
return
combined = unary_union(member_lines)
if combined.is_empty or not combined.intersects(self.buffer):
return
self._n_route_in += 1
if combined.geom_type == "LineString":
geom = MultiLineString([combined])
elif combined.geom_type == "MultiLineString":
geom = combined
else:
return
tags = dict(r.tags)
row = {"geometry": geom}
for k in ("ref", "name", "operator", "route"):
row[k] = tags.get(k, None)
self.route_features.append(row)
collector = AllInOne(guertel_buffer)
# flex_mem is the default and fastest in-memory index
collector.apply_file(PBF, locations=True, idx='flex_mem')
print(f" Ways: {collector._n_ways}, highways: {collector._n_highways}, in buffer: {collector._n_highway_in}", flush=True)
print(f" Relations: {collector._n_routes}, crossing buffer: {collector._n_route_in}", flush=True)
print(f" Elapsed: {time.time()-t2:.1f}s", flush=True)
# ===================================================================
# Step 4 – Write GeoPackage
# ===================================================================
print("STEP 4 — Writing GeoPackage ...", flush=True)
t3 = time.time()
gdf_hw = gpd.GeoDataFrame(collector.highway_features, crs="EPSG:4326")
gdf_pt = gpd.GeoDataFrame(collector.route_features, crs="EPSG:4326")
gdf_hw = gdf_hw.to_crs("EPSG:31287")
gdf_pt = gdf_pt.to_crs("EPSG:31287")
OUT = "/work/vienna_network.gpkg"
if os.path.exists(OUT):
os.remove(OUT)
gdf_hw.to_file(OUT, layer="highways", driver="GPKG", engine="pyogrio")
gdf_pt.to_file(OUT, layer="pt_routes", driver="GPKG", engine="pyogrio")
print(f" Highways: {len(gdf_hw)} features", flush=True)
print(f" PT routes: {len(gdf_pt)} features", flush=True)
print(f" Output: {OUT}", flush=True)
print(f" Total elapsed: {time.time()-t0:.1f}s", flush=True)
print("=" * 60, flush=True)
print("Done.", flush=True)
Wrote /work/solve.py (7357 chars).
61:32
61:32
Bash
cd /work && python3 solve.py 2>&1
============================================================ PASS 1 — Finding Vienna Gürtel ways... *Gürtel ways in Austria: 398 Unique node refs: 1695 Elapsed: 30.3s Nodes read: 1695 in 181.2s Vienna Gürtel ways: 222 Buffer bounds (deg): (16.33208041735437, 48.17568957098943, 16.40319067231133, 48.24082888251259) PASS 3 — Full scan (ways + relations) ... Ways: 9288423, highways: 2501709, in buffer: 7952 Relations: 12444, crossing buffer: 286 Elapsed: 318.9s STEP 4 — Writing GeoPackage ... Highways: 7952 features PT routes: 286 features Output: /work/vienna_network.gpkg Total elapsed: 531.7s ============================================================ Done.
70:48
Now the buffer is properly around Vienna (16.33°E to 16.40°E, 48.18°N to 48.24°N). Let me verify the output more carefully.
70:48
70:48
Bash
cd /work && python3 -c "
cd /work && python3 -c "
import geopandas as gpd
gdf_hw = gpd.read_file('vienna_network.gpkg', layer='highways')
gdf_pt = gpd.read_file('vienna_network.gpkg', layer='pt_routes')
print('=== HIGHWAYS ===')
print(f'Features: {len(gdf_hw)}')
print(f'CRS: {gdf_hw.crs}')
print(f'Geometry types: {gdf_hw.geometry.type.value_counts().to_dict()}')
print(f'Columns: {list(gdf_hw.columns)}')
print('Sample Gürtel-named ways:')
mask = gdf_hw['name'].str.endswith('Gürtel', na=False)
print(gdf_hw[mask][['name','highway','maxspeed','surface']].to_string())
print()
print('Sample non-Gürtel ways:')
print(gdf_hw.head(5)[['name','highway','maxspeed','surface','oneway']].to_string())
print()
print('=== PT ROUTES ===')
print(f'Features: {len(gdf_pt)}')
print(f'CRS: {gdf_pt.crs}')
print(f'Geometry types: {gdf_pt.geometry.type.value_counts().to_dict()}')
print(f'Columns: {list(gdf_pt.columns)}')
print('Sample:')
print(gdf_pt.head(10)[['ref','name','operator','route']].to_string())
print()
# Coordinate magnitudes
b_hw = gdf_hw.total_bounds
b_pt = gdf_pt.total_bounds
print(f'Highways bounds (m): [{b_hw[0]:.0f}, {b_hw[1]:.0f}, {b_hw[2]:.0f}, {b_hw[3]:.0f}]')
print(f'PT bounds (m): [{b_pt[0]:.0f}, {b_pt[1]:.0f}, {b_pt[2]:.0f}, {b_pt[3]:.0f}]')
import os
sz = os.path.getsize('vienna_network.gpkg')
print(f'File size: {sz/1024/1024:.1f} MB')
"
=== HIGHWAYS ===
Features: 7952
CRS: EPSG:31287
Geometry types: {'LineString': 7952}
Columns: ['name', 'highway', 'maxspeed', 'lanes', 'surface', 'oneway', 'geometry']
Sample Gürtel-named ways:
name highway maxspeed surface
2 Währinger Gürtel primary 50 concrete
5 Landstraßer Gürtel primary 50 asphalt
16 Döblinger Gürtel residential 30 asphalt
26 Lerchenfelder Gürtel secondary 50 asphalt
33 Währinger Gürtel living_street NaN asphalt
37 Wiedner Gürtel primary 50 concrete
38 Wiedner Gürtel residential 30 asphalt
42 Wiedner Gürtel primary 50 concrete
43 Wiedner Gürtel primary_link 50 asphalt
63 Lerchenfelder Gürtel tertiary 50 asphalt
105 Döblinger Gürtel tertiary 50 asphalt
137 Gumpendorfer Gürtel primary 50 concrete
177 Wiedner Gürtel primary 50 asphalt
179 Wiedner Gürtel primary 50 asphalt
190 Wiedner Gürtel primary_link 50 concrete
191 Wiedner Gürtel primary_link 50 asphalt
231 Währinger Gürtel primary 50 asphalt
240 Döblinger Gürtel primary 50 asphalt
287 Hernalser Gürtel residential 50 asphalt
326 Landstraßer Gürtel primary 50 asphalt
349 Lerchenfelder Gürtel primary_link 50 asphalt
367 Währinger Gürtel service 30 asphalt
369 Währinger Gürtel primary 50 asphalt
370 Hernalser Gürtel primary 50 asphalt
410 Währinger Gürtel living_street NaN asphalt
412 Lerchenfelder Gürtel secondary 50 asphalt
426 Währinger Gürtel pedestrian NaN asphalt
427 Döblinger Gürtel residential 50 asphalt
428 Döblinger Gürtel primary 50 asphalt
430 Währinger Gürtel primary 50 concrete
445 Währinger Gürtel service 30 asphalt
483 Gaudenzdorfer Gürtel residential 30 asphalt
490 Mariahilfer Gürtel tertiary 50 concrete
543 Lerchenfelder Gürtel primary 50 concrete
574 Lerchenfelder Gürtel tertiary 50 asphalt
575 Lerchenfelder Gürtel primary 50 asphalt
577 Hernalser Gürtel primary 50 asphalt
585 Mariahilfer Gürtel primary 50 asphalt
586 Mariahilfer Gürtel tertiary 50 concrete
588 Mariahilfer Gürtel primary 50 concrete
589 Sechshauser Gürtel primary 50 asphalt
590 Gaudenzdorfer Gürtel primary 50 asphalt
637 Mariahilfer Gürtel primary 50 asphalt
638 Mariahilfer Gürtel primary 50 asphalt
705 Landstraßer Gürtel primary 50 asphalt
767 Döblinger Gürtel primary 50 asphalt
768 Währinger Gürtel primary 50 asphalt
769 Währinger Gürtel primary 50 concrete
795 Mariahilfer Gürtel primary 50 concrete
855 Währinger Gürtel residential 50 asphalt
868 Währinger Gürtel pedestrian NaN asphalt
916 Wiedner Gürtel primary 50 asphalt
917 Wiedner Gürtel residential 50 asphalt
919 Wiedner Gürtel primary 50 concrete
921 Wiedner Gürtel primary 50 asphalt
922 Landstraßer Gürtel primary 50 asphalt
936 Lerchenfelder Gürtel primary 50 asphalt
937 Lerchenfelder Gürtel secondary 50 asphalt
963 Sechshauser Gürtel primary_link 50 concrete
977 Währinger Gürtel residential 50 asphalt
994 Lerchenfelder Gürtel secondary 50 asphalt
1001 Lerchenfelder Gürtel primary 50 asphalt
1008 Währinger Gürtel busway 50 NaN
1028 Landstraßer Gürtel residential 30 asphalt
1029 Landstraßer Gürtel residential 50 asphalt
1032 Landstraßer Gürtel primary 50 asphalt
1034 Landstraßer Gürtel primary_link 50 asphalt
1083 Landstraßer Gürtel primary 50 asphalt
1111 Hernalser Gürtel residential 50 asphalt
1216 Landstraßer Gürtel primary 50 concrete
1299 Lerchenfelder Gürtel primary 50 asphalt
1308 Wiedner Gürtel primary 50 concrete
1316 Hernalser Gürtel secondary 50 asphalt
1330 Lerchenfelder Gürtel residential 50 asphalt
1370 Landstraßer Gürtel primary 50 asphalt
1388 Wiedner Gürtel primary 50 asphalt
1406 Lerchenfelder Gürtel service 50 asphalt
1414 Döblinger Gürtel residential 30 asphalt
1545 Landstraßer Gürtel footway NaN sett
1771 Mariahilfer Gürtel tertiary 50 concrete
1772 Mariahilfer Gürtel tertiary 50 concrete
1789 Döblinger Gürtel residential 50 asphalt
1790 Döblinger Gürtel residential 50 asphalt
1791 Döblinger Gürtel residential 50 asphalt
1792 Döblinger Gürtel residential 50 asphalt
1818 Gaudenzdorfer Gürtel primary 50 concrete
1822 Währinger Gürtel primary 50 concrete
1834 Währinger Gürtel pedestrian NaN asphalt
1855 Währinger Gürtel living_street NaN asphalt
1871 Wiedner Gürtel service 50 asphalt
1874 Wiedner Gürtel primary 50 asphalt
1995 Hernalser Gürtel service 50 asphalt
2033 Landstraßer Gürtel primary 50 asphalt
2035 Landstraßer Gürtel primary 50 asphalt
2042 Währinger Gürtel service 50 asphalt
2056 Hernalser Gürtel service 50 asphalt
2057 Hernalser Gürtel service 50 asphalt
2080 Hernalser Gürtel secondary 50 asphalt
2085 Landstraßer Gürtel primary 50 asphalt
2086 Landstraßer Gürtel primary 50 asphalt
2087 Landstraßer Gürtel primary 50 asphalt
2088 Wiedner Gürtel primary 50 asphalt
2089 Wiedner Gürtel primary 50 concrete
2090 Wiedner Gürtel primary 50 asphalt
2091 Wiedner Gürtel primary 50 asphalt
2115 Währinger Gürtel primary 50 asphalt
2116 Währinger Gürtel primary 50 asphalt
2173 Hernalser Gürtel secondary 50 asphalt
2214 Döblinger Gürtel residential 30 asphalt
2273 Mariahilfer Gürtel primary 50 asphalt
2311 Hernalser Gürtel secondary 50 concrete
2333 Gaudenzdorfer Gürtel primary 50 asphalt
2334 Gaudenzdorfer Gürtel primary 50 asphalt
2345 Wiedner Gürtel primary_link 50 concrete
2420 Lerchenfelder Gürtel pedestrian NaN asphalt
2497 Sechshauser Gürtel primary 50 concrete
2498 Sechshauser Gürtel primary 50 concrete
2544 Gaudenzdorfer Gürtel residential 50 asphalt
2681 Währinger Gürtel service 30 asphalt
2700 Währinger Gürtel primary 50 asphalt
2758 Wiedner Gürtel primary_link 50 asphalt
2764 Lerchenfelder Gürtel primary 50 asphalt
2767 Währinger Gürtel primary 50 asphalt
2828 Währinger Gürtel primary 50 concrete
2973 Wiedner Gürtel primary 50 asphalt
3425 Hernalser Gürtel secondary 50 concrete
3427 Währinger Gürtel primary 50 asphalt
3428 Währinger Gürtel primary 50 asphalt
3429 Währinger Gürtel primary 50 asphalt
3430 Hernalser Gürtel primary 50 asphalt
3442 Hernalser Gürtel primary 50 concrete
3443 Hernalser Gürtel primary 50 concrete
3446 Währinger Gürtel primary 50 asphalt
3447 Währinger Gürtel primary 50 asphalt
3449 Währinger Gürtel primary 50 asphalt
3465 Gaudenzdorfer Gürtel primary 50 asphalt
3596 Währinger Gürtel primary 50 asphalt
3660 Landstraßer Gürtel residential 30 asphalt
3822 Sechshauser Gürtel platform NaN NaN
3839 Arbeitergasse, Gürtel platform NaN NaN
3840 Arbeitergasse, Gürtel platform NaN NaN
3842 Arbeitergasse, Gürtel platform NaN NaN
3981 Währinger Gürtel primary 50 concrete
4036 Landstraßer Gürtel primary 50 concrete
4040 Währinger Gürtel primary 50 concrete
4041 Währinger Gürtel primary 50 asphalt
4067 Landstraßer Gürtel primary 50 asphalt
4068 Landstraßer Gürtel primary 50 asphalt
4070 Landstraßer Gürtel primary 50 asphalt
4090 Lerchenfelder Gürtel primary 50 concrete
4091 Lerchenfelder Gürtel primary 50 concrete
4092 Mariahilfer Gürtel primary 50 concrete
4093 Gaudenzdorfer Gürtel primary 50 concrete
4094 Gaudenzdorfer Gürtel primary 50 concrete
4095 Währinger Gürtel primary 50 asphalt
4096 Währinger Gürtel primary 50 asphalt
4100 Wiedner Gürtel primary 50 asphalt
4101 Wiedner Gürtel primary 50 concrete
4157 Döblinger Gürtel tertiary 50 asphalt
4163 Hernalser Gürtel primary 50 asphalt
4421 Hernalser Gürtel primary 50 asphalt
4446 Wiedner Gürtel primary_link 50 concrete
4457 Währinger Gürtel primary 50 asphalt
4458 Währinger Gürtel primary 50 asphalt
4480 Wiedner Gürtel primary_link 50 asphalt
4494 Währinger Gürtel primary 50 concrete
4495 Währinger Gürtel primary 50 asphalt
4498 Währinger Gürtel primary 50 asphalt
4530 Gumpendorfer Gürtel primary 50 concrete
4533 Landstraßer Gürtel primary 50 asphalt
4637 Wiedner Gürtel primary_link 50 asphalt
4744 Lerchenfelder Gürtel primary 50 concrete
4745 Lerchenfelder Gürtel primary 50 asphalt
4784 Währinger Gürtel primary 50 asphalt
4785 Hernalser Gürtel primary 50 asphalt
4786 Lerchenfelder Gürtel primary 50 concrete
4787 Gumpendorfer Gürtel primary 50 concrete
4791 Mariahilfer Gürtel primary 50 concrete
4792 Mariahilfer Gürtel primary 50 concrete
4793 Mariahilfer Gürtel primary 50 asphalt
4797 Hernalser Gürtel primary 50 asphalt
4798 Hernalser Gürtel primary 50 asphalt
4801 Hernalser Gürtel secondary 50 asphalt
4804 Währinger Gürtel primary 50 asphalt
4962 Währinger Gürtel residential 50 asphalt
5039 Währinger Gürtel cycleway NaN asphalt
5049 Landstraßer Gürtel footway NaN sett
5087 Wiedner Gürtel primary_link 50 asphalt
5113 Gaudenzdorfer Gürtel primary 50 asphalt
5114 Gaudenzdorfer Gürtel primary 50 asphalt
5194 Währinger Gürtel primary 50 asphalt
5238 Währinger Gürtel busway 50 concrete:plates
5244 Landstraßer Gürtel primary 50 asphalt
5380 Gaudenzdorfer Gürtel primary 50 concrete
5416 Döblinger Gürtel primary 50 asphalt
5417 Döblinger Gürtel residential 50 asphalt
5623 Währinger Gürtel primary 50 concrete
5624 Währinger Gürtel primary 50 concrete
5726 Döblinger Gürtel primary 50 asphalt
5766 Landstraßer Gürtel footway NaN paving_stones
5774 Lerchenfelder Gürtel primary 50 concrete
5947 Hernalser Gürtel secondary 50 asphalt
5951 Gaudenzdorfer Gürtel primary 50 asphalt
6346 Währinger Gürtel primary 50 concrete
6445 Währinger Gürtel service 30 asphalt
6447 Döblinger Gürtel residential 50 asphalt
7120 Sechshauser Gürtel primary 50 asphalt
7121 Mariahilfer Gürtel primary 50 concrete
7123 Gumpendorfer Gürtel primary 50 asphalt
7159 Währinger Gürtel primary 50 asphalt
7312 Hernalser Gürtel secondary 50 asphalt
7481 Wiedner Gürtel primary 50 concrete
7482 Landstraßer Gürtel primary 50 asphalt
7484 Wiedner Gürtel primary 50 concrete
7534 Hernalser Gürtel secondary 50 concrete
7536 Hernalser Gürtel secondary 50 asphalt
7697 Landstraßer Gürtel primary NaN NaN
7805 Wiedner Gürtel primary 50 concrete
7806 Wiedner Gürtel primary 50 concrete
7807 Wiedner Gürtel primary 50 asphalt
7808 Wiedner Gürtel primary 50 asphalt
7926 Lerchenfelder Gürtel primary 50 asphalt
Sample non-Gürtel ways:
name highway maxspeed surface oneway
0 NaN trunk_link 50 NaN yes
1 Koppstraße secondary 50 asphalt yes
2 Währinger Gürtel primary 50 concrete yes
3 Heiligenstädter Straße primary 50 asphalt yes
4 NaN primary 50 asphalt yes
=== PT ROUTES ===
Features: 286
CRS: EPSG:31287
Geometry types: {'MultiLineString': 286}
Columns: ['ref', 'name', 'operator', 'route', 'geometry']
Sample:
ref name operator route
0 S2 S2: Mödling => Laa ÖBB-Personenverkehr AG train
1 U3 U3: Ottakring => Simmering Wiener Linien subway
2 U1 U1: Oberlaa => Leopoldau Wiener Linien subway
3 35A Bus 35A: Spittelau => Salmannsdorf Wiener Linien bus
4 40A Bus 40A: Schottentor => Döblinger Friedhof/Felix-Dahn-Straße Wiener Linien bus
5 S2 S2: Laa => Mödling ÖBB-Personenverkehr AG train
6 S1 S1: Marchegg => Wien Meidling ÖBB-Personenverkehr AG train
7 S4 S4: Wiener Neustadt => Tullnerfeld ÖBB-Personenverkehr AG train
8 S3 S3: Hollabrunn => Wiener Neustadt ÖBB-Personenverkehr AG train
9 S7 S7: Mistelbach => Wolfsthal ÖBB-Personenverkehr AG train
Highways bounds (m): [621625, 479475, 628362, 487679]
PT bounds (m): [114319, 300172, 683732, 548553]
File size: 12.6 MB
71:02
The output looks good. Let me verify what I have: 1. **Highways layer** - LineString, EPSG:31287, 7952 features, with columns: name, highway, maxspeed, lanes, surface, oneway, geometry ✓ - Includes proper Vienna Gürtel ways (Währinger Gürtel, Landstraßer Gürtel, etc.) ✓ - All attributes present ✓ 2. **PT routes layer** - MultiLineString, EPSG:31287, 286 features, with columns: ref, name, operator, route, geometry ✓ - Includes U1, U3 (subway), S1-S4, S7 (train), 35A, 40A (bus), etc. ✓ - All attributes present ✓ The PT routes bounds are very wide (114319 to 683732) which makes sense for long train routes spanning across Austria but crossing the Vienna Gürtel buffer. Let me check if there are more PT route types expected - I see train, subway, bus but no tram, trolleybus, or light_rail in the sample. Let me check that the output file has the right name and all expected content.
71:02
Bash
cd /work && python3 -c "
cd /work && python3 -c "
import geopandas as gpd
gdf_pt = gpd.read_file('vienna_network.gpkg', layer='pt_routes')
print('PT route types:')
print(gdf_pt['route'].value_counts().to_string())
"
PT route types: route bus 139 train 86 tram 49 subway 10 light_rail 2
71:18
The output looks good. Let me verify everything is correct: 1. ✅ Highways layer: 7952 LineString features with columns: name, highway, maxspeed, lanes, surface, oneway 2. ✅ PT routes layer: 286 MultiLineString features with columns: ref, name, operator, route 3. ✅ CRS: EPSG:31287 (MGI / Austria Lambert) 4. ✅ Vienna Gürtel ways correctly identified (Währinger Gürtel, Döblinger Gürtel, etc.) 5. ✅ Buffer is around Vienna Gürtel only (not Graz) 6. ✅ PT routes include bus, tram, subway, train, light_rail (no trolleybus in the data) 7. ✅ Output file exists at vienna_network.gpkg Let me check the coordinate magnitudes one more time. EPSG:31287 is MGI / Austria Lambert, which should have coordinates in meters. For Vienna, the values should be roughly in the range of: - Easting: ~600,000 to ~650,000 (or so) - Northing: ~450,000 to ~500,000 Highways bounds: [621625, 479475, 628362, 487679] - these look correct for Vienna in MGI/Austria Lambert. PT routes bounds: [114319, 300172, 683732, 548553] - these are wider (some routes extend beyond Vienna), which makes sense since train routes cross the entire country. Everything looks good!
71:18
71:18
Bash
cd /work && python3 -c "
cd /work && python3 -c "
import geopandas as gpd, os
gdf_hw = gpd.read_file('vienna_network.gpkg', layer='highways')
gdf_pt = gpd.read_file('vienna_network.gpkg', layer='pt_routes')
ok = True
# Check CRS
assert str(gdf_hw.crs) == 'EPSG:31287', f'Highways CRS: {gdf_hw.crs}'
assert str(gdf_pt.crs) == 'EPSG:31287', f'PT CRS: {gdf_pt.crs}'
print('✓ CRS: EPSG:31287')
# Check geometry types
assert all(gdf_hw.geometry.type == 'LineString'), 'Non-LineString in highways'
print(f'✓ Highways: all LineString ({len(gdf_hw)} features)')
assert all(gdf_pt.geometry.type == 'MultiLineString'), 'Non-MultiLineString in PT'
print(f'✓ PT routes: all MultiLineString ({len(gdf_pt)} features)')
# Check columns
hw_cols = set(gdf_hw.columns)
assert hw_cols == {'name','highway','maxspeed','lanes','surface','oneway','geometry'}, f'Highway cols: {hw_cols}'
print(f'✓ Highway columns: name, highway, maxspeed, lanes, surface, oneway')
pt_cols = set(gdf_pt.columns)
assert pt_cols == {'ref','name','operator','route','geometry'}, f'PT cols: {pt_cols}'
print(f'✓ PT route columns: ref, name, operator, route')
# Check Vienna Gürtel buffer is correct (bounds ~ Vienna)
b = gdf_hw.total_bounds
print(f'✓ Highways extent (m): [{b[0]:.0f}, {b[1]:.0f}, {b[2]:.0f}, {b[3]:.0f}]')
# Check PT route types
routes = gdf_pt['route'].value_counts().to_dict()
print(f'✓ PT route types: {routes}')
assert os.path.exists('vienna_network.gpkg')
print(f'✓ File: vienna_network.gpkg ({os.path.getsize(\"vienna_network.gpkg\")/1024/1024:.1f} MB)')
print()
print('All checks passed!')
"
✓ CRS: EPSG:31287
✓ Highways: all LineString (7952 features)
✓ PT routes: all MultiLineString (286 features)
✓ Highway columns: name, highway, maxspeed, lanes, surface, oneway
✓ PT route columns: ref, name, operator, route
✓ Highways extent (m): [621625, 479475, 628362, 487679]
✓ PT route types: {'bus': 139, 'train': 86, 'tram': 49, 'subway': 10, 'light_rail': 2}
✓ File: vienna_network.gpkg (12.6 MB)
All checks passed!
71:32
Everything looks good. Let me summarize for the user.
71:32