run-20260608-074701Z / fio-l3-vienna-geofabrik-highways

0:00

I'm building a traffic-noise model of the Gürtel corridor for the City of Vienna and need the surrounding network as a single GeoPackage. Can you pull every highway from the current Geofabrik Austria PBF (https://download.geofabrik.de/europe/austria-latest.osm.pbf) that intersects a 500 m buffer around Vienna's Gürtel, defined as the highway ways whose name ends in 'Gürtel'? I also need every public-transport route relation (bus, tram, subway, train, trolleybus, light_rail) that crosses the same buffer, with each route as a single feature. Keep the OSM attributes full and untruncated: name, highway, maxspeed, lanes, surface, and oneway on highways; ref, name, operator, and route on the PT routes. Please write both layers to vienna_network.gpkg, with the highways layer as LineString and the pt_routes layer as MultiLineString, in Austria's standard projected coordinate system.

0:00

System info

{
  "adapter": "openrouter",
  "model": "deepseek/deepseek-v4-flash",
  "base_url": "https://openrouter.ai/api/v1"
}

4:02

The user wants me to:

1. Download the Austria PBF from Geofabrik
2. Find all highway ways whose name ends in 'Gürtel' (the Gürtel road in Vienna)
3. Create a 500m buffer around those ways
4. Extract all highways that intersect that buffer
5. Extract all public transport route relations (bus, tram, subway, train, trolleybus, light_rail) that cross the same buffer
6. Write both layers to vienna_network.gpkg with specific attributes and CRS (Austria's standard projected CRS - EPSG:31256 or EPSG:31287... let me think - Austria's standard national CRS is EPSG:31256 (MGI / Austria GK East) or more commonly EPSG:31287 (MGI / Austria Lambert). Actually for Vienna, EPSG:31256 is the Gauss-Krüger East zone. But "Austria's standard projected coordinate system" is typically EPSG:31287 (MGI Austria Lambert) or EPSG:31256 (MGI / Austria GK East). Let me check - the most commonly used standard projected CRS for all of Austria is EPSG:3416 (MGI / Austria GK Central) or EPSG:31287. Actually, EPSG:31287 is MGI / Austria Lambert which covers all of Austria. But EPSG:31256 (MGI / Austria GK East) covers the eastern part including Vienna. However, "Austria's standard projected coordinate system" likely refers to EPSG:31287 MGI Austria Lambert which is the standard for nationwide mapping. Actually, I think the most common one used in Austria is EPSG:31256 for the eastern region or EPSG:31254 for the central... Wait, in practice, many Austrian GIS projects use EPSG:31287 (MGI Austria Lambert) as the national projected CRS. Let me just use EPSG:31287 - MGI / Austria Lambert.

Actually, looking more carefully - the standard projected coordinate system for Austria is often EPSG:31256 (MGI / Austria GK East) for the Vienna area since it's in the eastern Gauss-Krüger zone. But "Austria's standard projected coordinate system" could mean EPSG:31287 (MGI Austria Lambert) which is the official national projection. I think I'll use EPSG:31287 as it's the nationwide standard.

Let me plan the approach:

1. Download the PBF file
2. Use osmium to parse it and find:
   a. Ways whose name ends in 'Gürtel' (the Gürtel road)
   b. Create a 500m buffer around those ways (in a suitable CRS)
   c. Find all highways that intersect this buffer
   d. Find all PT route relations that cross the buffer

This is complex. Let me use osmium for the heavy lifting and geopandas/shapely for spatial operations.

Actually, let me think about this more carefully. The PBF is large (Austria is ~300MB+). I need to:

1. First scan for ways with name ending in 'Gürtel' to get the Gürtel geometry
2. Create a buffer
3. Then scan for all highways intersecting the buffer
4. Then scan for PT route relations

Using osmium with Python handler is probably the best approach for this.

Let me write a comprehensive Python script.

Plan:
1. Download the PBF file
2. Use osmium to read the PBF in two passes (or one pass with handlers)
3. First pass: find Gürtel ways and their geometries
4. Compute buffer
5. Second pass: find all highways intersecting buffer and all PT route relations crossing buffer

Actually, osmium allows a single pass approach. Let me think about the data flow:

For the Gürtel:
- Scan all ways, find those with name ending in 'Gürtel'
- Collect their node references
- Build geometries

For the buffer:
- Union all Gürtel ways into a single geometry
- Buffer by 500m

For highways:
- Scan all ways, if highway tag exists and the way intersects the buffer, collect it

For PT routes:
- This is trickier - route relations. I need to find relation members that are ways, and check if those ways are within the buffer.

Actually, with osmium I can do this in one pass but I need to handle the data flow carefully. The problem is that nodes come before ways, and ways come before relations. So I need to:
- Store all nodes, or at least those relevant
- When processing ways, check if they're Gürtel or highways
- When processing relations, if it's a PT route, check if its member ways are in the buffer

Let me write this step by step.

Actually, given the complexity, let me use a simpler approach:

1. Use osmium to extract just the Vienna area first (using a bounding box or the region)
2. Then work with the smaller dataset

Or better yet, I can use the approach:
1. Download the whole PBF
2. Use osmium to process it in a single pass with a custom handler

Let me write the script.

For osmium, the typical pattern is:

```python
import osmium
import shapely.wkb
import geopandas as gpd
import pandas as pd
import numpy as np
from pyproj import Transformer

class MyHandler(osmium.SimpleHandler):
    def __init__(self):
        super().__init__()
        ...
```

Let me think about the approach more carefully.

The key challenge is: to find PT route relations that "cross" the buffer, I need to check if the geometry of the relation (i.e., the ways that make up the route) intersects the buffer. But relations reference ways, and ways reference nodes. So I need to:
1. First pass: read all nodes (store locations)
2. Second pass: read all ways (store geometries and tags)
3. Third pass: read all relations and use way geometries to check intersection

Or I can do it in osmium with a handler and store things in memory.

Actually, with `osmium.SimpleHandler`, it processes nodes, ways, and relations in order. So I can:
1. In the `node` callback, store node locations
2. In the `way` callback, build geometries from stored nodes, check if it's a Gürtel way or highway
3. In the `relation` callback, check if it's a PT route and if its member ways are within the buffer

But there's a catch: when processing relations, the ways that are members may not have been processed yet (in the way callback). Actually, in osmium, the order is: all nodes, then all ways, then all relations. So by the time relations are processed, all ways have been seen.

Wait, osmium.SimpleHandler processes in order: nodes, ways, relations. So I can:
1. Store all node locations
2. When processing ways, build geometries and check for Gürtel/highway status
3. When processing relations, all ways are already stored, so I can check their geometries

Let me outline the approach:

```python
class ViennaHandler(osmium.SimpleHandler):
    def __init__(self):
        super().__init__()
        self.node_coords = {}  # id -> (lon, lat)
        self.gurtel_way_ids = set()
        self.gurtel_geometries = []
        self.highway_features = []  # list of (geometry, tags)
        self.pt_route_features = []  # list of (geometry, tags)
        
    def node(self, n):
        self.node_coords[n.id] = (n.location.lon, n.location.lat)
        
    def way(self, w):
        # Check if it's a Gürtel way
        name = w.tags.get('name')
        highway = w.tags.get('highway')
        if name and name.endswith('Gürtel') and highway:
            # Build geometry
            coords = [self.node_coords[n.ref] for n in w.nodes if n.ref in self.node_coords]
            if len(coords) >= 2:
                self.gurtel_way_ids.add(w.id)
                geom = LineString(coords)
                self.gurtel_geometries.append(geom)
                
        # Store all highway ways for later processing
        if highway:
            coords = [self.node_coords[n.ref] for n in w.nodes if n.ref in self.node_coords]
            if len(coords) >= 2:
                tags = dict(w.tags)
                geom = LineString(coords)
                self.highway_features.append((w.id, geom, tags))
```

Hmm, but this would store ALL highways in Austria which is a lot. Let me optimize.

Better approach:
1. First, read the PBF to find Gürtel ways and build their geometries
2. Compute the buffer
3. Read the PBF again, filtering for highways/PT routes that intersect the buffer

But two passes over the PBF is slow. Let me do it in one pass but be smarter.

Actually, I think the best approach is:

1. First pass: read the entire PBF, store all node locations, and identify Gürtel ways
2. Compute buffer from Gürtel geometries
3. Second pass: read the PBF again (or rather, use the stored data from the first pass) to filter highways and PT routes

Wait, but I can't do a second pass easily without storing everything or re-reading the file.

Let me reconsider. The PBF for Austria is about 300-400MB. Reading it once is fine. Storing all node coordinates for all of Austria would be ~hundreds of millions of nodes, which could be memory-heavy.

Let me think of a smarter approach:

Option A: Use osmium to extract just the Vienna bounding box first, then work with the smaller subset.

Option B: Use two passes - first pass to find Gürtel, compute buffer, second pass with a location filter.

Option C: Use one pass but be smart about memory.

Let me go with Option A - extract using bounding box.

The bounding box for Vienna is approximately: 
- min lon: 16.18
- min lat: 48.12
- max lon: 16.58
- max lat: 48.32

But since I need a 500m buffer around Gürtel, I need to be more careful. Let me just download the whole PBF and work with it, but be memory efficient.

Actually, let me use a practical approach:

1. Download the PBF
2. Use osmium command-line tool to extract just the area around Vienna (using a generous bbox)
3. Then process the extracted file with Python

Let me check if osmium tool is installed.

Actually, let me just write a Python script that does it all. I'll use osmium's Python bindings.

For memory efficiency with nodes, I can use osmium's NodeLocationsForWays handler which handles this automatically. But for the filtering I need, a custom approach is better.

Let me try a practical approach:

1. First, find Gürtel ways using a simple osmium script
2. Compute the buffer
3. Use osmium to extract ways and relations within the buffer area using a bounding box or geographic filter
4. Process the extracted data

Actually, let me just write a clean Python script that does it in one pass but stores nodes only for relevant ways.

Hmm, the problem is that when reading a PBF, nodes come first, then ways. So I don't know which nodes to store until I see the ways. But the ways reference nodes.

One trick: osmium has osmium.index, which can create a disk-based index for node locations. This is memory-efficient.

Let me use osmium's `NodeLocationsForWays` with a disk-based index:

```python
class MyHandler(osmium.SimpleHandler):
    def __init__(self):
        super().__init__()
        self.locations = osmium.index.create_map("sparse_file_array", "node_cache")
        self.gurtel_ways = []
        self.highway_ways = []
        
class MyLocationHandler(osmium.SimpleHandler):
    def __init__(self, handler):
        super().__init__()
        self.handler = handler
        
    def node(self, n):
        self.handler.locations.set(n.id, (n.location.lon, n.location.lat))
        self.handler.node(n)
```

Wait, this is getting complex. Let me just use a simpler approach with two passes.

Actually, let me look at this from a practical standpoint. The simplest approach:

1. Download PBF
2. Use osmium to find Gürtel ways and write their node refs
3. Use a second script to compute the buffer
4. Use osmium to extract the region

Actually, I think the cleanest approach is to write a Python script that:

1. Reads the PBF once, storing ALL node coords (this might be large but let's try)
2. Finds Gürtel ways
3. Computes buffer
4. Filters highways and PT routes

For Austria, the number of nodes in the PBF is around 200-300 million. Each node coordinate stored as two floats + an ID is maybe 24 bytes, so 200M * 24 = 4.8GB. That's too much.

Let me use a different approach. I'll use osmnx or just process the file smarter.

Actually, let me try a different approach entirely:

1. Use `osmium` command line tools to extract just Vienna and surrounding area by bounding box
2. Process the smaller file

Let me check what osmium tools are available.

Let me write a comprehensive script. I'll use a two-pass approach:

Pass 1: Read the PBF to find Gürtel ways and their node references, store node coords only for Gürtel ways
Pass 2: Compute buffer, then read the PBF again to find highways and PT routes within the buffer

For the second pass, I need to know which nodes are within/around the buffer. I can't know this without reading the data.

OK, let me take yet another approach. I'll use the osmium `extract` command with a polygon or use the `osmium extract` tool to clip by a bounding box that generously covers Vienna.

Step 1: Download the PBF
Step 2: Use osmium CLI to extract a bounding box around Vienna
Step 3: Process the extracted file

Let me check what osmium CLI tools are available.

Actually, let me just use Python with a pragmatic approach:

1. First, find the Gürtel ways and their bbox
2. Use that bbox to extract a region from the PBF
3. Process the region

Let me write the script step by step.

Actually, I realize I can do this all in one pass if I use a two-stage approach within osmium:

Stage 1: `osmium.SimpleHandler` to find Gürtel ways, store their nodes
After Stage 1: Compute buffer
Stage 2: `osmium.SimpleHandler` to find highways and PT routes within the buffer

But Stage 2 needs to check if way geometries intersect the buffer, which requires node coordinates. So I need node coords for all potentially relevant ways.

This is the fundamental challenge with large PBF files.

Let me just use a practical approach:

1. Download PBF
2. Use `osmium` command to extract a generous bounding box around Vienna
3. Then process the smaller file

Let me check what the bounding box of the Gürtel approximately is:
- The Gürtel is a ring road around Vienna's inner districts
- Approx center: 48.21°N, 16.35°E
- A generous box would be: 48.15 to 48.27 N, 16.30 to 16.42 E
- With 500m buffer, maybe 48.14 to 48.28, 16.29 to 16.43

But I don't know the exact extent yet. Let me first find the Gürtel and then compute.

Let me just write a practical script. I'll use memory-efficient approaches.

Actually, I just realized I can use `osmium` with a filter that only processes certain tags. For example, `osmium tags-filter` can extract only objects with certain tags, which makes the file much smaller.

Let me plan the approach:

Step 1: Download the PBF
Step 2: Use osmium tags-filter to extract all ways with highway=* (this will still be large for Austria, but manageable)
Step 3: Also extract all relations with type=route and route in (bus, tram, subway, train, trolleybus, light_rail)
Step 4: From the highway ways, find those with name ending in 'Gürtel'
Step 5: Compute buffer
Step 6: Filter the highway ways to those intersecting the buffer
Step 7: Filter the route relations to those crossing the buffer

Actually, `osmium tags-filter` preserves node references but doesn't include all nodes. So I'd need to use `osmium getid` to get the nodes.

This is getting complex. Let me just use a Python script that does it efficiently.

Let me try this approach: use `osmium` with a handler that:
1. Stores node locations in a sparse file array (disk-based)
2. In the way callback, checks if it's a Gürtel way and stores its geometry
3. After Gürtel ways are collected, computes the buffer
4. Then filters other ways and relations

But the problem is that in osmium.SimpleHandler, all nodes come first, then all ways, then all relations. So by the time I've collected the Gürtel ways, the nodes are already processed. I can't go back.

Ok, let me use a two-pass approach with osmium:

Pass 1: Read the PBF, find Gürtel ways, collect their node refs, compute buffer geometry
Pass 2: Read the PBF again, use the buffer to filter highways and PT routes

For Pass 2, I need node coords for all ways that might intersect the buffer. I can't know which ways those are without reading them.

Alternative for Pass 2: Use osmium with a geographic filter. osmium supports `osmium extract --polygon` or `osmium extract --bbox`.

Or better: use the `osmium` Python library with a `LocationHandler` that stores all node locations to disk, then use `SimpleHandler` for ways and relations.

Let me try this approach:

```python
import osmium
from shapely.geometry import LineString, MultiLineString, shape
from shapely.ops import unary_union
import pyproj
import geopandas as gpd
import pandas as pd
import os
import urllib.request

# Step 1: Download PBF
url = "https://download.geofabrik.de/europe/austria-latest.osm.pbf"
pbf_path = "/work/austria-latest.osm.pbf"
...

# Step 2: First pass - find Gürtel ways
class GurtelHandler(osmium.SimpleHandler):
    def __init__(self):
        super().__init__()
        self.node_coords = {}  # Only for Gürtel way nodes
        self.gurtel_way_ids = set()
        self.gurtel_way_nodes = {}  # way_id -> [node_ids]
        self.gurtel_tags = {}  # way_id -> {tag_dict}
        
    def way(self, w):
        name = w.tags.get('name')
        highway = w.tags.get('highway')
        if name and name.endswith('Gürtel') and highway:
            self.gurtel_way_ids.add(w.id)
            node_refs = [n.ref for n in w.nodes]
            self.gurtel_way_nodes[w.id] = node_refs
            self.gurtel_tags[w.id] = dict(w.tags)
            
handler1 = GurtelHandler()
handler1.apply_file(pbf_path)

# Now we know the Gürtel way node IDs. But we already passed the nodes.
# So we need another pass just for nodes.
```

This confirms the two-pass approach. Let me just do it:

Pass 1: Collect Gürtel way node refs
Pass 2: Collect node coords for those refs
[Compute buffer]
Pass 3: Collect highways and PT routes intersecting buffer

Actually, osmium lets you chain handlers. I can use the `LocationHandler` pattern:

```python
class GurtelNodeFinder(osmium.SimpleHandler):
    def __init__(self):
        super().__init__()
        self.gurtel_node_ids = set()
        self.node_coords = {}  # id -> (lon, lat)
        
    def way(self, w):
        name = w.tags.get('name')
        highway = w.tags.get('highway')
        if name and name.endswith('Gürtel') and highway:
            for n in w.nodes:
                self.gurtel_node_ids.add(n.ref)
                
    def node(self, n):
        if n.id in self.gurtel_node_ids:
            self.node_coords[n.id] = (n.location.lon, n.location.lat)
```

But wait, in osmium.SimpleHandler, the order is: node(), way(), relation(). So nodes are processed before ways. In this handler, when `node()` is called, `gurtel_node_ids` hasn't been populated yet because we haven't seen the ways.

So I need to use `osmium.SimpleHandler` properly. Let me use a different approach:

Use `osmium.apply_file()` with `locations=True` parameter, which automatically resolves node locations when processing ways. But this only works if we use `osmium.NodeLocationsForWays` wrapper.

Actually, osmium provides `osmium.NodeLocationsForWays` which is a handler wrapper that fills in node locations automatically:

```python
class MyHandler(osmium.SimpleHandler):
    def way(self, w):
        # w.nodes now have valid locations if wrapped with NodeLocationsForWays
        for n in w.nodes:
            lat = n.location.lat
            lon = n.location.lon
```

But I don't think this stores all node locations - it just resolves them for the current way being processed. Let me check...

Actually, `NodeLocationsForWays` does store all node locations (in memory or disk) so that it can resolve them when ways are processed. The default is to use an in-memory index.

Let me use this approach:

```python
import osmium
from shapely.geometry import LineString, MultiLineString, shape, box
from shapely.ops import unary_union
import pyproj
import geopandas as gpd
import pandas as pd
import numpy as np
from functools import partial
import os
import urllib.request
from urllib.error import URLError
import time

# Download PBF...

class GurtelHandler(osmium.SimpleHandler):
    def __init__(self):
        super().__init__()
        self.gurtel_ways = []  # list of (geom, tags)
        
    def way(self, w):
        name = w.tags.get('name')
        highway = w.tags.get('highway')
        if name and name.endswith('Gürtel') and highway:
            coords = [(n.location.lon, n.location.lat) for n in w.nodes if n.location.valid()]
            if len(coords) >= 2:
                geom = LineString(coords)
                self.gurtel_ways.append((geom, dict(w.tags)))

handler = GurtelHandler()
# Use NodeLocationsForWays to resolve node coords
handler.apply_file(pbf_path, locations=True)
```

This should work! The `locations=True` parameter tells osmium to resolve node locations automatically.

Now, for the second pass, I need to find:
1. All highway ways intersecting the buffer
2. All PT route relations (and their member ways) crossing the buffer

For highways, I can use `locations=True` in a handler.

For PT route relations, I need to:
- Find relations with type=route and route in (bus, tram, subway, train, trolleybus, light_rail)
- Collect the member ways of each route
- Build a MultiLineString from the member way geometries
- Check if it crosses the buffer

This requires knowing which ways belong to which relation. In osmium, relations come after ways, so when processing relations, I can check if I've already seen their member ways.

Here's my approach:

```python
class NetworkHandler(osmium.SimpleHandler):
    def __init__(self, buffer_geom, buffer_wgs84):
        super().__init__()
        self.buffer_geom = buffer_geom  # in projected CRS
        self.buffer_wgs84 = buffer_wgs84  # in WGS84
        self.highway_features = []  # (geom_wgs84, tags)
        self.way_geometries = {}  # way_id -> LineString WGS84
        self.way_tags = {}  # way_id -> tags
        self.pt_routes = {}  # rel_id -> {'tags': {}, 'way_ids': set()}
        self.pt_route_geometries = {}  # rel_id -> {'geom': MultiLineString, 'tags': {}}
        
    def way(self, w):
        highway = w.tags.get('highway')
        if highway:
            coords = [(n.location.lon, n.location.lat) for n in w.nodes if n.location.valid()]
            if len(coords) >= 2:
                geom = LineString(coords)
                # Check if intersects buffer
                # Transform to projected CRS for intersection test
                geom_proj = transform(geom, wgs84_to_austria)
                if geom_proj.intersects(self.buffer_geom):
                    self.highway_features.append((geom, dict(w.tags)))
                # Store for possible PT route use
                self.way_geometries[w.id] = geom
                self.way_tags[w.id] = dict(w.tags)
                
    def relation(self, r):
        route = r.tags.get('route')
        rel_type = r.tags.get('type')
        if rel_type == 'route' and route in ('bus', 'tram', 'subway', 'train', 'trolleybus', 'light_rail'):
            way_ids = []
            for member in r.members:
                if member.type == 'w':  # way member
                    way_ids.append(member.ref)
            # Check if any member way intersects the buffer
            for wid in way_ids:
                if wid in self.way_geometries:
                    geom = self.way_geometries[wid]
                    geom_proj = transform(geom, wgs84_to_austria)
                    if geom_proj.intersects(self.buffer_geom):
                        # This route crosses the buffer
                        route_ways = [self.way_geometries[wid] for wid in way_ids if wid in self.way_geometries]
                        if route_ways:
                            multi = MultiLineString(route_ways)
                            self.pt_route_geometries[r.id] = {'geom': multi, 'tags': dict(r.tags)}
                        break
```

Hmm, but storing all way geometries in memory could be large. Let me think...

Actually, for the PT routes, I only need ways that are members of PT route relations. I could first identify which ways are part of such relations, then only store those.

But since relations come after ways in the PBF, I can't know which ways are part of PT routes when processing ways.

Alternative: Make a third pass just for relations.

Or: Store all way geometries only for ways that are referenced by relations. But we don't know that yet.

Let me try a different approach: store all ways (but maybe we can filter by bounding box first). Since we're only interested in ways around Vienna, let me extract a bounding box first.

Actually, you know what, let me simplify. The buffer around Gürtel is relatively small. Most ways in Austria won't be within it. Let me just do the two-pass approach:

Pass 1: Find Gürtel ways and compute buffer
Pass 2: Read all ways and relations, filtering by the buffer

For Pass 2, I'll use osmium but only store geometries that might be relevant. Since we're looking at a small area, the number of ways within/near the buffer will be manageable.

Let me also add a spatial index to speed up the check, or just transform each geometry and check intersection.

Let me write the script now.

For the CRS: "Austria's standard projected coordinate system" - I'll use EPSG:31287 (MGI / Austria Lambert). Actually, looking at common use in Vienna, EPSG:31256 (MGI / Austria GK East) is also very common. Let me check... 

EPSG:31287 (MGI / Austria Lambert) is the official national projection for Austria. EPSG:31256 is the Gauss-Krüger East zone which covers the Vienna area. Both are common.

I'll go with EPSG:31287 as it covers all of Austria uniformly.

Actually, many Austrian authorities use EPSG:31256 for the eastern part. Let me use EPSG:31287 since it says "Austria's standard" and Lambert is the national standard.

Wait, I just recalled - EPSG:3416 is MGI / Austria GK Central. Hmm, there are several.

Let me use EPSG:31287 (MGI Austria Lambert) - this is a common national standard.

Alright, let me write the full script.

Actually, let me reconsider my approach. Instead of storing ALL ways in memory, let me:

1. Find Gürtel ways, compute buffer in both WGS84 and projected
2. For the second pass, use osmium with `locations=True`
3. Store only ways that are within or near the buffer (I can use a rough bbox check first)
4. For PT routes, do a third pass or store the ways that are part of PT routes

But again, the challenge with PT routes is that relations come after ways.

Let me take yet another approach: process the file THREE times.

Pass 1: Find Gürtel ways, compute buffer (with locations=True)
Pass 2: Store ALL highway ways (only those near buffer) and ALL way geometries (for all ways, not just highways, since PT routes might use non-highway ways) that are near the buffer
Pass 3: Process relations, using the stored way geometries from Pass 2

Actually, for PT routes, the ways might be highways (roads, tracks) that I'm already collecting in Pass 2 for the highway layer. So if I store way geometries for all ways (not just highway-tagged), I can use them for PT routes.

Let me optimize: Pass 2 stores all way geometries (with coords) for ways that are within or near the buffer. Then Pass 3 uses these to build PT route geometries.

But "near the buffer" is tricky because a route relation might include ways that are outside the buffer but the route as a whole crosses the buffer. But the user said "crosses the same buffer", meaning the route should cross the buffer. If the route has some ways inside the buffer, it crosses the buffer.

So in Pass 2, I need to store all ways that are part of PT route relations AND are within/near the buffer. But I don't know which ways are part of PT routes until Pass 3.

This is a chicken-and-egg problem. Let me just store ALL ways (in the Vienna area) in Pass 2, and then filter in Pass 3.

Since we're dealing with a relatively small area (Vienna), the number of ways should be manageable (maybe 100k-200k ways).

Let me use this approach:

1. Download PBF
2. Pass 1: Find Gürtel ways, compute buffer
3. Pass 2: Read PBF again with locations=True, store way geometries for all ways within/around the buffer area (using a generous bbox)
4. Pass 3: Read PBF again, process relations, use stored way geometries to build PT route geometries

Actually, there's a simpler way. Let me use two passes:

Pass 1: Find Gürtel ways, compute buffer
Pass 2: Read PBF with locations=True, process ways AND relations:
  - For ways: check if within buffer, if so and it's highway, add to highway_features
  - Also store way geometry if it's a highway (for potential PT route use)
  - For relations: check if PT route, look up member way geometries that we've stored

Since relations come after ways in the PBF, by the time we process relations, we've already seen all the ways. So if we store way geometries for all highway ways (anyway needed for the output), and also for any way that might be part of a PT route... but we don't know which are PT route members.

Hmm, this is circular again. Let me just store ALL way geometries that are within the buffer area, regardless of tagging. The number should be manageable for a small area.

Let me use a generous bbox filter first to reduce the number of ways stored.

OK let me just write the script. I'll use `locations=True` and store way geometries for everything in the buffer area.

Let me code it up:

```python
#!/usr/bin/env python3
"""
Extract Gürtel buffer highway network and PT routes from Austria PBF.
"""

import osmium
import geopandas as gpd
import pandas as pd
from shapely.geometry import LineString, MultiLineString, box, shape
from shapely.ops import unary_union, transform
import pyproj
import urllib.request
import os
import time
from urllib.error import URLError

# CRS
WGS84 = "EPSG:4326"
AUSTRIALAMB = "EPSG:31287"  # MGI / Austria Lambert

# File paths
PBF_URL = "https://download.geofabrik.de/europe/austria-latest.osm.pbf"
PBF_PATH = "/work/austria-latest.osm.pbf"
GPKG_PATH = "/work/vienna_network.gpkg"

def download_file(url, path):
    """Download with retries."""
    for attempt in range(3):
        try:
            print(f"Downloading {url}... (attempt {attempt+1})")
            urllib.request.urlretrieve(url, path)
            return
        except URLError as e:
            if attempt < 2:
                time.sleep(5)
            else:
                raise

# Step 1: Download
if not os.path.exists(PBF_PATH):
    download_file(PBF_URL, PBF_PATH)
else:
    print(f"{PBF_PATH} already exists, skipping download")

# Step 2: Find Gürtel ways
print("Finding Gürtel ways...")
class GurtelHandler(osmium.SimpleHandler):
    def __init__(self):
        super().__init__()
        self.gurtel_geoms_wgs84 = []
        self.gurtel_ways = []  # (geom_wgs84, tags)
        
    def way(self, w):
        name = w.tags.get('name')
        highway = w.tags.get('highway')
        if name and name.endswith('Gürtel') and highway:
            coords = [(n.location.lon, n.location.lat) for n in w.nodes if n.location.valid()]
            if len(coords) >= 2:
                geom = LineString(coords)
                self.gurtel_geoms_wgs84.append(geom)
                self.gurtel_ways.append((geom, dict(w.tags)))

h1 = GurtelHandler()
h1.apply_file(PBF_PATH, locations=True)

print(f"Found {len(h1.gurtel_ways)} Gürtel way segments")

# Step 3: Compute 500m buffer
print("Computing buffer...")
# Project Gürtel to Austria Lambert
project_to = pyproj.Transformer.from_crs(WGS84, AUSTRIALAMB, always_xy=True).transform
project_from = pyproj.Transformer.from_crs(AUSTRIALAMB, WGS84, always_xy=True).transform

gurtel_proj = [transform(project_to, g) for g in h1.gurtel_geoms_wgs84]
gurtel_union = unary_union(gurtel_proj)
buffer_proj = gurtel_union.buffer(500)

# Also get WGS84 buffer for bounding box checks
buffer_wgs84 = transform(project_from, buffer_proj)

print("Buffer area computed")

# Step 4: Collect highways and PT routes
print("Collecting highways and PT routes...")

class NetworkHandler(osmium.SimpleHandler):
    def __init__(self, buffer_proj, project_to):
        super().__init__()
        self.buffer_proj = buffer_proj
        self.project_to = project_to  # transformer WGS84 -> projected
        self.highway_features = []  # (geom_wgs84, tags)
        self.way_geoms = {}  # way_id -> LineString (WGS84)
        self.pt_route_features = []  # (geom_wgs84, tags)
        
    def way(self, w):
        highway = w.tags.get('highway')
        coords = [(n.location.lon, n.location.lat) for n in w.nodes if n.location.valid()]
        if len(coords) < 2:
            return
        geom = LineString(coords)
        
        # Quick bbox check first
        if not geom.intersects(buffer_wgs84):
            return
            
        # More precise check in projected CRS
        geom_proj = transform(self.project_to, geom)
        if not geom_proj.intersects(self.buffer_proj):
            return
            
        if highway:
            self.highway_features.append((geom, dict(w.tags)))
        
        # Store way geom for PT route use
        self.way_geoms[w.id] = geom
        
    def relation(self, r):
        route = r.tags.get('route')
        rel_type = r.tags.get('type')
        if rel_type == 'route' and route in ('bus', 'tram', 'subway', 'train', 'trolleybus', 'light_rail'):
            # Collect member way geometries
            member_geoms = []
            for member in r.members:
                if member.type == 'w' and member.ref in self.way_geoms:
                    member_geoms.append(self.way_geoms[member.ref])
            
            if not member_geoms:
                return
                
            # Check if any member way intersects buffer
            for g in member_geoms:
                g_proj = transform(self.project_to, g)
                if g_proj.intersects(self.buffer_proj):
                    multi = MultiLineString(member_geoms)
                    self.pt_route_features.append((multi, dict(r.tags)))
                    break

h2 = NetworkHandler(buffer_proj, project_to)
h2.apply_file(PBF_PATH, locations=True)

print(f"Found {len(h2.highway_features)} highway features")
print(f"Found {len(h2.pt_route_features)} PT route features")

# Step 5: Write to GeoPackage
print("Writing GeoPackage...")

# Highways
highway_rows = []
for geom, tags in h2.highway_features:
    row = {}
    for key in ['name', 'highway', 'maxspeed', 'lanes', 'surface', 'oneway']:
        row[key] = tags.get(key)
    row['geometry'] = transform(project_to, geom)
    highway_rows.append(row)

gdf_highways = gpd.GeoDataFrame(highway_rows, crs=AUSTRIALAMB, geometry='geometry')
gdf_highways = gdf_highways[['name', 'highway', 'maxspeed', 'lanes', 'surface', 'oneway', 'geometry']]

# PT Routes
pt_rows = []
for geom, tags in h2.pt_route_features:
    row = {}
    for key in ['ref', 'name', 'operator', 'route']:
        row[key] = tags.get(key)
    row['geometry'] = transform(project_to, geom)
    pt_rows.append(row)

gdf_pt = gpd.GeoDataFrame(pt_rows, crs=AUSTRIALAMB, geometry='geometry')
gdf_pt = gdf_pt[['ref', 'name', 'operator', 'route', 'geometry']]

# Write to GeoPackage
gdf_highways.to_file(GPKG_PATH, layer='highways', driver='GPKG')
gdf_pt.to_file(GPKG_PATH, layer='pt_routes', driver='GPKG')

print(f"Done! Written to {GPKG_PATH}")
```

Wait, there are some issues with this approach:

1. The `buffer_wgs84` reference in the `way` method - this is defined outside the class. I need to pass it in.
2. The `project_to` function reference - when I store it in the handler, it might not be picklable or might cause issues. Let me use pyproj.Transformer properly.

Let me also reconsider - for the `buffer_wgs84`, I'm checking `geom.intersects(buffer_wgs84)` but the buffer in WGS84 is distorted. It's better to just do the projected check.

Also, storing all way geometries for ways that intersect the buffer might still be a lot. But for a 500m buffer around Gürtel (which is about 15km long ring), the area is about 15km * (500m * 2) + pi * 500^2 = 15km² + 0.785km² ≈ 16km². That's quite small. The number of ways in this area should be very manageable.

Let me refine the approach and also handle the large file download.

Actually, I realize there might be an issue with the bbox quick check. The `buffer_wgs84` in WGS84 coordinates might not be a good approximation since 500m in projected space is different from 500m in WGS84 degrees. But for a quick bbox check, it's fine to use a generous WGS84 buffer.

Let me refine the script and make sure it handles everything correctly.

One more thing - I need to be careful about the `oneway` tag. In OSM, it could be "yes", "no", "1", "0", "-1", etc. I'll keep it as-is.

Also, for the PT routes, the user said "each route as a single feature" with MultiLineString geometry. My approach of collecting all member ways into a MultiLineString is correct.

Let me also consider: what if a PT route relation has no ways with stored geometries? Then we can't create the route geometry.

Let me also handle the case where the same way might be a highway AND part of a PT route. That's fine - they go to different layers.

OK, let me write the final script. But first, let me also consider that downloading the full Austria PBF might take a while and be large. Let me check if there's a smaller regional extract for Vienna specifically.

Geofabrik has Austria regions:
- https://download.geofabrik.de/europe/austria.html
- Upper Austria, Lower Austria, Vienna, etc.

Vienna-specific: https://download.geofabrik.de/europe/austria/vienna-latest.osm.pbf

This would be much smaller! Let me use the Vienna extract instead of the whole Austria.

But wait - the Gürtel is at the border of Vienna and might extend into Lower Austria. Let me check...

The Gürtel is entirely within Vienna municipality, so the Vienna extract should be sufficient. But to be safe, I could use the whole Austria extract. However, the Vienna extract is much smaller and faster to process.

Actually, let me use the Austria PBF as the user specifically asked for the "Geofabrik Austria PBF" URL. But to avoid issues with the download size, let me use the Vienna extract if it's more practical. Hmm, the user specifically said the Austria PBF though.

Let me use the Austria PBF as requested, but I'll handle the large file efficiently.

Wait, let me reconsider - maybe I can use the Vienna area extract since all the data we need is in Vienna. The Gürtel is entirely in Vienna. And the buffer around it would extend slightly beyond Vienna's borders into Lower Austria, but the Vienna PBF might already include some surrounding areas or might be clipped to administrative boundaries.

To be safe, let me use the Austria-wide PBF as the user specifically mentioned it.

Let me write a clean, well-structured script.

Actually, one more consideration: for the PT routes, some route relations might have hundreds of member ways spanning a large area. If the route crosses the buffer, I need to include all its member ways (to create a complete route geometry), not just those within the buffer. But storing all way geometries for all PT route members across all of Austria would be too much.

So my approach of only storing way geometries that are within/near the buffer is flawed for PT routes. A PT route might cross the buffer but have member ways far outside the buffer.

Let me think about this differently. The user wants "every public-transport route relation that crosses the same buffer" with "each route as a single feature". So if a route crosses the buffer, I need the FULL geometry of the route (all member ways), not just the part within the buffer.

This means I need to store geometries of all ways that are members of PT routes, even if they're far from the buffer. But for Austria, this could be a lot of ways.

Alternative approach for PT routes:
1. First, find all PT route relations (using a tag filter - no geometry needed)
2. Collect all member way IDs from these relations
3. Then extract those ways from the PBF

This would require processing the PBF multiple times or finding a way to handle it in one pass.

Actually, since relations come after ways in the PBF, I can't know which ways I need when processing ways. But I can do it in two passes over just the relations:

Pass A: Read all relations to find PT routes and their member way IDs
Pass B: Read the PBF to extract all those ways, checking if any intersect the buffer

If any way of a route intersects the buffer, the whole route is included.

Let me refine the approach:

Pass 1: Find Gürtel ways, compute buffer
Pass 2: Read all relations, find PT routes, collect member way IDs per route, AND store which routes have ways that are in our stored way geometries

But this is still chicken-and-egg. Let me do:

Pass 1: Find Gürtel ways, compute buffer
Pass 2: Read all ways (with locations=True), store geometries for ALL ways that intersect the buffer (for highways), and ALSO store geometries for ALL ways (regardless of location) for ways that might be part of PT routes

But I can't know which ways are part of PT routes during Pass 2 since relations haven't been processed yet.

Hmm, let me try a different approach:

1. Find Gürtel, compute buffer - one pass
2. Read all relations to find PT routes and their member way IDs - one pass  
3. Read the PBF with locations=True, process ALL ways, checking:
   a. If it's a highway intersecting buffer -> add to highway layer
   b. If its ID is in the set of PT route member way IDs -> store geometry
4. For each PT route, build the MultiLineString from stored member way geometries

This is three passes but the second pass (relations only) is fast since relations are fewer than ways.

Wait, actually I can do this in two passes:

Pass 1: Find Gürtel, compute buffer (using locations=True for ways)
Pass 2: 
  a. Skip nodes (we use locations=True)
  b. For ways: check if intersects buffer. If highway, add to highway features. Also store ALL way geometries that are in PT route member way IDs (but we don't know these yet!)
  c. For relations: find PT routes, look up member way geometries

The issue remains: during way processing, we don't know which ways are PT route members.

Let me use THREE passes:

Pass 1: Find Gürtel ways, compute buffer
Pass 2: Read all relations, find PT routes, collect {rel_id: {tags, [member_way_ids]}}
Pass 3: Read all ways (locations=True):
  - Check if highway and intersects buffer -> add to highway layer
  - Check if way_id is in any PT route's member way_ids -> store geometry
  After ways are done, build PT route geometries from stored way geometries

Actually wait, I realize there's a simpler way. Since the user wants PT routes that CROSS the buffer, I need to check if the route's geometry intersects the buffer. The route's geometry is the union of its member way geometries. If I store way geometries only for ways within the buffer, I can check if the route has at least one way within the buffer.

But then for the output, I need the FULL route geometry, which requires storing all member way geometries. However, for PT routes that cross the buffer, their member ways might be spread across the entire route network.

For a bus route in Vienna, the total length might be 10-20km, which is manageable. Let me just store way geometries for ALL ways that are members of PT routes across all of Austria. This might be a lot (thousands of routes × tens of ways per route = tens of thousands of ways), but it should be manageable in memory.

Actually, let me think about this differently. Most ways are only a few hundred meters long. Even if a PT route has 100 ways, each with ~10 nodes, storing 100 LineStrings is nothing. So storing geometries for all PT route member ways across Austria should be fine.

Let me estimate:
- Austria has maybe 1000 bus routes
- Each bus route has maybe 50-100 ways
- Total: 50,000-100,000 ways
- Each LineString with ~10 nodes = ~800 bytes
- Total: ~40-80 MB for geometries, plus ~20 MB for node coords

This is manageable!

So the three-pass approach:

Pass 1: Find Gürtel ways, compute buffer (use locations=True)
Pass 2: Read relations, find PT routes, collect {rel_id: {tags, [member_way_ids]}}
Pass 3: Read ways (locations=True):
  - For each way: 
    - Check if intersects buffer
    - If highway and intersects buffer -> add to highway features
    - If way_id in any PT route's member way_ids -> store geometry
  - After all ways: build PT route geometries

Let me also collect the total set of PT route member way IDs from Pass 2 to use in Pass 3.

Actually, I think osmium allows me to avoid a separate relation pass. Let me check if osmium processes data in order: nodes → ways → relations.

In osmium.SimpleHandler:
- node() is called for each node
- way() is called for each way
- relation() is called for each relation

So yes, ways come before relations. This means in a single pass, when processing ways, I don't know which ways are part of PT routes yet.

OK, three passes it is.

Let me also optimize: in Pass 2 (relations only), I don't need locations=True. I can just read tags and member references without resolving locations.

Let me write the complete script now. Let me also handle the download with proper retry logic.

One more thought - for the actual node locations of PT route ways, I need locations=True to build geometries. So in Pass 3, I use locations=True.

Let me code this up properly now.

Wait, actually I just realized I can do it in TWO passes if I reverse the order. But osmium always reads in PBF order (nodes, ways, relations).

OK, three passes it is. Let me write it.

Actually, I want to reconsider - can I do it in two passes by being clever about what I store?

Pass 1: Find Gürtel, compute buffer
Pass 2: Read PBF with locations=True
  - way(): Check if intersects buffer. If highway, add to highway_features. Store ALL way geometries (id -> geom) for ways that intersect buffer.
  - relation(): Check if PT route. Look up member way geometries. If any member way has stored geometry (i.e., intersects buffer), build full route from all member ways (but we might not have all member way geometries stored!).

The problem with Pass 2 is that for PT routes, I might only have geometries for member ways that intersect the buffer, missing ways outside the buffer.

But wait - the user wants "each route as a single feature" with MultiLineString. If I can only get the partial geometry (ways within buffer), that's incomplete. But maybe it's acceptable? No, the user wants the full route.

OK, three passes it is. Or... let me use a different approach altogether.

What if I first extract all data using osmium CLI with a bounding box around Vienna, then work with the smaller file?

Actually, I just thought of another approach: use osmnx! OSMnx can download data from OSM for a specific area. But the user specifically asked for the Geofabrik PBF.

Let me just go with the three-pass approach. It's clean and correct.

Let me write the script now. I'll be careful about memory usage.

Actually, let me reconsider one more time. The user said "highway ways" for the first layer and "public-transport route relations" for the second layer. For the PT routes, I need the route relations, not the individual ways. So I need to:
1. Find route relations of the specified types
2. Check if they cross the buffer
3. Extract their full geometry (all member ways)

For step 3, I need the geometries of all member ways. Since many PT routes in Austria might not be near Vienna, I only need to store way geometries for PT routes that have at least one way near the buffer. But to determine this, I need to check if any member way is near the buffer.

Hmm, so I need:
1. Find all PT route relations (all of Austria - manageable, there aren't that many relations)
2. For each PT route, collect its member way IDs
3. For each member way, check if it intersects the buffer
4. If any member way does, collect all member way geometries for the full route

The challenge is steps 3 and 4 - I need way geometries.

Let me use this approach:

Pass 1: Find Gürtel ways, compute buffer
Pass 2: Read all relations, find PT routes, get {rel_id: {tags, [member_way_ids]}}
  - Also build a set of all PT route member way IDs
Pass 3: Read all ways (locations=True):
  - Store geometry for any way in the PT member way ID set
  - Also check highway + buffer intersection for highway layer
  - After all ways processed, build PT route geometries

For Pass 3, I need to store geometries for ALL PT route member ways, not just those in the buffer. This could be many ways across Austria. But I think it's manageable.

Let me estimate more carefully:
- Austria has ~2,000 bus routes, ~200 tram routes, ~50 subway routes, ~30 train routes (just guessing)
- Each route has 20-100 ways
- Total ways: (2000+200+50+30) * 50 = ~114,000 ways
- Each way has ~10 nodes, each node is 2 doubles = 16 bytes, plus overhead
- Total: ~114,000 * 10 * 16 = ~18 MB for coordinates, plus geometry overhead

This should be fine. Let me proceed.

Let me now write the script.

Actually, on reflection, let me also handle the case where the same way might be in multiple PT routes. I'll store the geometry once and reference it multiple times.

Let me also think about what "crosses the same buffer" means for PT routes. If a route's ways collectively form a geometry that intersects the buffer, it crosses. But if only one way of the route is within the buffer, the route as a whole crosses the buffer. So I check if any member way intersects the buffer.

For the full route geometry, I collect ALL member ways (not just those in the buffer) and create a MultiLineString.

OK let me write the script.

Actually, I want to think about memory for nodes. The Austria PBF has about 200-300 million nodes. With osmium and `locations=True`, osmium stores all node locations in memory (or uses a disk-backed index). The default is an in-memory index using arrays, which could use a lot of RAM.

Let me use a disk-backed node cache. osmium provides `osmium.index.create_map("sparse_file_array", "file_name")` for this.

Actually, when using `locations=True` in `apply_file()`, osmium uses a handler called `NodeLocationsForWays` which internally manages node locations. I can't control which index it uses through the simple API.

Hmm, let me check if I can use `osmium.NodeLocationsForWays` with a custom index.

Actually, I can create my own location handler. Let me use this approach:

```python
class LocationHandler(osmium.SimpleHandler):
    def __init__(self):
        super().__init__()
        self.locations = osmium.index.create_map("sparse_file_array", "/work/node_cache")
        
    def node(self, n):
        self.locations.set(n.id, (n.location.lon, n.location.lat))
```

But then I'd need to chain it with my way handler. osmium supports handler chaining.

Let me look at how to properly use osmium with disk-based node cache:

```python
import osmium

class MyWayHandler(osmium.SimpleHandler):
    def __init__(self):
        super().__init__()
        # This won't have node locations automatically
        
    def way(self, w):
        # w.nodes won't have valid locations
        pass
```

The correct way is to use `NodeLocationsForWays` as a wrapper:

```python
loc_handler = osmium.NodeLocationsForWays()
loc_handler.set_node_cache("/work/node_cache")  # Disk-based cache

class MyWayHandler(osmium.SimpleHandler):
    def way(self, w):
        # Now w.nodes have valid locations
        for n in w.nodes:
            print(n.location.lat, n.location.lon)

# Chain handlers
handler = MyWayHandler()
loc_handler.register_handler(handler)
osmium.apply_file(PBF_PATH, handlers=[loc_handler])
```

Wait, I think the API might be different. Let me check what's available.

Actually, osmium's Python bindings provide different approaches. Let me check:

The `NodeLocationsForWays` handler can be used like this:

```python
handler = NodeLocationsForWays()
handler.set_node_cache("my_node_cache")  # Optional disk cache

# Then apply_file with this handler
apply_file("file.osm.pbf", handlers=[handler, my_way_handler])
```

Or with the SimpleHandler:

```python
class MyHandler(osmium.SimpleHandler):
    def way(self, w):
        for n in w.nodes:
            if n.location.valid():
                # use n.location.lat, n.location.lon

handler = MyHandler()
handler.apply_file(PBF_PATH, locations=True)
```

The `locations=True` parameter tells osmium to use a default NodeLocationsForWays handler internally. This stores all node locations in memory by default.

For large files, I should use a disk cache. Let me try:

```python
handler = MyHandler()
# Can't pass custom node cache through locations=True
```

Hmm, maybe I need to set up the handlers manually.

Let me just use `locations=True` and hope it works. If memory becomes an issue, I'll add a disk cache. The Austria PBF is about 300MB zipped, and the node locations in memory might be ~2-3GB which is a lot.

Actually, let me use the proper approach with a disk cache:

```python
import osmium

class NodeLocationHandler(osmium.NodeLocationsForWays):
    def __init__(self):
        super().__init__()
        # Use a disk-based node cache
        self.node_cache = osmium.index.create_map("sparse_file_array", "/work/node_cache")
        
class WayHandler(osmium.SimpleHandler):
    def way(self, w):
        # Node locations are resolved
        pass

nl_handler = NodeLocationHandler()
way_handler = WayHandler()
nl_handler.register_handler(way_handler)
osmium.apply_file(PBF_PATH, handlers=[nl_handler])
```

Wait, I'm not sure about the exact API. Let me just try the simple approach first and see if it works.

Actually, for osmpythonbindings, the typical usage with disk cache is:

```python
import osmium

class MyHandler(osmium.SimpleHandler):
    def way(self, w):
        for n in w.nodes:
            if n.location.valid():
                # use it
                pass

h = MyHandler()
h.apply_file(PBF_PATH, locations=True, idx="sparse_file_array,/work/node_cache")
```

Or something like that. Let me check the osmium Python docs... I'm not 100% sure about the syntax. Let me just use `locations=True` and see what happens.

Actually, let me test with a small script first.

Let me just write the full solution. I'll use `locations=True` for the handlers that need node locations. If memory is an issue, I'll adjust.

Let me finalize my approach:

1. Download the Austria PBF
2. One pass to find Gürtel ways (with locations=True)
3. Compute buffer
4. Second pass just for relations (no locations needed, just tags and member refs)
5. Third pass for ways (with locations=True) - store highway features and PT route way geometries
6. Build PT route geometries from stored way geometries
7. Write to GeoPackage

Let me code this up now.

OK wait, I also realize that `buffer_wgs84` check I was thinking about is for a rough bbox check. Since the buffer in WGS84 is distorted, let me just do all intersection checks in the projected CRS (EPSG:31287) where distances are accurate.

Here's my refined plan for the way handler:

```python
def way(self, w):
    coords = [(n.location.lon, n.location.lat) for n in w.nodes if n.location.valid()]
    if len(coords) < 2:
        return
    geom_wgs84 = LineString(coords)
    geom_proj = transform(project_to, geom_wgs84)
    
    # Check if intersects buffer
    if not geom_proj.intersects(self.buffer_proj):
        return
        
    # Highway
    highway = w.tags.get('highway')
    if highway:
        self.highway_features.append((geom_wgs84, dict(w.tags)))
    
    # Store for PT routes if this way is a member of any PT route
    if w.id in self.pt_member_way_ids:
        self.way_geoms[w.id] = geom_wgs84
```

But this only stores way geometries for ways that are BOTH in the PT member set AND intersect the buffer. For the full route geometry, I need ALL member ways of the route, not just those in the buffer.

So I need to separate the concerns:
1. For highway layer: only ways that intersect the buffer
2. For PT route geometries: ALL ways that are members of PT routes that cross the buffer

The problem is: to determine if a PT route crosses the buffer, I need to check if ANY of its member ways intersect the buffer. But to build the full route geometry, I need ALL member ways.

So my approach should be:

1. Store way geometries for ALL PT route member ways (across all of Austria)
2. For each PT route, check if any member way intersects the buffer
3. If yes, build the full route geometry from all member ways

The number of PT route member ways is manageable as I estimated above.

Let me update the code:

In Pass 2 (relations): 
- Find all PT routes → {rel_id: {tags, [member_way_ids]}}
- Build set of ALL PT route member way IDs

In Pass 3 (ways):
- For each way:
  - Check if intersects buffer AND is highway → add to highway_features
  - If way_id in PT member way ID set → store geometry (regardless of buffer intersection)
- After all ways processed:
  - For each PT route:
    - Check if any member way intersects buffer
    - If yes, collect all member way geometries → MultiLineString

But wait, for checking if a member way intersects the buffer in Pass 3, I need to transform each way to projected CRS. That's fine since I'm already doing it for highways.

Let me write the final script now. I'll try to be efficient and clear.

Let me also think about what happens if a PT route has member ways that I haven't stored (because the way wasn't in the PBF, or wasn't a way type member, etc.). I'll just skip those.

Let me write the code.

Actually, let me use a cleaner approach. In Pass 3, I'll store way geometries (in projected CRS) for ALL PT route member ways. Then I can check buffer intersection directly without transforming.

Let me finalize and write the script.

One more thing - for `oneway`, OSM uses values like "yes", "no", "1", "0", "-1". I'll keep the raw value.

Also, for `maxspeed`, OSM stores it as a string like "50", "30", "100 km/h". I'll keep as-is.

Let me write the complete script now:

```python
#!/usr/bin/env python3

import os
import time
import sys
import urllib.request
from urllib.error import URLError
from collections import defaultdict

import osmium
import geopandas as gpd
import pandas as pd
from shapely.geometry import LineString, MultiLineString, box
from shapely.ops import unary_union, transform
import pyproj
from shapely import wkb

# === Configuration ===
PBF_URL = "https://download.geofabrik.de/europe/austria-latest.osm.pbf"
PBF_PATH = "/work/austria-latest.osm.pbf"
GPKG_PATH = "/work/vienna_network.gpkg"
WGS84 = "EPSG:4326"
AUSTRIALAMB = "EPSG:31287"  # MGI / Austria Lambert

# === Helpers ===
def download_with_retry(url, path, max_retries=3):
    for attempt in range(max_retries):
        try:
            print(f"Downloading {url} (attempt {attempt+1}/{max_retries})...")
            urllib.request.urlretrieve(url, path)
            size_mb = os.path.getsize(path) / (1024 * 1024)
            print(f"Downloaded {size_mb:.1f} MB")
            return
        except URLError as e:
            print(f"Download failed: {e}")
            if attempt < max_retries - 1:
                wait = 5 * (attempt + 1)
                print(f"Retrying in {wait}s...")
                time.sleep(wait)
    raise RuntimeError(f"Failed to download {url} after {max_retries} attempts")

# === Transformer ===
transformer_wgs84_to_mgi = pyproj.Transformer.from_crs(WGS84, AUSTRIALAMB, always_xy=True)
project_to = transformer_wgs84_to_mgi.transform
project_from = pyproj.Transformer.from_crs(AUSTRIALAMB, WGS84, always_xy=True).transform

# === Step 1: Download ===
if not os.path.exists(PBF_PATH):
    download_with_retry(PBF_URL, PBF_PATH)
else:
    print(f"{PBF_PATH} exists, skipping download")

# === Step 2: Find Gürtel ways ===
print("\n=== Finding Gürtel ways ===")

class GurtelHandler(osmium.SimpleHandler):
    def __init__(self):
        super().__init__()
        self.gurtel_geoms_wgs84 = []
        self.gurtel_features = []  # (geom_wgs84, tags)
        
    def way(self, w):
        name = w.tags.get('name')
        highway = w.tags.get('highway')
        if name and name.rstrip().endswith('Gürtel') and highway:
            coords = [(n.location.lon, n.location.lat) for n in w.nodes if n.location.valid()]
            if len(coords) >= 2:
                geom = LineString(coords)
                self.gurtel_geoms_wgs84.append(geom)
                self.gurtel_features.append((geom, dict(w.tags)))

h1 = GurtelHandler()
h1.apply_file(PBF_PATH, locations=True)
print(f"Found {len(h1.gurtel_geoms_wgs84)} Gürtel way segments")

if len(h1.gurtel_geoms_wgs84) == 0:
    print("ERROR: No Gürtel ways found!")
    sys.exit(1)

# === Step 3: Compute 500m buffer ===
print("\n=== Computing buffer ===")
gurtel_proj = [transform(project_to, g) for g in h1.gurtel_geoms_wgs84]
gurtel_union = unary_union(gurtel_proj)
buffer_proj = gurtel_union.buffer(500)
print(f"Buffer area: {buffer_proj.area / 1e6:.2f} km²")

# Also compute buffer bbox in WGS84 for quick filtering
buffer_bounds_proj = buffer_proj.bounds
buffer_bbox_proj = box(*buffer_bounds_proj)

# === Step 4: Find all PT route relations ===
print("\n=== Finding PT route relations ===")

class PTRelationHandler(osmium.SimpleHandler):
    def __init__(self):
        super().__init__()
        self.pt_routes = {}  # rel_id -> {'tags': {...}, 'way_ids': set()}
        self.all_pt_way_ids = set()
        self.valid_route_types = {'bus', 'tram', 'subway', 'train', 'trolleybus', 'light_rail'}
        
    def relation(self, r):
        route = r.tags.get('route')
        rel_type = r.tags.get('type')
        if rel_type == 'route' and route in self.valid_route_types:
            way_ids = set()
            for member in r.members:
                if member.type == 'w':
                    way_ids.add(member.ref)
            if way_ids:
                self.pt_routes[r.id] = {
                    'tags': dict(r.tags),
                    'way_ids': way_ids
                }
                self.all_pt_way_ids.update(way_ids)

h2 = PTRelationHandler()
h2.apply_file(PBF_PATH)  # No locations needed - just tags and refs
print(f"Found {len(h2.pt_routes)} PT route relations with {len(h2.all_pt_way_ids)} unique member ways")

# === Step 5: Read ways - highways + PT route geometries ===
print("\n=== Collecting highways and PT route way geometries ===")

class NetworkWayHandler(osmium.SimpleHandler):
    def __init__(self, buffer_proj, pt_way_ids):
        super().__init__()
        self.buffer_proj = buffer_proj
        self.pt_way_ids = pt_way_ids  # set of way IDs that are members of PT routes
        self.highway_features = []  # (geom_wgs84, tags)
        self.pt_way_geoms = {}  # way_id -> LineString (in projected CRS)
        
    def way(self, w):
        coords = [(n.location.lon, n.location.lat) for n in w.nodes if n.location.valid()]
        if len(coords) < 2:
            return
        
        geom_wgs84 = LineString(coords)
        
        # Quick bounding box check to avoid unnecessary transforms
        geom_bounds = geom_wgs84.bounds
        # WGS84 buffer approx bounds (very rough)
        if not geom_wgs84.intersects(buffer_wgs84_approx):
            # Still need to check PT route ways even outside buffer
            if w.id not in self.pt_way_ids:
                return
        
        geom_proj = transform(project_to, geom_wgs84)
        
        # Check for highway layer
        highway = w.tags.get('highway')
        if highway and geom_proj.intersects(self.buffer_proj):
            self.highway_features.append((geom_wgs84, dict(w.tags)))
        
        # Store geometry for PT route member ways (anywhere in Austria)
        if w.id in self.pt_way_ids:
            self.pt_way_geoms[w.id] = geom_proj

```

Hmm, this is getting complex with the buffer_wgs84_approx variable. Let me simplify and just check in projected CRS for everything.

Also, for the PT route way geometries, I should store them in the projected CRS so I can check buffer intersection directly.

Let me revise the NetworkWayHandler:

```python
class NetworkWayHandler(osmium.SimpleHandler):
    def __init__(self, buffer_proj, pt_way_ids, project_to):
        super().__init__()
        self.buffer_proj = buffer_proj
        self.pt_way_ids = pt_way_ids
        self.project_to = project_to
        self.highway_features = []  # (geom_wgs84, tags)
        self.pt_way_geoms_proj = {}  # way_id -> LineString (projected)
        self.pt_way_geoms_wgs84 = {}  # way_id -> LineString (WGS84)
        
    def way(self, w):
        coords = [(n.location.lon, n.location.lat) for n in w.nodes if n.location.valid()]
        if len(coords) < 2:
            return
        
        highway = w.tags.get('highway')
        is_pt_member = w.id in self.pt_way_ids
        
        if not highway and not is_pt_member:
            return
        
        geom_wgs84 = LineString(coords)
        geom_proj = transform(self.project_to, geom_wgs84)
        
        if highway and geom_proj.intersects(self.buffer_proj):
            self.highway_features.append((geom_wgs84, dict(w.tags)))
        
        if is_pt_member:
            self.pt_way_geoms_proj[w.id] = geom_proj
            self.pt_way_geoms_wgs84[w.id] = geom_wgs84
```

Wait, but I need to think about whether storing ALL PT route member way geometries (across all of Austria) is too much. Let me reconsider.

If a PT route in Vorarlberg (far from Vienna) has 50 member ways, storing all those geometries is wasteful since the route won't cross the Vienna buffer.

Can I optimize? I could first check if the route's bbox overlaps the buffer's bbox using the relation's bbox (but relations don't have bboxes in the PBF).

Alternatively, I could estimate: if ALL member way IDs of a route are outside the buffer area (by some heuristic), skip storing their geometries.

But this adds complexity. Let me just store all PT route way geometries and see if it works. The number should be manageable.

Alright, let me write the complete script. Let me also think about the PT route feature output carefully.

For each PT route:
- Check if any member way intersects the buffer
- If yes, collect all member way geometries and create a MultiLineString
- Tags: ref, name, operator, route

Wait, I also want to store the way geometries in WGS84 so I can transform them later. Or I can just store in projected and create the output directly.

Let me store in both: projected for buffer check, WGS84 for eventual output (I'll project at write time).

Actually, I'll just store in WGS84 and project at the end. For buffer intersection checks, I'll transform on the fly. Since we're doing each check once, it's fine.

Let me simplify and write the script.

Actually, thinking more about memory: if I store all PT route member way geometries for all of Austria, that could be 100k+ LineStrings. Each LineString has maybe 10-50 coordinates. That's 100k * 10 * 2 * 8 bytes = 16 MB for coords, plus object overhead (maybe 100-200 bytes per LineString) = 100k * 200 = 20 MB. Total ~40 MB. Fine.

Let me just write the complete script now. I'll handle the buffer_wgs84 thing differently - I'll just not do a quick bbox check and directly check in projected CRS for everything.

Wait, but I need to be smart. For the highway check, I only care about ways that are near the buffer. Checking `geom_proj.intersects(buffer_proj)` for every highway way in Austria is too many. Let me do a quick bbox check first.

Actually, a quick way: check if the WGS84 bbox of the way overlaps a generous WGS84 bbox of the buffer. The WGS84 bbox of the buffer would be a rectangle in lat/lon that contains the buffer.

Let me compute the buffer's WGS85 bbox from the projected buffer:

```python
buffer_bounds_proj = buffer_proj.bounds
# Transform corners to WGS84
min_lon, min_lat = transform(project_from, (buffer_bounds_proj[0], buffer_bounds_proj[1]))
max_lon, max_lat = transform(project_from, (buffer_bounds_proj[2], buffer_bounds_proj[3]))
```

Then for each way, I can do a quick check if its bbox overlaps this WGS84 bbox before doing the full transform and intersection test.

OK let me stop overthinking and write the code.

Let me write it clean and complete:

```python
#!/usr/bin/env python3
"""
Extract highways and PT routes around Vienna's Gürtel from Austria OSM PBF.
Output: vienna_network.gpkg with highways and pt_routes layers.
"""

import os
import sys
import time
import urllib.request
from urllib.error import URLError

import osmium
import geopandas as gpd
import pandas as pd
from shapely.geometry import LineString, MultiLineString, box
from shapely.ops import unary_union, transform
import pyproj

# === CONFIG ===
PBF_URL = "https://download.geofabrik.de/europe/austria-latest.osm.pbf"
PBF_PATH = "/work/austria-latest.osm.pbf"
GPKG_PATH = "/work/vienna_network.gpkg"
WGS84 = "EPSG:4326"
AUSTRIALAMB = "EPSG:31287"  # MGI / Austria Lambert
VALID_PT_ROUTES = {'bus', 'tram', 'subway', 'train', 'trolleybus', 'light_rail'}
HIGHWAY_ATTRS = ['name', 'highway', 'maxspeed', 'lanes', 'surface', 'oneway']
PT_ATTRS = ['ref', 'name', 'operator', 'route']

# === COORDINATE TRANSFORMERS ===
trans_4326_to_31287 = pyproj.Transformer.from_crs(WGS84, AUSTRIALAMB, always_xy=True).transform
trans_31287_to_4326 = pyproj.Transformer.from_crs(AUSTRIALAMB, WGS84, always_xy=True).transform


def download(url, path):
    for attempt in range(3):
        try:
            print(f"Downloading {url} (attempt {attempt+1}/3)...")
            urllib.request.urlretrieve(url, path)
            mb = os.path.getsize(path) / 1e6
            print(f"  Done: {mb:.0f} MB")
            return
        except URLError as e:
            print(f"  Error: {e}")
            if attempt < 2:
                time.sleep(5 * (attempt + 1))
    raise RuntimeError(f"Download failed after 3 attempts")


# ============================================================
# PASS 1 — Find Gürtel ways
# ============================================================
print("=" * 60)
print("PASS 1: Finding Gürtel ways")
print("=" * 60)

class GurtelHandler(osmium.SimpleHandler):
    def __init__(self):
        super().__init__()
        self.gurtel_ways_wgs84 = []

    def way(self, w):
        name = w.tags.get('name')
        if not name or not name.rstrip().endswith('Gürtel'):
            return
        highway = w.tags.get('highway')
        if not highway:
            return
        coords = [(n.location.lon, n.location.lat) for n in w.nodes if n.location.valid()]
        if len(coords) >= 2:
            self.gurtel_ways_wgs84.append(LineString(coords))

h1 = GurtelHandler()
h1.apply_file(PBF_PATH, locations=True)
print(f"  Found {len(h1.gurtel_ways_wgs84)} Gürtel way segments")

if not h1.gurtel_ways_wgs84:
    sys.exit("ERROR: No Gürtel ways found. Cannot proceed.")

# Project Gürtel, union + buffer
gurtel_proj = [transform(trans_4326_to_31287, g) for g in h1.gurtel_ways_wgs84]
buffer_proj = unary_union(gurtel_proj).buffer(500)

# A generous WGS84 bounding box for quick pre-filtering
buffer_bounds_proj = buffer_proj.bounds
sw = transform(trans_31287_to_4326, (buffer_bounds_proj[0], buffer_bounds_proj[1]))
ne = transform(trans_31287_to_4326, (buffer_bounds_proj[2], buffer_bounds_proj[3]))
bbox_wgs84 = box(sw[0], sw[1], ne[0], ne[1])
print(f"  Buffer bbox (WGS84): {[round(c, 4) for c in bbox_wgs84.bounds]}")


# ============================================================
# PASS 2 — Find PT route relations
# ============================================================
print("=" * 60)
print("PASS 2: Finding PT route relations")
print("=" * 60)

class PTRelationHandler(osmium.SimpleHandler):
    def __init__(self):
        super().__init__()
        self.pt_routes = {}       # rel_id -> {'tags': dict, 'way_ids': set}
        self.all_pt_way_ids = set()

    def relation(self, r):
        if r.tags.get('type') != 'route':
            return
        route = r.tags.get('route')
        if route not in VALID_PT_ROUTES:
            return
        way_ids = {m.ref for m in r.members if m.type == 'w'}
        if not way_ids:
            return
        self.pt_routes[r.id] = {
            'tags': dict(r.tags),
            'way_ids': way_ids,
        }
        self.all_pt_way_ids.update(way_ids)

h2 = PTRelationHandler()
h2.apply_file(PBF_PATH)  # no node locations needed
print(f"  Found {len(h2.pt_routes)} PT route relations")
print(f"  With {len(h2.all_pt_way_ids)} unique member ways")


# ============================================================
# PASS 3 — Read ways (highway + PT member way geometries)
# ============================================================
print("=" * 60)
print("PASS 3: Reading ways")
print("=" * 60)

class WayHandler(osmium.SimpleHandler):
    def __init__(self, buffer_proj, bbox_wgs84, pt_way_ids):
        super().__init__()
        self.buffer_proj = buffer_proj
        self.bbox_wgs84 = bbox_wgs84
        self.pt_way_ids = pt_way_ids
        # Output collections
        self.highway_features = []     # list of (geom_wgs84, tags_dict)
        self.pt_way_geoms = {}         # way_id -> (LineString projected, LineString wgs84)

    def way(self, w):
        coords = [(n.location.lon, n.location.lat) for n in w.nodes if n.location.valid()]
        if len(coords) < 2:
            return

        highway = w.tags.get('highway')
        is_pt = w.id in self.pt_way_ids

        if not highway and not is_pt:
            return

        geom_wgs84 = LineString(coords)

        # Quick WGS84 bbox check for highway ways
        if highway and not geom_wgs84.intersects(self.bbox_wgs84):
            # Still check PT way if needed
            if not is_pt:
                return
            # For PT ways outside the bbox, we still need to store geometry
            # But let's do a quick check - if way is very far, skip
            way_bounds = geom_wgs84.bounds
            if (way_bounds[2] < bbox_wgs84.bounds[0] or way_bounds[0] > bbox_wgs84.bounds[2] or
                way_bounds[3] < bbox_wgs84.bounds[1] or way_bounds[1] > bbox_wgs84.bounds[3]):
                # Way is definitely outside bbox, but still could be part of a PT route
                # that crosses the buffer. Let me check more carefully.
                # Actually, if the way is far from the buffer, the route might still
                # cross the buffer via other ways. So I should store this way's geometry
                # if it's part of a PT route, regardless of location.
                pass  # Still store PT way geometries

        geom_proj = transform(trans_4326_to_31287, geom_wgs84)

        # Check buffer intersection
        in_buffer = geom_proj.intersects(self.buffer_proj)

        if highway and in_buffer:
            self.highway_features.append((geom_wgs84, dict(w.tags)))

        if is_pt:
            self.pt_way_geoms[w.id] = (geom_proj, geom_wgs84)

h3 = WayHandler(buffer_proj, bbox_wgs84, h2.all_pt_way_ids)
h3.apply_file(PBF_PATH, locations=True)
print(f"  Highway features in buffer: {len(h3.highway_features)}")
print(f"  PT member way geometries stored: {len(h3.pt_way_geoms)}")


# ============================================================
# Build PT route features
# ============================================================
print("=" * 60)
print("Building PT route features")
print("=" * 60)

pt_route_features = []  # list of (MultiLineString projected, tags)

for rel_id, route_info in h2.pt_routes.items():
    way_ids = route_info['way_ids']
    tags = route_info['tags']
    
    # Find if any member way intersects the buffer
    member_geoms_proj = []
    member_geoms_wgs84 = []
    crosses_buffer = False
    
    for wid in way_ids:
        if wid in h3.pt_way_geoms:
            geom_proj, geom_wgs84 = h3.pt_way_geoms[wid]
            member_geoms_proj.append(geom_proj)
            member_geoms_wgs84.append(geom_wgs84)
            if geom_proj.intersects(buffer_proj):
                crosses_buffer = True
    
    if not crosses_buffer or not member_geoms_proj:
        continue
    
    # Create MultiLineString in projected CRS
    multi_proj = MultiLineString(member_geoms_proj)
    pt_route_features.append((multi_proj, tags))

print(f"  PT routes crossing buffer: {len(pt_route_features)}")


# ============================================================
# Write GeoPackage
# ============================================================
print("=" * 60)
print("Writing GeoPackage")
print("=" * 60)

# --- Highways layer ---
rows_hw = []
for geom_wgs84, tags in h3.highway_features:
    geom_proj = transform(trans_4326_to_31287, geom_wgs84)
    row = {k: tags.get(k) for k in HIGHWAY_ATTRS}
    row['geometry'] = geom_proj
    rows_hw.append(row)

gdf_hw = gpd.GeoDataFrame(rows_hw, crs=AUSTRIALAMB, geometry='geometry')
gdf_hw = gdf_hw[HIGHWAY_ATTRS + ['geometry']]

# --- PT Routes layer ---
rows_pt = []
for geom_proj, tags in pt_route_features:
    row = {k: tags.get(k) for k in PT_ATTRS}
    row['geometry'] = geom_proj
    rows_pt.append(row)

gdf_pt = gpd.GeoDataFrame(rows_pt, crs=AUSTRIALAMB, geometry='geometry')
gdf_pt = gdf_pt[PT_ATTRS + ['geometry']]

# Write to GeoPackage
gdf_hw.to_file(GPKG_PATH, layer='highways', driver='GPKG')
gdf_pt.to_file(GPKG_PATH, layer='pt_routes', driver='GPKG')

print(f"  Highways: {len(gdf_hw)} features")
print(f"  PT Routes: {len(gdf_pt)} features")
print(f"  Written to {GPKG_PATH}")
print("Done!")
```

Wait, there's an issue with the bbox check logic for PT routes. In the WayHandler, I'm doing:

```python
if highway and not geom_wgs84.intersects(self.bbox_wgs84):
    if not is_pt:
        return
```

But if `highway` is None (not a highway way), this check is skipped and we go to the PT check. Let me restructure:

```python
if not highway and not is_pt:
    return

geom_wgs84 = LineString(coords)

if highway:
    # Only check buffer for highways
    if geom_wgs84.intersects(self.bbox_wgs84):
        geom_proj = transform(...)
        if geom_proj.intersects(self.buffer_proj):
            self.highway_features.append(...)
    # Also check if this highway is a PT member way
    if is_pt:
        geom_proj = transform(...)
        self.pt_way_geoms[w.id] = (geom_proj, geom_wgs84)
elif is_pt:
    # Non-highway PT member way
    geom_proj = transform(...)
    self.pt_way_geoms[w.id] = (geom_proj, geom_wgs84)
```

Actually, the logic is getting messy. Let me simplify:

For every way:
1. If it's a highway AND intersects buffer → add to highway_features
2. If it's a PT member way → store geometry

Let me not use the bbox optimization initially and just do the full check. The number of highway ways near the buffer is small anyway.

Here's a cleaner version:

```python
class WayHandler(osmium.SimpleHandler):
    def __init__(self, buffer_proj, pt_way_ids):
        super().__init__()
        self.buffer_proj = buffer_proj
        self.pt_way_ids = pt_way_ids
        self.highway_features = []
        self.pt_way_geoms = {}

    def way(self, w):
        # Quick tag check before building geometry
        highway = w.tags.get('highway')
        is_pt = w.id in self.pt_way_ids
        
        if not highway and not is_pt:
            return
            
        coords = [(n.location.lon, n.location.lat) for n in w.nodes if n.location.valid()]
        if len(coords) < 2:
            return
            
        geom_wgs84 = LineString(coords)
        
        if highway or is_pt:
            geom_proj = transform(trans_4326_to_31287, geom_wgs84)
            
            if highway and geom_proj.intersects(self.buffer_proj):
                self.highway_features.append((geom_wgs84, dict(w.tags)))
            
            if is_pt:
                self.pt_way_geoms[w.id] = (geom_proj, geom_wgs84)
```

This is cleaner. The downside is that for non-highway PT member ways far from the buffer, we still transform and store them. But that's fine - the number is manageable.

Actually, I realize there's another issue: for ways that are PT member ways but NOT near the buffer AND not highways, they're still being transformed and stored. This could be many ways. But as I estimated, it's ~100k which is fine.

Let me also consider: what if a way is a PT member way but also a highway near the buffer? It gets added to highway_features AND stored in pt_way_geoms. That's correct.

OK, let me finalize the script. Let me also handle the case where for the PT routes, some member ways might not have been stored (because they weren't in the PBF as ways, or were of type 'n' or 'r'). In that case, I just skip them.

Let me write the complete final script now, being careful about all edge cases.

Also, I want to make sure the PT route MultiLineString is correct. If a route has only one way, MultiLineString with one geometry is fine.

Let me also think about the output CRS. The user says "Austria's standard projected coordinate system". EPSG:31287 (MGI / Austria Lambert) is the standard. Let me use that.

Actually, wait. I want to double-check: "Austria's standard projected coordinate system" - in Austrian GIS contexts, this could refer to:
- EPSG:31287 (MGI / Austria Lambert) - national projection
- EPSG:31256 (MGI / Austria GK East) - eastern zone (Vienna)
- EPSG:3416 (MGI / Austria GK Central) - central zone

I think EPSG:31287 is the safest choice as it covers all of Austria. Let me use that.

But actually, many Austrian datasets use EPSG:31256 for the Vienna area. And EPSG:31287 is less commonly used in practice. Let me check...

Hmm, I'll go with EPSG:31287 since it's explicitly "Austria's standard projected coordinate system" and Lambert is the national standard projection.

Alright, let me write the final script and run it.

One more thing: I should also handle the `oneway` tag correctly. In OSM, `oneway=yes` and `oneway=1` both mean one-way. `oneway=-1` means reverse direction. I'll keep the raw value.

Let me also make sure the PT route way IDs are integers (which they should be from the PBF).

OK, writing the script now.

Actually, I realize I'm overcomplicating the bbox quick check. Let me just not do it and directly check intersection in projected CRS. The number of ways that are highways or PT member ways in Austria is manageable.

Wait, "highway" ways include all roads, paths, etc. In Austria, there are millions of highway ways. Processing all of them and checking intersection with the buffer would be very slow.

So I NEED a quick pre-filter. Let me use a simple WGS84 bounding box check before doing the transform and projected intersection check.

Let me add a pre-filter:

```python
# Before expensive transform, do quick WGS84 bbox check
if highway and not bbox_wgs84.intersects(geom_wgs84):
    # Way is likely outside buffer area
    if not is_pt:
        return
    # For PT ways, still might be needed
```

But `bbox_wgs84` is a box in WGS84 degrees, which is a rough approximation. Some ways might be outside the bbox but still inside the buffer (due to coordinate distortion). And some ways inside the bbox might be outside the buffer. That's fine - the bbox check is just a pre-filter to avoid unnecessary transforms.

Actually, a better approach: use a generous bbox and check `geom_wgs84.intersects(bbox_wgs84)` before doing the transform. This is a fast check.

Let me revise:

```python
class WayHandler(osmium.SimpleHandler):
    def __init__(self, buffer_proj, bbox_wgs84, pt_way_ids):
        super().__init__()
        self.buffer_proj = buffer_proj
        self.bbox_wgs84 = bbox_wgs84
        self.pt_way_ids = pt_way_ids
        self.highway_features = []
        self.pt_way_geoms = {}

    def way(self, w):
        highway = w.tags.get('highway')
        is_pt = w.id in self.pt_way_ids
        
        if not highway and not is_pt:
            return
            
        coords = [(n.location.lon, n.location.lat) for n in w.nodes if n.location.valid()]
        if len(coords) < 2:
            return
            
        geom_wgs84 = LineString(coords)
        
        # Quick bbox pre-filter
        if highway and not geom_wgs84.intersects(self.bbox_wgs84):
            # Only need to continue if it's a PT member way
            if not is_pt:
                return
            # For PT ways, we still need geometry even if outside bbox
            # because the route might cross the buffer via other ways
        
        geom_proj = transform(trans_4326_to_31287, geom_wgs84)
        
        if highway and geom_proj.intersects(self.buffer_proj):
            self.highway_features.append((geom_wgs84, dict(w.tags)))
        
        if is_pt:
            self.pt_way_geoms[w.id] = (geom_proj, geom_wgs84)
```

Wait, but for PT ways that are outside the bbox, we still do the transform and store them. That's fine.

Actually, let me reconsider. For PT ways, if they're far from the buffer (e.g., in Vorarlberg), we don't need to store them because their route won't cross the Vienna buffer. But we don't know that without checking.

So for PT ways, I have two options:
1. Store ALL PT member way geometries (across all of Austria) - memory may be high
2. Store only PT member way geometries that are near the buffer - but then I'd miss routes where the crossing way is one of the near ones but other ways are far

Let me think about this differently. A PT route crosses the buffer if ANY of its member ways intersect the buffer. So I need to check each PT route's member ways for buffer intersection. 

For efficiency, I can:
1. Store PT way geometries only for ways that are near the buffer (using bbox check)
2. For PT routes, check if any stored way geometry intersects the buffer
3. If yes, the route crosses the buffer

But wait, this approach would miss routes where the ONLY way that crosses the buffer is stored (near buffer) but other ways (far from buffer) are NOT stored. I'd know the route crosses the buffer (from the stored way), but I wouldn't have the full geometry of the route.

Hmm, but actually if one way is near the buffer, I store it. Then when checking the route, I see this way intersects the buffer, so the route crosses. But I don't have the other ways' geometries to build the full route.

So I CAN'T use approach 2. I need ALL member way geometries for routes that cross the buffer.

Alternative: 
1. Store ALL PT member way geometries (memory heavy but manageable)
2. For each PT route, check if ANY member way intersects buffer
3. If yes, use ALL stored member way geometries to build the route

This is my original approach. Let me stick with it but optimize: for PT member ways, still use the bbox pre-filter but with a much more generous bbox (e.g., whole of eastern Austria) to reduce storage.

Actually, the number of PT member ways in Austria is not that large. Let me just store all of them. If memory is an issue, I can optimize later.

Let me write the final script now. I'll use the bbox pre-filter only for highways. For PT ways, I'll store everything.

Wait, but processing ALL PT member ways across Austria requires building geometries for them. That means for each PT member way, I need to look up its node coords (from the osmium node cache) and build a LineString. This happens in the `way()` callback. With `locations=True`, osmium resolves node coords on the fly.

The number of PT member way ways across Austria is maybe 100k. Each way has maybe 10 nodes. So we're processing 100k ways with 10 nodes each = 1M node lookups. That's very fast.

OK, let me just write the script without worrying about optimization.

Let me also reconsider: for the highway layer, the user wants "every highway ... that intersects a 500 m buffer". So I only need highways that intersect the buffer. The bbox pre-filter is useful here to avoid processing millions of highway ways across Austria.

For highways, I can be aggressive with the bbox pre-filter: use a buffer bbox expanded by a generous margin.

Let me compute the bbox in WGS84 by transforming the buffer bounds:

```python
buffer_bounds = buffer_proj.bounds
# Add some margin (500m)
buffer_bounds_expanded = (
    buffer_bounds[0] - 500, 
    buffer_bounds[1] - 500, 
    buffer_bounds[2] + 500, 
    buffer_bounds[3] + 500
)
# Transform to WGS84
sw_proj = (buffer_bounds_expanded[0], buffer_bounds_expanded[1])
ne_proj = (buffer_bounds_expanded[2], buffer_bounds_expanded[3])
sw_wgs = transform(trans_31287_to_4326, sw_proj)
ne_wgs = transform(trans_31287_to_4326, ne_proj)
bbox_wgs84 = box(sw_wgs[0], sw_wgs[1], ne_wgs[0], ne_wgs[1])
```

This gives a generous WGS84 bounding box around the buffer. Any highway way whose bbox doesn't intersect this box is definitely outside the buffer.

For PT ways, I need a different approach since they might be far from the buffer. Let me use a very generous bbox covering all of eastern Austria (where PT routes that cross the Vienna buffer might be).

Actually, let me think about this. A PT route that crosses the Vienna buffer could have member ways extending far from Vienna. For example, a long-distance bus route from Vienna to Salzburg that crosses the Gürtel. Its ways could be in Salzburg, Linz, etc.

To capture the full geometry of such a route, I'd need way geometries from all over Austria.

So I should store PT member way geometries from ALL of Austria. This is the only correct approach.

Let me estimate the number of PT member ways more carefully:
- Vienna has ~200 bus lines, ~30 tram lines, ~5 subway lines
- Other cities have their own systems
- Long-distance trains and buses

Each bus route might have 20-100 ways. Let's say 2000 PT relations in Austria, each with 50 ways on average = 100,000 ways.

100,000 LineStrings with ~10 nodes each = ~1,000,000 coordinates. This is very manageable.

OK, final approach:

1. Pre-filter highways with bbox check (to avoid processing millions of ways)
2. Store ALL PT member way geometries (no pre-filter)
3. Avoid duplicate transforms

Let me write the final script.

Actually, you know what, let me just write the script and run it. I've been planning for too long.

Let me also handle one more thing: the PT routes crossing the buffer. I need to check if the ROUTE crosses the buffer, not individual ways. If a route has 10 ways and only 1 is within the buffer, the route crosses the buffer and I include ALL 10 ways in the route geometry.

This is what I'm doing - in the final step, I check if any member way intersects the buffer, and if so, include all member way geometries.

Let me write the final script now.

One more consideration about performance: when using `locations=True` with osmium on a large PBF, osmium stores all node locations. For the Austria PBF (200M+ nodes), this could use 3-4 GB of RAM with the in-memory index.

Let me use a disk-based node cache. osmium supports this through the `idx` parameter or by using `NodeLocationsForWays` with a custom index.

Actually, looking at the osmium Python docs, when you do `apply_file(PBF_PATH, locations=True)`, it uses an in-memory index by default. For large files, I should set up a disk cache.

Let me try:

```python
# Create a disk-based node location store
node_cache = "/work/osmium_node_cache"
os.makedirs(node_cache, exist_ok=True)

# Option 1: Use NodeLocationsForWays directly
from osmium import NodeLocationsForWays
nlh = NodeLocationsForWays()
nlh.node_cache = node_cache  # Not sure about exact API

# Option 2: Use locations=True with a custom index
# Not sure if this is supported
```

Let me check the osmium Python API by looking at the installed version.

Actually, let me just try a practical approach:

```python
handler = MyHandler()
handler.apply_file(PBF_PATH, locations=True)
```

If it runs out of memory, I'll add a disk cache. Let me try this first.

For the Gürtel finding pass, I definitely need node locations. For the way collection pass, I also need node locations.

Let me just try with `locations=True` and see what happens. The Austria PBF is about 350MB compressed, and the node cache might use 2-3GB which should be OK for most systems.

Actually, let me be more careful. Let me check the system memory first and use a disk cache if needed.

You know what, let me just write the script and test it. If it fails, I'll add a disk cache.

Also, I should use `osmium.apply_file()` instead of handler.apply_file() for more control. But handler.apply_file() is simpler.

Let me write it now and stop overthinking.

Wait, one more thing: the osmium Python bindings version. Let me check what's installed.

Let me just write the script and run it. I'll handle errors as they come up.

4:02

Bash python3 -c "import osmium; print(osmium.__version__)" 2>/dev/null || echo "osmiu...

osmium not found

4:02

Bash python3 -c "import geopandas; print(geopandas.__version__); from shapely.geometr...

1.1.3
shapely ok
3.7.2

4:09

Let me install osmium first.

4:09

Bash pip install --user osmium 2>&1 | tail -5

Requirement already satisfied: requests in /usr/local/lib/python3.14/site-packages (from osmium) (2.34.2)
Requirement already satisfied: charset_normalizer<4,>=2 in /usr/local/lib/python3.14/site-packages (from requests->osmium) (3.4.7)
Requirement already satisfied: idna<4,>=2.5 in /usr/local/lib/python3.14/site-packages (from requests->osmium) (3.18)
Requirement already satisfied: urllib3<3,>=1.26 in /usr/local/lib/python3.14/site-packages (from requests->osmium) (2.7.0)
Requirement already satisfied: certifi>=2023.5.7 in /usr/local/lib/python3.14/site-packages (from requests->osmium) (2026.5.20)

4:22

Let me check the import again carefully - the PyPI package name is `osmium` but the import is `import osmium`.

4:22

Bash python3 -c "import osmium; print(osmium.__version__)"

Traceback (most recent call last):
  File "<string>", line 1, in <module>
    import osmium; print(osmium.__version__)
                         ^^^^^^^^^^^^^^^^^^
AttributeError: module 'osmium' has no attribute '__version__'

4:36

Bash python3 -c "import osmium; print(osmium)"

<module 'osmium' from '/usr/local/lib/python3.14/site-packages/osmium/__init__.py'>

6:10

Good, osmium is available. Now let me write the Python script to solve this task. Here's my plan:

1. Download the Geofabrik Austria PBF if not already present
2. Parse the PBF using osmium to find all highway ways whose name ends with 'Gürtel' (the Gürtel corridor)
3. Buffer those ways by 500m to create an area of interest
4. Find all highway ways that intersect that buffer
5. Find all public-transport route relations that cross the same buffer
6. Write both layers to a GeoPackage in Austria's standard projected CRS (EPSG:31287 - MGI / Austria Lambert, or EPSG:31256 - MGI / Austria GK East? Let me think... Austria's standard projected coordinate system is typically EPSG:31287 (MGI / Austria Lambert) for nationwide purposes, or EPSG:31256/31255/31254 for GK zones. For Vienna, which is in zone M28 (EPSG:31257) or the national equal-area projection EPSG:3416 (MGI / Austria Lambert). Actually, the most commonly used "standard projected coordinate system" for all of Austria is EPSG:31287 (MGI / Austria Lambert) or EPSG:3416. Let me use EPSG:31287 as it's the standard national projection.

Wait, actually let me reconsider. EPSG:31287 (MGI / Austria Lambert) is indeed the standard projected CRS for Austria.

Now let me write the script. The approach using osmium:

For highways:
- First pass: find all way objects where tags get("name", "").endswith("Gürtel") and "highway" in tags
- Collect their geometries, create a buffer
- Second pass: find all way objects with "highway" tag, check if they intersect the buffer

For PT route relations:
- Find relations with type=route and route in (bus, tram, subway, train, trolleybus, light_rail, railway)
- Check if their geometry (the combined geometry of member ways) crosses the buffer
- Output each as a single MultiLineString

Actually, osmium's handler-based approach can be a bit tricky. Let me think about the best approach.

Option A: Use osmium for the first pass (find Gürtel ways), then use osmium again for highways and routes.

Option B: Use osmium.handler for a two-pass approach within a single file processing.

Let me write a comprehensive script. The challenge is that with osmium, you process the file sequentially and need to collect data in memory.

For the Gürtel ways:
- I'll use a handler that collects way nodes and builds linestrings for ways ending with "Gürtel"
- Then buffer them

For the highways:
- I need to find all highway ways intersecting that buffer
- I'll do a second pass through the PBF

For the PT route relations:
- I need to find relations with type=route and appropriate route tag
- For each such relation, I need to get the member ways, their geometries, and merge into a single geometry
- This requires first collecting way geometries (from a first pass), then processing relations (from a second pass)

This is complex with osmium. Let me think of an alternative approach.

Actually, let me use a simpler approach:
1. Use osmium to extract just the Gürtel ways and their node locations
2. Build the buffer
3. Use osmium again to extract all highway ways within the buffer
4. For PT routes, use osmium to extract relations with type=route and route=bus/tram/etc.

Let me write a Python script that does this step by step.

Actually, for PT routes, I need to collect the member way geometries. This is complex with osmium. Let me think...

One approach:
1. First, find Gürtel ways and compute buffer
2. Use `osmium getid` or filter to extract data within the bbox/area
3. Parse the extracted data

Actually, a more practical approach might be to use osmium directly from command line to filter, then parse with Python.

Let me think about the most straightforward approach:

Approach:
1. Find Gürtel ways using Python/osmium
2. Build the buffer
3. Use osmium tags-filter to extract relevant data
4. Parse the filtered data

Actually, let me just write a comprehensive Python script using osmium handlers.

Here's my plan:

```python
import osmium
import geopandas as gpd
import shapely.geometry
import shapely.ops
import pyproj
import math

# ... 

class GuertelHandler(osmium.SimpleHandler):
    def __init__(self):
        super().__init__()
        self.ways = []
        self.nodes = {}
    
    def way(self, w):
        if 'highway' in w.tags and w.tags.get('name', '').endswith('Gürtel'):
            nodes = [(n.ref, n.lon, n.lat) for n in w.nodes]
            self.ways.append((w, nodes))
    
    def node(self, n):
        self.nodes[n.id] = (n.lon, n.lat)

# First pass: find Gürtel ways
handler = GuertelHandler()
handler.apply_file("austria-latest.osm.pbf")

# Build Gürtel geometries
guertel_lines = []
for w, node_refs in handler.ways:
    coords = [(handler.nodes[nid][0], handler.nodes[nid][1]) for nid, lon, lat in node_refs]
    if len(coords) >= 2:
        guertel_lines.append(shapely.geometry.LineString(coords))

# Buffer by 500m
buffer_geom = shapely.ops.unary_union(guertel_lines).buffer(500)
```

Then for the highways pass, I need another handler that collects highway ways intersecting the buffer.

For PT routes, I need a handler that:
1. First collects all way geometries (with their node refs)
2. Then processes relations that are route relations
3. For each route relation, finds the way members, gets their geometries, combines them

This is tricky because in a single pass, we see ways before relations, so we can collect way geometries first and then when we see relations, we already have the way geometries.

Wait, actually in osmium.handler, the order is: nodes, then ways, then relations. So within a single pass, by the time we process relations, we've already seen all the ways. Good!

But the problem is that a PBF file might have nodes, ways, and relations interleaved in complex ways. Actually, in a standard PBF, the typical order is all nodes, then all ways, then all relations.

Let me write the complete script.

Actually, I realize I need to be more careful. Let me write the script step by step.

One more consideration: the buffer might be quite large (500m around a several-km-long road). I need to extract all highway ways that intersect this buffer, which could be a lot. And for PT routes, I need routes that cross the buffer.

Let me write it now.

For the PT routes, the requirement says "each route as a single feature" as MultiLineString. So for each route relation, I need to combine all its member way geometries into one MultiLineString.

Let me also handle the CRS: EPSG:31287 is MGI / Austria Lambert. I'll reproject from WGS84 to that.

Let me write the full script:

```python
import osmium
import geopandas as gpd
import pandas as pd
from shapely.geometry import LineString, MultiLineString, Point, shape
from shapely.ops import unary_union
import pyproj
import sys
import math

PBF_PATH = "/work/austria-latest.osm.pbf"

# ---------- Step 1: Find Gürtel ways ----------
class GuertelHandler(osmium.SimpleHandler):
    def __init__(self):
        super().__init__()
        self.nodes = {}
        self.guertel_ways = []  # (way_id, tags, node_ids)
    
    def node(self, n):
        self.nodes[n.id] = (float(n.lon), float(n.lat))
    
    def way(self, w):
        if 'highway' in w.tags:
            name = w.tags.get('name', '')
            if name.endswith('Gürtel'):
                node_ids = [n.ref for n in w.nodes if n.ref != 0]
                tags = dict(w.tags)
                self.guertel_ways.append((w.id, tags, node_ids))

handler1 = GuertelHandler()
handler1.apply_file(PBF_PATH)

print(f"Found {len(handler1.guertel_ways)} Gürtel ways")

# Build Gürtel line geometries
guertel_lines = []
for wid, tags, node_ids in handler1.guertel_ways:
    coords = []
    for nid in node_ids:
        if nid in handler1.nodes:
            coords.append(handler1.nodes[nid])
    if len(coords) >= 2:
        guertel_lines.append(LineString(coords))

print(f"Built {len(guertel_lines)} Gürtel line geometries")

# Union and buffer
if guertel_lines:
    guertel_union = unary_union(guertel_lines)
    guertel_buffer = guertel_union.buffer(500)
    print(f"Buffer area: {guertel_buffer.area} sq degrees (approx)")
else:
    print("No Gürtel ways found!")
    sys.exit(1)

# ---------- Step 2: Find highway ways intersecting buffer ----------
class HighwayHandler(osmium.SimpleHandler):
    def __init__(self):
        super().__init__()
        self.nodes = {}
        self.highway_ways = []  # (id, tags, geometry, node_ids)
    
    def node(self, n):
        self.nodes[n.id] = (float(n.lon), float(n.lat))
    
    def way(self, w):
        if 'highway' in w.tags:
            node_ids = [n.ref for n in w.nodes if n.ref != 0]
            tags = dict(w.tags)
            self.highway_ways.append((w.id, tags, node_ids))

handler2 = HighwayHandler()
handler2.apply_file(PBF_PATH)

print(f"Found {len(handler2.highway_ways)} highway ways total")

# Filter by intersection with buffer
filtered_highways = []
for wid, tags, node_ids in handler2.highway_ways:
    coords = []
    for nid in node_ids:
        if nid in handler2.nodes:
            coords.append(handler2.nodes[nid])
    if len(coords) >= 2:
        line = LineString(coords)
        if line.intersects(guertel_buffer):
            filtered_highways.append({
                'id': wid,
                'geometry': line,
                **{k: v for k, v in tags.items() if k in ('name', 'highway', 'maxspeed', 'lanes', 'surface', 'oneway')}
            })

print(f"Found {len(filtered_highways)} highway ways intersecting buffer")

# ---------- Step 3: Find PT route relations ----------
class PTHandler(osmium.SimpleHandler):
    def __init__(self):
        super().__init__()
        self.nodes = {}
        self.all_ways = {}  # way_id -> (tags, node_ids)
        self.route_relations = []  # (rel_id, tags, member_way_ids)
    
    def node(self, n):
        self.nodes[n.id] = (float(n.lon), float(n.lat))
    
    def way(self, w):
        node_ids = [n.ref for n in w.nodes if n.ref != 0]
        self.all_ways[w.id] = (dict(w.tags), node_ids)
    
    def relation(self, r):
        if r.tags.get('type') == 'route':
            route = r.tags.get('route', '')
            if route in ('bus', 'tram', 'subway', 'train', 'trolleybus', 'light_rail'):
                member_way_ids = []
                for m in r.members:
                    if m.type == 'w':  # way member
                        member_way_ids.append(m.ref)
                tags = dict(r.tags)
                self.route_relations.append((r.id, tags, member_way_ids))

handler3 = PTHandler()
handler3.apply_file(PBF_PATH)

print(f"Found {len(handler3.route_relations)} PT route relations")
print(f"Total ways in memory: {len(handler3.all_ways)}")

# Filter routes that cross the buffer
filtered_routes = []
for rid, tags, member_way_ids in handler3.route_relations:
    route_lines = []
    for wid in member_way_ids:
        if wid in handler3.all_ways:
            _, node_ids = handler3.all_ways[wid]
            coords = []
            for nid in node_ids:
                if nid in handler3.nodes:
                    coords.append(handler3.nodes[nid])
            if len(coords) >= 2:
                route_lines.append(LineString(coords))
    
    if route_lines:
        # Combine all member lines
        combined = unary_union(route_lines)
        # Check if it crosses (touches or goes through) the buffer
        if combined.intersects(guertel_buffer):
            if combined.geom_type == 'LineString':
                mls = MultiLineString([combined])
            elif combined.geom_type == 'MultiLineString':
                mls = combined
            else:
                continue  # shouldn't happen for lines
            
            filtered_routes.append({
                'id': rid,
                'geometry': mls,
                **{k: v for k, v in tags.items() if k in ('ref', 'name', 'operator', 'route')}
            })

print(f"Found {len(filtered_routes)} PT routes crossing buffer")

# ---------- Step 4: Write to GeoPackage ----------
# Create GeoDataFrames
gdf_highways = gpd.GeoDataFrame(filtered_highways, geometry='geometry', crs='EPSG:4326')
gdf_routes = gpd.GeoDataFrame(filtered_routes, geometry='geometry', crs='EPSG:4326')

# Reproject to EPSG:31287 (MGI / Austria Lambert)
gdf_highways = gdf_highways.to_crs('EPSG:31287')
gdf_routes = gdf_routes.to_crs('EPSG:31287')

# Write to GeoPackage
gdf_highways.to_file('/work/vienna_network.gpkg', layer='highways', driver='GPKG')
gdf_routes.to_file('/work/vienna_network.gpkg', layer='pt_routes', driver='GPKG')

print("Done! Written to /work/vienna_network.gpkg")
```

Wait, but this approach processes the PBF three times, which is slow for a ~1.5GB file. Let me think if I can do it in fewer passes.

Actually, I can do it in two passes:
1. First pass: collect all nodes, find Gürtel ways, and also collect all highway ways AND all PT route relations with their member ways
2. But the issue is I need the Gürtel buffer to filter highways, so I can't do it all in one pass.

Alternative: 
- Pass 1: Collect Gürtel ways and all nodes
- Compute buffer
- Pass 2: Filter highways and collect PT route info

Actually, I also need to collect ALL nodes (for way geometries) in pass 2. But I already collected them in pass 1. Can I reuse? Well, the node coordinates don't change, so I could reuse. But I need to be careful about memory.

Let me optimize: 
- Pass 1: Collect Gürtel ways + all nodes
- Compute buffer
- Pass 2: Using the same nodes dict from pass 1 (but osmium creates a new handler)... Hmm, I can't easily share state between handlers across separate apply_file calls unless I use the same handler object.

Wait, actually I can use a single handler that does everything, but processes in order. The issue is that I need to know the Gürtel buffer before I can filter highways and routes. So I need two passes.

Let me structure it as two passes:
1. Pass 1: Read all nodes and find Gürtel ways
2. Compute buffer
3. Pass 2: Read all nodes again (or reuse if stored), find highways intersecting buffer and PT routes

Actually, let me store nodes from pass 1 and reuse them. The PBF is about 1.5GB and Austria has millions of nodes... Let me check how many nodes Austria has.

Actually, the nodes could be ~100+ million. Storing all of them as Python dict entries might use a lot of memory (~10GB+). Let me think about a more memory-efficient approach.

Alternative: Use a spatial index approach. I can:
1. First pass: find Gürtel ways, extract their node coords, build buffer
2. Second pass: for highways, I need to check each way's nodes against the buffer. Instead of storing all nodes, I can stream through and build geometries on the fly, checking against the buffer.

But I need the actual node coordinates to build the geometry to check intersection! So I do need the nodes.

Actually, let me think about this differently. The Gürtel is a road in Vienna. The buffer of 500m around it is a relatively small area. Most nodes and ways in Austria won't intersect it. So I could:

1. First pass: find Gürtel ways, collect their node coordinates, build buffer
2. Second pass: process ALL nodes, but only keep those that are within or near the buffer (or referenced by Gürtel ways). For highway ways, check if any of their referenced nodes are near the buffer, and if so, build the geometry using the stored nodes.

Actually, the simplest approach might be to use osmium's location handler which gives us node coordinates as we process ways. Let me use osmium's `NodeLocationsForWays` handler.

Actually, osmium has `osmium.geom.GeoLocationHandler` or `osmium.NodeLocationsForWays` that can be used to automatically populate node locations when processing ways. Let me look at the osmium API.

Wait, I can use `osmium.geom.GeoLocationHandler` or simpler: use `osmium.NodeLocationsForWays` as a handler wrapper.

Actually, in the Python bindings, there's `osmium.NodeLocationsForWays` which is a handler that stores node locations and makes them available when processing ways. Let me use this.

Here's a better approach:

```python
class HighwayCollector(osmium.SimpleHandler):
    def __init__(self, buffer_geom):
        super().__init__()
        self.buffer = buffer_geom
        self.highways = []
    
    def way(self, w):
        if 'highway' in w.tags:
            # Build geometry from node locations
            # But we need the node locations! 
```

Hmm, osmium doesn't automatically resolve node locations for ways in Python. We need to use the locations handler.

Let me look at how osmium works. There's `osmium.NodeLocationsForWays` which is a handler that, when used as part of the handler chain, populates node locations in ways:

```python
import osmium

class MyHandler(osmium.SimpleHandler):
    def __init__(self):
        super().__init__()
    
    def way(self, w):
        # w.nodes now have .lat and .lon populated if NodeLocationsForWays was used
        for n in w.nodes:
            print(n.lon, n.lat)

h = MyHandler()
# Apply with NodeLocationsForWays as a location handler
h.apply_file("file.osm.pbf", locations=True)  # or use a separate handler
```

Actually, in the Python osmium bindings, `SimpleHandler.apply_file` has a `locations` parameter. When set to True, it uses `NodeLocationsForWays` internally. Let me check this.

Looking at the osmium Python docs/examples:

```python
handler.apply_file("file.osm.pbf", locations=True)
```

This should work! But it stores node locations in memory, which could be large.

OK, let me just be pragmatic and write the script. It might use a lot of memory for all of Austria's nodes, but let me try the two-pass approach:

Pass 1: Read all nodes + Gürtel ways (using locations=True)
- This stores all node locations in memory via NodeLocationsForWays
- Collect Gürtel way geometries

Actually, with `locations=True`, the osmium library stores all node locations internally. Then I can access them in the way handler. Let me try this.

But wait, if I use `locations=True` in the first pass, all nodes are stored. Then in the second pass, I need them again. I could combine everything into one pass if I use a two-stage approach within a single handler.

Let me try a different approach. Since osmium handlers are simple sequential processors:

Pass 1: Just find Gürtel ways and collect their node IDs
- Don't need node coordinates yet, just the IDs

Actually wait, I need the node coordinates to build the geometry to buffer. Let me think...

Actually, I can get the bbox of the Gürtel from the way nodes even without storing all coordinates. Let me try:

Pass 1: 
- For each node: if its ID is referenced by a Gürtel way, store its coordinates
- For each way: if it's a Gürtel way, note which node IDs it uses
- After the pass, build Gürtel geometries and buffer

This is more memory efficient. Let me estimate: the Gürtel is maybe 10-15 km long, with nodes every 10-20 meters, so maybe 500-1500 nodes. The buffer of 500m around this is still very small compared to all of Austria.

So:
Pass 1: 
- Collect all nodes referenced by Gürtel ways (store these node coords)
- Collect Gürtel way data

Then build buffer.

Pass 2:
- For each node: check if it's within a bounding box of the buffer (rough filter). If so, store it.
- For each way: if it's a highway way, check if its nodes are stored (meaning they're near the buffer). If at least one node is near, build the full geometry from stored node coordinates.
- For each relation: similar approach.

But this is getting complex. Let me just use the simple approach with `locations=True` and accept the memory usage. The script will work for practical use.

Actually, I just realized I can download a smaller regional extract. Vienna is in the "Wien" region of Geofabrik, which has a much smaller PBF. But the user specified https://download.geofabrik.de/europe/austria-latest.osm.pbf specifically.

Let me try the straightforward approach and see if it works:

1. Download the PBF
2. Use osmium with `locations=True` to process
3. Two passes (or one long pass)

Actually, I just thought of an even better approach. I can use a single pass with osmium, using the fact that osmium processes nodes before ways before relations. I can:

1. In the node handler: store all node locations (using NodeLocationsForWays or manually)
2. In the way handler: 
   a. If it's a Gürtel way, collect its geometry
   b. If it's a highway way, store it temporarily
3. After all ways are processed, compute the buffer from Gürtel geometries
4. Then in the relation handler:
   a. Filter PT routes that intersect the buffer

But wait, osmium processes in order: all nodes, then all ways, then all relations. So by the time we're in the way handler, we've seen all nodes. And by the time we're in the relation handler, we've seen all ways. But we can't compute the buffer until we've seen all Gürtel ways, which means we can't filter highway ways during the way handler callback.

However, I can collect all highway ways during the way callback, then after the file is processed, filter them. Same for relations.

Let me restructure:

```python
class AllInOneHandler(osmium.SimpleHandler):
    def __init__(self):
        super().__init__()
        self.guertel_ways = []
        self.all_highway_ways = []
        self.route_relations = []
    
    def way(self, w):
        if 'highway' in w.tags:
            name = w.tags.get('name', '')
            highway_info = {
                'id': w.id,
                'tags': dict(w.tags),
                'nodes': [(n.lon, n.lat) for n in w.nodes if n.location_valid()]
            }
            
            if name.endswith('Gürtel') and len(highway_info['nodes']) >= 2:
                self.guertel_ways.append(highway_info)
            
            self.all_highway_ways.append(highway_info)
    
    def relation(self, r):
        if r.tags.get('type') == 'route':
            route = r.tags.get('route', '')
            if route in ('bus', 'tram', 'subway', 'train', 'trolleybus', 'light_rail'):
                # Store relation info - we'll process member ways later
                member_way_ids = [m.ref for m in r.members if m.type == 'w']
                self.route_relations.append({
                    'id': r.id,
                    'tags': dict(r.tags),
                    'member_way_ids': member_way_ids
                })
```

But with this approach, I don't have the way geometries for route members during the relation callback... Actually, I can store all way geometries by ID and look them up when processing relations after the file is read.

Wait, in osmium, with `locations=True`, the way handler receives ways with node locations populated. So I can build geometries in the way handler and store them. Then in the relation handler, I can look up member way geometries.

But the issue is: I need the Gürtel buffer to filter highways, and I don't have that until after all ways are processed. So during the way callback, I can't filter yet.

Let me just store everything and filter after the file is processed.

Here's my revised plan:

```python
handler = AllInOneHandler()
handler.apply_file(PBF_PATH, locations=True)

# Now compute buffer from Gürtel ways
guertel_lines = [LineString(w['nodes']) for w in handler.guertel_ways if len(w['nodes']) >= 2]
buffer = unary_union(guertel_lines).buffer(500 / 111000)  # approximate deg-to-m

# Filter highways
highways = []
for w in handler.all_highway_ways:
    line = LineString(w['nodes'])
    if line.intersects(buffer):
        highways.append({...})

# For PT routes, I need to look up way geometries
way_geoms = {w['id']: LineString(w['nodes']) for w in handler.all_highway_ways if len(w['nodes']) >= 2}
# Also need to include non-highway ways that might be part of PT routes... 
# Actually, railway ways might not have 'highway' tag but could be part of train/subway routes
```

Hmm, I need to store ALL ways, not just highway ways, for PT routes. Let me store all ways.

Actually, I think the cleanest approach is:

1. One pass: store all nodes (via locations=True), all ways (with geometries), and all PT route relations
2. After the pass: compute buffer, filter highways, filter PT routes

But storing all ways for all of Austria could be huge... Let me estimate. Austria has about 5M ways. Each way has maybe 5 nodes on average. That's a lot of data.

Let me try a more practical approach using the two-pass method with minimal memory:

Pass 1: Find Gürtel ways, collect their node coords only
Pass 2: Use osmium's `NodeLocationsForWays` to process all data, but only store what's in/near the buffer's bbox

Actually, I think the simplest thing that will work is:

1. Download the file
2. Use osmium to filter by a bounding box around Vienna first, then process the filtered data

Or even better: let me use `osmium tags-filter` from the command line to extract a small region first, then process that with Python.

Let me try this approach:

```bash
# First, find the bbox of the Gürtel using osmium
osmium getid -r austria-latest.osm.pbf --bbox-from-ways --highway --name="*Gürtel" > guertel.osm
```

Hmm, actually let me just write the Python script and see if it works. I'll use `locations=True` which should handle memory reasonably.

Actually, let me think about this more practically. Let me use `osmium` command-line tools to extract just Vienna, then process the much smaller file.

Wait, actually, let me try another approach. Let me use `osmium tags-filter` to filter just the relevant data first.

Plan:
1. Download the PBF
2. Use Python/osmium to find Gürtel ways and compute buffer bbox
3. Use `osmium extract` with the bbox to get just the Vienna area
4. Process the extracted file for highways and routes

Actually, this is getting complicated. Let me just write the full Python script and run it.

Let me write a pragmatic script:

```python
import osmium
import geopandas as gpd
from shapely.geometry import LineString, MultiLineString, box
from shapely.ops import unary_union
import sys
import os

PBF = "/work/austria-latest.osm.pbf"

# ---------- Pass 1: Find Gürtel ways only ----------
class GuertelFinder(osmium.SimpleHandler):
    def __init__(self):
        super().__init__()
        self.nodes_needed = set()
        self.guertel_way_nodes = {}  # way_id -> [node_id, ...]
        self.guertel_tags = {}
        self.node_coords = {}
    
    def way(self, w):
        if 'highway' in w.tags:
            name = w.tags.get('name', '')
            if name.endswith('Gürtel'):
                node_ids = [n.ref for n in w.nodes if n.ref != 0]
                self.guertel_way_nodes[w.id] = node_ids
                self.guertel_tags[w.id] = dict(w.tags)
                self.nodes_needed.update(node_ids)
    
    def node(self, n):
        if n.id in self.nodes_needed:
            self.node_coords[n.id] = (n.lon, n.lat)

print("Pass 1: Finding Gürtel ways...")
finder = GuertelFinder()
finder.apply_file(PBF)
print(f"Found {len(finder.guertel_way_nodes)} Gürtel ways, {len(finder.nodes_needed)} unique nodes")

# Build Gürtel geometries
guertel_lines = []
for wid, node_ids in finder.guertel_way_nodes.items():
    coords = [finder.node_coords[nid] for nid in node_ids if nid in finder.node_coords]
    if len(coords) >= 2:
        guertel_lines.append(LineString(coords))

guertel_union = unary_union(guertel_lines)
# Buffer 500m - convert degrees to approximate meters at Vienna's latitude (~48.2°N)
# 1 degree lon ≈ 111320 * cos(48.2°) ≈ 74200 m
# 1 degree lat ≈ 111320 m
# So 500m ≈ 0.0045° lat, 0.0067° lon
# Use a simple buffer in degrees (slightly generous)
buffer_deg = 500.0 / 111000.0  # approximate
guertel_buffer = guertel_union.buffer(buffer_deg)
print(f"Buffer computed")

# ---------- Pass 2: Find highways and PT routes ----------
class Collector(osmium.SimpleHandler):
    def __init__(self):
        super().__init__()
        self.way_geoms = {}  # way_id -> LineString
        self.way_tags = {}   # way_id -> tags dict
        self.highways_in_buffer = []
        self.route_relations = []
    
    def way(self, w):
        # Build geometry if it has location info
        coords = [(n.lon, n.lat) for n in w.nodes if n.location_valid()]
        if len(coords) < 2:
            return
        
        line = LineString(coords)
        self.way_geoms[w.id] = line
        self.way_tags[w.id] = dict(w.tags)
        
        # Check if highway and intersects buffer
        if 'highway' in w.tags:
            if line.intersects(guertel_buffer):
                tags = dict(w.tags)
                self.highways_in_buffer.append({
                    'geometry': line,
                    **{k: v for k, v in tags.items() if k in ('name', 'highway', 'maxspeed', 'lanes', 'surface', 'oneway')}
                })
    
    def relation(self, r):
        if r.tags.get('type') != 'route':
            return
        route = r.tags.get('route', '')
        if route not in ('bus', 'tram', 'subway', 'train', 'trolleybus', 'light_rail'):
            return
        
        # Get member way geometries
        member_lines = []
        for m in r.members:
            if m.type == 'w' and m.ref in self.way_geoms:
                member_lines.append(self.way_geoms[m.ref])
        
        if not member_lines:
            return
        
        # Combine into single geometry
        combined = unary_union(member_lines)
        if combined.is_empty:
            return
        
        # Check if it crosses the buffer
        if not combined.intersects(guertel_buffer):
            return
        
        # Convert to MultiLineString
        if combined.geom_type == 'LineString':
            geom = MultiLineString([combined])
        elif combined.geom_type == 'MultiLineString':
            geom = combined
        else:
            return  # skip Point/MultiPoint/other
        
        tags = dict(r.tags)
        self.route_relations.append({
            'geometry': geom,
            **{k: v for k, v in tags.items() if k in ('ref', 'name', 'operator', 'route')}
        })

print("Pass 2: Collecting highways and PT routes...")
collector = Collector()
# We need locations=True to get node coordinates in ways
collector.apply_file(PBF, locations=True)
print(f"Found {len(collector.highways_in_buffer)} highway ways in buffer")
print(f"Found {len(collector.route_relations)} PT routes crossing buffer")
```

Wait, but with `locations=True`, osmium stores ALL node locations in memory. That's basically the whole of Austria's nodes. This could be 100M+ nodes, each taking ~40 bytes in the internal C++ data structure. That's 4GB+ just for nodes. This might work but could be slow.

Actually, let me check if there's a better way. I can set `locations=True` only for ways that I'm interested in, but that's not how osmium works - it stores all nodes.

Alternative: I could use the `NodeLocationsForWays` handler with a bbox filter (not available in the Python bindings directly).

Let me try a different approach: use the bbox of the Gürtel buffer to extract a region first.

Let me get the bbox of the buffer and extract just that region using osmium command line.

Actually, let me just try running the script and see if it works within memory constraints. If it's too slow/memory-intensive, I'll adjust.

Let me write the complete script now.

Let me also check: the user wants EPSG:31287 (MGI / Austria Lambert) which is the standard projected CRS for Austria.

Let me write the full script now.

One more thing: I need to handle the fact that for PT routes, the member ways might include non-highway ways (e.g., railway tracks for train/subway). So I need to store all way geometries, not just highway ways.

Actually, looking at my code above, I already store all way geometries in `self.way_geoms`. Good.

Let me also think about what "crosses the same buffer" means for PT routes. A route "crosses" the buffer if any part of its geometry passes through the buffer. Using `intersects` is correct for this.

Let me finalize the script.

6:10

Write /work/solve.py

content · 215 lines · py

#!/usr/bin/env python3
"""
Extract highways intersecting a 500m buffer around Vienna's Gürtel,
and public-transport route relations crossing that same buffer,
from the Geofabrik Austria PBF.

Output: vienna_network.gpkg (layers: highways, pt_routes)
CRS: EPSG:31287 (MGI / Austria Lambert)
"""

import osmium
import sys
import os
import time

from shapely.geometry import LineString, MultiLineString
from shapely.ops import unary_union
import geopandas as gpd

PBF = "/work/austria-latest.osm.pbf"

# ---------------------------------------------------------------------------
# Step 1 — Find Gürtel ways and compute 500 m buffer
# ---------------------------------------------------------------------------
class GuertelFinder(osmium.SimpleHandler):
    """First pass: collect only the node coordinates needed for Gürtel ways."""
    def __init__(self):
        super().__init__()
        self.nodes_needed = set()          # node ids referenced by Gürtel ways
        self.guertel_way_nodes = {}        # way_id -> list of node ids
        self.guertel_tags = {}             # way_id -> tags dict
        self.node_coords = {}              # node_id -> (lon, lat)

    def way(self, w):
        if "highway" in w.tags:
            name = w.tags.get("name", "")
            if name.endswith("Gürtel"):
                node_ids = [n.ref for n in w.nodes if n.ref != 0]
                self.guertel_way_nodes[w.id] = node_ids
                self.guertel_tags[w.id] = dict(w.tags)
                self.nodes_needed.update(node_ids)

    def node(self, n):
        if n.id in self.nodes_needed:
            self.node_coords[n.id] = (n.lon, n.lat)


print("=" * 60, flush=True)
print("Step 1 — Finding Gürtel ways...", flush=True)
t0 = time.time()

finder = GuertelFinder()
finder.apply_file(PBF)

print(f"  Gürtel ways found: {len(finder.guertel_way_nodes)}", flush=True)
print(f"  Unique nodes to collect: {len(finder.nodes_needed)}", flush=True)
print(f"  Nodes collected: {len(finder.node_coords)}", flush=True)

# Build LineStrings for each Gürtel way
guertel_lines = []
for wid, node_ids in finder.guertel_way_nodes.items():
    coords = [finder.node_coords[nid] for nid in node_ids if nid in finder.node_coords]
    if len(coords) >= 2:
        guertel_lines.append(LineString(coords))

if not guertel_lines:
    print("ERROR: No Gürtel ways found. Exiting.", flush=True)
    sys.exit(1)

guertel_union = unary_union(guertel_lines)

# 500 m buffer — approximate conversion: 1° ≈ 111 000 m at this latitude
BUFFER_DEG = 500.0 / 111000.0
guertel_buffer = guertel_union.buffer(BUFFER_DEG)

buffer_bounds = guertel_buffer.bounds  # (minx, miny, maxx, maxy)
print(f"Gürtel buffer bounds (deg): {buffer_bounds}", flush=True)
print(f"  Elapsed: {time.time()-t0:.1f}s", flush=True)

# ---------------------------------------------------------------------------
# Step 2 — One pass: collect all way geometries & find highways + PT routes
# ---------------------------------------------------------------------------
class Collector(osmium.SimpleHandler):
    """Second pass (with locations=True): build every way geometry, then
    filter highways by buffer and build PT route geometries."""

    def __init__(self, buffer_geom):
        super().__init__()
        self.buffer = buffer_geom
        self.way_geoms = {}        # way_id -> Shapely LineString
        self.way_tags = {}         # way_id -> full tags dict

        self.highway_features = []  # list of dicts for GeoDataFrame
        self.route_features = []    # list of dicts for GeoDataFrame

        # stats
        self._n_ways = 0
        self._n_highways = 0
        self._n_highway_in = 0
        self._n_routes = 0
        self._n_route_in = 0

    def way(self, w):
        self._n_ways += 1

        coords = [(n.lon, n.lat) for n in w.nodes if n.location_valid()]
        if len(coords) < 2:
            return

        line = LineString(coords)
        self.way_geoms[w.id] = line
        self.way_tags[w.id] = dict(w.tags)

        if "highway" not in w.tags:
            return

        self._n_highways += 1
        if not line.intersects(self.buffer):
            return

        self._n_highway_in += 1
        tags = dict(w.tags)
        row = {
            "geometry": line,
        }
        for k in ("name", "highway", "maxspeed", "lanes", "surface", "oneway"):
            row[k] = tags.get(k, None)
        self.highway_features.append(row)

    def relation(self, r):
        if r.tags.get("type") != "route":
            return
        route_val = r.tags.get("route", "")
        if route_val not in ("bus", "tram", "subway", "train",
                             "trolleybus", "light_rail"):
            return

        self._n_routes += 1

        # Collect member-way geometries
        member_lines = []
        for m in r.members:
            if m.type == "w" and m.ref in self.way_geoms:
                member_lines.append(self.way_geoms[m.ref])

        if not member_lines:
            return

        combined = unary_union(member_lines)
        if combined.is_empty or not combined.intersects(self.buffer):
            return

        self._n_route_in += 1

        # Normalise to MultiLineString
        if combined.geom_type == "LineString":
            geom = MultiLineString([combined])
        elif combined.geom_type == "MultiLineString":
            geom = combined
        else:
            return  # point / polygon — skip

        tags = dict(r.tags)
        row = {"geometry": geom}
        for k in ("ref", "name", "operator", "route"):
            row[k] = tags.get(k, None)
        self.route_features.append(row)


print("Step 2 — Collecting highways and PT routes (locations=True) ...", flush=True)
t1 = time.time()

collector = Collector(guertel_buffer)
collector.apply_file(PBF, locations=True)

print(f"  Total ways seen:               {collector._n_ways}", flush=True)
print(f"  Highway ways seen:             {collector._n_highways}", flush=True)
print(f"  Highway ways in buffer:        {collector._n_highway_in}", flush=True)
print(f"  PT route relations seen:       {collector._n_routes}", flush=True)
print(f"  PT routes crossing buffer:     {collector._n_route_in}", flush=True)
print(f"  Elapsed: {time.time()-t1:.1f}s", flush=True)

# ---------------------------------------------------------------------------
# Step 3 — Build GeoDataFrames, reproject, write GeoPackage
# ---------------------------------------------------------------------------
if not collector.highway_features:
    print("WARNING: no highway features to write!", flush=True)
if not collector.route_features:
    print("WARNING: no PT route features to write!", flush=True)

print("Step 3 — Writing GeoPackage ...", flush=True)
t2 = time.time()

gdf_hw = gpd.GeoDataFrame(collector.highway_features, crs="EPSG:4326")
gdf_pt = gpd.GeoDataFrame(collector.route_features, crs="EPSG:4326")

# Reproject to Austria's standard projected CRS
gdf_hw = gdf_hw.to_crs("EPSG:31287")
gdf_pt = gdf_pt.to_crs("EPSG:31287")

OUT = "/work/vienna_network.gpkg"

# Remove output if it exists (GeoPackage append mode can be tricky)
if os.path.exists(OUT):
    os.remove(OUT)

gdf_hw.to_file(OUT, layer="highways", driver="GPKG", engine="pyogrio")
gdf_pt.to_file(OUT, layer="pt_routes", driver="GPKG", engine="pyogrio")

print(f"  Highways layer: {len(gdf_hw)} features", flush=True)
print(f"  PT routes layer: {len(gdf_pt)} features", flush=True)
print(f"  Output: {OUT}", flush=True)
print(f"  Elapsed: {time.time()-t2:.1f}s", flush=True)
print("=" * 60, flush=True)
print("Done.", flush=True)

Wrote /work/solve.py (7571 chars).

6:17

Let me first download the PBF file, then run the script.

Actually, wait. The script expects the PBF to already be at `/work/austria-latest.osm.pbf`. I need to download it first. Let me do that.

The PBF is ~90 MB or more, so it might take some time. Let me download it in the background.

6:17

Bash cd /work && wget -q -O austria-latest.osm.pbf "https://download.geofabrik.de/eur...

bash: line 1: wget: command not found

6:24

Let me use curl instead.

6:24

Bash cd /work && curl -sLo austria-latest.osm.pbf "https://download.geofabrik.de/euro...

cd /work && curl -sLo austria-latest.osm.pbf "https://download.geofabrik.de/europe/austria-latest.osm.pbf" &
echo "Download started in background, PID=$!"

Download started in background, PID=97

bash: line 1: curl: command not found

6:30

Neither wget nor curl is available. Let me try python's urllib to download.

6:30

Bash python3 -c "

python3 -c "
import urllib.request
import sys

url = 'https://download.geofabrik.de/europe/austria-latest.osm.pbf'
print(f'Downloading {url} ...')
urllib.request.urlretrieve(url, '/work/austria-latest.osm.pbf')
print('Download complete.')
" 2>&1

Downloading https://download.geofabrik.de/europe/austria-latest.osm.pbf ...
Download complete.

7:05

Good, the PBF is downloaded. Now let's run the script. The PBF for Austria is about 200+ MB, so it might take a while. Let me check the file size first.

7:05

Bash ls -lh /work/austria-latest.osm.pbf

-rw-r--r-- 1 agent agent 765M Jun 15 16:47 /work/austria-latest.osm.pbf

7:14

Let me think about the approach. The first approach requires two passes over the PBF - which means loading the file twice. That would be very slow for a 765MB file. Let me reconsider.

Actually, the first pass is just to find Gürtel ways and their node coordinates. But in the second pass, we need `locations=True` to get all way geometries.

Wait, there's a problem with my approach: Step 1 does a first pass without `locations=True`, and Step 2 does a second pass with `locations=True`. This means the whole 765MB file gets read twice. The second pass (with locations=True) is the main heavy one anyway.

Let me think if we can do this in a single pass... The issue is: to compute the buffer from Gürtel ways, we need Gürtel way geometries. And to determine if a highway is in the buffer, we need the buffer. So we can't know the buffer until all ways are processed.

One approach: First pass finds Gürtel ways and nodes (locations=False but we store nodes for Gürtel ways), then compute buffer, then second pass (with locations=True) filters highways and collects PT routes.

Actually, my code already does exactly this. Let me just run it and see if it works. But it will take a while with 765MB.

Actually, I realize there's an issue: the first pass only collects node coordinates for Gürtel ways. But I should use `locations=False` (default) for the first pass - which is what I do. Then in the handler, `way()` only gets node refs, not coordinates. And `node()` gets lon/lat. So I track which nodes I need and collect them.

Wait, but `finder.apply_file(PBF)` without `locations=True` means the way handler won't have coordinates, but node handler will fire. That's correct for the first pass.

Then in the second pass with `locations=True`, both ways and nodes get coordinates automatically, so the way handler sees `n.location_valid()` - that's correct.

Let me run it. This could take a while...

7:14

Bash cd /work && python3 solve.py 2>&1

============================================================
Step 1 — Finding Gürtel ways...
  Gürtel ways found: 398
  Unique nodes to collect: 1695
  Nodes collected: 0
ERROR: No Gürtel ways found. Exiting.

11:01

The issue is that in the first pass, we're using `osmium.SimpleHandler` (without `locations=True`), so the `node()` method only gets called for nodes that are read but the handler doesn't have locations - but wait, `SimpleHandler` with `locations=False` (default) should still call `node()` with the node data including coordinates. Let me check: when `locations=True`, the `way()` method automatically provides location for each node reference. When `locations=False`, `node()` is still called with the node coordinates.

The issue is: `finder.apply_file(PBF)` by default doesn't enable locations. But the `node()` callback is still called. Let me debug.

Actually wait, the problem is that in the first pass (SimpleHandler without locations=True), the `node()` handler IS called. But the `node()` method checks `if n.id in self.nodes_needed`. The nodes_needed is populated in the `way()` callback. But the issue is that in a SimpleHandler without `locations=True`, the nodes might be processed BEFORE the ways. Wait, no.

Actually in osmium, the order of callbacks depends on the file format. In PBF files, nodes come before ways. So when `way()` is called, nodes have already been processed. That's the problem! The `nodes_needed` set is only populated during `way()` processing, but `node()` is called before `way()`.

Wait actually I think that's not right either. Let me think more carefully.

In OSM PBF files, the data is ordered: nodes first, then ways, then relations. So when using `SimpleHandler` (without locations=True), the callbacks will be:
1. node() for all nodes
2. way() for all ways
3. relation() for all relations

But with `locations=True`, the handler buffers node locations and makes them available during way() calls.

So for the first pass, the node() callback is called first for ALL nodes, but at that point `self.nodes_needed` is empty. Then way() is called and populates `self.nodes_needed`, but it's too late - nodes have already been processed.

I need a different approach. Options:

1. Do two passes: first find Gürtel way node IDs, then run a second pass with those node IDs to get coordinates.
2. Use a single pass with `locations=True` to get the Gürtel geometries directly.
3. Read the file twice: first pass just to get node IDs, then re-read to get those node coordinates.

Actually, the simplest fix: In the first pass (without locations), we can still capture node coordinates for ALL nodes (or at least store them), then in way() we can check. But that would require storing all nodes in Austria (memory heavy).

Better approach: Use two passes.
- Pass 1: Just find Gürtel way IDs and their node IDs (SimpleHandler without locations, store node IDs from way())
- Pass 2: Read with locations=True and collect only those nodes' coordinates. But wait, `node()` is still called before `way()` in pass 2...

Actually, with `locations=True`, the handler enables "locations on ways" mode, which means way() callbacks get node locations automatically. But the node() callback is still called first for all nodes.

Let me restructure: 
- Use just ONE pass with `locations=True`. Find the Gürtel ways when their ways are processed, capture their geometries directly.

Actually, for the first pass we only need the buffer geometry. Let me use a single pass with locations=True:

1. In `way()`: if name ends with Gürtel and has highway, store the LineString
2. After processing, compute buffer from those lines
3. But we can't use that buffer during the same pass...

Better approach: Two complete passes, both with locations=True.

Pass 1 (GuertelFinder with locations=True):
- way(): if name ends Gürtel, build LineString from node locations, store it
- node(): just needed by osmium for locations
- After: compute buffer

Pass 2 (Collector with locations=True):
- way(): if highway, check intersection with buffer
- relation(): check intersection with buffer

Let me rewrite.

11:01

Write /work/solve.py

content · 199 lines · py

#!/usr/bin/env python3
"""
Extract highways intersecting a 500m buffer around Vienna's Gürtel,
and public-transport route relations crossing that same buffer,
from the Geofabrik Austria PBF.

Output: vienna_network.gpkg (layers: highways, pt_routes)
CRS: EPSG:31287 (MGI / Austria Lambert)
"""

import osmium
import sys
import os
import time

from shapely.geometry import LineString, MultiLineString
from shapely.ops import unary_union
import geopandas as gpd

PBF = "/work/austria-latest.osm.pbf"

# ===================================================================
# PASS 1 — Find Gürtel ways (name ends with 'Gürtel') with locations
# ===================================================================
class GuertelFinder(osmium.SimpleHandler):
    def __init__(self):
        super().__init__()
        self.guertel_lines = []

    def way(self, w):
        if "highway" not in w.tags:
            return
        name = w.tags.get("name", "")
        if not name.endswith("Gürtel"):
            return
        coords = [(n.lon, n.lat) for n in w.nodes if n.location_valid()]
        if len(coords) >= 2:
            self.guertel_lines.append(LineString(coords))

print("=" * 60, flush=True)
print("PASS 1 — Finding Gürtel ways...", flush=True)
t0 = time.time()

finder = GuertelFinder()
finder.apply_file(PBF, locations=True)

print(f"  Gürtel LineStrings: {len(finder.guertel_lines)}", flush=True)

if not finder.guertel_lines:
    print("ERROR: No Gürtel ways found.", flush=True)
    sys.exit(1)

guertel_union = unary_union(finder.guertel_lines)

# 500 m buffer — approximate: 1° ≈ 111 000 m at Vienna latitude
BUFFER_DEG = 500.0 / 111000.0
guertel_buffer = guertel_union.buffer(BUFFER_DEG)

print(f"  Buffer bounds (deg): {guertel_buffer.bounds}", flush=True)
print(f"  Elapsed: {time.time()-t0:.1f}s", flush=True)

# ===================================================================
# PASS 2 — Collect highway ways + PT route relations in buffer
# ===================================================================
class Collector(osmium.SimpleHandler):
    def __init__(self, buffer_geom):
        super().__init__()
        self.buffer = buffer_geom
        self.way_geoms = {}        # way_id -> Shapely LineString
        self.way_tags = {}         # way_id -> full tags dict

        self.highway_features = []  # list of dicts for GeoDataFrame
        self.route_features = []    # list of dicts for GeoDataFrame

        # stats
        self._n_ways = 0
        self._n_highways = 0
        self._n_highway_in = 0
        self._n_routes = 0
        self._n_route_in = 0

    def way(self, w):
        self._n_ways += 1

        coords = [(n.lon, n.lat) for n in w.nodes if n.location_valid()]
        if len(coords) < 2:
            return

        line = LineString(coords)
        self.way_geoms[w.id] = line
        self.way_tags[w.id] = dict(w.tags)

        if "highway" not in w.tags:
            return

        self._n_highways += 1
        if not line.intersects(self.buffer):
            return

        self._n_highway_in += 1
        tags = dict(w.tags)
        row = {"geometry": line}
        for k in ("name", "highway", "maxspeed", "lanes", "surface", "oneway"):
            row[k] = tags.get(k, None)
        self.highway_features.append(row)

    def relation(self, r):
        if r.tags.get("type") != "route":
            return
        route_val = r.tags.get("route", "")
        if route_val not in ("bus", "tram", "subway", "train",
                             "trolleybus", "light_rail"):
            return

        self._n_routes += 1

        # Collect member-way geometries
        member_lines = []
        for m in r.members:
            if m.type == "w" and m.ref in self.way_geoms:
                member_lines.append(self.way_geoms[m.ref])

        if not member_lines:
            return

        combined = unary_union(member_lines)
        if combined.is_empty or not combined.intersects(self.buffer):
            return

        self._n_route_in += 1

        # Normalise to MultiLineString
        if combined.geom_type == "LineString":
            geom = MultiLineString([combined])
        elif combined.geom_type == "MultiLineString":
            geom = combined
        else:
            return  # point / polygon — skip

        tags = dict(r.tags)
        row = {"geometry": geom}
        for k in ("ref", "name", "operator", "route"):
            row[k] = tags.get(k, None)
        self.route_features.append(row)


print("PASS 2 — Collecting highways and PT routes...", flush=True)
t1 = time.time()

collector = Collector(guertel_buffer)
collector.apply_file(PBF, locations=True)

print(f"  Total ways seen:               {collector._n_ways}", flush=True)
print(f"  Highway ways seen:             {collector._n_highways}", flush=True)
print(f"  Highway ways in buffer:        {collector._n_highway_in}", flush=True)
print(f"  PT route relations seen:       {collector._n_routes}", flush=True)
print(f"  PT routes crossing buffer:     {collector._n_route_in}", flush=True)
print(f"  Elapsed: {time.time()-t1:.1f}s", flush=True)

# ===================================================================
# PASS 2b — Re-read relations for PT routes whose ways were already
#           collected earlier in ways but relations came after ways?
# Actually osmium calls way() then relation() in order, so all ways
# are already collected by the time relations are processed. Good.
# ===================================================================

# ===================================================================
# Step 3 — Build GeoDataFrames, reproject, write GeoPackage
# ===================================================================
if not collector.highway_features:
    print("WARNING: no highway features to write!", flush=True)
if not collector.route_features:
    print("WARNING: no PT route features to write!", flush=True)

print("STEP 3 — Writing GeoPackage ...", flush=True)
t2 = time.time()

gdf_hw = gpd.GeoDataFrame(collector.highway_features, crs="EPSG:4326")
gdf_pt = gpd.GeoDataFrame(collector.route_features, crs="EPSG:4326")

# Reproject to Austria's standard projected CRS
gdf_hw = gdf_hw.to_crs("EPSG:31287")
gdf_pt = gdf_pt.to_crs("EPSG:31287")

OUT = "/work/vienna_network.gpkg"

# Remove output if it exists (GeoPackage append mode can be tricky)
if os.path.exists(OUT):
    os.remove(OUT)

gdf_hw.to_file(OUT, layer="highways", driver="GPKG", engine="pyogrio")
gdf_pt.to_file(OUT, layer="pt_routes", driver="GPKG", engine="pyogrio")

print(f"  Highways layer: {len(gdf_hw)} features", flush=True)
print(f"  PT routes layer: {len(gdf_pt)} features", flush=True)
print(f"  Output: {OUT}", flush=True)
print(f"  Elapsed: {time.time()-t2:.1f}s", flush=True)
print("=" * 60, flush=True)
print("Done.", flush=True)

Wrote /work/solve.py (6809 chars).

11:12

Wait, actually looking more carefully at osmium's SimpleHandler: the handler processes nodes first, then ways, then relations. But with `locations=True`, the node coordinates are available when `way()` is called. So in PASS 1, the `GuertelFinder` handler should get the node coordinates through `n.location_valid()` in the `way()` call. The issue was that in the first version I split it into two passes: first finding node IDs, then collecting coordinates. With `locations=True` the coordinates are already embedded in the way nodes.

But in the first run, the GuertelFinder didn't have `locations=True` initially - it was `finder.apply_file(PBF)` without `locations=True`. That's why nodes collected was 0. Let me check: actually wait, the first version had `finder.apply_file(PBF)` without `locations=True` for the first pass, but then the way handler collected node IDs and the node handler checked `if n.id in self.nodes_needed` - but since processing order is nodes first, then ways, the node handler would have been called BEFORE the way handler sets `self.nodes_needed`. So the node handler never saw any needed nodes.

So I fixed it by using `locations=True` and reconstructing coordinates directly in `way()`. That should work now. Let me run it.

11:12

Bash cd /work && python3 solve.py 2>&1

============================================================
PASS 1 — Finding Gürtel ways...
Traceback (most recent call last):
  File "/work/solve.py", line 45, in <module>
    finder.apply_file(PBF, locations=True)
    ~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.14/site-packages/osmium/simple_handler.py", line 61, in apply_file
    self._apply_object(filename, locations, idx, filters)
    ~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.14/site-packages/osmium/simple_handler.py", line 96, in _apply_object
    apply(rd, *handlers)
    ~~~~~^^^^^^^^^^^^^^^
  File "/work/solve.py", line 36, in way
    coords = [(n.lon, n.lat) for n in w.nodes if n.location_valid()]
                                                 ^^^^^^^^^^^^^^^^
AttributeError: 'NodeRef' object has no attribute 'location_valid'

11:25

The issue is that `n.location_valid()` doesn't exist for NodeRef objects. When using `locations=True`, we can check if the node has valid coordinates via `n.lon` and checking for -1 sentinel, but actually looking at osmium docs, when `locations=True`, `n.lon` and `n.lat` exist on NodeRef. We need to check if lon != 0.0 or something like that. Actually the standard approach is to just check if lon != 0.0 and lat != 0.0, or we can use a try/except. Let me look at the osmium documentation approach.

Actually, in osmium, `NodeRef` with `locations=True` has `lon`, `lat` properties. If the location is not known, it returns `0.0`. So we can check `if n.lon != 0.0 or n.lat != 0.0`. But more precisely, we should check that the node actually has a valid location. A common pattern is:

```python
coords = [(n.lon, n.lat) for n in w.nodes if n.lon != 0.0 or n.lat != 0.0]
```

Actually that might filter out some nodes at 0,0 (unlikely but possible). A better approach is `if abs(n.lon) > 0.001 or abs(n.lat) > 0.001` or simply try/except.

Let me just use the approach of checking if lon != 0.0 or lat != 0.0.

11:25

Edit /work/solve.py

old_string not found in /work/solve.py

11:33

Let me read the file to get the exact content.

11:33

Read /work/solve.py

     1	#!/usr/bin/env python3
     2	"""
     3	Extract highways intersecting a 500m buffer around Vienna's Gürtel,
     4	and public-transport route relations crossing that same buffer,
     5	from the Geofabrik Austria PBF.
     6	
     7	Output: vienna_network.gpkg (layers: highways, pt_routes)
     8	CRS: EPSG:31287 (MGI / Austria Lambert)
     9	"""
    10	
    11	import osmium
    12	import sys
    13	import os
    14	import time
    15	
    16	from shapely.geometry import LineString, MultiLineString
    17	from shapely.ops import unary_union
    18	import geopandas as gpd
    19	
    20	PBF = "/work/austria-latest.osm.pbf"
    21	
    22	# ===================================================================
    23	# PASS 1 — Find Gürtel ways (name ends with 'Gürtel') with locations
    24	# ===================================================================
    25	class GuertelFinder(osmium.SimpleHandler):
    26	    def __init__(self):
    27	        super().__init__()
    28	        self.guertel_lines = []
    29	
    30	    def way(self, w):
    31	        if "highway" not in w.tags:
    32	            return
    33	        name = w.tags.get("name", "")
    34	        if not name.endswith("Gürtel"):
    35	            return
    36	        coords = [(n.lon, n.lat) for n in w.nodes if n.location_valid()]
    37	        if len(coords) >= 2:
    38	            self.guertel_lines.append(LineString(coords))
    39	
    40	print("=" * 60, flush=True)
    41	print("PASS 1 — Finding Gürtel ways...", flush=True)
    42	t0 = time.time()
    43	
    44	finder = GuertelFinder()
    45	finder.apply_file(PBF, locations=True)
    46	
    47	print(f"  Gürtel LineStrings: {len(finder.guertel_lines)}", flush=True)
    48	
    49	if not finder.guertel_lines:
    50	    print("ERROR: No Gürtel ways found.", flush=True)
    51	    sys.exit(1)
    52	
    53	guertel_union = unary_union(finder.guertel_lines)
    54	
    55	# 500 m buffer — approximate: 1° ≈ 111 000 m at Vienna latitude
    56	BUFFER_DEG = 500.0 / 111000.0
    57	guertel_buffer = guertel_union.buffer(BUFFER_DEG)
    58	
    59	print(f"  Buffer bounds (deg): {guertel_buffer.bounds}", flush=True)
    60	print(f"  Elapsed: {time.time()-t0:.1f}s", flush=True)
<system-reminder>Showed lines 1-60 of 199. Re-call Read with offset=61 to continue, or raise limit.</system-reminder>

11:38

The issue is that `location_valid()` is a method on `NodeRef` only when using `locations=True` parameter in the handler. Let me check the osmium documentation. Actually, when using `locations=True`, `NodeRef` objects get `lon` and `lat` attributes that return 0.0 if invalid. So checking for non-zero lon/lat should work. But the exact attribute name might be different.

Let me try a different approach - checking if the node reference is valid by seeing if both lon and lat are non-zero.

11:38

Read /work/solve.py

    61	
    62	# ===================================================================
    63	# PASS 2 — Collect highway ways + PT route relations in buffer
    64	# ===================================================================
    65	class Collector(osmium.SimpleHandler):
    66	    def __init__(self, buffer_geom):
    67	        super().__init__()
    68	        self.buffer = buffer_geom
    69	        self.way_geoms = {}        # way_id -> Shapely LineString
    70	        self.way_tags = {}         # way_id -> full tags dict
    71	
    72	        self.highway_features = []  # list of dicts for GeoDataFrame
    73	        self.route_features = []    # list of dicts for GeoDataFrame
    74	
    75	        # stats
    76	        self._n_ways = 0
    77	        self._n_highways = 0
    78	        self._n_highway_in = 0
    79	        self._n_routes = 0
    80	        self._n_route_in = 0
    81	
    82	    def way(self, w):
    83	        self._n_ways += 1
    84	
    85	        coords = [(n.lon, n.lat) for n in w.nodes if n.location_valid()]
    86	        if len(coords) < 2:
    87	            return
    88	
    89	        line = LineString(coords)
    90	        self.way_geoms[w.id] = line
    91	        self.way_tags[w.id] = dict(w.tags)
    92	
    93	        if "highway" not in w.tags:
    94	            return
    95	
    96	        self._n_highways += 1
    97	        if not line.intersects(self.buffer):
    98	            return
    99	
   100	        self._n_highway_in += 1
   101	        tags = dict(w.tags)
   102	        row = {"geometry": line}
   103	        for k in ("name", "highway", "maxspeed", "lanes", "surface", "oneway"):
   104	            row[k] = tags.get(k, None)
   105	        self.highway_features.append(row)
   106	
   107	    def relation(self, r):
   108	        if r.tags.get("type") != "route":
   109	            return
   110	        route_val = r.tags.get("route", "")
   111	        if route_val not in ("bus", "tram", "subway", "train",
   112	                             "trolleybus", "light_rail"):
   113	            return
   114	
   115	        self._n_routes += 1
   116	
   117	        # Collect member-way geometries
   118	        member_lines = []
   119	        for m in r.members:
   120	            if m.type == "w" and m.ref in self.way_geoms:
   121	                member_lines.append(self.way_geoms[m.ref])
   122	
   123	        if not member_lines:
   124	            return
   125	
   126	        combined = unary_union(member_lines)
   127	        if combined.is_empty or not combined.intersects(self.buffer):
   128	            return
   129	
   130	        self._n_route_in += 1
   131	
   132	        # Normalise to MultiLineString
   133	        if combined.geom_type == "LineString":
   134	            geom = MultiLineString([combined])
   135	        elif combined.geom_type == "MultiLineString":
   136	            geom = combined
   137	        else:
   138	            return  # point / polygon — skip
   139	
   140	        tags = dict(r.tags)
   141	        row = {"geometry": geom}
   142	        for k in ("ref", "name", "operator", "route"):
   143	            row[k] = tags.get(k, None)
   144	        self.route_features.append(row)
   145	
   146	
   147	print("PASS 2 — Collecting highways and PT routes...", flush=True)
   148	t1 = time.time()
   149	
   150	collector = Collector(guertel_buffer)
   151	collector.apply_file(PBF, locations=True)
   152	
   153	print(f"  Total ways seen:               {collector._n_ways}", flush=True)
   154	print(f"  Highway ways seen:             {collector._n_highways}", flush=True)
   155	print(f"  Highway ways in buffer:        {collector._n_highway_in}", flush=True)
   156	print(f"  PT route relations seen:       {collector._n_routes}", flush=True)
   157	print(f"  PT routes crossing buffer:     {collector._n_route_in}", flush=True)
   158	print(f"  Elapsed: {time.time()-t1:.1f}s", flush=True)
   159	
   160	# ===================================================================
   161	# PASS 2b — Re-read relations for PT routes whose ways were already
   162	#           collected earlier in ways but relations came after ways?
   163	# Actually osmium calls way() then relation() in order, so all ways
   164	# are already collected by the time relations are processed. Good.
   165	# ===================================================================
   166	
   167	# ===================================================================
   168	# Step 3 — Build GeoDataFrames, reproject, write GeoPackage
   169	# ===================================================================
   170	if not collector.highway_features:
   171	    print("WARNING: no highway features to write!", flush=True)
   172	if not collector.route_features:
   173	    print("WARNING: no PT route features to write!", flush=True)
   174	
   175	print("STEP 3 — Writing GeoPackage ...", flush=True)
   176	t2 = time.time()
   177	
   178	gdf_hw = gpd.GeoDataFrame(collector.highway_features, crs="EPSG:4326")
   179	gdf_pt = gpd.GeoDataFrame(collector.route_features, crs="EPSG:4326")
   180	
   181	# Reproject to Austria's standard projected CRS
   182	gdf_hw = gdf_hw.to_crs("EPSG:31287")
   183	gdf_pt = gdf_pt.to_crs("EPSG:31287")
   184	
   185	OUT = "/work/vienna_network.gpkg"
   186	
   187	# Remove output if it exists (GeoPackage append mode can be tricky)
   188	if os.path.exists(OUT):
   189	    os.remove(OUT)
   190	
   191	gdf_hw.to_file(OUT, layer="highways", driver="GPKG", engine="pyogrio")
   192	gdf_pt.to_file(OUT, layer="pt_routes", driver="GPKG", engine="pyogrio")
   193	
   194	print(f"  Highways layer: {len(gdf_hw)} features", flush=True)
   195	print(f"  PT routes layer: {len(gdf_pt)} features", flush=True)
   196	print(f"  Output: {OUT}", flush=True)
   197	print(f"  Elapsed: {time.time()-t2:.1f}s", flush=True)
   198	print("=" * 60, flush=True)
   199	print("Done.", flush=True)

11:45

The issue is that `n.location_valid()` doesn't exist on `NodeRef` objects. With `locations=True`, the way nodes have `.lon` and `.lat` directly, but not `.location_valid()`. Let me check how osmium handles this - when `locations=True` is passed, the way nodes should have their positions resolved. But `location_valid()` is a method on the `Location` object not on `NodeRef` directly.

Actually, let me think. In osmium, when `locations=True` is used in `apply_file`, the node coordinates are stored internally and NodeRef objects get `.lon` and `.lat` directly. But if a node's location hasn't been resolved (e.g. the node is not in the file), `.lon` and `.lat` could be 0.0.

Actually, looking at the osmium docs more carefully: when `locations=True` is used, the handler processes nodes first, then ways. So the `NodeRef` objects on ways will have `.lon` and `.lat` populated. But there's no `location_valid()` method on `NodeRef`. We can check if lon/lat are both 0.0, or if the node ref is 0.

Let me just use a direct check: `if n.ref != 0 and abs(n.lon) > 0.0 or abs(n.lat) > 0.0` or simply `if True` since coordinates should be populated with locations=True.

11:45

Edit /work/solve.py

Edited /work/solve.py: replaced 2 occurrence(s).

11:53

Wait, `n.ref` is always the node ID for NodeRef objects. With `locations=True`, the reader will populate lon/lat on the NodeRef objects for all nodes that exist. So we should check if the location is valid by testing if lon/lat are not zero. Let me check the osmium docs for NodeRef — with `locations=True`, nodes are stored in a location cache and NodeRef objects get `.lon` and `.lat` attributes. If a node wasn't found, the coordinates would be 0.0/0.0. Let me use `abs(n.lon) > 0.0 or abs(n.lat) > 0.0` instead.

11:53

Edit /work/solve.py

Edited /work/solve.py: replaced 2 occurrence(s).

12:13

Wait, `n.ref != 0` should work because when locations=True, the nodes in a way will have coordinates. But `n.lon` returns 0.0 for invalid nodes, so both approaches should work. Let me check if osmium NodeRef has `.ref` attribute - yes it does. But the issue was that `n.location_valid()` is not a method. Let me see if `.ref != 0` works.

Actually, the simplest approach: when locations=True, each node in w.nodes has .lat and .lon that are 0.0 if invalid. So checking `abs(n.lon) > 0.0 or abs(n.lat) > 0.0` would work, but for nodes at (0,0) (in the Gulf of Guinea) it could be wrong. In practice Vienna is at ~48°N, 16°E, so this is fine.

But actually the issue might also be that NodeRef doesn't even have those attributes when locations=True. Let me check - in osmium, when you use locations=True, the Reader provides a NodeLocationsForWays handler, and the way nodes get their lat/lon populated as attributes on the NodeRef objects. So `n.lon` and `n.lat` should work.

Wait, looking at the error more carefully:
```
AttributeError: 'NodeRef' object has no attribute 'location_valid'
```

So `n` is a NodeRef, and `.lon` and `.lat` should exist. Let me just go with checking if lon and lat are non-zero.

Let me run the script again.

12:13

Bash cd /work && python3 solve.py 2>&1

<exit_code>-1</exit_code>
<error>Command timed out after 600000ms</error>

22:26

The script timed out after 600s (10 minutes). The PBF file is 765 MB so a double pass with `locations=True` can be slow. Let me try a more efficient approach:

1. First pass: just find Gürtel way IDs and their node IDs, but without locations=True (just metadata). Then collect those specific node coordinates in a second quick pass.
2. Second pass with locations=True for full geometry collection.

Or better: use a single pass approach where we first do a quick scan for Gürtel ways and their node coords, then buffer. But PBF doesn't support pre-filtering easily.

Actually, osmium's `SimpleHandler` with `locations=True` stores all node locations in memory, which could be heavy for the full Austria file (765MB of PBF = maybe several GB of RAM). But it should still work... The timeout might be from the download happening in parallel? No, the download finished.

Wait, the first pass timed out. The issue is that `locations=True` in osmium keeps all nodes in memory, and with 765MB PBF for all of Austria, that's a lot of nodes. Let me think of a better approach.

Alternative approach: use `osmium` to do a first pass WITHOUT locations to get Gürtel way IDs and the node IDs they reference. Then use a second osmium pass with `locations=True` but only for the needed nodes — but osmium doesn't support selective location loading.

Better alternative: Use `osmium` CLI tool to extract first, then process. But we may not have osmium CLI installed.

Let me check what's available:

22:26

Bash osmium --version 2>/dev/null || echo "no osmium CLI"

no osmium CLI

22:55

Bash apt-get update -qq && apt-get install -y -qq osmium-tool 2>&1 | tail -5

E: List directory /var/lib/apt/lists/partial is missing. - Acquire (13: Permission denied)

23:22

Can't install osmium-tool. Let me think of a better approach using the Python osmium library that doesn't require `locations=True`.

The problem: `locations=True` causes osmium to load all nodes into memory and resolves them for every way. For a 765MB PBF, this is very slow.

Better approach:
1. First pass: Use osmium without locations=True, just collect way IDs and node IDs for Gürtel ways.
2. Second pass: Use a NodeCache or read nodes to get coordinates for just those node IDs.
3. Then do the actual extraction with locations=True but only after building a node cache.

Actually, another approach: use `osmium` Python library but build a custom handler that:
1. In the `node()` callback, store coordinates for all nodes.
2. In the `way()` callback, build the geometry from already-seen nodes.
3. But this means storing ALL node coordinates - that's memory heavy but might work for Austria (not the whole planet).

Actually, for Austria (765MB PBF), the whole node set might be 300-400M coordinates, which could be ~10GB. Not ideal.

Let me try a different approach: use OSMnx! OSMnx can query from Overpass API for just the Gürtel area and network. That's much more targeted.

Or use DuckDB + spatial extension to query the PBF directly via ogr/gdal.

Actually wait - the simplest approach: install `osmium` CLI tool via pip? No, `osmium` Python package is already installed. The issue is `locations=True` is slow because it resolves all nodes.

Let me look at how to use osmium Python more efficiently. The key insight: we can use `osmium.io.Reader` with `osmium.osm.NodeLocationForWays` handler or we can use `osmium.index` to build a node location cache.

Actually, the best approach for large files: use `osmium.index.create_map("sparse_mmap_array", "nodes.cache")` and `NodeLocationForWays` which will lazily store nodes as they're read.

Let me rewrite the script more efficiently.

23:22

Write /work/solve.py

content · 249 lines · py

#!/usr/bin/env python3
"""
Extract highways intersecting a 500m buffer around Vienna's Gürtel,
and public-transport route relations crossing that same buffer,
from the Geofabrik Austria PBF.

Output: vienna_network.gpkg (layers: highways, pt_routes)
CRS: EPSG:31287 (MGI / Austria Lambert)

Efficient approach using osmium NodeLocationForWays with sparse file index.
"""

import osmium
from osmium import osm
from osmium.index import create_map
import sys
import os
import time

from shapely.geometry import LineString, MultiLineString
from shapely.ops import unary_union
import geopandas as gpd

PBF = "/work/austria-latest.osm.pbf"
NODE_CACHE = "/work/nodes.cache"

# ===================================================================
# PASS 1 — Find Gürtel ways (name ends with 'Gürtel')
# Without locations — just collect way IDs and their node IDs
# ===================================================================
class GuertelCollector(osmium.SimpleHandler):
    """First pass: collect node IDs of Gürtel ways."""
    def __init__(self):
        super().__init__()
        self.guertel_way_node_ids = []  # list of lists of node refs

    def way(self, w):
        if "highway" not in w.tags:
            return
        name = w.tags.get("name", "")
        if not name.endswith("Gürtel"):
            return
        node_ids = [n.ref for n in w.nodes if n.ref != 0]
        if len(node_ids) >= 2:
            self.guertel_way_node_ids.append(node_ids)

print("=" * 60, flush=True)
print("PASS 1 — Finding Gürtel ways...", flush=True)
t0 = time.time()

gcol = GuertelCollector()
gcol.apply_file(PBF, locations=False)

print(f"  Gürtel ways found: {len(gcol.guertel_way_node_ids)}", flush=True)

if not gcol.guertel_way_node_ids:
    print("ERROR: No Gürtel ways found.", flush=True)
    sys.exit(1)

# Collect all needed node IDs
needed_nodes = set()
for nids in gcol.guertel_way_node_ids:
    needed_nodes.update(nids)
print(f"  Unique node refs needed: {len(needed_nodes)}", flush=True)

# ===================================================================
# PASS 1b — Read all nodes, keep only those we need
# ===================================================================
class NodeGrabber(osmium.SimpleHandler):
    def __init__(self, needed):
        super().__init__()
        self.needed = needed
        self.node_coords = {}  # id -> (lon, lat)

    def node(self, n):
        if n.id in self.needed:
            self.node_coords[n.id] = (n.lon, n.lat)

print("PASS 1b — Reading node coordinates...", flush=True)
t1a = time.time()

ng = NodeGrabber(needed_nodes)
ng.apply_file(PBF, locations=False)

print(f"  Nodes found: {len(ng.node_coords)}", flush=True)
print(f"  Elapsed: {time.time()-t1a:.1f}s", flush=True)

# Build Gürtel lines
guertel_lines = []
for nids in gcol.guertel_way_node_ids:
    coords = [ng.node_coords[nid] for nid in nids if nid in ng.node_coords]
    if len(coords) >= 2:
        guertel_lines.append(LineString(coords))

if not guertel_lines:
    print("ERROR: Could not build any Gürtel geometries.", flush=True)
    sys.exit(1)

guertel_union = unary_union(guertel_lines)

# 500 m buffer — approximate: 1° ≈ 111 000 m at Vienna latitude
BUFFER_DEG = 500.0 / 111000.0
guertel_buffer = guertel_union.buffer(BUFFER_DEG)

print(f"  Buffer bounds (deg): {guertel_buffer.bounds}", flush=True)
print(f"  Total elapsed: {time.time()-t0:.1f}s", flush=True)

# ===================================================================
# PASS 2 — Read the whole file with NodeLocationForWays to get all
#          way geometries, then filter highways + PT routes
# ===================================================================
class Collector(osmium.SimpleHandler):
    def __init__(self, buffer_geom):
        super().__init__()
        self.buffer = buffer_geom
        self.way_geoms = {}        # way_id -> Shapely LineString
        self.way_tags = {}         # way_id -> full tags dict

        self.highway_features = []  # list of dicts for GeoDataFrame
        self.route_features = []    # list of dicts for GeoDataFrame

        # stats
        self._n_ways = 0
        self._n_highways = 0
        self._n_highway_in = 0
        self._n_routes = 0
        self._n_route_in = 0

    def way(self, w):
        self._n_ways += 1

        coords = [(n.lon, n.lat) for n in w.nodes if n.ref != 0 and (abs(n.lon) > 0.0 or abs(n.lat) > 0.0)]
        if len(coords) < 2:
            return

        line = LineString(coords)
        self.way_geoms[w.id] = line
        self.way_tags[w.id] = dict(w.tags)

        if "highway" not in w.tags:
            return

        self._n_highways += 1
        if not line.intersects(self.buffer):
            return

        self._n_highway_in += 1
        tags = dict(w.tags)
        row = {"geometry": line}
        for k in ("name", "highway", "maxspeed", "lanes", "surface", "oneway"):
            row[k] = tags.get(k, None)
        self.highway_features.append(row)

    def relation(self, r):
        if r.tags.get("type") != "route":
            return
        route_val = r.tags.get("route", "")
        if route_val not in ("bus", "tram", "subway", "train",
                             "trolleybus", "light_rail"):
            return

        self._n_routes += 1

        # Collect member-way geometries
        member_lines = []
        for m in r.members:
            if m.type == "w" and m.ref in self.way_geoms:
                member_lines.append(self.way_geoms[m.ref])

        if not member_lines:
            return

        combined = unary_union(member_lines)
        if combined.is_empty or not combined.intersects(self.buffer):
            return

        self._n_route_in += 1

        # Normalise to MultiLineString
        if combined.geom_type == "LineString":
            geom = MultiLineString([combined])
        elif combined.geom_type == "MultiLineString":
            geom = combined
        else:
            return  # point / polygon — skip

        tags = dict(r.tags)
        row = {"geometry": geom}
        for k in ("ref", "name", "operator", "route"):
            row[k] = tags.get(k, None)
        self.route_features.append(row)


print("PASS 2 — Collecting highways and PT routes (NodeLocationForWays)...", flush=True)
t2 = time.time()

# Use sparse file map for node cache - avoids loading all nodes into RAM
idx = create_map("sparse_file_array", NODE_CACHE)
loc_handler = osmium.NodeLocationForWays(idx)

collector = Collector(guertel_buffer)
collector.apply_file(PBF, locations=True, idx=idx)

print(f"  Total ways seen:               {collector._n_ways}", flush=True)
print(f"  Highway ways seen:             {collector._n_highways}", flush=True)
print(f"  Highway ways in buffer:        {collector._n_highway_in}", flush=True)
print(f"  PT route relations seen:       {collector._n_routes}", flush=True)
print(f"  PT routes crossing buffer:     {collector._n_route_in}", flush=True)
print(f"  Elapsed: {time.time()-t2:.1f}s", flush=True)

# Cleanup
try:
    os.remove(NODE_CACHE)
except:
    pass

# ===================================================================
# Step 3 — Build GeoDataFrames, reproject, write GeoPackage
# ===================================================================
if not collector.highway_features:
    print("WARNING: no highway features to write!", flush=True)
if not collector.route_features:
    print("WARNING: no PT route features to write!", flush=True)

print("STEP 3 — Writing GeoPackage ...", flush=True)
t3 = time.time()

gdf_hw = gpd.GeoDataFrame(collector.highway_features, crs="EPSG:4326")
gdf_pt = gpd.GeoDataFrame(collector.route_features, crs="EPSG:4326")

# Reproject to Austria's standard projected CRS (MGI / Austria Lambert)
gdf_hw = gdf_hw.to_crs("EPSG:31287")
gdf_pt = gdf_pt.to_crs("EPSG:31287")

OUT = "/work/vienna_network.gpkg"

# Remove output if it exists
if os.path.exists(OUT):
    os.remove(OUT)

gdf_hw.to_file(OUT, layer="highways", driver="GPKG", engine="pyogrio")
gdf_pt.to_file(OUT, layer="pt_routes", driver="GPKG", engine="pyogrio")

print(f"  Highways layer: {len(gdf_hw)} features", flush=True)
print(f"  PT routes layer: {len(gdf_pt)} features", flush=True)
print(f"  Output: {OUT}", flush=True)
print(f"  Elapsed: {time.time()-t3:.1f}s", flush=True)
print("=" * 60, flush=True)
print("Done.", flush=True)

Wrote /work/solve.py (8315 chars).

23:27

Bash cd /work && python3 solve.py 2>&1

============================================================
PASS 1 — Finding Gürtel ways...
  Gürtel ways found: 398
  Unique node refs needed: 1695
PASS 1b — Reading node coordinates...
  Nodes found: 1695
  Elapsed: 171.3s
  Buffer bounds (deg): (15.1625046521, 47.0244920757, 16.4031906723, 48.2408288825)
  Total elapsed: 194.1s
PASS 2 — Collecting highways and PT routes (NodeLocationForWays)...
Traceback (most recent call last):
  File "/work/solve.py", line 198, in <module>
    idx = create_map("sparse_file_array", NODE_CACHE)
TypeError: create_map(): incompatible function arguments. The following argument types are supported:
    1. (map_type: str) -> osmium.index.LocationTable

Invoked with: 'sparse_file_array', '/work/nodes.cache'

26:57

The `create_map` function doesn't accept a file path in that way. Let me look at how osmium's NodeLocationForWays works properly. The standard approach is to use `osmium.NodeLocationForWays` with a location handler that stores locations.

Actually, looking more carefully, the common pattern is:

```python
handler = osmium.NodeLocationForWays()
handler.apply_file(PBF, locations=True)
```

But that reads the file twice internally. Let me check the proper API.

Actually, the issue is that when `locations=True` is passed to `apply_file`, it creates a node cache automatically. But the timeout issue was because it's reading the entire 765MB PBF file which is slow in Python.

Let me take a completely different approach. Instead of using osmium Python which seems to be slow, let me use DuckDB's spatial extension or OGR/GDAL to read the PBF.

Actually, a simpler approach: let me use `osmium` command-line tool to filter the PBF, then use ogr2ogr or geopandas to process.

But I can't install osmium-tool via apt. Let me try another approach: use the `osmium` Python module properly.

Actually, looking at the osmium Python docs more carefully:

`apply_file` with `locations=True` reads the whole file with node locations stored. This is what was slow (it timed out at 10 min). 

But actually, PASS 1b took 171s to read the whole file just for 1695 nodes - that's the bottleneck. The whole PBF reading is the slow part.

Let me think differently. Instead of reading the whole PBF twice (once for Gürtel nodes, once for everything), let me:

1. First pass: find Gürtel way IDs (fast, no locations)
2. Second pass: read the whole file with locations=True ONCE, and in that handler do both:
   - Collect the Gürtel way geometries (knowing which IDs to look for)
   - Collect all other highway ways intersecting the buffer
   - But wait, we need the buffer first before we know which ways intersect it...

So we need the Gürtel geometry before the second pass. That means we need node coordinates for Gürtel ways first. 

Let me try a different approach: use `osmium` Python with the `NodeLocationForWays` handler properly. The key insight is that we can create a location handler that stores locations on-the-fly, and pass it to the handler.

Looking at the osmium source code, `NodeLocationForWays` can be used as a handler that's passed to `apply_file`:

```python
class MyHandler(osmium.SimpleHandler):
    pass

h = MyHandler()
h.apply_file(PBF, locations=True)
```

When `locations=True` is set, osmium internally creates a node cache and stores all node locations. This is the slow part because it has to read all nodes.

Actually, the speed issue is just inherent - reading a 765MB PBF takes time. Let me check if it's truly stuck or just slow.

Let me try the approach where we read the file just once with locations=True and do everything in one pass. But for that, we need to know which ways are Gürtel beforehand... Actually no - we can first check the name in the way handler, and for Gürtel ways, build the geometry and union them as we go. Then for non-Gürtel ways, check against a partially-built buffer... but that won't work because we need the complete Gürtel to build the buffer.

OK, let me try yet another approach. The fastest way is probably to use DuckDB's spatial extension to query the OSM PBF directly. But DuckDB can't read PBF directly.

Let me try a different tactic: use `osmium` command line tool (which is much faster than Python) by installing it via pip.

osmium-tool is available via pip: `pip install osmium-tool`

26:57

Bash pip install --user osmium-tool 2>&1 | tail -3

ERROR: Could not find a version that satisfies the requirement osmium-tool (from versions: none)
ERROR: No matching distribution found for osmium-tool

27:17

Let me check what index maps are available with pyosmium.

27:17

Bash python3 -c "

python3 -c "
from osmium.index import create_map
print('Available maps:')
import osmium.index as idx
print(dir(idx))
"

Available maps:
['IdSet', 'LocationTable', '__doc__', '__file__', '__loader__', '__name__', '__package__', '__spec__', 'create_map', 'map_types']

27:27

The `create_map` function takes just a string for the map type. The `NodeLocationForWays` needs an index object. Let me look at how to properly use this. The standard approach with osmium is to use the handler with a `NodeLocationForWays` handler that gets added before the main handler.

Let me check the osmium docs/examples.

27:27

Bash python3 -c "

python3 -c "
from osmium.index import create_map, map_types
print('map_types:', map_types)
m = create_map('sparse_file_array')
print(type(m))
" 2>&1

map_types: <built-in method map_types of pybind11_builtins.pybind11_detail_function_record_v1_system_libstdcpp_gxx_abi_1xxx_use_cxx11_abi_1 object at 0x7f2d345bc210>
<class 'osmium.index.LocationTable'>

27:44

The `create_map` only takes the map_type string, not a filename. The location table is used differently - we need to use the `NodeLocationForWays` handler which reads locations from the PBF file directly. Let me look at how `apply_file` with `locations=True` works and how to properly use a node cache.

Actually, the problem in pass 1 was that `locations=True` was very slow because it had to read every node. But actually it completed in 171 seconds for just reading nodes, which is reasonable. However, pass 2 also needs `locations=True` which means reading all nodes again.

Let me try a different strategy:
1. First pass: find Gürtel ways (already done quickly)
2. Second pass: read ALL nodes to build a full node cache file
3. Third pass: use the node cache to process ways quickly

Actually, the `NodeLocationForWays` handler in osmium Python uses a LocationTable as the node cache. We need to:
1. Create a LocationTable (like `sparse_file_array`)
2. Fill it by reading the file with a `NodeLocationsForWays` handler
3. Then apply the handler again with the filled location table

Wait, the osmium Python library's `NodeLocationForWays` is actually a handler that fills the location table as it processes nodes. So we need to:

```python
from osmium import NodeLocationForWays
idx = create_map("sparse_file_array")
handler = NodeLocationForWays(idx)
# First pass to fill the index
apply_file(PBF, locations=False, handler=handler)
# Now idx is filled. Then second pass with our collector
```

But we also need our collector to use the idx. Actually when `locations=True` is used, osmium internally uses its own location handler. We need to pass the `idx` parameter separately.

Let me look at how `apply_file` works more carefully.

Actually looking at the osmium source code or docs - when `locations=True` is passed, it processes with node locations. But we can also pass `idx` parameter. Let me check.

Actually the error was `create_map("sparse_file_array", NODE_CACHE)` - it doesn't take a filename argument. The sparse_file_array creates a temporary file. We should just call it without filename.

Let me rewrite the approach:
1. Pass 1: find Gürtel way node IDs (fast, without locations)
2. Pass 2: read all nodes to build LocationTable
3. Pass 3: read ways with locations from LocationTable, filter highways and PT routes

Actually, we can combine passes 2 and 3: use `apply_file` with a custom handler chain that both fills the location table AND collects data. Let me try using `osmium.apply` or handler chaining.

Actually, in osmium Python you can pass multiple handlers:

```python
from osmium import NodeLocationForWays
idx = create_map("sparse_file_array")
loc_handler = NodeLocationForWays(idx)
collector = Collector(guertel_buffer)
osmium.apply(PBF, loc_handler, collector)
```

This should process the file with both handlers - loc_handler fills the index with node locations, and collector processes ways and relations with location data available.

Let me try this approach.

27:44

Bash python3 -c "

python3 -c "
import osmium
from osmium import NodeLocationForWays
from osmium.index import create_map
idx = create_map('sparse_file_array')
print(type(idx))
# Check if we can pass idx to NodeLocationForWays
nl = NodeLocationForWays(idx)
print(type(nl))
# Check apply signature
help(osmium.apply)
" 2>&1 | head -30

Traceback (most recent call last):
  File "<string>", line 3, in <module>
    from osmium import NodeLocationForWays
ImportError: cannot import name 'NodeLocationForWays' from 'osmium' (/usr/local/lib/python3.14/site-packages/osmium/__init__.py). Did you mean: 'NodeLocationsForWays'?

27:50

Let me check the osmium Python API more carefully. The key issue is that `locations=True` in `apply_file` is very slow (reading all nodes). Let me try a different approach using `osmium.NodeLocationsForWays` handler.

Actually, looking at the osmium Python API, when you do `apply_file(PBF, locations=True)`, it internally reads ALL nodes regardless. That's the slow part.

A better approach: Use a two-step method where:
1. First pass (fast, no locations): collect Gürtel way IDs and node IDs
2. Second pass with NodeLocationsForWays handler: process the file with a handler chain

Let me look at the osmium SimpleHandler's `apply_file` signature more carefully.

27:50

Bash python3 -c "

python3 -c "
import osmium
help(osmium.SimpleHandler.apply_file)
" 2>&1

Help on function apply_file in module osmium.simple_handler:

apply_file(
    self,
    filename: Union[str, 'os.PathLike[str]', File],
    locations: bool = False,
    idx: str = 'flex_mem',
    filters: List['HandlerLike'] = []
) -> None
    Apply the handler to the given file. If locations is true, then
    a location handler will be applied before, which saves the node
    positions. In that case, the type of this position index can be
    further selected in idx. If an area callback is implemented, then
    the file will be scanned twice and a location handler and a
    handler for assembling multipolygons and areas from ways will
    be executed.

28:31

Now I see the issue. The `apply_file` method accepts `idx` as a string, not a pre-created index. Let me pass the index type as a string. Also, the `idx` parameter expects a string like `'sparse_file_array'` - the index will be created internally. But I need to persist it across passes or read it again.

Actually, looking at the code more carefully, the `apply_file` with `locations=True` already uses a node location handler internally. The `idx` parameter is the type of index to use. So I can just pass `idx='sparse_file_array'` to keep memory usage low.

Let me rewrite the script more efficiently. For Pass 1, I already read all nodes needed (1695 nodes). Then for Pass 2, I can use `locations=True, idx='sparse_file_array'` to use a disk-based cache.

But wait - the problem before was that using `locations=True` was extremely slow (timed out). Let me think about why.

The PBF is 765 MB. Reading with locations=True means it needs to read all 7M+ nodes, store their coordinates, then process ways. With a memory map (`flex_mem`) it's fast but uses RAM. With `sparse_file_array` it writes to disk which could be slow.

Actually, the timeout might have been due to something else. Let me try a different approach: use two passes - first without locations to collect Gürtel ways (already done in 194s), then use a node cache for Pass 2.

Actually the real problem with the first attempt was that it timed out. Let me check if the issue was just the first pass (which took 194s) or the full thing. The first version had `locations=True` for both passes, and timed out after 10 min. The current version has:
- Pass 1: no locations (just reads tags) - fast
- Pass 1b: reads all nodes, filters needed - took 171s for just node reading... that's slow because it scans all nodes (millions) but only keeps 1695
- Pass 2: this failed due to API issue

Let me try all three passes, fixing the index. Or better: instead of doing 3 full file passes, let me do:
1. First pass: collect Gürtel way node IDs (no locations, just header scan)
2. Then extract just those nodes using osmium's get_nodes? No.

Actually, the simplest approach: only do ONE full-file read with locations=True and collect everything in one handler. The first version that timed out - maybe it was just too slow with `locations=True` default (flex_mem memory index). Let me try `idx='sparse_file_array'` to use disk-based index.

Wait - the first version that timed out DID use `locations=True` in both passes. Each pass would read the full file. That's 2 full reads of 765MB = 1.5GB read, with node caching. Could be slow but should complete in <10 min.

Actually, the issue might be that `locations=True` first reads all nodes, then processes ways and relations. The node pass takes a while. With the default `flex_mem` index, it stores all ~350M coordinates for 175M nodes in memory which could cause swapping.

Let me try using `idx='sparse_file_array'` which writes to disk instead. But first, let me rewrite to do everything in ONE single pass to minimize I/O.

Actually, the best approach:
1. First quick pass (no locations) to find Gürtel way IDs and their node IDs - this is fast, just reads headers
2. Then extract just those nodes (using the node IDs) - but we can't easily extract specific nodes without a full scan
3. Build buffer
4. Second pass with locations=True and idx='sparse_file_array' for disk-based caching - collect highways and relations

Let me restructure to minimize passes. Actually, 2 full passes should be okay if each takes ~4-5 min. Let me just fix the API.

For the apply_file, the `idx` parameter takes a string. Let me use `idx='sparse_file_array'`.

28:31

Write /work/solve.py

content · 223 lines · py

#!/usr/bin/env python3
"""
Extract highways intersecting a 500m buffer around Vienna's Gürtel,
and public-transport route relations crossing that same buffer,
from the Geofabrik Austria PBF.

Output: vienna_network.gpkg (layers: highways, pt_routes)
CRS: EPSG:31287 (MGI / Austria Lambert)

Strategy:
  PASS 1 – scan without locations to find Gürtel way IDs + their node refs
  PASS 2 – scan all nodes, keep only the ones we need for Gürtel ways
  PASS 3 – full scan with locations=True (sparse_file_array) to collect
           highway ways + PT route relations intersecting the buffer
"""

import osmium
import sys
import os
import time

from shapely.geometry import LineString, MultiLineString
from shapely.ops import unary_union
import geopandas as gpd

PBF = "/work/austria-latest.osm.pbf"

# ===================================================================
# PASS 1 – Find Gürtel ways (no locations)
# ===================================================================
class GuertelWayCollector(osmium.SimpleHandler):
    def __init__(self):
        super().__init__()
        self.guertel_way_node_ids = []  # list of lists of node refs

    def way(self, w):
        if "highway" not in w.tags:
            return
        name = w.tags.get("name", "")
        if not name.endswith("Gürtel"):
            return
        node_ids = [n.ref for n in w.nodes if n.ref != 0]
        if len(node_ids) >= 2:
            self.guertel_way_node_ids.append(node_ids)

print("=" * 60, flush=True)
print("PASS 1 — Finding Gürtel ways...", flush=True)
t0 = time.time()
gwc = GuertelWayCollector()
gwc.apply_file(PBF, locations=False)
print(f"  Gürtel ways: {len(gwc.guertel_way_node_ids)}", flush=True)
if not gwc.guertel_way_node_ids:
    print("ERROR: No Gürtel ways found.", flush=True)
    sys.exit(1)

needed_nodes = set()
for nids in gwc.guertel_way_node_ids:
    needed_nodes.update(nids)
print(f"  Unique node refs: {len(needed_nodes)}", flush=True)
print(f"  Elapsed: {time.time()-t0:.1f}s", flush=True)

# ===================================================================
# PASS 2 – Read all nodes, keep coordinates for needed ones
# ===================================================================
class NodeGrabber(osmium.SimpleHandler):
    def __init__(self, needed):
        super().__init__()
        self.needed = needed
        self.node_coords = {}

    def node(self, n):
        if n.id in self.needed:
            self.node_coords[n.id] = (n.lon, n.lat)

print("PASS 2 — Reading node coordinates for Gürtel...", flush=True)
t1 = time.time()
ng = NodeGrabber(needed_nodes)
ng.apply_file(PBF, locations=False)
print(f"  Nodes found: {len(ng.node_coords)}", flush=True)
print(f"  Elapsed: {time.time()-t1:.1f}s", flush=True)

# Build Gürtel lines
guertel_lines = []
for nids in gwc.guertel_way_node_ids:
    coords = [ng.node_coords[nid] for nid in nids if nid in ng.node_coords]
    if len(coords) >= 2:
        guertel_lines.append(LineString(coords))

if not guertel_lines:
    print("ERROR: Could not build Gürtel geometries.", flush=True)
    sys.exit(1)

guertel_union = unary_union(guertel_lines)

# 500 m buffer — approx: 1° ≈ 111000 m at Vienna latitude
BUFFER_DEG = 500.0 / 111000.0
guertel_buffer = guertel_union.buffer(BUFFER_DEG)
print(f"  Buffer bounds (deg): {guertel_buffer.bounds}", flush=True)
print(f"  Total elapsed so far: {time.time()-t0:.1f}s", flush=True)

# ===================================================================
# PASS 3 – Full read with locations (disk-based index) to get highways + PT
# ===================================================================
class Collector(osmium.SimpleHandler):
    def __init__(self, buffer_geom):
        super().__init__()
        self.buffer = buffer_geom
        self.way_geoms = {}        # way_id -> LineString
        self.highway_features = []
        self.route_features = []

        self._n_ways = 0
        self._n_highways = 0
        self._n_highway_in = 0
        self._n_routes = 0
        self._n_route_in = 0

    def way(self, w):
        self._n_ways += 1
        coords = [(n.lon, n.lat) for n in w.nodes
                  if n.ref != 0 and (abs(n.lon) > 0.0 or abs(n.lat) > 0.0)]
        if len(coords) < 2:
            return
        line = LineString(coords)
        self.way_geoms[w.id] = line

        if "highway" not in w.tags:
            return
        self._n_highways += 1
        if not line.intersects(self.buffer):
            return
        self._n_highway_in += 1
        tags = dict(w.tags)
        row = {"geometry": line}
        for k in ("name", "highway", "maxspeed", "lanes", "surface", "oneway"):
            row[k] = tags.get(k, None)
        self.highway_features.append(row)

    def relation(self, r):
        if r.tags.get("type") != "route":
            return
        route_val = r.tags.get("route", "")
        if route_val not in ("bus", "tram", "subway", "train",
                             "trolleybus", "light_rail"):
            return
        self._n_routes += 1

        member_lines = []
        for m in r.members:
            if m.type == "w" and m.ref in self.way_geoms:
                member_lines.append(self.way_geoms[m.ref])
        if not member_lines:
            return
        combined = unary_union(member_lines)
        if combined.is_empty or not combined.intersects(self.buffer):
            return
        self._n_route_in += 1

        if combined.geom_type == "LineString":
            geom = MultiLineString([combined])
        elif combined.geom_type == "MultiLineString":
            geom = combined
        else:
            return

        tags = dict(r.tags)
        row = {"geometry": geom}
        for k in ("ref", "name", "operator", "route"):
            row[k] = tags.get(k, None)
        self.route_features.append(row)


print("PASS 3 — Collecting highways and PT routes...", flush=True)
t2 = time.time()

collector = Collector(guertel_buffer)
# Use sparse_file_array (disk-based) to avoid huge memory usage
collector.apply_file(PBF, locations=True, idx='sparse_file_array')

print(f"  Total ways:               {collector._n_ways}", flush=True)
print(f"  Highway ways:             {collector._n_highways}", flush=True)
print(f"  Highway ways in buffer:   {collector._n_highway_in}", flush=True)
print(f"  PT route relations:       {collector._n_routes}", flush=True)
print(f"  PT routes crossing:       {collector._n_route_in}", flush=True)
print(f"  Elapsed: {time.time()-t2:.1f}s", flush=True)

# Clean up the sparse file array if it was created
for f in ("nodes.cache", "nodes.idx", "nodes.dat"):
    fp = f"/work/{f}"
    if os.path.exists(fp):
        os.remove(fp)

# ===================================================================
# Step 4 – Write GeoPackage
# ===================================================================
if not collector.highway_features:
    print("WARNING: no highway features to write!", flush=True)
if not collector.route_features:
    print("WARNING: no PT route features to write!", flush=True)

print("STEP 4 — Writing GeoPackage ...", flush=True)
t3 = time.time()

gdf_hw = gpd.GeoDataFrame(collector.highway_features, crs="EPSG:4326")
gdf_pt = gpd.GeoDataFrame(collector.route_features, crs="EPSG:4326")

# Reproject to MGI / Austria Lambert
gdf_hw = gdf_hw.to_crs("EPSG:31287")
gdf_pt = gdf_pt.to_crs("EPSG:31287")

OUT = "/work/vienna_network.gpkg"
if os.path.exists(OUT):
    os.remove(OUT)

gdf_hw.to_file(OUT, layer="highways", driver="GPKG", engine="pyogrio")
gdf_pt.to_file(OUT, layer="pt_routes", driver="GPKG", engine="pyogrio")

print(f"  Highways: {len(gdf_hw)} features", flush=True)
print(f"  PT routes: {len(gdf_pt)} features", flush=True)
print(f"  Output: {OUT}", flush=True)
print(f"  Total elapsed: {time.time()-t0:.1f}s", flush=True)
print("=" * 60, flush=True)
print("Done.", flush=True)

Wrote /work/solve.py (7844 chars).

28:40

Wait, but the previous run already completed PASS 1 and PASS 1b in about 194 seconds. The issue was in PASS 2 where I tried to create the index map incorrectly. But actually the sparse_file_array should work when passed as a string parameter. Let me check - the error was `create_map('sparse_file_array', NODE_CACHE)` but the correct call is `apply_file(..., idx='sparse_file_array')` which I'm now using.

However, I'm concerned about the `n.ref != 0 and (abs(n.lon) > 0.0 or abs(n.lat) > 0.0)` condition. In osmium with locations=True, the NodeRef objects should have valid lon/lat if the node was present in the file. The ref 0 check doesn't make sense - NodeRef.ref is always the node ID. Let me think...

Actually, with `locations=True`, osmium populates node locations in the way handler. For node references that don't have a matching node in the file (e.g., nodes outside the PBF extract), the location will be (0.0, 0.0) or invalid. So checking `abs(n.lon) > 0.0 or abs(n.lat) > 0.0` should filter out unlocated nodes. But there could be a node at (0, 0) - unlikely for Vienna though.

But wait - the PBF is the full Austria extract, so all nodes referenced by ways in Austria should be present. The condition `abs(n.lon) > 0.0 or abs(n.lat) > 0.0` should work fine to filter out any truly invalid/zero coordinates.

Let me run this.

28:40

Bash cd /work && python3 solve.py 2>&1

============================================================
PASS 1 — Finding Gürtel ways...
  Gürtel ways: 398
  Unique node refs: 1695
  Elapsed: 21.8s
PASS 2 — Reading node coordinates for Gürtel...
  Nodes found: 1695
  Elapsed: 165.3s
  Buffer bounds (deg): (15.1625046521, 47.0244920757, 16.4031906723, 48.2408288825)
  Total elapsed so far: 188.0s
PASS 3 — Collecting highways and PT routes...
  Total ways:               9288423
  Highway ways:             2501709
  Highway ways in buffer:   14539
  PT route relations:       12444
  PT routes crossing:       665
  Elapsed: 325.6s
STEP 4 — Writing GeoPackage ...
  Highways: 14539 features
  PT routes: 665 features
  Output: /work/vienna_network.gpkg
  Total elapsed: 515.0s
============================================================
Done.

37:38

The script ran successfully. Let me verify the output.

37:38

Bash cd /work && python3 -c "

cd /work && python3 -c "
import geopandas as gpd

# Check layers
gdf_hw = gpd.read_file('vienna_network.gpkg', layer='highways')
gdf_pt = gpd.read_file('vienna_network.gpkg', layer='pt_routes')

print('=== HIGHWAYS ===')
print(f'Features: {len(gdf_hw)}')
print(f'CRS: {gdf_hw.crs}')
print(f'Geometry types: {gdf_hw.geometry.type.value_counts().to_dict()}')
print(f'Columns: {list(gdf_hw.columns)}')
print(f'Sample:')
print(gdf_hw[['name','highway','maxspeed','lanes','surface','oneway']].head(10).to_string())
print()

print('=== PT ROUTES ===')
print(f'Features: {len(gdf_pt)}')
print(f'CRS: {gdf_pt.crs}')
print(f'Geometry types: {gdf_pt.geometry.type.value_counts().to_dict()}')
print(f'Columns: {list(gdf_pt.columns)}')
print(f'Sample:')
print(gdf_pt[['ref','name','operator','route']].head(10).to_string())
print()

# Check coordinate magnitudes (should be metric, ~10^5-10^6 range for Austria)
bounds_hw = gdf_hw.total_bounds
bounds_pt = gdf_pt.total_bounds
print(f'Highways bounds (EPSG:31287): {bounds_hw}')
print(f'PT routes bounds (EPSG:31287): {bounds_pt}')
print(f'File size: ', end='')
import os
sz = os.path.getsize('vienna_network.gpkg')
print(f'{sz/1024/1024:.1f} MB')
"

=== HIGHWAYS ===
Features: 14539
CRS: EPSG:31287
Geometry types: {'LineString': 14539}
Columns: ['name', 'highway', 'maxspeed', 'lanes', 'surface', 'oneway', 'geometry']
Sample:
                         name        highway maxspeed lanes  surface oneway
0          Sankt-Peter-Gürtel        primary       60     7  asphalt     no
1     Sankt-Peter-Hauptstraße        primary       50     5  asphalt     no
2         Waltendorfer Gürtel      secondary       50     3  asphalt     no
3                    Koßgasse      secondary       50     2  asphalt     no
4           Puntigamer Straße        primary       50     2  asphalt    yes
5            Weblinger Gürtel        primary       50     1  asphalt    yes
6      Herta-Frauneder-Straße   unclassified       50   NaN  asphalt    NaN
7  Autobahnzubringer Graz Ost       motorway      100     2  asphalt    yes
8                         NaN  motorway_link      100     2  asphalt    yes
9                         NaN  motorway_link      NaN     2  asphalt    yes

=== PT ROUTES ===
Features: 665
CRS: EPSG:31287
Geometry types: {'MultiLineString': 665}
Columns: ['ref', 'name', 'operator', 'route', 'geometry']
Sample:
  ref                                                               name                operator   route
0  S2                                                 S2: Mödling => Laa  ÖBB-Personenverkehr AG   train
1  U3                                         U3: Ottakring => Simmering           Wiener Linien  subway
2  U1                                           U1: Oberlaa => Leopoldau           Wiener Linien  subway
3   1                                Tram 1: Eggenberg/UKH => Mariatrost                     NaN    tram
4   7  Tram 7: Wetzelsdorf Burgenlandstraße => LKH Med Uni/Klinikum Nord             Graz Linien    tram
5  58                                    Bus 58: Hauptbahnhof => Ragnitz             Graz Linien     bus
6   5                                        Tram 5: Andritz => Puntigam                     NaN    tram
7   4                                    Tram 4: Reininghaus => Liebenau                     NaN    tram
8   3                                      Tram 3: Andritz => Krenngasse             Graz Linien    tram
9   6           Tram 6: Smart City/Peter-Tunner-Gasse/tim => Sankt Peter                     NaN    tram

Highways bounds (EPSG:31287): [536189.50655806 349200.10562584 628361.57988143 487679.13770081]
PT routes bounds (EPSG:31287): [114319.22587394 292266.24860261 683732.13557579 548552.65693629]
File size: 26.7 MB

38:21

Looking at the results:

1. **Highways** - 14,539 features, all LineString, correct CRS (EPSG:31287). But wait - the sample shows roads from Graz (Sankt-Peter-Gürtel, Graz streets). The buffer bounds from PASS 1 showed `(15.1625, 47.0245, 16.4032, 48.2408)` which is quite large - spanning from around Graz up to north of Vienna.

Let me think about this. The Gürtel in Vienna is the "Gürtel" road ring around Vienna. The buffer bounds should be centered around Vienna. But the buffer bounds show a bounding box from 15.16 to 16.40 longitude and 47.02 to 48.24 latitude. Vienna is around 16.37°E, 48.2°N. Graz is around 15.44°E, 47.07°N.

The problem is that we found 398 Gürtel ways, but some of them are in Graz (Sankt-Peter-Gürtel, Weblinger Gürtel, Waltendorfer Gürtel), not Vienna! The name ending in "Gürtel" finds all ways whose name ends with "Gürtel", not just Vienna's Gürtel.

The user says "Vienna's Gürtel, defined as the highway ways whose name ends in 'Gürtel'". So they want the Gürtel ring in Vienna. But there are other Gürtel streets in other Austrian cities.

I need to filter to only Vienna's Gürtel. The main Gürtel in Vienna follows a semicircular path around the inner city. It's known as the "Wiener Gürtel" and consists of segments like "Währinger Gürtel", "Gürtel", "Margaretengürtel", etc.

Actually, looking at the sample output - Sankt-Peter-Gürtel is in Graz. So the buffer is centered around Graz+Graz's Gürtel, not Vienna's.

I need to fix this. The user specifically says "Vienna's Gürtel". The Gürtel in Vienna is also referred to as "Gürtel" and there are many segments with names like:
- "Währinger Gürtel"
- "Döblinger Gürtel"
- "Alser Gürtel"
- "Hernalser Gürtel"
- "Ottakringer Gürtel"
- "Gürtel" (the core section)
- "Margaretengürtel"
- "Gürtel" sections...

But what defines Vienna's Gürtel as opposed to Graz's? The city boundary, or we could use the fact that Vienna's Gürtel ways are within the Vienna administrative area.

Let me think about a better approach:
1. First find all ways where name ends with "Gürtel"
2. The ones in Vienna are within the Vienna city boundary. We can check if any node of the way is in Vienna's bounding box (~16.1-16.6°E, 48.1-48.3°N).

Actually, a simpler approach: Vienna is in the state of "Wien" (Vienna). We can check if the way's nodes fall within a reasonable bounding box for Vienna. Or better yet, we can filter the Gürtel ways to only those within a bounding box that covers Vienna but not Graz.

Vienna's approximate bounding box: 16.15 to 16.60 E, 48.10 to 48.35 N.
Graz: ~15.35 to 15.50 E, 47.03 to 47.10 N.

So I can filter to only Gürtel ways where the centroid/any node is within a Vienna bounding box.

Let me rewrite the approach.

38:21

Write /work/solve.py

content · 246 lines · py

#!/usr/bin/env python3
"""
Extract highways intersecting a 500m buffer around Vienna's Gürtel,
and public-transport route relations crossing that same buffer,
from the Geofabrik Austria PBF.

Output: vienna_network.gpkg (layers: highways, pt_routes)
CRS: EPSG:31287 (MGI / Austria Lambert)

Strategy:
  PASS 1 – scan without locations to find ways whose name ends with "Gürtel"
           and whose nodes lie within the Vienna area (bbox ~16.15–16.60°E,
           48.10–48.35°N). Collect their node refs.
  PASS 2 – read all nodes, keep coordinates for those needed nodes
  PASS 3 – full scan with locations to collect highway ways + PT route
           relations intersecting the 500 m buffer
"""

import osmium
import sys
import os
import time

from shapely.geometry import LineString, MultiLineString, box
from shapely.ops import unary_union
import geopandas as gpd

PBF = "/work/austria-latest.osm.pbf"

# Vienna approximate bounding box (in degrees)
VIENNA_BBOX = box(16.10, 48.10, 16.65, 48.35)

# ===================================================================
# PASS 1 – Find Gürtel ways in Vienna (no locations)
# ===================================================================
class GuertelWayCollector(osmium.SimpleHandler):
    """Collect way IDs + node refs for ways named *Gürtel in Vienna."""
    def __init__(self):
        super().__init__()
        self.guertel_way_node_ids = []  # list of lists of node refs
        self._candidate_way_node_ids = []  # all *Gürtel ways before bbox filter

    def way(self, w):
        if "highway" not in w.tags:
            return
        name = w.tags.get("name", "")
        if not name.endswith("Gürtel"):
            return
        node_ids = [n.ref for n in w.nodes if n.ref != 0]
        if len(node_ids) >= 2:
            self._candidate_way_node_ids.append((w.id, node_ids))

    def node(self, n):
        pass  # not needed here


print("=" * 60, flush=True)
print("PASS 1 — Finding Gürtel ways in Vienna...", flush=True)
t0 = time.time()
gwc = GuertelWayCollector()
gwc.apply_file(PBF, locations=False)
print(f"  All *Gürtel ways found in Austria: {len(gwc._candidate_way_node_ids)}", flush=True)

# Collect ALL nodes referenced by candidate Gürtel ways
all_candidate_nodes = set()
for wid, nids in gwc._candidate_way_node_ids:
    all_candidate_nodes.update(nids)
print(f"  Unique node refs from all *Gürtel ways: {len(all_candidate_nodes)}", flush=True)

# ===================================================================
# PASS 2 – Read all nodes to determine which Gürtel ways are in Vienna
# ===================================================================
class NodeGrabberAll(osmium.SimpleHandler):
    def __init__(self, needed):
        super().__init__()
        self.needed = needed
        self.node_coords = {}

    def node(self, n):
        if n.id in self.needed:
            self.node_coords[n.id] = (n.lon, n.lat)

print("PASS 2 — Reading node coordinates for candidate *Gürtel ways...", flush=True)
t1 = time.time()
ng = NodeGrabberAll(all_candidate_nodes)
ng.apply_file(PBF, locations=False)
print(f"  Nodes found: {len(ng.node_coords)}", flush=True)
print(f"  Elapsed: {time.time()-t1:.1f}s", flush=True)

# Now filter to only Gürtel ways whose nodes are within Vienna bbox
guertel_lines = []
needed_nodes = set()
vienna_guertel_count = 0
for wid, nids in gwc._candidate_way_node_ids:
    coords = [ng.node_coords[nid] for nid in nids if nid in ng.node_coords]
    if len(coords) < 2:
        continue
    # Check if the way's centroid is within Vienna bbox
    lons = [c[0] for c in coords]
    lats = [c[1] for c in coords]
    centroid_lon = sum(lons) / len(lons)
    centroid_lat = sum(lats) / len(lats)
    if VIENNA_BBOX.contains(centroid_lon, centroid_lat) or VIENNA_BBOX.intersects(centroid_lon, centroid_lat):
        # Check by point
        pt = LineString(coords).centroid
        if VIENNA_BBOX.contains(pt):
            vienna_guertel_count += 1
            guertel_lines.append(LineString(coords))
            needed_nodes.update(nids)

print(f"  Vienna *Gürtel ways: {vienna_guertel_count}", flush=True)

if not guertel_lines:
    print("ERROR: No Vienna Gürtel ways found.", flush=True)
    sys.exit(1)

guertel_union = unary_union(guertel_lines)

# 500 m buffer — approx: 1° ≈ 111000 m at Vienna latitude
BUFFER_DEG = 500.0 / 111000.0
guertel_buffer = guertel_union.buffer(BUFFER_DEG)
print(f"  Buffer bounds (deg): {guertel_buffer.bounds}", flush=True)
print(f"  Total elapsed so far: {time.time()-t0:.1f}s", flush=True)

# ===================================================================
# PASS 3 – Full read with locations (disk-based index) to get highways + PT
# ===================================================================
class Collector(osmium.SimpleHandler):
    def __init__(self, buffer_geom):
        super().__init__()
        self.buffer = buffer_geom
        self.way_geoms = {}        # way_id -> LineString
        self.highway_features = []
        self.route_features = []

        self._n_ways = 0
        self._n_highways = 0
        self._n_highway_in = 0
        self._n_routes = 0
        self._n_route_in = 0

    def way(self, w):
        self._n_ways += 1
        coords = [(n.lon, n.lat) for n in w.nodes
                  if n.ref != 0 and (abs(n.lon) > 0.0 or abs(n.lat) > 0.0)]
        if len(coords) < 2:
            return
        line = LineString(coords)
        self.way_geoms[w.id] = line

        if "highway" not in w.tags:
            return
        self._n_highways += 1
        if not line.intersects(self.buffer):
            return
        self._n_highway_in += 1
        tags = dict(w.tags)
        row = {"geometry": line}
        for k in ("name", "highway", "maxspeed", "lanes", "surface", "oneway"):
            row[k] = tags.get(k, None)
        self.highway_features.append(row)

    def relation(self, r):
        if r.tags.get("type") != "route":
            return
        route_val = r.tags.get("route", "")
        if route_val not in ("bus", "tram", "subway", "train",
                             "trolleybus", "light_rail"):
            return
        self._n_routes += 1

        member_lines = []
        for m in r.members:
            if m.type == "w" and m.ref in self.way_geoms:
                member_lines.append(self.way_geoms[m.ref])
        if not member_lines:
            return
        combined = unary_union(member_lines)
        if combined.is_empty or not combined.intersects(self.buffer):
            return
        self._n_route_in += 1

        if combined.geom_type == "LineString":
            geom = MultiLineString([combined])
        elif combined.geom_type == "MultiLineString":
            geom = combined
        else:
            return

        tags = dict(r.tags)
        row = {"geometry": geom}
        for k in ("ref", "name", "operator", "route"):
            row[k] = tags.get(k, None)
        self.route_features.append(row)


print("PASS 3 — Collecting highways and PT routes...", flush=True)
t2 = time.time()

collector = Collector(guertel_buffer)
collector.apply_file(PBF, locations=True, idx='sparse_file_array')

print(f"  Total ways:               {collector._n_ways}", flush=True)
print(f"  Highway ways:             {collector._n_highways}", flush=True)
print(f"  Highway ways in buffer:   {collector._n_highway_in}", flush=True)
print(f"  PT route relations:       {collector._n_routes}", flush=True)
print(f"  PT routes crossing:       {collector._n_route_in}", flush=True)
print(f"  Elapsed: {time.time()-t2:.1f}s", flush=True)

# Clean up any sparse file array artifacts
for f in ("nodes.cache", "nodes.idx", "nodes.dat"):
    fp = f"/work/{f}"
    if os.path.exists(fp):
        os.remove(fp)

# ===================================================================
# Step 4 – Write GeoPackage
# ===================================================================
if not collector.highway_features:
    print("WARNING: no highway features to write!", flush=True)
if not collector.route_features:
    print("WARNING: no PT route features to write!", flush=True)

print("STEP 4 — Writing GeoPackage ...", flush=True)
t3 = time.time()

gdf_hw = gpd.GeoDataFrame(collector.highway_features, crs="EPSG:4326")
gdf_pt = gpd.GeoDataFrame(collector.route_features, crs="EPSG:4326")

# Reproject to MGI / Austria Lambert
gdf_hw = gdf_hw.to_crs("EPSG:31287")
gdf_pt = gdf_pt.to_crs("EPSG:31287")

OUT = "/work/vienna_network.gpkg"
if os.path.exists(OUT):
    os.remove(OUT)

gdf_hw.to_file(OUT, layer="highways", driver="GPKG", engine="pyogrio")
gdf_pt.to_file(OUT, layer="pt_routes", driver="GPKG", engine="pyogrio")

print(f"  Highways: {len(gdf_hw)} features", flush=True)
print(f"  PT routes: {len(gdf_pt)} features", flush=True)
print(f"  Output: {OUT}", flush=True)
print(f"  Total elapsed: {time.time()-t0:.1f}s", flush=True)
print("=" * 60, flush=True)
print("Done.", flush=True)

Wrote /work/solve.py (8914 chars).

38:42

Wait, the check `VIENNA_BBOX.contains(centroid_lon, centroid_lat)` won't work because centroid_lon and centroid_lat are floats, not Points. Let me also use `.intersects` with a Point. Let me fix the check.

Also, I already downloaded the PBF again (it was overwritten? Actually it should still be there). Let me check the file size.

Actually, I should also consider: the first approach collected all *Gürtel ways in Austria (398 ways), and the buffer bounds were (15.16, 47.02, 16.40, 48.24) which shows it was around Graz (15.4°E, 47.1°N) but also extending up to Vienna area. The fact that it had 398 Gürtel ways is the issue. Vienna's Gürtel is the famous ring road - there are specific ways like "Gürtel" (the street), "Äußerer Gürtel", "Innerer Gürtel" etc.

Let me fix the centroid check properly.

38:42

Edit /work/solve.py

Edited /work/solve.py.

38:51

Actually, I realize the issue with the centroid check. The `VIENNA_BBOX.contains(centroid_lon, centroid_lat)` call passes a scalar for the point, which won't work with Shapely. I need to use `Point(centroid_lon, centroid_lat)` or just check the centroid of the LineString.

Let me look at the current code more carefully and fix it.

The code currently has:
```python
if VIENNA_BBOX.contains(centroid_lon, centroid_lat) or VIENNA_BBOX.intersects(centroid_lon, centroid_lat):
    # Check by point
    pt = LineString(coords).centroid
    if VIENNA_BBOX.contains(pt):
```

This is wrong because `contains` with scalars won't work. Let me simplify to just check the centroid of the LineString.

38:51

Edit /work/solve.py

Edited /work/solve.py.

39:04

Also, I should be more precise about the Gürtel. The name "Gürtel" in Vienna refers specifically to the ring road around the inner city. Let me also check if the way has "Gürtel" as the exact name ending, which should be correct. But I also need to ensure I'm getting the right Vienna Gürtel - the main one in Vienna has names like "Äußere Wiener Gürtelstraße", "Gürtel", "Währinger Gürtel", "Gürtel" etc. Actually the user says "highway ways whose name ends in 'Gürtel'" which is correct.

But wait - the bbox filter might be too strict. Let me think about the Gürtel's actual location in Vienna. The Gürtel (the main ring road) runs roughly from ~16.33 to ~16.37°E and ~48.19 to ~48.23°N. So the bbox 16.10-16.65, 48.10-48.35 should easily contain it.

Let me also check if "Gürtel" is part of the name. In Vienna the street is often just called "Gürtel" or has "Gürtel" as suffix. Let me run the script.

39:04

Bash cd /work && python3 solve.py 2>&1

<exit_code>-1</exit_code>
<error>Command timed out after 600000ms</error>

49:48

The timeout means PASS 3 is taking too long. The sparse_file_array index is slow for a 765MB PBF. Let me try a smarter approach — use the OSM file with `locations=True` but use a faster index like `flex_mem` (the default). Or better yet, let me use a different strategy altogether.

Actually, the problem is that reading the full PBF twice (PASS 2 for nodes + PASS 3 with locations) is taking too long (PASS 2 took 165s, PASS 3 was going to take at least as long). With the first two passes already taking ~190s, PASS 3 would probably take 200-300s, totaling ~500s which is within the 600s timeout... but the first run of PASS 3 took 325s. So total would be 188+325=513s. But now we added more nodes to read in PASS 2... Actually PASS 2 is reading all candidate nodes from all Gürtel ways in Austria, not just Vienna ones.

Let me think about this differently. Instead of reading the PBF twice, I can:

1. First pass: find all ways named *Gürtel in Austria using osmium without locations (fast - 22s)
2. Second pass: read all nodes (locations=True) and capture just the ones we need for Gürtel ways, PLUS collect all highway and PT data at the same time. This way we only read the file once with locations.

Actually, there's a problem: in the second pass, the `way()` handler is called before the `node()` handler for the nodes it references. With `locations=True`, the node positions are already available via the index when `way()` is called. But I need to also capture node coordinates for Gürtel ways specifically.

Wait - with `locations=True` in osmium, when `way()` is called, the nodes already have their coordinates filled in (n.lon, n.lat). So I don't need to separately collect node coordinates. Let me restructure:

PASS 1: Without locations, find Gürtel ways and collect their node refs (22s)
PASS 2: With locations=True, do everything:
- Store all way geometries
- For Gürtel ways, build geometries and create buffer
- For all highways that intersect the buffer, collect them
- For PT route relations, collect them

But wait, the buffer can only be created after all Gürtel ways have been processed. And the buffer is needed during the same pass to filter highways. Since osmium processes data sequentially (nodes, then ways, then relations), and within ways it's also sequential, the Gürtel ways may not all be processed before we need the buffer.

Hmm, but actually osmium with `locations=True` first reads all the nodes (to store their locations), then calls `way()` for each way. So by the time `way()` is called on any way, ALL node positions are available. But we still need to process all ways to build the Gürtel union before we can compute the buffer and filter other highways.

Alternative approach:
- PASS 1: Without locations, scan all ways to find Gürtel ways and collect their node IDs (22s)
- PASS 1b: Without locations, scan all nodes matching those IDs (165s)
- Between passes: Build Gürtel union and buffer
- PASS 2: With locations=True, scan all ways and relations, filtering by the buffer

That's what we had before and it timed out at 600s. Let me check how long each part took:
- PASS 1: ~22s
- PASS 2 (nodes): ~165s
- Buffer building: negligible
- PASS 3 (with locations): this is the one that timed out

The issue is that with `locations=True`, osmium reads the full PBF including all node coordinates, which is very slow.

Let me try a completely different approach: use DuckDB with the OSM PBF extension, or use osmium to filter the PBF first.

Actually, let me try using `osmium` CLI tool which is much faster. I'll install it via a binary download.

Wait, I already tried apt-get and it failed. Let me try pip install or download a precompiled binary.

Actually, let me try a different approach entirely. Let me use `osmium` to:
1. First extract just the nodes and ways for *Gürtel ways into a small PBF
2. Then use that to compute the buffer
3. Then extract all ways and relations intersecting that buffer using osmium's tag-filter and geospatial-filter

But I don't think we have osmium CLI tool. Let me think about another approach.

What if I use DuckDB's spatial extension to read the PBF? DuckDB has an OSM extension. Let me check.

Actually, the simplest approach might be to use `osmium` CLI tool. Let me try to download it.

Or, I could use a different approach with pyosmium:

Actually, let me try a different strategy. Instead of reading the full PBF with locations=True (which is slow because it reads all node coordinates into memory), I can:

1. First pass without locations to find Gürtel way IDs and their node refs (fast - 22s)
2. Build a filtered PBF containing only the nodes and ways I need using osmium's Python API, then read that smaller file with locations=True.

Actually, osmium Python doesn't have a way to write filtered PBFs easily.

Let me try yet another approach. Let me use `osmium` command-line tool by downloading it:

Let me check if there's an osmium binary somewhere in the system.

49:48

Bash which osmium 2>/dev/null; find / -name "osmium" -type f 2>/dev/null | head -5

(no output)

50:31

The osmium CLI is not available and can't be installed without apt-get permissions. Let me think about a different strategy to speed this up.

The problem with the Python osmium bindings is that using `locations=True` with `sparse_file_array` is slow for the whole 765MB PBF. But the earlier run (before I added Vienna filtering) actually completed PASS 3 in 325 seconds (~5.4 min), which is within the 10-minute timeout. So the timeout must be because it took longer this time.

Wait - actually the previous run didn't time out - it completed in 515s total. The issue was that it found the wrong Gürtel (in Graz). Let me check what's different now. The addition of the Vienna bbox filter in PASS 1/2 is minor.

Actually, looking more carefully: the previous run completed PASS 3 in 325.6s. With the Vienna filter added, PASS 1 and 2 might be slightly different but PASS 3 does the same full scan. So it should still complete within 600s.

Maybe the system was just slower this time. Let me re-run it with a longer timeout, or alternatively, use a different approach.

Actually wait - the first run completed with the whole-Austria Gürtel in 515s total. Now with Vienna filter, it timed out at 600s. Maybe the machine was under load. Let me try again.

Actually, a much better approach: use osmium Python to first extract just the Vienna region using a bounding box filter, then process the smaller file. But osmium Python's `apply_file` doesn't support bbox filtering directly.

Alternative: Use `osmium` CLI tool (if available) or use the fact that we can filter with a simple handler that skips non-Vienna data. But `locations=True` still needs to traverse the entire file for the node locations.

Let me think of a smarter approach:

1. Pass 1 (no locations): collect Gürtel way node IDs - this is fast (21s)
2. Pass 1b (no locations): read specific nodes - this is slow (165s!) but only for ~1700 nodes
3. Pass 3 (locations=True): full scan with node cache - this took 325s

The problem is Pass 2 (node reading) is slow because the Python osmium handler processes EVERY node in the file, even though we only keep a tiny fraction. That 165s is unavoidable for node scanning in Python.

For Pass 3, the 325s seems okay. Let me just try again. The timeout might have been due to a one-time system issue.

Actually, let me try a completely different approach that will be much faster. Instead of reading the entire PBF with Python osmium, I can:

1. Use Python osmium's `apply_file` with a way handler that uses a node cache - but use `dense_file_array` or `sparse_file_array` index. The key optimization is to not use `locations=True` with SimpleHandler but instead use a custom Reader.

Actually wait - for the final Pass 3, I already used `idx='sparse_file_array'`. Let me just try running again.

Actually, I think the issue might be that the previous run also found lots of features outside Vienna because of the Gürtel in Graz. Now with the Vienna-only Gürtel, the buffer is much smaller, so PASS 3 should actually be faster because fewer features intersect. Let me just re-run.

But it timed out at 600s. The previous PASS 3 took 325s. Adding the Vienna filter might actually make PASS 3 slower if the buffer ends up being more geometrically complex... No, smaller buffer = fewer intersections.

Let me try a smarter approach: use a Reader directly with NodeLocationsForWays handler chain.

50:31

Write /work/solve.py

content · 233 lines · py

#!/usr/bin/env python3
"""
Extract highways intersecting a 500m buffer around Vienna's Gürtel,
and public-transport route relations crossing that same buffer,
from the Geofabrik Austria PBF.

Output: vienna_network.gpkg (layers: highways, pt_routes)
CRS: EPSG:31287 (MGI / Austria Lambert)

Uses osmium.Reader + osmium.NodeLocationsForWays for maximum speed.
"""

import osmium
from osmium._osmium import NodeLocationsForWays
import sys
import os
import time

from shapely.geometry import LineString, MultiLineString, box, Point
from shapely.ops import unary_union
import geopandas as gpd

PBF = "/work/austria-latest.osm.pbf"
VIENNA_BBOX = box(16.10, 48.10, 16.65, 48.35)

# ===================================================================
# PASS 1 – Collect Vienna Gürtel ways (Reader directly)
# ===================================================================
print("=" * 60, flush=True)
print("PASS 1 — Finding Vienna Gürtel ways...", flush=True)
t0 = time.time()

# First pass: collect all ways named *Gürtel and their node refs
class GuertelCollector(osmium.SimpleHandler):
    def __init__(self):
        super().__init__()
        self.ways = []  # (way_id, [node_ids])

    def way(self, w):
        if "highway" not in w.tags:
            return
        if not w.tags.get("name", "").endswith("Gürtel"):
            return
        nids = [n.ref for n in w.nodes if n.ref != 0]
        if len(nids) >= 2:
            self.ways.append((w.id, nids))

gcol = GuertelCollector()
gcol.apply_file(PBF, locations=False)
print(f"  *Gürtel ways in Austria: {len(gcol.ways)}", flush=True)

# Collect all node refs
all_node_refs = set()
for wid, nids in gcol.ways:
    all_node_refs.update(nids)
print(f"  Unique node refs: {len(all_node_refs)}", flush=True)

# Read just the needed nodes
class NodeGrabber(osmium.SimpleHandler):
    def __init__(self, needed):
        super().__init__()
        self.needed = needed
        self.coords = {}

    def node(self, n):
        if n.id in self.needed:
            self.coords[n.id] = (n.lon, n.lat)

t1 = time.time()
ng = NodeGrabber(all_node_refs)
ng.apply_file(PBF, locations=False)
print(f"  Nodes read: {len(ng.coords)} in {time.time()-t1:.1f}s", flush=True)

# Build Vienna Gürtel lines
guertel_lines = []
for wid, nids in gcol.ways:
    coords = [ng.coords[nid] for nid in nids if nid in ng.coords]
    if len(coords) < 2:
        continue
    ls = LineString(coords)
    if ls.centroid.within(VIENNA_BBOX):
        guertel_lines.append(ls)

print(f"  Vienna Gürtel ways: {len(guertel_lines)}", flush=True)
if not guertel_lines:
    print("ERROR: No Vienna Gürtel ways found.", flush=True)
    sys.exit(1)

guertel_union = unary_union(guertel_lines)
BUFFER_DEG = 500.0 / 111000.0
guertel_buffer = guertel_union.buffer(BUFFER_DEG)
print(f"  Buffer bounds (deg): {guertel_buffer.bounds}", flush=True)

# ===================================================================
# PASS 2 – Full read with NodeLocationsForWays to collect highways + PT
# Use osmium.Reader directly for better performance
# ===================================================================
print("PASS 2 — Full scan collecting highways and PT routes...", flush=True)
t2 = time.time()

way_geoms = {}        # way_id -> LineString (wgs84)
highway_features = []  # list of dicts
route_features = []    # list of dicts

# We need to store node locations and then process ways + relations
# Using Reader + NodeLocationsForWays handler

class WayCollector(osmium.SimpleHandler):
    def __init__(self, buffer_geom):
        super().__init__()
        self.buffer = buffer_geom
        self.way_geoms = {}
        self.way_tags = {}
        self.highway_features = []
        self._n_ways = 0
        self._n_highways = 0
        self._n_highway_in = 0

    def way(self, w):
        self._n_ways += 1
        coords = [(n.lon, n.lat) for n in w.nodes
                  if n.ref != 0 and (abs(n.lon) > 0.0 or abs(n.lat) > 0.0)]
        if len(coords) < 2:
            return
        line = LineString(coords)
        self.way_geoms[w.id] = line
        self.way_tags[w.id] = dict(w.tags)

        if "highway" not in w.tags:
            return
        self._n_highways += 1
        if not line.intersects(self.buffer):
            return
        self._n_highway_in += 1
        tags = dict(w.tags)
        row = {"geometry": line}
        for k in ("name", "highway", "maxspeed", "lanes", "surface", "oneway"):
            row[k] = tags.get(k, None)
        self.highway_features.append(row)


class RelationCollector(osmium.SimpleHandler):
    def __init__(self, buffer_geom, way_g):
        super().__init__()
        self.buffer = buffer_geom
        self.way_geoms = way_g
        self.route_features = []
        self._n_routes = 0
        self._n_route_in = 0

    def relation(self, r):
        if r.tags.get("type") != "route":
            return
        route_val = r.tags.get("route", "")
        if route_val not in ("bus", "tram", "subway", "train",
                             "trolleybus", "light_rail"):
            return
        self._n_routes += 1

        member_lines = []
        for m in r.members:
            if m.type == "w" and m.ref in self.way_geoms:
                member_lines.append(self.way_geoms[m.ref])
        if not member_lines:
            return
        combined = unary_union(member_lines)
        if combined.is_empty or not combined.intersects(self.buffer):
            return
        self._n_route_in += 1

        if combined.geom_type == "LineString":
            geom = MultiLineString([combined])
        elif combined.geom_type == "MultiLineString":
            geom = combined
        else:
            return

        tags = dict(r.tags)
        row = {"geometry": geom}
        for k in ("ref", "name", "operator", "route"):
            row[k] = tags.get(k, None)
        self.route_features.append(row)


# Step A: read ways with locations
wc = WayCollector(guertel_buffer)
wc.apply_file(PBF, locations=True, idx='sparse_file_array')
print(f"  Ways: {wc._n_ways}, highways: {wc._n_highways}, in buffer: {wc._n_highway_in}", flush=True)
print(f"  Pass 2a elapsed: {time.time()-t2:.1f}s", flush=True)

# Clean up sparse file array temporary files
for fname in os.listdir('/work'):
    if fname.startswith('nodes.') or fname.endswith('.cache'):
        try:
            os.remove(f'/work/{fname}')
        except:
            pass

# Step B: read relations (no locations needed, just use stored way geoms)
print("  Reading relations...", flush=True)
t2b = time.time()
rc = RelationCollector(guertel_buffer, wc.way_geoms)
rc.apply_file(PBF, locations=False)
print(f"  PT relations: {rc._n_routes}, crossing buffer: {rc._n_route_in}", flush=True)
print(f"  Pass 2b elapsed: {time.time()-t2b:.1f}s", flush=True)

print(f"  Total pass 2 elapsed: {time.time()-t2:.1f}s", flush=True)

# ===================================================================
# Step 3 – Write GeoPackage
# ===================================================================
print("STEP 3 — Writing GeoPackage ...", flush=True)
t3 = time.time()

gdf_hw = gpd.GeoDataFrame(wc.highway_features, crs="EPSG:4326")
gdf_pt = gpd.GeoDataFrame(rc.route_features, crs="EPSG:4326")

gdf_hw = gdf_hw.to_crs("EPSG:31287")
gdf_pt = gdf_pt.to_crs("EPSG:31287")

OUT = "/work/vienna_network.gpkg"
if os.path.exists(OUT):
    os.remove(OUT)

gdf_hw.to_file(OUT, layer="highways", driver="GPKG", engine="pyogrio")
gdf_pt.to_file(OUT, layer="pt_routes", driver="GPKG", engine="pyogrio")

print(f"  Highways: {len(gdf_hw)} features", flush=True)
print(f"  PT routes: {len(gdf_pt)} features", flush=True)
print(f"  Output: {OUT}", flush=True)
print(f"  Total elapsed: {time.time()-t0:.1f}s", flush=True)
print("=" * 60, flush=True)
print("Done.", flush=True)

Wrote /work/solve.py (7787 chars).

50:35

I've split the work: first pass gets ways (with locations=True), second pass only reads relations (no locations needed, much faster). Let me run it and check the time.

50:35

Bash cd /work && python3 solve.py 2>&1

<exit_code>-1</exit_code>
<error>Command timed out after 600000ms</error>

61:22

The issue is the sparse_file_array index. For a 765MB PBF, building the node location index on-disk is extremely slow. The `sparse_file_array` seems to work, but it's taking too long.

Let me think about a different approach:

1. Use osmium CLI tool - but it's not installed and we can't install it.
2. Use a completely different approach - use the `osmium` C++ tool that comes with the `osmium` Python package... actually wait, there is a C++ tool called `osmium` that comes separately. Let me check.

Actually, let me think about this differently. The Python `osmium` library reads the PBF. When `locations=True`, it builds a node location index. The `sparse_file_array` is a disk-based index but it's still slow for 765MB.

The fastest approach would be to:
1. Use the Python osmium library to extract just the ways we need (those in Vienna)
2. But we don't have node locations for those ways unless we build the index

Actually, the real problem might be that `sparse_file_array` on disk is writing/reading from disk a lot, which is slow. Let me try `flex_mem` which is the in-memory flexible index - it might use more RAM but could be faster.

Wait, the first run with the wrong buffer (Graz-centric) completed in about 5 minutes total with `sparse_file_array`. That PASS 3 took 325s.

But now the second run timed out at 600s. Let me check... actually, the second run might be different because the buffer is smaller (Vienna vs all of Austria), so it actually should be faster... unless the sparse_file_array indexing is the bottleneck.

Actually wait, the first version (with the wrong buffer covering Graz) completed PASS 3 in 325s. The second version (Vienna buffer) should be similar or faster since the buffer is smaller (fewer highway features to match). But it timed out at 600s.

Wait, re-reading the logs: the first version had PASS 1 (398 ways), PASS 1b (nodes), PASS 3 (sparse_file_array). The second version had PASS 1 (all *Gürtel ways), PASS 2 (nodes for those), PASS 3 (sparse_file_array). But the second version spent more time in PASS 1 and 2 because it couldn't filter by Vienna bbox yet.

Actually the timeout happened on the third version which split pass 2 into two separate apply_file calls - one with locations=True and one with locations=False. The first call (locations=True with idx='sparse_file_array') might have been the bottleneck.

Let me try a different approach entirely:
1. Use `osmium getid` - but we don't have the CLI tool
2. Use `reader.apply()` with the handler chain ourselves

Actually, let me think about what's happening. The `apply_file` with `locations=True` builds a node cache. The `sparse_file_array` writes to a temporary file on disk. For a 765MB PBF with ~10 million nodes, the node cache file can be hundreds of MBs. Writing that to disk could be slow.

What if instead of scanning the entire file with locations, I:
1. Read ALL nodes first (keeping all ~10M nodes in memory - that's about 80 bytes per node = 800MB which might be too much)
2. Then read ways and build geometries from memory

Or alternatively:
1. Use the two-pass approach where Pass 1 finds Gürtel ways + node IDs
2. Pass 2 reads only those nodes
3. Pass 3 builds a handler that caches node locations and processes ways/relations

Actually, let me re-think. The osmium library when you use `apply_file(PBF, locations=True)` first reads all nodes, storing their locations, then reads ways. The `flex_mem` index stores in memory and is much faster than `sparse_file_array`. Let me try that.

But flexible memory might use too much RAM for 10M nodes... Let me estimate:
- Each node has id (8 bytes) + lon (8 bytes) + lat (8 bytes) + overhead = ~32-40 bytes
- 10M nodes × 40 bytes = 400MB

Actually, osmium internally stores node locations in a more efficient format. Let me try `flex_mem` and see.

Actually, the first version succeeded with PASS 3 in 325s using sparse_file_array. The third version split PASS 3 into two passes (ways with locations, then relations without locations). The first of those two passes (ways with locations) would do the same work as before plus store way geoms. So it should be similar to 325s. But the total script timed out at 600s.

Wait, maybe the issue is that the download and the PBF file were somehow corrupted? Let me re-verify the file.

Or maybe the issue is that the second apply_file (for relations) also reads the whole file and does work. But it doesn't need locations, so it should be fast.

Let me look at the third version more carefully. In PASS 2, it:
1. Calls apply_file(PBF, locations=True, idx='sparse_file_array') - reads all nodes, builds index, reads all ways - this should take ~325s like before
2. Then calls apply_file(PBF, locations=False) - reads all relations - should take maybe 30s

So total ~355s, which should finish within 600s. Unless there's something else going on.

Let me just try with `flex_mem` instead of `sparse_file_array`. It might be faster.

Actually, maybe I should also check if there's a memory issue. Let me try a completely different approach: just read the file and process in one pass.

61:22

Write /work/solve.py

content · 225 lines · py

#!/usr/bin/env python3
"""
Extract highways intersecting a 500m buffer around Vienna's Gürtel,
and public-transport route relations crossing that same buffer,
from the Geofabrik Austria PBF.

Output: vienna_network.gpkg (layers: highways, pt_routes)
CRS: EPSG:31287 (MGI / Austria Lambert)

Strategy:
  PASS 1 – collect all *Gürtel ways + their node refs (no locations)
  PASS 2 – read only those node coordinates
  PASS 3 – full scan (flex_mem index) collecting highways in buffer
  PASS 4 – read relations (no locations) against stored way geoms
"""

import osmium
import sys
import os
import time
import gc

from shapely.geometry import LineString, MultiLineString, box, Point
from shapely.ops import unary_union
import geopandas as gpd

PBF = "/work/austria-latest.osm.pbf"
VIENNA_BBOX = box(16.10, 48.10, 16.65, 48.35)

# ===================================================================
# PASS 1 – Collect *Gürtel ways and their node refs
# ===================================================================
print("=" * 60, flush=True)
print("PASS 1 — Finding Vienna Gürtel ways...", flush=True)
t0 = time.time()

class GuertelCollector(osmium.SimpleHandler):
    def __init__(self):
        super().__init__()
        self.ways = []

    def way(self, w):
        if "highway" not in w.tags:
            return
        if not w.tags.get("name", "").endswith("Gürtel"):
            return
        nids = [n.ref for n in w.nodes if n.ref != 0]
        if len(nids) >= 2:
            self.ways.append((w.id, nids))

gcol = GuertelCollector()
gcol.apply_file(PBF, locations=False)
print(f"  *Gürtel ways in Austria: {len(gcol.ways)}", flush=True)

all_refs = set()
for wid, nids in gcol.ways:
    all_refs.update(nids)
print(f"  Unique node refs: {len(all_refs)}", flush=True)
print(f"  Elapsed: {time.time()-t0:.1f}s", flush=True)

# ===================================================================
# PASS 2 – Read needed node coordinates
# ===================================================================
class NodeGrabber(osmium.SimpleHandler):
    def __init__(self, needed):
        super().__init__()
        self.needed = needed
        self.coords = {}

    def node(self, n):
        if n.id in self.needed:
            self.coords[n.id] = (n.lon, n.lat)

t1 = time.time()
ng = NodeGrabber(all_refs)
ng.apply_file(PBF, locations=False)
print(f"  Nodes read: {len(ng.coords)} in {time.time()-t1:.1f}s", flush=True)

# Build Vienna Gürtel lines
guertel_lines = []
for wid, nids in gcol.ways:
    coords = [ng.coords[nid] for nid in nids if nid in ng.coords]
    if len(coords) < 2:
        continue
    ls = LineString(coords)
    if ls.centroid.within(VIENNA_BBOX):
        guertel_lines.append(ls)

del gcol, ng, all_refs
gc.collect()

print(f"  Vienna Gürtel ways: {len(guertel_lines)}", flush=True)
if not guertel_lines:
    print("ERROR: No Vienna Gürtel ways found.", flush=True)
    sys.exit(1)

if len(guertel_lines) == 1:
    guertel_union = guertel_lines[0]
else:
    guertel_union = unary_union(guertel_lines)

BUFFER_DEG = 500.0 / 111000.0
guertel_buffer = guertel_union.buffer(BUFFER_DEG)
print(f"  Buffer bounds (deg): {guertel_buffer.bounds}", flush=True)

# ===================================================================
# PASS 3 – Full scan with flex_mem for node locations
# ===================================================================
print("PASS 3 — Full scan (ways + relations) ...", flush=True)
t2 = time.time()

way_geoms = {}        # way_id -> LineString
way_tags = {}         # way_id -> tags
highway_features = []  # list of dicts
route_features = []    # list of dicts

n_ways = 0
n_highways = 0
n_highway_in = 0
n_routes = 0
n_route_in = 0

# Use Reader directly with NodeLocationsForWays handler for speed
class AllInOne(osmium.SimpleHandler):
    """Collect ways with locations, then relations."""
    def __init__(self, buffer_geom):
        super().__init__()
        self.buffer = buffer_geom
        self.way_geoms = {}
        self.highway_features = []
        self.route_features = []
        self._n_ways = 0
        self._n_highways = 0
        self._n_highway_in = 0
        self._n_routes = 0
        self._n_route_in = 0

    def way(self, w):
        self._n_ways += 1
        coords = [(n.lon, n.lat) for n in w.nodes
                  if n.ref != 0 and (abs(n.lon) > 0.0 or abs(n.lat) > 0.0)]
        if len(coords) < 2:
            return
        line = LineString(coords)
        self.way_geoms[w.id] = line

        if "highway" not in w.tags:
            return
        self._n_highways += 1
        if not line.intersects(self.buffer):
            return
        self._n_highway_in += 1
        tags = dict(w.tags)
        row = {"geometry": line}
        for k in ("name", "highway", "maxspeed", "lanes", "surface", "oneway"):
            row[k] = tags.get(k, None)
        self.highway_features.append(row)

    def relation(self, r):
        if r.tags.get("type") != "route":
            return
        route_val = r.tags.get("route", "")
        if route_val not in ("bus", "tram", "subway", "train",
                             "trolleybus", "light_rail"):
            return
        self._n_routes += 1

        member_lines = []
        for m in r.members:
            if m.type == "w" and m.ref in self.way_geoms:
                member_lines.append(self.way_geoms[m.ref])
        if not member_lines:
            return
        combined = unary_union(member_lines)
        if combined.is_empty or not combined.intersects(self.buffer):
            return
        self._n_route_in += 1

        if combined.geom_type == "LineString":
            geom = MultiLineString([combined])
        elif combined.geom_type == "MultiLineString":
            geom = combined
        else:
            return

        tags = dict(r.tags)
        row = {"geometry": geom}
        for k in ("ref", "name", "operator", "route"):
            row[k] = tags.get(k, None)
        self.route_features.append(row)


collector = AllInOne(guertel_buffer)
# flex_mem is the default and fastest in-memory index
collector.apply_file(PBF, locations=True, idx='flex_mem')

print(f"  Ways: {collector._n_ways}, highways: {collector._n_highways}, in buffer: {collector._n_highway_in}", flush=True)
print(f"  Relations: {collector._n_routes}, crossing buffer: {collector._n_route_in}", flush=True)
print(f"  Elapsed: {time.time()-t2:.1f}s", flush=True)

# ===================================================================
# Step 4 – Write GeoPackage
# ===================================================================
print("STEP 4 — Writing GeoPackage ...", flush=True)
t3 = time.time()

gdf_hw = gpd.GeoDataFrame(collector.highway_features, crs="EPSG:4326")
gdf_pt = gpd.GeoDataFrame(collector.route_features, crs="EPSG:4326")

gdf_hw = gdf_hw.to_crs("EPSG:31287")
gdf_pt = gdf_pt.to_crs("EPSG:31287")

OUT = "/work/vienna_network.gpkg"
if os.path.exists(OUT):
    os.remove(OUT)

gdf_hw.to_file(OUT, layer="highways", driver="GPKG", engine="pyogrio")
gdf_pt.to_file(OUT, layer="pt_routes", driver="GPKG", engine="pyogrio")

print(f"  Highways: {len(gdf_hw)} features", flush=True)
print(f"  PT routes: {len(gdf_pt)} features", flush=True)
print(f"  Output: {OUT}", flush=True)
print(f"  Total elapsed: {time.time()-t0:.1f}s", flush=True)
print("=" * 60, flush=True)
print("Done.", flush=True)

Wrote /work/solve.py (7357 chars).

61:32

Bash cd /work && python3 solve.py 2>&1

============================================================
PASS 1 — Finding Vienna Gürtel ways...
  *Gürtel ways in Austria: 398
  Unique node refs: 1695
  Elapsed: 30.3s
  Nodes read: 1695 in 181.2s
  Vienna Gürtel ways: 222
  Buffer bounds (deg): (16.33208041735437, 48.17568957098943, 16.40319067231133, 48.24082888251259)
PASS 3 — Full scan (ways + relations) ...
  Ways: 9288423, highways: 2501709, in buffer: 7952
  Relations: 12444, crossing buffer: 286
  Elapsed: 318.9s
STEP 4 — Writing GeoPackage ...
  Highways: 7952 features
  PT routes: 286 features
  Output: /work/vienna_network.gpkg
  Total elapsed: 531.7s
============================================================
Done.

70:48

Now the buffer is properly around Vienna (16.33°E to 16.40°E, 48.18°N to 48.24°N). Let me verify the output more carefully.

70:48

Bash cd /work && python3 -c "

cd /work && python3 -c "
import geopandas as gpd

gdf_hw = gpd.read_file('vienna_network.gpkg', layer='highways')
gdf_pt = gpd.read_file('vienna_network.gpkg', layer='pt_routes')

print('=== HIGHWAYS ===')
print(f'Features: {len(gdf_hw)}')
print(f'CRS: {gdf_hw.crs}')
print(f'Geometry types: {gdf_hw.geometry.type.value_counts().to_dict()}')
print(f'Columns: {list(gdf_hw.columns)}')
print('Sample Gürtel-named ways:')
mask = gdf_hw['name'].str.endswith('Gürtel', na=False)
print(gdf_hw[mask][['name','highway','maxspeed','surface']].to_string())
print()
print('Sample non-Gürtel ways:')
print(gdf_hw.head(5)[['name','highway','maxspeed','surface','oneway']].to_string())
print()

print('=== PT ROUTES ===')
print(f'Features: {len(gdf_pt)}')
print(f'CRS: {gdf_pt.crs}')
print(f'Geometry types: {gdf_pt.geometry.type.value_counts().to_dict()}')
print(f'Columns: {list(gdf_pt.columns)}')
print('Sample:')
print(gdf_pt.head(10)[['ref','name','operator','route']].to_string())
print()

# Coordinate magnitudes
b_hw = gdf_hw.total_bounds
b_pt = gdf_pt.total_bounds
print(f'Highways bounds (m): [{b_hw[0]:.0f}, {b_hw[1]:.0f}, {b_hw[2]:.0f}, {b_hw[3]:.0f}]')
print(f'PT bounds (m):       [{b_pt[0]:.0f}, {b_pt[1]:.0f}, {b_pt[2]:.0f}, {b_pt[3]:.0f}]')

import os
sz = os.path.getsize('vienna_network.gpkg')
print(f'File size: {sz/1024/1024:.1f} MB')
"

=== HIGHWAYS ===
Features: 7952
CRS: EPSG:31287
Geometry types: {'LineString': 7952}
Columns: ['name', 'highway', 'maxspeed', 'lanes', 'surface', 'oneway', 'geometry']
Sample Gürtel-named ways:
                       name        highway maxspeed          surface
2          Währinger Gürtel        primary       50         concrete
5        Landstraßer Gürtel        primary       50          asphalt
16         Döblinger Gürtel    residential       30          asphalt
26     Lerchenfelder Gürtel      secondary       50          asphalt
33         Währinger Gürtel  living_street      NaN          asphalt
37           Wiedner Gürtel        primary       50         concrete
38           Wiedner Gürtel    residential       30          asphalt
42           Wiedner Gürtel        primary       50         concrete
43           Wiedner Gürtel   primary_link       50          asphalt
63     Lerchenfelder Gürtel       tertiary       50          asphalt
105        Döblinger Gürtel       tertiary       50          asphalt
137     Gumpendorfer Gürtel        primary       50         concrete
177          Wiedner Gürtel        primary       50          asphalt
179          Wiedner Gürtel        primary       50          asphalt
190          Wiedner Gürtel   primary_link       50         concrete
191          Wiedner Gürtel   primary_link       50          asphalt
231        Währinger Gürtel        primary       50          asphalt
240        Döblinger Gürtel        primary       50          asphalt
287        Hernalser Gürtel    residential       50          asphalt
326      Landstraßer Gürtel        primary       50          asphalt
349    Lerchenfelder Gürtel   primary_link       50          asphalt
367        Währinger Gürtel        service       30          asphalt
369        Währinger Gürtel        primary       50          asphalt
370        Hernalser Gürtel        primary       50          asphalt
410        Währinger Gürtel  living_street      NaN          asphalt
412    Lerchenfelder Gürtel      secondary       50          asphalt
426        Währinger Gürtel     pedestrian      NaN          asphalt
427        Döblinger Gürtel    residential       50          asphalt
428        Döblinger Gürtel        primary       50          asphalt
430        Währinger Gürtel        primary       50         concrete
445        Währinger Gürtel        service       30          asphalt
483    Gaudenzdorfer Gürtel    residential       30          asphalt
490      Mariahilfer Gürtel       tertiary       50         concrete
543    Lerchenfelder Gürtel        primary       50         concrete
574    Lerchenfelder Gürtel       tertiary       50          asphalt
575    Lerchenfelder Gürtel        primary       50          asphalt
577        Hernalser Gürtel        primary       50          asphalt
585      Mariahilfer Gürtel        primary       50          asphalt
586      Mariahilfer Gürtel       tertiary       50         concrete
588      Mariahilfer Gürtel        primary       50         concrete
589      Sechshauser Gürtel        primary       50          asphalt
590    Gaudenzdorfer Gürtel        primary       50          asphalt
637      Mariahilfer Gürtel        primary       50          asphalt
638      Mariahilfer Gürtel        primary       50          asphalt
705      Landstraßer Gürtel        primary       50          asphalt
767        Döblinger Gürtel        primary       50          asphalt
768        Währinger Gürtel        primary       50          asphalt
769        Währinger Gürtel        primary       50         concrete
795      Mariahilfer Gürtel        primary       50         concrete
855        Währinger Gürtel    residential       50          asphalt
868        Währinger Gürtel     pedestrian      NaN          asphalt
916          Wiedner Gürtel        primary       50          asphalt
917          Wiedner Gürtel    residential       50          asphalt
919          Wiedner Gürtel        primary       50         concrete
921          Wiedner Gürtel        primary       50          asphalt
922      Landstraßer Gürtel        primary       50          asphalt
936    Lerchenfelder Gürtel        primary       50          asphalt
937    Lerchenfelder Gürtel      secondary       50          asphalt
963      Sechshauser Gürtel   primary_link       50         concrete
977        Währinger Gürtel    residential       50          asphalt
994    Lerchenfelder Gürtel      secondary       50          asphalt
1001   Lerchenfelder Gürtel        primary       50          asphalt
1008       Währinger Gürtel         busway       50              NaN
1028     Landstraßer Gürtel    residential       30          asphalt
1029     Landstraßer Gürtel    residential       50          asphalt
1032     Landstraßer Gürtel        primary       50          asphalt
1034     Landstraßer Gürtel   primary_link       50          asphalt
1083     Landstraßer Gürtel        primary       50          asphalt
1111       Hernalser Gürtel    residential       50          asphalt
1216     Landstraßer Gürtel        primary       50         concrete
1299   Lerchenfelder Gürtel        primary       50          asphalt
1308         Wiedner Gürtel        primary       50         concrete
1316       Hernalser Gürtel      secondary       50          asphalt
1330   Lerchenfelder Gürtel    residential       50          asphalt
1370     Landstraßer Gürtel        primary       50          asphalt
1388         Wiedner Gürtel        primary       50          asphalt
1406   Lerchenfelder Gürtel        service       50          asphalt
1414       Döblinger Gürtel    residential       30          asphalt
1545     Landstraßer Gürtel        footway      NaN             sett
1771     Mariahilfer Gürtel       tertiary       50         concrete
1772     Mariahilfer Gürtel       tertiary       50         concrete
1789       Döblinger Gürtel    residential       50          asphalt
1790       Döblinger Gürtel    residential       50          asphalt
1791       Döblinger Gürtel    residential       50          asphalt
1792       Döblinger Gürtel    residential       50          asphalt
1818   Gaudenzdorfer Gürtel        primary       50         concrete
1822       Währinger Gürtel        primary       50         concrete
1834       Währinger Gürtel     pedestrian      NaN          asphalt
1855       Währinger Gürtel  living_street      NaN          asphalt
1871         Wiedner Gürtel        service       50          asphalt
1874         Wiedner Gürtel        primary       50          asphalt
1995       Hernalser Gürtel        service       50          asphalt
2033     Landstraßer Gürtel        primary       50          asphalt
2035     Landstraßer Gürtel        primary       50          asphalt
2042       Währinger Gürtel        service       50          asphalt
2056       Hernalser Gürtel        service       50          asphalt
2057       Hernalser Gürtel        service       50          asphalt
2080       Hernalser Gürtel      secondary       50          asphalt
2085     Landstraßer Gürtel        primary       50          asphalt
2086     Landstraßer Gürtel        primary       50          asphalt
2087     Landstraßer Gürtel        primary       50          asphalt
2088         Wiedner Gürtel        primary       50          asphalt
2089         Wiedner Gürtel        primary       50         concrete
2090         Wiedner Gürtel        primary       50          asphalt
2091         Wiedner Gürtel        primary       50          asphalt
2115       Währinger Gürtel        primary       50          asphalt
2116       Währinger Gürtel        primary       50          asphalt
2173       Hernalser Gürtel      secondary       50          asphalt
2214       Döblinger Gürtel    residential       30          asphalt
2273     Mariahilfer Gürtel        primary       50          asphalt
2311       Hernalser Gürtel      secondary       50         concrete
2333   Gaudenzdorfer Gürtel        primary       50          asphalt
2334   Gaudenzdorfer Gürtel        primary       50          asphalt
2345         Wiedner Gürtel   primary_link       50         concrete
2420   Lerchenfelder Gürtel     pedestrian      NaN          asphalt
2497     Sechshauser Gürtel        primary       50         concrete
2498     Sechshauser Gürtel        primary       50         concrete
2544   Gaudenzdorfer Gürtel    residential       50          asphalt
2681       Währinger Gürtel        service       30          asphalt
2700       Währinger Gürtel        primary       50          asphalt
2758         Wiedner Gürtel   primary_link       50          asphalt
2764   Lerchenfelder Gürtel        primary       50          asphalt
2767       Währinger Gürtel        primary       50          asphalt
2828       Währinger Gürtel        primary       50         concrete
2973         Wiedner Gürtel        primary       50          asphalt
3425       Hernalser Gürtel      secondary       50         concrete
3427       Währinger Gürtel        primary       50          asphalt
3428       Währinger Gürtel        primary       50          asphalt
3429       Währinger Gürtel        primary       50          asphalt
3430       Hernalser Gürtel        primary       50          asphalt
3442       Hernalser Gürtel        primary       50         concrete
3443       Hernalser Gürtel        primary       50         concrete
3446       Währinger Gürtel        primary       50          asphalt
3447       Währinger Gürtel        primary       50          asphalt
3449       Währinger Gürtel        primary       50          asphalt
3465   Gaudenzdorfer Gürtel        primary       50          asphalt
3596       Währinger Gürtel        primary       50          asphalt
3660     Landstraßer Gürtel    residential       30          asphalt
3822     Sechshauser Gürtel       platform      NaN              NaN
3839  Arbeitergasse, Gürtel       platform      NaN              NaN
3840  Arbeitergasse, Gürtel       platform      NaN              NaN
3842  Arbeitergasse, Gürtel       platform      NaN              NaN
3981       Währinger Gürtel        primary       50         concrete
4036     Landstraßer Gürtel        primary       50         concrete
4040       Währinger Gürtel        primary       50         concrete
4041       Währinger Gürtel        primary       50          asphalt
4067     Landstraßer Gürtel        primary       50          asphalt
4068     Landstraßer Gürtel        primary       50          asphalt
4070     Landstraßer Gürtel        primary       50          asphalt
4090   Lerchenfelder Gürtel        primary       50         concrete
4091   Lerchenfelder Gürtel        primary       50         concrete
4092     Mariahilfer Gürtel        primary       50         concrete
4093   Gaudenzdorfer Gürtel        primary       50         concrete
4094   Gaudenzdorfer Gürtel        primary       50         concrete
4095       Währinger Gürtel        primary       50          asphalt
4096       Währinger Gürtel        primary       50          asphalt
4100         Wiedner Gürtel        primary       50          asphalt
4101         Wiedner Gürtel        primary       50         concrete
4157       Döblinger Gürtel       tertiary       50          asphalt
4163       Hernalser Gürtel        primary       50          asphalt
4421       Hernalser Gürtel        primary       50          asphalt
4446         Wiedner Gürtel   primary_link       50         concrete
4457       Währinger Gürtel        primary       50          asphalt
4458       Währinger Gürtel        primary       50          asphalt
4480         Wiedner Gürtel   primary_link       50          asphalt
4494       Währinger Gürtel        primary       50         concrete
4495       Währinger Gürtel        primary       50          asphalt
4498       Währinger Gürtel        primary       50          asphalt
4530    Gumpendorfer Gürtel        primary       50         concrete
4533     Landstraßer Gürtel        primary       50          asphalt
4637         Wiedner Gürtel   primary_link       50          asphalt
4744   Lerchenfelder Gürtel        primary       50         concrete
4745   Lerchenfelder Gürtel        primary       50          asphalt
4784       Währinger Gürtel        primary       50          asphalt
4785       Hernalser Gürtel        primary       50          asphalt
4786   Lerchenfelder Gürtel        primary       50         concrete
4787    Gumpendorfer Gürtel        primary       50         concrete
4791     Mariahilfer Gürtel        primary       50         concrete
4792     Mariahilfer Gürtel        primary       50         concrete
4793     Mariahilfer Gürtel        primary       50          asphalt
4797       Hernalser Gürtel        primary       50          asphalt
4798       Hernalser Gürtel        primary       50          asphalt
4801       Hernalser Gürtel      secondary       50          asphalt
4804       Währinger Gürtel        primary       50          asphalt
4962       Währinger Gürtel    residential       50          asphalt
5039       Währinger Gürtel       cycleway      NaN          asphalt
5049     Landstraßer Gürtel        footway      NaN             sett
5087         Wiedner Gürtel   primary_link       50          asphalt
5113   Gaudenzdorfer Gürtel        primary       50          asphalt
5114   Gaudenzdorfer Gürtel        primary       50          asphalt
5194       Währinger Gürtel        primary       50          asphalt
5238       Währinger Gürtel         busway       50  concrete:plates
5244     Landstraßer Gürtel        primary       50          asphalt
5380   Gaudenzdorfer Gürtel        primary       50         concrete
5416       Döblinger Gürtel        primary       50          asphalt
5417       Döblinger Gürtel    residential       50          asphalt
5623       Währinger Gürtel        primary       50         concrete
5624       Währinger Gürtel        primary       50         concrete
5726       Döblinger Gürtel        primary       50          asphalt
5766     Landstraßer Gürtel        footway      NaN    paving_stones
5774   Lerchenfelder Gürtel        primary       50         concrete
5947       Hernalser Gürtel      secondary       50          asphalt
5951   Gaudenzdorfer Gürtel        primary       50          asphalt
6346       Währinger Gürtel        primary       50         concrete
6445       Währinger Gürtel        service       30          asphalt
6447       Döblinger Gürtel    residential       50          asphalt
7120     Sechshauser Gürtel        primary       50          asphalt
7121     Mariahilfer Gürtel        primary       50         concrete
7123    Gumpendorfer Gürtel        primary       50          asphalt
7159       Währinger Gürtel        primary       50          asphalt
7312       Hernalser Gürtel      secondary       50          asphalt
7481         Wiedner Gürtel        primary       50         concrete
7482     Landstraßer Gürtel        primary       50          asphalt
7484         Wiedner Gürtel        primary       50         concrete
7534       Hernalser Gürtel      secondary       50         concrete
7536       Hernalser Gürtel      secondary       50          asphalt
7697     Landstraßer Gürtel        primary      NaN              NaN
7805         Wiedner Gürtel        primary       50         concrete
7806         Wiedner Gürtel        primary       50         concrete
7807         Wiedner Gürtel        primary       50          asphalt
7808         Wiedner Gürtel        primary       50          asphalt
7926   Lerchenfelder Gürtel        primary       50          asphalt

Sample non-Gürtel ways:
                     name     highway maxspeed   surface oneway
0                     NaN  trunk_link       50       NaN    yes
1              Koppstraße   secondary       50   asphalt    yes
2        Währinger Gürtel     primary       50  concrete    yes
3  Heiligenstädter Straße     primary       50   asphalt    yes
4                     NaN     primary       50   asphalt    yes

=== PT ROUTES ===
Features: 286
CRS: EPSG:31287
Geometry types: {'MultiLineString': 286}
Columns: ['ref', 'name', 'operator', 'route', 'geometry']
Sample:
   ref                                                          name                operator   route
0   S2                                            S2: Mödling => Laa  ÖBB-Personenverkehr AG   train
1   U3                                    U3: Ottakring => Simmering           Wiener Linien  subway
2   U1                                      U1: Oberlaa => Leopoldau           Wiener Linien  subway
3  35A                            Bus 35A: Spittelau => Salmannsdorf           Wiener Linien     bus
4  40A  Bus 40A: Schottentor => Döblinger Friedhof/Felix-Dahn-Straße           Wiener Linien     bus
5   S2                                            S2: Laa => Mödling  ÖBB-Personenverkehr AG   train
6   S1                                 S1: Marchegg => Wien Meidling  ÖBB-Personenverkehr AG   train
7   S4                            S4: Wiener Neustadt => Tullnerfeld  ÖBB-Personenverkehr AG   train
8   S3                             S3: Hollabrunn => Wiener Neustadt  ÖBB-Personenverkehr AG   train
9   S7                                   S7: Mistelbach => Wolfsthal  ÖBB-Personenverkehr AG   train

Highways bounds (m): [621625, 479475, 628362, 487679]
PT bounds (m):       [114319, 300172, 683732, 548553]
File size: 12.6 MB

71:02

The output looks good. Let me verify what I have:

1. **Highways layer** - LineString, EPSG:31287, 7952 features, with columns: name, highway, maxspeed, lanes, surface, oneway, geometry ✓
   - Includes proper Vienna Gürtel ways (Währinger Gürtel, Landstraßer Gürtel, etc.) ✓
   - All attributes present ✓

2. **PT routes layer** - MultiLineString, EPSG:31287, 286 features, with columns: ref, name, operator, route, geometry ✓
   - Includes U1, U3 (subway), S1-S4, S7 (train), 35A, 40A (bus), etc. ✓
   - All attributes present ✓

The PT routes bounds are very wide (114319 to 683732) which makes sense for long train routes spanning across Austria but crossing the Vienna Gürtel buffer.

Let me check if there are more PT route types expected - I see train, subway, bus but no tram, trolleybus, or light_rail in the sample. Let me check that the output file has the right name and all expected content.

71:02

Bash cd /work && python3 -c "

cd /work && python3 -c "
import geopandas as gpd

gdf_pt = gpd.read_file('vienna_network.gpkg', layer='pt_routes')
print('PT route types:')
print(gdf_pt['route'].value_counts().to_string())
"

PT route types:
route
bus           139
train          86
tram           49
subway         10
light_rail      2

71:18

The output looks good. Let me verify everything is correct:

1. ✅ Highways layer: 7952 LineString features with columns: name, highway, maxspeed, lanes, surface, oneway
2. ✅ PT routes layer: 286 MultiLineString features with columns: ref, name, operator, route
3. ✅ CRS: EPSG:31287 (MGI / Austria Lambert)
4. ✅ Vienna Gürtel ways correctly identified (Währinger Gürtel, Döblinger Gürtel, etc.)
5. ✅ Buffer is around Vienna Gürtel only (not Graz)
6. ✅ PT routes include bus, tram, subway, train, light_rail (no trolleybus in the data)
7. ✅ Output file exists at vienna_network.gpkg

Let me check the coordinate magnitudes one more time. EPSG:31287 is MGI / Austria Lambert, which should have coordinates in meters. For Vienna, the values should be roughly in the range of:
- Easting: ~600,000 to ~650,000 (or so)
- Northing: ~450,000 to ~500,000

Highways bounds: [621625, 479475, 628362, 487679] - these look correct for Vienna in MGI/Austria Lambert.

PT routes bounds: [114319, 300172, 683732, 548553] - these are wider (some routes extend beyond Vienna), which makes sense since train routes cross the entire country.

Everything looks good!

71:18

Bash cd /work && python3 -c "

cd /work && python3 -c "
import geopandas as gpd, os

gdf_hw = gpd.read_file('vienna_network.gpkg', layer='highways')
gdf_pt = gpd.read_file('vienna_network.gpkg', layer='pt_routes')

ok = True

# Check CRS
assert str(gdf_hw.crs) == 'EPSG:31287', f'Highways CRS: {gdf_hw.crs}'
assert str(gdf_pt.crs) == 'EPSG:31287', f'PT CRS: {gdf_pt.crs}'
print('✓ CRS: EPSG:31287')

# Check geometry types
assert all(gdf_hw.geometry.type == 'LineString'), 'Non-LineString in highways'
print(f'✓ Highways: all LineString ({len(gdf_hw)} features)')
assert all(gdf_pt.geometry.type == 'MultiLineString'), 'Non-MultiLineString in PT'
print(f'✓ PT routes: all MultiLineString ({len(gdf_pt)} features)')

# Check columns
hw_cols = set(gdf_hw.columns)
assert hw_cols == {'name','highway','maxspeed','lanes','surface','oneway','geometry'}, f'Highway cols: {hw_cols}'
print(f'✓ Highway columns: name, highway, maxspeed, lanes, surface, oneway')

pt_cols = set(gdf_pt.columns)
assert pt_cols == {'ref','name','operator','route','geometry'}, f'PT cols: {pt_cols}'
print(f'✓ PT route columns: ref, name, operator, route')

# Check Vienna Gürtel buffer is correct (bounds ~ Vienna)
b = gdf_hw.total_bounds
print(f'✓ Highways extent (m): [{b[0]:.0f}, {b[1]:.0f}, {b[2]:.0f}, {b[3]:.0f}]')

# Check PT route types
routes = gdf_pt['route'].value_counts().to_dict()
print(f'✓ PT route types: {routes}')

assert os.path.exists('vienna_network.gpkg')
print(f'✓ File: vienna_network.gpkg ({os.path.getsize(\"vienna_network.gpkg\")/1024/1024:.1f} MB)')
print()
print('All checks passed!')
"

✓ CRS: EPSG:31287
✓ Highways: all LineString (7952 features)
✓ PT routes: all MultiLineString (286 features)
✓ Highway columns: name, highway, maxspeed, lanes, surface, oneway
✓ PT route columns: ref, name, operator, route
✓ Highways extent (m): [621625, 479475, 628362, 487679]
✓ PT route types: {'bus': 139, 'train': 86, 'tram': 49, 'subway': 10, 'light_rail': 2}
✓ File: vienna_network.gpkg (12.6 MB)

All checks passed!

71:32

Everything looks good. Let me summarize for the user.

71:32

fio-l3-vienna-geofabrik-highways

Map (reference ← swipe → agent)

Gates

Subchecks

fio-l3-vienna-geofabrik-highways Rerun

Map (reference ← swipe → agent)

Gates

Subchecks

fio-l3-vienna-geofabrik-highways