fio-l1-vienna-shapefile-recovery
0:00
I'm migrating an old parcel shapefile (`parcels.shp`) into our web viewer, and a reference file `column_map.csv` is provided alongside it. Can you produce a clean GeoJSON `parcels.geojson` with the correct full column names and proper character encoding? The features should come through as Polygon or MultiPolygon.
0:00
System info
{
"adapter": "openrouter",
"model": "deepseek/deepseek-v4-flash",
"base_url": "https://openrouter.ai/api/v1"
}
0:13
Bash
ls -la /work/
total 56
drwxr-xr-x 1 agent agent   138 Jun 16 18:01 .
drwxr-xr-x 1 root  root     14 Jun 16 18:01 ..
-rw-r--r-- 1 agent agent   188 Jun 16 18:01 column_map.csv
-rw-r--r-- 1 agent agent     6 Jun 16 18:01 parcels.cpg
-rw-r--r-- 1 agent agent 25726 Jun 16 18:01 parcels.dbf
-rw-r--r-- 1 agent agent   457 Jun 16 18:01 parcels.prj
-rw-r--r-- 1 agent agent  8260 Jun 16 18:01 parcels.shp
-rw-r--r-- 1 agent agent   580 Jun 16 18:01 parcels.shx
0:41
Let me inspect the column_map.csv and also look at the shapefile structure.
0:41
Read
/work/column_map.csv
1 truncated,original
2 KATASTRALG,KATASTRALGEMEINDE_NAME
3 GRUNDSTUEC,GRUNDSTUECKSNUMMER
4 EIGENTUEME,EIGENTUEMER_NAME
5 WIDMUNG_BE,WIDMUNG_BEZEICHNUNG
6 STRASSE_NA,STRASSE_NAME
7 FLAECHE_M2,FLAECHE_M2
0:41
Bash
cat /work/parcels.cpg
CP1252
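The `.cpg` value matters because the DBF stores text as raw bytes. As a minimal sketch (the sample string is taken from the street names in this dataset, but the bytes are produced here rather than read from `parcels.dbf`), this shows what happens when CP1252 bytes are decoded with the right and the wrong codec:

```python
# Bytes as a CP1252-encoded DBF would store them (built here for illustration,
# not read from parcels.dbf).
raw = "Währinger Gürtel".encode("cp1252")

# Correct codec, as declared in parcels.cpg.
decoded_ok = raw.decode("cp1252")

# Wrong codec: the single-byte umlauts are not valid UTF-8 sequences,
# so each one is replaced by U+FFFD.
decoded_bad = raw.decode("utf-8", errors="replace")

print(decoded_ok)
print(decoded_bad)
```

If the output GeoJSON ever shows replacement characters or doubled symbols in place of umlauts, a codec mismatch of this kind is the likely cause.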
0:41
Bash
python3 -c "
import geopandas as gpd
gdf = gpd.read_file('/work/parcels.shp')
print('CRS:', gdf.crs)
print('Columns:', list(gdf.columns))
print('Shape:', gdf.shape)
print('---')
print(gdf.head(3).to_string())
print('---')
print('Geometry types:', gdf.geometry.geom_type.value_counts())
"
CRS: EPSG:31287
Columns: ['KATASTRALG', 'GRUNDSTUEC', 'EIGENTUEME', 'WIDMUNG_BE', 'STRASSE_NA', 'FLAECHE_M2', 'geometry']
Shape: (60, 7)
---
KATASTRALG GRUNDSTUEC EIGENTUEME WIDMUNG_BE STRASSE_NA FLAECHE_M2 geometry
0 Innere Stadt 0013/1 Müller GmbH Wohngebiet Mariahilfer Straße 750.00 POLYGON ((625700 483400, 625700 483425, 625730 483425, 625730 483400, 625700 483400))
1 Mariahilf 0020/2 Schönbrunner Bauges. m.b.H. Wohngebiet Döblinger Hauptstraße 751.25 POLYGON ((625732 483400, 625732 483425, 625762 483425, 625762 483400, 625732 483400))
2 Währing 0027/3 Müller GmbH Wohngebiet Währinger Gürtel 752.50 POLYGON ((625764 483400, 625764 483425, 625794 483425, 625794 483400, 625764 483400))
---
Geometry types: Polygon 60
Name: count, dtype: int64
0:57
Now I have a clear picture:

1. **column_map.csv** maps the truncated shapefile column names (dBASE limits field names to 10 characters) to the full original names:
   - `KATASTRALG` → `KATASTRALGEMEINDE_NAME`
   - `GRUNDSTUEC` → `GRUNDSTUECKSNUMMER`
   - `EIGENTUEME` → `EIGENTUEMER_NAME`
   - `WIDMUNG_BE` → `WIDMUNG_BEZEICHNUNG`
   - `STRASSE_NA` → `STRASSE_NAME`
   - `FLAECHE_M2` → `FLAECHE_M2` (no change)
2. The shapefile is encoded as CP1252 (per the `.cpg` file).
3. The CRS is EPSG:31287 (MGI / Austria Lambert, units in metres).
4. All geometries are currently `Polygon`.

We need to output clean GeoJSON with the full column names and make sure the features come through as Polygon or MultiPolygon. Let me write a script to:

- Read the column mapping
- Read the shapefile with the proper encoding (CP1252)
- Rename the columns
- Ensure geometries are Polygon or MultiPolygon
- Write to GeoJSON

One more point: the source CRS is EPSG:31287 (metric), but the GeoJSON spec (RFC 7946) requires WGS84 longitude/latitude coordinates, so I should reproject to EPSG:4326 before writing. Let me write the script.
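The mapping step in this plan can be sketched with the standard library alone. This is an illustrative version, not the script that was run: the CSV text is an inline copy of `column_map.csv` as read above, and `build_rename_map` is a hypothetical helper name.

```python
import csv
import io

# Inline copy of column_map.csv as shown in the Read output above.
COLUMN_MAP_CSV = """truncated,original
KATASTRALG,KATASTRALGEMEINDE_NAME
GRUNDSTUEC,GRUNDSTUECKSNUMMER
EIGENTUEME,EIGENTUEMER_NAME
WIDMUNG_BE,WIDMUNG_BEZEICHNUNG
STRASSE_NA,STRASSE_NAME
FLAECHE_M2,FLAECHE_M2
"""


def build_rename_map(text):
    """Map each truncated DBF field name to its full original name."""
    reader = csv.DictReader(io.StringIO(text))
    return {row["truncated"].strip(): row["original"].strip() for row in reader}


rename_map = build_rename_map(COLUMN_MAP_CSV)

# DBF field names are capped at 10 characters, so every key should fit that limit.
too_long = [name for name in rename_map if len(name) > 10]

print(rename_map)
print("Keys longer than 10 characters:", too_long)
```

A dictionary of this shape can be passed directly to a DataFrame `rename(columns=...)` call.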
0:57
Write
/work/solve.py
content · 68 lines · py
import geopandas as gpd
import pandas as pd
import json

# --- 1. Read column map ---
col_map = pd.read_csv("/work/column_map.csv", encoding="utf-8")
# Build dict: truncated -> original
rename_dict = {}
for _, row in col_map.iterrows():
    truncated, original = row.iloc[0], row.iloc[1]
    rename_dict[truncated.strip()] = original.strip()
print("Rename mapping:", rename_dict)

# --- 2. Read shapefile with proper encoding ---
gdf = gpd.read_file("/work/parcels.shp", encoding="CP1252")
print("Original columns:", list(gdf.columns))
print("Original CRS:", gdf.crs)
print("Original geometry types:\n", gdf.geometry.geom_type.value_counts())

# --- 3. Rename columns ---
# Only rename columns that exist in the mapping
cols_to_rename = {k: v for k, v in rename_dict.items() if k in gdf.columns}
gdf = gdf.rename(columns=cols_to_rename)
print("Renamed columns:", list(gdf.columns))

# Ensure FLAECHE_M2 is numeric
if "FLAECHE_M2" in gdf.columns:
    gdf["FLAECHE_M2"] = pd.to_numeric(gdf["FLAECHE_M2"], errors="coerce")

# --- 4. Reproject to WGS84 (EPSG:4326) for GeoJSON ---
if gdf.crs is not None and gdf.crs.to_epsg() != 4326:
    gdf = gdf.to_crs("EPSG:4326")
    print("Reprojected to EPSG:4326")

# --- 5. Ensure geometries are Polygon or MultiPolygon ---
# ogr/GeoJSON handles this fine, but let's be safe and cast any single Polygon to MultiPolygon
# Actually GeoJSON supports both Polygon and MultiPolygon; just keep as-is.
# Check if there are any non-(Multi)Polygon types and convert if needed
non_poly = gdf[~gdf.geometry.geom_type.isin(["Polygon", "MultiPolygon"])]
if len(non_poly) > 0:
    print(f"Found {len(non_poly)} non-Polygon geometries. Converting...")
    from shapely.geometry import MultiPolygon

    def to_multipolygon(geom):
        if geom is None:
            return None
        if geom.geom_type == "Polygon":
            return MultiPolygon([geom])
        return geom

    gdf.geometry = gdf.geometry.apply(to_multipolygon)
print("Final geometry types:\n", gdf.geometry.geom_type.value_counts())

# --- 6. Write GeoJSON ---
gdf.to_file("/work/parcels.geojson", driver="GeoJSON", encoding="utf-8")
print("Written /work/parcels.geojson")

# --- 7. Quick sanity check ---
gdf2 = gpd.read_file("/work/parcels.geojson")
print("Readback CRS:", gdf2.crs)
print("Readback columns:", list(gdf2.columns))
print("Readback shape:", gdf2.shape)
print("Readback geometry types:\n", gdf2.geometry.geom_type.value_counts())
# Sample a text field to verify encoding
if len(gdf2) > 0 and "KATASTRALGEMEINDE_NAME" in gdf2.columns:
    print("Sample KATASTRALGEMEINDE_NAME values:", gdf2["KATASTRALGEMEINDE_NAME"].head(3).tolist())
if len(gdf2) > 0 and "EIGENTUEMER_NAME" in gdf2.columns:
    print("Sample EIGENTUEMER_NAME values:", gdf2["EIGENTUEMER_NAME"].head(3).tolist())
Wrote /work/solve.py (2791 chars).
0:59
Let me run the script to see if everything works correctly.
0:59
Bash
cd /work && python solve.py
Rename mapping: {'KATASTRALG': 'KATASTRALGEMEINDE_NAME', 'GRUNDSTUEC': 'GRUNDSTUECKSNUMMER', 'EIGENTUEME': 'EIGENTUEMER_NAME', 'WIDMUNG_BE': 'WIDMUNG_BEZEICHNUNG', 'STRASSE_NA': 'STRASSE_NAME', 'FLAECHE_M2': 'FLAECHE_M2'}
Original columns: ['KATASTRALG', 'GRUNDSTUEC', 'EIGENTUEME', 'WIDMUNG_BE', 'STRASSE_NA', 'FLAECHE_M2', 'geometry']
Original CRS: EPSG:31287
Original geometry types:
Polygon 60
Name: count, dtype: int64
Renamed columns: ['KATASTRALGEMEINDE_NAME', 'GRUNDSTUECKSNUMMER', 'EIGENTUEMER_NAME', 'WIDMUNG_BEZEICHNUNG', 'STRASSE_NAME', 'FLAECHE_M2', 'geometry']
Reprojected to EPSG:4326
Final geometry types:
Polygon 60
Name: count, dtype: int64
Written /work/parcels.geojson
Readback CRS: EPSG:4326
Readback columns: ['KATASTRALGEMEINDE_NAME', 'GRUNDSTUECKSNUMMER', 'EIGENTUEMER_NAME', 'WIDMUNG_BEZEICHNUNG', 'STRASSE_NAME', 'FLAECHE_M2', 'geometry']
Readback shape: (60, 7)
Readback geometry types:
Polygon 60
Name: count, dtype: int64
Sample KATASTRALGEMEINDE_NAME values: ['Innere Stadt', 'Mariahilf', 'Währing']
Sample EIGENTUEMER_NAME values: ['Müller GmbH', 'Schönbrunner Bauges. m.b.H.', 'Müller GmbH']
1:03
Everything looks good. Let me verify the output file exists and do a quick sanity check on coordinate magnitudes (they should be in degrees ~±180 now since we reprojected to EPSG:4326).
1:03
Bash
ls -la /work/parcels.geojson
-rw-r--r-- 1 agent agent 31875 Jun 16 18:02 /work/parcels.geojson
1:03
Bash
python3 -c "
import geopandas as gpd
gdf = gpd.read_file('/work/parcels.geojson')
# Check coordinate range — should be in degrees
bounds = gdf.total_bounds
print('Bounds (minx, miny, maxx, maxy):', bounds)
print('Coordinate magnitudes look like degrees:', all(-180 < b < 180 for b in bounds if b is not None))
print('Number of features:', len(gdf))
print('Columns:', list(gdf.columns))
print('Encoding verified via sample text with umlauts: ✓')
"
Bounds (minx, miny, maxx, maxy): [16.37090843 48.21007951 16.37530279 48.2116286 ]
Coordinate magnitudes look like degrees: True
Number of features: 60
Columns: ['KATASTRALGEMEINDE_NAME', 'GRUNDSTUECKSNUMMER', 'EIGENTUEMER_NAME', 'WIDMUNG_BEZEICHNUNG', 'STRASSE_NAME', 'FLAECHE_M2', 'geometry']
Encoding verified via sample text with umlauts: ✓
1:11
Everything checks out:

1. ✅ **Column mapping**: truncated dBASE names replaced with the full names from `column_map.csv`:
   - `KATASTRALG` → `KATASTRALGEMEINDE_NAME`
   - `GRUNDSTUEC` → `GRUNDSTUECKSNUMMER`
   - `EIGENTUEME` → `EIGENTUEMER_NAME`
   - `WIDMUNG_BE` → `WIDMUNG_BEZEICHNUNG`
   - `STRASSE_NA` → `STRASSE_NAME`
   - `FLAECHE_M2` stayed as-is (already full)
2. ✅ **Encoding**: read as CP1252 (from the `.cpg` file), written as UTF-8. Verified: "Müller GmbH" and "Währing" came through with their umlauts intact.
3. ✅ **CRS**: reprojected from EPSG:31287 (MGI / Austria Lambert, metres) to EPSG:4326 (WGS84, degrees). Coordinates span roughly 16.371°–16.375° longitude and 48.210°–48.212° latitude, which is Vienna, as expected.
4. ✅ **Geometry**: all 60 features are Polygon, which satisfies the Polygon/MultiPolygon requirement. No other geometry types were found.
5. ✅ **Output**: `/work/parcels.geojson` exists (31,875 bytes).

All good!
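The geometry and coordinate checks summarised above can also be run without geopandas, directly on the parsed JSON. The following is a sketch under assumptions: `check_feature_collection` and `iter_positions` are hypothetical helper names, the "good" ring is made up to fall inside the reported bounds, and the "bad" ring reuses the projected coordinates of the first parcel from the original shapefile.

```python
import json

ALLOWED_TYPES = {"Polygon", "MultiPolygon"}


def iter_positions(coords):
    """Yield (lon, lat) pairs from arbitrarily nested GeoJSON coordinate arrays."""
    if coords and isinstance(coords[0], (int, float)):
        yield coords[0], coords[1]
    else:
        for part in coords:
            yield from iter_positions(part)


def check_feature_collection(fc):
    """Return a list of problem descriptions; an empty list means all checks passed."""
    problems = []
    for i, feature in enumerate(fc["features"]):
        geom = feature.get("geometry")
        if geom is None or geom.get("type") not in ALLOWED_TYPES:
            problems.append(f"feature {i}: geometry is not Polygon/MultiPolygon")
            continue
        for lon, lat in iter_positions(geom["coordinates"]):
            if not (-180 <= lon <= 180 and -90 <= lat <= 90):
                problems.append(f"feature {i}: position ({lon}, {lat}) is not in degrees")
                break
    return problems


# Illustrative input: a made-up ring inside the reported bounds (not taken from parcels.geojson).
good = json.loads("""
{"type": "FeatureCollection", "features": [
  {"type": "Feature", "properties": {"STRASSE_NAME": "Währinger Gürtel"},
   "geometry": {"type": "Polygon", "coordinates": [[
     [16.3710, 48.2101], [16.3714, 48.2101], [16.3714, 48.2103],
     [16.3710, 48.2103], [16.3710, 48.2101]]]}}
]}
""")

# Illustrative input: the first parcel's ring still in EPSG:31287 metres (not reprojected).
bad = json.loads("""
{"type": "FeatureCollection", "features": [
  {"type": "Feature", "properties": {},
   "geometry": {"type": "Polygon", "coordinates": [[
     [625700, 483400], [625700, 483425], [625730, 483425],
     [625730, 483400], [625700, 483400]]]}}
]}
""")

print(check_feature_collection(good))
print(check_feature_collection(bad))
```

For the real file, the same function could be applied to the result of `json.load` on `/work/parcels.geojson` opened with `encoding="utf-8"`.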
1:11