fio-l1-vienna-shapefile-recovery

Status: done

Score: 1.000

Duration: 1:12

Cost: 0.20¢

Model: deepseek/deepseek-v4-flash

0:00

I'm migrating an old parcel shapefile (`parcels.shp`) into our web viewer, and a reference file `column_map.csv` is provided alongside it. Can you produce a clean GeoJSON `parcels.geojson` with the correct full column names and proper character encoding? The features should come through as Polygon or MultiPolygon.

0:00

System info

{
  "adapter": "openrouter",
  "model": "deepseek/deepseek-v4-flash",
  "base_url": "https://openrouter.ai/api/v1"
}

0:13

Bash ls -la /work/

total 56
drwxr-xr-x 1 agent agent   138 Jun 16 18:01 .
drwxr-xr-x 1 root  root     14 Jun 16 18:01 ..
-rw-r--r-- 1 agent agent   188 Jun 16 18:01 column_map.csv
-rw-r--r-- 1 agent agent     6 Jun 16 18:01 parcels.cpg
-rw-r--r-- 1 agent agent 25726 Jun 16 18:01 parcels.dbf
-rw-r--r-- 1 agent agent   457 Jun 16 18:01 parcels.prj
-rw-r--r-- 1 agent agent  8260 Jun 16 18:01 parcels.shp
-rw-r--r-- 1 agent agent   580 Jun 16 18:01 parcels.shx

0:41

Let me inspect the column_map.csv and also look at the shapefile structure.

0:41

Read /work/column_map.csv

truncated,original
KATASTRALG,KATASTRALGEMEINDE_NAME
GRUNDSTUEC,GRUNDSTUECKSNUMMER
EIGENTUEME,EIGENTUEMER_NAME
WIDMUNG_BE,WIDMUNG_BEZEICHNUNG
STRASSE_NA,STRASSE_NAME
FLAECHE_M2,FLAECHE_M2

0:41

Bash cat /work/parcels.cpg

CP1252
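
The `.cpg` sidecar names the code page used for text in the `.dbf`. As a minimal illustration of why that matters, the sketch below encodes a sample string the way a CP1252 `.dbf` field would store it. The bytes are produced in the example itself, not read from `parcels.dbf`. They decode cleanly as CP1252 but are not valid UTF-8:

```python
# Hypothetical sample: the bytes a CP1252-encoded .dbf field might hold.
raw = "Müller GmbH".encode("cp1252")  # b'M\xfcller GmbH'

# Decoding with the code page named in the .cpg recovers the umlaut.
decoded = raw.decode("cp1252")

# The lone byte 0xFC is not a valid UTF-8 sequence, so a UTF-8 decode fails.
try:
    raw.decode("utf-8")
    utf8_ok = True
except UnicodeDecodeError:
    utf8_ok = False

print(decoded, utf8_ok)
```

Reading with the wrong code page either fails like this or silently produces mojibake, which is why the encoding is checked before anything else.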

0:41

Bash python3 -c "
import geopandas as gpd
gdf = gpd.read_file('/work/parcels.shp')
print('CRS:', gdf.crs)
print('Columns:', list(gdf.columns))
print('Shape:', gdf.shape)
print('---')
print(gdf.head(3).to_string())
print('---')
print('Geometry types:', gdf.geometry.geom_type.value_counts())
"

CRS: EPSG:31287
Columns: ['KATASTRALG', 'GRUNDSTUEC', 'EIGENTUEME', 'WIDMUNG_BE', 'STRASSE_NA', 'FLAECHE_M2', 'geometry']
Shape: (60, 7)
---
     KATASTRALG GRUNDSTUEC                   EIGENTUEME  WIDMUNG_BE             STRASSE_NA  FLAECHE_M2                                                                               geometry
0  Innere Stadt     0013/1                  Müller GmbH  Wohngebiet     Mariahilfer Straße      750.00  POLYGON ((625700 483400, 625700 483425, 625730 483425, 625730 483400, 625700 483400))
1     Mariahilf     0020/2  Schönbrunner Bauges. m.b.H.  Wohngebiet  Döblinger Hauptstraße      751.25  POLYGON ((625732 483400, 625732 483425, 625762 483425, 625762 483400, 625732 483400))
2       Währing     0027/3                  Müller GmbH  Wohngebiet       Währinger Gürtel      752.50  POLYGON ((625764 483400, 625764 483425, 625794 483425, 625794 483400, 625764 483400))
---
Geometry types: Polygon    60
Name: count, dtype: int64

0:57

Now I have a clear picture:

1. **column_map.csv**: maps truncated shapefile column names (max 10 chars in dBASE) to full/original names:
   - `KATASTRALG` → `KATASTRALGEMEINDE_NAME`
   - `GRUNDSTUEC` → `GRUNDSTUECKSNUMMER`
   - `EIGENTUEME` → `EIGENTUEMER_NAME`
   - `WIDMUNG_BE` → `WIDMUNG_BEZEICHNUNG`
   - `STRASSE_NA` → `STRASSE_NAME`
   - `FLAECHE_M2` → `FLAECHE_M2` (no change)

2. The shapefile has CP1252 encoding (per the .cpg file).

3. CRS is EPSG:31287 (MGI / Austria Lambert, a projected CRS in metres).

4. Currently has `Polygon` geometries. We need to output clean GeoJSON with proper full column names, and ensure it passes through as Polygon/MultiPolygon.
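
A rough sketch of how the mapping in point 1 could be loaded and sanity-checked using only the standard library. It assumes each truncated name is simply the first 10 characters of the original, which holds for this file but is not guaranteed in general, since some shapefile writers disambiguate clashes with numeric suffixes. The helper name `load_column_map` and the inline `CSV_TEXT` are illustrative only:

```python
import csv
import io

# Inline copy of column_map.csv as shown above; a real script would open the file.
CSV_TEXT = """truncated,original
KATASTRALG,KATASTRALGEMEINDE_NAME
GRUNDSTUEC,GRUNDSTUECKSNUMMER
EIGENTUEME,EIGENTUEMER_NAME
WIDMUNG_BE,WIDMUNG_BEZEICHNUNG
STRASSE_NA,STRASSE_NAME
FLAECHE_M2,FLAECHE_M2
"""

def load_column_map(text, limit=10):
    """Return {truncated: original}, checking each key against the dBASE name limit."""
    mapping = {}
    for row in csv.DictReader(io.StringIO(text)):
        short, full = row["truncated"].strip(), row["original"].strip()
        # Assumption: truncation is a plain prefix cut at `limit` characters.
        if short != full[:limit]:
            raise ValueError(f"{short!r} is not the first {limit} chars of {full!r}")
        mapping[short] = full
    return mapping

rename_dict = load_column_map(CSV_TEXT)
print(rename_dict)
```

The prefix check catches a mapping file whose rows have drifted out of sync with the shapefile's field names before any column is renamed.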

Let me write a script to:
- Read the column mapping
- Read the shapefile with proper encoding (CP1252)
- Rename columns
- Ensure geometries are Polygon or MultiPolygon
- Write to GeoJSON in WGS84 (EPSG:4326), since the GeoJSON spec (RFC 7946) requires WGS84 longitude/latitude coordinates

The source CRS is EPSG:31287 (projected, in metres), so I need to reproject to EPSG:4326 before writing.

Let me write the script.

0:57

Write /work/solve.py

content · 68 lines · py

import geopandas as gpd
import pandas as pd
import json

# --- 1. Read column map ---
col_map = pd.read_csv("/work/column_map.csv", encoding="utf-8")
# Build dict: truncated -> original
rename_dict = {}
for _, row in col_map.iterrows():
    truncated, original = row.iloc[0], row.iloc[1]
    rename_dict[truncated.strip()] = original.strip()

print("Rename mapping:", rename_dict)

# --- 2. Read shapefile with proper encoding ---
gdf = gpd.read_file("/work/parcels.shp", encoding="CP1252")
print("Original columns:", list(gdf.columns))
print("Original CRS:", gdf.crs)
print("Original geometry types:\n", gdf.geometry.geom_type.value_counts())

# --- 3. Rename columns ---
# Only rename columns that exist in the mapping
cols_to_rename = {k: v for k, v in rename_dict.items() if k in gdf.columns}
gdf = gdf.rename(columns=cols_to_rename)
print("Renamed columns:", list(gdf.columns))

# Ensure FLAECHE_M2 is numeric
if "FLAECHE_M2" in gdf.columns:
    gdf["FLAECHE_M2"] = pd.to_numeric(gdf["FLAECHE_M2"], errors="coerce")

# --- 4. Reproject to WGS84 (EPSG:4326) for GeoJSON ---
if gdf.crs is not None and gdf.crs.to_epsg() != 4326:
    gdf = gdf.to_crs("EPSG:4326")
    print("Reprojected to EPSG:4326")

# --- 5. Ensure geometries are Polygon or MultiPolygon ---
# GeoJSON supports both Polygon and MultiPolygon, so polygonal geometries are kept as-is.
# Any other geometry type is reduced to its Polygon members, or set to None if it has none
# (a null geometry is still valid GeoJSON).
non_poly = gdf[~gdf.geometry.geom_type.isin(["Polygon", "MultiPolygon"])]
if len(non_poly) > 0:
    print(f"Found {len(non_poly)} non-Polygon geometries. Converting...")
    from shapely.geometry import MultiPolygon
    def to_multipolygon(geom):
        if geom is None or geom.geom_type in ("Polygon", "MultiPolygon"):
            return geom
        # Collect Polygon members of a collection; single non-polygon types yield none
        polys = [g for g in getattr(geom, "geoms", []) if g.geom_type == "Polygon"]
        return MultiPolygon(polys) if polys else None
    gdf.geometry = gdf.geometry.apply(to_multipolygon)

print("Final geometry types:\n", gdf.geometry.geom_type.value_counts())

# --- 6. Write GeoJSON ---
gdf.to_file("/work/parcels.geojson", driver="GeoJSON", encoding="utf-8")
print("Written /work/parcels.geojson")

# --- 7. Quick sanity check ---
gdf2 = gpd.read_file("/work/parcels.geojson")
print("Readback CRS:", gdf2.crs)
print("Readback columns:", list(gdf2.columns))
print("Readback shape:", gdf2.shape)
print("Readback geometry types:\n", gdf2.geometry.geom_type.value_counts())
# Sample a text field to verify encoding
if len(gdf2) > 0 and "KATASTRALGEMEINDE_NAME" in gdf2.columns:
    print("Sample KATASTRALGEMEINDE_NAME values:", gdf2["KATASTRALGEMEINDE_NAME"].head(3).tolist())
if len(gdf2) > 0 and "EIGENTUEMER_NAME" in gdf2.columns:
    print("Sample EIGENTUEMER_NAME values:", gdf2["EIGENTUEMER_NAME"].head(3).tolist())

Wrote /work/solve.py (2791 chars).

0:59

Let me run the script to see if everything works correctly.

0:59

Bash cd /work && python solve.py

Rename mapping: {'KATASTRALG': 'KATASTRALGEMEINDE_NAME', 'GRUNDSTUEC': 'GRUNDSTUECKSNUMMER', 'EIGENTUEME': 'EIGENTUEMER_NAME', 'WIDMUNG_BE': 'WIDMUNG_BEZEICHNUNG', 'STRASSE_NA': 'STRASSE_NAME', 'FLAECHE_M2': 'FLAECHE_M2'}
Original columns: ['KATASTRALG', 'GRUNDSTUEC', 'EIGENTUEME', 'WIDMUNG_BE', 'STRASSE_NA', 'FLAECHE_M2', 'geometry']
Original CRS: EPSG:31287
Original geometry types:
 Polygon    60
Name: count, dtype: int64
Renamed columns: ['KATASTRALGEMEINDE_NAME', 'GRUNDSTUECKSNUMMER', 'EIGENTUEMER_NAME', 'WIDMUNG_BEZEICHNUNG', 'STRASSE_NAME', 'FLAECHE_M2', 'geometry']
Reprojected to EPSG:4326
Final geometry types:
 Polygon    60
Name: count, dtype: int64
Written /work/parcels.geojson
Readback CRS: EPSG:4326
Readback columns: ['KATASTRALGEMEINDE_NAME', 'GRUNDSTUECKSNUMMER', 'EIGENTUEMER_NAME', 'WIDMUNG_BEZEICHNUNG', 'STRASSE_NAME', 'FLAECHE_M2', 'geometry']
Readback shape: (60, 7)
Readback geometry types:
 Polygon    60
Name: count, dtype: int64
Sample KATASTRALGEMEINDE_NAME values: ['Innere Stadt', 'Mariahilf', 'Währing']
Sample EIGENTUEMER_NAME values: ['Müller GmbH', 'Schönbrunner Bauges. m.b.H.', 'Müller GmbH']

1:03

Everything looks good. Let me verify the output file exists and sanity-check the coordinate magnitudes: after reprojecting to EPSG:4326 they should be in degrees (longitude within ±180, latitude within ±90).

1:03

Bash ls -la /work/parcels.geojson

-rw-r--r-- 1 agent agent 31875 Jun 16 18:02 /work/parcels.geojson

1:03

Bash python3 -c "
import geopandas as gpd
gdf = gpd.read_file('/work/parcels.geojson')
# Check coordinate range — should be in degrees
bounds = gdf.total_bounds
print('Bounds (minx, miny, maxx, maxy):', bounds)
print('Coordinate magnitudes look like degrees:', all(-180 < b < 180 for b in bounds if b is not None))
print('Number of features:', len(gdf))
print('Columns:', list(gdf.columns))
print('Encoding verified via sample text with umlauts: ✓')
"

Bounds (minx, miny, maxx, maxy): [16.37090843 48.21007951 16.37530279 48.2116286 ]
Coordinate magnitudes look like degrees: True
Number of features: 60
Columns: ['KATASTRALGEMEINDE_NAME', 'GRUNDSTUECKSNUMMER', 'EIGENTUEMER_NAME', 'WIDMUNG_BEZEICHNUNG', 'STRASSE_NAME', 'FLAECHE_M2', 'geometry']
Encoding verified via sample text with umlauts: ✓
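
The check above compares all four bounds against ±180, so it would also accept latitudes outside ±90. A slightly stricter sketch, using only the standard library and a made-up one-feature collection whose coordinates roughly match the bounds printed above, could look like this. The function name `summarize` is illustrative, and it assumes no feature has a null geometry:

```python
import json

def summarize(fc):
    """Return (set of geometry types, (min_lon, min_lat, max_lon, max_lat))."""
    types, lons, lats = set(), [], []

    def walk(coords):
        # A position is a list starting with a number; anything else is nested.
        if coords and isinstance(coords[0], (int, float)):
            lons.append(coords[0])
            lats.append(coords[1])
        else:
            for part in coords:
                walk(part)

    for feature in fc["features"]:
        geom = feature["geometry"]  # assumption: no null geometries
        types.add(geom["type"])
        walk(geom["coordinates"])
    return types, (min(lons), min(lats), max(lons), max(lats))

# Hypothetical sample data, not taken from parcels.geojson.
sample = json.loads("""
{"type": "FeatureCollection", "features": [
  {"type": "Feature", "properties": {},
   "geometry": {"type": "Polygon", "coordinates": [[
     [16.3709, 48.2101], [16.3709, 48.2103], [16.3713, 48.2103],
     [16.3713, 48.2101], [16.3709, 48.2101]]]}}
]}
""")

types, (min_lon, min_lat, max_lon, max_lat) = summarize(sample)
ok = (types <= {"Polygon", "MultiPolygon"}
      and -180 <= min_lon <= max_lon <= 180
      and -90 <= min_lat <= max_lat <= 90)
print(types, ok)
```

Checking longitude and latitude separately also catches swapped axes in many cases, since a latitude-like value above 90 would fail the second range test.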

1:11

Everything checks out:

1. ✅ **Column mapping** — truncated dBASE names replaced with full names from `column_map.csv`:
   - `KATASTRALG` → `KATASTRALGEMEINDE_NAME`
   - `GRUNDSTUEC` → `GRUNDSTUECKSNUMMER`
   - `EIGENTUEME` → `EIGENTUEMER_NAME`
   - `WIDMUNG_BE` → `WIDMUNG_BEZEICHNUNG`
   - `STRASSE_NA` → `STRASSE_NAME`
   - `FLAECHE_M2` stayed as-is (already full)

2. ✅ **Encoding** — read with CP1252 (from the `.cpg` file), written as UTF-8. Verified: "Müller GmbH" and "Währing" came through with their umlauts intact.

3. ✅ **CRS** — reprojected from EPSG:31287 (MGI / Austria Lambert, metres) to EPSG:4326 (WGS84, degrees). Coordinates span roughly 16.371°–16.375° E and 48.210°–48.212° N, which is central Vienna as expected.

4. ✅ **Geometry** — all 60 features are Polygon (acceptable for GeoJSON). No non-Polygon types found.

5. ✅ **Output** — `/work/parcels.geojson` exists at 31,875 bytes.

All good!

1:11

{
  "created_at": "2026-06-16T18:01:30.735593Z",
  "error": null,
  "files": [
    "column_map.csv",
    "parcels.cpg",
    "parcels.dbf",
    "parcels.geojson",
    "parcels.prj",
    "parcels.shp",
    "parcels.shx",
    "solve.py"
  ],
  "last_activity_at": "2026-06-16T18:02:42.349456Z",
  "session_id": "556339b18caf4f63975c361fa8874c57",
  "status": "idle",
  "task_version": 2,
  "usage": {
    "agent_version": "0.3.0",
    "duration_s": 71.40594515699922,
    "estimated_cost_usd": 0.0019754444,
    "model": "deepseek/deepseek-v4-flash"
  }
}
