fio-l1-nyc-csvwkt-addresses
Analyst notes
Description
Tests CSV-with-WKT reading plus Arrow-schema literacy on the GeoParquet write side. The agent has to parse the WKT geometry column, then coerce `recorded_at` to exactly `timestamp[us]` and `unit_count` to exactly `int32`, rather than the `timestamp[ns]` and `int64` that an untyped pandas-to-pyarrow write typically produces. The prompt deliberately does not mention the leftover `geometry_wkt` text column, so the agent has to recognise on its own that storing both the parsed geometry and the original WKT string defeats the point of the conversion.
Approach
- Read the all-quoted CSV and treat every column as text on the way in.
- Parse the WKT column into Point geometry in EPSG:4326.
- Cast `recorded_at` to a microsecond timestamp and `unit_count` to int32, keeping the other address text columns as strings so leading-zero postcodes survive.
- Drop the original WKT text column so the output carries the geometry only once.
- Write `addresses.geoparquet` with the row count matching the input exactly.
Pitfalls
- Leaving the `geometry_wkt` text column in alongside the parsed geometry doubles storage and confuses downstream consumers.
- Letting pandas default `recorded_at` to `timestamp[ns]` makes DuckDB and other tools cast on every query, which is exactly what the persona is trying to avoid.
- Calling `.astype(int)` on `unit_count` yields int64 on most platforms instead of the requested int32, so `SUM` aggregations run on a wider type and the file gets larger than it needs to be.
- Letting type inference re-type `postcode` or `number` to integer mangles leading-zero ZIPs and fraction-style house numbers like `1/2`.
- Silently dropping malformed rows during the CSV read breaks the persona's downstream `SUM(unit_count)`, so the output row count has to match the input exactly.
- Forgetting to declare EPSG:4326 on the GeoParquet, or reprojecting the WKT to a metric CRS, throws the geometry comparison off.
Recent runs (task v1)
| adapter | started | score | steps | duration | cost | status |
|---|---|---|---|---|---|---|
| openrouter-gemma4-26b-basic | 2026-06-18T07:32:32Z | pending | — | — | — | pending |
| openrouter-deepseek-v4-flash-basic | 2026-06-18T03:08:04Z | 1.00 | 6 | 0:42 | 0.11¢ | done |
| openrouter-deepseek-v4-flash-detailed | 2026-06-17T22:01:33Z | 1.00 | 23 | 2:45 | 0.85¢ | done |
| openrouter-gemma4-26b-detailed | 2026-06-17T19:47:47Z | 1.00 | 12 | 2:13 | 0.54¢ | done |
| openrouter-deepseek-v4-flash-basic | 2026-06-16T21:43:55Z | 1.00 | 7 | 1:13 | 0.19¢ | done |