fio-l1-nyc-csvwkt-addresses

Analyst notes

Description

Tests reading a CSV with a WKT geometry column, plus Arrow-schema literacy on the GeoParquet write side. The agent has to parse the WKT geometry column, then coerce `recorded_at` to exactly `timestamp[us]` and `unit_count` to exactly `int32`, rather than accept the types a default pandas-to-pyarrow conversion typically produces (`timestamp[ns]`, `int64`). The prompt deliberately does not mention the leftover `geometry_wkt` text column, so the agent has to recognise on its own that storing both the parsed geometry and the original WKT string defeats the point of the conversion.

Approach

  1. Read the all-quoted CSV and treat every column as text on the way in.
  2. Parse the WKT column into Point geometry in EPSG:4326.
  3. Cast `recorded_at` to a microsecond timestamp and `unit_count` to int32, keeping the other address text columns as strings so leading-zero postcodes survive.
  4. Drop the original WKT text column so the output carries the geometry only once.
  5. Write `addresses.geoparquet` with the row count matching the input exactly.

Pitfalls

Inputs

| name | format | crs | geometry | features |
| --- | --- | --- | --- | --- |
| nyc_addresses | csv | | | |

Expected outputs

| name | format | crs | geometry | features |
| --- | --- | --- | --- | --- |
| addresses.geoparquet | geoparquet | EPSG:4326 | Point | 1,056 |

Recent runs (task v1)

| adapter | started | score | steps | duration | cost | status |
| --- | --- | --- | --- | --- | --- | --- |
| openrouter-gemma4-26b-basic | 2026-06-18T07:32:32Z | pending | | | | pending |
| openrouter-deepseek-v4-flash-basic | 2026-06-18T03:08:04Z | 1.00 | 6 | 0:42 | 0.11¢ | done |
| openrouter-deepseek-v4-flash-detailed | 2026-06-17T22:01:33Z | 1.00 | 23 | 2:45 | 0.85¢ | done |
| openrouter-gemma4-26b-detailed | 2026-06-17T19:47:47Z | 1.00 | 12 | 2:13 | 0.54¢ | done |
| openrouter-deepseek-v4-flash-basic | 2026-06-16T21:43:55Z | 1.00 | 7 | 1:13 | 0.19¢ | done |