dd-l2-bangkok-multicrs-audit
0:00
A contractor just shipped `bangkok_contractor_delivery.gpkg` and I need a defect cite-sheet before I sign off. Please walk every layer in the file and write one row per layer to `crs_audit.csv` with these columns: layer_name, declared_crs (formatted as `EPSG:NNNN`), geometry_type, feature_count, sample_x and sample_y for a sample coordinate taken from the layer as it sits on disk, and encoding_detected. For encoding_detected, look at the text attributes for anomalies and report either `latin1-mojibake` or `utf-8`.
0:00
System info
{
  "adapter": "openrouter",
  "model": "deepseek/deepseek-v4-flash",
  "base_url": "https://openrouter.ai/api/v1"
}
0:02
Let me start by examining the GPKG file to understand its structure.
0:02
Bash
ls -lh /work/
total 2.6M
-rw-r--r-- 1 agent agent 2.6M Jun 16 06:02 bangkok_contractor_delivery.gpkg
0:11
Let me explore the GeoPackage file to understand its layers and structure.
0:11
Bash
cd /work && python3 -c "
import fiona
gpkg = '/work/bangkok_contractor_delivery.gpkg'
layers = fiona.list_layers(gpkg)
print('Layers:', layers)
for l in layers:
    with fiona.open(gpkg, layer=l) as src:
        print(f' Layer: {l}')
        print(f' CRS: {src.crs}')
        print(f' Schema: {src.schema}')
        print(f' Count: {len(src)}')
        if len(src) > 0:
            feat = next(iter(src))
            print(f' Sample geom: {feat[\"geometry\"]}')
            print(f' Sample props: {feat[\"properties\"]}')
        print()
"
Traceback (most recent call last):
File "<string>", line 4, in <module>
layers = fiona.list_layers(gpkg)
^^^^^^^^^^^^^^^^^
AttributeError: module 'fiona' has no attribute 'list_layers'. Did you mean: 'listlayers'?
0:17
Let me use fiona.listlayers instead.
0:17
Bash
cd /work && python3 -c "
import fiona
gpkg = '/work/bangkok_contractor_delivery.gpkg'
layers = fiona.listlayers(gpkg)
print('Layers:', list(layers))
for l in layers:
    with fiona.open(gpkg, layer=l) as src:
        print(f' Layer: {l}')
        print(f' CRS: {src.crs}')
        print(f' Schema: {src.schema}')
        print(f' Count: {len(src)}')
        if len(src) > 0:
            feat = next(iter(src))
            print(f' Sample geom: {feat[\"geometry\"]}')
            print(f' Sample props: {feat[\"properties\"]}')
        print()
"
Layers: ['parcels', 'roads', 'markets']
Layer: parcels
CRS: EPSG:24047
Schema: {'properties': {'id': 'str', 'name_th': 'str', 'owner_class': 'str'}, 'geometry': 'Polygon'}
Count: 4000
Sample geom: fiona.Geometry(coordinates=[[(657421.5158199039, 1509162.3701859994), ...]], type='Polygon')
Sample props: fiona.Properties(id='P00000', name_th='à¹\x81à¸\x9bลà¸\x87à¸\x97ีà¹\x88à¸\x94ิà¸\x99 à¸\x9aาà¸\x87รัà¸\x81', owner_class='private')
Layer: roads
CRS: EPSG:32647
Schema: {'properties': {'id': 'str', 'name': 'str', 'highway': 'str'}, 'geometry': 'LineString'}
Count: 5000
Sample geom: fiona.Geometry(coordinates=[(656835.6850029442, 1509487.0031804747), ...], type='LineString')
Sample props: fiona.Properties(id='R00000', name='à¸\x96à¸\x99à¸\x99สุà¸\x82ุมวิà¸\x97', highway='primary')
Layer: markets
CRS: EPSG:4326
Schema: {'properties': {'id': 'str', 'name_th': 'str', 'kind': 'str'}, 'geometry': 'Point'}
Count: 1000
Sample geom: fiona.Geometry(coordinates=(100.4533, 13.655000000000001), type='Point')
Sample props: fiona.Properties(id='M00000', name_th='ตลาดนัดจตุจักร', kind='fresh')
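The garbled `name_th` and `name` values above look like UTF-8 bytes that were decoded as Latin-1. One way to confirm that reading is a round-trip: re-encode the string as Latin-1 and decode the bytes as UTF-8. The sketch below is a minimal illustration, not part of the session; `repair_mojibake` and `looks_like_mojibake` are hypothetical helper names, and the test string is built from the clean `markets` name rather than read from the file.

```python
def repair_mojibake(text: str) -> str:
    """Undo 'UTF-8 read as Latin-1' mojibake; return text unchanged otherwise."""
    try:
        # Latin-1 maps code points 0-255 straight to bytes, so this recovers
        # the original byte sequence if the string was mis-decoded UTF-8.
        return text.encode("latin-1").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        # Either the text has characters outside Latin-1 (e.g. real Thai),
        # or the bytes are not valid UTF-8 (e.g. genuine Latin-1 text).
        return text


def looks_like_mojibake(text: str) -> bool:
    """True when the round-trip changes the string, i.e. a repair was possible."""
    return repair_mojibake(text) != text


# Assumption: simulate the defect from a clean value seen in the markets layer.
clean = "ตลาดนัดจตุจักร"
garbled = clean.encode("utf-8").decode("latin-1")

print(looks_like_mojibake(garbled))  # the simulated defect
print(looks_like_mojibake(clean))    # proper Thai text
print(looks_like_mojibake("private"))  # plain ASCII attribute
```

Pure ASCII survives the round-trip unchanged, so attributes such as `owner_class` do not trigger a false positive under this check.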
0:51
Write
/work/solve.py
content · 110 lines · py
#!/usr/bin/env python3
"""Audit every layer in bangkok_contractor_delivery.gpkg and write crs_audit.csv."""

import csv
import fiona
import pyproj

GPKG = "/work/bangkok_contractor_delivery.gpkg"
OUT = "/work/crs_audit.csv"

# Heuristic: check if text attributes contain bytes that look like latin1-mojibake
# of UTF-8 encoded Thai. Thai UTF-8 bytes in the range 0xE0-0xFB when
# re-interpreted as latin1 produce sequences like à¹, à¸, etc.
def detect_encoding(props: dict) -> str:
    """Return 'latin1-mojibake' if any string value shows the pattern, else 'utf-8'."""
    for k, v in props.items():
        if isinstance(v, str):
            # Check for common mojibake patterns: sequences like à¹\x9b, à¸\x99 etc.
            # These are the latin1 rendering of UTF-8 Thai bytes.
            # Quick heuristic: look for 'à' followed by a non-ASCII char in the
            # range 0x80-0xBF (which is what latin1 shows for continuation bytes).
            for i, ch in enumerate(v):
                if ch == 'à' and i + 1 < len(v):
                    nxt = v[i + 1]
                    if 0x80 <= ord(nxt) <= 0xBF:
                        return "latin1-mojibake"
                # Also check for other common lead bytes: â, å, ç, è, é, ê, ë, ì, í, î, ï, ñ, ò, ó, ô, õ, ö, ù, ú, û, ü
                if ch in 'âåçèéêëìíîïñòóôõöùúûü' and i + 1 < len(v):
                    nxt = v[i + 1]
                    if 0x80 <= ord(nxt) <= 0xBF:
                        return "latin1-mojibake"
    return "utf-8"


rows = []

layers = list(fiona.listlayers(GPKG))
print(f"Found layers: {layers}")

for layer_name in layers:
    with fiona.open(GPKG, layer=layer_name) as src:
        crs = src.crs
        schema = src.schema
        geom_type = schema.get("geometry", "Unknown")
        feature_count = len(src)

        # Format CRS as EPSG:NNNN
        if crs:
            try:
                crs_obj = pyproj.CRS(crs)
                if crs_obj.to_epsg():
                    declared_crs = f"EPSG:{crs_obj.to_epsg()}"
                else:
                    declared_crs = str(crs)
            except Exception:
                declared_crs = str(crs)
        else:
            declared_crs = "None"

        # Get a sample coordinate
        sample_x = ""
        sample_y = ""
        encoding_detected = "utf-8"

        if feature_count > 0:
            feat = next(iter(src))
            geom = feat.get("geometry")
            if geom and geom.get("coordinates"):
                coords = geom["coordinates"]
                # For Polygon/MultiPolygon, drill down to first ring
                if geom["type"] in ("Polygon",):
                    coords = coords[0]
                elif geom["type"] in ("MultiPolygon",):
                    coords = coords[0][0]
                elif geom["type"] in ("MultiLineString",):
                    coords = coords[0]
                elif geom["type"] in ("MultiPoint",):
                    coords = coords[0]

                if coords and len(coords) > 0:
                    pt = coords[0]
                    if isinstance(pt, (list, tuple)) and len(pt) >= 2:
                        sample_x = pt[0]
                        sample_y = pt[1]

            # Detect encoding from text attributes
            encoding_detected = detect_encoding(feat.get("properties", {}))

        rows.append({
            "layer_name": layer_name,
            "declared_crs": declared_crs,
            "geometry_type": geom_type,
            "feature_count": feature_count,
            "sample_x": sample_x,
            "sample_y": sample_y,
            "encoding_detected": encoding_detected,
        })

        print(f" {layer_name}: CRS={declared_crs}, geom={geom_type}, count={feature_count}, "
              f"sample=({sample_x}, {sample_y}), encoding={encoding_detected}")

# Write CSV
fieldnames = ["layer_name", "declared_crs", "geometry_type", "feature_count",
              "sample_x", "sample_y", "encoding_detected"]
with open(OUT, "w", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerows(rows)

print(f"\nWrote {len(rows)} rows to {OUT}")
Wrote /work/solve.py (4161 chars).
0:55
Let me run the script.
0:55
Bash
cd /work && python3 solve.py
Found layers: ['parcels', 'roads', 'markets']
 parcels: CRS=EPSG:24047, geom=Polygon, count=4000, sample=(657421.5158199039, 1509162.3701859994), encoding=latin1-mojibake
 roads: CRS=EPSG:32647, geom=LineString, count=5000, sample=(656835.6850029442, 1509487.0031804747), encoding=latin1-mojibake
 markets: CRS=EPSG:4326, geom=Point, count=1000, sample=(, ), encoding=utf-8
Wrote 3 rows to /work/crs_audit.csv
1:12
Bash
cd /work && python3 -c "
import fiona
with fiona.open('/work/bangkok_contractor_delivery.gpkg', layer='markets') as src:
    f = next(iter(src))
    print(f['geometry'])
"
fiona.Geometry(coordinates=(100.4533, 13.655000000000001), type='Point')
1:21
For Point geometry, `geom["coordinates"]` is a tuple `(100.4533, 13.655)`, not a list of lists. My code does `coords = geom["coordinates"]` then checks `if coords and len(coords) > 0` — a tuple of length 2 passes, then `pt = coords[0]` gives `100.4533` which is a float, not a list/tuple, so the `isinstance(pt, (list, tuple))` check fails and sample_x/y stay empty. I need to fix the logic to handle Point geometry coordinates properly.
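The edit that follows is not shown in this log, so the sketch below is only one possible way to handle the case described above, not the change that was actually applied. It assumes GeoJSON-style nesting (a Point is a flat tuple, a LineString is a sequence of tuples, a Polygon is a sequence of rings) and recurses until it reaches a tuple of numbers. `first_xy` is a hypothetical helper name.

```python
from numbers import Real


def first_xy(coords):
    """Return (x, y) of the first vertex in a nested coordinate structure, or None."""
    if not coords:
        return None
    if isinstance(coords[0], Real):
        # Reached a bare position such as (100.4533, 13.655), possibly with z.
        return (coords[0], coords[1]) if len(coords) >= 2 else None
    # Otherwise descend into the first part, ring, or vertex.
    return first_xy(coords[0])


# Shapes mirror the three geometry types seen in the delivery.
point = (100.4533, 13.655000000000001)
line = [(656835.6850029442, 1509487.0031804747), (656900.0, 1509500.0)]
polygon = [[(657421.5158199039, 1509162.3701859994), (657430.0, 1509170.0),
            (657421.5158199039, 1509162.3701859994)]]

print(first_xy(point))
print(first_xy(line))
print(first_xy(polygon))
```

Because the helper inspects the data rather than the geometry type name, it would also cover the Multi* types without a separate branch for each.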
1:21
Read
/work/solve.py
#!/usr/bin/env python3
"""Audit every layer in bangkok_contractor_delivery.gpkg and write crs_audit.csv."""

import csv
import fiona
import pyproj

GPKG = "/work/bangkok_contractor_delivery.gpkg"
OUT = "/work/crs_audit.csv"

# Heuristic: check if text attributes contain bytes that look like latin1-mojibake
# of UTF-8 encoded Thai. Thai UTF-8 bytes in the range 0xE0-0xFB when
# re-interpreted as latin1 produce sequences like à¹, à¸, etc.
def detect_encoding(props: dict) -> str:
    """Return 'latin1-mojibake' if any string value shows the pattern, else 'utf-8'."""
    for k, v in props.items():
        if isinstance(v, str):
            # Check for common mojibake patterns: sequences like à¹\x9b, à¸\x99 etc.
            # These are the latin1 rendering of UTF-8 Thai bytes.
            # Quick heuristic: look for 'à' followed by a non-ASCII char in the
            # range 0x80-0xBF (which is what latin1 shows for continuation bytes).
            for i, ch in enumerate(v):
                if ch == 'à' and i + 1 < len(v):
                    nxt = v[i + 1]
                    if 0x80 <= ord(nxt) <= 0xBF:
                        return "latin1-mojibake"
                # Also check for other common lead bytes: â, å, ç, è, é, ê, ë, ì, í, î, ï, ñ, ò, ó, ô, õ, ö, ù, ú, û, ü
                if ch in 'âåçèéêëìíîïñòóôõöùúûü' and i + 1 < len(v):
                    nxt = v[i + 1]
                    if 0x80 <= ord(nxt) <= 0xBF:
                        return "latin1-mojibake"
    return "utf-8"


rows = []

layers = list(fiona.listlayers(GPKG))
print(f"Found layers: {layers}")

for layer_name in layers:
    with fiona.open(GPKG, layer=layer_name) as src:
        crs = src.crs
        schema = src.schema
        geom_type = schema.get("geometry", "Unknown")
        feature_count = len(src)

        # Format CRS as EPSG:NNNN
        if crs:
            try:
                crs_obj = pyproj.CRS(crs)
                if crs_obj.to_epsg():
                    declared_crs = f"EPSG:{crs_obj.to_epsg()}"
                else:
                    declared_crs = str(crs)
            except Exception:
                declared_crs = str(crs)
        else:
            declared_crs = "None"

        # Get a sample coordinate
        sample_x = ""
        sample_y = ""
        encoding_detected = "utf-8"

        if feature_count > 0:
            feat = next(iter(src))
            geom = feat.get("geometry")
            if geom and geom.get("coordinates"):
                coords = geom["coordinates"]
                # For Polygon/MultiPolygon, drill down to first ring
                if geom["type"] in ("Polygon",):
                    coords = coords[0]
                elif geom["type"] in ("MultiPolygon",):
                    coords = coords[0][0]
                elif geom["type"] in ("MultiLineString",):
                    coords = coords[0]
                elif geom["type"] in ("MultiPoint",):
                    coords = coords[0]

                if coords and len(coords) > 0:
                    pt = coords[0]
                    if isinstance(pt, (list, tuple)) and len(pt) >= 2:
                        sample_x = pt[0]
                        sample_y = pt[1]

            # Detect encoding from text attributes
            encoding_detected = detect_encoding(feat.get("properties", {}))

        rows.append({
            "layer_name": layer_name,
            "declared_crs": declared_crs,
            "geometry_type": geom_type,
            "feature_count": feature_count,
            "sample_x": sample_x,
            "sample_y": sample_y,
            "encoding_detected": encoding_detected,
        })

        print(f" {layer_name}: CRS={declared_crs}, geom={geom_type}, count={feature_count}, "
              f"sample=({sample_x}, {sample_y}), encoding={encoding_detected}")

# Write CSV
fieldnames = ["layer_name", "declared_crs", "geometry_type", "feature_count",
              "sample_x", "sample_y", "encoding_detected"]
with open(OUT, "w", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerows(rows)

print(f"\nWrote {len(rows)} rows to {OUT}")
1:40
Edit
/work/solve.py
Edited /work/solve.py.
1:43
Bash
cd /work && python3 solve.py
Found layers: ['parcels', 'roads', 'markets']
 parcels: CRS=EPSG:24047, geom=Polygon, count=4000, sample=(657421.5158199039, 1509162.3701859994), encoding=latin1-mojibake
 roads: CRS=EPSG:32647, geom=LineString, count=5000, sample=(656835.6850029442, 1509487.0031804747), encoding=latin1-mojibake
 markets: CRS=EPSG:4326, geom=Point, count=1000, sample=(100.4533, 13.655000000000001), encoding=utf-8
Wrote 3 rows to /work/crs_audit.csv
1:45
Read
/work/crs_audit.csv
layer_name,declared_crs,geometry_type,feature_count,sample_x,sample_y,encoding_detected
parcels,EPSG:24047,Polygon,4000,657421.5158199039,1509162.3701859994,latin1-mojibake
roads,EPSG:32647,LineString,5000,656835.6850029442,1509487.0031804747,latin1-mojibake
markets,EPSG:4326,Point,1000,100.4533,13.655000000000001,utf-8
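As a quick plausibility pass over the rows above, one could check whether each sample coordinate has the magnitude its declared CRS implies: degrees for a geographic CRS, metres for a projected one. The sketch below was not run in this session. It embeds the CSV text directly and assumes that `EPSG:4326` is the only geographic CRS in the delivery; `GEOGRAPHIC`, `in_degree_range`, and `check_row` are hypothetical names.

```python
import csv
import io

# The audit rows as written to crs_audit.csv.
AUDIT_CSV = """layer_name,declared_crs,geometry_type,feature_count,sample_x,sample_y,encoding_detected
parcels,EPSG:24047,Polygon,4000,657421.5158199039,1509162.3701859994,latin1-mojibake
roads,EPSG:32647,LineString,5000,656835.6850029442,1509487.0031804747,latin1-mojibake
markets,EPSG:4326,Point,1000,100.4533,13.655000000000001,utf-8
"""

# Assumption: the only geographic (degree-based) CRS expected here.
GEOGRAPHIC = {"EPSG:4326"}


def in_degree_range(x: float, y: float) -> bool:
    """True when (x, y) could be a longitude/latitude pair."""
    return -180.0 <= x <= 180.0 and -90.0 <= y <= 90.0


def check_row(row: dict) -> bool:
    """True when the coordinate magnitude matches the declared CRS kind."""
    x, y = float(row["sample_x"]), float(row["sample_y"])
    expects_degrees = row["declared_crs"] in GEOGRAPHIC
    return in_degree_range(x, y) == expects_degrees


results = {
    row["layer_name"]: check_row(row)
    for row in csv.DictReader(io.StringIO(AUDIT_CSV))
}
print(results)
```

A check like this only catches gross mismatches such as degrees stored under a projected CRS. It cannot distinguish EPSG:24047 from EPSG:32647, since both are UTM zone 47N grids on different datums with coordinates of similar magnitude.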
1:52