dd-l2-bangkok-multicrs-audit
0:00
A contractor just shipped `bangkok_contractor_delivery.gpkg` and I need a defect cite-sheet before I sign off. Please walk every layer in the file and write one row per layer to `crs_audit.csv` with these columns: layer_name, declared_crs (formatted as `EPSG:NNNN`), geometry_type, feature_count, sample_x and sample_y for a sample coordinate taken from the layer as it sits on disk, and encoding_detected. For encoding_detected, look at the text attributes for anomalies and report either `latin1-mojibake` or `utf-8`.
0:00
System info
{
"adapter": "openrouter",
"model": "deepseek/deepseek-v4-flash",
"base_url": "https://openrouter.ai/api/v1"
}
0:11
Let me start by exploring the file to understand its structure - what layers it contains, their schemas, etc.
0:11
Bash
ls -la /work/bangkok_contractor_delivery.gpkg
-rw-r--r-- 1 agent agent 2703360 Jun 16 22:51 /work/bangkok_contractor_delivery.gpkg
0:15
Let me first explore the GeoPackage file to understand its structure - what layers it contains, their schemas, etc.
0:15
Bash
cd /work && python3 -c "
import geopandas as gpd
import fiona
# List all layers
layers = fiona.listlayers('bangkok_contractor_delivery.gpkg')
print('Layers:', layers)
"
Layers: ['parcels', 'roads', 'markets']
0:24
Let me explore each layer to understand the structure.
0:24
Bash
cd /work && python3 -c "
import fiona
layers = fiona.listlayers('bangkok_contractor_delivery.gpkg')
for lyr in layers:
    with fiona.open('bangkok_contractor_delivery.gpkg', layer=lyr) as src:
        meta = src.meta
        schema = src.schema
        crs = src.crs
        count = len(src)
        # get first feature
        feat = next(iter(src))
        print(f'=== {lyr} ===')
        print(f' CRS: {crs}')
        print(f' Schema: {schema}')
        print(f' Count: {count}')
        print(f' First feature geometry: {feat[\"geometry\"]}')
        print(f' First feature props: {feat[\"properties\"]}')
        print()
"
=== parcels ===
CRS: EPSG:24047
Schema: {'properties': {'id': 'str', 'name_th': 'str', 'owner_class': 'str'}, 'geometry': 'Polygon'}
Count: 4000
First feature geometry: fiona.Geometry(coordinates=[[(657421.5158199039, 1509162.3701859994), ...]], type='Polygon')
First feature props: fiona.Properties(id='P00000', name_th='à¹\x81à¸\x9bลà¸\x87à¸\x97ีà¹\x88à¸\x94ิà¸\x99 à¸\x9aาà¸\x87รัà¸\x81', owner_class='private')
=== roads ===
CRS: EPSG:32647
Schema: {'properties': {'id': 'str', 'name': 'str', 'highway': 'str'}, 'geometry': 'LineString'}
Count: 5000
First feature geometry: fiona.Geometry(coordinates=[(656835.6850029442, 1509487.0031804747), ...], type='LineString')
First feature props: fiona.Properties(id='R00000', name='à¸\x96à¸\x99à¸\x99สุà¸\x82ุมวิà¸\x97', highway='primary')
=== markets ===
CRS: EPSG:4326
Schema: {'properties': {'id': 'str', 'name_th': 'str', 'kind': 'str'}, 'geometry': 'Point'}
Count: 1000
First feature geometry: fiona.Geometry(coordinates=(100.4533, 13.655000000000001), type='Point')
First feature props: fiona.Properties(id='M00000', name_th='ตลาดนัดจตุจักร', kind='fresh')
0:37
Interesting! Let's analyze:
1. **parcels**: CRS is EPSG:24047 (Indian 1975 / UTM zone 47N). The name_th field shows `à¹\x81à¸\x9bลà¸\x87à¸\x97ีà¹\x88à¸\x94ิà¸\x99 à¸\x9aาà¸\x87รัà¸\x81` - this is the signature of UTF-8 bytes being interpreted as Latin-1 (mojibake). Each Thai character is three bytes in UTF-8 starting with 0xE0 0xB8 or 0xE0 0xB9, which Latin-1 renders as `à¸` or `à¹` plus one more byte. Reversing the misread byte by byte gives "แปลงที่ดิน บางรัก".
2. **roads**: CRS is EPSG:32647 (WGS 84 / UTM zone 47N). The name field shows `à¸\x96à¸\x99à¸\x99สุà¸\x82ุมวิà¸\x97` - the same mojibake pattern, which reverses to "ถนนสุขุมวิท".
3. **markets**: CRS is EPSG:4326 (WGS 84). The name_th field shows `ตลาดนัดจตุจักร`, which is proper Thai/UTF-8 text.
So the encoding detection:
- parcels: latin1-mojibake (Thai text encoded as UTF-8 but stored/read as Latin-1)
- roads: latin1-mojibake
- markets: utf-8
Now I need to write a script to generate the CSV. Let me also get sample coordinates. I need the coordinate from the geometry as it sits on disk, so no reprojection. For sample_x and sample_y, I'll grab the first coordinate from the first feature of each layer. Let me write the full script.
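The round-trip test described above can be sketched in isolation. This is a minimal illustration using only the standard library; the helper name `repair_latin1_mojibake` is hypothetical, and the garbled input is simulated from a clean Thai string rather than read from the GeoPackage.

```python
from typing import Optional

THAI_START, THAI_END = "\u0e00", "\u0e7f"  # bounds of the Unicode Thai block


def repair_latin1_mojibake(text: str) -> Optional[str]:
    """Return the repaired text if `text` is UTF-8 misread as Latin-1, else None."""
    try:
        # Undo the misread: recover the raw bytes, then decode them as UTF-8.
        repaired = text.encode("latin-1").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        # Real Thai text cannot be encoded as Latin-1, so it lands here.
        return None
    # Plain ASCII survives the round trip unchanged, so require an actual change
    # and at least one Thai character before calling it mojibake.
    if repaired != text and any(THAI_START <= ch <= THAI_END for ch in repaired):
        return repaired
    return None


# Simulate the defect: UTF-8 bytes of a Thai road name decoded as Latin-1.
clean = "ถนนสุขุมวิท"
garbled = clean.encode("utf-8").decode("latin-1")

print(repair_latin1_mojibake(garbled) == clean)  # True
print(repair_latin1_mojibake(clean))             # None
print(repair_latin1_mojibake("primary"))         # None
```

Requiring both a changed string and a Thai character keeps ASCII-only attributes such as `highway='primary'` from being flagged.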
0:37
Bash
cd /work && python3 -c "
import fiona
def get_sample_coord(layer):
    with fiona.open('bangkok_contractor_delivery.gpkg', layer=layer) as src:
        feat = next(iter(src))
        geom = feat['geometry']
        # Handle different geometry types to extract one coordinate
        if geom['type'] == 'Point':
            return geom['coordinates'][0], geom['coordinates'][1]
        elif geom['type'] in ('Polygon', 'LineString'):
            # first vertex of the exterior ring (Polygon) or of the line (LineString)
            coords = geom['coordinates']
            if geom['type'] == 'Polygon':
                coords = coords[0]  # exterior ring
            return coords[0][0], coords[0][1]
        elif geom['type'] == 'MultiPoint':
            return geom['coordinates'][0][0], geom['coordinates'][0][1]
        return None, None
def detect_encoding(layer):
    # Only the first feature is inspected; its first non-ASCII string decides the result
    with fiona.open('bangkok_contractor_delivery.gpkg', layer=layer) as src:
        feat = next(iter(src))
        props = feat['properties']
        for k, v in props.items():
            if isinstance(v, str) and any(ord(c) > 127 for c in v):
                # Mojibake pattern: runs such as 'à¸' or 'à¹' (UTF-8 bytes read as Latin-1)
                # Round-trip test: encode('latin-1') -> bytes, then decode('utf-8') -> proper Thai
                try:
                    recovered = v.encode('latin-1').decode('utf-8')
                except (UnicodeEncodeError, UnicodeDecodeError):
                    # If the round-trip fails, treat the value as legitimate UTF-8
                    return 'utf-8'
                # If recovery yields Thai characters, the stored value is mojibake
                thai_chars = sum(1 for c in recovered if '\u0E00' <= c <= '\u0E7F')
                if thai_chars > 0:
                    return 'latin1-mojibake'
                else:
                    return 'utf-8'
    return 'utf-8'
layers = fiona.listlayers('bangkok_contractor_delivery.gpkg')
for lyr in layers:
    print(f'Layer: {lyr}')
    x, y = get_sample_coord(lyr)
    print(f' Sample coord: {x}, {y}')
    enc = detect_encoding(lyr)
    print(f' Encoding: {enc}')
"
Layer: parcels
 Sample coord: 657421.5158199039, 1509162.3701859994
 Encoding: latin1-mojibake
Layer: roads
 Sample coord: 656835.6850029442, 1509487.0031804747
 Encoding: latin1-mojibake
Layer: markets
 Sample coord: 100.4533, 13.655000000000001
 Encoding: utf-8
0:45
Let me now get more details to build the CSV. I need:
1. layer_name
2. declared_crs as EPSG:NNNN
3. geometry_type
4. feature_count
5. sample_x and sample_y (for a sample coordinate taken from the layer as it sits on disk)
6. encoding_detected
Let me write a comprehensive script to produce the CSV.
0:45
Bash
cd /work && python3 << 'PYEOF'
import csv
import fiona
layers = fiona.listlayers('bangkok_contractor_delivery.gpkg')
rows = []
for lyr in layers:
    with fiona.open('bangkok_contractor_delivery.gpkg', layer=lyr) as src:
        # CRS
        crs = src.crs
        if crs is None:
            declared_crs = 'EPSG:None'
        elif crs.get('init'):
            declared_crs = crs['init'].upper()
        else:
            # try to build from epsg code
            epsg = crs.to_epsg() if hasattr(crs, 'to_epsg') else None
            if epsg:
                declared_crs = f'EPSG:{epsg}'
            else:
                declared_crs = str(crs)
        # Geometry type
        geom_type = src.schema['geometry']
        # Feature count
        count = len(src)
        # Sample coordinate: first vertex of the first feature, read without reprojection
        feat = next(iter(src))
        geom = feat['geometry']
        coords = geom['coordinates']
        if geom_type == 'Point':
            sample_x, sample_y = coords[0], coords[1]
        elif geom_type in ('MultiPoint',):
            sample_x, sample_y = coords[0][0], coords[0][1]
        elif geom_type == 'LineString':
            sample_x, sample_y = coords[0][0], coords[0][1]
        elif geom_type == 'MultiLineString':
            sample_x, sample_y = coords[0][0][0], coords[0][0][1]
        elif geom_type == 'Polygon':
            sample_x, sample_y = coords[0][0][0], coords[0][0][1]
        elif geom_type == 'MultiPolygon':
            sample_x, sample_y = coords[0][0][0][0], coords[0][0][0][1]
        else:
            sample_x, sample_y = None, None
        # Encoding detection (first feature only): Latin-1 -> UTF-8 round trip
        props = feat['properties']
        encoding = 'utf-8'
        for k, v in props.items():
            if isinstance(v, str) and any(ord(c) > 127 for c in v):
                try:
                    recovered = v.encode('latin-1').decode('utf-8')
                except (UnicodeEncodeError, UnicodeDecodeError):
                    # Round trip failed: the value is not Latin-1 mojibake
                    continue
                thai_chars = sum(1 for c in recovered if '\u0E00' <= c <= '\u0E7F')
                if thai_chars > 0:
                    encoding = 'latin1-mojibake'
                    break
        rows.append({
            'layer_name': lyr,
            'declared_crs': declared_crs,
            'geometry_type': geom_type,
            'feature_count': count,
            'sample_x': sample_x,
            'sample_y': sample_y,
            'encoding_detected': encoding
        })
with open('crs_audit.csv', 'w', newline='') as f:
    writer = csv.DictWriter(f, fieldnames=['layer_name', 'declared_crs', 'geometry_type', 'feature_count', 'sample_x', 'sample_y', 'encoding_detected'])
    writer.writeheader()
    writer.writerows(rows)
print("Done. Contents:")
with open('crs_audit.csv') as f:
    print(f.read())
PYEOF
Done. Contents:
layer_name,declared_crs,geometry_type,feature_count,sample_x,sample_y,encoding_detected
parcels,EPSG:24047,Polygon,4000,657421.5158199039,1509162.3701859994,latin1-mojibake
roads,EPSG:32647,LineString,5000,656835.6850029442,1509487.0031804747,latin1-mojibake
markets,EPSG:4326,Point,1000,100.4533,13.655000000000001,utf-8
0:53
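The task is complete. `/work/crs_audit.csv` now has one row per layer (parcels, roads, markets) with all seven requested columns. sample_x and sample_y are the first vertex of each layer's first feature, read without reprojection, and encoding_detected is based on the text attributes of that first feature only.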
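As a coarse cross-check on the rows written to the CSV, the sketch below tests whether each sample coordinate has the magnitude expected for its declared CRS. It is an illustration under stated assumptions, not part of the run above: the helper names are hypothetical, and the set of geographic codes is hard-coded to `EPSG:4326` because that is the only geographic CRS declared in this file.

```python
import csv
import io

# Assumption: the only geographic (degree-based) CRS in this delivery is EPSG:4326.
GEOGRAPHIC_CODES = {"EPSG:4326"}

# Rows copied from the crs_audit.csv contents printed above.
AUDIT_CSV = """layer_name,declared_crs,geometry_type,feature_count,sample_x,sample_y,encoding_detected
parcels,EPSG:24047,Polygon,4000,657421.5158199039,1509162.3701859994,latin1-mojibake
roads,EPSG:32647,LineString,5000,656835.6850029442,1509487.0031804747,latin1-mojibake
markets,EPSG:4326,Point,1000,100.4533,13.655000000000001,utf-8
"""


def looks_geographic(x: float, y: float) -> bool:
    """True if the pair fits inside longitude/latitude bounds in degrees."""
    return abs(x) <= 180 and abs(y) <= 90


def magnitude_consistent(declared_crs: str, x: float, y: float) -> bool:
    """True if degree-like values pair with a geographic CRS, and metre-like with a projected one."""
    return (declared_crs in GEOGRAPHIC_CODES) == looks_geographic(x, y)


results = {
    row["layer_name"]: magnitude_consistent(
        row["declared_crs"], float(row["sample_x"]), float(row["sample_y"])
    )
    for row in csv.DictReader(io.StringIO(AUDIT_CSV))
}
print(results)
```

This check only catches degrees stored under a projected label or metres stored under a geographic one. It cannot tell apart two UTM zone 47N definitions on different datums, such as EPSG:24047 and EPSG:32647.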
0:53