dd-l2-bangkok-multicrs-audit
0:00
A contractor just shipped `bangkok_contractor_delivery.gpkg` and I need a defect cite-sheet before I sign off. Please walk every layer in the file and write one row per layer to `crs_audit.csv` with these columns: layer_name, declared_crs (formatted as `EPSG:NNNN`), geometry_type, feature_count, sample_x and sample_y for a sample coordinate taken from the layer as it sits on disk, and encoding_detected. For encoding_detected, look at the text attributes for anomalies and report either `latin1-mojibake` or `utf-8`.
0:00
System info
{
  "adapter": "openrouter",
  "model": "google/gemma-4-26b-a4b-it",
  "base_url": "https://openrouter.ai/api/v1"
}
0:01
Bash
ls -F
bangkok_contractor_delivery.gpkg
0:21
Write
/work/solve.py
content · 95 lines · py
import geopandas as gpd
import fiona
import pandas as pd
from shapely.geometry import Point

def detect_encoding(gdf):
    """
    Heuristic to check if text columns contain mojibake.
    We check string columns for common mojibake patterns.
    """
    # Select string columns
    str_cols = gdf.select_dtypes(include=['object']).columns
    for col in str_cols:
        # Get non-null values
        samples = gdf[col].dropna().astype(str).unique()
        for s in samples:
            # A common sign of latin1-mojibake in utf-8 environments is
            # seeing sequences like 'Ã©' instead of 'é'
            # We can check if the string contains characters that look like
            # results of incorrect decoding.

            # For this task, a simple heuristic: if it looks like it has
            # high-bit chars interpreted as multiple chars, we might flag it.
            # However, a more robust way is to see if encoding as utf-8
            # and decoding back results in the same string, or looking for
            # specific patterns.

            # Simple check: if the string has characters that are typical of mojibake
            # e.g., 'Ã' followed by something.
            if 'Ã' in s:
                return 'latin1-mojibake'

    return 'utf-8'

def audit_gpkg(file_path):
    layers = fiona.listlayers(file_path)
    audit_data = []

    for layer_name in layers:
        gdf = gpd.read_file(file_path, layer=layer_name)

        # Declared CRS
        crs = gdf.crs
        if crs is not None and crs.is_epsg:
            declared_crs = f"EPSG:{crs.to_epsg()}"
        elif crs is not None:
            declared_crs = str(crs)
        else:
            declared_crs = "None"

        # Geometry Type
        geom_type = gdf.geometry.type.iloc[0] if not gdf.empty else "None"

        # Feature Count
        feature_count = len(gdf)

        # Sample Coordinates
        sample_x, sample_y = None, None
        if feature_count > 0:
            sample_geom = gdf.geometry.iloc[0]
            if hasattr(sample_geom, 'x'):
                sample_x = sample_geom.x
                sample_y = sample_geom.y
            elif sample_geom.geom_type == 'MultiPoint':
                # For MultiPoint, just take the first point
                sample_x = sample_geom.geoms[0].x
                sample_y = sample_geom.geoms[0].y
            elif sample_geom.geom_type == 'GeometryCollection':
                # Try to find a point in the collection
                for part in sample_geom.geoms:
                    if hasattr(part, 'x'):
                        sample_x = part.x
                        sample_y = part.y
                        break

        # Encoding detection
        encoding_detected = detect_encoding(gdf)

        audit_data.append({
            'layer_name': layer_name,
            'declared_crs': declared_crs,
            'geometry_type': geom_type,
            'feature_count': feature_count,
            'sample_x': sample_x,
            'sample_y': sample_y,
            'encoding_detected': encoding_detected
        })

    df = pd.DataFrame(audit_data)
    df.to_csv('crs_audit.csv', index=False)
    print("Audit complete. Results written to crs_audit.csv")

if __name__ == "__main__":
    audit_gpkg('bangkok_contractor_delivery.gpkg')
Wrote /work/solve.py (3369 chars).
0:23
Bash
python solve.py
Traceback (most recent call last):
File "/work/solve.py", line 94, in <module>
audit_gpkg('bangkok_contractor_delivery.gpkg')
~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/work/solve.py", line 44, in audit_gpkg
if crs is not None and crs.is_epsg:
^^^^^^^^^^^
AttributeError: 'CRS' object has no attribute 'is_epsg'. Did you mean: 'to_epsg'?
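The traceback shows that the `CRS` object has no `is_epsg` attribute. `to_epsg()` does exist on pyproj's `CRS` and returns `None` when no EPSG code can be matched, so the `EPSG:NNNN` formatting can rest on that call alone. A minimal sketch follows. `format_declared_crs` and the `FakeCRS` stand-in are hypothetical names, used only so the snippet runs without pyproj installed; in the script the argument would be `gdf.crs`.

```python
from typing import Optional


def format_declared_crs(crs) -> str:
    """Return 'EPSG:NNNN' when the CRS resolves to an EPSG code.

    `crs` is anything exposing `to_epsg()` (as pyproj's CRS does), or None.
    """
    if crs is None:
        return "None"
    epsg = crs.to_epsg()  # None when no EPSG code can be matched
    if epsg is not None:
        return f"EPSG:{epsg}"
    return str(crs)


class FakeCRS:
    """Hypothetical stand-in for a pyproj CRS, for illustration only."""

    def __init__(self, epsg: Optional[int], label: str = "custom CRS"):
        self._epsg = epsg
        self._label = label

    def to_epsg(self) -> Optional[int]:
        return self._epsg

    def __str__(self) -> str:
        return self._label


print(format_declared_crs(FakeCRS(24047)))
print(format_declared_crs(FakeCRS(None)))
print(format_declared_crs(None))
```

Comparing against `None` explicitly, rather than relying on truthiness, keeps the intent clear: the fallback to `str(crs)` is meant only for a CRS with no EPSG match.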
0:30
Edit
/work/solve.py
Edited /work/solve.py.
0:31
Bash
python solve.py
Audit complete. Results written to crs_audit.csv
/work/solve.py:12: Pandas4Warning: For backward compatibility, 'str' dtypes are included by select_dtypes when 'object' dtype is specified. This behavior is deprecated and will be removed in a future version. Explicitly pass 'str' to `include` to select them, or to `exclude` to remove them and silence this warning. See https://pandas.pydata.org/docs/user_guide/migration-3-strings.html#string-migration-select-dtypes for details on how to write code that works with pandas 2 and 3.
  str_cols = gdf.select_dtypes(include=['object']).columns
/work/solve.py:12: Pandas4Warning: For backward compatibility, 'str' dtypes are included by select_dtypes when 'object' dtype is specified. This behavior is deprecated and will be removed in a future version. Explicitly pass 'str' to `include` to select them, or to `exclude` to remove them and silence this warning. See https://pandas.pydata.org/docs/user_guide/migration-3-strings.html#string-migration-select-dtypes for details on how to write code that works with pandas 2 and 3.
  str_cols = gdf.select_dtypes(include=['object']).columns
/work/solve.py:12: Pandas4Warning: For backward compatibility, 'str' dtypes are included by select_dtypes when 'object' dtype is specified. This behavior is deprecated and will be removed in a future version. Explicitly pass 'str' to `include` to select them, or to `exclude` to remove them and silence this warning. See https://pandas.pydata.org/docs/user_guide/migration-3-strings.html#string-migration-select-dtypes for details on how to write code that works with pandas 2 and 3.
  str_cols = gdf.select_dtypes(include=['object']).columns
0:34
Bash
cat crs_audit.csv
layer_name,declared_crs,geometry_type,feature_count,sample_x,sample_y,encoding_detected
parcels,EPSG:24047,Polygon,4000,,,utf-8
roads,EPSG:32647,LineString,5000,,,utf-8
markets,EPSG:4326,Point,1000,100.4533,13.655000000000001,utf-8
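The blank `sample_x`/`sample_y` cells for `parcels` and `roads` follow from the `hasattr(sample_geom, 'x')` test, which only Point geometries pass. One geometry-agnostic alternative, sketched here under the assumption that each geometry is available as a GeoJSON-style mapping (shapely geometries expose one through `__geo_interface__`), is to descend into the nested coordinate arrays until the first position is reached. `first_vertex` is a hypothetical helper, and the coordinates below are made up for illustration.

```python
def first_vertex(geom: dict):
    """Return (x, y) of the first vertex of a GeoJSON-style geometry, or None."""
    if geom.get("type") == "GeometryCollection":
        for part in geom.get("geometries", []):
            vertex = first_vertex(part)
            if vertex is not None:
                return vertex
        return None
    coords = geom.get("coordinates")
    # Descend through parts and rings until we reach a position (a sequence of numbers).
    while coords and isinstance(coords[0], (list, tuple)):
        coords = coords[0]
    if not coords:
        return None  # empty geometry
    return coords[0], coords[1]


point = {"type": "Point", "coordinates": (100.5, 13.7)}
line = {"type": "LineString", "coordinates": [(1.0, 2.0), (3.0, 4.0)]}
poly = {
    "type": "Polygon",
    "coordinates": [[(10.0, 20.0), (11.0, 20.0), (11.0, 21.0), (10.0, 20.0)]],
}
collection = {"type": "GeometryCollection", "geometries": [line, point]}

for g in (point, line, poly, collection):
    print(g["type"], first_vertex(g))
```

Because the mapping carries the stored coordinates unchanged, the sampled vertex stays in the layer's own CRS, which is what "as it sits on disk" asks for.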
1:02
Write
/work/solve.py
content · 103 lines · py
import geopandas as gpd
import fiona
import pandas as pd
from shapely.geometry import Point

def detect_encoding(gdf):
    """
    Heuristic to check if text columns contain mojibake.
    We check string columns for common mojibake patterns.
    """
    # Select string columns
    str_cols = gdf.select_dtypes(include=['object']).columns
    for col in str_cols:
        # Get non-null values
        samples = gdf[col].dropna().astype(str).unique()
        for s in samples:
            # A common sign of latin1-mojibake in utf-8 environments is
            # seeing sequences like 'Ã©' instead of 'é'
            if 'Ã' in s:
                return 'latin1-mojibake'

    return 'utf-8'

def audit_gpkg(file_path):
    layers = fiona.listlayers(file_path)
    audit_data = []

    for layer_name in layers:
        gdf = gpd.read_file(file_path, layer=layer_name)

        # Declared CRS
        crs = gdf.crs
        if crs is not None:
            epsg_code = crs.to_epsg()
            if epsg_code:
                declared_crs = f"EPSG:{epsg_code}"
            else:
                declared_crs = str(crs)
        else:
            declared_crs = "None"

        # Geometry Type
        geom_type = gdf.geometry.type.iloc[0] if not gdf.empty else "None"

        # Feature Count
        feature_count = len(gdf)

        # Sample Coordinates
        sample_x, sample_y = None, None
        if feature_count > 0:
            sample_geom = gdf.geometry.iloc[0]
            if sample_geom.geom_type == 'Point':
                sample_x = sample_geom.x
                sample_y = sample_geom.y
            elif sample_geom.geom_type == 'LineString':
                sample_x = sample_geom.coords[0][0]
                sample_y = sample_geom.coords[0][1]
            elif sample_geom.geom_type == 'Polygon':
                # Use first point of exterior ring
                sample_x = sample_geom.exterior.coords[0][0]
                sample_y = sample_geom.exterior.coords[0][1]
            elif sample_geom.geom_type == 'MultiPoint':
                sample_x = sample_geom.geoms[0].x
                sample_y = sample_geom.geoms[0].y
            elif sample_geom.geom_type == 'MultiLineString':
                sample_x = sample_geom.geoms[0].coords[0][0]
                sample_y = sample_geom.geoms[0].coords[0][1]
            elif sample_geom.geom_type == 'MultiPolygon':
                sample_x = sample_geom.geoms[0].exterior.coords[0][0]
                sample_y = sample_geom.geoms[0].exterior.coords[0][1]
            elif sample_geom.geom_type == 'GeometryCollection':
                for part in sample_geom.geoms:
                    if part.geom_type == 'Point':
                        sample_x, sample_y = part.x, part.y
                        break
                    elif hasattr(part, 'exterior'):
                        sample_x, sample_y = part.exterior.coords[0][0], part.exterior.coords[0][1]
                        break
                    elif hasattr(part, 'coords'):
                        sample_x, sample_y = part.coords[0][0], part.coords[0][1]
                        break

        # Encoding detection
        encoding_detected = detect_encoding(gdf)

        audit_data.append({
            'layer_name': layer_name,
            'declared_crs': declared_crs,
            'geometry_type': geom_type,
            'feature_count': feature_count,
            'sample_x': sample_x,
            'sample_y': sample_y,
            'encoding_detected': encoding_detected
        })

    df = pd.DataFrame(audit_data)
    df.to_csv('crs_audit.csv', index=False)
    print("Audit complete. Results written to crs_audit.csv")

if __name__ == "__main__":
    audit_gpkg('bangkok_contractor_delivery.gpkg')
Wrote /work/solve.py (3785 chars).
1:03
Bash
python solve.py
Audit complete. Results written to crs_audit.csv
/work/solve.py:12: Pandas4Warning: For backward compatibility, 'str' dtypes are included by select_dtypes when 'object' dtype is specified. This behavior is deprecated and will be removed in a future version. Explicitly pass 'str' to `include` to select them, or to `exclude` to remove them and silence this warning. See https://pandas.pydata.org/docs/user_guide/migration-3-strings.html#string-migration-select-dtypes for details on how to write code that works with pandas 2 and 3.
  str_cols = gdf.select_dtypes(include=['object']).columns
/work/solve.py:12: Pandas4Warning: For backward compatibility, 'str' dtypes are included by select_dtypes when 'object' dtype is specified. This behavior is deprecated and will be removed in a future version. Explicitly pass 'str' to `include` to select them, or to `exclude` to remove them and silence this warning. See https://pandas.pydata.org/docs/user_guide/migration-3-strings.html#string-migration-select-dtypes for details on how to write code that works with pandas 2 and 3.
  str_cols = gdf.select_dtypes(include=['object']).columns
/work/solve.py:12: Pandas4Warning: For backward compatibility, 'str' dtypes are included by select_dtypes when 'object' dtype is specified. This behavior is deprecated and will be removed in a future version. Explicitly pass 'str' to `include` to select them, or to `exclude` to remove them and silence this warning. See https://pandas.pydata.org/docs/user_guide/migration-3-strings.html#string-migration-select-dtypes for details on how to write code that works with pandas 2 and 3.
  str_cols = gdf.select_dtypes(include=['object']).columns
1:06
Bash
cat crs_audit.csv
layer_name,declared_crs,geometry_type,feature_count,sample_x,sample_y,encoding_detected
parcels,EPSG:24047,Polygon,4000,657421.5158199039,1509162.3701859994,utf-8
roads,EPSG:32647,LineString,5000,656835.6850029442,1509487.0031804747,utf-8
markets,EPSG:4326,Point,1000,100.4533,13.655000000000001,utf-8
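Every layer is reported as `utf-8`, but the `'Ã'` test in `detect_encoding` only catches mis-decoded characters whose UTF-8 lead byte is 0xC3, such as `é`. Thai text encodes with lead byte 0xE0, so the same damage would appear as sequences starting with `à¸` or `à¹` and would slip past that check. A more general heuristic, offered here as an assumption rather than as the method the task expects, re-encodes each string with the suspected wrong codec and tests whether the resulting bytes decode cleanly as UTF-8. `looks_like_mojibake` is a hypothetical name, and the sample strings are illustrative, not taken from the GeoPackage.

```python
def looks_like_mojibake(text: str, codecs=("latin-1", "cp1252")) -> bool:
    """True if `text` looks like UTF-8 bytes that were decoded with a legacy codec."""
    for codec in codecs:
        try:
            repaired = text.encode(codec).decode("utf-8")
        except (UnicodeEncodeError, UnicodeDecodeError):
            continue  # not representable in this codec, or not valid UTF-8
        if repaired != text:
            return True  # the round trip recovered different, valid UTF-8 text
    return False


clean = "ตลาด"  # Thai for "market"
broken = clean.encode("utf-8").decode("latin-1")  # simulate the damage

print(looks_like_mojibake(clean))
print(looks_like_mojibake(broken))
print(looks_like_mojibake("café"))
print(looks_like_mojibake("Market 12"))
```

Genuine Latin-1 text such as `café` fails the UTF-8 decode and is left alone, and pure ASCII round-trips to itself, so neither is flagged.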
1:13