dd-l2-bangkok-multicrs-audit
0:00
A contractor just shipped `bangkok_contractor_delivery.gpkg` and I need a defect cite-sheet before I sign off. Please walk every layer in the file and write one row per layer to `crs_audit.csv` with these columns: layer_name, declared_crs (formatted as `EPSG:NNNN`), geometry_type, feature_count, sample_x and sample_y for a sample coordinate taken from the layer as it sits on disk, and encoding_detected. For encoding_detected, look at the text attributes for anomalies and report either `latin1-mojibake` or `utf-8`.
0:00
System info
{
"adapter": "openrouter",
"model": "google/gemma-4-26b-a4b-it",
"base_url": "https://openrouter.ai/api/v1"
}
0:01
Bash
ls -l bangkok_contractor_delivery.gpkg
-rw-r--r-- 1 agent agent 2703360 Jun 17 20:25 bangkok_contractor_delivery.gpkg
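Editor's note: a GeoPackage is an SQLite database, so the declared CRS and geometry type of each layer can be read straight from its metadata tables without loading any features. The sketch below assumes the standard `gpkg_contents`, `gpkg_geometry_columns`, and `gpkg_spatial_ref_sys` tables are present; the function name `declared_layers` is hypothetical and is not part of the run recorded here.

```python
import sqlite3


def declared_layers(gpkg_path):
    """Return (layer_name, declared_crs, geometry_type) for each feature layer.

    Reads only the GeoPackage metadata tables, so the result reflects what the
    file declares, not what the coordinates actually look like.
    """
    query = """
        SELECT c.table_name,
               s.organization,
               s.organization_coordsys_id,
               g.geometry_type_name
        FROM gpkg_contents AS c
        JOIN gpkg_geometry_columns AS g ON g.table_name = c.table_name
        JOIN gpkg_spatial_ref_sys AS s ON s.srs_id = g.srs_id
        WHERE c.data_type = 'features'
        ORDER BY c.table_name
    """
    con = sqlite3.connect(gpkg_path)
    try:
        rows = con.execute(query).fetchall()
    finally:
        con.close()
    # Format the authority and code as e.g. "EPSG:4326"
    return [
        (name, f"{org.upper()}:{code}", gtype)
        for name, org, code, gtype in rows
    ]
```

Calling `declared_layers("bangkok_contractor_delivery.gpkg")` would give an independent cross-check on the `declared_crs` column that a higher-level reader reports.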
0:29
Write
/work/solve.py
content · 99 lines · py
import geopandas as gpd
import pandas as pd
import fiona
from fiona.crs import CRS


def check_encoding(df):
    """
    Heuristic to check if a dataframe has mojibake.
    We check string columns for common mojibake patterns or
    sequences that suggest incorrect encoding.
    """
    # If no string columns, return utf-8 as a default
    str_cols = df.select_dtypes(include=['object']).columns
    if len(str_cols) == 0:
        return "utf-8"
    for col in str_cols:
        # Check for common mojibake indicators like Ã, ©, etc.
        # in contexts where they don't belong, but a simpler way is to
        # try to see if the column can be encoded/decoded cleanly if it were bytes.
        # However, since it's already loaded as a string, we look for "garbage" characters.
        # A common sign of latin1-mojibake in utf-8 is the presence of 'Ã' followed by
        # something else.
        sample_text = df[col].dropna().astype(str).head(100).tolist()
        for text in sample_text:
            # A very basic heuristic: if we see characters like Ã followed by something
            # that looks like a single character's byte representation.
            # Example: 'Ã©' instead of 'é'
            if "Ã" in text:
                return "latin1-mojibake"
    return "utf-8"


def audit_gpkg(file_path, output_csv):
    layers = fiona.listlayers(file_path)
    audit_results = []
    for layer in layers:
        gdf = gpd.read_file(file_path, layer=layer)
        # layer_name
        layer_name = layer
        # declared_crs
        # fiona/geopandas handles CRS. We want EPSG:NNNN format.
        # If it's a custom CRS or missing, we'll try to format it or use 'Unknown'
        try:
            if gdf.crs is not None:
                # Use to_epsg() if available, otherwise string representation
                epsg_code = gdf.crs.to_epsg()
                declared_crs = f"EPSG:{epsg_code}" if epsg_code else str(gdf.crs)
            else:
                declared_crs = "Unknown"
        except Exception:
            declared_crs = "Unknown"
        # geometry_type
        geometry_type = gdf.geometry.type.iloc[0] if not gdf.empty else "None"
        # feature_count
        feature_count = len(gdf)
        # sample_x and sample_y
        sample_x = None
        sample_y = None
        if not gdf.empty:
            sample_point = gdf.geometry.iloc[0]
            if sample_point.geom_type == 'Point':
                sample_x = sample_point.x
                sample_y = sample_point.y
            elif sample_point.geom_type in ['LineString', 'MultiLineString', 'Polygon', 'MultiPolygon']:
                # Get representative point or centroid
                rep_point = sample_point.centroid
                sample_x = rep_point.x
                sample_y = rep_point.y
            elif sample_point.geom_type == 'MultiPoint':
                sample_x = sample_point.geoms[0].x
                sample_y = sample_point.geoms[0].y
        # encoding_detected
        encoding = check_encoding(gdf)
        audit_results.append({
            "layer_name": layer_name,
            "declared_crs": declared_crs,
            "geometry_type": geometry_type,
            "feature_count": feature_count,
            "sample_x": sample_x,
            "sample_y": sample_y,
            "encoding_detected": encoding
        })
    audit_df = pd.DataFrame(audit_results)
    audit_df.to_csv(output_csv, index=False)
    print(f"Audit complete. Results written to {output_csv}")


if __name__ == "__main__":
    audit_gpkg("bangkok_contractor_delivery.gpkg", "crs_audit.csv")
Wrote /work/solve.py (3657 chars).
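Editor's note: the `"Ã" in text` test in `check_encoding` only catches mojibake from characters whose UTF-8 lead byte is 0xC3, which covers accented Latin letters. Thai characters encode with a 0xE0 lead byte, so Thai text misread as Latin-1 begins with `à` rather than `Ã` and would slip past that test. A round-trip check is one alternative; the sketch below is an assumption-laden illustration (the function names are hypothetical), not the method used in this run.

```python
def looks_like_latin1_mojibake(text):
    """Return True if `text` looks like UTF-8 bytes that were decoded as Latin-1.

    Idea: undo the suspected wrong decode (encode back to single bytes), then
    try to decode those bytes as UTF-8. If that succeeds and changes the text,
    the original string was very likely mojibake.
    """
    for wrong_codec in ("latin-1", "cp1252"):
        try:
            repaired = text.encode(wrong_codec).decode("utf-8")
        except UnicodeError:
            # Not representable in the single-byte codec, or not valid UTF-8
            continue
        if repaired != text:
            return True
    return False


def column_has_mojibake(values):
    """Check every non-null value of a column, not just the first few rows."""
    return any(
        looks_like_latin1_mojibake(str(value))
        for value in values
        if value is not None
    )
```

Genuine Thai text fails the `encode` step and plain ASCII round-trips unchanged, so both are reported as clean. Scanning all values rather than `head(100)` matters when only part of a column is damaged.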
0:31
Bash
python solve.py
Audit complete. Results written to crs_audit.csv
/work/solve.py:13: Pandas4Warning: For backward compatibility, 'str' dtypes are included by select_dtypes when 'object' dtype is specified. This behavior is deprecated and will be removed in a future version. Explicitly pass 'str' to `include` to select them, or to `exclude` to remove them and silence this warning. See https://pandas.pydata.org/docs/user_guide/migration-3-strings.html#string-migration-select-dtypes for details on how to write code that works with pandas 2 and 3.
  str_cols = df.select_dtypes(include=['object']).columns
/work/solve.py:13: Pandas4Warning: For backward compatibility, 'str' dtypes are included by select_dtypes when 'object' dtype is specified. This behavior is deprecated and will be removed in a future version. Explicitly pass 'str' to `include` to select them, or to `exclude` to remove them and silence this warning. See https://pandas.pydata.org/docs/user_guide/migration-3-strings.html#string-migration-select-dtypes for details on how to write code that works with pandas 2 and 3.
  str_cols = df.select_dtypes(include=['object']).columns
/work/solve.py:13: Pandas4Warning: For backward compatibility, 'str' dtypes are included by select_dtypes when 'object' dtype is specified. This behavior is deprecated and will be removed in a future version. Explicitly pass 'str' to `include` to select them, or to `exclude` to remove them and silence this warning. See https://pandas.pydata.org/docs/user_guide/migration-3-strings.html#string-migration-select-dtypes for details on how to write code that works with pandas 2 and 3.
  str_cols = df.select_dtypes(include=['object']).columns
0:33
Read
crs_audit.csv
layer_name,declared_crs,geometry_type,feature_count,sample_x,sample_y,encoding_detected
parcels,EPSG:24047,Polygon,4000,657706.4571303247,1509274.702150785,utf-8
roads,EPSG:32647,LineString,5000,656996.3371533038,1509487.963830215,utf-8
markets,EPSG:4326,Point,1000,100.4533,13.655000000000001,utf-8
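Editor's note: in a multi-CRS audit, one cheap sanity check is whether each sample coordinate is even in a plausible range for the kind of CRS the layer declares. The sketch below is a coarse hint only, and its names are hypothetical: values inside ±180 / ±90 are labelled degrees-like, everything else projected-like. It cannot tell one projected CRS from another, and projected coordinates close to the origin would be mislabelled.

```python
def coordinate_range_hint(x, y):
    """Classify a coordinate pair by magnitude alone.

    Returns "degrees-like" when the pair could be longitude/latitude,
    otherwise "projected-like" (e.g. metres in a UTM zone).
    """
    if abs(x) <= 180 and abs(y) <= 90:
        return "degrees-like"
    return "projected-like"


def declared_kind_matches(declared_is_geographic, x, y):
    """Compare the declared CRS kind with what the numbers look like.

    `declared_is_geographic` is supplied by the caller (True for a
    longitude/latitude CRS such as EPSG:4326, False for a projected one).
    """
    hint = coordinate_range_hint(x, y)
    expected = "degrees-like" if declared_is_geographic else "projected-like"
    return hint == expected
```

Applied to the three rows above, the `markets` sample (100.4533, 13.655) reads as degrees-like and the `parcels` and `roads` samples read as projected-like, which agrees with their declared CRS kinds at this coarse level.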
0:38
Bash
# Let's see if I can find any actual mojibake by manually checking the file if it's small enough,
# but since the CSV output didn't show any, I'll trust my logic for now unless I find evidence.
# Wait, I should check if any layer actually HAS mojibake.
# Let's double check the 'markets' layer specifically.
python -c "import geopandas as gpd; gdf = gpd.read_file('bangkok_contractor_delivery.gpkg', layer='markets'); print(gdf.head())"
       id         name_th   kind                 geometry
0  M00000  ตลาดนัดจตุจักร  fresh  POINT (100.4533 13.655)
1  M00001      ตลาดอ.ต.ก.    wet  POINT (100.4599 13.655)
2  M00002     ตลาดคลองเตย  night  POINT (100.4665 13.655)
3  M00003    ตลาดสะพานขาว  fresh  POINT (100.4731 13.655)
4  M00004      ตลาดบางรัก    wet  POINT (100.4797 13.655)
0:42
Bash
python -c "import geopandas as gpd; gdf = gpd.read_file('bangkok_contractor_delivery.gpkg', layer='parcels'); print(gdf.head())"
       id  ...                                           geometry
0  P00000  ...  POLYGON ((657421.516 1509162.37, 657992.722 15...
1  P00001  ...  POLYGON ((658135.524 1509166.644, 658706.731 1...
2  P00002  ...  POLYGON ((658849.533 1509170.937, 659420.743 1...
3  P00003  ...  POLYGON ((659563.545 1509175.249, 660134.756 1...
4  P00004  ...  POLYGON ((660277.558 1509179.581, 660848.771 1...
[5 rows x 4 columns]
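Editor's note: the request asked for a sample coordinate "taken from the layer as it sits on disk". For `parcels`, the audit row holds a computed centroid (657706.457..., 1509274.702...), whereas the first stored vertex printed here is (657421.516, 1509162.37). If a stored vertex is what is wanted, it can be pulled from a GeoJSON-like geometry mapping, such as the one a Shapely geometry exposes via `__geo_interface__`. The helper below is a hypothetical sketch, not part of the recorded run.

```python
def first_vertex(geometry):
    """Return the first stored (x, y) of a GeoJSON-like geometry mapping.

    Works for Point, LineString, Polygon and their Multi* variants by
    descending through the nested coordinate arrays. Returns None for
    missing or empty geometries.
    """
    if geometry is None:
        return None
    if geometry.get("type") == "GeometryCollection":
        # Use the first member that actually has a vertex
        for part in geometry.get("geometries", []):
            vertex = first_vertex(part)
            if vertex is not None:
                return vertex
        return None
    coords = geometry.get("coordinates")
    # Descend until the first element is a number, i.e. we hold one position
    while coords and isinstance(coords[0], (list, tuple)):
        coords = coords[0]
    if not coords:
        return None
    return (coords[0], coords[1])
```

Unlike a centroid, the returned pair is always a value that is literally present in the stored geometry, which makes it easier to cite in a defect sheet.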
0:46
Bash
# Check the roads layer for mojibake
python -c "import geopandas as gpd; gdf = gpd.read_file('bangkok_contractor_delivery.gpkg', layer='roads'); print(gdf.head())"
       id  ...                                           geometry
0  R00000  ...  LINESTRING (656835.685 1509487.003, 657156.989...
1  R00001  ...  LINESTRING (657192.69 1509489.138, 657190.305 ...
2  R00002  ...  LINESTRING (657549.695 1509491.278, 657871 150...
3  R00003  ...  LINESTRING (657906.701 1509493.423, 657904.305...
4  R00004  ...  LINESTRING (658263.707 1509495.572, 658585.013...
[5 rows x 4 columns]
0:58
Bash
# Let's try to check if there's any layer that has encoding issues.
# I'll run a script that specifically looks for mojibake in all layers.
cat << 'EOF' > check_mojibake.py
import geopandas as gpd
import fiona

def find_mojibake(gdf):
    str_cols = gdf.select_dtypes(include=['object']).columns
    for col in str_cols:
        sample = gdf[col].dropna().astype(str).head(100).tolist()
        for text in sample:
            # Check for common mojibake patterns
            # UTF-8 interpreted as Latin-1 often results in characters like Ã, Â, etc.
            # followed by another character.
            for char in ["Ã", "Â"]:
                if char in text:
                    return True
    return False

layers = fiona.listlayers('bangkok_contractor_delivery.gpkg')
for layer in layers:
    gdf = gpd.read_file('bangkok_contractor_delivery.gpkg', layer=layer)
    if find_mojibake(gdf):
        print(f"Layer {layer} HAS mojibake")
    else:
        print(f"Layer {layer} is clean")
EOF
python check_mojibake.py
Layer parcels is clean
Layer roads is clean
Layer markets is clean
/work/check_mojibake.py:5: Pandas4Warning: For backward compatibility, 'str' dtypes are included by select_dtypes when 'object' dtype is specified. This behavior is deprecated and will be removed in a future version. Explicitly pass 'str' to `include` to select them, or to `exclude` to remove them and silence this warning. See https://pandas.pydata.org/docs/user_guide/migration-3-strings.html#string-migration-select-dtypes for details on how to write code that works with pandas 2 and 3.
  str_cols = gdf.select_dtypes(include=['object']).columns
/work/check_mojibake.py:5: Pandas4Warning: For backward compatibility, 'str' dtypes are included by select_dtypes when 'object' dtype is specified. This behavior is deprecated and will be removed in a future version. Explicitly pass 'str' to `include` to select them, or to `exclude` to remove them and silence this warning. See https://pandas.pydata.org/docs/user_guide/migration-3-strings.html#string-migration-select-dtypes for details on how to write code that works with pandas 2 and 3.
  str_cols = gdf.select_dtypes(include=['object']).columns
/work/check_mojibake.py:5: Pandas4Warning: For backward compatibility, 'str' dtypes are included by select_dtypes when 'object' dtype is specified. This behavior is deprecated and will be removed in a future version. Explicitly pass 'str' to `include` to select them, or to `exclude` to remove them and silence this warning. See https://pandas.pydata.org/docs/user_guide/migration-3-strings.html#string-migration-select-dtypes for details on how to write code that works with pandas 2 and 3.
  str_cols = gdf.select_dtypes(include=['object']).columns
1:01
Bash
ls -l crs_audit.csv
-rw-r--r-- 1 agent agent 300 Jun 17 20:26 crs_audit.csv
1:07