dd-l2-bangkok-multicrs-audit

Status: done
Score: 0.882
Duration: 0:52
Cost: 0.20¢
Model: google/gemma-4-26b-a4b-it
Model error · Retrospective assessed 2026-06-09T12:12:15Z


0:00
A contractor just shipped `bangkok_contractor_delivery.gpkg` and I need a defect cite-sheet before I sign off. Please walk every layer in the file and write one row per layer to `crs_audit.csv` with these columns: layer_name, declared_crs (formatted as `EPSG:NNNN`), geometry_type, feature_count, sample_x and sample_y for a sample coordinate taken from the layer as it sits on disk, and encoding_detected. For encoding_detected, look at the text attributes for anomalies and report either `latin1-mojibake` or `utf-8`.
0:00
System info
{
  "adapter": "openrouter",
  "model": "google/gemma-4-26b-a4b-it",
  "base_url": "https://openrouter.ai/api/v1"
}
0:01
Bash ls -lh bangkok_contractor_delivery.gpkg
-rw-r--r-- 1 agent agent 2.6M Jun  7 13:39 bangkok_contractor_delivery.gpkg
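A GeoPackage is a SQLite database, so three of the per-layer facts the prompt asks for can be read from the file's own metadata tables with only the standard library: the declared CRS, the geometry type and the row count. This gives a cross-check that does not go through fiona or GDAL.

The sketch below assumes the standard `gpkg_geometry_columns` and `gpkg_spatial_ref_sys` tables defined by the GeoPackage specification. `declared_layers` is a hypothetical helper name, not part of any library.

```python
import sqlite3


def declared_layers(con: sqlite3.Connection) -> list:
    """List feature layers with the CRS, geometry type and row count that the
    GeoPackage metadata declares (no reprojection, no GDAL involved)."""
    rows = con.execute(
        """
        SELECT g.table_name, g.geometry_type_name,
               s.organization, s.organization_coordsys_id
        FROM gpkg_geometry_columns AS g
        JOIN gpkg_spatial_ref_sys AS s ON s.srs_id = g.srs_id
        ORDER BY g.table_name
        """
    ).fetchall()
    layers = []
    for table, geom_type, org, code in rows:
        # Table names come from the file itself; quote them as SQL identifiers.
        quoted = '"' + table.replace('"', '""') + '"'
        (count,) = con.execute(f"SELECT COUNT(*) FROM {quoted}").fetchone()
        layers.append({
            "layer_name": table,
            # Normalise e.g. 'epsg' / 32647 to 'EPSG:32647'.
            "declared_crs": f"{org.upper()}:{code}",
            "geometry_type": geom_type,
            "feature_count": count,
        })
    return layers
```

For an audit, the file can be opened read-only with `sqlite3.connect("file:bangkok_contractor_delivery.gpkg?mode=ro", uri=True)`.

`geometry_type_name` is the layer-level declaration, for example `POLYGON` or `MULTIPOLYGON`. It can differ from the type of the first feature, which is what `gdf.geom_type.iloc[0]` reports. Layers tied to the spec's undefined SRS rows (srs_id -1 or 0, organization `NONE`) would come out as `NONE:-1` or `NONE:0` rather than an EPSG code.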
0:36
Write /work/solve.py
content · 94 lines · py
import geopandas as gpd
import pandas as pd
import fiona
from shapely.geometry import Point

def detect_encoding(gdf):
    """
    Detect encoding by checking text columns for common mojibake patterns.
    Specifically looking for things like 'Ã©' instead of 'é'.
    """
    # Identify text columns
    text_cols = gdf.select_dtypes(include=['object']).columns
    
    for col in text_cols:
        # Sample some values
        samples = gdf[col].dropna().astype(str).head(100).tolist()
        for s in samples:
            # A common sign of latin1 interpreted as utf-8:
            # Check if there are sequences like 'Ã' followed by a character that might be part of a multibyte char
            # Or more simply, check if common UTF-8 characters look like mojibake.
            # This is a heuristic.
            if 'Ã' in s:
                return 'latin1-mojibake'
    return 'utf-8'

def audit_gpkg(file_path, output_csv):
    layers = fiona.listlayers(file_path)
    audit_results = []

    for layer_name in layers:
        try:
            gdf = gpd.read_file(file_path, layer=layer_name)
            
            # 1. layer_name
            # 2. declared_crs
            crs = gdf.crs
            if crs is None:
                declared_crs = "None"
            else:
                # Try to get EPSG code
                epsg_code = crs.to_epsg()
                if epsg_code:
                    declared_crs = f"EPSG:{epsg_code}"
                else:
                    declared_crs = str(crs)

            # 3. geometry_type
            geom_type = gdf.geom_type.iloc[0] if not gdf.empty else "None"

            # 4. feature_count
            feature_count = len(gdf)

            # 5. sample_x and sample_y
            sample_x, sample_y = None, None
            if not gdf.empty:
                # Get first geometry
                first_geom = gdf.geometry.iloc[0]
                if first_geom.geom_type == 'Point':
                    sample_x = first_geom.x
                    sample_y = first_geom.y
                elif first_geom.geom_type in ['LineString', 'MultiLineString', 'Polygon', 'MultiPolygon']:
                    # Get centroid for sample
                    centroid = first_geom.centroid
                    sample_x = centroid.x
                    sample_y = centroid.y
                # If it's something else, we might fail or use representative_point
                else:
                    rep_point = first_geom.representative_point()
                    sample_x = rep_point.x
                    sample_y = rep_point.y

            # 6. encoding_detected
            encoding = detect_encoding(gdf)

            audit_results.append({
                'layer_name': layer_name,
                'declared_crs': declared_crs,
                'geometry_type': geom_type,
                'feature_count': feature_count,
                'sample_x': sample_x,
                'sample_y': sample_y,
                'encoding_detected': encoding
            })
            
        except Exception as e:
            print(f"Error processing layer {layer_name}: {e}")

    df_audit = pd.DataFrame(audit_results)
    df_audit.to_csv(output_csv, index=False)
    print(f"Audit complete. Results saved to {output_csv}")

if __name__ == "__main__":
    audit_gpkg('bangkok_contractor_delivery.gpkg', 'crs_audit.csv')
Wrote /work/solve.py (3363 chars).
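The `detect_encoding` heuristic above flags `latin1-mojibake` only when one of the first 100 non-null values in a column contains `Ã`. That is how the UTF-8 lead byte 0xC3 (used for Latin-1 letters such as é) looks after being mis-decoded as Latin-1. Thai characters, however, are encoded with lead byte 0xE0, which mis-decodes as `à`, so Thai-script mojibake would pass this check unflagged.

A round-trip test is script-independent: re-encode the string as Latin-1 and see whether the recovered bytes decode as UTF-8 into a different string. The sketch below is still a heuristic, and the function names are hypothetical.

```python
def is_latin1_mojibake(text: str) -> bool:
    """True if `text` looks like UTF-8 bytes that were decoded as Latin-1.

    Encoding as Latin-1 recovers the original bytes one-to-one; if those bytes
    decode cleanly as UTF-8 into a *different* string, the text was almost
    certainly mis-decoded (e.g. 'Ã©' -> 'é', 'à¸\x81' -> 'ก').
    """
    try:
        repaired = text.encode("latin-1").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        # A character outside Latin-1 (genuine Thai, say) or bytes that are
        # not valid UTF-8: not this kind of mojibake.
        return False
    return repaired != text


def detect_layer_encoding(values) -> str:
    """Label a layer from an iterable of its attribute values.

    Only `str` values are inspected, so numbers, nulls and geometries are
    skipped without relying on pandas dtype selection.
    """
    for value in values:
        if isinstance(value, str) and is_latin1_mojibake(value):
            return "latin1-mojibake"
    return "utf-8"
```

In `solve.py` this could be applied to every row of every non-geometry column, for example `detect_layer_encoding(gdf[col])` for each `col` other than `gdf.geometry.name`. Iterating over a Series yields its values.

Checking `isinstance(value, str)` instead of calling `select_dtypes(include=['object'])` also sidesteps the pandas string-dtype deprecation warned about in the run below.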
0:37
Bash python /work/solve.py
Audit complete. Results saved to crs_audit.csv
/work/solve.py:12: Pandas4Warning: For backward compatibility, 'str' dtypes are included by select_dtypes when 'object' dtype is specified. This behavior is deprecated and will be removed in a future version. Explicitly pass 'str' to `include` to select them, or to `exclude` to remove them and silence this warning.
See https://pandas.pydata.org/docs/user_guide/migration-3-strings.html#string-migration-select-dtypes for details on how to write code that works with pandas 2 and 3.
  text_cols = gdf.select_dtypes(include=['object']).columns
0:40
Bash cat crs_audit.csv
layer_name,declared_crs,geometry_type,feature_count,sample_x,sample_y,encoding_detected
parcels,EPSG:24047,Polygon,4000,657706.4571303247,1509274.702150785,utf-8
roads,EPSG:32647,LineString,5000,656996.3371533038,1509487.963830215,utf-8
markets,EPSG:4326,Point,1000,100.4533,13.655000000000001,utf-8
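For line and polygon layers, `solve.py` samples `first_geom.centroid`. The `parcels` and `roads` rows above therefore report computed points rather than coordinates stored in the file, even though the prompt asked for a sample taken "from the layer as it sits on disk". Only the `markets` row, a Point layer, carries a stored coordinate.

The sketch below reads the first stored vertex directly from the GeoPackage geometry blob. It makes three assumptions:

- Each blob uses the standard GeoPackage binary header followed by ISO WKB, as the GeoPackage specification defines.
- The "first" feature is the lowest-rowid row with a non-null geometry.
- An extended (X-flag) or otherwise non-standard blob is rejected rather than parsed.

All function names are hypothetical.

```python
import sqlite3
import struct

# Envelope size in bytes for each value of the 3-bit envelope indicator in the
# GeoPackage binary header flags (0 none, 1 xy, 2 xyz, 3 xym, 4 xyzm).
_ENVELOPE_BYTES = {0: 0, 1: 32, 2: 48, 3: 48, 4: 64}


def _quote(name: str) -> str:
    """Quote an SQLite identifier taken from the file's own metadata."""
    return '"' + name.replace('"', '""') + '"'


def _first_xy_wkb(buf: bytes, off: int):
    """Return the first stored (x, y) of the ISO WKB geometry at buf[off], or None."""
    order = "<" if buf[off] == 1 else ">"
    (wkb_type,) = struct.unpack_from(order + "I", buf, off + 1)
    base = wkb_type % 1000  # drop the ISO Z/M/ZM offset (1000/2000/3000)
    off += 5
    if base == 1:  # Point: coordinates follow the type code directly
        return struct.unpack_from(order + "2d", buf, off)
    (count,) = struct.unpack_from(order + "I", buf, off)
    off += 4
    if count == 0:  # empty LineString, Polygon or collection
        return None
    if base == 2:  # LineString: first of `count` points
        return struct.unpack_from(order + "2d", buf, off)
    if base == 3:  # Polygon: first point of the first (exterior) ring
        (npoints,) = struct.unpack_from(order + "I", buf, off)
        return struct.unpack_from(order + "2d", buf, off + 4) if npoints else None
    if base in (4, 5, 6, 7):  # Multi* / GeometryCollection: member has its own header
        return _first_xy_wkb(buf, off)
    raise ValueError(f"unsupported WKB geometry type {wkb_type}")


def gpkg_blob_first_xy(blob: bytes):
    """Skip the GeoPackage binary header and return the first stored (x, y)."""
    if blob[:2] != b"GP":
        raise ValueError("not a GeoPackage geometry blob")
    flags = blob[3]
    if flags & 0b0010_0000:  # X bit: extended geometry type, not handled here
        raise ValueError("extended GeoPackage geometry not supported")
    if flags & 0b0001_0000:  # Y bit: empty geometry
        return None
    envelope = (flags >> 1) & 0b111
    # Header = magic (2) + version (1) + flags (1) + srs_id (4) + optional envelope.
    return _first_xy_wkb(blob, 8 + _ENVELOPE_BYTES[envelope])


def first_stored_vertex(con: sqlite3.Connection, layer: str):
    """First vertex of the lowest-rowid non-null geometry in `layer`, as stored."""
    (geom_col,) = con.execute(
        "SELECT column_name FROM gpkg_geometry_columns WHERE table_name = ?",
        (layer,),
    ).fetchone()
    row = con.execute(
        f"SELECT {_quote(geom_col)} FROM {_quote(layer)} "
        f"WHERE {_quote(geom_col)} IS NOT NULL ORDER BY rowid LIMIT 1"
    ).fetchone()
    return gpkg_blob_first_xy(row[0]) if row else None
```

Because nothing is reprojected or derived, the result is in whatever units the file stores, whether or not they match the declared CRS.

To stay inside geopandas instead, `shapely.get_coordinates(first_geom)[0]` (Shapely 2.x) returns the first stored vertex of an already-loaded geometry. geopandas does not reproject on read, so that value would also be the on-disk coordinate.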
0:51