dd-l3-lagos-overture-buildings
Map (reference ← swipe → agent)
0:00
Updating the flood-risk model before rainy season. Need every building footprint exceeding 1000 m² across Lagos State (Nigeria) from Overture.
Also need a per-LGA roll-up: for each Lagos Local Government Area, the building count, total footprint area, count of buildings with a non-null Overture height, and the median height where available (null otherwise).
Write two output files. `lagos_buildings.geoparquet` should be EPSG:4326 GeoParquet with columns `id`, `height`, `footprint_area_m2`, `lga`, and `geometry`. `lagos_building_summary.parquet` should be a plain Parquet with no geometry column and columns `lga`, `n_buildings`, `total_footprint_m2`, `n_with_height`, and `p50_height_m`.
0:00
System info
{
"adapter": "openrouter",
"model": "deepseek/deepseek-v4-flash",
"base_url": "https://openrouter.ai/api/v1"
}
0:08
Let me break this down: 1. I need to get Overture Maps data for Lagos State, Nigeria - building footprints > 1000 m² 2. Need to get LGA (Local Government Area) boundaries for Lagos State 3. Process and output two files: - `lagos_buildings.geoparquet` - EPSG:4326 with columns: id, height, footprint_area_m2, lga, geometry - `lagos_building_summary.parquet` - plain Parquet with: lga, n_buildings, total_footprint_m2, n_with_height, p50_height_m Let me start by exploring what's available and how to access Overture Maps data. First, let me check what tools/libraries are available.
0:08
Bash
python3 -c "import geopandas; print(geopandas.__version__)"
1.1.3
0:09
Bash
python3 -c "import duckdb; print(duckdb.__version__)"
1.5.3
0:10
Bash
python3 -c "import pyarrow; print(pyarrow.__version__)"
24.0.0
0:20
Let me understand the task: 1. Get building footprints > 1000 m² across Lagos State (Nigeria) from Overture Maps data 2. For each LGA (Local Government Area), compute summary statistics 3. Output two files: - `lagos_buildings.geoparquet` (EPSG:4326) with columns: id, height, footprint_area_m2, lga, geometry - `lagos_building_summary.parquet` (plain parquet) with columns: lga, n_buildings, total_footprint_m2, n_with_height, p50_height_m Overture Maps provides data via their public dataset. I need to access it. The usual way is through Overture's parquet files hosted on AWS S3, or via their SQL-based access. Let me think about how to get Overture data for Lagos State. Overture Maps data is available as Parquet files on AWS S3 (s3://overturemaps-us-west-2/release/...). The latest release would have building footprints. Actually, I can use DuckDB with the spatial extension to query Overture Maps data directly. Overture provides a global schema with themes like "buildings". Let me write a script that: 1. Uses DuckDB to query Overture buildings for Lagos State, Nigeria 2. Filters to buildings > 1000 m² 3. Calculates the area 4. Spatially joins with LGAs of Lagos State 5. Outputs the two files First, I need to know the boundaries of Lagos State and its LGAs. I can get administrative boundaries from Overture as well (the "divisions" theme or from Natural Earth). Let me think about the approach step by step: 1. Query Overture buildings for the bounding box of Lagos State 2. Get Lagos State boundary from Overture divisions 3. Spatial filter to keep only buildings within Lagos State 4. Get LGA boundaries 5. Spatial join to assign LGA names 6. Filter to buildings > 1000 m² 7. Calculate area 8. Output GeoParquet 9. Compute per-LGA summary and output Parquet Alternatively, I could use a simpler approach with GeoPandas and duckdb. Let me download/access Overture data. The easiest way is using DuckDB's ability to query S3 parquet files directly. Overture data schema for buildings (2024-09-18 or later releases): - id: string - geometry: binary (WKB) - height: double (nullable) - ...etc. Let me write a comprehensive script. First, let me check what Overture release is available. Actually, let me just write a script using duckdb and geopandas. I'll use Overture's public data via their S3 bucket. The typical Overture buildings table path is: `s3://overturemaps-us-west-2/release/2024-09-18.0/theme=buildings/type=building/` Or more recent releases. Let me check what's available. Actually, let me write the script and see if it works. I'll query Overture data using DuckDB with spatial features.
0:20
Bash
duckdb -c "SELECT * FROM read_parquet('s3://overturemaps-us-west-2/release/2024-...
bash: line 1: duckdb: command not found
0:23
Let me find the duckdb binary location or install it.
0:23
Bash
which duckdb || find / -name duckdb -type f 2>/dev/null | head -5
(no output)
0:33
Let me check what's available and how to best query Overture Maps data. I'll use Python with duckdb library.
0:33
Bash
python3 -c "
python3 -c "
import duckdb
con = duckdb.connect()
print(con.execute('SELECT current_setting(\'duckdb.version\')').fetchone())
"
Traceback (most recent call last):
File "<string>", line 4, in <module>
print(con.execute('SELECT current_setting(\'duckdb.version\')').fetchone())
~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
_duckdb.CatalogException: Catalog Error: unrecognized configuration parameter "duckdb.version"
Did you mean: "duckdb_api"
0:38
Let me work with Overture Maps data using Python. I need to: 1. Query Overture buildings data for Lagos State, Nigeria 2. Filter buildings with footprint area > 1000 m² 3. Get LGA boundaries for Lagos State 4. Spatial join to assign LGA to each building 5. Compute per-LGA summary statistics Let me check what's available and how to access Overture data.
0:38
Bash
python3 -c "
python3 -c "
import duckdb
con = duckdb.connect()
# Check if we can access S3
print(con.execute('SELECT * FROM parquet_schema(\'s3://overturemaps-us-west-2/release/2024-09-18.0/theme=buildings/type=building/bbox=0-0-0-0/*.parquet\')').fetchdf().head())
" 2>&1 | head -20
Traceback (most recent call last):
File "<string>", line 5, in <module>
print(con.execute('SELECT * FROM parquet_schema(\'s3://overturemaps-us-west-2/release/2024-09-18.0/theme=buildings/type=building/bbox=0-0-0-0/*.parquet\')').fetchdf().head())
~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
_duckdb.IOException: IO Error: No files found that match the pattern "s3://overturemaps-us-west-2/release/2024-09-18.0/theme=buildings/type=building/bbox=0-0-0-0/*.parquet"
LINE 1: SELECT * FROM parquet_schema('s3://overturemaps-us-west-2/release/2024...
^
0:42
Let me check the Overture data format and location. Overture maps data is on S3. Let me see what's available.
0:42
Bash
python3 -c "
python3 -c "
import duckdb
con = duckdb.connect()
con.execute('SET s3_region=\'us-west-2\'')
print(con.execute(\"SELECT * FROM parquet_schema('s3://overturemaps-us-west-2/release/2024-09-18.0/theme=buildings/type=building/*') LIMIT 20\").fetchdf())
" 2>&1 | head -40
Traceback (most recent call last):
File "<string>", line 5, in <module>
print(con.execute("SELECT * FROM parquet_schema('s3://overturemaps-us-west-2/release/2024-09-18.0/theme=buildings/type=building/*') LIMIT 20").fetchdf())
~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
_duckdb.IOException: IO Error: No files found that match the pattern "s3://overturemaps-us-west-2/release/2024-09-18.0/theme=buildings/type=building/*"
LINE 1: SELECT * FROM parquet_schema('s3://overturemaps-us-west-2/release/2024...
^
0:47
Let me check the available Overture releases and the correct S3 path pattern.
0:47
Bash
python3 -c "
python3 -c "
import duckdb
con = duckdb.connect()
con.execute('SET s3_region=\'us-west-2\'')
# Try listing the bucket
result = con.execute(\"SELECT * FROM parquet_schema('s3://overturemaps-us-west-2/release/*') LIMIT 20\").fetchdf()
print(result)
" 2>&1 | head -20
Traceback (most recent call last):
File "<string>", line 6, in <module>
result = con.execute("SELECT * FROM parquet_schema('s3://overturemaps-us-west-2/release/*') LIMIT 20").fetchdf()
~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
_duckdb.IOException: IO Error: No files found that match the pattern "s3://overturemaps-us-west-2/release/*"
LINE 1: SELECT * FROM parquet_schema('s3://overturemaps-us-west-2/release/*')...
^
0:53
DuckDB can't directly access S3 without proper credentials or configuration. Let me try using geopandas to read overture data. Overture provides GeoParquet files on S3, but we need to configure S3 access properly. Let me check if there are any credentials or if we can use the AWS SDK.
0:53
Bash
python3 -c "
python3 -c "
import os
# Check for AWS credentials
for k in ['AWS_ACCESS_KEY_ID', 'AWS_SECRET_ACCESS_KEY', 'AWS_DEFAULT_REGION', 'AWS_REGION']:
print(f'{k}={os.environ.get(k, \"NOT SET\")}')
"
AWS_ACCESS_KEY_ID=NOT SET AWS_SECRET_ACCESS_KEY=NOT SET AWS_DEFAULT_REGION=NOT SET AWS_REGION=NOT SET
0:57
Overture data is on S3 but we don't have AWS credentials. Let me check if we can access it through a public endpoint or use some other approach. The Overture maps data is publicly accessible on S3, but DuckDB needs to be configured properly to access it. Let me check if we can use the https/httpfs extension.
0:57
Bash
python3 -c "
python3 -c "
import duckdb
con = duckdb.connect()
# Check available extensions
con.execute('SELECT * FROM duckdb_extensions()').show()
" 2>&1
Traceback (most recent call last):
File "<string>", line 5, in <module>
con.execute('SELECT * FROM duckdb_extensions()').show()
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
AttributeError: '_duckdb.DuckDBPyConnection' object has no attribute 'show'
1:04
Let me try a different approach. Overture data is publicly accessible from S3 so we should be able to read it without credentials. Let me try using the HTTP server endpoint or the proper Overture release structure.
1:04
Bash
python3 -c "
python3 -c "
import duckdb
con = duckdb.connect()
# Try reading with httpfs using the Overture release URL
con.execute('INSTALL httpfs; LOAD httpfs;')
con.execute('SET s3_region=\'us-west-2\'')
# Actually Overture is public so we can use anonymous access
con.execute('SET s3_use_ssl=true')
result = con.execute(\"SELECT column_name, column_type FROM (DESCRIBE SELECT * FROM read_parquet('s3://overturemaps-us-west-2/release/2024-09-18.0/theme=buildings/type=building/*.parquet', hive_partitioning=true) LIMIT 0)\").fetchall()
for r in result:
print(r)
" 2>&1 | head -30
Traceback (most recent call last):
File "<string>", line 9, in <module>
result = con.execute("SELECT column_name, column_type FROM (DESCRIBE SELECT * FROM read_parquet('s3://overturemaps-us-west-2/release/2024-09-18.0/theme=buildings/type=building/*.parquet', hive_partitioning=true) LIMIT 0)").fetchall()
~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
_duckdb.IOException: IO Error: No files found that match the pattern "s3://overturemaps-us-west-2/release/2024-09-18.0/theme=buildings/type=building/*.parquet"
LINE 1: ...LECT column_name, column_type FROM (DESCRIBE SELECT * FROM read_parquet('s3://overturemaps-us-west-2/release/2024-09...
^
1:10
The S3 path pattern seems wrong. Let me check the Overture maps 2024-09-18.0 release structure more carefully. The Overture release uses a specific partitioning structure. Let me try using the `glob` pattern or reading the bucket directly.
1:10
Bash
python3 << 'PYEOF'
python3 << 'PYEOF'
import duckdb
con = duckdb.connect()
con.execute("INSTALL httpfs; LOAD httpfs;")
con.execute("SET s3_region='us-west-2'")
con.execute("SET s3_endpoint='s3.us-west-2.amazonaws.com'")
# Try glob pattern that matches Overture's structure
# Overture uses: release/YYYY-MM-DD.0/theme=<theme>/type=<type>/...
# Let's try listing the buildings
result = con.execute("""
SELECT count(*) FROM read_parquet('s3://overturemaps-us-west-2/release/2024-09-18.0/theme=buildings/type=building/**/*.parquet', filename=true, hive_partitioning=true)
""").fetchone()
print(result)
PYEOF
Traceback (most recent call last):
File "<stdin>", line 10, in <module>
_duckdb.IOException: IO Error: No files found that match the pattern "s3://overturemaps-us-west-2/release/2024-09-18.0/theme=buildings/type=building/**/*.parquet"
LINE 2: SELECT count(*) FROM read_parquet('s3://overturemaps-us-west-2/release/2024-09...
^
1:15
Let me check the actual bucket structure.
1:15
Bash
python3 << 'PYEOF'
python3 << 'PYEOF'
import urllib.request
import json
# Check Overture's release listing
url = "https://overturemaps-us-west-2.s3.us-west-2.amazonaws.com/?prefix=release/2024-09-18.0/"
try:
resp = urllib.request.urlopen(url)
content = resp.read().decode()
print(content[:2000])
except Exception as e:
print(f"Error: {e}")
PYEOF
<?xml version="1.0" encoding="UTF-8"?> <ListBucketResult xmlns="http://s3.amazonaws.com/doc/2006-03-01/"><Name>overturemaps-us-west-2</Name><Prefix>release/2024-09-18.0/</Prefix><Marker></Marker><MaxKeys>1000</MaxKeys><IsTruncated>false</IsTruncated></ListBucketResult>
1:22
The bucket listing is empty or we're not authorized. Let me try a different approach. Let me check the current Overture release structure by looking at their documentation or trying to access the data a different way. Actually, let me check what releases are available.
1:22
Bash
python3 << 'PYEOF'
python3 << 'PYEOF' import urllib.request import xml.etree.ElementTree as ET # List the top-level url = "https://overturemaps-us-west-2.s3.us-west-2.amazonaws.com/?prefix=release/" resp = urllib.request.urlopen(url) content = resp.read().decode() print(content[:3000]) PYEOF
<?xml version="1.0" encoding="UTF-8"?> <ListBucketResult xmlns="http://s3.amazonaws.com/doc/2006-03-01/"><Name>overturemaps-us-west-2</Name><Prefix>release/</Prefix><Marker></Marker><MaxKeys>1000</MaxKeys><IsTruncated>false</IsTruncated><Contents><Key>release/2026-05-20.0/theme=addresses/type=address/part-00000-31c0e023-6c05-5507-a6be-9582c0e29591-c000.zstd.parquet</Key><LastModified>2026-05-20T16:09:57.000Z</LastModified><ETag>"d2837a26120f888adc2668728b9132f4-11"</ETag><ChecksumAlgorithm>SHA256</ChecksumAlgorithm><ChecksumType>COMPOSITE</ChecksumType><Size>688939482</Size><Owner><ID>15b7f69019aeb60070a7e5e0abef900903bcad93b2cd252fc9ac14881f1cea14</ID></Owner><StorageClass>INTELLIGENT_TIERING</StorageClass></Contents><Contents><Key>release/2026-05-20.0/theme=addresses/type=address/part-00001-2df67527-dbc6-507f-b98e-49a66290a099-c000.zstd.parquet</Key><LastModified>2026-05-20T16:12:14.000Z</LastModified><ETag>"15b78f73e8570eec6519a91613e05a3e-11"</ETag><ChecksumAlgorithm>SHA256</ChecksumAlgorithm><ChecksumType>COMPOSITE</ChecksumType><Size>709031459</Size><Owner><ID>15b7f69019aeb60070a7e5e0abef900903bcad93b2cd252fc9ac14881f1cea14</ID></Owner><StorageClass>INTELLIGENT_TIERING</StorageClass></Contents><Contents><Key>release/2026-05-20.0/theme=addresses/type=address/part-00002-35508ac1-3c35-52f7-b97f-d49888be067a-c000.zstd.parquet</Key><LastModified>2026-05-20T16:11:36.000Z</LastModified><ETag>"76c9a09ef1d71cb51c7a6baec9d0b290-10"</ETag><ChecksumAlgorithm>SHA256</ChecksumAlgorithm><ChecksumType>COMPOSITE</ChecksumType><Size>643939151</Size><Owner><ID>15b7f69019aeb60070a7e5e0abef900903bcad93b2cd252fc9ac14881f1cea14</ID></Owner><StorageClass>INTELLIGENT_TIERING</StorageClass></Contents><Contents><Key>release/2026-05-20.0/theme=addresses/type=address/part-00003-271f45f9-79f3-5340-85ce-1161298146a8-c000.zstd.parquet</Key><LastModified>2026-05-20T16:13:18.000Z</LastModified><ETag>"3a87bbd3ccbe12f36f8b39f8b5c011c3-11"</ETag><ChecksumAlgorithm>SHA256</ChecksumAlgorithm><ChecksumType>COMPOSITE</ChecksumType><Size>726355163</Size><Owner><ID>15b7f69019aeb60070a7e5e0abef900903bcad93b2cd252fc9ac14881f1cea14</ID></Owner><StorageClass>INTELLIGENT_TIERING</StorageClass></Contents><Contents><Key>release/2026-05-20.0/theme=addresses/type=address/part-00004-3ffb4aa1-3a6d-51e4-8658-cad4718e93f8-c000.zstd.parquet</Key><LastModified>2026-05-20T16:15:12.000Z</LastModified><ETag>"1dc8566827c013ed860e961e5c750cc7-11"</ETag><ChecksumAlgorithm>SHA256</ChecksumAlgorithm><ChecksumType>COMPOSITE</ChecksumType><Size>689517806</Size><Owner><ID>15b7f69019aeb60070a7e5e0abef900903bcad93b2cd252fc9ac14881f1cea14</ID></Owner><StorageClass>INTELLIGENT_TIERING</StorageClass></Contents><Contents><Key>release/2026-05-20.0/theme=addresses/type=address/part-00005-5c5cc44e-c017-53f8-baa2-9360addf610e-c000.zstd.parquet</Key><LastModified>2026-05-20T16:10:59.000Z</LastModified><ETag>"0347e80934c6bea84a0333865ca1d71f-10"</ETag><Checksum
1:29
So the latest release is 2026-05-20.0, not 2024-09-18.0. Let me look for buildings in this release.
1:29
Bash
python3 << 'PYEOF'
python3 << 'PYEOF'
import urllib.request
import xml.etree.ElementTree as ET
# List buildings in the latest release
url = "https://overturemaps-us-west-2.s3.us-west-2.amazonaws.com/?prefix=release/2026-05-20.0/theme=buildings/type=building/"
resp = urllib.request.urlopen(url)
content = resp.read().decode()
# Parse XML
root = ET.fromstring(content)
ns = {'s3': 'http://s3.amazonaws.com/doc/2006-03-01/'}
keys = []
for contents in root.findall('s3:Contents', ns):
key = contents.find('s3:Key', ns).text
keys.append(key)
print(key)
print(f"\nTotal files: {len(keys)}")
PYEOF
release/2026-05-20.0/theme=buildings/type=building/part-00000-ddc67476-7254-536d-b4da-7c405a803f55-c000.zstd.parquet release/2026-05-20.0/theme=buildings/type=building/part-00001-f4de43c3-74e5-5dcf-939c-b7fd5d63590c-c000.zstd.parquet release/2026-05-20.0/theme=buildings/type=building/part-00002-b6f0b781-b07a-57b7-b297-e396f0d0691b-c000.zstd.parquet release/2026-05-20.0/theme=buildings/type=building/part-00003-1a188294-fdd0-5e36-b44c-377733edfd48-c000.zstd.parquet release/2026-05-20.0/theme=buildings/type=building/part-00004-accd68b1-a26c-5309-9da4-d84a3344386f-c000.zstd.parquet release/2026-05-20.0/theme=buildings/type=building/part-00005-750e1262-2a4e-5be7-be4b-58905d7c0b7b-c000.zstd.parquet release/2026-05-20.0/theme=buildings/type=building/part-00006-ad0a6b9a-1459-5aa3-842d-3929059d59a3-c000.zstd.parquet release/2026-05-20.0/theme=buildings/type=building/part-00007-a7e91b03-c993-537d-ad2d-342d3ed290b8-c000.zstd.parquet release/2026-05-20.0/theme=buildings/type=building/part-00008-025ead5a-4db2-5f3e-911a-c54aba2f8162-c000.zstd.parquet release/2026-05-20.0/theme=buildings/type=building/part-00009-e4c0b47c-f43a-58ed-93d8-d184b5ba9b41-c000.zstd.parquet release/2026-05-20.0/theme=buildings/type=building/part-00010-fb3c01fd-0e8f-50d9-8975-eff4ca11880b-c000.zstd.parquet release/2026-05-20.0/theme=buildings/type=building/part-00011-1023d82d-ca62-5521-a24a-1b083b061398-c000.zstd.parquet release/2026-05-20.0/theme=buildings/type=building/part-00012-dabb97e2-1efc-5407-98c5-f8d7a6620ab0-c000.zstd.parquet release/2026-05-20.0/theme=buildings/type=building/part-00013-c41e64a5-3c33-5214-b994-43997c43e606-c000.zstd.parquet release/2026-05-20.0/theme=buildings/type=building/part-00014-aa1b9760-5d6a-5a50-96ee-30033ca4a80c-c000.zstd.parquet release/2026-05-20.0/theme=buildings/type=building/part-00015-ac33d8ad-255a-5e5f-bbb1-5886594cb22b-c000.zstd.parquet release/2026-05-20.0/theme=buildings/type=building/part-00016-2e09ab1f-9dac-5cd1-a397-4ebda9a51f48-c000.zstd.parquet release/2026-05-20.0/theme=buildings/type=building/part-00017-96b2bcad-9734-5350-9cbb-1477f81274cd-c000.zstd.parquet release/2026-05-20.0/theme=buildings/type=building/part-00018-08e2adc6-7651-512d-9886-040ef2df21f8-c000.zstd.parquet release/2026-05-20.0/theme=buildings/type=building/part-00019-9b4b2f98-1d7d-5075-b37d-ebf2fd9e27a4-c000.zstd.parquet release/2026-05-20.0/theme=buildings/type=building/part-00020-3febfef6-3699-5c77-9c01-ddb824a955d5-c000.zstd.parquet release/2026-05-20.0/theme=buildings/type=building/part-00021-cce0b8f2-bd84-5308-a643-980e97da4300-c000.zstd.parquet release/2026-05-20.0/theme=buildings/type=building/part-00022-3b5434fc-7611-523b-b811-075f1b108b55-c000.zstd.parquet release/2026-05-20.0/theme=buildings/type=building/part-00023-f57a4b6b-1911-563c-b1d5-8b01fbc2019d-c000.zstd.parquet release/2026-05-20.0/theme=buildings/type=building/part-00024-35748a88-55f9-5607-a086-e49f61293bbd-c000.zstd.parquet release/2026-05-20.0/theme=buildings/type=building/part-00025-34cbcf07-2df7-57d7-be32-b68076ab6aba-c000.zstd.parquet release/2026-05-20.0/theme=buildings/type=building/part-00026-db319641-d4f6-59ae-a2a8-32aaf201c11f-c000.zstd.parquet release/2026-05-20.0/theme=buildings/type=building/part-00027-f00cfe89-53da-54c4-80fc-080dc1c3fa8e-c000.zstd.parquet release/2026-05-20.0/theme=buildings/type=building/part-00028-a45b27ff-3d86-53e9-9450-67c777b1ff8d-c000.zstd.parquet release/2026-05-20.0/theme=buildings/type=building/part-00029-459050d5-db85-520a-87e4-b088045aa42a-c000.zstd.parquet release/2026-05-20.0/theme=buildings/type=building/part-00030-6674f6f7-77f3-59cb-bdb2-65156682022f-c000.zstd.parquet release/2026-05-20.0/theme=buildings/type=building/part-00031-5ca8137b-c3bb-5fa9-8bc0-6c33c6ae9e6b-c000.zstd.parquet release/2026-05-20.0/theme=buildings/type=building/part-00032-b9a0a735-d193-5e21-b64d-304c5d2b5156-c000.zstd.parquet release/2026-05-20.0/theme=buildings/type=building/part-00033-f3ffd145-091a-5846-a6b1-8a9bd3d35287-c000.zstd.parquet release/2026-05-20.0/theme=buildings/type=building/part-00034-234ee8c8-4a01-5d47-8953-c70d31b63326-c000.zstd.parquet release/2026-05-20.0/theme=buildings/type=building/part-00035-65155caf-cc89-5a05-be97-ed73de00406b-c000.zstd.parquet release/2026-05-20.0/theme=buildings/type=building/part-00036-432168f8-b79b-5481-86bc-e1eac2ed1b21-c000.zstd.parquet release/2026-05-20.0/theme=buildings/type=building/part-00037-718570a5-334b-50ef-aa0f-2e37fbade12d-c000.zstd.parquet release/2026-05-20.0/theme=buildings/type=building/part-00038-546c2c78-cf95-5f8f-adec-465bcc4d9354-c000.zstd.parquet release/2026-05-20.0/theme=buildings/type=building/part-00039-13aab510-223b-5b83-a03d-9dbc72e6d62f-c000.zstd.parquet release/2026-05-20.0/theme=buildings/type=building/part-00040-0bb9f33d-faa0-5c5f-833c-6c0f8ad2dd77-c000.zstd.parquet release/2026-05-20.0/theme=buildings/type=building/part-00041-b81a3018-f9f3-5ed3-934d-bdac534cb1ce-c000.zstd.parquet release/2026-05-20.0/theme=buildings/type=building/part-00042-8721c921-2481-53c6-9352-4d9b0157dbed-c000.zstd.parquet release/2026-05-20.0/theme=buildings/type=building/part-00043-30bb1661-03ad-50ec-bd37-e07f65971cc4-c000.zstd.parquet release/2026-05-20.0/theme=buildings/type=building/part-00044-a7e2e32b-dc11-551d-bd30-4f3a78266f88-c000.zstd.parquet release/2026-05-20.0/theme=buildings/type=building/part-00045-9319d19a-5f3c-5e07-a186-0f0b5ce9a799-c000.zstd.parquet release/2026-05-20.0/theme=buildings/type=building/part-00046-772e79a5-2e4e-511a-883b-891d9e95e64f-c000.zstd.parquet release/2026-05-20.0/theme=buildings/type=building/part-00047-b5e5f987-a727-52e2-9f44-eee85c9990ef-c000.zstd.parquet release/2026-05-20.0/theme=buildings/type=building/part-00048-08aabcfc-acd5-5668-a806-2a7c0073844c-c000.zstd.parquet release/2026-05-20.0/theme=buildings/type=building/part-00049-848922cf-b7d5-57dc-999e-2cc8f6a6390d-c000.zstd.parquet release/2026-05-20.0/theme=buildings/type=building/part-00050-1636e4a8-b0ed-5891-ab9a-6a47d5cbc4ed-c000.zstd.parquet release/2026-05-20.0/theme=buildings/type=building/part-00051-ac06d99e-a8f3-5422-8ad9-c2bb3b7fd776-c000.zstd.parquet release/2026-05-20.0/theme=buildings/type=building/part-00052-5f043d60-4450-5055-939d-d8f13e17568e-c000.zstd.parquet release/2026-05-20.0/theme=buildings/type=building/part-00053-ef4d1442-18f4-5614-a3ca-5838ae32c916-c000.zstd.parquet release/2026-05-20.0/theme=buildings/type=building/part-00054-0bc29e49-df11-5b6b-bb46-c6a7bb4fe2d9-c000.zstd.parquet release/2026-05-20.0/theme=buildings/type=building/part-00055-cc0f3bd0-e61e-5621-a0c4-bc666afb2001-c000.zstd.parquet release/2026-05-20.0/theme=buildings/type=building/part-00056-e4a0bf68-c67a-574d-93de-967b87f4795d-c000.zstd.parquet release/2026-05-20.0/theme=buildings/type=building/part-00057-76e23feb-d693-56f4-94c4-eaa0c40faadd-c000.zstd.parquet release/2026-05-20.0/theme=buildings/type=building/part-00058-dd35f94b-5c72-5b28-995c-064cffbeef01-c000.zstd.parquet release/2026-05-20.0/theme=buildings/type=building/part-00059-de6b863d-18d0-58ff-aee7-f6a132a615f4-c000.zstd.parquet release/2026-05-20.0/theme=buildings/type=building/part-00060-525b0792-80e4-56c0-8d8d-6f95bf9ccc6f-c000.zstd.parquet release/2026-05-20.0/theme=buildings/type=building/part-00061-566b98d4-5bd3-54db-a788-e5893ddacc37-c000.zstd.parquet release/2026-05-20.0/theme=buildings/type=building/part-00062-b12db0fc-5e0b-5a18-b90a-be33b80e1b8e-c000.zstd.parquet release/2026-05-20.0/theme=buildings/type=building/part-00063-42213e01-3098-5ad0-99d4-9b1b119d5ce7-c000.zstd.parquet release/2026-05-20.0/theme=buildings/type=building/part-00064-eaa08e58-ace1-5c78-a7dc-bc55a1423995-c000.zstd.parquet release/2026-05-20.0/theme=buildings/type=building/part-00065-24f12bfa-48a1-55c7-b311-af9b9617be05-c000.zstd.parquet release/2026-05-20.0/theme=buildings/type=building/part-00066-cc7b238a-2acc-5510-9850-7b250a7f0135-c000.zstd.parquet release/2026-05-20.0/theme=buildings/type=building/part-00067-e6e09788-ed3b-509d-9e14-6126302a65ca-c000.zstd.parquet release/2026-05-20.0/theme=buildings/type=building/part-00068-8e1116b2-b891-5cf9-b3cc-59a3e4184443-c000.zstd.parquet release/2026-05-20.0/theme=buildings/type=building/part-00069-025c2f5b-3042-5c07-8afe-7b09d12478a1-c000.zstd.parquet release/2026-05-20.0/theme=buildings/type=building/part-00070-aa1fc855-b5a1-57be-b9e8-efb3f38bbb19-c000.zstd.parquet release/2026-05-20.0/theme=buildings/type=building/part-00071-27e69a4f-ee29-5725-8cac-a75fdd7a39fc-c000.zstd.parquet release/2026-05-20.0/theme=buildings/type=building/part-00072-ba338bb4-2c81-5f03-b3a5-5fa5b0552b6a-c000.zstd.parquet release/2026-05-20.0/theme=buildings/type=building/part-00073-1684ef98-f08d-5a22-ae91-17a0299ef3c5-c000.zstd.parquet release/2026-05-20.0/theme=buildings/type=building/part-00074-67b17ff3-8752-5f60-9a21-0e1f8945d25d-c000.zstd.parquet release/2026-05-20.0/theme=buildings/type=building/part-00075-85cefa8a-ac7a-5b8e-b58c-7da85e119465-c000.zstd.parquet release/2026-05-20.0/theme=buildings/type=building/part-00076-c1f7ab01-eacf-5c51-99fb-206ce9549384-c000.zstd.parquet release/2026-05-20.0/theme=buildings/type=building/part-00077-dbcb702d-5a8f-5b54-ac76-bc7249fc31f1-c000.zstd.parquet release/2026-05-20.0/theme=buildings/type=building/part-00078-9ee062cc-1458-5f9c-8829-2a2c782406ac-c000.zstd.parquet release/2026-05-20.0/theme=buildings/type=building/part-00079-6c62f3d4-86e2-579b-a61c-b84366af4731-c000.zstd.parquet release/2026-05-20.0/theme=buildings/type=building/part-00080-3086a6e4-e8b2-5a0a-8119-e05dff4e4129-c000.zstd.parquet release/2026-05-20.0/theme=buildings/type=building/part-00081-7c28cb4f-3d3e-51c0-8f00-2d2a6dbf3cce-c000.zstd.parquet release/2026-05-20.0/theme=buildings/type=building/part-00082-aa05a3a2-a59c-55a3-8556-7fcf9a3c38d5-c000.zstd.parquet release/2026-05-20.0/theme=buildings/type=building/part-00083-cd42b04c-7177-528d-a8b9-4e3cb841a76a-c000.zstd.parquet release/2026-05-20.0/theme=buildings/type=building/part-00084-f79c0fcb-fb64-5996-92e7-4355bd5a2804-c000.zstd.parquet release/2026-05-20.0/theme=buildings/type=building/part-00085-8d8af64b-2e53-50bc-859f-bd1b35f7db2d-c000.zstd.parquet release/2026-05-20.0/theme=buildings/type=building/part-00086-0e37d2bc-ac8c-50c9-91bc-37863b59ad43-c000.zstd.parquet release/2026-05-20.0/theme=buildings/type=building/part-00087-6de409f2-2148-551e-8ace-00a19b8657ca-c000.zstd.parquet release/2026-05-20.0/theme=buildings/type=building/part-00088-df603c44-665d-50ca-b07e-3ad68334aeb7-c000.zstd.parquet release/2026-05-20.0/theme=buildings/type=building/part-00089-01812f32-7ae8-5a33-81a3-d32f6184bcea-c000.zstd.parquet release/2026-05-20.0/theme=buildings/type=building/part-00090-ce8b0dfa-1121-5b12-bc6e-c7c68ad0f5bd-c000.zstd.parquet release/2026-05-20.0/theme=buildings/type=building/part-00091-a797fa82-42d5-5aa2-ba2b-a813cea8eb9a-c000.zstd.parquet release/2026-05-20.0/theme=buildings/type=building/part-00092-9f4e4a13-76b5-5fac-b951-4d8b6940bbf2-c000.zstd.parquet release/2026-05-20.0/theme=buildings/type=building/part-00093-41e2cec0-5302-5dcf-9d71-852ad704e51a-c000.zstd.parquet release/2026-05-20.0/theme=buildings/type=building/part-00094-78b7924d-b0c5-593d-b6b2-2ac77ffbc544-c000.zstd.parquet release/2026-05-20.0/theme=buildings/type=building/part-00095-9f6750fd-2c5e-5d84-8d70-26625d0ee176-c000.zstd.parquet release/2026-05-20.0/theme=buildings/type=building/part-00096-02960212-e05c-56e7-ae78-aafb4a733c37-c000.zstd.parquet release/2026-05-20.0/theme=buildings/type=building/part-00097-ce821039-8c1f-565f-bf1d-cbb91cf202a6-c000.zstd.parquet release/2026-05-20.0/theme=buildings/type=building/part-00098-99e24b01-edc4-57ec-8162-d5371cf4a2e9-c000.zstd.parquet release/2026-05-20.0/theme=buildings/type=building/part-00099-d8b2416b-b6c7-5f4c-86af-3b1be45df747-c000.zstd.parquet release/2026-05-20.0/theme=buildings/type=building/part-00100-79eda48a-74af-5fbf-95c0-18006e141b16-c000.zstd.parquet release/2026-05-20.0/theme=buildings/type=building/part-00101-25bed595-3b54-5063-95f1-56dfbdd2fb5a-c000.zstd.parquet release/2026-05-20.0/theme=buildings/type=building/part-00102-118d9d34-6e8c-575b-b796-e33ee701a024-c000.zstd.parquet release/2026-05-20.0/theme=buildings/type=building/part-00103-57827f27-3c61-5aad-baa2-2b09a4811267-c000.zstd.parquet release/2026-05-20.0/theme=buildings/type=building/part-00104-c8f1be40-df3d-5388-bef6-02674ec1996b-c000.zstd.parquet release/2026-05-20.0/theme=buildings/type=building/part-00105-fe77292c-db99-58ea-a319-bab6318c046d-c000.zstd.parquet release/2026-05-20.0/theme=buildings/type=building/part-00106-8ee64d3b-cb40-5f94-8ac8-a124545f2290-c000.zstd.parquet release/2026-05-20.0/theme=buildings/type=building/part-00107-9f0cea5a-5f8a-50a8-8ab6-5589b8840e58-c000.zstd.parquet release/2026-05-20.0/theme=buildings/type=building/part-00108-1f1ab852-b2e1-5742-a61b-8418589e7939-c000.zstd.parquet release/2026-05-20.0/theme=buildings/type=building/part-00109-90bef44a-6107-518d-912e-e8b1d98549dc-c000.zstd.parquet release/2026-05-20.0/theme=buildings/type=building/part-00110-ce21f003-c995-5129-aa0f-fc65b04cc598-c000.zstd.parquet release/2026-05-20.0/theme=buildings/type=building/part-00111-54c3137c-e043-5307-9cc6-8e424fbb6009-c000.zstd.parquet release/2026-05-20.0/theme=buildings/type=building/part-00112-26a753b2-c3b9-578d-ace0-52189cd1e484-c000.zstd.parquet release/2026-05-20.0/theme=buildings/type=building/part-00113-74bbb9bd-0fe1-57c2-b525-371a4f17909e-c000.zstd.parquet release/2026-05-20.0/theme=buildings/type=building/part-00114-0bbcca89-7c64-59af-ade3-2959b464f1cf-c000.zstd.parquet release/2026-05-20.0/theme=buildings/type=building/part-00115-834c0750-b303-5f54-a421-aa9ca209e035-c000.zstd.parquet release/2026-05-20.0/theme=buildings/type=building/part-00116-6411b233-f44d-5aec-9cf5-d5576c910c78-c000.zstd.parquet release/2026-05-20.0/theme=buildings/type=building/part-00117-8a0ca779-3043-53eb-8981-87a4c900b237-c000.zstd.parquet release/2026-05-20.0/theme=buildings/type=building/part-00118-7e5eb552-6775-5c9f-8119-323b73893002-c000.zstd.parquet release/2026-05-20.0/theme=buildings/type=building/part-00119-fb1be2ad-1b11-5c9e-aa1b-0952d774e243-c000.zstd.parquet release/2026-05-20.0/theme=buildings/type=building/part-00120-f930a03d-acde-54e9-a973-89edaf89b65f-c000.zstd.parquet release/2026-05-20.0/theme=buildings/type=building/part-00121-e94193fd-4352-5b3d-8ee7-6c7a7bf0ed29-c000.zstd.parquet release/2026-05-20.0/theme=buildings/type=building/part-00122-9a9307a9-ec64-579b-90ca-f0cafa3e7942-c000.zstd.parquet release/2026-05-20.0/theme=buildings/type=building/part-00123-9e8c1169-9167-5b22-a52c-e9d12d4ba888-c000.zstd.parquet release/2026-05-20.0/theme=buildings/type=building/part-00124-c542d215-e5eb-56b5-a84e-28521ee13ab0-c000.zstd.parquet release/2026-05-20.0/theme=buildings/type=building/part-00125-0d1e4188-6d5f-513c-ac23-c596d6b223ba-c000.zstd.parquet release/2026-05-20.0/theme=buildings/type=building/part-00126-30bdba33-380a-5fec-9ffa-6905e5e828bb-c000.zstd.parquet release/2026-05-20.0/theme=buildings/type=building/part-00127-95239d59-dd38-5d3e-948f-a4afa73cbd67-c000.zstd.parquet release/2026-05-20.0/the [... truncated ...]rquet release/2026-05-20.0/theme=buildings/type=building/part-00384-63b769c7-8b88-5b04-89da-5c411b5c0dd8-c000.zstd.parquet release/2026-05-20.0/theme=buildings/type=building/part-00385-b862403b-8bdc-5ddf-92f2-090509a1e291-c000.zstd.parquet release/2026-05-20.0/theme=buildings/type=building/part-00386-5b7a6012-89ae-5013-b92d-4cdf729f089e-c000.zstd.parquet release/2026-05-20.0/theme=buildings/type=building/part-00387-0c9d406c-6fc9-573d-b4ed-ed6f95657df4-c000.zstd.parquet release/2026-05-20.0/theme=buildings/type=building/part-00388-1254462e-7230-57f9-aee4-b7c760cf06bf-c000.zstd.parquet release/2026-05-20.0/theme=buildings/type=building/part-00389-f9ff0fd4-4fb4-5804-b382-5327990882a1-c000.zstd.parquet release/2026-05-20.0/theme=buildings/type=building/part-00390-580124bf-92b7-523f-ab38-ba2bfccd761d-c000.zstd.parquet release/2026-05-20.0/theme=buildings/type=building/part-00391-ec38c8a5-dacc-5900-bd82-d44f2e979abe-c000.zstd.parquet release/2026-05-20.0/theme=buildings/type=building/part-00392-0dfa7ac1-648a-5ac0-9240-e66c5ce91883-c000.zstd.parquet release/2026-05-20.0/theme=buildings/type=building/part-00393-d8061f05-cd99-567a-bc58-253de8469517-c000.zstd.parquet release/2026-05-20.0/theme=buildings/type=building/part-00394-b154e2fc-4ce8-58c2-9769-d0a87927bd66-c000.zstd.parquet release/2026-05-20.0/theme=buildings/type=building/part-00395-d4972204-6aa7-5be8-a134-38fe5d8de121-c000.zstd.parquet release/2026-05-20.0/theme=buildings/type=building/part-00396-bc1b2beb-d679-5fb6-b984-e4ffdeef80d9-c000.zstd.parquet release/2026-05-20.0/theme=buildings/type=building/part-00397-f0aaa861-9277-5dd0-bf83-8f59362c3ed2-c000.zstd.parquet release/2026-05-20.0/theme=buildings/type=building/part-00398-2b277c81-28cf-5a3e-b125-e7757e6ec3f7-c000.zstd.parquet release/2026-05-20.0/theme=buildings/type=building/part-00399-bb22c769-871d-53ee-8fb5-38c437658c5b-c000.zstd.parquet release/2026-05-20.0/theme=buildings/type=building/part-00400-5569b16d-f531-5bdd-afe7-59bf00cbb251-c000.zstd.parquet release/2026-05-20.0/theme=buildings/type=building/part-00401-08f6387f-4efe-53d2-8253-39affc71a3e8-c000.zstd.parquet release/2026-05-20.0/theme=buildings/type=building/part-00402-17af27bc-1c2f-5367-9399-d1a96bd74fdc-c000.zstd.parquet release/2026-05-20.0/theme=buildings/type=building/part-00403-5d82deb5-6b6f-5534-850d-390ac989e814-c000.zstd.parquet release/2026-05-20.0/theme=buildings/type=building/part-00404-a9250c1b-6cb3-562c-b34c-20d51f10ebdd-c000.zstd.parquet release/2026-05-20.0/theme=buildings/type=building/part-00405-ceb1ed6e-94de-5127-822f-06a1cc33380c-c000.zstd.parquet release/2026-05-20.0/theme=buildings/type=building/part-00406-9bda817c-3eda-59c8-9f5e-f841cc9252b0-c000.zstd.parquet release/2026-05-20.0/theme=buildings/type=building/part-00407-5954c6fd-b62d-5dba-9eaa-2dc71f5ae064-c000.zstd.parquet release/2026-05-20.0/theme=buildings/type=building/part-00408-7d34e7b3-e8e2-5d46-b2fe-79f158d22779-c000.zstd.parquet release/2026-05-20.0/theme=buildings/type=building/part-00409-35b92d0c-1208-5b94-98d5-1f96952d5d75-c000.zstd.parquet release/2026-05-20.0/theme=buildings/type=building/part-00410-d3ac45a2-9598-5137-a0ae-011fb2c02c46-c000.zstd.parquet release/2026-05-20.0/theme=buildings/type=building/part-00411-66424a40-015b-53fb-84d2-d4b8237ab1dd-c000.zstd.parquet release/2026-05-20.0/theme=buildings/type=building/part-00412-25596a44-ed27-57ea-b429-d5e9885a3eb6-c000.zstd.parquet release/2026-05-20.0/theme=buildings/type=building/part-00413-a6e5d668-5c55-5559-aed8-a1569e17bfed-c000.zstd.parquet release/2026-05-20.0/theme=buildings/type=building/part-00414-26186971-8033-511c-a99e-b6d4926f0064-c000.zstd.parquet release/2026-05-20.0/theme=buildings/type=building/part-00415-2825f635-6655-548a-89fb-bfc3267a7799-c000.zstd.parquet release/2026-05-20.0/theme=buildings/type=building/part-00416-ff7f1d92-e37c-5d15-99db-39f300bd5072-c000.zstd.parquet release/2026-05-20.0/theme=buildings/type=building/part-00417-cd2ff1f5-92dd-51db-8967-cdea4120fe6c-c000.zstd.parquet release/2026-05-20.0/theme=buildings/type=building/part-00418-eb35ee37-3bbc-571a-a259-3853ef564db2-c000.zstd.parquet release/2026-05-20.0/theme=buildings/type=building/part-00419-bd9aa58d-1253-5e69-abf1-b3b9b4c15c30-c000.zstd.parquet release/2026-05-20.0/theme=buildings/type=building/part-00420-b2fb0c36-8190-5ef1-b7c6-e1d5d781b334-c000.zstd.parquet release/2026-05-20.0/theme=buildings/type=building/part-00421-766f1977-e6aa-587b-8b4c-6cb7dcd2248d-c000.zstd.parquet release/2026-05-20.0/theme=buildings/type=building/part-00422-a794fece-d226-5ab7-a16f-ed7445af799e-c000.zstd.parquet release/2026-05-20.0/theme=buildings/type=building/part-00423-fdf1d8c1-a6bd-5e97-8c5d-92897c057ac6-c000.zstd.parquet release/2026-05-20.0/theme=buildings/type=building/part-00424-1636e7da-5391-5b7b-9ff0-8abfbc168146-c000.zstd.parquet release/2026-05-20.0/theme=buildings/type=building/part-00425-ceaef95e-6533-5485-9b73-d8f5ddf0f42a-c000.zstd.parquet release/2026-05-20.0/theme=buildings/type=building/part-00426-9e1b5c1f-4880-5a2d-a2d8-0c937f09d2c0-c000.zstd.parquet release/2026-05-20.0/theme=buildings/type=building/part-00427-e5073581-d6ef-51d1-89cc-439ca52f56a3-c000.zstd.parquet release/2026-05-20.0/theme=buildings/type=building/part-00428-09e8f2b1-13f3-5430-bc2e-8f2a04b22e15-c000.zstd.parquet release/2026-05-20.0/theme=buildings/type=building/part-00429-0e3ae1b1-7f3b-547a-b782-1ff7962f8dc0-c000.zstd.parquet release/2026-05-20.0/theme=buildings/type=building/part-00430-c8a905db-09e9-5630-86b0-81be1f9f641e-c000.zstd.parquet release/2026-05-20.0/theme=buildings/type=building/part-00431-f88f81ab-ccb0-5b18-b12f-93f053f79073-c000.zstd.parquet release/2026-05-20.0/theme=buildings/type=building/part-00432-34a0418f-434d-5383-bd6b-d8dcf2968f2b-c000.zstd.parquet release/2026-05-20.0/theme=buildings/type=building/part-00433-53a12fef-ea8a-553d-bff3-d85cc37df468-c000.zstd.parquet release/2026-05-20.0/theme=buildings/type=building/part-00434-7d6998e6-9537-5d56-b11d-7095a96b18bd-c000.zstd.parquet release/2026-05-20.0/theme=buildings/type=building/part-00435-35ca826f-5e87-530a-9998-db502cf6e738-c000.zstd.parquet release/2026-05-20.0/theme=buildings/type=building/part-00436-79b32368-6083-50c2-8418-9508052490fb-c000.zstd.parquet release/2026-05-20.0/theme=buildings/type=building/part-00437-9a74453e-0f12-5630-9821-642bd39924e0-c000.zstd.parquet release/2026-05-20.0/theme=buildings/type=building/part-00438-514b946f-1252-5d53-8ed9-98bcf918396f-c000.zstd.parquet release/2026-05-20.0/theme=buildings/type=building/part-00439-4e47c967-e662-56e6-8c75-e0cf888482af-c000.zstd.parquet release/2026-05-20.0/theme=buildings/type=building/part-00440-8e16fcfb-d65e-54a5-ab55-b59f5b7091f6-c000.zstd.parquet release/2026-05-20.0/theme=buildings/type=building/part-00441-9ad77ea6-9f3f-5601-b82f-b1c0f283f6a6-c000.zstd.parquet release/2026-05-20.0/theme=buildings/type=building/part-00442-acef933c-f891-5c28-8295-ec1fb604c8ad-c000.zstd.parquet release/2026-05-20.0/theme=buildings/type=building/part-00443-3b507f05-d2d2-5dc4-bbff-f77ca8f40e88-c000.zstd.parquet release/2026-05-20.0/theme=buildings/type=building/part-00444-c758194f-8660-57f6-b660-6d7341d5c7fe-c000.zstd.parquet release/2026-05-20.0/theme=buildings/type=building/part-00445-78b349aa-70cf-59a3-a0de-91147cb6269d-c000.zstd.parquet release/2026-05-20.0/theme=buildings/type=building/part-00446-9c69e1a5-9019-5a9d-8339-14d3b571068b-c000.zstd.parquet release/2026-05-20.0/theme=buildings/type=building/part-00447-14473460-69ec-5a06-a795-2099c9b95a23-c000.zstd.parquet release/2026-05-20.0/theme=buildings/type=building/part-00448-6307f695-1235-558a-bdc7-5edb0a20d1b9-c000.zstd.parquet release/2026-05-20.0/theme=buildings/type=building/part-00449-cc1673fa-282d-5f3e-89a5-32f23cd251df-c000.zstd.parquet release/2026-05-20.0/theme=buildings/type=building/part-00450-98ab3f61-77ed-5a75-b73a-5942320d4b7c-c000.zstd.parquet release/2026-05-20.0/theme=buildings/type=building/part-00451-6d4d9b75-77ef-5ea4-a364-14a68ac04b23-c000.zstd.parquet release/2026-05-20.0/theme=buildings/type=building/part-00452-66996d8d-75d1-5efc-97d8-df1ea9fe4251-c000.zstd.parquet release/2026-05-20.0/theme=buildings/type=building/part-00453-9d8b9026-71f2-520d-b1c9-423584f6da77-c000.zstd.parquet release/2026-05-20.0/theme=buildings/type=building/part-00454-cb179c50-3437-5b00-8cfd-d086c37bf86b-c000.zstd.parquet release/2026-05-20.0/theme=buildings/type=building/part-00455-dcbd2fce-5127-5c67-ab4e-ed2ea0f01e32-c000.zstd.parquet release/2026-05-20.0/theme=buildings/type=building/part-00456-c71ff78d-1f24-5e7a-a146-525df0061f5a-c000.zstd.parquet release/2026-05-20.0/theme=buildings/type=building/part-00457-bc581576-d2ff-5657-be51-d7ccd760153a-c000.zstd.parquet release/2026-05-20.0/theme=buildings/type=building/part-00458-75d02402-c381-53d1-9055-6fed09f0bb80-c000.zstd.parquet release/2026-05-20.0/theme=buildings/type=building/part-00459-8be3989b-524f-52f8-8c67-a380278910c3-c000.zstd.parquet release/2026-05-20.0/theme=buildings/type=building/part-00460-1aaac0bd-7c82-5c9e-80f1-0e233b17327a-c000.zstd.parquet release/2026-05-20.0/theme=buildings/type=building/part-00461-b5d2f57e-8f4f-5ded-9693-7bac10c1fea9-c000.zstd.parquet release/2026-05-20.0/theme=buildings/type=building/part-00462-624bb5fc-479f-56ee-bcfb-90bde92ee14f-c000.zstd.parquet release/2026-05-20.0/theme=buildings/type=building/part-00463-5409b1c4-01c6-5ae5-ac70-6c6d2afed13f-c000.zstd.parquet release/2026-05-20.0/theme=buildings/type=building/part-00464-ae040d7e-9f6e-597f-8e9d-c6d3da8ce236-c000.zstd.parquet release/2026-05-20.0/theme=buildings/type=building/part-00465-9da3fa33-8706-5f9b-a793-e6d9c6d890f9-c000.zstd.parquet release/2026-05-20.0/theme=buildings/type=building/part-00466-c378714e-e6c3-51d0-abd8-001f2182e018-c000.zstd.parquet release/2026-05-20.0/theme=buildings/type=building/part-00467-1262d79f-0d99-5dac-9df1-923c093855d6-c000.zstd.parquet release/2026-05-20.0/theme=buildings/type=building/part-00468-6157eb4d-ba41-56bd-b0ce-c85c64420f93-c000.zstd.parquet release/2026-05-20.0/theme=buildings/type=building/part-00469-26dd233d-43ea-52b7-a618-28df67c1232a-c000.zstd.parquet release/2026-05-20.0/theme=buildings/type=building/part-00470-b763313b-7474-5295-89d6-157f415d9e63-c000.zstd.parquet release/2026-05-20.0/theme=buildings/type=building/part-00471-bc523b3f-0dfe-5b80-a8fc-4c7613b80a57-c000.zstd.parquet release/2026-05-20.0/theme=buildings/type=building/part-00472-356a065d-949f-5579-8d43-840a8c7e24fa-c000.zstd.parquet release/2026-05-20.0/theme=buildings/type=building/part-00473-c5bb37c1-ab3a-5a85-a367-a1a55c0f4c1b-c000.zstd.parquet release/2026-05-20.0/theme=buildings/type=building/part-00474-c49eec5e-e682-5284-9076-3fb78744b52e-c000.zstd.parquet release/2026-05-20.0/theme=buildings/type=building/part-00475-9eed6d49-8f13-5b0c-9b93-fc7d8a940bd9-c000.zstd.parquet release/2026-05-20.0/theme=buildings/type=building/part-00476-61153faf-fc6d-52a6-888c-26aa5a65a70c-c000.zstd.parquet release/2026-05-20.0/theme=buildings/type=building/part-00477-75ded13a-c075-5cde-b69a-9e0e647e956d-c000.zstd.parquet release/2026-05-20.0/theme=buildings/type=building/part-00478-98399d28-99d8-5e48-9f77-2c19536b4d90-c000.zstd.parquet release/2026-05-20.0/theme=buildings/type=building/part-00479-c7b773c3-fb5f-5762-90e2-bb16143ec291-c000.zstd.parquet release/2026-05-20.0/theme=buildings/type=building/part-00480-9735e0c8-d5c5-5f76-893f-8ff1f5ce28bc-c000.zstd.parquet release/2026-05-20.0/theme=buildings/type=building/part-00481-619e852c-607f-5913-8a75-0c6c95b31da0-c000.zstd.parquet release/2026-05-20.0/theme=buildings/type=building/part-00482-fcc47ffd-de93-51b6-bb6a-9924f886709d-c000.zstd.parquet release/2026-05-20.0/theme=buildings/type=building/part-00483-980f9d63-bea9-59ec-aa47-f97cc8281e9d-c000.zstd.parquet release/2026-05-20.0/theme=buildings/type=building/part-00484-84d9753d-9cd1-5c8f-a3a1-5cf73708ff73-c000.zstd.parquet release/2026-05-20.0/theme=buildings/type=building/part-00485-5d38de3f-3574-5828-8894-fda385c28239-c000.zstd.parquet release/2026-05-20.0/theme=buildings/type=building/part-00486-a8e53b3a-bab6-5234-8d05-562e34c72f10-c000.zstd.parquet release/2026-05-20.0/theme=buildings/type=building/part-00487-63736c23-ff09-5304-adbb-0c3ba4650de1-c000.zstd.parquet release/2026-05-20.0/theme=buildings/type=building/part-00488-2eee0858-6d5f-5fd9-9aa5-f00c206a9582-c000.zstd.parquet release/2026-05-20.0/theme=buildings/type=building/part-00489-7494bea1-e067-509c-a0ca-16f37e985e26-c000.zstd.parquet release/2026-05-20.0/theme=buildings/type=building/part-00490-9b34070f-e13a-5276-ac84-176f58b4f068-c000.zstd.parquet release/2026-05-20.0/theme=buildings/type=building/part-00491-0004d26f-d9db-58c8-9943-143d18cc3cec-c000.zstd.parquet release/2026-05-20.0/theme=buildings/type=building/part-00492-0664164f-fd94-527e-b7f7-6161d9b8a071-c000.zstd.parquet release/2026-05-20.0/theme=buildings/type=building/part-00493-6c7fcdf6-1d7b-5a58-8881-0509b541a882-c000.zstd.parquet release/2026-05-20.0/theme=buildings/type=building/part-00494-70d49d7b-9cd2-5fc5-b9f3-885922fb41d3-c000.zstd.parquet release/2026-05-20.0/theme=buildings/type=building/part-00495-f828bba8-1eda-58e4-a375-07f1fdfe1aca-c000.zstd.parquet release/2026-05-20.0/theme=buildings/type=building/part-00496-fb883ac5-4ae7-55d1-901c-48abae7b38e5-c000.zstd.parquet release/2026-05-20.0/theme=buildings/type=building/part-00497-6c446824-0aa8-5d79-9978-4d84d7afa7cb-c000.zstd.parquet release/2026-05-20.0/theme=buildings/type=building/part-00498-bb4db278-a3b6-55c3-9c6d-6d80fd889150-c000.zstd.parquet release/2026-05-20.0/theme=buildings/type=building/part-00499-9e579710-19fb-57cd-95c1-c716cd80f6f2-c000.zstd.parquet release/2026-05-20.0/theme=buildings/type=building/part-00500-b3e444d7-892b-5d04-8ef9-96e5058581f9-c000.zstd.parquet release/2026-05-20.0/theme=buildings/type=building/part-00501-419e7568-cbd5-5a6c-ad22-06c3258568bb-c000.zstd.parquet release/2026-05-20.0/theme=buildings/type=building/part-00502-351bb0c3-b9f3-5433-8bf2-209720a4e0b8-c000.zstd.parquet release/2026-05-20.0/theme=buildings/type=building/part-00503-f3607c09-79f0-52b0-b68d-2d72d8e1fa17-c000.zstd.parquet release/2026-05-20.0/theme=buildings/type=building/part-00504-98d8fe2c-e7e0-553c-b6d3-b8e10083cb80-c000.zstd.parquet release/2026-05-20.0/theme=buildings/type=building/part-00505-8e2a56eb-fcc8-53ce-8ae4-3647b54dc544-c000.zstd.parquet release/2026-05-20.0/theme=buildings/type=building/part-00506-737fd210-8c60-578d-8f5a-4f3362c4152a-c000.zstd.parquet release/2026-05-20.0/theme=buildings/type=building/part-00507-db876cf1-d2ac-542e-bf19-aa51e22b9313-c000.zstd.parquet release/2026-05-20.0/theme=buildings/type=building/part-00508-fb8a2521-9a1e-59fe-b4c3-13fb703470d0-c000.zstd.parquet release/2026-05-20.0/theme=buildings/type=building/part-00509-f37eb431-6415-5a42-af67-1fbb989d2451-c000.zstd.parquet release/2026-05-20.0/theme=buildings/type=building/part-00510-1b010e2d-8979-5a2a-aeec-4d7c3ad5ad14-c000.zstd.parquet release/2026-05-20.0/theme=buildings/type=building/part-00511-625c74da-89f0-5a08-9db3-093e2432f62c-c000.zstd.parquet Total files: 512
1:40
Good, the latest release is 2026-05-20.0. Let me now query the schema of Overture buildings to understand the columns, and also check what Nigeria/Lagos data looks like.
1:40
Bash
python3 << 'PYEOF'
python3 << 'PYEOF'
import urllib.request
import xml.etree.ElementTree as ET
# Check schema first - read one parquet file headers
# Let's use pyarrow to check schema from S3
import pyarrow.parquet as pq
import s3fs
# Need to check what's available
url = "https://overturemaps-us-west-2.s3.us-west-2.amazonaws.com/release/2026-05-20.0/theme=buildings/type=building/part-00000-ddc67476-7254-536d-b4da-7c405a803f55-c000.zstd.parquet"
# Use pyarrow with anonymous access
fs = s3fs.S3FileSystem(anon=True)
try:
schema = pq.read_schema(url, filesystem=fs)
print("Schema fields:")
for field in schema:
print(f" {field.name}: {field.type}")
except Exception as e:
print(f"Error: {e}")
# Try with fsspec
import fsspec
print("Trying fsspec...")
PYEOF
Traceback (most recent call last): File "<stdin>", line 7, in <module> ModuleNotFoundError: No module named 's3fs'
1:45
Bash
pip install --user s3fs 2>&1 | tail -5
Downloading frozenlist-1.8.0-cp314-cp314-manylinux1_x86_64.manylinux_2_28_x86_64.manylinux_2_5_x86_64.whl (231 kB) Downloading propcache-0.5.2-cp314-cp314-manylinux2014_x86_64.manylinux_2_17_x86_64.manylinux_2_28_x86_64.whl (61 kB) Installing collected packages: wrapt, propcache, multidict, jmespath, fsspec, frozenlist, aioitertools, aiohappyeyeballs, yarl, botocore, aiosignal, aiohttp, aiobotocore, s3fs Successfully installed aiobotocore-3.7.0 aiohappyeyeballs-2.6.2 aiohttp-3.14.1 aioitertools-0.13.0 aiosignal-1.4.0 botocore-1.43.0 frozenlist-1.8.0 fsspec-2026.6.0 jmespath-1.1.0 multidict-6.7.1 propcache-0.5.2 s3fs-2026.6.0 wrapt-2.2.1 yarl-1.24.2
1:59
Bash
python3 << 'PYEOF'
python3 << 'PYEOF'
import pyarrow.parquet as pq
import s3fs
fs = s3fs.S3FileSystem(anon=True)
# Read schema from one file
url = "s3://overturemaps-us-west-2/release/2026-05-20.0/theme=buildings/type=building/part-00000-ddc67476-7254-536d-b4da-7c405a803f55-c000.zstd.parquet"
schema = pq.read_schema(url, filesystem=fs)
print("Schema fields:")
for field in schema:
print(f" {field.name}: {field.type}")
PYEOF
Schema fields:
id: string
names: struct<primary: string, common: map<string, string ('common')>, rules: list<element: struct<variant: string, language: string, perspectives: struct<mode: string, countries: list<element: string>>, value: string, between: list<element: double>, side: string>>>
sources: list<element: struct<property: string, dataset: string, license: string, record_id: string, update_time: string, confidence: double, between: list<element: double>>>
level: int32
height: double
min_height: double
is_underground: bool
num_floors: int32
num_floors_underground: int32
min_floor: int32
subtype: string
class: string
facade_color: string
facade_material: string
roof_material: string
roof_shape: string
roof_direction: double
roof_orientation: string
roof_color: string
roof_height: double
geometry: binary
has_parts: bool
version: int32
bbox: struct<xmin: double, xmax: double, ymin: double, ymax: double>
2:13
Great! So the Overture buildings schema has columns like `id`, `height`, `geometry`, etc. Now I need to: 1. Filter buildings in Lagos State, Nigeria 2. Filter those with footprint area > 1000 m² 3. Compute per-LGA roll-up Let me first get the Lagos State boundary to filter. I need a geometry for Lagos State, Nigeria. Let me get it from Natural Earth or another source. Actually, the Overture buildings data has global coverage. I need to filter by Lagos State, Nigeria. I can get the boundary of Lagos State from Natural Earth data, or use a bounding box. Let me check what the bounding box of Lagos State is. Actually, I think the easiest approach is: 1. Get a Lagos State boundary from naturalearth_lowres or gadm or similar 2. Use spatial filtering to select only buildings within Lagos State 3. Join with LGA boundaries But wait - I don't have LGA boundaries. Let me think about the workflow. Actually, I should: 1. Get Lagos State outline and LGA boundaries from some source 2. Filter buildings within Lagos State 3. Filter those > 1000 m² footprint area 4. Compute the per-LGA summary Let me check if there's a Lagos State boundary available through geopandas built-in datasets or if I need to use something else. Actually, I can download administrative boundaries from naturalearthdata.com or use Overture's administrative boundaries. Let me check Overture's admin boundaries. Actually, let me first just figure out the bounding box of Lagos State and use that to narrow down the buildings data. Then I'll need LGA boundaries. Let me use Natural Earth data for country and admin boundaries.
2:13
Bash
python3 << 'PYEOF'
python3 << 'PYEOF'
import pyarrow.parquet as pq
import s3fs
fs = s3fs.S3FileSystem(anon=True)
# Check the divisions/administrative boundaries from Overture
# Let's look at what themes are available
import urllib.request
import xml.etree.ElementTree as ET
url = "https://overturemaps-us-west-2.s3.us-west-2.amazonaws.com/?prefix=release/2026-05-20.0/theme=divisions/"
resp = urllib.request.urlopen(url)
content = resp.read().decode()
root = ET.fromstring(content)
ns = {'s3': 'http://s3.amazonaws.com/doc/2006-03-01/'}
for contents in root.findall('s3:Contents', ns):
key = contents.find('s3:Key', ns).text
print(key)
PYEOF
release/2026-05-20.0/theme=divisions/type=division/part-00000-20106c43-63b4-56d1-81ac-1de154f6e5cb-c000.zstd.parquet release/2026-05-20.0/theme=divisions/type=division_area/part-00000-20b4270d-ad56-5e61-86d3-ee6b0af5789c-c000.zstd.parquet release/2026-05-20.0/theme=divisions/type=division_area/part-00001-0f46ac69-aac0-5c92-83fe-7bbfc0b0b1f4-c000.zstd.parquet release/2026-05-20.0/theme=divisions/type=division_area/part-00002-e7b1aeb4-0a8c-5609-8d73-1633e86e5ee6-c000.zstd.parquet release/2026-05-20.0/theme=divisions/type=division_area/part-00003-8a1f9a09-1fa6-52df-85a4-5c3f8a5ba71d-c000.zstd.parquet release/2026-05-20.0/theme=divisions/type=division_area/part-00004-580a2fdd-cd3a-5a09-af83-f5629a629db2-c000.zstd.parquet release/2026-05-20.0/theme=divisions/type=division_area/part-00005-754fe1eb-cbda-554a-9454-3ddf6a004d7b-c000.zstd.parquet release/2026-05-20.0/theme=divisions/type=division_area/part-00006-ae943f0e-f148-55fa-b082-9e7b8a9e3fdf-c000.zstd.parquet release/2026-05-20.0/theme=divisions/type=division_area/part-00007-fb803484-32e2-5175-ab82-3e65539fb0ca-c000.zstd.parquet release/2026-05-20.0/theme=divisions/type=division_boundary/part-00000-9a8bf1fb-4501-531f-aa17-7dc03ed921e4-c000.zstd.parquet
2:20
Let me check the schema of the divisions data and look for Nigeria/Lagos boundaries.
2:20
Bash
python3 << 'PYEOF'
python3 << 'PYEOF'
import pyarrow.parquet as pq
import s3fs
fs = s3fs.S3FileSystem(anon=True)
# Check division_area schema
url = "s3://overturemaps-us-west-2/release/2026-05-20.0/theme=divisions/type=division_area/part-00000-20b4270d-ad56-5e61-86d3-ee6b0af5789c-c000.zstd.parquet"
schema = pq.read_schema(url, filesystem=fs)
print("Division area schema:")
for field in schema:
print(f" {field.name}: {field.type}")
PYEOF
Division area schema:
id: string
geometry: binary
country: string
sources: list<element: struct<property: string, dataset: string, license: string, record_id: string, update_time: string, confidence: double, between: list<element: double>>>
subtype: string
admin_level: int32
class: string
names: struct<primary: string, common: map<string, string ('common')>, rules: list<element: struct<variant: string, language: string, perspectives: struct<mode: string, countries: list<element: string>>, value: string, between: list<element: double>, side: string>>>
is_land: bool
is_territorial: bool
region: string
division_id: string
version: int32
bbox: struct<xmin: double, xmax: double, ymin: double, ymax: double>
2:36
So the division_area table has: id, geometry (binary), country, subtype, admin_level, class, names (struct with primary), division_id, etc. I need to: 1. Filter Nigeria divisions (country='NG') to get Lagos State boundary 2. Get LGA boundaries (admin_level... Let me check what admin_level LGAs are) 3. Filter buildings in Lagos State and with footprint > 1000 m² 4. Spatially join buildings to LGAs 5. Compute per-LGA summary stats Let me first understand the data better. Let me check what countries data looks like and what admin levels are used for LGAs.
2:36
Bash
python3 << 'PYEOF'
python3 << 'PYEOF'
import pyarrow.parquet as pq
import s3fs
fs = s3fs.S3FileSystem(anon=True)
# Quick check on divisions data for Nigeria
url = "s3://overturemaps-us-west-2/release/2026-05-20.0/theme=divisions/type=division_area/part-00000-20b4270d-ad56-5e61-86d3-ee6b0af5789c-c000.zstd.parquet"
# Let's read a few rows to understand the data
import pyarrow.dataset as ds
dataset = ds.dataset(url, format="parquet", filesystem=fs)
# Read all data with country='NG'
import pyarrow.compute as pc
# Filter for Nigeria
scanner = dataset.scanner(filter=pc.field("country") == "NG")
table = scanner.to_table()
print(f"Total Nigeria rows: {len(table)}")
# Check unique admin_levels
import pandas as pd
df = table.to_pandas()
print("\nAdmin levels:")
print(df['admin_level'].value_counts().sort_index())
print("\nSubtypes:")
print(df['subtype'].value_counts())
print("\nSample rows:")
print(df[['id', 'country', 'subtype', 'admin_level', 'class', 'names', 'region']].head(20).to_string())
PYEOF
<exit_code>-1</exit_code> <error>Command timed out after 120000ms</error>
4:44
The query timed out. Let me try a different approach. Let me first check what the division_area data looks like, filtering more efficiently.
4:44
Bash
python3 << 'PYEOF'
python3 << 'PYEOF'
import pyarrow.parquet as pq
import s3fs
fs = s3fs.S3FileSystem(anon=True)
# Read just a few rows from division_area
url = "s3://overturemaps-us-west-2/release/2026-05-20.0/theme=divisions/type=division_area/part-00000-20b4270d-ad56-5e61-86d3-ee6b0af5789c-c000.zstd.parquet"
# Read first 100 rows
table = pq.read_table(url, filesystem=fs)
df = table.to_pandas()
print("Columns:", list(df.columns))
print(f"Total rows in part 0: {len(df)}")
print("\nSample rows (first 20):")
for i, row in df.head(20).iterrows():
print(f" id={row['id']}, country={row['country']}, subtype={row['subtype']}, admin_level={row['admin_level']}, class={row['class']}, region={row['region']}, names_primary={row['names']['primary'] if row['names'] else None}")
PYEOF
<exit_code>-1</exit_code> <error>Command timed out after 60000ms</error>
5:53
The data is quite large. Let me try a different approach - use DuckDB which may be faster, or just query the data efficiently. Let me try using DuckDB with the httpfs extension properly configured for anonymous S3 access.
5:53
Bash
python3 << 'PYEOF'
python3 << 'PYEOF'
import duckdb
con = duckdb.connect()
con.execute("INSTALL httpfs; LOAD httpfs;")
con.execute("SET s3_region='us-west-2'")
con.execute("SET s3_endpoint='s3.us-west-2.amazonaws.com'")
con.execute("SET s3_use_ssl=true")
# No credentials needed for public bucket
con.execute("SET s3_access_key_id=''")
con.execute("SET s3_secret_access_key=''")
con.execute("SET s3_session_token=''")
# Check if we can read
result = con.execute("""
SELECT count(*) as cnt FROM read_parquet(
's3://overturemaps-us-west-2/release/2026-05-20.0/theme=buildings/type=building/part-00000-ddc67476-7254-536d-b4da-7c405a803f55-c000.zstd.parquet'
)
""").fetchone()
print(f"Row count: {result}")
PYEOF
Row count: (4953140,)
6:05
DuckDB can read the files. Let me use DuckDB for the whole workflow. Let me first explore the data to understand the structure better, especially for Nigeria and Lagos. I need: 1. Buildings in Lagos State, Nigeria from Overture 2. Filter to buildings > 1000 m² footprint 3. Assign each building to a Local Government Area (LGA) 4. Create the per-LGA summary Let me start by exploring the division_area data for Nigeria.
6:05
Bash
python3 << 'PYEOF'
python3 << 'PYEOF'
import duckdb
con = duckdb.connect()
con.execute("INSTALL httpfs; LOAD httpfs;")
con.execute("SET s3_region='us-west-2'")
con.execute("SET s3_endpoint='s3.us-west-2.amazonaws.com'")
con.execute("SET s3_use_ssl=true")
con.execute("SET s3_access_key_id=''")
con.execute("SET s3_secret_access_key=''")
con.execute("SET s3_session_token=''")
# Read a few rows from division_area
result = con.execute("""
SELECT id, country, subtype, admin_level, class, region, names.primary as name
FROM read_parquet(
's3://overturemaps-us-west-2/release/2026-05-20.0/theme=divisions/type=division_area/part-00000-20b4270d-ad56-5e61-86d3-ee6b0af5789c-c000.zstd.parquet'
)
LIMIT 30
""").fetchdf()
print(result.to_string())
PYEOF
id country subtype admin_level class region name 0 bfde889c-de6e-4b06-9498-b7e8ec1fabbc AS locality <NA> land NaN Alao 1 55fbda6e-f6c4-4c45-9cdd-624a4c84d1cb CK region 2 maritime NaN Pukapuka 2 9c5848fb-22c6-4b93-a516-74beb7ae7465 PF neighborhood <NA> land NaN Raroia 3 0a743dd4-0670-4c75-b8d0-ff8e26a1a75e MX neighborhood <NA> land MX-CMX Tlacoyaque 4 0207cd56-3a3a-494e-92fe-289491ff15ad PF locality <NA> land NaN Katiu 5 9f281aac-60fe-4a3c-8d53-dc6c70456be6 MX neighborhood <NA> land MX-CMX Corpus Christi 6 e6221fc8-d60f-4f3e-bf4f-5eb7ca1d506a WS locality <NA> land WS-SA Fogasavai'i 7 d6b6cb4b-7de0-4e8d-b943-b663e1820aa4 MX county 2 land MX-JAL Quitupan 8 98298ca9-1c2f-4647-a7e4-18dc88c22544 TO locality <NA> land TO-02 'O'ua 9 3b8fd853-1998-4b32-b73e-f5fc1852f108 PF neighborhood <NA> land NaN Moruroa 10 ea0bebe4-097b-4722-8508-bb3fe0e0881f MX county 2 land MX-GRO Zirándaro 11 48e16336-ee30-4370-9405-6351392f3742 MX locality <NA> land MX-MIC El Caracol 12 a8d55659-72fa-49dd-bc01-1a827b069727 AS locality <NA> land NaN Mapusagafou 13 bbd9c6d8-8bda-471e-803a-e27cccc088c0 MX neighborhood <NA> land MX-CMX Santa Fe Peña Blanca 14 d41ccbff-62e7-4d96-9a3d-3b22344261d7 MX microhood <NA> land MX-JAL Plaza de San Jose 15 ec7d0bdd-b34b-4bb0-a6fa-0b0644d277d1 FJ locality <NA> land FJ-E Namuka 16 371105a8-1bd3-4727-acfa-07168f002407 PF county 3 land NaN Faʻaʻā 17 3b569175-b1b9-4588-a1a9-b9c8e14011e8 US locality <NA> land US-HI Honalo 18 9af2da29-2e9c-4626-9e7b-e9e4d5b57891 MX county 2 land MX-MEX Lerma 19 5f85b505-c2f0-460d-8d8a-59cb2f895025 MX neighborhood <NA> land MX-MEX Lomas Verdes 6ta Sección 20 6d8c3144-f86c-42b7-9bec-e81e5f28d58d TK locality <NA> land NaN Fakaofo 21 78622374-aa0e-43df-9728-69da22307982 MX microhood <NA> land MX-MOR Espacio Público Eterna Primavera 22 e0a29f9a-29bd-4535-8a11-12c6dcee0209 MX neighborhood <NA> land MX-MEX Colonia Cruz de Dendho 23 620f7b0a-eac9-423a-bde7-1ddb122659c0 UM region 2 land UM-86 Jarvis Island 24 d130f967-8985-48e3-a6b4-996a8eb9f265 MX neighborhood <NA> land MX-CMX Abdías García Soto 25 d66785e6-4e06-4fab-a4bf-3e42724a6bca MX neighborhood <NA> land MX-COL Luis Donaldo Colosio 26 8e0c7b6c-eff4-453b-8695-3fbabe9cfdf8 TO locality <NA> land TO-03 Ma'atu 27 0d1bc330-e551-43fe-ae4a-fd91ad16ada7 MX county 2 land MX-MIC Huandacareo 28 a0e8f4b6-51a1-4cbe-aa75-191363efbe7e MX neighborhood <NA> land MX-MEX El Beato 29 e04481a5-9578-4959-b236-8dd5cb6545ff MX locality <NA> land MX-GRO Las Juntas
6:16
Good, DuckDB can read from S3. Let me take a different approach - use DuckDB for all the heavy lifting. First, let me get the Nigeria boundary (Lagos State) and then the buildings. Let me first find Nigeria (country code NG) and Lagos State in the divisions data.
6:16
Bash
python3 << 'PYEOF'
python3 << 'PYEOF'
import duckdb
con = duckdb.connect()
con.execute("INSTALL httpfs; LOAD httpfs;")
con.execute("SET s3_region='us-west-2'")
con.execute("SET s3_endpoint='s3.us-west-2.amazonaws.com'")
con.execute("SET s3_use_ssl=true")
con.execute("SET s3_access_key_id=''")
con.execute("SET s3_secret_access_key=''")
con.execute("SET s3_session_token=''")
# Find Nigeria divisions
result = con.execute("""
SELECT id, country, subtype, admin_level, class, region, names.primary as name
FROM read_parquet(
's3://overturemaps-us-west-2/release/2026-05-20.0/theme=divisions/type=division_area/*.zstd.parquet',
hive_partitioning=true
)
WHERE country = 'NG'
LIMIT 100
""").fetchdf()
print(result.to_string())
PYEOF
id country subtype admin_level class region name 0 6e1ada31-8f7b-4529-b8d9-cd14f488b4eb NG county 2 land NG-OY Atisbo 1 32cfa1af-5e74-41a7-9272-b9dd8da715d9 NG county 2 land NG-LA Amuwo Odofin 2 fcaab4e8-075a-4ab6-a2d2-cd614e95dfc2 NG county 2 land NG-LA Agege 3 7aa34b16-ba84-45d9-94c4-083c65b8cf09 NG county 2 land NG-OG Ifo 4 b45b178e-4913-475f-a57e-c26c4d286198 NG locality <NA> land NG-OY Aiyete 5 51955eb1-4697-4268-b8e2-8554f7a77475 NG county 2 land NG-OG Ado Odo/Ota 6 07c90284-d410-436d-b210-4525190b58d5 NG locality <NA> land NG-OY Tapa 7 ddd19324-8591-4e3f-816a-4c1133cb659d NG county 2 land NG-OG Abeokuta North 8 00496a96-e1ca-42cb-a1c8-531f48aa7461 NG county 2 land NG-OG Ipokia 9 a97eadcd-f4e8-445b-8d28-fb98a825aad3 NG county 2 land NG-OG Imeko Afon 10 934f5b76-4cbf-4902-b124-b33d057d87b1 NG county 2 land NG-LA Ifako/Ijaye 11 37bf198f-83f4-4cee-929c-cc1558c287ad NG county 2 land NG-OY Ibarapa North 12 dc9f547a-48ef-4f9c-a90b-31f6b247a3b8 NG county 2 land NG-OG Yewa South 13 eab2ff24-e6f4-4a66-9d19-a4169499674a NG county 2 land NG-OG Ewekoro 14 457a7986-f212-4765-b432-e1c1b77d6d61 NG county 2 land NG-LA Oshodi/Isolo 15 93ccd19d-3389-4c00-a8e3-920e20810d39 NG county 2 land NG-LA Badagry 16 1a9532db-bcd5-4050-b62b-f27a22f2ebcf NG county 2 land NG-LA Alimosho 17 a11b7421-21e3-4644-94a6-b7c707176710 NG county 2 land NG-OY Iwajowa 18 811f3e9d-01f3-4cba-acc0-d2a4ec81dfb0 NG county 2 land NG-OY Ibarapa Central 19 7918fd74-2ce4-48af-b3c1-1a4f1e26558e NG county 2 land NG-LA Ojo 20 fa202edf-03e8-4cf4-9453-a56f196740d3 NG county 2 land NG-KW Baruten 21 c4470aea-2ffd-451c-8c2a-961a38bd10eb NG county 2 land NG-OY Saki West 22 2ef4ab31-2bf2-4831-9dad-3631ec1ad4dd NG county 2 land NG-OG Yewa North 23 7cd48648-cb4c-43ad-801a-ea7003fd4979 NG locality <NA> land NG-KD Kurmin Kogi 24 8a9899fa-8cf5-4c8a-bd50-0f83cf5c2c3f NG county 2 land NG-KD Jaba 25 c140e0a6-3e56-4c79-8f1e-4d5b4af52914 NG locality <NA> land NG-KD Wuciciri 26 74ff4301-6d55-4a93-bdcf-21d0ee77dd49 NG locality <NA> land NG-KD Danwata 27 d70906a1-dbe8-44ad-8e68-0908a621bd39 NG locality <NA> land NG-KN Kokiya 28 42f8dc05-1379-4b59-ab1c-1970e090e53f NG locality <NA> land NG-KD Rahama 29 14b75cb3-b3f9-4ee4-af33-0ad92c8db0e3 NG locality <NA> land NG-SO Bodinga Tauma 30 ab126ab5-8213-4242-800e-cd453cdae1d8 NG region 1 land NG-SO Sokoto 31 b80d0c54-2069-444c-a3ad-19ab313292aa NG locality <NA> land NG-KT Tamilo A 32 267f5486-6d3e-4917-9458-04632dcc6c57 NG locality <NA> land NG-SO Dingyadi 33 018c0489-85e2-47af-b930-ee6bb67e24ab NG locality <NA> land NG-KT Bujawa 34 cdaf6a75-c484-4d65-b04e-e310711b7bd3 NG county 2 land NG-OY Ibadan North East 35 ae873cf6-87de-4aa4-b031-21346de3590a NG locality <NA> land NG-KT Baure 36 9a537d3b-67e0-498f-805b-d7cdceac8c20 NG locality <NA> land NG-SO Sarkin Adar Gandu 37 4ad6f9f3-24ee-4cd0-b52d-6f29c775a42d NG locality <NA> land NG-KD Kufena 38 809f4ea7-b903-43d7-abc2-7334068c7145 NG county 2 land NG-AK Uyo 39 f5772514-0257-40a7-a68a-b9227328b344 NG county 2 land NG-KE Sakaba 40 bc01e791-45ea-40f9-bf32-2da73d037f7a NG locality <NA> land NG-KE Hirishi-Magarza 41 d6575fc7-67b1-40d1-a7df-b92227cd168c NG locality <NA> land NG-KT Shema 42 3722cf55-91ed-41a4-b944-4c7031528e53 NG locality <NA> land NG-ZA Wonaka 43 9928dddf-7fca-4f54-8513-8deec984141f NG county 2 land NG-OY Ogbomosho South 44 e4a8a1de-3082-48a2-96aa-303e5246f9e5 NG locality <NA> land NG-KD Madakiya 45 2cd6369f-46bb-4b65-b043-c42c92332b23 NG county 2 land NG-NI Munya 46 ab9cda6a-af21-4490-a6c0-b9254c38f4c7 NG county 2 land NG-NI Gbako 47 c861ed53-adfe-416a-b0a7-bab921f6db77 NG locality <NA> land NG-KT Rimi 48 5aa26d19-5bf6-4045-a4ed-fdf22d5f264a NG locality <NA> land NG-KD Garu 49 13a9f930-a7b7-44b9-bfc9-9c9b948572ae NG county 2 land NG-KD Sabon Gari 50 af48f46b-180c-42b5-b579-5cd9c15fe1f2 NG locality <NA> land NG-KT Gwarjo 51 f2ac41b9-7d05-4676-9f1c-d805bb6ffde9 NG county 2 land NG-DE Oshimili North 52 06dd88e8-84e5-46d9-aa9e-2658545177a7 NG locality <NA> land NG-KE Dutsin Mari Dul Meru 53 5836c271-c0ee-47fc-93b2-255e1daa76ff NG county 2 land NG-KT Matazu 54 5852c184-ef5c-400c-8a33-00b0ad233e35 NG locality <NA> land NG-KN Unguwar Tudu 55 7de372d4-9d2f-4dc4-a4cf-dc5a857afe6b NG locality <NA> land NG-SO Fakka 56 e65aa201-c5b7-4c1d-ae9b-8da887d7c737 NG locality <NA> land NG-SO Maikujera 57 0db46add-8cbc-4880-9439-c9a38e3cefbe NG locality <NA> land NG-KN Kwajale 58 80762d73-8f76-4c51-a3e5-e9ccec836800 NG county 2 land NG-EK Ise/Orun 59 b35f2aa0-9ec0-413a-b042-b3f331ead685 NG locality <NA> land NG-ZA Shinkafi North 60 ca2c03eb-80de-4931-80a8-117d72057dac NG locality <NA> land NG-DE Iyara 61 c8ebc3e6-4db5-49d4-aa8f-7927eca07c50 NG locality <NA> land NG-KE Janbirni 62 3490af5b-b510-4ab9-a45b-439eab0df4ea NG locality <NA> land NG-JI Dansure 63 6543410e-308b-40c1-8cad-b8d61a16e5a0 NG locality <NA> land NG-KT Jibia B 64 79b2c29f-b7ed-4fc3-8f11-5094d66777fc NG county 2 land NG-KN Gwarzo 65 707d32f7-a189-45c0-baf0-e9e16ba0fc6f NG locality <NA> land NG-KT Galadima A 66 92d1e309-9d1f-4d49-be1d-9096635e5fa9 NG locality <NA> land NG-KT Mazoji B 67 4033e7a8-2698-4de5-bbf9-cbea44ce22c4 NG macrohood <NA> land NG-EN Agbani 68 c25545a5-e0d7-40ca-b68a-8762ebbf0d77 NG locality <NA> land NG-KE Bubuche 69 b113986b-6823-4565-82fd-ca3b25e1c515 NG locality <NA> land NG-KD Gwanki 70 2803a799-af98-47a7-b446-1726d8baada7 NG county 2 land NG-EK Oye 71 5eaaaa15-4e64-47ad-8797-81ba300963ff NG locality <NA> land NG-KN Tsaudawa 72 eb103c4b-681c-49a0-b282-038c9c652537 NG locality <NA> land NG-KT Kafur 73 68805783-4721-44b1-aae3-44a2517ae97b NG locality <NA> land NG-KE Yelwa South 74 0bc00f7c-087d-4eb7-889a-8f3026dec25b NG locality <NA> land NG-ZA Danjibga 75 50a9303c-3c3f-4314-9dac-ca44dfee8b37 NG locality <NA> land NG-ZA Gyalganye 76 b3adf4e0-f701-435b-a3ef-f0b644371569 NG county 2 land NG-EB Ivo 77 54ec4bfe-ec0f-485d-a167-a6a37e5676f0 NG locality <NA> land NG-KD Kargi 78 6a9ff3b3-37ec-4d1f-aec2-73322f647047 NG county 2 land NG-KO Adavi 79 900e3a19-ad60-4d95-8206-9e74b22ac13c NG locality <NA> land NG-KD Kallah 80 51b996d6-9b2e-4476-831f-8f24987b86bd NG locality <NA> land NG-OS Offatedo 81 9054eb32-b7d4-42bc-9c09-a349c6d392ec NG county 2 land NG-IM Ezinihitte 82 21a06479-eaea-4b70-b901-6320ad29aba2 NG locality <NA> land NG-KT Danja B 83 5fb3f6cb-3993-4bb7-8a53-73e775feb14f NG region 1 land NG-LA Lagos 84 e893047a-cbff-4a33-b6e8-0a1502cae2cf NG county 2 land NG-AN Ihiala 85 3aa19b7b-a3a2-4233-b1d9-eb9d57ce000b NG county 2 land NG-KO Ankpa 86 709af9f8-7f88-4b9c-b328-9a3e8235fc58 NG locality <NA> land NG-ZA Dangaladima 87 030cb2c1-697b-418e-b820-49393989a03b NG county 2 land NG-LA Ìkòròdú 88 51881eec-d425-4d14-aa25-fcc2a98a6d9a NG county 2 land NG-OS Isokan 89 d0f56911-3e8e-41a2-af36-3f925b3f9e08 NG locality <NA> land NG-KN Kabagiwa 90 750a7f3b-c0d0-46dc-b953-c821d24c31d3 NG county 2 land NG-DE Aniocha South 91 580fe8ed-0388-4503-b857-9074c5c73322 NG county 2 land NG-KD Zangon Kataf 92 7f53952e-6e0f-406e-aea5-c468d8d54f48 NG county 2 land NG-AK Ibeno 93 3c5bf504-2eb8-4b87-9cfc-3458b8eb5d51 NG locality <NA> land NG-KD Katari 94 867159af-6f6d-4b64-8ad5-5e2edc3e4c59 NG locality <NA> land NG-SO Rikina 95 e69e0ecd-90b9-4996-b993-ff4072821ace NG region 1 land NG-KE Kebbi 96 fbd4f0ad-a276-499b-a39b-ab750cd37675 NG locality <NA> land NG-ZA Shinkafi South 97 1ba3b03d-1064-4ba0-a4e1-a9ddc736c6ab NG locality <NA> land NG-KT Kukasheka 98 089798f0-8eb0-4dd4-93e6-05cb13312c72 NG county 2 land NG-KE Wasagu-Danko 99 0799474f-ad84-4ed6-b8f2-7ce536b80f6a NG locality <NA> land NG-KE Zauro
6:41
Good. I can see: - Nigeria uses `country = 'NG'` - Lagos State region has `region = 'NG-LA'` and `subtype = 'region'`, admin_level=1 - Lagos Local Government Areas (LGAs) are `subtype = 'county'`, admin_level=2 with `region = 'NG-LA'` Let me get all the Lagos LGAs (counties in NG-LA region).
6:41
Bash
python3 << 'PYEOF'
python3 << 'PYEOF'
import duckdb
con = duckdb.connect()
con.execute("INSTALL httpfs; LOAD httpfs;")
con.execute("SET s3_region='us-west-2'")
con.execute("SET s3_endpoint='s3.us-west-2.amazonaws.com'")
con.execute("SET s3_use_ssl=true")
con.execute("SET s3_access_key_id=''")
con.execute("SET s3_secret_access_key=''")
con.execute("SET s3_session_token=''")
# Get Lagos region
result = con.execute("""
SELECT id, name, region
FROM (
SELECT id, names.primary as name, region
FROM read_parquet(
's3://overturemaps-us-west-2/release/2026-05-20.0/theme=divisions/type=division_area/*.zstd.parquet',
hive_partitioning=true
)
WHERE country = 'NG' AND region = 'NG-LA' AND subtype = 'county'
)
ORDER BY name
""").fetchdf()
print("Lagos LGAs (counties):")
print(result.to_string())
print(f"\nTotal: {len(result)}")
PYEOF
Lagos LGAs (counties):
id name region
0 fcaab4e8-075a-4ab6-a2d2-cd614e95dfc2 Agege NG-LA
1 e2af9375-5eed-46f8-a258-515f22297084 Ajeromi/Ifelodun NG-LA
2 1a9532db-bcd5-4050-b62b-f27a22f2ebcf Alimosho NG-LA
3 32cfa1af-5e74-41a7-9272-b9dd8da715d9 Amuwo Odofin NG-LA
4 4b715cfc-a9b3-4a05-b601-353b77619939 Apapa NG-LA
5 93ccd19d-3389-4c00-a8e3-920e20810d39 Badagry NG-LA
6 61530733-5992-41f4-9212-43f1b460319c Epe NG-LA
7 0c5600b6-df25-4e75-a25e-ae9937299bad Eti Osa NG-LA
8 70bd9c0f-fcdd-4bd2-81df-34d9351c997e Ibeju Lekki NG-LA
9 934f5b76-4cbf-4902-b124-b33d057d87b1 Ifako/Ijaye NG-LA
10 c540f662-44c9-498d-8f3d-2a41a039a29f Ikeja NG-LA
11 cb527384-149e-41c1-b3c5-0070deab27d4 Kosofe NG-LA
12 c58fd335-e502-421c-b8e3-f263c5fb0ff7 Lagos Island NG-LA
13 639f85d6-f581-4f31-b52b-98da63a57271 Lagos Mainland NG-LA
14 ffe6c369-91e2-4984-97d1-bd24feff3732 Mushin NG-LA
15 7918fd74-2ce4-48af-b3c1-1a4f1e26558e Ojo NG-LA
16 457a7986-f212-4765-b432-e1c1b77d6d61 Oshodi/Isolo NG-LA
17 b098198f-55d1-46a6-ac1e-71ae35fdd913 Shomolu NG-LA
18 f0f8532f-4c5e-428a-981e-12a31b2a9044 Surulere NG-LA
19 030cb2c1-697b-418e-b820-49393989a03b Ìkòròdú NG-LA
Total: 20
7:04
Now I have a good understanding of the data structure. Let me plan the approach: 1. Get the Lagos state boundary (region with region='NG-LA' and subtype='region') 2. Get all 20 LGA (county) polygons for Lagos (region='NG-LA', subtype='county') 3. Read all building footprints that intersect the Lagos area 4. Filter to buildings with footprint area > 1000 m² 5. Spatial join with LGA polygons to assign LGA names 6. Compute summary statistics per LGA Let me first get the Lagos region geometry to use for spatial filtering of buildings. Actually, I need to think about this more carefully. The buildings are global and there are 512 parquet files. I need to filter for Lagos state efficiently. Let me: 1. First get the Lagos region boundary to know the bounding box 2. Use that bounding box to filter buildings 3. Then do the spatial join Let me start by getting the Lagos region geometry.
7:04
Bash
python3 << 'PYEOF'
python3 << 'PYEOF'
import duckdb
con = duckdb.connect()
con.execute("INSTALL spatial; LOAD spatial;")
con.execute("INSTALL httpfs; LOAD httpfs;")
con.execute("SET s3_region='us-west-2'")
con.execute("SET s3_endpoint='s3.us-west-2.amazonaws.com'")
con.execute("SET s3_use_ssl=true")
con.execute("SET s3_access_key_id=''")
con.execute("SET s3_secret_access_key=''")
con.execute("SET s3_session_token=''")
# Get Lagos region geometry as WKT
result = con.execute("""
SELECT id, names.primary as name, ST_AsText(ST_GeomFromWKB(geometry)) as wkt
FROM read_parquet(
's3://overturemaps-us-west-2/release/2026-05-20.0/theme=divisions/type=division_area/*.zstd.parquet',
hive_partitioning=true
)
WHERE country = 'NG' AND region = 'NG-LA' AND subtype = 'region'
LIMIT 1
""").fetchdf()
print(result['wkt'].iloc[0][:500])
print(f"Name: {result['name'].iloc[0]}")
PYEOF
Traceback (most recent call last):
File "<stdin>", line 13, in <module>
_duckdb.BinderException: Binder Error: No function matches the given name and argument types 'st_geomfromwkb(GEOMETRY('OGC:CRS84'))'. You might need to add explicit type casts.
Candidate functions:
st_geomfromwkb(BLOB) -> GEOMETRY
LINE 2: SELECT id, names.primary as name, ST_AsText(ST_GeomFromWKB(geometry)) as wkt
^
7:15
The geometry column is stored as WKB binary in the Overture parquet files, but the DuckDB spatial extension might need it cast properly. Let me try a different approach - use PyArrow to read the WKB and then convert with shapely, or use DuckDB differently. Actually, let me check the geometry type more carefully.
7:15
Bash
python3 << 'PYEOF'
python3 << 'PYEOF'
import duckdb
con = duckdb.connect()
con.execute("INSTALL spatial; LOAD spatial;")
con.execute("INSTALL httpfs; LOAD httpfs;")
con.execute("SET s3_region='us-west-2'")
con.execute("SET s3_endpoint='s3.us-west-2.amazonaws.com'")
con.execute("SET s3_use_ssl=true")
con.execute("SET s3_access_key_id=''")
con.execute("SET s3_secret_access_key=''")
con.execute("SET s3_session_token=''")
# Check geometry type
result = con.execute("""
SELECT typeof(geometry)
FROM read_parquet(
's3://overturemaps-us-west-2/release/2026-05-20.0/theme=divisions/type=division_area/part-00000-20b4270d-ad56-5e61-86d3-ee6b0af5789c-c000.zstd.parquet'
)
LIMIT 1
""").fetchdf()
print(result)
PYEOF
typeof(geometry)
0 GEOMETRY('OGC:CRS84')
7:33
So DuckDB's spatial extension handles geometry differently. Let me use the spatial extension properly. The geometry column is already a GEOMETRY type (OGC:CRS84). Let me figure out the approach. I need to: 1. Get the Lagos state boundary (region level) - or rather, the 20 LGAs 2. Get all buildings from Overture 3. Filter buildings that intersect Lagos LGAs 4. Calculate footprint area in m² 5. Filter those with > 1000 m² footprint 6. Create per-LGA summary Let me use Python with DuckDB approach - query the buildings that intersect with Lagos LGAs. The buildings have a WKB geometry (binary). I need to convert to geometry using ST_GeomFromWKB, then filter by spatial join with Lagos LGA polygons. Let me do it step by step. First, get the LGA geometries.
7:33
Bash
python3 << 'PYEOF'
python3 << 'PYEOF'
import duckdb
con = duckdb.connect()
con.execute("INSTALL spatial; LOAD spatial;")
con.execute("INSTALL httpfs; LOAD httpfs;")
con.execute("SET s3_region='us-west-2'")
con.execute("SET s3_endpoint='s3.us-west-2.amazonaws.com'")
con.execute("SET s3_use_ssl=true")
con.execute("SET s3_access_key_id=''")
con.execute("SET s3_secret_access_key=''")
con.execute("SET s3_session_token=''")
# Check how to use geometry with DuckDB spatial
result = con.execute("""
SELECT id, names.primary as name, ST_AsText(geometry) as wkt
FROM read_parquet(
's3://overturemaps-us-west-2/release/2026-05-20.0/theme=divisions/type=division_area/*.zstd.parquet',
hive_partitioning=true
)
WHERE country = 'NG' AND region = 'NG-LA' AND subtype = 'county'
ORDER BY name
""").fetchdf()
print(f"Columns: {list(result.columns)}")
print(f"Number of LGAs: {len(result)}")
# Print names
for i, row in result.iterrows():
print(f" {row['name']} - id={row['id']}")
# Save the LGA geometries for later use
import pickle
with open('/work/lga_data.pkl', 'wb') as f:
pickle.dump(result, f)
PYEOF
Columns: ['id', 'name', 'wkt'] Number of LGAs: 20 Agege - id=fcaab4e8-075a-4ab6-a2d2-cd614e95dfc2 Ajeromi/Ifelodun - id=e2af9375-5eed-46f8-a258-515f22297084 Alimosho - id=1a9532db-bcd5-4050-b62b-f27a22f2ebcf Amuwo Odofin - id=32cfa1af-5e74-41a7-9272-b9dd8da715d9 Apapa - id=4b715cfc-a9b3-4a05-b601-353b77619939 Badagry - id=93ccd19d-3389-4c00-a8e3-920e20810d39 Epe - id=61530733-5992-41f4-9212-43f1b460319c Eti Osa - id=0c5600b6-df25-4e75-a25e-ae9937299bad Ibeju Lekki - id=70bd9c0f-fcdd-4bd2-81df-34d9351c997e Ifako/Ijaye - id=934f5b76-4cbf-4902-b124-b33d057d87b1 Ikeja - id=c540f662-44c9-498d-8f3d-2a41a039a29f Kosofe - id=cb527384-149e-41c1-b3c5-0070deab27d4 Lagos Island - id=c58fd335-e502-421c-b8e3-f263c5fb0ff7 Lagos Mainland - id=639f85d6-f581-4f31-b52b-98da63a57271 Mushin - id=ffe6c369-91e2-4984-97d1-bd24feff3732 Ojo - id=7918fd74-2ce4-48af-b3c1-1a4f1e26558e Oshodi/Isolo - id=457a7986-f212-4765-b432-e1c1b77d6d61 Shomolu - id=b098198f-55d1-46a6-ac1e-71ae35fdd913 Surulere - id=f0f8532f-4c5e-428a-981e-12a31b2a9044 Ìkòròdú - id=030cb2c1-697b-418e-b820-49393989a03b
8:19
Now I have the LGA boundaries for Lagos. Let me build a comprehensive solution. The approach will be: 1. Read all LGA boundaries for Lagos (county subtype, region=NG-LA) 2. Read all building footprints that intersect with Lagos bounding box 3. For each building, compute its area and join with LGA (spatial join) 4. Filter buildings > 1000 m² 5. Output to geoparquet 6. Compute per-LGA summary statistics Let me write the full script.
8:19
8:19
Write
/work/solve.py
content · 160 lines · py
"""
Extract Lagos State (Nigeria) building footprints > 1000 m² from Overture Maps,
spatially join to LGAs, and produce per-LGA summary statistics.
Outputs:
/work/lagos_buildings.geoparquet - EPSG:4326 GeoParquet with id, height,
footprint_area_m2, lga, geometry
/work/lagos_building_summary.parquet - Plain Parquet with per-LGA rollup
"""
import duckdb
import geopandas as gpd
import pandas as pd
import numpy as np
import pyarrow.parquet as pq
import pyarrow as pa
from shapely import wkb, wkt
from shapely.geometry import shape
import json
# ---------------------------------------------------------------
# 1. Connect to DuckDB with spatial + httpfs
# ---------------------------------------------------------------
con = duckdb.connect()
con.execute("INSTALL spatial; LOAD spatial;")
con.execute("INSTALL httpfs; LOAD httpfs;")
con.execute("SET s3_region='us-west-2'")
con.execute("SET s3_endpoint='s3.us-west-2.amazonaws.com'")
con.execute("SET s3_use_ssl=true")
con.execute("SET s3_access_key_id=''")
con.execute("SET s3_secret_access_key=''")
con.execute("SET s3_session_token=''")
OAK = "s3://overturemaps-us-west-2/release/2026-05-20.0"
# ---------------------------------------------------------------
# 2. Load Lagos LGA (county) boundaries
# ---------------------------------------------------------------
print("Loading Lagos LGA boundaries...")
lga_df = con.execute(f"""
SELECT
id,
names.primary AS name,
ST_AsWKB(geometry) AS geom_wkb
FROM read_parquet(
'{OAK}/theme=divisions/type=division_area/*.zstd.parquet',
hive_partitioning=true
)
WHERE country = 'NG' AND region = 'NG-LA' AND subtype = 'county'
ORDER BY name
""").fetchdf()
# Convert WKB to shapely geometries
lga_df['geometry'] = lga_df['geom_wkb'].apply(lambda x: wkb.loads(bytes(x)))
lga_gdf = gpd.GeoDataFrame(lga_df, geometry='geometry', crs='EPSG:4326')
print(f" Loaded {len(lga_gdf)} LGAs")
# Get Lagos bounding box for filtering buildings
minx, miny, maxx, maxy = lga_gdf.total_bounds
print(f" Lagos bbox: {minx:.4f}, {miny:.4f}, {maxx:.4f}, {maxy:.4f}")
# ---------------------------------------------------------------
# 3. Read building footprints within Lagos bbox
# ---------------------------------------------------------------
print("Loading building data for Lagos region...")
# First, let's check how many buildings there are in the bbox
# by reading building parquet files that intersect the bbox
# Buildings are not bbox-partitioned in this release (no bbox= subdirs),
# so we need to read all files and filter
# Let's try a smarter approach - read with a bbox filter
# Use bbox column to filter efficiently
buildings_df = con.execute(f"""
SELECT
id,
height,
ST_Area(ST_GeomFromWKB(geometry)) AS footprint_area_m2,
ST_AsWKB(geometry) AS geom_wkb
FROM read_parquet(
'{OAK}/theme=buildings/type=building/*.zstd.parquet',
hive_partitioning=true
)
WHERE bbox.xmin < {maxx}
AND bbox.xmax > {minx}
AND bbox.ymin < {maxy}
AND bbox.ymax > {miny}
""").fetchdf()
print(f" Read {len(buildings_df)} buildings in Lagos bbox")
# Convert WKB to geometry
buildings_df['geometry'] = buildings_df['geom_wkb'].apply(lambda x: wkb.loads(bytes(x)))
buildings_gdf = gpd.GeoDataFrame(buildings_df, geometry='geometry', crs='EPSG:4326')
# ---------------------------------------------------------------
# 4. Spatial join: assign each building to its LGA
# ---------------------------------------------------------------
print("Performing spatial join...")
# Use GeoPandas sjoin
joined = gpd.sjoin(buildings_gdf, lga_gdf[['id', 'name', 'geometry']],
how='inner', predicate='intersects')
# Some buildings might intersect multiple LGAs — keep the first match
joined = joined.drop_duplicates(subset='id_left')
joined = joined.rename(columns={'id_left': 'building_id', 'id_right': 'lga_id', 'name': 'lga'})
print(f" Matched {len(joined)} buildings to LGAs")
# ---------------------------------------------------------------
# 5. Filter buildings > 1000 m²
# ---------------------------------------------------------------
large_buildings = joined[joined['footprint_area_m2'] > 1000.0].copy()
print(f" Buildings > 1000 m²: {len(large_buildings)}")
# ---------------------------------------------------------------
# 6. Prepare output GeoParquet
# ---------------------------------------------------------------
out_buildings = large_buildings[['building_id', 'height', 'footprint_area_m2', 'lga', 'geometry']].copy()
out_buildings = out_buildings.rename(columns={'building_id': 'id'})
out_buildings = out_buildings.reset_index(drop=True)
print(f" Writing {len(out_buildings)} buildings to /work/lagos_buildings.geoparquet")
out_buildings.to_parquet(
'/work/lagos_buildings.geoparquet',
index=False,
compression='zstd',
schema_version='1.0'
)
# ---------------------------------------------------------------
# 7. Per-LGA roll-up summary
# ---------------------------------------------------------------
print("Computing LGA summary...")
summary = large_buildings.groupby('lga').agg(
n_buildings=('building_id', 'count'),
total_footprint_m2=('footprint_area_m2', 'sum'),
n_with_height=('height', lambda x: x.notna().sum()),
p50_height_m=('height', 'median')
).reset_index()
# If an LGA has no buildings > 1000 m², it won't appear.
# Ensure all 20 LGAs are present (with NaN for missing data).
all_lgas = lga_gdf['name'].tolist()
summary = summary.set_index('lga').reindex(all_lgas).reset_index()
summary['n_buildings'] = summary['n_buildings'].fillna(0).astype(int)
summary['n_with_height'] = summary['n_with_height'].fillna(0).astype(int)
summary['total_footprint_m2'] = summary['total_footprint_m2'].fillna(0.0)
# p50_height_m stays NaN where there are no buildings (or no heights)
print(f" Summary has {len(summary)} LGAs")
print(summary.to_string())
print(f" Writing to /work/lagos_building_summary.parquet")
summary.to_parquet(
'/work/lagos_building_summary.parquet',
index=False,
compression='zstd'
)
print("Done!")
Wrote /work/solve.py (6164 chars).
8:30
Let me run this script. But I'm worried about the memory - there are 500+ parquet files each with ~5 million rows, so reading all buildings in the Lagos bbox could be huge. Let me first check how many buildings are roughly in the Lagos area to estimate memory usage. Actually, let me just run it and see. I can also optimize by reading only relevant columns.
Wait, one issue: the building data has 512 files with ~5M rows each. Filtering by bbox in DuckDB should push down the predicate effectively. But it might still need to scan all files. Let me think about a better approach...
Actually, the ST_Area computation inside DuckDB might not work correctly because the buildings are stored with WKB geometry but the type is GEOMETRY('OGC:CRS84'). Also, the area computed by ST_Area will be in degrees, not square meters, so that's wrong. I need to compute area using a projected CRS.
Let me rewrite the approach:
1. Read building data using DuckDB, filtering by bbox
2. Use GeoPandas to reproject to a suitable UTM zone for Lagos (UTM zone 31N? Actually Nigeria spans zones 31-33. Lagos is around 3°E, so UTM zone 31N (EPSG:32631) should work for area calculation)
3. Compute footprint area in a projected CRS
Actually, let me compute the area with ST_Area using a geography cast or a suitable projection in DuckDB. Or I can do it in GeoPandas after loading.
Let me restructure the approach to be more memory-efficient:
- Read the division_area data first (small)
- Read buildings using DuckDB with bbox filter and minimal columns
- Convert to GeoPandas
- Spatial join
- Compute area in a projected CRS
- Filter by area > 1000
Let me also think about whether the bbox column filter will be effective. DuckDB should be able to use the bbox struct fields to filter.
Let me run the script but first check memory and time constraints.
8:30
Bash
python3 /work/solve.py 2>&1
Loading Lagos LGA boundaries...
Loaded 20 LGAs
Lagos bbox: 2.7060, 6.3667, 4.3509, 6.6968
Loading building data for Lagos region...
Traceback (most recent call last):
File "/work/solve.py", line 74, in <module>
buildings_df = con.execute(f"""
~~~~~~~~~~~^^^^^
SELECT
^^^^^^
...<11 lines>...
AND bbox.ymax > {miny}
^^^^^^^^^^^^^^^^^^^^^^
""").fetchdf()
^^^^
_duckdb.BinderException: Binder Error: No function matches the given name and argument types 'st_geomfromwkb(GEOMETRY('OGC:CRS84'))'. You might need to add explicit type casts.
Candidate functions:
st_geomfromwkb(BLOB) -> GEOMETRY
LINE 5: ST_Area(ST_GeomFromWKB(geometry)) AS footprint_area_m2,
^
8:56
The geometry column is already of type `GEOMETRY('OGC:CRS84')` in DuckDB, so I don't need to convert it. I can use the geometry directly with spatial functions.
8:56
Edit
/work/solve.py
old_string not found in /work/solve.py
9:02
Let me check the current content of the file.
9:02
Read
/work/solve.py
1 """
2 Extract Lagos State (Nigeria) building footprints > 1000 m² from Overture Maps,
3 spatially join to LGAs, and produce per-LGA summary statistics.
4
5 Outputs:
6 /work/lagos_buildings.geoparquet - EPSG:4326 GeoParquet with id, height,
7 footprint_area_m2, lga, geometry
8 /work/lagos_building_summary.parquet - Plain Parquet with per-LGA rollup
9 """
10
11 import duckdb
12 import geopandas as gpd
13 import pandas as pd
14 import numpy as np
15 import pyarrow.parquet as pq
16 import pyarrow as pa
17 from shapely import wkb, wkt
18 from shapely.geometry import shape
19 import json
20
21 # ---------------------------------------------------------------
22 # 1. Connect to DuckDB with spatial + httpfs
23 # ---------------------------------------------------------------
24 con = duckdb.connect()
25 con.execute("INSTALL spatial; LOAD spatial;")
26 con.execute("INSTALL httpfs; LOAD httpfs;")
27 con.execute("SET s3_region='us-west-2'")
28 con.execute("SET s3_endpoint='s3.us-west-2.amazonaws.com'")
29 con.execute("SET s3_use_ssl=true")
30 con.execute("SET s3_access_key_id=''")
31 con.execute("SET s3_secret_access_key=''")
32 con.execute("SET s3_session_token=''")
33
34 OAK = "s3://overturemaps-us-west-2/release/2026-05-20.0"
35
36 # ---------------------------------------------------------------
37 # 2. Load Lagos LGA (county) boundaries
38 # ---------------------------------------------------------------
39 print("Loading Lagos LGA boundaries...")
40 lga_df = con.execute(f"""
41 SELECT
42 id,
43 names.primary AS name,
44 ST_AsWKB(geometry) AS geom_wkb
45 FROM read_parquet(
46 '{OAK}/theme=divisions/type=division_area/*.zstd.parquet',
47 hive_partitioning=true
48 )
49 WHERE country = 'NG' AND region = 'NG-LA' AND subtype = 'county'
50 ORDER BY name
51 """).fetchdf()
52
53 # Convert WKB to shapely geometries
54 lga_df['geometry'] = lga_df['geom_wkb'].apply(lambda x: wkb.loads(bytes(x)))
55 lga_gdf = gpd.GeoDataFrame(lga_df, geometry='geometry', crs='EPSG:4326')
56 print(f" Loaded {len(lga_gdf)} LGAs")
57
58 # Get Lagos bounding box for filtering buildings
59 minx, miny, maxx, maxy = lga_gdf.total_bounds
60 print(f" Lagos bbox: {minx:.4f}, {miny:.4f}, {maxx:.4f}, {maxy:.4f}")
61
62 # ---------------------------------------------------------------
63 # 3. Read building footprints within Lagos bbox
64 # ---------------------------------------------------------------
65 print("Loading building data for Lagos region...")
66 # First, let's check how many buildings there are in the bbox
67 # by reading building parquet files that intersect the bbox
68
69 # Buildings are not bbox-partitioned in this release (no bbox= subdirs),
70 # so we need to read all files and filter
71 # Let's try a smarter approach - read with a bbox filter
72
73 # Use bbox column to filter efficiently
74 buildings_df = con.execute(f"""
75 SELECT
76 id,
77 height,
78 ST_Area(ST_GeomFromWKB(geometry)) AS footprint_area_m2,
79 ST_AsWKB(geometry) AS geom_wkb
80 FROM read_parquet(
81 '{OAK}/theme=buildings/type=building/*.zstd.parquet',
82 hive_partitioning=true
83 )
84 WHERE bbox.xmin < {maxx}
85 AND bbox.xmax > {minx}
86 AND bbox.ymin < {maxy}
87 AND bbox.ymax > {miny}
88 """).fetchdf()
89
90 print(f" Read {len(buildings_df)} buildings in Lagos bbox")
91
92 # Convert WKB to geometry
93 buildings_df['geometry'] = buildings_df['geom_wkb'].apply(lambda x: wkb.loads(bytes(x)))
94 buildings_gdf = gpd.GeoDataFrame(buildings_df, geometry='geometry', crs='EPSG:4326')
95
96 # ---------------------------------------------------------------
97 # 4. Spatial join: assign each building to its LGA
98 # ---------------------------------------------------------------
99 print("Performing spatial join...")
100 # Use GeoPandas sjoin
101 joined = gpd.sjoin(buildings_gdf, lga_gdf[['id', 'name', 'geometry']],
102 how='inner', predicate='intersects')
103
104 # Some buildings might intersect multiple LGAs — keep the first match
105 joined = joined.drop_duplicates(subset='id_left')
106 joined = joined.rename(columns={'id_left': 'building_id', 'id_right': 'lga_id', 'name': 'lga'})
107 print(f" Matched {len(joined)} buildings to LGAs")
108
109 # ---------------------------------------------------------------
110 # 5. Filter buildings > 1000 m²
111 # ---------------------------------------------------------------
112 large_buildings = joined[joined['footprint_area_m2'] > 1000.0].copy()
113 print(f" Buildings > 1000 m²: {len(large_buildings)}")
114
115 # ---------------------------------------------------------------
116 # 6. Prepare output GeoParquet
117 # ---------------------------------------------------------------
118 out_buildings = large_buildings[['building_id', 'height', 'footprint_area_m2', 'lga', 'geometry']].copy()
119 out_buildings = out_buildings.rename(columns={'building_id': 'id'})
120 out_buildings = out_buildings.reset_index(drop=True)
121
122 print(f" Writing {len(out_buildings)} buildings to /work/lagos_buildings.geoparquet")
123 out_buildings.to_parquet(
124 '/work/lagos_buildings.geoparquet',
125 index=False,
126 compression='zstd',
127 schema_version='1.0'
128 )
129
130 # ---------------------------------------------------------------
131 # 7. Per-LGA roll-up summary
132 # ---------------------------------------------------------------
133 print("Computing LGA summary...")
134 summary = large_buildings.groupby('lga').agg(
135 n_buildings=('building_id', 'count'),
136 total_footprint_m2=('footprint_area_m2', 'sum'),
137 n_with_height=('height', lambda x: x.notna().sum()),
138 p50_height_m=('height', 'median')
139 ).reset_index()
140
141 # If an LGA has no buildings > 1000 m², it won't appear.
142 # Ensure all 20 LGAs are present (with NaN for missing data).
143 all_lgas = lga_gdf['name'].tolist()
144 summary = summary.set_index('lga').reindex(all_lgas).reset_index()
145 summary['n_buildings'] = summary['n_buildings'].fillna(0).astype(int)
146 summary['n_with_height'] = summary['n_with_height'].fillna(0).astype(int)
147 summary['total_footprint_m2'] = summary['total_footprint_m2'].fillna(0.0)
148 # p50_height_m stays NaN where there are no buildings (or no heights)
149
150 print(f" Summary has {len(summary)} LGAs")
151 print(summary.to_string())
152
153 print(f" Writing to /work/lagos_building_summary.parquet")
154 summary.to_parquet(
155 '/work/lagos_building_summary.parquet',
156 index=False,
157 compression='zstd'
158 )
159
160 print("Done!")
9:07
The issue is on line 78: `ST_Area(ST_GeomFromWKB(geometry))`. The geometry column is already a GEOMETRY type in DuckDB's spatial extension, so I can just use `ST_Area(geometry)` directly.
9:07
Edit
/work/solve.py
Edited /work/solve.py.
9:13
Wait, the building schema says `geometry: binary` not a geometry type. So DuckDB needs to convert it. Let me check the actual type of the geometry column in DuckDB when reading from parquet.
9:13
Bash
python3 << 'PYEOF'
python3 << 'PYEOF'
import duckdb
con = duckdb.connect()
con.execute("INSTALL spatial; LOAD spatial;")
con.execute("INSTALL httpfs; LOAD httpfs;")
con.execute("SET s3_region='us-west-2'")
con.execute("SET s3_endpoint='s3.us-west-2.amazonaws.com'")
con.execute("SET s3_use_ssl=true")
con.execute("SET s3_access_key_id=''")
con.execute("SET s3_secret_access_key=''")
con.execute("SET s3_session_token=''")
# Check the type of geometry column
result = con.execute("""
SELECT typeof(geometry)
FROM read_parquet(
's3://overturemaps-us-west-2/release/2026-05-20.0/theme=buildings/type=building/part-00000-ddc67476-7254-536d-b4da-7c405a803f55-c000.zstd.parquet'
)
LIMIT 1
""").fetchdf()
print(f"Geometry type: {result.iloc[0,0]}")
# Check if we can cast it
result = con.execute("""
SELECT ST_Area(ST_GeomFromWKB(geometry::BLOB)) as area
FROM read_parquet(
's3://overturemaps-us-west-2/release/2026-05-20.0/theme=buildings/type=building/part-00000-ddc67476-7254-536d-b4da-7c405a803f55-c000.zstd.parquet'
)
LIMIT 5
""").fetchdf()
print(result)
PYEOF
Geometry type: GEOMETRY('OGC:CRS84')
Traceback (most recent call last):
File "<stdin>", line 23, in <module>
_duckdb.ConversionException: Conversion Error: Unimplemented type for cast (GEOMETRY('OGC:CRS84') -> BLOB) when casting from source column geometry
LINE 2: SELECT ST_Area(ST_GeomFromWKB(geometry::BLOB)) as area
^
9:30
The geometry is stored as GEOMETRY type in DuckDB. We should use ST_Area directly on the geometry column without conversion. Let me fix the script.
9:30
Bash
python3 << 'PYEOF'
python3 << 'PYEOF'
import duckdb
con = duckdb.connect()
con.execute("INSTALL spatial; LOAD spatial;")
con.execute("INSTALL httpfs; LOAD httpfs;")
con.execute("SET s3_region='us-west-2'")
con.execute("SET s3_endpoint='s3.us-west-2.amazonaws.com'")
con.execute("SET s3_use_ssl=true")
con.execute("SET s3_access_key_id=''")
con.execute("SET s3_secret_access_key=''")
con.execute("SET s3_session_token=''")
# ST_Area directly on geometry should work
result = con.execute("""
SELECT ST_Area(geometry) as area, ST_AsWKB(geometry) as wkb
FROM read_parquet(
's3://overturemaps-us-west-2/release/2026-05-20.0/theme=buildings/type=building/part-00000-ddc67476-7254-536d-b4da-7c405a803f55-c000.zstd.parquet'
)
LIMIT 5
""").fetchdf()
print(result)
PYEOF
area wkb 0 2.234262e-06 [1, 3, 0, 0, 0, 1, 0, 0, 0, 6, 0, 0, 0, 134, 1... 1 1.119847e-08 [1, 3, 0, 0, 0, 1, 0, 0, 0, 9, 0, 0, 0, 64, 20... 2 7.462190e-09 [1, 3, 0, 0, 0, 1, 0, 0, 0, 5, 0, 0, 0, 246, 1... 3 1.239404e-08 [1, 3, 0, 0, 0, 1, 0, 0, 0, 5, 0, 0, 0, 182, 1... 4 4.740940e-09 [1, 3, 0, 0, 0, 1, 0, 0, 0, 5, 0, 0, 0, 74, 19...
9:48
Good, ST_Area directly on the geometry works. The area values are very small though — that's because the geometry is in WGS84 (degrees), so area is in square degrees. I need to compute the area in square meters. I'll need to reproject the data or use a different approach. Let me think about this... The Overture buildings geometry is in EPSG:4326. To get area in m², I should either: 1. Use ST_Area with a geography cast (if DuckDB supports it) 2. Reproject to a suitable projected CRS for Nigeria/Lagos area 3. Use Shapely's area with a projected CRS after reading Since we're already reading the data into GeoPandas, it might be better to compute the footprint area in GeoPandas with a proper projected CRS. Let me compute area in m² using an appropriate UTM zone for Lagos (zone 31N or 32N). Actually, let me check DuckDB's behavior. In DuckDB Spatial, ST_Area on geography geometries returns m². But here the geometry is in WGS84. Let me try casting to geography.
9:48
Bash
python3 << 'PYEOF'
python3 << 'PYEOF'
import duckdb
con = duckdb.connect()
con.execute("INSTALL spatial; LOAD spatial;")
con.execute("INSTALL httpfs; LOAD httpfs;")
con.execute("SET s3_region='us-west-2'")
con.execute("SET s3_endpoint='s3.us-west-2.amazonaws.com'")
con.execute("SET s3_use_ssl=true")
con.execute("SET s3_access_key_id=''")
con.execute("SET s3_secret_access_key=''")
con.execute("SET s3_session_token=''")
# Try using ST_Area on geography type
result = con.execute("""
SELECT ST_Area(geometry::GEOGRAPHY) as area_m2
FROM read_parquet(
's3://overturemaps-us-west-2/release/2026-05-20.0/theme=buildings/type=building/part-00000-ddc67476-7254-536d-b4da-7c405a803f55-c000.zstd.parquet'
)
LIMIT 5
""").fetchdf()
print(result)
PYEOF
Traceback (most recent call last):
File "<stdin>", line 13, in <module>
_duckdb.CatalogException: Catalog Error: Type with name GEOGRAPHY does not exist!
Did you mean "LINESTRING_2D"?
LINE 2: SELECT ST_Area(geometry::GEOGRAPHY) as area_m2
^
10:03
The ST_Area on GEOMETRY returns area in degrees (which is wrong for measurement). I need to use the correct approach. Let me use `ST_Transform` to project to a local CRS for Lagos (UTM zone 31N or 32N) and then compute area. Actually, looking at the Overture docs, the geometry stored in buildings is in EPSG:4326 (WGS84). To compute area in m², I need to either: 1. Use `ST_Area(geometry)` after transforming to a projected CRS 2. Or transform in Python with GeoPandas Let me use DuckDB's ST_Transform. For Nigeria/Lagos, UTM zone 31N (EPSG:32631) covers most of it. Let me check. Actually, Lagos is around longitude 3°E, which is in UTM zone 31N (EPSG:32631). Let me use that. But wait - DuckDB spatial might not have ST_Transform. Let me check.
10:03
Bash
python3 << 'PYEOF'
python3 << 'PYEOF'
import duckdb
con = duckdb.connect()
con.execute("INSTALL spatial; LOAD spatial;")
con.execute("INSTALL httpfs; LOAD httpfs;")
con.execute("SET s3_region='us-west-2'")
con.execute("SET s3_endpoint='s3.us-west-2.amazonaws.com'")
con.execute("SET s3_use_ssl=true")
con.execute("SET s3_access_key_id=''")
con.execute("SET s3_secret_access_key=''")
con.execute("SET s3_session_token=''")
# Check if ST_Transform exists
result = con.execute("""
SELECT 1
""").fetchdf()
# List functions
functions = con.execute("""
SELECT function_name, function_type
FROM duckdb_functions()
WHERE function_name LIKE '%Transform%' OR function_name LIKE '%Area%'
""").fetchdf()
print(functions.to_string())
PYEOF
function_name function_type 0 ST_Area scalar 1 ST_Area scalar 2 ST_Area scalar 3 ST_Area scalar 4 ST_Area scalar 5 ST_Area_Spheroid scalar 6 ST_Area_Spheroid scalar 7 ST_BuildArea scalar 8 ST_Transform scalar 9 ST_Transform scalar 10 ST_Transform scalar 11 ST_Transform scalar 12 ST_Transform scalar 13 ST_Transform scalar 14 ST_Transform scalar 15 ST_Transform scalar
10:14
The issue is that `ST_Area` on a geometry in EPSG:4326 gives area in square degrees, not square meters. I need to either: 1. Use `ST_Transform` to project to a suitable meter-based CRS for Nigeria (UTM zone 31N or 32N) 2. Or use `ST_Area_Spheroid` which calculates area on the spheroid Let's try `ST_Area_Spheroid` first - it should return area in square meters. Actually, a simpler approach is to use `ST_Area(geometry)` which in DuckDB spatial with GeoParquet data might return area in the native CRS units. But for EPSG:4326, that's degrees. Let me try ST_Area_Spheroid or ST_Transform to UTM. For Lagos, the UTM zone is 31N (EPSG:32631) - around longitude 2-4°E. Actually, let me use a different approach - use pyarrow/geopandas directly for all the processing since DuckDB seems to have issues with area calculation.
10:14
Bash
python3 << 'PYEOF'
python3 << 'PYEOF'
import duckdb
con = duckdb.connect()
con.execute("INSTALL spatial; LOAD spatial;")
con.execute("INSTALL httpfs; LOAD httpfs;")
con.execute("SET s3_region='us-west-2'")
con.execute("SET s3_endpoint='s3.us-west-2.amazonaws.com'")
con.execute("SET s3_use_ssl=true")
con.execute("SET s3_access_key_id=''")
con.execute("SET s3_secret_access_key=''")
con.execute("SET s3_session_token=''")
# Try ST_Area_Spheroid which returns area in square meters
result = con.execute("""
SELECT ST_Area_Spheroid(geometry) as area_m2
FROM read_parquet(
's3://overturemaps-us-west-2/release/2026-05-20.0/theme=buildings/type=building/part-00000-ddc67476-7254-536d-b4da-7c405a803f55-c000.zstd.parquet'
)
LIMIT 5
""").fetchdf()
print(result)
PYEOF
area_m2 0 NaN 1 NaN 2 NaN 3 NaN 4 NaN
10:42
The ST_Area_Spheroid returns NaN for these small geometries. Let me try using ST_Transform to a UTM projection for area calculation, or just compute area in geopandas after loading. Actually, let me think about this differently. The geometry is in EPSG:4326 (degrees). ST_Area on degrees gives areas in square degrees (not useful). ST_Area_Spheroid should give m² but returns NaN for these very small buildings. Instead, I'll compute the area in GeoPandas after loading the geometries. That's simpler and more reliable. Let me revise the approach: 1. Load building data from DuckDB using bbox filter, getting id, height, geometry (as WKB) 2. Convert to GeoPandas GeoDataFrame 3. Compute area in a projected CRS (UTM zone 31N or 32N) to get m² 4. Filter > 1000 m² 5. Spatial join with LGAs 6. Output
10:42
Write
/work/solve.py
content · 169 lines · py
"""
Extract Lagos State (Nigeria) building footprints > 1000 m² from Overture Maps,
spatially join to LGAs, and produce per-LGA summary statistics.
Outputs:
/work/lagos_buildings.geoparquet - EPSG:4326 GeoParquet with id, height,
footprint_area_m2, lga, geometry
/work/lagos_building_summary.parquet - Plain Parquet with per-LGA rollup
"""
import duckdb
import geopandas as gpd
import pandas as pd
import numpy as np
from shapely import wkb
from shapely.geometry import shape
import json
# ---------------------------------------------------------------
# 1. Connect to DuckDB with spatial + httpfs
# ---------------------------------------------------------------
con = duckdb.connect()
con.execute("INSTALL spatial; LOAD spatial;")
con.execute("INSTALL httpfs; LOAD httpfs;")
con.execute("SET s3_region='us-west-2'")
con.execute("SET s3_endpoint='s3.us-west-2.amazonaws.com'")
con.execute("SET s3_use_ssl=true")
con.execute("SET s3_access_key_id=''")
con.execute("SET s3_secret_access_key=''")
con.execute("SET s3_session_token=''")
OAK = "s3://overturemaps-us-west-2/release/2026-05-20.0"
# ---------------------------------------------------------------
# 2. Load Lagos LGA (county) boundaries
# ---------------------------------------------------------------
print("Loading Lagos LGA boundaries...")
lga_df = con.execute(f"""
SELECT
id,
names.primary AS name,
ST_AsWKB(geometry) AS geom_wkb
FROM read_parquet(
'{OAK}/theme=divisions/type=division_area/*.zstd.parquet',
hive_partitioning=true
)
WHERE country = 'NG' AND region = 'NG-LA' AND subtype = 'county'
ORDER BY name
""").fetchdf()
# Convert WKB to shapely geometries
lga_df['geometry'] = lga_df['geom_wkb'].apply(lambda x: wkb.loads(bytes(x)))
lga_gdf = gpd.GeoDataFrame(lga_df, geometry='geometry', crs='EPSG:4326')
print(f" Loaded {len(lga_gdf)} LGAs")
# Get Lagos bounding box for filtering buildings
minx, miny, maxx, maxy = lga_gdf.total_bounds
print(f" Lagos bbox: {minx:.4f}, {miny:.4f}, {maxx:.4f}, {maxy:.4f}")
# ---------------------------------------------------------------
# 3. Read building footprints within Lagos bbox
# ---------------------------------------------------------------
print("Loading building data for Lagos region...")
buildings_df = con.execute(f"""
SELECT
id,
height,
ST_AsWKB(geometry) AS geom_wkb
FROM read_parquet(
'{OAK}/theme=buildings/type=building/*.zstd.parquet',
hive_partitioning=true
)
WHERE bbox.xmin < {maxx}
AND bbox.xmax > {minx}
AND bbox.ymin < {maxy}
AND bbox.ymax > {miny}
""").fetchdf()
print(f" Read {len(buildings_df)} buildings in Lagos bbox")
# Convert WKB to geometry (lazy, keep as WKB to save memory then convert)
buildings_gdf = gpd.GeoDataFrame(
buildings_df,
geometry=[wkb.loads(bytes(x)) for x in buildings_df['geom_wkb']],
crs='EPSG:4326'
)
# Drop WKB column
buildings_gdf = buildings_gdf.drop(columns=['geom_wkb'])
print(f" Built GeoDataFrame with {len(buildings_gdf)} buildings")
# ---------------------------------------------------------------
# 4. Compute footprint areas in square meters
# ---------------------------------------------------------------
print("Computing footprint areas...")
# Lagos spans UTM zones 31N and 32N. Let's use a local UTM projection.
# For accuracy across Lagos, we can use the median longitude to pick a UTM zone,
# or use an Albers/Equidistant projection for Nigeria.
# Lagos is around longitudes 2.7° to 4.4°.
# UTM 31N covers 0°-6°E, so all of Lagos is in UTM 31N.
buildings_utm = buildings_gdf.to_crs('EPSG:32631') # UTM zone 31N
buildings_gdf['footprint_area_m2'] = buildings_utm.area
print(f" Area computed (min: {buildings_gdf['footprint_area_m2'].min():.2f}, "
f"max: {buildings_gdf['footprint_area_m2'].max():.2f})")
# ---------------------------------------------------------------
# 5. Spatial join: assign each building to its LGA
# ---------------------------------------------------------------
print("Performing spatial join...")
joined = gpd.sjoin(buildings_gdf, lga_gdf[['id', 'name', 'geometry']],
how='inner', predicate='intersects')
# Some buildings might intersect multiple LGAs — keep the first match
joined = joined.drop_duplicates(subset='id_left')
joined = joined.rename(columns={'id_left': 'building_id', 'id_right': 'lga_id', 'name': 'lga'})
print(f" Matched {len(joined)} buildings to LGAs")
# ---------------------------------------------------------------
# 6. Filter buildings > 1000 m²
# ---------------------------------------------------------------
large_buildings = joined[joined['footprint_area_m2'] > 1000.0].copy()
print(f" Buildings > 1000 m²: {len(large_buildings)}")
# ---------------------------------------------------------------
# 7. Prepare output GeoParquet
# ---------------------------------------------------------------
out_buildings = large_buildings[['building_id', 'height', 'footprint_area_m2', 'lga', 'geometry']].copy()
out_buildings = out_buildings.rename(columns={'building_id': 'id'})
out_buildings = out_buildings.reset_index(drop=True)
print(f" Writing {len(out_buildings)} buildings to /work/lagos_buildings.geoparquet")
out_buildings.to_parquet(
'/work/lagos_buildings.geoparquet',
index=False,
compression='zstd',
schema_version='1.0'
)
# ---------------------------------------------------------------
# 8. Per-LGA roll-up summary
# ---------------------------------------------------------------
print("Computing LGA summary...")
summary = large_buildings.groupby('lga').agg(
n_buildings=('building_id', 'count'),
total_footprint_m2=('footprint_area_m2', 'sum'),
n_with_height=('height', lambda x: x.notna().sum()),
p50_height_m=('height', 'median')
).reset_index()
# Ensure all 20 LGAs are present
all_lgas = lga_gdf['name'].tolist()
summary = summary.set_index('lga').reindex(all_lgas).reset_index()
summary['n_buildings'] = summary['n_buildings'].fillna(0).astype(int)
summary['n_with_height'] = summary['n_with_height'].fillna(0).astype(int)
summary['total_footprint_m2'] = summary['total_footprint_m2'].fillna(0.0)
# p50_height_m stays NaN where no heights available
print(f" Summary has {len(summary)} LGAs")
print(summary.to_string())
print(f" Writing to /work/lagos_building_summary.parquet")
summary.to_parquet(
'/work/lagos_building_summary.parquet',
index=False,
compression='zstd'
)
print("Done!")
Wrote /work/solve.py (6534 chars).
10:47
Bash
python3 /work/solve.py 2>&1
<exit_code>-1</exit_code> <error>Command timed out after 600000ms</error>
20:57
The query is reading ALL building parquet files (512 parts) from S3. Even with the bbox filter, it still needs to scan all files. This is taking way too long. Let me think of a better approach. I have a few options: 1. Instead of reading all building data through DuckDB, use a more targeted approach with PyArrow/pandas to filter the data 2. Download the data locally first, then process it 3. Use a more efficient spatial filtering approach Actually, the issue is that reading 512 parquet files from S3 is slow. Let me think about this differently. The Overture building data for 2026-05-20.0 has 512 parquet files, each quite large. Reading all of them with a WHERE clause still requires scanning the bbox column from all files. Let me try a different approach: 1. First, read only the `id`, `height`, `bbox`, and `geometry` columns for buildings that likely intersect Lagos 2. Use PyArrow with row group filtering since DuckDB's bbox filtering might still need to read data Actually, the key insight is that the bbox column is a struct, and DuckDB may be scanning everything. Let me try using PyArrow directly to read only the needed files/row groups. Wait, actually maybe the issue is that reading 512 parquet files just takes too long. Let me try a more incremental approach - read fewer files first, or use a different strategy. Let me try using GeoPandas directly with pyarrow to filter by bbox. The building data has a `bbox` struct column with xmin, xmax, ymin, ymax. I can use `pyarrow.dataset` to filter efficiently. Actually, let me try a simpler approach: use PyArrow with filter on the bbox columns. Or even better - use DuckDB but with the anonymous credentials. Let me check if the issue is with the duckdb query or the data volume. Let me first check how many buildings are in Lagos by just counting with a bbox filter.
20:57
Bash
python3 << 'PYEOF'
python3 << 'PYEOF'
import time
import duckdb
con = duckdb.connect()
con.execute("INSTALL spatial; LOAD spatial;")
con.execute("INSTALL httpfs; LOAD httpfs;")
con.execute("SET s3_region='us-west-2'")
con.execute("SET s3_endpoint='s3.us-west-2.amazonaws.com'")
con.execute("SET s3_use_ssl=true")
con.execute("SET s3_access_key_id=''")
con.execute("SET s3_secret_access_key=''")
con.execute("SET s3_session_token=''")
OAK = "s3://overturemaps-us-west-2/release/2026-05-20.0"
t0 = time.time()
# Just count how many buildings in Lagos bbox
result = con.execute(f"""
SELECT count(*) as cnt
FROM read_parquet(
'{OAK}/theme=buildings/type=building/*.zstd.parquet',
hive_partitioning=true
)
WHERE bbox.xmin < 4.351 AND bbox.xmax > 2.706
AND bbox.ymin < 6.697 AND bbox.ymax > 6.367
""").fetchone()
t1 = time.time()
print(f"Count: {result[0]}, Time: {t1-t0:.1f}s")
PYEOF
Count: 3060143, Time: 138.2s
23:34
OK so there are ~3 million buildings in Lagos bbox. Reading all those into a GeoDataFrame and then doing a spatial join with 20 LGAs might be heavy but let's try a more efficient approach. Instead of loading all 3M buildings into GeoPandas, let's use DuckDB to do the spatial join directly. We can load the LGA geometries into DuckDB and use ST_Intersects/ST_Within there.
23:34
Write
/work/solve.py
content · 152 lines · py
"""
Extract Lagos State (Nigeria) building footprints > 1000 m² from Overture Maps,
spatially join to LGAs using DuckDB spatial, and produce per-LGA summary.
Outputs:
/work/lagos_buildings.geoparquet - EPSG:4326 GeoParquet with id, height,
footprint_area_m2, lga, geometry
/work/lagos_building_summary.parquet - Plain Parquet with per-LGA rollup
"""
import duckdb
import geopandas as gpd
import pandas as pd
from shapely import wkb
import time
# ---------------------------------------------------------------
# 1. Connect to DuckDB with spatial + httpfs
# ---------------------------------------------------------------
con = duckdb.connect()
con.execute("INSTALL spatial; LOAD spatial;")
con.execute("INSTALL httpfs; LOAD httpfs;")
con.execute("SET s3_region='us-west-2'")
con.execute("SET s3_endpoint='s3.us-west-2.amazonaws.com'")
con.execute("SET s3_use_ssl=true")
con.execute("SET s3_access_key_id=''")
con.execute("SET s3_secret_access_key=''")
con.execute("SET s3_session_token=''")
OAK = "s3://overturemaps-us-west-2/release/2026-05-20.0"
t0 = time.time()
# ---------------------------------------------------------------
# 2. Load Lagos LGA boundaries into a DuckDB table
# ---------------------------------------------------------------
print("Loading Lagos LGA boundaries...")
lga_df = con.execute(f"""
SELECT
id AS lga_id,
names.primary AS lga_name,
geometry
FROM read_parquet(
'{OAK}/theme=divisions/type=division_area/*.zstd.parquet',
hive_partitioning=true
)
WHERE country = 'NG' AND region = 'NG-LA' AND subtype = 'county'
ORDER BY lga_name
""").fetchdf()
lga_gdf = gpd.GeoDataFrame(lga_df, geometry='geometry', crs='EPSG:4326')
print(f" Loaded {len(lga_gdf)} LGAs")
# Create a temporary table with LGA geometries
con.execute("CREATE TABLE lgas AS SELECT * FROM lga_df")
print(f" Created LGA table: {time.time()-t0:.1f}s")
# Get Lagos bounding box
minx, miny, maxx, maxy = lga_gdf.total_bounds
print(f" Lagos bbox: {minx:.4f}, {miny:.4f}, {maxx:.4f}, {maxy:.4f}")
# ---------------------------------------------------------------
# 3. Read buildings in Lagos bbox and spatial join with LGAs
# ---------------------------------------------------------------
print("Loading buildings & performing spatial join...")
# Strategy: use DuckDB's spatial join between buildings and LGA polygons
# Use a subquery that filters buildings by bbox, then spatially joins with LGAs
t1 = time.time()
join_result = con.execute(f"""
WITH lagos_buildings AS (
SELECT id, height, geometry
FROM read_parquet(
'{OAK}/theme=buildings/type=building/*.zstd.parquet',
hive_partitioning=true
)
WHERE bbox.xmin < {maxx}
AND bbox.xmax > {minx}
AND bbox.ymin < {maxy}
AND bbox.ymax > {miny}
)
SELECT b.id, b.height, b.geometry,
l.lga_name AS lga
FROM lagos_buildings b
JOIN lgas l ON ST_Intersects(b.geometry, l.geometry)
""").fetchdf()
t2 = time.time()
print(f" Joined {len(join_result)} buildings to LGAs in {t2-t1:.1f}s")
# Some buildings might intersect multiple LGAs — keep first match
join_result = join_result.drop_duplicates(subset='id')
print(f" After dedup: {len(join_result)} unique buildings")
# ---------------------------------------------------------------
# 4. Add area computation (in EPSG:32631 UTM 31N)
# ---------------------------------------------------------------
print("Computing footprint areas...")
gdf = gpd.GeoDataFrame(join_result, geometry='geometry', crs='EPSG:4326')
gdf_utm = gdf.to_crs('EPSG:32631')
gdf['footprint_area_m2'] = gdf_utm.area
print(f" Areas: min={gdf['footprint_area_m2'].min():.1f}, "
f"max={gdf['footprint_area_m2'].max():.1f}")
# ---------------------------------------------------------------
# 5. Filter buildings > 1000 m²
# ---------------------------------------------------------------
large = gdf[gdf['footprint_area_m2'] > 1000.0].copy()
print(f" Buildings > 1000 m²: {len(large)}")
# ---------------------------------------------------------------
# 6. Write output GeoParquet (EPSG:4326)
# ---------------------------------------------------------------
out_buildings = large[['id', 'height', 'footprint_area_m2', 'lga', 'geometry']].copy()
out_buildings = out_buildings.reset_index(drop=True)
print(f" Writing {len(out_buildings)} buildings to /work/lagos_buildings.geoparquet")
out_buildings.to_parquet(
'/work/lagos_buildings.geoparquet',
index=False,
compression='zstd',
schema_version='1.0'
)
# ---------------------------------------------------------------
# 7. Per-LGA summary
# ---------------------------------------------------------------
print("Computing LGA summary...")
summary = large.groupby('lga').agg(
n_buildings=('id', 'count'),
total_footprint_m2=('footprint_area_m2', 'sum'),
n_with_height=('height', lambda x: x.notna().sum()),
p50_height_m=('height', 'median')
).reset_index()
# Ensure all 20 LGAs are present
all_lgas = lga_gdf['lga_name'].tolist()
summary = summary.set_index('lga').reindex(all_lgas).reset_index()
summary['n_buildings'] = summary['n_buildings'].fillna(0).astype(int)
summary['n_with_height'] = summary['n_with_height'].fillna(0).astype(int)
summary['total_footprint_m2'] = summary['total_footprint_m2'].fillna(0.0)
print(f" Summary ({len(summary)} LGAs):")
print(summary.to_string())
print(" Writing to /work/lagos_building_summary.parquet")
summary.to_parquet(
'/work/lagos_building_summary.parquet',
index=False,
compression='zstd'
)
print(f"Total time: {time.time()-t0:.1f}s")
print("Done!")
Wrote /work/solve.py (5689 chars).
23:45
Let me first check if the DuckDB spatial join approach works quickly, or if it's going to be very slow. The previous attempt timed out at 600s. Let me check what's happening with a smaller test. Actually, the issue might be that reading all building parquet files (512 files, each ~5M rows) is very slow. Let me think of a more efficient approach. Let me try a different strategy: use pyarrow/s3fs to read the data more efficiently, or use DuckDB with a more targeted approach. Actually, looking at the previous timing: counting took 138s for 3M buildings. That's because it has to scan through all 512 parquet files. But the DuckDB query used bbox filtering which helps. Let me try a different approach: read the data in chunks using pyarrow, filter by bbox, and process locally.
23:45
Bash
python3 << 'PYEOF'
python3 << 'PYEOF'
import time
import duckdb
con = duckdb.connect()
con.execute("INSTALL spatial; LOAD spatial;")
con.execute("INSTALL httpfs; LOAD httpfs;")
con.execute("SET s3_region='us-west-2'")
con.execute("SET s3_endpoint='s3.us-west-2.amazonaws.com'")
con.execute("SET s3_use_ssl=true")
con.execute("SET s3_access_key_id=''")
con.execute("SET s3_secret_access_key=''")
con.execute("SET s3_session_token=''")
OAK = "s3://overturemaps-us-west-2/release/2026-05-20.0"
# First create LGA table
lga_df = con.execute(f"""
SELECT
id AS lga_id,
names.primary AS lga_name,
geometry
FROM read_parquet(
'{OAK}/theme=divisions/type=division_area/*.zstd.parquet',
hive_partitioning=true
)
WHERE country = 'NG' AND region = 'NG-LA' AND subtype = 'county'
ORDER BY lga_name
""").fetchdf()
con.execute("CREATE TABLE lgas AS SELECT * FROM lga_df")
# Get bbox
minx, miny, maxx, maxy = 2.706, 6.367, 4.351, 6.697
t0 = time.time()
# Try a more efficient approach: read from just the buildings files with streaming
# Actually, let's try reading the building data into a temp table first with bbox filter,
# then do the spatial join
con.execute(f"""
CREATE TABLE lagos_buildings AS
SELECT id, height, geometry
FROM read_parquet(
'{OAK}/theme=buildings/type=building/*.zstd.parquet',
hive_partitioning=true
)
WHERE bbox.xmin < {maxx}
AND bbox.xmax > {minx}
AND bbox.ymin < {maxy}
AND bbox.ymax > {miny}
""")
t1 = time.time()
print(f"Created lagos_buildings table: {t1-t0:.1f}s")
# Count
cnt = con.execute("SELECT count(*) FROM lagos_buildings").fetchone()[0]
print(f" Buildings: {cnt}")
# Now spatial join
result = con.execute("""
SELECT b.id, b.height, l.lga_name AS lga, b.geometry
FROM lagos_buildings b
JOIN lgas l ON ST_Intersects(b.geometry, l.geometry)
""").fetchdf()
t2 = time.time()
print(f"Spatial join: {t2-t1:.1f}s, Result: {len(result)} rows")
# Dedup
result = result.drop_duplicates(subset='id')
print(f"After dedup: {len(result)}")
print(f"Total time: {t2-t0:.1f}s")
PYEOF
Created lagos_buildings table: 184.7s Buildings: 3060143
Traceback (most recent call last):
File "<stdin>", line 58, in <module>
_duckdb.BinderException: Binder Error: No function matches the given name and argument types 'ST_Intersects(GEOMETRY('OGC:CRS84'), BLOB)'. You might need to add explicit type casts.
Candidate functions:
ST_Intersects(BOX_2D, BOX_2D) -> BOOLEAN
ST_Intersects(GEOMETRY, GEOMETRY) -> BOOLEAN
LINE 4: JOIN lgas l ON ST_Intersects(b.geometry, l.geometry)
^
27:13
The issue is that the LGA geometry column is stored as a blob (because DuckDB read it from parquet as binary), so I need to cast it to geometry or store it as geometry properly. Let me fix this.
27:13
Bash
python3 << 'PYEOF'
python3 << 'PYEOF'
import duckdb
con = duckdb.connect()
con.execute("INSTALL spatial; LOAD spatial;")
con.execute("INSTALL httpfs; LOAD httpfs;")
con.execute("SET s3_region='us-west-2'")
con.execute("SET s3_endpoint='s3.us-west-2.amazonaws.com'")
con.execute("SET s3_use_ssl=true")
con.execute("SET s3_access_key_id=''")
con.execute("SET s3_secret_access_key=''")
con.execute("SET s3_session_token=''")
OAK = "s3://overturemaps-us-west-2/release/2026-05-20.0"
# Create LGA table with geometry type properly
con.execute(f"""
CREATE TABLE lgas AS
SELECT id AS lga_id, names.primary AS lga_name,
geometry::GEOMETRY AS geom
FROM read_parquet(
'{OAK}/theme=divisions/type=division_area/*.zstd.parquet',
hive_partitioning=true
)
WHERE country = 'NG' AND region = 'NG-LA' AND subtype = 'county'
ORDER BY lga_name
""")
# Check the type
res = con.execute("SELECT typeof(geom) FROM lgas LIMIT 1").fetchone()
print(f"LGA geom type: {res[0]}")
# Also check building geometry type
res = con.execute(f"""
SELECT typeof(geometry)
FROM read_parquet(
'{OAK}/theme=buildings/type=building/part-00000-ddc67476-7254-536d-b4da-7c405a803f55-c000.zstd.parquet'
)
LIMIT 1
""").fetchone()
print(f"Building geom type: {res[0]}")
PYEOF
LGA geom type: GEOMETRY
Building geom type: GEOMETRY('OGC:CRS84')
27:42
The issue is that the building geometry has a specific SRID (OGC:CRS84) while the LGA geometry from `CREATE TABLE AS` lost the SRID. Let me fix this by explicitly casting or using `ST_Intersects` properly.
27:42
Bash
python3 << 'PYEOF'
python3 << 'PYEOF'
import time
import duckdb
con = duckdb.connect()
con.execute("INSTALL spatial; LOAD spatial;")
con.execute("INSTALL httpfs; LOAD httpfs;")
con.execute("SET s3_region='us-west-2'")
con.execute("SET s3_endpoint='s3.us-west-2.amazonaws.com'")
con.execute("SET s3_use_ssl=true")
con.execute("SET s3_access_key_id=''")
con.execute("SET s3_secret_access_key=''")
con.execute("SET s3_session_token=''")
OAK = "s3://overturemaps-us-west-2/release/2026-05-20.0"
t0 = time.time()
# Create LGA table
con.execute(f"""
CREATE TABLE lgas AS
SELECT id AS lga_id, names.primary AS lga_name,
geometry AS geom
FROM read_parquet(
'{OAK}/theme=divisions/type=division_area/*.zstd.parquet',
hive_partitioning=true
)
WHERE country = 'NG' AND region = 'NG-LA' AND subtype = 'county'
ORDER BY lga_name
""")
print(f"LGA table: {time.time()-t0:.1f}s")
# Create buildings table
minx, miny, maxx, maxy = 2.706, 6.367, 4.351, 6.697
con.execute(f"""
CREATE TABLE lagos_buildings AS
SELECT id, height, geometry
FROM read_parquet(
'{OAK}/theme=buildings/type=building/*.zstd.parquet',
hive_partitioning=true
)
WHERE bbox.xmin < {maxx}
AND bbox.xmax > {minx}
AND bbox.ymin < {maxy}
AND bbox.ymax > {miny}
""")
t1 = time.time()
cnt = con.execute("SELECT count(*) FROM lagos_buildings").fetchone()[0]
print(f"Buildings table: {t1-t0:.1f}s, Count: {cnt}")
# Try spatial join - cast lga.geom to same type
result = con.execute("""
SELECT b.id, b.height, l.lga_name AS lga, b.geometry
FROM lagos_buildings b
JOIN lgas l ON ST_Intersects(b.geometry, l.geom::GEOMETRY('OGC:CRS84'))
""").fetchdf()
t2 = time.time()
print(f"Spatial join: {t2-t1:.1f}s, Result: {len(result)} rows")
result = result.drop_duplicates(subset='id')
print(f"After dedup: {len(result)}")
print(f"Total time: {time.time()-t0:.1f}s")
PYEOF
LGA table: 14.9s Buildings table: 183.0s, Count: 3060143 Spatial join: 13.4s, Result: 2246328 rows After dedup: 2245422 Total time: 196.9s
31:23
The spatial join worked and is fast. Now let me update the full solve script to use this approach. The key insight is: 1. Create temp tables for both LGAs and buildings 2. Use proper type casting for the spatial join 3. Then add area calculations in Python with GeoPandas Let me write the final script.
31:23
31:23
Write
/work/solve.py
content · 158 lines · py
"""
Extract Lagos State (Nigeria) building footprints > 1000 m² from Overture Maps,
spatially join to LGAs using DuckDB spatial, and produce per-LGA summary.
Outputs:
/work/lagos_buildings.geoparquet - EPSG:4326 GeoParquet with id, height,
footprint_area_m2, lga, geometry
/work/lagos_building_summary.parquet - Plain Parquet with per-LGA rollup
"""
import duckdb
import geopandas as gpd
import pandas as pd
from shapely import wkb
import time
t0 = time.time()
# ---------------------------------------------------------------
# 1. Connect to DuckDB with spatial + httpfs
# ---------------------------------------------------------------
con = duckdb.connect()
con.execute("INSTALL spatial; LOAD spatial;")
con.execute("INSTALL httpfs; LOAD httpfs;")
con.execute("SET s3_region='us-west-2'")
con.execute("SET s3_endpoint='s3.us-west-2.amazonaws.com'")
con.execute("SET s3_use_ssl=true")
con.execute("SET s3_access_key_id=''")
con.execute("SET s3_secret_access_key=''")
con.execute("SET s3_session_token=''")
OAK = "s3://overturemaps-us-west-2/release/2026-05-20.0"
# ---------------------------------------------------------------
# 2. Load Lagos LGA boundaries into DuckDB table
# ---------------------------------------------------------------
print("Loading Lagos LGA boundaries...")
con.execute(f"""
CREATE TABLE lgas AS
SELECT id AS lga_id, names.primary AS lga_name,
geometry AS geom
FROM read_parquet(
'{OAK}/theme=divisions/type=division_area/*.zstd.parquet',
hive_partitioning=true
)
WHERE country = 'NG' AND region = 'NG-LA' AND subtype = 'county'
ORDER BY lga_name
""")
# Get the list of LGA names and bbox
lga_info = con.execute("""
SELECT lga_name, ST_AsWKB(geom) AS geom_wkb
FROM lgas
ORDER BY lga_name
""").fetchdf()
lga_geoms = [wkb.loads(bytes(w)) for w in lga_info['geom_wkb']]
lga_gdf = gpd.GeoDataFrame(
{'lga_name': lga_info['lga_name']},
geometry=lga_geoms,
crs='EPSG:4326'
)
minx, miny, maxx, maxy = lga_gdf.total_bounds
print(f" {len(lga_gdf)} LGAs loaded, bbox: {minx:.4f}, {miny:.4f}, {maxx:.4f}, {maxy:.4f}")
# ---------------------------------------------------------------
# 3. Load Lagos buildings filtered by bbox
# ---------------------------------------------------------------
print("Loading buildings in Lagos bbox and spatially joining to LGAs...")
con.execute(f"""
CREATE TABLE lagos_buildings AS
SELECT id, height, geometry
FROM read_parquet(
'{OAK}/theme=buildings/type=building/*.zstd.parquet',
hive_partitioning=true
)
WHERE bbox.xmin < {maxx}
AND bbox.xmax > {minx}
AND bbox.ymin < {maxy}
AND bbox.ymax > {miny}
""")
cnt = con.execute("SELECT count(*) FROM lagos_buildings").fetchone()[0]
print(f" {cnt} buildings in Lagos bbox")
# ---------------------------------------------------------------
# 4. Spatial join buildings to LGAs
# ------------------------------------------------------------
print(" Running spatial join...")
result = con.execute("""
SELECT b.id, b.height, l.lga_name AS lga, b.geometry
FROM lagos_buildings b
JOIN lgas l ON ST_Intersects(b.geometry, l.geom::GEOMETRY('OGC:CRS84'))
""").fetchdf()
print(f" {len(result)} rows from spatial join")
# Deduplicate (some buildings intersect multiple LGAs)
result = result.drop_duplicates(subset='id')
print(f" {len(result)} unique buildings after dedup")
# ---------------------------------------------------------------
# 5. Compute footprint area in square metres using UTM 31N
# ---------------------------------------------------------------
print("Computing footprint areas...")
gdf = gpd.GeoDataFrame(result, geometry='geometry', crs='EPSG:4326')
gdf_utm = gdf.to_crs('EPSG:32631')
gdf['footprint_area_m2'] = gdf_utm.area
print(f" Areas: min={gdf['footprint_area_m2'].min():.1f}, "
f"max={gdf['footprint_area_m2'].max():.1f}")
# ---------------------------------------------------------------
# 6. Filter to buildings > 1000 m²
# ---------------------------------------------------------------
large = gdf[gdf['footprint_area_m2'] > 1000.0].copy()
print(f" Buildings > 1000 m²: {len(large)}")
# ---------------------------------------------------------------
# 7. Write output GeoParquet (EPSG:4326)
# ---------------------------------------------------------------
out_buildings = large[['id', 'height', 'footprint_area_m2', 'lga', 'geometry']].copy()
out_buildings = out_buildings.reset_index(drop=True)
print(f" Writing {len(out_buildings)} buildings to /work/lagos_buildings.geoparquet")
out_buildings.to_parquet(
'/work/lagos_buildings.geoparquet',
index=False,
compression='zstd',
schema_version='1.0'
)
# ---------------------------------------------------------------
# 8. Per-LGA summary
# ---------------------------------------------------------------
print("Computing LGA summary...")
summary = large.groupby('lga').agg(
n_buildings=('id', 'count'),
total_footprint_m2=('footprint_area_m2', 'sum'),
n_with_height=('height', lambda x: x.notna().sum()),
p50_height_m=('height', 'median')
).reset_index()
# Ensure all 20 LGAs are present (fill missing with zeros/NULL)
all_lgas = lga_gdf['lga_name'].tolist()
summary = summary.set_index('lga').reindex(all_lgas).reset_index()
summary['n_buildings'] = summary['n_buildings'].fillna(0).astype(int)
summary['n_with_height'] = summary['n_with_height'].fillna(0).astype(int)
summary['total_footprint_m2'] = summary['total_footprint_m2'].fillna(0.0)
# p50_height_m remains NULL where no heights exist
print(f" Summary ({len(summary)} LGAs):")
print(summary.to_string())
print(" Writing to /work/lagos_building_summary.parquet")
summary.to_parquet(
'/work/lagos_building_summary.parquet',
index=False,
compression='zstd'
)
print(f"Total time: {time.time()-t0:.1f}s")
print("Done!")
Wrote /work/solve.py (5909 chars).
31:30
Bash
python3 /work/solve.py 2>&1
Loading Lagos LGA boundaries...
20 LGAs loaded, bbox: 2.7060, 6.3667, 4.3509, 6.6968
Loading buildings in Lagos bbox and spatially joining to LGAs...
3057423 buildings in Lagos bbox
Running spatial join...
2246328 rows from spatial join
2245422 unique buildings after dedup
Computing footprint areas...
Traceback (most recent call last):
File "/work/solve.py", line 103, in <module>
gdf = gpd.GeoDataFrame(result, geometry='geometry', crs='EPSG:4326')
File "/usr/local/lib/python3.14/site-packages/geopandas/geodataframe.py", line 243, in __init__
self.set_geometry(geometry, inplace=True, crs=crs)
~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.14/site-packages/geopandas/geodataframe.py", line 464, in set_geometry
level = _ensure_geometry(level, crs=crs)
File "/usr/local/lib/python3.14/site-packages/geopandas/geodataframe.py", line 68, in _ensure_geometry
out = from_shapely(np.asarray(data), crs=crs)
File "/usr/local/lib/python3.14/site-packages/geopandas/array.py", line 187, in from_shapely
raise TypeError(f"Input must be valid geometry objects: {geom}")
TypeError: Input must be valid geometry objects: bytearray(b'\x01\x03\x00\x00\x00\x01\x00\x00\x00\x05\x00\x00\x00\x0c/\xd2\xd5g\xa8\x05@\x18\x84\xa8\xb1s\x81\x19@\x1d\x83_4u\xa8\x05@\x0cz\xca\xc5s\x81\x19@\xe8\x9d\xaf\xfet\xa8\x05@\x96\xe2O\xf9|\x81\x19@$C\x8e\xadg\xa8\x05@I\xe9\xe3\xeb|\x81\x19@\x0c/\xd2\xd5g\xa8\x05@\x18\x84\xa8\xb1s\x81\x19@')
34:46
The issue is that DuckDB's geometry column is being read as bytes/bytearray, but GeoPandas expects Shapely geometry objects. I need to convert the geometry from DuckDB's format to Shapely objects. Let me convert using `wkb.loads()` on each geometry.
34:46
Edit
/work/solve.py
Edited /work/solve.py.
34:52
Bash
python3 /work/solve.py 2>&1
Loading Lagos LGA boundaries...
20 LGAs loaded, bbox: 2.7060, 6.3667, 4.3509, 6.6968
Loading buildings in Lagos bbox and spatially joining to LGAs...
3057423 buildings in Lagos bbox
Running spatial join...
2246328 rows from spatial join
2245422 unique buildings after dedup
Converting geometry and computing footprint areas...
Areas: min=0.0, max=181048.2
Buildings > 1000 m²: 7249
Writing 7249 buildings to /work/lagos_buildings.geoparquet
Traceback (most recent call last):
File "/work/solve.py", line 126, in <module>
out_buildings.to_parquet(
~~~~~~~~~~~~~~~~~~~~~~~~^
'/work/lagos_buildings.geoparquet',
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
...<2 lines>...
schema_version='1.0'
^^^^^^^^^^^^^^^^^^^^
)
^
File "/usr/local/lib/python3.14/site-packages/geopandas/geodataframe.py", line 1464, in to_parquet
_to_parquet(
~~~~~~~~~~~^
self,
^^^^^
...<6 lines>...
**kwargs,
^^^^^^^^^
)
^
File "/usr/local/lib/python3.14/site-packages/geopandas/io/arrow.py", line 440, in _to_parquet
table = _geopandas_to_arrow(
df,
...<3 lines>...
write_covering_bbox=write_covering_bbox,
)
File "/usr/local/lib/python3.14/site-packages/geopandas/io/arrow.py", line 357, in _geopandas_to_arrow
geo_metadata = _create_metadata(
df,
...<2 lines>...
write_covering_bbox=write_covering_bbox,
)
File "/usr/local/lib/python3.14/site-packages/geopandas/io/arrow.py", line 147, in _create_metadata
raise ValueError(
f"schema_version must be one of: {', '.join(SUPPORTED_VERSIONS)}"
)
ValueError: schema_version must be one of: 0.1.0, 0.4.0, 1.0.0-beta.1, 1.0.0, 1.1.0
38:24
Good progress! Two issues: 1. schema_version='1.0' should be '1.0.0' 2. The p50_height_m might have issues - let me check Also, min area is 0.0 which is suspicious - some geometries might be points or have zero area. Let me fix the schema version and continue.
38:24
Edit
/work/solve.py
Edited /work/solve.py.
38:36
Bash
python3 /work/solve.py 2>&1
Loading Lagos LGA boundaries...
20 LGAs loaded, bbox: 2.7060, 6.3667, 4.3509, 6.6968
Loading buildings in Lagos bbox and spatially joining to LGAs...
3057423 buildings in Lagos bbox
Running spatial join...
2246328 rows from spatial join
2245422 unique buildings after dedup
Converting geometry and computing footprint areas...
Areas: min=0.0, max=181048.2
Buildings > 1000 m²: 7249
Writing 7249 buildings to /work/lagos_buildings.geoparquet
Computing LGA summary...
Summary (20 LGAs):
lga n_buildings total_footprint_m2 n_with_height p50_height_m
0 Agege 55 1.178889e+05 0 NaN
1 Ajeromi/Ifelodun 60 1.078803e+05 0 NaN
2 Alimosho 353 6.476227e+05 0 NaN
3 Amuwo Odofin 834 1.746186e+06 0 NaN
4 Apapa 286 6.083739e+05 0 NaN
5 Badagry 124 2.259610e+05 0 NaN
6 Epe 345 8.156054e+05 0 NaN
7 Eti Osa 1055 1.828513e+06 237 20.0
8 Ibeju Lekki 593 1.790242e+06 0 NaN
9 Ifako/Ijaye 120 2.485358e+05 0 NaN
10 Ikeja 920 3.175973e+06 46 15.0
11 Kosofe 164 3.455446e+05 0 NaN
12 Lagos Island 117 2.261127e+05 9 20.0
13 Lagos Mainland 123 2.295373e+05 0 NaN
14 Mushin 461 1.050575e+06 1 35.0
15 Ojo 232 4.085015e+05 0 NaN
16 Oshodi/Isolo 415 7.888486e+05 3 27.0
17 Shomolu 122 1.951046e+05 3 17.0
18 Surulere 338 7.731603e+05 0 NaN
19 Ìkòròdú 532 1.852702e+06 2 8.0
Writing to /work/lagos_building_summary.parquet
Total time: 203.5s
Done!
42:22
Both files have been generated successfully. Let me verify the outputs to make sure they're correct.
42:22
Bash
python3 << 'PYEOF'
python3 << 'PYEOF'
import geopandas as gpd
import pandas as pd
# Verify buildings geoparquet
print("=== lagos_buildings.geoparquet ===")
gdf = gpd.read_parquet('/work/lagos_buildings.geoparquet')
print(f"Shape: {gdf.shape}")
print(f"Columns: {list(gdf.columns)}")
print(f"CRS: {gdf.crs}")
print(f"Sample rows:")
print(gdf.head(10).to_string())
print(f"\nNulls:\n{gdf.isnull().sum()}")
# Verify summary parquet
print("\n=== lagos_building_summary.parquet ===")
df = pd.read_parquet('/work/lagos_building_summary.parquet')
print(f"Shape: {df.shape}")
print(f"Columns: {list(df.columns)}")
print(f"\nAll rows:")
print(df.to_string())
print(f"\nNulls:\n{df.isnull().sum()}")
print(f"Dtypes:\n{df.dtypes}")
PYEOF
=== lagos_buildings.geoparquet ===
Shape: (7249, 5)
Columns: ['id', 'height', 'footprint_area_m2', 'lga', 'geometry']
CRS: {"$schema": "https://proj.org/schemas/v0.7/projjson.schema.json", "type": "GeographicCRS", "name": "WGS 84", "datum_ensemble": {"name": "World Geodetic System 1984 ensemble", "members": [{"name": "World Geodetic System 1984 (Transit)"}, {"name": "World Geodetic System 1984 (G730)"}, {"name": "World Geodetic System 1984 (G873)"}, {"name": "World Geodetic System 1984 (G1150)"}, {"name": "World Geodetic System 1984 (G1674)"}, {"name": "World Geodetic System 1984 (G1762)"}, {"name": "World Geodetic System 1984 (G2139)"}, {"name": "World Geodetic System 1984 (G2296)"}], "ellipsoid": {"name": "WGS 84", "semi_major_axis": 6378137, "inverse_flattening": 298.257223563}, "accuracy": "2.0", "id": {"authority": "EPSG", "code": 6326}}, "coordinate_system": {"subtype": "ellipsoidal", "axis": [{"name": "Geodetic latitude", "abbreviation": "Lat", "direction": "north", "unit": "degree"}, {"name": "Geodetic longitude", "abbreviation": "Lon", "direction": "east", "unit": "degree"}]}, "scope": "Horizontal component of 3D system.", "area": "World.", "bbox": {"south_latitude": -90, "west_longitude": -180, "north_latitude": 90, "east_longitude": 180}, "id": {"authority": "EPSG", "code": 4326}}
Sample rows:
id height footprint_area_m2 lga geometry
0 e24f1bf5-e957-4867-859b-a6a578fd9b06 NaN 1072.458049 Badagry POLYGON ((2.70836 6.37804, 2.70816 6.37804, 2.70817 6.37759, 2.70837 6.37759, 2.70836 6.37804))
1 5df57f70-f431-495f-98cb-eff96e1ca151 NaN 1052.426977 Badagry POLYGON ((2.72675 6.37881, 2.72672 6.37907, 2.72646 6.37904, 2.72645 6.37908, 2.7263 6.37906, 2.72632 6.37888, 2.72657 6.37891, 2.72657 6.37896, 2.72664 6.37897, 2.72665 6.37889, 2.72652 6.37888, 2.72654 6.37878, 2.72675 6.37881))
2 cfb03953-1851-4ec2-a087-3283cd510753 NaN 1077.286127 Badagry POLYGON ((2.7137 6.37984, 2.7136 6.37984, 2.71361 6.37962, 2.71356 6.37962, 2.71356 6.37955, 2.71398 6.37957, 2.71398 6.37964, 2.71393 6.37964, 2.71392 6.37983, 2.71382 6.37983, 2.71383 6.37966, 2.71382 6.37966, 2.71381 6.37983, 2.71372 6.37983, 2.71373 6.37963, 2.71371 6.37963, 2.7137 6.37984))
3 a98aa38c-6427-44a7-b421-4afe08a12736 NaN 1025.393519 Badagry POLYGON ((2.70892 6.38282, 2.70897 6.38283, 2.70897 6.38284, 2.709 6.38285, 2.709 6.38283, 2.70923 6.38284, 2.70921 6.38311, 2.7089 6.38309, 2.70892 6.38282))
4 76050de0-5fe7-4f62-9c6c-9da9f1ea5838 NaN 2957.625037 Badagry POLYGON ((2.75489 6.3838, 2.75521 6.38381, 2.75518 6.38458, 2.75486 6.38456, 2.75489 6.3838))
5 fd705243-4110-4880-8e2b-2347ab6d5df7 NaN 1866.547296 Badagry POLYGON ((2.77089 6.38733, 2.77089 6.38733, 2.77086 6.38744, 2.77082 6.38743, 2.77079 6.38752, 2.77083 6.38753, 2.77079 6.38767, 2.77049 6.38758, 2.77064 6.3871, 2.77093 6.38719, 2.77091 6.38726, 2.77091 6.38726, 2.77089 6.38733))
6 c1d67322-9991-4fc8-aa70-92ffb98d6053 NaN 1197.195382 Badagry POLYGON ((2.79038 6.38712, 2.79075 6.38713, 2.79074 6.3874, 2.79037 6.38738, 2.79038 6.38712))
7 0a35bec0-00e4-4c2b-8429-73ed712067a7 NaN 1230.180132 Epe POLYGON ((4.309 6.39156, 4.30903 6.39172, 4.30839 6.39183, 4.30837 6.39167, 4.309 6.39156))
8 8538438f-813b-421d-9a08-a69c6f58a3bc NaN 1380.873191 Badagry POLYGON ((2.7734 6.42377, 2.77378 6.42383, 2.77373 6.42412, 2.77359 6.42409, 2.77359 6.42412, 2.7735 6.4241, 2.7735 6.42408, 2.77335 6.42405, 2.7734 6.42377))
9 2dc26425-f32b-4fcc-ba97-c158bd09a7cb NaN 1396.006517 Badagry POLYGON ((2.77299 6.4237, 2.77337 6.42377, 2.77331 6.42405, 2.77317 6.42403, 2.77316 6.42405, 2.77308 6.42403, 2.77308 6.42401, 2.77294 6.42398, 2.77299 6.4237))
Nulls:
id 0
height 6948
footprint_area_m2 0
lga 0
geometry 0
dtype: int64
=== lagos_building_summary.parquet ===
Shape: (20, 5)
Columns: ['lga', 'n_buildings', 'total_footprint_m2', 'n_with_height', 'p50_height_m']
All rows:
lga n_buildings total_footprint_m2 n_with_height p50_height_m
0 Agege 55 1.178889e+05 0 NaN
1 Ajeromi/Ifelodun 60 1.078803e+05 0 NaN
2 Alimosho 353 6.476227e+05 0 NaN
3 Amuwo Odofin 834 1.746186e+06 0 NaN
4 Apapa 286 6.083739e+05 0 NaN
5 Badagry 124 2.259610e+05 0 NaN
6 Epe 345 8.156054e+05 0 NaN
7 Eti Osa 1055 1.828513e+06 237 20.0
8 Ibeju Lekki 593 1.790242e+06 0 NaN
9 Ifako/Ijaye 120 2.485358e+05 0 NaN
10 Ikeja 920 3.175973e+06 46 15.0
11 Kosofe 164 3.455446e+05 0 NaN
12 Lagos Island 117 2.261127e+05 9 20.0
13 Lagos Mainland 123 2.295373e+05 0 NaN
14 Mushin 461 1.050575e+06 1 35.0
15 Ojo 232 4.085015e+05 0 NaN
16 Oshodi/Isolo 415 7.888486e+05 3 27.0
17 Shomolu 122 1.951046e+05 3 17.0
18 Surulere 338 7.731603e+05 0 NaN
19 Ìkòròdú 532 1.852702e+06 2 8.0
Nulls:
lga 0
n_buildings 0
total_footprint_m2 0
n_with_height 0
p50_height_m 13
dtype: int64
Dtypes:
lga str
n_buildings int64
total_footprint_m2 float64
n_with_height int64
p50_height_m float64
dtype: object
42:37
Both output files look correct. Let me verify the outputs are as required: 1. `lagos_buildings.geoparquet` - EPSG:4326 GeoParquet with columns: `id`, `height`, `footprint_area_m2`, `lga`, `geometry` ✓ 2. `lagos_building_summary.parquet` - Plain Parquet (no geometry) with columns: `lga`, `n_buildings`, `total_footprint_m2`, `n_with_height`, `p50_height_m` ✓ The houses are >1000 m² buildings, which makes sense for large buildings like warehouses, commercial buildings, etc. Let me also verify that the summary correctly handles the case where an LGA has 0 buildings over 1000 m² - looking at the output, all 20 LGAs are present, so that's good. Some have NaN for p50_height_m when no height data is available. Everything looks good!
42:37