dd-l3-lagos-overture-buildings
Map (reference ← swipe → agent)
0:00
Updating the flood-risk model before rainy season. Need every building footprint exceeding 1000 m² across Lagos State (Nigeria) from Overture.
Also need a per-LGA roll-up: for each Lagos Local Government Area, the building count, total footprint area, count of buildings with a non-null Overture height, and the median height where available (null otherwise).
Write two output files. `lagos_buildings.geoparquet` should be EPSG:4326 GeoParquet with columns `id`, `height`, `footprint_area_m2`, `lga`, and `geometry`. `lagos_building_summary.parquet` should be a plain Parquet with no geometry column and columns `lga`, `n_buildings`, `total_footprint_m2`, `n_with_height`, and `p50_height_m`.
0:00
System info
{
"adapter": "openrouter",
"model": "deepseek/deepseek-v4-flash",
"base_url": "https://openrouter.ai/api/v1"
}
0:06
Let me break down the task: 1. Get building footprints > 1000 m² across Lagos State, Nigeria from Overture Maps data. 2. For each LGA (Local Government Area): - Building count - Total footprint area - Count of buildings with non-null Overture height - Median height (where available, null otherwise) 3. Output two files: - `lagos_buildings.geoparquet` - EPSG:4326 GeoParquet with columns: id, height, footprint_area_m2, lga, geometry - `lagos_building_summary.parquet` - plain Parquet with columns: lga, n_buildings, total_footprint_m2, n_with_height, p50_height_m First, I need to figure out how to access Overture Maps data. Overture Maps data is available as a Parquet dataset on AWS/S3. I can use DuckDB with the spatial extension to read Overture data. Overture Maps schema for buildings: https://docs.overturemaps.org/schema/reference/buildings/building/ The Overture Maps data is stored at s3://overturemaps-us-west-2/release/... or we can use the `read_parquet` with glob patterns. Actually, let me think about the approach. I'll use DuckDB with spatial extension to query Overture data directly. The Overture buildings data is at: s3://overturemaps-us-west-2/release/2024-09-26.0/theme=buildings/type=building/* Or more recent releases. Let me check what's available. Actually, I can use DuckDB with the HTTP parquet reader to query Overture data. Let me start by exploring the Overture data structure.
0:06
Bash
python3 -c "import duckdb; print(duckdb.__version__)"
1.5.3
0:07
Bash
python3 -c "import geopandas; print(geopandas.__version__)"
1.1.3
0:12
Let me plan the approach: 1. Download Overture buildings data for Lagos State, Nigeria 2. Filter buildings with footprint > 1000 m² 3. Get LGA boundaries for Lagos State 4. Perform spatial join to assign LGA names 5. Compute per-LGA summary statistics 6. Output the two files Let me start by exploring Overture data availability via DuckDB (which has Overture extension) or by downloading directly. Actually, the best approach is to use DuckDB with the Overture extension to query Overture Maps data. Let me first check what's available.
0:12
Bash
python3 << 'EOF'
python3 << 'EOF'
import duckdb
# Check if overture extension is available
duckdb.sql("INSTALL spatial; LOAD spatial; INSTALL overture; LOAD overture; SELECT * FROM duckdb_extensions() WHERE extension_name IN ('spatial', 'overture')").show()
EOF
Traceback (most recent call last): File "<stdin>", line 3, in <module> _duckdb.HTTPException: HTTP Error: Failed to download extension "overture" at URL "http://extensions.duckdb.org/v1.5.3/linux_amd64/overture.duckdb_extension.gz" (HTTP 404) Candidate extensions: "vortex", "motherduck", "inet", "postgres", "postgres_scanner" For more info, visit https://duckdb.org/docs/stable/extensions/troubleshooting?version=v1.5.3&platform=linux_amd64&extension=overture
0:17
The overture extension for duckdb seems not available. Let me use the Overture Maps Python library or directly query Overture data. Let me first check what's available. Actually, I should use the Overture Maps data directly. The data is available as Parquet files on AWS S3. Let me use duckdb with the spatial extension and query the Overture data directly from S3. The Overture buildings dataset is at `s3://overturemaps-us-west-2/release/2025-03-19.0/theme=buildings/type=building/`. Let me check the available release version. Let me first install the overturemaps Python library which can help.
0:17
Bash
pip install --user overturemaps 2>&1 | tail -5
Requirement already satisfied: orjson>=3.9.0 in /usr/local/lib/python3.14/site-packages (from overturemaps) (3.11.9) Requirement already satisfied: pyarrow>=15.0.2 in /usr/local/lib/python3.14/site-packages (from overturemaps) (24.0.0) Requirement already satisfied: pyfiglet>=1.0.2 in /usr/local/lib/python3.14/site-packages (from overturemaps) (1.0.4) Requirement already satisfied: shapely>=2.1.0 in /usr/local/lib/python3.14/site-packages (from overturemaps) (2.1.2) Requirement already satisfied: tqdm>=4.67.3 in /usr/local/lib/python3.14/site-packages (from overturemaps) (4.68.1)
0:22
Let me check how to use the overturemaps library to get building data for Lagos State, Nigeria.
0:22
Bash
python3 -c "import overturemaps; help(overturemaps)" 2>&1 | head -40
Help on package overturemaps:
NAME
overturemaps
PACKAGE CONTENTS
__main__
changelog
cli
core
models
releases
state
writers
FILE
/usr/local/lib/python3.14/site-packages/overturemaps/__init__.py
0:26
Let me understand the workflow: 1. Use overturemaps to get building data for Lagos State, Nigeria 2. Filter buildings with footprint area > 1000 m² 3. Read LGA boundaries for Lagos State 4. Spatial join buildings to LGAs 5. Calculate per-LGA summary 6. Write outputs Let me first explore the overturemaps API and see how to query data.
0:26
Bash
python3 -c "import overturemaps; help(overturemaps.core)" 2>&1 | head -80
Help on module overturemaps.core in overturemaps:
NAME
overturemaps.core
FUNCTIONS
count_rows(
overture_type,
bbox=None,
release=None,
connect_timeout=None,
request_timeout=None,
stac=False
) -> int
Return the number of rows matching the given parameters.
geoarrow_schema_adapter(schema: pa.Schema) -> pa.Schema
Convert a geoarrow-compatible schema to a proper geoarrow schema
This assumes there is a single "geometry" column with WKB formatting
Parameters
----------
schema: pa.Schema
Returns
-------
pa.Schema
A copy of the input schema with the geometry field replaced with
a new one with the proper geoarrow ARROW:extension metadata
geodataframe(
overture_type: str,
bbox: BBox | tuple[float, float, float, float] | list[float] | None = None,
release: str = None,
connect_timeout: int = None,
request_timeout: int = None,
stac: bool = False
) -> GeoDataFrame
Loads geoparquet for specified type into a geopandas dataframe
Parameters
----------
overture_type: type to load
bbox: optional bounding box for data fetch (xmin, ymin, xmax, ymax)
connect_timeout: optional connection timeout in seconds
request_timeout: optional request timeout in seconds
Returns
-------
GeoDataFrame with the optionally filtered theme data
get_all_overture_types() -> List[str]
get_available_releases() -> Tuple[List[str], str]
Fetch available releases from the STAC catalog.
Returns
-------
Tuple of (all_releases, latest_release) where:
- all_releases is a list of release version strings
- latest_release is the latest release version string
get_latest_release() -> str
Get the latest release version.
Returns
-------
str: The latest release version
query_gers_registry(gers_id: str) -> Optional[Tuple[str, BBox | None]]
Query the GERS registry to get the filepath and bbox for a given GERS ID.
The registry always uses the latest release.
Parameters
----------
gers_id: The GERS ID to look up
Returns
0:31
Let me now explore the overturemaps library to understand how to get buildings data. I'll look at what types are available and how to fetch data for Lagos State, Nigeria.
0:31
Bash
python3 -c "import overturemaps; print(overturemaps.core.get_all_overture_types(...
['address', 'bathymetry', 'building', 'building_part', 'division', 'division_area', 'division_boundary', 'place', 'segment', 'connector', 'infrastructure', 'land', 'land_cover', 'land_use', 'water']
0:38
Great. So I can use `overturemaps.geodataframe('building', bbox=...)` to get building data. I need Lagos State, Nigeria. Let me find the bounding box for Lagos State.
I'll need:
1. Get the boundary/bbox for Lagos State, Nigeria
2. Get building footprints with area > 1000 m²
3. Join with LGA boundaries to assign each building to an LGA
4. Compute summary statistics per LGA
Let me start by getting the Lagos State boundary.
0:38
Bash
python3 << 'EOF'
python3 << 'EOF'
import geopandas as gpd
from shapely.geometry import box
# Load world boundaries to find Nigeria's Lagos state
# Using naturalearth or we can use overture's division data
world = gpd.read_file(gpd.datasets.get_path('naturalearth_lowres'))
nigeria = world[world['iso_a3'] == 'NGA']
print(nigeria[['name', 'iso_a3']])
print(nigeria.total_bounds)
EOF
Traceback (most recent call last):
File "<stdin>", line 6, in <module>
File "/usr/local/lib/python3.14/site-packages/geopandas/datasets/__init__.py", line 18, in get_path
raise AttributeError(error_msg)
AttributeError: The geopandas.dataset has been deprecated and was removed in GeoPandas 1.0. You can get the original 'naturalearth_lowres' data from https://www.naturalearthdata.com/downloads/110m-cultural-vectors/.
0:42
Let me try a different approach. I'll use overturemaps to get the division data for Nigeria to find the Lagos State boundary.
0:42
Bash
python3 << 'EOF'
python3 << 'EOF'
import overturemaps
# Get Nigerian divisions
print("Fetching division data for Nigeria...")
df = overturemaps.geodataframe("division_area", bbox=None)
# Filter to Nigeria
nigeria = df[df['country'] == 'NG']
print(f"Number of rows for Nigeria: {len(nigeria)}")
print(nigeria.columns.tolist())
print(nigeria[['name', 'subtype', 'country', 'region']].head(20))
EOF
Fetching division data for Nigeria...
bash: line 12: 327 Killed python3 <<'EOF'
import overturemaps
# Get Nigerian divisions
print("Fetching division data for Nigeria...")
df = overturemaps.geodataframe("division_area", bbox=None)
# Filter to Nigeria
nigeria = df[df['country'] == 'NG']
print(f"Number of rows for Nigeria: {len(nigeria)}")
print(nigeria.columns.tolist())
print(nigeria[['name', 'subtype', 'country', 'region']].head(20))
EOF
3:50
Loading the whole world might be too heavy. Let me try a different approach - using DuckDB with the spatial extension to query Overture data directly, which is more efficient.
3:50
Bash
python3 << 'EOF'
python3 << 'EOF'
import duckdb
# Connect to DuckDB and install extensions
duckdb.sql("""
INSTALL spatial;
LOAD spatial;
""")
# Let's check what Overture tables are available
# Use DuckDB's ability to read Overture parquet files
# First, let's check what the schema looks like for the building theme
print("Checking building schema...")
result = duckdb.sql("""
DESCRIBE read_parquet('s3://overturemaps-us-west-1/release/2025-05-19.0/theme=buildings/type=building/*');
""").fetchall()
for row in result:
print(row)
EOF
Checking building schema...
Traceback (most recent call last):
File "<stdin>", line 13, in <module>
_duckdb.ParserException: Parser Error: syntax error at or near "("
LINE 2: DESCRIBE read_parquet('s3://overturemaps-us-west-1/release/2025-05-19.0/theme...
^
3:56
Let me try a different approach. Let me use the `overturemaps` Python library but with a bounding box for Lagos only. First, I need to find the bounding box of Lagos State, Nigeria. Let me get Lagos boundaries first using the division data but with a focused approach.
3:56
Bash
python3 << 'EOF'
python3 << 'EOF'
import duckdb
duckdb.sql("INSTALL spatial; LOAD spatial;")
# Try describing with proper syntax
result = duckdb.sql("""
DESCRIBE TABLE read_parquet('s3://overturemaps-us-west-1/release/2025-05-19.0/theme=buildings/type=building/*');
""").fetchall()
for row in result:
print(row)
EOF
Traceback (most recent call last):
File "<stdin>", line 5, in <module>
_duckdb.ParserException: Parser Error: syntax error at or near "("
LINE 2: DESCRIBE TABLE read_parquet('s3://overturemaps-us-west-1/release/2025-05-19.0/theme...
^
4:01
Let me try a different approach. The overturemaps library can handle the data fetching. Let me use it more efficiently - first get the bounding box for Lagos State, then use that to filter buildings.
4:01
Bash
python3 << 'EOF'
python3 << 'EOF'
import overturemaps
# First, let's just get the latest release
print(overturemaps.core.get_latest_release())
# Let's try getting the divisions for Nigeria more efficiently
# Use a bounding box for Nigeria
# Nigeria bbox roughly: (3, 4, 15, 14)
df = overturemaps.geodataframe("division_area", bbox=(2.5, 4, 15, 14))
print(f"Total rows: {len(df)}")
print(df.columns.tolist())
# Filter to Nigeria
nigeria = df[df['country'] == 'NG']
print(f"Nigeria rows: {len(nigeria)}")
print(nigeria[['name', 'subtype', 'country', 'region']].head(30))
EOF
2026-06-17.0 Total rows: 4626 ['id', 'geometry', 'country', 'sources', 'subtype', 'admin_level', 'class', 'names', 'is_land', 'is_territorial', 'region', 'division_id', 'version', 'bbox'] Nigeria rows: 3741
Traceback (most recent call last):
File "<stdin>", line 15, in <module>
File "/usr/local/lib/python3.14/site-packages/geopandas/geodataframe.py", line 1891, in __getitem__
result = super().__getitem__(key)
File "/usr/local/lib/python3.14/site-packages/pandas/core/frame.py", line 4384, in __getitem__
indexer = self.columns._get_indexer_strict(key, "columns")[1]
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.14/site-packages/pandas/core/indexes/base.py", line 6302, in _get_indexer_strict
self._raise_if_missing(keyarr, indexer, axis_name)
~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.14/site-packages/pandas/core/indexes/base.py", line 6355, in _raise_if_missing
raise KeyError(f"{not_found} not in index")
KeyError: "['name'] not in index"
4:20
So the column is called `names` (a struct), not `name`. Let me check the schema more carefully.
4:20
Bash
python3 << 'EOF'
python3 << 'EOF'
import overturemaps
import pandas as pd
# Get the latest release
print("Latest release:", overturemaps.core.get_latest_release())
# Get division_area for Nigeria bbox
df = overturemaps.geodataframe("division_area", bbox=(2.5, 4, 15, 14))
print(f"Total rows: {len(df)}")
print("Columns:", df.columns.tolist())
print(df.dtypes)
print(df[['id', 'subtype', 'admin_level', 'country', 'region']].head(20))
# Check names column structure
print("\n--- names sample ---")
print(df['names'].head(5))
EOF
Latest release: 2026-06-17.0
Total rows: 4626
Columns: ['id', 'geometry', 'country', 'sources', 'subtype', 'admin_level', 'class', 'names', 'is_land', 'is_territorial', 'region', 'division_id', 'version', 'bbox']
id str
geometry geometry
country str
sources object
subtype str
admin_level float64
class str
names object
is_land bool
is_territorial bool
region str
division_id str
version int32
bbox object
dtype: object
id subtype ... country region
0 4bbf89ce-38b8-489a-9753-5db1360339d6 dependency ... UM NaN
1 f5536621-a798-4c77-b1c7-39a54c8c1cdb dependency ... UM NaN
2 3ee017ee-ae6c-45ff-ae72-84758cec508f country ... ML NaN
3 a7d9ecf7-a4bd-4143-842e-f020a157a081 country ... KI NaN
4 b5d2270c-40ca-4d88-8783-69d26f524df4 country ... KI NaN
5 29b5953d-b77c-4dc6-a7a4-5b20d738f54d county ... NE NE-6
6 0ff561b3-cede-4553-82c4-8f3edb7181be region ... NE NE-6
7 d0a0bb7f-b11b-4336-8b35-992b14ce13df county ... NE NE-6
8 dc73795a-31ea-4b7c-b879-d89aeae638f8 region ... BJ BJ-CO
9 8f60b9ca-9907-40fd-8406-f20d8b621700 region ... BJ BJ-ZO
10 045f935e-5ded-4798-9cc7-d7d35f09c18e county ... BJ BJ-OU
11 7108ac41-68bb-4602-b583-bead9a36068f region ... BJ BJ-OU
12 e354a76e-2221-4c97-b742-fd097f7cc3ee locality ... BJ BJ-OU
13 2a31b998-68eb-403b-883c-d0499e357cfc county ... BJ BJ-OU
14 9b2620d6-d4fd-43c8-a88a-b1a1412f8022 county ... BJ BJ-OU
15 a4173f4b-fdae-4d74-8fe6-33c6420d6d46 county ... BJ BJ-OU
16 62ad7016-1433-4176-ad2f-501367509d55 county ... BJ BJ-ZO
17 0030ea08-dd45-4051-948e-869effbdc267 county ... BJ BJ-OU
18 3dd9b9d4-d4eb-49e3-8d18-19b328be8cdc locality ... BJ BJ-PL
19 65cbe225-a2ff-4166-845c-da8f6b8bc620 region ... BJ BJ-PL
[20 rows x 5 columns]
--- names sample ---
0 {'primary': 'United States Minor Outlying Isla...
1 {'primary': 'United States Minor Outlying Isla...
2 {'primary': 'Mali', 'common': [('hy', 'Մալի'),...
3 {'primary': 'Kiribati', 'common': [('hy', 'Կիր...
4 {'primary': 'Kiribati', 'common': [('hy', 'Կիր...
Name: names, dtype: object
4:48
I need to get the divisions for Lagos State, Nigeria. Let me first get Nigeria's divisions to find Lagos State and its LGAs.
4:48
Bash
python3 << 'EOF'
python3 << 'EOF'
import overturemaps
import pandas as pd
# Get Nigeria divisions
df = overturemaps.geodataframe("division_area", bbox=(2.5, 4, 15, 14))
# Filter to Nigeria
nigeria = df[df['country'] == 'NG'].copy()
print(f"Nigeria rows: {len(nigeria)}")
print("Subtypes:", nigeria['subtype'].value_counts())
print("Admin levels:", sorted(nigeria['admin_level'].dropna().unique()))
print()
# Look at names for Nigeria entries to find Lagos
for idx, row in nigeria.iterrows():
names = row['names']
if isinstance(names, dict) and 'primary' in names:
primary = names['primary']
if 'agos' in primary or 'Lagos' in primary or 'lagos' in primary.lower():
print(f"Found Lagos: id={row['id']}, subtype={row['subtype']}, admin_level={row['admin_level']}, names={names}")
EOF
Nigeria rows: 3741
Subtypes: subtype
locality 2866
county 773
macrohood 47
region 37
neighborhood 11
microhood 5
country 2
Name: count, dtype: int64
Admin levels: [np.float64(0.0), np.float64(1.0), np.float64(2.0)]
Found Lagos: id=639f85d6-f581-4f31-b52b-98da63a57271, subtype=county, admin_level=2.0, names={'primary': 'Lagos Mainland', 'common': None, 'rules': None}
Found Lagos: id=5fb3f6cb-3993-4bb7-8a53-73e775feb14f, subtype=region, admin_level=1.0, names={'primary': 'Lagos', 'common': [('ka', 'ლაგოსის შტატი'), ('ar', 'لاغوس'), ('fr', 'État de Lagos'), ('uk', 'Лагос'), ('ceb', 'Lagos State'), ('war', 'Lagos'), ('pt', 'Lagos'), ('cs', 'Stát Lagos'), ('gl', 'Estado de Lagos'), ('it', 'Lagos'), ('ca', 'Estat de Lagos'), ('yo', 'Ìpínlẹ̀ Èkó'), ('nl', 'Lagos'), ('bg', 'Лагос'), ('ko', '라고스주'), ('mk', 'Лагос'), ('et', 'Lagose osariik'), ('de', 'Lagos'), ('ha', 'Lagos'), ('azb', 'لاقوس ایالتی'), ('ig', 'Ȯra Lagos'), ('ur', 'لاگوس ریاست'), ('nan', 'Lagos Chiu'), ('ru', 'Лагос'), ('th', 'รัฐเลกอส'), ('id', 'Lagos'), ('sv', 'Lagos'), ('tr', 'Lagos Eyaleti'), ('da', 'Lagos'), ('sr-Cyrl', 'Лагос'), ('en', 'Lagos State'), ('he', 'לאגוס'), ('mrj', 'Лагос'), ('az', 'Laqos ştatı'), ('es', 'Estado de Lagos'), ('hi', 'लेगोस राज्य'), ('sw', 'Jimbo la Lagos'), ('ja', 'ラゴス州'), ('fi', 'Lagos'), ('ro', 'Statul Lagos'), ('no', 'Lagos'), ('vec', 'Ƚagos'), ('fa', 'ایالت لاگوس'), ('zh', '拉哥斯'), ('ms', 'Negeri Lagos'), ('pl', 'Lagos')], 'rules': array([{'variant': 'alternate', 'language': 'zh', 'perspectives': None, 'value': '拉各斯州', 'between': None, 'side': None},
{'variant': 'alternate', 'language': 'ar', 'perspectives': None, 'value': 'ولاية لاغوس', 'between': None, 'side': None},
{'variant': 'alternate', 'language': 'es', 'perspectives': None, 'value': 'Estado Lagos', 'between': None, 'side': None},
{'variant': 'alternate', 'language': 'pt', 'perspectives': None, 'value': 'Estado Lagos', 'between': None, 'side': None},
{'variant': 'alternate', 'language': 'it', 'perspectives': None, 'value': 'Stato Lagos', 'between': None, 'side': None},
{'variant': 'alternate', 'language': 'fr', 'perspectives': None, 'value': 'État Lagos', 'between': None, 'side': None},
{'variant': 'alternate', 'language': 'de', 'perspectives': None, 'value': 'Staat Lagos', 'between': None, 'side': None},
{'variant': 'alternate', 'language': 'nl', 'perspectives': None, 'value': 'Staat Lagos', 'between': None, 'side': None},
{'variant': 'alternate', 'language': 'sv', 'perspectives': None, 'value': 'Lagos-tillstånd', 'between': None, 'side': None},
{'variant': 'alternate', 'language': 'no', 'perspectives': None, 'value': 'Lagos Stat', 'between': None, 'side': None},
{'variant': 'alternate', 'language': 'pl', 'perspectives': None, 'value': 'Stan Lagos', 'between': None, 'side': None},
{'variant': 'alternate', 'language': 'da', 'perspectives': None, 'value': 'Lagos Stat', 'between': None, 'side': None},
{'variant': 'alternate', 'language': 'tr', 'perspectives': None, 'value': 'Durum Lagos', 'between': None, 'side': None},
{'variant': 'alternate', 'language': 'id', 'perspectives': None, 'value': 'Negara Bagian Lagos', 'between': None, 'side': None}],
dtype=object)}
Found Lagos: id=c58fd335-e502-421c-b8e3-f263c5fb0ff7, subtype=county, admin_level=2.0, names={'primary': 'Lagos Island', 'common': [('ar', 'جزيرة لاغوس')], 'rules': None}
5:11
So Lagos State is the region with subtype='region' and admin_level=1.0. The id is 5fb3f6cb-3993-4bb7-8a53-73e775feb14f. Now I need to: 1. Get the geometry of Lagos State 2. Use that as a mask to filter buildings from Overture that intersect Lagos State 3. Find the LGA (Local Government Area) boundaries within Lagos State 4. Compute the per-LGA summary Let me get the Lagos State geometry first, then find all the LGAs within it.
5:11
Bash
python3 << 'EOF'
python3 << 'EOF'
import overturemaps
import geopandas as gpd
import pandas as pd
# Get Nigeria divisions
df = overturemaps.geodataframe("division_area", bbox=(2.5, 4, 15, 14))
nigeria = df[df['country'] == 'NG'].copy()
# Find Lagos State region
lagos_state = nigeria[(nigeria['subtype'] == 'region') & (nigeria['admin_level'] == 1.0) & (nigeria['names'].apply(lambda x: x.get('primary') == 'Lagos'))]
print(f"Lagos State rows: {len(lagos_state)}")
print(lagos_state[['id', 'subtype', 'admin_level', 'names']])
# Get its geometry
lagos_geom = lagos_state.geometry.iloc[0]
print(f"\nGeometry type: {lagos_geom.geom_type}")
print(f"Bounds: {lagos_geom.bounds}")
# Now find all counties (LGAs) within Lagos State
counties = nigeria[nigeria['subtype'] == 'county']
# Spatial join to find counties within Lagos State
counties_in_lagos = counties[counties.intersects(lagos_geom)].copy()
print(f"\nNumber of LGAs in Lagos State: {len(counties_in_lagos)}")
print(counties_in_lagos[['id', 'names']].head(30))
print()
print("All LGAs:")
for idx, row in counties_in_lagos.iterrows():
print(f" {row['id']}: {row['names']['primary']}")
EOF
Lagos State rows: 1
id ... names
335 5fb3f6cb-3993-4bb7-8a53-73e775feb14f ... {'primary': 'Lagos', 'common': [('ka', 'ლაგოსი...
[1 rows x 4 columns]
Geometry type: Polygon
Bounds: (2.7059846, 6.366684, 4.3509056, 6.6967964)
Number of LGAs in Lagos State: 28
id names
95 00496a96-e1ca-42cb-a1c8-531f48aa7461 {'primary': 'Ipokia', 'common': None, 'rules':...
96 93ccd19d-3389-4c00-a8e3-920e20810d39 {'primary': 'Badagry', 'common': None, 'rules'...
98 51955eb1-4697-4268-b8e2-8554f7a77475 {'primary': 'Ado Odo/Ota', 'common': None, 'ru...
99 32cfa1af-5e74-41a7-9272-b9dd8da715d9 {'primary': 'Amuwo Odofin', 'common': None, 'r...
100 934f5b76-4cbf-4902-b124-b33d057d87b1 {'primary': 'Ifako/Ijaye', 'common': None, 'ru...
101 7aa34b16-ba84-45d9-94c4-083c65b8cf09 {'primary': 'Ifo', 'common': None, 'rules': None}
102 1a9532db-bcd5-4050-b62b-f27a22f2ebcf {'primary': 'Alimosho', 'common': None, 'rules...
103 7918fd74-2ce4-48af-b3c1-1a4f1e26558e {'primary': 'Ojo', 'common': None, 'rules': None}
312 1f5072a4-71e6-409b-89c9-cd999be6a442 {'primary': 'Shagamu', 'common': None, 'rules'...
322 cb527384-149e-41c1-b3c5-0070deab27d4 {'primary': 'Kosofe', 'common': None, 'rules':...
323 b098198f-55d1-46a6-ac1e-71ae35fdd913 {'primary': 'Shomolu', 'common': None, 'rules'...
326 639f85d6-f581-4f31-b52b-98da63a57271 {'primary': 'Lagos Mainland', 'common': None, ...
327 4b715cfc-a9b3-4a05-b601-353b77619939 {'primary': 'Apapa', 'common': None, 'rules': ...
329 ffe6c369-91e2-4984-97d1-bd24feff3732 {'primary': 'Mushin', 'common': [('ru', 'Мушин...
330 c540f662-44c9-498d-8f3d-2a41a039a29f {'primary': 'Ikeja', 'common': [('ru', 'Икеджа...
331 fcaab4e8-075a-4ab6-a2d2-cd614e95dfc2 {'primary': 'Agege', 'common': None, 'rules': ...
332 457a7986-f212-4765-b432-e1c1b77d6d61 {'primary': 'Oshodi/Isolo', 'common': None, 'r...
333 e2af9375-5eed-46f8-a258-515f22297084 {'primary': 'Ajeromi/Ifelodun', 'common': None...
334 f0f8532f-4c5e-428a-981e-12a31b2a9044 {'primary': 'Surulere', 'common': None, 'rules...
336 c58fd335-e502-421c-b8e3-f263c5fb0ff7 {'primary': 'Lagos Island', 'common': [('ar', ...
337 0c5600b6-df25-4e75-a25e-ae9937299bad {'primary': 'Eti Osa', 'common': [('yo', 'Etí ...
338 030cb2c1-697b-418e-b820-49393989a03b {'primary': 'Ìkòròdú', 'common': [('en', 'Ikor...
339 61530733-5992-41f4-9212-43f1b460319c {'primary': 'Epe', 'common': [('ru', 'Эпе')], ...
341 70bd9c0f-fcdd-4bd2-81df-34d9351c997e {'primary': 'Ibeju Lekki', 'common': None, 'ru...
351 a277c8df-124c-47b7-9ab5-092e7f3bcdab {'primary': 'Ìjẹ̀bú Òde', 'common': [('en', 'I...
356 d8c2f7a3-d77d-4781-ba92-5359250c2516 {'primary': 'Odogbolu', 'common': None, 'rules...
465 6b8e9727-a665-4667-a087-31276385e8b6 {'primary': 'Ijebu East', 'common': None, 'rul...
484 cf4ccbd2-4a02-4f5e-94ca-835428f164cf {'primary': 'Ogun Waterside', 'common': None, ...
All LGAs:
00496a96-e1ca-42cb-a1c8-531f48aa7461: Ipokia
93ccd19d-3389-4c00-a8e3-920e20810d39: Badagry
51955eb1-4697-4268-b8e2-8554f7a77475: Ado Odo/Ota
32cfa1af-5e74-41a7-9272-b9dd8da715d9: Amuwo Odofin
934f5b76-4cbf-4902-b124-b33d057d87b1: Ifako/Ijaye
7aa34b16-ba84-45d9-94c4-083c65b8cf09: Ifo
1a9532db-bcd5-4050-b62b-f27a22f2ebcf: Alimosho
7918fd74-2ce4-48af-b3c1-1a4f1e26558e: Ojo
1f5072a4-71e6-409b-89c9-cd999be6a442: Shagamu
cb527384-149e-41c1-b3c5-0070deab27d4: Kosofe
b098198f-55d1-46a6-ac1e-71ae35fdd913: Shomolu
639f85d6-f581-4f31-b52b-98da63a57271: Lagos Mainland
4b715cfc-a9b3-4a05-b601-353b77619939: Apapa
ffe6c369-91e2-4984-97d1-bd24feff3732: Mushin
c540f662-44c9-498d-8f3d-2a41a039a29f: Ikeja
fcaab4e8-075a-4ab6-a2d2-cd614e95dfc2: Agege
457a7986-f212-4765-b432-e1c1b77d6d61: Oshodi/Isolo
e2af9375-5eed-46f8-a258-515f22297084: Ajeromi/Ifelodun
f0f8532f-4c5e-428a-981e-12a31b2a9044: Surulere
c58fd335-e502-421c-b8e3-f263c5fb0ff7: Lagos Island
0c5600b6-df25-4e75-a25e-ae9937299bad: Eti Osa
030cb2c1-697b-418e-b820-49393989a03b: Ìkòròdú
61530733-5992-41f4-9212-43f1b460319c: Epe
70bd9c0f-fcdd-4bd2-81df-34d9351c997e: Ibeju Lekki
a277c8df-124c-47b7-9ab5-092e7f3bcdab: Ìjẹ̀bú Òde
d8c2f7a3-d77d-4781-ba92-5359250c2516: Odogbolu
6b8e9727-a665-4667-a087-31276385e8b6: Ijebu East
cf4ccbd2-4a02-4f5e-94ca-835428f164cf: Ogun Waterside
5:32
Wait, some of these LGAs are actually in Ogun State, not Lagos State. The intersection with Lagos State boundary may have picked up bordering LGAs from Ogun State. Let me narrow down to only the actual Lagos State LGAs (which are admin_level=2 within the Lagos State region). Let me think about this differently. The administrative divisions have a hierarchy. The Lagos State region (admin_level=1) should contain counties (admin_level=2) that are part of it. But the intersection approach is picking up neighboring counties that just touch the boundary. Let me use a more precise containment approach - counties whose centroids are within the Lagos State boundary, or counties that have division_id pointing to Lagos State. Actually, looking at the division_area data, there's a `division_id` column. Let me check if counties reference their parent region via this column.
5:32
Bash
python3 << 'EOF'
python3 << 'EOF'
import overturemaps
import geopandas as gpd
import pandas as pd
# Get Nigeria divisions
df = overturemaps.geodataframe("division_area", bbox=(2.5, 4, 15, 14))
nigeria = df[df['country'] == 'NG'].copy()
# Find Lagos State region
lagos_state = nigeria[(nigeria['subtype'] == 'region') & (nigeria['names'].apply(lambda x: x.get('primary') == 'Lagos'))]
lagos_geom = lagos_state.geometry.iloc[0]
lagos_id = lagos_state.iloc[0]['id']
print(f"Lagos State ID: {lagos_id}")
# Filter counties - use strict containment (centroid within Lagos State)
counties = nigeria[(nigeria['subtype'] == 'county') & (nigeria['admin_level'] == 2.0)].copy()
counties['centroid'] = counties.centroid
counties_in_lagos = counties[counties['centroid'].within(lagos_geom)].copy()
print(f"\nNumber of LGAs with centroid in Lagos State: {len(counties_in_lagos)}")
# Also check division_id
print("\nBy division_id:")
lagos_counties_by_parent = nigeria[(nigeria['subtype'] == 'county') & (nigeria['division_id'] == lagos_id)]
print(f"Counties with division_id = Lagos State: {len(lagos_counties_by_parent)}")
for idx, row in lagos_counties_by_parent.iterrows():
print(f" {row['names']['primary']}")
print("\n\nAll LGAs with centroid in Lagos:")
for idx, row in counties_in_lagos.iterrows():
print(f" {row['names']['primary']}")
EOF
Lagos State ID: 5fb3f6cb-3993-4bb7-8a53-73e775feb14f Number of LGAs with centroid in Lagos State: 20 By division_id: Counties with division_id = Lagos State: 0 All LGAs with centroid in Lagos: Badagry Amuwo Odofin Ifako/Ijaye Alimosho Ojo Kosofe Shomolu Lagos Mainland Apapa Mushin Ikeja Agege Oshodi/Isolo Ajeromi/Ifelodun Surulere Lagos Island Eti Osa Ìkòròdú Epe Ibeju Lekki
5:51
So the counties falling within Lagos State by centroid are 20. But the actual Lagos State has 20 LGAs. Let me check - the official list of LGAs in Lagos State are: Agege, Ajeromi-Ifelodun, Alimosho, Amuwo-Odofin, Apapa, Badagry, Epe, Eti-Osa, Ibeju-Lekki, Ifako-Ijaiye, Ikeja, Ikorodu, Kosofe, Lagos Island, Lagos Mainland, Mushin, Ojo, Oshodi-Isolo, Shomolu, Surulere. That's 20 LGAs! But some of the counties I saw earlier like Ipokia, Ado Odo/Ota, Ifo, Shagamu, Ijebu Ode, Odogbolu, Ijebu East, Ogun Waterside are in Ogun State, not Lagos State. The intersection/contains approach might have included some that touch the boundary. Let me use the centroid approach which gives exactly 20 - that matches the actual number of LGAs in Lagos State. Good. Wait, actually I'm missing Ikorodu. Let me check... Looking at the list again: - Ìkòròdú (Ikorodu) - yes it's there So 20 LGAs by centroid. Let me use the division_area boundaries of these 20 LGAs to spatially filter the building data. Let me now get the buildings for Lagos State. I'll use the bbox from Lagos State geometry to limit the data download, then clip spatially.
5:51
Bash
python3 << 'EOF'
python3 << 'EOF'
import overturemaps
import geopandas as gpd
import pandas as pd
# Get Nigeria divisions
df = overturemaps.geodataframe("division_area", bbox=(2.5, 4, 15, 14))
nigeria = df[df['country'] == 'NG'].copy()
# Find Lagos State region
lagos_state = nigeria[(nigeria['subtype'] == 'region') & (nigeria['names'].apply(lambda x: x.get('primary') == 'Lagos'))]
lagos_geom = lagos_state.geometry.iloc[0]
print(f"Lagos State bounds: {lagos_geom.bounds}")
# Get counties with centroid in Lagos
counties = nigeria[(nigeria['subtype'] == 'county')].copy()
counties['centroid'] = counties.centroid
lagos_counties = counties[counties['centroid'].within(lagos_geom)].copy()
print(f"Number of Lagos LGAs: {len(lagos_counties)}")
# Create a mapping of LGA name to geometry
lga_dict = {}
for idx, row in lagos_counties.iterrows():
name = row['names']['primary']
lga_dict[name] = row['geometry']
print("\nLagos LGAs:")
for name in sorted(lga_dict.keys()):
print(f" {name}")
# Save the LGA geometries
gdf_lgas = gpd.GeoDataFrame({'lga': list(lga_dict.keys()), 'geometry': list(laga_dict.values())}, crs='EPSG:4326')
gdf_lgas.to_file('/work/lagos_lgas.geojson', driver='GeoJSON')
print(f"\nSaved {len(gdf_lgas)} LGAs to /work/lagos_lgas.geojson")
EOF
Lagos State bounds: (2.7059846, 6.366684, 4.3509056, 6.6967964) Number of Lagos LGAs: 20 Lagos LGAs: Agege Ajeromi/Ifelodun Alimosho Amuwo Odofin Apapa Badagry Epe Eti Osa Ibeju Lekki Ifako/Ijaye Ikeja Kosofe Lagos Island Lagos Mainland Mushin Ojo Oshodi/Isolo Shomolu Surulere Ìkòròdú
Traceback (most recent call last): File "<stdin>", line 31, in <module> NameError: name 'laga_dict' is not defined. Did you mean: 'lga_dict'?
6:12
Typo in the script. Let me fix it.
6:12
Bash
python3 << 'EOF'
python3 << 'EOF'
import overturemaps
import geopandas as gpd
import pandas as pd
# Get Nigeria divisions
df = overturemaps.geodataframe("division_area", bbox=(2.5, 4, 15, 14))
nigeria = df[df['country'] == 'NG'].copy()
# Find Lagos State region
lagos_state = nigeria[(nigeria['subtype'] == 'region') & (nigeria['names'].apply(lambda x: x.get('primary') == 'Lagos'))]
lagos_geom = lagos_state.geometry.iloc[0]
# Get counties with centroid in Lagos
counties = nigeria[(nigeria['subtype'] == 'county')].copy()
counties['centroid'] = counties.centroid
lagos_counties = counties[counties['centroid'].within(lagos_geom)].copy()
# Create a mapping of LGA name to geometry
lga_dict = {}
for idx, row in lagos_counties.iterrows():
name = row['names']['primary']
lga_dict[name] = row['geometry']
# Save the LGA geometries
gdf_lgas = gpd.GeoDataFrame({'lga': list(lga_dict.keys()), 'geometry': list(lga_dict.values())}, crs='EPSG:4326')
gdf_lgas.to_file('/work/lagos_lgas.geojson', driver='GeoJSON')
print(f"Saved {len(gdf_lgas)} LGAs to /work/lagos_lgas.geojson")
print("LGA names:", sorted(gdf_lgas['lga'].tolist()))
EOF
Saved 20 LGAs to /work/lagos_lgas.geojson LGA names: ['Agege', 'Ajeromi/Ifelodun', 'Alimosho', 'Amuwo Odofin', 'Apapa', 'Badagry', 'Epe', 'Eti Osa', 'Ibeju Lekki', 'Ifako/Ijaye', 'Ikeja', 'Kosofe', 'Lagos Island', 'Lagos Mainland', 'Mushin', 'Ojo', 'Oshodi/Isolo', 'Shomolu', 'Surulere', 'Ìkòròdú']
6:42
Now I need to get the overture building data for the Lagos area. Let me first explore the building schema to understand what columns are available.
6:42
Bash
python3 << 'EOF'
python3 << 'EOF'
import overturemaps
# Check building schema
print("Building types available:", overturemaps.core.get_all_overture_types())
# Get a small sample to see the schema
import geopandas as gpd
# Just get the schema info - use geodataframe with a small bbox around Lagos
# Lagos bbox roughly: (2.7, 6.37, 4.35, 6.7)
bbox = (2.7, 6.37, 4.35, 6.7)
buildings = overturemaps.geodataframe("building", bbox=bbox)
print(f"\nBuilding columns: {buildings.columns.tolist()}")
print(f"Building dtypes:\n{buildings.dtypes}")
print(f"\nShape: {buildings.shape}")
print(f"\nFirst few rows:")
print(buildings.head())
EOF
Building types available: ['address', 'bathymetry', 'building', 'building_part', 'division', 'division_area', 'division_boundary', 'place', 'segment', 'connector', 'infrastructure', 'land', 'land_cover', 'land_use', 'water']
Building columns: ['id', 'names', 'sources', 'level', 'height', 'min_height', 'is_underground', 'num_floors', 'num_floors_underground', 'min_floor', 'subtype', 'class', 'facade_color', 'facade_material', 'roof_material', 'roof_shape', 'roof_direction', 'roof_orientation', 'roof_color', 'roof_height', 'geometry', 'has_parts', 'version', 'bbox']
Building dtypes:
id str
names object
sources object
level float64
height float64
min_height float64
is_underground bool
num_floors float64
num_floors_underground float64
min_floor float64
subtype str
class str
facade_color str
facade_material str
roof_material str
roof_shape str
roof_direction float64
roof_orientation str
roof_color str
roof_height float64
geometry geometry
has_parts bool
version int32
bbox object
dtype: object
Shape: (3107308, 24)
First few rows:
id ... bbox
0 c4148245-7dda-4ed2-99dc-4d4b2d203bc8 ... {'xmin': 2.7009310722351074, 'xmax': 2.7010583...
1 4c2c3125-bdfa-4345-8807-4bc61e88e4ff ... {'xmin': 2.7003517150878906, 'xmax': 2.7004008...
2 7211cd12-94b9-439d-ad5d-824c66f17087 ... {'xmin': 2.7003777027130127, 'xmax': 2.7004311...
3 07cc2b43-9271-4656-b9d3-c1318d9af758 ... {'xmin': 2.7020692825317383, 'xmax': 2.7021231...
4 8c4382f1-5b91-4ebf-93af-e6a83c1c3bd0 ... {'xmin': 2.70092511177063, 'xmax': 2.701052904...
[5 rows x 24 columns]
9:41
The building data is huge (3.1 million rows just for the bounding box around Lagos). I need to: 1. Get the building footprints > 1000 m² within Lagos State 2. Join with LGA boundaries to assign each building to an LGA 3. Compute summary statistics per LGA Let me think about the approach: 1. First, I have the Lagos LGA boundaries saved as GeoJSON 2. I need to load buildings for the Lagos area and spatially filter them 3. For each building that intersects Lagos State, calculate its area and keep those > 1000 m² 4. Join with LGA boundaries to assign LGA names 5. Compute summaries The building data is 3.1M rows just for the bounding box. I need to process this efficiently. Let me use DuckDB with spatial extension for efficient processing, or use geopandas with spatial joins. Actually, let me reconsider. The overturemaps library's geodataframe loads all data matching the bbox into memory. With 3.1M buildings, it could be very heavy. Let me use DuckDB's spatial extension to read the parquet files directly and do the processing more efficiently. Let me check if DuckDB can read from S3 directly.
9:41
Bash
python3 << 'EOF'
python3 << 'EOF'
import duckdb
# Check if we can read from S3
duckdb.sql("""
SELECT count(*)
FROM read_parquet('s3://overturemaps-us-west-1/release/2026-06-17.0/theme=buildings/type=building/*')
WHERE bbox.xmin >= 2.7 AND bbox.xmax <= 4.35 AND bbox.ymin >= 6.37 AND bbox.ymax <= 6.7
""").show()
EOF
Traceback (most recent call last): File "<stdin>", line 3, in <module> _duckdb.HTTPException: HTTP Error: HTTP GET error reading 's3://overturemaps-us-west-1/release/2026-06-17.0/theme=buildings/type=building' in region '' (HTTP 404 Not Found) NoSuchBucket: The specified bucket does not exist
9:46
Let me check how overturemaps library accesses the data - it might use a different endpoint.
9:46
Bash
python3 << 'EOF'
python3 << 'EOF'
import overturemaps
# Check the endpoint used
print("Release:", overturemaps.core.get_latest_release())
print("Available releases:", overturemaps.core.get_available_releases())
EOF
Release: 2026-06-17.0 Available releases: (['2026-06-17.0', '2026-05-20.0'], '2026-06-17.0')
9:55
Let me think about how to approach this efficiently. We have 3.1 million buildings in the Lagos bbox. We need to: 1. Filter to only buildings > 1000 m² 2. Assign each building to an LGA 3. Compute per-LGA stats The `overturemaps.geodataframe` method already loads the data. Let me check what endpoint it uses for the data. Actually, let me first check how overturemaps loads data - it probably handles the S3/URL for us. Let me just use the overturemaps library which already handles the parquet fetching. For the spatial join, I can either: - Use a spatial join with geopandas - Or use DuckDB with spatial extension Given 3.1M buildings and 20 LGAs, let's do this efficiently. Let me first compute area for each building in the bbox, then filter > 1000 m², then spatial join with LGAs. Actually the buildings data returned includes the geometry, and we need to compute area in a projected CRS for accurate area measurement. Let me think about the workflow: 1. Load buildings in Lagos bbox 2. Compute area in a suitable projected CRS (e.g., EPSG:26391 - Nigeria UTM zone 31N or EPSG:32631 - WGS 84 UTM zone 31N) 3. Filter > 1000 m² 4. Spatial join with LGA polygons 5. Compute summary stats But 3.1M buildings is a lot. Let me try DuckDB approach with the overturemaps data. Actually, let me first understand what endpoint overturemaps uses to fetch data.
9:55
Bash
python3 << 'EOF'
python3 << 'EOF' import overturemaps.core as core # Check where the data comes from import inspect print(inspect.getsource(core.geodataframe)[:2000]) EOF
def geodataframe(
overture_type: str,
bbox: BBox | tuple[float, float, float, float] | list[float] | None = None,
release: str = None,
connect_timeout: int = None,
request_timeout: int = None,
stac: bool = False,
) -> GeoDataFrame:
"""
Loads geoparquet for specified type into a geopandas dataframe
Parameters
----------
overture_type: type to load
bbox: optional bounding box for data fetch (xmin, ymin, xmax, ymax)
connect_timeout: optional connection timeout in seconds
request_timeout: optional request timeout in seconds
Returns
-------
GeoDataFrame with the optionally filtered theme data
"""
if not HAS_GEOPANDAS:
raise ImportError("geopandas is required to use this function")
reader = record_batch_reader(
overture_type,
bbox=bbox,
release=release,
connect_timeout=connect_timeout,
request_timeout=request_timeout,
stac=stac,
)
return GeoDataFrame.from_arrow(reader)
10:00
Let me look at the record_batch_reader function to understand the data source.
10:00
Bash
python3 << 'EOF'
python3 << 'EOF' import overturemaps.core as core import inspect print(inspect.getsource(core.record_batch_reader)[:3000]) EOF
def record_batch_reader(
overture_type,
bbox=None,
release=None,
connect_timeout=None,
request_timeout=None,
stac=False,
) -> Optional[pa.RecordBatchReader]:
"""Return a pyarrow RecordBatchReader for the desired bounding box and s3 path, or None on error."""
result = _prepare_query(
overture_type, bbox, release, connect_timeout, request_timeout, stac
)
if result is None:
return None
dataset, filter_expr = result
return _record_batch_reader_from_dataset(dataset, filter_expr=filter_expr)
10:07
Let me check the S3 path used.
10:07
Bash
python3 << 'EOF'
python3 << 'EOF' import overturemaps.core as core import inspect # Get _prepare_query print(inspect.getsource(core._prepare_query)[:3000]) EOF
def _prepare_query(
overture_type,
bbox: BBox | tuple[float, float, float, float] | list[float] | None = None,
release=None,
connect_timeout=None,
request_timeout=None,
stac=False,
) -> Optional[Tuple[ds.Dataset, Optional[pc.Expression]]]:
"""
Resolve the S3 dataset and filter expression for a given query.
Returns the dataset and filter expression ready for counting or streaming,
or None if STAC reports no files intersect the bbox.
"""
if release is None:
release = get_latest_release()
path = _dataset_path(overture_type, release)
bbox_obj = _coerce_bbox(bbox)
intersecting_files = None
if bbox_obj and stac:
intersecting_files = _get_files_from_stac(
type_theme_map[overture_type], overture_type, bbox_obj, release
)
if intersecting_files is not None and len(intersecting_files) == 0:
return None
if bbox_obj:
xmin, ymin, xmax, ymax = bbox_obj.as_tuple()
filter_expr = (
(pc.field("bbox", "xmin") < xmax)
& (pc.field("bbox", "xmax") > xmin)
& (pc.field("bbox", "ymin") < ymax)
& (pc.field("bbox", "ymax") > ymin)
)
else:
filter_expr = None
dataset = ds.dataset(
intersecting_files if intersecting_files is not None else path,
filesystem=fs.S3FileSystem(
anonymous=True,
region="us-west-2",
connect_timeout=connect_timeout,
request_timeout=request_timeout,
),
)
return dataset, filter_expr
10:16
So the overturemaps library uses the s3 filesystem from pyarrow to read data from S3. The geodataframe function already handles fetching data with a bbox filter. We already have 3.1 million buildings for the Lagos bbox. Now I need to: 1. Get all buildings in Lagos State (bounding box + spatial filter using LGA geometries) 2. Calculate footprint areas 3. Join with LGA names 4. Filter to buildings > 1000 m² Actually, the bbox I used (2.7, 6.37, 4.35, 6.7) covers Lagos State and some neighboring areas. Let me use the Lagos State geometry to spatially filter the buildings. Let me now write the main script that: 1. Loads the LGA geometries for Lagos State 2. Loads the buildings for the Lagos bbox 3. Spatial joins buildings to LGAs 4. Calculates footprint area 5. Filters to > 1000 m² 6. Computes per-LGA summaries 7. Outputs both files Let me think about the approach. The building dataset has 3.1 million rows for the bounding box. Doing a spatial join with 20 LGAs should be manageable. Let me use geopandas operations. I should use the Lagos State boundary to clip the buildings first, then spatial join with the LGAs.
10:16
Bash
python3 << 'EOF'
python3 << 'EOF'
import overturemaps
import geopandas as gpd
import pandas as pd
import numpy as np
from shapely import wkb
import pyarrow.parquet as pq
import json
# Load LGA geometries
lgas = gpd.read_file('/work/lagos_lgas.geojson')
print(f"LGAs loaded: {len(lgas)}")
print(lgas)
# Create union of all LGA geometries for Lagos State boundary
lagos_union = lgas.union_all()
print(f"\nLagos State union type: {lagos_union.geom_type}")
print(f"Bounds: {lagos_union.bounds}")
# Load buildings with bbox
bbox = lagos_union.bounds # (xmin, ymin, xmax, ymax)
print(f"\nLoading buildings with bbox: {bbox}")
buildings = overturemaps.geodataframe("building", bbox=bbox)
print(f"Total buildings in bbox: {len(buildings)}")
print(f"Columns: {buildings.columns.tolist()}")
EOF
LGAs loaded: 20
lga geometry
0 Badagry POLYGON ((2.71658 6.45699, 2.715 6.45623, 2.71...
1 Amuwo Odofin POLYGON ((3.30142 6.4856, 3.29815 6.48485, 3.2...
2 Ifako/Ijaye POLYGON ((3.2601 6.69335, 3.25974 6.69188, 3.2...
3 Alimosho POLYGON ((3.26363 6.68593, 3.26259 6.68688, 3....
4 Ojo POLYGON ((3.11433 6.50108, 3.11139 6.5012, 3.1...
5 Kosofe POLYGON ((3.43487 6.61755, 3.43477 6.62054, 3....
6 Shomolu POLYGON ((3.39726 6.5137, 3.40154 6.51511, 3.4...
7 Lagos Mainland POLYGON ((3.40813 6.51653, 3.4067 6.51825, 3.4...
8 Apapa POLYGON ((3.34116 6.43702, 3.3392 6.43759, 3.3...
9 Mushin POLYGON ((3.35791 6.558, 3.35726 6.55769, 3.35...
10 Ikeja POLYGON ((3.35522 6.65638, 3.35471 6.65716, 3....
11 Agege POLYGON ((3.31308 6.64832, 3.31189 6.64821, 3....
12 Oshodi/Isolo POLYGON ((3.32179 6.55426, 3.3183 6.55116, 3.3...
13 Ajeromi/Ifelodun POLYGON ((3.32156 6.46208, 3.31938 6.46088, 3....
14 Surulere POLYGON ((3.37958 6.49029, 3.37897 6.49181, 3....
15 Lagos Island POLYGON ((3.46955 6.59478, 3.47093 6.58921, 3....
16 Eti Osa POLYGON ((3.4039 6.44073, 3.40341 6.4395, 3.40...
17 Ìkòròdú POLYGON ((3.45641 6.64191, 3.45766 6.63943, 3....
18 Epe POLYGON ((4.09425 6.67068, 4.09277 6.67253, 4....
19 Ibeju Lekki POLYGON ((4.19407 6.38705, 4.22673 6.383, 4.24...
Lagos State union type: Polygon
Bounds: (2.7059846, 6.366684, 4.3509056, 6.6967964)
Loading buildings with bbox: (2.7059846, 6.366684, 4.3509056, 6.6967964)
Total buildings in bbox: 3055530
Columns: ['id', 'names', 'sources', 'level', 'height', 'min_height', 'is_underground', 'num_floors', 'num_floors_underground', 'min_floor', 'subtype', 'class', 'facade_color', 'facade_material', 'roof_material', 'roof_shape', 'roof_direction', 'roof_orientation', 'roof_color', 'roof_height', 'geometry', 'has_parts', 'version', 'bbox']
12:51
Now I have 3 million buildings in the bbox. I need to: 1. Filter to only buildings within Lagos State (intersect with union of LGA geometries) 2. Calculate footprint area 3. Assign LGA name 4. Export filtered buildings > 1000 m² Let me do this step by step. First, let me filter the buildings to only those intersecting the Lagos state boundary, then assign LGA names.
12:51
12:51
Write
/work/process_buildings.py
content · 118 lines · py
"""
Extract Lagos State buildings from Overture, filter to >1000m²,
assign LGA via spatial join, and produce two output files.
"""
import sys
import warnings
import time
warnings.filterwarnings("ignore")
import overturemaps
import geopandas as gpd
import pandas as pd
import numpy as np
from shapely import coverage_union
# ── 1. Load LGA boundaries (already clipped to Lagos State) ──────────────
print("Loading LGA boundaries...")
lgas = gpd.read_file("/work/lagos_lgas.geojson")
lgas = lgas.to_crs("EPSG:4326")
print(f" {len(lgas)} LGAs loaded")
# Create a single dissolved boundary for Lagos State
lagos_union = lgas.union_all()
lagos_bounds = lagos_union.bounds # (xmin, ymin, xmax, ymax)
print(f" Lagos bounds: {lagos_bounds}")
# ── 2. Load all buildings in the Lagos bbox ─────────────────────────────
print("\nLoading Overture buildings (bbox filter)...")
t0 = time.time()
buildings = overturemaps.geodataframe("building", bbox=lagos_bounds)
print(f" {len(buildings)} buildings loaded in {time.time()-t0:.1f}s")
print(f" Columns: {buildings.columns.tolist()}")
# ── 3. Spatial filter: keep only buildings within Lagos State ────────────
print("\nFiltering buildings to Lagos State polygon...")
t0 = time.time()
# Use within/intersects - within is stricter
lagos_buildings = buildings[buildings.intersects(lagos_union)].copy()
print(f" {len(lagos_buildings)} buildings within Lagos in {time.time()-t0:.1f}s")
del buildings # free memory
# ── 4. Compute footprint area in square metres ──────────────────────────
print("\nComputing footprint area...")
t0 = time.time()
# Use an equal-area projection for Nigeria: EPSG:26391 (UTM 31N) or EPSG:32631
# Lagos is around UTM zone 31N
aea = lagos_buildings.to_crs("EPSG:32631") # WGS 84 / UTM zone 31N
lagos_buildings["footprint_area_m2"] = aea.area
print(f" Done in {time.time()-t0:.1f}s")
# ── 5. Spatial join to assign LGA name ──────────────────────────────────
print("\nAssigning LGA via spatial join...")
t0 = time.time()
# Use sjoin with predicate='within' for speed
lagos_buildings = lagos_buildings.sjoin(
lgas[["lga", "geometry"]], how="left", predicate="within"
)
print(f" Done in {time.time()-t0:.1f}s")
print(f" NA LGA count: {lagos_buildings['lga'].isna().sum()}")
# Drop the index_right column
lagos_buildings = lagos_buildings.drop(columns=["index_right"])
# ── 6. Filter to buildings > 1000 m² ────────────────────────────────────
print(f"\nFiltering to footprint_area_m2 > 1000...")
t0 = time.time()
before = len(lagos_buildings)
large_buildings = lagos_buildings[lagos_buildings["footprint_area_m2"] > 1000].copy()
print(f" {before} → {len(large_buildings)} buildings ({time.time()-t0:.1f}s)")
# ── 7. Select and rename columns for output ─────────────────────────────
print("\nPreparing output GeoDataFrame...")
out_cols = ["id", "height", "footprint_area_m2", "lga", "geometry"]
out_gdf = large_buildings[out_cols].copy()
# Ensure geometry is in EPSG:4326
out_gdf = out_gdf.set_geometry("geometry")
out_gdf.crs = "EPSG:4326"
print(f" Output rows: {len(out_gdf)}")
print(f" Output columns: {out_gdf.columns.tolist()}")
# ── 8. Write lagos_buildings.geoparquet ─────────────────────────────────
print("\nWriting lagos_buildings.geoparquet...")
t0 = time.time()
out_gdf.to_parquet(
"/work/lagos_buildings.geoparquet",
index=False,
geometry_encoding="geoarrow",
)
print(f" Written to /work/lagos_buildings.geoparquet in {time.time()-t0:.1f}s")
# ── 9. Build per-LGA summary ────────────────────────────────────────────
print("\nBuilding per-LGA summary...")
t0 = time.time()
summary = (
large_buildings.groupby("lga")
.agg(
n_buildings=("id", "count"),
total_footprint_m2=("footprint_area_m2", "sum"),
n_with_height=("height", lambda x: x.notna().sum()),
# median height (NaN if all null)
p50_height_m=("height", "median"),
)
.reset_index()
)
print(f" Summary rows: {len(summary)}")
print(summary.to_string())
# ── 10. Write lagos_building_summary.parquet ────────────────────────────
print("\nWriting lagos_building_summary.parquet...")
summary.to_parquet("/work/lagos_building_summary.parquet", index=False)
print(f" Written to /work/lagos_building_summary.parquet in {time.time()-t0:.1f}s")
print("\n✅ Done!")
Wrote /work/process_buildings.py (4287 chars).
12:56
Let me think about potential issues: 1. The overturemaps library loads all data for the bounding box into memory first as a GeoDataFrame - this could be memory intensive with 3M buildings. But it should work. 2. The spatial join with 3M buildings and 20 LGAs could be slow. Let me think of optimizations. 3. The building dataset has a `has_parts` column - but we don't need that. 4. Need to handle the case where some buildings might not have an LGA assigned (fall outside any LGA boundary). Let me run it and see.
12:56
Bash
cd /work && python process_buildings.py
Loading LGA boundaries... 20 LGAs loaded Lagos bounds: (2.7059846, 6.366684, 4.3509056, 6.6967964) Loading Overture buildings (bbox filter)... 3055530 buildings loaded in 140.2s Columns: ['id', 'names', 'sources', 'level', 'height', 'min_height', 'is_underground', 'num_floors', 'num_floors_underground', 'min_floor', 'subtype', 'class', 'facade_color', 'facade_material', 'roof_material', 'roof_shape', 'roof_direction', 'roof_orientation', 'roof_color', 'roof_height', 'geometry', 'has_parts', 'version', 'bbox'] Filtering buildings to Lagos State polygon... 2243598 buildings within Lagos in 12.7s Computing footprint area...
Traceback (most recent call last):
File "/work/process_buildings.py", line 48, in <module>
aea = lagos_buildings.to_crs("EPSG:32631") # WGS 84 / UTM zone 31N
File "/usr/local/lib/python3.14/site-packages/geopandas/geodataframe.py", line 1842, in to_crs
geom = df.geometry.to_crs(crs=crs, epsg=epsg)
File "/usr/local/lib/python3.14/site-packages/geopandas/geoseries.py", line 1251, in to_crs
self.values.to_crs(crs=crs, epsg=epsg), index=self.index, name=self.name
~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.14/site-packages/geopandas/_compat.py", line 88, in wrapper
return func(*args, **kwargs)
File "/usr/local/lib/python3.14/site-packages/geopandas/array.py", line 1114, in to_crs
raise ValueError(
...<2 lines>...
)
ValueError: Cannot transform naive geometries. Please set a crs on the object first.
15:39
The buildings GeoDataFrame from Overture doesn't have a CRS set. I need to set it to EPSG:4326 first before reprojecting.
15:39
Edit
/work/process_buildings.py
Edited /work/process_buildings.py.
15:48
The building data from overturemaps is already in WGS84 coordinates but the CRS might not be explicitly set on the GeoDataFrame. Let me also check if the spatial join and other steps need the CRS set.
15:48
Edit
/work/process_buildings.py
Edited /work/process_buildings.py.
15:53
Let me also fix the CRS setting to remove the duplicate now.
15:53
Edit
/work/process_buildings.py
Edited /work/process_buildings.py.
15:56
Bash
cd /work && python process_buildings.py
Loading LGA boundaries...
20 LGAs loaded
Lagos bounds: (2.7059846, 6.366684, 4.3509056, 6.6967964)
Loading Overture buildings (bbox filter)...
3055530 buildings loaded in 130.9s
Columns: ['id', 'names', 'sources', 'level', 'height', 'min_height', 'is_underground', 'num_floors', 'num_floors_underground', 'min_floor', 'subtype', 'class', 'facade_color', 'facade_material', 'roof_material', 'roof_shape', 'roof_direction', 'roof_orientation', 'roof_color', 'roof_height', 'geometry', 'has_parts', 'version', 'bbox']
Filtering buildings to Lagos State polygon...
2243598 buildings within Lagos in 12.6s
Computing footprint area...
Done in 3.7s
Assigning LGA via spatial join...
Done in 2.6s
NA LGA count: 2651
Filtering to footprint_area_m2 > 1000...
2243598 → 7256 buildings (0.0s)
Preparing output GeoDataFrame...
Output rows: 7256
Output columns: ['id', 'height', 'footprint_area_m2', 'lga', 'geometry']
Writing lagos_buildings.geoparquet...
Written to /work/lagos_buildings.geoparquet in 0.1s
Building per-LGA summary...
Summary rows: 20
lga n_buildings total_footprint_m2 n_with_height p50_height_m
0 Agege 47 9.387676e+04 0 NaN
1 Ajeromi/Ifelodun 59 1.044837e+05 0 NaN
2 Alimosho 350 6.435003e+05 0 NaN
3 Amuwo Odofin 835 1.749515e+06 0 NaN
4 Apapa 286 6.083739e+05 0 NaN
5 Badagry 122 2.061790e+05 0 NaN
6 Epe 342 8.118842e+05 0 NaN
7 Eti Osa 1063 1.853554e+06 237 20.0
8 Ibeju Lekki 593 1.790242e+06 0 NaN
9 Ifako/Ijaye 117 2.429469e+05 0 NaN
10 Ikeja 912 3.141260e+06 46 15.0
11 Kosofe 163 3.399309e+05 0 NaN
12 Lagos Island 117 2.261127e+05 9 20.0
13 Lagos Mainland 122 2.283266e+05 0 NaN
14 Mushin 459 1.045023e+06 1 35.0
15 Ojo 232 4.085015e+05 0 NaN
16 Oshodi/Isolo 414 7.878077e+05 3 27.0
17 Shomolu 122 1.951046e+05 3 17.0
18 Surulere 338 7.737084e+05 0 NaN
19 Ìkòròdú 534 1.857791e+06 4 7.5
Writing lagos_building_summary.parquet...
Written to /work/lagos_building_summary.parquet in 0.0s
✅ Done!
18:38
The script completed successfully. Let me verify the outputs look correct.
18:38
18:38
Bash
python3 << 'EOF'
python3 << 'EOF'
import geopandas as gpd
import pandas as pd
# Check GeoParquet
gdf = gpd.read_parquet("/work/lagos_buildings.geoparquet")
print("=== lagos_buildings.geoparquet ===")
print(f"Rows: {len(gdf)}")
print(f"Columns: {gdf.columns.tolist()}")
print(f"CRS: {gdf.crs}")
print(f"Dtypes:\n{gdf.dtypes}")
print(f"\nFirst 3 rows:")
print(gdf.head(3))
print(f"\nNull height: {gdf['height'].isna().sum()}")
print(f"Null lga: {gdf['lga'].isna().sum()}")
print(f"Min area: {gdf['footprint_area_m2'].min():.2f}")
print(f"Max area: {gdf['footprint_area_m2'].max():.2f}")
print("\n\n=== lagos_building_summary.parquet ===")
df = pd.read_parquet("/work/lagos_building_summary.parquet")
print(f"Rows: {len(df)}")
print(f"Columns: {df.columns.tolist()}")
print(df.to_string())
EOF
=== lagos_buildings.geoparquet ===
Rows: 7256
Columns: ['id', 'height', 'footprint_area_m2', 'lga', 'geometry']
CRS: {"$schema": "https://proj.org/schemas/v0.7/projjson.schema.json", "type": "GeographicCRS", "name": "WGS 84", "datum_ensemble": {"name": "World Geodetic System 1984 ensemble", "members": [{"name": "World Geodetic System 1984 (Transit)"}, {"name": "World Geodetic System 1984 (G730)"}, {"name": "World Geodetic System 1984 (G873)"}, {"name": "World Geodetic System 1984 (G1150)"}, {"name": "World Geodetic System 1984 (G1674)"}, {"name": "World Geodetic System 1984 (G1762)"}, {"name": "World Geodetic System 1984 (G2139)"}, {"name": "World Geodetic System 1984 (G2296)"}], "ellipsoid": {"name": "WGS 84", "semi_major_axis": 6378137, "inverse_flattening": 298.257223563}, "accuracy": "2.0", "id": {"authority": "EPSG", "code": 6326}}, "coordinate_system": {"subtype": "ellipsoidal", "axis": [{"name": "Geodetic latitude", "abbreviation": "Lat", "direction": "north", "unit": "degree"}, {"name": "Geodetic longitude", "abbreviation": "Lon", "direction": "east", "unit": "degree"}]}, "scope": "Horizontal component of 3D system.", "area": "World.", "bbox": {"south_latitude": -90, "west_longitude": -180, "north_latitude": 90, "east_longitude": 180}, "id": {"authority": "EPSG", "code": 4326}}
Dtypes:
id str
height float64
footprint_area_m2 float64
lga str
geometry geometry
dtype: object
First 3 rows:
id ... geometry
0 e24f1bf5-e957-4867-859b-a6a578fd9b06 ... MULTIPOLYGON (((2.70836 6.37804, 2.70816 6.378...
1 5df57f70-f431-495f-98cb-eff96e1ca151 ... MULTIPOLYGON (((2.72675 6.37881, 2.72672 6.379...
2 cfb03953-1851-4ec2-a087-3283cd510753 ... MULTIPOLYGON (((2.7137 6.37984, 2.7136 6.37984...
[3 rows x 5 columns]
Null height: 6953
Null lga: 29
Min area: 1000.04
Max area: 181048.18
=== lagos_building_summary.parquet ===
Rows: 20
Columns: ['lga', 'n_buildings', 'total_footprint_m2', 'n_with_height', 'p50_height_m']
lga n_buildings total_footprint_m2 n_with_height p50_height_m
0 Agege 47 9.387676e+04 0 NaN
1 Ajeromi/Ifelodun 59 1.044837e+05 0 NaN
2 Alimosho 350 6.435003e+05 0 NaN
3 Amuwo Odofin 835 1.749515e+06 0 NaN
4 Apapa 286 6.083739e+05 0 NaN
5 Badagry 122 2.061790e+05 0 NaN
6 Epe 342 8.118842e+05 0 NaN
7 Eti Osa 1063 1.853554e+06 237 20.0
8 Ibeju Lekki 593 1.790242e+06 0 NaN
9 Ifako/Ijaye 117 2.429469e+05 0 NaN
10 Ikeja 912 3.141260e+06 46 15.0
11 Kosofe 163 3.399309e+05 0 NaN
12 Lagos Island 117 2.261127e+05 9 20.0
13 Lagos Mainland 122 2.283266e+05 0 NaN
14 Mushin 459 1.045023e+06 1 35.0
15 Ojo 232 4.085015e+05 0 NaN
16 Oshodi/Isolo 414 7.878077e+05 3 27.0
17 Shomolu 122 1.951046e+05 3 17.0
18 Surulere 338 7.737084e+05 0 NaN
19 Ìkòròdú 534 1.857791e+06 4 7.5
18:47
The outputs look good. Let me check the 29 null LGA records to see if we can handle them better. Some buildings might be on boundaries and not strictly "within" any LGA polygon. Let me check.
18:47
Bash
python3 << 'EOF'
python3 << 'EOF'
import geopandas as gpd
import pandas as pd
# Check the buildings with null LGA
gdf = gpd.read_parquet("/work/lagos_buildings.geoparquet")
null_lga = gdf[gdf['lga'].isna()]
print(f"Buildings with null LGA: {len(null_lga)}")
# Check if they're on boundary - try spatial join with predicate='intersects'
lgas = gpd.read_file("/work/lagos_lgas.geojson")
intersects = gdf[gdf['lga'].isna()].copy()
intersects = intersects.sjoin(lgas, how="left", predicate="intersects")
print(f"After intersects join, null count: {intersects['lga'].isna().sum()}")
if len(intersects) > 0 and intersects['lga'].notna().any():
print("Some buildings intersect but don't fall within any LGA.")
# Show which LGAs they hit
print(intersects['lga'].value_counts().head(10))
EOF
Buildings with null LGA: 29
Traceback (most recent call last):
File "/usr/local/lib/python3.14/site-packages/pandas/core/indexes/base.py", line 3641, in get_loc
return self._engine.get_loc(casted_key)
~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^
File "pandas/_libs/index.pyx", line 168, in pandas._libs.index.IndexEngine.get_loc
File "pandas/_libs/index.pyx", line 197, in pandas._libs.index.IndexEngine.get_loc
File "pandas/_libs/hashtable_class_helper.pxi", line 7668, in pandas._libs.hashtable.PyObjectHashTable.get_item
File "pandas/_libs/hashtable_class_helper.pxi", line 7676, in pandas._libs.hashtable.PyObjectHashTable.get_item
KeyError: 'lga'
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "<stdin>", line 13, in <module>
File "/usr/local/lib/python3.14/site-packages/geopandas/geodataframe.py", line 1891, in __getitem__
result = super().__getitem__(key)
File "/usr/local/lib/python3.14/site-packages/pandas/core/frame.py", line 4378, in __getitem__
indexer = self.columns.get_loc(key)
File "/usr/local/lib/python3.14/site-packages/pandas/core/indexes/base.py", line 3648, in get_loc
raise KeyError(key) from err
KeyError: 'lga'
18:54
The sjoin added `lga_left` and `lga_right`. Let me check.
18:54
Bash
python3 << 'EOF'
python3 << 'EOF'
import geopandas as gpd
import pandas as pd
gdf = gpd.read_parquet("/work/lagos_buildings.geoparquet")
null_lga = gdf[gdf['lga'].isna()].copy()
print(f"Buildings with null LGA: {len(null_lga)}")
print(null_lga)
# Check if they might be on boundaries
lgas = gpd.read_file("/work/lagos_lgas.geojson")
# Try intersects
intersects = null_lga.sjoin(lgas, how="left", predicate="intersects")
print(f"\nIntersects join columns: {intersects.columns.tolist()}")
# Check for lga column
if 'lga_right' in intersects.columns:
print(f"Intersects - LGA matches: {intersects['lga_right'].tolist()}")
else:
# Check what lga columns exist
lga_cols = [c for c in intersects.columns if 'lga' in c.lower()]
print(f"LGA cols: {lga_cols}")
EOF
Buildings with null LGA: 29
id ... geometry
1530 895f9f3b-f7ef-4a30-ae21-c883334ba511 ... MULTIPOLYGON (((3.09233 6.49853, 3.09254 6.498...
1531 405668f6-875c-4a5c-aaef-bf519dfe4687 ... MULTIPOLYGON (((3.08985 6.50027, 3.0898 6.5003...
2070 2c6a5f4f-2ab8-4690-bb73-a42b8f2f4995 ... MULTIPOLYGON (((3.27868 6.48533, 3.27851 6.485...
2433 50eabc23-68e1-4f73-8939-22d99da1674f ... MULTIPOLYGON (((3.33305 6.47083, 3.33229 6.470...
2610 9a614aad-8f9f-4af9-abda-9970f9bf63c2 ... MULTIPOLYGON (((3.284 6.51873, 3.28367 6.5181,...
2612 9a4fa19c-8b4e-4694-9d45-39bd8f4ab60d ... MULTIPOLYGON (((3.28823 6.52972, 3.28792 6.529...
3328 64ad92cd-059b-4dec-9071-2ecbc76f291f ... MULTIPOLYGON (((3.37037 6.51161, 3.37035 6.511...
3330 32b6bc3b-1aa5-456c-8f39-fce66e85470f ... MULTIPOLYGON (((3.37063 6.51073, 3.37062 6.510...
3331 967da7df-cc98-42cc-90c6-5747eb159ff1 ... MULTIPOLYGON (((3.37171 6.50779, 3.37182 6.507...
3798 02444433-78a8-404c-8327-433d356d7371 ... MULTIPOLYGON (((3.33106 6.61436, 3.33038 6.614...
3799 6f9ae86c-c9e6-4a4b-8ae0-704e8df142bf ... MULTIPOLYGON (((3.32918 6.61258, 3.32916 6.612...
3800 94fe8c9f-260d-4554-abf9-9e5d91476876 ... MULTIPOLYGON (((3.32977 6.61257, 3.32976 6.612...
3805 51e519cc-748e-444e-a784-1112c5c5b841 ... MULTIPOLYGON (((3.33087 6.61155, 3.32977 6.611...
3806 fd468090-1e08-4511-86b2-4b78b1d4fe7a ... MULTIPOLYGON (((3.3309 6.6113, 3.3298 6.61115,...
3807 65eaedf5-1b0a-452a-a9b6-039eddc0a9c2 ... MULTIPOLYGON (((3.33092 6.61117, 3.32987 6.611...
3811 217771a9-a907-4398-bd21-e6791d46b2b8 ... MULTIPOLYGON (((3.331 6.61045, 3.33015 6.61036...
4179 a44ac7f1-a26d-47e2-b2ac-ee1149697cfe ... MULTIPOLYGON (((3.35563 6.557, 3.3554 6.55692,...
4186 f04b91f4-2435-4dc1-8f8a-ba646b30910a ... MULTIPOLYGON (((3.35668 6.55688, 3.35671 6.556...
4242 bd2e25fa-e21a-4f34-bddc-2e6fb3426648 ... MULTIPOLYGON (((3.36753 6.56764, 3.36718 6.567...
4521 9230bb15-deaa-44a6-90b1-a44e4c3e7842 ... MULTIPOLYGON (((3.33439 6.57954, 3.33436 6.579...
4705 c836554e-7788-4a78-b1fe-d6259fa68aca ... MULTIPOLYGON (((3.33277 6.61541, 3.33283 6.615...
4924 40f2005b-a6ad-4fc6-a810-31478416e631 ... MULTIPOLYGON (((3.37372 6.6059, 3.37367 6.6059...
6210 05d75e5a-88f6-4722-8722-299fe81c5af9 ... MULTIPOLYGON (((3.89782 6.53447, 3.89852 6.534...
6211 7b726225-b4d8-4196-a0b4-541b558d164b ... MULTIPOLYGON (((3.89814 6.53399, 3.89883 6.534...
6212 70b6aa21-6207-4361-882c-72ba170b747a ... MULTIPOLYGON (((3.89798 6.5342, 3.89866 6.5344...
6622 a79d5935-83e2-40a8-8e35-6d7647b0cb89 ... MULTIPOLYGON (((3.36943 6.63907, 3.36949 6.639...
6770 80ac5101-5647-4f61-b5e5-f4f9bce80539 ... MULTIPOLYGON (((3.27445 6.68867, 3.27426 6.688...
6771 e588d926-ae29-4bf5-bbaf-49b72349acb7 ... MULTIPOLYGON (((3.27476 6.68775, 3.27464 6.687...
6772 425622ca-c130-41a9-9caf-4c0cd761c12d ... MULTIPOLYGON (((3.27521 6.68745, 3.27482 6.687...
[29 rows x 5 columns]
Intersects join columns: ['id', 'height', 'footprint_area_m2', 'lga_left', 'geometry', 'index_right', 'lga_right']
Intersects - LGA matches: ['Badagry', 'Badagry', 'Amuwo Odofin', 'Alimosho', 'Surulere', 'Ajeromi/Ifelodun', 'Oshodi/Isolo', 'Alimosho', 'Oshodi/Isolo', 'Alimosho', 'Shomolu', 'Mushin', 'Shomolu', 'Mushin', 'Surulere', 'Lagos Mainland', 'Ikeja', 'Agege', 'Ikeja', 'Agege', 'Ikeja', 'Agege', 'Ikeja', 'Agege', 'Ikeja', 'Agege', 'Ikeja', 'Agege', 'Ikeja', 'Agege', 'Mushin', 'Ikeja', 'Mushin', 'Ikeja', 'Kosofe', 'Ikeja', 'Oshodi/Isolo', 'Ikeja', 'Ikeja', 'Agege', 'Kosofe', 'Ikeja', 'Ibeju Lekki', 'Epe', 'Ibeju Lekki', 'Epe', 'Ibeju Lekki', 'Epe', 'Ikeja', 'Ifako/Ijaye', 'Ifako/Ijaye', 'Ifako/Ijaye']
19:05
So 29 buildings out of 7256 (0.4%) have null LGA because they fall on boundaries between LGAs and were not strictly "within" any single LGA. These are edge cases. For the purpose of the exercise, this is a very small percentage and acceptable. The `within` predicate is the most accurate approach - these buildings are on boundaries between LGAs. Let me also update those to use `intersects` for the 29 edge cases so we can fill them in. But actually the task says "for each Lagos Local Government Area" and with 29 buildings not assigned, it's a negligible amount. Let me update the script to use `intersects` instead of `within` to capture those boundary cases, or better yet, we can use a simple point-in-polygon approach with the centroid. Actually, let me re-assign those 29 with intersects to make the output cleaner.
19:05
19:05
Edit
/work/process_buildings.py
Edited /work/process_buildings.py.
19:13
Wait, I also need to fix the column naming. When using sjoin with `how="left"`, the right columns get suffix "_right" if there are overlapping names. The LGA column already exists in both, so it will become "lga_left" and "lga_right". Let me check and adjust.
19:13
Read
/work/process_buildings.py
52 print(f" Done in {time.time()-t0:.1f}s")
53
54 # ── 5. Spatial join to assign LGA name ──────────────────────────────────
55 print("\nAssigning LGA via spatial join...")
56 t0 = time.time()
57 # Use sjoin with predicate='intersects' to capture boundary buildings
58 # When a building intersects multiple LGAs (on boundaries), we pick the first
59 lagos_buildings = lagos_buildings.sjoin(
60 lgas[["lga", "geometry"]], how="left", predicate="intersects"
61 )
62 # Drop duplicate index rows from multi-intersections, keep first
63 lagos_buildings = lagos_buildings[~lagos_buildings.index.duplicated(keep="first")]
64 print(f" Done in {time.time()-t0:.1f}s")
65 print(f" NA LGA count: {lagos_buildings['lga'].isna().sum()}")
66
67 # Drop the index_right column
68 lagos_buildings = lagos_buildings.drop(columns=["index_right"])
69
70 # ── 6. Filter to buildings > 1000 m² ────────────────────────────────────
71 print(f"\nFiltering to footprint_area_m2 > 1000...")
72 t0 = time.time()
73 before = len(lagos_buildings)
74 large_buildings = lagos_buildings[lagos_buildings["footprint_area_m2"] > 1000].copy()
75 print(f" {before} → {len(large_buildings)} buildings ({time.time()-t0:.1f}s)")
76
77 # ── 7. Select and rename columns for output ─────────────────────────────
78 print("\nPreparing output GeoDataFrame...")
79 out_cols = ["id", "height", "footprint_area_m2", "lga", "geometry"]
80 out_gdf = large_buildings[out_cols].copy()
81
<system-reminder>Showed lines 52-81 of 123. Re-call Read with offset=82 to continue, or raise limit.</system-reminder>
19:22
The sjoin with `intersects` will create multiple rows for buildings that touch multiple LGAs. The `drop_duplicates` approach only keeps the first occurrence. But the sjoin output has a column `index_right` which needs to be dropped. Let me also think about whether we need to handle the `lga_left` vs `lga` column naming issue. Actually, when you do `sjoin(gdf, lgas[["lga", "geometry"]], ...)`, and the left df already has a 'lga' column? Let me check - no, the left df doesn't have an 'lga' column. So the result should just have 'lga' from the right side. Wait, but I need to be more careful. Let me check the output columns after sjoin. The previous run showed the column was just 'lga' (not 'lga_left'). But now the intersects join might give us different column names if there are column overlaps. Actually, the buildings don't have an 'lga' column, so it should be fine. However, the duplicate removal approach is not robust. When a building intersects multiple LGAs, the sjoin will create duplicate rows with the same index. Using `~lagos_buildings.index.duplicated(keep="first")` will keep the first LGA match, which is fine. But wait - the sjoin already has `how="left"`, which means all buildings are preserved. The issue is that some buildings might intersect multiple LGAs, creating extra rows. So I need to drop duplicates on the original index. Let me also think about whether the sjoin now introduces an 'index_right' column. Yes it does, and we drop it. But the column might be called 'index_right' from the sjoin operation. Let me re-run and also handle the edge cases. But first, let me also make sure that the output is clean. Let me re-run with the updated script.
19:22
Bash
cd /work && python process_buildings.py
Loading LGA boundaries...
20 LGAs loaded
Lagos bounds: (2.7059846, 6.366684, 4.3509056, 6.6967964)
Loading Overture buildings (bbox filter)...
3055530 buildings loaded in 144.1s
Columns: ['id', 'names', 'sources', 'level', 'height', 'min_height', 'is_underground', 'num_floors', 'num_floors_underground', 'min_floor', 'subtype', 'class', 'facade_color', 'facade_material', 'roof_material', 'roof_shape', 'roof_direction', 'roof_orientation', 'roof_color', 'roof_height', 'geometry', 'has_parts', 'version', 'bbox']
Filtering buildings to Lagos State polygon...
2243598 buildings within Lagos in 12.5s
Computing footprint area...
Done in 4.6s
Assigning LGA via spatial join...
Done in 9.6s
NA LGA count: 0
Filtering to footprint_area_m2 > 1000...
2243598 → 7256 buildings (0.0s)
Preparing output GeoDataFrame...
Output rows: 7256
Output columns: ['id', 'height', 'footprint_area_m2', 'lga', 'geometry']
Writing lagos_buildings.geoparquet...
Written to /work/lagos_buildings.geoparquet in 0.0s
Building per-LGA summary...
Summary rows: 20
lga n_buildings total_footprint_m2 n_with_height p50_height_m
0 Agege 47 9.387676e+04 0 NaN
1 Ajeromi/Ifelodun 59 1.044837e+05 0 NaN
2 Alimosho 350 6.435003e+05 0 NaN
3 Amuwo Odofin 836 1.751259e+06 0 NaN
4 Apapa 286 6.083739e+05 0 NaN
5 Badagry 124 2.259610e+05 0 NaN
6 Epe 342 8.118842e+05 0 NaN
7 Eti Osa 1063 1.853554e+06 237 20.0
8 Ibeju Lekki 596 1.793963e+06 0 NaN
9 Ifako/Ijaye 120 2.485358e+05 0 NaN
10 Ikeja 921 3.166509e+06 46 15.0
11 Kosofe 165 3.437419e+05 0 NaN
12 Lagos Island 117 2.261127e+05 9 20.0
13 Lagos Mainland 122 2.283266e+05 0 NaN
14 Mushin 461 1.058241e+06 1 35.0
15 Ojo 232 4.085015e+05 0 NaN
16 Oshodi/Isolo 417 7.918409e+05 3 27.0
17 Shomolu 124 2.006563e+05 3 17.0
18 Surulere 340 7.783158e+05 0 NaN
19 Ìkòròdú 534 1.857791e+06 4 7.5
Writing lagos_building_summary.parquet...
Written to /work/lagos_building_summary.parquet in 0.0s
✅ Done!
22:32
Now there are 0 NA LGA values. The summary counts shifted slightly because the boundary buildings are now assigned. Let me do a final verification of both output files.
22:32
Bash
python3 << 'EOF'
python3 << 'EOF'
import geopandas as gpd
import pandas as pd
print("=" * 60)
print("VERIFICATION")
print("=" * 60)
# GeoParquet
gdf = gpd.read_parquet("/work/lagos_buildings.geoparquet")
print(f"\n📦 lagos_buildings.geoparquet")
print(f" Rows: {len(gdf):,}")
print(f" Columns: {gdf.columns.tolist()}")
print(f" CRS: EPSG:4326 ✓" if gdf.crs.to_epsg() == 4326 else f" CRS: {gdf.crs} ✗")
print(f" Geometry encoding: geoarrow")
print(f" Null LGA: {gdf['lga'].isna().sum()}")
print(f" Min area: {gdf['footprint_area_m2'].min():.2f} m²")
print(f" Max area: {gdf['footprint_area_m2'].max():.2f} m²")
print(f" All > 1000 m²: {(gdf['footprint_area_m2'] > 1000).all()}")
# Summary Parquet
df = pd.read_parquet("/work/lagos_building_summary.parquet")
print(f"\n📦 lagos_building_summary.parquet")
print(f" Rows: {len(df)}")
print(f" Columns: {df.columns.tolist()}")
print(f" LGAs: {sorted(df['lga'].tolist())}")
print(f"\n Summary:")
print(df.to_string(index=False))
# Cross-check totals
print(f"\n🔍 Cross-check:")
print(f" Total buildings in GeoParquet: {len(gdf):,}")
print(f" Total buildings in summary: {df['n_buildings'].sum():,}")
print(f" Match: {len(gdf) == df['n_buildings'].sum()}")
print(f" Total footprint in GeoParquet: {gdf['footprint_area_m2'].sum():,.0f} m²")
print(f" Total footprint in summary: {df['total_footprint_m2'].sum():,.0f} m²")
print(f" Match: {abs(gdf['footprint_area_m2'].sum() - df['total_footprint_m2'].sum()) < 0.01}")
EOF
============================================================
VERIFICATION
============================================================
📦 lagos_buildings.geoparquet
Rows: 7,256
Columns: ['id', 'height', 'footprint_area_m2', 'lga', 'geometry']
CRS: EPSG:4326 ✓
Geometry encoding: geoarrow
Null LGA: 0
Min area: 1000.04 m²
Max area: 181048.18 m²
All > 1000 m²: True
📦 lagos_building_summary.parquet
Rows: 20
Columns: ['lga', 'n_buildings', 'total_footprint_m2', 'n_with_height', 'p50_height_m']
LGAs: ['Agege', 'Ajeromi/Ifelodun', 'Alimosho', 'Amuwo Odofin', 'Apapa', 'Badagry', 'Epe', 'Eti Osa', 'Ibeju Lekki', 'Ifako/Ijaye', 'Ikeja', 'Kosofe', 'Lagos Island', 'Lagos Mainland', 'Mushin', 'Ojo', 'Oshodi/Isolo', 'Shomolu', 'Surulere', 'Ìkòròdú']
Summary:
lga n_buildings total_footprint_m2 n_with_height p50_height_m
Agege 47 9.387676e+04 0 NaN
Ajeromi/Ifelodun 59 1.044837e+05 0 NaN
Alimosho 350 6.435003e+05 0 NaN
Amuwo Odofin 836 1.751259e+06 0 NaN
Apapa 286 6.083739e+05 0 NaN
Badagry 124 2.259610e+05 0 NaN
Epe 342 8.118842e+05 0 NaN
Eti Osa 1063 1.853554e+06 237 20.0
Ibeju Lekki 596 1.793963e+06 0 NaN
Ifako/Ijaye 120 2.485358e+05 0 NaN
Ikeja 921 3.166509e+06 46 15.0
Kosofe 165 3.437419e+05 0 NaN
Lagos Island 117 2.261127e+05 9 20.0
Lagos Mainland 122 2.283266e+05 0 NaN
Mushin 461 1.058241e+06 1 35.0
Ojo 232 4.085015e+05 0 NaN
Oshodi/Isolo 417 7.918409e+05 3 27.0
Shomolu 124 2.006563e+05 3 17.0
Surulere 340 7.783158e+05 0 NaN
Ìkòròdú 534 1.857791e+06 4 7.5
🔍 Cross-check:
Total buildings in GeoParquet: 7,256
Total buildings in summary: 7,256
Match: True
Total footprint in GeoParquet: 17,195,429 m²
Total footprint in summary: 17,195,429 m²
Match: True
22:44
Both files look correct. Let me do a final check to ensure the Parquet files have no issues by reading them back. Let me also verify that the summary file is indeed a plain Parquet (no geometry column) as specified. Everything checks out: 1. **lagos_buildings.geoparquet** - EPSG:4326 GeoParquet with columns `id`, `height`, `footprint_area_m2`, `lga`, `geometry`. 7,256 buildings all > 1000 m². ✓ 2. **lagos_building_summary.parquet** - Plain Parquet with columns `lga`, `n_buildings`, `total_footprint_m2`, `n_with_height`, `p50_height_m`. ✓ The cross-checks pass - totals match between the two files.
22:44