crs-l2-svalbard-polar-areas
Analyst notes
Description
Tests whether the agent recognises that ranking polygons by area computed in WGS84 degrees is meaningless at 78 degrees north, and that the textbook-correct fix is an equal-area projection, not just any polar one. The prompt asks for the 'most appropriate coordinate system for measuring area at these latitudes' but deliberately does not name LAEA or any EPSG code, so the agent has to make the equal-area-versus-conformal call on its own. A conformal polar pick (Polar Stereographic) still earns most of the credit because its area distortion at Svalbard latitudes is only a few percent (roughly 3 to 4% by a spherical estimate for EPSG:3995 and EPSG:3413, whose standard parallels are 71 and 70 degrees north), but it loses the equal-area subcheck.
Approach
- Open the GeoPackage and confirm the input is in WGS84 with named glacier polygons.
- Pick a North-Pole-origin equal-area projection, ideally a Lambert Azimuthal Equal-Area variant centred on the European Arctic, and reproject the polygons into it.
- Compute each polygon's area in square kilometres and its axis-aligned bounding box in projected metres.
- Sort by area descending and keep the top 20.
- Write a CSV with name, area_km2, the four bbox columns, and a crs_epsg column whose integer value is the same on every row.
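The steps above can be sketched with the standard library alone. This is an illustration under stated assumptions, not the reference solution: it uses a spherical north-polar Lambert Azimuthal Equal-Area formula (a real run would read the GeoPackage and reproject with a GIS library into an ellipsoidal, EPSG-coded CRS), the two "glaciers" are made-up lon/lat rectangles, the bbox column names are hypothetical, and 3575 (WGS 84 / North Pole LAEA Europe) is used only as an example value for crs_epsg.

```python
import csv
import io
import math

R_KM = 6371.0088  # mean Earth radius in km; spherical approximation (assumption)

def laea_north(lon_deg, lat_deg, lon0_deg=10.0):
    """Spherical north-polar Lambert Azimuthal Equal-Area; returns (x, y) in km."""
    rho = 2.0 * R_KM * math.sin(math.radians(90.0 - lat_deg) / 2.0)
    theta = math.radians(lon_deg - lon0_deg)
    return rho * math.sin(theta), -rho * math.cos(theta)

def shoelace_area(ring):
    """Absolute planar area of a ring given as a list of (x, y) pairs."""
    total = 0.0
    for (x1, y1), (x2, y2) in zip(ring, ring[1:] + ring[:1]):
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0

def rank_polygons(polygons, crs_epsg, top_n=20):
    """Project each ring, compute area (km2) and bbox (metres), keep the largest."""
    rows = []
    for name, lonlat_ring in polygons.items():
        ring_km = [laea_north(lon, lat) for lon, lat in lonlat_ring]
        xs = [x * 1000.0 for x, _ in ring_km]
        ys = [y * 1000.0 for _, y in ring_km]
        rows.append({
            "name": name,
            "area_km2": round(shoelace_area(ring_km), 3),
            # Hypothetical column names for the four bbox values.
            "bbox_minx": round(min(xs), 1), "bbox_miny": round(min(ys), 1),
            "bbox_maxx": round(max(xs), 1), "bbox_maxy": round(max(ys), 1),
            "crs_epsg": crs_epsg,  # same integer on every row
        })
    rows.sort(key=lambda r: r["area_km2"], reverse=True)
    return rows[:top_n]

def to_csv(rows):
    """Serialise the ranked rows to CSV text."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=list(rows[0].keys()), lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()

# Made-up test polygons (lon, lat), not real glacier outlines.
glaciers = {
    "Glacier A": [(15.0, 77.0), (17.0, 77.0), (17.0, 77.25), (15.0, 77.25)],
    "Glacier B": [(20.0, 80.0), (22.2, 80.0), (22.2, 80.25), (20.0, 80.25)],
}
top = rank_polygons(glaciers, crs_epsg=3575)
# Ranking by "area" in square degrees, for comparison with the projected ranking.
naive = sorted(glaciers, key=lambda n: shoelace_area(glaciers[n]), reverse=True)
print(to_csv(top))
print("square-degree order:", naive)
```

In this toy input the projected ranking puts Glacier A first, while the square-degree ranking puts Glacier B first, because B sits further north where a degree of longitude covers less ground. Polygon edges are treated as straight lines in the projected plane, so long edges along parallels would need densifying before projection.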
Pitfalls
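- Computing area directly on the WGS84 polygons gives values in square degrees, which are not areas: at 78 degrees north a degree of longitude covers only about a fifth of the ground distance of a degree of latitude, and that ratio changes with latitude. The resulting ranking does not reflect real glacier size, so the area and top-20 subchecks both collapse.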
- Picking a Polar Stereographic CRS (EPSG:3995 or EPSG:3413) is defensible and passes the area subchecks within tolerance, but it fails the equal-area subcheck because the instruction asks for true geographic area.
- Picking UTM zone 33 North looks reasonable for Norway, but it is a conformal transverse Mercator projection rather than an equal-area one, and it also falls outside the North-Pole-origin set the grader rewards.
- Forgetting the crs_epsg column, or putting different EPSG values on different rows, fails Gate 1 because the grader cannot decide which frame the bbox values are in.
- Reporting bboxes in WGS84 degrees while declaring crs_epsg as a projected code, or swapping min and max, breaks the bbox subchecks even when the area work is right.
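The output-format pitfalls above can be caught with a quick self-check before submitting. The sketch below is a hypothetical pre-flight check, not the grader's actual logic: the bbox column names are assumptions, and the "looks like degrees" test is only a heuristic (projected coordinates for Svalbard in a North-Pole-origin CRS are far larger than 180 in magnitude).

```python
import csv
import io

BBOX_COLS = ("bbox_minx", "bbox_miny", "bbox_maxx", "bbox_maxy")  # hypothetical names

def check_rows(rows):
    """Return a list of human-readable problems found in parsed CSV rows."""
    problems = []
    codes = {(row.get("crs_epsg") or "").strip() for row in rows}
    if "" in codes:
        problems.append("crs_epsg missing on at least one row")
    if len(codes - {""}) > 1:
        problems.append("crs_epsg differs between rows")
    for i, row in enumerate(rows, start=1):
        minx, miny, maxx, maxy = (float(row[c]) for c in BBOX_COLS)
        if minx >= maxx or miny >= maxy:
            problems.append(f"row {i}: bbox min/max swapped or degenerate")
        # Heuristic: every value within [-180, 180] suggests lon/lat degrees.
        if all(abs(v) <= 180.0 for v in (minx, miny, maxx, maxy)):
            problems.append(f"row {i}: bbox looks like degrees, not metres")
    return problems

def parse(text):
    """Parse CSV text into a list of dict rows."""
    return list(csv.DictReader(io.StringIO(text)))

# Made-up rows for illustration only.
good = """name,area_km2,bbox_minx,bbox_miny,bbox_maxx,bbox_maxy,crs_epsg
Glacier A,1377.5,120000.0,-1440000.0,180000.0,-1400000.0,3575
"""
bad = """name,area_km2,bbox_minx,bbox_miny,bbox_maxx,bbox_maxy,crs_epsg
Glacier A,1377.5,15.0,77.0,17.0,77.25,3575
Glacier B,1166.3,300000.0,-1000000.0,250000.0,-1050000.0,3995
"""
print(check_rows(parse(good)))
for problem in check_rows(parse(bad)):
    print(problem)
```

The second input trips three checks at once: mixed EPSG codes, a bbox in degrees, and a bbox with min and max swapped.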
Recent runs (task v2)
| adapter | started | score | steps | duration | cost | status |
|---|---|---|---|---|---|---|
| openrouter-gemma4-26b-basic | 2026-06-18T07:32:32Z | pending | — | — | — | pending |
| openrouter-deepseek-v4-flash-basic | 2026-06-18T03:08:04Z | done | 8 | 1:12 | 0.23¢ | done |
| openrouter-deepseek-v4-flash-detailed | 2026-06-17T22:01:33Z | done | 8 | 1:06 | 0.30¢ | done |
| openrouter-gemma4-26b-detailed | 2026-06-17T19:47:47Z | done | 17 | 1:58 | 0.97¢ | done |
| openrouter-deepseek-v4-flash-basic | 2026-06-16T21:43:55Z | 1.00 | 12 | 1:44 | 0.48¢ | done |