Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refactor the features module #1157

Merged
merged 22 commits into from
Apr 24, 2024
Merged

Refactor the features module #1157

merged 22 commits into from
Apr 24, 2024

Conversation

gboeing
Copy link
Owner

@gboeing gboeing commented Apr 3, 2024

In advance of v2.0.0 (see #1106), this PR refactors the internal functionality of the features module, leaving its public API unchanged. It produces only minor differences to the final GeoDataFrame (namely, no more "nodes" or "ways" columns containing lists of the members of the ways and relations, respectively).

It streamlines the code for maintainability and improves runtime and memory efficiency. We have cut the lines of code by about ~1/3 and runtime for a medium-size query by ~20%. See below for example (cached) timings:

import osmnx as ox
import osmnx.features_v1
place = "Manhattan, New York, USA"
tags = {"building": True}

%%timeit
gdf1 = ox.features.features_from_place(place, tags)
# 3.89 s ± 49.7 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

%%timeit
gdf2 = ox.features_v1.features_from_place(place, tags)
# 4.83 s ± 71.3 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

@gboeing
Copy link
Owner Author

gboeing commented Apr 3, 2024

I would appreciate anyone's help in testing this PR. The goal is to ensure result parity with the old v1 features module while maintaining or improving runtime speed. I still have that old v1 features module in this branch, renamed as features_v1. I'll remove it after comparative testing suggests consistent result parity between the old module and the new one.

Here is an example testing script that could be expanded for more query locations/tags:

from itertools import product
import osmnx as ox
import osmnx.features_v1
ox.settings.log_console = True
ox.settings.requests_timeout = 500

def test(gdf1, gdf2):
    gdf2 = gdf2.dropna(axis="columns", how="all")
    gdf2 = gdf2.drop(columns=set(gdf2.columns) & {"nodes", "ways"})
    assert (gdf1.geometry.type == gdf2.geometry.type).all()
    assert all(g1.equals(g2) for g1, g2 in zip(gdf1.geometry, gdf2.geometry))
    gdf1 = gdf1.reindex(columns=sorted(gdf1.columns)).sort_index().drop(columns=["geometry"])
    gdf2 = gdf2.reindex(columns=sorted(gdf2.columns)).sort_index().drop(columns=["geometry"])
    assert gdf1.shape == gdf2.shape
    assert set(gdf1.columns) == set(gdf2.columns)
    assert (gdf1.index.to_numpy() == gdf2.index.to_numpy()).all()
    assert gdf1.equals(gdf2)

# file from /tests/input_data
# don't compare these... invalid geoms are handled slightly differently in v2
gdf1 = ox.features.features_from_xml("planet_10.068,48.135_10.071,48.137.osm")
gdf2 = ox.features_v1.features_from_xml("planet_10.068,48.135_10.071,48.137.osm")

# file from /tests/input_data
gdf1 = ox.features.features_from_xml("West-Oakland.osm.bz2")
gdf2 = ox.features_v1.features_from_xml("West-Oakland.osm.bz2")
test(gdf1, gdf2)

places = ["Piedmont, California, USA",
          "Manhattan, New York, USA",
          "Yokohama, Japan",
          "Bonn, Germany"]
tagsl = [{"building": ["house", "garage", "detached"], "parking": True, "highway": "bus_stop"},
         {"building": True}]

for place, tags in product(places, tagsl):
    print(place, tags)
    gdf1 = ox.features.features_from_place(place, tags)
    gdf2 = ox.features_v1.features_from_place(place, tags)
    test(gdf1, gdf2)

@AtelierLibre you've done a ton of work for this module in the past, and your feedback would be most valuable if you could 1) formally review this PR, and 2) run some tests on this branch to check result parity. Thanks!

Copy link

codecov bot commented Apr 3, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 98.30%. Comparing base (ba5df5c) to head (0735f22).

Additional details and impacted files
@@            Coverage Diff             @@
##               v2    #1157      +/-   ##
==========================================
+ Coverage   97.98%   98.30%   +0.32%     
==========================================
  Files          24       24              
  Lines        2435     2365      -70     
==========================================
- Hits         2386     2325      -61     
+ Misses         49       40       -9     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@gboeing gboeing mentioned this pull request Apr 3, 2024
13 tasks
@AtelierLibre
Copy link
Contributor

@gboeing Time is a little tight at the moment but definitely happy to have a look through. If I can run some comparisons I will but, realistically, they're probably going to be limited. Really interested to see the refactoring!

@gboeing
Copy link
Owner Author

gboeing commented Apr 7, 2024

Thanks @AtelierLibre I look forward to your review.

@gboeing
Copy link
Owner Author

gboeing commented Apr 15, 2024

Aiming to merge this PR next week if there are no comments or changes required.

@gboeing
Copy link
Owner Author

gboeing commented Apr 24, 2024

I found one divergence in geometry between features v1 and v2: https://www.openstreetmap.org/relation/12737019

This is a complex multipolygon, and as far as I can tell, it is incorrectly digitized. It has nested polygons, but all of its inner holes and islands are tagged as "inner" polygons. However, according to the wiki, an island within a hole within a polygon should be tagged as "outer", "inner", and "outer", respectively. So it appears the v2 algorithm is working properly but the source data is bad.

@gboeing gboeing merged commit f8b0c17 into v2 Apr 24, 2024
7 checks passed
@gboeing gboeing deleted the features branch April 24, 2024 15:39
@gboeing
Copy link
Owner Author

gboeing commented May 3, 2024

The first pre-release OSMnx v2 beta has been released. Testers needed! See #1123 for details.

@gboeing
Copy link
Owner Author

gboeing commented Aug 14, 2024

See further refinement and optimization in #1205.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants