-
Notifications
You must be signed in to change notification settings - Fork 827
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Refactor the features module #1157
Conversation
I would appreciate anyone's help in testing this PR. The goal is to ensure result parity with the old v1 Here is an example testing script that could be expanded for more query locations/tags: from itertools import product
import osmnx as ox
import osmnx.features_v1
ox.settings.log_console = True
ox.settings.requests_timeout = 500
def test(gdf1, gdf2):
gdf2 = gdf2.dropna(axis="columns", how="all")
gdf2 = gdf2.drop(columns=set(gdf2.columns) & {"nodes", "ways"})
assert (gdf1.geometry.type == gdf2.geometry.type).all()
assert all(g1.equals(g2) for g1, g2 in zip(gdf1.geometry, gdf2.geometry))
gdf1 = gdf1.reindex(columns=sorted(gdf1.columns)).sort_index().drop(columns=["geometry"])
gdf2 = gdf2.reindex(columns=sorted(gdf2.columns)).sort_index().drop(columns=["geometry"])
assert gdf1.shape == gdf2.shape
assert set(gdf1.columns) == set(gdf2.columns)
assert (gdf1.index.to_numpy() == gdf2.index.to_numpy()).all()
assert gdf1.equals(gdf2)
# file from /tests/input_data
# don't compare these... invalid geoms are handled slightly differently in v2
gdf1 = ox.features.features_from_xml("planet_10.068,48.135_10.071,48.137.osm")
gdf2 = ox.features_v1.features_from_xml("planet_10.068,48.135_10.071,48.137.osm")
# file from /tests/input_data
gdf1 = ox.features.features_from_xml("West-Oakland.osm.bz2")
gdf2 = ox.features_v1.features_from_xml("West-Oakland.osm.bz2")
test(gdf1, gdf2)
places = ["Piedmont, California, USA",
"Manhattan, New York, USA",
"Yokohama, Japan",
"Bonn, Germany"]
tagsl = [{"building": ["house", "garage", "detached"], "parking": True, "highway": "bus_stop"},
{"building": True}]
for place, tags in product(places, tagsl):
print(place, tags)
gdf1 = ox.features.features_from_place(place, tags)
gdf2 = ox.features_v1.features_from_place(place, tags)
test(gdf1, gdf2) @AtelierLibre you've done a ton of work for this module in the past, and your feedback would be most valuable if you could 1) formally review this PR, and 2) run some tests on this branch to check result parity. Thanks! |
Codecov ReportAll modified and coverable lines are covered by tests ✅
Additional details and impacted files@@ Coverage Diff @@
## v2 #1157 +/- ##
==========================================
+ Coverage 97.98% 98.30% +0.32%
==========================================
Files 24 24
Lines 2435 2365 -70
==========================================
- Hits 2386 2325 -61
+ Misses 49 40 -9 ☔ View full report in Codecov by Sentry. |
@gboeing Time is a little tight at the moment but definitely happy to have a look through. If I can run some comparisons I will but, realistically, they're probably going to be limited. Really interested to see the refactoring! |
Thanks @AtelierLibre I look forward to your review. |
Aiming to merge this PR next week if there are no comments or changes required. |
I found one divergence in geometry between This is a complex multipolygon, and as far as I can tell, it is incorrectly digitized. It has nested polygons, but all of its inner holes and islands are tagged as "inner" polygons. However, according to the wiki, an island within a hole within a polygon should be tagged as "outer", "inner", and "outer", respectively. So it appears the v2 algorithm is working properly but the source data is bad. |
The first pre-release OSMnx v2 beta has been released. Testers needed! See #1123 for details. |
See further refinement and optimization in #1205. |
In advance of v2.0.0 (see #1106), this PR refactors the internal functionality of the
features
module, leaving its public API unchanged. It produces only minor differences to the final GeoDataFrame (namely, no more "nodes" or "ways" columns containing lists of the members of the ways and relations, respectively).It streamlines the code for maintainability and improves runtime and memory efficiency. We have cut the lines of code by about ~1/3 and runtime for a medium-size query by ~20%. See below for example (cached) timings: