-
Notifications
You must be signed in to change notification settings - Fork 119
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add per-operation timing to segment_current_trips using ect.Timer #990
Conversation
@shankari Please review;
|
@TeachMeTW Thank you for the clean PR! I don't think that the trip segmentations stats are fine-grained enough.
There are literally two statements within the timer in the code snippet above - a log statement and a I would:
as a nice-to-have, can you also split commit into two:
You can even create two PRs so I can merge the easy one while reviewing the more complex one. |
013d099
to
b8f5523
Compare
@TeachMeTW is there an ETA for addressing my comments? In particular, "easy: one for adding instrumentation to the _get_and_store_range?" is the main change that we want to get into production soon. |
It is almost finished but I need to test it; is there a uuid in ca_ebike that has trips that are segmentable? How do I search for that? The tests I've done stops at:
because there are no new segments hence I cannot see if the fine grained timers are being called. |
you can reset the pipeline using |
a236c62
to
fe21f7b
Compare
@shankari I have added more timings, specifically for |
@shankari ready for feedback; is this in depth timers enough; I also only noticed 3 things that take the most times; filter methods, create_places_and_trips, single/multi filter |
899c0cb
to
fe21f7b
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@TeachMeTW no, this doesn't actually address my comments.
And you don't have any timers within create_places_and_trips, so you won't know where we are spending the time, and what we should try to optimize.
I miswrote; I would actually expect most of the time to be in segment_into_trips
. I think that is what you found as well.
From #990 (comment)
I also only noticed 3 things that take the most times; filter methods, create_places_and_trips, single/multi filter
what are those times?
I would expect to see fine-grained instrumentation in segment_into_trips
as well.
Note also that this has broken several tests.
As I said earlier, it looks like this will take some time to implement, so please move
#990 (comment)
to a separate PR
fe21f7b
to
6d798c6
Compare
@shankari Addressed comments |
@TeachMeTW tests are still failing
I am not going to review unless the tests are passing. |
6d798c6
to
c8e8080
Compare
5c916ed
to
c8e8080
Compare
@JGreenlee since this is no longer the weekend, can you review this before it comes to me? |
@JGreenlee Please review, tests are passing |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks pretty clean! We should be able to assess bottlenecks in segmentation; I think this will allow for even more granular instrumentation than I had in mind.
09c4fdc
to
f4f6d19
Compare
@shankari @JGreenlee and I discovered a confusing behavior with the filters. I tested with both an ios and android user -- but the main issue stems from the fact that both only use the distance filter, while the time filter is left unused. What are your thoughts? We are both stumped in regards to this. @JGreenlee can add more to this thread if I have not covered everything. |
I would like to see proof of this. Please use |
There is no bug. We were testing against 2 iOS users when we thought it was 1 Android and 1 iOS user. |
cc2a352
to
7b5ef4f
Compare
@JGreenlee Please review the two changes. I do not believe the failing ubuntu test has anything to do with the changes I made. |
...ion/analysis/intake/segmentation/trip_segmentation_methods/dwell_segmentation_dist_filter.py
Outdated
Show resolved
Hide resolved
Noted that tests are passing but workflow fails due to e-mission/e-mission-docs#1097 |
09b3800
to
c48a655
Compare
fixed dist name Added name to time
c48a655
to
1a5a0d9
Compare
@JGreenlee please re review |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is fine to merge.
Some of these Timer blocks might still be overkill, but it's not easy to tell without digging into every function called. We can always remove some later after assessing on prod and seeing what is and isn't a bottleneck.
At this stage, better too granular than not granular enough
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree with @JGreenlee that more fine-grained is better than less, but I also want to point out that there is a cost to storing a lot of stats in terms of database usage. This is particularly true for the pipeline stats which are generated ~ every hour, as opposed to the admin dashboard stats, which are only generated when the admin user logs in to the dashboard.
I can deploy this to staging, but we should use the staging results to strip out the bottom 89-90% of stats before we move to production (even the limited 3 environment production). We can add them back if we resolve all the issues with the top 10-20% of readings!
In particular, I do not anticipate that any of the stats below 👇 will be relevant. They are literally just creating python objects.
ts = esta.TimeSeries.get_time_series(user_id) | ||
esds.store_pipeline_time(user_id, ecwp.PipelineStages.TRIP_SEGMENTATION.name + "/get_time_series", time.time(), t_get_time_series.elapsed) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
creating a python object
with ect.Timer() as t_get_time_range: | ||
time_query = epq.get_time_range_for_segmentation(user_id) | ||
esds.store_pipeline_time(user_id, ecwp.PipelineStages.TRIP_SEGMENTATION.name + "/get_time_range_for_segmentation", time.time(), t_get_time_range.elapsed) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
simple database query
dstfsm = dstf.DwellSegmentationTimeFilter(time_threshold=5 * 60, # 5 mins | ||
point_threshold=9, | ||
distance_threshold=100) # 100 m | ||
esds.store_pipeline_time(user_id, ecwp.PipelineStages.TRIP_SEGMENTATION.name + "/create_time_filter", time.time(), t_create_time_filter.elapsed) | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
created python object
with ect.Timer() as t_create_dist_filter: | ||
dsdfsm = dsdf.DwellSegmentationDistFilter(time_threshold=10 * 60, # 10 mins | ||
point_threshold=9, | ||
distance_threshold=50) # 50 m | ||
esds.store_pipeline_time(user_id, ecwp.PipelineStages.TRIP_SEGMENTATION.name + "/create_dist_filter", time.time(), t_create_dist_filter.elapsed) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
created python object
segment_current_trips
function withect.Timer
context managers.ecwp.PipelineStages.TRIP_SEGMENTATION.name + "/operation"
for consistent identification.esds.store_pipeline_time
with the appropriate parameters.This enhancement enables granular performance monitoring of the trip segmentation process, allowing for better identification of potential bottlenecks and optimization opportunities.