Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add per-operation timing to segment_current_trips using ect.Timer #990

Merged
merged 3 commits into from
Nov 9, 2024

Conversation

TeachMeTW
Copy link
Contributor

  • Wrapped each significant operation within the segment_current_trips function with ect.Timer context managers.
  • Named each timer using the pattern ecwp.PipelineStages.TRIP_SEGMENTATION.name + "/operation" for consistent identification.
  • After each timed block, recorded the elapsed time by calling esds.store_pipeline_time with the appropriate parameters.
  • Ensured that only timing-related code was added without altering existing logic, error handling, or formatting.

This enhancement enables granular performance monitoring of the trip segmentation process, allowing for better identification of potential bottlenecks and optimization opportunities.

@TeachMeTW
Copy link
Contributor Author

@shankari Please review;

  • added a timer to the "ad-hoc" pipeline step ✅
  • add timers to parts of the trip segmentation, with suffixes ✅
  • tested with a user intake, updates show when querying ✅

@shankari
Copy link
Contributor

shankari commented Nov 2, 2024

@TeachMeTW Thank you for the clean PR!

I don't think that the trip segmentations stats are fine-grained enough.
Did you look at the logs/stats from the "tested with a user intake, updates show when querying"?
My expectation is that the create_places_and_trips timer will account for the bulk of the overall time for the step (e.g. if the step takes 100ms, create_places_and_trips will account for 95ms). And you don't have any timers within create_places_and_trips, so you won't know where we are spending the time, and what we should try to optimize.
Further, I don't think we need timers for simple if/then statements - e.g. the below

        elif len(segmentation_points) == 0:
            with ect.Timer(ecwp.PipelineStages.TRIP_SEGMENTATION.name + "/early_return_no_segmentation") as timer_early_return_no_segmentation:
                # no new segments, no need to keep looking at these again
                logging.debug("len(segmentation_points) == 0, early return")
                epq.mark_segmentation_done(user_id, None)
            esds.store_pipeline_time(
                user_id,
                ecwp.PipelineStages.TRIP_SEGMENTATION.name + "/early_return_no_segmentation",
                time.time(),
                timer_early_return_no_segmentation.elapsed
            )

There are literally two statements within the timer in the code snippet above - a log statement and a mark_segmentation_done, neither of which are expected to be heavyweight.

I would:

  1. look at the logs from running the single user intake,
  2. correlate that to the trip segmentation codebase, and
  3. add timers for the sections of code (within create_places_and_trips) that are taking the most time.

as a nice-to-have, can you also split commit into two:

  • easy: one for adding instrumentation to the _get_and_store_range?
  • more complex: one for adding instrumentation to trip_segmentation?

You can even create two PRs so I can merge the easy one while reviewing the more complex one.
Feel free to close this PR and create two new ones if that makes it easier!

@TeachMeTW TeachMeTW force-pushed the Add-Lower-Level-Timings branch 2 times, most recently from 013d099 to b8f5523 Compare November 2, 2024 03:14
@shankari
Copy link
Contributor

shankari commented Nov 2, 2024

@TeachMeTW is there an ETA for addressing my comments?
I see that you have split the commit, but I still don't see the "easy: one for adding instrumentation to the _get_and_store_range?" PR and you haven't addressed my comment around "I don't think that the trip segmentations stats are fine-grained enough." in this PR.

In particular, "easy: one for adding instrumentation to the _get_and_store_range?" is the main change that we want to get into production soon.

@TeachMeTW
Copy link
Contributor Author

@TeachMeTW is there an ETA for addressing my comments? I see that you have split the commit, but I still don't see the "easy: one for adding instrumentation to the _get_and_store_range?" PR and you haven't addressed my comment around "I don't think that the trip segmentations stats are fine-grained enough." in this PR.

In particular, "easy: one for adding instrumentation to the _get_and_store_range?" is the main change that we want to get into production soon.

It is almost finished but I need to test it; is there a uuid in ca_ebike that has trips that are segmentable? How do I search for that? The tests I've done stops at:

    if len(loc_df) == 0:
        # no new segments, no need to keep looking at these again
        logging.debug("len(loc_df) == 0, early return")
        epq.mark_segmentation_done(user_id, None)
        return
        

because there are no new segments hence I cannot see if the fine grained timers are being called.

@shankari
Copy link
Contributor

shankari commented Nov 3, 2024

you can reset the pipeline using bin/reset_pipeline.py

@TeachMeTW TeachMeTW force-pushed the Add-Lower-Level-Timings branch from a236c62 to fe21f7b Compare November 3, 2024 03:44
@TeachMeTW
Copy link
Contributor Author

@shankari I have added more timings, specifically for create_places_and_trips

@TeachMeTW
Copy link
Contributor Author

@shankari ready for feedback; is this in depth timers enough; I also only noticed 3 things that take the most times; filter methods, create_places_and_trips, single/multi filter

emission/pipeline/intake_stage.py Outdated Show resolved Hide resolved
Copy link
Contributor

@shankari shankari left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@TeachMeTW no, this doesn't actually address my comments.

And you don't have any timers within create_places_and_trips, so you won't know where we are spending the time, and what we should try to optimize.

I miswrote; I would actually expect most of the time to be in segment_into_trips. I think that is what you found as well.

From #990 (comment)

I also only noticed 3 things that take the most times; filter methods, create_places_and_trips, single/multi filter

what are those times?

I would expect to see fine-grained instrumentation in segment_into_trips as well.

Note also that this has broken several tests.

As I said earlier, it looks like this will take some time to implement, so please move
#990 (comment)
to a separate PR

@TeachMeTW TeachMeTW force-pushed the Add-Lower-Level-Timings branch from fe21f7b to 6d798c6 Compare November 4, 2024 20:48
@TeachMeTW
Copy link
Contributor Author

@shankari Addressed comments

@shankari
Copy link
Contributor

shankari commented Nov 5, 2024

@TeachMeTW tests are still failing

#990 (review)

Note also that this has broken several tests.

I am not going to review unless the tests are passing.

@TeachMeTW TeachMeTW closed this Nov 5, 2024
@TeachMeTW TeachMeTW force-pushed the Add-Lower-Level-Timings branch from 6d798c6 to c8e8080 Compare November 5, 2024 18:26
@TeachMeTW TeachMeTW reopened this Nov 5, 2024
@TeachMeTW TeachMeTW closed this Nov 5, 2024
@TeachMeTW TeachMeTW force-pushed the Add-Lower-Level-Timings branch from 5c916ed to c8e8080 Compare November 5, 2024 18:49
@TeachMeTW TeachMeTW reopened this Nov 5, 2024
@shankari
Copy link
Contributor

shankari commented Nov 5, 2024

@JGreenlee since this is no longer the weekend, can you review this before it comes to me?

@TeachMeTW
Copy link
Contributor Author

@JGreenlee Please review, tests are passing

Copy link
Contributor

@JGreenlee JGreenlee left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks pretty clean! We should be able to assess bottlenecks in segmentation; I think this will allow for even more granular instrumentation than I had in mind.

@TeachMeTW TeachMeTW force-pushed the Add-Lower-Level-Timings branch from 09c4fdc to f4f6d19 Compare November 5, 2024 21:03
@TeachMeTW
Copy link
Contributor Author

@shankari @JGreenlee and I discovered a confusing behavior with the filters. I tested with both an ios and android user -- but the main issue stems from the fact that both only use the distance filter, while the time filter is left unused. What are your thoughts? We are both stumped in regards to this. @JGreenlee can add more to this thread if I have not covered everything.

@shankari
Copy link
Contributor

shankari commented Nov 5, 2024

I tested with both an ios and android user -- but the main issue stems from the fact that both only use the distance filter, while the time filter is left unused.

I would like to see proof of this. Please use <details> to avoid a wall of text and strip out any identifying information before posting

@JGreenlee
Copy link
Contributor

There is no bug. We were testing against 2 iOS users when we thought it was 1 Android and 1 iOS user.
I was correct that the reason @TeachMeTW wasn't seeing the time filter be used is because he was testing a user on one platform and needed to test the other platform. However, I mixed it up and thought android -> distance and ios -> time, when it is actually the opposite.
After choosing an Android user, @TeachMeTW was indeed able to see readings from the time filter.

@TeachMeTW TeachMeTW force-pushed the Add-Lower-Level-Timings branch 3 times, most recently from cc2a352 to 7b5ef4f Compare November 6, 2024 03:35
@TeachMeTW
Copy link
Contributor Author

@JGreenlee Please review the two changes. I do not believe the failing ubuntu test has anything to do with the changes I made.

@JGreenlee
Copy link
Contributor

Noted that tests are passing but workflow fails due to e-mission/e-mission-docs#1097

@TeachMeTW TeachMeTW force-pushed the Add-Lower-Level-Timings branch from 09b3800 to c48a655 Compare November 8, 2024 00:04
fixed dist name

Added name to time
@TeachMeTW TeachMeTW force-pushed the Add-Lower-Level-Timings branch from c48a655 to 1a5a0d9 Compare November 8, 2024 00:06
@TeachMeTW
Copy link
Contributor Author

@JGreenlee please re review

Copy link
Contributor

@JGreenlee JGreenlee left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is fine to merge.
Some of these Timer blocks might still be overkill, but it's not easy to tell without digging into every function called. We can always remove some later after assessing on prod and seeing what is and isn't a bottleneck.
At this stage, better too granular than not granular enough

Copy link
Contributor

@shankari shankari left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with @JGreenlee that more fine-grained is better than less, but I also want to point out that there is a cost to storing a lot of stats in terms of database usage. This is particularly true for the pipeline stats which are generated ~ every hour, as opposed to the admin dashboard stats, which are only generated when the admin user logs in to the dashboard.

I can deploy this to staging, but we should use the staging results to strip out the bottom 89-90% of stats before we move to production (even the limited 3 environment production). We can add them back if we resolve all the issues with the top 10-20% of readings!

In particular, I do not anticipate that any of the stats below 👇 will be relevant. They are literally just creating python objects.

Comment on lines +55 to +56
ts = esta.TimeSeries.get_time_series(user_id)
esds.store_pipeline_time(user_id, ecwp.PipelineStages.TRIP_SEGMENTATION.name + "/get_time_series", time.time(), t_get_time_series.elapsed)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

creating a python object

Comment on lines +58 to +60
with ect.Timer() as t_get_time_range:
time_query = epq.get_time_range_for_segmentation(user_id)
esds.store_pipeline_time(user_id, ecwp.PipelineStages.TRIP_SEGMENTATION.name + "/get_time_range_for_segmentation", time.time(), t_get_time_range.elapsed)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

simple database query

Comment on lines +66 to +70
dstfsm = dstf.DwellSegmentationTimeFilter(time_threshold=5 * 60, # 5 mins
point_threshold=9,
distance_threshold=100) # 100 m
esds.store_pipeline_time(user_id, ecwp.PipelineStages.TRIP_SEGMENTATION.name + "/create_time_filter", time.time(), t_create_time_filter.elapsed)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

created python object

Comment on lines +71 to +75
with ect.Timer() as t_create_dist_filter:
dsdfsm = dsdf.DwellSegmentationDistFilter(time_threshold=10 * 60, # 10 mins
point_threshold=9,
distance_threshold=50) # 50 m
esds.store_pipeline_time(user_id, ecwp.PipelineStages.TRIP_SEGMENTATION.name + "/create_dist_filter", time.time(), t_create_dist_filter.elapsed)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

created python object

@shankari shankari merged commit 3900d3c into e-mission:master Nov 9, 2024
4 of 5 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Status: Tasks completed
Development

Successfully merging this pull request may close these issues.

3 participants