You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
In users_sessions_this_run we filter start_tstamp from the sessions on lower_limit and upper_limit from user_limits. By joining on start_tstamp we attempt to pull info from the first session per user:
FROM {{.output_schema}}.sessions{{.entropy}} a
INNER JOIN {{.scratch_schema}}.users_userids_this_run{{.entropy}} b
ONa.domain_userid=b.domain_useridWHEREa.start_tstamp>= (SELECT lower_limit FROM {{.scratch_schema}}.users_limits{{.entropy}})
ANDa.start_tstamp<= (SELECT upper_limit FROM {{.scratch_schema}}.users_limits{{.entropy}})
The lower and upper limits are created by taking the MIN and MAX of start_tstamp from users_userids_this_run, where the start_tstamp is the first timestamp for each user based on all of their sessions (taken from {mobile_}sessions --> {mobile_}sessions_userid_manifest_staged -> users_userids_this_run).
In cases when we have very few users, the upper_limit can have a lower value than the start_tstamp of sessions that we need to process. As a result, the users_sessions_this_run table is artificially truncated, which leads to incorrect information being calculated downstream in users_aggs, users_lasts and users.
Proposed Fix 1 (breaking changes)
In {mobile_}sessions_userid_manifest_this_run add a new end_tstamp TIMESTAMP column and fill it as MAX(start_tstamp) AS max_tstamp
In {mobile_}users_manifest table, add a new end_tstamp TIMESTAMP column and fill it as {{start_date}}::TIMESTAMP
In {mobile_}users_limits change the definition of upper_limit to MAX(end_tstamp) AS upper_limit
Proposed Fix 2 (no breaking changes)
In users_sessions_this_run remove start_tstamp filter related to the upper_limit: AND a.start_tstamp <= (SELECT upper_limit FROM {{.scratch_schema}}.{mobile_}users_limits{{.entropy}})
The text was updated successfully, but these errors were encountered:
It may well be the case that in v1 the upper limit was included for some eventuality that I thought would matter, but in reality doesn't. If you folks are satisfied that just removing it won't cause any issues then I'm on board.
Issue
In
users_sessions_this_run
we filterstart_tstamp
from thesessions
onlower_limit
andupper_limit
fromuser_limits
. By joining onstart_tstamp
we attempt to pull info from the first session per user:The lower and upper limits are created by taking the
MIN
andMAX
ofstart_tstamp
fromusers_userids_this_run
, where thestart_tstamp
is the first timestamp for each user based on all of their sessions (taken from{mobile_}sessions
-->{mobile_}sessions_userid_manifest_staged
->users_userids_this_run
).In cases when we have very few users, the
upper_limit
can have a lower value than thestart_tstamp
of sessions that we need to process. As a result, theusers_sessions_this_run
table is artificially truncated, which leads to incorrect information being calculated downstream inusers_aggs
,users_lasts
andusers
.Proposed Fix 1 (breaking changes)
{mobile_}sessions_userid_manifest_this_run
add a newend_tstamp TIMESTAMP
column and fill it asMAX(start_tstamp) AS max_tstamp
{mobile_}users_manifest
table, add a newend_tstamp TIMESTAMP
column and fill it as{{start_date}}::TIMESTAMP
{mobile_}users_limits
change the definition ofupper_limit
toMAX(end_tstamp) AS upper_limit
Proposed Fix 2 (no breaking changes)
users_sessions_this_run
remove start_tstamp filter related to theupper_limit
:AND a.start_tstamp <= (SELECT upper_limit FROM {{.scratch_schema}}.{mobile_}users_limits{{.entropy}})
The text was updated successfully, but these errors were encountered: