A library to find the Probability Estimation of Navigation Paths and their Pattern Prediction.
The library helps in identifying the high probability trail path in a data. This navigation probability provides the means to analyze and predict the next link choice of unseen navigation sessions. Currently, the library allows three types of probability estimation from the path data -
- State Probability
- Transition Probability
- Path or Trail Probability
The initial probability of a state is estimated as the proportion of times the corresponding state was requested by the user. This probability is obtained by dividing the number of times a state was browsed by the total number of states browsed.
The probability of a transition between two states is estimated by the ratio of the number of times the sequence was visited to the number of total paths where the from page was visited.
The probability of a trail is estimated by the product of the initial probability of the first state in the trail and the transition probabilities of the next transitions taken in a path. The chain rule is applied in order to compute all path probabilities.
For the probability estimations
import pandas as pd
import path_nav as nv
data = {
"other_data": [1,4,5],
"path": [
["A", "B", "C", "A", "C"],
["B", "D", "B", "A"],
["A", "C", "B", "A", "D"]
],
"conversions": [0, 0, 1],
}
df = pd.DataFrame(data)
print(df)
other_data path conversions
0 1 [A, B, C, A, C] 0
1 4 [B, D, B, A] 0
2 5 [A, C, B, A, D] 1
# To find the state probability
state_probability = nv.state_probability(df, 'path')
print(state_probability)
State State_probability
0 D 0.142857
1 A 0.357143
2 C 0.214286
3 B 0.285714
# To add the start and conversion values to the path (optional)
df = nv.add_start_end(df,'path','conversions')
print(df)
other_data path conversions
0 1 [start, A, B, C, A, C, exit] 0
1 4 [start, B, D, B, A, exit] 0
2 5 [start, A, C, B, A, D, conversion] 1
# To find the transition probability
transition_df = nv.transition_probability(df, 'path')
print(transition_df)
from_sitesection to_sitesection transition_probability
B exit 0.000000
B conversion 0.000000
B D 0.333333
B A 0.666667
B C 0.333333
D exit 0.000000
D conversion 0.500000
D B 0.500000
D A 0.000000
D C 0.000000
start exit 0.000000
start conversion 0.000000
start B 0.333333
start D 0.000000
start A 0.666667
start C 0.000000
A exit 0.333333
A conversion 0.000000
A B 0.333333
A D 0.333333
A C 0.666667
C exit 0.500000
C conversion 0.000000
C B 0.500000
C D 0.000000
C A 0.500000
# To find the path probability
path_df = nv.path_probability(df, 'path', transition_df)
print(path_df)
other_data path conversions path_probability
0 1 [start, A, B, C, A, C, exit] 0 0.012346
1 4 [start, B, D, B, A, exit] 0 0.012346
2 5 [start, A, C, B, A, D, conversion] 1 0.024691
# To convert the path column to string data type
df = nv.convert_to_str(df, "path")
# To convert the path column to list data type
df = nv.convert_to_list(df, "path")