Add option to not load user agent database #283
Comments
I believe startup time is dominated by the compilation of the mapping script (Groovy). Do you have a particular usage scenario where startup time is critical?
I think we should remove, or at least deprecate, the user agent database. It is old and I'm not sure it should be part of Divolte itself. I would run something like YAUAA in
Understandably, the outdated parser is problematic. Yet I imagine there are users who rely on the UA enrichment happening before events hit their services. The default schema has UA information in it, so there is no backwards-compatible way to remove it by default.

We can swap out the current UA parser in favour of YAUAA, but I believe one issue with YAUAA is that its output cannot populate all fields in the Divolte default schema (this was the case when I last looked at it). You'd probably want to add a configuration construct to switch to this parser, so as not to break existing configurations and dependencies on the default schema.

I don't believe the startup overhead of loading the UA database is really an issue. There is no runtime overhead from the UA parser if you don't use it in the mapping (evaluation is lazy). UA parsing is also cached, so most of the time it is a hash map lookup and not a regex operation.
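To illustrate the "lazy and cached" point, here is a minimal sketch of the general technique in Python. It is not Divolte's implementation: `parse_user_agent`, `Event`, and the two-pattern rule set are hypothetical names made up for this example.

```python
import re
from functools import lru_cache

# Hypothetical, deliberately tiny rule set; a real UA database has far more patterns.
_BROWSER_PATTERNS = [
    ("Firefox", re.compile(r"Firefox/(\d+)")),
    ("Chrome", re.compile(r"Chrome/(\d+)")),
]


@lru_cache(maxsize=1024)
def parse_user_agent(ua):
    """Slow path: run the regexes. Repeated UA strings are served from the cache."""
    for name, pattern in _BROWSER_PATTERNS:
        match = pattern.search(ua)
        if match:
            return name, match.group(1)
    return "Unknown", ""


class Event:
    """Keeps only the raw header; parsing happens when a mapping reads `browser`."""

    def __init__(self, raw_user_agent):
        self.raw_user_agent = raw_user_agent

    @property
    def browser(self):
        return parse_user_agent(self.raw_user_agent)
```

An event whose `browser` property is never read never touches the regexes, and a second event with the same header is answered by the cache (`parse_user_agent.cache_info()` shows the hit and miss counts).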
From the docs.
Yes, having a single-threaded library doesn't look good. It works well with

Getting back to the original question about startup: I think we need some numbers here to see whether the UA database is really the cause. Apart from that, the JVM isn't known for its startup speed, so we could also introduce a readiness check. Then we can let k8s/Docker know when we're ready, and the instance can be added to the load balancer pool.
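A rough sketch of what such a readiness check could look like, written in Python for brevity. The `/ready` path, the `ready` flag, and the helper names are assumptions for illustration only; nothing here is an existing Divolte endpoint.

```python
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

# Set once the slow startup work (mapping compilation, database loading) is done.
ready = threading.Event()


def readiness_response():
    """Return (status, body) for the hypothetical /ready endpoint."""
    if ready.is_set():
        return 200, b"ready\n"
    return 503, b"starting\n"


class ReadinessHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path != "/ready":
            self.send_error(404)
            return
        status, body = readiness_response()
        self.send_response(status)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the sketch quiet


def start_probe_server(port=0):
    """Serve the probe on a background thread; port 0 picks a free port."""
    server = HTTPServer(("127.0.0.1", port), ReadinessHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server
```

The application would call `ready.set()` after initialisation completes. An orchestrator probe (for example a Kubernetes `readinessProbe` with `httpGet`) pointed at this path would then keep the instance out of the load balancer pool until it answers 200.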
It basically means that the parser is stateful and, for some reason, the state is kept in the implementing class as instance variables rather than in something scoped to the method call. I can't imagine any justification for such an implementation, so I chose to stay away from it. Also, I am not sure whether the documented parse tree approach is future-proof, but time will tell.

The reason for providing ip2geo and UA parsing in Divolte is exactly that you can ditch the two most discriminating pieces of information on a client early in the pipeline.

I believe a better approach to offline parsing would be to get a dataset from something like this and build a parser tailored to it. This also has the benefit of parsing only the top-N most seen user agents and leaving the rest as esoteric or otherwise unimportant. These databases, however, tend to come at a cost (for obvious reasons), so we could never deliver one as part of Divolte, in the same way that we don't ship the ip2geo database from MaxMind.
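The top-N idea could be sketched roughly as follows, assuming you already have a list of observed user agent strings and a curated label per string. Both inputs and the function names are hypothetical.

```python
from collections import Counter


def build_top_n_table(observed_uas, labels, n):
    """Map the n most frequently seen UA strings to their curated labels."""
    counts = Counter(observed_uas)
    return {ua: labels[ua] for ua, _ in counts.most_common(n) if ua in labels}


def classify(ua, table):
    """Exact-match lookup; anything outside the top N is lumped together."""
    return table.get(ua, "other")


# Made-up sample data, for illustration only.
observed = ["ua-a"] * 5 + ["ua-b"] * 3 + ["ua-c"]
labels = {"ua-a": "Chrome", "ua-b": "Firefox", "ua-c": "ObscureBot"}
table = build_top_n_table(observed, labels, n=2)
```

With this sample data, `classify("ua-a", table)` gives `"Chrome"`, while the rarely seen `"ua-c"` falls through to `"other"`.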
Divolte is complaining that

Should I let PHP parse the UA on my site and push it to Divolte as a custom field? Does the same go for country detection?
Since the user agent database is not up to date, it should be possible to configure Divolte not to load it.
In my experience, loading the user agent database takes a long time during Divolte startup, so skipping it could reduce startup time and save memory.