[LDI] Improve performance of historical data import #11
I stumbled across this InfluxDB blog mentioning enhancements for improved ingestion speeds by switching from JSON to the native line protocol format. A Python function to translate JSON into this line-protocol format is included. See Writing Data to InfluxDB with Python. Perhaps this idea is useful for luftdatenpumpe too?
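The blog's actual helper is not reproduced here. Purely as an illustration of the line protocol format it refers to, a minimal sketch of a dictionary-to-line-protocol translation might look like this; the measurement, tag and field names are placeholders, and escaping as well as strict field typing are omitted:

```python
def to_line_protocol(point: dict) -> str:
    """
    Render a point dictionary into InfluxDB line protocol:
    <measurement>,<tag_key>=<tag_value> <field_key>=<field_value> <timestamp>
    Note: a real implementation also needs escaping and proper field typing.
    """
    tags = ",".join(f"{k}={v}" for k, v in sorted(point.get("tags", {}).items()))
    fields = ",".join(f"{k}={v}" for k, v in sorted(point["fields"].items()))
    head = point["measurement"] + ("," + tags if tags else "")
    timestamp = point.get("time", "")
    return f"{head} {fields} {timestamp}".strip()


# Yields: "ldi_readings,sensor_id=1234 P1=12.5,P2=7.3 1554400000000000000"
print(to_line_protocol({
    "measurement": "ldi_readings",
    "tags": {"sensor_id": 1234},
    "fields": {"P1": 12.5, "P2": 7.3},
    "time": 1554400000000000000,
}))
```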
Thanks for mentioning that detail. However, it's a bit misleading: there is no JSON at all! Data is always transferred to InfluxDB using the line protocol. If data is passed in as a Python dictionary, it will get converted into line protocol by the client library before submission, so I believe tuning the serialization format will not change much here.

Please recognize that luftdatenpumpe also provides the possibility to submit data using UDP [1] instead of HTTP over TCP. I am wondering why this detail was not mentioned at all within the blog post you have referenced.

Do you still have trouble with ingest performance, @d-roet? I would be happy to look into that if time permits.

[1] https://docs.influxdata.com/influxdb/v1.7/supported_protocols/udp/
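As a hedged sketch of that UDP path, assuming the influxdb Python package and an InfluxDB server with a [[udp]] listener enabled; host, port, database and measurement names are placeholders:

```python
from influxdb import InfluxDBClient

# Assumes influxdb.conf has a UDP listener enabled, e.g.
#   [[udp]]
#     enabled = true
#     bind-address = ":8089"
#     database = "luftdaten"
client = InfluxDBClient(host="localhost", port=8086, use_udp=True, udp_port=8089)

points = [{
    "measurement": "ldi_readings",          # placeholder measurement name
    "tags": {"sensor_id": "1234"},
    "fields": {"P1": 12.5, "P2": 7.3},
}]

# With use_udp=True, write_points() sends the line protocol payload
# over UDP (fire and forget) instead of HTTP over TCP.
client.write_points(points)
```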
Thanks very much for mentioning that UDP possibility. I tried it following the docs you linked, but even the high-traffic UDP configuration mentioned there did not speed things up compared to the default HTTP ingestion, and experimenting a bit with the related settings did not change much either.

As a workaround, I have tried filtering our locally mirrored Luftdaten archive snapshot by only keeping the CSV archives of sensor IDs that are relevant for our case. Not surprisingly, that speeds things up a lot (roughly a factor of 20-30 faster than before). I imagine this is because far fewer files now have to be read and processed.

So for my case this is a workable solution and probably does not warrant further investigation into optimizations inside luftdatenpumpe itself.
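A minimal sketch of such a pre-filtering step, assuming the mirror keeps one CSV file per sensor and day with the sensor ID embedded in the file name (e.g. 2019-01-01_sds011_sensor_1234.csv); the archive path, file name convention and list of relevant IDs are assumptions here:

```python
import re
from pathlib import Path

ARCHIVE_ROOT = Path("/data/luftdaten-archive")   # assumed path of the local mirror
RELEVANT_SENSOR_IDS = {"1234", "5678"}           # placeholder sensor IDs

# Assumed file name convention: <date>_<sensor_type>_sensor_<id>.csv
pattern = re.compile(r"_sensor_(\d+)\.csv$")

for csv_file in ARCHIVE_ROOT.rglob("*.csv"):
    match = pattern.search(csv_file.name)
    if match and match.group(1) not in RELEVANT_SENSOR_IDS:
        # Drop files for sensors we do not care about, so the subsequent
        # import only has to touch the relevant archives.
        csv_file.unlink()
```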
I've just made a gist about how to acquire compressed Parquet files from archive.sensor.community and store their contents into InfluxDB. If this works out well, I will be happy to integrate this into Luftdatenpumpe appropriately. The synopsis is easy to grok:
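A rough sketch of that approach, assuming pandas with a Parquet engine and the influxdb Python package; the file path, column names, database and measurement name are placeholders, so the actual gist may differ:

```python
import pandas as pd
from influxdb import DataFrameClient

# Placeholder: one of the compressed Parquet files downloaded from
# archive.sensor.community, already present on local disk.
df = pd.read_parquet("2020-01_sds011.parquet")

# The Parquet files are partitioned by time range, so filtering by
# country (or sensor) can only happen after reading them.
df = df[df["country"] == "BE"]                        # assumed column name

# DataFrameClient expects a DatetimeIndex on the frame.
df["timestamp"] = pd.to_datetime(df["timestamp"])     # assumed column name
df = df.set_index("timestamp")

client = DataFrameClient(host="localhost", port=8086, database="luftdaten")
client.write_points(df, measurement="ldi_readings",
                    tag_columns=["sensor_id"], batch_size=5000)
```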
However, please note the Parquet files are partitioned by time range, so there is no way to filter by country before actually reading them. cc @d-roet, @wetterfrosch
Coming from #9 and #8, we see the ingestion performance of historical data from LDI, acquired from http://archive.luftdaten.info/ through wget, could well be improved. We will outline some thoughts about this here (in no particular order):