PubSub: Use integer type for timestamps (SQL) #3667

Open
weiss wants to merge 1 commit into processone:master from weiss:integer-timestamps

Conversation

@weiss (Member) commented Aug 24, 2021

Store PubSub item creation/modification timestamps as integers instead of "$megasec:$sec:$microsec" strings. This can improve the performance of certain SQL queries significantly.
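
For illustration (a made-up value): the stored string "1629:810133:123456" becomes the single integer 1629810133123456, i.e. microseconds since the Unix epoch:

-- Illustrative only: (megasec * 1000000 + sec) * 1000000 + microsec
SELECT (1629 * 1000000 + 810133) * 1000000 + 123456 AS creation;  -- 1629810133123456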

This change requires a non-trivial conversion of the pubsub_item table, but I think the potential performance improvements are worth the hassle. The upgrade notes could suggest conversion scripts such as the following one, which I tested successfully with PostgreSQL. If everyone agrees with this change, I'll look into MySQL and SQLite as well.

#!/bin/sh

set -e
set -u

host='localhost'
user='ejabberd'
database='ejabberd'

sql()
{
        psql -X -q -t -A -F ' ' "$@" -h "$host" "$database" "$user"
}

query()
{
        sql -c "$*;"
}

convert()
{
        # Split the "$megasec:$sec:$microsec" string into its three fields.
        local timestamp=$1
        local ms=${timestamp%%:*}
        local us=${timestamp##*:}
        local s0=${timestamp#*:}
        local s=${s0%:*}

        # Strip leading zeros so the arithmetic below won't treat the fields as octal.
        ms=${ms#${ms%%[!0]*}}
        us=${us#${us%%[!0]*}}
        s=${s#${s%%[!0]*}}

        # Combine the fields into microseconds since the Unix epoch.
        printf '%u' $(((ms * 1000000 + s) * 1000000 + us))
}

query 'ALTER TABLE pubsub_item ADD creation_temp BIGINT'
query 'ALTER TABLE pubsub_item ADD modification_temp BIGINT'

query 'SELECT nodeid,itemid,creation,modification FROM pubsub_item' |
    while read nodeid itemid creation modification
    do
        new_creation=$(convert "$creation")
        new_modification=$(convert "$modification")

        echo "UPDATE pubsub_item SET creation_temp = $new_creation \
              WHERE nodeid = $nodeid AND itemid = '$itemid';"
        echo "UPDATE pubsub_item SET modification_temp = $new_modification \
              WHERE nodeid = $nodeid AND itemid = '$itemid';"
    done | sql

query 'ALTER TABLE pubsub_item ALTER creation TYPE BIGINT USING creation_temp'
query 'ALTER TABLE pubsub_item ALTER modification TYPE BIGINT USING modification_temp'
query 'ALTER TABLE pubsub_item DROP creation_temp'
query 'ALTER TABLE pubsub_item DROP modification_temp'
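
As an optional sanity check after the conversion, both columns should now hold 16-digit microsecond values for recent timestamps:

-- After the conversion, creation and modification hold microseconds since the Unix epoch.
SELECT MIN(creation), MAX(creation), MIN(modification), MAX(modification) FROM pubsub_item;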

Thanks to Ammonit Measurement GmbH for sponsoring this work.

@coveralls

Coverage Status

Coverage decreased (-0.003%) to 34.183% when pulling 79ab6bb on weiss:integer-timestamps into ebf03a3 on processone:master.

@mremond (Member) commented Aug 25, 2021

@weiss The migration would require downtime, right?

@weiss (Member, Author) commented Aug 26, 2021

The migration would require downtime, right?

Yes, it would.

@ghost commented Sep 2, 2021

We did a quick and dirty test:

  1. We just created the BIGINT timestamp columns, but did not change the VARCHAR ones.
  2. We did explain analyze select itemid from pubsub_item where nodeid=NNN and publisher='XXX' order by modification; and the same with order by modification_temp. This matched around 650000 items.
  3. We did explain analyze select itemid from pubsub_item order by modification; and the same with order by modification_temp. This matched around 1800000 items.
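
Spelled out, the comparison in step 2 was roughly this pair of statements (NNN and XXX stand for the actual node ID and publisher; step 3 was the same without the WHERE clause):

EXPLAIN ANALYZE SELECT itemid FROM pubsub_item
  WHERE nodeid = NNN AND publisher = 'XXX' ORDER BY modification;
EXPLAIN ANALYZE SELECT itemid FROM pubsub_item
  WHERE nodeid = NNN AND publisher = 'XXX' ORDER BY modification_temp;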

The first SELECT (650000 items) was about 4 to 5 times faster with BIGINT.

The second SELECT (1800000 items) was about 9 to 10 times faster with BIGINT.

My conclusion is that this is a significant performance win for IoT, and perhaps also interesting for social-networking applications, but probably not so relevant for instant messaging. In our case, the old timestamp format is simply too slow: we get one IQ per second per device, but after a while ejabberd takes longer than one second (up to 1.7 s) to handle it on our current hardware.

Side note: if the item contents are very small, e.g. only one or two measurement values from IoT sensors, and there are a lot of messages, size might be a concern, too. A BIGINT needs 8 bytes, whereas the VARCHAR timestamp needs 24 bytes on PostgreSQL (20 bytes for the string plus 4 bytes of overhead). This does not matter in our setup, though, because we have many bytes per item anyway.

@Neustradamus

Have you planned to merge this, like the other PRs?

Thanks to Ammonit Measurement GmbH and @weiss for all the PRs!

@nosnilmot (Contributor)

When I first saw this I wrote some pure-SQL scripts that could also be used to perform the migration, but forgot to share them.

Enjoy!

PostgreSQL
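-- Converts the "megasec:sec:microsec" strings in place: (megasec * 1000000 + sec) * 1000000 + microsec.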

ALTER TABLE pubsub_item ALTER creation TYPE BIGINT USING
  (TO_NUMBER(substring(creation from 1 for position(':' in creation)-1), '999999999999') * 1000000 +
    TO_NUMBER(substring(creation from position(':' in creation)+1 for position(':' in substring(creation from position(':' in creation)+1))-1), '999999')) * 1000000 +
    TO_NUMBER(substring(creation from position(':' in substring(creation from position(':' in creation)+1)) + position(':' in creation)+1), '999999');

ALTER TABLE pubsub_item ALTER modification TYPE BIGINT USING
  (TO_NUMBER(substring(modification from 1 for position(':' in modification)-1), '999999999999') * 1000000 +
    TO_NUMBER(substring(modification from position(':' in modification)+1 for position(':' in substring(modification from position(':' in modification)+1))-1), '999999')) * 1000000 +
    TO_NUMBER(substring(modification from position(':' in substring(modification from position(':' in modification)+1)) + position(':' in modification)+1), '999999');

SQLite
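-- Converts each "megasec:sec:microsec" value to (megasec * 1000000 + sec) * 1000000 + microsec.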

UPDATE pubsub_item SET
  creation =
    (SUBSTR(creation, 1, INSTR(creation, ':')-1) * 1000000 +
      SUBSTR(creation, INSTR(creation, ':')+1, INSTR(SUBSTR(creation, INSTR(creation, ':')+1), ':')-1)) * 1000000 +
      SUBSTR(creation, INSTR(SUBSTR(creation, INSTR(creation, ':')+1), ':') + INSTR(creation, ':')+1),
  modification =
    (SUBSTR(modification, 1, INSTR(modification, ':')-1) * 1000000 +
      SUBSTR(modification, INSTR(modification, ':')+1, INSTR(SUBSTR(modification, INSTR(modification, ':')+1), ':')-1)) * 1000000 +
      SUBSTR(modification, INSTR(SUBSTR(modification, INSTR(modification, ':')+1), ':') + INSTR(modification, ':')+1);

MySQL/MariaDB

Option 1 - using temporary table:
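-- Builds a converted copy of the table, then swaps it in and recreates the indexes and foreign key.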

CREATE TABLE pubsub_item_temp (
  nodeid bigint,
  itemid text NOT NULL,
  publisher text NOT NULL,
  creation bigint,
  modification bigint,
  payload mediumtext NOT NULL
) ENGINE=InnoDB CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci as SELECT
nodeid, itemid, publisher,
(CAST(substring(creation from 1 for position(':' in creation)-1) AS UNSIGNED) * 1000000 + CAST(substring(creation from position(':' in creation)+1 for position(':' in substring(creation from position(':' in creation)+1))-1) AS UNSIGNED)) * 1000000 + CAST(substring(creation from position(':' in substring(creation from position(':' in creation)+1)) + position(':' in creation)+1) AS UNSIGNED) creation,
(CAST(substring(modification from 1 for position(':' in modification)-1) AS UNSIGNED) * 1000000 + CAST(substring(modification from position(':' in modification)+1 for position(':' in substring(modification from position(':' in modification)+1))-1) AS UNSIGNED)) * 1000000 + CAST(substring(modification from position(':' in substring(modification from position(':' in modification)+1)) + position(':' in modification)+1) AS UNSIGNED) modification,
payload
from pubsub_item;
-- DROP FOREIGN KEY needs the constraint name (auto-generated unless declared explicitly);
-- check SHOW CREATE TABLE pubsub_item and adjust the name below accordingly.
ALTER TABLE pubsub_item DROP FOREIGN KEY `pubsub_item_ibfk_1`;
-- The old indexes are dropped together with the table.
DROP TABLE pubsub_item;
RENAME TABLE pubsub_item_temp TO pubsub_item;
CREATE INDEX i_pubsub_item_itemid ON pubsub_item(itemid(36));
CREATE UNIQUE INDEX i_pubsub_item_tuple ON pubsub_item(nodeid, itemid(36));
ALTER TABLE `pubsub_item` ADD FOREIGN KEY (`nodeid`) REFERENCES `pubsub_node` (`nodeid`) ON DELETE CASCADE;

Option 2 - using temporary column:
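-- Fills temporary BIGINT columns, then drops the old string columns and renames the new ones.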

ALTER TABLE pubsub_item ADD creation_temp BIGINT;
ALTER TABLE pubsub_item ADD modification_temp BIGINT;
UPDATE pubsub_item set creation_temp =
(CAST(substring(creation from 1 for position(':' in creation)-1) AS UNSIGNED) * 1000000 + CAST(substring(creation from position(':' in creation)+1 for position(':' in substring(creation from position(':' in creation)+1))-1) AS UNSIGNED)) * 1000000 + CAST(substring(creation from position(':' in substring(creation from position(':' in creation)+1)) + position(':' in creation)+1) AS UNSIGNED),
modification_temp = (CAST(substring(modification from 1 for position(':' in modification)-1) AS UNSIGNED) * 1000000 + CAST(substring(modification from position(':' in modification)+1 for position(':' in substring(modification from position(':' in modification)+1))-1) AS UNSIGNED)) * 1000000 + CAST(substring(modification from position(':' in substring(modification from position(':' in modification)+1)) + position(':' in modification)+1) AS UNSIGNED);
ALTER TABLE pubsub_item DROP creation;
ALTER TABLE pubsub_item DROP modification;
ALTER TABLE pubsub_item CHANGE COLUMN creation_temp creation BIGINT;
ALTER TABLE pubsub_item CHANGE COLUMN modification_temp modification BIGINT;

@mremond added this to the ejabberd 23.xx milestone Jul 20, 2023
@badlop modified the milestones: ejabberd 23.10, ejabberd 23.xx Oct 17, 2023
@badlop modified the milestones: ejabberd 24.xx, Parking Lot Jun 18, 2024
6 participants