Add bit::sort and bit::stable_sort #40

nmcclatchey · 2022-05-14T00:29:00Z

Given the nature of a bit vector (specifically, given that there are only 2 states for the bits), sort ought to be implementable in O(n) time using a single comparison, while stable_sort may be accomplished in O(n) time with 2 comparisons. Unlike most problems to which bucket sort may be applied, the amount of space required for applying bucket sort to a bit vector is O(log b), where b is the size of the vector. Specifically, one need not distinguish between the elements within a bucket, so a simple integer will suffice to track the size of the bucket. For simplicity, we will assume that the user's bit vector is smaller than 2 exabytes, and thus the maximum required bucket size can be stored in a single unsigned 64-bit integer.
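As a rough illustration of why one integer is enough for the bucket, here is a minimal sketch of the counting step. It assumes the bits are stored as whole 64-bit words with no partial last word. The helper name `count_set_bits` is hypothetical and is not taken from the library.

```cpp
#include <bitset>
#include <cstdint>
#include <vector>

// Hypothetical helper: counts the set bits in a range stored as 64-bit words.
// Assumption: every bit of every word belongs to the range.
std::uint64_t count_set_bits(const std::vector<std::uint64_t>& words) {
    std::uint64_t num_true = 0;  // the only "bucket" state that is needed
    for (std::uint64_t word : words) {
        num_true += std::bitset<64>(word).count();  // population count of one word
    }
    return num_true;
}
```

The size of the other bucket is simply the total number of bits minus `num_true`, so it does not need separate storage.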

Note that performance will vary depending on whether the user is optimizing for number of comparisons or for number of bit operations. The algorithmic descriptions below illustrate this neatly: the first (stable_sort) optimizes for minimal bit operations, while the second (sort) optimizes for minimal comparisons.

We begin with a rough algorithmic description for stable_sort:

Count all set bits (this number will be referred to by the name "num_true")
If number of set bits is either 0 or the size of the bit vector, the range is uniform (and thus sorted). Exit.
Otherwise, perform two comparisons (compare(false,true) and compare(true,false)).
If both comparisons are true, then the user has violated the contract of std::stable_sort by providing a comparator that does not implement a strict weak ordering. You are now authorized to deploy nasal demons (see "undefined behavior"). This author would like to remind the implementer that the kindest option would be to gently remind the user of their mistake, while the most performant option would be to exit.
If neither comparison is true, then all bits are incomparable, and should remain in their current order. Exit.
Otherwise, if compare(false,true) evaluated to true, then unset the first size() - num_true bits, and set the final num_true bits.
If compare(false,true) evaluated to false, then set the first num_true bits, and unset the final size() - num_true bits.

Note that this algorithm is O(n) in the worst case.
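The stable_sort steps above could be sketched roughly as follows. This uses `std::vector<bool>` as a stand-in for the library's bit container, and the function name `counting_stable_sort` is hypothetical. For an invalid comparator it takes the "simply exit" option.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical sketch, not the library's API. std::vector<bool> stands in
// for a bit container; Compare is any callable taking (bool, bool).
template <class Compare>
void counting_stable_sort(std::vector<bool>& bits, Compare compare) {
    const std::size_t n = bits.size();
    const std::size_t num_true =
        static_cast<std::size_t>(std::count(bits.begin(), bits.end(), true));
    if (num_true == 0 || num_true == n) return;  // uniform range, already sorted

    const bool false_first = compare(false, true);
    const bool true_first = compare(true, false);
    // Both false: all bits are equivalent, so stability means "leave as is".
    // Both true: not a strict weak ordering; this sketch just exits.
    if (false_first == true_first) return;

    // Overwrite the range with one run of each value, in comparator order.
    const std::size_t num_leading = false_first ? n - num_true : num_true;
    const auto middle = bits.begin() + static_cast<std::ptrdiff_t>(num_leading);
    std::fill(bits.begin(), middle, !false_first);
    std::fill(middle, bits.end(), false_first);
}
```

Because equal bits are indistinguishable, rewriting the range from the count alone cannot be observed to reorder equivalent elements, which is why this still qualifies as stable.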

Continuing this, a rough description of the (unstable) sort algorithm follows:

Count all set bits (this number will be referred to by the name "num_true")
If number of set bits is either 0 or the size of the bit vector, the range is uniform (and thus sorted). Exit.
Otherwise, perform the comparison compare(false,true).
If compare(false,true) evaluated to true, then unset the first size() - num_true bits, and set the final num_true bits.
If compare(false,true) evaluated to false, then set the first num_true bits, and unset the final size() - num_true bits.

This algorithm is O(n) in both best and worst case. Unlike the stable_sort proposed earlier, this algorithm will perform the smallest possible number of comparisons (0 or 1) required for an unstable sort.
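A minimal sketch of the unstable variant, under the same assumptions as before: `std::vector<bool>` stands in for the bit container and the name `counting_sort_bits` is hypothetical. It calls the comparator at most once.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical sketch, not the library's API.
template <class Compare>
void counting_sort_bits(std::vector<bool>& bits, Compare compare) {
    const std::size_t n = bits.size();
    const std::size_t num_true =
        static_cast<std::size_t>(std::count(bits.begin(), bits.end(), true));
    if (num_true == 0 || num_true == n) return;  // uniform range: zero comparisons

    // Single comparison. If it is false, either true orders before false or
    // the two values are equivalent; placing the set bits first is a valid
    // sorted order in both cases.
    const bool false_first = compare(false, true);

    const std::size_t num_leading = false_first ? n - num_true : num_true;
    const auto middle = bits.begin() + static_cast<std::ptrdiff_t>(num_leading);
    std::fill(bits.begin(), middle, !false_first);
    std::fill(middle, bits.end(), false_first);
}
```

Note that when the comparator treats both values as equivalent, this version may still rewrite the range, which is the trade of extra bit operations for fewer comparisons.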

I estimate the efficiency of the above algorithms to greatly exceed user expectations. However, I estimate their usefulness to be, at best, trivial.

nmcclatchey · 2022-05-14T01:23:56Z

Note that some efficiency is lost to ensure that elements not found within the range are not compared. Otherwise, the stable_sort algorithm would be:

Perform two comparisons (compare(false,true) and compare(true,false)).
If both comparisons are true, then the user has violated the contract of std::stable_sort by providing a comparator that does not implement a strict weak ordering. You are now authorized to deploy nasal demons (see "undefined behavior").
If neither comparison is true, then all bits are incomparable, and should remain in their current order. Exit.
Otherwise, count all set bits (this number will be referred to by the name "num_true")
If number of set bits is either 0 or the size of the bit vector, the range is uniform (and thus sorted). Exit.
Otherwise, if compare(false,true) evaluated to true, then unset the first size() - num_true bits, and set the final num_true bits.
If compare(false,true) evaluated to false, then set the first num_true bits, and unset the final size() - num_true bits.
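For comparison, the compare-first ordering above might look like the sketch below (again with `std::vector<bool>` as a stand-in and a hypothetical function name). It skips the counting pass entirely when all bits are equivalent, at the cost of possibly passing the comparator a value that does not occur in the range.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical sketch, not the library's API.
template <class Compare>
void compare_first_stable_sort(std::vector<bool>& bits, Compare compare) {
    // Comparisons happen up front, even if the range turns out to be uniform.
    const bool false_first = compare(false, true);
    const bool true_first = compare(true, false);
    // Equivalent values (both false) or an invalid comparator (both true):
    // exit without ever touching the bits.
    if (false_first == true_first) return;

    const std::size_t n = bits.size();
    const std::size_t num_true =
        static_cast<std::size_t>(std::count(bits.begin(), bits.end(), true));
    if (num_true == 0 || num_true == n) return;  // uniform range

    const std::size_t num_leading = false_first ? n - num_true : num_true;
    const auto middle = bits.begin() + static_cast<std::ptrdiff_t>(num_leading);
    std::fill(bits.begin(), middle, !false_first);
    std::fill(middle, bits.end(), false_first);
}
```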

bkille added this to the STL algorithms milestone May 18, 2022

bkille added the new-stl-algorithm label May 18, 2022
