Skip to content

ftlHashSet

Robert Rüger edited this page Jul 25, 2018 · 1 revision

ftlHashSet is a container for a set of unique values. Search, insertion, and removal of values have average constant-time complexity. It basically represents a set in the mathematical sense.

Internally, the values are not sorted in any particular order, but organized into buckets. Which bucket an element is placed into depends entirely on the hash of its value. This allows fast access to individual elements, since once the hash is computed, it refers to the exact bucket the element is placed into. The value types have to implement the ftlHash method and be comparable with the == operator. Hash functions for the basic Fortran types are predefined in the ftlHash module and can be combined to implement hash functions for any derived types.

ftlHashSet basically a Fortran reimplementation of C++'s std::unordered_set with a slightly different (and more intuitive) interface.

Instantiation

Macros

ftlHashSet uses the following macros for instantiation:

FTL_TEMPLATE_TYPE
The type of the objects to be stored in the container, e.g. integer(int64) or MyDerivedType. Do not add the enclosing type() for derived types.
FTL_TEMPLATE_TYPE_IS_DERIVED
This needs to be defined if FTL_TEMPLATE_TYPE is a derived type. If the type is derived, it must implement the ftlHash function and be comparable with the == operator.
FTL_TEMPLATE_TYPE_MODULE
The name of the module in which FTL_TEMPLATE_TYPE is defined. Specifying this is probably only necessary if FTL_TEMPLATE_TYPE is a derived type, or uses a kind defined in another module, e.g. int64 from the iso_fortran_env module.
FTL_TEMPLATE_TYPE_NAME
A convenient user-readable name for the stored type, e.g. Int. This will be used in the type-name of the dynArray container itself and the module in which it is available, e.g. setting this to Int will result in the type ftlHashSetInt available in module ftlHashSetIntModule.
FTL_INSTANTIATE_TEMPLATE
If this macro is defined the template will be instantiated. If not, the template will not be instantiated, but some information about the ftlHashSet container (e.g. its name and the kind of iterator it provides) will be passed on to subsequent template instantiations (e.g. ftlAlgorithms).

Specialization for ftlString keys

There is a specialization of the ftlHashSet template for ftlString as the element type. It is accessed by defining FTL_TEMPLATE_TYPE_IS_FTLSTRING. Its sole purpose is to add an alternative version for all methods that accept an element as input. The alternative version accepts a plain Fortran character variable as input instead of a ftlString. This in convenient when passing a string literal. Of course the normal versions accepting an ftlString are also available.

type(ftlHashSetString) :: set

call set%Insert(ftlString('inconvenient'))
! ... is inconvenient compared to:
call set%Insert('convenient')
! ... but both versions are totally equivalent

If FTL_TEMPLATE_TYPE_IS_FTLSTRING is defined, it is not needed to define either FTL_TEMPLATE_TYPE_MODULE or FTL_TEMPLATE_TYPE_IS_DERIVED. (Since we know that the keytype is exactly ftlString, we already know the module in which it is defined and that it is a derived type.) Note however, that FTL_TEMPLATE_TYPE_NAME should still be defined.

Example instantiations

Example instantiation of a set of integers. The resulting type ftlHashSetInt is available in ftlHashSetIntModule.

#define FTL_TEMPLATE_TYPE integer
#define FTL_TEMPLATE_TYPE_NAME Int
#define FTL_INSTANTIATE_TEMPLATE
#include <ftlHashSet.F90_template>

Methods and public data members

Construction, destruction & assignment

ftlHashSet%New()

Constructs a new ftlHashSet container. There a two versions:

  • Constructs an empty container with a specified number of buckets.

    subroutine New(self, nBuckets)
       type(ftlHashSetT), intent(out) :: self
       integer          , intent(in)  :: nBuckets

    For best performance, the initial number of buckets should be chosen to be slightly larger than the expected number of elements to be stored in the container. If this information is not available, it should be a small number as the number of buckets will grow dynamically as more elements are stored in the container.

  • Constructs an ftlHashSet as a copy of another ftlHashSet.

    subroutine New(self, other)
       type(ftlHashSetT), intent(out) :: self
       type(ftlHashSetT), intent(in)  :: self

ftlHashSet%Delete()

Destructs the container. The used storage is deallocated.

subroutine Delete(self)
   type(ftlHashSetT), intent(inout) :: self

ftlHashSet assignment(=)

Assignment of one ftlHashSet to another is exactly the same as copy-construction, see above.

subroutine assignment(=)(self, other)
   type(ftlHashSetT), intent(out) :: self
   type(ftlHashSetT), intent(in)  :: other

Iterators

WARNING: Like all iterators, iterators of the ftlHashSet container provide a it%value member, that can be used to access the value pointer to be the iterator. While it is technically possible to change it%value by assigning to it or passing it into subroutines writing to it, this must never be done, as it would leave the set in an inconsistent state if the new value should be assigned to a different bucket. Use the %value field of ftlHashSet iterators strictly read-only! Note that the only ftlAlgorithms that are safe to run on an ftlHashSet are the "non-modifying sequence operations". (Luckily it also doesn't make much sense to run any of the other algorithms ...)

ftlHashSet%Begin()

Begin(ftlHashSet)

Returns a random access iterator to the first element of the container.

ftlHashSetTIterator function Begin(self)
   type(ftlHashSetT), intent(in) :: self

Notice that an ftlHashSet object makes no guarantees on which specific element is considered its first element. But, in any case, the range that goes from its begin to its end covers all the elements in the container.

If the container is empty, the returned iterator will be equal to End(). Note that this method exists as both type-bound and as a free function.

ftlHashSet%End()

End(ftlHashSet)

Returns a random access iterator to the element following the last element of the container. This element acts as a placeholder; attempting to access it results in undefined behavior.

ftlHashSetTIterator function End(self)
   type(ftlHashSetT), intent(in) :: self

Notice that an ftlHashSet object makes no guarantees on which order its elements follow. But, in any case, the range that goes from its begin to its end covers all the elements in the container.

Note that this method exists as both type-bound and as a free function.

Capacity

ftlHashSet%Empty()

Returns a logical value indicating whether the set is empty, i.e. whether its size is 0.

logical function Empty(self)
   type(ftlHashSetT), intent(in) :: self

This function does not modify the content of the container in any way. To clear the contents, member function ftlHashSet%Clear() exists.

ftlHashSet%Size()

size(ftlHashSet)

Returns the number of entries in the ftlHashSet container.

integer function Size(self)
   type(ftlHashSetT), intent(in) :: self

Element access

ftlHashSet%Insert()

Inserts a value into the set. If the value is already in the set, the container remains unchanged.

subroutine Set(self, key, value)
   type(ftlHashSetT), intent(inout) :: self
   T                , intent(in)    :: value

Insertion into the set is a constant time operation, except for the case where the insertion causes the maximum load factor to be exceeded. In this case all N elements of the set are rehashed, which is an O(N) operation.

ftlHashSet%Has()

Checks whether a value is in the set.

logical function Has(self, value)
   type(ftlHashSetT), intent(in) :: self
   T                , intent(in) :: value

T operator(.in.) ftlHashSet

Python style for checking whether a set has an entry for a particular key. Absolutely equivalent to ftlHashSet%Has().

logical function operator(.in.)(lhs, rhs)
   T                , intent(in) :: lhs
   type(ftlHashSetT), intent(in) :: rhs

ftlHashSet%Find()

Searches the container for a value and returns an iterator to it if found, otherwise it returns an iterator to ftlHashSet%End() (the element past the end of the container).

type(ftlHashSetTIterator) function Find(self, value)
   type(ftlHashSetT), intent(in) :: self
   T                , intent(in) :: value

Modifiers

ftlHashSet%Erase()

Removes entries from the ftlHashSet container, reducing the size of the container by the number of removed elements.

  • Remove the entry with a particular value. This has no effect if the set contains no such entry.

    subroutine Erase(self, value)
       type(ftlHashSetT), intent(inout) :: self
       T                , intent(in)    :: value
  • Remove the entry pointed to be a particular iterator.

    subroutine Erase(self, it)
       type(ftlHashSetT)        , intent(inout) :: self
       type(ftlHashSetTIterator), intent(in)    :: it
  • Removes all entries in the (half open) range delimited by the iterators [first,last).

    subroutine Erase(self, first, last)
       type(ftlHashSetT)        , intent(inout) :: self
       type(ftlHashSetTIterator), intent(in)    :: first, last

ftlHashSet%Clear()

Removes all entries from the set, leaving it with a size of zero.

subroutine Clear(self)
   type(ftlHashSetT), intent(inout) :: self

Buckets

ftlHashSet%BucketCount()

Returns the number of buckets in the ftlHashSet container.

The number of buckets influences directly the load factor of the container's hash table (and thus the probability of collision). The container automatically increases the number of buckets to keep the load factor below a specific threshold (its maxLoadFactor), causing a rehash each time the number of buckets needs to be increased.

integer function BucketCount(self)
   type(ftlHashSetT), intent(in) :: self

ftlHashSet%BucketSize()

Returns the number of elements in bucket n.

The number of elements in a bucket influences the time it takes to access a particular element in the bucket. The container automatically increases the number of buckets to keep the load factor (which is the average bucket size) below its maxLoadFactor.

integer function BucketSize(self, n)
   type(ftlHashSetT), intent(in) :: self
   integer          , intent(in) :: n

ftlHashSet%Bucket()

Returns the bucket number where the element with value is (or would be) located. Buckets are numbered from 1 to BucketCount.

integer function Bucket(self, value)
   type(ftlHashSetT), intent(in) :: self
   T                , intent(in) :: value

Hash policy

ftlHashSet%LoadFactor()

Returns the current load factor in the ftlHashSet container, that is the ratio between the number of elements in the container (its size) and the number of buckets.

real function LoadFactor(self)
   type(ftlHashSetT), intent(in) :: self

The load factor influences the probability of collision in the hash table (i.e., the probability of two elements being located in the same bucket). The container automatically increases the number of buckets to keep the load factor below a specific threshold (its maxLoadFactor), causing a rehash each time an expansion is needed.

To retrieve or change this threshold, use member functions GetMaxLoadFactor and SetMaxLoadFactor:

ftlHashSet%SetMaxLoadFactor()

ftlHashSet%GetMaxLoadFactor()

Sets/Gets the maximum load factor (number of elements per bucket). The container automatically increases the number of buckets if the load factor exceeds this threshold.

subroutine SetMaxLoadFactor(self, ml)
   type(ftlHashSetT), intent(inout) :: self
   real             , intent(in)    :: ml

real function GetMaxLoadFactor(self)
   type(ftlHashSetT), intent(in) :: self

ftlHashSet%Rehash()

Sets the number of buckets in the container to n, forcing a rehash of the elements in the container.

subroutine Rehash(self, nBuckets)
   type(ftlHashSetT), intent(inout) :: self
   integer          , intent(in)    :: nBuckets

A rehash is the reconstruction of the hash table: All the elements in the container are rearranged according to their hash value into the new set of buckets. This may alter the order of iteration of elements within the container.

Rehashes are automatically performed by the container whenever its load factor is going to surpass its maxLoadFactor in an operation.

Notice that this subroutine expects the number of buckets as argument. A similar subroutine exists, ftlHashSet%Reserve(), that expects the number of elements in the container as argument.

ftlHashSet%Reserve()

Sets the number of buckets in the container (bucketCount) to the most appropriate to contain at least n elements.

subroutine Reserve(self, n)
   type(ftlHashSetT), intent(inout) :: self
   integer          , intent(in)    :: n

If n is greater than the current bucketCount multiplied by the maxLoadFactor, the container's bucketCount is increased and a rehash is forced.

If n is lower than that, the subroutine has no effect.