Some strengths of the file format

  1. This file format is simple to parse.
  2. With this file format we only need to read the blocks of the attributes that appear in the projection and selection clauses.
  3. It is also possible to filter out blocks of a given attribute by checking the min/max statistics, a primitive form of "zone map" (see the sketch after this list).
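
To make the zone-map idea concrete, here is a minimal C++ sketch of block pruning for an equality predicate on a float attribute. `BlockStats`, `block_may_match`, and `prune_blocks` are illustrative names, not the actual db721 metadata or code:

```cpp
#include <vector>

// Hypothetical per-block statistics as read from the file's metadata;
// field names are illustrative, not the actual db721 layout.
struct BlockStats {
    float min;
    float max;
};

// A block can be skipped for the predicate "attr = value" whenever the
// value falls outside the block's [min, max] range.
bool block_may_match(const BlockStats& s, float value) {
    return s.min <= value && value <= s.max;
}

// Collect only the block indices that could contain matching tuples.
std::vector<size_t> prune_blocks(const std::vector<BlockStats>& stats,
                                 float value) {
    std::vector<size_t> candidates;
    for (size_t i = 0; i < stats.size(); ++i)
        if (block_may_match(stats[i], value))
            candidates.push_back(i);
    return candidates;
}
```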

Some weaknesses of the file format

  1. Attributes like Sex and Farm Name in the chicken table contain many repeated values; leaving them uncompressed requires more disk and buffer-pool space to hold the data.
  2. All the blocks of a single attribute are placed contiguously, so different attributes end up far apart in the file; this increases the cost of stitching tuples back together.
  3. The min/max statistic is not useful for attributes with few distinct values, such as Sex in the chicken table (illustrated after this list).
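
To see why, suppose every block of Sex mixes both values; then every block's min/max spans the whole domain and the zone map prunes nothing. A tiny self-contained illustration under that assumption:

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

int main() {
    // If every block of the Sex attribute mixes both values, every
    // block's min/max is ("FEMALE", "MALE").
    std::vector<std::string> block = {"MALE", "FEMALE", "MALE", "FEMALE"};
    auto [mn, mx] = std::minmax_element(block.begin(), block.end());
    assert(*mn == "FEMALE" && *mx == "MALE");
    // A predicate like Sex = 'MALE' falls inside [min, max] for every
    // such block, so the zone map can never skip any of them.
}
```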

Suggestions for improving the file format

  1. We could introduce block-based compression to reduce the file size on disk and the memory footprint (see the sketch after this list).
  2. Store the data of a single attribute in multiple small chunks and group the chunks of different attributes together, like a Parquet row group, to make stitching tuples back together more efficient.
  3. Supporting a bitmap index as the filter in the block statistics might be a good idea, especially for low-cardinality attributes.
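
As a rough sketch of how suggestions 1 and 3 could combine, the snippet below dictionary-encodes a low-cardinality attribute and builds one bitmap per distinct value for a block. `DictColumn`, `build_bitmaps`, and the layout are hypothetical, not a proposal for the exact on-disk format:

```cpp
#include <cstdint>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical dictionary encoding for a low-cardinality attribute:
// each value is replaced by a small integer code, shrinking the block.
struct DictColumn {
    std::vector<std::string> dict;                     // code -> value
    std::unordered_map<std::string, uint32_t> lookup;  // value -> code
    std::vector<uint32_t> codes;                       // one code per row

    void append(const std::string& v) {
        auto [it, inserted] = lookup.try_emplace(v, dict.size());
        if (inserted) dict.push_back(v);
        codes.push_back(it->second);
    }
};

// Hypothetical per-block bitmap filter: bit i of bitmaps[c] is set iff
// row i of the block holds the value with code c.
std::vector<std::vector<uint64_t>> build_bitmaps(const DictColumn& col) {
    std::vector<std::vector<uint64_t>> bitmaps(
        col.dict.size(),
        std::vector<uint64_t>((col.codes.size() + 63) / 64, 0));
    for (size_t i = 0; i < col.codes.size(); ++i)
        bitmaps[col.codes[i]][i / 64] |= uint64_t{1} << (i % 64);
    return bitmaps;
}
```

A scan for Sex = 'MALE' could then consult the block's bitmap (or just check whether it is all zeros) instead of the min/max pair that, as noted above, never prunes anything for such attributes.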

Other thoughts

Although we are using a columnar file format, the Postgres scan executor assumes a row-major traversal: for example, ExecScanAccessMtd (in our db721 FDW, ForeignNext) is called once per row, i.e., once per TupleTableSlot, in the source code. This assumption limits the power of columnar storage; for example, the row-major traversal might not be cache-friendly on top of a columnar file. One way to soften the mismatch is to decode a whole block at a time inside the FDW and serve each per-row call from an in-memory buffer, as sketched below.
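
This is a minimal sketch of that batching pattern, not the real FDW code; `BatchedScanner`, `Row`, and `decode_next_block` are hypothetical names standing in for the db721 reader:

```cpp
#include <cstddef>
#include <vector>

// Hypothetical row type after stitching the projected columns together.
struct Row { float weight; int age; };

class BatchedScanner {
public:
    // Called once per row, like ForeignNext: almost every call is a
    // cheap buffer read; the expensive columnar decode happens once
    // per block and is amortized over all of the block's rows.
    const Row* next() {
        if (pos_ == buffer_.size() && !load_next_block())
            return nullptr;  // end of scan
        return &buffer_[pos_++];
    }

private:
    // Decode the next block of each projected attribute and stitch the
    // columns into row-major tuples; returns false at end of file.
    bool load_next_block() {
        buffer_ = decode_next_block();
        pos_ = 0;
        return !buffer_.empty();
    }

    // Hypothetical block decoder; the real work is elided in this sketch.
    std::vector<Row> decode_next_block() { return {}; }

    std::vector<Row> buffer_;  // current block, already row-major
    std::size_t pos_ = 0;      // next row to hand back
};
```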