Skip to content

sudarshanmg/huffman

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

10 Commits
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Huffman compression

  • Compresses files using the huffman-compression technique
  • Achieves best results (compression ratios) if the given file has many repeated characters
  • Can also encode and decode a string (legacy)
  • Takes a file as input and spits out a compressed file

Installation

  • Build the executable using the cmd make
  • Run the huffman_compress file with appropriate inputs
  • Make sure to make clean before re-building

Huffman Compression Process

  • Each char from the input file is assigned a code in binary that is way shorter than 8 bits (size required to store a char)
  • The input file is then encoded using these codes (known as the huffman codes)
  • The compressed file's metadata consist of huffman codes + encoded data
  • Huffman codes are generated by building a tree known as the huffman tree which internally uses a min-heap to pick out and insert the nodes to the tree
  • Here's a detailed video on how huffman coding works by the OG Abdul Bari