🧠

Compression

  • The act of making something smaller
  • Often used to store/transmit efficiently
  • Reduction in the amount of space needed to store a piece of data or the bandwidth to transmit it
  • Compression ratio
    • Size compressed data / size of original data
  • Can be lossy or lossless
    • Lossless --> compression where no data is lost
    • Lossy --> compression where some data is lost

Text Compression

Keyword Encoding

  • Frequently used words are replaced with characters
  • We do not bother encoding single length words (I, a, etc) because we wouldn't save any space
  • The longer the word, the higher the compression ratio

Keyword Encoding Example

wordsymbol
As*^
the~
and+
that$
must&

*note the uppercase A

As the Sun and the Moon circle the Earth, we must remember that we are small.

becomes...

^ ~ Sun + ~ Moon circle ~ Earth, we & remember $ we are small.

Run-Length Encoding

  • When characters are repeated, we remove all duplicates and replace them with a count
    • AAAAAAA ---> *A7
    • * is a flag indicating compression needs to be decoded

Huffman Encoding

  • Uses variable-length bit strings to represent each characters
  • Frequently used characters have short codes
    • Saves space
  • Every bit string representing a character must have a unique prefix
    • Allows you to decode

Decoding

  • Look for match left --> right (bit by bit)
  • Replace when match found
  • Repeat until decoded

Huffman Encoding Example

Huffman CodeCharacter
00A
01E
100L
110O
111R
1010B
1011D
  • Encoding DOORBELL would result in 1011110110111101001100100
  • If this used a fixed-size bit string for each character (say 8 bits) it would take 64 bits to store the word (8 x 8)
  • The compressed version only takes 25 bits
  • Compression ratio is 25/64 (~0.39)

Video Compression

Temporal Compression

  • Looks for differences between consecutive frames
  • If most of an image in two frames hasn't changed, we don't need all of the data of the second frame
  • Keyframe is chosen to compare differences
    • Entire keyframe is stored
  • Consecutive images only store changes
    • Delta frames

Spatial Compression

  • Removes redundant information within a frame
  • Similar issues faced by compression still images
  • Group pixels into blocks that have same color
    • Instead of storing each pixel, color and coordinates are stored
    • Similar to run-length encoding

Computer Science

Backlinks