Compression
- The act of making something smaller
- Often used to store/transmit efficiently
- Reduction in the amount of space needed to store a piece of data or the bandwidth to transmit it
- Compression ratio
- Size compressed data / size of original data
- Can be lossy or lossless
- Lossless --> compression where no data is lost
- Lossy --> compression where some data is lost
Text Compression
- Three common types:
Keyword Encoding
- Frequently used words are replaced with characters
- We do not bother encoding single length words (I, a, etc) because we wouldn't save any space
- The longer the word, the higher the compression ratio
Keyword Encoding Example
| word | symbol |
|---|---|
| As* | ^ |
| the | ~ |
| and | + |
| that | $ |
| must | & |
*note the uppercase A
As the Sun and the Moon circle the Earth, we must remember that we are small.
becomes...
^ ~ Sun + ~ Moon circle ~ Earth, we & remember $ we are small.
Run-Length Encoding
- When characters are repeated, we remove all duplicates and replace them with a count
AAAAAAA--->*A7*is a flag indicating compression needs to be decoded
Huffman Encoding
- Uses variable-length bit strings to represent each characters
- Frequently used characters have short codes
- Saves space
- Every bit string representing a character must have a unique prefix
- Allows you to decode
Decoding
- Look for match left --> right (bit by bit)
- Replace when match found
- Repeat until decoded
Huffman Encoding Example
| Huffman Code | Character |
|---|---|
| 00 | A |
| 01 | E |
| 100 | L |
| 110 | O |
| 111 | R |
| 1010 | B |
| 1011 | D |
- Encoding
DOORBELLwould result in1011110110111101001100100 - If this used a fixed-size bit string for each character (say 8 bits) it would take 64 bits to store the word (8 x 8)
- The compressed version only takes 25 bits
- Compression ratio is 25/64 (~0.39)
Video Compression
Temporal Compression
- Looks for differences between consecutive frames
- If most of an image in two frames hasn't changed, we don't need all of the data of the second frame
- Keyframe is chosen to compare differences
- Entire keyframe is stored
- Consecutive images only store changes
- Delta frames
Spatial Compression
- Removes redundant information within a frame
- Similar issues faced by compression still images
- Group pixels into blocks that have same color
- Instead of storing each pixel, color and coordinates are stored
- Similar to run-length encoding