
how lossless compression preserves audio quality

nicholas chen · march 14, 2026 · 6 min read


in an era of streaming and convenience, audio quality is often overlooked. however, for those who care about the nuances of sound, lossless audio is the gold standard. in this post, i will explore what it is and why it matters.

what is lossless audio?

lossless audio compression reduces the file size of an audio track without losing any data. unlike lossy formats like MP3 or AAC, which discard information to save space, lossless formats like FLAC or ALAC preserve every single bit of the original recording.

lossless vs lossy

lossy formats like MP3 use psychoacoustics — they throw away information humans can't easily hear: sounds masked by louder nearby frequencies, very high frequencies (above ~16 kHz for most adults), and quiet sounds during loud moments (temporal masking).

lossless formats keep every sample exactly. compressed ones like FLAC work like a zip file: smaller, perfectly reconstructable, nothing discarded. WAV is also lossless, but it just stores the raw samples with no compression at all.

Spectral representation of compressed sound: original vs simulated lossy.

common lossless formats

there are several lossless formats available today, each with its own advantages. WAV and AIFF are uncompressed formats, while FLAC and ALAC are compressed but still lossless.

what is FLAC?

FLAC = Free Lossless Audio Codec. open source, developed by Xiph.org (the same group behind Ogg Vorbis). completely free, no patents. common alternatives: ALAC (apple's version, same idea), WAV/AIFF (lossless but uncompressed, no LPC), WavPack (slightly better compression). FLAC is the standard for lossless archival: open, well-supported, and it typically shrinks files to ~50–60% of their original size.

ALAC (apple lossless audio codec)

ALAC is apple's lossless format — originally proprietary, open-sourced under the Apache license in 2011. it is similar to FLAC but designed for use within the apple ecosystem, including itunes and apple music.

WAV and AIFF

these are uncompressed formats that store raw PCM audio. the quality is identical to FLAC or ALAC — lossless is lossless — but the files are much larger and metadata support is limited.

MP3 under the hood

MP3 splits audio into short frames, uses MDCT and a psychoacoustic model to compute a masking threshold per band, then allocates bits only where the signal is audible. you're compressing perceptual error — the removed data is designed to be inaudible. at 128 kbps it mostly works; at 64 kbps artifacts appear.
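to make the bit-allocation idea concrete, here's a toy sketch. this is NOT a real MP3 encoder — there's no MDCT, and the band energies, masking numbers, and 6-dB-per-bit rule are invented for illustration (6 dB per bit is a standard quantizer rule of thumb, but a real psychoacoustic model computes the threshold, not a hand-written array):

```javascript
// Toy perceptual bit allocation (NOT real MP3: no MDCT, and the masking
// numbers are invented for illustration).
const signalDb  = [60, 52, 30, 45, 12, 5];   // hypothetical band energies
const maskingDb = [40, 40, 35, 30, 20, 18];  // hypothetical masking threshold

// each ~6 dB of signal-to-mask headroom buys one bit of quantizer
// resolution; bands at or below the threshold are judged inaudible
// and get zero bits
const bitsPerBand = signalDb.map((s, i) =>
  Math.max(0, Math.ceil((s - maskingDb[i]) / 6))
);
console.log(bitsPerBand);  // [4, 2, 0, 3, 0, 0] — masked bands cost nothing
```

the fully masked bands (the last two, and the third) get zero bits: that data is simply never written, which is exactly why the process can't be reversed.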

why does it matter?

the primary benefit is sound quality. lossless audio provides more detail, better dynamic range, and a wider soundstage. it's also essential for archiving and professional audio work. if you ever need to convert your music to another format, starting from a lossless source ensures the best possible results.

honest answer: for casual listening on spotify through airpods, you generally cannot hear the difference. studies show ABX tests at 320 kbps are basically coin flips for most people. the meaningful difference is archival and editing, not perceptual quality on a good encode.

Audio demo: the same clip encoded as MP3 and FLAC.

Note: It’s really hard to tell the difference for most people.

how lossless audio is compressed

FLAC and similar codecs use linear prediction plus entropy coding to shrink the file without losing a single sample. below is how it works.

linear prediction in depth

a "sample" is one number in the sequence. at CD quality, 44,100 numbers per second — each is air pressure at that moment. a 3-minute song is ~8 million integers: x[0],x[1],x[2],,x[n]x[0], x[1], x[2], \ldots, x[n].

why is audio predictable? sound waves are smooth. if the last four samples were 100, 105, 110, 115, the next is probably ~120. the predictor finds coefficients \(a_1, a_2, \ldots, a_p\) so that the next sample is best predicted by: \[\hat{x}[n] = a_1 x[n-1] + a_2 x[n-2] + \cdots + a_p x[n-p]\]

concrete example. say \(p = 3\) and FLAC found these coefficients: \(a_1 = 1.5\), \(a_2 = -0.7\), \(a_3 = 0.2\), and the last three samples were \(x[n-1] = 100\), \(x[n-2] = 90\), \(x[n-3] = 80\). the prediction is: \[\hat{x}[n] = 1.5(100) + (-0.7)(90) + 0.2(80) = 150 - 63 + 16 = 103\] if the actual sample \(x[n] = 105\), the residual is: \[e[n] = 105 - 103 = 2\] so instead of storing 105 (needs ~8 bits), you store 2 (needs ~2 bits). that's the compression.

// Same example: p=3, coefficients a1=1.5, a2=-0.7, a3=0.2
const a = [1.5, -0.7, 0.2];
const prev = [100, 90, 80];  // x[n-1], x[n-2], x[n-3]

let pred = 0;
for (let k = 0; k < a.length; k++) pred += a[k] * prev[k];
// pred = 1.5*100 + (-0.7)*90 + 0.2*80 = 103

const xActual = 105;
const residual = xActual - pred;  // e[n] = 105 - 103 = 2
// Store residual (small) instead of 105 (large) → compression.

how does FLAC find the coefficients?

FLAC uses the Levinson–Durbin algorithm, which solves a system of equations called the Yule–Walker equations. it finds the \( a \) values that minimize the average squared residual:

\[\min_{a_1 \ldots a_p} \sum_n e[n]^2 = \min \sum_n \left( x[n] - \sum_{k=1}^p a_k x[n-k] \right)^2\]

this is just least squares regression but for time series. FLAC stores the winning coefficients in the subframe header so the decoder can reconstruct.
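the solve itself fits in a few lines. this is a toy sketch, not libFLAC's actual code — the real encoder windows the samples and quantizes the coefficients to integers, none of which is shown here:

```javascript
// Toy Levinson–Durbin solve of the Yule–Walker equations (illustrative).
function autocorrelation(x, maxLag) {
  const r = [];
  for (let lag = 0; lag <= maxLag; lag++) {
    let sum = 0;
    for (let n = lag; n < x.length; n++) sum += x[n] * x[n - lag];
    r.push(sum);
  }
  return r;
}

function levinsonDurbin(r, p) {
  let a = [];        // predictor coefficients for the current order
  let err = r[0];    // prediction-error energy, shrinks each iteration
  for (let i = 1; i <= p; i++) {
    let acc = r[i];
    for (let j = 1; j < i; j++) acc -= a[j - 1] * r[i - j];
    const k = acc / err;                 // reflection coefficient
    const next = new Array(i);
    next[i - 1] = k;
    for (let j = 0; j < i - 1; j++) next[j] = a[j] - k * a[i - 2 - j];
    a = next;
    err *= 1 - k * k;
  }
  return a;  // xhat[n] = a[0]x[n-1] + a[1]x[n-2] + ... + a[p-1]x[n-p]
}

// a pure sine obeys x[n] = 2cos(w)x[n-1] - x[n-2], so order 2 nails it
const x = [];
for (let n = 0; n < 1024; n++) x.push(Math.sin(0.1 * n));
console.log(levinsonDurbin(autocorrelation(x, 2), 2));  // ≈ [1.99, -1]
```

for a pure tone the order-2 predictor is nearly perfect — the residuals collapse to almost nothing, which is the whole point of the solve.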

order \(p\) — how many previous samples?

higher order = better prediction = smaller residuals = better compression, but you have to store more coefficients. FLAC tries multiple orders and picks the best tradeoff. typically \(p = 8\) to \(p = 12\) for music. order 1 (just the previous sample) works okay for slowly varying signals. order 8+ captures more complex wave patterns like harmonics.
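FLAC also ships four hard-coded "fixed" predictors (polynomial fits, orders 0–4) that it can try very cheaply before bothering with a full LPC solve. here's a sketch of that order search — using total |residual| as a stand-in for the real bit-cost estimate, which is an assumption for brevity:

```javascript
// FLAC's fixed (polynomial) predictors; the encoder tries each and keeps
// the cheapest. Total |residual| stands in for the real bit-cost estimate.
const FIXED = [
  [],              // order 0: predict 0 (store samples as-is)
  [1],             // order 1: x[n-1]
  [2, -1],         // order 2: 2x[n-1] - x[n-2]
  [3, -3, 1],      // order 3
  [4, -6, 4, -1],  // order 4
];

function residualCost(x, coeffs) {
  const p = coeffs.length;
  let cost = 0;
  for (let n = p; n < x.length; n++) {
    let pred = 0;
    for (let k = 0; k < p; k++) pred += coeffs[k] * x[n - 1 - k];
    cost += Math.abs(x[n] - pred);
  }
  return cost;
}

// a smooth, slowly varying signal: higher orders should beat orders 0 and 1
const x = [];
for (let n = 0; n < 64; n++) x.push(Math.round(100 * Math.sin(0.05 * n)));

const costs = FIXED.map(c => residualCost(x, c));
console.log(costs, "best order:", costs.indexOf(Math.min(...costs)));
```

note that the best order isn't always the highest: each extra order also differences the rounding noise, so past a point the residuals get noisier, not smaller.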

the key point:

the decoder has the same coefficients and the residuals. it just runs \(x[n] = \hat{x}[n] + e[n]\). since nothing was ever approximated or thrown away — residuals are stored exactly — you get back the original perfectly every time.
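here's the whole encode → decode loop in miniature. it's a sketch, not FLAC's real bitstream: the prediction is rounded to an integer the same way on both sides, so the stored residual stays an exact integer and reconstruction is bit-for-bit:

```javascript
// Miniature lossless round-trip (illustrative — not FLAC's real format).
const a = [1.5, -0.7, 0.2];  // the order-3 example predictor from above
const p = a.length;

function predict(samples, n) {
  let s = 0;
  for (let k = 0; k < p; k++) s += a[k] * samples[n - 1 - k];
  return Math.round(s);      // deterministic: encoder and decoder agree
}

function encode(x) {
  const warmup = x.slice(0, p);   // first p samples stored verbatim
  const residuals = [];
  for (let n = p; n < x.length; n++) residuals.push(x[n] - predict(x, n));
  return { warmup, residuals };
}

function decode({ warmup, residuals }) {
  const x = warmup.slice();
  for (const e of residuals) x.push(predict(x, x.length) + e);
  return x;
}

const original = [80, 90, 100, 105, 112, 118, 121, 119, 114];
const { warmup, residuals } = encode(original);
console.log(residuals);                     // small numbers, cheap to store
console.log(decode({ warmup, residuals })); // the original, exactly
```

the first \(p\) "warmup" samples are stored verbatim because there's nothing before them to predict from — real FLAC subframes do the same.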

what is n?

n is just the index — the position of the current sample in the sequence. so if you have \(x[0], x[1], x[2], x[3], x[4], \ldots\) and you're predicting sample number 50, then \(n = 50\): \(x[n] = x[50]\), \(x[n-1] = x[49]\), \(x[n-2] = x[48]\). the formula works for any \(n\) — you slide it across the whole audio sequence, predicting each sample from the ones before it.

why store the error?

the residual \(e[n]\) is the predictor's mistake. if it guessed 103 and the real sample was 105, \(e[n] = 2\). we store the error because it's almost always a small number — small numbers need fewer bits. raw sample 105 could be anything in \(\pm 32768\) (16 bits); residual 2 needs ~2–3 bits. audio is smooth so the predictor is usually close; the error is the small unpredictable part. we never round \(e[n]\) — we store it exactly. that's what makes it lossless.

so instead of writing 16 bits for every sample, you write: the predictor coefficients once per frame (small, fixed cost), and 2–3 bits per residual instead of 16 bits per sample. across millions of samples that difference is massive.
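rough numbers for a single frame, just to see the scale — every value here is illustrative (real music usually needs more than 3 bits per residual, so real-world ratios are closer to 50–60%):

```javascript
// back-of-the-envelope per-frame arithmetic (all numbers illustrative)
const blockSize = 4096;                 // a typical FLAC block size
const rawBits = blockSize * 16;         // raw 16-bit PCM
const coeffBits = 12 * 15;              // say: 12 coefficients, 15 bits each
const residualBits = blockSize * 3;     // ~3 bits per residual
const compressedBits = coeffBits + residualBits;
console.log(rawBits, compressedBits);   // 65536 vs 12468
```

the coefficient header is a rounding error next to the residuals — which is why spending a few hundred bits on a better predictor is almost always worth it.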

why are residuals almost always small? because audio is smooth — the predictor is pretty good, so the error is rarely large. the distribution looks like: \(e = 0\) → very common; \(e = \pm 1, \pm 2\) → common; \(e = \pm 100\) → rare. Rice coding exploits this: small numbers get short codes, large numbers get long codes. since large residuals are rare, the average bits per sample stays low.

the lossless guarantee: if you stored \(e[n] = 5\) instead of \(e[n] = 2\), you'd get a different \(x[n]\) back. so FLAC stores the exact integer residual every time — no rounding.

how does the error give back the audio?

it doesn't — on its own the error is meaningless. the decoder needs both: \(x[n] = \hat{x}[n] + e[n]\). it has the coefficients, runs the same predictor to get \(\hat{x}[n]\), then adds the stored residual. prediction + error = original sample exactly. example: predictor 103, residual 2 → 103 + 2 = 105 ✓. think of it like directions: "start at the coffee shop (prediction) and walk 2 steps east (residual)." together they get you exactly there.

// Decoder: prediction + residual → original sample
const pred = 103;   // from same coefficients + previous samples
const residual = 2; // stored in the bitstream
const xReconstructed = pred + residual;  // 103 + 2 = 105 ✓

rice coding

goal: small residuals are common, large ones rare. give small numbers short codes, large numbers long codes — like Morse code (E is one dot).

the parameter \(k\): with \(k = 2\) you split at the \(2^2 = 4\) boundary. quotient \(q = \lfloor |e|/4 \rfloor\) (how many 4s fit), remainder \(r = |e| \bmod 4\) (the leftover). store \(q\) in unary (\(q\) ones then a zero), then \(r\) in \(k\) bits.

concrete example: \(e = 6\), \(k = 2\). then \(q = \lfloor 6/4 \rfloor = 1\), \(r = 6 \bmod 4 = 2\). store \(q\) in unary: 1 one followed by a zero → 10. store \(r\) in binary with \(k = 2\) bits → 10. full code: 10 10 = 4 bits, versus the 16 bits a raw sample would take. already saving bits.

why unary for \(q\)? unary means \(q\) ones then a zero: \(q = 0\) → 0 (1 bit); \(q = 1\) → 10 (2 bits); \(q = 2\) → 110 (3 bits). small \(q\) (small residual) = short code.

e.g. \(e = 0\) is the most common residual (predictor nailed it) → it gets the shortest possible code: with \(k = 2\), just 3 bits (unary 0, then remainder 00).

e | normal binary | Rice code (k = 2)
0 | 0000 | 000 (3 bits)
1 | 0001 | 001 (3 bits)
2 | 0010 | 010 (3 bits)
4 | 0100 | 1000 (4 bits)
6 | 0110 | 1010 (4 bits)

FLAC tries multiple values of \(k\) and picks whichever gives the smallest total size for that block of residuals. it stores the chosen \(k\) in the subframe so the decoder knows how to decode.
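the whole scheme fits in a few lines. a sketch, not FLAC's bitstream format — real FLAC also zigzag-maps signed residuals to nonnegative integers first and splits each block into partitions with their own \(k\); residuals here are kept nonnegative for simplicity:

```javascript
// Rice coding sketch: unary quotient + k-bit remainder, plus a brute-force
// search for the k that minimizes total bits for a block of residuals.
function riceEncode(e, k) {
  const q = Math.floor(e / (1 << k));   // quotient: how many 2^k fit
  const r = e % (1 << k);               // remainder: the leftover
  const unary = "1".repeat(q) + "0";    // q ones, then a terminating zero
  const binary = k > 0 ? r.toString(2).padStart(k, "0") : "";
  return unary + binary;
}

function riceDecode(bits, k) {
  const q = bits.indexOf("0");          // leading ones = quotient
  const r = k > 0 ? parseInt(bits.slice(q + 1, q + 1 + k), 2) : 0;
  return q * (1 << k) + r;
}

function bestK(residuals, maxK = 14) {
  let best = { k: 0, bits: Infinity };
  for (let k = 0; k <= maxK; k++) {
    const bits = residuals.reduce((s, e) => s + riceEncode(e, k).length, 0);
    if (bits < best.bits) best = { k, bits };
  }
  return best;
}

const block = [3, 6, 2, 7, 4, 1, 5, 6, 3, 7, 2, 4];  // mostly-small residuals
console.log(riceEncode(6, 2));  // "1010" — matches the worked example
console.log(bestK(block));      // the cheapest k for this block
```

decode is the exact mirror: count ones until the first zero, read \(k\) more bits, recombine — so the chosen \(k\) is all the decoder needs.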
