WAV to MP3 Conversion Technology

Understanding Audio Formats

WAV Format

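  • Container format (RIFF) that usually holds uncompressed audio
  • Most commonly stores PCM (Pulse Code Modulation) samples
  • Lossless, but large: CD-quality stereo (44.1kHz, 16-bit) takes about 10.6MB per minute
  • Sample rates typically 44.1kHz or 48kHz
  • Bit depth typically 16-bit or 24-bit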
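File sizes follow directly from the three parameters above. Here is a minimal Python sketch of the arithmetic. The function names are illustrative, and the figures cover sample data only, not the WAV header:

```python
def pcm_bitrate(sample_rate_hz: int, bits_per_sample: int, channels: int) -> int:
    """Bits per second of uncompressed PCM audio."""
    return sample_rate_hz * bits_per_sample * channels


def pcm_bytes(seconds: float, sample_rate_hz: int, bits_per_sample: int, channels: int) -> int:
    """Size in bytes of the PCM sample data (excluding the WAV header) for a given duration."""
    return int(seconds * pcm_bitrate(sample_rate_hz, bits_per_sample, channels) // 8)


cd = pcm_bitrate(44100, 16, 2)        # CD-quality stereo: 1,411,200 bits/s
minute = pcm_bytes(60, 44100, 16, 2)  # one minute of it: 10,584,000 bytes
print(f"{cd / 1000} kbps, {minute / 1_000_000:.1f} MB per minute")
```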

MP3 Format

  • Lossy compressed audio format
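  • MPEG-1 Audio Layer III standard, extended to lower sample rates by MPEG-2
  • Much smaller files with good perceived quality: at 128kbps, roughly one-eleventh the size of CD-quality WAV (1411.2kbps)
  • Uses perceptual coding to discard detail the ear is unlikely to notice
  • Bitrates from 32kbps to 320kbps (MPEG-1); the lower-sample-rate MPEG-2 modes use 8kbps to 160kbps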
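An MPEG-1 Layer III header can only signal a fixed set of bitrates, so any requested rate has to map to one of them. The sketch below uses hypothetical helper names. It snaps a request to the nearest legal value and estimates constant-bitrate file size against CD-quality PCM, ignoring metadata tags:

```python
# Bitrates (kbps) an MPEG-1 Layer III frame header can signal; the free-format
# and invalid indexes are left out.
MPEG1_L3_BITRATES = (32, 40, 48, 56, 64, 80, 96, 112, 128, 160, 192, 224, 256, 320)

# 16-bit stereo PCM at 44.1kHz, in kbps (1411.2).
CD_KBPS = 44100 * 16 * 2 / 1000


def nearest_bitrate(kbps: float) -> int:
    """Snap a requested bitrate to the closest legal MPEG-1 Layer III value."""
    return min(MPEG1_L3_BITRATES, key=lambda b: abs(b - kbps))


def mp3_bytes(seconds: float, kbps: int) -> int:
    """Approximate CBR MP3 size in bytes, ignoring ID3 or other metadata tags."""
    return int(seconds * kbps * 1000 // 8)


for kbps in (128, 192, 320):
    # bitrate, bytes per minute, compression ratio versus CD-quality WAV
    print(kbps, mp3_bytes(60, kbps), round(CD_KBPS / kbps, 1))
```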

The Conversion Process

1. Audio Decoding

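The WAV file's header is parsed first. The fmt chunk gives the sample rate, bit depth, and number of channels, and the data chunk holds the audio itself. Because PCM is already uncompressed, "decoding" here means reading and deinterleaving those samples rather than decompressing them.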
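Here is a minimal sketch of this step in Python. It assumes a little-endian RIFF/WAVE file containing 16-bit integer PCM. `read_wav_format` and `read_pcm16` are hypothetical names, and WAVE_FORMAT_EXTENSIBLE, 24-bit, and float files are not handled. The parser walks the chunk list instead of assuming a fixed 44-byte header, because many files carry extra chunks (such as LIST) before the audio data:

```python
import struct


def read_wav_format(data: bytes) -> dict:
    """Walk the RIFF chunks of a WAV file; return its fmt fields plus where the samples live."""
    riff, _riff_size, wave_id = struct.unpack_from("<4sI4s", data, 0)
    if riff != b"RIFF" or wave_id != b"WAVE":
        raise ValueError("not a RIFF/WAVE file")
    info = None
    offset = 12
    while offset + 8 <= len(data):
        chunk_id, chunk_size = struct.unpack_from("<4sI", data, offset)
        body = offset + 8
        if chunk_id == b"fmt ":
            fmt_tag, channels, rate, byte_rate, block_align, bits = struct.unpack_from("<HHIIHH", data, body)
            info = {"audio_format": fmt_tag, "channels": channels, "sample_rate": rate,
                    "byte_rate": byte_rate, "block_align": block_align, "bits_per_sample": bits}
        elif chunk_id == b"data":
            if info is None:
                raise ValueError("data chunk appears before fmt chunk")
            info.update(data_offset=body, data_bytes=chunk_size,
                        frames=chunk_size // info["block_align"])
            return info
        offset = body + chunk_size + (chunk_size & 1)  # odd-sized chunks carry a pad byte
    raise ValueError("no data chunk found")


def read_pcm16(data: bytes, info: dict) -> list:
    """Deinterleave 16-bit integer PCM into one list of samples per channel."""
    if info["audio_format"] != 1 or info["bits_per_sample"] != 16:
        raise ValueError("this sketch only handles 16-bit integer PCM (format tag 1)")
    start = info["data_offset"]
    raw = data[start:start + info["data_bytes"]]
    samples = [s for (s,) in struct.iter_unpack("<h", raw)]
    return [samples[c::info["channels"]] for c in range(info["channels"])]
```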

2. Psychoacoustic Analysis

The encoder performs a Fast Fourier Transform (FFT) to analyze the frequency content of each frame. From this it estimates a masking threshold: the level below which added quantization noise should go unnoticed. The threshold accounts for:

  • Frequency masking (louder sounds mask quieter ones at similar frequencies)
  • Temporal masking (a loud sound masks quieter sounds that occur shortly before or after it)
  • Absolute threshold of hearing (sounds too quiet to be heard even in silence; the ear is most sensitive around 3 to 4kHz)
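The sketch below illustrates only the last criterion. It assumes Terhardt's widely used approximation of the absolute threshold and the common modelling convention that a full-scale sine corresponds to 96 dB SPL. A naive DFT stands in for the FFT, and the function names are hypothetical. LAME's actual model also computes frequency and temporal masking, which are not shown:

```python
import cmath
import math


def ath_db_spl(freq_hz: float) -> float:
    """Terhardt's approximation of the absolute threshold of hearing in dB SPL (freq_hz > 0)."""
    f = freq_hz / 1000.0
    return 3.64 * f ** -0.8 - 6.5 * math.exp(-0.6 * (f - 3.3) ** 2) + 1e-3 * f ** 4


def spectrum_db(frame, sample_rate, full_scale_db=96.0):
    """Hann-windowed DFT of one frame, as (frequency_hz, level_db) for bins 1 .. N/2 - 1.

    A naive O(N^2) DFT replaces the FFT (same result, slower). Levels assume that a
    full-scale sine reads full_scale_db dB SPL.
    """
    n = len(frame)
    window = [0.5 - 0.5 * math.cos(2 * math.pi * i / n) for i in range(n)]  # periodic Hann
    ref = n / 4  # DFT peak of a bin-centred, full-scale sine under this window
    bins = []
    for k in range(1, n // 2):
        mag = abs(sum(window[i] * frame[i] * cmath.exp(-2j * math.pi * k * i / n) for i in range(n)))
        level = full_scale_db + 20 * math.log10(mag / ref) if mag > 0 else -math.inf
        bins.append((k * sample_rate / n, level))
    return bins


def inaudible_bins(frame, sample_rate):
    """Frequencies whose level falls below the absolute threshold (masking by other sounds ignored)."""
    return [f for f, level in spectrum_db(frame, sample_rate) if level < ath_db_spl(f)]
```

A 1kHz tone is flagged as inaudible only when its level drops below roughly 3 dB SPL, the threshold the formula gives at that frequency.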

3. MDCT Transformation

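The audio is divided into frames and passed through a hybrid filterbank: a 32-band polyphase filterbank followed by a Modified Discrete Cosine Transform (MDCT) within each subband. This converts the time-domain samples into 576 frequency-domain coefficients per granule (an MPEG-1 frame has two granules). Around sharp transients, the encoder switches from long to short MDCT blocks to limit pre-echo.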
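For illustration, here is a plain windowed MDCT with 50% overlap and a sine window, evaluated directly in O(N^2) for readability. It is not LAME's hybrid filterbank, which applies 18- or 6-point MDCTs to the 32 polyphase subbands. It does show the property the encoder relies on: N new samples produce N coefficients, and overlap-adding the inverse transforms cancels the time-domain aliasing, so the signal is reconstructed exactly. The function names are hypothetical:

```python
import math


def mdct(block):
    """Forward MDCT: 2N windowed samples in, N coefficients out."""
    n = len(block) // 2
    return [
        sum(block[i] * math.cos(math.pi / n * (i + 0.5 + n / 2) * (k + 0.5)) for i in range(2 * n))
        for k in range(n)
    ]


def imdct(coeffs):
    """Inverse MDCT: N coefficients in, 2N time-aliased samples out (2/N scaling pairs with the sine window)."""
    n = len(coeffs)
    return [
        2.0 / n * sum(coeffs[k] * math.cos(math.pi / n * (i + 0.5 + n / 2) * (k + 0.5)) for k in range(n))
        for i in range(2 * n)
    ]


def sine_window(length):
    """Satisfies the Princen-Bradley condition w[i]^2 + w[i + N]^2 = 1."""
    return [math.sin(math.pi * (i + 0.5) / length) for i in range(length)]


def analyze(signal, n):
    """Split a signal (length a multiple of n) into 50%-overlapping 2n-sample blocks and MDCT each."""
    padded = [0.0] * n + list(signal) + [0.0] * n
    w = sine_window(2 * n)
    return [
        mdct([w[i] * padded[start + i] for i in range(2 * n)])
        for start in range(0, len(padded) - n, n)
    ]


def synthesize(frames, n, length):
    """IMDCT each block, window again and overlap-add; time-domain aliasing cancels (TDAC)."""
    w = sine_window(2 * n)
    out = [0.0] * (n * (len(frames) + 1))
    for f, coeffs in enumerate(frames):
        for i, y in enumerate(imdct(coeffs)):
            out[f * n + i] += w[i] * y
    return out[n:n + length]
```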

4. Quantization

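Based on the psychoacoustic model, the encoder allocates bits to different frequency bands. More bits are given to perceptually important frequencies, while less important frequencies are coarsely quantized or discarded. MP3 does this with a nonuniform (power-law) quantizer. A global gain sets the overall step size, and per-band scalefactors shape the quantization noise so that it stays below the masking threshold wherever the bit budget allows.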
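Here is a minimal sketch of the power-law quantizer and an inner rate loop, assuming a single global step size. The 0.4054 rounding offset is assumed here to match LAME's. `estimate_bits` is a hypothetical stand-in for real Huffman bit counting. The per-band scalefactor (distortion) loop that checks noise against the masking threshold is omitted:

```python
import math


def quantize(xr, global_step):
    """Power-law quantizer: scale by 2^(-step/4), raise to 3/4, round with a 0.4054 offset."""
    scale = 2.0 ** (-global_step / 4)
    out = []
    for x in xr:
        q = int((abs(x) * scale) ** 0.75 + 0.4054)
        out.append(q if x >= 0 else -q)
    return out


def dequantize(q, global_step):
    """Decoder-side reconstruction: |q|^(4/3) * 2^(step/4), sign restored."""
    gain = 2.0 ** (global_step / 4)
    return [math.copysign(abs(v) ** (4 / 3) * gain, v) for v in q]


def estimate_bits(q):
    """Hypothetical cost proxy: magnitude bits plus a sign bit per nonzero value."""
    return sum(abs(v).bit_length() + (1 if v else 0) for v in q)


def rate_loop(xr, bit_budget):
    """Inner loop: coarsen the global step until the quantized values fit the bit budget."""
    if bit_budget < 0:
        raise ValueError("bit budget must be non-negative")
    step = 0
    while True:
        q = quantize(xr, step)
        if estimate_bits(q) <= bit_budget:  # a large enough step zeroes everything, so this ends
            return step, q
        step += 1
```

The 3/4 exponent spends proportionally more precision on small values than on large ones, so quantization noise tends to grow with signal level.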

5. Huffman Coding

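The quantized coefficients are compressed using Huffman coding, which assigns shorter codes to more frequent values. MP3 does not build a new code for each file. Instead, the standard defines a set of fixed Huffman tables, and the encoder picks whichever table yields the fewest bits for each region of the spectrum.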
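To show the principle, the sketch below builds a Huffman code from symbol frequencies using Python's heapq. It is an illustration only: a real MP3 encoder does not build codes at run time. It selects among the standard's fixed tables and codes values in pairs and quadruples rather than one at a time. The function names are hypothetical:

```python
import heapq
from collections import Counter
from itertools import count


def huffman_code(symbols):
    """Build a prefix code from symbol frequencies; frequent symbols get shorter codewords."""
    freq = Counter(symbols)
    if len(freq) == 1:  # degenerate case: a single distinct symbol still needs a 1-bit code
        return {next(iter(freq)): "0"}
    order = count()  # tie-breaker so heapq never has to compare the dicts
    heap = [(f, next(order), {s: ""}) for s, f in freq.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        f1, _, low = heapq.heappop(heap)   # two least frequent subtrees
        f2, _, high = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in low.items()}
        merged.update({s: "1" + c for s, c in high.items()})
        heapq.heappush(heap, (f1 + f2, next(order), merged))
    return heap[0][2]


def encode(symbols, code):
    """Concatenate the codewords for a sequence of symbols into a bit string."""
    return "".join(code[s] for s in symbols)


coeffs = [0] * 50 + [1] * 20 + [-1] * 20 + [2] * 5 + [-2] * 4 + [5]
code = huffman_code(coeffs)
print(code, len(encode(coeffs, code)))
```

For this sample data, a fixed 3-bit code for the six distinct values would need 300 bits. The Huffman code needs 195, because the dominant zero gets a 1-bit codeword.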

6. Frame Packing

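The processed data is packed into MP3 frames. Each frame starts with a 4-byte header that carries a sync word, the bitrate, the sampling rate, and the channel mode. An MPEG-1 frame holds 1152 samples, about 26ms of audio at 44.1kHz; MPEG-2 frames at lower sample rates hold 576.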
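The 4-byte frame header can be decoded with a few bit shifts. This sketch handles MPEG-1 Layer III only and uses hypothetical function and table names. It also computes the frame length in bytes; the constant 144 is 1152 samples divided by 8 bits per byte:

```python
import struct

# MPEG-1 Layer III tables (bitrate index 0 = free format, 15 = invalid).
BITRATES_KBPS = (None, 32, 40, 48, 56, 64, 80, 96, 112, 128, 160, 192, 224, 256, 320, None)
SAMPLE_RATES_HZ = (44100, 48000, 32000, None)
CHANNEL_MODES = ("Stereo", "Joint Stereo", "Dual Channel", "Mono")
SAMPLES_PER_FRAME = 1152


def parse_frame_header(header: bytes) -> dict:
    (word,) = struct.unpack(">I", header[:4])  # frame headers are big-endian
    if word >> 21 != 0x7FF:
        raise ValueError("missing 11-bit frame sync")
    version = (word >> 19) & 0b11
    layer = (word >> 17) & 0b11
    if version != 0b11 or layer != 0b01:
        raise ValueError("this sketch only handles MPEG-1 Layer III")
    bitrate = BITRATES_KBPS[(word >> 12) & 0xF]
    sample_rate = SAMPLE_RATES_HZ[(word >> 10) & 0b11]
    if bitrate is None or sample_rate is None:
        raise ValueError("free-format or reserved bitrate/sample-rate index")
    padding = (word >> 9) & 1
    return {
        "bitrate_kbps": bitrate,
        "sample_rate_hz": sample_rate,
        "channel_mode": CHANNEL_MODES[(word >> 6) & 0b11],
        "crc_protected": ((word >> 16) & 1) == 0,  # protection bit 0 means a CRC follows
        # 144 = 1152 samples / 8 bits per byte; the padding bit adds one byte to some frames.
        "frame_bytes": 144 * bitrate * 1000 // sample_rate + padding,
        "duration_ms": SAMPLES_PER_FRAME * 1000 / sample_rate,
    }
```

The bytes FF FB 90 00 decode as 128kbps, 44.1kHz stereo, 417 bytes per frame. With the padding bit set, the frame is 418 bytes. A CBR stream mixes the two sizes so that the average rate stays at 128kbps.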

Our Implementation

This website uses the LAME encoder (via lamejs) running entirely in your browser. Key features:

  • Client-side processing ensures your audio never leaves your device
  • Supports variable bitrate (VBR) and constant bitrate (CBR) encoding
  • Implements fast psychoacoustic models optimized for web performance
  • Uses Web Workers to prevent UI freezing during conversion

Technical Specifications

  • Encoder version: LAME 3.100 (JavaScript port)
  • Supported sample rates: 8kHz, 11.025kHz, 12kHz (MPEG-2.5); 16kHz, 22.05kHz, 24kHz (MPEG-2); 32kHz, 44.1kHz, 48kHz (MPEG-1)
  • Supported bitrates: 32kbps to 320kbps (at 24kHz and below, MP3 caps the bitrate at 160kbps)
  • Channel modes: Mono, Stereo, Joint Stereo