The crappy-old-mp3 standard has now been around for nearly a quarter-century. Most kinds of audio players and web browsers have supported better, legally free formats for a while now, but as usual Microsoft and (most prominently) Apple are stuck supporting only formats that you have to pay a metaphorical “poll tax” for permission to use.
.mp3 is one of them, of course. By modern standards, mp3 is pretty poor. It’s high-latency, so it’s not suitable for interactive or live uses (e.g. VoIP), and the quality is lacking at all but the highest bitrates (so you either have low-quality audio or huge files to transmit and store). It’s also weighed down by a bunch of patents, of course, so you can’t even legally make or use .mp3 files without somebody paying protection money to some lawyers for permission…
…or can you?
Personally, I’d rather just never touch the stuff (patents or not, the audio format is lacking, and I really don’t like the fussy, limited little “id3” standards for its metadata, either), but I grudgingly concur that it would be useful to have a basic “fallback” format that Microsoft/Apple/fifteen-year-old-media-player owners could use until they realize they can upgrade to something better by using a different browser/media-playing app.
It occurs to me that the original specification for .mp3 was published in 1993 or so, more than 20 years ago, and patents aren’t supposed to last more than 20 years. So it seems like a reasonable assumption that at least some, if not all, of the still-threatening patents (the last of which doesn’t expire until 2017!) cover optional “optimizations” or techniques that don’t necessarily have to be applied to generate a valid .mp3 file, one that ancient media players (or new media players from ancient companies…) can at least play back, even if the files are not “optimal”.
For example (DISCLAIMER: IANAL), this Big List of MP3 Patents shows that there are 9 patents left (as of 2014) keeping mp3 locked up. However, at least four of those are specifically patents on ways of encoding two or more channels of audio (i.e. stereo, surround-sound, etc.), so a single-channel (mono) audio stream encoded to .mp3 shouldn’t trespass on those four, which is nearly half of the list. One more appears to be specific to techniques for encoding “low sampling rate” audio (i.e. NOT the usual 44.1kHz or 48kHz), so a typical 44.1kHz or 48kHz audio source encoded to .mp3 at that rate wouldn’t trespass, either.
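To make the two conditions above concrete, here is a minimal sketch that inspects a PCM WAV source before encoding. It uses Python's standard `wave` module; the function names and the set of "usual" rates are my own labels for illustration, not anything from the patent list, and it says nothing about the legal question itself.

```python
import wave

# Assumption: "usual" means exactly the two rates named in the post.
USUAL_RATES_HZ = {44100, 48000}


def describe_wav(path):
    """Return (channels, sample_rate_hz) for a PCM WAV file."""
    with wave.open(path, "rb") as wav:
        return wav.getnchannels(), wav.getframerate()


def needs_downmix(channels):
    """True if the source has more than one channel and must be mixed to mono."""
    return channels > 1


def needs_resample(sample_rate_hz):
    """True if the source is not already at one of the usual rates."""
    return sample_rate_hz not in USUAL_RATES_HZ
```

A stereo file recorded at 22.05kHz, for instance, would be flagged by both checks, while a mono 44.1kHz file would pass both.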
That leaves 4 remaining US patents to be tiptoed around to generate legal .mp3 files:
- 5579430: Digital encoding process (expires November 26, 2016)
- 5742735: Digital adaptive transformation coding method (expires April 21, 2015)
- 5924060: Digital coding process for transmission or storage of acoustical signals by transforming of scanning values into spectral coefficients (expires August 29, 2017)
- 6009399: Method and apparatus for encoding digital signals employing bit allocation using combinations of different threshold models to achieve desired bit rates (expires April 16, 2017)
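As a quick sanity check on the dates in that list, here is a small sketch that reports which of the four are still in force on a given day. The expiry dates are copied from the list; the short labels are my own abbreviations of the titles, and counting the expiry date itself as "still in force" is an assumption on my part (IANAL).

```python
from datetime import date

# Expiry dates as given in the list; labels are abbreviated titles.
REMAINING = {
    "Digital encoding process": date(2016, 11, 26),
    "Digital adaptive transformation coding method": date(2015, 4, 21),
    "Digital coding process (spectral coefficients)": date(2017, 8, 29),
    "Bit allocation with threshold models": date(2017, 4, 16),
}


def still_in_force(on):
    """Labels of patents whose listed expiry date is on or after `on`."""
    return sorted(label for label, expiry in REMAINING.items() if expiry >= on)


def last_expiry():
    """The latest expiry date in the list."""
    return max(REMAINING.values())
```

Calling `still_in_force(date(2016, 1, 1))` drops only the adaptive transformation patent, and `last_expiry()` matches the 2017 date mentioned earlier.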
Trying to read those things makes my head hurt, but it kind of looks to my definitely-non-expert eye like 5579430 has something to do with “average bit rate” encoding (not “constant bit rate”), so possibly a true “constant bitrate” encoding wouldn’t infringe?
5742735 looks kind of like it’s both a specific way of designing an encoder (with a “controller” and multiple “multi-signal processors”), AND possibly involves application of particular and/or multiple “psychoacoustic models” (as I understand it, more or less an algorithm for deciding when some part of the sound can’t be noticed anyway and can be degraded or thrown out entirely, to save extra room for detail in more important parts of the sound). I’m at a loss as to whether an encoder like LAME actually falls under this patent at all. Anybody know? Even if LAME’s architecture might trespass on the special “multi-signal processor” stuff, might it be possible that running it at “-q 9” (“disables almost all algorithms including psy-model.”) would avoid this patent?
5924060 is also a bit beyond me, but I notice claim 7 specifically calls for application of a psychoacoustic model, so perhaps “lame -q 9” might avoid it entirely? Beyond that, I can’t tell how avoidable the techniques described are.
6009399 seems to specifically claim using multiple psychoacoustic models at the same time (and then deciding which one works best for a particular sample and using that one, if I’m interpreting that correctly). It seems like not applying a psychoacoustic model at all would bypass this one, and for that matter so would using only one model for the entire encoding process.
So then…am I correct in thinking that there’s a good chance that “lame -q 9 -m m --cbr”, given input audio with 44.1kHz or 48kHz sampling rate, would avoid the last remaining patent threats still hovering over mp3?
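For anyone wanting to script this, here is a minimal sketch that assembles that invocation as an argument list, writing the constant-bitrate flag as `--cbr` (two plain hyphens). The function name and the file names are hypothetical, and the sketch only builds the command; it doesn't check whether the result actually steers clear of any patent.

```python
import shlex


def lame_fallback_command(src, dst):
    """Build the LAME argument list for a mono, CBR, lowest-quality-setting encode."""
    return [
        "lame",
        "-q", "9",   # quality setting 9, as discussed above
        "-m", "m",   # mono output
        "--cbr",     # enforce constant bitrate
        src,
        dst,
    ]


def as_shell_line(argv):
    """Render an argument list as a single, safely quoted shell command."""
    return shlex.join(argv)
```

With LAME installed, the list could be handed to `subprocess.run(lame_fallback_command("input.wav", "fallback.mp3"), check=True)`; `as_shell_line` is only there for logging or copy-pasting.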
(And, yes, I’m aware that the result would sound even worse than usual mp3 files, but the point is merely to generate “usable” mp3 data as a fallback for old/recalcitrant audio players. I’ve got .opus, .ogg [vorbis], and .flac for high-quality audio.)
This content is published under the Attribution-Share Alike 3.0 Unported license.