Alex Wetmore is always busy with something…

Archive for the ‘Audio’ Category.

Bulk CD Ripping — Part Two: FLAC images to MP3 files

December 20, 2006, 1:25 am

Part one of this series showed how I ripped a large pile of CDs into FLAC image files. The FLAC image files are playable, but most of our hardware wants MP3 files and I’d prefer one file per track instead of one file per album. The FLAC image files also have crummy tags from CDDB so there will be a lot of typos in artist names and track titles.

A friend and I started tackling this problem a few years ago and wrote a pair of Python scripts called tag.py and transcode.py that converted FLAC images into MP3s. When we wrote these scripts, FLAC images were a fairly new concept, and we came up with our own way of tagging them that no one else used. Tag.py read an EAC-generated CUEsheet (including CDDB tags) and entered the tags into the FLAC image. We could (painfully) hand edit the tags in the FLAC image after this step. Transcode.py read the FLAC image, including the tags embedded by tag.py, and generated MP3s (or any other format that we wanted, including per-track FLAC files).

The process worked, but it was clumsy. We also still relied on CDDB tags for our music, followed by a lot of hand editing to fix them.

Since we wrote those scripts, a CDDB replacement called MusicBrainz has matured. MusicBrainz is a much stricter tag database where all entries are reviewed and there are tight relationships between artists, releases (albums) and tracks. This tight relationship means that each artist has only one name, so you won’t have problems with one CD being tagged “The Beatles” while another is tagged “Beatles” and a third is tagged “Beatles, The”.

We talked about our old scripts and came up with a better system:

tag.py would find the CD in the MusicBrainz database and get the MusicBrainz ID for it. This is a unique ID that identifies that album.
tag.py would embed the MusicBrainzID into the FLAC image.
transcode.py would read the MusicBrainzID, get the track metadata from MusicBrainz, and encode our MP3s.

Once you’ve run tag.py you can regenerate MP3s (or any other type of music file) just by rerunning transcode.py. This future proofs our music.
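For illustration only, here is a minimal Python sketch of how an ID could be written to and read back from a FLAC image with metaflac. This is not the actual tag.py code: the function names are made up for this example, and it assumes the ID is stored in the DISC_MUSICBRAINZ_ID tag and handled through metaflac's --remove-tag, --set-tag and --show-tag options.

```python
import subprocess

# Tag name taken from the description of tag.py in this post.
TAG_NAME = "DISC_MUSICBRAINZ_ID"


def build_set_id_command(metaflac, flac_path, release_id):
    """Return the metaflac argument list that replaces the ID tag in a FLAC file."""
    return [
        metaflac,
        f"--remove-tag={TAG_NAME}",            # drop any earlier value first
        f"--set-tag={TAG_NAME}={release_id}",  # then write the new one
        flac_path,
    ]


def parse_show_tag_output(output):
    """Extract the value from metaflac --show-tag output ('NAME=value' lines)."""
    for line in output.splitlines():
        name, sep, value = line.partition("=")
        if sep and name.upper() == TAG_NAME:
            return value.strip()
    return None


def read_id(metaflac, flac_path):
    """Ask metaflac for the embedded ID; returns None when the tag is absent."""
    result = subprocess.run(
        [metaflac, f"--show-tag={TAG_NAME}", flac_path],
        capture_output=True, text=True, check=True,
    )
    return parse_show_tag_output(result.stdout)
```

Because the ID is the only thing stored in the image, everything else (artist, titles, year) can be fetched again whenever the files are re-encoded.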

I rewrote tag.py to have a small GUI. When opening a FLAC file, tag.py first computes the DiscId (a mostly-unique identifier used to find the disc in the MusicBrainz database) and sends this to MusicBrainz. If the DiscId isn’t found then it searches based on the artist and title that we already have in the FLAC image from CDDB. The GUI lets you see each of these matches and pick the right one. If you don’t find the right one then you can search on your own, get the MusicBrainzID, and paste that into the GUI. Once you’ve found the right tags for this album you hit save and it writes the ID out to the FLAC.
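The lookup order described above can be sketched roughly as follows. This is a simplified illustration rather than the real tag.py logic: lookup_by_disc_id and search_by_name are hypothetical callables standing in for whatever MusicBrainz client code is used, and each is assumed to return a list of candidate releases.

```python
def find_release_candidates(disc_id, artist, title,
                            lookup_by_disc_id, search_by_name):
    """Return candidate releases, preferring a DiscId match.

    lookup_by_disc_id and search_by_name are hypothetical stand-ins for the
    real MusicBrainz queries; each returns a (possibly empty) list.
    """
    matches = lookup_by_disc_id(disc_id) if disc_id else []
    if matches:
        return matches
    # DiscId is unknown to MusicBrainz: fall back to the CDDB artist and title.
    return search_by_name(artist, title)
```

The caller still has to pick among the returned candidates, which is the job of the GUI.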

Note that my day job is writing server software, not GUIs, and it shows in the ugliness of this tool. Also, even though this has a few buttons along the bottom, it is really designed to be keyboard driven. A lot of output is written to the console that you started the tool from. Consider it a half-GUI/half command line tool.

You start the tagger by running tag.py and passing it some filename globs. For instance “tag.py *.flac” will have it work on all of the FLAC files in the current directory.
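The Windows command prompt does not expand wildcards itself, so a script that accepts globs has to expand them. A minimal sketch of that step, assuming Python's standard glob module (expand_globs is an illustrative name, not a function from tag.py):

```python
import glob


def expand_globs(patterns):
    """Expand each pattern into file names, keeping order and dropping duplicates."""
    seen = set()
    files = []
    for pattern in patterns:
        # Sort each pattern's matches so the file order is predictable.
        for path in sorted(glob.glob(pattern)):
            if path not in seen:
                seen.add(path)
                files.append(path)
    return files
```

A script would typically call this as expand_globs(sys.argv[1:]) and then step through the resulting list.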

The screen is divided into three sections. The top section shows you the important data from the FLAC image (filename, CDDB artist/title, DiscId, embedded MusicBrainz ID and number of tracks). The fields used for a MusicBrainz search will show up in red. The second section shows you the results for the current MusicBrainz query. You can enter a new MusicBrainz ID if you find something better on their search page. The last section has buttons that map to some of the key presses and a status bar that tells you what it is doing.

Here are the keys that matter:

Control-N and Control-P — Go to the next and previous FLAC file
Alt-N and Alt-P — Go to the next and previous hit from a MusicBrainz search
Alt-S — Save the current MusicBrainz ID to the current FLAC
Alt-L — Reload MusicBrainz matches (this searches by MusicBrainz ID, Disc ID, and CDDB artist and title)
Control-Shift-N — Find the first FLAC without a MusicBrainzID
Alt-Q — quit

With this tool I can find the right tags for our FLAC images from MusicBrainz at a rate of about 100 CDs every 20 minutes. This includes entering releases into MusicBrainz for CDs that we own, but which they don’t have in their database.

Once tag.py has been run on a bunch of FLAC files you just run transcode.py and walk away. This will transcode each of your FLAC images into separate MP3 files and put them in the right directory. The exact method for doing this is controlled by the file transcode.cfg. Here is my version of the file:

[GeneralConfig]
Flac: d:/util/bin/flac.exe
Metaflac: d:/util/bin/metaflac.exe
Encoders: mp3
[mp3]
Directory: f:/music-rerip/mp3/New/$P/$T
Filename: %D%n-$t.mp3
FilenameVA: %D%n-$t($p).mp3
Command: d:/util/bin/lame.exe --alt-preset standard --id3v2-only --tt %t --ta %P --tl %T --ty %Y --tn %n --tg Rock %f %F
CommandVA: d:/util/bin/lame.exe --alt-preset standard --id3v2-only --tt %t --ta %p --tl %T --ty %Y --tn %n --tg Rock %f %F

This tells transcode.py that we are going to use one encoder and that it will make MP3 files. The files will go into f:/music-rerip/mp3/New and then be sorted into directories by performer and release title. If the album is a compilation (coming from multiple artists) then the second command listed (CommandVA) is used to encode the tracks; otherwise the first one (Command) is. If we wanted two different qualities of MP3 (high for home use, crappy for portable device use) we could just make another config that looks like this:

[GeneralConfig]
Flac: d:/util/bin/flac.exe
Metaflac: d:/util/bin/metaflac.exe
Encoders: mp3,mp3crappy
[mp3]
Directory: f:/music-rerip/mp3/New/$P/$T
Filename: %D%n-$t.mp3
FilenameVA: %D%n-$t($p).mp3
Command: d:/util/bin/lame.exe --alt-preset standard --id3v2-only --tt %t --ta %P --tl %T --ty %Y --tn %n --tg Rock %f %F
CommandVA: d:/util/bin/lame.exe --alt-preset standard --id3v2-only --tt %t --ta %p --tl %T --ty %Y --tn %n --tg Rock %f %F
[mp3crappy]
Directory: f:/music-rerip/mp3crappy/New/$P/$T
Filename: %D%n-$t.mp3
FilenameVA: %D%n-$t($p).mp3
Command: d:/util/bin/lame.exe --alt-preset 96 --id3v2-only --tt %t --ta %P --tl %T --ty %Y --tn %n --tg Rock %f %F
CommandVA: d:/util/bin/lame.exe --alt-preset 96 --id3v2-only --tt %t --ta %p --tl %T --ty %Y --tn %n --tg Rock %f %F
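A file in this format can be read with Python's standard configparser module. The sketch below is one way it might be done and is not taken from transcode.py; load_encoders and pick_command are illustrative names. Two details matter: interpolation has to be disabled so the % placeholders are left alone, and configparser lower-cases option names by default.

```python
import configparser


def load_encoders(config_text):
    """Parse transcode.cfg-style text and return {encoder_name: settings}."""
    # interpolation=None keeps literal % placeholders (%t, %n, ...) intact;
    # the default interpolation would reject them as invalid syntax.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(config_text)
    raw_names = parser["GeneralConfig"]["Encoders"].split(",")
    names = [name.strip() for name in raw_names]
    # Each encoder name listed in GeneralConfig has a section of its own.
    return {name: dict(parser[name]) for name in names}


def pick_command(settings, is_compilation):
    """Use CommandVA for multi-artist releases and Command otherwise."""
    # Keys are lower case because configparser normalizes option names.
    return settings["commandva" if is_compilation else "command"]
```

With the two-encoder config above, load_encoders would return entries for both mp3 and mp3crappy, and each track could be run through both commands in turn.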

All of these scripts are at http://www.phred.org/~alex/transcode. Note that they are likely to change a lot in the next couple of weeks. Here is what you’ll find there today:

MusicBrainzHelper.py — A helper class for Python to make it easier to work with MusicBrainz.
FlacHelper.py — A helper class for Python to make it easier to work with FLAC files.
tag.py — The GUI tool described above for adding the DISC_MUSICBRAINZ_ID tag to FLAC images.
transcode.py — The command line tool to convert FLAC images to MP3 or other files
cache-tags.py — This will cache the tags for each FLAC image if you want to run transcode.py while disconnected from the internet.
TODO — Known bugs

alex

Category: Audio, Computers, Music | 7 Comments

Bulk CD Ripping — Part One: CDs to FLAC Images

December 20, 2006, 12:55 am

This is going to be a multipart blog series on ripping CDs. I’ll start by describing the goals that led me to this solution:

Use a hardware changer to rip the CDs so that I don’t need to sit there and feed the computer each disk one at a time.
Use ripping software which detects and corrects for errors during ripping caused by scratches or fingerprints on the CDs.
Rip to a lossless format so that the files can be converted to any audio format at any point in the future.
Each ripped disk should be represented by a single file (instead of a file per track) so that an exact copy of the CD can be burned in the future if that need arises.
Use the most reliable tagging (metadata) database available so that the CDs end up with consistent and accurate titles.

Part one talks about the first four bullet points. Part two will expand on the last bullet point.

I had some vacation time to burn and rather than doing something useful like going on a long bike ride, remodeling the downstairs bathroom, or even reading a few novels I decided to rerip all of our music. Since I’m a software developer and not a CD changer I decided to buy new hardware and then write software to make it all work.

I ordered a Sony VGP-XL1B2 Media Changer from Amazon. This is a 200 disk CD/DVD changer designed for Windows Media Center machines. It connects to your computer via FireWire (1394). The changer isn’t supposed to work under Windows XP and I didn’t want to use Media Center for ripping, but I figured that I could make it work anyway. What really convinced me was this blog post from Matt Goyer, which described the interface that the changer’s driver must provide to Windows.

I ordered the changer and it arrived the next day (thanks to Amazon Prime’s $3.99 overnight shipping). When I plugged it into a Windows XP SP2 machine I was surprised to see it work without needing to install any drivers. A service called the Removable Storage Manager (RSM) detected it and seemed to start doing something with the CDs. I wasted about two hours playing with RSM before deciding that it was overly complicated for what I needed. So I decided to write my own tool to control the changer.

Using information from Matt Goyer’s blog post and MSDN I was able to write a small program called MediaChanger (the zip in that directory contains source and a debug binary). MediaChanger supports the following commands:

MediaChanger mount — This mounts the disk in the given slot into the drive.
MediaChanger unmount — This puts the currently mounted disk back to the slot that it came from.
MediaChanger drivestatus — Reports if the drive is empty or full and where the disk came from.
MediaChanger next — Unmounts the current disk and mounts the next one.

This provides enough changer support for a shell script to mount each disk one by one and do something with them. That provided the basis for ripping all of our disks.
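A loop of that kind might look like the following Python sketch. It uses only the next and unmount commands listed above. Everything else is an assumption made for the example: the rip_all name, the rip_current_disc callback, and especially the idea that MediaChanger returns a nonzero exit code when there is no further disk to mount.

```python
import subprocess


def rip_all(slot_count, rip_current_disc, run=subprocess.run,
            changer="MediaChanger"):
    """Step through the changer, calling rip_current_disc() for each disk.

    Returns the number of disks handed to rip_current_disc.
    """
    ripped = 0
    for _ in range(slot_count):
        # "next" unmounts the current disk (if any) and mounts the next one.
        result = run([changer, "next"])
        if result.returncode != 0:
            # Assumption: a nonzero exit code means no more disks to mount.
            break
        rip_current_disc()
        ripped += 1
    # Put the last disk back in its slot when finished.
    run([changer, "unmount"])
    return ripped
```

Passing the runner in as a parameter keeps the loop testable without the changer attached.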

Now the problem was automating my preferred ripper, Exact Audio Copy (EAC). EAC is the best audio ripper for CDs if you care about getting perfect rips with no errors. It is not always fast, but it always does a good job. It also supports ripping to a format called WAV+CUE which is a WAV file with all of the music and a text file called a CUEsheet that has the table of contents from the disk. We need WAV+CUE support to create FLAC images.

Sadly EAC is not designed to be scripted, which makes using it with my MediaChanger program a little difficult. A couple of friends pointed me to a script called REACT. REACT is written in a scripting language called AutoIt and adds features to EAC by watching dialog boxes and using menu items as a human would. This is a really ugly way to script something, but it is also the only option available for EAC. I modified REACT to add the following features:

When finished ripping a disk, switch to the next one and start ripping (this is turned on by pressing Alt-F5). This is built on top of REACT’s existing feature of ripping an image by pressing F10.
When EAC finds multiple hits for a CD in CDDB use the first one.
If there is no CDDB information for a CD then title the CD after the slot that it came from.
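The two CDDB rules above boil down to a small decision, sketched here in Python rather than AutoIt. This is not code from REACT, and the slot-based title format is invented for the example.

```python
def choose_title(cddb_hits, slot):
    """Pick the first CDDB hit, or fall back to a title based on the slot."""
    if cddb_hits:
        # Several hits: take the first one without asking the user.
        return cddb_hits[0]
    # No CDDB data: name the disk after the changer slot (hypothetical format).
    return f"Slot {slot:03d}"
```

Either way the rip never stops to wait for input, which is what makes unattended runs possible.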

I’m using REACT to generate FLAC images from the WAV+CUE ripped by EAC. This produces a single FLAC file which contains all of the data from the CD. In part two of this series I’ll explain how I tag and convert my FLAC images into MP3 files.

With these changes REACT was ready for bulk CD ripping from the changer. I loaded up the changer with a drawer full of CDs from our CD cabinet and let it run. The drive in the changer is not very fast at audio extraction, so it takes about 24 hours for it to rip 200 CDs. This is a little longer than I had hoped for, but at least I don’t have to sit there while it runs.

Category: Audio, Computers, Music | 27 Comments
