Track and artist separation

This application note builds on the note "Playing tracks sequentially", in which a simple script is developed that plays tracks with random selection.

A short detour on randomness

When you roll a dice, the chance that you roll a 4 is 1/6^th, or roughly 17%. If you roll the dice several times, you may the same value on two successive rolls. That is, if you just rolled a 4 and then roll the dice again, the chance that this next roll also gives a 4 is 1/6^th. Let's assume that you have just rolled the number 4 twice in succession; what is the chance that the next roll will also be 4? Well, still 1/6^th, actually. That is just how chance works.

Chance has no memory and no concept of fairness. This is often misunderstood, and it is the basis of many gambling fallacies. Of course, if you roll a dice very often (and assuming that it is a fair dice), each of the values 1 to 6 will appear approximately equally often. This still does not allow you to make any kind of prediction for the next roll. For example, if out of 60 rolls, it turns out that number 1 has appeared twelve times, number 2 nine times, number 3 thirteen times, number 4 six times, number 5 eleven times and number 6 nine times, then the chance that the next roll will give the number 4 is still 1/6^th. The reasoning that its chance of appearing next must be higher, because its frequency of appearance so far is lower than the expected average of sixty rolls, is invalid.

The above digression becomes relevant when selecting tracks to play in random fashion. With random selection, even when selecting a track from a pool of 50 tracks, it is not unlikely that the same track plays twice in succession The more tracks you have, the smaller the chance that this happens, but it can still happen. It is in fact much more likely than it may appear at first sight: our problem is related to the Birthday Paradox, a well known example how something apparently very unlikely turns out to happen quite often.

In practice, we usually just want to make sure that some minimum of other tracks play before repeating any track. For example, if the script has selected the track "The Mistuned Piano.mp3" at random, we will want to play at least, say, ten other tracks before repeating "The Mistuned Piano" again. Playing more than ten tracks between two repetitions of "The Mistuned Piano" is fine, but playing less than ten tracks in between must be ruled out. This is called "track separation", which is discussed next. A further step is "artist separation", where no track of the same artist (or band) as the one that performs "The Mistuned Piano" should occur within ten tracks that play after "The Mistuned Piano". The criterion of ten tracks is just an example, by the way, you may equally well decide to have a separation of twenty or thirty tracks or artists.

The base script

As stated near the top of this page, this application note builds on the script developed in the note "Playing tracks sequentially". To make this article self-contained, the script from that note is reproduced here.

new TrackCount
new Filename[100 char]

@reset()
    {
    TrackCount = fexist("*.mp3")
    fmatch Filename, "*.mp3", random(TrackCount)
    playrandom
    }

@audiostatus(AudioStat: status)
    {
    if (status == Stopped)
        playrandom
    }

playrandom()
    {
    play Filename
    fmatch Filename, "*.mp3", random(TrackCount)
    }

In this application note, we will make some modifications to function playrandom() (and add a few new functions). The application note "Playing tracks sequentially" also contains an improved version of the above script, but to better present the concept of track and artist separation, I use this, simpler, script.

Track separation: avoiding tracks to be repeated too quickly

We can achieve track separation by maintaining a queue with recently played tracks. Every time that we select a new track number, we look this number up in a queue that we allocated. If the track number is already in the queue, this means that the file was recently played and we go back to choose another random track number. If the chosen track number is not in the queue, we start playing this track and add the track number onto the top of the queue. In addition, we also remove the bottom entry of the queue —that is, the "least recently used" (LRU) entry gets dropped.

const lru_queue_size = 10
static lru_queue[lru_queue_size] = { -1, ... }

bool: addqueue(index, total)
    {
    /* limit queue size to lru_queue_size or to 3/4th of
     * the total count, whichever is lower
     */
    total = min(3 * total / 4, lru_queue_size)

    /* find the item in the queue */
    for (new i = 0; i < total; i++)
        if (lru_queue[i] == index)
            return false

    /* item not found in the queue, add it */
    for (new i = total - 1; i > 0; i--)
        lru_queue[i] = lru_queue[i - 1]
    lru_queue[0] = index

    return true
    }

There is one additional trick in the addqueue() function that handles the LRU queue: the number of tracks in the queue has an additional limit of 3/4^th of the number of tracks that exist in the directory. If we did not do this, and assuming that the queue size is 10 and there are only eight MP3 tracks on the CompactFlash card, no track would ever be accepted after these eight tracks have played. Therefore, the addqueue() limits the queue size to six entries in case the track count is eight.

Also note that the lru_queue variable is initialized to all "-1" values, and that valid track indices are always positive.

Below is a modification of function playrandom() that illustrates how you could use the addqueue() function. It uses a loop to first select a random track number and then test whether that number can be added to the LRU queue (without duplicating an existing entry). If the new entry can be added, we break out of the loop and play the track. If the new entry cannot be added because it is already in the list, we iterate the loop once more and choose a new random number for the track index.

playrandom()
    {
    play Filename

    /* choose an index at random, but avoid repeating
     * recently played tracks
     */
    new index
    do
        index = random(TrackCount)
    while (!addqueue(index, TrackCount))

    fmatch Filename, "*.mp3", index
    }

Artist separation

Track separation is easy, because the index used to select the track, with fmatch() can be doubly used as a unique value representing the track. To aquire a value that uniquely identifies an artist, we will need some help. To start: how do we know the name of the artist or band in the first place?

There are two possible sources to get the artist name from: the ID3 tag or the filename. Whichever scheme you use depends on how you prefer to work and what the source of your MP3 tracks is. In this article, I will use the "filename" scheme. It is common practice that the filename of a track consists of the name of the artist and the title of the song, separated by a dash. For example, the song "The Mistuned Piano" performed by Turmoil Tony would give the filename:

Turmoil Tony - The Mistuned Piano.mp3

For artist separation, the part of the filename left of the dash is the key that we need to add to the Least Recently Used (LRU) queue. The addqueue() function accepts numeric keys, not multi-character strings. Again, we have two options: change the addqueue() function to accept (and store) strings instead of values, or find a way to map a name to a unique number. I use the latter technique, because it is simpler and quicker and it uses less memory.

Hash functions map a name to a number. These functions are widely used in databases, library managers and other programs where quick look-ups of names or other textual data is essential. Many hash functions have been devised. The one below is designed by Daniel Bernstein, with a few adaptions: I changed the hash so that it treats upper case and lower case characters as equal (case insentitive), and it ignores any character below the space (punctuation) —plus, of course, that it only uses the part of the string left of the dash.

artist_id(const filename[])
    {
    /* assume artist/band name and track title are separated
     * with " - ", with the artist coming before the title
     */
    new sentinel = strfind(filename, " - ")
    if (sentinel < 0)
        sentinel = strlen(filename)

    /* calculate a hash from the first part of the string
     * (artist/band name) using a case-insensitive hash (which
     * also ignores non-letters)
     */
    new bool: packedname = ispacked(filename)
    new letter

    new hash = 5381
    for(new i = 0; i < sentinel; i++)
        {
        letter = packedname ? filename{i} : filename[i]
        letter = tolower(letter)
        if (letter > ' ')
            hash = 33 * hash ^ letter
        }

    return hash & cellmax
    }

With the above routine, once we select a file and get its filename, we can derive an almost unique number from it and use that in the call to addqueue(). There are two caveats. The simplest one (simple, because I won't handle the special case) is that a calculated hash is not guaranteed to be unique. It might well be possible that two different names compute to the same hash (this is called a collision). I am not handling the case, because it only happens rarely (provided that you use a decent hash function) and because artist separation with "colliding" artists may go unnoticed in practice. For example, if the band name "Wooden Iron" happens to compute to the same hash as "Turmoil Tony", the net effect would be that a song from Wooden Iron would not play within ten tracks from a song by Turmoil Tony.

The second problem is that we have to handle the case where the LRU queue is bigger than the number of distinctive artist/band names. If we have a queue with 10 elements, and 30 tracks from eight artists on the CompactFlash card, then the script cannot select a new track after one song from each artist has played: all eight artists are in the LRU queue (which holds up to 10 items), so every next track that is randomly selected is immediately rejected: its performing artist is already in the queue. We essentially had the same problem with track separation, and concluded that we must use a value below the track count to limit the LRU queue. With artist separation, we need to limit the LRU queue with a value that is lower than the count of distinctive artists.

To count artists, we run through all tracks and get the hash for the artist/band name. Then, we have to verify whether we have already seen this artist. This is more complicated, because it implies that we must maintain a list of hash values that have ocurred earlier, and we do not know beforehand how large this list must be. Fortunately, an approximate artist count is acceptible, as long as we err on the conservative side. That is, if our program counts 21 different artists, but there really are 23, that is fine —we should just avoid counting more artists then there are (meaning that if our program counts 23 artists and there are only 16, our artist separation may come to a halt).

For a conservative approximate count, we reduce the hash value from 31 bits to 8 bits, so that the range of hash values is reduced to 0 to 255. Then, we create an array with 256 entries that will hold the presence/absence flags for the computed hash values. If a track computes to a hash value of "43" (for example), we set the associated flag (the flag with index 43) to 1. We call these flags buckets and every hash gets dropped in its matching bucket.

count_ids(const pattern[])
    {
    new buckets[256 char] = { 0, ... }
    new filename[128 char]

    /* walk over all tracks, and sort the artist/band IDs
     * into 256 buckets
     */
    new tracks = fexist(pattern)
    for (new i = 0; i < tracks; i++)
        {
        fmatch filename, pattern, i
        /* calculate the hash, then mask with 0xff to reduce
         * the range to 0..255
         */
        new hash = artist_id(filename) & 0xff
        buckets{hash} = 1
        }

    /* now count the non-empty buckets */
    new count
    for (new i = 0; i < 256; i++)
        count += buckets{i}
    return count
    }

In function @reset(), we call this function once to initialize a global variable. For completeness, the snippet below holds the full function.

new TrackCount
new ArtistCount
new Filename[100 char]

@reset()
    {
    TrackCount = fexist("*.mp3")
    ArtistCount = count_ids("*.mp3")
    fmatch Filename, "*.mp3", random(TrackCount)
    playrandom
    }

Finally, we need to change the function playrandom() again, but the modifications are only minor. The new routine (see below) still selects a track at random, but instead of passing this to addqueue() directly, it now looks up the filename and then computes the artist ID (the hash). This artist ID is the value that gets added to the LRU queue.

playrandom()
    {
    play Filename

    /* choose an index at random, but avoid repeating
     * recently played artists/bands
     */
    new index
    do
        {
        fmatch Filename, "*.mp3", random(TrackCount)
        index = artist_id(Filename)
        }
    while (!addqueue(index, ArtistCount))
    }

End notes

At the time of this writing, I am unaware of an artist performing under the name "Turmoil Tony", a band called "Wooden Iron" or a song with the name "The Mistuned Piano".

Programming the MP3 controller/player: track & artist separation