# Documentation

This document describes the features used in detecting hits from www.scoreahit.com. Many of the features used here were extracted, with thanks, from the Echonest API, and their descriptions are paraphrased from its documentation. Features derived directly from Echonest are denoted by an asterisk (*).

## Official Chart Data

**Peak Chart Position**

The peak position that the song achieved in the UK singles charts. We used sales data from the NME for singles released between 1st January 1960 - 9th March 1960, Record Retailer from 9th March 1960 - 31st December 1968 and The Official Charts Company thereafter. We considered the peak position achieved by a given artist recording a given song (including re-releases), and ignored cover versions by different artists. Note that some songs reach their peak performance after the artist's recording career is over.

**Peak Date**

The date on which the song achieved its Peak Chart Position. Note that this is not when the single was initially released, so tracks which slowly climb the charts or are re-released may have peak dates significantly after the initial release.

## Echonest Features

**Duration ^{*}**

Length of song in seconds. From this we created the following binary features, chosen in such a way that each bin had roughly the same number of songs.

- Duration < 224 seconds
- Duration 224 - 375 seconds
- Duration 376 - 524 seconds
- Duration > 524 seconds
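The binning above can be sketched as a one-hot encoding. This is only an illustration: the function name `duration_features` is hypothetical, and treating the bins as half-open intervals (so that a value such as 375.5 seconds falls in the second bin) is an assumption, since the list does not say how values between 375 and 376 are handled.

```python
from bisect import bisect_right

# Lower edges (in seconds) of the second, third and fourth bins.
# Half-open intervals are an assumption; the source list leaves
# fractional values such as 375.5 unspecified.
DURATION_EDGES = [224, 376, 525]
DURATION_LABELS = ["< 224 s", "224 - 375 s", "376 - 524 s", "> 524 s"]


def duration_features(duration_s):
    """Return a one-hot list with a 1 in the bin containing duration_s."""
    index = bisect_right(DURATION_EDGES, duration_s)
    return [1 if i == index else 0 for i in range(len(DURATION_LABELS))]
```

For example, `duration_features(200)` sets only the first feature, and `duration_features(400)` sets only the third.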

**Tempo ^{*}**

Estimated average tempo in Beats Per Minute (bpm), a basic measure of the speed of a track. The tempi were then binned into the following classes:

- Tempo < 69 bpm
- Tempo 70 - 89 bpm
- Tempo 90 - 109 bpm
- Tempo 110 - 129 bpm
- Tempo 130 - 149 bpm
- Tempo 150 - 169 bpm
- Tempo 170 - 189 bpm
- Tempo > 189 bpm
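Because the interior tempo classes are all 20 bpm wide, the class index can be computed arithmetically. The sketch below is illustrative only: `tempo_bin` is a hypothetical name, and placing the class boundaries at 70, 90, ..., 190 bpm (so that, say, 69.5 bpm falls in the first class) is an assumption, because the list above does not cover values between the stated ranges.

```python
import math

N_TEMPO_BINS = 8    # the eight classes listed above
FIRST_EDGE = 70.0   # assumed boundary between the first and second class
BIN_WIDTH = 20.0    # each interior class spans 20 bpm


def tempo_bin(tempo_bpm):
    """Return the index (0 to 7) of the tempo class containing tempo_bpm."""
    raw = math.floor((tempo_bpm - FIRST_EDGE) / BIN_WIDTH) + 1
    # Clamp so that very slow and very fast tempi land in the end classes.
    return max(0, min(N_TEMPO_BINS - 1, raw))
```

With these assumptions, a track at 120 bpm falls in class 3 (110 - 129 bpm).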

**Time Signature ^{*}**

An estimated overall time signature of a track. The time signature (meter) is a notational convention to specify how many beats are in each bar (or measure). From this we derived the following binary features:

- Binary Time Signature. Songs in 2/4 or 4/4
- Tertiary Time Signature. Songs in 3/4 or 6/8
- Complex Time Signature. Songs in other meters
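As a rough sketch, the three binary features could be derived as follows. The function name and the `(numerator, denominator)` input format are hypothetical; the form in which the API actually reports the time signature is not described here, so some conversion step may be needed first.

```python
# Meters named in the list above; everything else counts as complex.
BINARY_METERS = {(2, 4), (4, 4)}
TERTIARY_METERS = {(3, 4), (6, 8)}


def time_signature_features(numerator, denominator):
    """Map a meter such as (6, 8) to the three binary features."""
    meter = (numerator, denominator)
    binary = int(meter in BINARY_METERS)
    tertiary = int(meter in TERTIARY_METERS)
    complex_ = int(not binary and not tertiary)
    return {"binary": binary, "tertiary": tertiary, "complex": complex_}
```

Exactly one of the three features is 1 for any input, so they form a one-hot encoding of the meter class.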

**Mode ^{*}**

A mode is a type of scale defined relative to its tonic; C major, for example, uses none of the black keys on a piano (flats and sharps). Mode tells us whether a piece is in a major or minor key: 1 if the key is major, 0 if minor.

**Loudness ^{*}**

The overall loudness of a track in decibels (dB). Loudness is the quality of a sound that is the primary psychological correlate of physical strength (amplitude).

## High-Level Features

**Energy ^{*}**

How energetic is the music? Does it make you want to bop all over the room, or fall into a coma? The feature mix The Echonest uses to compute energy includes loudness and segment durations.

**Beat CV**

Stands for Beat Coefficient of Variation, and measures how much the inter-beat time varies. Let the beat locations for a song be denoted as b_{i}, i = 1 ... B, and let the differences between these beat times be bd_{i} = b_{i}-b_{i-1}, i = 2 ... B. Let the mean of these beat differences be μ_{bd} and their standard deviation be σ_{bd}. The Beat CV is then calculated as the standard deviation of the differences divided by the mean:

$$\text{Beat CV} = \frac{\sigma_{bd}}{\mu_{bd}}$$
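The Beat CV calculation described above can be sketched in a few lines. The function name `beat_cv` is hypothetical, and the use of the population standard deviation (rather than the sample standard deviation) is an assumption, as the text does not say which was used.

```python
import statistics


def beat_cv(beat_times):
    """Coefficient of variation of the inter-beat intervals.

    beat_times: beat locations in seconds, in increasing order.
    """
    # Differences between consecutive beat locations.
    diffs = [b - a for a, b in zip(beat_times, beat_times[1:])]
    if len(diffs) < 2:
        raise ValueError("need at least three beats")
    # Population standard deviation is an assumption, see the note above.
    return statistics.pstdev(diffs) / statistics.mean(diffs)
```

A perfectly regular beat gives a Beat CV of 0, and the value grows as the spacing between beats becomes more uneven.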

**Loudness CV**

Stands for Loudness Coefficient of Variation, and measures how much the loudness varies. Let the loudness of each sample be denoted as l_{i}, i = 1 ... L, and let the differences between these loudnesses be ld_{i} = l_{i}-l_{i-1}, i = 2 ... L. Let the mean of these loudness differences be μ_{ld} and their standard deviation be σ_{ld}. The Loudness CV is then calculated as the standard deviation of the differences divided by the mean:

$$\text{Loudness CV} = \frac{\sigma_{ld}}{\mu_{ld}}$$
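A minimal sketch of this calculation follows, with `loudness_cv` as a hypothetical name and the population standard deviation as an assumption. One practical point when implementing the definition literally: the mean of successive differences telescopes to (l_{L} - l_{1}) / (L - 1), so it can be zero or negative, and the sketch guards against a zero mean.

```python
import statistics


def loudness_cv(loudness):
    """Standard deviation of successive loudness differences over their mean.

    loudness: per-sample loudness values in dB, in time order.
    """
    diffs = [b - a for a, b in zip(loudness, loudness[1:])]
    if len(diffs) < 2:
        raise ValueError("need at least three loudness values")
    # Equals (last - first) / (number of differences) by telescoping.
    mean_diff = statistics.mean(diffs)
    if mean_diff == 0:
        raise ValueError("mean loudness difference is zero; CV is undefined")
    # Population standard deviation is an assumption.
    return statistics.pstdev(diffs) / mean_diff
```

How the original pipeline handled a zero or negative mean difference is not stated, so raising an error here is only one possible choice.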

**Non-Harmonicity**

This tells us how 'noisy' the signal is. For each song, we estimated a chord sequence from the chromagram. For each frame, we then summed the salience of the chromagram which was not part of the chord. Let C_{i} be the chromagram at the ith frame, P_{i} the predicted chord at frame i, and finally let b_{i} be a binary vector of length 12 which is 1 for notes **not** in P_{i}, 0 otherwise. The Non-Harmonicity of a track with F frames is then calculated as:
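One plausible reading of this definition, assuming the per-frame sums are averaged over the F frames (the normalisation is an assumption, not something stated above), is (1/F) Σ_{i=1}^{F} b_{i} · C_{i}. The sketch below implements that reading; the function name and the list-of-lists input format are hypothetical.

```python
def non_harmonicity(chromagram, off_chord_masks):
    """Average chroma salience lying outside the predicted chord.

    chromagram: F frames, each a list of 12 salience values (C_i).
    off_chord_masks: F binary lists of length 12 (b_i), with 1 for
        pitch classes NOT in the predicted chord of that frame.
    """
    if not chromagram or len(chromagram) != len(off_chord_masks):
        raise ValueError("inputs must be non-empty and of equal length")
    total = 0.0
    for frame, mask in zip(chromagram, off_chord_masks):
        # Dot product b_i . C_i: salience of the off-chord pitch classes.
        total += sum(c * m for c, m in zip(frame, mask))
    # Dividing by the number of frames is an assumption.
    return total / len(chromagram)
```

A track whose chroma energy sits entirely on the notes of the predicted chords scores 0 under this reading.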

**Harmonic Simplicity**

Measures how harmonically complex the song is. Given our predicted chord sequence, we measure how likely it was to occur given the model parameters using the log-likelihood score, normalised by the song length to give a simplicity 'per beat'. Chord sequences which are complex are assigned lower scores, whilst those which are simple (e.g. just I-IV-V movements) will be assigned high harmonic simplicity values. The Harmonic Simplicity is the product over all frames of the probability of observing this chord multiplied by the probability of transitioning into this chord:
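As a hedged illustration, the log of the product described above is a sum of log-probabilities, which can then be divided by the number of frames to give a per-frame score. The function name and both inputs are hypothetical: the sketch assumes the chord model has already supplied, for each frame, the probability of the observation given the predicted chord and the probability of the transition into that chord.

```python
import math


def harmonic_simplicity(observation_probs, transition_probs):
    """Length-normalised log-likelihood of a predicted chord sequence.

    observation_probs[t]: probability of frame t given its predicted chord.
    transition_probs[t]: probability of moving into the chord at frame t
        (for the first frame, the initial chord probability).
    Both inputs are assumed to come from an already fitted chord model.
    """
    if not observation_probs or len(observation_probs) != len(transition_probs):
        raise ValueError("inputs must be non-empty and of equal length")
    # Log of the product over frames, computed as a sum to avoid underflow.
    log_likelihood = sum(
        math.log(o) + math.log(a)
        for o, a in zip(observation_probs, transition_probs)
    )
    return log_likelihood / len(observation_probs)
```

Scores are at most 0, and values closer to 0 indicate chord sequences the model finds more likely, which corresponds to higher harmonic simplicity.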