
Million Song Audio Data

2019-12-26

Welcome to the Last.fm dataset, the official song tag and song similarity dataset of the Million Song Dataset.

The MSD team is proud to partner with Last.fm in order to bring you the largest research collection of song-level tags and precomputed song-level similarity. All the data is associated with MSD tracks, which makes it easy to link it to other MSD resources: audio features, artist data, lyrics, etc.

Some numbers


Before you read the full description, you might want to know that the Last.fm dataset is big. How big?

  • 943,347 matched tracks MSD <-> Last.fm

  • 505,216 tracks with at least one tag

  • 584,897 tracks with at least one similar track

  • 522,366 unique tags

  • 8,598,630 (track - tag) pairs

  • 56,506,688 (track - similar track) pairs
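As a quick density check on these numbers, dividing the pair counts by the track counts gives the average list length per track. This is a minimal sketch that assumes the tagged and "with similars" tracks are subsets of the 943,347 matched tracks, as the description implies.

```python
# Headline counts copied from the list above.
matched_tracks = 943_347
tracks_with_tags = 505_216
tracks_with_similars = 584_897
track_tag_pairs = 8_598_630
track_similar_pairs = 56_506_688

# Share of matched tracks that have at least one tag / one similar track
# (assumes both groups are subsets of the matched tracks).
tag_coverage = tracks_with_tags / matched_tracks
similar_coverage = tracks_with_similars / matched_tracks

# Average list length, counted only over tracks that have a non-empty list.
avg_tags = track_tag_pairs / tracks_with_tags
avg_similars = track_similar_pairs / tracks_with_similars

print(f"tagged: {tag_coverage:.1%}, avg tags per tagged track: {avg_tags:.1f}")
print(f"with similars: {similar_coverage:.1%}, avg similars per such track: {avg_similars:.1f}")
```

Run as-is, this reports about 53.6% of matched tracks tagged with roughly 17 tags each, and about 62.0% with roughly 97 similar tracks each.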

Description

The Last.fm dataset consists of two kinds of data at the song level: tags and similar songs. If you are familiar with the Last.fm API, these correspond to the track methods 'getTopTags' and 'getSimilar'.

Below is a list of the top tags with their total frequencies in the dataset. The graph lets you glance at the total (log) frequencies of the top 200K tags.

   rock                  101,071
   pop                    69,159
   alternative            55,777
   indie                  48,175
   electronic             46,270
   female vocalists       42,565
   favorites              39,921
   Love                   34,901
   dance                  33,618
   00s                    31,432
   ...
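A table like the one above can be rebuilt by counting tags across track records. This is a minimal sketch, assuming "total frequency" means the number of tracks carrying a tag and that each record has a "tags" list of [tag, weight] pairs as in the dataset's JSON files; the per-track weight is ignored, and `tag_frequencies` is a hypothetical helper name.

```python
from collections import Counter
from typing import Iterable


def tag_frequencies(records: Iterable[dict]) -> Counter:
    """Count, for each tag, how many track records carry it.

    Assumes each record has a "tags" list of [tag, weight] pairs; the weight
    is ignored, so a tag counts once per track.
    """
    counts: Counter = Counter()
    for record in records:
        # A set guards against the same tag being listed twice for one track.
        counts.update({tag for tag, _weight in record.get("tags", [])})
    return counts


if __name__ == "__main__":
    sample = [
        {"track_id": "T1", "tags": [["rock", "100"], ["pop", "40"]]},
        {"track_id": "T2", "tags": [["rock", "100"], ["Love", "10"]]},
        {"track_id": "T3", "tags": []},
    ]
    # Same layout as the table: tag left-aligned, count right-aligned.
    for tag, n in tag_frequencies(sample).most_common(10):
        print(f"   {tag:<22}{n:>7,}")
```

Tags are counted case-sensitively, which matches the table listing "Love" separately; lower-casing them first would merge variants, at the cost of diverging from the published counts.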




Below is the list of similar tracks for Kenny Loggins - Footloose (TRRQSYC128F92DF7C8). The first number on each line is a "similarity measure". Note that we have removed duplicates; see this blog entry regarding the duplicates issue in the MSD.

1 TRVBGMW12903CBB920 (u'Deniece Williams', u"Let's Hear It For The Boy")
0.779581 TRUPEBD12903CCDB24 (u'Kenny Loggins', u'Danger Zone')
0.621877 TRCGAQU128F9364C33 (u'Starship', u'We Built This City')
0.599988 TRMKELO128F92FF72A (u'Michael Sembello', u'Maniac')
0.593485 TRFJBDW128F428AB32 (u'Starship', u"Nothing's Gonna Stop Us Now")
0.559087 TRHDJDB128F930527A (u'Survivor', u'Eye Of The Tiger')
0.537466 TRKMBZN128F428E0C0 (u'Huey Lewis And The News', u'The Power Of Love')
0.488286 TRMVKSL128F14640E0 (u'Robert Palmer', u'Addicted To Love')
0.469828 TRCHXXE128F428547C (u'The Pointer Sisters', u"I'm So Excited")
0.467316 TRGVORX128F4291DF1 (u'Mr. Mister', u'Broken Wings')
0.464955 TRQJQBY128F4289141 (u'Rick Springfield', u"Jessie's Girl")
0.445663 TRJTJEQ128F92DF7C2 (u'Ray Parker Jr', u'Ghostbusters')
0.443182 TRQQQUV12903CC84BF (u'Boy Meets Girl', u'Waiting For A Star To Fall')
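A listing like the one above can be produced from a track's "similars" field. This is a minimal sketch: `top_similar` and `format_similar` are hypothetical helpers, and `metadata` is an assumed track_id to (artist, title) lookup that you would build from another MSD resource; it is not part of the Last.fm files.

```python
from __future__ import annotations

from typing import Mapping, Sequence


def top_similar(
    similars: Sequence[Sequence],
    k: int = 10,
    min_score: float = 0.0,
) -> list[tuple[str, float]]:
    """Return up to k (track_id, score) pairs with score >= min_score, best first."""
    kept = [(tid, float(score)) for tid, score in similars if float(score) >= min_score]
    kept.sort(key=lambda pair: pair[1], reverse=True)  # defensive; files may already be sorted
    return kept[:k]


def format_similar(
    pairs: Sequence[tuple[str, float]],
    metadata: Mapping[str, tuple[str, str]],
) -> list[str]:
    """Render pairs as 'score track_id (artist, title)'.

    `metadata` is a hypothetical track_id -> (artist, title) lookup; ids it
    does not know are shown with '?' placeholders.
    """
    lines = []
    for tid, score in pairs:
        artist, title = metadata.get(tid, ("?", "?"))
        lines.append(f"{score:g} {tid} ({artist}, {title})")
    return lines


if __name__ == "__main__":
    similars = [["TRUPEBD12903CCDB24", 0.779581], ["TRVBGMW12903CBB920", 1]]
    meta = {"TRVBGMW12903CBB920": ("Deniece Williams", "Let's Hear It For The Boy")}
    for line in format_similar(top_similar(similars, k=2), meta):
        print(line)
```

The `min_score` cut-off is an illustrative knob: the scores fall off steeply toward the end of some lists (around 0.011 in the example file), so a threshold is one simple way to keep only the strongest neighbours.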

We are releasing the data as a set of JSON-encoded text files. In Python, use simplejson (or the standard-library json module) to load each file as a dictionary. The dataset comes in two zipped folders, one for training and one for testing. The split is the same as for the artist tags from The Echo Nest; if you are using Last.fm song similarity, please use the same split if possible. Here is what the file TRVABRY128F1476445.json looks like. Its keys are artist, title, timestamp, similars and tags (the example also includes track_id).

{"artist": "Jos\u00e9 Merc\u00e9", "timestamp": "2011-08-16 01:34:38.887856",
 "similars": [["TRNIEVD128F147645F", 1], ["TRTXPMH128F1476447", 1],
  ["TRZIWPD12903CDE96C", 0.66243399999999997], ["TRLILVX12903CDE95E", 0.62811899999999998],
  ["TRLSYHL128F428C7CB", 0.55656099999999997], ["TRWDJQC12903CB287F", 0.50215799999999999],
  ["TROJDSM128F9304E70", 0.50215799999999999], ["TRZOIRZ128F42A9659", 0.474242],
  ["TROMZDT128F92EFE12", 0.472804], ["TRVWRLB128F148F534", 0.46567999999999998],
  ["TRFYKLP128F92EFE18", 0.46275100000000002], ["TRTTOVG128F148F533", 0.46058300000000002],
  ["TRSXYZI128F42A9663", 0.41137699999999999], ["TRRCXTY128F4277F63", 0.19137299999999999],
  ["TRUTJAF128F4277F4C", 0.17661499999999999], ["TRHEWAO128F935F427", 0.011253300000000001],
  ["TRNHEHN128F4293C4E", 0.011253300000000001], ["TRNXBQQ12903CEEF46", 0.011253300000000001],
  ["TRXEWMB128F42772F4", 0.0111964], ["TRCUSYP128F428633D", 0.0111485],
  ["TRNGAKK128F4244109", 0.011090600000000001], ["TRQPBGA128F42772EC", 0.0110475],
  ["TRANHDB128F4244103", 0.0110405], ["TRBXSTV128F4287DF5", 0.011037699999999999],
  ["TRFOECV128F428633E", 0.011037699999999999], ["TRAHJXO128F424C7A3", 0.011037699999999999]],
 "tags": [["Flamenco", "100"], ["world", "50"], ["cante flamenco", "50"],
  ["jose merce", "50"], ["MyFlamenco", "50"], ["okFlamenco", "50"]],
 "track_id": "TRVABRY128F1476445", "title": "Campesino y minero (Tarantos)"}
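Loading files like the one above needs only the standard library. This is a minimal sketch, assuming one UTF-8 JSON record per .json file and tag weights stored as integer strings (as in the example, e.g. "100"); `load_track`, `iter_tracks` and `tag_weights` are hypothetical helper names. Walking the whole tree avoids assuming anything about the sub-directory layout inside the zipped folders.

```python
from __future__ import annotations

import json
import os
from typing import Iterator


def load_track(path: str) -> dict:
    """Load one track file; stdlib json.load behaves like simplejson.load here."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def iter_tracks(root: str) -> Iterator[dict]:
    """Yield every *.json record found anywhere under `root`.

    `root` is assumed to be one extracted folder (train or test); a missing
    directory simply yields nothing, since os.walk ignores errors by default.
    """
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in sorted(filenames):
            if name.endswith(".json"):
                yield load_track(os.path.join(dirpath, name))


def tag_weights(record: dict) -> dict[str, int]:
    """Map tag -> weight, converting string weights such as "100" to int."""
    return {tag: int(weight) for tag, weight in record.get("tags", [])}


if __name__ == "__main__":
    # JSON \u00e9 escapes decode to accented characters.
    record = json.loads(r'{"artist": "Jos\u00e9 Merc\u00e9", "tags": [["Flamenco", "100"], ["world", "50"]]}')
    print(record["artist"], tag_weights(record))
```

To respect the train/test split, call `iter_tracks` separately on each extracted folder rather than on a common parent directory.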



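Popular resources

  • GRAZ image classification data

  • MIT Cars car image data

  • Homicide report data

  • Cat and dog image classification...

    Kaggle competition data for distinguishing between two classes of objects, cats and dogs, ...

  • Bosch production line, reducing...

    The data comes from equipment used while manufacturing products on real Bosch production lines...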