
millionHeadlines

2019-09-10

More than one million news headlines


Context

This dataset contains news headlines published over a period of 15 years.

Sourced from the reputable Australian news source ABC (Australian Broadcasting Corp.)

Agency Site: http://www.abc.net.au

Content

Format: CSV, single file

  1. publish_date: Date of publishing for the article in yyyyMMdd format

  2. headline_text: Text of the headline in ASCII, English, lowercase

Start Date: 2003-02-19; End Date: 2017-12-31

Total Records: 1,103,663
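
The two-column layout described above can be read with the Python standard library alone. The following is a minimal sketch, assuming the file starts with a header row naming the two columns (not stated above). The function name `read_headlines` is hypothetical, and the sample rows are placeholders invented for illustration, not records from the dataset.

```python
import csv
import io
from datetime import date, datetime

# Placeholder rows invented for illustration; they are NOT taken from the dataset.
# Assumption: the real file begins with a header row naming the two columns.
SAMPLE_CSV = """publish_date,headline_text
20030219,example headline one
20171231,example headline two
"""


def read_headlines(handle):
    """Yield (date, headline) pairs from a CSV with the two documented columns."""
    for row in csv.DictReader(handle):
        # publish_date is documented as yyyyMMdd, i.e. %Y%m%d in strptime terms.
        published = datetime.strptime(row["publish_date"], "%Y%m%d").date()
        yield published, row["headline_text"]


rows = list(read_headlines(io.StringIO(SAMPLE_CSV)))
```

To read the actual file, the same function could be given a handle from `open(path, newline="", encoding="utf-8")`, where `path` is wherever the CSV was saved locally.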

Inspiration

I look at this news dataset as a summarised historical record of noteworthy events around the globe from early 2003 to the end of 2017, with a more granular focus on Australia.

This includes the entire corpus of articles published on the ABC website in the given time range. With a volume of roughly 200 articles per day and a good focus on international news, we can be fairly certain that every event of significance has been captured here.

Digging into the keywords, one can see all the important episodes shaping the last decade and how they evolved over time. For example: the financial crisis, the Iraq war, multiple US elections, ecological disasters, terrorism, famous people, Australian crimes, etc.
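
One simple way to follow a keyword over time is to count, per year, the headlines that contain it. The sketch below assumes records are available as (publish_date, headline_text) pairs with the date still in its yyyyMMdd string form. The function name `keyword_counts_by_year` is hypothetical, and the sample records are placeholders invented for illustration, not real headlines.

```python
from collections import Counter


def keyword_counts_by_year(records, keyword):
    """Count, per year, the headlines that contain `keyword` as a substring."""
    needle = keyword.lower()
    counts = Counter()
    for publish_date, headline in records:
        if needle in headline.lower():
            # The first four characters of a yyyyMMdd string are the year.
            counts[publish_date[:4]] += 1
    return dict(sorted(counts.items()))


# Placeholder records invented for illustration; NOT real headlines from the dataset.
sample = [
    ("20030219", "placeholder headline mentioning iraq war"),
    ("20030305", "placeholder headline about something else"),
    ("20081001", "placeholder headline mentioning financial crisis"),
    ("20081115", "another placeholder on the financial crisis"),
]

crisis_by_year = keyword_counts_by_year(sample, "financial crisis")
iraq_by_year = keyword_counts_by_year(sample, "iraq war")
```

Plain substring matching is the simplest choice here; it will also match the keyword inside longer words, so a tokenised or regular-expression match may be preferable for short keywords.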


