
2016 US Two-Party Debate Data

2019-12-04

Overview

Only two major-party candidates are competing in the 2016 US presidential election, but there was tough competition to get to the general election. This dataset contains transcripts of every Democratic, Republican, and Republican Undercard debate held during the 2016 primary season.

This dataset is meant to be a complement to Megan Risdal's transcripts of the 2016 US Presidential (General Election) Debates.

So you can now take all of the questions (who talks the most? who has a wider vocabulary?) that you answered in the general election debates, and apply the same procedures to see how your favorite (or least favorite) candidate has changed over time.

The column names (and order) in this dataset are a superset of those found in the general election dataset, and non-speech annotations (such as "(applause)") in this dataset are also a superset. Kernels uploaded for the general election dataset should be compatible with this dataset as well; please let me know if you have any compatibility issues.

What in the world is an "Undercard" debate?

The field of Republicans running for President in the primaries was (yuuuuge!) pretty big: 17 candidates threw their hats in the ring at one point or another. Debate organizers realized that having 17 people on stage (each with a set amount of time to answer a question / respond to a criticism / interrupt each other) would, in the best case, lead to a three-to-four-hour-long debate (and, in the worst case, lead to a chaotic shout-fest as candidates tried to talk over each other for three to four hours).

To alleviate this issue, many of the Republican debate nights were split into two separate debates: the main debate with the top party contenders, aired live during primetime; and the Undercard debate, which typically aired a few hours earlier than the main debate.

The criteria for a candidate to be allowed into the main debate (rather than the "kids' table" debate, as some pundits derisively called the Undercard event) varied a bit by event. Typically a poll showing of 1% in one of several specified polls was sufficient to gain admission to the Undercard. To get into the main debate, candidates had to either (a) be polling above a different, higher, percentage in specific polls, or (b) be among the top n Republican candidates in said polls.

The details get a little thorny (certain debates had multiple criteria, of which candidates had to meet at least one), so I refer questions to the individual Undercard debate pages at the American Presidency Project for detailed criteria.

In this dataset, the split between Republican debates is made in the Party column: Republican is used for the primetime main events, and Republican Undercard is used for the Undercard.
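As a minimal sketch of how the Party column can be used, the snippet below tallies rows per party and pulls out Republican turns with or without the Undercard. The sample rows and their Text values are made-up placeholders, and the "Democratic" value is an assumption based on the debate names above; only the Party, Speaker, and Text column names come from this description.

```python
from collections import Counter

# Hypothetical sample rows standing in for the real file.
# The "Democratic" Party value is assumed, not confirmed by the description.
rows = [
    {"Party": "Republican", "Speaker": "Trump", "Text": "placeholder one"},
    {"Party": "Republican Undercard", "Speaker": "Graham", "Text": "placeholder two"},
    {"Party": "Democratic", "Speaker": "Sanders", "Text": "placeholder three"},
    {"Party": "Republican", "Speaker": "Cruz", "Text": "placeholder four"},
]


def turns_by_party(rows):
    """Count how many rows carry each Party value."""
    return Counter(row["Party"] for row in rows)


def republican_rows(rows, include_undercard=True):
    """Return Republican main-debate rows, optionally with Undercard rows."""
    wanted = {"Republican"}
    if include_undercard:
        wanted.add("Republican Undercard")
    return [row for row in rows if row["Party"] in wanted]


party_counts = turns_by_party(rows)
main_only = republican_rows(rows, include_undercard=False)
```

Keeping the Undercard as its own Party value means the two Republican events held on the same day can be compared or merged with a single filter.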


The Data

The fields are described more fully in the file description. This section describes the particular elements that can appear in the Speaker and Text fields.

Who's Who?

(aka, the Speaker column)

The primary debates had a ton of participants. This dataset contains utterances from 22 politicians and 49 moderators, not to mention the occasional audience laughter or 2-minute timer.

Almost all Speaker values contain either a single title-case name (listed below in the Candidates and Moderators subsections) or a single upper-case word indicating non-speaker noise (listed below in the Non-speaker Turns subsection); the exceptions are cases where a name is followed by a space and a parenthesized tag, as follows:

  • spkr (VIDEO): transcriptions of pre-recorded material of any of the candidates or moderators

  • spkr (TRANSLATED): in the Univision/Telemundo debate, some questions and answers are translated into English from the original Spanish; the transcript reflects the translations as spoken by a translator
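The tagged form can be split back into a name and a tag with a small regular expression. This is only a sketch: the function name split_speaker is hypothetical, and it assumes the tag is always separated from the name by exactly one space, as described above.

```python
import re

# Matches "<name> (<TAG>)" for the two tags described above.
_TAGGED_SPEAKER = re.compile(r"^(?P<name>.+?) \((?P<tag>VIDEO|TRANSLATED)\)$")


def split_speaker(value):
    """Return (name, tag); tag is None when the value carries no tag."""
    match = _TAGGED_SPEAKER.match(value)
    if match:
        return match.group("name"), match.group("tag")
    return value, None


print(split_speaker("Ramos (TRANSLATED)"))
print(split_speaker("Clinton"))
```

Untagged values, including the upper-case non-speaker labels, pass through unchanged with a tag of None.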

Non-speaker Turns

The non-speaker turns in the Speaker column are:

  • AUDIENCE: any laughter, booing, applause, etc. from the audience

  • CANDIDATES: cross-talk between candidates

  • OTHER: non-speaker, non-audience noise (such as commercial break, timer bell, national anthem, etc.)

  • QUESTION: a question from an audience member (or a prerecorded question)

  • UNKNOWN: cases where the transcriber could hear a phrase but could not determine who said it
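If you only care about what named people said, the five labels above can be used as a filter. A minimal sketch, assuming rows are dictionaries keyed by column name (the sample rows and helper names are hypothetical):

```python
# The five non-speaker labels listed above.
NON_SPEAKER_LABELS = frozenset(
    {"AUDIENCE", "CANDIDATES", "OTHER", "QUESTION", "UNKNOWN"}
)


def is_non_speaker(speaker):
    """True when the Speaker value is one of the non-speaker labels."""
    return speaker in NON_SPEAKER_LABELS


def named_turns(rows):
    """Keep only the rows attributed to a named person."""
    return [row for row in rows if not is_non_speaker(row["Speaker"])]


# Hypothetical sample rows with placeholder text.
sample = [
    {"Speaker": "Sanders", "Text": "placeholder"},
    {"Speaker": "AUDIENCE", "Text": "(APPLAUSE)"},
    {"Speaker": "QUESTION", "Text": "placeholder question"},
]
kept = named_turns(sample)
```

Note that QUESTION rows do contain real speech from audience members, so you may want to keep them for some analyses.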

Here are the various speakers who appear in the dataset:

Candidates

Democratic

  • Chafee: Former Governor Lincoln Chafee (RI)

  • Clinton: Former Secretary of State Hillary Clinton

  • O'Malley: Former Governor Martin O'Malley (MD)

  • Sanders: Senator Bernie Sanders (VT)

  • Webb: Former Senator Jim Webb (VA)

Republican

  • Bush: Former Governor Jeb Bush (FL)

  • Carson: Ben Carson

  • Cruz: Senator Ted Cruz (TX)

  • Kasich: Governor John Kasich (OH)

  • Paul: Senator Rand Paul (KY)

  • Rubio: Senator Marco Rubio (FL)

  • Trump: Donald Trump

  • Walker: Governor Scott Walker (WI)

Republican (Undercard ONLY)

  • Gilmore: Former Governor Jim Gilmore (VA)

  • Graham: Senator Lindsey Graham (SC)

  • Jindal: Governor Bobby Jindal (LA)

  • Pataki: Former Governor George Pataki (NY)

  • Perry: Former Governor Rick Perry (TX)

  • Santorum: Former Senator Rick Santorum (PA)

Republican (Main AND Undercard)

  • Christie: Governor Chris Christie (NJ)

  • Fiorina: Carly Fiorina

  • Huckabee: Former Governor Mike Huckabee (AR)

All candidates in a Python list for easy copy/paste

['Bush', 'Carson', 'Chafee', 'Christie', 'Clinton', 'Cruz', 'Fiorina', 'Gilmore', 'Graham', 'Huckabee', 'Jindal', 'Kasich', "O'Malley", 'Pataki', 'Paul', 'Perry', 'Rubio', 'Sanders', 'Santorum', 'Trump', 'Walker', 'Webb']
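With that list in hand, a first pass at "who talks the most?" might look like the sketch below. The sample rows are made-up placeholders and the function name is hypothetical; the count is a crude whitespace split, so parenthesized annotations in Text are counted as words, and tagged values such as "Trump (VIDEO)" are skipped because they do not match a bare candidate name.

```python
from collections import Counter

CANDIDATES = {
    'Bush', 'Carson', 'Chafee', 'Christie', 'Clinton', 'Cruz', 'Fiorina',
    'Gilmore', 'Graham', 'Huckabee', 'Jindal', 'Kasich', "O'Malley",
    'Pataki', 'Paul', 'Perry', 'Rubio', 'Sanders', 'Santorum', 'Trump',
    'Walker', 'Webb',
}


def words_per_candidate(rows):
    """Sum whitespace-separated tokens in Text for each candidate."""
    totals = Counter()
    for row in rows:
        if row["Speaker"] in CANDIDATES:
            totals[row["Speaker"]] += len(row["Text"].split())
    return totals


# Hypothetical sample rows with placeholder text.
sample = [
    {"Speaker": "Sanders", "Text": "placeholder words here"},
    {"Speaker": "Clinton", "Text": "more placeholder text"},
    {"Speaker": "Sanders", "Text": "two more"},
    {"Speaker": "AUDIENCE", "Text": "(APPLAUSE)"},
    {"Speaker": "Cooper", "Text": "a moderator turn"},
]
totals = words_per_candidate(sample)
print(totals.most_common(1))
```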

Moderators

NOTE: Some moderators are seen across various debates; in particular, the Republican main debates and undercard debates on a given day tend to share the same moderators. Some moderators are public figures who are seen only in videos (with the (VIDEO) tag).

  • Arrarás: María Celeste Arrarás (Telemundo)

  • Baier: Bret Baier (Fox News)

  • Baker: Gerard Baker (The Wall Street Journal)

  • Bartiromo: Maria Bartiromo (Fox Business Network)

  • Bash: Dana Bash (CNN)

  • Blitzer: Wolf Blitzer (CNN)

  • Cavuto: Neil Cavuto (Fox Business Network)

  • Cooney: Kevin Cooney (CBS News)

  • Cooper: Anderson Cooper (CNN)

  • Cordes: Nancy Cordes (CBS News)

  • Cramer: Jim Cramer (CNBC)

  • Cuomo: Governor Andrew Cuomo (NY)

  • Dickerson: John Dickerson (CBS News)

  • Dinan: Stephen Dinan (Washington Times)

  • Epperson: Sharon Epperson (CNBC)

  • Garrett: Major Garrett (CBS News)

  • Ham: Mary Katharine Ham (Hot Air)

  • Hannity: Sean Hannity (Fox News)

  • Harwood: John Harwood (CNBC)

  • Hemmer: Bill Hemmer (Fox News)

  • Hewitt: Hugh Hewitt (Salem Radio Network)

  • Holt: Lester Holt (NBC News)

  • Ifill: Gwen Ifill (PBS)

  • Kelly: Megyn Kelly (Fox News)

  • Lemon: Don Lemon (CNN)

  • Levesque: Neil Levesque (New Hampshire Institute of Politics)

  • Lopez: Juan Carlos Lopez (CNN en Español)

  • Louis: Errol Louis (NY1)

  • MacCallum: Martha MacCallum (Fox News)

  • Maddow: Rachel Maddow (MSNBC)

  • Mcelveen: Josh McElveen (WMUR-TV)

  • Mitchell: Andrea Mitchell (NBC News)

  • Muir: David Muir (ABC News)

  • O'Reilly: Bill O'Reilly (Fox News)

  • Obradovich: Kathie Obradovich (The Des Moines Register)

  • Quick: Becky Quick (CNBC)

  • Quintanilla: Carl Quintanilla (CNBC)

  • Raddatz: Martha Raddatz (ABC News)

  • Ramos: Jorge Ramos (Univision)

  • Regan: Trish Regan (Fox Business Network)

  • Salinas: María Elena Salinas (Univision)

  • Santelli: Rick Santelli (CNBC)

  • Seib: Gerald Seib (The Wall Street Journal)

  • Strassel: Kimberly Strassel (The Wall Street Journal)

  • Tapper: Jake Tapper (CNN)

  • Todd: Chuck Todd (MSNBC)

  • Tumulty: Karen Tumulty (The Washington Post)

  • Wallace: Chris Wallace (Fox News)

  • Woodruff: Judy Woodruff (PBS)

All moderators in a Python list for easy copy/paste

['Arrarás', 'Baier', 'Baker', 'Bartiromo', 'Bash', 'Blitzer', 'Cavuto', 'Cooney', 'Cooper', 'Cordes', 'Cramer', 'Cuomo', 'Dickerson', 'Dinan', 'Epperson', 'Garrett', 'Ham', 'Hannity', 'Harwood', 'Hemmer', 'Hewitt', 'Holt', 'Ifill', 'Kelly', 'Lemon', 'Levesque', 'Lopez', 'Louis', 'MacCallum', 'Maddow', 'Mcelveen', 'Mitchell', 'Muir', "O'Reilly", 'Obradovich', 'Quick', 'Quintanilla', 'Raddatz', 'Ramos', 'Regan', 'Salinas', 'Santelli', 'Seib', 'Strassel', 'Tapper', 'Todd', 'Tumulty', 'Wallace', 'Woodruff']
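Putting the two lists together with the non-speaker labels gives a rough way to label every Speaker value by role. This is a sketch rather than anything shipped with the dataset: the function name speaker_role and the role strings are hypothetical, and a trailing (VIDEO) or (TRANSLATED) tag is dropped before the lookup.

```python
CANDIDATES = frozenset([
    'Bush', 'Carson', 'Chafee', 'Christie', 'Clinton', 'Cruz', 'Fiorina',
    'Gilmore', 'Graham', 'Huckabee', 'Jindal', 'Kasich', "O'Malley",
    'Pataki', 'Paul', 'Perry', 'Rubio', 'Sanders', 'Santorum', 'Trump',
    'Walker', 'Webb',
])
MODERATORS = frozenset([
    'Arrarás', 'Baier', 'Baker', 'Bartiromo', 'Bash', 'Blitzer', 'Cavuto',
    'Cooney', 'Cooper', 'Cordes', 'Cramer', 'Cuomo', 'Dickerson', 'Dinan',
    'Epperson', 'Garrett', 'Ham', 'Hannity', 'Harwood', 'Hemmer', 'Hewitt',
    'Holt', 'Ifill', 'Kelly', 'Lemon', 'Levesque', 'Lopez', 'Louis',
    'MacCallum', 'Maddow', 'Mcelveen', 'Mitchell', 'Muir', "O'Reilly",
    'Obradovich', 'Quick', 'Quintanilla', 'Raddatz', 'Ramos', 'Regan',
    'Salinas', 'Santelli', 'Seib', 'Strassel', 'Tapper', 'Todd', 'Tumulty',
    'Wallace', 'Woodruff',
])
NON_SPEAKERS = frozenset(
    ['AUDIENCE', 'CANDIDATES', 'OTHER', 'QUESTION', 'UNKNOWN']
)


def speaker_role(value):
    """Label a Speaker value as candidate, moderator, or non-speaker."""
    name = value
    # Drop a trailing tag so "Cuomo (VIDEO)" is looked up as "Cuomo".
    for tag in (" (VIDEO)", " (TRANSLATED)"):
        if name.endswith(tag):
            name = name[: -len(tag)]
            break
    if name in CANDIDATES:
        return "candidate"
    if name in MODERATORS:
        return "moderator"
    if name in NON_SPEAKERS:
        return "non-speaker"
    return "unrecognized"


print(speaker_role("Cuomo (VIDEO)"))
```

Any value that falls through to "unrecognized" is worth inspecting by hand, since it may point to a spelling variant in the Speaker column.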

What's What?

(aka, the Text column)

In general, the Text column contains fully punctuated and appropriately capitalized speech transcriptions. Most parenthesized elements are non-speech elements, with the following exceptions:

  • (c) and (4): spoken in reference to 501(c)(4)s (tax-exempt lobbying groups)

  • (k): spoken in reference to 401(k)s (individual pension accounts)

The non-speech elements that can be found in parentheses in the Text column are:

  • (ANTHEM): the national anthem is played

  • (APPLAUSE): the audience expresses approval

  • (BELL): a bell or buzzer indicating that a candidate's time has expired

  • (BOOING): the audience expresses disapproval

  • (COMMERCIAL): the televised debate breaks for a commercial advertisement

  • (CROSSTALK): more than one candidate or moderator is speaking at the same time

  • (LAUGHTER): the audience laughs

  • (MOMENT.OF.SILENCE): the debate pauses for a moment of silence

  • (SPANISH): the utterance is in Spanish (for the Democrats' Univision-hosted debate on 3/9/16 in Miami)

  • (VIDEO.END): a video clip ends

  • (VIDEO.START): a video clip begins

  • (inaudible): the utterance was inaudible, off-mike, or too indecipherable to transcribe
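Before counting words or building vocabularies, it usually helps to strip these annotations out of the Text column while leaving spoken forms like 401(k) and 501(c)(4) alone. The sketch below does this by matching only the exact tags listed above; the function names and the sample sentence are hypothetical.

```python
import re

# The non-speech tags listed above, matched exactly (case-sensitive).
ANNOTATIONS = (
    "ANTHEM", "APPLAUSE", "BELL", "BOOING", "COMMERCIAL", "CROSSTALK",
    "LAUGHTER", "MOMENT.OF.SILENCE", "SPANISH", "VIDEO.END", "VIDEO.START",
    "inaudible",
)
_ANNOTATION_RE = re.compile(
    r"\((?:" + "|".join(re.escape(tag) for tag in ANNOTATIONS) + r")\)"
)


def strip_annotations(text):
    """Remove non-speech tags and collapse the leftover whitespace."""
    without_tags = _ANNOTATION_RE.sub(" ", text)
    return " ".join(without_tags.split())


def count_spoken_words(text):
    """Count whitespace-separated tokens after removing annotations."""
    return len(strip_annotations(text).split())


# Made-up sample sentence, not a quote from any debate.
sample = "Thank you. (APPLAUSE) We need 401(k) reform (inaudible) now."
print(strip_annotations(sample))
```

Matching an explicit list rather than any parenthesized text is what keeps (c), (4), and (k) intact.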

