Television Explorer: Television News As Data

In collaboration with the Internet Archive's Television News Archive, GDELT's Television Explorer allows you to keyword search the closed captioning streams of the Archive's six years of American television news and explore macro-level trends in how America's television news is shaping the conversation around key societal issues. Unlike the Archive's primary Television News interface, which returns results at the level of an hour or half-hour "show," the interface here reaches inside those six years of programming, breaks the more than one million shows into individual sentences and counts how many of those sentences contain your keyword of interest. Instead of reporting that CNN had 24 hour-long shows yesterday that mentioned Donald Trump, the interface here will count how many sentences uttered on CNN yesterday mentioned his name - a much finer-grained measure of media attention.

No quotation marks are necessary: your primary keyword is treated as an exact phrase match. You can use commas to specify multiple primary keywords, which are OR'd together. You can also provide an optional "context" keyword or set of keywords (multiple keywords are likewise OR'd together) to further refine your results by counting only cases where your primary keyword appears within four sentences of a context keyword. For example, if you search for "clinton" as your primary keyword, the resulting timeline will count every mention of her name, regardless of context. But what if you want to examine when discussion of her began to focus on her email server? To count just mentions of her in context with her emails, you would specify "clinton" as your Primary Keyword and "email,emails" as your Context Keywords. This will run the equivalent of "clinton NEAR (email OR emails)". At least one of your Context Keywords must appear within four sentences before or after an instance of your Primary Keyword for it to count. This typically works out to a window of around 30 seconds of air time.
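The NEAR matching described above can be sketched in a few lines of Python. This is a minimal illustration, not the Explorer's actual code: the function name count_context_matches is hypothetical, matching is done by simple case-insensitive substring search (an assumption), and the window is assumed to include the sentence containing the primary keyword itself.

```python
def count_context_matches(sentences, primary, context=(), window=4):
    """Count sentences that contain any primary phrase and, when context
    phrases are given, have at least one of them within `window` sentences
    before or after (hypothetical helper, substring matching assumed)."""
    primary = [p.lower() for p in primary]
    context = [c.lower() for c in context]
    lowered = [s.lower() for s in sentences]
    count = 0
    for i, sentence in enumerate(lowered):
        # Primary keywords are OR'd together.
        if not any(p in sentence for p in primary):
            continue
        if not context:
            count += 1
            continue
        # Look at up to `window` sentences on either side (and this one).
        start = max(0, i - window)
        end = min(len(lowered), i + window + 1)
        nearby = lowered[start:end]
        # Context keywords are also OR'd together.
        if any(c in s for s in nearby for c in context):
            count += 1
    return count
```

Calling count_context_matches(sentences, ["clinton"], ["email", "emails"]) would then mirror the "clinton NEAR (email OR emails)" query, while leaving the context list empty counts every mention.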

We define a "sentence" as a string of words spanning one or more captioning timecode lines and ending with a period, question mark or exclamation mark. After extensive experimentation we found that this provides a highly robust and reliable segmentation metric that works across all 150 monitored stations over all six years. Do keep in mind, however, that you are searching the raw unedited closed captioning transcript of each show - typographical and other errors in the transcript mean search results are not 100% perfect. Also note that since the number of sentences can vary day-to-day, we strongly recommend that you set the "Timeline As" option to "Percent Of All Sentences," which sums up the total number of all sentences monitored from the matching stations over the selected period and divides the number of matching sentences by that total. This reports media attention as a percentage of all output of the monitored stations, ensuring that any day-to-day fluctuations are normalized away. You might still see an odd result here or there if there was an issue with monitoring of a station on a particular day (for example, a few missing or garbled transcripts), so if you see a few isolated one-day spikes or plunges in your data this could be the reason. A time resolution of one day is used for all searches spanning more than 7 days; for searches of 7 days or less, an hourly resolution (in the UTC timezone) is used to better highlight fine-grained trends. You can also download the JSON data and graph it at even higher resolution using your own tools.
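As a rough illustration of the sentence definition and the "Percent Of All Sentences" normalization described above, here is a minimal Python sketch. The function names are hypothetical, and the splitting rule is a simplification: any period, question mark or exclamation mark ends a sentence (so abbreviations would be split too), and trailing text with no terminator is dropped.

```python
import re

def split_sentences(caption_lines):
    """Join caption lines and split on '.', '?' or '!' (simplified rule)."""
    text = " ".join(line.strip() for line in caption_lines if line.strip())
    # Each match is a run of non-terminator characters plus its terminator(s).
    parts = re.findall(r"[^.?!]+[.?!]+", text)
    return [part.strip() for part in parts]

def percent_of_sentences(matching, total):
    """Matching sentences as a percentage of all monitored sentences."""
    return 100.0 * matching / total if total else 0.0
```

A sentence may span several caption lines, which is why the lines are joined before splitting. Dividing by the total number of monitored sentences is what keeps days with more or less airtime comparable.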

For more sophisticated comparison analyses, you can choose to set Timeline As to Raw Counts and download the results in CSV format. You could then run multiple searches for different keywords and compare their relative coverage in a spreadsheet tool like Excel, or run more sophisticated analyses in a tool like R. This also allows you to perform more advanced contextualization analyses. For example, you could run a search for "clinton" and download the raw counts. Then run a search for "clinton" with Context Keywords set to "email,emails" as in the example above, and again download the raw counts as a CSV file. Now copy and paste the results into a single Excel spreadsheet and divide the results from the second query by the results of the first query for each day. The result will be the percent of all mentions of clinton that also mentioned her email troubles nearby. This is different from running the "clinton"/"email,emails" query with Timeline As set to Percent Of All Sentences, since that reports the count as a percent of all monitored sentences from the monitored stations, rather than as a percent of just the clinton mentions. Using this workflow you can perform all kinds of advanced analyses. You can also click-drag on the timeline below to zoom into just a portion of the timeline.
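The same division can be done outside a spreadsheet. The Python sketch below assumes each downloaded CSV has one row per day with columns named "Date" and "Value"; those column names are hypothetical and should be adjusted to match the headers in the files you actually download.

```python
import csv
import io

def read_counts(csv_text, date_col="Date", value_col="Value"):
    """Parse raw-count CSV text into {date: count} (column names assumed)."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return {row[date_col]: float(row[value_col]) for row in reader}

def share_by_day(all_counts, context_counts):
    """Percent of each day's mentions that also matched the context query."""
    shares = {}
    for day, total in all_counts.items():
        if total > 0:
            shares[day] = 100.0 * context_counts.get(day, 0.0) / total
        else:
            # No mentions that day, so the share is undefined.
            shares[day] = None
    return shares
```

Here all_counts would come from the plain "clinton" query and context_counts from the "clinton" query with "email,emails" as context. Days with zero mentions are reported as None rather than dividing by zero.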

NOTE: Not all television stations are monitored for the entire time period and some may have brief outages. Many of the affiliate stations were monitored for only a few months. See the Station Start/Stop Dates list for the precise dates each station's monitoring began and ended (stations with an "end" date in the last few days are still being actively monitored). Some affiliate stations were monitored for a few months, then not monitored for a few months, then monitored again towards the end of the 2016 presidential campaign, so there may be gaps. When using affiliate stations, try several different queries if you see a stretch of a month or two of zero hits in the middle of your results.

Email kalev.leetaru5@gmail.com with any questions. Permission is granted for any and all use of these graphs in media reports and academic research. Please cite "Analysis by the GDELT Project using data from the Internet Archive Television News Archive." This tool would not be possible without the incredible work of the Internet Archive's Television News Archive to monitor and preserve American television news.



Trending Topics

To get you started, here is a list of some of the top topics that have been trending over the past 24 hours compared with the preceding 24 hours (updated every 15 minutes using a rolling 48 hour window). Click on any topic below to see how it has been covered on television. Note - these topics are algorithmically assigned by the Internet Archive based primarily on noun phrase extraction using an adaptation of the Stanford Named Entity Recognizer.


russia mueller paul ryan marco rubio iran fbi omarosa rupert murdoch disney fox rosenstein brussels comcast china england netflix john mccain at&t bob iger grenfell tower austin vladimir putin australia entyvio laura mario draghi scotland cnn clinton joe biden amazon europe london dave espn oracle syria nikki haley moscow fcc mike lee jim jordan bbc napolitano otezla texas sarah huckabee sanders verizon obama xarelto charlie


Trending Topics By Station

To get you started, here is a list of some of the top topics that have been trending over the past 24 hours compared with the preceding 24 hours (updated every 15 minutes using a rolling 48 hour window). Click on any topic below to see how it has been covered on television. Note - these topics are algorithmically assigned by the Internet Archive based primarily on noun phrase extraction using an adaptation of the Stanford Named Entity Recognizer.




Top Topics By Station

To get you started, here is a list of some of the top topics being covered the most by each national station over the last 24 hours (updated every 15 minutes using a rolling 24 hour window). The lists below reflect what each station is talking about the most overall, rather than attempting to surface "trending" topics - in short, this list may change more slowly than the trending topics list above, since a station may focus on the same topic for several days in a row. Click on any topic below to see how it has been covered on television. Note - these topics are algorithmically assigned by the Internet Archive based primarily on noun phrase extraction using an adaptation of the Stanford Named Entity Recognizer.




Top Phrases

To get you started, here is a list of some of the top phrases that have been trending over the past 24 hours compared with the preceding 24 hours (updated every 15 minutes using a rolling 48 hour window). To calculate this list, the system takes each line of all national network transcripts that aired in the last 24 hours, generates a series of 4-grams, drops any stopwords from each and then compares its popularity with all transcripts from the preceding 24 hours. Typically the phrases below will reflect a mixture of topics and trending quotations/memes. Click on any phrase below to see how it has been covered on television.
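The phrase calculation described above might look roughly like the following Python sketch. Several details are assumptions, since the text does not specify them: the stopword list here is a tiny illustrative one, stopwords are removed everywhere in each 4-gram (rather than only at its edges), and "popularity" is compared with a simple ratio of current count to previous count plus one. The function names are hypothetical.

```python
from collections import Counter

# Tiny illustrative stopword list; the real list is not given in the text.
STOPWORDS = {"a", "an", "and", "the", "of", "to", "for", "in", "is"}

def phrase_counts(lines, n=4):
    """Count n-grams per transcript line after dropping stopwords."""
    counts = Counter()
    for line in lines:
        words = line.lower().split()
        for i in range(len(words) - n + 1):
            kept = [w for w in words[i:i + n] if w not in STOPWORDS]
            if kept:
                counts[" ".join(kept)] += 1
    return counts

def trending(current_lines, previous_lines, top=5):
    """Rank phrases by current count relative to the previous period."""
    now = phrase_counts(current_lines)
    before = phrase_counts(previous_lines)
    # Add one to the denominator so brand-new phrases do not divide by zero.
    scored = {p: c / (before[p] + 1) for p, c in now.items()}
    return sorted(scored, key=lambda p: (-scored[p], p))[:top]
```

Because the 4-grams are built per line and stopwords are then removed, the resulting phrases can be shorter than four words, which is consistent with the mix of two- to four-word phrases in the list.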


white house child tax credit giant tax cut american people tax bill public trust middle class washington post st century fox tax cut for christmas corporate tax rate stock market insurance policy profound story child tax dads courage awaiting certification service and security sync keeping tax cut house and senate middle east path you threw made me uncomfortable house speaker paul ryan close race african american woman make me feel make me show national security counsel deeply and emotionally affected me deeply bank of england giant i mean giant affected my community bigger paychecks beginning long time american people a giant give you the american people a giant tax highest levels things i love national security things that gave top individual rate senior staff president trump director mueller understands russian president vladimir putin responsibility to make received calls


Search Options

Use the options below to specify a primary keyword or phrase (no quotes needed) and an optional set of comma-separated words/phrases (no quotes needed) that must appear within four sentences of your keyword to narrow its context. You can use the controls below to fine-tune your search.


Primary Keyword/Phrase

Context Keyword(s)/Phrase(s)

Television Network

Time Period

Timeline As

Combined/Separate Timeline

Output Format



Error

There was an error with your query:


    Your keyword(s) were either too long or too short. All keywords must be a minimum of 3 characters and a maximum of 50 characters.