Television Explorer: Television News As Data

In collaboration with the Internet Archive's Television News Archive, GDELT's Television Explorer allows you to keyword search the closed captioning streams of the Archive's six years of American television news and explore macro-level trends in how America's television news is shaping the conversation around key societal issues. Unlike the Archive's primary Television News interface, which returns results at the level of an hour or half-hour "show," the interface here reaches inside those six years of programming, breaks the more than one million shows into individual sentences, and counts how many of those sentences contain your keyword of interest. Instead of reporting that CNN had 24 hour-long shows yesterday that mentioned Donald Trump, the interface here will count how many sentences uttered on CNN yesterday mentioned his name - a vastly more accurate metric for assessing media attention.

No quotation marks are necessary: your primary keyword is treated as an exact phrase match. You can use commas to specify multiple primary keywords, which are OR'd together. You can also provide an optional "context" keyword or set of keywords (multiple keywords are likewise OR'd together) to further refine your results by counting only cases where your primary keyword appears within four sentences of a context keyword. For example, if you search for "clinton" as your primary keyword, the resulting timeline will count every mention of her name, regardless of context. But what if you want to examine when discussion of her began to focus on her email server? To count just mentions of her in context with her emails, you would specify "clinton" as your Primary Keyword and "email,emails" as your Context Keywords. This will run the equivalent of "clinton NEAR (email OR emails)". At least one of your Context Keywords must appear within four sentences before or after an instance of your Primary Keyword for that instance to count. This typically works out to a window of around 30 seconds of air time.
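The proximity rule described above can be sketched in a few lines of Python. This is only an illustration of the "clinton NEAR (email OR emails)" logic, not the tool's actual implementation: the function names are hypothetical, and the sketch assumes case-insensitive whole-word matching and a window that includes the matching sentence itself.

```python
import re

def contains_any(sentence, keywords):
    # Assumption: case-insensitive, whole-word/phrase match.
    return any(
        re.search(r"\b" + re.escape(k) + r"\b", sentence, re.IGNORECASE)
        for k in keywords
    )

def count_matching_sentences(sentences, primary_keywords,
                             context_keywords=(), window=4):
    """Count sentences containing any primary keyword. If context
    keywords are given, count a sentence only when some context keyword
    appears within `window` sentences before or after it."""
    hits = 0
    for i, sentence in enumerate(sentences):
        if not contains_any(sentence, primary_keywords):
            continue
        if context_keywords:
            start = max(0, i - window)
            end = min(len(sentences), i + window + 1)
            nearby = sentences[start:end]
            if not any(contains_any(s, context_keywords) for s in nearby):
                continue
        hits += 1
    return hits

# Made-up sentences, for illustration only.
example_sentences = [
    "Clinton spoke in Ohio.",
    "The crowd was large.",
    "Reporters asked about the emails.",
    "Then the weather.",
    "Sports came next.",
    "Traffic was slow.",
    "More traffic news.",
    "Markets closed higher.",
    "Clinton returns tomorrow.",
]

all_mentions = count_matching_sentences(example_sentences, ["clinton"])
near_email = count_matching_sentences(
    example_sentences, ["clinton"], ["email", "emails"]
)
```

In this made-up example the first mention of "clinton" is two sentences away from "emails" and is counted, while the last mention has no context keyword within four sentences and is not.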

We define a "sentence" as a string of words spanning one or more captioning timecode lines and ending with a period, question mark or exclamation mark. After extensive experimentation we found that this provides a highly robust and reliable segmentation metric that works across all 150 monitored stations over all six years. Do keep in mind, however, that you are searching the raw unedited closed captioning transcript of each show - typographical and other errors in the transcript mean search results are not 100% perfect. Also note that since the number of sentences can vary day-to-day, we strongly recommend that you set the "Timeline As" option to "Percent Of All Sentences," which sums up the total number of all sentences monitored from the matching stations over the selected period and divides the number of matching sentences by that total. This reports media attention as a percent of all output of the monitored stations, ensuring that any day-to-day fluctuations are normalized away. You might still see an odd result here or there if there was an issue with monitoring of a station during a particular day (for example, a few missing or garbled transcripts), so if you see a few isolated one-day spikes or plunges in your data this could be the reason. A time resolution of one day is used for all searches spanning more than 7 days. For searches of 7 days or less, an hourly resolution (in the UTC timezone) is used to better highlight high-resolution trends. You can also download the JSON data and graph it at even higher resolution using your own tools.
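As a rough illustration of the sentence definition and the "Percent Of All Sentences" normalization described above, the following Python sketch joins caption lines, splits on terminal punctuation, and divides matching sentences by total sentences. The function names and sample captions are hypothetical, and the real segmentation may handle cases this sketch does not: abbreviations such as "U.S." would be split here, and trailing text with no terminal punctuation is dropped.

```python
import re

def split_sentences(caption_lines):
    # A sentence may span several caption lines, so join them first,
    # then cut at each period, question mark or exclamation mark.
    text = " ".join(line.strip() for line in caption_lines)
    return [s.strip() for s in re.findall(r"[^.?!]+[.?!]+", text)]

def percent_of_all_sentences(matching, total):
    # Matching sentences as a percent of all monitored sentences.
    return 100.0 * matching / total if total else 0.0

caption_lines = [
    "THE PRESIDENT SPOKE",
    "TODAY. WHAT DID HE",
    "SAY? STAY TUNED!",
]
sentences = split_sentences(caption_lines)

# Two hypothetical days with different volumes of monitored sentences:
# the raw counts differ, but the normalized attention is the same.
busy_day = percent_of_all_sentences(12, 4800)
quiet_day = percent_of_all_sentences(6, 2400)
```

The raw counts of 12 and 6 would suggest attention halved on the second day, while the normalized figures show the same share of airtime on both days.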

For more sophisticated comparison analyses, you can choose to set Timeline As to Raw Counts and download the results in CSV format. You could then run multiple searches for different keywords and compare their relative coverage in a spreadsheet tool like Excel or run more sophisticated analyses in a tool like R. This also allows you to perform more advanced contextualization analyses. For example, you could run a search for "clinton" and download the raw counts. Then, run a search for "clinton" with Context Keywords set to "email,emails" like the example above, and again download the raw counts as a CSV file. Now, copy and paste the results into a single Excel spreadsheet and divide the results from the second query by the results of the first query for each day. The result will be the percent of all mentions of clinton that also mentioned her email troubles nearby. This is different from running the "clinton"/"email,emails" query with Timeline As set to Percent Of All Sentences, since that reports the count as a percent of all monitored sentences from the monitored stations, rather than as a percent of just clinton mentions. Using this workflow you can perform all kinds of advanced analyses. You can also click-drag on the timeline below to zoom into just a portion of the timeline.
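The same division can be done outside a spreadsheet. The Python sketch below assumes each downloaded CSV has a "date" column and a "count" column. That layout is an assumption made for illustration, as are the numbers, so adjust the column names to match the file you actually download.

```python
import csv
import io

def read_counts(csv_text, date_col="date", count_col="count"):
    # Column names are hypothetical; change them to match the real file.
    reader = csv.DictReader(io.StringIO(csv_text))
    return {row[date_col]: float(row[count_col]) for row in reader}

def context_share(all_counts, context_counts):
    """Percent of each day's primary-keyword mentions that also fell
    near a context keyword. Days with zero mentions map to None."""
    share = {}
    for day, total in all_counts.items():
        matched = context_counts.get(day, 0.0)
        share[day] = 100.0 * matched / total if total else None
    return share

# Made-up raw counts standing in for the two downloads.
clinton_csv = "date,count\n2016-03-01,200\n2016-03-02,0\n2016-03-03,50\n"
clinton_email_csv = "date,count\n2016-03-01,50\n2016-03-03,5\n"

share = context_share(read_counts(clinton_csv),
                      read_counts(clinton_email_csv))
```

Days on which the primary keyword was never mentioned are reported as None rather than zero, since the ratio is undefined for them.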

NOTE: Not all television stations are monitored for the entire time period and some may have brief outages. Many of the affiliate stations were monitored for only a few months. See the Station Start/Stop Dates list for the precise dates each station's monitoring began and ended (stations with an "end" date in the last few days are still being actively monitored). Note that some affiliate stations were monitored for a few months, then not monitored for a few months, then monitored again towards the end of the 2016 presidential campaign, so there may be gaps. When using affiliate stations, try several different queries if you see a stretch of a month or two of zero hits in the middle of your results.

Email with any questions. Permission is granted for any and all use of these graphs in media reports and academic research. Please cite "Analysis by the GDELT Project using data from the Internet Archive Television News Archive." This tool would not be possible without the incredible work of the Internet Archive's Television News Archive to monitor and preserve American television news.

Search Options

Use the options below to specify a primary keyword or phrase (no quotes needed) and an optional set of comma-separated words/phrases (no quotes needed) that must appear within four sentences of your keyword to narrow its context. You can use the controls below to fine-tune your search.

Primary Keyword/Phrase

Context Keyword(s)/Phrase(s)

Television Network

Time Period

Timeline As

Combined/Separate Timeline

Output Format