Studying the Distribution of Computational Propaganda with SerpApi

Detecting fake news and misinformation is hard. But detecting propaganda is harder. Propaganda is the orchestrated and organised manipulation of opinions by governments and large political parties to persuade, change and influence the beliefs of the population. Computational propaganda, in particular, is propaganda pushed by computational means. How on earth can we detect that?

At the Stratosphere Laboratory, we set out to detect whether a news article is propaganda by leveraging a new idea: finding which other sites link to or reference the news article. In this blog post, we show how we accomplished this using SerpApi.

Detecting Computational Propaganda Through Distribution Patterns

Analysing the distribution pattern of a propaganda article is important because the entity behind the propaganda needs wide exposure for its ideas, and therefore needs to promote the article through as many links as possible. It can achieve this by publishing the article on many different web pages, on social networks as well as on news sites and blogs.

In order to find the distribution pattern, it is necessary to find which other URLs and web pages are linking TO the article. This is fundamentally different from the more common search of which web pages are linked from the article. We want to find all the URLs (articles) that link to the original article. The following diagram shows this difference.

Difference between extracting links from a URL, and to a URL

Searching on the Internet is Not that Easy

To find which URLs link to the news article, we need the power of search engines. Search engines routinely crawl the Internet, parse web pages, and index their content and links. Therefore, our first proposed technique is to search for the URL of the news article on search engines, which tells us who is linking to it.

However, this only gives us the web pages that link directly to the URL, not the web pages that talk about the news article without linking to it. In propaganda distribution, it is very common for pages to have the same title and text without linking to each other. So our second proposed technique is to search the search engines for the title of the news article.

For this to work, we would need to browse all the search results on all the search engines (results differ not only between Google and Yandex, but also depending on the country you search from). We need some kind of automated search for this, in particular because search engines do not allow direct access to all their results.
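As a rough sketch of what such an automated search has to cover, the snippet below enumerates one search request per combination of search engine and country. The function name `build_search_requests` is hypothetical, and the per-engine parameter names (`q` for Google and Bing, `text` for Yandex, `gl` and `cc` for the country) are assumptions that should be checked against the SerpApi documentation for each engine.

```python
# Assumed per-engine parameter names; verify them against the SerpApi documentation.
QUERY_PARAM = {"google": "q", "bing": "q", "yandex": "text"}
# Engines without an entry here are queried once, with no country setting.
COUNTRY_PARAM = {"google": "gl", "bing": "cc"}


def build_search_requests(query, engines, countries, api_key):
    """Return one parameter dictionary per (engine, country) combination."""
    requests = []
    for engine in engines:
        base = {"engine": engine, QUERY_PARAM[engine]: query, "api_key": api_key}
        country_key = COUNTRY_PARAM.get(engine)
        if country_key is None:
            requests.append(base)
            continue
        for country in countries:
            # Copy the base parameters and add the country for this request
            requests.append({**base, country_key: country})
    return requests


requests = build_search_requests(
    "Title of the news article",
    ["google", "bing", "yandex"],
    ["us", "cz"],
    "YOUR_API_KEY",  # placeholder, not a real key
)
for params in requests:
    print(params["engine"], {k: v for k, v in params.items() if k != "api_key"})
```

Each dictionary in the list could then be sent as a separate search, which makes it easy to compare what different engines and countries return for the same article.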

SerpApi To The Rescue

After a long search, we decided to use the services of the company SerpApi (https://serpapi.com/). Not only have they implemented a very easy and fast API, but they also let you access all the different parts of the search results, such as Google Cache, Images, Knowledge Graph, Local Search, Related Questions, etc. On top of that, it works perfectly with Google, Yahoo, Yandex, Bing, Baidu, DuckDuckGo, eBay, YouTube, Walmart, Home Depot, Apple, Naver, etc.

Are you interested in getting the results for Best Movie 2020 using Google? Register on their site, get a free API key, and run this in Python:

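# Requires the google-search-results package (pip install google-search-results)
from serpapi import GoogleSearch

API_KEY = "YOUR_API_KEY"  # placeholder: use the key from your SerpApi account

search_parameters = {
    "q": "Best Movie 2020",
    "engine": "google",
    "api_key": API_KEY,
}

search = GoogleSearch(search_parameters)
results = search.get_dict()  # run the search and get the results as a dictionary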
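The search object only describes the request; in the `google-search-results` Python package, calling `get_dict()` on it performs the search and returns the parsed response as a dictionary. The sketch below shows how the regular results could be read from such a dictionary. It assumes the response holds them under `organic_results`, each with `position`, `title`, and `link` fields; the helper `extract_links` and the sample dictionary are made up for illustration.

```python
def extract_links(results):
    """Return (position, title, link) tuples from a SerpApi-style response dictionary."""
    links = []
    for item in results.get("organic_results", []):
        link = item.get("link")
        if link:  # skip entries without a URL
            links.append((item.get("position"), item.get("title"), link))
    return links


# Hand-written sample standing in for the output of search.get_dict()
sample_results = {
    "organic_results": [
        {"position": 1, "title": "First result", "link": "https://example.com/a"},
        {"position": 2, "title": "Second result", "link": "https://example.org/b"},
    ]
}

for position, title, link in extract_links(sample_results):
    print(position, title, link)
```

With a real API key, `extract_links(search.get_dict())` would be used in place of the sample dictionary.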

Are you interested in a DIY bookshelf? And since you want to be original, do you want to see what Baidu shows around the 35th result? Easy:

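search_parameters = {
    "q": "DIY Bookshelf",
    "engine": "baidu",
    "pn": "35",  # result offset used by Baidu for pagination
    "api_key": API_KEY,
}

search = GoogleSearch(search_parameters)  # the "engine" parameter selects Baidu
results = search.get_dict()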

This is just the tip of the iceberg of what SerpApi can do. You can also use Java, Node.js, or other programming languages. If you don’t want to code at all, just use their Google Sheets extension. This makes automatic searching a piece of cake.

SerpApi for Finding Computational Propaganda

Using SerpApi, we can then search the Internet for the URLs that link to or refer to the news article we want to study, by querying the search engines for both the article's URL and its title.
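A minimal sketch of how the two techniques described earlier (searching for the article's URL and searching for its title) could be combined is shown below. It is an illustration under stated assumptions, not the exact code used in our research: the function `find_referring_urls`, the exact-phrase quoting of the title, and the `organic_results` and `link` response fields are assumptions. The `run_search` argument is any callable that takes a parameter dictionary and returns a response dictionary, so the sketch can be tried with a stub before sending real requests.

```python
def serpapi_search(params):
    """Run one search through SerpApi (needs the google-search-results package)."""
    from serpapi import GoogleSearch  # imported lazily so the stub below works without it
    return GoogleSearch(params).get_dict()


def find_referring_urls(article_url, article_title, api_key, run_search=serpapi_search):
    """Collect URLs of pages that link to the article or repeat its title."""
    queries = [
        article_url,           # technique 1: pages that mention or link to the URL
        f'"{article_title}"',  # technique 2: pages with the same title (exact phrase)
    ]
    found = set()
    for query in queries:
        results = run_search({"engine": "google", "q": query, "api_key": api_key})
        for item in results.get("organic_results", []):
            link = item.get("link")
            # Ignore the article itself; we only want the other pages
            if link and link.rstrip("/") != article_url.rstrip("/"):
                found.add(link)
    return sorted(found)


def fake_search(params):
    """Stub that mimics a response dictionary, used here instead of a real request."""
    return {"organic_results": [
        {"link": "https://example.com/news/article"},
        {"link": "https://blog.example.org/repost"},
    ]}


urls = find_referring_urls(
    "https://example.com/news/article",
    "Some headline",
    "YOUR_API_KEY",  # placeholder, not a real key
    run_search=fake_search,
)
print(urls)
```

Leaving out the `run_search` argument would send the two queries to SerpApi with the given API key. The same function could be called once per engine and country to build a fuller picture of how the article is distributed.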

Conclusions

When trying to stop computational propaganda on the Internet, searching across many social networks and search engines is crucial. We thank SerpApi for their support of this research.