I’m fascinated by the potential of news as a data source. The quantity of data is breathtaking: there are billions of news articles and blog posts on the web, with hundreds of thousands more published every day. Articles from established outlets are typically fact-checked, and editorially they’re probably some of the best content to be found on the Internet. They usually cover breaking news or highly relevant topics of interest, and they’re produced by serious news organizations and reporters who take special care to create diverse, accurate content. Finally, they’re designed to appeal to readers and web users. All of this means they can be used in all kinds of interesting ways.
For example, news data from various internet outlets can be unified in a dynamic, continuous stream representing a range of perspectives and opinions. News from across the web can be filtered to any set of topics or interests. Alerts can be set up to trigger whenever a certain person or brand is mentioned on a news site.
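As a rough illustration of the alerting idea, here is a minimal sketch that scans a batch of articles for watched names. The article dictionaries, their `title` and `body` keys, and the `find_mentions` helper are all hypothetical, made up for this example rather than taken from any provider’s API.

```python
import re

def find_mentions(articles, watch_terms):
    """Return (term, title) pairs for every article that mentions a watched term."""
    # Whole-word, case-insensitive patterns, so "IBM" does not match "IBMX".
    patterns = {
        term: re.compile(r"\b" + re.escape(term) + r"\b", re.IGNORECASE)
        for term in watch_terms
    }
    hits = []
    for article in articles:
        text = article["title"] + " " + article["body"]
        for term, pattern in patterns.items():
            if pattern.search(text):
                hits.append((term, article["title"]))
    return hits

# Made-up sample data standing in for a batch of freshly fetched articles.
sample_articles = [
    {"title": "IBM unveils new quantum chip", "body": "The company said ..."},
    {"title": "Local bakery wins award", "body": "Nothing about tech here."},
]

alerts = find_mentions(sample_articles, ["IBM", "David Haye"])
for term, title in alerts:
    print(f"ALERT: '{term}' mentioned in: {title}")
```

In a real setup the batch would come from a news API on a schedule, and the print call would be replaced by an email or chat notification.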
The potential scope is particularly breathtaking when combined with AI and machine learning techniques. Investment decisions can be informed by scraping oil or crypto news data; public opinion can be gauged by collecting all news mentioning a political figure and running sentiment analysis algorithms on top; personalization engines can recommend articles similar to the ones a user has already read and enjoyed.
All the above is great fun, but to do anything with search, you’d probably want to start off by finding a good news API provider. Fortunately, there are several such providers online. We evaluated them on the following parameters:
- Result accuracy: Results need to be highly relevant to the search terms entered. Where possible, we added actual result examples for popular queries to compare result quality. There is nothing more important than accurate, relevant search results.
- Dataset size: Results should be drawn from as large a dataset as possible. It’s impossible to rely on an API that draws data from a small selection of sites, because it would never accurately reflect the most relevant news from around the web.
- Breaking news: Results should be up-to-date, and newly published articles should be promptly added to the database. The best news API would crawl new articles published on the major news sites instantly, or at least several times per hour.
- Price: A generous free plan should be included, and the price for large request volumes should still be relatively affordable.
- API features: If the API offers any standout features, it gets extra points.
- Privacy: We’re in an era where privacy is a top priority. The APIs we review should have top-notch privacy protection procedures in place.
Without further ado, let’s jump to it — 2020’s top news APIs! In this article, we’ll review…
- ContextualWeb News API
- Newsapi.org
- Aylien News API
- Microsoft Bing News API
- Newsriver
- Google News (via GNews)
ContextualWeb is a rising Search API company offering search APIs in multiple flavors — one of them a news search API. ContextualWeb uses a cutting-edge search technology that has its roots in neuroscience – taking inspiration from the way the human brain indexes and retrieves memories. This approach allows for rapid search in enormous datasets and contextual search results.
The ContextualWeb News API offers a web demo interface to make it easy to compare results with other API providers. Searching for “Coronavirus in the US” brought up results discussing the spread of the virus to the US, rather than results from US news outlets or results discussing the US as an aside.
Another search I tried was “IBM”. With ContextualWeb News API, every one of the first-page results was an article dealing specifically with IBM as a company. This is just one example of context-aware search: ContextualWeb doesn’t just look at keywords, it looks at the article as a whole. This contrasts with newsapi.org results, which are mostly articles that mention IBM in passing without focusing on it. One search result returned by newsapi.org was “On this day: Born April 13, 1963; Russian chess champion Garry Kasparov”, an article which mentions that Kasparov lost to IBM-built Deep Blue. This isn’t a result you’d expect to receive.
ContextualWeb News API can also handle far more complicated search queries, ones that other search engines really struggle with. The query “IT chapter two” (referring to the movie), for example, takes most search engines to all kinds of weird places, because they fail to recognize that “IT” is a movie name and not just a pronoun. With ContextualWeb News API, however, all first-page search results were spot on: recent news articles from major outlets discussing the IT movie, rather than pages where the search keywords appear haphazardly. The first result pointed to a news item dated very close to the writing of this article.
(By the way, I’m including screenshots from specific searches at the end of this article so you can compare for yourself without creating API keys.)
ContextualWeb crawls more than 100,000 news sources, and you will quickly notice that results come from a wide variety of sources, and that even niche topics get extensive coverage in our dataset – queries like local financial companies, Bollywood stars and the like get extensive search results. Regarding timeliness, some of the articles were published less than 20 minutes before being displayed in my search results.
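If you want to check timeliness yourself, a small sketch like the one below can compute how old each returned article is. The timestamps are invented sample values (not measurements from any provider), and the assumption that an API returns ISO 8601 publication times should be checked against each provider’s documentation.

```python
from datetime import datetime, timezone

def age_in_minutes(published_at, now):
    """Minutes elapsed between an ISO 8601 publication time and `now`."""
    published = datetime.fromisoformat(published_at)
    return (now - published).total_seconds() / 60

# Fixed "now" and made-up timestamps so the example is reproducible.
now = datetime(2020, 4, 13, 12, 0, tzinfo=timezone.utc)
sample_timestamps = ["2020-04-13T11:45:00+00:00", "2020-04-13T09:00:00+00:00"]

ages = [age_in_minutes(ts, now) for ts in sample_timestamps]
freshest = min(ages)
print(f"Freshest result is {freshest:.0f} minutes old")
```

Running this against the first page of results for a few queries gives a quick, if rough, picture of how quickly a provider picks up new articles.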
In terms of privacy, ContextualWeb ensures that it does not collect or store personal information, and that it is impossible to tie searches back to any particular user or engine, even on the back end. If requested, ContextualWeb can also ensure zero search leakage and history. It’s the most comprehensive privacy guarantee found among the search engines on this list.
Price: Free up to 10,000 requests per month, then starting at $0.50 per 1,000 requests. This makes ContextualWeb News API the most affordable solution we’ve been able to find.
Standout features: Entity extraction — great for machine learning applications
Newsapi.org is one of the oldest News API providers on this list. Since their founding, they have managed to develop a very impressive product: it is a sophisticated yet easy-to-use API that returns JSON metadata for headlines and news articles from all over the web.
According to their website, Newsapi.org crawls articles from over 30,000 sources worldwide. The API is fairly straightforward and does not attempt to include any bells and whistles. Documentation is on point.
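To get a feel for working with this kind of JSON response, here is a minimal sketch that counts how many distinct outlets appear in a result set. The response shape used here (an `articles` array whose items carry a nested `source.name`) is an assumption for illustration, and the sample payload is made up; check the provider’s documentation for the actual field names.

```python
import json
from collections import Counter

# Made-up payload; the field names are assumed, not copied from the docs.
sample_response = """
{
  "articles": [
    {"title": "Example headline A", "source": {"name": "Outlet One"}},
    {"title": "Example headline B", "source": {"name": "Outlet Two"}},
    {"title": "Example headline C", "source": {"name": "Outlet One"}}
  ]
}
"""

def count_sources(raw_json):
    """Count how many articles each source contributed to a response."""
    data = json.loads(raw_json)
    return Counter(article["source"]["name"] for article in data["articles"])

source_counts = count_sources(sample_response)
print(f"{len(source_counts)} distinct sources: {dict(source_counts)}")
```

A result page dominated by one or two outlets is a hint that the underlying dataset is narrower than advertised.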
On to the results test! I went ahead and created a free newsapi.org account to take their free tier for a spin. As it turned out, I didn’t have to do so, because the API key included in the docs is an actual working API key.
For my “coronavirus” search, the results were excellent: they originated in a variety of sources and were very up-to-date, with some results published minutes before I made the search. The first search result was an article from The Verge, a very reputable publication.
Now for more complicated searches! For my “IT chapter two” search, results were mixed. The first result was good (the same boingboing article found by ContextualWeb), but the second and third results were completely irrelevant articles that may have included the keywords “it”, “chapter” and “two” in their text but had nothing to do with the movie “IT chapter two”.
Another try: “detroit become human”, the title of a video game. In my test, the first search result was “The U.S. Should Stop Water Shutoffs During the Covid-19 Pandemic—and Forever”. The second? An article about the game, but in a foreign language.
These issues stem from an approach called “tf-idf” (term frequency–inverse document frequency), which most search engines based on Solr or Elasticsearch take. Tf-idf scores a page by how often the search keywords appear in it, weighted by how rare each keyword is across the whole collection. Some search engines, like newsapi.org, put an emphasis on the number of times certain keywords are mentioned on a webpage, prioritizing pages with the highest keyword frequency. This approach, which relies mostly on textual cues from webpages, stumbles as soon as any ambiguity is introduced to the search keywords, as we’ve shown above.
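To make the failure mode concrete, here is a toy tf-idf scorer in plain Python. It is a simplified sketch with a smoothed idf, not the actual ranking code of Solr, Elasticsearch, or any provider reviewed here, and the three documents are invented for the example.

```python
import math
from collections import Counter

def tokenize(text):
    """Lowercase and split on whitespace (deliberately naive)."""
    return text.lower().split()

def tfidf_scores(query, docs):
    """Score each document against the query with a basic tf-idf sum."""
    tokenized = [tokenize(doc) for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: in how many documents does each term occur?
    doc_freq = Counter()
    for tokens in tokenized:
        doc_freq.update(set(tokens))
    scores = []
    for tokens in tokenized:
        term_counts = Counter(tokens)
        score = 0.0
        for term in tokenize(query):
            if doc_freq[term] == 0:
                continue  # term appears nowhere in the collection
            tf = term_counts[term] / len(tokens)
            idf = math.log((1 + n_docs) / (1 + doc_freq[term])) + 1
            score += tf * idf
        scores.append(score)
    return scores

docs = [
    "review of the horror movie it chapter two",  # about the movie
    "it is chapter two and it is what it is",     # keyword-heavy, irrelevant
    "stock markets fell on monday",               # unrelated
]
scores = tfidf_scores("it chapter two", docs)
ranking = sorted(range(len(docs)), key=lambda i: scores[i], reverse=True)
print("Top result:", docs[ranking[0]])
```

With these sample documents the keyword-heavy sentence outranks the actual movie review, simply because the query words make up a larger share of its text. Nothing in the score knows that “it” is a film title here.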
One final search, for “David Haye”, the boxing champ, returned just two results. Contrast this with both Bing and ContextualWeb News API, which returned hundreds of results for this search query. This issue stems from the limited number of domains Newsapi.org crawls, and will recur with other niche searches.
Before moving on to the next API provider, I need to clarify that newsapi.org is an excellent news provider – if you’re looking for chunks of articles that mention popular search terms. The API is accessible and easy to use.
It is worth noting that newsapi.org does not provide a privacy guarantee.
Price: Free up to 500 requests per day, then starting at $449 for 250,000 requests per month.
Aylien is a full-featured “news intelligence platform” that offers News API as part of a suite of news-related services. Aylien claims to add up to 25 data enrichments to every article they ingest, giving you a huge number of search and filter options to find what matters to you. They even offer multilingual support and real-time news processing.
Aylien also offers a demo of their API on their website. The demo isn’t interactive and only lets you try out a few predefined search queries, though we found the results to be displayed beautifully. Since it doesn’t allow custom search queries, I went ahead and created an account.
Even after creating an account, there is no easy-to-use demo of the search engine. Using Postman, a desktop tool for querying APIs, I started making some searches. Searching for “coronavirus” yielded accurate results in that they included the keywords, but their relevance is debatable: the second search result linked to a podcast episode. Searching for “IBM” brought up many results that didn’t have much to do with the company, but merely referenced it in one way or another.
My tricky “it chapter two” search, however, tripped the engine completely, and it started returning irrelevant results, just like newsapi.org. These have nothing to do with the IT movie, for the same reasons this query trips up some of the other search engines in this review: Aylien seems to focus entirely on textual article analysis and to neglect contextual analysis.
If you’d like to try Aylien News Search API for yourself, please don’t hesitate to reach out and I’ll help you set up a demo through Postman.
Price: Free 14-day trial, then $49/mo for 30,000 requests.
Standout features: Rich JSON responses
Bing News API
Bing is Microsoft’s search engine and the world’s second most popular, nipping at Google’s heels. Bing offers an incredibly robust News API, with very accurate results and an enormous database. The only downside is its price: Bing charges $7 per 1,000 requests, far more than any of the competition.
Well, that’s not the only downside. Bing logs every query you make to improve its search results and analytics, which means privacy isn’t as top notch as you might like it to be.
Despite those drawbacks, Microsoft currently offers one of the strongest News APIs on the market with Bing, and it is particularly strong in result quality. I put the engine to the test. My “it chapter two” tripwire was barely noticed: the engine returned highly relevant results (though for some reason without thumbnails). A search for “detroit become human” also yielded relatively accurate results.
Price: $7 per 1,000 requests.
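To put the paid prices quoted so far on a common footing, here is a back-of-the-envelope sketch that converts each one to a cost per 1,000 requests. It assumes you use a plan’s full quota and ignores the free tiers, so treat the output as a rough comparison only; the `cost_per_thousand` helper is just a name chosen for this example.

```python
def cost_per_thousand(plan_price_usd, requests_included):
    """Effective USD cost per 1,000 requests if the whole quota is used."""
    return plan_price_usd / requests_included * 1000

# Prices as quoted in this article; per-request prices are used directly.
per_thousand = {
    "ContextualWeb": 0.50,                           # $0.50 per 1,000 requests
    "newsapi.org": cost_per_thousand(449, 250_000),  # $449 for 250,000 requests
    "Aylien": cost_per_thousand(49, 30_000),         # $49 for 30,000 requests
    "Bing": 7.00,                                    # $7 per 1,000 requests
}

cheapest_first = sorted(per_thousand, key=per_thousand.get)
for name in cheapest_first:
    print(f"{name}: ${per_thousand[name]:.2f} per 1,000 requests")
```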
The only truly free option on this list, Newsriver is a not-for-profit News API. A simple GET request will return a list of the latest news articles relating to your search query. The engine also supports boolean expressions to generate highly accurate search queries.
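For readers who have not built such a request before, the sketch below shows how a boolean query might be URL-encoded into a GET request. The endpoint is a placeholder, and the parameter names (`query`, `limit`) and the boolean syntax are assumptions for illustration, not Newsriver’s documented API.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

# Placeholder endpoint, NOT a real provider URL.
BASE_URL = "https://api.example.com/v1/search"

def build_search_url(query, limit=10):
    """Build a GET URL with the query string safely percent-encoded."""
    params = {"query": query, "limit": limit}
    return f"{BASE_URL}?{urlencode(params)}"

# Hypothetical boolean expression; real syntax depends on the provider.
search_url = build_search_url('IBM AND "Deep Blue" NOT chess')
print(search_url)

# Decoding the URL again shows the query survives the round trip intact.
decoded = parse_qs(urlsplit(search_url).query)
print(decoded["query"][0])
```

Encoding matters here because boolean queries tend to contain spaces and quotation marks, which are not valid in a raw URL.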
It should be mentioned that Newsriver is a “real-time firehose API”, meaning, as far as I could tell, that it doesn’t attempt to sort results by accuracy or relevance – rather, it displays the latest results it could find.
Newsriver crawls a jaw-dropping 500,000 sources per day, and the results include rich metadata, which is a great plus.
We were not yet able to conduct a results test with Newsriver.
Ah, Google, the new Skynet… We couldn’t list News APIs without mentioning Google News. However, Google doesn’t offer an official News API, so a third-party service is required to access all that aggregated goodness programmatically.
Google News is, of course, a news aggregation service that provides up-to-date news articles collected from numerous sources around the world. The service is available in 35 languages and can be accessed on the web and mobile operating systems.
GNews, the unofficial API we tested, allows you to integrate Google News search results into your application or web pages. You can use it to display topics, headlines, trending stories, URLs, and other news items from Google searches.
It is incredibly simple and easy to use, and you don’t need a credit card to get started. The only issue I had was that GNews does not provide an on-site demo, and you have to integrate the API or use an app like Postman to test it out.
On to the search test: “coronavirus” yielded accurate, up-to-date results from a number of different news outlets. So far so good!
“IT chapter two” results were mixed. The first couple of results did reference the movie, but further results included articles like “Harry and Meghan’s new chapter”, for example.
The “IBM” search also left much to be desired: the first result was an article about the company, but most other results were general articles covering stock exchange data in general that have mentioned IBM as an aside.
Price: Free up to 100 requests per day, then starting at €29.99/month for 2,000 requests per day.
Most search engines rely on textual search parameters, while ContextualWeb relies on contextual parameters.
Newsapi.org has results that originated in a variety of sources and were very up-to-date, with some results published minutes before I made the search.
Newsapi.org returned just one result. Contrast this with both Bing and ContextualWeb News API, which returned hundreds of results for this search query.
Most results in newsapi.org just mention IBM but aren’t about IBM.
Newsapi.org yields results about coronavirus and Netflix, which have nothing to do with the title of the video game. ContextualWeb, on the other hand, yields accurate results corresponding to the subject at hand.