Here’s my whiteboard doodle that sort of captures how inNews works 🙂
Step 1: A crawler tracks news sources (RSS feeds) on a regular basis. We can go real-time but currently it is periodic to save on costs.
Step 2: The new feeds / stories are categorized, and the headline and body of each story are identified. The articles are archived in our vault. One day, we hope to be able to provide a “how-news-grows” view and show the natural life-cycle of a news story. A bit like the graphic below:
Step 3: Our keyword extraction kicks in and identifies relevant keywords in the headlines, body and pictures.
Step 4: We use a clustering algorithm to identify related keywords and stories. Each story is matched to the cluster database. New clusters are created if needed.
Step 5: The clusters are sorted based on category, velocity and volume of stories.
Step 6: The REST APIs use the cluster information to retrieve the BIG news stories for a date range, along with the number of related stories in each cluster and a summary of the story.
Step 7: The app receives the response from the REST API and renders its UI. See a sample below:
How do we know about big stories? In the age of newspapers, we assumed that the front-page headlines were the big stories (and trusted the judgement of the editorial team). In the age of TV news, we assume that the stories featured in the 9 pm debates are the top stories of the day. When we look at breaking news thru the day on TV, we assume a story must be big to make it to breaking news. In the internet age, we glanced at our favorite news web sites, saw the top headlines and assumed they were the big stories. But as social media, blogs and news portals have proliferated, we have observed that different portals and channels prioritize different stories as their top stories, based on their judgement, bias and sometimes interest.
So how does one find out what the top news of the moment is?
Our approach is algorithmic. We look at the volume of articles that are written on a theme across a wide range of sources. We correlate related keywords into a thematic framework and determine which themes are gaining popularity. The top story is usually the one with the most interest, i.e. the largest number of individual sources reporting on that story.
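To make the idea concrete, here is a minimal sketch of grouping stories by keyword overlap and ranking the groups by how many distinct sources cover them. It is an illustration only, not the inNews implementation: the sample data, the Jaccard similarity measure, the 0.3 threshold and all function names are assumptions made for this example.

```python
# Hypothetical sample data: each story has a source and a set of extracted keywords.
SAMPLE_STORIES = [
    {"source": "site-a", "keywords": {"budget", "tax", "finance"}},
    {"source": "site-b", "keywords": {"budget", "tax", "minister"}},
    {"source": "site-c", "keywords": {"cricket", "final", "score"}},
    {"source": "site-d", "keywords": {"budget", "finance", "minister"}},
    {"source": "site-a", "keywords": {"cricket", "final", "trophy"}},
]


def jaccard(a, b):
    """Keyword overlap between two sets, from 0.0 (disjoint) to 1.0 (identical)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cluster_stories(stories, threshold=0.3):
    """Greedily assign each story to the most similar cluster, or start a new one."""
    clusters = []  # each cluster: {"keywords": set, "stories": list}
    for story in stories:
        best, best_score = None, 0.0
        for cluster in clusters:
            score = jaccard(story["keywords"], cluster["keywords"])
            if score > best_score:
                best, best_score = cluster, score
        if best is not None and best_score >= threshold:
            best["stories"].append(story)
            best["keywords"] |= story["keywords"]  # the theme grows with each story
        else:
            clusters.append({"keywords": set(story["keywords"]), "stories": [story]})
    return clusters


def source_count(cluster):
    """Number of distinct sources reporting on this theme."""
    return len({story["source"] for story in cluster["stories"]})


def rank_clusters(clusters):
    """Order themes so the one covered by the most distinct sources comes first."""
    return sorted(clusters, key=source_count, reverse=True)


if __name__ == "__main__":
    for cluster in rank_clusters(cluster_stories(SAMPLE_STORIES)):
        print(source_count(cluster), sorted(cluster["keywords"]))
```

Counting distinct sources rather than raw article counts keeps one prolific site from pushing its own pet story to the top, which is the bias problem described above.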
This is the basis of our News Reader app – inNews. It uses sophisticated algorithms to track, process and analyze thousands of news feeds to simplify the top news for you. Open the app and Bingo – the top news of the hour is in front of you, with all of the related stories from all the sources so you can read as much as you like. Flip thru the top stories and in a quick 2-minute scan, you can see the top 10-20 stories of the moment.
Or you can read thru 20 sites and/or apps and do the heavy lifting yourself. Did I hear you say “Nah, that would be stupid in the age of the mobile consumer with limited time”? We agree 🙂
So what are you waiting for? Grab the app and check it out yourself. The Play Store APK is live (https://play.google.com/store/apps/details?id=com.plabs.apps.innews&hl=en). The App Store link should be up in a week.
A couple of posts ago, I talked about the idea that “audio search” makes so much sense for a music app. We have been working behind the scenes looking at speech-to-text technologies and evaluating them with a view to offering voice search in our app “Filmi Filmy”.
We are happy to report that we were completely wrong when we first thought of this. Since all of the song titles are entered in English but represent Hindi words phonetically, e.g. “O mere dil ke chain”, “Gata rahe mera dil”, we thought that we could use a speech-to-text engine to take user input, turn it into phonetic English and use the English text as the search key.
It turns out that it is much more elegant and natural to take the voice input “O mere dil ke chain”, render it as the Hindi string “ओ मेरे दिल के चैन” and search for the Hindi string in the database. One significant advantage of this is that it removes the complexity of the phonetics completely. It does not matter anymore if the “ke” is spelled as “key”, as in Hindi it will always be spelled as “के”.
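A rough sketch of what searching on the Hindi string could look like is below. It assumes the speech engine already hands back Devanagari text. The catalogue, the song ids, the Devanagari rendering of the second title and the function names are made up for illustration and are not the Filmi Filmy code.

```python
import unicodedata

# Hypothetical catalogue keyed on the Hindi (Devanagari) title.
SONGS_BY_HINDI_TITLE = {
    "ओ मेरे दिल के चैन": "song-001",
    "गाता रहे मेरा दिल": "song-002",
}


def normalize_title(text):
    """Apply Unicode NFC normalization and collapse whitespace so that
    equivalent Devanagari strings compare equal."""
    return " ".join(unicodedata.normalize("NFC", text).split())


# Build the lookup index once, keyed on the normalized Hindi title.
INDEX = {normalize_title(title): song_id
         for title, song_id in SONGS_BY_HINDI_TITLE.items()}


def search_by_hindi_title(recognized_text):
    """Return the song id for the recognized Hindi text, or None if no title matches."""
    return INDEX.get(normalize_title(recognized_text))


if __name__ == "__main__":
    # Whether the user would have typed "ke" or "key", the recognized text is "के".
    print(search_by_hindi_title("ओ मेरे दिल के चैन"))
```

An exact-match lookup is the simplest possible version. A real search would probably also want partial and fuzzy matching on the Hindi text.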
The challenge, of course, is getting a database of film song titles entered in Hindi. Nearly all song databases have English transliterated titles and, may we add, no two of them spell the same song the same way. A healthy inheritance from English-led and US-led software is that, from YouTube to the home-grown Gaana, nearly all the song titles are in English.
We are happy to report that, fortunately, a bit of innovation and tons of persistence can solve this problem (we may not have a huge cash chest at Pariksha but we are certainly not short on tech coolness). One of our engineers figured out a way to use existing open-source tools to build Hindi equivalents of the titles.
The results are spectacular, to say the least. Consider, for example, this song search using voice search with Hindi titles vs. text search with English phrases below:
[Screenshots: Text Search with English Phrases | Voice Search with Hindi Titles]
We need to do a bit more work on the Hindi song titles and improve the error handling on the search, and then this should be ready for public use. Now consider the scenario we had described earlier. Imagine slumping in a car after a long day with no energy to type a search: all you have to do is say the song and voila, the app will play it on your phone, earphones or connected Bluetooth speaker. Dare we say, it will not be long before this is a reality!