Here’s my whiteboard doodle that sort of captures how inNews works 🙂
Step 1: A crawler polls news sources (RSS feeds) on a regular schedule. We could go real-time, but for now crawling is periodic to keep costs down.
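A minimal sketch of a periodic (not real-time) RSS crawler, using only the Python standard library. It assumes plain RSS 2.0 feeds and de-duplicates stories by `guid`, falling back to `link`. The function names and the 15-minute `interval_s` are hypothetical, not the actual inNews crawler.

```python
import time
import urllib.request
import xml.etree.ElementTree as ET
from typing import Callable, Iterable


def fetch_url(url: str) -> str:
    """Download a feed over HTTP (stdlib only)."""
    with urllib.request.urlopen(url, timeout=10) as resp:
        return resp.read().decode("utf-8", errors="replace")


def parse_rss_items(xml_text: str) -> list[dict]:
    """Extract id/title/link/description from each <item> of an RSS 2.0 feed."""
    root = ET.fromstring(xml_text)
    items = []
    for item in root.iter("item"):
        items.append({
            # guid is optional in RSS 2.0, so fall back to the link as the id
            "id": item.findtext("guid") or item.findtext("link"),
            "title": (item.findtext("title") or "").strip(),
            "link": item.findtext("link"),
            "description": (item.findtext("description") or "").strip(),
        })
    return items


def poll_once(feed_urls: Iterable[str], seen: set[str],
              fetch: Callable[[str], str] = fetch_url) -> list[dict]:
    """One polling pass: return only stories not seen in earlier passes."""
    new_items = []
    for url in feed_urls:
        for item in parse_rss_items(fetch(url)):
            if item["id"] and item["id"] not in seen:
                seen.add(item["id"])
                new_items.append(item)
    return new_items


def run_crawler(feed_urls: list[str], interval_s: int = 900) -> None:
    """Periodic crawl loop: poll every feed, hand off new stories, then sleep."""
    seen: set[str] = set()
    while True:
        for item in poll_once(feed_urls, seen):
            print("new story:", item["title"])  # hand-off point to Step 2
        time.sleep(interval_s)
```

Passing `fetch` as a parameter keeps `poll_once` testable without network access. Raising `interval_s` lowers cost at the price of freshness, which is the trade-off described in this step.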
Step 2: New feed items (stories) are categorized, and each story's headline and body are identified. The articles are archived in our vault. One day, we hope to offer a “hor-news-grows” view that shows the natural life cycle of a news story, a bit like the graphic below:
Step 3: Our keyword extraction kicks in and identifies relevant keywords in the headline, the body, and the pictures.
Step 4: A clustering algorithm groups related keywords and stories. Each new story is matched against the cluster database, and a new cluster is created when no existing one fits.
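The post does not name the clustering algorithm, so the sketch below swaps in a simple single-pass (leader-style) approach. Each story's keyword set is compared to every existing cluster using Jaccard similarity. The story joins the best match if the similarity reaches a threshold. Otherwise a new cluster is created. The in-memory `clusters` list stands in for the cluster database, and `threshold=0.3` is a hypothetical value.

```python
from dataclasses import dataclass, field


@dataclass
class Cluster:
    cluster_id: int
    keywords: set[str] = field(default_factory=set)
    story_ids: list[str] = field(default_factory=list)


def jaccard(a: set[str], b: set[str]) -> float:
    """|A ∩ B| / |A ∪ B|, defined as 0 when both sets are empty."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def assign_story(story_id: str, keywords: set[str], clusters: list[Cluster],
                 threshold: float = 0.3) -> Cluster:
    """Add the story to the most similar cluster, or create a new one."""
    best, best_score = None, 0.0
    for c in clusters:
        score = jaccard(keywords, c.keywords)
        if score > best_score:
            best, best_score = c, score
    if best is None or best_score < threshold:
        best = Cluster(cluster_id=len(clusters) + 1)
        clusters.append(best)
    best.story_ids.append(story_id)
    best.keywords |= keywords  # the cluster's vocabulary grows as stories join
    return best
```

Usage: `assign_story("s1", {"storm", "coastal", "towns"}, clusters)` followed by `assign_story("s2", {"storm", "coastal", "flood"}, clusters)` puts both stories in one cluster (similarity 2/4 = 0.5). A story keyed on `{"election", "senate"}` starts a new cluster. Comparing against every cluster is O(n) per story. A real cluster database would likely use an inverted index from keyword to cluster to narrow the candidates first.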
Step 5: The clusters are sorted by category, velocity, and volume of stories.
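One way to read this step, sketched under assumptions: "volume" is the total number of stories in a cluster, and "velocity" is stories per hour over a recent window (6 hours here, a hypothetical choice). Clusters are grouped by category, then ordered fastest-growing first, with volume as the tie-breaker. The post does not say how the three criteria are combined, so this ordering is illustrative only.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class ClusterStats:
    cluster_id: int
    category: str
    published: list[datetime]  # publish times of the stories in the cluster

    @property
    def volume(self) -> int:
        """Total number of stories in the cluster."""
        return len(self.published)

    def velocity(self, now: datetime,
                 window: timedelta = timedelta(hours=6)) -> float:
        """Stories per hour published within the recent window."""
        recent = sum(1 for t in self.published if now - window <= t <= now)
        return recent / (window.total_seconds() / 3600)


def rank_clusters(clusters: list[ClusterStats],
                  now: datetime) -> list[ClusterStats]:
    """Group by category (alphabetical), then highest velocity, then volume."""
    return sorted(clusters,
                  key=lambda c: (c.category, -c.velocity(now), -c.volume))
```

Ranking velocity above volume lets a fast-breaking story with a few articles outrank an older story with many, which is usually what a "what's big right now" view wants.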
Step 6: The REST APIs use the cluster information to retrieve the BIG news stories for a given date range, along with the number of related stories in each cluster and a summary of each story.
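A minimal sketch of the response such an endpoint might build. The route (`GET /stories?from=...&to=...`), the field names, and the use of each cluster's `first_seen` date for the range filter are all hypothetical. "Big" is approximated as the clusters with the most related stories.

```python
import json
from dataclasses import dataclass
from datetime import date


@dataclass
class StoryCluster:
    cluster_id: int
    headline: str
    summary: str
    first_seen: date
    story_count: int  # number of related stories in the cluster


def big_stories_response(clusters: list[StoryCluster], start: date, end: date,
                         limit: int = 10) -> str:
    """Build the JSON body for a hypothetical GET /stories?from=...&to=... call."""
    in_range = [c for c in clusters if start <= c.first_seen <= end]
    # Biggest stories first, approximated by the number of related stories.
    in_range.sort(key=lambda c: c.story_count, reverse=True)
    payload = {
        "from": start.isoformat(),
        "to": end.isoformat(),
        "stories": [
            {
                "cluster_id": c.cluster_id,
                "headline": c.headline,
                "related_count": c.story_count,
                "summary": c.summary,
            }
            for c in in_range[:limit]
        ],
    }
    return json.dumps(payload)
```

Keeping this as a pure function that returns a JSON string makes it easy to wire into any web framework's handler, and to test without running a server.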
Step 7: The app receives the response from the REST API and renders its UI. See a sample below: