One key challenge with searching for Hindi film songs is that most song titles are spelled in Roman script, i.e. recorded in English letters. Try Apple Music, YouTube or Gaana: their databases are full of badly spelled and inconsistently spelled English renditions of Hindi/Urdu words. No surprise, then, that it is very hard to find the song you are looking for, because the catalogue may spell it differently from the way you have in mind. For example, there are hundreds of songs that start with “मैं” (I), but the word could be spelled “mein”, “main”, “mayn” or even “men”. More intricate words have even harder spelling issues.
Moreover, when a user types 3-4 words of a song title, which is the dominant way to search, the probability that at least one of the words is spelled differently goes up, and the probability of finding the exact song goes down. So the more you type, the poorer the results you get!
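A back-of-the-envelope model makes this concrete. Assume (hypothetically) that the search needs every typed word to match the catalogue's spelling, and that each word does so independently with probability p. Then the chance that a k-word query matches falls geometrically with k:

```latex
P(\text{query matches}) = p^{k}, \qquad
p = 0.8 \;\Rightarrow\; 0.8,\ 0.64,\ 0.512,\ 0.4096 \quad \text{for } k = 1, 2, 3, 4
```

With an illustrative p of 0.8, a four-word query succeeds less than half the time.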
The reason for this mess is the legacy of English-centric computing. However, with the rise of smartphones we are seeing a rise in Indian-language usage, and Indian-language keyboards on smartphones are making it easier to type in those languages. So why do the music vendors continue to support only English-based song names/titles? Can we not challenge this?
“Filmi Filmy” is our app in which users can view song names in Hindi. Just choose the language setting and the app automatically transliterates the song name and shows you the Hindi song title. This alone is hugely useful because, beyond the 80-100 million users who are comfortable with English, most users can't even read English names. So seeing Hindi song titles in Hindi is a godsend.
To improve the search experience in “Filmi Filmy”, we added phonetic search. We implemented the Soundex phonetic algorithm and found that the results were better, but we were still not satisfied. Soundex was built for English surnames, and phonetic matching for Hindi sounds is clearly still evolving. For example, there is a song titled “Lagi Aaj Sawan”: when a user searches for that exact phrase it works, but when the user searches for “Lagi Aaj Savan”, even Soundex fails to match it.
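To see why Soundex helps in some cases and fails in others, here is a minimal Python sketch of standard American Soundex (whether the app used exactly this variant is an assumption). Soundex drops vowels and treats h, w and y as uncoded, but it puts v in the same digit group as b, f and p. So “Sawan” and “Savan” get different codes, while the four spellings of “मैं” all collapse to one.

```python
# Letter-to-digit groups of standard American Soundex; vowels, h, w, y are uncoded.
_CODES = {
    **dict.fromkeys("bfpv", "1"),
    **dict.fromkeys("cgjkqsxz", "2"),
    **dict.fromkeys("dt", "3"),
    "l": "4",
    **dict.fromkeys("mn", "5"),
    "r": "6",
}


def soundex(word: str) -> str:
    """Return the 4-character American Soundex code of an ASCII word."""
    letters = [c for c in word.lower() if c.isascii() and c.isalpha()]
    if not letters:
        return ""
    first = letters[0]
    digits = []
    prev = _CODES.get(first, "")  # first letter's code suppresses an identical next code
    for c in letters[1:]:
        code = _CODES.get(c, "")
        if code:
            if code != prev:  # collapse adjacent letters with the same code
                digits.append(code)
            prev = code
        elif c not in "hw":
            prev = ""  # vowels (and y) separate letters, so a repeated code counts again
        # h and w are skipped without resetting prev
    return (first.upper() + "".join(digits) + "000")[:4]


def phrase_code(phrase: str) -> str:
    """Soundex code of each word in a song title."""
    return " ".join(soundex(w) for w in phrase.split())


print(phrase_code("Lagi Aaj Sawan"))  # L200 A200 S500
print(phrase_code("Lagi Aaj Savan"))  # L200 A200 S150
print([soundex(w) for w in ("mein", "main", "mayn", "men")])  # ['M500', 'M500', 'M500', 'M500']
```

The mismatch comes purely from how Soundex was tuned for English: w is treated like a vowel, while v is a coded consonant, even though the two are interchangeable when Hindi is romanized.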
Then we experimented with edit-distance-based search: how far is the search string from the song name? The results were better, but still not satisfying. For example, a user searches for “आप” because they want a song starting with this word, but results like “आ छुप जाएँ सनम” also show up near the top, because both letters of the word are found in that title. Definitely not a solution when you want to build THE BEST SEARCH ENGINE for Hindi film songs. Period.
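For reference, a minimal Python sketch of the classic Levenshtein edit distance (the exact variant the app used is not stated, so this is an assumption). It counts the fewest single-character insertions, deletions and substitutions that turn one string into the other, and it catches the “Savan” vs “Sawan” case that Soundex missed.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions or substitutions turning a into b."""
    if len(a) < len(b):
        a, b = b, a  # distance is symmetric; keep the DP row as short as possible
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free when the characters match)
            ))
        prev = curr
    return prev[-1]


print(levenshtein("lagi aaj savan", "lagi aaj sawan"))  # 1
```

Two properties explain the limits seen above. Python compares code points, so a Devanagari vowel sign such as ु or a nukta counts as a separate character. And the distance to a whole title is never smaller than the difference in their lengths, so a short query scores poorly against a long title even when the title begins with it.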
That is when we realized that our search, which worked on the complete text of the song title, needed to work on individual words instead. We have now developed our own algorithm that uses a combination of phonetics, edit distance and semantic analysis to detect the key words in the search string and find them in the song database. Instead of searching on the full name, the engine searches for the key words/phrases and then combines the word results into a result for the complete title. Further, we use a weighting scheme that prioritizes keywords by how frequently they appear in the database and by their position in the search string. This approach improves the search results appreciably. And when character-level phonetics is added to edit distance, the search becomes near-perfect! For example, ख़्वाहिशें and ख्वाहिशें are now treated as an exact match instead of an approximate match.
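The word-level idea can be sketched in Python roughly as follows. This is not the Filmi Filmy algorithm: it swaps in plain per-word edit distance for the phonetic and semantic parts, and the names `TitleIndex` and `word_similarity`, the IDF-style rarity weight, the 0.25 position decay and the 0.6 similarity threshold are all hypothetical choices for illustration.

```python
import math
from collections import Counter


def levenshtein(a: str, b: str) -> int:
    """Classic edit distance (insert/delete/substitute, each costing 1)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = curr
    return prev[-1]


def word_similarity(a: str, b: str) -> float:
    """1.0 for identical words, falling toward 0.0 as the edit distance grows."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - levenshtein(a, b) / longest


class TitleIndex:
    """Hypothetical word-level index: score each query word, then combine per title."""

    def __init__(self, titles: list[str]):
        self.titles = titles
        # Document frequency: in how many titles does each word appear?
        self.df = Counter(w for t in titles for w in set(t.lower().split()))

    def weight(self, word: str, position: int) -> float:
        # Assumed scheme: rarer words weigh more (IDF-style), earlier query words weigh more.
        rarity = math.log((len(self.titles) + 1) / (self.df[word] + 1)) + 1.0
        return rarity / (1.0 + 0.25 * position)

    def score(self, query: str, title: str, min_sim: float = 0.6) -> float:
        q_words, t_words = query.lower().split(), title.lower().split()
        if not q_words or not t_words:
            return 0.0
        total = weight_sum = 0.0
        for pos, qw in enumerate(q_words):
            w = self.weight(qw, pos)
            # Best fuzzy match for this query word anywhere in the title.
            best = max(word_similarity(qw, tw) for tw in t_words)
            weight_sum += w
            if best >= min_sim:  # weak matches contribute nothing
                total += w * best
        return total / weight_sum  # weighted average in [0, 1]

    def search(self, query: str, limit: int = 3) -> list[str]:
        return sorted(self.titles, key=lambda t: self.score(query, t), reverse=True)[:limit]


titles = [
    "Lagi Aaj Sawan Ki Phir Woh Jhadi Hai",
    "Aaj Phir Jeene Ki Tamanna Hai",
    "Sawan Ka Mahina",
]
index = TitleIndex(titles)
print(index.search("lagi aaj savan"))  # "Lagi Aaj Sawan Ki Phir Woh Jhadi Hai" ranks first
```

Because each query word is compared with each title word rather than with the whole title, a short query is no longer penalized for the title's length. One side effect of this particular weighting: a misspelled query word that never occurs in the catalogue gets the highest rarity weight.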
There are still certain challenges to be worked out. Hindi text can be written using different Unicode characters (!), which weakens our algorithm in cases where a word looks the same on screen but is internally made of different Unicode characters, and thus does not match. There are also cases where a word can be written differently (e.g. जिंदगी vs ज़िन्दगी).
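One common mitigation is to compare normalized search keys instead of raw strings. The Python sketch below uses the standard `unicodedata` module for Unicode canonical normalization (NFD), which makes a precomposed nukta letter such as ख़ (U+0959) identical to ख + nukta (U+0916 U+093C). The remaining rules, stripping the nukta and rewriting a nasal consonant + virama before another consonant as an anusvara, are hypothetical, deliberately lossy heuristics meant only for matching, and the same `search_key` function must be applied to both titles and queries.

```python
import re
import unicodedata

NUKTA = "\u093c"     # combining dot below (़)
ANUSVARA = "\u0902"  # ं
# Hypothetical rule: a nasal consonant (ङ ञ ण न म) + virama followed by another
# consonant (क..ह, U+0915-U+0939) is treated as an anusvara, e.g. न्द ~ ंद.
NASAL_CLUSTER = re.compile("[\u0919\u091e\u0923\u0928\u092e]\u094d(?=[\u0915-\u0939])")


def search_key(text: str) -> str:
    """Fold equivalent Devanagari spellings to one lossy key for matching."""
    # 1. NFD splits precomposed nukta letters (U+0958-U+095F) into base + nukta,
    #    so both encodings of the same glyph become identical.
    text = unicodedata.normalize("NFD", text)
    # 2. Drop the nukta so ख़ and ख (also ज़ and ज, ड़ and ड) compare equal.
    text = text.replace(NUKTA, "")
    # 3. Fold half-nasal clusters into anusvara: ज़िन्दगी and जिंदगी get the same key.
    text = NASAL_CLUSTER.sub(ANUSVARA, text)
    return unicodedata.normalize("NFC", text)


print(search_key("ख़्वाहिशें") == search_key("ख्वाहिशें"))  # True
print(search_key("ज़िन्दगी") == search_key("जिंदगी"))  # True
```

NFD only unifies characters that Unicode defines as canonically equivalent; look-alike characters that are not equivalent would still need explicit mapping rules of their own.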
But in spite of these challenges, we believe we now have THE BEST SEARCH ENGINE for Hindi film songs. The new search will be available soon in an updated release of the app “Filmi Filmy”. Do try it and share your feedback. And if you are frustrated by the search experience on Apple Music, YouTube, Gaana, etc., do send them a note to license our technology!!