Ideas for mapping any $searchterms to given set of $results (stemming, spelling correction, etc.) with AS?

ManuelSchmidt · September 13, 2015, 12:24pm

We have an application that handles a fairly broad range of user input. Analysis showed that we lose quite a few users because their input does not match what we have prepared in our dataset. Users type ‘cars’, ‘car’, ‘cars driving’, ‘carrs’ and so on, while we only have ‘car’ prepared.

I wonder whether anybody has already worked on something that handles user-provided search terms, each of which should map to the single best-matching result record among our prepared results.

Think of it like the ‘Google input’ problem. Google has tens of millions of indexed entries, but not an infinite number of them. Still, it can return results for any input you give it that has at least some similarity to something that exists.
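Comparing a query against every one of tens of millions of entries is too slow, so fuzzy matching at that scale usually starts with candidate generation. One common technique (not something proposed in this post) is a character-trigram inverted index: only prepared terms that share trigrams with the query get scored, here with Jaccard similarity. This is an in-memory sketch; the class name, the padding scheme and the 0.3 cutoff are illustrative assumptions. In a key-value store each trigram would presumably become its own record listing the terms that contain it.

```python
from collections import defaultdict


def trigrams(term: str) -> set[str]:
    """Character trigrams, padded with spaces so word boundaries also count
    (e.g. 'car' -> {'  c', ' ca', 'car', 'ar '}). Padding is an assumption."""
    padded = f"  {term.lower()} "
    return {padded[i:i + 3] for i in range(len(padded) - 2)}


class TrigramIndex:
    """Hypothetical in-memory inverted index: trigram -> prepared terms."""

    def __init__(self, terms):
        self.grams = {}                    # term -> its trigram set
        self.postings = defaultdict(set)   # trigram -> terms containing it
        for term in terms:
            grams = trigrams(term)
            self.grams[term] = grams
            for gram in grams:
                self.postings[gram].add(term)

    def best_match(self, query: str, min_score: float = 0.3):
        """Return (term, Jaccard score) of the closest prepared term,
        or None if nothing shares enough trigrams with the query."""
        q = trigrams(query)
        # Candidate generation: only terms sharing at least one trigram.
        candidates = set()
        for gram in q:
            candidates |= self.postings.get(gram, set())
        best = None
        for term in candidates:
            g = self.grams[term]
            score = len(q & g) / len(q | g)
            if score >= min_score and (best is None or score > best[1]):
                best = (term, score)
        return best
```

For example, with the prepared terms ‘car’, ‘bicycle’ and ‘truck’, the query ‘carrs’ scores 3/7 against ‘car’ and shares no trigrams with the other two, so it resolves to ‘car’.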

This is a rather tough problem to model in a NoSQL datastore, so it might be valuable to have a public discussion about how it could be implemented (especially with AS, but if somebody has implemented something using Redis, that would be interesting too, I guess…)
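One way to keep the key-value side simple is to store each large result record exactly once and add small alias records that map every known variant of a query to the canonical key; a lookup is then two point reads. This sketch models the two "sets" as Python dicts. The class, the method names and the minimal `normalize_key` are hypothetical, and no Aerospike or Redis client API is implied.

```python
def normalize_key(query: str) -> str:
    """Hypothetical minimal normalizer: lowercase, collapse whitespace and
    sort tokens so word order does not matter. A fuller one could stem too."""
    return " ".join(sorted(query.lower().split()))


class AliasStore:
    """Two key-value 'sets', modelled here as dicts:
    - results: canonical key -> large prepared result record (stored once)
    - aliases: normalized variant -> canonical key (small records)
    """

    def __init__(self):
        self.results = {}
        self.aliases = {}

    def put_result(self, canonical: str, record: dict) -> None:
        self.results[canonical] = record
        # The canonical term is always an alias of itself.
        self.aliases[normalize_key(canonical)] = canonical

    def add_alias(self, variant: str, canonical: str) -> None:
        if canonical not in self.results:
            raise KeyError(f"unknown canonical key: {canonical}")
        self.aliases[normalize_key(variant)] = canonical

    def lookup(self, query: str):
        """Two point reads: variant -> canonical key -> result record."""
        canonical = self.aliases.get(normalize_key(query))
        return None if canonical is None else self.results[canonical]
```

Because alias records are tiny, adding many of them costs little compared with duplicating result records. One possible workflow (an assumption, not something described here) is to fill aliases offline by running a fuzzy matcher over logged queries that missed.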

So, when splitting this problem into smaller, more solvable parts, one ends up thinking about ‘stemming’ (removing plurals, reducing verbs/nouns to a common ‘base’ form), spelling correction and word-order ‘normalization’(?). The first is fairly easy to solve with the Porter stemmer algorithm; the second not so much (especially if you don’t have a database of the correct variants of a word before stemming).
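A minimal sketch of the three steps named above, under stated assumptions: tokens are lowercased and stripped of a few suffixes by a deliberately naive rule set (this is NOT the Porter algorithm; NLTK's `PorterStemmer` would be the usual drop-in), then spelling-corrected with Levenshtein distance against a vocabulary built from the prepared terms themselves, and finally sorted so word order no longer matters. Using the prepared terms as the dictionary is one possible answer to not having a separate database of correct spellings. All function names and the suffix list are hypothetical.

```python
import re

# Hypothetical, deliberately naive suffix rules. This is NOT the Porter
# algorithm; a real system would use a proper stemmer instead.
SUFFIXES = ("ing", "es", "s")


def tokenize(text: str) -> list[str]:
    """Lowercase and split into ASCII alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def naive_stem(word: str) -> str:
    """Strip the first matching suffix, keeping a stem of at least 3 chars."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def levenshtein(a: str, b: str) -> int:
    """Dynamic-programming edit distance (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute
        prev = cur
    return prev[-1]


def correct(token: str, vocabulary: set[str], max_distance: int = 1) -> str:
    """Snap a token to the closest vocabulary entry within max_distance,
    otherwise keep it unchanged. Ties are broken alphabetically."""
    if token in vocabulary:
        return token
    best = min(vocabulary, key=lambda v: (levenshtein(token, v), v), default=None)
    if best is not None and levenshtein(token, best) <= max_distance:
        return best
    return token


def build_vocabulary(prepared_terms) -> set[str]:
    """Use the stems of the prepared terms themselves as the dictionary."""
    return {naive_stem(t) for term in prepared_terms for t in tokenize(term)}


def normalize(query: str, vocabulary: set[str]) -> str:
    """Stem, spell-correct, then sort tokens so word order is irrelevant."""
    tokens = [correct(naive_stem(t), vocabulary) for t in tokenize(query)]
    return " ".join(sorted(tokens))
```

With the prepared terms ‘car’, ‘drive’ and ‘truck’, the inputs ‘car’, ‘cars’ and ‘carrs’ all normalize to ‘car’, and both ‘cars driving’ and ‘Driving cars’ normalize to ‘car drive’. A fixed `max_distance` of 1 is a simplification; longer words usually tolerate a larger edit budget.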

Would you try to implement those things with AS, or jump to a fallback solution like Apache Solr in case no entry could be found?
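If a fallback such as Solr is used, one way to structure it is a tiered resolver that tries the cheap lookups first and only calls the external engine when everything else misses. In this sketch the normalizer, the fuzzy matcher and the fallback are passed in as plain callables (hypothetical stand-ins, not a Solr client), and the resolver also reports which tier answered.

```python
from typing import Callable, Optional


def resolve(
    query: str,
    prepared: dict[str, str],
    normalize: Callable[[str], str],
    fuzzy_match: Callable[[str], Optional[str]],
    fallback: Callable[[str], Optional[str]],
) -> tuple[Optional[str], str]:
    """Resolve a query to a result key, cheapest tier first.

    prepared:    normalized term -> result key (stand-in for the KV store)
    normalize:   e.g. lowercasing, stemming, token sorting
    fuzzy_match: returns the closest prepared term, or None
    fallback:    external search engine call (e.g. Solr), may return None
    Returns (result key or None, name of the tier that answered).
    """
    if query in prepared:                          # 1. exact hit
        return prepared[query], "exact"
    key = normalize(query)
    if key in prepared:                            # 2. normalized hit
        return prepared[key], "normalized"
    match = fuzzy_match(key)
    if match is not None and match in prepared:    # 3. fuzzy hit
        return prepared[match], "fuzzy"
    return fallback(query), "fallback"             # 4. external engine
```

Counting how often the "fallback" tier answers on real traffic is one way to judge whether running Solr alongside AS pays off at all.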

Of course there are alternative, simpler approaches to this problem, like offering auto-suggest, asking users to re-check their queries and so on, but since we are really concerned about usability here (and about the space requirements of our huge result records), we would like to eliminate as much friction as possible. Since we haven’t really decided on algorithms yet, we haven’t spent a single second on what data model to use within AS, or on whether we should use AS for this at all.
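For the auto-suggest alternative, a sorted list of prepared terms plus binary search already gives prefix completion. This is an in-memory sketch with a hypothetical class name; in a key-value store one would more likely precompute prefix-to-suggestions records, which is an assumption rather than anything described here.

```python
import bisect


class PrefixSuggester:
    """Hypothetical auto-suggest over sorted, lowercased prepared terms."""

    def __init__(self, terms):
        self.terms = sorted({t.lower() for t in terms})

    def suggest(self, prefix: str, limit: int = 5) -> list[str]:
        """Return up to `limit` terms starting with `prefix`, alphabetically."""
        prefix = prefix.lower()
        # All terms with this prefix form a contiguous run starting here.
        i = bisect.bisect_left(self.terms, prefix)
        out = []
        while (i < len(self.terms) and len(out) < limit
               and self.terms[i].startswith(prefix)):
            out.append(self.terms[i])
            i += 1
        return out
```

For example, with the terms ‘car’, ‘cargo’, ‘carpet’, ‘truck’ and ‘Cart’, `suggest("car", limit=2)` returns `["car", "cargo"]`.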

Any input welcome.
