Full text search queries

query

#1

Hello,

I am very new to Aerospike. I installed it on a server and ran some tests using the C# client, and it looks very promising.

I am currently working on a project that involves a large amount of data, with a high volume of reads and inserts. Part of this project involves storing millions of text articles in the database each week (coming, for instance, from RSS feeds, with an average of 500 characters each). Each entry would then be analyzed to match possible tags and topics, and to compute various other statistics.

The tag matching part would probably need some kind of full-text search tool (something like the MATCH AGAINST or LIKE statements in SQL), but I cannot find anything related to this in the Aerospike features. Is there a full-text search tool available, or is it possible to connect Aerospike to an existing one? Do you think Aerospike is suited to storing this kind of text article?

Thanks a bunch, and sorry if this has already been asked; I couldn't find anything!

Cheers


#2

Hmm, this has been open since Jan 27 with no answer or workaround… I have the same question.

Should I migrate my MySQL database to Aerospike? If I can't run queries like: select * from table where column like '%hello%'… maybe it's not my best choice.

The only solution I can see now is to fetch all the bins and search the text in the client language (PHP, Java, etc.), but then all the efficiency is lost.
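
A minimal sketch of the client-side approach described above, assuming the records have already been fetched into memory as a list of dicts (a real application would stream them from a full scan through the client library; no Aerospike API is used here). The hypothetical `client_side_like` helper imitates `LIKE '%hello%'` with a case-insensitive substring test on every record, which is exactly why every bin has to cross the network.

```python
from typing import Iterable, Iterator


def client_side_like(records: Iterable[dict], bin_name: str, needle: str) -> Iterator[dict]:
    """Yield records whose bin contains `needle` (case-insensitive), like LIKE '%needle%'.

    Every record must be fetched and inspected, so the cost grows with the
    total amount of stored text, not with the number of matches.
    """
    lowered = needle.lower()
    for record in records:
        value = record.get(bin_name)
        if isinstance(value, str) and lowered in value.lower():
            yield record


# Hypothetical records standing in for the results of a full scan.
articles = [
    {"id": 1, "body": "Hello world from an RSS feed"},
    {"id": 2, "body": "Nothing to see here"},
    {"id": 3, "body": "Say hello to Aerospike"},
]

matches = [r["id"] for r in client_side_like(articles, "body", "hello")]
print(matches)  # [1, 3]
```

The generator keeps memory flat while records stream in, but it cannot avoid reading every article on each query.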


#3

I've answered this same question on Stack Overflow, but here's the gist:

You should not be doing full-text search with the SQL LIKE operator on an RDBMS anyway; this is where a dedicated search system like Apache Solr makes sense. In Aerospike, if you insist on having this functionality, you would implement it with a stream UDF (a user-defined function applied to the records of a query on the server nodes).
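
As a rough analogy of why a stream UDF is cheaper than filtering in the client, the sketch below has each "node" filter its own records and return only the matching IDs, then merges the partial results in a final reduce step. The two-node layout and the `node_filter`/`merge` helpers are hypothetical and written in Python for illustration only; real Aerospike stream UDFs are Lua modules registered on the cluster.

```python
from functools import reduce


def node_filter(node_records, needle):
    """Work done 'on the node': return only the IDs whose body contains the needle."""
    needle = needle.lower()
    return [rec["id"] for rec in node_records if needle in rec["body"].lower()]


def merge(left, right):
    """Final reduce step: combine the partial results from two nodes."""
    return left + right


# Hypothetical cluster of two nodes, each holding part of the data.
nodes = [
    [{"id": 1, "body": "Hello world"}, {"id": 2, "body": "Goodbye"}],
    [{"id": 3, "body": "hello again"}, {"id": 4, "body": "Unrelated text"}],
]

partials = [node_filter(records, "hello") for records in nodes]
result = reduce(merge, partials, [])
print(result)  # [1, 3]
```

Only the matching IDs travel back, but every node still examines every one of its records, so this remains a brute-force search rather than a full-text index.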

To search for articles that have a given tag, or for all the tags of an article, you would just model your data in a way that makes sense, without needing a LIKE search.

For example, a set article-tags, where the PK is the articleID and a bin tags holds a growing list of either tag strings or tag IDs. For the orthogonal search (all the articles for a tag) you would have a set tag-articles, where the PK is the tagID and a bin holds the list of article IDs for that tag. In both cases you use the list append() operation to add values to those lists.
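
The two-set layout can be sketched in memory as below. Plain Python dicts stand in for the article-tags and tag-articles sets, and `list.append` stands in for the server-side list append() operation; the `tag_article` helper and the sample IDs are hypothetical. The point is that each lookup becomes a single key read instead of a text search.

```python
from collections import defaultdict

# In-memory stand-ins for the two sets: PK -> list bin.
article_tags = defaultdict(list)  # set article-tags, PK = articleID, bin "tags"
tag_articles = defaultdict(list)  # set tag-articles, PK = tagID, bin holding article IDs


def tag_article(article_id, tag_id):
    """Record a tag on both sides, mirroring two list append() operations."""
    article_tags[article_id].append(tag_id)
    tag_articles[tag_id].append(article_id)


tag_article("article:1", "tag:python")
tag_article("article:1", "tag:nosql")
tag_article("article:2", "tag:nosql")

print(article_tags["article:1"])  # ['tag:python', 'tag:nosql']
print(tag_articles["tag:nosql"])  # ['article:1', 'article:2']
```

Writing both sides keeps either lookup to one read, at the cost of two writes per tag; this sketch does not deduplicate, so tagging the same article twice would append the tag twice.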

Alternatively, you can have a bin tags in the set articles which is a list of tags for the article. You can then build a secondary index on that list for searches.
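
Conceptually, a secondary index on a list bin maps each list element to the keys of the records that contain it. The sketch below builds that mapping in memory over a hypothetical articles set so that a lookup for one tag returns the matching keys without scanning; the `build_list_index` helper is illustrative only and is not how the Aerospike server implements its indexes.

```python
from collections import defaultdict


def build_list_index(records, bin_name):
    """Map every element of a list bin to the set of primary keys that contain it."""
    index = defaultdict(set)
    for pk, bins in records.items():
        for value in bins.get(bin_name, []):
            index[value].add(pk)
    return index


# Hypothetical "articles" set: PK -> bins, with "tags" as a list bin.
articles = {
    "a1": {"title": "Intro to NoSQL", "tags": ["nosql", "databases"]},
    "a2": {"title": "Tuning MySQL", "tags": ["sql", "databases"]},
    "a3": {"title": "Aerospike lists", "tags": ["nosql"]},
}

tags_index = build_list_index(articles, "tags")
print(sorted(tags_index["nosql"]))      # ['a1', 'a3']
print(sorted(tags_index["databases"]))  # ['a1', 'a2']
```

Such an index answers exact-value lookups (a whole tag), not substring matches inside free text.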


#4

Thanks for your answer, rbotzer. The technique you propose is a good one, although I still need to search for text. I have an application that searches for words or parts of words.

I think MongoDB is better suited to this purpose while still being a NoSQL database.

Best regards.