Friday, September 13, 2019

The Importance of Broad Querying and Narrow Indexing

One of the issues we came across while developing our enterprise NLP search engine was precision. Rather than being ranked lower, documents that do not meet the precision level simply fall out of the query match.

Let's take the case of an object that is a predicate object in the query but a noun subject in the sentence being indexed. There would be no match. So how do we handle this? Well, we can create a broad query, one containing the object as both noun subject and predicate object. Now it will match sentences that have either case.
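Here is a minimal sketch of that idea in Python. It is not our engine's code: the `Term` type, the role labels `nsubj` and `pobj`, and the `broaden` and `matches` helpers are all hypothetical names chosen for illustration. The index keeps the single role the parser found, while the query accepts either role.

```python
from typing import NamedTuple


class Term(NamedTuple):
    text: str
    role: str  # hypothetical labels: "nsubj" = noun subject, "pobj" = predicate object


# Roles we treat as interchangeable at query time (an assumption for this sketch).
INVERTIBLE_ROLES = ("nsubj", "pobj")


def broaden(query_terms):
    """Return one set of acceptable (text, role) alternatives per query term."""
    broadened = []
    for term in query_terms:
        if term.role in INVERTIBLE_ROLES:
            # Accept the same text in either grammatical role.
            alternatives = {Term(term.text, role) for role in INVERTIBLE_ROLES}
        else:
            alternatives = {term}
        broadened.append(alternatives)
    return broadened


def matches(broadened_query, indexed_sentence):
    """True when every query term has at least one alternative in the sentence."""
    indexed = set(indexed_sentence)
    return all(alternatives & indexed for alternatives in broadened_query)


query = [Term("documentum", "pobj"), Term("acquire", "verb")]
sentence = [Term("documentum", "nsubj"), Term("acquire", "verb")]  # narrowly indexed

narrow_query = [{term} for term in query]
print(matches(narrow_query, sentence))    # the narrow query misses the sentence
print(matches(broaden(query), sentence))  # the broad query matches it
```

The index stays narrow, so nothing has to be re-indexed when you decide that another pair of roles should be treated as an inversion.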

Wait, didn't we lose precision? Yep, we sure did. So how does that pass muster? Well, it has to do with two things. The first is that many other things are contributing to the match, and all those elements result in partial scores that add up to a total. The second is that the results found will match the end user's mental expectation: they think that queries will resolve symmetry of grammar structure, or inversions, as we call them. There can be models where prepositional clauses dangle from different parts of the sentence. Should that be a match? For enterprise search, a large part of this process is tuning and going by a "gist," a "gestalt," that things feel OK to the brain. There is no hard and fast rule.
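To make the partial-score point concrete, here is a small sketch. The feature names, the point values, and the threshold are invented for illustration and are exactly the kind of numbers that need tuning; the only idea taken from the text above is that several elements each contribute a partial score, and a document falls out when the total is too low.

```python
# Hypothetical partial scores, in points, for each element that can match.
WEIGHTS = {
    "verb": 40,
    "exact_role": 40,     # entity found in the same grammatical role as the query
    "inverted_role": 25,  # entity found in the inverted role (broad query only)
    "entity": 20,
}
THRESHOLD = 70  # assumed precision level; below this the document falls out


def total_score(matched_features):
    """Add up the partial score of every element that matched."""
    return sum(WEIGHTS[feature] for feature in matched_features)


def is_match(matched_features):
    return total_score(matched_features) >= THRESHOLD


# An inverted role alone is not enough to keep a document in the results...
print(total_score(["inverted_role", "entity"]), is_match(["inverted_role", "entity"]))
# ...but together with a matching verb it clears the threshold.
print(total_score(["verb", "inverted_role", "entity"]),
      is_match(["verb", "inverted_role", "entity"]))
```

In this sketch the inversion is worth less than an exact role match, so the broad query lets more sentences in without letting a weak match carry a document on its own.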

Another way to approach broadening is to handle cases where you didn't find the expected document and you realize it was due to a narrow definition. While you can't simply broaden the definition and move on, you can run a test with the broader version and see how it affects your regression test set of queries. And then do a bit of spot checking.
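A regression run like that can be sketched as follows. The `regression_report` function, the shape of the test set (query mapped to the document ids you expect), and the `top_k` cutoff are assumptions made for this example; `search_before` and `search_after` stand in for whatever calls your engine with the narrow and the broadened definition.

```python
def regression_report(cases, search_before, search_after, top_k=3):
    """Compare two search functions against a regression set.

    cases maps each query to the set of document ids expected in the top_k
    results. Each search function takes a query and returns a ranked list of ids.
    """
    report = {"fixed": [], "broken": [], "unchanged": []}
    for query, expected in cases.items():
        ok_before = expected <= set(search_before(query)[:top_k])
        ok_after = expected <= set(search_after(query)[:top_k])
        if ok_after and not ok_before:
            report["fixed"].append(query)
        elif ok_before and not ok_after:
            report["broken"].append(query)
        else:
            report["unchanged"].append(query)
    return report


# Toy stand-ins for the engine before and after broadening.
before = {"q1": ["d9"], "q2": ["d2", "d3"]}
after = {"q1": ["d1", "d9"], "q2": ["d7", "d8", "d9", "d2"]}
cases = {"q1": {"d1"}, "q2": {"d2"}}

print(regression_report(cases, before.__getitem__, after.__getitem__))
```

Here the broadening finds the missing document for `q1` but pushes the expected document for `q2` out of the top three, which is the kind of trade-off the spot checking is for.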

Finally, the last issue is that of the near miss. This often happens with the verb. Take "Microsoft acquires Documentum" and "Microsoft purchased Documentum." Querying with one against a sentence that contains the other might result in a total miss. So how to handle this? The best technique is to run your docset through a processor which determines similarity and clustering. Then you can extend your query with additional terms if there are any within a specified distance. Again, it's a technique that can assist or blow up your query. It takes tuning and time to review many queries.
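As a rough sketch of the expansion step, assume the similarity processor has already produced a table of neighboring terms with distances. The `neighbors` table, its distance values, and the `expand_query` helper below are made up for illustration; a real table would come from running your own docset through whatever similarity or clustering tool you use.

```python
def expand_query(terms, neighbors, max_distance=0.3):
    """Return, for each query term, the term plus any neighbors close enough.

    neighbors maps a term to a list of (other_term, distance) pairs.
    """
    expanded = []
    for term in terms:
        group = [term]
        for other, distance in neighbors.get(term, []):
            if distance <= max_distance and other not in group:
                group.append(other)
        expanded.append(group)
    return expanded


# Hypothetical output of the similarity step (smaller distance = more similar).
neighbors = {
    "acquires": [("purchased", 0.12), ("buys", 0.2), ("sues", 0.8)],
}

print(expand_query(["microsoft", "acquires", "documentum"], neighbors))
# [['microsoft'], ['acquires', 'purchased', 'buys'], ['documentum']]
```

The `max_distance` cutoff is the knob that decides whether this assists or blows up the query: set it too loose and unrelated verbs such as "sues" get pulled in.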

So if your latest search technology isn't producing the results you expected, remember that broadening the query, rather than the index, is one technique to bring more potential matches into your query rankings.
