Tuesday, October 1, 2019

User Intent Analysis and NLP - Potato Chips or Computer Chips

One area that is not the sine qua non of NLP, but is nonetheless a core technique, is user intent analysis. It is most critical when building a chat bot or a voice assistant like Alexa, and has some, though lesser, utility in enterprise search.

Consider the user-entered query:
    what is the most expensive chip

Do they mean a computer chip or a potato chip? How can the machine know? Technically, this is a query disambiguation problem.

There are two types of user intent. One is categorical intent, and the other is free intent, which is inferred from the user's history.

For categorical intent, an NLP corpus is split into different areas. This is quite common: one person may work on the food side of the org and another on the silicon chip side. Typically there might be 10-30 categorical intents.

To solve for categorical intent, the corpus for each area is processed separately, and statistical maps or 3D spatial similarity maps are generated. Typically these are built on NLP-generated tokens, not the words themselves. In our case, "expensive" would have a stronger correlation score with computer chips than with potato chips. That would trigger the user intent of "computers", and query transforms that boost toward that category would be applied.
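As a rough illustration of the categorical case, here is a minimal Python sketch. It assumes each category already has a map of token-to-correlation scores. The numbers, the `CATEGORY_SCORES` table, and the `min_margin` threshold are all made up for the example; a real system would derive the scores from each category's corpus. It sums the scores for the query tokens and only picks an intent when one category clearly wins.

```python
# Hypothetical per-category correlation scores for normalized tokens.
# These values are invented for illustration only.
CATEGORY_SCORES = {
    "computers": {"chip": 0.6, "expensive": 0.7, "fast": 0.8},
    "food": {"chip": 0.6, "expensive": 0.2, "salty": 0.9},
}


def categorical_intent(tokens, scores=CATEGORY_SCORES, min_margin=0.1):
    """Return the best-matching category, or None if the query is ambiguous."""
    # Sum each category's score over the query tokens (unknown tokens count as 0).
    totals = {
        category: sum(token_scores.get(t, 0.0) for t in tokens)
        for category, token_scores in scores.items()
    }
    ranked = sorted(totals.items(), key=lambda kv: kv[1], reverse=True)
    best, best_score = ranked[0]
    runner_up_score = ranked[1][1] if len(ranked) > 1 else 0.0
    # Only commit to an intent when it beats the runner-up by a clear margin.
    if best_score - runner_up_score < min_margin:
        return None
    return best


print(categorical_intent(["expensive", "chip"]))
print(categorical_intent(["chip"]))
```

With these made-up scores, "expensive chip" leans toward "computers", while "chip" on its own is a tie and yields no intent at all, which is the safer default.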

The other kind of user intent is more subtle: boosting a query based on historical analysis of the user. If the user's other query is "what is the fastest ram", we might deduce that they are seeking computer chips. If the user's other query from history is "what is the saltiest snack", we might deduce potato chips.

Ah, but do we really know the user's intent? What if they've just started a keto diet and all potato chips are verboten? If we start rewriting queries based on analyzed intent, that's going to piss a lot of users off! So historical intent is more an art than a science, and the goal is often to extend and nudge the query rather than obliterate it. How much is enough and how much is too much? Generally the intent-provided term should carry enough weight to show up in result sets, but not dominate them. Think of it as a leather-clad dominatrix who uses a pillow rather than a whip. Err, OK, don't think of that. The point is that it's subtle and not overbearing; being overbearing is the mistake that's made most often. The same goes for a chat bot, but much worse: "Alexa, turn off the lights" becomes "burn down the house". "Sure can do!" replies Alexa, and turns on the oven. Subtle. These are guesses, after all! They take a lot of tuning and regression testing to get the right sensibility.
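The "nudge, don't obliterate" idea might look something like the Python sketch below. It assumes each historical query has already been labeled with an intent, and that the search engine accepts optional boost terms with a weight. The `nudge_query` name and the `max_boost` cap are hypothetical and would need tuning. The original query is never rewritten; the inferred intent only rides along as a small, capped boost.

```python
from collections import Counter


def nudge_query(query, history_intents, max_boost=0.3):
    """Return the untouched query plus a small boost for the likeliest intent.

    history_intents is a list of intent labels from the user's past queries.
    The boost weight scales with how consistent the history is, and is
    capped at max_boost so it never dominates the original query.
    """
    if not history_intents:
        return query, {}
    intent, count = Counter(history_intents).most_common(1)[0]
    confidence = count / len(history_intents)
    weight = round(max_boost * confidence, 3)
    return query, {intent: weight}


query, boosts = nudge_query(
    "what is the most expensive chip",
    ["computers", "computers", "food"],
)
print(query)
print(boosts)
```

A user with no history gets no boost at all, and a user with a mixed history gets a weaker one, so the keto dieter who once searched for snacks is not stuck with potato chips forever.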

Products that deliver intent analysis without enterprise search integration are clearly targeting chat bots. And while chat bots have their niche use, user intent analysis is also a powerful tool for enterprise NLP search.
