QAC Example & Term Definition

Term definitions:

Prefix: the partial query the user has typed so far
Block (or column):

The list of suggested results shown for a prefix
There is one block for every prefix, so a single conversation contains several blocks

Conversation:

Begins when a user starts typing a prefix; ends when the user selects one of the suggested results or abandons the input

Two Important User Behaviors

Skip Behavior

Within a conversation, users frequently skip a suggested column even though it already contains the query they eventually select.

The reasons relate to the device and to the user's typing skill. For example, fast typists tend to keep typing additional characters without examining the completions.

Example of skipping behavior

Frequency of skipping behavior (a study from Yahoo QAC log data)

Observation Bias (position bias)

As in other ranking problems, most clicks (or interactions) concentrate on the top positions.

Because suggestions are laid out vertically, users observe the top position first. The strength of this effect also depends on the UI (or device).

Evaluation Metrics

Learning from User Behavior

Based on the observations above, a conversation actually passes through several funnels between the moment the user begins to type and the moment the user selects one of the suggested queries.

For a suggested item at position n in the i-th column, the funnels are:

Funnel 1: Whether the user decides to stop typing and check the i-th column
Funnel 2: Whether the user checks the n-th position
Funnel 3: Whether the suggested item is relevant to what the user wants

What we really want to evaluate is the performance of the third funnel. However, the data we collect mostly reflects the combined influence of all three funnels. Keep this in mind when evaluating the auto-completion system.
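
The point about the combined influence can be written down as a chain-rule factorization. This is only a sketch: the symbols S, E, R, and C are introduced here for illustration, and it assumes a selection happens exactly when all three funnels are passed.

```latex
% S_i     : the user stops typing and checks the i-th column   (Funnel 1)
% E_{i,n} : the user examines position n in that column        (Funnel 2)
% R_{i,n} : the item at position n is what the user wants      (Funnel 3)
% C_{i,n} : the user selects the item at position n of column i
\[
P(C_{i,n} = 1)
  = P(S_i = 1)\,
    P(E_{i,n} = 1 \mid S_i = 1)\,
    P(R_{i,n} = 1 \mid S_i = 1,\, E_{i,n} = 1)
\]
```

Logged selections estimate the left-hand side, while only the last factor describes the quality of the suggestions themselves.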

Metrics

Basic Idea	Metrics	Data Needed
How many users started typing a query but never selected a result?	Drop-off rate (abandonment rate)	1. Users' interactions with the last column of each conversation
When a result is selected, what was its position in the suggestion list (that is, was the selected result ranked high)?	1. Average selected position 2. Success rate in the top K positions: success rate@K 3. Mean Reciprocal Rank@K (MRR@K) 4. Mean Average Precision@K (MAP@K) 5. …	1. Only conversations that ended with a click (or selection) 2. Only users' interactions with the last column of each conversation
How many characters did the user have to type before being able to click on the result they were looking for?	1. Minimal keystrokes 2. Effort saved	1. Only conversations that ended with a click (or selection) 2. Only users' interactions with the last column of each conversation
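
As a rough illustration of how several of the metrics in the table could be computed from last-column data, here is a minimal Python sketch. The `Conversation` schema, the function names, and the sample queries are all hypothetical. The keystroke metrics use simple assumed definitions (characters typed, and the fraction of the selected query the user did not have to type); published definitions vary.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Conversation:
    """Last column of one QAC conversation (hypothetical schema)."""
    final_prefix: str         # prefix typed when the conversation ended
    suggestions: List[str]    # last column, ordered top to bottom
    selected: Optional[str]   # chosen suggestion, or None if abandoned


def abandonment_rate(convs: List[Conversation]) -> float:
    """Share of conversations that ended without a selection."""
    return sum(c.selected is None for c in convs) / len(convs)


def selected_ranks(convs: List[Conversation]) -> List[int]:
    """1-based position of the selection, for conversations with a selection."""
    return [c.suggestions.index(c.selected) + 1
            for c in convs if c.selected is not None]


def average_selected_position(ranks: List[int]) -> float:
    return sum(ranks) / len(ranks)


def success_rate_at_k(ranks: List[int], k: int) -> float:
    """Share of selections that were ranked within the top k positions."""
    return sum(r <= k for r in ranks) / len(ranks)


def mrr_at_k(ranks: List[int], k: int) -> float:
    """Mean reciprocal rank; selections below position k contribute 0."""
    return sum(1.0 / r if r <= k else 0.0 for r in ranks) / len(ranks)


def typed_chars(c: Conversation) -> int:
    """Assumed proxy for keystrokes: characters typed before selecting."""
    return len(c.final_prefix)


def effort_saved(c: Conversation) -> float:
    """Assumed definition: fraction of the selected query not typed."""
    return 1 - len(c.final_prefix) / len(c.selected)


# Made-up sample data, for illustration only.
convs = [
    Conversation("ne", ["netflix", "news", "new york times"], "news"),
    Conversation("wea", ["weather", "weather tomorrow"], "weather"),
    Conversation("xyz", ["xyzal"], None),  # abandoned
    Conversation("fac", ["face", "facts", "factory", "facebook"], "facebook"),
]
ranks = selected_ranks(convs)

print("abandonment rate:", abandonment_rate(convs))
print("average selected position:", average_selected_position(ranks))
print("success rate@3:", success_rate_at_k(ranks, 3))
print("MRR@3:", mrr_at_k(ranks, 3))
print("effort saved (first conversation):", effort_saved(convs[0]))
```

Note that the abandonment rate uses all conversations, while the position and keystroke metrics use only those that ended with a selection, matching the "Data Needed" column.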

Data

Tracking

Based on the metrics above, users' interactions with the last column of each conversation matter most. If resources are limited, we can track only this data. If resources allow, however, it is helpful to also keep a high-resolution log that records every keystroke.
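
To make the difference between the two tracking levels concrete, here is a small Python sketch of a per-keystroke log. The `Column` record, its field names, and the sample queries are assumptions made for illustration, not an actual log format. It shows that the last column can always be derived from the full log, whereas skip behavior can only be detected when every column is recorded.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Column:
    """One logged column: the suggestions shown for one prefix (hypothetical)."""
    prefix: str
    suggestions: List[str]
    clicked_position: Optional[int] = None  # 1-based; None if nothing was selected


def last_column(columns: List[Column]) -> Column:
    """The low-resolution view: only the final column of the conversation."""
    return columns[-1]


def skipped_columns(columns: List[Column]) -> List[Column]:
    """Earlier columns that already contained the finally selected query."""
    final = columns[-1]
    if final.clicked_position is None:
        return []  # abandoned conversation: no selected query to look for
    target = final.suggestions[final.clicked_position - 1]
    return [c for c in columns[:-1] if target in c.suggestions]


# Made-up high-resolution log of one conversation, one entry per keystroke.
log = [
    Column("f", ["facebook", "fox news", "flights"]),
    Column("fa", ["facebook", "fandango", "fantasy football"]),
    Column("fac", ["facebook", "face swap", "factory"], clicked_position=1),
]

print("last column prefix:", last_column(log).prefix)
print("skipped at prefixes:", [c.prefix for c in skipped_columns(log)])
```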

Here is an example of the processed data from Yahoo

Open Datasets

Open datasets are available from:

Yahoo
AOL
Bing

Query Auto Completion (QAC) - Metrics & Data