- QAC Example & Term Definition
- Two Important User Behaviors
- Skip Behavior
- Observation Bias (position bias)
- Evaluation Metrics
- Learning from User Behavior
- Metrics
- Data
- Tracking
- Open Datasets
- Reference
QAC Example & Term Definition
Term definitions:
- Prefix: the partial query the user has typed so far
- Block (or column):
  - the list of suggested results shown for a prefix
  - there is one block for every prefix, so a single conversation contains several blocks
- Conversation:
  - begins when a user starts typing a prefix and ends when the user selects one of the suggested results or abandons the query
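The terms above map naturally onto a small data structure. The following is a minimal sketch, assuming Python dataclasses; the class names, field names, and the sample queries are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Block:
    """Suggestion list (column) shown for one prefix."""
    prefix: str
    suggestions: List[str]


@dataclass
class Conversation:
    """All blocks from the first keystroke until selection or abandonment."""
    blocks: List[Block] = field(default_factory=list)
    selected_query: Optional[str] = None  # None means the user abandoned

    @property
    def abandoned(self) -> bool:
        return self.selected_query is None


# Hypothetical conversation: the user types "n", "ne", "new", then selects.
conv = Conversation(
    blocks=[
        Block("n", ["netflix", "news", "nba"]),
        Block("ne", ["netflix", "news", "new york times"]),
        Block("new", ["news", "new york times", "new balance"]),
    ],
    selected_query="new york times",
)
print(len(conv.blocks), conv.abandoned)
```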
Two Important User Behaviors
Skip Behavior
In a conversation, users frequently skip a suggested column even though it already contains the query they finally select.
The underlying reasons relate to the device and to the user's typing skill. For example, fast typists tend to keep typing additional characters without examining the completions.
Example of skipping behavior
Frequency of skipping behavior (a study of Yahoo QAC log data)
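Skipping can be detected from a logged conversation by checking whether the finally selected query already appeared in an earlier column. The sketch below assumes each conversation is available as an ordered list of (prefix, suggestions) pairs and that the selection happens in the last column; the function name and sample data are hypothetical.

```python
from typing import List, Optional, Tuple

BlockT = Tuple[str, List[str]]  # (prefix, suggestions shown for that prefix)


def skipped_blocks(blocks: List[BlockT], selected: Optional[str]) -> List[int]:
    """Return indices of earlier blocks that already showed the final query.

    The last block is excluded because that is where the selection is
    assumed to happen. Abandoned conversations have no skips by definition.
    """
    if selected is None:
        return []
    return [
        i
        for i, (_, suggestions) in enumerate(blocks[:-1])
        if selected in suggestions
    ]


# Hypothetical conversation: "new york times" was already offered at "ne".
blocks = [
    ("n", ["netflix", "news", "nba"]),
    ("ne", ["netflix", "news", "new york times"]),
    ("new", ["news", "new york times", "new balance"]),
]
print(skipped_blocks(blocks, "new york times"))
```

Counting conversations with a non-empty result, divided by all conversations that ended in a selection, gives one possible estimate of how often skipping occurs.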
Observation Bias (position bias)
As in other ranking problems, most clicks (or interactions) concentrate on the top positions.
Because suggestions are laid out vertically, users observe the top position first. The strength of this effect also depends on the UI (or device).
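A quick way to see position bias in your own logs is to compute the share of clicks that each position receives. This is a minimal sketch, assuming clicked positions are logged as 1-based integers; the function name and the sample values are made up for illustration.

```python
from collections import Counter
from typing import Dict, Iterable


def click_share_by_position(clicked_positions: Iterable[int]) -> Dict[int, float]:
    """Fraction of all clicks that landed on each (1-based) position."""
    counts = Counter(clicked_positions)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {pos: counts[pos] / total for pos in sorted(counts)}


# Hypothetical clicked positions from eight conversations.
shares = click_share_by_position([1, 1, 1, 2, 1, 3, 2, 1])
print(shares)
```

A distribution that decays sharply with position is consistent with position bias, although it can also partly reflect a ranker that places good suggestions first.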
Evaluation Metrics
Learning from User Behavior
Based on the observations above, a conversation actually passes through several funnels between the moment the user begins to type and the moment the user selects one of the suggested queries.
For a suggested item at position n in the ith column, the funnels are:
- Funnel 1: Whether the user decides to stop typing and check the ith column
- Funnel 2: Whether the user checks the nth position
- Funnel 3: Whether the suggested item is relevant to what the user wants
What we really want to evaluate is the performance of the third funnel. However, the data we collect mostly reflects the combined influence of all three funnels. Keep this in mind when evaluating an auto-completion system.
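One way to make the combined effect explicit is a simple factorization, sketched here under the assumption that the three funnels act as a chain of conditional steps. The symbols are notation introduced for this illustration only: S_i (the user stops to look at column i), E_{i,n} (the user examines position n), R_{i,n} (the item is relevant), and C_{i,n} (the item is clicked).

```latex
P(C_{i,n} = 1) = P(S_i = 1) \cdot P(E_{i,n} = 1 \mid S_i = 1) \cdot P(R_{i,n} = 1 \mid E_{i,n} = 1)
```

A raw click-through rate estimates the left-hand side. A low value for an item therefore does not by itself imply low relevance (the third factor): the item may simply sit in a column that was skipped or at a position that was not examined.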
Metrics
| Basic Idea | Metrics | Data Needed |
| --- | --- | --- |
| How many users started typing a query but never selected a result? | Drop-off rate (abandonment rate) | 1. Users' interactions with the last column of each conversation |
| When a result is selected, what was its position in the suggestion list (was it near the top)? | 1. Average selected position; 2. Success rate in the top K positions (success rate@K); 3. Mean Reciprocal Rank@K (MRR@K); 4. Mean Average Precision@K (MAP@K); 5. … | 1. Only conversations that ended with a click (or selection); 2. Only users' interactions with the last column of each conversation |
| How many characters did the user have to type before being able to click on the result they were looking for? | 1. Minimal keystrokes; 2. Effort saved | 1. Only conversations that ended with a click (or selection); 2. Only users' interactions with the last column of each conversation |
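The metrics in the table can be computed from last-column data alone. The sketch below assumes one record per conversation in a hypothetical (prefix, suggestions, selected query) layout. The effort-saved formula used here (one minus typed characters divided by the length of the selected query) is one simple proxy chosen for illustration, and the mean typed-prefix length is only a rough stand-in for minimal keystrokes; other definitions exist.

```python
from typing import List, Optional, Sequence, Tuple

# One record per conversation, reduced to its last column (hypothetical layout):
# (typed prefix, suggestions shown, selected query or None if abandoned)
LastColumn = Tuple[str, Sequence[str], Optional[str]]


def abandonment_rate(logs: List[LastColumn]) -> float:
    """Share of conversations that ended without a selection."""
    return sum(1 for _, _, sel in logs if sel is None) / len(logs)


def selected_ranks(logs: List[LastColumn]) -> List[int]:
    """1-based position of the selected query, clicked conversations only."""
    return [
        list(sugg).index(sel) + 1
        for _, sugg, sel in logs
        if sel is not None and sel in sugg
    ]


def average_selected_position(ranks: List[int]) -> float:
    return sum(ranks) / len(ranks)


def success_rate_at_k(ranks: List[int], k: int) -> float:
    """Fraction of selections found within the top k positions."""
    return sum(1 for r in ranks if r <= k) / len(ranks)


def mrr_at_k(ranks: List[int], k: int) -> float:
    """Mean reciprocal rank; selections below position k contribute 0."""
    return sum(1.0 / r for r in ranks if r <= k) / len(ranks)


def mean_typed_chars(logs: List[LastColumn]) -> float:
    """Average prefix length at the moment of selection."""
    lengths = [len(prefix) for prefix, _, sel in logs if sel]
    return sum(lengths) / len(lengths)


def effort_saved(logs: List[LastColumn]) -> float:
    """Assumed proxy: 1 - typed characters / length of the selected query."""
    saved = [1 - len(prefix) / len(sel) for prefix, _, sel in logs if sel]
    return sum(saved) / len(saved)


# Hypothetical log of four conversations (one abandoned).
logs: List[LastColumn] = [
    ("new", ["news", "new york times", "new balance"], "new york times"),
    ("fa", ["facebook", "fandango"], "facebook"),
    ("wea", ["weather", "wealth"], None),
    ("amaz", ["amazing race", "amazon prime", "amazon"], "amazon"),
]
ranks = selected_ranks(logs)
print(abandonment_rate(logs), ranks, average_selected_position(ranks))
print(success_rate_at_k(ranks, 1), mrr_at_k(ranks, 3), effort_saved(logs))
```

Because each conversation has exactly one selected query, average precision reduces to the reciprocal rank of that query, so MAP@K coincides with MRR@K in this setting.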
Data
Tracking
Based on the metrics above, users' interactions with the last column of each conversation matter most. If resources are limited, we can track only this data. If resources allow, it is also helpful to track a high-resolution log that records every keystroke.
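The two tracking levels are related: a high-resolution log can always be reduced to last-column data afterwards. The sketch below assumes a hypothetical per-keystroke event record; the field names are invented for illustration and do not reflect the format of any particular dataset.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class KeystrokeEvent:
    """One logged keystroke (hypothetical schema)."""
    conversation_id: str
    timestamp_ms: int
    prefix: str
    suggestions: List[str]
    clicked: Optional[str] = None  # set only on the event where a click happened


def last_columns(events: List[KeystrokeEvent]) -> Dict[str, KeystrokeEvent]:
    """Keep only the latest event of each conversation."""
    latest: Dict[str, KeystrokeEvent] = {}
    for ev in events:
        cur = latest.get(ev.conversation_id)
        if cur is None or ev.timestamp_ms >= cur.timestamp_ms:
            latest[ev.conversation_id] = ev
    return latest


# Hypothetical events: conversation c1 ends with a click, c2 is abandoned.
events = [
    KeystrokeEvent("c1", 1000, "n", ["netflix", "news"]),
    KeystrokeEvent("c1", 1200, "ne", ["netflix", "news"]),
    KeystrokeEvent("c1", 1500, "new", ["news", "new balance"], clicked="news"),
    KeystrokeEvent("c2", 2000, "w", ["weather", "walmart"]),
]
reduced = last_columns(events)
print(reduced["c1"].prefix, reduced["c1"].clicked, reduced["c2"].clicked)
```

The reverse is not possible, which is why behaviors such as skipping can only be studied when every keystroke is logged.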
Here is an example of the processed data from Yahoo
Open Datasets
Several open datasets are available from:
- Yahoo
- AOL
- Bing