- Recommender System in Real-world Scenarios
- Objectives & Requirements of Each Stage
- Two Ways to Combine Embedding with Recommender
- Pre-trained Embedding
- Embedding Layer of Model
- Pros and Cons
- Embedding-based Candidate Generation
- First, Forget About Recommender, What’s Vector and Vector Space?
- From Vector and Vector Space to Candidate Generation
- Content-based Filtering (Item2Item)
- Collaborative Filtering (User2Item and User2User2Item)
- Reference
Recommender System in Real-world Scenarios
In real-world scenarios, one of the main challenges is the huge number of items and the complexity of the business. As a result, most of the time the solution is a multi-stage recommender that combines machine learning models with business rules.
The two-stage recommender is the most classic design (but I also include re-ranking in the diagram below).
Below are some industry practices:
Objectives & Requirements of Each Stage
| Stage | Objectives | Requirements | Methods |
|---|---|---|---|
| Candidate Generation (Match / Retrieval / Recall) | Remove irrelevant items and reduce the number of items | 1. Fast inference speed → simpler models, fewer features; 2. Does not need to be highly accurate; 3. Retrieve from different sources | Multi-strategy candidate generation: retrieve items with different methods such as rule-based, content-based, and collaborative-filtering models |
| Ranking / Scoring | Rank items as accurately as possible | More accurate → complex models, more features | Mainly dominated by machine learning models |
| Re-ranking | Meet specific business constraints or improve item diversity | | Most of the time → simple business rules |
Two Ways to Combine Embedding with Recommender
There are two ways to combine embedding with a recommender, and both can be applied in the candidate generation and ranking stages:
- Pre-trained embedding
- Embedding layer in model
Pre-trained Embedding
As the name indicates, the embedding is trained separately before model training, and these embeddings are concatenated with other features as the input to the model.
In part 1 we introduced many methods to obtain embeddings: Item2Vec, RandomWalk, Node2Vec, and GraphSAGE. All of their embeddings can be used as pre-trained embeddings to help downstream tasks.
Here are some examples
Actually, the methods for obtaining pre-trained embeddings are not limited to what we discussed in part 1. For example, in the work from Facebook, a gradient-boosted tree model is used to produce the pre-trained representation, which is actually the leaf node each tree assigns to a sample. This information is then fed into a linear classifier to predict CTR.
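To make the idea concrete, here is a minimal sketch (not the exact Facebook pipeline; the data, model sizes, and classifier choice are placeholders) of using tree leaf indices as pre-trained features for a downstream linear classifier:

```python
# Sketch: tree leaf indices as "pre-trained" categorical features for a CTR-style classifier.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import OneHotEncoder

X, y = make_classification(n_samples=2000, n_features=20, random_state=0)

# Stage 1: train the tree model that produces the leaf-node representation.
gbdt = GradientBoostingClassifier(n_estimators=50, max_depth=3, random_state=0)
gbdt.fit(X, y)

# apply() returns the leaf index each sample falls into, per tree.
leaves = gbdt.apply(X).reshape(X.shape[0], -1)

# Stage 2: one-hot encode the leaf indices and feed them into a linear classifier.
encoder = OneHotEncoder(handle_unknown="ignore")
lr = LogisticRegression(max_iter=1000)
lr.fit(encoder.fit_transform(leaves), y)
```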
Embedding Layer of Model
In this case, the main function of embedding is to transform features from a high-dimensional sparse representation into a low-dimensional dense representation.
Besides user_id and item_id, we can actually get embeddings for all categorical features. Even for continuous features, it is common practice to bin them into categorical features (to improve generalization and reduce over-fitting) and then apply an embedding layer.
Below are some examples
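As a minimal sketch (assuming PyTorch, with made-up vocabulary sizes, dimensions, and bucket boundaries), an embedding layer turns sparse ids, including binned continuous features, into dense vectors that are concatenated as model input:

```python
import torch
import torch.nn as nn

NUM_ITEMS = 100_000      # hypothetical item vocabulary size
NUM_AGE_BUCKETS = 10     # a continuous feature (age) binned into buckets

item_emb = nn.Embedding(NUM_ITEMS, 32)       # sparse id -> 32-d dense vector
age_emb = nn.Embedding(NUM_AGE_BUCKETS, 4)   # embedding for the binned continuous feature

item_ids = torch.tensor([5, 42, 7])
ages = torch.tensor([23.0, 61.0, 35.0])
age_bucket = torch.bucketize(ages, torch.tensor([18., 25., 35., 45., 55., 65.]))

# Concatenate dense embeddings as the model input.
dense = torch.cat([item_emb(item_ids), age_emb(age_bucket)], dim=1)
```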
Pros and Cons
| | Pros | Cons |
|---|---|---|
| Pre-trained embedding | 1. Accelerates training for downstream tasks; 2. Improves the performance of downstream tasks | Since the embedding is not tailored to the downstream task, it may lead to suboptimal performance |
| Embedding layer | Learned specifically for the task → potentially better performance | 1. Due to the huge number of embedding parameters, it takes time to train; 2. If training data is insufficient → potentially poor performance |
Generally speaking, for high-cardinality features such as user_id and item_id, it's always a good idea to pre-train them with extensive sequence or graph data. In addition, it's also a good option to initialize the embedding layer with pre-trained embeddings and then keep updating them while training the model.
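As a small sketch (the pre-trained matrix here is a random placeholder standing in for, e.g., Item2Vec output), this is what initializing an embedding layer from pre-trained vectors while keeping it trainable looks like in PyTorch:

```python
import torch
import torch.nn as nn

pretrained = torch.randn(100_000, 32)  # placeholder for real pre-trained item vectors
# freeze=False keeps the embedding trainable, so it is fine-tuned during model training.
item_emb = nn.Embedding.from_pretrained(pretrained, freeze=False)
```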
Embedding-based Candidate Generation
One of the most important applications of embedding is embedding-based candidate generation. Understanding it is crucial for following the real-world practices covered in the upcoming blogs.
First, Forget About Recommender, What’s Vector and Vector Space?
I believe most of us learned about vectors and vector spaces in a linear algebra course, and they have extensive applications in math and physics.
Basically, vectors are mathematical entities that have both magnitude and direction, and they can often be pictured as arrows pointing in a certain direction in a vector space.
Taking 2D vectors as an example, below are several of them:
There are several ways to measure the similarity between vectors, such as cosine similarity, dot product, Euclidean distance, and so on. We'll leave the details and comparison of these similarity measures to future blogs. Here we only discuss the basic idea.
Take cosine similarity as an example, which only considers the angle between vectors and discards their lengths. For two vectors A and B, the cosine similarity between them is:

$$\text{cos\_sim}(A, B) = \cos\theta = \frac{A \cdot B}{\|A\|\,\|B\|}$$
So for the 2D vectors above:
- $\text{cos\_sim}(v_1, v_2) = 1$, which means v1 and v2 point in the same direction
- $\text{cos\_sim}(v_1, v_4) = -1$, which means v1 and v4 are completely different (they point in opposite directions)
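A quick numeric sketch of the formula above, using NumPy; the vectors here are placeholders rather than the exact ones in the figure:

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    # cos(theta) = (a . b) / (||a|| * ||b||)
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

v1 = np.array([1.0, 1.0])
v2 = np.array([2.0, 2.0])    # same direction as v1 -> similarity 1.0
v4 = np.array([-1.0, -1.0])  # opposite direction    -> similarity -1.0

print(cosine_similarity(v1, v2), cosine_similarity(v1, v4))
```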
From Vector and Vector Space to Candidate Generation
Embedding-based candidate generation builds on the idea of vector spaces and vector similarity, and it falls into two main categories:
- Content-based filtering, which is also called Item2Item.
- Collaborative Filtering, which includes User2Item and User2User2Item.
Content-based Filtering (Item2Item)
Basic Idea
The basic idea is to use the similarity between items to recommend items similar to what the user likes. For example, if user A watches two cute cat videos, the system can recommend other cute animal videos to that user.
How to Get This Embedding?
Basically, any of the methods introduced in part 1 can be used: Item2Vec, RandomWalk, Node2Vec, and so on.
We can borrow an example from the recommender course. Assume there are two dimensions:
- X axis: whether the movie is for children (negative values) or adults (positive values)
- Y axis: the degree to which each movie is a blockbuster or an arthouse movie
Then all movies can be represented as the 2-dimensional embeddings below (let's ignore the users for now).
Detailed Steps of Content-based Filtering
- Find an item the user liked, for example the Harry Potter movie
- Calculate the similarity (cosine similarity, dot product, or whatever) between the embedding of this item and those of the remaining items
- Sort items by similarity
- Take the top N items and recommend them
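Here is a minimal sketch of these steps; the embeddings are random placeholders and `top_n_similar` is an illustrative helper, not a library function:

```python
import numpy as np

def top_n_similar(liked_item_id: int, item_embeddings: np.ndarray, n: int = 10):
    # Normalize so that a dot product equals cosine similarity.
    norms = np.linalg.norm(item_embeddings, axis=1, keepdims=True)
    normalized = item_embeddings / norms
    scores = normalized @ normalized[liked_item_id]      # similarity to every item
    ranked = np.argsort(-scores)                         # most similar first
    return [i for i in ranked if i != liked_item_id][:n]  # drop the liked item itself

item_embeddings = np.random.rand(1000, 32)  # placeholder pre-trained item embeddings
print(top_n_similar(liked_item_id=42, item_embeddings=item_embeddings, n=5))
```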
Collaborative Filtering (User2Item and User2User2Item)
Basic Idea
The basic idea is to use the similarity between users and items simultaneously to provide recommendations. For example, if user A is similar to user B, and user B likes video 1, then the system can recommend video 1 to user A (even if user A hasn't seen any videos similar to video 1).
How to Get This Embedding?
At the beginning, you might find it hard to understand the similarity between a user and an item, and why there is any similarity between two different kinds of entities.
Well, it should really be thought of as the relevance between a user and an item, or the similarity between the user embedding and the item embedding. So if a user and an item are relevant (for example, the user has clicked/booked/liked the item), we try to make their embeddings as close as possible in the vector space.
For the example from the previous section, based on users' preferences, users can also be mapped into the same 2-dimensional embedding space. Since the embeddings of user1 and the movie Shrek are close (similar), we can recommend Shrek to user1.
There are two main approaches to obtain these embeddings, and the key is to map users and items into the same vector space so that their similarity is measurable.
The first one is the classic collaborative filtering method: matrix factorization. Its downside is that context information cannot be used, which constrains its performance.
The second one, which is also the most widely adopted, is the two-tower neural network. It can include more features and still has the advantage of fast inference speed.
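As a minimal sketch (the feature set and architecture are made up; a real user tower would also consume context features and deeper MLP layers), a two-tower network maps users and items into the same vector space and scores their relevance with a dot product:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TwoTower(nn.Module):
    def __init__(self, n_users: int, n_items: int, dim: int = 32):
        super().__init__()
        # Each tower maps its side's features to the shared embedding space.
        self.user_tower = nn.Sequential(nn.Embedding(n_users, 64), nn.Linear(64, dim))
        self.item_tower = nn.Sequential(nn.Embedding(n_items, 64), nn.Linear(64, dim))

    def forward(self, user_ids, item_ids):
        u = F.normalize(self.user_tower(user_ids), dim=-1)
        v = F.normalize(self.item_tower(item_ids), dim=-1)
        return (u * v).sum(-1)  # cosine similarity as the relevance score

model = TwoTower(n_users=10_000, n_items=50_000)
score = model(torch.tensor([1, 2]), torch.tensor([10, 20]))
```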
Detailed Steps of Collaborative Filtering
The steps are similar to those of content-based filtering. The main differences are in the first two steps.
- Get the user information (and, for the two-tower neural network, the context information)
- Retrieve the user and item embeddings
    - For matrix factorization, user and item embeddings are stored in a cache beforehand
    - For the two-tower neural network, item embeddings are pre-calculated and cached; the user embedding is obtained by feeding user and context information into the user tower
- Calculate the similarity between the user embedding and the embeddings of all items
- Sort items based on similarity
- Find the top N items and recommend them
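A minimal sketch of this serving path, with placeholder embeddings (in a real system the item vectors would come from the item tower offline, and the user vector from the user tower at request time):

```python
import torch
import torch.nn.functional as F

item_vectors = F.normalize(torch.randn(50_000, 32), dim=-1)  # cached item embeddings
user_vector = F.normalize(torch.randn(32), dim=0)            # from the user tower at request time

scores = item_vectors @ user_vector            # similarity to every item
top_items = torch.topk(scores, k=10).indices   # candidate items to recommend
```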