
Commit d74285f

Author: TimePi
Commit message: add blogs
1 parent 6f89cd7 commit d74285f

24 files changed: +608 −151 lines

blog/2019-05-28-first-blog-post.md

Lines changed: 0 additions & 12 deletions
This file was deleted.

blog/2019-05-29-long-blog-post.md

Lines changed: 0 additions & 44 deletions
This file was deleted.

blog/2021-08-01-mdx-blog-post.mdx

Lines changed: 0 additions & 24 deletions
This file was deleted.
-93.9 KB
Binary file not shown.

blog/2021-08-26-welcome/index.md

Lines changed: 0 additions & 29 deletions
This file was deleted.

blog/DeepFM.md

Lines changed: 124 additions & 0 deletions
# Application of the DeepFM Model in AppS

In the field of recommendation systems, efficiently combining low-order and high-order feature interactions to improve prediction accuracy has long been a key challenge. The DeepFM model offers a solution that combines memorization capacity and generalization ability by integrating Factorization Machines (FM) with Deep Neural Networks (DNN). This article introduces the application and effectiveness of DeepFM in the AppS business.

<img src="../static/images/deepfm.svg" title="" alt="DeepFM architecture" width="522" data-align="center">

## Introduction

DeepFM (Deep Factorization Machine) is a recommendation model that combines factorization machines (FM) with deep learning, aiming to capture low-order and high-order feature interactions simultaneously. Its architecture consists of two components: the FM component, which captures low-order feature interactions, and the Deep component, which learns high-order feature interactions through a multi-layer perceptron (MLP).

### FM Module

- **Function**: The FM module focuses on capturing second-order interactions between features. It leverages feature embeddings to compute interaction terms and efficiently represent relationships between sparse features.
- **Advantage**: By modeling low-order interactions, the FM module handles sparse data effectively, making it particularly suitable for scenarios with a large number of sparse features.
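The second-order term can be computed in linear time using the standard FM identity: the sum of all pairwise dot products equals half of (the square of the sum minus the sum of the squares) of the embedding vectors. A minimal numeric sketch verifying this identity (the function name and shapes are illustrative):

```python
import torch

def fm_pairwise(v):
    """Sum of pairwise dot products over fields: sum_{i<j} <v_i, v_j>.

    v: (num_fields, embed_dim) embedding vectors for one sample.
    """
    square_of_sum = v.sum(dim=0) ** 2    # (embed_dim,)
    sum_of_square = (v ** 2).sum(dim=0)  # (embed_dim,)
    return 0.5 * (square_of_sum - sum_of_square).sum()

# Brute-force check against the O(n^2) definition
v = torch.randn(5, 8)
brute = sum((v[i] * v[j]).sum() for i in range(5) for j in range(i + 1, 5))
assert torch.allclose(fm_pairwise(v), brute, atol=1e-4)
```

This trick is why the FM module scales to a large number of sparse fields: the cost is linear in the number of fields rather than quadratic.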
15+
16+
### DNN Module
17+
18+
- **Function**: The DNN module is used to learn high-order feature combinations. Through a multi-layer neural network, DNN can capture complex nonlinear feature interactions.
19+
- **Customization Capability**: Users can design the network structure of the DNN according to specific needs, including the number of layers, the number of neurons in each layer, activation functions, and regularization strategies.
20+
- **Advantage**: With a flexible structure design, the DNN module can generalize to new feature combinations and improve the model's adaptability to different data distributions.
21+
22+
### Benefits of DeepFM over FM
23+
24+
1. **Comprehensive Feature Interaction Capability**: Traditional FM models mainly focus on second-order interactions between features, whereas DeepFM can effectively capture high-order feature interactions by introducing a deep learning component, thus improving recommendation accuracy.
25+
26+
2. **No Need for Manual Feature Engineering**: DeepFM can automatically learn feature interactions, reducing the reliance on manual feature engineering, which is particularly useful for handling complex, large-scale datasets.
27+
28+
3. **Shared Feature Embeddings**: The feature embedding layer in DeepFM is shared between the FM and Deep components, making the model more efficient in capturing feature interactions while reducing the number of model parameters.
29+
30+
### Advantages of DeepFM
31+
32+
- **Comprehensive Capability**: DeepFM combines the strengths of FM and DNN, allowing it to learn both low-order and high-order feature interactions without the need for feature engineering.
33+
- **Model Simplicity**: Compared to training FM and DNN separately and then combining them, DeepFM maintains model compactness and efficiency by sharing the feature embedding layer.
34+
- **Wide Applicability**: Due to its flexibility and strong expressive power, DeepFM is widely used in fields such as ad click-through rate prediction and recommendation systems.
35+
36+
### Example Code for Developing DeepFM with PyTorch

In the following example, we develop the FM and DNN modules separately and then combine them into a complete DeepFM model.

```python
import torch
import torch.nn as nn

class FM(nn.Module):
    """Second-order FM interaction over per-field embeddings."""
    def forward(self, x):
        # x: (batch, num_fields, embed_dim); sum over the field dimension
        square_of_sum = torch.sum(x, dim=1) ** 2   # (batch, embed_dim)
        sum_of_square = torch.sum(x ** 2, dim=1)   # (batch, embed_dim)
        return 0.5 * torch.sum(square_of_sum - sum_of_square, dim=1, keepdim=True)

class DNN(nn.Module):
    """MLP over the flattened embeddings, with a scalar output head."""
    def __init__(self, input_dim, dims):
        super().__init__()
        layers = []
        for dim in dims:
            layers.append(nn.Linear(input_dim, dim))
            layers.append(nn.ReLU())
            input_dim = dim
        layers.append(nn.Linear(input_dim, 1))     # scalar logit
        self.layers = nn.Sequential(*layers)

    def forward(self, x):
        return self.layers(x)

class DeepFM(nn.Module):
    def __init__(self, field_dims, embed_dim, mlp_dims):
        super().__init__()
        # Second-order embeddings, one table per field, shared by FM and DNN
        self.embeddings = nn.ModuleList([
            nn.Embedding(dim, embed_dim) for dim in field_dims
        ])
        # First-order (linear) term: one scalar weight per category in each field
        self.linear = nn.ModuleList([
            nn.Embedding(dim, 1) for dim in field_dims
        ])
        self.bias = nn.Parameter(torch.zeros(1))
        self.fm = FM()
        self.dnn = DNN(len(field_dims) * embed_dim, mlp_dims)

    def forward(self, x):
        # x: (batch, num_fields) of category indices
        x_emb = torch.stack(
            [emb(x[:, i]) for i, emb in enumerate(self.embeddings)], dim=1
        )  # (batch, num_fields, embed_dim)
        x_linear = sum(emb(x[:, i]) for i, emb in enumerate(self.linear)) + self.bias
        x_fm = self.fm(x_emb)
        x_dnn = self.dnn(x_emb.view(x_emb.size(0), -1))
        return x_linear + x_fm + x_dnn  # (batch, 1) logit

# Example usage:
field_dims = [10, 10, 10]  # Example field cardinalities
embed_dim = 10
mlp_dims = [64, 32]
model = DeepFM(field_dims, embed_dim, mlp_dims)

# Dummy input
x = torch.randint(0, 10, (4, len(field_dims)))  # Batch size 4
output = model(x)
print(output)
```
## Application

### 1. Feature Embedding Configuration

In our DeepFM model, the embedding dimension for each feature is set to 10. This configuration effectively captures low-order feature interactions and provides a solid foundation for subsequent high-order feature learning through the deep neural network.

### 2. Model Training and Optimization

Building on our experience with FM model training, the DeepFM model excels at combining memorization and generalization. The FM component captures low-order feature interactions, while the DNN component learns high-order feature combinations. This combination achieves excellent results in the current business scenario.

- **Memorization Capability**: DeepFM uses the FM component's low-order interactions to capture known, stable feature combinations.
- **Generalization Capability**: Through the DNN component, DeepFM can discover new, latent high-order feature combinations, enhancing the prediction of user behavior.
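As an illustrative sketch of a single training step for such a CTR model (the stand-in network, optimizer, learning rate, and loss below are assumptions for demonstration, not the production configuration):

```python
import torch
import torch.nn as nn

# Stand-in network with the same (batch, 1) logit output shape as DeepFM.
model = nn.Sequential(nn.Linear(30, 16), nn.ReLU(), nn.Linear(16, 1))
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
criterion = nn.BCEWithLogitsLoss()  # binary click / no-click target

features = torch.randn(4, 30)                 # dummy feature batch
labels = torch.randint(0, 2, (4, 1)).float()  # dummy click labels

optimizer.zero_grad()
loss = criterion(model(features), labels)
loss.backward()
optimizer.step()
print(loss.item())
```

Because the model emits raw logits, `BCEWithLogitsLoss` is the numerically stable choice; a sigmoid is applied only when a click probability is needed at serving time.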
### 3. A/B Testing Results

In the "Guess You Like" module, deploying the DeepFM model led to a **4.66%** increase in average distribution per user. This result indicates that DeepFM significantly enhances the quality of personalized recommendations.

<img src="../static/images/DeepFM-AB.png" title="" alt="DeepFM A/B test results" width="522" data-align="center">

## Further Reading

- [DeepFM: A Factorization-Machine based Neural Network for CTR Prediction - arXiv](https://arxiv.org/abs/1703.04247)
- [Deep Factorization Machines — Dive into Deep Learning](https://d2l.ai/chapter_recommender-systems/deepfm.html)
- [DeepFM for recommendation systems explained with codes](https://medium.com/data-science-in-your-pocket/deepfm-for-recommendation-systems-explained-with-codes-c200063990f7)

blog/ESMM.md

Lines changed: 73 additions & 0 deletions
# Application of the ESMM Model in AppS

In modern recommendation systems, particularly within the AppS business environment, predicting user behaviors such as Click-Through Rate (CTR) and Conversion Rate (CVR) is crucial for enhancing user satisfaction and driving business growth. The ESMM model, with its unique architecture and efficient multi-task learning capability, offers an outstanding solution for the AppS business.

<img title="" src="../static/images/ESMM-origin.webp" alt="ESMM architecture" width="522" data-align="center">

## Introduction

ESMM, short for Entire Space Multi-task Model, is a multi-task learning model designed to tackle problems in ad recommendation and user-behavior prediction. Its core idea is to enhance overall performance by learning multiple related tasks simultaneously. This approach not only shares latent information between tasks but also effectively alleviates the issue of data sparsity.

The ESMM model is typically applied to predict CTR and CVR. Traditional methods often train two separate models for these tasks, whereas ESMM performs both prediction tasks within a unified framework, capturing the correlation between them more effectively.

To learn more about the foundational concepts of ESMM, you can read the [academic paper on ESMM](https://arxiv.org/abs/1804.07931).

### Major Advantages of ESMM

- **Data Efficiency**: By sharing the feature space, ESMM can better utilize data, especially in sparse-data scenarios.
- **Performance Enhancement**: By jointly learning multiple tasks, ESMM can better capture the mutual influences between related tasks, improving prediction accuracy.
- **Simplified Architecture**: Compared to training multiple models independently, ESMM provides a more streamlined and efficient solution.

### Differences Between ESMM and MMOE

In multi-task learning, besides ESMM, another popular model is MMOE (Multi-gate Mixture-of-Experts). Both aim to enhance the performance of multiple tasks by sharing information, but they differ significantly in architecture and application scenarios:

#### Architectural Differences

- **ESMM**: ESMM conducts multi-task learning by sharing the entire feature space. It uses a unified network structure to predict multiple tasks simultaneously (such as CTR and CVR) and enhances overall performance by sharing underlying features.
- **MMOE**: MMOE employs a more complex structure, introducing multiple expert networks and gating mechanisms to dynamically select suitable features and model paths for each task. Each task has its own gating network to select the most relevant information from the experts.

#### Application Scenarios

- **ESMM**: Suitable for scenarios where tasks are highly related and require extensive information sharing, particularly when data is sparse and efficient information utilization is needed.
- **MMOE**: More flexible, and applicable to scenarios where task correlations are weaker or personalized feature selection is required. Due to its expert-selection mechanism, MMOE performs better when task requirements conflict.

#### Performance Aspects

- **ESMM**: Provides stable performance improvements between related tasks through its simplified network architecture and efficient feature sharing.
- **MMOE**: Capable of higher prediction accuracy in complex task environments through flexible expert selection, especially when task requirements are diverse.

## Application

### Similarity in Basic Structure Between ESMM and MMOE

The ESMM model shares many structural similarities with the traditional MMOE model. Both employ a multi-task learning framework to enhance the performance of different tasks by sharing information. However, ESMM adopts a different approach in the final conversion prediction: it computes the predicted post-click conversion probability (pCTCVR) as the product of the outputs of the CTR and CVR towers, a design aimed at fully capturing the interaction between CTR and CVR.
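This product formulation can be sketched in a few lines of PyTorch (the class name, tower sizes, and dummy inputs below are illustrative assumptions, not the production model):

```python
import torch
import torch.nn as nn

class ESMMSketch(nn.Module):
    """Minimal two-tower ESMM head: pCTCVR = pCTR * pCVR over the entire space."""
    def __init__(self, input_dim, hidden_dim=16):
        super().__init__()
        def tower():
            return nn.Sequential(
                nn.Linear(input_dim, hidden_dim), nn.ReLU(),
                nn.Linear(hidden_dim, 1),
            )
        self.ctr_tower = tower()
        self.cvr_tower = tower()

    def forward(self, x):
        p_ctr = torch.sigmoid(self.ctr_tower(x))  # P(click)
        p_cvr = torch.sigmoid(self.cvr_tower(x))  # P(conversion | click)
        p_ctcvr = p_ctr * p_cvr                   # P(click and conversion)
        return p_ctr, p_cvr, p_ctcvr

x = torch.randn(4, 8)  # dummy shared features
p_ctr, p_cvr, p_ctcvr = ESMMSketch(input_dim=8)(x)
```

In training, losses are typically applied to pCTR and pCTCVR, both of which are observable over the entire exposure space, so the CVR tower is supervised implicitly and avoids the sample-selection bias of click-only training.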
### Key Components of the ESMM Model

<img title="" src="../static/images/ESMM.webp" alt="Key components of the ESMM model" width="522" data-align="center">

#### Two Expert Networks

When applied to the AppS business, ESMM uses two expert networks, responsible for handling features related to the CTR and CVR tasks, respectively. Through specialized network structures, ESMM can better extract and utilize task-specific information, thereby enhancing prediction accuracy.

#### Two Gating Mechanisms

In addition to the expert networks, ESMM employs two gating mechanisms to control the CTR and CVR tasks separately. These gates dynamically adjust the selection and utilization of features for each task, ensuring that each task receives the most suitable information flow. Through optimization of the gating mechanisms, ESMM provides more precise results in complex user-behavior prediction.
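The expert-plus-gate structure described above can be sketched as follows (the class name, two experts, and dimensions are illustrative assumptions about this deployment, not the production network):

```python
import torch
import torch.nn as nn

class GatedExperts(nn.Module):
    """Two shared expert networks combined per task by softmax gates."""
    def __init__(self, input_dim, expert_dim, num_experts=2, num_tasks=2):
        super().__init__()
        self.experts = nn.ModuleList([
            nn.Sequential(nn.Linear(input_dim, expert_dim), nn.ReLU())
            for _ in range(num_experts)
        ])
        self.gates = nn.ModuleList([
            nn.Linear(input_dim, num_experts) for _ in range(num_tasks)
        ])

    def forward(self, x):
        expert_out = torch.stack([e(x) for e in self.experts], dim=1)  # (B, E, D)
        task_inputs = []
        for gate in self.gates:
            w = torch.softmax(gate(x), dim=1).unsqueeze(-1)            # (B, E, 1)
            task_inputs.append((w * expert_out).sum(dim=1))            # (B, D)
        return task_inputs  # one representation per task (CTR, CVR)

ctr_repr, cvr_repr = GatedExperts(input_dim=8, expert_dim=16)(torch.randn(4, 8))
```

Each task-specific representation would then feed its own tower, whose outputs combine into pCTCVR as described earlier.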
## Experimental Results and Effects

In practical applications within the AppS business, the ESMM model has demonstrated significant results through A/B testing. In the "Guess You Like" module, the ESMM model achieved a 6.45% increase in average distribution per user.

<img title="" src="../static/images/ESMM-AB.png" alt="ESMM A/B test results" width="522" data-align="center">

## Further Reading

- [Entire Space Multi-Task Model: An Effective Approach for Estimating ... - arXiv](https://arxiv.org/abs/1804.07931)
- [ESMM — easy_rec 0.8.5 documentation](https://easyrec.readthedocs.io/en/latest/models/esmm.html)
- [GitHub - dai08srhg/ESMM: PyTorch implementation of Entire Space Multitask Model (ESMM)](https://github.com/dai08srhg/ESMM)
