One of the biggest questions facing the recommendation systems community is whether such systems should be evaluated solely on predictive accuracy, or should also account for other factors such as preference broadening. To appreciate the importance of this question, let us look at a couple of simulated scenarios in a dynamic setting.
Imagine we have an equal preference for action movies and comedy movies. Now, suppose these preferences are represented by two marbles in an urn, one red and one gray (meaning the two genres are initially equally preferred). Red denotes action movies and gray denotes comedy movies. Next, let us consider two simple models of how our preferences could evolve dynamically.
Personalize: In this model, we randomly draw a marble from the urn and watch a movie from the corresponding genre. After drawing the marble, we place it back and add another marble of the same color. So, if we picked a red marble in the first instance, in the next instance the urn has two red marbles and one gray marble. This model positively reinforces our choices, showing us more of what we like. After thousands of draws, the urn will be dominated by one kind of marble and it will be very hard to draw the other kind, even at random. This process is known as the Polya process. Polya processes are an excellent toy model for understanding self-reinforcing dynamical systems. Their outcomes are path dependent, and every final mix is equally likely: after 100 draws, the urn is exactly as likely to contain 85 red marbles as it is to contain just 1. But no matter which path is taken, the composition eventually locks in, and in many runs one choice ends up heavily dominating the other. By analogy, if a user is simply shown more of what she currently prefers, she could get stuck in a small universe of choices.
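To make these dynamics concrete, here is a minimal Python sketch of the Polya process (the function name and parameters are ours, purely for illustration):

```python
import random
from collections import Counter

def polya_urn(draws, seed=None):
    """Polya process: start with one red and one gray marble; each drawn
    marble is replaced along with one extra marble of the SAME color."""
    rng = random.Random(seed)
    red, gray = 1, 1
    for _ in range(draws):
        if rng.random() < red / (red + gray):
            red += 1    # drew red: positively reinforce red
        else:
            gray += 1   # drew gray: positively reinforce gray
    return red / (red + gray)

# Across many independent runs, the final red fraction is spread roughly
# uniformly over (0, 1): each run locks into whatever mix its early draws
# happened to favor.
finals = [polya_urn(1000, seed=s) for s in range(5000)]
print(Counter(round(f, 1) for f in finals))
```

Running this across many seeds makes the path dependence visible: the final red fraction spreads roughly uniformly over (0, 1), yet any single run drifts steadily toward whichever color its early draws favored.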
Balance: In the second model, we draw a marble randomly from the urn, place it back, and add another marble of the opposite color. So, if we picked a red marble in the first instance, in the next instance the urn has one red marble and two gray marbles. This model negatively reinforces our choices, showing us less of what we like. After infinitely many draws, the urn converges to equal proportions of each color. This process is known as the balancing process. Balancing processes are an excellent toy model for self-balancing dynamical systems. By analogy, if we diversify aggressively based on a user's current preferences, we would expose her to a large set of possible choices but would never really engage her in the short run.
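The same sketch with the reinforcement flipped illustrates the balancing process (again, our own illustrative code):

```python
import random

def balancing_urn(draws, seed=None):
    """Balancing process: start with one red and one gray marble; each drawn
    marble is replaced along with one extra marble of the OPPOSITE color."""
    rng = random.Random(seed)
    red, gray = 1, 1
    for _ in range(draws):
        if rng.random() < red / (red + gray):
            gray += 1   # drew red: negatively reinforce by adding gray
        else:
            red += 1    # drew gray: negatively reinforce by adding red
    return red / (red + gray)

# However the first few draws fall, every run is pulled toward a 50/50 mix.
print([round(balancing_urn(1000, seed=s), 3) for s in range(5)])
```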
Theoretically, on one hand we could have a Polya-process-based system that engages a user heavily in the short run but leads to boredom, loss of engagement, and a lack of preference broadening in the long run. On the other hand, we could have a balancing-process-based system that produces no immediate engagement but maintains long-term preferential diversity. So, which way do we lean? We'll outline some practical methods that machine learning practitioners can follow to take the proverbial "middle path" between these extremes, creating wholesome experiences in the long run with minor short-term trade-offs.
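Before getting to those methods, it may help to see what a middle path could look like inside the toy model itself. The sketch below is our own illustrative extension of the two urn models, not one of the practical methods discussed later; the p_same knob is made up for illustration:

```python
import random

def mixed_urn(draws, p_same=0.7, seed=None):
    """Interpolates between the two urn models: reinforce the drawn color
    with probability p_same and the opposite color otherwise. p_same=1.0
    recovers the Polya process; p_same=0.0 recovers the balancing process."""
    rng = random.Random(seed)
    red, gray = 1, 1
    for _ in range(draws):
        drew_red = rng.random() < red / (red + gray)
        reinforce_same = rng.random() < p_same
        if drew_red == reinforce_same:
            red += 1    # reinforce red
        else:
            gray += 1   # reinforce gray
    return red / (red + gray)
```

Setting p_same between the two extremes keeps some short-term reinforcement of the user's choices while retaining a pull back toward balance.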
But before that, let’s discuss some basics.
In our toy models, a simplistic argument let us denote preferences as marbles in an urn: static, precise, and single-dimensional. In reality, preferences are none of these things: they drift over time, they are observed only noisily through behavior, and they span many dimensions at once.
Philosophically, the problem of diversifying recommendations resembles the explore-exploit tradeoff. In reinforcement learning settings, the optimality of choices and the rewards can be reasonably well defined; but the hidden rules of human preference and psyche can change after every session.
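For readers less familiar with the tradeoff, here is a textbook epsilon-greedy bandit in Python (the two "genre" arms and their engagement rates are hypothetical):

```python
import random

def epsilon_greedy(arm_rates, rounds=10000, epsilon=0.1, seed=0):
    """Explore a random arm with probability epsilon; otherwise exploit the
    arm with the best observed average reward. arm_rates are the true
    (unknown to the agent) Bernoulli reward probabilities."""
    rng = random.Random(seed)
    counts = [0] * len(arm_rates)
    means = [0.0] * len(arm_rates)
    for _ in range(rounds):
        if rng.random() < epsilon:
            arm = rng.randrange(len(arm_rates))                      # explore
        else:
            arm = max(range(len(arm_rates)), key=means.__getitem__)  # exploit
        reward = 1.0 if rng.random() < arm_rates[arm] else 0.0
        counts[arm] += 1
        means[arm] += (reward - means[arm]) / counts[arm]            # running mean
    return means, counts

# Two hypothetical genres with fixed engagement rates of 30% and 50%.
print(epsilon_greedy([0.3, 0.5]))
```

Note that these reward rates are fixed, which is exactly the assumption called out above: real preferences are non-stationary, so the bandit setting's well-defined notion of optimality does not carry over cleanly.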
Given these complexities, it is challenging to derive a closed-form solution to this optimization problem, or even to define a single objective function. Below, we discuss some practical heuristics that, backed by rigorous quantitative and qualitative evaluation, could help us provide a more diversified user experience.
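As one concrete example of such a heuristic (maximal marginal relevance, a standard re-ranking technique, and not necessarily the one used here), candidates can be greedily re-ranked by trading relevance off against similarity to items already selected; the catalog and similarity function below are hypothetical:

```python
def mmr_rerank(candidates, relevance, similarity, k=10, lam=0.7):
    """Maximal Marginal Relevance: greedily pick k items, scoring each
    candidate as lam * relevance minus (1 - lam) * its maximum similarity
    to the items already selected."""
    selected, pool = [], list(candidates)
    while pool and len(selected) < k:
        def score(item):
            penalty = max((similarity(item, s) for s in selected), default=0.0)
            return lam * relevance(item) - (1 - lam) * penalty
        best = max(pool, key=score)
        selected.append(best)
        pool.remove(best)
    return selected

# Hypothetical catalog: under pure relevance the two action titles would
# crowd out the comedy, but MMR surfaces the comedy in the second slot.
rel = {"action_1": 0.9, "action_2": 0.85, "comedy_1": 0.6}
sim = lambda a, b: 1.0 if a.split("_")[0] == b.split("_")[0] else 0.1
print(mmr_rerank(rel, rel.get, sim, k=2))  # ['action_1', 'comedy_1']
```

The lam parameter plays a role similar to the urn mixing knob above: lam=1.0 ranks purely by relevance, while lower values push harder for diversity.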
Being driven solely by machine-learning-centric metrics like NDCG or product-centric metrics like the number of likes could lead to eventual boredom and disengagement. Keeping these factors in mind, we recommend optimizing for long-term user satisfaction. If you want to learn more about this work or are interested in joining one of our engineering teams, please visit our careers page or follow us on Facebook.