Jan 17, 2022
I’m curious how much difference the user embeddings make. For example, if you just embed each user's entire sequence of movies, padded to a fixed length, and train on a masked-item prediction task, similar to BERT's masked language modeling, you might get most of the benefits you are seeing here in a simpler way. You could even take off-the-shelf code and treat this as a language modeling task, with movie IDs as the tokens. I’ve done this for categorical data and it beats the pants off matrix factorization (MF). Anyway, great article.
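
As a rough sketch of that setup, the input preparation could look something like the code below. The helper name `pad_and_mask`, the reserved `PAD` and `MASK` ids, and the 15% mask rate are assumptions for illustration, not anything from the article. It also simplifies BERT's scheme by always substituting the mask id rather than sometimes keeping the item or swapping in a random one.

```python
import random

# Hypothetical reserved ids; real movie ids are assumed to start at 2.
PAD, MASK = 0, 1


def pad_and_mask(history, max_len, mask_prob=0.15, rng=random):
    """Pad one user's movie-id history and hide some items, BERT-style.

    Returns (inputs, labels). labels holds the original id at masked
    positions and PAD everywhere else, so a loss can skip those slots.
    """
    items = list(history[-max_len:])  # keep the most recent items
    inputs = list(items)
    labels = [PAD] * len(items)

    # Independently mask each item with probability mask_prob.
    for i, movie in enumerate(items):
        if rng.random() < mask_prob:
            inputs[i] = MASK
            labels[i] = movie

    # Make sure every non-empty sequence yields at least one target.
    if items and all(label == PAD for label in labels):
        i = rng.randrange(len(items))
        inputs[i], labels[i] = MASK, items[i]

    # Right-pad both lists to the fixed length.
    padding = [PAD] * (max_len - len(items))
    return inputs + padding, labels + padding


if __name__ == "__main__":
    inputs, labels = pad_and_mask([5, 9, 7], max_len=6, rng=random.Random(0))
    print(inputs)
    print(labels)
```

The resulting `(inputs, labels)` pairs are what a masked language model would train on, with the loss computed only where `labels` is not `PAD`.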