A User-Centric Evaluation of Democratic News Recommenders
by Nordin Bouchrit
Supervisor: dr. Damian Trilling
With the increasing availability of digital data, news organizations are increasingly encouraged to adopt recommender systems as the engine behind personalized news recommendations. Scholars have raised several concerns about the media having the power to automatically guide what users read, especially because of the potentially harmful effects on democracy (e.g. echo chambers, polarization). On the other hand, news recommender systems are also believed to have the potential to fulfill the democratic role of the media: by accommodating the diversity of the audience and supplying the right information, they can help inform the audience better and more effectively. Against this backdrop, designs for recommender systems have been formulated that are inspired by democratic theories and are believed to fulfill this potential democratic role. However, the proposed designs have never actually been developed into news recommender systems, let alone tested in a user environment. Therefore, this thesis had two goals: 1) to develop two recommender systems inspired by liberal and critical democratic theories, and 2) to take a user-centric approach by examining the difference in user experience between these two systems.
Developing the experimental environment
To reach these goals, a Python-based web application (3bij3) was used to set up an online evaluation experiment that logged implicit as well as explicit feedback in the form of user interactions. The 3bij3 framework made it possible to set up this environment by providing open-source code that, with a considerable amount of tweaking, can be used for your own research. When deployed, it functions as a ‘real’ online news application with a polished interface, ready to be used by multiple users (see Figure 1).
Figure 1: Desktop interface (left) and mobile interface (right) of the application
The application deployed for this thesis distinguished two groups by the algorithm used to select news items: the liberal recommender and the critical recommender. Users were randomly assigned to one of the groups after creating an account. Both recommender systems were content-based, which means that they selected news items based on the description of the item and the user’s profile. Both recommenders relied on implicit measures (i.e. past selections) to infer a preference for certain topics and construct the user profile.
The user profile was constructed by generating a topic list containing the topics of all articles the user had clicked in the past. From this topic list, three topics were picked at random, in such a way that topics the user had clicked more often had a higher chance of being picked. The two recommenders used in this study differed in how they selected articles based on these topics. The liberal recommender selected its six articles by showing two articles for each of the three topics. The critical recommender, in contrast, selected its six articles by doing the opposite: it excluded articles with the topics in the topic list.
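The selection logic described above can be illustrated with a minimal Python sketch. This is not the 3bij3 code: the function names, the article representation (dictionaries with "id" and "topic" keys), and the choice to sample three distinct topics are all assumptions made for illustration. The sketch also assumes that the critical recommender excludes only the three sampled topics and fills its six slots at random from the remaining articles.

```python
import random
from collections import Counter


def sample_topics(click_history, k=3, rng=random):
    """Pick up to k distinct topics; frequently clicked topics are more likely.

    Assumption: topics are drawn without replacement, weighted by click count.
    """
    counts = Counter(click_history)
    remaining = list(counts)
    chosen = []
    while remaining and len(chosen) < k:
        weights = [counts[t] for t in remaining]
        pick = rng.choices(remaining, weights=weights, k=1)[0]
        chosen.append(pick)
        remaining.remove(pick)
    return chosen


def liberal_recommend(articles, topics, per_topic=2, rng=random):
    """Hypothetical liberal recommender: two articles for each sampled topic."""
    picks = []
    for topic in topics:
        pool = [a for a in articles if a["topic"] == topic]
        picks.extend(rng.sample(pool, min(per_topic, len(pool))))
    return picks


def critical_recommend(articles, topics, n=6, rng=random):
    """Hypothetical critical recommender: only articles outside the sampled topics."""
    pool = [a for a in articles if a["topic"] not in topics]
    return rng.sample(pool, min(n, len(pool)))


if __name__ == "__main__":
    # Made-up example data: five topics with four articles each.
    all_topics = ["politics", "sports", "economy", "culture", "tech"]
    articles = [
        {"id": f"{t}-{i}", "topic": t} for t in all_topics for i in range(4)
    ]
    history = ["politics"] * 5 + ["sports"] * 3 + ["economy"] * 2

    topics = sample_topics(history)
    print("sampled topics:", topics)
    print("liberal:", [a["id"] for a in liberal_recommend(articles, topics)])
    print("critical:", [a["id"] for a in critical_recommend(articles, topics)])
```

With this example data, the liberal recommender returns two articles for each of the user's three sampled topics, while the critical recommender returns six articles drawn only from topics the user has not been matched to.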
To complete the full study, participants needed to use the system on at least seven different days with a considerable number of interactions. Upon completion, users were directed via an in-app link to an evaluation survey with question items measuring the constructs of the influential ResQue Framework. Participants subjectively evaluated the system on perceived system qualities, user beliefs, user attitudes and behavioral intentions. In addition, one objective system metric was examined: the time spent reading the articles, which is often used as an implicit measure of user interest as well as user engagement.
The thesis successfully developed and deployed two different recommender algorithms on the system but failed to find significant differences between the critical and liberal recommender systems on almost all subjective evaluations. In the end, the timespan of this thesis was too short to recruit enough participants, resulting in a statistically underpowered study. Although no significant differences were found between the two systems, the average scores show that both systems were generally evaluated around or slightly above the middle of the scale on every dimension. This suggests that even when the system ignores the wishes of the user by showing topics that do not match the user's interests, the performance of the system is still acceptable. With regard to the objective system metric, the study notably showed that users of the critical recommender system spent significantly more time reading the articles than users of the liberal recommender system.
A starting signal for more research
This thesis demonstrated that there is still much to learn and understand about user behavior and user evaluation of democratic news recommender systems and the underlying processes at play. It might inspire researchers with sufficient resources to study user behavior and evaluation with more advanced recommender systems. In addition, the study found that the overall average evaluation of the systems was not remarkably bad, which might inspire news organizations to assess their current systems and experiment with more democratic ones.