Article

Big Data, Small Personas: How Algorithms Shape the Demographic Representation of Data-Driven User Segments

Journal

BIG DATA
Volume 10, Issue 4, Pages 313-336

Publisher

MARY ANN LIEBERT, INC
DOI: 10.1089/big.2021.0177

Keywords

algorithms; fairness; personas; user segmentation


Creating user segments from data can produce biased and inconsistent results. A comparison of different algorithms reveals a trade-off between diversity and fairness. The choice of algorithm significantly affects how decision makers perceive the user population.
Drawing on the notion of algorithmic bias, we note that creating user segments such as personas from data may over- or under-represent certain segments (FAIRNESS), fail to represent the diversity of the user population (DIVERSITY), or produce inconsistent results when hyperparameters are changed (CONSISTENCY). Using data on 363M video views collected from a global news and media organization, we compare personas created from this data using different algorithms. Results indicate that the algorithms fall into two groups: those that generate personas with low diversity and high fairness, and those that generate personas with high diversity and low fairness. The algorithms that rank high on diversity tend to rank low on fairness (Spearman's correlation: -0.83). The algorithm that best balances diversity, fairness, and consistency is Spectral Embedding. The results imply that the choice of algorithm is a crucial step in data-driven user segmentation, because the algorithm fundamentally affects the demographic attributes of the generated personas and thus influences how decision makers view the user population. The results have implications for algorithmic bias in user segmentation and for creating user segments that consider not only commercial segmentation criteria but also criteria derived from ethical discussions in the computing community.
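The reported negative rank correlation between diversity and fairness can be illustrated with a minimal sketch. The function `spearman_rho` and the two rank lists below are hypothetical and chosen only for illustration; they are not the paper's data or code, and the tie-free shortcut formula is an assumption that holds only when no two algorithms share a rank.

```python
def spearman_rho(rank_a, rank_b):
    """Spearman's rank correlation for two tie-free rankings of the same items.

    Uses the shortcut formula rho = 1 - 6 * sum(d^2) / (n * (n^2 - 1)),
    which is valid only when neither ranking contains ties.
    """
    if len(rank_a) != len(rank_b):
        raise ValueError("rankings must have the same length")
    n = len(rank_a)
    if n < 2:
        raise ValueError("need at least two ranked items")
    # Sum of squared rank differences across items.
    d_squared = sum((a - b) ** 2 for a, b in zip(rank_a, rank_b))
    return 1 - 6 * d_squared / (n * (n ** 2 - 1))


# Hypothetical ranks for five unnamed algorithms (1 = best).
# These values are invented for illustration and are NOT from the paper.
diversity_rank = [1, 2, 3, 4, 5]
fairness_rank = [5, 3, 4, 2, 1]

rho = spearman_rho(diversity_rank, fairness_rank)
print(f"Spearman's rho: {rho:.2f}")
```

A strongly negative value here means that algorithms ranked near the top on one criterion sit near the bottom on the other, which is the pattern the abstract describes.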
