Article

Session stitching using sequence fingerprinting for web page visits

Journal

DECISION SUPPORT SYSTEMS
Volume 150

Publisher

ELSEVIER
DOI: 10.1016/j.dss.2021.113579

Keywords

Session stitching; Web analytics; Sequence mining; Session fingerprinting

Funding

  1. Edinburgh Parallel Computing Centre (EPCC)
  2. DataLab [7868323]

The way people navigate the web has changed significantly with the use of multiple and shared devices. Analyzing the resulting large volumes of seemingly disjoint data together can support decision-making, but linking that data back to a specific person is hard. This study introduces an alternative approach to session stitching that learns behavioral patterns from web page visit fingerprints to identify and match web sessions efficiently.
The nature of people's web navigation has changed significantly in recent years. The advent of smartphones and other handheld devices has given rise to web users consulting websites on more than one device, or using a shared device. As a result, large volumes of seemingly disjoint data are available which, when analysed together, can support decision-making. The task of identifying web sessions by linking such data back to a specific person, however, is hard. Session stitching aims to overcome this by using machine learning inference to identify similar or identical users. Many such efforts use demographic data or device-based features to train matching algorithms. However, these variables are often not available for every dataset or are recorded differently, making a streamlined setup difficult. Moreover, they often result in vast feature spaces that are hard to use for actionable interpretation. In this paper, we present an alternative approach based on the fingerprinting of web pages visited by users in a single session. By learning behavioural patterns from these sequences of page visits, we obtain features that can be used for matching without requiring sensitive user-agent data such as IP address, geolocation, or device details, as is common with other approaches. Using these sequential fingerprints does not rely on pre-defined features, but only requires the recording of web page visits, making our approach actionable. The approach is empirically tested on real-life web logs and compared with matching using regular user-agent features and state-of-the-art embedding techniques. Results in an ecommerce context show that sequential features can still obtain strong performance with fewer features, facilitating decision-making on session stitching and informing subsequent related activities such as marketing or customer analysis.
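
To make the idea of matching sessions on page-visit sequences more concrete, the following is a minimal illustrative sketch only. It is not the method from the paper, whose details are not given in the abstract. It assumes a session is simply an ordered list of page identifiers, builds a "fingerprint" as a count of consecutive page pairs (bigrams), and compares fingerprints with cosine similarity. The function names (`fingerprint`, `cosine`, `best_match`), the page names, and the session IDs are all hypothetical.

```python
from collections import Counter
from math import sqrt


def fingerprint(pages, n=2):
    """Count the n-grams of consecutive page visits in one session (assumed representation)."""
    return Counter(tuple(pages[i:i + n]) for i in range(len(pages) - n + 1))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors stored as Counters."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def best_match(query, candidates, n=2):
    """Return the ID of the most similar candidate session, plus all similarity scores."""
    fp = fingerprint(query, n)
    scores = {sid: cosine(fp, fingerprint(pages, n)) for sid, pages in candidates.items()}
    return max(scores, key=scores.get), scores


# Hypothetical sessions: only the visited pages are recorded, no device or IP data.
sessions = {
    "s1": ["home", "search", "product", "cart"],
    "s2": ["home", "blog", "about"],
}
query = ["home", "search", "product", "checkout"]

match, scores = best_match(query, sessions)
print(match, scores)
```

In this toy example the query shares two of its three page transitions with `s1` and none with `s2`, so `s1` is returned as the closest session. A real system would likely learn richer sequence features and a trained matching model rather than rely on raw bigram overlap.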
