TU Berlin

Database Systems and Information Management Group: Publications


Scalable Frequent Sequence Mining With Flexible Subsequence Constraints
Citation key Renz-WielandBG19
Author Alexander Renz-Wieland, Matthias Bertsch, Rainer Gemulla
Year 2019
Journal 35th IEEE International Conference on Data Engineering (ICDE 2019)
Volume 2019
Abstract We study scalable algorithms for frequent sequence mining under flexible subsequence constraints. Such constraints enable applications to specify concisely which patterns are of interest and which are not. We focus on the bulk synchronous parallel model with one round of communication; this model is suitable for platforms such as MapReduce or Spark. We derive a general framework for frequent sequence mining under this model and propose the D-SEQ and D-CAND algorithms within this framework. The algorithms differ in what data are communicated and how computation is split up among workers. To the best of our knowledge, D-SEQ and D-CAND are the first scalable algorithms for frequent sequence mining with flexible constraints. We conducted an experimental study on multiple real-world datasets that suggests that our algorithms scale nearly linearly, outperform common baselines, and offer acceptable generalization overhead over existing, less general mining algorithms.

