Skip to main content
Fig. 1 | Journal of Big Data

Fig. 1

From: Dynamic order Markov model for categorical sequence clustering

Fig. 1

Selected patterns detected from the 3\(\beta\)-HSD protein family are used here to illustrate the concept of sparse pattern. a Shows some of the patterns GTGT(25), GSGT(16), GVGT(18), GTIT(17), GTAT(15) with occurrence in bracket, which represents that original dataset; b Shows patterns detected by conventional consecutive detection method, and only one consecutive pattern GTGT, while the other patterns are filtered by a predefined support threshold \(\tau =20\). c Shows the patterns detected by the wildcard/gap detection method, yielding only sparse pattern \(G*GT\) and \(GT*T\) detected since the can not find consecutive patterns. d Presents the patterns detected by our approach (sparse pattern detection) with the same threshold. Compared with b, c, our approach not only detects the sparse patterns \(G*GT\) and \(GT*T\), but also retains the consecutive pattern GTGT as shown in (d)

Back to article page