CrossFIM: a spark-based hybrid frequent itemset mining algorithm for large datasets

Raj, Shashi; Dharavath, Ramesh; Gantela, Prabhakar

doi:10.1007/s10586-024-04868-8

Bibliographic description

CrossFIM: a spark-based hybrid frequent itemset mining algorithm for large datasets. [Authors] Raj Shashi, Dharavath Ramesh, Gantela Prabhakar. Cluster Computing. DOI: 10.1007/s10586-024-04868-8

Publication details

Source:

Cluster Computing

Year: 2025

Language: English

Formal type: Journal article

MNiSW/MEiN type: other

Abstract

Frequent Itemset Mining (FIM) is a fundamental technique for discovering interesting patterns in transactional datasets. Typical algorithms for extracting such patterns are inefficient because their computational cost grows exponentially with the size of the input data. As data accumulates rapidly, FIM demands substantial computational resources. Numerous adaptations of the fundamental FIM algorithms have therefore been proposed for distributed settings to accommodate ever-growing data volumes. Continuing this line of work, several MapReduce-based FIM techniques have recently been proposed to alleviate speed and scalability constraints. However, the existing MapReduce-based alternatives continue to underperform because of the iterative nature of the algorithms, the large intermediate data they produce, workload skew, and high disk I/O and network communication overhead. This paper proposes CrossFIM, a scalable Spark-based FIM algorithm that combines Apriori-based and Eclat-based variants with partitioning techniques. The Apriori variant is used for the first few iterations, until the number of frequent itemsets starts to decrease. After that, the Eclat variant is used together with the partitioning technique. Combined with the Eclat variant, partitioning makes it possible to hold large transaction sets (or diffsets) in memory and to compute their intersections efficiently. CrossFIM exploits the advantages of both horizontal and vertical data formats to boost performance, and it regulates the number of key-value pairs shuffled among cluster nodes across iterations. Comprehensive experiments on benchmark datasets show that CrossFIM significantly outperforms other state-of-the-art methods in efficiency and scalability.
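
The Apriori-then-Eclat hand-off described in the abstract can be illustrated with a minimal single-machine sketch in plain Python. It is not the authors' Spark implementation: it omits the partitioning step, diffsets, and any control of shuffled key-value pairs, and it keeps everything in memory. The function names (`apriori_level`, `eclat`, `hybrid_fim`) are hypothetical, and the switch rule used here (leave Apriori as soon as a level yields fewer frequent itemsets than the previous level) is an assumed reading of "until frequent itemsets start decreasing".

```python
from collections import defaultdict
from itertools import combinations


def apriori_level(transactions, prev_level, min_support):
    """One Apriori pass on horizontal data: join, prune, then count by scanning."""
    prev = set(prev_level)
    k = len(next(iter(prev))) + 1
    candidates = set()
    for a, b in combinations(sorted(prev), 2):
        if a[:-1] == b[:-1]:  # join two itemsets that share a (k-2)-prefix
            cand = a + (b[-1],)
            # prune: every (k-1)-subset of the candidate must be frequent
            if all(sub in prev for sub in combinations(cand, k - 1)):
                candidates.add(cand)
    counts = defaultdict(int)
    for t in transactions:
        for cand in candidates:
            if t.issuperset(cand):
                counts[cand] += 1
    return {c: n for c, n in counts.items() if n >= min_support}


def eclat(members, min_support, out):
    """Depth-first Eclat on one equivalence class of (itemset, tidset) pairs.

    `members` must be sorted by itemset and share the same prefix.
    """
    for i, (a, tids_a) in enumerate(members):
        children = []
        for b, tids_b in members[i + 1:]:
            tids = tids_a & tids_b  # support = size of the tidset intersection
            if len(tids) >= min_support:
                cand = a + (b[-1],)
                out[cand] = len(tids)
                children.append((cand, tids))
        if children:
            eclat(children, min_support, out)


def hybrid_fim(transactions, min_support):
    """Return {sorted item tuple: support count} for all frequent itemsets."""
    transactions = [frozenset(t) for t in transactions]
    item_counts = defaultdict(int)
    for t in transactions:
        for item in t:
            item_counts[item] += 1
    level = {(i,): n for i, n in item_counts.items() if n >= min_support}
    result = dict(level)

    # Phase 1 (horizontal format): Apriori until a level shrinks.
    # Assumption: "shrinks" means fewer frequent itemsets than the level before.
    while level:
        nxt = apriori_level(transactions, level, min_support)
        result.update(nxt)
        shrinking = len(nxt) < len(level)
        level = nxt
        if shrinking:
            break

    # Phase 2 (vertical format): build tidsets for the last Apriori level,
    # group them into prefix-based equivalence classes, and run Eclat.
    tidsets = defaultdict(set)
    for tid, t in enumerate(transactions):
        for itemset in level:
            if t.issuperset(itemset):
                tidsets[itemset].add(tid)
    classes = defaultdict(list)
    for itemset in sorted(level):
        classes[itemset[:-1]].append((itemset, tidsets[itemset]))
    for members in classes.values():
        eclat(members, min_support, result)
    return result
```

Calling `hybrid_fim(list_of_item_sets, min_support)` with an absolute support threshold returns a dictionary keyed by sorted item tuples. The two phases mirror the horizontal and vertical data formats mentioned in the abstract: Apriori counts candidates by scanning transactions, whereas Eclat counts them by intersecting transaction-ID sets.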

External links

PBN

67c6aedbfdbe8357041f2f1e

DOI

10.1007/s10586-024-04868-8

Website

https://link.springer.com/content/pdf/1…

Identifiers

ISSN: 1386-7857

BPP ID: (6, 7921) serial publication #7921

Metrics

MNiSW/MEiN points: 70.00

Impact Factor: 0

Index Copernicus: 0

Internal score: 0
