Dinh Bay Vo1*, Tan Phuc Nguyen2
1 Faculty of Information Technology, Ho Chi Minh City University of Technology
2 Foreign Languages and Informatics Center, Khanh Hoa University
Abstract:
Mining frequent itemsets just focuses on mining items which have the same importance (e.g., unit profit) and may not appear more than once in each transaction. On the contrary, mining high utility itemsets (HUIs) considers items which have different unit profits and may have non-binary purchase quantities in transactions. Basically, mining HUIs is to find the items that produce a higher profit than those bought frequently. There have been many algorithms developed for mining HUIs, among which EFIM is the latest algorithm which applies several techniques to improve the runtime and the search space. However, the cost of EFIM for scanning transactions to determine candidate relevance is high, which reduces the efficiency of the algorithm, especially on sparse databases. In this paper, the authors developed a P-set structure and proposed an improved algorithm of EFIM to reduce the number of transaction scans and thereby reduce the mining time. Experimental results showed that the improved algorithm reduced significantly the number of transaction scans and the mining time, especially on sparse databases.