• Association Rules/Frequent Itemsets

    Name Description Readme and Source Code
    Hash Tree There are three classes: HashTree, HashNode and IntList. Users only need to work with HashTree and IntList; HashNode is internal and does not need to be handled by users. The index used in HashTree takes the form of an IntList. The details of HashTree and IntList are described in the README file.

    Readme, Source Code (zipped, 26k)

    FP-tree/FP-Growth Mining large itemsets using the FP-tree algorithm.

    Reference: Jiawei Han, Jian Pei, Yiwen Yin,
    Mining Frequent Patterns without Candidate Generation, In
    2000 ACM SIGMOD Intl. Conference on Management of Data Paper

    Readme, Source Code
    BOMO Mining the top K frequent itemsets using the BOMO algorithm

    Reference: Y.L. Cheung, A.W. Fu: An FP-tree Approach for Mining N-most Interesting Itemsets. In Proceedings of the SPIE Conference on Data Mining, 2002. Paper

    Readme, Source Code

    Constraint (BOMO) Mining Association Rules without Support Threshold: with and without Item Constraints

    Reference: Y.L. Cheung, A. Fu, "Mining Association Rules without Support Threshold: with and without Item Constraints", IEEE Transactions on Knowledge and Data Engineering (TKDE), 2004. Paper
    Source Code
    COFI-tree COFI-tree Mining: A New Approach to Pattern Growth with Reduced Candidacy Generation

    Reference: Osmar R. Zaiane and Mohammed El-Hajj: COFI-tree Mining: A New Approach to Pattern Growth with Reduced Candidacy Generation. In FIMI 2003, the first Workshop on Frequent Itemset Mining Implementations, held with IEEE ICDM 2003. Paper

    Readme, Source Code


  • Content-Based Retrieval in Multimedia Databases

    Name Description Readme and Source Code
    R*-tree

    An R*-tree indexing structure for nearest neighbor search.

    Reference: Norbert Beckmann, Hans-Peter Kriegel, Ralf Schneider, Bernhard Seeger: 'The R*-tree: An Efficient and Robust Access Method for Points and Rectangles', In Proceedings of the ACM SIGMOD, 1990 Paper

    Readme, Source Code
    SR-tree

    An indexing structure similar to the R*-tree and the SS-tree, for nearest neighbor search.

    Readme, Source Code
    VP-tree A VP-tree indexing structure for nearest neighbor search

    Reference: T. Chiueh, Content-Based Image Indexing, In VLDB 1994 Paper

    Readme, Source Code (normal visiting order)
    Readme, Source Code (special visiting order)

    X-tree An X-tree indexing structure for nearest neighbor search

    Reference: Berchtold, S., Keim, D., Kriegel, H.P., The X-tree: An Index Structure for High-Dimensional Data, In VLDB 1996 Paper

    Readme, Source Code
    X+-tree An X+-tree indexing structure for nearest neighbor search

    X+-tree is a variation of X-tree, which disallows the splitting of the supernodes in the X-tree for a better performance. More specifically, a supernode is not allowed to grow too large. In the X-tree, the size of a supernode can be any multiple of the size of a normal node. In the X+-tree, the size of a supernode is at most the size of a normal node multiplied by a user-supplied parameter MAX_X_SNODE.

    Readme, Source Code
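
    The supernode size limit described for the X+-tree can be illustrated with a small sketch. It is not taken from the distributed source code: the Node class, the insert function, NODE_CAPACITY and the value chosen for MAX_X_SNODE are all hypothetical, and the sketch shows only the size cap, not the overlap-based decision that creates a supernode in the first place.

```python
from dataclasses import dataclass, field

NODE_CAPACITY = 8   # hypothetical: entries that fit in one normal node
MAX_X_SNODE = 4     # hypothetical value for the user parameter


@dataclass
class Node:
    """A directory node; blocks > 1 means it has become a supernode."""
    entries: list = field(default_factory=list)
    blocks: int = 1  # size measured in units of a normal node

    def capacity(self) -> int:
        return self.blocks * NODE_CAPACITY


def insert(node: Node, entry, max_blocks: int = MAX_X_SNODE) -> bool:
    """Store entry in node, extending the supernode if the cap allows.

    Returns False when the node is full and already max_blocks normal
    nodes in size. How that overflow is then resolved is not described
    in the text above, so it is left to the caller here.
    """
    if len(node.entries) < node.capacity():
        node.entries.append(entry)
        return True
    if node.blocks < max_blocks:
        node.blocks += 1  # grow by one normal-node-sized block
        node.entries.append(entry)
        return True
    return False  # cap reached: the supernode may not grow further


if __name__ == "__main__":
    n = Node()
    stored = sum(insert(n, i) for i in range(100))
    print(stored, n.blocks)
```

    With an unbounded max_blocks the same function behaves like the plain X-tree case, where a supernode may be any multiple of a normal node; the cap is the only difference this sketch tries to show.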

  • Clustering

    Name Description Readme and Source Code
    ENCLUS An Entropy-based Subspace Clustering Algorithm

    Reference: C.H. Cheng, A.W. Fu, Y. Zhang, Entropy-based Subspace Clustering for Mining Numerical Data. In Proceedings of ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD-99), San Diego, Aug 1999. gzipped psfile

    Readme, Source Code
    EPC2D An Efficient Projected Clustering Algorithm

    Reference: Eric K.K. Ng, A. Fu: Efficient algorithm for Projected Clustering, 18th International Conference on Data Engineering (ICDE), February 26-March 1, San Jose, California 2002. (poster presentation). Paper

    EPC2D: Source Code

    Data Set Generator: Source Code



  • Time Series

    Name Description Readme and Source Code
    Efficient Time Series Matching by Wavelets An algorithm for efficient time series matching by wavelets

    Reference: K.P. Chan, A.W. Fu, Efficient Time Series Matching by Wavelets. In Proceedings of International Conference on Data Engineering (ICDE '99), Sydney, March 1999. gzipped ps

    Details

  • Web Mining

    Name Description Readme and Source Code
    Incremental Document Clustering An algorithm for incremental document clustering

    Reference: Wai-chiu Wong, Ada Wai-chee Fu, Incremental Document Clustering for Web Page Classification. In Proceedings of 2000 International Conference on Information Society in the 21st Century: Emerging Technologies and New Challenges (IS2000), Aizu-Wakamatsu City, Fukushima, Japan November 5-8, 2000. gzipped psfile

    Source Code

  • Maximal-Profit Item Selection (MPIS)

    Name Description Readme and Source Code
    MPIS_Alg An algorithm for the Maximal-Profit Item Selection problem

    Reference: Raymond Chi-Wing Wong, Ada Wai-Chee Fu and Ke Wang: MPIS: Maximal-Profit Item Selection with Cross-Selling Considerations, The 2003 IEEE International Conference on Data Mining (ICDM), Melbourne, Florida on November 19-22, 2003 Paper

    Readme, Source Code
    ISM An algorithm for the Item Selection for Marketing problem

    Reference: Raymond Chi-Wing Wong and Ada Wai-Chee Fu: ISM: Item Selection for Marketing with Cross-Selling Considerations, The Eighth Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD), Sydney, Australia on May 26-28, 2004 Paper

    Readme, Source Code