- Data Sets
- Frequent Itemset Mining Implementations Repository
This repository is the result of the 1st International Workshop on Frequent Itemset Mining Implementations (FIMI'03), which took place at IEEE ICDM'03 on November 19, 2003, in Melbourne, Florida, USA. The website serves as the FIMI repository, containing the source code of all implementations accepted at the FIMI workshop, together with several publicly available datasets.
- KDD Cup
- KDD Cup 1999
The competition task was to build a network intrusion detector, a predictive model capable of distinguishing between "bad" connections, called intrusions or attacks, and "good" normal connections. This database contains a standard set of data to be audited, which includes a wide variety of intrusions simulated in a military network environment.
- KDD Cup 2000
WebView1, WebView2, WebPOS
- KDD Cup 2001
Because of the rapid growth of interest in mining biological databases, KDD Cup 2001 focused on data from genomics and drug design. Sufficient (yet concise) information was provided so that detailed domain knowledge was not a requirement for entry. In total, 136 groups participated, submitting 200 predictions across the 3 tasks: 114 for Thrombin, 41 for Function, and 45 for Localization.
- KDD Cup 2002
This year the competition included two tasks that involved data mining in molecular biology domains. The first task focused on constructing models that can assist genome annotators by automatically extracting information from scientific articles. The second task focused on learning models that characterize the behavior of individual genes in a hidden experimental setting.
- KDD Cup 2003
The first task involves predicting the future; contestants predict how many citations each paper will receive during the three months leading up to the KDD 2003 conference. For the second task, contestants must build a citation graph of a large subset of the archive from only the LaTeX sources. In the third task, each paper's popularity will be estimated based on partial download logs. And the last task is open! Given the large amount of data, contestants can devise their own questions and the most interesting result is the winner.
- UCI Knowledge Discovery in Databases Archive
- StatLib (Department of Statistics at Carnegie Mellon University)
- ILP applications and Datasets
- Review of Available ILP Datasets
- Workgroup KDD-SISYPHUS
- University of Toronto's Delve datasets
- Statlog Datasets
- RISE: Repository of online Information Sources used in information Extraction tasks. (RISE is a distributed repository of online information sources that are used for the empirical analysis of [machine] learning algorithms that generate extraction patterns.)
- The UC Irvine Database Repository
- Oxford University Computing Laboratory ILP Datasets
- Pattern recognition datasets from Universal Problem Solvers Inc
- Synthetic Data Generator