Tziatzios, Achilleas
2014.
Data mining of range-based classification rules for data characterization.
PhD Thesis,
Cardiff University.
Item availability restricted.
Abstract
Advances in data gathering have led to the creation of very large data collections across different fields, such as industrial site sensor measurements or the account statuses of a financial institution's clients. The ability to learn classification rules, that is, rules that associate specific attribute values with a specific class label, from this data is important and useful in a range of applications. While many methods to facilitate this task have been proposed, existing work has focused on categorical datasets, and very few solutions have been developed that can derive classification rules over associated continuous ranges (numerical intervals). Furthermore, these solutions have relied solely on classification performance as a means of evaluation and therefore focus on mining mutually exclusive classification rules and on correctly predicting the most dominant class values. As a result, existing solutions demonstrate only limited utility when applied to data characterization tasks. This thesis proposes a method, inspired by classification association rule mining, that derives range-based classification rules from numerical data. The presented method searches for associated numerical ranges that have a class value as their consequent and meet a set of user-defined criteria. A new interestingness measure is proposed for evaluating the density of range-based rules, and four heuristic-based approaches are presented for targeting different sets of rules. Extensive experiments demonstrate the effectiveness of the new algorithm for classification tasks when compared to existing solutions, as well as its utility as a solution for data characterization.
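To make the notion of a range-based classification rule concrete, the following is a minimal illustrative sketch and not the thesis's algorithm. It represents a rule as a set of numerical intervals with a class label as consequent, and scores it with support and confidence, the standard measures from association rule mining. The attribute names (`temp`, `pressure`), the interval bounds, the class labels, and the helper names `RangeRule` and `support_confidence` are all hypothetical. The density measure proposed in the thesis is not defined in the abstract and is therefore not reproduced here.

```python
from dataclasses import dataclass


@dataclass
class RangeRule:
    """A rule of the form: attr1 in [lo1, hi1] AND attr2 in [lo2, hi2] => label."""
    ranges: dict  # attribute name -> (low, high), both bounds inclusive
    label: str    # class value used as the rule consequent

    def covers(self, row):
        # A row is covered when every attribute falls inside its interval.
        return all(lo <= row[a] <= hi for a, (lo, hi) in self.ranges.items())


def support_confidence(rule, rows, labels):
    """Return (support, confidence) of a range rule over a labelled dataset."""
    covered = [y for x, y in zip(rows, labels) if rule.covers(x)]
    hits = sum(1 for y in covered if y == rule.label)
    support = hits / len(rows) if rows else 0.0
    confidence = hits / len(covered) if covered else 0.0
    return support, confidence


# Hypothetical sensor readings with class labels (illustrative data only).
rows = [
    {"temp": 72.0, "pressure": 1.1},
    {"temp": 75.0, "pressure": 1.4},
    {"temp": 65.0, "pressure": 1.2},
    {"temp": 78.0, "pressure": 1.3},
]
labels = ["fault", "ok", "fault", "fault"]

rule = RangeRule(ranges={"temp": (70.0, 80.0), "pressure": (1.0, 1.5)}, label="fault")
support, confidence = support_confidence(rule, rows, labels)
print(f"support={support:.2f}, confidence={confidence:.2f}")
```

In a search of the kind the abstract describes, the user-defined criteria would act as thresholds on measures like these. The sketch only evaluates a single given rule and does not attempt the search itself.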
Item Type: | Thesis (PhD) |
---|---|
Status: | Unpublished |
Schools: | Computer Science & Informatics |
Subjects: | Q Science > QA Mathematics > QA75 Electronic computers. Computer science |
Uncontrolled Keywords: | Data mining; Classification rules; Data characterization; Continuous data |
Date of First Compliant Deposit: | 30 March 2016 |
Last Modified: | 06 Oct 2023 15:57 |
URI: | https://orca.cardiff.ac.uk/id/eprint/65902 |