weka.associations.classification
Class JCBAPruning

java.lang.Object
  extended by weka.associations.classification.PruneCAR
      extended by weka.associations.classification.CrTree
          extended by weka.associations.classification.JCBAPruning
All Implemented Interfaces:
java.io.Serializable, OptionHandler
Direct Known Subclasses:
PrecedencePruning

public class JCBAPruning
extends CrTree
implements OptionHandler, java.io.Serializable

Class implementing the pruning step of the CBA algorithm using a CrTree. The tree structure is described in: W. Li, J. Han, J. Pei: CMAR: Accurate and Efficient Classification Based on Multiple Class-Association Rules. In ICDM'01: 369-376, 2001. The CBA algorithm is described in: B. Liu, W. Hsu, Y. Ma: Integrating Classification and Association Rule Mining. In KDD'98: 80-86, 1998. Valid options are:

-C the confidence value
The confidence value for the optional pessimistic-error-rate-based pruning step (default: 0.25).

-N
If set, no pessimistic-error-rate-based pruning is performed.

Version:
$Revision: 8108 $
Author:
Stefan Mutter
See Also:
Serialized Form

Constructor Summary
JCBAPruning()
          Constructor
 
Method Summary
static double addErrs(double N, double e, float CF)
          Computes the estimated pessimistic error rate for a given total number of instances and observed error, using the normal approximation to the binomial distribution (with continuity correction).
 int calculateDefaultClass(Instances RemainingClassInstances)
          Calculates the default class as the majority class in the instances
 double calculateError(FastVector premise, FastVector consequence)
          Calculates the pessimistic error rate of a rule
 int getClassValue(int index)
          Gets the consequence (the class label) of a rule as an integer value.
 java.lang.String[] getOptions()
          Gets the current settings of this pruning object.
 FastVector getPrecedenceList()
          Gets the sorted list (according to the interestingness measure) of all rules.
 int getStopIndex()
          Gets the number of rules that should be used for classification
 void insertContent(CrNode node, FastVector input)
          Inserts the consequence and the interestingness measures of a rule and builds up the precedence information that allows a ranking according to the interestingness measures.
 java.util.Enumeration listOptions()
          Returns an enumeration describing the available options.
 int numClassRules()
          The number of rules in the tree.
 int numMinedRules()
          Gets the number of rules produced by the mining process that were candidates for insertion into the CrTree.
 int numPrunedRules()
          Gets the number of rules left after the (optional) pessimistic-error-rate-based pruning step.
 void optPruning(boolean flag, float value)
          Sets optional pruning on or off and its confidence value
 void preprocess(FastVector premises, FastVector consequences, FastVector confidences)
          The preprocessing step before a rule is inserted into a CrTree.
 void prune()
          Method that implements the obligatory pruning step
 void pruneBeforeInsertion(FastVector premise, FastVector consequence)
          Performs the (optional) pessimistic-error-rate-based pruning step.
 void resetOptions()
          Resets the options to the default values.
 void setDefaultClass(int i)
          Sets the default class in each step during obligatory pruning.
 void setInstances(Instances instances)
          Sets the instances (including the class attribute)
 void setOptions(java.lang.String[] options)
          Parses a given list of options.
 java.lang.String toString(java.lang.String metricString)
          Returns a string description of the rule set stored in the tree
 
Methods inherited from class weka.associations.classification.CrTree
deleteContent, deleteNode, getAssociateList, getDefaultClass, getRoot, insertNode, isEmpty, makeEmpty, pathToString, prunedRules, pruningCriterions, removeAtChild, removeAtList, removeAtSibling, reportSubtreeCount, rulePremise, search, setDefaultClass, setInstancesNoClass, setInstancesOnlyClass, sortItemSet, updateHeight
 
Methods inherited from class weka.associations.classification.PruneCAR
forName
 
Methods inherited from class java.lang.Object
equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

JCBAPruning

public JCBAPruning()
Constructor

Method Detail

resetOptions

public void resetOptions()
Resets the options to the default values.


pruneBeforeInsertion

public void pruneBeforeInsertion(FastVector premise,
                                 FastVector consequence)
Performs the (optional) pessimistic-error-rate-based pruning step. If the rule is not pruned, the method inserts it into the CrTree.

Overrides:
pruneBeforeInsertion in class CrTree
Parameters:
premise - the rule premise
consequence - the consequence and interestingness measures

insertContent

public void insertContent(CrNode node,
                          FastVector input)
Inserts the consequence and the interestingness measures of a rule and builds up the precedence information that allows a ranking according to the interestingness measures.

Overrides:
insertContent in class CrTree
Parameters:
node - the node in the tree where the consequence should be inserted
input - the consequence

calculateError

public double calculateError(FastVector premise,
                             FastVector consequence)
Calculates the pessimistic error rate of a rule

Parameters:
premise - the premise
consequence - the consequence
Returns:
the pessimistic error rate

addErrs

public static double addErrs(double N,
                             double e,
                             float CF)
Computes the estimated pessimistic error rate for a given total number of instances and observed error, using the normal approximation to the binomial distribution (with continuity correction).

Parameters:
N - number of instances
e - observed error
CF - confidence value
Returns:
estimated pessimistic error rate
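
The normal approximation with continuity correction mentioned above can be illustrated with a minimal sketch. It is not the Weka source: the class name, method name, and the hard-coded z-score (the 75th percentile of the standard normal, assumed here to correspond to CF = 0.25) are illustrative assumptions, and the edge cases (fewer than one observed error, or errors close to N) are deliberately left out.

```java
// Sketch only: illustrates the documented idea, not the actual addErrs code.
class PessimisticErrorSketch {

    // Assumed z-score for a one-sided confidence value CF = 0.25.
    static final double Z_FOR_CF_025 = 0.6745;

    /**
     * Upper confidence limit of the error rate.
     * n = number of instances, e = observed errors, z = z-score for the
     * chosen confidence value. Assumes 1 <= e and e + 0.5 < n.
     */
    static double upperErrorRate(double n, double e, double z) {
        double f = (e + 0.5) / n;          // continuity-corrected observed rate
        double z2 = z * z;
        double numerator = f + z2 / (2 * n)
                + z * Math.sqrt(f / n - f * f / n + z2 / (4 * n * n));
        return numerator / (1 + z2 / n);   // upper limit of the interval
    }

    public static void main(String[] args) {
        // 10 errors observed on 100 instances.
        System.out.println(upperErrorRate(100, 10, Z_FOR_CF_025));
    }
}
```

With z = 0 the estimate collapses to the continuity-corrected observed rate; a larger z (a smaller confidence value) gives a more pessimistic estimate.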

preprocess

public void preprocess(FastVector premises,
                       FastVector consequences,
                       FastVector confidences)
                throws java.lang.Exception
The preprocessing step before a rule is inserted into a CrTree. The main purpose is to sort the items according to their frequencies in the dataset. More frequent items will be found in nodes closer to the root.

Overrides:
preprocess in class CrTree
Parameters:
premises - the premises
consequences - the consequences
confidences - the interestingness measures
Throws:
java.lang.Exception - if preprocessing is not possible
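
The frequency ordering described for the preprocessing step can be sketched as follows. This is an illustration under assumptions, not the CrTree code: items are modelled as plain strings, transactions as lists, and ties are broken alphabetically, none of which is stated in the documentation above.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Sketch only: hypothetical helper names, string items instead of FastVectors.
class FrequencySortSketch {

    /** Counts how often each item occurs across all transactions. */
    static Map<String, Integer> countItems(List<List<String>> transactions) {
        Map<String, Integer> counts = new HashMap<>();
        for (List<String> transaction : transactions) {
            for (String item : transaction) {
                counts.merge(item, 1, Integer::sum);
            }
        }
        return counts;
    }

    /** Returns the premise items ordered from most to least frequent. */
    static List<String> sortPremise(List<String> premise, Map<String, Integer> counts) {
        List<String> sorted = new ArrayList<>(premise);
        sorted.sort(Comparator
                .comparingInt((String item) -> counts.getOrDefault(item, 0))
                .reversed()                                       // most frequent first
                .thenComparing(Comparator.<String>naturalOrder())); // assumed tie-break
        return sorted;
    }
}
```

Sorting every premise with the same global order means that rules sharing frequent items also share a prefix, so they can share nodes near the root of a prefix tree.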

prune

public void prune()
Method that implements the obligatory pruning step

Overrides:
prune in class CrTree
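
For orientation, the obligatory pruning step of CBA as described in the cited KDD'98 paper (database coverage) can be sketched roughly as below. This is a simplified illustration, not the JCBAPruning implementation: the rule representation, all names, and the handling of an empty remainder are assumptions.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of database-coverage pruning; all names here are hypothetical.
class CoveragePruningSketch {

    /** A rule; premise[i] == -1 means attribute i is unconstrained. */
    static final class Rule {
        final int[] premise;
        final int label;
        Rule(int[] premise, int label) { this.premise = premise; this.label = label; }
        boolean covers(int[] x) {
            for (int i = 0; i < premise.length; i++) {
                if (premise[i] != -1 && premise[i] != x[i]) return false;
            }
            return true;
        }
    }

    static int[] classCounts(int[] y, boolean[] removed, int numClasses) {
        int[] counts = new int[numClasses];
        for (int i = 0; i < y.length; i++) if (!removed[i]) counts[y[i]]++;
        return counts;
    }

    static int argMax(int[] counts) {
        int best = 0;
        for (int c = 1; c < counts.length; c++) if (counts[c] > counts[best]) best = c;
        return best;
    }

    /**
     * Walks the rules in precedence order, appends surviving rules to 'kept',
     * and returns {stopIndex, defaultClass}: the first stopIndex kept rules
     * plus the default class form the final classifier.
     */
    static int[] prune(List<Rule> sorted, int[][] x, int[] y, int numClasses, List<Rule> kept) {
        boolean[] removed = new boolean[x.length];
        List<Integer> defaults = new ArrayList<>();
        List<Integer> totalErrors = new ArrayList<>();
        int ruleErrors = 0;
        for (Rule r : sorted) {
            List<Integer> covered = new ArrayList<>();
            int wrong = 0;
            for (int i = 0; i < x.length; i++) {
                if (removed[i] || !r.covers(x[i])) continue;
                covered.add(i);
                if (y[i] != r.label) wrong++;
            }
            // Keep a rule only if it classifies at least one remaining instance correctly.
            if (covered.size() == wrong) continue;
            kept.add(r);
            for (int i : covered) removed[i] = true;
            ruleErrors += wrong;
            // Default class = majority class of the instances still uncovered.
            int[] counts = classCounts(y, removed, numClasses);
            int def = argMax(counts);
            int remaining = 0;
            for (int c : counts) remaining += c;
            defaults.add(def);
            totalErrors.add(ruleErrors + remaining - counts[def]);
        }
        if (kept.isEmpty()) {
            return new int[] {0, argMax(classCounts(y, new boolean[y.length], numClasses))};
        }
        // Cut the list after the first rule with the lowest total error.
        int best = 0;
        for (int k = 1; k < kept.size(); k++) {
            if (totalErrors.get(k) < totalErrors.get(best)) best = k;
        }
        return new int[] {best + 1, defaults.get(best)};
    }
}
```

In this sketch the returned cut-off plays a role loosely comparable to the value reported by getStopIndex, and the default class is recomputed after every kept rule, which is the situation setDefaultClass is documented for.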

getStopIndex

public int getStopIndex()
Gets the number of rules that should be used for classification

Returns:
index in the sorted list before which the rules are used for classification.

getPrecedenceList

public FastVector getPrecedenceList()
Gets the sorted list (according to the interestingness measure) of all rules. Each element of the FastVector references a node containing the consequence and the least frequent item of the premise of a rule.

Returns:
FastVector containing CrNodes.

getClassValue

public int getClassValue(int index)
Gets the consequence (the class label) of a rule as an integer value.

Parameters:
index - the rank of the rule in the sort order induced by the interestingness measure.
Returns:
the consequence of a rule (that is, a class label) as an integer.
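
How a ranked rule list, a stop index, and a default class typically combine at classification time in CBA can be sketched as follows. This is an assumption-laden illustration, not the Weka classifier: rules are plain arrays, -1 marks an unconstrained attribute, and the first matching rule decides.

```java
// Sketch only: hypothetical first-match classification over a ranked rule list.
class FirstMatchSketch {

    /** True if every constrained attribute of the premise equals the instance value. */
    static boolean matches(int[] premise, int[] instance) {
        for (int i = 0; i < premise.length; i++) {
            if (premise[i] != -1 && premise[i] != instance[i]) return false;
        }
        return true;
    }

    /**
     * Uses only the rules ranked before stopIndex; falls back to the
     * default class when none of them matches.
     */
    static int classify(int[][] premises, int[] labels, int stopIndex,
                        int defaultClass, int[] instance) {
        for (int rank = 0; rank < stopIndex; rank++) {
            if (matches(premises[rank], instance)) return labels[rank];
        }
        return defaultClass;
    }
}
```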

setInstances

public void setInstances(Instances instances)
Sets the instances (including the class attribute)

Parameters:
instances - the instances for which class association rules are mined.

calculateDefaultClass

public int calculateDefaultClass(Instances RemainingClassInstances)
Calculates the default class as the majority class in the instances

Parameters:
RemainingClassInstances - the set of instances
Returns:
the default class label
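
Choosing the majority class can be sketched in a few lines. The sketch assumes class labels are already available as integer indices and that ties go to the lowest index; the documentation above does not specify the tie-breaking rule.

```java
// Sketch only: majority class over integer class indices.
class DefaultClassSketch {

    /** Returns the most frequent class index; ties go to the lowest index (assumed). */
    static int majorityClass(int[] classLabels, int numClasses) {
        int[] counts = new int[numClasses];
        for (int label : classLabels) {
            counts[label]++;
        }
        int best = 0;
        for (int c = 1; c < numClasses; c++) {
            if (counts[c] > counts[best]) best = c;
        }
        return best;
    }
}
```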

setDefaultClass

public void setDefaultClass(int i)
Sets the default class in each step during obligatory pruning.

Parameters:
i - -1 if the default class is the majority class in the data; otherwise the index of the rule in the sort order induced by the interestingness measure, if the default class is set during obligatory pruning.

getOptions

public java.lang.String[] getOptions()
Gets the current settings of this pruning object.

Specified by:
getOptions in interface OptionHandler
Overrides:
getOptions in class CrTree
Returns:
an array of strings suitable for passing to setOptions

listOptions

public java.util.Enumeration listOptions()
Returns an enumeration describing the available options.

Specified by:
listOptions in interface OptionHandler
Overrides:
listOptions in class CrTree
Returns:
an enumeration of all the available options.

setOptions

public void setOptions(java.lang.String[] options)
                throws java.lang.Exception
Parses a given list of options. Valid options are:

-C the confidence value
The confidence value for the optional pessimistic-error-rate-based pruning step (default: 0.25).

-N
If set, no pessimistic-error-rate-based pruning is performed.

Specified by:
setOptions in interface OptionHandler
Overrides:
setOptions in class CrTree
Parameters:
options - the list of options as an array of strings
Throws:
java.lang.Exception - if an option is not supported
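
The meaning of the two options can be illustrated with a small stand-alone parser. It is a hypothetical sketch, not the Weka option handling: the class and field names are invented, and only the documented behaviour (default confidence 0.25, -N switching optional pruning off, unsupported options rejected) is modelled.

```java
// Sketch only: hypothetical holder for the two documented options.
class PruningOptionsSketch {
    float confidence = 0.25f;        // documented default for -C
    boolean optionalPruning = true;  // switched off by -N

    /** Parses an option array such as {"-C", "0.1"} or {"-N"}. */
    void parse(String[] options) {
        for (int i = 0; i < options.length; i++) {
            if (options[i].equals("-N")) {
                optionalPruning = false;
            } else if (options[i].equals("-C")) {
                if (i + 1 >= options.length) {
                    throw new IllegalArgumentException("-C requires a value");
                }
                confidence = Float.parseFloat(options[++i]);
            } else {
                throw new IllegalArgumentException("Unsupported option: " + options[i]);
            }
        }
    }
}
```

An array such as `{"-C", "0.1"}` would lower the confidence value, while `{"-N"}` would leave only the obligatory pruning step active.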

optPruning

public void optPruning(boolean flag,
                       float value)
Sets optional pruning on or off and its confidence value

Parameters:
value - the confidence value
flag - flag indicating whether optional pruning should be on or off

toString

public java.lang.String toString(java.lang.String metricString)
Returns a string description of the rule set stored in the tree

Overrides:
toString in class CrTree
Parameters:
metricString - the metric used as interestingness measure
Returns:
the stored rule set as a string

numMinedRules

public int numMinedRules()
Gets the number of rules produced by the mining process that were candidates for insertion into the CrTree.

Returns:
the (initial) number of rules after the mining step.

numPrunedRules

public int numPrunedRules()
Gets the number of rules left after the (optional) pessimistic-error-rate-based pruning step.

Returns:
the number of rules left after the optional pruning step.

numClassRules

public int numClassRules()
The number of rules in the tree. After the pruning step this number equals the number of rules used for classification.

Returns:
the number of rules stored in the tree.