weka.classifiers.meta
Class OneClassClassifier

java.lang.Object
  extended by weka.classifiers.AbstractClassifier
      extended by weka.classifiers.SingleClassifierEnhancer
          extended by weka.classifiers.RandomizableSingleClassifierEnhancer
              extended by weka.classifiers.meta.OneClassClassifier
All Implemented Interfaces:
java.io.Serializable, java.lang.Cloneable, Classifier, CapabilitiesHandler, OptionHandler, Randomizable, RevisionHandler, TechnicalInformationHandler

public class OneClassClassifier
extends RandomizableSingleClassifierEnhancer
implements TechnicalInformationHandler

Performs one-class classification on a dataset.

The classifier reduces the problem to a single class and learns the data without using any information from other classes. The testing stage classifies instances as 'target' or 'outlier', so in order to calculate the outlier pass rate, the dataset must contain information from more than one class.

Also, the output varies depending on whether the label 'outlier' exists in the instances used to build the classifier. If it does, 'outlier' will be predicted; if not, the label will be considered missing whenever the prediction does not favour the target class. Instances of the 'outlier' class, if present in the dataset, are not used to build the model. The label can simply be used as a flag; you do not need to relabel any classes.
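A minimal sketch of the labelling rule described above follows. It assumes that the classifier reduces each test instance to a single score and compares that score with a threshold, and that scores at or above the threshold favour the target class. The class `OutlierLabelSketch`, the method `predict`, and the use of `null` to stand in for a missing prediction are all hypothetical. This is not Weka's implementation.

```java
import java.util.List;

// Hypothetical sketch of the target/outlier labelling rule; not Weka's code.
final class OutlierLabelSketch {

    // Assumption: the outlier label is the literal string "outlier".
    static final String OUTLIER = "outlier";

    /**
     * Returns the predicted label for a score, or null (standing in for a
     * missing prediction) when the score does not favour the target class
     * and no "outlier" label was declared.
     */
    static String predict(double score, double threshold,
                          List<String> classLabels, String targetLabel) {
        if (score >= threshold) {
            return targetLabel;  // prediction favours the target class
        }
        return classLabels.contains(OUTLIER) ? OUTLIER : null;
    }

    public static void main(String[] args) {
        List<String> withOutlier = List.of("target", "outlier");
        List<String> withoutOutlier = List.of("target", "other");
        System.out.println(predict(0.2, 0.5, withOutlier, "target"));     // outlier
        System.out.println(predict(0.2, 0.5, withoutOutlier, "target"));  // null (missing)
        System.out.println(predict(0.9, 0.5, withoutOutlier, "target"));  // target
    }
}
```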

For more information, see:

Kathryn Hempstalk, Eibe Frank, Ian H. Witten: One-Class Classification by Combining Density and Class Probability Estimation. In: Proceedings of the 12th European Conference on Principles and Practice of Knowledge Discovery in Databases and 19th European Conference on Machine Learning, ECMLPKDD2008, Berlin, 505--519, 2008.

BibTeX:

 @conference{Hempstalk2008,
    address = {Berlin},
    author = {Kathryn Hempstalk and Eibe Frank and Ian H. Witten},
    booktitle = {Proceedings of the 12th European Conference on Principles and Practice of Knowledge Discovery in Databases and 19th European Conference on Machine Learning, ECMLPKDD2008},
    month = {September},
    pages = {505--519},
    publisher = {Springer},
    series = {Lecture Notes in Computer Science},
    title = {One-Class Classification by Combining Density and Class Probability Estimation},
    volume = {5211},
    year = {2008},
    location = {Antwerp, Belgium}
 }
 

Valid options are:

 -trr <rate>
  Sets the target rejection rate
  (default: 0.1)
 -tcl <label>
  Sets the target class label
  (default: 'target')
 -cvr <rep>
  Sets the number of times to repeat cross validation
  to find the threshold
  (default: 10)
 -P <prop>
  Sets the proportion of generated data
  (default: 0.5)
 -cvf <perc>
  Sets the percentage of held-out data for each cross validation
  fold
  (default: 10)
 -num <classname + options>
  Sets the numeric generator
  (default: weka.classifiers.meta.generators.GaussianGenerator)
 -nom <classname + options>
  Sets the nominal generator
  (default: weka.classifiers.meta.generators.NominalGenerator)
 -L
  Sets whether to correct the number of classes to two;
  if omitted, no correction will be made.
 -E
  Sets whether to exclusively use the density estimate.
 -I
  Sets whether to use instance weights.
 -S <num>
  Random number seed.
  (default 1)
 -D
  If set, classifier is run in debug mode and
  may output additional info to the console
 -W
  Full name of base classifier.
  (default: weka.classifiers.meta.Bagging)
 
 Options specific to classifier weka.classifiers.meta.Bagging:
 
 -P
  Size of each bag, as a percentage of the
  training set size. (default 100)
 -O
  Calculate the out of bag error.
 -S <num>
  Random number seed.
  (default 1)
 -I <num>
  Number of iterations.
  (default 10)
 -D
  If set, classifier is run in debug mode and
  may output additional info to the console
 -W
  Full name of base classifier.
  (default: weka.classifiers.trees.REPTree)
 
 Options specific to classifier weka.classifiers.trees.REPTree:
 
 -M <minimum number of instances>
  Set minimum number of instances per leaf (default 2).
 -V <minimum variance for split>
  Set minimum numeric class variance proportion
  of train variance for split (default 1e-3).
 -N <number of folds>
  Number of folds for reduced error pruning (default 3).
 -S <seed>
  Seed for random data shuffling (default 1).
 -P
  No pruning.
 -L
  Maximum tree depth (default -1, no maximum)
Options after -- are passed to the designated classifier.
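The following sketch illustrates the `--` convention. It splits an option array at the first `--`, so that everything after the separator can be handed to the base classifier. The helper `splitAtDoubleDash` is hypothetical; Weka has its own option-handling utilities. The option values shown are only examples drawn from the lists above.

```java
import java.util.Arrays;

// Hypothetical helper illustrating the "--" convention; not a Weka API.
final class OptionSplitSketch {

    /** Returns {options before "--", options after "--"}; "--" itself is dropped. */
    static String[][] splitAtDoubleDash(String[] options) {
        for (int i = 0; i < options.length; i++) {
            if (options[i].equals("--")) {
                return new String[][] {
                    Arrays.copyOfRange(options, 0, i),
                    Arrays.copyOfRange(options, i + 1, options.length)
                };
            }
        }
        // No "--": everything belongs to the meta-classifier itself.
        return new String[][] {options.clone(), new String[0]};
    }

    public static void main(String[] args) {
        String[] opts = {"-tcl", "target", "-trr", "0.05",
                         "-W", "weka.classifiers.meta.Bagging", "--", "-I", "20"};
        String[][] parts = splitAtDoubleDash(opts);
        System.out.println(Arrays.toString(parts[0]));  // OneClassClassifier options
        System.out.println(Arrays.toString(parts[1]));  // [-I, 20] for Bagging
    }
}
```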

Version:
$Revision: 9709 $
Author:
Kathryn Hempstalk (kah18 at cs.waikato.ac.nz), Eibe Frank (eibe at cs.waikato.ac.nz)
See Also:
Serialized Form

Field Summary
static java.lang.String OUTLIER_LABEL
          The label for the outlier class.
 
Constructor Summary
OneClassClassifier()
          Default constructor.
 
Method Summary
 void buildClassifier(Instances data)
          Builds the one-class classifier; any non-target data values are ignored.
 java.lang.String densityOnlyTipText()
          Returns the tip text for this property.
 double[] distributionForInstance(Instance instance)
          Returns a probability distribution for a given instance.
 Capabilities getCapabilities()
          Returns default capabilities of the base classifier.
 boolean getDensityOnly()
          Gets whether only the density estimate should be used by the classifier.
 NominalAttributeGenerator getNominalGenerator()
          Gets the generator that will be used by default to generate nominal outlier data.
 NumericAttributeGenerator getNumericGenerator()
          Gets the generator that will be used by default to generate numeric outlier data.
 int getNumRepeats()
          Gets the number of repeats for (internal) cross validation.
 java.lang.String[] getOptions()
          Gets the current settings of the Classifier.
 double getPercentageHeldout()
          Gets the percentage of data that will be held out in each iteration of cross validation.
 double getProportionGenerated()
          Gets the proportion of data that will be generated compared to the target class label.
 java.lang.String getRevision()
          Returns the revision string.
 java.lang.String getTargetClassLabel()
          Gets the target class label - the class label to perform one class classification on.
 double getTargetRejectionRate()
          Gets the target rejection rate - the proportion of target class samples that will be rejected in order to build a threshold.
 TechnicalInformation getTechnicalInformation()
          Returns an instance of a TechnicalInformation object, containing detailed information about the technical background of this class, e.g., paper reference or book this class is based on.
 boolean getUseInstanceWeights()
          Gets whether instance weighting will be performed.
 boolean getUseLaplaceCorrection()
          Gets whether a Laplace correction should be used.
 java.lang.String globalInfo()
          Returns a string describing this classifier.
 java.util.Enumeration listOptions()
          Returns an enumeration describing the available options.
static void main(java.lang.String[] args)
          Main method for executing this classifier.
 java.lang.String nominalGeneratorTipText()
          Returns the tip text for this property.
 java.lang.String numericGeneratorTipText()
          Returns the tip text for this property.
 java.lang.String numRepeatsTipText()
          Returns the tip text for this property.
 java.lang.String percentageHeldoutTipText()
          Returns the tip text for this property.
 java.lang.String proportionGeneratedTipText()
          Returns the tip text for this property.
 void setDensityOnly(boolean density)
          Sets whether the density estimate will be used by itself.
 void setNominalGenerator(NominalAttributeGenerator agen)
          Sets the generator that will be used by default to generate nominal outlier data.
 void setNumericGenerator(NumericAttributeGenerator agen)
          Sets the generator that will be used by default to generate numeric outlier data.
 void setNumRepeats(int repeats)
          Sets the number of repeats for (internal) cross validation to a new value.
 void setOptions(java.lang.String[] options)
          Parses a given list of options.
 void setPercentageHeldout(double percent)
          Sets the percentage held out in each CV fold.
 void setProportionGenerated(double prop)
          Sets the proportion of generated data to a new value.
 void setTargetClassLabel(java.lang.String label)
          Sets the target class label to a new value.
 void setTargetRejectionRate(double rate)
          Sets the target rejection rate.
 void setUseInstanceWeights(boolean newuse)
          Sets whether to perform weighting on instances based on their prevalence in the data.
 void setUseLaplaceCorrection(boolean newuse)
          Sets whether a Laplace correction should be used.
 java.lang.String targetClassLabelTipText()
          Returns the tip text for this property.
 java.lang.String targetRejectionRateTipText()
          Returns the tip text for this property.
 java.lang.String toString()
          Outputs a representation of this classifier.
 java.lang.String useInstanceWeightsTipText()
          Returns the tip text for this property.
 java.lang.String useLaplaceCorrectionTipText()
          Returns the tip text for this property.
 
Methods inherited from class weka.classifiers.RandomizableSingleClassifierEnhancer
getSeed, seedTipText, setSeed
 
Methods inherited from class weka.classifiers.SingleClassifierEnhancer
classifierTipText, getClassifier, setClassifier
 
Methods inherited from class weka.classifiers.AbstractClassifier
classifyInstance, debugTipText, forName, getDebug, makeCopies, makeCopy, runClassifier, setDebug
 
Methods inherited from class java.lang.Object
equals, getClass, hashCode, notify, notifyAll, wait, wait, wait
 

Field Detail

OUTLIER_LABEL

public static final java.lang.String OUTLIER_LABEL
The label for the outlier class.

See Also:
Constant Field Values
Constructor Detail

OneClassClassifier

public OneClassClassifier()
Default constructor.

Method Detail

globalInfo

public java.lang.String globalInfo()
Returns a string describing this classifier.

Returns:
A description of the classifier.

getTechnicalInformation

public TechnicalInformation getTechnicalInformation()
Returns an instance of a TechnicalInformation object, containing detailed information about the technical background of this class, e.g., paper reference or book this class is based on.

Specified by:
getTechnicalInformation in interface TechnicalInformationHandler
Returns:
the technical information about this class

listOptions

public java.util.Enumeration listOptions()
Returns an enumeration describing the available options.

Specified by:
listOptions in interface OptionHandler
Overrides:
listOptions in class RandomizableSingleClassifierEnhancer
Returns:
An enumeration of all the available options.

setOptions

public void setOptions(java.lang.String[] options)
                throws java.lang.Exception
Parses a given list of options.

Valid options are:

 -trr <rate>
  Sets the target rejection rate
  (default: 0.1)
 -tcl <label>
  Sets the target class label
  (default: 'target')
 -cvr <rep>
  Sets the number of times to repeat cross validation
  to find the threshold
  (default: 10)
 -P <prop>
  Sets the proportion of generated data
  (default: 0.5)
 -cvf <perc>
  Sets the percentage of held-out data for each cross validation
  fold
  (default: 10)
 -num <classname + options>
  Sets the numeric generator
  (default: weka.classifiers.meta.generators.GaussianGenerator)
 -nom <classname + options>
  Sets the nominal generator
  (default: weka.classifiers.meta.generators.NominalGenerator)
 -L
  Sets whether to correct the number of classes to two;
  if omitted, no correction will be made.
 -E
  Sets whether to exclusively use the density estimate.
 -I
  Sets whether to use instance weights.
 -S <num>
  Random number seed.
  (default 1)
 -D
  If set, classifier is run in debug mode and
  may output additional info to the console
 -W
  Full name of base classifier.
  (default: weka.classifiers.meta.Bagging)
 
 Options specific to classifier weka.classifiers.meta.Bagging:
 
 -P
  Size of each bag, as a percentage of the
  training set size. (default 100)
 -O
  Calculate the out of bag error.
 -S <num>
  Random number seed.
  (default 1)
 -I <num>
  Number of iterations.
  (default 10)
 -D
  If set, classifier is run in debug mode and
  may output additional info to the console
 -W
  Full name of base classifier.
  (default: weka.classifiers.trees.REPTree)
 
 Options specific to classifier weka.classifiers.trees.REPTree:
 
 -M <minimum number of instances>
  Set minimum number of instances per leaf (default 2).
 -V <minimum variance for split>
  Set minimum numeric class variance proportion
  of train variance for split (default 1e-3).
 -N <number of folds>
  Number of folds for reduced error pruning (default 3).
 -S <seed>
  Seed for random data shuffling (default 1).
 -P
  No pruning.
 -L
  Maximum tree depth (default -1, no maximum)

Specified by:
setOptions in interface OptionHandler
Overrides:
setOptions in class RandomizableSingleClassifierEnhancer
Parameters:
options - The list of options as an array of strings.
Throws:
java.lang.Exception - If an option is not supported.

getOptions

public java.lang.String[] getOptions()
Gets the current settings of the Classifier.

Specified by:
getOptions in interface OptionHandler
Overrides:
getOptions in class RandomizableSingleClassifierEnhancer
Returns:
An array of strings suitable for passing to setOptions.

getDensityOnly

public boolean getDensityOnly()
Gets whether only the density estimate should be used by the classifier. If false, the base classifier's estimate will be incorporated using Bayes' rule for two classes.

Returns:
Whether to use only the density estimate.
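The Bayes' rule combination follows the paper cited in the class description. That paper estimates the target density as P(X|T) = [(1 - P(T)) * P(T|X)] / [P(T) * (1 - P(T|X))] * P(X|A). In this formula, P(X|A) is the density of the generated (artificial) data, P(T|X) is the base classifier's estimate, and P(T) is the prior of the target class in the combined training set. The sketch below assumes that density-only mode simply returns P(X|A). The class and method names are hypothetical.

```java
// Sketch of the two-class Bayes' rule combination; names are hypothetical.
final class DensityCombinationSketch {

    /**
     * Combines the reference density P(X|A) with the base classifier's
     * estimate P(T|X):
     * P(X|T) = [(1 - P(T)) * P(T|X)] / [P(T) * (1 - P(T|X))] * P(X|A).
     * Assumption: with densityOnly set, P(X|A) is returned unchanged.
     */
    static double score(double pXgivenA, double pTgivenX, double pT, boolean densityOnly) {
        if (densityOnly) {
            return pXgivenA;
        }
        if (pTgivenX >= 1.0) {
            return Double.POSITIVE_INFINITY;  // the odds diverge when P(T|X) = 1
        }
        double odds = ((1.0 - pT) * pTgivenX) / (pT * (1.0 - pTgivenX));
        return odds * pXgivenA;
    }

    public static void main(String[] args) {
        // Equal priors and P(T|X) = 0.8 scale the reference density by about 4.
        System.out.println(score(0.1, 0.8, 0.5, false));
        System.out.println(score(0.1, 0.8, 0.5, true));  // 0.1
    }
}
```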

setDensityOnly

public void setDensityOnly(boolean density)
Sets whether the density estimate will be used by itself.

Parameters:
density - Whether to use the density estimate exclusively or not.

densityOnlyTipText

public java.lang.String densityOnlyTipText()
Returns the tip text for this property.

Returns:
tip text for this property suitable for displaying in the explorer/experimenter gui

getTargetRejectionRate

public double getTargetRejectionRate()
Gets the target rejection rate - the proportion of target class samples that will be rejected in order to build a threshold.

Returns:
The target rejection rate.
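One way to turn a target rejection rate into a threshold is to take the corresponding lower quantile of the scores computed for held-out target instances. The sketch below assumes that higher scores mean "more target-like" and that instances scoring strictly below the threshold are rejected. Weka's exact quantile rule is not documented on this page.

```java
import java.util.Arrays;

// Sketch: derive a threshold from a target rejection rate (assumption-based).
final class RejectionThresholdSketch {

    /**
     * Returns a threshold such that about rejectionRate of the given target
     * scores fall strictly below it; those instances would be rejected.
     */
    static double threshold(double[] targetScores, double rejectionRate) {
        if (targetScores.length == 0) {
            throw new IllegalArgumentException("need at least one target score");
        }
        double[] sorted = targetScores.clone();
        Arrays.sort(sorted);
        int index = (int) Math.floor(rejectionRate * sorted.length);
        index = Math.max(0, Math.min(index, sorted.length - 1));  // clamp to a valid index
        return sorted[index];
    }

    public static void main(String[] args) {
        double[] scores = {0.9, 0.2, 0.7, 0.4, 0.8, 0.6, 0.3, 0.5, 1.0, 0.1};
        System.out.println(threshold(scores, 0.1));  // 0.2: only 0.1 falls below it
    }
}
```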

setTargetRejectionRate

public void setTargetRejectionRate(double rate)
Sets the target rejection rate.

Parameters:
rate - The new target rejection rate.

targetRejectionRateTipText

public java.lang.String targetRejectionRateTipText()
Returns the tip text for this property.

Returns:
tip text for this property suitable for displaying in the explorer/experimenter gui

getTargetClassLabel

public java.lang.String getTargetClassLabel()
Gets the target class label - the class label to perform one class classification on.

Returns:
The target class label.

setTargetClassLabel

public void setTargetClassLabel(java.lang.String label)
Sets the target class label to a new value.

Parameters:
label - The target class label to classify for.

targetClassLabelTipText

public java.lang.String targetClassLabelTipText()
Returns the tip text for this property.

Returns:
tip text for this property suitable for displaying in the explorer/experimenter gui

getNumRepeats

public int getNumRepeats()
Gets the number of repeats for (internal) cross validation.

Returns:
The number of repeats for internal cross validation.

setNumRepeats

public void setNumRepeats(int repeats)
Sets the number of repeats for (internal) cross validation to a new value.

Parameters:
repeats - The new number of repeats for cross validation.

numRepeatsTipText

public java.lang.String numRepeatsTipText()
Returns the tip text for this property.

Returns:
tip text for this property suitable for displaying in the explorer/experimenter gui

setProportionGenerated

public void setProportionGenerated(double prop)
Sets the proportion of generated data to a new value.

Parameters:
prop - The new proportion.

getProportionGenerated

public double getProportionGenerated()
Gets the proportion of data that will be generated compared to the target class label.

Returns:
The proportion of generated data.

proportionGeneratedTipText

public java.lang.String proportionGeneratedTipText()
Returns the tip text for this property.

Returns:
tip text for this property suitable for displaying in the explorer/experimenter gui

setPercentageHeldout

public void setPercentageHeldout(double percent)
Sets the percentage held out in each CV fold.

Parameters:
percent - The new percentage of held-out data.

getPercentageHeldout

public double getPercentageHeldout()
Gets the percentage of data that will be held out in each iteration of cross validation.

Returns:
The percentage of held-out data.
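The following is a sketch of repeated hold-out splitting. It assumes that each of the `-cvr` repetitions draws an independent random split that holds out `-cvf` percent of the target instances; the held-out scores would then feed the threshold computation. The method name and the rounding of the held-out count are hypothetical choices, not Weka's documented behaviour.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

// Sketch of repeated random hold-out splits over the target instances.
final class HoldoutSplitSketch {

    /** For each repeat, returns the indices of the held-out target instances. */
    static List<List<Integer>> heldOutIndices(int numTarget, double percentHeldout,
                                              int numRepeats, long seed) {
        Random rng = new Random(seed);
        // Assumption: round to the nearest whole instance.
        int holdOut = (int) Math.round(numTarget * percentHeldout / 100.0);
        List<List<Integer>> splits = new ArrayList<>();
        for (int r = 0; r < numRepeats; r++) {
            List<Integer> order = new ArrayList<>();
            for (int i = 0; i < numTarget; i++) {
                order.add(i);
            }
            Collections.shuffle(order, rng);  // a fresh random split per repeat
            splits.add(new ArrayList<>(order.subList(0, holdOut)));
        }
        return splits;
    }

    public static void main(String[] args) {
        // 50 target instances, 10% held out, 3 repeats, seed 1.
        for (List<Integer> split : heldOutIndices(50, 10.0, 3, 1L)) {
            System.out.println(split);  // 5 held-out indices per line
        }
    }
}
```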

percentageHeldoutTipText

public java.lang.String percentageHeldoutTipText()
Returns the tip text for this property.

Returns:
tip text for this property suitable for displaying in the explorer/experimenter gui

getNumericGenerator

public NumericAttributeGenerator getNumericGenerator()
Gets the generator that will be used by default to generate numeric outlier data.

Returns:
The numeric data generator.

setNumericGenerator

public void setNumericGenerator(NumericAttributeGenerator agen)
Sets the generator that will be used by default to generate numeric outlier data.

Parameters:
agen - The new numeric data generator to use.
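The default numeric generator is `GaussianGenerator`. The sketch below illustrates what a Gaussian reference density involves, under stated assumptions. It fits a mean and a sample standard deviation to the target values, evaluates the normal density, and samples artificial values with `java.util.Random.nextGaussian`. It is a stand-alone illustration and does not implement Weka's `NumericAttributeGenerator` interface.

```java
import java.util.Random;

// Sketch of a Gaussian reference density for one numeric attribute.
// Not Weka's GaussianGenerator; it only illustrates the idea.
final class GaussianReferenceSketch {
    final double mean;
    final double stdDev;

    GaussianReferenceSketch(double[] targetValues) {
        double sum = 0.0;
        for (double v : targetValues) {
            sum += v;
        }
        mean = sum / targetValues.length;
        double squares = 0.0;
        for (double v : targetValues) {
            squares += (v - mean) * (v - mean);
        }
        double sd = targetValues.length > 1
            ? Math.sqrt(squares / (targetValues.length - 1))  // sample standard deviation
            : 0.0;
        stdDev = sd > 0.0 ? sd : 1e-6;  // avoid a degenerate zero-width density
    }

    /** Normal density N(mean, stdDev^2) evaluated at x. */
    double density(double x) {
        double z = (x - mean) / stdDev;
        return Math.exp(-0.5 * z * z) / (stdDev * Math.sqrt(2.0 * Math.PI));
    }

    /** Draws n artificial values from the fitted normal distribution. */
    double[] generate(int n, Random rng) {
        double[] out = new double[n];
        for (int i = 0; i < n; i++) {
            out[i] = mean + stdDev * rng.nextGaussian();
        }
        return out;
    }

    public static void main(String[] args) {
        GaussianReferenceSketch g = new GaussianReferenceSketch(new double[] {1.0, 3.0});
        System.out.println(g.mean);                                // 2.0
        System.out.println(g.generate(5, new Random(42)).length);  // 5
    }
}
```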

numericGeneratorTipText

public java.lang.String numericGeneratorTipText()
Returns the tip text for this property.

Returns:
tip text for this property suitable for displaying in the explorer/experimenter gui

getNominalGenerator

public NominalAttributeGenerator getNominalGenerator()
Gets the generator that will be used by default to generate nominal outlier data.

Returns:
The nominal data generator.

setNominalGenerator

public void setNominalGenerator(NominalAttributeGenerator agen)
Sets the generator that will be used by default to generate nominal outlier data.

Parameters:
agen - The new nominal data generator to use.
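For nominal attributes, one plausible reference distribution samples values in proportion to their observed frequencies in the target data. The sketch below shows that approach. It is an assumption, not a description of how Weka's `NominalGenerator` works, and the class name is hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Random;

// Sketch of a frequency-based reference distribution for one nominal
// attribute (an assumed approach, not necessarily Weka's NominalGenerator).
final class NominalReferenceSketch {
    private final Map<String, Double> probabilities = new LinkedHashMap<>();

    NominalReferenceSketch(String[] targetValues) {
        for (String v : targetValues) {
            probabilities.merge(v, 1.0, Double::sum);  // count occurrences
        }
        double n = targetValues.length;
        probabilities.replaceAll((value, count) -> count / n);  // counts -> relative frequencies
    }

    /** Probability of a value under the reference distribution (0 if unseen). */
    double probability(String value) {
        return probabilities.getOrDefault(value, 0.0);
    }

    /** Samples one value by inverting the cumulative distribution. */
    String sample(Random rng) {
        double u = rng.nextDouble();
        double cumulative = 0.0;
        String last = null;
        for (Map.Entry<String, Double> e : probabilities.entrySet()) {
            cumulative += e.getValue();
            last = e.getKey();
            if (u < cumulative) {
                return last;
            }
        }
        return last;  // guards against rounding error in the cumulative sum
    }

    public static void main(String[] args) {
        NominalReferenceSketch ref =
            new NominalReferenceSketch(new String[] {"red", "red", "blue", "green"});
        System.out.println(ref.probability("red"));  // 0.5
        System.out.println(ref.sample(new Random(7)));
    }
}
```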

nominalGeneratorTipText

public java.lang.String nominalGeneratorTipText()
Returns the tip text for this property.

Returns:
tip text for this property suitable for displaying in the explorer/experimenter gui

getUseLaplaceCorrection

public boolean getUseLaplaceCorrection()
Gets whether a Laplace correction should be used.

Returns:
Whether a Laplace correction should be used.

setUseLaplaceCorrection

public void setUseLaplaceCorrection(boolean newuse)
Sets whether a Laplace correction should be used. A Laplace correction will reduce the number of class labels to two, the target and outlier classes, regardless of how many labels actually exist. This is useful for base classifiers that use the number of class labels to compute Laplace-corrected estimates, which assign some probability to classes not seen in the training data.

Parameters:
newuse - Whether to use the laplace correction (default: true).
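The number of class labels matters because of the standard Laplace estimate, (count + 1) / (total + k) for k classes, which a base learner might apply at a leaf. The sketch below assumes such a learner. It compares a leaf holding 10 target instances when k is corrected to two against the case of k = 5 declared labels.

```java
// Sketch: the standard Laplace estimate for one class at a leaf.
final class LaplaceSketch {

    /** (count + 1) / (total + numClasses). */
    static double laplace(int count, int total, int numClasses) {
        return (count + 1.0) / (total + numClasses);
    }

    public static void main(String[] args) {
        // A leaf holding 10 target instances and nothing else.
        System.out.println(laplace(10, 10, 2));  // 11/12 with classes corrected to two
        System.out.println(laplace(10, 10, 5));  // 11/15 with five declared labels
    }
}
```

With the correction, the unseen outlier class receives 1/12 of the probability mass at that leaf. Without it, the remaining mass is spread across labels that play no role in one-class classification.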

useLaplaceCorrectionTipText

public java.lang.String useLaplaceCorrectionTipText()
Returns the tip text for this property.

Returns:
tip text for this property suitable for displaying in the explorer/experimenter gui

setUseInstanceWeights

public void setUseInstanceWeights(boolean newuse)
Sets whether to perform weighting on instances based on their prevalence in the data.

Parameters:
newuse - Whether or not to use instance weighting.

getUseInstanceWeights

public boolean getUseInstanceWeights()
Gets whether instance weighting will be performed.

Returns:
Whether instance weighting will be performed.

useInstanceWeightsTipText

public java.lang.String useInstanceWeightsTipText()
Returns the tip text for this property.

Returns:
tip text for this property suitable for displaying in the explorer/experimenter gui

getCapabilities

public Capabilities getCapabilities()
Returns default capabilities of the base classifier.

Specified by:
getCapabilities in interface Classifier
Specified by:
getCapabilities in interface CapabilitiesHandler
Overrides:
getCapabilities in class SingleClassifierEnhancer
Returns:
the capabilities of the base classifier

buildClassifier

public void buildClassifier(Instances data)
                     throws java.lang.Exception
Builds the one-class classifier; any non-target data values are ignored. The target class label must exist in the ARFF file, otherwise an exception will be thrown.

Specified by:
buildClassifier in interface Classifier
Parameters:
data - The training data.
Throws:
java.lang.Exception - If the classifier could not be built successfully.
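The filtering step described above can be sketched as follows. The sketch uses a hypothetical `Example` type in place of Weka's `Instances`. Only rows carrying the target label are kept, and an unknown target label triggers an exception before any training happens.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of the target-only filtering step; Example is a hypothetical type.
final class TargetFilterSketch {

    /** Hypothetical labelled example: a feature vector plus its class label. */
    static final class Example {
        final double[] features;
        final String label;

        Example(double[] features, String label) {
            this.features = features;
            this.label = label;
        }
    }

    /**
     * Keeps only the examples labelled with targetLabel, and fails fast if that
     * label is not one of the declared class labels.
     */
    static List<Example> targetOnly(List<Example> data, List<String> classLabels,
                                    String targetLabel) {
        if (!classLabels.contains(targetLabel)) {
            throw new IllegalArgumentException(
                "Target label '" + targetLabel + "' not found in " + classLabels);
        }
        List<Example> kept = new ArrayList<>();
        for (Example e : data) {
            if (targetLabel.equals(e.label)) {
                kept.add(e);  // non-target (including 'outlier') rows are ignored
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<Example> data = List.of(
            new Example(new double[] {1.0}, "target"),
            new Example(new double[] {5.0}, "outlier"),
            new Example(new double[] {1.2}, "target"));
        System.out.println(targetOnly(data, List.of("target", "outlier"), "target").size());  // 2
    }
}
```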

distributionForInstance

public double[] distributionForInstance(Instance instance)
                                 throws java.lang.Exception
Returns a probability distribution for a given instance.

Specified by:
distributionForInstance in interface Classifier
Overrides:
distributionForInstance in class AbstractClassifier
Parameters:
instance - The instance to calculate the probability distribution for.
Returns:
The probability for each class.
Throws:
java.lang.Exception

toString

public java.lang.String toString()
Outputs a representation of this classifier.

Overrides:
toString in class java.lang.Object
Returns:
a representation of this classifier

getRevision

public java.lang.String getRevision()
Returns the revision string.

Specified by:
getRevision in interface RevisionHandler
Overrides:
getRevision in class AbstractClassifier
Returns:
The revision string.

main

public static void main(java.lang.String[] args)
Main method for executing this classifier.

Parameters:
args - use -h to see all available options