causalnex.discretiser.DecisionTreeSupervisedDiscretiserMethod

class causalnex.discretiser.DecisionTreeSupervisedDiscretiserMethod(mode='single', split_unselected_feat=False, tree_params=None)[source]

Bases: causalnex.discretiser.abstract_discretiser.AbstractSupervisedDiscretiserMethod

Allows the discretisation of continuous features based on the split thresholds of either sklearn’s DecisionTreeRegressor or DecisionTreeClassifier. DecisionTreeSupervisedDiscretiserMethod inherits from AbstractSupervisedDiscretiserMethod. Once instantiated, the object provides a .fit method to learn discretisation thresholds from data and a .transform method to apply them to input data.

Example:

 import pandas as pd
 from causalnex.discretiser.discretiser_strategy import DecisionTreeSupervisedDiscretiserMethod
 from sklearn.datasets import load_iris

 # Load the iris data into a dataframe, with the class label stored as "target"
 iris = load_iris()
 X, y = iris["data"], iris["target"]
 names = iris["feature_names"]
 data = pd.DataFrame(X, columns=names)
 data["target"] = y

 # "multi" mode: one decision tree over all features, limited to depth 3
 dt_multi = DecisionTreeSupervisedDiscretiserMethod(
     mode="multi", tree_params={"max_depth": 3, "random_state": 2020}
 )
 # Learn split thresholds; the target is categorical, so target_continuous=False
 tree_discretiser = dt_multi.fit(
     feat_names=[
         "sepal length (cm)",
         "sepal width (cm)",
         "petal length (cm)",
         "petal width (cm)",
     ],
     dataframe=data,
     target="target",
     target_continuous=False,
 )
 # Discretise a single column using the learned thresholds
 discretised_data = tree_discretiser.transform(data[["petal width (cm)"]])
 discretised_data.values.ravel()
array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
   0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
   0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
   1, 1, 1, 1, 2, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
   1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 1, 2, 2, 2,
   2, 2, 2, 2, 2, 2, 2, 2, 2, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 1, 2, 2,
   2, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2])

Methods

DecisionTreeSupervisedDiscretiserMethod.__delattr__(name, /)

Implement delattr(self, name).

DecisionTreeSupervisedDiscretiserMethod.__dir__()

default dir() implementation

DecisionTreeSupervisedDiscretiserMethod.__eq__(…)

Return self==value.

DecisionTreeSupervisedDiscretiserMethod.__format__

default object formatter

DecisionTreeSupervisedDiscretiserMethod.__ge__(…)

Return self>=value.

DecisionTreeSupervisedDiscretiserMethod.__getattribute__(name, /)

Return getattr(self, name).

DecisionTreeSupervisedDiscretiserMethod.__getstate__()

DecisionTreeSupervisedDiscretiserMethod.__gt__(…)

Return self>value.

DecisionTreeSupervisedDiscretiserMethod.__hash__()

Return hash(self).

DecisionTreeSupervisedDiscretiserMethod.__init__([…])

This Discretiser Method uses Decision Trees to predict the target.

DecisionTreeSupervisedDiscretiserMethod.__init_subclass__

This method is called when a class is subclassed.

DecisionTreeSupervisedDiscretiserMethod.__le__(…)

Return self<=value.

DecisionTreeSupervisedDiscretiserMethod.__lt__(…)

Return self<value.

DecisionTreeSupervisedDiscretiserMethod.__ne__(…)

Return self!=value.

DecisionTreeSupervisedDiscretiserMethod.__new__(…)

Create and return a new object.

DecisionTreeSupervisedDiscretiserMethod.__reduce__

helper for pickle

DecisionTreeSupervisedDiscretiserMethod.__reduce_ex__

helper for pickle

DecisionTreeSupervisedDiscretiserMethod.__repr__([…])

Return repr(self).

DecisionTreeSupervisedDiscretiserMethod.__setattr__(…)

Implement setattr(self, name, value).

DecisionTreeSupervisedDiscretiserMethod.__setstate__(state)

DecisionTreeSupervisedDiscretiserMethod.__sizeof__()

size of object in memory, in bytes

DecisionTreeSupervisedDiscretiserMethod.__str__()

Return str(self).

DecisionTreeSupervisedDiscretiserMethod.__subclasshook__

Abstract classes can override this to customize issubclass().

DecisionTreeSupervisedDiscretiserMethod._check_n_features(X, …)

Set the n_features_in_ attribute, or check against it.

DecisionTreeSupervisedDiscretiserMethod._get_param_names()

Get parameter names for the estimator

DecisionTreeSupervisedDiscretiserMethod._get_tags()

DecisionTreeSupervisedDiscretiserMethod._more_tags()

DecisionTreeSupervisedDiscretiserMethod._repr_html_inner()

This function is returned by the @property _repr_html_ to make hasattr(estimator, "_repr_html_") return True or False depending on get_config()["display"].

DecisionTreeSupervisedDiscretiserMethod._repr_mimebundle_(…)

Mime bundle used by jupyter kernels to display estimator

DecisionTreeSupervisedDiscretiserMethod._transform_one_column(…)

Given one “original” feature (continuous), discretise it.

DecisionTreeSupervisedDiscretiserMethod._validate_data(X)

Validate input data and set or check the n_features_in_ attribute.

DecisionTreeSupervisedDiscretiserMethod.fit(…)

The fit method allows DecisionTrees to learn split thresholds from the input data

DecisionTreeSupervisedDiscretiserMethod.fit_transform(…)

Raises NotImplementedError: fit_transform is not implemented.

DecisionTreeSupervisedDiscretiserMethod.get_params([deep])

Get parameters for this estimator.

DecisionTreeSupervisedDiscretiserMethod.set_params(…)

Set the parameters of this estimator.

DecisionTreeSupervisedDiscretiserMethod.transform(data)

Given one “original” dataframe, discretise it.

__init__(mode='single', split_unselected_feat=False, tree_params=None)[source]

This Discretiser Method uses Decision Trees to predict the target. The cutting points of the Decision Tree become the chosen discretisation thresholds.

If the target is a continuous variable, we fit a DecisionTreeRegressor to discretise the data. Otherwise, we fit a Classifier.
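The choice between regressor and classifier, and the way split points turn into thresholds, can be sketched with scikit-learn alone. This is a minimal illustration and not causalnex's implementation: the helper `learn_thresholds` is a hypothetical name, and it covers only the univariate case.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, DecisionTreeRegressor


def learn_thresholds(x, y, target_continuous, **tree_params):
    """Hypothetical helper: fit a univariate tree and return its sorted split points."""
    # Continuous target -> regressor; categorical target -> classifier.
    tree_cls = DecisionTreeRegressor if target_continuous else DecisionTreeClassifier
    tree = tree_cls(**tree_params).fit(np.asarray(x).reshape(-1, 1), y)
    # Leaf nodes carry a negative feature index; only internal nodes hold real splits.
    is_split = tree.tree_.feature >= 0
    return np.unique(tree.tree_.threshold[is_split])


iris = load_iris()
petal_width = iris["data"][:, 3]

# Categorical target (species): a classifier is fitted.
class_cuts = learn_thresholds(
    petal_width, iris["target"], False, max_depth=2, random_state=2020
)

# Continuous target (sepal length): a regressor is fitted.
reg_cuts = learn_thresholds(
    petal_width, iris["data"][:, 0], True, max_depth=2, random_state=2020
)

print(class_cuts, reg_cuts)
```

With `max_depth=2` a tree has at most three internal nodes, so each call yields at most three thresholds, that is, at most four bins.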

Parameters
  • max_depth (int) – maximum depth of the decision tree; in the example above it is supplied through tree_params.

  • mode (str) – Either ‘single’ or ‘multi’.
      - if ‘single’: train a univariate decision tree for each continuous variable being discretised. The splitting points of each decision tree become the discretiser fixed points.
      - if ‘multi’: train a decision tree over all the variables passed. The splitting points of each variable used in the decision tree become the thresholds for discretisation.

  • split_unselected_feat (bool) – only applicable if self.mode = ‘multi’.
      - if True: features not selected by the decision tree will be discretised using 'single' mode with the same tree parameters.
      - if False: features not selected by the decision tree will be left unchanged.

  • tree_params (Optional[Dict[str, Any]]) – keyword arguments, which are parameters used for sklearn.tree.DecisionTreeClassifier/sklearn.tree.DecisionTreeRegressor.
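The difference between the two modes can be sketched directly with scikit-learn. This is an illustration under assumptions, not the library's code: the helper `split_points` and the variable names are hypothetical, and only the classifier case is shown.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

iris = load_iris()
X, y, names = iris["data"], iris["target"], iris["feature_names"]
params = {"max_depth": 3, "random_state": 2020}


def split_points(tree, n_features):
    """Hypothetical helper: group a fitted tree's thresholds by feature index."""
    feature, threshold = tree.tree_.feature, tree.tree_.threshold
    return {j: np.unique(threshold[feature == j]) for j in range(n_features)}


# 'single': one univariate tree per feature, so every feature gets thresholds.
single = {
    names[j]: split_points(DecisionTreeClassifier(**params).fit(X[:, [j]], y), 1)[0]
    for j in range(X.shape[1])
}

# 'multi': one tree over all features; only features it splits on get thresholds.
multi_tree = DecisionTreeClassifier(**params).fit(X, y)
multi = {names[j]: cuts for j, cuts in split_points(multi_tree, X.shape[1]).items()}

# Features without thresholds in 'multi' are the ones split_unselected_feat governs.
unselected = [name for name, cuts in multi.items() if len(cuts) == 0]
print(unselected)
```

In this sketch, `split_unselected_feat=True` would correspond to filling the empty entries of `multi` from `single`, while `False` would leave those columns untouched.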

Raises

KeyError – if an incorrect argument is passed

fit(feat_names, target, dataframe, target_continuous)[source]

The fit method allows DecisionTrees to learn split thresholds from the input data

Parameters
  • feat_names (List[str]) – a list of features to be discretised

  • target (str) – name of the variable that is going to be used as the target for the decision tree

  • dataframe (pd.DataFrame) – pandas dataframe of input data

  • target_continuous (bool) – a boolean that indicates if the target variable is continuous

Returns

DecisionTreeSupervisedDiscretiserMethod object with learned split thresholds from the decision tree

Return type

self

fit_transform(*args, **kwargs)

Raises

NotImplementedError – fit_transform is not implemented

get_params(deep=True)

Get parameters for this estimator.

Parameters

deep (bool, default=True) – If True, will return the parameters for this estimator and contained subobjects that are estimators.

Returns

params – Parameter names mapped to their values.

Return type

dict
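As a rough illustration of what get_params returns, the sketch below uses a hypothetical stand-in class, `ToyDiscretiser`, that only mirrors the constructor signature documented above. It assumes the behaviour comes from scikit-learn's `BaseEstimator`, which the scikit-learn style helper methods listed earlier suggest but this page does not state outright.

```python
from sklearn.base import BaseEstimator


class ToyDiscretiser(BaseEstimator):
    """Hypothetical stand-in: same constructor arguments, no fitting logic."""

    def __init__(self, mode="single", split_unselected_feat=False, tree_params=None):
        # BaseEstimator.get_params reads these attributes back by argument name.
        self.mode = mode
        self.split_unselected_feat = split_unselected_feat
        self.tree_params = tree_params


toy = ToyDiscretiser(mode="multi", tree_params={"max_depth": 3})
params = toy.get_params()
print(params)
```

The returned dictionary has one key per constructor argument, which makes it convenient for logging a configuration or cloning an estimator.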

set_params(**params)

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.

Parameters

**params (dict) – Estimator parameters.

Returns

self – Estimator instance.

Return type

estimator instance
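The `<component>__<parameter>` form can be seen with plain scikit-learn objects. The sketch below uses `DecisionTreeClassifier`, `StandardScaler`, and `Pipeline` purely for illustration; the step names "scale" and "tree" are arbitrary choices, not part of causalnex.

```python
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

# Simple estimator: parameters are set by name, and the estimator itself is returned.
tree = DecisionTreeClassifier(max_depth=2)
returned = tree.set_params(max_depth=3, random_state=2020)

# Nested object: address a step's parameter as <step name>__<parameter>.
pipe = Pipeline([("scale", StandardScaler()), ("tree", DecisionTreeClassifier())])
pipe.set_params(tree__max_depth=3, scale__with_mean=False)

print(pipe.named_steps["tree"].max_depth, pipe.named_steps["scale"].with_mean)
```

Because set_params returns the estimator, calls can be chained, for example `tree.set_params(max_depth=3).fit(X, y)`.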

transform(data)

Given one “original” dataframe, discretise it.

Parameters

data (DataFrame) – dataframe with continuous features, to be transformed into discrete

Return type

array

Returns

discretised version of the input data
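Conceptually, transforming a column means mapping each value to the index of the interval it falls into. The sketch below shows that step with `np.digitize`. The thresholds are made-up values standing in for split points learned during fit, and the treatment of values exactly on a threshold is an assumption that may differ from the library's.

```python
import numpy as np
import pandas as pd

# Hypothetical thresholds, standing in for split points learned during fit.
thresholds = {"petal width (cm)": np.array([0.8, 1.75])}

data = pd.DataFrame({"petal width (cm)": [0.2, 0.8, 1.3, 1.75, 2.1]})

discretised = data.copy()
for column, cuts in thresholds.items():
    # np.digitize returns, for each value, the index of the bin it falls into.
    # With the default right=False, a value equal to a threshold goes to the upper bin.
    discretised[column] = np.digitize(data[column].to_numpy(), cuts)

print(discretised["petal width (cm)"].tolist())  # [0, 1, 1, 2, 2]
```

Two thresholds produce three bins, labelled 0, 1 and 2, which matches the three labels seen in the class-level example output.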