causalnex.discretiser.Discretiser

class causalnex.discretiser.Discretiser(method='uniform', num_buckets=None, outlier_percentile=None, numeric_split_points=None, percentile_split_points=None)

Bases: sklearn.base.BaseEstimator, sklearn.base.TransformerMixin

Allows the discretisation of numeric data.

Example:

 import causalnex
 import pandas as pd

 # The input includes 7 so that it matches the output shown below.
 df = pd.DataFrame({'Age': [7, 12, 13, 18, 19, 22, 60]})

 from causalnex.discretiser import Discretiser
 df["Transformed_Age"] = Discretiser(method="fixed",
     numeric_split_points=[11, 18, 50]).transform(df["Age"])
 df.to_dict()
 {'Age': {0: 7, 1: 12, 2: 13, 3: 18, 4: 19, 5: 22, 6: 60},
  'Transformed_Age': {0: 0, 1: 1, 2: 1, 3: 2, 4: 2, 5: 2, 6: 3}}
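
The bucket assignment in the example can be reproduced without the library. The following is a minimal sketch, assuming each value is assigned the number of split points that are less than or equal to it (consistent with the output above, where 18 lands in bucket 2). The helper bucket_index is hypothetical and not part of causalnex.

```python
from bisect import bisect_right


def bucket_index(value, split_points):
    """Return how many split points are <= value (hypothetical helper)."""
    # bisect_right gives the insertion position to the right of equal entries,
    # so a value equal to a split point falls into the upper bucket.
    return bisect_right(split_points, value)


split_points = [11, 18, 50]  # must be sorted in ascending order
ages = [7, 12, 13, 18, 19, 22, 60]
buckets = [bucket_index(age, split_points) for age in ages]
print(buckets)
```

With three split points there are four possible buckets, numbered 0 to 3.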

Attributes

Methods

Discretiser.__delattr__(name, /)

Implement delattr(self, name).

Discretiser.__dir__()

default dir() implementation

Discretiser.__eq__(value, /)

Return self==value.

Discretiser.__format__

default object formatter

Discretiser.__ge__(value, /)

Return self>=value.

Discretiser.__getattribute__(name, /)

Return getattr(self, name).

Discretiser.__getstate__()

Discretiser.__gt__(value, /)

Return self>value.

Discretiser.__hash__()

Return hash(self).

Discretiser.__init__([method, num_buckets, …])

Creates a new Discretiser that provides fit, fit_transform, and transform functions to discretise data.

Discretiser.__init_subclass__

This method is called when a class is subclassed.

Discretiser.__le__(value, /)

Return self<=value.

Discretiser.__lt__(value, /)

Return self<value.

Discretiser.__ne__(value, /)

Return self!=value.

Discretiser.__new__(**kwargs)

Create and return a new object.

Discretiser.__reduce__

helper for pickle

Discretiser.__reduce_ex__

helper for pickle

Discretiser.__repr__([N_CHAR_MAX])

Return repr(self).

Discretiser.__setattr__(name, value, /)

Implement setattr(self, name, value).

Discretiser.__setstate__(state)

Discretiser.__sizeof__()

size of object in memory, in bytes

Discretiser.__str__()

Return str(self).

Discretiser.__subclasshook__

Abstract classes can override this to customize issubclass().

Discretiser._check_n_features(X, reset)

Set the n_features_in_ attribute, or check against it.

Discretiser._get_param_names()

Get parameter names for the estimator

Discretiser._get_tags()

Discretiser._more_tags()

Discretiser._repr_html_inner()

This function is returned by the @property _repr_html_ to make hasattr(estimator, "_repr_html_") return True or False depending on get_config()["display"].

Discretiser._repr_mimebundle_(**kwargs)

Mime bundle used by jupyter kernels to display estimator

Discretiser._validate_data(X[, y, reset, …])

Validate input data and set or check the n_features_in_ attribute.

Discretiser.fit(data)

Fit where split points are based on the input data.

Discretiser.fit_transform(X[, y])

Fit to data, then transform it.

Discretiser.get_params([deep])

Get parameters for this estimator.

Discretiser.set_params(**params)

Set the parameters of this estimator.

Discretiser.transform(data)

Transform the input data into discretised digits, based on the numeric_split_points that were either learned through fit() or supplied at initialisation if method="fixed".

__init__(method='uniform', num_buckets=None, outlier_percentile=None, numeric_split_points=None, percentile_split_points=None)

Creates a new Discretiser that provides fit, fit_transform, and transform functions to discretise data.

Parameters
  • method (str) – can be one of:
      - uniform: discretise data into uniformly spaced buckets. Note that complete uniformity cannot be guaranteed under all circumstances; for example, if 5 data points are to be split into 2 buckets, one will contain 2 points and the other will contain 3. Provide num_buckets.
      - quantile: discretise data according to the distribution of values. For example, providing num_buckets=4 will discretise data into 4 buckets covering the [0-25th, 25th-50th, 50th-75th, 75th-100th] percentiles. Provide num_buckets.
      - outlier: discretise data into 3 buckets, [low_outliers, normal, high_outliers], based on outliers being below outlier_percentile or above 1 - outlier_percentile. Provide outlier_percentile.
      - fixed: discretise according to pre-defined split points. Provide numeric_split_points.
      - percentiles: discretise data according to the distribution of percentile values. Provide percentile_split_points.

  • num_buckets (Optional[int]) – used by method=uniform and method=quantile.

  • outlier_percentile (Optional[float]) – used by method=outlier.

  • numeric_split_points (Optional[List[float]]) – used by method=fixed. To split such that values below 10 go into bucket 0, 10 to 20 go into bucket 1, and above 20 go into bucket 2, provide [10, 21]. Note that split_point values are non-inclusive.

  • percentile_split_points (Optional[List[float]]) – used by method=percentiles. To split such that values below the 10th percentile go into bucket 0, values from the 10th to below the 75th percentile go into bucket 1, and values at the 75th percentile and above go into bucket 2, provide [0.1, 0.75].
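
As a rough illustration of the quantile method described above, the sketch below derives the split points for num_buckets=4 using only the standard library and then assigns each value to a bucket. It assumes linear interpolation between data points (statistics.quantiles with method="inclusive"); the interpolation that causalnex actually uses is not stated on this page, and quantile_split_points is a hypothetical helper.

```python
from bisect import bisect_right
from statistics import quantiles


def quantile_split_points(values, num_buckets):
    """Return num_buckets - 1 cut points at evenly spaced quantiles."""
    # Assumption: "inclusive" treats the data as the full population and
    # interpolates linearly between neighbouring data points.
    return quantiles(values, n=num_buckets, method="inclusive")


values = [1, 2, 3, 4, 5, 6, 7, 8, 9]
split_points = quantile_split_points(values, num_buckets=4)

# A value equal to a split point is placed in the upper bucket.
buckets = [bisect_right(split_points, v) for v in values]
print(split_points)
print(buckets)
```

Four buckets need only three split points, which is why the helper returns num_buckets - 1 values.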

Raises

ValueError – If an incorrect argument is passed.

fit(data)

Fit where split points are based on the input data.

Parameters

data (np.ndarray) – values used to learn where split points exist.

Return type

Discretiser

Returns

self

Raises

RuntimeError – If an attempt to fit fixed numeric_split_points is made.
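
The contract described here (fit learns split points, returns self, and refuses to run for fixed split points) can be sketched with a small stand-in class. SketchDiscretiser is hypothetical and is not the causalnex implementation: it only supports a quantile-style fit, and its error message is invented for illustration.

```python
from statistics import quantiles


class SketchDiscretiser:
    """Hypothetical stand-in that mimics the documented fit() contract."""

    def __init__(self, method="quantile", num_buckets=None, numeric_split_points=None):
        self.method = method
        self.num_buckets = num_buckets
        self.numeric_split_points = numeric_split_points

    def fit(self, data):
        if self.method == "fixed":
            # Fixed split points are supplied up front, so there is nothing to learn.
            raise RuntimeError("fit is not supported for method='fixed'")
        # Assumption: only a quantile-style fit is sketched here.
        self.numeric_split_points = quantiles(
            data, n=self.num_buckets, method="inclusive"
        )
        return self  # returning self allows chained calls


fitted = SketchDiscretiser(num_buckets=2).fit([1, 2, 3, 4, 5])

try:
    SketchDiscretiser(method="fixed", numeric_split_points=[10, 21]).fit([1, 2, 3])
    fixed_fit_failed = False
except RuntimeError:
    fixed_fit_failed = True

print(fitted.numeric_split_points, fixed_fit_failed)
```

Because fit returns the instance itself, a call such as fit(data).transform(data) can be chained on the real class as well.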

fit_transform(X, y=None, **fit_params)

Fit to data, then transform it.

Fits transformer to X and y with optional parameters fit_params and returns a transformed version of X.

Parameters
  • X (array-like of shape (n_samples, n_features)) – Input samples.

  • y (array-like of shape (n_samples,) or (n_samples, n_outputs), default=None) – Target values (None for unsupervised transformations).

  • **fit_params (dict) – Additional fit parameters.

Returns

X_new – Transformed array.

Return type

ndarray of shape (n_samples, n_features_new)

get_params(deep=True)

Get parameters for this estimator.

Parameters

deep (bool, default=True) – If True, will return the parameters for this estimator and contained subobjects that are estimators.

Returns

params – Parameter names mapped to their values.

Return type

dict

set_params(**params)

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.

Parameters

**params (dict) – Estimator parameters.

Returns

self – Estimator instance.

Return type

estimator instance
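
The <component>__<parameter> naming can be illustrated with plain string handling. This is only a sketch of the convention, not scikit-learn's implementation; split_nested_key and the component name "discretiser" are hypothetical.

```python
def split_nested_key(key):
    """Split 'component__parameter' into its two parts (hypothetical helper).

    Keys without a double underscore address the estimator itself,
    which is signalled here by returning None as the component.
    """
    component, separator, parameter = key.partition("__")
    if not separator:
        return None, key
    return component, parameter


nested = split_nested_key("discretiser__num_buckets")
plain = split_nested_key("num_buckets")
print(nested, plain)
```

A single underscore inside a parameter name, as in num_buckets, is left untouched; only the double underscore acts as a separator.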

transform(data)

Transform the input data into discretised digits, based on the numeric_split_points that were either learned through fit() or supplied at initialisation if method="fixed".

Parameters

data (np.ndarray) – values that will be transformed into discretised digits.

Return type

ndarray

Returns

input data transformed into discretised digits.