# causalnex.discretiser.Discretiser

class causalnex.discretiser.Discretiser(method='uniform', num_buckets=None, outlier_percentile=None, numeric_split_points=None, percentile_split_points=None)

Bases: sklearn.base.BaseEstimator, sklearn.base.TransformerMixin

Allows the discretisation of numeric data.

Example:

    import pandas as pd

    from causalnex.discretiser import Discretiser

    df = pd.DataFrame({'Age': [7, 12, 13, 18, 19, 22, 60]})

    # Split points are fixed at initialisation, so no call to fit() is needed.
    df["Transformed_Age"] = Discretiser(
        method="fixed", numeric_split_points=[11, 18, 50]
    ).transform(df["Age"])
    df.to_dict()
    # {'Age': {0: 7, 1: 12, 2: 13, 3: 18, 4: 19, 5: 22, 6: 60},
    #  'Transformed_Age': {0: 0, 1: 1, 2: 1, 3: 2, 4: 2, 5: 2, 6: 3}}


Methods

| Method | Description |
| --- | --- |
| Discretiser.__init__([method, num_buckets, …]) | Creates a new Discretiser that provides fit, fit_transform, and transform functions to discretise data. |
| Discretiser.fit(data) | Fit where split points are based on the input data. |
| Discretiser.fit_transform(X[, y]) | Fit to data, then transform it. |
| Discretiser.get_params([deep]) | Get parameters for this estimator. |
| Discretiser.set_params(**params) | Set the parameters of this estimator. |
| Discretiser.transform(data) | Transform the input data into discretised digits, based on the numeric_split_points that were either learned through using fit(), or from initialisation if method="fixed". |

__init__(method='uniform', num_buckets=None, outlier_percentile=None, numeric_split_points=None, percentile_split_points=None)

Creates a new Discretiser that provides fit, fit_transform, and transform functions to discretise data.

Parameters:

- method (str) – can be one of:
    - uniform: discretise data into uniformly spaced buckets. Note that complete uniformity cannot be guaranteed under all circumstances; for example, if 5 data points are to be split into 2 buckets, then one will contain 2 points and the other will contain 3. Provide num_buckets.
    - quantile: discretise data according to the distribution of values. For example, providing num_buckets=4 will discretise data into 4 buckets, [0-25th, 25th-50th, 50th-75th, 75th-100th] percentiles. Provide num_buckets.
    - outlier: discretise data into 3 buckets, [low_outliers, normal, high_outliers], based on outliers being below outlier_percentile or above 1-outlier_percentile. Provide outlier_percentile.
    - fixed: discretise according to pre-defined split points. Provide numeric_split_points.
    - percentiles: discretise data according to the distribution of percentile values. Provide percentile_split_points.
- num_buckets (Optional[int]) – used by method=uniform and method=quantile.
- outlier_percentile (Optional[float]) – used by method=outlier.
- numeric_split_points (Optional[List[float]]) – used by method=fixed. To split such that values below 10 go into bucket 0, 10 to 20 go into bucket 1, and above 20 go into bucket 2, provide [10, 21]. Note that split_point values are non-inclusive.
- percentile_split_points (Optional[List[float]]) – used by method=percentiles. To split such that values below the 10th percentile go into bucket 0, the 10th to below the 75th percentile go into bucket 1, and the 75th percentile and above go into bucket 2, provide [0.1, 0.75].

Raises: ValueError – If an incorrect argument is passed.

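
To make the bucket numbering concrete, here is a minimal pure-Python sketch of how sorted split points could map values to bucket indices. It assumes the mapping behaves like numpy.digitize with right=False, which is consistent with the example output at the top of this page. The helper assign_buckets is hypothetical and is not part of causalnex.

```python
from bisect import bisect_right


def assign_buckets(values, split_points):
    """Return the bucket index of each value for sorted split points.

    Hypothetical helper (not a causalnex function). A value equal to a
    split point lands in the bucket above it, which is the behaviour of
    numpy.digitize(values, split_points, right=False).
    """
    return [bisect_right(split_points, v) for v in values]


# Split points [10, 21]: below 10 -> 0, 10 to 20 -> 1, above 20 -> 2.
print(assign_buckets([3, 9, 10, 20, 21, 45], [10, 21]))  # [0, 0, 1, 1, 2, 2]

# Ages and split points taken from the example output on this page.
ages = [7, 12, 13, 18, 19, 22, 60]
print(assign_buckets(ages, [11, 18, 50]))  # [0, 1, 1, 2, 2, 2, 3]
```

With n split points there are n + 1 possible bucket indices, so values beyond the last split point still receive a valid bucket.
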
fit(data)

Fit where split points are based on the input data.

Parameters: data (np.ndarray) – values used to learn where split points exist.

Return type: Discretiser

Returns: self

Raises: RuntimeError – If an attempt to fit fixed numeric_split_points is made.

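
As an illustration of what learning split points from data can look like, the sketch below derives numeric split points for percentile fractions such as [0.1, 0.75] and then applies them to unseen values. It is a pure-Python approximation under stated assumptions: percentiles are linearly interpolated, and the helpers percentile and learn_split_points are hypothetical names. The computation inside causalnex may differ.

```python
from bisect import bisect_right
from math import floor


def percentile(sorted_values, fraction):
    """Linearly interpolated percentile of pre-sorted values.

    `fraction` is in [0, 1]. Hypothetical helper, not a causalnex function.
    """
    position = fraction * (len(sorted_values) - 1)
    lower = floor(position)
    upper = min(lower + 1, len(sorted_values) - 1)
    weight = position - lower
    return sorted_values[lower] * (1 - weight) + sorted_values[upper] * weight


def learn_split_points(values, percentile_split_points):
    """Hypothetical stand-in for fit(): derive numeric split points from data."""
    ordered = sorted(values)
    return [percentile(ordered, p) for p in percentile_split_points]


training = list(range(101))  # 0, 1, ..., 100
split_points = learn_split_points(training, [0.1, 0.75])
print(split_points)  # [10.0, 75.0]

# Apply the learned split points to values not seen during fitting.
unseen = [5, 10, 74, 75, 200]
print([bisect_right(split_points, v) for v in unseen])  # [0, 1, 1, 2, 2]
```

Keeping the learned split points separate from the data they were learned on is what allows the same bucketing to be reused on new data.
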
fit_transform(X, y=None, **fit_params)

Fit to data, then transform it.

Fits transformer to X and y with optional parameters fit_params and returns a transformed version of X.

Parameters:

- X (numpy array of shape [n_samples, n_features]) – Training set.
- y (numpy array of shape [n_samples]) – Target values.
- **fit_params (dict) – Additional fit parameters.

Returns: X_new – Transformed array.

Return type: numpy array of shape [n_samples, n_features_new]

get_params(deep=True)

Get parameters for this estimator.

Parameters: deep (bool, default=True) – If True, will return the parameters for this estimator and contained subobjects that are estimators.

Returns: params – Parameter names mapped to their values.

Return type: mapping of string to any

set_params(**params)

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as pipelines). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.

Parameters: **params (dict) – Estimator parameters.

Returns: self – Estimator instance.

Return type: object

transform(data)

Transform the input data into discretised digits, based on the numeric_split_points that were either learned through using fit(), or from initialisation if method="fixed".

Parameters: data (np.ndarray) – values that will be transformed into discretised digits.

Return type: ndarray

Returns: input data transformed into discretised digits.