causalnex.discretiser.Discretiser

class causalnex.discretiser.Discretiser(method='uniform', num_buckets=None, outlier_percentile=None, numeric_split_points=None, percentile_split_points=None)[source]

Bases: sklearn.base.BaseEstimator, sklearn.base.TransformerMixin

Allows the discretisation of numeric data.

Example:

 import pandas as pd
 from causalnex.discretiser import Discretiser

 df = pd.DataFrame({'Age': [7, 12, 13, 18, 19, 22, 60]})

 # A value equal to a split point falls into the higher bucket (18 -> bucket 2).
 df["Transformed_Age"] = Discretiser(
     method="fixed", numeric_split_points=[11, 18, 50]
 ).transform(df["Age"])
 df.to_dict()
{'Age': {0: 7, 1: 12, 2: 13, 3: 18, 4: 19, 5: 22, 6: 60},
'Transformed_Age': {0: 0, 1: 1, 2: 1, 3: 2, 4: 2, 5: 2, 6: 3}}

Methods

Discretiser.__init__([method, num_buckets, …]) Creates a new Discretiser that provides fit, fit_transform, and transform methods to discretise data.
Discretiser.fit(data) Fit where split points are based on the input data.
Discretiser.fit_transform(X[, y]) Fit to data, then transform it.
Discretiser.get_params([deep]) Get parameters for this estimator.
Discretiser.set_params(**params) Set the parameters of this estimator.
Discretiser.transform(data) Transform the input data into discretised digits, based on the numeric_split_points that were either learned through fit() or supplied at initialisation if method="fixed".
__init__(method='uniform', num_buckets=None, outlier_percentile=None, numeric_split_points=None, percentile_split_points=None)[source]

Creates a new Discretiser that provides fit, fit_transform, and transform methods to discretise data.

Parameters:
  • method (str) – one of:
      - uniform: discretise data into uniformly spaced buckets. Complete uniformity cannot be guaranteed in all circumstances; for example, if 5 data points are split into 2 buckets, one will contain 2 points and the other 3. Provide num_buckets.
      - quantile: discretise data according to the distribution of values. For example, num_buckets=4 discretises data into 4 buckets covering the [0-25th, 25th-50th, 50th-75th, 75th-100th] percentiles. Provide num_buckets.
      - outlier: discretise data into 3 buckets, [low_outliers, normal, high_outliers], where outliers are values below outlier_percentile or above 1-outlier_percentile. Provide outlier_percentile.
      - fixed: discretise according to pre-defined split points. Provide numeric_split_points.
      - percentiles: discretise data according to the given percentiles of its distribution. Provide percentile_split_points.
  • num_buckets (Optional[int]) – used by method=uniform and method=quantile.
  • outlier_percentile (Optional[float]) – used by method=outlier.
  • numeric_split_points (Optional[List[float]]) – used by method=fixed. To split such that values below 10 go into bucket 0, 10 to 20 go into bucket 1, and above 20 go into bucket 2, provide [10, 21]. Note that split_point values are non-inclusive: a value equal to a split point goes into the higher bucket.
  • percentile_split_points (Optional[List[float]]) – used by method=percentiles. To split such that values below the 10th percentile go into bucket 0, values from the 10th to below the 75th percentile go into bucket 1, and values at the 75th percentile and above go into bucket 2, provide [0.1, 0.75].
Raises:

ValueError – If an incorrect argument is passed.
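
The uniform case described above (5 data points split into 2 buckets, giving a 2/3 split) can be illustrated with a small standalone sketch. It does not call causalnex: uniform_split_points is a hypothetical helper, and picking each split point by position in the sorted data is an assumption about how such buckets could be formed, not a statement of the library's implementation.

```python
import math
from bisect import bisect_right


def uniform_split_points(values, num_buckets):
    """Hypothetical helper: pick split points at evenly spaced positions in the sorted data."""
    ordered = sorted(values)
    bucket_width = len(ordered) / num_buckets  # points per bucket, may be fractional
    return [
        ordered[math.floor((n + 1) * bucket_width)]
        for n in range(num_buckets - 1)
    ]


values = [1, 2, 3, 4, 5]
split_points = uniform_split_points(values, num_buckets=2)

# Assumption: a value equal to a split point goes into the higher bucket.
buckets = [bisect_right(split_points, v) for v in values]
counts = [buckets.count(b) for b in range(2)]

print(split_points)  # [3]
print(buckets)       # [0, 0, 1, 1, 1]
print(counts)        # [2, 3]
```

With an odd number of points and an even number of buckets, the fractional bucket width is rounded down to an index, which is why the buckets cannot be exactly equal in size.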

fit(data)[source]

Fit where split points are based on the input data.

Parameters:data (np.ndarray) – values used to learn where split points exist.
Return type:Discretiser
Returns:self
Raises:RuntimeError – If fit is called with method="fixed", since fixed numeric_split_points are supplied at initialisation and not learned from data.
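
As a rough illustration of what "learning split points from data" can mean for the quantile and outlier methods, the sketch below computes split points in plain Python. The helpers quantile and learn_split_points are hypothetical, and the linear interpolation between neighbouring data points is an assumption; the library may use a different rule.

```python
from bisect import bisect_right


def quantile(ordered, q):
    """Linearly interpolated quantile of already-sorted data, with 0 <= q <= 1."""
    position = q * (len(ordered) - 1)
    lower = int(position)
    upper = min(lower + 1, len(ordered) - 1)
    fraction = position - lower
    return ordered[lower] + (ordered[upper] - ordered[lower]) * fraction


def learn_split_points(values, fractions):
    """Hypothetical helper: one split point per requested quantile fraction."""
    ordered = sorted(values)
    return [quantile(ordered, q) for q in fractions]


values = [1, 2, 3, 4, 5, 6, 7, 8]

# Like method="quantile" with num_buckets=4: cut at the 25th, 50th and 75th percentiles.
quartile_points = learn_split_points(values, [0.25, 0.5, 0.75])

# Like method="outlier" with outlier_percentile=0.25: cut at 0.25 and 1 - 0.25.
outlier_points = learn_split_points(values, [0.25, 1 - 0.25])

# Assign each value the index of the bucket it falls into.
quartile_buckets = [bisect_right(quartile_points, v) for v in values]
outlier_buckets = [bisect_right(outlier_points, v) for v in values]

print(quartile_points)   # [2.75, 4.5, 6.25]
print(quartile_buckets)  # [0, 0, 1, 1, 2, 2, 3, 3]
print(outlier_points)    # [2.75, 6.25]
print(outlier_buckets)   # [0, 0, 1, 1, 1, 1, 2, 2]
```

The same helper also covers the percentiles method: passing [0.1, 0.75] as fractions would yield two split points and therefore three buckets.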
fit_transform(X, y=None, **fit_params)

Fit to data, then transform it.

Fits transformer to X and y with optional parameters fit_params and returns a transformed version of X.

Parameters:
  • X (numpy array of shape [n_samples, n_features]) – Training set.
  • y (numpy array of shape [n_samples]) – Target values.
  • **fit_params (dict) – Additional fit parameters.
Returns:

X_new – Transformed array.

Return type:

numpy array of shape [n_samples, n_features_new]

get_params(deep=True)

Get parameters for this estimator.

Parameters:deep (bool, default=True) – If True, will return the parameters for this estimator and contained subobjects that are estimators.
Returns:params – Parameter names mapped to their values.
Return type:mapping of string to any
set_params(**params)

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as pipelines). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.

Parameters:**params (dict) – Estimator parameters.
Returns:self – Estimator instance.
Return type:object
transform(data)[source]

Transform the input data into discretised digits, based on the numeric_split_points that were either learned through fit() or supplied at initialisation if method="fixed".

Parameters:data (np.ndarray) – values that will be transformed into discretised digits.
Return type:ndarray
Returns:input data transformed into discretised digits.
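
The mapping from values to "discretised digits" can be sketched without the library. The discretise function below is a hypothetical stand-in for transform, not the library's code; it assumes the bucket index is the number of split points less than or equal to the value, which reproduces the bucket indices in the class-level example output and the [10, 21] case described for numeric_split_points.

```python
from bisect import bisect_right


def discretise(values, split_points):
    """Hypothetical stand-in for transform: count the split points <= each value."""
    return [bisect_right(split_points, v) for v in values]


ages = [7, 12, 13, 18, 19, 22, 60]
age_buckets = discretise(ages, [11, 18, 50])

# 10 sits exactly on a split point, so it lands in bucket 1, not bucket 0.
boundary_buckets = discretise([9, 10, 20, 21], [10, 21])

print(age_buckets)       # [0, 1, 1, 2, 2, 2, 3]
print(boundary_buckets)  # [0, 1, 1, 2]
```

With k split points the result always uses indices 0 through k, so three split points produce four buckets.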