causalnex.discretiser.Discretiser

class causalnex.discretiser.Discretiser(method='uniform', num_buckets=None, outlier_percentile=None, numeric_split_points=None, percentile_split_points=None)

Bases: sklearn.base.BaseEstimator, sklearn.base.TransformerMixin

Allows the discretisation of numeric data.

Example:

 import causalnex
 import pandas as pd
 from causalnex.discretiser import Discretiser

 df = pd.DataFrame({'Age': [7, 12, 13, 18, 19, 22, 60]})

 # Split points 11, 18 and 50 define four buckets, numbered 0 to 3.
 df["Transformed_Age"] = Discretiser(method="fixed",
     numeric_split_points=[11, 18, 50]).transform(df["Age"])
 df.to_dict()
{'Age': {0: 7, 1: 12, 2: 13, 3: 18, 4: 19, 5: 22, 6: 60},
'Transformed_Age': {0: 0, 1: 1, 2: 1, 3: 2, 4: 2, 5: 2, 6: 3}}

Methods

`Discretiser.__init__`([method, num_buckets, …])	Creates a new Discretiser that provides fit, fit_transform, and transform functions to discretise data.
`Discretiser.fit`(data)	Fit where split points are based on the input data.
`Discretiser.fit_transform`(X[, y])	Fit to data, then transform it.
`Discretiser.get_params`([deep])	Get parameters for this estimator.
`Discretiser.set_params`(**params)	Set the parameters of this estimator.
`Discretiser.transform`(data)	Transform the input data into discretised digits, based on the numeric_split_points that were either learned by calling fit() or supplied at initialisation when method="fixed".

__init__(method='uniform', num_buckets=None, outlier_percentile=None, numeric_split_points=None, percentile_split_points=None)

Creates a new Discretiser that provides fit, fit_transform, and transform functions to discretise data.

Parameters:

method (str) – can be one of:
- uniform: discretise data into uniformly spaced buckets. Note that complete uniformity cannot be guaranteed under all circumstances; for example, if 5 data points are to be split into 2 buckets, then one will contain 2 points and the other will contain 3. Provide num_buckets.
- quantile: discretise data according to the distribution of values. For example, providing num_buckets=4 will discretise data into 4 buckets, [0-25th, 25th-50th, 50th-75th, 75th-100th] percentiles. Provide num_buckets.
- outlier: discretise data into 3 buckets, [low_outliers, normal, high_outliers], based on outliers being below outlier_percentile or above 1-outlier_percentile. Provide outlier_percentile.
- fixed: discretise according to pre-defined split points. Provide numeric_split_points.
- percentiles: discretise data according to the distribution of percentile values. Provide percentile_split_points.
num_buckets (Optional[int]) – used by method=uniform and method=quantile.
outlier_percentile (Optional[float]) – used by method=outlier.
numeric_split_points (Optional[List[float]]) – used by method=fixed. To split such that values below 10 go into bucket 0, 10 to 20 go into bucket 1, and above 20 go into bucket 2, provide [10, 21]. Note that split point values are non-inclusive: a value equal to a split point falls into the higher bucket, as 18 does in the example above.
percentile_split_points (Optional[List[float]]) – used by method=percentiles. To split such that values below the 10th percentile go into bucket 0, values from the 10th to below the 75th percentile go into bucket 1, and values at the 75th percentile and above go into bucket 2, provide [0.1, 0.75].

Raises:

ValueError – If an incorrect argument is passed.
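
The bucket numbering described for numeric_split_points can be illustrated with NumPy alone. The sketch below assumes that bucketing behaves like np.digitize with right=False, which agrees with the example output at the top of this page (18 lands in bucket 2 when 18 is a split point). It is an illustration under that assumption, not the library's source.

```python
import numpy as np

# Split points taken from the parameter description: [10, 21].
split_points = [10, 21]
values = np.array([5, 10, 20, 21, 30])

# Assumption: right=False, so a value equal to a split point
# is assigned to the higher bucket.
buckets = np.digitize(values, split_points, right=False)
print(buckets.tolist())  # [0, 1, 1, 2, 2]
```

Two split points always yield three buckets, so a list of n split points produces bucket labels 0 to n.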

fit(data)

Fit where split points are based on the input data.

Parameters:	data (np.ndarray) – values used to learn where split points exist.
Return type:	`Discretiser`
Returns:	self
Raises:	`RuntimeError` – If an attempt to fit fixed numeric_split_points is made.
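
One way split points could be derived from data for the quantile, outlier and percentiles methods is sketched below with np.quantile. The helper names are hypothetical and the interpolation rule is NumPy's default, so the exact split points the library learns may differ.

```python
import numpy as np

def quantile_split_points(data, num_buckets):
    """Hypothetical helper: num_buckets - 1 cut points at evenly spaced quantiles."""
    fractions = [(i + 1) / num_buckets for i in range(num_buckets - 1)]
    return np.quantile(data, fractions)

def outlier_split_points(data, outlier_percentile):
    """Hypothetical helper: two cut points isolating the low and high tails."""
    return np.quantile(data, [outlier_percentile, 1 - outlier_percentile])

def splits_from_percentiles(data, fractions):
    """Hypothetical helper: cut points at user-supplied fractions such as [0.1, 0.75]."""
    return np.quantile(data, fractions)

data = np.arange(1, 101)  # the integers 1..100

print(quantile_split_points(data, 4))
print(outlier_split_points(data, 0.1))
print(splits_from_percentiles(data, [0.1, 0.75]))
```

With method="fixed" there is nothing to learn, which is consistent with fit raising a RuntimeError in that case.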

fit_transform(X, y=None, **fit_params)

Fit to data, then transform it.

Fits transformer to X and y with optional parameters fit_params and returns a transformed version of X.

Parameters:	X (numpy array of shape [n_samples, n_features]) – Training set.
	y (numpy array of shape [n_samples]) – Target values.
Returns:	X_new – Transformed array.
Return type:	numpy array of shape [n_samples, n_features_new]
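
The relationship between fit, transform and fit_transform can be sketched with a small stand-in class. SketchDiscretiser is hypothetical and is not part of causalnex; it assumes quantile-based split points and np.digitize bucketing purely for illustration.

```python
import numpy as np

class SketchDiscretiser:
    """Hypothetical stand-in for illustration; not the causalnex class."""

    def __init__(self, num_buckets):
        self.num_buckets = num_buckets
        self.numeric_split_points = None

    def fit(self, data):
        # Learn num_buckets - 1 split points at evenly spaced quantiles.
        fractions = [(i + 1) / self.num_buckets for i in range(self.num_buckets - 1)]
        self.numeric_split_points = np.quantile(data, fractions)
        return self  # returning self allows fit(...).transform(...) chaining

    def transform(self, data):
        # Map each value to the index of the bucket it falls into.
        return np.digitize(data, self.numeric_split_points, right=False)

    def fit_transform(self, X, y=None):
        # Equivalent to calling fit, then transform, on the same data.
        return self.fit(X).transform(X)

ages = np.arange(1, 9)  # the integers 1..8
buckets = SketchDiscretiser(num_buckets=4).fit_transform(ages)
print(buckets.tolist())  # [0, 0, 1, 1, 2, 2, 3, 3]
```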

get_params(deep=True)

Get parameters for this estimator.

Parameters:	deep (boolean, optional) – If True, will return the parameters for this estimator and contained subobjects that are estimators.
Returns:	params – Parameter names mapped to their values.
Return type:	mapping of string to any

set_params(**params)

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as pipelines). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.

Returns:	self

transform(data)

Transform the input data into discretised digits, based on the numeric_split_points that were either learned by calling fit() or supplied at initialisation when method="fixed".

Parameters:	data (np.ndarray) – values that will be transformed into discretised digits.
Return type:	`ndarray`
Returns:	input data transformed into discretised digits.