causalnex.discretiser.Discretiser

class causalnex.discretiser.Discretiser(method='uniform', num_buckets=None, outlier_percentile=None, numeric_split_points=None, percentile_split_points=None)

Bases: sklearn.base.BaseEstimator, sklearn.base.TransformerMixin

Allows the discretisation of numeric data.

Example:

 import causalnex
 import pandas as pd

 df = pd.DataFrame({'Age': [7, 12, 13, 18, 19, 22, 60]})

 from causalnex.discretiser import Discretiser
 df["Transformed_Age"] = Discretiser(method="fixed",
     numeric_split_points=[11, 18, 50]).transform(df["Age"])
 df.to_dict()
{'Age': {0: 7, 1: 12, 2: 13, 3: 18, 4: 19, 5: 22, 6: 60},
 'Transformed_Age': {0: 0, 1: 1, 2: 1, 3: 2, 4: 2, 5: 2, 6: 3}}

Methods

`Discretiser.__delattr__`(name, /)	Implement delattr(self, name).
`Discretiser.__dir__`()	Default dir() implementation.
`Discretiser.__eq__`(value, /)	Return self==value.
`Discretiser.__format__`(format_spec, /)	Default object formatter.
`Discretiser.__ge__`(value, /)	Return self>=value.
`Discretiser.__getattribute__`(name, /)	Return getattr(self, name).
`Discretiser.__getstate__`()
`Discretiser.__gt__`(value, /)	Return self>value.
`Discretiser.__hash__`()	Return hash(self).
`Discretiser.__init__`([method, num_buckets, …])	Creates a new Discretiser that provides fit, fit_transform, and transform functions to discretise data.
`Discretiser.__init_subclass__`(**kwargs)	Set the `set_{method}_request` methods.
`Discretiser.__le__`(value, /)	Return self<=value.
`Discretiser.__lt__`(value, /)	Return self<value.
`Discretiser.__ne__`(value, /)	Return self!=value.
`Discretiser.__new__`(**kwargs)	Create and return a new object.
`Discretiser.__reduce__`()	Helper for pickle.
`Discretiser.__reduce_ex__`(protocol, /)	Helper for pickle.
`Discretiser.__repr__`([N_CHAR_MAX])	Return repr(self).
`Discretiser.__setattr__`(name, value, /)	Implement setattr(self, name, value).
`Discretiser.__setstate__`(state)
`Discretiser.__sizeof__`()	Size of object in memory, in bytes.
`Discretiser.__sklearn_clone__`()
`Discretiser.__str__`()	Return str(self).
`Discretiser.__subclasshook__`	Abstract classes can override this to customize issubclass().
`Discretiser._build_request_for_signature`(…)	Build the MethodMetadataRequest for a method using its signature.
`Discretiser._check_feature_names`(X, *, reset)	Set or check the feature_names_in_ attribute.
`Discretiser._check_n_features`(X, reset)	Set the n_features_in_ attribute, or check against it.
`Discretiser._get_default_requests`()	Collect default request values.
`Discretiser._get_metadata_request`()	Get requested data properties.
`Discretiser._get_param_names`()	Get parameter names for the estimator
`Discretiser._get_tags`()
`Discretiser._more_tags`()
`Discretiser._repr_html_inner`()	This function is returned by the @property _repr_html_ to make hasattr(estimator, "_repr_html_") return True or False depending on get_config()["display"].
`Discretiser._repr_mimebundle_`(**kwargs)	Mime bundle used by jupyter kernels to display estimator
`Discretiser._validate_data`([X, y, reset, …])	Validate input data and set or check the n_features_in_ attribute.
`Discretiser._validate_params`()	Validate types and values of constructor parameters
`Discretiser.fit`(data)	Fit where split points are based on the input data.
`Discretiser.fit_transform`(X[, y])	Fit to data, then transform it.
`Discretiser.get_metadata_routing`()	Get metadata routing of this object.
`Discretiser.get_params`([deep])	Get parameters for this estimator.
`Discretiser.set_fit_request`(*[, data])	Request metadata passed to the `fit` method.
`Discretiser.set_output`(*[, transform])	Set output container.
`Discretiser.set_params`(**params)	Set the parameters of this estimator.
`Discretiser.set_transform_request`(*[, data])	Request metadata passed to the `transform` method.
`Discretiser.transform`(data)	Transform the input data into discretised digits, based on the numeric_split_points that were either learned through using fit(), or from initialisation if method="fixed".

__init__(method='uniform', num_buckets=None, outlier_percentile=None, numeric_split_points=None, percentile_split_points=None)

Creates a new Discretiser that provides fit, fit_transform, and transform functions to discretise data.

Parameters

method (str) – can be one of:
- uniform: discretise data into uniformly spaced buckets. Note that complete uniformity cannot be guaranteed under all circumstances; for example, if 5 data points are to be split into 2 buckets, then one will contain 2 points and the other will contain 3. Provide num_buckets.
- quantile: discretise data according to the distribution of values. For example, providing num_buckets=4 will discretise data into 4 buckets, [0-25th, 25th-50th, 50th-75th, 75th-100th] percentiles. Provide num_buckets.
- outlier: discretise data into 3 buckets, [low_outliers, normal, high_outliers], based on outliers being below outlier_percentile or above 1-outlier_percentile. Provide outlier_percentile.
- fixed: discretise according to pre-defined split points. Provide numeric_split_points.
- percentiles: discretise data according to the distribution of percentile values. Provide percentile_split_points.
num_buckets (Optional[int]) – used by method=uniform and method=quantile.
outlier_percentile (Optional[float]) – used by method=outlier.
numeric_split_points (Optional[List[float]]) – used by method=fixed. To split such that values below 10 go into bucket 0, 10 to 20 go into bucket 1, and above 20 go into bucket 2, provide [10, 21]. Note that split_point values are non-inclusive.
percentile_split_points (Optional[List[float]]) – used by method=percentiles. To split such that values below the 10th percentile go into bucket 0, the 10th to below the 75th percentile go into bucket 1, and the 75th percentile and above go into bucket 2, provide [0.1, 0.75].

Raises

ValueError – If an incorrect argument is passed.
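
To make the percentile-based options more concrete, the sketch below uses only the Python standard library to turn percentile split points into numeric split points and then assign buckets. It assumes linear interpolation between sorted values and that a value equal to a split point falls into the higher bucket. percentile_value and assign_bucket are hypothetical helpers, not part of causalnex, and the library's own computation may differ in detail.

```python
from bisect import bisect_right
from typing import List, Sequence


def percentile_value(sorted_values: Sequence[float], fraction: float) -> float:
    """Linearly interpolated percentile of already-sorted data (assumed scheme)."""
    position = fraction * (len(sorted_values) - 1)
    lower = int(position)
    upper = min(lower + 1, len(sorted_values) - 1)
    weight = position - lower
    return sorted_values[lower] * (1 - weight) + sorted_values[upper] * weight


def assign_bucket(value: float, split_points: List[float]) -> int:
    """Bucket index: the number of split points less than or equal to value."""
    return bisect_right(split_points, value)


values = sorted(float(v) for v in range(101))  # 0.0, 1.0, ..., 100.0

# method="percentiles" with percentile_split_points=[0.1, 0.75]
percentile_splits = [percentile_value(values, p) for p in (0.1, 0.75)]

# method="outlier" with outlier_percentile=0.05 is assumed here to behave
# like percentile split points [0.05, 0.95].
outlier_splits = [percentile_value(values, p) for p in (0.05, 0.95)]

percentile_buckets = [assign_bucket(v, percentile_splits) for v in (5.0, 10.0, 74.0, 75.0)]
outlier_buckets = [assign_bucket(v, outlier_splits) for v in (2.0, 50.0, 99.0)]
```

In this sketch, values below the 10th percentile land in bucket 0, values from the 10th up to just below the 75th in bucket 1, and the rest in bucket 2, matching the description of percentile_split_points above.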

fit(data)

Fit where split points are based on the input data.

Parameters: data (np.ndarray) – values used to learn where split points exist.
Return type: Discretiser
Returns: self
Raises: RuntimeError – If an attempt to fit fixed numeric_split_points is made.
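
As an illustration of what learning split points can look like, the following standard-library sketch derives quantile split points from data and refuses to fit when the split points are fixed, mirroring the RuntimeError described above. learn_split_points is a hypothetical helper, and the use of statistics.quantiles with method="inclusive" is an assumption about the interpolation, not a description of causalnex internals.

```python
from bisect import bisect_right
from statistics import quantiles
from typing import List, Sequence


def learn_split_points(method: str, data: Sequence[float], num_buckets: int) -> List[float]:
    """Hypothetical stand-in for fit(): derive split points from the data."""
    if method == "fixed":
        # Fixed split points are supplied up front, so there is nothing to learn.
        raise RuntimeError("cannot fit fixed numeric_split_points")
    if method == "quantile":
        # num_buckets buckets need num_buckets - 1 cut points.
        return quantiles(data, n=num_buckets, method="inclusive")
    raise ValueError(f"method {method!r} is not covered by this sketch")


training_data = [1, 2, 3, 4, 5, 6, 7, 8]
split_points = learn_split_points("quantile", training_data, num_buckets=4)

# Apply the learned split points back to the same data: two values per bucket.
buckets = [bisect_right(split_points, v) for v in training_data]
```

Once learned, the split points are plain numbers, so the same list can be reused to bucket unseen data in a consistent way.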

fit_transform(X, y=None, **fit_params)

Fit to data, then transform it.

Fits transformer to X and y with optional parameters fit_params and returns a transformed version of X.

Parameters

X (array-like of shape (n_samples, n_features)) – Input samples.
y (array-like of shape (n_samples,) or (n_samples, n_outputs), default=None) – Target values (None for unsupervised transformations).
**fit_params (dict) – Additional fit parameters.

Returns

X_new – Transformed array.

Return type

ndarray array of shape (n_samples, n_features_new)

get_metadata_routing()

Get metadata routing of this object.

Please check User Guide on how the routing mechanism works.

Returns: routing – A MetadataRequest encapsulating routing information.
Return type: MetadataRequest

get_params(deep=True)

Get parameters for this estimator.

Parameters: deep (bool, default=True) – If True, will return the parameters for this estimator and contained subobjects that are estimators.
Returns: params – Parameter names mapped to their values.
Return type: dict

set_fit_request(*, data: Union[bool, None, str] = '$UNCHANGED$') → causalnex.discretiser.discretiser.Discretiser

Request metadata passed to the fit method.

Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see User Guide on how the routing mechanism works.

The options for each parameter are:

True: metadata is requested, and passed to fit if provided. The request is ignored if metadata is not provided.
False: metadata is not requested and the meta-estimator will not pass it to fit.
None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

New in scikit-learn version 1.3.

Note

This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a pipeline.Pipeline. Otherwise it has no effect.

Parameters: data (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for data parameter in fit.
Returns: self – The updated object.
Return type: object

set_output(*, transform=None)

Set output container.

See the scikit-learn example plot_set_output.py (under auto_examples/miscellaneous) for an example on how to use the API.

Parameters

transform ({"default", "pandas"}, default=None) –

Configure output of transform and fit_transform.

"default": Default output format of a transformer
"pandas": DataFrame output
None: Transform configuration is unchanged

Returns

self – Estimator instance.

Return type

estimator instance

set_params(**params)

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.

Parameters: **params (dict) – Estimator parameters.
Returns: self – Estimator instance.
Return type: estimator instance

set_transform_request(*, data: Union[bool, None, str] = '$UNCHANGED$') → causalnex.discretiser.discretiser.Discretiser

Request metadata passed to the transform method.

Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see User Guide on how the routing mechanism works.

The options for each parameter are:

True: metadata is requested, and passed to transform if provided. The request is ignored if metadata is not provided.
False: metadata is not requested and the meta-estimator will not pass it to transform.
None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

New in scikit-learn version 1.3.

Note

This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a pipeline.Pipeline. Otherwise it has no effect.

Parameters: data (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for data parameter in transform.
Returns: self – The updated object.
Return type: object

transform(data)

Transform the input data into discretised digits, based on the numeric_split_points that were either learned through using fit(), or from initialisation if method="fixed".

Parameters: data (np.ndarray) – values that will be transformed into discretised digits.
Return type: ndarray
Returns: input data transformed into discretised digits.
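
The bucket assignment for fixed split points can be approximated with the standard library. This sketch assumes that a value equal to a split point lands in the higher bucket, which is consistent with the class-level example where an age of 18 maps to bucket 2. to_buckets is a hypothetical helper and not the library's implementation.

```python
from bisect import bisect_right
from typing import Iterable, List


def to_buckets(values: Iterable[float], split_points: List[float]) -> List[int]:
    """Map each value to the count of split points that are <= the value."""
    return [bisect_right(split_points, v) for v in values]


ages = [7, 12, 13, 18, 19, 22, 60]
age_buckets = to_buckets(ages, split_points=[11, 18, 50])

# Split points [10, 21] with whole-number inputs:
# below 10 -> bucket 0, 10 to 20 -> bucket 1, above 20 -> bucket 2.
small_example = to_buckets([9, 10, 20, 21], split_points=[10, 21])
```

With n split points the result always uses bucket indices 0 to n, so three split points give at most four distinct buckets.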