causalnex.discretiser.Discretiser
class causalnex.discretiser.Discretiser(method='uniform', num_buckets=None, outlier_percentile=None, numeric_split_points=None, percentile_split_points=None)
Bases: sklearn.base.BaseEstimator, sklearn.base.TransformerMixin
Allows the discretisation of numeric data.
Example:
>>> import pandas as pd
>>> from causalnex.discretiser import Discretiser
>>> df = pd.DataFrame({'Age': [7, 12, 13, 18, 19, 22, 60]})
>>> df["Transformed_Age"] = Discretiser(method="fixed", numeric_split_points=[11, 18, 50]).transform(df["Age"])
>>> df.to_dict()
{'Age': {0: 7, 1: 12, 2: 13, 3: 18, 4: 19, 5: 22, 6: 60}, 'Transformed_Age': {0: 0, 1: 1, 2: 1, 3: 2, 4: 2, 5: 2, 6: 3}}
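The bucket numbers in the example output can be checked by hand. The sketch below is plain Python, not causalnex code. It assumes, consistently with the output shown, that each value is assigned the number of split points less than or equal to it, which is exactly what the standard library's `bisect.bisect_right` computes for a sorted list. The helper name `bucket_of` is hypothetical.

```python
from bisect import bisect_right


def bucket_of(value, split_points):
    """Hypothetical helper (not part of causalnex).

    The bucket index is the number of split points <= value, so a value equal
    to a split point lands in the higher bucket.
    """
    return bisect_right(split_points, value)


ages = [7, 12, 13, 18, 19, 22, 60]
split_points = [11, 18, 50]  # the split points used in the example

buckets = [bucket_of(age, split_points) for age in ages]
print(buckets)  # [0, 1, 1, 2, 2, 2, 3]
```

Note that 18, which equals a split point, goes into bucket 2 rather than bucket 1.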
Methods
- Discretiser.__init__([method, num_buckets, …]): Create a new Discretiser that provides fit, fit_transform, and transform functions to discretise data.
- Discretiser.fit(data): Fit the discretiser, learning split points from the input data.
- Discretiser.fit_transform(X[, y]): Fit to data, then transform it.
- Discretiser.get_params([deep]): Get parameters for this estimator.
- Discretiser.set_params(**params): Set the parameters of this estimator.
- Discretiser.transform(data): Transform the input data into discretised digits, based on the numeric_split_points that were either learned through fit() or supplied at initialisation when method="fixed".
__init__(method='uniform', num_buckets=None, outlier_percentile=None, numeric_split_points=None, percentile_split_points=None)
Creates a new Discretiser that provides fit, fit_transform, and transform functions to discretise data.
Parameters:
- method (str) – one of:
  - uniform: discretise data into uniformly spaced buckets. Complete uniformity cannot be guaranteed in all circumstances; for example, if 5 data points are split into 2 buckets, one bucket will contain 2 points and the other 3. Requires num_buckets.
  - quantile: discretise data according to the distribution of values. For example, num_buckets=4 discretises data into 4 buckets: the [0-25th, 25th-50th, 50th-75th, 75th-100th] percentiles. Requires num_buckets.
  - outlier: discretise data into 3 buckets, [low_outliers, normal, high_outliers], where low outliers lie below the outlier_percentile percentile and high outliers lie above the 1 - outlier_percentile percentile. Requires outlier_percentile.
  - fixed: discretise according to pre-defined split points. Requires numeric_split_points.
  - percentiles: discretise data at user-specified percentiles of the data. Requires percentile_split_points.
- num_buckets (Optional[int]) – used by method="uniform" and method="quantile".
- outlier_percentile (Optional[float]) – used by method="outlier".
- numeric_split_points (Optional[List[float]]) – used by method="fixed". To split such that values below 10 go into bucket 0, values from 10 to 20 go into bucket 1, and values above 20 go into bucket 2, provide [10, 21]. Split point values are non-inclusive upper bounds: a value equal to a split point is placed in the higher bucket, which is why 10 falls into bucket 1 and 21 (not 20) marks the start of bucket 2.
- percentile_split_points (Optional[List[float]]) – used by method="percentiles". To split such that values below the 10th percentile go into bucket 0, values from the 10th to below the 75th percentile go into bucket 1, and values at the 75th percentile and above go into bucket 2, provide [0.1, 0.75].
Raises: ValueError – if an incorrect argument is passed.
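To make the data-driven methods concrete, the following sketch computes split points the way the method descriptions above suggest. It is an illustration only, not causalnex's implementation: it assumes linearly interpolated quantiles (the same convention as NumPy's default `np.quantile`), and the helpers `quantile`, `quantile_split_points`, `outlier_split_points`, and `percentiles_split_points` are hypothetical names. The "uniform" method is omitted because its exact rule is not specified here.

```python
import math
from bisect import bisect_right


def quantile(values, q):
    """Linearly interpolated q-quantile (0 <= q <= 1) of a non-empty sequence."""
    xs = sorted(values)
    pos = q * (len(xs) - 1)  # fractional index into the sorted data
    lo = math.floor(pos)
    hi = min(lo + 1, len(xs) - 1)
    return xs[lo] + (xs[hi] - xs[lo]) * (pos - lo)


def quantile_split_points(values, num_buckets):
    """Like method="quantile": num_buckets buckets need num_buckets - 1 split points."""
    return [quantile(values, k / num_buckets) for k in range(1, num_buckets)]


def outlier_split_points(values, outlier_percentile):
    """Like method="outlier": split at outlier_percentile and at 1 - outlier_percentile."""
    return [quantile(values, outlier_percentile),
            quantile(values, 1 - outlier_percentile)]


def percentiles_split_points(values, levels):
    """Like method="percentiles": one split point per requested level (a fraction)."""
    return [quantile(values, p) for p in levels]


data = [0, 10, 20, 30, 40, 50, 60, 70, 80, 90, 100]

q_points = quantile_split_points(data, 4)               # [25.0, 50.0, 75.0]
o_points = outlier_split_points(data, 0.1)              # [10.0, 90.0]
p_points = percentiles_split_points(data, [0.1, 0.75])  # [10.0, 75.0]

# Assign buckets by counting split points <= value (the rule visible in the
# class example's output): below the 10th percentile -> 0,
# 10th to below 75th -> 1, 75th and above -> 2.
labels = [bisect_right(p_points, x) for x in data]
print(labels)  # [0, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2]
```

Because every method ultimately produces a sorted list of split points, the same bucketing step serves all of them; only the way the split points are chosen differs.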
fit(data)
Fits the discretiser by learning split points from the input data.
Parameters: data (np.ndarray) – values used to learn where the split points lie.
Returns: self
Return type: Discretiser
Raises: RuntimeError – if fit is called on a Discretiser created with method="fixed", whose numeric_split_points are fixed at initialisation.
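The fit/transform contract described here (fit learns split points and returns self, fit refuses method="fixed", transform applies the split points) can be sketched with a small stand-in class. `SketchDiscretiser` is hypothetical and is not causalnex's code; it only covers method="fixed" and method="quantile", and for "quantile" it assumes linearly interpolated split points via the standard library's `statistics.quantiles(..., method="inclusive")`, which may differ from causalnex's own rule.

```python
import statistics
from bisect import bisect_right


class SketchDiscretiser:
    """Hypothetical stand-in illustrating the documented fit/transform contract.

    Only method="fixed" and method="quantile" are sketched.
    """

    def __init__(self, method="quantile", num_buckets=None, numeric_split_points=None):
        self.method = method
        self.num_buckets = num_buckets
        # For method="fixed" the split points are supplied up front.
        self.numeric_split_points = numeric_split_points

    def fit(self, data):
        if self.method == "fixed":
            # Mirrors the documented behaviour: fixed split points cannot be fitted.
            raise RuntimeError("cannot fit a discretiser created with method='fixed'")
        # num_buckets buckets -> num_buckets - 1 split points, linearly
        # interpolated between sorted data points.
        self.numeric_split_points = statistics.quantiles(
            data, n=self.num_buckets, method="inclusive"
        )
        return self  # returning self enables fit(data).transform(data)

    def transform(self, data):
        # Bucket index = number of split points less than or equal to the value.
        return [bisect_right(self.numeric_split_points, x) for x in data]


data = [0, 10, 20, 30, 40, 50, 60, 70, 80, 90, 100]
disc = SketchDiscretiser(method="quantile", num_buckets=4).fit(data)
print(disc.numeric_split_points)  # [25.0, 50.0, 75.0]
print(disc.transform(data))       # [0, 0, 0, 1, 1, 2, 2, 2, 3, 3, 3]

try:
    SketchDiscretiser(method="fixed", numeric_split_points=[10, 21]).fit(data)
except RuntimeError as err:
    print("RuntimeError:", err)
```

A fixed-method instance is used directly, without fit: `SketchDiscretiser(method="fixed", numeric_split_points=[10, 21]).transform([9, 10, 20, 21])` gives `[0, 1, 1, 2]`, matching the [10, 21] convention in the parameter description.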
fit_transform(X, y=None, **fit_params)
Fit to data, then transform it.
Fits the transformer to X and y with optional parameters fit_params and returns a transformed version of X.
Parameters:
- X (numpy array of shape [n_samples, n_features]) – Training set.
- y (numpy array of shape [n_samples]) – Target values.
Returns: X_new – Transformed array.
Return type: numpy array of shape [n_samples, n_features_new]
get_params(deep=True)
Get parameters for this estimator.
Parameters: deep (boolean, optional) – if True, return the parameters for this estimator and for contained subobjects that are estimators.
Returns: params – parameter names mapped to their values.
Return type: mapping of string to any
set_params(**params)
Set the parameters of this estimator.
The method works on simple estimators as well as on nested objects (such as pipelines). The latter have parameters of the form <component>__<parameter> so that it's possible to update each component of a nested object.
Returns: self
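The `<component>__<parameter>` convention can be illustrated without scikit-learn. The toy classes below (`ToyStep`, `ToyPipeline`) are hypothetical and only mimic the idea: `get_params(deep=True)` flattens nested parameters into `component__parameter` keys, and `set_params` splits such keys on the first double underscore to reach the right component. scikit-learn's real implementation in `sklearn.base.BaseEstimator` is more thorough (for example, it validates parameter names).

```python
class ToyStep:
    """Hypothetical leaf estimator with two plain parameters."""

    def __init__(self, method="uniform", num_buckets=None):
        self.method = method
        self.num_buckets = num_buckets

    def get_params(self, deep=True):
        return {"method": self.method, "num_buckets": self.num_buckets}

    def set_params(self, **params):
        for key, value in params.items():
            setattr(self, key, value)
        return self


class ToyPipeline:
    """Hypothetical container whose components are addressed by name."""

    def __init__(self, steps):
        self.steps = dict(steps)

    def get_params(self, deep=True):
        params = {"steps": self.steps}
        if deep:
            # Flatten each component's parameters as "<component>__<parameter>".
            for name, step in self.steps.items():
                for key, value in step.get_params(deep=True).items():
                    params[f"{name}__{key}"] = value
        return params

    def set_params(self, **params):
        for key, value in params.items():
            # Split on the first "__": the prefix names the component.
            name, sep, sub_key = key.partition("__")
            if sep:
                self.steps[name].set_params(**{sub_key: value})
            else:
                setattr(self, key, value)
        return self


pipe = ToyPipeline([("disc", ToyStep())])
pipe.set_params(disc__method="quantile", disc__num_buckets=4)
print(pipe.get_params()["disc__method"])       # quantile
print(pipe.get_params()["disc__num_buckets"])  # 4
```

Returning self from set_params is what allows calls such as `pipe.set_params(...).get_params()` to be chained.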
transform(data)
Transforms the input data into discretised digits (integer bucket indices), based on the numeric_split_points that were either learned through fit() or supplied at initialisation when method="fixed".
Parameters: data (np.ndarray) – values to transform into discretised digits.
Returns: the input data transformed into discretised digits.
Return type: ndarray