causalnex.structure.DAGRegressor

class causalnex.structure.DAGRegressor(dist_type_schema=None, alpha=0.0, beta=0.0, fit_intercept=True, hidden_layer_units=None, threshold=0.0, tabu_edges=None, tabu_parent_nodes=None, tabu_child_nodes=None, dependent_target=True, enforce_dag=False, standardize=False, target_dist_type=None, notears_mlp_kwargs=None)[source]

Bases: sklearn.base.RegressorMixin, causalnex.structure.pytorch.sklearn._base.DAGBase

Regressor wrapper of the StructureModel. Implements the sklearn .fit and .predict interface.

Example:

 from causalnex.structure import DAGRegressor

 # X_train, y_train, and X_test are assumed to be existing arrays or DataFrames.
 reg = DAGRegressor(threshold=0.1)
 reg.fit(X_train, y_train)

 y_preds = reg.predict(X_test)
 type(y_preds)  # np.ndarray

 type(reg.feature_importances_)  # np.ndarray

feature_importances_

An array of edge weights corresponding positionally to the feature X.

Type

np.ndarray

coef_

An array of edge weights corresponding positionally to the feature X.

Type

np.ndarray

intercept_

The target node bias value.

Type

float

Attributes

DAGRegressor.coef_

Signed relationship between features and the target.

DAGRegressor.feature_importances_

Unsigned importances of the features with respect to the target.

DAGRegressor.intercept_

The bias term from the target node

Methods

DAGRegressor.__delattr__(name, /)

Implement delattr(self, name).

DAGRegressor.__dir__()

default dir() implementation

DAGRegressor.__eq__(value, /)

Return self==value.

DAGRegressor.__format__

default object formatter

DAGRegressor.__ge__(value, /)

Return self>=value.

DAGRegressor.__getattribute__(name, /)

Return getattr(self, name).

DAGRegressor.__getstate__()

DAGRegressor.__gt__(value, /)

Return self>value.

DAGRegressor.__hash__()

Return hash(self).

DAGRegressor.__init__([dist_type_schema, …])

Initialize the estimator with the given parameters.

DAGRegressor.__init_subclass__

This method is called when a class is subclassed.

DAGRegressor.__le__(value, /)

Return self<=value.

DAGRegressor.__lt__(value, /)

Return self<value.

DAGRegressor.__ne__(value, /)

Return self!=value.

DAGRegressor.__new__(**kwargs)

Create and return a new object.

DAGRegressor.__reduce__

helper for pickle

DAGRegressor.__reduce_ex__

helper for pickle

DAGRegressor.__repr__([N_CHAR_MAX])

Return repr(self).

DAGRegressor.__setattr__(name, value, /)

Implement setattr(self, name, value).

DAGRegressor.__setstate__(state)

DAGRegressor.__sizeof__()

size of object in memory, in bytes

DAGRegressor.__str__()

Return str(self).

DAGRegressor.__subclasshook__

Abstract classes can override this to customize issubclass().

DAGRegressor._check_n_features(X, reset)

Set the n_features_in_ attribute, or check against it.

DAGRegressor._get_param_names()

Get parameter names for the estimator

DAGRegressor._get_tags()

DAGRegressor._more_tags()

DAGRegressor._repr_html_inner()

This function is returned by the @property _repr_html_ to make hasattr(estimator, "_repr_html_") return True or False depending on get_config()["display"].

DAGRegressor._repr_mimebundle_(**kwargs)

Mime bundle used by jupyter kernels to display estimator

DAGRegressor._validate_data(X[, y, reset, …])

Validate input data and set or check the n_features_in_ attribute.

DAGRegressor.fit(X, y)

Fits the structure model (sm) using the concatenation of X and y.

DAGRegressor.get_edges_to_node(name[, data])

Get the edges to a specific node.

DAGRegressor.get_params([deep])

Get parameters for this estimator.

DAGRegressor.plot_dag([enforce_dag, …])

Plot the DAG of the fitted model.

DAGRegressor.predict(X)

Uses the fitted NOTEARS algorithm to reconstruct y from known X data.

DAGRegressor.score(X, y[, sample_weight])

Return the coefficient of determination \(R^2\) of the prediction.

DAGRegressor.set_params(**params)

Set the parameters of this estimator.

__init__(dist_type_schema=None, alpha=0.0, beta=0.0, fit_intercept=True, hidden_layer_units=None, threshold=0.0, tabu_edges=None, tabu_parent_nodes=None, tabu_child_nodes=None, dependent_target=True, enforce_dag=False, standardize=False, target_dist_type=None, notears_mlp_kwargs=None)
Parameters
  • dist_type_schema (Optional[Dict[Union[str, int], str]]) – The dist type schema corresponding to the X data passed to fit or predict. It maps the pandas column name in X to the string alias of a dist type. If X is a np.ndarray, it maps the positional index to the string alias of a dist type. A list of alias names can be found in dist_type/__init__.py. If None, assumes that all data in X is continuous.

  • alpha (float) – l1 loss weighting. When using nonlinear layers this is only applied to the first layer.

  • beta (float) – l2 loss weighting. Applied across all layers. Recommended when fitting nonlinearities.

  • fit_intercept (bool) – Whether to fit an intercept in the structure model equation. Use this if variables are offset.

  • hidden_layer_units (Optional[Iterable[int]]) – An iterable whose length determines the number of layers used, and whose values determine the number of nodes in each layer, in order.

  • threshold (float) – The thresholding to apply to the DAG weights. If 0.0, does not apply any threshold.

  • tabu_edges (Optional[List]) – Tabu edges passed directly to the NOTEARS algorithm.

  • tabu_parent_nodes (Optional[List]) – Tabu nodes passed directly to the NOTEARS algorithm.

  • tabu_child_nodes (Optional[List]) – Tabu nodes passed directly to the NOTEARS algorithm.

  • dependent_target (bool) – If True, constrains NOTEARS so that y can only be dependent.

  • enforce_dag (bool) – If True, thresholds the graph until it is a DAG. NOTE: a properly trained model should be a DAG, and failure indicates other issues. Use of this is only recommended if features have similar units; otherwise comparing edge weight magnitude has limited meaning.

  • standardize (bool) – Whether to standardize the X and y variables before fitting. The L-BFGS algorithm used to fit the underlying NOTEARS works best on data all of the same scale, so this parameter is recommended.

  • notears_mlp_kwargs (Optional[Dict]) – Additional arguments for the NOTEARS MLP model.

  • target_dist_type (Optional[str]) – The distribution type of the target. Uses the same aliases as dist_type_schema.

Raises
  • TypeError – if alpha is not numeric.

  • TypeError – if beta is not numeric.

  • TypeError – if fit_intercept is not a bool.

  • TypeError – if threshold is not numeric.

  • NotImplementedError – if target_dist_type not in supported_types
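
The threshold and enforce_dag parameters above both prune edges by weight magnitude. The following is a minimal, standard-library sketch of that idea, not the causalnex implementation: the helper names has_cycle and threshold_until_dag, the edge-dictionary representation, and the example weights are all hypothetical, and the exact comparison used by the library may differ.

```python
from typing import Dict, List, Tuple

Edge = Tuple[str, str]  # (parent, child)


def has_cycle(nodes: List[str], edges: List[Edge]) -> bool:
    """Return True if the directed graph contains a cycle (Kahn's algorithm)."""
    indegree = {n: 0 for n in nodes}
    children: Dict[str, List[str]] = {n: [] for n in nodes}
    for parent, child in edges:
        children[parent].append(child)
        indegree[child] += 1
    queue = [n for n in nodes if indegree[n] == 0]
    visited = 0
    while queue:
        node = queue.pop()
        visited += 1
        for child in children[node]:
            indegree[child] -= 1
            if indegree[child] == 0:
                queue.append(child)
    # Nodes that never reach in-degree zero sit on, or downstream of, a cycle.
    return visited < len(nodes)


def threshold_until_dag(
    weights: Dict[Edge, float], threshold: float = 0.0
) -> Dict[Edge, float]:
    """Drop edges with |weight| <= threshold, then remove the weakest
    remaining edge repeatedly until the graph is acyclic."""
    nodes = sorted({n for edge in weights for n in edge})
    kept = {e: w for e, w in weights.items() if abs(w) > threshold}
    while has_cycle(nodes, list(kept)):
        weakest = min(kept, key=lambda e: abs(kept[e]))
        del kept[weakest]
    return kept


# Hypothetical learned weights containing the cycle a -> b -> c -> a.
weights = {("a", "b"): 0.9, ("b", "c"): 0.5, ("c", "a"): 0.05, ("a", "y"): 0.3}
dag = threshold_until_dag(weights)
```

In this sketch the weakest edge (c, a) is removed to break the cycle. Because edges are compared by raw magnitude, the result is only meaningful when features share similar units, which matches the caveat on enforce_dag.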

property coef_

Signed relationship between features and the target. In the linear case this is equivalent to linear regression coefficients.

Return type

ndarray

Returns

The mean effect relationship between nodes.

property feature_importances_

Unsigned importances of the features with respect to the target. NOTE: these are used as the graph adjacency matrix.

Return type

ndarray

Returns

The L2 relationship between nodes.

fit(X, y)[source]

Fits the structure model (sm) using the concatenation of X and y.

Raises

NotImplementedError – if an unsupported target_dist_type is provided.

Return type

DAGRegressor

Returns

Instance of DAGRegressor.

get_edges_to_node(name, data='weight')

Get the edges to a specific node.

Parameters
  • name (str) – The name of the node which to get weights towards.

  • data (str) – The edge parameter to get. Default is "weight" to return the adjacency matrix. Set to "mean_effect" to return the signed average effect of features on the target node.

Return type

Series

Returns

The specified edge data.

get_params(deep=True)

Get parameters for this estimator.

Parameters

deep (bool, default=True) – If True, will return the parameters for this estimator and contained subobjects that are estimators.

Returns

params – Parameter names mapped to their values.

Return type

dict

property intercept_

The bias term from the target node

Return type

float

plot_dag(enforce_dag=False, plot_structure_kwargs=None, use_mpl=True, ax=None, pixel_size_in=0.01)

Plot the DAG of the fitted model.

Parameters
  • enforce_dag (bool) – Whether to threshold the model until it is a DAG. Does not alter the underlying model.

  • ax (Optional[Axes]) – Matplotlib axes to plot the model on. If None, creates axis.

  • pixel_size_in (float) – Scaling multiple for the plot.

  • plot_structure_kwargs (Optional[Dict]) – Dictionary of kwargs for the causalnex plotting module.

  • use_mpl (bool) – Whether to use matplotlib as the backend. If False, ax and pixel_size_in are ignored.

Return type

Union[Tuple[Figure, Axes], Image]

Returns

Plot of the DAG.

predict(X)

Uses the fitted NOTEARS algorithm to reconstruct y from known X data.

Return type

ndarray

Returns

Predicted y values for each row of X.

score(X, y, sample_weight=None)

Return the coefficient of determination \(R^2\) of the prediction.

The coefficient \(R^2\) is defined as \((1 - \frac{u}{v})\), where \(u\) is the residual sum of squares ((y_true - y_pred) ** 2).sum() and \(v\) is the total sum of squares ((y_true - y_true.mean()) ** 2).sum(). The best possible score is 1.0 and it can be negative (because the model can be arbitrarily worse). A constant model that always predicts the expected value of y, disregarding the input features, would get a \(R^2\) score of 0.0.

Parameters
  • X (array-like of shape (n_samples, n_features)) – Test samples. For some estimators this may be a precomputed kernel matrix or a list of generic objects instead with shape (n_samples, n_samples_fitted), where n_samples_fitted is the number of samples used in the fitting for the estimator.

  • y (array-like of shape (n_samples,) or (n_samples, n_outputs)) – True values for X.

  • sample_weight (array-like of shape (n_samples,), default=None) – Sample weights.

Returns

score – \(R^2\) of self.predict(X) with respect to y.

Return type

float

Notes

The \(R^2\) score used when calling score on a regressor uses multioutput='uniform_average' from version 0.23 to keep consistent with default value of r2_score(). This influences the score method of all the multioutput regressors (except for MultiOutputRegressor).
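
The definition of \(R^2\) given for score can be checked by hand with a small sketch. The function name r2_score_sketch and the sample values are illustrative only; the actual method delegates to scikit-learn and also supports sample weights and multiple outputs, which this sketch omits.

```python
from typing import Sequence


def r2_score_sketch(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Compute R^2 = 1 - u / v for a single output, without sample weights."""
    mean_true = sum(y_true) / len(y_true)
    # u: residual sum of squares.
    u = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    # v: total sum of squares (zero when y_true is constant; not handled here).
    v = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - u / v


y_true = [1.0, 2.0, 3.0, 4.0]
perfect = r2_score_sketch(y_true, y_true)                # exact predictions
constant = r2_score_sketch(y_true, [2.5] * 4)            # always predicts the mean
reversed_ = r2_score_sketch(y_true, [4.0, 3.0, 2.0, 1.0])  # worse than the mean
```

The three cases give 1.0, 0.0, and -3.0 respectively, illustrating the best possible score, the constant-model baseline, and a negative score for a model that does worse than predicting the mean.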

set_params(**params)

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.

Parameters

**params (dict) – Estimator parameters.

Returns

self – Estimator instance.

Return type

estimator instance
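
The <component>__<parameter> naming described for set_params can be illustrated with a short sketch of how such keys separate into an estimator's own parameters and those of nested components. The function split_nested_params and the component name "dag" are hypothetical; this is not scikit-learn's code, only an illustration of the naming convention.

```python
from collections import defaultdict
from typing import Any, Dict, Tuple


def split_nested_params(
    params: Dict[str, Any],
) -> Tuple[Dict[str, Any], Dict[str, Dict[str, Any]]]:
    """Separate plain parameters from <component>__<parameter> ones."""
    own: Dict[str, Any] = {}
    nested: Dict[str, Dict[str, Any]] = defaultdict(dict)
    for key, value in params.items():
        # Split on the first double underscore only, so deeper nesting
        # (a__b__c) stays intact in the remainder for the component to handle.
        name, delim, sub_key = key.partition("__")
        if delim:
            nested[name][sub_key] = value
        else:
            own[name] = value
    return own, dict(nested)


# "dag" is a hypothetical step name, e.g. in a Pipeline wrapping a DAGRegressor.
own, nested = split_nested_params(
    {"threshold": 0.1, "dag__alpha": 0.01, "dag__beta": 0.5}
)
```

Here own holds {"threshold": 0.1}, while nested groups alpha and beta under "dag", mirroring how a nested object would receive its own parameters.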