causalnex.structure.DAGClassifier

class causalnex.structure.DAGClassifier(dist_type_schema=None, alpha=0.0, beta=0.0, fit_intercept=True, hidden_layer_units=None, threshold=0.0, tabu_edges=None, tabu_parent_nodes=None, tabu_child_nodes=None, dependent_target=True, enforce_dag=False, standardize=False, target_dist_type=None, notears_mlp_kwargs=None)[source]

Bases: sklearn.base.ClassifierMixin, causalnex.structure.pytorch.sklearn._base.DAGBase

Classifier wrapper of the StructureModel. Implements the sklearn .fit and .predict interface.

Example:

>>> from causalnex.structure import DAGClassifier
>>> clf = DAGClassifier(threshold=0.1)
>>> clf.fit(X_train, y_train)

>>> y_preds = clf.predict(X_test)
>>> type(y_preds)
np.ndarray

>>> type(clf.feature_importances_)
np.ndarray
feature_importances_

An array of edge weights corresponding positionally to the feature X.

Type

np.ndarray

coef_

An array of edge weights corresponding positionally to the feature X.

Type

np.ndarray

intercept_

The target node bias value.

Type

float

Attributes

DAGClassifier.coef_

Signed relationship between features and the target. For the linear case, this is equivalent to linear regression coefficients. Returns an ndarray with the mean effect relationship between nodes, shape (1, n_features) or (n_classes, n_features).

DAGClassifier.feature_importances_

Unsigned importances of the features with respect to the target. NOTE: these are used as the graph adjacency matrix. Returns an ndarray with the L2 relationship between nodes, shape (1, n_features) or (n_classes, n_features).

DAGClassifier.intercept_

Returns: The bias term from the target node.

Methods

DAGClassifier.fit(X, y)

Fits the structure model using the concatenation of X and y.

DAGClassifier.get_edges_to_node(name[, data])

Get the edges to a specific node. By default returns the adjacency-matrix weight of each edge; set data to “mean_effect” to return the signed average effect of features on the target node.

DAGClassifier.get_params([deep])

Get parameters for this estimator.

DAGClassifier.plot_dag([enforce_dag, …])

Plot the DAG of the fitted model.

DAGClassifier.predict(X)

Uses the fitted NOTEARS algorithm to reconstruct y from known X data.

DAGClassifier.predict_proba(X)

Uses the fitted NOTEARS algorithm to reconstruct y from known X data.

DAGClassifier.score(X, y[, sample_weight])

Return the mean accuracy on the given test data and labels.

DAGClassifier.set_params(**params)

Set the parameters of this estimator.

__init__(dist_type_schema=None, alpha=0.0, beta=0.0, fit_intercept=True, hidden_layer_units=None, threshold=0.0, tabu_edges=None, tabu_parent_nodes=None, tabu_child_nodes=None, dependent_target=True, enforce_dag=False, standardize=False, target_dist_type=None, notears_mlp_kwargs=None)
Parameters
  • dist_type_schema (Optional[Dict[Union[str, int], str]]) – The dist type schema corresponding to the X data passed to fit or predict. It maps the pandas column name in X to the string alias of a dist type. If X is a np.ndarray, it maps the positional index to the string alias of a dist type. A list of alias names can be found in dist_type/__init__.py. If None, assumes that all data in X is continuous.

  • alpha (float) – l1 loss weighting. When using nonlinear layers this is only applied to the first layer.

  • beta (float) – l2 loss weighting. Applied across all layers. Recommended when fitting nonlinearities.

  • fit_intercept (bool) – Whether to fit an intercept in the structure model equation. Use this if variables are offset.

  • hidden_layer_units (Optional[Iterable[int]]) – An iterable whose length determines the number of layers used, and whose values determine the number of nodes used for each layer in order.

  • threshold (float) – The thresholding to apply to the DAG weights. If 0.0, does not apply any threshold.

  • tabu_edges (Optional[List]) – Tabu edges passed directly to the NOTEARS algorithm.

  • tabu_parent_nodes (Optional[List]) – Tabu nodes passed directly to the NOTEARS algorithm.

  • tabu_child_nodes (Optional[List]) – Tabu nodes passed directly to the NOTEARS algorithm.

  • dependent_target (bool) – If True, constrains NOTEARS so that y can only be dependent.

  • enforce_dag (bool) – If True, thresholds the graph until it is a DAG. NOTE: a properly trained model should be a DAG, and failure indicates other issues. Use of this is only recommended if features have similar units, otherwise comparing edge weight magnitude has limited meaning.

  • standardize (bool) – Whether to standardize the X and y variables before fitting. The L-BFGS algorithm used to fit the underlying NOTEARS works best on data all of the same scale, so this parameter is recommended.

  • notears_mlp_kwargs (Optional[Dict]) – Additional arguments for the NOTEARS MLP model.

  • target_dist_type (Optional[str]) – The distribution type of the target. Uses the same aliases as dist_type_schema.

Raises
  • TypeError – if alpha is not numeric.

  • TypeError – if beta is not numeric.

  • TypeError – if fit_intercept is not a bool.

  • TypeError – if threshold is not numeric.

  • NotImplementedError – if target_dist_type is not in supported_types.
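
To show how several of the parameters above fit together, here is a minimal construction sketch. The column names and the “cont”/“bin” dist type aliases are illustrative assumptions, not taken from this page; check dist_type/__init__.py for the actual list of aliases.

    import numpy as np
    import pandas as pd

    from causalnex.structure import DAGClassifier

    # Hypothetical feature frame: two continuous columns and one binary column.
    rng = np.random.default_rng(0)
    X = pd.DataFrame({
        "age": rng.normal(40, 10, 500),
        "income": rng.normal(50_000, 10_000, 500),
        "smoker": rng.binomial(1, 0.3, 500),
    })

    clf = DAGClassifier(
        # "cont"/"bin" are assumed aliases; see dist_type/__init__.py.
        dist_type_schema={"age": "cont", "income": "cont", "smoker": "bin"},
        hidden_layer_units=[8],  # one nonlinear hidden layer with 8 units
        alpha=0.01,              # l1 penalty, applied to the first layer only
        beta=0.1,                # l2 penalty, applied across all layers
        standardize=True,        # recommended for the underlying L-BFGS fit
        threshold=0.1,
    )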

property coef_

Signed relationship between features and the target. For the linear case, this is equivalent to linear regression coefficients.

Return type

ndarray

Returns

The mean effect relationship between nodes. shape: (1, n_features) or (n_classes, n_features).

property feature_importances_

Unsigned importances of the features with respect to the target. NOTE: these are used as the graph adjacency matrix.

Return type

ndarray

Returns

The L2 relationship between nodes. shape: (1, n_features) or (n_classes, n_features).
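
As a sketch of how the two properties differ in practice, assuming clf has already been fitted (as in the fit example further below):

    # Signed mean effects: the sign indicates the direction of the relationship.
    print(clf.coef_.shape)                 # (1, n_features) for a binary target

    # Unsigned L2 importances: these populate the graph adjacency matrix.
    print(clf.feature_importances_.shape)  # same shape as coef_
    assert (clf.feature_importances_ >= 0).all()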

fit(X, y)[source]

Fits the structure model using the concatenation of X and y.

Raises
  • NotImplementedError – If unsupported target_dist_type provided.

  • ValueError – If less than 2 classes provided.

Return type

DAGClassifier

Returns

Instance of DAGClassifier.
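
A usage sketch, continuing the hypothetical X and clf constructed earlier (the target name is made up):

    # Hypothetical binary target aligned with the X frame sketched earlier.
    y = pd.Series(rng.binomial(1, 0.5, len(X)), name="smokes_next_year")

    clf.fit(X, y)            # returns the fitted DAGClassifier instance
    y_pred = clf.predict(X)  # predicted class labels as an ndarray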

get_edges_to_node(name, data='weight')

Get the edges to a specific node.

Parameters
  • name (str) – The name of the node to get weights towards.

  • data (str) – The edge parameter to get. Default is “weight” to return the adjacency matrix. Set to “mean_effect” to return the signed average effect of features on the target node.

Return type

Series

Returns

The specified edge data.
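
A sketch of inspecting the learned edges into the target node. The node name used here assumes the target node takes the name of the y Series passed to fit; treat that naming as an assumption.

    # Adjacency-matrix weights of edges into the (assumed) target node.
    weights = clf.get_edges_to_node("smokes_next_year")

    # Signed average effect of each feature on the target.
    effects = clf.get_edges_to_node("smokes_next_year", data="mean_effect")

    print(weights.sort_values(ascending=False).head())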

get_params(deep=True)

Get parameters for this estimator.

Parameters

deep (bool, default=True) – If True, will return the parameters for this estimator and contained subobjects that are estimators.

Returns

params – Parameter names mapped to their values.

Return type

dict

property intercept_

Returns: The bias term from the target node. shape: (1,) or (n_classes,).

Return type

ndarray

plot_dag(enforce_dag=False, plot_structure_kwargs=None, use_mpl=True, ax=None, pixel_size_in=0.01)

Plot the DAG of the fitted model.

Parameters
  • enforce_dag (bool) – Whether to threshold the model until it is a DAG. Does not alter the underlying model.

  • plot_structure_kwargs (Optional[Dict]) – Dictionary of kwargs for the causalnex plotting module.

  • use_mpl (bool) – Whether to use matplotlib as the backend. If False, ax and pixel_size_in are ignored.

  • ax (Optional[Axes]) – Matplotlib axes to plot the model on. If None, creates axis.

  • pixel_size_in (float) – Scaling multiple for the plot.

Return type

Union[Tuple[Figure, Axes], Image]

Returns

Plot of the DAG.
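
A minimal plotting sketch using the matplotlib backend, assuming a fitted clf:

    import matplotlib.pyplot as plt

    fig, ax = plt.subplots(figsize=(8, 6))
    clf.plot_dag(enforce_dag=True, ax=ax)  # thresholds for display only
    plt.show()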

predict(X)[source]

Uses the fitted NOTEARS algorithm to reconstruct y from known X data.

Return type

ndarray

Returns

Predicted y values for each row of X.

predict_proba(X)[source]

Uses the fitted NOTEARS algorithm to reconstruct y from known X data.

Return type

ndarray

Returns

Predicted y class probabilities for each row of X.
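
A sketch of inspecting the predicted probabilities; the (n_samples, n_classes) shape follows the usual sklearn convention and is an assumption rather than something stated on this page.

    proba = clf.predict_proba(X)
    print(proba.shape)           # expected: (n_samples, n_classes)
    print(proba[:3].round(3))    # each row should sum to approximately 1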

score(X, y, sample_weight=None)

Return the mean accuracy on the given test data and labels.

In multi-label classification, this is the subset accuracy which is a harsh metric since you require for each sample that each label set be correctly predicted.

Parameters
  • X (array-like of shape (n_samples, n_features)) – Test samples.

  • y (array-like of shape (n_samples,) or (n_samples, n_outputs)) – True labels for X.

  • sample_weight (array-like of shape (n_samples,), default=None) – Sample weights.

Returns

score – Mean accuracy of self.predict(X) wrt. y.

Return type

float

set_params(**params)

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.

Parameters

**params (dict) – Estimator parameters.

Returns

self – Estimator instance.

Return type

estimator instance