Models

Base

class bsix.models.base.BaseSurvival[source]

Bases: BaseEstimator, ABC

Abstract base class for survival analysis models.

abstract calculate_xai(X, **kwargs)[source]: Calculate XAI values.

static dinamic_discretise(y, dataset, seed=0, plot=False)[source]: Discretise the data using a piecewise exponential model and show the result on a Kaplan-Meier plot.

static feature_selection(X, y)[source]: Calculate the best features based on p-value.

abstract fit(X, y, **kwargs)[source]: Fit the model.

static generate_simulated_survival_data(number_rows=1000, number_columns=10, censored=0.75, relation=None, seed=0)[source]: Generate simulated survival data.

static logrank_test(y, groups, weights=None)[source]: Calculate the log-rank test for n groups.

static plot_coefficients(coefficients, estimator_name, dataset, seed=None, progression=None)[source]: Plot XAI coefficients for the data (lollipop plot).

static plot_individual_shap(shap_explainer, identifier_index, index, scaler, estimator_name, dataset, seed=None, progression=None)[source]: Plot SHAP values for an individual instance (horizontal bar plot).

static plot_shap(shap_explainer, index, scaler, estimator_name, dataset, seed=None, progression=None)[source]: Plot SHAP values for the data (beeswarm plot).

abstract predict(X, **kwargs)[source]: Predict on X.

abstract predict_cumulative_hazard_function(X, **kwargs)[source]: H(x, t) = H₀(t) × exp(g(x)).

abstract predict_survival_function(X, **kwargs)[source]: S(x, t) = exp(-H(x, t)).

static to_time_dependent(dataframe, splits, identifier='identifier', time='time', event='event')[source]: Transform a DataFrame with a per-subject measurement into a time-dependent format.

static to_time_varying(dataframe, identifier='identifier', time='time', event='event')[source]: Transform a DataFrame with multiple measurements per subject into a start-stop format.
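
The start-stop layout mentioned for these two helpers can be pictured with a small standalone sketch. It does not call bsix: `split_episodes` and the `start`/`stop` column names are hypothetical, and only `identifier`, `time`, and `event` are taken from the signatures above. The sketch assumes every subject's follow-up starts at time 0 and is cut at each split point that falls strictly before the subject's observed time, with the event flag kept on the last interval only.

```python
def split_episodes(rows, splits):
    """Split one (time, event) row per subject into start-stop intervals.

    Hypothetical helper for illustration; not the bsix implementation.
    """
    out = []
    for row in rows:
        start = 0.0
        # Only cut points strictly inside (0, time) create a new interval.
        for cut in sorted(c for c in splits if 0.0 < c < row["time"]):
            out.append({"identifier": row["identifier"],
                        "start": start, "stop": cut, "event": 0})
            start = cut
        # The observed event (or censoring) belongs to the final interval.
        out.append({"identifier": row["identifier"],
                    "start": start, "stop": row["time"], "event": row["event"]})
    return out


rows = [
    {"identifier": "a", "time": 5.0, "event": 1},
    {"identifier": "b", "time": 2.0, "event": 0},
]
long_rows = split_episodes(rows, splits=[2.0, 4.0])
for r in long_rows:
    print(r)
```

Subject "a" is split into three intervals and subject "b" keeps a single one, because no split point lies strictly before its observed time.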

Methodologies

class bsix.models.AcceleratedFailureTime(type='WeibullAFT', penalizer=0.0, l1_ratio=0.0)[source]

Bases: BaseSurvival

Weibull Accelerated Failure Time model.

calculate_xai(X, index, scaler, dataset, seed, feature_names, background=False, plot=False)[source]: Calculate XAI values.

fit(X, y)[source]: Fit the model to the data.

predict(X)[source]: Predict risk scores for the given data.

predict_cumulative_hazard_function(X, index, dataset, seed, plot=False)[source]: H(x,t) = H₀(t) × exp(βᵀx).

predict_survival_function(X, index, dataset, seed, plot=False)[source]: S(x, t) = exp(-H(x, t)).

score(X, y)[source]: Calculate the score for the model.

class bsix.models.BaseCoxRegression(alpha=0.0, ties='breslow', n_iter=100)[source]

Bases: BaseSurvival

Cox Regression model.

calculate_xai(X, index, scaler, dataset, seed, feature_names, background=False, plot=False)[source]: Calculate XAI values.

fit(X, y)[source]: Fit the model to the data.

predict(X)[source]: Predict risk scores for the given data.

predict_cumulative_hazard_function(X, index, dataset, seed, plot=False)[source]: H(x,t) = H₀(t) × exp(βᵀx).

predict_survival_function(X, index, dataset, seed, plot=False)[source]: S(x, t) = exp(-H(x, t)).

score(X, y)[source]: Calculate the score for the model.

class bsix.models.BaseCoxRegressionWithTimeVarying(penalizer=0.0, l1_ratio=0.0, formula=None)[source]

Bases: BaseSurvival

Cox Regression with Time-Varying Covariates model.

calculate_xai(X, index, scaler, dataset, seed, feature_names, background=False, plot=False)[source]: Calculate XAI values.

fit(X, y)[source]: Fit the model to the data.

predict(X)[source]: Predict risk scores for the given data.

predict_cumulative_hazard_function(X, index, dataset, seed, plot=False)[source]: H(x,t) = H₀(t) × exp(βᵀx).

predict_survival_function(X, index, dataset, seed, plot=False)[source]: S(x, t) = exp(-H(x, t)).

score(X, y)[source]: Calculate the score for the model.

class bsix.models.BaseRandomSurvivalForest(seed, n_jobs=-1, n_estimators=100, max_depth=None, min_samples_leaf=3, min_samples_split=6)[source]

Bases: BaseSurvival

Random Survival Forest model.

calculate_xai(X, index, scaler, dataset, seed, feature_names, background=False, plot=False)[source]: Calculate XAI values.

fit(X, y)[source]: Fit the model to the data.

predict(X)[source]: Predict risk scores for the given data.

predict_cumulative_hazard_function(X, index, dataset, seed, plot=False)[source]: H(x,t) = H₀(t) × exp(βᵀx).

predict_survival_function(X, index, dataset, seed, plot=False)[source]: S(x, t) = exp(-H(x, t)).

score(X, y)[source]: Calculate the score for the model.

class bsix.models.BaseSurvivalTree(seed, max_depth=5, min_samples_split=2, min_samples_leaf=1)[source]

Bases: BaseSurvival

Survival Tree model.

calculate_xai(X, index, scaler, dataset, seed, feature_names, background=False, plot=False)[source]: Calculate XAI values.

fit(X, y)[source]: Fit the model to the data.

predict(X)[source]: Predict risk scores for the given data.

predict_cumulative_hazard_function(X, index, dataset, seed, plot=False)[source]: H(x,t) = H₀(t) × exp(βᵀx).

predict_survival_function(X, index, dataset, seed, plot=False)[source]: S(x, t) = exp(-H(x, t)).

score(X, y)[source]: Calculate the score for the model.

class bsix.models.CoxRegression(alpha=0.0, ties='breslow', n_iter=100)[source]

Bases: BaseSurvival

Cox Regression model.

Parameters:

alpha (float, default = 0.0) – Regularization strength.
ties (str, default = "breslow") – Method for handling tied event times. "breslow" or "efron".
n_iter (int, default = 100) – Number of iterations for the Newton-Raphson algorithm.
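
To make the `ties` option concrete, the sketch below compares the textbook Breslow and Efron contributions to the Cox partial log-likelihood at a single event time. It is a standalone illustration, not the code bsix runs: the function names are hypothetical, and the inputs are risk scores exp(βᵀx) supplied directly as numbers.

```python
import math


def breslow_log_term(tied_risks, risk_set_risks):
    """Breslow: every tied event shares the full risk-set denominator."""
    d = len(tied_risks)
    return sum(math.log(r) for r in tied_risks) - d * math.log(sum(risk_set_risks))


def efron_log_term(tied_risks, risk_set_risks):
    """Efron: the k-th tied event removes a fraction k/d of the tied risk."""
    d = len(tied_risks)
    total = sum(risk_set_risks)
    tied_total = sum(tied_risks)
    log_denominator = sum(
        math.log(total - (k / d) * tied_total) for k in range(d)
    )
    return sum(math.log(r) for r in tied_risks) - log_denominator


# exp(beta^T x) for the four subjects still at risk; the first two
# experience the event at the same time (assumed values).
risk_set = [2.0, 1.0, 1.0, 0.5]
tied = [2.0, 1.0]

breslow = breslow_log_term(tied, risk_set)
efron = efron_log_term(tied, risk_set)
print(breslow, efron)
```

With a single event at a given time the two terms coincide. They differ only when events are tied, where Efron's smaller denominators give a larger log-likelihood term.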

coef_

Estimated coefficients for the model.

Type:: array-like, shape (n_features,)

breslow

Breslow estimator for baseline hazards.

Type:: BreslowEstimator

survival_function

Estimated survival function.

Type:: array-like, shape (n_samples, n_times)

cumulative_hazard_function

Estimated cumulative hazard function.

Type:: array-like, shape (n_samples, n_times)

shap_explainer

SHAP explainer for model interpretability.

Type:: shap.Explainer

Examples

from bsix.models.metodologies import CoxRegression
model = CoxRegression(alpha=0.1, ties="efron", n_iter=200)
model.fit(X_train, y_train)

calculate_xai(X, index, scaler, dataset, seed, feature_names, background=False, plot=False)[source]

Calculate XAI values.

Parameters:

X (array-like, shape (n_samples, n_features)) – Input data.
index (array-like, shape (n_samples,)) – Index for the samples.
scaler (object) – Scaler used for the data.
dataset (str) – Name of the dataset.
seed (int) – Random seed for reproducibility.
feature_names (list of str) – Names of the features.
background (bool, default = False) – Whether to use background data for SHAP.
plot (bool, default = False) – Whether to plot the XAI values.

Returns:

shap_explainer (shap.Explainer) – SHAP explainer for model interpretability.
coefficients (dict) – Dictionary of feature coefficients sorted by absolute value.

fit(X, y)[source]

Fit the model to the data.

Parameters:

X (array-like, shape (n_samples, n_features)) – Training data.
y (structured array-like, shape (n_samples,)) – Target training values (events, times).

Returns:

self – Fitted estimator.

Return type:

CoxRegression

predict(X)[source]

Predict risk scores for the given data.

Parameters:: X (array-like, shape (n_samples, n_features)) – Input data.
Returns:: risk – Predicted risk scores.
Return type:: array-like, shape (n_samples,)

predict_cumulative_hazard_function(X, index, dataset, seed, plot=False)[source]

Predict the cumulative hazard function for the given data.

Parameters:

X (array-like, shape (n_samples, n_features)) – Input data.
index (array-like, shape (n_samples,)) – Index for the samples.
dataset (str) – Name of the dataset.
seed (int) – Random seed for reproducibility.
plot (bool, default = False) – Whether to plot the cumulative hazard function.

Returns:

cumulative_hazard_function – Predicted cumulative hazard function.

Return type:

array-like, shape (n_samples, n_times)

predict_survival_function(X, index, dataset, seed, plot=False)[source]

Predict the survival function for the given data.

Parameters:

X (array-like, shape (n_samples, n_features)) – Input data.
index (array-like, shape (n_samples,)) – Index for the samples.
dataset (str) – Name of the dataset.
seed (int) – Random seed for reproducibility.
plot (bool, default = False) – Whether to plot the survival function.

Returns:

survival_function – Predicted survival function.

Return type:

array-like, shape (n_samples, n_times)
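
The relations H(x, t) = H₀(t) × exp(βᵀx) and S(x, t) = exp(-H(x, t)) quoted in this reference can be traced by hand with the standard Breslow estimate of the baseline cumulative hazard. The sketch below is a plain-Python illustration under that assumption. The function names and toy data are hypothetical, and it does not reproduce how the `breslow` attribute is actually computed.

```python
import math


def breslow_baseline_cumulative_hazard(times, events, log_risks):
    """H0(t) accumulates d_i / sum of exp(g(x_j)) over subjects with T_j >= t_i."""
    event_times = sorted({t for t, e in zip(times, events) if e})
    cumulative, h0 = 0.0, []
    for t in event_times:
        deaths = sum(1 for ti, e in zip(times, events) if e and ti == t)
        at_risk = sum(math.exp(g) for ti, g in zip(times, log_risks) if ti >= t)
        cumulative += deaths / at_risk
        h0.append(cumulative)
    return event_times, h0


def survival_curve(h0, log_risk):
    """S(x, t) = exp(-H0(t) * exp(g(x))) evaluated at each event time."""
    return [math.exp(-h * math.exp(log_risk)) for h in h0]


# Toy data: four subjects, one censored at t = 2 (assumed values).
times = [1.0, 2.0, 2.0, 3.0]
events = [1, 1, 0, 1]
log_risks = [0.0, 0.0, 0.0, 0.0]  # g(x) = beta^T x for each subject

event_times, h0 = breslow_baseline_cumulative_hazard(times, events, log_risks)
low_risk = survival_curve(h0, log_risk=-1.0)
high_risk = survival_curve(h0, log_risk=1.0)
print(event_times, h0, low_risk, high_risk)
```

A larger linear predictor scales the cumulative hazard up and therefore pushes the whole survival curve down.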

class bsix.models.CoxRegressionWithTimeVarying(alpha=0.0, ties='breslow', n_iter=100)[source]

Bases: BaseSurvival

Cox Regression model with time-varying covariates.

Parameters:

alpha (float, default = 0.0) – Regularization strength.
ties (str, default = "breslow") – Method for handling tied event times. "breslow" or "efron".
n_iter (int, default = 100) – Number of iterations for the Newton-Raphson algorithm.

coef_

Estimated coefficients for the model.

Type:: array-like, shape (n_features,)

breslow

Breslow estimator for baseline hazards.

Type:: BreslowEstimator

survival_function

Estimated survival function.

Type:: array-like, shape (n_samples, n_times)

cumulative_hazard_function

Estimated cumulative hazard function.

Type:: array-like, shape (n_samples, n_times)

shap_explainer

SHAP explainer for model interpretability.

Type:: shap.Explainer

Examples

from bsix.models.metodologies import CoxRegressionWithTimeVarying
model = CoxRegressionWithTimeVarying(alpha=0.1, ties="efron", n_iter=200)
model.fit(X_train, y_train)

calculate_xai(X, index, scaler, dataset, seed, feature_names, background=False, plot=False)[source]

Calculate XAI values.

Parameters:

X (array-like, shape (n_samples, n_features)) – Input data.
index (array-like, shape (n_samples,)) – Index for the samples.
scaler (object) – Scaler used for the data.
dataset (str) – Name of the dataset.
seed (int) – Random seed for reproducibility.
feature_names (list of str) – Names of the features.
background (bool, default = False) – Whether to use background data for SHAP.
plot (bool, default = False) – Whether to plot the XAI values.

Returns:

shap_explainer (shap.Explainer) – SHAP explainer for model interpretability.
coefficients (dict) – Dictionary of feature coefficients sorted by absolute value.

fit(X, y)[source]

Fit the model to the data.

Parameters:

X (array-like, shape (n_samples, n_features)) – Training data.
y (structured array-like, shape (n_samples,)) – Target training values (event, start times, stop times).

Returns:

self – Fitted estimator.

Return type:

CoxRegressionWithTimeVarying

predict(X)[source]

Predict risk scores for the given data.

Parameters:: X (array-like, shape (n_samples, n_features)) – Input data.
Returns:: risk – Predicted risk scores.
Return type:: array-like, shape (n_samples,)

predict_cumulative_hazard_function(X, index, dataset, seed, plot=False)[source]

Predict the cumulative hazard function for the given data.

Parameters:

X (array-like, shape (n_samples, n_features)) – Input data.
index (array-like, shape (n_samples,)) – Index for the samples.
dataset (str) – Name of the dataset.
seed (int) – Random seed for reproducibility.
plot (bool, default = False) – Whether to plot the cumulative hazard function.

Returns:

cumulative_hazard_function – Predicted cumulative hazard function.

Return type:

array-like, shape (n_samples, n_times)

predict_survival_function(X, index, dataset, seed, plot=False)[source]

Predict the survival function for the given data.

Parameters:

X (array-like, shape (n_samples, n_features)) – Input data.
index (array-like, shape (n_samples,)) – Index for the samples.
dataset (str) – Name of the dataset.
seed (int) – Random seed for reproducibility.
plot (bool, default = False) – Whether to plot the survival function.

Returns:

survival_function – Predicted survival function.

Return type:

array-like, shape (n_samples, n_times)

class bsix.models.DeepMultiTask(num_inputs, valid_data=None, hidden_layers=None, epochs=500, learn_rate=0.0, lr_decay=0.0, l1_reg=0.0, l2_reg=0.0, cox_reg=0.0, momentum=0.9, activation='relu', dropout=0.0, standardize=True, ties='cox', device=None, validation_frequency=10, patience=500, improvement_threshold=0.99999, patience_increase=25, logger=None, verbose=True, seed=None, coef_likelihood=[1.0])[source]

Bases: BaseSurvival

Deep Multi-Task model.

Parameters:

num_inputs (int) – Number of input features.
valid_data (dict, default = None) – Validation data in the form of a dictionary with keys “x”, “e”, and “t” for features, events, and times, respectively.
hidden_layers (list of int, default = None) – List specifying the number of units in each hidden layer.
epochs (int, default = 500) – Number of training epochs.
learn_rate (float, default = 0.0) – Learning rate for the optimizer.
lr_decay (float, default = 0.0) – Learning rate decay factor.
l1_reg (float, default = 0.0) – L1 regularization strength.
l2_reg (float, default = 0.0) – L2 regularization strength.
cox_reg (float, default = 0.0) – Coefficient for the Cox loss in the total loss function.
momentum (float, default = 0.9) – Momentum for the optimizer.
activation (str, default = "relu") – Activation function to use in the hidden layers. One of "relu", "selu", "tanh", or "sigmoid".
dropout (float, default = 0.0) – Dropout rate for regularization.
standardize (bool, default = True) – Whether to standardize input features.
ties (str, default = "cox") – Method for handling tied event times. "cox" or "breslow".
device (torch.device, default = None) – Device to run the model on (e.g., “cpu” or “cuda”).
validation_frequency (int, default = 10) – Frequency (in epochs) to perform validation.
patience (int, default = 500) – Number of epochs to wait for improvement before early stopping.
improvement_threshold (float, default = 0.99999) – Threshold for considering an improvement in validation loss.
patience_increase (int, default = 25) – Factor by which to increase patience when an improvement is observed.
logger (DeepSurvLogger, default = None) – Logger for tracking training progress.
verbose (bool, default = True) – Whether to print training progress.
seed (int, default = None) – Random seed for reproducibility.
coef_likelihood (list of float, default = [1.0]) – Coefficients for the likelihood loss of each progression in the total loss function.

survival_function

Estimated survival function.

Type:: array-like, shape (n_samples, n_times)

cumulative_hazard_function

Estimated cumulative hazard function.

Type:: array-like, shape (n_samples, n_times)

shap_explainer

SHAP explainer for model interpretability.

Type:: list of shap.Explainer

Examples

from bsix.models.metodologies import DeepMultiTask
model = DeepMultiTask(num_inputs=10, hidden_layers=[32,], epochs=200, learn_rate=0.01)
model.fit(X_train, y_train)

calculate_xai(X, index, scaler, dataset, seed, feature_names, background=False, plot=False)[source]

Calculate XAI values.

Parameters:

X (array-like, shape (n_samples, n_features)) – Input data.
index (array-like, shape (n_samples,)) – Index for the samples.
scaler (object) – Scaler used for the data.
dataset (str) – Name of the dataset.
seed (int) – Random seed for reproducibility.
feature_names (list of str) – Names of the features.
background (bool, default = False) – Whether to use background data for SHAP.
plot (bool, default = False) – Whether to plot the XAI values.

Returns:

shap_explainer – SHAP explainer for model interpretability.

Return type:

list of shap.Explainer, shape (n_progressions,)

fit(X_train, y_train, **kwargs)[source]

Fit the model to the data.

Parameters:

X_train (array-like, shape (n_progressions, n_samples, n_features)) – Training data.
y_train (structured array-like, shape (n_progressions, n_samples,)) – Target training values (events, times).

Returns:

self – Fitted estimator.

Return type:

DeepMultiTask

predict(x)[source]

Predict risk scores for the given data.

Parameters:: x (array-like, shape (n_progressions, n_samples, n_features)) – Input data.
Returns:: risk – Predicted risk scores.
Return type:: array-like, shape (n_progressions, n_samples,)

predict_cumulative_hazard_function(X, index, dataset, seed, plot=False)[source]

Predict the cumulative hazard function for the given data.

Parameters:

X (array-like, shape (n_samples, n_features)) – Input data.
index (array-like, shape (n_samples,)) – Index for the samples.
dataset (str) – Name of the dataset.
seed (int) – Random seed for reproducibility.
plot (bool, default = False) – Whether to plot the cumulative hazard function.

Returns:

cumulative_hazard_function – Predicted cumulative hazard functions.

Return type:

array-like, shape (n_progressions, n_samples, n_times)

predict_survival_function(X, index, dataset, seed, plot=False)[source]

Predict the survival function for the given data.

Parameters:

X (array-like, shape (n_samples, n_features)) – Input data.
index (array-like, shape (n_samples,)) – Index for the samples.
dataset (str) – Name of the dataset.
seed (int) – Random seed for reproducibility.
plot (bool, default = False) – Whether to plot the survival function.

Returns:

survival_function – Predicted survival functions.

Return type:

array-like, shape (n_progressions, n_samples, n_times)

set_fit_request(*, X_train: bool | None | str = '$UNCHANGED$', y_train: bool | None | str = '$UNCHANGED$') → DeepMultiTask

Configure whether metadata should be requested to be passed to the fit method.

Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config()). Please check the User Guide on how the routing mechanism works.

The options for each parameter are:

True: metadata is requested, and passed to fit if provided. The request is ignored if metadata is not provided.
False: metadata is not requested and the meta-estimator will not pass it to fit.
None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in version 1.3.

Parameters:

X_train (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for X_train parameter in fit.
y_train (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for y_train parameter in fit.

Returns:

self – The updated object.

Return type:

object

set_predict_request(*, x: bool | None | str = '$UNCHANGED$') → DeepMultiTask

Configure whether metadata should be requested to be passed to the predict method.

Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config()). Please check the User Guide on how the routing mechanism works.

The options for each parameter are:

True: metadata is requested, and passed to predict if provided. The request is ignored if metadata is not provided.
False: metadata is not requested and the meta-estimator will not pass it to predict.
None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in version 1.3.

Parameters:: x (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for x parameter in predict.
Returns:: self – The updated object.
Return type:: object

class bsix.models.DeepMultiTaskMultiLoss(num_inputs, valid_data=None, hidden_layers=None, epochs=500, learn_rate=0.0, lr_decay=0.0, l1_reg=0.0, l2_reg=0.0, cox_reg=0.0, bin_reg=0.0, momentum=0.9, activation='relu', dropout=0.0, standardize=True, ties='cox', device=None, validation_frequency=10, patience=500, improvement_threshold=0.99999, patience_increase=25, logger=None, verbose=True, seed=None, coef_likelihood=[1.0], coef_binary=[1.0])[source]

Bases: BaseSurvival

Deep Multi-Task Multi-Loss model.

Parameters:

num_inputs (int) – Number of input features.
valid_data (dict, default = None) – Validation data in the form of a dictionary with keys “x”, “e”, and “t” for features, events, and times, respectively.
hidden_layers (list of int, default = None) – List specifying the number of units in each hidden layer.
epochs (int, default = 500) – Number of training epochs.
learn_rate (float, default = 0.0) – Learning rate for the optimizer.
lr_decay (float, default = 0.0) – Learning rate decay factor.
l1_reg (float, default = 0.0) – L1 regularization strength.
l2_reg (float, default = 0.0) – L2 regularization strength.
cox_reg (float, default = 0.0) – Coefficient for the Cox loss in the total loss function.
bin_reg (float, default = 0.0) – Coefficient for the binary loss in the total loss function.
momentum (float, default = 0.9) – Momentum for the optimizer.
activation (str, default = "relu") – Activation function to use in the hidden layers. One of "relu", "selu", "tanh", or "sigmoid".
dropout (float, default = 0.0) – Dropout rate for regularization.
standardize (bool, default = True) – Whether to standardize input features.
ties (str, default = "cox") – Method for handling tied event times. "cox" or "breslow".
device (torch.device, default = None) – Device to run the model on (e.g., “cpu” or “cuda”).
validation_frequency (int, default = 10) – Frequency (in epochs) to perform validation.
patience (int, default = 500) – Number of epochs to wait for improvement before early stopping.
improvement_threshold (float, default = 0.99999) – Threshold for considering an improvement in validation loss.
patience_increase (int, default = 25) – Factor by which to increase patience when an improvement is observed.
logger (DeepSurvLogger, default = None) – Logger for tracking training progress.
verbose (bool, default = True) – Whether to print training progress.
seed (int, default = None) – Random seed for reproducibility.
coef_likelihood (list of float, default = [1.0]) – Coefficients for the likelihood loss of each progression in the total loss function.
coef_binary (list of float, default = [1.0]) – Coefficients for the binary loss of each progression in the total loss function.
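
One plausible reading of `coef_likelihood` and `coef_binary` is a weighted sum of per-progression loss terms. The sketch below shows only that weighting, as an assumption. `combine_losses` is a hypothetical name, the loss values are made up, and the way `cox_reg`, `bin_reg`, `l1_reg`, and `l2_reg` enter the real objective is not specified here.

```python
def combine_losses(likelihood_losses, binary_losses, coef_likelihood, coef_binary):
    """Weighted sum of per-progression losses (illustrative assumption only)."""
    if len(likelihood_losses) != len(coef_likelihood):
        raise ValueError("need one likelihood coefficient per progression")
    if len(binary_losses) != len(coef_binary):
        raise ValueError("need one binary coefficient per progression")
    likelihood_part = sum(c * l for c, l in zip(coef_likelihood, likelihood_losses))
    binary_part = sum(c * l for c, l in zip(coef_binary, binary_losses))
    return likelihood_part + binary_part


# Two progressions with made-up loss values.
total = combine_losses(
    likelihood_losses=[0.8, 1.2],
    binary_losses=[0.3, 0.5],
    coef_likelihood=[1.0, 0.5],
    coef_binary=[0.2, 0.2],
)
print(total)
```

Under this reading, each coefficient list needs one entry per progression, so the default `[1.0]` would correspond to a single progression.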

survival_function

Estimated survival function.

Type:: array-like, shape (n_samples, n_times)

cumulative_hazard_function

Estimated cumulative hazard function.

Type:: array-like, shape (n_samples, n_times)

shap_explainer

SHAP explainer for model interpretability.

Type:: list of shap.Explainer

Examples

from bsix.models.metodologies import DeepMultiTaskMultiLoss
model = DeepMultiTaskMultiLoss(num_inputs=10, hidden_layers=[32,], epochs=200, learn_rate=0.01)
model.fit(X_train, y_train)

calculate_xai(X, index, scaler, dataset, seed, feature_names, background=False, plot=False)[source]

Calculate XAI values.

Parameters:

X (array-like, shape (n_samples, n_features)) – Input data.
index (array-like, shape (n_samples,)) – Index for the samples.
scaler (object) – Scaler used for the data.
dataset (str) – Name of the dataset.
seed (int) – Random seed for reproducibility.
feature_names (list of str) – Names of the features.
background (bool, default = False) – Whether to use background data for SHAP.
plot (bool, default = False) – Whether to plot the XAI values.

Returns:

shap_explainer – SHAP explainer for model interpretability.

Return type:

list of shap.Explainer, shape (n_progressions,)

fit(X_train, y_train, **kwargs)[source]

Fit the model to the data.

Parameters:

X_train (array-like, shape (n_progressions, n_samples, n_features)) – Training data.
y_train (structured array-like, shape (n_progressions, n_samples,)) – Target training values (events, times).

Returns:

self – Fitted estimator.

Return type:

DeepMultiTaskMultiLoss

predict(x)[source]

Predict risk scores for the given data.

Parameters:: x (array-like, shape (n_progressions, n_samples, n_features)) – Input data.
Returns:: risk – Predicted risk scores.
Return type:: array-like, shape (n_progressions, n_samples,)

predict_cumulative_hazard_function(X, index, dataset, seed, plot=False)[source]

Predict the cumulative hazard function for the given data.

Parameters:

X (array-like, shape (n_samples, n_features)) – Input data.
index (array-like, shape (n_samples,)) – Index for the samples.
dataset (str) – Name of the dataset.
seed (int) – Random seed for reproducibility.
plot (bool, default = False) – Whether to plot the cumulative hazard function.

Returns:

cumulative_hazard_function – Predicted cumulative hazard functions.

Return type:

array-like, shape (n_progressions, n_samples, n_times)

predict_survival_function(X, index, dataset, seed, plot=False)[source]

Predict the survival function for the given data.

Parameters:

X (array-like, shape (n_samples, n_features)) – Input data.
index (array-like, shape (n_samples,)) – Index for the samples.
dataset (str) – Name of the dataset.
seed (int) – Random seed for reproducibility.
plot (bool, default = False) – Whether to plot the survival function.

Returns:

survival_function – Predicted survival functions.

Return type:

array-like, shape (n_progressions, n_samples, n_times)

set_fit_request(*, X_train: bool | None | str = '$UNCHANGED$', y_train: bool | None | str = '$UNCHANGED$') → DeepMultiTaskMultiLoss

Configure whether metadata should be requested to be passed to the fit method.

Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config()). Please check the User Guide on how the routing mechanism works.

The options for each parameter are:

True: metadata is requested, and passed to fit if provided. The request is ignored if metadata is not provided.
False: metadata is not requested and the meta-estimator will not pass it to fit.
None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in version 1.3.

Parameters:

X_train (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for X_train parameter in fit.
y_train (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for y_train parameter in fit.

Returns:

self – The updated object.

Return type:

object

set_predict_request(*, x: bool | None | str = '$UNCHANGED$') → DeepMultiTaskMultiLoss

Configure whether metadata should be requested to be passed to the predict method.

Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config()). Please check the User Guide on how the routing mechanism works.

The options for each parameter are:

True: metadata is requested, and passed to predict if provided. The request is ignored if metadata is not provided.
False: metadata is not requested and the meta-estimator will not pass it to predict.
None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in version 1.3.

Parameters:: x (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for x parameter in predict.
Returns:: self – The updated object.
Return type:: object

set_score_request(*, x: bool | None | str = '$UNCHANGED$') → DeepMultiTaskMultiLoss

Configure whether metadata should be requested to be passed to the score method.

Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config()). Please check the User Guide on how the routing mechanism works.

The options for each parameter are:

True: metadata is requested, and passed to score if provided. The request is ignored if metadata is not provided.
False: metadata is not requested and the meta-estimator will not pass it to score.
None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in version 1.3.

Parameters:: x (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for x parameter in score.
Returns:: self – The updated object.
Return type:: object

class bsix.models.DeepSurv(num_inputs, valid_data=None, hidden_layers=None, epochs=500, learn_rate=0.0, lr_decay=0.0, l1_reg=0.0, l2_reg=0.0, momentum=0.9, activation='relu', dropout=0.0, standardize=True, ties='cox', device=None, validation_frequency=10, patience=2000, improvement_threshold=0.99999, patience_increase=2, logger=None, verbose=True, seed=None)[source]

Bases: BaseSurvival

Deep Survival model.

Parameters:

num_inputs (int) – Number of input features.
valid_data (dict, default = None) – Validation data in the form of a dictionary with keys “x”, “e”, and “t” for features, events, and times, respectively.
hidden_layers (list of int, default = None) – List specifying the number of units in each hidden layer.
epochs (int, default = 500) – Number of training epochs.
learn_rate (float, default = 0.0) – Learning rate for the optimizer.
lr_decay (float, default = 0.0) – Learning rate decay factor.
l1_reg (float, default = 0.0) – L1 regularization strength.
l2_reg (float, default = 0.0) – L2 regularization strength.
momentum (float, default = 0.9) – Momentum for the optimizer.
activation (str, default = "relu") – Activation function to use in the hidden layers. One of "relu", "selu", "tanh", or "sigmoid".
dropout (float, default = 0.0) – Dropout rate for regularization.
standardize (bool, default = True) – Whether to standardize input features.
ties (str, default = "cox") – Method for handling tied event times. "cox" or "breslow".
device (torch.device, default = None) – Device to run the model on (e.g., “cpu” or “cuda”).
validation_frequency (int, default = 10) – Frequency (in epochs) to perform validation.
patience (int, default = 2000) – Number of epochs to wait for improvement before early stopping.
improvement_threshold (float, default = 0.99999) – Threshold for considering an improvement in validation loss.
patience_increase (int, default = 2) – Factor by which to increase patience when an improvement is observed.
logger (DeepSurvLogger, default = None) – Logger for tracking training progress.
verbose (bool, default = True) – Whether to print training progress.
seed (int, default = None) – Random seed for reproducibility.
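
The interaction between `validation_frequency`, `patience`, `improvement_threshold`, and `patience_increase` is easiest to see in a loop. The sketch below follows a common patience-based early-stopping scheme. It is an assumption that this class behaves the same way, and `run_with_patience` and the synthetic loss sequence are hypothetical.

```python
def run_with_patience(validation_losses, patience=2000, improvement_threshold=0.99999,
                      patience_increase=2, validation_frequency=10):
    """Return (best_epoch, stopped_epoch) for a sequence of per-epoch losses."""
    best_loss = float("inf")
    best_epoch = None
    stopped_epoch = len(validation_losses) - 1
    for epoch, loss in enumerate(validation_losses):
        # Validation only happens every `validation_frequency` epochs.
        if epoch % validation_frequency == 0 and loss < best_loss:
            # A large enough improvement extends the training budget.
            if loss < best_loss * improvement_threshold:
                patience = max(patience, epoch * patience_increase)
            best_loss, best_epoch = loss, epoch
        # Stop once the current epoch reaches the patience budget.
        if patience <= epoch:
            stopped_epoch = epoch
            break
    return best_epoch, stopped_epoch


# Synthetic losses: steady improvement for 30 epochs, then a plateau.
losses = [1.0 - 0.01 * e for e in range(30)] + [0.71] * 200
best_epoch, stopped_epoch = run_with_patience(
    losses, patience=20, patience_increase=2, validation_frequency=5
)
print(best_epoch, stopped_epoch)
```

In this scheme `patience` is a minimum number of epochs rather than a count of stale validations: each sufficiently large improvement at epoch `e` pushes the budget to at least `e * patience_increase`.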

breslow

Breslow estimator for baseline hazards.

Type:: BreslowEstimator

survival_function

Estimated survival function.

Type:: array-like, shape (n_samples, n_times)

cumulative_hazard_function

Estimated cumulative hazard function.

Type: array-like, shape (n_samples, n_times)

shap_explainer

SHAP explainer for model interpretability.

Type: shap.Explainer

Examples

from bsix.models.metodologies import DeepSurv
model = DeepSurv(num_inputs=10, hidden_layers=[32,], epochs=200, learn_rate=0.01)
model.fit(X_train, y_train)

calculate_xai(X, index, scaler, dataset, seed, feature_names, background=False, plot=False)[source]

Calculate XAI values.

Parameters:

X (array-like, shape (n_samples, n_features)) – Input data.
index (array-like, shape (n_samples,)) – Index for the samples.
scaler (object) – Scaler used for the data.
dataset (str) – Name of the dataset.
seed (int) – Random seed for reproducibility.
feature_names (list of str) – Names of the features.
background (bool, default = False) – Whether to use background data for SHAP.
plot (bool, default = False) – Whether to plot the XAI values.

Returns:

shap_explainer – SHAP explainer for model interpretability.

Return type:

shap.Explainer

fit(X_train, y_train, **kwargs)[source]

Fit the model to the data.

Parameters:

X_train (array-like, shape (n_samples, n_features)) – Training data.
y_train (structured array-like, shape (n_samples,)) – Target training values (events, times).

Returns:

self – Fitted estimator.

Return type:

DeepSurv
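
A structured target of the kind described for y_train can be built with NumPy. This is a sketch under assumptions: the field names "event" and "time" and the helper name are chosen for illustration, and the names bsix actually expects should be checked before relying on them.

```python
import numpy as np


def make_survival_target(events, times):
    """Pack event indicators and times into a NumPy structured array.

    The field names "event" and "time" are assumptions for illustration.
    """
    events = np.asarray(events, dtype=bool)
    times = np.asarray(times, dtype=float)
    if events.shape != times.shape:
        raise ValueError("events and times must have the same shape")
    y = np.empty(events.shape[0], dtype=[("event", "?"), ("time", "<f8")])
    y["event"] = events
    y["time"] = times
    return y


y_train = make_survival_target([True, False, True], [5.0, 8.0, 2.5])
```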

predict(x)[source]

Predict risk scores for the given data.

Parameters: x (array-like, shape (n_samples, n_features)) – Input data.
Returns: risk – Predicted risk scores.
Return type: array-like, shape (n_samples,)

predict_cumulative_hazard_function(X, index, dataset, seed, plot=False)[source]

Predict the cumulative hazard function for the given data.

Parameters:

X (array-like, shape (n_samples, n_features)) – Input data.
index (array-like, shape (n_samples,)) – Index for the samples.
dataset (str) – Name of the dataset.
seed (int) – Random seed for reproducibility.
plot (bool, default = False) – Whether to plot the cumulative hazard function.

Returns:

cumulative_hazard_function – Predicted cumulative hazard function.

Return type:

array-like, shape (n_samples, n_times)
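
The shape (n_samples, n_times) follows from the relation H(x, t) = H0(t) * exp(g(x)) stated on the base class. A minimal NumPy sketch of that broadcast is shown below; the function name is hypothetical, and the code does not call bsix.

```python
import numpy as np


def cumulative_hazard_matrix(baseline_cumulative_hazard, risk_scores):
    """Evaluate H(x, t) = H0(t) * exp(g(x)) for every sample and time point.

    baseline_cumulative_hazard: H0 on a time grid, shape (n_times,).
    risk_scores: log-risk g(x), shape (n_samples,).
    Returns an array of shape (n_samples, n_times).
    """
    h0 = np.asarray(baseline_cumulative_hazard, dtype=float)
    g = np.asarray(risk_scores, dtype=float)
    # Broadcast a column of hazard ratios against a row of baseline values.
    return np.exp(g)[:, None] * h0[None, :]
```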

predict_survival_function(X, index, dataset, seed, plot=False)[source]

Predict the survival function for the given data.

Parameters:

X (array-like, shape (n_samples, n_features)) – Input data.
index (array-like, shape (n_samples,)) – Index for the samples.
dataset (str) – Name of the dataset.
seed (int) – Random seed for reproducibility.
plot (bool, default = False) – Whether to plot the survival function.

Returns:

survival_function – Predicted survival function.

Return type:

array-like, shape (n_samples, n_times)
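
The survival function relates to the cumulative hazard through S(x, t) = exp(-H(x, t)), as stated on the base class. The sketch below applies that identity to an array; the function name is hypothetical and the code is independent of bsix.

```python
import numpy as np


def survival_from_cumulative_hazard(cumulative_hazard):
    """Convert cumulative hazards H(x, t) to survival probabilities exp(-H)."""
    H = np.asarray(cumulative_hazard, dtype=float)
    if np.any(H < 0):
        raise ValueError("cumulative hazard must be non-negative")
    return np.exp(-H)
```

A cumulative hazard of 0 maps to a survival probability of 1, and a cumulative hazard of log(2) maps to 0.5.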

set_fit_request(*, X_train: bool | None | str = '$UNCHANGED$', y_train: bool | None | str = '$UNCHANGED$') → DeepSurv

Configure whether metadata should be requested to be passed to the fit method.

Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config()). Please check the User Guide on how the routing mechanism works.

The options for each parameter are:

True: metadata is requested, and passed to fit if provided. The request is ignored if metadata is not provided.
False: metadata is not requested and the meta-estimator will not pass it to fit.
None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in version 1.3.

Parameters:

X_train (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for X_train parameter in fit.
y_train (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for y_train parameter in fit.

Returns:

self – The updated object.

Return type:

object

set_predict_request(*, x: bool | None | str = '$UNCHANGED$') → DeepSurv

Configure whether metadata should be requested to be passed to the predict method.

Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config()). Please check the User Guide on how the routing mechanism works.

The options for each parameter are:

True: metadata is requested, and passed to predict if provided. The request is ignored if metadata is not provided.
False: metadata is not requested and the meta-estimator will not pass it to predict.
None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in version 1.3.

Parameters: x (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for x parameter in predict.
Returns: self – The updated object.
Return type: object

class bsix.models.DeepTimeVarying(num_inputs, valid_data=None, hidden_layers=None, epochs=500, learn_rate=0.0, lr_decay=0.0, l1_reg=0.0, l2_reg=0.0, momentum=0.9, activation='relu', dropout=0.0, standardize=True, ties='cox', device=None, validation_frequency=10, patience=2000, improvement_threshold=0.99999, patience_increase=2, logger=None, verbose=True, seed=None)[source]

Bases: BaseSurvival

Deep Time-Varying model.

Parameters:

num_inputs (int) – Number of input features.
valid_data (dict, default = None) – Validation data in the form of a dictionary with keys “x”, “e”, and “t” for features, events, and times, respectively.
hidden_layers (list of int, default = None) – List specifying the number of units in each hidden layer.
epochs (int, default = 500) – Number of training epochs.
learn_rate (float, default = 0.0) – Learning rate for the optimizer.
lr_decay (float, default = 0.0) – Learning rate decay factor.
l1_reg (float, default = 0.0) – L1 regularization strength.
l2_reg (float, default = 0.0) – L2 regularization strength.
momentum (float, default = 0.9) – Momentum for the optimizer.
activation (str, default = "relu") – Activation function to use in the hidden layers: "relu", "selu", "tanh" or "sigmoid".
dropout (float, default = 0.0) – Dropout rate for regularization.
standardize (bool, default = True) – Whether to standardize input features.
ties (str, default = "cox") – Method for handling tied event times: "cox" or "breslow".
device (torch.device, default = None) – Device to run the model on (e.g., “cpu” or “cuda”).
validation_frequency (int, default = 10) – Frequency (in epochs) at which validation is performed.
patience (int, default = 2000) – Number of epochs to wait for improvement before early stopping.
improvement_threshold (float, default = 0.99999) – Threshold for considering an improvement in validation loss.
patience_increase (int, default = 2) – Factor by which to increase patience when an improvement is observed.
logger (DeepSurvLogger, default = None) – Logger for tracking training progress.
verbose (bool, default = True) – Whether to print training progress.
seed (int, default = None) – Random seed for reproducibility.
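
The parameter list does not spell out how validation_frequency, patience, improvement_threshold and patience_increase interact. The sketch below assumes the patience-based early-stopping pattern found in the original Theano DeepSurv code; the function run_early_stopping and its list of validation losses are hypothetical, and bsix may differ in the details.

```python
def run_early_stopping(valid_losses, validation_frequency=10, patience=2000,
                       improvement_threshold=0.99999, patience_increase=2):
    """Return (stop_epoch, best_loss) for a sequence of per-epoch validation losses.

    Assumed logic, not taken from the bsix source: every
    `validation_frequency` epochs the loss is checked; an improvement
    beyond the threshold extends `patience` to at least
    `epoch * patience_increase`; training stops once the epoch counter
    reaches `patience`.
    """
    best_loss = float("inf")
    epoch = -1
    for epoch, loss in enumerate(valid_losses):
        if epoch % validation_frequency == 0:
            if loss < best_loss * improvement_threshold:
                patience = max(patience, epoch * patience_increase)
            best_loss = min(best_loss, loss)
        if patience <= epoch:
            break
    return epoch, best_loss
```

Under this reading, a loss that keeps improving pushes the stopping point further out, while a flat loss lets training stop at the initial patience.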

breslow

Breslow estimator for baseline hazards.

Type: BreslowEstimator

survival_function

Estimated survival function.

Type: array-like, shape (n_samples, n_times)

cumulative_hazard_function

Estimated cumulative hazard function.

Type: array-like, shape (n_samples, n_times)

shap_explainer

SHAP explainer for model interpretability.

Type: shap.Explainer

Examples

from bsix.models.metodologies import DeepTimeVarying
model = DeepTimeVarying(num_inputs=10, hidden_layers=[32,], epochs=200, learn_rate=0.01)
model.fit(X_train, y_train)

calculate_xai(X, index, scaler, dataset, seed, feature_names, background=False, plot=False)[source]

Calculate XAI values.

Parameters:

X (array-like, shape (n_samples, n_features)) – Input data.
index (array-like, shape (n_samples,)) – Index for the samples.
scaler (object) – Scaler used for the data.
dataset (str) – Name of the dataset.
seed (int) – Random seed for reproducibility.
feature_names (list of str) – Names of the features.
background (bool, default = False) – Whether to use background data for SHAP.
plot (bool, default = False) – Whether to plot the XAI values.

Returns:

shap_explainer – SHAP explainer for model interpretability.

Return type:

shap.Explainer

fit(X_train, y_train, **kwargs)[source]

Fit the model to the data.

Parameters:

X_train (array-like, shape (n_samples, n_features)) – Training data.
y_train (structured array-like, shape (n_samples,)) – Target training values (events, start times, stop times).

Returns:

self – Fitted estimator.

Return type:

DeepTimeVarying
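
The (events, start times, stop times) target corresponds to a start-stop layout in which each subject contributes one row per interval. The following sketch shows one way such intervals could be derived for a single subject; the function name and the convention that the event is flagged only on the last interval are assumptions for illustration, not the behaviour of to_time_varying.

```python
def to_start_stop(measurement_times, end_time, event):
    """Split one subject's follow-up into (start, stop, event) intervals.

    Each measurement opens an interval that lasts until the next
    measurement; the event indicator is set only on the final interval.
    """
    starts = sorted(measurement_times)
    if not starts or starts[-1] >= end_time:
        raise ValueError("end_time must be later than every measurement time")
    stops = starts[1:] + [end_time]
    rows = []
    for i, (start, stop) in enumerate(zip(starts, stops)):
        is_last = i == len(starts) - 1
        rows.append((start, stop, bool(event) and is_last))
    return rows
```

For example, measurements at times 0, 3 and 7 with an event at time 10 give three intervals, of which only the last carries the event.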

predict(x)[source]

Predict risk scores for the given data.

Parameters: x (array-like, shape (n_samples, n_features)) – Input data.
Returns: risk – Predicted risk scores.
Return type: array-like, shape (n_samples,)

predict_cumulative_hazard_function(X, index, dataset, seed, plot=False)[source]

Predict the cumulative hazard function for the given data.

Parameters:

X (array-like, shape (n_samples, n_features)) – Input data.
index (array-like, shape (n_samples,)) – Index for the samples.
dataset (str) – Name of the dataset.
seed (int) – Random seed for reproducibility.
plot (bool, default = False) – Whether to plot the cumulative hazard function.

Returns:

cumulative_hazard_function – Predicted cumulative hazard function.

Return type:

array-like, shape (n_samples, n_times)

predict_survival_function(X, index, dataset, seed, plot=False)[source]

Predict the survival function for the given data.

Parameters:

X (array-like, shape (n_samples, n_features)) – Input data.
index (array-like, shape (n_samples,)) – Index for the samples.
dataset (str) – Name of the dataset.
seed (int) – Random seed for reproducibility.
plot (bool, default = False) – Whether to plot the survival function.

Returns:

survival_function – Predicted survival function.

Return type:

array-like, shape (n_samples, n_times)

set_fit_request(*, X_train: bool | None | str = '$UNCHANGED$', y_train: bool | None | str = '$UNCHANGED$') → DeepTimeVarying

Configure whether metadata should be requested to be passed to the fit method.

Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config()). Please check the User Guide on how the routing mechanism works.

The options for each parameter are:

True: metadata is requested, and passed to fit if provided. The request is ignored if metadata is not provided.
False: metadata is not requested and the meta-estimator will not pass it to fit.
None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in version 1.3.

Parameters:

X_train (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for X_train parameter in fit.
y_train (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for y_train parameter in fit.

Returns:

self – The updated object.

Return type:

object

set_predict_request(*, x: bool | None | str = '$UNCHANGED$') → DeepTimeVarying

Configure whether metadata should be requested to be passed to the predict method.

Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config()). Please check the User Guide on how the routing mechanism works.

The options for each parameter are:

True: metadata is requested, and passed to predict if provided. The request is ignored if metadata is not provided.
False: metadata is not requested and the meta-estimator will not pass it to predict.
None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in version 1.3.

Parameters: x (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for x parameter in predict.
Returns: self – The updated object.
Return type: object

class bsix.models.RandomSurvForest(seed, n_jobs=-1, n_estimators=100, max_depth=None, min_samples_leaf=3, min_samples_split=6)[source]

Bases: BaseSurvival

Random Survival Forest model.

Parameters:

seed (int) – Random seed for reproducibility.
n_jobs (int, default = -1) – Number of jobs to run in parallel.
n_estimators (int, default = 100) – The number of trees in the forest.
max_depth (int, default = None) – The maximum depth of the tree.
min_samples_leaf (int, default = 3) – The minimum number of samples required to be at a leaf node.
min_samples_split (int, default = 6) – The minimum number of samples required to split an internal node.

survival_function

Estimated survival function.

Type: array-like, shape (n_samples, n_times)

cumulative_hazard_function

Estimated cumulative hazard function.

Type: array-like, shape (n_samples, n_times)

shap_explainer

SHAP explainer for model interpretability.

Type: shap.Explainer

Examples

from bsix.models.metodologies import RandomSurvForest
model = RandomSurvForest(seed=42, n_estimators=100, max_depth=5)
model.fit(X_train, y_train)

calculate_xai(X, index, scaler, dataset, seed, feature_names, background=False, plot=False)[source]

Calculate XAI values.

Parameters:

X (array-like, shape (n_samples, n_features)) – Input data.
index (array-like, shape (n_samples,)) – Index for the samples.
scaler (object) – Scaler used for the data.
dataset (str) – Name of the dataset.
seed (int) – Random seed for reproducibility.
feature_names (list of str) – Names of the features.
background (bool, default = False) – Whether to use background data for SHAP.
plot (bool, default = False) – Whether to plot the XAI values.

Returns:

shap_explainer – SHAP explainer for model interpretability.

Return type:

shap.Explainer

fit(X, y)[source]

Fit the model to the data.

Parameters:

X (array-like, shape (n_samples, n_features)) – Training data.
y (structured array-like, shape (n_samples,)) – Target training values (events, times).

Returns:

self – Fitted estimator.

Return type:

RandomSurvForest

predict(X)[source]

Predict risk scores for the given data.

Parameters: X (array-like, shape (n_samples, n_features)) – Input data.
Returns: risk – Predicted risk scores.
Return type: array-like, shape (n_samples,)
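
Risk scores of this shape are commonly evaluated with a concordance index. The sketch below is a plain-Python version of Harrell's C, written under the assumption that a higher score means a higher risk (the text above does not state the direction); the function is illustrative and is not part of bsix.

```python
def concordance_index(times, events, risk_scores):
    """Harrell's C: share of comparable pairs ranked correctly by risk.

    A pair (i, j) is comparable when subject i had an event strictly
    before subject j's observed time. Ties in risk count as 0.5.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            if events[i] and times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable
```

A value of 1.0 means every comparable pair is ordered correctly, and 0.5 corresponds to random ordering.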

predict_cumulative_hazard_function(X, index, dataset, seed, plot=False)[source]

Predict the cumulative hazard function for the given data.

Parameters:

X (array-like, shape (n_samples, n_features)) – Input data.
index (array-like, shape (n_samples,)) – Index for the samples.
dataset (str) – Name of the dataset.
seed (int) – Random seed for reproducibility.
plot (bool, default = False) – Whether to plot the cumulative hazard function.

Returns:

cumulative_hazard_function – Predicted cumulative hazard function.

Return type:

array-like, shape (n_samples, n_times)

predict_survival_function(X, index, dataset, seed, plot=False)[source]

Predict the survival function for the given data.

Parameters:

X (array-like, shape (n_samples, n_features)) – Input data.
index (array-like, shape (n_samples,)) – Index for the samples.
dataset (str) – Name of the dataset.
seed (int) – Random seed for reproducibility.
plot (bool, default = False) – Whether to plot the survival function.

Returns:

survival_function – Predicted survival function.

Return type:

array-like, shape (n_samples, n_times)
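
An array of shape (n_samples, n_times) can be summarised per sample, for instance by a median survival time. The sketch below assumes that the time grid matching the columns is available as a separate array, which the description above does not specify; the function name is hypothetical.

```python
import numpy as np


def median_survival_time(survival, time_grid):
    """First grid time at which S(t) <= 0.5 for each sample.

    survival: array of shape (n_samples, n_times).
    time_grid: increasing times of shape (n_times,), assumed to match
    the columns of `survival`.
    Returns np.inf for samples whose curve never drops to 0.5.
    """
    S = np.asarray(survival, dtype=float)
    t = np.asarray(time_grid, dtype=float)
    below = S <= 0.5
    first = below.argmax(axis=1)          # index of the first True per row
    return np.where(below.any(axis=1), t[first], np.inf)
```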

class bsix.models.SurvTree(max_depth=None, min_samples_split=6, min_samples_leaf=3, seed=0)[source]

Bases: BaseSurvival

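Survival Tree model.

Parameters:

max_depth (int, default = None) – The maximum depth of the tree.
min_samples_split (int, default = 6) – The minimum number of samples required to split an internal node.
min_samples_leaf (int, default = 3) – The minimum number of samples required to be at a leaf node.
seed (int, default = 0) – Random seed for reproducibility.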

calculate_xai(X, index, scaler, dataset, seed, feature_names, background=False, plot=False)[source]

Calculate XAI values.

Parameters:

X (array-like, shape (n_samples, n_features)) – Input data.
index (array-like, shape (n_samples,)) – Index for the samples.
scaler (object) – Scaler used for the data.
dataset (str) – Name of the dataset.
seed (int) – Random seed for reproducibility.
feature_names (list of str) – Names of the features.
background (bool, default = False) – Whether to use background data for SHAP.
plot (bool, default = False) – Whether to plot the XAI values.

Returns:

shap_explainer – SHAP explainer for model interpretability.

Return type:

shap.Explainer

fit(X, y)[source]

Fit the model to the data.

Parameters:

X (array-like, shape (n_samples, n_features)) – Training data.
y (structured array-like, shape (n_samples,)) – Target training values (events, times).

Returns:

self – Fitted estimator.

Return type:

SurvTree

predict(X)[source]

Predict risk scores for the given data.

Parameters: X (array-like, shape (n_samples, n_features)) – Input data.
Returns: risk – Predicted risk scores.
Return type: array-like, shape (n_samples,)

predict_cumulative_hazard_function(X, index, dataset, seed, plot=False)[source]

Predict the cumulative hazard function for the given data.

Parameters:

X (array-like, shape (n_samples, n_features)) – Input data.
index (array-like, shape (n_samples,)) – Index for the samples.
dataset (str) – Name of the dataset.
seed (int) – Random seed for reproducibility.
plot (bool, default = False) – Whether to plot the cumulative hazard function.

Returns:

cumulative_hazard_function – Predicted cumulative hazard function.

Return type:

array-like, shape (n_samples, n_times)
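
Survival trees commonly estimate the cumulative hazard of the samples in a leaf with the Nelson-Aalen estimator. Whether SurvTree does so is an assumption here; the sketch below only illustrates the estimator itself, in plain Python, with a hypothetical function name.

```python
def nelson_aalen(times, events):
    """Nelson-Aalen cumulative hazard as a list of (time, H(time)) pairs.

    At each distinct event time the hazard grows by d / n, where d is the
    number of events at that time and n the number of subjects still at risk.
    """
    event_times = sorted({t for t, e in zip(times, events) if e})
    cumulative = 0.0
    curve = []
    for t in event_times:
        d = sum(1 for ti, ei in zip(times, events) if ei and ti == t)
        n = sum(1 for ti in times if ti >= t)
        cumulative += d / n
        curve.append((t, cumulative))
    return curve
```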

predict_survival_function(X, index, dataset, seed, plot=False)[source]

Predict the survival function for the given data.

Parameters:

X (array-like, shape (n_samples, n_features)) – Input data.
index (array-like, shape (n_samples,)) – Index for the samples.
dataset (str) – Name of the dataset.
seed (int) – Random seed for reproducibility.
plot (bool, default = False) – Whether to plot the survival function.

Returns:

survival_function – Predicted survival function.

Return type:

array-like, shape (n_samples, n_times)