Models

Base

class bsix.models.base.BaseSurvival[source]

Bases: BaseEstimator, ABC

Abstract base class for survival analysis models.

abstract calculate_xai(X, **kwargs)[source]

Calculate XAI values.

static dinamic_discretise(y, dataset, seed=0, plot=False)[source]

Discretise the data using a piecewise exponential model and show the result on a Kaplan-Meier plot.
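For readers unfamiliar with the Kaplan-Meier curve mentioned above, the estimator itself can be sketched in a few lines. This is an illustration only, not the code behind dinamic_discretise: the function name kaplan_meier and the toy data are assumptions, and the piecewise exponential discretisation step is not shown.

```python
def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct event time (hypothetical helper)."""
    survival = 1.0
    curve = []
    # Only times at which at least one event was observed change the curve.
    for t in sorted({ti for ti, ei in zip(times, events) if ei}):
        at_risk = sum(1 for ti in times if ti >= t)
        deaths = sum(1 for ti, ei in zip(times, events) if ei and ti == t)
        survival *= 1.0 - deaths / at_risk
        curve.append((t, survival))
    return curve


# Toy data: the subject observed until t = 2 is censored (event flag 0).
km_curve = kaplan_meier(times=[1, 2, 3, 4], events=[1, 0, 1, 1])
```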

static feature_selection(X, y)[source]

Calculate the best features based on p-value.

abstract fit(X, y, **kwargs)[source]

Fit the model.

static generate_simulated_survival_data(number_rows=1000, number_columns=10, censored=0.75, relation=None, seed=0)[source]

Generate simulated survival data.
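As a rough idea of what simulated survival data with a target censoring fraction can look like, here is a minimal standard-library sketch. It is not the bsix generator: the single covariate, the exponential event times, the coefficient 0.5 and the name simulate_survival are all assumptions made for illustration, and the relation parameter is not modelled.

```python
import math
import random


def simulate_survival(number_rows=1000, censored=0.75, seed=0):
    """Return (covariate, time, event_observed) tuples (hypothetical helper)."""
    rng = random.Random(seed)
    rows = []
    for _ in range(number_rows):
        x = rng.gauss(0.0, 1.0)
        # Exponential event time whose rate grows with the covariate.
        event_time = rng.expovariate(math.exp(0.5 * x))
        if rng.random() < censored:
            # Censored: follow-up ends at a random point before the event.
            rows.append((x, rng.uniform(0.0, event_time), False))
        else:
            rows.append((x, event_time, True))
    return rows


rows = simulate_survival(number_rows=1000, censored=0.75, seed=0)
censored_fraction = sum(1 for _, _, observed in rows if not observed) / len(rows)
```

Fixing the seed makes the output reproducible, which mirrors the purpose of the seed argument in the signature above.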

static logrank_test(y, groups, weights=None)[source]

Calculate the log-rank test for n groups.
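The statistic behind the log-rank test can be sketched for the simplest case of two groups and no weights. This is an illustration under those assumptions, not the bsix implementation, which accepts n groups and optional weights; the function name and its argument layout (separate times, events and groups lists) are hypothetical.

```python
import math


def logrank_two_groups(times, events, groups):
    """Unweighted two-sample log-rank test; groups holds 0 or 1 per subject."""
    observed_minus_expected = 0.0
    variance = 0.0
    for t in sorted({ti for ti, ei in zip(times, events) if ei}):
        at_risk = [g for ti, g in zip(times, groups) if ti >= t]
        n = len(at_risk)   # subjects still at risk at time t
        n1 = sum(at_risk)  # of which in group 1
        d = sum(1 for ti, ei in zip(times, events) if ei and ti == t)
        d1 = sum(1 for ti, ei, g in zip(times, events, groups)
                 if ei and ti == t and g == 1)
        observed_minus_expected += d1 - d * n1 / n
        if n > 1:
            variance += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    statistic = observed_minus_expected ** 2 / variance
    # Survival function of a chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(statistic / 2.0))
    return statistic, p_value


times = [1, 2, 3, 4, 5, 6, 7, 8]
events = [1, 1, 1, 1, 1, 1, 1, 1]
groups = [1, 1, 1, 1, 0, 0, 0, 0]
statistic, p_value = logrank_two_groups(times, events, groups)
```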

static plot_coefficients(coefficients, estimator_name, dataset, seed=None, progression=None)[source]

Plot XAI coefficients for the data (lollipop plot).

static plot_individual_shap(shap_explainer, identifier_index, index, scaler, estimator_name, dataset, seed=None, progression=None)[source]

Plot SHAP values for an individual instance (horizontal bar plot).

static plot_shap(shap_explainer, index, scaler, estimator_name, dataset, seed=None, progression=None)[source]

Plot SHAP values for the data (beeswarm plot).

abstract predict(X, **kwargs)[source]

Predict on X.

abstract predict_cumulative_hazard_function(X, **kwargs)[source]

H(x,t) = H0(t) * exp(g(x)).

abstract predict_survival_function(X, **kwargs)[source]

S(x, t) = exp(-H(x, t)).
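The two identities above, H(x, t) = H0(t) * exp(g(x)) and S(x, t) = exp(-H(x, t)), can be checked numerically. The following is a minimal sketch rather than bsix code; the helper names and the baseline values are hypothetical.

```python
import math


def cumulative_hazard(baseline_cumulative_hazard, risk_score):
    """H(x, t) = H0(t) * exp(g(x)) on a grid of times."""
    scale = math.exp(risk_score)
    return [h0 * scale for h0 in baseline_cumulative_hazard]


def survival_function(cum_hazard):
    """S(x, t) = exp(-H(x, t))."""
    return [math.exp(-h) for h in cum_hazard]


# Hypothetical baseline cumulative hazard H0(t) at t = 1, 2, 3.
h0 = [0.1, 0.25, 0.5]
H_reference = cumulative_hazard(h0, risk_score=0.0)  # g(x) = 0 leaves H0 unchanged
H_high_risk = cumulative_hazard(h0, risk_score=1.0)  # larger g(x) scales H up
S_reference = survival_function(H_reference)
S_high_risk = survival_function(H_high_risk)
```

A higher risk score multiplies the cumulative hazard and therefore lowers the survival probability at every time point.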

static to_time_dependent(dataframe, splits, identifier='identifier', time='time', event='event')[source]

Transform a DataFrame with one measurement per subject into a time-dependent format.
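One plausible reading of splits is a list of cut points at which each subject's follow-up is divided into episodes. The sketch below illustrates that idea for a single subject. split_episodes is a hypothetical helper, not the bsix implementation, and the real output columns may differ.

```python
def split_episodes(time, event, splits):
    """Split one follow-up (0, time] at the cut points that fall inside it."""
    cuts = [s for s in sorted(splits) if 0 < s < time]
    bounds = [0.0] + cuts + [time]
    episodes = []
    for start, stop in zip(bounds[:-1], bounds[1:]):
        # The event, if any, belongs to the final episode only.
        episodes.append((start, stop, bool(event) and stop == time))
    return episodes


# A subject with an event at t = 7, split at 2 and 5 (10 lies beyond follow-up).
episodes = split_episodes(time=7.0, event=True, splits=[2, 5, 10])
```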

static to_time_varying(dataframe, identifier='identifier', time='time', event='event')[source]

Transform a DataFrame with multiple measurements per subject into a start-stop format.
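To make the start-stop format concrete, here is a small sketch that works on plain dictionaries instead of a DataFrame. It assumes each row's time marks the end of the interval over which its covariates apply; that convention, the helper name to_start_stop and the bp column are assumptions, not documented bsix behaviour.

```python
from itertools import groupby
from operator import itemgetter


def to_start_stop(rows):
    """rows: dicts with 'identifier', 'time', 'event' plus covariates."""
    out = []
    ordered = sorted(rows, key=itemgetter("identifier", "time"))
    for _, group in groupby(ordered, key=itemgetter("identifier")):
        start = 0.0
        for row in group:
            interval = dict(row)  # copy, so the input rows are not modified
            interval["start"] = start
            interval["stop"] = interval.pop("time")
            out.append(interval)
            start = interval["stop"]  # next interval begins where this one ends
    return out


rows = [
    {"identifier": 1, "time": 5.0, "event": 1, "bp": 130},
    {"identifier": 1, "time": 2.0, "event": 0, "bp": 120},
    {"identifier": 2, "time": 3.0, "event": 0, "bp": 110},
]
intervals = to_start_stop(rows)
```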

Methodologies

class bsix.models.AcceleratedFailureTime(type='WeibullAFT', penalizer=0.0, l1_ratio=0.0)[source]

Bases: BaseSurvival

Weibull Accelerated Failure Time model.

calculate_xai(X, index, scaler, dataset, seed, feature_names, background=False, plot=False)[source]

Calculate XAI values.

fit(X, y)[source]

Fit the model to the data.

predict(X)[source]

Predict risk scores for the given data.

predict_cumulative_hazard_function(X, index, dataset, seed, plot=False)[source]

H(x,t) = H₀(t) × exp(βᵀx).

predict_survival_function(X, index, dataset, seed, plot=False)[source]

S(x, t) = exp(-H(x, t)).

score(X, y)[source]

Calculate the score for the model.

class bsix.models.BaseCoxRegression(alpha=0.0, ties='breslow', n_iter=100)[source]

Bases: BaseSurvival

Cox Regression model.

calculate_xai(X, index, scaler, dataset, seed, feature_names, background=False, plot=False)[source]

Calculate XAI values.

fit(X, y)[source]

Fit the model to the data.

predict(X)[source]

Predict risk scores for the given data.

predict_cumulative_hazard_function(X, index, dataset, seed, plot=False)[source]

H(x,t) = H₀(t) × exp(βᵀx).

predict_survival_function(X, index, dataset, seed, plot=False)[source]

S(x, t) = exp(-H(x, t)).

score(X, y)[source]

Calculate the score for the model.

class bsix.models.BaseCoxRegressionWithTimeVarying(penalizer=0.0, l1_ratio=0.0, formula=None)[source]

Bases: BaseSurvival

Cox Regression with Time-Varying Covariates model.

calculate_xai(X, index, scaler, dataset, seed, feature_names, background=False, plot=False)[source]

Calculate XAI values.

fit(X, y)[source]

Fit the model to the data.

predict(X)[source]

Predict risk scores for the given data.

predict_cumulative_hazard_function(X, index, dataset, seed, plot=False)[source]

H(x,t) = H₀(t) × exp(βᵀx).

predict_survival_function(X, index, dataset, seed, plot=False)[source]

S(x, t) = exp(-H(x, t)).

score(X, y)[source]

Calculate the score for the model.

class bsix.models.BaseRandomSurvivalForest(seed, n_jobs=-1, n_estimators=100, max_depth=None, min_samples_leaf=3, min_samples_split=6)[source]

Bases: BaseSurvival

Random Survival Forest model.

calculate_xai(X, index, scaler, dataset, seed, feature_names, background=False, plot=False)[source]

Calculate XAI values.

fit(X, y)[source]

Fit the model to the data.

predict(X)[source]

Predict risk scores for the given data.

predict_cumulative_hazard_function(X, index, dataset, seed, plot=False)[source]

H(x,t) = H₀(t) × exp(βᵀx).

predict_survival_function(X, index, dataset, seed, plot=False)[source]

S(x, t) = exp(-H(x, t)).

score(X, y)[source]

Calculate the score for the model.

class bsix.models.BaseSurvivalTree(seed, max_depth=5, min_samples_split=2, min_samples_leaf=1)[source]

Bases: BaseSurvival

Survival Tree model.

calculate_xai(X, index, scaler, dataset, seed, feature_names, background=False, plot=False)[source]

Calculate XAI values.

fit(X, y)[source]

Fit the model to the data.

predict(X)[source]

Predict risk scores for the given data.

predict_cumulative_hazard_function(X, index, dataset, seed, plot=False)[source]

H(x,t) = H₀(t) × exp(βᵀx).

predict_survival_function(X, index, dataset, seed, plot=False)[source]

S(x, t) = exp(-H(x, t)).

score(X, y)[source]

Calculate the score for the model.

class bsix.models.CoxRegression(alpha=0.0, ties='breslow', n_iter=100)[source]

Bases: BaseSurvival

Cox Regression model.

Parameters:
  • alpha (float, default = 0.0) – Regularization strength.

  • ties (str, default = "breslow") – Method for handling tied event times, either "breslow" or "efron".

  • n_iter (int, default = 100) – Number of iterations for the Newton-Raphson algorithm.

coef_

Estimated coefficients for the model.

Type:

array-like, shape (n_features,)

breslow

Breslow estimator for baseline hazards.

Type:

BreslowEstimator
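The Breslow estimator of the baseline cumulative hazard adds, at each event time, the number of events divided by the summed exp(risk score) of the subjects still at risk. The sketch below illustrates that formula in plain Python. It is not the BreslowEstimator class referenced above, and the function name and argument layout are hypothetical.

```python
import math


def breslow_baseline_cumulative_hazard(times, events, risk_scores):
    """Return [(t, H0(t))] at each distinct event time."""
    hazard = 0.0
    curve = []
    for t in sorted({ti for ti, ei in zip(times, events) if ei}):
        deaths = sum(1 for ti, ei in zip(times, events) if ei and ti == t)
        # Sum of exp(risk score) over everyone still at risk at time t.
        risk_set = sum(math.exp(r) for ti, r in zip(times, risk_scores) if ti >= t)
        hazard += deaths / risk_set
        curve.append((t, hazard))
    return curve


# With all risk scores equal to zero this reduces to the Nelson-Aalen estimator.
baseline = breslow_baseline_cumulative_hazard(
    times=[1, 2, 3], events=[1, 1, 1], risk_scores=[0.0, 0.0, 0.0]
)
```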

survival_function

Estimated survival function.

Type:

array-like, shape (n_samples, n_times)

cumulative_hazard_function

Estimated cumulative hazard function.

Type:

array-like, shape (n_samples, n_times)

shap_explainer

SHAP explainer for model interpretability.

Type:

shap.Explainer

Examples

from bsix.models.metodologies import CoxRegression
model = CoxRegression(alpha=0.1, ties="efron", n_iter=200)
model.fit(X_train, y_train)

calculate_xai(X, index, scaler, dataset, seed, feature_names, background=False, plot=False)[source]

Calculate XAI values.

Parameters:
  • X (array-like, shape (n_samples, n_features)) – Input data.

  • index (array-like, shape (n_samples,)) – Index for the samples.

  • scaler (object) – Scaler used for the data.

  • dataset (str) – Name of the dataset.

  • seed (int) – Random seed for reproducibility.

  • feature_names (list of str) – Names of the features.

  • background (bool, default = False) – Whether to use background data for SHAP.

  • plot (bool, default = False) – Whether to plot the XAI values.

Returns:

  • shap_explainer (shap.Explainer) – SHAP explainer for model interpretability.

  • coefficients (dict) – Dictionary of feature coefficients sorted by absolute value.
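A dictionary of coefficients sorted by absolute value could be built as follows. This is only a sketch of the returned structure: descending order is an assumption, and sort_by_magnitude and the feature names are hypothetical.

```python
def sort_by_magnitude(feature_names, coef):
    """Map feature name to coefficient, largest absolute value first."""
    pairs = zip(feature_names, coef)
    return dict(sorted(pairs, key=lambda item: abs(item[1]), reverse=True))


coefficients = sort_by_magnitude(["age", "stage", "dose"], [0.1, -2.0, 0.5])
```

The sign of each coefficient is preserved, so a strongly negative effect still sorts ahead of a weak positive one.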

fit(X, y)[source]

Fit the model to the data.

Parameters:
  • X (array-like, shape (n_samples, n_features)) – Training data.

  • y (structured array-like, shape (n_samples,)) – Target training values (events, times).

Returns:

self – Fitted estimator.

Return type:

CoxRegression

predict(X)[source]

Predict risk scores for the given data.

Parameters:

X (array-like, shape (n_samples, n_features)) – Input data.

Returns:

risk – Predicted risk scores.

Return type:

array-like, shape (n_samples,)

predict_cumulative_hazard_function(X, index, dataset, seed, plot=False)[source]

Predict the cumulative hazard function for the given data.

Parameters:
  • X (array-like, shape (n_samples, n_features)) – Input data.

  • index (array-like, shape (n_samples,)) – Index for the samples.

  • dataset (str) – Name of the dataset.

  • seed (int) – Random seed for reproducibility.

  • plot (bool, default = False) – Whether to plot the cumulative hazard function.

Returns:

cumulative_hazard_function – Predicted cumulative hazard function.

Return type:

array-like, shape (n_samples, n_times)

predict_survival_function(X, index, dataset, seed, plot=False)[source]

Predict the survival function for the given data.

Parameters:
  • X (array-like, shape (n_samples, n_features)) – Input data.

  • index (array-like, shape (n_samples,)) – Index for the samples.

  • dataset (str) – Name of the dataset.

  • seed (int) – Random seed for reproducibility.

  • plot (bool, default = False) – Whether to plot the survival function.

Returns:

survival_function – Predicted survival function.

Return type:

array-like, shape (n_samples, n_times)

class bsix.models.CoxRegressionWithTimeVarying(alpha=0.0, ties='breslow', n_iter=100)[source]

Bases: BaseSurvival

Cox Regression model with time-varying covariates.

Parameters:
  • alpha (float, default = 0.0) – Regularization strength.

  • ties (str, default = "breslow") – Method for handling tied event times, either "breslow" or "efron".

  • n_iter (int, default = 100) – Number of iterations for the Newton-Raphson algorithm.

coef_

Estimated coefficients for the model.

Type:

array-like, shape (n_features,)

breslow

Breslow estimator for baseline hazards.

Type:

BreslowEstimator

survival_function

Estimated survival function.

Type:

array-like, shape (n_samples, n_times)

cumulative_hazard_function

Estimated cumulative hazard function.

Type:

array-like, shape (n_samples, n_times)

shap_explainer

SHAP explainer for model interpretability.

Type:

shap.Explainer

Examples

from bsix.models.metodologies import CoxRegressionWithTimeVarying
model = CoxRegressionWithTimeVarying(alpha=0.1, ties="efron", n_iter=200)
model.fit(X_train, y_train)

calculate_xai(X, index, scaler, dataset, seed, feature_names, background=False, plot=False)[source]

Calculate XAI values.

Parameters:
  • X (array-like, shape (n_samples, n_features)) – Input data.

  • index (array-like, shape (n_samples,)) – Index for the samples.

  • scaler (object) – Scaler used for the data.

  • dataset (str) – Name of the dataset.

  • seed (int) – Random seed for reproducibility.

  • feature_names (list of str) – Names of the features.

  • background (bool, default = False) – Whether to use background data for SHAP.

  • plot (bool, default = False) – Whether to plot the XAI values.

Returns:

  • shap_explainer (shap.Explainer) – SHAP explainer for model interpretability.

  • coefficients (dict) – Dictionary of feature coefficients sorted by absolute value.

fit(X, y)[source]

Fit the model to the data.

Parameters:
  • X (array-like, shape (n_samples, n_features)) – Training data.

  • y (structured array-like, shape (n_samples,)) – Target training values (event, start times, stop times).

Returns:

self – Fitted estimator.

Return type:

CoxRegressionWithTimeVarying

predict(X)[source]

Predict risk scores for the given data.

Parameters:

X (array-like, shape (n_samples, n_features)) – Input data.

Returns:

risk – Predicted risk scores.

Return type:

array-like, shape (n_samples,)

predict_cumulative_hazard_function(X, index, dataset, seed, plot=False)[source]

Predict the cumulative hazard function for the given data.

Parameters:
  • X (array-like, shape (n_samples, n_features)) – Input data.

  • index (array-like, shape (n_samples,)) – Index for the samples.

  • dataset (str) – Name of the dataset.

  • seed (int) – Random seed for reproducibility.

  • plot (bool, default = False) – Whether to plot the cumulative hazard function.

Returns:

cumulative_hazard_function – Predicted cumulative hazard function.

Return type:

array-like, shape (n_samples, n_times)

predict_survival_function(X, index, dataset, seed, plot=False)[source]

Predict the survival function for the given data.

Parameters:
  • X (array-like, shape (n_samples, n_features)) – Input data.

  • index (array-like, shape (n_samples,)) – Index for the samples.

  • dataset (str) – Name of the dataset.

  • seed (int) – Random seed for reproducibility.

  • plot (bool, default = False) – Whether to plot the survival function.

Returns:

survival_function – Predicted survival function.

Return type:

array-like, shape (n_samples, n_times)

class bsix.models.DeepMultiTask(num_inputs, valid_data=None, hidden_layers=None, epochs=500, learn_rate=0.0, lr_decay=0.0, l1_reg=0.0, l2_reg=0.0, cox_reg=0.0, momentum=0.9, activation='relu', dropout=0.0, standardize=True, ties='cox', device=None, validation_frequency=10, patience=500, improvement_threshold=0.99999, patience_increase=25, logger=None, verbose=True, seed=None, coef_likelihood=[1.0])[source]

Bases: BaseSurvival

Deep Multi-Task model.

Parameters:
  • num_inputs (int) – Number of input features.

  • valid_data (dict, default = None) – Validation data in the form of a dictionary with keys “x”, “e”, and “t” for features, events, and times, respectively.

  • hidden_layers (list of int, default = None) – List specifying the number of units in each hidden layer.

  • epochs (int, default = 500) – Number of training epochs.

  • learn_rate (float, default = 0.0) – Learning rate for the optimizer.

  • lr_decay (float, default = 0.0) – Learning rate decay factor.

  • l1_reg (float, default = 0.0) – L1 regularization strength.

  • l2_reg (float, default = 0.0) – L2 regularization strength.

  • cox_reg (float, default = 0.0) – Coefficient for the Cox loss in the total loss function.

  • momentum (float, default = 0.9) – Momentum for the optimizer.

  • activation (str, default = "relu") – Activation function to use in the hidden layers: "relu", "selu", "tanh" or "sigmoid".

  • dropout (float, default = 0.0) – Dropout rate for regularization.

  • standardize (bool, default = True) – Whether to standardize input features.

  • ties (str, default = "cox") – Method for handling tied event times. "cox" or "breslow".

  • device (torch.device, default = None) – Device to run the model on (e.g., “cpu” or “cuda”).

  • validation_frequency (int, default = 10) – Frequency (in epochs) to perform validation.

  • patience (int, default = 500) – Number of epochs to wait for improvement before early stopping.

  • improvement_threshold (float, default = 0.99999) – Threshold for considering an improvement in validation loss.

  • patience_increase (int, default = 25) – Factor by which to increase patience when an improvement is observed.

  • logger (DeepSurvLogger, default = None) – Logger for tracking training progress.

  • verbose (bool, default = True) – Whether to print training progress.

  • seed (int, default = None) – Random seed for reproducibility.

  • coef_likelihood (list of float, default = [1.0]) – Coefficients for the likelihood loss of each progression in the total loss function.
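The interplay of validation_frequency, patience, improvement_threshold and patience_increase can be pictured with a common patience-based early-stopping loop. The sketch below is an assumption about how such parameters are typically combined, not the bsix training loop. The helper epochs_run is hypothetical, and its defaults are copied from the class signature.

```python
def epochs_run(validation_losses, validation_frequency=10, patience=500,
               improvement_threshold=0.99999, patience_increase=25):
    """Return the epoch at which training would stop for the given losses."""
    best = float("inf")
    for epoch, loss in enumerate(validation_losses):
        # Validate only every `validation_frequency` epochs.
        if epoch % validation_frequency == 0 and loss < best:
            # A large enough improvement extends the patience budget.
            if loss < best * improvement_threshold:
                patience = max(patience, epoch * patience_increase)
            best = loss
        if patience <= epoch:
            return epoch  # budget exhausted: stop early
    return len(validation_losses)


flat_losses = [1.0] * 1000
improving_losses = [1.0 / (i + 1) for i in range(200)]
stopped_flat = epochs_run(flat_losses, patience=50)
stopped_improving = epochs_run(improving_losses, patience=50, patience_increase=2)
```

With a flat validation loss the loop stops as soon as the initial patience is used up, whereas a steadily improving loss keeps extending the budget and training runs to the last epoch.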

survival_function

Estimated survival function.

Type:

array-like, shape (n_samples, n_times)

cumulative_hazard_function

Estimated cumulative hazard function.

Type:

array-like, shape (n_samples, n_times)

shap_explainer

SHAP explainer for model interpretability.

Type:

list of shap.Explainer

Examples

from bsix.models.metodologies import DeepMultiTask
model = DeepMultiTask(num_inputs=10, hidden_layers=[32,], epochs=200, learn_rate=0.01)
model.fit(X_train, y_train)

calculate_xai(X, index, scaler, dataset, seed, feature_names, background=False, plot=False)[source]

Calculate XAI values.

Parameters:
  • X (array-like, shape (n_samples, n_features)) – Input data.

  • index (array-like, shape (n_samples,)) – Index for the samples.

  • scaler (object) – Scaler used for the data.

  • dataset (str) – Name of the dataset.

  • seed (int) – Random seed for reproducibility.

  • feature_names (list of str) – Names of the features.

  • background (bool, default = False) – Whether to use background data for SHAP.

  • plot (bool, default = False) – Whether to plot the XAI values.

Returns:

shap_explainer – SHAP explainer for model interpretability.

Return type:

list of shap.Explainer, shape (n_progressions,)

fit(X_train, y_train, **kwargs)[source]

Fit the model to the data.

Parameters:
  • X_train (array-like, shape (n_progressions, n_samples, n_features)) – Training data.

  • y_train (structured array-like, shape (n_progressions, n_samples,)) – Target training values (events, times).

Returns:

self – Fitted estimator.

Return type:

DeepMultiTask

predict(x)[source]

Predict risk scores for the given data.

Parameters:

x (array-like, shape (n_progressions, n_samples, n_features)) – Input data.

Returns:

risk – Predicted risk scores.

Return type:

array-like, shape (n_progressions, n_samples,)

predict_cumulative_hazard_function(X, index, dataset, seed, plot=False)[source]

Predict the cumulative hazard function for the given data.

Parameters:
  • X (array-like, shape (n_samples, n_features)) – Input data.

  • index (array-like, shape (n_samples,)) – Index for the samples.

  • dataset (str) – Name of the dataset.

  • seed (int) – Random seed for reproducibility.

  • plot (bool, default = False) – Whether to plot the cumulative hazard function.

Returns:

cumulative_hazard_function – Predicted cumulative hazard functions.

Return type:

array-like, shape (n_progressions, n_samples, n_times)

predict_survival_function(X, index, dataset, seed, plot=False)[source]

Predict the survival function for the given data.

Parameters:
  • X (array-like, shape (n_samples, n_features)) – Input data.

  • index (array-like, shape (n_samples,)) – Index for the samples.

  • dataset (str) – Name of the dataset.

  • seed (int) – Random seed for reproducibility.

  • plot (bool, default = False) – Whether to plot the survival function.

Returns:

survival_function – Predicted survival functions.

Return type:

array-like, shape (n_progressions, n_samples, n_times)

set_fit_request(*, X_train: bool | None | str = '$UNCHANGED$', y_train: bool | None | str = '$UNCHANGED$') DeepMultiTask

Configure whether metadata should be requested to be passed to the fit method.

Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config()). Please check the User Guide on how the routing mechanism works.

The options for each parameter are:

  • True: metadata is requested, and passed to fit if provided. The request is ignored if metadata is not provided.

  • False: metadata is not requested and the meta-estimator will not pass it to fit.

  • None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.

  • str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in version 1.3.

Parameters:
  • X_train (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for X_train parameter in fit.

  • y_train (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for y_train parameter in fit.

Returns:

self – The updated object.

Return type:

object

set_predict_request(*, x: bool | None | str = '$UNCHANGED$') DeepMultiTask

Configure whether metadata should be requested to be passed to the predict method.

Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config()). Please check the User Guide on how the routing mechanism works.

The options for each parameter are:

  • True: metadata is requested, and passed to predict if provided. The request is ignored if metadata is not provided.

  • False: metadata is not requested and the meta-estimator will not pass it to predict.

  • None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.

  • str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in version 1.3.

Parameters:

x (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for x parameter in predict.

Returns:

self – The updated object.

Return type:

object

class bsix.models.DeepMultiTaskMultiLoss(num_inputs, valid_data=None, hidden_layers=None, epochs=500, learn_rate=0.0, lr_decay=0.0, l1_reg=0.0, l2_reg=0.0, cox_reg=0.0, bin_reg=0.0, momentum=0.9, activation='relu', dropout=0.0, standardize=True, ties='cox', device=None, validation_frequency=10, patience=500, improvement_threshold=0.99999, patience_increase=25, logger=None, verbose=True, seed=None, coef_likelihood=[1.0], coef_binary=[1.0])[source]

Bases: BaseSurvival

Deep Multi-Task Multi-Loss model.

Parameters:
  • num_inputs (int) – Number of input features.

  • valid_data (dict, default = None) – Validation data in the form of a dictionary with keys “x”, “e”, and “t” for features, events, and times, respectively.

  • hidden_layers (list of int, default = None) – List specifying the number of units in each hidden layer.

  • epochs (int, default = 500) – Number of training epochs.

  • learn_rate (float, default = 0.0) – Learning rate for the optimizer.

  • lr_decay (float, default = 0.0) – Learning rate decay factor.

  • l1_reg (float, default = 0.0) – L1 regularization strength.

  • l2_reg (float, default = 0.0) – L2 regularization strength.

  • cox_reg (float, default = 0.0) – Coefficient for the Cox loss in the total loss function.

  • bin_reg (float, default = 0.0) – Coefficient for the binary loss in the total loss function.

  • momentum (float, default = 0.9) – Momentum for the optimizer.

  • activation (str, default = "relu") – Activation function to use in the hidden layers: "relu", "selu", "tanh" or "sigmoid".

  • dropout (float, default = 0.0) – Dropout rate for regularization.

  • standardize (bool, default = True) – Whether to standardize input features.

  • ties (str, default = "cox") – Method for handling tied event times. "cox" or "breslow".

  • device (torch.device, default = None) – Device to run the model on (e.g., “cpu” or “cuda”).

  • validation_frequency (int, default = 10) – Frequency (in epochs) to perform validation.

  • patience (int, default = 500) – Number of epochs to wait for improvement before early stopping.

  • improvement_threshold (float, default = 0.99999) – Threshold for considering an improvement in validation loss.

  • patience_increase (int, default = 25) – Factor by which to increase patience when an improvement is observed.

  • logger (DeepSurvLogger, default = None) – Logger for tracking training progress.

  • verbose (bool, default = True) – Whether to print training progress.

  • seed (int, default = None) – Random seed for reproducibility.

  • coef_likelihood (list of float, default = [1.0]) – Coefficients for the likelihood loss of each progression in the total loss function.

  • coef_binary (list of float, default = [1.0]) – Coefficients for the binary loss of each progression in the total loss function.

survival_function

Estimated survival function.

Type:

array-like, shape (n_samples, n_times)

cumulative_hazard_function

Estimated cumulative hazard function.

Type:

array-like, shape (n_samples, n_times)

shap_explainer

SHAP explainer for model interpretability.

Type:

list of shap.Explainer

Examples

from bsix.models.metodologies import DeepMultiTaskMultiLoss
model = DeepMultiTaskMultiLoss(num_inputs=10, hidden_layers=[32,], epochs=200, learn_rate=0.01)
model.fit(X_train, y_train)

calculate_xai(X, index, scaler, dataset, seed, feature_names, background=False, plot=False)[source]

Calculate XAI values.

Parameters:
  • X (array-like, shape (n_samples, n_features)) – Input data.

  • index (array-like, shape (n_samples,)) – Index for the samples.

  • scaler (object) – Scaler used for the data.

  • dataset (str) – Name of the dataset.

  • seed (int) – Random seed for reproducibility.

  • feature_names (list of str) – Names of the features.

  • background (bool, default = False) – Whether to use background data for SHAP.

  • plot (bool, default = False) – Whether to plot the XAI values.

Returns:

shap_explainer – SHAP explainer for model interpretability.

Return type:

list of shap.Explainer, shape (n_progressions,)

fit(X_train, y_train, **kwargs)[source]

Fit the model to the data.

Parameters:
  • X_train (array-like, shape (n_progressions, n_samples, n_features)) – Training data.

  • y_train (structured array-like, shape (n_progressions, n_samples,)) – Target training values (events, times).

Returns:

self – Fitted estimator.

Return type:

DeepMultiTaskMultiLoss

predict(x)[source]

Predict risk scores for the given data.

Parameters:

x (array-like, shape (n_progressions, n_samples, n_features)) – Input data.

Returns:

risk – Predicted risk scores.

Return type:

array-like, shape (n_progressions, n_samples,)

predict_cumulative_hazard_function(X, index, dataset, seed, plot=False)[source]

Predict the cumulative hazard function for the given data.

Parameters:
  • X (array-like, shape (n_samples, n_features)) – Input data.

  • index (array-like, shape (n_samples,)) – Index for the samples.

  • dataset (str) – Name of the dataset.

  • seed (int) – Random seed for reproducibility.

  • plot (bool, default = False) – Whether to plot the cumulative hazard function.

Returns:

cumulative_hazard_function – Predicted cumulative hazard functions.

Return type:

array-like, shape (n_progressions, n_samples, n_times)

predict_survival_function(X, index, dataset, seed, plot=False)[source]

Predict the survival function for the given data.

Parameters:
  • X (array-like, shape (n_samples, n_features)) – Input data.

  • index (array-like, shape (n_samples,)) – Index for the samples.

  • dataset (str) – Name of the dataset.

  • seed (int) – Random seed for reproducibility.

  • plot (bool, default = False) – Whether to plot the survival function.

Returns:

survival_function – Predicted survival functions.

Return type:

array-like, shape (n_progressions, n_samples, n_times)

set_fit_request(*, X_train: bool | None | str = '$UNCHANGED$', y_train: bool | None | str = '$UNCHANGED$') DeepMultiTaskMultiLoss

Configure whether metadata should be requested to be passed to the fit method.

Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config()). Please check the User Guide on how the routing mechanism works.

The options for each parameter are:

  • True: metadata is requested, and passed to fit if provided. The request is ignored if metadata is not provided.

  • False: metadata is not requested and the meta-estimator will not pass it to fit.

  • None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.

  • str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in version 1.3.

Parameters:
  • X_train (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for X_train parameter in fit.

  • y_train (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for y_train parameter in fit.

Returns:

self – The updated object.

Return type:

object

set_predict_request(*, x: bool | None | str = '$UNCHANGED$') DeepMultiTaskMultiLoss

Configure whether metadata should be requested to be passed to the predict method.

Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config()). Please check the User Guide on how the routing mechanism works.

The options for each parameter are:

  • True: metadata is requested, and passed to predict if provided. The request is ignored if metadata is not provided.

  • False: metadata is not requested and the meta-estimator will not pass it to predict.

  • None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.

  • str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in version 1.3.

Parameters:

x (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for x parameter in predict.

Returns:

self – The updated object.

Return type:

object

set_score_request(*, x: bool | None | str = '$UNCHANGED$') DeepMultiTaskMultiLoss

Configure whether metadata should be requested to be passed to the score method.

Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config()). Please check the User Guide on how the routing mechanism works.

The options for each parameter are:

  • True: metadata is requested, and passed to score if provided. The request is ignored if metadata is not provided.

  • False: metadata is not requested and the meta-estimator will not pass it to score.

  • None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.

  • str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in version 1.3.

Parameters:

x (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for x parameter in score.

Returns:

self – The updated object.

Return type:

object

class bsix.models.DeepSurv(num_inputs, valid_data=None, hidden_layers=None, epochs=500, learn_rate=0.0, lr_decay=0.0, l1_reg=0.0, l2_reg=0.0, momentum=0.9, activation='relu', dropout=0.0, standardize=True, ties='cox', device=None, validation_frequency=10, patience=2000, improvement_threshold=0.99999, patience_increase=2, logger=None, verbose=True, seed=None)[source]

Bases: BaseSurvival

Deep Survival model.

Parameters:
  • num_inputs (int) – Number of input features.

  • valid_data (dict, default = None) – Validation data in the form of a dictionary with keys “x”, “e”, and “t” for features, events, and times, respectively.

  • hidden_layers (list of int, default = None) – List specifying the number of units in each hidden layer.

  • epochs (int, default = 500) – Number of training epochs.

  • learn_rate (float, default = 0.0) – Learning rate for the optimizer.

  • lr_decay (float, default = 0.0) – Learning rate decay factor.

  • l1_reg (float, default = 0.0) – L1 regularization strength.

  • l2_reg (float, default = 0.0) – L2 regularization strength.

  • momentum (float, default = 0.9) – Momentum for the optimizer.

  • activation (str, default = "relu") – Activation function to use in the hidden layers: "relu", "selu", "tanh" or "sigmoid".

  • dropout (float, default = 0.0) – Dropout rate for regularization.

  • standardize (bool, default = True) – Whether to standardize input features.

  • ties (str, default = "cox") – Method for handling tied event times. "cox" or "breslow".

  • device (torch.device, default = None) – Device to run the model on (e.g., “cpu” or “cuda”).

  • validation_frequency (int, default = 10) – Frequency (in epochs) to perform validation.

  • patience (int, default = 2000) – Number of epochs to wait for improvement before early stopping.

  • improvement_threshold (float, default = 0.99999) – Threshold for considering an improvement in validation loss.

  • patience_increase (int, default = 2) – Factor by which to increase patience when an improvement is observed.

  • logger (DeepSurvLogger, default = None) – Logger for tracking training progress.

  • verbose (bool, default = True) – Whether to print training progress.

  • seed (int, default = None) – Random seed for reproducibility.
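As an illustration of the valid_data argument, the sketch below builds a dictionary with the three documented keys from synthetic NumPy arrays. The array dtypes, the sample count, and the 1 = event / 0 = censored coding are assumptions made for the example; the source only fixes the key names.

```python
import numpy as np

rng = np.random.default_rng(0)
n_valid, n_features = 50, 10  # n_features is assumed to match num_inputs

# Keys "x", "e" and "t" follow the parameter description above;
# dtypes and event coding are assumptions for this sketch.
valid_data = {
    "x": rng.normal(size=(n_valid, n_features)).astype(np.float32),     # features
    "e": rng.integers(0, 2, size=n_valid).astype(np.int32),             # 1 = event, 0 = censored
    "t": rng.exponential(scale=10.0, size=n_valid).astype(np.float32),  # observed times
}
```

Such a dictionary would then be passed as valid_data when constructing the model.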

breslow

Breslow estimator for baseline hazards.

Type:

BreslowEstimator

survival_function

Estimated survival function.

Type:

array-like, shape (n_samples, n_times)

cumulative_hazard_function

Estimated cumulative hazard function.

Type:

array-like, shape (n_samples, n_times)

shap_explainer

SHAP explainer for model interpretability.

Type:

shap.Explainer

Examples

from bsix.models.metodologies import DeepSurv
model = DeepSurv(num_inputs=10, hidden_layers=[32,], epochs=200, learn_rate=0.01)
model.fit(X_train, y_train)

calculate_xai(X, index, scaler, dataset, seed, feature_names, background=False, plot=False)[source]

Calculate XAI values.

Parameters:
  • X (array-like, shape (n_samples, n_features)) – Input data.

  • index (array-like, shape (n_samples,)) – Index for the samples.

  • scaler (object) – Scaler used for the data.

  • dataset (str) – Name of the dataset.

  • seed (int) – Random seed for reproducibility.

  • feature_names (list of str) – Names of the features.

  • background (bool, default = False) – Whether to use background data for SHAP.

  • plot (bool, default = False) – Whether to plot the XAI values.

Returns:

shap_explainer – SHAP explainer for model interpretability.

Return type:

shap.Explainer

fit(X_train, y_train, **kwargs)[source]

Fit the model to the data.

Parameters:
  • X_train (array-like, shape (n_samples, n_features)) – Training data.

  • y_train (structured array-like, shape (n_samples,)) – Target training values (events, times).

Returns:

self – Fitted estimator.

Return type:

DeepSurv
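A minimal sketch of a structured target array holding (events, times), one record per sample. The field names "event" and "time" are assumptions for illustration; the source does not state which field names the estimator expects.

```python
import numpy as np

events = np.array([True, False, True, True])  # True = event observed, False = censored
times = np.array([5.0, 8.0, 2.5, 11.0])       # observed follow-up times

# One record per sample; the field names are hypothetical.
y_train = np.empty(len(events), dtype=[("event", bool), ("time", float)])
y_train["event"] = events
y_train["time"] = times
```

The resulting array has shape (n_samples,), matching the documented shape of y_train.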

predict(x)[source]

Predict risk scores for the given data.

Parameters:

x (array-like, shape (n_samples, n_features)) – Input data.

Returns:

risk – Predicted risk scores.

Return type:

array-like, shape (n_samples,)
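One common way to check risk scores such as these is a concordance index, which measures how often the subject with the higher score fails first. The function below is a plain NumPy sketch under two assumptions: a higher score means higher risk, and pairs with tied times are ignored. It is not taken from the library.

```python
import numpy as np

def concordance_index(times, events, risk):
    """Fraction of comparable pairs whose risk ordering matches the event ordering."""
    concordant, comparable = 0.0, 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a pair is comparable only if the earlier time is an event
        for j in range(n):
            if times[j] > times[i]:
                comparable += 1
                if risk[i] > risk[j]:
                    concordant += 1.0
                elif risk[i] == risk[j]:
                    concordant += 0.5  # tied scores count as half
    return concordant / comparable

times = np.array([2.0, 4.0, 6.0, 8.0])
events = np.array([True, True, False, True])
risk = np.array([0.9, 0.7, 0.4, 0.1])  # stand-in for the output of predict

c_index = concordance_index(times, events, risk)
```

Here every comparable pair is ordered correctly, so c_index is 1.0; negating the scores gives 0.0.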

predict_cumulative_hazard_function(X, index, dataset, seed, plot=False)[source]

Predict the cumulative hazard function for the given data.

Parameters:
  • X (array-like, shape (n_samples, n_features)) – Input data.

  • index (array-like, shape (n_samples,)) – Index for the samples.

  • dataset (str) – Name of the dataset.

  • seed (int) – Random seed for reproducibility.

  • plot (bool, default = False) – Whether to plot the cumulative hazard function.

Returns:

cumulative_hazard_function – Predicted cumulative hazard function.

Return type:

array-like, shape (n_samples, n_times)
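The base class documents the cumulative hazard as H(x, t) = H0(t) * exp(g(x)), and the class lists a Breslow estimator for the baseline. The sketch below shows that computation in plain NumPy, assuming the standard Breslow formula; the function name and toy data are hypothetical and this is not the library's implementation.

```python
import numpy as np

def breslow_baseline_cumulative_hazard(times, events, log_risk):
    """Breslow estimate of H0(t) at each distinct event time."""
    risk = np.exp(log_risk)
    event_times = np.unique(times[events])
    increments = np.empty(len(event_times))
    for k, t in enumerate(event_times):
        d = np.sum(events & (times == t))  # number of events at time t
        at_risk = risk[times >= t].sum()   # summed exp(g(x)) over the risk set
        increments[k] = d / at_risk
    return event_times, np.cumsum(increments)

times = np.array([1.0, 2.0, 3.0, 4.0])
events = np.array([True, True, False, True])
g_train = np.zeros(4)                      # stand-in for the network output g(x)

event_times, H0 = breslow_baseline_cumulative_hazard(times, events, g_train)

# H(x, t) = H0(t) * exp(g(x)) for two new samples, shape (n_samples, n_times)
g_new = np.array([0.0, np.log(2.0)])
H = H0[None, :] * np.exp(g_new)[:, None]
```

With all log-risks at zero, the baseline reduces to the Nelson-Aalen increments 1/4, 1/3 and 1, and the second sample's hazard is exactly twice the first's.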

predict_survival_function(X, index, dataset, seed, plot=False)[source]

Predict the survival function for the given data.

Parameters:
  • X (array-like, shape (n_samples, n_features)) – Input data.

  • index (array-like, shape (n_samples,)) – Index for the samples.

  • dataset (str) – Name of the dataset.

  • seed (int) – Random seed for reproducibility.

  • plot (bool, default = False) – Whether to plot the survival function.

Returns:

survival_function – Predicted survival function.

Return type:

array-like, shape (n_samples, n_times)
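The base class documents the survival function as S(x, t) = exp(-H(x, t)). As a small worked example on made-up cumulative hazard values:

```python
import numpy as np

# Hypothetical cumulative hazards, shape (n_samples, n_times)
H = np.array([[0.0, 0.1, 0.5],
              [0.0, 0.2, 1.0]])

# S(x, t) = exp(-H(x, t)), applied element-wise
S = np.exp(-H)
```

Because H is non-decreasing in t, each row of S starts at 1 and never increases.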

set_fit_request(*, X_train: bool | None | str = '$UNCHANGED$', y_train: bool | None | str = '$UNCHANGED$') DeepSurv

Configure whether metadata should be requested to be passed to the fit method.

Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config()). Please check the User Guide on how the routing mechanism works.

The options for each parameter are:

  • True: metadata is requested, and passed to fit if provided. The request is ignored if metadata is not provided.

  • False: metadata is not requested and the meta-estimator will not pass it to fit.

  • None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.

  • str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in version 1.3.

Parameters:
  • X_train (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for X_train parameter in fit.

  • y_train (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for y_train parameter in fit.

Returns:

self – The updated object.

Return type:

object

set_predict_request(*, x: bool | None | str = '$UNCHANGED$') DeepSurv

Configure whether metadata should be requested to be passed to the predict method.

Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config()). Please check the User Guide on how the routing mechanism works.

The options for each parameter are:

  • True: metadata is requested, and passed to predict if provided. The request is ignored if metadata is not provided.

  • False: metadata is not requested and the meta-estimator will not pass it to predict.

  • None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.

  • str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in version 1.3.

Parameters:

x (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for x parameter in predict.

Returns:

self – The updated object.

Return type:

object

class bsix.models.DeepTimeVarying(num_inputs, valid_data=None, hidden_layers=None, epochs=500, learn_rate=0.0, lr_decay=0.0, l1_reg=0.0, l2_reg=0.0, momentum=0.9, activation='relu', dropout=0.0, standardize=True, ties='cox', device=None, validation_frequency=10, patience=2000, improvement_threshold=0.99999, patience_increase=2, logger=None, verbose=True, seed=None)[source]

Bases: BaseSurvival

Deep Time-Varying model.

Parameters:
  • num_inputs (int) – Number of input features.

  • valid_data (dict, default = None) – Validation data in the form of a dictionary with keys “x”, “e”, and “t” for features, events, and times, respectively.

  • hidden_layers (list of int, default = None) – List specifying the number of units in each hidden layer.

  • epochs (int, default = 500) – Number of training epochs.

  • learn_rate (float, default = 0.0) – Learning rate for the optimizer.

  • lr_decay (float, default = 0.0) – Learning rate decay factor.

  • l1_reg (float, default = 0.0) – L1 regularization strength.

  • l2_reg (float, default = 0.0) – L2 regularization strength.

  • momentum (float, default = 0.9) – Momentum for the optimizer.

  • activation (str, default = "relu") – Activation function for the hidden layers: "relu", "selu", "tanh" or "sigmoid".

  • dropout (float, default = 0.0) – Dropout rate for regularization.

  • standardize (bool, default = True) – Whether to standardize input features.

  • ties (str, default = "cox") – Method for handling tied event times: "cox" or "breslow".

  • device (torch.device, default = None) – Device to run the model on (e.g., “cpu” or “cuda”).

  • validation_frequency (int, default = 10) – Number of epochs between validation passes.

  • patience (int, default = 2000) – Number of epochs to wait for improvement before early stopping.

  • improvement_threshold (float, default = 0.99999) – Threshold for considering an improvement in validation loss.

  • patience_increase (int, default = 2) – Factor by which to increase patience when an improvement is observed.

  • logger (DeepSurvLogger, default = None) – Logger for tracking training progress.

  • verbose (bool, default = True) – Whether to print training progress.

  • seed (int, default = None) – Random seed for reproducibility.
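The four early-stopping parameters (validation_frequency, patience, improvement_threshold, patience_increase) can be read as a patience-based schedule. The sketch below is one plausible interpretation, assuming a positive validation loss and a patience that is extended to epoch * patience_increase on each significant improvement. The function and the toy loss curve are hypothetical, and the library's actual loop may differ.

```python
def run_early_stopping(validation_loss, epochs=500, validation_frequency=10,
                       patience=2000, improvement_threshold=0.99999,
                       patience_increase=2):
    """Return (best_epoch, last_epoch) for a callable mapping epoch -> loss."""
    best_loss = float("inf")
    best_epoch = None
    for epoch in range(epochs):
        # (a real training loop would run one optimisation pass here)
        if epoch % validation_frequency == 0:
            loss = validation_loss(epoch)
            # Count only improvements beyond the relative threshold.
            if loss < best_loss * improvement_threshold:
                patience = max(patience, epoch * patience_increase)
                best_loss, best_epoch = loss, epoch
        if patience <= epoch:
            break  # patience exhausted: stop early
    return best_epoch, epoch

# Toy loss curve: improves until epoch 40, then stays flat at 1.0.
losses = {0: 5.0, 10: 4.0, 20: 3.0, 30: 2.0}
best_epoch, stopped_epoch = run_early_stopping(
    lambda epoch: losses.get(epoch, 1.0), patience=20
)
```

In this run the last improvement at epoch 40 extends patience to 80, so training stops at epoch 80 rather than running all 500 epochs.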

breslow

Breslow estimator for baseline hazards.

Type:

BreslowEstimator

survival_function

Estimated survival function.

Type:

array-like, shape (n_samples, n_times)

cumulative_hazard_function

Estimated cumulative hazard function.

Type:

array-like, shape (n_samples, n_times)

shap_explainer

SHAP explainer for model interpretability.

Type:

shap.Explainer

Examples

from bsix.models.metodologies import DeepTimeVarying
model = DeepTimeVarying(num_inputs=10, hidden_layers=[32,], epochs=200, learn_rate=0.01)
model.fit(X_train, y_train)

calculate_xai(X, index, scaler, dataset, seed, feature_names, background=False, plot=False)[source]

Calculate XAI values.

Parameters:
  • X (array-like, shape (n_samples, n_features)) – Input data.

  • index (array-like, shape (n_samples,)) – Index for the samples.

  • scaler (object) – Scaler used for the data.

  • dataset (str) – Name of the dataset.

  • seed (int) – Random seed for reproducibility.

  • feature_names (list of str) – Names of the features.

  • background (bool, default = False) – Whether to use background data for SHAP.

  • plot (bool, default = False) – Whether to plot the XAI values.

Returns:

shap_explainer – SHAP explainer for model interpretability.

Return type:

shap.Explainer

fit(X_train, y_train, **kwargs)[source]

Fit the model to the data.

Parameters:
  • X_train (array-like, shape (n_samples, n_features)) – Training data.

  • y_train (structured array-like, shape (n_samples,)) – Target training values (events, start times, stop times).

Returns:

self – Fitted estimator.

Return type:

DeepTimeVarying
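A sketch of how a start-stop target with (events, start times, stop times) might be assembled from repeated measurements per subject. The field names "event", "start" and "stop", the toy data, and the rule that an event is attributed to the subject's final interval are assumptions for illustration.

```python
import numpy as np

# Measurement times per subject, plus each subject's final time and event flag.
measurements = {"a": [0.0, 3.0, 7.0], "b": [0.0, 4.0]}
outcome = {"a": (10.0, True), "b": (6.0, False)}

rows = []
for identifier, starts in measurements.items():
    end_time, had_event = outcome[identifier]
    stops = starts[1:] + [end_time]  # each interval ends where the next begins
    for start, stop in zip(starts, stops):
        # The event, if any, is attributed to the subject's last interval only.
        rows.append((had_event and stop == end_time, start, stop))

# One record per interval; the field names are hypothetical.
y_train = np.array(rows, dtype=[("event", bool), ("start", float), ("stop", float)])
```

Each row of X_train would then hold the covariate values that apply during the matching interval.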

predict(x)[source]

Predict risk scores for the given data.

Parameters:

x (array-like, shape (n_samples, n_features)) – Input data.

Returns:

risk – Predicted risk scores.

Return type:

array-like, shape (n_samples,)

predict_cumulative_hazard_function(X, index, dataset, seed, plot=False)[source]

Predict the cumulative hazard function for the given data.

Parameters:
  • X (array-like, shape (n_samples, n_features)) – Input data.

  • index (array-like, shape (n_samples,)) – Index for the samples.

  • dataset (str) – Name of the dataset.

  • seed (int) – Random seed for reproducibility.

  • plot (bool, default = False) – Whether to plot the cumulative hazard function.

Returns:

cumulative_hazard_function – Predicted cumulative hazard function.

Return type:

array-like, shape (n_samples, n_times)

predict_survival_function(X, index, dataset, seed, plot=False)[source]

Predict the survival function for the given data.

Parameters:
  • X (array-like, shape (n_samples, n_features)) – Input data.

  • index (array-like, shape (n_samples,)) – Index for the samples.

  • dataset (str) – Name of the dataset.

  • seed (int) – Random seed for reproducibility.

  • plot (bool, default = False) – Whether to plot the survival function.

Returns:

survival_function – Predicted survival function.

Return type:

array-like, shape (n_samples, n_times)

set_fit_request(*, X_train: bool | None | str = '$UNCHANGED$', y_train: bool | None | str = '$UNCHANGED$') DeepTimeVarying

Configure whether metadata should be requested to be passed to the fit method.

Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config()). Please check the User Guide on how the routing mechanism works.

The options for each parameter are:

  • True: metadata is requested, and passed to fit if provided. The request is ignored if metadata is not provided.

  • False: metadata is not requested and the meta-estimator will not pass it to fit.

  • None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.

  • str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in version 1.3.

Parameters:
  • X_train (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for X_train parameter in fit.

  • y_train (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for y_train parameter in fit.

Returns:

self – The updated object.

Return type:

object

set_predict_request(*, x: bool | None | str = '$UNCHANGED$') DeepTimeVarying

Configure whether metadata should be requested to be passed to the predict method.

Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config()). Please check the User Guide on how the routing mechanism works.

The options for each parameter are:

  • True: metadata is requested, and passed to predict if provided. The request is ignored if metadata is not provided.

  • False: metadata is not requested and the meta-estimator will not pass it to predict.

  • None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.

  • str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in version 1.3.

Parameters:

x (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for x parameter in predict.

Returns:

self – The updated object.

Return type:

object

class bsix.models.RandomSurvForest(seed, n_jobs=-1, n_estimators=100, max_depth=None, min_samples_leaf=3, min_samples_split=6)[source]

Bases: BaseSurvival

Random Survival Forest model.

Parameters:
  • seed (int) – Random seed for reproducibility.

  • n_jobs (int, default = -1) – Number of jobs to run in parallel.

  • n_estimators (int, default = 100) – The number of trees in the forest.

  • max_depth (int, default = None) – The maximum depth of the tree.

  • min_samples_leaf (int, default = 3) – The minimum number of samples required to be at a leaf node.

  • min_samples_split (int, default = 6) – The minimum number of samples required to split an internal node.

survival_function

Estimated survival function.

Type:

array-like, shape (n_samples, n_times)

cumulative_hazard_function

Estimated cumulative hazard function.

Type:

array-like, shape (n_samples, n_times)

shap_explainer

SHAP explainer for model interpretability.

Type:

shap.Explainer

Examples

from bsix.models.metodologies import RandomSurvForest
model = RandomSurvForest(seed=42, n_estimators=100, max_depth=5)
model.fit(X_train, y_train)

calculate_xai(X, index, scaler, dataset, seed, feature_names, background=False, plot=False)[source]

Calculate XAI values.

Parameters:
  • X (array-like, shape (n_samples, n_features)) – Input data.

  • index (array-like, shape (n_samples,)) – Index for the samples.

  • scaler (object) – Scaler used for the data.

  • dataset (str) – Name of the dataset.

  • seed (int) – Random seed for reproducibility.

  • feature_names (list of str) – Names of the features.

  • background (bool, default = False) – Whether to use background data for SHAP.

  • plot (bool, default = False) – Whether to plot the XAI values.

Returns:

shap_explainer – SHAP explainer for model interpretability.

Return type:

shap.Explainer

fit(X, y)[source]

Fit the model to the data.

Parameters:
  • X (array-like, shape (n_samples, n_features)) – Training data.

  • y (structured array-like, shape (n_samples,)) – Target training values (events, times).

Returns:

self – Fitted estimator.

Return type:

RandomSurvForest

predict(X)[source]

Predict risk scores for the given data.

Parameters:

X (array-like, shape (n_samples, n_features)) – Input data.

Returns:

risk – Predicted risk scores.

Return type:

array-like, shape (n_samples,)

predict_cumulative_hazard_function(X, index, dataset, seed, plot=False)[source]

Predict the cumulative hazard function for the given data.

Parameters:
  • X (array-like, shape (n_samples, n_features)) – Input data.

  • index (array-like, shape (n_samples,)) – Index for the samples.

  • dataset (str) – Name of the dataset.

  • seed (int) – Random seed for reproducibility.

  • plot (bool, default = False) – Whether to plot the cumulative hazard function.

Returns:

cumulative_hazard_function – Predicted cumulative hazard function.

Return type:

array-like, shape (n_samples, n_times)
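Random survival forests typically estimate the cumulative hazard with a Nelson-Aalen estimator in each terminal node and average the result across trees. The sketch below assumes that scheme, which the source does not confirm for this class, and uses hypothetical helper names and toy leaf data.

```python
import numpy as np

def nelson_aalen(times, events):
    """Nelson-Aalen cumulative hazard at each distinct event time."""
    event_times = np.unique(times[events])
    increments = np.array([
        np.sum(events & (times == t)) / np.sum(times >= t)  # events / at risk
        for t in event_times
    ])
    return event_times, np.cumsum(increments)

def step_eval(event_times, values, grid):
    """Evaluate a right-continuous step function (0 before the first event) on grid."""
    idx = np.searchsorted(event_times, grid, side="right")
    return np.concatenate(([0.0], values))[idx]

# Samples falling into the matching leaf of two hypothetical trees.
t1, e1 = np.array([1.0, 2.0, 2.0, 3.0, 5.0]), np.array([True, True, False, True, False])
t2, e2 = np.array([2.0, 4.0]), np.array([True, True])

grid = np.array([1.0, 2.0, 3.0, 4.0])
chf_tree1 = step_eval(*nelson_aalen(t1, e1), grid)
chf_tree2 = step_eval(*nelson_aalen(t2, e2), grid)

# Ensemble estimate: average over trees on the common time grid.
chf_forest = np.mean([chf_tree1, chf_tree2], axis=0)
```

Evaluating both leaf curves on a shared grid is what makes the average well defined, since the trees generally have different event times.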

predict_survival_function(X, index, dataset, seed, plot=False)[source]

Predict the survival function for the given data.

Parameters:
  • X (array-like, shape (n_samples, n_features)) – Input data.

  • index (array-like, shape (n_samples,)) – Index for the samples.

  • dataset (str) – Name of the dataset.

  • seed (int) – Random seed for reproducibility.

  • plot (bool, default = False) – Whether to plot the survival function.

Returns:

survival_function – Predicted survival function.

Return type:

array-like, shape (n_samples, n_times)

class bsix.models.SurvTree(max_depth=None, min_samples_split=6, min_samples_leaf=3, seed=0)[source]

Bases: BaseSurvival

Survival Tree model.

calculate_xai(X, index, scaler, dataset, seed, feature_names, background=False, plot=False)[source]

Calculate XAI values.

Parameters:
  • X (array-like, shape (n_samples, n_features)) – Input data.

  • index (array-like, shape (n_samples,)) – Index for the samples.

  • scaler (object) – Scaler used for the data.

  • dataset (str) – Name of the dataset.

  • seed (int) – Random seed for reproducibility.

  • feature_names (list of str) – Names of the features.

  • background (bool, default = False) – Whether to use background data for SHAP.

  • plot (bool, default = False) – Whether to plot the XAI values.

Returns:

shap_explainer – SHAP explainer for model interpretability.

Return type:

shap.Explainer

fit(X, y)[source]

Fit the model to the data.

Parameters:
  • X (array-like, shape (n_samples, n_features)) – Training data.

  • y (structured array-like, shape (n_samples,)) – Target training values (events, times).

Returns:

self – Fitted estimator.

Return type:

SurvTree

predict(X)[source]

Predict risk scores for the given data.

Parameters:

X (array-like, shape (n_samples, n_features)) – Input data.

Returns:

risk – Predicted risk scores.

Return type:

array-like, shape (n_samples,)

predict_cumulative_hazard_function(X, index, dataset, seed, plot=False)[source]

Predict the cumulative hazard function for the given data.

Parameters:
  • X (array-like, shape (n_samples, n_features)) – Input data.

  • index (array-like, shape (n_samples,)) – Index for the samples.

  • dataset (str) – Name of the dataset.

  • seed (int) – Random seed for reproducibility.

  • plot (bool, default = False) – Whether to plot the cumulative hazard function.

Returns:

cumulative_hazard_function – Predicted cumulative hazard function.

Return type:

array-like, shape (n_samples, n_times)

predict_survival_function(X, index, dataset, seed, plot=False)[source]

Predict the survival function for the given data.

Parameters:
  • X (array-like, shape (n_samples, n_features)) – Input data.

  • index (array-like, shape (n_samples,)) – Index for the samples.

  • dataset (str) – Name of the dataset.

  • seed (int) – Random seed for reproducibility.

  • plot (bool, default = False) – Whether to plot the survival function.

Returns:

survival_function – Predicted survival function.

Return type:

array-like, shape (n_samples, n_times)
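For a single survival tree, a common choice is to predict the Kaplan-Meier curve of the training samples in the leaf that a new sample reaches. The sketch below shows that estimator in NumPy under this assumption; it is not confirmed by the source, and the function name and data are hypothetical.

```python
import numpy as np

def kaplan_meier(times, events):
    """Kaplan-Meier survival probability at each distinct event time."""
    event_times = np.unique(times[events])
    factors = np.array([
        1.0 - np.sum(events & (times == t)) / np.sum(times >= t)  # 1 - d / n
        for t in event_times
    ])
    return event_times, np.cumprod(factors)

# Training samples that ended up in one hypothetical leaf.
leaf_times = np.array([1.0, 2.0, 2.0, 3.0, 5.0])
leaf_events = np.array([True, True, False, True, False])

event_times, survival = kaplan_meier(leaf_times, leaf_events)
```

The survival probabilities at the event times 1, 2 and 3 are 0.8, 0.6 and 0.3: each event multiplies the curve by one minus the fraction of at-risk samples that fail.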