Detect outliers among the areas under the curve (see compute_aucs()) and the slopes (see compute_slopes()) computed over several intervals from a model fitted by time_model(). For details, see the iqr and zscore methods of performance::check_outliers().

Usage

compute_outliers(
  fit,
  method,
  period = c(0, 0.5, 1.5, 3.5, 6.5, 10, 12, 17),
  knots = list(cubic_slope = NULL, linear_splines = c(0.75, 5.5, 11), cubic_splines =
    c(1, 8, 12))[[method]],
  from = c("predicted", "observed"),
  start = 0.25,
  end = 10,
  step = 0.01,
  filter = NULL,
  outlier_method = "iqr",
  outlier_threshold = list(iqr = 2)
)

Arguments

fit

A fitted model object, such as one returned by a call to time_model().

method

The type of model provided in fit, i.e., one of "cubic_slope", "linear_splines" or "cubic_splines".

period

The knots delimiting the intervals on which slopes and AUCs are to be computed.

knots

The knots as defined in fit and according to method.

from

A string indicating the type of data to be used for the adiposity peak (AP) and adiposity rebound (AR) computation, either "predicted" or "observed". Default is "predicted".

start

The start of the time window to compute AP and AR.

end

The end of the time window to compute AP and AR.

step

The step to increment the sequence.
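
To make the roles of start, end and step concrete, the following minimal sketch builds an evenly spaced age grid from the default values in Usage. It assumes the grid is generated with base R's seq(); that is an assumption for illustration, not a statement about the package internals, and age_grid is a hypothetical name.

```r
# Default values taken from the Usage section above.
start <- 0.25
end <- 10
step <- 0.01

# Assumption: an evenly spaced grid of ages between start and end.
age_grid <- seq(from = start, to = end, by = step)

range(age_grid)   # first and last grid points
length(age_grid)  # number of ages at which values would be evaluated
```

A smaller step gives a finer grid, and therefore more evaluation points, at the cost of more computation.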

filter

A string following data.table syntax for filtering on "i" (i.e., row elements), e.g., filter = "source == 'A'". This argument is passed through to compute_apar() (see predict_bmi()). Default is NULL.
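
As an illustration of what a filter string in data.table "i" syntax means, the sketch below parses such a string and uses it to subset rows. The toy table, its column names and the variable filter_string are made up for this example, and the parse/eval idiom is one common way to apply a string filter, not necessarily how compute_apar() does it.

```r
library(data.table)

# Toy data; IDs and the `source` column are hypothetical.
dt <- data.table(
  ID = c("001", "002", "003"),
  source = c("A", "B", "A")
)

# Same form as the `filter` argument described above.
filter_string <- "source == 'A'"

# Parse the string into an expression and evaluate it as the `i` argument,
# which keeps only the rows where the condition is TRUE.
subset_dt <- dt[eval(parse(text = filter_string))]
subset_dt[["ID"]]
```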

outlier_method

The outlier detection method(s). Default is "iqr". Can be "cook", "pareto", "zscore", "zscore_robust", "iqr", "ci", "eti", "hdi", "bci", "mahalanobis", "mahalanobis_robust", "mcd", "ics", "optics" or "lof". See performance::check_outliers() https://easystats.github.io/performance/reference/check_outliers.html for details.

outlier_threshold

A list containing the threshold values for each method (e.g., list('mahalanobis' = 7, 'cook' = 1)), above which an observation is considered an outlier. If NULL, default values are used (see 'Details' in performance::check_outliers()). If a single numeric value is given, it is used as the threshold for every method that is run. See performance::check_outliers() https://easystats.github.io/performance/reference/check_outliers.html for details.
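
To give an intuition for the default settings (outlier_method = "iqr" with a threshold of 2), here is a simplified base R sketch of a Tukey-style interquartile range rule. It is not the implementation used by performance::check_outliers(), whose distance computation may differ; flag_iqr_outliers() and the slopes vector are hypothetical and exist only for this example.

```r
# Simplified IQR rule: flag values lying more than `threshold` times the
# interquartile range below the first quartile or above the third quartile.
# Illustration only; not the performance package's code.
flag_iqr_outliers <- function(x, threshold = 2) {
  q <- stats::quantile(x, probs = c(0.25, 0.75), na.rm = TRUE, names = FALSE)
  iqr <- q[2] - q[1]
  x < q[1] - threshold * iqr | x > q[2] + threshold * iqr
}

# Made-up values standing in for one derived parameter across individuals.
slopes <- c(10, 11, 12, 13, 14, 15, 40)

flagged <- flag_iqr_outliers(slopes, threshold = 2)
slopes[flagged]
```

Raising the threshold widens the fences and flags fewer observations; lowering it does the opposite.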

Value

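A data.frame listing, for each parameter and individual, the distance and outlier flag for each selected method, together with an overall Outlier column. Rows with Outlier equal to 0 are individuals not flagged as outliers.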

Examples

data("bmigrowth")
ls_mod <- time_model(
  x = "age",
  y = "log(bmi)",
  cov = NULL,
  data = bmigrowth[bmigrowth[["sex"]] == 0, ],
  method = "cubic_splines"
)
#> nlme::lme(
#>   fixed = log(bmi) ~ gsp(age, knots = c(1, 8, 12), degree = rep(3, 4), smooth = rep(2, 3)),
#>   data = data,
#>   random = ~ gsp(age, knots = c(1, 8, 12), degree = rep(3, 4), smooth = rep(2, 3)) | ID,
#>   na.action = stats::na.omit,
#>   method = "ML",
#>   control = nlme::lmeControl(opt = "optim", maxIter = 500, msMaxIter = 500)
#> )
head(compute_outliers(
  fit = ls_mod,
  method = "cubic_splines",
  period = c(0, 0.5, 1.5, 3.5, 6.5, 10, 12, 17)
  # `knots` is left at its default, i.e.:
  # knots = list(
  #   "cubic_slope" = NULL,
  #   "linear_splines" = c(0.75, 5.5, 11),
  #   "cubic_splines" = c(1, 8, 12)
  # )[[method]]
)[Outlier != 0])
#>       parameter     ID   Row Distance_IQR Outlier_IQR Outlier
#>          <char> <char> <int>        <num>       <num>   <num>
#> 1: slope_0--0.5    007     5     1.302187           1       1
#> 2:  AR_ageyears    039    15     1.511658           1       1
#> 3:       AP_bmi    044    16     1.650865           1       1