uqregressors.metrics.metrics

This module provides metrics for comparing the predictive accuracy and the uncertainty quality of different models. Each function is documented below, followed by its source code.

RMSCD(lower, upper, y_true, alpha, n_bins=10)

Computes the Root Mean Square Coverage Deviation (RMSCD) evaluated over a given number of bins (see group_conditional_coverage).

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| lower | Union[ndarray, Tensor] | The lower bound predictions made by the model, should be able to be flattened to 1 dimension. | required |
| upper | Union[ndarray, Tensor] | The upper bound predictions made by the model, should be the same shape as lower. | required |
| y_true | Union[ndarray, Tensor] | The targets to compare against, should be the same shape as lower. | required |
| alpha | float | 1 - confidence, should be a float between 0 and 1. | required |
| n_bins | int | The number of bins to divide the outputs into. | 10 |

Returns:

| Type | Description |
| --- | --- |
| float | The root mean square deviation of the per-bin coverage from the nominal coverage, 1 - alpha. |

Source code in uqregressors\metrics\metrics.py
def RMSCD(lower, upper, y_true, alpha, n_bins=10): 
    """
    Computes the Root Mean Square Coverage Deviation (RMSCD) evaluated over a given number of bins (see group_conditional_coverage).

    Args: 
        lower (Union[np.ndarray, torch.Tensor]): The lower bound predictions made by the model, should be able to be flattened to 1 dimension.
        upper (Union[np.ndarray, torch.Tensor]): The upper bound predictions made by the model, should be the same shape as lower.
        y_true (Union[np.ndarray, torch.Tensor]): The targets to compare against, should be the same shape as lower.
        alpha (float): 1 - confidence, should be a float between 0 and 1. 
        n_bins (int): The number of bins to divide the outputs into.

    Returns: 
        (float): The root mean square deviation of the per-bin coverage from the nominal coverage, 1 - alpha.
    """
    _, lower, upper, y_true, alpha = validate_inputs(lower, lower, upper, y_true, alpha)
    gcc = group_conditional_coverage(lower, upper, y_true, n_bins)
    return np.sqrt(np.mean((gcc["bin_coverages"] - (1-alpha)) ** 2))
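
As a worked illustration of the formula in the return statement, the sketch below recomputes the RMSCD from a hand-picked list of per-bin coverages. It is plain Python and does not call the library; the helper name `rmscd_from_bin_coverages` and the coverage values are assumptions made up for this example.

```python
import math

def rmscd_from_bin_coverages(bin_coverages, alpha):
    """Root mean square deviation of per-bin coverage from the nominal level 1 - alpha."""
    nominal = 1.0 - alpha
    squared_devs = [(c - nominal) ** 2 for c in bin_coverages]
    return math.sqrt(sum(squared_devs) / len(squared_devs))

# Hypothetical coverages of four bins for a nominal 90% interval (alpha = 0.1).
bin_coverages = [0.95, 0.85, 0.90, 0.80]
value = rmscd_from_bin_coverages(bin_coverages, alpha=0.1)
print(round(value, 4))  # approximately 0.0612
```

Both over-covered and under-covered bins contribute here, so a model can score poorly on RMSCD even when its marginal coverage matches the nominal level.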

RMSCD_under(lower, upper, y_true, alpha, n_bins=10)

Computes the Root Mean Square Coverage Deviation (RMSCD) evaluated only over bins which do not meet nominal coverage (see RMSCD, group_conditional_coverage).

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| lower | Union[ndarray, Tensor] | The lower bound predictions made by the model, should be able to be flattened to 1 dimension. | required |
| upper | Union[ndarray, Tensor] | The upper bound predictions made by the model, should be the same shape as lower. | required |
| y_true | Union[ndarray, Tensor] | The targets to compare against, should be the same shape as lower. | required |
| alpha | float | 1 - confidence, should be a float between 0 and 1. | required |
| n_bins | int | The number of bins to divide the outputs into. | 10 |

Returns:

| Type | Description |
| --- | --- |
| float | The root mean square deviation from the nominal coverage, 1 - alpha, computed only over bins whose coverage falls below it; 0.0 if no bin undercovers. |

Source code in uqregressors\metrics\metrics.py
def RMSCD_under(lower, upper, y_true, alpha, n_bins=10):
    """
    Computes the Root Mean Square Coverage Deviation (RMSCD) evaluated only over bins which do not meet nominal coverage (see RMSCD, group_conditional_coverage).

    Args: 
        lower (Union[np.ndarray, torch.Tensor]): The lower bound predictions made by the model, should be able to be flattened to 1 dimension.
        upper (Union[np.ndarray, torch.Tensor]): The upper bound predictions made by the model, should be the same shape as lower.
        y_true (Union[np.ndarray, torch.Tensor]): The targets to compare against, should be the same shape as lower.
        alpha (float): 1 - confidence, should be a float between 0 and 1. 
        n_bins (int): The number of bins to divide the outputs into.

    Returns: 
        (float): The root mean square deviation from the nominal coverage, 1 - alpha, computed only over bins whose coverage falls below it; 0.0 if no bin undercovers.
    """
    _, lower, upper, y_true, alpha = validate_inputs(lower, lower, upper, y_true, alpha)
    gcc = group_conditional_coverage(lower, upper, y_true, n_bins)
    miscovered_bins = gcc["bin_coverages"][gcc["bin_coverages"] < (1-alpha)]
    if len(miscovered_bins) == 0: 
        rmscd = 0.0
    else: 
        rmscd = np.sqrt(np.mean((miscovered_bins - (1-alpha)) ** 2))
    return rmscd
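
The difference from RMSCD is the filter on under-covered bins and the 0.0 fallback when there are none. A minimal plain-Python sketch of that logic follows; `rmscd_under_from_bin_coverages` and the coverage values are hypothetical, chosen only to illustrate the two branches.

```python
import math

def rmscd_under_from_bin_coverages(bin_coverages, alpha):
    """RMSCD restricted to bins whose coverage is below the nominal level 1 - alpha."""
    nominal = 1.0 - alpha
    under = [c for c in bin_coverages if c < nominal]
    if not under:
        return 0.0  # no bin undercovers, so there is nothing to penalize
    return math.sqrt(sum((c - nominal) ** 2 for c in under) / len(under))

# Two of the four hypothetical bins (0.85 and 0.80) fall below the nominal 0.90.
mixed = rmscd_under_from_bin_coverages([0.95, 0.85, 0.92, 0.80], alpha=0.1)
# Every bin meets nominal coverage, so the result is exactly 0.0.
all_covered = rmscd_under_from_bin_coverages([0.95, 0.92, 0.91], alpha=0.1)
print(round(mixed, 4), all_covered)
```

Over-covered bins are ignored entirely, so this metric does not penalize intervals that are wider than necessary.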

average_interval_width(lower, upper, **kwargs)

Computes the average interval width (distance between the predicted upper and lower bounds).

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| lower | Union[ndarray, Tensor] | The lower bound predictions made by the model, should be able to be flattened to 1 dimension. | required |
| upper | Union[ndarray, Tensor] | The upper bound predictions made by the model, should be the same shape as lower. | required |

Returns:

| Type | Description |
| --- | --- |
| float | Average distance between the upper and lower bound. |

Source code in uqregressors\metrics\metrics.py
def average_interval_width(lower, upper, **kwargs):
    """
    Computes the average interval width (distance between the predicted upper and lower bounds). 

    Args:
        lower (Union[np.ndarray, torch.Tensor]): The lower bound predictions made by the model, should be able to be flattened to 1 dimension.
        upper (Union[np.ndarray, torch.Tensor]): The upper bound predictions made by the model, should be the same shape as lower.

    Returns: 
        (float): Average distance between the upper and lower bound.
    """
    _, lower, upper, _, _ = validate_inputs(lower, lower, upper, lower)
    return np.mean(upper - lower)

compute_all_metrics(mean, lower, upper, y_true, alpha, n_bins=10, excluded_metrics=['group_conditional_coverage'])

Computes all standard uncertainty quantification metrics and returns them as a dictionary.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| mean | Union[Tensor, ndarray] | The mean predictions to compute metrics for, should be able to be flattened to one dimension. | required |
| lower | Union[ndarray, Tensor] | The lower bound predictions made by the model, should be able to be flattened to 1 dimension. | required |
| upper | Union[ndarray, Tensor] | The upper bound predictions made by the model, should be the same shape as mean. | required |
| y_true | Union[ndarray, Tensor] | The targets to compare against, should be the same shape as mean. | required |
| alpha | float | 1 - confidence, should be a float between 0 and 1. | required |
| n_bins | int | The number of bins to divide the outputs into for conditional coverage metrics. | 10 |
| excluded_metrics | list | The keys of any metrics to exclude from the returned dictionary. | ['group_conditional_coverage'] |

Returns:

| Type | Description |
| --- | --- |
| dict | Dictionary containing the following metrics, except those named in excluded_metrics. |

rmse (float): Root Mean Square Error.

coverage (float): Marginal coverage.

average_interval_width (float): Average distance between upper and lower bound predictions.

interval_score (float): Interval score between predictions and data.

nll_gaussian (float): Average Negative Log Likelihood of data given predictions under Gaussian assumption.

error_width_corr (float): Pearson correlation coefficient between true errors and predicted interval width.

group_conditional_coverage (dict): Dictionary containing the mean and coverage of each bin when the outputs are split between several bins.

RMSCD (float): Root mean square coverage deviation between the coverage conditional on output bin and the nominal coverage.

RMSCD_under (float): Root mean square coverage deviation for all bins which undercover compared to nominal coverage.

lowest_group_coverage (float): The lowest coverage of any bin into which the outputs were binned.

Source code in uqregressors\metrics\metrics.py
def compute_all_metrics(mean, lower, upper, y_true, alpha, n_bins=10, excluded_metrics=["group_conditional_coverage"]):
    """
    Compute all standard uncertainty quantification metrics and return them as a dictionary.

    Args: 
        mean (Union[torch.Tensor, np.ndarray]): The mean predictions to compute metrics for, should be able to be flattened to one dimension.
        lower (Union[np.ndarray, torch.Tensor]): The lower bound predictions made by the model, should be able to be flattened to 1 dimension. 
        upper (Union[np.ndarray, torch.Tensor]): The upper bound predictions made by the model, should be the same shape as mean.
        y_true (Union[np.ndarray, torch.Tensor]): The targets to compare against, should be the same shape as mean.
        alpha (float): 1 - confidence, should be a float between 0 and 1. 
        n_bins (int): The number of bins to divide the outputs into for conditional coverage metrics. 
        excluded_metrics (list): The key of any metrics to exclude from being returned.

    Returns: 
        (dict): dictionary containing the following metrics, except those named in excluded_metrics.

            rmse (float): Root Mean Square Error. 

            coverage (float): Marginal coverage. 

            average_interval_width (float): Average distance between upper and lower bound predictions.

            interval_score (float): Interval score between predictions and data. 

            nll_gaussian (float): Average Negative Log Likelihood of data given predictions under Gaussian assumption.

            error_width_corr (float): Pearson correlation coefficient between true errors and predicted interval width. 

            group_conditional_coverage (dict): Dictionary containing the mean and coverage of each bin when the outputs are split between several bins.

            RMSCD (float): Root mean square coverage deviation between the coverage conditional on output bin and the nominal coverage.

            RMSCD_under (float): Root mean square coverage deviation for all bins which undercover compared to nominal coverage.

            lowest_group_coverage (float): The lowest coverage of any bin into which the outputs were binned. 
    """

    mean, lower, upper, y_true, alpha = validate_inputs(mean, lower, upper, y_true, alpha)

    metrics_dict = {
        "rmse": rmse(mean, y_true, alpha=alpha),
        "coverage": coverage(lower, upper, y_true, alpha=alpha),
        "average_interval_width": average_interval_width(lower, upper, alpha=alpha),
        "interval_score": interval_score(lower, upper, y_true, alpha),
        "nll_gaussian": nll_gaussian(mean, lower, upper, y_true, alpha),
        "error_width_corr": error_width_corr(mean, lower, upper, y_true), 
        "group_conditional_coverage": group_conditional_coverage(lower, upper, y_true, n_bins),
        "RMSCD": RMSCD(lower, upper, y_true, alpha, n_bins),
        "RMSCD_under": RMSCD_under(lower, upper, y_true, alpha, n_bins),
        "lowest_group_coverage": lowest_group_coverage(lower, upper, y_true, n_bins)
    }

    return_dict = {}
    for metric, value in metrics_dict.items(): 
        if metric not in excluded_metrics: 
            return_dict[metric] = value 

    return return_dict
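
The final loop is an ordinary dictionary filter on the metric keys. The sketch below reproduces only that step on a stand-in dictionary with made-up values, so it runs without the library; `filter_metrics` is a hypothetical helper and not part of uqregressors.

```python
def filter_metrics(metrics, excluded_metrics=("group_conditional_coverage",)):
    """Return a copy of metrics without the keys listed in excluded_metrics."""
    return {name: value for name, value in metrics.items()
            if name not in excluded_metrics}

# Stand-in results with made-up values; the keys mirror those built in compute_all_metrics.
all_metrics = {
    "rmse": 0.42,
    "coverage": 0.91,
    "group_conditional_coverage": {
        "y_true_bin_means": [1.5, 3.5],
        "bin_coverages": [0.90, 0.92],
    },
    "RMSCD": 0.03,
}

default_view = filter_metrics(all_metrics)                       # drops the nested dict
everything = filter_metrics(all_metrics, excluded_metrics=())    # keeps every key
print(sorted(default_view))
```

By the same logic, passing an empty list as excluded_metrics to the function above keeps the group_conditional_coverage dictionary in the result.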

coverage(lower, upper, y_true, **kwargs)

Computes the coverage as a float between 0 and 1.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| lower | Union[ndarray, Tensor] | The lower bound predictions made by the model, should be able to be flattened to 1 dimension. | required |
| upper | Union[ndarray, Tensor] | The upper bound predictions made by the model, should be the same shape as lower. | required |
| y_true | Union[ndarray, Tensor] | The targets to compare against, should be the same shape as lower. | required |

Returns:

| Type | Description |
| --- | --- |
| float | Coverage as a scalar between 0.0 and 1.0. |

Source code in uqregressors\metrics\metrics.py
def coverage(lower, upper, y_true, **kwargs):
    """
    Computes the coverage as a float between 0 and 1. 

    Args:
        lower (Union[np.ndarray, torch.Tensor]): The lower bound predictions made by the model, should be able to be flattened to 1 dimension.
        upper (Union[np.ndarray, torch.Tensor]): The upper bound predictions made by the model, should be the same shape as lower.
        y_true (Union[np.ndarray, torch.Tensor]): The targets to compare against, should be the same shape as lower.

    Returns: 
        (float): Coverage as a scalar between 0.0 and 1.0.
    """
    _, lower, upper, y_true, _ = validate_inputs(lower, lower, upper, y_true)
    covered = (y_true >= lower) & (y_true <= upper)
    return np.mean(covered)
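
To make the inclusive bounds concrete, here is a small plain-Python sketch of the same calculation. The helper name `marginal_coverage` and the numbers are illustrative assumptions; the library itself operates on NumPy arrays.

```python
def marginal_coverage(lower, upper, y_true):
    """Fraction of targets inside [lower, upper], with both endpoints counted as covered."""
    covered = [lo <= y <= hi for lo, hi, y in zip(lower, upper, y_true)]
    return sum(covered) / len(covered)

lower = [0.0, 1.0, 2.0, 3.0]
upper = [1.0, 2.0, 3.0, 4.0]
y_true = [0.5, 2.0, 3.5, 2.0]

# 0.5 is inside [0, 1]; 2.0 sits on the upper bound of [1, 2] and still counts;
# 3.5 is above [2, 3]; 2.0 is below [3, 4].
result = marginal_coverage(lower, upper, y_true)
print(result)  # 0.5
```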

error_width_corr(mean, lower, upper, y_true, **kwargs)

Computes the Pearson correlation coefficient between true errors and the predicted interval width.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| mean | Union[ndarray, Tensor] | The mean predictions made by the model, should be the same shape as lower. | required |
| lower | Union[ndarray, Tensor] | The lower bound predictions made by the model, should be able to be flattened to 1 dimension. | required |
| upper | Union[ndarray, Tensor] | The upper bound predictions made by the model, should be the same shape as lower. | required |
| y_true | Union[ndarray, Tensor] | The targets to compare against, should be the same shape as lower. | required |

Returns:

| Type | Description |
| --- | --- |
| float | Correlation coefficient between absolute residuals and predicted interval width, bounded in [-1, 1]. |

Source code in uqregressors\metrics\metrics.py
def error_width_corr(mean, lower, upper, y_true, **kwargs): 
    """
    Computes the Pearson correlation coefficient between true errors and the predicted interval width.

    Args: 
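        mean (Union[np.ndarray, torch.Tensor]): The mean predictions made by the model, should be the same shape as lower.
        lower (Union[np.ndarray, torch.Tensor]): The lower bound predictions made by the model, should be able to be flattened to 1 dimension.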
        upper (Union[np.ndarray, torch.Tensor]): The upper bound predictions made by the model, should be the same shape as lower.
        y_true (Union[np.ndarray, torch.Tensor]): The targets to compare against, should be the same shape as lower.

    Returns: 
        (float): Correlation coefficient between residuals and predicted interval width, bounded in [-1, 1].
    """
    mean, lower, upper, y_true, _ = validate_inputs(mean, lower, upper, y_true)
    width = upper - lower 
    res = np.abs(mean - y_true)
    corr = np.corrcoef(width, res)[0, 1]
    return corr
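
The sketch below spells out the same quantity, the Pearson correlation between interval width and absolute error, in plain Python. The helper names and data are assumptions for illustration; the source relies on np.corrcoef rather than a hand-written Pearson formula.

```python
import math

def pearson(x, y):
    """Pearson correlation of two equal-length sequences; NaN if either is constant."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return float("nan")  # correlation is undefined for a constant series
    return cov / math.sqrt(var_x * var_y)

def error_width_correlation(mean, lower, upper, y_true):
    widths = [hi - lo for lo, hi in zip(lower, upper)]
    abs_errors = [abs(m - y) for m, y in zip(mean, y_true)]
    return pearson(widths, abs_errors)

mean = [0.0, 0.0, 0.0, 0.0]
y_true = [0.1, -0.2, 0.3, -0.4]

# Interval widths grow in proportion to the absolute errors: correlation close to 1.
adaptive = error_width_correlation(mean, [-1, -2, -3, -4], [1, 2, 3, 4], y_true)
# Constant-width intervals carry no information about the error: correlation undefined.
constant = error_width_correlation(mean, [-1, -1, -1, -1], [1, 1, 1, 1], y_true)
print(adaptive, constant)
```

np.corrcoef should likewise yield NaN when the widths have zero variance, so a NaN from this metric usually points to constant-width intervals rather than a numerical fault.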

group_conditional_coverage(lower, upper, y_true, n_bins=10)

Divides the outputs into approximately equal bins, and computes the coverage in each bin. Returns a dictionary containing the mean of the output in each bin, and the coverage in each bin.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| lower | Union[ndarray, Tensor] | The lower bound predictions made by the model, should be able to be flattened to 1 dimension. | required |
| upper | Union[ndarray, Tensor] | The upper bound predictions made by the model, should be the same shape as lower. | required |
| y_true | Union[ndarray, Tensor] | The targets to compare against, should be the same shape as lower. | required |
| n_bins | int | The number of bins to compute conditional coverage for. | 10 |

Returns:

| Type | Description |
| --- | --- |
| dict | Dictionary containing the following keys: |

y_true_bin_means (np.ndarray): One dimensional array of the mean of the outputs within each bin.

bin_coverages (np.ndarray): One dimensional array of the coverage of the predictions within each bin.

Source code in uqregressors\metrics\metrics.py
def group_conditional_coverage(lower, upper, y_true, n_bins = 10): 
    """
    Divides the outputs into approximately equal bins, and computes the coverage in each bin. Returns a dictionary containing the mean of the 
    output in each bin, and the coverage in each bin. 

    Args: 
        lower (Union[np.ndarray, torch.Tensor]): The lower bound predictions made by the model, should be able to be flattened to 1 dimension.
        upper (Union[np.ndarray, torch.Tensor]): The upper bound predictions made by the model, should be the same shape as lower.
        y_true (Union[np.ndarray, torch.Tensor]): The targets to compare against, should be the same shape as lower.
        n_bins (int): The number of bins to compute conditional coverage for.

    Returns: 
        (dict): dictionary containing the following keys: 

            y_true_bin_means (np.ndarray): One dimensional array of the mean of the outputs within each bin.

            bin_coverages (np.ndarray): One dimensional array of the coverage of the predictions within each bin.
    """
    _, lower, upper, y_true, alpha = validate_inputs(lower, lower, upper, y_true)
    coverage_mask = (y_true > lower) & (y_true < upper)
    sort_ind = np.argsort(y_true)
    y_true_sort = y_true[sort_ind]
    coverage_mask_sort = coverage_mask[sort_ind]
    split_y_true = np.array_split(y_true_sort, n_bins)
    split_coverage_mask = np.array_split(coverage_mask_sort, n_bins)
    bin_means = [np.mean(bin) for bin in split_y_true]
    bin_coverages = [np.mean(bin) for bin in split_coverage_mask]
    return {"y_true_bin_means": np.array(bin_means), 
            "bin_coverages": np.array(bin_coverages)}
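
The binning works by sorting the targets and cutting the sorted sequence into n_bins contiguous chunks. The plain-Python sketch below mirrors those steps, including the strict inequalities used in this function (coverage, by contrast, counts the endpoints). `split_evenly` is a hypothetical stand-in for np.array_split, and the sketch assumes n_bins does not exceed the number of samples.

```python
def split_evenly(items, n_bins):
    """Split a list into n_bins contiguous chunks whose sizes differ by at most one."""
    base, extra = divmod(len(items), n_bins)
    chunks, start = [], 0
    for i in range(n_bins):
        size = base + (1 if i < extra else 0)  # the first `extra` chunks get one more item
        chunks.append(items[start:start + size])
        start += size
    return chunks

def binned_coverage(lower, upper, y_true, n_bins):
    order = sorted(range(len(y_true)), key=lambda i: y_true[i])  # indices sorted by target
    y_sorted = [y_true[i] for i in order]
    covered_sorted = [lower[i] < y_true[i] < upper[i] for i in order]
    bin_means = [sum(b) / len(b) for b in split_evenly(y_sorted, n_bins)]
    bin_coverages = [sum(b) / len(b) for b in split_evenly(covered_sorted, n_bins)]
    return bin_means, bin_coverages

y_true = [3.0, 1.0, 2.0, 6.0, 5.0, 4.0]
lower = [2.5, 1.5, 1.0, 5.0, 4.0, 3.0]
upper = [3.5, 2.5, 3.0, 7.0, 4.5, 5.0]

bin_means, bin_coverages = binned_coverage(lower, upper, y_true, n_bins=3)
print(bin_means)      # [1.5, 3.5, 5.5]
print(bin_coverages)  # [0.5, 1.0, 0.5]
```

Marginal coverage in this toy data is 4/6, yet only the middle bin is fully covered; that per-bin view is what the conditional coverage metrics summarize.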

interval_score(lower, upper, y_true, alpha, **kwargs)

Computes the interval score as given in Gneiting and Raftery, 2007.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| lower | Union[ndarray, Tensor] | The lower bound predictions made by the model, should be able to be flattened to 1 dimension. | required |
| upper | Union[ndarray, Tensor] | The upper bound predictions made by the model, should be the same shape as lower. | required |
| y_true | Union[ndarray, Tensor] | The targets to compare against, should be the same shape as lower. | required |
| alpha | float | 1 - confidence, should be a float between 0 and 1. | required |

Returns:

| Type | Description |
| --- | --- |
| float | Interval score. |

Source code in uqregressors\metrics\metrics.py
def interval_score(lower, upper, y_true, alpha, **kwargs):
    """
    Computes the interval score as given in [Gneiting and Raftery, 2007](https://sites.stat.washington.edu/raftery/Research/PDF/Gneiting2007jasa.pdf).

    Args: 
        lower (Union[np.ndarray, torch.Tensor]): The lower bound predictions made by the model, should be able to be flattened to 1 dimension.
        upper (Union[np.ndarray, torch.Tensor]): The upper bound predictions made by the model, should be the same shape as lower.
        y_true (Union[np.ndarray, torch.Tensor]): The targets to compare against, should be the same shape as lower.
        alpha (float): 1 - confidence, should be a float between 0 and 1. 

    Returns: 
        (float): Interval score.
    """
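    # Pass alpha through; otherwise validate_inputs returns its default of 0.5 and overwrites the caller's value.
    _, lower, upper, y_true, alpha = validate_inputs(lower, lower, upper, y_true, alpha)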
    width = upper - lower
    penalty_lower = (2 / alpha) * (lower - y_true) * (y_true < lower)
    penalty_upper = (2 / alpha) * (y_true - upper) * (y_true > upper)
    return np.mean(width + penalty_lower + penalty_upper)
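
A short worked example may help in reading the score: each sample contributes its interval width plus a penalty of 2 / alpha times the distance by which the target misses the interval. The plain-Python sketch below uses a hypothetical helper name and made-up numbers, and is not the library function itself.

```python
def interval_score_sketch(lower, upper, y_true, alpha):
    """Mean of width + (2 / alpha) * distance outside the interval."""
    scores = []
    for lo, hi, y in zip(lower, upper, y_true):
        width = hi - lo
        below = (2.0 / alpha) * (lo - y) if y < lo else 0.0
        above = (2.0 / alpha) * (y - hi) if y > hi else 0.0
        scores.append(width + below + above)
    return sum(scores) / len(scores)

lower = [0.0, 0.0, 0.0]
upper = [1.0, 1.0, 1.0]
y_true = [0.5, 1.5, -0.25]  # inside, 0.5 above, 0.25 below

# alpha = 0.1: penalties are 20 * 0.5 = 10 and 20 * 0.25 = 5, so scores are 1, 11, 6.
strict = interval_score_sketch(lower, upper, y_true, alpha=0.1)
# alpha = 0.5: penalties shrink to 4 * 0.5 = 2 and 4 * 0.25 = 1, so scores are 1, 3, 2.
lenient = interval_score_sketch(lower, upper, y_true, alpha=0.5)
print(strict, lenient)  # 6.0 2.0
```

The same misses cost more at smaller alpha, so scores are only comparable between models evaluated at the same alpha.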

lowest_group_coverage(lower, upper, y_true, n_bins=10)

Computes the coverage of the bin with lowest coverage when the outputs are divided into several bins and coverage is evaluated conditional on each bin.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| lower | Union[ndarray, Tensor] | The lower bound predictions made by the model, should be able to be flattened to 1 dimension. | required |
| upper | Union[ndarray, Tensor] | The upper bound predictions made by the model, should be the same shape as lower. | required |
| y_true | Union[ndarray, Tensor] | The targets to compare against, should be the same shape as lower. | required |
| n_bins | int | The number of bins to divide the outputs into. | 10 |

Returns:

| Type | Description |
| --- | --- |
| float | The coverage of the least covered bin of outputs, float between 0 and 1. |

Source code in uqregressors\metrics\metrics.py
def lowest_group_coverage(lower, upper, y_true, n_bins=10): 
    """
    Computes the coverage of the bin with lowest coverage when the outputs are divided into several bins and coverage is evaluated conditional on each bin. 

    Args: 
        lower (Union[np.ndarray, torch.Tensor]): The lower bound predictions made by the model, should be able to be flattened to 1 dimension.
        upper (Union[np.ndarray, torch.Tensor]): The upper bound predictions made by the model, should be the same shape as lower.
        y_true (Union[np.ndarray, torch.Tensor]): The targets to compare against, should be the same shape as lower.
        n_bins (int): The number of bins to divide the outputs into.

    Returns: 
        (float): The coverage of the least covered bin of outputs, float between 0 and 1. 
    """
    _, lower, upper, y_true, alpha = validate_inputs(lower, lower, upper, y_true)
    gcc = group_conditional_coverage(lower, upper, y_true, n_bins)
    return np.min(gcc["bin_coverages"])

nll_gaussian(mean, lower, upper, y_true, alpha, **kwargs)

Computes the average negative log likelihood of the data given the predictions and assuming a Gaussian distribution of predictions.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| mean | Union[ndarray, Tensor] | The mean predictions made by the model, should be the same shape as lower. | required |
| lower | Union[ndarray, Tensor] | The lower bound predictions made by the model, should be able to be flattened to 1 dimension. | required |
| upper | Union[ndarray, Tensor] | The upper bound predictions made by the model, should be the same shape as lower. | required |
| y_true | Union[ndarray, Tensor] | The targets to compare against, should be the same shape as lower. | required |
| alpha | float | 1 - confidence, should be a float between 0 and 1. | required |

Returns:

| Type | Description |
| --- | --- |
| float | Average negative log likelihood of the data given the predictions. |

Source code in uqregressors\metrics\metrics.py
def nll_gaussian(mean, lower, upper, y_true, alpha, **kwargs):
    """
    Computes the average negative log likelihood of the data given the predictions and assuming a Gaussian distribution of predictions.

    Args: 
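        mean (Union[np.ndarray, torch.Tensor]): The mean predictions made by the model, should be the same shape as lower.
        lower (Union[np.ndarray, torch.Tensor]): The lower bound predictions made by the model, should be able to be flattened to 1 dimension.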
        upper (Union[np.ndarray, torch.Tensor]): The upper bound predictions made by the model, should be the same shape as lower.
        y_true (Union[np.ndarray, torch.Tensor]): The targets to compare against, should be the same shape as lower.
        alpha (float): 1 - confidence, should be a float between 0 and 1. 

    Returns: 
        (float): Average negative log likelihood of the data given the predictions.
    """
    mean, lower, upper, y_true, alpha = validate_inputs(mean, lower, upper, y_true, alpha)
    z = norm.ppf(1 - alpha / 2)
    std = (upper - lower) / (2 * z)
    std = np.clip(std, 1e-6, None)

    log_likelihoods = -0.5 * np.log(2 * np.pi * std**2) - 0.5 * ((y_true - mean) / std) ** 2
    return -np.mean(log_likelihoods)
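
The key step is recovering a standard deviation from the interval: a symmetric 1 - alpha Gaussian interval has half-width z times the standard deviation, where z is the 1 - alpha / 2 quantile of the standard normal. The sketch below repeats the calculation in plain Python, using `statistics.NormalDist` in place of `scipy.stats.norm`; the helper name and inputs are assumptions for illustration.

```python
import math
from statistics import NormalDist

def gaussian_nll_sketch(mean, lower, upper, y_true, alpha):
    """Average Gaussian negative log likelihood, with std inferred from interval width."""
    z = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 for alpha = 0.05
    total = 0.0
    for m, lo, hi, y in zip(mean, lower, upper, y_true):
        std = max((hi - lo) / (2 * z), 1e-6)  # same lower clip as in the source
        total += 0.5 * math.log(2 * math.pi * std ** 2) + 0.5 * ((y - m) / std) ** 2
    return total / len(mean)

z95 = NormalDist().inv_cdf(0.975)

# An interval of [-z95, z95] around a mean of 0 corresponds to std = 1.
at_mean = gaussian_nll_sketch([0.0], [-z95], [z95], [0.0], alpha=0.05)
one_std_off = gaussian_nll_sketch([0.0], [-z95], [z95], [1.0], alpha=0.05)
print(round(at_mean, 4), round(one_std_off, 4))  # approximately 0.9189 and 1.4189
```

A target exactly at the mean gives 0.5 * log(2 * pi), and each standard deviation of error adds half its square on top of that.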

rmse(mean, y_true, **kwargs)

Computes the root mean square error of the predictions compared to the targets.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| mean | Union[ndarray, Tensor] | The mean predictions made by the model, should be able to be flattened to 1 dimension. | required |
| y_true | Union[ndarray, Tensor] | The targets to compare against, should be the same shape as mean. | required |

Returns:

| Type | Description |
| --- | --- |
| float | Scalar root mean squared error. |

Source code in uqregressors\metrics\metrics.py
def rmse(mean, y_true, **kwargs):
    """
    Computes the root mean square error of the predictions compared to the targets.

    Args: 
        mean (Union[np.ndarray, torch.Tensor]): The mean predictions made by the model, should be able to be flattened to 1 dimension.
        y_true (Union[np.ndarray, torch.Tensor]): The targets to compare against, should be the same shape as mean. 

    Returns: 
        (float): Scalar root mean squared error.
    """
    mean, _, _, y_true, _ = validate_inputs(mean, mean, mean, y_true)
    return np.sqrt(np.mean((mean - y_true) ** 2))

validate_inputs(mean, lower, upper, y_true, alpha=0.5)

Ensure inputs are converted to 1D numpy arrays and alpha is a float in (0, 1) for use in computing metrics.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| mean | Union[Tensor, ndarray] | The mean predictions to compute metrics for, should be able to be flattened to one dimension. | required |
| lower | Union[Tensor, ndarray] | The lower bound predictions to compute metrics for, should be the same shape as mean. | required |
| upper | Union[Tensor, ndarray] | The upper bound predictions to compute metrics for, should be the same shape as mean. | required |
| y_true | Union[Tensor, ndarray] | The targets to compute metrics with, should be the same shape as mean. | required |
| alpha | float | 1 - confidence, if relevant; should be a float strictly between 0 and 1. | 0.5 |

Source code in uqregressors\metrics\metrics.py
def validate_inputs(mean, lower, upper, y_true, alpha=0.5):
    """
    Ensure inputs are converted to 1D numpy arrays and alpha is a float in (0, 1) for use in computing metrics.

    Args: 
        mean (Union[torch.Tensor, np.ndarray]): The mean predictions to compute metrics for, should be able to be flattened to one dimension.
        lower (Union[torch.Tensor, np.ndarray]): The lower bound predictions to compute metrics for, should be the same shape as mean. 
        upper (Union[torch.Tensor, np.ndarray]): The upper bound predictions to compute metrics for, should be the same shape as mean. 
        y_true (Union[torch.Tensor, np.ndarray]): The targets to compute metrics with, should be the same shape as mean.
        alpha (float): 1 - confidence, if relevant; should be a float strictly between 0 and 1.
    """

    def to_1d_numpy(x):
        if isinstance(x, torch.Tensor):
            x = x.detach().cpu().numpy()
        x = np.asarray(x)
        if x.ndim != 1:
            x = x.flatten()
        return x

    mean = to_1d_numpy(mean)
    lower = to_1d_numpy(lower)
    upper = to_1d_numpy(upper)
    y_true = to_1d_numpy(y_true)

    if not (0 < float(alpha) < 1):
        raise ValueError(f"alpha must be in (0, 1), got {alpha}")

    length = len(mean)
    if not (len(lower) == len(upper) == len(y_true) == length):
        raise ValueError("All input arrays must be of the same length.")

    return mean, lower, upper, y_true, float(alpha)
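
Two consequences of this function are worth seeing in action: inputs are compared by flattened length rather than by shape, and alpha comes back as 0.5 unless the caller passes it. The sketch below imitates the checks on nested Python lists so that it runs without NumPy or torch; `flatten` and `validate_sketch` are hypothetical stand-ins, not library functions.

```python
def flatten(values):
    """Recursively flatten nested lists or tuples into a flat list of floats."""
    flat = []
    for v in values:
        if isinstance(v, (list, tuple)):
            flat.extend(flatten(v))
        else:
            flat.append(float(v))
    return flat

def validate_sketch(mean, lower, upper, y_true, alpha=0.5):
    arrays = [flatten(a) for a in (mean, lower, upper, y_true)]
    if not (0 < float(alpha) < 1):
        raise ValueError(f"alpha must be in (0, 1), got {alpha}")
    if len({len(a) for a in arrays}) != 1:
        raise ValueError("All input arrays must be of the same length.")
    return (*arrays, float(alpha))

column = [[1.0], [2.0]]  # shape (2, 1)
flat = [1.0, 2.0]        # shape (2,)

# A column and a flat sequence are accepted together because both flatten to length 2.
mean, lower, upper, y_true, alpha = validate_sketch(column, flat, flat, column)
print(mean, alpha)  # [1.0, 2.0] 0.5

# The boundary values 0 and 1 are rejected for alpha.
try:
    validate_sketch(flat, flat, flat, flat, alpha=1.0)
    rejected = False
except ValueError:
    rejected = True
```

Because the default of 0.5 is returned when alpha is omitted, a caller that unpacks the returned alpha needs to pass its own value in explicitly.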