uqregressors.utils.torch_sklearn_utils
torch_sklearn_utils
A collection of scikit-learn utilities reimplemented to work with PyTorch tensors.
The key components are:
- TorchStandardScaler (class)
- TorchKFold (class)
- train_test_split (function)
Warning
TorchKFold yields the indices of each fold, whereas train_test_split returns the data values in each split.
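To make the distinction concrete, a fold from TorchKFold has to be turned into data by indexing, whereas train_test_split hands back the data directly. The sketch below uses plain PyTorch only. The two index tensors are hard-coded stand-ins for one fold, not actual TorchKFold output.

```python
import torch

# Toy data: 6 samples with 2 features each, plus one target per sample.
X = torch.arange(12, dtype=torch.float32).reshape(6, 2)
y = torch.arange(6, dtype=torch.float32)

# Hard-coded stand-ins for the (train_idx, val_idx) pair of one fold.
train_idx = torch.tensor([2, 3, 4, 5])
val_idx = torch.tensor([0, 1])

# Index tensors select rows; the same indices keep X and y aligned.
X_train, y_train = X[train_idx], y[train_idx]
X_val, y_val = X[val_idx], y[val_idx]
```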
TorchKFold
Splits data into K folds for conformalization or cross-validation.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
n_splits | int | The number of folds for data splitting. | 5 |
shuffle | bool | Whether to shuffle the data before splitting. | False |
random_state | int or None | Controls shuffling for reproducibility. | None |
Source code in uqregressors\utils\torch_sklearn_utils.py
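How the samples are divided among n_splits folds is not stated on this page. The sketch below assumes scikit-learn's KFold convention, in which the first n_samples % n_splits folds receive one extra sample. `fold_sizes` is a hypothetical helper, not part of uqregressors.

```python
from typing import List


def fold_sizes(n_samples: int, n_splits: int = 5) -> List[int]:
    """Validation-fold sizes under the assumed scikit-learn KFold convention."""
    if not 2 <= n_splits <= n_samples:
        raise ValueError("need 2 <= n_splits <= n_samples")
    base, extra = divmod(n_samples, n_splits)
    # The first `extra` folds absorb the remainder, one sample each.
    return [base + 1 if i < extra else base for i in range(n_splits)]


print(fold_sizes(12, 5))  # [3, 3, 2, 2, 2]
```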
split(X)
Yields train/validation indices for each fold.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
X | torch.Tensor, np.ndarray, or list | Input data with shape (n_samples, ...). | required |
Yields:
Type | Description |
---|---|
tuple[LongTensor, LongTensor] | train_idx, val_idx; the indices of the training and validation sets for each split. |
Source code in uqregressors\utils\torch_sklearn_utils.py
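A minimal sketch of what split might do, written in plain PyTorch. `kfold_split` is a hypothetical stand-in; the fold-size rule (via `torch.tensor_split`) and the use of a dedicated `torch.Generator` seeded from random_state are assumptions, not confirmed details of TorchKFold.

```python
from typing import Iterator, Optional, Tuple

import torch


def kfold_split(
    X,
    n_splits: int = 5,
    shuffle: bool = False,
    random_state: Optional[int] = None,
) -> Iterator[Tuple[torch.Tensor, torch.Tensor]]:
    """Hypothetical stand-in for TorchKFold.split: yields (train_idx, val_idx).

    X may be a tensor, an ndarray, or a list; only len(X) is used.
    """
    n_samples = len(X)
    if shuffle:
        # A dedicated seeded generator makes the permutation reproducible.
        gen = None if random_state is None else torch.Generator().manual_seed(random_state)
        order = torch.randperm(n_samples, generator=gen)
    else:
        order = torch.arange(n_samples)
    # tensor_split gives the first n_samples % n_splits chunks one extra element.
    for val_idx in torch.tensor_split(order, n_splits):
        mask = torch.ones(n_samples, dtype=torch.bool)
        mask[val_idx] = False  # every sample outside the validation fold trains
        train_idx = torch.arange(n_samples)[mask]
        yield train_idx, val_idx


X = torch.randn(10, 3)
for train_idx, val_idx in kfold_split(X, n_splits=3, shuffle=True, random_state=0):
    X_train, X_val = X[train_idx], X[val_idx]
```

Seeding a dedicated generator, rather than calling `torch.manual_seed`, keeps the shuffle reproducible without resetting PyTorch's global random state.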
TorchStandardScaler
Standardizes data to zero mean and unit variance.
Attributes:
Name | Type | Description |
---|---|---|
mean_ | float | The mean of the data, subtracted from the data during scaling. |
std_ | float | The standard deviation of the data, by which the data is divided during scaling. |
Source code in uqregressors\utils\torch_sklearn_utils.py
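For reference, the transform and its inverse in formula form, with x_{ij} the j-th feature of sample i among n samples. This assumes column-wise statistics and the population (1/n) variance used by scikit-learn's StandardScaler; the page does not say whether TorchStandardScaler divides by n or by n - 1.

```latex
\mu_j = \frac{1}{n}\sum_{i=1}^{n} x_{ij}, \qquad
\sigma_j = \sqrt{\frac{1}{n}\sum_{i=1}^{n} \left(x_{ij} - \mu_j\right)^2}
\\[4pt]
z_{ij} = \frac{x_{ij} - \mu_j}{\sigma_j} \quad \text{(transform)}, \qquad
x_{ij} = \sigma_j\, z_{ij} + \mu_j \quad \text{(inverse\_transform)}
```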
fit(X)
Fits the scaler by computing the mean and standard deviation of X.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
X | Tensor | Data to be scaled, of shape (n_samples, n_features). | required |
Returns:
Type | Description |
---|---|
TorchStandardScaler | The scaler with updated mean_ and std_ attributes. |
Source code in uqregressors\utils\torch_sklearn_utils.py
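A sketch of the statistics fit might compute. Two details are assumptions not confirmed by this page: the statistics are taken per feature (column), so they are tensors of shape (n_features,) rather than the float listed above, and a zero standard deviation is replaced by 1, as scikit-learn's StandardScaler does. `fit_scaler_stats` is a hypothetical name.

```python
from typing import Tuple

import torch


def fit_scaler_stats(X: torch.Tensor) -> Tuple[torch.Tensor, torch.Tensor]:
    """Per-feature mean and population std of a (n_samples, n_features) tensor."""
    mean = X.mean(dim=0)
    std = ((X - mean) ** 2).mean(dim=0).sqrt()
    # Assumption: constant features get std 1 so later division is safe.
    std = torch.where(std == 0, torch.ones_like(std), std)
    return mean, std
```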
fit_transform(X)
Fits the scaler to X and returns the transformed data; equivalent to calling fit followed by transform.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
X | Tensor | Data to be scaled, of shape (n_samples, n_features). | required |
Returns:
Type | Description |
---|---|
Tensor | The scaled data. |
Source code in uqregressors\utils\torch_sklearn_utils.py
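The hypothetical function below fuses both steps as a plain-PyTorch sketch, assuming per-feature statistics and the population standard deviation, and checks that each resulting column has zero mean and unit variance.

```python
import torch


def standardize_fit_transform(X: torch.Tensor):
    """Hypothetical fit + transform in one pass over X of shape (n_samples, n_features).

    Returns the scaled data and the fitted statistics, which are needed later
    to transform new data or to invert the scaling.
    """
    mean = X.mean(dim=0)                          # fit: column means
    std = ((X - mean) ** 2).mean(dim=0).sqrt()    # fit: population std
    return (X - mean) / std, mean, std            # transform


torch.manual_seed(0)
X = torch.randn(100, 3) * 5.0 + 2.0
Z, mean, std = standardize_fit_transform(X)
```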
inverse_transform(X_scaled)
Transforms scaled data back to the original scale.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
X_scaled | Tensor | Scaled data of shape (n_samples, n_features). | required |
Returns:
Type | Description |
---|---|
Tensor | The unscaled data. |
Source code in uqregressors\utils\torch_sklearn_utils.py
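Inverting the scaling is one affine map per feature, x = z * std + mean. In a regression setting, this is how values produced in standardized target units (for example the bounds of a prediction interval) can be mapped back to the original units; because std is positive, lower and upper bounds keep their order. `inverse_standardize` is a hypothetical helper, and the numbers are illustrative only.

```python
import torch


def inverse_standardize(X_scaled: torch.Tensor, mean: torch.Tensor, std: torch.Tensor) -> torch.Tensor:
    """Hypothetical helper: undo z = (x - mean) / std."""
    return X_scaled * std + mean


# Illustration: interval bounds expressed in standardized target units.
y_mean, y_std = torch.tensor(50.0), torch.tensor(10.0)
lower_z = torch.tensor([-1.0, 0.5])
upper_z = torch.tensor([1.0, 2.0])
lower = inverse_standardize(lower_z, y_mean, y_std)  # tensor([40., 55.])
upper = inverse_standardize(upper_z, y_mean, y_std)  # tensor([60., 70.])
```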
transform(X)
Scales X using the mean_ and std_ attributes computed by the fit method.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
X | Tensor | Data to be scaled, of shape (n_samples, n_features). | required |
Returns:
Type | Description |
---|---|
Tensor | The scaled data. |
Source code in uqregressors\utils\torch_sklearn_utils.py
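The important property of transform is that it reuses the statistics from fit rather than recomputing them from its input, so test data is scaled exactly like the training data. A small worked example with the hypothetical helper `transform_with`, assuming per-feature population statistics:

```python
import torch


def transform_with(X: torch.Tensor, mean: torch.Tensor, std: torch.Tensor) -> torch.Tensor:
    """Apply previously fitted statistics; nothing is recomputed from X."""
    return (X - mean) / std


X_train = torch.tensor([[0.0, 100.0], [2.0, 300.0]])
mean = X_train.mean(dim=0)                         # [1., 200.]
std = ((X_train - mean) ** 2).mean(dim=0).sqrt()   # [1., 100.]

X_test = torch.tensor([[1.0, 400.0]])
print(transform_with(X_test, mean, std))  # tensor([[0., 2.]])
```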
train_test_split(X, y, test_size=0.2, device='cpu', random_state=None, shuffle=True)
Split arrays or tensors into training and test sets.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
X | array-like or Tensor | Features to be split. | required |
y | array-like or Tensor | Targets to be split. | required |
test_size | float | Proportion of the dataset to include in the test split (between 0 and 1). | 0.2 |
random_state | int or None | Controls the shuffling for reproducibility. | None |
shuffle | bool | Whether or not to shuffle the data before splitting. | True |
Returns:
Type | Description |
---|---|
Tuple[ndarray, ndarray, ndarray, ndarray] | X_train, X_test, y_train, y_test; same type as inputs. |
Source code in uqregressors\utils\torch_sklearn_utils.py
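A tensor-only sketch of the same idea. `split_train_test` is hypothetical: it ignores the device argument and numpy/list inputs, rounds the test count up as scikit-learn does for float sizes, and places the test rows last when shuffle is False. None of these details is confirmed for the train_test_split documented above.

```python
import math
from typing import Optional, Tuple

import torch


def split_train_test(
    X: torch.Tensor,
    y: torch.Tensor,
    test_size: float = 0.2,
    random_state: Optional[int] = None,
    shuffle: bool = True,
) -> Tuple[torch.Tensor, torch.Tensor, torch.Tensor, torch.Tensor]:
    """Hypothetical sketch; returns X_train, X_test, y_train, y_test."""
    if not 0.0 < test_size < 1.0:
        raise ValueError("test_size must be between 0 and 1")
    n_samples = len(X)
    n_test = math.ceil(test_size * n_samples)  # assumption: round the test count up
    n_train = n_samples - n_test
    if shuffle:
        gen = None if random_state is None else torch.Generator().manual_seed(random_state)
        order = torch.randperm(n_samples, generator=gen)
    else:
        order = torch.arange(n_samples)
    train_idx, test_idx = order[:n_train], order[n_train:]
    # Unlike a K-fold splitter, the split values are returned, not the indices.
    return X[train_idx], X[test_idx], y[train_idx], y[test_idx]
```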