API Reference
Here you can find the API reference for the major components of MOTrainer.
motrainer.splitter:
dataset_split(ds, identifier)
Split a Dataset by identifier for independent training tasks.
Parameters:

Name | Type | Description | Default |
---|---|---|---|
ds | Dataset | Xarray Dataset to be split. | required |
identifier | dict or str | Identifier used for splitting, given either as a dict or as a str. | required |
Returns:

Type | Description |
---|---|
bag | A Dask Bag of split Datasets. |
Source code in motrainer/splitter.py
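A minimal usage sketch is shown below. It assumes the dimension name "space" can be passed directly as the identifier to obtain one sub-dataset per spatial point; the variable name soil_moisture is illustrative only.

```python
import numpy as np
import xarray as xr
from motrainer.splitter import dataset_split

# Toy dataset following MOTrainer's ("space", "time") convention
ds = xr.Dataset(
    {"soil_moisture": (("space", "time"), np.random.rand(3, 10))},
    coords={"space": [0, 1, 2], "time": np.arange(10)},
)

# Split along "space": each element of the returned Dask Bag is an
# independent sub-dataset that can be trained on separately
bag = dataset_split(ds, "space")
sub_datasets = bag.compute()
```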
is_splitable(ds)
Check if a Dataset can be split using MOTrainer.

The following checks will be applied:

- The Dataset has exactly 2 dimensions
- The 2 dims are "space" and "time"
- There are no duplicated coordinates

A UserWarning will be raised for each failed check.
Parameters:

Name | Type | Description | Default |
---|---|---|---|
ds | Dataset | Xarray Dataset to be split. | required |
Returns:

Type | Description |
---|---|
bool | Result of the check as a Boolean. True if all checks pass, otherwise False. |
Source code in motrainer/splitter.py
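A minimal sketch of checking a Dataset before splitting; the toy data below follows the ("space", "time") convention and uses an illustrative variable name.

```python
import numpy as np
import xarray as xr
from motrainer.splitter import is_splitable

ds = xr.Dataset(
    {"soil_moisture": (("space", "time"), np.random.rand(3, 10))},
    coords={"space": [0, 1, 2], "time": np.arange(10)},
)

# True only if the Dataset has exactly the "space" and "time" dimensions and
# no duplicated coordinates; a UserWarning is raised for each failed check
if is_splitable(ds):
    print("Dataset can be split with MOTrainer")
```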
train_test_split(ds, mask=None, split=None, reverse=False)
Split data into train and test datasets.

The split is performed either 1) by specifying the training data mask (mask), where training data locations are True, or 2) by specifying a coordinate value (split) which splits the data into two.
Parameters:

Name | Type | Description | Default |
---|---|---|---|
ds | Dataset | Xarray Dataset to split. | required |
mask | DataArray | Mask, True at training data locations. By default None. | None |
split | dict | Coordinate dictionary in {NAME: coordinate} which splits the Dataset into two. The part smaller than the given coordinate value will be the training data. By default None. | None |
reverse | bool | Reverse the split results. By default False. | False |
Returns:

Type | Description |
---|---|
tuple[Dataset, Dataset] | Split results, as (training, test). |
Raises:

Type | Description |
---|---|
ValueError | When neither mask nor split is specified. |
ValueError | When both mask and split are specified. |
Source code in motrainer/splitter.py
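A minimal sketch of a time-based split. The split value and variable names are assumptions for illustration; timestamps before the split value go to the training set.

```python
import numpy as np
import pandas as pd
import xarray as xr
from motrainer.splitter import train_test_split

times = pd.date_range("2015-01-01", periods=100, freq="D")
ds = xr.Dataset(
    {"soil_moisture": (("space", "time"), np.random.rand(3, 100))},
    coords={"space": [0, 1, 2], "time": times},
)

# Everything before 2015-03-01 becomes training data, the rest test data
train_ds, test_ds = train_test_split(ds, split={"time": np.datetime64("2015-03-01")})
```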
JackknifeGPI:
motrainer.jackknife.JackknifeGPI(gpi_data, val_split_year, input_list, output_list, export_all_years=True, outpath='./jackknife_results')
GPI object for neural network training using the Jackknife resampling method.
Methods:

Name | Description |
---|---|
train | Train the neural network with the given method. |
export_best | Export the best results of the Jackknife process. |
Initialize JackknifeGPI object.
Parameters:

Name | Type | Description | Default |
---|---|---|---|
gpi_data | DataFrame | DataFrame of a single GPI. Each row represents all properties at a certain timestamp. Each column represents a time series of a property. | required |
val_split_year | int | Split year of validation. All data after (and including) this year will be reserved for benchmarking. | required |
input_list | list of str | Column names in gpi_data which will be used as input. | required |
output_list | list of str | Column names in gpi_data which will be used as output. | required |
export_all_years | bool | Switch to export the results of all years. By default True. | True |
outpath | str | Results export path. By default './jackknife_results'. | './jackknife_results' |
Source code in motrainer/jackknife.py
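A minimal sketch of constructing a JackknifeGPI for one grid point (GPI). The column names and the validation split year are hypothetical.

```python
import numpy as np
import pandas as pd
from motrainer.jackknife import JackknifeGPI

# Toy time series for one grid point; each column is one property
index = pd.date_range("2015-01-01", "2019-12-31", freq="D")
gpi_data = pd.DataFrame(
    {
        "sigma": np.random.rand(len(index)),
        "slope": np.random.rand(len(index)),
        "soil_moisture": np.random.rand(len(index)),
    },
    index=index,
)

gpi = JackknifeGPI(
    gpi_data,
    val_split_year=2018,            # data from 2018 onwards is reserved for benchmarking
    input_list=["sigma", "slope"],
    output_list=["soil_moisture"],
    outpath="./jackknife_results",
)
```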
export_best(model_name='best_optimized_model')
Export the best results of the Jackknife process.
Source code in motrainer/jackknife.py
train(searching_space, optimize_space, normalize_method='standard', performance_method='rmse', training_method='dnn', verbose=0)
Train the neural network with the Jackknife resampling method.

Procedures:

1. Reserve in/output data after self.val_split_year for later benchmarking.
2. From the remaining in/output data, leave out one year as validation data.
3. Perform neural network training.
4. Repeat Steps 2 and 3 until all years except the benchmarking years have been used for validation.
5. Select the best training by best performance.
6. Perform benchmarking on the reserved data.
Parameters:

Name | Type | Description | Default |
---|---|---|---|
searching_space | dict | Arguments of the searching space. | required |
optimize_space | dict | Arguments of the optimization space. | required |
normalize_method | str | Method of normalization. Choose from 'standard' and 'min_max'. By default 'standard'. | 'standard' |
performance_method | str | Method of computing performance. Choose from 'rmse', 'mae', 'pearson' and 'spearman'. By default 'rmse'. | 'rmse' |
training_method | str | Training method selection. Choose from 'dnn' or 'dnn_lossweights'. By default 'dnn'. | 'dnn' |
verbose | int | Control the verbosity. By default 0, which means no screen feedback. | 0 |
Source code in motrainer/jackknife.py
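A minimal sketch of running the Jackknife training on the gpi object from the previous example. The keys inside searching_space and optimize_space below are illustrative placeholders, not a verified list of supported arguments; consult the MOTrainer tutorials for the exact keys.

```python
# The contents of these two dicts are illustrative placeholders only
searching_space = {
    "learning_rate": [0.001, 0.01],
    "activation": ["relu"],
}
optimize_space = {
    "epochs": 50,
    "best_loss": 1.0,
}

gpi.train(
    searching_space,
    optimize_space,
    normalize_method="standard",
    performance_method="rmse",
    training_method="dnn",
    verbose=0,
)

# Write the best-performing model of the Jackknife loop to outpath
gpi.export_best()
```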
Utility Functions:
motrainer.util
normalize(data, method)
Pre-normalization for input/output.
Parameters:

Name | Type | Description | Default |
---|---|---|---|
data | DataFrame | Data to normalize. | required |
method | str | Normalization method. Choose from 'standard' or 'min_max'. | required |
Returns:

Type | Description |
---|---|
list | A list of [data_norm, scaler]: the normalized data and the scaler used for normalization. |
Source code in motrainer/util.py
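A minimal sketch of standard-scaling a DataFrame with normalize(); the column names are illustrative.

```python
import numpy as np
import pandas as pd
from motrainer.util import normalize

df = pd.DataFrame({"sigma": np.random.rand(100), "slope": np.random.rand(100)})

# Returns the normalized data together with the fitted scaler, which can be
# kept to transform predictions back to the original scale later
data_norm, scaler = normalize(df, method="standard")
```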
performance(data_input, data_label, model, method, scaler_output=None)
Compute performance of the trained neural network.
Parameters:

Name | Type | Description | Default |
---|---|---|---|
data_input | DataFrame | Input data. | required |
data_label | DataFrame | Label data. | required |
model | models | Trained model to compute performance for. | required |
method | str | Method to compute performance. | required |
scaler_output | optional | Scaler of output. By default None. When not None, the function will assume that a normalization has been performed on the output, and will use scaler_output to transform the output back to the original scale. | None |
Returns:

Type | Description |
---|---|
float or list of float | Performance value. If the model gives multiple outputs, the performance will be a list. |
Source code in motrainer/util.py
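A minimal sketch of the call shape. It assumes the model is a Keras model with a predict() method; the tiny untrained model and random data below only illustrate the interface, not a meaningful evaluation.

```python
import numpy as np
import pandas as pd
import tensorflow as tf
from motrainer.util import performance

# Toy data and a tiny untrained Keras model (illustration only)
data_input = pd.DataFrame({"sigma": np.random.rand(50)})
data_label = pd.DataFrame({"soil_moisture": np.random.rand(50)})
model = tf.keras.Sequential([tf.keras.Input(shape=(1,)), tf.keras.layers.Dense(1)])

rmse = performance(data_input, data_label, model, method="rmse")
```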
sklearn_load(path_model)
Load sklearn model from hdf5 file.
Parameters:

Name | Type | Description | Default |
---|---|---|---|
path_model | str | Path to the model. | required |
Returns:

Type | Description |
---|---|
model | Sklearn model. |
Source code in motrainer/util.py
sklearn_save(model, path_model, meta_data=None)
Save sklearn model to hdf5 file.
Parameters:

Name | Type | Description | Default |
---|---|---|---|
model | model | Sklearn model to save. | required |
path_model | str | Path to save the model. | required |
meta_data | Dict | Optional. A dict of meta data to save. | None |
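Source code in motrainer/util.py

A minimal sketch of a save/load round trip, which also exercises sklearn_load described above. The model, file path and metadata contents are illustrative.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from motrainer.util import sklearn_load, sklearn_save

# Fit a small sklearn model on random data, save it with optional metadata,
# then load it back from the hdf5 file
model = LinearRegression().fit(np.random.rand(20, 2), np.random.rand(20))
sklearn_save(model, "model.h5", meta_data={"input_list": ["sigma", "slope"]})

loaded_model = sklearn_load("model.h5")
```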