45 add interpolation methods for input output #51
Conversation
that specifies the shift for each input individually
output wasn't considered; an error occurred when using PINN
…://github.com/RWTH-EBC/physXAI into 45-add-interpolation-methods-for-input-output
updated docstrings
```python
shift = {
    'reaTZon_y': 'previous',  # for all lags of reaTZon_y, the shift will be set automatically
    'weaSta_reaWeaHDirNor_y': 'mean_over_interval',
    '_default': 0,
}
```
We should include this as part of `Feature` with an attribute `sampling_method`. The default should be `None`, and there should be a class attribute `default_sampling_method` that is used if it is `None`.
Shift should be removed from preprocessing (not backwards compatible, but necessary to maintain usability).
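A minimal sketch of the design suggested above: `sampling_method` as an instance attribute defaulting to `None`, with a changeable class attribute `default_sampling_method` as the fallback. Everything beyond those two names (the `Feature` constructor signature, the property) is an assumption for illustration, not the actual physXAI API.

```python
from typing import Optional


class Feature:
    # Class attribute: fallback used whenever an instance does not set its
    # own sampling_method. Being a class attribute, it can be changed globally.
    default_sampling_method: str = 'previous'

    def __init__(self, name: str, sampling_method: Optional[str] = None):
        self.name = name
        self._sampling_method = sampling_method

    @property
    def sampling_method(self) -> str:
        # Fall back to the class-wide default if none was given explicitly
        if self._sampling_method is None:
            return type(self).default_sampling_method
        return self._sampling_method


# Usage: per-feature override vs. class-wide default
temp = Feature('reaTZon_y')  # no explicit method, uses the class default
wind = Feature('weaSta_reaWeaHDirNor_y', 'mean_over_interval')
Feature.default_sampling_method = 'current'  # change the default globally
print(temp.sampling_method)  # 'current'
print(wind.sampling_method)  # 'mean_over_interval'
```

With this layout the shift dict (including the `'_default'` key) becomes unnecessary, since the default lives on the class itself.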
```python
if (len(self.inputs) != len(self.shift.keys())) or not all(inp in self.shift.keys() for inp in self.inputs):
    self.shift = convert_shift_to_dict(self.shift, self.inputs, custom_default=self.shift_default)

assert len(self.inputs) == len(self.shift.keys()), (
    f"Something went wrong, number of inputs ({len(self.inputs)})"
    f" doesn't match number of inputs defined in shift ({len(self.shift.keys())})")
```
Is this not already done in the `__init__` function?
L291-292 are necessary in case of recursive feature selection, since the preprocessing object is only created once but the inputs are changed.
```python
FeatureConstruction.process(df)
```
```python
# Only apply for those features that are not lags since lags must be constructed after sampling the data
# according to the given time step
FeatureConstruction.process(df, feature_names=inputs_without_lags + [out for out in self.output if out not in inputs_without_lags])
```
It might be easier to implement `process_without_lags` and `process_only_lags` functions, to avoid too much code in high-level functions.
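A hedged sketch of this suggestion: two thin wrappers around `FeatureConstruction.process` that select by lag status, so the high-level preprocessing code no longer builds the feature-name list inline. The `features` registry and the `is_lag` flag are assumptions about the internals, not the actual physXAI structure.

```python
import pandas as pd


class FeatureConstruction:
    # Assumed registry: name -> (construction function, is_lag flag)
    features: dict = {}

    @classmethod
    def process(cls, df: pd.DataFrame, feature_names=None) -> pd.DataFrame:
        # Construct the named features (all registered features if None)
        names = feature_names if feature_names is not None else list(cls.features)
        for name in names:
            func, _ = cls.features[name]
            df[name] = func(df)
        return df

    @classmethod
    def process_without_lags(cls, df: pd.DataFrame) -> pd.DataFrame:
        # Only non-lag features: safe to construct before resampling
        names = [n for n, (_, is_lag) in cls.features.items() if not is_lag]
        return cls.process(df, feature_names=names)

    @classmethod
    def process_only_lags(cls, df: pd.DataFrame) -> pd.DataFrame:
        # Only lag features: constructed after sampling to the time step
        names = [n for n, (_, is_lag) in cls.features.items() if is_lag]
        return cls.process(df, feature_names=names)
```

The call sites then shrink to `process_without_lags(df)` before sampling and `process_only_lags(df)` afterwards.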
```diff
-def __init__(self, inputs: list[str], output: Union[str, list[str]], shift: int = 1,
+def __init__(self, inputs: list[str], output: Union[str, list[str]], shift: Union[int, str, dict] = 'previous',
```
Can probably be removed if `sampling_method` is part of `Feature`.
```python
os.environ['TF_CPP_MIN_LOG_LEVEL'] = '0'
```
```python
def convert_shift_to_dict(s: Union[int, str, dict], inputs: list[str], custom_default: Union[int, str] = None) -> dict:
```
The function should (hopefully) get much easier if shift is an attribute of `Feature`.
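For reference, a minimal sketch of what `convert_shift_to_dict` appears to do, inferred from the surrounding diff: a scalar shift is broadcast to all inputs, and a partial dict is completed with the default. The real physXAI implementation may differ in details.

```python
from typing import Union


def convert_shift_to_dict(s: Union[int, str, dict], inputs: list,
                          custom_default: Union[int, str, None] = None) -> dict:
    # Assumed fallback when no custom default was extracted from '_default'
    default = custom_default if custom_default is not None else 'previous'
    if isinstance(s, dict):
        # Keep given entries, fill missing inputs with the default
        return {inp: s.get(inp, default) for inp in inputs}
    # Scalar int/str: the same shift applies to every input
    return {inp: s for inp in inputs}


# Usage
print(convert_shift_to_dict('previous', ['a', 'b']))
# {'a': 'previous', 'b': 'previous'}
print(convert_shift_to_dict({'a': 'current'}, ['a', 'b'], custom_default=0))
# {'a': 'current', 'b': 0}
```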
```python
if all('current' == self.shift[k] for k in inputs_without_lags):
    # filter / sample data
    X = self.filter_df_according_to_timestep(X)
    # nothing more to do here
elif all('previous' == self.shift[k] for k in inputs_without_lags):
    # filter / sample data
    X = self.filter_df_according_to_timestep(X)

    # shift data by 1 and shorten DataFrames accordingly
    X = X.shift(1)
    y = y.iloc[1:]
    X = X.iloc[1:]
elif all('mean_over_interval' == self.shift[k] for k in inputs_without_lags):
    X = get_mean_over_interval(y, X)
    # synchronize length between X and y
    y = y.iloc[1:]

else:  # different inputs have different shifts
```
Is this not a duplicate of the more general `else:  # different inputs have different shifts` branch?
Yes, in fact the else branch would do the same thing. The idea was that it might be faster in case all inputs have the same shift.
I see. I think the preprocessing is not really a performance bottleneck, so I would sacrifice a little performance for a more general implementation.
```python
if isinstance(shift, dict) and '_default' in shift.keys():
    self.shift_default = shift['_default']
    shift.__delitem__('_default')
```
This should get easier if `Feature` has a default `sampling_method` (as a class attribute, so it is changeable).
```python
df = df.loc[first_valid_index:last_valid_index]
if df.isnull().values.any():
```
```python
def get_mean_over_interval(y: pd.DataFrame, x: pd.DataFrame):
```
Is this similar to how the sampling is done in agentlib/agentlib-mpc?
The get_mean_over_interval function is mostly copied from agentlib-mpc/utils/sampling.py. However, small adaptations were necessary due to differing data structures.
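To make the discussion concrete, here is a hedged sketch of an interval-mean resampling in the spirit described above: each input value is replaced by the mean of all raw samples between consecutive output timestamps. This is an illustration under assumed data structures (numeric index, `y` on the coarser output grid), not the adapted agentlib-mpc code itself.

```python
import pandas as pd


def get_mean_over_interval(y: pd.DataFrame, x: pd.DataFrame) -> pd.DataFrame:
    rows = []
    idx = y.index
    for start, end in zip(idx[:-1], idx[1:]):
        # Mean of all x samples in the half-open interval [start, end)
        rows.append(x.loc[(x.index >= start) & (x.index < end)].mean())
    # Align the averaged inputs with the output timestamps from the interval ends
    return pd.DataFrame(rows, index=idx[1:])
```

Note that the result has one row fewer than `y`, which matches the `y = y.iloc[1:]` synchronization in the branch above.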
```python
y = y.iloc[:-self.shift]
X = X.iloc[:-self.shift]
```
```python
# Applies feature constructions defined in `FeatureConstruction` to the lagged inputs
FeatureConstruction.process(res_df, feature_names=lagged_inputs)
```
How is it handled if there is a constructed feature that is based on a lagged input?
before: only str allowed
deleted deprecated code and test for shift conversion
sampling_method of constructed features determined based on corresponding base_feature(s)
resetting FeatureConstruction.features also affected p_hp_data
Pull Request
Description
Restructure and individualise shift and add interpolation methods
Type of Change
Required Checklist
Testing
Examples
Compatibility
pyproject.toml (required dependencies or optional in [project.optional-dependencies])
Documentation
Optional
GitHub Copilot Review