Are you saying you have FDS simulation data from multiple simulations where the devc.csv files have different timesteps or different numbers of rows? Or is this data from an experiment? Why does it matter that some test scenarios have more samples? What are you doing with the data that makes this important?
Hi folks,
I have some time-series data recorded from fire and carbon monoxide detectors for different test case scenarios.
The issue is that the dataset is imbalanced: some test case scenarios have more samples than others.
A naive way to balance it would be to upsample the minority test cases, but I'm worried this could bias the data or create unnecessary "artificial" correlations, and I would like to avoid both.
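For reference, the naive upsampling described above amounts to random oversampling with replacement. A minimal sketch, assuming each recording is a list of detector readings tagged with its scenario name (the function `random_oversample` and the scenario names are hypothetical):

```python
import random
from collections import Counter

def random_oversample(recordings, labels, seed=0):
    """Naive upsampling: duplicate randomly chosen recordings of each minority
    scenario until every scenario has as many recordings as the largest one."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_x, out_y = list(recordings), list(labels)
    for scenario, n in counts.items():
        # All recordings that belong to this scenario.
        pool = [r for r, y in zip(recordings, labels) if y == scenario]
        for _ in range(target - n):
            out_x.append(rng.choice(pool))  # exact copy: adds no new information
            out_y.append(scenario)
    return out_x, out_y

# Hypothetical example: three "smoldering" recordings, one "flaming" recording.
recs = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6], [9.0, 9.1]]
labels = ["smoldering", "smoldering", "smoldering", "flaming"]
x, y = random_oversample(recs, labels)
```

Every added "flaming" recording is an identical copy of the single original, which is exactly where the worry about artificial correlations comes from: a model can memorize that one recording. If this is used at all, the duplication should happen only on the training split, after splitting, so that copies do not leak into validation or test data.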
That's why I thought I'd ask here whether folks have better ideas or know the current state-of-the-art methods for generating synthetic data for the test case scenarios with few samples.
Ideally, such a method would generate synthetic data similar to real recordings without introducing unnecessary correlations or hidden biases.
If you know of any, or better yet, if you've found yourself in a similar situation, please let me know how you resolved it and what you used. Thanks 🙏