The Time Series and Scatter Plot Charts within Flow can apply regression analysis to every measure configured on an Axis within the Chart.
Important to note: the regression analysis is only applied to the data that is loaded onto the chart. The calculation is done in the front end when rendering the chart.
Use this setting together with the period settings: the period determines which data is loaded onto the chart, and therefore which data the regression is fitted to and how far the line is projected.
In statistical modelling, regression analysis is a set of statistical processes for estimating the relationships between a dependent variable (often called the 'outcome' or 'response' variable, or a 'label' in machine learning parlance) and one or more independent variables (often called 'predictors', 'covariates', 'explanatory variables' or 'features').
The most common form of regression analysis is linear regression, in which one finds the line (or a more complex linear combination) that most closely fits the data according to a specific mathematical criterion.
Linear-regression models are relatively simple and provide an easy-to-interpret mathematical formula that can generate predictions.
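To make the "formula that can generate predictions" concrete, here is a minimal sketch of an ordinary least-squares line fit in plain Python. It is not Flow's implementation; the function name `fit_line` and the sample values are made up for illustration.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of squared deviations of x, and of the x/y cross-deviations
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical hourly values (hour index, bad units counted in that hour)
hours = [0, 1, 2, 3, 4]
bad_units = [12.0, 15.0, 11.0, 9.0, 8.0]

slope, intercept = fit_line(hours, bad_units)

# The fitted formula can be evaluated for an hour that has no data yet
predicted_hour_5 = slope * 5 + intercept
```

Once the slope and intercept are known, a prediction is just the line evaluated at a future x value, which is what the projected part of a regression line on a chart represents.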
The following types of regression model can be applied:
- Linear
- Exponential
- Logarithmic
- Power
- Polynomial
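As a rough illustration of how the non-linear model types relate to a straight-line fit, the sketch below fits the exponential, logarithmic and power forms by transforming the data and reusing a linear fit. The functional forms (y = a·e^(bx), y = a + b·ln x, y = a·x^b) and all helper names are assumptions for illustration; the source does not state which parameterisation or fitting method Flow uses. Polynomial fitting needs a different approach and is not covered here.

```python
import math


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def fit_exponential(xs, ys):
    """Assumed form y = a * exp(b * x); requires every y > 0."""
    b, ln_a = fit_line(xs, [math.log(y) for y in ys])
    return math.exp(ln_a), b


def fit_logarithmic(xs, ys):
    """Assumed form y = a + b * ln(x); requires every x > 0."""
    b, a = fit_line([math.log(x) for x in xs], ys)
    return a, b


def fit_power(xs, ys):
    """Assumed form y = a * x ** b; requires every x > 0 and y > 0."""
    b, ln_a = fit_line([math.log(x) for x in xs],
                       [math.log(y) for y in ys])
    return math.exp(ln_a), b


# Synthetic data generated from known parameters, so each fit can be checked
xs = [1.0, 2.0, 3.0, 4.0]
exp_a, exp_b = fit_exponential(xs, [2.0 * math.exp(0.5 * x) for x in xs])
log_a, log_b = fit_logarithmic(xs, [1.0 + 4.0 * math.log(x) for x in xs])
pow_a, pow_b = fit_power(xs, [3.0 * x ** 2 for x in xs])
```

Fitting in log space minimises the error of the transformed values, so on noisy data the result can differ from a direct non-linear least-squares fit. The domain restrictions also matter in practice: a measure with zero or negative values cannot be fitted with the exponential or power forms this way.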
Let's look at an example. We are calculating the total bad production on an hourly basis, based on a counter aggregation from a totaliser in the field.
Can we predict the total that we can expect at the end of the shift, or perhaps for the next hour?
I created an Hourly Time Series Chart and added the hourly bad production to one of the Axes. Now we need to configure the period setting correctly.
To predict the end of the shift, I'm going to change my interval type to Shift(s). I also want to display all the data until the end of the shift, so my period is set to "Start of Interval to End of Interval".
The first type of analysis we will perform is Linear Regression. Based on my period settings, only data from the beginning of the shift is used to fit a best-fit line.
As one can see from the green regression line, it seems that we stop having bad production before the end of the shift, which might not be realistic.
Let's add more historical data by changing the "Period Start Intervals" on our period setting to include the last 2 shifts as well:
With more data available, the best-fit line actually shows an increase in bad production, which might be a more realistic outcome of the regression analysis.
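The effect of the period setting on the fitted line can be sketched with made-up numbers. In the example below, all values, the 8-hour shift length and the helper name `fit_line` are hypothetical; hours are counted relative to the start of the current shift, so negative hours belong to the two previous shifts.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Hypothetical data: four hours of the current shift only
current_hours = [0, 1, 2, 3]
current_values = [10.0, 8.0, 7.0, 5.0]

# Hypothetical data: the two previous 8-hour shifts (hours -16 to -1)
history_hours = list(range(-16, 0))
history_values = [2.0, 2.0, 3.0, 3.0, 3.0, 4.0, 4.0, 5.0,
                  5.0, 6.0, 6.0, 7.0, 8.0, 9.0, 9.0, 10.0]

# Fit 1: current shift only. The line falls and reaches zero mid-shift.
slope_short, intercept_short = fit_line(current_hours, current_values)
zero_hour = -intercept_short / slope_short

# Fit 2: previous two shifts plus the current shift. The trend is upward.
slope_long, intercept_long = fit_line(history_hours + current_hours,
                                      history_values + current_values)
```

With only the four current-shift points the slope is negative and the line hits zero before hour 8, while the longer window produces a positive slope. Neither result is "correct" by itself; the point is that the loaded period decides what the regression sees.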
What if one wants to predict the next hour's bad production?
Again I changed my period setting to display the last 12 hours, and to show the end of the current hour:
A linear fit shows that the value will be higher than the previous hour's, but the line still has a negative gradient.
Let's see what a polynomial fit will do:
I had to experiment with the order (degree) setting until the line fitted my data best:
The outcome is a much better fit to the data:
It might predict the same value for the next hour as the linear regression, but it shows that there might be an upward trend in the bad production count for the next few hours.
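The comparison between a linear and a polynomial fit can be sketched as follows. This is a plain-Python least-squares polynomial fit using the normal equations, offered as an illustration only; it is not how Flow computes the line, and the 12 hourly values and the function names are hypothetical.

```python
def fit_polynomial(xs, ys, degree):
    """Least-squares polynomial fit.

    Returns coefficients c so that y ~ c[0] + c[1]*x + ... + c[degree]*x**degree.
    """
    size = degree + 1
    # Normal equations: a[i][j] = sum(x^(i+j)), b[i] = sum(y * x^i)
    a = [[sum(x ** (i + j) for x in xs) for j in range(size)]
         for i in range(size)]
    b = [sum(y * x ** i for x, y in zip(xs, ys)) for i in range(size)]
    # Gaussian elimination with partial pivoting
    for col in range(size):
        pivot = max(range(col, size), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        b[col], b[pivot] = b[pivot], b[col]
        for row in range(col + 1, size):
            factor = a[row][col] / a[col][col]
            for k in range(col, size):
                a[row][k] -= factor * a[col][k]
            b[row] -= factor * b[col]
    # Back substitution
    coeffs = [0.0] * size
    for row in reversed(range(size)):
        tail = sum(a[row][k] * coeffs[k] for k in range(row + 1, size))
        coeffs[row] = (b[row] - tail) / a[row][row]
    return coeffs


def evaluate(coeffs, x):
    """Evaluate the fitted polynomial at x."""
    return sum(c * x ** i for i, c in enumerate(coeffs))


def squared_error(coeffs, xs, ys):
    """Sum of squared residuals of the fit on the given data."""
    return sum((evaluate(coeffs, x) - y) ** 2 for x, y in zip(xs, ys))


# Hypothetical last 12 hours: values fall, flatten, then start to rise
hours = list(range(12))
values = [9.0, 7.0, 6.0, 5.0, 4.0, 4.0, 3.0, 3.0, 4.0, 4.0, 5.0, 6.0]

linear = fit_polynomial(hours, values, 1)
quadratic = fit_polynomial(hours, values, 2)

sse_linear = squared_error(linear, hours, values)
sse_quadratic = squared_error(quadratic, hours, values)

# Projected value for the next hour under each model
next_linear = evaluate(linear, 12)
next_quadratic = evaluate(quadratic, 12)
```

On this data the straight line has a negative gradient, while the degree-2 fit has a positive leading coefficient and a smaller squared error, so it follows the recent upturn. A higher degree always fits the loaded points at least as closely, which is worth keeping in mind when adjusting the order: a closer fit to past data does not by itself make the projection more trustworthy.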