Linear regression
August 21st, 2006 andris
We get a lot of questions about regression analysis, so we dug into the topic and wrote this post to help everyone with it.
You run a regression when you assume that one variable influences another, as in the following example: we assume that cars that run on diesel have higher costs.
To test this assumption, we run a Linear Regression in SPSS. Take the following steps:
- Define your dependent and independent variable. In our example Fuel is the independent variable and Costs is the dependent one.
- Click Analyze
- Go to Regression and click Linear
- Click “Fuel” into the Independent variable field, and “Costs” into the Dependent variable field.
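For readers who want to see what happens behind these menu clicks, here is a minimal sketch in Python of the same kind of fit. It is not what SPSS runs internally, just the textbook least-squares line. The data, the 0/1 coding of Fuel and the helper name `fit_line` are our own assumptions for illustration; they are not the values from the SPSS file.

```python
# Made-up data for illustration only (not the values from the SPSS file).
fuel = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]   # assumed coding: 0 = petrol, 1 = diesel
costs = [200, 220, 210, 190, 230, 300, 320, 310, 290, 330]


def fit_line(x, y):
    """Return (intercept, slope) of the least-squares line y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross products and sum of squares of x around their means
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


intercept, slope = fit_line(fuel, costs)
print(f"Costs = {intercept:.1f} + {slope:.1f} * Fuel")
```

Fuel is the independent variable (x) and Costs the dependent one (y), just like in the SPSS dialog.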
The output consists of:
1 Model Summary, in which you can find the relation between the variables.
R is the correlation coefficient and shows how strong the linear relation between the dependent and the independent variable is. The correlation between Fuel and Costs is 0.839.
R Square is the proportion of variance in the dependent variable (Costs) which can be predicted from the independent variable (Fuel). This value indicates that 70% of the variance in Costs can be predicted from the variable Fuel. The Adjusted R Square corrects R Square for the sample size and the number of independent variables, which gives a more conservative estimate for the whole population.
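As a quick check on the numbers above: 0.839 squared is about 0.70, which is where the 70% comes from. The sketch below shows how R, R Square and Adjusted R Square relate to each other. The data and the helper name `pearson_r` are made up for illustration, and the adjustment formula is the usual textbook one, which we assume matches what SPSS reports.

```python
import math

# Made-up data for illustration only (not the values from the SPSS file).
fuel = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]   # assumed coding: 0 = petrol, 1 = diesel
costs = [200, 220, 210, 190, 230, 300, 320, 310, 290, 330]


def pearson_r(x, y):
    """Pearson correlation between two equally long lists of numbers."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)


r = pearson_r(fuel, costs)
r_square = r ** 2

n = len(costs)
predictors = 1  # only Fuel
# Textbook adjustment: penalises R Square for small samples and many predictors
adjusted_r_square = 1 - (1 - r_square) * (n - 1) / (n - predictors - 1)

print(f"R = {r:.3f}, R Square = {r_square:.3f}, Adjusted R Square = {adjusted_r_square:.3f}")
```

With a single independent variable, R is simply the absolute value of the ordinary correlation between the two variables.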
2 ANOVA, which holds data about the significance of the regression model.
The value under Sig. is the p-value of the regression model as a whole. By convention the model is called significant when this value is below 0.05. In our example it is displayed as 0.00, which means the value is so small that it rounds to zero, not that it is exactly zero.
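To make the ANOVA table a bit less mysterious, here is a rough sketch of how its sums of squares and the F value can be computed by hand. The data are made up for illustration, and the variable names are our own. The Sig. value itself is the probability of finding an F value at least this large under the F distribution with (1, n - 2) degrees of freedom; Python's standard library has no F distribution, so we leave that step out.

```python
# Made-up data for illustration only (not the values from the SPSS file).
fuel = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]   # assumed coding: 0 = petrol, 1 = diesel
costs = [200, 220, 210, 190, 230, 300, 320, 310, 290, 330]

n = len(costs)
mean_x = sum(fuel) / n
mean_y = sum(costs) / n

# Least-squares line: Costs = intercept + slope * Fuel
slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(fuel, costs))
         / sum((x - mean_x) ** 2 for x in fuel))
intercept = mean_y - slope * mean_x
predicted = [intercept + slope * x for x in fuel]

# The three sums of squares of the ANOVA table
ss_regression = sum((p - mean_y) ** 2 for p in predicted)           # explained by the model
ss_residual = sum((y - p) ** 2 for y, p in zip(costs, predicted))   # left unexplained
ss_total = sum((y - mean_y) ** 2 for y in costs)                    # regression + residual

df_regression = 1     # one independent variable
df_residual = n - 2   # cases minus the two estimated coefficients

# F is the ratio of the two mean squares
f_value = (ss_regression / df_regression) / (ss_residual / df_residual)

print(f"SS regression = {ss_regression:.0f}, SS residual = {ss_residual:.0f}, "
      f"SS total = {ss_total:.0f}, F = {f_value:.1f}")
```

The residual sum of squares is the part of the variation in Costs that the regression line does not explain; the smaller it is compared to the regression sum of squares, the larger F becomes.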
3 Coefficients, which gives the estimated coefficients of the regression line: the constant (intercept) and the slope (B) for Fuel, each with its own significance value.
The conclusion is that this regression is significant and that 70% of the variance in Costs can be predicted from the variable Fuel.
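As an illustration of the B, Std. Error and t columns in the Coefficients table, the sketch below computes them with the standard textbook formulas for a regression with one independent variable. Again, the data and the 0/1 coding of Fuel are made up by us and are not the values from the SPSS file.

```python
import math

# Made-up data for illustration only (not the values from the SPSS file).
fuel = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]   # assumed coding: 0 = petrol, 1 = diesel
costs = [200, 220, 210, 190, 230, 300, 320, 310, 290, 330]

n = len(costs)
mean_x = sum(fuel) / n
mean_y = sum(costs) / n
sxx = sum((x - mean_x) ** 2 for x in fuel)
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(fuel, costs))

# B column: slope for Fuel and the constant (intercept)
slope = sxy / sxx
intercept = mean_y - slope * mean_x

# Mean squared error of the residuals, with n - 2 degrees of freedom
residuals = [y - (intercept + slope * x) for x, y in zip(fuel, costs)]
mse = sum(e ** 2 for e in residuals) / (n - 2)

# Std. Error column
se_slope = math.sqrt(mse / sxx)
se_intercept = math.sqrt(mse * (1 / n + mean_x ** 2 / sxx))

# t column: coefficient divided by its standard error
t_slope = slope / se_slope
t_intercept = intercept / se_intercept

print(f"(Constant): B = {intercept:.1f}, Std. Error = {se_intercept:.2f}, t = {t_intercept:.2f}")
print(f"Fuel:       B = {slope:.1f}, Std. Error = {se_slope:.2f}, t = {t_slope:.2f}")
```

With Fuel coded as 0 and 1, the constant is the mean cost of the cars coded 0 and B for Fuel is the difference in mean cost between the two groups.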
Please find below the SPSS file we used to create this example. Just one note: the data in the SPSS file is not based on anything real. In fact, it's just random data. Please don't sue us.
Linear Regression Example Cars
Entry Filed under: 3b. Heavy statistics with SPSS, Questions and answers
6 Comments
1. Franklin | September 18th, 2006 at 1:45 am
What does the residual sum of squares mean?
2. andris | September 18th, 2006 at 4:17 am
Franklin, see Wikipedia
3. Yves | November 21st, 2006 at 1:44 pm
This is not the right way to conduct a regression analysis. You have to make sure you have a look at the outliers, at the normal distribution, at the scatterplot zpred*zresid, and so on…
Only when you take these “problems” into account will your regression be valuable.
4. Dave in Canada | December 12th, 2006 at 3:52 pm
Andris’ explanation of linear regression is correct but very brief. Yves added the important step of checking the outliers. I want to remind users that Linear regression (the regression which Andris explains) only works if the dependent variable — “costs” in Andris’ example — is a continuous variable. The variable does not have to be perfectly continuous. Some say that you can have as few as seven possible levels of outcome in the dependent variable. But with fewer than that, the linear regression method begins to give misleading answers.
The alternative is in an add-on module from SPSS, called Regression Models. It has methods called “Binary Logistic” and “Multinomial Logistic” regression. Use these when the dependent variable is binary, like Yes/No or Passed/Failed; or when it is multinomial, like Low-Medium-High.
5. andris | December 13th, 2006 at 2:48 am
Yves, Dave,
Thank you for your additions! This makes the answer more complete.
6. tathagata ghosh | August 7th, 2008 at 4:48 am
How do I add constants (residual lines) to the graph?