W18 SPSS Workshop: Getting Comfortable with your Data

Before we start any statistical analysis, we should take a step back and get familiar and comfortable with our data, “playing” around with it to make sure we know what’s in there.  This may sound funny, but getting comfortable with your data by running descriptive statistics really does two things:  first, you understand what’s been collected and how; second, you get the opportunity to review the data and find any errors in it.  Sometimes you may find an extra 1 added to the front of a number, or maybe a 6 instead of a 9, or any combination of data entry errors.  By playing around with your data and getting comfortable with it before running your analysis, you may find some of these anomalies.

For this workshop, we will use a fictitious dataset of 25 samples of woodchips, recording the weight of each sample and a quality score for the woodchips within it.  Please download the dataset here.  Once you have downloaded the Excel file, open it in SPSS.

My goals for this session are to review the use of the Descriptive Statistics in SPSS and some file information.

DATA FILE INFORMATION

When you receive a file from a colleague, labmate, website, or repository, it is often very handy to take a look at the Data File Information to get a sense of what is contained in the file.  To do this, follow these steps:

  • File
    • Display Data File Information
    • Working File – which is the file that is currently open in SPSS

The data file information will now be available in the SPSS Statistics Viewer.  Notice that the information is very similar to what we see in the Variable View, with the exception of the last 2 columns:  Print Format and Write Format.  These two columns show us the internal formatting of the variables.  Note that, for each variable, the two formats are and should be the same.  The PRINT format is the format used to display the variable in output.  To change either format you will need to use the FORMATS command.  For more information on this please visit this page on the IBM Knowledge Center.

If any value labels are set up in the dataset, the data file information will provide you with a small table with the values and their respective labels.  To test this out, add the following value labels to the Quality variable:

1 = Low Quality
2 = Regular Quality
3 = High Quality
4 = Exceptional Quality

Once you’ve added these to your dataset, save it on your computer, and try running the Data File Information again to see how the output changes.

Descriptive Statistics

Descriptive statistics are essentially that – they describe your data, or they summarize your data to give you a good, solid base understanding of what you have collected.  The type of descriptive statistics you will conduct will depend on the type of variable you have.  Remember the 3 types of variables that SPSS distinguishes between?

  • Scale – a continuous piece of information, also referred to as Interval or Ratio.  Examples: age, weight, height
  • Nominal – a categorical piece of data – there is NO relationship between the categories.  Examples:  religion, colour, gender
  • Ordinal – a categorical piece of data – this time there is a relationship or order to the categories.  Examples:  year of study, age group, Likert scales

Each of these data types will use a different type of descriptive statistic.  For instance, calculating the mean of colour makes no sense at all, but a frequency count of colour does work.

Frequency

To calculate the frequency of a categorical variable (nominal OR ordinal) in SPSS:

  • Analyze
  • Descriptive Statistics
  • Frequencies
    • Select the variables in question and drag to the right hand side
      • As an example, select Quality
    • Click OK to run

You should now have a frequency table of the variable Quality.

The first column lists the categories of the variable.  If you had not provided the value labels, you would see 1; 2; 3; 4 as the categories with no explanation as to what they represent.

The table lists four statistics.  Frequency is the actual count of observations in each category.  Percent is the percentage of all observations, including any missing ones, that fall in each category.  Valid Percent is the percentage of observations with a non-missing value for Quality that fall in each category, so it differs from Percent only if you have missing observations.  Cumulative Percent is the running total of the Valid Percent as you move down the table.
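If you would like to see where these columns come from, here is a minimal Python sketch (outside SPSS, standard library only) that rebuilds the table from the Quality counts reported in this workshop.  It assumes there are no missing observations, so Percent and Valid Percent are the same number; the function name `frequency_table` is just a label chosen for this example.

```python
# Counts per Quality category, as reported in this workshop's frequency output.
counts = {
    "Low Quality": 5,
    "Regular Quality": 6,
    "High Quality": 8,
    "Exceptional Quality": 6,
}


def frequency_table(counts):
    """Return rows of (category, frequency, percent, cumulative percent).

    Assumes no missing observations, so Percent equals Valid Percent.
    """
    total = sum(counts.values())
    rows = []
    running = 0
    for category, n in counts.items():
        running += n  # running count used for the cumulative column
        rows.append((category, n, 100 * n / total, 100 * running / total))
    return rows


table = frequency_table(counts)
for category, n, pct, cum in table:
    print(f"{category:<20} {n:>3} {pct:>6.1f} {cum:>6.1f}")
```

The last row's cumulative percent should always reach 100, which is a quick sanity check on any frequency table.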

Mode

Mode is the value in the data that appears most often.  When you run the frequency you have a table that shows you the 4 levels of wood quality:

  • Low Quality = 5
  • Regular Quality = 6
  • High Quality = 8
  • Exceptional Quality = 6

By looking at these results I can see that High Quality appears to be the category that was selected the most.  But let’s get SPSS to do the hard work for us and confirm whether this is correct or not.

To obtain the MODE of a variable:

  • Analyze
  • Descriptive Statistics
  • Frequencies
    • Select the variables in question and drag to the right hand side
    • Click on the Statistics button on the right
      • Select Mode
      • Click Continue
      • Click OK

You should now see the Mode in the first table of the Frequency output.
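As a cross-check outside SPSS, the sketch below rebuilds the 25 Quality codes from the counts above and asks Python for the mode.  It is only an illustration: the codes 1 to 4 and their labels are the ones we added to the Quality variable, and `multimode` is used because it returns every tied value if there is more than one mode.

```python
from statistics import multimode

# Frequency counts from the workshop output, keyed by Quality code.
counts = {1: 5, 2: 6, 3: 8, 4: 6}
labels = {
    1: "Low Quality",
    2: "Regular Quality",
    3: "High Quality",
    4: "Exceptional Quality",
}

# Expand the counts back into one code per sample (25 values in total).
quality = [code for code, n in counts.items() for _ in range(n)]

# multimode returns a list containing all of the most frequent values.
modes = multimode(quality)
print([labels[m] for m in modes])
```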

Median

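The median of a variable is the middle value when the observations are sorted in order.  If you have an odd number of observations, the median is the single middle observation; if you have an even number, it is the average of the two middle observations.  Because it relies on ordering, the median makes sense for ordinal and scale variables, but not for nominal ones.  In our 25 woodchip samples, the median is the 13th sorted observation, which falls in the High Quality category.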

To obtain the MEDIAN in SPSS, follow the same instructions as the MODE, but select the MEDIAN in the Statistics dialogue box.
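To see how the middle value is picked, here is a small Python sketch using the same Quality counts as the frequency table.  It also shows what happens with an even number of observations; the short list `[1, 2, 3, 4]` is made up purely for that illustration.

```python
from statistics import median

# Rebuild the 25 Quality codes from the workshop's frequency counts.
counts = {1: 5, 2: 6, 3: 8, 4: 6}
quality = [code for code, n in counts.items() for _ in range(n)]

# Odd number of observations: the median is the single middle value
# (the 13th of 25 once the values are sorted).
odd_median = median(quality)

# Even number of observations (hypothetical values): the median is the
# average of the two middle values.
even_median = median([1, 2, 3, 4])

print(odd_median, even_median)
```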

Mean

The mean or average is calculated on a scale variable or continuous variable.  It just doesn’t make sense to calculate the mean of a categorical variable.

To obtain the MEAN in SPSS:

  • Analyze
  • Descriptive Statistics
  • Descriptives
    • Select the variable in question and drag to the right hand side
      • Click OK to run

You should now have a table with N, Minimum, Maximum, Mean, and Standard Deviation for the variable you selected, for example the woodchip weight.  These are the default values you obtain when you run this analysis.  But what happens if you want the Sum or the Standard Error of this variable?

  • Analyze
  • Descriptive Statistics
  • Descriptives
    • Select the variable in question and drag to the right hand side
    • Select the Options button – this will open another dialogue box that has a list of statistics to select from
      • Select Sum and S.E. mean (standard error of the mean)
    • Click Continue
    • Click OK to run

Your output table will now contain these added statistics.
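If you are curious how these statistics relate to one another, the sketch below computes them by hand in Python.  The five weights are hypothetical values invented for this example, not the workshop dataset.  The standard deviation here is the sample standard deviation (dividing by n - 1), and the standard error of the mean is that standard deviation divided by the square root of n.

```python
from math import sqrt
from statistics import mean, stdev

# Hypothetical woodchip weights, made up for illustration only.
weights = [12.0, 15.0, 11.0, 14.0, 13.0]

n = len(weights)
total = sum(weights)   # Sum
avg = mean(weights)    # Mean
sd = stdev(weights)    # sample standard deviation (n - 1 in the denominator)
se = sd / sqrt(n)      # standard error of the mean

print(f"N={n} Min={min(weights)} Max={max(weights)} Sum={total}")
print(f"Mean={avg:.2f} SD={sd:.3f} SE={se:.3f}")
```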

Explore Function in SPSS

Sometimes you may want to determine the mean woodchip weight by quality category, or by another categorical variable.  Until now, we’ve been looking at the entire dataset.  There are a few ways to do this, but the most direct way is to use the Explore function in SPSS.

  • Analyze
  • Descriptive Statistics
  • Explore
    • In the Dependent List box, add the variables for which you would like to calculate the means
    • In the Factor List box, add the variable by which you would like to group the means – for example: Quality
    • Click OK to run.

You will now see a much larger table than we have seen to date.  SPSS provides you with a long list of descriptive statistics for wood chip weight by each quality category.

You will also see a Stem and Leaf plot along with a Boxplot to provide you with a sense of the distribution of the data.  This is more information to help you get a better feeling for the data that you are working with.
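The core idea behind Explore, splitting one variable by the categories of another and summarizing each group, can be sketched in a few lines of Python.  The (quality, weight) pairs below are hypothetical values invented for this example, and the sketch only computes the group means, not the full list of statistics or the plots that SPSS gives you.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical (quality code, weight) pairs, made up for illustration only.
samples = [
    (1, 10.0), (1, 12.0),
    (2, 13.0), (2, 15.0),
    (3, 16.0), (3, 18.0),
    (4, 20.0),
]

# Collect the weights that belong to each quality category.
groups = defaultdict(list)
for quality, weight in samples:
    groups[quality].append(weight)

# Mean weight within each category, in order of the quality code.
group_means = {quality: mean(ws) for quality, ws in sorted(groups.items())}
print(group_means)
```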

Summary

The common descriptive statistics include: frequency, median, mode, mean, and measures of variation (standard deviation, standard error, etc.).  Each of these statistics should be run on the appropriate type of data.  Keep in mind that a frequency on a scale variable such as age will give you a long table with one row per distinct value, which tells you very little.

SPSS OUTPUT WINDOW

As we’ve been working along, you’ve already noticed that all the output or results can be found in a second window – referred to as the SPSS Statistics Viewer window.  If you want to save your work here, using the File -> Save or Save As option will save the entire output window as an .SPV file which is an SPSS format.  This means that if you want to re-open this file you must have SPSS installed on your computer.

If you only want to save a table or a chart, you have a couple of options:

  1. Export the parts you want to save as a Word, Excel, or PDF document, among a few other options.  To accomplish this, follow these steps:
    • Select the tables and graphs that you want to export
    • File
      • Export… you should see a new dialogue box open.
        • At the top, ensure that you select “Selected”.  If you leave it as the default ALL, you will be exporting everything in the SPSS output window including the Notes for each analysis.
        • Select the Type of Document you wish to export to – PDF, Excel, etc…
        • Select the location and name for the file you will be exporting in the File Name box
        • Click OK to run
        • This will result in a new file in the location you set out – with the SPSS results you selected.
  2. Copy and Paste
    • This is probably the easiest way to save the tables or charts you want.  On a WINDOWS computer, simply select the table or chart, Copy (either by using the Menubar option or Ctrl-C), move to the document you want to paste the results into (Word, PPT, Excel, etc.) and Paste (either by using the Menubar option or Ctrl-V).
    • On a MAC, you will need to use the Menubar option and select Copy Special and check Image.  Move to the document where you want the selected table or graph, and Paste (either by using the Menubar option or Cmd-V).
