Module B4: Basic Data Analysis Techniques

1. Reports

Procedures in the REPORT command group can provide all univariate statistics available in other procedures. In addition, computations involving aggregated statistics are directly accessible only in the REPORT procedures. Among these procedures, Codebook and OLAP Cubes, which are included with SPSS, are essential for education data analysts.

In SPSS Statistics, the first item under the ‘Analyze’ menu is ‘Reports’. The procedures in ‘Reports’ can provide all univariate statistics available in the procedures under ‘Descriptive Statistics’, as well as the sub-population means available in the Means procedure. Some statistics available in the report procedures, such as computations involving aggregated statistics, are not directly accessible in any other command procedures.

By default, ‘Reports’ provides output tables that can be used directly in presentations and documents (such as National EFA Monitoring Reports). A variety of table elements, such as column widths, titles, footnotes, and spacing, can also be customized. Because the procedure is flexible and the output has so many components, it is often efficient to preview sample output using a small number of cases until you find the format that best matches your needs.

The Reports group comprises the Codebook, OLAP Cubes, and Summarize procedures. The Summarize procedures are ‘Case Summaries’, ‘Report Summaries in Rows’ and ‘Report Summaries in Columns’.

Codebook

This procedure reports the dictionary information and summary statistics for all or specified variables and multiple response sets in the active data set.

Summarize procedure (or Case Summaries)

Case summaries produce subgroup statistics for variables that exist within categories of one or more grouping variables. All levels of grouping variables are cross-tabulated. Summary statistics for each variable are also displayed for all categories. The order in which the statistics are displayed can be chosen. Data values in each category can be listed or suppressed. With large data sets, either the first n cases or all cases can be listed.

Report Summaries in Rows produces reports in which various summary statistics are laid out in rows. Case listings are also available, with or without summary statistics.

Report Summaries in Columns produces summary reports in which different summary statistics appear in separate columns.

OLAP Cubes (Online Analytical Processing Cubes)

OLAP Cubes are used to calculate totals, means, and other univariate statistics for continuous summary variables within categories of one or more categorical grouping variables. A separate layer in the table is created for each category of each grouping variable.

1.1 Codebook

Codebook reports information, such as variable names, variable labels, value labels, and missing values from the data dictionary. It also provides summary statistics for all variables or specified variables and multiple response sets in the active data set.

The codebook can produce summary statistics, such as counts and percentages for nominal and ordinal variables, and for multiple response sets. Summary statistics, such as mean, standard deviation, and quartiles can be produced for scale variables. This makes the codebook a useful tool for preliminary analysis.
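The split between statistics for categorical and scale variables can be illustrated outside SPSS. The following is a minimal Python sketch, not the SPSS implementation: the function name `codebook_summary`, the toy values, and the choice to compute percentages over valid cases are all assumptions made for illustration, and the quartile method of Python's `statistics.quantiles` may differ from the one SPSS uses.

```python
from collections import Counter
from statistics import mean, quantiles, stdev


def codebook_summary(values, level):
    """Summarize one variable according to its measurement level."""
    valid = [v for v in values if v is not None]  # None stands in for a missing value
    if level in ("nominal", "ordinal"):
        counts = Counter(valid)
        # Count and percentage (of valid cases, an assumption) for each category.
        return {cat: (n, 100.0 * n / len(valid)) for cat, n in sorted(counts.items())}
    if level == "scale":
        q1, q2, q3 = quantiles(valid, n=4)  # quartile method may differ from SPSS
        return {"mean": mean(valid), "std_dev": stdev(valid),
                "percentile_25": q1, "percentile_50": q2, "percentile_75": q3}
    raise ValueError(f"unknown measurement level: {level}")


# Toy values invented for illustration, not taken from the sample data set.
household_size = [3, 4, 5, None, 6, 7]
residence = ["urban", "rural", "rural", "rural"]

scale_summary = codebook_summary(household_size, "scale")
nominal_summary = codebook_summary(residence, "nominal")
print(scale_summary)
print(nominal_summary)
```

Passing the same variable with a different `level` argument mirrors what happens when the measurement level is changed: the data stay the same, but a different set of statistics is reported.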

To obtain a codebook of the current data set:

  1. Click ‘Analyze’ on the main menu bar.
  2. Click ‘Reports’.
  3. Click ‘Codebook’. A new window will appear with a complete list of variables.
  4. Select the variables and send them to the ‘Codebook Variables’ pane. In the following exhibit, three variables with different measurement scales are chosen.
  5. Click ‘OK’ to proceed with the default settings for Output and Statistics.

The output table obtained by this procedure for the first variable (HV009 – number of household members) is displayed in the following exhibit.

Since ‘HV009 – Number of household members’ is a scale variable, the statistics produced for the variable are mean, standard deviation and three quartile values.

The other variables are categorical (‘HV024 – Division’ is nominal and ‘HV025 – Type of place of residence’ is ordinal), so only the count and percentage are provided as statistics for each valid value (category).

In the Codebook procedure, the measurement level of a variable can be changed temporarily by right-clicking the variable. The following example shows the measurement level of ‘HV270 – Wealth index’ being changed from ‘ordinal’ to ‘scale’. Remember that this change is temporary and applies only within the Codebook procedure.

The other options which are available within the Codebook command are:

The following output table is the codebook for ‘HV009 – Number of household members’ after changing three options:

(i)  changing the measurement level from ‘Scale’ to ‘Ordinal’;

(ii)  selecting ‘Measurement level’ and ‘Weight status’;

(iii)  displaying only ‘Percent’ in statistics.

1.2 Case Summaries: Listing Selected Cases

Occasionally we need to list selected cases with a limited number of variables to check the validity of data (error checking) and for reporting, printing and presenting data. We use Case Summaries for this task.

Case Summaries, which is located under Reports, is used to filter and list cases that meet specified criteria. We could, for example, use Case Summaries to list the age, sex and highest level of education of 20 out-of-school children from the lowest socio-economic status who are aged 6-14.

Note that the data set must be limited to the household members aged 6-14 who are out of school (use ‘Select Cases’), and sorted in ascending order by ‘Wealth index factor score’ (use ‘Sort Cases’) before the case summaries function can be used.

The following exhibits show the preparatory steps that must be completed before executing Case Summaries.

After completing these preparatory tasks, the SPSS ‘Data Editor’ shows the ‘Out_of_School_6_to_14’ data set with the selected cases sorted in ascending order of ‘HV271 – Wealth index factor score’. The original sample data set contains 53,413 cases, but the filtered data set contains only 974 cases, which are the records for children aged 6-14 who are not attending school.
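The logic of the two preparatory tasks (selecting cases, then sorting them) can be sketched in Python. This is only an illustration under stated assumptions: the field names `age`, `attending_school` and `wealth_score` and the toy records are hypothetical and are not the variable names or values of the sample data set.

```python
# Toy records invented for illustration; field names are hypothetical.
members = [
    {"id": 1, "age": 8, "attending_school": False, "wealth_score": -0.52},
    {"id": 2, "age": 10, "attending_school": True, "wealth_score": -1.10},
    {"id": 3, "age": 16, "attending_school": False, "wealth_score": -0.90},
    {"id": 4, "age": 13, "attending_school": False, "wealth_score": -1.35},
    {"id": 5, "age": 6, "attending_school": False, "wealth_score": 0.40},
]


def out_of_school_children(rows, min_age=6, max_age=14):
    """Keep out-of-school members in the age range, poorest first."""
    # Equivalent in spirit to 'Select Cases': keep rows meeting both conditions.
    selected = [
        r for r in rows
        if min_age <= r["age"] <= max_age and not r["attending_school"]
    ]
    # Equivalent in spirit to 'Sort Cases': ascending wealth score.
    return sorted(selected, key=lambda r: r["wealth_score"])


prepared = out_of_school_children(members)
prepared_ids = [r["id"] for r in prepared]
print(prepared_ids)
```

Because the rows are sorted in ascending order of the wealth score, taking the first cases of the prepared list yields the children from the lowest socio-economic status.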

Once the preparation work is complete, the following steps are used to produce data summaries in the ‘Case Summaries’ command.

  1. Click ‘Analyze’ on the main menu bar.
  2. Click ‘Reports’.
  3. Click ‘Case Summaries’. A new window will appear with a complete list of variables in the current data set.
  4. In the ‘Case Summaries’ window, select the variables in the desired sequence.
  5. Set the number of cases to display in the ‘Limit cases to first’ box. In this example, we will limit the results to the first 20 cases.
  6. Click the ‘OK’ button to create a case summary report.

The output table of the procedure is as follows.

The table below was copied from SPSS Viewer and pasted directly into MS Word. Once in MS Word, MS Word’s table formatting functions can be used to lay out and give style to the table.

The following output table shows the same list of 20 out-of-school children, grouped by ‘Division’.

Report Summaries in Rows produces reports in which various summary statistics are laid out in rows. Case listings are also available, with or without summary statistics. Similarly, Report Summaries in Columns produces summary reports in which various summary statistics are listed in separate columns.

The outputs of both these commands are in text format and, therefore, pivot table techniques cannot be used with them. All these outputs can be created from the Case Summaries command described above.

The following table shows summary statistics obtained from the Case Summaries command without individual cases displayed (this is achieved by clearing the ‘Display cases’ checkbox in the ‘Summarize Cases’ dialog box). The variable summarized is ‘HV108 – Education in single year’, the number of years effectively studied by a household member. The report provides the following statistics:

(i)  The number of cases.

(ii)  The mean years of study (average of HV108).

(iii)  The standard error of the mean.

(iv)  The median years of study.

These statistics are reported by:

  1. Sex.
  2. Residence.
  3. District.

Individual cases are not listed.

[Output table ‘Case Summaries’ for ‘Education in single years’]
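To make the four statistics concrete, the following Python sketch computes the number of cases, mean, standard error of the mean, and median within each category of one grouping variable. It is an illustration rather than the SPSS procedure: the function name `summarize_by`, the field names, and the toy values are assumptions, and the standard error is taken as the sample standard deviation divided by the square root of the number of cases.

```python
from collections import defaultdict
from math import sqrt
from statistics import mean, median, stdev


def summarize_by(rows, group_key, value_key):
    """Return N, mean, standard error of the mean and median per category."""
    groups = defaultdict(list)
    for row in rows:
        if row[value_key] is not None:  # skip missing values
            groups[row[group_key]].append(row[value_key])
    report = {}
    for category, values in sorted(groups.items()):
        n = len(values)
        # The standard error needs at least two cases.
        se = stdev(values) / sqrt(n) if n > 1 else None
        report[category] = {
            "n": n,
            "mean": mean(values),
            "se_mean": se,
            "median": median(values),
        }
    return report


# Toy values invented for illustration, not taken from the sample data set.
rows = [
    {"sex": "male", "years": 0},
    {"sex": "male", "years": 2},
    {"sex": "male", "years": 4},
    {"sex": "female", "years": 1},
    {"sex": "female", "years": 3},
]

report = summarize_by(rows, "sex", "years")
for category, stats in report.items():
    print(category, stats)
```

Calling the same function with a different `group_key` gives the breakdown by another grouping variable, and no individual cases are listed because only the aggregated values are returned.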

1.3 OLAP Cubes (Online Analytical Processing Cubes)

The OLAP Cubes procedure can produce a variety of summary statistics for summary variables within categories of one or more grouping variables.

OLAP Cubes create a separate layer for each category of every grouping variable in the table. The summary variables are quantitative (continuous variables measured on an interval or ratio scale), and the grouping variables are categorical. The values of categorical variables can be numeric or string.

OLAP Cubes provide a wide variety of summary statistics, such as sum, number of cases, mean, median, grouped median, standard error of the mean, minimum, maximum, range, variable value of the first category of the grouping variable, variable value of the last category of the grouping variable, standard deviation, variance, kurtosis, standard error of kurtosis, skewness, standard error of skewness, percentage of total cases, percentage of total sum, percentage of total cases within grouping variables, percentage of total sum within grouping variables, geometric mean, and harmonic mean.
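Some of the less familiar statistics in this list can be illustrated with a short Python sketch. It is not the SPSS implementation: the dictionary `years_by_residence` and its toy values are assumptions, and the values are strictly positive because the geometric and harmonic means are defined for positive numbers.

```python
from statistics import geometric_mean, harmonic_mean

# Toy values invented for illustration; keys are categories of one grouping variable.
years_by_residence = {"urban": [2, 8], "rural": [1, 4]}

total_n = sum(len(v) for v in years_by_residence.values())
total_sum = sum(sum(v) for v in years_by_residence.values())

shares = {}
for category, values in years_by_residence.items():
    shares[category] = {
        # Share of all cases that fall in this category.
        "pct_of_total_n": 100.0 * len(values) / total_n,
        # Share of the overall sum contributed by this category.
        "pct_of_total_sum": 100.0 * sum(values) / total_sum,
        "geometric_mean": geometric_mean(values),
        "harmonic_mean": harmonic_mean(values),
    }

for category, stats in shares.items():
    print(category, stats)
```

The two percentages answer different questions: a category can hold half of the cases while contributing much more (or less) than half of the total sum.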

Some of the optional subgroup statistics, such as the mean and standard deviation, are based on normal theory and are appropriate for quantitative variables with symmetric distributions. The OLAP Cubes procedure uses pivot table techniques, but offers specific statistics and output options that cannot be obtained from other procedures such as cross-tabulation.

Example: OLAP Cubes

Among the variables in the sample data set, ‘HV108 – Education in single year’ is the only education-related continuous (interval or ratio scale) variable. Since summary variables must be continuous, HV108 is selected as the ‘Summary’ variable in this example. The following exhibits demonstrate how OLAP Cubes can be used to explore the average number of years of study of adult household members, using four grouping variables: Sex, Age Group, Residence, and Division.

Before using the OLAP Cubes procedure, the adult household members (aged 15 and above) must be selected using ‘Select Cases’.

After selecting only adults:

  1. Click ‘Analyze’ on the main menu bar.
  2. Click ‘Reports’.
  3. Click ‘OLAP Cubes’. A new window will appear with a complete list of variables.
  4. In the ‘OLAP Cubes’ window, select the Summary and Grouping variables as planned.
  5. Click ‘Statistics’ to set the desired summary statistics:
    1. By default, six summary statistics are selected (these can be left as they are).
    2. Any statistic that is not selected can be selected by double-clicking it. Any statistic that is selected can be unselected by double-clicking it.
    3. Click ‘Continue’ when the selection of summary statistics is complete.

  6. Click the ‘Differences’ button to compute absolute or percentage differences for all measures selected in the Statistics dialog box. This step is optional.

The ‘Differences’ dialog box allows calculation of percentage or absolute differences:

  • Differences between Variables calculates the difference between pairs of variables. At least two summary variables must be selected before the difference between two variables can be calculated.
  • Differences between Groups of Cases calculates the differences between pairs of groups that are defined by a grouping variable. One or more grouping variables must be selected in the main dialog box before the difference between groups can be calculated.
  • The differences are calculated between summary statistic values by subtracting the value of the ‘Minus’ variable or category from the value of the first in the pair. Percentage differences use the value of the summary statistic of the second (the ‘Minus’) as the denominator.

  7. Click the ‘Title’ button to create custom table titles. This step is optional. The title of an output table or a caption (added below the table) can be specified in this step. If the title or caption extends over more than one line, insert \n where the text should wrap (a line break in the text). Enter an appropriate title and caption, and click the ‘Continue’ button when complete.

  8. Click the ‘OK’ button in the ‘OLAP Cubes’ window to create an OLAP cube with the specified options.
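The difference rules described for the ‘Differences’ dialog box can be written out as a short Python sketch. The function names and the two toy mean values are assumptions made for illustration; only the arithmetic (first minus ‘Minus’, with the ‘Minus’ value as the denominator for percentages) comes from the description above.

```python
def absolute_difference(first, minus):
    """Value of the first item in the pair minus the value of the 'Minus' item."""
    return first - minus


def percentage_difference(first, minus):
    """Absolute difference expressed as a percentage of the 'Minus' value."""
    return 100.0 * (first - minus) / minus


# Toy summary statistics invented for illustration (e.g. two group means).
mean_first_group = 5.0
mean_minus_group = 4.0

abs_diff = absolute_difference(mean_first_group, mean_minus_group)
pct_diff = percentage_difference(mean_first_group, mean_minus_group)
print(abs_diff, pct_diff)
```

Swapping the two groups changes both the sign of the difference and the denominator, so the percentage difference is not simply negated.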

After creating the OLAP Cube, the following output will be placed in the Output Viewer. The default output provides three summary statistics: the number of cases (N), the mean, and the standard error of the mean for ‘HV108 – Education in single year’ for all the valid cases in the sample.

Although this table looks plain, we can select a different category for each of the grouping variables, just as in pivot tables. To do this, double-click the table in the Output Viewer, click a dropdown box and select a category from the list. The following exhibit shows the statistics for males aged 15-29 who live in urban areas.
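Choosing a category in each dropdown amounts to restricting the cases to one combination of grouping categories and recomputing the statistics. The following Python sketch illustrates that idea under stated assumptions: the function name `cube_cell`, the field names, and the toy records are hypothetical, and a grouping variable left out of the selection is treated as ‘Total’.

```python
from math import sqrt
from statistics import mean, stdev


def cube_cell(rows, value_key, **selection):
    """N, mean and standard error of the mean for one combination of categories.

    Grouping variables omitted from `selection` are left at 'Total'.
    """
    values = [
        r[value_key] for r in rows
        if r[value_key] is not None
        and all(r[var] == category for var, category in selection.items())
    ]
    n = len(values)
    return {
        "n": n,
        "mean": mean(values) if n else None,
        "se_mean": stdev(values) / sqrt(n) if n > 1 else None,
    }


# Toy records invented for illustration, not taken from the sample data set.
adults = [
    {"sex": "male", "age_group": "15-29", "residence": "urban", "years": 8},
    {"sex": "male", "age_group": "15-29", "residence": "urban", "years": 10},
    {"sex": "male", "age_group": "30-44", "residence": "rural", "years": 4},
    {"sex": "female", "age_group": "15-29", "residence": "urban", "years": 9},
    {"sex": "female", "age_group": "30-44", "residence": "rural", "years": 2},
    {"sex": "male", "age_group": "15-29", "residence": "rural", "years": 6},
]

total_layer = cube_cell(adults, "years")
young_urban_males = cube_cell(
    adults, "years", sex="male", age_group="15-29", residence="urban"
)
print(total_layer)
print(young_urban_males)
```

The first call corresponds to the initial table for all valid cases; the second corresponds to picking one category in each of three dropdowns.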

Again, the output table can be pivoted to make it more attractive, as follows:
