2. Introduction to SPSS (PASW) Statistics
2.1 What is SPSS (PASW) Statistics?
2.1.1 Brief History
In 1968 at Stanford University, Norman H. Nie, a social scientist and doctoral candidate, C. Hadlai (Tex) Hull, who had just completed a master of business administration, and Dale H. Bent, a doctoral candidate in operations research, developed a software system based on the idea of using statistics to turn raw data into information that is essential for decision-making. This statistical software system was called SPSS, the Statistical Package for the Social Sciences. This software is the root of present-day PASW, the Predictive Analytics Software.
Nie, Hull and Bent developed SPSS because they needed to quickly analyse volumes of social science data gathered through various methods of research. Nie represented the target audience and set the requirements; Bent had the analysis expertise and designed the SPSS system file structure; and Hull wrote the programs. The initial work on SPSS was done at Stanford University with the intention that it would only be used within the university. With the launch of the SPSS user’s manual in 1970, however, the demand for SPSS software expanded. The original SPSS user’s manual has been described as “Sociology’s most influential book”. Because of its growing demand and popularity, a commercial entity, SPSS Inc., was formed in 1975. Up to the mid-1980s SPSS was only available for mainframe computers.
With advances in personal computers in the early 1980s, SPSS/PC was introduced in 1984 as the first statistical package for PCs running MS-DOS. Similarly, SPSS became the first statistical product for the Microsoft Windows operating system when a version for Windows 3.1 was launched in 1992.
Since then, SPSS has been regularly updated to exploit the advanced features of new operating systems and to meet the growing needs of its users.
2.1.2 SPSS Users
In the beginning, most users of SPSS were academic researchers based in large universities with mainframe computers. Because of its very high price, its tough security systems and its difficult user interface, few individuals or organisations used SPSS. SPSS did not become popular among researchers until the early personal computer version, SPSS/PC+, was released.
Once the Windows version was launched, however, the use of SPSS increased rapidly because it was user-friendly and was easy to acquire (users could download a fully functional evaluation version with a specified trial period).
Moreover, the cost of obtaining an SPSS license is minimal for students, and it is within a reasonable price range for members of corporations/organisations, although it is still expensive for general users. Many market researchers, health researchers, survey companies, government and education researchers use SPSS.
2.1.3 Strengths of SPSS
In addition to superb statistical analysis, SPSS offers good data management (case selection, file reshaping, creating derived data) and data documentation (a metadata dictionary is stored with the data). SPSS data files are portable (smaller in size compared to other database systems) and its program (SPSS syntax) files are quite small.
2.1.4 Organization of SPSS Statistics Software Package
SPSS has a base system with additional optional components or modules. Most of the optional components are add-ons to the base system. However, some optional components, such as the Data Entry component, work independently.
The base system, the main component for running SPSS, has the following functions:
- Data handling and manipulation: importing from and exporting to other data file formats, such as Excel, dBase, SQL and Access; and allowing sampling, sorting, ranking, subsetting, merging, and aggregating the data sets.
- Basic statistics and summarisation: Codebook, Frequencies, Descriptive statistics, Explore, Crosstabs, Ratio statistics, Tables.
- Significance testing: Means, t-test, ANOVA, Correlation (bivariate, partial, distances) and Nonparametric tests.
- Inferential statistics: Linear and non-linear regression, Factor, Cluster and Discriminant analysis.
Some of the optional components (add-on modules) available are:
- Data Preparation provides a quick visual snapshot of the data. It provides the ability to apply validation rules that identify invalid data values. You can create rules that flag out-of-range values, missing values, or blank values. You can also save variables that record individual rule violations and the total number of rule violations per case. A limited set of predefined rules that you can copy or modify is provided.
- Missing Values describes patterns of missing data, estimates means and other statistics, and imputes values for missing observations.
- Complex Samples allows survey, market, health, and public opinion researchers, as well as social scientists who use sample survey methodology, to incorporate complex sample designs into data analysis.
- Regression provides techniques for analysing data that do not fit traditional linear statistical models. It includes procedures for probit analysis, logistic regression, weight estimation, two-stage least-squares regression, and general nonlinear regression.
- Advanced Statistics focuses on techniques often used in sophisticated experimental and biomedical research. It includes procedures for general linear models (GLM), linear mixed models, variance components analysis, log-linear analysis, ordinal regression, actuarial life tables, Kaplan-Meier survival analysis, and basic and extended Cox regression.
- Custom Tables creates a variety of presentation-quality tabular reports, including complex stub-and-banner tables and displays of multiple response data.
- Forecasting performs comprehensive forecasting and time series analyses with multiple curve-fitting models, smoothing models, and methods for estimating autoregressive functions.
- Categories performs optimal scaling procedures, including correspondence analysis.
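- Conjoint provides a realistic way to measure how individual product attributes affect consumer preferences. It measures the trade-off effect of each product attribute in the context of a set of product attributes, as consumers do when making purchasing decisions.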
- Exact Tests calculates exact p values for statistical tests when small or very unevenly distributed samples could make the usual tests inaccurate. Available only on Windows OS.
- Decision Trees creates a tree-based classification model. It classifies cases into groups or predicts values of a dependent (target) variable based on values of independent (predictor) variables. The procedure provides validation tools for exploratory and confirmatory classification analysis.
- Neural Networks can be used to make business decisions by forecasting demand for a product as a function of price and other variables, or by categorising customers based on buying habits and demographic characteristics. Neural networks are non-linear data modelling tools. They can be used to model complex relationships between inputs and outputs or to find patterns in data.
- EZ RFM performs RFM (recency, frequency, monetary) analysis on transaction data files and customer data files.
- Amos™ (analysis of moment structures) uses structural equation modelling to confirm and explain conceptual models that involve attitudes, perceptions, and other factors that drive behaviour.
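The validation rules described under Data Preparation above (flagging out-of-range and missing values, recording each rule violation, and counting violations per case) can be illustrated with a small sketch. This is plain Python rather than SPSS syntax, and it is not how the module itself is implemented; the function names, the variable "age" and the 0 to 120 range are assumptions chosen only for illustration.

```python
from typing import Callable, Dict, List, Optional

Value = Optional[float]
Rule = Callable[[Value], bool]  # a rule returns True when the value violates it


def missing(value: Value) -> bool:
    """Hypothetical rule: the value is missing."""
    return value is None


def out_of_range(low: float, high: float) -> Rule:
    """Hypothetical rule factory: the value lies outside [low, high]."""
    def rule(value: Value) -> bool:
        return value is not None and not (low <= value <= high)
    return rule


def apply_rules(cases: List[Dict[str, Value]], variable: str,
                rules: Dict[str, Rule]) -> List[Dict[str, int]]:
    """For each case, record every rule violation (1 or 0) and the total."""
    report = []
    for case in cases:
        flags = {name: int(rule(case.get(variable))) for name, rule in rules.items()}
        flags["total_violations"] = sum(flags.values())
        report.append(flags)
    return report


# Illustrative data: one valid age, one missing age, one implausible age.
cases = [{"age": 34}, {"age": None}, {"age": 130}]
rules = {"age_missing": missing, "age_out_of_range": out_of_range(0, 120)}
report = apply_rules(cases, "age", rules)
```

Each entry of `report` corresponds to one case, in the same way that SPSS can save variables recording individual rule violations and the total number of violations per case.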
2.2 Step-by-Step Procedure for SPSS (PASW Statistics 17.0) Installation
First, the user must have the software package with an official license or a 21-day evaluation license. In this module, the evaluation version of SPSS (PASW Statistics 17.0) for Windows will be used for demonstration purposes.
The system requirements to install SPSS (PASW Statistics 17.0) are:
- Operating System: Microsoft Windows 7, Vista, XP or 2000
- System Requirements: Intel Pentium-compatible processor, 256MB RAM, 700MB free disk space, VGA monitor, and Internet Explorer 6.0 or above
Follow the steps below to install the evaluation version of SPSS (PASW Statistics 17.0):
Step 1: Check Installed SPSS Versions
- Make sure no older version is already installed. If a previous version exists, please uninstall it before starting the installation process.
Step 2: Download and Run “PASW_Statistics_1702_win_en.exe”
- Download and open the “PASW 17.0 for Windows” folder.
- Double-click the file named “PASW_Statistics_1702_win_en.exe”. The “InstallShield Wizard” will begin extracting the contents automatically.
Step 3: Follow the “InstallShield Wizard” prompts until the installation is successfully completed.
- When prompted to choose a license type, select “Single user license” and click Next to continue to the license agreement.
- Select “I accept the terms in the license agreement” and click Next to continue.
- Immediately, a dialog window with additional information for the users will appear. Read the information and click Next to continue.
- Fill in ‘User Name’ and ‘Organization’ accordingly and click Next to continue.
- A new window will appear asking for the location (folder) in which to save the program files. It is strongly recommended to accept the default location and just click Next to proceed.
The PASW InstallShield Wizard will then ask you to confirm that you want to begin the installation.
- Click Install to start installation or Back to review and change the installation settings.
When you click the “Install” button, the installation begins. It takes a few minutes. During installation, do not press a key or click mouse buttons since it may interrupt the installation.
When installation is complete, the Wizard will prompt you to register SPSS (PASW Statistics).
- Click OK to begin registration process.
- Select “Enable a temporary trial period” and click Next.
- Click the Browse button.
- Select the trial license file “trial.txt” and click Open to get the trial license file.
- Click Next to continue and the next window will inform you that the trial period is in progress.
- Click Finish to complete the installation of SPSS (PASW Statistics 17.0) with a 21-day trial period.
2.3 Running SPSS (PASW) Statistics and Its User Interface
After successful installation, a program group called “PASW Statistics 17” will be placed under “SPSS Inc.” in the “Start Menu”. There will be at least two items in the menu: ‘PASW Statistics 17’ and ‘PASW Statistics 17 License Authorization Wizard’.
More items may be displayed in the menu, depending upon which optional components (add-on modules) you have installed.
2.3.1 Starting and Ending an SPSS Session
To run SPSS, just click the ‘PASW Statistics 17’ menu item.
Double-clicking any SPSS data or syntax file will also start SPSS and open the file in an SPSS window.
When SPSS is run for the first time, a dialogue window is superimposed on the Data Editor window. It helps users perform initial tasks such as opening a data file, syntax file or output file; beginning data entry; activating an existing query; creating a new query; or importing data from another database file. A tutorial for beginners can also be run from this dialogue window.
Opening an existing data file, either from the list or by browsing for it, is often the first task users perform in SPSS. By default, up to nine of the most recently used files are listed in both ‘Open an existing data source’ and ‘Open another type of file’. Neither list will contain any files, however, when SPSS is run for the first time. To open one of the most recently used files, double-click its name, or select it from the list and click the OK button. To open an unlisted data file, double-click the ‘More Files…’ item and follow the steps you would normally use in the ‘Open file’ dialog box.
If you check the box in this dialogue window, only the Data Editor will appear when SPSS starts in future sessions. It is recommended to click the ‘Cancel’ button to close the dialogue box, as this ensures that it reappears the next time you launch SPSS. If you click the ‘Cancel’ button without selecting a file, a blank ‘Data Editor’ window will appear.
While you are using the evaluation version, a message will appear every time you launch SPSS, indicating the number of days remaining in your evaluation period.
Once the trial period has elapsed, the SPSS processor will no longer work, and commands will not produce any results.
Tip: Save the syntax and output files frequently!
An active SPSS session ends, and SPSS exits automatically, when the user closes the last active data set (or data file). Whenever a user exits SPSS, it prompts the user to save all unsaved windows, including data windows, output windows and syntax windows. SPSS does not have an automatic recovery feature, and there is no ‘undo’ function for data transformations. Thus, it is important to save the syntax and output files frequently. Data files should be saved under a different name after applying any transformations or deleting any variables, so that the original data files are not lost.
2.3.2 Data Editor and Data Views
In SPSS Statistics, data files are displayed in the ‘Data Editor’. In the Data Editor, when the mouse cursor moves over a variable name in the column headings, a more descriptive label for that variable is displayed (if the variable has been defined with a label).
The Data Editor has two views: “Data View” and “Variable View”.
Data View: By default, the actual data values are displayed in the cells. The ‘case numbers’ are displayed as row captions (like a ‘row number’ in Microsoft Excel), and the variable names appear as column captions. Users can choose to display descriptive value labels in the cells instead, for example “Male” and “Female” rather than the codes 1 and 2, by choosing View from the menus and then clicking Value Labels, or by clicking the Labels button. Value labels make it easier to interpret the responses in the household survey.
The following is the data set for individual household members from the Bangladesh Demographic and Health Survey 2007 in the Data View with Value Labels.
The Data View shows the cases (or observations) in rows, and each column represents a variable (a characteristic that is being measured). In the above example, each individual ‘member of selected households’ is a case, and each ‘item in the questionnaire’ is a variable. For example, ‘relationship to head of household’, ‘age’ and ‘highest education level’ are variables. Each cell contains a single data value of a variable for a case; it is where the case and the variable intersect. For example, if the case represents the ‘head of household’ (row 13) and the variable is ‘sex’ (HV104), the cell holds the ‘sex of the head of household’. When actual data values are displayed, the cell shows “2”; when value labels are displayed, it shows “Female”. SPSS data files are stored in flat-file format, and data cells cannot store formulas.
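The difference between displaying a stored value and displaying its value label can be sketched as a simple lookup. The following is an illustrative Python sketch, not SPSS syntax; the dictionary `VALUE_LABELS` and the function `display_cell` are hypothetical names, and the coding 1 = Male, 2 = Female for HV104 follows the example in the text.

```python
# Hypothetical value-label dictionary: variable name -> {code: label}.
VALUE_LABELS = {"HV104": {1: "Male", 2: "Female"}}


def display_cell(variable: str, value: int, show_labels: bool) -> str:
    """Return the text shown in a Data View cell.

    With value labels switched on, the label is shown when one is defined;
    otherwise the stored code itself is shown.
    """
    if show_labels:
        return VALUE_LABELS.get(variable, {}).get(value, str(value))
    return str(value)


as_value = display_cell("HV104", 2, show_labels=False)
as_label = display_cell("HV104", 2, show_labels=True)
```

The stored data never change: only the code 2 is kept in the cell, and the label is looked up at display time.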
Variable View: This displays the metadata dictionary, in which each row represents a variable and its attributes (or characteristics or properties) are shown in 10 columns:
1) variable name;
2) type: numeric, comma, dot, scientific notation, date, dollar, custom currency, and string;
3) variable width, i.e. number of digits or characters;
4) number of decimal places;
5) variable label;
6) value labels;
7) codes for user-defined missing values;
8) column width in data view;
9) cell alignment, i.e., left, right or center when displaying in data view; and
10) type of measurement (scale, ordinal or nominal).
All attributes are saved with data values in the file.
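The ten attributes listed above can be pictured as one record per variable. The sketch below models such a record as a Python dataclass; it is an illustration only, not an SPSS file format or API. The field names and the default values are assumptions, as are the label text and the missing-value code 9 used in the example.

```python
from dataclasses import dataclass, field, fields
from typing import Dict, List


@dataclass
class VariableAttributes:
    """One row of a Variable View-style metadata dictionary (illustrative)."""
    name: str                       # 1) variable name
    type: str = "numeric"           # 2) numeric, comma, dot, date, string, ...
    width: int = 8                  # 3) number of digits or characters
    decimals: int = 2               # 4) number of decimal places
    label: str = ""                 # 5) variable label
    value_labels: Dict[int, str] = field(default_factory=dict)  # 6) value labels
    missing_codes: List[int] = field(default_factory=list)      # 7) user-defined missing codes
    column_width: int = 8           # 8) column width in Data View
    alignment: str = "right"        # 9) left, right or center
    measure: str = "scale"          # 10) scale, ordinal or nominal


# Example record for the 'sex' variable discussed in the text.
# The label text and the missing code 9 are assumed for illustration.
sex = VariableAttributes(
    name="HV104",
    width=1,
    decimals=0,
    label="Sex of household member",
    value_labels={1: "Male", 2: "Female"},
    missing_codes=[9],
    measure="nominal",
)

attribute_names = [f.name for f in fields(VariableAttributes)]
```

Keeping the attributes together in one record mirrors the point made above: the metadata travel with the data rather than being stored in a separate codebook.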
The number of rows and columns (size or dimension) of the data file are determined by the number of cases and variables used in that file. Data can be entered in any cell, even in a cell that is outside the boundaries of the defined data set. In this case the dimension of the data view extends to include all the rows and columns to cover that newly entered cell. Variable names for the undefined columns will automatically be assigned as “VAR00001”, then “VAR00002”, and so on.
The cells without any data in the newly expanded data range (in both rows and columns) will display a period (.) as a placeholder to show that there is a missing numeric value, or a blank space ( ) to show that there is a blank string value. Note that a blank is a valid string value in SPSS.
In this case, the type of the new variables is automatically defined to be ‘numeric’ and default attributes for the numeric variable are set automatically. Users can change all attributes, including variable name and type, in the Variable View.
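The way the data range grows when a value is entered outside its current boundaries can be sketched as follows. This is a simplified Python model rather than SPSS behaviour in full: it assumes the data set starts empty, treats every new variable as numeric, and uses the hypothetical class name `DataGrid`.

```python
from typing import List, Optional


class DataGrid:
    """Simplified model of a Data View that extends itself on demand."""

    def __init__(self) -> None:
        self.names: List[str] = []
        self.rows: List[List[Optional[float]]] = []

    def set_cell(self, row: int, col: int, value: float) -> None:
        # Add columns up to 'col', naming them VAR00001, VAR00002, ...
        while len(self.names) <= col:
            self.names.append(f"VAR{len(self.names) + 1:05d}")
            for existing in self.rows:
                existing.append(None)  # new cell is missing in existing rows
        # Add rows up to 'row', filled with missing values.
        while len(self.rows) <= row:
            self.rows.append([None] * len(self.names))
        self.rows[row][col] = value

    def display(self) -> List[List[str]]:
        # A missing numeric value is shown as a period.
        return [["." if v is None else str(v) for v in r] for r in self.rows]


grid = DataGrid()
grid.set_cell(1, 2, 5.0)  # second row, third column of an empty data set
names = grid.names
shown = grid.display()
```

Entering a single value in the second row and third column therefore creates three variables and two cases, with every other cell shown as a period.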
Apart from directly defining variables in the Variable View, two other methods can also be used to define variable properties:
- The Copy Data Properties Wizard provides the ability to use an external data file, or another data set that is available in the current session, as a template for defining file and variable properties in the active data set. Similarly, variables in the active data set can be used as templates for other variables in the same data set. ‘Copy Data Properties’ is available on the ‘Data’ menu in the main SPSS window.
- Define Variable Properties, which is also available on the ‘Data’ menu, scans the data and lists all unique data values for any selected variables, identifies unlabeled values, and provides an auto-label feature. This method is particularly useful for categorical variables that use numeric codes to represent categories, for example, 0 = Male, 1 = Female.
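The scan performed by Define Variable Properties can be approximated in a few lines. The sketch below is an illustrative Python version, not the SPSS procedure itself; the function name `scan_variable` and the sample data (including the stray code 9) are assumptions, while the coding 0 = Male, 1 = Female follows the example above.

```python
from collections import Counter
from typing import Dict, List, Optional, Tuple


def scan_variable(values: List[Optional[int]],
                  value_labels: Dict[int, str]) -> Tuple[List[int], Counter, List[int]]:
    """List the unique values of a variable and those lacking a value label.

    Missing values (None) are skipped in this simplified sketch.
    """
    counts = Counter(v for v in values if v is not None)
    unique = sorted(counts)
    unlabeled = [v for v in unique if v not in value_labels]
    return unique, counts, unlabeled


# Illustrative data: the code 9 appears in the data but has no label yet.
sex_values = [0, 1, 1, 0, 9, None]
sex_labels = {0: "Male", 1: "Female"}
unique, counts, unlabeled = scan_variable(sex_values, sex_labels)
```

A result such as `unlabeled` containing 9 tells the user which codes still need a label, or which may be data entry errors worth checking.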