Module B2: Introduction to Data Analysis Software

4. Using Data Sets Created in Other Formats

In general, SPSS can read data sets created by almost all popular statistical software and databases and an SPSS data set can be saved in several popular formats. Therefore, SPSS data format (*.sav) is the common format in sharing/distributing survey data sets.

Generally, SPSS can read data files created in:

  • all versions of SPSS (*.sav) and SPSS/PC+ (*.sys) formats.
  • spreadsheets (EXCEL, Lotus and SYLK).
  • database tables (dBase, MS Access, FoxPro, Oracle, SQL Server)
  • statistical software (SAS, SYSTAT, and Stata).
  • different text formats (fixed width, comma delimited/ CSV, tab or space delimited, etc.).

Data files created by spreadsheets and other statistical software can be opened directly as SPSS data files. Similarly, SPSS can open dBase files, text data files and other files without converting the files to an intermediate format or entering data definition information. On the other hand, complex database files such as MS Access, FoxPro and files from SQL databases can be accessed through the database wizard or by writing SQL queries.

Opening a data file makes it the active data set. The active data set is the one from which SPSS will read and write during the session if there is no specific command to change to another data set. If there are one or more data files (or data sets) opened, they remain open and available for subsequent use in the session. Clicking anywhere in the Data Editor window for an opened data file makes it the active data set.

An SPSS data file can be saved (or exported) into other file types. However, some file types can only save data values. In its native format, SPSS is able to save both values and the data dictionary (or attributes). The data dictionary or attributes, such as variable labels, value labels and missing values are lost if the file is saved to other formats, including Microsoft Excel format.

4.1     Importing Data from Microsoft Excel

Importing data from Microsoft Excel is the easiest among the data sources. First, arrange the spreadsheet in tabular format fulfilling following six recommendations:

i)  Names of the variables on the first row of the data range.

ii)  Variable names comply with SPSS naming rules.

iii)  For all numeric variables, there should be no blanks in the second row of the data range.

iv)  Data range should be continuous – no blank rows or column.

v)  Clear of any graphs, labels, and extra text or data on the worksheet.

vi)  Delete unnecessary worksheets that are not going to be imported.

If the data in Excel file has several worksheets, it is better to create a new Excel file with just one worksheet containing all the necessary data, including variable names.

Once this is complete, follow these steps:

On the main menu click:-

  1. ‘File’
  2. ‘Open’
  3. ‘Data’

And, an ‘Open Data’ pop-up window will appear. In this window:-

  1. Change ‘Files of type’ to ‘Excel (*.xls, *.xlsx, *.xlsm)
  2. Select the folder containing Excel data file from the ‘Lookin’ box.
  3. Select the correct Excel data file (in 97-2003 or 2007 format).
  4. Click ‘Open’.

The ‘Opening Excel Data Source’ pop-up window will appear. On that window:

  1. Clear the checkbox next to ‘Read variable names from the first row’, but only›› if the first row of the Excel data sheet does not have variable names.
  2. If the file has more than one worksheet, select the worksheet that contains the data.
  3. Type in the range of data to be imported. For example A1:V100 for the first 99 cases or 100 rows, including the row for the variable names.
  4. Click OK.

If the data file in Excel was prepared according to the six recommendations (above), steps 8 to10 may be skipped because there will be only one sheet in the Excel file, the data range is continuous and there is no extra cells or objects in the sheet other than the data that is to be analysed.

Once this process is complete, the data from Excel file will have been transferred into an SPSS data set. At this time, it is important to save the current SPSS data set with an appropriate name.

Because data files in Excel, text format or database systems do not have a ‘data dictionary’, which contains information about data attributes such as variable labels, value labels or missing values, it is important to define such attributes for all variables, and then save the data file again.

4.2     Importing Data from Delimited ASCII Text Files

When requesting data from other agencies and departments, sometimes, data are provided in text or ASCII file format. Normally, data in an ASCII file are arranged in either: (i) fixed width format, where each variable is placed in same location, or (ii) delimited format where a variable is separated by a specific character such as tab, space, comma, semicolon and any another specific character that is unique throughout the data file and not used in any of the data values.

To import data from a delimited text file, first review the file in a text editor such as notepad and check the character used for delimitation (normally, tab, space, comma or semicolon). Then, follow these steps:

On main menu click:-

  1. ‘File’
  2. ‘Read Text Data’
  3. The ‘Open Data’ pop-up window will be appear with text (*.txt) file type.

In Open Data window:-

  1. Select the folder that contains the text data file in the ‘Lookin’ box.
  2. Change ‘File of type’ to ‘All Files (*.*)’.
  3. Select the correct data file (*.txt, *.dat, *.csv, *.prn).
  4. Click ‘Open’.

A ‘Text Import Wizard’ will begin and guide you through the import process.

The wizard contains the following 6 steps:

Step 1/6:    Click ‘Next’ to go to Step 2 of 6;

In Step1 of the Wizard, you can apply a predefined format (previously saved from the Text Wizard) or follow these steps.

Step 2/6:    (i)    The Wizard will identify whether the data is arranged as ‘Delimited’ or ‘Fixed width’, but make sure the correct option is detected (in “Data.csv” file, the variables are separated by a comma “,”, and thus the file structure is delimited).

(ii)   Identify whether the variable names are included at the top (first line) of the data file or not (in this example, ‘Yes’), and click ‘Next’ to go forward to Step 3 of 6.

Step 1/6:    (i)    Since data files begin with variable names, the first data begins on line 2. Otherwise, the user should identify the line number on which data begins.

(ii)   If a line represents one case (eg. one person), just click ‘Next’, otherwise select the second option on ‘How are you cases represented?’, specify the number of variables per case and click ‘Next’.

Step 2/6:    The Wizard will automatically identify the delimiter(s) between variables. However, it is important to check they were identified correctly. Some software packages export text in quotes, such as “text” or ‘text’. If this is the case, then the character of text qualifier (or quotation mark) must be specified by the radio buttons of the second question ‘What is the text qualifier?’, and click ‘Next’.

Step 1/6:    In this step, variable names and data formats can be specified or changed from the default settings. Then, click ‘Next’ to continue or ‘Finish’ to start importing data.

Step 2/6:    In this step, just click ‘Finish’ to start importing. This may take a few minutes to complete.

4.3     Importing Data from Fixed Width Text Files

In some text data files, variables are aligned in fixed width columns. That is, data for a variable are located at the same column throughout the data file. For example, sex of household member is situated in column 33 of every line in the “Data(Fix).txt” file, that was extracted from the Bangladesh DHS 2007.

To import data from a text file with fixed width data structure it is important to have the data dictionary of the variables, that is, to know which variables are located in which column(s).

Then, on main menu click:-

  1. ‘File’
  2. ‘Read Text Data’

Then, ‘Open Data’ pop-up window will appear with text (*.txt) file type.

  1. Select the folder containing text data file from ‘Look-in’ box.
  2. Select the correct data file (*.txt or *.dat).
  3. Click ‘Open’.

The “Text Import Wizard” will begin automatically and guide us through the importing process.

 

The wizard contains the following six steps:

Step 1/6:    Simply, click ‘Next’ to go forward to Step 2 of 6;

Step 2/6:    (i)    The Wizard will detect whether the data file is “Delimited” or “Fixed width”, but check that SPSS detects the correct format (“Data(Fix).txt” contains no separation character so the file structure is fixed width).

(ii)   Identify whether the variable names are included at the top (first line) of the data file or not (for this example, choose ‘No’), and click ‘Next’ to forward to Step 3.

Step 3/6:    Since there is no variable name in the first line, the first case of data begins on line number 1.  Sometimes, a case spans over one more than line, so users have to specify the number of lines per case. If there is only one line per case, just click ‘Next’ to continue.

Step 4/6:    This is the most crucial step in importing a fix width data file. Use the data dictionary to identify and split the case into variables accordingly. In this example, one line of data represents a case, and the location of variables are as following:Since there is no variable name in the first line, the first case of data begins on line number 1. Sometimes, a case spans over one more than line, so users have to specify the number of lines per case. If there is only one line per case, just click ‘Next’ to continue.Since there is no variable name in the first line, the first case of data begins on line number 1. Sometimes, a case spans over one more than line, so users have to specify the number of lines per case. If there is only one line per case, just click ‘Next’ to continue.

The Wizard will put in separation lines or break lines wherever required. For example, if a column contains blank(s) consistently across the lines, the Wizard will insert a break line. A break line can be inserted or deleted with the ‘Column number’ input box below the data view. For example, to insert a break line in the column 13, put in 13 in the ‘Column number’ input box and press the ‘Insert Break’ button. Similarly, to delete a break located on column 28, just type in 28 and click ‘Delete Break’ button. In this step, the user needs to check and identify all break lines to import the data correctly.

After defining the location click ‘Next’ to proceed to Step 5.

Step 5/6:    In this Step, one can select ‘Finish’ to start importing data with default variable names (V1, V2, …,Vn), and data formats (all numbers will be numeric and the remaining be string). Or, users can enter variable names and formats individually.

Step 6/6:    Simply click ‘Finish’ to start importing the text data

In text data import wizard, the user can save the format (including break lines and variable names) for future use.

It may take a few minutes to import the text data into SPSS Statistics Data Editor. It is strongly recommended to check and edit (or create) variable attributes such as variable labels, value labels and missing values. It is important to define such attributes to all variables, and then save the data file again.

4.4     Importing Data from Microsoft Access Databases

Data from the databases that use Open Database Connectivity (ODBC) drivers can be read directly by SPSS if the correct drivers are installed on the computer. Commonly used ODBC drivers are provided with the SPSS installation package. Among these, Microsoft Access is the most widely used database system. In this section, we present a step-by-step guide to importing data from MS Access. The same steps, with minor variations, could be followed to import data from ODBC-compliant databases created on other platforms.

Before importing data from a Microsoft Access database, check if the database contains a single table in flat file format (like a worksheet) with all the required variables. If the data is contained in several database tables, it is better to consolidate all the variables into a single Microsoft Access table before importing the data.

To begin, click followings on main menu:-

  1. ‘File’
  2. ‘Open Database’
  3. ‘New Query’

A ‘Database Wizard’ window will appear. We can specify our ODBC data source in this wizard. All available ODBC data sources will be listed on the right pane. Click the source that matches the database we are importing. If there is no appropriate source, a new driver file for that particular source must be installed or added the data can be imported from the database.

Normally, however, ‘MS Access Database’ will be included in the list.

  1. Select ‘MS Access Database’ from ODBC Data Sources
  2. Click Next’ to continue.

When using a particular data source for the first time, the ‘ODBC Driver Login’ window will appear. If it is not the first time that this import procedure has run, the Wizard may skip this step.

  1. Click ‘Browse’ to locate the folder where the database files are kept, and select the correct database file to open.
  2. Click OK to open that database file.

At this point, the user may also setup a new link.

Then, the ‘Database Wizard’ window will display two panes: ‘Available Tables’ on the left and ‘Retrieve Fields in This Order’ on the right.

  1. Click table name to expand and double-click the field name(s) to select individual fields or double-click the table name to select all variables in that table.
  2. Click ‘Finish’ to import all cases (53,413 cases) from the database.

It is important to save the data file after the import process.

  1. We can click ‘Next’ to proceed to another step where ,we can choose which cases to import based on some criteria (filtering). The following example shows how to import the cases where age of household member is between 6 and 15 years.

If so, only 12,621 cases will be imported instead of 53,413 cases in the entire database. It is important to save the data file at the end of importing process.

  1. When we press ‘Next’ again, we can redefine variable names and process auto-recoding string variables before we press ‘Finish’ to start importing.

When all variables have been imported, SPSS assigns F8.2 (floating-point format; total of 8 digits including 2 decimal places) to all numeric variables, and A255 (alpha-numeric format; up to 255 characters) to string variables. It is important to ensure these default formats suit all our variables, and also to set column widths to display the data correctly. String variables should be recoded for easier analysis. The next module will explain how to refine imported data sets.

If there are several tables in the source database file, we can link tables through identification fields to import related variables from different tables. However, it is often convenient to link and consolidate tables into a single table with all required variables in Microsoft Access (or in the original database software) before importing into SPSS Statistics.

Comments are closed.