Module B3: Checking, Editing and Preparing Household Survey Data for Analysis

4. Tips and Exercises

4.1     Tips: Do and Don't

i) Do request copies of documents such as project proposals, questionnaire sets, codebooks, fieldwork documents and survey reports when approaching agencies/departments for survey data.

Don't judge the usefulness of the data on the spot, and don't leave behind any survey documents or data sets that are available at the survey agencies/departments.

ii) Do understand, check and edit metadata (a set of data that describes and gives information about other data) before using a secondary data set.

Don't leave any variable without a proper definition that includes its variable label, value labels, missing values and measurement level (scale, ordinal or nominal).
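As a rough illustration outside SPSS, the variable definitions this tip asks for can be kept in a simple codebook and checked for completeness. This is a minimal Python sketch under stated assumptions: the `VariableMeta` class, its field names and the example variables (`hh_size`, `sex`, `edu_level`) are hypothetical, and SPSS itself stores these properties in the Variable View rather than in a structure like this.

```python
from dataclasses import dataclass, field
from typing import Optional

VALID_MEASURES = {"scale", "ordinal", "nominal"}

@dataclass
class VariableMeta:
    """Hypothetical container for the properties SPSS keeps for each variable."""
    name: str
    label: str = ""
    value_labels: dict = field(default_factory=dict)  # code -> label text
    missing_values: Optional[list] = None             # None = not yet defined; [] = explicitly none
    measure: str = ""                                 # "scale", "ordinal" or "nominal"

def incomplete_properties(meta: VariableMeta) -> list:
    """List the properties that are still missing or invalid for one variable."""
    problems = []
    if not meta.label:
        problems.append("label")
    # Categorical (ordinal/nominal) variables need value labels; scale variables usually do not.
    if meta.measure in {"ordinal", "nominal"} and not meta.value_labels:
        problems.append("value_labels")
    if meta.missing_values is None:
        problems.append("missing_values")
    if meta.measure not in VALID_MEASURES:
        problems.append("measure")
    return problems

# Hypothetical codebook entries.
codebook = [
    VariableMeta("hh_size", "Household size", missing_values=[99], measure="scale"),
    VariableMeta("sex", "Sex of member", {1: "Male", 2: "Female"}, [9], "nominal"),
    VariableMeta("edu_level", measure="ordinal"),
]

for meta in codebook:
    issues = incomplete_properties(meta)
    if issues:
        print(f"{meta.name}: missing {', '.join(issues)}")
# prints: edu_level: missing label, value_labels, missing_values
```

Distinguishing "no missing-value codes defined yet" (`None`) from "explicitly no missing values" (`[]`) is one way to make sure the missing-value decision is recorded for every variable rather than silently skipped.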

iii) Do save the data set under a new, descriptive filename whenever you make changes, and record the changes made relative to earlier versions.

Don't save the edited data set under the original filename, and don't replace the original data file with edited versions.
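The same discipline can be scripted when data are handled as files outside SPSS. The Python sketch below assumes the edited data are available as text (for example a CSV export); it writes each edited version under a new numbered filename and appends a line to a change log. The `_v<N>` naming scheme, the `<stem>_changes.log` file and the function names are illustrative assumptions, not SPSS features; inside SPSS the equivalent is simply File > Save As with a new name.

```python
import tempfile
from datetime import date
from pathlib import Path

def next_version_path(original: Path) -> Path:
    """Return the first unused name of the form <stem>_v1, <stem>_v2, ... next to the original."""
    n = 1
    while True:
        candidate = original.with_name(f"{original.stem}_v{n}{original.suffix}")
        if not candidate.exists():
            return candidate
        n += 1

def save_edited_version(original: Path, edited_text: str, note: str) -> Path:
    """Write edited data to a new versioned file and log the change; the original is never overwritten."""
    target = next_version_path(original)
    target.write_text(edited_text, encoding="utf-8")
    log_path = original.with_name(f"{original.stem}_changes.log")
    with log_path.open("a", encoding="utf-8") as log:
        log.write(f"{date.today().isoformat()}\t{target.name}\t{note}\n")
    return target

# Usage: the original file stays unchanged while each edit gets its own version.
with tempfile.TemporaryDirectory() as tmp:
    original = Path(tmp) / "household.csv"
    original.write_text("id,age\n1,34\n", encoding="utf-8")
    v1 = save_edited_version(original, "id,age\n1,43\n", "Age of case 1 corrected by data source")
    print(v1.name)  # household_v1.csv
```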

iv) Do copy variable properties whenever possible.

Don't leave copied properties as they are; check the variable properties after copying and edit them as necessary.
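To make the checking step concrete, here is a minimal Python sketch that copies one variable's properties to another and then flags codes found in the target variable's data that the copied value labels and missing-value codes do not cover. The dictionary keys (`label`, `value_labels`, `missing_values`, `measure`), the variable names and the data are hypothetical; in SPSS the copy itself is done through the Copy Data Properties dialog or by pasting attributes in the Variable View.

```python
import copy

def copy_properties(source_meta: dict, new_label: str) -> dict:
    """Deep-copy a variable's properties so later edits do not change the source."""
    target_meta = copy.deepcopy(source_meta)
    target_meta["label"] = new_label  # the variable label almost always needs editing
    return target_meta

def uncovered_codes(meta: dict, observed_values) -> list:
    """Codes present in the data but absent from both the value labels and the missing-value codes."""
    known = set(meta["value_labels"]) | set(meta.get("missing_values", []))
    return sorted(set(observed_values) - known)

# Properties already defined for mother's education (hypothetical example).
mother_edu = {
    "label": "Highest education attained by mother",
    "value_labels": {0: "None", 1: "Primary", 2: "Secondary", 3: "Higher"},
    "missing_values": [9],
    "measure": "ordinal",
}

father_edu = copy_properties(mother_edu, "Highest education attained by father")
father_values = [1, 2, 2, 3, 4, 9, 0]  # hypothetical data; code 4 has no value label

print(uncovered_codes(father_edu, father_values))  # [4]
print(mother_edu["label"])  # Highest education attained by mother
```

A code such as 4 here signals that the copied value labels need editing (or that the data contain an error), which is exactly the check the tip warns against skipping.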

v) Do define and use variable sets for ease of analysis, and create subsets of the data by selecting variables as well as cases.

Don't change a variable's type or measurement level without thoroughly understanding the impact of the change.
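Selecting variables and cases can be sketched the same way. In this minimal Python example, a variable set is simply a named list of variable names, and a subset keeps only those variables for the cases that satisfy a condition (here, girls under 15, one of the tasks in the self-evaluation). The records, variable names and the coding 1 = male, 2 = female are hypothetical.

```python
# Hypothetical household-member records; sex is coded 1 = male, 2 = female.
members = [
    {"hh_id": 101, "line": 1, "sex": 1, "age": 42, "edu_years": 8, "wealth_score": 0.31},
    {"hh_id": 101, "line": 2, "sex": 2, "age": 13, "edu_years": 6, "wealth_score": 0.31},
    {"hh_id": 102, "line": 1, "sex": 2, "age": 16, "edu_years": 10, "wealth_score": -0.58},
    {"hh_id": 102, "line": 2, "sex": 2, "age": 9, "edu_years": 3, "wealth_score": -0.58},
]

# Variable sets: named groups of variables that are used together in an analysis.
VARIABLE_SETS = {
    "education": ["hh_id", "line", "sex", "age", "edu_years"],
    "wealth": ["hh_id", "wealth_score"],
}

def subset(cases, variables, condition):
    """Return new records with only the listed variables, for cases where condition(case) is True."""
    return [{v: case[v] for v in variables} for case in cases if condition(case)]

girls_under_15 = subset(members, VARIABLE_SETS["education"],
                        lambda c: c["sex"] == 2 and c["age"] < 15)
for row in girls_under_15:
    print(row)
# {'hh_id': 101, 'line': 2, 'sex': 2, 'age': 13, 'edu_years': 6}
# {'hh_id': 102, 'line': 2, 'sex': 2, 'age': 9, 'edu_years': 3}
```

The subset is built as new records, so the full data set is left intact, in the same spirit as saving subsets under new filenames.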

vi) Do recode string variables into numeric codes using “Automatic Recode”, and use “Visual Binning” for continuous variables (or numeric variables with many distinct values) to reduce the number of categories.

Don't recode into the same variable, since the change cannot be undone; recode into a different variable instead (the original variable can easily be deleted once it is no longer needed).
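The two operations in this tip can be approximated in a few lines of Python. This is a sketch of the general idea, not SPSS's implementation: `automatic_recode` assigns consecutive integers 1..k to the distinct strings in ascending (alphabetical) order and keeps the original strings as value labels, and `bin_values` groups a numeric variable using cut points, treating each cut point as the included upper endpoint of its bin. The function names, the region values and the age cut points are hypothetical; both functions return new lists, so the original variable is left unchanged.

```python
import bisect

def automatic_recode(values):
    """Map each distinct string to consecutive integers 1..k in ascending order.

    Returns the recoded list and a value-label dictionary (code -> original string).
    """
    distinct = sorted(set(values))
    code_of = {value: code for code, value in enumerate(distinct, start=1)}
    labels = {code: value for value, code in code_of.items()}
    return [code_of[v] for v in values], labels

def bin_values(values, cut_points):
    """Assign each value to bin 1..len(cut_points)+1; a value equal to a cut point falls in the lower bin."""
    cuts = sorted(cut_points)
    return [bisect.bisect_left(cuts, v) + 1 for v in values]

regions = ["North", "South", "East", "North", "West"]
region_codes, region_labels = automatic_recode(regions)
print(region_codes)   # [2, 3, 1, 2, 4]
print(region_labels)  # {1: 'East', 2: 'North', 3: 'South', 4: 'West'}

ages = [3, 14, 15, 24, 37, 60, 71]
age_groups = bin_values(ages, [14, 24, 59])  # bins: 0-14, 15-24, 25-59, 60+
print(age_groups)     # [1, 1, 2, 2, 3, 4, 4]
```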

vii) Do validate the data with single-variable and multiple-variable rules, and check for duplicate cases, before conducting any analysis.

Don't change values in the data set based on assumptions or guesses about the correct values. Always contact the primary data source to obtain corrections or, if there are only a few invalid cases, omit them.
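The logic behind validation rules and duplicate checks can be sketched in Python as follows. A single-variable rule tests one variable against its valid codes or range; a cross-variable rule tests whether values of two or more variables are consistent within a case; a duplicate check looks for cases that repeat the identifying variables of an earlier case. The records, the coding (sex 1/2, marital 1 = never married, 2 = married), the age range and the "married members are aged 12+" threshold are all illustrative assumptions, not rules from this module. As the tip advises, the sketch only flags problems and never overwrites values.

```python
# Hypothetical member records.
members = [
    {"hh_id": 101, "line": 1, "sex": 1, "age": 42, "marital": 2},
    {"hh_id": 101, "line": 2, "sex": 2, "age": 13, "marital": 1},
    {"hh_id": 102, "line": 1, "sex": 3, "age": 35, "marital": 2},  # invalid sex code
    {"hh_id": 102, "line": 2, "sex": 2, "age": 8, "marital": 2},   # married at age 8
    {"hh_id": 101, "line": 2, "sex": 2, "age": 13, "marital": 1},  # same IDs as case 2
]

# Single-variable rules: each checks one variable on its own.
SINGLE_RULES = {
    "sex is 1 or 2": lambda c: c["sex"] in (1, 2),
    "age is 0-98": lambda c: 0 <= c["age"] <= 98,
}

# Cross-variable rule: values of two variables must be consistent (illustrative threshold).
CROSS_RULES = {
    "married members are aged 12+": lambda c: c["marital"] != 2 or c["age"] >= 12,
}

def rule_violations(cases, rules):
    """Return (case number, rule name) for every broken rule; cases are numbered from 1."""
    return [(i, name) for i, case in enumerate(cases, start=1)
            for name, rule in rules.items() if not rule(case)]

def duplicate_cases(cases, key_vars):
    """Return case numbers whose key_vars values repeat an earlier case (first occurrence kept)."""
    seen, duplicates = set(), []
    for i, case in enumerate(cases, start=1):
        key = tuple(case[v] for v in key_vars)
        if key in seen:
            duplicates.append(i)
        else:
            seen.add(key)
    return duplicates

print(rule_violations(members, SINGLE_RULES))          # [(3, 'sex is 1 or 2')]
print(rule_violations(members, CROSS_RULES))           # [(4, 'married members are aged 12+')]
print(duplicate_cases(members, ["hh_id", "line"]))     # [5]
```

Which variables identify a case (here household ID plus member line number) is a design choice; checking on all variables instead would find only exact repeats of a record.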


4.2     Self-evaluation
• Do you understand how to set variable properties in SPSS (PASW) Statistics?
Very well / Somewhat well / Not much / Not at all

• Are you confident that you can do the following in an active dataset?

  • Compute a new variable:
    Confident / Somewhat confident / Not very confident / Not at all
  • Recode into a different variable:
    Confident / Somewhat confident / Not very confident / Not at all
  • Select cases for girls under 15:
    Confident / Somewhat confident / Not very confident / Not at all
  • Sort cases by wealth index factor score and highest education attained:
    Confident / Somewhat confident / Not very confident / Not at all
  • Check for erroneous values in a variable (validate with a single-variable or cross-variable rule):
    Confident / Somewhat confident / Not very confident / Not at all
  • Check for duplicate cases in the dataset:
    Confident / Somewhat confident / Not very confident / Not at all

• Do you understand visual binning?
Very well / Somewhat well / Not much / Not at all

4.3     Hands-on Exercises

1)    Import the attached file “data1(tab).dat” and define all variables appropriately.

2)    Using the data set from Exercise 1, recode all string variables.

3)    Create single-variable rules to check the validity of three education-related variables.

4)    Create two multi-variable rules to check the validity of (i) the current schooling status of household members, and (ii) the education, in single years, of household members.

5)    Find duplicate cases in the current data set and propose how to handle them.
