Most researchers agree that one of the first steps in data analysis is to look closely at the actual data, at descriptive statistics such as the mean and standard deviation, and at graphs of the data.
Your goals for this section are:
1. To familiarize yourself with your data (“get close to the data”)
2. To gather summary information regarding your variables (descriptive statistics)
3. To provide graphical representations of the data.
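As a rough illustration of the second goal, the sketch below computes a few descriptive statistics with Python's standard `statistics` module. The variable name `distances` is our own choice, and the six values are only the first six Distance scores for Culture 2 (Belize) from the data table in this section, so the numbers it prints describe that small subset rather than the full sample.

```python
import statistics

# First six Distance scores for Culture 2 (Belize), copied from the data table.
# This is an illustrative subset, not the full sample.
distances = [966.39, 322.66, 415.00, 238.21, 335.93, 102.06]

n = len(distances)
mean = statistics.mean(distances)
median = statistics.median(distances)
sd = statistics.stdev(distances)  # sample standard deviation (divides by n - 1)

print(f"n = {n}")
print(f"mean = {mean:.2f}")
print(f"median = {median:.2f}")
print(f"sd = {sd:.2f}")
print(f"min = {min(distances):.2f}, max = {max(distances):.2f}")
```

Looking at the minimum and maximum alongside the mean is a quick way to notice a score that seems out of range.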
Data that seem extreme are often the product of data entry errors, which are fairly common. Although most researchers pay lip service to verifying data entry (checking that all data have been entered correctly), this step is usually skipped because it is tedious and time-consuming. As a result, extreme scores often slip through the cracks. Luckily, in most cases there is an official copy of the data: usually, data are first written on a paper form and only later entered into a computerized data file.
Note: For ease of assessment, a column group titled “ENTERED DATA” is provided next to the raw data. Normally, you would have to compare a printout of the entered data with the hard copy of the original data. (Culture: 2 = Belize, 3 = Samoa)
Were the data entered correctly?
Yes, look at Measures of Central Tendency and Graphs
Are you sure about this? Compare the data entered to the data in the form again.
Statistical tests are completely useless if your data are entered incorrectly. Once you are satisfied that the data have been entered correctly, you should proceed to look at descriptive statistics, graphs, etc.
No, there are errors
Correct! There is in fact exactly one incorrectly entered value. The score of 125600.00 was entered incorrectly. You go back to the original data and find that the value should have been entered as 1256.
Remember: never trust data entry. Always verify that the values are correct.
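To see why a single entry error matters, the following sketch compares the mean and standard deviation of the Culture 3 (Samoa) Distance scores as entered (with 125600.00) and after correction (with 1256.00). The list names are hypothetical, and the values are copied from the data table in this section.

```python
import statistics

# Culture 3 (Samoa) Distance scores exactly as entered, including the error.
samoa_entered = [
    1811.24, 441.22, 1081.04, 706.33, 730.00, 444.34, 714.80, 1968.17,
    19898.00, 420.00, 526.40, 669.24, 683.97, 12502.79, 2684.76, 1632.26,
    5985.72, 602.11, 125600.00, 3733.70, 640.00, 15212.00, 20121.00, 21331.00,
]

# Same scores with the one mis-entered value replaced by the hard-copy value.
samoa_corrected = [1256.00 if x == 125600.00 else x for x in samoa_entered]

for label, values in (("entered", samoa_entered), ("corrected", samoa_corrected)):
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # sample standard deviation (n - 1)
    print(f"{label:>9}: mean = {mean:.2f}, sd = {sd:.2f}")
```

Because the two lists differ in only one of 24 scores, the means differ by (125600 - 1256) / 24 = 5181, which is large relative to most scores in the group.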
Ask the Expert
Mistakes are often made when data are entered. Many of these mistakes are subtle (e.g., a score of 1 entered as a 2) and hard to catch. Others are more obvious (e.g., a score of 10 receiving an extra zero and becoming 100). The best option is to check all data entry and make sure it is accurate. Certain scores can tip us off to the existence of a problem: extreme outliers are often inaccurately entered scores.
ENTERED DATA | | RAW DATA | |
CULTURE | Distance | CULTURE | Distance |
2 | 966.39 | 2 | 966.39 |
2 | 322.66 | 2 | 322.66 |
2 | 415.00 | 2 | 415.00 |
2 | 238.21 | 2 | 238.21 |
2 | 335.93 | 2 | 335.93 |
2 | 102.06 | 2 | 102.06 |
2 | 508.10 | 2 | 508.10 |
2 | 1286.00 | 2 | 1286.00 |
2 | 1048.21 | 2 | 1048.21 |
2 | 96.21 | 2 | 96.21 |
2 | 465.75 | 2 | 465.75 |
2 | 1698.99 | 2 | 1698.99 |
2 | 3686.00 | 2 | 3686.00 |
2 | 388.82 | 2 | 388.82 |
2 | 137.29 | 2 | 137.29 |
2 | 937.97 | 2 | 937.97 |
2 | 9246.30 | 2 | 9246.30 |
2 | 1374.00 | 2 | 1374.00 |
2 | 1219.62 | 2 | 1219.62 |
2 | 645.51 | 2 | 645.51 |
2 | 108.05 | 2 | 108.05 |
2 | 639.66 | 2 | 639.66 |
2 | 878.33 | 2 | 878.33 |
2 | 663.57 | 2 | 663.57 |
3 | 1811.24 | 3 | 1811.24 |
3 | 441.22 | 3 | 441.22 |
3 | 1081.04 | 3 | 1081.04 |
3 | 706.33 | 3 | 706.33 |
3 | 730.00 | 3 | 730.00 |
3 | 444.34 | 3 | 444.34 |
3 | 714.80 | 3 | 714.80 |
3 | 1968.17 | 3 | 1968.17 |
3 | 19898.00 | 3 | 19898.00 |
3 | 420.00 | 3 | 420.00 |
3 | 526.40 | 3 | 526.40 |
3 | 669.24 | 3 | 669.24 |
3 | 683.97 | 3 | 683.97 |
3 | 12502.79 | 3 | 12502.79 |
3 | 2684.76 | 3 | 2684.76 |
3 | 1632.26 | 3 | 1632.26 |
3 | 5985.72 | 3 | 5985.72 |
3 | 602.11 | 3 | 602.11 |
3 | 125600.00 | 3 | 1256.00 |
3 | 3733.70 | 3 | 3733.70 |
3 | 640.00 | 3 | 640.00 |
3 | 15212.00 | 3 | 15212.00 |
3 | 20121.00 | 3 | 20121.00 |
3 | 21331.00 | 3 | 21331.00 |
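The row-by-row comparison of entered and raw values can also be done programmatically once both versions are available electronically. The sketch below is one possible way to do it, not a prescribed procedure: the function name `find_mismatches` and the tolerance `tol` are our own assumptions, and `rows` holds only the last eight rows of the table above to keep the example short.

```python
# (culture, entered_distance, raw_distance) for the last eight rows of the table.
rows = [
    (3, 5985.72, 5985.72),
    (3, 602.11, 602.11),
    (3, 125600.00, 1256.00),
    (3, 3733.70, 3733.70),
    (3, 640.00, 640.00),
    (3, 15212.00, 15212.00),
    (3, 20121.00, 20121.00),
    (3, 21331.00, 21331.00),
]


def find_mismatches(rows, tol=0.005):
    """Return (row_index, entered, raw) for each row whose two values differ.

    tol is an assumed tolerance of half a cent-sized unit, chosen because
    the scores are recorded to two decimal places.
    """
    return [
        (i, entered, raw)
        for i, (_culture, entered, raw) in enumerate(rows)
        if abs(entered - raw) > tol
    ]


mismatches = find_mismatches(rows)
for i, entered, raw in mismatches:
    print(f"row {i}: entered {entered:.2f}, hard copy {raw:.2f}")
# prints: row 2: entered 125600.00, hard copy 1256.00
```

A check like this only helps when the raw values already exist in a second file (for example, from double entry); otherwise the comparison against the paper form still has to be done by eye.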