Good News, Bad News, and Good News
The good news is that a ton of data is available for free. The US Federal Government, all the US State Governments, the European Union, The United Nations, and other non-governmental entities make data available for free. The bad news is that the particular combination of data you need may need to compiled from two or more different sources. The good news is that the EXCEL vLookUp formula and a bit of EXCEL text formulas may be all you need to bring the data together for your analysis.
Rent and Birth
Suppose you wanted to know if/how the cost of rent in New York correlates to child birth problems and infant morbidity. The New York State Department of Health has the data you need regarding birth and infants at
Here are two cropped pictures of a data set that is available and can be imported to EXCEL:
This view is for having something pretty:
This is how it looks when it is scraped into EXCEL with a data connection:
If you copy and paste the “pretty” version into EXCEL, one has trouble with the headers.
So now we have our birth data.
The United States Census has rent data available at the American Fact Finder website at:
You need to go through a few screens to get to the data. The two key screens are Advanced Search screen and the Table Viewer. We do not cover all the details of how to navigate these in this article (see links to more information about that at the end of this article).
Here is what the advanced search screen looks like:
In this example we chose a topic (Gross Rent) and a geography (All 5-Digit ZIP Code . . .). This brought up a choice of many resources. We chose “MEDIAN GROSS RENT”. Then we clicked on “View”.
This brought us to the Table Viewer screen.
We used some of its features to get the data the way we wanted it (more information about that at the end of this article) and then clicked on “Download”.
Here is the downloaded EXCEL data set:
Applying the vLookUp Formula
The details for preparing the data for the vLookUp formula and then applying the vLookUp formula are not provided here (see more about that at the end of this article). The point is that this allows us to combine the rental cost and birth data so that we can produce the following scatter plots:
This certainly would appear to support the hypothesis that median rent by zip code correlates to these indicators. Given that this is highly plausible prior to any examination of data, this analyst would say the burden of proof has now shifted to those who would deny it.
Comparing Teen Birth Rate and Infant Death Rate
If we make the scatter plot of the teen birth rate on the vertical axis and the infant death rate on the horizontal axis, then it appears that the dots for the 14 highest teen birth rates separate into three clusters.
Below we examine whether those Cartesian clusters correspond to geographic clusters.
Cartesian (Scatter Plot) and Geographic Data Mapping Compared
Of course, we don’t expect perfectly corresponding clusters. We mark one Cartesian cluster with an orange triangle, one with a blue disk, and the third with a green pentagon.
- 5 of the seven blue disks line up along the geographic west side relatively north of the other data points.
- The three orange triangles cluster towards the geographic middle of the data points.
- 2 of the three green pentagons are way at the geographic southern end. The one in the geographic middle has the highest infant death rate.
Free Membership with DataDrivingSchool.com
The Free Membership includes:
- Monthly News Letter
- Discounted Prices on Learning Apps
- A Free See-It-Do-It learning app to get you up and running with the EXCEL vLookUp function.
- Free access to members only instructional videos including
- “Credibility and Effectiveness Mistakes with Data”.
- “Using Government Information Sources”.