Cancer Rate Analysis

This is a research project on the associations between external factors and cancer death rate, carried out with my classmates Christina Lee, Jin-Su Oh, and Joe Cho at the University of California, Berkeley. Here is part of the data analysis and the results we found.


This research examines the cancer rate in the United States and factors possibly correlated with it, such as precipitation, population density, latitude, and race. We obtained data on cancer rates and used graphs and images to show the correlation with each factor.

We analyzed precipitation and the cancer death rate and, after running a regression, found a slight positive correlation between the two. We then mapped latitude and cancer rates for each state and found no real trend between geographical latitude and cancer rate. After juxtaposing maps of population density and death rate and running a regression on these two variables, we also found a slight positive correlation between a county’s population density and its death rate.

Relationship between precipitation and cancer death rate: average total precipitation per year in inches, taken over 1971-2000, is plotted against the annual death rate per 100,000 in 2006 (data taken from the National Cancer Institute), with each point labeled by its state abbreviation. A regression line is fitted through the scatterplot, and it shows a positive correlation between precipitation and death rate. States with more rainfall, averaging 50-60 inches per year, appear to have a much higher death rate, around 220 deaths per 100,000 people a year. The fitted line has an intercept of 162.1512 and a slope of 0.5782. According to the linear regression model, the death rate with no precipitation is about 162 per 100,000, and each additional inch of annual precipitation is associated with an increase of about 0.5782 in the death rate.
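To make the reading of the intercept and slope concrete, here is a minimal Python sketch that evaluates the fitted line at a few precipitation values. The two coefficients are the ones reported above; the function and variable names are only illustrative and are not taken from our original analysis.

```python
# Coefficients reported in the text; all names here are illustrative assumptions.
INTERCEPT = 162.1512  # predicted deaths per 100,000 at zero precipitation
SLOPE = 0.5782        # change in predicted death rate per additional inch per year


def predicted_death_rate(precip_inches: float) -> float:
    """Evaluate the fitted line: death rate = intercept + slope * precipitation."""
    return INTERCEPT + SLOPE * precip_inches


if __name__ == "__main__":
    # Print the model's prediction for a few precipitation levels (inches per year).
    for inches in (0, 10, 30, 50, 60):
        rate = predicted_death_rate(inches)
        print(f"{inches:>2} in/yr -> {rate:.1f} deaths per 100,000")
```

Under this model, 50 to 60 inches of precipitation per year corresponds to a fitted death rate of roughly 191 to 197 per 100,000, so individual states near 220 sit above the line.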

Population density is plotted per county (data taken from the U.S. Census Bureau), with warmer colors representing areas of higher population density and cooler colors representing areas of lower population density. As expected, counties containing business-intensive cities, such as Los Angeles, New York, San Francisco, and Chicago, appear in very warm colors, indicating very high population density.







Cancer death rate is plotted by county, with areas of higher cancer death rate represented by warmer colors and areas of lower cancer death rate by cooler colors. Predictions are made using the covariogram method. The map agrees with the research hypothesis that counties in the Southeast have higher cancer death rates relative to the rest of the nation.
