Tuesday, May 5, 2020

Covid-19 Correlations

Where I started

I was thinking about why there is such a wide spread in deaths per million from or with coronavirus across the 48 countries of Europe, so I decided to look at some correlations. It's important to note these are not direct country-to-country comparisons; I think we all know by now those are very dangerous. Rather, correlation is a statistical method for seeing whether, and how strongly, two sets of values are linked. The strength of a correlation is measured by what I'll call the P (for Pearson) value, which statisticians usually write as r and which is not the same thing as a significance p-value. It ranges from -1 (perfect negative correlation) through 0 (no correlation) to +1 (perfect positive correlation). A negative correlation means that as one value gets bigger, the other gets smaller. A positive correlation means that the values increase or decrease together. I used to do rank correlations by hand for my Open University courses, but that is hard work and we now have computers that can do that stuff for us. I used LibreOffice Calc and it is a matter of fractions of a second to get an answer that would certainly have taken hours using the old pen and paper method.
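For anyone who wants to see what the spreadsheet is doing under the hood, here is a minimal sketch of the Pearson calculation in plain Python. The function name and the numbers are made up purely for illustration; they are not the Worldometer figures, and this is not necessarily how LibreOffice computes it internally.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length with at least 2 values")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Deviations of each value from its column mean.
    dx = [x - mean_x for x in xs]
    dy = [y - mean_y for y in ys]
    covariance_sum = sum(a * b for a, b in zip(dx, dy))
    spread_x = sum(a * a for a in dx)
    spread_y = sum(b * b for b in dy)
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined when one column is constant")
    return covariance_sum / sqrt(spread_x * spread_y)


# Hypothetical values, invented for this example only.
density = [10, 20, 30, 40, 50]
deaths_rising = [1, 2, 3, 4, 5]    # rises in step with density
deaths_falling = [5, 4, 3, 2, 1]   # falls as density rises

print(pearson(density, deaths_rising))
print(pearson(density, deaths_falling))
```

The first pair moves in lockstep, so it sits at the positive end of the scale; the second pair moves in opposite directions, so it sits at the negative end. Real data lands somewhere in between.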

"Correlation is not causation" is a very important concept. It might be that a correlation is spurious. There are some great spurious correlations of time series here:

https://www.tylervigen.com/spurious-correlations

However, given a strong correlation of P > 0.5, it is certainly legitimate to hypothesise about causation.

Population Density

My first instinct was that it was due to population densities. It seems obvious that if you cram people together the spread of disease will be faster and you may get a higher mortality per capita than more sparsely populated countries. So what does the correlation look like?

The P value is 0.234105007129706 which represents a weak positive correlation. Not very much above chance. I was surprised.

That is certainly not a good enough correlation to warrant searching too hard for a causation.

Life Expectancy

We generally think of life expectancy as an indicator of the quality of health and social care and the general health of the population. The longer the life expectancy, the better we are doing. So you'd expect that the longer the life expectancy, the better the health and social care available, and there should be a negative correlation: as life expectancy goes up, mortality goes down.

The European values range from 71.72 to 85.42 years, a pretty wide range for what we would all expect to be developed industrial countries.

I tried life expectancy against deaths per million. The P value is 0.529. That is certainly more significant. Yes, that's a positive correlation: as life expectancy goes up, so does a country's mortality rate for Covid-19.
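One rough way to put a number on "above chance" is the usual t statistic for a Pearson correlation, t = r * sqrt(n - 2) / sqrt(1 - r^2). The sketch below applies it to the two values in this post. It assumes all 48 countries made it into both calculations (the spreadsheet may hold fewer), it uses an approximate two-tailed 5% threshold of about 2.01 for 46 degrees of freedom, and it leans on the standard assumptions behind the test, so treat it as a sanity check rather than a verdict.

```python
from math import sqrt


def t_statistic(r, n):
    """t statistic for testing whether a Pearson correlation differs from zero."""
    if n < 3:
        raise ValueError("need at least 3 data points")
    if not -1 < r < 1:
        raise ValueError("r must be strictly between -1 and 1")
    return r * sqrt(n - 2) / sqrt(1 - r * r)


N_COUNTRIES = 48   # assumption: every European country is in both data sets
T_CRITICAL = 2.01  # approximate two-tailed 5% threshold for 46 degrees of freedom

results = [
    ("population density", 0.234105007129706),
    ("life expectancy", 0.529),
]

for label, r in results:
    t = t_statistic(r, N_COUNTRIES)
    verdict = "clears" if abs(t) > T_CRITICAL else "does not clear"
    print(f"{label}: t = {t:.2f}, {verdict} the 5% threshold")
```

Under those assumptions the population density figure comes out at a t of roughly 1.6, short of the threshold, while the life expectancy figure comes out at roughly 4.2, comfortably past it.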

It seems, therefore, that a high life expectancy is a disadvantage when it comes to mortality from this virus. So maybe it is not a good indicator of the quality of health, social care and fitness after all. Maybe it's just an indicator that we can keep sick people alive longer. I'm not that sure that's a good thing when a pandemic strikes.

There is an alternative explanation. It could be that the reporting in countries with a lower life expectancy is less accurate and they are underreporting deaths from or with coronavirus.

I'll let you decide.

Data


I did this a while ago and the data is from Worldometer, from Wednesday 29th April. The UK has changed its reporting method since then to include non-NHS deaths. I don't know whether this has a large effect, but I'll do a revision if I get a chance.

The spreadsheet of values I have used is here 






