The charts we provide on the global development of Covid-19 always show a selection of countries. So we accepted the help of another company to create a tool where everybody can make their own selection. The tool has now matured, so it is time to give a bit of background info, some explainers and some disclaimers.
Introduction
We've been tracking the Covid-19 pandemic since January 25th, first on Twitter, later on the dedicated page on our site. Our main goal is to help a wider audience gain some insight from the available data. We try to stay out of predictions and analysis; we leave that to the experts.
In the beginning the outbreak was only happening in a few countries, so reporting on all of them was possible. But as the epidemic became a pandemic, this was no longer doable, so we had to make selections. We also had limited time to create different forms of presentation. To make it possible to compare countries, different variations of starting points exist (first case/death in a country, first 100, first case per 100.000 capita, etc.). We couldn't keep up.
Then we got support from Innouveau. They suggested putting all the data we had been gathering into a tool so users could make their own selection: not only of countries, but also of starting points and perspectives (day-to-day versus cumulative, or absolute numbers versus per capita, for example).
So after some sleepless nights the tool was created, tested and placed online.
We are now already 12 versions further, having used all the feedback from users and some brainstorming of our own.
We hope you will find the tool useful. Please mail us your suggestions or feedback.
How does it work
There are three parts on the screen for selecting the countries and settings.
The first part is in the upper left corner. Here you can search for and select countries. Type a few letters in the search box and a list of options appears. Select a country/region by clicking on it, and it will appear in the graphs. You can also select all regions of a country in one go (if regions are available) by searching for that country and scrolling to the option "all regions...".
It's also possible to select global regions like Africa and Europe.
You can remove the countries/regions from the selection by clicking on the small x next to each country. You can change the line color by clicking on the colored dot and selecting another color.
The second part, just below the selection of countries, is for setting the starting point. You have four options. The first three options create a graph whose horizontal axis shows the number of days since the chosen starting point. This means countries are no longer synced in calendar time on the graph.
The options are:
- Events: select an event type as starting point. If you pick this one, you can change the event type below it. Please be aware that we don't have all the events in our database for all countries. So if a country doesn't have a specific event, the line won't show up in the graph because it has no starting point.
- Fatalities: Create a starting point based on the absolute or relative number of fatalities for a country.
- Cases: Create a starting point based on the absolute or relative number of cases (confirmed) for a country.
- Date: Pick a date from which the data should be displayed.
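To illustrate how a starting point based on fatalities or cases shifts each country onto a shared "days since" axis, here is a minimal Python sketch. It is not the tool's actual code; the function name align_to_threshold and the sample numbers are made up for this example.

```python
from typing import List, Optional


def align_to_threshold(cumulative: List[float], threshold: float) -> Optional[List[float]]:
    """Return the series starting at the first day the cumulative count reaches the threshold.

    Index 0 of the result is day 0 since the starting point. Returns None when
    the threshold is never reached, in which case no line can be drawn.
    """
    for day, value in enumerate(cumulative):
        if value >= threshold:
            return cumulative[day:]
    return None


# Hypothetical cumulative fatalities for two countries on the same calendar days.
country_a = [0, 2, 8, 15, 40, 90]
country_b = [0, 0, 0, 1, 12, 30]

aligned_a = align_to_threshold(country_a, 10)  # [15, 40, 90]
aligned_b = align_to_threshold(country_b, 10)  # [12, 30]
```

Both lines now start at day 0 even though country B reached the threshold one calendar day later than country A.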
The third part, to the right of the second part, is about changing the way the data is displayed. The toggle switches are:
- Logarithmic scale: switch the vertical axis from linear to logarithmic (powers of 10). This means each step up on the axis is 10 times the previous value. Useful for studying possible exponential growth patterns.
- Per (1 million) capita: switch from absolute numbers to numbers proportional to the population (expressed as x per million people). This makes it possible to compare the relative impact on countries with big differences in population size.
- Cut Y-Axis: if there is a lot of blank space below the lowest data point, you can use this option to gain more detail by spreading the data evenly.
- Cases/Fatalities cumulative: switch between new cases/fatalities per day and cumulative cases/fatalities (all cases/fatalities up to that day).
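The per capita and cumulative toggles come down to simple arithmetic. The sketch below shows one way to express them in Python; the function names and sample numbers are assumptions for illustration, not taken from the tool itself.

```python
from itertools import accumulate
from typing import List


def per_million(values: List[float], population: int) -> List[float]:
    """Express absolute numbers as x per million people."""
    return [v * 1_000_000 / population for v in values]


def to_cumulative(daily: List[float]) -> List[float]:
    """Turn new cases/fatalities per day into a running total."""
    return list(accumulate(daily))


def to_daily(cumulative: List[float]) -> List[float]:
    """Turn a running total back into new cases/fatalities per day."""
    if not cumulative:
        return []
    return [cumulative[0]] + [b - a for a, b in zip(cumulative, cumulative[1:])]


daily = [1, 3, 0, 6]                 # hypothetical new fatalities per day
cumulative = to_cumulative(daily)    # [1, 4, 4, 10]
relative = per_million([50, 100], population=2_000_000)  # [25.0, 50.0]
```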
What are the sources of the data
- Population data are mainly from Wikipedia.
- The data on cases and fatalities on country level, plus the states/provinces of China, Canada and Australia, are from the Center for Systems Science and Engineering (CSSE) at Johns Hopkins University.
- The data on cases and fatalities on regional level for countries in Europe are from the EU Joint Research Centre (JRC).
- The data on cases and fatalities for the states of the USA are from the New York Times (found via the Humanitarian Data Exchange).
- The events (a manual selection) are from ACAPS.
We use Python to gather and reformat the data. We use JavaScript for the tool.
What calculations do we add
In the graphs we use two calculations.
In the cases and fatalities graphs we draw a simple smoothed average as the thick line, to help see through the sometimes noisy data. The algorithm takes the value of a point and multiplies it by 7. It then takes the two values one day away (one in the past, one ahead in time) and multiplies each by 6, then the values 2 days out times 5, and so on until it reaches 6 days out, where the values are multiplied by 1. It sums all these products and divides the result by the sum of the multipliers. This gives a kind of centered 13-day average (the point itself plus 6 days on either side) in which nearer days get more weight.
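The weighting described above can be sketched in Python as follows. This is an illustration, not the tool's actual code. The text does not say what happens near the start and end of a series, so this sketch assumes that only the days that exist are used and that the sum is divided by the weights actually applied.

```python
from typing import List

HALF_WINDOW = 6  # days on each side of the centre point


def smooth(values: List[float]) -> List[float]:
    """Centred weighted average: weight 7 at the centre, down to 1 at six days out."""
    smoothed = []
    for i in range(len(values)):
        weighted_sum = 0.0
        weight_total = 0
        for offset in range(-HALF_WINDOW, HALF_WINDOW + 1):
            j = i + offset
            if 0 <= j < len(values):  # assumption: skip days outside the series
                weight = HALF_WINDOW + 1 - abs(offset)
                weighted_sum += weight * values[j]
                weight_total += weight
        smoothed.append(weighted_sum / weight_total)
    return smoothed


# A single spike of 49 surrounded by zeros is spread out over the neighbouring days.
spike = [0] * 6 + [49] + [0] * 6
print(smooth(spike)[6])  # 7.0, because 7 * 49 / 49 = 7
```

Away from the edges the multipliers add up to 7 + 2 × (6 + 5 + 4 + 3 + 2 + 1) = 49, which is why a spike of 49 makes the arithmetic easy to follow.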
In the growth graph we take the smoothed values from the fatalities graph and divide the current value (at day x) by the previous value (at day x minus 1). We then smooth the resulting values again, using the same smoothing as mentioned above.
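A rough Python sketch of this growth calculation is shown below. It is self-contained, so it includes its own copy of the weighted smoothing. Two things are assumptions of this sketch and not described in the text: days where the previous smoothed value is zero get no growth value (None), and missing values are skipped when smoothing.

```python
from typing import List, Optional


def smooth(values: List[Optional[float]], half_window: int = 6) -> List[Optional[float]]:
    """Centred weighted average (weights 7 down to 1); None entries are skipped."""
    result: List[Optional[float]] = []
    for i in range(len(values)):
        total, weights = 0.0, 0
        for offset in range(-half_window, half_window + 1):
            j = i + offset
            if 0 <= j < len(values) and values[j] is not None:
                weight = half_window + 1 - abs(offset)
                total += weight * values[j]
                weights += weight
        result.append(total / weights if weights else None)
    return result


def growth(daily_fatalities: List[float]) -> List[Optional[float]]:
    """Smooth, divide each day by the previous day, then smooth the ratios again."""
    smoothed = smooth(daily_fatalities)
    ratios: List[Optional[float]] = [None]  # the first day has no previous day
    for previous, current in zip(smoothed, smoothed[1:]):
        # Assumption: no ratio when the previous value is zero or missing.
        ratios.append(current / previous if previous and current is not None else None)
    return smooth(ratios)


steady = growth([10] * 20)  # a constant series gives a growth value of 1.0 every day
```

A value above 1 means the smoothed number of fatalities is rising from one day to the next, and a value below 1 means it is falling.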
What are specific known (data) issues with countries or regions
Not all data has the right quality or granularity. Here we list the most obvious problems. Please take note of them before mailing us about strange results.
A general remark on the data: if you see a flat line appearing when you select the per capita option, this probably means we haven't found the right population figure for that country/region. If you really want to make graphs with that country/region, please feel free to look it up and send us the data/link.
Specific issues:
- Austria (regional level): Data on fatalities starts only on 20 April, with the cumulative number of all days before.
- Netherlands (regional level): No data on fatalities at this level yet.
- Sweden (regional level): No data on fatalities at this level yet.
General issues:
Sometimes the source data is cumulative but in the data of one country/region one day suddenly has no value or zero as value, even when previous days have values above zero. This probably happens when reporting omitted a day. We have changed these omitted values to the value of the day before in order not to "disrupt" the used data in the graphs too much.
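The repair of omitted days described above could look like the following Python sketch. The function name fill_gaps and the sample series are made up for illustration, and our actual cleaning code may differ in its details.

```python
from typing import List, Optional


def fill_gaps(cumulative: List[Optional[float]]) -> List[Optional[float]]:
    """Replace a missing or zero value by the previous day's value once counts are above zero."""
    filled: List[Optional[float]] = []
    previous = 0
    for value in cumulative:
        if (value is None or value == 0) and previous > 0:
            value = previous  # carry the last known cumulative count forward
        filled.append(value)
        if value is not None:
            previous = value
    return filled


# Hypothetical cumulative counts with one missing day and one day reported as zero.
raw = [0, 3, 7, None, 12, 0, 20]
print(fill_gaps(raw))  # [0, 3, 7, 7, 12, 12, 20]
```

Leading zeros are left alone, since a cumulative count of zero before the first case is valid data.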
Since we use several sources, sometimes one of the sources is one day behind on reporting. We then fill the last day with zero new cases/fatalities. So sometimes you can see a sudden drop at the end of the graph. This does not always mean there are no more cases/fatalities. Check the next day for the update.
Disclaimers!
Please be aware that the data we use is the data as provided by the governments or responsible institutions in specific countries.
Due to differences in testing, vetting, administration and reporting, data between countries is often hard to compare.
Less testing means fewer cases found. But this does not mean people are not affected. In many countries, especially in the beginning of the outbreak, the elderly were not tested and thus not counted as cases or deaths. Given that the 65+ group was more vulnerable, this leads to underreporting.
Furthermore there is the political issue. In countries like China, Iran and Egypt (and many more) the people in charge have suppressed correct reporting, or in some cases the people reporting to the top were too scared for their jobs to give the real numbers.
Last but not least, we lean on third parties for gathering all the data. Errors can be made in that process. And we can also make errors in scraping, aligning and cleaning the data we gather.
So please don't use this tool for scientific research, policy making or company decisions. It is only here to help gain some insight into what is going on all over the world in this pandemic.
If you find any obvious and correctable errors, please let us know.
Stay safe!