Coronavirus Mapping as a Service

Note: This post does not contain any medical advice or suggestions on how to act and react. If you are looking for that, you are looking in the wrong place.

Economy and society in Switzerland are currently highly affected by the spread of the second SARS coronavirus (SARS-CoV-2), which causes the associated infectious disease (COVID-19). The World Health Organisation (WHO) classified the outbreak as a Public Health Emergency of International Concern (PHEIC) on January 30 and as a pandemic on March 11. In Switzerland, the state emergency level Eminent/Special Situation was reached on February 28, and further restrictions led de facto to the subsequent level Extraordinary Situation on March 13. This blog post reports on how the outbreak evolution can be continuously visualised as a reliable service with off-the-shelf tools.

As researchers and educators, we are inevitably affected by the coronavirus like most of the population. Conferences are cancelled, postponed, or switched to online-only participation, as are lectures. Visiting researchers can no longer visit, and project trips are cancelled as well.

We are however privileged in two ways as computer science researchers. First, most of our research works fine in isolation. Even Internet access could drop for a while, as long as electricity is available to power our notebooks. Second, as part of the digital transformation, we can think of how we can contribute to better capture, preserve, augment and convey the information about the outbreak. False facts spread as quickly as the virus does, and due to the dynamics and the global scope, the data quality differs a lot. Generally, much of the data are also not verified by independent sources, and are presented in a particular way without easy remixing for reinterpretation. Due to our work on MAO, we could think of a global collaborative network of research data. In this post, however, the focus is on a more entry-level visualisation of existing data for Switzerland.

A lot of maps are produced that show the absolute number of cases. Few sources exist that systematically track the numbers relative to the population size. A good example would be on Wikipedia, showing world maps for both metrics side by side. Yet this does not allow for precise comparisons due to the binning (i.e. 11 and 50 cases per 1 million inhabitants are coloured equally). Furthermore, it does not allow for zooming into a country’s second-order administrative areas easily. Even fewer reliable open data repositories exist. The BAG numbers, for instance, are delivered as HTML content and are not available via OpenData.Swiss as they should be. How much effort would it take to prototype a self-made map service that uses up-to-date numbers on a daily basis?
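To make the binning issue concrete, here is a minimal sketch comparing a binned colour scale with a continuous one. The bin edges are hypothetical and not taken from any real map; the helper names `binned_class` and `continuous_shade` are introduced purely for illustration.

```python
from bisect import bisect_right

# Hypothetical bin edges (cases per 1 million inhabitants); an assumption,
# not the edges of any particular published map.
BIN_EDGES = [1, 10, 100, 1000]

def binned_class(value):
    """Return the index of the colour class that a value falls into."""
    return bisect_right(BIN_EDGES, value)

def continuous_shade(value, vmax):
    """Map a value linearly to a shade in [0, 1], clipping at vmax."""
    return min(max(value / vmax, 0.0), 1.0)

for cases_per_million in (11, 50):
    print(cases_per_million,
          binned_class(cases_per_million),
          continuous_shade(cases_per_million, vmax=100))
```

With these assumed edges, 11 and 50 land in the same colour class, while the continuous scale keeps them apart.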

Using the BAG numbers, GIS data, population numbers and a Python/Geopandas tutorial, this is not hard to answer. Although the tutorial is not complete (it lacks normalised colour bars and the time-series perspective), it is straightforward to adapt to Switzerland. Recipe:

  1. Prepare GIS data: wget; mkdir -p data; mv data; cd data; unzip
  2. Prepare cantonal data: cases and population (manually; CSV file is available for 2020-03-14)
  3. Render maps (Python file is available; see below).

The following two figures compare absolute and relative case numbers (such maps are sometimes published without the disclaimer that these are obviously only the known and/or confirmed cases). Much of the current media attention is on the “Southern cantons” Ticino and Vaud due to high absolutes. When taking population size into account, Ticino looks worse, Vaud less so, and Basel is evidently on the worse side as well. Valais, another “Southern canton”, does not stand out in either map, making a purely geolocation-centric hypothesis less likely.

Switzerland coronavirus cases – absolute numbers
Switzerland coronavirus cases – relative numbers
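The relative numbers are simply the confirmed cases as a percentage of the cantonal population. As a small worked sketch, the snippet below applies this to two rows from the cantons.csv excerpt at the end of this post. The header line is an assumption derived from the column names used by the mapping script, and `case_density_percent` is a name introduced here for illustration.

```python
import csv
import io

# Two rows copied from the cantons.csv excerpt; the header line is an
# assumption based on the column names that the mapping script reads.
CSV_TEXT = """CANTON,ACR,INHABITANTS,VIRUSCASESCONFIRMED
Appenzell Ausserrhoden,AR,55234,3
Sankt Gallen,SG,507697,16
"""

def case_density_percent(cases, inhabitants):
    """Confirmed cases as a percentage of the population."""
    return 100 * cases / inhabitants

rows = list(csv.DictReader(io.StringIO(CSV_TEXT)))
density = {
    row["ACR"]: case_density_percent(int(row["VIRUSCASESCONFIRMED"]),
                                     int(row["INHABITANTS"]))
    for row in rows
}
for acr, percent in density.items():
    print(f"{acr}: {percent:.4f}%")
```

Sankt Gallen has more confirmed cases in absolute terms, yet Appenzell Ausserrhoden has the higher share of its population affected, which is exactly the kind of shift the relative map makes visible.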

The numbers show that in Ticino, around 0.07% of the population is currently affected (or, potentially, had been, since the inclusion of deaths is not explicit in the BAG numbers). The scales have been chosen consciously to highlight the distribution differences. On a more intuitive scale, reaching just 1% and, in absolute terms, the 5000 cases that correspond to 1% of a cantonal population of 500’000 (leaving just Bern and Zurich above), the maps look a lot less frightening:

Evidently, there is a lot of power in how the maps are presented, which calls for detailed studies. From a data science perspective, it is clear that raw data, ideally as a convenient service with augmented representations, should be provided first and foremost by authoritative sources. The service would have to be stateful, updating numbers on a daily schedule instead of per request, and in an automated way using web scraping until declarative data formats become available. The map representation could drill down to lower administrative areas (see however the lack of Zurich cases in detail), strengthening the geoinformatics perspective needed to resonate better with the population and hence increasing the societal impact.
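The stateful, once-per-day behaviour could be sketched roughly as follows. This is only an illustration under assumptions: `daily_numbers`, the JSON cache layout and the `fake_fetch` stand-in are hypothetical, and a real service would replace `fake_fetch` with an actual scraper for the published numbers.

```python
import datetime
import json
import os
import tempfile

def daily_numbers(cache_dir, fetch, today=None):
    """Return today's numbers, calling fetch() at most once per day."""
    today = today or datetime.date.today()
    path = os.path.join(cache_dir, f"cases_{today:%Y%m%d}.json")
    if os.path.isfile(path):
        # Already fetched today: serve the stored state instead of re-scraping.
        with open(path) as f:
            return json.load(f)
    numbers = fetch()
    os.makedirs(cache_dir, exist_ok=True)
    with open(path, "w") as f:
        json.dump(numbers, f)
    return numbers

# Stand-in for a real scraper (hypothetical); the values are the case
# counts of two rows from the cantons.csv excerpt.
def fake_fetch():
    fake_fetch.calls += 1
    return {"AR": 3, "SG": 16}
fake_fetch.calls = 0

with tempfile.TemporaryDirectory() as cache_dir:
    first = daily_numbers(cache_dir, fake_fetch)
    second = daily_numbers(cache_dir, fake_fetch)
    print(first == second, fake_fetch.calls)
```

Keying the cache file by date means that repeated map requests on the same day reuse the stored numbers, while the first request on a new day triggers a fresh fetch.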

Here are the map sources:

import pandas as pd
import numpy as np
import geopandas as gpd
import matplotlib.pyplot as plt
import datetime
import os

fp = "data/CHE_adm1.shp"
map_df = gpd.read_file(fp)
cantons_df = pd.read_csv("cantons.csv")
merged_df = map_df.merge(cantons_df, how="left", left_on="NAME_1", right_on="CANTON")

def plotmap(df, datacol, vmax, filename, title):
    # Colour scale for the colour bar, normalised to the range [0, vmax]
    sm = plt.cm.ScalarMappable(cmap='Blues', norm=plt.Normalize(vmin=0, vmax=vmax))
    fig, ax = plt.subplots(1, figsize=(30, 10))
    ax.set_title(title, fontdict={'fontsize': '25', 'fontweight' : '3'})
    ax.annotate("Sources: BAG, WP, ZHAW SPLab", xy=(0.68, 0.11),
                xycoords='figure fraction', fontsize=12, color='#555555')
    fig.colorbar(sm, ax=ax, extend="max")
    # One label position per canton, guaranteed to lie inside its polygon
    df['coords'] = df['geometry'].apply(lambda x: x.representative_point().coords[:])
    df['coords'] = [coords[0] for coords in df['coords']]
    for idx, row in df.iterrows():
        # Label text is passed positionally, which works across matplotlib versions
        ax.annotate(row['NAME_1'], xy=row['coords'], horizontalalignment='center')
    df.plot(column=datacol, cmap='Blues', linewidth=0.8, ax=ax, edgecolor='0.8', vmax=vmax)
    fig.savefig(filename, dpi=150)

merged_df["VIRUSCASESDENSITY"] = 100 * merged_df["VIRUSCASESCONFIRMED"] / merged_df["INHABITANTS"]
print(merged_df[["ACR", "VIRUSCASESDENSITY"]])
if not os.path.isfile("map_absolute.png"):
    plotmap(merged_df, "VIRUSCASESCONFIRMED", 500, "map_absolute.png", "# of confirmed coronavirus cases per canton")
    plotmap(merged_df, "VIRUSCASESDENSITY", 0.1, "map_density.png", "% of coronavirus cases per cantonal population")

os.makedirs("dailymaps", exist_ok=True)
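# Date stamps for the daily file names and the human-readable map titles
stamp = datetime.datetime.now().strftime("%Y%m%d")
hdate = datetime.datetime.now().strftime("%d.%m.%Y")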

plotmap(merged_df, "VIRUSCASESCONFIRMED", 5000, f"dailymaps/map_abs_{stamp}.png", f"# of confirmed coronavirus cases per canton [{hdate}]")
plotmap(merged_df, "VIRUSCASESDENSITY", 1, f"dailymaps/map_den_{stamp}.png", f"% of coronavirus cases per cantonal population [{hdate}]")
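
Excerpt from cantons.csv (columns as used by the script: CANTON, ACR, INHABITANTS, VIRUSCASESCONFIRMED):

Appenzell Innerrhoden,AI,16145,0
Appenzell Ausserrhoden,AR,55234,3
Sankt Gallen,SG,507697,16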
