Geohistograms in Python

Recently, while working on a data analysis project, I was asked to show the distribution of customers by postcode on a map. I had never made any geographic plots before, so I had to research the topic a bit. I found a great library for exactly that case and would like to share how to use it in this quick tutorial, which shows unemployment rates on a map.

Installation

For our visualizations we will be using Folium. This is a description from the authors:

Folium makes it easy to visualize data that’s been manipulated in Python on an interactive leaflet map. It enables both the binding of data to a map for choropleth visualizations as well as passing rich vector/raster/HTML visualizations as markers on the map. The library has a number of built-in tilesets from OpenStreetMap, Mapbox, and Stamen, and supports custom tilesets with Mapbox or Cloudmade API keys. folium supports both Image, Video, GeoJSON and TopoJSON overlays.

That sounds neat. To install folium for our project, we just need to run one of the commands below (depending on which Python distribution you are using):

# Pick one of these
conda install folium -c conda-forge
pip install folium

Creating the map

With folium installed, we can start building the visualizations. First of all, we need to… you guessed it! Import the folium package.

import folium

The next step is to create a Map object with selected parameters (you can create the object without parameters to just show a world map). We will use the location and zoom_start parameters to specify which part of the map to show. Later we will also use tiles and attr for map styling. The specified latitude and longitude describe the approximate center of Poland, and the zoom level is chosen so that the whole country is visible.

pl_map = folium.Map(
    location=[51.9194, 19.1451],
    zoom_start=6,
)

Colored map

The map looks good, but why not make it better for our analysis? We really don’t need all those fancy colors. To change the style of the map, we need to use the previously mentioned tiles and attr parameters. You can get values for many different styles from leaflet-providers. In this tutorial I used the CartoDB.PositronNoLabels theme. As you can see below, the map looks much clearer now.

# Add these to the folium.Map() parameters
tiles='https://{s}.basemaps.cartocdn.com/light_nolabels/{z}/{x}/{y}{r}.png',
attr='&copy; <a href="https://www.openstreetmap.org/copyright">OpenStreetMap</a> contributors &copy; <a href="https://carto.com/attributions">CARTO</a>'

White map

Adding data to the map

The data I’m using is the unemployment rate for each of the districts (voivodeships) in Poland for April 2019. I took the data from Statistics Poland and it is available here. For the purposes of this tutorial, I converted the data into JSON and imported it into a pandas DataFrame. The head of the DataFrame can be seen below:

NAME RATE
Swietokrzyskie 8.0
Wielkopolskie 3.0
Kujawsko-Pomorskie 8.4
Malopolskie 4.5
Dolnoslaskie 5.0
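
Loading such data could look roughly like the sketch below. The inline JSON string stands in for the converted file, whose exact layout is not shown in this post, so the list-of-records structure is an assumption; only the column names and the five values come from the table above.

```python
import json

import pandas as pd

# Assumed layout: a JSON list with one record per district.
# With a real file, use json.load(open_file_object) instead of json.loads.
raw_json = """
[
  {"NAME": "Swietokrzyskie", "RATE": 8.0},
  {"NAME": "Wielkopolskie", "RATE": 3.0},
  {"NAME": "Kujawsko-Pomorskie", "RATE": 8.4},
  {"NAME": "Malopolskie", "RATE": 4.5},
  {"NAME": "Dolnoslaskie", "RATE": 5.0}
]
"""

# Each record becomes one row; the keys become the column names.
unemployment_df = pd.DataFrame(json.loads(raw_json))
print(unemployment_df.head())
```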


Moreover, to display the data on the map, I needed a GeoJSON file containing the boundaries of all the districts in Poland. The GitHub user filipstachura provided a really great gist that we can download: GeoJSON with Polish Administrative Areas Boundaries. We will use this one soon!
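
After downloading a GeoJSON file, it can help to check which property of each feature stores the district name, because that is what the data will be matched against later. The sketch below uses only the standard library. The two-feature sample is made up for illustration and the helper name property_keys is hypothetical; the real file may use different property names.

```python
import json

# Made-up sample standing in for the downloaded file.
# With the real file: json.load(open(path, encoding="utf-8")).
sample_geojson = json.loads("""
{
  "type": "FeatureCollection",
  "features": [
    {"type": "Feature", "properties": {"name": "Wielkopolskie"}, "geometry": null},
    {"type": "Feature", "properties": {"name": "Malopolskie"}, "geometry": null}
  ]
}
""")


def property_keys(geojson):
    """Return the sorted property keys used across all features."""
    keys = set()
    for feature in geojson["features"]:
        # "properties" may be missing or null, so fall back to an empty dict.
        keys.update(feature.get("properties") or {})
    return sorted(keys)


print(property_keys(sample_geojson))  # prints ['name']
```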

First of all, we need to create a choropleth layer, which is built into the folium package. Let’s load the GeoJSON and use it with our map. When we display the map, we can see really nice dark borders around every district, and each of the districts is coloured blue. It’s something, but we still want to display the unemployment rates!

import os

district_file = os.path.join('data', 'poland_woj.json')

folium.Choropleth(
    geo_data=district_file,
).add_to(pl_map)

White map

To do this, we will use our previously created DataFrame with the names of the districts and their unemployment rates. We will add five more parameters to our Choropleth. The data parameter is nothing more than our DataFrame. With columns we select the columns to use: the first one holds the key and the second one the values to plot. With key_on we point to the property in the GeoJSON whose values are matched against our NAME column. The fill_color and legend_name are purely cosmetic.

folium.Choropleth(
    geo_data=district_file,
    data=unemployment_df,
    columns=['NAME', 'RATE'],
    key_on='feature.properties.name',
    fill_color='Reds',
    legend_name='Unemployment Rate (%)'
).add_to(pl_map)

Unemployment map

Great job! Our geohistogram is ready and it looks really good!
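
If some districts do not get the expected color, one likely cause is that the names in the data and in the GeoJSON do not match exactly (for example because of diacritics or letter case). The sketch below lists the names found on only one side. The helper unmatched_names and the sample values are hypothetical, and it assumes the name is stored under properties.name.

```python
def unmatched_names(data_names, geojson):
    """Return (names only in the data, names only in the GeoJSON), both sorted."""
    geo_names = {feature["properties"]["name"] for feature in geojson["features"]}
    data_names = set(data_names)
    return sorted(data_names - geo_names), sorted(geo_names - data_names)


# Made-up sample: the second name differs only by a diacritic.
sample_geojson = {
    "type": "FeatureCollection",
    "features": [
        {"type": "Feature", "properties": {"name": "Wielkopolskie"}, "geometry": None},
        {"type": "Feature", "properties": {"name": "Małopolskie"}, "geometry": None},
    ],
}

only_in_data, only_in_geo = unmatched_names(
    ["Wielkopolskie", "Malopolskie"], sample_geojson
)
print(only_in_data)  # prints ['Malopolskie']
print(only_in_geo)   # prints ['Małopolskie']
```

With the real data, the first argument could be unemployment_df['NAME'] and the second the parsed GeoJSON file.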

Conclusions

The folium package is a great tool for displaying map data for analysis. It allows us to create maps, load GeoJSON data and use our data from great tools like pandas with ease. A map can be saved as an HTML file, so you should also be able to use it in your Django/Flask projects. Once again we can see why Python is such a great language, with tools like folium. Do you use any other packages for maps? Throw in a comment if you do!