The article Cartograms in Tableau described the limitations of Choropleth Maps under certain circumstances and why Cartograms can sometimes be a viable alternative to color coded maps.

Building upon that, the previous post (Create your own Cartograms in Tableau) provided a Microsoft Excel tool to create the polygons for Cartograms based on your own data. The Excel tool (called Cartogram Data Generator) is based on my VBA implementation of “an algorithm to construct continuous area cartograms” by J. Dougenik, N. Chrisman and D. Niemeyer, published in “Professional Geographer” back in 1985.

With this VBA algorithm and two other techniques previously published here (Faster Choropleth Maps with Microsoft Excel and Create Excel Freeform Shapes from Polygons), we have all modules available to calculate and plot Cartograms directly in a stand-alone Microsoft Excel workbook.

Today’s post describes how to create Cartograms in Microsoft Excel and provides two versions of the Cartogram workbook for free download.

As mentioned in the introduction, we already have all the techniques and VBA implementations available to plot Cartograms in Excel. All we have to do is to pull the different the code snippets into one workbook.

The code in this workbook will

- calculate the distorted polygons of the regions using the algorithm provided in Create your own Cartograms in Tableau
- create freeform shapes based on the resulting distorted polygons using the code published in Create Excel Freeform Shapes from Polygons
- and finally color the freeform shapes (regions) in the tried and tested way shown in Faster Choropleth Maps with Microsoft Excel

Nothing new under the sun, but the combination of all these techniques makes an interesting and easy-to-use Excel workbook for plotting Cartograms:

Cartograms of the US Presidential Elections since 1900:

Or selected statistics of the European Union:

As you can see in the screenshots above, I reused the examples of the previous articles: statistics of the European Union and the results of the US Presidential Elections. I thereby noticed that it may be useful to create two different versions of the workbook depending on the structure of the underlying data:

- Version 1 (the EU Statistics)

Different measures and the option for the user to define which measure shall be used for color coding and which one for resizing the regions - Version 2 (the US Presidential Elections)

Data on a yearly basis and the option to select one specific year to be plotted. The measures for coloring (winner of the region) and the size (popular votes) stay the same

The VBA algorithm itself is pretty much the same in the two versions. There are only a few differences in referencing the input data and the user-defined selections.

As mentioned before, the code is very similar, but the data set up is different.

But let’s start with the things the two versions have in common: both workbooks need the original polygon data on a worksheet called [polygons]:

In addition to this polygon sheet, version 1 requires only one data worksheet with the measures organized in a cross table (regions in the rows and measures in the columns). The sheet looks like this:

Version 2, however, needs two data sheets organized in the same structure: the cross table has again the regions in the rows, but the columns represent the years of the elections. The worksheet [data color] contains the values used for coloring the map (the winning parties):

The worksheet [data size] has the values of the size measure (the popular votes):

If you want to use one of the templates for your own data, you have to decide first which version is best for the structure of your data.

**Version 1 – One data worksheet with several measures**

Above and right to the map, the dashboard provides a few interactive options and displays the legend and some statistics about the results of the algorithm:

- With the 2 combo boxes (drop down lists) in the top row of the dashboard, you define the measure to be used for coloring the map and the measure to be used for resizing the regions

- The spinner in the options section lets you define the number of iterations of the algorithm. Recommended minimum is 8. According to my experiences and tests, reasonable results are usually achieved with 12 to 15 iterations. The more iterations, the more precise the results, but also the longer the runtime and vice versa
- With the checkbox beneath the iterations, you can specify whether or not the original, i.e. not distorted map shall be shown in the background. This applies only if you selected the map type Cartogram, of course. If the box is checked, the map would look like this:

- The color scheme drop down provides 24 pre-defined color schemes to select from plus six dummy entries you can use for your own schemes by changing the cell colors on worksheet [color schemes]
- With the option button you decide whether you want to display a Cartogram or a standard Choropleth Map
- The legend is automatically updated depending on the selected color measure and the current color scheme
- Finally the section “Cartogram algorithm results” provides some statistics about the data and the last run of the algorithm. It shows indicators about the size of the data (regions, measures, polygon points), the last runtime in seconds and some statistics regarding the final size errors (minimum, maximum, average, median and standard deviation). The size error is the key performance indicator of the algorithm. It is defined as the difference between the desired area of the region (according to the size measure value) and the actual size of the polygon of this region in the Cartogram. The statistics allow you to evaluate how precise the results are. If you think the size errors are too high, you can increase the number of iterations and run the algorithm again (at the cost of a longer runtime).

**Version 2 – Two data worksheet with e.g. years in the columns**

In version 2, color and size measure are fixed and cannot be changed by the user. Also, there is no option to change the color scheme. Other than that, the dashboard provides the same input features and displays the same statistics:

In version 1, changing the size measure or using the option buttons will start the algorithm. Selecting another color measure or color scheme will just recolor the map. Changing the number of iterations or the background map option will not trigger the algorithm. The changes will take effect in the next run, i.e. if you select a new size measure or map type.

In version 2, only selecting another year or map type will start the algorithm. Again, changing iterations or the background map option will take effect in the next run.

During the runtime of the algorithm, the code shows its progress in Excel’s status bar at bottom left.

Using the templates for your own data requires ore or less the same procedure as described in steps 1 to 6 of the previous article (Create your own Cartograms in Tableau). As soon as you have your data and the polygons in the workbook, you may have to change the list of measures on the worksheet [control] (e.g. if you have more measures in your data than the template), the combo boxes on the dashboard and maybe the values for scaling and positioning the Cartogram (at the top of worksheet [control]).

As long as you do not change the structure of the workbook (i.e. the order of the sheets), the table names and the named ranges, you do not have to change anything in the VBA code.

The Cartogram workbook for the selected statistics of the European Union (version 1 of the implementation):

Download Cartograms Excel European Union (Microsoft Excel 2010-2013, 1,163K)

The workbook with Cartograms of the results of the US Presidential Elections since 1900 (version 2 of the implementation):

Download Cartograms in Excel US Presidential Elections (Microsoft Excel 2010-2013, 453K)

Besides the time needed to set up and adjust the templates for your own data, the performance of the algorithm is the main drawback of the implementation. The runtime of the algorithm depends on the number of polygon points and iterations. On my machine, the code needs ~ 15 seconds to update the EU Cartogram and 6 to 7 seconds for the US Presidential Elections Cartogram.

This is too long for a dashboard in production in my book, but I could not find a way to further improve the performance. If you look at the VBA code, the most time consuming part of the algorithm is step 7, where the distance of each polygon point from the center of each region is calculated (and some more variables based on the distance). In my performance tests, this step took more than 95% of the entire runtime of the algorithm and I simply do not see how I could speed this up. There may be some performance potential in the other parts of the code, but since those parts only make 5% of the runtime, performance tweaks would not really pay off.

So, agreed, the algorithm is too slow for an Excel dashboard in production, but I think the results are worth waiting for a few seconds.

So much for Cartograms in Excel and Tableau.

Speaking of performance issues: the performance of VBA implementations in Excel very often suffer from using the wrong way to read data from and / or write data back to the worksheet. The upcoming article will demonstrate how much the performance can be improved by using the right way of transferring data from the sheets to the VBA code and back.

Stay tuned.

]]>One of the recent posts showed how Cartograms can be a viable alternative when Choropleth or Filled Maps are stretched to their limits and tell the wrong overall story:

The article included a Tableau Public dashboard comparing Filled Maps with Cartograms for the results of the US Presidential Elections since 1900.

Since Tableau does not provide Cartograms as a built-in chart type, the distorted polygons have to be calculated outside of Tableau and then used as a data source to plot the maps using Tableau’s polygon map approach.

Today’s article is the follow-up post. It provides a free and open Microsoft Excel workbook to create Cartogram data with a few mouse clicks: copy in the data and the original polygon points of all regions and let the tool create a ready-to-use data set to plot Cartograms in Tableau.

**Step 1: Download the Tool**

First download the Cartogram Data Generator:

Download Cartogram Data Generator (Microsoft Excel 2010 – 2013, 991.8K)

**Step 2: Collect the Data**

The tool requires a defined set up of the data and the polygons of the regions.

The data set up looks like this:The first three columns of the table represent the regions:

- an ascending index (ID) starting at 1
- the region’s abbreviation
- the full name of the region

The following columns (from column E to the right) contain the data of the measures you want to create Cartograms for.

In the EU example shown in the screenshot, the table has 28 regions (EU member states) and 12 measures (Members of Parliament, Population 2014, etc.). You can expand this table to more regions and more measures, but it is crucial that everything is inside the table “tab_data” (List Object). This is important, because the code is referencing this table name and you therefore should not delete or rename it.

**Step 3: Collect the Polygons**

Next step is collecting the polygon data for all regions. In the end the polygon data has to look like this:

The first two columns contain the regions (ID and abbreviation), followed by the point order of the polygon and the latitude and longitude of this point.

You can find ready-to-use polygon data for some states or continents on the Internet. If you do not find what you are looking for, you can still try to find an ESRI Shapefile and use Richard Leeke’s ShapeToTab utility to transfer it to polygon data (more at the end of this article: Create Your Own Filled Maps in Tableau).

There is one very important thing to know: the algorithm handles only one polygon per region. If you have regions in your data with more than one polygon, you need to find out which one is the largest (e.g. the main land without the islands around it) and delete all others.

Again, do not rename or delete the table “tab_polygons”.

**Step 4: Copy the Data and Polygons to the Tool**

Open the tool with Microsoft Excel and make sure macros are enabled.

I left the example data of the European Union in the tool for demonstration purposes, so you have to clear the tables first.

- go to the worksheet [data]
- clear the existing values (from row 6 downwards)
- clear the header row for the measure columns (cell E5 to the right).
- add as many rows (regions) and columns (measures) as you need
- insert your own data in the data body range and the measure names in the column headers
- if applicable, delete unused (empty rows and columns) of the table

Do the same for the table “tab_polygons” on worksheet [polygons].

**Step 5: Check Data Validity and Integrity**

The tool / algorithm does not perform any checks on the validity, integrity or plausibility of the data. It expects to find a valid data set and valid polygons. Hence, data integrity and validity is in the user’s responsibility and you should especially check the following points:

- the IDs of the regions are unique (no duplicates)
- the values in the measure columns are either numbers or empty cells, i.e. no texts. Make sure the values are real number, not texts just looking like numbers
- all data rows of one region in the table “tab_polygons” (in the defined point order) make a valid, closed polygon, i.e. coordinates of the first and last point are identical
- the IDs of the regions in the data table and the polygon table correspond correctly, e.g. if ID of Austria is 1 in the data table, all points with the ID 1 in the polygon table make the valid polygon of Austria
- each region has exactly one polygon (see step 3): if a region / country in your polygon data consists of more than one polygon, you have to delete all except for the largest one, e.g. delete all islands of Italy, Spain, United Kingdom, etc. and keep only one polygon per country (the main land)

**Step 6: Adjust the settings**

Go to the worksheet [dashboard] and adjust the settings:

There are only two things you may or may not want to change:

- The number of iterations of the algorithm (cell C4)

The originators of the algorithm (J. Dougenik, N. Chrisman and D. Niemeyer, 1985) recommend a minimum of 8, but I would even set this to 12 or 15.

The more iterations, the better the results and the longer the runtime and vice versa. - The option to unpivot the data by clicking the checkbox

Unpivoting means the output of the data will be a normalized table, i.e. the measure columns will be transferred to two columns: the first including the measure name, the second including the value. This is especially helpful if the measures in your cross tab data are e.g. years (like in the US Presidential Election example of the previous post).

**Step 7: Run the Algorithm**

Go to the tab CARTOGRAM DATA GENERATOR and click on the icon “Create Cartogram Data”:

The algorithm runs and shows where it is in Excel’s status bar at bottom left:

Please be patient, this may take a few minutes. The runtime of the algorithm depends on

- the number of regions
- the number of measures (columns of the table “tab_data”)
- the number of polygon points (rows of the table “tab_polygons”)
- the number of iterations you defined in step 4

Agreed, it isn’t really fast, but it is not too bad. For the EU example used here (28 states, 12 measures, 23,637 polygon points, 15 iterations), it took ~ 3 minutes on my machine (without unpivoting). A welcome opportunity to grab a coffee, but not enough time to walk the dog. I am sure you will find something useful to do while the algorithm is running.

**Step 8: Check the Results**

On the worksheet [dashboard] you can see the main results of the algorithm:

The key performance indicator of the algorithm is the size error, i.e. the difference between the desired area of the region (according to the measure value) and the actual size of the polygon of this region in the Cartogram.

The dashboard shows various statistics of the size error: minimum, maximum, average, weighted average, standard deviation, 25%, 50% and 75% percentiles and the count of areas with a size error larger than the standard deviation. At the bottom of the dashboard you see the size errors by region.

The dashboard is using a fixed cell range and displays only the first 10 measures and the first 100 regions. As mentioned above, you can use more regions and measures in the tool, but the dashboard will only show the first 10 respectively 100. These limits may be increased by expanding the cell ranges on the dashboard and copying the formulas, but you would also have to change some constants in the VBA code. In any case, 10 measures and 100 regions already give a very good impression of the results.

With this dashboard, you can evaluate the quality of the algorithm’s outcome, i.e. how good the Cartogram polygons represent the values in the data. If the size errors are too high, you need to increase the number of iterations and try again.

**Step 9: Check the Output**

After the algorithm is finished, you will find a new, time-stamped Excel workbook in the same folder, called e.g. “cartogram_data_2015-06-20 12-00.xlsx”. This workbook contains two sheets: the data (as a cross tab or a normalized table, depending on the unpivot checkbox) and the Cartogram polygon data. You can directly connect to this data set with Tableau or import it into your database.

If you selected the unpivot option, you may want to change the generic column headers “Measure Name” to e.g. “Year” and “Measure Value” to “Popular votes”. You can do this directly in Excel or change the names later in Tableau.

**Step 10: Create the Cartograms in Tableau**

Let’s say we are connecting Tableau directly to Excel.

Case 1: Without unpivoting

Open Tableau, connect to the Excel workbook created in step 5 and drag both sheets to the top area of the data source page:

Tableau automatically sets the join by ID, so no need to change anything here.

Go to the worksheet and you can plot your Cartogram in Tableau using the polygon map approach and filtering by one of the measure names:

Again open Tableau, connect to the Excel workbook and drag both sheets to the top area of the data source page:

Make sure the join is by ID and Measure Name (or whatever you called the field in the Excel workbook).

A few more mouse clicks for the polygon map approach and you will see this:

You may have seen the Cartograms with Tableau in action already in the previous post, but I wanted to show another example: some selected statistics of the European Union displayed on a Filled Map and on a Cartogram. The dashboard lets you select the measure to be used to resize the regions and the measure to color the map:

As mentioned in the previous post, the Cartogram Data Generator uses “An algorithm to construct continuous area cartograms” by J. Dougenik, N. Chrisman and D. Niemeyer, published in “Professional Geographer” back in 1985. This paper is 30 years old and in the meantime there are more sophisticated and probably faster options available, but it still does the job.

My contribution to this is minimal: I simply implemented Dougenik et al’s algorithm as it is in Excel and VBA. Although it definitely is one of the more complex VBA projects published here on Clearly and Simply, I won’t go into the details. The VBA project is open, the size is manageable (17 procedures with ~ 500 lines of code) and I tried to clearly comment every procedure, variable and section of the code. If you are interested in the implementation, please have a look for yourself. If you have any recommendations how to improve the code, please leave me a comment or drop me an email. Any input is much appreciated.

That’s it with Cartograms in Tableau for the time being.

Having said that, with this VBA implementation of a Cartogram algorithm and some of the techniques posted here previously (Faster Choropleth Maps with Microsoft Excel and Create Excel Freeform Shapes from Polygons), we have all components available to dynamically plot Cartograms in a stand-alone Excel workbook. We just have to combine them in one workbook. The next post will explain how to create Cartograms directly in Excel and provide an example workbook for free download.

Stay tuned.

I just noticed that there were a few smaller bugs in the Cartogram Data Generator originally posted for download. The results of the algorithm (the Cartogram polygon data) were correct, but there have been some issues with the statistics displayed on the dashboard. I fixed the bugs and updated the post now.

To the few readers (less than 30) who downloaded the tool already: I apologize for any possible confusion and inconvenience. Please download the updated tool again.

]]>The previous post explained why color coding public holidays can be very effective for analyzing daily data, provided the Calculated Fields for all public holidays in the United States and included an interactive calendar for the US on Tableau Public.

The Calculated Fields for the US public holidays are pretty straight forward and not too complicated.

This can be different in other countries. And it sure is in my homeland. In Germany, all variable public holidays depend on Easter Sunday. And, if you read on, you will see that Easter Sunday requires a really complex calculation.

Today’s post provides the Calculated Fields for all German public holidays and a German version of the interactive calendar on Tableau Public.

Like the United States, Germany has fixed and variable holidays, too. However, the basic rule of a variable holiday in the US (“nth weekday of a defined month”) does not exist in my country. In Germany, all variable holidays are based on the date of Easter Sunday, i.e. a defined number of days before or after that date. And Easter Sunday requires a complex formula.

**Fixed Public Holidays**

Let’s start with the easy part. Here are the Calculated Fields for fixed holidays in Germany:

Neujahr (New Year’s Day): January, 1

DATETIME(STR(YEAR([Date]))+"-1-1")

Heilige Drei Könige (Epiphany): January, 6

DATETIME(STR(YEAR([Date]))+"-1-6")

Tag der Arbeit (Labor Day): May, 1

DATETIME(STR(YEAR([Date]))+"-5-1")

Maria Himmelfahrt (Assumption of Mary): August, 15

DATETIME(STR(YEAR([Date]))+"-8-15")

Tag der Deutschen Einheit (Day of German Unity): October, 3

DATETIME(STR(YEAR([Date]))+"-10-3")

Allerheiligen (All Hallows’ Day): November, 1

DATETIME(STR(YEAR([Date]))+"-11-1")

1. Weihnachtsfeiertag (Christmas Day): December, 25

DATETIME(STR(YEAR([Date]))+"-12-25")

2. Weihnachtsfeiertag (Boxing or St. Stephan’s Day): December, 26

DATETIME(STR(YEAR([Date]))+"-12-26")

As already explained in the previous post, the calculations for fixed holidays is pretty straight forward: YEAR([Date]) returns the year of [Date]. STR converts the year to a string. The fixed month and day are then concatenated at the end of this string with a hyphen out in front. Finally DATETIME converts the string back to a date.

**Variable Public Holidays Part 1: Easter Sunday **

On the Gregorian calendar, Easter Sunday is the first Sunday after the first full moon on or after March 21.

I can’t tell you no lies: I had no clue how to put this rule into a formula. Fortunately I am having an Excel template in my toolbox with all kind of date calculations and this workbook also provides a formula for Easter Sunday. I would love to give credit to the genius who created it, but I simply can’t remember where I got it from.

Now that I had the solution in Excel, I simply dissected it and transferred the logic from an Excel cell to a Tableau Calculated Field.

Since the formula is very complex and one part is repeated 4 times inside the formula, I decided to split it into 2 Calculated Fields

[ES Intermediate] is the calculation repeated 4 times:

((255-11*(INT(YEAR([Date])) % 19) - 21) % 30) + 21

Based on this intermediate step, the formula for Easter Sunday (called [Ostersonntag] in my workbook) looks like this:

DATEADD ('day',

[ES Intermediate] - INT([ES Intermediate] > 48) + 6 - (INT(DATEPART('year', [Date]) + DATEPART('year', [Date]) / 4 +

[ES Intermediate] - INT([ES Intermediate] > 48) + 1) % 7), DATETIME(STR(YEAR([Date]))+"-3-1"))

I can see that it is working with the INT function and several modulo operations. But truth be told, I do not have the slightest idea how this is working and I can’t provide an explanation here. I am sorry.

**Variable Public Holidays Part 2: All other variable holidays**

As soon as you have the Calculated Field for Easter Sunday, all other variable holidays in Tableau are a walk in the park:

Karfreitag (Good Friday): 2 days before Easter Sunday

DATEADD('day', -2, [Ostersonntag])

Ostermontag (Easter Monday): one day after Easter Sunday

DATEADD('day', 1, [Ostersonntag])

Christi Himmelfahrt (Ascension Day): 39 days after Easter Sunday

DATEADD('day', 39, [Ostersonntag])

Pfingstsonntag (Pentecost / Whit Sunday): 49 days after Easter Sunday

DATEADD('day', 49, [Ostersonntag])

Pfingstmontag (Pentecost / Whit Monday): 50 days after Easter Sunday

DATEADD('day', 50, [Ostersonntag])

Fronleichnam (Corpus Christi): 60 days after Easter Sunday

DATEADD('day', 60, [Ostersonntag])

Not much to explain here, I guess. DATEADD('day', n, [Ostersonntag]) adds n days to the Calculated Field [Ostersonntag] (Easter Sunday).

Again, the workbook contains 3 additional Calculated Fields. Please refer to the previous post for the details.

Stay tuned.

I wrote above that I couldn’t remember where I found the Excel version of the Easter Sunday formula. Martin Nolan recognized it (see first comment below): Chip Pearson shares different techniques for calculating Easter Sunday, including the one I used (the shorter function): Easter Calculation - Chip Pearson

Many thanks go to Chip Pearson for providing the formulas (and all the other fabulous Excel solutions he shares for free) and to Martin Nolan for pointing this out. Thanks, gentlemen.

]]>For the analysis of daily data, the weekday can be an important and helpful additional information.

For instance: if you are analyzing sales or order figures, incoming calls in a call center, internet usage, web statistics or the like, you will probably see significant differences in the data between working days, weekends and public holidays.

Your views and dashboards should enable the reader to immediately distinguish between the three types, e.g. by color coding. This is pretty easy to do for the weekends, but it takes some additional efforts to include the public holidays, too.

Today’s post explains why color coding daily data by the type of the day can be important for analysis and visualization. The article provides and describes Calculated Fields in Tableau for all US public holidays and put them to use in a Tableau Public dashboard showing an interactive calendar with all public holidays in the US between 1900 and 2099.

Why should you highlight public holidays in your views and dashboards?

Let’s have a look at an example. Here is an excerpt of some web analytics data of this blog for one month:

You can easily see significant differences. The pattern (2 small values, followed by 5 larger numbers, followed by 2 small numbers again) suggests that this is a workdays/ weekend slope. But you do not know for sure and you do not see it at a glance.

Let’s try to improve the view the easy way. We add the weekdays of the dates to the view:

Better than before and it confirms our workdays / weekend assumption. However, your eyes have to jump back and forth between the chart area, the labels and the axis to identify the weekends.

So, let’s make this easier for the reader by color coding the weekends:

Now that’s much better and we can see the weekends at a glance.

However, have a look at the last four columns. Only 29th and 30th are a weekend. Why are the numbers so small on a Thursday and Friday? You guessed it: public holidays. Let’s add them to the color coding of the view:

Now you see it: we are looking at November and the fourth Thursday in November (the 27th in this case) is Thanksgiving. I do not have an explanation for the small number on the day after Thanksgiving, but my wild guess would be that many people took the day off for a long weekend.

There is another interesting fact visible at a glance with this color coding: the value on the other public holiday in November (11th: Veterans Day) is comparable to a working day. This means that the numbers are significantly smaller on weekends and public holidays, but not on all public holidays.

In a nutshell: color coding weekends and public holidays can provide a lot of additional interesting information at a glance.

I see two options to realize public holidays in Tableau: Data Blending and Calculated Fields.

This is certainly the easier way. First, you create a list of all holidays (e.g. in Microsoft Excel), maybe looking like this:

To save you some time, here is such a list for free download covering the holidays from 1900 to 2099:

Download US Holidays (zipped Microsoft Excel 2007-2013 workbook, 785.1K)

Next, you use Tableau’s data blending feature to add this list as a new data source of your workbook and define the relationship (if Tableau didn’t automatically):

That’s it. A little data preprocessing to get the list of public holidays, a few mouse clicks in Tableau and you are good to go.

The other, more complicated option is to create a Calculated Field for each holiday.

Public holidays can be divided into two different categories: fixed and variable.

Fixed holidays always fall on the same day of the same month, like New Year’s Day or Independence Day. In the US, the variable holidays are all defined by a “nth weekday of a defined month” rule. e.g. Martin Luther King Jr. Day is the third Monday of January and Memorial Day is the last Monday of May.

**Fixed Public Holidays**

Let’s start with the easier ones: the fixed holidays.

Here are the Calculated Fields in Tableau for the fixed public holidays:

New Years Day (January, 1):

DATETIME(STR(YEAR([Date]))+ "-1-1")

Independence Day (July, 4):

DATETIME(STR(YEAR([Date]))+"-7-4")

Veterans Day (November, 11):

DATETIME(STR(YEAR([Date]))+"-11-11")

Christmas Day (December, 25):

DATETIME(STR(YEAR([Date]))+"-12-25")

How does this work?

YEAR([Date]) returns the year of [Date]. STR converts the year to a string. The fixed month and day are then concatenated at the end of this string with a hyphen out in front. Finally DATETIME converts the string back to a date.

**Variable Public Holidays**

To implement the variable holidays, you first have to understand how to work with weekdays in Tableau.

The function DATEPART (date_part, date, [start_of_week]) returns a part of the given date as an integer. DATEPART('weekday', [Date]) returns a number indicating the day of the week: 1 for Sunday, 2 for Monday, …, 7 for Saturday.

To implement the “nth weekday of a defined month” rules, the Calculated Fields start at a certain day in the month (e.g. two weeks after the first day of the month) and then add days to it based on the weekday of this starting date and the defined weekday of the holiday.

Too academic? Ok, let’s have a closer look at the first variable public holiday in the US:

**Martin Luther King Jr. Day: 3rd Monday of January**

DATETIME(STR(YEAR([Date]))+"-1-15") +

IF DATEPART('weekday', DATE(STR(YEAR([Date]))+"-1-15")) > 2 THEN

9 - DATEPART('weekday', DATE(STR(YEAR([Date]))+"-1-15"))

ELSE

2 - DATEPART('weekday', DATE(STR(YEAR([Date]))+"-1-15"))

END

Starting point of the calculation is the 15th of January, i.e. two weeks after New Year’s Day (first line). The following IF clause checks if the weekday of January 15 is Tuesday to Saturday (i.e. weekday > 2). If so, it adds 9 and subtracts the weekday of January 15. Else (January 15 is a Sunday or Monday), it adds 2 and subtracts the weekday.

Let’s look at the results of the inner calculations for two examples:

Year 2015:

January 15, 2015 was a Thursday, i.e. its weekday is 5. The formula calculates as follows: it starts at the 15th of January. Since the IF condition is true (5>2), it adds 4 days (=9-5) to this date. The result: January 19th 2015, the third Monday of January 2015.

Year 2012:

January 15, 2012 was a Sunday, i.e. its weekday is 1. Again, the formula starts at January 15. The condition is false (1<2) and in the ELSE part of the IF clause, 1 (=2-1) day is added to the start date. The result: January 16th 2012, the third Monday of January 2012.

This is the basic logic in all Calculated Fields for variable public holidays in the US. Here are the other formulas:

**Presidents' Day: 3rd Monday of February**

DATETIME(STR(YEAR([Date]))+"-2-15") +

IF DATEPART('weekday', DATE(STR(YEAR([Date]))+"-2-15")) > 2 THEN

9 - DATEPART('weekday', DATE(STR(YEAR([Date]))+"-2-15"))

ELSE

2 - DATEPART('weekday', DATE(STR(YEAR([Date]))+"-2-15"))

END

Almost the same formula as for Martin Luther King Jr. Day, just replacing January by February.

**Memorial Day: Last Monday of May**

DATETIME(STR(YEAR([Date]))+"-6-1") +

IF DATEPART('weekday', DATE(STR(YEAR([Date]))+"-6-1")) > 2 THEN

9 - DATEPART('weekday', DATE(STR(YEAR([Date]))+"-6-1"))

ELSE

2 - DATEPART('weekday', DATE(STR(YEAR([Date]))+"-6-1"))

END

- 7

This Calculated Field starts at the first day of the next month (June) and finally subtracts one week (7 days) after the IF clause.

**Labor Day: 1st Monday of September**

DATETIME(STR(YEAR([Date]))+"-9-1")+

IF DATEPART('weekday', DATE(STR(YEAR([Date]))+"-9-1")) > 2 THEN

9 - DATEPART('weekday', DATE(STR(YEAR([Date]))+"-9-1"))

ELSE

2 - DATEPART('weekday', DATE(STR(YEAR([Date]))+"-9-1"))

END

**Columbus Day: 2nd Monday of October**

DATETIME(STR(YEAR([Date]))+"-10-8") +

IF DATEPART('weekday', DATE(STR(YEAR([Date]))+"-10-8")) > 2 THEN

9 - DATEPART('weekday', DATE(STR(YEAR([Date]))+"-10-8"))

ELSE

2 - DATEPART('weekday', DATE(STR(YEAR([Date]))+"-10-8"))

END

**Thanksgiving: 4th Thursday of November**

DATETIME(STR(YEAR([Date]))+"-11-22") +

IF DATEPART('weekday', DATE(STR(YEAR([Date]))+"-11-22")) > 5 THEN

12 - DATEPART('weekday', DATE(STR(YEAR([Date]))+"-11-22"))

ELSE

5 - DATEPART('weekday', DATE(STR(YEAR([Date]))+"-11-22"))

END

**Enhancement: Shift Public Holidays to the closest workday**

I learned from Wikipedia that public holidays falling on a Saturday or Sunday are observed by federal employees on the closest working day. Since all variable holidays cannot fall on a weekend by definition, this affects only the fixed holidays. I am not sure if this is really necessary, but you can easily add a “shift to closest workday” option to your calculations:

Let’s say [Shift to closest workday] is a Boolean parameter to define whether or not public holidays on weekdays shall be shifted to the closest workday. The formula for Independence Day would then look like this:

DATETIME(STR(YEAR([Date]))+"-7-4")+

IF [Shift to closest workday] THEN

IF DATEPART('weekday', DATETIME(STR(YEAR([Date]))+"-7-4")) = 7 THEN

-1

ELSEIF DATEPART('weekday', DATETIME(STR(YEAR([Date]))+"-7-4")) = 1 THEN

1

ELSE

0

END

ELSE

0

END

All other fixed holidays would be calculated with a similar formula.

Here is an example putting all this to use: a US calendar including the public holidays for the years 1900 to 2099 on Tableau Public:

The workbook contains 3 additional Calculated Fields:

[Date Description] assigns the name of the holiday (and “Saturday”, “Sunday” or “Work Day” if the day is no holiday) to the dates using a CASE statement and an IF clause. The field is used in the tooltips.

[Date Type] assigns “Work Day”, “Weekend” or Holiday” to the dates, also using a CASE statement. [Date Type] is used to color code the days in the calendar.

[Holiday Definition] assigns the definition text to the holidays. The field is used in the list of holidays at the bottom of the dashboard and in the tooltips.

The rules of US public holidays are rather simple compared to other countries. In Germany, for instance, many public holidays depend on the date of Easter Sunday. The challenge: calculating Easter Sunday requires a pretty complex Calculated Field. But it is possible. One of the next articles will provide the same workbook as shown above, but this time for a German calendar.

Stay tuned.

]]>The previous post demonstrated an alternative technique to implement a direct interaction with an Excel chart using an ActiveX label control sitting on top of the chart. The example used in that post (a geography quiz) allowed a click on a map and evaluated the coordinates of the position the user clicked on.

The technique, however, can also be used in many more cases. If you are working with an XY scatter chart and especially if you are working with a map, interactive selecting and highlighting a user-defined area of the chart can be a very helpful feature for exploring and analyzing the data.

Tableau Software, for one, allows for different ways of selecting and highlighting areas. Microsoft Excel has no such features built in. With the technique described in the previous post and some additional VBA code, however, you can bluff almost the same functionality in Excel.

Today’s article describes how to implement selecting and highlighting features on Excel charts. As always the example workbook and the VBA code is provided for free download.

First let’s have a look at Tableau Software as the shining model for great interactive features: Tableau always allowed to select and highlight rectangular areas on a map (and on other views, too). Since version 9, you now have the choice between a rectangular, a radial or a so called “lasso” selection using the fly out menu at top left of the map:

The rectangular selection is the standard feature and – as already mentioned – available in Tableau for a long time already. The new radial selection is great to select marks in a certain distance around a defined center:

The “lasso” makes the selection even more flexible and is a great option if you want to select an area which you can’t define with a rectangle or circle.

In a nutshell: Tableau provides great interactive selection features which would be very helpful on Excel dashboards, too.

The previous post (Another Technique for Interactive Excel Charts) describes how to implement an interactive map in Excel, enabling the user to click on any given location of the chart and detecting the coordinates of this position. The article on hand shows how to realize Tableau’s rectangular and radial selection features in Microsoft Excel building upon this implementation and technique.

I will skip the “lasso” option. Not because it is not interesting, rather because this is way more complicated to implement in Excel than the other two.

First, let’s have a look at the result and the features. Here is the dashboard:

You see the map of Europe and its cities with more than 100,000 inhabitants at the left side of the dashboard. Now, here is how the selection works:

- With the option buttons top right to the map, you select whether you want to use the radial or the rectangular selection.
- Hover over the map with the mouse. The cursor changes to a hair cross. Keep the SHIFT key pressed, click wherever you want and move the mouse to span the selection area.
- If you chose the radial selection, click on the center of the area you want to select and move the mouse in any direction. You can expand the size of the circle only as far as the resulting circle still entirely fits on the map.
- If the rectangular selection is active, you can only span the area by moving the mouse right and down (i.e. to the East and South), so you have to start (click) at the top left corner of the area you want to select.
- During the selection process, the entire dashboard will dynamically be updated:
- The selected cities (inside the rectangle or circle) will be highlighted with a red fill color on the map
- The additional information above the map will be updated on the fly (count of selected and not selected cities and the total and average population of the selection)
- The bar charts right to the map (Top 15 countries by count of selected cities and sum of the population) will also be dynamically updated

- If you are ready, simply release the mouse button and SHIFT key.
- To deselect again, simply click somewhere on the map without keeping the SHIFT key pressed.

Truth be told, there is one downside: if you click first on the map, keep the mouse button down and then press the SHIFT key, the technique doesn’t work as expected. If this happens, simply release the SHIFT key, click somewhere and try again.

As mentioned above, the workbook is based on the example provided in the previous post, which explains the basic concept in detail.

Thus, I will restrict my explanations here to the necessary adjustments for the implementation of the rectangular and radial selection.

The basic idea is adding 3 more freeform shapes to the worksheet formatted with a light and half transparent fill color: a rectangle, a circle and a cross (indicating the center of the circle). The VBA code will make the shapes visible or invisible according to the user’s selection (the option buttons) and position and resize the shapes according to the mouse clicks and moves on the chart (more precisely on the ActiveX label control).

It is important that the label control is still sitting on top of the chart and the freeform shapes, so you may have to change the order of the shapes in the selection pane:

In the example shown in the screenshot above, the radial selection is active, i.e. the cross and the circle are visible, the rectangle is invisible.

**The structure of the workbook and the worksheets**

There are three worksheets in the workbook:

The worksheet [map] contains the map of Europe, which actually consists of several objects: the freeform shapes of Europe’s countries, the XY scatter chart displaying the cities, the rectangle, cross and circle shape and finally the label control. Furthermore there are the radio buttons (as described above), a few result cells directly linking to worksheet [control] and 2 bar charts.

The XY scatter chart displays 2 data series: the coordinates of all cities (defined on [data]) formatted in grey and the coordinates of the selected cities formatted in red. The data source of the selected cities is calculated on sheet [control] (see below). The data shown in the bar charts is also defined on worksheet [control].

The worksheet [data] contains the city data only, i.e. no calculations. Nothing to explain here, I think.

The [control] worksheet is divided into several sections:

- The first section (“Control parameters”) serves as the container for some parameter cells updated by the option buttons and the VBA code .
- The next section (“Geometrical calculations”) calculates the distance between the starting position (where the user clicked) and the current position of the mouse.
- In section “Displayed measures” the aggregated results are calculated: count of selected and not selected cities as well as sum and average of the population of the selected cities.
- The most important formula of the solution is in column C of the section “Selected cities”. It is a Boolean formula returning true, if the city is inside the selected area and false otherwise. The three columns right to it are using this result in a simple IF formula to define whether the city shall be highlighted in red or not and whether the population shall be counted or not.
- Finally, the last section on this worksheet (“Data preparation for sorted bar charts”) is the usual technique for sorting data by formulas and produces the list and data of the countries sorted in descending order. These results are then used as the data source of the bar charts.

**The VBA code**

You can find the entire VBA code in the worksheet object [map] of the workbook. It is only 112 lines of code and every sub, variable and code section is commented. I will not repeat this here. If you are interested in how the code works, please have a look for yourself. If you have any questions, please leave me a comment.

Here is the Microsoft Excel workbook for free download:

Download Select and Highlight Area on a Chart (Excel 2010-2013 workbook, 694KB)

I have done this in the previous post already, but I do want to thank Andy Pope again for sharing the stellar idea of using an ActiveX label control to implement interactive features for Excel charts. Thanks, Andy!

That’s it for interactive selecting and highlighting in Excel. At least for the moment. Maybe I will come back to this topic some time later.

The upcoming articles will be about Cartograms in Tableau and Excel. Furthermore I am having a few posts about Tableau Calculated Fields in the pipeline.

Stay tuned.

]]>