Off late, I have been doing a lot of data analysis and visualization on performance ratings, salary hike, gender pay equality etc. Today let me share you an awesome way to visualize massive amounts of data.

**Scenario: **Your organization of 3,686 people recently went thru annual performance ratings & review process. At the end of it, everyone was offered some salary increase (from $0 to $24,000 per year). You have 7 business groups. How do you tell the story of all these salary hikes in one chart?

**How about this one?**

Ready to know how to create this in Excel? Read on.

That is right, what you are seeing above is good old scatter plot with a bit of jitter (random noise added to X values). This way, when too many dots are at a single point, we spread them apart to show more.

Here is a sample of 3,500+ employee’s ratings and salary hikes (randomly made up), with the usual columns:

**Convert rating and group names to numbers:**

Since we can’t use rating and group names in XY plot (we need numbers, not text), let’s convert these in to numbers using simple MATCH() formula.

We get two new columns, like below:

**Creating X & Y values from data:**

Next up, we need to generate the X & Y values for our plot.

**Y value:** This is easy. It is the amount of salary increase with two twists:

- If employee got $0 hike, we want to omit them in the plot. This will remove many of dots from the plot (less clutter)
- If an employee is unrated (even if they got a hike), we want to omit them too. This is because our plot has only 4 rating levels per group. There are very few unrated people and they are not the focus of this chart.

*We can create Y value using a simple IF formula like below:*

- =IF(OR([@[Salary Increase $]]=0,[@[Rating 17 (number)]]=5),NA(),[@[Salary Increase $]])

**X value:** This is the tricky bit. Since there are 7 groups, each with 4 ratings (excluding the unrated), we have 28 possible X values. We want to space these out so dots for one group + rating combination don’t encroach other combination.

Let’s say we give 10 units of space per group.

That means, we have 2.5 units of space per rating in that group (and total of 70 units of space).

Now, the dot needs to plotted at the center of this 2.5 unit of space (ie at 1.25)

** The basic formula would be:** =[@[Business Group]]*10+([@[Rating 17 (number)]]-1)*2.5+1.25

Aah, right. We need to add random noise to X value. Since each rating has 2.5 of space, how about noise between -0.7 to 0.7 ? This still leaves plenty of space on both ends thus keeping the plot clear.

We can use below formula to generate the noise.

=RANDBETWEEN(-700,700)/1000

**The final formula for X value goes like this:**

=[@[Business Group]]*10+([@[Rating 17 (number)]]-1)*2.5+1.25+[@Noise]

*Here is how our X,Y looks at this stage:*

Data prep done, let’s move to the plot.

- Select both X & Y values and insert XY plot. We get this.
**Set X axis limits and remove title:**As all our dots are between 10 to 80, let’s set them as limits for X axis. Also, let’s remove the chart title.**Add vertical gridlines:**Although our dot towers are separated from each other, adding grid lines makes it easy to read the chart.**Format the markers:**Set fill to solid color and 25% transparency. This makes the dots look nice and shows the density when there are too many people at some co-ords.**Set Y axis limit:**So that we can focus on people getting salary increase of up to $10,000. This zooms the chart to meaty part while showing plenty of outliers. We get this:**Last step:**Remove plot and chart borders, so we can add extra info, labels etc.

Ok, now our chart is almost ready. Next step, making it a story.

**Create a wireframe in 10 column area, as shown below:**

Next place the chart inside the red box. Adjust plot area size so it fits in to 7 columns. Hold ALT key when adjusting so chart’s plot area would fit in to 7 columns. You need to repeat this step every time you fiddle with the chart. So do it at last.

**Add extra story points:**

- A clear and descriptive title
- A sub-title explaining what is going on and how to read the chart.
- Group names and rating names. You can use the below trick to align the rating labels inside cell nicely.
- Show some more stats like median hike, median new pay (if you have it), head counts and unrated counts.
- Add any footers, disclaimers (about excluded people in the plot etc.)
- Add a border around this entire wire frame so it all looks like one piece.
- Shade alternative columns in some dull color. This improves the readability. As our chart is transparent, cell fill colors will show up nicely.

**We are done.**

That is right. You can create a similar plot quicker and better using R. ggplot, an R library has built-in support for jittering dots on XY plots. So using that, you can create below chart with just 7 lines of code. This is what you get (yes, you can show each rating dots in different color, and yes, you can order the groups by number of people in them).

Here is the R script if you want to experiment.

**Click here to download the workbook** containing this chart, tutorial and raw data. Try re-creating it in Excel (or your favorite visualization tool) to learn more.

I had lots of fun making and tweaking this chart. It shows some interesting patterns about how salary hikes are distributed across groups and where everyone is.

**How do you like this? **Do you plan to add some jitter to your busy scatter plots? Please share your thoughts in comments section. And if you want some inspiration, check out more such charts.

If you love story telling and beautiful visualizations but not sure how to get there, consider enrolling in our Excel School or 50 ways to Analyze Data programs. In these powerful courses, I teach you all about awesome data analysis and visualization techniques.

]]>**What you need:**

- Power BI Desktop and R
- Raw data set – rem-data.csv

- Load CSV data in to Power BI
- Edit the query in so we can transform the data in to Pivot shape in Power Query
- Apply below steps in Power Query
- Group data by Group and Branch with aggregations on count (named Branch Count) and All rows (named Ratings)
- We get a totals by group and branch level.
- Expand Ratings table and show only Ratings column
- This creates a table with all group, branch and rating combinations along with branch total
this time on Group, Branch, Branch total and Ratings with aggregation on count.**Group again,**- Calculate count as
Branch count**percent of** - Format the percentage as percent
- Close and apply to Load this data in to Power BI

- Insert R script visualization
- Add Group, Branch, Ratings and Pct to values area.
*This creates a dataframe with all 4 columns*. - Add below R script and your visualization is ready.

```
library(tidyverse)
ggplot(data=dataset) +
geom_bar(aes(x=Rating, y=Pct), stat="identity")+
scale_x_discrete(limits = c("NME","AME","SP","OP","NR"))+
facet_wrap(~Branch, nrow=1)+
theme(strip.text.x = element_text(size = 8))
```

* To run the R script, *simply press play button in R script editor pane.

This creates a truly powerful interactive panel chart in Power BI. Simply add Group as a slicer and play with it. Every time you select a new Group, Power BI runs the R script with filtered data fed to the dataframe. There is a second or two lag, but the wait is totally worth it.

Here is a video outlining the entire process along with some tips on how to use R in Power BI. Check it out below or on Chandoo.org YouTube channel.

**Click here to download Power BI workbook for this**. You may need to adjust the data source settings. Play with the slicer to refresh the R panel charts.

I am playing with Power BI for last year or so and I am in love. You are going to hear more about it on Chandoo.org for sure.

* What about you? *Have you played with Power BI yet? What are your thoughts?

*Related: Introduction to Power Query.*

]]>

Around July, most companies in NZ have their employee performance reviews (or rem reviews). In August, all this review data is, well, *reviewed* so managers can decide whether any moderation (curve fitting) is needed and how much hike to offer. To help with this process, you need to first understand what proportion of staff are in each rating category. If you are head of a large company, then naturally, you want to know how this spread looks like at overall level as well as individual business group, branch levels, *ie *at various hierarchies.

One quick and easy way to do this is by making lots of charts.

**Enter Panel Charts. **

Let’s say you have 80 branches, spread across 10 business groups. Each branch has different number of people. To understand if the ratings are normally spread in *a* branch, you can make a column chart showing number of people in each rating category. Then to see how this spread looks across the company, simply create 80 charts, one per branch.

Related: Introduction to Excel Panel Charts.

**80 Charts – that is not fun!!!**

Tell me about it. You might need a strong cup of espresso and right mood for lots of copy pasting.

Or you could use some VBA to do the dirty work for you.

In either case, you are going to waste (and hate) that afternoon.

If you are not in the mood for manic copy pasting, then you could use sparklines to cheat. Once you make 80 sparklines, you can just move them around so they look like an 8×10 grid.

It is still a lot of work, but less annoying than creating 80 charts.

If you are familiar with programming (VBA/C or something else) you can quickly learn the basics of R and use simple code to convert raw data to elegant panel chart in *almost *no time.

Watch below video. I explain both Excel sparkline and R approaches. As the process is elaborate (especially the Excel one), I am using video format. See it below or on Chandoo.org channel.

Use below links to download files:

- R code panel-gen.r
- Excel workbook with sparkline panel charts – panel-charts
- Raw data in CSV format – rem-data

R is fun, interesting and challenging. If you have some free time, pick it up. It is free and offers heaps of possibilities for statistical analysis, data science, modeling, visualization, data cleanup and exploration. If you are looking for a book, I suggest R for Data Science by Hadley Wickham and Garret Grolemund.

If you are doing some R programming, please share your experiences and thoughts in comments.

]]>- Just open the workbook
- Go to File > Info
- Right on the top, you can see all protected worksheets and a link to unprotect them.
- Click to unprotect the ones you want to.
- Done.

- Just open the workbook
- Go to File > Info
- Right on the top, you can see all protected worksheets and a link to unprotect them.
- Click to unprotect the ones you want to.
- Done.

**Here is a quick demo of the process.**

Happy world domination.

**Few more quick tips:**

Be warned though. You will question your loyalties after finding the awesome powers of Excel.

- Hide columns same way as they were in another workbook
- Make copies of worksheets quickly with CTRL key
- On/off conditional formatting with a simple trick
- Autosum many ranges quickly
- More Quick Tips & Productivity hacks

]]>

- Formulas - not easy given the unstructured nature of data. But
*almost*possible. See this for an example. - VBA - possible, read this forum discussion few ways to do it.
- Power Query - at first glance it might seem tricky, but PQ makes this all too easy. Read on.

Note: Thanks to Monty for posting this problem on our forum.

**How to go about it?**

We could use a variety of techniques to extract the values.

- Formulas – not easy given the unstructured nature of data. But
*almost*possible. See this for an example. - VBA – possible, read this forum discussion few ways to do it.
- Power Query – at first glance it might seem tricky, but PQ makes this all too easy. Read on.

- Specify a list of currency codes. Create a table in Excel and mention the codes. Something like below. Let’s name this table as
*codes*

- Load text data in to Power Query (Use Power Query > from Table or Data > from Table). We get this:

- Now, let’s bring in currency codes as a cross join. To do this, just insert a new custom column. In the formula section, type

= Excel.CurrentWorkbook(){[Name=”“]}[Content]**codes** - This will bring a new column with all currency codes as tables. Expand the column to cross join both tables. See below demo.

- Now, add a conditional column to check which currency code is present in the text data. You can use below settings:
- At this stage, our PQ data looks like this:
- Now, let’s filter away any nulls in the Found? column.
**Splitting each row by the currency code in next column:**this is the tricky part. We can use Text.Split() to split a text value by delimiter. But the result will be a list. We just want one of the items of that list. Simple, we can pass the result of Text.Split() to List.Last() to get that. Use below formula:=List.Last(Text.Split([String],[Custom.Codes]))- We get this:
- Now, convert the Amount column to decimal number. This will throw errors for incorrect values like .530.680268. Simply remove all these errors.
- Tidy up by removing unnecessary columns and renaming the rest. Load in to Excel. Here is a snapshot of cleaned data.

**Click here to download sample workbook**. Try cleaning the data in first tab yourself using PQ. You will realize how awesome and simple this approach is compared to either formula or VBA driven methods.

**How are you using Power Query?**

These days, PQ (or get data & transform as it is known in newer versions of Excel) has become my go to tool for most data polishing, cleanup and reshaping problems. What about you? Are you addicted to PQ yet? *Please share your experiences and wins in the comments section*.

**More ways to extract, clean and massage data with Power Query:**

If this looks interesting, check out below tuts to learn more about PQ.

- Introduction to Power Query – Podcast
- SUMPRODUCT Vs. Power Query
- Figuring out Employee Churn with Power Query [HR Analytics]
- Unpivot data quickly with Power Query [tutorial]

]]>

Peak time for sports and leisure #dataviz. About time for a joyplot; might do a write-up on them. #rstats code at https://t.co/Q2AgW068Wa pic.twitter.com/SVT6pkB2hB

— Henrik Lindberg (@hnrklndbrg) July 8, 2017

First let me share the final outcome.

Here is the final overlapped area chart with a bit of formatting thrown in. It is a pretty close imitation of Henrik’s original chart. Click on it to enlarge.

As you can guess, the chart is a ** just an overlapped area chart** (ie each area sits behind another, unlike

Let’s start with a look at data. Henrik’s original data has 10,656 rows, each row containing activity name, time and p value – how much survey respondents enjoyed [@activity] at that time.

Here is a snapshot of first few rows.

As you can see, while this format is excellent for storing, it is very tedious if we want to make one chart with all series. So let’s scrub.

- We need to figure out if an activity should be included or not. I am using the same criteria as Henrik’s. Exclude activities with p value less than 0.003 or activity title “Playing sports n.e.c. *” (not elsewhere classified)
- To do this, we first pivot the data on activity and max(p). Then filter this pivot two ways – max(p) >=0.003 and label not equal
*Playing sports n.e.c. **

Tip: You may need to enable multiple filters per field in the field settings of row labels. - We will end up with
*28 activities*. - Then add a helper column to original table that looks up the pivot and tells if an activity should be included or not

- To do this, we first pivot the data on activity and max(p). Then filter this pivot two ways – max(p) >=0.003 and label not equal
- Add two more columns to original table to tell peak time and modified time. This will help us in rearranging and sorting the data. Modified time just moves time by 3 hours (Henrik’s chart is plotted from 3AM to 3AM). At this stage our data looks like this:

- Now, pivot the data once again. This time,
- exclude activities by using report filter on include? column.
- Set up peak and activity in row labels area, modified time in column labels area and p in values area.
- Arrange the report in tabular format, turn off sub-totals.
- We get this:

- Calculate normalized values by dividing each p value with maximum p value for that activity. We can use another range of 28×288 cells to do this. We get this:

It will become clear once you look at the charts.*The next 2 steps may seem confusing.*- Define an offset value. Start with 0.5. You can change this later. In a separate 28×288 cell range, calculate gaps by multiplying offset with position of an activity. Something like this:

- Now, finally calculate activity + gap values by adding up respective cells in each of the 28×288 ranges. We get this:

**At this stage, our data is a shape ready for visualizing.**

The chart creation process has 5 steps.

- Select the 28×288 range of cells created in step 7 and insert an overlapped area chart.
- Now, copy the gaps range (created in step 6 above) and paste them on to area chart as new series (just ctrl+c your data and select the chart, press ctrl+v)
- Adjust the order of series so that each activity is sandwiched by appropriately named gap series
- Tip: adjusting 56 series is painful with the chart select data > move series up/down buttons. Instead, just select the series, look at formula bar. The SERIES formula has last parameter as order. Change this number. It is easy to figure out the number once you try a few.

- Change all gap series fill color to white. This instantly creates the floating area chart effect.
- Change the colors of activity series. Apply white / off-white border to these series. Your joyplot is ready.

**Quick overview of the chart creation process:**

Let’s examine the result of each those 5 steps with a smaller dataset so you can see how everything fits together. Here is the data for this example:

- Create an overlapped area chart with activity+gaps data. We get this:

- Add gaps as new series to chart. You get this:

- Move the gap series so that they sandwich activity series. Use Chart Data > Move series up/down buttons or SERIES formula

- Apply white color fill formatting for gap series. This creates floating area chart effect as below:

- Finally, format the chart by apply some colors and border formatting etc.

So there you go. The final outcome does look joyful.

While joyplot is awesome, it is not easy to make. Fortunately, there are a few simpler alternatives that we can whip up in Excel as soon as you have either the pivot or normalized values. Below I have shown two such examples. Read about sparklines or conditional formatting heatmaps for more.

**Joyplot alternative – using sparklines:**

Tip: to get axis on your sparkline, just type the times separated by a single space. Then go to format cell (ctrl+1) and set horizontal alignment to distributed. Viola, Excel will fill the cell by adjusting spaces.

**Joyplot alternative – Conditional Formatting Heatmap **

**Click here to download Joyplot Excel workbook**. Examine the data scrubbing formulas, pivot and chart settings to learn how this is created.

If you are familiar with R, then go thru Henrik’s R code. It is much shorter than the Excel gymnastics we did with circular pivot table referencing. That said, some of the data re-arrangement could be done with same ease in Power Query too.

The only step we missed in Excel implementation is moving average smoothing of the area charts. It can be easily added as a step between 3 and 4 in data stage.

How do you like Joyplot? Would you create something like this for your business / personal data? Share your stories and thoughts in the comments section.

**More joy for you…**

If you love this, you are going to enjoy these charts too.

]]>MR06 was using Goal Seek to change the equity input so that the IRR achieved 25% value. But he wanted to do that for a number of Year 3 price changes in the model.

He could obviously change the Year 3 value manually, then run a Goal Seek and write down the Solution and repeat for the range of Year 3 prices changes.

Or he could have setup a macro to do the same and tabulate the results.

Eventually I provided a solution which uses a Data Table and some formulas to determine these values

This post will examine the technique used to solve the problem for MR06

This is MR06’s model. It is a simple financial model of an investment in some units over a 6 1/2 year period.

The problem is that MR06 wants to know what Equity value **B10** is required to achieve an Internal Rate of Return (IRR) of 25% subject to varying the Year 3 Price, cell **G6**.

This problem can be solved using Goal Seek

Goto the **Data, What-If-Analysis, Goal Seek** Tab

Set the dialogs in the Goal Seek to the values shown above and press Ok

Excel sets the value in cell B12 to the value of $441,910 and shows an IRR of 25.1%. This small discrepancy will be discussed later.

MR06 can now manually repeat that changing the values of G6 between each iteration. Except that MR06 has many many models and wants to do that for variations in the Year 3 Price from $8 to $15.

An alternative method to solve this is to use a Data Table to feed in the starting values for both the Year 3 Price **G6**, as well as a list of Values for the Equity Contribution **B12.**

The Data Table will then retrieve a list of the IRR’s for the various combinations.

Then we can then setup a formula to calculate the Equity contribution required to achieve the target IRR value of 25%

You can follow along with a copy of the worked example: Download Example File Here

Setup a list of required Prices for Year 3 in a Row D15:K15 and a list of possible Equities in a Column C16:C25

I have colored the cells to simplify which cells are linked to which parts of the Data Table.

In the Top Left cell C15 link that to the Target IRR cell using =B12

Now select the whole area **C15:K25** and goto the **Data, What-If-Analysis, Data Table** Tab

Complete the Data Table dialog

Row Input cell: $G$6

Row Input cell: $B$10

Press **Ok**

Excel will now fill in the Data Table area with the IRR’s from the combination of Each Row (Equity) and Column (Year 3 Price)

At this point we now have a table of IRR’s from the combination of Each Row (Equity) and Column (Year 3 Price), but we can see that we don’t have an actual solution, only approximate solutions for each Year 3 Price.

Our goal was to find the Equity value which returned a 25% IRR

We can use the Excel **Forecast()** function to interpolate the exact answers based on the data in the Data Table

Forecast allows us to forecast a Y value based on an X value from a Table of Known X and Y values

You can read the excel help about forecast

We can see from the help that Forecast uses the syntax:

=FORECAST(x, known_y’s, known_x’s)

In our example we are seeking a known value of the IRR of 25%

We have a List of Known X’s which is the IRR’s in the Data Table and the list of known Y’s is the corresponding list of Equities in Column C

To test this in cell **D27** type the following: =FORECAST(25%,$C16:$C25,D16:D25) and press **Enter**

Excel will return **$444,481.25**

This is the equity required to return 25% IRR based on a Year 3 Price of $8.00

You can copy Cell D27 across and you will now have a Table of all the equities required for all Year 3 Prices.

A neater solution is to also allow the IRR to be set as a variable. This way it can be changed and the new equities automatically shown for each Year 3 Price change

In Cell **D29** type 25%

Then setup a Row of Year 3 Prices in cell **D31** type: =D15

Copy D31 across to K1

Then in **D32** type: =FORECAST($D$29,$C16:$C25,D16:D25)

Copy it across to K32

You now have a table of Year 3 Prices and the required Equities to achieve a 25% IRR

You can simply change the Required IRR in Cell **D29** eg 20%

And you will have a Table of the required Equities to achieve a 20% IRR

The good thing about having this as a separate table is that it can easily be graphed

Select the Table and got the **Insert, Chart** Tab

Select an appropriate chart type

In this example the model appears to be very linear, this means there is a direct or straight line relationship between the Equity and the IRR for a given Year 3 Price

If we choose a value of equity of say 30% we can see that we need an equity of $725,020 for a Year 3 Price of 8.00

It is important to note that the value of $725,020 is outside the range of the Equity values we set in Column C

In this case we should check that this is still a valid Forecast

To do this reset the Equity values in Column C to that shown below, ensuring that the equity values extend both below and above the expected equity value we saw of $725,020

We can now see that the Equity value required is $649,577.

Those who are observant will notice that the Equity value of 649,577 should have an IRR of slightly less than 29.61%, not 30%

This is because although the model appears linear, it is in fact not exactly linear.

The model’s non-linearity occurs for a number of reasons but primarily is that XIRR is not an exact function. It is an Iterative Function and to quote from Microsoft Help “*Excel uses an iterative technique for calculating XIRR. Using a changing rate (starting with guess), XIRR cycles through the calculation until the result is accurate within 0.000001 percent.*”

For all intention purposes it is close enough to linear that we don’t notice the differences.

If it was perfectly linear the extrapolation to 30% which was outside the original Equity range would have given us the same result of $649,577 not the $725,020 that it returned. Non-linearity is magnified once the extrapolation used in the Forecast Function is outside the range of the known Y values and we need to always check for this.

This non-linearity causes small discrepancies in the Excel functions like IRR and Forecast, of which we are using both.

Forecast uses a least squares approximation to best estimate a line of best fit, but it is just that, an approximation.

If you are wondering how close the model is to being linear, it is 99.77% fit to being linear

This is calculated using the r value or Pearson() function

D50: =PEARSON(C16:C25,D16:D25)

=0.997747518

That is the closer the value is to 1 the more linear it is.

In Excel versions 2016+ Microsoft has added a new set of functions including: FORECAST.LINEAR()

Forecast.Linear and the other new Statistical Functions “*use advanced machine learning algorithms, such as Exponential Triple Smoothing (ETS)*”

You can read all about them here: Advanced Forecast Functions in Excel 2016+

It is recommended that they are used in model from using Excel 2016 onwards.

I hope you have enjoyed this discussion on the use of Data Tables and the Forecast() Function to replicate multiple Goal Seek commands.

For more information on Data Tables I refer you to my post: Data Tables & Monte Carlo Simulations in Excel – A Comprehensive Guide

Many thanx to MR02 for permission to use his model for this post.

]]>