<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	xmlns:georss="http://www.georss.org/georss" xmlns:geo="http://www.w3.org/2003/01/geo/wgs84_pos#" xmlns:media="http://search.yahoo.com/mrss/"
	>

<channel>
	<title>Catbird Analytics</title>
	<atom:link href="https://catbirdanalytics.wordpress.com/feed/" rel="self" type="application/rss+xml" />
	<link>https://catbirdanalytics.wordpress.com</link>
	<description>Digital Analytics for the Catbird Seat...by John Yuill</description>
	<lastBuildDate>Sun, 02 Jan 2022 19:38:02 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>
	hourly	</sy:updatePeriod>
	<sy:updateFrequency>
	1	</sy:updateFrequency>
	<generator>http://wordpress.com/</generator>
<site xmlns="com-wordpress:feed-additions:1">6012381</site><cloud domain='catbirdanalytics.wordpress.com' port='80' path='/?rsscloud=notify' registerProcedure='' protocol='http-post' />
<image>
		<url>https://s0.wp.com/i/buttonw-com.png</url>
		<title>Catbird Analytics</title>
		<link>https://catbirdanalytics.wordpress.com</link>
	</image>
	<atom:link rel="search" type="application/opensearchdescription+xml" href="https://catbirdanalytics.wordpress.com/osd.xml" title="Catbird Analytics" />
	<atom:link rel='hub' href='https://catbirdanalytics.wordpress.com/?pushpress=hub'/>
	<item>
		<title>Dual-Axis Charts: Better Alternatives</title>
		<link>https://catbirdanalytics.wordpress.com/2021/12/31/dual-axis-charts-better-alternatives/</link>
					<comments>https://catbirdanalytics.wordpress.com/2021/12/31/dual-axis-charts-better-alternatives/#respond</comments>
		
		<dc:creator><![CDATA[John]]></dc:creator>
		<pubDate>Fri, 31 Dec 2021 23:01:00 +0000</pubDate>
				<category><![CDATA[Data visualization]]></category>
		<category><![CDATA[R Markdown]]></category>
		<category><![CDATA[R Stats]]></category>
		<category><![CDATA[dual-axis charts]]></category>
		<category><![CDATA[ggplot2]]></category>
		<category><![CDATA[R data visualization]]></category>
		<guid isPermaLink="false">http://catbirdanalytics.wordpress.com/?p=1312</guid>

					<description><![CDATA[Alternatives to Dual-Axis Charts In a previous blog post called &#039;Dual-Axis Charts: Temptations, Traps, Tips&#039;, I went through some of the pitfalls of using dual-axis charts. These are charts where you want to compare two metrics or data attributes but there are vastly different scales involved, so you reach for a layout with two y-axes&#8230; <a href="https://catbirdanalytics.wordpress.com/2021/12/31/dual-axis-charts-better-alternatives/" class="more-link">Continue reading <span class="screen-reader-text">Dual-Axis Charts: Better&#160;Alternatives</span></a>]]></description>
										<content:encoded><![CDATA[<p><img src="https://catbirdanalytics.wordpress.com/wp-content/uploads/2022/01/btc-ada-pc-chg-bars-1.png?w=730" alt="plot of chunk unnamed-chunk-11" /></p>
<h2>Alternatives to Dual-Axis Charts</h2>
<p>In a previous blog post called <a href="https://catbirdanalytics.wordpress.com/2021/12/29/dual-axis-charts-temptations-traps-tips/">&#039;Dual-Axis Charts: Temptations, Traps, Tips&#039;</a>, I went through some of the pitfalls of using <strong>dual-axis</strong> charts. These are charts where you want to compare two metrics or data attributes but there are vastly different scales involved, so you reach for a layout with two y-axes in order to deal with the scales separately. I acknowledged this as a tempting choice, but fraught with danger due to at least a couple of common issues:</p>
<ol>
<li><strong>Misrepresentation due to mixed scales</strong>: changing relative scales arbitrarily can suggest different conclusions and imply relationships that may not be as strong (or weak) as they appear.</li>
<li><strong>Difficulty in interpretation</strong>: these charts require extra mental effort to untangle the lines and associate them with their respective data points.</li>
</ol>
<p>I also provided some basic tips on how to avoid or minimize these issues with dual-axis charts, so here I want to try out some alternatives that may be even better options for communicating effectively with data.</p>
<p>As noted in the previous post, there are two common scenarios where dual-axis charts come up and we will walk through alternatives for each of them.</p>
<ol>
<li>Comparing trends in two (or more) similar data sets that have vastly different scales.</li>
<li>Comparing a volume metric with a related rate or ratio metric (so, again, vastly different scales).</li>
</ol>
<p>As usual, the examples here are produced using R, with the ggplot2 package as the preferred visualization tool.</p>
<h2>Scenario 1: Compare trends in similar metrics from two datasets</h2>
<p>Same example as the previous post: price history of two different cryptocurrencies, Cardano (with its token ADA) and Bitcoin (BTC). </p>
<p>To recap our initial setup:</p>
<p>As always, the visualization choice should be based on <strong>what questions we are trying to answer</strong>, <strong>what we are hoping to learn</strong>, and, ultimately, <strong>what decisions we want to make</strong>. </p>
<p>If we are starting with general exploration, we still need to frame it up. Our first thought may be to compare prices over a recent period, to answer questions like:</p>
<ul>
<li>What are the relative changes in the currencies over time?</li>
<li>Do the two follow a similar pattern of ups and downs?</li>
<li>Are there any points where a general pattern breaks? (could provide focus for further investigation)</li>
<li>Eventually: are there ways we can take advantage of these patterns to make investment decisions? (probably beyond initial scope but helps to have that broader perspective) </li>
</ul>
<p>In this random sample of recent data (Cdn$), we can see the two sets of prices are on much different scales.</p>
<table class="table" style="width:auto !important;margin-left:auto;margin-right:auto;">
<thead>
<tr>
<th style="text-align:left;"> date </th>
<th style="text-align:right;"> BTC_CAD </th>
<th style="text-align:right;"> ADA_CAD </th>
</tr>
</thead>
<tbody>
<tr>
<td style="text-align:left;"> 2021-02-19 </td>
<td style="text-align:right;"> 70534 </td>
<td style="text-align:right;"> 1.17 </td>
</tr>
<tr>
<td style="text-align:left;"> 2021-05-20 </td>
<td style="text-align:right;"> 49186 </td>
<td style="text-align:right;"> 2.18 </td>
</tr>
<tr>
<td style="text-align:left;"> 2021-08-08 </td>
<td style="text-align:right;"> 55067 </td>
<td style="text-align:right;"> 1.79 </td>
</tr>
<tr>
<td style="text-align:left;"> 2021-09-21 </td>
<td style="text-align:right;"> 52157 </td>
<td style="text-align:right;"> 2.55 </td>
</tr>
</tbody>
</table>
<h3>Line charts stacked vertically</h3>
<pre><code class="r">cp01 &lt;- crypto_data %&gt;% ggplot(aes(x=date, y=BTC_CAD))+geom_line()+
  scale_y_continuous(labels=dollar_format())+
  labs(title=&#039;BTC (top) and ADA (bottom) prices (CDN$)&#039;, x=&#039;&#039;)+
  theme(axis.text.x = element_blank())
cp02 &lt;- crypto_data %&gt;% ggplot(aes(x=date, y=ADA_CAD))+geom_line()+
  scale_y_continuous(labels=dollar_format())+
  labs(x=&#039;&#039;)
#grid.arrange(cp01, cp02, nrow=2)
plot_grid(cp01, cp02, nrow=2, align=&#039;v&#039;)
</code></pre>
<p><img src="https://catbirdanalytics.wordpress.com/wp-content/uploads/2022/01/crypto-lines-01-1.png?w=730" alt="plot of chunk crypto-lines-01" /></p>
<p>Here we still have the issue of scale ratios, but we have some advantages:</p>
<ul>
<li>clear and easy to see which dataset is which.</li>
<li>separating the lines puts the focus on general pattern comparison, doesn&#039;t create as strong an implication around magnitude of comparative changes and doesn&#039;t create distractions like cross-over points, which are meaningless. </li>
</ul>
<p>We are able to focus on the comparison, not on bending our mind around untangling the lines. This might more easily lead to a follow-up question, like:</p>
<ul>
<li>seems to be some similarity in the trends but not super-consistent, I wonder what the correlation between the lines is?</li>
</ul>
<pre><code class="r">corel &lt;- cor.test(crypto_data$BTC_CAD, crypto_data$ADA_CAD)
corelcoef &lt;- corel$estimate
corelci_lower &lt;- corel$conf.int[1]
corelci_upper &lt;- corel$conf.int[2]
</code></pre>
<p>A quick calculation shows <strong>r = 0.386</strong>, which is not that strong, with a 95% confidence interval of 0.292 to 0.474. The interval describes how precisely r is estimated rather than how strong the relationship is, but even its upper bound points to only a moderate linear relationship.</p>
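<p>For reference, the interval that cor.test() reports for a Pearson correlation is built with the Fisher z-transformation. A minimal base-R sketch, using small hypothetical vectors rather than the post&#039;s price data, shows the construction and why the interval width is driven mainly by the number of observations (the standard error of z is 1/sqrt(n - 3)):</p>
<pre><code class="r">## Pearson's r and its 95% confidence interval via the Fisher z-transformation
## x and y are small hypothetical vectors, not the crypto price data
x = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10)
y = c(2.1, 1.8, 3.5, 3.0, 4.9, 4.1, 6.2, 5.0, 7.4, 6.1)
n = length(x)
r = cor(x, y)                                  # Pearson correlation coefficient
z = atanh(r)                                   # Fisher z-transform of r
se = 1 / sqrt(n - 3)                           # approximate standard error of z
ci = tanh(z + c(-1, 1) * qnorm(0.975) * se)    # back-transform the bounds to the r scale
## cor.test() uses the same construction for its Pearson confidence interval
ct = cor.test(x, y)
all.equal(unname(ct$estimate), r)              # TRUE
all.equal(as.numeric(ct$conf.int), ci)         # TRUE
</code></pre>
<p>Because the standard error depends only on n, roughly a year of daily prices gives a fairly narrow interval even when r itself is modest.</p>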
<p>There&#039;s a whole other rabbit hole we can go down here, if we choose &#8211; creating scatterplots and all kinds of things &#8211; but for our purposes we&#039;ll continue with other visualization strategies.</p>
<h3>% Change Comparison</h3>
<p>Since a key part of what we are trying to understand is relative change in prices, a logical way to get away from the scale issue is to look at % changes. After calculating day-over-day changes, we get this view:</p>
<pre><code class="r">## calculate % changes day-over-day
crypto_data_pc &lt;- crypto_data %&gt;% mutate(BTC_CAD_pc=BTC_CAD/lag(BTC_CAD)-1,
                                         ADA_CAD_pc=ADA_CAD/lag(ADA_CAD)-1)
crypto_data_pc &lt;- crypto_data_pc[-1,]
## produce chart of % changes for each currency
crypto_data_pc %&gt;% ggplot(aes(x=date, y=BTC_CAD_pc))+geom_line(color=&#039;goldenrod&#039;)+
  scale_y_continuous(labels=percent_format())+
  geom_line(aes(y=ADA_CAD_pc), color=&#039;blue&#039;)+
  labs(title=&#039;Daily % Changes in Prices (gold=BTC, blue=ADA)&#039;, x=&quot;&quot;, y=&#039;Daily % Chg&#039;)
</code></pre>
<p><img src="https://catbirdanalytics.wordpress.com/wp-content/uploads/2022/01/crypto-pc-lines-01-1.png?w=730" alt="plot of chunk crypto-pc-lines-01" /></p>
<p>This is pretty messy with this amount of data, but hopefully you can see how the approach could be useful in comparing the two currencies. It could be made more readable by either zooming in on a shorter period or aggregating the data by week or month. </p>
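<p>One detail worth keeping in mind when aggregating: keeping one day per week and taking the change from one week to the next gives a point-to-point change, which equals the <em>compounded</em> daily changes over that week, not their sum. A small base-R sketch with a hypothetical 8-day price series:</p>
<pre><code class="r">## Week-over-week % change equals the compounded daily % changes within the week
## price is a hypothetical daily series (8 prices = 7 daily changes = one week)
price = c(100, 102, 99, 101, 105, 104, 106, 108)
## day-over-day % change, same idea as x / lag(x) - 1 (first day has no prior value)
daily_pc = price[-1] / price[-length(price)] - 1
## point-to-point change from first to last day (what a weekly filter + lag gives)
weekly_pc = price[length(price)] / price[1] - 1
## compounding the seven daily changes recovers the same weekly figure
compounded = prod(1 + daily_pc) - 1
all.equal(weekly_pc, compounded)   # TRUE
## simply adding up the daily changes generally does NOT give the weekly change
sum(daily_pc)
</code></pre>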
<p>Here&#039;s the example with weeks, using the lubridate package to keep one day per week (weekday==7, which is Saturday under lubridate&#039;s default Sunday-first numbering):</p>
<pre><code class="r">library(lubridate) ## for wday()
## use lubridate to get day of week for each date (default numbering: 1 = Sunday ... 7 = Saturday),
## keep a single day of week, then calc WoW % chg
crypto_data_pc_wk &lt;- crypto_data %&gt;% mutate(
  weekday=wday(date)
) %&gt;% filter(weekday==7) %&gt;% mutate(
  BTC_CAD_pc=BTC_CAD/lag(BTC_CAD)-1,
  ADA_CAD_pc=ADA_CAD/lag(ADA_CAD)-1
)
crypto_data_pc_wk &lt;- crypto_data_pc_wk[-1,] ## drop first row, since NA for % chg
## plot weekly change comparison
crypto_data_pc_wk %&gt;% ggplot(aes(x=date, y=BTC_CAD_pc))+geom_line(color=&#039;goldenrod&#039;)+
  scale_y_continuous(labels=percent_format())+
  geom_line(aes(y=ADA_CAD_pc), color=&#039;blue&#039;)+
  labs(title=&#039;Weekly % Changes in Prices (gold=BTC, blue=ADA)&#039;, x=&quot;&quot;, y=&#039;Weekly % Chg&#039;)
</code></pre>
<p><img src="https://catbirdanalytics.wordpress.com/wp-content/uploads/2022/01/crypto-pc-wk-lines-01-1.png?w=730" alt="plot of chunk crypto-pc-wk-lines-01" /></p>
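<p>The weekday==7 filter relies on lubridate&#039;s default day numbering. As a cross-check, here is a base-R sketch over a hypothetical two-week date range; base R&#039;s POSIXlt numbers days 0 to 6 from Sunday, so the same Saturdays show up as 6:</p>
<pre><code class="r">## lubridate::wday() numbers days 1-7 starting from Sunday by default, so 7 = Saturday.
## Base R's POSIXlt$wday numbers them 0-6 from Sunday, so Saturday is 6 there.
d = seq(as.Date("2021-01-01"), as.Date("2021-01-14"), by = "day")  # hypothetical range
wday_base = as.POSIXlt(d)$wday       # 0 = Sunday ... 6 = Saturday
saturdays = d[wday_base == 6]        # the days that filter(wday(date)==7) would keep
saturdays                            # "2021-01-02" "2021-01-09"
weekdays(saturdays)                  # day names (locale-dependent; "Saturday" in English)
</code></pre>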
<p>We can also try this view with bars, after reshaping the data to a longer format, which makes side-by-side bar comparison easier:</p>
<pre><code class="r">library(tidyr) ## for pivot_longer()
## pivot data longer to make it easier to display side-by-side bars with legend
crypto_data_pc_wk_lg &lt;- crypto_data_pc_wk %&gt;% select(date, BTC_CAD_pc, ADA_CAD_pc) %&gt;% 
  pivot_longer(cols=c(BTC_CAD_pc, ADA_CAD_pc), names_to=&#039;currency&#039;, values_to = &#039;pc_chg&#039;)  
## side-by-side bar plot
crypto_data_pc_wk_lg %&gt;% ggplot(aes(x=date, y=pc_chg, fill=currency))+
  geom_col(position = position_dodge2())+
  scale_y_continuous(labels=percent_format())+
  labs(title=&#039;Weekly % Changes in Prices&#039;, x=&quot;&quot;, y=&#039;Weekly % Chg&#039;)+
  theme(legend.position = &#039;top&#039;, legend.title = element_blank())
</code></pre>
<p><img src="https://catbirdanalytics.wordpress.com/wp-content/uploads/2022/01/crypto-pc-wk-bar-01-1.png?w=730" alt="plot of chunk crypto-pc-wk-bar-01" /></p>
<p>The chart is still dense, but it highlights how bar/column charts can facilitate side-by-side comparisons, whereas line charts favour reading trends.</p>
<h4>Difference in Difference</h4>
<p>Once we see the comparison of % change week-over-week, we might want to go further and compare the differences in those changes. The next logical step is to calculate the difference between the % change in ADA and the % change in BTC &#8211; a version of the <a href="https://en.wikipedia.org/wiki/Difference_in_differences#:%7E:text=Difference%20in%20differences%20(DID%20or,&#x27;%20versus%20a%20&#x27;control%20group&#x27;">&#039;difference in difference&#039;</a> approach used in statistics.  </p>
<pre><code class="r">crypto_data_pc_wk &lt;- crypto_data_pc_wk %&gt;% mutate(
  ADA_BTC_diff=ADA_CAD_pc-BTC_CAD_pc
)
crypto_data_pc_wk %&gt;% ggplot(aes(x=date, y=ADA_BTC_diff*100))+geom_col()+
  labs(title=&quot;Difference in % Difference: ADA-BTC&quot;, x=&quot;&quot;, y=&#039;Difference in Chg (percentage pts)&#039;)
</code></pre>
<p><img src="https://catbirdanalytics.wordpress.com/wp-content/uploads/2022/01/crypto_data_diff_in_diff-01-1.png?w=730" alt="plot of chunk crypto_data_diff_in_diff-01" /></p>
<p>This view highlights even further the lack of consistent relationship between changes in the two currencies: </p>
<ul>
<li>if the two had the same % changes (increase or decrease) week-over-week, the bars would be at/close to 0.</li>
<li>if there was a consistent difference in difference, for example, if a 10% change in BTC accompanied a 15% change in ADA, and a 5% change in BTC accompanied a 10% change in ADA, the bars would all be at 5 (percentage points).</li>
</ul>
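<p>The two cases in the list can be checked with a few lines of base R, using the hypothetical % changes from the example (this mirrors the ADA_BTC_diff*100 calculation in the chart code):</p>
<pre><code class="r">## Difference in % changes (ADA minus BTC), expressed in percentage points
## consistent gap: BTC moves 10% and 5%, ADA moves 15% and 10% in the same weeks
consistent = (c(0.15, 0.10) - c(0.10, 0.05)) * 100
consistent          # 5 5: both bars at 5 percentage points
## identical moves: both currencies change by the same % each week
identical_moves = (c(0.07, -0.03) - c(0.07, -0.03)) * 100
identical_moves     # 0 0: bars at zero
</code></pre>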
<p>Although the bars appear to be roughly centered around 0, there is a LOT of variation on either side. Getting back to our original questions, there is no identifiable pattern in comparative price changes, and no apparent shift in the relationship over time or across different periods. </p>
<h4>Center and Spread</h4>
<p>Looking at the relative patterns over time, one direction this could lead us is to ask questions around center and distribution of daily changes in the two currencies. This is going off on a tangent from the main goal of exploring alternatives to dual-axis time series charts, but I can&#039;t resist. 😉  </p>
<pre>## BTC-CAD summary
</pre>
<pre>##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
## -0.1335 -0.0216 -0.0004  0.0024  0.0245  0.1855
</pre>
<pre>## std deviation:  0.0422
</pre>
<pre>## 
## ADA-CAD summary
</pre>
<pre>##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##  -0.256  -0.027   0.003   0.009   0.032   0.322
</pre>
<pre>## std deviation:  0.0692
</pre>
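<p>The code behind these summaries isn&#039;t shown, but output in this form can be produced with base R&#039;s summary() and sd(). A minimal sketch on a small hypothetical vector of daily changes, plus the ratio of the two standard deviations reported above:</p>
<pre><code class="r">## Center and spread of daily % changes in base R
## pc_chg is a small hypothetical vector, not the post's data
pc_chg = c(-0.05, -0.02, 0.00, 0.01, 0.03, 0.06, -0.01, 0.02)
summary(pc_chg)              # Min., 1st Qu., Median, Mean, 3rd Qu., Max.
round(sd(pc_chg), 4)         # standard deviation, like the "std deviation" lines above
## ratio of the reported standard deviations (ADA vs BTC daily changes)
round(0.0692 / 0.0422, 2)    # 1.64
</code></pre>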
<p>Both appear to be fairly tightly centered around 0, with Cardano noticeably more volatile: its standard deviation of daily changes (0.0692) is roughly 1.6 times BTC&#039;s (0.0422). This could lead us into some distribution visualizations, like a histogram with the changes in the two currencies overlaid on each other&hellip;</p>
<pre><code class="r">crypto_data_pc %&gt;% ggplot()+
  geom_histogram(aes(x=BTC_CAD_pc), fill=&#039;goldenrod&#039;, alpha=0.2)+
  geom_histogram(aes(x=ADA_CAD_pc), fill=&#039;blue&#039;, alpha=0.2)+
  labs(title=&#039;Distribution of Daily % Chg, gold=BTC-CAD, blue=ADA-CAD&#039;, x=&#039;&#039;)
</code></pre>
<p><img src="https://catbirdanalytics.wordpress.com/wp-content/uploads/2022/01/crypto-hist-01-1.png?w=730" alt="plot of chunk crypto-hist-01" /></p>
<p>&hellip;or, my personal preference, a boxplot&hellip;</p>
<pre><code class="r">## pivot data longer so both currencies share one column, for side-by-side boxplots
crypto_data_pc_lg &lt;- crypto_data_pc %&gt;% select(date, BTC_CAD_pc, ADA_CAD_pc) %&gt;% 
  pivot_longer(cols=c(BTC_CAD_pc, ADA_CAD_pc), names_to=&#039;currency&#039;, values_to = &#039;pc_chg&#039;)
## boxplot
crypto_data_pc_lg %&gt;% ggplot(aes(x=currency, y=pc_chg))+geom_boxplot(fill=&#039;dodgerblue&#039;)+
  scale_y_continuous(labels=percent_format())+
  labs(title=&#039;Distribution of Daily % Chg&#039;, x=&#039;&#039;, y=&#039;Daily % Chg&#039;)
</code></pre>
<p><img src="https://catbirdanalytics.wordpress.com/wp-content/uploads/2022/01/crypto-pc-box-01-1.png?w=730" alt="plot of chunk crypto-pc-box-01" /></p>
<h3>Re-Scale the Data</h3>
<p>Another option is to rescale both sets of prices so that they are on a common scale, and therefore more comparable. This is often done in machine learning in order to balance the weights of features. There are a number of potential pitfalls, so it is best to proceed with caution, with a good understanding of your data and your objectives in mind. This info is presented as <strong>demonstration, not necessarily endorsement</strong>. 😉</p>
<h4>Two main approaches: Normalization and Standardization</h4>
<p>This is a whole area unto itself and there are variations in the terminology used. I&#039;m relying on the following references:</p>
<ul>
<li><a href="https://www.analyticsvidhya.com/blog/2020/04/feature-scaling-machine-learning-normalization-standardization/">Feature Scaling for Machine Learning: Understanding the Differences between Normalization and Standardization (Analyticsvidhya.com)</a></li>
<li><a href="https://sebastianraschka.com/Articles/2014_about_feature_scaling.html">About Feature Scaling and Normalization (Sebastian Raschka)</a></li>
<li><a href="https://medium.com/swlh/data-normalisation-with-r-6ef1d1947970">Data Normalization with R (Nikhita Singh Shiv Kalpana on Medium)</a></li>
</ul>
<p>Based on the above, there are two general approaches, described as:</p>
<ul>
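<li><strong>Normalization</strong>: scale the values to a range of 0 &#8211; 1, using &#039;min-max scaling&#039;. Doesn&#039;t treat outliers well, since a single extreme value sets the minimum or maximum. This is often described as maintaining the same distribution as the original data, just on a different scale. Since min-max scaling is a linear transformation, the shape of the distribution is in fact preserved, although before/after histograms can still look slightly different depending on how they are drawn.</li>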
<li><strong>Standardization</strong>: aka &#039;z-score&#039;: scale the values so that mean = 0 and standard deviation = 1. No upper or lower bound, so tends to be better at handling outliers. Like min-max scaling, it is a linear transformation, so it does not change the shape of the distribution (it does not make the data more normal).</li>
</ul>
<p>There are general guidelines but no hard and fast rules around when to use one or the other, so let&#039;s check them both out.</p>
<h4>Normalization</h4>
<p>According to analyticsvidhya.com, normalization is considered &#039;<a href="https://www.analyticsvidhya.com/blog/2020/04/feature-scaling-machine-learning-normalization-standardization/">good to use when you know that the distribution of your data does not follow a Gaussian distribution</a>&#039;. Let&#039;s check:</p>
<pre><code class="r">## use density function to compare actual vs normal ideal
library(gridExtra) ## for grid.arrange()
## after_stat(density) replaces the older ..density.. notation (ggplot2 3.3.0+)
## BTC
hist01 &lt;- crypto_data %&gt;% ggplot(aes(x=BTC_CAD))+geom_histogram(aes(y=after_stat(density)))+
  ## function to calculate ideal normal dist based on mean and sd in the dataset 
  stat_function(fun=dnorm, args=list(mean=mean(crypto_data$BTC_CAD), sd=sd(crypto_data$BTC_CAD)), color=&#039;red&#039;)+
  labs(title=&#039;BTC_CAD price distribution&#039;)
## ADA
hist02 &lt;- crypto_data %&gt;% ggplot(aes(x=ADA_CAD))+geom_histogram(aes(y=after_stat(density)))+
  ## function to calculate ideal normal dist based on mean and sd in the dataset
  stat_function(fun=dnorm, args=list(mean=mean(crypto_data$ADA_CAD), sd=sd(crypto_data$ADA_CAD)), color=&#039;red&#039;)+
  labs(title=&#039;ADA_CAD price distribution&#039;)
## print both
grid.arrange(hist01, hist02, nrow=1)
</code></pre>
<p><img src="https://catbirdanalytics.wordpress.com/wp-content/uploads/2022/01/crypto-norm-check-01-1.png?w=730" alt="plot of chunk crypto-norm-check-01" /></p>
<p>This is pretty &#039;abnormal data&#039; &#8211; <em>hello cryptocurrency!</em> &#8211; so another reason to exercise caution. </p>
<p>The formula for min-max normalization is pretty straightforward: for each value in the data set, calculate its distance from the minimum value and then divide by the full range of the data:</p>
<p>X<sub>norm</sub> = (X - X<sub>min</sub>) / (X<sub>max</sub> - X<sub>min</sub>)</p>
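<p>A quick base-R check of the formula on a hypothetical, right-skewed price series (not the post&#039;s data). Because the transformation is linear, it keeps the order and relative spacing of the values, so the correlation with the original is 1 and a simple moment-based skewness measure (defined here for illustration) is unchanged:</p>
<pre><code class="r">## Min-max normalization: X_norm = (X - X_min) / (X_max - X_min)
## price is a hypothetical, right-skewed series, not the post's data
price = c(1.0, 1.2, 1.1, 1.5, 2.8, 2.2, 1.3, 1.4)
x_norm = (price - min(price)) / (max(price) - min(price))
range(x_norm)                            # 0 1
## simple moment-based skewness, used only to compare shapes
skew = function(v) mean((v - mean(v))^3) / sd(v)^3
cor(price, x_norm)                       # 1: order and spacing preserved
all.equal(skew(price), skew(x_norm))     # TRUE: shape unchanged
</code></pre>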
<p>For fun, we can use the caret pkg, based on example code from <a href="https://www.journaldev.com/47850/normalize-data-in-r">JournalDev.com</a>:</p>
<pre><code class="r">## use caret pkg functions for fun
library(caret)
process &lt;- preProcess(crypto_data, method=c(&#039;range&#039;))
crypto_norm &lt;- predict(process, crypto_data)

## alternative methods:
## - mutate
crypto_data_minmax &lt;- crypto_data %&gt;% mutate(
  BTC_mm=(BTC_CAD-min(crypto_data$BTC_CAD))/(max(crypto_data$BTC_CAD)-min(crypto_data$BTC_CAD)),
  ADA_mm=(ADA_CAD-min(crypto_data$ADA_CAD))/(max(crypto_data$ADA_CAD)-min(crypto_data$ADA_CAD))
)
## - simple function with lapply
fminmax &lt;- function(x){
  (x-min(x))/(max(x)-min(x))
  }
crypto_data_fminmax &lt;- as.data.frame(lapply(crypto_data[,2:3], fminmax))
</code></pre>
<p>Create plot for display later:</p>
<pre><code class="r">## create a plot of normalized lines
p_norm &lt;- crypto_norm %&gt;% ggplot(aes(x=date))+
  geom_line(aes(y=BTC_CAD), color=&#039;goldenrod&#039;)+
  geom_line(aes(y=ADA_CAD), color=&#039;blue&#039;)+
  labs(title=&#039;Nrmlized Prices (gold=BTC, blue=ADA)&#039;, x=&#039;&#039;, y=&#039;normalized prices&#039;)
</code></pre>
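<p>One thing to note: the histogram of the newly normalized data looks similar but <em>not identical</em> to the original. Because min-max scaling is a linear transformation, the shape of the distribution should be preserved (as described <a href="https://medium.com/@sjacks/feature-transformation-21282d1a3215">here</a>, for example), so the visible differences most likely come from how the two histograms were drawn (for example, binning or axis settings) rather than from the data itself:</p>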
<p><img src="https://catbirdanalytics.wordpress.com/wp-content/uploads/2022/01/crypto-hist-btc-norm-01-1.png?w=730" alt="plot of chunk crypto-hist-btc-norm-01" /></p>
<h4>Standardization</h4>
<ul>
<li>use the built-in &#039;scale&#039; function in R</li>
</ul>
<pre><code class="r">## z-score scaling
crypto_scale &lt;- as.data.frame(scale(crypto_data[2:3]))
crypto_scale &lt;- crypto_scale %&gt;% rename(
  BTC_CAD_scale=BTC_CAD,
  ADA_CAD_scale=ADA_CAD
)
## bind the values back to original data set, with dates
crypto_scale &lt;- bind_cols(crypto_data, crypto_scale)
</code></pre>
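<p>By default, scale() centers each column on its mean and divides by its standard deviation. A base-R sketch on a hypothetical series (not the post&#039;s data) confirms the manual formula gives the same result and, as with min-max scaling, leaves the shape of the distribution unchanged:</p>
<pre><code class="r">## z-score standardization: z = (x - mean(x)) / sd(x)
## price is a hypothetical series, not the post's data
price = c(1.0, 1.2, 1.1, 1.5, 2.8, 2.2, 1.3, 1.4)
z_manual = (price - mean(price)) / sd(price)
z_scale = as.numeric(scale(price))       # scale() returns a matrix; flatten to a vector
all.equal(z_manual, z_scale)             # TRUE
c(mean = mean(z_scale), sd = sd(z_scale))   # approximately 0 and 1
## linear transformation, so a moment-based skewness measure is unchanged
skew = function(v) mean((v - mean(v))^3) / sd(v)^3
all.equal(skew(price), skew(z_scale))    # TRUE
</code></pre>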
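<p>Here again the before/after histograms look similar but not identical. Standardization is also a linear transformation, so it does not pull the data toward a normal curve; the shape of the distribution is unchanged, and any visible differences most likely come from how the histograms were drawn.  </p>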
<p><img src="https://catbirdanalytics.wordpress.com/wp-content/uploads/2022/01/crypto-hist-std-01-1.png?w=730" alt="plot of chunk crypto-hist-std-01" /></p>
<p>Create line plot:</p>
<pre><code class="r">p_std &lt;- crypto_scale %&gt;% ggplot(aes(x=date))+
  geom_line(aes(y=BTC_CAD_scale), color=&#039;goldenrod&#039;)+
  geom_line(aes(y=ADA_CAD_scale), color=&#039;blue&#039;)+
  labs(title=&#039;Stdized Prices (gold=BTC, blue=ADA)&#039;, x=&#039;&#039;, y=&#039;standardized prices&#039;)
</code></pre>
<p>Compare the two methods:</p>
<p><img src="https://catbirdanalytics.wordpress.com/wp-content/uploads/2022/01/crypto-norm-std-01-1.png?w=730" alt="plot of chunk crypto-norm-std-01" /></p>
<p>Very similar results, although interesting to see how the Standardized view on the right has a bit more spread for ADA relative to BTC, as values are not constrained between upper and lower bound. This may be a more accurate reflection of the higher volatility of ADA.</p>
<p>So if we focus on the Standardized version, what can we learn from this view, relative to the questions we want to answer? There are a few things to unpack, so let me take a stab at it:</p>
<ul>
<li>we see the overall upward trend in both data sets, with peaks and valleys along the way, at different points for each currency</li>
<li>lots of volatility in each data set, with somewhat more relative volatility in ADA, including higher highs and lower lows.</li>
<li>ADA was slower off the mark than BTC at the beginning of the year, peaked a bit after, trended down in concert with BTC and then took off, before falling more consistently and harder than BTC at the end of the year.</li>
</ul>
<p>Now, with dual-axis charts like the <a href="https://catbirdanalytics.wordpress.com/2021/12/29/dual-axis-charts-temptations-traps-tips/">companion versions in the previous post</a>, we can draw similar conclusions, depending on how we configure the two axes &#8211; which is the <em>crux</em> of the problem. The point is this: </p>
<p><strong>with a standardized comparison, we can make these conclusions with more confidence.</strong>  </p>
<h4>Percentile comparisons</h4>
<p>Data can also be scaled using percentiles, resulting in a scale between 0 and 100, based on the ranking of each value. This doesn&#039;t seem to be recommended for machine learning feature engineering, but could be useful as an alternative to dual-axis charts where a more even comparison is wanted. </p>
<p>dplyr (part of the tidyverse) has a handy &#039;percent_rank&#039; function for easy calculation, returning values from 0 to 1.</p>
<pre><code class="r">crypto_data_pctl &lt;- crypto_data %&gt;% mutate(
  BTC_CAD_pctl=percent_rank(BTC_CAD),
  ADA_CAD_pctl=percent_rank(ADA_CAD)
)

p_pctl &lt;- crypto_data_pctl %&gt;% ggplot(aes(x=date))+
  geom_line(aes(y=BTC_CAD_pctl), color=&#039;goldenrod&#039;)+
  geom_line(aes(y=ADA_CAD_pctl), color=&#039;blue&#039;)+
  scale_y_continuous(labels=percent_format())+
  labs(title=&#039;Prcntile Prices (gold=BTC, blue=ADA)&#039;, x=&#039;&#039;, y=&#039;percentile rank&#039;)
p_pctl
</code></pre>
<p><img src="https://catbirdanalytics.wordpress.com/wp-content/uploads/2022/01/crypto-pctl-lines-01-1.png?w=730" alt="plot of chunk crypto-pctl-lines-01" /></p>
<p>This also appears potentially usable, although it does seem to diverge from the other two approaches. Compare all three methods:</p>
<p><img src="https://catbirdanalytics.wordpress.com/wp-content/uploads/2022/01/crypto-all-norm-01-1.png?w=730" alt="plot of chunk crypto-all-norm-01" /></p>
<p>The percentile version stands out as having different relative patterns compared to the other two. This is likely because, by definition, each currency ends up with exactly the same set of percentile values (from 0 to 100 in steps of 100/(n-1), where n = number of rows in the data, assuming no ties). The only difference between the two is the order in which those values occur over time. So the percentile approach is one to avoid if the relative amount, and not just rank position, matters.</p>
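<p>dplyr&#039;s percent_rank() rescales min_rank() to the 0 to 1 range, i.e. (min_rank(x) - 1) / (n - 1). A base-R stand-in (assuming no missing values) on two hypothetical series shows the point above: both series end up with exactly the same set of values, just in a different order:</p>
<pre><code class="r">## base-R version of percent_rank(): (min rank - 1) / (n - 1); assumes no NAs
pct_rank = function(x) (rank(x, ties.method = "min") - 1) / (length(x) - 1)
## two hypothetical price series on very different scales, with no ties
btc = c(50000, 70000, 60000, 55000, 65000)
ada = c(2.5, 1.2, 1.8, 2.1, 1.5)
pct_rank(btc)        # 0.00 1.00 0.50 0.25 0.75
pct_rank(ada)        # 1.00 0.00 0.50 0.75 0.25
## same set of values for both, in steps of 1 / (n - 1) = 0.25
all.equal(sort(pct_rank(btc)), sort(pct_rank(ada)))   # TRUE
</code></pre>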
<h3>Conclusion &#8211; Scenario 1</h3>
<p>There are some viable alternatives to dual-axis charts when trying to answer questions about the relationships/trends between two metrics on wildly different scales. In particular, comparing percentage changes can provide insights without the hazards of dual-axis charts.</p>
<h2>Scenario 2: Comparing a Count and a Ratio</h2>
<p>In other cases, we may want to compare patterns in a volume or count metric with a related key indicator. Here&#039;s an example using some Google Analytics data for a website:</p>
<ul>
<li>daily users</li>
<li>daily conversion rate</li>
</ul>
<p>Interesting questions with these metrics can include:</p>
<ul>
<li>what is the relationship between patterns in site traffic and conversion rates?</li>
<li>do increases in daily users correspond to decreases in conversion rates or vice versa?</li>
<li>are there any points where breaking of the typical relationship between these metrics warrants further investigation?</li>
</ul>
<p>Quick look at the data:</p>
<table class="table" style="width:auto !important;margin-left:auto;margin-right:auto;">
<thead>
<tr>
<th style="text-align:left;"> date </th>
<th style="text-align:right;"> users </th>
<th style="text-align:right;"> conv_rate </th>
</tr>
</thead>
<tbody>
<tr>
<td style="text-align:left;"> 2021-11-06 </td>
<td style="text-align:right;"> 464654 </td>
<td style="text-align:right;"> 0.024 </td>
</tr>
<tr>
<td style="text-align:left;"> 2021-11-10 </td>
<td style="text-align:right;"> 439000 </td>
<td style="text-align:right;"> 0.016 </td>
</tr>
<tr>
<td style="text-align:left;"> 2021-11-12 </td>
<td style="text-align:right;"> 576465 </td>
<td style="text-align:right;"> 0.020 </td>
</tr>
<tr>
<td style="text-align:left;"> 2021-11-17 </td>
<td style="text-align:right;"> 414958 </td>
<td style="text-align:right;"> 0.022 </td>
</tr>
</tbody>
</table>
<p>The approaches shown above can be applied to this data as well, so we&#039;ll focus on some further alternatives particularly well-suited to this type of data.</p>
<h3>Line chart with Bar chart below</h3>
<p>We saw above how line charts stacked above and below each other can be easier to interpret than dual-axis charts. In the previous blog post, there was an example of how a dual-axis chart can be improved by combining a line chart for a percentage/ratio metric with a bar chart for a volume metric. We can take from each of these examples and set up a line chart with a bar chart below it &#8211; similar to the typical way stock market charts display price data with volume data underneath.</p>
<p><img src="https://catbirdanalytics.wordpress.com/wp-content/uploads/2022/01/unnamed-chunk-19-1.png?w=730" alt="plot of chunk unnamed-chunk-19" /></p>
<p>Here we&#039;re abandoning any attempt to align the scales in favour of considering the metrics separately but within the same context of time frame. This also:</p>
<ul>
<li>frees us up to adjust the height ratio of the charts in order to focus on the key metric of interest. </li>
<li>reduces confusion over the scales, since each chart has its own axis.</li>
<li>allows us to add context like a trend (regression) line without creating the confusion and clutter that we would have on a dual-axis chart.</li>
</ul>
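<p>The chart above was made with ggplot2 and its code isn&#039;t shown. Purely as an illustration of the same layout idea, here is a base-R sketch using the four sample rows from the table above: layout() sets the relative panel heights, lm() supplies a simple trend line, and type = "h" draws bar-like vertical lines aligned to the dates. It draws to a null device, so nothing is saved.</p>
<pre><code class="r">## Line chart (conversion rate) stacked above a bar-like volume chart (users)
## data: the four sample rows from the table above
ga = data.frame(
  date = as.Date(c("2021-11-06", "2021-11-10", "2021-11-12", "2021-11-17")),
  users = c(464654, 439000, 576465, 414958),
  conv_rate = c(0.024, 0.016, 0.020, 0.022)
)
ga$conv_pct = ga$conv_rate * 100
pdf(NULL)                                          # null device: no file or window
layout(matrix(1:2, ncol = 1), heights = c(2, 1))   # top panel twice as tall as bottom
par(mar = c(1, 4, 3, 1))
plot(ga$date, ga$conv_pct, type = "l", xaxt = "n", xlab = "",
     ylab = "Conv. rate (%)", main = "Conversion rate (top) and users (bottom)")
trend = lm(conv_pct ~ as.numeric(date), data = ga) # simple linear trend line
abline(trend, lty = 2)
par(mar = c(4, 4, 1, 1))
plot(ga$date, ga$users, type = "h", lwd = 10, xlab = "", ylab = "Users")  # volume bars
invisible(dev.off())
</code></pre>
<p>Both panels use the same dates, so their default x ranges line up; with ggplot2, the equivalent is two plots combined vertically with aligned axes and unequal heights.</p>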
<p>My view is that these changes make it easier and more intuitive to arrive at the same conclusions from the <a href="https://catbirdanalytics.wordpress.com/2021/12/29/dual-axis-charts-temptations-traps-tips/">previous blog post</a>: conversion rates rising and falling out of sync with variations in user counts.</p>
<p>Again, this gets us quicker to follow-up questions and directed investigation, as well as enabling us to communicate what we are seeing in the data more clearly with others.</p>
<h3>Conclusion &#8211; Scenario 2</h3>
<p>As with the previous scenario, there are viable alternatives to dual-axis charts for comparing a volume metric with a rate or ratio metric, even though they are on very different scales. </p>
<h2>Overall Conclusion</h2>
<p>With these viable alternatives to dual-axis charts, there should be few cases where an analyst needs to succumb to the temptation of the dual-axis monster. The solutions offered here, or variations on them to fit your needs, will usually provide more clarity and better reveal the answers to your data questions.</p>
]]></content:encoded>
					
					<wfw:commentRss>https://catbirdanalytics.wordpress.com/2021/12/31/dual-axis-charts-better-alternatives/feed/</wfw:commentRss>
			<slash:comments>0</slash:comments>
		
		
		<post-id xmlns="com-wordpress:feed-additions:1">1312</post-id>
		<media:thumbnail url="https://catbirdanalytics.wordpress.com/wp-content/uploads/2022/01/btc-ada-pc-chg-bars.png" />
		<media:content url="https://catbirdanalytics.wordpress.com/wp-content/uploads/2022/01/btc-ada-pc-chg-bars.png" medium="image">
			<media:title type="html">btc-ada-pc-chg-bars.png</media:title>
		</media:content>

		<media:content url="https://0.gravatar.com/avatar/61a7e5aa4e24773b513b314f89b5611e80124a6e923e7234e89ca72a387b08d1?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">john</media:title>
		</media:content>

		<media:content url="https://catbirdanalytics.wordpress.com/wp-content/uploads/2022/01/btc-ada-pc-chg-bars-1.png" medium="image">
			<media:title type="html">plot of chunk unnamed-chunk-11</media:title>
		</media:content>

		<media:content url="https://catbirdanalytics.wordpress.com/wp-content/uploads/2022/01/crypto-lines-01-1.png" medium="image">
			<media:title type="html">plot of chunk crypto-lines-01</media:title>
		</media:content>

		<media:content url="https://catbirdanalytics.wordpress.com/wp-content/uploads/2022/01/crypto-pc-lines-01-1.png" medium="image">
			<media:title type="html">plot of chunk crypto-pc-lines-01</media:title>
		</media:content>

		<media:content url="https://catbirdanalytics.wordpress.com/wp-content/uploads/2022/01/crypto-pc-wk-lines-01-1.png" medium="image">
			<media:title type="html">plot of chunk crypto-pc-wk-lines-01</media:title>
		</media:content>

		<media:content url="https://catbirdanalytics.wordpress.com/wp-content/uploads/2022/01/crypto-pc-wk-bar-01-1.png" medium="image">
			<media:title type="html">plot of chunk crypto-pc-wk-bar-01</media:title>
		</media:content>

		<media:content url="https://catbirdanalytics.wordpress.com/wp-content/uploads/2022/01/crypto_data_diff_in_diff-01-1.png" medium="image">
			<media:title type="html">plot of chunk crypto_data_diff_in_diff-01</media:title>
		</media:content>

		<media:content url="https://catbirdanalytics.wordpress.com/wp-content/uploads/2022/01/crypto-hist-01-1.png" medium="image">
			<media:title type="html">plot of chunk crypto-hist-01</media:title>
		</media:content>

		<media:content url="https://catbirdanalytics.wordpress.com/wp-content/uploads/2022/01/crypto-pc-box-01-1.png" medium="image">
			<media:title type="html">plot of chunk crypto-pc-box-01</media:title>
		</media:content>

		<media:content url="https://catbirdanalytics.wordpress.com/wp-content/uploads/2022/01/crypto-norm-check-01-1.png" medium="image">
			<media:title type="html">plot of chunk crypto-norm-check-01</media:title>
		</media:content>

		<media:content url="https://catbirdanalytics.wordpress.com/wp-content/uploads/2022/01/crypto-hist-btc-norm-01-1.png" medium="image">
			<media:title type="html">plot of chunk crypto-hist-btc-norm-01</media:title>
		</media:content>

		<media:content url="https://catbirdanalytics.wordpress.com/wp-content/uploads/2022/01/crypto-hist-std-01-1.png" medium="image">
			<media:title type="html">plot of chunk crypto-hist-std-01</media:title>
		</media:content>

		<media:content url="https://catbirdanalytics.wordpress.com/wp-content/uploads/2022/01/crypto-norm-std-01-1.png" medium="image">
			<media:title type="html">plot of chunk crypto-norm-std-01</media:title>
		</media:content>

		<media:content url="https://catbirdanalytics.wordpress.com/wp-content/uploads/2022/01/crypto-pctl-lines-01-1.png" medium="image">
			<media:title type="html">plot of chunk crypto-pctl-lines-01</media:title>
		</media:content>

		<media:content url="https://catbirdanalytics.wordpress.com/wp-content/uploads/2022/01/crypto-all-norm-01-1.png" medium="image">
			<media:title type="html">plot of chunk crypto-all-norm-01</media:title>
		</media:content>

		<media:content url="https://catbirdanalytics.wordpress.com/wp-content/uploads/2022/01/unnamed-chunk-19-1.png" medium="image">
			<media:title type="html">plot of chunk unnamed-chunk-19</media:title>
		</media:content>
	</item>
		<item>
		<title>Dual-Axis Charts: Temptations, Traps, Tips</title>
		<link>https://catbirdanalytics.wordpress.com/2021/12/29/dual-axis-charts-temptations-traps-tips/</link>
					<comments>https://catbirdanalytics.wordpress.com/2021/12/29/dual-axis-charts-temptations-traps-tips/#comments</comments>
		
		<dc:creator><![CDATA[John]]></dc:creator>
		<pubDate>Wed, 29 Dec 2021 22:13:53 +0000</pubDate>
				<category><![CDATA[Data visualization]]></category>
		<category><![CDATA[R Markdown]]></category>
		<category><![CDATA[R Stats]]></category>
		<category><![CDATA[ggplot2]]></category>
		<guid isPermaLink="false">http://catbirdanalytics.wordpress.com/?p=1255</guid>

					<description><![CDATA[The Temptation of Dual-Axis Charts Sometimes when we are trying to show relationships over time between two dimensions of the same metric, or two separate metrics, we run into a situation where differences in the scales involved make it difficult/impossible to really tell what is going on using a standard line chart. One solution we&#8230; <a href="https://catbirdanalytics.wordpress.com/2021/12/29/dual-axis-charts-temptations-traps-tips/" class="more-link">Continue reading <span class="screen-reader-text">Dual-Axis Charts: Temptations, Traps,&#160;Tips</span></a>]]></description>
										<content:encoded><![CDATA[<p><img src="https://catbirdanalytics.wordpress.com/wp-content/uploads/2021/12/btc-ada-thumbnail-10.png?w=730" alt="plot of chunk unnamed-chunk-1"></p>
<h2>The Temptation of Dual-Axis Charts</h2>
<p>Sometimes when we are trying to show relationships over time between two dimensions of the same metric, or two separate metrics, we run into a situation where differences in the scales involved make it difficult or impossible to tell what is going on using a standard line chart. One solution we may be tempted to turn to: a <strong>dual-y-axis</strong> chart, with an axis on the left for the scale that fits one dimension or metric, and a separate scale on the right that fits the other dimension or metric.</p>
<p>Tempting, but risky. This post walks through a couple of scenarios and demonstrates why care is needed to avoid the pitfalls of dual-axis charts. I will follow up with a post that highlights some alternatives to consider instead of giving in to the dual-axis temptation.</p>
<h2>The Trap of Dual-Axis Charts</h2>
<p>The general idea here is that dual-axis charts should be avoided if possible, because they suffer from at least two flaws:</p>
<ol>
<li><strong>Misrepresentation due to mixed scales</strong>: changing relative scales arbitrarily can suggest different conclusions and imply relationships that may not be as strong (or weak) as they appear.</li>
<li><strong>Difficulty in interpretation</strong>: these charts require extra mental effort to untangle the lines and associate them with their respective axes.</li>
</ol>
<h3>What the Experts Say: Avoid</h3>
<p>Data visualization experts generally recommend against dual-axis charts, for reasons similar to those cited above (and sometimes more). For example, in the book <a href="https://amzn.to/3pyJxGe">&#8216;Better Data Visualizations&#8217;*</a>, Jonathan Schwabish includes a section called <strong>&#8216;Avoid Dual-Axis Line Charts&#8217;</strong> that covers similar territory to what is discussed here.</p>
<p><em>(* affiliate link for a book I wholeheartedly recommend for any data viz professional)</em></p>
<p>With that baseline set, let&#8217;s dig into some scenarios and examples to make the point more clearly. As usual, the examples here are produced using R, with the ggplot2 package as the preferred visualization tool.</p>
<h2>Two common scenarios for dual-axis charts</h2>
<p>There are two common scenarios where dual-axis charts become tempting:</p>
<ol>
<li>Comparing trends in two data sets that have <strong>vastly different scales</strong>.</li>
<li>Comparing a <strong>volume metric</strong> with a related <strong>rate or ratio metric</strong> (so, again, vastly different scales).</li>
</ol>
<p>Most charting tools allow for the creation of dual-axis charts. Most relevant for our purposes, this can be done in ggplot2:</p>
<p><a href="https://www.r-graph-gallery.com/line-chart-dual-Y-axis-ggplot2.html">ggplot2 dual-y-axis reference</a></p>
<h2>Scenario 1: Compare trends in similar metrics from two datasets</h2>
<p>Suppose we are interested in cryptocurrencies and are curious about how a cryptocurrency like Cardano, with its token ADA, compares against Bitcoin (BTC).</p>
<p>As always, the visualization choice should be based on <strong>what questions we are trying to answer</strong>, <strong>what we are hoping to learn</strong>, and, ultimately, <strong>what decisions we want to make</strong>.</p>
<p>If we don&#8217;t have a specific objective beyond curiosity and want to start with general exploration, we still need to frame our exploration. Our first thought may be to compare prices over a recent period, to answer questions like:</p>
<ul>
<li>What are the relative changes in the currencies over time?</li>
<li>Do the two follow a similar pattern of ups and downs?</li>
<li>Are there any points where a general pattern breaks? (could provide focus for further investigation)</li>
<li>Eventually: are there ways we can take advantage of these patterns to make investment decisions? (probably beyond the initial scope, but it helps to have that broader perspective)</li>
</ul>
<p>So we gather some price data (Cdn$). Here is a random sample of rows, along with summary data. We can see the two sets of prices are on much different scales.</p>
<table class="table" style="width:auto !important;margin-left:auto;margin-right:auto;">
<thead>
<tr>
<th style="text-align:left;"> date</th>
<th style="text-align:right;"> BTC_CAD</th>
<th style="text-align:right;"> ADA_CAD</th>
</tr>
</thead>
<tbody>
<tr>
<td style="text-align:left;"> 2021-01-10</td>
<td style="text-align:right;"> 48818</td>
<td style="text-align:right;"> 0.388</td>
</tr>
<tr>
<td style="text-align:left;"> 2021-04-03</td>
<td style="text-align:right;"> 72437</td>
<td style="text-align:right;"> 1.475</td>
</tr>
<tr>
<td style="text-align:left;"> 2021-04-17</td>
<td style="text-align:right;"> 75906</td>
<td style="text-align:right;"> 1.732</td>
</tr>
<tr>
<td style="text-align:left;"> 2021-04-27</td>
<td style="text-align:right;"> 68280</td>
<td style="text-align:right;"> 1.622</td>
</tr>
<tr>
<td style="text-align:left;"> 2021-06-20</td>
<td style="text-align:right;"> 44447</td>
<td style="text-align:right;"> 1.779</td>
</tr>
<tr>
<td style="text-align:left;"> 2021-09-07</td>
<td style="text-align:right;"> 59200</td>
<td style="text-align:right;"> 3.165</td>
</tr>
</tbody>
</table>
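<p>To put a number on &#8220;much different scales&#8221;, here is a minimal sketch that rebuilds only the six sampled rows from the table above as a data frame (called <code>crypto_sample</code> here for illustration; the full <code>crypto_data</code> set is not reproduced) and computes the BTC:ADA price ratio for each date. Even within this small sample the ratio ranges from roughly 18,700 to 125,800, which hints that no single scale factor will line the two series up across the whole period.</p>
<pre><code class="r">## the six sampled rows from the table above (prices in CAD)
crypto_sample &lt;- data.frame(
  date    = as.Date(c('2021-01-10', '2021-04-03', '2021-04-17',
                      '2021-04-27', '2021-06-20', '2021-09-07')),
  BTC_CAD = c(48818, 72437, 75906, 68280, 44447, 59200),
  ADA_CAD = c(0.388, 1.475, 1.732, 1.622, 1.779, 3.165)
)

## how many ADA one BTC was worth on each sampled date
crypto_sample$btc_ada_ratio &lt;- crypto_sample$BTC_CAD / crypto_sample$ADA_CAD

summary(crypto_sample[, c('BTC_CAD', 'ADA_CAD')])
round(range(crypto_sample$btc_ada_ratio))
</code></pre>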
<h3>Basic line chart</h3>
<p>The difference in scales is confirmed by a basic line chart produced in ggplot2:</p>
<pre><code class="r">library(ggplot2)
ggplot(crypto_data, aes(x=date)) +
  geom_line(aes(y=BTC_CAD), color='gold') +   # BTC_CAD: tens of thousands
  geom_line(aes(y=ADA_CAD), color='blue')     # ADA_CAD: single digits, so flattened near zero
</code></pre>
<p><img src="https://catbirdanalytics.wordpress.com/wp-content/uploads/2021/12/da-crypto-plot-01-1-7.png?w=730" alt="plot of chunk da-crypto-plot-01"></p>
<p>This shows us the pattern in Bitcoin prices but, due to the difference in scales, doesn&#8217;t help much with any of our questions about comparing patterns between the two currencies: the ADA line is squashed flat near zero.</p>
<h3>Dual-axis option</h3>
<p>A common approach, then, is to use a dual y-axis, with a different scale on each side, so both data sets are visible side by side. This can be a trap if not managed carefully, though. There are two key questions to ask:</p>
<ol>
<li>What should the range of the second axis be?</li>
<li>How can we make the chart as easy as possible for users to interpret, without making them work out which data lines up with which axis?</li>
</ol>
<p>Depending on your tool of choice, you may have a variety of options. For example, the default dual-axis chart in Google Sheets looks like this:</p>
<p><img src="https://catbirdanalytics.wordpress.com/wp-content/uploads/2021/12/btc-vs-cad-gsheets-8.png?w=730" alt="plot of chunk unnamed-chunk-4"></p>
<p>It takes a moment to get oriented, and in this case it depends on knowing that BTC has much higher prices, so it must be on the left axis, with ADA on the right. (Better axis labelling could handle this.)</p>
<p>What I notice here is that once I have settled that the blue line is Bitcoin, I can easily line up its start against 40,000 or so. But as I follow the trend over time and get near the end, my eye gravitates toward the closer axis on the right, to just under 3. Then I realize that doesn&#8217;t make sense and switch back to the left axis for reference, so my brain is spinning a bit. The same happens with the red ADA line: at the start of the period I tend to associate it with the left axis, then have to adjust, look all the way over to the right axis and continue from there.</p>
<p>Once we get through that, the dual-axis chart does provide a way to view the two data sets alongside each other with much more granularity than the previous chart. In terms of the patterns we are looking to discover, it shows a relatively close relationship in trends at some points and divergence at others. Both have trended up over time. Maybe Cardano is more volatile, prone to relatively higher peaks and lower troughs?</p>
<p>Before we go too far with our conclusions, though, there are some further subtleties that we should be aware of.</p>
<h3>Relative Scales Matter</h3>
<p>The Google Sheets chart above uses an automatically selected ratio of 25,000:1 between the two axes. This automatic selection may be suitable in some cases, but not necessarily all.</p>
<p>In R, ggplot2 includes the option to add a secondary y-axis, defined as a transformation of the left (primary) axis. The secondary axis only relabels the primary one, so the second series still has to be rescaled onto the primary scale by hand (the <code>ADA_CAD*transfm</code> term in the code below). This provides flexibility, but also comes with a caution:</p>
<ul>
<li><strong>the choice of relative scales can spin the interpretation of the data in different ways</strong>, as shown in the four charts below, each built with the code shown here but with a different transformation value.</li>
</ul>
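<pre><code class="r">library(ggplot2)
library(dplyr)   # provides the %&gt;% pipe

col_left &lt;- 'darkgoldenrod3'
col_right &lt;- 'blue'

## wrapping the chart in a function keeps each chart's transfm value in its
## own environment; reassigning a global transfm would also change charts
## built earlier, because ggplot2 evaluates aes() only when a plot is drawn
crypto_dual_axis &lt;- function(transfm) {
  ch_title &lt;- paste0('BTC vs ADA (scale ratio: ', transfm,')')
  crypto_data %&gt;% ggplot(aes(x=date)) +
    geom_line(aes(y=BTC_CAD), color=col_left)+
    # sec_axis() only relabels the primary axis, so ADA is rescaled by hand
    geom_line(aes(y=ADA_CAD*transfm), color=col_right)+
    # Customize the Y scales:
    scale_y_continuous(
      # Features of the first axis
      name = "BTC",
      # Add a second axis that undoes the rescaling of ADA
      sec.axis = sec_axis(~./transfm, name="ADA")
    )+labs(title=ch_title, x="")
}

## select a relevant transformation factor
cd1 &lt;- crypto_dual_axis(5000)
</code></pre>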
<pre><code class="r">library(gridExtra)  # provides grid.arrange()
grid.arrange(cd1, cd2, cd3, cd4, nrow=2)
</code></pre>
<p><img src="https://catbirdanalytics.wordpress.com/wp-content/uploads/2021/12/da-plot-crypto-scale-all-1-7.png?w=730" alt="plot of chunk da-plot-crypto-scale-all"></p>
<p><em>Note</em>: <strong>GOLD line = Bitcoin, BLUE line = ADA</strong>. <em>The charts are simplified for this demo; in a more formal presentation, the data-axis match would be made clearer on the charts themselves.</em></p>
<p>In terms of relative scales, the smaller the ratio, the wider the range the secondary axis has to cover, which squeezes the ADA line into less vertical space. The result:</p>
<ul>
<li><strong>top-left</strong>: the lowest ratio used (5000:1) compresses the ADA (blue) line, revealing some patterns but suggesting it is relatively stable compared to BTC (gold) line.</li>
<li><strong>top-right</strong>: 10,000:1; cycling through a few variations shows the impact of different relative scales.</li>
<li><strong>lower-left</strong>: 25,000:1, the same ratio as the Google Sheets example.</li>
<li><strong>lower-right</strong>: by the time we get here, we have flipped the story to where ADA (blue) looks like the volatile one, soaring to great heights and then crashing back down, while BTC (gold) is relatively quiet.</li>
</ul>
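<p>The flip in the story can be reasoned about with simple arithmetic. Because ADA is drawn as <code>ADA_CAD*transfm</code> on the BTC scale, its vertical swing on the chart is its price range multiplied by the transformation factor, while BTC&#8217;s swing is just its price range. The sketch below uses only the six sampled rows from the table earlier in this post (so the numbers are illustrative, not those of the full series) to compare the two swings for the 5,000:1 and 25,000:1 ratios discussed above and a 10,000:1 ratio in between, and to find the &#8220;break-even&#8221; ratio at which both lines swing by the same height.</p>
<pre><code class="r">## prices from the six sampled rows only (illustrative, not the full series)
btc &lt;- c(48818, 72437, 75906, 68280, 44447, 59200)
ada &lt;- c(0.388, 1.475, 1.732, 1.622, 1.779, 3.165)

btc_swing &lt;- diff(range(btc))   # vertical extent of the BTC line
ada_range &lt;- diff(range(ada))   # ADA's price range, before rescaling

ratios &lt;- c(5000, 10000, 25000)
## ADA swing relative to BTC swing on the shared (left) axis:
## below 1, ADA looks calmer than BTC; above 1, ADA looks more volatile
rel_height &lt;- (ada_range * ratios) / btc_swing
names(rel_height) &lt;- ratios
round(rel_height, 2)

## ratio at which both lines appear equally volatile
breakeven &lt;- btc_swing / ada_range
round(breakeven)
</code></pre>
<p>On this sample, ADA&#8217;s swing goes from well under half of BTC&#8217;s at 5,000:1 to more than double it at 25,000:1, with the break-even a little above 11,000. Nothing about the data changes; only the chosen ratio does.</p>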
<p>I&#8217;m not sure there is a clear/correct/easy answer here. It is just a hazard that comes with these charts, something to watch out for when data is presented this way, and a reason to be wary of using dual-axis charts.</p>
<p>The bottom line is that the answers to our questions may vary with the ratio between the two scales, leading us to different conclusions and to take or recommend different actions depending on which view we are looking at, even though it is the <em>same</em> data.</p>
<h4>Ratio by Calculation</h4>
<p>To find a fair, reasonable transformation value, one approach is to calculate an overall ratio between the two datasets rather than picking a number by hand. For example:</p>
<pre><code class="r">transfm &lt;- median(crypto_data$BTC_CAD)/median(crypto_data$ADA_CAD)
</code></pre>
<p>The ratio of median BTC price to median ADA price is 34,242.71, so this could serve as the transformation factor, at least as a starting point. You could 1) use it directly, as in the code below, or 2) use it as a guide toward a nearby round number that is easier to work with. For example, a reasonably close round number in this case is <strong>10,000</strong> (the chart in the top-right of the four charts above). This is much easier for a person to relate to when comparing the two axes, making the chart slightly less daunting.</p>
<pre><code class="r">library(ggplot2)
library(dplyr)   # provides the %&gt;% pipe
library(scales)  # provides dollar_format()

ch_title &lt;- paste0('BTC vs ADA (scale ratio: ', transfm,')')
crypto_data %&gt;% ggplot(aes(x=date)) +
  geom_line(aes(y=BTC_CAD), color=col_left)+
  geom_line(aes(y=ADA_CAD*transfm), color=col_right)+
  # label the lines directly; base R as.Date() converts the strings to dates
  annotate("text", x=as.Date('2021-02-20'), y=75000, label='BTC', color='goldenrod', size=6)+
  annotate('text', x=as.Date('2021-05-26'), y=100000, label='ADA', color='blue', size=6)+
  # Customize the Y scales:
  scale_y_continuous(
    labels=dollar_format(),
    # Features of the first axis
    name = "BTC",
    # Add a second axis and specify its features
    sec.axis = sec_axis(~./transfm, name="ADA")
  )+labs(title=ch_title, x="")
</code></pre>
<p><img src="https://catbirdanalytics.wordpress.com/wp-content/uploads/2021/12/da-plot-crypto-scale-calc-1-7.png?w=730" alt="plot of chunk da-plot-crypto-scale-calc"></p>
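<p>One possible way to turn a calculated ratio into a round number is to drop it to the nearest power of ten at or below it. This is just one rule of thumb, assumed here for illustration (the helper name <code>nice_ratio()</code> is hypothetical); rounding to one significant figure would be another. Applied to the median ratio of 34,242.71 quoted above, it lands on 10,000:</p>
<pre><code class="r">## hypothetical helper: largest power of ten that does not exceed x
nice_ratio &lt;- function(x) {
  stopifnot(is.numeric(x), all(x &gt; 0))
  10^floor(log10(x))
}

nice_ratio(34242.71)   # median BTC/ADA ratio quoted in the text; gives 10,000

## the same calculation on the six sampled rows from the table earlier
btc &lt;- c(48818, 72437, 75906, 68280, 44447, 59200)
ada &lt;- c(0.388, 1.475, 1.732, 1.622, 1.779, 3.165)
sample_ratio &lt;- median(btc) / median(ada)
c(sample_ratio = round(sample_ratio), nice = nice_ratio(sample_ratio))
</code></pre>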
<h3>Conclusion &#8211; Scenario 1</h3>
<p>The takeaway for me here is confirmation of the &#8216;best practice&#8217; warnings to avoid the temptation of dual-axis charts, both because of how easily the story in the data can be manipulated and because of the mental gymnastics required to parse out what that story is.</p>
<p>Let&#8217;s see if this conclusion holds up for another scenario…</p>
<h2>Scenario 2: Comparing a Count and a Ratio</h2>
<p>In other cases, we may want to compare patterns in a volume or count metric with a related key indicator. Here&#8217;s an example using some Google Analytics data for a website:</p>
<ul>
<li>daily users</li>
<li>daily conversion rate</li>
</ul>
<p>Interesting questions with these metrics can include:</p>
<ul>
<li>what is the relationship between patterns in site traffic and conversion rates?</li>
<li>do increases in daily users correspond to decreases in conversion rates or vice versa?</li>
<li>are there any points where breaking of the typical relationship between these metrics warrants further investigation?</li>
</ul>
<p>Quick look at the data:</p>
<table class="table" style="width:auto !important;float:left;margin-right:10px;">
<thead>
<tr>
<th style="text-align:left;"> date</th>
<th style="text-align:right;"> users</th>
<th style="text-align:right;"> conv_rate</th>
</tr>
</thead>
<tbody>
<tr>
<td style="text-align:left;"> 2021-11-13</td>
<td style="text-align:right;"> 538502</td>
<td style="text-align:right;"> 0.026</td>
</tr>
<tr>
<td style="text-align:left;"> 2021-11-14</td>
<td style="text-align:right;"> 483116</td>
<td style="text-align:right;"> 0.019</td>
</tr>
<tr>
<td style="text-align:left;"> 2021-11-15</td>
<td style="text-align:right;"> 465402</td>
<td style="text-align:right;"> 0.018</td>
</tr>
<tr>
<td style="text-align:left;"> 2021-11-30</td>
<td style="text-align:right;"> 448335</td>
<td style="text-align:right;"> 0.015</td>
</tr>
<tr>
<td style="text-align:left;"> 2021-12-02</td>
<td style="text-align:right;"> 541645</td>
<td style="text-align:right;"> 0.012</td>
</tr>
<tr>
<td style="text-align:left;"> 2021-12-03</td>
<td style="text-align:right;"> 482156</td>
<td style="text-align:right;"> 0.013</td>
</tr>
</tbody>
</table>
<p>Again, very different scales &#8211; no surprise there.</p>
<h3>Dual-axis Lines</h3>
<pre><code class="r">library(ggplot2)
library(dplyr)   # provides the %&gt;% pipe
library(scales)  # provides the comma and percent label formatters

transfm &lt;- median(ga_data$users)/median(ga_data$conv_rate)

col_left &lt;- 'blue'
col_right &lt;- 'purple'

ch_title &lt;- "Website users vs conversion rates"
ch_sub &lt;- paste0("(Ratio: ",transfm,")")
ga_data %&gt;% ggplot(aes(x=date)) +
  geom_line(aes(y=users), color=col_left)+
  # conversion rate rescaled by hand onto the users (primary) scale
  geom_line(aes(y=conv_rate*transfm), color=col_right)+
  # Customize the Y scales:
  scale_y_continuous(
    # Features of the first axis
    name = "users", labels=comma,
    # Add a second axis that undoes the rescaling
    sec.axis = sec_axis(~./transfm, name="conversion %", labels=percent)
  )+labs(title=ch_title, subtitle=ch_sub, x="")+
  # color each axis to match its line
  theme(axis.text.y.left = element_text(color=col_left, size=8),
        axis.title.y.left = element_text(color=col_left, size=12),
        axis.text.y.right = element_text(color=col_right, size=8),
        axis.title.y.right = element_text(color=col_right, size=12))
</code></pre>
<p><img src="https://catbirdanalytics.wordpress.com/wp-content/uploads/2021/12/da-plot-ga-1-1-7.png?w=730" alt="plot of chunk da-plot-ga-1"></p>
<p>It looks pretty messy, but there does seem to be some degree of opposite movement in the two metrics. There is also a precipitous drop in conversion rate at the start of the period that is probably worth looking into, especially since there is little change in the volume of users at that point.</p>
<h3>Dual-axis with Bar Chart</h3>
<p>When working with two different types of metrics, one variation that can help bring out the message in the data is to combine a bar chart and a line chart:</p>
<ul>
<li>bar chart to represent count or volume metrics</li>
<li>line chart for ratio or rate metrics</li>
</ul>
<p>In the example below, I have changed the transformation ratio from the <strong>median calculation (~26 million)</strong> to <strong>10 million</strong>. This <em>maybe</em> provides a more intuitive way to interpret the relationship: with a power-of-ten ratio, reading across between the axes is simple arithmetic, since every 100,000 users on the left corresponds to 1% on the right.</p>
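<p>The benefit of a round ratio can be checked with a little arithmetic. The secondary axis is defined as <code>~./transfm</code>, so each left-axis value divided by the ratio gives the conversion rate printed opposite it. The sketch below (base R only; the tick positions and the helper <code>opposite_rate()</code> are hypothetical, chosen for illustration) compares the 10 million ratio with a ratio of roughly 26 million, standing in for the median-based value mentioned above:</p>
<pre><code class="r">## left-axis (users) tick positions, chosen for illustration
user_ticks &lt;- c(100000, 200000, 300000, 400000, 500000)

## hypothetical helper: the conversion rate printed opposite a left-axis tick,
## i.e. the same calculation as sec_axis(~./transfm)
opposite_rate &lt;- function(ticks, transfm) ticks / transfm

data.frame(
  users     = format(user_ticks, big.mark = ",", scientific = FALSE),
  ratio_10M = sprintf("%.2f%%", 100 * opposite_rate(user_ticks, 1e7)),
  ## ~26 million: approximate median-based ratio quoted in the text
  ratio_26M = sprintf("%.2f%%", 100 * opposite_rate(user_ticks, 26e6))
)
</code></pre>
<p>With the 10 million ratio, each 100,000 users lines up with a whole percentage point (1.00% up to 5.00%); with roughly 26 million, the same ticks line up with about 0.38%, 0.77% and so on, which is much harder to read across.</p>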
<pre><code class="r">library(ggplot2)
library(dplyr)   # provides the %&gt;% pipe
library(scales)  # provides the comma and percent label formatters

## median-based ratio (~26 million), kept for reference...
transfm &lt;- median(ga_data$users)/median(ga_data$conv_rate)
## ...then overridden with a round 10 million for easier reading across axes
transfm &lt;- 10000000
col_left &lt;- '#009E73'
col_right &lt;- 'darkblue'

ch_title &lt;- "Website Users vs Conversion Rates"
ga_data %&gt;% ggplot(aes(x=date)) +
  ## change line to bar chart for contrast
  geom_col(aes(y=users), fill=col_left)+
  geom_line(aes(y=conv_rate*transfm), color=col_right, size=1)+
  # Customize the Y scales:
  scale_y_continuous(
    # Features of the first axis
    name = "users", labels=comma,
    # Add a second axis that undoes the rescaling
    sec.axis = sec_axis(~./transfm, name="conversion %", labels=percent)
  )+labs(title=ch_title, x="")+
  theme(axis.text.y.left = element_text(color=col_left, size=10),
        axis.title.y.left = element_text(color=col_left, size=14),
        axis.text.y.right = element_text(color=col_right, size=10),
        axis.title.y.right = element_text(color=col_right, size=14))
</code></pre>
<p><img src="https://catbirdanalytics.wordpress.com/wp-content/uploads/2021/12/da-plot-ga-2-1-7.png?w=730" alt="plot of chunk da-plot-ga-2"></p>
<p>It does seem somewhat easier to untangle the relationship using the bar and line combination, along with the round 10M:1 ratio. From this view, there appears to be no consistent pattern between the two metrics: conversion rate sometimes rises as the user count increases and sometimes drops. That provides some answers to our initial questions about the relationship.</p>
<p>We are still left with the same challenge, initially at least, of wrapping our heads around what is going on here: which axis is which, and what the relative values are.</p>
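<p>The &#8220;no consistent pattern&#8221; reading can also be checked numerically rather than by eye. Below is a minimal sketch with a hypothetical helper, <code>co_movement()</code>, that compares the direction of successive changes in the two metrics and counts how often they move together versus apart. It is run here on the six sampled rows from the table above purely as a demonstration; since those rows are not consecutive days, a real check would use the full daily <code>ga_data</code>.</p>
<pre><code class="r">## hypothetical helper: count same-direction vs opposite-direction moves
## between consecutive rows (data must be sorted by date)
co_movement &lt;- function(users, conv_rate) {
  d_users &lt;- sign(diff(users))
  d_conv  &lt;- sign(diff(conv_rate))
  moved   &lt;- d_users != 0 &amp; d_conv != 0   # ignore flat steps
  c(same     = sum(d_users[moved] == d_conv[moved]),
    opposite = sum(d_users[moved] != d_conv[moved]))
}

## the six sampled rows from the table above (not consecutive days)
ga_sample &lt;- data.frame(
  date      = as.Date(c('2021-11-13', '2021-11-14', '2021-11-15',
                        '2021-11-30', '2021-12-02', '2021-12-03')),
  users     = c(538502, 483116, 465402, 448335, 541645, 482156),
  conv_rate = c(0.026, 0.019, 0.018, 0.015, 0.012, 0.013)
)
ga_sample &lt;- ga_sample[order(ga_sample$date), ]
co_movement(ga_sample$users, ga_sample$conv_rate)
</code></pre>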
<h3>Conclusion &#8211; Scenario 2</h3>
<p>This second scenario confirms that even for a different use case, dual-axis charts are problematic. The competing scales are still an issue; switching to a combination of bar and line may help, but doesn&#8217;t remove all the problems.</p>
<h2>Tips</h2>
<p>As the above examples highlight, if you are unable to resist the temptation to display your data in a dual-axis chart, there are some things you can do to make it as easy as possible for users to extract meaning:</p>
<ul>
<li>make sure both axes are carefully and clearly labelled, with color or other signals to associate the axis to the respective data.</li>
<li>consider mixing bars (volumes) and lines (ratios), although don&#8217;t rely on this to solve the major problems.</li>
<li>be responsible: don&#8217;t contort scales to fit your pre-defined message or beliefs or hopes or other biases.</li>
<li>if feasible, make the ratio between the two axes a power of 10 (such as 10,000:1), since it is easy for people to translate between the two scales.</li>
<li>for full transparency, disclose the ratio between the two scales that is being used.</li>
</ul>
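<p>The last tip, disclosing the ratio, can be wrapped in a small helper. This is a sketch only: <code>ratio_subtitle()</code> is a hypothetical name, and it assumes the ratio is applied as in the code above (right axis = left axis divided by the ratio). It returns a subtitle string that can be passed to <code>labs(subtitle = ...)</code>:</p>
<pre><code class="r">## hypothetical helper: build a subtitle disclosing the axis ratio
ratio_subtitle &lt;- function(transfm, left_name, right_name) {
  ratio_txt &lt;- format(transfm, big.mark = ",", scientific = FALSE)
  paste0(right_name, " axis = ", left_name, " axis / ", ratio_txt,
         " (ratio ", ratio_txt, ":1)")
}

## pass the result to labs(subtitle = ...) when building the chart
ratio_subtitle(1e7, "users", "conversion %")
</code></pre>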
<h2>Alternatives to Dual-Axis Charts</h2>
<p>On the other hand, if we are to resist the temptation of dual-axis charts, what are the alternatives? I&#8217;ll share my thoughts &#8211; and examples &#8211; in the next blog post.</p>
]]></content:encoded>
					
					<wfw:commentRss>https://catbirdanalytics.wordpress.com/2021/12/29/dual-axis-charts-temptations-traps-tips/feed/</wfw:commentRss>
			<slash:comments>1</slash:comments>
		
		
		<post-id xmlns="com-wordpress:feed-additions:1">1255</post-id>
		<media:thumbnail url="https://catbirdanalytics.wordpress.com/wp-content/uploads/2021/12/btc-ada-thumbnail-9.png" />
		<media:content url="https://catbirdanalytics.wordpress.com/wp-content/uploads/2021/12/btc-ada-thumbnail-9.png" medium="image">
			<media:title type="html">btc-ada-thumbnail.png</media:title>
		</media:content>

		<media:content url="https://0.gravatar.com/avatar/61a7e5aa4e24773b513b314f89b5611e80124a6e923e7234e89ca72a387b08d1?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">john</media:title>
		</media:content>

		<media:content url="https://catbirdanalytics.wordpress.com/wp-content/uploads/2021/12/btc-ada-thumbnail-10.png" medium="image">
			<media:title type="html">plot of chunk unnamed-chunk-1</media:title>
		</media:content>

		<media:content url="https://catbirdanalytics.wordpress.com/wp-content/uploads/2021/12/da-crypto-plot-01-1-7.png" medium="image">
			<media:title type="html">plot of chunk da-crypto-plot-01</media:title>
		</media:content>

		<media:content url="https://catbirdanalytics.wordpress.com/wp-content/uploads/2021/12/btc-vs-cad-gsheets-8.png" medium="image">
			<media:title type="html">plot of chunk unnamed-chunk-4</media:title>
		</media:content>

		<media:content url="https://catbirdanalytics.wordpress.com/wp-content/uploads/2021/12/da-plot-crypto-scale-all-1-7.png" medium="image">
			<media:title type="html">plot of chunk da-plot-crypto-scale-all</media:title>
		</media:content>

		<media:content url="https://catbirdanalytics.wordpress.com/wp-content/uploads/2021/12/da-plot-crypto-scale-calc-1-7.png" medium="image">
			<media:title type="html">plot of chunk da-plot-crypto-scale-calc</media:title>
		</media:content>

		<media:content url="https://catbirdanalytics.wordpress.com/wp-content/uploads/2021/12/da-plot-ga-1-1-7.png" medium="image">
			<media:title type="html">plot of chunk da-plot-ga-1</media:title>
		</media:content>

		<media:content url="https://catbirdanalytics.wordpress.com/wp-content/uploads/2021/12/da-plot-ga-2-1-7.png" medium="image">
			<media:title type="html">plot of chunk da-plot-ga-2</media:title>
		</media:content>
	</item>
		<item>
		<title>Don&#8217;t Underestimate Durability along with Scalability in Analytics Practice</title>
		<link>https://catbirdanalytics.wordpress.com/2021/11/01/dont-underestimate-durability-along-with-scalability-in-analytics-practice/</link>
					<comments>https://catbirdanalytics.wordpress.com/2021/11/01/dont-underestimate-durability-along-with-scalability-in-analytics-practice/#respond</comments>
		
		<dc:creator><![CDATA[John]]></dc:creator>
		<pubDate>Mon, 01 Nov 2021 14:50:00 +0000</pubDate>
				<category><![CDATA[Analytics Management]]></category>
		<category><![CDATA[rstats]]></category>
		<guid isPermaLink="false">http://catbirdanalytics.wordpress.com/?p=1171</guid>

					<description><![CDATA[One of the things our team has been working on over the past few years is improving the durability of our work. People often talk about the importance of scalability, and rightly so: scale of data and customer reach is the most obvious source of both opportunity and impact. While the benefits of scalability are&#8230; <a href="https://catbirdanalytics.wordpress.com/2021/11/01/dont-underestimate-durability-along-with-scalability-in-analytics-practice/" class="more-link">Continue reading <span class="screen-reader-text">Don&#8217;t Underestimate Durability along with Scalability in Analytics&#160;Practice</span></a>]]></description>
										<content:encoded><![CDATA[
<p class="wp-block-paragraph">One of the things our team has been working on over the past few years is improving the <strong>durability</strong> of our work. People often talk about the importance of <strong>scalability</strong>, and rightly so: scale of data and customer reach is the most obvious source of both opportunity and impact. While the benefits of scalability are generally understood, I see durability as a related quality that is less understood but even more fundamental. It has benefits of its own and is often a prerequisite to scalability.</p>



<p class="wp-block-paragraph">I get interesting answers &#8211; or just confused looks &#8211; when I ask job candidates how they build durability and scalability into their work, and this often helps me separate the newbies from the more experienced analysts. That&#8217;s another reason I thought it would be worth discussing.</p>



<p class="wp-block-paragraph">First, let&#8217;s get clear on how we&#8217;re using these terms: What are we actually talking about when we say a project is &#8216;<strong>scalable</strong>&#8217; or &#8216;<strong>durable</strong>&#8217;?</p>



<p class="wp-block-paragraph"><a href="https://www.merriam-webster.com/dictionary/scalable">Scalable definition</a>, according to Merriam-Webster:</p>



<figure class="wp-block-image size-full is-resized"><img data-attachment-id="1173" data-permalink="https://catbirdanalytics.wordpress.com/scalability-defn/" data-orig-file="https://catbirdanalytics.wordpress.com/wp-content/uploads/2021/10/scalability-defn.png" data-orig-size="2116,1072" data-comments-opened="1" data-image-meta="{&quot;aperture&quot;:&quot;0&quot;,&quot;credit&quot;:&quot;&quot;,&quot;camera&quot;:&quot;&quot;,&quot;caption&quot;:&quot;&quot;,&quot;created_timestamp&quot;:&quot;0&quot;,&quot;copyright&quot;:&quot;&quot;,&quot;focal_length&quot;:&quot;0&quot;,&quot;iso&quot;:&quot;0&quot;,&quot;shutter_speed&quot;:&quot;0&quot;,&quot;title&quot;:&quot;&quot;,&quot;orientation&quot;:&quot;0&quot;}" data-image-title="scalability-defn" data-image-description="" data-image-caption="" data-medium-file="https://catbirdanalytics.wordpress.com/wp-content/uploads/2021/10/scalability-defn.png?w=300" data-large-file="https://catbirdanalytics.wordpress.com/wp-content/uploads/2021/10/scalability-defn.png?w=730" src="https://catbirdanalytics.wordpress.com/wp-content/uploads/2021/10/scalability-defn.png" alt="scalable - capable of being easily expanded or upgraded on demand." class="wp-image-1173" width="400" /></figure>



<p class="wp-block-paragraph">OK, we get that. Now, the <a href="https://www.merriam-webster.com/dictionary/durable"><strong>durable definition</strong></a>, also according to Merriam-Webster:</p>



<figure class="wp-block-image size-full is-resized"><img data-attachment-id="1175" data-permalink="https://catbirdanalytics.wordpress.com/durability-defn/" data-orig-file="https://catbirdanalytics.wordpress.com/wp-content/uploads/2021/10/durability-defn.png" data-orig-size="2412,844" data-comments-opened="1" data-image-meta="{&quot;aperture&quot;:&quot;0&quot;,&quot;credit&quot;:&quot;&quot;,&quot;camera&quot;:&quot;&quot;,&quot;caption&quot;:&quot;&quot;,&quot;created_timestamp&quot;:&quot;0&quot;,&quot;copyright&quot;:&quot;&quot;,&quot;focal_length&quot;:&quot;0&quot;,&quot;iso&quot;:&quot;0&quot;,&quot;shutter_speed&quot;:&quot;0&quot;,&quot;title&quot;:&quot;&quot;,&quot;orientation&quot;:&quot;0&quot;}" data-image-title="durability-defn" data-image-description="" data-image-caption="" data-medium-file="https://catbirdanalytics.wordpress.com/wp-content/uploads/2021/10/durability-defn.png?w=300" data-large-file="https://catbirdanalytics.wordpress.com/wp-content/uploads/2021/10/durability-defn.png?w=730" src="https://catbirdanalytics.wordpress.com/wp-content/uploads/2021/10/durability-defn.png" alt="durability - exist " class="wp-image-1175" width="480" /></figure>



<p class="wp-block-paragraph">I like to think of the difference as boiling down to two dimensions:</p>



<ul class="wp-block-list"><li>durability = time</li><li>scalability = size</li></ul>



<figure class="wp-block-image size-large is-resized"><img data-attachment-id="1198" data-permalink="https://catbirdanalytics.wordpress.com/durable-vs-scalable-chart/" data-orig-file="https://catbirdanalytics.wordpress.com/wp-content/uploads/2021/10/durable-vs-scalable-chart.png" data-orig-size="830,574" data-comments-opened="1" data-image-meta="{&quot;aperture&quot;:&quot;0&quot;,&quot;credit&quot;:&quot;&quot;,&quot;camera&quot;:&quot;&quot;,&quot;caption&quot;:&quot;&quot;,&quot;created_timestamp&quot;:&quot;0&quot;,&quot;copyright&quot;:&quot;&quot;,&quot;focal_length&quot;:&quot;0&quot;,&quot;iso&quot;:&quot;0&quot;,&quot;shutter_speed&quot;:&quot;0&quot;,&quot;title&quot;:&quot;&quot;,&quot;orientation&quot;:&quot;0&quot;}" data-image-title="durable-vs-scalable-chart" data-image-description="" data-image-caption="" data-medium-file="https://catbirdanalytics.wordpress.com/wp-content/uploads/2021/10/durable-vs-scalable-chart.png?w=300" data-large-file="https://catbirdanalytics.wordpress.com/wp-content/uploads/2021/10/durable-vs-scalable-chart.png?w=730" src="https://catbirdanalytics.wordpress.com/wp-content/uploads/2021/10/durable-vs-scalable-chart.png?w=830" alt="" class="wp-image-1198" width="415" height="287" srcset="https://catbirdanalytics.wordpress.com/wp-content/uploads/2021/10/durable-vs-scalable-chart.png?w=415 415w, https://catbirdanalytics.wordpress.com/wp-content/uploads/2021/10/durable-vs-scalable-chart.png?w=150 150w, https://catbirdanalytics.wordpress.com/wp-content/uploads/2021/10/durable-vs-scalable-chart.png?w=300 300w, https://catbirdanalytics.wordpress.com/wp-content/uploads/2021/10/durable-vs-scalable-chart.png?w=768 768w, https://catbirdanalytics.wordpress.com/wp-content/uploads/2021/10/durable-vs-scalable-chart.png 830w" sizes="(max-width: 415px) 100vw, 415px" /></figure>



<p class="wp-block-paragraph">The idea here is that we should think not only about scale but also about durability when we design, structure and execute our analytics work. This creates additional opportunities for impact and pays dividends over time, especially for projects where scale is less of a factor, which is often the case in analytics work:</p>



<ul class="wp-block-list"><li><strong>smaller projects</strong> that may be repeated in future, grow with follow-up requests, or need to be reproduced to confirm results.</li><li>projects that may benefit from <strong>collaboration or hand-off</strong> to other team members.</li><li>work that can be <strong>re-used or repurposed</strong> down the road to answer similar questions.</li><li>work that helps to build a <strong>knowledge repository</strong> within the group to expand upon (rather than having to start from scratch).</li><li>work that supports <strong>automated</strong> processes.</li><li>work that may form the MVP (minimum viable product) for a larger-scale project further down the road.</li></ul>



<p class="wp-block-paragraph">The end result of incorporating durability into your analytics practice will include:</p>



<ul class="wp-block-list"><li><strong>Quicker turn-around time:</strong> using accessible, repeatable processes.</li><li><strong>Better standardization and consistency</strong> of deliverables: leveraging established structures, with less need to re-think or redesign from scratch.</li><li><strong>Increased reliability</strong> of results and better quality control: better reproducibility, less opportunity for manual errors, stray formulas/references or overwritten values.</li><li><strong>Improved clarity of communication</strong>: teammates, partners and stakeholders can more easily follow your analytical logic.</li><li><strong>Easier collaboration</strong> within the team: smoother transfer of work from one colleague to another, easier combination of work from different teammates, and opportunities to build on each other&#8217;s work.</li><li><strong>Progressive accumulation of capabilities</strong> and impact over time: building an expanding practice, with less running in circles.</li></ul>



<p class="wp-block-paragraph">Ok, so how do we get there? The most obvious and critical step to increasing durability in your work:</p>



<ul class="wp-block-list"><li><strong>ditch those spreadsheets</strong> for R or other statistical processing software of your choosing!</li></ul>



<p class="wp-block-paragraph">Plenty has been written and said about the advantages of R over spreadsheets, so I&#8217;m not going to rehash that here, other than to focus on one specific and significant advantage: durability.</p>



<p class="wp-block-paragraph">Even if we have R skills, spreadsheets can be sooo tempting: we can quickly throw some data together, do some manual manipulation if necessary, create some pivot tables, build some easy charts or attractive tables, send an email to stakeholders, and move on. When we get the follow-up question that requires a change of parameters or additional data, or need to walk a teammate through it, or get a similar question 3 months later&#8230;that&#8217;s where things fall apart. Yes, spreadsheets have their place, but these and other scenarios demonstrate the point: <strong>with spreadsheets, you will never achieve as much durability as you can with R &#8211; and that durability has significant value</strong>.</p>



<p class="wp-block-paragraph">Once you commit to moving your practice to R, it opens up all kinds of opportunities for increased durability:</p>



<ul class="wp-block-list"><li><strong>Organizing your code</strong> in a logical, easy-to-follow flow that can be understood and adapted over time.</li><li>Using <strong>re-usable methods</strong> that can be easily modified as needed, such as using variables as often as possible rather than hard-coding values.</li><li><strong>Commenting</strong> your code for additional understanding of &#8216;why&#8217; certain choices were made.</li><li>Using <strong>version control</strong> (GitHub, GitLab, etc.) even for small, individual projects to build up a coherent, accessible repository.</li><li>Using <strong>R Markdown</strong> to integrate reporting and analysis with data processing for even greater logical flow, ease of updating and transparency.</li><li><strong>Extending the workflow</strong> to integrated slide decks, flexdashboards, Shiny, etc.</li><li><strong>Providing complete source info</strong> within any end products, so their origin can be traced by anyone.</li><li><strong>Sharing links to the end product</strong> stored in a standard location/format, rather than dropping tables or charts into email.</li></ul>
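<p class="wp-block-paragraph">As a small illustration of the &#8216;variables rather than hard-coded values&#8217; and &#8216;complete source info&#8217; points, here is a minimal R sketch. The params list, the toy sessions data and the source description are all hypothetical stand-ins for a real project; the idea is that dates are set once and reused both for filtering and for the source note attached to any output.</p>

```r
## Minimal sketch with hypothetical names and toy data:
## set parameters once instead of hard-coding them throughout the script
params <- list(
  start_date = as.Date("2021-07-01"),
  end_date   = as.Date("2021-09-30"),
  source     = "web analytics export (hypothetical)"
)

## toy daily data standing in for a real data source
sessions <- data.frame(
  date     = seq(as.Date("2021-06-01"), as.Date("2021-10-31"), by = "day"),
  sessions = 100
)

## filter with the parameters, not with dates typed into the code
in_range <- sessions$date >= params$start_date & sessions$date <= params$end_date
total_sessions <- sum(sessions$sessions[in_range])

## source note built from the same parameters, for any chart or table produced
source_note <- paste0("Source: ", params$source, "; period: ",
                      format(params$start_date), " to ", format(params$end_date))
source_note
```

<p class="wp-block-paragraph">A follow-up request for a different period then means editing the two dates in params and re-running, with the source note updating automatically.</p>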



<p class="wp-block-paragraph">Of course, there are other practices along these lines, but you get the idea &#8211; the important thing is the <strong>&#8216;durability&#8217; mindset</strong>.</p>



<p class="wp-block-paragraph">Pretty much all of the above also <strong>contributes to scalability</strong>. On this foundation, it becomes easy to add scalability features, such as:</p>



<ul class="wp-block-list"><li>writing custom functions to avoid duplicating code.</li><li>combining components from smaller projects to meet the demands of larger ones, since the components are well organized and documented.</li><li>building on pre-existing components and incorporating them into machine learning models and workflows.</li></ul>



<p class="wp-block-paragraph">Ultimately, durability and scalability are intertwined. The point is not to draw a hard distinction but to promote the idea that even without the need for scalability, durability still matters &#8211; and can make a huge difference to your practice and to your impact on the business over the long term, both individually and as a team.</p>
]]></content:encoded>
					
					<wfw:commentRss>https://catbirdanalytics.wordpress.com/2021/11/01/dont-underestimate-durability-along-with-scalability-in-analytics-practice/feed/</wfw:commentRss>
			<slash:comments>0</slash:comments>
		
		
		<post-id xmlns="com-wordpress:feed-additions:1">1171</post-id>
		<media:thumbnail url="https://catbirdanalytics.wordpress.com/wp-content/uploads/2021/10/durability-defn.png" />
		<media:content url="https://catbirdanalytics.wordpress.com/wp-content/uploads/2021/10/durability-defn.png" medium="image">
			<media:title type="html">durability-defn</media:title>
		</media:content>

		<media:content url="https://0.gravatar.com/avatar/61a7e5aa4e24773b513b314f89b5611e80124a6e923e7234e89ca72a387b08d1?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">john</media:title>
		</media:content>

		<media:content url="https://catbirdanalytics.wordpress.com/wp-content/uploads/2021/10/scalability-defn.png" medium="image">
			<media:title type="html">scalable - capable of being easily expanded or upgraded on demand.</media:title>
		</media:content>

		<media:content url="https://catbirdanalytics.wordpress.com/wp-content/uploads/2021/10/durable-vs-scalable-chart.png?w=830" medium="image" />
	</item>
		<item>
		<title>Data Doesn&#8217;t Tell Stories: You Do&#8230;Respect the Data When You Do</title>
		<link>https://catbirdanalytics.wordpress.com/2021/09/19/data-doesnt-tell-stories-you-do-respect-the-data-when-you-do/</link>
					<comments>https://catbirdanalytics.wordpress.com/2021/09/19/data-doesnt-tell-stories-you-do-respect-the-data-when-you-do/#respond</comments>
		
		<dc:creator><![CDATA[John]]></dc:creator>
		<pubDate>Mon, 20 Sep 2021 05:27:54 +0000</pubDate>
				<category><![CDATA[Analytics Management]]></category>
		<category><![CDATA[analytics]]></category>
		<guid isPermaLink="false">http://catbirdanalytics.wordpress.com/?p=1145</guid>

					<description><![CDATA[(full disclosure: this article has affiliate links to books that I admire and highly recommend) We all know that data is only a tool, a resource, a means to an end, not an end in itself. It&#8217;s critical that, as analysts, we are able to not just assemble the data but also interpret it responsibly for&#8230; <a href="https://catbirdanalytics.wordpress.com/2021/09/19/data-doesnt-tell-stories-you-do-respect-the-data-when-you-do/" class="more-link">Continue reading <span class="screen-reader-text">Data Doesn&#8217;t Tell Stories: You Do&#8230;Respect the Data When You&#160;Do</span></a>]]></description>
										<content:encoded><![CDATA[
<p class="has-small-font-size wp-block-paragraph">(<em>full disclosure: this article has affiliate links to books that I admire and highly recommend</em>)</p>



<p class="wp-block-paragraph">We all know that data is only a tool, a resource, a means to an end, not an end in itself. It&#8217;s critical that, as analysts, we are able not just to assemble the data but also to interpret it responsibly for our audiences. The best books I know for explaining how to do this effectively are <a rel="noreferrer noopener" target="_blank" href="https://www.amazon.ca/gp/product/1119002257/ref=as_li_tl?ie=UTF8&amp;camp=15121&amp;creative=330641&amp;creativeASIN=1119002257&amp;linkCode=as2&amp;tag=johnyuill-20&amp;linkId=2629141b8f8023776a06648cdef44e85">Storytelling with Data: A Data Visualization Guide for Business Professionals</a> by Cole Nussbaumer Knaflic, from the more technical data-visualization side, and Nancy Duarte&#8217;s <a rel="noreferrer noopener" target="_blank" href="https://www.amazon.ca/gp/product/1940858984/ref=as_li_tl?ie=UTF8&amp;camp=15121&amp;creative=330641&amp;creativeASIN=1940858984&amp;linkCode=as2&amp;tag=johnyuill-20&amp;linkId=85e51884182f0a77de6d873d6a90a259">DataStory: Explain Data and Inspire Action Through Story</a>, focusing on presentation structure.</p>



<p class="wp-block-paragraph">We also all know that humans have strong biases and pattern-recognition abilities that can cause them to read a story in the data that may not actually be there, or at least to stretch the story beyond what the data supports. This is something we have to remain vigilant against. It&#8217;s one of the reasons <a rel="noreferrer noopener" target="_blank" href="https://www.amazon.ca/gp/product/0735217556/ref=as_li_tl?ie=UTF8&amp;camp=15121&amp;creative=330641&amp;creativeASIN=0735217556&amp;linkCode=as2&amp;tag=johnyuill-20&amp;linkId=8844f472952ccfacfdec912ed58828ce">The Scout Mindset: Why Some People See Things Clearly and Others Don&#8217;t</a> is such a powerful book.</p>



<p class="wp-block-paragraph">Take, for example, a recent article about the cryptocurrency Cardano. The key takeaway was summed up concisely in the title, &#8220;<a rel="noreferrer noopener" href="https://www.entrepreneur.com/article/385982" target="_blank">If You Day-Trade Cardano, It&#8217;s Best to Buy on Thursday</a>&#8221;, and the article presents data supporting this recommendation. All of which is potentially valuable information.</p>



<p class="wp-block-paragraph">Before even getting to the data, though, there are some sharp turns into more speculative territory that are not supported by data and don&#8217;t appear to stand up well to logic. The author notes the well-known tendency for Bitcoin prices to be more volatile on weekends than on weekdays, which can create weekend buying opportunities, at least for short-term gains. This is attributed to an <a href="https://in.investing.com/news/why-do-crypto-prices-fall-on-the-weekend-2762109">Investing.com explanation</a> that lower trading volume on weekends means trades of &#8216;weekday&#8217; size move prices more on the weekend. OK, that makes sense and is verifiable with data.</p>



<p class="wp-block-paragraph">It is the next step &#8211; more of a leap, really &#8211; that goes too far: the author quotes the Investing.com article&#8217;s claim that weekend price volatility is accentuated by the fact that <strong>banks are closed on the weekend</strong>. </p>



<blockquote class="wp-block-quote is-layout-flow wp-block-quote-is-layout-flow"><p>&#8220;Banks are closed over the weekend and so investors are unable to add more money into their accounts&#8230;.They are trapped if the price falls over the weekend and&#8230;unable to make a profit if prices surge&#8230;&#8221;</p><cite>&#8211; <a href="https://in.investing.com/news/why-do-crypto-prices-fall-on-the-weekend-2762109">Investing.com</a> </cite></blockquote>



<p class="wp-block-paragraph">Say what, now? Even though I&#8217;m not the most sophisticated person when it comes to finance, it has been a <em>very long time since I visited a bank</em> any day of the week for anything other than a cash machine. So I doubt that people actively participating in the world of DeFi, whose core purpose is to disrupt traditional financial institutions, are somehow held back from trading crypto and &#8216;trapped&#8217; by reliance on physical bank branches.</p>



<p class="wp-block-paragraph"><em>Side note: it&#8217;s not just me &#8211; the comments on the Investing.com article highlight the ridiculousness of this theory. </em></p>



<p class="wp-block-paragraph">The Cardano article goes on to make the case that there are &#8220;different trading behaviours between Bitcoin and up-and-coming altcoins&#8221;, which is laid out convincingly with data. Again, though, we leap beyond what is in the data.</p>



<p class="wp-block-paragraph">These data-based insights are then combined with the questionable &#8220;banks are closed on weekends&#8221; theory to cast Cardano as the &#8220;people&#8217;s cryptocurrency&#8221;, leading to the declaration that &#8220;Cardano is a &#8216;Blue-Collar&#8217; Crypto&#8221;.</p>



<blockquote class="wp-block-quote is-layout-flow wp-block-quote-is-layout-flow"><p>&#8220;It&#8217;s the regular folks, those with nine-to-five job, that are buying ADA.&#8221; (Cardano coin)</p><cite>&#8211; <a href="https://www.entrepreneur.com/article/385982">If you Day-Trade Cardano, It&#8217;s Best to Buy on Thursday</a></cite></blockquote>



<p class="wp-block-paragraph">Well, maybe. But how would the author know that? How can we get there from the data? What other evidence supports this?</p>



<p class="wp-block-paragraph">Further down, the author admits &#8216;<strong>it is not entirely clear</strong>&#8217; what causes the price patterns observed in the data and that his proposed explanation is an &#8216;<strong>opinion</strong>&#8217;. Fair enough, but that is well after the theories are rolled out with an air of certainty that is just not there.</p>



<p class="wp-block-paragraph">And the thing is: so what? What difference does it make? The data shows that there is, at least for now, a probabilistic &#8216;Thursday&#8217; advantage to buying Cardano that we can profit from. Why do we need to invent reasons why this is the case? The human need for causal relationships is powerful, indeed.</p>



<p class="wp-block-paragraph">Stories can be effective for turning data into knowledge and turning that knowledge into action. But as analysts, I believe we need to be extra careful that those stories are supported by data. I work in an exceptionally creative organization, where there is no shortage of storytellers with theories and opinions about what is causing all manner of outcomes. It makes for a vibrant culture and thought-provoking discussion. The way I see it, the reason we are invited to the table as data analysts is to speak for the data, to weave it into stories that bring the data to life, but to resist the very human temptation to go beyond. It can be a fine line, but our credibility depends on it.</p>
]]></content:encoded>
					
					<wfw:commentRss>https://catbirdanalytics.wordpress.com/2021/09/19/data-doesnt-tell-stories-you-do-respect-the-data-when-you-do/feed/</wfw:commentRss>
			<slash:comments>0</slash:comments>
		
		
		<post-id xmlns="com-wordpress:feed-additions:1">1145</post-id>
		<media:content url="https://0.gravatar.com/avatar/61a7e5aa4e24773b513b314f89b5611e80124a6e923e7234e89ca72a387b08d1?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">john</media:title>
		</media:content>
	</item>
		<item>
		<title>Google Trends + R: Leverage gtrendsR Package for More Powerful Search Trend Analytics</title>
		<link>https://catbirdanalytics.wordpress.com/2021/08/29/google-trends-r-leverage-gtrendsr-package-for-more-powerful-analytics/</link>
					<comments>https://catbirdanalytics.wordpress.com/2021/08/29/google-trends-r-leverage-gtrendsr-package-for-more-powerful-analytics/#comments</comments>
		
		<dc:creator><![CDATA[John]]></dc:creator>
		<pubDate>Sun, 29 Aug 2021 17:11:50 +0000</pubDate>
				<category><![CDATA[R Markdown]]></category>
		<category><![CDATA[R Stats]]></category>
		<guid isPermaLink="false">http://catbirdanalytics.wordpress.com/?p=1131</guid>

					<description><![CDATA[Google Trends is a popular tool for all manner of curiosity related to trends in search activity on the Google search engine: what topics are trending? what are trends for a given search term? how does this compare to other terms? what related terms are people using? how does interest vary by region? And lots&#8230; <a href="https://catbirdanalytics.wordpress.com/2021/08/29/google-trends-r-leverage-gtrendsr-package-for-more-powerful-analytics/" class="more-link">Continue reading <span class="screen-reader-text">Google Trends + R: Leverage gtrendsR Package for More Powerful Search Trend&#160;Analytics</span></a>]]></description>
										<content:encoded><![CDATA[<p><a href="https://trends.google.com/trends/">Google Trends</a> is a popular tool for all manner of curiosity related to trends in search activity on the Google search engine:</p>
<ul>
<li>what topics are trending?</li>
<li>what are trends for a given search term?</li>
<li>how does this compare to other terms?</li>
<li>what related terms are people using?</li>
<li>how does interest vary by region?</li>
</ul>
<p>And lots more.</p>
<p>Google Trends has recently passed its 15th birthday, prompting a Google blog post on <a href="https://blog.google/products/search/15-tips-getting-most-out-google-trends/">“15 Tips for Getting the Most Out of Google Trends”</a>. One thing they noted right at the start is that they used Google Trends to identify search queries related to Google Trends in order to prioritize content for a blog post. This is a classic use case &#8211; <strong>using Google Trends to fuel content decisions for your marketing</strong>.</p>
<p>An important point is that Google Trends does not represent or translate to an <em>actual</em> number of searches. The search interest presented is <strong>indexed between 0 and 100</strong>, where <strong>100 indicates the peak search interest during the particular date range</strong> reported on. Everything is <em>relative</em> in Google Trends.</p>
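<p>To make the &#8216;relative&#8217; point concrete, here is a small R sketch of rescaling a series so that its peak becomes 100. The numbers are made up, and this only illustrates the 0-100 indexing idea: Google also divides search counts by total searches for the time and place before scaling, which this sketch does not attempt to reproduce.</p>

```r
## Illustration only: rescale made-up values so the peak equals 100,
## mirroring the 0-100 relative index (not Google's full method)
index_to_peak <- function(x) {
  round(100 * x / max(x, na.rm = TRUE))
}

weekly_interest <- c(120, 300, 450, 225, 90)  # hypothetical values
index_to_peak(weekly_interest)
# → 27 67 100 50 20
```

<p>Because every value is relative to the peak of the selected date range, changing the date range can change every value in the series, even for dates that appear in both ranges.</p>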
<h2>Going Beyond the Interface</h2>
<p><img src="https://catbirdanalytics.wordpress.com/wp-content/uploads/2021/08/google-trends-interface-page-crypto-7.png" alt="Google Trends interface page for a cryptocurrency search"></p>
<p>Google Trends is a convenient, intuitive interface packed with info that can be great for playing around with, but it comes with limitations for tracking trends over time, sharing with others and analyzing more deeply. You can <strong>export</strong> to a spreadsheet, but this comes with inefficiencies as well: you have to export data for each component in the interface separately and you may lose source information. And if you want to make any adjustments (change date range, geography, category, add terms) you have to go back in there and repeat the process.</p>
<p><strong>Using R to work with Google Trends</strong> can provide a more efficient solution if you want to:</p>
<ul>
<li>quickly <strong>import</strong> Google Trends data into R for further analysis.</li>
<li>grab <strong>all the modules</strong> from the Google Trends interface at once (interest over time, geo data, related topics, related queries).</li>
<li><strong>repeat</strong> Google Trends reporting to monitor trends over time.</li>
<li><strong>reproduce</strong> the same Google Trends data in future, based on detailed record of your query.</li>
<li><strong>integrate</strong> Google Trends reporting with other datasets or reporting structures.</li>
</ul>
<h2>Google Trends in R</h2>
<p>Google does not provide an official API for Google Trends, but the <a href="https://cran.r-project.org/web/packages/gtrendsR/gtrendsR.pdf"><strong>gtrendsR</strong></a> R package created by Philippe Massicotte is a major help in accessing Google Trends data within R for reporting and analysis. Along with all the benefits of using R to process and analyze data, the gtrendsR package provides some big advantages over using the Google Trends interface:</p>
<ul>
<li><strong>Durability</strong>: you don&#8217;t have to go to the interface and fetch the data each time; you have an ongoing reference with source info, and code that can be referred to, re-used, and shared with others.</li>
<li><strong>Scalability</strong>: you can expand on existing queries, combining them to go beyond the interface&#8217;s limit of 5 terms per comparison.</li>
</ul>
<h2>Google Trends Parameters</h2>
<p>Google Trends has a number of parameters that can be used to fine-tune your search: date ranges, geo data, categories, Google properties. These are <strong>available via the gtrendsR package</strong>, corresponding to the options in the Google Trends online interface. You just have to know how to tap into them:</p>
<ul>
<li><strong>Dates</strong>:
<ul>
<li>“now 1-H”: last hour &#8211; by MINUTE</li>
<li>“now 4-H”: last 4 hrs &#8211; by MINUTE</li>
<li>“now 1-d”: last day &#8211; every 8 MINUTES</li>
<li>“now 7-d”: last 7 days &#8211; HOURLY data</li>
<li>“today 1-m”: last 30 days &#8211; DAILY data</li>
<li>“today 3-m”: last 90 days &#8211; DAILY data</li>
<li>“today 12-m”: last 12 months &#8211; WEEKLY data</li>
<li>“today+5-y”: last 5 yrs (default) &#8211; WEEKLY data</li>
<li>“all”: since the beginning of Google Trends (2004)</li>
<li>“YYYY-MM-DD YYYY-MM-DD”: custom start / end date &#8211; granularity will depend on time spans above</li>
</ul>
</li>
<li><strong>Geo</strong>:
<ul>
<li>use gtrendsR::<strong>countries</strong> to see complete list</li>
<li>close to 110,000 options, including country / state / city</li>
<li>code below shows how to filter for countries</li>
<li><strong>geo=""</strong> for all countries</li>
</ul>
</li>
<li><strong>Categories</strong>:
<ul>
<li>use gtrendsR::<strong>categories</strong></li>
<li>over 1,400 categories, with ids that are used in the query</li>
<li><strong>category = 0</strong> for all categories</li>
</ul>
</li>
<li><strong>Google properties</strong>:
<ul>
<li>specify one or more of &#8216;web&#8217;, &#8216;news&#8217;, &#8216;images&#8217;, &#8216;froogle&#8217;, &#8216;youtube&#8217;</li>
<li>gprop=c("web", "youtube") as an example for web and YouTube search</li>
</ul>
</li>
</ul>
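<p>For the custom date range option, it can be less error-prone to build the "YYYY-MM-DD YYYY-MM-DD" string from R dates than to type it by hand. The helper below is a hypothetical convenience function, not part of gtrendsR; it simply formats two dates into the string that the time parameter expects.</p>

```r
## Hypothetical helper (not part of gtrendsR): build the custom
## "YYYY-MM-DD YYYY-MM-DD" string used for the time parameter
trends_time_range <- function(start, end = Sys.Date()) {
  start <- as.Date(start)
  end <- as.Date(end)
  if (start > end) stop("start must be on or before end")
  paste(format(start, "%Y-%m-%d"), format(end, "%Y-%m-%d"))
}

## e.g. the 90 days ending on a fixed date
trends_time_range(as.Date("2021-08-29") - 90, "2021-08-29")
# → "2021-05-31 2021-08-29"
```

<p>The result can then be passed as, for example, time=trends_time_range("2021-01-01", "2021-06-30") in the gtrends() call.</p>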
<h2>Setup &#8211; Libraries</h2>
<p>There&#8217;s basically no setup required &#8211; no credentials, etc. You only need to load the gtrendsR package. <em>(I&#8217;ve pre-loaded other packages I&#8217;m using for general purposes, such as &#8216;tidyverse&#8217;, etc.)</em></p>
<pre><code class="r">library(gtrendsR) ## package for accessing Google Trends - all you need to get going!
library(tidyverse) ## general purpose: ggplot2, filter() and %&gt;% used in the examples below
</code></pre>
<h2>Single term query</h2>
<p>Using the gtrendsR package to get Google Trends for a single search term.</p>
<pre><code class="r">## basic search
gt_results &lt;- gtrends(keyword='cryptocurrency',
        geo="",
        time="now 7-d",
        gprop=c("web"),
        category=0)
</code></pre>
<p>The query returns a bundle of 7 data frames with different info, reflecting what is shown in the Google Trends interface:</p>
<pre><code class="r">names(gt_results)
</code></pre>
<pre>## [1] "interest_over_time"  "interest_by_country" "interest_by_region"  "interest_by_dma"     "interest_by_city"   
## [6] "related_topics"      "related_queries"
</pre>
<p>(compare with the screenshot of the Google Trends interface above)</p>
<h3>Interest over time</h3>
<p>The <strong>&#8216;interest_over_time&#8217;</strong> data frame is the main data object, with relative search volume for the selected search term, country, period, property, and category.</p>
<pre><code class="r">chart_title &lt;- "Searches for: cryptocurrency"
sub_title &lt;- "Period: past 7 days; Geo: world; Prop: 'web'; Category: all"

## create chart based on search interest over time
gt_results$interest_over_time %&gt;% ggplot(aes(x=date, y=hits, color=keyword))+geom_line()+
  labs(title=chart_title, subtitle=sub_title, x="", y="")
</code></pre>
<p><img src="https://catbirdanalytics.wordpress.com/wp-content/uploads/2021/08/search-interest-over-time-1-15.png" alt="plot of chunk search-interest-over-time"></p>
<h3>Related Topics</h3>
<p>The &#8216;related_topics&#8217; data frame holds data on topics related to the main search term (&#8216;cryptocurrency&#8217; in this case).</p>
<pre><code class="r">str(gt_results$related_topics)
</code></pre>
<pre>## 'data.frame':    35 obs. of  5 variables:
##  $ subject       : chr  "100" "9" "8" "6" ...
##  $ related_topics: chr  "top" "top" "top" "top" ...
##  $ value         : chr  "Cryptocurrency" "Bitcoin" "Investment" "Coin" ...
##  $ keyword       : chr  "cryptocurrency" "cryptocurrency" "cryptocurrency" "cryptocurrency" ...
##  $ category      : int  0 0 0 0 0 0 0 0 0 0 ...
##  - attr(*, "reshapeLong")=List of 4
##   ..$ varying:List of 1
##   .. ..$ value: chr "top"
##   .. ..- attr(*, "v.names")= chr "value"
##   .. ..- attr(*, "times")= chr "top"
##   ..$ v.names: chr "value"
##   ..$ idvar  : chr "id"
##   ..$ timevar: chr "related_topics"
</pre>
<ul>
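<li><strong>subject</strong>: relative value compared with the main search term (for &#8216;top&#8217; rows, the most popular related topic is 100); stored as text, so values like &#8216;&lt;1&#8217; can appear</li>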
<li><strong>related_topics</strong>: contains &#8216;top&#8217; topics and &#8216;rising&#8217; topics</li>
<li><strong>value</strong>: related topic</li>
<li><strong>keyword</strong>: main search term</li>
<li><strong>category</strong>: search term category, if applicable</li>
</ul>
<pre><code class="r">  chart_title &lt;- "crytopcurrency: related topics"
  ## 
  top &lt;- gt_results$related_topics %&gt;% filter(related_topics=='top' &amp; !is.na(subject) &amp;
                                                subject!='&lt;1')
  ## convert value to factor and subject to numeric
  top$value &lt;- as.factor(top$value)
  top$subject &lt;- as.numeric(top$subject)
  ## PLOT related topics
  top %&gt;% ggplot(aes(x=reorder(value, subject), y=subject))+geom_col()+
  coord_flip()+
    scale_y_continuous(expand=expansion(add=c(0,10)))+
    labs(title=chart_title, y='', x='')
</code></pre>
<p><img src="https://catbirdanalytics.wordpress.com/wp-content/uploads/2021/08/related-topics-1-13.png" alt="plot of chunk related-topics"></p>
<h3>Related Queries</h3>
<p>The Related Queries module has a similar structure to Related Topics:</p>
<pre><code class="r">str(gt_results$related_queries)
</code></pre>
<pre>## 'data.frame':    50 obs. of  5 variables:
##  $ subject        : chr  "100" "91" "66" "56" ...
##  $ related_queries: chr  "top" "top" "top" "top" ...
##  $ value          : chr  "cryptocurrency price" "crypto" "best cryptocurrency" "cryptocurrency news" ...
##  $ keyword        : chr  "cryptocurrency" "cryptocurrency" "cryptocurrency" "cryptocurrency" ...
##  $ category       : int  0 0 0 0 0 0 0 0 0 0 ...
##  - attr(*, "reshapeLong")=List of 4
##   ..$ varying:List of 1
##   .. ..$ value: chr "top"
##   .. ..- attr(*, "v.names")= chr "value"
##   .. ..- attr(*, "times")= chr "top"
##   ..$ v.names: chr "value"
##   ..$ idvar  : chr "id"
##   ..$ timevar: chr "related_queries"
</pre>
<pre><code class="r">  chart_title &lt;- "crytopcurrency: related queries"
  ## 
  top &lt;- gt_results$related_queries %&gt;% filter(related_queries=='top' &amp; !is.na(subject) &amp;
                                                subject!='&lt;1')
  ## convert value to factor and subject to numeric
  top$value &lt;- as.factor(top$value)
  top$subject &lt;- as.numeric(top$subject)
  ## PLOT related topics
  top %&gt;% ggplot(aes(x=reorder(value, subject), y=subject))+geom_col()+
  coord_flip()+
    scale_y_continuous(expand=expansion(add=c(0,10)))+
    labs(title=chart_title, y='', x='')
</code></pre>
<p><img src="https://catbirdanalytics.wordpress.com/wp-content/uploads/2021/08/related-queries-1-7.png" alt="plot of chunk related-queries"></p>
<p>You can see right there the possibilities for content marketing: if you&#8217;re a cryptocurrency blogger, for example, you may want to write about how to identify the &#8216;best cryptocurrency&#8217;, etc.</p>
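<p>The related topics and related queries code above is nearly identical, which makes it a good candidate for a custom function. The sketch below uses base R and a hypothetical helper name, tidy_related(); it applies the same filtering and conversions as the code above, plus sorting. The small data frame is toy data shaped like the gtrendsR output, not real results.</p>

```r
## Hypothetical helper (not part of gtrendsR): same steps as the code above,
## so one function serves both related_topics and related_queries
tidy_related <- function(df, type_col) {
  ## keep 'top' rows, dropping missing and '<1' subject values
  keep <- df[[type_col]] == "top" & !is.na(df$subject) & df$subject != "<1"
  top <- df[keep, , drop = FALSE]
  ## convert subject to numeric and value to factor
  top$subject <- as.numeric(top$subject)
  top$value <- factor(top$value)
  ## sort by relative interest, highest first
  top[order(-top$subject), , drop = FALSE]
}

## toy data shaped like gt_results$related_queries (not real results)
toy <- data.frame(
  subject = c("100", "9", "<1", NA, "250"),
  related_queries = c("top", "top", "top", "top", "rising"),
  value = c("cryptocurrency price", "crypto", "term a", "term b", "term c"),
  keyword = "cryptocurrency",
  stringsAsFactors = FALSE
)
res <- tidy_related(toy, "related_queries")
res$subject  # numeric 100 and 9; the '<1', NA and 'rising' rows are dropped
```

<p>With real results, tidy_related(gt_results$related_queries, "related_queries") returns a data frame that can go straight into the same ggplot code.</p>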
<h2>Multi-Term Query</h2>
<p>The same approach used to query for single terms can be extended to multiple terms. The example below shows how to load up a collection of terms, and how to use variables for the other query parameters.</p>
<pre><code class="r">## create list of multiple search terms
srch_term &lt;- c("cryptocurrency",
               "bitcoin",
               "ethereum",
               "stock market",
               "real estate")
period &lt;- "today 12-m"
ctry &lt;- "" ## blank = world; based on world countries ISO code
prop &lt;- c("web")
cat &lt;- 0 ## 0 = all categories

## user-friendly versions of parameters for use in chart titles or other query descriptions
ctry_ &lt;- ifelse(ctry=="","world",ctry)
prop_ &lt;- paste0(prop, collapse=", ")
cat_ &lt;- ifelse(cat==0,"all",cat)

## use gtrendsR to query Google Trends
gt_results &lt;- gtrends(keyword=srch_term,
        geo=ctry,
        time=period,
        gprop=prop,
        category=cat)
</code></pre>
<p>The gt_results object returned has the same structure as with a single query; each data frame just has more values for the &#8216;keyword&#8217; variable.</p>
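<p>Because the keyword column identifies each term, the results can be summarized per term. Below is a minimal base R sketch on toy data shaped like interest_over_time. One assumption to flag: the hits column may come back as text when some values are reported as &#8216;&lt;1&#8217;, so the hypothetical hits_to_numeric() helper converts it first, treating &#8216;&lt;1&#8217; as 0.5, which is an arbitrary choice.</p>

```r
## Toy data shaped like gt_results$interest_over_time (not real results)
iot <- data.frame(
  date    = rep(as.Date("2021-08-01") + 0:2, times = 2),
  hits    = c("40", "55", "100", "<1", "3", "2"),
  keyword = rep(c("bitcoin", "real estate"), each = 3),
  stringsAsFactors = FALSE
)

## Hypothetical helper: convert hits to numeric; treating "<1" as 0.5 is
## an arbitrary assumption, not a gtrendsR convention
hits_to_numeric <- function(hits) {
  hits <- as.character(hits)
  hits[hits == "<1"] <- "0.5"
  as.numeric(hits)
}
iot$hits <- hits_to_numeric(iot$hits)

## average relative interest per keyword over the period
avg_by_keyword <- aggregate(hits ~ keyword, data = iot, FUN = mean)
avg_by_keyword
```

<p>Averages like these are still relative to the peak within this particular query, so they are only comparable across terms queried together.</p>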
<h3>Interest over time</h3>
<pre><code class="r">chart_title &lt;- paste0("Search trends: ", paste(srch_term[1:2], collapse=", "), " +")
sub_title &lt;- paste0("Period: ", period, "; Geo: ", ctry_, "; Prop: ", prop_, "; Category: ", cat_)

## create chart based on search interest over time
gt_results$interest_over_time %&gt;% ggplot(aes(x=date, y=hits, color=keyword))+geom_line()+
  scale_y_continuous(expand=expansion(add=c(0,0)))+
  labs(title=chart_title, subtitle=sub_title, x="", y="")
</code></pre>
<p><img src="https://catbirdanalytics.wordpress.com/wp-content/uploads/2021/08/search-interest-multi-terms-1-7.png" alt="plot of chunk search-interest-multi-terms"></p>
<p>As noted, the same limit applies as in the web interface: a <em>maximum of five terms</em> per query. Of course, one of the beauties of working programmatically is the opportunity to combine queries to go beyond five. For that, each query needs to include a common term so that the relative values can be put on the same scale. This is a topic for another blog post.</p>
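<p>Here is a rough idea of how the common-term approach could work, as a minimal base-R sketch. It is not necessarily the method a future post would use. It assumes two interest-over-time data frames from separate queries, each with date, keyword and numeric hits columns, and both including one &#8216;anchor&#8217; term. combine_on_anchor() is a hypothetical helper, and the data are made-up toy values, not real Google Trends results. The helper rescales the second query by the ratio of the anchor term&#8217;s totals in the two queries, then re-indexes everything so the peak is 100.</p>
<pre><code class="r">## hypothetical helper: combine two interest-over-time data frames
## (columns: date, keyword, hits) that share one 'anchor' keyword
combine_on_anchor = function(df_a, df_b, anchor) {
  ## anchor values from each query, matched by date
  a_anchor = df_a[df_a$keyword == anchor, c("date", "hits")]
  b_anchor = df_b[df_b$keyword == anchor, c("date", "hits")]
  both = merge(a_anchor, b_anchor, by = "date", suffixes = c("_a", "_b"))
  ## ratio of anchor totals puts query B on query A's scale
  scale_factor = sum(both$hits_a) / sum(both$hits_b)
  df_b$hits = df_b$hits * scale_factor
  ## keep query A's copy of the anchor, drop query B's
  combined = rbind(df_a, df_b[df_b$keyword != anchor, ])
  ## re-index so the largest value across all terms is 100
  combined$hits = 100 * combined$hits / max(combined$hits)
  combined
}

## toy data for illustration only (not real Google Trends results)
dates = as.Date(c("2021-01-03", "2021-01-10"))
q1 = data.frame(date = rep(dates, 2),
                keyword = rep(c("bitcoin", "ethereum"), each = 2),
                hits = c(100, 80, 40, 50))
q2 = data.frame(date = rep(dates, 2),
                keyword = rep(c("bitcoin", "dogecoin"), each = 2),
                hits = c(50, 40, 100, 20))

combined_toy = combine_on_anchor(q1, q2, anchor = "bitcoin")
print(combined_toy)
</code></pre>
<p>One caveat: Google Trends values are rounded to whole numbers, so a low-volume anchor term makes the scale factor, and everything rescaled with it, less precise.</p>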
<h2>Search Terms vs Search &#8216;Topics&#8217;</h2>
<p>Tip #3 in <a href="https://blog.google/products/search/15-tips-getting-most-out-google-trends/">“15 Tips for Getting the Most Out of Google Trends”</a> mentions the importance of <strong>choosing search &#8216;topics&#8217; when available</strong> for a given search term.</p>
<p>Using a topic version of the term has benefits, but also complications for programmatic access:</p>
<ul>
<li>you need to go to Google Trends to check whether a topic is available (it may also be labelled something other than &#8216;topic&#8217;, like &#8216;currency&#8217; for the term &#8216;Bitcoin&#8217;).</li>
<li>the topic term is an indecipherable code, as circled in the URL in the browser bar above.</li>
<li>comparisons may be skewed if mixing terms and topics.</li>
</ul>
<p><img src="https://catbirdanalytics.wordpress.com/wp-content/uploads/2021/08/google-trends-search-type-example-full-2.png" alt="plot of chunk unnamed-chunk-7"></p>
<p>The best way to handle this situation:</p>
<ol>
<li>If in doubt, go to Google Trends and determine if there is a topic for your term.</li>
<li>If you want to use the topic, copy the code from the URL.</li>
<li>The code in the URL is URL-encoded, so you can paste it into a <a href="https://www.url-encode-decode.com/">URL decoder</a> OR…just <strong>replace each &#8216;%2Fm%2F&#8217; with &#8216;/m/&#8217;</strong> and use the rest as is.</li>
<li>Use this in your query &#8211; it will still work with <strong>gtrendsR</strong>…<em>BUT</em>…it <strong>won&#8217;t mix with search terms</strong> in the same query. <img src="https://s0.wp.com/wp-content/mu-plugins/wpcom-smileys/twemoji/2/72x72/1f626.png" alt="😦" class="wp-smiley" style="height: 1em; max-height: 1em;" /> You can still query multiple topic codes, even if they are identified as different types of topic, but you can&#8217;t mix terms and topics.</li>
</ol>
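<p>If you prefer to stay in R for step 3, base R&#8217;s utils::URLdecode() handles the decoding. The q_param value below is a hypothetical example of what you might copy from the URL, built from the two topic codes used in the example further down. Multiple topics are comma-separated in the URL, so strsplit() turns them into a vector ready for the keyword argument.</p>
<pre><code class="r">## hypothetical value of the query parameter copied from a Google Trends URL
q_param = "%2Fm%2F0vpj4_b,%2Fm%2F05p0rrx"

## decode %2F back to "/" with base R, then split multiple topics on commas
topic_codes = strsplit(utils::URLdecode(q_param), ",")[[1]]
print(topic_codes)
## returns "/m/0vpj4_b" "/m/05p0rrx"
</code></pre>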
<p>So you may have to decide between using the topic version or the basic search term, depending on your needs. As in the example above, both versions <em>tend</em> to follow the same trend, with the topic showing higher volume. No guarantees, though.</p>
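<p>Because gtrendsR won&#8217;t mix topics and terms, it can be handy to catch a mixed list before sending the query. Here is a small sketch. check_term_types() is a hypothetical helper, and it assumes topic codes start with &#8216;/m/&#8217;, as in the examples in this post; other prefixes would need adding to the pattern.</p>
<pre><code class="r">## hypothetical guard (not part of gtrendsR): stop early if a query mixes
## topic codes with plain search terms
## assumption: topic codes start with "/m/", as in the examples in this post
check_term_types = function(terms) {
  is_topic = grepl("^/m/", terms)
  ## both TRUE and FALSE present means topics and terms are mixed
  if (length(unique(is_topic)) == 2) {
    stop("Mixed topic codes and search terms: ",
         paste(terms, collapse = ", "))
  }
  invisible(TRUE)
}

check_term_types(c("/m/0vpj4_b", "/m/05p0rrx"))   ## all topics: passes
check_term_types(c("cryptocurrency", "bitcoin"))  ## all terms: passes
## check_term_types(c("/m/0vpj4_b", "bitcoin"))  ## mixed: stops with an error
</code></pre>
<p>Calling it on srch_term just before gtrends() gives you a clear message up front, whatever gtrendsR itself would do with the mixed list.</p>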
<h3>Example with topics</h3>
<p><img src="https://catbirdanalytics.wordpress.com/wp-content/uploads/2021/08/google-trends-topic-crypto-bitcoin-3.png" alt="plot of chunk unnamed-chunk-8"></p>
<pre><code class="r">## topic codes copied from the Google Trends URL (comma-separated there), with '%2Fm%2F' decoded to '/m/'
srch_term &lt;- c("/m/0vpj4_b",
               "/m/05p0rrx")

srch_topic &lt;- c("Cryptocurrency_topic",
                "Bitcoin_currency")

period &lt;- "today 12-m"
ctry &lt;- "" ## blank = world; based on world countries ISO code
prop &lt;- c("web")
cat &lt;- 0 ## 0 = all categories

## user-friendly versions of parameters for use in chart titles or other query descriptions
ctry_ &lt;- ifelse(ctry=="","world",ctry)
prop_ &lt;- paste0(prop, collapse=", ")
cat_ &lt;- ifelse(cat==0,"all",cat)

## use gtrendsR to call google trends API
gt_results &lt;- gtrends(keyword=srch_term,
        geo=ctry,
        time=period,
        gprop=prop,
        category=cat)

## replace codes with topics
## - extract interest_over_time data frame
gt_interest &lt;- gt_results$interest_over_time
## - replace codes with corresponding terms
gt_interest &lt;- gt_interest %&gt;% mutate(
  keyword=ifelse(keyword==srch_term[1],srch_topic[1],
                 ifelse(keyword==srch_term[2], srch_topic[2],""))
)
</code></pre>
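<p>The nested ifelse() works for two topics but gets unwieldy as the list grows. One alternative is to look each code up with match(). The sketch below does this in base R, using the same srch_term and srch_topic vectors, with a toy keyword vector standing in for gt_interest$keyword.</p>
<pre><code class="r">## codes and labels as defined in the query code above
srch_term = c("/m/0vpj4_b", "/m/05p0rrx")
srch_topic = c("Cryptocurrency_topic", "Bitcoin_currency")

## toy stand-in for gt_interest$keyword
keyword = c("/m/0vpj4_b", "/m/05p0rrx", "/m/0vpj4_b")

## match() gives each code's position in srch_term; use it to pick the label
topic_labels = srch_topic[match(keyword, srch_term)]
print(topic_labels)
## returns "Cryptocurrency_topic" "Bitcoin_currency" "Cryptocurrency_topic"
</code></pre>
<p>Inside the existing pipeline, the equivalent is mutate(keyword = srch_topic[match(keyword, srch_term)]), and it works for any number of topics. Unlike the ifelse() version, a code not found in srch_term becomes NA rather than an empty string.</p>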
<pre><code class="r">## create chart based on search interest over time
pint1 &lt;- gt_interest %&gt;% ggplot(aes(x=date, y=hits, color=keyword))+geom_line(size=2)+
  scale_y_continuous(expand=expansion(add=c(0,0)))+
  scale_color_manual(values=c("red","blue"))+
  theme(legend.position = 'top')+
  labs(x="", y="")

pint2 &lt;- gt_interest %&gt;% group_by(keyword) %&gt;% summarize(avg_int=mean(hits)) %&gt;%
  ggplot(aes(x=keyword, y=avg_int, fill=keyword))+geom_col()+
  scale_y_continuous(limits=c(0,100))+
  scale_fill_manual(values=c("red","blue"))+
  theme(legend.position = 'none',
        axis.text.x = element_blank())+
  labs(x="Average", y="")

grid.arrange(pint2, pint1, nrow=1, widths=c(2,8))
</code></pre>
<p><img src="https://catbirdanalytics.wordpress.com/wp-content/uploads/2021/08/google-trends-topics-bar-line-1-3.png" alt="plot of chunk google-trends-topics-bar-line"></p>
<h2>Wrap-up and Additional References</h2>
<p>There&#8217;s lots more you can do with Google Trends and gtrendsR. I hope to cover some more ideas in future blog posts. In the meantime, hopefully this is a helpful start!</p>
<h3>References</h3>
<p>Other useful references for working with gtrendsR and Google Trends:</p>
<ul>
<li><a href="https://cran.r-project.org/web/packages/gtrendsR/gtrendsR.pdf">gtrendsR reference manual (CRAN)</a></li>
<li><a href="https://blog.quiet.ly/industry/exploring-google-trends-explore-function-finding-keywords-queries/">https://blog.quiet.ly/industry/exploring-google-trends-explore-function-finding-keywords-queries/</a></li>
<li><a href="https://blog.google/products/search/15-tips-getting-most-out-google-trends/">https://blog.google/products/search/15-tips-getting-most-out-google-trends/</a>
<ul>
<li>the more authoritative and recent blog post mentioned above</li>
</ul>
</li>
</ul>
<p>Happy trending!</p>
]]></content:encoded>
					
					<wfw:commentRss>https://catbirdanalytics.wordpress.com/2021/08/29/google-trends-r-leverage-gtrendsr-package-for-more-powerful-analytics/feed/</wfw:commentRss>
			<slash:comments>7</slash:comments>
		
		
		<post-id xmlns="com-wordpress:feed-additions:1">1131</post-id>
		<media:thumbnail url="https://catbirdanalytics.wordpress.com/wp-content/uploads/2021/08/search-interest-over-time-1-14.png" />
		<media:content url="https://catbirdanalytics.wordpress.com/wp-content/uploads/2021/08/search-interest-over-time-1-14.png" medium="image">
			<media:title type="html">search-interest-over-time-1.png</media:title>
		</media:content>

		<media:content url="https://0.gravatar.com/avatar/61a7e5aa4e24773b513b314f89b5611e80124a6e923e7234e89ca72a387b08d1?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">john</media:title>
		</media:content>

		<media:content url="https://catbirdanalytics.wordpress.com/wp-content/uploads/2021/08/google-trends-interface-page-crypto-7.png" medium="image">
			<media:title type="html">plot of chunk unnamed-chunk-1</media:title>
		</media:content>

		<media:content url="https://catbirdanalytics.wordpress.com/wp-content/uploads/2021/08/search-interest-over-time-1-15.png" medium="image">
			<media:title type="html">plot of chunk search-interest-over-time</media:title>
		</media:content>

		<media:content url="https://catbirdanalytics.wordpress.com/wp-content/uploads/2021/08/related-topics-1-13.png" medium="image">
			<media:title type="html">plot of chunk related-topics</media:title>
		</media:content>

		<media:content url="https://catbirdanalytics.wordpress.com/wp-content/uploads/2021/08/related-queries-1-7.png" medium="image">
			<media:title type="html">plot of chunk related-queries</media:title>
		</media:content>

		<media:content url="https://catbirdanalytics.wordpress.com/wp-content/uploads/2021/08/search-interest-multi-terms-1-7.png" medium="image">
			<media:title type="html">plot of chunk search-interest-multi-terms</media:title>
		</media:content>

		<media:content url="https://catbirdanalytics.wordpress.com/wp-content/uploads/2021/08/google-trends-search-type-example-full-2.png" medium="image">
			<media:title type="html">plot of chunk unnamed-chunk-7</media:title>
		</media:content>

		<media:content url="https://catbirdanalytics.wordpress.com/wp-content/uploads/2021/08/google-trends-topic-crypto-bitcoin-3.png" medium="image">
			<media:title type="html">plot of chunk unnamed-chunk-8</media:title>
		</media:content>

		<media:content url="https://catbirdanalytics.wordpress.com/wp-content/uploads/2021/08/google-trends-topics-bar-line-1-3.png" medium="image">
			<media:title type="html">plot of chunk google-trends-topics-bar-line</media:title>
		</media:content>
	</item>
		<item>
		<title>How to Create a Custom R Markdown Template</title>
		<link>https://catbirdanalytics.wordpress.com/2021/08/16/how-to-create-a-custom-r-markdown-template/</link>
					<comments>https://catbirdanalytics.wordpress.com/2021/08/16/how-to-create-a-custom-r-markdown-template/#comments</comments>
		
		<dc:creator><![CDATA[John]]></dc:creator>
		<pubDate>Tue, 17 Aug 2021 04:57:00 +0000</pubDate>
				<category><![CDATA[R Markdown]]></category>
		<category><![CDATA[R Stats]]></category>
		<category><![CDATA[RMarkdown]]></category>
		<guid isPermaLink="false">http://catbirdanalytics.wordpress.com/?p=899</guid>

					<description><![CDATA[If you use the R language for statistical programming, you probably use RStudio as your integrated development environment (IDE) and you may also use &#8211; or at least have come across &#8211; R Markdown as a file type that allows you to create documents that weave together data and text analysis. (Gratuitous example: my curiosity&#8230; <a href="https://catbirdanalytics.wordpress.com/2021/08/16/how-to-create-a-custom-r-markdown-template/" class="more-link">Continue reading <span class="screen-reader-text">How to Create a Custom R Markdown&#160;Template</span></a>]]></description>
										<content:encoded><![CDATA[
<p class="wp-block-paragraph">If you use the <a href="https://www.r-project.org/">R language</a> for statistical programming, you probably use <a href="https://www.rstudio.com/">RStudio</a> as your integrated development environment (IDE), and you may also use &#8211; or at least have come across &#8211; <a href="https://rmarkdown.rstudio.com/">R Markdown</a>, a file type for creating documents that weave together data analysis and text. (Gratuitous example: my curiosity project on <a href="https://jyuill.github.io/van-weather-report.html">Vancouver weather history</a>.) So far, so good: R Markdown is an amazing tool.</p>



<p class="wp-block-paragraph">You may also find that, each time you set up a new R Markdown document, you end up <strong>deleting the standard template code and then typing in a lot of the same code</strong> to get started: header settings, code chunk options, your favourite libraries, and other tricks you have picked up over time. And of course that means you have to remember all that stuff &#8211; not hard, but, as programmers, we&#8217;re always looking for ways to cut corners, remove tedium and get straight to work, right?</p>



<p class="wp-block-paragraph"><em>(For long-time readers who are wondering what this has to do with web analytics&#8230;a) this blog has expanded beyond web analytics; b) to see the surprise twist that relates to Google Analytics, jump to the <a href="#supportfiles">Support Files</a> section. <img src="https://s0.wp.com/wp-content/mu-plugins/wpcom-smileys/twemoji/2/72x72/1f609.png" alt="😉" class="wp-smiley" style="height: 1em; max-height: 1em;" />)</em></p>



<h2 class="wp-block-heading">R Markdown Streamlining Options</h2>



<p class="wp-block-paragraph">There are <strong>a few options</strong> to avoid all the remembering/re-typing every time you start a new R Markdown document:</p>



<ol class="wp-block-list"><li><strong>Copy another recent/similar document</strong>, keep what you need, delete everything else: all well and good, but you&#8217;re a productive person with lots on the go, each project has its own nuances, and it would be nice not to have to figure out which project fits best, find it, copy files, remove extra code, etc.</li><li><strong>Use a pre-configured template from a package</strong> that has one or more templates bundled in it: the Templates section in the <a href="https://rmarkdown.rstudio.com/gallery.html">R Markdown Gallery</a> has some solid recommendations, several of which are geared toward specific purposes, like meeting the guidelines for the Journal of Statistical Software. But those may not quite be your jam.</li><li><strong>CREATE YOUR OWN CUSTOM TEMPLATE:</strong> this takes a bit of work but can be well worth the effort. It&#8217;s what this article is about and, if you&#8217;ve read this far, I&#8217;m assuming it&#8217;s what you are up for!</li></ol>



<div class="wp-block-image"><figure class="aligncenter size-large is-resized"><img data-attachment-id="928" data-permalink="https://catbirdanalytics.wordpress.com/r-markdown-template-options/" data-orig-file="https://catbirdanalytics.wordpress.com/wp-content/uploads/2021/08/r-markdown-template-options.png" data-orig-size="1074,454" data-comments-opened="1" data-image-meta="{&quot;aperture&quot;:&quot;0&quot;,&quot;credit&quot;:&quot;&quot;,&quot;camera&quot;:&quot;&quot;,&quot;caption&quot;:&quot;&quot;,&quot;created_timestamp&quot;:&quot;0&quot;,&quot;copyright&quot;:&quot;&quot;,&quot;focal_length&quot;:&quot;0&quot;,&quot;iso&quot;:&quot;0&quot;,&quot;shutter_speed&quot;:&quot;0&quot;,&quot;title&quot;:&quot;&quot;,&quot;orientation&quot;:&quot;0&quot;}" data-image-title="r-markdown-template-options" data-image-description="" data-image-caption="" data-medium-file="https://catbirdanalytics.wordpress.com/wp-content/uploads/2021/08/r-markdown-template-options.png?w=300" data-large-file="https://catbirdanalytics.wordpress.com/wp-content/uploads/2021/08/r-markdown-template-options.png?w=730" src="https://catbirdanalytics.wordpress.com/wp-content/uploads/2021/08/r-markdown-template-options.png?w=1024" alt="" class="wp-image-928" width="511" height="216" srcset="https://catbirdanalytics.wordpress.com/wp-content/uploads/2021/08/r-markdown-template-options.png?w=511 511w, https://catbirdanalytics.wordpress.com/wp-content/uploads/2021/08/r-markdown-template-options.png?w=1022 1022w, https://catbirdanalytics.wordpress.com/wp-content/uploads/2021/08/r-markdown-template-options.png?w=150 150w, https://catbirdanalytics.wordpress.com/wp-content/uploads/2021/08/r-markdown-template-options.png?w=300 300w, https://catbirdanalytics.wordpress.com/wp-content/uploads/2021/08/r-markdown-template-options.png?w=768 768w" sizes="(max-width: 511px) 100vw, 511px" /><figcaption>Accessing R Markdown template options that come with various packages. 
Follow the steps below, and you can add yours to this list!</figcaption></figure></div>



<h2 class="wp-block-heading">Custom Template Creation</h2>



<p class="wp-block-paragraph">The thing with R Markdown templates is that <strong>you need to create an R package</strong> to hold the template and to make it available within RStudio. This can sound intimidating to some folks &#8211; I know it did to me &#8211; but if you don&#8217;t have experience with packages, <strong>rest assured: it&#8217;s a straightforward process</strong>, and once you go through it you will see the benefits.</p>



<p class="wp-block-paragraph">One of the benefits is that you can roll multiple different markdown templates into the same package, so that you can access a collection for different use cases from that one source. </p>



<h3 class="wp-block-heading">Essentials</h3>



<p class="wp-block-paragraph">The essential process for creating a template is:</p>



<ol class="wp-block-list"><li>Create a new package project in RStudio.</li><li>Add a specific folder structure: <strong>inst &gt; rmarkdown &gt; templates &gt; template folder name of your choosing &gt; skeleton</strong></li><li><strong>skeleton.Rmd:</strong> In the &#8216;skeleton&#8217; folder, add a <strong>skeleton.Rmd</strong> file and customize it to your purposes.</li><li><strong>template.yaml</strong>: One level up, in the folder named after the template (&#8216;template folder name of your choosing&#8217;), add a <strong>template.yaml</strong> file that specifies the name of the template as you want it to appear in RStudio&#8217;s template list for new R Markdown files.</li></ol>



<p class="wp-block-paragraph">And that&#8217;s basically it for the setup. From there, you can install the package on your local machine and once installed the template(s) from the package will be available in your list of templates when you go to start a new markdown document. You can also push the package to a Git repo for sharing with others, continue developing/expanding.</p>



<h3 class="wp-block-heading">Details</h3>



<p class="wp-block-paragraph">So let&#8217;s breakdown the above into some more specifics.</p>



<h4 class="wp-block-heading">1. Create a new package project in RStudio</h4>



<p class="wp-block-paragraph">To create a new project in RStudio that has the out-of-the-box components for a package:</p>



<ol class="wp-block-list"><li>Create Project &gt; New Directory &gt; R Package using devtools</li><li>Directory Name: provide a name that you have chosen for your template package</li><li>Description [optional]: add details to the DESCRIPTION file if you like </li></ol>



<h4 class="wp-block-heading">2. Add a specific folder to the package</h4>



<p class="wp-block-paragraph">With the project set up, you now need to add a specific folder structure for R Markdown templates:</p>



<ul class="wp-block-list"><li><strong>inst &gt; rmarkdown &gt; templates &gt; template name of choice &gt; skeleton</strong><ul><li>as Chester Ismay pointed out in his <a href="https://chester.rbind.io/ecots2k16/template_pkg/">helpful article on the same topic</a> you can use dir.create function: dir.create(&#8220;inst/rmarkdown/templates/&lt;template name&gt;/skeleton&#8221;, recursive=TRUE)</li></ul></li></ul>



<figure class="wp-block-image size-large is-resized"><img data-attachment-id="926" data-permalink="https://catbirdanalytics.wordpress.com/rmarkdown-template-folder-structure/" data-orig-file="https://catbirdanalytics.wordpress.com/wp-content/uploads/2021/08/rmarkdown-template-folder-structure.png" data-orig-size="1276,280" data-comments-opened="1" data-image-meta="{&quot;aperture&quot;:&quot;0&quot;,&quot;credit&quot;:&quot;&quot;,&quot;camera&quot;:&quot;&quot;,&quot;caption&quot;:&quot;&quot;,&quot;created_timestamp&quot;:&quot;0&quot;,&quot;copyright&quot;:&quot;&quot;,&quot;focal_length&quot;:&quot;0&quot;,&quot;iso&quot;:&quot;0&quot;,&quot;shutter_speed&quot;:&quot;0&quot;,&quot;title&quot;:&quot;&quot;,&quot;orientation&quot;:&quot;0&quot;}" data-image-title="rmarkdown-template-folder-structure" data-image-description="" data-image-caption="" data-medium-file="https://catbirdanalytics.wordpress.com/wp-content/uploads/2021/08/rmarkdown-template-folder-structure.png?w=300" data-large-file="https://catbirdanalytics.wordpress.com/wp-content/uploads/2021/08/rmarkdown-template-folder-structure.png?w=730" src="https://catbirdanalytics.wordpress.com/wp-content/uploads/2021/08/rmarkdown-template-folder-structure.png?w=1024" alt="" class="wp-image-926" width="766" height="168" srcset="https://catbirdanalytics.wordpress.com/wp-content/uploads/2021/08/rmarkdown-template-folder-structure.png?w=1024 1024w, https://catbirdanalytics.wordpress.com/wp-content/uploads/2021/08/rmarkdown-template-folder-structure.png?w=766 766w, https://catbirdanalytics.wordpress.com/wp-content/uploads/2021/08/rmarkdown-template-folder-structure.png?w=150 150w, https://catbirdanalytics.wordpress.com/wp-content/uploads/2021/08/rmarkdown-template-folder-structure.png?w=300 300w, https://catbirdanalytics.wordpress.com/wp-content/uploads/2021/08/rmarkdown-template-folder-structure.png 1276w" sizes="(max-width: 766px) 100vw, 766px" /><figcaption>Resulting folder structure in a package called 
&#8216;templateforthewin&#8217; and template called &#8216;templatebestever&#8217;</figcaption></figure>
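


<p class="wp-block-paragraph">For reference, here is the same dir.create() call as a runnable sketch, using the package and template names from the screenshot above. Inside the package project you can pass the relative path directly. This version builds the full path with file.path() under tempdir(), purely so it can run anywhere without touching a real project.</p>



<pre class="wp-block-code"><code>## names from the screenshot above: package 'templateforthewin',
## template 'templatebestever'; tempdir() stands in for the package root
pkg_root = file.path(tempdir(), "templateforthewin")
skeleton_dir = file.path(pkg_root, "inst", "rmarkdown", "templates",
                         "templatebestever", "skeleton")

## recursive = TRUE creates every missing folder along the path
dir.create(skeleton_dir, recursive = TRUE, showWarnings = FALSE)
dir.exists(skeleton_dir)
</code></pre>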



<h4 class="wp-block-heading">3. skeleton.Rmd file</h4>



<p class="wp-block-paragraph">Add a file called &#8216;<strong>skeleton.Rmd</strong>&#8216; to the skeleton folder, same as you would any new R Markdown file. This is the file that will serve as the actual template. Customize it with whatever settings &amp; code you would routinely be adding for your markdown files, so that it will be replicated when you use the template.</p>



<p class="wp-block-paragraph">This might include things like:</p>



<ul class="wp-block-list"><li><strong>customized yaml headers</strong> &#8230;</li></ul>



<figure class="wp-block-image size-medium is-resized"><img loading="lazy" data-attachment-id="914" data-permalink="https://catbirdanalytics.wordpress.com/yaml-header-custom-2/" data-orig-file="https://catbirdanalytics.wordpress.com/wp-content/uploads/2021/08/yaml-header-custom-2.png" data-orig-size="546,312" data-comments-opened="1" data-image-meta="{&quot;aperture&quot;:&quot;0&quot;,&quot;credit&quot;:&quot;&quot;,&quot;camera&quot;:&quot;&quot;,&quot;caption&quot;:&quot;&quot;,&quot;created_timestamp&quot;:&quot;0&quot;,&quot;copyright&quot;:&quot;&quot;,&quot;focal_length&quot;:&quot;0&quot;,&quot;iso&quot;:&quot;0&quot;,&quot;shutter_speed&quot;:&quot;0&quot;,&quot;title&quot;:&quot;&quot;,&quot;orientation&quot;:&quot;0&quot;}" data-image-title="yaml-header-custom-2" data-image-description="" data-image-caption="" data-medium-file="https://catbirdanalytics.wordpress.com/wp-content/uploads/2021/08/yaml-header-custom-2.png?w=300" data-large-file="https://catbirdanalytics.wordpress.com/wp-content/uploads/2021/08/yaml-header-custom-2.png?w=546" src="https://catbirdanalytics.wordpress.com/wp-content/uploads/2021/08/yaml-header-custom-2.png?w=300" alt="" class="wp-image-914" width="299" height="171" srcset="https://catbirdanalytics.wordpress.com/wp-content/uploads/2021/08/yaml-header-custom-2.png?w=299 299w, https://catbirdanalytics.wordpress.com/wp-content/uploads/2021/08/yaml-header-custom-2.png?w=150 150w, https://catbirdanalytics.wordpress.com/wp-content/uploads/2021/08/yaml-header-custom-2.png 546w" sizes="(max-width: 299px) 100vw, 299px" /></figure>



<ul class="wp-block-list"><li><strong>initial settings, default libraries </strong>to include&#8230;</li></ul>



<figure class="wp-block-image size-medium is-resized"><img loading="lazy" data-attachment-id="916" data-permalink="https://catbirdanalytics.wordpress.com/r-markdown-setup-2/" data-orig-file="https://catbirdanalytics.wordpress.com/wp-content/uploads/2021/08/r-markdown-setup-2.png" data-orig-size="582,354" data-comments-opened="1" data-image-meta="{&quot;aperture&quot;:&quot;0&quot;,&quot;credit&quot;:&quot;&quot;,&quot;camera&quot;:&quot;&quot;,&quot;caption&quot;:&quot;&quot;,&quot;created_timestamp&quot;:&quot;0&quot;,&quot;copyright&quot;:&quot;&quot;,&quot;focal_length&quot;:&quot;0&quot;,&quot;iso&quot;:&quot;0&quot;,&quot;shutter_speed&quot;:&quot;0&quot;,&quot;title&quot;:&quot;&quot;,&quot;orientation&quot;:&quot;0&quot;}" data-image-title="r-markdown-setup-2" data-image-description="" data-image-caption="" data-medium-file="https://catbirdanalytics.wordpress.com/wp-content/uploads/2021/08/r-markdown-setup-2.png?w=300" data-large-file="https://catbirdanalytics.wordpress.com/wp-content/uploads/2021/08/r-markdown-setup-2.png?w=582" src="https://catbirdanalytics.wordpress.com/wp-content/uploads/2021/08/r-markdown-setup-2.png?w=300" alt="" class="wp-image-916" width="299" height="182" srcset="https://catbirdanalytics.wordpress.com/wp-content/uploads/2021/08/r-markdown-setup-2.png?w=299 299w, https://catbirdanalytics.wordpress.com/wp-content/uploads/2021/08/r-markdown-setup-2.png?w=150 150w, https://catbirdanalytics.wordpress.com/wp-content/uploads/2021/08/r-markdown-setup-2.png 582w" sizes="(max-width: 299px) 100vw, 299px" /></figure>



<ul class="wp-block-list"><li><strong>default text / code blocks </strong>&#8230;</li></ul>



<figure class="wp-block-image size-large is-resized"><img loading="lazy" data-attachment-id="919" data-permalink="https://catbirdanalytics.wordpress.com/r-markdown-text-codeblock2/" data-orig-file="https://catbirdanalytics.wordpress.com/wp-content/uploads/2021/08/r-markdown-text-codeblock2.png" data-orig-size="952,372" data-comments-opened="1" data-image-meta="{&quot;aperture&quot;:&quot;0&quot;,&quot;credit&quot;:&quot;&quot;,&quot;camera&quot;:&quot;&quot;,&quot;caption&quot;:&quot;&quot;,&quot;created_timestamp&quot;:&quot;0&quot;,&quot;copyright&quot;:&quot;&quot;,&quot;focal_length&quot;:&quot;0&quot;,&quot;iso&quot;:&quot;0&quot;,&quot;shutter_speed&quot;:&quot;0&quot;,&quot;title&quot;:&quot;&quot;,&quot;orientation&quot;:&quot;0&quot;}" data-image-title="r-markdown-text-codeblock2" data-image-description="" data-image-caption="" data-medium-file="https://catbirdanalytics.wordpress.com/wp-content/uploads/2021/08/r-markdown-text-codeblock2.png?w=300" data-large-file="https://catbirdanalytics.wordpress.com/wp-content/uploads/2021/08/r-markdown-text-codeblock2.png?w=730" src="https://catbirdanalytics.wordpress.com/wp-content/uploads/2021/08/r-markdown-text-codeblock2.png?w=952" alt="" class="wp-image-919" width="476" height="186" srcset="https://catbirdanalytics.wordpress.com/wp-content/uploads/2021/08/r-markdown-text-codeblock2.png?w=476 476w, https://catbirdanalytics.wordpress.com/wp-content/uploads/2021/08/r-markdown-text-codeblock2.png?w=150 150w, https://catbirdanalytics.wordpress.com/wp-content/uploads/2021/08/r-markdown-text-codeblock2.png?w=300 300w, https://catbirdanalytics.wordpress.com/wp-content/uploads/2021/08/r-markdown-text-codeblock2.png?w=768 768w, https://catbirdanalytics.wordpress.com/wp-content/uploads/2021/08/r-markdown-text-codeblock2.png 952w" sizes="(max-width: 476px) 100vw, 476px" /></figure>



<p class="wp-block-paragraph">&#8230;and much more to suit your needs!</p>



<p class="wp-block-paragraph">Once you have the customizations in place, test by knitting as you normally would. When you have confirmed that the results are as expected, <strong>delete the skeleton.html </strong>output file &#8211; otherwise it will show up when you use the template.</p>
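


<p class="wp-block-paragraph">If you knit the skeleton often while tweaking it, a small helper can take care of that cleanup. remove_knit_output() is a hypothetical name. The sketch deletes any .html files in the skeleton folder and uses an empty stand-in file in place of real knitted output.</p>



<pre class="wp-block-code"><code>## hypothetical helper: delete knitted .html output from a skeleton folder
## so it isn't picked up when the template is used
remove_knit_output = function(skeleton_dir) {
  html_out = list.files(skeleton_dir, pattern = "\\.html$", full.names = TRUE)
  file.remove(html_out)
  invisible(html_out)
}

## example folder, reusing the example names under tempdir()
skeleton_dir = file.path(tempdir(), "templateforthewin", "inst", "rmarkdown",
                         "templates", "templatebestever", "skeleton")
dir.create(skeleton_dir, recursive = TRUE, showWarnings = FALSE)
file.create(file.path(skeleton_dir, "skeleton.html"))  ## stand-in for knitted output

remove_knit_output(skeleton_dir)
list.files(skeleton_dir)
</code></pre>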



<h4 class="wp-block-heading">4. template.yaml file</h4>



<p class="wp-block-paragraph">A file called &#8216;<strong>template.yaml</strong>&#8216; is required immediately within the template folder, at the same level of the &#8216;<strong>skeleton</strong>&#8216; folder to provide meta data for the template.</p>



<figure class="wp-block-image size-large"><img loading="lazy" width="1024" height="227" data-attachment-id="920" data-permalink="https://catbirdanalytics.wordpress.com/template-yaml/" data-orig-file="https://catbirdanalytics.wordpress.com/wp-content/uploads/2021/08/template-yaml.png" data-orig-size="2138,476" data-comments-opened="1" data-image-meta="{&quot;aperture&quot;:&quot;0&quot;,&quot;credit&quot;:&quot;&quot;,&quot;camera&quot;:&quot;&quot;,&quot;caption&quot;:&quot;&quot;,&quot;created_timestamp&quot;:&quot;0&quot;,&quot;copyright&quot;:&quot;&quot;,&quot;focal_length&quot;:&quot;0&quot;,&quot;iso&quot;:&quot;0&quot;,&quot;shutter_speed&quot;:&quot;0&quot;,&quot;title&quot;:&quot;&quot;,&quot;orientation&quot;:&quot;0&quot;}" data-image-title="template-yaml" data-image-description="" data-image-caption="" data-medium-file="https://catbirdanalytics.wordpress.com/wp-content/uploads/2021/08/template-yaml.png?w=300" data-large-file="https://catbirdanalytics.wordpress.com/wp-content/uploads/2021/08/template-yaml.png?w=730" src="https://catbirdanalytics.wordpress.com/wp-content/uploads/2021/08/template-yaml.png?w=1024" alt="" class="wp-image-920" srcset="https://catbirdanalytics.wordpress.com/wp-content/uploads/2021/08/template-yaml.png?w=1024 1024w, https://catbirdanalytics.wordpress.com/wp-content/uploads/2021/08/template-yaml.png?w=2048 2048w, https://catbirdanalytics.wordpress.com/wp-content/uploads/2021/08/template-yaml.png?w=150 150w, https://catbirdanalytics.wordpress.com/wp-content/uploads/2021/08/template-yaml.png?w=300 300w, https://catbirdanalytics.wordpress.com/wp-content/uploads/2021/08/template-yaml.png?w=768 768w, https://catbirdanalytics.wordpress.com/wp-content/uploads/2021/08/template-yaml.png?w=1440 1440w" sizes="(max-width: 1024px) 100vw, 1024px" /><figcaption>template.yaml location and content</figcaption></figure>



<p class="wp-block-paragraph">Content for template.yaml:</p>



<ul class="wp-block-list"><li><strong>name: </strong>the name of the template as you want it to appear in the R Markdown template list (it seems to work with or without quotes).</li><li><strong>description: </strong>apparently optional, and I&#8217;m not even sure how it is used, but things ran more smoothly in several of my tests when I included it. <img src="https://s0.wp.com/wp-content/mu-plugins/wpcom-smileys/twemoji/2/72x72/1f609.png" alt="😉" class="wp-smiley" style="height: 1em; max-height: 1em;" /></li><li><strong>create_dir:</strong> optional but useful, because it specifies whether a new folder is created to contain the .Rmd file you create from the template.</li></ul>
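


<p class="wp-block-paragraph">As a sketch, a template.yaml with all three fields could be written like this. The name and description values are placeholders, and create_dir: false means no new folder is created. The path again reuses the example names under tempdir().</p>



<pre class="wp-block-code"><code>## illustrative sketch: write template.yaml next to the skeleton folder
## (name and description are placeholders)
template_dir = file.path(tempdir(), "templateforthewin", "inst", "rmarkdown",
                         "templates", "templatebestever")
dir.create(template_dir, recursive = TRUE, showWarnings = FALSE)

writeLines(c(
  "name: Template Best Ever",
  "description: Starting point for analysis reports",
  "create_dir: false"
), file.path(template_dir, "template.yaml"))

readLines(file.path(template_dir, "template.yaml"))
</code></pre>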



<p class="wp-block-paragraph">Now your template is ready to use.</p>



<h2 class="wp-block-heading">Using the Template</h2>



<p class="wp-block-paragraph">Using the template is as easy as:</p>



<ol class="wp-block-list"><li>Install your new package by running: <strong>devtools::install("&lt;path&gt;/&lt;template package name&gt;")</strong></li><li>Load the new package: <strong>library(&lt;template package name&gt;)</strong></li></ol>



<p class="wp-block-paragraph">Should look something like this:</p>



<figure class="wp-block-image size-large is-resized"><img loading="lazy" data-attachment-id="931" data-permalink="https://catbirdanalytics.wordpress.com/r-markdown-template-install/" data-orig-file="https://catbirdanalytics.wordpress.com/wp-content/uploads/2021/08/r-markdown-template-install.png" data-orig-size="1230,1090" data-comments-opened="1" data-image-meta="{&quot;aperture&quot;:&quot;0&quot;,&quot;credit&quot;:&quot;&quot;,&quot;camera&quot;:&quot;&quot;,&quot;caption&quot;:&quot;&quot;,&quot;created_timestamp&quot;:&quot;0&quot;,&quot;copyright&quot;:&quot;&quot;,&quot;focal_length&quot;:&quot;0&quot;,&quot;iso&quot;:&quot;0&quot;,&quot;shutter_speed&quot;:&quot;0&quot;,&quot;title&quot;:&quot;&quot;,&quot;orientation&quot;:&quot;0&quot;}" data-image-title="r-markdown-template-install" data-image-description="" data-image-caption="" data-medium-file="https://catbirdanalytics.wordpress.com/wp-content/uploads/2021/08/r-markdown-template-install.png?w=300" data-large-file="https://catbirdanalytics.wordpress.com/wp-content/uploads/2021/08/r-markdown-template-install.png?w=730" src="https://catbirdanalytics.wordpress.com/wp-content/uploads/2021/08/r-markdown-template-install.png?w=1024" alt="" class="wp-image-931" width="512" height="454" srcset="https://catbirdanalytics.wordpress.com/wp-content/uploads/2021/08/r-markdown-template-install.png?w=1024 1024w, https://catbirdanalytics.wordpress.com/wp-content/uploads/2021/08/r-markdown-template-install.png?w=512 512w, https://catbirdanalytics.wordpress.com/wp-content/uploads/2021/08/r-markdown-template-install.png?w=150 150w, https://catbirdanalytics.wordpress.com/wp-content/uploads/2021/08/r-markdown-template-install.png?w=300 300w, https://catbirdanalytics.wordpress.com/wp-content/uploads/2021/08/r-markdown-template-install.png?w=768 768w" sizes="(max-width: 512px) 100vw, 512px" /><figcaption>3. 
Open an R project and go to File &gt; New File &gt; R Markdown &gt; From Template; your template should appear in the list (you may have to close and re-open the project to see it).</figcaption></figure>



<figure class="wp-block-image size-large is-resized"><img loading="lazy" data-attachment-id="932" data-permalink="https://catbirdanalytics.wordpress.com/template-in-rmarkdown-template-list-1/" data-orig-file="https://catbirdanalytics.wordpress.com/wp-content/uploads/2021/08/template-in-rmarkdown-template-list-1.png" data-orig-size="1026,442" data-comments-opened="1" data-image-meta="{&quot;aperture&quot;:&quot;0&quot;,&quot;credit&quot;:&quot;&quot;,&quot;camera&quot;:&quot;&quot;,&quot;caption&quot;:&quot;&quot;,&quot;created_timestamp&quot;:&quot;0&quot;,&quot;copyright&quot;:&quot;&quot;,&quot;focal_length&quot;:&quot;0&quot;,&quot;iso&quot;:&quot;0&quot;,&quot;shutter_speed&quot;:&quot;0&quot;,&quot;title&quot;:&quot;&quot;,&quot;orientation&quot;:&quot;0&quot;}" data-image-title="template-in-rmarkdown-template-list-1" data-image-description="" data-image-caption="" data-medium-file="https://catbirdanalytics.wordpress.com/wp-content/uploads/2021/08/template-in-rmarkdown-template-list-1.png?w=300" data-large-file="https://catbirdanalytics.wordpress.com/wp-content/uploads/2021/08/template-in-rmarkdown-template-list-1.png?w=730" src="https://catbirdanalytics.wordpress.com/wp-content/uploads/2021/08/template-in-rmarkdown-template-list-1.png?w=1024" alt="" class="wp-image-932" width="512" height="221" srcset="https://catbirdanalytics.wordpress.com/wp-content/uploads/2021/08/template-in-rmarkdown-template-list-1.png?w=1024 1024w, https://catbirdanalytics.wordpress.com/wp-content/uploads/2021/08/template-in-rmarkdown-template-list-1.png?w=512 512w, https://catbirdanalytics.wordpress.com/wp-content/uploads/2021/08/template-in-rmarkdown-template-list-1.png?w=150 150w, https://catbirdanalytics.wordpress.com/wp-content/uploads/2021/08/template-in-rmarkdown-template-list-1.png?w=300 300w, https://catbirdanalytics.wordpress.com/wp-content/uploads/2021/08/template-in-rmarkdown-template-list-1.png?w=768 768w" sizes="(max-width: 512px) 100vw, 512px" 
/><figcaption>Voila!</figcaption></figure>



<h2 class="wp-block-heading">Extras</h2>



<p class="wp-block-paragraph">That should give you the idea, and enough information to move forward with confidence. Of course, there is a lot more you can do to extend the convenience of templates. A couple of common enhancements are <strong>putting the package on GitHub</strong> and adding <strong>support files</strong> that work with the template. This is where the ability to bundle multiple templates in the same package really pays off.</p>



<h3 class="wp-block-heading">GitHub Repo</h3>



<p class="wp-block-paragraph">If you use <a href="https://github.com/">GitHub</a> (and I hope you do!), you know the benefits of code management, sharing, version control, etc. Once you have your package set up on your machine, there are a couple of ways to get it onto GitHub. I&#8217;m no GitHub expert, so I took the path of least resistance: I created a new GitHub repo with the same name as my R package project and followed the instructions from there for pushing an existing repository.</p>






<figure class="wp-block-image size-large is-resized"><img loading="lazy" data-attachment-id="935" data-permalink="https://catbirdanalytics.wordpress.com/github-push-existing-repo/" data-orig-file="https://catbirdanalytics.wordpress.com/wp-content/uploads/2021/08/github-push-existing-repo.png" data-orig-size="1864,1056" data-comments-opened="1" data-image-meta="{&quot;aperture&quot;:&quot;0&quot;,&quot;credit&quot;:&quot;&quot;,&quot;camera&quot;:&quot;&quot;,&quot;caption&quot;:&quot;&quot;,&quot;created_timestamp&quot;:&quot;0&quot;,&quot;copyright&quot;:&quot;&quot;,&quot;focal_length&quot;:&quot;0&quot;,&quot;iso&quot;:&quot;0&quot;,&quot;shutter_speed&quot;:&quot;0&quot;,&quot;title&quot;:&quot;&quot;,&quot;orientation&quot;:&quot;0&quot;}" data-image-title="github-push-existing-repo" data-image-description="" data-image-caption="" data-medium-file="https://catbirdanalytics.wordpress.com/wp-content/uploads/2021/08/github-push-existing-repo.png?w=300" data-large-file="https://catbirdanalytics.wordpress.com/wp-content/uploads/2021/08/github-push-existing-repo.png?w=730" src="https://catbirdanalytics.wordpress.com/wp-content/uploads/2021/08/github-push-existing-repo.png?w=1024" alt="" class="wp-image-935" width="768" height="435" srcset="https://catbirdanalytics.wordpress.com/wp-content/uploads/2021/08/github-push-existing-repo.png?w=1024 1024w, https://catbirdanalytics.wordpress.com/wp-content/uploads/2021/08/github-push-existing-repo.png?w=768 768w, https://catbirdanalytics.wordpress.com/wp-content/uploads/2021/08/github-push-existing-repo.png?w=1536 1536w, https://catbirdanalytics.wordpress.com/wp-content/uploads/2021/08/github-push-existing-repo.png?w=150 150w, https://catbirdanalytics.wordpress.com/wp-content/uploads/2021/08/github-push-existing-repo.png?w=300 300w, https://catbirdanalytics.wordpress.com/wp-content/uploads/2021/08/github-push-existing-repo.png?w=1440 1440w" sizes="(max-width: 768px) 100vw, 768px" /></figure>



<h3 class="wp-block-heading" id="supportfiles">Support Files (CSS, etc.)</h3>



<p class="wp-block-paragraph">Beyond the markdown settings/code that you add to the skeleton.Rmd file, you can also assemble support files that work in conjunction with skeleton.Rmd for extra bells and whistles. These are placed in the <strong>same directory as the skeleton.Rmd</strong> file and then referenced from within skeleton.Rmd. There is great reference info, with examples, on the <a href="https://rstudio.github.io/rstudio-extensions/rmarkdown_templates.html#supporting-files">R Markdown Templates website</a>.</p>
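<p class="wp-block-paragraph">To make the &#8220;same directory&#8221; point concrete, here is a minimal base-R sketch that lays out one template inside a package source folder, with support files sitting next to skeleton.Rmd. The helper <code>scaffold_template()</code>, the template name and the file names are hypothetical; the folder layout (<code>inst/rmarkdown/templates/&lt;template&gt;/template.yaml</code> plus a <code>skeleton/</code> folder) follows the RStudio template documentation linked above, so check it against that reference before relying on it.</p>

```r
## Hypothetical helper (not from the original post): lay out one template in a
## package source folder, following the RStudio template docs:
##   inst/rmarkdown/templates/<template>/template.yaml
##   inst/rmarkdown/templates/<template>/skeleton/skeleton.Rmd
## Support files (css, html snippets, _navbar.html) sit next to skeleton.Rmd.
scaffold_template <- function(pkg_dir, template, description,
                              support_files = character()) {
  tmpl_dir <- file.path(pkg_dir, "inst", "rmarkdown", "templates", template)
  skel_dir <- file.path(tmpl_dir, "skeleton")
  dir.create(skel_dir, recursive = TRUE, showWarnings = FALSE)

  ## template.yaml: name/description shown in RStudio's template list
  writeLines(c(paste0("name: ", template),
               paste0("description: ", description),
               "create_dir: false"),
             file.path(tmpl_dir, "template.yaml"))

  ## minimal skeleton.Rmd; edit it to add your own YAML and boilerplate
  writeLines(c("---", 'title: "Untitled"', "output: html_document", "---"),
             file.path(skel_dir, "skeleton.Rmd"))

  ## empty placeholder support files, to be filled in by hand
  for (f in support_files) file.create(file.path(skel_dir, f))

  invisible(skel_dir)
}
```

<p class="wp-block-paragraph">For example, <code>scaffold_template("mytemplates", "web-page", "Web page with nav and GTM", c("styles.css", "_navbar.html"))</code> would create the folders and empty placeholder files, ready to be filled in and reinstalled with <code>devtools::install()</code>.</p>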



<p class="wp-block-paragraph">For example, I need a template for web pages that use standard navigation and also include <a href="https://tagmanager.google.com/#/home">Google Tag Manager</a> code snippets for <strong>Google Analytics</strong>:</p>



<figure class="wp-block-image size-large is-resized"><img loading="lazy" data-attachment-id="937" data-permalink="https://catbirdanalytics.wordpress.com/r-markdown-template-support/" data-orig-file="https://catbirdanalytics.wordpress.com/wp-content/uploads/2021/08/r-markdown-template-support.png" data-orig-size="1884,726" data-comments-opened="1" data-image-meta="{&quot;aperture&quot;:&quot;0&quot;,&quot;credit&quot;:&quot;&quot;,&quot;camera&quot;:&quot;&quot;,&quot;caption&quot;:&quot;&quot;,&quot;created_timestamp&quot;:&quot;0&quot;,&quot;copyright&quot;:&quot;&quot;,&quot;focal_length&quot;:&quot;0&quot;,&quot;iso&quot;:&quot;0&quot;,&quot;shutter_speed&quot;:&quot;0&quot;,&quot;title&quot;:&quot;&quot;,&quot;orientation&quot;:&quot;0&quot;}" data-image-title="r-markdown-template-support" data-image-description="" data-image-caption="" data-medium-file="https://catbirdanalytics.wordpress.com/wp-content/uploads/2021/08/r-markdown-template-support.png?w=300" data-large-file="https://catbirdanalytics.wordpress.com/wp-content/uploads/2021/08/r-markdown-template-support.png?w=730" src="https://catbirdanalytics.wordpress.com/wp-content/uploads/2021/08/r-markdown-template-support.png?w=1024" alt="" class="wp-image-937" width="767" height="296" srcset="https://catbirdanalytics.wordpress.com/wp-content/uploads/2021/08/r-markdown-template-support.png?w=1024 1024w, https://catbirdanalytics.wordpress.com/wp-content/uploads/2021/08/r-markdown-template-support.png?w=767 767w, https://catbirdanalytics.wordpress.com/wp-content/uploads/2021/08/r-markdown-template-support.png?w=1534 1534w, https://catbirdanalytics.wordpress.com/wp-content/uploads/2021/08/r-markdown-template-support.png?w=150 150w, https://catbirdanalytics.wordpress.com/wp-content/uploads/2021/08/r-markdown-template-support.png?w=300 300w, https://catbirdanalytics.wordpress.com/wp-content/uploads/2021/08/r-markdown-template-support.png?w=1440 1440w" sizes="(max-width: 767px) 100vw, 767px" /><figcaption>Example 
using support files for an R Markdown template (_navbar.html is incorporated automatically, so it doesn&#8217;t need to be referenced)</figcaption></figure>
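<p class="wp-block-paragraph">Inside skeleton.Rmd, support files like these are typically referenced from the YAML header through <code>html_document</code> options such as <code>css</code> and <code>includes</code> (<code>in_header</code>, <code>before_body</code>). Google Tag Manager&#8217;s install instructions ask for one snippet in the page head and another right after the opening body tag, which map naturally onto those two <code>includes</code> slots. The sketch below builds such a header as text; the helper name and the file names (<code>styles.css</code>, <code>gtm-head.html</code>, <code>gtm-body.html</code>) are assumptions, not the files in the screenshot.</p>

```r
## Hypothetical helper: build a skeleton.Rmd YAML header that pulls in
## support files via html_document's css and includes options.
## File names are placeholders; _navbar.html is left out because
## html_document picks it up from the same folder on its own.
skeleton_header <- function(title = "Untitled", css = NULL,
                            in_header = NULL, before_body = NULL) {
  opts <- character()
  if (!is.null(css)) opts <- c(opts, paste0("    css: ", css))
  if (!is.null(in_header) || !is.null(before_body)) {
    opts <- c(opts, "    includes:")
    if (!is.null(in_header)) opts <- c(opts, paste0("      in_header: ", in_header))
    if (!is.null(before_body)) opts <- c(opts, paste0("      before_body: ", before_body))
  }
  ## with no options, fall back to the plain one-line output spec
  output <- if (length(opts) > 0) {
    c("output:", "  html_document:", opts)
  } else {
    "output: html_document"
  }
  c("---", sprintf('title: "%s"', title), output, "---")
}
```

<p class="wp-block-paragraph">Writing <code>skeleton_header(css = "styles.css", in_header = "gtm-head.html", before_body = "gtm-body.html")</code> to the top of skeleton.Rmd with <code>writeLines()</code> gives a header that pulls in all three support files.</p>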



<h2 class="wp-block-heading">Conclusion</h2>



<p class="wp-block-paragraph">So you can see there&#8217;s lots of power in having your own custom R Markdown templates&#8230; so go forth and templatize!</p>



<h2 class="wp-block-heading">Additional Resources</h2>



<p class="wp-block-paragraph">As always, gratitude to those who have provided guidance that I have leaned on:</p>



<ul class="wp-block-list"><li><a href="https://rstudio.github.io/rstudio-extensions/rmarkdown_templates.html#overview">RStudio RMarkdown Template reference</a></li><li><a href="https://chester.rbind.io/ecots2k16/template_pkg/">Creating a Basic Template Package in R</a></li><li><a href="https://kbroman.org/pkg_primer/pages/github.html">Putting your R Package on Github</a></li></ul>
]]></content:encoded>
					
					<wfw:commentRss>https://catbirdanalytics.wordpress.com/2021/08/16/how-to-create-a-custom-r-markdown-template/feed/</wfw:commentRss>
			<slash:comments>1</slash:comments>
		
		
		<post-id xmlns="com-wordpress:feed-additions:1">899</post-id>
		<media:content url="https://0.gravatar.com/avatar/61a7e5aa4e24773b513b314f89b5611e80124a6e923e7234e89ca72a387b08d1?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">john</media:title>
		</media:content>

		<media:content url="https://catbirdanalytics.wordpress.com/wp-content/uploads/2021/08/r-markdown-template-options.png?w=1024" medium="image" />

		<media:content url="https://catbirdanalytics.wordpress.com/wp-content/uploads/2021/08/rmarkdown-template-folder-structure.png?w=1024" medium="image" />

		<media:content url="https://catbirdanalytics.wordpress.com/wp-content/uploads/2021/08/yaml-header-custom-2.png?w=300" medium="image" />

		<media:content url="https://catbirdanalytics.wordpress.com/wp-content/uploads/2021/08/r-markdown-setup-2.png?w=300" medium="image" />

		<media:content url="https://catbirdanalytics.wordpress.com/wp-content/uploads/2021/08/r-markdown-text-codeblock2.png?w=952" medium="image" />

		<media:content url="https://catbirdanalytics.wordpress.com/wp-content/uploads/2021/08/template-yaml.png?w=1024" medium="image" />

		<media:content url="https://catbirdanalytics.wordpress.com/wp-content/uploads/2021/08/r-markdown-template-install.png?w=1024" medium="image" />

		<media:content url="https://catbirdanalytics.wordpress.com/wp-content/uploads/2021/08/template-in-rmarkdown-template-list-1.png?w=1024" medium="image" />

		<media:content url="https://catbirdanalytics.wordpress.com/wp-content/uploads/2021/08/github-push-existing-repo.png?w=1024" medium="image" />

		<media:content url="https://catbirdanalytics.wordpress.com/wp-content/uploads/2021/08/r-markdown-template-support.png?w=1024" medium="image" />
	</item>
		<item>
		<title>Publish R Markdown to WordPress site? Yes You Can!</title>
		<link>https://catbirdanalytics.wordpress.com/2021/08/02/publish-r-markdown-to-wordpress-site-yes-you-can/</link>
					<comments>https://catbirdanalytics.wordpress.com/2021/08/02/publish-r-markdown-to-wordpress-site-yes-you-can/#comments</comments>
		
		<dc:creator><![CDATA[John]]></dc:creator>
		<pubDate>Tue, 03 Aug 2021 01:30:31 +0000</pubDate>
				<category><![CDATA[R Markdown]]></category>
		<category><![CDATA[R Stats]]></category>
		<guid isPermaLink="false">http://catbirdanalytics.wordpress.com/?p=859</guid>

					<description><![CDATA[This post is based on an R Markdown file published directly to WordPress.com &#8211; and you can do it, too! It was made possible by the great work of others, shared in the following highly recommended resources: RWordPress pkg by Duncan Temple Lang (2012) &#8211; there was a helpful vignette but can&#039;t find it any&#8230; <a href="https://catbirdanalytics.wordpress.com/2021/08/02/publish-r-markdown-to-wordpress-site-yes-you-can/" class="more-link">Continue reading <span class="screen-reader-text">Publish R Markdown to WordPress site? Yes You&#160;Can!</span></a>]]></description>
										<content:encoded><![CDATA[<p>This post is based on an R Markdown file published directly to WordPress.com &#8211; and you can do it, too! It was made possible by the great work of others, shared in the following <strong>highly recommended</strong> resources:</p>
<ol>
<li><a href="https://github.com/duncantl/RWordPress">RWordPress pkg</a> by Duncan Temple Lang (2012) &#8211; there was a helpful vignette, but I can&#039;t find it any longer</li>
<li><a href="https://yihui.org/knitr/demo/wordpress/">Publish blog posts from R + knitr to WordPress</a> by Yihui Xie (2013)</li>
<li><a href="http://3.14a.ch/archives/2015/03/08/how-to-publish-with-r-markdown-in-wordpress/">How to Publish with R Markdown in WordPress</a> by 3.14a (2015)</li>
<li><a href="http://sites.tufts.edu/emotiononthebrain/2017/08/12/blog-posting-from-r-markdown-to-wordpress/">Blog Posting from R Markdown to WordPress</a> by Heather Urry (2017)</li>
<li><a href="https://tobiasdienlin.com/2019/03/08/how-to-publish-a-blog-post-on-wordpress-using-rmarkdown/">How to Publish a Blog Post on WordPress using RMarkdown</a> by Tobias Dienlin (2019)</li>
</ol>
<p>So nothing new here, other than a recap of the above and confirmation that, although the code behind this was produced in 2012-13, it <strong>still works as of August 2021</strong>. <img src="https://s0.wp.com/wp-content/mu-plugins/wpcom-smileys/twemoji/2/72x72/1f642.png" alt="🙂" class="wp-smiley" style="height: 1em; max-height: 1em;" /></p>
<p>Note that since my blog is hosted on <a href="https://wordpress.com">WordPress.com</a>, I&#039;m focusing on that environment. The resources above cover the variations required for a self-hosted <a href="https://wordpress.org/">WordPress.org</a> site. </p>
<p>The R Markdown file is available in this GitHub repo: <a href="https://github.com/jyuill/r-markdown-wordpress">https://github.com/jyuill/r-markdown-wordpress</a> </p>
<h2>Steps</h2>
<p>Summary of the steps to publish R Markdown on WordPress:</p>
<ol>
<li>Install packages needed.</li>
<li>Do some R Markdown &#8211; example case below.</li>
<li>Publish to WordPress &#8211; by running separate code using RWordPress pkg.</li>
</ol>
<p>The key thing to note here is that you are not directly uploading the HTML file that your R Markdown produces. Rather, you are knitting the .Rmd file in a way that publishes a post to your WordPress blog (either public or as a draft).  </p>
<h2>1. Install Packages</h2>
<p>The key to the process is the <strong>RWordPress</strong> package, supported by the XMLRPC package, both of which you can get using the code below (along with the knitr package):</p>
<pre><code class="r">## References
## http://tabvizexplorer.com/how-to-upload-r-markdown-directly-to-wordpress/

## Get required packages
if (!require(&#039;knitr&#039;)) {install.packages(&quot;knitr&quot;)}
if (!require(&#039;devtools&#039;)) {install.packages(&quot;devtools&quot;)}
if (!require(&#039;RWordPress&#039;)) {devtools::install_github(c(&quot;duncantl/XMLRPC&quot;, &quot;duncantl/RWordPress&quot;))}
</code></pre>
<p>As others have warned, the <strong>RWordPress package is no longer being maintained</strong> &#8211; so use at your own risk.</p>
<p>The other key is the <strong>knit2wp()</strong> function in the knitr package, hence the need to install knitr as well if you don&#039;t already have it.</p>
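<p>A small variation on the install code above: <code>require()</code> attaches each package as a side effect and only warns when one is missing, whereas <code>requireNamespace(quietly = TRUE)</code> simply reports whether the package can be loaded. The helper name <code>missing_pkgs()</code> is hypothetical; this is a sketch, not part of the original workflow.</p>

```r
## Hypothetical helper: return the packages that are not installed.
## requireNamespace() loads a namespace without attaching it to the
## search path, and returns FALSE (rather than erroring) if it is missing.
missing_pkgs <- function(pkgs) {
  pkgs[!vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)]
}

## Usage: install only what is missing. CRAN packages via install.packages();
## RWordPress and XMLRPC still come from GitHub, as in the code above.
## need <- missing_pkgs(c("knitr", "devtools"))
## if (length(need) > 0) install.packages(need)
## if (length(missing_pkgs("RWordPress")) > 0)
##   devtools::install_github(c("duncantl/XMLRPC", "duncantl/RWordPress"))
```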
<h2>2. Produce R Markdown</h2>
<p>As mentioned, this blog post is entirely created with R Markdown. For demonstrating data analysis features, I&#039;m walking through an example below based on my personal collection of <a href="https://github.com/jyuill/proj-r-van-weather/raw/master/output/van-weather.csv">Vancouver weather data</a> ;).</p>
<p>The idea here is to demonstrate some typical Markdown capabilities that you may want to have in your blog post:</p>
<ul>
<li>weaving together text and data processing</li>
<li>basic HTML elements like headings and bullet lists</li>
<li>show code chunks (with horizontal scrolling)</li>
<li>show data structure and summary
<ul>
<li>allow for side-scrolling</li>
</ul>
</li>
<li>data visualization &#8211; a couple of varieties</li>
<li>integrate dynamic values into the text flow (with inline R code)</li>
</ul>
<h3>Annual Temperature Data</h3>
<p>To explore the data across the period covered, summarize by year. (<em>mouseover the code block and drag to scroll horizontally</em>)</p>
<pre><code class="r">## summarize annual temperature
annual_summary &lt;- dataset %&gt;% filter(Year&lt;max(dataset$Year)) %&gt;% ## remove most recent yr, since incomplete data
  group_by(Year) %&gt;% summarize(max_temp=max(Max.Temp, na.rm=TRUE),
                               min_temp=min(Min.Temp, na.rm=TRUE),
                               mean_daily_temp=mean(Mean.Temp, na.rm=TRUE),
                               annual_precip=sum(Total.Precip, na.rm=TRUE)) 
</code></pre>
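<p>If you&#039;d rather not depend on dplyr, the same annual summary can be sketched in base R. This is an alternative for illustration, not the code used for this post; the column names come from the snippet above and the function name <code>annual_summary_base()</code> is made up.</p>

```r
## Base-R sketch of the dplyr annual summary (hypothetical helper).
## Expects the columns used above: Year, Max.Temp, Min.Temp,
## Mean.Temp and Total.Precip.
annual_summary_base <- function(dataset) {
  ## drop the most recent (incomplete) year, as in the dplyr version
  d <- dataset[dataset$Year < max(dataset$Year), ]
  by_year <- split(d, d$Year)
  out <- data.frame(
    Year            = as.numeric(names(by_year)),
    max_temp        = vapply(by_year, function(g) max(g$Max.Temp, na.rm = TRUE), numeric(1)),
    min_temp        = vapply(by_year, function(g) min(g$Min.Temp, na.rm = TRUE), numeric(1)),
    mean_daily_temp = vapply(by_year, function(g) mean(g$Mean.Temp, na.rm = TRUE), numeric(1)),
    annual_precip   = vapply(by_year, function(g) sum(g$Total.Precip, na.rm = TRUE), numeric(1))
  )
  rownames(out) <- NULL
  out
}
```

<p>Called as <code>annual_summary_base(dataset)</code>, it should return the same five columns as the dplyr version, one row per complete year.</p>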
<p>Check out the data structure: (<em>mouseover to scroll horizontally</em>)</p>
<pre>## tibble [51 × 5] (S3: tbl_df/tbl/data.frame)
##  $ Year           : num [1:51] 1970 1971 1972 1973 1974 ...
##  $ max_temp       : num [1:51] 30.6 29.4 27.2 28.9 27.2 27.8 25 30.5 30.1 28.7 ...
##  $ min_temp       : num [1:51] -7.2 -12.8 -11.1 -10.6 -8.3 -7.2 -6.7 -8.1 -13.5 -11.2 ...
##  $ mean_daily_temp: num [1:51] 9.43 9.09 9.07 9.59 9.91 ...
##  $ annual_precip  : num [1:51] 875 1335 1244 1000 1248 ...
</pre>
<p>Summary of the data:</p>
<pre>##       Year         max_temp       min_temp      mean_daily_temp annual_precip 
##  Min.   :1970   Min.   :25.0   Min.   :-15.20   Min.   : 8.92   Min.   : 810  
##  1st Qu.:1982   1st Qu.:27.9   1st Qu.:-11.25   1st Qu.: 9.88   1st Qu.:1029  
##  Median :1995   Median :28.8   Median : -9.10   Median :10.27   Median :1207  
##  Mean   :1995   Mean   :28.8   Mean   : -9.42   Mean   :10.30   Mean   :1174  
##  3rd Qu.:2008   3rd Qu.:29.6   3rd Qu.: -7.70   3rd Qu.:10.80   3rd Qu.:1296  
##  Max.   :2020   Max.   :34.4   Max.   : -3.10   Max.   :11.45   Max.   :1522
</pre>
<h4>Annual temperature patterns across the period:</h4>
<p>(<em>Give the code chunk a meaningful name, since that name will be used for the plot&#039;s image file when it is uploaded to WordPress</em>)</p>
<pre><code class="r">## visualize
tplot1 &lt;- annual_summary %&gt;% 
  ggplot(aes(x=Year))+
  geom_line(aes(y=mean_daily_temp), color=&#039;black&#039;)+
  geom_line(aes(y=max_temp), color=&#039;red&#039;)+
  geom_line(aes(y=min_temp), color=&#039;blue&#039;)+
  labs(title=&quot;Annual Max., Min., and Mean (Daily) Temperature&quot;,
       y=&#039;Temperature (celsius)&#039;)+
  theme_light()

tplot1
</code></pre>
<p><img src="https://catbirdanalytics.wordpress.com/wp-content/uploads/2021/12/temperature-plot-1-2.png" alt="plot of chunk temperature-plot" /></p>
<p>I&#039;m no climatologist, and this is very imprecise data exploration, but it appears there might be a slight, gradual upward drift in temperatures over this relatively short period.</p>
<h3>Annual Precipitation</h3>
<p>Vancouver is famous for its rainfall. How much rain has Vancouver received each year over the period covered, 1970 to 2020?<br />
(<em>years shown are based on dynamic reference to data source</em>)</p>
<p><img src="https://catbirdanalytics.wordpress.com/wp-content/uploads/2021/12/precipitation-plot-1-1.png" alt="plot of chunk precipitation-plot" /></p>
<p>That&#039;s right&hellip;Vancouver frequently gets <strong>over 1,000 mm &#8211; 1 meter &#8211; of rain</strong> in a year!</p>
<ul>
<li>average over the last 50 years: 1174.153 mm per year</li>
<li>median over the last 50 years: 1207 mm per year</li>
</ul>
<p>(<em>figures based on automatic calculations at run-time</em>)</p>
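<p>Inline R code (for example <code>`r round(mean(annual_summary$annual_precip), 1)`</code> in the .Rmd) is what lets figures like these update at run-time. Raw results such as 1174.153 can be tidied before they land in the text; the helpers below are a hypothetical sketch for formatting the year range and a rounded, unit-labelled total, not the code behind this post.</p>

```r
## Hypothetical helpers for inline values, e.g. in the .Rmd:
##   `r year_range(annual_summary$Year)`
##   `r fmt_mm(median(annual_summary$annual_precip))`
year_range <- function(years) paste(min(years), "to", max(years))

## Round, add a thousands separator and append the unit (millimetres)
fmt_mm <- function(x, digits = 0) {
  paste(format(round(x, digits), big.mark = ",", nsmall = digits), "mm")
}

year_range(c(1970, 1995, 2020))  # "1970 to 2020"
fmt_mm(1174.153)                 # "1,174 mm"
```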
<h3>Monthly Precipitation Patterns</h3>
<p>Surely the rain is not steady all year long? What are the monthly patterns?</p>
<p><img src="https://catbirdanalytics.wordpress.com/wp-content/uploads/2021/12/monthly-precipitation-plot-1-1.png" alt="plot of chunk monthly-precipitation-plot" /></p>
<p>Dry in summer, wet in winter &#8211; especially November, December and, to a lesser extent, January. By May, we&#039;re <em>usually</em> safely out of the rainy season, until October rolls around and the rains set in again.</p>
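<p>The code for the monthly chart isn&#039;t shown above. One way to get a comparable monthly profile is to total the rain for each calendar month within each year, then average those totals across years. Here is a base-R sketch, assuming the daily data has <code>Year</code>, <code>Month</code> and <code>Total.Precip</code> columns (<code>Month</code> is an assumption; it doesn&#039;t appear in the code shown above).</p>

```r
## Hypothetical sketch: average monthly precipitation across years.
## Assumes daily rows with Year, Month (1-12) and Total.Precip columns.
monthly_precip_profile <- function(dataset) {
  ## step 1: total precipitation per month within each year
  ##         (the formula interface drops rows with NA by default)
  ym <- aggregate(Total.Precip ~ Year + Month, data = dataset, FUN = sum)
  ## step 2: average those monthly totals across years
  aggregate(Total.Precip ~ Month, data = ym, FUN = mean)
}
```

<p>Summing first and then averaging keeps the result on a &#8220;typical mm per month&#8221; scale; averaging the daily values directly would instead give typical rain per day.</p>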
<h2>3. Publish to WordPress</h2>
<p>Once your R Markdown file is polished to your liking, you just need to run some simple code to send it to WordPress:</p>
<pre><code class="r">## 1. load libraries
library(RWordPress)
library(knitr)

## 2. set credentials (replace the &lt;placeholders&gt; with your own values)
options(WordPressLogin = c(&lt;username&gt; = &#039;&lt;pwd&gt;&#039;),
        WordPressURL = &#039;https://&lt;blog name&gt;.wordpress.com/xmlrpc.php&#039;)

## 3. set up plot uploads
##    - saves image files, named after their code chunks, to the WordPress Media section by date
opts_knit$set(upload.fun = function(file) {
  library(RWordPress)
  uploadFile(file)$url
})

## 4. knit to WP
knit2wp(&quot;&lt;markdown file name&gt;&quot;,        ## markdown file to publish
        title = &quot;&lt;post title&gt;&quot;,        ## title for the post in WordPress
        publish = FALSE,               ## FALSE to add as draft; TRUE to go direct to publish
        action = &quot;newPost&quot;,            ## for new post; alternatives: &quot;editPost&quot;, &quot;newPage&quot;
        # postid = &lt;post id&gt;,          ## needed with editPost - get from WP interface
        shortcode = FALSE,             ## optional - affects how code is shown; default FALSE
        categories = c(&#039;R Markdown&#039;)) ## set categories, if desired

## 5. (optional) upload a featured image / post thumbnail; to attach it, run this
##    before knit2wp() and add wp_post_thumbnail = postThumbnail$id to the call above
postThumbnail &lt;- RWordPress::uploadFile(&quot;figure/&lt;image file&gt;&quot;, overwrite = TRUE)
</code></pre>
<p>You can set up a standard file and substitute the markdown file name and title as needed. Of course, you want to <strong>keep it private, since your password is in there</strong> &#8211; add it to .gitignore if you have a public GitHub repo.</p>
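<p>One way to keep the password out of that file entirely is to read it from environment variables, for example ones set in a private <code>~/.Renviron</code> file. The helper names and the variable names <code>WP_USER</code> and <code>WP_PASSWORD</code> below are hypothetical; the only thing taken from the post is the shape of the <code>WordPressLogin</code> option, a named character vector <code>c(&lt;username&gt; = &#039;&lt;pwd&gt;&#039;)</code>.</p>

```r
## Hypothetical helpers: build the RWordPress options from environment
## variables (e.g. defined in ~/.Renviron) instead of hard-coding a password.
wp_credentials <- function(user_var = "WP_USER", pwd_var = "WP_PASSWORD") {
  user <- Sys.getenv(user_var)
  pwd  <- Sys.getenv(pwd_var)
  if (!nzchar(user) || !nzchar(pwd)) {
    stop("Set ", user_var, " and ", pwd_var, " (e.g. in ~/.Renviron) first.")
  }
  ## same shape as c(<username> = '<pwd>'): one value, named by the username
  stats::setNames(pwd, user)
}

wp_xmlrpc_url <- function(blog) {
  sprintf("https://%s.wordpress.com/xmlrpc.php", blog)
}

## Usage, in place of step 2 above:
## options(WordPressLogin = wp_credentials(),
##         WordPressURL = wp_xmlrpc_url("<blog name>"))
```

<p>The script itself then contains nothing secret, so it can be committed safely; only <code>~/.Renviron</code> needs to stay private.</p>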
<h2>Limitations</h2>
<p>Or at least things I haven&#039;t figured out how to do yet:</p>
<ol>
<li><strong>No interactive plotly plots</strong>: they render fine in R Markdown but don&#039;t make it to WP, probably because plots are published as static image files.</li>
<li><strong>Internal anchor links</strong>: these are tricky to set up. Using {#anchor-name} beside a heading works in R Markdown but shows up as literal text in WP; using HTML, such as an h2 tag with id=&ldquo;anchor-name&rdquo;, works in both WP and the R Markdown knit but doesn&#039;t show up as a heading in the document outline, which can be a pain for long docs.</li>
<li><strong>glimpse</strong> function: for some reason, using the glimpse() function seems to throw an error (this cost me several hours of troubleshooting).</li>
</ol>
<p>If I find solutions to these &#8211; or uncover more limitations &#8211; I&#039;ll pass along updates.</p>
<h2>Alternatives</h2>
<p>At the end of the day, this works, but it&#039;s not ideal and may not provide the flexibility everyone is looking for. There is also an element of risk, since the process depends on R packages that are no longer maintained.</p>
<p>I was keen to get this to work because I have a legacy WordPress blog that I want to build on, including accessing the features of the world&#039;s most popular content management system, and I want to extend it with work done in my primary environment &#8211; RStudio / RMarkdown.</p>
<p>Obviously, there are lots of other options, and warnings during the process suggest you may want to check out <strong>blogdown</strong> for building blogs/websites with R Markdown.</p>
<p>So we&#039;ll see where this goes, but in the meantime, hopefully there are some helpful tips here for others determined to make R Markdown work in WordPress.</p>
]]></content:encoded>
					
					<wfw:commentRss>https://catbirdanalytics.wordpress.com/2021/08/02/publish-r-markdown-to-wordpress-site-yes-you-can/feed/</wfw:commentRss>
			<slash:comments>1</slash:comments>
		
		
		<post-id xmlns="com-wordpress:feed-additions:1">859</post-id>
		<media:thumbnail url="https://catbirdanalytics.wordpress.com/wp-content/uploads/2021/12/temperature-plot-1-1.png" />
		<media:content url="https://catbirdanalytics.wordpress.com/wp-content/uploads/2021/12/temperature-plot-1-1.png" medium="image">
			<media:title type="html">temperature-plot-1.png</media:title>
		</media:content>

		<media:content url="https://0.gravatar.com/avatar/61a7e5aa4e24773b513b314f89b5611e80124a6e923e7234e89ca72a387b08d1?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">john</media:title>
		</media:content>

		<media:content url="https://catbirdanalytics.wordpress.com/wp-content/uploads/2021/12/temperature-plot-1-2.png" medium="image">
			<media:title type="html">plot of chunk temperature-plot</media:title>
		</media:content>

		<media:content url="https://catbirdanalytics.wordpress.com/wp-content/uploads/2021/12/precipitation-plot-1-1.png" medium="image">
			<media:title type="html">plot of chunk precipitation-plot</media:title>
		</media:content>

		<media:content url="https://catbirdanalytics.wordpress.com/wp-content/uploads/2021/12/monthly-precipitation-plot-1-1.png" medium="image">
			<media:title type="html">plot of chunk monthly-precipitation-plot</media:title>
		</media:content>
	</item>
		<item>
		<title>I&#8217;m Begging You: STOP using Text in Your Presentation Slides</title>
		<link>https://catbirdanalytics.wordpress.com/2021/07/31/im-begging-you-stop-using-text-in-your-presentation-slides/</link>
					<comments>https://catbirdanalytics.wordpress.com/2021/07/31/im-begging-you-stop-using-text-in-your-presentation-slides/#respond</comments>
		
		<dc:creator><![CDATA[John]]></dc:creator>
		<pubDate>Sat, 31 Jul 2021 18:43:18 +0000</pubDate>
				<category><![CDATA[Analytics Management]]></category>
		<category><![CDATA[Data Presentation]]></category>
		<guid isPermaLink="false">http://catbirdanalytics.wordpress.com/?p=784</guid>

					<description><![CDATA[Ever since the dawn of Powerpoint, expert advice has been consistent: when giving a presentation, minimize the use of text on your slides. There are lots of well-documented reasons for this. It basically comes down to the fact that you want your audience to listen to you and take in what you are saying, not&#8230; <a href="https://catbirdanalytics.wordpress.com/2021/07/31/im-begging-you-stop-using-text-in-your-presentation-slides/" class="more-link">Continue reading <span class="screen-reader-text">I&#8217;m Begging You: STOP using Text in Your Presentation&#160;Slides</span></a>]]></description>
										<content:encoded><![CDATA[
<p class="wp-block-paragraph">Ever since the dawn of PowerPoint, expert advice has been consistent: <strong>when giving a presentation, minimize the use of text on your slides</strong>. There are lots of <a href="https://www.speakwithpersuasion.com/lot-of-text-on-slides/">well-documented reasons</a> for this. It basically comes down to the fact that you want your audience to <em>listen</em> to you and take in what you are saying, not be distracted by reading a slide behind you. The human brain can&#8217;t process listening and reading at the same time. And we know that text is not necessary for communication: we listen to podcasts, we watch TED Talks, we listen to political speeches. Heck, I&#8217;ve even watched entire 2-hour movies and understood the whole thing without any text on the screen at all. </p>



<p class="wp-block-paragraph">And yet&#8230;we persist with the illusion that having text for people to read &#8211; while we&#8217;re talking &#8211; will reinforce our message and drive home our key points. It certainly can&#8217;t hurt to have a few words up there, we tell ourselves. It will help fill up the space so people aren&#8217;t staring into the abyss, we tell ourselves. It&#8217;s a win-win, we tell ourselves, because it will both help the audience get what we&#8217;re talking about AND give us some cues to stay on track. </p>



<p class="wp-block-paragraph">Ah, there we have it: <strong>the real reason we have text on our slides is not to help our audience, it&#8217;s to help <em>us</em></strong>. <strong>And that is the worst reason of all.</strong> </p>



<p class="wp-block-paragraph">The curse of slide text hit home for me during a recent presentation from a colleague. There were some helpful diagrams and charts that set the context well for the customer lifetime value model that he was presenting. There was also a collection of bullet points. Not a lot of text, probably a similar amount to what I have used myself <em>many </em>times in the past. But <strong>it was just enough that, as he was talking, I couldn&#8217;t help but try to read, and as I read I noticed I wasn&#8217;t understanding what I was reading, because he was talking. </strong>And as I switched to listening to him, to try to get back on track, I realized I had missed part of what he said because I was trying to read. So now I was completely lost. As he finished his explanation and asked if there were any questions, I was left staring at the screen, realizing I had no clue what point he was trying to make. Then on to the next slide, with more of the same, eventually leaving me (and I&#8217;m sure most of the others present) with only a vague idea of the project he was describing and an unsettling feeling that this was probably something we should know about. Maybe that was why there was only silence when he asked for questions at the end.  </p>



<p class="wp-block-paragraph">It was a wake-up call. All the advice over the years fully, finally sank in. As did an immediate feeling of guilt for all the confusion and poor communication I had spawned over the years, doing exactly the same as my colleague, thinking I was being effective, when I should have known better.</p>



<p class="wp-block-paragraph">Our job as communicators &#8211; whether of analytics concepts or anything else &#8211; is to <strong>remove the friction </strong>between us, our message and our audience. We need to get out of our own way. Removing text from slides so the focus can be squarely on our speech is the first, easiest step. Here are some <strong>quick suggestions</strong> &#8211; consult an expert for more. (Basically just do whatever <a href="https://www.duarte.com/">Nancy Duarte</a> says ;))</p>



<ol class="wp-block-list"><li><strong>Visuals:</strong> Slides are for images, charts, diagrams and other helpful visual aids, not text.</li><li><strong>Notes:</strong> Speaking notes go in the notes section <em>below </em>the slide.</li><li><strong>Separate decks </strong>for presenting and reading: If the deck will be distributed later for reference or for those not able to attend the presentation, provide a stand-alone version of the deck that has the text included. This could be as simple as copying, pasting and formatting your speaking notes into the main slides.</li><li><strong>Pause: </strong>If you need to have text on a slide for the audience to read during your presentation, pause and ask them to read it before moving on. That way there is at least less competition between your speech and the text.</li></ol>



<p class="wp-block-paragraph">Of course, there is a <em>lot </em>more that goes into a stellar presentation than ruthlessly removing text from slides. But it is a critical step, and one that I am planning to lean heavily on, starting with a presentation I have next week. Cheers!</p>
]]></content:encoded>
					
					<wfw:commentRss>https://catbirdanalytics.wordpress.com/2021/07/31/im-begging-you-stop-using-text-in-your-presentation-slides/feed/</wfw:commentRss>
			<slash:comments>0</slash:comments>
		
		
		<post-id xmlns="com-wordpress:feed-additions:1">784</post-id>
		<media:content url="https://0.gravatar.com/avatar/61a7e5aa4e24773b513b314f89b5611e80124a6e923e7234e89ca72a387b08d1?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">john</media:title>
		</media:content>
	</item>
		<item>
		<title>Getting % New Visit Trend in SiteCatalyst</title>
		<link>https://catbirdanalytics.wordpress.com/2012/01/07/getting-new-visit-trend-in-sitecatalyst/</link>
					<comments>https://catbirdanalytics.wordpress.com/2012/01/07/getting-new-visit-trend-in-sitecatalyst/#comments</comments>
		
		<dc:creator><![CDATA[John]]></dc:creator>
		<pubDate>Sun, 08 Jan 2012 06:43:58 +0000</pubDate>
				<category><![CDATA[SiteCatalyst]]></category>
		<category><![CDATA[Web Analytics]]></category>
		<category><![CDATA[new visitors]]></category>
		<category><![CDATA[return visitors]]></category>
		<category><![CDATA[visits]]></category>
		<guid isPermaLink="false">http://catbirdanalytics.wordpress.com/?p=676</guid>

					<description><![CDATA[Not exactly cutting-edge SiteCatalyst manipulation, but a basic yet not-so-obvious technique that may be helpful, particularly to new users migrating from something like Google Analytics. [Wow, long time no blog post. Good to be back at it&#8230;all part of the New Year plan!] One of the most basic segmentation dimensions in web analytics is the differentiation&#8230; <a href="https://catbirdanalytics.wordpress.com/2012/01/07/getting-new-visit-trend-in-sitecatalyst/" class="more-link">Continue reading <span class="screen-reader-text">Getting % New Visit Trend in&#160;SiteCatalyst</span></a>]]></description>
										<content:encoded><![CDATA[<p><strong>Not exactly cutting-edge SiteCatalyst manipulation, but a basic yet not-so-obvious technique that may be helpful, particularly to new users migrating from something like Google Analytics.</strong></p>
<p style="text-align:center;"><img loading="lazy" class="aligncenter" title="New-visit-snapshot" src="https://i0.wp.com/farm8.staticflickr.com/7149/6655989213_eef15ea9e5.jpg" alt="Percent new visits" width="354" height="72" /></p>
<p>[<em>Wow, long time no blog post. Good to be back at it&#8230;all part of the New Year plan!</em>]</p>
<p>One of the most basic segmentation dimensions in web analytics is the differentiation between new and return visits. Knowing how new and return visitors behave differently can be critical to understanding how people are interacting with your site, and then to optimizing accordingly, based on your objectives. As always, different web analytics tools deal with new vs return visitor metrics in different ways. SiteCatalyst, for one, does not make it easy to tell at a glance what percentage of your visits are new or return visits.</p>
<p>With Google Analytics, % new visits is right there in the default dashboard. In GA v5, it even gets its own pie chart, in addition to a sparkline, right out of the box:</p>
<p><figure style="width: 240px" class="wp-caption aligncenter"><img loading="lazy" class=" " title="Percent New Visits - GA" src="https://i0.wp.com/farm8.staticflickr.com/7015/6657565425_088d65f4ae_m.jpg" alt="Percent New Visits - GA" width="240" height="40" /><figcaption class="wp-caption-text">% New Visits on Google Analytics dashboard</figcaption></figure></p>
<p>Meanwhile, in SiteCatalyst (at least up to v14), things are not as clear. This is not to say information on new vs return visits is not available: indeed, there is a slew of reports in the &#8216;Visitor Retention&#8217; section, such as Return Frequency, Return Visits, Daily Return Visits and Visit Number, that provide some great, detailed insight into visitor behaviour. But these suffer from a lack of context or an unnecessary level of granularity if I&#8217;m just trying to gauge the overall balance of site traffic in order to decide where to prioritize my efforts.</p>
<p>Probably the best option is the getNewRepeat plug-in that Omniture offers (search for &#8216;getnewrepeat&#8217; in the Omniture Knowledge Base), but this requires additional implementation and may not be feasible.</p>
<p>So here is a little tip to demonstrate how to fiddle with SiteCatalyst just a bit to get an easy-access report on <strong>% New Visits</strong>:</p>
<p>1. Go to &#8216;<strong>Visitor Retention &gt; Visit Number</strong>&#8217; &#8211; select &#8216;Visits (Report Specific)&#8217; as your metric (of course, this same process could be used for Revenue or any other metric). If you select &#8216;Percent&#8217; in the Graph options, you can see right away that x% of visits are &#8216;1st Visit&#8217;, i.e. NEW visits/visitors. But this is just a snapshot for the selected period, and it is usually more informative to see the trend over time.</p>
<p><figure style="width: 464px" class="wp-caption aligncenter"><a href="http://www.flickr.com/photos/41416404@N02/6655947321/"><img loading="lazy" class=" " title="Percent New Visit Ranked" src="https://i0.wp.com/farm8.staticflickr.com/7145/6655947321_971e7c2c44.jpg" alt="Percent new visit ranked" width="464" height="158" /></a><figcaption class="wp-caption-text">% Visits ranked by Visit Number in SiteCatalyst (1st Visit = New Visits)</figcaption></figure></p>
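<p>Behind the &#8216;Percent&#8217; view, the number is simple arithmetic: 1st Visit visits divided by total visits for the period. Here is a minimal Python sketch of that calculation, assuming the Visit Number report has been exported as a mapping from visit-number label to visit count. The function name, labels and figures are hypothetical illustrations, not part of SiteCatalyst, and the result only matches the report if every visit number (including any &#8216;Other&#8217; bucket) is in the total.</p>

```python
def percent_new_visits(visits_by_number: dict[str, int]) -> float:
    """Return the share of visits that were '1st Visit' (new visits), in percent.

    visits_by_number is a hypothetical export of the Visit Number report:
    visit-number label -> visits for one period, covering ALL visit numbers.
    """
    total = sum(visits_by_number.values())
    if total == 0:
        return 0.0  # avoid dividing by zero for an empty period
    return 100.0 * visits_by_number.get("1st Visit", 0) / total


# Hypothetical counts for one reporting period (not real data).
snapshot = {
    "1st Visit": 620,
    "2nd Visit": 180,
    "3rd Visit": 90,
    "4th Visit": 60,
    "5th Visit": 50,
}
print(f"{percent_new_visits(snapshot):.1f}% new visits")  # 62.0% new visits
```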
<p>2. Using the &#8216;<strong>Trended / Ranked&#8217; option, select &#8216;Trended&#8217;</strong> &#8211; now you can see how the % 1st Visit (NEW visits) is changing over time and take action accordingly.</p>
<p><figure style="width: 450px" class="wp-caption aligncenter"><img loading="lazy" title="Percent New Visit Trend" src="https://i0.wp.com/farm8.staticflickr.com/7009/6657688641_f81b0c10d0.jpg" alt="Percent New Visit Trend" width="450" height="178" /><figcaption class="wp-caption-text">% Visits by Visit Number Trend in SiteCatalyst</figcaption></figure></p>
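<p>The trended view repeats the same division for each period in the date range. A rough Python sketch, assuming hypothetical exported rows of (period start, visit-number label, visits); the function and field names are illustrative only and do not come from SiteCatalyst itself:</p>

```python
from collections import defaultdict
from datetime import date


def percent_new_trend(rows: list[tuple[date, str, int]]) -> dict[date, float]:
    """Percent of visits that were '1st Visit' for each period, oldest first.

    rows: hypothetical export of (period_start, visit_number_label, visits).
    """
    totals: dict[date, int] = defaultdict(int)
    new_visits: dict[date, int] = defaultdict(int)
    for period, label, visits in rows:
        totals[period] += visits
        if label == "1st Visit":
            new_visits[period] += visits
    # Divide per period, so each percentage uses that period's own total.
    return {
        period: (100.0 * new_visits[period] / total) if total else 0.0
        for period, total in sorted(totals.items())
    }


# Hypothetical monthly rows (not real data).
rows = [
    (date(2011, 10, 1), "1st Visit", 700),
    (date(2011, 10, 1), "2nd Visit", 300),
    (date(2011, 11, 1), "1st Visit", 600),
    (date(2011, 11, 1), "2nd Visit", 400),
]
for period, pct in percent_new_trend(rows).items():
    print(period.isoformat(), f"{pct:.1f}%")
# 2011-10-01 70.0%
# 2011-11-01 60.0%
```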
<p>3. Add the report to your bookmarks or a dashboard for quick access in future, and &#8216;Bob&#8217;s your uncle&#8217;.</p>
<p>A couple of other related things:</p>
<ul>
<li>as bonus info, by default SiteCatalyst shows the % of visits for each of the top 5 visit numbers. You can use &#8216;Select Item&#8217; on the &#8216;Report&#8217; tab to filter on &#8216;1st Visit&#8217; only if you want to focus attention, but keeping the higher visit numbers visible can signal where increases or decreases in % New Visits are coming from or going to.</li>
<li>if you want to look beyond visits to outcomes (which of course we do!), it is simply a matter of selecting another metric using &#8216;Add Metrics&#8217;; you can then see what <strong>% of revenue</strong>, for example, came from new visits vs 2nd visits, etc.</li>
</ul>
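<p>The same arithmetic works for any metric added this way, such as revenue. A minimal Python sketch, assuming a hypothetical export of a metric by visit number, that keeps the five largest visit numbers (loosely mirroring the top-5 default mentioned above) and groups the remainder as &#8216;Other&#8217;. The names and figures are made up for illustration:</p>

```python
def metric_share_by_visit_number(
    metric_by_number: dict[str, float], top_n: int = 5
) -> list[tuple[str, float]]:
    """Percent of a metric (visits, revenue, ...) contributed by each visit number.

    Keeps the top_n largest visit numbers and groups the rest as 'Other'.
    metric_by_number is a hypothetical export, not a SiteCatalyst API.
    """
    total = sum(metric_by_number.values())
    if total == 0:
        return []
    # Rank visit numbers by their contribution, largest first.
    ranked = sorted(metric_by_number.items(), key=lambda kv: kv[1], reverse=True)
    shares = [(label, 100.0 * value / total) for label, value in ranked[:top_n]]
    rest = sum(value for _, value in ranked[top_n:])
    if rest:
        shares.append(("Other", 100.0 * rest / total))
    return shares


# Hypothetical revenue by visit number for one period (not real data).
revenue = {
    "1st Visit": 4000.0,
    "2nd Visit": 3000.0,
    "3rd Visit": 1500.0,
    "4th Visit": 800.0,
    "5th Visit": 400.0,
    "6th Visit": 300.0,
}
for label, pct in metric_share_by_visit_number(revenue):
    print(f"{label}: {pct:.1f}% of revenue")
```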
<h3>So What?</h3>
<p>So now that we are able to quickly monitor our % new visits, the big question becomes: <strong>what to do about it?</strong> Depending on your goals, starting point, and the context of other metrics, the interesting pattern shown in the chart above may be <strong>good</strong> or <strong>bad</strong>. If you are prioritizing acquisition of new customers, a declining trend in % new visits may indicate a lack of success in your efforts. If you are focused on developing relationships with customers and bringing them back to your site, a decrease in % new visits may be a reason to pat yourself on the back and continue your visitor retention efforts. It&#8217;s <strong>all about balance</strong>, though: the trend toward the end of the chart may be a signal to keep an eye on this metric and watch for the point where it makes sense to step up visitor acquisition initiatives to fuel longer-term growth.</p>
<p>A company I know was focused on attracting new visitors, and the trends for both new visits and overall visits were up, up, up. But a quick look at the % New Visits report, showing nearly 80% New Visits, highlighted the fact that few of these new visitors were returning: time to re-balance efforts to ensure that new visitors are finding content that will keep them coming back.</p>
<p>So being able to track your % New Visits can be quite useful, and in case you didn&#8217;t know how to do this easily within SiteCatalyst, now you do. <img src="https://s0.wp.com/wp-content/mu-plugins/wpcom-smileys/twemoji/2/72x72/1f642.png" alt="🙂" class="wp-smiley" style="height: 1em; max-height: 1em;" /></p>
]]></content:encoded>
					
					<wfw:commentRss>https://catbirdanalytics.wordpress.com/2012/01/07/getting-new-visit-trend-in-sitecatalyst/feed/</wfw:commentRss>
			<slash:comments>2</slash:comments>
		
		
		<post-id xmlns="com-wordpress:feed-additions:1">676</post-id>
		<media:content url="https://0.gravatar.com/avatar/61a7e5aa4e24773b513b314f89b5611e80124a6e923e7234e89ca72a387b08d1?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">john</media:title>
		</media:content>

		<media:content url="http://farm8.staticflickr.com/7149/6655989213_eef15ea9e5.jpg" medium="image">
			<media:title type="html">New-visit-snapshot</media:title>
		</media:content>

		<media:content url="http://farm8.staticflickr.com/7015/6657565425_088d65f4ae_m.jpg" medium="image">
			<media:title type="html">Percent New Visits - GA</media:title>
		</media:content>

		<media:content url="http://farm8.staticflickr.com/7145/6655947321_971e7c2c44.jpg" medium="image">
			<media:title type="html">Percent New Visit Ranked</media:title>
		</media:content>

		<media:content url="http://farm8.staticflickr.com/7009/6657688641_f81b0c10d0.jpg" medium="image">
			<media:title type="html">Percent New Visit Trend</media:title>
		</media:content>
	</item>
		<item>
		<title>Rise and Fall of the Yahoo! Web Analytics Forum</title>
		<link>https://catbirdanalytics.wordpress.com/2011/06/26/rise-and-fall-of-the-yahoo-web-analytics-forum/</link>
					<comments>https://catbirdanalytics.wordpress.com/2011/06/26/rise-and-fall-of-the-yahoo-web-analytics-forum/#comments</comments>
		
		<dc:creator><![CDATA[John]]></dc:creator>
		<pubDate>Sun, 26 Jun 2011 18:23:17 +0000</pubDate>
				<category><![CDATA[Web Analytics]]></category>
		<category><![CDATA[community]]></category>
		<category><![CDATA[forum]]></category>
		<guid isPermaLink="false">http://catbirdanalytics.wordpress.com/?p=671</guid>

					<description><![CDATA[The Web Analytics Forum on Yahoo! Groups was started back in 2004 by one of the giants of the web analytics industry, Eric Peterson. Since then, it has been a major hub for web analytics community news, events, and discussion &#8211; so much so that Avinash Kaushik included &#8220;You have not only heard of&#8230; <a href="https://catbirdanalytics.wordpress.com/2011/06/26/rise-and-fall-of-the-yahoo-web-analytics-forum/" class="more-link">Continue reading <span class="screen-reader-text">Rise and Fall of the Yahoo! Web Analytics&#160;Forum</span></a>]]></description>
										<content:encoded><![CDATA[<p style="text-align:center;"><img loading="lazy" class="aligncenter" title="Monthly Trend Yahoo Web Analytics Group" src="https://i0.wp.com/farm7.static.flickr.com/6060/5873280761_d0cc1dd86a.jpg" alt="" width="450" height="224" /></p>
<p style="text-align:justify;">The <a href="http://tech.groups.yahoo.com/group/webanalytics/">Web Analytics Forum</a> on Yahoo! Groups was started back in 2004 by one of the giants of the web analytics industry, <a href="http://www.webanalyticsdemystified.com/about/web-analytics-demystified-team.asp">Eric Peterson</a>. Since then, it has been a major hub for web analytics community news, events, and discussion &#8211; so much so that Avinash Kaushik included &#8220;You have <strong>not only heard of the Yahoo! Web Analytics group, but 20 minutes of each day is spent reading</strong> all the posts&#8221; as item #9 in his <a href="http://www.kaushik.net/avinash/2006/06/top-ten-signs-you-are-a-great-analyst.html">Top Ten: Signs You are a Great Analyst</a>.</p>
<p style="text-align:justify;">After being out of the loop of the Web Analytics Forum for a while due to some email changes, I was recently reminded of the relevance of this forum and subscribed again for daily updates. While doing so, I noticed some interesting stats on message history that are shown at the bottom of the home page. Being the type of person who processes data much better when it is visually represented, rather than being shown as numbers in a table (as I believe most of us are), I dropped the data into Excel and created the quick chart shown above.</p>
<p style="text-align:justify;">I was disheartened to see that although message volume on the forum grew fairly steadily from inception in 2004 up until April 2008, it has trended down since then. I have to admit that I have never posted a message to the forum, so I am as much a part of the problem as anybody, but from a community member&#8217;s perspective it is disappointing to see this resource waning. No doubt it is at least partly a result of the rise of social media and the resulting fragmentation of communications, but the <strong>Web Analytics Forum has a valuable role to play</strong>, in my opinion, as a central repository of current &#8211; and historical &#8211; information related to the practice of web analytics.</p>
<p style="text-align:justify;">So I will continue following the Web Analytics Forum, looking for opportunities to jump in when I have something useful to add, and I hope others will do the same.</p>
<p style="text-align:justify;">And to the <strong>forum founder Eric Peterson, as well as all those who have followed in his footsteps and contributed valuable information and insights to the Web Analytics Forum in the past, present, and future&#8230;THANK YOU!</strong></p>
]]></content:encoded>
					
					<wfw:commentRss>https://catbirdanalytics.wordpress.com/2011/06/26/rise-and-fall-of-the-yahoo-web-analytics-forum/feed/</wfw:commentRss>
			<slash:comments>4</slash:comments>
		
		
		<post-id xmlns="com-wordpress:feed-additions:1">671</post-id>
		<media:content url="https://0.gravatar.com/avatar/61a7e5aa4e24773b513b314f89b5611e80124a6e923e7234e89ca72a387b08d1?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">john</media:title>
		</media:content>

		<media:content url="http://farm7.static.flickr.com/6060/5873280761_d0cc1dd86a.jpg" medium="image">
			<media:title type="html">Monthly Trend Yahoo Web Analytics Group</media:title>
		</media:content>
	</item>
	</channel>
</rss>
