Web analysis with R - Part 3

Web analysis with R – part 3

In our last blog post we described, how the statistical software R can be linked to the tracking tool Google Analytics. This post will show you how to pull more data from Google Analytics and save it in R in tabular form. In addition, other basic functions in R will be presented step by step.

In web analytics, it often happens that the performance marketing department or the management wants to know how the online performance does in relation to their strategic goals such as sales of certain products or services compared to the previous week.

In the first step, two data frames (linking variables and factors or linking metrics and dimensions – see Web Analytics with R – Part 1) are built in the R programming language. It is assumed that Google Analytics is already linked with R.

Then, these two data frames, which contain the web analytics data from the last completed week and the previous week, are formatted and compared. For this, the functions for calculating the sum and mean are introduced.

In addition, a new variable (CVR – performance KPI from web analytics) is added to the two data frames.

To conclude the performance comparison, the delta of the data frames is subtracted from each other.

Part A - Google Analytics metrics & combine dimensions in data frame

First, you should decide which KPIs matter. For the sake of simplicity, we will only use the metrics in this example:

These have the following “API names” in Google Analytics:

  • ga:sessions
  • ga:bounceRate
  • ga:transactions
  • ga:transactionRevenue

The spelling is important, because the R queries regarding the Google Analytics package used must adhere to Google’s spelling conventions.

Thus, the first R query is as follows:

GA_DataFrame_KW3 <- google_analytics_4(KontoName_ViewID, date_range = c(“2018-01-21”, “2018-01-27”), metrics = c(“sessions”,”bounceRate”, “transactions”, “transactionRevenue”), dimensions = c(“day”))

GA Data Frame

Figure 1: Web Analysis with R – Data Frame with Google Analytics Data

To get the data of the previous week run this query again, but you have to change the time interval.

Thus, the first R query is as follows:

GA_DataFrame_KW2 <- google_analytics_4(KontoName_ViewID, date_range = c(“2018-01-14”, “2018-01-20”), metrics = c(“sessions”, “bounceRate”, “transactions”, “transactionRevenue”), dimensions = c(“day”))

Part B - Formatting the Google Analytics Data Frames

Since management, the performance marketing department or the web analyst himself would like to present the results in his usual form, the two data frames are formatted.

(I) Column headings:
a) Edit column header in R - function edit:

The easiest way to edit the column header is to use the R command “GA_DataFrame_KW3 <- edit(GA_DataFrame_KW3).

After entering and confirming the R command, the table with Google Analytics dimensions and metrics will appear in the data editor:


Figure 2: Web Analysis with R – Edit Column Heading

Here you can select and edit the desired column heading via a mouse click.


Figure 3: Web Analysis with R – Open Variable Editor

In this case, you should change:

  • sessions in Sessions
  • bounceRate in BounceRate
  • transactions in Transactions
  • transactionRevenue in Revenue

The new name of the columns is displayed directly in the R-Console.

GA Data Frame

Figure 4: Web Analytics with R – Edit Column Heading 2

b) Edit column header in R - function names

Another way to edit the headers is to enter:

names(GA_DataFrame_KW2) <- c(“Tag” , “Sessions”, “BounceRate”, “Transactions”, “Revenue”)

In this example “day” was converted to “Tag”. Here it is important that you enter the exact number of column names inside the quotes in the combine (c()) function. Even if only one column is to be changed, the other columns must also be specified.

If it is not known in advance how many columns are stored in a data frame and what their names are, you can enter “names(GA_DataFrame_KW2)” in the R-Console. Then all existing columns and their names will appear.

If less column names than available are assigned, then all missing columns receive the designation “NA”.

GA Data Frame

Figure 5: Web analytics with R – missing designations

c) Edit column header in R - function names [Position].

The third version presented in this article, how to edit labels within a data frame, is an extension of the function “names”. Namely, it is possible to specify the position.

names(GA_DataFrame_KW2) [1] <- “Day”

By entering the number 1 in the square bracket, Tag is renamed to “Day” again. This only selects the first variable -position 1- and overwrites or renames it.

(II) Adjust decimal places - function format

To adjust the decimal places the function format(x, ….) is used.

GA Data Frame

Figure 6: Web Analysis with R – Editing Decimal Places

As you can see in the graphic, the values of the bounce rate have only 2 decimal places. This is done by setting the argument within the function digit=4 (only 4 characters are displayed).

Why was the column variable “BounceRate” changed?

In the first part of the command “GA_DataFrame_KW2$BounceRate” was specified. The dollar sign tells R that within the data frame “GA_DataFrame_KW2” the variable “BounceRate” is to be selected.

Mean value

In order to communicate to management and the online marketing department how performance has changed from week 2 to 3, the sum or mean value for the selected Google Analytics metrics must now be calculated.

a) Calculate sum with the function sum

To calculate the respective sum of Sessions, Transactions and Revenue use the following commands:

  • GA_DataFrame_SUM_Sessions <-Summe(GA_DataFrame_KW2$Sessions)
  • GA_DataFrame_SUM_Transaktionen <- Summe(GA_DataFrame_KW2$Transaktionen)
  • GA_DataFrame_SUM_Umsatz <- Summe(GA_DataFrame_KW2$Umsatz)
b) Calculate mean value with the function mean

To calculate the average value for the BounceRate use the following command:

  • GA_DataFrame_MEAN_BounceRate <- mean(GA_DataFrame_KW2$BounceRate)

Remark – conversion of a variable:

When calculating the mean value of the bounce rate, an error message will appear at this point.

GA Data Frame

This is due to the fact that by formatting the decimal places using the function “format(x,…)” the data type of the variable “BounceRate” was converted from “numeric” to “character”. This means that it is no longer possible to calculate the mean value.

Therefore you have to convert the bounce rate again. This is done with the following command:

GA_DataFrame_KW2$BounceRate <- as.numeric(GA_DataFrame_KW2$BounceRate)

The function “as.numeric(x)” converts the bounce rate from the data frame back to numeric values.

This can be checked with the function “class(x)”.

GA Data Frame

Figure 8: Web Analysis with R – Checking Variable Type

Now the mean value or other calculations can be performed.

Part D - create more variables

Variable added to both Google Analytics data frames. The new variable is the conversion rate (CVR), which is a key performance indicator (KPI).

By means of the conversion rate, the stakeholder as well as the web analyst can usually recognise how effective the traffic is int terms of conversions. For example, if the number of successful online sales is a goal defined in the KPI framework, the CVR can be used to determine how many sessions are needed on average to realise an online sale.

The formula for this looks as follows:

CVR Formel: CVR ist gleich Anzahl Online Verkäufe geteilt durch Anzahl der Sitzungen mal 100

To now include the conversion rate as a new column in both data frames, you can use the following query to retrieve the CVR from Google Analytics and store it in a variable.

  • GA_DataFrame_KW3_CVR <- google_analytics_4(account_name_ViewID, date_range = c(“2018-01-21”, “2018-01-27”), metrics = c(“transactionPerSession“), dimensions = c(“day”))
  • CVR <- GA_DataFrame_KW3_CVR$transactionsPerSession
  • GA_DataFrame_KW3$CVR <- CVR

Another possibility would be to calculate the CVR directly in both already created Data Frames. For the Data Frame “GA_DataFrame_KW2“, the calculation of the conversion rate including the creation of a new column looks like this:

GA_DataFrame_KW2$CVR <- GA_DataFrame_KW2$Transactions/GA_DataFrame_KW2$Sessions*100

Within the data frame you create “GA_DataFrame_KW2” a new column “CVR” and this gets the content transaction divided of the sessions.
If the Data Frame is called, then the CVR is indicated and in the sixth column.

GA Data Frame

Figure 9: Output of the values

Part E - Comparison of calendar weeks

In the last part of this article, we want to compare the two calendar weeks with each other in terms of metrics from Google Analytics. This allows you to see whether your online store has changed positively or negatively in its performance.

In the first step, calculate the difference for the five selected web analytics metrics as well as a dimension “tag” and assign these to new variables:

  • delta_Sessions <- GA_DataFrame_KW3$Sessions-GA_DataFrame_KW2$Session
  • delta_BounceRate <- GA_DataFrame_KW3$ BounceRate -GA_DataFrame_KW2$BounceRate
  • delta_Transactions <- GA_DataFrame_KW3$ Transactions-GA_DataFrame_KW2$Transactions
  • delta_Revenue <- GA_DataFrame_KW3$ Revenue -GA_DataFrame_KW2$Revenue
  • delta_CVR <- GA_DataFrame_KW3$CVR-GA_DataFrame_KW2$CVR
  • Tag <- c(14,15,16,17,18,19,20)

In the second step, create a new data frame “DataFrame_GA_Delta” and assign the six new variables to the Data Frame.

DataFrame_GA_Delta <- data.frame(Tag, delta_Sessions, delta_BounceRate, delta_Transactions, delta_Revenue)

Data Frame

When looking at sales and transactions, you can tell the stakeholders that, except for the first day, revenue has always been more than the previous week.

Of course, in reality, such an analysis looks at other factors. However, this post should have introduced you to new features for dealing with R.


Stay up to date!

More blog posts