BI for Beginers Part 2

MRD_MEME
Following up from last weeks post this post discuses user metrics. User Metrics (UM) group users by their activity dates. The most classic of these is daily active users (DAU). To define DAU requires some app./game context; normally I use “user has a login event on the calendar day”. However, if your app. is always live/ users are by default logged in, you might have to find some other “session start event”. The basic SQL for DAU is:

“SELECT date(login_time) as day, count(distinct id) as DAU FROM login_table GROUP BY 1 ORDER BY 1;”

DAU has many traps because it takes everybody on board and to honest not everybody is worth it. Many users may install and open your app and ether leave soon or have problems loading. The first group indicate that the app store description was not adequate as they do not want the app. The second group generally results from users have devices that the app does not support.  If either group is too large then User Acquisition (UA) methods should be changed, however that is a whole other post.

To stabilize the DAU it is helpful to exclude users; for model games this could be users that have not passed the first time user experience (FTUE) / tutorial. Users that have passed the tutorial I call DLU, Daily Leveled Users.  A query for DLU could be:

“SELECT date(login_time) as day, count(distinct id) as DAU FROM login_table where id in (select id from user_table where level = 1) GROUP BY 1 ORDER BY 1;”

Another helpful user aggregate is the Daily Monetized Users (DMU). DMU are users that have given the app money in some way (either through IAP or advertising). DAU and DLU are essentially a nuance to get DMU. DMU can be divided into Daily Active Customers (DAC) and Daily Active Viewer (DAV), these are users that have made an IAP and have vied and ad, respectively. A user does not have to make the IAP or view the ad on the day they login to be a DAC or DAV they only have to have done it before or on that day! A query for DAM, DAC, DAV should be something like:

“SELECT date(login_time) as day,
count(distinct case when LTV > 0 THEN id END) as DAM,
count(distinct case when IAP_LTV > 0 THEN id END) as DAC,
count(distinct case when ads_LTV > 0 THEN id END) as DAV
FROM user_login GROUP BY 1 ORDER BY 1;”

You might have to do some joins depending on your data schema, but  I would recommend just adding user LTV for IAP and ads to your login table. I was first first educated on the value of thinking in terms of DAC by my old boss; later I expanded the idea to DAM and DAV. Warning, the behavior of DAC, DAM, and DAV are heavily influenced by user age but I will expand on that later (it will be clearer with a few more KPI).

For users that accomplish or do something on the same day they login the analogous KPI are Daily Dollar User (DDU) [I know that is a stupid acronym but there are not enough words that mean monetize!],  Daily Paying User (DPU), and Daily Viewing User (DVU). There are users that give mony in any way on the day, users that make an IAP on the day, and users that view an ad on the day, respectively.

To clarify all users are DAU, users that finish the tutorial are DLU, users that make an IAP are DAC and users that view ads are DAV. The day a user makes either an IAP or views an add it is as DDU (DPU for IAP and DVU for ads). A user can be a DAU, DAC, DPU, and DVU for example, i.e., if you add all of these user KPI you will get a larger number then the number of distinct logins for a day.

While all of these KPI can inform how a game is doing in general they are used in ratios with each other. It can also be useful to construct the weakly and monthly versions of these KPI. Normally I change the “D” to a “W” or “M” for weekly or monthly. Calculating the KPI can be difficult because there is no count distinct window function. Self join can be used but in general it does make a very large intermediate table. As usual stack overflow comes to the rescue with an elegant approach. A clear benefit of the weekly KPI is that they smooth out day of week effects as less obvious benefit is that they let you dead pan conversation about the WACs per DAC (Weakly active costumers per daily active customer, if it is greater then 7 your customers are loging in less the once a week!)  which is also awesome.

The KPI can also be used in ratios with pay metrics such as Total revenue from the day (REV), IAP revenue (REV-IAP) from the day, and ad revenue (REV-ads) from the day. The most common is the average revenue per daily active users (ARPDAU) this is REV / DAU.

I tend to think in terms of ARPDAM however as noted before ARPDAM is influenced by user age. If users are younger the ARPDAM will be artificially high. This is because the ARPDAM is a sort of blend between the ARPDDU and ARPDAU; with new users the ARPDAM is closer to ARPDDU.

With regards to the “ARP-KPI you can get variance for free because the variance of the mean is expected to be \sigma^{2} / n where \sigma^{2} is variance of the sample. Of course \sigma^{2} is an estimate but in most practical applications n >> 1000 so for practical purposes it can be treated as a consent. The following Query gives ARPDAU with both mean and standard error for each day.

“SELECT Day,  AVG_REV, SD_REV/sqrt(NU::float) as SD_AVG_REV
FROM
(
SELECT Q1.day, count(distinct Q1.id) as NU, avg(coalesce(REV,0.0)) as      AVG_REV,   STDDEV_SAMP(coalesce(REV,0.0)) as SD_REV
FROM
(
(SELECT id, date(login_time) as day
FROM user_login
GROUP BY 1,2) Q1
LEFT JOIN
(SELECT id, date(rev_time) as day, sum(revenue) as REV
FROM game_revenue
GROUP BY 1,2) Q2
ON Q1.teamid = Q2.teamid AND Q1.day = Q2.day
)
GROUP BY 1
)
ORDER BY 1”

Finally it may be helpful to summarize the ARP-KPIs in terms of averages. This must be done in a weighted way; that is if there are 100k users on day x and 50k users on day y then day x must have twice the weight of day y. This is easiest to accomplish from the SQL, i.e.:

“SELECT AVG_REV as AVG_ARPDAU, SD_REV/sqrt(user_Days) as AVG_ARPDAU_SD
FROM
(SELECT AVG(COALECSE(REV,0)) as AVG_REV, COUNT(*) as user_Days,  STDDEV_SAMP(COALECSE(REV,0)) as SD_REV
FROM
(
(SELECT id, date(login_time) as day
FROM ks2_dau
GROUP BY 1,2) Q1
LEFT JOIN
(SELECT id, date(rev_time) as day, sum(revenue) as REV
FROM game_revenue
GROUP BY 1,2) Q2
ON Q1.teamid = Q2.teamid AND Q1.day = Q2.day
)
)

Cheers Gavin

Tune in Next week for more BI the Bayesian Way.