Analyzing Disparity Between Users and Journalists Ratings

Randomized Shuffling As My First Systematic Analysis

Intro

In this post we analyze and compare the Metacritic ratings of the best games of 2019, as given by both the gaming community and professional journalists. The post is divided into three sections: data preprocessing, exploratory analysis, and systematic test. I published a similar analysis, covering everything up to the exploratory analysis, in this Reddit post. To my surprise, the community shared great insights and feedback with me. If you have already seen my notebook on Reddit, skip to the systematic test. Each section ends with a summary in case you do not want to read the whole kernel; you can reach the summaries through the table of contents below. At the other extreme, if you are willing to read every detail, the full source code used in this blog post is available here.

In a nutshell, the goals of this blog post are:

  • Analyze the top games that got high ratings from professional critics but not from the community of users.
  • Analyze the top games that got high ratings from the community of users but not from professional critics.
  • Graph the percentage of games whose disparity between critics and users is low, moderate, or high.
  • Repeat the above steps on four platforms, namely PS4, Xbox One, Switch, and PC, then compare them.
  • Apply a permutation-based systematic test with p-values to the distributions of each pair of platforms.

What I Have Learned From Reddit Community

In this section I highlight and review the comments from the Reddit community that I found most useful. I quote only the user's name and summarize the comment. To see the full comment, just CTRL+F the name on the Reddit page. After each summary, I note what I learned and how the analysis could be further improved accordingly. However, none of these improvements is implemented here.

ArtKorvalay A gamer who dislikes a game but sees no outrage from the community does not add his voice. A gamer who feels lukewarm about a game but sees outrage from the community adds a negative voice.

We could analyze ratings according to whether the community hyped the game or was outraged by it. That way, we might reach a more accurate analysis.

ArtKorvalay Some games, like Disco Elysium, get played only by those who like that genre. A game with humble 2D graphics like this one will not be played by a casual gamer who is attracted by marketing and high-end graphics. So a gamer who chooses to play it must be a fan of that style. As a result, the game gets rated only by those who like it. Hence, the ratings are biased.

We could analyze ratings according to whether a game's marketing budget is high or low. That way, we might reach a more accurate analysis.

EoceneMiacid The Terminator: Resistance case is typical of the disparity between users and critics. Have a look here at how the problem was highlighted by the media.

Exploring this case study might reveal new insights, as it is typical of the disparity between users and professional critics. We might test new techniques on it and see how they perform. Testing analysis techniques on a case we already know about should help us detect mistakes in our analysis.


Table of Contents

Preface

Data Preprocessing

Exploratory Analysis

Systematic Test


Import Libraries and Local Files

# 3rd-party libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
# local-files
import jsonRW as jsRW
import graphs.pie as pieGraph
import graphs.categoricalHeatmap as categoricalHeatmapGraph
import graphs.groupedBars as groupedBarsGraph
import graphs.histogramPdf as histogramPdfGraph
import transformations.transformations as transform
import transformations.discretizeIntoCategories as discIntCat
import statTests.permutationTest as permTest

Read Data

Read Local JSON Data Into a Pandas Dataframe

# a map from each platform to its corresponding dataframe
platform_df = {}
# platform names and their corresponding data file names
platformsNames = ['ps4', 'xbox', 'switch', 'pc']
filesNames = ['ps4.csv', 'xbox.csv', 'switch.csv', 'pc.csv']

# for each platform
for name in platformsNames:
    # read its local json file
    metacritic_list = jsRW.readJson(name)
    # parse it as pandas dataframe, then map platform name to it
    platform_df[name] = pd.DataFrame(metacritic_list)
# take a look at a dataframe
platform_df['ps4']

critic_rating id release_date title user_rating
0 91 1 Jul 2, 2019 final fantasy xiv: shadowbringers 8.3
1 91 2 Feb 26, 2019 nier: automata - game of the yorha edition 8.5
2 91 3 Jan 25, 2019 resident evil 2 8.8
3 90 4 Mar 22, 2019 sekiro: shadows die twice 7.9
4 89 5 Sep 6, 2019 monster hunter: world - iceborne 8.4
... ... ... ... ... ...
335 39 336 Oct 15, 2019 zombieland: double tap - road trip 4.6
336 37 337 Mar 5, 2019 left alive 8.3
337 36 338 Mar 5, 2019 eternity: the last unicorn 3.8
338 31 339 May 30, 2019 dayz 2.8
339 31 340 Mar 29, 2019 where the bees make honey 3.2

340 rows × 5 columns


Data Cleansing

# drop unneeded columns and re-organize them 
for name in platformsNames:
    platform_df[name] = platform_df[name][['title', 'user_rating', 'critic_rating']]
# take a look at a dataframe, again
platform_df['ps4']

title user_rating critic_rating
0 final fantasy xiv: shadowbringers 8.3 91
1 nier: automata - game of the yorha edition 8.5 91
2 resident evil 2 8.8 91
3 sekiro: shadows die twice 7.9 90
4 monster hunter: world - iceborne 8.4 89
... ... ... ...
335 zombieland: double tap - road trip 4.6 39
336 left alive 8.3 37
337 eternity: the last unicorn 3.8 36
338 dayz 2.8 31
339 where the bees make honey 3.2 31

340 rows × 3 columns

remarks

  • user_rating must be on the same scale as critic_rating
  • data types need to be checked
# check columns data types
platform_df['ps4'].dtypes
title            object
user_rating      object
critic_rating    object
dtype: object
# convert ratings into a numeric value
#      error ahead!
#df['user_rating'] = pd.to_numeric(df['user_rating'])
#df['critic_rating'] = pd.to_numeric(df['critic_rating'])
# get rid of user_rating with value equal to "tbd"

# for each platform
for name in platformsNames:
    # get its dataframe
    df = platform_df[name]
    # get index set in which user_rating is tbd, a non-numeric value
    tbdIndex = df[df['user_rating']=="tbd"].index
    # drop rows specified by indices in which user_rating is tbd
    df = df.drop(labels=tbdIndex, axis='index')
    # set updated data to platform_df
    platform_df[name] = df
# convert ratings to a numeric type

# for each platform
for name in platformsNames:
    # get its dataframe
    df = platform_df[name]
    # convert to a numeric type
    df['user_rating'] = pd.to_numeric(df['user_rating'])
    df['critic_rating'] = pd.to_numeric(df['critic_rating'])
    # set updated data to platform_df
    platform_df[name] = df
# check data types
platform_df['ps4'].dtypes
title             object
user_rating      float64
critic_rating      int64
dtype: object
# user ratings must be on the same scale as critics ratings, so we multiply them by 10

# for each platform
for platformName in platform_df:
    platform_df[platformName]['user_rating'] = platform_df[platformName]['user_rating'] * 10
platform_df['ps4']

title user_rating critic_rating
0 final fantasy xiv: shadowbringers 83.0 91
1 nier: automata - game of the yorha edition 85.0 91
2 resident evil 2 88.0 91
3 sekiro: shadows die twice 79.0 90
4 monster hunter: world - iceborne 84.0 89
... ... ... ...
335 zombieland: double tap - road trip 46.0 39
336 left alive 83.0 37
337 eternity: the last unicorn 38.0 36
338 dayz 28.0 31
339 where the bees make honey 32.0 31

310 rows × 3 columns
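
The cleansing steps above (drop "tbd", convert to numbers, rescale user ratings) can also be sketched as one small function. This is only an illustration, not the code used for the results in this post: the function name clean_ratings and the sample rows are hypothetical, and it assumes "tbd" is the only kind of non-numeric entry that needs to be discarded.

```python
import pandas as pd

def clean_ratings(df):
    """Return a copy with numeric ratings on a shared 0-100 scale (hypothetical helper)."""
    out = df[['title', 'user_rating', 'critic_rating']].copy()
    # non-numeric entries such as "tbd" become NaN instead of raising an error
    out['user_rating'] = pd.to_numeric(out['user_rating'], errors='coerce')
    out['critic_rating'] = pd.to_numeric(out['critic_rating'], errors='coerce')
    # drop rows where either rating is missing
    out = out.dropna(subset=['user_rating', 'critic_rating'])
    # user ratings are on a 0-10 scale, critic ratings on a 0-100 scale
    out['user_rating'] = out['user_rating'] * 10
    return out

# made-up sample rows, only to show the behavior
sample = pd.DataFrame({
    'title': ['game a', 'game b', 'game c'],
    'user_rating': ['8.3', 'tbd', '4.6'],
    'critic_rating': ['91', '70', '39'],
})
cleaned = clean_ratings(sample)
print(cleaned)
```

Using errors='coerce' avoids the error mentioned earlier when calling pd.to_numeric directly, at the cost of silently discarding any other malformed value, so it is worth checking how many rows were dropped.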


Optional: Store Cleaned Data Into a CSV File

"""
# store data to a csv file

# for each platform
for platformName in platform_df:
    # save to a csv file
    platform_df[platformName].to_csv(str(platformName)+'.csv')
"""

Data Preprocessing Summary

  • Data stored in JSON format are read into pandas dataframes, and can optionally be stored as csv
  • Unneeded columns are dropped
  • Ratings are converted to suitable numeric data types

Compute Disparity (Difference) Between Users and Critics

# for each platform
for name in platform_df:
    # get dataframe of the platform
    df = platform_df[name]
    # for each record, compute the absolute difference between user and critic ratings, then store it in a new column
    df['userCritic_difference'] = (df['user_rating'] - df['critic_rating']).abs()
    # assign updates back to our dataframe
    platform_df[name] = df
platform_df['ps4']

title user_rating critic_rating userCritic_difference
0 final fantasy xiv: shadowbringers 83.0 91 8.0
1 nier: automata - game of the yorha edition 85.0 91 6.0
2 resident evil 2 88.0 91 3.0
3 sekiro: shadows die twice 79.0 90 11.0
4 monster hunter: world - iceborne 84.0 89 5.0
... ... ... ... ...
335 zombieland: double tap - road trip 46.0 39 7.0
336 left alive 83.0 37 46.0
337 eternity: the last unicorn 38.0 36 2.0
338 dayz 28.0 31 3.0
339 where the bees make honey 32.0 31 1.0

310 rows × 4 columns

Discretize Disparity Computed Earlier Into Categories

# category names and their corresponding intervals
# the category at position x covers values greater than or equal to intervals_categories[x] and less than intervals_categories[x + 1]
# except for the last category, which has no upper bound
categories = pd.Series(["low", "moderate", "high", "very_high", "extremely_high"])
intervals_categories = [0, 20, 30, 40, 50]
# compute categories as defined earlier

# loop on platforms
for platformName in platform_df:
    # get dataframe of the platform
    df = platform_df[platformName]
    # add category based on difference just defined
    df['difference_category'] = df.apply(discIntCat.numToCat, axis=1, args=('userCritic_difference', categories, intervals_categories))
    
    # let categories be recognized by pandas
    df['difference_category'] = df['difference_category'].astype("category")
    # re-order categories
    df['difference_category'] = df['difference_category'].cat.set_categories(categories, ordered=True)
    
    
    # assign back to our dataframe
    platform_df[platformName] = df
# take a look after our new columns added
platform_df['ps4']

title user_rating critic_rating userCritic_difference difference_category
0 final fantasy xiv: shadowbringers 83.0 91 8.0 low
1 nier: automata - game of the yorha edition 85.0 91 6.0 low
2 resident evil 2 88.0 91 3.0 low
3 sekiro: shadows die twice 79.0 90 11.0 low
4 monster hunter: world - iceborne 84.0 89 5.0 low
... ... ... ... ... ...
335 zombieland: double tap - road trip 46.0 39 7.0 low
336 left alive 83.0 37 46.0 very_high
337 eternity: the last unicorn 38.0 36 2.0 low
338 dayz 28.0 31 3.0 low
339 where the bees make honey 32.0 31 1.0 low

310 rows × 5 columns
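
The helper discIntCat.numToCat is a local module whose source is not shown here. As a rough equivalent, the same left-closed intervals can be sketched with pandas' built-in pd.cut. The variable names bin_edges and differences are hypothetical, and the open-ended last interval is expressed with np.inf.

```python
import numpy as np
import pandas as pd

categories = ["low", "moderate", "high", "very_high", "extremely_high"]
# left edges from the post, plus an open-ended upper bound for the last category
bin_edges = [0, 20, 30, 40, 50, np.inf]

# a few disparity values to illustrate each category
differences = pd.Series([8.0, 20.0, 39.0, 46.0, 69.0])
# right=False makes each interval closed on the left: [0, 20), [20, 30), ...
binned = pd.cut(differences, bins=bin_edges, labels=categories, right=False)
print(binned.tolist())
```

pd.cut returns an ordered categorical when labels are given, so the separate astype("category") and set_categories steps would not be needed with this approach.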

Sort According to Disparity Between Users and Critics

# for each platform
for platformName in platform_df:
    # get platform dataframe
    df = platform_df[platformName]
    # sort it by userCritic_difference
    df = df.sort_values(axis=0, by='userCritic_difference', ascending=False)
    # assign sorted dataframe back to our dataframe
    platform_df[platformName] = df

Basic Stats on Disparity Between Users and Critics

# for each platform
for platformName in platform_df:
    # print platform name
    print("\n", "on ", platformName)
    # show basic stat
    print(platform_df[platformName]['userCritic_difference'].describe())
 on  ps4
count    310.000000
mean      15.893548
std       13.074530
min        0.000000
25%        5.000000
50%       12.000000
75%       23.000000
max       69.000000
Name: userCritic_difference, dtype: float64

 on  xbox
count    186.000000
mean      14.801075
std       13.192881
min        0.000000
25%        5.000000
50%       11.000000
75%       21.000000
max       69.000000
Name: userCritic_difference, dtype: float64

 on  switch
count    364.000000
mean       6.876374
std        8.741062
min        0.000000
25%        1.750000
50%        4.000000
75%        9.000000
max       58.000000
Name: userCritic_difference, dtype: float64

 on  pc
count    327.000000
mean      13.547401
std       12.322982
min        0.000000
25%        4.000000
50%       10.000000
75%       19.000000
max       63.000000
Name: userCritic_difference, dtype: float64

Categories Size

Platform x Category 2D Sizes Dataframe

platform_category_size = transform.map_columnCount(platform_df, 'difference_category')
platform_category_size

low moderate high very_high extremely_high
ps4 211 52 33 9 5
xbox 131 31 14 5 5
switch 334 19 6 2 3
pc 249 43 22 8 5

Category x Platform 2D Sizes Dataframe

category_platform_size = platform_category_size.transpose()
category_platform_size

ps4 xbox switch pc
low 211 131 334 249
moderate 52 31 19 43
high 33 14 6 22
very_high 9 5 2 8
extremely_high 5 5 3 5
category_platform_size.loc['low', 'ps4']
211
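
transform.map_columnCount is a local helper whose source is not shown in this post. A minimal sketch of how such a platform-by-category table could be built with plain pandas follows. The function name category_counts and the tiny sample frames are hypothetical; it only assumes each dataframe has the column being counted.

```python
import pandas as pd

def category_counts(platform_frames, column, categories):
    """Build a platform x category table of row counts (hypothetical helper)."""
    rows = {}
    for name, df in platform_frames.items():
        # count how many games fall in each category on this platform
        counts = df[column].value_counts()
        # keep a fixed category order and show 0 for categories with no games
        rows[name] = counts.reindex(categories, fill_value=0)
    # the dict keys become columns, so transpose to get platforms as rows
    return pd.DataFrame(rows).T

# made-up sample data, only to show the shape of the result
frames = {
    'ps4': pd.DataFrame({'difference_category': ['low', 'low', 'high']}),
    'switch': pd.DataFrame({'difference_category': ['low', 'moderate']}),
}
table = category_counts(frames, 'difference_category', ['low', 'moderate', 'high'])
print(table)
```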

Graphing Disparity Between Users and Critics

Pie Graph

for columnName in category_platform_size:
    platSeries = category_platform_size[columnName]
    platName = platSeries.name
    pieGraph.showPieGraph(platSeries, platName + ' categories percentages', 6, 6)

[Figure: four pie charts of categories percentages, one each for ps4, xbox, switch, and pc]

Grouped Bar

groupedBarsGraph.showGroupedBars(platform_category_size, platformsNames, 'categories size', 'categories size by platform')

[Figure: grouped bar chart, categories size by platform]

Categorical Heatmap

categoricalHeatmapGraph.showCategoricalHeatmap(8, 8, category_platform_size, "categories sizes among platforms")

[Figure: categorical heatmap, categories sizes among platforms]

Maximum Disparity Between Users and Critics Ratings

platform_df['ps4'].head(20)

title user_rating critic_rating userCritic_difference difference_category
93 nba 2k20 9.0 78 69.0 extremely_high
82 fifa 20 11.0 79 68.0 extremely_high
116 madden nfl 20 16.0 76 60.0 extremely_high
79 gravity ghost: deluxe edition 27.0 79 52.0 extremely_high
172 simulacra 21.0 72 51.0 extremely_high
36 mortal kombat 11 33.0 82 49.0 very_high
58 call of duty: modern warfare 32.0 80 48.0 very_high
199 hitman hd enhanced collection 21.0 69 48.0 very_high
217 mxgp 2019 20.0 68 48.0 very_high
224 we. the revolution 20.0 67 47.0 very_high
218 giga wrecker alt. 20.0 67 47.0 very_high
336 left alive 83.0 37 46.0 very_high
65 dauntless 34.0 80 46.0 very_high
285 darksiders iii: keepers of the void 18.0 59 41.0 very_high
286 attack of the earthlings 20.0 59 39.0 high
299 the lego movie 2 videogame 18.0 57 39.0 high
275 asterix & obelix xxl 3: the crystal menhir 21.0 60 39.0 high
68 lonely mountains: downhill 41.0 80 39.0 high
256 a knight's quest 24.0 63 39.0 high
21 far: lone sails 44.0 83 39.0 high
platform_df['xbox'].head(20)

title user_rating critic_rating userCritic_difference difference_category
70 nba 2k20 11.0 80 69.0 extremely_high
74 fifa 20 11.0 79 68.0 extremely_high
75 madden nfl 20 20.0 79 59.0 extremely_high
9 mortal kombat 11 31.0 86 55.0 extremely_high
115 timespinner 23.0 74 51.0 extremely_high
169 wolfenstein: youngblood 20.0 68 48.0 very_high
57 call of duty: modern warfare 37.0 81 44.0 very_high
10 nhl 20 42.0 85 43.0 very_high
143 disney classic games: aladdin and the lion king 30.0 72 42.0 very_high
24 far: lone sails 43.0 84 41.0 very_high
45 dauntless 43.0 82 39.0 high
106 far cry new dawn 37.0 75 38.0 high
76 grid 45.0 79 34.0 high
248 wwe 2k20 11.0 45 34.0 high
92 assassin's creed iii remastered 43.0 77 34.0 high
160 genesis alpha one 35.0 69 34.0 high
17 trials rising 52.0 85 33.0 high
19 crash team racing: nitro-fueled 51.0 84 33.0 high
204 narcos: rise of the cartels 30.0 63 33.0 high
7 fell seal: arbiter's mark 54.0 86 32.0 high
platform_df['pc'].head(20)

title user_rating critic_rating userCritic_difference difference_category
203 nba 2k20 11.0 74 63.0 extremely_high
234 fifa 20 11.0 72 61.0 extremely_high
237 madden nfl 20 12.0 72 60.0 extremely_high
77 call of duty: modern warfare 25.0 81 56.0 extremely_high
48 mortal kombat 11 27.0 82 55.0 extremely_high
279 hearthstone: heroes of warcraft - saviors of u... 19.0 68 49.0 very_high
19 the sims 4: realm of magic 37.0 85 48.0 very_high
264 wolfenstein: youngblood 22.0 69 47.0 very_high
383 left alive 86.0 40 46.0 very_high
96 bury me, my love 34.0 80 46.0 very_high
1 red dead redemption 2 48.0 93 45.0 very_high
235 oninaki 31.0 72 41.0 very_high
47 the sims 4: discover university 41.0 82 41.0 very_high
360 wolfenstein: cyberpilot 15.0 54 39.0 high
187 assassin's creed iii remastered 36.0 75 39.0 high
181 defector 36.0 75 39.0 high
68 the elder scrolls online: dragonhold 42.0 81 39.0 high
130 surviving mars: green planet 40.0 78 38.0 high
28 dirt rally 2.0 46.0 84 38.0 high
150 plants vs. zombies: battle for neighborville 40.0 77 37.0 high
platform_df['switch'].head(20)

title user_rating critic_rating userCritic_difference difference_category
240 nba 2k20 15.0 73 58.0 extremely_high
66 pillars of eternity: complete edition 27.0 82 55.0 extremely_high
94 pokemon sword / shield dual pack 29.0 80 51.0 extremely_high
416 catan 18.0 61 43.0 very_high
123 mortal kombat 11 36.0 78 42.0 very_high
87 pokemon shield 44.0 80 36.0 high
91 pokemon sword 45.0 80 35.0 high
492 fifa 20: legacy edition 9.0 43 34.0 high
476 devil may cry 2 84.0 50 34.0 high
345 giga wrecker alt. 35.0 67 32.0 high
359 wolfenstein: youngblood 35.0 65 30.0 high
280 mutant year zero: road to eden - deluxe edition 42.0 71 29.0 moderate
237 dauntless 44.0 73 29.0 moderate
398 rad rodgers: radical edition 34.0 62 28.0 moderate
432 farming simulator 20 32.0 59 27.0 moderate
466 whipseey and the lost atlas 80.0 54 26.0 moderate
263 my time at portia 46.0 72 26.0 moderate
374 deponia 38.0 64 26.0 moderate
496 car mechanic simulator 15.0 41 26.0 moderate
499 blades of time 63.0 38 25.0 moderate
# look up a single game by its exact title on one platform
def searchforTitleInPlatform(platformStr_in, game_in):
    df = platform_df[platformStr_in]
    tem_df = df[df['title'] == game_in]
    if len(tem_df) == 1:
        # exactly one match: return its row as a Series
        return tem_df.iloc[0]
    elif len(tem_df) == 0:
        # no match: signal it with -1
        return -1
    else:
        raise ValueError("unexpected number of games found")
        
searchforTitleInPlatform('xbox', "hitman hd enhanced collection")
title                    hitman hd enhanced collection
user_rating                                         46
critic_rating                                       66
userCritic_difference                               20
difference_category                           moderate
Name: 187, dtype: object

Minimum Disparity Between Users and Critics

platform_df['ps4'].tail(20)

title user_rating critic_rating userCritic_difference difference_category
55 five nights at freddy's vr: help wanted 78.0 80 2.0 low
251 metal wolf chaos xd 62.0 63 1.0 low
201 erica 70.0 69 1.0 low
113 blazing chrome 75.0 76 1.0 low
207 sea of solitude 68.0 69 1.0 low
88 blasphemous 77.0 78 1.0 low
339 where the bees make honey 32.0 31 1.0 low
22 bloodstained: ritual of the night 84.0 83 1.0 low
317 eden-tomorrow 53.0 52 1.0 low
78 knights and bikes 78.0 79 1.0 low
38 efootball pes 2020 81.0 82 1.0 low
75 children of morta 78.0 79 1.0 low
306 ice age: scrat's nutty adventure 56.0 55 1.0 low
110 motogp 19 76.0 76 0.0 low
188 lost ember 70.0 70 0.0 low
198 chocobo's mystery dungeon: every buddy! 69.0 69 0.0 low
330 submersed 44.0 44 0.0 low
214 effie 68.0 68 0.0 low
76 star wars jedi: fallen order 79.0 79 0.0 low
56 blood & truth 80.0 80 0.0 low

Games Which Got Higher Ratings From Users Than From Critics

def higherUserRatings(platform_in):
    return platform_df[platform_in][platform_df[platform_in]['user_rating'] > platform_df[platform_in]['critic_rating']].head(10)
    
higherUserRatings('pc')

title user_rating critic_rating userCritic_difference difference_category
383 left alive 86.0 40 46.0 very_high
376 paranoia: happiness is mandatory 71.0 47 24.0 moderate
355 little misfortune 80.0 57 23.0 moderate
348 terminator: resistance 82.0 59 23.0 moderate
384 eternity: the last unicorn 61.0 39 22.0 moderate
341 summer catchers 83.0 61 22.0 moderate
365 bannermen 72.0 52 20.0 moderate
374 i love you, colonel sanders! a finger lickin' ... 68.0 50 18.0 low
344 medieval kingdom wars 77.0 60 17.0 low
302 outbuddies 83.0 66 17.0 low
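
Because userCritic_difference is an absolute value, it does not say which side rated a game higher. A signed difference makes both directions from the goals list easy to extract. The sketch below is only an illustration: the function name top_disagreements and the column signed_difference are hypothetical, and the sample rows reuse four of the PC values shown in the tables of this post.

```python
import pandas as pd

def top_disagreements(df, favored_by, n=10):
    """Return the n games most favored by 'users' or by 'critics' (hypothetical helper)."""
    # positive means users rated higher, negative means critics rated higher
    signed = df['user_rating'] - df['critic_rating']
    out = df.assign(signed_difference=signed)
    if favored_by == 'users':
        return out[out['signed_difference'] > 0].nlargest(n, 'signed_difference')
    if favored_by == 'critics':
        return out[out['signed_difference'] < 0].nsmallest(n, 'signed_difference')
    raise ValueError("favored_by must be 'users' or 'critics'")

# four rows taken from the PC tables above
sample = pd.DataFrame({
    'title': ['left alive', 'terminator: resistance', 'nba 2k20', 'fifa 20'],
    'user_rating': [86.0, 82.0, 11.0, 11.0],
    'critic_rating': [40, 59, 74, 72],
})
users_first = top_disagreements(sample, 'users')
critics_first = top_disagreements(sample, 'critics')
print(users_first['title'].tolist())
print(critics_first['title'].tolist())
```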

Exploratory Analysis Summary

  • NBA 2K20, FIFA 20, Madden NFL 20, and Call of Duty: Modern Warfare are at the top of nearly all platforms' lists of maximum disparity between users and professional critics
  • Star Wars Jedi: Fallen Order got zero disparity between users' and professional critics' ratings on PS4
  • Left Alive is the game most praised by the community while not appreciated by professional critics
  • Switch games have a much lower percentage of moderate and higher disparity
  • Switch games have a mean disparity of about 7, roughly half that of the other platforms, which is about 14 to 16

A Single Permutation Shuffle Based Trial With Histogram & Probability Density Function

We compare platforms' distributions through a permutation test. It is a more systematic approach than relying on the intuition gained from visualizing and comparing distributions. Given two platforms, we concatenate their disparity values into one group and randomly shuffle its elements. Then we divide the group into two new groups. We compare the distributions of the two new groups and assess whether the pattern seen in the two original platforms is still present. If the pattern is not present in the two new groups, that counts as evidence in favor of our hypothesis, namely that the pattern (the difference in distribution) is attributable to the two platforms rather than to chance. In addition, we take the average of each distribution and use it as the statistic for the p-value-based test. In the next section, we apply this method iteratively.

Ensure Series Data Are Descendingly Ordered

print(platform_df['ps4']['userCritic_difference'])
print("")
print(platform_df['switch']['userCritic_difference'])
93     69.0
82     68.0
116    60.0
79     52.0
172    51.0
       ... 
198     0.0
330     0.0
214     0.0
76      0.0
56      0.0
Name: userCritic_difference, Length: 310, dtype: float64

240    58.0
66     55.0
94     51.0
416    43.0
123    42.0
       ... 
443     0.0
89      0.0
106     0.0
53      0.0
208     0.0
Name: userCritic_difference, Length: 364, dtype: float64

PS4 Distribution

histogramPdfGraph.showHistPdf(platform_df['ps4']['userCritic_difference'], 30, '#e3e2e2', 'black', 'disparity', 'ps4', 10, 8)

[Figure: histogram and probability density function of disparity, ps4]

Average of PS4’s Disparity

platform_df['ps4']['userCritic_difference'].mean()
15.893548387096773

Switch Distribution

histogramPdfGraph.showHistPdf(platform_df['switch']['userCritic_difference'], 30, '#e3e2e2', 'black', 'disparity', 'switch', 10, 8)

[Figure: histogram and probability density function of disparity, switch]

Average of Switch Disparity

platform_df['switch']['userCritic_difference'].mean()
6.876373626373627

Conclusion

  • The difference between the ps4 and switch distributions is notable
  • The difference between the ps4 and switch means is about 9

Concatenate Both PS4 and Switch

bothGroups = pd.concat([platform_df['switch']['userCritic_difference'], platform_df['ps4']['userCritic_difference']])

Shuffle and Divide

# permutation based shuffling
rng = np.random.default_rng()
bothGroups = rng.permutation(bothGroups)
# divide into two groups
firstGroup = bothGroups[:int(len(bothGroups)/2)]
secondGroup = bothGroups[int(len(bothGroups)/2):]

First Group Distribution

histogramPdfGraph.showHistPdf(firstGroup, 30, '#e3e2e2', 'black', 'disparity', 'first group', 10, 8)

[Figure: histogram and probability density function of disparity, first group]

First Group Average

firstGroup.mean()
11.0

Second Group Distribution

histogramPdfGraph.showHistPdf(secondGroup, 30, '#e3e2e2', 'black', 'disparity', 'second group', 10, 8)

[Figure: histogram and probability density function of disparity, second group]

Second Group Average

secondGroup.mean()
11.047477744807122

Conclusion

  • Unlike ps4 and switch, the first and second groups show no notable difference between their distributions
  • The difference between the means of the first and second groups (about 0.05 in this run) is much smaller than the difference between the ps4 and switch means (about 9)

Permutation Test and P-Value Based Statistical Significance

We apply the above method iteratively. The more trials we run, the more confident we are in our hypothesis, namely that the difference between two distributions is attributable to the difference between the two platforms.

# computes average of a list
def avgOfList(list_in):
    return pd.Series(list_in).mean()
# loop on pairs of platforms
for idx, platformName in enumerate(platformsNames):
    for idx_, platformName_ in enumerate(platformsNames):
        # compare only unique pairs
        if idx_ > idx:
            # print pairs of platforms which are compared
            print(platformName, platformName_)
            # apply test for 25 iterations on first and second platforms of the nested loop
            testResults = permTest.permutationTest(25, platform_df[platformName]['userCritic_difference'], platform_df[platformName_]['userCritic_difference'])
            # print results average
            print(avgOfList(testResults))
            print("")
ps4 xbox
0.6639999999999998

ps4 switch
8.21812431561929

ps4 pc
1.5160265239233675

xbox switch
7.010956187898123

xbox pc
0.5826510174543579

switch pc
6.286296818538614

Conclusion

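  • Pairs that include switch give by far the largest test results (about 6.3 to 8.2, against about 0.6 to 1.5 for the other pairs), so switch has the greatest statistical significance in comparison with other platforms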
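
The local permTest module is not shown in this post, so the function below is not its implementation. It is a minimal sketch of a conventional two-sided permutation test on the difference of means, which returns a p-value between 0 and 1. The function name permutation_p_value, its parameters, and the sample arrays are all hypothetical. Unlike the half-and-half split used in the single trial earlier, this sketch keeps the original group sizes when dividing the shuffled values.

```python
import numpy as np

def permutation_p_value(first, second, n_permutations=2000, seed=0):
    """Estimate a two-sided p-value for the difference of means (hypothetical helper)."""
    first = np.asarray(first, dtype=float)
    second = np.asarray(second, dtype=float)
    # the statistic observed on the real, unshuffled groups
    observed = abs(first.mean() - second.mean())
    pooled = np.concatenate([first, second])
    rng = np.random.default_rng(seed)
    at_least_as_extreme = 0
    for _ in range(n_permutations):
        shuffled = rng.permutation(pooled)
        # split the shuffled values into groups of the original sizes
        diff = abs(shuffled[:len(first)].mean() - shuffled[len(first):].mean())
        if diff >= observed:
            at_least_as_extreme += 1
    # add one to numerator and denominator so the estimate is never exactly zero
    p_value = (at_least_as_extreme + 1) / (n_permutations + 1)
    return observed, p_value

# made-up groups: clearly shifted apart, and identical to each other
group_a = np.arange(0, 50)
group_b = np.arange(30, 80)
observed_ab, p_ab = permutation_p_value(group_a, group_b)
observed_aa, p_aa = permutation_p_value(group_a, group_a)
print(observed_ab, p_ab)
print(observed_aa, p_aa)
```

A small p-value means that random relabeling rarely produces a difference as large as the observed one. With only 25 iterations, as in the loop above, the smallest p-value this estimator can report is 1/26, so more permutations would be needed to distinguish strong effects.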

Systematic Test Summary

  • For ps4 and switch, the difference between distributions and means is notable
  • For the two groups generated randomly through shuffling, the difference between distributions and means is not notable, unlike the original ps4 and switch
  • The disappearance of the noted pattern in the two randomly generated groups counts as evidence for our hypothesis, namely that the pattern (difference) between the switch and ps4 distributions is attributable to the platform factor
  • Pairs that include switch give by far the largest test results, so switch has the greatest statistical significance in comparison with other platforms
Mostafa Touny
Software Engineering Undergrad