Analysing World of Warcraft auction house prices

Some time ago I created a data gathering project using Flask that you can preview here. I also wrote three blog posts about it here, here and here. The code ran on Heroku for a couple of months, and now it is time to see what data it has gathered. The application collected the prices of specific consumable items that players use while progressing through the game content. This is exciting!

The Jupyter notebook with the analysis is available here.

What is World of Warcraft?

According to Wikipedia:

World of Warcraft (WoW) is a massively multiplayer online role-playing game (MMORPG) released in 2004 by Blizzard Entertainment. It is the fourth released game set in the Warcraft fantasy universe. World of Warcraft takes place within the Warcraft world of Azeroth, approximately four years after the events at the conclusion of Blizzard’s previous Warcraft release, Warcraft III: The Frozen Throne. The game was announced in 2001, and was released for the 10th anniversary of the Warcraft franchise on November 23, 2004. Since launch, World of Warcraft has had eight major expansion packs produced for it: The Burning Crusade, Wrath of the Lich King, Cataclysm, Mists of Pandaria, Warlords of Draenor, Legion, Battle for Azeroth and Shadowlands.

Why is the data worth investigating?

World of Warcraft has already been a gold mine for scientific work (see Google Scholar). From social studies to finance, the world created by Blizzard has provided a lot of information about how players behave with their characters, which in turn helps model how people act in real-life situations.

What is the meaning of all those strange words?

World of Warcraft, like any other game, has specific terminology for the items that are present in its world. Here is a summary of the specific items and words that will be used in this notebook.

Data Analysis

Data cleaning

The data came in the form of a .csv file, which needed some cleaning:

<class 'pandas.core.frame.DataFrame'>
Int64Index: 44702 entries, 305 to 45163
Data columns (total 7 columns):
 #   Column               Non-Null Count  Dtype         
---  ------               --------------  -----         
 0   item_name            44702 non-null  category      
 1   item_subclass        44702 non-null  category      
 2   item_min_buyout      44702 non-null  float64       
 3   item_quantity        44702 non-null  int64         
 4   item_num_auctions    44702 non-null  int64         
 5   created_at           44702 non-null  datetime64[ns]
 6   days_after_new_raid  44702 non-null  int64         
dtypes: category(2), datetime64[ns](1), float64(1), int64(3)
memory usage: 2.1 MB
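The exact cleaning steps are not listed here, so the following is only a minimal sketch of the kind of cleaning that would produce the dtypes shown above. The rows in RAW_CSV are made up for illustration, the clean helper is hypothetical, and the days_after_new_raid column is left out because its derivation is not described.

```python
import io

import pandas as pd

# Made-up rows standing in for the real export. The column names follow the
# info() output above; the values are illustrative only.
RAW_CSV = """item_name,item_subclass,item_min_buyout,item_quantity,item_num_auctions,created_at
Potion of Unbridled Fury,Potion,45.5,1200,30,2019-11-17 00:03:12
Greater Flask of the Currents,Flask,310.0,800,25,2019-11-17 00:03:12
Baked Port Tato,Food & Drink,,500,12,2019-11-17 01:03:10
"""


def clean(raw: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical cleaning step: drop incomplete rows and fix the dtypes."""
    df = raw.dropna().copy()
    # Item names and subclasses repeat a lot, so store them as categories.
    df["item_name"] = df["item_name"].astype("category")
    df["item_subclass"] = df["item_subclass"].astype("category")
    # Parse the snapshot time so it can be used for time-based grouping.
    df["created_at"] = pd.to_datetime(df["created_at"])
    return df


df = clean(pd.read_csv(io.StringIO(RAW_CSV)))
df.info()
```

Dropping incomplete rows is an assumption on my side; it is one simple way to end up with a frame in which every column is fully non-null, as in the output above.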

Basic information

Our dataset has 44702 entries, none of which contain null values. The name of the item and its subclass are categorical values. The item_min_buyout is the price that people usually buy items at (using the buyout option rather than actually bidding in auctions). The item_quantity is the number of items, while item_num_auctions is the number of auctions those items belong to (one auction can contain multiple items).

We can preview the summary statistics below. Because we are comparing items of various subclasses, the minimum and maximum values are quite far apart. We will deal with this when we go deeper into the analysis, but we can already see how consumables were priced on the auction house during the selected time period.

       item_min_buyout  item_quantity  item_num_auctions  days_after_new_raid
count     44702.000000   44702.000000       44702.000000         44702.000000
mean        251.548674    7728.937989         170.392846           102.935998
std         453.994244   19342.280116         249.134527            63.154147
min           0.010000       1.000000           1.000000             0.000000
25%          29.990000    1107.000000          33.000000            46.000000
50%          90.299300    2807.000000          74.000000           133.000000
75%         299.000000    7385.750000         220.000000           159.000000
max        4999.180000  607024.000000        4337.000000           194.000000

The dataset starts at 2019.11.17 00:03 and the last entry is at 2020.03.29 15:27, with 4373 unique timestamps. Over these months, the data was gathered roughly every hour, unless the server went down (2 times) or the TSM API was unavailable.

count                   44702
unique                   4373
top       2019-12-19 15:44:36
freq                       19
first     2019-11-17 00:03:12
last      2020-03-29 15:27:13
Name: created_at, dtype: object
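A summary like the one above, plus a quick check for the gaps caused by downtime, can be sketched as follows. The timestamps are made up, and the two-hour threshold in MAX_GAP is an assumption chosen for illustration, not a value from the original analysis.

```python
import pandas as pd

# Made-up snapshot timestamps. The same timestamp repeats because one
# snapshot contains a row for every tracked item.
created_at = pd.to_datetime(pd.Series([
    "2019-11-17 00:03:12",
    "2019-11-17 00:03:12",
    "2019-11-17 01:03:10",
    "2019-11-17 02:03:15",
    "2019-11-17 09:04:01",  # simulated outage before this snapshot
    "2019-11-17 09:04:01",
]))

summary = {
    "count": int(created_at.count()),
    "unique": int(created_at.nunique()),
    "first": created_at.min(),
    "last": created_at.max(),
}

# Time between consecutive snapshots; anything well above one hour hints
# at the server or the API being unavailable.
snapshots = created_at.drop_duplicates().sort_values().reset_index(drop=True)
gaps = snapshots.diff()
MAX_GAP = pd.Timedelta(hours=2)  # assumed threshold for "missed snapshots"
outages = snapshots[gaps > MAX_GAP]

print(summary)
print(outages)
```

The outages series holds the first snapshot taken after each long gap, which makes it easy to see when gathering resumed.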

We can preview the items and subclasses next. There are 19 unique items in total, divided into 3 subclasses: Potion, Flask and Food & Drink. While gathering the data, I focused on items that were useful for raiding at that time. The table below shows which item belongs to which subclass:

item_subclass	item_name
Potion	        Superior Battle Potion of Agility
Potion	        Superior Battle Potion of Intellect
Potion	        Superior Battle Potion of Stamina
Potion	        Superior Battle Potion of Strength
Potion	        Potion of Focused Resolve
Potion	        Potion of Empowered Proximity
Potion	        Potion of Unbridled Fury
Potion	        Potion of Wild Mending
Potion	        Abyssal Healing Potion
Flask	        Greater Flask of the Currents
Flask	        Greater Flask of Endless Fathoms
Flask	        Greater Flask of the Vast Horizon
Flask	        Greater Flask of the Undertow
Food & Drink	Mech-Dowel's "Big Mech"
Food & Drink	Abyssal-Fried Rissole
Food & Drink	Fragrant Kakavia
Food & Drink	Baked Port Tato
Food & Drink	Bil'Tong
Food & Drink	Famine Evaluator And Snack Table
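A table like the one above can be derived from the snapshot rows by dropping the duplicates. This is a small sketch on a made-up frame that uses a handful of the item names from the table.

```python
import pandas as pd

# Made-up snapshot rows; the real data has one row per item per snapshot.
df = pd.DataFrame({
    "item_subclass": ["Potion", "Potion", "Flask", "Food & Drink", "Potion"],
    "item_name": [
        "Potion of Unbridled Fury",
        "Abyssal Healing Potion",
        "Greater Flask of the Currents",
        "Baked Port Tato",
        "Potion of Unbridled Fury",  # same item seen in a later snapshot
    ],
})

# One row per (subclass, item) pair, sorted for readability.
item_table = (
    df[["item_subclass", "item_name"]]
    .drop_duplicates()
    .sort_values(["item_subclass", "item_name"])
    .reset_index(drop=True)
)

# Number of distinct items in each subclass.
items_per_subclass = item_table.groupby("item_subclass")["item_name"].count()

print(item_table)
print(items_per_subclass)
```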

As correlations go, we can see a few interesting connections:

Correlation matrix
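For reference, a correlation matrix over the numeric columns can be computed with DataFrame.corr. The values below are made up and say nothing about the real correlations; the sketch only shows the mechanics, assuming the Pearson coefficient is the one of interest.

```python
import pandas as pd

# Made-up numeric columns named after the ones in the dataset.
df = pd.DataFrame({
    "item_min_buyout": [50.0, 42.0, 35.0, 30.0, 55.0],
    "item_quantity": [1000, 1500, 2200, 2600, 900],
    "item_num_auctions": [20, 31, 40, 52, 18],
    "days_after_new_raid": [0, 1, 2, 3, 4],
})

# Pairwise Pearson correlation between all numeric columns.
corr = df.corr(method="pearson")
print(corr.round(2))
```

The result is a symmetric frame with ones on the diagonal, which can be passed directly to a heatmap plotting function.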

We will take out the Famine Evaluator And Snack Table to analyze it separately from the other food items. A single feast costs far more than a normal food item, so plotting them on the same scale would flatten every other price series. We will also group the items by day and subclass and take the mean, to see what generally happens with the prices for each subclass.
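The split and the daily grouping described above can be sketched like this. The rows are made up, and the variable names (feast, others, daily) are only illustrative.

```python
import pandas as pd

FEAST = "Famine Evaluator And Snack Table"

# Made-up rows covering two days and all three subclasses.
df = pd.DataFrame({
    "created_at": pd.to_datetime([
        "2020-01-20 10:00", "2020-01-20 11:00", "2020-01-20 10:00",
        "2020-01-21 10:00", "2020-01-21 10:00", "2020-01-20 10:00",
        "2020-01-20 10:00",
    ]),
    "item_name": [
        "Potion of Unbridled Fury", "Potion of Unbridled Fury",
        "Greater Flask of the Currents", "Potion of Unbridled Fury",
        "Greater Flask of the Currents", FEAST, "Baked Port Tato",
    ],
    "item_subclass": [
        "Potion", "Potion", "Flask", "Potion", "Flask",
        "Food & Drink", "Food & Drink",
    ],
    "item_min_buyout": [40.0, 60.0, 300.0, 80.0, 500.0, 4000.0, 20.0],
})

# Keep the feast in its own frame so it does not distort the other prices.
is_feast = df["item_name"] == FEAST
feast = df[is_feast]
others = df[~is_feast]

# Mean buyout per day and subclass, with one column per subclass.
daily = (
    others.assign(day=others["created_at"].dt.floor("D"))
    .groupby(["day", "item_subclass"])["item_min_buyout"]
    .mean()
    .unstack("item_subclass")
)
print(daily)
```

With one column per subclass, calling daily.plot() draws one line per subclass, and the feast frame can be grouped and plotted the same way on its own axis.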

As we can see in the two plots below, the prices vary over time. Interestingly, there is a big increase, then a huge drop, and then another big increase around the release of the Ny’alotha raid. This is due to how the playerbase behaves around new raid releases. Players come back to the game before the raid and then get bored of the content, so demand falls again, until hardcore players buy out items to progress through the new raid. Prices usually increase over the first couple of weeks and then drop slowly until the next raid release.

Timeseries for all items

Timeseries for snack table

Another interesting thing to check was how the numbers change across the days of the week. Guilds usually raid on selected days of the week; popular days are Wednesday, Thursday and Sunday. As we can see, the prices drop closer to the weekly reset (which happens on Wednesday), jump up on Wednesday and Thursday, and then slowly go down again. This was the expected behavior. The only interesting thing is that feasts are usually bought on Thursday.

Hourly prices were also checked, but didn’t provide any interesting results.
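The weekday comparison can be sketched by grouping on the day name of the timestamp. The prices below are made up for illustration, and WEEKDAYS is just a helper list used to keep the days in calendar order.

```python
import pandas as pd

WEEKDAYS = [
    "Monday", "Tuesday", "Wednesday", "Thursday",
    "Friday", "Saturday", "Sunday",
]

# Made-up prices over two weeks around a Wednesday reset.
df = pd.DataFrame({
    "created_at": pd.to_datetime([
        "2020-01-21 20:00",  # Tuesday
        "2020-01-22 20:00",  # Wednesday
        "2020-01-23 20:00",  # Thursday
        "2020-01-28 20:00",  # Tuesday
        "2020-01-29 20:00",  # Wednesday
    ]),
    "item_min_buyout": [40.0, 60.0, 58.0, 44.0, 64.0],
})

# Mean buyout per weekday, in calendar order, skipping days without data.
by_weekday = (
    df.groupby(df["created_at"].dt.day_name())["item_min_buyout"]
    .mean()
    .reindex(WEEKDAYS)
    .dropna()
)
print(by_weekday)
```

Swapping dt.day_name() for dt.hour gives the hourly view in the same way.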

Weekday data for all items

Weekday data for snack table

Tell me what you think!