anime sabermetrics: A-WAR

Statistics! Math!

One thing about anime viewing is that people get really offended when you don’t like the shows they like. “What, you don’t like Sakurasou no Pet na Kanojo?!” they ask, looking at you incredulously, like you just admitted to being a Mormon with three wives. Even if you try to reason with these people, “The animation is piss poor, and the setup is a lazy man’s version of Love Hina,” well, they don’t go for that. They block you with the same “Hey, it’s cold outside, therefore global warming is false” attitude that is so nicely described in Collapse. Well, it’s time for a subjective way to say, “Hey, that show sucks.”

Anyway, I’m a big believer in using statistics as a tool of prediction. Whether it’s baseball, politics or stocks, statistics is a useful tool. Seeing Nate Silver correctly call the presidential election inspired me to finally publish my newest stat. A while back I introduced the first anime sabermetric: Moe Over Replacement Value. Now it’s time for another: A-WAR, Anime Watchability Over Replacement. (I take the ‘a’ from ‘watchability’… hehehehe.)

There are only three main services that provide anime rankings, and all three have their flaws. But aggregated? Mmm… Let’s look at two shows that are considered Top Tier: they occupy all-time top-five slots across all three ranking sites, and both won the prestigious blog好き Best of the Year.

Steins;Gate: Anidb (9.10), MAL (9.14), ANN (9.08)
Clannad After Story: Anidb (9.12), MAL (9.15), ANN (9.09)

It seems like for all three ranking agencies, the top score is in the 9.1 range. This makes it slightly easier since we don’t need to equalize scores across all three agencies. MAL may be slightly higher than the rest because generally it has fewer voters, and more voters means small sigma variability. So I’m not too concerned about this.

On to shows that are great. Period. But history has dimmed them a bit…

Cowboy Bebop: Anidb (8.45), MAL (8.83), ANN (8.9)
Kenshin: Anidb (8.36), MAL (8.47), ANN (8.25)

For these older classics, the ratings basically all line up. They also seem to have aged downward: both were once top-five shows with ratings above 9. Time has a small but noticeable effect on show rankings, as it should: no matter how great Kenshin is, its production values will be substandard compared to a 2012 show. Duh. And people value production values to differing degrees: some consider them very important, some moderately, some slightly, some not at all.

Now let’s consider replacement-level shows: basically, shows that never make me think, “Mmm… I’m going to clear out my Sunday night so I can watch three episodes of… Girls Bravo! Final Approach! Sola! Sakurasou no Pet na Kanojo!”

Girls Bravo: Anidb (4.27), MAL (6.99), ANN (6.58)
Final Approach: Anidb (4.11), MAL (6.88), ANN (6.57)
Sola: Anidb (5.94), MAL (7.58), ANN (7.67)
Sakurasou no Pet: Anidb (5.74), MAL (8.05), ANN (7.22)

What does this limited sample tell me? A high score on Anidb is worth more than one on MAL or ANN. For most shows, it is rare for the Anidb score to beat the MAL or ANN score (mainly because I’m using the Anidb written-review aggregate, and I suspect reviewers there are more critical). Without better statistics (I would love better access to their databases, but, alas), it’s hard to tell what these numbers mean beyond the fact that Anidb users are harder to please than MAL or ANN ones.
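One quick way to check the “Anidb users are harder to please” reading against the eight shows above is to subtract the average of each show’s MAL and ANN scores from its Anidb score. This is a hypothetical helper (`anidb_gap` is my own name, not anything the three sites provide), run only on the ratings quoted in this post:

```python
# Ratings as quoted in this post, in (Anidb, MAL, ANN) order.
RATINGS = {
    "Steins;Gate": (9.10, 9.14, 9.08),
    "Clannad After Story": (9.12, 9.15, 9.09),
    "Cowboy Bebop": (8.45, 8.83, 8.90),
    "Kenshin": (8.36, 8.47, 8.25),
    "Girls Bravo": (4.27, 6.99, 6.58),
    "Final Approach": (4.11, 6.88, 6.57),
    "Sola": (5.94, 7.58, 7.67),
    "Sakurasou no Pet": (5.74, 8.05, 7.22),
}


def anidb_gap(anidb: float, mal: float, ann: float) -> float:
    """Return the Anidb score minus the mean of the MAL and ANN scores.

    A negative value means Anidb rated the show lower than the
    other two sites did on average.
    """
    return anidb - (mal + ann) / 2


for title, scores in RATINGS.items():
    print(f"{title:<20} {anidb_gap(*scores):+.2f}")
```

On these numbers the gap is roughly zero for Steins;Gate, Clannad After Story and Kenshin, and about -2.5 for Girls Bravo and Final Approach; Anidb never comes out meaningfully ahead. Eight shows is still a tiny sample, though.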

So to start off A-WAR 1.0, I’m going to use this formula, aiming for 0 = average and 10 = awesome: A-WAR = (Anidb – 5) + (MAL – 6) + (ANN – 6). While it looks like Anidb is weighted more here, it’s actually weighted less, since MAL and ANN numbers are usually within half a point of each other. In fact, I found only a few shows where MAL and ANN didn’t agree within that margin. So if we treat MAL and ANN as similar, Anidb is weighted slightly less. That’s okay, since Anidb has the smallest sample size. Looking at the previous shows, here is the result:

Steins;Gate: 10.3
Clannad After Story: 10.4
Cowboy Bebop: 9.2
Kenshin: 8.1
Girls Bravo: 0.8
Final Approach: 0.6
Sola: 4.2
Sakurasou no Pet: 4.0
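The A-WAR 1.0 formula is simple enough to sketch in a few lines of Python. This is only an illustration: the function name `a_war` and the (Anidb, MAL, ANN) tuple layout are my own choices, and the inputs are the ratings quoted above.

```python
def a_war(anidb: float, mal: float, ann: float) -> float:
    """A-WAR 1.0: (Anidb - 5) + (MAL - 6) + (ANN - 6).

    Algebraically this is just Anidb + MAL + ANN - 17.
    """
    return (anidb - 5) + (mal - 6) + (ann - 6)


# (Anidb, MAL, ANN) ratings as quoted in this post.
RATINGS = {
    "Steins;Gate": (9.10, 9.14, 9.08),
    "Clannad After Story": (9.12, 9.15, 9.09),
    "Cowboy Bebop": (8.45, 8.83, 8.90),
    "Kenshin": (8.36, 8.47, 8.25),
    "Girls Bravo": (4.27, 6.99, 6.58),
    "Final Approach": (4.11, 6.88, 6.57),
    "Sola": (5.94, 7.58, 7.67),
    "Sakurasou no Pet": (5.74, 8.05, 7.22),
}

for title, scores in RATINGS.items():
    print(f"{title:<20} {a_war(*scores):5.1f}")
```

Rounded to one decimal, this gives 10.3 for Steins;Gate, 9.2 for Cowboy Bebop and 0.8 for Girls Bravo.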

This “feels” about right, as right as any statistic can “feel.” Clannad and Steins;Gate are on top, with Cowboy Bebop and Kenshin a half-tier lower (3 points is about a tier). Girls Bravo and Final Approach are replacement-level shows, hence the near-zero scores. Sola and Pet have some buzz, so they are a tier up, but still not in the same tier as Clannad. Unless you’re that anime blogger who ranked Sola higher than Haruhi Suzumiya (A-WAR 8.11) back in 2006…
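The “3 points is about a tier” rule of thumb can be turned into a quick conversion. Here `tier_gap` and `POINTS_PER_TIER` are hypothetical names for illustration, and 3.0 is just the rough figure from the paragraph above, not a fitted parameter:

```python
POINTS_PER_TIER = 3.0  # rough rule of thumb: about 3 A-WAR points per tier


def tier_gap(higher: float, lower: float) -> float:
    """Express the difference between two A-WAR values in tiers."""
    return (higher - lower) / POINTS_PER_TIER


# A-WAR values from the list above.
print(f"Steins;Gate vs Cowboy Bebop: {tier_gap(10.3, 9.2):.2f} tiers")
print(f"Steins;Gate vs Kenshin:      {tier_gap(10.3, 8.1):.2f} tiers")
print(f"Sola vs Girls Bravo:         {tier_gap(4.2, 0.8):.2f} tiers")
```

That puts Cowboy Bebop and Kenshin between roughly a third and three quarters of a tier below the top two, and Sola a little more than one tier above Girls Bravo.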

(Haruhi Suzumiya, with its A-WAR of 8.11, makes a good case study… remember, this show was rated higher than Bebop back in the day… I wonder whether it just didn’t age well or voters punished the original because of Endless Eight. I wish I had data comparing its 2007 and 2010 rankings.)

For this season, I picked out twelve shows. Here are their A-WARs:

1. Chu-2 (5.78)
2. Psycho-Pass (5.6)
3. Magi (4.97)
4. From the New World (4.36)
5. Sakurasou no Pet (4.03)
6. Zetsuen no Tempest (3.99)
7. Jojo (3.91)
8. Robotics;Notes (3.24)
9. K (1.51)
10. Little Busters (1.09)
11. Girls and Panzer (-1.17)
12. Busou Shinki (-1.65)

Sakurasou no Pet does not suck. In fact, it’s a tier above replacement level… congratulations! It is the Nick Swisher (2012 WAR 3.9) of anime. Psycho-Pass is doing well, but its ANN and MAL numbers are much, much better than its Anidb ones, so I think it’ll eventually settle into Guilty Crown (A-WAR 2.75) territory, where Anidb reviewers slam it while it gets very high rankings on ANN and MAL. Interestingly enough, Guilty Crown is one of the few shows I found where ANN and MAL disagreed by over 0.5 points… with MAL high compared to both ANN and Anidb.
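Spotting shows like Guilty Crown, where MAL and ANN disagree by more than half a point, is a simple threshold check. Guilty Crown’s individual site ratings aren’t quoted in this post, so this sketch runs on the eight shows listed earlier; `diverges` and its 0.5 default are illustrative choices matching the “within half a point” rule of thumb.

```python
# MAL and ANN ratings as quoted earlier in this post.
MAL_ANN = {
    "Steins;Gate": (9.14, 9.08),
    "Clannad After Story": (9.15, 9.09),
    "Cowboy Bebop": (8.83, 8.90),
    "Kenshin": (8.47, 8.25),
    "Girls Bravo": (6.99, 6.58),
    "Final Approach": (6.88, 6.57),
    "Sola": (7.58, 7.67),
    "Sakurasou no Pet": (8.05, 7.22),
}


def diverges(mal: float, ann: float, threshold: float = 0.5) -> bool:
    """True if MAL and ANN disagree by more than `threshold` points."""
    return abs(mal - ann) > threshold


flagged = [title for title, (mal, ann) in MAL_ANN.items() if diverges(mal, ann)]
print(flagged)
```

With the ratings quoted here, only Sakurasou no Pet (8.05 on MAL vs 7.22 on ANN) gets flagged, again with MAL as the higher of the two.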

Chu-2 is where I thought it would be, with ANN giving it the lowest rating. Usually, after thousands of samples, MAL and ANN align reasonably well, and if Anidb matches that number, the show will do well. Chu-2 is on that trajectory. Robotics;Notes is a disappointing show according to the rankings, and it’s on track to be a rich man’s Chaos;Head (A-WAR 0.7). But I suspect Robotics;Notes will pick up as it finally gets to the gooey plot points.

Little Busters… wow… Key going with JC Staff is looking like a Barry Zito/Carl Crawford/A-Rod/Jayson Werth-level bad decision. And I guess I have readers who love Upotte and also love Girls and Panzer, which explains why no one else seems to like it.

That is A-WAR in a nutshell. Now back to looking up nekomimi meido images and fine-tuning Value Over Replacement Moe…

K-On! Movie A-WAR 7.5… Mugilicious VORM 10.0…

19 Responses to “anime sabermetrics: A-WAR”

  1. >that Garupan rank

    >clearly all three sites are populated by morons

  2. Oh hey, guess which two shows are at the top of my watch-list this season? Chu2 and Psycho Pass!

    Haven’t given Magi a chance yet, and Shin Sekai Yori went straight to file 13 after the one-two punch of episodes 5 and 6.

    Gave Sakurasou no Pet na Kanojo 3 episodes and it did not deliver. Currently have Zetsuen no Tempest downloaded but not started. Haven’t touched the other shows.

    I’m feeling strong A-WAR correlation!

    (Haruhi definitely hit 9.4+ on AniDB directly after the airing of episode 14 back in 2007. It was the highest rated show for a few nights)

  3. “Objective”.

    “Well, it’s time for a objective way to say…”

    If you’re going to call other people stupid, you should at least know the difference between “objective” and “subjective”.

  4. Hmm…guess based on A-WAR I should quit slacking off on starting some of the shows this season. I’m guessing based on the tone of this post that you’ve had a few emails about Sakurasou no Pet na Kanojo. I’ve enjoyed the humor from it, but I also know that given a season or two I’ll probably forget about it.

    I would be kind of interested to see how SA:O is faring in the A-WAR rankings. Can’t say I’ve been following public opinion on that one, but I’d be curious to see whether the bitter, light novel reading fans poisoned the well for what has turned out to be a fairly average show. I personally thought they spent a little too long on the side stories relative to the main story for Vol. 1-2, and the pacing feels off for Vol. 3-4. Also feels like they cut the budget for the fight scenes where they could have had some really great stuff.

    Chu2 seems to be on target to be the best of the season, though I think Robotics;Notes has a chance once it hits the real plot of the series. On that note, I do wonder if Robotics;Notes got a stronger start than it deserved due to its association with S;G and will crash and burn due to unrealistically high expectations. I am kind of surprised to see Magi as high up there as it is. I can’t say I haven’t enjoyed the show, but it’s pretty typical shounen when you get right down to it, though I have to say the production values for the show are pretty high.

  5. ’nuff said.

  6. I like SnPnK for the genkidere.

  7. Sigh, there’s lies, damned lies, and statistics combined with preconceived notions. Here’s my public service announcement. If you are avoiding ‘Girls Und Panzer’ because you think it’s just another iteration of Strike Witches/Upotte/Dumb Fanservice Show, you have the wrong impression. Chu2 is showing more T&A and skin than Girls Und Panzer. Is it the best show ever? Will it leave a lasting impression? No and no, but it is surprisingly well directed and decently produced. It is entertaining and enjoyable, even if you couldn’t care less about the tanks of World War II.

  8. “Even if you try to reason with these people”
    I think it’s pretty conceited of you to think you could rank anime objectively (see Matt’s comment). I mean, you can do that in the aspects you listed, but they don’t have to correlate with the enjoyment you could get out of the show.
    Maybe Noir would be such an example. I wouldn’t watch a similar anime nowadays, but I have watched Noir over 10 times and was rewatching it just yesterday, because I somehow got nostalgic again. Furthermore, I prefer the really low-quality version over a better one; a better one would just be missing something.

  9. I think SoRaNoWoTo (or even K-On) is a better comparison for Girls und Panzer than the other shows coyote mentions. Genre-ically, not technically…

  10. Points, hits, rebounds, interceptions, etc. are concrete, measurable, atomic things, even if the methods by which they are obtained are sometimes disputed. Anime ratings are a bunch of people making up numbers. Thank god games are not decided by fan consensus.

  11. Don’t get me wrong, I actually agree with your numbers. There just shouldn’t be a reason for them to work. An average is just that, an average. I suppose anime fans tend to think along similar lines, so maybe some assumptions can be made about the distributions of the scores.

  12. Adding three numbers together then subtracting 17 doesn’t weight any of them. If you want to give greater weight to Anidb due to its deflated scores, try perhaps A-WAR = (Anidb – 5)*1.2 + (MAL – 6) + (ANN – 6) = Anidb*1.2 + MAL + ANN – 18. I haven’t run the numbers, but this should give you similar results while having slightly more significance than the scaled average you propose.

  13. “MAL may be slightly higher than the rest because generally it has fewer voters, and more voters means small sigma variability.”

    Wait, did you just suggest that MAL, which by far has the largest userbase of the ranking sites, has the fewest voters? For instance, Clannad AS has been COMPLETED by *cough* 92,000 people, more than 80,000 of whom have ranked it in some way, compared to AniDB’s ~4,400 users and ANN’s ~4,000 users. I… I just don’t know how you can possibly make that mistake. The series with the highest number of watches on AniDB, let alone ratings, is FMA, with just 20,700 watches. That’s an order of magnitude lower than the ~200,000 who have RANKED Death Note. Add another 25,000 for those who have watched it.

    If you like anime and statistics I suggest you check out who has generally covered all the major statistical arguments made by anime better than pretty much (with all due respect) everyone.

  14. MAL has the most voters not the least.

  15. Moar A-WAR ratings!

    Tengen Toppa: 9.132
    Evangelion: 6.862
    Evangelion Re(build|boot) 1+2: 9.086
    Escaflowne: 5.842
    FLCL: 6.334
    OVER 9000!!: 3.861
    Code Geass S1+S2: 9.523 (Oh lawd!)
    Fumoffu: 7.987
    Queen’s Blade: -2.422
    Bible Black: 2.996
    Staple Staple: 8.057
    JAM IT IN!: 5.496
    Detroit Metal City: 8.027

  16. One of the key principles of analytics is that information has to be actionable beyond what is already known. Creating a mean-centered weighted average from 3 ratings doesn’t provide much new information. If I just looked at ANN, I would know an 8-9 was worth watching without needing the other 2 ratings.

  17. Guess having the highest pre-orders for the season, a couple of sold out merchandise and being the most talked-about series this season in 2ch doesn’t help in getting high gaijin scores in a-war.

    “which explains why no one else seem to like it.” good point.

  18. Tonari no Kaibutsu-kun checks in at a 6.37 A-WAR. I guess I’m not the only one who has it as their favorite show this season.

  19. “I wonder if either it just didn’t age well or if voters punished the original because of Endless Eight. I wish I had data comparing 2007 vs. 2010 ranking for this show.”

    Well, I can’t go back that far, but I can say that in the case of MAL it shows 8.33 today; back on September 30th, 2009 it was at 8.58, and I seem to recall it being something like 8.65-8.68 before the whole Endless Eight fiasco. That is a much larger change than most shows undergo (Clannad was at 8.59 on Sept 30, 2009, and is 8.58 today). So yes, I think it’s fair to say quite a few people lowered their score for the original show after experiencing season 2.
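Commenter 12’s weighted variant is easy to compare against A-WAR 1.0. This is a rough sketch: the function names are hypothetical, the 1.2 weight is the commenter’s suggestion, and the ratings are the eight quoted in the post.

```python
from typing import Callable, List

# (Anidb, MAL, ANN) ratings as quoted in the post.
RATINGS = {
    "Steins;Gate": (9.10, 9.14, 9.08),
    "Clannad After Story": (9.12, 9.15, 9.09),
    "Cowboy Bebop": (8.45, 8.83, 8.90),
    "Kenshin": (8.36, 8.47, 8.25),
    "Girls Bravo": (4.27, 6.99, 6.58),
    "Final Approach": (4.11, 6.88, 6.57),
    "Sola": (5.94, 7.58, 7.67),
    "Sakurasou no Pet": (5.74, 8.05, 7.22),
}


def a_war(anidb: float, mal: float, ann: float) -> float:
    """A-WAR 1.0 from the post, i.e. Anidb + MAL + ANN - 17."""
    return (anidb - 5) + (mal - 6) + (ann - 6)


def a_war_weighted(anidb: float, mal: float, ann: float, weight: float = 1.2) -> float:
    """Commenter 12's variant: (Anidb - 5) * weight + (MAL - 6) + (ANN - 6).

    With weight = 1.2 this equals 1.2 * Anidb + MAL + ANN - 18.
    """
    return (anidb - 5) * weight + (mal - 6) + (ann - 6)


def ranking(score: Callable[[float, float, float], float]) -> List[str]:
    """Titles sorted from highest to lowest under the given scoring function."""
    return sorted(RATINGS, key=lambda title: score(*RATINGS[title]), reverse=True)


print(ranking(a_war) == ranking(a_war_weighted))
```

On these eight shows both formulas produce the same ordering (the script prints `True`), which is consistent with the commenter’s expectation of “similar results”; the weighted version mainly stretches the spread between Anidb-driven scores.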

