Saturday, July 10, 2010

7.10.10

The much-anticipated World Cup final is tomorrow and I'm ecstatic. Of course, the most prevalent question on my mind is: will Spain win? And, though I could let this simple question linger within my brain - disturbed only with the slight tremors of imagination - such is not my nature. And so, with a bit of statistical caution thrown to the wind, I will attempt to use some econometric nonsense to predict the score of the upcoming match.

Now, given that the group-stage statistics constitute the largest sample for the tournament, I will be using them for my predictions: this is the first bit of caution thrown out. The second is made up of individual player statistics which, due to constraints on my interest in the topic (yes, I actually do have them), have not been recorded and will not be used. So, given this sample, my identifying assumptions are that the group stage is a representative sample of team performance and that team performance itself is representative enough to make predictions.

So, I collected data on shots, shots on goal, crosses, fouls committed, cards booked, and goals scored, and ran several regressions. I will spare you the regression tables and the various variable transformations, as I think they may reduce your already waning interest to shameless boredom, statistical annoyance, or utter pity. On to the predictions.

Assuming that the two teams play similarly to their group-stage averages, I predict that Netherlands and Spain will each score only 1 goal during the final. However, based on the following intervals,

Netherlands: 1.36 predicted goals (1.15, 1.56)
Spain: 1.22 predicted goals (0.82, 1.63)

each team could be predicted to score as many as 2 goals. These intervals also reveal that Spain is the less consistent team, since its variance (and standard error) is clearly larger. This is seen more clearly in the following graph:

You can see that Spain's interval is much wider and even extends below 1. However, the other, perhaps more interesting result is that the intervals overlap (the Netherlands' interval sits entirely inside Spain's), meaning that the two teams' predicted scores are not statistically distinguishable. What could this mean? Basically that the two teams are well matched and that, statistics aside, we're in for a sweet World Cup final. Olé.
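For anyone who wants to check the interval comparison by hand, here is a minimal Python sketch that uses only the two predictions and intervals quoted above. The helper names (width, overlaps, contains) are made up for illustration, and the sketch treats each interval as a plain (low, high) range; it does not reproduce the regressions or assume any particular confidence level.

```python
# Predicted goals and intervals as quoted in the post, written (low, high).
predicted = {"Netherlands": 1.36, "Spain": 1.22}
intervals = {"Netherlands": (1.15, 1.56), "Spain": (0.82, 1.63)}


def width(interval):
    """Return the length of a (low, high) interval."""
    low, high = interval
    return high - low


def overlaps(a, b):
    """Return True if two (low, high) intervals share at least one point."""
    return a[0] <= b[1] and b[0] <= a[1]


def contains(interval, value):
    """Return True if value lies inside the (low, high) interval."""
    low, high = interval
    return low <= value <= high


for team, interval in intervals.items():
    print(f"{team}: predicted {predicted[team]:.2f}, "
          f"interval width {width(interval):.2f}")

print("Intervals overlap:",
      overlaps(intervals["Netherlands"], intervals["Spain"]))
print("Spain's estimate inside the Netherlands interval:",
      contains(intervals["Netherlands"], predicted["Spain"]))
print("Netherlands' estimate inside Spain's interval:",
      contains(intervals["Spain"], predicted["Netherlands"]))
```

Run as is, it reports interval widths of 0.41 goals for the Netherlands and 0.81 for Spain, and each team's point prediction falls inside the other team's interval.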

Other fun facts about the group stage:
- Chile and Australia tied at 61 for the most fouls committed
- Chile had the most players booked: they received 13 cards (yellow = 1; red = 2)
- Argentina took the most shots: 64 in total (Brazil was next at 57; Ghana and Spain tying for third at 54 each)
- Argentina also had the most shots on goal: 30 in total (Brazil and England tied for second at 21; the US and Ivory Coast coming in at third with 20 each)
- Spain moved the ball the most, finishing the group stage with 114 crosses (Ivory Coast was next with 94 and Germany in third with 84)
- Japan was the most accurate team: its ratio of shots on goal to shots was 0.54 (Slovenia was next at 0.50 and then Argentina with 0.47)
- Serbia received the most cards proportional to its fouls: the team received a card about every third foul it committed (ratio of 0.33); Germany was next, receiving a card about every fourth foul (ratio of 0.24).
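The two ratios in this list are simple divisions, and a couple of them can be rechecked from the totals quoted above. The sketch below is only that: the ratio helper is a hypothetical name, and applying the card-point weighting (yellow = 1; red = 2) to the cards-per-foul ratio is my assumption, since only Chile's weighted card total and foul count appear here.

```python
def ratio(numerator, denominator):
    """Return numerator / denominator, refusing to divide by zero."""
    if denominator == 0:
        raise ValueError("denominator must be non-zero")
    return numerator / denominator


# Argentina: 30 shots on goal out of 64 shots (both totals quoted above).
argentina_accuracy = ratio(30, 64)

# Chile: 13 card points and 61 fouls (both totals quoted above).
# Assumption: the cards-per-foul ratio uses the same card-point weighting.
chile_cards_per_foul = ratio(13, 61)

print(f"Argentina accuracy: {argentina_accuracy:.2f}")      # prints 0.47
print(f"Chile cards per foul: {chile_cards_per_foul:.2f}")  # prints 0.21

# A ratio of 0.33 cards per foul is roughly one card every 3 fouls,
# and 0.24 is roughly one card every 4 fouls.
for team, cards_per_foul in [("Serbia", 0.33), ("Germany", 0.24)]:
    print(f"{team}: about one card every {1 / cards_per_foul:.1f} fouls")
```

Argentina's 0.47 matches the accuracy figure listed above, and Chile's 0.21 sits below Germany's 0.24, which fits Chile not appearing among the top two for cards per foul.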

4 comments:

Alissa said...

Oh my gosh. I'm not sure if I should laugh with you or cry for you? ...

THIS IS AWESOME! Though I must admit, it brought back less than pleasant memories of 328 assignments. ;-)

xoxoxo.

The Real Spandex Bandit said...

Dude, Netherlands is gonna tear it up!!!! DUBSKIIIIISSS!

P.S. Have you heard about the psychic Octopus in Germany? It predicted all of Germany's wins and losses, has never been wrong (including today's game), and has predicted Spain to win tomorrow.

Anonymous said...

i fear i am putting too many miles on the green truck. oh, well, it's fun to drive it. i feel like i'm in idaho...

meg said...

Wow, that was a lot of words. I got distracted by the sandwich you posted on 7/6. Sorry.