Sunday, October 28, 2007

Futures vs. Game Odds

4 G 5 G 6 G 7 G

Red Sox .545 .242 .141 .049

The 2007 World Series makes for a good case study in how betting markets perceive futures bets vs. individual game bets. In general, you're going to see more efficient lines for single games than futures, because single games attract more action, which tends to move the line to a more accurate spot. In contrast, many futures bettors are the stereotypical "squares" betting on the hometown nine to win it all, leading sportsbooks to shade the lines against popular teams like the Yankees and Cubs.

This year's World Series was particularly interesting because it pitted an obviously superior Boston squad against an underdog that had as much "momentum*" as any team ever. This type of situation is likely to lead to squares betting on the winning streak to continue. In other words, we'd expect the Red Sox to be a lesser favorite for the Series than they would be against a team that was just as talented as the Rockies but didn't begin the Series on a big winning streak.

Indeed, the Series line closed at a market price of roughly Rockies +210/Red Sox -210. For those of you not familiar with money lines, that means the Rockies are expected to win the World Series 100 / (210 + 100) = 32.3% of the time.
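For readers who want to check the money line arithmetic themselves, here is a minimal Python sketch of the conversion. The function name `implied_probability` is made up for illustration, and the sketch follows the post in ignoring the bookmaker's vig (it treats the quoted price as a fair price).

```python
def implied_probability(moneyline: int) -> float:
    """Convert an American money line to an implied win probability.

    Assumption: the line is treated as a fair price (no vig removed).
    """
    if moneyline > 0:
        # Underdog: risk 100 to win `moneyline`.
        return 100 / (moneyline + 100)
    # Favorite: risk `-moneyline` to win 100.
    return -moneyline / (-moneyline + 100)


for line in (210, 205, 190, 130, 125):
    print(f"+{line}: {implied_probability(line):.1%}")
```

At +210 this gives the same 32.3% as the formula above, and the favorite's side at -210 comes out to the complementary 67.7%.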

The individual game lines, which should be a more accurate portrayal of each team's odds to win, closed at roughly:

Game 1: Rockies +205 (32.8%)
Game 2: Rockies +190 (34.5%)
Game 3: Rockies +130 (43.5%)
Game 4: Rockies +130 (43.5%)

The numbers in parentheses indicate the expected chances for the Rockies to win each game based on the closing money line. Since Games 5-7 feature the same pitching matchups as Games 1-3, we can estimate the money lines by extrapolation, with slight changes for the DH and home-field advantage:

Game 5: Rockies +125 (44.4%)
Game 6: Rockies +190 (34.5%)
Game 7: Rockies +210 (32.3%)

Given these individual game percentages, we can calculate the chances of the Rockies winning the Series--using the same method I did to come up with my pre-WS estimate. It comes to...


The Rockies, based on the bookies' own lines, should have been 3-1 dogs. Instead they were going off at almost 2-1.
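One way to turn the seven game percentages into a Series percentage is sketched below in Python. This is not necessarily the exact method used for the estimates on this blog; it assumes the games are independent and uses the fact that playing out all seven games, even after the Series is decided, does not change who wins. The function name `series_win_probability` is hypothetical.

```python
def series_win_probability(game_probs, wins_needed=4):
    """Chance of winning at least `wins_needed` of the listed games.

    Assumes independent games with the given per-game win probabilities.
    """
    # dist[k] = probability of exactly k wins in the games processed so far
    dist = [1.0]
    for p in game_probs:
        nxt = [0.0] * (len(dist) + 1)
        for k, prob in enumerate(dist):
            nxt[k] += prob * (1 - p)   # lose this game
            nxt[k + 1] += prob * p     # win this game
        dist = nxt
    return sum(dist[wins_needed:])


# Rockies win probabilities for Games 1-7, taken from the lines above.
rockies = [0.328, 0.345, 0.435, 0.435, 0.444, 0.345, 0.323]
print(f"Rockies to win the Series: {series_win_probability(rockies):.1%}")
```

With these inputs the result should land close to 25%, in line with the 3-1 price mentioned above and well short of the 32.3% implied by the Series line.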

I leave it to the reader to interpret this result, but it sure looks like the bookies knew they would get plenty of money on Colorado without offering a big price, and then laughed all the way to the bank.

* - The quote marks indicate that I do not agree with the media definition of the word "momentum" in sports. In physics, momentum does not magically switch from one direction to another, back and forth.

Friday, October 26, 2007

GG Rockies

Team     4 G   5 G   6 G   7 G

Rockies   -     -    .035  .060
Red Sox  .277  .267  .233  .127

This seems more productive than simply giving the chances of winning the series. Yes, the Red Sox sweep three times as often as the Rockies win the Series.

Thanks for playing, Colorado. This Cinderella story is more Brothers Grimm than Disney.

Thursday, October 25, 2007

More Validation!

Diamond Mind's take

Here were my estimates for the Series results prior to Game 1:

Team     4 G   5 G   6 G   7 G
Rockies  .027  .075  .085  .091
Red Sox  .118  .181  .230  .193

Pretty close match.

For the record, I had the Rockies at 45.8% to win the Series if they had taken Game 1, which is probably the biggest discrepancy I have with the Diamond Mind numbers.

Edit: Hat Tip, along with MGL/Tango numbers that also closely resemble mine: The Book blog


Just noticed this article from Clay Davenport, where he goes into detail about different ways to handicap the World Series.

Two things to take from it:

1. His 73.6% Red Sox estimate (from before Game 1) squares up pretty well with mine.

2. The gap in percentages across different methods shows just how dangerous it can be to blindly follow a postseason odds report that doesn't use the proper inputs.

Kudos to Clay for putting forth the effort.

10/25 Update

Team WS%

Boston 82.2
Colorado 17.8

Take this with a grain of salt, because I think Ubaldo Jimenez should be a substantially bigger underdog tomorrow than +170.

Tuesday, October 23, 2007

Number Tweaking

Aaron Cook is in as the Game 4 starter, and Tim Wakefield is out. How does this affect the odds?

Assuming Jon Lester starts Game 4 and Cook is on a tight pitch count:

Team WS%

Boston 72.2
Colorado 27.8

On the plus side for Colorado, Game 4 is their best chance to actually be favored in any individual game in the Series. I have the line estimated at Colorado -103.

There are all kinds of other possibilities if Terry Francona is willing to throw Josh Beckett in games 1, 4, and 7, but I see no evidence that this will happen.

Sunday, October 21, 2007

World Series Update

Team WS%

Boston 73.1
Colorado 26.9

I guess I just flat-out disagree with the oddsmakers here. If you're thinking of betting the Rockies, I urge you to bet individual game lines, which should be far more favorable than the opening Series line.

Efficient Market? Yeah, Right

Before Game 6, the consensus of the exchange markets was that the Red Sox were roughly 33.5% to win the AL and 24% to win the World Series. These numbers squared pretty well with mine, and pegged Boston as over 70% to win the World Series should they get there.

Now? Those numbers are around 62% and 39%. After winning a 12-2 blowout, Boston's chances of beating the Rockies go down to 63%? Huh? And the Indians, who were dogs in every game of the ALCS, are at 38% and 25%--indicating they would be bigger favorites in the World Series than the Red Sox?
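The percentages above come from dividing each team's World Series price by its pennant price, since a team has to win the pennant to win the Series. A minimal Python sketch of that division follows; the function name and labels are made up for illustration, and the inputs are the rounded exchange prices quoted in this post.

```python
def conditional_ws_probability(pennant_pct: float, ws_pct: float) -> float:
    """P(win World Series | win pennant) = P(win World Series) / P(win pennant)."""
    return ws_pct / pennant_pct


# (pennant %, World Series %) as quoted in the post
quotes = {
    "Boston, before Game 6": (33.5, 24.0),
    "Boston, after Game 6": (62.0, 39.0),
    "Cleveland, after Game 6": (38.0, 25.0),
}
for label, (pennant, ws) in quotes.items():
    print(f"{label}: {conditional_ws_probability(pennant, ws):.1%}")
```

This prints 71.6%, 62.9%, and 65.8%, which is where the "over 70%" and "63%" figures come from, and it shows the market rating Cleveland as the stronger World Series team.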

The market may be full of savvy bettors, but from where I stand, it still looks like they can't do math. And I still say the Red Sox series line gets bet to at least -230 before the first pitch of Game 1. I'm not so sure on the Indians.

Saturday, October 20, 2007

10/21 Update

Team LCS% WS%

Boston 59.5 44.6
Cleveland 40.5 27.3

Colorado 100.0 28.1

Friday, October 19, 2007

Big Comebacks or Faulty Algorithms?

It seems like every year, we see an MLB team make an epic comeback or collapse in the playoff race. In 2007, the Mets shot themselves in the foot, resulting in their postseason chances going from 99.8% to 0% in 18 days (and 96% to 0% in the final 5 days); the Phillies were roughly 200-1 dogs on September 13. Additionally, Arizona won the NL West as a 70-1 dog on July 21.

Last year, the Twins were at one point a 500-1 shot to win the AL Central. In 2005, the Astros made it as a 240-1 dog, and the Indians blew a 96.5% chance in the final week.

How do we account for all these longshots hitting in a span of only three years? There are three possibilities:

A) We've witnessed several anomalies
B) The BP Postseason Odds aren't perfect
C) A little of column A, a little of column B

Well, B is certainly true--I'll bet even Clay Davenport would agree with this--but I think we're looking at C. Even if we have a perfect playoff odds model, it's not going to project the Mets to dump seven games in a row to the Phillies, plus some more to the Nationals and Marlins. It won't expect the Diamondbacks to have a 17-game stretch starting July 19 where they go 13-4 with a -2 run differential.

These were examples of what Football Outsiders would call Non-Predictive Events: certainly the Phillies and D-Backs played well to reach the playoffs, but if they had to do it again, they'd probably come up short.

For now, though, I'd like to focus on B. There are several things standing between the BP Odds and a perfect model, but how many of them can be practically implemented?

- Tiebreakers

I think this is doable. Coolstandings already includes tiebreakers in their simulations. In the first half of the season, it's usually difficult to forecast how the tiebreakers are going to pan out, but by August you can usually tell.

If the report doesn't break ties, it could at least display the probability of a tie so that readers can make manual adjustments.

- Differences in Starting Pitcher Quality

This is certainly impractical for use in April, but it could be added for late September, or certainly for the playoffs themselves. Facing Brandon Webb instead of Livan Hernandez is a pretty big deal, and the September 27 (or mid-October) odds report should account for that.

- Shifting Team Composition

This is tricky. Certainly you want to adjust the odds report when Chris Carpenter goes down for the season or Mark Teixeira is traded, but where do you draw the line between significant and insignificant changes? I'm sure if Clay gambled on his reports, he would account for these things, but he can't be expected to account for every little injury or trade.

The ELO report is designed to account for these changes, but it's not very effective in that regard. How long does it take for the Teixeira trade to show up in the ELO numbers? A month? If Tex doesn't hit in the first few weeks, you may never notice the difference.

- Accurate Regression Numbers

Now we're talking. There's a good discussion on THE BOOK blog, with the authors concluding that the current Davenport formula does not regress each team's stats far enough toward their preseason projections, especially early in the season.

Take another look at the list of longshot winners. The 2007 Diamondbacks, 2006 Twins, and 2005 Astros were all expected to be serious contenders, and all were much better teams than their early-season records would have you believe. Did the BP Odds read too much into their slow starts? MGL and DFL seem to think so in posts 15 and 21 from the thread linked above.

Could it be that the '06 Twins, despite being 12 games out in the division race, were actually 50-1 dogs rather than 500-1? Between their disappointing start and the first-half over-achievement of the Tigers and White Sox that year, I think it's entirely possible.

Tom Tango and others have done good work determining the proper amount of regression to use in projections. I think this is one area the Odds Report can easily improve upon.
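To make the idea concrete, here is a rough Python sketch of one common way to regress a record toward a projection: blend the observed record with a number of "phantom" games played at the preseason projected rate. This is not the Davenport formula or anyone's published method. The function name, the 70-game weight, and the example team are all assumptions chosen only for illustration.

```python
def regressed_win_pct(wins, losses, preseason_pct, prior_games=70.0):
    """Blend an observed record with `prior_games` phantom games
    played at the preseason projected winning percentage.

    `prior_games` is a hypothetical weight, not a fitted value.
    """
    games = wins + losses
    return (wins + prior_games * preseason_pct) / (games + prior_games)


# Hypothetical team: a 30-45 start, projected at .550 before the season.
raw = 30 / 75
blended = regressed_win_pct(30, 45, preseason_pct=0.550)
print(f"raw: {raw:.3f}  regressed: {blended:.3f}")
```

Here the raw .400 record is pulled up to about .472. The larger the weight on the projection, the less a slow start moves the estimate, which is the adjustment being argued for above.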


I don't mean to deride the BP Odds Report, a useful tool that sparked my interest in betting sports futures. But as with all things baseball, when I see something that's working at 80% efficiency, I want to see that number move closer to 100.

Can I just build a better model myself? Maybe. I'd need some help with computer programming, if anyone's willing to volunteer, but it could be an interesting project to keep me busy in the offseason.

10/19 Update

Team LCS% WS%

Boston 33.5 24.9
Cleveland 66.5 43.6

Colorado 100.0 31.5

Tuesday, October 16, 2007

10/17 Update

Team LCS% WS%

Boston 16.5 12.3
Cleveland 83.5 54.7

Colorado 100.0 33.0

As if Colorado's championship hopes hadn't risen enough lately, the Indians' three-game win streak is presenting the Rockies with a more favorable World Series matchup. Snow delays, anyone?

Aaron Cook

Apparently I've missed this because no one in the mainstream has mentioned it, but the Rockies may be considering adding Aaron Cook to their World Series roster.

Will this help them out? Obviously it depends on whether he's ready to return. A healthy Cook would be a huge upgrade over Josh Fogg and a solid upgrade over Franklin Morales or Ubaldo Jimenez. Of course, if Cook does come back, it may be as a long reliever, or maybe he pushes Morales out of the rotation instead of Fogg. Maybe Clint Hurdle will feel the same way he did with the NLCS roster: that he doesn't want to shake up a winning formula.

Whatever the case, it appears unlikely Hurdle will play his cards optimally. Still, this is something to consider; Cook could give the Rockies a fighting chance in the World Series if he's all the way back.

10/16 Update

Team LCS% WS%

Boston 37.8 28.1
Cleveland 62.2 40.8

Colorado 100.0 31.1

The Rockies pull ahead of Boston for the first time this season. Still, look at the contrast in their LCS percentages despite the similar WS numbers.

Again, it's possible I'm being too hard on Colorado here, but I still think the individual WS game lines will make these numbers seem reasonable when all is said and done. Here are some sample moneylines I actually used in the calculations for these percentages:

Cleveland (Westbrook) -107 @ Colorado (Fogg) +107
Cleveland (Sabathia) -105 @ Colorado (Francis) +105
Boston (Beckett) -118 @ Colorado (Francis) +118

Do these numbers look terribly unrealistic to anyone? The only problem is that these are among the MOST favorable matchups for the Rockies; they will be big underdogs in every road game. Colorado doesn't really have a great option at DH, and they face a top pitcher (or the solid Jake Westbrook) every time they play in Jacobs Field or Fenway Park.

Sunday, October 14, 2007

10/15 Update

Team LCS% WS%

Boston 57.0 42.3
Cleveland 43.0 28.1

Arizona 5.2 1.7
Colorado 94.8 27.9

Wait a minute, am I actually suggesting that not one but BOTH AL teams have a better chance of winning the World Series than the Rockies, who are up 3-0 in the NLCS?

Yup. 20 wins in 21 games is something special, but this team is still inferior to the Indians and a far cry from the Red Sox--Boston rates as nearly a 3-1 favorite should they get there.

Am I being unfair to Colorado? Possibly, but I'm using the same numbers I did for previous interleague games, which matched the estimated World Series line (AL -210 / NL +210) when the playoffs opened. Yes, the Rockies won 90 games, but they had healthy starting pitching for most of the season. Can you really imagine a World Champion team with Josh Fogg penciled in to start two Series games?

Colorado opened the night trading at 38 (sell) / 42 (buy) to win the World Series on WSEX. (For those unfamiliar with WSEX, this basically means the market puts their chance to win somewhere between 38% and 42%, and if you disagree you can buy or sell Rockies futures.) That line has already been bet down to 35/39, but I think it will come down even more before the first pitch of the Fall Classic.

10/14 Update

Team LCS% WS%

Boston 57.0 42.0
Cleveland 43.0 28.0

Arizona 13.8 4.4
Colorado 86.2 25.6

Saturday, October 13, 2007

10/13 Update

Team LCS% WS%

Boston 73.2 54.5
Cleveland 26.8 18.0

Arizona 13.8 4.2
Colorado 86.2 23.3

How's this for a prop bet at the start of the year: Rockies to be the square pick to win the World Series on October 13. Probably could have gotten 500-1 odds on that easy.

Edit: The picture only gets worse for the non-Boston teams if the Red Sox throw Beckett on short rest in Game 4, a possibility that became far more likely when he cruised to a 10-3 win yesterday and hit the showers early. Terry Francona probably left Beckett in for an unnecessary inning, but at least he didn't pull a Bob Brenly. (Remember Randy Johnson being left in to protect the D-Backs' 15-0 lead in Game 6, 2001?)

Friday, October 12, 2007

10/12 Update

Team LCS% WS%

Boston 59.8 44.2
Cleveland 40.2 26.7

Arizona 29.0 9.1
Colorado 71.0 19.9

Monday, October 8, 2007

10/9 Update

Team LCS% WS%

Boston 59.7 43.8
Cleveland 40.3 26.5

Arizona 47.2 14.8
Colorado 52.8 14.8

If those NL numbers look like they've changed yet again, it's because I played around with a few numbers, and Colorado benefits.

If Arizona and Cleveland each throw their ace in Games 1, 4, and 7 of the LCS, the numbers look like this:

Team LCS% WS%

Boston 58.1 43.0
Cleveland 41.9 27.9

Arizona 49.9 14.9
Colorado 50.1 14.1

Interestingly, it looks like Arizona's chances at a championship aren't really helped by burning through Webb three times against Colorado, since he is then only available twice in the World Series. Of course, this is only a problem if the series goes the full seven games.

10/8 Update (II)

Chien-Ming Wang is now pitching Game 4 for the Yankees, leaving Andy Pettitte for Game 5.

Team DivS% LCS% WS%

Boston 100.0 56.7 41.6
Cleveland 71.1 29.3 19.3
New York 28.9 14.0 10.3

Arizona 100.0 48.6 14.8
Colorado 100.0 51.4 14.0

Note to the Red Sox: Great job picking the extra day of rest in your division series! I'm sure you'd rather face the Yankees than a team led by Eric Wedge anyway.

Sunday, October 7, 2007

10/8 Update

Team DivS% LCS% WS%

Boston 100.0 56.8 41.6
Cleveland 71.6 29.5 19.4
New York 28.4 13.7 10.2

Arizona 100.0 48.6 14.8
Colorado 100.0 51.4 14.0

If Eric Wedge grows a brain and starts C.C. Sabathia in Game 4, this changes to:

Team DivS% LCS% WS%

Boston 100.0 57.0 41.8
Cleveland 74.1 30.5 20.1
New York 25.9 12.5 9.3

Arizona 100.0 48.6 14.9
Colorado 100.0 51.4 14.0

Looks better, doesn't it? Maybe Wedge is a square and bet on the Yankees.

Edit: Why did the Ari/Col numbers change, you ask? I reduced the impact of a variable to account for Brandon Webb starting three games in the series, because I haven't seen any indications that it will happen.

10/7 Update

Team DivS% LCS% WS%

Boston 91.1 52.5 38.5
Cleveland 83.0 35.2 23.1
Los Angeles 8.9 3.9 2.5
New York 17.0 8.4 6.2

Arizona 100.0 49.9 15.6
Colorado 100.0 50.1 14.1

Saturday, October 6, 2007

A Quirk

You may have noticed that the current incarnation of the playoff odds indicates the Rockies have the edge over the D-Backs in the NLCS, but Arizona has a better shot to win the World Series once they get there. What gives?

It is very likely Arizona will set up their World Series rotation with Brandon Webb starting Games 1, 4, and 7. For them not to do so would be both colossally stupid and inconsistent with Bob Melvin's track record. (It's actually possible they will do this in the NLCS as well. Right now, I've set a low probability that Webb goes three games in the NLCS, but this may change based on developments, and take Arizona's LCS% up with it.)

Meanwhile, I don't see the Rockies doing the same with Jeff Francis, and anyway, Francis is no Brandon Webb. Colorado has a more balanced rotation, which doesn't give them any additional help in a short series.

Really, this D-Backs team is reminiscent of the '01 incarnation, except they're missing the second stud SP and the big bopper in the middle of the lineup. Of course, those are some big shoes to fill, but if Webb can win three games, Arizona may have a puncher's chance.

How This Works

This is essentially a three-step process for each team:

1. Determine the target number of wins to achieve the team's desired outcome.

For example, the goal might be to clinch a division or to win a playoff series. In the case of a division race, the target (magic number) can also be reached by virtue of the other team losing games.

2. Estimate the winning chances for both teams in each relevant remaining game.

I'm a professional baseball handicapper. It's my job to get this step right, although I'm not always going to be 100% accurate.

"Relevant" game means we're dealing with all games involving the team itself, plus the teams chasing them in a division race, and their potential future opponents in the playoffs. For example, the current Red Sox probability to win the World Series is affected by every game played, because each game changes their likelihood of facing a given opponent later on. Their chances to win the AL are affected only by AL games.

It is this step which I hope will separate me from the pack of others who have developed playoff odds estimates. If Clay Davenport ever decides to start handicapping individual games, I'll be out of business in no time.

3. Use probability calculations to determine the chance of the team achieving its goal.

If a team needs to win three games in a row with winning probabilities of .474, .475, and .395, it has a (.474)(.475)(.395) = 8.9% chance to win all three. (This is the actual situation the L.A. Angels are in right now.)

Most calculations are going to be more complex than this, which is why I'm doing them for you.
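The simplest case from step 3, a team that must win every remaining game, can be sketched in a few lines of Python. It assumes the games are independent, and the function name is made up for illustration.

```python
from math import prod


def win_all_probability(game_probs):
    """Chance of winning every listed game, assuming independent games."""
    return prod(game_probs)


# The Angels example from step 3: three must-win games.
print(f"{win_all_probability([0.474, 0.475, 0.395]):.1%}")  # 8.9%
```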

Why MLB Playoff Odds?

You may be familiar with some or all of the three sites that have produced postseason odds for MLB. Why are my numbers different, and more importantly, why are they better?

The three sites above use what is known as a Monte Carlo simulation. In effect, the Monte Carlo engine creates a million different possible futures. In each future timeline, every team has a fixed percentage chance to win each of its scheduled games. If the Yankees are a 60-40 favorite over the Indians at Yankee Stadium, they are assigned a 60% chance to win that game in each simulation.

This is a good method, but to use it optimally, we need accurate inputs. All of the above simulators assume that a .500 team is a .500 team on any given day. In our example, the Yankees will be assigned a 60% chance to beat the Indians regardless of whether the pitching matchup is Wang-Laffey or DeSalvo-Sabathia. If A-Rod and Jeter collide while going for a pop-up and knock each other out for tomorrow's game, the .600 figure remains unchanged.
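For anyone unfamiliar with the technique, here is a toy Monte Carlo sketch in Python. It is not the code any of these sites actually run. It simulates a single best-of-seven series rather than a full season, and the function name, trial count, and seed are arbitrary choices for illustration. Passing a list of per-game probabilities makes the point about inputs: a fixed .600 every night is just the special case where every entry is the same.

```python
import random


def simulate_series(game_probs, wins_needed, trials=50_000, seed=2007):
    """Estimate a series win probability by simulating it many times.

    `game_probs[i]` is the team's chance to win game i+1.
    """
    rng = random.Random(seed)
    losses_needed = len(game_probs) - wins_needed + 1
    series_wins = 0
    for _ in range(trials):
        wins = losses = 0
        for p in game_probs:
            if rng.random() < p:
                wins += 1
            else:
                losses += 1
            # Stop as soon as either side clinches.
            if wins == wins_needed or losses == losses_needed:
                break
        if wins == wins_needed:
            series_wins += 1
    return series_wins / trials


# A 60-40 favorite in every game of a best-of-seven.
fixed = simulate_series([0.60] * 7, wins_needed=4)
print(f"Estimated series win probability: {fixed:.3f}")
```

The exact answer for a 60% favorite in a best-of-seven is about .710, so the estimate should come out close to that. Swapping in handicapped game-by-game numbers only requires changing the list.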

Furthermore, of the three, only Coolstandings applies tiebreakers to determine a champion in deadlocked playoff races. There are many instances where one team has a tiebreaker advantage clinched well before the end of the season, but the simulator simply counts this as half a win. That's fine for rough estimates, but we can do better.

Now, over a long season, these things tend to even out. But late in the season, or during the playoffs, these factors have a great influence on the race and the probabilities of each team advancing.

I'm not familiar with running my own Monte Carlo simulations, although Clay Davenport or the Coolstandings guys are welcome to give me recommendations for learning. What I can work with is probability distributions. Say you handicap a team's chances to win each game of a 5-game series at .629, .514, .723, .450, and .498, respectively. From there, it's a relatively simple process to determine their probability of winning the series (61.8%). If the team wins Game 1, this figure goes up (75.9%); if they lose, it declines (38.0%).
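The five-game example can be reproduced with a short Python sketch. This is one reasonable way to do the calculation under the assumption that games are independent, not necessarily the exact spreadsheet behind the numbers here. The function name and the `wins_so_far` parameter are hypothetical.

```python
def series_win_probability(game_probs, wins_needed, wins_so_far=0):
    """Chance of reaching `wins_needed` total wins given the remaining games.

    `game_probs` lists the win probability for each remaining game.
    """
    # dist[k] = probability of exactly k wins in the remaining games
    dist = [1.0]
    for p in game_probs:
        nxt = [0.0] * (len(dist) + 1)
        for k, prob in enumerate(dist):
            nxt[k] += prob * (1 - p)
            nxt[k + 1] += prob * p
        dist = nxt
    still_needed = max(wins_needed - wins_so_far, 0)
    return sum(dist[still_needed:])


probs = [0.629, 0.514, 0.723, 0.450, 0.498]
overall = series_win_probability(probs, wins_needed=3)
after_win = series_win_probability(probs[1:], wins_needed=3, wins_so_far=1)
after_loss = series_win_probability(probs[1:], wins_needed=3, wins_so_far=0)
print(f"{overall:.1%}  {after_win:.1%}  {after_loss:.1%}")
```

This gives 61.8% before the series, 75.9% after a Game 1 win, and 38.0% after a Game 1 loss, matching the figures above.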

Similarly, in the last month of a playoff race, handicapping each individual game should give you substantially better accuracy than you'd get from a Monte Carlo simulation. If your team is playing the Yankees in the final week but Joe Torre is resting all his regulars for the playoffs, you shouldn't be rated as a tremendous underdog. If the schedule lines up so that the Padres can pitch Jake Peavy twice in their final five games, this is a big advantage for them.

My goal is to integrate these considerations. Since they're most useful in a simple and predictable format like the playoffs, there's no time like the present to debut them.

October 6 Playoff Odds

Here are the playoff advancement percentages for the remaining MLB teams as of October 6. The chart should be self-explanatory:

Team DivS% LCS% WS%

Boston 90.7 52.2 37.8
Cleveland 83.0 35.2 22.8
Los Angeles 9.3 4.1 2.6
New York 17.0 8.4 6.1

Arizona 83.1 40.2 12.6
Chicago 16.9 9.4 3.2
Colorado 85.3 41.7 11.7
Philadelphia 14.7 8.6 3.1

Inaugural Post

I'm not sure what will become of this blog. Chances are high it will die a quick death. If it doesn't, it could be something in the vein of Coolstandings and the BP Playoff Odds, but based on better inputs.

Welcome. When we hit it big, you can say you knew me before I was famous.