This blog post summarizes the key points that came up in a conversation between Joe Justice and Chris Wallace about the challenges Chris is having with consistently and accurately estimating points for user stories in the WikiSpeed shop in Burleson, TX, USA.
While explaining his current experience with story point estimation, Chris said he thinks point estimates would not be consistent between different Scrum development teams, because the skill sets and collective expertise of the team members are different on each team.
Joe offered an analogy to clear up this common misconception about story point estimation. Imagine you are moving buckets of water. Clearly, moving a 5-gallon bucket takes more effort than moving a 3-gallon bucket. Now imagine a child moving the same buckets. Moving either bucket is harder for the child than for an adult, but the 5-gallon bucket is still harder to move than the 3-gallon bucket by the same relative amount. So, if we estimate points using relative sizes, we will have a consistent and accurate estimate of complexity for each story.
Chris pondered this water bucket analogy for some time but still had questions, based on his experience working with multiple Scrum teams while simultaneously completing tasks that would typically be considered professionally unrelated. The following is an excerpt from Chris’s reply. NOTE: We use T-shirt sizes (S, M, L, XL, 2XL) for estimation, which are then converted to Fibonacci numbers for tracking velocity.
“I absolutely agree with the water bucket analogy. And it works great if you are ONLY writing software, or ONLY working in a machine shop. I think the difficulty I’ve found personally with estimating has to do with estimating relative tasks that are of a different ‘type’ or require a different typical skill-set. For example, right now I consider measuring, marking, drilling 8 holes, and pass/fail testing 1 angle plate as an S. However, it was larger than an XL the first time I did it because of the massive amounts of knowledge I had to learn, weed through, and experiment with originally. I knew the Innovation Racing Series Captcha integration would be an XL or maybe more. Now, for me…it is also an S. I think maybe what I’ve learned is to keep the estimation in the context of its type, or field, or skill-set. Clearly making one angle bracket is an S compared to assembling the whole SGT01 chassis. And adding a Captcha is an S compared to eBay auto auctions integration into the Innovation Racing Series event page. I may be way off base, but I’m going to try keeping this caveat in mind and see if it improves my estimation metrics accuracy and consistency.
“Another analogy. Moving 5 gallon vs 13 gallon bucket as opposed to eating 5 pies vs 13 pies. Of course, the 13 is larger than 5 for every person. For a kid, I would assume eating 5 pies may be easier than moving a 5 gallon bucket. For me moving the bucket is easy, eating the pies would kill me. So are both the 5 gallon bucket and the 5 pies a 5? In my original view they were not. However, I think if we add context (only bucket moving or only pie eating) to the relative estimates, they are in fact, both 5’s for any team.” -Chris Wallace
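As a rough illustration of Chris's idea, here is a minimal Python sketch of estimates kept in the context of their type of work, using the T-shirt-to-Fibonacci conversion mentioned in the note before his reply. The specific point values are an assumption (the post does not say which Fibonacci number each size maps to), and the category names, the sizes given to the chassis and eBay stories, and the helpers `points_for` and `category_total` are hypothetical.

```python
# ASSUMED mapping: the post says T-shirt sizes are converted to Fibonacci
# numbers but does not give the values, so these are illustrative only.
TSHIRT_TO_POINTS = {"S": 1, "M": 2, "L": 3, "XL": 5, "2XL": 8}

# Estimates are grouped by type of work, so a size is only ever compared
# with other stories of the same type. Sizes other than the two "S"
# stories are hypothetical.
estimates = {
    "machining": {
        "angle plate": "S",
        "SGT01 chassis assembly": "2XL",
    },
    "web": {
        "Captcha integration": "S",
        "eBay auction integration": "XL",
    },
}


def points_for(category: str, story: str) -> int:
    """Convert one story's T-shirt size to points within its category."""
    return TSHIRT_TO_POINTS[estimates[category][story]]


def category_total(category: str) -> int:
    """Sum the points of every story estimated in one category."""
    return sum(TSHIRT_TO_POINTS[size] for size in estimates[category].values())


if __name__ == "__main__":
    for category in estimates:
        print(category, category_total(category))
```

In this sketch, the angle plate and the Captcha integration carry the same points because each is small relative to the other stories of its own type, which is the sense in which the 5-gallon bucket and the 5 pies can both be a 5.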
Notes added by Joe: Awesome blog Chris! The reference stories I’m running into with teams seem to be used per profession or type of work. So, aluminum machining has reference stories S, M, L (and maybe XS and XL, each mapped to Fibonacci numbers). Arduino programming and wiring has reference stories S, M, L. Composite work has reference stories S, M, L. We can only compare one team’s velocity to the velocity of a team doing a similar type of work. A cross-functional experienced team, like the one you are now building at the Texas shop, will have reference stories in multiple or all categories, and eventually reference stories across categories like “Build a car, XL.” Points are from an investor’s point of view, outside the team. They want to know how much they need to pay this team (weeks of run rate) to get 1,000 points’ worth of car stuff. So they don’t want larger points if the team has to skill up; they simply want real velocity. This is why investors prefer mature teams that are used to working together and have already built cross-functionality, getting double or more velocity with many different types of reference stories relevant to the investment.
The first time any new type of work is undertaken, or when a skill is rusty, it takes more effort. But the points for the work are still the same. If teams need to set time aside to skill up or research, that’s where the special story types are used: Spike, Research Story, and Tracer Bullet. -Joe Justice
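Joe's investor view comes down to simple arithmetic: the points for the work stay fixed, and velocity determines how many weeks of run rate they cost. The sketch below is only an illustration under stated assumptions. The velocities and the weekly run rate are made-up numbers, velocity is assumed to be measured in points per week, and the function names `weeks_to_deliver` and `funding_needed` are hypothetical.

```python
import math


def weeks_to_deliver(total_points: int, points_per_week: float) -> int:
    """Whole weeks needed to complete total_points at a given velocity."""
    if points_per_week <= 0:
        raise ValueError("velocity must be positive")
    return math.ceil(total_points / points_per_week)


def funding_needed(total_points: int, points_per_week: float,
                   weekly_run_rate: float) -> float:
    """Cost of the work: weeks required times the team's weekly run rate."""
    return weeks_to_deliver(total_points, points_per_week) * weekly_run_rate


if __name__ == "__main__":
    # ASSUMED figures for illustration; none of these come from the post
    # except the 1,000-point target.
    new_team_velocity = 40      # points per week while still skilling up
    mature_team_velocity = 80   # "double or more velocity"
    run_rate = 10_000           # cost per week, in any currency

    print(weeks_to_deliver(1000, new_team_velocity))
    print(weeks_to_deliver(1000, mature_team_velocity))
    print(funding_needed(1000, mature_team_velocity, run_rate))
```

Because the 1,000 points do not grow when a team has to skill up, the extra learning effort shows up as lower velocity and more weeks rather than as larger estimates.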
This post is an open discussion. Please comment with your experiences on estimating story points. We’d love to hear from you!