If you work for a consultancy, time tracking is obviously important because you need to bill the client accurately for the work that was done. I'm not going to address that here because I don't work for a consultancy. Some companies that are not consultancies also track time on an individual basis. The reason they typically give is to track costs related to different products or different parts of a product. I have two questions I want to look at: is this information useful, and if it is, can we get it another way?
First, let's take a look at the typical process in an agile development environment. The product manager works with customers, marketing, sales and anyone else relevant to figure out which features are important to the customer. These features are turned into stories and prioritized. The engineering team works on these stories in roughly priority order to meet the business needs.
Seems pretty straightforward so far, right? So, let's say we have two products that we sell, and product A costs 10 times as much as product B. Some customers purchase one and some purchase both. After analyzing our time, we determine that we are spending 40% of it on product A, 40% on product B and 20% on stories that benefit both. We compare cost with revenue and determine that we are showing a loss on product B and a healthy profit on product A, which more than offsets the loss on B. If that were the end of the story, we'd conclude that we need to shift more time to product A, raise the price of product B or get rid of product B altogether.
So what's wrong with this analysis? Several things, in my opinion. The first is that it doesn't take into account the product offering as a whole. For customers that purchase both products, does product A lose value without product B? What percentage of customers who use product B eventually also buy product A? A cohesive set of products is often more valuable than the individual parts. If we raise the price of product B or get rid of it, how does that affect our sales of product A and/or B? In retail terms, is product B a loss leader?
The second, and probably more important, problem is that it ignores the fact that the process outlined above means we are always working on the things we believe are most important to our customers. If our customers are more interested in new features for product B, why wouldn't we focus our time on product B? We should be concerned with overall satisfaction and revenue, not the satisfaction or revenue of any individual part.
So let's say we understand all that, but we still want to track the time spent. Is there a better way to get the information than having each individual track their time on each product? Part of the job of VPs, directors and managers is to be the interface between the business needs and the implementation of those needs. They are aware of sales data, and because they are involved in the agile planning process, they are also aware of how much time is being allocated to the various products. If we completed 40 points' worth of stories for product A and 40 points for product B, that gives them a pretty good estimate of how the team's time was allocated.
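As a rough illustration of the points-based approach, the Ruby sketch below turns completed story points per product into percentage shares of the team's time. The 40/40 figures come from the paragraph above; the 20 shared points are an assumption that mirrors the 40/40/20 split in the earlier example, and `allocation_from_points` is a hypothetical name. It also assumes points are a reasonable proxy for effort.

```ruby
# Hypothetical helper: convert completed story points per product into each
# product's percentage share of all points completed in the period.
def allocation_from_points(points_by_product)
  total = points_by_product.values.sum
  raise ArgumentError, "no completed points" if total.zero?

  points_by_product.transform_values { |points| (points * 100.0 / total).round(1) }
end

# Points completed in one planning period. A and B come from the text;
# "Shared" is assumed to mirror the 40/40/20 split from the earlier example.
completed = { "Product A" => 40, "Product B" => 40, "Shared" => 20 }

allocation_from_points(completed).each do |product, share|
  puts "#{product}: #{share}%"
end
```

No one had to log hours for this; the numbers fall out of the planning data the team already keeps.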
Maybe there is an argument for time tracking at the individual level that I am missing here, but I have a hard time seeing any concrete benefits. Even if the burden is very small, the small things tend to add up.
Wednesday, October 15, 2014
Monday, September 1, 2014
Hiring Software Engineers
We recently looked at our hiring process to try to determine what works and what doesn't. By "works" I mean what gives us insight into future performance. The first thing we realized was that we had no process.
Step 1: Create a process
The first thing we did was look at data from other companies. Google has published some information about its hiring process, in particular what didn't work. Brain teasers were out, but there wasn't a lot of good information about what does work. The problem is that without a lot of data you really don't know how good your process is. You certainly know how well the people you hired have performed, but you have no way of knowing how good the people you didn't hire were. And without a highly structured process, it's hard to match interview performance with real-world performance.
Next, we had everyone come up with a list of questions they typically ask candidates, and for each one we asked, "What does the answer to this question tell us?" Does it tell us how well the candidate will perform as an engineer? Does it tell us how good the candidate is at interviewing? Or does it give us some vague sense of a quality we think an engineer should have, which may or may not translate to real-world performance? It's hard to know which questions fall into the first category, but it's a little easier to filter out questions that fall into the last two.
We started by not presuming any relationships that we didn't have data to support. Does performance under stress in an interview translate into performance under stress in the real world? I don't know, so presuming that relationship does not give me useful information. Does the ability to "stand out" in an interview process correlate with being a good engineer? Probably not. For each question, we tried to determine how and why it would affect our decision to hire someone. Is specialized technical knowledge critical, important or just nice to have? What kinds of questions will tell us whether candidates are a good fit with the team?
After we came up with what we felt was a good set of questions, we looked at the interview environment. Since we have no data to support the hypothesis that being good at interviewing correlates with being good at software engineering, we wanted to create an atmosphere that was as relaxed as possible. We actually explain our process and the thinking behind it, because we don't want candidates to feel that stumbling over an answer out of nervousness will reflect negatively on them. We want to give them every opportunity to show us what they've got, because we don't want to miss out on someone really good over a bad interview.
The last thing we looked at was our filter for candidates. This one was tough: we don't want to waste time on someone who is obviously not going to work out, but we also don't want a filter so restrictive that we miss out on good candidates. We settled on FizzBuzz, a very simple test of whether candidates can program at all, or at least figure out how to use Google. We try to impress on recruiters that we don't want them to apply their own filters because, frankly, we don't trust them to make any better decisions than we would, but we haven't had much success on that front.
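For anyone who hasn't seen it, FizzBuzz is usually stated as: print the numbers from 1 to 100, but print "Fizz" for multiples of 3, "Buzz" for multiples of 5 and "FizzBuzz" for multiples of both. The exact prompt candidates receive isn't specified here, so treat this Ruby version as a sketch of the common formulation rather than a transcript of the screening test.

```ruby
# Classic FizzBuzz: multiples of 15 -> "FizzBuzz", multiples of 3 -> "Fizz",
# multiples of 5 -> "Buzz", anything else -> the number itself.
def fizzbuzz(n)
  if (n % 15).zero?
    "FizzBuzz"
  elsif (n % 3).zero?
    "Fizz"
  elsif (n % 5).zero?
    "Buzz"
  else
    n.to_s
  end
end

(1..100).each { |i| puts fizzbuzz(i) }
```

The one detail that trips people up is ordering: checking for 15 before 3 and 5, since testing 3 first would print "Fizz" for 15.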
So, how well is it working? I don't know. We certainly removed things that were not likely to be useful and put some priorities on the process that everyone understands and agrees on. I'd like to say time will tell, but we just don't hire enough engineers to collect the kind of data that a Google or Facebook could. Oh, and if you're looking for a job with some challenging problems, we are hiring.
Saturday, August 9, 2014
S3 with strong consistency
We use S3 extensively here at Korrelate, but we frequently run into problems with its eventual consistency model. We looked into working around it by using the US West region, which has read-after-write consistency, but most of our infrastructure is in the US East region and moving it would be a lot of work. Netflix has a project called s3mper that provides a consistency-checking layer for Hadoop using DynamoDB, but we really needed something for Ruby.
Since we also have a lot of infrastructure built around Redis, we decided to use it for our consistency layer because it's very fast and has built-in key expiration. The implementation is fairly simple: every write to S3 also writes a key to Redis containing the etag of the new object. When a read method is called, the etag in Redis is compared with the etag returned by the client. If they match, the read proceeds as normal. If they don't, an AWS::S3::Errors::PreconditionFailed is raised, and the client decides how to handle the error, whether that means retrying or doing something else. If the Redis key is nil, the data is assumed to be consistent.
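A minimal, self-contained sketch of that etag check in Ruby, assuming in-memory stand-ins for both services so it runs without AWS or Redis. `TtlStore` stands in for Redis (its `set(key, value, ex:)`/`get` shape is meant to resemble a Redis SET with an expiry), `FakeEventualBucket` simulates S3 lag by buffering writes until `propagate!`, and `ConsistentBucket`, the `s3-etag:` key prefix and the local `PreconditionFailed` class are all hypothetical names, not the actual Korrelate code or the aws-sdk API. The etag here is simply an MD5 hex digest of the body.

```ruby
require "digest"

# Stand-in for Redis: key-value store with per-key expiry.
class TtlStore
  def initialize
    @data = {}
  end

  def set(key, value, ex:)
    @data[key] = [value, Time.now + ex]
  end

  def get(key)
    value, expires_at = @data[key]
    return nil if value.nil? || Time.now >= expires_at
    value
  end
end

# Stand-in for an eventually consistent bucket: writes stay invisible
# to reads until propagate! is called.
class FakeEventualBucket
  Obj = Struct.new(:data, :etag)

  def initialize
    @visible = {}
    @pending = {}
  end

  def write(key, data)
    etag = Digest::MD5.hexdigest(data) # simplified etag
    @pending[key] = Obj.new(data, etag)
    etag
  end

  def read(key)
    @visible[key]
  end

  def propagate!
    @visible.merge!(@pending)
    @pending.clear
  end
end

# Hypothetical stand-in for AWS::S3::Errors::PreconditionFailed.
class PreconditionFailed < StandardError; end

# Hypothetical wrapper: writes record the new etag with a 24-hour expiry;
# reads compare the etag the bucket returns with the recorded one.
class ConsistentBucket
  TTL_SECONDS = 24 * 60 * 60

  def initialize(bucket, store)
    @bucket = bucket
    @store = store
  end

  def write(key, data)
    etag = @bucket.write(key, data)
    @store.set("s3-etag:#{key}", etag, ex: TTL_SECONDS)
    etag
  end

  def read(key)
    obj = @bucket.read(key)
    expected = @store.get("s3-etag:#{key}")
    return obj&.data if expected.nil? # no record: assume consistent
    raise PreconditionFailed, "stale read for #{key}" unless obj && obj.etag == expected
    obj.data
  end
end

bucket = FakeEventualBucket.new
s3 = ConsistentBucket.new(bucket, TtlStore.new)
s3.write("report.csv", "a,b,c")
begin
  s3.read("report.csv")
rescue PreconditionFailed => e
  puts "not consistent yet: #{e.message}"
end
bucket.propagate!
puts s3.read("report.csv")
```

Because the check lives in the wrapper, callers only see a normal read or a `PreconditionFailed`, which matches the split of responsibility described above: the layer detects staleness, the caller decides what to do about it.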
In practice, it never takes more than a second or two to get consistent data after a write, but we set the Redis key expiration to 24 hours to give ourselves plenty of buffer without polluting the DB with an endless number of keys.
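Since handling the error is left to the caller and consistency usually arrives within a second or two, one plausible client-side policy is to retry a few times with a growing delay before giving up. This Ruby sketch assumes a hypothetical `with_consistency_retries` helper, and the attempt count and delays are illustrative choices; `PreconditionFailed` is a local stand-in for AWS::S3::Errors::PreconditionFailed.

```ruby
# Hypothetical stand-in for AWS::S3::Errors::PreconditionFailed.
class PreconditionFailed < StandardError; end

# Run the block, retrying on PreconditionFailed with exponential backoff.
# Re-raises the error once all attempts have been used.
def with_consistency_retries(attempts: 4, base_delay: 0.25)
  tries = 0
  begin
    tries += 1
    yield
  rescue PreconditionFailed
    raise if tries >= attempts
    sleep(base_delay * (2**(tries - 1))) # 0.25s, 0.5s, 1s, ...
    retry
  end
end

# Usage: simulate a read that is stale for the first two attempts.
calls = 0
result = with_consistency_retries(base_delay: 0.01) do
  calls += 1
  raise PreconditionFailed, "stale" if calls < 3
  "fresh data"
end
puts "#{result} after #{calls} attempts"
```

With the defaults, the waits add up to 1.75 seconds across four attempts, roughly the window described above; a caller that can't tolerate the wait could instead catch the error and do something else.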
This is still incomplete because it doesn't cover listing methods in ObjectCollection like with_prefix and each, but it's a start.
Thursday, August 7, 2014
The Premortem
We are all familiar with the postmortem. When things go wrong, we want to understand why, so that we can hopefully avoid the same problems in the future. This works well for classes of failures caused by things like infrastructure or procedural deficiencies. Maybe you had a missing alert in your monitoring system, or something wasn't communicated to the right person. The problem is that we can't plan for the unexpected, and it's the unexpected that causes the most problems. The premortem is a way to surface some of those failures before they happen.
The way the premortem works is this: imagine it is one year from now and (insert your project here) has failed horribly. Please write a brief history explaining what went wrong.
This exercise attempts to bypass our natural tendency to downplay potential issues. Whether you are in favor of the project or not, it engages your imagination to come up with failure scenarios you might not otherwise have considered.
Give it a shot and please post comments about what you thought of it.
Wednesday, August 6, 2014
You are bad at estimating
For the last four years or so at Korrelate we have been using Scrum. We had two-week sprints, estimated stories and planned based on those estimates. In that time we have learned one very important lesson: we are very bad at estimating. In retrospect, this shouldn't have come as a surprise to anyone. There is a mountain of research across multiple fields demonstrating just how bad expert estimates are. They are so bad that, on average, they are worse than randomly assigned numbers. If you think you are somehow different and can do it better, you are wrong. You may be thinking that your estimates have been fairly accurate and your sprints largely successful. There are some reasonable explanations for this phenomenon:
- You are doing a lot of very similar tasks. Given a history of nearly identical work, it is possible to come up with fairly good estimates
- You have a large enough team that your bad estimates have roughly balanced each other out so far
- You are working a lot of extra hours
- You have yet to experience regression to the mean
Given that we know we are bad at estimating, what should we do? I propose a fairly simple change: weight all stories equally. That's right, don't estimate anything. This may sound crazy at first, but the evidence shows that equal weighting is on average better than expert estimates and as good or nearly as good as sophisticated algorithms. You would probably do better than your current estimates by basing them on the number of words in the story.
Now, obviously some stories will require more effort than others; we just don't have a good idea of which ones those are. I propose another change to help here: break every story up into the smallest pieces that make logical sense. Some stories will become epics that contain multiple smaller stories. If a story can't be sensibly broken up, just leave it. Do not make any attempt to equalize stories, either within the epic or against other stories; that's just a back door to estimating. I think this is the best attempt we can and should make to reduce the difference in effort between stories.
So, without estimates, how do you plan? This brings up the issue of sprints. My final proposal is that we abandon them as well. If you want to know how many stories the team is likely to complete over the next month, just take the average number of stories they have completed per month over the last few months. How long individual stories took to finish is irrelevant, and you can still follow trends in velocity over time and use them to make better predictions about things like how adding another engineer will affect velocity and how long it will take to ramp them up. Resist the urge to add your "professional intuition" into the equation; you will only screw it up. Trust the data, not your gut.
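That forecasting rule is simple enough to sketch directly. In this Ruby example, the monthly counts are made-up illustrative values, the three-month window is an assumption standing in for "the last few months," and `forecast_next_month` is a hypothetical name. The only input is how many stories were finished each month; no per-story estimates are involved.

```ruby
# Hypothetical helper: forecast next month's completed stories as the plain
# average of the most recent `window` months of throughput.
def forecast_next_month(monthly_completed, window: 3)
  recent = monthly_completed.last(window)
  raise ArgumentError, "no history" if recent.empty?

  recent.sum.to_f / recent.size
end

# Illustrative (made-up) counts of stories finished per month, oldest first.
history = [18, 22, 19, 24, 21, 23]

recent_avg = forecast_next_month(history)
earlier_avg = forecast_next_month(history.first(3))

# Comparing the two windows gives a crude view of the velocity trend.
puts format("Forecast: %.1f stories next month (earlier 3-month average: %.1f)",
            recent_avg, earlier_avg)
```

Note that nothing in the calculation lets anyone nudge the number by intuition; if the forecast looks wrong, the fix is more data, not an adjustment.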
I'd love to hear about your personal experience with estimating or not estimating, sprints vs. continuous deployment, or anything else related to improving the development process.