Thought: Staying Motivated With a Personal Project
Posted by josh in Management

I’ve been thinking and talking a lot lately about how to stay motivated. The fact is that we’re all human, and we all have ups and downs. Even if you’re super-motivated about something one day, the next day you might not be. I’ve often gotten on a kick for a couple of days, hammered out some code, and then had someone say “eh, that sucks.” It’s a total downer! Here are a few tactics you can try to stay motivated.
- Don’t listen to what other people say about your work, unless it will help make it better or points out an obvious flaw.
- Remember that if someone has feedback, it usually means there is something you need to act on.
- If you work on something for a while and lose interest, keep what you’ve done. Who knows, you may pick it up and continue working on it several months down the road.
- Finish things through to completion.
I think the last point is the most important. As software developers, we get distracted very easily. Oftentimes we become so entranced by every new technology and every different way of doing things that we never end up with a finished product. The old saying that “an application is never finished” has left a bad taste in my mouth since the first time I heard it. While there’s always room for improvement, finishing and releasing a product, and setting milestones for future work, is vital. Working in bigger companies, we sometimes forget that, because project managers, product managers, and so on handle it for us. We could learn a thing or two from them and apply it to our own side projects.
Aside from setting goals, most of the work happens within a very short window. It’s called being “in the zone”: the time when a programmer is completely focused on the task at hand and can’t be distracted by anything. This is the most important time to keep programming. If you have to stay up all night, do it. Here’s what Joel Spolsky has to say about being “in the zone” (I normally read him for entertainment rather than for advice on how to do my job, but that’s a topic for another post; this bit is good):
“Here’s the trouble. We all know that knowledge workers work best by getting into “flow”, also known as being “in the zone”, where they are fully concentrated on their work and fully tuned out of their environment. They lose track of time and produce great stuff through absolute concentration. This is when they get all of their productive work done. Writers, programmers, scientists, and even basketball players will tell you about being in the zone.
The trouble is, getting into “the zone” is not easy. When you try to measure it, it looks like it takes an average of 15 minutes to start working at maximum productivity. Sometimes, if you’re tired or have already done a lot of creative work that day, you just can’t get into the zone and you spend the rest of your work day fiddling around, reading the web, playing Tetris.”
If you only have time to get “in the zone” once a week, plan it. Turn off your cell phone, close your IMs, tell your wife you love her and won’t see her for a bit, and set the expectation that, for example, every Thursday night you’ll be hacking away and completely unavailable. Try to decide beforehand which “business decisions” or functionality you want to include. I think about it when I’m trying to get to sleep at night, taking a shower, eating breakfast, whatever, and I write down what I come up with at the next chance I get. But when it comes to getting it done, that night alone is vital.
This was a somewhat hacked-out, not-completely-thought-out thought. I’ll try to organize it better and follow up in another blog post, but this is what I’ve been thinking about. As always, your opinions and insights are appreciated, whether by email or in a comment.
Let’s face it: most bigger companies nowadays are afraid of trying something new. There’s good reason for that. Most new ideas fall by the wayside, as trends normally do, and companies like to play it as safe as possible. I see new ideas and frameworks popping up all over the Twittersphere every day, and I wouldn’t consider using any of them in a production environment.
Amazon Web Services Isn’t Just Pie in the Sky
The reason I bring this up is that Amazon Web Services is *still* considered a new, unproven technology in the business (not startup) world. With all the marketing hype around clouds, infinitely scalable services, and so on, I honestly don’t blame anyone for thinking so; it’s hard to believe a pie-in-the-sky promise. But that’s just the point: AWS is not pie in the sky, and people who think it is need to dig deeper and understand what it is and what it offers. Amazon Web Services has been around since 2002, and its uptime is most likely better than your data center’s. Amazon knows this and is trying to correct that false perception, making the case that IT IS GOOD FOR YOUR COMPANY TO USE IT TOO. They published this article, along with an updated cost calculator and an Excel spreadsheet for comparing your data center with AWS.
Backcountry.com and S3

OK, so on to the real reason for this article. At Backcountry.com, we try hard to stay as close to the bleeding edge as we can, but moving into “the cloud” has always met serious backlash. That is, until recently. Earlier this month we used the cloud in a production environment for the first time, using S3 to host our “Jumbo” product images.
First, let me explain why we decided to use S3. Our webapp tier, consisting of a few boxes, hosts the Interchange e-commerce framework and also serves all our static content. The trouble was that the 900×900 images took up about 100 GB of disk space, but each box had less than 20 GB free. That left us with two traditional options: put new hard disks in each webapp box, or host the images from a single location on our NetApp. Neither seemed ideal: new hard disks would be pricey and could take some time to install, and we were already short on NetApp space given the current budget. I had done some side work using S3 and mentioned it. Chris Alef championed the idea, and we agreed to go ahead with it.
Fast forward one week, and we were ready to go live. We converted and uploaded the 900×900 images to S3 over a weekend and got the UI in place in no time flat. We have Akamai providing an edge cache in front of S3, and we’ve had zero problems since launching last month. I asked our operations team what they thought the bill for the month would be, and they guessed $4,000. The actual bill? Under $50. Granted, Akamai probably absorbed most of the traffic, but that’s still mighty impressive.
There’s so much more we can do with AWS, and I hope this is just the start. I’d like to take advantage of other AWS services, such as EC2 and SQS, in the future, and I think S3 helped build confidence. AWS is a service that both startups and established internet businesses can rely on.
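For anyone wanting to try something similar, here is a minimal Python sketch of the upload step. It assumes the images are already resized to 900×900 on local disk, and it uses boto3’s `upload_file` to push each one. The bucket name, key prefix, and one-year `Cache-Control` value are hypothetical, not what Backcountry.com used; a long cache lifetime is simply one way to let an edge cache like Akamai absorb most requests so that S3 sees little traffic.

```python
import mimetypes
from pathlib import Path

# Hypothetical values for illustration; the post does not give the real bucket or key layout.
BUCKET = "example-jumbo-images"
KEY_PREFIX = "jumbo/900x900/"
# Assumed cache lifetime: long-lived caching lets the edge cache serve repeat requests.
CACHE_CONTROL = "public, max-age=31536000"


def plan_uploads(image_dir):
    """Return (local_path, s3_key, extra_args) tuples for every image under image_dir."""
    root = Path(image_dir)
    plan = []
    for path in sorted(root.rglob("*")):
        if not path.is_file():
            continue
        content_type, _ = mimetypes.guess_type(path.name)
        if content_type is None or not content_type.startswith("image/"):
            continue  # skip anything that is not an image
        key = KEY_PREFIX + path.relative_to(root).as_posix()
        extra = {"ContentType": content_type, "CacheControl": CACHE_CONTROL}
        plan.append((str(path), key, extra))
    return plan


def upload(plan, bucket=BUCKET):
    """Upload each planned file with boto3 (needs AWS credentials in the environment)."""
    import boto3  # imported lazily so planning works without boto3 installed

    s3 = boto3.client("s3")
    for local_path, key, extra in plan:
        s3.upload_file(local_path, bucket, key, ExtraArgs=extra)
```

Splitting planning from uploading makes it easy to review the key layout before anything is written; `upload(plan_uploads("/path/to/jumbo"))` would then perform the actual transfer.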
Website Emergency Response – Best Practices For Controlling Downtime
Posted by josh in Management
If you work for a website that receives a lot of traffic, you know how aggravating it is to get woken up in the middle of the night because the site is down or payments aren’t being processed. People call this many things (Seg-1, P5, it doesn’t matter); it means “the shit has hit the fan.” Working at Backcountry.com, I’ve seen my fair share of these. When you have a group of five or more people trying to work on the same problem, chaos can ensue. People duplicate each other’s work, step on each other’s toes, change things without letting anyone know, withhold vital information so they can be the “rockstar” who fixes the problem, and so on. The ultimate problem is that the company loses money, and the downtime is an embarrassment.

Emergency Response
The first practice I recommend, and the one thing I hope you take away from this article, is this: keep track of what you do. Track everything. Track the changes you make. Track the decisions you make. Track the data you gather, no matter how irrelevant it seems. Recently at Backcountry, we were troubleshooting a problem that involved nearly every part of our architecture: high load on the databases, high load on the webapps, steady traffic, more 500 errors, users losing sessions, and inconsistent traffic through the load balancer. But we couldn’t pinpoint the problem. The logs showed no errors, only timeouts, and there were no query locks in the database. We kept a record of all that information and tried to find correlations. We eventually reached a solution by putting the pieces we had tracked together until they made sense, and then everything else fell into place. As it turned out, there were two major problems at once: Varnish was passing through a 500 error on an RSS feed (a high-traffic URL), and the session databases were intermittently refusing connections (for various reasons). If we hadn’t recorded all the data, we couldn’t have made those connections.
The second practice I recommend is to elect a “Call Leader” when responding to an emergency. This coordinator has a few roles: communicate with business owners periodically, keep track of which tasks people are working on, and recommend (and in some cases dictate) which actions will be taken. A side effect is that communication patterns within the team become explicit: techs looking into the problem know they need to communicate with the Call Lead, and the Call Lead needs to work with the techs. This leaves the rest of the team free to concentrate on the problem at hand, and only on their specific silo. An example exchange might go like this:
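As an illustration of “track everything,” here is a minimal Python sketch of an incident log that timestamps observations, changes, and decisions so they can be lined up afterward. The class and field names are hypothetical; in practice a shared document or ticket does the same job.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from enum import Enum


class Kind(Enum):
    OBSERVATION = "observation"  # data gathered, however irrelevant it seems
    CHANGE = "change"            # anything touched on a live system
    DECISION = "decision"        # what was decided, and by whom


@dataclass
class Entry:
    kind: Kind
    author: str
    text: str
    at: datetime = field(default_factory=lambda: datetime.now(timezone.utc))


@dataclass
class IncidentLog:
    title: str
    entries: list = field(default_factory=list)

    def record(self, kind, author, text):
        """Append a timestamped entry and return it."""
        entry = Entry(kind, author, text)
        self.entries.append(entry)
        return entry

    def of_kind(self, kind):
        """All entries of one kind, e.g. every change made so far."""
        return [e for e in self.entries if e.kind is kind]

    def timeline(self):
        """One line per entry, oldest first, for correlating events after the fact."""
        return [f"{e.at:%H:%M:%S} [{e.kind.value}] {e.author}: {e.text}"
                for e in sorted(self.entries, key=lambda e: e.at)]
```

Because each entry carries its kind and author, the same log answers both “what have we observed?” and “who changed what, and when?”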
Tech 1: “I’m seeing some weird stuff in our Apache error logs, something about an error with connecting to session dbs. I’d like to take a look at it.”
Call Lead: “OK, go ahead.”
That’s all you need to communicate effectively, but you’d be surprised how many companies and teams don’t do it. Having a Call Lead makes the habit easier to adopt.
Let me move on to another subject: the stages of an emergency. One thing I’ve seen a lot of teams do is circle around a problem, jumping from one observation to the next, without ever remediating anything. I’ve laid out these stages so you know where you are in solving the problem and where you need to go next to get it fixed. The five stages are: Reaction/Response, Collection/State What You Know, Discovery, Remediation, and Verification.
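A Call Lead mostly needs to know who is on what, so that two techs don’t dig into the same silo at once. The hypothetical `TaskBoard` below sketches that bookkeeping in Python; the “one owner per area” rule is an assumption of this sketch, not something the process requires.

```python
class TaskBoard:
    """Hypothetical Call Lead helper: which tech is looking at which area."""

    def __init__(self):
        self._owner = {}  # area -> tech currently working on it

    def request(self, tech, area):
        """A tech asks to look into an area; returns (approved, reply)."""
        current = self._owner.get(area)
        if current is not None and current != tech:
            return False, f"{current} is already on {area}; sync with them first."
        self._owner[area] = tech
        return True, "OK, go ahead."

    def release(self, tech, area):
        """Free an area once the tech is done with it."""
        if self._owner.get(area) == tech:
            del self._owner[area]

    def status(self):
        """One line per assignment, for periodic updates to business owners."""
        return sorted(f"{tech}: {area}" for area, tech in self._owner.items())
```

For the exchange above, `board.request("Tech 1", "session dbs")` returns `(True, "OK, go ahead.")`, and a second tech asking for the same area is pointed back to Tech 1 instead of stepping on their toes.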
Reaction/Response
- Once you hear about the problem, whether from Nagios or the guy sitting next to you, make sure it’s documented however your team documents these things (Bugzilla, Jira, a Google Doc, whatever).
- Dial into a phone conference, join a chatroom, or do whatever you need to do to communicate with the other team members.
- Validate that there actually is a problem. You could waste expensive, valuable time by assuming the problem is larger than it actually is.
- Communicate outwardly that you are taking care of the problem.
- Get hold of ANYONE who should be there. Don’t be afraid to call the CEO of the company if you need to; the fact is that if the problem lasts more than X amount of time, the company will go under.
- Elect a Call Leader (discussed above).
This should take about 5 minutes at most.
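The “validate that there actually is a problem” step can be partly automated. This Python sketch probes a URL several times and only reports a problem if enough attempts fail, so a single blip doesn’t trigger a full response. The attempt count, 60% threshold, and function names are assumptions for illustration; Nagios’s soft and hard states serve a similar purpose.

```python
import time
import urllib.request


def http_probe(url, timeout=5.0):
    """Return True if url responds with a non-error status within timeout."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return 200 <= resp.status < 400
    except OSError:
        # URLError, HTTPError (4xx/5xx), and socket timeouts are all OSError subclasses.
        return False


def confirm_problem(probe, attempts=5, threshold=0.6, interval=0.0):
    """Run probe() several times; report a problem only if enough attempts fail.

    Returns (is_problem, failures).
    """
    failures = 0
    for i in range(attempts):
        if i:
            time.sleep(interval)  # spread attempts out to ride past brief blips
        if not probe():
            failures += 1
    return failures / attempts >= threshold, failures
```

A typical call would be `confirm_problem(lambda: http_probe("https://www.example.com/"), interval=10)`; passing the probe in as a function keeps the check usable for payment gateways or databases as well as web pages.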
Collection/State What You Know (SWYK)
- Have everyone on the call state what they know, and document it thoroughly. Make sure you state what YOU know.
- Set a schedule or a plan of attack.
Discovery
- The Call Leader makes assignments (depending on what people say, of course).
- Get a list of options/suggestions from the techs working on the issue.
- Weigh the options, and avoid “Analysis Paralysis.”
This should take 5 to 30 minutes, give or take.
Remediation
- The Call Leader decides on an option, and you execute it.
Some notes about this one:
You can easily get yourself into a bind by making changes too rashly, without thinking through the consequences. I try to follow these principles:
- Rollback should always be an option. Too many people are afraid to remove new functionality out of pride or for some other reason. After all, rolling back is most often the best solution.
- When you need to change live-site behavior immediately, try to do it in a rolling fashion. In a clustered environment, restart servers one at a time. When code changes are necessary, roll them out to one server first, if possible, to verify that they fix the problem.
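The rolling-change and one-server-first principles can be sketched in Python as follows. `rolling_apply` and `canary_deploy` are hypothetical helpers; the `deploy`, `rollback`, and health-check callables stand in for whatever your own tooling does (a restart script, a load balancer health check, and so on).

```python
def rolling_apply(servers, apply, healthy):
    """Apply an action (restart, deploy) to one server at a time.

    Stops at the first server that fails its check, so the rest of the cluster
    keeps serving traffic. Returns (completed_servers, failed_server_or_None).
    """
    completed = []
    for server in servers:
        apply(server)
        if not healthy(server):
            return completed, server
        completed.append(server)
    return completed, None


def canary_deploy(servers, deploy, rollback, problem_fixed):
    """Deploy to one server first and continue only if the fix works there.

    Returns the servers left running the new code.
    """
    canary = servers[0]
    deploy(canary)
    if not problem_fixed(canary):
        rollback(canary)  # the change didn't help: roll back rather than push on
        return []
    completed, failed = rolling_apply(servers[1:], deploy, problem_fixed)
    if failed is not None:
        rollback(failed)  # undo only the server that failed; the others stay as-is
    return [canary] + completed
```

A plain rolling restart is `rolling_apply(servers, restart, healthy)`; stopping at the first failure means a bad change affects one box rather than the whole cluster.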
Verification
This is another often-overlooked step in the process. It’s when the business verifies that the problem is fixed, or that there is no longer any customer impact.
There’s much more to say, so much that I could write a book on the subject, but I hope this is enough to be helpful. I may dive deeper into the different roles and practices in another post, so keep checking back. As always, I’d love to hear your feedback on the subject.
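To keep a team from circling, it can help to make the current stage explicit. This Python sketch models the five stages in order, with the time budgets the post gives (about 5 minutes for Reaction/Response, and the upper end of 5-30 minutes for Discovery). Sending a failed verification back to Discovery is an assumption of this sketch; the post doesn’t say what happens then.

```python
from enum import IntEnum


class Stage(IntEnum):
    REACTION = 1      # Reaction/Response
    COLLECTION = 2    # Collection/State What You Know
    DISCOVERY = 3
    REMEDIATION = 4
    VERIFICATION = 5


# Minutes per stage where the post gives a number; Discovery uses the upper end
# of its 5-30 minute range. Other stages have no budget here.
BUDGET_MINUTES = {Stage.REACTION: 5, Stage.DISCOVERY: 30}


class Incident:
    def __init__(self):
        self.stage = Stage.REACTION
        self.history = [self.stage]

    def advance(self):
        """Move to the next stage; stages are never skipped."""
        if self.stage == Stage.VERIFICATION:
            raise ValueError("already verifying; close the incident or reopen discovery")
        self.stage = Stage(self.stage + 1)
        self.history.append(self.stage)

    def verification_failed(self):
        """Assumption: if the business still sees impact, go back to Discovery."""
        if self.stage != Stage.VERIFICATION:
            raise ValueError("only a verification can fail")
        self.stage = Stage.DISCOVERY
        self.history.append(self.stage)


def over_budget(stage, minutes_in_stage):
    """True if the team has spent longer in this stage than the post suggests."""
    budget = BUDGET_MINUTES.get(stage)
    return budget is not None and minutes_in_stage > budget
```

The `history` list doubles as a record of how many times the team looped back, which is worth reviewing after the incident.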
