CrapFlingingMonkey.com
A voice for all developers

Archive for December 2009

duckhunt

Some coworkers and I had a nice discussion over dinner tonight about how things have changed over time.  More specifically, we talked about the wildly popular game “Duck Hunt”.  Yes, the Nintendo one.  How in the world does that thing work?  After some discussion, Nate Brunson finally whipped out his iPhone and came across this article detailing how Duck Hunt works.  It’s all pretty interesting stuff, and it was way ahead of its time.

But Nintendo wasn’t Agile!

The thing is, if the Nintendo were made in the “agile” world of today, would it have been released with Duck Hunt?  Would Duck Hunt ever have existed?  My inclination is no.  It would have been labeled as “too much scope for the first increment; we should release Mario Brothers, analyze the results, and go from there”.  Immediately following Mario Brothers, which would be a hit (obviously), they would follow up with Mario Bros 2, because hey, the first one did well.  After 2, the third increment would be… (surprise) Mario Bros 3.  Eventually the idea of Duck Hunt would have been forgotten.

If you want to change the world, don’t wait until the next increment

The point is that sometimes innovation comes at a cost.  You can’t always slim down functionality to meet a deadline and still expect to be innovative.  If there is an incredible idea out there to be had, you still need to just go for it, even if you’re not sure how long it will take, even if it means devoting resources to it, and even if you’re not sure it’s possible at all.

Where did we go wrong?

Why are we so afraid to just get things done?  I personally think it comes down to people not wanting accountability, or wanting to be absolutely positive that they can do what they say.  They are afraid to stretch themselves.  They really don’t care about being innovative.  They care about the business, about money, and about following a “standard procedure” or “the most effective way of doing something”.  Seth Godin is very popular and incredibly successful because he gives people the magic formula for creating a good product.  The only problem is that he doesn’t do it for you.  I’m not saying processes are a bad thing; I’m just saying that eventually some crazy guy needs to sit down, do the impossible, and get it done.  Don’t believe me?  How about the names Steve Wozniak, Ed Logg, or Brad Fitzpatrick?  Chew on them apples…


concentration

I’ve been thinking and talking a lot lately about how to stay motivated.  The fact is that we’re all human, and we all have ups and downs.  Even if you’re super-motivated about doing something one day, the next day you might not be.  I’ve had plenty of personal experiences where I get on a kick for a couple of days, hammer out some code, and then someone says “eh, that sucks”.  It’s a total downer!  Well, here are a few tactics you can try to stay motivated:

  • Don’t listen to what other people say about your stuff, unless it will help make it better or point out an obvious flaw.
  • Remember that if someone has feedback, that usually means you need to do something.
  • If you work on something for a while and lose interest, keep what you’ve done around.  Who knows, you may pick it up and continue working on it several months down the road.
  • Finish things through to completion.

I think the last point is the most important.  As software developers, we get distracted very easily.  Oftentimes we become so entranced by every new technology and every different way to do things that we never end up with a finished product.  The old saying that “an application is never finished” has left a bad taste in my mouth since the first time I heard it.  While there’s always room for improvement, finishing and releasing a product, and setting milestones for future work, is vital.  In bigger companies we can afford to forget that, because that’s what project managers, product managers, and so on are there for.  We could learn a thing or two from those guys and apply it to our own side projects.

Aside from the “setting goals” part, most of the work happens within a very small timeframe.  It’s called being “in the zone”: the time when a programmer is completely focused on the task at hand and can’t be distracted by anything.  This is the most important time to keep programming.  If you have to stay up all night, then do it.  Here’s what Joel Spolsky has to say about being “in the zone” (I normally read him for entertainment rather than for advice on how to do my job, but that’s a topic for another post, and this is good):

“Here’s the trouble. We all know that knowledge workers work best by getting into “flow”, also known as being “in the zone”, where they are fully concentrated on their work and fully tuned out of their environment. They lose track of time and produce great stuff through absolute concentration. This is when they get all of their productive work done. Writers, programmers, scientists, and even basketball players will tell you about being in the zone.

The trouble is, getting into “the zone” is not easy. When you try to measure it, it looks like it takes an average of 15 minutes to start working at maximum productivity. Sometimes, if you’re tired or have already done a lot of creative work that day, you just can’t get into the zone and you spend the rest of your work day fiddling around, reading the web, playing Tetris.”

If you only have time once a week to get “in the zone”, then plan it.  Turn off your cell phone, close your IMs, tell your wife you love her and won’t see her for a bit, and set the expectation that, for example, every Thursday night you’ll be hacking away and completely unavailable.  Try to know beforehand what “business decisions” you need to make, or what functionality you want to include.  I think about it when I’m trying to get to sleep at night, taking a shower, eating breakfast, whatever, and I try to write down what I think of at the next chance I get.  But when it comes to getting it done, that night of being alone is vital.

This was kind of a hacked-out, not-completely-thought-out post.  Hopefully I’ll organize it a bit better and follow up in another blog post, but this is what I’ve been thinking about.  As always, your opinions and insights are appreciated, whether through email or a comment.


December 10, 2009

S3 At A Real-world Company

Let’s face it, most bigger companies nowadays are afraid of trying something new.  That happens for good reason: most new ideas fall by the wayside, as trends normally do, and companies like to play it as safe as possible.  I see new ideas and frameworks popping up all over the Twittersphere every day, and I wouldn’t consider using any of them in a production environment.

Amazon Web Services Isn’t Just a Pie-In-The-Sky

The reason I bring this up is this: in the business (not startup) world, Amazon Web Services is *still* considered a new, unproven technology.  And with all the marketing hype around clouds, infinitely scalable services, and so on, I honestly don’t blame them.  It’s hard to believe a pie-in-the-sky promise.  That’s just the point: AWS is not pie in the sky, and people who think it is need to dig deeper and understand what it is and what it offers.  The fact is that Amazon Web Services has been around since 2002, and its uptime is most likely better than your data center’s.  Not surprisingly, Amazon knows this too, and is working to correct the false perception and show that IT IS GOOD FOR YOUR COMPANY TO USE IT TOO.  They published this article, along with an updated cost calculator and an Excel spreadsheet to compare your datacenter with using AWS.

Backcountry.com and S3

OK, so here’s the real reason for this article.  At Backcountry.com, we try hard to stay as close as we can to the bleeding edge, but going into “the cloud” has always received serious backlash.  That is, until recently.  Earlier this month we took advantage of the cloud for the first time in a production environment: by using S3 for our “Jumbo” product images.

First, let me explain why we decided to use S3.  Our webapp tier, consisting of a few boxes, hosts the Interchange e-commerce framework and also holds all our static content.  The trouble was that the 900×900 images took up about 100 GB of disk space, but each box had less than 20 GB left.  That left us with two traditional options: put new hard disks in each webapp, or use our NetApp to host the images from a single location.  Neither seemed ideal: new hard disks would be pricey and could take some time, and we were already short on NetApp space given the current budget.  I had done some side work using S3 and mentioned it.  Chris Alef pushed for it as a great idea, and we agreed to do it.

Flash forward one week, and we were ready to go live.  We were able to convert and upload the 900×900 images to S3 over the weekend and get the UI in place in no time flat.  We have Akamai providing an edge cache in front of S3, and we have had zero problems since launching last month.  I asked our operations team what they thought the bill for the month would be, and they guessed $4,000.  The actual bill?  Under $50.  Granted, Akamai probably absorbed most of the traffic, but that’s still mighty impressive.
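For anyone attempting a similar move today, the upload step might look something like the sketch below. It is not the script we used (this was 2009); it assumes the current boto3 SDK for Python, and the bucket name, the `jumbo/` key prefix, and the one-year `Cache-Control` header (meant to let an edge cache like Akamai serve most requests) are all hypothetical placeholders.

```python
import mimetypes
from pathlib import Path

# Hypothetical placeholders; substitute your own bucket and key layout.
BUCKET = "example-product-images"
KEY_PREFIX = "jumbo/"
# Assumed one-year cache lifetime so an edge cache in front of S3 can serve most hits.
CACHE_CONTROL = "public, max-age=31536000"


def plan_uploads(paths):
    """Turn local image paths into (local path, S3 key, ExtraArgs) tuples, skipping non-JPEGs."""
    plan = []
    for path in sorted(map(Path, paths)):
        if path.suffix.lower() not in {".jpg", ".jpeg"}:
            continue
        content_type = mimetypes.guess_type(path.name)[0] or "image/jpeg"
        extra = {"ContentType": content_type, "CacheControl": CACHE_CONTROL}
        plan.append((str(path), KEY_PREFIX + path.name, extra))
    return plan


def upload_all(plan, bucket=BUCKET):
    """Upload each planned file with boto3 (requires AWS credentials)."""
    import boto3  # imported here so plan_uploads can be used without boto3 installed

    s3 = boto3.client("s3")
    for local_path, key, extra in plan:
        s3.upload_file(local_path, bucket, key, ExtraArgs=extra)
```

A run would be something like `upload_all(plan_uploads(Path("jumbo_images").glob("*.jpg")))`, where the directory name is again a placeholder. Splitting the plan from the upload makes it easy to check the key layout before pushing 100 GB over the wire.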

There’s so much more we can do with AWS, and I hope this is just the start.  I hope to be able to take advantage of other AWS services such as EC2 and SQS in the future, and I think S3 helped build confidence.  AWS is a service that startups and established internet businesses alike can rely on.


December 9, 2009

YUI-Magnifier Released

A coworker of mine, Dustin McQuay, released the YUI Magnifier, a YUI implementation of the kind of image zoom utility that is popular in other libraries.  We were actually surprised to see that nothing like it already existed for YUI, so Dustin took on the challenge of building his own, in the hope that it might be included in other, larger YUI libraries.

It boasts the following features:

  • Display a magnified portion of an image, which is controlled by where the mouse is hovering over the image
  • Control over styling
  • Control over location of magnification lens
  • Magnified image can be wrapped by a larger element
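The core of any magnifier like this is a bit of arithmetic: scale the cursor position on the small image up to the large one, then shift the large image so that point sits in the middle of the lens. Here’s a minimal sketch of that idea; it is not taken from the YUI Magnifier source, and the function name and parameters are hypothetical.

```python
def lens_offset(mouse, thumb_size, big_size, view_size):
    """Return (left, top) offsets to position the large image inside the lens.

    All arguments are (x, y) or (width, height) pairs in pixels.
    """
    offsets = []
    for m, small, big, view in zip(mouse, thumb_size, big_size, view_size):
        scale = big / small          # e.g. 900 / 300 = 3.0
        point = m * scale            # cursor position on the large image
        offset = view / 2 - point    # move that point to the lens center
        # Clamp so the lens never shows empty space past the image edges.
        offset = max(view - big, min(0, offset))
        offsets.append(offset)
    return tuple(offsets)
```

With a 300×300 thumbnail, a 900×900 jumbo image, and a 200×200 lens (sizes assumed for illustration), hovering over the thumbnail’s center gives offsets of (-350.0, -350.0). The clamp is why the lens stops scrolling once you reach an edge of the image.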

Though the release wasn’t very public, it was still quite an accomplishment.  It happens to be one of the first open-source releases from Backcountry.com (preceded, to my knowledge, only by Bucardo, a Postgres replication application written for Backcountry.com by Endpoint).  It was originally designed to be used for our 900×900 images, but it got cut after development had essentially finished, due to changed requirements.

It’s a pretty solid application, and hopefully the start of more open source coming out of Backcountry.com.


If you work for any website that receives a lot of traffic, you know how aggravating it can be when you get woken up in the middle of the night because the website is down or payments aren’t getting processed. People call this many things (Seg-1, P5, it doesn’t matter): it’s when “the shit has hit the fan”. Working at Backcountry.com, I’ve seen my fair share. When you have a group of 5 or more people trying to work on the same problem, chaos can ensue. People will work on the same thing and step on each other’s toes, change something without letting others know, withhold vital information for the sake of being the “rockstar” who fixes the problem, and so on. The ultimate problem is that the company loses money, and the downtime is an embarrassment.

Emergency Response

The first practice I recommend, and the one thing I hope you take away from this article, is this: keep track of what you do. Track everything. Track changes you make. Track decisions you make. Track data you gather, no matter how irrelevant it seems. Recently at Backcountry, we were troubleshooting a problem that involved nearly every part of our architecture: high load on the databases, high load on the webapps, traffic staying the same, 500 errors increasing, people losing sessions, inconsistent traffic through the load balancer… but we couldn’t pinpoint the problem. Searching through the logs, we found no errors, only timeouts. There were no query locks in the database. We kept a record of all that information and tried to find correlations. We eventually reached a solution by putting the pieces (what we had tracked) together until they made sense, and then everything else fell into place. It turned out that, coincidentally, there were two major problems at once: Varnish was passing through a 500 error on an RSS feed (i.e., a high-traffic URL), and the session databases were intermittently refusing connections (for various reasons). If we hadn’t recorded all the data, we couldn’t have made those connections.
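A shared doc or chatroom works fine for this, but if you want something more structured, a minimal sketch of an incident log might look like the following. The `IncidentLog` class and its three entry kinds (observation, change, decision) are hypothetical naming that mirrors the “track data, changes, and decisions” advice; putting everything on one sorted timeline is what makes correlations easier to spot.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

# Hypothetical entry kinds: data gathered, changes made, decisions taken.
KINDS = {"observation", "change", "decision"}


@dataclass
class Entry:
    at: datetime
    kind: str
    author: str
    text: str


@dataclass
class IncidentLog:
    entries: list = field(default_factory=list)

    def record(self, kind, author, text, at=None):
        """Append one timestamped entry; defaults to the current UTC time."""
        if kind not in KINDS:
            raise ValueError(f"unknown entry kind: {kind!r}")
        entry = Entry(at or datetime.now(timezone.utc), kind, author, text)
        self.entries.append(entry)
        return entry

    def of_kind(self, kind):
        """All entries of one kind, e.g. every change made so far."""
        return [e for e in self.entries if e.kind == kind]

    def timeline(self):
        """Every entry in time order, one line each, for spotting correlations."""
        ordered = sorted(self.entries, key=lambda e: e.at)
        return [f"{e.at:%H:%M} [{e.kind}] {e.author}: {e.text}" for e in ordered]
```

Entries can be recorded out of order as people remember them; `timeline()` sorts them back into sequence. Pass timezone-aware timestamps consistently, since Python can’t compare naive and aware datetimes.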

The second practice I recommend is to elect a “Call Leader” when responding to an emergency. This coordinator has a few roles: communicating with business owners periodically, keeping track of what tasks people are working on, and making recommendations (in some cases dictating) about what actions will be taken.  A side effect is that communication patterns within the team become explicit: techs looking into the problem know they need to communicate with the Call Lead, and the Call Lead needs to work with the techs.  This leaves the rest of the team free to concentrate on the problem at hand, each in their own specific silo.  An example conversation among the team might go like this:

Tech 1: “I’m seeing some weird stuff in our Apache error logs, something about an error with connecting to session dbs.  I’d like to take a look at it.”

Call Lead: “OK, go ahead.”

That’s all you need to communicate effectively, but you’d be surprised at how many companies and teams don’t do it.  Having a Call Lead makes this kind of exchange the norm.
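To make the Call Lead’s bookkeeping concrete, here is a hypothetical sketch of a “who is working on what” board. The class and method names are illustrative, not any real tool; the point is that a tech asks before taking a task, and duplicate work gets turned down instead of two people stepping on each other’s toes.

```python
class CallLeadBoard:
    """The Call Lead's view of who is working on what during an incident."""

    def __init__(self):
        self.owner_of = {}  # task description -> tech currently on it

    def request(self, tech, task):
        """A tech asks to take a task; approve only if nobody else already has it."""
        current = self.owner_of.get(task)
        if current is not None and current != tech:
            return False, f"No, {current} is already on it."
        self.owner_of[task] = tech
        return True, "OK, go ahead."

    def release(self, task):
        """Free a task once it's done or abandoned."""
        self.owner_of.pop(task, None)

    def status(self):
        """(tech, task) pairs, sorted, for the periodic update to business owners."""
        return sorted((tech, task) for task, tech in self.owner_of.items())
```

The `status()` list doubles as the periodic update the Call Lead gives the business owners.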

Let me move on to another subject: the stages of an emergency.  One thing I’ve seen a lot of teams do is circle around a problem, jumping from one observation to the next, without ever remediating anything.  I’ve tried to lay out these stages so you know where you are in solving the problem and where you need to go next to get it fixed.  The five stages are: Reaction/Response, Collection/State What You Know, Discovery, Remediation, and Verification.

Reaction/Response

  • Once you hear about the problem, whether it’s from Nagios or the guy sitting next to you, make sure the problem is documented however your team documents these things (Bugzilla, Jira, Google Doc, whatever).
  • Dial into a phone conference, or join a chatroom, or do whatever you need to communicate with the other team members.
  • Validate there is actually a problem.  You could waste expensive, valuable time by assuming the problem is larger than it actually is.
  • Communicate outwardly that you are taking care of the problem
  • Get ahold of ANYONE who should be there.  Don’t be afraid to call the CEO of the company if you need to; the fact is that if the problem lasts more than X amount of time, the company will go under.
  • Elect a Call Leader (discussed above)

This should take about 5 minutes at most.

Collection/State What You Know (SWYK)

  • Have everyone on the call state what they know, documenting thoroughly.  Make sure you state what YOU know.
  • Set a schedule or a plan of attack.

Discovery

  • Call leader makes assignments (dependent on what people say, of course)
  • Get a list of options/suggestions from techs working on the issue.
  • Weigh options, and avoid “Analysis Paralysis”

This should take 5-30 minutes, sometimes more, sometimes less.

Remediation

  • The Call Leader decides on an option, and you execute it.

Some notes about this one:

You can easily get yourself into a bind by making changes too rashly, and not thinking about the consequences.  I try to use the following principles:

  • Rollback should always be an option.  Too many people are afraid to remove new functionality, out of pride or for whatever other reason.  After all, more often than not, just rolling back is the best solution.
  • When changing live-site behavior immediately, try to do it in a rolling fashion.  Restart servers one at a time when in a clustered environment.  When code changes are necessary, roll them to one server if possible to verify changes fix the problem.
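The rolling approach can be sketched as a loop that restarts one server, waits for it to pass a health check, and halts at the first server that doesn’t come back, so the rest of the cluster keeps serving traffic. `restart` and `is_healthy` are hypothetical hooks you’d wire up to your own tooling; nothing here is tied to a particular load balancer.

```python
import time


def rolling_restart(servers, restart, is_healthy, max_checks=10, interval=5.0):
    """Restart servers one at a time; stop at the first one that stays unhealthy.

    `restart(server)` and `is_healthy(server)` are caller-supplied hooks
    (hypothetical; wire them to your own init scripts and health checks).
    Returns the servers that were restarted and came back healthy.
    """
    done = []
    for server in servers:
        restart(server)
        # Poll the health check up to max_checks times, pausing between failures.
        for _ in range(max_checks):
            if is_healthy(server):
                break
            time.sleep(interval)
        else:
            # Never healthy: halt so the untouched servers keep taking traffic.
            raise RuntimeError(
                f"{server} unhealthy after restart; completed so far: {done}"
            )
        done.append(server)
    return done
```

Halting instead of pressing on is deliberate: if a restart doesn’t take, you’ve only pulled one box out of rotation, and rolling back stays cheap.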

Verification

Another often-overlooked step in the process.  This is when the business verifies that the problem is fixed, i.e., that there is no longer any customer impact.
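The five stages form a strict sequence, which is what keeps a team from circling. As a rough sketch with hypothetical names throughout, an incident tracker could refuse to skip stages and flag when a stage runs past the rough budgets given for Reaction/Response (about 5 minutes) and Discovery (up to 30). Stages without a stated budget are never flagged.

```python
from enum import Enum


class Stage(Enum):
    REACTION = 1      # Reaction/Response
    COLLECTION = 2    # Collection/State What You Know
    DISCOVERY = 3
    REMEDIATION = 4
    VERIFICATION = 5


# Rough minute budgets from this post; stages not listed have no budget.
BUDGET_MINUTES = {Stage.REACTION: 5, Stage.DISCOVERY: 30}


class Incident:
    """Tracks which stage an incident is in and how long it has been there."""

    def __init__(self):
        self.stage = Stage.REACTION
        self.minutes_in_stage = 0

    def tick(self, minutes):
        """Add elapsed time; return True if the current stage is over its budget."""
        self.minutes_in_stage += minutes
        budget = BUDGET_MINUTES.get(self.stage)
        return budget is not None and self.minutes_in_stage > budget

    def advance(self):
        """Move to the next stage in order; stages are never skipped."""
        if self.stage is Stage.VERIFICATION:
            raise RuntimeError("already in Verification; nothing left to advance to")
        self.stage = Stage(self.stage.value + 1)
        self.minutes_in_stage = 0
        return self.stage
```

Going over budget only raises a flag; it’s the Call Leader’s cue to stop weighing options and pick one, not a hard limit.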

There’s much more, so much that I could write a book on the subject, but I hope this is enough information to be helpful. I may dive deeper into the different roles and practices in another post, so keep checking back.  As always, I would love to hear feedback on the subject.

