? ?

Previous Entry Share Flag Next Entry
Mystery Hunt lessons
It's almost February, but I still haven't posted my wrapup of the 2009 MIT Mystery Hunt. Before I get to the puzzles, let me just say that the mystery hunt has made job interviews very awkward for me! Every time I've been on the job market, I find that I want to cite "lessons learned" from mystery hunts past -- which then forces me to explain what the mystery hunt is ("well, imagine the world's hardest scavenger hunt, with puzzles"), and I end up sounding like a huge geek. Which may not be a bad thing, I guess...

Why does the Mystery Hunt keep cropping up? Well, in my recent interview it started with a discussion of the difference between "enforced" and "permissive" management styles. There is a great and good tendency in any large endeavor, corporate or puzzle-hunt, to draw up "best practices" to codify lessons learned about the most effective ways to do things. But what happens next? There is a crossroads between two fundamental strategies. The "enforcement" strategy tries to figure out how we can forcibly prevent people from doing things other than the "best thing". Bureaucracy and long lists of rules follow, along with authority: someone who will not let X do Y unless they have been convinced that Z has happened.

A better strategy is "permissive": assuming good-faith on the part of all involved, just passively monitor the situation, so that "bad things" can be remedied without intruding on all the people trying to do good things. This is a much more efficient strategy, especially if your resources are constrained — it's less costly to monitor than to prevent — but it involves a lot of trust. Some cultures can sustain the necessary trust, and some cannot.

Mystery hunt example: how do we prevent people from calling in too many wrong guesses at an answer? Team ACME solved the problem by just putting the communal answer phone at the front of the room, so that everyone would notice if someone got crazy with the guessing. Team Codex uses an enforcer: all answers must be called in by the designated operator-on-duty. Lest you get the idea that permissive==good and enforcement==bad, let me state clearly that Codex has solid reasons for its choice: it has so many remote team members (not physically present in team headquarters) that monitoring is difficult -- the simple "phone in the front of the room" doesn't work when your rooms are scattered across Seattle, Zurich, and countless other places.

At OLPC we faced the same issue with system administration: how to we keep our network and computers secure? OLPC came from the permissive MIT/Media Lab heritage: since it is a research and learning organization, MIT uses the permissive strategy, letting users do what they please on the academic network and focusing effort on detecting "bad behavior" (worms, cracked machines, attacks, misconfigurations) and efficiently and effectively black-holing small pieces of the network to contain these problems. If a naive user doesn't properly patch their personal computer and it gets hacked, that machine is noticed almost immediately and dropped off the network. "Traditional" system adminstration focuses on enforcement: by forcing all users to use only patched versions of approved software, we can proactively prevent problems -- but at the cost of also preventing Real Work and Unexpected Solutions.

Back to the mystery hunt: since the hunt is a competitive event, it's easy for people to get carried away with secrecy to protect the team's competitive advantages. But the mystery hunt starts at noon on a Friday, with people arriving that morning from around the country, and it is crucial to get all these new people integrated into the team as quickly as possible. Every minute that some team member can't print/can't access the network/can't access the wiki/can't access the chat rooms is a minute that could be spend advancing the goal: solving puzzles. Permissive strategies are a big win here, especially because the hunt code of conduct, observed by all seriously competitive teams, ensures the necessary trust. Other teams aren't actively seeking to hack into our systems; preventing Google from sucking everything into its archives and index is usually enough.

Another OLPC example: build systems. During the run-up to a stable build, it's very important to keep build changes under control. How to prevent developers from making uncontrolled changes to the stable build? The enforcement strategy appoints a build master Who Alone Is Powerful and forces all changes to be made via the build master. The permissive strategy says that developers are generally trustworthy (and are accountable to management if not!) and retains the ability of anyone to affect the build at any time. Instead of making it arbitrarily harder to fix things as the build approaches stability, we (a) trust the developers to commit only appropriate fixes to stable, (b) watch all the changes (which has the benefit of catching even unintentional changes) and (c) back out any well-intentioned fixes that might adversely affect stability. The buck still stops with the build master to determine what's "safe" for the stable build, but their direct intervention is limited to backing out the infrequent "bad thing" instead of being interposed in every frequent good thing.

This post is very long, and I've only discussed one of the "mystery hunt lessons". Others include effective ways to collaborate with remote resources/solvers, and the importance of making "the right thing easy". The best way to get people to do "the right thing" is to make it the easiest thing to do. For example: a perennial problem is keeping remote hunters "in the loop" about what puzzles have been solved and what new puzzles have been made available. Further, there's a lot of internal bookkeeping that needs to be done on these events. My best solution to the problem was a chatbot who lived in the chatroom ACME used to keep in touch with remote solvers. When a solution was phoned in, we'd type "bot, the solution to FROBNOTIC UPGRADES is BAZNIC". When a new puzzle was made available, "bot, new puzzle: GROOVY FROOD". This was the (only) means to update our collaboration software: the puzzles/answers would be added to the wiki's automatically generated "status blackboard", and new puzzle pages would be created in the wiki to track work done. But, by performing the operation in the shared chat room, it notified the remotes at the same time. Suddenly no one had to be reminded to update the remote hunters! (Codex has a similar system: all answers are phoned in by a dedicated operator, and you are encouraged to communicate with the operator via the shared ringhunters chat room. Codex doesn't have an effective a "new puzzle notification" system, although you could monitor the changelog of the wiki to see new pages being created.)

In our latest hunt, we used Google Spreadsheets extensively to collaboratively solve puzzles. Team members would typically create new sheets in the spreadsheet to test different theories for how the puzzle might work. This had the unexpected side effect of eliminating almost all updates to the puzzle's wiki pages -- it was easy to work in the spreadsheet, and hard to keep switching back to the wiki. The problem was that it is also "hard" to write long free-text blocks inside a spreadsheet, and so most sheets in the spreadsheet were "undocumented", and it was hard reconstructing what had been done on a puzzle when a newcomer picked it up. The discussion on our mailing list started along familiar paths: can we somehow force people to update the wiki? The "make the right thing easy" lesson suggests it would be more profitable to make it "easy" to add free-text blocks to sheets of the spreadsheet -- then the "right" thing will just naturally happen.

? ?