Sunday, April 29, 2007

Stop Watches and Yardsticks

Recently, I've had occasion to reflect upon one of the initiatives at work. Managers would like to measure what the workers do. It's a two-edged sword. On the one hand, there's the Big Brother aspect that we've been assured is not the case. On the other hand, you can't improve if you don't measure.

I build software. If, instead, I were to dig ditches, we could easily measure the volume of dirt displaced. But what metrics does one apply to the construction of software? Lines of code? Bug reports? Cyclomatic complexity statistics?

Last Tuesday, John Cunningham of Band XI spoke at the West Michigan XP group and described some of his frustrations doing XP within IBM. He would faithfully deliver the numbers to his higher ups that they requested and know they were utterly misleading. I knew exactly what he meant, because I've done the same.

You record the time spent doing tasks and what gets lost is the utter non-value spent in some of those tasks and the absolute priceless-ness of chance encounters that come and go in two minutes. The former are reported to management and the latter are not. They serve to push management a step away from reality.

In short, I object to measurements that mislead, or can be gamed. Once I understood this I sought a metaphor that I could use to communicate what I have in mind. Consider a track meet. What measurement tools are used there? Some officials carry around stop watches and other officials carry tape measures.

In particular, consider the high jump competition. You set the bar a measured distance above the ground and the best athlete is the one who doesn't knock it down. You measure this performance with a tape measure.

But suppose an official who's been watching footraces all day comes along with a stop watch and no tape measure. He's perplexed until he notes that the guy who jumps highest also stays off the ground longest. He decides to measure the athlete's performance with just a stop watch.

What would happen then? The athletes would jump differently. But would they jump higher? Subtly, the fact that the measurement has changed will change the way the athletes jump. I don't think those changes will result in higher jumps. Performance suffers due to using the wrong measurement.

It seems silly to measure the high jump this way, but suppose you had lots of cheap stop watches and only a few expensive yardsticks. Or suppose you don't understand the field of endeavor well enough to see how yardsticks are better than stop watches.

Conversely, it'd be silly to take a yardstick or tape measure to a footrace.

So, I'm not saying "don't measure." I'm saying that you have to understand what you're doing well enough to select the correct measurement.

And you have to use, and not misuse, the correct measurement.

Let's go back to the Big Brother consideration I touched on earlier. In the UK there is now one video surveillance camera for every 14 people. Those cameras are in place to "catch" wrongdoing. Contrast this with a training context where your coach videotapes your jump to analyze your form, one frame at a time. He's got a purpose that you and he agree upon. He's not out to "catch" anything except those things you want to correct in order to jump higher. He's your Coach, not your Policeman.

Do you see a cop with a radar gun? Slide into something's radar shadow and decelerate. Then look at your speedometer. You change your behavior so he won't "catch" you speeding even when you're doing nothing wrong.

The relationship between Big Brother and those under his thumb is adversarial, whereas the relationship between Coach and athlete is cooperative. The fact that "Big Brother" even comes to mind speaks volumes of the culture of an organization.

Measurement in an adversarial context is a negative-sum game. In this context, everyone is encouraged by the system to replace reality (that might cast me or my political allies in a negative light) with whatever figures we can spin into a Potemkin village filled with smiling happy peasants, each tugging on their forelocks as the Empress sails by.

The cultural question probably hinges upon a question to the measurement taker that John Cunningham raised last Tuesday: "Do I want to know what's going on, or do I want to impose my will upon the situation?"

Wednesday, April 11, 2007

std::min and std::max versus Visual Studio

Some months back I wrote a wicked-cool solution using boost to encode binary data as base64. Sadly, it would not compile, because Microsoft did something evil in Windows.h when they defined min() and max() as macros. I'd seen this problem and coded around it before. So I modified the boost code that broke:

C:\dev\sdk\boost_1_32_0\boost\archive\iterators>svn diff
Index: transform_width.hpp
===================================================================
--- transform_width.hpp (revision 201)
+++ transform_width.hpp (working copy)
@@ -142,7 +142,7 @@
}
else
bcount = BitsIn - m_displacement;
- unsigned int i = std::min(bcount, missing_bits);
+ unsigned int i = min(bcount, missing_bits);
// shift interesting bits to least significant position
unsigned int j = m_buffer >> (bcount - i);


But was this the most righteous solution?

Just last week I had to code away from a righteous std::numeric_limits<int>::max() to the less righteous INT_MAX for the exact same reason. Today, this problem recurred. I had a choice between committing my change to boost, or powering through the problem. "OK, Microsoft, you've exceeded your kluge allocation."

The real problem isn't in boost, but in Visual Studio. This sent me googling to this link that said:

The Standard Library defines the two template functions std::min() and std::max() in the <algorithm> header. In general, you should use these template functions for calculating the min and max values of a pair. Unfortunately, Visual C++ does not define these function templates. This is because the names min and max clash with the traditional min and max macros defined in <windows.h>. As a workaround, Visual C++ defines two alternative templates with identical functionality called _cpp_min() and _cpp_max(). You can use them instead of std::min() and std::max(). To disable the generation of the min and max macros in Visual C++, #define NOMINMAX before #including <windows.h>.

Therefore, I REVERTED my change to transform_width.hpp. Since I do not #include <windows.h> myself, it doesn't matter where I #define NOMINMAX, so I put it in my project file's manifest of #defines.

It works and I feel more righteous.
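For the next guy who hits this, here is a minimal sketch of the two usual ways around the macro clash. The helper names (bits_to_take, largest_int, minmax_example) are made up for illustration, and the windows.h include is guarded so the snippet also builds on other platforms. The first approach is the NOMINMAX define described above; the second wraps the function name in parentheses, which keeps a function-like macro from expanding and so works even where NOMINMAX cannot be set.

```cpp
// Must come before any header that might pull in <windows.h>.
#define NOMINMAX
#ifdef _WIN32
#include <windows.h>
#endif

#include <algorithm>
#include <cassert>
#include <limits>

// Hypothetical helper: pick the smaller bit count, as the boost code does,
// using the standard function template rather than a macro.
inline unsigned int bits_to_take(unsigned int bcount, unsigned int missing_bits)
{
    return std::min(bcount, missing_bits);
}

// Alternative when NOMINMAX cannot be set: parenthesizing the name stops a
// function-like max() macro from expanding, so this compiles either way.
inline int largest_int()
{
    return (std::numeric_limits<int>::max)();
}

// Small usage check (hypothetical name).
inline void minmax_example()
{
    assert(bits_to_take(6, 2) == 2);
    assert(largest_int() > 0);
}
```

Setting NOMINMAX once in the project's list of #defines, as I did, has the advantage that third-party headers such as boost's never need to be touched.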

Sunday, April 08, 2007

Viewer Discretion Required

I just walked into the family room and the TV was already on. Before the next show came on the warning appeared, "Viewer Discretion Required." The show consisted of videos of people doing stupid things and flamboyantly and visually paying the price for it.

It became obvious that viewer discretion may be required, but for the photographer and the photograph-ee, discretion was pretty much absent.

Friday, April 06, 2007

C++ Templates Are Wizardry

I had to make a fix to a C++ program, and instead of doing the same old thing I had done before, I decided to effect a righteous STL/Boost standard solution. Here's the problem, I had a bunch of named objects and I needed to store them someplace.

I had a map and it worked until my boss asked, "is it case insensitive? does it ignore suffixes?" "No, should it?" "Yes & yes."

Though you wouldn't normally think of it, map requires a comparator to function. C# calls the same thing a sorted list, because they don't hide the fact that in order to work, the data structure has to be sorted on keys, and this comparator keeps the keys in sorted order. This comparator was what I needed to make my map work.

If I have two equivalent keys "Thing" and "thing.subthing" and want them to map to the same value, I can do so with a comparator functor. Simply declare the map with the comparator as its third template argument, and then define a comparator functor that ignores things like case and the stuff after the suffix character. The first part was a simple Boost exercise:

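class comparator
{
public:
    // Less-than for std::map keys: compares the strings case-insensitively.
    bool operator()(const string& s1, const string& s2) const
    {
        string t1(s1);
        string t2(s2);
        // ::tolower names the <cctype> function; a bare tolower can be
        // ambiguous with the <locale> template of the same name.
        transform(t1.begin(), t1.end(), t1.begin(), ::tolower);
        transform(t2.begin(), t2.end(), t2.begin(), ::tolower);
        //(more goes here later)
        return t1 < t2;
    }
};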

The boost string algorithm library is the way to go here. You should make it your friend. This handles the case-insensitive part. If that's all you need, you're done.

Note: you MUST implement a LESS THAN function or the std::map template won't work.

I had to ignore all the stuff to the right of the decimal point. So, I added the following bit:

        size_t dot1 = t1.find(".");
        size_t dot2 = t2.find(".");
        if (dot1 != string::npos)
        {
            t1 = t1.substr(0, dot1);
        }
        if (dot2 != string::npos)
        {
            t2 = t2.substr(0, dot2);
        }

I was a little disappointed with this solution. I had hoped to just adjust the end iterators of the transform() calls above. However, it got gnarly and harder to understand than this. If you know how I can replace the transform() call with something using a back_inserter(), let me know.
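One possible answer to the back_inserter() question, offered as a sketch rather than the definitive way: find the first '.' with std::find and hand that iterator to transform() as the end of the range, so the suffix is never copied at all. The names normalize_key, suffix_insensitive_less, and comparator_example are mine, the int value type is a placeholder, and the lambda assumes a C++11 compiler.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <iterator>
#include <map>
#include <string>

// Hypothetical helper: lower-case the part of `key` before the first '.'.
inline std::string normalize_key(const std::string& key)
{
    // Stop at the first '.', or at the end if there is none.
    std::string::const_iterator stop = std::find(key.begin(), key.end(), '.');
    std::string out;
    std::transform(key.begin(), stop, std::back_inserter(out),
                   [](unsigned char c) { return static_cast<char>(std::tolower(c)); });
    return out;
}

// Less-than functor for std::map: keys are equivalent when they differ only
// in case or in whatever follows the first '.'.
struct suffix_insensitive_less
{
    bool operator()(const std::string& a, const std::string& b) const
    {
        return normalize_key(a) < normalize_key(b);
    }
};

// Small usage check (hypothetical name).
inline void comparator_example()
{
    std::map<std::string, int, suffix_insensitive_less> things;
    things["Thing"] = 1;
    things["thing.subthing"] = 2;   // equivalent key, so it overwrites the value
    assert(things.size() == 1);
    assert(things["THING.other"] == 2);
}
```

Because the map only ever calls the less-than functor, two keys count as the same entry whenever neither compares less than the other, which is exactly the "Thing" versus "thing.subthing" behavior my boss asked for.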

After I got this working, I was impressed at how unhelpful the compiler and the language were in diagnosing the errors I'd made. I mentioned to a colleague that templates are the greatest thing ever invented, but they require a wizard to use gracefully. He agreed and suggested that if I did this every day, I'd be more efficient at diagnosis. He's right. When you work with something every day, you grok the philosophy of why things work, and you get a feel for why unhelpful compiler error messages say what they do.

Tuesday, April 03, 2007

A Righteous Way To Get The Fonts Directory

Some months ago, I had a bit of a problem where I had to write a program that went to the system fonts directory on a Windows machine. This is usually in "C:\Windows\Fonts" but on some machines the Windows directory can move.

Thus, after a bit of googling I came up with this code that worked consistently until this afternoon.

string result = System.Environment.ExpandEnvironmentVariables("%WinDir%");
result += "\\fonts\\";

This works because on Windows systems, you can find the WinDir environment variable set to the Windows directory. That quit working today when I encountered a weird interface problem with an ancient C++ program (written well over 10 years ago) that invokes a new, cool .NET program that uses the code above. The failure was an interesting one.

The old C++ code looks like this: (Maybe you can see what the problem was.)

loadParms.segEnv = 0;
loadParms.lpszCmdLine = commandLine;
loadParms.lpShow = show;
loadParms.lpReserved = NULL;

HINSTANCE hinst = LoadModule(exeName, &loadParms);

The loadParms struct has a segEnv handle that references the environment of the child process created by the LoadModule(). However, since it is not initialized to anything, the child process has NO environment defined. Thus the reference to %WinDir% gets nada instead of the correct directory. The newer program worked fine from the DOS command line and failed within the LoadModule() call above. This was the only clue to my problem.

I came into my boss's office chuckling about how the old system goofiness had screwed me over and pointed out the use of the %WinDir% environment variable. He countered that what I'd done above to use %WinDir% in the first place was not as righteous as using the various Environment.SpecialFolder methods/properties to get the same thing. I'd tried to do this months back and I'd failed to google up a better solution than the one using %WinDir%.

But I'd forgotten my earlier failure to find a more righteous solution. It took me a couple hours of googling to remember that I'd done this before without success. Happily, this time, I didn't give up and came upon the clues that yielded this snippet of code:

/// <summary>
/// get system font path
/// </summary>
/// <returns>system font path</returns>
static public string GetFontPath()
{
    string systemPath = Environment.GetFolderPath(Environment.SpecialFolder.System);
    string result = Path.GetDirectoryName(systemPath)
        + Path.DirectorySeparatorChar
        + "FONTS"
        + Path.DirectorySeparatorChar;
    return result;
}//GetFontPath()

This works in two parts. First, Environment.SpecialFolder.System gets me to C:\Windows\System32, which is too deep in the directory tree for me. Then the second step, invoking Path.GetDirectoryName(systemPath), strips off the "\System32" that is in my way, and I can add back the FONTS myself.
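The same two-step path surgery can be sketched in plain C++ for anyone stuck on the native side of the fence. This is only an illustration of the string handling: font_path_from_system_dir and font_path_example are hypothetical names, the system directory is passed in rather than looked up, and the backslash separator is an assumption that only holds on Windows.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: given the system directory (the value that
// Environment.SpecialFolder.System reports in the C# version), build the
// fonts directory path.
inline std::string font_path_from_system_dir(const std::string& system_dir)
{
    const char sep = '\\';
    // Step 1: drop the last path component ("System32").
    std::string::size_type cut = system_dir.find_last_of(sep);
    std::string parent = (cut == std::string::npos) ? system_dir
                                                    : system_dir.substr(0, cut);
    // Step 2: add FONTS back, with a trailing separator.
    return parent + sep + "FONTS" + sep;
}

// Small usage check (hypothetical name).
inline void font_path_example()
{
    assert(font_path_from_system_dir("C:\\Windows\\System32") == "C:\\Windows\\FONTS\\");
}
```

Note that nothing here reads %WinDir%, so an empty child-process environment like the one LoadModule() handed me would not matter.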

A righteous hack. I hope this will make the next guy's googling a little easier.

Sunday, April 01, 2007

Write It Now: Relationships

The kind folks who wrote WriteItNow recently sent an evaluation copy to a writers' group that I attend. Since I'm the geekiest fellow there, I snarfed it up and promised to write up a review.

This isn't that review. It's more of a first impression. Just one aspect that I have some opinions on: Relationships.

WriteItNow lets you define characters in your novel. I'm working through that right now. Each character has a description, relationships, and personality. The description is just plain flat text, with a drop-down box for sex. This could be improved by adding some structure for eye color, hair color, height and other physical attributes common to everyone. The Date Of Birth would be better replaced, or perhaps augmented, with an age when the story takes place. When I am designing a story, I like to think of it in terms of "today" or "five years from now." If you have in mind to use historical figures in your novel, the Date Of Birth feature will serve you well.

Each character can have a collection of relationships. When you define a relationship, you get a dialog that has two drop-downs. The first specifies the nature of the relationship: it says "wife of", "husband of", "aunt of", "niece of" and so on for about 27 choices. When I entered the first relationship, this was a problem, because I wanted the character to be the "tutor of" another character. Not in the list. Hmmmmm. No "mentor of" or "protégé of" either.

Only later did I discover the drop-down let me type in anything I wanted if it was missing from the drop-down.

A nice touch is that relationships can have "start" and "end" dates. But my complaint about birth dates applies here, too. Instead of saying that Chaz and Kim were married 3 years before Kim files for divorce, I have to specify some Saturday in June 2004 and some Wednesday in March 2007. Some specificity is good, but I'd like to hang loose on some details before I make other design decisions.

So, I started defining relationships between characters. And I noticed another problem. If I said that Kim was the wife of Chaz, it didn't automatically say that Chaz was the husband of Kim. I saw this was a potentially difficult thing to put into a program. For instance, if I say that Art is a lover of Jennifer, Jennifer may not be a lover of Art: she might just be stringing him along.

But just now a solution occurred to me. In the bowels of the source code for Write It Now, there must reside a software entity (called a class) named something by the programmer. I'll name it "Relationship" for now. This class would be responsible for holding everything associated with a Relationship between two characters. Right now, I suppose it holds a string that specifies the type of relationship (e.g. "husband of"), two date objects for start and end, and a reference to a Character object. I here propose a refactoring of that object. Replace the string that identifies the type of relationship with a RelationshipType object.

The relationship type object I'm proposing has a boolean attribute: Reciprocal. Some relationships are reciprocal, others are not: e.g. Chaz is a husband of Kim and Kim is a wife of Chaz, but Art is a lover of Jennifer while Jennifer has no such relationship with Art. With this distinction in mind, the programmer can associate reciprocal relationships automatically. When I add a "husband of" relationship between Chaz and Kim, it creates a reciprocal relationship "wife of" between Kim and Chaz. If I change my mind and want to delete a reciprocal relationship, the software must go to the other character and delete its reciprocal relationship, too.

If you buy into the notion of a reciprocal relationship, you also have to worry about the polarity of that relationship. When I look at a relationship between Chaz and Kim from the perspective of Chaz, I want to see "husband of" and when I look at Kim's relationships, I should see "wife of". In graph-theoretic terms, reciprocal relationships are "directed" arcs between characters, and this notion of polarity or direction captures this distinction.
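To make the proposal concrete, here is a rough sketch of how such a refactoring might look. I have no idea how WriteItNow is actually built, so every name below (RelationshipType, Relationship, Cast, and their members) is hypothetical, and the start and end dates are left out to keep it short. Polarity is handled by storing each side's own label, so Chaz sees "husband of" and Kim sees "wife of".

```cpp
#include <algorithm>
#include <cassert>
#include <map>
#include <string>
#include <vector>

// All names are hypothetical; this is not WriteItNow's real design.
struct RelationshipType
{
    std::string label;          // e.g. "husband of"
    bool reciprocal;            // does the other character get an entry too?
    std::string reverse_label;  // e.g. "wife of"; unused when not reciprocal
};

struct Relationship
{
    std::string label;  // how it reads from the owner's side (the polarity)
    std::string other;  // name of the other character
};

class Cast
{
public:
    // Adding a reciprocal relationship also creates the reverse entry.
    void add(const std::string& from, const std::string& to, const RelationshipType& type)
    {
        links_[from].push_back(Relationship{type.label, to});
        if (type.reciprocal)
            links_[to].push_back(Relationship{type.reverse_label, from});
    }

    // Deleting a reciprocal relationship deletes the reverse entry, too.
    void remove(const std::string& from, const std::string& to, const RelationshipType& type)
    {
        erase(from, type.label, to);
        if (type.reciprocal)
            erase(to, type.reverse_label, from);
    }

    const std::vector<Relationship>& of(const std::string& name) { return links_[name]; }

private:
    void erase(const std::string& owner, const std::string& label, const std::string& other)
    {
        std::vector<Relationship>& v = links_[owner];
        v.erase(std::remove_if(v.begin(), v.end(),
                    [&](const Relationship& r) { return r.label == label && r.other == other; }),
                v.end());
    }

    std::map<std::string, std::vector<Relationship>> links_;
};

// Small usage check (hypothetical name).
inline void relationship_example()
{
    Cast cast;
    RelationshipType husband{"husband of", true, "wife of"};
    RelationshipType lover{"lover of", false, ""};
    cast.add("Chaz", "Kim", husband);
    cast.add("Art", "Jennifer", lover);
    assert(cast.of("Kim")[0].label == "wife of");   // created automatically
    assert(cast.of("Jennifer").empty());            // she is stringing him along
}
```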

This would entail updating the Edit Relationship dialog to add a checkbox for Reciprocal or not. And also a "polarity" flag to select between "husband of" and "wife of" when displaying the relationship.

This also has an impact on the Relationship drop-down. I don't think free text in the combobox works any more. Instead it has to enumerate all the known relationships plus one called "other". Selecting "other" would bring up a dialog where one defines:
  • reciprocal or not,
  • relationship description (e.g. husband of)
  • reciprocal description (e.g. wife of).
A good thing from the users' perspective is that the drop-down would change from being 27 choices to about half that. And the user would see explicitly what to do when s/he wants a relationship that's not shown on the list.

While writing this, it occurred to me that the WriteItNow folks would also do well to refactor their Date object. Sometimes dates are important to know exactly, other times exactitude is a distraction or should just be a deferred decision. This argues for "fuzziness" in the specification of dates. The date specifier dialog in WriteItNow has some checkboxes for Month, Day, Week Day, and BC. I'm not quite sure how these work and should start searching the documentation. (They govern presentation of the date. Bad form to confuse data entry and presentation. There's no Help button on the date specifier dialog. It's unclear how the red X is different from the Cancel button, either.)

Instead, I think every Date object needs to recognize the distinction between absolute and relative times. Absolute times are points like 2:57 pm GMT June 2, 1934. Relative times are points like 5 years ago. Let's suppose I have in mind that Damien is 13 when the story takes place. If I write that the story is to take place "now," his birthdate can't be specified absolutely. Similarly, I may want to defer decisions about how old he is exactly. I'd prefer to say Damien is between 12 and 15. When I know more, I'll narrow it down.

Same goes for absolute dates. I may want to say that an event takes place "during WW2" but I'm not sure exactly when. I had a story where my hero lost an arm (and a fiancée) during a Nazi bombing attack. When exactly? I didn't know more than between 1939 and 1945. And I didn't need to know until later.

WriteItNow will do a better job if it enables the writer to capture artistic decisions in relative terms with some degree of fuzziness. Thus I think its Date object should be refactored to mind these two distinctions: absolute versus relative, and inexactitude. Relative times are tricky, I'll come back to them momentarily.

Fuzziness can be handled by adopting intervals. Any fuzzy date is really a range between the earliest and the latest times of an event. My hero's injury starts out at [1 Sep 1939 - 2 Sep 1945]. Since my hero is an American I decide he has to be hurt after Pearl Harbor: [7 Dec 1941 - 2 Sep 1945]. So it goes, as I research and refine dates and narrow the range. Exact dates are simply zero-width intervals. In my example, these are all absolute times, but relative times may prove useful, e.g. Damien's age mentioned above.
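The interval idea is simple enough to sketch. Again, this is my own guess at a shape for it, not anything taken from WriteItNow: Date, FuzzyDate, narrow, and fuzzy_date_example are hypothetical names, dates are plain year/month/day triples with no validation, and BC dates are ignored. Narrowing is just the intersection of two ranges.

```cpp
#include <algorithm>
#include <cassert>
#include <tuple>

// Hypothetical absolute date: a plain year/month/day triple, unvalidated.
struct Date
{
    int year, month, day;
};

inline bool operator<(const Date& a, const Date& b)
{
    return std::tie(a.year, a.month, a.day) < std::tie(b.year, b.month, b.day);
}

inline bool operator==(const Date& a, const Date& b)
{
    return std::tie(a.year, a.month, a.day) == std::tie(b.year, b.month, b.day);
}

// A fuzzy date: the event happens somewhere in [earliest, latest].
struct FuzzyDate
{
    Date earliest;
    Date latest;

    bool exact() const { return earliest == latest; }    // zero-width interval
    bool valid() const { return !(latest < earliest); }  // false if empty
};

// Narrow a fuzzy date by intersecting it with a new constraint.
// If the two ranges do not overlap, the result has valid() == false.
inline FuzzyDate narrow(const FuzzyDate& a, const FuzzyDate& b)
{
    return FuzzyDate{std::max(a.earliest, b.earliest), std::min(a.latest, b.latest)};
}

// Small usage check (hypothetical name): the hero's injury, narrowed
// from "during WW2" to "after Pearl Harbor".
inline void fuzzy_date_example()
{
    FuzzyDate ww2{{1939, 9, 1}, {1945, 9, 2}};
    FuzzyDate after_pearl_harbor{{1941, 12, 7}, {9999, 12, 31}};
    FuzzyDate injury = narrow(ww2, after_pearl_harbor);
    assert(injury.earliest == (Date{1941, 12, 7}));
    assert(injury.latest == (Date{1945, 9, 2}));
}
```

An invalid result from narrow() is useful in its own right: it is the program's way of telling the writer that two design decisions contradict each other.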

Relative times are tricky because they are "relative to what." WriteItNow enables the writer to define events. Events seem to be the best candidate for providing reference points for relative dates. What makes them tricky is that you can have multiple narrative tracks in a story. Neal Stephenson's Cryptonomicon tracked WW2 and contemporary events in parallel, for example.

In my current story, I figure Chaz will be found dead in a locked room. (Someone named Chaz should always be found dead at some time in the story.) I figure he'll be married to Kim about 3 years before that, and she'll file for divorce about a month before that. Meanwhile, Art and Damien are establishing a love-hate relationship as tutor and student. I don't know exactly how those two narrative threads will merge. I don't want to decide that yet. I'll want to specify all Kim and Chaz events relative to each other. And I'll want to specify all Damien, Art and Jennifer events relative to themselves.

Where it gets tricky is when I try to harmonize the multiple narrative flows. I'll need to float all the dates in one narrative to let me slide it forward and back to match up with the other. And I need to check for contradictions, like Chaz getting killed before he can insult Art at a dinner party. Thus the Date object must be made fuzzy to some extent, and must be either absolute or relative. Similarly, when events are "linked" (I haven't gotten to that part of WriteItNow yet), some kind of consistency check between them will be needed.
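Here is one rough way the sliding and the contradiction check might look, assuming each event is stored as a fuzzy range of days counted from some arbitrary origin. All of it is hypothetical (DayRange, relative_to, slide, contradicts_order, narrative_example), and the day counts for "about 3 years" and the dinner party are numbers I made up for the example.

```cpp
#include <cassert>

// Days measured from an arbitrary origin; [lo, hi] is the fuzzy range.
struct DayRange
{
    long lo;
    long hi;
};

// Place an event given as an offset range relative to an anchor event.
inline DayRange relative_to(const DayRange& anchor, const DayRange& offset)
{
    return DayRange{anchor.lo + offset.lo, anchor.hi + offset.hi};
}

// Slide an event of one narrative thread forward (positive) or back (negative).
inline DayRange slide(const DayRange& r, long days)
{
    return DayRange{r.lo + days, r.hi + days};
}

// True when `first` cannot possibly happen before `second`: even its
// earliest day falls on or after the latest day of `second`.
inline bool contradicts_order(const DayRange& first, const DayRange& second)
{
    return first.lo >= second.hi;
}

// Small usage check (hypothetical name and made-up day counts).
inline void narrative_example()
{
    DayRange death{0, 0};                                        // Chaz is found dead
    DayRange wedding = relative_to(death, DayRange{-1200, -1000}); // about 3 years before
    DayRange party{-30, -10};                                    // the insult, other thread
    assert(wedding.hi < death.lo);
    assert(!contradicts_order(party, death));                    // insult can precede death
    assert(contradicts_order(slide(party, 60), death));          // slid too far: contradiction
}
```

Sliding a whole thread would mean applying slide() to every event in it and then re-running the ordering checks against the other thread.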

More on that later.