Friday, January 4, 2008

Code Organization

I read Steve McConnell's seminal Code Complete years ago, but there is one piece of advice in it that I still use every day. In the section "Making Code Read from Top to Bottom", he presents a poorly organized code fragment and then reorganizes it:

    /* C Example of Bad Code That Jumps Around */
    InitMarketingData( MarketingData );
    InitMISData( MISData );
    InitAccountingData( AccountingData );

    ComputeQuarterlyMarketingData( MarketingData );
    ComputeQuarterlyMISData( MISData );
    ComputeQuarterlyAccountingData( AccountingData );

    ComputeAnnualMarketingData( MarketingData );
    ComputeAnnualMISData( MISData );
    ComputeAnnualAccountingData( AccountingData );

    PrintMarketingData( MarketingData );
    PrintMISData( MISData );
    PrintAccountingData( AccountingData );

    /* C Example of Good, Sequential Code That Reads from Top to Bottom */
    InitMarketingData( MarketingData );
    ComputeQuarterlyMarketingData( MarketingData );
    ComputeAnnualMarketingData( MarketingData );
    PrintMarketingData( MarketingData );

    InitMISData( MISData );
    ComputeQuarterlyMISData( MISData );
    ComputeAnnualMISData( MISData );
    PrintMISData( MISData );

    InitAccountingData( AccountingData );
    ComputeQuarterlyAccountingData( AccountingData );
    ComputeAnnualAccountingData( AccountingData );
    PrintAccountingData( AccountingData );

McConnell explains why the refactored layout is superior:

This code is better in several ways. References to each variable are "localized": they're kept close together. Values are assigned to variables close to where they're used. The number of lines of code in which the variables are "live" is small. And perhaps most important, the code now looks as if it could be broken into separate routines for marketing, MIS, and accounting data. The first code fragment gave no hint that such a decomposition was possible.
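To make that last point concrete, here is a rough sketch of what the decomposition into separate routines might look like. The book fragment never shows its data types, so the ReportData struct, the helper functions, and the Report* routine names are all my own assumptions, not McConnell's code.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical record: the quoted fragment never shows its data types,
   so this struct and the helpers below are assumptions for illustration. */
typedef struct {
    long quarterly[4];  /* one figure per quarter */
    long annual;        /* sum of the four quarters */
} ReportData;

void InitData(ReportData *data, const long quarters[4]) {
    assert(data != NULL && quarters != NULL);
    for (int i = 0; i < 4; i++) {
        data->quarterly[i] = quarters[i];
    }
    data->annual = 0;
}

void ComputeAnnualData(ReportData *data) {
    assert(data != NULL);
    data->annual = 0;
    for (int i = 0; i < 4; i++) {
        data->annual += data->quarterly[i];
    }
}

void PrintData(const char *label, const ReportData *data) {
    assert(label != NULL && data != NULL);
    printf("%s: annual total %ld\n", label, data->annual);
}

/* One routine per data set: each variable is declared, filled in,
   used, and discarded inside a single short routine. */
long ReportMarketing(const long quarters[4]) {
    ReportData marketingData;
    InitData(&marketingData, quarters);
    ComputeAnnualData(&marketingData);
    PrintData("Marketing", &marketingData);
    return marketingData.annual;
}

long ReportMIS(const long quarters[4]) {
    ReportData misData;
    InitData(&misData, quarters);
    ComputeAnnualData(&misData);
    PrintData("MIS", &misData);
    return misData.annual;
}

long ReportAccounting(const long quarters[4]) {
    ReportData accountingData;
    InitData(&accountingData, quarters);
    ComputeAnnualData(&accountingData);
    PrintData("Accounting", &accountingData);
    return accountingData.annual;
}
```

A caller then shrinks to three calls, one per data set, and no variable outlives the routine that needs it.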

The idea is to structure your code so that statements are grouped by the data they operate on, not by the operations they perform. I'm often tempted to organize code so that grouped statements "line up" as in the first example. I may have particular trouble with this because a lot of my early coding was done in Pascal, whose syntax enforces this poor layout. Variables must all be declared in one place, at the start of a method. In class definitions, fields must come before properties and methods, even though it is clearly better to group related private fields, properties, and getter/setter methods together.
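The same contrast shows up with local variables. Here is a small sketch in C (C99 or later, which allows declarations in the middle of a block); both function names are made up for this example. The first version mimics the Pascal habit of declaring everything up front, and the second declares each variable where it is first needed.

```c
#include <assert.h>
#include <stddef.h>

/* Pascal-style layout: every local is declared at the top,
   far from the statements that actually use it. */
double MeanTopDeclared(const double *values, size_t count) {
    size_t i;
    double sum;
    double mean;

    assert(values != NULL && count > 0);
    sum = 0.0;
    for (i = 0; i < count; i++) {
        sum += values[i];
    }
    mean = sum / (double)count;
    return mean;
}

/* Declared near use: each local appears where it is first needed,
   so the span of lines in which it is "live" stays short. */
double MeanDeclaredNearUse(const double *values, size_t count) {
    assert(values != NULL && count > 0);

    double sum = 0.0;
    for (size_t i = 0; i < count; i++) {
        sum += values[i];
    }
    return sum / (double)count;
}
```

Both versions compute the same result; only the distance between each declaration and its use changes.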

At any rate, this idea can be extended beyond code layout. Just one example: the Rails directory structure is organized by file kind. All the models go in one directory, all the controllers go in another, the tests in another, etc. This is analogous to grouping statements so they "line up", and is exactly backwards. The file groups that belong together logically are those which all relate to the same concept. Folders in Rails ought to be organized by model name (or controller name, for model-less controllers). One folder should hold the model, its controller, tests, and views. This is the important relationship between files: what data they operate on, not what type of file they are.

Because developing in Rails forces you to frequently jump between related files in different directories, every Rails editor or IDE has shortcuts to help you do it. Essentially these editors are plastering the proper logical organization on top of the Rails folder structure, which alleviates the problem. The uniform structure imposed by the fixed folder layout in Rails is a stroke of genius, but having to use a nonstandard, editor-specific abstraction layer on top of it for actual development limits its awesomeness somewhat.

1 comment:

Anonymous said...

Agreed! I have found this to be the case in my experience as well.