o In the generated makefile, explicitly list which .d files to load
  in a desperate attempt to reduce the time make takes to start up.
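
  A minimal sketch of the idea, in Python for illustration only: derive
  the .d names from the list of object files the generator already knows
  about and emit one explicit include line. The function name and file
  names are hypothetical; "-include" is the GNU make form of include
  that does not complain about missing files.

```python
def dep_include_line(objects):
    """Return a makefile line that loads exactly the .d files for `objects`."""
    # Only .o files have a matching .d file; anything else is skipped.
    deps = [o[:-2] + ".d" for o in objects if o.endswith(".o")]
    if not deps:
        return ""
    return "-include " + " ".join(deps)


print(dep_include_line(["blkio.o", "main.o"]))
```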

o Replace use of my homegrown Monads library with the Hugs-GHC Monad
  libraries.

o Can we split foo.a into multiple .a files to get better parallelism,
  less unnecessary rebuilding and avoid quadratic behaviour from rebuilding
  symbol tables?  Need to use --start-group ... --end-group feature from
  GNU ld to preserve semantics.

  What is the tradeoff between time spent in ld and time spent in ar?
  This quote from the ld man page suggests the cost of grouping could be
  significant but, since we currently spend much more time in ar than
  in ld, it might be worthwhile.

     Using this option has a significant performance cost.  It is best
     to use it only when there are unavoidable circular references
     between two or more archives.

  A quick test suggests that it's a win:

    Single .a file: 44s (real) 7s (user) 7s (sys)
    Ten .a files:   27s (real) 6s (user) 3s (sys)

  Of which 1.5s (user) is make startup time, 2.5s (user) is link time
  with little appreciable difference between link times in either case
  (being generous, it might be an extra 1 second for multiple .a files).

  This excludes the benefit from only having to rebuild some of the .a
  files when things change.
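
  A rough sketch of what the split might look like, in Python for
  illustration only. The archive names (foo0.a, ...), the round-robin
  partition and the object file names are all assumptions; a real split
  would probably group objects that change together, and a real link
  line has more arguments than shown. When linking through gcc the two
  options would be passed as -Wl,--start-group and -Wl,--end-group.

```python
def partition(objects, n):
    """Split `objects` round-robin into n lists, one per archive."""
    return [objects[i::n] for i in range(n)]


def link_args(output, archives):
    """Build an ld argument list that resolves symbols across all archives."""
    return ["ld", "-o", output, "--start-group", *archives, "--end-group"]


objects = [f"m{i}.o" for i in range(25)]  # hypothetical object files
groups = partition(objects, 10)
archives = [f"foo{i}.a" for i in range(len(groups))]
print(" ".join(link_args("kernel", archives)))
```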

o When we get round to machine generating wrapper code, use declarations
  like this to declare the types of the wrapped and unwrapped versions.

    extern typeof(vprintf) in_vprintf;
           typeof(vprintf) out_vprintf;

o Should omit virtual initialisers/finalisers from Binding'
  and do a minimal sanity check that requirements and provisions are
  disjoint.
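
  The sanity check could be as small as a set intersection. A sketch in
  Python, assuming requirements and provisions are available as
  collections of names; the function name and the example names are
  hypothetical.

```python
def check_disjoint(requirements, provisions):
    """Raise ValueError if any name is both required and provided."""
    overlap = set(requirements) & set(provisions)
    if overlap:
        names = ", ".join(sorted(overlap))
        raise ValueError("both required and provided: " + names)


check_disjoint({"malloc", "free"}, {"blkio_read"})  # passes silently
```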

o In renaming, maybe extend suffix/prefix syntax to allow

  rename exports-foo with suffix _bar

o (When flattening) Don't mark functions with "inline" if they are
  used more than once and their size exceeds some threshold.

  Note that use of a threshold breaks down if a calls b, a is small,
  b is large and a is called many times.
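
  One reading of the failure case, sketched in Python with made-up
  sizes. It assumes b is called only from a, so neither function trips
  the "used more than once and large" test, yet every inlined copy of a
  carries a copy of b. The threshold value and size units are
  hypothetical.

```python
THRESHOLD = 50  # hypothetical size limit, in arbitrary units


def mark_inline(size, uses, threshold=THRESHOLD):
    """Rule from the note: skip "inline" only if used more than once AND large."""
    return not (uses > 1 and size > threshold)


# a is small and called 100 times; b is large and called only from a.
a_size, a_uses = 10, 100
b_size, b_uses = 500, 1

assert mark_inline(a_size, a_uses)  # small, so still inlined everywhere
assert mark_inline(b_size, b_uses)  # used once, so inlined into a

# Every inlined copy of a now carries a copy of b.
flattened = a_uses * (a_size + b_size)
print(flattened)
```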

o Ponder how to get type information into signatures.
  Choices seem to be:
  o don't - but try to get linker to check for type errors
  o signatures can optionally specify .h files (and cppflags)
    containing the relevant information

  o When looking at specification and related activities, we might also
    allow specifications on signatures.  (Although the design by contract
    folk seem to be moving away from that - don't understand why but
    should find out.  (Something to do with not separating code from
    spec??))

o Make sure that gcc's memcpy optimisation doesn't get lost by flattening.

  Sort of done - but may want to undo it (and find a better way)
  because it means some names are hardwired into CpU.

o __udivdi3 and .mcount are both introduced during compilation so flattening
  doesn't affect them.

  gcc's memcpy optimisation also requires special hacks to preserve.

  One fix would be to let us declare some "global" symbols which are not
  subject to renaming.  A variation (which I think won't work) is to
  allow me to vary the globals at various levels of the tree (cf
  implicit parameters).  Another variation is to allow globals to be 
  overridden but not hidden - so all units agree on what the globals
  are but different units may bind a global to different things.
  All very scary.  

o Should we replace (some of) the top-level unit magic with:

    unit TheOutsideWorld = {
      imports{ /* could put something here */ };
      exports{ ld_symbols : { _start, _end },
               /* etc */
             };
      external;
    }

  and eliminate imports and exports from top level units?
  And come up with better syntax for creating initialisers.

o Allow flatten/noflatten annotations on individual files.

o Find a way to make gdb and gprof work with CpU.
  Part of this is encoding variable names the way C++ does:

    A::BC::D::e  -> Q31A2BC1De
                    ^^ # of qualifiers
                      ^ ^  ^ length of each qualifier
 
  [More careful inspection of gcc generated code reveals:

    foo::foo2::bar -> _Q23foo4foo2$bar
    foo::bar       -> _3foo$bar
    a::b::c::d::e::f::g::h::i::j::k::l::bar -> _Q_12_1a1b1c1d1e1f1g1h1i1j1k1l$bar
  ]

  When setting breakpoints, we probably want to set a breakpoint on all
  instances of the same code.  Sadly, gdb doesn't even understand that
  different instances of an inlined function need breakpoints on them so
  little hope for CpU support from that quarter.

  May need to modify rename_dot_o_files so that it renames static variables 
  too so that we have a consistent naming scheme.

  Also need to mangle the .stabstr section - looks painful (and documentation
  on stabs is woefully out of date).
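
  The encoding observed in the gcc output above can be sketched as
  follows (Python, for illustration only). The rules are inferred from
  the three examples alone, in particular the _N_ form for ten or more
  qualifiers, so treat this as an assumption rather than a description
  of gcc's actual mangler. The function name is hypothetical.

```python
def mangle(qualified):
    """Encode a name like 'foo::foo2::bar' as '_Q23foo4foo2$bar'."""
    *quals, name = qualified.split("::")
    if not quals:
        return name  # unqualified names are left alone
    # Each qualifier is written as its length followed by its text.
    parts = "".join(f"{len(q)}{q}" for q in quals)
    n = len(quals)
    if n == 1:
        prefix = ""          # a single qualifier gets no Q marker
    elif n < 10:
        prefix = f"Q{n}"     # single-digit count follows Q directly
    else:
        prefix = f"Q_{n}_"   # larger counts are delimited by underscores
    return f"_{prefix}{parts}${name}"


print(mangle("foo::foo2::bar"))
```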

o LBS syntax changes - need clear definition

o change defaults to match by name instead of type (not sure about this)

  [Note that glue modules have to match by type since there is no name
   so we probably _do not_ want to make this change.]

o stubs:

   rhs ::= stub[signature];

o How to handle dependencies which vary from one compiler to another
  (eg assert.h), with different levels of optimisation or #defines,
  or with architecture?

o How to structure freebsd_lib?

o checker which tests unit decls against the actual code to
  reduce version skew
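
  A possible shape for such a checker, sketched in Python. It assumes
  the declared imports and exports and the symbols actually defined or
  left undefined by the code are already available as collections of
  names (for instance scraped from nm output); all names here are
  hypothetical.

```python
def version_skew(declared_exports, declared_imports, defined, undefined):
    """Compare a unit declaration with the symbols found in the code.

    Returns a pair of sorted lists: exports the code does not define,
    and undefined symbols the declaration does not import.
    """
    missing = set(declared_exports) - set(defined)
    undeclared = set(undefined) - set(declared_imports)
    return sorted(missing), sorted(undeclared)


missing, undeclared = version_skew(
    declared_exports={"blkio_read", "blkio_write"},
    declared_imports={"malloc"},
    defined={"blkio_read"},
    undefined={"malloc", "memcpy"},
)
print(missing, undeclared)
```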

o Linker hacking/ C mangling to provide thread-local (or component-local) 
  storage?

o Ponder relationship between FRP and bootstrapping.

o Don't pursue optimal building of foo.a any further

  Performance is quite reasonable if you build on a local disk.
  I don't think that building on a networked disk should be a priority.

o Don't pursue caching any further (without first addressing other
  bottlenecks).

  Existing implementation gives modest gains (and doesn't handle 
  dependencies as well as it should).
  
  Testing Blkio, excluding time to run CpU:
  Empty cache:                      1m11.969s
  All files in cache, no .o files:  0m27.358s
  All files in cache, .o files too: 0m00.769s
  
  Full run of icbinf:
  Empty cache:           7m44.480s
  Full cache:            6m22.259s
  Full cache + .o files: 2m11.248s
  
  Conclusions: 
  1) Large gains on small run probably due to caching effects
  2) Small gains on large run not worth additional complexity
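
  The arithmetic behind the two conclusions, as a small Python sketch.
  It compares the empty-cache time with the cached time (no .o files)
  for each run; the helper names are hypothetical. The saving works out
  to roughly 62% for Blkio and roughly 18% for icbinf.

```python
import re


def seconds(t):
    """Convert a time such as '1m11.969s' to seconds."""
    m = re.fullmatch(r"(\d+)m([\d.]+)s", t)
    return int(m.group(1)) * 60 + float(m.group(2))


def saving(before, after):
    """Fraction of the `before` time saved by the `after` run."""
    return 1 - seconds(after) / seconds(before)


blkio = saving("1m11.969s", "0m27.358s")
icbinf = saving("7m44.480s", "6m22.259s")
print(f"Blkio:  {blkio:.0%}")
print(f"icbinf: {icbinf:.0%}")
```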
  
