Persistency and APIs
    Updated: 2004-02-11
Created: 2004-01-24
    This document is an incomplete draft.
    
      
    
    
      
      There are at least two completely different and often incompatible
	goals in selling a library:
      
	- Sell the library as such to some people with considerable
	  software experience who regard it as a component and who need it
	  only as one toolkit in an application they have already designed
	  and are developing using their own favoured toolkits.
- In this case the maximum chance of selling derives from the
	  intrinsic quality of the library and whether and how well it fits
	  or can be made to fit with whatever other toolkits the customer
	  has already chosen.
- Sell the library to some people who need to use it as the
	  core of an application that they are writing from scratch
	  without great software skills.
- Better chances of selling come from offering a ready made
	  selection of integrated toolkits that come with the library and
	  the client can easily customize in some high level way.
The two goals are pretty often incompatible because facilitating
	integration into an arbitrary collection of toolkits is not needed
	for, and is quite different from, facilitating easy customization
	of a well chosen, static collection of toolkits.
      However, in general at least some of the tools used in the two
	situations are the same.
     
    
      
      
	TODO: Mention hidden state problem, initial load, final store,
	partial/full persistence, call limitations wrt speed and data
	types.
     
    
      
      
	TODO: mention debuggers, interpreters, domains
      Dealing with persistency and API integration requires a few very
	high level and rarely used concepts which go by many
	different names:
      
	- metadata
- Metadata here is data that describes the structure and properties
	  of other data rather than its content. For example, the list of
	  fields in a record, or the list of parameters and the body of a
	  function.
- functionals
- These are second-order functions, that is, functions that operate
	  on metadata (such as a function) rather than on data.
- reflection
- The ability of a program to operate on itself.
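
As an illustration of what metadata can look like in C, here is a minimal sketch of a hand-written field table for a record. The names field_meta, point, FIELD and find_field are hypothetical, invented for this example, and the three field kinds are far fewer than a real system would need.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical type tags for the fields we can describe. */
enum field_kind { FIELD_INT, FIELD_DOUBLE, FIELD_CHARS };

/* One entry of metadata: describes a single field of a record. */
struct field_meta {
    const char     *name;    /* field name as written in the source */
    enum field_kind kind;    /* what kind of value it holds */
    size_t          offset;  /* byte offset inside the record */
    size_t          size;    /* size of the field in bytes */
};

/* An ordinary record ... */
struct point {
    int    id;
    double x;
    double y;
    char   label[16];
};

/* ... and its metadata: the list of its fields. The macro merely
   saves typing; offsetof and sizeof are evaluated at compile time. */
#define FIELD(type, member, kind) \
    { #member, kind, offsetof(type, member), sizeof(((type *)0)->member) }

static const struct field_meta point_fields[] = {
    FIELD(struct point, id,    FIELD_INT),
    FIELD(struct point, x,     FIELD_DOUBLE),
    FIELD(struct point, y,     FIELD_DOUBLE),
    FIELD(struct point, label, FIELD_CHARS),
};

enum { POINT_FIELD_COUNT = sizeof point_fields / sizeof point_fields[0] };

/* Look a field up by name; returns NULL if there is no such field. */
static const struct field_meta *
find_field(const struct field_meta *fields, size_t count, const char *name)
{
    for (size_t i = 0; i < count; i++)
        if (strcmp(fields[i].name, name) == 0)
            return &fields[i];
    return NULL;
}
```

A function such as find_field takes the table, not a struct point, as its argument, which is what makes it a (very small) functional in the sense used here.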
These concepts are essential for both persistency and
	integration because:
      
	- Persistency depends on the ability to take an arbitrary
	  piece of memory and a type, and to reflect on it to convert
	  its content to some other format according to its type.
- Integration depends on the ability to redefine the function
	  invocation functional (which in C/C++ is normally implicit,
	  but is there) for functions called from or by other languages,
	  so that argument lists etc. get converted.
Both persistence and integration depend on having metadata that
	describes the types to persist or the functions to integrate, and
	functionals that do the store/load of the data to persist or that
	convert the call frame from one language ABI to another.
      The important choices concern the details of how and when,
	not what.
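
As a rough illustration of a store/load functional driven by metadata, here is a minimal sketch in C. Everything in it is an assumption made for the example: the names field_meta, store_record and load_record, the restriction to int and double fields, and the "name=value" text format. It is not a general solution, only a picture of how one routine can persist any record for which a field table exists.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical field kinds; only two are handled in this sketch. */
enum field_kind { FIELD_INT, FIELD_DOUBLE };

/* Metadata for one field of a record. */
struct field_meta {
    const char     *name;    /* field name */
    enum field_kind kind;    /* type of the field */
    size_t          offset;  /* byte offset inside the record */
};

/* Store: write every described field as a "name=value" line.
   Returns 0 on success, -1 on failure. */
static int store_record(FILE *out, const void *rec,
                        const struct field_meta *fields, size_t count)
{
    const char *base = rec;
    for (size_t i = 0; i < count; i++) {
        int n;
        switch (fields[i].kind) {
        case FIELD_INT: {
            int v;
            memcpy(&v, base + fields[i].offset, sizeof v);
            n = fprintf(out, "%s=%d\n", fields[i].name, v);
            break;
        }
        case FIELD_DOUBLE: {
            double v;
            memcpy(&v, base + fields[i].offset, sizeof v);
            /* 17 significant digits are enough to round-trip a double */
            n = fprintf(out, "%s=%.17g\n", fields[i].name, v);
            break;
        }
        default:
            return -1;
        }
        if (n < 0)
            return -1;
    }
    return 0;
}

/* Load: read the fields back, in the order given by the metadata.
   Returns 0 on success, -1 on a missing or malformed line. */
static int load_record(FILE *in, void *rec,
                       const struct field_meta *fields, size_t count)
{
    char *base = rec;
    char line[128];
    for (size_t i = 0; i < count; i++) {
        if (fgets(line, sizeof line, in) == NULL)
            return -1;
        size_t len = strlen(fields[i].name);
        if (strncmp(line, fields[i].name, len) != 0 || line[len] != '=')
            return -1;
        const char *text = line + len + 1;
        switch (fields[i].kind) {
        case FIELD_INT: {
            int v;
            if (sscanf(text, "%d", &v) != 1)
                return -1;
            memcpy(base + fields[i].offset, &v, sizeof v);
            break;
        }
        case FIELD_DOUBLE: {
            double v;
            if (sscanf(text, "%lf", &v) != 1)
                return -1;
            memcpy(base + fields[i].offset, &v, sizeof v);
            break;
        }
        default:
            return -1;
        }
    }
    return 0;
}

/* A sample record and its hand-written field table. */
struct sample { int count; double ratio; };

static const struct field_meta sample_fields[] = {
    { "count", FIELD_INT,    offsetof(struct sample, count) },
    { "ratio", FIELD_DOUBLE, offsetof(struct sample, ratio) },
};
```

Note that the two routines never mention struct sample: the same pair would serve any other record, provided someone (or some tool) supplies its field table. Where that table comes from, and when, is exactly the kind of choice discussed here.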
     
    
      
      TODO: Mention GCC extensions, SL/5
      
	- How to generate the metadata, and in which format, and when
	  to use it.
	
- What kind of save/restore functional to write, and
	  where.
	
- What kind of call conversion functional to write, and
	  where.
In some languages it's easier than in others; for example in
	Lisp, since programs and data structures have exactly the same
	representation, a program is in effect its own metadata.
      In Java and Objective-C the compiler embeds in compiled code a
	significant amount of metadata; in other languages there are
	built-in primitives to reflect on function calls.
      The main problem is that neither C nor C++ has any easy way
	to generate metadata or write general functionals. Some extended
	versions do, but the extensions are as a rule not portable.
      It is therefore in general very difficult to write general
	purpose save/restore or call conversion functionals for C or
	C++. This means that special purpose ones have to be written,
	and some degree of flexibility has to be lost.
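
To make the trade-off concrete, here is a minimal sketch in C of a special purpose call conversion functional. All the names (call_shape, func_meta, exported, invoke) are hypothetical, as is the assumption that the foreign side passes its arguments as strings. The converter only understands the call shapes enumerated in advance, which is precisely the flexibility that is lost.

```c
#include <stdlib.h>
#include <string.h>

/* The only call shapes this converter knows about (an assumption of
   the sketch; any function with another signature cannot be exported). */
enum call_shape { SHAPE_INT_INT_TO_INT, SHAPE_DOUBLE_TO_DOUBLE };

/* Metadata for one exported function. */
struct func_meta {
    const char     *name;        /* name visible to the foreign caller */
    enum call_shape shape;       /* which known signature it has */
    void          (*fn)(void);   /* generic pointer, cast back before the call */
};

/* Two ordinary C functions to export. */
static int    add(int a, int b) { return a + b; }
static double half(double x)    { return x / 2.0; }

/* Hand-written metadata table. */
static const struct func_meta exported[] = {
    { "add",  SHAPE_INT_INT_TO_INT,   (void (*)(void))add  },
    { "half", SHAPE_DOUBLE_TO_DOUBLE, (void (*)(void))half },
};

/* Call conversion: look the function up by name, convert the string
   arguments according to its shape, call it, and hand the result back
   as a double. Returns 0 on success, -1 on an unknown name or a wrong
   argument count. */
static int invoke(const char *name, int argc, const char *const *argv,
                  double *result)
{
    for (size_t i = 0; i < sizeof exported / sizeof exported[0]; i++) {
        const struct func_meta *m = &exported[i];
        if (strcmp(m->name, name) != 0)
            continue;
        switch (m->shape) {
        case SHAPE_INT_INT_TO_INT:
            if (argc != 2)
                return -1;
            *result = ((int (*)(int, int))m->fn)(atoi(argv[0]), atoi(argv[1]));
            return 0;
        case SHAPE_DOUBLE_TO_DOUBLE:
            if (argc != 1)
                return -1;
            *result = ((double (*)(double))m->fn)(strtod(argv[0], NULL));
            return 0;
        }
        return -1;
    }
    return -1;
}
```

Adding a function with an already known shape costs one table entry; adding a function with a new shape means extending both the enumeration and the switch by hand.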
      The loss of flexibility can involve several different
	alternatives.
     
    
      
      
	
	The big problem with metadata extraction is that to extract
	  truly accurate metadata one needs full parsing of the source,
	  with exactly the same processing done by the compiler.
	Ideally therefore this would be done by the compiler, but if
	  the compiler does not do it, and can't be modified, that's
	  just not an option.
	Using any other tool will to some extent produce inaccurate
	  metadata; the issue is how often and how inaccurate.
	
	  - The metadata is generated by a separate tool
- The metadata describing a program's data structures or
	    functions can be generated by another tool than the compiler.
	    This can be a preprocessor or a postprocessor, for example:
	    
	      - A tool that scans the debugging information generated by
		the compiler, since a source oriented debugger is a fully
		reflective program that needs extensive metadata.
		
 After all the compiler usually generates fairly
		complete and accurate metadata in the form of debugger
		information, and this may be backprocessed into source
		form.
- 
		A version of GCC that
		converts the program into a tree represented in XML
		(special thanks to Marek for pointing it out).
		
 The problem with this approach is that it will
		generate metadata that is accurate only with regards to
		a binary compiled by GCC, and on some platforms that
		just is not a viable option.
- A header file scanner that extracts function
		declarations (e.g.  proto and
		unproto).
 
- The metadata is generated manually
- 
	    This requires writing by hand a description of the data
	    structures and functions in an API. This is often done before
	    the fact, for example for RPC oriented programs.
	    
 There are several API description languages,
	    for example related to
	    ILU,
	    SWIG,
	    or
	    DCOM.
- The metadata is generated in part manually, in part
	    by a preprocessor.
- This usually involves adding some manual tags to the
	    definitions or declarations of types and functions. These tags
	    are either used by a special purpose preprocessor or by a