          Conventions and Design in the FreeType library


TOC

  Introduction

  I. Style and Formatting

    1. Naming
    2. Declarations & Statements
    3. Blocks
    4. Macros

  II. Design conventions

    1. Modularity and Components Layout
    2. Configuration and Debugging

  III. Usage conventions

    1. Error handling
    2. Font File I/O
    3. Memory management (due to change soon).
    4. Support for threaded environments.
    5. Object Management


Introduction:

  This text introduces the many conventions used within the FreeType
  library. Please read it before trying any modifications or extensions
  of the source code.


I. Style and Formatting:

  The following coding rules are extremely important to keep the
  library's source code homogeneous. Keep in mind the following points:

  - "Humans read source code, not machines" (Donald Knuth)

    The library source code should be as readable as possible, even by
    non C experts. By readable, two things are meant: first, the source
    code should be pleasant to the eye, with sufficient whitespace and
    newlines, to not look like a boring stack of characters stuck to
    each other. Second, the source should be _expressive_ enough about
    its goals. This convention contains rules that can help the source
    focus on its purpose, not on a particular implementation.

  - "Paper is the _ultimate_ debugger" (Myself)

    There is nothing like sheets of paper (and a large floor) to help
    you understand the design of a library you're new to, or to debug
    it. The formatting style presented here is targeted at printing.
    For example, it is more than highly recommended to never produce a
    source line that is wider than 78 columns. More on this below.

  1. Naming:

  a. Components:

  A unit of the library is called a 'component'. Each component has at
  least an interface, and often a body. The library comes in two
  language flavors, C and Pascal. A C component is defined by two
  files, one '.h' header and one '.c' body, while a Pascal component is
  contained in a single '.pas' file.
  All component source file names begin with the 'tt' prefix, with the
  exception of the 'FreeType' component. For example, the file
  component is implemented by the files 'ttfile.h', 'ttfile.c' and
  'ttfile.pas'. Only lowercase letters should be used, following the
  8+3 naming convention to allow compilation under DOS.

  In the C version, a single component can have multiple bodies. For
  example, 'ttfile.c' provides stream i/o through standard ANSI libc
  calls, while 'ttfile2.c' implements the same thing using a Unix
  memory-mapping API.

  The FreeType component is an interface-only component.

  b. Long and expressive labels:

  Never hesitate to use long labels for your types, variables, etc.!
  Except maybe for very trivial types, the longer the better, as it
  increases the source's _expressiveness_. Never forget that the role
  of a label is to express the 'function' of the entity it represents,
  not its implementation!

  NOTE: Hungarian notation is NOT expressive, as it sticks the 'type'
        of a variable to its name. A label like 'usFoo' rarely tells
        the use of the variable it represents. Encoding the storage
        class of a variable (global, static, dynamic) in its name is
        not helpful either.

        Avoid Hungarian Notation like the *plague*!

  When forging a name from several nouns (e.g. "number-of-points"),
  use an uppercase letter to start each word (the first word of a
  variable name stays lowercase), like:

    numberOfPoints

  You are also welcome to introduce underscores '_' in your labels,
  especially when sticking large nouns together, as it 'airs' the code
  greatly. E.g.:

    'numberOfPoints'      or  'number_Of_Points'

    'IncredibleFunction'  or  'Incredible_Function'

  And finally, always put a capital letter after an underscore, except
  in variable labels that are all lowercase:

    'number_of_points' is OK for a variable (_all_ lowercase label)

    'incredible_function' is NOT for a function!
     ^          ^

    'Microsoft_windows' is a *shame*!
     ^         ^

    'Microsoft_Windows' isn't really better, but at least it's a
     ^         ^
    correct function label within this convention ;-)

  c.
  Types:

  All types that are defined for use by FreeType client applications
  are defined in the FreeType component. All types defined there have
  a label beginning with 'TT_'. For example:

    TT_Face, TT_F26Dot6, etc.

  However, the library uses many more internal types, which are
  defined in the Types, Tables, and Objs components ('tttypes' &
  'tttables' files).

  By convention, all internal types, except the simplest ones like
  integers, have their name beginning with a capital 'T', like in
  'TFoo'. Note that the first letter of 'foo' is also capitalized. The
  corresponding pointer type uses a capital 'P' instead, i.e. (TFoo*)
  is simply named 'PFoo'. Examples:

    typedef struct  _TTableDir
    {
      TT_Fixed  version;        /* should be 0x10000              */
      UShort    numTables;      /* Tables number                  */

      UShort    searchRange;    /* These parameters are only used */
      UShort    entrySelector;  /* for a dichotomy search in the  */
      UShort    rangeShift;     /* directory. We ignore them.     */
    } TTableDir;

    typedef TTableDir*  PTableDir;

  Note that we _always_ define a typedef for structures. The original
  struct label starts with '_T'.

  This convention is a famous one from the Pascal world.

  Use the native C or Pascal types as little as possible! Rely on the
  internally defined equivalent types instead. For example, not all
  compilers agree on the sign of 'char', the size of 'int' is
  platform-specific, etc.

  There are equivalents to the most common types in the Types
  component, like 'Short', 'UShort', etc. Using the internal types
  guarantees that you won't need to replace every occurrence of 'short'
  or whatever when compiling on a weird platform or with a weird
  compiler, and there are many more of those than you might think...

  d. Functions:

  The name of a function should always begin with a capital letter, as
  lowercase first letters are reserved for variables. The name of a
  function should be, again, _expressive_! Never hesitate to put long
  function names in your code: it will make the code much more
  readable.
  Expressive doesn't necessarily imply long though; for instance,
  reading integers from the file stream is performed using the
  following functions defined in the File component:

    Get_Byte, Get_Short, Get_UShort, Get_Long, etc.

  which is somewhat more readable than:

    cget, sget, usget, lget, etc.

  e. Variables:

  Variable names should always begin with a lowercase letter.
  Lowercase first letters are reserved for variables in this
  convention, as already explained above. You're still welcome to use
  long and expressive variable names.

  Something like 'numP' can express a number of pixels, porks,
  pancakes, and much more... Something like 'num_points' won't.

  Today, we're still using short variable labels in some parts of the
  library. We're working on removing them, however...

  As a side note, a field name is a variable name too. There are
  exceptions to the first-lowercase-letter rule, but these only
  concern fields within structures defined by the TrueType
  specification (well, at least it _should_ be that way).

  2. Declarations & Statements:

  a. Columning:

  Try to align declarations and assignments in columns, when it proves
  logical.
  For example (taken from ttraster.c):

    struct  _TProfile
    {
      Int        flow;         /* Profile orientation: Asc/Descending      */
      Int        height;       /* profile's height in scanlines            */
      Int        start;        /* profile's start scanline                 */
      ULong      offset;       /* offset of profile's data in render pool  */
      PProfile   link;         /* link to next profile                     */
      Int        index;        /* index of profile's entry in trace table  */
      Int        count_lines;  /* count of lines having to be drawn        */
      Int        start_line;   /* lines to be rendered before this profile */
      PTraceRec  trace;        /* pointer to profile's current trace table */
    };

  instead of

    struct _TProfile {
      Int flow; /* Profile orientation: Asc/Descending */
      Int height; /* profile's height in scanlines */
      Int start; /* profile's start scanline */
      ULong offset; /* offset of profile's data in render pool */
      PProfile link; /* link to next profile */
      Int index; /* index of profile's entry in trace table */
      Int count_lines; /* count of lines having to be drawn */
      Int start_line; /* lines to be rendered before this profile */
      PTraceRec trace; /* pointer to profile's current trace table */
    };

  This comes from the fact that you're more interested in the field
  and its function than in its type.

  Or:

    x   = i + 1;
    y  += j;
    min = 100;

  instead of

    x=i+1;
    y+=j;
    min=100;

  And don't hesitate to separate blocks of declarations with newlines
  to "distinguish" logical sections.

  E.g., taken from an old source file, in the declarations of the CMap
  loader:

    long             n, num_SH;
    unsigned short   u;
    long             off;
    unsigned short   l;
    long             num_Seg;
    unsigned short*  glArray;
    long             table_start;
    int              limit, i;

    TCMapDir         cmap_dir;
    TCMapDirEntry    entry_;
    PCMapTable       Plcmt;
    PCMap2SubHeader  Plcmsub;
    PCMap4           Plcm4;
    PCMap4Segment    segments;

  instead of

    long n, num_SH;
    unsigned short u;
    long off;
    unsigned short l;
    long num_Seg;
    unsigned short *glArray;
    long table_start;
    int limit, i;
    TCMapDir cmap_dir;
    TCMapDirEntry entry_;
    PCMapTable Plcmt;
    PCMap2SubHeader Plcmsub;
    PCMap4 Plcm4;
    PCMap4Segment segments;

  b.
  Aliases and the 'with' clause:

  The Pascal language comes with a very handy 'with' clause that is
  often used when dealing with several fields of the same record. The
  following Pascal source extract

    with table[incredibly_long_index] do
    begin
      x := some_x;
      y := some_y;
      z := whatever_the_hell;
    end;

  is usually translated into C as:

    table[incredibly_long_index].x = some_x;
    table[incredibly_long_index].y = some_y;
    table[incredibly_long_index].z = whatever_the_hell;

  When a lot of fields are involved, it is usually helpful to define
  an 'alias' for the record, like in:

    alias = table + incredibly_long_index;

    alias->x = some_x;
    alias->y = some_y;
    alias->z = whatever_the_hell;

  which gives clearer source code, and eases the compiler's
  optimization work.

  Though the use of aliases is not yet standardized in the library
  source, it is useful to follow these rules:

  - Avoid an alias with a stupid or cryptic name, something like:

      PFooRecord  tfr;

      .... [lots of lines snipped]

      tfr = weird_table + weird_index;

      ...

      tfr->num = n;

    It is hard to guess what 'tfr' stands for several lines after its
    declaration, even if it is an extreme contraction of one
    particular type.

    Something like 'cur_record' or 'alias_cmap' is better. The current
    source also uses a prefix of 'Pl' for such aliases (as in Pointer
    to Local alias), but this use is _not_ encouraged. If you want to
    use prefixes, use 'loc_', 'cur_' or 'al_' at the very least, with
    a descriptive name following.

  - Or simply use a local variable with a semi-expressive name:

      {
        PHorizontalHeader  hheader;
        PVerticalHeader    vheader;

        hheader = &instance->fontRes->horizontalHeader;
        vheader = &instance->fontRes->verticalHeader;

        hheader->foo = bar;
        vheader->foo = bar2;
        ...
      }

    which is much better than:

      {
        PHorizontalHeader  Plhhead;
        PVerticalHeader    Plvhead;

        Plhhead = &instance->fontRes->horizontalHeader;
        Plvhead = &instance->fontRes->verticalHeader;

        Plhhead->foo = bar;
        Plvhead->foo = bar2;
        ...
      }

  3. Blocks:

  Block separation is done with '{' and '}'.
  We do not use the K&R convention, which only becomes useful with an
  extensive use of tabs. The '{' and its corresponding '}' should
  always be in the same column. This makes it easier to separate a
  block from the rest of the source, and it helps your _brain_
  associate the braces easily (ask any Lisp programmer on the topic!).

  Use 2 spaces for the next indentation level.

  Never use tabs in your code; their widths may vary with editors and
  systems.

  Example:

    if (condition_test) {
            waow mamma;
            I'm doing K&R format;
            just like the Linux kernel;
    } else {
            This test failed poorly;
    }

  is _OUT_!

    if (condition_test)
    {
      This code isn't stuck to the condition;
      read it on paper, you'll find it more;
      pleasant to the eye;
    }
    else
    {
      Of course, this is a matter of taste;
      That's just the way it is in this convention;
      and you should follow it to be homogeneous with;
      the rest of the FreeType code;
    }

  is _IN_!

  4. Macros:

  Macros should be made of uppercase letters. When a macro label is
  forged from several words, it is possible to uppercase only the
  first word, using an underscore to separate the nouns. This is used
  in ttload.c, ttgload.c and ttfile.c with macros like:

    ACCESS_Frame, GET_UShort, CUR_Stream

  The role of the macros used throughout the engine is explained later
  in this document.


II. Design Conventions:

  1. Modularity and Components Layout:

  The FreeType engine has been designed with portability in mind. This
  implies the ability to compile and run it on a great variety of
  systems and weird environments, unlike many packages where the word
  strictly means 'runs on a bunch of Unix-like systems'. We have thus
  decided to stick to the following restrictions:

  - The C version is written in ANSI C. The Pascal version compiles
    and runs under Turbo Pascal 5.0 and compatible compilers.

  - The library, when compiled with gcc, doesn't produce any warning
    with the '-ansi -pedantic' flags. Other compilers with better
    checks may produce ANSI warnings that we'd be happy to know about.
    (NOTE: It can of course be compiled by an 'average' C compiler,
    and even by a C++ one.)

  - In its simplest form, it only requires an ANSI libc to compile,
    and no utilities other than a C pre-processor, compiler and
    linker.

  - It is written in a modular fashion. Each module is called a
    'component' and is made of two files in the C version (an
    interface '.h' and a body '.c') and one file in the Pascal one.

  - The very low-level components can be easily replaced by
    system-specific ones that do not rely on the standard libc. These
    components deal mainly with i/o, memory and mutex operations.

  - A client application must only include one interface file, named
    'freetype.h' or 'freetype.pas', to use the engine. All other
    components should never be used or accessed by client
    applications, and their names always begin with a 'tt' prefix:

      ttmemory, ttobjs, ttinterp, ttapi, etc.

  - All configuration options are gathered in two files. One contains
    the processor and OS specific configuration options, while the
    other holds options that may be enabled or disabled by the
    developer to test specific features (like assertions, debugging,
    etc.).

  IMPORTANT NOTES:

  These restrictions only apply to the core engine. The package that
  comes with it contains the sources of several test programs that are
  much less portable, even if they follow a modular model inspired by
  the engine's layout.

  The components currently found in the 'c/lib' directory are:

  -------- high-level interface -------------------

  freetype.h     high-level API, to be used by client applications

  ttapi.c        implementation of the api found in 'freetype.h'

  -------- configuration --------------------------

  ttconfig.h     engine configuration options. These are commented and
                 switched by hand by the developer. See section 2
                 below for more info.

  ft-conf.h      included by ttconfig.h, this file isn't part of the
                 'c/lib' directory, but depends on the target
                 environment. See section 2 below for more info.
  ------- definitions -----------------------------

  tttypes.h      the engine's internal types definitions
  tttables.h     the TrueType tables definitions, as per the specs
  tttags.h       the TrueType table tags definitions
  tterror.h/c    the error and debugging component

  ttdebug.h/c    only used by the debugger, should not be linked into
                 a release build.

  ttcalc.h/c     math component used to perform some computations with
                 an intermediate 64-bit precision.

  ------- replaceable components --------------------

  ttmemory.h/c   memory component. This version uses the ANSI libc but
                 can be replaced easily by your own version.

  ttfile.h/c     stream i/o component. This version uses the ANSI libc
                 but can be replaced easily by your own version.
                 Compiled only if file memory-mapping isn't available
                 on your system.

  ttfile2.h/c    Unix-specific file memory-mapping version of the file
                 component. It won't be compiled on other systems.
                 Usually results in much faster file access (about 2x
                 on my SCSI P166).

  ttmutex.h/c    generic mutex component. This version is a dummy and
                 should only be used for a single-thread build. You
                 _need_ to replace this component's body with your own
                 implementation to be able to build a threaded version
                 of the engine.

  ------- data management --------------------------

  ttengine.h     the engine instance record definition, root of all
                 engine data.

  ttlists.h/c    generic lists manager
  ttcache.h/c    generic cache manager

  ttobjs.h/c     the engine's object definitions and implementations.
                 Contains structures, constructors, destructors and
                 methods for the following objects:

                   face, instance, glyph, execution_context

  ttload.h/c     the TrueType tables loader.

  ttgload.h/c    the glyph loader. A component in itself, due to the
                 task's complexity.

  ttindex.h/c    the character mapping to glyph index conversion
                 routines. Implements functions defined in
                 'freetype.h'.

  ttinterp.h/c   the TrueType instructions interpreter. Probably the
                 nicest source in this engine.
                 Apparently, many have failed to produce a comparable
                 one due to the very poorly written specification!! It
                 took me three months of my spare time to get it
                 working correctly!! :-)

  ttraster.h/c   the engine's second best piece. This is the scan-line
                 converter. Performs gray-level rendering (a.k.a.
                 font-smoothing) as well as dropout-control.

  2. Configuration and Debugging:

  As stated above, configuration depends on two files:

  The environment configuration file: 'ft-conf.h'

    This file contains the definitions of many configuration options
    that are processor and OS-dependent. On Unix systems, this file is
    generated automatically by the 'configure' script that comes with
    the released package.

    On other environments, it is located in one of the architecture
    directories found in 'c/arch' (e.g. 'c/arch/os2/ft-conf.h').

    The path to this file should be passed to the compiler when
    compiling _each_ component (typically with an -I option).

  The engine configuration file: 'ttconfig.h'

    This file contains many configuration options that the developer
    can turn on or off to experiment with some 'features' of the
    engine that are not part of its 'simplest' form. The options are
    commented.

  Note that the makefiles are compiler-specific.

  It is possible to enable the dumping of debugging information by
  compiling the components with the DEBUG configuration constant
  defined. The effect of this flag on the following components is:

  ttload         dumps information to stderr about the tables loaded

  ttgload        dumps information to stderr about the loaded glyph

  ttmemory       compiles a version of the component which includes a
                 very simple memory block tracking scheme. This will
                 dump the number of leaked blocks when the engine is
                 closed (i.e. when calling TT_FreeType_Done)

  ttinterp       includes a simple interactive text-mode debugger
                 which will be called whenever you hint a glyph.

  If you want to port the engine to another environment, you will need
  to:

  - write a new 'ft-conf.h' for it.
    Just copy one of those available and change the flags accordingly
    (they're all commented).

  - replace the memory, file and mutex components with yours,
    presenting the same interface and behaviour.

  - possibly add some code in ttapi.c to initialize system-specific
    data with the engine.


III. Usage conventions:

  1. Error Handling:

  Error handling has been refined to allow reentrant builds of the
  library, available only in the C version. We thus now have two
  different conventions:

  In Pascal:

    A global error variable is used to report errors when they are
    detected. All functions return a boolean that indicates success or
    failure of the call. When an error occurs within a given function,
    the latter must set the error variable and return false (which
    means failure).

    It is then possible to make several calls in a single 'if'
    statement like in:

      if not Perform_Action_1( parms_of_1 ) or
         not Perform_Action_2( parms_of_2 ) or
         not Perform_Action_3( parms_of_3 ) then goto Fail;

    where execution will jump to the 'Fail' label whenever an error
    occurs in the sequence of actions invoked in the condition.

  In C:

    Global errors are forbidden in re-entrant builds. Each function
    thus directly returns an error code. A return value of 0 means
    that no error occurred, while any other value indicates a failure
    of some kind.

    This convention is more constraining than the one used in the
    Pascal source. The above Pascal statement should be translated
    into the following C fragment:

      rc = Perform_Action_1( parms_of_1 );
      if ( rc )
        goto Fail;

      rc = Perform_Action_2( parms_of_2 );
      if ( rc )
        goto Fail;

      rc = Perform_Action_3( parms_of_3 );
      if ( rc )
        goto Fail;

    which, while being equivalent, isn't as pleasantly readable.
    One 'simple' way to match the original fragment would be to write:

      if ( (rc = Perform_Action_1( parms_of_1 )) ||
           (rc = Perform_Action_2( parms_of_2 )) ||
           (rc = Perform_Action_3( parms_of_3 )) )
        goto Fail;

    which is better but uses assignments within expressions, which are
    always delicate to manipulate in C (the risk of writing '=='
    instead of '=' exists, and would go unnoticed by a compiler).
    Moreover, the assignments are a bit redundant, and don't express
    much about the actions performed (they only speak of the error
    management issue).

    That is why some macros have been defined for the most frequently
    used functions. Most of them relate to very low-level routines
    that are called very often (i/o, mutex and memory mainly).

    Each macro produces an implicit assignment to a variable called
    'error', and can be used like a simple function call. E.g.:

      if ( PERFORM_Action_1( parms_of_1 ) ||
           PERFORM_Action_2( parms_of_2 ) ||
           PERFORM_Action_3( parms_of_3 ) )
        goto Fail;

    with

      #define PERFORM_Action_1(parms_1) (error = Perform_Action_1(parms_1))
      #define PERFORM_Action_2(parms_2) (error = Perform_Action_2(parms_2))
      #define PERFORM_Action_3(parms_3) (error = Perform_Action_3(parms_3))

    defined at the beginning of the file.

    There, the developer only needs to define a local 'error' variable
    and use the macros directly in the code, without caring about the
    actual error handling performed. Examples of such uses can be
    found in 'ttload.c' and 'ttgload.c'. Moreover, the structure of
    the source files remains very similar in both languages, even
    though the error handling is very different.

    This convention is very close to the use of exceptions in
    languages like C++, Pascal, Java, etc., where the developer
    focuses on the actions to perform, and not on every little error
    check.

  2. Font File I/O:

  a. Streams:

  The engine uses 'streams' to access the font files. A stream is a
  structure defined in the File component containing information used
  to access files through a system-specific i/o library.
  The current implementation of the File component uses the ANSI libc
  i/o functions. However, to allow embedding in small systems and
  independence from a complete libc, it is possible to re-implement
  the component for a specific system or OS, letting it use system
  calls.

  A stream is of type 'TStream', defined in the TTObjs interface. The
  type is (void*) but actually points to a structure defined within
  the File component.

  A stream is created, managed and closed through the interface of the
  File component. Several implementations of the same component can
  co-exist, each taking advantage of specific system features
  ('ttfile2.c' uses memory-mapped files, for instance), as long as
  each respects the interface.

  b. Frames:

  TrueType is tied to the big-endian format, which implies that
  reading shorts or longs from the font file may need conversions
  depending on the target processor. To be able to easily detect read
  errors and allow simple conversion calls or macros, the engine is
  able to access a font file using 'frames'.

  A frame is simply a sequence of successive bytes taken from the
  input file at the current position. A frame is pre-loaded into
  memory by a 'TT_Access_Frame' call of the File component.

  It is then possible to read all sizes of data through the Get_xxx
  functions, like Get_Byte, Get_Short, Get_UShort, etc.

  When all important data is read, the frame can be released by a call
  to 'TT_Forget_Frame'.
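
  To make the frame idea concrete, here is a minimal sketch of how
  such accessors might decode big-endian values independently of the
  host byte order. It is not the File component's actual code: every
  'Sketch_' name, the TSketchFrame record and the single static frame
  are assumptions made for illustration, and the frame is loaded from
  a memory buffer rather than from a stream.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical frame state; the real File component keeps its own. */
typedef struct  _TSketchFrame
{
  const unsigned char*  cursor;     /* next byte to extract          */
  long                  remaining;  /* bytes still available         */
} TSketchFrame;

static TSketchFrame  sketch_frame = { NULL, 0 };

/* Pre-load 'size' bytes out of a buffer holding 'avail' bytes.      */
/* Returns 0 on success and a non-zero error code otherwise.         */
static int  Sketch_Access_Frame( const unsigned char*  data,
                                 long                  avail,
                                 long                  size )
{
  if ( data == NULL || size < 0 || size > avail )
    return 1;

  sketch_frame.cursor    = data;
  sketch_frame.remaining = size;
  return 0;
}

/* Release the frame; only one frame exists at a time.               */
static void  Sketch_Forget_Frame( void )
{
  sketch_frame.cursor    = NULL;
  sketch_frame.remaining = 0;
}

/* Extract an unsigned 16-bit big-endian value.                      */
static unsigned long  Sketch_Get_UShort( void )
{
  unsigned long  value;

  /* debug check: never extract more than was pre-loaded             */
  assert( sketch_frame.remaining >= 2 );

  value = ( (unsigned long)sketch_frame.cursor[0] << 8 ) |
            (unsigned long)sketch_frame.cursor[1];

  sketch_frame.cursor    += 2;
  sketch_frame.remaining -= 2;
  return value;
}

/* Extract a signed 16-bit big-endian value.                         */
static long  Sketch_Get_Short( void )
{
  unsigned long  value = Sketch_Get_UShort();

  return ( value & 0x8000UL ) ? (long)value - 0x10000L : (long)value;
}

/* Extract a signed 32-bit big-endian value.                         */
static long  Sketch_Get_Long( void )
{
  unsigned long  value;

  assert( sketch_frame.remaining >= 4 );

  value = ( (unsigned long)sketch_frame.cursor[0] << 24 ) |
          ( (unsigned long)sketch_frame.cursor[1] << 16 ) |
          ( (unsigned long)sketch_frame.cursor[2] <<  8 ) |
            (unsigned long)sketch_frame.cursor[3];

  sketch_frame.cursor    += 4;
  sketch_frame.remaining -= 4;

  /* convert the two's complement bit pattern without overflow       */
  return ( value & 0x80000000UL )
           ? -(long)( ~value & 0x7FFFFFFFUL ) - 1L
           : (long)value;
}
```

  A caller would check the result of Sketch_Access_Frame once, then
  call the Sketch_Get_xxx functions in the order of the fields in the
  file, and finish with Sketch_Forget_Frame. Because the bytes are
  combined with shifts, the same code gives the same values on little-
  and big-endian processors.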
  Frames have several benefits. Consider these two approaches to
  extracting values:

    if ( (error = Read_Short( &var1 )) ||
         (error = Read_Long ( &var2 )) ||
         (error = Read_Long ( &var3 )) ||
         (error = Read_Short( &var4 )) )

      return FAILURE;

  and

    if ( (error = TT_Access_Frame( 12L )) )  /* Read the next 12 bytes  */
      return error;                 /* The frame could not be read      */

    var1 = Get_Short();             /* extract values from the frame    */
    var2 = Get_Long();
    var3 = Get_Long();
    var4 = Get_Short();

    TT_Forget_Frame();              /* release the frame                */

  In the first case, there are four error assignments with four checks
  of the file read. This unnecessarily increases the size of the
  generated code. Moreover, you must be sure that var1 and var4 are
  short variables, and var2/var3 long ones, if you want to avoid bugs
  and/or compiler warnings.

  In the second case, you perform only one check for the read, and
  exit immediately on failure. Then the values are extracted from the
  frame, as the results of function calls. This means that you can use
  automatic type conversion; there is no problem if var1 and var4 are
  longs, unlike previously.

  On big-endian machines, the Get_xxx functions could also be simple
  macros that merely peek the values directly from the frame, which
  speeds up and simplifies the generated code!

  And finally, frames are ideal when you're using memory-mapped files,
  as the frame is not really 'pre-loaded' and never uses any 'heap'
  space.

  IMPORTANT

  You CANNOT nest several frame accesses. There is only one available
  at a time for a specific instance.

  It is also the programmer's responsibility to never extract more
  data than was pre-loaded in the frame! (But you usually know how
  many values you want to extract from the file before doing so.)

  3. Memory Management:

  The library now uses a component whose interface looks a lot like
  malloc/free. It defines only two functions:

  * Alloc

    To be used like malloc, except that it returns an error code, not
    an address.
    Its arguments are the size of the requested block and the address
    of the target pointer to the 'fresh' block. An error code is
    returned in case of failure (and this will also set the target
    pointer to NULL), 0 in case of success.

    Alloc should always respect the following rules:

    - requesting a block of size 0 should set the target pointer to
      NULL and return no error code (i.e. return 0)

    - the returned block is always zeroed. This is an important
      assumption of other parts of the library.

    If you wish to replace the memory component with your own, please
    respect this behaviour, or your engine won't work correctly.

  * Free

    As you may have already guessed, Free is Alloc's counterpart. It
    takes as argument the _target pointer's address_!! You should
    _never_ pass the block's address directly, i.e. the pointer, to
    Free.

    Free should always respect the following rules:

    - calling it with a NULL argument, or the address of a NULL
      pointer, is valid and should return success.

    - the pointer is always set to NULL after the block's
      deallocation. This is also an important assumption of many
      other parts of the library.

    If you wish to replace the memory component with your own, please
    respect this behaviour, or your engine won't work correctly.

  As the pointer addresses needed as arguments are typed 'void**',
  the component's interface also provides, in the C version, some
  macros to help use them more easily. These are:

  MEM_Alloc      a version of Alloc that casts the argument pointer to
                 (void**)

  ALLOC          same as MEM_Alloc, but with an assignment to a
                 variable called 'error'. See 'error handling' above
                 for more info on this.

  FREE           a version of Free that casts the argument pointer to
                 (void**). There is currently no error handling with
                 this macro.

  MEM_Set        an alias for 'memset', which can be easily changed to
                 anything else if you wish to use a different memory
                 manager than the functions provided by the ANSI libc

  MEM_Copy       an alias of 'memcpy' or 'bcopy' used to move blocks
                 of memory.
                 You may change it to something different if you wish
                 to use something other than your standard libc

  4. Support for threaded environments:

  Support for threaded environments has been added to the C sources,
  and only to these. It is now theoretically possible to build three
  distinct versions of the library:

  single-thread build:

    The default build. This one doesn't know about different threads.
    Hence, no code is generated to perform coherent data sharing and
    locking.

  thread-safe build:

    With this build, several threads can use the library at the same
    time. However, some key components can only be used by one single
    thread at a time, and use a mutex to synchronize access to their
    functions. They are mainly the file, raster and interpreter
    components.

  re-entrant build:

    A re-entrant version is able to perform certain actions in
    parallel that a thread-safe one cannot. This includes accessing
    file(s) in parallel, interpreting different instruction streams in
    parallel, or even scan-line converting distinct glyphs at the same
    time.

  Note that most of the latest changes in the engine are making the
  distinction between the thread-safe and re-entrant builds thinner
  than ever. The only remaining problem is the raster component.

  ***** RELEASE NOTE *********************************************

  Note also that the threaded build is not operational if you are
  reading this with FreeType 1.0.

  ****************************************************************

  There is a ttmutex component that presents a generic interface to
  mutex operations. It should be re-implemented for each platform.
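
  Like the mutex component, the memory component is meant to be
  swapped out per platform, and a replacement body has to honor the
  Alloc/Free rules listed in section III.3. The following is a
  minimal sketch of such a body on top of the ANSI libc, not the
  actual 'ttmemory.c': the 'Sketch_'/'SKETCH_' names, the error code
  values and the rejection of negative sizes are assumptions made for
  illustration.

```c
#include <assert.h>
#include <stdlib.h>

#define SKETCH_Ok              0
#define SKETCH_Out_Of_Memory   1   /* hypothetical error codes       */
#define SKETCH_Invalid_Size    2

/* Allocate a zeroed block of 'size' bytes and store its address in  */
/* '*P'. Returns 0 on success. On failure, '*P' is set to NULL.      */
static int  Sketch_Alloc( long  size, void**  P )
{
  assert( P != NULL );  /* a missing target is a programming error   */

  *P = NULL;

  if ( size == 0 )
    return SKETCH_Ok;   /* rule: size 0 gives NULL and no error      */

  if ( size < 0 )
    return SKETCH_Invalid_Size;

  /* rule: the returned block is always zeroed, hence calloc         */
  *P = calloc( 1, (size_t)size );

  return *P ? SKETCH_Ok : SKETCH_Out_Of_Memory;
}

/* Release the block whose address is stored in '*P', then set '*P'  */
/* to NULL. A NULL 'P' or a NULL '*P' is valid and reports success.  */
static int  Sketch_Free( void**  P )
{
  if ( P == NULL || *P == NULL )
    return SKETCH_Ok;

  free( *P );
  *P = NULL;            /* rule: the pointer is reset after release  */

  return SKETCH_Ok;
}

/* Convenience macros in the spirit of MEM_Alloc, ALLOC and FREE.    */
/* SKETCH_ALLOC expects a local variable named 'error' in scope.     */
#define SKETCH_MEM_Alloc( p, size )  Sketch_Alloc( (size), (void**)&(p) )
#define SKETCH_ALLOC( p, size )      ( error = SKETCH_MEM_Alloc( p, size ) )
#define SKETCH_FREE( p )             Sketch_Free( (void**)&(p) )
```

  With a local 'error' variable declared, a caller can write
  'if ( SKETCH_ALLOC( block, 64L ) ) goto Fail;' and later
  'SKETCH_FREE( block );'. Calling SKETCH_FREE a second time on the
  same pointer is harmless, since the pointer was reset to NULL by the
  first call.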