        Description of Dynamic Update and T_UNSPEC Code

                Added by Mike Schwartz
                University of Washington Computer Science Department
                11/86
                schwartz@cs.washington.edu

I have incorporated 2 new features into BIND:

        1. Code to allow (unauthenticated) dynamic updates:
           surrounded by #ifdef ALLOW_UPDATES
        2. Code to allow data of unspecified type:
           surrounded by #ifdef ALLOW_T_UNSPEC

Note that you can have one or the other or both (or neither) of these
modifications running, by appropriately modifying the makefiles. Also,
the external interface isn't changed (other than being extended), i.e.,
a BIND server that allows dynamic updates and/or T_UNSPEC data can
still talk to a 'vanilla' server using the 'vanilla' operations.

The description that follows is broken into 3 parts: a functional
description of the dynamic update facility, a functional description of
the T_UNSPEC facility, and a discussion of the implementation of
dynamic updates. The implementation description is mostly intended for
those who want to make future enhancements (especially the addition of
a good authentication mechanism). If you make enhancements, I would be
interested in hearing about them.

1. Dynamic Update Facility

I added this code in conjunction with my research into naming in large
heterogeneous systems. For the purposes of this research, I ignored
security issues. In other words, no authentication/authorization
mechanism exists to control updates. Authentication will hopefully be
addressed at some future point (although probably not by me). In the
meantime, BIND Internet name servers (as opposed to "private" name
server networks operating with their own port numbers, as I use in my
research) should be compiled *without* -DALLOW_UPDATES, so that the
integrity of the Internet name database won't be compromised by this
code.
There are 5 different dynamic update interfaces:

        UPDATEA  - add a resource record
        UPDATED  - delete a specific resource record
        UPDATEDA - delete all named resource records
        UPDATEM  - modify a specific resource record
        UPDATEMA - modify all named resource records

These all work through the normal resolver interface, i.e., these
interfaces are opcodes, and the data in the buffers passed to
res_mkquery must conform to what is expected for the particular
operation (see the #ifdef ALLOW_UPDATES extensions to nstest.c for
example usage).

UPDATEM is logically equivalent to an UPDATED followed by an UPDATEA,
except that the updates occur atomically at the primary server (as
usual with Domain servers, secondaries may become temporarily
inconsistent).

The difference between UPDATED and UPDATEDA is that the latter allows
you to delete all RRs associated with a name; similarly for UPDATEM and
UPDATEMA. The reason for the UPDATE{D,M}A interfaces is two-fold:

        1. Sometimes you want to delete/modify some data, but you know
           you'll only have a single RR for that data; in such a case,
           it's more convenient to delete/modify the RR by just giving
           the name; otherwise, you would have to first look it up, and
           then delete/modify it.
        2. It is sometimes useful to be able to delete/modify multiple
           RRs this way, since one can then perform the operation
           atomically. Otherwise, one would have to delete/modify the
           RRs one-by-one.

One additional point to note about UPDATEMA is that it will return a
success status if there were *zero* or more RRs associated with the
given name (and the RR add succeeds), whereas UPDATEM, UPDATED, and
UPDATEDA will return a success status if there were *one* or more RRs
associated with the given name. The reason for the difference is to
handle the (probably common) case where what you want to do is set a
particular name to contain a single RR, irrespective of whether or not
it was already set.

2. T_UNSPEC Facility

Type T_UNSPEC allows you to store data whose layout BIND doesn't
understand. Data of this type is not marshalled (i.e., converted
between host and network representation, as is done, for example, with
Internet addresses) by BIND, so it is up to the client to make sure
things work out ok w.r.t. heterogeneous data representations. The way I
use this type is to have the client marshal data, store it, retrieve
it, and demarshal it. This way I can store arbitrary data in BIND
without having to add new code for each specific type.

T_UNSPEC data is dumped in an ASCII-encoded, checksummed format so
that, although it's not human-readable, it at least doesn't fill the
dump file with unprintable characters.

Type T_UNSPEC is important for my research environment, where
potentially lots of people want to store data in the name service, and
each person's data looks different. Instead of having BIND understand
the format of each of their data types, the clients define marshaling
routines and pass buffers of marshalled data to BIND; BIND never tries
to demarshal the data...it just holds on to it, and gives it back to
the client when the client requests it, and the client must then
demarshal it.

The Xerox Network System's name service (the Clearinghouse) works this
way. The reason 'vanilla' BIND understands the format of all the data
it holds is probably that BIND is tailored for a very specific
application, and wants to make sure the data it holds makes sense (and,
for some types, BIND needs to take additional action depending on the
data's semantics). For more general purpose name services (like the
Clearinghouse and my usage of BIND), this approach is less tractable.

See the #ifdef ALLOW_T_UNSPEC extensions to nstest.c for example usage
of this type.

3. Dynamic Update Implementation Description

This section is divided into 3 subsections: General Discussion,
Miscellaneous Points, and Known Defects.
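
Before the implementation details, the success rules for the five
update opcodes from section 1 can be restated as a small model. This is
only a minimal sketch in C and is not code from BIND: apply_update,
struct rrset, and the OP_* constants are hypothetical names, RRs are
reduced to name/data string pairs, and restoring the database when a
modify fails is an assumption about what "atomically" implies.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical in-memory model; none of these names come from BIND. */
#define MAX_RRS 16

enum upd_op { OP_UPDATEA, OP_UPDATED, OP_UPDATEDA, OP_UPDATEM, OP_UPDATEMA };

struct rr { char name[32]; char data[32]; };
struct rrset { struct rr rrs[MAX_RRS]; int count; };

/* Remove RRs owned by name; if data is non-NULL, only those whose data
 * also matches. Returns the number of RRs removed. */
static int remove_rrs(struct rrset *db, const char *name, const char *data)
{
    int i, kept = 0, removed = 0;

    for (i = 0; i < db->count; i++) {
        int hit = strcmp(db->rrs[i].name, name) == 0 &&
                  (data == NULL || strcmp(db->rrs[i].data, data) == 0);
        if (hit)
            removed++;
        else
            db->rrs[kept++] = db->rrs[i];
    }
    db->count = kept;
    return removed;
}

/* Append one RR. Returns 1 on success, 0 if it does not fit. */
static int add_rr(struct rrset *db, const char *name, const char *data)
{
    struct rr *r;

    if (name == NULL || data == NULL || db->count >= MAX_RRS ||
        strlen(name) >= sizeof r->name || strlen(data) >= sizeof r->data)
        return 0;
    r = &db->rrs[db->count++];
    strcpy(r->name, name);
    strcpy(r->data, data);
    return 1;
}

/* Returns 1 for a success status, 0 for failure. old_data is used by
 * UPDATED and UPDATEM; new_data by UPDATEA, UPDATEM, and UPDATEMA. */
int apply_update(struct rrset *db, enum upd_op op, const char *name,
                 const char *old_data, const char *new_data)
{
    struct rrset saved;
    int ok = 0;

    assert(db != NULL && name != NULL);
    saved = *db;    /* assumption: a failed update leaves no trace */

    switch (op) {
    case OP_UPDATEA:
        ok = add_rr(db, name, new_data);
        break;
    case OP_UPDATED:    /* needs one or more matching RRs */
        ok = old_data != NULL && remove_rrs(db, name, old_data) >= 1;
        break;
    case OP_UPDATEDA:   /* needs one or more RRs under the name */
        ok = remove_rrs(db, name, NULL) >= 1;
        break;
    case OP_UPDATEM:    /* UPDATED followed by UPDATEA, as one step */
        ok = old_data != NULL && remove_rrs(db, name, old_data) >= 1 &&
             add_rr(db, name, new_data);
        break;
    case OP_UPDATEMA:   /* zero existing RRs is fine; the add must succeed */
        remove_rrs(db, name, NULL);
        ok = add_rr(db, name, new_data);
        break;
    }
    if (!ok)
        *db = saved;
    return ok;
}
```

In this model, OP_UPDATEMA on a name with no RRs succeeds and leaves
exactly one RR behind, while OP_UPDATEDA and OP_UPDATEM on the same
empty name fail, which is the zero-or-more versus one-or-more
distinction drawn in section 1.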
3.1 General Discussion

The basic scheme is this: When an update message arrives, a call is
made to InitDynUpdate, which first looks up the SOA record for the zone
the update affects. If this is the primary server for that zone, we do
the update and then update the zone serial number (so that secondaries
will refresh later). If this is a secondary server, we forward the
update to the primary, and if that's successful, we update our copy
afterwards. If it's neither, we refuse the update. (One might think to
try to propagate the update to an authoritative server; I figured that
updates will probably be most likely within an administrative domain
anyway; this could be changed if someone has strong feelings about it).

Note that this mechanism disallows updates when the primary is down,
preserving the Domain scheme's consistency requirements, but making the
primary a critical point for updates. This seemed reasonable to me
because

        1. Alternative schemes must deal with potentially complex
           situations involving merging of inconsistent secondary
           updates
        2. Updates are presumed to be rare relative to read accesses,
           so this increased restrictiveness for updates over reads is
           probably not critical

I have placed comments throughout the code, so it shouldn't be too hard
to see what I did. The majority of the processing is in doupdate() and
InitDynUpdate(). Also, I added a field to the zone struct, to keep
track of when zones get updated, so that only changed zones get
checkpointed.

3.2 Miscellaneous Points

I use ns_maint to call zonedump() if the database changes, to provide a
checkpointing mechanism. I use the zone refresh times to set up
ns_maint interrupts if there are either secondaries or primaries.
Hence, if there is a secondary, this interrupt can cause zoneref (as
before), and if there is a primary, this interrupt can cause doadump. I
also checkpoint if needed before shutting down.
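
The routing rule from section 3.1 and the changed-zone checkpointing
described above can be sketched together. This is a hedged model, not
the code in doupdate() or InitDynUpdate(): struct zone_model,
route_update, checkpoint_changed, and the two forwarding stubs are
hypothetical names, the zone data is reduced to an RR counter, and
leaving a secondary's serial number and changed flag untouched is an
assumption, since the text only describes checkpointing for primaries.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model; the names below are illustrative, not BIND's. */
enum zone_role { ROLE_NONE, ROLE_PRIMARY, ROLE_SECONDARY };
enum upd_result { UPD_OK, UPD_REFUSED, UPD_FORWARD_FAILED };

struct zone_model {
    enum zone_role role;
    unsigned long serial;   /* SOA serial number */
    int changed;            /* set on update, cleared by a checkpoint */
    int rr_count;           /* stand-in for the zone's data */
};

/* Forwarding callback: returns nonzero if the primary accepted. */
typedef int (*forward_fn)(struct zone_model *z);

/* Stubs standing in for forwarding to a reachable/unreachable primary. */
int primary_up(struct zone_model *z)   { (void)z; return 1; }
int primary_down(struct zone_model *z) { (void)z; return 0; }

enum upd_result route_update(struct zone_model *z, forward_fn forward)
{
    assert(z != NULL);

    switch (z->role) {
    case ROLE_PRIMARY:
        z->rr_count++;      /* apply the update locally */
        z->serial++;        /* so secondaries refresh later */
        z->changed = 1;     /* zone now needs a checkpoint */
        return UPD_OK;
    case ROLE_SECONDARY:
        /* The primary must accept first; if it is down, no update. */
        if (forward == NULL || !forward(z))
            return UPD_FORWARD_FAILED;
        z->rr_count++;      /* then update our own copy */
        return UPD_OK;
    default:
        return UPD_REFUSED; /* neither primary nor secondary */
    }
}

/* Dump only zones whose changed flag is set; returns how many. */
int checkpoint_changed(struct zone_model *zones, size_t n)
{
    size_t i;
    int dumped = 0;

    for (i = 0; i < n; i++) {
        if (zones[i].changed) {
            /* a real server would write the zone's boot file here */
            zones[i].changed = 0;
            dumped++;
        }
    }
    return dumped;
}
```

Calling checkpoint_changed() twice in a row dumps nothing the second
time, which is the point of the per-zone flag: a maintenance interrupt
costs nothing for zones that have not been updated since the last
checkpoint.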
You can force a server to checkpoint any changed zones by sending the
maint signal (SIGALRM) to the process. Otherwise it just checkpoints
during maint. interrupts, or when being shut down (with SIGTERM).
Sending it the dump signal causes the database to be dumped into the
(single) dump file, but doesn't checkpoint (i.e., update the boot
files). Note that the boot files will be overwritten with checkpoint
files, so if you want to preserve the comments, you should keep copies
of the original boot files separate from the versions that are actually
used.

I disallow T_SOA updates, for several reasons:

        - T_SOA deletes at the primary won't be discovered by the
          secondaries until they try to request them at maint time,
          which will cause a failure
        - the corresponding NS record would have to be deleted at the
          same time (atomically) to avoid various problems
        - T_SOA updates would have to be done in the right order, or
          else the primary and secondaries will be out-of-sync for that
          zone.

My feeling is that changing the zone topology is a weighty enough thing
to do that it should involve changing the load file and reloading all
affected servers.

There are a lot of places where bind exits due to catastrophic failures
(mainly malloc failures). I don't try to dump the database in these
places because it's probably inconsistent anyway. It's probably better
to depend on the most recent dump.

3.3 Known Defects

1. I put the following comment in nlookup (db_lookup.c):

        Note: at this point, if np->n_data is NULL, we could be in one
        of two situations: Either we have come across a name for which
        all the RRs have been (dynamically) deleted, or else we have
        come across a name which has no RRs associated with it because
        it is just a place holder (e.g., EDU). In the former case, we
        would like to delete the namebuf, since it is no longer of use,
        but in the latter case we need to hold on to it, so future
        lookups that depend on it don't fail.
        The only way I can see of doing this is to always leave the
        namebufs around (although then the memory usage continues to
        grow whenever names are added, and can never shrink back down
        completely when all their associated RRs are deleted).

   Thus, there is a problem that the memory usage will keep growing for
   the situation described. You might just choose to ignore this
   problem (since I don't see any good way out), since things probably
   won't grow fast anyway (how many names are created and then deleted
   during a single server incarnation, after all?)

   The problem is that one can't delete old namebufs because one would
   want to do it from db_update, but db_update calls nlookup to do the
   actual work, and can't do it there, since we need to maintain place
   holders. One could make db_update not call nlookup, so we know it's
   ok to delete the namebuf (since we know the call is part of a delete
   call); but then there is code with a lot of overlapping
   functionality in the 2 routines.

   This also causes another problem: If you create a name and then do
   UPDATEDA, all its RRs get deleted, but the name remains; then, if
   you do a lookup on that name later, the name is found in the hash
   table, but no RRs are found for it. It then forwards the query to
   itself (for some reason), and then somehow decides there is no such
   domain, and then returns (with the correct answer, but after going
   through extra work). But the name remains, and each time it is
   looked up, we go through these same steps. This should be fixed, but
   I don't have time right now (and the right answer seems to come back
   anyway, so it's good enough for now).

2. There are 2 problems that crop up when you store data (other than
   T_SOA and T_NS records) in the root:

        a. Can't get primary to doaxfr RRs other than SOA and NS to
           secondary.
        b. Upon checkpoint (zonedump), this data sometimes comes out
           after other data in the root, so that (since the SOA and NS
           records have null names), they will get interpreted as being
           records under the other names upon the next boot up. For
           example, if you have a T_A record called ABC, the checkpoint
           may look like:

                $ORIGIN .
                ABC     IN      A       128.95.1.3
                        99999999        IN      NS      UW-BORNEO.
                        IN      SOA     UW-BORNEO. SCHWARTZ.CS.WASHINGTON.EDU.
                                ( 50 3600 300 3600000 3600 )

           Then when booting up the next time, the SOA and NS records
           get interpreted as being called "ABC" rather than the null
           root name.

3. The secondary server caches the T_A RR for the primary, and hence
   when it tries to ns_forw an update, it won't find the address of the
   primary using nslookup unless that T_A RR is *also* stored in the
   main hashtable (by putting it in a named.db file as well as the
   named.ca file).