Why accessors?

I use accessors to read and write the fields of structures, rather than accessing the fields directly. This is an uncommon (C) idiom. In this section, I explain why I use it.

deref

First of all, in larger programs, ADTs can get very large - even thousands of lines long. In such an ADT there can be hundreds to thousands of field dereferences. At one point, I was working with such a structure (let's call it T). I was not using accessor macros, so the code had lots of field dereferences of the form someT->field.

At that point, I was working on an RS/6000 running AIX. One property of that system is that dereferencing a NULL pointer does not cause a fault. Unfortunately, somewhere in our code a NULL pointer was being dereferenced, and we needed to isolate where.

One possible solution would have been to use the debugger, and set a watchpoint on the low address(es). Unfortunately, using a debugger to observe memory slows the program down by orders of magnitude. In many situations, that would have been acceptable; however, the program that I was working on exhibited the bug after hours of running. Using the debugger would have meant days per run.

We tried to sprinkle assert(someT != NULL) statements across the code, but we didn't manage to put them in the right places. It became clear that we were on the right track, but we couldn't nail it down.

Finally, we decided to do it systematically. We replaced all occurrences of someT->field with deref_T(someT)->field, and defined deref_T() as:


	#define deref_T(_t)	(assert((_t)!=0), (_t))

This approach worked. Within the course of a day or so, we were able to convert the code, isolate the bug, and fix it.

However, to fix this bug, we had to go through the entire code and replace all dereferences of Ts with the deref_T() macro. It would have been much better if we had started off using a deref macro in the first place.
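
As a rough sketch of how the checked dereference reads in practice, the fragment below wraps every field access in deref_T(). The struct T layout and the helpers read_field() and write_field() are hypothetical names made up for illustration; only deref_T() itself comes from the text above.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical ADT standing in for the T of the text. */
struct T {
	int	field;
};

/* Check the pointer, then yield it.  When NDEBUG is defined,
 * assert() expands to ((void)0) and the check disappears. */
#define deref_T(_t)	(assert((_t) != 0), (_t))

/* Hypothetical helpers: every access goes through deref_T(), so a
 * NULL T stops at the assert instead of silently touching low memory. */
static int read_field(struct T *t)
{
	return deref_T(t)->field;
}

static void write_field(struct T *t, int v)
{
	deref_T(t)->field = v;
}
```

Note that, with assertions enabled, the macro evaluates its argument twice, so arguments with side effects (for instance deref_T(p++)) are best avoided.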

accessor

Using just a plain deref macro leaves us with code of the form deref_T(someT)->field. This is (at least to me) ugly. If we were going to use macros to get to the field of an object anyway, it seemed cleaner to use an accessor macro of the form field_T(someT), defined as:


	#define	field_T(_t)	(deref_T(_t)->field)

Very quickly, we realized that this could help us isolate bugs by transparently applying bounds checks where necessary. Thus, for instance:


	struct uint_vec {
	  int		len;
	  unsigned *	vals;
	};

	#define deref_uint_vec(_t)	(_t)
	#define len_uint_vec(_t)	(deref_uint_vec(_t)->len)
	#define vals_uint_vec(_t)	(deref_uint_vec(_t)->vals)
	#define val_uint_vec(_t,_i)	\
		(vals_uint_vec(_t)[bounds(_i, len_uint_vec(_t))])
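
The bounds() helper used above is not defined in the text. One possible version, offered only as an assumption about its intent, checks that the index lies in [0, len) and then returns it unchanged, so that it can sit inside the subscript. The sum_uint_vec() function is likewise a hypothetical example of a caller.

```c
#include <assert.h>

struct uint_vec {
	int		len;
	unsigned *	vals;
};

/* Hypothetical helper: check 0 <= i < n, then yield i unchanged. */
static int bounds(int i, int n)
{
	assert(i >= 0 && i < n);
	return i;
}

#define deref_uint_vec(_t)	(_t)
#define len_uint_vec(_t)	(deref_uint_vec(_t)->len)
#define vals_uint_vec(_t)	(deref_uint_vec(_t)->vals)
#define val_uint_vec(_t,_i)	\
	(vals_uint_vec(_t)[bounds(_i, len_uint_vec(_t))])

/* Sum the elements; each read is bounds-checked by val_uint_vec(). */
static unsigned sum_uint_vec(struct uint_vec *v)
{
	unsigned total = 0;
	int i;

	for (i = 0; i < len_uint_vec(v); i++)
		total += val_uint_vec(v, i);
	return total;
}
```

Because val_uint_vec() expands to a subscript expression, it can be used for both reading and writing an element, and the check applies in either case.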

lhs vs. rhs accessors

Initially, we used the same accessor for both reading and writing a field. However, at one point we needed to trace where a field was getting set. To do that, we had to distinguish between accesses that read the field's value and accesses that write it.

Our initial thought was to use set_field_T() macros for writing fields, as:


	#define set_len_uint_vec(_t, _v) \
		( do_trace(_t), \
		  deref_uint_vec(_t)->len = (_v))

However, that would have meant converting a lot of code that was written as:


	len_uint_vec(someT) = /* some complex expression */

Instead, we kept a fairly similar form for writing:


	#define x_len_uint_vec(_t)	\
		((do_trace(_t), deref_uint_vec(_t))->len)
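
To make the lhs form concrete, here is a minimal sketch in which do_trace() simply counts writes. The text does not say what do_trace() actually did, so the counter n_len_writes and the caller clear_uint_vec() are assumptions for illustration.

```c
#include <assert.h>

struct uint_vec {
	int		len;
	unsigned *	vals;
};

/* Hypothetical trace hook: here it only counts writes. */
static int n_len_writes = 0;

static void do_trace(struct uint_vec *v)
{
	(void)v;	/* a real hook might log v or the call site */
	n_len_writes++;
}

#define deref_uint_vec(_t)	(_t)

/* rhs accessor: reads are not traced. */
#define len_uint_vec(_t)	(deref_uint_vec(_t)->len)

/* lhs accessor: trace first, then yield the field as an lvalue. */
#define x_len_uint_vec(_t)	\
	((do_trace(_t), deref_uint_vec(_t))->len)

/* Hypothetical caller showing a traced write. */
static void clear_uint_vec(struct uint_vec *v)
{
	x_len_uint_vec(v) = 0;
}
```

The comma expression yields the pointer, and applying ->len to it gives an lvalue, which is why the assignment form is preserved.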

Enforcing rhs accessors

Most idioms are enforced only by the discipline of the programmer. There is no way, in general, to tell whether an idiom is being used or not. Occasionally, there is an exception. Specifically, it is possible to code a rhs accessor so that using it with the appropriate compiler flags will cause a warning or an error.


	#define len_uint_vec(_t)	\
		((void)0, deref_uint_vec(_t)->len)
	#define x_len_uint_vec(_t)	\
		(deref_uint_vec(_t)->len)

Using the rhs accessor len_uint_vec() as defined above on the lhs of an assignment is not valid ANSI C, because a comma expression does not yield an lvalue. Thus,


	len_uint_vec(someT) = 0;

should give a compiler diagnostic of the form "compound expressions as lvalues forbidden".
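
The following sketch puts the two accessors side by side. The function grow_uint_vec() is a hypothetical caller, and the offending assignment is left commented out so that the fragment still compiles.

```c
#include <assert.h>

struct uint_vec {
	int		len;
	unsigned *	vals;
};

#define deref_uint_vec(_t)	(_t)

/* rhs accessor: in C, a comma expression is not an lvalue. */
#define len_uint_vec(_t)	\
	((void)0, deref_uint_vec(_t)->len)

/* lhs accessor: a plain member access, which is an lvalue. */
#define x_len_uint_vec(_t)	\
	(deref_uint_vec(_t)->len)

/* Hypothetical caller: writes go through x_, reads through the rhs form. */
static int grow_uint_vec(struct uint_vec *v, int extra)
{
	x_len_uint_vec(v) = len_uint_vec(v) + extra;
	/* len_uint_vec(v) = 0; */	/* expected to be diagnosed: not an lvalue */
	return len_uint_vec(v);
}
```

This relies on C semantics. In C++, a comma expression can be an lvalue, so the same trick would not catch the mistake there.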

Field Names

At some point, we had two list-like ADTs with the same field, next. By mistake, we had written the following code:


	T2	ptrT2;

	... next_T1(ptrT2) ...

This code worked correctly, of course; there was no difference between next_T1(ptrT2) and next_T2(ptrT2) - both converted to ptrT2->next.

However, when we added some debugging code to next_T2(), the access to ptrT2->next via next_T1() was not tracked. That was when we started to disambiguate field references by adding two- or three-letter prefixes to struct and union fields.


	struct T1 {
	  struct T1 *	t1_next;
	  /* ... */
	};

	#define deref_T1(_t)	(_t)
	#define next_T1(_t)	(deref_T1(_t)->t1_next)

	struct T2 {
	  struct T2 *	t2_next;
	  /* ... */
	};

	#define deref_T2(_t)	(_t)
	#define next_T2(_t)	(deref_T2(_t)->t2_next)

Adding the prefixes is not much additional work; all changes are confined to the accessor macros.

With this change, the example would not have compiled. next_T1(ptrT2) would have resolved to ptrT2->t1_next, which would fail since T2 has no field named t1_next.
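
A small sketch of the prefixed accessors in use follows. The structs are cut down to the link field only, and length_T2() is a hypothetical helper added for illustration.

```c
#include <assert.h>
#include <stddef.h>

struct T1 {
	struct T1 *	t1_next;
};

struct T2 {
	struct T2 *	t2_next;
};

#define deref_T1(_t)	(_t)
#define next_T1(_t)	(deref_T1(_t)->t1_next)

#define deref_T2(_t)	(_t)
#define next_T2(_t)	(deref_T2(_t)->t2_next)

/* Hypothetical helper: count the nodes in a T2 list. */
static int length_T2(struct T2 *p)
{
	int n = 0;

	/* Using next_T1(p) here would not compile, because struct T2
	 * has no member named t1_next. */
	for (; p != NULL; p = next_T2(p))
		n++;
	return n;
}
```

The mismatch is now caught at compile time rather than showing up later as a missed trace.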

