Categories

Designing Better APIs

Is there a best design for an API? I don’t know but I do have my own guidelines for good vs bad design. One guideline I try to follow is that it’s good to design an API to make it hard to use incorrectly. Here’s a couple of examples I’ve run into recently.

Here’s an API for allocating shared memory.

class MemorySystem
{
public:
  void* AllocateSharedMemory(size_t size_in_bytes);
  {
    const size_t kPageSize = 65536;

    // Make sure a multiple of kPageSize is requested.
    assert(size_in_bytes % kPageSize == 0);
    ...
    return allocated_memory;
  }
};

Can you see why this is a bad design? First off, 65535 inputs of 65536 inputs are bad. Why design an API where most of the time the program will fail? Of course if you’re lucky the programmer will try some value that will fail and he’ll catch the error before he ships. He may or may not get an error message depending on the environment his application runs in but at least he can spend a some time debugging until he finds out that he has to pass in an multiple of page size for bytes.

Some programmers might switch to this as a better design.

class MemorySystem
{
public:
  void* AllocateSharedMemory(size_t size_in_bytes)
  {
    const size_t kPageSize = 65536;

    // Round up to the nearest page size.
    size_t pages = (size_in_bytes + kPageSize - 1) / kPageSize;
    size_in_bytes = pages * kPageSize;
    ...
    return allocated_memory;
  }
};

That fixes the problem of most inputs being bad. It has another problem though which is the user has no idea that it’s going to allocate a page each time. He has no idea that if he makes lots of shared memory allocations for small sizes lots of memory will be wasted.

Here’s a better design.

class MemorySystem
{
public:
  static const size_t kPageSize = 65536;

  void* AllocateSharedMemory(size_t number_of_pages)
  {
    ...
    return allocated_memory;
  }
};

This design has the advantage that it doesn’t have any bad inputs and by design it makes it very clear what is happening under the hood. No documentation is needed for the user of this function to know that units of page size will be allocated and knowing that he also knows he needs to find out what size those pages are and that allocating lots of small pieces will quickly gooble up the memory.

Of course there’s the idea that sometimes you don’t want the user of the API to have to know the underlying implementation but in this case he needs to know what’s happening if he wants his program to perform well.

Here’s another example. I was working with a shared memory system. The shared memory is shared between an untrusted piece of code and a trusted piece of code. The trusted piece code needs to make sure that anything it reads out of shared memory is valid. For example the untrusted piece might pass the trusted piece a shared memory id, an offset into that shared memory and a size and the trusted piece needs to verify the id is a valid piece of shared memory and the offset and size are in range for that piece of shared memory.

The API that was given was something like this:

class MemorySystem
{
public:
  // Returns an ID for the allocated shared memory.
  int AllocateSharedMemory(size_t size_in_bytes);

  // Gets the address of shared memory for the given id.
  // Returns NULL if the id is invalid.
  void* GetAddressOfSharedMemory(int shared_memory_id);

  // Gets the size of shared memory for the given id.
  // Returns 0 if the id is invalid.
  size_t GetSizeOfSharedMemory(int shared_memory_id);
}

When using this API, every time the trusted code accesses shared memory it has to check that everything is valid. For example:

bool Trusted::WriteSharedMemoryToFile(
    int shared_memory_id,
    int offset,
    size_t size)
{
  void* data = mem_sys_->GetAddressOfSharedMemory(shared_memory_id);
  if (data == NULL)
  {
    // not a valid shared memory id
    return false;
  }

  size_t mem_size = mem_sys_->GetSizeOfSharedMemory(shared_memory_id);
  if (offset + size > mem_size)
  {
     // offset or size is out of range for the given shared memory.
     return false;
  }

  if (offset + size < offset)
  {
     // offset + size overflows size_t
     return false;
  }

  // every thing is valid so write out the data.
  WriteCharacters(file_, static_cast(data) + offset, size);
}

The problem with that API is that it pushes the burden of validation on each and every piece of code that accesses shared memory.

Here’s a better design.

class MemorySystem
{
public:
  // Returns an ID for the allocated shared memory.
  int AllocateSharedMemory(size_t number_of_pages);

  // Gets the address of a segment of shared memory for the given id.
  // Returns NULL if the id or segment is invalid.
  void* GetAddressOfSharedMemory(
      int shared_memory_id,
      int offset,
      size_t size);
}

With this design there there is no easy way to get a pointer to shared memory unless it’s a valid pointer. In other words, it’s much harder to use this API incorrectly. The example gets much simpler.

bool Trusted::WriteSharedMemoryToFile(
    int shared_memory_id,
    int offset,
    size_t size)
{
  void* data = mem_sys_->GetAddressOfSharedMemory(
      shared_memory_id,
      offset,
      size);
  if (data == NULL)
  {
    // segment not valid.
    return false;
  }

  // every thing is valid so write out the data.
  WriteCharacters(file_, static_cast(data), size);
}

That change also meant I can get rid of the pointer arithmetic in code that uses this API.

I would even go one step further for code like this which is I’d add a template so I can remove the cast from code that uses this API.

class MemorySystem
{
public:
  // Returns an ID for the allocated shared memory.
  int AllocateSharedMemory(size_t number_of_pages);

  // Gets the address of a segment of shared memory for the given id.
  // Returns NULL if the id is invalid or segment is invalid.
  void* GetAddressOfSharedMemory(
      int shared_memory_id,
      int offset,
      size_t size);

  // Use this template to get the address of a shared memory segment.
  // Returns NULL if the id is invalid or segment is invalid.
  template 
  T GetSharedMemoryAs(
      int shared_memory_id,
      int offset,
      size_t size) {
    return static_cast(GetAddressOfSharedMemory(
        shared_memory_id, offset, size);
  }
}

And now the example is:

bool Trusted::WriteSharedMemoryToFile(
    int shared_memory_id,
    int offset,
    size_t size)
{
  const char* data = mem_sys_->GetSharedMemoryAs(
      shared_memory_id,
      offset,
      size);
  if (data == NULL)
  {
    // not a valid shared memory id or size or offset.
    return false;
  }

  // every thing is valid so write out the data.
  WriteCharacters(file_,  data, size);
}

That’s important because every time you use a cast it’s another chance for an error. If you cast wrong bad things happen so if you can compartmentalize any casting out of your code then you can avoid more errors.

I hope I’ve convinced you to consider designing your API to make it easy to use correctly and hard to use incorrectly. It’s often not much work, it often makes the usage of the API far more clear without having to resort to documentation and could possibly save you and your team hours of debugging and frustration.

5 comments to Designing Better APIs

  • CarlJohan

    Your pictures don’t show up very much in the rss feed :(

  • pauly

    void* AllocateSharedMemory(size_t number_of_pages) is a poor design. It puts the work of calculating the number of pages on the application programmer – every single time – leading to more chance of error. If this function is called in many places, it would be easy to accidently pass in “number of bytes” instead of “number of pages”, and never catch it.

    Now what happens if the page size changes? Or I add a second page size? At the very least I would have to recompile all client code – a major fail for an API. At the very least, you should be passing some kind of page size or page size enum.

    Also, the intention of the original API seems to be to hide the page size from the application programmer. I agree that “Asserting” is a major fail, but the original API might be trying to hide the page size for a very good reason.

    On your second example, I think you are missing the difference between an “API” and “Helper functions”. To me, a good API provides basic functionality, whilst helper functions group a series of commonly used functions together. I think your final example is a helper function, not an API function.

    Oh, and my compile times might not thank you for adding templates into your header file, but I guess its just one (inline – please add inline!) function :)

  • There’s no reason to add “inline”. The compiler will do it for you. In fact, it’s waste of typing and as it’s only a suggestion to the compiler it is mis-leading since you don’t know if it will be inlined. (besides, it’s against my current employer’s style guide)

    As for it being bad design, the function argument clearly says “page size” and is therefore auto documented so the API user is not going to use bytes by accident. You instantly know you have to look up the page size whether it’s a constant or from a function. That’s much better than having no idea. I could certainly see improving it by naming the function, AllocatePagesOfSharedMemory or AllocateSharedMemoryPages which would make it even clearer.

    As someone who had to debug and fix code because an API designer choose the “number of bytes” solution and the client code saw “number of bytes” and was allocating lots of small amounts that way I’m going on empirical evidence. I’m also not the only one to run into this. A 40% increase in speed was recently had by one of the teams at my company finding out they were using a similar API inefficiently by allocating lots of small pieces of memory when internally it was allocating pages and therefore filling memory and ending up paging the app.

    Hiding something that important is the wrong API design. A good design helps you write code that uses the API the way it was intended to be used. Bad API design puts all the burden of figuring out both how to use it at all and how to use it efficiently on the shoulders of the API user.

    As for compile times, I hope you’re not saying you’d choose compile times over safety.

  • pauly

    Are you just talking about “inline” as it applies to templates only? Its certainly more than a hint in non-template functions. In template functions, the worst that can happen is that compiler will take the hint, and even improve compile times. There’s a good article about inline-only template functions in the Meyer’s book (I don’t have the book to hand, unfortunately).

    With all these small allocations, might it not better to add a heap manager, ie. a malloc-style heap management? I can see reasons to NOT do that with shared memory (eg. process A shares memory with process B and process C, but doesn’t want process B and C to see the each others data).

    Compile times over safety? Mmmmm, what’s the cost? :) In this case, the template function is no more safe (and not even less typing) than the original, so nothing is gained. But I’m being picky. There’s nothing really wrong with your function – its one of the best uses of templates IMHO.

  • I could make GetSharedMemoryAs like this

    template 
    bool GetSharedMemoryAs(
        int shared_memory_id,
        int shared_memory_offset,
        size_t size,
        T* pointer) {
      *pointer = GetSharedMemory(
          shared_memory_id,
          shared_memory_offset,
          size);
    
       return *pointer != NULL;
    }

    Or something along those lines. That would actually make it less typing and easier to use since you don’t have to specify the type. Then you can just do

    int* some_ints;
    mem_sys->GetSharedMemoryAs(id, offset, size, &some_ints);
    

    Personally I’d make “pointer* a reference to a pointer if it was up to me but my company doesn’t allow non-const references.

Leave a Reply

 

 

 

You can use these HTML tags

<a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <strike> <strong>