Tag Archives: Application Architecture

[repost ]Paper: A Web Of Things Application Architecture – Integrating The Real-World Into The Web


How do you layer a programmable Internet of smart things on top of the web? That’s the question addressed by Dominique Guinard in his ambitious dissertation: A Web of Things Application Architecture – Integrating the Real-World into the Web (slides). With the continued siloing of content, perhaps we can keep our things open and talking to each other?

In the architecture things are modeled using REST, they will be findable via search, they will be social via a social access controller, and they will be mashupable. Here’s a great graphical overview of the entire system:



A central concern in the area of pervasive computing has been the integration of digital artifacts with the physical world and vice-versa. Recent developments in the field of embedded devices have led to smart things increasingly populating our daily life. We define smart things as digitally enhanced physical objects and devices that have communication capabilities. Application domains are for instance wireless sensor and actuator networks in cities, making them more context-aware and thus smarter. New appliances such as smart TVs, alarm clocks, fridges or digital-picture frames make our living-rooms and houses more energy efficient and our lives easier. Industries benefit from increasingly more intelligent machines and robots. Usual objects tagged with radio-tags or barcodes become linked to virtual information sources and offer new business opportunities.

As a consequence, Internet of Things research is exploring ways to connect smart things together and build upon these networks. To facilitate these connections, research and industry have come up over the last few years with a number of low-power network protocols. However, while getting increasingly more connected, embedded devices still form multiple, small, incompatible islands at the application layer: developing applications using them is a challenging task that requires expert knowledge of each platform. As a consequence, smart things remain hard to integrate into composite applications. To remedy this fact, several service platforms proposing an integration architecture appeared in recent years. While some of them are successfully implemented on some appliances and machines, they are, for the most part, not compatible with one another. Furthermore, their complexity and lack of well-known tools let them only reach a relatively small community of expert developers and hence their usage in applications has been rather limited.

On the other hand, the Internet is a compelling example of a scalable global network of computers that interoperate across heterogeneous hardware and software platforms. On top of the Internet, the Web illustrates well how a set of relatively simple and open standards can be used to build very flexible systems while preserving efficiency and scalability. The cross-integration and development of composite applications on the Web, alongside its ubiquitous availability across a broad range of devices (e.g., desktops, laptops, mobile phones, set-top boxes, gaming devices, etc.), make the Web an outstanding candidate for a universal integration platform. Web sites do not offer only pages anymore, but Application Programming Interfaces that can be used by other Web resources to create new, ad-hoc and composite applications running in the computing cloud and being accessed by desktops or mobile computers.

In this thesis we use the Web and its emerging technologies as the basis of a smart things application integration platform. In particular, we propose a Web of Things application architecture offering four layers that simplify the development of applications involving smart things. First, we address device accessibility and propose implementing, on smart things, the architectural principles that are at the heart of the Web, such as Representational State Transfer (REST). We extend the REST architecture by proposing and implementing a number of improvements for the special requirements of the physical world, such as the need for domain-specific proxies or real-time communication.

In the second layer we study findability: In a Web populated by billions of smart things, how can we identify the devices we can interact with, the devices that provide the right service for our application? To address these issues we propose a lightweight metadata format that search engines can understand, together with a Web-oriented discovery and lookup infrastructure that leverages the particular context of smart things.

While the Web of Things fosters a rather open network of physical objects, it is very unlikely that in the future access to smart things will be open to anyone. In the third layer we propose a sharing infrastructure that leverages social graphs encapsulated by social networks. We demonstrate how this helps sharing smart things in a straightforward, user-friendly and personal manner, building a Social Web of Things.

Our primary goal in bringing smart things to the Web is to facilitate their integration into composite applications. Just as Web developers and tech-savvies create Web 2.0 mashups (i.e., lightweight, ad-hoc compositions of several services on the Web), they should be able to create applications involving smart things with similar ease. Thus, in the composition layer we introduce physical mashups and propose a software platform, built as an extension of an open-source workflow engine, that offers basic constructs which can be used to build mashup editors for the Web of Things.

Finally, to test our architecture and the proposed tools, we apply them to two types of smart things. First we look at wireless sensor networks, in particular at energy and environmental monitoring sensor nodes. We evaluate the benefits of applying the proposed architecture first empirically by means of several prototypes, then quantitatively by running performance evaluations, and finally qualitatively with the help of several developers who used our frameworks to develop mobile and Web-based applications. Then, to better understand and evaluate how the Web of Things architecture can facilitate the development of real-world aware business applications, we study automatic identification systems and propose a framework for bringing RFID data to the Web and global RFID information systems to the cloud. We evaluate the performance of this framework and illustrate its benefits with several prototypes.

Put together, these contributions materialize into an ecosystem of building-blocks for the Web of Things: a world-wide and interoperable network of smart things on which applications can be easily built, one step closer to bridging the gap between the virtual and physical worlds.


[repost ]AppBackplane – A Framework For Supporting Multiple Application Architectures


Hidden in every computer is a hardware backplane for moving signals around. Hidden in every application are ways of moving messages around and giving code CPU time to process them. Unhiding those capabilities and making them first class facilities for the programmer to control is the idea behind AppBackplane.

This goes directly against the trend of hiding everything from the programmer and doing it all automagically. Which is great, until it doesn’t work. Then it sucks. And the approach of giving the programmer all the power also sucks, until it’s tuned to work together and performance is incredible even under increasing loads. Then it’s great.

These are two different curves going in opposite directions. You need to decide for your application which curve you need to be on.

AppBackplane is an example framework supporting the multiple application architectures we talked about in Beyond Threads And Callbacks. It provides a scheduling system that supports continuous and high loads, meets critical path timing requirements, and schedules fairly amongst priorities; it is relatively easy to program; and it supports higher degrees of parallelism than can be supported with a pure tasking model.

It’s a bit much for simple applications. But if you are looking to go beyond a basic thread per request model and think of an application as a container where a diverse set of components must somehow all share limited resources to accomplish work, then some of the ideas may prove useful.

In case you are still wondering where the name AppBackplane comes from, it’s something I made up a while ago as a takeoff of computer backplane:

A group of electrical connectors in parallel with each other, so that each pin of each connector is linked to the same relative pin of all the other connectors forming a computer bus. It is used as a backbone to connect several printed circuit boards together to make up a complete computer system.

Frameworks Encourage Poor Threading Models

Frameworks often force an application architecture. Your application architecture shouldn’t be determined by a servlet or a database or anything but the needs of your application.

Sure, a single threaded approach may work fine for a stateless web back end. But what if you are doing a real application on the backend like handling air traffic control or a manufacturing process? Or what if creating a reply to a REST request requires integrating the results of dozens of different requests, each with their own state machine?

In these cases a single threaded approach makes no sense because a web page is just one of a thousand different events an application will be handling. All events are not created equal. Threads, queues, priorities, CPU limits, batching, etc are all tools you can use to handle it all.

It has a lot to do with viewing your application performance as a whole, instead of a vertical slice in time. With a multi threaded approach you can create the idea of quality of service. You can have certain work done at higher priorities. You can aggregate work together even though it came in at different times. You can green light high priority traffic. You can reschedule lower priority traffic. You can drop duplicate work. You can limit the CPU usage for work items so you don’t starve other work. You can do lots of things, none of which you can do with a single task that runs until completion.

Programmers commonly talk about upping the number of threads in a thread pool or tuning garbage collection or tuning memory limits as their primary scalability tactics at the application level, but they don’t talk about priorities, deadlock, queuing, latencies, and a lot of other issues to consider when structuring applications.

For example, on an incoming request from a browser you want to set up the TCP/IP connection immediately so that the browser doesn’t have to retry. Retries add load and make the user experience horrible. Instead, you would like to set up the connection immediately, then queue the request and satisfy it later. But if each request is handled by a single thread then you can’t implement this sort of architecture, and your responsiveness will appear horrible as you run out of threads, threads run slow, or threads block other threads on locks, when in fact you have tons of CPU resources available.
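The accept-then-queue idea can be sketched as follows. This is a minimal illustration, not the framework's actual API: `Request`, `RequestQueue`, and `onAccept` are hypothetical names standing in for the accept path and the deferred-service path.

```cpp
#include <cstddef>
#include <functional>
#include <queue>

// Hypothetical sketch: complete the connection setup at once, then just
// queue the request so a worker can satisfy it later.
struct Request { int fd; int priority; };

class RequestQueue {
public:
    void enqueue(const Request& r) { q_.push(r); }      // called from the accept path
    // Satisfy one queued request later, in a worker's context.
    bool serviceOne(const std::function<void(const Request&)>& handler) {
        if (q_.empty()) return false;
        handler(q_.front());
        q_.pop();
        return true;
    }
    std::size_t pending() const { return q_.size(); }
private:
    std::queue<Request> q_;
};

// Accept path: the TCP handshake is done immediately, so the browser never
// retries; the actual work waits in the queue.
inline void onAccept(RequestQueue& q, int fd) {
    q.enqueue(Request{fd, /*priority=*/0});
}
```

The point of the split is that acceptance latency stays constant no matter how backed up the service side gets.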

Based on the source of the request you could assign the work a specific priority or drop it immediately.

You can also do things like assign priorities to different phases of a process. If one phase hits the disk you know that takes a lot of time relative to other in-memory operations. As your application scales you decide if that phase of the work is more important and give it a higher priority, or you can possibly drop work elsewhere because you don’t want a request to fail once it has reached a certain processing point.

Consider if an application must process a UI request, servlet work, and database work. The work could be organized by priority. Maybe you want to handle the UI work first so it queues ahead. But if you keep getting UI work it will starve the other clients, so you give other clients a chance to process work.

With your typical application architecture you don’t have much control over any of this. That’s what we’ll address next, giving you control.


As a note, in this post Hermes refers to a general messaging infrastructure that implements subscribe and publish functionality and remote method invocation. The target environment is C++ so the major compositional structure is a framework based on interfaces and virtual functions. Other languages would no doubt express the same ideas in their own way.

AppBackplane is an application framework that:

  • Supports multiple application architectures. Applications can map to one or a pool of threads, they can be event oriented, or they can be synchronous, or they can use a SEDA pipeline type model. Multiple applications can multiplex over the same queues and threads. Applications can have multiple pieces of work outstanding at the same time.
  • Supports large scale parallelism so peer-to-peer protocols can be used without radical increases in thread counts.
  • Supports static registration of routes and other application services. One goal is for all communication routes to be available after application modules are constructed. In a complicated system software components, or modules, can’t just come up in any order because modules are interdependent. The idea is to create a dependency graph of modules and drive them through a state machine until they reach the state appropriate for the role the application is assigned (start, initializing, primary, secondary, down). If one module depends on another module then it will only be told to move to another state when the modules it is dependent on have reached a desired state. This way all components in a system can come up in harmony. This model can be extended across clusters as well.
  • Uniformly supports Hermes, peer-to-peer, timer, and other frameworks. If you specify a one shot or repeated pattern of timer expirations, for example, these events should be treated no differently than any other event your application consumes. Timer events should always be handled in the appropriate thread context and never the thread of the timer service. And so it should be with every service/framework.
  • Provides a scheduling system that:
    • Supports continuous and high loads.
    • Performs work in priority order.
    • Meets critical path deadlines.
    • Does not cause watch dog timeouts.
    • Throttles by CPU usage. Once a certain amount of CPU is used the application will give up the processor.
    • Has a configurable number of priority queues.
    • Has a quota of operations per queue.
  • Enables throttling of the total number of requests per application.
  • Ensures operations on an object happen in order.
  • Has work for an object performed at the highest priority of its outstanding requests.
  • Ensures work on different objects can happen at the same time.
  • Ensures one work request for an object is completed before another begins.
  • Supports objects modeled as state machines.
  • Keeps data structures simple so they can be in the critical path.
  • Keeps various statistics on operations.

We want to make it easy for applications to select an architecture. It should be relatively simple to switch between 1-1, 1-N, M-N, event, and parallel state machine architectures.

Conceptual Model

In general the AppBackplane approach combines event scheduling with parallel state machines using a thread pool (1 or more threads).

  • An application derives from an AppComponent.
  • AppComponents plug into and share an AppBackplane. This means one or more applications can transparently work over the same AppBackplane. This is how multiple architectures are supported.
  • Work is generally asynchronous so that multiple AppComponents can multiplex over the same AppBackplane. It is not required that work be performed asynchronously, but if it is not then it is impossible to guarantee latency. In the future it would be great if applications could have more control over pre-emptive scheduling without having to rely on the OS.
  • AppBackplane has a configurable number of priority queues, called ready queues, for scheduling objects with work to do. Applications must agree on the meanings of the priorities.
  • Each ready queue has a quota of the number of items that can be serviced at a time before the next priority queue must be serviced.
  • AppBackplane keeps a wait queue for an object waiting for a reply. Once the reply comes in the object is moved to the ready queue.
  • AppBackplane contains one or more threads blocked on a semaphore waiting for work to do. Threads can be added or deleted.
  • AppBackplane is the event scheduler.
  • Each AppObject maintains a queue of work it must perform. Operations are queued to and serviced by objects. This is how we make sure operations on an object are performed in order.
  • An application can work on more than one request simultaneously without creating more threads.
  • All operations are in the form of Actions. This allows operations from different sources to be treated generically.
  • Work is serviced in priority order. It is the object that has the priority.
  • Work for an object is processed in order.
  • Hermes and other infrastructure components are abstracted in such a way that work can be converted to Actions and that registration specifications are complete at construction.
  • Hermes registration and deregistration will happen automatically, similarly for other services.
  • AppComponent and AppBackplane are modules so they can be used in the system state machine that is in charge of bringing up the entire system in dependency order.
  • AppBackplane is an AppComponent so it can have its own operations.
  • AppBackplane is a policy domain. All AppComponents sharing an AppBackplane must agree to abide by the same rules regarding latency, etc.
  • AppBackplane can be a system module and move all its contained components through the system state machine.
  • AppBackplane can load balance between available objects implementing the same operations.
  • Each event is timed to gauge how long it took. This data is used to calculate the percentage of CPU used per second, which is used to limit the amount of CPU used by the backplane in a second.
  • Events are mapped to objects which are queued to ready queues in the thread of the caller. If an event cannot be dispatched quickly, by message type or other data in the message, then the event should be queued to a “digester” object at a high priority so the event can be correctly prioritized based on more complicated calculations.
  • The scheduling algorithm supports multiple priorities so high priority work gets processed first, quotas at each priority for fairness, and CPU limitation.
  • A warning alert is issued if any event takes longer than a specified threshold to process. This can be picked up in the logs or a dashboard so programmers can take a look as to why an operation was taking longer than expected.
  • Work is scheduled by the backplane so we can keep a lot of statistics on operations.
  • Many parallel activities can happen at the same time without a correspondingly large increase in thread counts.
  • When the AppObject is in the wait queue waiting for reply, more requests can be queued to that AppObject.
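The core of the conceptual model can be sketched in a few dozen lines. This is a simplified illustration under stated assumptions, not the original framework: `AppObject` and `AppBackplane` mirror the names in the text, but the bodies here are invented, and scheduling is reduced to a single-threaded drain of the priority ready queues.

```cpp
#include <deque>
#include <functional>
#include <utility>
#include <vector>

// Sketch only: an operation is just a callable here, standing in for Actions.
using Action = std::function<void()>;

class AppObject {
public:
    explicit AppObject(int priority) : priority_(priority) {}
    void queue(Action a) { work_.push_back(std::move(a)); }  // per-object work queue
    bool runOne() {                                          // operations run in order
        if (work_.empty()) return false;
        Action a = std::move(work_.front());
        work_.pop_front();
        a();
        return true;
    }
    int priority() const { return priority_; }
private:
    int priority_;
    std::deque<Action> work_;
};

class AppBackplane {
public:
    explicit AppBackplane(int numPriorities) : ready_(numPriorities) {}
    void schedule(AppObject* o) { ready_[o->priority()].push_back(o); }
    // Drain ready queues in priority order (0 is highest), ignoring quotas
    // and CPU limits for brevity.
    int runUntilIdle() {
        int executed = 0;
        for (auto& rq : ready_) {
            while (!rq.empty()) {
                AppObject* o = rq.front();
                rq.pop_front();
                while (o->runOne()) ++executed;
            }
        }
        return executed;
    }
private:
    std::vector<std::deque<AppObject*>> ready_;
};
```

Even this toy version shows the key property: multiple objects (and so multiple AppComponents) multiplex over one scheduler, and an object's priority, not a thread's, decides what runs first.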

The basic idea is to represent an application by a class that separates application behaviour from threading. One or more applications can then be mapped to a single thread or a thread pool.

If you want an event architecture then all applications can share a single thread.

If you want N applications to share one thread then you can instantiate N applications, install them into an AppBackplane, and they should all multiplex properly over the single thread.

If you want N applications to share M threads then you can create an AppBackplane for that as well.

Locking is application specific. An application has to know its architecture and lock appropriately.

Increased Parallelism Without Large Thread Count Increases

One of the key motivations behind AppBackplane is to be able to support higher degrees of parallelism without radical increases in thread counts. This allows us to use more peer-to-peer protocols without worrying about how they impact threading.

For example, if multiple applications on a node want to talk with 30 other nodes using a peer-to-peer approach, then 30 threads per application will be used. This adds up to a lot of threads in the end.

Using the backplane all the protocols can be managed with far fewer threads which means we can expand the use of peer-to-peer protocols.

Bounding Process Times

If an application is using an event or parallel state machine model it is responsible for creating bounded processing times.

This means that whoever is doing work must give up the CPU so as to satisfy the latency requirements of the backplane. If, for example, an application expects a max scheduling latency of 5ms then you should not do a synchronous download of 1000 records from the database.
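One common way to meet such a latency bound is to break long work into bounded slices and requeue the remainder, a cooperative-chunking sketch rather than anything from the original framework; `ChunkedJob` and `drain` are hypothetical names, and the 1000-record download is reduced to generating integers.

```cpp
#include <vector>

// Sketch: instead of synchronously fetching 1000 records in one call,
// process a bounded batch per scheduling slice and requeue the rest.
class ChunkedJob {
public:
    ChunkedJob(int total, int chunk) : remaining_(total), chunk_(chunk) {}
    // Process one bounded slice; returns true if the job should be requeued.
    bool runSlice(std::vector<int>& out) {
        int n = remaining_ < chunk_ ? remaining_ : chunk_;
        for (int i = 0; i < n; ++i) out.push_back(remaining_ - i);
        remaining_ -= n;
        return remaining_ > 0;
    }
private:
    int remaining_, chunk_;
};

// A tiny run loop: each slice stays bounded, so other work can interleave
// between slices and the backplane's latency contract holds.
inline int drain(ChunkedJob& job, std::vector<int>& out) {
    int slices = 0;
    bool more = true;
    while (more) { more = job.runSlice(out); ++slices; }
    return slices;
}
```

In a real backplane the loop body would be a requeue, not a `while`, so higher-priority objects get the CPU between slices.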

How Modules Are Supported

A Module is a way of looking at a slice of software as a unit that can have dependencies on other modules and must be brought up in some sort of dependency order. The state machine and complexity of the dependencies depends on your application.

Whatever your approach, the backplane needs to work within the module infrastructure. Applications need to be dependent on each other yet still work over the same backplane.

We need AppComponents installed into the backplane to be driven by the backplane in accordance with the same state machine that brings up the entire application.

Dispatching Work To Ready Queues

Requests like Hermes messages come in from a thread different than the thread in the backplane that will execute the command. A dispatch step is required to dispatch the work to an object and then to a ready queue. Dispatch happens in the thread of the caller.

Dispatching in the caller’s thread saves an intermediate queue and context switch.

If the dispatch can happen quickly then all is right with the world. Dispatching can be fast if it is based on the message type or data in the request.

Sometimes dispatching is a much more complicated process which could block the caller too long.

The strategy for getting around blocking the caller is to queue the operation to a high priority “digester” object.

The digester object runs in the backplane and can take its time figuring out how to dispatch the work.
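The two dispatch paths can be sketched like this. It is an illustration of the pattern, not the original code: `Msg`, `Digester`, and the payload-based classification rule are all invented stand-ins for whatever slow calculation the real dispatch would need.

```cpp
#include <deque>
#include <functional>

// Hypothetical message type; 'type' is cheap to inspect, 'payload' is not.
struct Msg { int type; int payload; };

class Digester {
public:
    void enqueue(const Msg& m) { pending_.push_back(m); }
    // Runs inside the backplane, so it can take its time classifying.
    int classifyAll(const std::function<void(const Msg&, int /*priority*/)>& route) {
        int n = 0;
        while (!pending_.empty()) {
            Msg m = pending_.front();
            pending_.pop_front();
            int prio = (m.payload > 100) ? 0 : 2;  // stand-in for a slow calculation
            route(m, prio);
            ++n;
        }
        return n;
    }
private:
    std::deque<Msg> pending_;
};

// Caller-side dispatch: cheap cases route directly in the caller's thread,
// hard cases are deferred to the digester (returns -1 to mean "deferred").
inline int dispatch(const Msg& m, Digester& d) {
    if (m.type == 1) return 0;   // known type: dispatch at priority 0 immediately
    d.enqueue(m);
    return -1;
}
```

The caller thus never blocks longer than a type check, while complicated prioritization still happens, just later and in the backplane's time budget.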

Objects Considered State Machines

Objects may or may not be implemented as FSMs. An object’s operations may all be stateless, in which case it doesn’t matter. Using a state machine model makes it easier to deal with asynchronous logic.


Parallelism is by object and is made possible by asynchronous operation. This means you can have as many parallel operations as there are objects (subject to flow control limits). The idea is:

  • Operations are queued to objects.
  • Any blocking operation is implemented asynchronously allowing other operations to be performed. Latency is dependent on cooperative multitasking.
  • The reply to an operation drives the state machine for the object forward.

An application can have N objects capable of performing the same operation. The AppBackplane can load balance between available objects.

Or if an operation is stateless then one object can handle requests and replies in any order so the object does not need to be multiplied.

It may still be necessary to be able to interrupt low priority work with a flag when high priority work comes in. It is not easy to decompose iterators to an async model. It may be easier to interrupt a loop in the middle.

Per Object Work Queue

Without a per object queue we won’t be able to guarantee that work performed on an object is performed in order.

The Actor queue fails because eventually you get to requests in the queue for an object that is already working on a request. Where do the requests go then?

In an async system a reply is just another message that needs to be fed to the FSM. There are also other requests for the same object in the queue. If the reply is just appended to the queue then that probably screws up the state machine because it’s not expecting a new request. So the reply must be handled so that it is delivered before any new requests. How this should be done isn’t obvious.
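One simple resolution is a per-object deque where replies go to the front and new requests to the back. This is a sketch of that ordering rule only, with hypothetical names and string stand-ins for real messages, not a claim about how the original framework implemented it.

```cpp
#include <deque>
#include <string>
#include <vector>

// Per-object work queue: the reply for the in-flight operation must reach
// the FSM before any newer requests, so it jumps the line.
class ObjectQueue {
public:
    void queueRequest(const std::string& op) { q_.push_back(op); }
    void queueReply(const std::string& op)   { q_.push_front(op); }
    // Drain in delivery order (for inspection in this sketch).
    std::vector<std::string> drain() {
        std::vector<std::string> out(q_.begin(), q_.end());
        q_.clear();
        return out;
    }
private:
    std::deque<std::string> q_;
};
```

Because at most one reply can be outstanding per object (see Handling Replies below), a single front slot is enough; no reply can ever be displaced by another reply.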

All Operations Are Actions

The universal model for operations becomes the Action. The AppBackplane framework derives a base class called AppAction from Action for use in the framework.

All operations must be converted to AppActions by some means.

The reasoning is:

  • We need a generic way to queue operations and for the work scheduler to invoke operations.
  • Through the use of a base class an Action can represent any operation because all the details can be stuck in the derived class.
  • Actions can be put in a memory pool so they don’t have to be dynamically allocated.

Requests are converted to actions in the thread queueing the request. The actions are added to the object that is supposed to handle them. The Action/conversion approach means that the type of a message only has to be recovered once; from then on the correct type is used. Parallelism can be implemented by the application having more than one object for a particular service.
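The Action idea reduces to a virtual base class the scheduler can invoke blindly. The following is a minimal sketch: `Action::Doit` echoes the text, but `DbAddAction` and `runAll` are invented examples, and the "database" is just a vector of strings.

```cpp
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Generic operation the scheduler can queue and invoke without knowing
// any details; everything specific lives in the derived class.
class Action {
public:
    virtual ~Action() = default;
    virtual void Doit() = 0;  // "do whatever you are supposed to do"
};

// Typed once, at conversion time in the caller's thread; the scheduler
// never re-inspects the message type.
class DbAddAction : public Action {
public:
    DbAddAction(std::vector<std::string>& db, std::string key)
        : db_(db), key_(std::move(key)) {}
    void Doit() override { db_.push_back(key_); }
private:
    std::vector<std::string>& db_;
    std::string key_;
};

// Scheduler side: every operation looks the same.
inline void runAll(std::vector<std::unique_ptr<Action>>& work) {
    for (auto& a : work) a->Doit();
    work.clear();
}
```

A real framework would draw these from a memory pool rather than `make_unique`, per the bullet above, but the invocation contract is the same.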

Ready And Wait Queues

The ready and wait queues take the place of the single Actor queue. AppBackplane is the event scheduler in that it decides what work to do next. AppComponents are multiplexed over the AppBackplane.

  • An object cannot be deleted if it is in the wait queue.
  • Aggregation is not performed in the queues. The correct place for aggregation is in the senders. Almost every request expects a reply, so aggregation usually won’t work.
  • The total number of requests is counted so that we can maintain a limit.
  • The number of concurrent requests is limited.

Handling Replies

An object can have only one outstanding reply, so we don’t need dynamic memory to keep track of an arbitrary number of outstanding requests.

If an object is expecting a reply the object is put on the wait queue until the reply comes in. The reply is queued first in the object’s work queue. The object is then scheduled like any other object with work to do.

Client Constraints

Clients cannot send multiple simultaneous requests and expect any particular ordering. Order is maintained by a client sending one request to a service at a time and not duplicating requests.

Steps To Using AppBackplane

  • Create a backplane to host components that agree to share the same policies. Policies include:
    • Number of priorities.
    • Quotas per priority.
    • CPU limit.
    • Latency policies. What is the longest an application should process work?
    • Lock policies between components.
    • Number of threads in the thread pool. This addresses lock policies.
  • For each application create a class derived from AppComponent and install it into the backplane.
  • Determine how your application can be decomposed asynchronously.
  • Install the backplane and/or components in the system modules if necessary.
  • In each component:
    • Implement virtual methods.
    • Create AppActions for implementing all operations.
    • Decide which objects implement which operations.
    • Decide which operations are implemented in the AppAction or in the object.
    • Decide if objects represent state machines.

Class Structure


            +-- ISA ----[Module]
[AppBackplane]--HAS N ----[AppComponent]
            +-- HAS 1 ----[AppScheduler]

[AppScheduler]-- HAS N ----[AppWorkerThread]
          | |
          | +-- HAS N ----[WorkQueue]
          +-- HAS 1 ----[WorkQueue]

            +-- ISA ----[Module]
[AppComponent]-- HAS N ----[AppObject]
            +-- HAS N ----[AppEndpoint]

[AppWorkerThread]-- ISA ----[Task]

[AppObject]-- HAS N ----[AppAction]

[AppAction]-- ISA ----[Action]


[MsgHandlerEndpoint]-- ISA ----[AppEndpoint,MsgHandler]

[HermesImplement]-- ISA ----[MsgHandlerEndpoint]

[DbAdd]-- ISA ----[HermesImplement]

Hermes Request Processing Example

This is an example of how application requests over Hermes are handled using AppBackplane. Note, there’s a lot of adapter cruft to integrate an external messaging layer into AppBackplane. But it’s quite common to use a third party messaging library, so we have to have a way to take a blob of bytes over the wire and turn it into a typed message and then determine where the message should be processed. With a messaging layer that knows about AppBackplane it can be much cleaner.

  • An application has created its AppComponent. The routes it supports are registered during construction.
  • Routes derive from MsgHandlerEndpoint which is a MsgHandler. It supplies the namespace and in what system states the operations should be registered.
  • At the appropriate system state the backplane will register with Hermes.
  • Hermes requests for the operation are routed by Hermes.
  • MsgHandlerEndpoint calls CreateImplementationAction so the derived MsgHandlerEndpoint class can create an action that knows how to implement the operation.
  • During the CreateImplementationAction process the MsgHandlerEndpoint must have figured out which object the action should be queued to. It will do this using information from the request or it will have predetermined objects already in mind. The AppComponent and AppBackplane are available.
  • Objects capable of processing AppActions must derive from AppObject.
  • The backplane is asked to schedule the AppAction.
  • The AppAction is queued to the AppObject specified in the AppAction.
  • The AppObject is put on the priority queue specified by the AppObject.
  • An AppWorkerThread is allocated from the idle pool.
  • The AppWorkerThread thread priority is set to that indicated by the AppObject.
  • The AppObject is given to the AppWorkerThread when it gets a chance to run.
  • If a worker is not available the work will be scheduled later.
  • The AppWorkerThread is put on the active pool.
  • The first AppAction in the queue is dequeued and the action’s Doit method is invoked. The Doit method is the action’s way of saying do whatever you are supposed to do.
  • How the action accomplishes its task is up to the action. All the logic for carrying out the operation may be in the action. The action may invoke an event on a state machine and let the object take over from there. The operation may:
    • do its work and reply immediately
    • do some work and block waiting on an asynchronous reply
    • perform a synchronous call
  • If the object needs to send out more requests it will send the request and then call DoneWaitForReply so it will be put in the wait queue waiting for a reply.
  • If the object is done it calls Hermes’s Reply to send the Reply back to the sender. Then it calls DoneWaitForMoreWork to schedule it for more work.
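The object-side portion of this walkthrough is a small state machine. The sketch below models just those transitions; the state names and `doneWaitForReply`/`doneWaitForMoreWork` mirror the calls in the steps above, but the enum and class are illustrative, not the framework's real types.

```cpp
// Lifecycle of an AppObject through one Hermes request, per the steps above:
// Ready (on a ready queue) -> Running (a worker has it) -> either Waiting
// (sent a request, parked on the wait queue) or back to Ready.
enum class ObjState { Ready, Running, Waiting };

class LifecycleObject {
public:
    ObjState state() const { return state_; }
    void start()               { state_ = ObjState::Running; }  // worker dequeues us
    void doneWaitForReply()    { state_ = ObjState::Waiting; }  // sent a downstream request
    void replyArrived()        { state_ = ObjState::Ready; }    // reply queued, reschedule
    void doneWaitForMoreWork() { state_ = ObjState::Ready; }    // replied to sender
private:
    ObjState state_ = ObjState::Ready;
};
```

Keeping the transitions this explicit is what lets the backplane park waiting objects without tying up a worker thread for the duration of the downstream call.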

Scheduling Algorithm

AppBackplane implements a scheduling layer on top of OS thread scheduling. An extra scheduling layer is necessary to handle fairness, low latency for high priority work, and throttling in a high continuous load environment.

The scheduling algorithm is based on:

  • integration period – a period, 1 second by default, over which CPU utilization is calculated. Quotas are replenished. When this timer fires we call it a “tick.”
  • priority queues – work is performed in priority order. Priorities are 0..N where 0 is the highest priority. The number of queues is configurable.
  • thread pool – more than one thread can be active at a time. The thread is scheduled and run at the priority set by the work being done. The number of threads in the pool is configurable. Threads can be dynamically added and deleted.
  • CPU limitation – each backplane can use only so much of the CPU in an integration period. If the period is exceeded work is stopped. Work is started again on the next tick.
  • quotas – only so much work at a priority is processed at a time. Quotas are settable by the application. By default quotas are assigned using the following rules:
    • The highest priority has an unlimited quota.
    • The next highest priority has a quota of 100.
    • Every priority below that has a quota of 1/2 the next higher priority’s quota, down to a floor of 12.
  • priority check – after each event is processed the scheduler starts again at the highest priority that has not used its entire quota. The idea is that higher priority work may have arrived while lower priority work was being processed. A priority may also run out of work before its quota is used, in which case more work can be done at a lower priority.
  • virtual tick – If all the CPU hasn’t been used and all quotas at all priorities have been filled, then a virtual integration period is triggered that resets all quotas. This allows work to be done until the CPU usage limit is reached. Otherwise the backplane could stop processing work when it had most of its CPU budget remaining. The CPU usage is not reset on a virtual tick, it is only reset on the real tick.
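The quota rules and the priority check can be sketched in a few lines. This is a hypothetical simplification of the algorithm described above (function names and data shapes are my own); it omits ticks, threads, and CPU limits to show just the quota math and the restart-at-highest-priority scan.

```python
def default_quotas(num_priorities):
    """Default quota rules: priority 0 unlimited, priority 1 gets 100,
    each lower priority gets half the previous, floored at 12."""
    quotas = []
    for p in range(num_priorities):
        if p == 0:
            quotas.append(float("inf"))
        elif p == 1:
            quotas.append(100)
        else:
            quotas.append(max(quotas[-1] // 2, 12))
    return quotas

def pick_next(queues, used, quotas):
    """Priority check: scan from the highest priority whose quota is
    not exhausted and which still has work queued."""
    for p, q in enumerate(queues):
        if q and used[p] < quotas[p]:
            used[p] += 1
            return q.pop(0)
    return None   # all quotas filled or no work: wait for the next tick
```

On a real or virtual tick the `used` counters would be reset, replenishing every quota.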

Backplanes Are Not Organized By Priority

It is tempting to think of backplanes as being organized by thread priority. This is not really the case. Any application has a mix of work that can all run at different thread priorities.

You probably don’t want to splat an application across different backplanes, though technically it could be done.

The reason is backplanes are more of a shared data domain where latency contracts are expected to be followed. Different backplanes won’t share the same latency policies, lock policies, or ready queue priorities and quotas.

Thus backplanes can’t be thought of as cooperating with each other.

CPU Throttling

CPU throttling is handled by the scheduler.

Request Throttling

Requests are throttled via a per-component operation limit. When a component reaches its limit and a new operation comes in, the component is asked to make room for the new work; if the component can’t make room then the request is failed with an exception.

The client can use the exception as a form of back pressure so that it knows to wait a while before trying the request again.
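A minimal sketch of this throttling scheme, assuming a simple in-memory pending list (the class and exception names are hypothetical):

```python
class ComponentBusy(Exception):
    """Raised as back pressure when a component cannot make room."""

class Component:
    def __init__(self, limit):
        self.limit = limit
        self.pending = []

    def make_room(self):
        # A real component might shed or coalesce low priority work here.
        return False

    def submit(self, op):
        if len(self.pending) >= self.limit and not self.make_room():
            raise ComponentBusy("retry later")   # back pressure signal
        self.pending.append(op)

c = Component(limit=2)
c.submit("op1")
c.submit("op2")
try:
    c.submit("op3")
    accepted = True
except ComponentBusy:
    accepted = False   # client backs off before retrying
```

The exception is the back pressure: the client sees it and knows to wait before resubmitting rather than piling on more work.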

Priority Inheritance For Low Priority Work

Low priority work should have its priority raised to that of the highest work outstanding when the low priority work is blocking higher priority work that could execute. The reasoning is…

Work is assigned to threads to execute at a particular priority.

Thread priorities are global. We must assign thread priorities on a system wide basis. There doesn’t seem to be a way around this.

Each backplane has priority based ready queues that are quota controlled. High priority work that hasn’t met its quota is executed. By priority we are talking ready queue priority, not thread priority. Thread priority is determined by the work.

Ready queue priorities and quotas are primarily determined by the backplane. They are ultimately constrained by a CPU limit on the backplane.

Lower priority work is scheduled when there is no higher priority work or higher priority work has met its quota.

When lower priority work has been assigned to a thread higher priority work can come in.

The lower priority work’s task may not be scheduled because a higher priority task is executing elsewhere in the system. This means the lower priority work is not scheduled and never gets a chance to run. In fact, it can be starved.

The lower priority work is causing the higher priority work not to execute, even if the higher priority work would run at a task priority higher than the currently running task that is blocking the lower priority work from running.

By upping the task priority of the lower priority work’s task to that of the higher priority work’s task priority we give the lower priority work a chance to run so it can complete and let the higher priority work run.

The lower priority work can run to completion or it can use global flags to know if it should exit its function as soon as possible so the higher priority work can run.

The priority inheritance works across backplanes, so high priority work in any backplane will not block on lower priority work in any other backplane. The “higher priority work is present” flag is global.

Even if the lower priority work doesn’t give up the critical section immediately, just giving it a chance to run means the higher priority work can get scheduled sooner.
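The global-flag cooperation described above can be sketched like this. The flag name and function are hypothetical; the point is that boosted low priority work polls a shared flag and exits its function as soon as possible so the higher priority work can run.

```python
import threading

# Global "higher priority work is present" flag, as described in the text.
higher_priority_work_present = threading.Event()

def low_priority_work(steps):
    """Do up to `steps` units of work, bailing out early if higher
    priority work appears so it can be scheduled sooner."""
    done = 0
    for _ in range(steps):
        if higher_priority_work_present.is_set():
            break          # give up voluntarily; we'll resume later
        done += 1          # one unit of low priority work
    return done
```

With the flag clear the work runs to completion; with it set, the function exits on the first check, which is exactly the "run to the next check point, then yield" behavior the text describes.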

Backplane Doesn’t Mean No Mutexes

Mutexes can still be used within a backplane, but they should not be shared with threads outside the backplane because there can be no guarantees about how they are used.

Applications inside a backplane should be able to share mutexes and behave according to a shared expectation of how long they will be taken and when they will be taken.

Fair sharing is implemented by a max CPU usage being assigned to each backplane.


Yep, It’s Complicated.

A lot of the complication has to do with mapping different work input sources (messages, timers) into a common framework that can execute in an AppBackplane. Some form of this has to be done unless you really want to have multiple threads running through your code without any discipline other than locks. That’s a disaster waiting to happen.

Other complications are the typical complexity you have with decoding and routing messages to handlers. This is always necessary however.

There’s complication around thinking about dependencies between components and the different states your application can be in, like start, initializing, primary, secondary, down. A lot of this is usually hidden in an application with lots of hacks to get around the weird dependencies that grow over time. I think it’s better to make dependencies a first class component of your application architecture so you can gain some power out of it instead of it being a constant source of bugs.

And then there’s the actual scheduling, which is complicated, but it does allow you to tune what work gets done when, which is the point of it all as you start trying to handle more and more load while minimizing latencies and guaranteeing SLAs.

It’s something to think about anyway.

[repost ]Beyond Threads And Callbacks – Application Architecture Pros And Cons


There’s not a lot of talk about application architectures at the process level. You have your threads, pools of threads, and you have your callback models. That’s about it. Languages/frameworks making a virtue out of simple models, like Go and Erlang, do so at the price of control. It’s difficult to make a low latency, well conditioned application when a powerful tool, like work scheduling, is taken out of the hands of the programmer.

But that’s not all there is my friend. We’ll dive into different ways an application can be composed across threads of control.

Your favorite language may not give you access to all the capabilities we are going to talk about, but lately there has been a sort of revival in considering performance important, especially for controlling latency variance, so I think it’s time to talk about these kinds of issues. When everything was done in a thread from a web server’s thread pool, none of these issues really mattered. But now that developers are creating sophisticated networks of services, how you structure your application really does matter.

In the spirit of Level Scalability Solutions – The Conditioning Collection, we are going to talk about the pros and cons of some application architectures with an eye to making sure high priority work gets done with low enough latency, while dealing with deadlock and priority inheritance, and while making the application developer’s life as simple as possible.

Forces To Balance

Here are some of the forces we are trying to harmonize when selecting an application architecture.

  • Scheduling High Priority Work – If high priority work comes in while lower priority work has the CPU, how do we make sure the high priority work gets handled in a bounded period of time?
  • Priority Inheritance – is a way of getting out of priority inversion problems by assuring that a task which holds a resource will execute at the priority of the highest priority task blocked on that resource. In practice it means code you thought wasn’t important can prevent other higher priority work from getting serviced.
  • Dead Lock – What you get when you use locks to protect data and you have a cycle in your dependencies. Deadlock happens when all these conditions are true:
    • Blocking shared resources – a shared resource is protected by some mechanism, typically a lock.
    • No pre-emption – a process or thread cannot force another process or thread to drop a shared resource it is holding.
    • Acquiring while holding – resources are acquired while holding locks.
    • Circular wait – a cycle (schedule) exists such that the above three conditions all hold for a set of processes or threads accessing a shared resource.
  • OS Latency – The gap between when a time-sensitive task needs to run and when the OS actually schedules it. Some factors: Timer Resolution, Scheduling Jitter, Non-Preemptable Portions.
  • Shared Locks – When there’s a shared lock anywhere we are back to potentially poor scheduling because the high priority task must wait on the low priority task. This requires anyone who shares the lock to know there is high priority work and never to take more time than the worst case performance needed for the high priority work. Of course, this changes dynamically as code changes and with different load scenarios.
  • Ease of Programming Model – Programmers shouldn’t be burdened with complicated hard to program schemes. Which I realize may invalidate everything we are talking about.
  • Robustness of Programming Model – It should be hard for programmers to break the system through dead locks, poor performance, and other problems.
  • Minimizing Lock Scope – Locks over operations that take an unexpectedly long time can really kill the performance of an entire system as many threads will be blocked or not get enough CPU. They’ll stop making progress on their work, which is not what we want.
  • Flow Control – The general process of keeping message chatter to a minimum through back pressure on senders, removing retries, and other techniques.
  • Reasonable Use of Memory – Memory management is a hidden killer in almost every long lived program. Over time memory becomes fragmented with lots of allocations and deallocations and everything becomes slower because of it. Garbage collection doesn’t save you here. Message serialization/deserialization is often a big culprit. And calls to memory allocation will drop you into shared locks, which is a killer when components are cycling through a lot of memory.
  • Reasonable Use of Tasking – Lots of threads, depending on the OS scheduler, can cause high scheduling overhead as naive scheduling algorithms don’t perform well as the number of threads increases. Lots of threads can also mean a lot of memory usage because each thread has allocated to it the maximum stack space. This is where languages like Go have a big lead over their C based cousins, as Go can minimize stack space while keeping many threads of control.
  • Time Slice – The period of time for which a process is allowed to run uninterrupted in a pre-emptive multitasking operating system. Generally you want your program to use its entire time slice and not do anything that gives up control of the CPU while you have it.
  • Starvation – Priority is a harsh mistress. At first blush it’s easy to think priority solves all your problems, but like money, mo’ priority, mo’ problems. The biggest problem is the surprise you feel when you see work in your system not getting done. You may suffer from deadlock or priority inversion or large lock scope, but after bugs are accounted for, it may simply be that you have higher priority tasks that are using up all the CPU time, so even medium priority tasks in your system end up not getting serviced. Higher priority tasks may have endless streams of work to do because of poor flow control and other lapses of messaging hygiene, so you use techniques like batching to reduce CPU usage, but sometimes there’s just a lot to do. Lower priority tasks need some CPU time so their state machines can make progress. Otherwise cascading failures can start as timeouts pile up. Another problem is that fault windows widen as it takes longer to process requests.

So you look at all these issues and think, well, there’s a good reason not to expose all this complexity to programmers, and that’s true. Playing it safe can be a good thing, but if you want to make full use of all your system resources, playing it safe won’t work.

On Threading

Threads are somewhat of a religious issue. A lot of people dislike threads because multithreaded programming is difficult. Which is true.

So why do we use threads at all?

  • latency. In latency sensitive systems some work must be performed with very low latency. In a run to completion approach where code runs as long as it wants, latency cannot be guaranteed. Yield, like hope, is not a plan.
  • priority. At a system level we want more important work to be performed over lower priority work. Threads provide this ability because they can be preempted.
  • fairness. Preemption allows multiple threads to make progress on work at the same time. One thread doesn’t starve everyone at the same priority.
  • policy domain. A thread can be a domain for policies that you may or may not want applied to different applications. Examples are: priority, floating point safety, stack space, thread local storage, blocking, memory allocation.

The key to safe thread usage is not using shared data structures protected by locks. It is lock protected shared data structures that cause latency issues, deadlock, and priority inversion. It is nearly impossible for a human to make a correct system when any thread can access shared state at any time.

Threading can also be bad when thread scheduling is slow. On real-time systems and modern Linux systems, threads are usually very efficiently scheduled, so it is not a huge concern. Where 10,000 threads would have been unthinkable in the past, it’s not a bad approach anymore.

Are Locks OK?

If locking is so dangerous, should locks be used at all? An application that uses multiple threads and has a lock usable just within the application may not, if care is taken, suffer from the same problems as when locks are arbitrarily used from different application threads.

A single application can:

  • guarantee the amount of time spent blocked on the lock
  • prevent deadlock because it probably won’t do work in different threads
  • prevent priority inversion

When different components share a process space nothing can be guaranteed. Even bringing in code from other groups can ruin everything. A process has many hidden variables that different code components communicate through indirectly, even if they don’t have any obvious direct dependencies. Think of any shared resource: memory management, timers, socket buffers, kernel locks, and available CPU time at different priorities. It gets real complex real fast.

Evented Model

This is “good enough” concurrency. Common examples are node.js and your browser. In an evented architecture applications share a single threaded container where work runs to completion. Thus it’s a cooperative tasking model, which makes for an asynchronous application architecture. Your application is essentially a collection of functions that get called when events happen. Locks and shared state aren’t a problem as your code can’t be preempted. It’s a simple approach and can perform very well if applications are well behaved, especially if, as in most web applications, they spend a lot of time blocking on IO.
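The run-to-completion container can be reduced to a toy loop like the one below (a deliberately minimal sketch, not how node.js or a browser is actually implemented): callbacks are queued, and each one runs to completion before the next is dispatched.

```python
from collections import deque

class EventLoop:
    def __init__(self):
        self.queue = deque()
        self.log = []

    def post(self, callback):
        self.queue.append(callback)

    def run(self):
        while self.queue:
            cb = self.queue.popleft()
            cb()   # runs to completion; nothing can preempt it

loop = EventLoop()
loop.post(lambda: loop.log.append("first"))
loop.post(lambda: loop.log.append("second"))
loop.run()
```

Since a callback can never be interrupted, no locks are needed, but a callback that computes for a long time stalls everything behind it, which is exactly the latency weakness discussed below.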

Erlang, for example, supports a preemptive model where work in one process can be suspended and another process can start working. Internally Erlang uses async IO and its calls are non-blocking so the Erlang kernel doesn’t interfere with process scheduling at the application level.

Go is like Erlang in that it has lightweight processes, but it’s like node.js in that preemptive scheduling is not supported. A goroutine runs to completion, so it never has to give up the CPU and if on a single CPU it’s expected application code will insert yields to give the scheduler a chance to run.

A downside of the evented model, and even Erlang, is there’s no concept of priority. When a high priority input comes in there’s no way to bound the response latency.

Without preemption, CPU heavy workloads, as is common in a service architecture, can starve other work from making progress. This increases latency variance which is how long tails are created at the architecture level. Bounding latency variance requires a more sophisticated application architecture.

Thread Model

In this model applications always operate out of others’ threads. For example, a web request is served out of a thread from a web server thread pool. Your application state is shared amongst all these threads and there’s no predictable order of when threads run and what work they are doing. Locks are a must and shared state is the rule. Usually all threads run at the same priority.

Dead lock in this model is nearly impossible to prevent as application complexity increases. You can never be sure what code will access what locks and what threads are involved in these locks.

Latency can’t be guaranteed because you don’t know what operations are being performed in your execution path. There’s no preemption. Priority inheritance becomes a big problem. Simply cranking up the number of threads for better performance can lead to very unpredictable behavior.

Actor Model  (1 – 1)

In the Actor model all work for an application is executed in a single thread context. One application maps to one thread, thus it is 1-1. Each Actor has a queue and blocks on a semaphore waiting for messages. When a message is posted on the queue the Actor unblocks and is scheduled to run at its priority level. The Actor then processes messages off the queue until it runs out of messages, gives up the processor by blocking, or is preempted.
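A minimal 1-1 Actor sketch, assuming a blocking queue as the mailbox and a `None` sentinel for shutdown (both are illustration choices, not part of any particular Actor framework):

```python
import queue
import threading

class Actor:
    def __init__(self):
        self.inbox = queue.Queue()
        self.processed = []
        self.thread = threading.Thread(target=self._run)
        self.thread.start()

    def _run(self):
        while True:
            msg = self.inbox.get()   # blocks waiting for messages
            if msg is None:          # sentinel: shut the Actor down
                return
            self.processed.append(msg)   # handle one message to completion

    def send(self, msg):
        self.inbox.put(msg)

a = Actor()
a.send("hello")
a.send("world")
a.send(None)
a.thread.join()
```

All state lives inside the Actor's single thread, so no locks are needed; the cost, as noted below, is that a message being processed can't be interrupted by a higher priority one.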

There’s no shared state in this model so it eliminates locks, which eliminates priority inversion and deadlock issues. But we still have the latency issue for high priority work. As an Actor is processing a message it can’t be interrupted if a higher priority message arrives.

We can expect an Actor to be asynchronous so it doesn’t block the next message from processing for long.

Or we can introduce the idea of cooperation by having a priority associated with each message. If a higher priority message comes in on the queue then a thread variable “GiveUpWork” is set. Code in a sensitive place would check this flag at particular points and stop what it is doing so the higher priority work can be scheduled. An implication is to change the Actor’s priority to match the highest priority work in its queue. This is another form of cooperative multitasking. And it’s ugly. So we usually don’t do it this way.

Actor With Thread Pool (1-N)

Like the Actor model, but requests are served by a thread pool instead of just one thread. Useful in cases where work items are independent of each other. Locks are required if there’s any shared state, but as the lock is completely under control of the application the chances for deadlock are contained.

Actor Arbitrary Thread Model

A hybrid where an application has an Actor, so it has a thread, but other threads access state in the Actor and even call Actor methods directly. Shared locks are necessary to protect data. In practice it’s easy for this architecture to evolve because not everyone is disciplined about using just message passing between Actors.

This is in some ways the worst of all possible worlds. The Actor model is supposed to be a safe programming model, but instead it degenerates into a thread and lock nightmare.

Multiple Actor Model

Work is routed to multiple Actors. Parallelism is added by adding an Actor per some sort of category of work. For example, low priority work can go to one Actor. High priority work can go to another. With this approach work happens at the expected priority and another queue can preempt a lower priority queue. There’s no attempt at fairness however and a work item in each queue must cooperate and give up the processor as fast as possible.

A lock is still necessary because an application shares state across Actors, but the lock is controlled by the application, which is totally responsible for locking behaviour.

The disadvantage of this architecture is that it is more complex for applications to create. Message order issues also arise as parallel work is spread across multiple Actors.

Parallel State Machine Model

In this model work is represented by events. Events are queued up to one or more threads. Applications don’t have their own thread. All application work is queued up in event threads. Applications are usually asynchronous. The events know what application behavior to execute. In the Actor model an Actor knows what behavior to execute based on the message type and the current state.

Deadlock and locking issues still apply. One thread is safer, but then we have priority and latency issues.

This model is attractive when most of the work is of approximately equal and small size. This model has been used very well in state machine oriented computation (communicating finite state machines) where state transitions require deterministic and small times. Breaking down your system to make it suitable for this model is a big job. Paybacks are reduced complexity of implementation. Event priorities can be used in this model. This model is usually based on a run to completion kind of scheduling.

If latency is an important issue then this model can’t be used, at least not everywhere, as you’ll never be able to guarantee latency. Attempts to guarantee latency basically require writing another level of scheduling.

Actor + Parallel State Machine Model

If synchronous logic is used then an Actor cannot process high priority messages while a low priority message is being processed.

One alternative is to use a state machine model in the processing of a request. While processing a low priority message, a request is sent and the reply is expected on a specific message queue. The Actor is then free to begin processing the high priority message. The next time it picks up a message it can look for the reply message and resume processing of the low priority message. Having a separate message queue for replies simply makes it easy to find replies.

In short, having multiple message queues for different priority messages combined with state machine driven message processing when required can potentially give, in many situations, low latency processing of high priority messages.

For example, msg1 comes in at a high priority, then msg2 comes in at medium priority, and then comes msg3 at low priority. The assignment of these priorities to different requests is part of application protocol design. The heuristics used by an application could be as follows: process msg1 first. If more than 90 ms is spent processing msg1 and msg2 is starving, process one msg2 for 10 ms and then go back to msg1 processing.

If a msg3 is pending and it has not been processed for the last 1 minute, process msg3. Note that in this example each message is processed to completion. The heuristic kicks in to avoid starvation when a large number of requests comes in.

The state machine is parallel in terms of the number of requests it can take. For each request processed you create a context for the request and the context is driven by the state machine. You might, for example, get a flood of msg1s. For each msg1 being processed, there will be a context block for storing an id (for context identification purposes), current state, and various associated data. As hundreds of msg1s are processed, hundreds of these contexts are created. One of the states these contexts go through is communication with another node. There is a timer associated with each context in case the reply times out. As replies come in or a timer fires, the associated context makes its state transition.

Many applications can be implemented in a straightforward fashion in this model to achieve high throughput and low latency for high priority work.
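The per-request context blocks described above can be sketched like this. The state names and handler functions are hypothetical; the point is that each in-flight request owns a small context, and replies or timeouts drive its state transitions independently of all the others.

```python
class Context:
    """One context block per in-flight request: id, current state,
    and (in a real system) associated data and a reply timer."""
    def __init__(self, req_id):
        self.req_id = req_id
        self.state = "WAIT_REPLY"   # request sent, awaiting reply

contexts = {}

def start_request(req_id):
    contexts[req_id] = Context(req_id)

def on_reply(req_id):
    ctx = contexts[req_id]
    if ctx.state == "WAIT_REPLY":
        ctx.state = "DONE"

def on_timeout(req_id):
    ctx = contexts[req_id]
    if ctx.state == "WAIT_REPLY":
        ctx.state = "FAILED"

# Two requests in flight at once; each transitions independently.
start_request(1)
start_request(2)
on_reply(1)
on_timeout(2)
```

Hundreds of such contexts can coexist, and the dispatch loop just routes each incoming reply or timer event to the matching context.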

Dataflow Model

A technique used in functional languages. It allows users to dynamically create any number of sequential threads. The threads are dataflow threads in the sense that a thread executing an operation will suspend until all operands needed have a well-defined value.
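The suspend-until-operands-are-bound behavior can be approximated with futures (a loose sketch of the dataflow idea, not a real dataflow runtime): a thread computing c = a + b blocks until both a and b have well-defined values.

```python
import threading
from concurrent.futures import Future

a, b, c = Future(), Future(), Future()

def adder():
    # result() blocks until the operand future is bound to a value,
    # which is the dataflow-style suspension.
    c.set_result(a.result() + b.result())

t = threading.Thread(target=adder)
t.start()
a.set_result(1)   # binding the operands wakes the suspended thread
b.set_result(2)
t.join()
```

The adder thread makes no progress until both operands are bound, regardless of the order in which bindings happen.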

Erlang Model

Erlang has a process-based model of concurrency with asynchronous message passing. The processes are light-weight, i.e. they require little memory; creating and deleting processes and message passing require little computational effort.

No shared memory, all interaction between processes is by asynchronous message passing. The distribution is location transparent. The program does not have to consider whether the recipient of a message is a local process or located on a remote Erlang virtual machine.

The deficiency of Erlang’s approach is that there’s no way to control latency through priority. If certain events need to be serviced before others or if certain events need to be serviced quickly, Erlang doesn’t give you the necessary control needed to shape message processing.

SEDA Model

SEDA is an acronym for staged event-driven architecture, and decomposes a complex, event-driven application into a set of stages connected by queues. Each queue has a thread pool to execute work from the queue.
This design avoids the high overhead associated with thread-based concurrency models, and decouples event and thread scheduling from application logic. By performing admission control on each event queue, the service can be well-conditioned to load, preventing resources from being overcommitted when demand exceeds service capacity. SEDA employs dynamic control to automatically tune runtime parameters (such as the scheduling parameters of each stage), as well as to manage load, for example, by performing adaptive load shedding. Decomposing services into a set of stages also enables modularity and code reuse, as well as the development of debugging tools for complex event-driven applications.
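A toy two-stage pipeline in the SEDA shape: stages connected by queues, each stage drained by its own worker. A real SEDA stage would use a thread pool per stage plus admission control on each queue; this sketch (stage functions and the `None` end-of-stream sentinel are my own simplifications) shows only the structure.

```python
import queue
import threading

stage1_in, stage2_in, results = queue.Queue(), queue.Queue(), queue.Queue()

def stage(inq, outq, fn):
    """Drain one stage's queue, passing results to the next stage."""
    while True:
        item = inq.get()
        if item is None:        # propagate end-of-stream downstream
            outq.put(None)
            return
        outq.put(fn(item))

t1 = threading.Thread(target=stage, args=(stage1_in, stage2_in, lambda x: x * 2))
t2 = threading.Thread(target=stage, args=(stage2_in, results, lambda x: x + 1))
t1.start(); t2.start()
for item in (1, 2, None):
    stage1_in.put(item)
t1.join(); t2.join()
```

Because each stage only touches its own queues, stages can be tuned, throttled, or given bigger pools independently, which is the modularity SEDA is after.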

Multiple Application Thread Pool Model (M – N)

In this model multiple applications are multiplexed over a common thread pool and a scheduling system on top of the thread scheduling system is imposed. The pool may have only one thread or it could have many. This could be considered a container for a parallel state machine model. Execution times for individual work items are tracked so work can be scheduled at a low level of granularity. Priority and fairness are built into the scheduler and follow application specific rules.

The basic idea is to represent an application so that behaviour is separated from threading. One or more applications can then be mapped to a single thread or a thread pool. If you want an event architecture then all applications can share a single thread. If you want N applications to share one thread then you can instantiate N applications, install them, and they should all multiplex properly over the single thread. If you want N applications to share M threads then that works as well. Locking is application specific. An application has to know its architecture and lock appropriately.

Where Is The State Machine?

Every application can be described by an explicit or implicit state machine. In the Actor model the state machine is often implicit on the stack and in the state of the Actor. Clearly an Actor can use a state machine, but it is often easier for developers to not use a formal state machine.

In Parallel State Machine model state machines are usually made explicit because responses are asynchronous which means a response needs to be matched up with the original request. A state machine is a natural way to do this. And applications built around state machines tend to be more robust.


These are some of the issues to keep in mind when figuring out how to structure work flows within an application. All of the options can be used depending on the situation. It’s not easy at all, but if you create the right architecture for your application you can keep your application both responsive and busy, which is a pretty good result.

Related Articles