<accesscontrol>Main:MyGroup</accesscontrol>

Communication in CatTask v2009

The new CatTask model, discussed in Cattaskv2009_overview_of_the_new_system, involves a lot of communication among the 3 CatTask instances. Besides, experience with the current CatTaskService tells me that communication is the most error-prone part in a production environment. That is why we have paid so much attention to building a good communication component.

So what communication technology should we use?

We have investigated 3 communication technologies so far: Remoting, WCF and MSMQ, and we found some interesting tricks along the way. You can find the whole story in Remoting,WCF and MSMQ for CatTask.

At the moment, we are designing the module using MSMQ with the help of Rhino Service Bus.

Rhino Service Bus

Rhino Service Bus (RSB) is an ESB built on top of MSMQ. Since the bus's behaviour is mainly specified by its configuration file, the best way to learn how the bus works is to look at the configuration:

<facility id="rhino.esb">
  <bus threadCount="1" numberOfRetries="5" endpoint="msmq://localhost/ownqueue" />
  <messages>
    <add name="CatGlobe.Messages.WebShop" endpoint="msmq://web/WebShop" />
    <add name="CatGlobe.Messages.CatTask" endpoint="msmq://catmaxb/CatTask" />
  </messages>
</facility>

In short, a Rhino service bus:

Monitors its own queue for incoming messages. The endpoint in the <bus> element is that queue. When messages arrive, the bus receives them from the queue and invokes the appropriate consumers to process them. For example, in the image below, we have a consumer called CatGlobeMessageController which implements the IConsumerOf<HelloCatGlobe> interface. When messages of that type arrive, the bus invokes CatGlobeMessageController to process them.

Can send messages to other queues (of course!!!). The point here is that it has two Send APIs:

- Send with an explicitly specified end point (queue).

- Send without a specified endpoint. In this case the bus needs to know the owner (queue) of the message type. Notice the <messages> section in the configuration block above: it says that all messages of types defined in the CatGlobe.Messages.WebShop namespace will be sent to the "msmq://web/WebShop" endpoint. The second setting does the same for CatTask.

Publish/Notify: another feature of RSB is the ability to publish/notify messages to all the buses that are interested in them. In order to receive published messages, a consumer must subscribe to the producer bus. For example, a bus can tell the CatTask bus that it is interested in messages of the type CatGlobe.Messages.CatTask.TaskCompleted. Once the subscription is done, whenever the CatTask bus publishes a TaskCompleted message, a copy will be sent to the subscriber bus.
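
To make the two Send flavours and Publish a bit more concrete, here is a minimal in-memory sketch. Note that SketchBus, AddOwner, Subscribe and Deliveries are names I made up for illustration; this is NOT the Rhino Service Bus API, and the namespace-based owner lookup simply follows the description above.

```csharp
using System;
using System.Collections.Generic;

namespace CatGlobe.Messages.CatTask
{
    // Sample message type living in the namespace owned by the CatTask queue.
    public class TaskCompleted { }
}

// Hypothetical in-memory stand-in for a bus; it only records where messages would go.
public class SketchBus
{
    // Namespace -> owning endpoint, mirroring the <messages> section.
    private readonly Dictionary<string, string> owners = new Dictionary<string, string>();
    // Message type -> endpoints that subscribed to it.
    private readonly Dictionary<Type, List<string>> subscribers = new Dictionary<Type, List<string>>();
    // Every delivery is recorded as "endpoint <- MessageTypeName".
    public readonly List<string> Deliveries = new List<string>();

    public void AddOwner(string messageNamespace, string endpoint)
    {
        owners[messageNamespace] = endpoint;
    }

    public void Subscribe(Type messageType, string subscriberEndpoint)
    {
        List<string> list;
        if (!subscribers.TryGetValue(messageType, out list))
        {
            list = new List<string>();
            subscribers[messageType] = list;
        }
        list.Add(subscriberEndpoint);
    }

    // Send with an explicitly specified endpoint.
    public void Send(string endpoint, object message)
    {
        Deliveries.Add(endpoint + " <- " + message.GetType().Name);
    }

    // Send without an endpoint: the owner is looked up by the message's namespace.
    public void Send(object message)
    {
        string ns = message.GetType().Namespace ?? "";
        foreach (KeyValuePair<string, string> owner in owners)
        {
            if (ns == owner.Key || ns.StartsWith(owner.Key + ".", StringComparison.Ordinal))
            {
                Send(owner.Value, message);
                return;
            }
        }
        throw new InvalidOperationException("No owner configured for namespace " + ns);
    }

    // Publish: one copy goes to every endpoint subscribed to the message type.
    public void Publish(object message)
    {
        List<string> list;
        if (!subscribers.TryGetValue(message.GetType(), out list)) return;
        foreach (string endpoint in list) Send(endpoint, message);
    }
}
```

With "CatGlobe.Messages.CatTask" registered as owned by "msmq://catmaxb/CatTask", sending a TaskCompleted without an endpoint lands in that queue, while publishing it delivers one copy to each subscriber.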

Should we use RSB?

At the moment, my answer is YES. Let's consider its pros and cons:

- Pros:

It was made by good developers, many more contribute to it, and it is well unit-tested.
It solves many issues which I ran into when I tried to write my own MSMQ code. Well, I'm not a giant, but I can stand on the shoulders of giants.
The built-in logger is very good. It can help us track down problems more easily.

- Cons:

RSB is used in distributed contexts where there may be a delay between when a message is sent and when it is received. This problem is discussed in the Message lifetime section below.
The help file is not good. Yeah, as I just said, we can stand on the shoulders of giants, but we have to start from the ground and there are no stairs for us to climb up to those shoulders! (On the contrary, with Microsoft frameworks, we often have more than one staircase to use, but some of them have a gap in the middle and some others lead you to a dead end!!!)

Buses design for CatTask

We are using option #3 for the Controller.
The real implementation may vary a little: it is possible to use one queue for both the LD and the Controller. We will decide that later. In this design, we use one queue for each, which in my opinion makes the system easier to understand.

Buses diagram

- A CatTask instance has two buses:

One, which is LD_Bus in the image below, is for the local part: the LD and the Worker.
The other is Controller_Bus, which is used for the Controller.

Message namespaces

(Not finished yet)

At the lowest level, all messages sent to MSMQ are of type System.Messaging.Message. However, at the RSB level we can work with typed messages. For example, the producer may send a TaskInstanceInfo message, and the consumer will receive it as a TaskInstanceInfo object.

CatTask.Messages.LocalDispatcher
CatTask.Messages.Worker
CatTask.Messages.Controller

Buses configuration

There will be images + configuration for buses here

Behaviours

Term definitions:

TaskMessage: messages which are related to a task, such as scheduling messages, status reports...
ControllerMessage: messages in which the Controller is the main actor, such as Controller announcements, Controller elections, queries for the active Controller...

Local Dispatcher, Worker, Controller: there may be a bit of confusion about the usage of these terms:

From the communication point of view: Local Dispatcher and Worker are the same END POINT.
From the functional point of view: they are two different modules.

In this section, you will find how CatTask's sub-modules should behave. There are some interesting things about these behaviour descriptions:

They are written in plain language, which makes them understandable to everyone.
They are unit test cases!!! Each behaviour must have at least one test case. For example, take the behaviour below:

When an inactive Controller receives an AreYouController message

It will reply with a ControllerReport{IsActiveController = No} message to the sender

Unit test:

[Test]
public void When_An_Inactive_Controller_Receives_An_AreYouController_Message_It_Should_Reply_No()
{
}

Yeah, after we finish designing these behaviours and the interfaces for the relevant classes, we can start writing unit tests before writing the actual code.

They are also very close to the real implementation:

public void Consume(AreYouController message)
{
    if (!this.isActiveController)
    {
        // An inactive Controller answers "No" and correlates the reply with the request.
        this.bus.Reply(new ControllerReport { IsActiveController = false, CorrelationId = message.Id });
        return;
    }

    // else, do it here
}
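
To show how a behaviour, its test and its implementation can line up, here is a self-contained sketch that does not depend on RSB or NUnit. IReplyBus, RecordingBus and ControllerBehaviourChecks are hypothetical names used only for this illustration, and the reply of the active Controller (IsActiveController = true) is my assumption, since that branch is still open above. In the real project the check would be an NUnit [Test] using Assert.

```csharp
using System;
using System.Collections.Generic;

public class AreYouController
{
    public Guid Id = Guid.NewGuid();
}

public class ControllerReport
{
    public bool IsActiveController;
    public Guid CorrelationId;
}

// Hypothetical minimal bus abstraction; the real bus interface may differ.
public interface IReplyBus
{
    void Reply(object message);
}

// Test double which records replies instead of sending them to a queue.
public class RecordingBus : IReplyBus
{
    public readonly List<object> Replies = new List<object>();

    public void Reply(object message)
    {
        Replies.Add(message);
    }
}

public class Controller
{
    private readonly IReplyBus bus;
    private readonly bool isActiveController;

    public Controller(IReplyBus bus, bool isActiveController)
    {
        this.bus = bus;
        this.isActiveController = isActiveController;
    }

    public void Consume(AreYouController message)
    {
        // Assumption: the active Controller answers with the same message type, set to true.
        this.bus.Reply(new ControllerReport
        {
            IsActiveController = this.isActiveController,
            CorrelationId = message.Id
        });
    }
}

public static class ControllerBehaviourChecks
{
    public static void When_An_Inactive_Controller_Receives_An_AreYouController_Message_It_Should_Reply_No()
    {
        var bus = new RecordingBus();
        var controller = new Controller(bus, false);
        var question = new AreYouController();

        controller.Consume(question);

        if (bus.Replies.Count != 1) throw new Exception("Expected exactly one reply.");
        var report = (ControllerReport)bus.Replies[0];
        if (report.IsActiveController) throw new Exception("Expected IsActiveController = false.");
        if (report.CorrelationId != question.Id) throw new Exception("Reply is not correlated with the request.");
    }
}
```

Because the Controller only sees the small bus interface, the behaviour can be checked without any queue being installed on the test machine.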

Messages look up table

The fact that Rhino Service Bus's messages are strongly typed is great. I have a message type; I make a producer and a consumer for it. The bus will take care of getting them to work together.
However, on the one hand, having one message type for each communication will end in an explosion of types, and each type needs a consumer implementation. For example:

public class Worker : IConsumerOf<CanYouExecuteTask>, IConsumerOf<PleaseExecuteTask>, IConsumerOf<TaskExecutionStatus>

That's why I'd like to keep the number of message types at an acceptable level by grouping similar message types into one, with a property which can be used to distinguish them.

On the other hand, I want to make the behaviour descriptions below as understandable as possible. Having multi-meaning messages would be a step backward.
Thus, in the behaviour descriptions below I will use single-meaning messages. Some of them exist for ease of understanding only. The mapping between them and the actual message types can be found in Message types mapping table.
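
As a small illustration of the grouping idea, the sketch below assumes that the single-meaning messages IAmTheActiveController and IAmNotTheActiveController are both carried by one actual ControllerReport type. This particular mapping is my guess for the sake of the example; the real one belongs in the mapping table. MessageMapping is a hypothetical helper.

```csharp
using System;

// Actual (grouped) message type: one class, distinguished by a property.
public class ControllerReport
{
    public bool IsActiveController;
}

public static class MessageMapping
{
    // Maps the actual message back to the single-meaning name
    // used in the behaviour descriptions.
    public static string SingleMeaningName(ControllerReport report)
    {
        return report.IsActiveController
            ? "IAmTheActiveController"
            : "IAmNotTheActiveController";
    }
}
```

One consumer of ControllerReport can then branch on the property instead of implementing one consumer interface per single-meaning message.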

Tasks scheduling (Use case 1)

Main path

When a Task is scheduled

The LD will send a TaskInstanceInfo{TaskInstanceId, DeliveryStatus = Unknown} message to the Controller and also put the message in local storage to keep track of it.

When an active Controller receives a TaskMessage

It will reply with a TaskReceived message to the sender.

When an LD receives a TaskReceived reply message

It will delegate the work to the ProcessControllerReport function, which marks the status of the stored copy of the message as DELIVERED.

Exceptional path

When an inactive Controller receives a TaskMessage
It will reply with an Error_IAmNotTheActiveController message to the sender.

When an LD receives an Error_IAmNotTheActiveController message

It will move the source message to the delaySendList and ask for the new active Controller (ProcessControllerReport).

Task Execution (Use case 2)

Main path

When the start time of a task comes

The Controller will broadcast a CanYouExecuteTask message to all Workers.

When a Worker receives a CanYouExecuteTask message and it can execute the task
It will reply with a CanExecuteTaskResult{Yes} message

When the active Controller receives a CanExecuteTaskResult{Yes} message from an LD

It will send a PleaseExecuteTask message

When a Worker receives a PleaseExecuteTask message

It will reply with a TaskExecutionStatus{AboutToStart} message and start the task.

When a Worker receives a TaskExecutionStatusRequest message

It will reply with a TaskExecutionStatus{CurrentStatus, CurrentProgression} message

When the Controller receives a TaskExecutionStatus

It will update its data about the task.

When a SystemTask is finished

The Controller will send a TaskExecutionResult message back (reply) to the sender

Alternative path

If the same task is currently running: what does "the same" mean?
Determining whether the task depends on currently scheduled or running tasks: perhaps the Controller doesn't need to ask all Workers about this. It can determine this by itself.

Exceptional path

When a Worker receives a CanYouExecuteTask message and it cannot execute the task
It will reply with a CanExecuteTaskResult{No} message

When all the CanExecuteTaskResult messages the active Controller receives are No

It will pick one LD at random
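
The selection rule of this use case can be sketched as below. This is only an illustration under assumptions: the WorkerEndpoint and CanExecute properties and the WorkerSelection helper are hypothetical, and taking the first Yes reply is my reading of the main path, not a decided design.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical shape of the reply; the property names are assumptions.
public class CanExecuteTaskResult
{
    public string WorkerEndpoint;
    public bool CanExecute;
}

public static class WorkerSelection
{
    // Returns the endpoint which should receive the PleaseExecuteTask message.
    public static string Choose(IList<CanExecuteTaskResult> replies, Random random)
    {
        if (replies.Count == 0)
            throw new ArgumentException("There are no replies to choose from.");

        // Main path: the first Worker which answered Yes gets the task.
        foreach (CanExecuteTaskResult reply in replies)
        {
            if (reply.CanExecute) return reply.WorkerEndpoint;
        }

        // Exceptional path: every answer was No, so pick one at random.
        return replies[random.Next(replies.Count)].WorkerEndpoint;
    }
}
```

Passing the Random instance in keeps the exceptional path testable with a fixed seed.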

CatTask starts up (Use case 4)

In this section, the verb phrase "subscribe to the Controller" means "subscribe to the Controller for the message types that the subscriber consumes". As a result, when the Controller publishes messages of those types, a copy is sent to the subscriber.

When an LD starts

It will subscribe to all Controllers

When an LD receives the IAmTheNewController announcement from a Controller

It will subscribe to the Controller

When a Controller starts

It will send a PleaseElectANewActiveController message to all Controllers

Cache Invalidation

When a CacheInvalidation system task is scheduled

The LD will send a CacheInvalidation message to the active Controller

When the active Controller receives a CacheInvalidation message

It will broadcast a RemoveCacheItem message to all Workers

When a Worker receives a RemoveCacheItem message

It will remove the relevant cache item from the Cache.

Looking for Controller's endpoint (Use case 4)

When an LD wants to know where the active Controller is

It will broadcast an AreYouController message to all Controllers

When an LD has a message to send but it doesn't know where the active Controller is

It will put the message in the delaySendList

When an LD updates the end point of the active Controller

It will send all messages in the delaySendList.

When the active Controller receives an AreYouController message

It will reply with an IAmTheActiveController message to the sender

When an inactive Controller receives an AreYouController message

It will reply with an IAmNotTheActiveController message to the sender
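
The delaySendList behaviours of the LD can be sketched as a small outbox. LocalDispatcherOutbox and its members are hypothetical names, and the send delegate stands in for the real bus; broadcasting the AreYouController question is left out to keep the sketch short.

```csharp
using System;
using System.Collections.Generic;

public class LocalDispatcherOutbox
{
    private readonly List<object> delaySendList = new List<object>();
    // Stand-in for the real transport: (endpoint, message).
    private readonly Action<string, object> send;
    // null while the LD does not know where the active Controller is.
    private string activeControllerEndpoint;

    public LocalDispatcherOutbox(Action<string, object> send)
    {
        this.send = send;
    }

    public int DelayedCount
    {
        get { return delaySendList.Count; }
    }

    // The LD has a message for the active Controller.
    public void Send(object message)
    {
        if (activeControllerEndpoint == null)
        {
            delaySendList.Add(message);
            return;
        }
        send(activeControllerEndpoint, message);
    }

    // A Controller answered that it is not the active one:
    // forget the endpoint and keep the source message for later.
    public void OnNotActiveController(object sourceMessage)
    {
        activeControllerEndpoint = null;
        delaySendList.Add(sourceMessage);
    }

    // The LD learned the endpoint of the active Controller: flush everything delayed.
    public void OnActiveControllerFound(string endpoint)
    {
        activeControllerEndpoint = endpoint;
        var pending = new List<object>(delaySendList);
        delaySendList.Clear();
        foreach (object message in pending) send(endpoint, message);
    }
}
```

The list is copied and cleared before sending, so a message which fails again and is re-queued during the flush is not lost or sent twice in the same pass.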

Controller election (Use case 4, 5)

We presume that there is a place where a Controller can ask whether it can become the active one. Obviously, one and only one Controller must get a YES answer. An election is held when:

The Controllers receive a PleaseElectANewActiveController message from an LD.
The Controllers receive a PleaseElectANewActiveController message from a newly started Controller.

When an Active Controller figures out that it should still be the Controller

It will send an IAmTheNewController message to all LDs and Controllers

When an inactive Controller figures out that it has no chance to be a Controller at the moment

Do nothing, wait for confirmation from the active one, or send a message to the active one to confirm?

When a Controller becomes the Active Controller
It will publish an IAmTheNewController message to everyone
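
The "place where a Controller can ask" is not designed yet. Just to illustrate the one-and-only-one YES requirement, here is an in-process sketch. ElectionArbiter, TryBecomeActive and Resign are hypothetical names, and a real arbiter would have to be shared by the three sites (a database row, for example), which this sketch does not attempt.

```csharp
using System;

public class ElectionArbiter
{
    private readonly object gate = new object();
    // null while nobody holds the active role.
    private string activeControllerId;

    // Returns true only for the Controller which holds (or just obtained) the active role.
    public bool TryBecomeActive(string controllerId)
    {
        lock (gate)
        {
            if (activeControllerId == null) activeControllerId = controllerId;
            return activeControllerId == controllerId;
        }
    }

    // The active Controller gives up its role, e.g. when it shuts down.
    public void Resign(string controllerId)
    {
        lock (gate)
        {
            if (activeControllerId == controllerId) activeControllerId = null;
        }
    }
}
```

A Controller which is already active gets YES again when it asks, which matches the "it should still be the Controller" behaviour above.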

Message lifetime

In the context of CatTask, message lifetime is a big issue. By default, a message stays in a queue until someone picks it up. Let's look at an example:

- At 7:00 AM: a local dispatcher wakes up and wants to know where the active controller is. It broadcasts an AreYouTheActiveController message to all controllers. If it receives no positive reply, it broadcasts a PleaseElectANewController message to all controllers' queues.

- Assume that site #3 (thus, its controller and dispatcher) is not running at the moment. Therefore, the messages sent to it are not picked up in time.

- At 8:00 AM: site #3 is started and processes the messages in its queue. Oops, the PleaseElectANewController message is 1 hour old, and one of the other two controllers has probably taken the active role already.

We have two choices:

Messages should have a receive timeout. If the timeout passes, they should be removed from the queue. MSMQ does support this, but Rhino Service Bus doesn't.
Add a send-date property to messages so that the system (CatTask) can check whether they are out of date and ignore them. We should take into account that the clocks (current time) of the servers may differ slightly.
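
The second choice could look like the sketch below. MessageFreshness, its parameter names and the concrete durations in the usage note are assumptions for illustration; the real maximum age and clock tolerance still have to be decided.

```csharp
using System;

public static class MessageFreshness
{
    // A message is stale when it is older than maxAge, with some slack
    // because the sender's and the receiver's clocks may differ.
    public static bool IsStale(DateTime sentAtUtc, DateTime nowUtc,
                               TimeSpan maxAge, TimeSpan clockSkewTolerance)
    {
        return nowUtc - sentAtUtc > maxAge + clockSkewTolerance;
    }
}
```

With a maximum age of 5 minutes and a tolerance of 2 minutes, the election request sent at 7:00 and picked up at 8:00 is ignored, while one picked up at 7:06 is still processed. Using UTC on both sides avoids time zone surprises between servers.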