Cattaskv2009 Communication
Contents
- 1 Communication in Cattask v2009
- 2 So what communication technology should we use?
- 3 Rhino Service Bus
- 4 Buses design for CatTask
- 5 Behaviours
- 5.1 Messages look up table
- 5.2 Tasks scheduling (Use case 1)
- 5.3 Task Execution (Use case 2)
- 5.4 CatTask starts up (Use case 4)
- 5.5 Cache Invalidation
- 5.6 Looking for Controller's endpoint (Use case 4)
- 5.7 Controller election (Use case 4, 5)
- 5.8 Message lifetime
- 5.9 Sequence diagrams
- 6 Active controller election
Communication in Cattask v2009
The new CatTask model, which was discussed in Cattaskv2009_overview_of_the_new_system, shows that there will be a lot of communication among the 3 CatTask instances. Besides, experience with the current CatTaskService tells me that communication is the most error-prone part in the production environment. That is why we have paid so much attention to building a good communication component.
So what communication technology should we use?
We have investigated 3 communication techniques so far: remoting, WCF and MSMQ. Along the way, we found some interesting tricks. You can find the whole story in Remoting,WCF and MSMQ for CatTask.
At the moment, we are designing the module using MSMQ with the help of Rhino Service Bus.
Rhino Service Bus
Rhino Service Bus (RSB) is an ESB built on top of MSMQ. Since the behaviour of a bus is mainly specified by its configuration file, the easiest way to learn how the bus works is to look at the configuration:
<facility id="rhino.esb" >
  <bus threadCount="1" numberOfRetries="5" endpoint="msmq://localhost/ownqueue" />
  <messages>
    <add name="CatGlobe.Messages.WebShop" endpoint="msmq://web/WebShop"/>
    <add name="CatGlobe.Messages.CatTask" endpoint="msmq://catmaxb/CatTask"/>
  </messages>
</facility>
In short, a Rhino service bus:
- Monitors a queue for incoming messages. The end point in the <bus> element is that queue. When messages arrive, the bus receives them from the queue and invokes the appropriate consumers to process them. For example: in the image below, we have a consumer called CatGlobeMessageController which implements the IConsumerOf<HelloCatGlobe> interface. When messages of that type arrive, the bus invokes CatGlobeMessageController to process them.
- Can send messages to other queues (of course!!!). The point here is that it has two Send APIs:
- Send with an explicitly specified end point (queue).
- Send without a specified end point. In this case we need to configure the queues (message owners) for the message type. Notice the <messages> section in the configuration block above: it says that all messages whose types are defined in the CatGlobe.Messages.WebShop namespace will be sent to the "msmq://web/WebShop" end point. The second setting does the same for CatTask.
- Publish/Notify: another feature of RSB is the ability to publish/notify messages to all the buses that are interested in them. In order to receive published messages, a consumer must subscribe to the producer bus. For example, a bus can tell the CatTask bus that it is interested in messages of the type CatGlobe.Messages.CatTask.TaskCompleted. After the subscription is done, whenever the CatTask bus publishes a TaskCompleted message, a copy will be sent to the subscriber bus.
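The message-owner lookup used by the second Send API can be pictured as a prefix match of the message type's full name against the configured names. The sketch below is only an illustration under that assumption: MessageOwners is a hypothetical class written for this page, not part of Rhino Service Bus, and RSB's real matching rules may differ.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for the <messages> section; NOT an RSB class.
public sealed class MessageOwners
{
    private readonly List<KeyValuePair<string, string>> owners =
        new List<KeyValuePair<string, string>>();

    // Mirrors <add name="..." endpoint="..."/>.
    public void Add(string namePrefix, string endpoint)
    {
        owners.Add(new KeyValuePair<string, string>(namePrefix, endpoint));
    }

    // Returns the endpoint of the first owner whose name is a prefix of the
    // message type's full name, or null when no owner is configured.
    public string FindOwner(string messageTypeFullName)
    {
        foreach (KeyValuePair<string, string> owner in owners)
        {
            if (messageTypeFullName.StartsWith(owner.Key, StringComparison.Ordinal))
            {
                return owner.Value;
            }
        }
        return null;
    }
}
```

With the two settings from the configuration above, a lookup for CatGlobe.Messages.CatTask.TaskCompleted would resolve to the msmq://catmaxb/CatTask end point, while a type from an unlisted namespace resolves to nothing and would need an explicit end point.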
Should we use RSB?
At the moment, my answer is YES. Let's consider pros and cons of it:
- Pros:
- It was made, and is still being contributed to, by many good developers, and it is well unit-tested.
- It solves many issues which I ran into when I tried to write my own MSMQ code. Well, I'm not a giant, but I can stand on the shoulders of giants.
- The built-in logger is very good. It can help us figure out problems easily.
- Cons:
- RSB is meant for distributed contexts where there may be a delay between when a message is sent and when it is received. We come back to this problem later (see Message lifetime).
- The help file is not good. Yeah, as I just said, we can stand on the shoulders of giants, but we have to start from the ground and there are no stairs for us to climb up to the shoulders! (On the contrary, with Microsoft frameworks, we often have more than one staircase to use, but some of them have a gap in the middle and some others lead you to a dead end!!!)
Buses design for CatTask
- We are using option #3 for the Controller.
- The real implementation may vary a little bit: it is possible to use one queue for both the LD and the Controller. We will decide that later. In this design, we use one queue for each, which in my opinion makes the system easier to understand.
Buses diagram
- A CatTask instance has two buses:
- One, which is LD_Bus in the image below, for its local part: LD and Worker.
- The other is Controller_Bus, which is used for the Controller.
Message namespaces
(Not finished yet)
At the lowest level, all messages which are sent to MSMQ are of the type System.Messaging.Message. However, at the RSB level, we can work with typed messages. For example, the producer may send a TaskInstanceInfo message, and the consumer will receive a TaskInstanceInfo object.
- CatTask.Messages.LocalDispatcher
- CatTask.Messages.Worker
- CatTask.Messages.Controller
Buses configuration
There will be images + configuration for buses here
Behaviours
Term definition:
- TaskMessage: messages which are related to a task, such as scheduling messages, status report messages...
- ControllerMessage: messages whose main actor is the Controller, such as Controller announcements, Controller election messages, queries for the active Controller...
Local Dispatcher, Worker, Controller: there may be a bit of confusion about the usage of these terms
- From the communication point of view: Local Dispatcher and Worker are the same END POINT.
- From the functional point of view: they are two different modules.
In this section, you will find how CatTask's sub-modules should behave. There are some interesting things here:
- The use of plain language makes them understandable to everyone.
- They are unit test cases!!! Each behaviour must have at least one test case. For example, with the behaviour below
When an inactive Controller receives an AreYouController message
It will reply a ControllerReport{IsActiveController = No} message to the sender
Unit test:
[Test]
public void When_An_Inactive_Controller_Receives_An_AreYouController_Message_It_Should_Reply_No()
{
    // Arrange an inactive Controller, consume an AreYouController message,
    // then assert that the reply is a ControllerReport with IsActiveController = false.
}
After we finish designing these behaviours and the interfaces of the relevant classes, we can start writing unit tests before writing the actual code.
- They are also very close to the real implementation
public void Consume(AreYouController message)
{
    if (!this.isActiveController)
    {
        // An inactive Controller only reports that it is not the active one.
        this.bus.Reply(new ControllerReport {IsActiveController = false, CorrelationId = message.Id});
        return;
    }
    // else, do it here
}
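To show how such a behaviour could be verified without MSMQ, here is a minimal, self-contained sketch. IReplyBus and RecordingBus are hypothetical names invented for this example (they are not RSB types), and the Controller class is reduced to the single behaviour under test. The message classes only carry the properties used in the snippet above.

```csharp
using System;
using System.Collections.Generic;

public class AreYouController
{
    public Guid Id = Guid.NewGuid();
}

public class ControllerReport
{
    public bool IsActiveController;
    public Guid CorrelationId;
}

// Hypothetical minimal bus abstraction: only what the consumer needs.
public interface IReplyBus
{
    void Reply(object message);
}

// Test double that records replies instead of sending them to a queue.
public class RecordingBus : IReplyBus
{
    public readonly List<object> Replies = new List<object>();

    public void Reply(object message)
    {
        Replies.Add(message);
    }
}

public class Controller
{
    private readonly IReplyBus bus;
    private readonly bool isActiveController;

    public Controller(IReplyBus bus, bool isActiveController)
    {
        this.bus = bus;
        this.isActiveController = isActiveController;
    }

    // Replies with the Controller's current role, correlated to the request.
    public void Consume(AreYouController message)
    {
        bus.Reply(new ControllerReport
        {
            IsActiveController = isActiveController,
            CorrelationId = message.Id
        });
    }
}
```

A test then creates a Controller with isActiveController set to false, calls Consume, and checks that RecordingBus.Replies holds exactly one ControllerReport whose IsActiveController is false.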
Messages look up table
- The fact that Rhino Service Bus messages are strongly typed is great. I have a message type; I make a producer and a consumer for it. The bus will take care of getting them to work together.
- However, on the one hand, having one message type for each communication will end in an explosion of types. Each type needs a consumer implementation. For example:
public class Worker : IConsumerOf<CanYouExecuteTask>, IConsumerOf<PleaseExecuteTask>, IConsumerOf<TaskExecutionStatus>
That is why I'd like to keep the number of message types manageable by grouping similar message types into one type with a property that distinguishes them.
- On the other hand, I want to make the behaviour descriptions below as understandable as possible. Having multi-meaning messages would be a step backward.
- Thus, in the behaviour descriptions below I will use single-meaning messages. Some of them exist for ease of understanding only. The mapping between them and the actual message types can be found in the Message types mapping table.
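As an illustration of the grouping idea, the sketch below folds three single-meaning messages into one type with a discriminating property. The names TaskExecutionMessage, TaskExecutionKind and GroupedWorker are hypothetical and chosen for this example only; the actual grouping is the one defined in the Message types mapping table.

```csharp
using System;

// Hypothetical discriminator: one value per single-meaning message.
public enum TaskExecutionKind
{
    CanYouExecuteTask,
    PleaseExecuteTask,
    StatusRequest
}

// One grouped message type instead of three separate ones.
public class TaskExecutionMessage
{
    public TaskExecutionKind Kind;
    public int TaskInstanceId;
}

// The consumer needs a single Consume method and dispatches on Kind.
public class GroupedWorker
{
    public string LastAction = "";

    public void Consume(TaskExecutionMessage message)
    {
        switch (message.Kind)
        {
            case TaskExecutionKind.CanYouExecuteTask:
                LastAction = "answered availability";
                break;
            case TaskExecutionKind.PleaseExecuteTask:
                LastAction = "started task";
                break;
            case TaskExecutionKind.StatusRequest:
                LastAction = "reported status";
                break;
            default:
                throw new ArgumentOutOfRangeException("message");
        }
    }
}
```

The trade-off is the one described above: fewer types and consumers, at the cost of a switch inside the consumer and messages whose meaning depends on a property.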
Tasks scheduling (Use case 1)
Main path
When a Task is scheduled
The LD will send a TaskInstanceInfo{TaskInstanceId, DeliveryStatus = Unknown} message to the Controller and also put the message in local storage to keep track of it.
When an active Controller receives a TaskMessage
It will reply a TaskReceived message to the sender.
When an LD receives a TaskReceived reply message
It will delegate the work to the ProcessControllerReport function, which marks the status of the stored (tracked) message as DELIVERED
Exceptional path
When an inactive Controller receives a TaskMessage
It will reply an Error_IAmNotTheActiveController message to the sender.
When an LD receives an Error_IAmNotTheActiveController message
It will move the source message to the delaySendingList and ask for the new active Controller (ProcessControllerReport)
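The bookkeeping on the LD side for the two paths above could look roughly like the following sketch. LocalDispatcherLedger and its method names are assumptions made for this example, and an in-memory dictionary stands in for the local storage, whose real form is not decided on this page.

```csharp
using System;
using System.Collections.Generic;

public enum DeliveryStatus { Unknown, Delivered }

public class TaskInstanceInfo
{
    public int TaskInstanceId;
    public DeliveryStatus DeliveryStatus;
}

// Hypothetical in-memory stand-in for the LD's local storage.
public class LocalDispatcherLedger
{
    private readonly Dictionary<int, TaskInstanceInfo> tracked =
        new Dictionary<int, TaskInstanceInfo>();

    public readonly List<TaskInstanceInfo> DelaySendingList =
        new List<TaskInstanceInfo>();

    // Main path, step 1: remember the message before it is sent.
    public TaskInstanceInfo Schedule(int taskInstanceId)
    {
        TaskInstanceInfo info = new TaskInstanceInfo
        {
            TaskInstanceId = taskInstanceId,
            DeliveryStatus = DeliveryStatus.Unknown
        };
        tracked[taskInstanceId] = info;
        return info;
    }

    // Main path, step 3: the Controller replied TaskReceived.
    public void OnTaskReceived(int taskInstanceId)
    {
        TaskInstanceInfo info;
        if (tracked.TryGetValue(taskInstanceId, out info))
        {
            info.DeliveryStatus = DeliveryStatus.Delivered;
        }
    }

    // Exceptional path: the receiver was not the active Controller,
    // so the source message waits until the active Controller is known.
    public void OnNotActiveController(int taskInstanceId)
    {
        TaskInstanceInfo info;
        if (tracked.TryGetValue(taskInstanceId, out info))
        {
            tracked.Remove(taskInstanceId);
            DelaySendingList.Add(info);
        }
    }
}
```

The sketch is not thread-safe; with a bus threadCount above 1 the ledger would need locking.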
Task Execution (Use case 2)
Main path
When the start time of a task comes
The Controller will broadcast a CanYouExecuteTask message to all Workers.
When a Worker receives a CanYouExecuteTask message and it can execute the task
It will reply a CanExecuteTaskResult{Yes} message
When the active Controller receives a CanExecuteTaskResult{Yes} message from an LD
It will send a PleaseExecuteTask message
When a Worker receives a PleaseExecuteTask message
It will reply a TaskExecutionStatus{AboutToStart} message and start the task.
When a Worker receives a TaskExecutionStatusRequest message
It will reply a TaskExecutionStatus{CurrentStatus, CurrentProgression} message
When the Controller receives a TaskExecutionStatus
It will update its data about the task.
When a SystemTask is finished
The Controller will send a TaskExecutionResult message back (reply) to the sender
Alternative path
- If the same task is currently running: what does "the same" mean?
- Determine if the task depends on currently scheduled or running tasks: perhaps the Controller doesn't need to ask all Workers about this. It can determine this by itself.
Exceptional path
When a Worker receives a CanYouExecuteTask message and it cannot execute the task
It will reply a CanExecuteTaskResult{No} message
When all the CanExecuteTaskResult messages the active Controller receives are NO
It will pick one LD at random
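The Controller's choice between the main path and the exceptional path can be sketched as below. This is an assumption-laden illustration: the page does not say which Worker wins when several answer Yes, so the sketch simply takes the first Yes, and WorkerSelection is a hypothetical helper name.

```csharp
using System;
using System.Collections.Generic;

public static class WorkerSelection
{
    // answers: (endpoint, canExecute) pairs collected from the
    // CanExecuteTaskResult replies. Returns the endpoint that should
    // receive the PleaseExecuteTask message.
    public static string Choose(IList<KeyValuePair<string, bool>> answers, Random random)
    {
        if (answers.Count == 0)
        {
            throw new ArgumentException("No answers were received.", "answers");
        }

        // Assumption: the first Worker that answered Yes gets the task.
        foreach (KeyValuePair<string, bool> answer in answers)
        {
            if (answer.Value)
            {
                return answer.Key;
            }
        }

        // Exceptional path: everyone said No, so pick one at random.
        return answers[random.Next(answers.Count)].Key;
    }
}
```

Passing the Random instance in keeps the random fallback controllable from a unit test.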
CatTask starts up (Use case 4)
In this section, the verb phrase "subscribe to the Controller" means "subscribe to the Controller for message types of which it is the consumer". As a result, when the Controller publishes messages of those types, one is sent to the subscriber.
When an LD starts
It will subscribe to all Controllers
When an LD receives the IAmTheNewController announcement from a Controller
It will subscribe to the Controller
When a Controller starts
It will send a PleaseElectANewActiveController message to all Controllers
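To make the meaning of "subscribe to the Controller" concrete, here is a minimal sketch of the bookkeeping a publishing bus needs: for each message type, the list of subscriber end points. SubscriptionRegistry is a hypothetical class for illustration and says nothing about how RSB stores subscriptions internally.

```csharp
using System;
using System.Collections.Generic;

// Example message type taken from the behaviours on this page.
public class IAmTheNewController { }

// Hypothetical registry kept by the publishing side; NOT an RSB class.
public class SubscriptionRegistry
{
    private readonly Dictionary<Type, List<string>> subscribers =
        new Dictionary<Type, List<string>>();

    // Registers an endpoint for a message type; duplicates are ignored.
    public void Subscribe(Type messageType, string subscriberEndpoint)
    {
        List<string> endpoints;
        if (!subscribers.TryGetValue(messageType, out endpoints))
        {
            endpoints = new List<string>();
            subscribers[messageType] = endpoints;
        }
        if (!endpoints.Contains(subscriberEndpoint))
        {
            endpoints.Add(subscriberEndpoint);
        }
    }

    // Returns a copy of the endpoints that must receive a published message.
    public IList<string> SubscribersOf(Type messageType)
    {
        List<string> endpoints;
        if (subscribers.TryGetValue(messageType, out endpoints))
        {
            return new List<string>(endpoints);
        }
        return new List<string>();
    }
}
```

When the Controller publishes an IAmTheNewController message, one copy goes to each end point returned by SubscribersOf(typeof(IAmTheNewController)).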
Cache Invalidation
When a CacheInvalidation system task is scheduled
The LD will send a CacheInvalidation message to the active Controller
When the active Controller receives a CacheInvalidation message
It will broadcast a RemoveCacheItem message to all Workers
When a Worker receives a RemoveCacheItem message
It will remove the relevant cache item from the Cache.
Looking for Controller's endpoint (Use case 4)
When an LD wants to know where the active Controller is
It will broadcast an AreYouController message to all Controllers
When an LD has a message to send but it doesn't know where the active Controller is
It will put the message in the delaySendList
When an LD updates the end point of the active Controller
It will send all messages in the delaySendList.
When the active Controller receives an AreYouController message
It will reply an IAmTheActiveController message to the sender
When an inactive Controller receives an AreYouController message
It will reply an IAmNotTheActiveController message to the sender
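The delaySendList behaviour above can be sketched as a small wrapper around the send operation. DelayedSender is a hypothetical name, and the send delegate stands in for whatever bus call the real implementation uses, so no RSB API is assumed here.

```csharp
using System;
using System.Collections.Generic;

public class DelayedSender
{
    private readonly Action<string, object> send; // (endpoint, message)
    private readonly Queue<object> delaySendList = new Queue<object>();
    private string activeControllerEndpoint; // null while unknown

    public DelayedSender(Action<string, object> send)
    {
        this.send = send;
    }

    public int PendingCount
    {
        get { return delaySendList.Count; }
    }

    // Sends immediately when the active Controller is known,
    // otherwise parks the message in the delaySendList.
    public void Send(object message)
    {
        if (activeControllerEndpoint == null)
        {
            delaySendList.Enqueue(message);
            return;
        }
        send(activeControllerEndpoint, message);
    }

    // Called when the LD learns (or re-learns) the active Controller's
    // end point; flushes the parked messages in their original order.
    public void UpdateActiveController(string endpoint)
    {
        activeControllerEndpoint = endpoint;
        while (delaySendList.Count > 0)
        {
            send(endpoint, delaySendList.Dequeue());
        }
    }
}
```

The sketch is single-threaded on purpose; a real LD would also need a way to forget the end point again when the active Controller stops answering.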
Controller election (Use case 4, 5)
Let's presume that we have a place where a Controller can ask whether it may become the active one. Obviously, one and only one Controller must get a YES answer. An election is held when:
- The Controllers receive a PleaseElectANewActiveController message from an LD.
- The Controllers receive a PleaseElectANewActiveController message from a newly started Controller.
When an Active Controller figures out that it should still be the Controller
It will send an IAmTheNewController message to all LDs and Controllers
When an inactive Controller figures out that it has no chance to be a Controller at the moment
Open question: do nothing, wait for confirmation from the active one, or send a message to the active one to confirm?
When a Controller becomes the Active Controller
It will publish an IAmTheNewController message to everyone
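The "place where a Controller can ask" is not specified on this page. Purely as an illustration of the one-and-only-one-YES rule, the sketch below uses an in-process arbiter guarded by a lock. ElectionArbiter is hypothetical, and a real deployment across three sites would need a shared resource instead of an object in memory.

```csharp
using System;

// Hypothetical arbiter: grants the active role to at most one Controller.
public class ElectionArbiter
{
    private readonly object gate = new object();
    private string activeControllerId; // null while nobody is active

    // Returns true for the first Controller that asks, and keeps returning
    // true for that same Controller; everyone else gets false.
    public bool TryBecomeActive(string controllerId)
    {
        lock (gate)
        {
            if (activeControllerId == null)
            {
                activeControllerId = controllerId;
            }
            return activeControllerId == controllerId;
        }
    }

    // Frees the role, for example when the active Controller shuts down.
    public void Resign(string controllerId)
    {
        lock (gate)
        {
            if (activeControllerId == controllerId)
            {
                activeControllerId = null;
            }
        }
    }
}
```

A Controller that gets true publishes IAmTheNewController; one that gets false falls under the still open question above about what an inactive Controller should do.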
Message lifetime
In the context of CatTask, message lifetime is a big issue. By default, a message stays in a queue as long as no one picks it up. Let's look at an example:
- At 7h00 AM: a local dispatcher wakes up and wants to know where the active controller is. It broadcasts an AreYouController message to all controllers. If it receives no positive reply, it broadcasts a PleaseElectANewActiveController message to all controllers' queues.
- Assume that site #3 (and thus its controller and dispatcher) is not running at that moment. Therefore, the messages sent to it are not picked up in time.
- At 8h00 AM: site #3 is started and processes the messages in its queue. Oops, the PleaseElectANewActiveController message is 1 hour old, and one of the other two controllers has probably taken the active role already.
We have two choices:
- Messages should have a receive timeout. When the timeout expires, they should be removed from the queue. MSMQ does support this (for example through the TimeToBeReceived property of System.Messaging.Message), but Rhino Service Bus doesn't.
- Add a Send-date property to messages so that the system (CatTask) can check whether they are out of date and ignore them. We should take into account that the clocks (current time) of the servers may differ slightly.
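The second choice boils down to a small age check that leaves some room for clock differences between servers. The sketch below assumes the Send-date is stored in UTC; MessageFreshness and the parameter names are hypothetical, and the actual maximum age and tolerance still have to be chosen.

```csharp
using System;

public static class MessageFreshness
{
    // A message is out of date when its age exceeds the allowed maximum
    // plus a tolerance for clock differences between the servers.
    public static bool IsExpired(
        DateTime sentAtUtc,
        DateTime nowUtc,
        TimeSpan maxAge,
        TimeSpan clockSkewTolerance)
    {
        TimeSpan age = nowUtc - sentAtUtc;
        return age > maxAge + clockSkewTolerance;
    }
}
```

In the 7h00/8h00 example above, with a maximum age of 5 minutes and a tolerance of 1 minute, the one-hour-old election request would be recognised as expired and ignored by the consumer.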
Sequence diagrams
- This diagram illustrates the flow when a task is scheduled:
- Controller receives a task from the queue:
- Controller sends task to a worker:
- LocalDispatcher receives messages from its queue: it may be a report of the controller or a result of executing a SystemTask:
- Worker receives a message from its queue:
Active controller election
When is a controller not available?
- The server/IIS is down.
- The site is stopped to be disted or upgraded.
When is a controller not running?
- The site is recycled by IIS and there has been no request to it since then.
- Unhandled exception. The site crashes.
When does a controller start up?
- Recover from crash.
- Start after disting/upgrading.
- Recover from recycling.