Worker Queues for near real time post processing based on API Providers

satya - 12/17/2013 9:44:17 AM

Problem

There is a server out there that wants to consume or process a payload or a message.

A simple approach is to call that API in real time. However, the server may be down when the caller tries. So the caller needs to call again, keeping a status that includes how many retries have been performed.
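A minimal sketch of that retry bookkeeping, assuming a hypothetical `call_api` function that raises `ConnectionError` when the server is down. The `WorkItem` fields and the `max_attempts` limit are illustrative names, not part of any particular framework, and a real caller would wait between attempts.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class WorkItem:
    """One payload plus the bookkeeping the caller keeps about it."""
    payload: str
    status: str = "pending"   # pending -> done | failed
    attempts: int = 0
    last_error: str = ""


def deliver(item: WorkItem, call_api: Callable[[str], None],
            max_attempts: int = 3) -> WorkItem:
    """Call the API until it succeeds or max_attempts is used up."""
    while item.status == "pending":
        item.attempts += 1
        try:
            call_api(item.payload)
            item.status = "done"
        except ConnectionError as exc:
            # Remember why the attempt failed so the source can report it.
            item.last_error = str(exc)
            if item.attempts >= max_attempts:
                item.status = "failed"
    return item
```

The item carries its own status and attempt count, so the source can inspect it at any time.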

The simple approach gets interesting when there are multiple machines trying to do that work and contact the server. Then each worker needs to tell the other workers that a particular message is already claimed.

Further, the source needs to know, over time, the status of each work item: how many times it has been tried and whether it succeeded.

Also, one has to investigate: can the server allow parallel processing of these API calls, or do they need to be sequential? If parallelism is allowed, what is the optimal parallel load? Will the parallelism be done by multiple threads or multiple machines?
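If the server does allow parallel calls and threads are chosen, the parallel load can be capped on the client side. A small sketch using Python's standard thread pool; `call_api` and `max_parallel` are placeholder names, and the right value for `max_parallel` is exactly the open question above.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable, List


def send_all(payloads: Iterable[str], call_api: Callable[[str], str],
             max_parallel: int) -> List[str]:
    """Send payloads with at most max_parallel calls in flight at once."""
    with ThreadPoolExecutor(max_workers=max_parallel) as pool:
        # map() yields results in input order, even if the
        # underlying calls finish out of order.
        return list(pool.map(call_api, payloads))
```

Setting `max_parallel` to 1 gives the sequential case with the same code.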

Of course, this kind of problem is typically solved by sending a file over ftp overnight, which is essentially a short (or long) cut for queuing. That brings its own set of issues, such as updating the source properly and the potential for a lot more errors.

satya - 12/17/2013 9:53:28 AM

Some common sense solutions

Ensure the server is multi-threaded.

Assume the messages are stateless and don't need to be processed sequentially.

Have the source drop them into a queue. Have the ability for the queue to spawn workers based on a number of policies, including a) immediate b) timed with a threshold.

Have the ability for worker threads or the queue to impact or update the status on the source.

Have the queue management alert on system level exceptions.

Have the ability to requeue if the queue is damaged.

Have the ability for multiple processes to enqueue effectively.

Make the queuing framework an abstraction, allowing code to be written in a native language with no idea of a queue.

Make the queuing declarative or configuration driven.

Run rules for source events or messages or database entities to be queued based on certain properties.
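The worker-spawning policies and the configuration-driven idea above could look something like this. It is a sketch under assumptions: the config keys (`policy`, `max_items`, `max_wait_seconds`) and the `PolicyQueue` class are invented for illustration, and time is passed in explicitly so the behavior is easy to follow.

```python
from typing import Callable, Dict, List


class PolicyQueue:
    """Buffers messages and hands them to a worker according to a policy."""

    def __init__(self, config: Dict, worker: Callable[[List[str]], None]):
        self.policy = config["policy"]            # "immediate" or "threshold"
        self.max_items = config.get("max_items", 10)
        self.max_wait = config.get("max_wait_seconds", 60.0)
        self.worker = worker
        self.buffer: List[str] = []
        self.oldest = None  # arrival time of the oldest buffered message

    def put(self, message: str, now: float) -> None:
        """Add a message, then check whether the policy calls for a flush."""
        if not self.buffer:
            self.oldest = now
        self.buffer.append(message)
        self.tick(now)

    def tick(self, now: float) -> None:
        """Flush if the policy says so; call periodically for the timed case."""
        if not self.buffer:
            return
        due = (
            self.policy == "immediate"
            or len(self.buffer) >= self.max_items
            or now - self.oldest >= self.max_wait
        )
        if due:
            batch, self.buffer = self.buffer, []
            self.worker(batch)
```

The worker only ever sees a list of messages, so it can be written with no idea of a queue, and switching from immediate to batched delivery is a configuration change.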

satya - 12/17/2013 9:53:43 AM

This pattern can replace ftp to provide near real time processing


satya - 12/17/2013 9:53:53 AM

Look for tools in this space.

satya - 12/17/2013 9:57:16 AM

Server needs for this to work well

Make the functionality a real time service that sends a response back

Allow for single message or multiple messages in the payload of the API

Make the API object centric and not XML centric

Where possible, declare meta APIs that advertise optimum server loads, times, and parallelism so that clients can configure themselves.
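To make the meta API idea concrete, here is a sketch of a client turning server-advertised hints into its own settings. The hint field names (`optimal_parallel_calls`, `max_messages_per_call`, `retry_after_seconds`) are hypothetical; no real server is assumed to publish them.

```python
from dataclasses import dataclass


@dataclass
class ClientSettings:
    max_parallel: int
    batch_size: int
    retry_after_seconds: float


def settings_from_hints(hints: dict, local_cap: int = 8) -> ClientSettings:
    """Turn the server's advertised limits into client settings."""
    return ClientSettings(
        # Never exceed what this client machine can sustain.
        max_parallel=min(int(hints.get("optimal_parallel_calls", 1)), local_cap),
        # Fall back to one message per call if the server says nothing.
        batch_size=int(hints.get("max_messages_per_call", 1)),
        retry_after_seconds=float(hints.get("retry_after_seconds", 30)),
    )
```

Missing hints fall back to the most conservative choice: one call at a time, one message per call.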

satya - 12/17/2013 9:58:42 AM

Advantages

Provides near real time processing

This keeps servers clean and simple to write

Less prone to errors as there are fewer points of failure

Allows parallel load balanced servers

satya - 12/17/2013 10:04:38 AM

Some drawbacks

It pushes the work to the client

The price of simplicity is that it can throttle servers at scale

A message pattern may be better for servers that require total scale

satya - 12/17/2013 10:07:01 AM

Wonder....why not use the queuing on the server??

What if the client simply sends a message and the server queues it? You still need a queue on the client side then, because the server may not be available.

needs more thought.....

satya - 12/17/2013 10:17:15 AM

I will be doing more research on this and post what I find....


satya - 12/17/2013 10:42:50 AM

Some alternative patterns


ftp
Server-side queuing
Message based

satya - 12/17/2013 10:49:00 AM

Challenge

How can multiple processes arrange a set of objects into a queue?
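One possible answer, sketched here with sqlite as a stand-in for any shared store: let the store hand out the sequence numbers, so producers never have to agree on ordering among themselves. The `queue` table and the function names are illustrative only.

```python
import sqlite3


def open_queue(path: str) -> sqlite3.Connection:
    """Open (and if needed create) the shared queue file."""
    conn = sqlite3.connect(path, timeout=10)  # wait up to 10 s on a locked database
    with conn:
        conn.execute(
            "CREATE TABLE IF NOT EXISTS queue "
            "(seq INTEGER PRIMARY KEY AUTOINCREMENT, producer TEXT, payload TEXT)"
        )
    return conn


def enqueue(conn: sqlite3.Connection, producer: str, payload: str) -> int:
    """Append one object; the database assigns its position in the queue."""
    with conn:
        cur = conn.execute(
            "INSERT INTO queue (producer, payload) VALUES (?, ?)",
            (producer, payload),
        )
    return cur.lastrowid


def drain(conn: sqlite3.Connection) -> list:
    """Read everything in the order the inserts were committed."""
    return conn.execute(
        "SELECT producer, payload FROM queue ORDER BY seq"
    ).fetchall()
```

Each process opens its own connection to the same file. Writes are serialized by the database, which is fine for modest volumes but would need a real queueing tool at higher loads.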