Worker Queues for near real time post processing based on API Providers
satya - 12/17/2013 9:44:17 AM
Problem
There is a server out there that wants to consume or process a payload or a message.
A simple approach is to call that API in real time. However, the server may be down when the caller tries. So the caller needs to call again, keeping a status that includes how many retries have been performed.
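A minimal sketch of that retry bookkeeping, assuming the server call raises ConnectionError when the server is down. WorkItem, deliver, and call_server are hypothetical names made up for illustration, and a real caller would probably wait between attempts.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class WorkItem:
    payload: str
    attempts: int = 0          # how many times the call has been tried
    status: str = "pending"    # "pending", "done", or "failed"


def deliver(item: WorkItem,
            call_server: Callable[[str], None],
            max_attempts: int = 3) -> WorkItem:
    """Call the server until it succeeds or the attempts run out."""
    while item.attempts < max_attempts and item.status != "done":
        item.attempts += 1
        try:
            call_server(item.payload)   # hypothetical API call
            item.status = "done"
        except ConnectionError:
            # Assumption: a down server shows up as ConnectionError.
            if item.attempts >= max_attempts:
                item.status = "failed"
    return item
```

The status and the attempt count live on the work item itself, so whoever looks at it later can tell how far it got.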
The simple approach gets interesting when multiple machines are trying to do that work and contact the server. Then each worker needs to tell the other workers that a particular message is already claimed.
Further, the source needs to know, over time, the status of each work item: how many times it has been tried and whether it succeeded.
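One way to let workers claim messages without talking to each other directly is a shared status table with a conditional update. The sketch below assumes a SQL store (sqlite3 in memory here, standing in for whatever database the source uses); the table and function names are hypothetical.

```python
import sqlite3


def make_store():
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE work_items ("
        " id INTEGER PRIMARY KEY, payload TEXT,"
        " status TEXT DEFAULT 'pending',"
        " attempts INTEGER DEFAULT 0, claimed_by TEXT)"
    )
    return db


def claim_next(db, worker):
    """Claim the oldest pending item; return its id, or None if nothing was claimed."""
    row = db.execute(
        "SELECT id FROM work_items WHERE status = 'pending' ORDER BY id LIMIT 1"
    ).fetchone()
    if row is None:
        return None
    # The "AND status = 'pending'" guard is the claim itself: if another
    # worker got there first, no row matches and rowcount is 0.
    cur = db.execute(
        "UPDATE work_items SET status = 'claimed', claimed_by = ?,"
        " attempts = attempts + 1 WHERE id = ? AND status = 'pending'",
        (worker, row[0]),
    )
    db.commit()
    return row[0] if cur.rowcount == 1 else None


def finish(db, item_id, ok):
    """Record the outcome; a failed item goes back to pending for a retry."""
    db.execute(
        "UPDATE work_items SET status = ? WHERE id = ?",
        ("done" if ok else "pending", item_id),
    )
    db.commit()
```

The same table answers the source's question about each work item: its status, who claimed it, and how many times it has been tried.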
One also has to investigate whether the server allows parallel processing of these API calls or whether they need to be sequential. If parallelism is allowed, what is the optimal parallel load? Will the parallelism be done by multiple threads or multiple machines?
Of course, this kind of problem is typically solved by sending a file over ftp overnight, which is essentially a shortcut (or long cut) for queuing. That brings its own set of issues, such as updating the source properly and the potential for a lot more errors.
satya - 12/17/2013 9:53:28 AM
Some common sense solutions
Ensure the server is multi-threaded.
Assume the messages are stateless and don't need to be sequential.
Have the source drop them into a queue. Have the ability for the queue to spawn workers based on a number of policies, including a) immediate b) timed with a threshold
Have the ability for worker threads or the queue to update the status on the source.
Have the queue management alert on system level exceptions.
Have the ability to requeue if the queue is damaged.
Have the ability for multiple processes to queue effectively.
Make the queuing framework an abstraction, allowing code to be written in a native language with no knowledge of the queue.
Make the queuing declarative or configuration driven.
Run rules so that source events, messages, or database entities are queued based on certain properties.
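A few of the points above (the source drops messages into a queue, the queue spawns workers, workers report status back to the source, and the handler code knows nothing about the queue) can be sketched with the standard library's queue and threading modules. run_workers, handler, and on_status are hypothetical names, and this covers only the "immediate" policy.

```python
import queue
import threading


def run_workers(messages, handler, on_status, worker_count=2):
    """Queue the messages, process them on worker threads, report each outcome."""
    q = queue.Queue()
    for msg in messages:
        q.put(msg)

    def worker():
        while True:
            try:
                msg = q.get_nowait()
            except queue.Empty:
                return                      # nothing left, worker exits
            try:
                handler(msg)                # plain function, unaware of the queue
                on_status(msg, "done")
            except Exception as exc:
                on_status(msg, f"error: {exc}")
            finally:
                q.task_done()

    threads = [threading.Thread(target=worker) for _ in range(worker_count)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
```

Here on_status stands in for whatever updates the source, for example a database write.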
satya - 12/17/2013 9:53:43 AM
This pattern can replace ftp to provide near real time processing
satya - 12/17/2013 9:53:53 AM
Look for tools in this space.
satya - 12/17/2013 9:57:16 AM
Server needs for this to work well
Expose the functionality as a real time service that sends a response back
Allow for single message or multiple messages in the payload of the API
Make the API object centric and not XML centric
Where possible, declare meta APIs that advertise optimum server loads, times, and parallelism so that clients can configure themselves.
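To make the meta API idea concrete, here is a rough sketch of a client sizing itself from server-published hints. The JSON shape and the field names max_parallel_calls and max_batch_size are pure assumptions for illustration; no real API is implied.

```python
import json

# Hypothetical response from a hypothetical "load hints" meta API.
HINTS_JSON = '{"max_parallel_calls": 4, "max_batch_size": 50}'


def plan_batches(messages, hints):
    """Split messages into payload-sized batches and pick a worker count."""
    size = hints["max_batch_size"]
    batches = [messages[i:i + size] for i in range(0, len(messages), size)]
    # Never run more workers than the server advertises or than there are batches.
    workers = min(hints["max_parallel_calls"], len(batches))
    return workers, batches


hints = json.loads(HINTS_JSON)
workers, batches = plan_batches(list(range(120)), hints)
```

Each batch would go out as one multi-message payload, which is why the API needs to accept both single and multiple messages.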
satya - 12/17/2013 9:58:42 AM
Advantages
Provides near real time processing
This keeps servers clean and simple to write
Less prone to errors as there are fewer points of failure
Allows parallel load balanced servers
satya - 12/17/2013 10:04:38 AM
Some drawbacks
It pushes the work to the client
At the expense of simplicity, it can throttle servers for scale
A message pattern may be better for servers that require total scale
satya - 12/17/2013 10:07:01 AM
Wonder... why not do the queuing on the server?
What if the client simply sends a message and the server queues it? You still need a queue on the client then, because the server may not be available.
Needs more thought...
satya - 12/17/2013 10:17:15 AM
I will be doing more research on this and post what I find....
satya - 12/17/2013 10:42:50 AM
Some alternative patterns
ftp
Server-side queuing
Message based
satya - 12/17/2013 10:49:00 AM
Challenge
How can multiple processes arrange a set of objects into a queue?
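One possible answer, sketched under assumptions: let a shared database assign the order. Below, each sqlite3 connection to the same file stands in for a separate process, and an AUTOINCREMENT column gives every enqueued object a sequence number. The table and function names are hypothetical.

```python
import sqlite3


def open_queue(path):
    """Open (and create if needed) the shared queue table at the given file path."""
    db = sqlite3.connect(path, timeout=5.0)   # wait up to 5s if another writer holds the lock
    db.execute(
        "CREATE TABLE IF NOT EXISTS queue ("
        " seq INTEGER PRIMARY KEY AUTOINCREMENT,"
        " producer TEXT, payload TEXT)"
    )
    db.commit()
    return db


def enqueue(db, producer, payload):
    with db:   # commits on success, rolls back on error
        db.execute(
            "INSERT INTO queue (producer, payload) VALUES (?, ?)",
            (producer, payload),
        )


def drain(db):
    """Read the queue in the order the database assigned."""
    return db.execute(
        "SELECT producer, payload FROM queue ORDER BY seq"
    ).fetchall()
```

The producers never coordinate with each other; the database serializes the inserts, so the queue order is simply the commit order.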