The ARE also lets agents persist themselves in a database. There are currently two ways of doing this.
The first option is checkpointing an agent. This means that a snapshot is taken of the current state of the agent, after which this snapshot is stored in a database. As a result, agents can easily be recovered after a system shutdown. As a value-added extra, the ARE also does automatic checkpointing on agents.
The second option is for the agent to suspend itself. This means that the state of the agent is extracted and stored in the database, after which the agent is removed from the habitat. The agent remains in this suspended state until a message arrives, at which point it is transparently restored in the habitat. This gives idle agents the chance to relieve the system of their load.
For an agent to use persistency, some requirements must be met. Broadly speaking, the Habitat must be connected to a database and the agent must implement an interface that tags it as persistable.
Deploying a Habitat connected to a database is described in Section 7.4, “Habitats and Databases”. For the rest of this section it is assumed there is a database connected.
For agents, the following applies:
If the agent implements TransientAgent, it will never be suspended.
If the agent implements StateSerializable, it will be suspended logically.
If the agent implements java.io.Serializable, it will be suspended in binary form.
If an agent implements more than one of these interfaces, the first interface in the list above is authoritative.
Example 4.1.
An agent implementing both TransientAgent and Serializable can be transported, but it will never be persisted.
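The precedence rule above can be sketched as a chain of type checks. This is a minimal illustration and not the ARE's actual implementation: the TransientAgent and StateSerializable interfaces below are local stand-ins for the ones shipped with the ARE, and PersistenceMode, PersistenceModes, CourierAgent and LedgerAgent are hypothetical names introduced only for this example.

```java
import java.io.Serializable;

// Local stand-ins for the ARE interfaces (assumption: the real ones live in the ARE libraries).
interface TransientAgent {}

interface StateSerializable {
    String getState();
    void setState(String state);
}

// Hypothetical enum naming the three possible outcomes.
enum PersistenceMode { NONE, LOGICAL, BINARY }

final class PersistenceModes {
    private PersistenceModes() {}

    // Checks the interfaces in the documented order; the first match wins.
    static PersistenceMode resolve(Object agent) {
        if (agent instanceof TransientAgent) return PersistenceMode.NONE;
        if (agent instanceof StateSerializable) return PersistenceMode.LOGICAL;
        if (agent instanceof Serializable) return PersistenceMode.BINARY;
        // Assumption: an agent without any tagging interface is not persistable.
        return PersistenceMode.NONE;
    }
}

// Mirrors Example 4.1: transportable (Serializable) but never persisted (TransientAgent).
class CourierAgent implements TransientAgent, Serializable {
    private static final long serialVersionUID = 1L;
}

// Implements both persistable interfaces; StateSerializable comes first in the list.
class LedgerAgent implements StateSerializable, Serializable {
    private static final long serialVersionUID = 1L;
    private String state = "";

    @Override public String getState() { return state; }
    @Override public void setState(String state) { this.state = state; }
}
```

Under these assumptions, `PersistenceModes.resolve(new CourierAgent())` yields `NONE` and `PersistenceModes.resolve(new LedgerAgent())` yields `LOGICAL`.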
When an agent is restored, it is initially instantiated without arguments. This means that if the only constructor of an agent's main class takes a String[] args parameter, that constructor must handle an empty args array correctly.
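A constructor that tolerates an empty argument array might look like the following sketch. GreeterAgent and its default value are hypothetical; a real agent would also extend the ARE's agent base class, which is left out here to keep the example self-contained.

```java
// Hypothetical agent main class, reduced to the constructor logic.
class GreeterAgent {
    private static final String DEFAULT_GREETING = "hello";

    private final String greeting;

    // The only constructor. On restore no arguments are supplied, so args may be empty.
    GreeterAgent(String[] args) {
        // Check the length before indexing; the null check is purely defensive.
        this.greeting = (args != null && args.length > 0) ? args[0] : DEFAULT_GREETING;
    }

    String greeting() {
        return greeting;
    }
}
```

Indexing `args[0]` without the length check would throw an ArrayIndexOutOfBoundsException when the agent is restored.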
Checkpointing is the process of storing the state of an agent in a database. This can be used for agent recovery after a system shutdown, thus ensuring a consistent habitat after a restart.
Checkpointing is done both automatically and on request. Automatic checkpointing takes place at the following points in time:
When the agent is created
After a clone (only the cloned agent)
When the agent is upgraded
Failures during an automatic checkpoint are logged. No agent is notified of this event.
Agents can also checkpoint on request, for instance after important transactions. This is done by sending a request using checkpoint-sender, as described in The checkpoint-sender protocol.
Unlike automatic checkpoints, failures during a checkpoint on request are reported to the agent.
Agents can also be moved to the database. This process is called suspending and only occurs on request.
When an agent is suspended, it is descheduled and as much of its data as possible is removed from memory. Prudent use of suspending can significantly lower the memory and CPU requirements of your habitat.
The fact that an agent is stored in the database is transparent to the rest of the world. The agent can still be addressed and messages will still arrive. This is done using a transparent wake-up mechanism: the agent is automatically restored to the Habitat when a message arrives.
Agents can also request to be woken up after a certain time has elapsed. When no message has arrived before this time-out, the Wakeup System Agent will send a wakeup call to the agent.
Due to the asynchronous nature of the process, there is a window during persistency in which the agent is unreachable. As a result, sending a message to an agent that is being persisted can result in a NoSuchAgentException or a delivery failure.
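One way a sender could cope with this window is to retry a few times before giving up. The sketch below is an illustration only and not something the ARE prescribes: NoSuchAgentException is declared locally as an unchecked stand-in (the real class and its hierarchy may differ), and MessageSend and RetryingSender are hypothetical helpers that are not part of the ARE API.

```java
// Stand-in for the ARE exception named above (assumption: modelled here as unchecked).
class NoSuchAgentException extends RuntimeException {
    NoSuchAgentException(String message) {
        super(message);
    }
}

// Hypothetical abstraction over "send one message to the target agent".
interface MessageSend {
    void send();
}

final class RetryingSender {
    private RetryingSender() {}

    // Tries the send up to maxAttempts times and returns the attempt that succeeded.
    // The last NoSuchAgentException is rethrown if every attempt fails.
    static int sendWithRetry(MessageSend send, int maxAttempts, long pauseMillis) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        for (int attempt = 1; ; attempt++) {
            try {
                send.send();
                return attempt;
            } catch (NoSuchAgentException e) {
                if (attempt >= maxAttempts) {
                    throw e;
                }
                pause(pauseMillis); // give the target time to finish persisting
            }
        }
    }

    private static void pause(long millis) {
        try {
            Thread.sleep(millis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt status
            throw new IllegalStateException("interrupted while waiting to retry", e);
        }
    }
}
```

A suitable number of attempts and pause length depend on how long persisting takes in a given habitat; the source gives no figures, so both are left as parameters.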
When an agent wants to suspend itself, it sends a message to its local Place System Agent, as described in The suspend-sender protocol.
The ARE provides two modes of persistence: binary and logical. These modes refer to the way state is extracted from agents and stored in the database.
Binary persistence is accomplished by implementing the java.io.Serializable interface [7]. Extracting state then automatically follows the general contract for serializing objects as specified in The Java™ Object Serialization Specification.
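The standard serialization contract can be tried out in isolation with the sketch below. It uses only java.io classes and does not show how the ARE stores the resulting bytes; CounterAgent and BinarySnapshot are hypothetical names introduced for this example.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;

// Hypothetical agent class; only the serialization-relevant parts are shown.
class CounterAgent implements Serializable {
    private static final long serialVersionUID = 1L;

    int processed;                                          // included in the serialized state
    transient StringBuilder scratch = new StringBuilder(); // skipped by serialization
}

final class BinarySnapshot {
    private BinarySnapshot() {}

    // Serializes the object graph reachable from the agent into a byte array.
    static byte[] extract(Serializable agent) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(buffer)) {
            out.writeObject(agent);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    // Rebuilds the object from bytes produced by extract.
    static Object restore(byte[] bytes) {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bytes))) {
            return in.readObject();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException("agent class not found during restore", e);
        }
    }
}
```

Note that transient fields such as `scratch` come back as null after a round trip, because field initializers of a Serializable class are not run during deserialization. An agent that relies on such fields has to re-create them itself.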
Logical persistence is accomplished by implementing the tryllian.are.StateSerializable interface. In this case, state is extracted using the method StateSerializable.getState() and restored using the StateSerializable.setState(String) method.
Logical persistence has a few advantages over binary persistence. Binary serialization of an object traverses the entire object graph referenced by the agent and serializes the state of every object it encounters. This serialization is done in a machine-readable format.
When persisting agents, large parts of the internal state are not of interest. Therefore the ARE provides logical persistence, where an agent has complete control over exactly which parts of its state are persisted.
Furthermore, since logical persistence can result in human-readable strings, it becomes possible to debug the state of agents in the database without specialized tools.
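As an illustration of logical persistence, the sketch below persists only two fields as a readable key=value string. The StateSerializable interface is redeclared locally as a stand-in for tryllian.are.StateSerializable, and the String return type of getState() is an assumption; OrderAgent and its string format are hypothetical.

```java
// Local stand-in for tryllian.are.StateSerializable (assumption: getState() returns a String).
interface StateSerializable {
    String getState();
    void setState(String state);
}

// Hypothetical agent that persists only the fields it cares about.
class OrderAgent implements StateSerializable {
    private String customer = "";
    private int openOrders;
    private int messagesThisSession; // bookkeeping that is deliberately not persisted

    void record(String customer, int openOrders) {
        this.customer = customer;
        this.openOrders = openOrders;
        this.messagesThisSession++;
    }

    int messagesThisSession() {
        return messagesThisSession;
    }

    // Produces a human-readable string such as "customer=acme;openOrders=2".
    // Assumption for this sketch: the customer name contains no ';' character.
    @Override
    public String getState() {
        return "customer=" + customer + ";openOrders=" + openOrders;
    }

    // Parses the format written by getState; unknown keys are ignored.
    @Override
    public void setState(String state) {
        for (String pair : state.split(";")) {
            int eq = pair.indexOf('=');
            if (eq < 0) {
                continue; // skip empty or malformed fragments
            }
            String key = pair.substring(0, eq);
            String value = pair.substring(eq + 1);
            switch (key) {
                case "customer":
                    customer = value;
                    break;
                case "openOrders":
                    openOrders = Integer.parseInt(value);
                    break;
                default:
                    break;
            }
        }
    }
}
```

Because the stored value is plain text, a row in the database can be inspected directly. The price is that the agent has to maintain the format itself, including escaping if values may contain the separator.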
While the Java™ Serialization mechanism allows you to override the serialization behavior, the StateSerializable interface is still necessary. The ARE uses serialization for two purposes: mobility and persistency. It makes sense to distinguish these two kinds of state transfer by introducing a separate interface.
The coupling between JNDI and persistency currently has a small shortcoming. In theory, only persistent agents should have persistent JNDI bindings. But since the JNDI module cannot distinguish transient agents from persistable agents, every JNDI binding is currently persisted.
As a result, the database modifications necessary for a restore also include purging all entries related to transient agents from the JNDI database.
This is described in detail in Section 7.4, “Habitats and Databases”.
[7] Or if you prefer, the java.io.Externalizable interface. On the ARE level these two interfaces are not distinguished.