Troubleshooting

This section describes common problems that you may encounter when installing or using ScaleOut StateServer, their probable causes, and suggested remedies. To avoid difficulties, carefully follow the Installation Steps when installing and configuring ScaleOut StateServer.

Problem: Management tools cannot connect to the local service.

Description: The Management Console and tools described in Command-line Control Programs cannot connect to the local StateServer service.

Probable cause(s) and resolution:

  1. The StateServer service is not running. Use the Windows Service Control Manager to check that the service is installed and running.
  2. The StateServer service prematurely exits because the parameters file has become corrupted or was deleted. Restore the file or reinstall ScaleOut StateServer.
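The service check in cause 1 can also be scripted. The following is a minimal Python sketch, not part of the product: it runs the standard Windows sc query command and extracts the STATE field from its output. It assumes that soss (the name used with net start elsewhere in this section) is also the name that sc accepts; verify this on your installation. The function names are hypothetical.

```python
import re
import subprocess


def parse_service_state(sc_output):
    """Return the state name (for example 'RUNNING') from `sc query` output.

    Returns None when no STATE line is present, which is what happens
    when the service is not installed and sc prints an error instead.
    """
    match = re.search(r"STATE\s*:\s*\d+\s+(\w+)", sc_output)
    return match.group(1) if match else None


def query_service_state(service_name="soss"):
    """Run `sc query <service_name>` and return the parsed state.

    Assumption: 'soss' is the service name registered on this host.
    Only works on Windows, where the sc command is available.
    """
    result = subprocess.run(
        ["sc", "query", service_name],
        capture_output=True,
        text=True,
    )
    return parse_service_state(result.stdout)
```

On a Windows host, calling query_service_state() returns a state name such as RUNNING or STOPPED, or None if the service is not installed.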

Problem: The service cannot discover other hosts.

Description: The management tools show only the local host, and the store contains only a single host.

Probable cause(s) and resolution:

  1. The multicast IP address and management port are not set identically on all hosts. Check the parameter settings on all hosts.
  2. The hosts are using different subnets. Using the management tools, check that the net_interface parameter is the same for all hosts.
  3. A network outage prevents the local host from connecting to other hosts. Recheck all network connections.
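If you collect the parameter settings from each host, the comparison in causes 1 and 2 can be automated. The sketch below is illustrative only: it assumes you have already gathered each host's settings into a dictionary by some means of your own. The key names mcast_ip and mgt_port are hypothetical placeholders for the multicast IP address and management port (only net_interface is named in this section), and the values shown are made-up samples.

```python
def find_mismatched_parameters(host_params, keys):
    """Return {key: {host: value}} for every key whose value differs across hosts.

    host_params maps a host name to a dict of that host's parameter settings.
    A key missing on a host is treated as the value None, so a missing
    setting is also reported as a mismatch.
    """
    mismatches = {}
    for key in keys:
        values = {host: params.get(key) for host, params in host_params.items()}
        if len(set(values.values())) > 1:
            mismatches[key] = values
    return mismatches


# Sample data (hypothetical key names and values, for illustration only).
settings = {
    "host-a": {"mcast_ip": "224.0.0.17", "mgt_port": 720, "net_interface": "10.0.0.0"},
    "host-b": {"mcast_ip": "224.0.0.17", "mgt_port": 721, "net_interface": "10.0.0.0"},
}
report = find_mismatched_parameters(settings, ["mcast_ip", "mgt_port", "net_interface"])
for key, values in report.items():
    print(f"{key} differs: {values}")
```

Any parameter listed in the report must be corrected so that it is identical on all hosts.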

Problem: The local host does not join the store when activated.

Description: The management tools show that the local host remains inactive even though it has been started.

Probable cause(s) and resolution:

  1. The parameters file has invalid values. Check the event log for message 9, which confirms this condition. Using the management tools, recheck the parameters and correct any invalid values. Make sure that you have typed the license key correctly.
  2. An evaluation license key or temporary license key has passed its expiration date. Note that the expiration date is based on the date encoded in the key and not the date you installed the software. Please contact ScaleOut Software for assistance.
  3. You have reached the maximum number of licensed hosts. This number is encoded in your license key. Contact ScaleOut Software to upgrade your license and receive a new key.

Problem: One or more hosts intermittently report network outages.

Description: The event log shows that a host is experiencing multiple network outages although it appears to recover before becoming isolated and restarting.

Probable cause(s) and resolution:

  1. You have a very high network load (more than approximately 90% utilization) that causes ScaleOut StateServer’s heartbeat mechanism to report network outages. You may need to reduce the network load for the heartbeat messages to be reliably received.
  2. A network interface card, connector, or switch port is experiencing intermittent problems. Troubleshoot your network hardware to verify that the network is working reliably.
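To judge whether network load is near the approximately 90% level mentioned in cause 1, you can estimate link utilization from two readings of an interface's byte counter. The following is a rough sketch under stated assumptions: you obtain the counter readings and the link speed yourself (for example, from your operating system's or switch's statistics), and the function names are hypothetical. It measures one direction of traffic only.

```python
def link_utilization(bytes_start, bytes_end, interval_seconds, link_bits_per_second):
    """Return the fraction of link capacity used between two counter readings.

    bytes_start and bytes_end are cumulative byte counts sampled
    interval_seconds apart; link_bits_per_second is the nominal link speed.
    """
    if interval_seconds <= 0 or link_bits_per_second <= 0:
        raise ValueError("interval and link speed must be positive")
    bits_transferred = (bytes_end - bytes_start) * 8
    return bits_transferred / (interval_seconds * link_bits_per_second)


def may_disrupt_heartbeats(utilization, threshold=0.90):
    """Return True when utilization exceeds the approximate 90% level.

    The threshold is the approximate figure quoted in this section,
    not a precise limit.
    """
    return utilization > threshold


# Example: 1,187,500,000 bytes in 10 seconds on a 1 Gbit/s link.
utilization = link_utilization(0, 1_187_500_000, 10, 1_000_000_000)
print(f"utilization: {utilization:.0%}, at risk: {may_disrupt_heartbeats(utilization)}")
```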

Problem: A host repeatedly restarts its StateServer service.

Description: A host repeatedly joins the store, becomes isolated, and restarts its service.

Probable cause(s) and resolution:

  1. This host is experiencing intermittent network outages that last long enough to prevent it from recovering before it restarts. If a host remains isolated for one minute, it restarts itself so that the store can heal without it. When the outage is intermittent, the host may succeed in rejoining the store after restarting, only to become isolated again at the next outage. Troubleshoot your network hardware and network load to verify that the network is working reliably.

Problem: Two or more hosts appear to be "stuck" in a joining, leaving, or unknown state (signified by yellow icons in the console’s host list).

Description: Two or more hosts cannot successfully join or leave the store or are otherwise in an unknown state.

Probable cause(s) and resolution:

  1. This problem is usually caused by either a network outage that affects all hosts or by multiple, simultaneous host failures. If this problem occurs and cannot be resolved by restarting the affected hosts (using the RESTART button on the management console), the store must be restarted. To restart the store, please do the following:

    1. On each host, kill the soss_svr.exe process using the Windows Task Manager.
    2. On each host, restart the StateServer service using the Windows Service Control Manager or using the following console command: net start soss
    3. Rejoin the hosts to form a new distributed store.
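Steps 1 and 2 above can be scripted for each host. The following Python sketch is one possible approach, not a supported tool: it uses the standard Windows taskkill command in place of the Task Manager, followed by the net start soss command given above. It must be run on each host with administrative privileges, and the function names are hypothetical. Step 3 (rejoining the hosts) is still performed with the management tools.

```python
import subprocess

PROCESS_NAME = "soss_svr.exe"  # process named in step 1
SERVICE_NAME = "soss"          # service name used with net start in step 2


def local_restart_commands():
    """Return the commands that stop the process and restart the service."""
    return [
        ["taskkill", "/F", "/IM", PROCESS_NAME],  # forcibly end the process
        ["net", "start", SERVICE_NAME],           # start the service again
    ]


def restart_local_service(runner=subprocess.run):
    """Run each restart command in order on the local host.

    check=False lets the sequence continue if taskkill reports that the
    process was not running. The runner argument exists so that the
    sequence can be exercised without executing anything.
    """
    for command in local_restart_commands():
        runner(command, check=False)
```

Passing a substitute runner (for example, one that only prints each command) lets you review the sequence before running it for real.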

Problem: The StateServer service does not restart when I hit the RESTART button.

Description: The RESTART button does not cause an immediate service restart.

Probable cause(s) and resolution:

  1. The RESTART button is designed to first have the host leave the store prior to performing a service restart. This minimizes disruption to other hosts, which would otherwise detect the restart as a server failure and invoke recovery procedures. You can force the service to immediately restart by pressing the RESTART button exactly three times. You can also use the soss restart_now command, or you can restart the StateServer service from the Windows Service Control Manager.

Problem: (GeoServer Option) My local store cannot connect to a remote store for replication.

Description: In the Management Console, the remote store’s Test or Start button fails and returns an error.

Probable cause(s) and resolution:

  1. The communications link (e.g., virtual private network) to the remote store is down. Restore the link.
  2. The remote store’s hosts are configured with gateway addresses that are not reachable from the local store over the communications link. Reconfigure the remote store’s host configuration on all hosts with gateway addresses that are reachable from the local store.
  3. The remote store has not been joined. Start the remote store by having the hosts join the store.
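For causes 1 and 2, a basic TCP connection test from a local host to each configured gateway address can help confirm whether the address is reachable over the communications link. This is a generic sketch using Python's standard socket module. The port to test is an assumption you must supply from your own configuration, the function name is hypothetical, and a successful connection shows only that something is listening at that address and port.

```python
import socket


def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to (host, port) succeeds within timeout.

    Any connection failure (refused, unreachable, timed out, or an
    unresolvable host name) is reported as False.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

For example, can_connect(gateway_address, gateway_port) returns False for a gateway address that is not reachable from the local store, where both arguments are values taken from the remote store's host configuration.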

Problem: (GeoServer Option) Object replication does not seem to be working reliably.

Description: The remote store is missing objects that were created on the local store.

Probable cause(s) and resolution:

  1. The maximum memory limit was reached on the local store, and replication was suspended. Check the setting of the repl_threshold configuration parameter (see Configuration Parameters). Also, make sure that the communications link can keep up with the rate of object updates to the local store.
  2. The remote store was not synchronized when replication was started. Use the Sync button in the Management Console to replicate all objects to the remote store when replication is started.
  3. Some objects have been marked as not subject to replication. If you are using the APIs to create objects, make sure that you do not inadvertently mark them as "not subject to replication." See the API help files for details.