Clustering

Version 23.4.8839


Clustering


Clustering allows multiple CData Arc installations to work together, processing the same data with the same configuration. Workloads can be distributed horizontally across clustered Arc installations to improve scalability and ensure availability.

Overview

To take advantage of the high-availability and failover features supported in Arc, the application should be installed on multiple systems within the same server farm (the same cluster). A load balancer then distributes incoming traffic across the multiple systems hosting Arc instances.

Cluster behind a load balancer

When configured for clustering, each Arc installation in the server farm uses the same application configuration, processes data to/from the same locations on disk, and logs transactions in the same database tables.

As a result, multiple instances of the application behave like a single instance, and any particular instance can go down without jeopardizing the performance of the cluster.

Configuring Arc for Clustering

After Arc has been installed on each node within the cluster, each installation should be configured to use the same Application Database and Application Data Directory.

Application Database

Arc uses a database to log transaction history and any errors that occur in the application. Each instance of Arc should be configured to use the same application database to ensure that all files processed end up consolidated in the database.

Windows Edition

To configure the application database in the Windows edition, the AppDb environment variable must be set to include the appropriate connection string and provider. To accomplish this, modify the Web.Config file in the ‘www’ folder of the installation directory. Within this file is a commented-out XML element called connectionStrings, for example:

<!-- connectionStrings>
  <add 
    name="AppDb" 
    connectionString="server=SQLSERVER_LOCATION;database=DATABASE_NAME;uid=USER_ID;password=PASSWORD;" 
    providerName="System.Data.SqlClient"
  />
</connectionStrings -->

Un-comment this connectionStrings element and set the ‘connectionString’ and ‘providerName’ attributes to the appropriate connection parameters for the desired database. If Arc can successfully establish a connection with this connection string, it will use this database as the application database.

Embedded Java Server

When using the Java edition with the embedded Jetty server, the application database is configured in the arc.xml file found in the “webapp” folder of the installation directory. Within this server configuration file, the APP_DB environment variable must be set to a JDBC connection string containing the appropriate connection parameters for the desired database. For example:

<Call name="setInitParameter">
  <Arg>APP_DB</Arg>
  <Arg>jdbc:mysql:Server=MySQLServer;Port=3306;Database=mysql;User=user;Password=password</Arg>
</Call>

If Arc can successfully establish a connection with the APP_DB connection string, it will use that database as the application database.

External Java Server

When using the Java edition with an external Java servlet (any server other than the Jetty server that is included with the application), the details of configuring the application database depend upon the specific servlet used. Using the syntax appropriate for the specific servlet, one of the following approaches should be used when configuring the server:

  • Define a JNDI datasource to include the connection properties for the target database.
  • Set the APP_DB environment variable to a JDBC connection string.

If Arc can use the JDNI datasource or APP_DB connection string to connect to a database, it will use that database as the application database.

Application Data Directory

Arc stores all configuration data and application data in a folder on disk called the data directory. When clustering, each instance of Arc should be configured to use the same data directory. This ensures that all instances are processing the same files and using the same configuration.

Windows Edition

To configure the application data directory in the Windows edition, the AppDirectory environment variable must be set to the path where the directory should be created. To accomplish this, modify the Web.Config file in the ‘www’ folder of the installation directory. Within this file is a commented-out XML element called AppDirectory, and below this an element called appSettings where a custom data directory location can be specified:

<!-- appSettings>
  <add key="AppDirectory" value="C:\\directory\\subdirectory\\subdirectory\\" />
</appSettings -->

Un-comment this appSettings element and set the AppDirectory key value to the appropriate path on disk for the data directory. If Arc can find the path, and has the appropriate permissions to read and write at the given path, it will create the data folder within the specified directory.

Embedded Java Server

When using the Java edition with the embedded Jetty server, the application database is configured in the arc.xml file found in the “webapp” folder of the installation directory. Within this server configuration file, the AppDirectory environment variable must be set to the path to the desired directory. The following example demonstrates what this might look like when setting the data directory to a shared folder on a mounted drive:

<Call name="setInitParameter">
  <Arg>AppDirectory</Arg>
  <Arg>/mnt/shared/arc</Arg>
</Call>

If Arc can find the path, and has the appropriate permissions to read and write at the given path, it will create the data folder within the specified directory.

External Java Server

When using the Java edition with an external Java servlet (any server other than the Jetty server that is included with the application), the details of configuring the application data directory depend upon the specific servlet used. Using the syntax appropriate for the specific servlet, the AppDirectory environment variable must be set to the path to the desired directory.

If Arc can find the AppDirectory path, and has the appropriate permissions to read and write at the given path, it will create the data folder within the specified directory.

Locking and Concurrency

Arc uses locks to ensure that multiple instances do not interfere with each other or process the same file twice. In a clustered environment, efficient locking is critical to maintain throughput and prevent collisions. Thus it is strongly recommended not to cluster Arc instances across multiple server farms, such that file system latency might come into play.

Setting a shared Application Directory for each instance of Arc is sufficient to ensure that the file locks are respected by each instance.