Clustering
Version 25.2.9314
Version 25.2.9314
Clustering
Clustering allows multiple CData Arc installations to work together, processing the same data with the same configuration. Workloads can be distributed horizontally across clustered Arc installations to improve scalability and ensure availability.
Overview
To take advantage of the high-availability and failover features supported in Arc, the application should be installed on multiple systems in the same server farm (the same cluster). A load balancer then distributes incoming traffic across the multiple systems hosting Arc instances.
When configured for clustering, each Arc installation in the server farm uses the same application configuration, processes data to and from the same locations on disk, and logs transactions in the same database tables.
As a result, multiple instances of the application behave like a single instance, and any particular instance can go down without jeopardizing the performance of the cluster.
Configuring Arc for Clustering
After Arc has been installed on each node in the cluster, each installation should be configured to use the same Application Database and Application Data Directory.
Licensing
You must apply a unique license key to each node in the cluster. Go to the licensing page for each node and apply one of your keys. The application creates a license file in the shared directory of the cluster. The license file name includes the machine name of both nodes in the shared directory.
Application Database
Arc uses a database to log transaction history and any errors that occur in the application. Each instance of Arc should be configured to use the same application database to ensure that all files processed end up consolidated in the database.
Windows Edition
To configure the application database in the Windows edition, the AppDb
environment variable must be set to include the appropriate connection string and provider. To accomplish this, modify the Web.Config
file in the www
folder of the installation directory. In this file is a commented-out XML element called connectionStrings, for example:
<!-- connectionStrings>
<add
name="AppDb"
connectionString="server=SQLSERVER_LOCATION;database=DATABASE_NAME;uid=USER_ID;password=PASSWORD;"
providerName="System.Data.SqlClient"
/>
</connectionStrings -->
Un-comment this connectionStrings element and set the connectionString
and providerName
attributes to the appropriate connection parameters for the desired database. If Arc can successfully establish a connection with this connection string, it uses this database as the application database.
Embedded Java Server
When using the Java edition with the embedded Jetty server, the application database is configured in the arc.xml
file found in the “webapp” folder of the installation directory. In this server configuration file, the APP_DB
environment variable must be set to a JDBC connection string containing the appropriate connection parameters for the desired database. For example:
<Call name="setInitParameter">
<Arg>APP_DB</Arg>
<Arg>jdbc:cdata:mysql:Server=MySQLServer;Port=3306;Database=mysql;User=user;Password=password</Arg>
</Call>
If Arc can successfully establish a connection with the APP_DB
connection string, it uses that database as the application database.
External Java Server
When using the Java edition with an external Java servlet (any server other than the Jetty server that is included with the application), the details of configuring the application database depend upon the specific servlet used. Using the syntax appropriate for the specific servlet, one of the following approaches should be used when configuring the server:
- Define a JNDI datasource to include the connection properties for the target database.
- Set the
APP_DB
environment variable to a JDBC connection string.
If Arc can use the JDNI datasource or APP_DB
connection string to connect to a database, it uses that database as the application database.
Application Data Directory
Arc stores all configuration data and application data in a folder on disk called the data directory. When clustering, each instance of Arc should be configured to use the same data directory. This ensures that all instances are processing the same files and using the same configuration.
Windows Edition
To configure the application data directory in the Windows edition, the AppDirectory
environment variable must be set to the path where the directory should be created. To accomplish this, modify the Web.Config
file in the ‘www’ folder of the installation directory. In this file is a commented-out XML element called AppDirectory, and below this an element called appSettings where a custom data directory location can be specified:
<!-- appSettings>
<add key="AppDirectory" value="C:\\directory\\subdirectory\\subdirectory\\" />
</appSettings -->
Un-comment this appSettings element and set the AppDirectory
key value to the appropriate path on disk for the data directory. If Arc can find the path, and has the appropriate permissions to read and write at the given path, it creates the data folder in the specified directory.
Embedded Java Server
When using the Java edition with the embedded Jetty server, the application database is configured in the arc.xml
file found in the “webapp” folder of the installation directory. In this server configuration file, the AppDirectory
environment variable must be set to the path to the desired directory. The following example demonstrates what this might look like when setting the data directory to a shared folder on a mounted drive:
<Call name="setInitParameter">
<Arg>AppDirectory</Arg>
<Arg>/mnt/shared/arc</Arg>
</Call>
If Arc can find the path, and has the appropriate permissions to read and write at the given path, it creates the data folder in the specified directory.
External Java Server
When using the Java edition with an external Java servlet (any server other than the Jetty server that is included with the application), the details of configuring the application data directory depend upon the specific servlet used. Using the syntax appropriate for the specific servlet, the AppDirectory
environment variable must be set to the path to the desired directory.
If Arc can find the AppDirectory
path, and has the appropriate permissions to read and write at the given path, it creates the data folder in the specified directory.
Locking and Concurrency
Arc uses locks to ensure that multiple instances do not interfere with each other or process the same file twice. In a clustered environment, efficient locking is critical to maintain throughput and prevent collisions. Thus it is strongly recommended not to cluster Arc instances across multiple server farms, such that file system latency might come into play.
Setting a shared Application Directory for each instance of Arc is sufficient to ensure that the file locks are respected by each instance.