Batch Processing
The CData ADO.NET Provider for Databricks enables you to take advantage of the bulk load support in Databricks through the DatabricksDataAdapter. You can use the ADO.NET Batch API to execute related SQL data manipulation statements together: the provider translates all of the SQL statements in a batch into a single request.
Using the ADO.NET Batch API
Performing a batch update consists of the following basic steps, illustrated in the sketch after this list:
- Define custom parameterized SQL statements in DatabricksCommand objects.
- Set the UpdatedRowSource property of the DatabricksCommand object to UpdateRowSource.None.
- Assign the DatabricksCommand objects to the DatabricksDataAdapter.
- Add the parameters to the command.
- Call the DatabricksDataAdapter's Update method. Pass in a DataSet or DataTable containing your changes.
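The following minimal sketch maps these steps to code. It anticipates the Bulk INSERT example below and assumes the same [CData].[Sample].Customers table with a CompanyName column; adjust the connection string and column mappings for your own schema.
C#
// Minimal sketch of the batch workflow; comments are keyed to the steps above.
DatabricksDataAdapter adapter = new DatabricksDataAdapter();
using (DatabricksConnection conn = new DatabricksConnection("Server=127.0.0.1;HTTPPath=MyHTTPPath;User=MyUser;Token=MyToken;")) {
conn.Open();
// Steps 1 and 3: define a parameterized command and assign it to the adapter.
adapter.InsertCommand = conn.CreateCommand();
adapter.InsertCommand.CommandText = "INSERT INTO [CData].[Sample].Customers (CompanyName) VALUES (@CompanyName)";
// Step 2: suppress per-row result mapping so the rows can be batched.
adapter.InsertCommand.UpdatedRowSource = UpdateRowSource.None;
// Step 4: add the parameters, mapping each one to a source column.
adapter.InsertCommand.Parameters.Add("@CompanyName", "CompanyName");
// Step 5: call Update with a DataTable containing the changes.
DataTable changes = new DataTable();
changes.Columns.Add("CompanyName", typeof(string));
changes.Rows.Add("Example Co.");
// Send all pending rows in a single batch (see Controlling Batch Size below).
adapter.UpdateBatchSize = changes.Rows.Count;
adapter.Update(changes);
}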
Controlling Batch Size
Depending on factors such as the size of the request, your network resources, and the performance of the server, you may get better performance by executing several smaller batch requests. You can control the size of each batch by setting the DatabricksDataAdapter's UpdateBatchSize property to a positive integer.
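For example, the fragment below (continuing the adapter and DataTable from the sketch above) caps each request at 100 statements; the value of 100 is illustrative and should be tuned for your environment.
C#
// Illustrative tuning: send pending rows in batches of up to 100 statements per request.
// Standard ADO.NET semantics: a value of 1 disables batching, 0 places no limit on batch size.
adapter.UpdateBatchSize = 100;
adapter.Update(changes);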
Bulk INSERT
The following code prepares a single batch that inserts records in bulk. The example executes a batch INSERT of new DataRows, which have the "Added" state.
C#
DatabricksDataAdapter adapter = new DatabricksDataAdapter();
using (DatabricksConnection conn = new DatabricksConnection("Server=127.0.0.1;HTTPPath=MyHTTPPath;User=MyUser;Token=MyToken;")) {
conn.Open();
adapter.InsertCommand = conn.CreateCommand();
adapter.InsertCommand.CommandText = "INSERT INTO [CData].[Sample].Customers (CompanyName) VALUES (@CompanyName)";
adapter.InsertCommand.UpdatedRowSource = UpdateRowSource.None;
adapter.InsertCommand.Parameters.Add("@CompanyName", "CompanyName");
DataTable batchDataTable = new DataTable();
batchDataTable.Columns.Add("CompanyName", typeof(string));
batchDataTable.Rows.Add("Jon Deere");
batchDataTable.Rows.Add("RSSBus Inc.");
adapter.UpdateBatchSize = 2;
adapter.Update(batchDataTable);
}
VB.NET
Dim adapter As New DatabricksDataAdapter()
Using conn As New DatabricksConnection("Server=127.0.0.1;HTTPPath=MyHTTPPath;User=MyUser;Token=MyToken;")
conn.Open()
adapter.InsertCommand = conn.CreateCommand()
adapter.InsertCommand.CommandText = "INSERT INTO [CData].[Sample].Customers (CompanyName) VALUES (@CompanyName)"
adapter.InsertCommand.UpdatedRowSource = UpdateRowSource.None
adapter.InsertCommand.Parameters.Add("@CompanyName", "CompanyName")
Dim batchDataTable As New DataTable()
batchDataTable.Columns.Add("CompanyName", GetType(String))
batchDataTable.Rows.Add("RSSBus Inc.")
batchDataTable.Rows.Add("Jon Deere")
adapter.UpdateBatchSize = 2
adapter.Update(batchDataTable)
End Using
Bulk Update
A batch update additionally requires the primary key of each row to update. The following example executes a batch for all DataRow records with a "Modified" state:
C#
DatabricksDataAdapter adapter = new DatabricksDataAdapter();
using (DatabricksConnection conn = new DatabricksConnection("Server=127.0.0.1;HTTPPath=MyHTTPPath;User=MyUser;Token=MyToken;")) {
conn.Open();
adapter.UpdateCommand = conn.CreateCommand();
adapter.UpdateCommand.CommandText = "UPDATE [CData].[Sample].Customers SET CompanyName=@CompanyName WHERE _id=@_id";
adapter.UpdateCommand.Parameters.Add("@CompanyName", "CompanyName");
adapter.UpdateCommand.Parameters.Add("@_id", "_id");
adapter.UpdateCommand.UpdatedRowSource = UpdateRowSource.None;
DataTable batchDataTable = new DataTable();
batchDataTable.Columns.Add("_id", typeof(string));
batchDataTable.Columns.Add("CompanyName", typeof(string));
// The _id values are illustrative placeholders for the primary keys of existing rows.
batchDataTable.Rows.Add("1", "Jon Deere");
batchDataTable.Rows.Add("2", "RSSBus Inc.");
// Mark the rows as Modified so the adapter routes them to the UpdateCommand.
batchDataTable.AcceptChanges();
foreach (DataRow row in batchDataTable.Rows) row.SetModified();
adapter.UpdateBatchSize = 2;
adapter.Update(batchDataTable);
}
VB.NET
Dim adapter As New DatabricksDataAdapter()
Using conn As New DatabricksConnection("Server=127.0.0.1;HTTPPath=MyHTTPPath;User=MyUser;Token=MyToken;")
conn.Open()
adapter.UpdateCommand = conn.CreateCommand()
adapter.UpdateCommand.CommandText = "UPDATE [CData].[Sample].Customers SET CompanyName=@CompanyName WHERE _id=@_id"
adapter.UpdateCommand.Parameters.Add("@CompanyName", "CompanyName")
adapter.UpdateCommand.Parameters.Add("@_id", "_id")
adapter.UpdateCommand.UpdatedRowSource = UpdateRowSource.None
Dim batchDataTable As New DataTable()
batchDataTable.Columns.Add("_id", GetType(String))
batchDataTable.Columns.Add("CompanyName", GetType(String))
' The _id values are illustrative placeholders for the primary keys of existing rows.
batchDataTable.Rows.Add("1", "RSSBus Inc.")
batchDataTable.Rows.Add("2", "Jon Deere")
' Mark the rows as Modified so the adapter routes them to the UpdateCommand.
batchDataTable.AcceptChanges()
For Each row As DataRow In batchDataTable.Rows
row.SetModified()
Next
adapter.UpdateBatchSize = 2
adapter.Update(batchDataTable)
End Using
Bulk Delete
The following code prepares a single batch that deletes records in bulk; the primary key of each row is required. The example executes a batch for all DataRow records with a "Deleted" state:
C#
DatabricksDataAdapter adapter = new DatabricksDataAdapter();
using (DatabricksConnection conn = new DatabricksConnection("Server=127.0.0.1;HTTPPath=MyHTTPPath;User=MyUser;Token=MyToken;")) {
conn.Open();
adapter.DeleteCommand = conn.CreateCommand();
adapter.DeleteCommand.CommandText = "DELETE FROM [CData].[Sample].Customers WHERE _id=@_id";
adapter.DeleteCommand.Parameters.Add("@_id", "_id");
adapter.DeleteCommand.UpdatedRowSource = UpdateRowSource.None;
DataTable batchDataTable = new DataTable();
batchDataTable.Columns.Add("_id", typeof(string));
// The _id values are illustrative placeholders for the primary keys of existing rows.
batchDataTable.Rows.Add("1");
batchDataTable.Rows.Add("2");
// Mark the rows as Deleted so the adapter routes them to the DeleteCommand.
batchDataTable.AcceptChanges();
foreach (DataRow row in batchDataTable.Rows) row.Delete();
adapter.UpdateBatchSize = 2;
adapter.Update(batchDataTable);
}
VB.NET
Dim adapter As New DatabricksDataAdapter()
Using conn As New DatabricksConnection("Server=127.0.0.1;HTTPPath=MyHTTPPath;User=MyUser;Token=MyToken;")
conn.Open()
adapter.DeleteCommand = conn.CreateCommand()
adapter.DeleteCommand.CommandText = "DELETE FROM [CData].[Sample].Customers WHERE _id=@_id"
adapter.DeleteCommand.Parameters.Add("@_id", "_id")
adapter.DeleteCommand.UpdatedRowSource = UpdateRowSource.None
Dim batchDataTable As New DataTable()
batchDataTable.Columns.Add("_id", GetType(String))
' The _id values are illustrative placeholders for the primary keys of existing rows.
batchDataTable.Rows.Add("1")
batchDataTable.Rows.Add("2")
' Mark the rows as Deleted so the adapter routes them to the DeleteCommand.
batchDataTable.AcceptChanges()
For Each row As DataRow In batchDataTable.Rows
row.Delete()
Next
adapter.UpdateBatchSize = 2
adapter.Update(batchDataTable)
End Using