Cmdlets for Databricks

Build 21.0.7930

Establishing a Connection

With the CData Cmdlets, you can install a data module, set the connection properties, and start scripting. This section shows how to use the Databricks Cmdlets alongside native PowerShell cmdlets, such as the CSV import and export cmdlets.

Installing and Connecting

If you have PowerShellGet, you can install the cmdlets from the PowerShell Gallery with the following command. You can also obtain a setup from the CData site.

Install-Module DatabricksCmdlets

The following line is then added to your profile, loading the cmdlets on the next session:

Import-Module DatabricksCmdlets;
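
If you find the import line missing from your profile, it can be added by hand. The following is a minimal sketch using only built-in PowerShell cmdlets; the function name Add-ProfileLine is hypothetical and not part of the CData Cmdlets.

```powershell
# Hypothetical helper: appends a line to a profile script only if it is not already present.
function Add-ProfileLine {
    param(
        [string] $Path = $PROFILE,
        [string] $Line = "Import-Module DatabricksCmdlets"
    )
    # Create the profile file (and any missing parent folder) if it does not exist yet.
    if (-not (Test-Path -LiteralPath $Path)) {
        New-Item -ItemType File -Path $Path -Force | Out-Null
    }
    # Append only when no existing line already contains the text.
    if (-not (Select-String -LiteralPath $Path -SimpleMatch -Pattern $Line -Quiet)) {
        Add-Content -LiteralPath $Path -Value $Line
    }
}
```

Calling Add-ProfileLine with no arguments targets the current user's profile for the current host; calling it a second time leaves the file unchanged.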

You can then use the Connect-Databricks cmdlet to create a connection object that can be passed to other cmdlets:

$conn = Connect-Databricks -Server "127.0.0.1" -Port "443" -TransportMode "HTTP" -HTTPPath "MyHTTPPath" -UseSSL "True" -User "MyUser" -Token "MyToken"

Connecting to Databricks

To connect to a Databricks cluster, set the properties as described below.

Note: You can find the required values in your Databricks instance by navigating to Clusters, selecting the desired cluster, and opening the JDBC/ODBC tab under Advanced Options.

  • Server: Set to the Server Hostname of your Databricks cluster.
  • HTTPPath: Set to the HTTP Path of your Databricks cluster.
  • Token: Set to your personal access token (this value can be obtained by navigating to the User Settings page of your Databricks instance and selecting the Access Tokens tab).
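
The properties above can be collected in a hashtable and splatted onto Connect-Databricks, which keeps the token out of the script text. This is a sketch only: the environment variable name DATABRICKS_TOKEN is an assumption made for illustration, and the values are placeholders. The parameter names are the ones used in the Connect-Databricks example earlier in this section.

```powershell
# Assumption: the personal access token is stored in an environment variable
# named DATABRICKS_TOKEN (the name is chosen for this sketch only).
$token = $env:DATABRICKS_TOKEN

# Parameter names mirror the Connect-Databricks example; values are placeholders.
$connParams = @{
    Server        = "127.0.0.1"
    Port          = "443"
    TransportMode = "HTTP"
    HTTPPath      = "MyHTTPPath"
    UseSSL        = "True"
    User          = "MyUser"
    Token         = $token
}

# Connect only when the module is loaded and a token is available.
if ($token -and (Get-Command Connect-Databricks -ErrorAction SilentlyContinue)) {
    $conn = Connect-Databricks @connParams
} else {
    Write-Warning "Connect-Databricks was not called: set DATABRICKS_TOKEN and import DatabricksCmdlets first."
}
```

Splatting (the @connParams syntax) passes each hashtable key as a named parameter, so the call is equivalent to writing every -Name value pair on one line.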

Retrieving Data

The Select-Databricks cmdlet provides a native PowerShell interface for retrieving data:

$results = Select-Databricks -Connection $conn -Table "[CData].[Sample].Customers" -Columns @("City","CompanyName") -Where "Country='US'"

The Invoke-Databricks cmdlet provides an SQL interface: pass an SQL statement in the Query parameter to execute it.
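
As a sketch of the Query parameter, the following expresses the earlier selection as an SQL statement. The -Connection parameter on Invoke-Databricks is an assumption based on the other cmdlets in this section, and $conn is assumed to hold a connection created with Connect-Databricks.

```powershell
# The same selection as the Select-Databricks call, written as SQL.
$query = "SELECT City, CompanyName FROM [CData].[Sample].Customers WHERE Country = 'US'"

# Assumption: Invoke-Databricks takes -Connection like the other cmdlets.
# The guard skips the call when the module is not loaded.
if (Get-Command Invoke-Databricks -ErrorAction SilentlyContinue) {
    $results = Invoke-Databricks -Connection $conn -Query $query
}
```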

Piping Cmdlet Output

The cmdlets return row objects to the pipeline one row at a time. The following line exports results to a CSV file:

Select-Databricks -Connection $conn -Table "[CData].[Sample].Customers" -Where "Country = 'US'" | Select-Object -Property * -ExcludeProperty Connection,Table,Columns | Export-Csv -Path C:\MyCustomersData.csv -NoTypeInformation

Note that the results from Select-Databricks are piped into Select-Object, which excludes some properties, before they reach Export-Csv. The CData Cmdlets append Connection, Table, and Columns information to each row object in the result set, and that information is usually not wanted in the CSV file.

However, this appended information makes it easy to pipe the output of one cmdlet to another. The following example converts a result set to JSON:

 
PS C:\> $conn = Connect-Databricks -Server "127.0.0.1" -Port "443" -TransportMode "HTTP" -HTTPPath "MyHTTPPath" -UseSSL "True" -User "MyUser" -Token "MyToken"
PS C:\> $row = Select-Databricks -Connection $conn -Table "[CData].[Sample].Customers" -Columns @("City","CompanyName") -Where "Country = 'US'" | Select-Object -First 1
PS C:\> $row | ConvertTo-Json
{
  "Connection":  {

  },
  "Table":  "[CData].[Sample].Customers",
  "Columns":  [

  ],
  "City":  "MyCity",
  "CompanyName":  "MyCompanyName"
} 
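
The effect of excluding the appended properties can be tried without a live connection. The sketch below uses a hand-built object as a stand-in for one returned row; its shape follows the JSON output above, and the values are placeholders.

```powershell
# A stand-in for one row returned by Select-Databricks (placeholder values).
$row = [PSCustomObject]@{
    Connection  = @{}
    Table       = "[CData].[Sample].Customers"
    Columns     = @()
    City        = "MyCity"
    CompanyName = "MyCompanyName"
}

# Drop the appended metadata, keeping only the data columns.
$clean = $row | Select-Object -Property * -ExcludeProperty Connection, Table, Columns

# ConvertTo-Csv produces the same text that Export-Csv would write to a file.
$csv = $clean | ConvertTo-Csv -NoTypeInformation
$csv
```

Only the City and CompanyName columns remain in the CSV text, which is why the export example filters the row objects before calling Export-Csv.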

Updating Data

The cmdlets also simplify data transformation and data cleansing. The following example loads data from a CSV file into Databricks, first checking whether each record already exists, in which case it is updated instead of inserted.

Import-Csv -Path C:\MyCustomersUpdates.csv | ForEach-Object {
  # Build the filter once. $($_._id) is needed so that the property value, not the whole row, is interpolated.
  $where = "_id = '$($_._id)'"
  $record = Select-Databricks -Connection $conn -Table "[CData].[Sample].Customers" -Where $where
  if ($record) {
    # The record exists, so update it.
    Update-Databricks -Connection $conn -Table "[CData].[Sample].Customers" -Columns @("City","CompanyName") -Values @($_.City, $_.CompanyName) -Where $where
  } else {
    # No matching record was found, so insert a new one.
    Add-Databricks -Connection $conn -Table "[CData].[Sample].Customers" -Columns @("City","CompanyName") -Values @($_.City, $_.CompanyName)
  }
}
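
The filter in the loop above is built by inserting the _id value into a quoted string, so a value that itself contains a single quote would produce a malformed clause. One possible guard is sketched below. It assumes the cmdlets accept the standard SQL convention of doubling embedded single quotes, which is not confirmed here, and the function name New-WhereClause is hypothetical.

```powershell
# Hypothetical helper: builds a "column = 'value'" filter with embedded quotes escaped.
function New-WhereClause {
    param(
        [Parameter(Mandatory)] [string] $Column,
        [Parameter(Mandatory)] [AllowEmptyString()] [string] $Value
    )
    # Double any embedded single quotes (standard SQL escaping; assumed to apply here).
    $escaped = $Value -replace "'", "''"
    "$Column = '$escaped'"
}

# Example with a value that contains a single quote.
New-WhereClause -Column "_id" -Value "O'Brien"
```

In the loop, the filter could then be passed as -Where (New-WhereClause -Column "_id" -Value $_._id).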

Copyright (c) 2021 CData Software, Inc. - All rights reserved.