Establishing a Connection
With the CData Cmdlets users can install a data module, set the connection properties, and start scripting. This section provides examples of using our Excel Cmdlets with native PowerShell cmdlets, like the CSV import and export cmdlets.
Installing and Connecting
If you have PSGet, installing the cmdlets can be accomplished from the PowerShell Gallery with the following command. You can also obtain a setup from the CData site.
Install-Module ExcelCmdlets
The following line is then added to your profile, loading the cmdlets on the next session:
Import-Module ExcelCmdlets;
You can then use the Connect-Excel cmdlet to create a connection object that can be passed to other cmdlets:
$conn = Connect-Excel -ExcelFile 'C:\MyExcelWorkbooks\SampleWorkbook.xlsx'
The CData Cmdlets PowerShell Module for Microsoft Excel allows connecting to local and remote Excel resources. Set the URI property to the Excel resource location, in addition to any other properties necessary to connect to your data source.
Connecting to Cloud-Hosted Microsoft Excel Files
While the cmdlet is capable of pulling data from Microsoft Excel files hosted on a variety of cloud data stores, INSERT, UPDATE, and DELETE are not supported outside of local files in this cmdlet.If you need INSERT/UPDATE/DELETE cloud files, you can download the corresponding CData cmdlet for that cloud host (supported via stored procedures), make changes with the local file's corresponding cmdlet, then upload the file using the cloud source's stored procedures.
As an example, if you wanted to update a file stored on SharePoint, you could use the CData SharePoint cmdlet's DownloadDocument procedure to download the Microsoft Excel file, update the local Microsoft Excel file with the CData Microsoft Excel cmdlet, then use the SharePoint cmdlet's UploadDocument procedure to upload the changed file to SharePoint.
A unique prefix at the beginning of the URI connection property is used to identify the cloud data store being targed by the cmdlet and the remainder of the path is a relative path to the desired folder (one table per file) or single file (a single table).
Amazon S3
Set the following to identify your Microsoft Excel resources stored on Amazon S3:
- ConnectionType: Set the ConnectionType to Amazon S3.
- URI: Set this to an Excel file in a bucket: s3://bucket1/folder1/file.xlsx.
- You can also connect to Microsoft Excel resources stored on Cloudera Ozone, after creating a volume and bucket and making a symbolic link to that bucket: s3://linktobucket/
See Connecting to Amazon S3 for more information regarding how to connect and authenticate to Excel files hosted on Amazon S3.
Azure Blob Storage
Set the following to identify your Microsoft Excel resources stored on Azure Blob Storage:
- ConnectionType: Set this to Azure Blob Storage.
- URI: Set this to the name of your container and the name of the blob. For example: azureblob://mycontainer/myblob/file.xlsx.
See Connecting to Azure Blob Storage for more information regarding how to connect and authenticate to Excel files hosted on Amazon Blob Storage.
Azure Data Lake Storage
Set the following to identify your Microsoft Excel resources stored on Azure Data Lake Storage:
- ConnectionType: Set this to Azure Data Lake Storage Gen1, Azure Data Lake Storage Gen2, or Azure Data Lake Storage Gen2 SSL.
- URI: Set this to the name of the file system, the name of the folder which contains your Microsoft Excel files, and the name of an Excel file. For example:
- Gen 1: adl://myfilesystem/folder1/file.xlsx
- Gen 2: abfs://myfilesystem/folder1/file.xlsx
- Gen 2 SSL: abfss://myfilesystem/folder1/file.xlsx
See Connecting to Azure Data Lake Storage for more information regarding how to connect and authenticate to Excel files hosted on Azure Data Lake Storage.
Azure File Storage
Set the following properties to connect:
- ConnectionType: Set this to Azure Files.
- URI: Set this the name of your azure file share and the name of the resource. For example: azurefile://fileShare/remotePath/file.xlsx.
- AzureStorageAccount (Required): Set this to the account associated with the Azure file.
You can authenticate either an Azure access key or an Azure shared access signature. Set one of the following:
- AzureAccessKey: Set this to the access key associated with the Azure file.
- AzureSharedAccessSignature: Set this to the shared access signature associated with the Azure file.
Box
Set the following to identify your Microsoft Excel resources stored on Box:
- ConnectionType: Set this to Box.
- URI: Set this the name of the file system, the name of the folder which contains your Microsoft Excel files, and the name of an Excel file. For example: box://folder1/file.xlsx.
See Connecting to Box for more information regarding how to connect and authenticate to Excel files hosted on Box.
Dropbox
Set the following to identify your Microsoft Excel resources stored on Dropbox:
- ConnectionType: Set this to Dropbox.
- URI: Set this to the path to a Excel file. For example: dropbox://folder1/file.xlsx.
See Connecting to Dropbox for more information regarding how to connect and authenticate to Excel files hosted on Dropbox.
FTP
The cmdlet supports both plaintext and SSL/TLS connections to FTP servers.
Set the following connection properties to connect:
- ConnectionType: Set this to either FTP or FTPS.
- URI: Set this to the address of the server followed by the path to the Excel file. For example: ftp://localhost:990/folder1/file.xlsx or ftps://localhost:990/folder1.
- User: Set this to your username on the FTP(S) server you want to connect to.
- Password: Set this to your password on the FTP(S) server you want to connect to.
Google Cloud Storage
Set the following to identify your Microsoft Excel resources stored on Google Cloud Storage:
- ConnectionType: Set this to Google Cloud Storage.
- URI: Set this to the path to the name of the file system, the name of the folder which contains your Microsoft Excel files, and the name of a Excel file. For example: gs://bucket/remotePath/file.xlsx.
See Connecting to Google Cloud Storage for more information regarding how to connect and authenticate to Excel files hosted on Google Cloud Storage.
Google Drive
Set the following to identify your Microsoft Excel resources stored on Google Drive:
- ConnectionType: Set this to Google Drive.
- URI: Set to the path to the name of the file system, the name of the folder which contains your Microsoft Excel files, and the name of an Excel file. For example: gdrive://folder1/file.xlsx.
See Connecting to Google Drive for more information regarding how to connect and authenticate to Excel files hosted on Google Drive.
HDFS
Set the following to identify your Microsoft Excel resources stored on HDFS:
- ConnectionType: Set this to HDFS or HDFS Secure.
- URI: Set this to the path to a Excel file. For example:
- HDFS: webhdfs://host:port/remotePath/file.xlsx
- HDFS Secure: webhdfss://host:port/remotePath/file.xlsx
- Cloudera Ozone (via the HttpFS gateway): webhdfs://<Ozone server>:<port>/user/myuser
- You must use Kerberos authentication to access Microsoft Excel files stored on Ozone.
- Ensure that you have Ozone 718.2.x on the Ozone cluster.
- Cloudera Manager version 7.10.1 is required.
There are two authentication methods available for connecting to HDFS data source, Anonymous Authentication and Negotiate (Kerberos) Authentication.
Anonymous Authentication
In some situations, you can connect to HDFS without any authentication connection properties. To do so, set the AuthScheme property to None (default).
Authenticate using Kerberos
When authentication credentials are required, you can use Kerberos for authentication. See Using Kerberos for details on how to authenticate with Kerberos.
HTTP Streams
Set the following to identify your Microsoft Excel resources stored on HTTP streams:
- ConnectionType: Set this to HTTP or HTTPS.
- URI: Set this to the URI of your HTTP(S) stream. For example:
- HTTP: http://remoteStream/file.xlsx
- HTTPS: https://remoteStream/file.xlsx
See Connecting to HTTP Streams for more information regarding how to connect and authenticate to Excel files hosted on HTTP Streams.
IBM Cloud Object Storage
Set the following to identify your Microsoft Excel resources stored on IBM Cloud Object Storage:
- ConnectionType: Set this to IBM Object Storage Source.
- URI: Set this to the bucket and folder. For example: ibmobjectstorage://bucket1/remotePath/file.xlsx.
- Region: Set this property to your IBM instance region. For example: eu-gb.
See Connecting to IBM Object Storage for more information regarding how to connect and authenticate to Excel files hosted on IBM Cloud Object Storage.
OneDrive
Set the following to identify your Microsoft Excel resources stored on OneDrive:
- ConnectionType: Set this to OneDrive.
- URI: Set this to the path to a Excel file. For example: onedrive://remotePath/file.xlsx.
See Connecting to OneDrive for more information regarding how to connect and authenticate to Excel files hosted on OneDrive.
OneLake
Set the following to identify your Microsoft Excel resources stored on OneLake:
- ConnectionType: Set this to OneLake.
- URI: Set this to the name of the workspace, followed by the item and item type. Optionally, include the folder path to be used as the root folder. For example: onelake://Workspace/Test.LakeHouse/Files/CustomFolder/file.xlsx.
See Connecting to OneLake for more information regarding how to connect and authenticate to Excel files hosted on OneLake.
Oracle Cloud Storage
Set the following properties to authenticate with HMAC:
- ConnectionType: Set the ConnectionType to Oracle Cloud Storage.
- URI: Set this to an Excel file in a bucket: os://bucket/remotePath/file.xlsx.
- AccessKey: Set this to an Oracle Cloud Access Key.
- SecretKey: Set this to an Oracle Cloud Secret Key.
- OracleNamespace: Set this to an Oracle cloud namespace.
- Region (optional): Set this to the hosting region for your S3-like Web Services.
SFTP
Set the following to identify your Microsoft Excel resources stored on SFTP:
- ConnectionType: Set this to SFTP.
- URI: Set this to the address of the server followed by the path to the folder to be used as the root folder. For example: sftp://server:port/remotePath/file.xlsx.
See Connecting to SFTP for more information regarding how to connect and authenticate to Excel files hosted on SFTP.
SharePoint Online
Set the following to identify your Microsoft Excel resources stored on SharePoint Online:
- ConnectionType: Set this to SharePoint REST or SharePoint SOAP.
- URI: Set this to a document library containing Excel files. For example:
- SharePoint Online REST: sprest://remotePath/file.xlsx
- SharePoint Online SOAP: sp://remotePath/file.xlsx
See Connecting to SharePoint Online for more information regarding how to connect and authenticate to Excel files hosted on SharePoint Online.
Connecting to a Workbook
The URI, under the Connection section, must be set to a valid Excel File (including the file path). The cmdlet supports the Office Open XML format used by Excel 2007 and later.
Querying a Workbook
You can then execute SELECT, INSERT, UPDATE, and DELETE statements to spreadsheets and ranges in the workbook. See Excel Operations for details on querying spreadsheet data as tables.
Retrieving Data
The Select-Excel cmdlet provides a native PowerShell interface for retrieving data:
$results = Select-Excel -Connection $conn -Table "Sheet" -Columns @("RowId, LastName") -Where "FirstName='Bob'"The Invoke-Excel cmdlet provides an SQL interface. This cmdlet can be used to execute an SQL query via the Query parameter.
Piping Cmdlet Output
The cmdlets return row objects to the pipeline one row at a time. The following line exports results to a CSV file:
Select-Excel -Connection $conn -Table Sheet -Where "FirstName <> 'Bob'" | Select -Property * -ExcludeProperty Connection,Table,Columns | Export-Csv -Path c:\mySheetData.csv -NoTypeInformation
You will notice that we piped the results from Select-Excel into a Select-Object cmdlet and excluded some properties before piping them into an Export-CSV cmdlet. We do this because the CData Cmdlets append Connection, Table, and Columns information onto each row object in the result set, and we do not necessarily want that information in our CSV file.
However, this makes it easy to pipe the output of one cmdlet to another. The following is an example of converting a result set to JSON:
PS C:\> $conn = Connect-Excel -ExcelFile 'C:\MyExcelWorkbooks\SampleWorkbook.xlsx' PS C:\> $row = Select-Excel -Connection $conn -Table "Sheet" -Columns (RowId, LastName) -Where "FirstName <> 'Bob'" | select -first 1 PS C:\> $row | ConvertTo-Json { "Connection": { }, "Table": "Sheet", "Columns": [ ], "RowId": "MyRowId", "LastName": "MyLastName" }
Deleting Data
The following line deletes any records that match the criteria:
Select-Excel -Connection $conn -Table Sheet -Where "FirstName = 'Bob'" | Remove-Excel
Modifying Data
The cmdlets make data transformation easy as well as data cleansing. The following example loads data from a CSV file into Microsoft Excel, checking first whether a record already exists and needs to be updated instead of inserted.
Import-Csv -Path C:\MySheetUpdates.csv | %{ $record = Select-Excel -Connection $conn -Table Sheet -Where ("RowId = `'"+$_.RowId+"`'") if($record){ Update-Excel -Connection $conn -Table Sheet -Columns @("RowId","LastName") -Values @($_.RowId, $_.LastName) -Where "RowId = `'$_.RowId`'" }else{ Add-Excel -Connection $conn -Table Sheet -Columns @("RowId","LastName") -Values @($_.RowId, $_.LastName) } }