Cmdlets for Google Data Catalog

Build 21.0.7930

Establishing a Connection

With the CData Cmdlets, users can install a data module, set the connection properties, and start scripting. This section provides examples of using our GoogleDataCatalog Cmdlets with native PowerShell cmdlets, like the CSV import and export cmdlets.

Installing and Connecting

If you have PSGet, you can install the cmdlets from the PowerShell Gallery with the following command. You can also obtain a setup from the CData site.

Install-Module GoogleDataCatalogCmdlets

The following line is then added to your profile, loading the cmdlets on the next session:

Import-Module GoogleDataCatalogCmdlets;

You can then use the Connect-GoogleDataCatalog cmdlet to create a connection object that can be passed to other cmdlets:

$conn = Connect-GoogleDataCatalog -InitiateOAuth "GETANDREFRESH" -ProjectId "YourProjectId"

Connecting to Google Data Catalog

Provide the following connection properties before adding the authentication properties.

  • OrganizationId: The ID associated with the Google Cloud Platform organization resource you would like to connect to. Find this by navigating to the cloud console.
    Click the project selection drop-down, and select your organization from the list. Then, click More -> Settings. The organization ID is displayed on this page.
  • ProjectId: The ID associated with the Google Cloud Platform project resource you would like to connect to.
    Find this by navigating to the cloud console dashboard and selecting your project from the Select from drop-down. The project ID will be present in the Project info card.
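
As a minimal sketch, both properties can be supplied together when creating the connection. This assumes Connect-GoogleDataCatalog accepts OrganizationId as a parameter of the same name, the way it accepts ProjectId; the ID values are placeholders.

```powershell
# Placeholder IDs; substitute the values shown in the cloud console.
$connectionProperties = @{
    OrganizationId = "YourOrganizationId"   # assumed parameter name
    ProjectId      = "YourProjectId"
    InitiateOAuth  = "GETANDREFRESH"
}

# Splatting passes each hashtable key as a named parameter.
$conn = Connect-GoogleDataCatalog @connectionProperties
```

Keeping the properties in a hashtable makes it easy to add the authentication properties described in the next section without lengthening the command line.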

Authenticating to Google Data Catalog

All connections to Google Data Catalog are authenticated using OAuth. The cmdlet supports using user accounts, service accounts and GCP instance accounts for authentication.

Authenticate with a User Account

AuthScheme must be set to OAuth in all of the user account flows. For desktop applications, the cmdlet's default application is the simplest way to authenticate. The only additional requirement is to set InitiateOAuth to GETANDREFRESH.

When the cmdlet starts, it will open a browser and Google Data Catalog will request your login information. The cmdlet will use the credentials you provide to access your Google Data Catalog data. These credentials will be saved and automatically refreshed as needed.

See Using OAuth Authentication for an authentication guide covering all the supported methods in detail.

Authenticate with a Service Account

To authenticate using a service account, you must create a new service account and have a copy of the account's certificate.

For a JSON file, you will need to set these properties:

  • AuthScheme: Required. Set this to OAuthJWT.
  • InitiateOAuth: Required. Set this to GETANDREFRESH.
  • OAuthJWTCertType: Required. Set this to GOOGLEJSON.
  • OAuthJWTCert: Required. Set this to the path to the .json file provided by Google.
  • OAuthJWTSubject: Optional. Only set this value if the service account is part of a GSuite domain and you want to enable delegation. The value of this property should be the email address of the user whose data you want to access.
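
The JSON-file properties above might be combined as follows. This is a sketch that assumes each connection property maps to a Connect-GoogleDataCatalog parameter of the same name; the file path and project ID are hypothetical.

```powershell
# Connection properties for a service account backed by a Google JSON key file.
$serviceAccount = @{
    AuthScheme       = "OAuthJWT"
    InitiateOAuth    = "GETANDREFRESH"
    OAuthJWTCertType = "GOOGLEJSON"
    OAuthJWTCert     = "C:\keys\service-account.json"   # hypothetical path
    ProjectId        = "YourProjectId"                  # placeholder
}

# Parameter names are assumed to match the property names listed above.
$conn = Connect-GoogleDataCatalog @serviceAccount
```

To enable delegation, add an OAuthJWTSubject entry to the hashtable with the email address of the user whose data you want to access.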

For a PFX file, you will need to set these properties instead:

  • AuthScheme: Required. Set this to OAuthJWT.
  • InitiateOAuth: Required. Set this to GETANDREFRESH.
  • OAuthJWTCertType: Required. Set this to PFXFILE.
  • OAuthJWTCert: Required. Set this to the path to the .pfx file provided by Google.
  • OAuthJWTCertPassword: Optional. Set this to the .pfx file password. In most cases this will need to be provided since Google encrypts PFX certificates.
  • OAuthJWTCertSubject: Optional. Set this only if you are using an OAuthJWTCertType that stores multiple certificates. Should not be set for PFX certificates generated by Google.
  • OAuthJWTIssuer: Required. Set this to the email address of the service account. This address will usually include the domain iam.gserviceaccount.com.
  • OAuthJWTSubject: Optional. Only set this value if the service account is part of a GSuite domain and you want to enable delegation. The value of this property should be the email address of the user whose data you want to access.
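
A PFX-based connection differs mainly in the certificate type, the password, and the issuer. The sketch below again assumes the parameters share the property names; the path, password, and service account address are placeholders.

```powershell
# Connection properties for a service account backed by a PFX certificate.
$pfxAccount = @{
    AuthScheme           = "OAuthJWT"
    InitiateOAuth        = "GETANDREFRESH"
    OAuthJWTCertType     = "PFXFILE"
    OAuthJWTCert         = "C:\keys\service-account.pfx"                      # hypothetical path
    OAuthJWTCertPassword = "YourPfxPassword"                                  # placeholder
    OAuthJWTIssuer       = "my-service@your-project.iam.gserviceaccount.com"  # placeholder address
    ProjectId            = "YourProjectId"                                    # placeholder
}

# Parameter names are assumed to match the property names listed above.
$conn = Connect-GoogleDataCatalog @pfxAccount
```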

If you do not already have a service account, you can create one by following the procedure in Creating a Custom OAuth App.

Authenticate with a GCP Instance Account

When running on a GCP virtual machine, the cmdlet can authenticate using a service account tied to the virtual machine. To use this mode, set AuthScheme to GCPInstanceAccount.
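
For illustration, a connection made from a script running on the virtual machine might look like this, assuming AuthScheme is exposed as a parameter of Connect-GoogleDataCatalog. The project ID is a placeholder.

```powershell
# No certificate or browser login is supplied here; the service account
# tied to the GCP virtual machine is used instead.
$conn = Connect-GoogleDataCatalog -AuthScheme "GCPInstanceAccount" -ProjectId "YourProjectId"
```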

Retrieving Data

The Select-GoogleDataCatalog cmdlet provides a native PowerShell interface for retrieving data:

$results = Select-GoogleDataCatalog -Connection $conn -Table "Schemas" -Columns @("Type", "DatasetName") -Where "ProjectId='bigquery-public-data'"

The Invoke-GoogleDataCatalog cmdlet provides an SQL interface. This cmdlet can be used to execute an SQL query via the Query parameter.
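
A query equivalent to the Select-GoogleDataCatalog call might be written as follows. This is a sketch: it assumes Invoke-GoogleDataCatalog takes the same Connection parameter as Select-GoogleDataCatalog and that $conn was created earlier with Connect-GoogleDataCatalog.

```powershell
# The SQL text mirrors the table, columns, and filter used with Select-GoogleDataCatalog.
$query = "SELECT Type, DatasetName FROM Schemas WHERE ProjectId = 'bigquery-public-data'"

# -Connection is assumed to behave as it does for Select-GoogleDataCatalog.
$results = Invoke-GoogleDataCatalog -Connection $conn -Query $query
```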

Piping Cmdlet Output

The cmdlets return row objects to the pipeline one row at a time. The following line exports results to a CSV file:

Select-GoogleDataCatalog -Connection $conn -Table Schemas -Where "ProjectId = 'bigquery-public-data'" | Select -Property * -ExcludeProperty Connection,Table,Columns | Export-Csv -Path c:\mySchemasData.csv -NoTypeInformation

You will notice that we piped the results from Select-GoogleDataCatalog into a Select-Object cmdlet and excluded some properties before piping them into an Export-CSV cmdlet. We do this because the CData Cmdlets append Connection, Table, and Columns information onto each row object in the result set, and we do not necessarily want that information in our CSV file.
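
The effect of the exclusion can be checked without a live connection. The sketch below uses a hand-built object that stands in for a row returned by Select-GoogleDataCatalog (its property values are made up), writes it to a temporary CSV file, and reads the file back with the native Import-Csv cmdlet.

```powershell
# Simulated row object; a real one would come from Select-GoogleDataCatalog.
$row = [PSCustomObject]@{
    Connection  = $null
    Table       = "Schemas"
    Columns     = @()
    Type        = "MyType"
    DatasetName = "MyDatasetName"
}

# Write to the temp directory so the example does not depend on C:\ being writable.
$path = Join-Path ([System.IO.Path]::GetTempPath()) "mySchemasData.csv"

$row |
    Select-Object -Property * -ExcludeProperty Connection, Table, Columns |
    Export-Csv -Path $path -NoTypeInformation

# Read the file back; only the data columns remain.
$imported = @(Import-Csv -Path $path)
$imported[0].PSObject.Properties.Name
```

Import-Csv returns every value as a string, so convert any numeric or date columns explicitly if later steps depend on their types.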

However, this makes it easy to pipe the output of one cmdlet to another. The following is an example of converting a result set to JSON:

 
PS C:\> $conn = Connect-GoogleDataCatalog -InitiateOAuth "GETANDREFRESH" -ProjectId "YourProjectId"
PS C:\> $row = Select-GoogleDataCatalog -Connection $conn -Table "Schemas" -Columns @("Type", "DatasetName") -Where "ProjectId = 'bigquery-public-data'" | select -first 1
PS C:\> $row | ConvertTo-Json
{
  "Connection":  {

  },
  "Table":  "Schemas",
  "Columns":  [

  ],
  "Type":  "MyType",
  "DatasetName":  "MyDatasetName"
} 

Copyright (c) 2021 CData Software, Inc. - All rights reserved.