Cmdlets for HDFS

Build 24.0.8963

Establishing a Connection

With the CData Cmdlets, you can install a data module, set the connection properties, and start scripting. This section provides examples of using the HDFS Cmdlets with native PowerShell cmdlets, such as the CSV import and export cmdlets.

Installing and Connecting

If you have PSGet, you can install the cmdlets from the PowerShell Gallery with the following command. You can also obtain a setup from the CData site.

Install-Module HDFSCmdlets

Then add the following line to your PowerShell profile so that the cmdlets are loaded in your next session:

Import-Module HDFSCmdlets;
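
One way to put that line into your profile is to append it to the file that the built-in $PROFILE variable points to. The following is a minimal sketch that uses only standard PowerShell cmdlets. It assumes you want the current-user, current-host profile, and it creates the file if it does not exist yet.

```powershell
# The line that should be present in the profile.
$importLine = 'Import-Module HDFSCmdlets'

# Create the profile file (and its parent folder) if it is missing.
if (-not (Test-Path -Path $PROFILE)) {
    New-Item -ItemType File -Path $PROFILE -Force | Out-Null
}

# Append the import line only if the profile does not already contain it.
if (-not (Select-String -Path $PROFILE -SimpleMatch -Pattern $importLine -Quiet)) {
    Add-Content -Path $PROFILE -Value $importLine
}
```

Checking for the line first keeps the profile from accumulating duplicate entries if the snippet is run more than once.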

You can then use the Connect-HDFS cmdlet to create a connection object that can be passed to other cmdlets:

$conn = Connect-HDFS -Host "sandbox-hdp.hortonworks.com" -Port "50070" -Path "/user/root"

Connecting to HDFS

In order to connect, set the following connection properties:

  • Host: Set this value to the host of your HDFS installation.
  • Port: Set this value to the port of your HDFS installation. Default port: 50070
  • UseSSL: (Optional) Set this value to 'True' to negotiate TLS/SSL connections to the HDFS server. Default: 'False'.
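
The properties above can be collected in a hash table and passed to Connect-HDFS with PowerShell splatting. This is a sketch only: it assumes that the Connect-HDFS parameter names mirror the connection property names (as -Host and -Port do in the earlier example) and that -UseSSL accepts the string values shown in the list. The host name is the sample value used elsewhere in this section.

```powershell
# Assumption: each connection property maps to a Connect-HDFS parameter of the same name.
$connectionProperties = @{
    Host   = 'sandbox-hdp.hortonworks.com'  # host of the HDFS installation
    Port   = '50070'                        # default port
    UseSSL = 'False'                        # default; set to 'True' to negotiate TLS/SSL
}

# Splatting (@ instead of $) expands the hash table into named parameters.
$conn = Connect-HDFS @connectionProperties
```

Keeping the properties in one hash table makes it easier to switch between environments without editing the Connect-HDFS call itself.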

Authenticating to HDFS

There are two authentication methods available for connecting to the HDFS data source: Anonymous Authentication and Negotiate (Kerberos) Authentication.

Anonymous Authentication

In some situations, you can connect to HDFS without setting any authentication connection properties. To do so, set AuthScheme to None (the default).

Kerberos

When authentication credentials are required, you can use Kerberos. See Using Kerberos for details on how to authenticate with Kerberos.

Retrieving Data

The Select-HDFS cmdlet provides a native PowerShell interface for retrieving data:

$results = Select-HDFS -Connection $conn -Table "Files" -Columns @("FileId", "ChildrenNum") -Where "FileId='119116'"

The Invoke-HDFS cmdlet provides an SQL interface. This cmdlet can be used to execute an SQL query via the Query parameter.
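
As a rough sketch, the same lookup can be written as an SQL statement and passed through the Query parameter. This assumes that Invoke-HDFS accepts the connection object through a -Connection parameter in the same way as Select-HDFS, and that $conn was created with Connect-HDFS as shown earlier.

```powershell
# SQL equivalent of the Select-HDFS call above.
$query = "SELECT FileId, ChildrenNum FROM Files WHERE FileId = '119116'"

# Assumption: -Connection works here as it does for Select-HDFS.
$results = Invoke-HDFS -Connection $conn -Query $query

# Display the two requested columns as a table.
$results | Format-Table -Property FileId, ChildrenNum
```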

Piping Cmdlet Output

The cmdlets return row objects to the pipeline one row at a time. The following line exports results to a CSV file:

Select-HDFS -Connection $conn -Table Files -Where "FileId = '119116'" | Select-Object -Property * -ExcludeProperty Connection,Table,Columns | Export-Csv -Path c:\myFilesData.csv -NoTypeInformation

You will notice that we piped the results from Select-HDFS into a Select-Object cmdlet and excluded some properties before piping them into an Export-CSV cmdlet. We do this because the CData Cmdlets append Connection, Table, and Columns information onto each row object in the result set, and we do not necessarily want that information in our CSV file.

However, this extra information makes it easy to pipe the output of one cmdlet to another. The following is an example of converting a result set to JSON:

 
PS C:\> $conn = Connect-HDFS -Host "sandbox-hdp.hortonworks.com" -Port "50070" -Path "/user/root"
PS C:\> $row = Select-HDFS -Connection $conn -Table "Files" -Columns @("FileId", "ChildrenNum") -Where "FileId = '119116'" | Select-Object -First 1
PS C:\> $row | ConvertTo-Json
{
  "Connection":  {

  },
  "Table":  "Files",
  "Columns":  [

  ],
  "FileId":  "MyFileId",
  "ChildrenNum":  "MyChildrenNum"
} 
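
The property-stripping step used for the CSV export can be tried without a live HDFS connection. The sketch below builds a hypothetical stand-in for one row object, using the placeholder values from the JSON output above, removes the Connection, Table, and Columns properties, and round-trips the result through Export-Csv and Import-Csv. The $row object and the temporary file path are illustrative assumptions, not output from the cmdlets.

```powershell
# Hypothetical stand-in for one row object returned by Select-HDFS.
$row = [pscustomobject]@{
    Connection  = $null
    Table       = 'Files'
    Columns     = @()
    FileId      = 'MyFileId'
    ChildrenNum = 'MyChildrenNum'
}

# Drop the metadata properties that the cmdlets append to each row.
$clean = $row | Select-Object -Property * -ExcludeProperty Connection, Table, Columns

# Export to a temporary CSV file, then read it back with Import-Csv.
$csvPath = Join-Path -Path ([System.IO.Path]::GetTempPath()) -ChildPath 'myFilesData.csv'
$clean | Export-Csv -Path $csvPath -NoTypeInformation
$imported = @(Import-Csv -Path $csvPath)

# List the column names that survived the round trip.
$columnNames = $imported[0].PSObject.Properties.Name
$columnNames

# The same cleaned object can be converted to JSON instead.
$clean | ConvertTo-Json
```

Only the FileId and ChildrenNum columns remain after the exclusion, so the CSV file and the JSON output contain just the data columns.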

Copyright (c) 2024 CData Software, Inc. - All rights reserved.