CData Python Connector for HDFS

Build 24.0.9060

Overview

The CData Python Connector for HDFS lets developers write Python scripts that connect to HDFS. The connector wraps the complexity of accessing HDFS data in the interface commonly used by Python connectors to common database systems.

Key Features

  • Distributed as a variety of WHL files or a single TAR.GZ file, which accommodate several execution environments after installation with "pip install".
  • Support for Python versions 3.8, 3.9, 3.10, and 3.11 on both Windows and Linux. Python 3.8, 3.9, and 3.10 distributions on Mac are also supported.
  • The ability to write and execute SQL queries that fetch data from HDFS.
  • A custom dialect class that enables SQLAlchemy 1.3 and 1.4 to use this connector.

Getting Started

See Getting Started to install the connector into your Python distribution and to create a basic connection to HDFS.

Using the Python Connector/Using from Tools

See Using the Connector for examples of executing basic SELECT, INSERT, UPDATE, DELETE, and EXECUTE queries with the module's provided classes.
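
Because the connector exposes the interface commonly used by Python database connectors, the basic query pattern can be sketched with the standard library's sqlite3 module as a stand-in, so the example runs without the connector installed. The table name Files, its columns, and the "?" parameter style are hypothetical choices for illustration only; the connector's own module, data model, and parameter syntax may differ, so treat Using the Connector as the authority.

```python
import sqlite3


def run_basic_queries(conn):
    """Run INSERT, UPDATE, SELECT, and DELETE through a cursor.

    `conn` is any connection object offering cursor(), commit(), and the
    usual execute()/fetchall() cursor methods. The Files table and its
    columns are hypothetical and exist only for this sketch.
    """
    cur = conn.cursor()
    # Stand-in setup: create a table to query against.
    cur.execute("CREATE TABLE Files (FileId INTEGER PRIMARY KEY, FullPath TEXT)")

    # INSERT with a bound parameter (sqlite3 uses "?" placeholders;
    # another connector may use a different placeholder style).
    cur.execute("INSERT INTO Files (FullPath) VALUES (?)", ("/user/root/a.txt",))

    # UPDATE the row that was just inserted.
    cur.execute(
        "UPDATE Files SET FullPath = ? WHERE FileId = ?",
        ("/user/root/b.txt", 1),
    )

    # SELECT and collect every row as a list of tuples.
    cur.execute("SELECT FileId, FullPath FROM Files")
    rows = cur.fetchall()

    # DELETE the row, then count what remains.
    cur.execute("DELETE FROM Files WHERE FileId = ?", (1,))
    cur.execute("SELECT COUNT(*) FROM Files")
    remaining = cur.fetchone()[0]

    conn.commit()
    cur.close()
    return rows, remaining


# An in-memory SQLite database stands in for an HDFS connection here.
conn = sqlite3.connect(":memory:")
rows, remaining = run_basic_queries(conn)
conn.close()
```

With the connector installed, the same cursor-based flow would presumably start from the connector's own connect call rather than sqlite3.connect, and would target tables from the Data Model instead of a locally created one.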

See Using from Tools to connect HDFS data to tools such as Pandas or Petl.

SQLAlchemy ORM

SQLAlchemy can be used to model the tables in HDFS with mapped classes. See From SQLAlchemy for instructions on configuring the Python connector with SQLAlchemy.

Pandas

Pandas DataFrames can be used alongside the connector to generate analytical graphics. See From Pandas for a guide.

Schema Discovery

See Schema Discovery to query the provided system tables, which allow users to discover the available tables, views, and stored procedures, along with additional information about their columns or parameters.

Advanced Features

Advanced Features details additional features supported by the connector, such as user-defined views, SSL configuration, remoting, caching, firewall/proxy settings, and advanced logging.

SQL Compliance

See SQL Compliance for a syntax reference and code examples outlining the supported SQL.

Data Model

See Data Model for the available database objects. This section also provides more detailed information on querying specific HDFS entities.

Connection String Options

The Connection properties describe the various options that can be used to establish a connection.
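
As an illustration of how such options might be combined, the sketch below assembles a semicolon-delimited "Name=Value" string from a dictionary. The helper build_connection_string is hypothetical, and the property names Host, Port, Path, and User, as well as their values, are assumptions made for the example; check the Connection properties reference for the names the connector actually accepts.

```python
def build_connection_string(properties):
    """Join name/value pairs into a 'Name=Value;Name=Value;' string.

    Hypothetical helper for illustration; it is not part of the connector.
    Values containing ';' are rejected because this sketch does not
    implement any quoting or escaping rules.
    """
    parts = []
    for name, value in properties.items():
        text = str(value)
        if ";" in text:
            raise ValueError(f"value for {name!r} must not contain ';'")
        parts.append(f"{name}={text}")
    return ";".join(parts) + ";"


# Assumed property names and placeholder values, for illustration only.
conn_str = build_connection_string(
    {
        "Host": "localhost",
        "Port": 50070,
        "Path": "/user/root",
        "User": "root",
    }
)
print(conn_str)

# With the connector installed, the string would then be handed to its
# connect function. The module name below is an assumption; see
# Getting Started for the actual import.
# import cdata.hdfs as mod
# conn = mod.connect(conn_str)
```

Keeping the options in a dictionary makes it easy to load them from configuration or environment variables rather than hard-coding credentials in a script.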

Copyright (c) 2024 CData Software, Inc. - All rights reserved.