JDBC Driver for HDFS

Build 20.0.7587

CData JDBC Driver for HDFS

Overview

The CData JDBC Driver for HDFS lets Java-based applications and developer technologies connect to HDFS data through an easy-to-integrate, 100%-Java JDBC driver. Applications can then access HDFS as they would a traditional database. Beyond hiding the complexity of data access, the driver provides security features, smart caching, batching, socket management, and more.

Key Features

  • Deploy a single JAR that does not rely on client-side libraries.
  • Write SQL to retrieve data.
  • Compliant with JDBC 3.0 and JDBC 4.0.
  • Codeless integration with popular BI, reporting, and ETL tools.
  • Collaborative query processing.

Getting Started

See Getting Started for A-Z guides on authenticating and connecting to HDFS data. See the HDFS integration guides for information on connecting from other applications.

Using the JDBC Driver

See Using JDBC for examples of using standard JDBC classes like DataSource, Connection, Statement, ResultSet, and others, to work with HDFS data. Using from Tools walks through the steps of integration with JDBC tools, using several popular database tools as examples.
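
As a rough illustration of the standard JDBC classes mentioned above, the following sketch runs a query and collects each row as text. It uses only java.sql interfaces; the connection URL and SQL statement are passed in by the caller, because their exact form (URL prefix, property names, table names) is documented elsewhere and is not assumed here. The class name HdfsQuerySketch and its helper methods are hypothetical.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.ResultSetMetaData;
import java.sql.SQLException;
import java.sql.Statement;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper class; not part of the driver.
class HdfsQuerySketch {

    // Formats one row as "name=value, name=value".
    static String formatRow(List<String> names, List<String> values) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < names.size(); i++) {
            if (i > 0) {
                sb.append(", ");
            }
            sb.append(names.get(i)).append('=').append(values.get(i));
        }
        return sb.toString();
    }

    // Opens a connection, executes the query, and returns one string per row.
    // The url and sql arguments are supplied by the caller; see Getting Started
    // and Data Model for the values that apply to your installation.
    static List<String> query(String url, String sql) throws SQLException {
        List<String> rows = new ArrayList<>();
        try (Connection conn = DriverManager.getConnection(url);
             Statement stmt = conn.createStatement();
             ResultSet rs = stmt.executeQuery(sql)) {
            ResultSetMetaData meta = rs.getMetaData();
            int count = meta.getColumnCount();
            List<String> names = new ArrayList<>();
            for (int i = 1; i <= count; i++) {   // JDBC columns are 1-based
                names.add(meta.getColumnLabel(i));
            }
            while (rs.next()) {
                List<String> values = new ArrayList<>();
                for (int i = 1; i <= count; i++) {
                    values.add(rs.getString(i));
                }
                rows.add(formatRow(names, values));
            }
        }
        return rows;
    }
}
```

The try-with-resources statement closes the ResultSet, Statement, and Connection in reverse order even if the query fails, which is usually what you want for short-lived queries.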

Schema Discovery

See Schema Discovery to access schema information through the standard JDBC interfaces. Query the System Tables to access additional metadata, such as data source capabilities.
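
A minimal sketch of schema discovery through the standard java.sql.DatabaseMetaData interface might look like the following. It assumes you already hold an open Connection to the driver; the class name SchemaDiscoverySketch and the describe helper are hypothetical, and the tables and columns actually returned depend on the driver's data model.

```java
import java.sql.Connection;
import java.sql.DatabaseMetaData;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper class; not part of the driver.
class SchemaDiscoverySketch {

    // Formats one column description as "Table.Column (TYPE)".
    static String describe(String table, String column, String typeName) {
        return table + "." + column + " (" + typeName + ")";
    }

    // Lists every column of every table visible through the connection.
    static List<String> listColumns(Connection conn) throws SQLException {
        List<String> out = new ArrayList<>();
        DatabaseMetaData meta = conn.getMetaData();
        // null catalog and schema mean "do not filter"; "%" matches any name.
        try (ResultSet cols = meta.getColumns(null, null, "%", "%")) {
            while (cols.next()) {
                out.add(describe(cols.getString("TABLE_NAME"),
                                 cols.getString("COLUMN_NAME"),
                                 cols.getString("TYPE_NAME")));
            }
        }
        return out;
    }
}
```

Because this relies only on the JDBC metadata interfaces, the same code should work unchanged against other JDBC data sources.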

JDBC Remoting

See JDBC Remoting to configure remote access to the JDBC data source. The JDBC remoting feature hosts the JDBC connection on a server, so that clients on any platform (Java, .NET, C++, PHP, Python, and so on) can connect from virtually anywhere using any standards-based technology (ODBC, JDBC, and so on). JDBC remoting is enabled using the popular MySQL wire protocol server.

SQL Compliance

See SQL Compliance for a syntax reference and code examples outlining the supported SQL.

Caching Data

See Caching Data to configure replication and caching for a range of scenarios common to remote data access. Configurations include:

  • Autocache: Automatically cache data to a lightweight database. Save data for later offline use or enable fast reporting from the cache.
  • Replication: Copy data to local and cloud data stores such as Oracle, SQL Server, Google Cloud SQL, and so on. The replication commands allow for intelligent incremental updates to cached data.
  • No caching: Work with remote data only. No local cache file is created.

Data Model

See Data Model for information on the available entities and how to query them.

Collaborative Query Processing

When needed, the driver supplements the data source's capabilities with additional client-side processing to enable analytic summaries of data, such as SUM, AVG, MAX, and MIN.

See SupportEnhancedSQL, in the Connection section, for more information.
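
To make the idea of client-side processing concrete, the sketch below computes SUM, AVG, MAX, and MIN over values that have already been fetched to the client. This only illustrates the general technique; it is not the driver's implementation, and the class name ClientSideAggregateSketch is hypothetical.

```java
import java.util.List;
import java.util.LongSummaryStatistics;

// Hypothetical illustration of client-side aggregation; not driver code.
class ClientSideAggregateSketch {

    // Aggregates values on the client in a single pass.
    // The returned object exposes getSum(), getAverage(), getMax(), getMin().
    static LongSummaryStatistics summarize(List<Long> values) {
        return values.stream()
                     .mapToLong(Long::longValue)
                     .summaryStatistics();
    }
}
```

For example, summarize(Arrays.asList(10L, 20L, 60L)) yields a statistics object from which all four aggregates can be read. When the driver performs this kind of work for you, the application simply issues the aggregate in SQL.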

Connection String Options

The Connection properties describe the various options that can be used to establish a connection.
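
As a sketch of how connection options could be assembled in code, the helper below joins name=value pairs with semicolons after a URL prefix. The semicolon-delimited format, the prefix, and any property names you pass in are assumptions for illustration; consult the Connection section for the options the driver actually accepts. The class name ConnectionStringSketch is hypothetical, and values containing semicolons are not escaped.

```java
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical helper class; the URL format is an assumption, not taken from the driver reference.
class ConnectionStringSketch {

    // Builds "<prefix>Name1=Value1;Name2=Value2;" in the map's iteration order.
    // With no options, only the prefix is returned.
    static String build(String prefix, Map<String, String> options) {
        StringJoiner joiner = new StringJoiner(";", prefix, ";");
        joiner.setEmptyValue(prefix);
        for (Map.Entry<String, String> option : options.entrySet()) {
            joiner.add(option.getKey() + "=" + option.getValue());
        }
        return joiner.toString();
    }
}
```

Passing a LinkedHashMap keeps the options in insertion order, which makes the resulting string predictable when logging or comparing configurations.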

Copyright (c) 2020 CData Software, Inc. - All rights reserved.