CData Python Connector for Apache Impala

Build 21.0.7930

From Pandas

When combined with the connector, Pandas can be used to generate data frames which contains your Apache Impala data. Once created, a data frame can be passed to various other python packages.


Pandas will need to be imported before it can be used. Pandas will also rely on a SQLAlchemy engine when executing queries, as below:

import pandas as pd
from sqlalchemy import create_engine
engine = create_engine("apacheimpala:///?Server=;Port=21050;")

Querying Data

SELECT queries are provided in a call to the "read_sql()" method in pandas, alongside a relevant connection object. Pandas will execute the query on that connection, and return the results in the form of a data frame, which are used for a variety of purposes.

df = pd.read_sql("""
	FROM [CData].[Default].Customers;""", engine)

Modifying Data

To insert new records to a table, simply create a new data frame, and define its fields accordingly. From there, simply call "to_sql()" on the data frame to perform the INSERT operation with the connector, as in the below example. The "if _exists" argument must be set to "append" to prevent Pandas from attempting building the table from scratch, set index=False if needed to prevent Pandas from writing data frame index as a column:

df = pd.DataFrame({"City": ["Jon Deere"], "CompanyName": ["RSSBus Inc."]})
df.to_sql("[CData].[Default].Customers", con=engine, if_exists="append", index=False)

Copyright (c) 2021 CData Software, Inc. - All rights reserved.
Build 21.0.7930