From Pandas

When combined with the connector, Pandas can be used to generate data frames that contain your Apache Impala data. Once created, a data frame can be passed to various other Python packages.


Pandas must be imported before it can be used. Pandas also relies on a SQLAlchemy engine to execute queries, created as below:

import pandas as pd
from sqlalchemy import create_engine
engine = create_engine("apacheimpala:///?Server=;Port=21050;")

Querying Data

SELECT queries are passed to the "read_sql()" function in Pandas, alongside a relevant connection object. Pandas executes the query on that connection and returns the results as a data frame, which can then be used for a variety of purposes.

df = pd.read_sql("""
	SELECT
	City,
	CompanyName
	FROM [CData].[Default].Customers;""", engine)
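
The connector needs a live Impala server, so the following is only a minimal sketch of what "read_sql()" returns, using an in-memory SQLite database as a stand-in. The sqlite3 connection, the unqualified table name, and the sample rows are assumptions made for illustration; with the connector, the engine created earlier would be passed instead.

```python
import sqlite3

import pandas as pd

# Stand-in for the Impala engine (assumption): an in-memory SQLite database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Customers (City TEXT, CompanyName TEXT)")
conn.executemany(
    "INSERT INTO Customers VALUES (?, ?)",
    [("Raleigh", "Example Co."), ("Durham", "Sample LLC")],  # made-up rows
)
conn.commit()

# read_sql() runs the SELECT on the connection and returns a data frame
# with one column per selected field and one row per record.
df = pd.read_sql("SELECT City, CompanyName FROM Customers;", conn)

print(df.shape)          # (rows, columns)
print(list(df.columns))  # column names come from the SELECT list
```

The returned object is an ordinary data frame, so the usual Pandas operations (filtering, grouping, exporting) apply to it regardless of which connection produced it.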

Modifying Data

To insert new records into a table, create a new data frame and define its fields accordingly. Then call "to_sql()" on the data frame to perform the INSERT operation with the connector, as in the example below. The "if_exists" argument must be set to "append" to prevent Pandas from attempting to build the table from scratch. Set index=False if needed to prevent Pandas from writing the data frame index as a column:

df = pd.DataFrame({"City": ["Jon Deere"], "CompanyName": ["RSSBus Inc."]})
df.to_sql("[CData].[Default].Customers", con=engine, if_exists="append", index=False)
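
The effect of if_exists="append" and index=False can be checked without an Impala server. The sketch below again assumes an in-memory SQLite database as a stand-in; the connection, the unqualified table name, and the sample values are illustrative and not part of the connector.

```python
import sqlite3

import pandas as pd

# Stand-in for the Impala engine (assumption): an in-memory SQLite database
# that already holds a Customers table with one record.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Customers (City TEXT, CompanyName TEXT)")
conn.execute("INSERT INTO Customers VALUES ('Raleigh', 'Example Co.')")
conn.commit()

# New record to insert; the column names match the existing table.
new_rows = pd.DataFrame({"City": ["Durham"], "CompanyName": ["Sample LLC"]})

# if_exists="append" adds rows to the existing table instead of recreating it;
# index=False keeps the data frame index from being written as an extra column.
new_rows.to_sql("Customers", con=conn, if_exists="append", index=False)

result = pd.read_sql(
    "SELECT City, CompanyName FROM Customers ORDER BY City;", conn
)
print(result)
```

After the call, the table holds both the original record and the appended one, and its columns are unchanged.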

