CData Python Connector for Apache HBase

Build 24.0.9060

From Pandas

When combined with the connector, Pandas can be used to generate data frames that contain your Apache HBase data. Once created, a data frame can be passed to various other Python packages.

Connecting

Pandas relies on an SQLAlchemy engine to execute queries. Before you can use Pandas with the connector, import both packages and create an engine from a connection string:
import pandas as pd
from sqlalchemy import create_engine
engine = create_engine("apachehbase:///?Server=127.0.0.1;Port=8080;")
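
The connection string above is a list of Key=Value pairs, each terminated by a semicolon, appended to the apachehbase:///? prefix. As a minimal sketch, it can be assembled from a dictionary of properties. The helper name build_hbase_url is hypothetical and not part of the connector, and it assumes the values need no escaping.

```python
def build_hbase_url(props):
    """Join connection properties into an 'apachehbase:///?Key=Value;...' URL."""
    # Hypothetical helper, not part of the connector. No escaping is attempted,
    # so values containing ';' or '=' are assumed not to occur.
    pairs = "".join(f"{key}={value};" for key, value in props.items())
    return "apachehbase:///?" + pairs


url = build_hbase_url({"Server": "127.0.0.1", "Port": 8080})
print(url)  # apachehbase:///?Server=127.0.0.1;Port=8080;
```

The resulting string can then be passed to create_engine() in place of the literal.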

Querying Data

In Pandas, SELECT queries are passed to the read_sql() function together with a connection object, here the engine created above. Pandas executes the query on that connection and returns the results as a data frame.
df = pd.read_sql("""
	SELECT
	   Id,
	   Name
	FROM Account;""", engine)
print(df)
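
Once the data frame exists, the usual Pandas operations apply to it. The sketch below illustrates this with an in-memory SQLite database standing in for the connector, so that it runs without an HBase server. The Account table and its rows are made-up sample data. With the connector, you would pass the engine instead of the SQLite connection.

```python
import sqlite3

import pandas as pd

# Stand-in for the connector: an in-memory SQLite database holding a
# hypothetical Account table with made-up rows.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Account (Id TEXT, Name TEXT)")
conn.executemany(
    "INSERT INTO Account VALUES (?, ?)",
    [("1", "Acme"), ("2", "Globex"), ("3", "Acme")],
)
conn.commit()

# Same call shape as above: a SELECT statement plus a connection object.
df = pd.read_sql("SELECT Id, Name FROM Account", conn)

# Ordinary data frame operations work on the result.
acme = df[df["Name"] == "Acme"]      # filter rows by value
counts = df.groupby("Name").size()   # count rows per name
print(acme)
print(counts)
```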

Modifying Data

To insert new records into a table, create a new data frame and define its fields accordingly. Then call to_sql() on the data frame to perform the INSERT operation with the connector, as shown in the example below. You must set the "if_exists" argument to "append" to prevent Pandas from attempting to build the table from scratch. To prevent Pandas from writing the data frame index as a column, set index=False.
df = pd.DataFrame({"Id": ["Jon Doe"], "Name": ["Floppy Disks"]})
df.to_sql("Account", con=engine, if_exists="append", index=False)
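
The effect of if_exists="append" and index=False can be checked against a stand-in database. The sketch below uses an in-memory SQLite table rather than the connector, with made-up rows, and reads the table back after the insert. It shows the Pandas side of the call only. Behavior specific to the connector is not covered.

```python
import sqlite3

import pandas as pd

# Stand-in for the connector: an existing table that already holds one row.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Account (Id TEXT, Name TEXT)")
conn.execute("INSERT INTO Account VALUES ('1', 'Existing row')")
conn.commit()

new_rows = pd.DataFrame({"Id": ["2"], "Name": ["New row"]})

# "append" adds to the existing table instead of trying to create it;
# index=False keeps the data frame index out of the inserted columns.
new_rows.to_sql("Account", con=conn, if_exists="append", index=False)

result = pd.read_sql("SELECT Id, Name FROM Account ORDER BY Id", conn)
print(result)
```

With the default if_exists="fail", the same call raises a ValueError because the table already exists.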

Copyright (c) 2024 CData Software, Inc. - All rights reserved.