When combined with the connector, Pandas can be used to generate data frames that contain your HDFS data. Once created, a data frame can be passed to various other Python packages.
Pandas must be imported before it can be used. Pandas also relies on a SQLAlchemy engine to execute queries, as shown below:
import pandas as pd
from sqlalchemy import create_engine
engine = create_engine("hdfs:///?Host=sandbox-hdp.hortonworks.com;Port=50070;Path=/user/root;")
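If the host, port, or path changes between environments, it can help to assemble the connection URL from its parts rather than editing one long string. The following is a minimal sketch only: build_hdfs_url is a hypothetical helper (not part of the connector or SQLAlchemy), and it assumes the URL always has the same Key=Value; shape as the example above.

```python
def build_hdfs_url(host: str, port: int, path: str) -> str:
    """Assemble a connection URL in the same shape as the example above.

    Hypothetical helper for illustration; it only formats a string.
    """
    properties = {"Host": host, "Port": port, "Path": path}
    # Each property is written as Key=Value and terminated with a semicolon.
    return "hdfs:///?" + "".join(f"{key}={value};" for key, value in properties.items())


url = build_hdfs_url("sandbox-hdp.hortonworks.com", 50070, "/user/root")
print(url)
```

The resulting string can be passed to create_engine() in place of the literal URL, provided the connector's SQLAlchemy dialect is installed.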
SELECT queries are passed to the pandas "read_sql()" function, alongside a relevant connection object (here, the SQLAlchemy engine). Pandas executes the query on that connection and returns the results as a data frame, which can then be used for a variety of purposes.
df = pd.read_sql("""
    SELECT
        FileId,
        ChildrenNum,
        $exNumericCol;
    FROM Files;""", engine)
print(df)
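To try the same read_sql() pattern without an HDFS cluster or the connector, the sketch below swaps in an in-memory SQLite engine as a stand-in. The sample rows and the ChildrenNum filter are made up for illustration; only the table and column names (Files, FileId, ChildrenNum) come from the query above. It also shows one way a data frame might be handed to another package, here NumPy via to_numpy().

```python
import pandas as pd
from sqlalchemy import create_engine

# Stand-in engine: an in-memory SQLite database, used only so this sketch
# runs without the HDFS connector installed.
engine = create_engine("sqlite://")

# Hypothetical sample rows shaped like the Files table queried above.
sample = pd.DataFrame({"FileId": [101, 102, 103], "ChildrenNum": [0, 4, 2]})
sample.to_sql("Files", engine, index=False)

# Same call shape as before: a SELECT statement plus the engine.
df = pd.read_sql(
    "SELECT FileId, ChildrenNum FROM Files WHERE ChildrenNum > 0",
    engine,
)
print(df)

# Once loaded, the data frame can be handed to other packages, e.g. NumPy.
children = df["ChildrenNum"].to_numpy()
print(children.sum())
```

With the connector installed, replacing the SQLite URL with the HDFS connection URL should leave the rest of the code unchanged, since read_sql() only depends on the engine it is given.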