---
title: Use Apache Phoenix with HBase - Azure HDInsight | Microsoft Docs
description: Learn how to use Apache Phoenix in HDInsight.
services: hdinsight
documentationcenter: ''
author: mumian
manager: jhubbard
editor: cgronlun
ms.assetid: cda0f33b-a2e8-494c-972f-ae0bb482b818
ms.service: hdinsight
ms.custom: hdinsightactive
ms.devlang: na
ms.topic: article
ms.tgt_pltfrm: na
ms.workload: big-data
ms.date: 05/26/2017
ms.author: jgao
---
Apache Phoenix in HDInsight lets you query HBase by using SQL instead of raw scans. You use the SQLLine utility to submit SQL statements to Phoenix and view the results.
For more information about Phoenix, see Phoenix in 15 minutes or less, and see Phoenix in HDInsight for an overview.
For information about the Phoenix grammar, see Phoenix Grammar.
> [!NOTE]
> For the Phoenix version information in HDInsight, see What's new in the Hadoop cluster versions provided by HDInsight?.
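As a quick taste of the Phoenix grammar before the walkthrough below, the following sketch creates a table, upserts a row, and queries it from SQLLine. The inventory table is a hypothetical example and is not used later in this article.

```sql
-- Hypothetical example table; Phoenix maps the primary key to the HBase row key.
CREATE TABLE IF NOT EXISTS inventory (
    id VARCHAR PRIMARY KEY,
    name VARCHAR,
    stock INTEGER);

-- Phoenix uses UPSERT in place of separate INSERT and UPDATE statements.
UPSERT INTO inventory VALUES ('b001', 'widget', 42);

-- Standard SQL SELECTs compile down to HBase scans under the covers.
SELECT name, stock FROM inventory WHERE stock > 10;
```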
SQLLine is a command-line utility for executing SQL.
Before you can use SQLLine, you must have the following:
- An HBase cluster in HDInsight. For information on provisioning an HBase cluster, see Get started with Apache HBase in HDInsight.
- An SSH connection to the HBase cluster. For instructions, see [Connect to HDInsight using SSH][hdinsight-hadoop-linux-use-ssh-unix].
To query HBase, you SSH to the head node and run queries from there. These queries need to communicate with the ZooKeeper quorum, so you first need the hostname of a ZooKeeper node. To find it, follow these steps in the Ambari web UI, or use the REST API sketch after the list:
- Open Ambari by browsing to `https://<ClusterName>.azurehdinsight.net`.
- Enter the HTTP (cluster) username and password to log in.
- Click ZooKeeper from the left menu. You see three ZooKeeper Servers listed.
- Click one of the ZooKeeper Servers listed. On the Summary pane, find the Hostname. It is similar to `zk1-jdolehb.3lnng4rcvp5uzokyktxs4a5dhd.bx.internal.cloudapp.net`.
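If you prefer the command line, a sketch like the following uses the Ambari REST API to list the ZooKeeper Server hosts instead of clicking through the web UI. The admin account name, the exact JSON formatting that the grep pattern matches, and the placeholders are assumptions for illustration.

```bash
# Sketch: list ZooKeeper Server host names through the Ambari REST API.
# Replace CLUSTERNAME and PASSWORD with your cluster name and HTTP password.
curl -u admin:PASSWORD -sS \
    "https://CLUSTERNAME.azurehdinsight.net/api/v1/clusters/CLUSTERNAME/services/ZOOKEEPER/components/ZOOKEEPER_SERVER" \
    | grep -o '"host_name" : "[^"]*"'
```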
To show how you can use HDInsight HBase and Phoenix to analyze big data stores, this walkthrough uses Phoenix to do sales analysis for an imaginary online book store.
To begin, upload the sample data used in the remaining exercises. Read more about the various ways to upload data for Hadoop jobs in HDInsight.
- Browse to the HBase HDInsight cluster in the Azure portal.
- On the left-hand menu, select Storage accounts.
- Select your Azure Blob Storage or Azure Data Lake Store account listed in the Storage accounts pane.
- Browse to the `/example/data` folder. If you are using Azure Blob Storage, the `/example/data` folder is located within your cluster's container. In Azure Data Lake Store, use the data explorer to navigate to `/clusters/hbase/example/data`.
- Upload the sample weblogs.csv file to this location. (An Azure CLI alternative is sketched after this list.)
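If you would rather upload the file from a terminal than through the portal, a sketch like the following uses the Azure CLI. ACCOUNT and CONTAINER are placeholders for your storage account and the cluster's default container, and authentication options are omitted; this applies to Azure Blob Storage only.

```bash
# Sketch: upload the sample CSV to the cluster's default container.
# Authentication flags (for example, --account-key) are omitted here.
az storage blob upload \
    --account-name ACCOUNT \
    --container-name CONTAINER \
    --name example/data/weblogs.csv \
    --file ./weblogs.csv
```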
Phoenix provides two ways to bulk load data into HBase: the PSQL command-line utility, which is a single-threaded client loading tool, and a MapReduce bulk loading utility. PSQL is suited for gigabytes of data, whereas MapReduce is used for much larger data volumes.
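This walkthrough uses PSQL. For reference, a MapReduce invocation would look roughly like the following sketch, which uses Phoenix's CsvBulkLoadTool; the client jar path and classpath setup are assumptions that vary by Phoenix and HDP version, so treat this as an outline rather than a tested command.

```bash
# Sketch only: bulk load a CSV with Phoenix's MapReduce CsvBulkLoadTool.
# The jar path and HBase classpath below vary by Phoenix/HDP version.
HADOOP_CLASSPATH=$(hbase mapredcp):/etc/hbase/conf hadoop jar \
    /usr/hdp/current/phoenix-client/phoenix-client.jar \
    org.apache.phoenix.mapreduce.CsvBulkLoadTool \
    --table WEBLOGS \
    --input /example/data/weblogs.csv
```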
To bulk upload data into HBase by using the Phoenix PSQL command-line tool, follow these steps:
- SSH into your HBase cluster. For more information, see Use SSH with HDInsight.

- From SSH, run the following commands to start SQLLine:

  ```bash
  cd /usr/hdp/current/phoenix-client/bin
  ./sqlline.py <ZooKeeperHostname>:2181:/hbase-unsecure
  ```

- Run the following command to create the weblogs table:

  ```sql
  CREATE TABLE weblogs (
      transactionid varchar(500) PRIMARY KEY,
      "t1".transactiondate DATE NULL,
      "t1".customerid varchar(50) NULL,
      "t1".bookid varchar(50) NULL,
      "t1".purchasetype varchar(50) NULL,
      "t1".orderid varchar(50) NULL,
      "t1".bookname varchar(50) NULL,
      "t1".categoryname varchar(50) NULL,
      "t1".invoicenumber varchar(50) NULL,
      "t1".invoicestatus varchar(50) NULL,
      "t1".city varchar(50) NULL,
      "t1".state varchar(50) NULL,
      "t2".paymentamount DOUBLE NULL,
      "t2".quantity INTEGER NULL,
      "t2".shippingamount DOUBLE NULL);
  ```
  Note: The preceding query creates a weblogs table with two column families, t1 and t2. Column families are stored in separate HFiles, so it makes sense to use a separate column family for data that is queried often. The paymentamount, quantity, and shippingamount columns will be queried often, so they are placed in a different column family.
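  To confirm that the table was created, you can optionally run SQLLine's built-in commands at the same prompt. This check is not part of the original steps.

  ```sql
  !tables
  !describe WEBLOGS
  ```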
- Open a new Hadoop command-line window (a second SSH connection to the cluster), leaving the SQLLine session open, and run the following command:

  ```bash
  cd /usr/hdp/current/phoenix-client/bin
  ```

- Next, copy the weblogs.csv file from your storage account to the local temp directory via `hdfs`:

  ```bash
  hdfs dfs -copyToLocal /example/data/weblogs.csv /tmp/
  ```
- Finally, use PSQL to bulk insert the rows into the new HBase table:

  ```bash
  ./psql.py -t WEBLOGS [The FQDN of one of the Zookeepers] /tmp/weblogs.csv
  ```

  This command runs the PSQL client utility to bulk upload data into the weblogs table. It takes three parameters: the table name (-t), the ZooKeeper FQDN, and the path of the CSV file to bulk load from.
Once the PSQL operation is complete, you should see output in your command window similar to the following:

```
CSV Upsert complete. 278972 rows upserted
Time: 64.549 sec(s)
```
Close the extra Hadoop command-line window that you opened to run the PSQL script, and switch back to the window running SQLLine. If you closed that window, follow the earlier steps to open SQLLine again.
Execute the following query to select the book name and quantity sold of books purchased between January and March:

```sql
SELECT bookname, SUM(quantity) AS QuantitySold FROM weblogs
WHERE purchasetype = 'Purchased' AND MONTH(transactiondate) >= 1 AND MONTH(transactiondate) <= 3
GROUP BY bookname;
```
You should see output similar to the following:

```
+---------------------------------------+---------------+
|               BOOKNAME                | QUANTITYSOLD  |
+---------------------------------------+---------------+
| Advances in school psychology         | 90233         |
| History of political economy          | 93772         |
| New Christian poetry                  | 90467         |
| Science in Dispute                    | 92221         |
| Space fact and fiction                | 95544         |
| THE BOOK OF WITNESSES                 | 93753         |
| The adventures of Arthur Conan Doyle  | 93710         |
| The voyages of Captain Cook           | 94207         |
| Understanding American politics       | 91119         |
+---------------------------------------+---------------+
9 rows selected (21.942 seconds)
```
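As a further exercise (a hypothetical follow-up, not part of the original walkthrough), you could aggregate over the t2 column family, for example totaling payment and shipping amounts by state, assuming the same weblogs schema created earlier:

```sql
-- Hypothetical follow-up query against the weblogs schema from this article.
SELECT state,
       SUM(paymentamount)  AS TotalPayment,
       SUM(shippingamount) AS TotalShipping
FROM weblogs
WHERE purchasetype = 'Purchased'
GROUP BY state
ORDER BY SUM(paymentamount) DESC;
```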
For more information, see the SQLLine manual and the Phoenix Grammar.
In this article, you have learned how to use Apache Phoenix in HDInsight. To learn more, see:
- HDInsight HBase overview: HBase is an open-source Apache NoSQL database built on Hadoop that provides random access and strong consistency for large amounts of unstructured and semistructured data.
- Provision HBase clusters on Azure Virtual Network: With virtual network integration, HBase clusters can be deployed to the same virtual network as your applications so that applications can communicate with HBase directly.
- Configure HBase replication in HDInsight: Learn how to configure HBase replication across two Azure datacenters.
- Analyze Twitter sentiment with HBase in HDInsight: Learn how to do real-time sentiment analysis of big data by using HBase in a Hadoop cluster in HDInsight.