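Hue is an open-source UI for Apache Hadoop, built on the Python web framework Django. It lets developers interact with a Hadoop cluster from a web console in the browser to analyze and process data, for example working with data on HDFS or running MapReduce jobs.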

This post uses HUE 3.9.0. (In production it is best to avoid 3.9, because its code structure differs from both 3.7 and 3.10.)

HUE build

HUE has to be compiled before use, and compiling requires some dependency packages. My environment is CentOS, and I installed the following:

sudo yum install gcc gcc-c++ libffi-devel libxml2-devel libxslt-devel openldap-devel python-devel sqlite-devel openssl-devel gmp-devel cyrus-sasl-devel cyrus-sasl-gssapi cyrus-sasl-plain krb5-devel

mvn, mysql, mysql-devel, the JDK and make had already been installed offline beforehand.
To check whether a dependency is installed, run rpm -q <package name>.
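
The rpm -q check can be run over the whole dependency list in one go. The following is a minimal Python sketch and not part of HUE: the helper names `rpm_query` and `missing_packages` are made up for illustration, and it relies on `rpm -q` returning a non-zero exit status for a package that is not installed.

```python
import subprocess

# Packages from the yum command above.
DEPENDENCIES = [
    "gcc", "gcc-c++", "libffi-devel", "libxml2-devel", "libxslt-devel",
    "openldap-devel", "python-devel", "sqlite-devel", "openssl-devel",
    "gmp-devel", "cyrus-sasl-devel", "cyrus-sasl-gssapi", "cyrus-sasl-plain",
    "krb5-devel",
]


def rpm_query(package):
    """Return True if `rpm -q <package>` exits with status 0 (hypothetical helper)."""
    result = subprocess.run(
        ["rpm", "-q", package],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


def missing_packages(packages, is_installed=rpm_query):
    """Return the packages that the query function reports as not installed."""
    return [pkg for pkg in packages if not is_installed(pkg)]
```

On the build machine, `missing_packages(DEPENDENCIES)` returns the names that still need a `sudo yum install`. The query function is a parameter so the logic can be tried without rpm being present.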

For a more detailed list of dependencies, see the reference linked here.

Once the dependencies are installed, unpack the HUE source archive, change into HUE_HOME, and run make apps.

By default HUE is built with python2.6. To build with a different Python version, edit Makefile.vars in the HUE_HOME directory and set the SYS_PYTHON variable, for example SYS_PYTHON := python2.7

Problems encountered

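I hit a few problems during the build. None were serious: each was caused by a missing dependency and was fixed by installing it. The errors were: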

  • src/_fastmath.c:36:18: error: gmp.h: No such file or directory

Run sudo yum install gmp gmp-devel

  • src/connection.h:33:21: error: sqlite3.h: No such file or directory

Run sudo yum install sqlite-devel

  • Modules/errors.h:8:18: error: lber.h: No such file or directory
    Modules/errors.h:9:18: error: ldap.h: No such file or directory

Run sudo yum install openldap-devel
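
All three errors follow the same pattern: a missing C header that maps to a package. As a rough sketch, a saved build log could be scanned for such messages and turned into an install list. The function name `suggest_packages` is hypothetical, and the mapping covers only the headers seen above; other headers would have to be added by hand.

```python
import re

# Header -> yum packages, taken from the three errors listed above.
HEADER_TO_PACKAGES = {
    "gmp.h": ["gmp", "gmp-devel"],
    "sqlite3.h": ["sqlite-devel"],
    "lber.h": ["openldap-devel"],
    "ldap.h": ["openldap-devel"],
}

# Matches gcc messages such as "error: gmp.h: No such file or directory".
MISSING_HEADER = re.compile(r"error: (\S+\.h): No such file or directory")


def suggest_packages(build_log):
    """Return the packages to install for the known missing headers in the log."""
    packages = []
    for header in MISSING_HEADER.findall(build_log):
        for pkg in HEADER_TO_PACKAGES.get(header, []):
            if pkg not in packages:  # keep first-seen order, no duplicates
                packages.append(pkg)
    return packages
```

Headers that are not in the mapping are ignored silently, so an empty result does not prove that the build log is clean.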

HUE deployment

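Deploying HUE is also fairly simple. The configuration files live in ~/hue-3.9.0/desktop/conf. The file name depends on the version: some versions use pseudo-distributed.ini and others hue.ini. In 3.9.0 it is hue.ini.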

The properties that need to be set are:

[desktop]
# Set this to a random string, the longer the better.
# This is used for secure hashing in the session store.
# Any characters you like, 30-60 characters long
secret_key=iowerwerwjdsfkjfksjdfiowjeiorujfklsjdf234
# IP of the machine running HUE; this IP and port are what you use to reach HUE
http_host=192.168.244.131
http_port=8888
# Time zone name
time_zone=Asia/Shanghai
# Linux user that runs the hue process
server_user=hadoop
server_group=hadoop
# This should be the Hue admin and proxy user
default_user=hadoop
# Administrator user of the Hadoop cluster
default_hdfs_superuser=hadoop

[hadoop]
# Configuration for HDFS NameNode
# ------------------------------------------------------------------------
[[hdfs_clusters]]
# HA support by using HttpFs
[[[default]]]
# Enter the filesystem uri
# As set in core-site.xml
fs_defaultfs=hdfs://centos:9000
# Use WebHdfs/HttpFs as the communication mechanism.
# Domain should be the NameNode or HttpFs host.
# Default port is 14000 for HttpFs.
webhdfs_url=http://192.168.244.131:50070/webhdfs/v1
hadoop_conf_dir=$HADOOP_CONF_DIR
[[yarn_clusters]]
[[[default]]]
# Enter the host on which you are running the ResourceManager
resourcemanager_host=192.168.244.131
# The port where the ResourceManager IPC listens on
resourcemanager_port=8032
# Whether to submit jobs to this cluster
submit_to=True

# URL of the ResourceManager API
resourcemanager_api_url=http://192.168.244.131:8088
# URL of the HistoryServer API
history_server_api_url=http://192.168.244.131:19888

[beeswax]
# Host where HiveServer2 is running.
# If Kerberos security is enabled, use fully-qualified domain name (FQDN).
# IP of the machine running HiveServer2
hive_server_host=192.168.244.131
# Port where HiveServer2 Thrift server runs on.
hive_server_port=10000
# Hive configuration directory, where hive-site.xml is located
hive_conf_dir=/home/hadoop/hive/conf

[zookeeper]
[[clusters]]
[[[default]]]
# Zookeeper ensemble. Comma separated list of Host/Port.
# e.g. localhost:2181,localhost:2182,localhost:2183
host_ports=192.168.244.131:2181
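
The secret_key above was typed by hand. One way to get a less predictable value is to generate it. This is a small sketch using Python 3's standard `secrets` module, run separately from HUE's own Python 2 build environment. The function name `make_secret_key` and the choice of a letters-and-digits alphabet are assumptions made here to keep the value safe to paste into an ini file; the 30-60 length range comes from the comment in the config.

```python
import secrets
import string

# Letters and digits only, so the value needs no quoting in hue.ini.
ALPHABET = string.ascii_letters + string.digits


def make_secret_key(length=50):
    """Return a random alphanumeric string for use as the secret_key value."""
    if not 30 <= length <= 60:
        raise ValueError("length should be between 30 and 60")
    return "".join(secrets.choice(ALPHABET) for _ in range(length))


# Print a line that can be pasted into the [desktop] section.
print("secret_key=" + make_secret_key())
```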

Once the configuration is done and HiveServer2 is running, start HUE with:

# From the HUE_HOME directory
build/env/bin/supervisor

HUE is then reachable at 192.168.244.131:8888.
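
If the page does not load, or an app inside HUE reports a connection error, a quick first check is whether the ports named in hue.ini accept TCP connections at all. The sketch below uses only the standard library. The helper names `port_open` and `report` are made up for illustration, and the host and ports are copied from the configuration above. An open port only shows that something is listening there, not that the service behind it is healthy.

```python
import socket

# Host and ports taken from the hue.ini values above.
HOST = "192.168.244.131"
SERVICES = {
    "NameNode WebHDFS": 50070,
    "ResourceManager IPC": 8032,
    "ResourceManager API": 8088,
    "HistoryServer API": 19888,
    "HiveServer2": 10000,
    "ZooKeeper": 2181,
    "HUE web UI": 8888,
}


def port_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port can be established."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, unreachable, or timed out
        return False


def report(host=HOST, services=SERVICES):
    """Print one line per service saying whether its port accepts connections."""
    for name, port in services.items():
        state = "open" if port_open(host, port) else "unreachable"
        print("{:<22} {}:{} {}".format(name, host, port, state))
```

Calling `report()` from a machine on the same network prints the status of each port in turn.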

Switching the HUE database to MySQL

HUE's own data, such as account information, is stored in sqlite3 by default. In production it is usually moved to MySQL for safety. The steps are as follows:

1. Stop HUE first, then edit the configuration file

[desktop]
[[database]]
# Database engine is typically one of:
# postgresql_psycopg2, mysql, sqlite3 or oracle.
#
# Note that for sqlite3, 'name', below is a path to the filename. For other backends, it is the database name.
# Note for Oracle, options={"threaded":true} must be set in order to avoid crashes.
# Note for Oracle, you can use the Oracle Service Name by setting "port=0" and then "name=<host>:<port>/<service_name>".
# Note for MariaDB use the 'mysql' engine.
engine=mysql
host=127.0.0.1
port=3306
user=root
password=root
## Name of the database that stores HUE's data
name=hue

2. Back up the data stored in sqlite3

## From the HUE_HOME directory; the backup file must be in JSON format
sudo build/env/bin/hue dumpdata > hue.json

After the backup, open hue.json and delete every JSON object whose model field is useradmin.userprofile.
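
Editing a large hue.json by hand is error-prone, so the deletion can also be scripted. The sketch below assumes the usual Django dumpdata layout, a JSON list of objects that each carry a "model" key. The function names and the output file name in the usage note are hypothetical.

```python
import json


def drop_userprofiles(objects):
    """Return the dumped objects without the useradmin.userprofile entries."""
    return [obj for obj in objects if obj.get("model") != "useradmin.userprofile"]


def clean_dump(src_path, dst_path):
    """Read a dumpdata JSON file, filter it, and write the result to dst_path."""
    with open(src_path) as src:
        objects = json.load(src)
    with open(dst_path, "w") as dst:
        json.dump(drop_userprofiles(objects), dst)
```

For example, `clean_dump("hue.json", "hue_clean.json")` writes a filtered copy and leaves the original backup untouched. The filtered file is then the one to pass to loaddata.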

3. In MySQL, create the hue database named in the configuration file: create database hue;
4. Sync the table structure

sudo build/env/bin/hue syncdb --noinput 
sudo build/env/bin/hue migrate

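5. Check whether a foreign key has to be dropped
In the mysql command line, first look at the CREATE statement of auth_permission with show create table auth_permission;. If the table is InnoDB, drop the foreign key with ALTER TABLE auth_permission DROP FOREIGN KEY content_type_id_refs_id_XXXXXX;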

6. Delete the rows in the django_content_type table: DELETE FROM hue.django_content_type;

7. Load the data: build/env/bin/hue loaddata hue.json

8. If the foreign key was dropped in step 5, add it back: ALTER TABLE auth_permission ADD FOREIGN KEY (content_type_id) REFERENCES django_content_type (id);

If everything completed without errors, start HUE. The database has now been switched from sqlite to MySQL.
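
In step 5 the suffix of the foreign key name (the XXXXXX part) differs per installation, so it has to be read from the show create table output. As an illustration, the sketch below pulls the constraint name out of that output and builds the DROP statement. The function name `drop_fk_statement` is hypothetical. It assumes the output has been pasted or captured as text and that MySQL prints the constraint in its usual backtick-quoted form.

```python
import re

# Matches the constraint line MySQL prints in SHOW CREATE TABLE output, e.g.
#   CONSTRAINT `some_name` FOREIGN KEY (`content_type_id`) REFERENCES ...
FK_PATTERN = re.compile(r"CONSTRAINT `([^`]+)` FOREIGN KEY \(`content_type_id`\)")


def drop_fk_statement(show_create_output):
    """Return the ALTER TABLE statement for step 5, or None if it is not needed."""
    if "ENGINE=InnoDB" not in show_create_output:
        return None  # not InnoDB: step 5 only applies to InnoDB tables
    match = FK_PATTERN.search(show_create_output)
    if match is None:
        return None  # no foreign key on content_type_id was found
    return "ALTER TABLE auth_permission DROP FOREIGN KEY {};".format(match.group(1))
```

A return value of None means there is nothing to drop, in which case step 8 is skipped as well.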

Deploying the MySQL Editor

Besides the components of the Hadoop ecosystem, HUE can also work with relational databases.

This only requires a change to the configuration file. Taking MySQL as an example:

[librdbms]
[[databases]]
[[[mysql]]]
# Name to show in the UI.
nice_name="My SQL DB"

# For MySQL and PostgreSQL, name is the name of the database.
# For Oracle, Name is instance of the Oracle server. For express edition
# this is 'xe' by default.
## name=mysqldb

# Database backend to use. This can be:
# 1. mysql
# 2. postgresql
# 3. oracle
engine=mysql

# IP or hostname of the database to connect to.
## host=localhost

# Port the database server is listening to. Defaults are:
# 1. MySQL: 3306
# 2. PostgreSQL: 5432
# 3. Oracle Express Edition: 1521
## port=3306
# Username to authenticate with when connecting to the database.
user=root

# Password matching the username to authenticate with when
# connecting to the database.
password=root

# Database options to send to the server when connecting.
# https://docs.djangoproject.com/en/1.4/ref/databases/
## options={}