Installing and Testing R, RHive, rJava, and RHadoop on CDH 5.5.6
Published: 2019-06-23


Deployment machines

NameNode1
NameNode2
DataNode1
DataNode2
DataNode3

R installation directory

/usr/local/lib64/R
RStudio Server installation directory
/usr/lib/rstudio-server

R installation steps
1. Before compiling, make sure the following packages are installed. Run this on every machine.
yum install gcc-gfortran gcc gcc-c++ libXt-devel openssl-devel readline-devel glibc-headers

2. Install R (on every node)

Extract
tar -zxvf R-3.2.0.tar.gz
Compile
cd R-3.2.0
./configure --prefix=/usr/local --disable-nls --enable-R-shlib  # --disable-nls and --enable-R-shlib prepare for the RHive installation; they can be omitted if RHive is not installed.
make
make install
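readline-devel and libXt-devel are needed when compiling R. --enable-R-shlib builds R as a shared library, which RStudio also requires, so keep it if you plan to install RStudio Server.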
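After make install, a quick sanity check is to look for the shared library that --enable-R-shlib is meant to produce. The following is a minimal sketch, assuming R ended up in /usr/local/lib64/R as listed above; the helper name has_r_shlib is made up for this example.

```shell
# Hypothetical helper: succeed if libR.so exists under the given R home.
has_r_shlib() {
    r_home="${1:-/usr/local/lib64/R}"   # assumed default, taken from the install directory above
    [ -f "$r_home/lib/libR.so" ]
}

if has_r_shlib; then
    echo "libR.so found: R was built as a shared library"
else
    echo "libR.so not found: R may need rebuilding with --enable-R-shlib" >&2
fi
```

Pass a different directory as the first argument if R was installed under another prefix.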

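3. Confirm the Java environment variables

RHadoop depends on the rJava package. Before installing rJava, confirm that the Java environment variables are configured, then let R set up its connection to the JVM.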
R CMD javareconf
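One way to confirm the Java environment before running the command above is a small pre-check. This is only a sketch: the function name check_java_home is hypothetical, and it assumes that Java is located through the JAVA_HOME variable.

```shell
# Hypothetical pre-check: JAVA_HOME must be set and contain an executable bin/java.
check_java_home() {
    if [ -z "${JAVA_HOME:-}" ]; then
        echo "JAVA_HOME is not set" >&2
        return 1
    fi
    if [ ! -x "$JAVA_HOME/bin/java" ]; then
        echo "no executable java under $JAVA_HOME/bin" >&2
        return 1
    fi
    echo "JAVA_HOME=$JAVA_HOME"
}

# Reconfigure R's Java settings only when the check passes:
# check_java_home && R CMD javareconf
```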

4. Install rJava, RHive, and related modules

R CMD INSTALL rJava_0.9-6.tar.gz
R CMD INSTALL Rserve_1.8-3.tar.gz
R CMD INSTALL RHive_2.0-0.10.tar.gz

5. Configure RHive

Create the RHive data storage path (on the local file system, not HDFS)
I store it in /www/store/rhive/data
mkdir -p /www/store/rhive/data

Create an Rserv.conf file containing the line "remote enable" and save it in a directory of your choice

I put it in /www/cloud/R/Rserv.conf
mkdir -p /www/cloud/R
vi /www/cloud/R/Rserv.conf
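Since the file has to exist on every node, writing it non-interactively may be more convenient than editing it with vi. A minimal sketch, where the helper name write_rserv_conf is made up for this example:

```shell
# Hypothetical helper: create the Rserve config file with remote access enabled.
write_rserv_conf() {
    conf="$1"
    mkdir -p "$(dirname "$conf")"       # create the parent directory if needed
    printf 'remote enable\n' > "$conf"  # the single setting described above
}

# Example call, using the path from the text above:
# write_rserv_conf /www/cloud/R/Rserv.conf
```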

Edit /etc/profile on every node and on the master to add the environment variable

export RHIVE_DATA=/www/store/rhive/data
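Because the same line has to be added on several machines, it can help to append it only when it is not already there. The sketch below assumes nothing beyond grep and printf; add_export is a hypothetical helper name.

```shell
# Hypothetical helper: append "export NAME=value" to a profile file
# unless an export of that variable is already present.
add_export() {
    profile="$1"; name="$2"; value="$3"
    if grep -q "^export $name=" "$profile" 2>/dev/null; then
        return 0   # already configured, leave the file untouched
    fi
    printf 'export %s=%s\n' "$name" "$value" >> "$profile"
}

# Example call (run as root on each node):
# add_export /etc/profile RHIVE_DATA /www/store/rhive/data
```

Running it twice leaves a single export line in the file.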

Upload all files in the lib directory under the R directory to /rhive/lib on HDFS (create the directory manually if it does not exist)

cd /usr/local/lib64/R/lib
hadoop fs -put ./* /rhive/lib
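The directory creation and the upload can be combined in one step. This is a sketch under the assumption that the hadoop command is on the PATH; the function name upload_r_libs is hypothetical.

```shell
# Hypothetical helper: upload every file in a local directory to an HDFS directory,
# creating the HDFS directory first if it does not exist.
upload_r_libs() {
    src_dir="$1"
    dest_dir="$2"
    hadoop fs -mkdir -p "$dest_dir" || return 1
    hadoop fs -put "$src_dir"/* "$dest_dir"
}

# Example call, using the paths from the text above:
# upload_r_libs /usr/local/lib64/R/lib /rhive/lib
```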

6. Start

Run on all nodes and on the master
R CMD Rserve --RS-conf /www/cloud/R/Rserv.conf
telnet NameNode1 6311
telnet NameNode2 6311
telnet DataNode1 6311
telnet DataNode2 6311
telnet DataNode3 6311

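If telnet is not available, install it with the following commands

yum install telnet-server   # install the telnet service
yum install telnet.*        # install the telnet client

Then telnet from the master node to every slave node. A reply of Rsrv0103QAP1 means the connection succeeded.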
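The same check can be scripted instead of typing telnet for each host. The sketch below is an alternative to telnet, not what was used in this setup: it relies on bash's /dev/tcp pseudo-device plus the timeout and head commands, and both function names are made up for this example.

```shell
# Hypothetical helper: does a greeting look like an Rserve banner such as Rsrv0103QAP1?
is_rserve_banner() {
    case "$1" in
        Rsrv*) return 0 ;;
        *)     return 1 ;;
    esac
}

# Read the first 12 bytes a host sends on the Rserve port and classify them.
# The connection itself is opened by bash (via /dev/tcp), with a 5 second limit.
check_rserve() {
    host="$1"; port="${2:-6311}"
    banner=$(timeout 5 bash -c 'exec 3<>"/dev/tcp/$0/$1" && head -c 12 <&3' "$host" "$port" 2>/dev/null)
    is_rserve_banner "$banner"
}

# Example loop over the machines listed at the top:
# for h in NameNode1 NameNode2 DataNode1 DataNode2 DataNode3; do
#     check_rserve "$h" && echo "$h: Rserve is answering" || echo "$h: no Rserve banner"
# done
```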

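Start the Hive remote service. RHive connects to the Hive server through Thrift, so the Thrift service must be running in the background: start the Hive remote service from the Hive client. Skip this step if it is already running.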

nohup hive --service hiveserver &

7. RHive test

library(RHive)
rhive.init
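I recorded this as an initialization error and left it unresolved. The output below, however, is only the source of the function, which is what R prints when a function name is entered without parentheses. rhive.init() is the call that actually runs the initialization.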
function (hiveHome = NULL, hiveLib = NULL, hadoopHome = NULL,
hadoopConf = NULL, hadoopLib = NULL, verbose = FALSE)
{
tryCatch({
.rhive.init(hiveHome = hiveHome, hiveLib = hiveLib, hadoopHome = hadoopHome,
hadoopConf = hadoopConf, hadoopLib = hadoopLib, verbose = verbose)
}, error = function(e) {
.handleErr(e)
})
}
<environment: namespace:RHive>

rhive.connect(host ="172.16.9.32")

The connection prints the following warning, which I did not resolve
Warning:
+----------------------------------------------------------+
+ / hiveServer2 argument has not been provided correctly. +
+ / RHive will use a default value: hiveServer2=TRUE. +
+----------------------------------------------------------+

Reading data nevertheless succeeded

d <- rhive.query('select * from src.v_mzdm limit 1000')

In RStudio Server the environment variables must be set explicitly

Sys.setenv("HIVE_HOME"="/opt/cloudera/parcels/CDH-5.5.6-1.cdh5.5.6.p0.2/lib/hive")
Sys.setenv("HADOOP_HOME"="/opt/cloudera/parcels/CDH-5.5.6-1.cdh5.5.6.p0.2/lib/hadoop")

8. Install the RHadoop dependencies. Run the commands in this order, because the packages depend on one another.
R CMD INSTALL Rcpp_0.12.17.tar.gz
R CMD INSTALL plyr_1.8.3.tar.gz
R CMD INSTALL stringi_1.2.3.tar.gz
R CMD INSTALL glue_1.2.0.tar.gz
R CMD INSTALL magrittr_1.5.tar.gz
R CMD INSTALL stringr_1.3.0.tar.gz
R CMD INSTALL reshape2_1.4.2.tar.gz
R CMD INSTALL iterators_1.0.9.tar.gz
R CMD INSTALL itertools_0.1-1.tar.gz
R CMD INSTALL digest_0.6.14.tar.gz
R CMD INSTALL RJSONIO_1.2-0.2.tar.gz
R CMD INSTALL functional_0.4.tar.gz
R CMD INSTALL bitops_1.0-5.tar.gz
R CMD INSTALL caTools_1.17.tar.gz
R CMD INSTALL Cairo_1.5-10.tar.gz   # first run: yum -y install cairo* libxt*

The dependency packages can be downloaded from https://cran.r-project.org/src/contrib/Archive/
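Because the order matters, a loop that stops at the first failure avoids building later packages against a missing dependency. This is a minimal sketch; install_in_order is a hypothetical helper, and the tarballs are assumed to be in the current directory.

```shell
# Hypothetical helper: install source tarballs one by one, in the order given,
# and stop at the first failure.
install_in_order() {
    for tarball in "$@"; do
        echo "installing $tarball"
        R CMD INSTALL "$tarball" || {
            echo "failed on $tarball" >&2
            return 1
        }
    done
}

# Example call with the first few packages from the list above:
# install_in_order Rcpp_0.12.17.tar.gz plyr_1.8.3.tar.gz stringi_1.2.3.tar.gz
```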

9. Install the RHadoop packages

First add the following variables to the environment:

vi /etc/profile

export HADOOP_CMD=/opt/cloudera/parcels/CDH-5.5.6-1.cdh5.5.6.p0.2/bin/hadoop
export HADOOP_STREAMING=/opt/cloudera/parcels/CDH-5.5.6-1.cdh5.5.6.p0.2/jars/hadoop-streaming-2.6.0-cdh5.5.6.jar
export JAVA_LIBRARY_PATH=/opt/cloudera/parcels/CDH-5.5.6-1.cdh5.5.6.p0.2/lib/hadoop/lib/native
source /etc/profile   # apply the saved changes
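The parcel paths are long and easy to mistype, so it may be worth checking that each one exists after sourcing the profile. A minimal sketch, with missing_paths as a hypothetical helper name:

```shell
# Hypothetical helper: print every argument that does not exist as a file or directory.
missing_paths() {
    for p in "$@"; do
        [ -e "$p" ] || echo "$p"
    done
}

# Example call after sourcing /etc/profile; no output means all three paths exist:
# missing_paths "$HADOOP_CMD" "$HADOOP_STREAMING" "$JAVA_LIBRARY_PATH"
```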
Install
R CMD INSTALL rhdfs_1.0.8.tar.gz
R CMD INSTALL rmr2_3.3.1.tar.gz    # install on every node
Error (unresolved): posts online attribute it to a build problem in rmr2_3.3.1.tar.gz
Copying libs into local build directory
find: `/usr/lib/hadoop': No such file or directory
ls: cannot access /opt/cloudera/parcels/CDH-5.5.6-1.cdh5.5.6.p0.2/lib/hadoop/hadoop-*-core.jar: No such file or directory
ls: cannot access /opt/cloudera/parcels/CDH-5.5.6-1.cdh5.5.6.p0.2/lib/hadoop/hadoop-core-*.jar: No such file or directory
Cannot find hadoop-core jar file in hadoop home
cp: cannot stat `build/dist/*': No such file or directory
can't build hbase IO classes, skipping
installing to /usr/local/lib64/R/library/rmr2/libs
** R
** byte-compile and prepare package for lazy loading
Warning in library(package, lib.loc = lib.loc, character.only = TRUE, logical.return = TRUE, :
there is no package called ‘quickcheck’
Note: no visible binding for '<<-' assignment to '.Last'
Note: no visible binding for '<<-' assignment to '.Last'
** help
*** installing help indices
** building package indices
** testing if installed package can be loaded
Warning: S3 methods ‘gorder.default’, ‘gorder.factor’, ‘gorder.data.frame’, ‘gorder.matrix’, ‘gorder.raw’ were declared in NAMESPACE but not found
* DONE (rmr2)

A fix suggested online

http://www.dataguru.cn/thread-135199-1-1.html

Then copy the libhadoop shared libraries from the native directory to /usr/lib64:
cp libhadoop.so /usr/lib64/
cp libhadoop.so.1.0.0 /usr/lib64/

Verify that rhdfs and rmr2 work

Test HDFS

library(rhdfs)
hdfs.init()
hdfs.ls("/")

rmr2 does not work properly, because the installation error was never resolved

#R
export R_HOME=/usr/local/lib64/R
export HADOOP_CMD=/opt/cloudera/parcels/CDH-5.5.6-1.cdh5.5.6.p0.2/bin/hadoop
export HADOOP_STREAMING=/opt/cloudera/parcels/CDH-5.5.6-1.cdh5.5.6.p0.2/jars/hadoop-streaming-2.6.0-cdh5.5.6.jar
export JAVA_LIBRARY_PATH=/opt/cloudera/parcels/CDH-5.5.6-1.cdh5.5.6.p0.2/lib/hadoop/lib/native
export RHIVE_DATA=/www/store/rhive/data
export HIVE_HOME=/opt/cloudera/parcels/CDH-5.5.6-1.cdh5.5.6.p0.2/lib/hive
export HADOOP_HOME=/opt/cloudera/parcels/CDH-5.5.6-1.cdh5.5.6.p0.2/lib/hadoop
export PATH=$PATH:$JAVA_HOME/bin:$ANT_HOME/bin:$R_HOME/bin

RStudio Server installation steps
yum install --nogpgcheck rstudio-server-rhel-1.1.456-x86_64.rpm
cd /usr/lib/rstudio-server/bin
./rstudio-server start
Then open ip:8787 in a browser

System settings

There are two main configuration files. Neither exists by default.
/etc/rstudio/rserver.conf
/etc/rstudio/rsession.conf

Set the port and the listening address:

vi /etc/rstudio/rserver.conf
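# listening port
www-port=8080
# address the server listens on (default 0.0.0.0, all interfaces); 127.0.0.1 limits access to the local machine
www-address=127.0.0.1
Restart the server for the change to take effect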
rstudio-server restart

Session configuration

vi /etc/rstudio/rsession.conf
# session timeout in minutes
session-timeout-minutes=30
# CRAN repository
r-cran-repos=http://ftp.ctex.org/mirrors/CRAN

System administration

rstudio-server start     # start
rstudio-server stop      # stop
rstudio-server restart   # restart

List the running R sessions

rstudio-server active-sessions
Suspend a running R session by PID
rstudio-server suspend-session <pid>
Suspend all running R sessions
rstudio-server suspend-all
Force-suspend running R sessions. This has the highest priority and takes effect immediately.
rstudio-server force-suspend-session <pid>
rstudio-server force-suspend-all
Take RStudio Server offline temporarily. Web access is blocked and users see a friendly notice.
rstudio-server offline
Bring RStudio Server back online
rstudio-server online

Only regular user accounts can log in

Create a user and set its password
useradd -d /home/r -m r
passwd r

Test

x <- c(1,2,5,7,9)
y <- c(2,4,7,8,10)
library(Cairo)
CairoPNG(file="pic_plot.png", width=640, height=480)
plot(x,y)
dev.off()  # close the device so that pic_plot.png is written to disk

RStudio Server does not pick up the environment variables, so set them yourself

Sys.setenv("HADOOP_CMD"="/opt/cloudera/parcels/CDH-5.5.6-1.cdh5.5.6.p0.2/bin/hadoop")

Read data from HDFS

library(rJava)
library(rhdfs)
hdfs.init()
hdfs.ls("/")
hdfs.cat("/user/kjxydata/src/V_MZDM/v_mzdm.txt")

rmr2 test
1. A MapReduce program in R:
library(rmr2)
small.ints = to.dfs(1:10)

mapreduce(input = small.ints, map = function(k, v) cbind(v, v^2))

This fails, possibly because rmr2 was not installed correctly

from.dfs("/tmp/RtmpWnzxl4/file5deb791fcbd5")

MapReduce can only access the HDFS file system, so the data must first be stored in HDFS with to.dfs. The result of the MapReduce job is then fetched back from HDFS with from.dfs.

2. The rmr example is wordcount, which counts the words in a file
input<- '/user/kjxydata/src/V_MZDM/v_mzdm.txt'

wordcount = function(input, output = NULL, pattern = "\t") {
  # map: split each line on the pattern and emit (word, 1)
  wc.map = function(., lines) {
    keyval(unlist(strsplit(x = lines, split = pattern)), 1)
  }
  # reduce: sum the counts for each word
  wc.reduce = function(word, counts) {
    keyval(word, sum(counts))
  }
  mapreduce(input = input, output = output, input.format = "text",
            map = wc.map, reduce = wc.reduce, combine = TRUE)
}

wordcount(input)

This fails, possibly because rmr2 was not installed correctly

from.dfs("/tmp/RtmpfZUFEa/file6cac626aa4a7")

 

Installation references

https://www.cnblogs.com/end/archive/2013/02/18/2916105.html
https://www.cnblogs.com/hunttown/p/5470652.html
https://www.cnblogs.com/hunttown/p/5470805.html
https://blog.csdn.net/youngqj/article/details/46819625

Reposted from: https://www.cnblogs.com/liquan-anran/p/9429376.html
