Deployment machines

NameNode1, NameNode2, DataNode1, DataNode2, DataNode3

R installation directory: /usr/local/lib64/R
RStudio Server installation directory: /usr/lib/rstudio-server

R installation steps

1. Before compiling, make sure the following packages are installed (run on every machine):

yum install gcc-gfortran gcc gcc-c++ libXt-devel openssl-devel readline-devel glibc-headers

2. Install R (on every node).
Extract the source:

tar -zxvf R-3.2.0.tar.gz

Compile and install:

cd R-3.2.0
./configure --prefix=/usr/local --disable-nls --enable-R-shlib  # these two options are needed for the RHive installation; omit them if you are not installing RHive
make
make install

readline-devel and libXt-devel are required when compiling R, and --enable-R-shlib builds R's shared library, which RStudio needs.

3. Confirm the Java environment variables.
RHadoop depends on the rJava package. Before installing rJava, confirm that the Java environment variables are configured, then let R set up its connection to the JVM:

R CMD javareconf

4. Install the rJava, Rserve, and RHive packages.
R CMD INSTALL rJava_0.9-6.tar.gz
R CMD INSTALL Rserve_1.8-3.tar.gz
R CMD INSTALL RHive_2.0-0.10.tar.gz

5. Configure RHive.
Create the RHive data storage path (local, not HDFS). Here it is kept in /www/store/rhive/data:

mkdir -p /www/store/rhive/data

Create an Rserv.conf file, write "remote enable" into it, and save it to a directory of your choice. Here it is stored at /www/cloud/R/Rserv.conf:

mkdir -p /www/cloud/R
vi /www/cloud/R/Rserv.conf

Edit /etc/profile on the master and every node and add the environment variable:
export RHIVE_DATA=/www/store/rhive/data

Upload all files in the lib directory under the R installation directory to /rhive/lib on HDFS (if the directory does not exist, create it manually):

cd /usr/local/lib64/R/lib
hadoop fs -put ./* /rhive/lib

6. Start the services.
Run on the master and all nodes:

R CMD Rserve --RS-conf /www/cloud/R/Rserv.conf

Check each node:

telnet NameNode1 6311
telnet NameNode2 6311
telnet DataNode1 6311
telnet DataNode2 6311
telnet DataNode3 6311

If telnet is not available, install it with:

yum install telnet-server   # telnet server
yum install telnet.*        # telnet client

Then telnet from the master node to every slave node; a response of Rsrv0103QAP1 means the connection succeeded.
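As an alternative to telnet, the reachability of Rserve's port 6311 on every node can be checked with a short Python sketch (the host names are the ones from this cluster; adjust as needed — this is only a convenience script, not part of the installation):

```python
import socket

def port_open(host, port, timeout=3):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# Check Rserve (port 6311) on every node in the cluster.
nodes = ["NameNode1", "NameNode2", "DataNode1", "DataNode2", "DataNode3"]
for node in nodes:
    print(node, "OK" if port_open(node, 6311) else "FAILED")
```

Unlike telnet, this does not show the Rsrv0103QAP1 banner; it only confirms that the port accepts connections.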
Start the Hive remote service: RHive connects to HiveServer over Thrift, so the Thrift service must be running in the background. Start the Hive remote service from the Hive client (skip this step if it is already running):

nohup hive --service hiveserver &

7. Test RHive.
library(RHive)
rhive.init()

Initialization reported an error (unresolved):

function (hiveHome = NULL, hiveLib = NULL, hadoopHome = NULL, hadoopConf = NULL,
    hadoopLib = NULL, verbose = FALSE)
{
    tryCatch({
        .rhive.init(hiveHome = hiveHome, hiveLib = hiveLib, hadoopHome = hadoopHome,
            hadoopConf = hadoopConf, hadoopLib = hadoopLib, verbose = verbose)
    }, error = function(e) {
        .handleErr(e)
    })
}
<environment: namespace:RHive>

rhive.connect(host = "172.16.9.32")

Connecting also produced a warning (unresolved):

Warning:
+----------------------------------------------------------+
+ / hiveServer2 argument has not been provided correctly.  +
+ / RHive will use a default value: hiveServer2=TRUE.      +
+----------------------------------------------------------+

Reading data nevertheless succeeded:

d <- rhive.query('select * from src.v_mzdm limit 1000')

In RStudio Server the environment variables must be set explicitly:
Sys.setenv("HIVE_HOME"="/opt/cloudera/parcels/CDH-5.5.6-1.cdh5.5.6.p0.2/lib/hive")
Sys.setenv("HADOOP_HOME"="/opt/cloudera/parcels/CDH-5.5.6-1.cdh5.5.6.p0.2/lib/hadoop")

8. Install and configure RHadoop.

Install the dependency packages in the order below; they depend on one another:

R CMD INSTALL Rcpp_0.12.17.tar.gz
R CMD INSTALL plyr_1.8.3.tar.gz
R CMD INSTALL stringi_1.2.3.tar.gz
R CMD INSTALL glue_1.2.0.tar.gz
R CMD INSTALL magrittr_1.5.tar.gz
R CMD INSTALL stringr_1.3.0.tar.gz
R CMD INSTALL reshape2_1.4.2.tar.gz
R CMD INSTALL iterators_1.0.9.tar.gz
R CMD INSTALL itertools_0.1-1.tar.gz
R CMD INSTALL digest_0.6.14.tar.gz
R CMD INSTALL RJSONIO_1.2-0.2.tar.gz
R CMD INSTALL functional_0.4.tar.gz
R CMD INSTALL bitops_1.0-5.tar.gz
R CMD INSTALL caTools_1.17.tar.gz
R CMD INSTALL Cairo_1.5-10.tar.gz   # first run: yum -y install cairo* libxt*
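Why the order matters can be sanity-checked with a small Python sketch: each package must come after everything it depends on. The dependency graph below is approximate (reconstructed from CRAN metadata, not stated in the original notes) and only covers the packages with dependencies inside this list:

```python
# Approximate (assumed) dependency graph for the packages above:
# each key must be installed after all packages in its value list.
deps = {
    "plyr": ["Rcpp"],
    "stringr": ["stringi", "glue", "magrittr"],
    "reshape2": ["plyr", "stringr", "Rcpp"],
    "itertools": ["iterators"],
    "caTools": ["bitops"],
}

# The install order used in the notes.
order = ["Rcpp", "plyr", "stringi", "glue", "magrittr", "stringr",
         "reshape2", "iterators", "itertools", "digest", "RJSONIO",
         "functional", "bitops", "caTools", "Cairo"]

def order_ok(order, deps):
    """Return True if every package appears after all of its dependencies."""
    pos = {p: i for i, p in enumerate(order)}
    return all(pos[d] < pos[p] for p, ds in deps.items() for d in ds)

print(order_ok(order, deps))  # True
```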
The dependency packages can be downloaded from https://cran.r-project.org/src/contrib/Archive/
9. Install the RHadoop packages.
First add the following variables to the environment:
vi /etc/profile
export HADOOP_CMD=/opt/cloudera/parcels/CDH-5.5.6-1.cdh5.5.6.p0.2/bin/hadoop
export HADOOP_STREAMING=/opt/cloudera/parcels/CDH-5.5.6-1.cdh5.5.6.p0.2/jars/hadoop-streaming-2.6.0-cdh5.5.6.jar
export JAVA_LIBRARY_PATH=/opt/cloudera/parcels/CDH-5.5.6-1.cdh5.5.6.p0.2/lib/hadoop/lib/native

source /etc/profile   # apply the changes

Install the packages:

R CMD INSTALL rhdfs_1.0.8.tar.gz
R CMD INSTALL rmr2_3.3.1.tar.gz   # install on every node

The rmr2 install produced errors (reportedly a build problem in rmr2_3.3.1.tar.gz; unresolved):

Copying libs into local build directory
find: `/usr/lib/hadoop': No such file or directory
ls: cannot access /opt/cloudera/parcels/CDH-5.5.6-1.cdh5.5.6.p0.2/lib/hadoop/hadoop-*-core.jar: No such file or directory
ls: cannot access /opt/cloudera/parcels/CDH-5.5.6-1.cdh5.5.6.p0.2/lib/hadoop/hadoop-core-*.jar: No such file or directory
Cannot find hadoop-core jar file in hadoop home
cp: cannot stat `build/dist/*': No such file or directory
can't build hbase IO classes, skipping
installing to /usr/local/lib64/R/library/rmr2/libs
** R
** byte-compile and prepare package for lazy loading
Warning in library(package, lib.loc = lib.loc, character.only = TRUE, logical.return = TRUE, :
  there is no package called ‘quickcheck’
Note: no visible binding for '<<-' assignment to '.Last'
Note: no visible binding for '<<-' assignment to '.Last'
** help
*** installing help indices
** building package indices
** testing if installed package can be loaded
Warning: S3 methods ‘gorder.default’, ‘gorder.factor’, ‘gorder.data.frame’, ‘gorder.matrix’, ‘gorder.raw’ were declared in NAMESPACE but not found
* DONE (rmr2)

A workaround found online:
http://www.dataguru.cn/thread-135199-1-1.html

Copy libhadoop.so and libhadoop.so.1.0.0 from the native directory to /usr/lib64:

cp libhadoop.so /usr/lib64/
cp libhadoop.so.1.0.0 /usr/lib64/

Then verify that rhdfs and rmr2 work.
Test hdfs:
library(rhdfs)
hdfs.init()
hdfs.ls("/")

rmr2 still has problems; the errors during installation were never resolved.
The full set of /etc/profile entries:

# R
export R_HOME=/usr/local/lib64/R
export HADOOP_CMD=/opt/cloudera/parcels/CDH-5.5.6-1.cdh5.5.6.p0.2/bin/hadoop
export HADOOP_STREAMING=/opt/cloudera/parcels/CDH-5.5.6-1.cdh5.5.6.p0.2/jars/hadoop-streaming-2.6.0-cdh5.5.6.jar
export JAVA_LIBRARY_PATH=/opt/cloudera/parcels/CDH-5.5.6-1.cdh5.5.6.p0.2/lib/hadoop/lib/native
export RHIVE_DATA=/www/store/rhive/data
export HIVE_HOME=/opt/cloudera/parcels/CDH-5.5.6-1.cdh5.5.6.p0.2/lib/hive
export HADOOP_HOME=/opt/cloudera/parcels/CDH-5.5.6-1.cdh5.5.6.p0.2/lib/hadoop
export PATH=$PATH:$JAVA_HOME/bin:$ANT_HOME/bin:$R_HOME/bin

RStudio Server installation steps

yum install --nogpgcheck rstudio-server-rhel-1.1.456-x86_64.rpm
cd /usr/lib/rstudio-server/bin
./rstudio-server start

Then visit ip:8787 in a browser.

System settings
There are two main configuration files; neither exists by default:

/etc/rstudio/rserver.conf
/etc/rstudio/rsession.conf

Set the port and IP access control:
vi /etc/rstudio/rserver.conf

www-port=8080          # listening port
www-address=127.0.0.0  # IP addresses allowed to connect; the default is 0.0.0.0

Restart the server for the changes to take effect:

rstudio-server restart

Session configuration
vi /etc/rstudio/rsession.conf

session-timeout-minutes=30                     # session timeout
r-cran-repos=http://ftp.ctex.org/mirrors/CRAN  # CRAN repository

System management
rstudio-server start     # start
rstudio-server stop      # stop
rstudio-server restart   # restart

List running R sessions:

rstudio-server active-sessions

Suspend a running R session by PID:

rstudio-server suspend-session <pid>

Suspend all running R sessions:

rstudio-server suspend-all

Force-suspend running R sessions (highest priority, takes effect immediately):

rstudio-server force-suspend-session <pid>
rstudio-server force-suspend-all

Take RStudio Server temporarily offline, disabling web access with a friendly notice to users:

rstudio-server offline

Bring RStudio Server back online:

rstudio-server online

Only non-root users can log in.
Create a user and password:

useradd -d /home/r -m r
passwd r

Test
x <- c(1,2,5,7,9)
y <- c(2,4,7,8,10)
library(Cairo)
CairoPNG(file="pic_plot.png", width=640, height=480)
plot(x,y)

RStudio Server does not pick up the environment variables, so they must be set manually:
Sys.setenv("HADOOP_CMD"="/opt/cloudera/parcels/CDH-5.5.6-1.cdh5.5.6.p0.2/bin/hadoop")

Read data from HDFS:
library(rJava)
library(rhdfs)
hdfs.init()
hdfs.ls("/")
hdfs.cat("/user/kjxydata/src/V_MZDM/v_mzdm.txt")

rmr2 tests

1. A MapReduce program in R:

small.ints = to.dfs(1:10)
mapreduce(input = small.ints, map = function(k, v) cbind(v, v^2))
This errored out, possibly because rmr2 was not installed correctly:

from.dfs("/tmp/RtmpWnzxl4/file5deb791fcbd5")
Because MapReduce can only access the HDFS file system, the data must first be stored into HDFS with to.dfs; the result of the MapReduce computation is then retrieved from HDFS with from.dfs.
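The to.dfs / mapreduce / from.dfs round trip can be made concrete with a local Python simulation of the squares job above. This is only an illustration of the data flow, not rmr2's API; the in-memory dict stands in for HDFS:

```python
# A local stand-in for the rmr2 squares job: "store" the input,
# apply the map step to every value, and "fetch" the result back.
store = {}  # plays the role of HDFS

def to_dfs(path, values):
    store[path] = list(values)

def mapreduce(input, map):
    # Map-only job: apply the map function to each (key, value) pair.
    return [map(None, v) for v in store[input]]

def from_dfs(result):
    return result

to_dfs("small.ints", range(1, 11))
result = from_dfs(mapreduce(input="small.ints", map=lambda k, v: (v, v * v)))
print(result[:3])  # [(1, 1), (2, 4), (3, 9)]
```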
2. The classic rmr example is wordcount, which counts the words in a file:

input <- '/user/kjxydata/src/V_MZDM/v_mzdm.txt'

wordcount = function(input, output = NULL, pattern = "\t") {
    wc.map = function(., lines) {
        keyval(unlist(strsplit(x = lines, split = pattern)), 1)
    }
    wc.reduce = function(word, counts) {
        keyval(word, sum(counts))
    }
    mapreduce(input = input, output = output, input.format = "text",
              map = wc.map, reduce = wc.reduce, combine = T)
}

wordcount(input)
This also errored out, possibly because rmr2 was not installed correctly:

from.dfs("/tmp/RtmpfZUFEa/file6cac626aa4a7")
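The map/reduce semantics of the wordcount above can be checked locally with a small Python equivalent (a sketch of the logic only, not the rmr2 API — wc_map and wc_reduce mirror wc.map and wc.reduce):

```python
from collections import defaultdict

def wc_map(line, pattern="\t"):
    # Mirror wc.map: split each line on the pattern, emit (word, 1) pairs.
    return [(word, 1) for word in line.split(pattern)]

def wc_reduce(pairs):
    # Mirror wc.reduce: sum the counts for each word.
    counts = defaultdict(int)
    for word, n in pairs:
        counts[word] += n
    return dict(counts)

lines = ["a\tb\ta", "b\tc"]
pairs = [kv for line in lines for kv in wc_map(line)]
print(wc_reduce(pairs))  # {'a': 2, 'b': 2, 'c': 1}
```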
Installation references

https://www.cnblogs.com/end/archive/2013/02/18/2916105.html
https://www.cnblogs.com/hunttown/p/5470652.html
https://www.cnblogs.com/hunttown/p/5470805.html
https://blog.csdn.net/youngqj/article/details/46819625