Tuesday, August 9, 2016

Installing xgboost on CentOS and Windows

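0. Addendum: some preparation work
After I last got xgboost installed on a CentOS 6.6 server, I installed glibc while setting up TensorFlow and modified the system's libc.so.6, which took the server down. Only afterwards did I realize that this is a core system library (ouch).
Now that I need xgboost again, it simply would not install, and gcc was the cause every time.

CentOS 6.6 is old, but all of our servers run it. To build anything that needs C++11 or later, the system gcc 4.4.7 is not enough. The xgboost project recommends gcc 4.7 or later, but in my own tests neither 4.7.2 nor 4.8.2 worked; the issue https://github.com/dmlc/xgboost/issues/1319 reports the same problem. I could not find the 4.8.3 build mentioned in that issue, so I installed gcc 4.9.2 from devtoolset-3 instead. The installation goes as follows [Reference 1]: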
wget https://copr.fedoraproject.org/coprs/rhscl/devtoolset-3/repo/epel-6/rhscl-devtoolset-3-epel-6.repo -O /etc/yum.repos.d/rhscl-devtoolset-3-epel-6.repo
$ yum --disablerepo='*' --enablerepo='rhscl-devtoolset-3' list
$ yum --disablerepo='*' --enablerepo='rhscl-devtoolset-3' install devtoolset-3-gcc devtoolset-3-gcc-c++
Then run [Reference 2]:
export CC=/opt/rh/devtoolset-3/root/usr/bin/gcc
export CPP=/opt/rh/devtoolset-3/root/usr/bin/cpp
export CXX=/opt/rh/devtoolset-3/root/usr/bin/c++
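
Before starting a build, it can help to confirm which gcc the environment actually picks up. The following is a minimal sketch, not part of the original setup: the function names are made up for illustration, and the default minimum of 4.9.0 is an assumption based only on the experience described in this post (4.8.2 failed, the devtoolset-3 compiler worked).

```python
import os
import re
import subprocess


def parse_gcc_version(version_output):
    """Extract (major, minor, patch) from the first line of `gcc --version`."""
    first_line = version_output.splitlines()[0] if version_output else ""
    match = re.search(r"(\d+)\.(\d+)\.(\d+)", first_line)
    if match is None:
        raise ValueError("no version number found in: %r" % first_line)
    return tuple(int(part) for part in match.groups())


def is_new_enough(version, minimum=(4, 9, 0)):
    """Compare version tuples; the default minimum is this post's assumption."""
    return version >= minimum


def compiler_version(compiler=None):
    """Ask the compiler named by $CC (or plain gcc) for its version."""
    compiler = compiler or os.environ.get("CC", "gcc")
    output = subprocess.check_output(
        [compiler, "--version"], universal_newlines=True
    )
    return parse_gcc_version(output)
```

Calling is_new_enough(compiler_version()) after the exports above should return True if the devtoolset compiler is the one being used.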






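With this in place, both cloning with git and running make -j4, and running pip install xgboost -i https://pypi.douban.com/simple directly, went smoothly.

But there were still plenty of hidden pitfalls. The main one: the intranet goes through a proxy and the package repository is slow, so downloads were unbearably slow and failed again and again.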


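When the command runs, it lists the packages that need to be installed, and those already downloaded are shown in bold. Each time I re-ran yum --disablerepo='*' --enablerepo='rhscl-devtoolset-3' install devtoolset-3-gcc devtoolset-3-gcc-c++
I noticed that yum resumes interrupted downloads and only starts installing once everything has been downloaded (something I learned from the attempts that failed). So there had to be a temporary directory holding these files. Where?
I found this post: http://superuser.com/questions/385712/where-does-yum-save-the-rpms-it-downloads
Following it, I found /var/cache/yum/x86_64/6/rhscl-devtoolset-3/packages. That is indeed where the files are stored, and the names of all the required dependencies are there, which made the rest easy.

Search for the packages on Google, then download them with Xunlei (a download manager):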
https://www.softwarecollections.org/repos/rhscl/devtoolset-3/epel-6-x86_64/devtoolset-3-3.1-12.el6/
https://www.softwarecollections.org/repos/rhscl/devtoolset-3/epel-6-x86_64/devtoolset-3-gcc-4.9.2-6.el6/
https://www.softwarecollections.org/repos/rhscl/devtoolset-3/epel-6-x86_64/devtoolset-3-binutils-2.24-18.el6/

Once everything is downloaded, copy the files into the directory above and run the command again:
yum --disablerepo='*' --enablerepo='rhscl-devtoolset-3' install devtoolset-3-gcc devtoolset-3-gcc-c++


yum now reports that the dependencies are already present and proceeds with the installation.
And then it succeeds.
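
The manual workaround (see which RPMs are still missing from the yum cache, download them some other way, and copy them in) can be sketched as a small script. This is only an illustration under stated assumptions: the function names are hypothetical, the check is by file name only and does not verify that a file is complete, and the directories are whatever you pass in.

```python
import os
import shutil


def missing_from_cache(cache_dir, wanted_files):
    """Return the wanted RPM file names that are not present in cache_dir."""
    present = {name for name in os.listdir(cache_dir) if name.endswith(".rpm")}
    return [name for name in wanted_files if name not in present]


def stage_rpms(download_dir, cache_dir):
    """Copy .rpm files from download_dir into cache_dir, skipping existing ones."""
    copied = []
    for name in sorted(os.listdir(download_dir)):
        source = os.path.join(download_dir, name)
        target = os.path.join(cache_dir, name)
        # Ignore anything that is not a regular .rpm file.
        if not name.endswith(".rpm") or not os.path.isfile(source):
            continue
        # Never overwrite a file yum has already placed in its cache.
        if os.path.exists(target):
            continue
        shutil.copy2(source, target)
        copied.append(name)
    return copied
```

For the case in this post, cache_dir would be /var/cache/yum/x86_64/6/rhscl-devtoolset-3/packages, and writing there needs root privileges. Re-running the yum install command afterwards picks up the staged files.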

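This attempt was full of deep pitfalls, but I worked through them one step at a time and learned quite a bit along the way. Now I can get back to training models.
Finally, here is the guide I originally followed for installing gcc 4.7.2 and 4.8.2: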
http://superuser.com/questions/381160/how-to-install-gcc-4-7-x-4-8-x-on-centos

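I also tried doing this locally with docker. It was just as slow, so I will not go into it, but here is one docker command worth noting, for copying a file from inside a container to the host:
In order to copy a file from a container to the host, you can use the command docker cp <containerId>:/file/path/within/container /host/path/target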

docker pull centos:6.6
docker commit containerid name
docker run -it images

PS: The install succeeded on the test machine. I then copied the relevant files from /anaconda2/lib/python2.7/site-packages to the production machine, and import xgboost raised an error.


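I found https://github.com/dmlc/xgboost/issues/1786, but the fix described there did not work. I uninstalled the package and ran pip install directly, which succeeded, yet the import still failed. Comparing the test and production machines, I found that test was on anaconda 4.1.1 and production on anaconda 4.3; after downgrading, it worked. fasttext had the same problem earlier, so upgrading anaconda seems to bring plenty of trouble. The cause is unknown.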
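
When a package works on one machine and not on another, comparing the two environments side by side is usually the quickest way to spot a version difference like the one above. The following is a rough sketch, assuming Python 3.8 or later (for importlib.metadata); the function names are hypothetical and the package list is whatever you choose to pass in.

```python
import platform
import sys
from importlib import metadata


def environment_fingerprint(packages):
    """Collect interpreter details and installed versions of the given packages."""
    info = {
        "python": platform.python_version(),
        "platform": platform.platform(),
        "executable": sys.executable,
    }
    for name in packages:
        try:
            info[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            info[name] = None  # not installed in this environment
    return info


def diff_fingerprints(left, right):
    """Return {key: (left_value, right_value)} for every key that differs."""
    keys = sorted(set(left) | set(right))
    return {
        key: (left.get(key), right.get(key))
        for key in keys
        if left.get(key) != right.get(key)
    }
```

Run environment_fingerprint on both machines, save the resulting dictionaries (for example as JSON), and pass them to diff_fingerprints to see only the entries that differ.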
I need xgboost, my local development machine runs Windows, and the servers run CentOS, so it has to be installed on both.

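1. Installing on CentOS

1. yum on CentOS depends on Python 2.6, so I installed Python 2.7 alongside it with yum install python27; invoke it as python2.7.

pip2.7 install xgboost installs xgboost.
However, importing it raised the following error: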
>>> import xgboost
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/python2.7/lib/python2.7/site-packages/xgboost/__init__.py", line 13, in <module>
    from .sklearn import XGBModel, XGBClassifier, XGBRegressor
  File "/usr/local/python2.7/lib/python2.7/site-packages/xgboost/sklearn.py", line 10, in <module>
    from .compat import (SKLEARN_INSTALLED, XGBModelBase,
ImportError: cannot import name LabelEncoder

Later I found a reply from Liu Zhiyuan (刘知远) in an xgboost issue saying to install sklearn: run pip2.7 install scikit-learn

Importing again then works.
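
An import error like this one points at a missing dependency rather than a broken xgboost install. A quick check, sketched here with a hypothetical helper name, is to ask the interpreter which top-level modules it can locate before importing xgboost at all. It assumes Python 3 and only handles top-level module names.

```python
import importlib.util


def missing_modules(names):
    """Return the top-level module names that the current interpreter cannot find."""
    return [name for name in names if importlib.util.find_spec(name) is None]
```

For example, missing_modules(["numpy", "scipy", "sklearn"]) returns the subset that still needs to be installed; note that the scikit-learn package is imported under the name sklearn.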

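2. Installing on Windows

I found the article http://blog.csdn.net/bon_mot/article/details/51742869#0-tsina-1-68989-397232819ff9a47a7b7e80a40613cfe1 online, but xgboost has since changed: the Visual Studio project files are gone and the git repository no longer has a windows directory.

I installed by following the instructions quoted below, using git bash throughout: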
Note that as of the most recent release the Microsoft Visual Studio instructions no longer seem to apply as this link returns a 404 error:
You can read more about the removal of the MSVC build from Tianqi Chen's comment here.
So here's what I did to finish a 64-bit build on Windows:
  1. Download and install MinGW-64: http://sourceforge.net/projects/mingw-w64/
  2. On the first screen of the install prompt make sure you set the Architecture to x86_64 and the Threads to win32
  3. I installed to C:\mingw64 (to avoid spaces in the file path) so I added this to my PATH environment variable: C:\mingw64\mingw64\bin
  4. I also noticed that the make utility that is included in bin\mingw64 is called mingw32-make so to simplify things I just renamed this to make
  5. Open a Windows command prompt and type gcc. You should see something like "fatal error: no input file"
  6. Next type make. You should see something like "No targets specified and no makefile found"
  7. Type git. If you don't have git, install it and add it to your PATH.
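
The three manual checks above (gcc, make, git) boil down to asking whether each tool is on PATH. A minimal sketch of the same check, with a hypothetical function name and the tool names taken from the steps above:

```python
import shutil


def missing_tools(tools=("gcc", "make", "git")):
    """Return the tools from the list that cannot be found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]
```

An empty list means all three tools resolve. If make is reported missing, check whether the mingw32-make rename from step 4 was done.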
These should be all the tools you need to build the xgboost project. To get the source code run these lines:
  1. cd c:\
  2. git clone --recursive https://github.com/dmlc/xgboost
  3. cd xgboost
  4. git submodule init
  5. git submodule update
  6. cp make/mingw64.mk config.mk
  7. make -j4
Note that I ran this part from a Cygwin shell. If you are using the Windows command prompt you should be able to change cp to copy and arrive at the same result. However, if the build fails on you for any reason I would recommend trying again using cygwin.
If the build finishes successfully, you should have a file called xgboost.exe located in the project root. To install the Python package, do the following:
  1. cd python-package
  2. python setup.py install
Now you should be good to go. Open up Python, and you can import the package with:
import xgboost as xgb
To test the installation, I went ahead and ran the basic_walkthrough.py file that was included in the demo/guide-python folder of the project and didn't get any errors.
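After the installation, importing still raised an error.
It said scipy was required. Running pip install scipy directly failed, and I did not have time to sort it out; ever since I switched from 32-bit to 64-bit Python, something has seemed off. The simplest approach is still to download prebuilt packages from http://www.lfd.uci.edu/~gohlke/pythonlibs/ , but I hit another error while trying that.
While looking into it, I noticed this note:
Many binaries depend on numpy-1.11+mkl and the Microsoft Visual C++ 2008 (x64, x86, and SP1 for CPython 2.6 and 2.7), Visual C++ 2010 (x64, x86, for CPython 3.3 and 3.4), or the Visual C++ 2015 (x64 and x86 for CPython 3.5) redistributable packages.
Install numpy+mkl before other packages that depend on it.
In other words, many modules depend on numpy, and the prebuilt modules on that page depend on numpy+mkl specifically. I had installed numpy directly with pip, so it did not work with them. After downloading and installing the numpy build from that page, everything worked.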

References:
1. http://www.hi-linux.com/posts/25767.html
2. https://www.zhangfangzhou.cn/centos6-devtoolset-gcc.html


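Mac OS installation: it installed fine before, but it stopped working after I moved to a new machine
Current OS: 10.13.4
Following https://machinelearningmastery.com/install-xgboost-python-macos/ normally works without problems. Official docs: https://xgboost.readthedocs.io/en/latest/build.html#python-package-installation
Problem encountered: clang: error: unsupported option '-fopenmp'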
clang: error: unsupported option '-fopenmp'
clang: error: unsupported option '-fopenmp'
clang: error: unsupported option '-fopenmp'
Switch to gcc-7 and g++-7, as the documentation describes;
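
Whether a given compiler accepts -fopenmp can be tested directly by building a tiny OpenMP program with it. The sketch below is an illustration only: the function name and the test program are made up for this example, and it simply treats any build failure (or a compiler that is not on PATH) as "not supported".

```python
import os
import shutil
import subprocess
import tempfile

# A tiny C program that only builds and links when OpenMP support is available.
TEST_PROGRAM = (
    "#include <omp.h>\n"
    "int main(void) { return omp_get_max_threads() > 0 ? 0 : 1; }\n"
)


def supports_openmp(compiler):
    """Return True if `compiler` can build a small program with -fopenmp."""
    if shutil.which(compiler) is None:
        return False
    with tempfile.TemporaryDirectory() as tmp:
        source = os.path.join(tmp, "omp_check.c")
        binary = os.path.join(tmp, "omp_check")
        with open(source, "w") as handle:
            handle.write(TEST_PROGRAM)
        result = subprocess.run(
            [compiler, "-fopenmp", source, "-o", binary],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
        return result.returncode == 0
```

Comparing supports_openmp("clang") with supports_openmp("gcc-7") shows whether switching compilers would get past the error above on a given machine.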

Download the command line tools from https://developer.apple.com/download/more/
Upgrade Xcode;
Reinstall gcc 7
sudo chown -R $(whoami):admin /usr/local
Password:
chown: /usr/local: Operation not permitted (this did not work);
Correct. /usr/local can no longer be chown'd in High Sierra. Instead use
sudo chown -R $(whoami) $(brew --prefix)/*

lrwxr-xr-x  1   admin        29 Jun 29 21:23 g++-7 -> ../Cellar/gcc/7.2.0/bin/g++-7
lrwxr-xr-x  1   admin        29 Jun 29 21:23 gcc-7 -> ../Cellar/gcc/7.2.0/bin/gcc-7

ln -s /usr/local/Cellar/gcc@7/7.3.0/bin/g++-7 /usr/local/bin/g++-7
ln -s /usr/local/Cellar/gcc@7/7.3.0/bin/gcc-7 /usr/local/bin/gcc-7
Use -sf instead of -s to force-replace an existing link

brew unlink gcc
brew link gcc
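This never got resolved, so I installed gcc-8 directly: brew upgrade gcc
Changed the build configuration file to use gcc-8
The build succeeded and the install went through normally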



All sorts of build failures
conda install -c conda-forge xgboost
In the end this was the only thing I could rely on
