CentOS 6 – Cloudera
So much work and so little time. I don't have time to explain anything, so this post just keeps a record of my install scripts for Cloudera on CentOS 6.
mkdir /opt/quidquid
mkdir /opt/quidquid/PROGS
yum install -y nmap wget apr apr-devel apr-util apr-util-devel libxml pcre pcre-devel gcc openssl-devel
cd ~
wget -c http://apache.crihan.fr/dist/httpd/httpd-2.2.27.tar.gz
tar zxf httpd-2.2.27.tar.gz
cd httpd-2.2.27
./configure --prefix=/opt/apache-2.2.27 --enable-so --enable-ssl --enable-ssl=shared --enable-rewrite --enable-rewrite=shared --with-z=/usr
make
make install
ln -s /opt/apache-2.2.27/ /opt/apache
cd ~
rm -Rf ~/httpd-2.*
ll
vi /opt/apache/conf/httpd.conf
groupadd www
useradd -g www www
cat > /etc/init.d/httpd << "EOF"
#!/bin/sh
# Header needed by chkconfig; the start/stop priorities 85/15 are an assumption.
# chkconfig: - 85 15
# description: Apache HTTP Server built from source in /opt/apache
. /etc/rc.d/init.d/functions
RETVAL=$?
APACHEHOME="/opt/apache"
case "$1" in
start)
    echo -n "Starting httpd: "
    daemon $APACHEHOME/bin/httpd
    echo
    touch /var/lock/subsys/httpd
    ;;
stop)
    echo -n "Shutting down http: "
    killproc httpd
    echo
    rm -f /var/lock/subsys/httpd
    rm -f /var/run/httpd.pid
    ;;
status)
    status httpd
    ;;
restart)
    $0 stop
    $0 start
    ;;
reload)
    echo -n "Reloading httpd: "
    killproc httpd -HUP
    echo
    ;;
*)
    echo "Usage: $0 {start|stop|restart|reload|status}"
    exit 1
esac
exit 0
EOF
chmod 700 /etc/init.d/httpd
/etc/init.d/httpd start
/etc/init.d/httpd stop
/sbin/chkconfig --level 3 httpd on
/sbin/chkconfig --level 06 httpd off
mkdir -p /home/www/html
mkdir -p /home/www/cgi-bin
mkdir -p /home/www/html/CLOUDSME
cat > /home/www/html/robots.txt << "EOF"
User-agent: *
Disallow: /
EOF
cat > /home/www/html/index.html << "EOF"
Bonjour !
EOF
chgrp -R www /home/www
chmod -R 775 /home/www
cp /opt/apache/conf/httpd.conf /opt/apache/conf/httpd.old
vi /opt/apache/conf/httpd.conf
Modify the following lines to match our needs:
- User www
- Group www
- ServerName 192.168.2.42:80
- Listen 80
- DocumentRoot "/home/www/html"
- <Directory "/home/www/html">
- ScriptAlias /cgi-bin/ "/home/www/cgi-bin/"
- <Directory "/home/www/cgi-bin">
/etc/init.d/httpd start
iptables -P INPUT ACCEPT
iptables -F
iptables -A INPUT -i lo -j ACCEPT
iptables -A INPUT -m state --state ESTABLISHED,RELATED -j ACCEPT
iptables -P INPUT DROP
iptables -P FORWARD DROP
iptables -P OUTPUT ACCEPT
iptables -A INPUT -i eth0 -p icmp -j ACCEPT
iptables -A INPUT -p tcp --dport 22 -j ACCEPT
iptables -A INPUT -p tcp --dport 80 -j ACCEPT
iptables -A INPUT -p tcp --dport 443 -j ACCEPT
iptables -L
/sbin/service iptables save
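The httpd.conf edits listed above are made by hand in vi, so a quick check before starting Apache may catch a forgotten line. This is only a minimal sketch and not part of the original install notes: the function name `check_httpd_conf` is hypothetical, and it assumes each directive sits alone on its line with no leading whitespace.

```shell
#!/bin/sh
# Hypothetical helper: report which expected directives are missing
# from an httpd.conf file. Returns 0 when all of them are present.
check_httpd_conf() {
    conf="$1"
    missing=0
    for directive in \
        'User www' \
        'Group www' \
        'Listen 80' \
        'DocumentRoot "/home/www/html"' \
        'ScriptAlias /cgi-bin/ "/home/www/cgi-bin/"'
    do
        # -F: fixed string, -x: the whole line must match, -q: quiet
        if ! grep -Fxq "$directive" "$conf"; then
            echo "missing: $directive"
            missing=1
        fi
    done
    return "$missing"
}
```

It could be used as `check_httpd_conf /opt/apache/conf/httpd.conf && /etc/init.d/httpd start`. The ServerName line is left out on purpose because its address is specific to each machine.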
CLOUDERA INSTALLATION
http://www.cloudera.com/content/cloudera-content/cloudera-docs/CDH4/4.2.0/CDH4-Quick-Start/cdh4qs_topic_3_3.html
cd ~/
wget -c http://archive.cloudera.com/cdh4/one-click-install/redhat/6/x86_64/cloudera-cdh-4-0.x86_64.rpm
yum --nogpgcheck localinstall cloudera-cdh-4-0.x86_64.rpm
yum install hadoop-conf-pseudo
Install Java:
cd /opt/
wget -c https://blog.quidquid.fr/jdk/jdk-7u51-linux-x64.tar.gz
tar zxf /opt/jdk-7u51-linux-x64.tar.gz
chown -R root:root /opt/jdk1.7.0_51
ln -s /opt/jdk1.7.0_51 /opt/jdk
Add JAVA_HOME to the environment variables:
cat >> ~/.bashrc << "EOF"
# -------------------------
export JAVA_HOME=/opt/jdk
PATH=$PATH:$JAVA_HOME/bin
EOF
cat >> /etc/bashrc << "EOF"
# -------------------------
export JAVA_HOME=/opt/jdk
PATH=$PATH:$JAVA_HOME/bin
EOF
Append VMCLOUDERA to the end of each line:
vi /etc/hosts
Edit the sudoers file and add the line shown below:
sudoedit /etc/sudoers
hdfs ALL=(ALL) ALL
Log in as hdfs and format the NameNode:
su - hdfs
hdfs namenode -format
exit
Edit hadoop-env.sh and add the JAVA_HOME line:
vi /etc/hadoop/conf.pseudo/hadoop-env.sh
export JAVA_HOME=/opt/jdk
Start everything:
for x in $(cd /etc/init.d ; ls hadoop-hdfs-*) ; do sudo service $x start ; done
The same services can also be started one by one:
sudo service hadoop-hdfs-namenode start
sudo service hadoop-hdfs-secondarynamenode start
sudo service hadoop-hdfs-datanode start
3. Optional: Start services on boot
sudo chkconfig hadoop-hdfs-namenode on
sudo chkconfig hadoop-hdfs-secondarynamenode on
sudo chkconfig hadoop-hdfs-datanode on
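The manual /etc/hosts edit earlier in this section (adding VMCLOUDERA to every line) could also be scripted. The following is a rough sketch under my own assumptions: the helper name `add_host_alias` is hypothetical, and it skips blank lines, comment lines, and entries whose last field is already the alias.

```shell
#!/bin/sh
# Hypothetical helper: read hosts-file lines on stdin and append an alias
# to every entry line. Blank lines, comments, and lines that already end
# with the alias are printed unchanged.
add_host_alias() {
    awk -v name="$1" '
        /^[ \t]*(#|$)/ { print; next }    # comment or blank line
        $NF == name    { print; next }    # alias is already the last field
                       { print $0 " " name }
    '
}
```

A cautious way to use it is to write the result to a separate file first, for example `add_host_alias VMCLOUDERA < /etc/hosts > /tmp/hosts.new`, and to review that file before copying it over /etc/hosts.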
Step 3: Create the /tmp Directory
Remove the old /tmp if it exists:
sudo -u hdfs hadoop fs -rm -r /tmp
Create a new /tmp directory and set permissions:
sudo -u hdfs hadoop fs -mkdir /tmp
sudo -u hdfs hadoop fs -chmod -R 1777 /tmp
Step 4: Create Staging and Log Directories
Create the staging directory and set permissions:
sudo -u hdfs hadoop fs -mkdir /tmp/hadoop-yarn/staging
sudo -u hdfs hadoop fs -chmod -R 1777 /tmp/hadoop-yarn/staging
Create the done_intermediate directory under the staging directory and set permissions:
sudo -u hdfs hadoop fs -mkdir /tmp/hadoop-yarn/staging/history/done_intermediate
sudo -u hdfs hadoop fs -chmod -R 1777 /tmp/hadoop-yarn/staging/history/done_intermediate
Change ownership on the staging directory and subdirectory:
sudo -u hdfs hadoop fs -chown -R mapred:mapred /tmp/hadoop-yarn/staging
Create the /var/log/hadoop-yarn directory and set ownership:
sudo -u hdfs hadoop fs -mkdir /var/log/hadoop-yarn
sudo -u hdfs hadoop fs -chown yarn:mapred /var/log/hadoop-yarn
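The directory steps above repeat the same mkdir, chmod, and chown pattern, so a small wrapper with a dry-run mode might make them easier to review before touching HDFS. This is a sketch only: the names `run`, `hdfs_dir`, and `DRY_RUN` are hypothetical, and the wrapper uses a plain `-chown`, so it does not replace the recursive `-chown -R` on the staging directory.

```shell
#!/bin/sh
# Hypothetical wrapper: with DRY_RUN=1 a command is printed instead of run.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Create an HDFS directory, then optionally set its mode and its owner.
# An empty mode or owner argument means "leave the default".
hdfs_dir() {
    path="$1"; mode="$2"; owner="$3"
    run sudo -u hdfs hadoop fs -mkdir "$path" || return 1
    if [ -n "$mode" ]; then
        run sudo -u hdfs hadoop fs -chmod -R "$mode" "$path" || return 1
    fi
    if [ -n "$owner" ]; then
        run sudo -u hdfs hadoop fs -chown "$owner" "$path" || return 1
    fi
}
```

With `DRY_RUN=1` set, calling `hdfs_dir /var/log/hadoop-yarn "" yarn:mapred` prints the two commands from the last step instead of executing them.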
Step 5: Verify the HDFS File Structure:
Run the following command:
$ sudo -u hdfs hadoop fs -ls -R /
You should see the following directory structure:
drwxrwxrwt - hdfs supergroup 0 2014-04-25 11:29 /tmp
drwxr-xr-x - hdfs supergroup 0 2014-04-25 11:29 /tmp/hadoop-yarn
drwxrwxrwt - mapred mapred 0 2014-04-25 11:30 /tmp/hadoop-yarn/staging
drwxr-xr-x - mapred mapred 0 2014-04-25 11:30 /tmp/hadoop-yarn/staging/history
drwxrwxrwt - mapred mapred 0 2014-04-25 11:30 /tmp/hadoop-yarn/staging/history/done_intermediate
drwxr-xr-x - hdfs supergroup 0 2014-04-25 11:33 /var
drwxr-xr-x - hdfs supergroup 0 2014-04-25 10:53 /var/lib
drwxr-xr-x - hdfs supergroup 0 2014-04-25 11:33 /var/log
drwxr-xr-x - yarn mapred 0 2014-04-25 11:33 /var/log/hadoop-yarn
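Comparing the listing by eye is error-prone, so the check could be partly automated. The sketch below is an assumption-laden illustration: `check_hdfs_entry` is a hypothetical name, and it assumes the eight-column layout shown above (permissions, replicas, owner, group, size, date, time, path).

```shell
#!/bin/sh
# Hypothetical checker: read `hadoop fs -ls -R /` output on stdin and verify
# that one path has the expected permission string and owner.
# Exit status 0 means the path was found and both values match.
check_hdfs_entry() {
    awk -v perms="$1" -v owner="$2" -v path="$3" '
        $8 == path {
            found = 1
            if ($1 != perms || $3 != owner) bad = 1
        }
        END {
            if (found && !bad) exit 0
            exit 1
        }
    '
}
```

For example, `sudo -u hdfs hadoop fs -ls -R / | check_hdfs_entry drwxrwxrwt hdfs /tmp` should succeed when /tmp was created as described.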
Step 6: Start YARN
sudo service hadoop-yarn-resourcemanager start
sudo service hadoop-yarn-nodemanager start
sudo service hadoop-mapreduce-historyserver start
sudo chkconfig hadoop-yarn-resourcemanager on
sudo chkconfig hadoop-yarn-nodemanager on
sudo chkconfig hadoop-mapreduce-historyserver on
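The six commands above apply the same two actions to three services, which could be folded into one loop. This is a sketch, not the documented procedure: `start_and_enable` and the `RUNNER` variable are hypothetical names, and setting `RUNNER=echo` only prints the commands.

```shell
#!/bin/sh
# Sketch: start each service given as an argument and enable it at boot.
# Set RUNNER=echo to print the commands instead of executing them.
start_and_enable() {
    for svc in "$@"; do
        ${RUNNER:-} sudo service "$svc" start || return 1
        ${RUNNER:-} sudo chkconfig "$svc" on || return 1
    done
}
```

It would be called as `start_and_enable hadoop-yarn-resourcemanager hadoop-yarn-nodemanager hadoop-mapreduce-historyserver`. Unlike the original sequence, it enables each service right after starting it and stops at the first failure.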
sudo -u hdfs hadoop fs -mkdir /user
sudo -u hdfs hadoop fs -mkdir /user/clouduser
sudo -u hdfs hadoop fs -chown clouduser /user/clouduser
Test that everything works:
useradd -g users clouduser
passwd clouduser
!clouduser!CLOUDERA!
su - clouduser
hadoop fs -mkdir input
hadoop fs -put /etc/hadoop/conf/*.xml input
hadoop fs -ls input
Found 3 items:
-rw-r--r-- 1 clouduser users 1348 2014-04-25 11:42 input/core-site.xml
-rw-r--r-- 1 clouduser users 1913 2014-04-25 11:42 input/hdfs-site.xml
-rw-r--r-- 1 clouduser users 1001 2014-04-25 11:42 input/mapred-site.xml
Set HADOOP_MAPRED_HOME for user clouduser:
export HADOOP_MAPRED_HOME=/usr/lib/hadoop-mapreduce
Run an example Hadoop job to grep with a regular expression in your input data.
hadoop jar /usr/lib/hadoop-mapreduce/hadoop-mapreduce-examples.jar grep input output23 'dfs[a-z.]+'
After the job completes, you can find the output in the HDFS directory named output23 because you specified that output directory to Hadoop.
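To get a feel for what the example job computes, the same regular expression can be applied locally with standard tools. The pipeline below is only an approximation I am assuming for illustration (the helper name `count_matches` is hypothetical): it extracts every match, counts identical matches, and sorts by count, and the order of entries with equal counts may differ from the Hadoop output.

```shell
#!/bin/sh
# Local approximation of the example job: extract every match of the
# regular expression from stdin, count identical matches, sort by count.
count_matches() {
    # grep -o prints each match on its own line; awk normalizes the
    # padding that uniq -c puts in front of the counts.
    grep -oE "$1" | sort | uniq -c | sort -rn | awk '{ print $1, $2 }'
}
```

Running `cat /etc/hadoop/conf/*.xml | count_matches 'dfs[a-z.]+'` on the machine gives a list that can be compared with the content of the job's part-r-00000 file.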
$ hadoop fs -ls
Found 2 items
drwxr-xr-x - clouduser users 0 2014-04-25 11:45 /user/clouduser/input
drwxr-xr-x - clouduser users 0 2014-04-25 11:45 /user/clouduser/output23
You can see that there is a new directory called output23.
List the output files.
$ hadoop fs -ls output23
Found 2 items
drwxr-xr-x - clouduser users 0 2014-04-25 11:45 /user/clouduser/output23/_SUCCESS
-rw-r--r-- 1 clouduser users 1068 2014-04-25 11:45 /user/clouduser/output23/part-r-00000
Read the results in the output file.
hadoop fs -cat output23/part-r-00000 | head
1 dfs.safemode.min.datanodes
1 dfs.safemode.extension
1 dfs.replication
1 dfs.permissions.enabled
1 dfs.namenode.name.dir
1 dfs.namenode.checkpoint.dir
1 dfs.datanode.data.dir
iptables -A INPUT -p tcp --dport 631 -j ACCEPT
iptables -A INPUT -p tcp --dport 8031 -j ACCEPT
iptables -A INPUT -p tcp --dport 8042 -j ACCEPT
iptables -A INPUT -p tcp --dport 8080 -j ACCEPT
iptables -A INPUT -p tcp --dport 8088 -j ACCEPT
/sbin/service iptables save
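The firewall rules above differ only in the port number, so they could be generated from a list. This is a small sketch with a hypothetical function name, `print_accept_rules`. It only prints the rules, so nothing is changed until the output is reviewed and run as root.

```shell
#!/bin/sh
# Sketch: build one ACCEPT rule per TCP port given as an argument.
# The rules are only printed, not applied.
print_accept_rules() {
    for port in "$@"; do
        echo "iptables -A INPUT -p tcp --dport $port -j ACCEPT"
    done
}

# Ports taken from the rules in this post.
print_accept_rules 631 8031 8042 8080 8088
```

Once the printed rules look right, they can be piped to `sh` as root and saved with `/sbin/service iptables save` as before.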