CentOS 6 – Cloudera
So much work and so little time... I won't have time to explain everything, so this is just a post to keep a trace of my install scripts for Cloudera on CentOS 6.
mkdir /opt/quidquid
mkdir /opt/quidquid/PROGS
yum install -y nmap wget apr apr-devel apr-util apr-util-devel libxml pcre pcre-devel gcc openssl-devel
cd ~
wget -c http://apache.crihan.fr/dist/httpd/httpd-2.2.27.tar.gz
tar zxf httpd-2.2.27.tar.gz
cd httpd-2.2.27
./configure --prefix=/opt/apache-2.2.27 --enable-so --enable-ssl --enable-ssl=shared --enable-rewrite --enable-rewrite=shared --with-z=/usr
make
make install
ln -s /opt/apache-2.2.27/ /opt/apache
cd ~
rm -Rf ~/httpd-2.*
ll
vi /opt/apache/conf/httpd.conf
groupadd www
useradd -g www www
cat > /etc/init.d/httpd << "EOF"
. /etc/rc.d/init.d/functions
RETVAL=$?
APACHEHOME="/opt/apache"
case "$1" in
  start)
        echo -n "Starting httpd: "
        daemon $APACHEHOME/bin/httpd
        echo
        touch /var/lock/subsys/httpd
        ;;
  stop)
        echo -n "Shutting down httpd: "
        killproc httpd
        echo
        rm -f /var/lock/subsys/httpd
        rm -f /var/run/httpd.pid
        ;;
  status)
        status httpd
        ;;
  restart)
        $0 stop
        $0 start
        ;;
  reload)
        echo -n "Reloading httpd: "
        killproc httpd -HUP
        echo
        ;;
  *)
        echo "Usage: $0 {start|stop|restart|reload|status}"
        exit 1
esac
exit 0
EOF
chmod 700 /etc/init.d/httpd
/etc/init.d/httpd start
/etc/init.d/httpd stop
/sbin/chkconfig --level 3 httpd on
/sbin/chkconfig --level 06 httpd off
mkdir -p /home/www/html
mkdir -p /home/www/cgi-bin
mkdir -p /home/www/html/CLOUDSME
cat > /home/www/html/robots.txt << "EOF"
User-agent: *
Disallow: /
EOF
cat > /home/www/html/index.html << "EOF"
Bonjour !
EOF
chgrp -R www /home/www
chmod -R 775 /home/www
cp /opt/apache/conf/httpd.conf /opt/apache/conf/httpd.old
vi /opt/apache/conf/httpd.conf

Edit the following lines to match our setup (a sketch of the resulting configuration follows this block):
- User www
- Group www
- ServerName 192.168.2.42:80
- Listen 80
- DocumentRoot "/home/www/html"
- <Directory "/home/www/html">
- ScriptAlias /cgi-bin/ "/home/www/cgi-bin/"
- <Directory "/home/www/cgi-bin">

/etc/init.d/httpd start
iptables -P INPUT ACCEPT
iptables -F
iptables -A INPUT -i lo -j ACCEPT
iptables -A INPUT -m state --state ESTABLISHED,RELATED -j ACCEPT
iptables -P INPUT DROP
iptables -P FORWARD DROP
iptables -P OUTPUT ACCEPT
iptables -A INPUT -i eth0 -p icmp -j ACCEPT
iptables -A INPUT -p tcp --dport 22 -j ACCEPT
iptables -A INPUT -p tcp --dport 80 -j ACCEPT
iptables -A INPUT -p tcp --dport 443 -j ACCEPT
iptables -L
/sbin/service iptables save
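For reference, here is a rough sketch of what the edited stanzas in /opt/apache/conf/httpd.conf end up looking like. The IP address, user/group, and paths are the ones used above; the Options/Order directives are just the stock Apache 2.2 defaults, so adapt them to your needs:

Listen 80
User www
Group www
ServerName 192.168.2.42:80
DocumentRoot "/home/www/html"
<Directory "/home/www/html">
    Options Indexes FollowSymLinks
    AllowOverride None
    Order allow,deny
    Allow from all
</Directory>
ScriptAlias /cgi-bin/ "/home/www/cgi-bin/"
<Directory "/home/www/cgi-bin">
    AllowOverride None
    Options None
    Order allow,deny
    Allow from all
</Directory>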
------------------------------
CLOUDERA INSTALLATION
http://www.cloudera.com/content/cloudera-content/cloudera-docs/CDH4/4.2.0/CDH4-Quick-Start/cdh4qs_topic_3_3.html
cd ~/
wget -c http://archive.cloudera.com/cdh4/one-click-install/redhat/6/x86_64/cloudera-cdh-4-0.x86_64.rpm
yum --nogpgcheck localinstall cloudera-cdh-4-0.x86_64.rpm
yum install hadoop-conf-pseudo
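Before going further, it is worth checking that the CDH packages actually landed; these are plain rpm/yum queries, nothing CDH-specific:

# The pseudo-distributed configuration package installed above should show up here
rpm -q hadoop-conf-pseudo
# List everything the CDH repository pulled in
yum list installed | grep -i hadoop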
Install Java:
cd /opt/
wget -c https://blog.quidquid.fr/jdk/jdk-7u51-linux-x64.tar.gz
tar zxf /opt/jdk-7u51-linux-x64.tar.gz
chown -R root:root /opt/jdk1.7.0_51
ln -s /opt/jdk1.7.0_51 /opt/jdk

Add JAVA_HOME to the environment variables:

cat >> ~/.bashrc << "EOF"
# -------------------------
export JAVA_HOME=/opt/jdk
PATH=$PATH:$JAVA_HOME/bin
EOF
cat >> /etc/bashrc << "EOF"
# -------------------------
export JAVA_HOME=/opt/jdk
PATH=$PATH:$JAVA_HOME/bin
EOF

Append VMCLOUDERA at the end of each line (an example is shown after this block):

vi /etc/hosts

Edit the sudoers file:

sudoedit /etc/sudoers

and add:

hdfs ALL=(ALL) ALL

Log in as hdfs and format the NameNode:

su - hdfs
hdfs namenode -format
exit

Set JAVA_HOME in the Hadoop environment file as well:

vi /etc/hadoop/conf.pseudo/hadoop-env.sh
export JAVA_HOME=/opt/jdk

Start everything:

for x in `cd /etc/init.d ; ls hadoop-hdfs-*` ; do sudo service $x start ; done
sudo service hadoop-hdfs-namenode start
sudo service hadoop-hdfs-secondarynamenode start
sudo service hadoop-hdfs-datanode start

Optional: start the services on boot:

sudo chkconfig hadoop-hdfs-namenode on
sudo chkconfig hadoop-hdfs-secondarynamenode on
sudo chkconfig hadoop-hdfs-datanode on
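As an illustration of the /etc/hosts edit mentioned above: assuming the VM's hostname is VMCLOUDERA (the name used in this post), the stock CentOS 6 entries simply get the hostname appended, roughly like this:

127.0.0.1   localhost localhost.localdomain localhost4 localhost4.localdomain4 VMCLOUDERA
::1         localhost localhost.localdomain localhost6 localhost6.localdomain6 VMCLOUDERA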
Step 3: Create the /tmp Directory
Remove the old /tmp if it exists:
sudo -u hdfs hadoop fs -rm -r /tmp
Create a new /tmp directory and set permissions:
sudo -u hdfs hadoop fs -mkdir /tmp
sudo -u hdfs hadoop fs -chmod -R 1777 /tmp
Step 4: Create Staging and Log Directories
Create the staging directory and set permissions:
sudo -u hdfs hadoop fs -mkdir /tmp/hadoop-yarn/staging
sudo -u hdfs hadoop fs -chmod -R 1777 /tmp/hadoop-yarn/staging
Create the done_intermediate directory under the staging directory and set permissions:
sudo -u hdfs hadoop fs -mkdir /tmp/hadoop-yarn/staging/history/done_intermediate
sudo -u hdfs hadoop fs -chmod -R 1777 /tmp/hadoop-yarn/staging/history/done_intermediate
Change ownership on the staging directory and subdirectory:
sudo -u hdfs hadoop fs -chown -R mapred:mapred /tmp/hadoop-yarn/staging
Create the /var/log/hadoop-yarn directory and set ownership:
sudo -u hdfs hadoop fs -mkdir /var/log/hadoop-yarn
sudo -u hdfs hadoop fs -chown yarn:mapred /var/log/hadoop-yarn
Step 5: Verify the HDFS File Structure:
Run the following command:
$ sudo -u hdfs hadoop fs -ls -R /
You should see the following directory structure:
drwxrwxrwt   - hdfs   supergroup          0 2014-04-25 11:29 /tmp
drwxr-xr-x   - hdfs   supergroup          0 2014-04-25 11:29 /tmp/hadoop-yarn
drwxrwxrwt   - mapred mapred              0 2014-04-25 11:30 /tmp/hadoop-yarn/staging
drwxr-xr-x   - mapred mapred              0 2014-04-25 11:30 /tmp/hadoop-yarn/staging/history
drwxrwxrwt   - mapred mapred              0 2014-04-25 11:30 /tmp/hadoop-yarn/staging/history/done_intermediate
drwxr-xr-x   - hdfs   supergroup          0 2014-04-25 11:33 /var
drwxr-xr-x   - hdfs   supergroup          0 2014-04-25 10:53 /var/lib
drwxr-xr-x   - hdfs   supergroup          0 2014-04-25 11:33 /var/log
drwxr-xr-x   - yarn   mapred              0 2014-04-25 11:33 /var/log/hadoop-yarn
Step 6: Start YARN
sudo service hadoop-yarn-resourcemanager start
sudo service hadoop-yarn-nodemanager start
sudo service hadoop-mapreduce-historyserver start
sudo chkconfig hadoop-yarn-resourcemanager on
sudo chkconfig hadoop-yarn-nodemanager on
sudo chkconfig hadoop-mapreduce-historyserver on
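Before creating the user directories, a quick way to confirm the three daemons really came up (status is a standard action of these init scripts):

sudo service hadoop-yarn-resourcemanager status
sudo service hadoop-yarn-nodemanager status
sudo service hadoop-mapreduce-historyserver status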
sudo -u hdfs hadoop fs -mkdir /user
sudo -u hdfs hadoop fs -mkdir /user/clouduser
sudo -u hdfs hadoop fs -chown clouduser /user/clouduser
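To double-check the ownership, the same fs -ls used in Step 5 works here; /user/clouduser should now be listed with clouduser as its owner:

sudo -u hdfs hadoop fs -ls /user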
Testing that everything is OK
useradd -g users clouduser
passwd clouduser
!clouduser!CLOUDERA!
su - clouduser
hadoop fs -mkdir input
hadoop fs -put /etc/hadoop/conf/*.xml input
hadoop fs -ls input
Found 3 items
-rw-r--r--   1 clouduser users       1348 2014-04-25 11:42 input/core-site.xml
-rw-r--r--   1 clouduser users       1913 2014-04-25 11:42 input/hdfs-site.xml
-rw-r--r--   1 clouduser users       1001 2014-04-25 11:42 input/mapred-site.xml
Set HADOOP_MAPRED_HOME for user clouduser:
export HADOOP_MAPRED_HOME=/usr/lib/hadoop-mapreduce
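That export only lives in the current shell; to make it permanent for clouduser, it can be appended to ~/.bashrc the same way JAVA_HOME was handled earlier:

cat >> ~/.bashrc << "EOF"
export HADOOP_MAPRED_HOME=/usr/lib/hadoop-mapreduce
EOF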
Run an example Hadoop job to grep with a regular expression in your input data.
hadoop jar /usr/lib/hadoop-mapreduce/hadoop-mapreduce-examples.jar grep input output23 'dfs[a-z.]+'
After the job completes, you can find the output in the HDFS directory named output23 because you specified that output directory to Hadoop.
$ hadoop fs -ls
Found 2 items
drwxr-xr-x   - clouduser users          0 2014-04-25 11:45 /user/clouduser/input
drwxr-xr-x   - clouduser users          0 2014-04-25 11:45 /user/clouduser/output23
You can see that there is a new directory called output23.
List the output files.
$ hadoop fs -ls output23
Found 2 items
-rw-r--r--   1 clouduser users          0 2014-04-25 11:45 /user/clouduser/output23/_SUCCESS
-rw-r--r--   1 clouduser users       1068 2014-04-25 11:45 /user/clouduser/output23/part-r-00000
Read the results in the output file.
hadoop fs -cat output23/part-r-00000 | head
1 dfs.safemode.min.datanodes
1 dfs.safemode.extension
1 dfs.replication
1 dfs.permissions.enabled
1 dfs.namenode.name.dir
1 dfs.namenode.checkpoint.dir
1 dfs.datanode.data.dir
iptables -A INPUT -p tcp --dport 631 -j ACCEPT
iptables -A INPUT -p tcp --dport 8031 -j ACCEPT
iptables -A INPUT -p tcp --dport 8042 -j ACCEPT
iptables -A INPUT -p tcp --dport 8080 -j ACCEPT
iptables -A INPUT -p tcp --dport 8088 -j ACCEPT
/sbin/service iptables save
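With ports 8088 and 8042 open, the ResourceManager and NodeManager web UIs should now be reachable from a browser. A quick headless check from the server itself (assuming curl is installed):

# ResourceManager web UI
curl -I http://localhost:8088
# NodeManager web UI
curl -I http://localhost:8042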