So much work and so little time. I won't have time to explain, so this post is just a record of my install scripts for Cloudera on CentOS 6.
mkdir /opt/quidquid
mkdir /opt/quidquid/PROGS
yum install -y nmap wget apr apr-devel apr-util apr-util-devel libxml pcre pcre-devel gcc openssl-devel
cd ~
wget -c http://apache.crihan.fr/dist/httpd/httpd-2.2.27.tar.gz
tar zxf httpd-2.2.27.tar.gz
cd httpd-2.2.27
./configure --prefix=/opt/apache-2.2.27 --enable-so --enable-ssl --enable-ssl=shared --enable-rewrite --enable-rewrite=shared --with-z=/usr
make
make install
ln -s /opt/apache-2.2.27/ /opt/apache
cd ~
rm -Rf ~/httpd-2.*
ll
vi /opt/apache/conf/httpd.conf
groupadd www
useradd -g www www
cat > /etc/init.d/httpd << "EOF"
#!/bin/sh
# chkconfig: 3 85 15
# description: Apache HTTP Server installed under /opt/apache
. /etc/rc.d/init.d/functions
RETVAL=$?
APACHEHOME="/opt/apache"
case "$1" in
start)
echo -n "Starting httpd: "
daemon $APACHEHOME/bin/httpd
echo
touch /var/lock/subsys/httpd
;;
stop)
echo -n "Shutting down http: "
killproc httpd
echo
rm -f /var/lock/subsys/httpd
rm -f /var/run/httpd.pid
;;
status)
status httpd
;;
restart)
$0 stop
$0 start
;;
reload)
echo -n "Reloading httpd: "
killproc httpd -HUP
echo
;;
*)
echo "Usage: $0 {start|stop|restart|reload|status}"
exit 1
esac
exit 0
EOF
chmod 700 /etc/init.d/httpd
/etc/init.d/httpd start
/etc/init.d/httpd stop
/sbin/chkconfig --level 3 httpd on
/sbin/chkconfig --level 06 httpd off
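On CentOS 6, chkconfig reads its run levels and priorities from a `# chkconfig:` comment header (or an LSB `### BEGIN INIT INFO` block) in the init script. The sketch below checks a script for one before registering it. The function name `has_chkconfig_header` is hypothetical, not part of any tool.

```shell
# Hypothetical helper: succeed if the init script ($1) carries a header
# that chkconfig can read (a "# chkconfig:" line or an LSB INIT INFO block).
has_chkconfig_header() {
    script="$1"
    grep -Eq '^#[[:space:]]*chkconfig:' "$script" ||
        grep -q '^### BEGIN INIT INFO' "$script"
}
```

Possible usage: `has_chkconfig_header /etc/init.d/httpd && /sbin/chkconfig --level 3 httpd on`.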
mkdir -p /home/www/html
mkdir -p /home/www/cgi-bin
mkdir -p /home/www/html/CLOUDSME
cat > /home/www/html/robots.txt << "EOF"
User-agent: *
Disallow: /
EOF
cat > /home/www/html/index.html << "EOF"
Bonjour !
EOF
chgrp -R www /home/www
chmod -R 775 /home/www
cp /opt/apache/conf/httpd.conf /opt/apache/conf/httpd.old
vi /opt/apache/conf/httpd.conf
Edit the following lines to match our needs:
- User www
- Group www
- ServerName 192.168.2.42:80
- Listen 80
- DocumentRoot "/home/www/html"
- <Directory "/home/www/html">
- ScriptAlias /cgi-bin/ "/home/www/cgi-bin/"
- <Directory "/home/www/cgi-bin">
/etc/init.d/httpd start
iptables -P INPUT ACCEPT
iptables -F
iptables -A INPUT -i lo -j ACCEPT
iptables -A INPUT -m state --state ESTABLISHED,RELATED -j ACCEPT
iptables -P INPUT DROP
iptables -P FORWARD DROP
iptables -P OUTPUT ACCEPT
iptables -A INPUT -i eth0 -p icmp -j ACCEPT
iptables -A INPUT -p tcp --dport 22 -j ACCEPT
iptables -A INPUT -p tcp --dport 80 -j ACCEPT
iptables -A INPUT -p tcp --dport 443 -j ACCEPT
iptables -L
/sbin/service iptables save
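The per-port ACCEPT rules all have the same shape, so they can be generated from a list of ports. The sketch below only prints the commands so they can be reviewed first; the function name `print_accept_rules` is hypothetical.

```shell
# Hypothetical helper: print (not run) one ACCEPT rule per TCP port given.
print_accept_rules() {
    for port in "$@"; do
        echo "iptables -A INPUT -p tcp --dport $port -j ACCEPT"
    done
}
```

Possible usage: `print_accept_rules 22 80 443` to review the rules, then pipe the output to `sh` as root to apply them.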
----------
---- CLOUDERA INSTALLATION
http://www.cloudera.com/content/cloudera-content/cloudera-docs/CDH4/4.2.0/CDH4-Quick-Start/cdh4qs_topic_3_3.html
cd ~/
wget -c http://archive.cloudera.com/cdh4/one-click-install/redhat/6/x86_64/cloudera-cdh-4-0.x86_64.rpm
yum --nogpgcheck localinstall cloudera-cdh-4-0.x86_64.rpm
yum install hadoop-conf-pseudo
Install Java:
[bash]
cd /opt/
wget -c https://blog.quidquid.fr/jdk/jdk-7u51-linux-x64.tar.gz
tar zxf /opt/jdk-7u51-linux-x64.tar.gz
chown -R root:root /opt/jdk1.7.0_51
ln -s /opt/jdk1.7.0_51 /opt/jdk
[/bash]
Add JAVA_HOME to the environment variables.
[bash]
cat >> ~/.bashrc << "EOF"
# -------------------------
export JAVA_HOME=/opt/jdk
PATH=$PATH:$JAVA_HOME/bin
EOF
cat >> /etc/bashrc << "EOF"
# -------------------------
export JAVA_HOME=/opt/jdk
PATH=$PATH:$JAVA_HOME/bin
EOF
[/bash]
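Because `cat >>` appends every time it runs, rerunning the script duplicates the JAVA_HOME lines. The sketch below appends them only when the file does not already export JAVA_HOME. The function name `add_java_home` is hypothetical.

```shell
# Hypothetical helper: append the JAVA_HOME lines to a profile file ($1)
# only if that file does not already export JAVA_HOME.
add_java_home() {
    profile="$1"
    if ! grep -q '^export JAVA_HOME=' "$profile" 2>/dev/null; then
        cat >> "$profile" << "EOF"
# -------------------------
export JAVA_HOME=/opt/jdk
PATH=$PATH:$JAVA_HOME/bin
EOF
    fi
}
```

Possible usage: `add_java_home ~/.bashrc` and `add_java_home /etc/bashrc`.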
Append VMCLOUDERA to the end of each line:
[bash]vi /etc/hosts[/bash]
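The same edit can be previewed without opening an editor. This sketch prints the hosts file with the alias appended to each entry, leaving comments, blank lines and entries that already have the alias untouched. The function name `append_host_alias` is hypothetical, and skipping comment lines is an assumption about what "each line" should cover.

```shell
# Hypothetical helper: print the hosts file ($1) with the alias ($2)
# appended to every non-empty, non-comment line that lacks it.
append_host_alias() {
    awk -v alias="$2" '
        /^[ \t]*#/ || /^[ \t]*$/ { print; next }   # keep comments and blank lines
        {
            # Leave the line alone if the alias is already one of its names.
            for (i = 2; i <= NF; i++) if ($i == alias) { print; next }
            print $0 " " alias
        }
    ' "$1"
}
```

Possible usage: `append_host_alias /etc/hosts VMCLOUDERA` to review the result before writing it back to /etc/hosts.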
Edit /etc/sudoers (sudoedit /etc/sudoers) and add:
[bash]
hdfs ALL=(ALL) ALL
[/bash]
Log in as hdfs and format the NameNode, then set JAVA_HOME in hadoop-env.sh:
[bash]
su - hdfs
hdfs namenode -format
exit
vi /etc/hadoop/conf.pseudo/hadoop-env.sh
# add this line to hadoop-env.sh:
export JAVA_HOME=/opt/jdk
[/bash]
Start everything:
[bash]
for x in `cd /etc/init.d ; ls hadoop-hdfs-*` ; do sudo service $x start ; done
sudo service hadoop-hdfs-namenode start
sudo service hadoop-hdfs-secondarynamenode start
sudo service hadoop-hdfs-datanode start
[/bash]
Optional: Start services on boot
[bash]
sudo chkconfig hadoop-hdfs-namenode on
sudo chkconfig hadoop-hdfs-secondarynamenode on
sudo chkconfig hadoop-hdfs-datanode on
[/bash]
Step 3: Create the /tmp Directory
Remove the old /tmp if it exists:
sudo -u hdfs hadoop fs -rm -r /tmp
Create a new /tmp directory and set permissions:
sudo -u hdfs hadoop fs -mkdir /tmp
sudo -u hdfs hadoop fs -chmod -R 1777 /tmp
Step 4: Create Staging and Log Directories
Create the staging directory and set permissions:
sudo -u hdfs hadoop fs -mkdir /tmp/hadoop-yarn/staging
sudo -u hdfs hadoop fs -chmod -R 1777 /tmp/hadoop-yarn/staging
Create the done_intermediate directory under the staging directory and set permissions:
sudo -u hdfs hadoop fs -mkdir /tmp/hadoop-yarn/staging/history/done_intermediate
sudo -u hdfs hadoop fs -chmod -R 1777 /tmp/hadoop-yarn/staging/history/done_intermediate
Change ownership on the staging directory and subdirectory:
sudo -u hdfs hadoop fs -chown -R mapred:mapred /tmp/hadoop-yarn/staging
Create the /var/log/hadoop-yarn directory and set ownership:
sudo -u hdfs hadoop fs -mkdir /var/log/hadoop-yarn
sudo -u hdfs hadoop fs -chown yarn:mapred /var/log/hadoop-yarn
Step 5: Verify the HDFS File Structure:
Run the following command:
$ sudo -u hdfs hadoop fs -ls -R /
You should see the following directory structure:
drwxrwxrwt - hdfs supergroup 0 2014-04-25 11:29 /tmp
drwxr-xr-x - hdfs supergroup 0 2014-04-25 11:29 /tmp/hadoop-yarn
drwxrwxrwt - mapred mapred 0 2014-04-25 11:30 /tmp/hadoop-yarn/staging
drwxr-xr-x - mapred mapred 0 2014-04-25 11:30 /tmp/hadoop-yarn/staging/history
drwxrwxrwt - mapred mapred 0 2014-04-25 11:30 /tmp/hadoop-yarn/staging/history/done_intermediate
drwxr-xr-x - hdfs supergroup 0 2014-04-25 11:33 /var
drwxr-xr-x - hdfs supergroup 0 2014-04-25 10:53 /var/lib
drwxr-xr-x - hdfs supergroup 0 2014-04-25 11:33 /var/log
drwxr-xr-x - yarn mapred 0 2014-04-25 11:33 /var/log/hadoop-yarn
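Rather than comparing the listing by eye, individual entries can be checked with a small filter. The sketch below assumes the column layout shown above: mode first, owner third, path last. The function name `check_hdfs_entry` is hypothetical.

```shell
# Hypothetical helper: read "hadoop fs -ls -R" output on stdin and succeed
# only if the given path ($1) appears with the expected mode ($2) and owner ($3).
check_hdfs_entry() {
    awk -v path="$1" -v mode="$2" -v owner="$3" '
        $NF == path { found = ($1 == mode && $3 == owner) }
        END { exit(found ? 0 : 1) }
    '
}
```

Possible usage: `sudo -u hdfs hadoop fs -ls -R / | check_hdfs_entry /tmp drwxrwxrwt hdfs && echo "/tmp ok"`.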
Step 6: Start YARN
sudo service hadoop-yarn-resourcemanager start
sudo service hadoop-yarn-nodemanager start
sudo service hadoop-mapreduce-historyserver start
sudo chkconfig hadoop-yarn-resourcemanager on
sudo chkconfig hadoop-yarn-nodemanager on
sudo chkconfig hadoop-mapreduce-historyserver on
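The start and chkconfig pairs above repeat for every service, so they can be folded into one loop. The sketch below prints the commands when `DRY_RUN=1` is set instead of running them. Both the function name `start_and_enable` and the `DRY_RUN` variable are hypothetical.

```shell
# Hypothetical helper: start each named service and enable it at boot.
# With DRY_RUN=1 the commands are printed instead of executed.
start_and_enable() {
    for svc in "$@"; do
        for cmd in "service $svc start" "chkconfig $svc on"; do
            if [ "${DRY_RUN:-0}" = "1" ]; then
                echo "sudo $cmd"
            else
                # $cmd is left unquoted on purpose so it splits into words.
                sudo $cmd || return 1
            fi
        done
    done
}
```

Possible usage: `start_and_enable hadoop-yarn-resourcemanager hadoop-yarn-nodemanager hadoop-mapreduce-historyserver`.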
sudo -u hdfs hadoop fs -mkdir /user
sudo -u hdfs hadoop fs -mkdir /user/clouduser
sudo -u hdfs hadoop fs -chown clouduser /user/clouduser
Test that everything works:
useradd -g users clouduser
passwd clouduser
!clouduser!CLOUDERA!
su - clouduser
hadoop fs -mkdir input
hadoop fs -put /etc/hadoop/conf/*.xml input
hadoop fs -ls input
Found 3 items:
-rw-r--r-- 1 clouduser users 1348 2014-04-25 11:42 input/core-site.xml
-rw-r--r-- 1 clouduser users 1913 2014-04-25 11:42 input/hdfs-site.xml
-rw-r--r-- 1 clouduser users 1001 2014-04-25 11:42 input/mapred-site.xml
Set HADOOP_MAPRED_HOME for user clouduser:
export HADOOP_MAPRED_HOME=/usr/lib/hadoop-mapreduce
Run an example Hadoop job to grep with a regular expression in your input data.
hadoop jar /usr/lib/hadoop-mapreduce/hadoop-mapreduce-examples.jar grep input output23 'dfs[a-z.]+'
After the job completes, you can find the output in the HDFS directory named output23 because you specified that output directory to Hadoop.
$ hadoop fs -ls
Found 2 items
drwxr-xr-x - clouduser users 0 2014-04-25 11:45 /user/clouduser/input
drwxr-xr-x - clouduser users 0 2014-04-25 11:45 /user/clouduser/output23
You can see that there is a new directory called output23.
List the output files.
$ hadoop fs -ls output23
Found 2 items
drwxr-xr-x - clouduser users 0 2014-04-25 11:45 /user/clouduser/output23/_SUCCESS
-rw-r--r-- 1 clouduser users 1068 2014-04-25 11:45 /user/clouduser/output23/part-r-00000
Read the results in the output file.
hadoop fs -cat output23/part-r-00000 | head
1 dfs.safemode.min.datanodes
1 dfs.safemode.extension
1 dfs.replication
1 dfs.permissions.enabled
1 dfs.namenode.name.dir
1 dfs.namenode.checkpoint.dir
1 dfs.datanode.data.dir
iptables -A INPUT -p tcp --dport 631 -j ACCEPT
iptables -A INPUT -p tcp --dport 8031 -j ACCEPT
iptables -A INPUT -p tcp --dport 8042 -j ACCEPT
iptables -A INPUT -p tcp --dport 8080 -j ACCEPT
iptables -A INPUT -p tcp --dport 8088 -j ACCEPT
/sbin/service iptables save