Tag: Cloudera

CentOS 6 – Cloudera

EnglishLinuxTutorials

So much work and so few time.. I won’t have the time to explain, so it’s just a post for keeping a trace of my install scripts for Cloudera on CentOS 6

mkdir /opt/quidquid
mkdir /opt/quidquid/PROGS
yum install -y nmap wget apr apr-devel apr-util apr-util-devel libxml pcre pcre-devel gcc openssl-devel
cd ~
wget -c http://apache.crihan.fr/dist/httpd/httpd-2.2.27.tar.gz
tar zxf httpd-2.2.27.tar.gz
cd httpd-2.2.27
./configure --prefix=/opt/apache-2.2.27 --enable-so --enable-ssl --enable-ssl=shared --enable-rewrite --enable-rewrite=shared --with-z=/usr
make
make install
ln -s /opt/apache-2.2.27/ /opt/apache
cd ~
rm -Rf ~/httpd-2.*
ll
vi /opt/apache/conf/httpd.conf
groupadd www
useradd -g www www
cat > /etc/init.d/httpd << "EOF"
. /etc/rc.d/init.d/functions

RETVAL=$?
APACHEHOME="/opt/apache"

case "$1" in

  start)
    echo -n "Starting httpd: "
    daemon $APACHEHOME/bin/httpd
    echo
    touch /var/lock/subsys/httpd
    ;;

  stop)
    echo -n "Shutting down http: "
    killproc httpd
    echo
    rm -f /var/lock/subsys/httpd
    rm -f /var/run/httpd.pid
    ;;

  status)
    status httpd
    ;;

  restart)
    $0 stop
    $0 start
    ;;

  reload)
    echo -n "Reloading httpd: "
    killproc httpd -HUP
    echo
    ;;

  *)
    echo "Usage: $0 {start|stop|restart|reload|status}"
    exit 1
    esac

exit 0
EOF

chmod 700 /etc/init.d/httpd
/etc/init.d/httpd start
/etc/init.d/httpd stop
/sbin/chkconfig --level 3 httpd on
/sbin/chkconfig --level 06 httpd off

mkdir -p /home/www/html
mkdir -p /home/www/cgi-bin
mkdir -p /home/www/html/CLOUDSME
cat > /home/www/html/robots.txt << "EOF"
User-agent: *
Disallow: /
EOF

cat > /home/www/html/index.html << "EOF"
Bonjour !
EOF

chgrp -R www /home/www
chmod -R 775 /home/www
cp /opt/apache/conf/httpd.conf /opt/apache/conf/httpd.old
vi /opt/apache/conf/httpd.conf

Modifier les lignes suivantes pour correspondre à nos besoins :
- User www
- Group www
- ServerName 192.168.2.42:80
- Listen 80
- DocumentRoot “/home/www/html”
- /home/www/html”>
- ScriptAlias /cgi-bin/ “/home/www/cgi-bin/”
- /home/www/cgi-bin”>

/etc/init.d/httpd start

iptables -P INPUT ACCEPT
iptables -F
iptables -A INPUT -i lo -j ACCEPT
iptables -A INPUT -m state --state ESTABLISHED,RELATED -j ACCEPT
iptables -P INPUT DROP
iptables -P FORWARD DROP
iptables -P OUTPUT ACCEPT
iptables -A INPUT -i eth0 -p icmp -j ACCEPT
iptables -A INPUT -p tcp --dport 22 -j ACCEPT
iptables -A INPUT -p tcp --dport 80 -j ACCEPT
iptables -A INPUT -p tcp --dport 443 -j ACCEPT
iptables -L
/sbin/service iptables save

——————————
—- CLOUDERA INSTALLATION
http://www.cloudera.com/content/cloudera-content/cloudera-docs/CDH4/4.2.0/CDH4-Quick-Start/cdh4qs_topic_3_3.html

cd ~/
wget -c http://archive.cloudera.com/cdh4/one-click-install/redhat/6/x86_64/cloudera-cdh-4-0.x86_64.rpm
yum --nogpgcheck localinstall cloudera-cdh-4-0.x86_64.rpm
yum install hadoop-conf-pseudo

Installer Java :

cd /opt/
wget -c http://blog.quidquid.fr/jdk/jdk-7u51-linux-x64.tar.gz
tar zxf /opt/jdk-7u51-linux-x64.tar.gz
chown -R root:root /opt/jdk1.7.0_51
ln -s /opt/jdk1.7.0_51 /opt/jdk
[bash]

Ajout de java_home dans les variables d’environnement.
[bash]
cat >> ~/.bashrc << "EOF"
 
# -------------------------
export JAVA_HOME=/opt/jdk
PATH=$PATH:$JAVA_HOME/bin
 
EOF

cat >> /etc/bashrc << "EOF"
 
# -------------------------
export JAVA_HOME=/opt/jdk
PATH=$PATH:$JAVA_HOME/bin
 
EOF
&#91;/bash&#93;

Ajouter VMCLOUDERA à la fin de chaque lignes
&#91;bash&#93;vi /etc/hosts&#91;/bash&#93;


&#91;bash&#93;
sudoedit /etc/sudoers and add :
hdfs	ALL=(ALL)	ALL
&#91;/bash&#93;

Se connecter en tant que hdfs
&#91;bash&#93;
su - hdfs
hdfs namenode -format
exit

vi /etc/hadoop/conf.pseudo/hadoop-env.sh
export JAVA_HOME=/opt/jdk
&#91;/bash&#93;

Tout démarrer :
&#91;bash&#93;
for x in <code>cd /etc/init.d ; ls hadoop-hdfs-*</code> ; do sudo service $x start ; done

sudo service hadoop-hdfs-namenode start
sudo service hadoop-hdfs-secondarynamenode start
sudo service hadoop-hdfs-datanode start
[bash]

3. Optional: Start services on boot
[bash]
sudo chkconfig hadoop-hdfs-namenode on
sudo chkconfig hadoop-hdfs-secondarynamenode on
sudo chkconfig hadoop-hdfs-datanode on

Step 3: Create the /tmp Directory

Remove the old /tmp if it exists:

sudo -u hdfs hadoop fs -rm -r /tmp

Create a new /tmp directory and set permissions:

sudo -u hdfs hadoop fs -mkdir /tmp sudo -u hdfs hadoop fs -chmod -R 1777 /tmp

Step 4: Create Staging and Log Directories

Create the staging directory and set permissions:

sudo -u hdfs hadoop fs -mkdir /tmp/hadoop-yarn/staging
sudo -u hdfs hadoop fs -chmod -R 1777 /tmp/hadoop-yarn/staging

Create the done_intermediate directory under the staging directory and set permissions:

sudo -u hdfs hadoop fs -mkdir /tmp/hadoop-yarn/staging/history/done_intermediate
sudo -u hdfs hadoop fs -chmod -R 1777 /tmp/hadoop-yarn/staging/history/done_intermediate

Change ownership on the staging directory and subdirectory:

sudo -u hdfs hadoop fs -chown -R mapred:mapred /tmp/hadoop-yarn/staging

Create the /var/log/hadoop-yarn directory and set ownership:

sudo -u hdfs hadoop fs -mkdir /var/log/hadoop-yarn
sudo -u hdfs hadoop fs -chown yarn:mapred /var/log/hadoop-yarn

Step 5: Verify the HDFS File Structure:

Run the following command:

$ sudo -u hdfs hadoop fs -ls -R /

You should see the following directory structure:

drwxrwxrwt – hdfs supergroup 0 2014-04-25 11:29 /tmp
drwxr-xr-x – hdfs supergroup 0 2014-04-25 11:29 /tmp/hadoop-yarn
drwxrwxrwt – mapred mapred 0 2014-04-25 11:30 /tmp/hadoop-yarn/staging
drwxr-xr-x – mapred mapred 0 2014-04-25 11:30 /tmp/hadoop-yarn/staging/history
drwxrwxrwt – mapred mapred 0 2014-04-25 11:30 /tmp/hadoop-yarn/staging/history/done_intermediate
drwxr-xr-x – hdfs supergroup 0 2014-04-25 11:33 /var
drwxr-xr-x – hdfs supergroup 0 2014-04-25 10:53 /var/lib
drwxr-xr-x – hdfs supergroup 0 2014-04-25 11:33 /var/log
drwxr-xr-x – yarn mapred 0 2014-04-25 11:33 /var/log/hadoop-yarn

Step 6: Start YARN

sudo service hadoop-yarn-resourcemanager start
sudo service hadoop-yarn-nodemanager start
sudo service hadoop-mapreduce-historyserver start

sudo chkconfig hadoop-yarn-resourcemanager on
sudo chkconfig hadoop-yarn-nodemanager on
sudo chkconfig hadoop-mapreduce-historyserver on

sudo -u hdfs hadoop fs -mkdir /user
sudo -u hdfs hadoop fs -mkdir /user/clouduser
sudo -u hdfs hadoop fs -chown clouduser /user/clouduser

Testing everything is ok

useradd -g users clouduser
passwd clouduser
!clouduser!CLOUDERA!
su – clouduser
hadoop fs -mkdir input
hadoop fs -put /etc/hadoop/conf/*.xml input

hadoop fs -ls input
Found 3 items:
-rw-r–r– 1 clouduser users 1348 2014-04-25 11:42 input/core-site.xml
-rw-r–r– 1 clouduser users 1913 2014-04-25 11:42 input/hdfs-site.xml
-rw-r–r– 1 clouduser users 1001 2014-04-25 11:42 input/mapred-site.xml

Set HADOOP_MAPRED_HOME for user joe:

export HADOOP_MAPRED_HOME=/usr/lib/hadoop-mapreduce

Run an example Hadoop job to grep with a regular expression in your input data.

hadoop jar /usr/lib/hadoop-mapreduce/hadoop-mapreduce-examples.jar grep input output23 ‘dfs[a-z.]+’

After the job completes, you can find the output in the HDFS directory named output23 because you specified that output directory to Hadoop.

$ hadoop fs -ls Found 2 items
drwxr-xr-x – clouduser users 0 2014-04-25 11:45 /user/clouduser/input
drwxr-xr-x – clouduser users 0 2014-04-25 11:45 /user/clouduser/output23

You can see that there is a new directory called output23.
List the output files.

$ hadoop fs -ls output23 Found 2 items
drwxr-xr-x – clouduser users 0 2014-04-25 11:45 /user/joe/output23/_SUCCESS
-rw-r–r– 1 clouduser users 1068 2014-04-25 11:45 /user/joe/output23/part-r-00000

Read the results in the output file.

hadoop fs -cat output23/part-r-00000 | head
1 dfs.safemode.min.datanodes
1 dfs.safemode.extension
1 dfs.replication
1 dfs.permissions.enabled
1 dfs.namenode.name.dir
1 dfs.namenode.checkpoint.dir
1 dfs.datanode.data.dir

iptables -A INPUT -p tcp –dport 631 -j ACCEPT
iptables -A INPUT -p tcp –dport 8031 -j ACCEPT
iptables -A INPUT -p tcp –dport 8042 -j ACCEPT
iptables -A INPUT -p tcp –dport 8080 -j ACCEPT
iptables -A INPUT -p tcp –dport 8088 -j ACCEPT
/sbin/service iptables save