Salem's Euphoria

Sharing Experience



CentOS 7.2 VM hangs on startup

Host :  Windows 10
Guest : CentOS 7.2 (64-bit)
Tool : Oracle VirtualBox 5.2

Problem:

For an unknown reason, your VM won’t start and hangs during boot. Like me, you may have tried many VBoxManage commands, converting the raw image to VDI, checking file access rights, … but you’re still stuck on this screen:

CentOS hangs here

 

Solution :

When your machine shows the boot menu, press “e” to edit the GRUB configuration. Look for the “rhgb” parameter in the kernel boot command line and delete it.


This parameter is on the line starting with “linux16”.

Then add “systemd.unit=multi-user.target” at the end of the same line to boot into text mode.

Press Ctrl+x to boot the VM, and with any luck everything will work fine.
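After these edits, the “linux16” line should look roughly like the sketch below (the kernel version, root device, and remaining arguments are placeholders from a typical CentOS 7 install; yours will differ — the point is that “rhgb” is gone and the systemd target is appended):

    linux16 /vmlinuz-3.10.0-327.el7.x86_64 root=/dev/mapper/centos-root ro crashkernel=auto quiet systemd.unit=multi-user.target

Note that a change made from the GRUB menu only applies to the current boot. To make it permanent, remove “rhgb” from GRUB_CMDLINE_LINUX in /etc/default/grub and regenerate the configuration with grub2-mkconfig -o /boot/grub2/grub.cfg, which is the standard procedure on CentOS 7.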




Running sample Mahout Job on Hadoop Multinode cluster

This SlideShare introduction is quite interesting; it explains how the K-Means algorithm works.

(Credit to Subhas Kumar Ghosh)

A common problem with Hadoop is an unexplained hang when running a sample job. For instance, I have been testing Mahout (cluster-reuters) on a Hadoop multi-node cluster (1 namenode, 2 slaves). A sample trace in my case looks like this:

15/10/17 12:09:06 INFO YarnClientImpl: Submitted application application_1445072191101_0026
15/10/17 12:09:06 INFO Job: The url to track the job: http://master.phd.net:8088/proxy/application_1445072191101_0026/
15/10/17 12:09:06 INFO Job: Running job: job_1445072191101_0026
15/10/17 12:09:14 INFO Job: Job job_1445072191101_0026 running in uber mode : false
15/10/17 12:09:14 INFO Job:  map 0% reduce 0%

The jobs web console showed State=ACCEPTED, Final Status=UNDEFINED, and the tracking UI was UNASSIGNED. In other words, the ResourceManager had accepted the job but never allocated a container for its ApplicationMaster.

The first thing I suspected was a warning thrown by the hadoop binary:

WARN NativeCodeLoader: Unable to load native-hadoop library for your 
platform... using builtin-java classes where applicable

This had absolutely nothing to do with my problem: I rebuilt the native libraries from source, but the job still hung.

I reviewed the namenode logs: nothing special. Then the various YARN logs (ResourceManager, NodeManager): no problems. The slaves’ logs: same thing.

Since the Hadoop logs gave so little information, I searched the web for similar problems. It seems this is a common issue related to memory configuration; I wonder why such problems are still not logged (as of Hadoop 2.6). I also analyzed the memory consumption with JConsole, but nothing there was alarming.

All the machines are CentOS 6.5 virtual machines hosted on a 16 GB RAM, i7-G5 laptop. After connecting and configuring the three machines, I realized that the allocated disk space (15 GB for the namenode, 6 GB for the slaves) was not enough: checking the disk usage (df -h), only 3% of the disk space was still free on the two slaves. This could have been an issue, but Hadoop normally reports such errors.
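As a quick sanity check, disk usage can be inspected like this on each node (shown locally here; on a real cluster you would repeat it on every slave). One plausible link to the hang, worth verifying against the docs for your Hadoop version, is YARN’s disk health checker, which marks a NodeManager unhealthy once a local directory’s disk passes roughly 90% utilization, after which no containers are scheduled on it:

```shell
# Show disk utilization on this node; repeat on every slave.
# If usage exceeds ~90%, the NodeManager may be flagged unhealthy via
# yarn.nodemanager.disk-health-checker.max-disk-utilization-per-disk-percentage
# (default 90.0) and will receive no containers.
df -h /
```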

Looking in yarn-site.xml, I remembered that I had given YARN about 2 GB to run these test jobs.

    <property>    
        <name>yarn.nodemanager.resource.memory-mb</name>
        <value>2024</value>
    </property>
    <property>
        <name>yarn.scheduler.minimum-allocation-mb</name>
        <value>512</value>
    </property>

I tried doubling this value on the namenode:

    <property>    
        <name>yarn.nodemanager.resource.memory-mb</name>
        <value>4096</value>
    </property>
    <property>
        <name>yarn.scheduler.minimum-allocation-mb</name>
        <value>512</value>
    </property>
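While adjusting these values, it may also be worth checking yarn.scheduler.maximum-allocation-mb (a standard YARN property; the value below is only illustrative). It caps the largest single container the ResourceManager will ever grant, so a container request above this cap can never be satisfied and the job sits in ACCEPTED forever:

    <property>
        <name>yarn.scheduler.maximum-allocation-mb</name>
        <value>4096</value>
    </property>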

Then I propagated the changes to the slaves, restarted everything (stop-dfs.sh && stop-yarn.sh && start-yarn.sh && start-dfs.sh), and relaunched the job:

    ./cluster-reuters.sh

It works, finally 🙂

Now I’m trying to visualize the clustering results using Gephi.

Gephi Sample clusters graph