Discussion:
[Wien] Fail to parallel calculation of lapw1 and lapw2 (testpara1 and testpara2)
Woohyeon Baek
2018-10-28 10:04:00 UTC
_______________________________________________
Wien mailing list
***@zeus.theochem.tuwien.ac.at
http://zeus.theochem.tuwien.ac.at/mailman/listinfo/wien
SEARCH the MAILING-LIST at: http://www.mail-archive.com/***@zeus.theochem.tuwien.ac.at/index.html
Gavin Abo
2018-10-28 13:14:47 UTC
What does "ls -al ~/.ssh/config" give you?

That error is reproducible with Ubuntu 18.04.1 LTS:

***@computername:~$ cat ~/.ssh/config
Host *

     HostName 127.0.0.1

     User username

     ForwardX11Trusted yes

     GatewayPorts yes

     GSSAPIAuthentication yes
***@computername:~$ chmod 666 ~/.ssh/config
***@computername:~$ ls -al ~/.ssh/config
-rw-rw-rw- 1 username username 131 Oct 28 06:54 /home/username/.ssh/config
***@computername:~$ ssh localhost
Bad owner or permissions on /home/username/.ssh/config

Setting proper file permissions with chmod (and ownership with chown)
does indeed seem to fix the problem [
https://serverfault.com/questions/253313/ssh-returns-bad-owner-or-permissions-on-ssh-config
]:

***@computername:~$ chmod 644 ~/.ssh/config
***@computername:~$ ls -al ~/.ssh/config
-rw-r--r-- 1 username username 131 Oct 28 06:54 /home/username/.ssh/config
***@computername:~$ ssh localhost
...

Last login: Sun Oct 28 06:54:48 2018 from 127.0.0.1
***@computername:~$

Also, you might have to change "User localhost" to "User username", and
HostName may need to be changed from 0.0.0.0 to the loopback address
127.0.0.1 [ https://en.wikipedia.org/wiki/Localhost ] in your config file,
where username has to be replaced by your actual user name.
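
The permission fix above can be wrapped in a small helper. This is only a sketch: the function name tighten_ssh_perms and its optional directory argument are made up for illustration, and it uses mode 600 for config, which is stricter than the 644 shown above (both are accepted by ssh, since neither is writable by group or others).

```shell
#!/bin/sh
# Sketch: tighten permissions inside an SSH directory so that ssh stops
# rejecting its files with "Bad owner or permissions".
# tighten_ssh_perms is a hypothetical helper name; the directory argument
# defaults to ~/.ssh.
tighten_ssh_perms() {
    dir="${1:-$HOME/.ssh}"
    if [ ! -d "$dir" ]; then
        echo "no such directory: $dir" >&2
        return 1
    fi
    # The directory itself: accessible by the owner only.
    chmod 700 "$dir"
    # Files that must not be writable by group or others.
    for f in config authorized_keys id_rsa; do
        if [ -f "$dir/$f" ]; then
            chmod 600 "$dir/$f"
        fi
    done
    # The public key may stay world-readable.
    if [ -f "$dir/id_rsa.pub" ]; then
        chmod 644 "$dir/id_rsa.pub"
    fi
    # Show the result for a visual check.
    ls -al "$dir"
}

# Usage (not run automatically):
#   tighten_ssh_perms            # operates on ~/.ssh
#   tighten_ssh_perms /some/dir  # operates on another directory
```

If the files are owned by a different user, chown is still needed; the helper deliberately does not attempt that.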
Dear administrators or technicians of WIEN2k,
Hello. I am a user of WIEN2k v17.1 and have now upgraded to 18.2.
(The specification of my nodes is 2 CPUs with 56 threads in total
(Intel Xeon E5-2696 series) and CentOS 17.)
(I had no installation problems with ./siteconfig when I
compiled everything with the Intel compilers with MPI, FFTW, ScaLAPACK,
MKL and the libxc library.)
I have a problem with the parallel calculation of the lapw1 and lapw2
modules through w2web, tunneled via PuTTY.
(The input text and results are below.)
When I tried to calculate my system, it consistently showed an error about
*bad owner or permissions* on the config file.
When I searched the archives and Google for a solution, they said that the
problem lies in the authorization. So:
1. I already ran the ssh-keygen command and appended the key to
key_authorized, but it did not make any difference.
2. I tried changing the permissions and ownership of the config file with
the chmod and chown commands, but it did not work. (I could not find any
solution other than this.)
3. I checked the *.error files of the testpara1 and testpara2 results, and
they show nothing but "Error" without any further comment.
When I tried without parallelization for a small system (only 1 job),
the calculation worked without problems.
I also ran testpara for each lapw module, and lapw1 and lapw2 showed
errors.
It seems lapw1 runs without parallelization and lapw2 does not work.
I would really appreciate it if there is a way to solve these problems.
Thank you very much for your help in advance.
(I used just 4 threads for the test to keep the output short. Of course I
also tried using all threads, but it did not work.)
*.machines file*
-----------------------------
granularity:1
1:localhost:4    (I tried my username but it did not work. I also
tried 1:localhost, 1:localhost localhost:1 and 1:localhost 1:localhost)
lapw0:localhost:2 localhost:2
dstart:localhost:2 localhost:2
nlvdw:localhost:2 localhost:2
------------------------------
*~/.ssh/config*
-------------------------
Host    *
HostName 0.0.0.0   (I also tried my fixed IP but it did not work)
User localhost
ForwardX11Trusted yes
GatewayPorts yes
GSSAPIAuthentication yes
-------------------------
*SCF results*
-----------------------------------------------------------------------------------------------------------------------------------------------------
changing 1.in2c
changing 1.in2_ls
changing 1.in2_st
changing 1.in2_sy
LAPW0 END
[1] Done mpirun -np 4 -machinefile .machine0 /home/User/software/WIEN2K/lapw0_mpi lapw0.def >> .time00
DFTD3 END
*Bad owner or permissions on /home/User/.ssh/config*
[1] + Exit 255 ( $remote $remotemachine "cd $PWD;$t $ttt;rm -f .lock_$lockfile[$p]" )
.time1_1: No such file or directory
1.scf1up_1: No such file or directory.
cat: No match.
grep: No match.
grep: No match.
grep: No match.
> stop error
------------------------------------------------------------------------------------------------------------------------------
*testpara*
------------------------------------------------------------
#####################################################
# TESTPARA #
#####################################################
Test: LAPW1 in parallel mode (using .machines)
Granularity set to 1
Extrafine unset
weights: 1
sumw: 1
k-points: 30
klist: 30
machines: localhost
procs: 1
weigh(old): 1
sumw: 1
granularity: 1
weigh(new): 30
Distribution of k-point (under ideal conditions)
1 : localhost(30) 30k
-------------------------------------------------------
*testpara1*
-------------------------------------------------------------
#####################################################
# TESTPARA1 #
#####################################################
Sun Oct 28 18:12:33 KST 2018
lapw1para is running
30 of 30 (100%) k-points distributed
localhost: running
localhost: not running
localhost: not running
localhost: not running
------------------------------------------------------
*testpara2*
--------------------------------------------------------------
#####################################################
# TESTPARA2 #
#####################################################
Sun Oct 28 18:12:47 KST 2018
lapw2para exited due to an ERROR
Check *.error files
---------------------------------------------------------------
Sincerely,
Woohyeon Baek
Gavin Abo
2018-10-29 06:35:38 UTC
FYI, whenever I have followed the instructions for setting up passwordless
ssh login using ssh-keygen given at the link below, it has always
gone well for me:

https://www.tecmint.com/ssh-passwordless-login-using-ssh-keygen-in-5-easy-steps/

First, you might want to check if you have the following files and file
permissions similar to:

***@computername:~$ ls -al ~/.ssh
total 24
drwx------  2 username username 4096 Oct 28 23:54 .
drwxr-xr-x 68 username username 4096 Oct 28 23:54 ..
-rw-rw-r--  1 username username  401 Oct 28 23:54 authorized_keys
-rw-------  1 username username 1679 Oct 28 23:50 id_rsa
-rw-r--r--  1 username username  401 Oct 28 23:50 id_rsa.pub
-rw-r--r--  1 username username  222 Oct 28 23:47 known_hosts

Second, you might try removing all the files in .ssh to start over:

***@computername:~$ rm ~/.ssh/*

After removing all the files in the .ssh folder, it prompts me again for
my password:

***@computername:~$ ssh localhost
The authenticity of host 'localhost (127.0.0.1)' can't be established.
ECDSA key fingerprint is blah blah.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added 'localhost' (ECDSA) to the list of known hosts.
***@localhost's password:
...
Last login: Sun Oct 28 23:39:38 2018 from 127.0.0.1
***@computername:~$ exit
...

Next, I run ssh-keygen on the client host (e.g., localhost), then I
append the public key to authorized_keys on the remote host (e.g.,
localhost). Usually, this has to be done using the hostname of each of
your nodes. In this case, it only has to be done once, since it connects
back to the same localhost:

***@computername:~$ ssh-keygen -t rsa
Generating public/private rsa key pair.
Enter file in which to save the key (/home/username/.ssh/id_rsa):
Enter passphrase (empty for no passphrase):
Enter same passphrase again:
Your identification has been saved in /home/username/.ssh/id_rsa.
Your public key has been saved in /home/username/.ssh/id_rsa.pub.
The key fingerprint is:
blah blah ***@computername
The key's randomart image is:
+---[RSA 2048]----+
|      o  o       |
|       ()        |
|                 |
|                 |
|                 |
|                 |
|                 |
|                 |
|                 |
+----[SHA256]-----+

***@computername:~$ cat ~/.ssh/id_rsa.pub | ssh localhost 'cat >> ~/.ssh/authorized_keys'
***@localhost's password:
***@computername:~$ exit
logout
Connection to localhost closed.

It works as it no longer prompts for a password (i.e., it no longer
stops with "***@localhost's password:" when connecting):

***@computername:~$ ssh localhost
...

Last login: Sun Oct 28 23:49:28 2018 from 127.0.0.1
***@computername:~$ exit

Finally, "ls -al ~/.ssh" gives the files and file permissions that were
first shown above.
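
A quick, non-interactive way to confirm that the key setup worked is sketched below. The function name check_passwordless is made up for illustration; BatchMode=yes and ConnectTimeout are standard OpenSSH client options, and with BatchMode=yes ssh fails instead of stopping at a password prompt.

```shell
#!/bin/sh
# Sketch: report whether ssh to a host works without a password prompt.
# check_passwordless is a hypothetical helper name; the host argument
# defaults to localhost.
check_passwordless() {
    host="${1:-localhost}"
    # BatchMode=yes disables password prompts, so a missing or rejected
    # key makes ssh exit with an error instead of waiting for input.
    if ssh -o BatchMode=yes -o ConnectTimeout=5 "$host" true 2>/dev/null; then
        echo "passwordless ssh to $host: OK"
    else
        echo "passwordless ssh to $host: FAILED"
        return 1
    fi
}

# Usage (not run automatically):
#   check_passwordless localhost
```

On a multi-node setup the same check would be repeated for every hostname listed in .machines.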
Remove the config file.
Your username seems to be:  User
You don't want a "User localhost" entry:
localhost is the hostname, not the username.
-------- Forwarded Message --------
Subject:     [RE]Re: [Wien] Fail to parallel calculation of lapw1 and
lapw2 (testpara1 and testpara2)
Date:     Mon, 29 Oct 2018 00:48:47 +0900 (KST)
Dear Peter Blaha,
I had set 'User localhost' in the config file.
Sorry, I mistyped authorized_keys as key_authorized in my mail.
Anyway, I retried:
1. ssh-keygen -t rsa
2. Appending id_rsa.pub of the local host to .ssh/authorized_keys on the
remote host
as described in your presentation file 'Installation of Wien2k,
parallelization, large scale applications with WIEN2k'.
But it still requires a password.
-----------------------------------
------------------------------------
Are there any more steps that I need to perform?
Sincerely,
Woohyeon Baek