[Wien] "OpenBlas" package instead of default "blas"
Ashwani Kumar
2018-11-19 18:24:23 UTC
Dear Dr. Pavel Ondracka,
In a previous thread,
https://www.mail-archive.com/***@zeus.theochem.tuwien.ac.at/msg18098.html

you advised using the OpenBLAS package to get the best performance from
the processor. Since I was having problems with the WIEN2k installation,
I went with Dr. Gavin's set of instructions (for the lapack devel
package). Now I want to speed up WIEN2k execution (even simple oxides
take a long time). I also noticed that only one thread at a time is
100% busy; the remaining threads show a load of 1-5%.
Configuration of my PC: i7-8700 (6 cores, 12 threads), 8 GB RAM (can be
upgraded to 16 GB), Fedora 28, graphics card (gtx...)

I understand that OpenBLAS needs to be installed and R_path set to
-lopenblas. I would also like to use thread-level parallelism if it
improves performance by a further factor of 1.5 or more.

Waiting for your expert advice,

thanks,
A. Kumar
Pavel Ondračka
2018-11-20 10:24:00 UTC
Dear A. Kumar,

I don't fully understand your comment about the thread load. WIEN2k
does not currently spawn multiple threads (unless you use a threaded
blas/lapack). The k-point (or MPI) parallel calculations spawn multiple
processes, but those should never be at 1-5% load...

IMO there are likely two problems here:
1) If you are only using one machine and your case has a lot of k-points
(and you are not memory-bound), what you want is k-point parallelism.
This can be done with the .machines file (and the -p switch). If you
are only using a single machine, your .machines file should contain a
"1:localhost" line for every processor on your computer (i.e. in your
specific case a reasonable .machines file would have 6 identical lines,
maybe even 12 with hyperthreading, but you need to test your optimal
setup). Please check the userguide for more details about the k-point
parallel execution and the .machines file in general.

2) Regarding openblas: what you need is an openblas devel package.
To begin with I suggest the serial openblas ("dnf install
openblas-devel") and setting R_LIBS to just "-lopenblas". If you want to
squeeze out more speed (and you are using only a single computer), also
add "-ftree-vectorize -march=native" to your FOPT flags.
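
The single-machine k-point setup from point 1 can be sketched roughly as
follows. This is only a minimal illustration: NCORES=6 is an assumption
matching the six physical cores of the i7-8700 mentioned in this thread,
and whether 12 is faster has to be tested. A real .machines file may
carry further settings described in the userguide.

```shell
# Sketch: write a .machines file for k-point parallelism on one machine.
# NCORES is an assumed value (six physical cores); benchmark 6 vs. 12.
NCORES=6

# Start from an empty file, then add one "1:localhost" line per
# parallel k-point job.
: > .machines
for i in $(seq 1 "$NCORES"); do
    echo "1:localhost" >> .machines
done

# Show the result for a quick visual check.
cat .machines

# Afterwards, start the calculation in parallel mode from the case
# directory with the -p switch, e.g.:
#   run_lapw -p
```

The file has to sit in the case directory, and the calculation only runs
in parallel when it is started with the -p switch.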

If you really want to go with the threaded openblas I can help you
later, but IMO this should not be needed in the beginning (as the
k-point parallelism is the optimal one). You will also need some further
tricks to make lapw1 fast with libmvec. Either see
https://www.mail-archive.com/***@zeus.theochem.tuwien.ac.at/msg16159.html
or I can provide some new patches which do the same with OpenMP (but
first get the k-point parallelism and serial openblas working).
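
If a threaded openblas is tried later together with k-point parallelism,
one thing to watch is that processes times threads does not exceed the
number of cores. The snippet below is a rough sketch of that bookkeeping
only; TOTAL_CORES and KPOINT_JOBS are illustrative values, not measured
optima, and it assumes a threaded OpenBLAS build is actually linked (the
serial library ignores these variables).

```shell
# Sketch: split cores between k-point processes and BLAS threads.
# Both numbers are assumptions for illustration, not tuned values.
TOTAL_CORES=6
KPOINT_JOBS=3

# Integer division; fall back to 1 thread if there are more jobs than cores.
THREADS_PER_JOB=$(( TOTAL_CORES / KPOINT_JOBS ))
if [ "$THREADS_PER_JOB" -lt 1 ]; then
    THREADS_PER_JOB=1
fi

# OMP_NUM_THREADS is the standard OpenMP thread-count variable;
# OPENBLAS_NUM_THREADS is read by threaded OpenBLAS builds.
export OMP_NUM_THREADS="$THREADS_PER_JOB"
export OPENBLAS_NUM_THREADS="$THREADS_PER_JOB"

echo "k-point jobs: $KPOINT_JOBS, threads per job: $THREADS_PER_JOB"
```

With KPOINT_JOBS equal to the number of cores this reduces to one thread
per process, which is the plain k-point parallel case recommended above.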

Hope this helps

Best regards
Pavel
_______________________________________________
Wien mailing list
http://zeus.theochem.tuwien.ac.at/mailman/listinfo/wien
Ashwani Kumar
2018-11-22 09:14:26 UTC
Dear Dr. Pavel,
My comment about thread load: the processor monitor shows 12 CPUs
(6 cores, 12 threads), but only one is working at a time. When I
start another WIEN2k session (while one session is already running),
a second CPU goes to 100% load (2 sessions, 2 CPUs). This means that
at most 12 sessions can run simultaneously, if the available RAM
permits.

I was wondering how to use the available CPUs to get the best
performance out of all 12. (Obviously I am not going to run 12
sessions at a time.)

Thanks for the explanation of k-point parallelization. I will install
openblas and try to get k-point parallelization working after the
current research problem.

Thanks and Regards,
A. Kumar