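Running the quantum chemistry package BDF on an older Linux system
Problem description
I finally got an academic license for BDF and eagerly downloaded the package from Dropbox (a whopping 1.3 GB, and reaching Dropbox took some wall-climbing). I installed it on the server and was greeted with this: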
/opt/sge/default/spool/s02/job_scripts/239: line 10: ulimit: max locked memory: cannot modify limit: Operation not permitted
/apps/bdf/bin/compass.x: error while loading shared libraries: libiomp5.so: cannot open shared object file: No such file or directory
Traceback (most recent call last):
File "/apps/bdf/sbin/bdfdrv.py", line 362, in <module>
main()
File "/apps/bdf/sbin/bdfdrv.py", line 328, in main
if not singlepoint(inpfile, BDFHOME, taskname, bdfmodules, outfile):
File "/apps/bdf/sbin/bdfutil.py", line 669, in singlepoint
re=bdf_exec(bdfmodule, finp, fout, True) #system(bdfcmd)
File "/apps/bdf/sbin/bdfutil.py", line 1939, in bdf_exec
re=check_module_result(bdfmodule)
File "/apps/bdf/sbin/bdfutil.py", line 1154, in check_module_result
fre = open(result,"r")
FileNotFoundError: [Errno 2] No such file or directory: './tmp/25560/nacme.returncode'
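The decisive line in the log above is the loader error for libiomp5.so; the Python traceback is only a knock-on effect of compass.x never starting. One quick way to list every library the dynamic loader cannot resolve is to filter the output of ldd. The sketch below assumes a glibc-style ldd that prints "=> not found" for unresolved libraries; the function name list_missing_libs is made up for this example.

```shell
#!/bin/bash
# Print the names of shared libraries that ldd reports as "not found".
# Reads ldd output on stdin so it can sit at the end of a pipeline.
list_missing_libs() {
    awk '/=> not found/ { print $1 }'
}

# Typical use (binary path taken from the log above):
#   ldd /apps/bdf/bin/compass.x | list_missing_libs
```

An empty result means every dependency resolved, which does not yet rule out version problems inside the libraries that were found.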
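I handed this to Claude, which said that BDF's compass.x program could not find the Intel OpenMP runtime library (libiomp5.so). Fair enough: this server never had oneAPI installed. After some trouble getting oneAPI installed, I got this instead: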
/opt/sge/default/spool/s01/job_scripts/1293: line 10: ulimit: max locked memory: cannot modify limit: Operation not permitted
/apps/bdf/bin/compass.x: /lib64/libc.so.6: version `GLIBC_2.32' not found (required by /apps/bdf/bin/compass.x)
/apps/bdf/bin/compass.x: /lib64/libc.so.6: version `GLIBC_2.34' not found (required by /apps/bdf/bin/compass.x)
/apps/bdf/bin/compass.x: /lib64/libstdc++.so.6: version `GLIBCXX_3.4.29' not found (required by /apps/bdf/bin/compass.x)
/apps/bdf/bin/compass.x: /lib64/libstdc++.so.6: version `GLIBCXX_3.4.26' not found (required by /apps/bdf/bin/compass.x)
Traceback (most recent call last):
File "/apps/bdf/sbin/bdfdrv.py", line 362, in <module>
main()
File "/apps/bdf/sbin/bdfdrv.py", line 328, in main
if not singlepoint(inpfile, BDFHOME, taskname, bdfmodules, outfile):
File "/apps/bdf/sbin/bdfutil.py", line 669, in singlepoint
re=bdf_exec(bdfmodule, finp, fout, True) #system(bdfcmd)
File "/apps/bdf/sbin/bdfutil.py", line 1939, in bdf_exec
re=check_module_result(bdfmodule)
File "/apps/bdf/sbin/bdfutil.py", line 1154, in check_module_result
fre = open(result,"r")
FileNotFoundError: [Errno 2] No such file or directory: './tmp/26174/nacme.returncode'
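This time I didn't even need to show it to Claude. GLIBC_2.34: compass.x was built against a newer glibc and libstdc++ than the server provides. This wretched thing has repeatedly stopped me from running precompiled software on the server and forced me to build from source, so I know it all too well.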
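To see how far apart the binary and the host are before choosing a fix, one can pull the highest GLIBC_x.y tag out of the loader errors and compare it with the host's glibc version. This is a minimal sketch, assuming GNU sort (for -V version ordering); the helper names max_glibc_required and version_ge are invented here, and job.err is a placeholder file name.

```shell
#!/bin/bash
# Highest GLIBC_x.y version mentioned in the text on stdin.
# GLIBCXX_... tags belong to libstdc++ and are deliberately not matched.
max_glibc_required() {
    grep -o 'GLIBC_[0-9][0-9.]*' | sed 's/^GLIBC_//' | sort -V | tail -n 1
}

# Succeeds when version $1 is greater than or equal to version $2.
version_ge() {
    [ "$(printf '%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Typical use: feed in the job's error output and compare with the host.
#   need=$(max_glibc_required < job.err)
#   have=$(getconf GNU_LIBC_VERSION | awk '{ print $2 }')
#   version_ge "$have" "$need" || echo "host glibc $have is older than $need"
```

For the log above the required version comes out as 2.34, so any host whose glibc is older than that will refuse to load compass.x no matter how LD_LIBRARY_PATH is set.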
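Solution
There are two options:
1. Upgrade glibc
I tried this about a year ago and got nowhere after a week. That server ran CentOS 6.9, where cmake, gcc and glibc are all ancient. Worse, they depend on one another, so you have to bootstrap them in a spiral, round after round, which was too much for the beginner I was then. The scar runs deep enough that I now steer clear of any plan that involves touching these things, and this time is no exception.
2. Package it with Docker
Fortunately there is a fallback: Docker. A container brings its own userland (glibc, libstdc++) and shares only the kernel with the host, so building the image from Ubuntu 22.04 makes all of the problems above go away. The downside is that there may be some performance overhead, but I don't plan to run anything heavy with BDF for now, so it doesn't matter.
Write the Dockerfile myself? You must be joking. That kind of chore is obviously Claude's job 😸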
# BDF Dockerfile with Intel OneAPI support
FROM ubuntu:22.04
# Avoid interactive prompts during package installation
ENV DEBIAN_FRONTEND=noninteractive
# Install base dependencies
RUN apt-get update && apt-get install -y \
    build-essential \
    gfortran \
    wget \
    curl \
    python3 \
    python3-pip \
    cpio \
    sudo \
    && rm -rf /var/lib/apt/lists/*
# Create the installation directory
RUN mkdir -p /opt/intel/oneapi
# Copy the Intel OneAPI installers
COPY oneapi/l_BaseKit_p_2024.2.1.100_offline.sh /tmp/
COPY oneapi/l_HPCKit_p_2024.2.1.79_offline.sh /tmp/
# Install Intel OneAPI BaseKit (silent, no GUI)
RUN chmod +x /tmp/l_BaseKit_p_2024.2.1.100_offline.sh && \
    /tmp/l_BaseKit_p_2024.2.1.100_offline.sh -a --silent --eula accept && \
    rm /tmp/l_BaseKit_p_2024.2.1.100_offline.sh
# Install Intel OneAPI HPCKit (silent, no GUI)
RUN chmod +x /tmp/l_HPCKit_p_2024.2.1.79_offline.sh && \
    /tmp/l_HPCKit_p_2024.2.1.79_offline.sh -a --silent --eula accept && \
    rm /tmp/l_HPCKit_p_2024.2.1.79_offline.sh
# Copy the BDF software
COPY bdf/ /apps/bdf/
# Set the basic environment variables
ENV BDFHOME=/apps/bdf
ENV PATH=$BDFHOME/bin:$PATH
# LD_LIBRARY_PATH is not defined yet at this point, so it is not appended
# (appending it would leave a trailing ':' in the value)
ENV LD_LIBRARY_PATH=$BDFHOME/lib
# Create a BDF user
RUN useradd -m -s /bin/bash bdfuser && \
    chown -R bdfuser:bdfuser /apps/bdf/
# Create the startup script, which also sets up the Intel OneAPI environment
RUN echo '#!/bin/bash' > /usr/local/bin/bdf_env.sh && \
    echo '' >> /usr/local/bin/bdf_env.sh && \
    echo '# Source Intel OneAPI environment' >> /usr/local/bin/bdf_env.sh && \
    echo 'source /opt/intel/oneapi/setvars.sh --force' >> /usr/local/bin/bdf_env.sh && \
    echo '' >> /usr/local/bin/bdf_env.sh && \
    echo '# Set BDF environment' >> /usr/local/bin/bdf_env.sh && \
    echo 'export BDFHOME=/apps/bdf' >> /usr/local/bin/bdf_env.sh && \
    echo 'export LD_LIBRARY_PATH=/apps/bdf/lib:$LD_LIBRARY_PATH' >> /usr/local/bin/bdf_env.sh && \
    echo 'export PATH=$BDFHOME/bin:$BDFHOME/sbin:$PATH' >> /usr/local/bin/bdf_env.sh && \
    echo '' >> /usr/local/bin/bdf_env.sh && \
    echo '# Execute the command' >> /usr/local/bin/bdf_env.sh && \
    echo 'exec "$@"' >> /usr/local/bin/bdf_env.sh && \
    chmod +x /usr/local/bin/bdf_env.sh
# Set the working directory
WORKDIR /work
# Use the environment script as the default entry point
ENTRYPOINT ["/usr/local/bin/bdf_env.sh"]
CMD ["/bin/bash"]
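Note that this Dockerfile is completely unoptimized and certainly pulls in a lot of redundant stuff; I'll tidy it up when I have time. The build script: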
#!/bin/bash
set -e
echo "Building the BDF Docker image (with Intel OneAPI)..."
# Check the required files and directories
echo "Checking required files..."
if [ ! -d "./bdf" ]; then
    echo "Error: place the BDF software directory in the current directory and name it 'bdf'"
    echo "Expected layout: ./bdf/bin/, ./bdf/lib/, ./bdf/sbin/, etc."
    exit 1
fi
if [ ! -d "./oneapi" ]; then
    echo "Error: create the './oneapi' directory and put the Intel OneAPI installers in it"
    exit 1
fi
if [ ! -f "./oneapi/l_BaseKit_p_2024.2.1.100_offline.sh" ]; then
    echo "Error: cannot find l_BaseKit_p_2024.2.1.100_offline.sh"
    echo "Please put it in the ./oneapi/ directory"
    exit 1
fi
if [ ! -f "./oneapi/l_HPCKit_p_2024.2.1.79_offline.sh" ]; then
    echo "Error: cannot find l_HPCKit_p_2024.2.1.79_offline.sh"
    echo "Please put it in the ./oneapi/ directory"
    exit 1
fi
echo "All required files are present!"
# Build the image (this can take a long time because Intel OneAPI is installed)
echo "Starting the build (this may take 20-30 minutes)..."
docker build -t bdf:oneapi-ubuntu22.04 .
echo "Image build finished!"
docker images | grep bdf
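Before building, put the BDF program files in a bdf folder under the current directory (I plan to change this later, since the program should not be baked into the image), and put the oneAPI installers in an oneapi folder. Once everything is in place, run: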
./build_bdf_docker.sh
After a while the build finishes and the BDF image appears:
bdf oneapi-ubuntu22.04 19b19d6fbe96 27 seconds ago 28.2GB
28.2 GB: that's the price of not optimizing 🥲 For purely personal use, though, out of sight is out of mind. The test script:
#!/bin/bash
# Test the BDF Docker environment
echo "Testing the BDF Docker environment..."
# Note: the command string below is double-quoted, so $BDFHOME and
# $OMP_NUM_THREADS are expanded by the host shell, not inside the container.
docker run --rm \
    bdf:oneapi-ubuntu22.04 \
    /bin/bash -c "
echo '=== System info ==='
cat /etc/os-release | head -2
echo -e '\n=== GLIBC version ==='
ldd --version | head -1
echo -e '\n=== Intel OneAPI environment ==='
which icc || echo 'ICC not found'
which ifort || echo 'Intel Fortran not found'
echo -e '\n=== BDF environment ==='
echo 'BDFHOME: $BDFHOME'
ls -la /apps/bdf/bin/ | head -5
echo -e '\n=== Library dependency check ==='
ldd /apps/bdf/bin/compass.x | head -10 || echo 'compass.x not found or has issues'
echo -e '\n=== OpenMP test ==='
echo 'OMP_NUM_THREADS: $OMP_NUM_THREADS'
"
echo "Environment test finished!"
The output looks like this:
(obabel) [sun@s01 bdf_docker]$ ./test_bdf_env.sh
Testing the BDF Docker environment...
:: initializing oneAPI environment ...
bdf_env.sh: BASH_VERSION = 5.1.16(1)-release
args: Using "$@" for setvars.sh arguments: --force
:: advisor -- latest
:: ccl -- latest
:: compiler -- latest
:: dal -- latest
:: debugger -- latest
:: dev-utilities -- latest
:: dnnl -- latest
:: dpcpp-ct -- latest
:: dpl -- latest
:: ipp -- latest
:: ippcp -- latest
:: mkl -- latest
:: mpi -- latest
:: tbb -- latest
:: vtune -- latest
:: oneAPI environment initialized ::
=== System info ===
PRETTY_NAME="Ubuntu 22.04.5 LTS"
NAME="Ubuntu"
=== GLIBC version ===
ldd (Ubuntu GLIBC 2.35-0ubuntu3.10) 2.35
=== Intel OneAPI environment ===
ICC not found
/opt/intel/oneapi/compiler/2024.2/bin/ifort
=== BDF environment ===
BDFHOME:
total 2617876
drwxr-xr-x. 1 bdfuser bdfuser 4096 Jul 1 11:51 .
drwxr-xr-x. 1 bdfuser bdfuser 159 Jul 1 11:51 ..
-rwxr-xr-x. 1 bdfuser bdfuser 95264112 Jul 1 10:27 atom.x
-rwxr-xr-x. 1 bdfuser bdfuser 97220840 Jul 1 10:27 bdfopt.x
=== Library dependency check ===
linux-vdso.so.1 (0x00007ffc617fc000)
libiomp5.so => /opt/intel/oneapi/compiler/2024.2/lib/libiomp5.so (0x00007fbf42000000)
libimf.so => /opt/intel/oneapi/compiler/2024.2/lib/libimf.so (0x00007fbf41bf4000)
libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007fbf424b4000)
libirng.so => /opt/intel/oneapi/compiler/2024.2/lib/libirng.so (0x00007fbf41afb000)
libstdc++.so.6 => /lib/x86_64-linux-gnu/libstdc++.so.6 (0x00007fbf418cf000)
libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007fbf416a6000)
/lib64/ld-linux-x86-64.so.2 (0x00007fbf425a2000)
libgcc_s.so.1 => /lib/x86_64-linux-gnu/libgcc_s.so.1 (0x00007fbf42494000)
librt.so.1 => /lib/x86_64-linux-gnu/librt.so.1 (0x00007fbf4248f000)
=== OpenMP test ===
OMP_NUM_THREADS:
Environment test finished!
(obabel) [sun@s01 bdf_docker]$
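As you can see, the environment now has everything BDF needs: glibc is 2.35, and libiomp5.so resolves to the oneAPI installation.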
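The empty "BDFHOME:" and "OMP_NUM_THREADS:" lines in this output do not show what the container sees. The test script passes its commands in a double-quoted string, so the host shell substitutes the variables (unset on the host) before docker ever runs; single quotes nested inside double quotes do not protect them. The effect can be reproduced without Docker. DEMO_VAR below is a throwaway name used only for this demonstration.

```shell
#!/bin/bash
unset DEMO_VAR   # make sure the outer shell has no value for it

# Unescaped: the outer shell expands $DEMO_VAR (to nothing) while building
# the command string, so the child only ever sees: echo 'value: '
by_outer=$(DEMO_VAR=inner bash -c "echo 'value: $DEMO_VAR'")

# Escaped: \$ survives into the command string, so the child shell expands
# the variable from its own environment.
by_child=$(DEMO_VAR=inner bash -c "echo \"value: \$DEMO_VAR\"")

echo "outer-shell expansion: [$by_outer]"
echo "child-shell expansion: [$by_child]"
```

In the test script the equivalent change would be to write \$BDFHOME inside the quoted command. As far as the Dockerfile shows, OMP_NUM_THREADS would stay empty in that test either way, since that docker run passes no -e OMP_NUM_THREADS.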
Running BDF calculations
Running directly:
#!/bin/bash
# Usage: ./run_bdf_docker.sh input_file.inp [num_threads] [output_file.out]
if [ $# -lt 1 ]; then
    echo "Usage: $0 input_file.inp [num_threads] [output_file.out]"
    echo "Example: $0 nacme.inp 36 nacme.out"
    exit 1
fi
INPUT_FILE="$1"
NUM_THREADS="${2:-36}"
OUTPUT_FILE="${3:-${INPUT_FILE%.*}.out}"
WORK_DIR=$(pwd)
# Check that the input file exists
if [ ! -f "$INPUT_FILE" ]; then
    echo "Error: input file $INPUT_FILE does not exist"
    exit 1
fi
echo "Running BDF calculation:"
echo "  Input file: $INPUT_FILE"
echo "  Output file: $OUTPUT_FILE"
echo "  Threads: $NUM_THREADS"
echo "  Working directory: $WORK_DIR"
# Create a scratch directory
TMP_DIR="./tmp/$RANDOM"
mkdir -p "$TMP_DIR"
echo "  Scratch directory: $TMP_DIR"
# Run the Docker container
docker run --rm \
    -v "$WORK_DIR":/work \
    -w /work \
    -u "$(id -u):$(id -g)" \
    --cpus="$NUM_THREADS" \
    -e "BDF_WORKDIR=./" \
    -e "BDF_TMPDIR=$TMP_DIR" \
    -e "OMP_NUM_THREADS=$NUM_THREADS" \
    -e "OMP_STACKSIZE=4G" \
    bdf:oneapi-ubuntu22.04 \
    python3 /apps/bdf/sbin/bdfdrv.py -r "$INPUT_FILE" > "$OUTPUT_FILE"
echo "Calculation finished!"
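The default output name in this script comes from the ${INPUT_FILE%.*} parameter expansion, which removes the shortest trailing match of ".*". A small sketch of how it behaves on a few names follows; the function name default_output_name exists only for illustration.

```shell
#!/bin/bash
# Derive the default output name the same way the run script does:
# drop the last ".ext" (if any) and append ".out".
default_output_name() {
    local input="$1"
    printf '%s\n' "${input%.*}.out"
}

default_output_name nacme.inp        # nacme.out
default_output_name runs/a.b.inp     # runs/a.b.out
default_output_name noext            # noext.out
default_output_name ./nacme          # .out  (the only dot is in "./")
```

The last case is the one to watch for: an extensionless name given with a leading ./ collapses to .out, so it is safer to pass the output name explicitly in that situation.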
Submitting through the SGE job scheduler:
#!/bin/bash
#$ -S /bin/bash
#$ -N bdf_docker
#$ -pe mpi_36 36
#$ -q cpu.q
#$ -cwd
#$ -j y
# BDF Docker job script for SGE
# Usage: qsub bdf_sge_docker.sh input_file.inp
INPUT_FILE="$1"
OUTPUT_FILE="${INPUT_FILE%.*}.out"
ERROR_FILE="${INPUT_FILE%.*}.e"
if [ -z "$INPUT_FILE" ]; then
    echo "Error: please provide an input file name"
    echo "Usage: qsub $0 input_file.inp"
    exit 1
fi
echo "=== BDF Docker job started ==="
echo "Node: $(hostname)"
echo "Job ID: $JOB_ID"
echo "Input file: $INPUT_FILE"
echo "Output file: $OUTPUT_FILE"
echo "Error file: $ERROR_FILE"
echo "Working directory: $(pwd)"
echo "Allocated cores: $NSLOTS"
echo "Start time: $(date)"
# Unlimited limits are set inside the container (see --ulimit below),
# so the host-side ulimit calls stay commented out
# ulimit -s unlimited
# ulimit -l unlimited
# Create a scratch directory
TMP_DIR="./tmp/$RANDOM"
mkdir -p "$TMP_DIR"
echo "Scratch directory: $TMP_DIR"
# Run the BDF calculation
docker run --rm \
    -v "$(pwd)":/work \
    -w /work \
    -u "$(id -u):$(id -g)" \
    --cpus="$NSLOTS" \
    --ulimit memlock=-1:-1 \
    --ulimit stack=-1:-1 \
    -e "BDF_WORKDIR=./" \
    -e "BDF_TMPDIR=$TMP_DIR" \
    -e "OMP_NUM_THREADS=$NSLOTS" \
    -e "OMP_STACKSIZE=4G" \
    bdf:oneapi-ubuntu22.04 \
    python3 /apps/bdf/sbin/bdfdrv.py -r "$INPUT_FILE" > "$OUTPUT_FILE" 2> "$ERROR_FILE"
EXIT_CODE=$?
echo "End time: $(date)"
echo "Exit code: $EXIT_CODE"
echo "=== BDF Docker job finished ==="
exit $EXIT_CODE
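Both scripts name the scratch directory ./tmp/$RANDOM. $RANDOM only ranges from 0 to 32767, so two jobs started in the same working directory can in principle pick the same number and share scratch files. One possible alternative, sketched here, is mktemp -d, which creates the directory atomically and never reuses an existing name. The bdf. prefix and the make_scratch_dir name are arbitrary, and whether BDF_TMPDIR accepts non-numeric directory names is an assumption that has not been checked against BDF.

```shell
#!/bin/bash
# Create a unique scratch directory under BASE (default ./tmp) and print its path.
make_scratch_dir() {
    local base="${1:-./tmp}"
    mkdir -p "$base" || return 1
    mktemp -d "$base/bdf.XXXXXX"
}

# Possible use in place of: TMP_DIR="./tmp/$RANDOM"; mkdir -p "$TMP_DIR"
#   TMP_DIR=$(make_scratch_dir ./tmp) || exit 1
```

Because the path stays relative to the working directory, it still resolves inside the container, where that directory is mounted at /work.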
Debugging:
#!/bin/bash
# Start an interactive BDF container for debugging
WORK_DIR=$(pwd)
NUM_THREADS="${1:-4}"
echo "Starting an interactive BDF container..."
echo "Working directory: $WORK_DIR"
echo "Threads: $NUM_THREADS"
docker run -it --rm \
    -v "$WORK_DIR":/work \
    -w /work \
    -u "$(id -u):$(id -g)" \
    --cpus="$NUM_THREADS" \
    -e "OMP_NUM_THREADS=$NUM_THREADS" \
    -e "OMP_STACKSIZE=4G" \
    bdf:oneapi-ubuntu22.04 \
    /bin/bash
I tested the second method (submission through SGE), and the BDF calculation that previously failed now terminates normally:
---------------------------------------------------------------
Timing of RESP_PROPS calculation
CPU TIME(S) SYSTEM TIME(S) WALL TIME(S)
538.570 3.540 15.140
---------------------------------------------------------------
|******************************************************************************|
Total cpu time: 538.57 S
Total system time: 3.54 S
Total wall time: 15.14 S
Current time 2025-08-01 14:01:01
End running module resp
|******************************************************************************|
Total wall time 22.58 Seconds
You should check your OUTFILE name, neither .bdflog or .bdfout!
/apps/bdf/sbin/bdfsummary.py\bdf_summary\\gb3lyp/DEF2-SVP\\0,1\\C,0.0000004475,4.7804610199,-0.0000026894\C,-2.4071403222,3.6619171642,0.0000114899\C,-3.0278682803,1.0441514135,0.0000182002\C,2.4071410728,3.6619168536,-0.0000126720\C,-1.3891243743,-1.0290332084,0.0000102948\C,3.0278668032,1.0441529126,-0.0000171592\C,1.3891207402,-1.0290296082,-0.0000214317\H,0.0000005812,6.8499076578,-0.0000051068\H,-4.0114251220,4.9633073991,0.0000141138\H,-5.0457405426,0.5931702938,0.0000164802\H,4.0114260405,4.9633106609,-0.0000116930\H,5.0457371173,0.5931658633,-0.0000065842\C,-2.1735422984,-3.6504498994,-0.0000193770\H,-4.1298170597,-4.2920401384,-0.0000366291\C,0.0000016945,-5.1885793906,-0.0000243688\H,0.0000015618,-7.2491556541,-0.0000370634\C,2.1735439939,-3.6504464005,0.0000539224\H,4.1298205617,-4.2920312227,0.0000892272\\gb3lyp=GEOM\\
Congratulations! BDF normal termination
./tmp/21315 /work
Warning! Removing scratch files. If you would like to keep scratch files for restaring. Please use -keeptmpdir
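For batch use it helps to check the result mechanically rather than by eye. The sketch below greps an output file for the "Congratulations! BDF normal termination" line shown above. It assumes BDF prints exactly this line on success, which is inferred from this one run rather than from BDF's documentation; the function name is made up.

```shell
#!/bin/bash
# Succeed if the given BDF output file contains the normal-termination line.
bdf_terminated_normally() {
    grep -q 'Congratulations! BDF normal termination' "$1"
}

# Possible use after a job finishes (nacme.out is an example file name):
#   if bdf_terminated_normally nacme.out; then echo "OK"; else echo "check nacme.out"; fi
```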