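Preface
A couple of days ago the 第五空间 (Fifth Space) CTF had a pwn challenge called toolkit that I could not solve. When I reproduced it afterwards, it turned out to be a stack overflow with a canary bypass. I have done canary challenges before but had never debugged the related source code, so I am taking this chance to summarize everything about the canary. I will not go into detail on the basics such as leaking the canary (that material is much the same everywhere, so I mostly copy existing posts with light edits in case the original blogs disappear someday; if it bores you, skip straight to the C++ Exception bypass at the very end). The focus is on debugging the canary-related source code and on the bypass through the C++ Exception mechanism.
Ugh... only after finishing did I find that someone had already written something similar. Oh well, working through it myself is no bad thing, it just costs time.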
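Canary introduction
The name canary comes from the caged canaries that British miners used to detect toxic gas underground. Miners took a canary down on every shift. Because the bird is much more sensitive to toxic gas than a human, it would stop singing or even die when the air turned bad, which gave the miners an early warning.
By analogy, the system running a program is the miner going underground, a stack overflow attack is the gas leak, and the canary is what detects the overflow and so protects the system.
A typical stack overflow exploit overflows a local variable on the stack so that the excess data overwrites the saved rbp, the saved return address and so on, which hijacks control flow. Stack protection is a mitigation against exactly this. With protection enabled, the function prologue stores a cookie on the stack between the local variables and the saved rbp and return address. Just before the function returns (before the stack frame is torn down), the cookie is checked; if it has changed, the program is stopped because a stack overflow was detected. An attacker who overwrites the return address with a linear overflow usually overwrites the cookie as well, so the check fails and the exploit is prevented. On Linux this cookie is called the canary.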
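To make the prologue and epilogue idea concrete, here is a toy model in C. It is not what GCC emits: the `toy_frame` layout, the function names and the guard value are all made up for illustration, and the "overflow" is clamped to the size of the struct so that the sketch itself stays memory-safe.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define TOY_BUF_LEN 16

/* Hypothetical frame layout, lowest address first. */
struct toy_frame {
    unsigned char buf[TOY_BUF_LEN]; /* local variable */
    uint64_t canary;                /* cookie stored by the prologue */
    uint64_t saved_rbp;
    uint64_t ret_addr;
};

/* "Prologue": clear the frame and store the guard value in it. */
void toy_prologue(struct toy_frame *f, uint64_t guard, uint64_t ret_addr)
{
    memset(f, 0, sizeof *f);
    f->canary = guard;
    f->ret_addr = ret_addr;
}

/* Unchecked copy starting at buf, as scanf("%s") would do. The length is
   clamped to the frame size only so this simulation never writes outside f. */
void toy_unsafe_copy(struct toy_frame *f, const unsigned char *src, size_t n)
{
    if (n > sizeof *f)
        n = sizeof *f;
    memcpy((unsigned char *)f, src, n);
}

/* "Epilogue": returns 1 if the cookie is intact, 0 if it was changed. */
int toy_epilogue_ok(const struct toy_frame *f, uint64_t guard)
{
    return f->canary == guard;
}
```

Copying 8 bytes leaves the cookie intact, while copying 40 bytes reaches `ret_addr` but necessarily tramples `canary` on the way, which is the property the real check relies on.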
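How the canary is built and how it works
Using the canary with GCC
GCC controls the canary with the following options:
-fstack-protector  enable protection, but only for functions that call alloca or have local buffers of 8 bytes or more
-fstack-protector-all  enable protection for every function
-fstack-protector-strong  like -fstack-protector, but also covers functions with local arrays of any type or size and functions that take the address of a local variable
-fstack-protector-explicit  enable protection only for functions that carry the stack_protect attribute
-fno-stack-protector  disable protection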
Canary example
#include<stdio.h>
#include<stdlib.h>
void func(){
char s[0x15];
scanf("%s", s);
}
int main(){
func();
}
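// With canary (assumes the distribution's GCC enables stack protection by default; otherwise add -fstack-protector):
// gcc -g3 ./test.c -o test_can
// Without canary:
// gcc -g3 -fno-stack-protector ./test.c -o test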
Once the program is running, the stack frame of func looks like this:
     Low address |                 |
                 +-----------------+
                 | local variables |
                 +-----------------+
        rbp-8 => | canary value    |
                 +-----------------+
        rbp   => | saved rbp       |
                 +-----------------+
                 | return address  |
                 +-----------------+
    High address |                 |
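Compared with the unprotected build, the extra assembly is simple: in the function prologue it stores fs:0x28 at rbp-8, and in the epilogue it compares rbp-8 against fs:0x28. If the two differ, it calls __stack_chk_fail: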
void __attribute__ ((noreturn)) __stack_chk_fail (void)
{
__fortify_fail ("stack smashing detected");
}
//glibc2.23
void __attribute__ ((noreturn)) internal_function __fortify_fail (const char *msg)
{
/* The loop is added only to keep gcc happy. */
while (1)
__libc_message (2, "*** %s ***: %s terminated\n",
msg, __libc_argv[0] ?: "<unknown>");
}
// glibc2.27
void __attribute__ ((noreturn)) __fortify_fail (const char *msg)
{
__fortify_fail_abort (true, msg);
}
void __attribute__ ((noreturn)) __fortify_fail_abort (_Bool need_backtrace, const char *msg)
{
/* The loop is added only to keep gcc happy. Don't pass down
__libc_argv[0] if we aren't doing backtrace since __libc_argv[0]
may point to the corrupted stack. */
while (1)
__libc_message (need_backtrace ? (do_abort | do_backtrace) : do_abort,
"*** %s ***: %s terminated\n",
msg,
(need_backtrace && __libc_argv[0] != NULL
? __libc_argv[0] : "<unknown>"));
}
// glibc 2.31 and later
void __attribute__ ((noreturn)) __fortify_fail (const char *msg)
{
/* The loop is added only to keep gcc happy. */
while (1)
__libc_message (do_abort, "*** %s ***: terminated\n", msg);
}
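Canary source analysis
Given what the canary does, it is easy to guess that the checking code is added at compile time and that the canary value is initialized before main. From what I had read, the initialization happens in security_init. I also knew that program startup goes _start -> __libc_start_main -> __libc_csu_init -> _init -> main -> _fini, so I assumed the canary would be initialized around __libc_csu_init. Strangely, with a breakpoint on __libc_start_main the canary was already initialized. So I went one level up, to _start, but the canary was already initialized there too. Odd: doesn't the program start at _start?
I went back to another write-up, linux编程之main()函数启动过程 (on how main() is started on Linux), and found that I had remembered it correctly...
In the end I realized that when a dynamically linked program is loaded, glibc's ld.so runs first and sets up TLS, and the canary is initialized during that stage. The call chain is _start -> _dl_start -> _dl_start_final -> _dl_sysdep_start -> dl_main -> security_init, where this _start is the entry point of ld.so, not the program's own _start: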
// rtld.c
static void
security_init (void)
{
/* Set up the stack checker's canary. */
uintptr_t stack_chk_guard = _dl_setup_stack_chk_guard (_dl_random);
#ifdef THREAD_SET_STACK_GUARD
THREAD_SET_STACK_GUARD (stack_chk_guard);
#else
__stack_chk_guard = stack_chk_guard;
#endif
/* Set up the pointer guard as well, if necessary. */
uintptr_t pointer_chk_guard
= _dl_setup_pointer_guard (_dl_random, stack_chk_guard);
#ifdef THREAD_SET_POINTER_GUARD
THREAD_SET_POINTER_GUARD (pointer_chk_guard);
#endif
__pointer_chk_guard_local = pointer_chk_guard;
/* We do not need the _dl_random value anymore. The less
information we leave behind, the better, so clear the
variable. */
_dl_random = NULL;
}
static inline uintptr_t __attribute__ ((always_inline))
_dl_setup_stack_chk_guard (void *dl_random)
{
union
{
uintptr_t num;
unsigned char bytes[sizeof (uintptr_t)];
} ret = { 0 };
if (dl_random == NULL)
{
ret.bytes[sizeof (ret) - 1] = 255;
ret.bytes[sizeof (ret) - 2] = '\n';
}
else
{
memcpy (ret.bytes, dl_random, sizeof (ret));
#if BYTE_ORDER == LITTLE_ENDIAN
ret.num &= ~(uintptr_t) 0xff;
#elif BYTE_ORDER == BIG_ENDIAN
ret.num &= ~((uintptr_t) 0xff << (8 * (sizeof (ret) - 1)));
#else
# error "BYTE_ORDER unknown"
#endif
}
return ret.num;
}
#define THREAD_SET_STACK_GUARD(value) \
THREAD_SETMEM (THREAD_SELF, header.stack_guard, value)
/* Set member of the thread descriptor directly. */
# define THREAD_SETMEM(descr, member, value) \
({ if (sizeof (descr->member) == 1) \
asm volatile ("movb %b0,%%gs:%P1" : \
: "iq" (value), \
"i" (offsetof (struct pthread, member))); \
else if (sizeof (descr->member) == 4) \
asm volatile ("movl %0,%%gs:%P1" : \
: "ir" (value), \
"i" (offsetof (struct pthread, member))); \
else \
{ \
if (sizeof (descr->member) != 8) \
/* There should not be any value with a size other than 1, \
4 or 8. */ \
abort (); \
\
asm volatile ("movl %%eax,%%gs:%P1\n\t" \
"movl %%edx,%%gs:%P2" : \
: "A" ((uint64_t) cast_to_integer (value)), \
"i" (offsetof (struct pthread, member)), \
"i" (offsetof (struct pthread, member) + 4)); \
}})
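Debugging shows this clearly: the canary is derived in _dl_setup_stack_chk_guard from _dl_random (which points at the random bytes the kernel passes in through AT_RANDOM) and is then stored to fs:0x28 by the THREAD_SET_STACK_GUARD macro. The THREAD_SETMEM definition quoted above is the i386 one, which writes through %gs; the x86-64 version does the same through %fs. One thing to note: _dl_setup_stack_chk_guard is declared inline (always_inline) and THREAD_SET_STACK_GUARD is only a wrapper macro, so neither can be stepped into while debugging. The debugger can only stop in security_init.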
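Since _dl_setup_stack_chk_guard cannot be stepped into, a small stand-alone sketch may help to see what it computes. The function below is my own re-creation for a 64-bit value, not glibc code: `sketch_setup_guard` and `host_is_little_endian` are hypothetical names, and the endianness test is done at run time instead of with glibc's BYTE_ORDER macros.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Run-time endianness probe, used here in place of BYTE_ORDER. */
static int host_is_little_endian(void)
{
    const uint16_t probe = 1;
    unsigned char first;
    memcpy(&first, &probe, 1);
    return first == 1;
}

/* Derive a canary from 8 random bytes the way the quoted glibc code does:
   copy the bytes, then clear the byte that sits first in memory. */
uint64_t sketch_setup_guard(const unsigned char *random_bytes)
{
    uint64_t guard = 0;

    if (random_bytes == NULL) {
        /* Fallback used when no random data is available. */
        unsigned char bytes[sizeof guard] = { 0 };
        bytes[sizeof guard - 1] = 255;
        bytes[sizeof guard - 2] = '\n';
        memcpy(&guard, bytes, sizeof guard);
        return guard;
    }

    memcpy(&guard, random_bytes, sizeof guard);
    if (host_is_little_endian())
        guard &= ~(uint64_t)0xff;          /* lowest byte, lowest address */
    else
        guard &= ~((uint64_t)0xff << 56);  /* highest byte, lowest address */
    return guard;
}
```

On either byte order the byte at the lowest address ends up as 0x00, which is why a canary printed in gdb on x86-64 always ends in 00.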
Besides security_init, __libc_start_main can also generate the canary (this path applies to statically linked programs, where no ld.so runs beforehand):
LIBC_START_MAIN (int (*main) (int, char **, char ** MAIN_AUXVEC_DECL),
int argc, char **argv,
#ifdef LIBC_START_MAIN_AUXVEC_ARG
ElfW(auxv_t) *auxvec,
#endif
__typeof (main) init,
void (*fini) (void),
void (*rtld_fini) (void), void *stack_end)
{
/*......*/
/* Set up the stack checker's canary. */
uintptr_t stack_chk_guard = _dl_setup_stack_chk_guard (_dl_random);
# ifdef THREAD_SET_STACK_GUARD
THREAD_SET_STACK_GUARD (stack_chk_guard);
# else
__stack_chk_guard = stack_chk_guard;
# endif
# ifdef DL_SYSDEP_OSCHECK
if (!__libc_multiple_libcs)
{
/* This needs to run to initiliaze _dl_osversion before TLS
setup might check it. */
DL_SYSDEP_OSCHECK (__libc_fatal);
}
# endif
/* Initialize libpthread if linked in. */
if (__pthread_initialize_minimal != NULL)
__pthread_initialize_minimal ();
/* Set up the pointer guard value. */
uintptr_t pointer_chk_guard = _dl_setup_pointer_guard (_dl_random,
stack_chk_guard);
/*......*/
}
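Now that we know where the canary is initialized, let's look at the fs register. On x86-64 Linux, fs points to the TLS structure (the TCB) of the current thread, and fs:0x28 is exactly its stack_guard field: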
//tls.h
typedef struct
{
void *tcb; /* Pointer to the TCB. Not necessarily the
thread descriptor used by libpthread. */
dtv_t *dtv;
void *self; /* Pointer to the thread descriptor. */
int multiple_threads;
uintptr_t sysinfo;
uintptr_t stack_guard;
uintptr_t pointer_guard;
int gscope_flag;
/* Bit 0: X86_FEATURE_1_IBT.
Bit 1: X86_FEATURE_1_SHSTK.
*/
unsigned int feature_1;
/* Reservation of some values for the TM ABI. */
void *__private_tm[3];
/* GCC split stack support. */
void *__private_ss;
/* The lowest address of shadow stack, */
unsigned long ssp_base;
} tcbhead_t;
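The 0x28 offset can be checked from the struct layout. The sketch below mirrors only the leading fields of tcbhead_t (with `dtv_t *` replaced by `void *`) and computes the offset of stack_guard; `tcb_mirror`, `stack_guard_offset` and `read_stack_guard` are names I made up. The inline assembly is GCC/Clang syntax and only does something on x86-64 Linux; elsewhere the function simply returns 0.

```c
#include <stddef.h>
#include <stdint.h>

/* Mirror of the leading tcbhead_t fields quoted above. */
struct tcb_mirror {
    void *tcb;
    void *dtv;
    void *self;
    int multiple_threads;
    uintptr_t sysinfo;
    uintptr_t stack_guard;
    uintptr_t pointer_guard;
};

/* Offset of stack_guard in the mirror: 0x28 when pointers are 8 bytes. */
size_t stack_guard_offset(void)
{
    return offsetof(struct tcb_mirror, stack_guard);
}

/* Read the calling thread's canary straight from the TCB.
   Assumption: x86-64 Linux; on other targets this sketch returns 0. */
uintptr_t read_stack_guard(void)
{
#if defined(__x86_64__) && defined(__linux__)
    uintptr_t value;
    __asm__ volatile ("movq %%fs:0x28, %0" : "=r"(value));
    return value;
#else
    return 0;
#endif
}
```

Printing `read_stack_guard()` with `%lx` inside a test program and comparing it with the value gdb shows at rbp-8 is a quick way to confirm that both come from the same field.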
Now go one level up from security_init, to dl_main:
// rtld.c
static void
dl_main (const ElfW(Phdr) *phdr, ElfW(Word) phnum, ElfW(Addr) *user_entry, ElfW(auxv_t) *auxv)
{
/*......*/
void *tcbp = NULL;
/*......*/
bool need_security_init = true;
if (__glibc_unlikely (audit_list != NULL)
|| __glibc_unlikely (audit_list_string != NULL))
{
/* Since we start using the auditing DSOs right away we need to
initialize the data structures now. */
tcbp = init_tls ();
/* Initialize security features. We need to do it this early
since otherwise the constructors of the audit libraries will
use different values (especially the pointer guard) and will
fail later on. */
security_init ();
need_security_init = false;
load_audit_modules (main_map);
}
/*......*/
}
// rtld.c
static void * init_tls (void)
{
/* Number of elements in the static TLS block. */
GL(dl_tls_static_nelem) = GL(dl_tls_max_dtv_idx);
/* Do not do this twice. The audit interface might have required
the DTV interfaces to be set up early. */
if (GL(dl_initial_dtv) != NULL)
return NULL;
/* Allocate the array which contains the information about the
dtv slots. We allocate a few entries more than needed to
avoid the need for reallocation. */
size_t nelem = GL(dl_tls_max_dtv_idx) + 1 + TLS_SLOTINFO_SURPLUS;
/* Allocate. */
GL(dl_tls_dtv_slotinfo_list) = (struct dtv_slotinfo_list *)
calloc (sizeof (struct dtv_slotinfo_list)
+ nelem * sizeof (struct dtv_slotinfo), 1);
/* No need to check the return value. If memory allocation failed
the program would have been terminated. */
struct dtv_slotinfo *slotinfo = GL(dl_tls_dtv_slotinfo_list)->slotinfo;
GL(dl_tls_dtv_slotinfo_list)->len = nelem;
GL(dl_tls_dtv_slotinfo_list)->next = NULL;
/* Fill in the information from the loaded modules. No namespace
but the base one can be filled at this time. */
assert (GL(dl_ns)[LM_ID_BASE + 1]._ns_loaded == NULL);
int i = 0;
for (struct link_map *l = GL(dl_ns)[LM_ID_BASE]._ns_loaded; l != NULL;
l = l->l_next)
if (l->l_tls_blocksize != 0)
{
/* This is a module with TLS data. Store the map reference.
The generation counter is zero. */
slotinfo[i].map = l;
/* slotinfo[i].gen = 0; */
++i;
}
assert (i == GL(dl_tls_max_dtv_idx));
/* Compute the TLS offsets for the various blocks. */
_dl_determine_tlsoffset ();
/* Construct the static TLS block and the dtv for the initial
thread. For some platforms this will include allocating memory
for the thread descriptor. The memory for the TLS block will
never be freed. It should be allocated accordingly. The dtv
array can be changed if dynamic loading requires it. */
void *tcbp = _dl_allocate_tls_storage ();
if (tcbp == NULL)
_dl_fatal_printf ("\
cannot allocate TLS data structures for initial thread\n");
/* Store for detection of the special case by __tls_get_addr
so it knows not to pass this dtv to the normal realloc. */
GL(dl_initial_dtv) = GET_DTV (tcbp);
/* And finally install it for the main thread. */
const char *lossage = TLS_INIT_TP (tcbp);
if (__glibc_unlikely (lossage != NULL))
_dl_fatal_printf ("cannot set up thread-local storage: %s\n", lossage);
tls_init_tp_called = true;
return tcbp;
}
Here we can see how the TCB is allocated (_dl_allocate_tls_storage) and installed for the main thread (TLS_INIT_TP).
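Canary-related attack techniques
The __stack_chk_fail function
When the canary has been tampered with, the program calls __stack_chk_fail.
Hijacking the GOT
If we can write to the GOT (which requires that the binary is not built with Full RELRO), we can hijack the GOT entry of this function and take over control flow.
Example challenge: ZCTF2017 Login
Stack Smash | SSP (Stack Smashing Protector) leak
When the check fails, the program prints an error message. Here is the source behind __stack_chk_fail again: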
//glibc2.23
void __attribute__ ((noreturn)) internal_function __fortify_fail (const char *msg)
{
/* The loop is added only to keep gcc happy. */
while (1)
__libc_message (2, "*** %s ***: %s terminated\n",
msg, __libc_argv[0] ?: "<unknown>");
}
// glibc2.27
void __attribute__ ((noreturn)) __fortify_fail (const char *msg)
{
__fortify_fail_abort (true, msg);
}
void __attribute__ ((noreturn)) __fortify_fail_abort (_Bool need_backtrace, const char *msg)
{
/* The loop is added only to keep gcc happy. Don't pass down
__libc_argv[0] if we aren't doing backtrace since __libc_argv[0]
may point to the corrupted stack. */
while (1)
__libc_message (need_backtrace ? (do_abort | do_backtrace) : do_abort,
"*** %s ***: %s terminated\n",
msg,
(need_backtrace && __libc_argv[0] != NULL
? __libc_argv[0] : "<unknown>"));
}
// glibc 2.31 and later
void __attribute__ ((noreturn)) __fortify_fail (const char *msg)
{
/* The loop is added only to keep gcc happy. */
while (1)
__libc_message (do_abort, "*** %s ***: terminated\n", msg);
}
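In glibc 2.23 and glibc 2.27 the __fortify_fail path prints the string that __libc_argv[0] points to (from glibc 2.31 on, __fortify_fail no longer prints __libc_argv[0] at all). Normally this pointer refers to the program name, which is stored on the stack. When the overflow is very long, we can overwrite the argv[0] pointer so that the error message prints the data we want to leak. Be aware that, as far as I know, from glibc 2.26 on __stack_chk_fail itself calls __fortify_fail_abort with need_backtrace set to false, so a stack smash prints <unknown> instead of argv[0]; in practice the leak works on glibc 2.23-era targets.
- Example challenges: 32C3CTF readme, 34C3CTF readme-revenge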
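Byte-by-byte brute force
The canary is different each time a process is started, but all threads within one process share the same canary, and a child process created with fork also has the same canary as its parent, because fork copies the parent's memory. For a service that forks a child per connection we can use this to brute-force the canary one byte at a time: a wrong guess only crashes that child, a correct guess does not, so each byte takes at most 256 attempts.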
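The loop can be sketched without any network code by replacing the forked child with a callback. Everything here is a simulation under that assumption: `survives_fn`, `toy_child_survives` and `brute_force_canary` are hypothetical names, and in a real exploit the callback would send the padding plus the guessed bytes and report whether the connection stayed alive.

```c
#include <stddef.h>
#include <string.h>

#define CANARY_LEN 8

/* Stand-in for "send a guess to a forked child and see if it survives". */
typedef int (*survives_fn)(const unsigned char *guess, size_t len,
                           const void *ctx);

/* Simulated child: survives only if the first len bytes match its canary,
   which is passed in through ctx. */
int toy_child_survives(const unsigned char *guess, size_t len, const void *ctx)
{
    return memcmp(guess, ctx, len) == 0;
}

/* Recover the canary one byte at a time. Returns the number of attempts
   used, or -1 if no value worked for some byte. */
long brute_force_canary(survives_fn survives, const void *ctx,
                        unsigned char out[CANARY_LEN])
{
    long attempts = 0;

    for (size_t i = 0; i < CANARY_LEN; i++) {
        int found = 0;
        for (int b = 0; b < 256; b++) {
            out[i] = (unsigned char)b;
            attempts++;
            if (survives(out, i + 1, ctx)) {
                found = 1;
                break;
            }
        }
        if (!found)
            return -1;
    }
    return attempts;
}
```

The worst case is 8 × 256 = 2048 attempts, compared with 2^56 for guessing the seven unknown bytes at once, and that gap is what makes the attack practical against forking servers.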
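Null-byte overwrite
By design the lowest byte of the canary is a null byte. On little-endian machines it sits at the lowest address, directly after the local variables, so that it terminates strings and keeps the locals separate from the rest of the canary when they are printed. We can therefore overwrite that null byte first and then have the program print the local variable, which prints the remaining canary bytes along with it.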
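A minimal simulation of this leak, assuming a made-up frame of an 8-byte buffer followed directly by the canary: `leak_frame` and `leak_canary` are illustrative names, `strlen` stands in for what a `%s`-style print would output, and the zeroed `after` field guarantees that the simulated print stops inside the struct.

```c
#include <stddef.h>
#include <string.h>

#define LEAK_BUF_LEN 8
#define LEAK_CANARY_LEN 8

/* Toy frame: buf, then the canary, then zero bytes standing in for
   whatever follows on the real stack. */
struct leak_frame {
    unsigned char buf[LEAK_BUF_LEN];
    unsigned char canary[LEAK_CANARY_LEN];
    unsigned char after[8];
};

/* Fill buf plus one extra byte, which lands on the canary's null byte,
   then copy out what a string print of buf would reveal. Returns the
   number of canary bytes recovered, counting the first byte, which is
   known to be 0 and is filled in by hand. */
size_t leak_canary(struct leak_frame *f, unsigned char out[LEAK_CANARY_LEN])
{
    memset(f->buf, 'A', LEAK_BUF_LEN);
    f->canary[0] = 'A';                       /* the one-byte overflow */

    size_t printed = strlen((const char *)f); /* length a %s print would have */
    size_t leaked = printed > LEAK_BUF_LEN + 1
                        ? printed - (LEAK_BUF_LEN + 1) : 0;

    out[0] = 0x00;
    memcpy(out + 1, (const unsigned char *)f + LEAK_BUF_LEN + 1, leaked);
    return leaked + 1;
}
```

The frame is left with a corrupted canary, so a real exploit has to write the recovered value back in a later overflow before the function returns. If the random part of the canary happens to contain another null byte, the print stops early and fewer bytes come back.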
The TLS structure
Because the canary value is stored in the TLS structure, we can leak the stack_guard field of tcbhead_t to obtain the canary, or overwrite the stack_guard field to set a canary of our own choosing.
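The second option works because the epilogue only compares two memory locations with each other. A toy model, with `toy_tcb`, `toy_thread_frame`, `toy_check` and `toy_forge` all invented for this sketch:

```c
#include <stdint.h>

/* Stand-in for the TCB field at fs:0x28. */
struct toy_tcb {
    uint64_t stack_guard;
};

/* Stand-in for the protected part of a stack frame. */
struct toy_thread_frame {
    uint64_t canary;
    uint64_t ret_addr;
};

/* The epilogue check: the frame copy must equal the TCB copy. */
int toy_check(const struct toy_thread_frame *f, const struct toy_tcb *tcb)
{
    return f->canary == tcb->stack_guard;
}

/* What an attacker with a write primitive covering both locations does:
   write the same forged value to the frame and to the TCB. */
void toy_forge(struct toy_thread_frame *f, struct toy_tcb *tcb,
               uint64_t forged, uint64_t new_ret)
{
    f->canary = forged;
    f->ret_addr = new_ret;
    tcb->stack_guard = forged;
}
```

Changing only the frame copy makes `toy_check` fail, while changing both makes it pass even though the original secret was never learned.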
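Bypass via the C++ Exception mechanism
For the details of how C++ exception handling works, see ☁☁'s blog.
What we need to know is that when a function throws an exception, the program unwinds the call chain upward from the current function until it finds a matching catch. After the catch block has run, execution continues in the function that contains that catch. At first I thought this was some very fancy technique, but it is actually simple and comes down to two points:
- The function that throws is unwound without its canary being checked, because it never reaches its normal epilogue.
- Functions containing the try and catch keywords have no canary by default (unless compiled with -fstack-protector-all).
If the function containing try and catch has no canary and the overflow is long enough, we can build a ROP chain directly (it takes effect when the function containing the catch returns). If the overflow is not long enough, we can pivot the stack instead (again taking effect when the function containing the catch returns).
The key is to look at the function that contains the catch and at whether that function has a canary.
Example challenges: Shanghai-DCTF-2017 线下攻防Pwn题, 第五空间CTF的toolkit学到的两个利用技巧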
References
《CTF竞赛权威指南——PWN篇》, p. 58, section 4.2 Stack Canary