AARCH64 native problem debug.
Agenda
AARCH64 Native Debug Knowledge
ARMv8-A A64 ISA Overview
Generic Registers
Addressing modes
Address Mode | example | Explanation |
---|---|---|
Simple | LDR W0, [X1] | X1 is not changed |
offset | LDR W0, [X1, #4] | X1 is not changed |
Pre-indexed | LDR W0, [X1, #4]! | X1 is changed before load |
Post-indexed | LDR W0, [X1], #4 | X1 is changed after load |
Programmer’s Guide for ARMv8-A
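The table above can be restated as a small toy model. This is only an illustrative sketch: the function name `effective_address` and the base value `0x1000` are assumptions made for the example, not part of any Arm tool.

```python
# Toy model of the four A64 load addressing modes from the table above.
# For each mode it returns the address that is accessed and the value
# left in the base register (X1) afterwards.

def effective_address(mode, base, imm=0):
    """Return (access_address, new_base) for one addressing mode."""
    if mode == "simple":        # LDR W0, [X1]
        return base, base
    if mode == "offset":        # LDR W0, [X1, #imm]
        return base + imm, base
    if mode == "pre-indexed":   # LDR W0, [X1, #imm]!
        return base + imm, base + imm
    if mode == "post-indexed":  # LDR W0, [X1], #imm
        return base, base + imm
    raise ValueError("unknown addressing mode: %s" % mode)


x1 = 0x1000  # hypothetical base register value
for mode in ("simple", "offset", "pre-indexed", "post-indexed"):
    imm = 0 if mode == "simple" else 4
    addr, new_x1 = effective_address(mode, x1, imm)
    print("%-12s load from 0x%x, X1 afterwards 0x%x" % (mode, addr, new_x1))
```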
Exception Types
- Interrupts
- Happen when the FIQ or IRQ physical signal is sent to the CPU core.
- Aborts
- Generated by failed instruction fetches (instruction aborts) or failed data accesses (Data Aborts).
- Reset
- For Reset Signal
- Exception-generating instructions
- SVC, HVC, SMC, etc.
Exception Flow
Exception Register
- ELR_ELx
- Exception Link Register: the address to return to once the exception has been handled
- SPSR_ELx
- PSTATE backup
- FAR_ELx
- Fault Address Register
- ESR_ELx
- Exception Syndrome Register
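When reading a register dump, it helps to split ESR_ELx and SPSR_ELx into their fields. The sketch below is a minimal decoder, assuming only the architectural field positions (EC in bits [31:26], IL in bit 25, ISS in bits [24:0], and the AArch64 mode in SPSR bits [3:0]). The exception-class table is a deliberately small subset, and the helper names are made up for this example. The sample values are the `esr_el3` and `SPSR` values printed in the EL3 panic log later in this document.

```python
# Subset of ESR_ELx.EC values; the complete list is in the Arm ARM.
EXCEPTION_CLASSES = {
    0x00: "Unknown reason",
    0x15: "SVC instruction execution in AArch64 state",
    0x16: "HVC instruction execution in AArch64 state",
    0x17: "SMC instruction execution in AArch64 state",
    0x20: "Instruction Abort from a lower Exception level",
    0x21: "Instruction Abort taken without a change in Exception level",
    0x24: "Data Abort from a lower Exception level",
    0x25: "Data Abort taken without a change in Exception level",
    0x2F: "SError interrupt",
}

# SPSR_ELx.M[3:0] encodings when M[4] == 0 (exception taken from AArch64).
AARCH64_MODES = {
    0b0000: "EL0t", 0b0100: "EL1t", 0b0101: "EL1h",
    0b1000: "EL2t", 0b1001: "EL2h", 0b1100: "EL3t", 0b1101: "EL3h",
}


def decode_esr(esr):
    """Split an ESR_ELx value into EC, IL and ISS."""
    ec = (esr >> 26) & 0x3F
    return {
        "ec": ec,
        "il": (esr >> 25) & 0x1,
        "iss": esr & 0x1FFFFFF,
        "description": EXCEPTION_CLASSES.get(ec, "not in this subset"),
    }


def decode_spsr_mode(spsr):
    """Return the mode the exception was taken from (or will return to)."""
    if spsr & 0x10:
        return "AArch32"
    return AARCH64_MODES.get(spsr & 0xF, "reserved")


print(decode_esr(0x5E000000))
print(decode_spsr_mode(0x3C9))
```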
Synchronous and asynchronous exceptions
- Synchronous
- It is generated as a result of execution or attempted execution of the instruction stream, and where the return address provides details of the instruction that caused it.
- Asynchronous
- It is not generated by executing instructions, and the return address might not always provide details of what caused the exception.
Examples
- Asynchronous: IRQ, FIQ, SError. All of these are triggered by external physical signals.
- Synchronous: MMU data aborts, system calls, Secure Monitor calls, undefined instructions, debug exceptions, etc.
Procedure Call Standard
Arch | documentation |
ARM | Procedure Call Standard for the Arm® Architecture |
ARM64 | Procedure Call Standard for the Arm® 64-bit Architecture |
Debugger
GDB
- Symbols
- Control
- Thread, frame, signal
- viewing, showing
Command | Usage |
add-symbol-file | To add symbols to gdb |
set solib-search-path | Set the search path for loading non-absolute shared library symbol files. |
set sysroot | Set an alternate system root. |
Command | Usage |
b | Set breakpoint at specified location. |
hb | Set a hardware assisted breakpoint. |
finish | run until selected stack frame returns |
return | pop selected stack frame without executing |
Command | Usage |
info threads | Display currently known threads. |
thread | Use this command to switch between threads. |
info frame | All about the selected stack frame. |
frame | Select and print a stack frame. |
signal | Continue program with the specified signal. |
commands | Execute a GDB command list every time breakpoint n is reached. |
Command | Usage |
set pagination | Set state of GDB output pagination. |
set substitute-path | Add a substitution rule to rewrite the source directories. |
whatis | show data type of expr [or $] without evaluating |
ptype | describe type, struct, union, or enum |
#!/usr/bin/env python
# -*- coding: utf-8 -*-
#
# load-linux-init.py --- Load linux kernel init section
#
# Copyright (C) 2020, schspa, all rights reserved.
#
# This program is free software; you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation; either version 2 of the License, or
# (at your option) any later version.
#
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU General Public License
# along with this program. If not, see <http://www.gnu.org/licenses/>.
#
import re
import gdb
from subprocess import check_output, CalledProcessError
class LoadLinuxKernelInitCommand(gdb.Command):
    "Load Linux Kernel init text from vmlinux"

    def __init__(self):
        super(LoadLinuxKernelInitCommand,
              self).__init__("load-linux-init", gdb.COMMAND_SUPPORT,
                             gdb.COMPLETE_EXPRESSION, True)

    def get_load_address(self, path, section):
        '''
        Return the link address of SECTION in the ELF file PATH, parsed
        from `readelf -WS` output such as:
        [Nr] Name Type Address Off Size ES Flg Lk Inf Al
        [ 0] NULL 0000000000000000 000000 000000 00 0 0 0
        [ 1] .head.text PROGBITS ffffff8008080000 010000 001000 00 AX 0 0 4096
        [ 2] .text PROGBITS ffffff8008081000 011000 5ee1b0 00 AX 0 0 2048
        '''
        ELF_PATTERN = re.compile(r"^[\s]*\[(?P<NUM>[0-9 ]+)\]" +
                                 r"[\s]+(?P<NAME>[\S]+)" +
                                 r"[\s]+(?P<Type>[\S]+)" +
                                 r"[\s]+(?P<Address>[\S]+)" +
                                 r"[\s]+(?P<Off>[\S]+)" +
                                 r"[\s]+(?P<Size>[\S]+)" +
                                 r"[\s]+(?P<ES>[\S]+)" +
                                 r"[\s]+(?P<Flg>[\S]+)")
        command = r"readelf -WS " + path + r"| grep -E '[[:space:]]+'" + section
        try:
            output = check_output(command, shell=True).decode("utf-8")
        except CalledProcessError:
            # grep exits with a non-zero status when the section is missing
            raise gdb.GdbError("section %s not found in %s" % (section, path))
        obj = re.search(ELF_PATTERN, output)
        if obj is None:
            raise gdb.GdbError("cannot parse readelf output for %s" % section)
        return int(obj.group('Address'), 16)

    def invoke(self, arg, from_tty):
        argv = gdb.string_to_argv(arg)
        if len(argv) != 2:
            raise gdb.GdbError("load-linux-init takes two arguments: <vmlinux> <load address>")
        load_addr = int(str(gdb.parse_and_eval(argv[1])))
        print("Loading linux init head to 0x%016x" % (load_addr))
        head_init_addr = self.get_load_address(argv[0], ".head.text")
        print("Original linux text address at 0x%016x" % (head_init_addr))
        # Distance between the link (virtual) address and the load address
        offset = head_init_addr - load_addr
        text_addr = self.get_load_address(argv[0], ".text") - offset
        init_text_addr = self.get_load_address(argv[0], ".init.text") - offset
        command = "add-symbol-file {:s} 0x{:x} -s .head.text 0x{:x} -s .init.text 0x{:x}".format(
            argv[0], text_addr, load_addr, init_text_addr)
        print("load linux image to physical address with command {:s}".format(
            command))
        gdb.execute(command)


LoadLinuxKernelInitCommand()
# Local Variables:
# indent-tabs-mode: t
# tab-width: 8
# End:
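The address arithmetic in the script can be tried outside GDB. The following is a minimal sketch under stated assumptions: the two section rows are copied from the script's docstring, while the load address `0x80080000`, the file name `vmlinux` in the printed command, and the helper names are hypothetical and chosen only for illustration.

```python
import re

# Two rows copied from the `readelf -WS` sample in the script's docstring.
SAMPLE = """\
  [ 1] .head.text PROGBITS ffffff8008080000 010000 001000 00 AX 0 0 4096
  [ 2] .text PROGBITS ffffff8008081000 011000 5ee1b0 00 AX 0 0 2048
"""

# [Nr] Name Type Address ...
ROW = re.compile(
    r"^\s*\[\s*\d+\]\s+(?P<name>\S+)\s+\S+\s+(?P<addr>[0-9a-fA-F]+)",
    re.MULTILINE)


def section_addresses(readelf_output):
    """Map section name -> link address."""
    return {m.group("name"): int(m.group("addr"), 16)
            for m in ROW.finditer(readelf_output)}


def relocate(link_addrs, anchor, load_addr):
    """Shift every section by the distance between the anchor section's
    link address and the address it was actually loaded at."""
    offset = link_addrs[anchor] - load_addr
    return {name: addr - offset for name, addr in link_addrs.items()}


link = section_addresses(SAMPLE)
phys = relocate(link, ".head.text", 0x80080000)  # hypothetical load address
print("add-symbol-file vmlinux 0x%x -s .head.text 0x%x"
      % (phys[".text"], phys[".head.text"]))
```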
Trace32
- Linux Attach
Mainly used to view the stack traces of all tasks in a Linux system
SET mypath=%~dp0
start C:\T32\bin\windows64\t32MARM64.exe -s %mypath%\linux-attach.cmm "%mypath%" "U:/vmlinux" "/Volumes/work/j5/kernel/" U:/j5/kernel
DS5
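- Mainly used to debug bare-metal code and simple OSes; its Linux support is very poor, practically nonexistent
- Load the image first and only then set breakpoints; breakpoints set before the image is loaded have no effect
Function | Description |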
add-symbol-file | Loads additional debug information into the debugger. |
set substitute-path | Modifies the search paths used by the debugger when it executes any of the commands that look up and display source code. |
reload-symbol-file | Reloads debug information from an already loaded image into the debugger using the same settings as the original load operation. |
discard-symbol-file | Discards debug information relating to a specific file. |
DS5 script
#!/usr/bin/env python2
# -*- coding: utf-8 -*-
import os
import sys
import struct
file_path = os.path.abspath(os.path.dirname(__file__))
ms = None
try:
    from arm_ds.debugger_v1 import Debugger
    from arm_ds.debugger_v1 import DebugException
    # Debugger object for accessing the debugger
    debugger = Debugger()
    # Initialisation commands: stop the target before reading memory
    ec = debugger.getExecutionContext(0)
    ec.getExecutionService().stop()
    ec.getExecutionService().waitForStop()
    ms = ec.getMemoryService()
except BaseException as e:
    print(e)

if __name__ == '__main__':
    if len(sys.argv) < 3:
        print("Usage: source %s <address for dtb with base 16> <file_path>" % (sys.argv[0]))
        sys.exit(-1)
    if ms is None:
        print("No memory service available; run this script inside DS5")
        sys.exit(-3)
    dtb_addr = int(sys.argv[1], 16)
    print("Attempting to dump dtb from 0x%016x" % (dtb_addr))
    # The FDT header starts with magic, totalsize and off_dt_struct,
    # all big-endian 32-bit words (the version field comes later).
    magic, totalsize, off_dt_struct = struct.unpack('>III', ms.read(dtb_addr, 12))
    print("magic: 0x%08x, totalsize: 0x%08x, off_dt_struct: 0x%08x" % (magic, totalsize, off_dt_struct))
    if magic != 0xd00dfeed:
        print("Bad FDT magic, nothing dumped")
        sys.exit(-2)
    ms.dump(sys.argv[2], 'binary', dtb_addr, dtb_addr + totalsize)
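The header check in the DS5 script can be exercised without a target. Below is a hedged sketch that parses the 40-byte flattened devicetree header (version 17 layout) from a byte string. `parse_fdt_header` is a name invented for this example, and the sample header is fabricated test data, not a dump from the J5 board.

```python
import struct

FDT_MAGIC = 0xD00DFEED
FDT_HEADER = struct.Struct(">10I")  # ten big-endian 32-bit fields
FIELDS = (
    "magic", "totalsize", "off_dt_struct", "off_dt_strings",
    "off_mem_rsvmap", "version", "last_comp_version",
    "boot_cpuid_phys", "size_dt_strings", "size_dt_struct",
)


def parse_fdt_header(blob):
    """Return the FDT header fields as a dict, or raise ValueError."""
    if len(blob) < FDT_HEADER.size:
        raise ValueError("need at least %d bytes" % FDT_HEADER.size)
    header = dict(zip(FIELDS, FDT_HEADER.unpack_from(blob)))
    if header["magic"] != FDT_MAGIC:
        raise ValueError("bad FDT magic 0x%08x" % header["magic"])
    return header


# Fabricated header, for demonstration only.
sample = FDT_HEADER.pack(FDT_MAGIC, 0x8000, 0x38, 0x7000, 0x28,
                         17, 16, 0, 0x400, 0x6FC8)
hdr = parse_fdt_header(sample)
print("totalsize=0x%x version=%d" % (hdr["totalsize"], hdr["version"]))
```

Reading `totalsize` before dumping is what lets the script write out exactly one dtb and nothing after it.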
OS Debug
U-Boot
U-Boot Debug before relocate
U-Boot Debug after relocate
2019/10/22/uboot-online-debug.html
Linux
Linux Debug before MMU enable
Linux Debug after MMU enable
Debug linux kernel boot process
Native Crash Examples
StackFrame Analysis
Stack Frame
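Regular Function
- Allocate stack space
- Save the callee-saved registers
- Run the function body
- Call subroutines
- Handle the return value
- Restore the saved registers
- Restore the stack pointer
- Return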
Source
/*
* p1 and p2 should be directories on the same fs.
*/
struct dentry *lock_rename(struct dentry *p1, struct dentry *p2)
{
    struct dentry *p;

    if (p1 == p2) {
        inode_lock_nested(p1->d_inode, I_MUTEX_PARENT);
        return NULL;
    }

    mutex_lock(&p1->d_sb->s_vfs_rename_mutex);

    p = d_ancestor(p2, p1);
    if (p) {
        inode_lock_nested(p2->d_inode, I_MUTEX_PARENT);
        inode_lock_nested(p1->d_inode, I_MUTEX_CHILD);
        return p;
    }

    p = d_ancestor(p1, p2);
    if (p) {
        inode_lock_nested(p1->d_inode, I_MUTEX_PARENT);
        inode_lock_nested(p2->d_inode, I_MUTEX_CHILD);
        return p;
    }

    inode_lock_nested(p1->d_inode, I_MUTEX_PARENT);
    inode_lock_nested(p2->d_inode, I_MUTEX_PARENT2);
    return NULL;
}
EXPORT_SYMBOL(lock_rename);
Disassemble
(gdb) disassemble /s lock_rename
Dump of assembler code for function lock_rename:
fs/namei.c:
2839 {
2840 struct dentry *p;
2841
2842 if (p1 == p2) {
0xffffff800817e250 <+0>: stp x29, x30, [sp, #-48]! // allocate 48 bytes of stack and save FP/LR
0xffffff800817e254 <+4>: cmp x0, x1
0xffffff800817e258 <+8>: mov x29, sp // set up the frame pointer
0xffffff800817e25c <+12>: stp x19, x20, [sp, #16] // save the callee-saved registers
0xffffff800817e260 <+16>: mov x19, x0
0xffffff800817e264 <+20>: b.ne 0xffffff800817e27c <lock_rename+44> // b.any
./include/linux/fs.h:
748 down_write_nested(&inode->i_rwsem, subclass);
0xffffff800817e268 <+24>: ldr x0, [x0, #48] // first load p1->d_inode (check the offset with ptype /o struct dentry)
0xffffff800817e26c <+28>: mov x20, #0x0 // #0
0xffffff800817e270 <+32>: add x0, x0, #0xa0 // then compute &p1->d_inode->i_rwsem (check with ptype /o struct inode)
0xffffff800817e274 <+36>: bl 0xffffff80087846e8 <down_write> // call the subroutine; x1 is not set because down_write, the function finally called, takes only one argument
fs/namei.c:
2844 return NULL;
0xffffff800817e278 <+40>: b 0xffffff800817e2f0 <lock_rename+160>
2845 }
2846
2847 mutex_lock(&p1->d_sb->s_vfs_rename_mutex);
0xffffff800817e27c <+44>: str x21, [sp, #32]
0xffffff800817e280 <+48>: mov x21, x1
0xffffff800817e284 <+52>: ldr x0, [x0, #104]
0xffffff800817e288 <+56>: add x0, x0, #0x418
0xffffff800817e28c <+60>: bl 0xffffff8008783950 <mutex_lock>
2848
2849 p = d_ancestor(p2, p1);
0xffffff800817e290 <+64>: mov x1, x19
0xffffff800817e294 <+68>: mov x0, x21
0xffffff800817e298 <+72>: bl 0xffffff800818dec8 <d_ancestor>
0xffffff800817e29c <+76>: mov x20, x0 // the local variable p is never stored on the stack; it has been optimized into a register (x20)
2850 if (p) {
0xffffff800817e2a0 <+80>: cbz x0, 0xffffff800817e2c4 <lock_rename+116>
./include/linux/fs.h:
748 down_write_nested(&inode->i_rwsem, subclass);
0xffffff800817e2a4 <+84>: ldr x0, [x21, #48]
0xffffff800817e2a8 <+88>: add x0, x0, #0xa0
0xffffff800817e2ac <+92>: bl 0xffffff80087846e8 <down_write>
0xffffff800817e2b0 <+96>: ldr x0, [x19, #48]
0xffffff800817e2b4 <+100>: add x0, x0, #0xa0
0xffffff800817e2b8 <+104>: bl 0xffffff80087846e8 <down_write>
fs/namei.c:
2853 return p;
0xffffff800817e2bc <+108>: ldr x21, [sp, #32]
0xffffff800817e2c0 <+112>: b 0xffffff800817e2f0 <lock_rename+160>
2854 }
2855
2856 p = d_ancestor(p1, p2);
0xffffff800817e2c4 <+116>: mov x1, x21
0xffffff800817e2c8 <+120>: mov x0, x19
0xffffff800817e2cc <+124>: bl 0xffffff800818dec8 <d_ancestor>
0xffffff800817e2d0 <+128>: mov x20, x0
./include/linux/fs.h:
748 down_write_nested(&inode->i_rwsem, subclass);
0xffffff800817e2d4 <+132>: ldr x0, [x19, #48]
0xffffff800817e2d8 <+136>: add x0, x0, #0xa0
0xffffff800817e2dc <+140>: bl 0xffffff80087846e8 <down_write>
0xffffff800817e2e0 <+144>: ldr x0, [x21, #48]
0xffffff800817e2e4 <+148>: add x0, x0, #0xa0
0xffffff800817e2e8 <+152>: bl 0xffffff80087846e8 <down_write>
fs/namei.c:
2865 return NULL;
0xffffff800817e2ec <+156>: ldr x21, [sp, #32]
0xffffff800817e2f0 <+160>: mov x0, x20 // prepare the return value
0xffffff800817e2f4 <+164>: ldp x19, x20, [sp, #16] // restore the callee-saved registers
0xffffff800817e2f8 <+168>: ldp x29, x30, [sp], #48 // restore FP/LR and release the stack frame
0xffffff800817e2fc <+172>: ret // ret branches to the address held in LR (X30)
End of assembler dump.
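The prologue `stp x29, x30, [sp, #-48]!` followed by `mov x29, sp` leaves a two-word frame record at the address in x29: the caller's x29 at offset 0 and the return address at offset 8. A debugger can follow that chain to build a backtrace. The sketch below models this with a dictionary standing in for target memory; every address in `memory` is made up for the example.

```python
def walk_frame_records(memory, fp, max_depth=16):
    """Follow the x29 chain and return the saved return addresses.

    memory maps an address to the 64-bit value stored there.
    """
    trace = []
    while fp != 0 and len(trace) < max_depth:
        next_fp = memory[fp]      # caller's x29, saved at [fp]
        lr = memory[fp + 8]       # return address (x30), saved at [fp + 8]
        trace.append(lr)
        fp = next_fp
    return trace


# Hypothetical stack contents: two frame records, the outer one ends the chain.
memory = {
    0xffffff8008c03d00: 0xffffff8008c03d30,  # saved x29 of the caller
    0xffffff8008c03d08: 0xffffff800817f000,  # saved x30 (made-up address)
    0xffffff8008c03d30: 0x0,                 # no previous frame record
    0xffffff8008c03d38: 0xffffff8008084000,  # saved x30 (made-up address)
}

for depth, lr in enumerate(walk_frame_records(memory, 0xffffff8008c03d00)):
    print("#%d return address 0x%x" % (depth, lr))
```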
Variadic functions
- Same as a regular function, plus:
- Save registers X1-X7 to the stack
- Set up the va_list
Source
asmlinkage __visible int printk(const char *fmt, ...)
{
    va_list args;
    int r;

    va_start(args, fmt);
    r = vprintk_func(fmt, args);
    va_end(args);

    return r;
}
EXPORT_SYMBOL(printk);
Disassemble
(gdb) disassemble /s
Dump of assembler code for function printk:
kernel/printk/printk.c:
1990 {
=> 0xffff000008127454 <+0>: stp x29, x30, [sp, #-176]! sp = 0xffff00000805bcf0
0xffff000008127458 <+4>: mov w8, #0xffffffc8 // #-56 sp = 0xFFFF00000805bc40
0xffff00000812745c <+8>: mov x29, sp x29 = sp, fp = 0xFFFF00000805bc40
0xffff000008127460 <+12>: add x9, sp, #0x70 x9= 0xFFFF00000805bcb0
0xffff000008127464 <+16>: add x10, sp, #0xb0 x10 = 0xffff00000805bcf0 i.e. stack_top
0xffff000008127468 <+20>: str x19, [sp, #16] backup x19 to stack Local Variables
0xffff00000812746c <+24>: adrp x19, 0xffff0000092b9000 <page_wait_table+5376>
0xffff000008127470 <+28>: add x19, x19, #0x6c8 x19 = address of __stack_chk_guard
0xffff000008127474 <+32>: stp x10, x10, [sp, #72] prepare va_list args
0xffff000008127478 <+36>: str x9, [sp, #88] save x9 to 0xFFFF00000805BC98
0xffff00000812747c <+40>: ldr x9, [x19] load stack guard magic value to x9.
0xffff000008127480 <+44>: str x9, [sp, #104] store stack guard magic value to stack on 0xFFFF00000805BCA8.
0xffff000008127484 <+48>: mov x9, #0x0 // #0
0xffff000008127488 <+52>: stp w8, wzr, [sp, #96] set the initial __gr_offs
0xffff00000812748c <+56>: ldp x8, x9, [sp, #72]
0xffff000008127490 <+60>: stp x8, x9, [sp, #32]
0xffff000008127494 <+64>: ldp x8, x9, [sp, #88]
0xffff000008127498 <+68>: stp x1, x2, [sp, #120] store x1 ~ x7 into the GP Arg Save Area
0xffff00000812749c <+72>: add x1, sp, #0x20
0xffff0000081274a0 <+76>: stp x8, x9, [sp, #48]
0xffff0000081274a4 <+80>: stp x3, x4, [sp, #136]
0xffff0000081274a8 <+84>: stp x5, x6, [sp, #152]
0xffff0000081274ac <+88>: str x7, [sp, #168] the GP Arg Save Area is now set up and va_list args is ready.
(gdb) i r pc
pc 0xffff0000081274b0 0xffff0000081274b0 <printk+92>
(gdb) p args
$6 = {
__stack = 0xffff00000805bcf0,
__gr_top = 0xffff00000805bcf0,
__vr_top = 0xffff00000805bcb0,
__gr_offs = -56,
__vr_offs = 0
}
0xffff0000081274b0 <+92>: bl 0xffff000008127e28 <vprintk_func>
1996 va_end(args);
1997
1998 return r;
0xffff0000081274b4 <+96>: ldr x2, [sp, #104]
0xffff0000081274b8 <+100>: ldr x1, [x19]
0xffff0000081274bc <+104>: eor x1, x2, x1
0xffff0000081274c0 <+108>: cbz x1, 0xffff0000081274c8 <printk+116>
0xffff0000081274c4 <+112>: bl 0xffff0000080d3d48 <__stack_chk_fail>
0xffff0000081274c8 <+116>: ldr x19, [sp, #16]
0xffff0000081274cc <+120>: ldp x29, x30, [sp], #176
0xffff0000081274d0 <+124>: ret
End of assembler dump.
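The `p args` output above can be reproduced with a little arithmetic. The sketch below is a simplified model of how AAPCS64 `va_arg` locates 8-byte integer arguments: it walks the general-purpose register save area first and then falls back to the caller's stack. It assumes the layout seen in this printk prologue (the save area ends at the stack pointer value on entry) and ignores floating-point arguments. The class and function names are made up for the example.

```python
class VaList(object):
    """The five AAPCS64 va_list fields shown by `p args`."""

    def __init__(self, stack, gr_top, vr_top, gr_offs, vr_offs):
        self.stack = stack
        self.gr_top = gr_top
        self.vr_top = vr_top
        self.gr_offs = gr_offs
        self.vr_offs = vr_offs


def va_start(entry_sp, named_gp_args, vr_top):
    """Model va_start for a function whose GP save area ends at entry_sp."""
    return VaList(stack=entry_sp, gr_top=entry_sp, vr_top=vr_top,
                  gr_offs=-(8 - named_gp_args) * 8, vr_offs=0)


def va_arg_gp_address(ap):
    """Return the address of the next 8-byte integer argument."""
    offs = ap.gr_offs
    if offs < 0:                 # still inside the register save area
        ap.gr_offs = offs + 8
        return ap.gr_top + offs
    addr = ap.stack              # registers exhausted: read the caller's stack
    ap.stack += 8
    return addr


# Values from the printk example: sp on entry, one named argument (fmt).
ap = va_start(0xffff00000805bcf0, 1, 0xffff00000805bcb0)
print("__gr_offs = %d" % ap.gr_offs)
for i in range(8):
    print("variadic arg %d at 0x%x" % (i, va_arg_gp_address(ap)))
```

The first address printed is sp + 120 in the new frame, which is where the prologue stored x1.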
Crash Analysis
U-Boot Crash Due to firewall
- J5 memory Firewall
- DS5 breakpoints
- DS5 scripts
- dtb overlay
Analyzing the direct cause of the problem
First, look at the failure log
U-Boot SPL 2018.09-00970-ga8d5cac867 (Apr 28 2021 - 10:52:12 +0800)
BUILD_FLAGS: M:-1 __ AV:0
Trying to boot from HB_BLK
NOTICE: BL31: v2.1(debug):j5-fiq-test-37-gb75ca9a
NOTICE: BL31: Built : 04:17:41, Jan 28 2021
INFO: GICv3 without legacy support detected. ARM GICv3 driver initialized in EL3
INFO: GICv3 GICTNS support enable.
INFO: BL31: Initialising Exception Handling Framework
INFO: BL31: Initializing runtime services
INFO: hobotd malloc pool setup at 0x80274000, size:0x200000
INFO: BL31: Preparing for EL3 exit to normal world
INFO: Entry point address = 0x88000000
INFO: SPSR = 0x3c9
INFO:
hb_firewall_handler: irq 88 occured in EL3
INFO: hb_firewall_handler: r: 1@0x0000000020000d14, w: 0@0x0000000020000000
BACKTRACE: START: hb_firewall_handler
0: EL3: 0x8003eccc
1: EL3: 0x800072e4
2: EL3: 0x8000db3c
3: EL3: 0x80044d54
BACKTRACE: END: hb_firewall_handler
PANIC in EL3.
x30 = 0x00000000800072f0
x0 = 0x0000000000000000
x1 = 0x0000000000000060
x2 = 0x0000000000000060
x3 = 0x0000000000000000
x4 = 0x0000000000000000
x5 = 0x0000000020000000
x6 = 0x00000000880c5dc9
x7 = 0x0000000000000003
x8 = 0x0000000000000001
x9 = 0x0000000041453020
x10 = 0x0000000000000735
x11 = 0x0000000082fbb9ac
x12 = 0x000000000000593c
x13 = 0x0000000082fbba6c
x14 = 0x00000000880bf2e8
x15 = 0x0000000082fbbd7c
x16 = 0x0000000000000000
x17 = 0x0000000000000000
x18 = 0x0000000000000000
x19 = 0x000000008808f000
x20 = 0x0000000080070190
x21 = 0x000000008000d9b8
x22 = 0x0000000000000000
x23 = 0x0000000000000000
x24 = 0x0000000000000000
x25 = 0x0000000000000000
x26 = 0x0000000000000000
x27 = 0x0000000000000000
x28 = 0x0000000000000000
x29 = 0x0000000080068e90
scr_el3 = 0x0000000000000735
sctlr_el3 = 0x0000000030cd183f
cptr_el3 = 0x0000000000000000
tcr_el3 = 0x000000008081351d
daif = 0x00000000000002c0
mair_el3 = 0x00000000004404ff
spsr_el3 = 0x00000000800003c9
elr_el3 = 0x000000008802b018
ttbr0_el3 = 0x000000008007a801
esr_el3 = 0x000000005e000000
far_el3 = 0x0000000000000000
spsr_el1 = 0x0000000000000000
elr_el1 = 0x0000000000000000
spsr_abt = 0x0000000000000000
spsr_und = 0x0000000000000000
spsr_irq = 0x0000000000000000
spsr_fiq = 0x0000000000000000
sctlr_el1 = 0x0000000030d00800
actlr_el1 = 0x0000000000000000
cpacr_el1 = 0x0000000000000000
csselr_el1 = 0x0000000000000000
sp_el1 = 0x0000000000000000
esr_el1 = 0x0000000000000000
ttbr0_el1 = 0x0000000000000000
ttbr1_el1 = 0x0000000000000000
mair_el1 = 0x0000000000000000
amair_el1 = 0x0000000000000000
tcr_el1 = 0x0000000000000000
tpidr_el1 = 0x0000000000000000
tpidr_el0 = 0x0000000000000000
tpidrro_el0 = 0x0000000000000000
par_el1 = 0x000000000000080d
mpidr_el1 = 0x0000000081000000
afsr0_el1 = 0x0000000000000000
afsr1_el1 = 0x0000000000000000
contextidr_el1 = 0x0000000000000000
vbar_el1 = 0x0000000000000000
cntp_ctl_el0 = 0x0000000000000000
cntp_cval_el0 = 0x0000000000000000
cntv_ctl_el0 = 0x0000000000000000
cntv_cval_el0 = 0x0000000000000000
cntkctl_el1 = 0x0000000000000000
sp_el0 = 0x0000000080068e90
isr_el1 = 0x0000000000000000
dacr32_el2 = 0x0000000000000000
ifsr32_el2 = 0x0000000000000000
cpuectlr_el1 = 0x000000002808bc00
- Firewall interrupt
hb_firewall_handler: irq 88 occured in EL3
INFO: hb_firewall_handler: r: 1@0x0000000020000d14, w: 0@0x0000000020000000
BACKTRACE: START: hb_firewall_handler
0: EL3: 0x8003eccc
1: EL3: 0x800072e4
2: EL3: 0x8000db3c
3: EL3: 0x80044d54
Here, only the addresses printed on the second line matter: r: 1@0x0000000020000d14, w: 0@0x0000000020000000
INFO("%s: r: %u@0x%016llx, w: %u@0x%016llx \n", __func__,
r_master, r_far, w_master, w_far);
r: read
1@0x0000000020000d14: master 1 was blocked while reading address 0x0000000020000d14
master 1: MFW_MASTER_ID_CPU
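When many logs have to be triaged, the firewall line can be picked apart mechanically. This is a small sketch that assumes the line keeps the `r: <master>@<addr>, w: <master>@<addr>` shape produced by the INFO format string above. The regular expression and helper name are made up for the example, and only master 1 is named because that is the only ID the text identifies.

```python
import re

FIREWALL_RE = re.compile(
    r"r:\s*(?P<r_master>\d+)@(?P<r_far>0x[0-9a-fA-F]+),\s*"
    r"w:\s*(?P<w_master>\d+)@(?P<w_far>0x[0-9a-fA-F]+)")

# Only master 1 is identified in the text; other IDs are left unnamed.
MASTER_NAMES = {1: "MFW_MASTER_ID_CPU"}


def parse_firewall_line(line):
    """Return the read/write master IDs and fault addresses, or None."""
    m = FIREWALL_RE.search(line)
    if m is None:
        return None
    return {
        "r_master": int(m.group("r_master")),
        "r_far": int(m.group("r_far"), 16),
        "w_master": int(m.group("w_master")),
        "w_far": int(m.group("w_far"), 16),
    }


line = ("INFO: hb_firewall_handler: r: 1@0x0000000020000d14, "
        "w: 0@0x0000000020000000")
info = parse_firewall_line(line)
print("read blocked: %s at 0x%x"
      % (MASTER_NAMES.get(info["r_master"], "unknown master"), info["r_far"]))
```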
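- Trigger source
Because the firewall reports violations through an interrupt, the ELR register here is only a hint, and the ESR and FAR registers carry no useful information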
elr_el3 = 0x000000008802b018
esr_el3 = 0x000000005e000000
far_el3 = 0x0000000000000000
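In this example there is no SMP and no interference from other interrupts, so ELR_EL3 is almost certainly close to the faulting instruction.
Disassembling U-Boot shows that the location is roughly where data is read through the regmap.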
Dump of assembler code for function regmap_mmio_read32le:
drivers/core/regmap-mmio.c:
27 {
28 struct regmap *map = ctx->map;
29 u32 *ptr = map_physmem(map->ranges[0].start + offset, 4, MAP_NOCACHE);
0x000000008802b008 <+0>: 00 00 40 f9 ldr x0, [x0]
0x000000008802b00c <+4>: e1 03 01 2a mov w1, w1
0x000000008802b010 <+8>: 00 18 40 f9 ldr x0, [x0, #48]
30
31 *valp = le32_to_cpu(readl(ptr));
0x000000008802b014 <+12>: 20 68 60 b8 ldr w0, [x1, x0]
0x000000008802b018 <+16>: bf 3f 03 d5 dmb sy
0x000000008802b01c <+20>: 40 00 00 b9 str w0, [x2]
32
33 return 0;
34 }
0x000000008802b020 <+24>: 00 00 80 52 mov w0, #0x0 // #0
0x000000008802b024 <+28>: c0 03 5f d6 ret
End of assembler dump.
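Because the firewall interrupt arrives slightly after the offending access, the instruction at ELR_EL3 and the one just before it are the ones worth inspecting. The sketch below only restates that lookup with the offsets from the listing above; the helper name and the choice to look back a single instruction are assumptions made for illustration.

```python
# (offset, instruction) pairs copied from the regmap_mmio_read32le listing.
LISTING = [
    (0, "ldr x0, [x0]"),
    (4, "mov w1, w1"),
    (8, "ldr x0, [x0, #48]"),
    (12, "ldr w0, [x1, x0]"),
    (16, "dmb sy"),
    (20, "str w0, [x2]"),
    (24, "mov w0, #0x0"),
    (28, "ret"),
]
FUNC_START = 0x8802b008
ELR_EL3 = 0x8802b018


def instructions_around(elr, func_start, listing, before=1):
    """Return the instruction at ELR plus `before` preceding instructions."""
    offset = elr - func_start
    return [(off, text) for off, text in listing
            if offset - 4 * before <= off <= offset]


for off, text in instructions_around(ELR_EL3, FUNC_START, LISTING):
    print("<+%d>: %s" % (off, text))
```

Here the instruction just before ELR_EL3 is the `ldr w0, [x1, x0]` that implements `readl(ptr)`.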
Analyzing why the dtb overlay failed
- Source code
int merge_dtb(void *dst, void *src)
{
    u32 fdt_size;
    int ret;

    if (dst == NULL || src == NULL)
        return 0;

    /**
     * in FPGA environment, dtb will be load to memory directly.
     * In this case, dst will be same as src, because we use same dtbs
     */
    if (dst == src)
        return 0;

    if (fdt_magic(dst) != FDT_MAGIC || fdt_magic(src) != FDT_MAGIC) {
        return -EINVAL;
    }

    /**
     * overlay device tree properties
     * there is some services populated by secure monitor.
     * overlay our base device tree.
     */
    fdt_size = fdt_totalsize(dst);
    assert(fdt_size <= sizeof(backup_buffer));
    memcpy(backup_buffer, dst, fdt_size);
    ret = fdt_overlay_apply_verbose(dst, src);
    if (ret) {
        pr_err("dtb overlay failed with status %d\n", ret);
        memcpy(dst, backup_buffer, fdt_size);
    }

    return ret;
}
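- Set breakpoints where the failure occurs
In DS5, only breakpoints set after the image has been loaded take effect, so take extra care when using them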
>hb spl_perform_fixups
Hardware breakpoint 2 at EL3:0x0000000020002F64
on file spl.c, line 412
on file spl.c, line 414
>c
Execution stopped in EL3h mode at breakpoint 2: EL3:0x0000000020002F64
In spl.c
Unable to read source file /home/bin.zhu/jenkins/workspace/platform_j5/j5_testing@2/uboot/board/hobot/j5/spl/spl.c
EL3:0x0000000020002F64 412,0 MOV w1,#0x19
>bt
#0 spl_perform_fixups(spl_image = (struct spl_image_info*) 0x2001BD08) at spl.c:412
#1 [board_init_r+0xEC]
#2 [_main+0x28]
>hb *EL2N:0x88000000
Hardware breakpoint 3 at EL2N:0x0000000088000000
>c
Execution stopped in EL2h mode at breakpoint 3: EL2N:0x0000000088000000
EL2N:0x0000000088000000 B {pc}+40 ; 0x88000028
>bt
#0 [EL2N:0x0000000088000000]
>add-symbol-file ~/symbols/boot/u-boot
>hb merge_dtb
Hardware breakpoint 4 at EL2N:0x00000000880039BC
on file dtb_overlay.c, line 33
- Debug the dtb overlay functions and use the DS5 script to dump the dtbs
>c
Execution stopped in EL2h mode at breakpoint 4: EL2N:0x00000000880039BC
In dtb_overlay.c
Unable to read source file /home/bin.zhu/jenkins/workspace/platform_j5/j5_testing@2/uboot/board/hobot/j5/dtb_overlay.c
EL2N:0x00000000880039BC 33,0 STP x29,x30,[sp,#-0x30]!
>i r x0 x1
X0 0x00000000880BF2E8
X1 0x000000008005FF40
>source ~/work/src/fpga-script/script/dump_dtbs.py 0x00000000880BF2E8 x0.dtb
>source ~/work/src/fpga-script/script/dump_dtbs.py 0x000000008005FF40 x1.dtb
>b puts
Breakpoint 5.1 at EL3:0x0000000020006044
on file console.c, line 537
Breakpoint 5.2 at EL2N:0x000000008801F8FC
on file console.c, line 537
on file console.c, line 548
>c
Execution stopped in EL2h mode at breakpoint 5.2: EL2N:0x000000008801F8FC
In console.c
EL2N:0x000000008801F8FC 537,0 {
>p s
$10 = 0x82FBBB58 "failed on fdt_overlay_apply(): FDT_ERR_NOTFOUND
"
>bt
#0 puts(s = 0x82FBBB58 "failed on fdt_overlay_apply(): FDT_ERR_NOTFOUND
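Why the dtb overlay failed
According to the log above, the failure is FDT_ERR_NOTFOUND, which is normally reported when a node that the overlay targets cannot be found.
The inputs of fdt_overlay_apply_verbose have already been dumped to files above, so the next step is to analyze these two dtb files.
- Decompile the dtbs
dtc -I dtb -O dts -o x0.dts x0.dtb
dtc -I dtb -O dts -o x1.dts x1.dtb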
- Relevant information from x0:
__symbols__ {
pmu = "/pmu@aon";
gic = "/interrupt-controller@58000000";
vmmcsd_fixed = "/fixedregulator";
vmcsd1_fixed = "/fixedregulator1";
tee_regmap = "/tee_regmap_services";
gclk3 = "/gclk3";
cspi0 = "/cspi@0x48020000";
nor = "/cspi@0x48020000/flash@0";
nand = "/cspi@0x48020000/flash@1";
hyper = "/cspi@0x48020000/flash@2";
hb_sci = "/hb_sci@boot_flags";
-decoder = "/hb_sci@sec_boot_flags";
ns,rclk-en = "/firmware@tee";
n = "/psci";
ess = "/reserved-memory";
};
- Relevant information from x1:
__fixups__ {
firmware = "/fragment@0:target:0";
reserved_memory = "/fragment@1:target:0";
psci = "/fragment@2:target:0";
tee_regmap = "/fragment@3:target:0";
hb_sci = "/fragment@4:target:0";
hb_sec_sci = "/fragment@5:target:0";
soc = "/fragment@6:target:0";
};
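By the way dtb overlays work, every entry in __fixups__ must have a matching entry in __symbols__ for the overlay to succeed.
From the results above it is easy to see that some entries in x0's __symbols__ are corrupted. x0 is the dtb embedded in the uboot.bin image, so try dumping that dtb from the image.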
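The matching rule can be checked mechanically. The sketch below compares the labels listed in x1's `__fixups__` with the labels in the x0 `__symbols__` excerpt shown above. The excerpt is partial, so a label such as `soc` may simply fall outside it; treat the result as a pointer to entries worth inspecting, not as proof. The helper name `unresolved` is made up for the example.

```python
# Labels required by the overlay (x1 __fixups__), copied from above.
FIXUP_LABELS = ["firmware", "reserved_memory", "psci", "tee_regmap",
                "hb_sci", "hb_sec_sci", "soc"]

# Labels present in the loaded base dtb (x0 __symbols__ excerpt above).
SYMBOLS_LOADED = {
    "pmu", "gic", "vmmcsd_fixed", "vmcsd1_fixed", "tee_regmap", "gclk3",
    "cspi0", "nor", "nand", "hyper", "hb_sci",
    "-decoder", "ns,rclk-en", "n", "ess",
}


def unresolved(fixup_labels, symbols):
    """Return the fixup labels that have no entry in __symbols__."""
    return [label for label in fixup_labels if label not in symbols]


missing = unresolved(FIXUP_LABELS, SYMBOLS_LOADED)
print("labels without a matching symbol:", ", ".join(missing))
```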
- dtb in image
find offset in uboot.img.bin
Section Headers:
[Nr] Name Type Address Off Size ES Flg Lk Inf Al
[ 0] NULL 0000000000000000 000000 000000 00 0 0 0
[ 1] .text PROGBITS 0000000088000000 010000 000128 00 AX 0 0 8
[ 2] .efi_runtime PROGBITS 0000000088000128 010128 000990 00 WAX 0 0 8
[ 3] .text_rest PROGBITS 0000000088001000 011000 07514c 00 AX 0 0 2048
[ 4] .rodata PROGBITS 0000000088076150 086150 0263a7 00 A 0 0 8
[ 5] .hash HASH 000000008809c4f8 0ac4f8 000018 04 A 0 0 8
[ 6] .data PROGBITS 000000008809c510 0ac510 00679e 00 WA 0 0 8
[ 7] .got PROGBITS 00000000880a2cb0 0b2cb0 000008 08 WA 0 0 8
[ 8] .got.plt PROGBITS 00000000880a2cb8 0b2cb8 000018 08 WA 0 0 8
[ 9] .u_boot_list PROGBITS 00000000880a2cd0 0b2cd0 003c70 00 WA 0 0 4
[10] .efi_runtime_rel RELA 00000000880a6940 0b6940 0001e0 18 A 0 0 8
[11] .rela.dyn RELA 00000000880a6b20 0b6b20 0187c8 18 A 0 0 8
[12] .dtbo_reserved NOBITS 00000000880bf2e8 0cf2e8 040000 00 WA 0 0 8
[13] .bss_start PROGBITS 00000000880ff2e8 10f2e8 000000 00 WA 0 0 8
[14] .bss NOBITS 00000000880ff300 10f2e8 05bb68 00 WA 0 0 64
[15] .bss_end PROGBITS 000000008815ae68 16ae68 000000 00 WA 0 0 8
[16] .debug_line PROGBITS 0000000000000000 16ae68 052cd9 00 0 0 1
[17] .debug_info PROGBITS 0000000000000000 1bdb41 32b788 00 0 0 1
[18] .debug_abbrev PROGBITS 0000000000000000 4e92c9 04f098 00 0 0 1
[19] .debug_aranges PROGBITS 0000000000000000 538370 00ebd0 00 0 0 16
[20] .comment PROGBITS 0000000000000000 546f40 000024 01 MS 0 0 1
[21] .debug_frame PROGBITS 0000000000000000 546f68 0251c0 00 0 0 8
[22] .debug_str PROGBITS 0000000000000000 56c128 02988d 01 MS 0 0 1
[23] .debug_loc PROGBITS 0000000000000000 5959b5 168f93 00 0 0 1
[24] .debug_ranges PROGBITS 0000000000000000 6fe950 0257d0 00 0 0 16
[25] .shstrtab STRTAB 0000000000000000 76a431 00011a 00 0 0 1
[26] .symtab SYMTAB 0000000000000000 724120 034398 18 27 7211 8
[27] .strtab STRTAB 0000000000000000 7584b8 011f79 00 0 0 1
Offset of the dtb in uboot.img.bin: 0x00000000880bf2e8 - 0x0000000088000000 + 0x240 = 0xbf528 (783656 in decimal)
dd if=uboot.img.bin of=dtb-in-image.dtb bs=1 skip=783656
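The offset calculation is easy to get wrong by a digit, so it is worth scripting. The sketch below recomputes the file offset 0xbf528 and cuts the dtb out of an image using the FDT `totalsize` field instead of copying to the end of the file. The parameter name `header_size` reflects an assumption that 0x240 is the size of a header placed in front of the U-Boot binary, which the text does not state. The image used here is fabricated test data.

```python
import struct

FDT_MAGIC = 0xD00DFEED


def file_offset(vaddr, load_base, header_size):
    """Translate a link address into an offset inside the image file."""
    return vaddr - load_base + header_size


def extract_dtb(image, offset):
    """Return exactly one dtb starting at `offset` inside `image`."""
    magic, totalsize = struct.unpack_from(">II", image, offset)
    if magic != FDT_MAGIC:
        raise ValueError("no FDT magic at offset 0x%x" % offset)
    return image[offset:offset + totalsize]


offset = file_offset(0x880bf2e8, 0x88000000, 0x240)
print("dd if=uboot.img.bin of=dtb-in-image.dtb bs=1 skip=%d" % offset)

# Fabricated 16-byte "dtb" placed at offset 8 of a fake image.
fake_dtb = struct.pack(">II", FDT_MAGIC, 16) + b"\xAA" * 8
fake_image = b"\x00" * 8 + fake_dtb + b"\xFF" * 4
print(len(extract_dtb(fake_image, 8)), "bytes extracted")
```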
__symbols__ {
tee_regmap = "/tee_regmap_services";
gclk3 = "/gclk3";
cspi0 = "/cspi@0x48020000";
nor = "/cspi@0x48020000/flash@0";
nand = "/cspi@0x48020000/flash@1";
hyper = "/cspi@0x48020000/flash@2";
hb_sci = "/hb_sci@boot_flags";
hb_sec_sci = "/hb_sci@sec_boot_flags";
firmware = "/firmware@tee";
psci = "/psci";
reserved_memory = "/reserved-memory";
};
- Use a watchpoint to check whether it is modified by BL31 or somewhere else.
In this example the memory was not modified unexpectedly, so the watchpoint example is omitted here.
- Dumps after load (first) and in the image (second)
00007290: 3500 6932 6336 0069 3263 3700 7063 6965 5.i2c6.i2c7.pcie
000072a0: 006d 6d63 3000 6d6d 6331 0074 6565 5f72 .mmc0.mmc1.tee_r
000072b0: 6567 6d61 7000 6763 6c6b 3300 6373 7069 egmap.gclk3.cspi
000072c0: 3000 6e6f 7200 6e61 6e64 0068 7970 6572 0.nor.nand.hyper
000072d0: 0068 625f 7363 6900 2d64 6563 6f64 6572 .hb_sci.-decoder
000072e0: 0063 646e 732c 7263 6c6b 2d65 6e00 6163 .cdns,rclk-en.ac
000072f0: 6365 7373 0063 646e 732c 7061 6765 2d73 cess.cdns,page-s
00007300: 697a
00007290: 3500 6932 6336 0069 3263 3700 7063 6965 5.i2c6.i2c7.pcie
000072a0: 006d 6d63 3000 6d6d 6331 0074 6565 5f72 .mmc0.mmc1.tee_r
000072b0: 6567 6d61 7000 6763 6c6b 3300 6373 7069 egmap.gclk3.cspi
000072c0: 3000 6e6f 7200 6e61 6e64 0068 7970 6572 0.nor.nand.hyper
000072d0: 0068 625f 7363 6900 6862 5f73 6563 5f73 .hb_sci.hb_sec_s
000072e0: 6369 0066 6972 6d77 6172 6500 7073 6369 ci.firmware.psci
000072f0: 0072 6573 6572 7665 645f 6d65 6d6f 7279 .reserved_memory
00007300: 00
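Finding where two dumps start to diverge is quicker with a script than by eye. The sketch below parses one row from each of the hexdumps above and reports the first differing offset. It assumes full rows of eight 4-digit hex groups, as in these dumps, and the function names are made up for the example.

```python
import binascii


def parse_hexdump_row(row):
    """'000072d0: 0068 625f ... text' -> (0x72d0, raw bytes)."""
    addr, rest = row.split(":", 1)
    hex_groups = rest.split()[:8]          # drop the ASCII column
    return int(addr, 16), binascii.unhexlify("".join(hex_groups))


def first_difference(a, b):
    """Return the index of the first differing byte, or None if equal."""
    for i, (x, y) in enumerate(zip(bytearray(a), bytearray(b))):
        if x != y:
            return i
    return None if len(a) == len(b) else min(len(a), len(b))


# The row at 0x72d0 from each dump above.
row_first = "000072d0: 0068 625f 7363 6900 2d64 6563 6f64 6572 .hb_sci.-decoder"
row_second = "000072d0: 0068 625f 7363 6900 6862 5f73 6563 5f73 .hb_sci.hb_sec_s"

base, data_first = parse_hexdump_row(row_first)
_, data_second = parse_hexdump_row(row_second)
diff = first_difference(data_first, data_second)
print("dumps diverge at offset 0x%x" % (base + diff))
```

Everything before that offset is identical in both dumps, which fits a copy that stopped short rather than a stray write into the middle of the dtb.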
- Confirm that it is a loader problem and fix it
The results above show that after loading, the __symbols__ node of the dtb is already corrupted.
Fix patch: hb_load: image_len + header size, image_len not include header size.