Machine language
1. Intel x86 processors
Overview
Dominate laptop/desktop/server market
Evolutionary design
Backwards compatible all the way back to the 8086, introduced in 1978
Added more features as time went on
x86 is a Complex Instruction Set Computer (CISC)
Many different instructions with many different formats
But only a small subset is encountered in Linux programs
Compare: Reduced Instruction Set Computer (RISC)
RISC: very few instructions, with very few modes for each
RISC can be quite fast (but Intel still wins on speed!)
Current RISC renaissance (e.g., ARM, RISC-V), especially for low-power designs
Transistor
Building blocks of modern electronics
Two transistors can be used to build an AND gate:
Intel and AMD
Name | Date | Transistor Counts |
---|---|---|
386 | 1985 | 0.3M |
Pentium | 1993 | 3.1M |
Pentium/MMX | 1997 | 4.5M |
Pentium Pro | 1995 | 6.5M |
Pentium III | 1999 | 8.2M |
Pentium 4 | 2000 | 42M |
Core 2 Duo | 2006 | 291M |
Core i7 | 2008 | 731M |
Core i7 Skylake | 2015 | 1.75B |
Added features
Instructions to support multimedia operations
Instructions to enable more efficient conditional operations (!)
Transition from 32 bits to 64 bits
More cores
Name | Date | Transistor Counts |
---|---|---|
AMD K5 | 1996 | 4.3M |
AMD K6 | 1997 | 8.8M |
AMD K6/III | 1998 | 21.3M |
AMD K7 | 1999 | 22.0M |
AMD K8 | 2003 | 105.9M |
AMD Opteron | 2009 | 904M |
AMD Bulldozer | 2012 | 1.2B |
AMD Ryzen 5 | 2017 | 4.8B |
AMD Epyc | 2017 | 19.2B |
x86 clones: Advanced Micro Devices (AMD)
Historically
AMD has followed just behind Intel
A little bit slower, a lot cheaper
Then
Recruited top circuit designers from Digital Equipment Corp. and other downward trending companies
Built Opteron: tough competitor to Pentium 4
Developed x86-64, their own extension to 64 bits
Recent Years
Intel got its act together
1995-2011: Leading semiconductor “fab” in the world
2018: #2 largest by $$ (#1 is Samsung)
2019: reclaimed #1
AMD fell behind
Relies on external semiconductor manufacturer GlobalFoundries
ca. 2019 CPUs (e.g., Ryzen) are competitive again
2020 Epyc
2. Machine programming: levels of abstraction
Overview
Architecture (also ISA: instruction set architecture)
: The parts of a processor design that one needs to understand for writing correct machine/assembly code. Examples: instruction set specification, registers
Machine Code
: The byte-level programs that a processor executes
Assembly Code
: A text representation of machine code
Microarchitecture
: Implementation of the architecture. Examples: cache sizes and core frequency
Example ISAs:
Intel: x86, IA32, Itanium, x86-64
ARM: Used in almost all mobile phones
RISC-V: New open-source ISA
Assembly/Machine code view
Machine code (Assembly code) differs greatly from the original C code.
Parts of processor state that are not visible/accessible from C programs are now visible.
PC: Program counter
Contains address of next instruction
Called %rip (instruction pointer register)
Register file
Contains 16 named locations (registers), each storing a 64-bit value.
These registers can hold addresses (~ C pointers) or integer data.
Condition codes
Store status information about most recent arithmetic or logical operation
Used for conditional branching (if/while)
Vector registers to hold one or more integers or floating-point values.
Memory
Is seen as a byte-addressable array
Contains code and user data
Stack to support procedures
Hands on: assembly/machine code example
Inside your csc231 directory, create another directory called 04-machine and change into this directory.
Create a file named mstore.c with the following contents:
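A minimal sketch of mstore.c, assuming it follows the multstore example from the course textbook (Bryant and O’Hallaron). The names multstore and mult2 are assumptions. mult2 is given a trivial body here only so that the sketch links on its own; in the textbook it lives in a separate file.

```c
/* mstore.c: hypothetical contents, modeled on the textbook's multstore example. */

/* Assumed helper, defined here only so the sketch is self-contained. */
long mult2(long a, long b) {
    return a * b;
}

/* Multiply x and y, then store the product through the pointer dest. */
void multstore(long x, long y, long *dest) {
    long t = mult2(x, y);
    *dest = t;
}
```

The store through dest is the line to look for in the disassembly: it typically shows up as a movq from a register into a memory operand.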
Run the following commands. Note that the flag -Og uses a capital letter O, not the digit 0.
$ gcc -Og -S mstore.c
$ cat mstore.s
$ gcc -Og -c mstore.c
$ objdump -d mstore.o
x86_64 instructions range in length from 1 to 15 bytes
The disassembler determines the assembly code based purely on the byte-sequence in the machine-code file.
All lines beginning with . are directives to the assembler and linker.
3. Assembly language
Overview
Symbolic coding
Very strong correspondence between the language syntax and the architecture’s machine code instructions
Figure from the Programming the IBM 1401 manual (1962)
Data format
C data type | Intel data type | Assembly-code suffix | Size (bytes) |
---|---|---|---|
char | Byte | b | 1 |
short | Word | w | 2 |
int | Double word | l | 4 |
long | Quad word | q | 8 |
char * | Quad word | q | 8 |
float | Single precision | s | 4 |
double | Double precision | l | 8 |
Integer registers
An x86_64 CPU contains a set of 16 general purpose registers storing 64-bit values.
The original 8086 design has eight 16-bit registers, %ax through %sp.
Origin (mostly obsolete)
%ax: accumulator
%cx: counter
%dx: data
%bx: base
%si: source index
%di: destination index
%sp: stack pointer
%bp: base pointer
After the IA32 extension, these registers grew to 32 bits, labeled %eax through %esp.
After the x86_64 extension, these registers were expanded to 64 bits, labeled %rax through %rsp. Eight new registers were added: %r8 through %r15.
Instructions can operate on data of different sizes stored in low-order bytes of the 16 registers.
Bryant and O’Hallaron, Computer Systems: A Programmer’s Perspective, Third Edition
Assembly characteristics: Operations
Transfer data between memory and register
Load data from memory into register
Store register data into memory
Perform arithmetic function on register or memory data
Transfer control
Unconditional jumps to/from procedures
Conditional branches
Indirect branches
4. Data movement
Definition
Example: movq Source, Dest
Note: This is ATT notation. Intel uses mov Dest, Source
Operand Types for Source and Dest:
Immediate (Imm): Constant integer data.
Examples: $0x400, $-533.
Like C constant, but prefixed with $.
Encoded with 1, 2, or 4 bytes.
Register (Reg): One of 16 integer registers
Example: %rax, %r13
%rsp reserved for special use.
Others have special uses in particular instructions.
Memory (Mem): 8 (q in movq) consecutive bytes of memory at address given by register.
Example: (%rax)
Various other addressing modes (See textbook page 181, Figure 3.3).
Other mov:
movb: move byte
movw: move word
movl: move double word
movq: move quad word
movabsq: move absolute quad word
movq Operand Combinations
Simple memory addressing mode
Normal: (R) Mem[Reg[R]]
Register R specifies memory address
Aha! Pointer dereferencing in C
movq (%rcx),%rax
Displacement D(R) Mem[Reg[R]+D]
Register R specifies start of memory region
Constant displacement D specifies offset
movq 8(%rbp),%rdx
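The two simple modes map directly onto C pointer operations. The sketch below uses hypothetical function names. The instructions in the comments are what gcc typically emits at -Og on x86-64 Linux and may differ between compiler versions.

```c
/* Normal mode (R): dereference the address held in a register. */
long load_normal(const long *p) {
    return *p;      /* typically: movq (%rdi),%rax */
}

/* Displacement mode D(R): read the long that starts 8 bytes past the address. */
long load_displaced(const long *p) {
    return p[1];    /* typically: movq 8(%rdi),%rax when long is 8 bytes */
}
```

Here p arrives in %rdi because it is the first argument, and the result is returned in %rax.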
x86_64 Assembly Cheatsheet
Hands on: swapping via single-valued pointers
Create a file named swap.c in 04-machine with the following contents:
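A minimal sketch of swap.c, assuming it follows the classic pointer-swap example from the textbook. The function name, parameter names, and the use of long are assumptions.

```c
/* swap.c: hypothetical contents; exchange the two values that xp and yp point to. */
void swap(long *xp, long *yp) {
    long t0 = *xp;   /* load from memory into a register */
    long t1 = *yp;   /* load from memory into a register */
    *xp = t1;        /* store register to memory */
    *yp = t0;        /* store register to memory */
}
```

Each of the four statements corresponds to one movq with a memory operand of the form (R).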
Run the following commands
$ gcc -Og -c swap.c
$ objdump -d swap.o
Procedure Data Flow:
The first six integer or pointer arguments of a function are placed, in order, into %rdi, %rsi, %rdx, %rcx, %r8, and %r9.
Any remaining arguments are pushed onto the stack by the calling function.
Hands on: swapping positions in an array (via pointer)
Create a file named swap_dsp.c in 04-machine with the following contents:
Run the following commands
$ gcc -Og -c swap_dsp.c
$ objdump -d swap_dsp.o
What is the meaning of 0x190?
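One way to read a displacement such as 0x190: it is 400 in decimal, which is the byte offset of element 50 in an array of 8-byte long values (50 * 8 = 400). The sketch below is an assumption about the kind of code that produces such an offset; the function names and the index 50 are hypothetical.

```c
#include <stddef.h>

/* Byte offset of element `index` when each element is `elem_size` bytes. */
size_t byte_offset(size_t index, size_t elem_size) {
    return index * elem_size;
}

/* Swap the first element of an array with the one at index 50.
   The caller must pass an array with at least 51 elements. */
void swap_with_index_50(long *a) {
    long t0 = a[0];
    long t1 = a[50];   /* with 8-byte longs: displacement 0x190 from a */
    a[0] = t1;
    a[50] = t0;
}
```

In the disassembly, an access like a[50] would be expected to appear as a memory operand of the form 0x190(R).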
Complete memory addressing mode
Most General Form
D(Rb,Ri,S): Mem[Reg[Rb]+S*Reg[Ri]+D]
D: Constant displacement 1, 2, or 4 bytes
Rb: Base register: Any of 16 integer registers
Ri: Index register: Any, except for %rsp
S: Scale: 1, 2, 4, or 8
Special Cases
(Rb,Ri): Mem[Reg[Rb]+Reg[Ri]]
D(Rb,Ri): Mem[Reg[Rb]+Reg[Ri]+D]
(Rb,Ri,S): Mem[Reg[Rb]+S*Reg[Ri]]
(,Ri,S): Mem[S*Reg[Ri]]
D(,Ri,S): Mem[S*Reg[Ri]+D]
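The address arithmetic of the general form can be sketched as a small C function. The function name and the choice of passing register contents as plain integers are assumptions made for illustration; a missing base or index register is modeled by passing 0.

```c
#include <stdint.h>

/* Address computed by the operand D(Rb,Ri,S): Reg[Rb] + S*Reg[Ri] + D.
   Unsigned arithmetic wraps modulo 2^64, as address arithmetic does in hardware. */
uint64_t effective_address(int64_t d, uint64_t rb, uint64_t ri, uint64_t s) {
    return rb + s * ri + (uint64_t)d;
}
```

For example, with %rdx holding 0xf000 and %rcx holding 0x0100: 0x8(%rdx) gives effective_address(0x8, 0xf000, 0, 1) = 0xf008, (%rdx,%rcx,4) gives effective_address(0, 0xf000, 0x100, 4) = 0xf400, and 0x80(,%rdx,2) gives effective_address(0x80, 0, 0xf000, 2) = 0x1e080.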
5. Arithmetic operations
lea
lea: load effective address
Has the form of a movq instruction that reads from memory, but it only computes the address and does not access memory
lea S, D: Write &S to D.
Can be used to generate pointers
Can also be used to describe common arithmetic operations.
Hands on: lea
Create a file named m12.c in 04-machine with the following contents:
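A minimal sketch of m12.c, assuming the function simply multiplies its argument by 12. The name m12 and the signature are assumptions.

```c
/* m12.c: hypothetical contents; multiply the argument by 12. */
long m12(long x) {
    return x * 12;
}
```

A compiler is free to implement the multiplication with lea instructions instead of imul, since 12 * x can be written as 4 * (x + 2 * x).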
Run the following commands
$ gcc -Og -c m12.c
$ objdump -d m12.o
Assembly code explanation of m12.c:
%rdi: x
(%rdi,%rdi,2) = x + 2 * x
The above result is moved to %rdx with lea.
0x0(,%rdx,4) = 4 * (x + 2 * x) = 12*x
The above result is moved to %rax with lea.
Other arithmetic operations
Suffixes are omitted here, unlike in the book.
Src: S
Dest: D
Format | Computation | Description |
---|---|---|
add S, D | D <- D + S | add |
sub S, D | D <- D - S | subtract |
imul S, D | D <- D * S | multiply |
shl S, D | D <- D << S | shift left |
sar S, D | D <- D >> S | arith. shift right |
shr S, D | D <- D >> S | logical shift right |
sal S, D | D <- D << S | arith. shift left (same as shl) |
xor S, D | D <- D ^ S | exclusive or |
and S, D | D <- D & S | and |
or S, D | D <- D \| S | or |
inc D | D <- D + 1 | increment |
dec D | D <- D - 1 | decrement |
neg D | D <- -D | negate |
not D | D <- ~D | complement |
Watch out for argument order (ATT versus Intel)
No distinction between signed and unsigned int.
Exception: arithmetic right shift (sar), where the sign bit is replicated into the vacated high-order bits.
Challenge: lea
Create a file named scale.c in 04-machine with the following contents:
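A minimal sketch of scale.c, assuming it follows the textbook's scale example with three long arguments x, y, and z. The exact expression is an assumption; any linear combination with small constant factors would exercise lea in a similar way.

```c
/* scale.c: hypothetical contents; combine x, y, and z with constant factors. */
long scale(long x, long y, long z) {
    long t = x + 4 * y + 12 * z;
    return t;
}
```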
Run the following commands
$ gcc -Og -c scale.c
$ objdump -d scale.o
Identify the registers holding x, y, and z.
Which register contains the final return value?
Solution
%rdi: x
%rsi: y
%rdx: z
%rax contains the final return value.
Hands on: long arithmetic
Create a file named arith.c in 04-machine with the following contents:
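A minimal sketch of arith.c, assuming it follows the textbook's arith example. The function name, the temporaries t1 through t5, and the constants are assumptions.

```c
/* arith.c: hypothetical contents; a chain of additions and multiplications on longs. */
long arith(long x, long y, long z) {
    long t1 = x + y;
    long t2 = z + t1;
    long t3 = x + 4;
    long t4 = y * 48;
    long t5 = t3 + t4;
    long rval = t2 * t5;
    return rval;
}
```

Calling arith(1, 2, 3) returns 606, because t2 is 6 and t5 is 101. The compiler usually does not keep one instruction per C statement; several temporaries are often folded into a single lea.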
Run the following commands
$ gcc -Og -c arith.c
$ objdump -d arith.o
Understand how the Assembly code represents the actual arithmetic operation in the C code.
6. Logical operations
Quick review: processor state
Information about currently executing program
temporary data (%rax, …)
location of runtime stack (%rsp)
location of current code control point (%rip, …)
status of recent tests (CF, ZF, SF, OF in %EFLAGS)
Condition codes (implicit setting)
Single-bit registers
CF: the most recent operation generated a carry out of the most significant bit.
ZF: the most recent operation yielded zero.
SF: the most recent operation yielded a negative value.
OF: the most recent operation caused a two’s-complement overflow.
Implicitly set as a side effect of arithmetic operations.
Condition codes (explicit setting)
Explicit setting by Compare instruction
cmpq Src2, Src1
cmpq b, a: like computing a - b without setting destination
CF set if carry/borrow out from most significant bit (unsigned comparisons)
ZF set if a == b
SF set if (a - b) < 0
OF set if two’s complement (signed) overflow: (a>0 && b<0 && (a-b)<0) || (a<0 && b>0 && (a-b)>0)
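The effect of cmpq b, a on the four condition codes can be modeled in C. This is a sketch for illustration only: the type flags_t and the function cmpq_flags are hypothetical names, and the subtraction is done on unsigned values so that it wraps the way the hardware does.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical container for the four condition codes. */
typedef struct {
    bool cf, zf, sf, of;
} flags_t;

/* Model of `cmpq b, a`: compute a - b, keep only the flags. */
flags_t cmpq_flags(int64_t b, int64_t a) {
    uint64_t ua = (uint64_t)a;
    uint64_t ub = (uint64_t)b;
    uint64_t diff = ua - ub;             /* wraps modulo 2^64 */
    flags_t f;
    f.cf = ua < ub;                      /* borrow in the unsigned subtraction */
    f.zf = (diff == 0);                  /* a == b */
    f.sf = (diff >> 63) != 0;            /* sign bit of the result */
    /* Signed overflow: a and b have different signs,
       and the sign of the result differs from the sign of a. */
    f.of = (((ua ^ ub) & (ua ^ diff)) >> 63) != 0;
    return f;
}
```

For example, cmpq_flags(5, 3) models comparing a = 3 with b = 5: the result is negative, so SF and CF are set while ZF and OF are clear.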
Condition branches (jX)
Jump to different part of code depending on condition codes
Implicit reading of condition codes
jX | Condition | Description |
---|---|---|
jmp | 1 | unconditional jump |
je | ZF | equal/zero |
jne | ~ZF | not equal/not zero |
js | SF | negative |
jns | ~SF | non-negative |
jg | ~(SF^OF) & ~ZF | greater (signed) |
jge | ~(SF^OF) | greater or equal (signed) |
jl | SF^OF | less (signed) |
jle | (SF^OF) \| ZF | less or equal (signed) |
ja | ~CF & ~ZF | above (unsigned) |
jb | CF | below (unsigned) |
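The conditions in the table can be written out as C predicates over the condition codes. The type flags_t and the cond_* function names are hypothetical; they only restate the boolean expressions of the signed and unsigned comparisons.

```c
#include <stdbool.h>

/* Hypothetical container for the four condition codes. */
typedef struct {
    bool cf, zf, sf, of;
} flags_t;

/* Signed comparisons combine SF and OF. */
bool cond_g(flags_t f)  { return (f.sf == f.of) && !f.zf; }  /* greater          */
bool cond_ge(flags_t f) { return f.sf == f.of; }             /* greater or equal */
bool cond_l(flags_t f)  { return f.sf != f.of; }             /* less             */
bool cond_le(flags_t f) { return (f.sf != f.of) || f.zf; }   /* less or equal    */

/* Unsigned comparisons use CF. */
bool cond_a(flags_t f)  { return !f.cf && !f.zf; }           /* above */
bool cond_b(flags_t f)  { return f.cf; }                     /* below */
```

SF alone is not enough for signed comparisons: when the subtraction overflows, the sign of the result is wrong, and OF corrects for it.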
Hands on: a simple jump
Create a file named jump.c in 04-machine with the following contents:
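A minimal sketch of jump.c, assuming it contains the textbook's absdiff function. The signature and the variable names are assumptions.

```c
/* jump.c: hypothetical contents; absolute difference of two longs. */
long absdiff(long x, long y) {
    long result;
    if (x > y)
        result = x - y;
    else
        result = y - x;
    return result;
}
```

The if/else is the part that the compiler turns into a cmp followed by a conditional jump.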
Run the following commands
$ gcc -Og -c jump.c
$ objdump -d jump.o
Understand how the Assembly code enables jump across instructions to support conditional workflow.
In the next video, we will look at how cmp and jle of absdiff really behave in an actual execution.
Hands on: loop
Create a file named factorial.c in 04-machine with the following contents:
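A minimal sketch of factorial.c, assuming the first version uses a do-while loop, as in the textbook's loop examples. The function name and signature are assumptions.

```c
/* factorial.c: hypothetical contents; factorial with a do-while loop.
   The loop body always runs once, so the result is meaningful only for n >= 1. */
long factorial(long n) {
    long result = 1;
    do {
        result *= n;
        n = n - 1;
    } while (n > 1);
    return result;
}
```

A do-while loop maps most directly onto Assembly: the body comes first, followed by a compare and a conditional jump back to the top of the body.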
Run the following commands
$ gcc -Og -c factorial.c
$ objdump -d factorial.o
Understand how the Assembly code enables jumps across instructions to support loops.
Create factorial_2.c and factorial_3.c from factorial.c.
Modify factorial_2.c so that the factorial is implemented with a while loop. Study the resulting Assembly code.
Modify factorial_3.c so that the factorial is implemented with a for loop. Study the resulting Assembly code.
Behavior of factorial Assembly instructions inside GDB
7. Mechanisms in procedures (functions)
Overview
Function = procedure (book terminology)
Suppose procedure P calls procedure Q.
Passing control
To beginning of procedure code: starting instruction of Q
Back to return point: next instruction in P after the call to Q
Passing data
Procedure arguments: P passes one or more parameters to Q.
Return value: Q returns a value back to P.
Memory management
Allocate during procedure execution and de-allocate upon return
Q needs to allocate space for local variables and free that storage once it finishes.
Mechanisms all implemented with machine instructions
x86-64 implementation of a procedure uses only those mechanisms required
Machine instructions implement the mechanisms, but the choices are determined by designers. These choices make up the Application Binary Interface (ABI).
x86-64 stack
Region of memory managed with stack discipline
Memory viewed as array of bytes.
Different regions have different purposes.
(Like ABI, a policy decision)
Grows toward lower addresses
Register %rsp contains the lowest stack address: the address of the “top” element
Stack push and pop
pushq Src
Fetch operand at Src
Decrement %rsp by 8
Write operand at address given by %rsp
popq Dest
Read value at address given by %rsp
Increment %rsp by 8
Store value at Dest (usually a register)
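The push and pop steps can be modeled in C with a byte array standing in for stack memory. This is a toy model: the type machine_t, the constant STACK_BYTES, and the function names are hypothetical, rsp is an offset into the array rather than a real address, and there is no overflow or underflow checking.

```c
#include <stdint.h>
#include <string.h>

enum { STACK_BYTES = 64 };

/* Hypothetical toy machine: a small stack region and a stack pointer. */
typedef struct {
    uint8_t mem[STACK_BYTES];
    uint64_t rsp;              /* byte offset of the "top" element */
} machine_t;

/* An empty stack has rsp just past the highest address. */
void machine_init(machine_t *m) {
    memset(m->mem, 0, sizeof m->mem);
    m->rsp = STACK_BYTES;
}

/* pushq Src: decrement rsp by 8, then write the operand at rsp. */
void pushq(machine_t *m, uint64_t src) {
    m->rsp -= 8;
    memcpy(&m->mem[m->rsp], &src, 8);
}

/* popq Dest: read the value at rsp, then increment rsp by 8. */
uint64_t popq(machine_t *m) {
    uint64_t value;
    memcpy(&value, &m->mem[m->rsp], 8);
    m->rsp += 8;
    return value;
}
```

Pushing lowers rsp and popping raises it, which is what "grows toward lower addresses" means. The popped bytes stay in memory; they are simply no longer considered part of the stack.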
What really happens in memory/registers at the beginning and the end of a function
The -Og flag often combines/reduces these steps.
The memory stack architecture for a function has a base pointer (%rbp) and a stack pointer (%rsp).
Base pointer: the bottom of the function's stack frame (higher memory address)
Stack pointer: the top of the stack (lower memory address)
Function prologue
Push the current base pointer onto the memory stack (to be restored later).
Set the base pointer (%rbp) to the current value of the stack pointer (%rsp).
Move the stack pointer down further (toward lower addresses) by a distance that accommodates the local variables of the function.
Function prologue (Assembly), ATT notation, assume rbp/ebp and rsp/esp
push %rbp
mov %rsp, %rbp
sub $N, %rsp
Function epilogue
Set the stack pointer back to the current base pointer, so the room reserved in the prologue for local variables is freed.
Pop the base pointer off the stack, so it is restored to its value before the prologue.
Return to the calling function by popping the return address (the caller's saved program counter) off the stack and jumping to it.
Function epilogue (Assembly), ATT notation, assume rbp/ebp and rsp/esp
mov %rbp, %rsp
pop %rbp
ret
Video lecture on the slide
Hands on: function calls
Create a file named mult.c in 04-machine with the following contents:
Description of C code:
Compile with the -g flag and run gdb on the resulting executable.
$ gcc -g -o mult mult.c
$ gdb mult
Set up gdb with a breakpoint at main and start running.
A new GDB command is si (stepi): execute exactly one machine instruction.
It will execute the highlighted (green, arrowed) instruction in the code section.
If the Assembly instruction is a call to another function, use ni (nexti) instead to step over the call rather than into it.
Be careful: the code section of GDB uses Intel notation here.
endbr64 is an instruction from Intel's Control-flow Enforcement Technology (CET). It marks valid targets of indirect branches, which helps prevent attacks that stitch together existing pieces of Assembly code.
Data alignment
Intel recommends data to be aligned to improve memory system performance.
K-alignment rule: Any primitive object of K bytes must have an address that is a multiple of K: 1 for char, 2 for short, 4 for int and float, and 8 for long, double, and char *.