virtualize

process

process: a running program

  • APIs
    • create: load program -> set up stack -> set up heap -> set up I/O (stdin/stdout/stderr) -> start at main
    • fork: copies the parent process; returns the child's PID to the parent and 0 to the child (fork/exec/wait sketch after this list)
    • wait: the parent waits for a child to finish
    • exec: replaces the current process image with a new program
  • states: running, ready, blocked (waiting)
  • data structures: process list, struct proc
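
A minimal sketch of these APIs working together (the `wc` target file is a made-up example):

```c
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/wait.h>

int main(void) {
    pid_t rc = fork();                       // copy the calling process
    if (rc < 0) {                            // fork failed
        exit(1);
    } else if (rc == 0) {                    // child: fork() returned 0
        char *argv[] = {"wc", "p.c", NULL};  // hypothetical target file
        execvp(argv[0], argv);               // replace this image; returns only on error
        exit(1);
    } else {                                 // parent: fork() returned the child's PID
        int status;
        waitpid(rc, &status, 0);             // wait for the child to finish
        printf("child %d done\n", (int)rc);
    }
    return 0;
}
```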

machine state: memory, registers

limited direct execution

  • user mode, kernel mode, system call, trap (a system call is implemented with a trap instruction)
  • switch: how the OS regains the CPU
    • cooperative: wait for system calls
    • non-cooperative: a timer interrupt

scheduling

T_turnaround = T_completion - T_arrival

T_response = T_firstrun - T_arrival

  • FIFO/FCFS: bad due to the convoy effect
  • SJF (shortest job first): bad when jobs do not all arrive at the same time
  • STCF/PSJF: preemptive SJF
  • RR: time-slicing
  • MLFQ (multi-level feedback queue)
    • Rule 1: If Priority(A) > Priority(B), A runs (B doesn’t).
    • Rule 2: If Priority(A) = Priority(B), A & B run in RR.
    • Rule 3: When a job enters the system, it is placed at the highest priority (the topmost queue).
    • Rule 4: Once a job uses up its time allotment at a given level (regardless of how many times it has given up the CPU), its priority is reduced (i.e., it moves down one queue).
    • Rule 5: After some time period S, move all the jobs in the system to the topmost queue.
  • lottery: each job holds tickets; draw a random winning number and walk a counter over the job list to find the winner
  • stride: stride = a large constant / tickets; always run the job with the smallest pass value, then add its stride to its pass (sketch after this list)
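
A tiny runnable sketch of stride scheduling (the job count, ticket values, and the 10000 constant are all made-up assumptions):

```c
#include <stdio.h>

#define NJOBS 3
#define BIG 10000                     // a large constant; stride = BIG / tickets

int tickets[NJOBS] = {100, 50, 250};  // assumed ticket allocations
int stride[NJOBS];
int pass[NJOBS];

int pick_job(void) {                  // run the job with the minimum pass value
    int min = 0;
    for (int i = 1; i < NJOBS; i++)
        if (pass[i] < pass[min]) min = i;
    return min;
}

int main(void) {
    for (int i = 0; i < NJOBS; i++) {
        stride[i] = BIG / tickets[i];
        pass[i] = 0;
    }
    for (int slice = 0; slice < 10; slice++) {
        int j = pick_job();
        printf("slice %d -> job %d (pass %d)\n", slice, j, pass[j]);
        pass[j] += stride[j];         // advance its pass by its stride
    }
    return 0;
}
```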

memory

every address generated by a user program is a virtual address

e.g. the addresses a program prints are all virtual:

location of code : 0x1095afe50
location of heap : 0x1096008c0
location of stack: 0x7fff691aea64
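
The kind of program that produces output like the above, as a sketch (addresses vary per run and platform):

```c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
    int x = 0;  // lives on the stack
    printf("location of code : %p\n", (void *)main);
    printf("location of heap : %p\n", (void *)malloc(100));
    printf("location of stack: %p\n", (void *)&x);
    return 0;
}
```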

address space: code, stack, heap, other (e.g. statically-initialized variables)

  • memory API (a small example follows this list)
    • malloc: takes a size_t, the number of bytes to allocate; for strings remember strlen(s)+1
    • free: the argument must be exactly a pointer returned by malloc
    • common mistakes: forgetting to allocate, allocating too little, forgetting to free, freeing too early, double free
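
A small sketch of the strlen(s)+1 point (the example string is arbitrary):

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main(void) {
    char *src = "hello";
    // +1 makes room for the terminating '\0';
    // sizeof(src) would be wrong: that is the size of the pointer, not the string
    char *dst = malloc(strlen(src) + 1);
    if (dst == NULL) return 1;
    strcpy(dst, src);
    printf("%s\n", dst);
    free(dst);  // must pass exactly the pointer malloc returned
    return 0;
}
```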

address translation: from virtual address (VA) to physical address (PA)

  • dynamic relocation (by hardware): at runtime (sketch below)
    • base and bounds registers in the MMU (memory management unit); base: used to compute the PA; bounds: for protection, holds the size (or the PA of the end)
  • static relocation (by software, historically earlier): the loader
    • rewrites addresses at load time; no protection
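
What the hardware does on each access, as a runnable sketch (the base and bounds values are made up):

```c
#include <stdio.h>
#include <stdlib.h>

unsigned int base  = 0x8000;  // assumed base register
unsigned int bound = 0x1000;  // assumed bounds register (size of the address space)

// translate a virtual address the way base-and-bounds hardware would
unsigned int translate(unsigned int va) {
    if (va >= bound) {
        fprintf(stderr, "fault: va 0x%x out of bounds\n", va);
        exit(1);              // hardware would raise an exception to the OS
    }
    return base + va;         // pa = base + va
}

int main(void) {
    printf("va 0x0100 -> pa 0x%x\n", translate(0x0100));
    return 0;
}
```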

segmentation

different sections can have different base and bound

  • va translation: va-> segment + offset
  • the stack grows backwards: add a bit recording whether the segment grows in the positive direction
  • protection: add permission bits
  • wasted space
    • internal fragmentation: inside a segment
    • external fragmentation: between segments, mainly because segments have different sizes; pick a suitable free-list algorithm; free space needs coalescing (merging adjacent free chunks); the allocator puts a header block in front of each allocated region recording its size, which explains why free() needs no size argument (sketch after this list)
      • fit 算法:
        • best fit
        • worst fit
        • first fit
        • next fit
        • segregated lists
        • buddy allocation: power-of-two (2^N) block sizes, easy to coalesce
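
A sketch of the header trick (the MAGIC value is an arbitrary sanity check):

```c
#include <assert.h>
#include <stddef.h>

#define MAGIC 1234567

typedef struct {
    size_t size;   // size of the region handed to the user
    size_t magic;  // sanity check that this really is one of our headers
} header_t;

// On free(ptr), step back one header to recover the size --
// this is why free() takes no size argument.
size_t region_size(void *ptr) {
    header_t *hptr = (header_t *)ptr - 1;   // header sits just before the region
    assert(hptr->magic == MAGIC);
    return hptr->size + sizeof(header_t);   // total bytes returned to the free list
}
```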

paging: divide the address space into fixed-sized pieces (pages) instead of variable-sized segments

page for virtual, page frame for physical

  • page table: a per-process data structure, stored in memory. VA = virtual page number (VPN) + offset; the page table maps VPN -> page frame number (PFN); PA = PFN + offset

    indexed by VPN; each entry is a page table entry (PTE) (sketch below)
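
A sketch of the VA split for 4KB pages and a 32-bit address (the linear table and the mapping are fabricated):

```c
#include <stdio.h>
#include <stdint.h>

#define PAGE_SHIFT 12                      // 4KB pages -> 12 offset bits
#define OFFSET_MASK ((1u << PAGE_SHIFT) - 1)

uint32_t page_table[1 << 20];              // assumed linear table, indexed by VPN

uint32_t translate(uint32_t va) {
    uint32_t vpn    = va >> PAGE_SHIFT;    // high bits select the PTE
    uint32_t offset = va & OFFSET_MASK;    // low bits pass through unchanged
    uint32_t pfn    = page_table[vpn];     // valid/protection bits ignored here
    return (pfn << PAGE_SHIFT) | offset;   // pa = pfn + offset
}

int main(void) {
    page_table[0x12345] = 0x00042;         // made-up mapping
    printf("pa = 0x%x\n", translate(0x12345abc));
    return 0;
}
```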

translation lookaside buffer (TLB): in the MMU; just a cache of recent translations

  • on a miss: insert the translation into the TLB first, then retry the lookup
  • entry layout: VPN | PFN | flag bits | ASID (address-space ID, i.e. which process)

page tables are too big!

  • bigger pages: fewer entries, but more internal fragmentation; 4KB is the common page size

  • paging + segmentation: one page table per segment, so invalid entries outside the segments need not be stored, shrinking the page tables

  • multi-level page tables: a page directory points at page-table pages, and unmapped regions need no page-table page at all (two-level walk sketch below)
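
A runnable two-level walk sketch; the 10|10|12 bit split mirrors classic 32-bit x86, and the tables and mappings are fabricated for illustration:

```c
#include <stdio.h>
#include <stdint.h>

// assumed split: 10-bit directory index | 10-bit table index | 12-bit offset
static uint32_t page_table_page[1024];   // one page-table page, made up
static uint32_t *page_directory[1024];   // valid entries point at table pages

uint32_t translate(uint32_t va) {
    uint32_t pd_index = (va >> 22) & 0x3ff;
    uint32_t pt_index = (va >> 12) & 0x3ff;
    uint32_t offset   = va & 0xfff;

    uint32_t *pt = page_directory[pd_index];
    if (pt == NULL)          // whole 4MB region unmapped: no table page needed --
        return 0;            // this is where the space savings come from
    uint32_t pfn = pt[pt_index];
    return (pfn << 12) | offset;
}

int main(void) {
    page_directory[1] = page_table_page;  // map one region
    page_table_page[2] = 0x42;            // VPN (1,2) -> PFN 0x42, made up
    printf("pa = 0x%x\n", translate((1u << 22) | (2u << 12) | 0xabc));
    return 0;
}
```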

page fault

OS (software!): the page-fault handler

TLB miss -> walk the page table (located via the page-table base register) -> PTE -> if present: PFN -> PA; if not present: page fault -> bring the page in from swap space

swap daemon: frees pages until free memory rises from the low watermark back to the high watermark

page out

average memory access time (AMAT)
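
The cost model, with an illustrative calculation (the numbers are made up):

AMAT = T_M + P_miss × T_D

e.g. T_M = 100ns, T_D = 10ms, P_miss = 0.1%: AMAT = 100ns + 0.001 × 10ms ≈ 10.1µs — even rare misses dominate the average.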

  • FIFO: simple

  • random: simple

  • LRU (least recently used): degenerates when a loop is slightly bigger than the cache

    • approximating LRU

      when a page is used, its use bit is set to 1. Clock hand: when a page must be evicted, look at the page the hand points to; if its use bit == 1, clear the use bit to 0 and advance; if its use bit == 0, evict that page (sketch below)
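
A minimal sketch of the clock sweep, assuming a fixed frame array and a hardware-maintained use bit (all names here are made up):

```c
#include <stdio.h>

#define NFRAMES 8

int use_bit[NFRAMES];  // set to 1 by "hardware" whenever the page is referenced
int hand = 0;          // the clock hand

// pick a victim frame: skip (and clear) recently-used pages
int evict(void) {
    while (use_bit[hand] == 1) {
        use_bit[hand] = 0;            // second chance: clear and move on
        hand = (hand + 1) % NFRAMES;
    }
    int victim = hand;                // use bit == 0: not recently used
    hand = (hand + 1) % NFRAMES;
    return victim;
}

int main(void) {
    for (int i = 0; i < NFRAMES; i++)
        use_bit[i] = i % 2;           // made-up reference state
    printf("evict frame %d\n", evict());
    return 0;
}
```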

dirty bit: prefer to page out clean pages over dirty ones, since dirty pages must be written back first

clustering/grouping: collect several pending writes and perform them as one larger write

page 0 is marked invalid, which explains why dereferencing a null pointer faults

COW (copy on write): when copying, just share the pages; reads use the shared address, and only a write triggers a real copy

LRU

#include <list>
#include <unordered_map>
using namespace std;

class LRUCache {
public:
    LRUCache(int capacity) : cap(capacity) {}

    int get(int key) {
        if (map.find(key) == map.end())
            return -1;
        auto key_value = *map[key];
        cache.erase(map[key]);        // move the entry to the front
        cache.push_front(key_value);
        map[key] = cache.begin();
        return key_value.second;
    }

    void put(int key, int value) {
        if (map.find(key) == map.end()) {
            if ((int)cache.size() == cap) {   // full: evict the least recently used
                map.erase(cache.back().first);
                cache.pop_back();
            }
        } else {
            cache.erase(map[key]);            // drop the old entry
        }
        cache.push_front({key, value});
        map[key] = cache.begin();
    }

private:
    int cap;
    list<pair<int, int>> cache;                              // front = most recent
    unordered_map<int, list<pair<int, int>>::iterator> map;  // key -> node in list
};

LFU

#include <set>
#include <unordered_map>
using namespace std;

struct Node {
    int cnt, time, key, value;  // cnt: use count; time: last-used timestamp

    Node(int _cnt, int _time, int _key, int _value)
        : cnt(_cnt), time(_time), key(_key), value(_value) {}

    bool operator<(const Node &rhs) const {
        // least frequently used first; break ties by least recently used
        return cnt == rhs.cnt ? time < rhs.time : cnt < rhs.cnt;
    }
};

class LFUCache {
    int capacity, time;  // cache capacity and a logical timestamp
    unordered_map<int, Node> key_table;
    set<Node> S;         // ordered by (cnt, time); begin() is the eviction victim

public:
    LFUCache(int _capacity) {
        capacity = _capacity;
        time = 0;
        key_table.clear();
        S.clear();
    }

    int get(int key) {
        auto it = key_table.find(key);
        if (it == key_table.end())
            return -1;

        Node cache = it->second;
        S.erase(cache);          // re-insert with updated (cnt, time)

        cache.cnt += 1;
        cache.time = ++time;

        S.insert(cache);
        it->second = cache;
        return cache.value;
    }

    void put(int key, int value) {
        if (capacity == 0) return;
        auto it = key_table.find(key);
        if (it == key_table.end()) {
            if ((int)key_table.size() == capacity) {  // evict the (cnt, time) minimum
                key_table.erase(S.begin()->key);
                S.erase(S.begin());
            }
            Node cache = Node(1, ++time, key, value);
            key_table.insert(make_pair(key, cache));
            S.insert(cache);
        } else {
            Node cache = it->second;
            S.erase(cache);
            cache.cnt += 1;
            cache.time = ++time;
            cache.value = value;
            S.insert(cache);
            it->second = cache;
        }
    }
};

concurrency

thread

thread: threads share the same address space, but each has its own PC, registers, and stack

API

int pthread_create(
    pthread_t *thread,
    const pthread_attr_t *attr,       // usually NULL
    void *(*start_routine)(void *),   // function the new thread runs
    void *arg);                       // argument passed to start_routine

int pthread_join(
    pthread_t thread,
    void **value_ptr);                // receives the thread's return value

lock

pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;

int rc = pthread_mutex_init(&lock, NULL);  // dynamic alternative to the static initializer
assert(rc == 0);

void Pthread_mutex_lock(pthread_mutex_t *mutex) {  // wrapper that checks for errors
    int rc = pthread_mutex_lock(mutex);
    assert(rc == 0);
}

controlling interrupts

disable interrupts; only suitable for single-processor systems

test-and-set: an atomic instruction

int TestAndSet(int *ptr, int new) {
    int old = *ptr;   // fetch the old value
    *ptr = new;       // store the new value
    return old;
}
// TestAndSet executes atomically in hardware

typedef struct __lock_t { int flag; } lock_t;

void init(lock_t *lock) {
    lock->flag = 0;   // 0: available, 1: held
}

void lock(lock_t *lock) {
    while (TestAndSet(&lock->flag, 1) == 1)
        ; // spin
}

void unlock(lock_t *lock) {
    lock->flag = 0;
}

compare and swap

int CompareAndSwap(int *ptr, int expected, int new) {
    int actual = *ptr;
    if (actual == expected)   // only store if nobody changed it meanwhile
        *ptr = new;
    return actual;
}

void lock(lock_t *lock) {
    while (CompareAndSwap(&lock->flag, 0, 1) == 1)
        ; // spin
}

load-linked and store-conditional (LL/SC)

int LoadLinked(int *ptr) {
    return *ptr;
}

int StoreConditional(int *ptr, int value) {
    if (no one has updated *ptr since the LoadLinked to this address) {
        *ptr = value;
        return 1; // success
    } else
        return 0; // failed
}

void lock(lock_t *lock) {
    while (1) {
        while (LoadLinked(&lock->flag) == 1)
            ; // spin until flag is 0
        if (StoreConditional(&lock->flag, 1) == 1)
            return; // success: nobody raced with us
    }
}

fetch and add(ticket and turn)

guarantees fairness

int FetchAndAdd(int *ptr) {
    int old = *ptr;
    *ptr = old + 1;
    return old;
}

typedef struct __lock_t {
    int ticket;
    int turn;
} lock_t;

void lock(lock_t *lock) {
    int myturn = FetchAndAdd(&lock->ticket);  // take a ticket
    while (lock->turn != myturn)
        ; // spin until it is our turn
}

void unlock(lock_t *lock) {
    FetchAndAdd(&lock->turn);  // pass the lock to the next ticket holder
}

yield (endless spinning is wasteful: with N threads contending, only 1/N of the CPU does useful work)

yield() gives up the CPU

queues (park and unpark)

typedef struct __lock_t {
    int flag;
    int guard;     // spin lock protecting flag and the queue
    queue_t *q;
} lock_t;

void lock_init(lock_t *m) {
    m->flag = 0;
    m->guard = 0;
    queue_init(m->q);
}

void lock(lock_t *m) {
    while (TestAndSet(&m->guard, 1) == 1)
        ; // acquire guard by spinning
    if (m->flag == 0) {
        m->flag = 1;   // got the lock
        m->guard = 0;
    } else {
        queue_add(m->q, gettid());
        setpark();     // announce the upcoming park:
        // if another thread calls unpark before park actually runs,
        // the subsequent park returns immediately instead of sleeping
        m->guard = 0;
        park();        // sleep until unparked
    }
}

void unlock(lock_t *m) {
    while (TestAndSet(&m->guard, 1) == 1)
        ; // acquire guard by spinning
    if (queue_empty(m->q))
        m->flag = 0;   // nobody waiting: release the lock
    else
        unpark(queue_remove(m->q));  // hand the lock directly to the next thread
    m->guard = 0;
}

two-phase locks

void mutex_lock(int *mutex) {
    int v;
    // bit 31 clear means unlocked; grab it if we can (the fast path)
    if (atomic_bit_test_and_set(mutex, 31) == 0)
        return;
    atomic_increment(mutex);  // count ourselves as a waiter
    while (1) {
        if (atomic_bit_test_and_set(mutex, 31) == 0) {
            atomic_decrement(mutex);
            return;
        }
        // before sleeping, make sure the futex value still says "locked"
        v = *mutex;
        if (v >= 0)
            continue;
        futex_wait(mutex, v);
    }
}

void mutex_unlock(int *mutex) {
    /* Adding 0x80000000 to the counter results in 0 if and only if
       there are not other interested threads */
    if (atomic_add_zero(mutex, 0x80000000))
        return;
    /* There are other threads waiting for this mutex; wake one of them up. */
    futex_wake(mutex);
}

lock-based data structures

goal: thread safety

counter

sloppy counter

typedef struct __counter_t {
    int global;                      // the global count
    pthread_mutex_t glock;           // global lock
    int local[NUMCPUS];              // per-CPU counts
    pthread_mutex_t llock[NUMCPUS];  // per-CPU locks
    int threshold;                   // update frequency S
} counter_t;

void init(counter_t *c, int threshold) {
    c->threshold = threshold;
    c->global = 0;
    pthread_mutex_init(&c->glock, NULL);
    int i;
    for (i = 0; i < NUMCPUS; i++) {
        c->local[i] = 0;
        pthread_mutex_init(&c->llock[i], NULL);
    }
}

void update(counter_t *c, int threadID, int amt) {
    pthread_mutex_lock(&c->llock[threadID]);
    c->local[threadID] += amt;                 // amt: amount to add
    if (c->local[threadID] >= c->threshold) {  // transfer to the global count
        pthread_mutex_lock(&c->glock);
        c->global += c->local[threadID];
        pthread_mutex_unlock(&c->glock);
        c->local[threadID] = 0;
    }
    pthread_mutex_unlock(&c->llock[threadID]);
}

int get(counter_t *c) {
    pthread_mutex_lock(&c->glock);
    int val = c->global;   // only approximate: in-flight local counts are missing
    pthread_mutex_unlock(&c->glock);
    return val;
}

the key is the threshold S: larger S scales better, but the global value lags further behind

S = 1 is just an ordinary (non-sloppy) counter

linked list

typedef struct __node_t {
    int key;
    struct __node_t *next;
} node_t;

typedef struct __list_t {
    node_t *head;
    pthread_mutex_t lock;
} list_t;

void List_Init(list_t *l) {
    l->head = NULL;
    pthread_mutex_init(&l->lock, NULL);
}

void List_Insert(list_t *l, int key) {
    node_t *new = malloc(sizeof(node_t));  // malloc outside the critical section
    if (new == NULL) {
        perror("malloc");
        return;                            // lock not held yet, nothing to unlock
    }
    new->key = key;

    pthread_mutex_lock(&l->lock);          // only the list update needs the lock
    new->next = l->head;
    l->head = new;
    pthread_mutex_unlock(&l->lock);
}

int List_Lookup(list_t *l, int key) {
    int rv = -1;
    pthread_mutex_lock(&l->lock);
    node_t *curr = l->head;
    while (curr) {
        if (curr->key == key) {
            rv = 0;
            break;
        }
        curr = curr->next;
    }
    pthread_mutex_unlock(&l->lock);
    return rv;
}

queue

typedef struct __node_t {
    int value;
    struct __node_t *next;
} node_t;

typedef struct __queue_t {
    node_t *head;                 // consumers touch only head
    node_t *tail;                 // producers touch only tail
    pthread_mutex_t head_lock;
    pthread_mutex_t tail_lock;
} queue_t;

void Queue_Init(queue_t *q) {
    node_t *tmp = malloc(sizeof(node_t));  // dummy node separates head and tail
    tmp->next = NULL;
    q->head = q->tail = tmp;
    pthread_mutex_init(&q->head_lock, NULL);
    pthread_mutex_init(&q->tail_lock, NULL);
}

void Queue_Enqueue(queue_t *q, int value) {
    node_t *tmp = malloc(sizeof(node_t));
    assert(tmp != NULL);
    tmp->value = value;
    tmp->next = NULL;

    pthread_mutex_lock(&q->tail_lock);
    q->tail->next = tmp;
    q->tail = tmp;
    pthread_mutex_unlock(&q->tail_lock);
}

int Queue_Dequeue(queue_t *q, int *value) {
    pthread_mutex_lock(&q->head_lock);
    node_t *tmp = q->head;
    node_t *new_head = tmp->next;
    if (new_head == NULL) {                // queue empty
        pthread_mutex_unlock(&q->head_lock);
        return -1;
    }

    *value = new_head->value;              // new_head becomes the new dummy
    q->head = new_head;
    pthread_mutex_unlock(&q->head_lock);
    free(tmp);
    return 0;
}

hash table

#define BUCKETS 101

typedef struct __hash_t {
    list_t lists[BUCKETS];   // one concurrent list (with its own lock) per bucket
} hash_t;

void hash_init(hash_t *h) {
    int i;
    for (i = 0; i < BUCKETS; i++)
        List_Init(&h->lists[i]);
}

void hash_insert(hash_t *h, int key) {
    List_Insert(&h->lists[key % BUCKETS], key);  // List_Insert above returns void
}

int hash_lookup(hash_t *h, int key) {
    return List_Lookup(&h->lists[key % BUCKETS], key);
}

condition variables

int done = 0;
pthread_mutex_t m = PTHREAD_MUTEX_INITIALIZER;
pthread_cond_t c = PTHREAD_COND_INITIALIZER;

void thr_exit() {
    pthread_mutex_lock(&m);
    done = 1;                 // set the state, then signal, all under the lock
    pthread_cond_signal(&c);
    pthread_mutex_unlock(&m);
}

void *child(void *arg) {
    printf("child\n");
    thr_exit();
    return NULL;
}

void thr_join() {
    pthread_mutex_lock(&m);
    while (done == 0)         // always recheck the condition (while, not if)
        pthread_cond_wait(&c, &m);
    pthread_mutex_unlock(&m);
}

int main(int argc, char *argv[]) {
    printf("parent begin\n");
    pthread_t p;
    pthread_create(&p, NULL, child, NULL);
    thr_join();
    printf("parent end\n");
    return 0;
}

pthread_cond_wait: puts the caller to sleep and releases the lock; re-acquires the lock before returning

producer/consumer (bounded buffer)

#define MAX 10
int buffer[MAX];
int fill_ptr = 0;  // renamed so the index doesn't clash with the condition variable
int use_ptr = 0;
int count = 0;

void put(int value) {
    buffer[fill_ptr] = value;
    fill_ptr = (fill_ptr + 1) % MAX;
    count++;
}

int get() {
    int tmp = buffer[use_ptr];
    use_ptr = (use_ptr + 1) % MAX;
    count--;
    return tmp;
}

pthread_cond_t empty, fill;
pthread_mutex_t m;

void *producer(void *arg) {
    int i;
    for (i = 0; i < loops; i++) {  // loops: iteration count, assumed defined elsewhere
        pthread_mutex_lock(&m);
        while (count == MAX)               // buffer full: wait for a free slot
            pthread_cond_wait(&empty, &m);
        put(i);
        pthread_cond_signal(&fill);
        pthread_mutex_unlock(&m);
    }
}

void *consumer(void *arg) {
    int i;
    for (i = 0; i < loops; i++) {
        pthread_mutex_lock(&m);
        while (count == 0)                 // buffer empty: wait for data
            pthread_cond_wait(&fill, &m);
        int tmp = get();
        pthread_cond_signal(&empty);
        pthread_mutex_unlock(&m);
        printf("%d\n", tmp);
    }
}

semaphores

#include <semaphore.h>
sem_t s;
sem_init(&s, 0, 1);
// arg1: the semaphore; arg2: 0 = shared between threads of one process;
// arg3: the initial value

int sem_wait(sem_t *s) {
    // decrement the value of s by one
    // return right away if the value is >= 0, else wait
}

int sem_post(sem_t *s) {
    // increment the value of s by one
    // if one or more threads are waiting, wake one
}

binary semaphores: lock

sem_t s;
sem_init(&s, 0, 1);
sem_wait(&s);
//critical section
sem_post(&s);

note: initialized to 1 (one thread may enter)

as condition variables

sem_t s;

void *child(void *arg) {
    printf("child\n");
    sem_post(&s);    // signal: child is done
    return NULL;
}

int main(int argc, char *argv[]) {
    sem_init(&s, 0, 0);
    printf("parent begin\n");
    pthread_t c;
    pthread_create(&c, NULL, child, NULL);
    sem_wait(&s);    // wait for the child
    printf("parent end\n");
    return 0;
}

note: initialized to 0 (the parent blocks until the child posts)

producer/consumer with semaphores

#define MAX 10
int buffer[MAX];
int fill = 0;
int use = 0;

void put(int value) {
    buffer[fill] = value;
    fill = (fill + 1) % MAX;
}

int get() {
    int tmp = buffer[use];
    use = (use + 1) % MAX;
    return tmp;
}

sem_t empty;   // initialized to MAX
sem_t full;    // initialized to 0
sem_t mutex;   // initialized to 1

void *producer(void *arg) {
    int i;
    for (i = 0; i < loops; i++) {  // loops assumed defined elsewhere
        sem_wait(&empty);   // wait for a free slot
        sem_wait(&mutex);   // lock only around the buffer update (avoids deadlock)
        put(i);
        sem_post(&mutex);
        sem_post(&full);
    }
}

void *consumer(void *arg) {
    int i, tmp = 0;
    while (tmp != -1) {
        sem_wait(&full);    // wait for data
        sem_wait(&mutex);
        tmp = get();
        sem_post(&mutex);
        sem_post(&empty);
        printf("%d\n", tmp);
    }
}

int main(int argc, char *argv[]) {
    sem_init(&empty, 0, MAX);
    sem_init(&full, 0, 0);
    sem_init(&mutex, 0, 1);
    // ... create producer/consumer threads and join them (elided)
}

reader-writer locks

typedef struct __rwlock_t {
    sem_t lock;       // protects readers
    sem_t writelock;  // held by a writer, or by the first reader on behalf of all
    int readers;
} rwlock_t;

void rwlock_init(rwlock_t *rw) {
    rw->readers = 0;
    sem_init(&rw->lock, 0, 1);
    sem_init(&rw->writelock, 0, 1);
}

void rwlock_acquire_readlock(rwlock_t *rw) {
    sem_wait(&rw->lock);
    rw->readers++;
    if (rw->readers == 1)            // first reader locks out writers
        sem_wait(&rw->writelock);
    sem_post(&rw->lock);
}

void rwlock_release_readlock(rwlock_t *rw) {
    sem_wait(&rw->lock);
    rw->readers--;
    if (rw->readers == 0)            // last reader lets writers in
        sem_post(&rw->writelock);
    sem_post(&rw->lock);
}

void rwlock_acquire_writelock(rwlock_t *rw) {
    sem_wait(&rw->writelock);
}

void rwlock_release_writelock(rwlock_t *rw) {
    sem_post(&rw->writelock);
}

dining philosophers (classic deadlock example; fix: have one philosopher acquire the forks in the opposite order)

common problems

  • non-deadlock bugs
    • atomicity violations
    • order violations (fix with a state variable / condition variable)
  • deadlock: four conditions, break any one of them
    • mutual exclusion
      • use atomic (lock-free) operations instead
    • hold-and-wait
      • acquire all locks at once, atomically
    • no preemption
      • trylock, back off, and retry
    • circular wait
      • impose a total lock order, e.g. always acquire L1 before L2 (sketch below)
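
A sketch of breaking circular wait by always locking in address order (helper names are mine; comparing unrelated pointers is technically implementation-defined but a common idiom):

```c
#include <pthread.h>

// always acquire the lower-addressed lock first, so two threads
// can never each hold one lock while waiting for the other
void lock_both(pthread_mutex_t *a, pthread_mutex_t *b) {
    if (a > b) { pthread_mutex_t *t = a; a = b; b = t; }
    pthread_mutex_lock(a);
    pthread_mutex_lock(b);
}

void unlock_both(pthread_mutex_t *a, pthread_mutex_t *b) {
    pthread_mutex_unlock(a);
    pthread_mutex_unlock(b);
}
```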

event-based concurrency

int select(int nfds,  // check descriptors in [0, nfds)
           fd_set *restrict readfds,
           fd_set *restrict writefds,
           fd_set *restrict errorfds,
           struct timeval *restrict timeout);

int main(void) {
    while (1) {
        fd_set readFDs;
        FD_ZERO(&readFDs);

        // watch every descriptor we care about (minFD/maxFD assumed elsewhere)
        int fd;
        for (fd = minFD; fd < maxFD; fd++)
            FD_SET(fd, &readFDs);

        int rc = select(maxFD + 1, &readFDs, NULL, NULL, NULL);

        // handle whichever descriptors are ready
        for (fd = minFD; fd < maxFD; fd++)
            if (FD_ISSET(fd, &readFDs))
                processFD(fd);
    }
}

asynchronous I/O

AIO control block:

struct aiocb {
    int             aio_fildes;  /* file descriptor */
    off_t           aio_offset;  /* file offset */
    volatile void  *aio_buf;     /* location of buffer */
    size_t          aio_nbytes;  /* length of transfer */
};
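
A hedged usage sketch of the POSIX AIO calls around this control block (the file name is made up; may need -lrt when linking on older glibc):

```c
#include <aio.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

int main(void) {
    char buf[512];
    int fd = open("data.bin", O_RDONLY);  // made-up file name
    if (fd < 0) return 1;

    struct aiocb cb;
    memset(&cb, 0, sizeof(cb));
    cb.aio_fildes = fd;
    cb.aio_offset = 0;
    cb.aio_buf    = buf;
    cb.aio_nbytes = sizeof(buf);

    aio_read(&cb);                         // issue the read and return immediately
    while (aio_error(&cb) == EINPROGRESS)
        ;                                  // do other work here instead of spinning
    printf("read %zd bytes\n", aio_return(&cb));
    close(fd);
    return 0;
}
```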

persistence

I/O devices

registers: status, command, data

PIO: programmed I/O (the main CPU itself moves the data)

use interrupts to avoid spinning (fast device: poll; slow device: interrupt)

DMA(direct memory access)

device communication: explicit I/O instructions, or memory-mapped I/O

device driver

hard disk drives

interface

platter: one of the disk's circular plates (data lives on its surfaces)

spindle: the rotation axis; speed in RPM (7200-15000)

track: one concentric ring of a surface

sector: a wedge-shaped portion of a track, 512B

disk arm / disk head: the read/write probe, one head per surface

how to do an I/O

  1. move the arm to the track (seek time)
  2. wait for the sector to rotate under the head (rotational delay)
  3. transfer the data (transfer time)
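
Putting the pieces together, with an illustrative calculation:

T_I/O = T_seek + T_rotation + T_transfer, and the rate is R_I/O = Size_transfer / T_I/O

e.g. at 7200 RPM one full rotation takes 60s / 7200 ≈ 8.3ms, so the average rotational delay is about 4.2ms.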

track buffer: a small cache inside the drive

when writing

  • write-back caching: acknowledge once the data is in the drive's cache (faster, riskier)
  • write-through: acknowledge only after the data is on the platter

disk scheduling

  • SSTF (shortest seek time first)
  • elevator (SCAN)
    • F-SCAN: freezes the queue while doing a sweep (avoids starving far-away requests)
    • C-SCAN: circular SCAN; sweeps in one direction only (outer to inner), then resets to the outer track
  • SPTF (shortest positioning time first; also SATF, shortest access time first)

RAIDs(redundant arrays of inexpensive disks)

metrics: capacity, reliability, performance

RAID 0: striping

no redundancy, fully parallel (N disks; S = one disk's sequential throughput, R = its random throughput)

sequential read: N·S

sequential write: N·S

random read: N·R

random write: N·R


RAID 1: mirroring

RAID 10 (stripe of mirrors) vs RAID 01 (mirror of stripes)

sequential read: (N/2)·S

sequential write: (N/2)·S

random read: N·R

random write: (N/2)·R (each logical write goes to both copies)


RAID 4: a dedicated parity disk

sequential read: (N-1)·S

sequential write: (N-1)·S (full-stripe writes compute parity for free)

random read: (N-1)·R

random write: R/2 (every small write must also read and update the parity disk; see below)

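How parity handles a small write (subtractive method; block names are illustrative): with parity P = B0 ⊕ B1 ⊕ B2 ⊕ B3, a small write to B0 computes P_new = (B0_old ⊕ B0_new) ⊕ P_old — two reads (old data, old parity) and two writes (new data, new parity). Since every write hits the single parity disk, random writes are limited to R/2.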

RAID 5: rotating parity; the parity block rotates across all disks, removing the RAID-4 parity bottleneck (random writes improve to (N/4)·R)


file and directories

inode number --> low-level name

file descriptor

standard input, output, error

The way link works: it simply creates another name in the directory and refers it to the same inode number

hard link

  • increments the inode's reference count
  • files only, cannot link a directory
  • the target file must already exist

symbolic link: a separate file that stores a path name

file system

block: 4K

inode (index node): 256B

free list

inode bitmap, data bitmap

indirect pointer
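
A worked capacity check (typical parameters assumed): with 4KB blocks and 4-byte pointers, one indirect block holds 1024 pointers. An inode with 12 direct pointers, one indirect, and one double-indirect pointer then covers (12 + 1024 + 1024 × 1024) × 4KB ≈ 4GB.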